Note: This is a reblog from the OKFN Science Blog. As part of my duties as a Panton Fellow, I will be regularly blogging there about my activities concerning open data and open science.


by Leo Reynolds

Altmetrics, web-based metrics for measuring research output, have recently received a lot of attention. Started only in 2010, altmetrics have become a phenomenon both in the scientific community and in the publishing world. This year alone, EBSCO acquired PLUM Analytics, Springer included Altmetric info into SpringerLink, and Scopus augmented articles with Mendeley readership statistics.

Altmetrics have a lot of potential. They are usually available earlier than citation-based metrics, allowing for an early evaluation of articles. With altmetrics, it also becomes possible to assess the many outcomes of research besides just the paper – meaning data, source code, presentations, blog posts etc.

One of the problems with the recent hype surrounding altmetrics, however, is that it leads some people to believe that altmetrics are somehow intrinsically better than citation-based metrics. They are, of course, not. In fact, if we just replace the impact factor with some aggregate of altmetrics, then we have gained nothing. Let me explain why.

The problem with metrics for evaluation

You might know this famous quote:

“All models are wrong, but some are useful” (George Box)

It refers to the fact that all models are a simplified view of the world. In order to be able to generalize phenomena, we must leave out some of the details. Thus, we can never explain a phenomenon in full with a model, but we might be able to explain the main characteristics of many phenomena that fall in the same category. The models that can do that are the useful ones.

Example of a scientific model, explaining atmospheric composition based on chemical process and transport processes. Source: Strategic Plan for the U.S. Climate Change Science Program (Image by Phillipe Rekacewicz)

The very same can be said about metrics – with the grave addition that metrics have a lot less explanatory power than a model. Metrics might tell you something about the world in a quantified way, but for the how and why we need models and theories. Matters become even worse when we are talking about metrics that are generated in the social world rather than the physical world. Humans are notoriously unreliable and it is hard to pinpoint the motives behind their actions. A paper may be cited for example to confirm or refute a result, or simply to acknowledge it. A paper may be tweeted to showcase good or to condemn bad research.

In addition, all of these measures are susceptible to gaming. According to ImpactStory, an article with just 54 Mendeley readers is already in the 94th to 99th percentile (thanks to Juan Gorraiz for the example). Getting your paper in the top ranks is therefore easy. And even indicators like downloads or views that go into the hundreds of thousands can probably be easily gamed with a simple script deployed on a couple of university servers around the country. This makes the old citation cartel look pretty labor-intensive, doesn’t it?

Why we still need metrics and how we can better utilize them

Don’t get me wrong: I do not think that we can get by without metrics. Science is still growing exponentially, and therefore we cannot rely on qualitative evaluation alone. There are just too many papers published, too many applications for tenure track positions submitted and too many journals and conferences launched each day. In order to address the concerns raised above, however, we need to get away from a single number determining the worth of an article, a publication, or a researcher.

One way to do this would be a more sophisticated evaluation system that is based on many different metrics, and that gives context to these metrics. This would require that we work towards getting a better understanding of how and why measures are generated and how they relate to each other. In analogy to the models, we have to find those numbers that give us a good picture of the many facets of a paper – the useful ones.

As I have argued before, visualization would be a good way to represent the different dimensions of a paper and its context. Furthermore, the way the metrics are generated must be open and transparent to make gaming of the system more difficult, and to expose the biases that are inherent in humanly created data. Last, and probably most crucial, we, the researchers and the research evaluators, must critically review the metrics that are served to us.

Altmetrics not only give us new tools for evaluation; their introduction also presents us with the opportunity to revisit academic evaluation as such – let’s seize this opportunity!

In July last year, I released the first version of a knowledge domain visualization called Head Start. Head Start is intended for scholars who want to get an overview of a research field. They could be young PhDs getting into a new field, or established scholars who venture into a neighboring field. The idea is that you can see the main areas and papers in a field at a glance without having to do weeks of searching and reading.


Interface of Head Start

You can find an application for the field of educational technology on Mendeley Labs. Papers are grouped by research area, and you can zoom into each area to see the individual papers’ metadata and a preview (or the full text in case of open access publications). The closer two areas are, the more related they are subject-wise. The prototype is based on readership data from the online reference management system Mendeley. The idea is that the more often two papers are read together, the closer they are subject-wise. More information on this approach can be found in my dissertation (see chapter 5), or if you like it a bit shorter, in this paper and in this paper.
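To make the co-readership idea concrete, here is a minimal sketch in Python. It is not the Head Start pre-processing code; the function name and the toy reader libraries are made up for illustration, and it assumes that "read together" means that two papers appear in the same reader's library.

```python
from collections import Counter
from itertools import combinations


def co_readership_counts(libraries):
    """Count, for every pair of papers, how many libraries contain both."""
    counts = Counter()
    for papers in libraries.values():
        # Deduplicate and sort so each unordered pair always gets the same key.
        for pair in combinations(sorted(set(papers)), 2):
            counts[pair] += 1
    return counts


# Hypothetical reader libraries (reader id -> paper ids); not real Mendeley data.
libraries = {
    "reader_1": ["paper_a", "paper_b", "paper_c"],
    "reader_2": ["paper_a", "paper_b"],
    "reader_3": ["paper_b", "paper_c"],
}

counts = co_readership_counts(libraries)
for (first, second), n in sorted(counts.items()):
    print(first, second, n)
```

Pairs with higher counts would be treated as closer subject-wise. A real analysis would probably also normalize for how many readers each paper has, which this sketch leaves out.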

Head Start is a web application built with D3.js. The first version worked very well in terms of user interaction, but it was a nightmare to extend and maintain. Luckily, Philipp Weißensteiner, a student at Graz University of Technology, became interested in the project. Philipp worked on the visualization as part of his bachelor’s thesis at the Know-Center. Not only did he modularize the source code, he also introduced Javascript Finite State Machine, a library that lets you easily describe the different states of the visualization. Setting up a new instance of Head Start is now only a matter of a couple of lines of code. Philipp developed a cool proof of concept for his approach: a visualization that shows the evolution of a research field over time using small multiples. You can find his excellent bachelor’s thesis in the repository (German).
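For readers unfamiliar with finite state machines, the following Python sketch shows the general idea of describing a visualization as a set of states and allowed transitions. It does not use the API of the JavaScript library mentioned above, and the state and event names are hypothetical, not the ones used in Head Start.

```python
class StateMachine:
    """A tiny finite state machine: events move it between named states."""

    def __init__(self, initial, transitions):
        self.state = initial
        # transitions maps (event, from_state) -> to_state
        self.transitions = transitions

    def can(self, event):
        """Return True if the event is allowed in the current state."""
        return (event, self.state) in self.transitions

    def fire(self, event):
        """Apply the event, or raise ValueError if it is not allowed."""
        key = (event, self.state)
        if key not in self.transitions:
            raise ValueError(
                f"event {event!r} not allowed in state {self.state!r}"
            )
        self.state = self.transitions[key]
        return self.state


# Hypothetical states and events for an overview visualization.
transitions = {
    ("zoom_in", "overview"): "zoomed_in",
    ("zoom_out", "zoomed_in"): "overview",
    ("show_timeline", "overview"): "timeline",
    ("show_overview", "timeline"): "overview",
}

vis = StateMachine("overview", transitions)
vis.fire("zoom_in")
print(vis.state)
```

The appeal of this design is that every legal interaction is listed in one table, so adding a new view means adding rows rather than scattering conditionals through the event handlers.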


Head Start Timeline View

In addition, I cleaned up the pre-processing scripts that do all the clustering, ordination and naming. The only thing that you need to get started is a list of publications and their metadata as well as a file containing similarity values between papers. Originally, the similarity values were based on readership co-occurrence, but there are many other measures that you can use (e.g. the number of keywords or tags that two papers have in common).
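As an illustration of the alternative measure mentioned above, this Python sketch computes similarity as the number of keywords two papers have in common and writes the result as CSV. The column names and the layout of the similarity file are assumptions made for this example; check the repository for the format that the pre-processing scripts actually expect.

```python
import csv
import io
from itertools import combinations


def shared_keyword_similarity(keywords_by_paper):
    """Return (paper_1, paper_2, shared_keyword_count) for pairs that overlap."""
    rows = []
    for first, second in combinations(sorted(keywords_by_paper), 2):
        shared = len(
            set(keywords_by_paper[first]) & set(keywords_by_paper[second])
        )
        if shared > 0:
            rows.append((first, second, shared))
    return rows


# Hypothetical publication metadata (paper id -> keywords).
keywords_by_paper = {
    "paper_a": ["e-learning", "personalization", "evaluation"],
    "paper_b": ["e-learning", "evaluation", "mobile"],
    "paper_c": ["mobile", "games"],
}

rows = shared_keyword_similarity(keywords_by_paper)

# Write to an in-memory buffer; replace with open("similarity.csv", "w", newline="")
# to produce a file (the file name is also just an example).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["paper_1", "paper_2", "similarity"])
writer.writerows(rows)
print(buffer.getvalue())
```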

So without further ado, here is the link to the GitHub repository. If you have any questions or comments, please send them to me or leave a comment below.



by AG Cann

Altmetrics are a hot topic in the scientific community right now. Classic citation-based indicators such as the impact factor are being complemented by alternative metrics generated from online platforms. Usage statistics (downloads, readership) are often employed, but links, likes and shares on the web and in social media are considered as well. The altmetrics promise, as laid out in the excellent manifesto, is that they assess impact quicker and on a broader scale.

The main focus of altmetrics at the moment is evaluation of scientific output. Examples are the article-level metrics in PLOS journals, and the Altmetric donut. ImpactStory has a slightly different focus, as it aims to evaluate the oeuvre of an author rather than an individual paper.

This is all well and good, but in my opinion, altmetrics have a huge potential for discovery that goes beyond rankings of top papers and researchers. A potential that is largely untapped so far.

How so? To answer this question, it is helpful to shed a little light on the history of citation indices.

Pathways through science

In 1955, Eugene Garfield proposed what became the Science Citation Index (SCI), which later went on to become the Web of Knowledge. His initial idea – next to measuring impact – was to record citations in a large index to create pathways through science. Thus one can link papers that are not linked by shared keywords. It makes a lot of sense: you can talk about the same thing using totally different terminology, especially when you are not in the same field. Furthermore, terminology has proven to be very fluid even in the same domain (Leydesdorff 1997). In 1973, Small and Marshakova realized – independently of each other – that co-citation is a measure of subject similarity and therefore can be used to map a scientific field.

Because citations are considerably delayed, however, co-citation maps are often a look into the past and not a timely overview of a scientific field.

Altmetrics for discovery

In come altmetrics. Similarly to citations, they can create pathways through science. After all, a citation is nothing else but a link to another paper. With altmetrics, it is not so much which papers are often referenced together, but rather which papers are often accessed, read, or linked together. The main advantage of altmetrics, as with impact, is that they are available much earlier.


Bollen et al. (2009): Clickstream Data Yields High-Resolution Maps of Science. PLOS One. DOI: 10.1371/journal.pone.0004803.

One of the efforts in this direction is the work of Bollen et al. (2009) on click-streams. Using the sequences of clicks to different journals, they create a map of science (see above).
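The basic principle of turning click sequences into a map can be sketched in a few lines of Python. This is a simplified illustration, not the method used by Bollen et al.: it assumes that each session is simply an ordered list of the journals a user visited, and the session data below is invented.

```python
from collections import Counter


def transition_counts(sessions):
    """Count how often a click on one journal is directly followed by another."""
    counts = Counter()
    for clicks in sessions:
        for source, target in zip(clicks, clicks[1:]):
            # Ignore consecutive clicks within the same journal.
            if source != target:
                counts[(source, target)] += 1
    return counts


# Hypothetical click sessions (ordered journal ids); not real usage data.
sessions = [
    ["journal_a", "journal_b", "journal_c"],
    ["journal_a", "journal_b"],
    ["journal_b", "journal_b", "journal_a"],
]

edges = transition_counts(sessions)
for (source, target), weight in sorted(edges.items()):
    print(source, "->", target, weight)
```

The resulting weighted edges form a network of journals, which could then be laid out with a graph drawing algorithm to obtain a map.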

In my PhD, I looked at the potential of readership statistics for knowledge domain visualizations. It turns out that co-readership is a good indicator of subject similarity. This allowed me to visualize the field of educational technology based on Mendeley readership data (see below). You can find the web visualization called Head Start here and the code here (username: anonymous, leave password blank).

Why we need open and transparent altmetrics

The evaluation of Head Start showed that the overview is indeed more timely than maps based on citations. However, it also provided further evidence that altmetrics are prone to sample biases. In the visualization of educational technology, the computer science driven areas such as adaptive hypermedia are largely missing. Bollen and Van de Sompel (2008) reported the same problem when they compared rankings based on usage data to rankings based on the impact factor.

It is therefore important that altmetrics are transparent and reproducible, and that the underlying data is openly available. This is the only way to ensure that all possible biases can be understood.

As part of my Panton Fellowship, I will try to find datasets that satisfy these criteria. There are several examples of open bibliometric data, such as the Mendeley API and the figshare API, which have adopted CC BY, but most of the usage data is not available publicly or cannot be redistributed. In my fellowship, I want to evaluate the goodness of fit of different open altmetrics data. Furthermore, I plan to create more knowledge domain visualizations such as the one above.

So if you know any good datasets please leave a comment below. Of course any other comments on the idea are much appreciated as well.

Peer review is one of the oldest and most respected instruments of quality control in science and research. Peer review means that a paper is evaluated by a number of experts on the topic of the article (the peers). The criteria may vary, but most of the time they include methodological and technical soundness, scientific relevance, and presentation.

“Peer-reviewed” is a widely accepted sign of quality of a scientific paper. Peer review has its problems, but you won’t find many researchers who favour a non-peer-reviewed paper over a peer-reviewed one. As a result, if you want your paper to be scientifically acknowledged, you most likely have to submit it to a peer-reviewed journal, even though it will take more time and effort to get it published than in a non-peer-reviewed publication outlet.

Peer review helps to weed out bad science and pseudo-science, but it also has serious limitations. One of these limitations is that the primary data and other supplementary material such as documentation and source code are usually not available. The results of the paper are thus not reproducible. When I review such a paper, I usually have to trust the authors on a number of issues: that they have described the process of achieving the results as accurately as possible, that they have not left out any crucial pre-processing steps and so on. When I suspect a certain bias in a survey for example, I can only note that in the review, but I cannot test for that bias in the data myself. When the results of an experiment seem to be too good to be true, I cannot inspect the data pre-processing to see if the authors left out any important steps.

As a result, later efforts in reproducing research results can lead to devastating outcomes. Wang et al. (2010) for example found that almost none of the literature on a certain topic in computer science could be reproduced.

“Reproducible”: a new quality criterion

Needless to say, this is not a very desirable state. Therefore, I argue that we should start promoting a new quality criterion: “reproducible”. Reproducible means that the results achieved in the paper can be reproduced by anyone because all of the necessary supplementary resources have been openly provided along with the paper.

It is easy to see why a peer-reviewed and reproducible paper is of higher quality than just a peer-reviewed one. You do not have to take the researchers’ word for how they calculated their results – you can reconstruct them yourself. As a welcome side-effect, this would make more datasets and source code openly available. Thus, we could start building on each other’s work and aggregate data from different sources to gain new insights.

In my opinion, reproducible papers could be published alongside non-reproducible papers, just like peer-reviewed articles are usually published alongside editorials, letters, and other non-peer-reviewed content. I would think, however, that over time, reproducible would become the overall quality standard of choice – just like peer-reviewed is the preferred standard right now. To help this process, journals and conferences could designate a certain share of their space to reproducible papers. I would imagine that they would not have to do that for too long though. Researchers will aim for a higher quality standard, even if it takes more time and effort.

I do not claim that reproducibility solves all of the problems that we see in science and research right now. For example, it will still be possible to manipulate the data to a certain degree. I do, however, believe that reproducibility as an additional quality criterion would be an important step for open and reproducible science and research.

So that you can say to your colleague one day: “Let’s go with the method described in this paper. It’s not only peer-reviewed, it’s reproducible!”

Note: This is a reblog from the OKFN Science Blog. To my excitement and delight, I was recently awarded a Panton Fellowship. As part of my duties, I will be regularly blogging there about my activities concerning open data and open science.

Peter Kraker at Barcamp Graz 2012. Photo by Rene Kaiser

Hi, my name is Peter Kraker and I am one of the new Panton Fellows. After an exciting week at OKCon, I was asked to introduce myself and what I want to achieve during my fellowship, which I am very happy to do. I am a research assistant at the Know-Center of Graz University of Technology and a late-stage PhD student at University of Graz. Like many others, I believe that an open approach is essential for science and research to make progress. Open science to me is about reproducibility and comparability of scientific output. Research data should therefore be put into the public domain, as called for in the Panton Principles.

In my PhD, I am concerned with research practices on the web and how academic literature search can be improved with overview visualizations. I have developed and open-sourced a knowledge domain visualization called Head Start. Head Start is based on altmetrics data rather than citation data. Altmetrics are indicators of scholarly activity and impact on the web. Have a look at the altmetrics manifesto for a thorough introduction.

In my evaluation of Head Start, I noticed that altmetrics are prone to sample biases. It is therefore important that analyses based on altmetrics are transparent and reproducible, and that the underlying data is openly available. Contributing to open and transparent altmetrics will be my first objective as a Panton Fellow. I will establish an altmetrics data repository for the upcoming open access journal European Information Science. This will allow the information science community to analyse the field based on this data, and add an additional data source for the growing altmetrics community. My vision is that in the long run, altmetrics will not only help us to evaluate science, but also to connect researchers around the world.

My second objective as a Panton Fellow is to promote open science based on an inclusive approach. The case of the Bermuda Rules, which state that DNA sequences should be rapidly released into the public domain, has shown that open practices can be established if the community stands together. In my opinion, it is therefore necessary to get as many researchers aboard as possible. From a community perspective, it is the commitment to openness that matters, and the willingness to promote this openness. The inclusive approach puts the researcher in his or her many roles at the center of attention. This approach is not intended to replace existing initiatives but to make researchers aware of these initiatives and to help them choose their approach to open science. You can find more on that on my blog.

Locally, I will be working with the Austrian Chapter of the Open Knowledge Foundation to promote open science based on this inclusive approach. Together with the Austrian Students’ Union, we will be having workshops with students, faculty, and librarians. I will also make the case for open science in the research communities that I am involved in. For the International Journal on Technology Enhanced Learning for example, I will develop an open data policy.

I am very honored to be selected as a Panton Fellow, and I am excited to get started. If you want to work with me on one or the other objective, please do not hesitate to contact me. You can also follow my work on Twitter and on my blog. Looking forward to furthering the cause of open data and open science with you!

Open Science Logo v2

by gemmerich

Update: There is an OKFN pad devoted to discussing this idea. Please add your comments and critique there!

When Derick Leony, Wolfgang Reinhardt, Günter Beham and I made the case for an open science in technology-enhanced learning back in late 2011, we discussed how open science could become a reality. We finally concluded that this was first and foremost a matter of consensus in the community:

Open Science is first and foremost a community effort. In fact we are arguing that reproducibility and comparability should become two of the standard criteria that every reviewer has to judge when assessing a paper.  [..] These two criteria should be of equal importance as the established criteria, giving incentive to the authors to actually apply the instruments of Open Science.

In addition, journals and conferences ought to make the submission of source code, data, and methodological descriptions together with the paper mandatory for them to be published. Conferences and journals themselves should in turn commit to making the papers openly accessible. The case of the genetic sequence database GenBank, which stores DNA sequences and makes them available to the public, has shown that if publishers and conference organisers adopt new standards, they can be propagated quickly within the community. The huge success of GenBank is due to the fact that many journals adopted the Bermuda principles (Marshall 2001), which state among other things that DNA sequences should be rapidly released into the public domain.

There is a crucial interplay at work between individual researchers and other actors within a field such as funding agencies, journals, and conferences. On the one hand, individual researchers are often bound by the rules that are made by those institutions because they depend on them as sources of funding and as publication outlets. On the other hand, the boards and committees steering these institutions are (at least partly) made up of the same researchers. Many researchers are sitting on conference committees, editorial boards, and policy advisory boards. They are thus shaping the community and commonly defining what is shared practice among its participants. In their role, they can advocate open practices and propose rules that help establish an open science.

In my perception, the discourse in open science often runs along the lines of open vs. closed approaches. A lot of effort is put into determining what is truly open and what is actually still closed. In open access for example, there is a heated debate about whether to choose the green or the gold road, with advocates on both sides ferociously arguing why only one of the two can be considered true open access. Whereas this discussion surely has some merit, most researchers have to worry more about whether their efforts are recognized by the community than what constitutes true openness. As Antonella Esposito writes in her insightful study on digital research practices:

Nonetheless their digital identities and online activities constituted a ‘parallel’ academic life that developed as a self–legitimating approach within a traditional mode of knowledge production and distribution. These tentative efforts were not acknowledged in their respective communities, struggling to become identifiable open research practices. Indeed, some interviewees called for clear institutional rules enabling sharing practices — especially in teaching and learning — that might slowly produce a general change of attitude and overcome current isolated initiatives by a few pioneers of open scholarship.

Most researchers are neither completely open nor completely closed. There is no black and white, but different shades of grey. Nonetheless, there are many researchers out there who make their publications available or put their source code online. In my opinion, it is necessary to get these researchers aboard, not to drive them away with endless debates about whether their research is “truly” open. Don’t get me wrong: it is important to have discussions about the optimal characteristics of open science, but not at the expense of making open science an elitist club that only a small minority satisfying all criteria can enter. From a community perspective, it is the commitment to openness that matters, and the willingness to promote this openness on editorial boards and program committees.

It seems that such a holistic view is gaining some traction: in a recent Web Science paper, R. Fyson, J. Simon and L. Carr discuss the interplay between actors regarding open access publications. Another good example of an inclusive approach is the Open Science Project here in Graz. The Open Science Project is a group of students led by Stefan Kasberger that tries to do all of their study-related work according to open science practices. This means that they try to use open source software for their homework assignments and make the results publicly available. They go to great lengths in their effort as they also try to persuade lecturers to follow their example and make their scripts openly accessible.

Draft Petition

At a recent meeting of the Austrian chapter of the OKFN Open Science, we started discussing an inclusive approach to open science. This motivated me to write a first draft for a petition which you can find below. So my question is: would you sign such a petition? Do you think it is engaging/far-reaching/well worded enough? Let me know what you think in the comments or join us at the OKFN Pad where you can help us to collaboratively edit the text:

Science is one of the greatest endeavours of mankind. It has enjoyed enormous growth since its inception more than 400 years ago. Science has not only produced an incredible amount of knowledge, it has also created tools for communication and quality control. Journals, conferences, peer review to name just a few. Lately, serious shortcomings of these established instruments have surfaced. Scientific results are often irreproducible and lead to ill-guided decisions. Retraction rates are on the rise. There have been many cases of high profile scientific fraud.

In our view, all of these problems can be addressed by a more open approach to science. We see Open Science as making the scientific process and all of its outcomes openly accessible to the general public. Open Science would benefit science, because it would make results more reproducible, and quality control more transparent. Open Science would also benefit the society by including more people in the process and sparking open innovation.

Besides the greater good, open science also benefits individual scientists. Research has shown that papers that are openly accessible are cited more often. If you share source code and data, you could get credited for these parts of your research as well. If you talk about your methodology and share it with others, this will bring attention to your work. The internet provides us with the technology to make Open Science possible. In our view, it is time to embrace these possibilities and innovate in the scientific process.

It is very important to note that we see Open Science as a community effort that can only work if we include as many people as possible. We know that it is not possible to open up entire work processes overnight. In our view, this is not necessary to contribute to an Open Science. The idea is to open everything up that you already can and work towards establishing open practices in your work and your community. You might already have papers that you are allowed to share in a personal and institutional repository. You might have source code or data that you can easily publish under a permissive license. And you might be sitting on a board and committee where you can bring open practices into the discussion.

If you agree with this point of view, you are encouraged to sign the declaration below.

  • I will open up resources that I have the legal right to
  • I will work towards establishing open practices in my research
  • I will promote Open Science in my institution and my research community

If you would like to comment on the manifesto, or add your own ideas, please go to this OKFN Pad.

Photo by Cory Doctorow, slides by Lora Aroyo

I spent last week at Web Science 2013 in Paris, and it was time well spent. Web Science was for sure the most diverse conference I have ever attended. One of the reasons for this diversity is that Web Science was co-located with CHI (Human-Computer Interaction) and Hypertext. But most importantly, the community of Web Science itself is very diverse. There were more than 300 participants from a wide array of disciplines. The conference spanned talks from philosophy to computer science (and everything in-between) with keynotes by Cory Doctorow and Vint Cerf. This resulted in many insightful discussions, looking at the web from a multitude of angles. I really enjoyed the wide variety of talks.

Nevertheless, there were some talks that failed to resonate with the audience. It seems to me that this was mostly due to the fact that they were too rooted in a single discipline. Some presenters assumed a common understanding of the problem discussed and used a lot of domain-specific vocabulary that made it hard to follow the talk. Don’t get me wrong: most presenters tried to appeal to the whole audience but with some subjects this seemed to be impossible.

To me, this shows that we need a better understanding of what Web Science actually is and more discussion of what should be researched under this banner. There seems to be some uncertainty about this, which was also reflected in the peer reviews. Hugh Davis, the general chair for Websci’13, highlighted this in his opening speech:

I think that Web Science is a good example where Open Peer Review could contribute to a common understanding and a better communication among the actors involved. I have been critical of open processes in the past because they take away the benefits of blinding. Mark Bernstein, the program chair, also stressed this point in a tweet:

Nowadays, however, I think that the potential benefits of open peer review (transparency, increased communication, incentives to write better reviews) outweigh the effects of taking away the anonymity of reviewers. Science will always be influenced by power structures, but with open peer review they are at least visible. Don’t get me wrong: I really like the inclusive approach to Web Science that the organizers have taken. The web cannot be understood with the paradigm of a single discipline, and at this very point in time it is very valuable to get input from all sides on the discussion. In my opinion, open peer review could help in facilitating this discussion before and after the conference as well.


I made two contributions to this year’s Web Science conference. First, I presented a paper written together with Sebastian Dennerlein in the Social Theory for Web Science Workshop entitled “Towards a Model of Interdisciplinary Teamwork for Web Science: What can Social Theory Contribute?”. In this position paper, we argue that social scientists and computer scientists do not work together in an interdisciplinary way due to a fundamentally different approach to research. We sketch a model of interdisciplinary teamwork in order to overcome this problem. The feedback on this talk was very interesting. On the one hand participants could relate to the problem, but on the other hand they alerted us to many other influences on interdisciplinary teamwork. For one, there is often a disagreement at the very beginning of a research project about what the problem actually is. Furthermore, the disciplines are fragmented as well and often follow different paradigms. We will consider this feedback when specifying the formal model. You can find the paper here and the slides of my talk below.

In general, the workshop was very well attended and there was a certain sense of common understanding regarding opportunities and challenges of applying social theory in web science. All in all, I think that a community has been established that could produce interesting results in the future.

My second contribution was a poster with the title “Head Start: Improving Academic Literature Search with Overview Visualizations based on Readership Statistics” which I co-wrote with Kris Jack, Christian Schlögl, Christoph Trattner, and Stefanie Lindstaedt. As you may recall, Head Start is an interactive visualization of the research field of Educational Technology based on co-readership structures. Head Start was received very positively. Many participants were interested in the idea of readership statistics for mapping. There were some scientometrists but also educational technologists who expressed their interest. Many comments went towards how the prototype could be extended. You can find the paper at the end of the post and the poster below.

Head Start

Several participants noted that they would like to adapt and extend the visualization. Clare Hooper, for example, is working on a content-based representation of the field of Web Science, and it would be interesting to combine our approaches. This encouraged me even more to open source the software as soon as possible.

All in all, it was a very enjoyable conference. I also like the way that the organizers innovate in the format every year. The pecha kucha session worked especially well in my opinion, sporting concise and entertaining talks throughout. Thanks to all organizers, speakers and participants for making this conference such a nice event!

Peter Kraker, Kris Jack, Christian Schlögl, Christoph Trattner, & Stefanie Lindstaedt (2013). Head Start: Improving Academic Literature Search with Overview Visualizations based on Readership Statistics. Web Science 2013.

Network model, or network pattern? Image by GustavoG

Update: Sebastian Dennerlein and I have written a paper entitled “Towards a Model of Interdisciplinary Teamwork for Web Science: What can Social Theory Contribute?” which includes the patterns vs. models problem. The paper has been accepted for the Web Science 2013 Workshop: “Harnessing the Power of Social Theory for Web Science”. You can download it here.

Scientific disciplines are curious formations. Each discipline has its own culture of doing things. Kuhn called these cultures “paradigms” – a combination of assumptions, theories, and methods that guide research in a discipline. Sometimes a new set of problems arises that cannot be answered with the standard paradigm of a single discipline. Instead, these problems require knowledge from different disciplines. In education, for example, it became apparent that learning might benefit from integrating technology. When the world wide web became social, studying people’s online behavior became as interesting as building the infrastructure that allows these interactions to take place.

As a result, even more curious formations emerge – interdisciplinary fields. Suddenly, learning scientists need to work with computer scientists; computer scientists need to work with social scientists; social scientists need to work with jurists – and sometimes they all need to work together. As you can imagine, when people from different scientific cultures need to talk to each other, they have trouble understanding each other. They come from different backgrounds and have different vocabularies and different methodologies.

Examples of interdisciplinary fields are educational technology and web science. In both fields, computer scientists meet social scientists in a wider sense (sociologists, psychologists, learning scientists). And from my point of view, both fields suffer from the same fundamental problem, one that goes beyond simple terminology or methodology. I call it the models vs. patterns problem.

Patterns and models – or computer science vs. social science

It struck me some time ago in a lecture on Knowledge Discovery in Databases by Brano Markić from the University of Mostar. He introduced knowledge discovery as defined by Fayyad et al. in 1996: the goal of knowledge discovery is to find new, valid, useful, and understandable patterns in data. Fayyad et al. use patterns and models synonymously, but Markić made a very interesting distinction: models are like the general equation of a line, y = a + bx, while patterns are like a specific equation, e.g. y = 5 + 2x. Fayyad et al. also describe the knowledge discovery process: after preprocessing and data selection, you apply some sort of data mining method (e.g. clustering or machine learning). The output of the data mining step is the aforementioned patterns. In the next step, you evaluate the patterns and thus gain knowledge.
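To make Markić's distinction a bit more tangible, here is a minimal sketch in Python. The function name `fit_line` and the data points are made up for illustration; they are not from the lecture or from Fayyad et al. The function encodes the model (the general form y = a + bx), and running it on a concrete dataset yields one pattern (specific values for a and b).

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for the model y = a + b*x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Hypothetical observations that happen to lie exactly on y = 5 + 2x.
xs = [0, 1, 2, 3, 4]
ys = [5, 7, 9, 11, 13]

a, b = fit_line(xs, ys)
print(f"pattern: y = {a:g} + {b:g}x")  # prints: pattern: y = 5 + 2x
```

The same model applied to a different dataset would return a different pattern, which is exactly why a pattern on its own says little beyond the data it was found in.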

Then I realized: this is not only the knowledge discovery process. This is the way that a lot of computer scientists do research. Starting from a certain problem, they try to find patterns that relate to that problem in a big dataset. There is a certain caveat to that definition of knowledge, and Fayyad and his colleagues make it very clear: “[..] knowledge in this definition is purely user oriented and domain specific and is determined by whatever functions and thresholds the user chooses.” While this might be fine for practical problems, it surely isn’t for scientific ones. This definition of knowledge excludes any generalization of results that goes beyond the specific situation and the specific user.

Now, don’t get me wrong: I do not claim that computer scientists produce useless results. Computer scientists have developed good ways to identify reliable patterns that are independent of user and situation. But a lot of these patterns are hard to interpret. Say you wanted to know which Twitter users are more likely to talk to strangers, and by various analyses you find that those are the ones that mention significantly more names of colors in their tweets. This might be a very stable pattern in the sense described before, but how do you interpret this result? This is when computer scientists turn to social scientists in order to find answers to their questions.

Social scientists, however, have a fundamentally different way of approaching a problem. Let’s take the problem of which users are more likely to talk to strangers. Usually social scientists first turn to theories, in order to see which one might be applicable to the problem area. They might choose social information processing that deals with how people get to know each other online. Then they come up with a general model or hypotheses based on this theory that describes the problem. Afterwards, they build an instrument to test this model, such as a survey, an interview, or an observation. In the end, they know whether the model has survived this specific test (or, they adapt the model to the results – but no one would do that of course). The usual problem is that due to smaller sample sizes it is unclear to what extent the results can be generalized. That is when social scientists turn to computer scientists who can seemingly provide access to larger datasets.

This is when the confusion begins: social scientists disregard computer scientists’ results because they are not grounded in theory. Computer scientists disregard social scientists’ results because they are not based on big datasets. Social scientists cannot interpret computer scientists’ results because they are often on a level that is not covered by traditional theories and models. Computer scientists cannot test social scientists’ models because they often do not have the data in the form that is required by the models.

Overcoming the problem

In my opinion, it is important for interdisciplinary fields to close the gap that results from the models vs. patterns problem. Otherwise, the different disciplines cannot work together as effectively as they potentially could. On the more pattern-oriented side, it would be important to understand that theories are more than just castles in the sky. Theory should be baked into research as a guiding principle for interpreting the patterns that data mining produces. On the more theory-oriented side, researchers need to understand that data mining methods can be useful for evaluating models, but that the properties of these methods need to be considered already when building the models. In that way, both sides could build on each other’s strengths – instead of looking suspiciously at each other’s results.

What do you think? Am I oversimplifying here? What are the biggest challenges in interdisciplinary research from your perspective?

Thanks to Sebastian Dennerlein for valuable feedback on this post!

Peter Kraker & Sebastian Dennerlein (2013). Towards a Model of Interdisciplinary Teamwork for Web Science: What can Social Theory Contribute? Web Science 2013 Workshop: Harnessing the Power of Social Theory for Web Science.

Today, the keynote speakers for i-KNOW 2013 have been announced:

I would like to use this opportunity to draw your attention to the Call for Papers. Science 2.0 is an important part of the conference. Topics include but are not limited to:

  • New Publication and Research Processes
  • Opportunities and Challenges for Researchers and Research organizations
  • New Indicator Systems to Measure Scientific Quality
  • Awareness-support for Science 2.0 Activities
  • New Paradigms for Scientific Communication
  • New Feedback Mechanisms among Researchers and between Science and Society
  • Empirical Studies on the Use of Web 2.0 Tools for Science 2.0
  • Recommender Systems in Science 2.0
  • Virtual Research Environments
  • Digital Research Libraries
  • Applications in and for Science 2.0
  • Crowd-sourcing in Science
  • Robust Methods for dealing with Noisy Crowd-sourced Data
  • Data Schemes and Interoperability Formats
  • Social Mining and Metadata Extraction in Academic Resources
  • Metadata Quality and Quality Assessment
  • Design and architecture of data sharing facilities
  • Semantic Web Standards for Science 2.0
  • Systems design accounting for standardized data sets

Other tracks include social computing, visual analytics & information visualization, knowledge management, knowledge discovery & data mining, and mobile computing. The deadline for full paper submissions is April 1, 2013. You can find all submission information and the full Call for Papers here.


Open access logo by PLOS

Recently, I ordered a book via interlibrary loan. I entered the bibliographic details into an online form on my university library’s site. My library received the form, a librarian looked up the book in a catalogue and sent a request to a German library. There, another librarian collected the book and sent it to my university library. I got an email when the book arrived, and went to the library to collect it. I read the book and scanned the chapters relevant to me so that I had them for later reference. Then I returned the book to my library which sent it back to Germany.

Why did I tell this lengthy and slightly boring story? Because there are still scholars who think that digital publishing and open access are seriously harming science. Douglas Fields makes many problematic assumptions in this article, and I do not want to address all of them. Jalees Rehman and Björn Brembs have already written well-crafted replies that address most of the issues. I only want to focus on one statement which is outright wrong: the assumption that, in the past, the peer-review system allowed only world-class research to be published, whereas now open access journals flood the literature.

After all, science has been growing exponentially for the last 400 years. Even in 1613, an author called Barnaby Rich decried the increase in literature:

One of the diseases of this age is the multiplicity of books; they doth so overcharge the world that it is not able to digest the abundance of idle matter that is every day hatched and brought forth into the world. (cited after Price 1961)

As we can see, information overload is not a contemporary problem in science. In fact, the first journals were established because people could not read all the books that were published. Barnaby Rich himself authored 26 books in his lifetime, which is a high output even by today’s standards. This led two Hungarian information scientists to declare the Barnaby Rich Effect (Braun and Zsindely 1985):

It’s always the other author(s) who publishes too much and “pollutes”, “floods”, “eutroficates” the literature, never me.

Keeping that enormous output in mind, it is impossible that all pre-web research was world-class research. Following Dr. Fields’ rationale, this would require not only that every piece of published research was world-class, but also that there were always three world-class researchers available to review the material. It is quite easy to see that this cannot be the case. There is a good reason why measures to judge the quality of research output have become so popular. In a world with an ever-increasing number of researchers, journals and papers, they are being used to separate the wheat from the chaff (how well they are doing that is subject to an ongoing discussion…).

Research into the quality of papers has shown consistently and over time that quality follows a power law. There are only a few publications that publish world-class research. The papers in these publications get cited often and thus have a high impact. The vast majority of papers, however, goes by almost unnoticed. Therefore, it is not the fault of open access journals that so much research gets published: it is rather an artefact of the enormous growth of science. Bad research has always been published, but in most cases, it just did not get the attention. And in the worst case, world-class research went by unnoticed because it was simply not picked up, like Mendel’s lost paper.
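To get a feeling for how skewed such a distribution is, here is a small back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration, not data from the research mentioned above: the function name `top_share`, the number of papers, and the exponent are hypothetical, and citations are simply assumed to be proportional to rank^(-alpha).

```python
def top_share(n_papers, alpha, top_fraction):
    """Share of all citations held by the top `top_fraction` of papers,
    assuming the paper at rank r receives citations proportional to r**-alpha."""
    weights = [rank ** -alpha for rank in range(1, n_papers + 1)]
    top_n = max(1, int(n_papers * top_fraction))
    return sum(weights[:top_n]) / sum(weights)


# Hypothetical setting: 10,000 papers, exponent 1.0, look at the top 10%.
share = top_share(n_papers=10_000, alpha=1.0, top_fraction=0.1)
print(f"top 10% of papers hold {share:.0%} of citations")
```

Under these made-up parameters, a small group of papers collects most of the attention while the long tail shares the rest. A different exponent changes the numbers, but not the overall shape.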

Taking publications into the digital age

Whatever the world of digital scholarship will look like, I expect that scholars will still judge the quality of papers based on certain metrics and quality indicators, and read only those that are worth reading to them. A whole movement called altmetrics is devoted to adapting scholarly metrics to the digital age. But apart from the issue of filtering, digital publishing offers many opportunities that the existing system doesn’t. With the content of papers being machine readable, we can start analyzing and linking it in ways that were not possible before. One practical application of these analyses is recommendations. They might help to unearth that valuable piece of research that previously went unnoticed. Another application that I am currently involved in is to use digital links between papers to create timely overviews of research fields.

But we can go even further than that: a truly open science. By making data and source code available, and linking them to the research results through an open methodology, we can make research more reproducible – and therefore make it easier to weed out bad research.

True, there are certain problems with respect to open access that need to be solved. Fields makes a valid point when he hints at predatory publishers that are only interested in profit and do not provide quality control. But scholars have already started to collect information about questionable open access publishers. He also talks about the loss of blinded reviews in open peer review. Anonymity gives reviewers the possibility to be honest even towards the most renowned researchers and institutions. This issue needs to be addressed, as I wrote in an earlier blogpost.

Nevertheless, I believe that digital publishing and open access are such a great opportunity for science that it is worth taking the risk. The open access version of the story at the beginning of this post is rather short: I look up the book in a search engine and download the PDF. Think about what difference it would make not only in my case, but also for people in regions of the world where interlibrary loans do not exist and libraries cannot afford to pay for journal subscriptions. Setting all of the other potential benefits aside, an open science would mean that billions of people would get access to knowledge that was not available to them before.
