With “Ich bin Open Science!”, we want to raise public awareness of open science in Austria and beyond. The project, a collaboration between Know-Center and FH Joanneum, has been submitted to netidee 2015. In the video (German only for the moment), we explain the project idea, and you can see the first testimonials who lend a face to open science. Why are you committed to openness in science and research?
Note: This is a reblog from the RDA Europe website.
From March 8 to 11, I spent several insightful days in Southern California – at the 5th Plenary of the Research Data Alliance in San Diego, to be precise. The RDA, for those of you not familiar with the organization, is a global consortium of individuals and organizations with a common goal: building the social and technical bridges that enable open sharing of data in research. Its vision: “Researchers and innovators openly sharing data across technologies, disciplines, and countries to address the grand challenges of society.” The organization has high-level backing: among its supporters are funding heavyweights NSF, NIH, and the European Commission.
My stay in San Diego was made possible by the generous financial support provided by RDA Europe’s Early Career Programme. I applied as I was especially interested in the Data Citation working group headed by fellow Austrian Andreas Rauber and his co-chairs Ari Asmi and Dieter van Uytvanck. I closely followed the activities of this group throughout the meeting and acted as a scribe in the working group meeting, at their presentation in the plenary, and in the Data Publishing Interest Group. The Data Citation WG has come up with a way to make dynamic and highly volatile datasets, and parts thereof, citable. Data citations of this kind are very important for the reproducibility of science, and they are not supported by current solutions. I was very impressed with the results of the working group – and by the pilots and workshops that are being carried out by NERC, ESIP, CLARIN, and NASA. If I have sparked your interest, I’d encourage you to check out the website of the WG and join the group.
In a way, the Data Citation WG embodies the RDA’s spirit: solution-oriented, focused and implementation-driven. Nevertheless, there was also plenty of room for high-level talk at the meeting. I was impressed by the keynote by Stephen Friend of SAGE Bionetworks (check out the recording of his and other talks here). He provided a look into a data-driven future in biomedical research, illustrated by a number of projects that have turned heads beyond the research community. These include the Accelerating Medicines Partnership in Alzheimer’s Disease (AMP-AD) and Apple’s ResearchKit.
Bibliometrics and altmetrics, two of my main research foci, were also discussed in the course of the Plenary; most notably in the Publishing Data Bibliometrics WG, of course, but also in the Publishing Data Interest Group. There, I presented two recent studies that I had been part of, dealing with the distribution of data citations and altmetrics. More information can be found in the accompanying slides.
I also contributed to the event by presenting a poster on the overview visualization of scholarly materials that I have developed in my PhD. More information on that in the poster below and in this blog post. Discovery was also the main topic in the Data Description Registry Interoperability (DDRI) session. Amir Aryani presented the Research Data Switchboard, which connects datasets across repositories using semantic relations. Can’t wait to try this one out myself!
The RDA meeting was a unique experience. I got to meet many fascinating people, and it was awesome to see just how many people are working towards promoting and enabling open sharing of research data. I will certainly follow the work of the working groups that I participated in and will try to contribute as much as I can – and I would encourage everyone interested in open research data to do the same!
Note: This post is a reblog from the LSE Impact Blog.
In Douglas Adams’s famous novel The Hitchhiker’s Guide to the Galaxy, an unsuspecting man called Arthur Dent is lifted onto a spaceship just before earth is demolished by intergalactic bureaucrats. Together with a group of interstellar travellers (including, amongst others, the President of the galaxy), he then embarks on a journey through the universe to unravel the events that led to the destruction of earth. To help Arthur better understand the new surroundings he is thrown into, he is handed a copy of The Hitchhiker’s Guide to the Galaxy, a multimedia guidebook that offers wisdom and advice on all topics of interest in the universe.
Starting out in a new scientific field can feel very similar: you are faced with a new world that you have to make sense of. Unfortunately, the knowledge needed to understand this new world is not readily structured and summarized in one handy guide, but scattered over millions of scientific articles. To make matters worse, you have no idea which articles belong to the field that you are interested in and which of them are actually important. For many researchers, the starting point in their quest to conquer an unfamiliar knowledge domain is to turn to their personal favourite search engine, type in the name of the field of interest and start reading at the top of the list. Once you have read through the first few articles (usually highly cited review articles), and followed relevant references, you develop an idea of important journals and authors in the field and adapt your search strategy accordingly. With time and patience, a researcher can thus build a mental model of a field.
The problem with this strategy is that it can take weeks, if not months before this mental model emerges. Indeed, in many PhD programs, the first year is devoted to catching up with the state-of-the-art. There is also a lot of reading and summarizing involved, but searching for relevant literature usually accounts for a large chunk of the time. And even with the most thorough search strategy, the probability that you are going to miss out on an important piece of prior work is rather high.
Another means of getting an overview of a research field is knowledge domain visualizations. An example of such a visualization is given above. Knowledge domain visualizations show the main areas in a field and assign relevant articles to these main areas. Hence, an interested researcher can see the intellectual structure of a field at a glance, without performing countless searches with all different sorts of queries. An additional characteristic of knowledge domain visualizations is that areas of a similar subject are positioned closer to each other than areas of an unrelated subject. In the example, “Pedagogical models” is subject-wise closer to “Virtual learning environments” than to “Psychological theories”. Thus it is easy to find areas related to one’s own interests. Granted, even with a knowledge domain visualization in hand, you would still need to do the reading. But it would certainly save you a lot of time that you would otherwise spend on searching, indexing and structuring.
Knowledge domain visualizations can be created not only on the level of the individual research article. Below you can see a visualization by Bollen et al. (2009) of all of science. The nodes in the network represent research journals, and the different colors designate different disciplines. Even though the idea of knowledge domain visualizations has been around for quite some time, and despite their obvious usefulness, they are not yet widely available. Part of the reason may be that in the past, the data needed to construct these visualizations was only available from a few rather expensive sources. Another part of the reason may be that there has been an emphasis on all-encompassing overviews. While these provide valuable insights into the structure of science as a whole, they are usually not interactive and provide little value in day-to-day work, where you want to be able to zoom into specific publications. There are several applications out there that can be used to create one’s own overview, but they can usually only be operated by information visualization specialists.
In our work, we therefore aimed at creating an interactive visualization that can be used by anyone. As a first case, we chose to visualize the field of educational technology, as it represents a highly dynamic and interdisciplinary research field. As described in a recently published paper in the Journal of Informetrics (Kraker et al., 2015), the visualization is based on a novel data source – the online reference management software Mendeley. The articles for the visualization were selected from Mendeley’s research catalog, which is crowd-sourced from over 2.5 million users from around the world and offers structured access to more than 100 million papers.
One of the most important steps when creating a knowledge domain visualization is to decide which measure defines the similarity between two articles. The measure determines where an article gets placed on the map and how it is related to other articles. Again, we used Mendeley data to tackle this issue. Specifically, we used co-readership information. “So what is this co-readership exactly?” you may ask. Mendeley enables users to store their references in a personal library and share them with other people. The number of times an article has been added to user libraries is commonly referred to as the number of readers, or in short, readership. In analogy to that, we talk about the co-readership of documents when they are added to the same user library. When Alice adds Paper 1 and Paper 2 to her user library, the co-readership of these two documents is 1. When Bill adds the same two papers, the co-readership count goes up to 2, and so on. Our assumption was that the higher the co-readership of two documents, the more likely they are to be of the same or a similar subject. It’s not unlike two books that are often borrowed together from a library – there is a good chance that they address related topics. And indeed, our first analyses indicate that our assumption is valid.
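To make the measure concrete, the pair counting behind co-readership can be sketched in a few lines of Python (the user libraries below are invented for illustration and do not reflect Mendeley’s actual data model):

```python
from collections import Counter
from itertools import combinations

# Hypothetical user libraries: each user maps to the set of papers
# they have added to their personal library.
libraries = {
    "alice": {"paper1", "paper2", "paper3"},
    "bill":  {"paper1", "paper2"},
    "carol": {"paper2", "paper3"},
}

# Co-readership: +1 for every pair of papers that appears together
# in the same user library.
co_readership = Counter()
for papers in libraries.values():
    for pair in combinations(sorted(papers), 2):
        co_readership[pair] += 1

print(co_readership[("paper1", "paper2")])  # 2 (Alice and Bill)
print(co_readership[("paper2", "paper3")])  # 2 (Alice and Carol)
print(co_readership[("paper1", "paper3")])  # 1 (Alice only)
```

The resulting counts can then serve as edge weights in a similarity network from which the map is computed.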
The cool thing is that once you have settled on a similarity measure, the process of creating the map can be highly automated. We adapted procedures for assigning papers to research areas and for situating them on the map. We also put a heuristic in place that tries to guess a name for each area using the web-based text mining services OpenCalais and Zemanta.
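The actual system queries OpenCalais and Zemanta over the web; purely as an illustration of the idea, a local fallback heuristic might label an area with the most frequent non-stopword term across its papers’ titles (the function name, stopword list, and example titles below are invented for this sketch):

```python
import re
from collections import Counter

# Minimal stopword list, just for the sketch.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "for", "on", "with"}

def guess_area_name(titles):
    """Guess an area label: the most frequent non-stopword title term."""
    words = Counter()
    for title in titles:
        for w in re.findall(r"[a-z]+", title.lower()):
            if w not in STOPWORDS:
                words[w] += 1
    return words.most_common(1)[0][0]

label = guess_area_name([
    "Personal Learning Environments in Higher Education",
    "Designing Virtual Learning Environments",
    "Learning Environments and Student Motivation",
])
print(label)
```

A real labeling service adds entity recognition and disambiguation on top of simple term frequency, which is why the production heuristic relies on external text mining services.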
The resulting knowledge domain visualization can be seen below. The blue bubbles represent the main areas in the field. The size of a bubble signifies the number of readers of publications in that area. The closer two areas are in the visualization, the closer they are subject-wise. An interactive version is also available; once you click on a bubble, you are presented with popular papers in that area. The dropdown on the right displays the same data in list form. Just go to Mendeley Labs (http://labs.mendeley.com/headstart) and try it for yourself! The source code is available on GitHub: http://github.com/pkraker/Headstart
Apart from the fact that you can get a quick overview of a field, there are many other interesting things that you can learn about a domain from such a visualization. Fisichella and his colleagues even argue that mappings like the one above might help to overcome the fragmentation in educational technology by building awareness among researchers of the different sub-communities. There may be some truth to this assumption: when I evaluated the map with researchers from computer science, they discovered research areas that they did not know existed. One example is Technological Pedagogical Content Knowledge, which is a conceptual framework emanating from the educational part of the research community.
Another interesting possibility is to study the development of fields over time. When I compared the map to similar maps based on older literature (e.g. Cho et al. 2012), I learned a lot about the development of the field. Whereas learning environments played an important role in the 2000s, issues relating to them have since split up into different areas (e.g. Personal Learning Environments, Game-based Learning). You can find further examples in the paper describing the full details of the evaluation, which is still under review. You can find a pre-print on arXiv.
Given the enormous amount of new knowledge that is produced each and every day, the need for better ways of gaining – and keeping – an overview is becoming more and more apparent. I think that visualizations based on co-readership structures could provide this overview and serve as universal, up-to-date guides to knowledge domains. There are still several things that need fixing – the automated procedure, for example, is not perfect and still requires manual intervention. Furthermore, the characteristics of the users have a certain influence on the result, and we need to figure out a way to make users aware of this inherent bias. Therefore, we are currently working on improving our automation techniques. Algorithms, however, will never be correct 100% of the time, which is why we are also experimenting with collaborative models to refine and extend the visualizations. After all, an automated overview can never be the end product, but rather a starting point for discovery.
Kraker, P., Schlögl, C., Jack, K., & Lindstaedt, S. (2015). Visualization of co-readership patterns from an online reference management system. Journal of Informetrics, 9(1), 169–182. DOI: 10.1016/j.joi.2014.12.003. Preprint: http://arxiv.org/abs/1409.0348
Educational technology experts will notice that some of the newest developments in the field, such as MOOCs or learning analytics, are missing from the overview. That is because the data for this prototype was sourced in August 2012 and is therefore almost 2.5 years old. The evaluation was conducted in the first half of 2013.
Note: This is a reblog from the OKFN Science Blog.
It’s hard to believe that it has been over a year since Peter Murray-Rust announced the new Panton Fellows at OKCon 2013. I am immensely proud that I was one of the 2013/14 Panton Fellows and the first non-UK-based fellow. In this post, I will recap my activities during the last year and give an outlook on things to come after the end of the fellowship. At the end of the post, you can find all outputs of my fellowship at a glance. My fellowship had two focal points: the work on open and transparent altmetrics, and the promotion of open science in Austria and beyond.
Open and transparent altmetrics
The blog post entitled “All metrics are wrong, but some are useful” sums up my views on (alt)metrics: I argue that no single number can determine the worth of an article, a journal, or a researcher. Instead, we have to find those numbers that give us a good picture of the many facets of these entities and put them into context. Openness and transparency are two necessary properties of such an (alt)metrics system, as this is the only sustainable way to uncover inherent biases and to detect attempts at gaming. In my comment on the NISO whitepaper on altmetrics standards, I therefore maintained that openness and transparency should be strongly considered for altmetrics standards.
In another post, “Open and transparent altmetrics for discovery”, I laid out that altmetrics have a largely untapped potential for visualization and discovery that goes beyond rankings of top papers and researchers. In order to help uncover this potential, I released the open source visualization Head Start, which I developed as part of my PhD project. Head Start gives scholars an overview of a research field based on relational information derived from altmetrics. In two blog posts, “New version of open source visualization Head Start released” and “What’s new in Head Start?”, I chronicled the development of a server component, the introduction of the timeline visualization created by Philipp Weißensteiner, and the integration of Head Start with Conference Navigator 3, a nifty conference scheduling system. With Chris Kittel and Fabian Dablander, I took first steps towards automatic visualizations of PLOS papers. Recently, Head Start also became part of the Open Knowledge Labs. In order to make the maps created with Head Start openly available to all, I will set up a server and website for the project in the months to come. The ultimate goal would be to have an environment where everybody can create their own maps based on open knowledge and share them with the world. If you are interested in contributing to the project, please get in touch with me, or have a look at the open feature requests.
Promotion of open science and open data
Regarding the promotion of open science, I teamed up with Stefan Kasberger and Chris Kittel of openscienceasap.org and the Austrian chapter of Open Knowledge for a series of events intended to generate more awareness in the local community. In October 2013, I was a panelist at the openscienceASAP kick-off event at the University of Graz, entitled “The Changing Face of Science: Is Open Science the Future?”. In December, I helped organize an OKFN Open Science Meetup in Vienna on altmetrics. I also gave an introductory talk on this occasion that got more than 1,000 views on Slideshare. In February 2014, I was interviewed for the openscienceASAP podcast on my Panton Fellowship and the need for an inclusive approach to open science.
In June, Panton Fellowship mentors Peter Murray-Rust and Michelle Brook visited Vienna. The three-day visit, made possible by the Austrian Science Fund (FWF), kicked off with a lecture by Peter and Michelle at the FWF. On the next day, the two led a well-attended workshop on content mining at the Institute of Science and Technology Austria. The visit ended with a hackday organized by openscienceASAP and an OKFN-AT meetup on content mining. Finally, last month, I gave a talk on open data at the “Open Science Panel” on board the MS Wissenschaft in Vienna.
I also became active in the Open Access Network Austria (OANA) of the Austrian Science Fund. Specifically, I am contributing to the working group “Involvement of researchers in open access”. There, I am responsible for a visibility concept for open access researchers. Throughout the year, I have also contributed to a monthly sum-up of open science activities in order to make these activities more visible within the local community. You can find the sum-ups (only available in German) on the openscienceASAP stream.
I also went to a lot of events outside Austria where I argued for more openness and transparency in science: OKCon 2013 in Geneva, SpotOn 2013 in London, and Science Online Together 2014 in Raleigh (NC). At the Open Knowledge Festival in Berlin, I was session facilitator for “Open Data and the Panton Principles for the Humanities. How do we go about that?”. The goal of this session was to devise a set of clear principles describing what we mean by Open Data in the humanities, what these principles should contain, and how to use them. In my role as an advocate for reproducibility, I wrote a blog post on why reproducibility should become a quality criterion in science. The post sparked a lot of discussion, and was widely linked and tweeted.
The Panton Fellowship was a unique opportunity for me to work on open science, to visit open knowledge events around the world, and to meet many new people who are passionate about the topic. Naturally, the end of the fellowship does not mark the end of my involvement with the open science community. In my new role as a scientific project developer for Science 2.0 and open science at Know-Center, I will continue to advocate openness and transparency. As part of my research on altmetrics-driven discovery, I will also pursue my open source work on the Head Start framework. With regards to outreach work, I am currently busy drafting a visibility concept for open access researchers in the Open Access Network Austria (OANA). Furthermore, I am involved in efforts to establish a German-speaking open science group.
I had a great year, and I would like to thank everyone who got involved. Special thanks go to Peter Murray-Rust and Michelle Brook for administering the program and for their continued support. As always, if you are interested in helping out with one or the other project, please get in touch with me. If you have comments or questions, please leave them in the comments field below.
All outputs at a glance
Head Start – open source research overview visualization
- “It’s not only peer reviewed, it’s reproducible”
- Open and transparent metrics for discovery
- New version of open source visualization Head Start released
- What’s new in Head Start?
- All metrics are wrong, but some are useful
Audio and Video
- Panton Fellows introduction at OKCon 2013
- Panel “Science in a time of change – Is Open science the future?” [German]
- Podcast Open Science in Research [German]
- Introduction to Open Research Data as part of Open Science Panel Vienna [German]
- Introduction to Altmetrics @ OKFN Austria Open Science Meetup
- Open Data and the Panton Principles for the Humanities – How do we go about that? @ Open Knowledge Festival 2014
- Altmetrics-based Visualizations Depicting the Evolution of a Knowledge Domain @ Science and Technology Indicators 2014
- Open Data @ Open Science on board the MS Wissenschaft [German]
- My objectives as a Panton Fellow
- First quarterly report on my Panton Fellowship activities
- Second quarterly report on my Panton Fellowship activities
- The third quarter of my Panton Fellowship in the rear view mirror
- Open data and the Panton Principles in the humanities
Open Science Sum-Ups (contributions) [German]
From September 3 to 5, I will be attending STI 2014, the 19th International Conference on Science and Technology Indicators. There, I will present a paper entitled “Altmetrics-based Visualizations Depicting the Evolution of a Knowledge Domain” that I co-authored with Philipp Weißensteiner and Peter Brusilovsky (download the PDF here). In this work-in-progress paper, we present an approach to visualizing the topical evolution of a scientific conference over time.
Below you can see the results: a topical overview of the 19th and 20th iterations of UMAP, representing the conference years 2011 and 2012, as well as the evolution of the domain. An interactive prototype can be found at http://stellar.know-center.tugraz.at/umap/.
Data Source and Method
The data source for these visualizations is quite unique: it is the conference scheduling system Conference Navigator 3 (CN3). CN3 allows conference attendees to create a personal schedule by bookmarking talks from the program that they intend to follow. And it is exactly this scheduling data that we employed to create the above visualizations: we used co-bookmarking as a measure of subject similarity, meaning that two documents are related when they are bookmarked by the same user in the system (see example to the right). The more often two documents are bookmarked together, the more similar they are subject-wise.
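As a sketch of how such a measure can be computed (the bookmarking data below is invented, and the cosine normalization shown is one common choice for popularity-adjusted similarity, not necessarily the exact measure used in the paper):

```python
from collections import defaultdict
from math import sqrt

# Hypothetical CN3 data: user -> set of bookmarked talks.
bookmarks = {
    "u1": {"t1", "t2"},
    "u2": {"t1", "t2", "t3"},
    "u3": {"t2", "t3"},
}

# Invert to talk -> set of users who bookmarked it.
bookmarkers = defaultdict(set)
for user, talks in bookmarks.items():
    for talk in talks:
        bookmarkers[talk].add(user)

def co_bookmarking(a, b):
    """Raw co-bookmarking count: users who bookmarked both talks."""
    return len(bookmarkers[a] & bookmarkers[b])

def similarity(a, b):
    """Co-bookmarking normalized by each talk's popularity (cosine)."""
    return co_bookmarking(a, b) / sqrt(len(bookmarkers[a]) * len(bookmarkers[b]))

print(co_bookmarking("t1", "t2"))        # 2
print(round(similarity("t1", "t2"), 2))  # 0.82
```

Normalizing by popularity prevents heavily bookmarked talks from appearing similar to everything simply because many attendees scheduled them.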
On top of this co-bookmarking data, we performed the knowledge domain visualization process from the open source visualization Head Start to create individual representations of the field (please refer to the paper for details). This resulted in the first two visualizations pictured earlier on. The blue bubbles represent research areas. The size of an area is determined by the number of bookmarks that the papers related to this area have received. Spatial closeness implies topical similarity. In 2011, “User modeling” is the area with the most papers and most bookmarks. It is closely connected to several other larger areas, including “Recommender system”. A second cluster of areas can be found on the right-hand side of the visualization, involving “Intelligent tutoring system”, “Adaptive system”, and “Problem solving”.
The next question was how to visualize the evolution of the conference. As far as time series visualization goes, there are many types of visualizations, most prominently index charts and stacked graphs. In the case of knowledge domain visualizations, simple visualizations are unfortunately not able to convey all necessary dimensions of the data (in terms of ordination, size of research areas and closeness). One possibility would have been to use animation, as shown in the video below with Hans Rosling.
In the end, we chose not to use animation. Why? The reason is a psychological phenomenon called change blindness (Simons and Rensink, 2005): people are bad at recognizing change in an object or scene. In the next video, the phenomenon is explained and illustrated with an astonishing example.
Animation seems to be especially prone to change blindness; in the video below by Suchow and Alvarez (2011), the colored dots making up the ring are constantly changing. This changing of color seems to stop when the circle itself starts to move – except that it does not. If you concentrate on individual dots, you can see that they keep changing color.
Surely, this is an extreme example, but think about it: if Hans Rosling were not talking you through the video above, would you have recognized all the changes taking place, and would you have been able to interpret them correctly? If you concentrated on one country specifically, could you remember the movement of the other countries as well? Chances are, you would have to watch the animation many times to come up with the same interpretation as Prof. Rosling.
All of these considerations led us to choose a different visualization concept popularized by Edward Tufte: small multiples. In small multiples, a graph is drawn for each of the steps in a time series. Then the graphs are positioned next to each other. This approach thus allows for direct visual comparison between different representations.
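A minimal small-multiples sketch in Python, assuming matplotlib is available (the area names echo the conference maps above, but the positions and sizes are made up for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

# Toy data: research areas as (name, x, y, bookmark count) per year.
years = {
    2011: [("User modeling", 0.3, 0.6, 900),
           ("Recommender system", 0.6, 0.4, 500)],
    2012: [("User modeling", 0.3, 0.6, 500),
           ("Recommender system", 0.6, 0.4, 1100)],
}

# Small multiples: one subplot per time step, drawn side by side on
# shared axes so sizes and positions are directly comparable. The grid
# acts as a visual helper for comparing the representations.
fig, axes = plt.subplots(1, len(years), figsize=(8, 4),
                         sharex=True, sharey=True)
for ax, (year, areas) in zip(axes, sorted(years.items())):
    for name, x, y, size in areas:
        ax.scatter([x], [y], s=size, alpha=0.5)
        ax.annotate(name, (x, y), ha="center", fontsize=8)
    ax.set_title(str(year))
    ax.grid(True)
    ax.set_xlim(0, 1)
    ax.set_ylim(0, 1)
fig.savefig("small_multiples.png")
```

The shared, fixed axes are the important design choice here: without them, the per-year panels could not be compared at a glance.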
To aid the user in detecting changes between the representations, we introduced two visual helpers. First, a grid is drawn to help with comparing size and position of the research areas. Second, whenever users hover over an area, the corresponding area is highlighted in the other representation, and a line is drawn between the two entities. There are three areas that are present in both years: “User modelling”, “Recommender system” and “Intelligent tutoring system”. While the relative position of the areas to each other has not changed much, the area with the most papers and bookmarks is now “Recommender system”.
As you can see from the examples above, this is just a first prototype, albeit a promising one. Using small multiples allows for a comparison of knowledge domain visualizations over various years.
Nevertheless, there are certain weaknesses in the current approach: first, the topology of the visualizations is not ideal, as many areas may overlap each other. Second, the usefulness of the method depends on how heavily conference participants use the system. Therefore, we are looking into supplementing bookmarking data with content-based measures in cases of low participation. Third, the continuity between the two years is very low. This could be improved by introducing moving time windows of two years. Finally, it will be important to evaluate the method and the interface.
Any comments on the issues mentioned above and the paper in general are of course very welcome!
Kraker, P., Weißensteiner, P., & Brusilovsky, P. (2014). Altmetrics-based Visualizations Depicting the Evolution of a Knowledge Domain. 19th International Conference on Science and Technology Indicators (STI 2014), 330–333.
Note: This is a reblog from the OKFN Science Blog. As part of my duties as a Panton Fellow, I will be regularly blogging there about my activities concerning open data and open science.
Altmetrics, web-based metrics for measuring research output, have recently received a lot of attention. Started only in 2010, altmetrics have become a phenomenon both in the scientific community and in the publishing world. This year alone, EBSCO acquired PLUM Analytics, Springer included Altmetric info into SpringerLink, and Scopus augmented articles with Mendeley readership statistics.
Altmetrics have a lot of potential. They are usually available earlier than citation-based metrics, allowing for an early evaluation of articles. With altmetrics, it also becomes possible to assess the many outcomes of research besides just the paper – data, source code, presentations, blog posts, etc.
One of the problems with the recent hype surrounding altmetrics, however, is that it leads some people to believe that altmetrics are somehow intrinsically better than citation-based metrics. They are, of course, not. In fact, if we just replace the impact factor with some aggregate of altmetrics, then we have gained nothing. Let me explain why.
The problem with metrics for evaluation
You might know this famous quote:
“All models are wrong, but some are useful” (George Box)
It refers to the fact that all models are a simplified view of the world. In order to be able to generalize phenomena, we must leave out some of the details. Thus, we can never explain a phenomenon in full with a model, but we might be able to explain the main characteristics of many phenomena that fall in the same category. The models that can do that are the useful ones.
The very same can be said about metrics – with the grave addition that metrics have a lot less explanatory power than a model. Metrics might tell you something about the world in a quantified way, but for the how and why we need models and theories. Matters become even worse when we are talking about metrics that are generated in the social world rather than the physical world. Humans are notoriously unreliable and it is hard to pinpoint the motives behind their actions. A paper may be cited for example to confirm or refute a result, or simply to acknowledge it. A paper may be tweeted to showcase good or to condemn bad research.
In addition, all of these measures are susceptible to gaming. According to ImpactStory, an article with just 54 Mendeley readers is already in the 94th to 99th percentile (thanks to Juan Gorraiz for the example). Getting your paper into the top ranks is therefore easy. And even indicators like downloads or views that go into the hundreds of thousands can probably be easily gamed with a simple script deployed on a couple of university servers around the country. This makes the old citation cartel look pretty labor-intensive, doesn’t it?
Why we still need metrics and how we can better utilize them
Don’t get me wrong: I do not think that we can come by without metrics. Science is still growing exponentially, and therefore we cannot rely on qualitative evaluation alone. There are just too many papers published, too many applications for tenure track positions submitted and too many journals and conferences launched each day. In order to address the concerns raised above, however, we need to get away from a single number determining the worth of an article, a publication, or a researcher.
One way to do this would be a more sophisticated evaluation system that is based on many different metrics, and that gives context to these metrics. This would require that we work towards getting a better understanding of how and why measures are generated and how they relate to each other. In analogy to the models, we have to find those numbers that give us a good picture of the many facets of a paper – the useful ones.
As I have argued before, visualization would be a good way to represent the different dimensions of a paper and its context. Furthermore, the way the metrics are generated must be open and transparent to make gaming of the system more difficult, and to expose the biases that are inherent in humanly created data. Last, and probably most crucially, we, the researchers and the research evaluators, must critically review the metrics that are served to us.
Altmetrics not only give us new tools for evaluation; their introduction also presents us with the opportunity to revisit academic evaluation as such – let’s seize this opportunity!