Note: this post first appeared on ZBW Mediatalk and has been updated to reflect the latest update of Open Knowledge Maps.

Science and research are more productive than ever. Every year, around 2.5 million research articles are published, and the number keeps growing. A lot of research information is openly available: thanks to the open access movement, we can now find more than 100 million scientific outputs on the web. We have made great strides with respect to accessibility; but what about discoverability? After all, this enormous amount of knowledge is only of use to us if it reaches the people who need it, and if it is reused as a basis for further research or transferred to practice. Here we see a big gap. Depending on the discipline, 12% to 82% of all scientific publications are never cited, which means they never serve as a basis for further research. When it comes to transfer to practice, the gap is even wider: even in application-oriented disciplines such as medicine, only a small percentage of research findings ever influence practice – and even when they do, often with a considerable delay.

Mission

What prevents knowledge transfer to practice?

One reason for this situation is that the tools for exploration and discovery of scientific knowledge are seriously lacking. Most people use search engines for this task. Search engines work very well when you know what you want: they deliver the result you are looking for, often with high precision. However, if you want to get an overview of an unknown scientific field, a list-based representation with only 10 results per page is not sufficient. With search engines, it takes a long time before you know the main areas of a field and its most important terms, authors and journals. It can take weeks, if not months – indeed, in many PhD programs, the whole first year is devoted to this process. Many people in research, and especially practitioners, do not have that much time – think of science journalists or patients. To summarize: there are many people out there who could benefit from scientific knowledge if there were better tools for discovering research results.

Knowledge maps instead of lists

At Open Knowledge Maps, we intend to close this gap and provide the missing link between accessibility and discoverability. Instead of lists, we use knowledge maps for discovery. Knowledge maps show the main areas of a field at a glance, with relevant publications already attached to each area. This enables users to get a quick overview of a field.

[Image: example knowledge map giving an overview of heart diseases]

The sub-areas also make you aware of the terminology of a field – information that alone may take weeks to find out. How much time have you already lost searching without knowing the best search terms? In addition, the knowledge map enables users to separate the wheat from the chaff with respect to their current information need: for an ambiguous search term, for example, the different meanings are sorted into separate areas.


Open Knowledge Maps as an openly accessible service

At Open Knowledge Maps, we offer an openly accessible service that allows you to create a knowledge map for any search term. Users can choose between two databases: Bielefeld Academic Search Engine (BASE), with more than 110 million scientific documents from all disciplines, and PubMed, the large biomedical database with 26 million references. We use the 100 most relevant results for a search term, as reported by the respective database, as the basis for our knowledge maps. The ordination and the determination of the areas are based on the textual similarity of the results’ metadata. This means: the more words two documents have in common in title, abstract, authors or journal name, the closer they are positioned on the map and the more likely they are placed in the same area. If you would like to dive deeper into the algorithms used to create the map, our article “Open Knowledge Maps: Creating a Visual Interface to the World’s Scientific Knowledge Based on Natural Language Processing” in the journal 027.7 is worth reading.
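
To give a rough feel for how textual similarity works, here is a minimal sketch using a simple bag-of-words cosine similarity over the metadata fields. This is an illustration only, not our production pipeline (which is described in the 027.7 article), and all document data below is made up:

```python
from collections import Counter
import math

def metadata_text(doc):
    # Concatenate the metadata fields used for similarity:
    # title, abstract, authors, and journal name.
    return " ".join([doc["title"], doc["abstract"],
                     doc["authors"], doc["journal"]]).lower()

def cosine_similarity(text_a, text_b):
    # Bag-of-words cosine similarity: the more words two
    # documents share, the higher the score (0.0 to 1.0).
    a, b = Counter(text_a.split()), Counter(text_b.split())
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(c * c for c in a.values())) * \
           math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0

# Made-up example documents.
doc1 = {"title": "sugar metabolism in plants", "abstract": "glucose pathways",
        "authors": "smith", "journal": "plant biology"}
doc2 = {"title": "sugar metabolism in humans", "abstract": "glucose insulin",
        "authors": "jones", "journal": "endocrinology"}
doc3 = {"title": "steel bridge design", "abstract": "load analysis",
        "authors": "wu", "journal": "civil engineering"}

sim12 = cosine_similarity(metadata_text(doc1), metadata_text(doc2))
sim13 = cosine_similarity(metadata_text(doc1), metadata_text(doc3))
# doc1 and doc2 share several metadata terms, so they would be
# placed closer together on the map than doc1 and doc3.
```

Documents with high pairwise similarity end up near each other in the ordination and tend to fall into the same area.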

The knowledge map for the search term “sugar” can be seen below. As described above, the bubbles represent the different areas. If you click on one of the bubbles, you are presented with papers related to that area. Open access articles are clearly marked and can be read within the interface – the idea is that you do not need to leave the browser tab while searching for literature. Go to Open Knowledge Maps to check out the search service.

[Image: knowledge map for the search term “sugar”]

Open Knowledge Maps brings open science to life

The “Open” in Open Knowledge Maps stands not only for open access articles – we want to go the whole open science way and create a public good. This means that all of our software is developed open source. You can also find our development roadmap on GitHub and leave comments by opening an issue. The knowledge maps themselves are licensed under a Creative Commons Attribution license and can be freely shared and modified. We will also openly share the underlying data, for example as Linked Open Data. In this way, we want to contribute to the open science ecosystem that our partners, including rOpenSci, ContentMine, Open Knowledge, the Internet Archive Labs and Wikimedia, are creating.

We see libraries as important collaboration partners. We cooperate with the libraries of the University of Bielefeld and the Austrian Academy of Sciences, and ZBW is using software from Open Knowledge Maps in a joint project with the Know-Center. This collaboration is a win for both sides: libraries get a stable, user-friendly system that enables them to visualize their document collections and improve their discoverability, while improvements from these projects are fed back into the Open Knowledge Maps software, benefiting all users.

Vision: Collaborative literature search

In the future, we want to enable collaborative literature search in Open Knowledge Maps. At the moment, most people tackle discovery on their own. But usually, someone has already walked this path before you. Unfortunately, the knowledge gained during discovery remains in people’s heads. We want to develop Open Knowledge Maps further so that maps can be modified, extended and shared again – so that we can build on top of each other’s knowledge. We have created a short video to illustrate this idea:

We see libraries and librarians as central to this vision: a collaborative system cannot work without experts in knowledge curation and structuring. Together with the other stakeholders from research and society, including researchers, students, journalists, citizen scientists and many more, we want to create a system that enables us to create pathways through science for each other, so that we can all benefit from this unique knowledge.


[Image: BASE integration]

We have now connected Open Knowledge Maps to one of the largest academic search engines in the world: BASE. This means you can visualize a research topic based on more than 100 million documents. And for the first time, you can search within different types of resources, including datasets and software. I would like to thank our collaborators BASE and rOpenSci for their outstanding support in making this happen!

We have also spent a lot of time improving the naming of the sub-areas to make the concepts in a field more visible – which means that this update improves our existing PubMed integration too.

In addition, we have added much more information to the site about the project and our approach. Open Knowledge Maps follows the motto “open science, all the way”: from our roadmap to our source code and our data, we publish everything under an open license that is compatible with the Open Definition.

Try it out now and let me know of any feedback you may have!

Create a visualization based on 28 million articles

Today, I am very proud to announce a milestone for Open Knowledge Maps. Thanks to an outstanding team and continued support by our partners and advisors, we have added two major content sources: the Directory of Open Access Journals (DOAJ) with more than 2.3 million articles and PubMed with more than 26.5 million articles. Taking into account a certain overlap between the two sources, we can now credibly state that one can create maps based on 28 million papers. That’s a content pool that is 175 times larger than in the previous iteration using PLOS (about 160,000 articles).

We have also completely overhauled our design & overall presentation and improved the user experience considerably. In addition, we have included the open annotation software Hypothes.is in our PDF preview.

We believe that this is a major step towards revolutionizing the way we discover research. There are many new things to try out and explore – we are looking forward to your feedback and suggestions!

Try it out now!

[Image: Open Knowledge Maps start page]

We officially announced the launch of Open Knowledge Maps, the visual interface to the world’s scientific knowledge, with a tweet and a message to our advisors late on May 11. We had soft-launched the site more than a week before that, and a bare-bones version of the PLOS visualization service had been online since Mozfest. The website was already getting some attention, and people were using the service on a daily basis. One of the highlights was a feature on Storybench in the very early days of the project. The idea behind the announcement was to get broader feedback on the search service and the overall idea behind Open Knowledge Maps. We had come a long way since the Mozfest days, and we thought that the website was ready for a wider audience.

What was to follow though went far beyond my highest expectations. Over the next 48 hours, we saw more than 350,000 hits on openknowledgemaps.org, generated by 12,000 visitors from all over the world.

What had happened? On the morning of the next day (May 12), I noticed that the tweet had gained a lot of traction, which had translated into activity on the site. Lots of people were using the search service, and a new map was created every few seconds. Much to our delight, the feedback was overwhelmingly positive. We started filing all the reactions, as many of them contained useful pointers for future improvements. At this point, our server was still humming along fine. Granted, you had to wait a few extra seconds on the search here and there, but nothing out of the ordinary.

During the day, news about Open Knowledge Maps spread to other channels, and at some point around noon CEST, we hit the front page of Hacker News. I immediately noticed a spike in all our metrics. We went from a map every few seconds to multiple maps being created each second. Search times began to rise, and we started to receive complaints about failed or endless searches. Around 3:30 pm, our server finally gave in. Hundreds of searches were running at the same time, each of them taking minutes to process. It was time to take the search service offline and post our version of the “Fail Whale”. You can still find a version of this screen here.

[Image: the Open Knowledge Maps “Fail Whale”]

While we frantically rewrote the search service to handle a larger number of requests (it was back a mere 60 minutes later), the stream of positive feedback continued to roll in. To date, Open Knowledge Maps has been mentioned in over 100 tweets, with the announcement tweet alone generating more than 22,000 impressions. You can find my favourite tweets in this collection. But it was not just Twitter – the news was also shared on Facebook, on blogs, and in discussion forums.

At one point, we were called “the Wikipedia of scientific knowledge”. It is clear that we still have a long way to go to really deserve this tagline, but it is encouraging that people see the potential of the idea. Needless to say, the positive feedback also sparked the ambition of the Open Knowledge Maps team of volunteers. We are currently busy improving the site and the service; the first results will be available in just a few weeks.

It was a fascinating day in the eye of the storm. I’d like to thank my awesome team for their outstanding work and our great advisors for their help in shaping Open Knowledge Maps. And I’d like to thank all of you out there for the love that you have shown for this project. It means a lot to me – Open Knowledge Maps is a project that is very close to my heart. Please continue providing feedback via social media, e-mail, or on our Github repositories. You can also sign up to the newsletter to stay on top of everything #OKMaps.

It is time to change the way we discover research, and we are off to a good start!

A little longer than a month ago, I posted an Open Call for Collaborators for an Open Science Prize Proposal on Discovery on this blog and to various open science mailing lists. The call has been very fruitful and I am happy to announce that we have submitted a proposal. In the spirit of open science, you can find the full proposal and the supplementary materials on Github. See below for the executive summary and our video.

Team Open Discovery: Peter Kraker, Mike Skaug, Scott Chamberlain, Maxi Schramm, Michael Karpeles, Omiros Metaxas, Asura Enkhbayar & Björn Brembs

Executive Summary: Discovery is an essential task for every researcher, especially in dynamic research fields such as biomedicine. Currently, however, there are very few discovery tools that can be used by a mainstream audience, most notably search engines. The problem with search engines is that they present resources in a linear, one-dimensional way, making it necessary to sift through every item in a list. Another problem is that the results of the traditional discovery process are usually closed. Therefore, the discovery process is repeated over and over again by different researchers, taking valuable time and resources away from the actual research. To solve these challenges and bring the discovery process into the open science era, we propose BLAZE, the comprehensive open science discovery tool. BLAZE will leverage the existing open science ecosystem to provide multi-dimensional topical maps of research fields, involving not only publications but also datasets, presentations, source code and media files. BLAZE will provide a single, intuitive interface for researchers to explore, edit and share maps. The edit history of a map will be preserved to allow Wikipedia-style collaboration. The maps themselves will be open, so users can embed them on their own websites and export the structure into other open science tools. Opening the discovery process will enable researchers to reuse maps, saving valuable time and effort because they can build on top of each other’s work. Furthermore, they will be able to identify collaborators long before the research is usually communicated. There is an existing, early-stage prototype for BLAZE, and with the Open Science Prize we plan to develop this prototype into a comprehensive tool. BLAZE will show the enormous potential of open science for innovation in scholarly communication by providing a structured, open and multi-dimensional approach to discovery.

I am currently preparing a proposal for the Open Science Prize in the field of open discovery, and I am looking for motivated collaborators who want to join the project and change the way we do discovery. Here is the current summary:

Discovery is an essential task for every researcher, especially in dynamic research fields such as biomedicine. Currently, however, there are only a limited number of tools that can be used by a mainstream audience. We propose BLAZE, an open discovery tool that goes far beyond the functionality of search engines and social reading lists. The tool builds on PubMed Central and other open content sources and will provide topical maps for a given list of papers, e.g. a search result or a journal volume. The maps are created automatically using fulltexts to calculate similarities and derive topical structures among papers. Furthermore, they will be enriched with features that are extracted from the papers (e.g. all papers on the same species are highlighted). BLAZE will enable users to do their discovery in a single interface: users can interact with the maps, explore different topical areas, filter, and read individual papers without switching tools. An edit mode will be provided for users to make changes to the maps and to introduce new papers and topical areas. Users can openly share maps with others and export the structure in various open formats. BLAZE will be based on the existing open source visualization Head Start and make extensive use of the digital open science ecosystem, including, but not limited to, open content, content mining services, open source solutions, and open metrics data. With this tool, we want to show the potential of open science for innovation in scholarly communication and discovery. In addition, we believe that this tool will increase the visibility of and awareness for open content and open science in general.

A first draft is also available.

I am looking for backend and frontend web developers who code in JavaScript and/or PHP and R. We will be extending an existing tool for creating web-based knowledge domain visualizations that uses D3.js on the frontend, and R content mining packages on the backend, in particular rOpenSci and tm, so you should have experience with at least one of these libraries. A background in biomed would be nice but it’s not mandatory.

Everything about this project will be open: we will prepare the proposal in the open, the development will take place on a public Github repository, and all project outputs will be published under an open license.

So if you want to join the project and create an awesome open science tool together with me, please send an e-mail to opendiscovery@gmx.at outlining which part of the project interests you most, what you’d be able to contribute and how many hours you could devote to the project over the coming months. Please also include a link to your Github repository. It would be great if you could let me know whether you are a citizen of, or permanent resident in, the United States (US), as we will need to have at least one team member who satisfies this criterion. I am looking forward to your messages!

Note: This post is a reblog from the LSE Impact Blog.

In Douglas Adams’ famous novel The Hitchhiker’s Guide to the Galaxy, an unsuspecting man called Arthur Dent is lifted onto a spaceship just before Earth is demolished by intergalactic bureaucrats. Together with a group of interstellar travellers (including, amongst others, the President of the Galaxy), he then embarks on a journey through the universe to unravel the events that led to the destruction of Earth. To help Arthur better understand the new surroundings he is thrown into, he is handed a copy of The Hitchhiker’s Guide to the Galaxy, a multimedia guidebook that offers wisdom and advice on all topics of interest in the universe.

Starting out in a new scientific field can feel very similar: you are faced with a new world that you have to make sense of. Unfortunately, the knowledge needed to understand this new world is not readily structured and summarized in one handy guide, but scattered over millions of scientific articles. To make matters worse, you have no idea which articles belong to the field that you are interested in and which of them are actually important. For many researchers, the starting point in their quest to conquer an unfamiliar knowledge domain is to turn to their personal favourite search engine, type in the name of the field of interest and start reading at the top of the list. Once you have read through the first few articles (usually highly cited review articles), and followed relevant references, you develop an idea of important journals and authors in the field and adapt your search strategy accordingly. With time and patience, a researcher can thus build a mental model of a field.

The problem with this strategy is that it can take weeks, if not months, before this mental model emerges. Indeed, in many PhD programs, the first year is devoted to catching up with the state of the art. There is also a lot of reading and summarizing involved, but searching for relevant literature usually accounts for a large chunk of the time. And even with the most thorough search strategy, the probability that you will miss an important piece of prior work is rather high.

Another means of getting an overview of a research field is the knowledge domain visualization. An example of such a visualization is given above. Knowledge domain visualizations show the main areas in a field and assign relevant articles to these areas. Hence, an interested researcher can see the intellectual structure of a field at a glance, without performing countless searches with all sorts of queries. An additional characteristic of knowledge domain visualizations is that areas of a similar subject are positioned closer to each other than areas of unrelated subjects. In the example, “Pedagogical models” is subject-wise closer to “Virtual learning environments” than to “Psychological theories”, which makes it easy to find areas related to one’s own interests. Granted, even with a knowledge domain visualization in hand, you would still need to do the reading. But it would certainly save you a lot of time that you would otherwise spend on searching, indexing and structuring.

Image credit: Maxi Schramm. Public domain.

Knowledge domain visualizations need not be created at the level of the individual research article. Below you can see a visualization by Bollen et al. (2009) of all of science: the nodes in the network represent research journals, and the different colors designate different disciplines. Even though the idea of knowledge domain visualizations has been around for quite some time, and despite their obvious usefulness, they are not yet widely available. Part of the reason may be that, in the past, the data needed to construct these visualizations was only available from a few rather expensive sources. Another part may be the emphasis on all-encompassing overviews: while these provide valuable insights into the structure of science as a whole, they are usually not interactive and provide little value in day-to-day work, where you want to be able to zoom in to specific publications. There are several applications that can be used to create one’s own overview, but they can usually only be operated by information visualization specialists.

Image credit: Bollen J, Van de Sompel H, Hagberg A, Bettencourt L, Chute R, et al. (2009) Clickstream Data Yields High-Resolution Maps of Science. PLoS ONE 4(3): e4803. Creative Commons Attribution 3.0 Unported.

In our work, we therefore aimed at creating an interactive visualization that can be used by anyone. As a first case, we chose to visualize the field of educational technology, as it represents a highly dynamic and interdisciplinary research field. As described in a recently published paper in the Journal of Informetrics (Kraker et al. 2015), the visualization is based on a novel data source: the online reference management software Mendeley. The articles for the visualization were selected from Mendeley’s research catalog, which is crowd-sourced from over 2.5 million users around the world and offers structured access to more than 100 million papers.

One of the most important steps when creating a knowledge domain visualization is to decide which measure defines the similarity between two articles. The measure determines where an article is placed on the map and how it relates to other articles. Again, we used Mendeley data to tackle this issue; specifically, we used co-readership information. “So what is this co-readership exactly?” you may ask. Mendeley enables users to store their references in a personal library and share them with other people. The number of times an article has been added to user libraries is commonly referred to as its number of readers, or readership for short. By analogy, we speak of the co-readership of documents when they are added to the same user library. When Alice adds Paper 1 and Paper 2 to her user library, the co-readership of these two documents is 1. When Bill adds the same two papers, the co-readership count goes up to 2, and so on. Our assumption was that the higher the co-readership of two documents, the more likely they are of the same or a similar subject. It’s not unlike two books that are often borrowed together from a library – there is a good chance that they address related topics. And indeed, our first analyses indicate that our assumption is valid.
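
The Alice-and-Bill example above can be sketched in a few lines of code. This is purely illustrative (the user names and paper IDs are made up, and the real computation runs over millions of libraries), but it shows the counting logic:

```python
from itertools import combinations
from collections import Counter

# Hypothetical user libraries: each user maps to the set of
# papers they have added to their personal library.
libraries = {
    "alice": {"paper1", "paper2", "paper3"},
    "bill":  {"paper1", "paper2"},
    "carol": {"paper2", "paper3"},
}

# The co-readership of a pair of documents is the number of
# user libraries that contain both of them.
co_readership = Counter()
for papers in libraries.values():
    for pair in combinations(sorted(papers), 2):
        co_readership[pair] += 1

# Alice and Bill both store paper1 and paper2, so that pair has
# a co-readership of 2; paper1 and paper3 co-occur only in
# Alice's library, giving a count of 1.
```

The resulting pair counts then serve as the similarity values that determine how close two papers are placed on the map.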

The cool thing is that once you have settled on a similarity measure, the process of creating the map can be highly automated. We adapted procedures for assigning papers to research areas and for situating them on the map. We also put a heuristic in place that tries to guess a name for each area using the web-based text mining services OpenCalais and Zemanta.
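
As an illustration of what such a naming heuristic might do, here is a toy stand-in that labels each area with its most frequent distinctive words. This is not how OpenCalais or Zemanta work – they are external services queried over the web – but it conveys the idea; all cluster data is invented:

```python
from collections import Counter

# Toy clusters of document titles (made up for illustration).
clusters = {
    "area1": ["mobile learning devices", "mobile learning apps"],
    "area2": ["virtual learning environments", "virtual worlds"],
}

def label(docs, stopwords):
    # Count words across the area's documents, ignoring terms
    # that appear in every area and thus carry no signal.
    words = Counter(w for doc in docs for w in doc.split()
                    if w not in stopwords)
    return " ".join(w for w, _ in words.most_common(2))

stopwords = {"learning"}  # shared across both areas
labels = {area: label(docs, stopwords)
          for area, docs in clusters.items()}
# area1 is labeled with "mobile …", area2 with "virtual …"
```

A heuristic like this will inevitably guess wrong sometimes, which is one reason the automated procedure still requires manual intervention, as discussed below.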

The resulting knowledge domain visualization can be seen below. The blue bubbles represent the main areas in the field; the size of a bubble signifies the number of readers of publications in that area, and the closer two areas are in the visualization, the closer they are subject-wise. An interactive version is also available: once you click on a bubble, you are presented with popular papers in that area, and the dropdown on the right displays the same data in list form. Just go to Mendeley Labs (http://labs.mendeley.com/headstart) and try it for yourself! The source code is available on GitHub: http://github.com/pkraker/Headstart

[Image: Head Start knowledge domain visualization of educational technology]

Apart from the fact that you can get a quick overview of a field, there are many other interesting things that you can learn about a domain from such a visualization. Fisichella and his colleagues even argue that mappings like the one above might help to overcome the fragmentation in educational technology by building awareness among researchers of the different sub-communities. There may be some truth to this assumption: when I evaluated the map with researchers from computer science, they discovered research areas that they did not know existed. One example is Technological Pedagogical Content Knowledge, which is a conceptual framework emanating from the educational part of the research community.

Another interesting possibility is to study the development of fields over time [1]. When I compared the map to similar maps based on older literature (e.g. Cho et al. 2012), I learned a lot about the development of the field. Whereas learning environments played an important role in the 2000s, issues relating to them have since split up into different areas (e.g. Personal Learning Environments, Game-based Learning). You can find further examples in the paper describing the full details of the evaluation, which is still under review; a pre-print is available on arXiv.

Given the enormous amount of new knowledge that is produced each and every day, the need for better ways of gaining – and keeping – an overview is becoming more and more apparent. I think that visualizations based on co-readership structures could provide this overview and serve as universal up-to-date guides to knowledge domains. There are still several things that need fixing – the automated procedure for example is not perfect and still requires manual interventions. Furthermore, the characteristics of the users have a certain influence on the result, and we need to figure out a way to make users aware of this inherent bias. Therefore, we are currently working on improving automatization techniques. Algorithms, however, will never be correct 100% of the time, which is why we are also experimenting with collaborative models to refine and extend the visualizations. After all, an automated overview can never be the end product, but rather a starting point to discovery.

Kraker, P., Schlögl, C., Jack, K., & Lindstaedt, S. (2015). Visualization of co-readership patterns from an online reference management system. Journal of Informetrics, 9(1), 169–182. DOI: 10.1016/j.joi.2014.12.003. Preprint: http://arxiv.org/abs/1409.0348

[1] Educational technology experts will notice that some of the newest developments in the field, such as MOOCs or learning analytics, are missing from the overview. That is because the data for this prototype was sourced in August 2012 and is therefore almost 2.5 years old. The evaluation was conducted in the first half of 2013.
