Create a visualization based on 28 million articles

Today, I am very proud to announce a milestone for Open Knowledge Maps. Thanks to an outstanding team and continued support by our partners and advisors, we have added two major content sources: the Directory of Open Access Journals (DOAJ) with more than 2.3 million articles and PubMed with more than 26.5 million articles. Taking into account a certain overlap between the two sources, we can now credibly state that one can create maps based on 28 million papers. That’s a content pool that is 175 times larger than in the previous iteration using PLOS (about 160,000 articles).
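For readers who want to retrace the numbers, here is a minimal sketch of the arithmetic in Python. The variable names are illustrative only, and the inputs are the rounded "more than" figures quoted above, so the implied overlap is a rough estimate rather than a measured value.

```python
# Rounded figures quoted in the announcement (each is a "more than" lower bound).
doaj_articles = 2_300_000
pubmed_articles = 26_500_000
plos_articles = 160_000      # size of the previous PLOS-based content pool
claimed_total = 28_000_000   # stated size of the combined pool

# If the two sources did not overlap at all, the pool would be their sum.
upper_bound = doaj_articles + pubmed_articles

# Assumption: the gap between that sum and the stated total is the overlap allowance.
implied_overlap = upper_bound - claimed_total

# Growth relative to the PLOS-only iteration.
growth_factor = claimed_total / plos_articles

print(upper_bound, implied_overlap, growth_factor)
```

With these inputs, the stated total of 28 million leaves room for several hundred thousand articles indexed in both sources, and dividing it by the PLOS pool gives the factor of 175.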

We have also completely overhauled our design & overall presentation and improved the user experience considerably. In addition, we have included the open annotation software in our PDF preview.

We believe that this is a major step towards revolutionizing the way we discover research. There are many new things to try out and explore – we are looking forward to your feedback and suggestions!

Try it out now!

by Rich Savage, CC BY 2.0


Note: This post originally appeared on the OpenAIRE blog on 22 June 2016.

Last week, we published the Vienna Principles: A Vision for Scholarly Communication in the 21st Century. The announcement of the publication has been widely shared.

In this contribution, I’d like to provide more context on how the principles came about – starting with the network that brought the authors together: the Open Access Network Austria (OANA). OANA was established in 2012 as a joint activity under the organisational umbrella of the Austrian Science Fund (FWF) and Universities Austria (UNIKO), and it has become a well-known entity in the world of open access. Its members were for example part of the negotiation team that led to the Austrian Springer deal. OANA is also the origin of the widely shared and well-received Recommendations for the Transition to Open Access in Austria, which call for the bulk of scholarly communication in Austria to be open access by 2025. In line with OANA’s mission, the document does not only propose objectives, but also defines a set of specific recommendations for the implementation of this goal. OANA is therefore an important driving force for making open access a reality in Austria.

During OANA’s second assembly in 2015, Open Knowledge Austria brought forward a proposal to broaden the scope of the network beyond open access to explore various other instruments of open science. Based on this proposal, the OANA core team commissioned the working group “Open Access and Scholarly Communication” to sketch a vision of how open science can change scholarly communication in the long run. The working group first met in April 2015 in the Museumsquartier in Vienna. Over the following year, we had five further meetings, each of them in a different Viennese location – hence the name “Vienna Principles”.


Location of our meetings, Image contains content by OpenStreetMap Contributors, CC BY-SA 2.0

By scholarly communication we mean the processes of producing, reviewing, organising, disseminating and preserving scholarly knowledge (This definition is based on the definition found in Wikipedia [05 June 2016]). Scholarly communication does not only concern researchers, but also society at large, especially students, educators, policy makers, public administrators, funders, librarians, journalists, practitioners, publishers, public and private organisations, and interested citizens.

As you can see from our working definition above, we have a broad understanding of scholarly communication, especially when it comes to its stakeholders. Our group reflected this diverse approach: it consisted of librarians, science administrators, students and researchers from a wide range of disciplines, including arts & humanities, engineering, natural sciences and social sciences in both basic and applied contexts. Many working group members are involved in related initiatives, such as Citizen Science Austria, Open Knowledge and OpenAIRE to name just a few, and several have a relevant professional background, including publishing and software development. The core group consisted of nine participants, but the overall work involved contributions and feedback from more than 20 people and the audiences of the 15th Annual STS Conference, Graz and the 3rd Plenary of the Open Access Network Austria.

Our work started from a number of observations that were based on our own involvement in open science, and on the experience of group members who had joined the movement only very recently. The first of these observations was that for many, open science is still a fuzzy concept. People are often unclear about its benefits and therefore tend to have a reserved attitude towards openness. Our second observation was that the debate within the open science community is not necessarily focused on the benefits of openness, but mostly on what constitutes openness, how to achieve openness, and what steps to take next. The classic debate around the “green” and the “gold” route to open access is a good example of this. In these discussions, many of the arguments carry implicit assumptions about the structures of a future scholarly communication system.

These observations led to a first round of input backed up by results from research into the subject, which included an analysis of the state of the debate in open access, a compilation of actors and actor groups within scholarly communication, benefits and issues of open science, and the deficits of the current scholarly communication system.

We concluded that there is currently no commonly agreed set of principles that describes the system of open scholarly communication that we want to create. Such a collection of widely shared cornerstones of the scholarly communication system would, however, help to better guide the debate around open science. At the same time, a vision that answers the question “what for?” would help to better convey the need for openness in scholarly communication to academia and society.

For the definition of the principles, we adopted a clean slate approach, as advocated for example by Cameron Neylon. This means that we set out to describe the world that we would want to live in if we had the chance to design it from scratch, without considering the restrictions and path dependencies of the current system. Our aim was to be clear, concise and as comprehensive as possible, without repeating ourselves. What followed was an intense phase in which we devised and revised, expanded and reduced, split and merged. We also addressed and incorporated the valuable feedback that we received from participants of the 15th Annual STS Conference in Graz and the 3rd Plenary of the Open Access Network Austria.

The main result of our considerations can be seen below: a set of twelve principles of scholarly communication describing the cornerstones of open scholarly communication. It is important to note that we do not see this document as the end of the matter – it is version 1.0. We invite everyone to comment on this first version on


So what’s next? The working group will continue its job in the next iteration of OANA, starting this fall. Consolidating the feedback will be an important part of our work, as well as staying on top of the developments in scholarly communication. But we will also be busy devising recommendations on how to turn each principle into reality, while coordinating our efforts with other groups such as the Force11 working group on the Scholarly Commons. We are looking forward to shaping the scholarly communication system of the future together with all of you!


We officially announced the launch of Open Knowledge Maps, the visual interface to the world’s scientific knowledge, with a tweet and a message to our advisors late on May 11. We had soft-launched the site more than a week before that, and a bare-bones version of the PLOS visualization service had been online since Mozfest. The website was already getting some attention, and people were using the service on a daily basis. One of the highlights was a feature on Storybench in the very early days of the project. The idea behind the announcement was to get broader feedback on the search service and the overall idea behind Open Knowledge Maps. We had come a long way since the Mozfest days, and we thought that the website was ready for a wider audience.

What was to follow, though, went far beyond my highest expectations. Over the next 48 hours, we saw more than 350,000 hits on, generated by 12,000 visitors from all over the world.

What had happened? On the morning of the next day (May 12), I noticed that the tweet had gained a lot of traction, which had translated to activity on the site. Lots of people were using the search service, and a new map was created every few seconds. Much to our delight, the feedback was overwhelmingly positive. We started filing all the reactions, as many of them contained useful pointers for future improvements. At this point, our server was still humming along fine. Granted, you had to wait a few extra seconds on the search here and there, but nothing out of the ordinary.

During the day, news about Open Knowledge Maps spread to other channels, and at some point around noon CEST, we hit the front page of Hacker News. I immediately noticed a spike in all our parameters. We went from a map every few seconds to multiple maps being created each second. Search time began to rise and we started to receive complaints about failed or endless searches. Around 3:30 pm, our server finally gave in. Hundreds of searches were running at the same time, each of them taking minutes to be processed. It was time to take the search service offline and to post our version of the “Fail Whale”. You can still find a version of this screen here.


While we frantically rewrote the search service to handle a larger number of requests (it was back a mere 60 minutes later), the stream of positive feedback continued to roll in. Up until today, Open Knowledge Maps was mentioned in over 100 tweets, with the announcement tweet alone creating more than 22,000 impressions. You can find my favourite tweets in this collection. But it was not just Twitter – the news was shared on Facebook, blogs, and in discussion forums.

At one point, we were called “the Wikipedia of scientific knowledge”. It is clear that we still have a long way to go to really deserve this tagline, but it is encouraging that people see the potential of the idea. Needless to say, the positive feedback also sparked the ambition of the Open Knowledge Maps team of volunteers. We are currently busy improving the site and the service; the first results will be available in just a few weeks.

It was a fascinating day in the eye of the storm. I’d like to thank my awesome team for their outstanding work and our great advisors for their help in shaping Open Knowledge Maps. And I’d like to thank all of you out there for the love that you have shown for this project. It means a lot to me – Open Knowledge Maps is a project that is very close to my heart. Please continue providing feedback via social media, e-mail, or on our Github repositories. You can also sign up to the newsletter to stay on top of everything #OKMaps.

It is time to change the way we discover research, and we are off to a good start!

On May 1, I submitted an application for a Shuttleworth Foundation Fellowship. Started by Mark Shuttleworth in 2001, the Shuttleworth Foundation has enabled many amazing open knowledge initiatives, including ContentMine (Peter Murray-Rust) and (Dan Whaley). The foundation has expressed the following vision:

“We would like to live in an open knowledge society
with limitless possibilities for all.” (Shuttleworth Foundation)

This vision aligns strongly with my own goal to enable everyone in society to benefit from scientific knowledge. My belief is that if we turn discovery from a closed, solitary activity into an open and collaborative one, we can bring the fruits of the open content revolution to everyone. To make this change possible, I want to create Open Knowledge Maps: a large-scale, collaborative system of open, interactive and interlinked knowledge maps for every research topic, every field and every discipline. For all details, please see my application below – or watch the application video. As you would expect, the application was openly developed on Github.


1) Tell us about the world as you see it

A description of the status quo and context in which you will be working

Currently, the fruits of the open content revolution are unequally distributed. In the recent past, humanity has started to open up large amounts of scientific knowledge. Today, we can read over 90 million scientific articles on the web. But the tools for exploring and discovering this massive amount of content are seriously lacking. Most people rely on search engines, where they have to examine articles and their relationships by hand in order to get to the knowledge that they need. If you want to gain an overview of a research field, it will take weeks if not months to process all the necessary information, scattered over thousands of scholarly articles. There are more powerful tools that guide you through the literature – but they are proprietary and hugely expensive.

This is a problem for researchers, who spend a lot of time and effort on gaining and keeping an overview of scientific fields. But researchers have a community of peers that supports them in this task. People outside academia are usually on their own, and therefore often lost. Take the example of patients who would like to learn about the newest research on their illness. In the worst case, they don’t discover a lifesaving treatment, because the paper describing it was buried far down the results list.

There is a huge demand for better exploration and discovery tools, inside and outside of academia, but there are no large-scale attempts to provide these tools in an open manner. I am set to change that.

2) What change do you want to make in the world?

A description of what you want to change about the status quo, in the world, your personal vision for this area

To create a visual interface to the world’s scientific knowledge that can be used by anyone in order to revolutionize the way we discover research.

The base for this visual interface are so-called knowledge maps, a powerful tool for the exploration of a research field. Knowledge maps show the main areas of the field at a glance, and papers related to each area. By overlaying further connections between papers, e.g. references, we can also highlight relationships between areas. This makes it possible to infer connections between research results, which may have been unknown. Knowledge maps thus enable the exploration of existing knowledge, and the discovery of new knowledge.

My goal is to provide Open Knowledge Maps: a large-scale, web-based system of open, interactive and interlinked knowledge maps for every research topic, every field and every discipline. Around these maps, I want to create a space for collective knowledge mapping that brings together individuals and communities involved in exploration and discovery: researchers, students, journalists, librarians, practitioners and citizens. I want to enable people to guide each other in getting to the knowledge that they need, by collaboratively annotating and modifying the automatically created maps. I also want to enable users to create and contribute their own maps – achieving layered overviews of the world’s scientific knowledge including the perspectives of different epistemic cultures, geographic regions etc.

3) What has prevented this change from happening?

Describe the innovations or questions you would like to explore during the fellowship year

I want to explore how to automatically create knowledge maps on a massive scale and how to design an inclusive and sustainable space for collective knowledge mapping that brings together the individuals and communities involved in exploration and discovery.

In the recent past, open access has dramatically grown with up to 50% of new articles being published open access. Even the situation regarding legacy content is changing, with the ContentMine liberating millions of facts from closed sources. In my PhD, I created an open source, web-based knowledge mapping software called Head Start that builds on top of this open content (we further developed it during my subsequent Panton Fellowship). Head Start is capable of automatically producing knowledge maps from a variety of data, including text, metadata and references. The approach has received a lot of positive feedback from users and experts alike, and multiple awards.

Many people are currently tackling exploration and discovery of scientific knowledge on their own. The results of their efforts are usually not shared; they become visible only later as references in a publication or as reading lists. I want to explore how to bring different individuals and communities together, for example how to best connect patients, researchers and medical librarians to collaboratively map the newest research on a certain disease and how to enable them to openly share their efforts for the benefit of others affected by this disease.

4) What are you going to do to get there?

A description of what you actually plan to do during the year

Further develop the existing mapping software: In January, I published a Call for Collaborators that brought together the Open Knowledge Maps team. Jointly, we created an open roadmap on how to develop Head Start into a system of living, crowd-sourced guides to research fields. We will connect Head Start to over 90 million scholarly articles to create overviews of all fields of research. The maps will be enriched with facts extracted from full text and made available on There, they can be interactively explored, collaboratively annotated and modified in a Wikipedia-style editing process.

Create and implement a community strategy: My approach has always been to involve users at every step of the process, taking usability and cognition into careful consideration. I have therefore initiated an advisor programme to guide the development of Open Knowledge Maps in a human-centered design process. We will also review social factors that prevent people from using open knowledge systems, and explore ways to address these concerns. Another concrete action is to establish mapping parties (similar to those in the OpenStreetMap project), where people get together to jointly map an underrepresented research field, for example a neglected disease.

Formulate a long term plan: My goal is to develop Open Knowledge Maps into a building block of the open knowledge society. Therefore, I will address points such as a legal entity and a sustainable funding stream.

5) What challenges or uncertainties do you expect to face?

If we build it, will they come? This is a challenge for any socio-technical system, which I will address by following best practices as detailed above: human-centered design and the development of a community strategy. The cold start problem will not be an issue, as a massive number of maps will be pre-computed and ready for exploration.

Establishing a strong and diverse community: I will face this key challenge by leveraging and expanding the existing advisor community, relying on experience that I have gained as one of the founders of Barcamp Graz, and as a coordinator of the open science WG of Open Knowledge AT. In both cases, we have established strong communities based on openness and inclusiveness.

Technical challenges related to building a large-scale system: In this respect, I will draw on my long experience in software engineering (15 years, 7 of them as a project manager), and the experience of my team. It will be crucial to address scalability from the start and build it into the core architecture. We will use a distributed agile process and adopt strategies of successful open source projects.

Launching a self-sustainable non-profit organization: My involvement with non-profit organizations in the past 7 years – running a smaller organization, Knowledge Management Forum Graz, for 2 years – has made me conscious of the challenges that are connected to that. As with the other challenges, I plan to gather advice from the Shuttleworth community.

6) What part does openness play in your idea?

Openness is at the very core of my idea. Open Knowledge Maps strives to be a building block of the open knowledge society by openly sharing data, source code, and content that is being created. The code will be made available on Github under the license of the existing project (LGPL v3). The visualizations will be released under CC BY – with the exception of the contained content, which of course retains its original license. The underlying knowledge structures will be mapped to Wikidata concepts and can be exported in various open formats under CC0, so that they can be easily re-used.

We partner with existing open initiatives, including ContentMine,, rOpenSci, and the Internet Archive Labs. We will actively involve our partners, advisors and users to seek feedback, input, and pointers for further collaboration throughout the project. My goal is to reuse as much of the existing ecosystem as possible. To achieve this, the project progress is openly shared with the world, starting with this proposal which is hosted on Github. The development will also take place on Github. The concrete targets for developing the system will be published as issues in our repositories.

Openness will also play an important role in all social activities, which will be organized in the spirit of other open knowledge events. Mapping parties, for example, will be free of charge and they will be open to everyone interested in collaborative knowledge discovery.

A little longer than a month ago, I posted an Open Call for Collaborators for an Open Science Prize Proposal on Discovery on this blog and to various open science mailing lists. The call has been very fruitful and I am happy to announce that we have submitted a proposal. In the spirit of open science, you can find the full proposal and the supplementary materials on Github. See below for the executive summary and our video.

Team Open Discovery: Peter Kraker, Mike Skaug, Scott Chamberlain, Maxi Schramm, Michael Karpeles, Omiros Metaxas, Asura Enkhbayar & Björn Brembs

Executive Summary: Discovery is an essential task for every researcher, especially in dynamic research fields such as biomedicine. Currently, however, there are very few discovery tools that can be used by a mainstream audience, most notably search engines. The problem with search engines is that they present resources in a linear, one-dimensional way, making it necessary to sift through every item in a list. Another problem is that the results of the traditional discovery process are usually closed. Therefore, the discovery process is repeated over and over again by different researchers, taking away valuable time and resources from the actual research. To solve these challenges and bring the discovery process into the open science era, we propose BLAZE, the comprehensive open science discovery tool. BLAZE will leverage the existing open science ecosystem to provide multi-dimensional topical maps of research fields, involving not only publications, but also datasets, presentations, source code and media files. BLAZE will provide a single, intuitive interface for researchers to explore, edit and share maps. The edit history of a map will be preserved to allow Wikipedia-style collaboration. The maps themselves will be open, so users can embed them on their own websites and export the structure into other open science tools. Opening the discovery process will enable researchers to reuse maps, saving valuable time and effort because they can build on top of each other’s work. Furthermore, they will be able to identify collaborators long before the research is usually communicated. There is an existing, early-stage prototype for BLAZE and with the Open Science Prize, we plan to develop this prototype into a comprehensive tool. BLAZE will show the enormous potential of open science for innovation in scholarly communication by providing a structured, open and multi-dimensional approach to discovery.

I am currently preparing a proposal for the Open Science Prize in the field of open discovery, and I am looking for motivated collaborators who want to join the project and change the way we do discovery. Here is the current summary:

Discovery is an essential task for every researcher, especially in dynamic research fields such as biomedicine. Currently, however, there are only a limited number of tools that can be used by a mainstream audience. We propose BLAZE, an open discovery tool that goes far beyond the functionality of search engines and social reading lists. The tool builds on PubMed Central and other open content sources and will provide topical maps for a given list of papers, e.g. a search result or a journal volume. The maps are created automatically using fulltexts to calculate similarities and derive topical structures among papers. Furthermore, they will be enriched with features that are extracted from the papers (e.g. all papers with the same species are highlighted). BLAZE will enable users to do their discovery in a single interface. Users can interact with the maps, explore different topical areas, filter and read individual papers in the same interface. An edit mode will be provided for users to make changes to the maps and to introduce new papers and topical areas. Users can openly share maps with others and export the structure in various open formats. BLAZE will be based on the existing open source visualization Head Start, and make extensive use of the digital open science ecosystem, including, but not limited to, open content, content mining services, open source solutions, and open metrics data. With this tool, we want to show the potential of open science for innovation in scholarly communication and discovery. In addition, we believe that this tool will increase the visibility of and awareness for open content and open science in general.
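To give a feel for what "using fulltexts to calculate similarities and derive topical structures" can mean in practice, here is a minimal sketch in plain Python. It is not the Head Start or BLAZE pipeline: TF-IDF weighting, cosine similarity, the threshold-based grouping, the function names and the four toy "papers" are all assumptions chosen for illustration.

```python
import math
from collections import Counter

# Toy stand-ins for paper fulltexts (hypothetical data).
PAPERS = [
    "zebrafish heart regeneration gene expression",
    "gene expression during zebrafish fin regeneration",
    "open access publishing and citation impact",
    "citation impact of open access journals",
]


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse {term: weight} vector per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Terms that occur in every document get weight 0 (log(1) == 0).
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_by_similarity(docs, threshold=0.2):
    """Group documents that are linked by a similarity >= threshold.

    This is a simple connected-components grouping, used here as a
    stand-in for whatever clustering a real mapping tool would apply.
    """
    vectors = tfidf_vectors(docs)
    group_of = list(range(len(docs)))  # every document starts alone
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                old, new = group_of[j], group_of[i]
                group_of = [new if g == old else g for g in group_of]
    groups = {}
    for index, g in enumerate(group_of):
        groups.setdefault(g, []).append(index)
    return sorted(groups.values())


print(group_by_similarity(PAPERS))
```

On the toy data, the two zebrafish papers end up in one group and the two open access papers in another. A real map would additionally need a layout step that places similar papers close together, and labels for each topical area.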

A first draft is also available.

I am looking for backend and frontend web developers who code in JavaScript and/or PHP and R. We will be extending an existing tool for creating web-based knowledge domain visualizations that uses D3.js on the frontend, and R content mining packages on the backend, in particular rOpenSci and tm, so you should have experience with at least one of these libraries. A background in biomed would be nice but it’s not mandatory.

Everything about this project will be open: we will prepare the proposal in the open, the development will take place on a public Github repository, and all project outputs will be published under an open license.

So if you want to join the project and create an awesome open science tool together with me, please send an e-mail to outlining which part of the project interests you most, what you’d be able to contribute and how many hours you could devote to the project over the coming months. Please also include a link to your Github repository. It would be great if you could let me know whether you are a citizen of, or permanent resident in, the United States (US), as we will need to have at least one team member who satisfies this criterion. I am looking forward to your messages!

With “Ich bin Open Science!”, we want to raise public awareness for open science in Austria and beyond. The project, a collaboration between Know-Center and FH Joanneum, has been submitted to netidee 2015. In the video (German only for the moment) we explain the project idea, and you can see first testimonials who lend a face to open science. Why are you committed to openness in science and research?
