Open Science

Note: This article first appeared in Research Europe and Research Professional News.

The Covid-19 pandemic has triggered an explosion of knowledge, with more than 200,000 papers published to date. At one point last year, scientific output on the topic was doubling every 20 days. This huge growth poses big challenges for researchers, many of whom have pivoted to coronavirus research without experience or preparation.

Mainstream academic search engines are not built for such a situation. Tools such as Google Scholar, Scopus and Web of Science provide long, unstructured lists of results with little context.

These work well if you know what you are looking for. But for anyone diving into an unknown field, it can take weeks, even months, to identify the most important topics, publication venues and authors. This is far too long in a public health emergency.

The result has been delays, duplicated work, and problems with identifying reliable findings. This lack of tools to provide a quick overview of research results and evaluate them correctly has created a crisis in discoverability itself.

The pandemic has highlighted this, but with three million research papers published each year, and a growing diversity of other outputs such as datasets, discoverability has become a challenge in all disciplines.

For years a few large companies have dominated the market for discovery systems: Google Scholar; Microsoft’s soon-to-be-closed Academic; Clarivate’s Web of Science, formerly owned by Thomson Reuters; and Elsevier’s Scopus.

But investment hasn’t kept pace with the growth of scientific knowledge. What were once groundbreaking search engines have only been modestly updated, so that mainstream discovery systems are now of limited use.

This would not be a problem if others could build on companies’ search indices and databases. But usually they can’t.

A new openness
In the shadows of these giants, however, an alternative discovery infrastructure has been created, built on thousands of public and private archives, repositories and aggregators, and championed by libraries, non-profit organisations and open-source software developers. Unlike the commercial players, these systems make their publication data and metadata openly available.

Building on these, meta-aggregators such as Base, Core and OpenAIRE have begun to rival and in some cases outperform the proprietary search engines. Their openness supports a rich ecosystem of value-added services, such as the visual discovery system Open Knowledge Maps, which I founded, or the open-access service Unpaywall.

This open infrastructure has become the strongest driver of innovation in discovery, enabling the quick development of a variety of discovery tools during the pandemic. Technologies such as semantic search, recommendation systems and text and data mining are increasingly available.

Many open systems, though, are not sustainably funded. Some of the most heavily used make ends meet with a tiny core team. Half, including Open Knowledge Maps, rely on volunteers to provide basic services.

The funding options for non-profit organisations and open-source projects are very limited. Most rely on research grants, which are meant as a jumping off point, not a long-term solution.

The academic community needs to step up and secure the future of this crucial infrastructure. The shape of research infrastructure depends on institutions’ buying decisions. If most of the money goes to closed systems, these will prevail.

A first step would be to create dedicated budget lines for open infrastructures. The initial investment would be relatively small, as their membership fees are usually orders of magnitude cheaper than the license fees of their proprietary counterparts. Over time, strengthening open infrastructure will enable research institutions to cancel their proprietary products.

It’s not just about money. Open infrastructures do not lock institutions into closed systems, and save them from selling off their researchers’ user data, an issue gaining prominence as large commercial publishers become data analytics businesses.

The coronavirus pandemic has shown that the challenges of our globalised world demand international collaboration. That requires building on each others’ knowledge.

This is not possible with closed and proprietary discovery infrastructures that have fallen behind the growth of scientific knowledge. Instead, we need to guarantee the sustainability of the open discovery infrastructure, so that we can rely on it for today’s and tomorrow’s challenges.

#DontLeaveItToGoogle – How open infrastructures enable continuous innovation in the research workflow

October 24, 2019

Campaign, Infrastructure, Open Science

Leave a comment

My talk on the #DontLeaveItToGoogle campaign (co-created with Maxi Schramm) at OAI’11 – The CERN-UNIGE Workshop on Innovations in Scholarly Communication. More information, including sketchnotes and a copy of the slides can be found here: indico.cern.ch/event/786048/contributions/3381224/

Update on the #DontLeaveItToGoogle campaign

March 28, 2019

Campaign, Infrastructure, Open Science, Uncategorized

Leave a comment

A few months ago, I started the #DontLeaveItToGoogle campaign on Twitter to protest Google Dataset Search and to urge funders to provide the means for creating open alternatives. The original tweet has since been retweeted and liked hundreds of times and reached 50K impressions. As I had hoped, it also started a widespread discussion about open infrastructures in research.

Since then, I have been interviewed for the Elephant in the Lab to clarify the issues behind the initial tweet. I also wrote a piece for GenR where I laid out the specific problems in the discovery space.

And last week, I gave a keynote at the Open Science Conference in Berlin that contrasted the approach of proprietary vendors such as Google with the open infrastructure. In this talk that I co-created with Maxi Schramm, I argued that there is a serious crisis of discoverability in research. Scientific knowledge is growing at an unprecedented rate, but we do not have the tools to keep up with this growth. As a result, we see the emergence of dark knowledge, i.e. knowledge that cannot be discovered and reused.

Proprietary discovery tools from companies such as Google, Elsevier, and Digital Science are one of the main reasons for this problem. Their tools have outdated user interfaces and they do not allow for reuse of their data and software. What cannot be found within their systems is invisible to researchers and practitioners, essentially creating a wall of dark knowledge. We therefore cannot leave it to these large commercial entities to solve the discoverability crisis.

Instead we have to invest in the open discovery infrastructure. In the open infrastructure, reuse reigns supreme. We can all build on top of each other’s work, creating a cycle of continuous innovation. No one tool has the monopoly over which content researchers and practitioners get to see, tearing down the wall of dark knowledge.

Going forward, it will be especially important to fund user interfaces and user-facing services. Otherwise, we will give up all control over how users interact with open science. We also lose governance over user data and algorithms, and we miss out on a large chunk of innovation potential in that area.

You can find the slides from the talk here. I’d be very interested in your feedback on these ideas and arguments. Add your voice on social media using the #DontLeaveItToGoogle hashtag, or respond in the comments below.

#DontLeaveItToGoogle

September 10, 2018

Campaign, Infrastructure, Open Science

1 Comment

As you may have heard, Google is building a search engine for datasets. This will mean yet another proprietary index on top of our own data that nobody can reuse – and another inferior list-based interface that will be pushed onto scientists by Google’s sheer market dominance.

At the same time, there is no funding for a community-driven open source alternative. We should not leave the field to Google, or science will be poorer for it! We finally need to invest in a true open science contender for research data discovery.

This would bring meaningful competition to this space and drive innovation.

This is why I have launched the #DontLeaveItToGoogle campaign. I encourage you to add your own voice to the discussion using this hashtag. It’s time to change the way we discover research – not perpetuate the same proprietary model time and time again!

Why don't we FINALLY invest in #opensource discovery solutions? This would bring meaningful competition to this space & drive innovation. At @OK_Maps, we are ready – but funding is extremely scarce. Thus the field is left to Google & science is poorer for it! #DontLeaveItToGoogle

— Peter Kraker @PeterKraker@scicomm.xyz (@PeterKraker) September 10, 2018

DontLeaveItToGoogle

WikidataCon: Giving more people more access to more knowledge

December 11, 2017

Conference, Open Science

Leave a comment

At the end of October, I attended WikidataCon 2017 in Berlin. My participation was made possible thanks to the generous support of Wikimedia Austria, so a big shout-out to the great team here in Vienna! 🎉

I’ve been intrigued by the project for quite some time, and I wanted to learn more as to how we at Open Knowledge Maps can contribute to this evolving ecosystem of linked open data. And what an ecosystem it has become! As Lydia Pintscher, product manager of Wikidata, showed in her presentation on the occasion of Wikidata’s 5th birthday, the database now comprises of 37.8 million items contributed by some 17,600 editors.

Amazing to see all the tools, games and other uses of Wikidata #WikidataCon pic.twitter.com/ceDG1NgcuW

— Peter Kraker @PeterKraker@scicomm.xyz (@PeterKraker) October 28, 2017

Due to the increasing number of items on scholarly articles, there were many interesting talks on the uses of Wikidata for open science. This included a demo of neonion, an annotation and recommendation system connected to Wikidata, presented by Adelheid Heftberger and led by Claudia Müller-Birn. Dario Taraborelli presented WikiCite, which aims to build an extensive open bibliographic database in Wikidata including large amounts of citation data. Finn Årup Nielsen gave an introduction to Scholia, a presenter tool for this information. Dario started his talk with a controversial statement, which sparked a lot of discussion on Twitter:

“Wikipedia has solved the problem of fake news 15 years ago“ @ReaderMeter introducing @Wikicite at #WikidataCon pic.twitter.com/JS3BCqyXOB

— Peter Kraker @PeterKraker@scicomm.xyz (@PeterKraker) October 28, 2017

Much of my interest in Wikidata is driven by Stefan Kasberger and Daniel Mietchen, who are both Open Knowledge Maps advisors and strong contributors to Wikidata & WikiCite. During the Wikimedia Hackathon and WikiCite Conference this year, Daniel, Tom Arrow and myself formulated a first plan, involving the WikiFactMine project. WikiFactMine is carried out by our collaboration partner ContentMine and seeks to use text mining of the Open Access bioscience literature to enhance the Wikimedia projects. With the release of the WikiFactMine API, which was announced at WikidataCon, this idea is getting closer to becoming a reality.

Gotta check out the WikiFactMine API! https://t.co/aPHPrn3lbb Presented by @ThomasArrow1 at #WikidataCon @OK_Maps

— Peter Kraker @PeterKraker@scicomm.xyz (@PeterKraker) October 28, 2017

We know that Wikipedia has sparked a lot of research, but there is also an increasing amount of research based on Wikidata, as exemplified by the knowledge map below. As such, Wikidata is becoming an increasingly important bridge between the scientific community and Wikimedia.

Map of research on Wikidata – incl. papers by @fnielsen @clmbirn @EvoMRI @nichtich @inablu @ReaderMeter https://t.co/nZVSS2AGHa #WikidataCon

— Open Knowledge Maps (@OK_Maps) October 28, 2017

As every other Wikimedia event that I had previously attended, WikidataCon was a well-organized, welcoming and inclusive event. It’s always great to see, how Wikimedia communities tackle diversity and bias heads-on.

Ideas for a plan of action to tackle the Global South systemic bias #WikidataCon pic.twitter.com/YMxvXDFGzP

— Peter Kraker @PeterKraker@scicomm.xyz (@PeterKraker) October 28, 2017

I thoroughly enjoyed my stay and I am looking forward to the community-organized events on Wikidata around the world next year, before WikidataCon returns to Berlin in 2019.

https://twitter.com/photos_floues/status/924627678996631552

Open Science, All The Way: Open Knowledge Maps

September 12, 2017

Open Knowledge Maps, Open Science, Scholarly Communication, Visualizations

Leave a comment

Note: this post first appeared on ZBW Mediatalk and has been updated to reflect the latest update of Open Knowledge Maps.

Science and research are more productive than ever. Every year, around 2.5 million research articles are published, and counting. A lot of research information is openly available: thanks to the open access movement, we can now find more than 100 million scientific outputs on the web. We have made great strides with respect to accessibility; but what about discoverability? After all, this enormous amount of knowledge is only of use to us, if it reaches the people that need it, and if it is reused as a basis for further research or transferred to practice. Here we can see a big gap. Depending on the discipline 12% to 82% of all scientific publications are never cited. This means that these publications do not serve as a basis for further research. When talking about transfer to practice, the gap is even wider: even in application-oriented disciplines such as medicine, only a small percentage of research findings ever influence practice – and even if they do so, often with a considerable delay.

Mission

What prevents knowledge transfer to practice?

One reason for this situation is that the tools for exploration and discovery of scientific knowledge are seriously lacking. Most people use search engines for this task. Search engines work very well, when you know what you want. Then, they deliver the result you are looking for – often with high precision. However, if you want to get an overview of an unknown scientific field, the list-based representation with only 10 results per page is not sufficient. With search engines, it takes a long time, before you know the main areas in a field, the most important terms, authors and journals. It can take weeks if not months – indeed in many PhD programs, the whole first year is devoted to this process. Many people in research and especially practitioners do not have that much time. Think about science journalists or patients. To summarize: there are many people out there that could benefit from scientific knowledge, if there were better tools for discovering research results.

Knowledge maps instead of lists

At Open Knowledge Maps, we intend to close this gap, to provide the missing link between accessibility and discoverability. Instead of lists, we use knowledge maps for discovery. Knowledge Maps show the main areas of a field at a glance. Relevant publications are already attached to each area. This enables users to get a quick overview of a field.

overview_heart_diseases_rund-example The sub-areas also make you aware of the terminology in a field. This information alone may take weeks to find out. How much time have you already lost to searching without knowing the best search terms? In addition, the knowledge map enables users to separate the wheat from the chaff with respect to their current information need. For an ambiguous search term for example, the different meanings are sorted into separate areas.

benefits-all

Open Knowledge Maps as an openly accessible service

At Open Knowledge Maps, we are offering an openly accessible service, which allows you to create a knowledge map for any search term. Users can choose between two databases: Bielefeld Academic Search Engine (BASE) with more than 110 million scientific documents from all disciplines, and PubMed, the large biomedical database with 26 million references. We use the 100 most relevant results for a search term as reported by the respective data base as a base for our knowledge maps. The ordination and the determination of the areas is based on textual similarity of the metadata of the results. This means: the more words two documents have in common in either title, abstract, authors or journal name, the closer they are positioned on the map and the more likely they are placed in the same area. For everyone who would like to dive deeper into the algorithms used to create the map, our article Open Knowledge Maps: Creating a Visual Interface to the World’s Scientific Knowledge Based on Natural Language Processing in the journal 027.7 is worth reading.

The knowledge map for research “sugar” can be seen below. As described above, the bubbles represent the different areas. If you click on one of the bubbles, you are presented with papers related to this area. Open access articles are clearly marked and can be read within the interface. The idea is that you do not need to leave the browser tab while searching for literature. Go to Open Knowledge Maps to check out the search service.

Open Knowledge Maps brings open science to life

The “Open” in Open Knowledge Maps does not only stand for open access articles – we want to go the whole open science way and create a public good. This means that all of our software is developed open source. You can also find our development roadmap on Github and leave comments by opening an issue. The knowledge maps themselves are licensed under a Creative Commons Attribution license and can be freely shared and modified. We will also openly share the underlying data, for example as Linked Open Data. This way, we want to contribute to the open science ecosystem that our partners, including rOpenSci, ContentMine, Open Knowledge, the Internet Archive Labs and Wikimedia are creating.

We see libraries as important collaboration partners. We cooperate with the libraries of the University of Bielefeld and the Austrian Academy of Sciences. ZBW is also using software from Open Knowledge Maps in a joint project with the Know-Center. This collaboration is a win for both sides: Open Knowledge Maps is a stable, user friendly system, which enables libraries to visualize their collections of documents and to improve their discoverability. On the other hand, improvements from these projects are fed back into the software of Open Knowledge Maps, improving the system for all users.

Vision: Collaborative literature search

In the future, we want to enable collaborative literature search in Open Knowledge Maps. At the moment, most people are tackling discovery on their own. But usually someone has already walked this way before you. Unfortunately, the knowledge gained during discovery remains in people’s heads. We want to further develop Open Knowledge Maps, so that maps can be modified, extended and shared again – so that we can build on top of each other’s’ knowledge. We have created a short video to illustrate this idea:

We see libraries and librarians as central to this vision. A collaborative system cannot work without experts on knowledge curation and structuring. Together with the other stakeholders from research and society, including researchers, students, journalists, citizen scientists and many more, we want to create a system that enables us to create pathways through science for each other. So that we can all benefit from this unique knowledge.

Creating Gateways Into Research

May 17, 2017

Applications, Open Knowledge Maps, Open Science, Proposal

Leave a comment

I have just applied for a Shuttleworth Fellowship.

The fellowships are issued by the Shuttleworth Foundation, which describes its vision as “We would like to live in an open knowledge society with limitless possibilities for all.” I very much share this vision – it is the main reason I founded Open Knowledge Maps. Our goal is to build a visual interface to the world’s scientific knowledge in order to dramatically increase the visibility of research findings for science and society alike.

We want to provide a solution to a challenge that’s almost paradoxical: on the one hand, more research is openly available than ever, and we see considerable interest in science and technology. On the other hand, we are faced with a serious crisis of trust in scientific research, with anti-vaccination movements and climate change deniers on the rise.

I believe that the root of this problem is that it is very hard to get an “in” on research. Access does not equal discoverability or even participation. People outside academia trying to understand a research field are therefore often lost. I want to empower these people by providing better gateways into scientific research. Think: policy makers attempting to optimize decision-making by using evidence from relevant research, educators striving to convey the state-of-the-art, fact checkers trying to verify statements, or patients who would like to learn about the newest findings on their illness.

To make this happen, I believe that we need to do two things: first, improving the discoverability of research findings. Second, turning discovery into a collaborative process – thus enabling participation, and allowing people to create pathways through science for each other. Take a rare disease as an example: wouldn’t it be great, if researchers, doctors and patients would collaboratively map the newest research on this disease – and then share the results of their efforts for the benefit of patients, who don’t have access to specialists?

Enter Open Knowledge Maps: we use knowledge maps, a powerful tool for exploration and discovery. Knowledge maps provide an instant overview of a field by showing the main areas of the field at a glance, and papers related to each area (see below). In addition, knowledge maps make it possible to easily identify useful, pertinent information by separating papers into meaningful clusters – and they are exposing important concepts in the field that you would often need weeks to find out.

Examplary knowledge map for research on heart diseases

During the fellowship year, I want to explore how we can create a space for participatory discovery around these maps. How can different communities interact on a level-playing field, so that they create pathways through science for each other?

A little backstory

If you are an avid reader of this blog, you may recall that I alreay applied for a Shuttleworth Fellowship a year ago. The fellowships first caught my eye, when I learned that amazing projects like ContentMine (Peter Murray-Rust), Hypothes.is (Dan Whaley), and Koruza (Luka Mustafa) had all been enabled by a Shuttleworth Fellowship.

Back in May 2016, Open Knowledge Maps was just starting out, with an enthusiastic group of volunteers, and a prototypical service that enabled users to create a knowledge map for a topic based on the PLOS library (160,000 articles).

Since then, a lot has happened in the project.

The team has grown: I am developing Open Knowledge Maps together with nine amazing volunteers. I have also found great advisors and strong partners from the open knowledge community. We put out two major updates of our search service – pushing our coverage to 100 million scientific articles from all disciplines thanks to BASE. We’ve considerably improved the user experience based on feedback from the community, and we’ve enabled features such as collaborative annotation thanks to Hypothes.is. And we held numerous workshops and sessions at events such as OpenCon, MozFest and re:publica, with more than 300 people in attendance.

Knowledge map for digital education. Click on the map to get to the interactive version on Open Knowledge Maps.

I am very happy that our efforts resonated with the community. Open Knowledge Maps was featured on the frontpages of reddit and HackerNews. Our user base has quickly grown: in less than a year, we saw over 100,000 vists and more than 30,000 maps have been created to date. We’ve received hundreds of enthusiastic tweets, e-mails and blog posts, motivating us to proceed with our vision.

We’ve now reached the limits of what we can do as a pure volunteer project. In order to realize the full potential of the idea, we need support. This is why I decided it’s time to give it another go. I also believe that it is a critical time that we are creating this platform in. There are several closed solutions for providing visual overviews that are being developed right now. If we do not provide an open alternative in time, we risk being stuck with proprietary solutions and wasted public money for decades.

As usual, the proposal was developed in the open. Special thanks go to Maxi Schramm, Christopher Kittel, Florian Heigl, Rufus Pollock, Antica Culina and Daniel Mietchen for comments on the draft. But I’d like to thank the Open Knowledge Maps family – team, advisors, partners and users. I am very lucky to shape this vision together with you.

You can find the full proposal on Github.

Visualize a research topic based on 100 million scientific documents

March 30, 2017

Announcements, Open Knowledge Maps, Open Science, Visualizations

Leave a comment

We have now connected Open Knowledge Maps to one of the largest academic search engines in the world: BASE. This means, you are able to visualize a research topic from 100+ million documents. And for the first time, you can search within different types of resources, including datasets and software. I would like to thank our collaborators BASE and rOpenSci for their outstanding support in making this happen!

We have also spent a lot of time improving the naming of the sub-areas to make the concepts in a field more visible – which means that this update improves our existing PubMed integration too.

In addition, we have added much more information to the site about the project and our approach. Open Knowledge Maps follows the motto “open science, all the way”. From our roadmap to our source code and our data, we publish everything under an open license that is compatible to the Open Definition.

Try it out now and let me know of any feedback you may have!

Open Knowledge Maps: PubMed and DOAJ integration is here!

November 2, 2016

Announcements, Open Knowledge Maps, Open Science, Visualizations

2 Comments

Today, I am very proud to announce a milestone for Open Knowledge Maps. Thanks to an outstanding team and continued support by our partners and advisors, we have added two major content sources: the Directory of Open Access Journals (DOAJ) with more than 2.3 million articles and PubMed with more than 26.5 million articles. Taking into account a certain overlap between the two sources, we can now credibly state that one can create maps based on 28 million papers. That’s a content pool that is 175 times larger than in the previous iteration using PLOS (about 160,000 articles).

We have also completely overhauled our design & overall presentation and improved the user experience considerably. In addition, we have included the open annotation software Hypothes.is in our PDF preview.

We believe that this is a major step towards revolutionizing the way we discover research. There are many new things to try out and explore – we are looking forward to your feedback and suggestions!

Try it out now!

Open Science Prize Proposal Submitted

March 7, 2016

Open Science, Proposal, Visualizations

Leave a comment

A little longer than a month ago, I posted an Open Call for Collaborators for an Open Science Prize Proposal on Discovery on this blog and to various open science mailing lists. The call has been very fruitful and I am happy to announce that we have submitted a proposal. In the spirit of open science, you can find the full proposal and the supplementary materials on Github. See below for the executive summary and our video.

Team Open Discovery: Peter Kraker, Mike Skaug, Scott Chamberlain, Maxi Schramm, Michael Karpeles, Omiros Metaxas, Asura Enkhbayar & Björn Brembs

Executive Summary: Discovery is an essential task for every researcher, especially in dynamic research fields such as biomedicine. Currently, however, there are very few discover tools that can be used by a mainstream audience, most notably search engines. The problem with search engines is that they present resources in a linear, one-dimensional way, making it necessary to sift through every item in a list. Another problem is that the results of the traditional discovery process are usually closed. Therefore, the discovery process is repeated over and over again by different researchers, taking away valuable time and resources from the actual research. To solve these challenges and bring the discovery process into the open science era, we propose BLAZE, the comprehensive open science discovery tool. BLAZE will leverage the existing open science ecosystem to provide multi-dimensional topical maps of research fields, involving not only publications, but also datasets, presentations, source code and media files. BLAZE will provide a single, intuitive interface for researchers to explore, edit and share maps. The edit history of a map will be preserved to allow Wikipedia style collaboration. The maps themselves will be open, so users can embed them on their own websites and export the structure into other open science tools. Opening the discovery process will enable researchers to reuse maps, saving valuable time and effort because they can build on top of each other’s work. Furthermore, they will be able identify collaborators long before the research is usually communicated. There is an existing, early-stage protoype for BLAZE and with the Open Science Prize, we plan to develop this prototype into a comprehensive tool. BLAZE will show the enormous potential of open science for innovation in scholarly communication by providing a structured, open and multi-dimensional approach to discovery.

—Science and the Web

Peter Kraker's Weblog

Archive

Open Science

Open search tools need sustainable funding

#DontLeaveItToGoogle – How open infrastructures enable continuous innovation in the research workflow

Update on the #DontLeaveItToGoogle campaign

#DontLeaveItToGoogle

WikidataCon: Giving more people more access to more knowledge