I am currently preparing a proposal for the Open Science Prize in the field of open discovery, and I am looking for motivated collaborators who want to join the project and change the way we do discovery. Here is the current summary:

Discovery is an essential task for every researcher, especially in dynamic research fields such as biomedicine. Currently, however, there are only a limited number of tools that can be used by a mainstream audience. We propose BLAZE, an open discovery tool that goes far beyond the functionality of search engines and social reading lists. The tool builds on PubMed Central and other open content sources and will provide topical maps for a given list of papers, e.g. a search result or a journal volume. The maps are created automatically, using full texts to calculate similarities and derive topical structures among papers. Furthermore, they will be enriched with features that are extracted from the papers (e.g. all papers with the same species are highlighted). BLAZE will enable users to do their discovery in a single interface. Users can interact with the maps, explore different topical areas, and filter and read individual papers in the same interface. An edit mode will be provided for users to make changes to the maps and to introduce new papers and topical areas. Users can openly share maps with others and export the structure in various open formats. BLAZE will be based on the existing open source visualization Head Start, and make extensive use of the digital open science ecosystem, including, but not limited to, open content, content mining services, open source solutions, and open metrics data. With this tool, we want to show the potential of open science for innovation in scholarly communication and discovery. In addition, we believe that this tool will increase the visibility of, and awareness of, open content and open science in general.
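To give a flavor of the "similarities from full texts" step: the actual tool will use R content mining packages, but the core idea can be sketched language-agnostically. The following is a minimal, hypothetical Python illustration (not BLAZE's implementation) of computing pairwise cosine similarities between term-count vectors of full texts, which is the kind of matrix a topical map can be derived from.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Crude tokenizer for illustration; real pipelines do stemming,
    # stopword removal, etc.
    return re.findall(r"[a-z]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def similarity_matrix(fulltexts):
    """Pairwise similarities among papers, the input to topical clustering."""
    vecs = [Counter(tokenize(t)) for t in fulltexts]
    return [[cosine(u, v) for v in vecs] for u in vecs]
```

A clustering step over this matrix would then group papers into the topical areas shown on the map.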

A first draft is also available.

I am looking for backend and frontend web developers who code in JavaScript, PHP, and/or R. We will be extending an existing tool for creating web-based knowledge domain visualizations that uses D3.js on the frontend and R content mining packages on the backend, in particular rOpenSci packages and tm, so you should have experience with at least one of these libraries. A background in biomedicine would be nice, but it is not mandatory.

Everything about this project will be open: we will prepare the proposal in the open, the development will take place in a public GitHub repository, and all project outputs will be published under an open license.

So if you want to join the project and create an awesome open science tool together with me, please send me an e-mail outlining which part of the project interests you most, what you’d be able to contribute, and how many hours you could devote to the project over the coming months. Please also include a link to your GitHub repository. It would be great if you could let me know whether you are a citizen of, or permanent resident in, the United States (US), as we will need to have at least one team member who satisfies this criterion. I am looking forward to your messages!

If you have been to a TEL event recently, you might have already seen our Tweet Visualizations in the domain of TEL. Stefanie Lindstaedt will be presenting our paper entitled “On the Way to a Science Intelligence: Visualizing TEL Tweets for Trend Detection” at EC-TEL 2011 on Thursday. I thought this was a good occasion to tell you a bit more about the background of the visualizations. I should not forget to mention that the system was built in a joint project between Know-Center and Joanneum Research in the context of the STELLAR Network of Excellence. Most of the material below is taken from our paper; if you want to know more, you can access the preprint here.


Below you can see the architecture of the system. We developed a focused Twitter Crawler, which takes as input either a taxonomy of hashtags, or a list of users, or both. It then queries the Twitter Streaming API for matching data. This allows for adapting the system to a certain domain. The tweets are logged and cleaned, and informative tokens (such as nouns and hashtags) are extracted using TreeTagger. Finally, we store the tweets, their metadata, and their associated informative tokens in a Solr index. This way, one can access the real-time data but also go back in time. At the moment, this works only for a couple of weeks, but we are in the process of extending this timeframe considerably.
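The crawl-clean-index pipeline can be sketched as follows. This is a simplified, hypothetical Python illustration: the real system extracts nouns with TreeTagger, whereas here a regex only pulls out hashtags, and the Solr document field names are assumptions, not the system's actual schema.

```python
import re

def extract_tokens(tweet_text):
    """Stand-in for the TreeTagger-based extraction: here we only
    pull out hashtags; the real pipeline also extracts nouns."""
    return re.findall(r"#\w+", tweet_text.lower())

def to_solr_doc(tweet_id, user, timestamp, text):
    """Shape of a document stored in the Solr index
    (field names are assumptions for illustration)."""
    return {
        "id": tweet_id,
        "user": user,
        "created_at": timestamp,
        "text": text,
        "tokens": extract_tokens(text),
    }
```

Each incoming tweet from the Streaming API would be passed through `to_solr_doc` before being committed to the index.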

The Visualization Dataservices are RESTful web services, which translate the search query into a Solr query and preprocess the Solr result in different ways: the Streamgraph Dataservice focuses on analyzing the evolution of topics over time; the Weighted Graph Dataservice focuses on relations between different topics. The dataservices are used to power two visualizations. One is a weighted graph, a co-occurrence network for analyzing semantic networks of terms, based on the JavaScript InfoVis Toolkit (JIT).
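As a rough illustration of what "translating the search query into a Solr query" can look like, here is a hypothetical Python sketch. The Solr URL, core name, field names, and facet parameters are all assumptions for illustration, not the dataservices' actual configuration.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint and core name
SOLR_BASE = "http://localhost:8983/solr/tweets/select"

def streamgraph_query(term, date_from, date_to):
    """Build a Solr query for a term, restricted to a date range and
    faceted into weekly buckets, roughly what a streamgraph needs."""
    params = {
        "q": f"tokens:{term}",
        "fq": f"created_at:[{date_from}T00:00:00Z TO {date_to}T23:59:59Z]",
        "facet": "true",
        "facet.range": "created_at",
        "facet.range.gap": "+7DAY",
        "wt": "json",
    }
    return SOLR_BASE + "?" + urlencode(params)
```

The dataservice would fire this request at Solr and reshape the facet counts into the series consumed by the frontend.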


Below you can see the Weighted Graph Visualization for the hashtag of the 2nd STELLAR Alpine Rendez-vous.

In the center, there is the official hashtag for the event, #arv11. The hashtags that are directly related to the event hashtag are hashtags of individual workshops, such as #datatel11 for the dataTEL workshop, and #arv3t for the workshop “Structuring online collaboration though 3 Ts: task time & teams”. Co-occurring with the individual hashtags are hashtags that describe some of the content of the workshops, such as agency and PLE for the 3T workshop.

The second visualization is a streamgraph, based on the Grafico JavaScript charting library, for analyzing trends over time. The graph below shows a screenshot of the Streamgraph Visualization, displaying the co-occurring hashtags for the query "conferences" from 20/02/2011 to 14/04/2011. On the x-axis, the time intervals are outlined, whereas on the y-axis, the relative number of occurrences is shown. Each colored stream represents one co-occurring hashtag. The visualization shows that the hashtag for the South-by-Southwest conference (#sxsw) was trending around the actual event on March 15. The #pelc11 hashtag was trending around April 7, with the Plymouth E-Learning Conference taking place from April 6-8. Another trending conference is the PLE Conference in Southampton (#PLE_SOU), which took place later but generated a lot of tweets even before the event. The other co-occurring hashtags are not tied to a certain conference (such as #mlearning and #edchat), but they denote hashtags in the TEL area which contain a large number of tweets about conferences. These hashtags could be used to find out about further conferences in the area.
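The "relative number of occurrences" on the y-axis can be computed as sketched below. This is a simplified, hypothetical Python illustration, not the Streamgraph Dataservice's actual code: it takes tweets already matched to the query, grouped by time interval, and normalizes each hashtag's count by the interval total.

```python
from collections import Counter, defaultdict

def streamgraph_series(matching_tweets):
    """matching_tweets: iterable of (interval, hashtags) pairs for tweets
    that matched the query. Returns, per interval, each co-occurring
    hashtag's share of that interval's total, i.e. the streamgraph's
    y-values."""
    counts = defaultdict(Counter)
    for interval, hashtags in matching_tweets:
        counts[interval].update(hashtags)
    return {
        interval: {tag: n / sum(ctr.values()) for tag, n in ctr.items()}
        for interval, ctr in counts.items()
    }
```

Normalizing per interval is what makes a burst like #sxsw around March 15 stand out relative to the steady background hashtags.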

Because the user interface relies on web standards, the visualizations can easily be included in any system that supports those standards. Apart from the main website, versions of the visualizations are also available as widgets in TELeurope.

Reception and Outlook

We are quite pleased with the reception of our system. The visualizations have been used to support the dataTEL workshop (#datatel11), the RDSRP’11 special track at i-KNOW (#rdsrp11), and the SURF Learning Analytics Seminar (#sa_la). Participants liked the look of the visualizations and the idea behind them. The system will also be employed as a reflection tool in the Workshop on Awareness and Reflection in Learning Networks (#arnets11) at EC-TEL 2011.

Nevertheless, there are several issues we still need to address. As already mentioned, we are working on providing a larger index that goes back to 2010. We are also looking into new zooming and filtering facilities, allowing users to dig deeper into the data. Furthermore, we want to integrate different measures to derive more meaningful terms for longer periods. If you want to know more, especially about the evaluations we have already conducted, please refer to the paper. If you want to use the visualizations in one of your events, just contact me. As always, comments and suggestions are welcome!

Update: The streamgraph now accepts parameterized URLs for easier sharing of individual visualizations. Check out the EC-TEL 2011 stream, or a chart of the RDSRP’11 morning session.

Kraker, Peter, Wagner, Claudia, Jeanquartier, Fleur, & Lindstaedt, Stefanie (2011). On the Way to a Science Intelligence: Visualizing TEL Tweets for Trend Detection. Proceedings of the 6th European Conference on Technology Enhanced Learning, 220-232. DOI: 10.1007/978-3-642-23985-4_18

This week I saw a presentation by David Lowe from the University of Technology, Sydney on the Australian Labshare project. In this project, they are developing remote labs: laboratories that can be operated over the internet.

Unfortunately, I was not able to see the demo of the software (check it out – it is called Sahara and you can find it on SourceForge), but as far as I understood it, the process is as follows: You can choose from a range of experiments in every lab. If you have found an interesting one, you can fiddle with the settings and – subsequently – run it. In the process, you get visual feedback from a camera. Afterwards, you are presented with the data from the experiment in the form of sketches and numbers.

At the moment, they are using it mainly for educational purposes. There was a long discussion after the presentation about whether real labs could largely be replaced with simulations. This is an interesting topic, and it sparked a lot of controversy, but I was more interested in “Doing research with remote labs”. I am not a natural scientist, but as far as I can see, remote experiments would make it a lot easier to write protocols and keep open lab books like on OpenWetWare. The software records your settings as well as your results, so you would only have to fill in the rationale between the experiments.

Apart from the set-up and the data, you would be able to also share something even more valuable: you could share the whole experiment! I mean this in a sense that everyone would be able to have the same experience as the original researcher. This naturally includes the recordings, but it extends even beyond that. You could provide others with the exact same set-up in the exact same lab, so that they can reproduce the experiment from the beginning to the end.

I am aware that there are certain challenges along the way: experiments in research are most probably more complicated and need more variation than those intended for education. Still, I am very intrigued by the idea. I would love to hear your opinions on this (especially from people in the natural sciences), and I will definitely follow the Labshare project to see what they come up with in this area.
