My secondment at Mendeley is drawing to a close. Over the last few weeks, I have mainly been working on a running prototype of the overview visualizations of research fields. The idea is to give people who are new to a field a head start in their literature search. Below you can see a first screenshot of the prototype for the field of Educational Technology. The blue bubbles represent the different research areas in Educational Technology. Shining through these bubbles are the most important papers in these areas. The papers become fully visible upon zooming into an area, and you can even access the paper previews within the visualization. I hope to get the prototype up and running in the next two weeks, so you can explore the visualization for yourself.
And now for something completely different
Well, at least partially different. If you are a researcher, you will have encountered the following situation: you are working on a project when you suddenly find out that someone has done exactly the same thing as you, but 5(0) years earlier. Even though you have done an extensive literature review, that particular paper or project has escaped your search. There are several reasons why something like that might happen. One is that terminology is very fluid in research; even within the same field, names can change pretty quickly. This is the reason why I rely on structural data (co-readership patterns) rather than textual data (e.g. keywords, titles, abstracts) to create my visualizations. Structural data has proven to be a lot more stable over time. This mirrors how an individual researcher usually works: not only searching for literature, but also following references and incoming citations to find related research.
Another reason for previous research going unnoticed is that the research was done in another community or research field. In that case, not only might the terminology be different, but the link between the two areas may also be nonexistent. Ever since I started with these visualizations, I have thus wanted to show not only the borders of research areas, but also their overlaps.
A semantic approach
The idea to use semantics arose when I needed a mechanism for automatically naming the research areas. I ended up sending the abstracts and titles from each area to OpenCalais and Zemanta. Both services crawl the semantic web and return a number of concepts that describe the content. I use this information to find the most appropriate name for the area by comparing the returned concepts to word n-grams in the text. The more words a concept has, and the more often it occurs within the text, the more likely it is to be the name of the area.
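To make the naming heuristic a bit more concrete, here is a minimal sketch in Python. It is not the code behind the prototype: the function names are made up for illustration, the concepts are assumed to arrive as plain strings (the actual web service responses are richer), and multiplying word count by frequency is just one simple way to combine the two criteria.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngram_counts(tokens, max_n):
    """Count every word n-gram of length 1..max_n in the token list."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def rank_area_names(concepts, text):
    """Rank candidate names for an area (hypothetical helper).

    concepts: concept strings, e.g. as returned by a semantic web service.
    text: the concatenated titles and abstracts of the area.
    Returns (concept, score) pairs, best candidate first.
    """
    tokens = tokenize(text)
    concept_tokens = {c: tuple(tokenize(c)) for c in concepts}
    max_n = max((len(t) for t in concept_tokens.values()), default=1)
    counts = ngram_counts(tokens, max_n)

    scored = []
    for concept, toks in concept_tokens.items():
        frequency = counts.get(toks, 0)
        # Assumption: score = number of words * number of occurrences.
        scored.append((concept, len(toks) * frequency))
    # Highest score first; ties are broken alphabetically.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


if __name__ == "__main__":
    sample_text = (
        "Game-based learning is popular. "
        "Game-based learning improves motivation. "
        "Learning analytics helps."
    )
    sample_concepts = ["learning", "game-based learning", "learning analytics"]
    for name, score in rank_area_names(sample_concepts, sample_text):
        print(name, score)
```

In this toy example the three-word concept wins even though the single word "learning" occurs more often, which is the behaviour described above: longer concepts are favoured as long as they actually occur in the text.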
Now I would like to use the concepts that I get back from the web services to show connections between research areas. Using semantic reasoning, it should be possible to show overlapping research areas by their common concepts. The number of common concepts could be an indicator of the size of the overlap. If some kind of concept classification were involved, it would also be possible to give hints not only on the extent of the overlap, but also on its nature. Are two areas using the same methods, or are they working on similar topics? Which areas have similar theoretical backgrounds, or similar problems to solve?
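As a rough illustration of the simplest version of this idea, the following sketch counts the concepts shared by each pair of areas. It assumes that every area has already been reduced to a set of concept strings; the area names and concepts in the example are invented, and the function name is hypothetical. Semantic reasoning or a concept classification would have to go beyond this plain set intersection.

```python
from itertools import combinations


def concept_overlaps(area_concepts):
    """Find the concepts shared by each pair of research areas.

    area_concepts: dict mapping an area name to a collection of concepts.
    Returns a dict mapping (area_a, area_b) to the set of shared concepts.
    Pairs without any shared concept are left out.
    """
    overlaps = {}
    for area_a, area_b in combinations(sorted(area_concepts), 2):
        shared = set(area_concepts[area_a]) & set(area_concepts[area_b])
        if shared:
            overlaps[(area_a, area_b)] = shared
    return overlaps


if __name__ == "__main__":
    # Invented example data, for illustration only.
    areas = {
        "Adaptive Hypermedia": {"user modeling", "personalization", "ontology"},
        "Game-based Learning": {"motivation", "simulation"},
        "Intelligent Tutoring": {"user modeling", "personalization", "motivation"},
    }
    for pair, shared in concept_overlaps(areas).items():
        # The number of shared concepts serves as a crude overlap size.
        print(pair, len(shared), sorted(shared))
```

The raw count is the crudest possible measure. Areas with many concepts will tend to overlap more simply because of their size, so some form of normalization would probably be needed before the counts can be compared across pairs.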
This is still at a very early stage, but I am eager to get feedback on the idea. What do you think about involving semantics in this way? What are other potential uses of a semantic description of research fields?