Archive

Visualizations

Today, I am happy to announce that Head Start, my overview visualization of Educational Technology, has been released on Mendeley Labs. The visualization is intended to give researchers who are new to a field a head start on their literature review (hence the name).

When you fire up the visualization, the main areas in the field are shown, represented by the blue bubbles. Once you click on a bubble, you are presented with the main papers in that area. The dropdown on the right displays the same data in list form. By clicking on one of the papers, you can access all metadata for that paper. If a preview is available, you can retrieve it by clicking on the thumbnail in the metadata panel. By clicking on the white background, you can then zoom out and inspect another area. Go ahead and try it out!

The visualization was created with D3.js. It has been successfully tested with Chrome 22, Firefox 15, and Safari 5.1. Unfortunately, Internet Explorer and Opera are not supported at this point.

Head Start is based on readership co-occurrence. I wrote several blog posts in the past that describe the ideas behind it. You can find them here, here, and here. There is also a paper from the LSNA workshop, which I co-authored with Christian Körner, Kris Jack, and Michael Granitzer. I will write another post about the inner workings of the visualization in the near future. Until then, I am looking forward to your comments and feedback!

My secondment at Mendeley is drawing to a close. In the last few weeks, I have mainly been working on a running prototype of the overview visualization of research fields. The idea is to give people who are new to a field a head start in their literature search. Below you can see a first screenshot of the prototype for the field of Educational Technology. The blue bubbles represent the different research areas in Educational Technology. Shining through these bubbles are the most important papers in these areas. The papers become fully visible upon zooming into an area, and you can even access the paper previews within the visualization. I hope to get the prototype up and running in the next two weeks, so you can explore the visualization for yourself.

And now for something completely different

Well, at least partially different. If you are a researcher, you will have encountered the following situation: you are working on a project when you suddenly find out that someone has done exactly the same thing as you, but 5(0) years earlier. Even though you have done an extensive literature review, that particular paper or project has escaped your search. There are several reasons why something like that might happen. One is that terminology is very fluid in research; even within the same field, names can change pretty quickly. This is the reason why I rely on structural data (co-readership patterns) rather than textual data (e.g. keywords, titles, abstracts) to create my visualizations. Structural data has proven to be a lot more stable over time. This is much like an individual researcher, who usually not only searches for literature but also follows references and incoming citations to find related research.

Another reason for previous research going unnoticed is that the research was done in another community or research field. In that case, not only might the terminology be different, but the link between the two areas may also be nonexistent. Ever since I started with these visualizations, I have therefore wanted to not only show the borders of research areas, but also their overlaps.

A semantic approach

The idea to use semantics arose when I had to come up with a mechanism for automatically naming the research areas. I ended up sending the abstracts and titles from each area to OpenCalais and Zemanta. Both services crawl the semantic web and return a number of concepts that describe the content. I use this information to find the most appropriate name for the area: I compare the concepts that I get back to word n-grams. The more words a concept has, and the more often it occurs within the text, the more likely it is to be the name of the area.
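To make the naming heuristic concrete, here is a minimal sketch in Python. The scoring function, the weighting, and all names in it are my own assumptions for illustration; the actual implementation may weigh things differently.

```python
import re
from collections import Counter

def name_area(concepts, text):
    """Pick a name for a research area from candidate concepts.

    Assumed heuristic: a concept scores higher the more words it
    contains and the more often it occurs as an n-gram in the
    area's titles and abstracts.
    """
    words = re.findall(r"\w+", text.lower())
    max_n = max(len(c.split()) for c in concepts)
    # Count every word n-gram up to the length of the longest concept
    ngrams = Counter(
        " ".join(words[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(words) - n + 1)
    )
    scores = {c: len(c.split()) * ngrams[c.lower()] for c in concepts}
    return max(scores, key=scores.get)

# Hypothetical example: concepts as returned by the web services
text = "Adaptive hypermedia systems adapt their content. Adaptive hypermedia ..."
print(name_area(["Adaptive hypermedia", "Hypermedia", "Content"], text))
```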

Now I would like to use the concepts that I get back from the web services to show connections between research areas. Using semantic reasoning, it should be possible to expose overlapping research areas through their common concepts. The number of common concepts could be an indicator of the size of the overlap. If there were some kind of concept classification involved, it would also be possible to give hints not only about the extent of the overlap, but also about its nature. Are two areas using the same methods, or are they working on similar topics? Which areas have similar theoretical backgrounds, or similar problems to solve?
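As a first approximation, the size of the overlap could be computed directly from shared concepts. The sketch below uses the Jaccard index, which is my own choice for illustration and not a fixed part of the proposal:

```python
def concept_overlap(concepts_a, concepts_b):
    """Overlap between two research areas, measured as the Jaccard
    index of their concept sets (an assumed, deliberately simple choice)."""
    a, b = set(concepts_a), set(concepts_b)
    return len(a & b) / len(a | b), a & b

# Hypothetical concept sets for two areas
size, shared = concept_overlap(
    {"e-learning", "user modeling", "hypertext"},
    {"user modeling", "recommender systems", "hypertext"},
)
print(size, shared)  # 0.5 {'user modeling', 'hypertext'}
```

With a concept classification on top (e.g. methods vs. topics), the same computation could be run per concept type to hint at the nature of the overlap.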

This is still in a very early stage, but I am eager to get feedback on the idea. What do you think about involving semantics in that kind of way? What are other potential uses of a semantic description of research fields?

I haven’t blogged lately, mostly because I was busy moving to London. I will be with Mendeley for the next four months in the context of the Marie Curie project TEAM. My first week is over now, and I have already started to settle in thanks to the great folks at Mendeley, who have given me a very warm welcome!

My secondment at Mendeley will focus on visualizing research fields with the help of readership statistics. A while ago, I blogged about the potential that readership statistics have for mapping out scientific fields. While these thoughts were on a rather theoretical level, I have been taking a more practical look at the issue in the last few months. Together with Christian Körner, Kris Jack, and Michael Granitzer, I did an exploratory first study on the subject. This resulted in a paper entitled “Harnessing User Library Statistics for Research Evaluation and Knowledge Domain Visualization”, which I presented at the Large Scale Network Analysis Workshop at WWW’2012 in Lyon.

The problem

The problem that we started out with is the lack of overview of research fields. If you want to get an overview of a field, you usually go to an academic search engine and either type in a query or, if there has been some preselection, browse to the field of your choice. You will then be presented with a large number of papers. You usually pick the most popular overview article, read through it, browse the references, and look at the recommendations or incoming citations (if available). You choose which paper to read next and repeat. Over time, this strategy allows you to build a mental model of the field. Unfortunately, there are a few issues with this approach:

  • It is very slow.
  • You never know when you are finished. Even with the best search strategy, you might still have a blind spot.
  • Science and research are growing exponentially, making it very hard not only to get an overview, but also to keep it.

In come visualizations

Below you can see the visualization of the field of Technology Enhanced Learning developed in the exploratory study for the LSNA workshop. Here is how you read it: each bubble represents a paper, and the size of the bubble represents the number of readers. Each bubble is attributed to a research area denoted by a color – in this case either “Adaptive Hypermedia” (blue), “Game-based Learning” (red), or “Miscellaneous” (yellow). The closer two papers are in the visualization, the closer they are subject-wise. Moreover, the closer a paper is to the center of a field, the more central it is to that field. If you click on the visualization, you will get to an HTML5 version built with Google Charting Tools. In this interactive visualization, you can hover over a bubble to see the metadata of the paper.

Usually, visualizations like this one are based on citations. Small defined co-citation as a measure of subject similarity: the more often two authors or publications are referenced together in the same publication, the closer they are subject-wise. Using this measure in connection with multi-dimensional scaling and clustering, one can produce a visualization of a field. The co-citation measure is empirically well validated and has been used in hundreds, if not thousands, of studies. Unfortunately, there is a problem inherent to citations: they take a rather long time to appear. It takes three to five years before the number of incoming citations reaches its peak. Therefore, visualizations based on co-citations are actually a view of the past, and do not reflect recent developments in a field.
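For readers who want to see the classic pipeline spelled out, here is a minimal sketch in Python with scikit-learn. The toy data and the conversion from co-citation counts to distances are assumptions for illustration, not the exact procedure used in the studies mentioned above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import MDS

# Toy data: refs[i] is the set of papers cited by publication i
refs = [{"A", "B", "C"}, {"A", "B"}, {"B", "C", "D"}, {"C", "D"}]
papers = sorted(set().union(*refs))
n = len(papers)

# Co-citation counts: how often two papers are cited by the same publication
cocit = np.zeros((n, n))
for r in refs:
    idx = [papers.index(p) for p in r]
    for i in idx:
        for j in idx:
            if i != j:
                cocit[i, j] += 1

# Turn similarities into dissimilarities (one possible normalization)
dist = 1.0 - cocit / cocit.max()
np.fill_diagonal(dist, 0.0)

# Multi-dimensional scaling into the plane, then clustering into areas
coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(dist)
areas = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(coords)
print(dict(zip(papers, areas)))
```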

How to deal with the citation lag?

In the last few years, usage statistics have become a focus for research evaluation (see the altmetrics movement for example), and in some cases also for visualizations. Usage statistics were not available, at least not on a large scale, prior to the web and tools such as Mendeley. One of the advantages of usage statistics in comparison to citations is that they are available earlier. People can start reading a paper immediately after publication, and in the case of pre-prints even before that. The measure that I used to produce the visualization above is the co-occurrence of publications in Mendeley libraries. Much like two books that are often borrowed from a library together are likely to be about the same or a similar subject, the co-occurrence of two papers in user libraries is taken as a measure of subject similarity. I used the Technology Enhanced Learning thesaurus to derive all libraries from that field. I then selected the 25 most frequent papers, and calculated their co-occurrences.
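Here is a minimal sketch of the co-occurrence computation, assuming each library is given as a set of paper IDs (the toy data stands in for the Mendeley-internal data access, which is not shown):

```python
from collections import Counter
from itertools import combinations

# Toy data: each library is the set of papers one user has saved
libraries = [{"p1", "p2", "p3"}, {"p1", "p2"}, {"p2", "p4"}, {"p1", "p3"}]

# Keep only the most frequent papers (25 in the study; 3 for the toy data)
freq = Counter(p for lib in libraries for p in lib)
top = {p for p, _ in freq.most_common(3)}

# Count how often each pair of top papers co-occurs in a library
cooc = Counter()
for lib in libraries:
    for a, b in combinations(sorted(lib & top), 2):
        cooc[(a, b)] += 1

print(cooc)  # Counter({('p1', 'p2'): 2, ('p1', 'p3'): 2, ('p2', 'p3'): 1})
```

The resulting co-occurrence matrix can then feed the same scaling-and-clustering pipeline as the co-citation sketch above.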

As this was our first study, we limited ourselves to libraries from the field of computer science. As you can see, we were able to derive two areas pretty well: adaptive hypermedia, and game-based learning. Both are very important for the field. Adaptive hypermedia is a core topic of the field, especially with computer scientists; game-based learning is an area that has received a lot of attention in the last few years and continues to be of great interest for the community. You will also have noticed that there is a huge cluster labelled “Miscellaneous”. These papers could not be attributed to one research area. There are several possible reasons for this cluster: the most likely is that we did not have enough data. Another explanation is that Technology Enhanced Learning is still a growing field, with diverse foci, which results in a large cluster of different publications. Furthermore, we expect readership to be less focused than citations. On the one hand, this has the potential to show more influences on a field than citation data would; on the other hand, too little focus will result in fuzzy clusters. To clarify these points, I am currently looking at a larger dataset, which includes all disciplines related to TEL (such as pedagogy, psychology, and sociology). Moreover, I am keen to learn more about the motivation for adding papers to one’s library.

In my view, visualizations based on co-readership bear great potential. They could provide timely overviews and serve as navigational instruments. Furthermore, it would be interesting to take snapshots from time to time to see the development of a field over the years. Finally, such visualizations could be useful to shed light on interdisciplinary relations and topical overlaps between fields. These issues, and their relation to semantics, will be the topic of another blogpost though. For the time being, I am curious about your opinions on the matter. How do you see such visualizations? Could they be useful for your research? What would you like to be able to do with them in terms of features? I am looking forward to your opinions!

Citation
Peter Kraker, Christian Körner, Kris Jack, & Michael Granitzer (2012). Harnessing User Library Statistics for Research Evaluation and Knowledge Domain Visualization. Proceedings of the 21st International Conference Companion on World Wide Web, 1017–1024. DOI: 10.1145/2187980.2188236
