Tag Archives: TEAM

Photo by Cory Doctorow, slides by Lora Aroyo

I spent last week at Web Science 2013 in Paris. And what a well-spent week it was. Web Science was by far the most diverse conference I have ever attended. One reason for this diversity is that Web Science was co-located with CHI (Human-Computer Interaction) and Hypertext. But most importantly, the Web Science community itself is very diverse. There were more than 300 participants from a wide array of disciplines. The conference spanned talks from philosophy to computer science (and everything in between), with keynotes by Cory Doctorow and Vint Cerf. This resulted in many insightful discussions, looking at the web from a multitude of angles. I really enjoyed the wide variety of talks.

Nevertheless, there were some talks that failed to resonate with the audience. It seems to me that this was mostly because they were too rooted in a single discipline. Some presenters assumed a common understanding of the problem discussed and used a lot of domain-specific vocabulary, which made their talks hard to follow. Don’t get me wrong: most presenters tried to appeal to the whole audience, but with some subjects this seemed to be impossible.

To me, this shows that better insight is needed into what Web Science actually is, along with more discussion on what should be researched under this banner. There seems to be considerable uncertainty about this, which was also reflected in the peer reviews. Hugh Davis, the general chair of Websci’13, highlighted this in his opening speech.

I think that Web Science is a good example of where open peer review could contribute to a common understanding and better communication among the actors involved. I have been critical of open processes in the past because they take away the benefits of blinding. Mark Bernstein, the program chair, also stressed this point in a tweet.

Nowadays, however, I think that the potential benefits of open peer review (transparency, increased communication, incentives to write better reviews) outweigh the effects of taking away the anonymity of reviewers. Science will always be influenced by power structures, but with open peer review they are at least visible. Don’t get me wrong: I really like the inclusive approach to Web Science that the organizers have taken. The web cannot be understood through the paradigm of a single discipline, and at this point in time it is very valuable to get input from all sides of the discussion. In my opinion, open peer review could help facilitate this discussion before and after the conference as well.

Contributions

I made two contributions to this year’s Web Science conference. First, I presented a paper written together with Sebastian Dennerlein in the Social Theory for Web Science Workshop, entitled “Towards a Model of Interdisciplinary Teamwork for Web Science: What can Social Theory Contribute?”. In this position paper, we argue that social scientists and computer scientists do not work together in an interdisciplinary way due to fundamentally different approaches to research. We sketch a model of interdisciplinary teamwork in order to overcome this problem. The feedback on this talk was very interesting. On the one hand, participants could relate to the problem; on the other hand, they alerted us to many other influences on interdisciplinary teamwork. For one, there is often disagreement at the very beginning of a research project about what the problem actually is. Furthermore, the disciplines themselves are fragmented and often follow different paradigms. We will consider this feedback when specifying the formal model. You can find the paper here and the slides of my talk below.

In general, the workshop was very well attended, and there was a shared understanding of the opportunities and challenges of applying social theory in web science. All in all, I think that a community has been established that could produce interesting results in the future.

My second contribution was a poster with the title “Head Start: Improving Academic Literature Search with Overview Visualizations based on Readership Statistics”, which I co-wrote with Kris Jack, Christian Schlögl, Christoph Trattner, and Stefanie Lindstaedt. As you may recall, Head Start is an interactive visualization of the research field of Educational Technology based on co-readership structures. Head Start was received very positively. Many participants were interested in the idea of using readership statistics for mapping. There were some scientometricians, but also educational technologists, who expressed their interest. Many comments addressed how the prototype could be extended. You can find the paper at the end of the post and the poster below.

Head Start

Several participants noted that they would like to adapt and extend the visualization. Clare Hooper, for example, is working on a content-based representation of the field of Web Science, and it would be interesting to combine our approaches. This encouraged me even more to open-source the software as soon as possible.

All in all, it was a very enjoyable conference. I also like the way the organizers innovate on the format every year. The pecha kucha session worked especially well in my opinion, featuring concise and entertaining talks throughout. Thanks to all organizers, speakers, and participants for making this conference such a nice event!

Citation
Peter Kraker, Kris Jack, Christian Schlögl, Christoph Trattner, & Stefanie Lindstaedt (2013). Head Start: Improving Academic Literature Search with Overview Visualizations based on Readership Statistics. Web Science 2013.


Today, I am happy to announce that Head Start, my overview visualization of Educational Technology, has been released on Mendeley Labs. The visualization is intended to give researchers who are new to a field a head start on their literature review (hence the name).

When you fire up the visualization, the main areas in the field are shown, represented by the blue bubbles. Once you click on a bubble, you are presented with the main papers in that area. The dropdown on the right displays the same data in list form. By clicking on one of the papers, you can access all metadata for that paper. If a preview is available, you can retrieve it by clicking on the thumbnail in the metadata panel. By clicking on the white background, you can then zoom out and inspect another area. Go ahead and try it out!

The visualization was created with D3.js. It has been successfully tested with Chrome 22, Firefox 15, and Safari 5.1. Unfortunately, Internet Explorer and Opera are not supported at this point.

Head Start is based on readership co-occurrence. I wrote several blogposts in the past that describe the ideas behind it. You can find them here, here, and here. There is also a paper from the LSNA workshop, which I co-authored with Christian Körner, Kris Jack, and Michael Granitzer. I will write another post about the inner workings of the visualization in the near future. Until then, I am looking forward to your comments and feedback!

My secondment at Mendeley is drawing to a close. In the last few weeks, I have mainly been working on a running prototype of the overview visualizations of research fields. The idea is to give people who are new to a field a head start in their literature search. Below you can see a first screenshot of the prototype for the field of Educational Technology. The blue bubbles represent the different research areas in Educational Technology. Shining through these bubbles are the most important papers in those areas. The papers become fully visible upon zooming into an area, and you can even access the paper previews within the visualization. I hope to get the prototype up and running in the next two weeks, so you can explore the visualization for yourself.

And now for something completely different

Well, at least partially different. If you are a researcher, you will have encountered the following situation: you are working on a project when you suddenly find out that someone has done exactly the same thing as you, but 5(0) years earlier. Even though you have done an extensive literature review, that particular paper or project has escaped your search. There are certain reasons why something like that might happen. One is that terminology is very fluid in research; even within the same field, names can change pretty quickly. This is the reason why I rely on structural data (co-readership patterns) rather than textual data (e.g. keywords, titles, abstracts) to create my visualizations. Structural data has proven to be a lot more stable over time. This also mirrors how individual researchers work: they usually not only search for literature, but also follow references and incoming citations to find related research.

Another reason for previous research going unnoticed is that it was done in another community or research field. In that case, not only might the terminology be different, but the link between the two areas may be nonexistent. Ever since I started working on these visualizations, I have therefore wanted to show not only the borders of research areas, but also their overlaps.

A semantic approach

The idea to use semantics arose when I needed to come up with a mechanism for automatically naming the research areas. I ended up sending the abstracts and titles from each area to OpenCalais and Zemanta. Both services crawl the semantic web and return a number of concepts that describe the content. I use this information to find the most appropriate name for the area by comparing the concepts that I get back to word n-grams from the text. The more words a concept has, and the more often it occurs within the text, the more likely it is to be the name of the area.
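To make this concrete, here is a minimal sketch of such a naming heuristic in Python. It follows the description above – score each candidate concept by its word count times its frequency as an n-gram in the area’s titles and abstracts – but the exact weighting and all names are my assumptions for illustration, not the actual implementation:

```python
import re
from collections import Counter

def ngrams(tokens, n):
    """All word n-grams of length n from a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def name_area(text, concepts):
    """Pick the concept that best names a research area.

    Assumption: a concept scores by its word count times how often
    it appears as an n-gram in the area's titles and abstracts.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter()
    for n in range(1, 5):  # consider concepts of up to four words
        counts.update(ngrams(tokens, n))
    return max(concepts, key=lambda c: len(c.split()) * counts[c.lower()])

# Hypothetical usage, with concepts as a service might return them
text = "adaptive hypermedia systems adapt content to the learner"
print(name_area(text, ["Adaptive Hypermedia", "Hypermedia", "Learning"]))
```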

Now I would like to use the concepts that I get back from the web services to show connections between research areas. Using semantic reasoning, it should be possible to show overlapping research areas via common concepts. The number of common concepts could be an indicator of the size of the overlap. If there were some kind of concept classification involved, it would also be possible to give hints not only about the extent of the overlap, but also about its nature. Are two areas using the same methods, or are they working on similar topics? Which areas have similar theoretical backgrounds, or similar problems to solve?
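As a rough sketch of this idea – an illustration of the proposal, not an implemented feature – the overlap of two areas could be estimated from their concept sets, for instance with a Jaccard index. The concept lists below are invented:

```python
def concept_overlap(concepts_a, concepts_b):
    """Return the shared concepts of two areas and a Jaccard index
    as a rough indicator of the size of their overlap."""
    a, b = set(concepts_a), set(concepts_b)
    shared = a & b
    jaccard = len(shared) / len(a | b) if (a | b) else 0.0
    return shared, jaccard

# Invented concept lists for two hypothetical areas
shared, size = concept_overlap(
    ["e-learning", "adaptive hypermedia", "user modeling"],
    ["user modeling", "recommender systems", "personalization", "e-learning"],
)
print(shared, round(size, 2))  # {'user modeling', 'e-learning'} 0.4
```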

This is still in a very early stage, but I am eager to get feedback on the idea. What do you think about involving semantics in that kind of way? What are other potential uses of a semantic description of research fields?

Two weeks ago, I had the pleasure of moderating the Special Track on Research 2.0 at i-KNOW 2012. This was the third year that we had a Research 2.0-themed track at i-KNOW. Counting the two workshops at the European Conference on Technology Enhanced Learning (EC-TEL) in 2009 and 2010, it was the fifth installment of its kind. For the second time, the special track was a collaboration of the STELLAR Network of Excellence and the Marie Curie project TEAM.

In the latest edition, we had papers on a wide variety of subjects. The track was diverse not only in terms of topics, but also in terms of geography: the authors and speakers came from Belgium, Canada, the US, the UK, Germany, and Austria. At this point, I would like to thank our excellent program committee. These fine people not only helped us review the submissions, but also spread the word of the event far beyond our own circles. So what happened in the special track?

Session 1

Erik Duval (pictured above) from KU Leuven started the day with an inspiring keynote. He talked about analytics in research, and how we need tools to manage the exponential growth of knowledge. He presented several tools developed at KU Leuven, such as TiNYARM (This is Not Yet Another Reference Manager), a social reading application; More!, a mobile conference application; and visualizations of scientific papers for a tabletop. For all the details, check out his presentation on Slideshare.

Next up was Marta Sabou from MODUL University Vienna, who presented a paper on crowdsourcing in (NLP) research. She talked about different crowdsourcing approaches in research along the dimensions of time, genre, and steps in the research process. She then laid out how crowdsourcing has revolutionized NLP research in terms of cost and the diversification of the research agenda. She finished with the challenges that arise from this new research methodology. More can be found in her presentation on Slideshare.

In the following talk, Aida Gandara from the Cyber-ShARE Center of the University of Texas at El Paso presented an infrastructure that supports researchers in documenting and sharing scientific research on the semantic web. The infrastructure follows a methodology that emphasizes the need for documentation during the whole research process, not only when a research cycle has ended. It is currently used by a number of teams that collaboratively document their research in it.

Session 2

After the lunch break, Daniel Bahls continued on the notion of the semantic web; his paper, however, focuses on the field of economics. He discussed an approach for representing statistical data from empirical research in semantic form. Afterwards, he showed prototypes that build on this approach and allow for recommendations and user assistance in evaluating datasets. He concluded with the outlook that open data is a prerequisite for open science, but that methodological information is also needed in order to reproduce research results.

Orland Hoeber returned to the topic of analytics in the following talk. He presented a system for scholarly search that is enhanced by metadata visualization. Among these visualizations are a histogram of the most frequent keywords in the search results, as well as a bow-tie representation of citation metadata. The latter is an innovative way of conveying citation patterns to users. The focus is on integrating the visualizations into the search and exploration process as seamlessly as possible. For all details, see the paper.

The second session was concluded by two practical demos. First, Christian Voigt from the Centre for Social Innovation and Adam Cooper from the University of Bolton introduced a roadmapping methodology for Technology Enhanced Learning. They first talked about sources and frameworks for roadmapping, and then presented their approach, which is based on text mining of blogs from the field. For more information, see the demo paper. Afterwards, Karl Voit introduced Emacs org-mode, which enables researchers to include raw data and data analysis in a research paper. This allows for the reproducibility of research results. An example can be found in his demo paper.

Session 3

After the coffee break, Hermann Maurer gave an invited talk on one of the first open access journals, J.UCS – Journal of Universal Computer Science. J.UCS was established in 1994; today it is in its 18th volume, attracting 700,000 article downloads per year. The journal builds on a Hyperwave server, which enables several unique features such as automatic detection of incoming citations and a refereeing system that includes the whole editorial board. If you want to know more about this system and its history, take a look at Hermann Maurer’s presentation.

In the general discussion, open access was one of the main topics. It was debated how the process of opening up research could be facilitated, with regard to both technical and social issues. One of the conclusions was that open access is a necessary step, but that it can only be the beginning of what we call Research 2.0.

In my opinion, the special track gave good insight into what is possible when science opens up and embraces the possibilities of the web. I would like to thank all authors, reviewers, and participants, as well as my co-organizers Roman Kern (Know-Center) and Kris Jack (Mendeley), for making this event possible. See you online and at the next Research 2.0 event!

I haven’t blogged lately, mostly because I was busy moving to London. I will be with Mendeley for the next four months in the context of the Marie Curie project TEAM. My first week is over now, and I have already started to settle in, thanks to the great folks at Mendeley, who have given me a very warm welcome!

My secondment at Mendeley will focus on visualizing research fields with the help of readership statistics. A while ago, I blogged about the potential that readership statistics have for mapping out scientific fields. While these thoughts were on a rather theoretical level, I have been taking a more practical look at the issue in the last few months. Together with Christian Körner, Kris Jack, and Michael Granitzer, I did an exploratory first study on the subject. This resulted in a paper entitled “Harnessing Usage Statistics for Research Evaluation and Knowledge Domain Visualization” which I presented at the Large Scale Network Analysis Workshop at WWW’2012 in Lyon.

The problem

The problem that we started out with is the lack of overview of research fields. If you want to get an overview of a field, you usually go to an academic search engine and either type in a query or, if there has been some preselection, browse to the field of your choice. You are then presented with a large number of papers. You usually pick the most popular overview article, read through it, browse the references, and look at the recommendations or incoming citations (if available). You choose which paper to read next, and repeat. Over time, this strategy allows you to build a mental model of the field. Unfortunately, there are a few issues with this approach:

  • It is very slow.
  • You never know when you are finished. Even with the best search strategy, you might still have a blind spot.
  • Science and research are growing exponentially, making it very hard not only to get an overview, but also to keep it.

In come visualizations

Below you can see the visualization of the field of Technology Enhanced Learning developed in the exploratory study for the LSNA workshop. Here is how you read it: each bubble represents a paper, and the size of the bubble represents the number of readers. Each bubble is attributed to a research area denoted by a color – in this case either “Adaptive Hypermedia” (blue), “Game-based Learning” (red), or “Miscellaneous” (yellow). The closer two papers are in the visualization, the closer they are subject-wise. Moreover, the closer a paper is to the center of a field, the more central it is to that field. If you click on the visualization, you will get to an HTML5 version built with Google Charting Tools. In this interactive visualization, you can hover over a bubble to see the metadata of the paper.

Usually, visualizations like this one are based on citations. Henry Small defined co-citation as a measure of subject similarity: the more often two authors or publications are referenced together in the same publication, the closer they are subject-wise. Using this measure in connection with multidimensional scaling and clustering, one can produce a visualization of a field. The co-citation measure is empirically well validated and has been used in hundreds, if not thousands, of studies. Unfortunately, there is a problem inherent to citations: they take a rather long time to appear. It takes three to five years before the number of incoming citations reaches its peak. Therefore, visualizations based on co-citations are actually a view of the past, and do not reflect recent developments in a field.
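To sketch the general pipeline behind such maps (not the exact code used for the visualization above), a co-occurrence matrix can be projected onto a plane with multidimensional scaling and then clustered into areas. The toy matrix below is invented:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import MDS

# Toy co-occurrence matrix between four papers: entry (i, j) counts how
# often papers i and j are cited (or read) together. Invented numbers.
cooc = np.array([
    [0, 8, 1, 0],
    [8, 0, 2, 1],
    [1, 2, 0, 7],
    [0, 1, 7, 0],
])

# Turn similarities into dissimilarities so that frequently
# co-occurring papers end up close together on the map.
dist = cooc.max() - cooc
np.fill_diagonal(dist, 0)

# Multidimensional scaling: project papers onto a 2D plane while
# preserving the pairwise distances as well as possible.
coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(dist)

# Clustering: group the projected papers into research areas.
areas = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(coords)
print(coords)
print(areas)  # e.g. [0 0 1 1]: two areas of two papers each
```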

How to deal with the citation lag?

In the last few years, usage statistics have become a focus for research evaluation (see the altmetrics movement, for example), and in some cases also for visualizations. Usage statistics were not available, at least not on a large scale, prior to the web and tools such as Mendeley. One of the advantages of usage statistics in comparison to citations is that they are available earlier. People can start reading a paper immediately after publication, and in the case of pre-prints even before that. The measure that I used to produce the visualization above is the co-occurrence of publications in Mendeley libraries. Much like two books that are often borrowed from a library together are likely to be on the same or a similar subject, co-occurrence in libraries is taken as a measure of subject similarity. I used the Technology Enhanced Learning thesaurus to derive all libraries from that field. I then selected the 25 most frequently occurring papers and calculated their co-occurrences.
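A minimal sketch of this co-occurrence computation, with invented library data standing in for the real Mendeley dataset:

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: each Mendeley library is a set of paper IDs.
libraries = [
    {"p1", "p2", "p3"},
    {"p2", "p3"},
    {"p1", "p3", "p4"},
]

# Papers ranked by how many libraries contain them; the study kept the
# 25 most frequent ones (here: top 3 of the toy data).
freq = Counter(p for lib in libraries for p in lib)
top = {p for p, _ in freq.most_common(3)}

# Count how often each pair of top papers occurs in the same library.
cooc = Counter()
for lib in libraries:
    for a, b in combinations(sorted(lib & top), 2):
        cooc[(a, b)] += 1

print(cooc)  # e.g. Counter({('p1', 'p3'): 2, ('p2', 'p3'): 2, ('p1', 'p2'): 1})
```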

As this was our first study, we limited ourselves to libraries from the field of computer science. As you can see, we were able to derive two areas pretty well: adaptive hypermedia and game-based learning. Both are very important for the field. Adaptive hypermedia is a core topic, especially among computer scientists; game-based learning is an area that has received a lot of attention in the last few years and continues to be of great interest to the community. You will also have noticed that there is a huge cluster labelled “Miscellaneous”. These papers could not be attributed to one research area. There are several possible reasons for this cluster; the most likely is that we did not have enough data. Another explanation is that Technology Enhanced Learning is still a growing field with diverse foci, which results in a large cluster of different publications. Furthermore, we expect readership to be less focused than citations. On the one hand, this makes it possible to show more influences on a field than citation data would; on the other hand, too little focus will result in fuzzy clusters. To clarify these points, I am currently looking at a larger dataset, which includes all disciplines related to TEL (such as pedagogy, psychology, and sociology). Moreover, I am keen to learn more about people’s motivation for adding papers to their libraries.

In my view, visualizations based on co-readership bear great potential. They could provide timely overviews and serve as navigational instruments. Furthermore, it would be interesting to take snapshots from time to time to see the development of a field over the years. Finally, such visualizations could be useful to shed light on interdisciplinary relations and topical overlap between fields. These issues, and their relation to semantics, will be the topic of another blogpost though. For the time being, I am curious about your opinions on the matter. How do you see visualizations? Could they be useful for your research? What would you like to be able to do with them in terms of features? I am looking forward to your opinions!

Citation
Peter Kraker, Christian Körner, Kris Jack, & Michael Granitzer (2012). Harnessing User Library Statistics for Research Evaluation and Knowledge Domain Visualization. Proceedings of the 21st International Conference Companion on World Wide Web, 1017–1024. DOI: 10.1145/2187980.2188236

Recently, I joined the TEAM project, which focuses on research networks on the web. The project deals with issues like recommendation, text disambiguation, and metadata validation. In my part, I will do something a bit different: I will take a look at how a research field is represented in such research networks.

So far, academic fields have been analyzed using the metadata that comes with published articles. That includes, amongst others, co-authorship, categories, keywords, and, most importantly, citations. With this kind of metadata, it is possible to map out a research field from the position of the authors. Now, employing user-generated data from research networks, it is possible to look at a field from a whole new viewpoint: that of the reader.

You might ask: why is that interesting? Well, metadata from articles only ever gives you one part of the story. Co-citation and co-authorship analysis surely are sound ways to look at a field; but what if there are two groups of authors in different fields working on the same topic who just never publish together and never cite each other? In that case, you will not get the connection between them. Most probably they will be using different language, so text analysis won’t help either. In come the readers: they might have identified that the authors are working on the same topic despite all the issues mentioned above. Furthermore, they might have grouped them together or used the same tags to describe their articles. If we analyze these groups and tags, we can find the connection, thus extending the field beyond its original borders, as the sketch below illustrates.
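Here is a minimal sketch of that idea with invented articles and reader tags: two papers from disjoint author communities are linked purely because readers describe them with overlapping tags.

```python
# Invented reader data: the tags readers have applied to each article.
tags = {
    "article_A": {"diffusion", "network analysis", "social networks"},
    "article_B": {"diffusion", "network analysis", "epidemics"},
}

def reader_link(a, b, threshold=2):
    """Connect two articles if readers describe them with enough shared tags."""
    return len(tags[a] & tags[b]) >= threshold

# True: readers reveal a connection that citations and co-authorship miss.
print(reader_link("article_A", "article_B"))
```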

That is not all; other interesting questions include: How are articles shared among researchers, and what does that say about interdisciplinarity in a field? Are the articles that are often read the ones that are often cited? As you can see, I am pretty enthusiastic about this. I could go on why I think that readership analysis is a good idea, but I am more inclined to get some early feedback: What other issues would be interesting to look at? What problems do you see with this kind of analysis?
