Network model, or network pattern? Image by GustavoG

Update: Sebastian Dennerlein and I have written a paper entitled “Towards a Model of Interdisciplinary Teamwork for Web Science: What can Social Theory Contribute?” which includes the models vs. patterns problem. The paper has been accepted for the Web Science 2013 Workshop: “Harnessing the Power of Social Theory for Web Science”. You can download it here.

Scientific disciplines are curious formations. Each discipline has its own culture of doing things. Kuhn called these cultures “paradigms” – a combination of assumptions, theories, and methods that guide research in a discipline. Sometimes a new set of problems arises that cannot be answered within the standard paradigm of a single discipline. Instead, these problems require knowledge from different disciplines. In education, for example, it became apparent that learning might benefit from integrating technology. When the world wide web became social, studying people’s online behavior became as interesting as building the infrastructure that allows these interactions to take place.

As a result, even more curious formations emerge – interdisciplinary fields. Suddenly, learning scientists need to work with computer scientists; computer scientists need to work with social scientists; social scientists need to work with jurists – and sometimes they all need to work together. As you can imagine, when people from different scientific cultures need to talk to each other, they have trouble understanding each other. They come from different backgrounds and have different vocabularies and methodologies.

Examples of interdisciplinary fields are educational technology and web science. In both fields, computer scientists meet social scientists in the wider sense (sociologists, psychologists, learning scientists). And from my point of view, both fields suffer from the same fundamental problem – one that goes beyond simple terminology or methodology. I call it the models vs. patterns problem.

Patterns and models – or computer science vs. social science

It struck me some time ago in a lecture on Knowledge Discovery in Databases by Brano Markić from the University of Mostar. He introduced knowledge discovery as defined by Fayyad et al. (1996): the goal of knowledge discovery is to find new, valid, useful, and understandable patterns in data. Fayyad et al. use patterns and models synonymously, but Markić made a very interesting distinction: models are like the general equation of a line, y = a + bx, while patterns are like a specific equation, e.g. y = 5 + 2x. Fayyad et al. also describe the knowledge discovery process: after data selection and preprocessing, you apply some sort of data mining method (e.g. clustering or machine learning). The output of the data mining step is the aforementioned patterns. In the next step, you evaluate the patterns and thus gain knowledge.
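To make the distinction concrete, here is a minimal sketch in Python (my own illustration using NumPy, not from Markić’s lecture): the model is the general form y = a + bx with open parameters, and fitting it to data produces a pattern, i.e. a specific equation such as y = 5 + 2x.

```python
import numpy as np

# Toy data generated from y = 5 + 2x with a little noise (hypothetical).
rng = np.random.default_rng(42)
x = np.linspace(0, 10, 50)
y = 5 + 2 * x + rng.normal(0, 0.5, size=x.size)

# The model: y = a + b*x, with a and b left open.
# Fitting the model to the data fills in the parameters -- the result
# is a pattern in Markic's sense.
b, a = np.polyfit(x, y, deg=1)  # coefficients, highest power first
print(f"pattern: y = {a:.2f} + {b:.2f}x")  # approximately y = 5 + 2x
```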

Then I realized: this is not only the knowledge discovery process. This is the way a lot of computer scientists do research. Starting from a certain problem, they try to find patterns that relate to that problem in a big dataset. There is a certain caveat to that definition of knowledge, and Fayyad and his colleagues make it very clear: “[..] knowledge in this definition is purely user oriented and domain specific and is determined by whatever functions and thresholds the user chooses.” While this might be fine for practical problems, it surely isn’t for scientific ones. This definition of knowledge excludes any generalization of results beyond the specific situation and the specific user.
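As a rough sketch of the knowledge discovery process described above (my own toy example using scikit-learn; the data and parameter choices are assumptions, not Fayyad et al.’s): select data, preprocess it, mine it, and evaluate the resulting patterns.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Selection: a hypothetical dataset of user features (two groups).
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])

# Preprocessing: bring the features onto a common scale.
scaled = StandardScaler().fit_transform(data)

# Data mining: clustering produces the "patterns".
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scaled)

# Evaluation: a score the analyst interprets to decide whether the
# patterns count as "knowledge" -- exactly the user-dependent step
# the definition above hinges on.
print("silhouette:", silhouette_score(scaled, labels))
```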

Now, don’t get me wrong: I do not claim that computer scientists produce useless results. Computer scientists have developed good ways to identify reliable patterns that are independent of user and situation. But a lot of these patterns are hard to interpret. Say you wanted to know which Twitter users are more likely to talk to strangers, and through various analyses you find that it is the ones who mention significantly more color names in their tweets. This might be a very stable pattern in the sense described before, but how do you interpret this result? This is when computer scientists turn to social scientists in order to find answers to their questions.
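To illustrate what such a pattern looks like in practice, here is a toy sketch with entirely made-up data; the point is that the correlation is trivial to compute and may be perfectly stable, while the code says nothing about why it holds.

```python
import numpy as np

rng = np.random.default_rng(7)
# Hypothetical per-user counts of color words in tweets...
color_mentions = rng.poisson(3, size=200)
# ...and a made-up "talks to strangers" score that depends on them.
strangers_score = 0.4 * color_mentions + rng.normal(0, 1, size=200)

r = np.corrcoef(color_mentions, strangers_score)[0, 1]
print(f"correlation: {r:.2f}")  # a stable pattern, but uninterpreted
```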

Social scientists, however, have a fundamentally different way of approaching a problem. Let’s take the question of which users are more likely to talk to strangers. Usually, social scientists first turn to theories to see which one might be applicable to the problem area. They might choose social information processing theory, which deals with how people get to know each other online. Based on this theory, they come up with a general model or hypotheses that describe the problem. Afterwards, they build an instrument to test the model, such as a survey, an interview, or an observation. In the end, they know whether the model has survived this specific test (or they adapt the model to the results – but no one would do that, of course). The usual problem is that, due to smaller sample sizes, it is unclear to what extent the results can be generalized. That is when social scientists turn to computer scientists, who can seemingly provide access to larger datasets.
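The mirror image of the sketch above, again with made-up numbers: a theory-derived hypothesis tested on a small survey sample with a standard significance test. The result is interpretable in terms of the theory, but hard to generalize from n = 40.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical survey scores on "willingness to talk to strangers"
# for users high vs. low in self-disclosure (n = 20 per group).
high_disclosure = rng.normal(3.8, 0.9, size=20)
low_disclosure = rng.normal(3.2, 0.9, size=20)

# Test the theory-derived hypothesis that the groups differ.
t, p = stats.ttest_ind(high_disclosure, low_disclosure)
print(f"t = {t:.2f}, p = {p:.3f}")
```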

This is when the confusion begins: social scientists disregard computer scientists’ results because they are not grounded in theory. Computer scientists disregard social scientists’ results because they are not based on big datasets. Social scientists cannot interpret computer scientists’ results because they are often on a level that is not covered by traditional theories and models. Computer scientists cannot test social scientists’ models because they often do not have the data in the form that is required by the models.

Overcoming the problem

In my opinion, it is important for interdisciplinary fields to close the gap that results from the models vs. patterns problem. Otherwise, the different disciplines cannot work together as effectively as they potentially could. On the more pattern-oriented side, it would be important to understand that theories are more than just castles in the sky: baked into research as guiding principles, they can be effective tools for interpreting results. On the more theory-oriented side, researchers need to understand that data mining methods can be useful for evaluating models, but that their properties need to be considered as early as model building. In that way, both sides could build on each other’s strengths – instead of looking suspiciously at each other’s results.

What do you think? Am I oversimplifying here? What are the biggest challenges in interdisciplinary research from your perspective?

Thanks to Sebastian Dennerlein for valuable feedback on this post!

Citation
Peter Kraker & Sebastian Dennerlein (2013). Towards a Model of Interdisciplinary Teamwork for Web Science: What can Social Theory Contribute? Web Science 2013 Workshop: Harnessing the Power of Social Theory for Web Science.

This week I saw a presentation by David Lowe from the University of Technology, Sydney on the Australian Labshare project. In this project, they are developing remote labs: laboratories that can be operated over the internet.

Unfortunately, I was not able to see the demo of the software (check it out – it is called Sahara and you can find it on SourceForge), but as far as I understood it, the process is as follows: you can choose from a range of experiments in every lab. Once you have found an interesting one, you can fiddle with the settings and subsequently run it. In the process, you get visual feedback from a camera. Afterwards, you are presented with the data from the experiment in the form of sketches and numbers.

At the moment, they are using it mainly for educational purposes. There was a long discussion after the presentation about whether real labs could largely be replaced with simulations. This is an interesting topic, and it sparked a lot of controversy, but I was more interested in “doing research with remote labs”. I am not a natural scientist, but as far as I can see, remote experiments would make it a lot easier to write protocols and keep open lab books like those on OpenWetWare. The software records your settings as well as your results, so you would only have to fill in the rationale between the experiments; a sketch of what such an entry might look like follows below.
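Here is a purely hypothetical sketch of such a protocol entry, combining the rig’s recorded settings and results with a hand-written rationale (the field names and the experiment are my invention; Sahara’s actual data model may differ).

```python
import json
from datetime import datetime, timezone

# Settings and results as the remote lab software might record them,
# plus the one field the researcher fills in by hand: the rationale.
entry = {
    "experiment": "pendulum-damping",  # hypothetical rig
    "run_at": datetime.now(timezone.utc).isoformat(),
    "settings": {"mass_g": 100, "amplitude_deg": 15},          # recorded
    "results": {"period_s": 1.42, "raw_data": "run_042.csv"},  # recorded
    "rationale": "Check whether halving the mass changes the period.",
}

print(json.dumps(entry, indent=2))
```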

Apart from the set-up and the data, you would also be able to share something even more valuable: the whole experiment! I mean this in the sense that everyone would be able to have the same experience as the original researcher. This naturally includes the recordings, but it extends even beyond that. You could provide others with the exact same set-up in the exact same lab, so that they can reproduce the experiment from beginning to end.

I am aware that there are certain challenges along the way: experiments in research are most probably more complicated and need more variation than those intended for education. Still, I am very intrigued by the idea. I would love to hear your opinions on this (especially from people in the natural sciences), and I will definitely follow the Labshare project to see what they come up with in this area.

If you have followed me on Twitter lately, you could not help but notice that I am conducting a series of focus groups on Web 2.0 practices among researchers. In the last group, we did not get to discuss a very interesting topic – career planning – and participants were quite eager to talk about it in a follow-up. Since a second face-to-face meeting was not feasible, I decided to set up a Google Wave for this purpose. I had not used Wave much before, but I considered it to be especially suitable in this case, primarily for three reasons:

  • Wave allows you to have a structured discussion with different “threads” (indented replies). Thus, I could ask multiple initial questions without losing the overview or confusing participants.
  • Wave bridges the gap between synchronous and asynchronous communication. You can have IM-like chat, because you see everything a person types, but the posts are all persisted in one place, and participants are notified of new content in the wave.
  • Wave has a number of extensions which add a lot of functionality that is interesting for a group discussion. There is a voting extension, a mindmapping tool, and many others (unfortunately, I was not able to use extensions – see below for more details).

Getting people into Wave

Since most of the participants had not used Wave before, I had to invite them. A superfluous step since last week, I know, but it was quite challenging: I did not get a notification when someone had accepted my invitation, and not everyone who accepted was shown in my contacts. Moreover, I had trouble finding people on Google Wave without their exact Wave or Gmail address. These issues may have been fixed by now, though.

Once I got everyone onto the wave, however, the problems stopped and the discussion started to flow. Wave is pretty much self-explanatory. I had posted all the necessary visualizations from the discussion (mostly flipcharts) and three blips (Google lingo for posts) with initial questions, to which I asked people to provide indented replies. An interesting side note is that you can make an indented reply to every blip but the last one in a thread. This is done by hovering over the bottom border and clicking on “Insert reply here”. On the last blip, though, the same procedure only gives you “Continue this thread” (or “Click here to reply”), which generates a reply on the same level as the preceding blip – unless, of course, you choose “Indented reply” from a dropdown menu within the blip. This confused all participants, and even the more experienced ones consistently failed to provide indented replies to the last blip in a thread, which disrupted the structure of the discussion a bit.

Even though the wave did not grow exceptionally large (about thirty blips containing mostly text), the application would occasionally get unbearably slow. Apart from that, the discussion went pretty well; posts ranged from longer contributions of a few paragraphs to chat-like comments such as “I agree”. Notifications about new posts arrived rather reliably, listing the wave(s) that had been updated together with a supposed excerpt from the newest post. Most of the time, however, this excerpt contained text from a post that I had already read.

Extensions and data in the wave

One of the main drawbacks for me was the fact that I could not use bots and extensions in the wave. Why? The reason is simple: I had promised participants that all their contributions would remain with me, and that I would only release anonymised transcripts to third parties with their explicit consent. However, if you add gadgets or bots to a wave, they can read the entire content of the wave (as detailed in Google’s privacy policy for Wave). Since most of the extensions are not developed by Google, I would have had to check every provider’s privacy policy, which (1) does not exist most of the time, and (2) even if it does, I would still have to decide whether I can trust the provider.

This also applies to the “Ferry” extension, which exports waves to Google Docs. Consequently, I had to manually copy and paste the content into a Word file and restore all the formatting that was lost, e.g. bullet points and indentations. This was still a lot faster than transcribing the same content from video, but a native Wave export would be appreciated to overcome this problem.

Conclusion

All in all, I would say the experience was quite enjoyable. I would use Wave for a group discussion of this size again any day, but I would be more wary of using it for a whole focus group or for discussions with more participants and longer threads. In those cases, the issues of indented replies and the missing native export would matter more.