The models vs. patterns problem

Network model, or network pattern? Image by GustavoG

Update: Sebastian Dennerlein and I have written a paper entitled “Towards a Model of Interdisciplinary Teamwork for Web Science: What can Social Theory Contribute?” which includes the patterns vs. models problem. The paper has been accepted for the Web Science 2013 Workshop: “Harnessing the Power of Social Theory for Web Science”. You can download it here.

Scientific disciplines are curious formations. Each discipline has its own culture of doing things. Kuhn called these cultures “paradigms” – a combination of assumptions, theories, and methods that guide research in a discipline. Sometimes a new set of problems arises that cannot be answered with the standard paradigm of a single discipline. Instead, these problems require knowledge from different disciplines. In education, for example, it became apparent that learning might benefit from integrating technology. When the World Wide Web became social, studying people’s online behavior became as interesting as building the infrastructure that allows these interactions to take place.

As a result, even more curious formations emerge: interdisciplinary fields. Suddenly, learning scientists need to work with computer scientists; computer scientists need to work with social scientists; social scientists need to work with jurists – and sometimes they all need to work together. As you can imagine, when people from different scientific cultures need to talk to each other, they have trouble understanding each other. They come from different backgrounds and have different vocabularies and different methodologies.

Examples of interdisciplinary fields are educational technology and web science. In both fields, computer scientists meet social scientists in a wider sense (sociologists, psychologists, learning scientists). And from my point of view, both fields suffer from the same fundamental problem, one that goes beyond simple terminology or methodology. I call it the models vs. patterns problem.

Patterns and models – or computer science vs. social science

It struck me some time ago in a lecture on Knowledge Discovery in Databases by Brano Markić from the University of Mostar. He introduced knowledge discovery as defined by Fayyad et al. (1996): the goal of knowledge discovery is to find new, valid, useful, and understandable patterns in data. Fayyad et al. use patterns and models synonymously, but Markić made a very interesting distinction: models are like the general equation of a line, y = a + bx, while patterns are like a specific equation, e.g. y = 5 + 2x. Fayyad et al. also describe the knowledge discovery process: after data selection and preprocessing, you apply some sort of data mining method (e.g. clustering or machine learning). The output of the data mining step is the aforementioned patterns. In the next step, you evaluate the patterns and thus gain knowledge.
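To make Markić's distinction a bit more tangible, here is a minimal Python sketch. It is only an illustration under my own assumptions: the names LinearPattern and fit_line are made up for this example, the data points are hypothetical, and ordinary least squares simply stands in for "some sort of data mining method". The model is the general form y = a + bx that the fitting function assumes; the pattern is the specific pair of values for a and b that comes out of the data.

```python
from dataclasses import dataclass


@dataclass
class LinearPattern:
    """A pattern: the general model y = a + b*x with concrete values filled in."""
    a: float  # intercept
    b: float  # slope

    def predict(self, x: float) -> float:
        return self.a + self.b * x


def fit_line(xs, ys):
    """Ordinary least squares fit. The model y = a + b*x is assumed up front;
    the data only determine the specific values of a and b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return LinearPattern(a=a, b=b)


# Hypothetical data, generated exactly from y = 5 + 2x for illustration
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [5.0 + 2.0 * x for x in xs]

pattern = fit_line(xs, ys)
print(pattern)                # the specific pattern recovered from the data
print(pattern.predict(10.0))  # applying that pattern to a new x
```

Note that the choice of a straight line is made before any data are seen. A different dataset would yield a different pattern, but the model would stay the same.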

Then I realized: this is not only the knowledge discovery process. This is the way that a lot of computer scientists do research. Starting from a certain problem, they try to find patterns that relate to that problem in a big dataset. There is a certain caveat to that definition of knowledge, and Fayyad and his colleagues make it very clear: “[...] knowledge in this definition is purely user oriented and domain specific and is determined by whatever functions and thresholds the user chooses.” While this might be fine for practical problems, it surely isn’t for scientific ones. This definition of knowledge excludes any generalization of results that goes beyond the specific situation and the specific user.

Now, don’t get me wrong: I do not claim that computer scientists produce useless results. Computer scientists have developed good ways to identify reliable patterns that are independent of user and situation. But a lot of these patterns are hard to interpret. Say you wanted to know which Twitter users are more likely to talk to strangers, and through various analyses you find that they are the ones who mention significantly more names of colors in their tweets. This might be a very stable pattern in the sense described before, but how do you interpret these results? This is when computer scientists turn to social scientists in order to find answers to their questions.

Social scientists, however, have a fundamentally different way of approaching a problem. Let’s take the problem of which users are more likely to talk to strangers. Usually social scientists first turn to theories, in order to see which one might be applicable to the problem area. They might choose social information processing theory, which deals with how people get to know each other online. Then they come up with a general model or a set of hypotheses, based on this theory, that describes the problem. Afterwards, they build an instrument to test this model, such as a survey, an interview, or an observation. In the end, they know whether the model has survived this specific test (or, they adapt the model to the results – but no one would do that of course). The usual problem is that due to smaller sample sizes it is unclear to what extent the results can be generalized. That is when social scientists turn to computer scientists who can seemingly provide access to larger datasets.

This is when the confusion begins: social scientists disregard computer scientists’ results because they are not grounded in theory. Computer scientists disregard social scientists’ results because they are not based on big datasets. Social scientists cannot interpret computer scientists’ results because they are often on a level that is not covered by traditional theories and models. Computer scientists cannot test social scientists’ models because they often do not have the data in the form that is required by the models.

Overcoming the problem

In my opinion, it is important for interdisciplinary fields to close the gap that results from the models vs. patterns problem. Otherwise, the different disciplines cannot work together as effectively as they potentially could. On the more pattern-oriented side, it would be important to understand that theories are more than just castles in the sky. Theory should be baked into research as a guiding principle, so that researchers are able to interpret the patterns they find. On the more theory-oriented side, researchers need to understand that data mining methods can be useful to evaluate models, but that the properties of these methods need to be considered already when building the models. In that way, both sides could build on each other’s strengths – instead of looking suspiciously at each other’s results.

What do you think? Am I oversimplifying here? What are the biggest challenges in interdisciplinary research from your perspective?

Thanks to Sebastian Dennerlein for valuable feedback on this post!

Citation
Peter Kraker, & Sebastian Dennerlein (2013). Towards a Model of Interdisciplinary Teamwork for Web Science: What can Social Theory Contribute? Web Science 2013 Workshop: Harnessing the Power of Social Theory for Web Science

2 comments
  1. Hm, so, this is not really a comment on challenges but on the patterns vs. models part of the post: Scott E. Page has a section in his model thinking course ( https://www.coursera.org/course/modelthinking ) he calls “The big coefficient vs. New Realities”. He argues that while statistical analysis of patterns is a wonderful tool to analyse the current reality, it is not suited to simulating a changed system, because that data is not there. So if you want to simulate a changed reality you need a model. This thinking has a nice engineering aspect, I think.

    But that also leads to a motivation for why we really should aim to overcome that gap: as researchers we should not only understand why a problem or property exists, but also help policy makers to overcome problems or use that system’s property at its best, and for future use, you have no present data ;).

  2. Interesting point about predictions – theories and models are of course instrumental in that. We can observe that an apple falls to the ground every time we let it go, but we can never be sure about future cases (the fundamental problem of induction). It is the theory of gravitation that lets us safely assume that the apple will fall down tomorrow as well.

    Note that predictive modelling comes with its own set of problems though, and they get even worse when humans are involved. Future share prices are a well-known example. As Niels Bohr once said: “Prediction is very difficult, especially about the future”.
