Note: This is a reblog from the OKFN Science Blog. To my excitement and delight, I was recently awarded a Panton Fellowship. As part of my duties, I will be blogging there regularly about my activities concerning open data and open science.

Peter Kraker at Barcamp Graz 2012. Photo by Rene Kaiser


Hi, my name is Peter Kraker and I am one of the new Panton Fellows. After an exciting week at OKCon, I was asked to introduce myself and what I want to achieve during my fellowship, which I am very happy to do. I am a research assistant at the Know-Center of Graz University of Technology and a late-stage PhD student at the University of Graz. Like many others, I believe that an open approach is essential for science and research to make progress. Open science to me is about the reproducibility and comparability of scientific output. Research data should therefore be put into the public domain, as called for in the Panton Principles.

In my PhD, I am looking into research practices on the web and how academic literature search can be improved with overview visualizations. I have developed and open-sourced a knowledge domain visualization called Head Start. Head Start is based on altmetrics data rather than citation data. Altmetrics are indicators of scholarly activity and impact on the web; have a look at the altmetrics manifesto for a thorough introduction.

In my evaluation of Head Start, I noticed that altmetrics are prone to sample biases. It is therefore important that analyses based on altmetrics are transparent and reproducible, and that the underlying data is openly available. Contributing to open and transparent altmetrics will be my first objective as a Panton Fellow. I will establish an altmetrics data repository for the upcoming open access journal European Information Science. This will allow the information science community to analyse the field based on this data, and provide an additional data source for the growing altmetrics community. My vision is that in the long run, altmetrics will not only help us to evaluate science, but also to connect researchers around the world.

My second objective as a Panton Fellow is to promote open science based on an inclusive approach. The case of the Bermuda Rules, which state that DNA sequences should be rapidly released into the public domain, has shown that open practices can be established if the community stands together. In my opinion, it is therefore necessary to get as many researchers on board as possible. From a community perspective, it is the commitment to openness that matters, and the willingness to promote this openness. The inclusive approach puts the researcher in his or her many roles at the center of attention. This approach is not intended to replace existing initiatives but to make researchers aware of these initiatives and to help them choose their own approach to open science. You can find more on that on my blog.

Locally, I will be working with the Austrian Chapter of the Open Knowledge Foundation to promote open science based on this inclusive approach. Together with the Austrian Students’ Union, we will hold workshops with students, faculty, and librarians. I will also make the case for open science in the research communities that I am involved in. For the International Journal on Technology Enhanced Learning, for example, I will develop an open data policy.

I am very honored to be selected as a Panton Fellow, and I am excited to get started. If you want to work with me on one or the other objective, please do not hesitate to contact me. You can also follow my work on Twitter and on my blog. Looking forward to furthering the cause of open data and open science with you!

Image by Alan Cleaver

I am usually not a fast blogger. This post, however, has been rather long in the making, even by my standards. I first started to explore the topic of post privacy – i.e. the notion that the (almost) total loss of privacy is inevitable – in 2010. My interest was based on two observations. But before we get to these observations, let’s look at the term privacy first.

Defining privacy

Recently, there was an interesting discussion on the W3C mailing list about the definition of privacy. It quickly emerged that data protection and confidentiality (“the right to be left alone”) are two important concepts in that context. But as Kasey Chappelle put it, privacy is more than that. He defined privacy as informational self-determination: the individual right to decide which information is shared about oneself and under what circumstances. On top of that, I would add Seda Gürses’s definition of privacy as a practice: not only can the individual decide on the use of personal information, there is also a social convention on what is acceptable and what is not. This convention is fluid and subject to an ongoing social negotiation process.

The loss of privacy

Now for the observations that ignited my interest:

  1. All data about us is stored in digital form. Most of this data is held by third parties, such as the state, insurance companies and so on. There is a lot of data about us that we would never think of: location data collected by cashback cards, digital traffic surveillance, connection data in telecommunication… And this is not even taking into account the data about us that we or others put into the world – such as photos, tweets etc.
  2. Digital data is hard to contain. It is in the nature of digital data that it can easily be copied and replicated, and we have a hard time protecting it. Countermeasures such as encryption are not widely adopted. Also, different entities have different interests; Facebook is not in the data protection business, after all.

In a highly interconnected world, these two factors spell trouble. In a recent keynote at WWW 2012 (the World Wide Web Conference), Tim Berners-Lee addressed further issues. One of them is jigsaw identification: while information from one source might not be enough to identify someone, the combination of information from different sources might well be. For example, if one source publishes the post code and age of a person, that information will usually not be enough to identify them: these characteristics apply to more than one person. But if another source publishes the gender and profession of the same person (which by themselves apply to several people as well), the combination of these four characteristics might be enough to identify that person uniquely. And with the eternal memory of the web, the two publications can be combined even if they appeared years apart.
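The jigsaw effect can be illustrated with a toy sketch. The records and attribute values below are entirely hypothetical; the point is only that each published attribute set matches several people, while their combination matches exactly one.

```python
# Toy illustration of jigsaw identification: two coarse data releases
# combine into a unique match. All records here are hypothetical.

population = [
    {"name": "A", "postcode": "8010", "age": 34, "gender": "f", "profession": "teacher"},
    {"name": "B", "postcode": "8010", "age": 34, "gender": "m", "profession": "engineer"},
    {"name": "C", "postcode": "8010", "age": 34, "gender": "m", "profession": "teacher"},
    {"name": "D", "postcode": "8020", "age": 34, "gender": "m", "profession": "engineer"},
]

def matches(records, **attrs):
    """Return all records that share the given attribute values."""
    return [r for r in records if all(r[k] == v for k, v in attrs.items())]

# Source 1 publishes post code and age: three candidates remain.
print(len(matches(population, postcode="8010", age=34)))            # 3

# Source 2 publishes gender and profession: two candidates remain.
print(len(matches(population, gender="m", profession="engineer")))  # 2

# Combined, the four attributes single out exactly one person.
combined = matches(population, postcode="8010", age=34,
                   gender="m", profession="engineer")
print(len(combined))                                                # 1
```

This is the same mechanism that k-anonymity research formalizes: attributes that are harmless in isolation act as quasi-identifiers once they are joined.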

All of that leads me to the conclusion that data protection and confidentiality are a lost cause in a digital and highly interconnected world. And with all the data out there, informational self-determination will become impossible. As Tim said, we cannot know which data will be published about us in the future. If a potential employer can buy my health records from a data provider, then the whole notion of privacy as we know it is bound to fail. Furthermore, the data that we publish voluntarily is only the tip of the iceberg. More important is the data that we expose involuntarily (e.g. connection data), the data that others expose voluntarily or involuntarily about us (see a data loss scandal near you), and the information that can be inferred from data of various sources (e.g. our social graph). As time passes, I see growing evidence that we are headed towards the loss of privacy.

What will happen?

Interestingly enough, most discussions that I had on the consequences of these developments followed the same pattern. The two most prominent views are: a) you are wrong, because I am the one who controls which data is out there (by setting everything to private on Facebook, disallowing photo tagging etc.), and b) let’s simply abandon privacy. If all of the data is out there, we actually level the playing field for everyone. As we discover that everyone has faults, we will attribute less importance to these faults, and society will come out better as a whole.

I think that both of these statements are wrong. Regarding the former, I pointed out earlier that the main problem is not the data that we voluntarily put on the web, but rather what others expose about us, what we expose involuntarily, and what can be inferred from different data sources. I do not believe that anyone will have everybody’s data at their fingertips. But I do think that all data will be somehow obtainable. The gray market for data that was lost or stolen from third parties is already huge, and it will continue to grow. It will be supplemented by companies who explicitly seek to infer data from various sources, exploiting the jigsaw effect.

With the latter, the argument is not so easy. I think it is an intriguing idea, but I have a hard time believing that abandoning privacy will make the world a better place, mainly for two reasons:

  1. Even though all the data is out there, we will not have a level playing field. We will still have different capabilities in processing the data to get something meaningful out of it. After all, we need to make sense of the data first, and even though everyone can theoretically access it, processing capabilities will not be evenly distributed. There will therefore be parties with a competitive advantage that they can use to exert power over those with fewer processing capabilities.
  2. Even assuming that society will become more tolerant, there will still be things that are more frowned upon than others on a moral scale. A lot will also depend on how facts are presented to others. The “shitstorms” that we already witness on social media are often based on incomplete or outright false facts.

What can we do?

Now we get to the question that I think is the really important one: how can we deal with the loss of privacy? The only concept that I know of so far is information accountability. It was postulated by Weitzner et al. and builds on informational self-determination. Information accountability is a different paradigm: instead of protecting data at the sender’s side, the receiver is held accountable for how the data is used. That means that you only take note in case something happens (you do not get a job or an insurance policy because of leaked data). In that event, the offending party would have to present which data they used to make the decision. Bearing that in mind, one of the major questions is: how can we ensure accountability on a technological level?

One proposal comes from Oshani Seneviratne. In her PhD at MIT, she is developing HTTPa, an accountability-aware web protocol. In essence, the protocol enables you to tell the receiver what they are allowed to do with the transmitted data – a kind of Creative Commons for personal data. A network of provenance trackers stores logs of these permissions and can be consulted in case something goes wrong. There is a lot more to Oshani’s work, and I suggest checking out this presentation as a start.
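To make the idea concrete, here is a minimal sketch of the provenance-tracker concept. This is not HTTPa itself – all class and method names below are hypothetical – but it shows the basic shape: a transfer is logged together with its usage permissions, and the log is only consulted when misuse is suspected.

```python
# Hypothetical sketch of an accountability log in the spirit of HTTPa:
# the sender attaches usage restrictions to transmitted data, a provenance
# tracker records each transfer, and the log can be audited after the fact.

from dataclasses import dataclass, field

@dataclass
class ProvenanceTracker:
    log: list = field(default_factory=list)

    def record_transfer(self, sender, receiver, item, allowed_uses):
        """Log that 'item' went from sender to receiver with these permissions."""
        self.log.append({"sender": sender, "receiver": receiver,
                         "item": item, "allowed_uses": set(allowed_uses)})

    def audit(self, receiver, item, actual_use):
        """Was the receiver's actual use of the item permitted?"""
        for entry in self.log:
            if entry["receiver"] == receiver and entry["item"] == item:
                return actual_use in entry["allowed_uses"]
        return False  # no recorded transfer at all

tracker = ProvenanceTracker()
tracker.record_transfer("alice", "insurer", "health-record",
                        allowed_uses={"treatment"})

# The audit only happens when something goes wrong, e.g. a denied policy:
print(tracker.audit("insurer", "health-record", "risk-scoring"))  # False
print(tracker.audit("insurer", "health-record", "treatment"))     # True
```

Note how this matches the accountability paradigm: nothing stops the transfer up front, but the log makes a later violation demonstrable.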

So should we abolish data protection and confidentiality now and move solely to information accountability? I do not think so. I see accountability as a good addition that may be suitable for dealing with the new requirements of a digital and heavily interconnected world. As Oshani points out, accountability is quite compatible with anonymity – because you only have to reveal your identity in case something goes wrong. Apart from the technical solutions, we also need to discuss legal frameworks for accountability to work. Otherwise, there will be no way to hold people accountable in court. Therefore, we need a broad debate on what is acceptable and what is not on a social level. Thankfully, that debate has already started and is getting more and more attention – exactly in the sense of Seda Gürses’s privacy as a practice. After all, technology can only give us the tools; what we want to do with them is up to us.

If you made it that far, thanks for reading. Below are a few slides that are meant as a short summary. Of course, I would love to hear your comments and ideas on the subject! Does it make sense to you? Which concepts am I missing?

In the spirit of the upcoming RDSRP’11, I decided to list a few Research 2.0 communities that I check in with more or less frequently. That means communities specifically on the topic of Research 2.0, not just Web 2.0 tools for science. Without further ado:

I am sure, I missed tons of places here. What are your favourite Research 2.0 hangouts?

Welcome back in 2011! I haven’t written too many posts lately (due to a lot of work), so it is a nice incentive that the stats team from WordPress.com thinks this blog did quite well last year. Below is a high-level overview of the blog’s stats since its inception in March 2010 – courtesy of Andy, Joen, Martin, Zé, and Automattic at WordPress.com, with some amendments and reformulations from me:

Healthy blog!

The Blog-Health-o-Meter™ reads This blog is doing awesome! – as you can see this is a rather close call though 😉

Crunchy numbers


A helper monkey made this abstract painting, inspired by your stats.

The Leaning Tower of Pisa has 296 steps to reach the top. This blog was viewed about 1,100 times in 2010. If those were steps, it would have climbed the Leaning Tower of Pisa 4 times.

In 2010, there were 11 new posts – not bad for the first year! (Seeing post counts from various other scientific bloggers leaves me with some doubt about that statement.)

The busiest day of the year was October 11th with 37 views. The most popular post that day was Blinded peer reviews – a thing of the past?.

Where did they come from?

The top referring site in 2010 was twitter.com (by far). This is not surprising, as I announce all new posts there.

People who came via search engines mostly searched for science 2.0, for me, or for a combination of both. Content-wise, the most popular searches related to conducting a group discussion.

Attractions in 2010

These are the posts and pages that got the most views in 2010.

  1. Blinded peer reviews – a thing of the past? (October 2010, 1 comment)
  2. Barcamp Graz 2010 – A weekend in review (May 2010, 1 comment)
  3. A Publication Feed Ecosystem for Technology Enhanced Learning [UPDATED] (July 2010, 1 comment)
  4. Reminder: Research 2.0 Workshop at ECTEL 2010 (June 2010)
  5. IJTEL Young Researcher Special Issue CfP and CfR (September 2010)
With that little overview I would like to say “Thank you!” to my readers. I wish all of you a successful year 2011!

Hi, my name is Peter Kraker and I am a research assistant at Know-Center (Graz University of Technology). Currently, I am involved in STELLAR, an EU-funded Network of Excellence revolving around Technology Enhanced Learning. My main research interest and the topic of my PhD thesis is “Science 2.0”: the way in which researchers use Web 2.0 for their work and the effects this has on science itself.

I will use this blog to report on my research, to cover important developments in the area, and to publish interesting stuff I come across. I am looking forward to your input and I sincerely hope that this will lead to a fruitful exchange!
