21 July 2011

Privacy and your online learner identity

This post is prompted by an article I happened to read in the Chronicle of Higher Education of May 15th, 2011, entitled 'Why privacy matters even if you have nothing to hide', written by Daniel Solove, a professor of law at George Washington University. It is a prequel to a book called Nothing to Hide. My interest in it stems from an article Adriana Berlanga and I wrote about online learner identities. The question we address there is how best to balance the need to know as much as you can about a lifelong learner to be able to offer him or her the best possible learning arrangements (in an online learning environment) with the justified worry that yielding all those data may easily invade that person's privacy.

[Image: a fingerprint with the text 'the identity question']
Fundamental to our argument is the observation, made by many, that the online realm, or cyberspace, is increasingly the place where we lead our social lives, including our lives as (lifelong) learners and workers. Consequently, we need to build online identities. We dubbed such an identity an online learner identity in so far as it should allow us to 'live' in networked environments geared for learning and professional development (Learning Networks, if you like). However, since these identities are fragmented across the various social networking sites out there (Facebook, Google, Ning, LinkedIn, ...), it is difficult for an individual user to build, let alone maintain, such an identity. One needs to update various sites repeatedly and, even harder, one needs to imagine what big picture of oneself emerges this way. So technical solutions may be attempted that allow data to be exchanged automatically between those sites. Perhaps a kind of dashboard that aggregates data from various sources is a good idea. (This assumes the hosting parties would allow that, which does not go without saying, as sharing with such a dashboard site lowers their traffic and thus is not in their interest.) Also, a learning perspective is needed to dictate what data the dashboard should collect. Past education, for instance, seems more important than the kinds of movies one likes.
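
To make the dashboard idea a little more concrete, here is a minimal Python sketch of what such aggregation could look like. It merges profile fragments from several sites and keeps only the fields a learning perspective might ask for. The site names come from the paragraph above; the field names, the sample data, the LEARNING_FIELDS set and the aggregate function are all hypothetical, and no real site API is called.

```python
# Hypothetical sketch: no real social networking API is used here.
# Fields a learning perspective might consider relevant (an assumption).
LEARNING_FIELDS = {"education", "skills", "work_history"}


def aggregate(fragments, wanted=LEARNING_FIELDS):
    """Merge per-site profile fragments into one learner identity.

    `fragments` maps a site name to the profile data held there.
    Only fields listed in `wanted` are kept; values are merged
    without duplicates, preserving first-seen order.
    """
    identity = {}
    for site, profile in fragments.items():
        for field, values in profile.items():
            if field not in wanted:
                continue  # e.g. favourite movies are left out
            merged = identity.setdefault(field, [])
            for value in values:
                if value not in merged:
                    merged.append(value)
    return identity


# Made-up sample data, for illustration only.
fragments = {
    "LinkedIn": {"education": ["MSc Psychology"], "skills": ["statistics"]},
    "Facebook": {"education": ["MSc Psychology"], "movies": ["Brazil"]},
    "Ning": {"skills": ["statistics", "instructional design"]},
}

identity = aggregate(fragments)
print(identity)
```

Note that the choice of which fields to keep is made in one place (the `wanted` argument), which is where the learning perspective, and ideally the learner, would have a say.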

However, there is another issue that is inextricably linked to these technical and learning-theoretical issues. It is whether we, as users of such a dashboard, do indeed want to aggregate our existing fragmented identities. It does not go without saying that we do. Facebook, for instance, once was a fun site only but has increasingly earned itself a bad reputation for revealing ever more data about its users without asking them explicitly beforehand. And every service Google offers us for free betrays Google's hunger for our (profiling) data. This should not come as a surprise, of course. Somebody has to foot the bill for the services provided to us. It turns out that we ourselves do so by giving up our data for free, allowing the Facebooks and Googles of this world to make money through targeted advertising and the sale of profiling data to third parties. But we need at least to ask whether this is the way we want it, for Facebook and Google but also for dashboard-like services that ostensibly have only the best intentions. At face value, this is a question about privacy. Solove's article helps us understand it that way.

His point of departure is the often voiced argument that if you have nothing to hide, it is ok for the government to know anything there is to know about you. The counterargument is that this constitutes an invasion of your privacy. Parenthetically, in the discussion that follows the article someone rightly points out that privacy is a human right (Article 12 of the Universal Declaration of Human Rights) granted to you by birth and that invasions thereof are a privilege that needs to be granted through proper argument, even to governments. However, to make the counterargument stick we need to understand what privacy is. Solove attempts to delineate the notion by using two metaphors, a quite ingenious move in my view. Some aspects of privacy are addressed by George Orwell in his novel Nineteen Eighty-Four, which describes an omnipresent state that watches our every step and stores it in huge databases. This is the surveillance aspect of privacy. The other metaphor comes from Franz Kafka's Der Prozess (The Trial). This novel is about someone who has to stand trial but has no idea what he is accused of, nor is he allowed access to the accusations and the reasoning behind them. Solove calls this aspect of privacy information processing. It addresses the government as a bureaucracy, which lacks transparency and refuses to be accountable for what it does with the data it holds. He then argues: 'the problems [with privacy invasions] are not just Orwellian but Kafkaesque. Government information-gathering programs are problematic even if no information that people want to hide is uncovered. In The Trial, the problem is not inhibited behavior but rather a suffocating powerlessness and vulnerability created by the court system's use of personal data and its denial to the protagonist of any knowledge of or participation in the process. The harms are bureaucratic ones—indifference, error, abuse, frustration, and lack of transparency and accountability.'

So, one should worry not so much about the mere storage of data, which George Orwell denounced, as about their subsequent processing in opaque ways, which worried Franz Kafka so much. Processing comes in several forms. Aggregation, 'the fusion of small bits of seemingly innocuous data', is one of them. Aggregation may be objected to since the picture of someone that emerges after aggregation is not apparent in the constituent bits. 'The whole is more than the sum of its parts' sums this up nicely. Exclusion, preventing people 'from having knowledge about how information about them is being used' and barring them 'from accessing and correcting errors in that data', is another form. Exclusion goes to the heart of the Kafka objection. Job applicants whose application was turned down because they were unable to remove online pictures taken of them in a moment of weakness understand full well the harm exclusion can do. This problem is exacerbated when secondary use is made of those data, as the route from the misuse back to the data source then becomes even harder to trace. Distortion is a third form of data processing: stored data necessarily show only part of a personality, which may lead to a distorted picture of that person. When first impressions matter, as in job interviews, distortion can do much harm.

In the case of a learner's online identity, Adriana and I argue against the fragmentation of someone's identity across the various social media sites in existence. This is a variation of the distortion argument. Even if we admit that people may have good reasons to maintain several separate online identities (one for work, one or more for leisure activities), what such an identity looks like should be under the control of the identified person, and of that person only. After all, only that person can oversee the degree and kind of allowable distortion. Thus, the practical argument we leveled against fragmentation proves to have a privacy aspect as well. This brings us to the exclusion argument. People need to have access to the data stored about them in order to correct those data, extend them, prune them, etc. In our paper, we offered a practical argument for this, arguing that people should be able to build an online identity qua learner that best suits their learning and professional development. This argument too turns out to have a privacy twist to it: control over one's data is a matter of principle (privacy) and not only of convenience. And finally, the defragmentation that we argued for is of course a form of aggregation. However interesting the technical challenges of overcoming fragmentation may be, and however useful defragmentation may be from a learning perspective, doing so inevitably also impacts our privacy. That is the key value of Solove's argument.

Solove thus exposes the nothing-to-hide argument as too simplistic. Privacy is a multifaceted thing; the nothing-to-hide argument only addresses the data surveillance aspect of it, not the data processing aspect. Data processing itself is complex, encompassing such things as aggregation, exclusion and distortion. Any one of these impinges on efforts to arrive at the consolidated online learner identity we argued for in our paper. Solove, in focusing on debunking the nothing-to-hide argument, does not offer any solutions for safeguarding someone's privacy against the aggregation, exclusion and distortion of their data. But perhaps this cannot be discussed in general terms; perhaps it can only be understood in a concrete case such as building a consolidated digital identity for learners. If so, his refined understanding of what privacy is about should help us do so. It should help us to reap the benefits of online learning while giving due attention to the privacy challenges that come in its wake.

October 22, 2012. Note added after publication: It has come to my attention that there is an EU-funded 7th Framework project that goes by the name of Trusted architecture for securely shared services (TAS3). I quote from their summary: 'TAS3 will develop and implement an architecture with trusted services to manage and process distributed personal information. [...] TAS3 will focus an instantiation of this architecture in the employability and e-health sector allowing users and service providers in these two sectors to manage the lifelong generated personal employability and e-health information of the individuals involved.' This sounds like an architecture that should also work for online learner identities, even though TAS3 will focus on data in offline databases and we are more interested in online databases (behind social media interfaces). Second, the EIfEL team has published a blog post with the intriguing title 'To create a trustworthy Internet respectful of our privacy, shouldn't we simply make our personal data public?' Without going into detail, their solution is to spread your personal data over various sites, but anonymously. You as the owner keep a bundle of private keys through which you can grant access to those data in a piecemeal fashion. This way, you can give access to whomever you want and deny it to everybody else. Quite ingenious, although I am not sure Facebook and Google would like the idea of only having uninformative bits and pieces of your personal profile data hidden behind an alias. Even so, Google just said they are considering allowing aliases on their Google+ service.
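
The piecemeal access idea can be illustrated with a small Python sketch. This is not the EIfEL team's actual design, which I have not described in detail here, but a toy model under stated assumptions: each data fragment sits on a site under a random alias, the owner keeps one key per fragment, and a fragment is only released to someone who presents the matching key. The FragmentStore class, its put and get methods and the sample data are hypothetical. A real scheme would encrypt the fragments; this sketch only models the access check.

```python
import hashlib
import secrets


class FragmentStore:
    """Toy model of a site holding data fragments under anonymous aliases."""

    def __init__(self):
        self._fragments = {}  # alias -> (digest of key, data)

    def put(self, data):
        """Store one fragment; return its alias and the owner's key."""
        alias = secrets.token_hex(8)   # random alias, reveals nothing
        key = secrets.token_hex(16)    # stays in the owner's key bundle
        digest = hashlib.sha256(key.encode()).hexdigest()
        self._fragments[alias] = (digest, data)
        return alias, key

    def get(self, alias, key):
        """Return the fragment only if the matching key is presented."""
        digest, data = self._fragments[alias]
        presented = hashlib.sha256(key.encode()).hexdigest()
        if not secrets.compare_digest(digest, presented):
            raise PermissionError("key does not match this fragment")
        return data


# The owner spreads two fragments and keeps a bundle of (alias, key) pairs.
store = FragmentStore()
keyring = {
    name: store.put(data)
    for name, data in [("education", "MSc Psychology"),
                       ("employer", "a university")]
}

# Sharing the education key grants access to that fragment only.
alias, key = keyring["education"]
shared = store.get(alias, key)
print(shared)

# The key for another fragment does not open this one.
try:
    store.get(alias, keyring["employer"][1])
    denied = False
except PermissionError:
    denied = True
print(denied)
```

The point of the sketch is that access is granted per fragment: handing out one key says nothing about the other fragments, or even that they belong to the same person.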