Hilary Mason, chief scientist at bit.ly, has two posts on how and why companies like bit.ly should share data with academics. As a beneficiary (together with John and other members of a team of social scientists – we wrote the Arab Spring research that she mentions) of bit.ly’s generosity, I can only applaud. However, there’s one part of her post that deserves some comment. She mentions that bit.ly requires academics who use their data to sign a non-disclosure agreement, but emphasizes how minimal the terms of the NDA are. In particular, under bit.ly’s standard policy:
You may publish whatever you like. Academic freedom FTW.
This is great – but my impression is that bit.ly is unusual in its embrace of academic freedom. From anecdotal evidence, other companies that provide social scientists with access to big data frequently impose much more stringent conditions, aimed, essentially, at ensuring that researchers do nothing with their data that might plausibly be regarded as controversial. Some of this is to the good. Every social scientist has an inner Dr. Marvin Monroe who has to be restrained, forcibly if necessary, when dealing with, for example, personally sensitive data. But some of it is not. At the least, it means that it’s going to be very difficult to study topics such as social protest movements using the internal data of international social media companies (if you are a company doing business in a variety of non-democratic regimes, your product’s capacity for stirring up social unrest is unlikely to be a selling point, and hence is something that you will probably discourage researchers from highlighting). It also means that the set of reported findings is going to be a biased sample of the underlying universe of interesting findings that researchers would be reporting if the data were open access (or accessible under minimally restrictive conditions such as bit.ly’s). And in turn, this bias may skew our understanding of the social and political consequences of social media in problematic ways.