AidData has a full response to Deborah Brautigam’s criticism of their data that I discussed yesterday. The debate contains a lot of nuts and bolts, but it also raises some issues that should interest anyone involved in using or collecting large amounts of data based on local media reports:
Brautigam seems to think that it’s a bad idea to publish this dataset and expose our methods and sources to public scrutiny. She says “[d]ata-driven researchers won’t wait around to have someone clean the data. They’ll start using it and publishing with it and setting these numbers into stone.” People may abuse the data, but we disagree that this is a good reason not to publish the data. We have been systematically collecting project-level development finance data for the better part of the last ten years, and we find errors in the official data all the time. You cannot fix errors until you know that they exist, and we believe that more sunlight and scrutiny is the best way to spot and fix errors. Brautigam’s arguments seem to suggest that only a small group of people who she considers to be experts should be allowed to collect and analyze data on the nature, distribution, and effects of Chinese development finance. We disagree with this “gatekeeper” approach to social science and expect that it will slow progress in this narrow sub-field of the academy.
I am sympathetic to both claims. I fear that Brautigam is correct that people will start using the data simply because it is there, without much regard for data quality. Yet there is much to applaud in AidData’s transparency and willingness to publish data before major publications have appeared. The usual strategy of releasing data only after publication magnifies the problem: by keeping data proprietary, researchers can use it without similar levels of scrutiny. This is why Brautigam’s post is so useful: it is easily accessible to reviewers, so careful researchers will have to engage with the issues she raises. Moreover, publications in peer-reviewed journals legitimize data sources, reducing researchers’ incentives to spend time and resources scrutinizing data quality. There are at least some grounds for optimism that AidData’s approach will pay dividends:
Finally, Brautigam claims that AidData’s attempt to crowdsource Chinese development finance information will not work. This assertion is a testable hypothesis. With time, others will be able to judge whether we succeed in generating higher-quality and higher-resolution data over time. It is interesting to note that within 24 hours of the china.aiddata.org site being launched, users began to provide new information about specific projects in the database. The Guardian produced field reports on four of the projects in our database (here, here, here, and here).