Why You Should Never Trust a Data Scientist

Pete Warden

The wonderful thing about being a data scientist is that I get all of the credibility of genuine science, with none of the irritating peer review or reproducibility worries. My first taste of this was my Facebook friends connection map. The underlying data was sound, derived from 220m public profiles. The network visualization of drawing lines between the top ten links for each city had issues, but was defensible. The clustering was produced by me squinting at all the lines, coloring in some areas that seemed more connected in a paint program, and picking silly names for the areas. I thought I was publishing an entertaining view of some data I’d extracted, but it was treated like a scientific study. A New York Times columnist used it as evidence that the US was perilously divided. White supremacists dug into the tool to show that Juan was more popular than John in Texan border towns, and so the country was on the verge of being swamped by Hispanics. …
I’ve enjoyed publishing a lot of data-driven stories since then, but I’ve never ceased to be disturbed at how the inclusion of numbers and the mention of large data sets numbs criticism. The articles live in a strange purgatory between journalism, which most readers have a healthy skepticism towards, and science, where we sub-contract verification to other scientists and so trust the public output far more. … If a sociologist tells you that people in Utah only have friends in Utah, you can follow a web of references and peer review to understand if she’s believable. If I, or somebody at a large tech company, tells you the same, there’s no way to check. The source data is proprietary, and in a lot of cases may not even exist any more in the same exact form as databases turn over, and users delete or update their information. Even other data scientists outside the team won’t be able to verify the results. The data scientists I know are honest people, but there’s no external checks in the system to keep them that way.

The Political Science of PRISM and International Privacy

The Financial Times has an editorial warning gravely that the European Union may overreact to the PRISM revelations.

If recent leaks about US internet surveillance spur Europe’s political leaders to press ahead with a proposed privacy directive, so much the better. That looks like one potential outcome from disclosures about the National Security Agency’s Prism program, with German chancellor Angela Merkel this week joining the chorus in favour of moving ahead with a privacy overhaul that was first put forward at the start of last year. There is a danger, however, that ill-considered responses to the Prism leaks will also risk Balkanising the internet and hampering companies that have been at the forefront of digital innovation. Protecting citizens’ privacy is an important job for governments – but so is using the new tools of online surveillance to make those citizens secure. These two goals should not be confused, and knee-jerk responses to populist outrage could do more harm than good.

I suspect that what is driving this is the realization that international business (i.e. the FT’s readership base) is likely to get hit as the regulatory politics of privacy and espionage start to get messy. Abraham Newman at Georgetown and I recently wrote a piece for Foreign Affairs’ website describing the complex forms of coordination that have sprung up between the EU and US over information sharing, and describing the likely consequences of the current scandals for this security cooperation. The article draws on a book we’ve been writing over the last couple of years on the transatlantic politics of data sharing, which has suddenly become a rather livelier issue than it was when we first started writing about it. This forthcoming article in World Politics on the ‘new interdependence’ will give political scientists some idea of the kinds of argument we are using, although it doesn’t address the empirics (it’s framed around a review of other people’s work).

There are a number of misunderstandings in the general coverage of this dispute – I’ll write about them as opportunity arises, drawing ideas (if not always empirical evidence) from the joint research that Abe and I have been doing. One such is reflected in Ed Luce and Tyler Cowen’s hope that the Europeans can be relied on to press for privacy protection in e.g. transatlantic trade negotiations. Luce and Cowen may turn out to be right – but only post-hoc. Europe has been having its own internal fight between officials and politicians who privilege security, and officials and politicians who privilege privacy, and until the last few weeks, the security-focused officials were winning. Instead of privacy-focused officials using transatlantic negotiations to reform American politics, security-focused officials were using transatlantic negotiations to reform European politics. The EU, which had vigorously fought US proposals on terrorist financing tracking (the US so-called TFTP program) and airline passenger information in the 2000s, had agreed in principle to build its own TFTP, and was likely to introduce airline passenger data screening too along US lines. The transatlantic agreements that had resolved these disputes were leveraged by security-focused officials to bring through domestic changes within Europe.

This has changed thanks to PRISM and revelations (which weren’t really revelations – but that’s another story) that the US was tapping European Union communications in the Council of Ministers. Senior officials, including German officials, still privately think that this is a fuss over nothing. But they are finding themselves constrained by domestic politics to take action that seems to restore privacy protections. Nowhere is this clearer than in Germany. We know from the Wikileaks cables about the previous TFTP dispute that German Chancellor Angela Merkel was never on the side of the privacy advocates in the confrontation with the US.

Hamburg Mayor Ole von Beust (CDU) told Ambassador today (2/12) that he had met with Chancellor Merkel last night and she was “very, very angry – angrier than he had ever seen her” with the outcome of the vote. Beust said that the Chancellor had personally lobbied German MEPs from the CDU/CSU parties to support the agreement, but that most of these MEPs ended up voting against the agreement anyway. Merkel expressed concerns to Beust that Washington will view the EP veto as a sign that Europe does not take the terrorist threat seriously. Merkel also worried about the ramifications (presumably within Europe and for transatlantic relations) that might follow were a terrorist attack to occur that could have been prevented had SWIFT data been exchanged.

This helps explain the anodyne response of the German government to the crisis. Merkel and her allies quietly agree with the US, and desperately want the controversy to go away. But the scandal is allowing the main opposition parties, the SPD and Greens, to put Merkel on the defensive. This pushes her in turn to take a more active position than she would like with respect to US spying, while also pushing for a stronger EU privacy framework more generally. This last can be expected to have knock-on repercussions for relations with the US – but that is a topic for another post.

Journalists, bloggers and indeed most international relations scholars like to think of disputes like this as face-offs between different states, with fundamentally different ways of doing things. But in fact, the more interesting politics often goes on in the dubious interzone between transnational and domestic politics. The US push for security over privacy has had many supporters within Europe, who have used the transatlantic relationship to bring through laws and policy changes that weaken the previously existing privacy regime. These security-focused officials were becoming increasingly dominant in Europe as well as the US. Now they are at least temporarily beleaguered, which may, somewhat unexpectedly, lead to a new eruption of privacy disputes between Europe and the US.

Autism and the social contagion of information

The news that Jenny McCarthy will become a co-host of TV discussion show The View is generating a lot of controversy – people worry that McCarthy will be able to spread her controversial (for which read: crazy) views on autism, vaccination and chelation therapy to a much wider audience. Some of the theories as to what causes autism point to geographic clustering as evidence for some common physical cause. However, Columbia sociologist Peter Bearman and his collaborators build on a rich body of data from California to show that clustering of autism cases is plausibly caused by increased diagnosis thanks to the diffusion of knowledge across local communities. Ka-Yuet Liu, Marissa King, and Bearman find that at least 16% of the recent increase in diagnoses is down to the spread of knowledge:

One does not “catch” autism from someone else, yet a social diffusion process contributes significantly to the increased prevalence of autism. We observe a strong positive effect of proximity to other children with autism on the subsequent chance of diagnosis, robust to a range of individual- and community-level controls in both urban and less urban areas. In addition, close proximity to a child with autism was inversely associated with the likelihood of subsequent sole MR diagnosis, while it correlated strongly with the chance of autism-MR diagnosis. Proximity also increases the chance of autism rather [than] MR diagnosis given the same level of severity in autism symptoms. Social influence arises strongly for high-functioning cases of autism. The effect of proximity is also more prominent in younger children, when diagnosis is more difficult and parental resources are more important. Children who were diagnosed with autism have a similar mode of referral as that of their nearest neighbor with autism before their diagnosis. All of these findings are consistent with a mechanism of social diffusion of awareness of the symptoms and the benefits of treatment and are inconsistent with competing explanations. Social influence also accounts for the observed spatial clustering of autism. Such clustering could be caused by local environmental toxicants, the diffusion of a virus, or residential selection, but it is hard to see how a toxicant could cause a reduction in MR diagnoses, operate in all types of communities (urban or rural), and affect most strongly the high-functioning end of the severity distribution.

It’s likely that the increase in diagnoses reflects the difficulties in getting assistance within California (and the US education system more generally) for children who urgently need help, and who would be denied it without a full autism diagnosis:

Because the DDS provides services only to children with autism and not to children diagnosed with disorders on the autism spectrum, the importance of an autism diagnosis for parents striving to secure resources for their children is amplified. The steep and sudden cliff creates incentives that may not be present in other contexts, but pressure to do anything to help children is likely widespread and not limited to the California context. As Judith Rapoport of the National Institute of Mental Health told Grinker, “I’ll call a kid a zebra if it will get him the educational services I think he needs.”

Anecdotal evidence suggests that parents of kids with autism or related issues face grueling battles if they want local schools to acknowledge their needs and provide for them. It wouldn’t surprise me at all if much of the knowledge that is shared locally is knowledge of how to force action from a bureaucratic system where administrators have limited budgets and strong incentives to deny care if they think they can get away with it.

Update: This shorter piece by Bearman on why many parents believe that vaccines cause autism is also interesting.

It’s possible indeed

This, from Megan McArdle:

My assertion that there’s a 70% chance that the GOP controls White House, Senate, and House in 2017 has attracted a lot of pushback. And it’s certainly possible that I’m wrong! Here’s my thinking, for what it’s worth: Since the Civil War, only two Democratic presidents have been succeeded by another Democrat. Both of them–FDR and JFK–accomplished this by dying in office. Since World War II, only four presidents have been succeeded by a member of their party. As I mentioned above, two of them accomplished this by dying in office. One of them accomplished this by resigning in disgrace ahead of his own impeachment. Only one of them, Ronald Reagan, left office at the end of his appointed term and was succeeded by a duly elected member of his own party.

reminds me of this classic cartoon by XKCD, which should be blown up into A0 format, and placed in a permanently visible position in front of the desk of every pundit tempted to make pseudo-quantitative oracular announcements about American politics (extremists might want to go the full Clockwork Orange with the eyeclamps but that strikes me as overkill).

Human beings are cognitively predisposed to perceive patterns in the world. Many, likely most, of these patterns are garbage. Without good theories, and good ways of testing those theories, we’ll never be able to tell the garbage patterns from the real ones.
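A rough way to see why so many patterns are garbage is to ask how often a meaningless rule would fit a short historical record by luck alone. The sketch below is illustrative only: it assumes each candidate "rule" is an independent coin flip with respect to each past outcome, and the function name and numbers are hypothetical, not drawn from the post or the cartoon.

```python
def chance_of_spurious_rule(n_outcomes: int, n_rules: int) -> float:
    """Probability that at least one of n_rules coin-flip 'rules'
    matches all n_outcomes past results purely by chance."""
    # Assumption: each rule agrees with each outcome with probability 1/2,
    # independently of everything else.
    p_single = 0.5 ** n_outcomes
    # Complement of "no rule fits the whole record".
    return 1.0 - (1.0 - p_single) ** n_rules


# Ten past elections, a thousand candidate rules to try out.
p = chance_of_spurious_rule(10, 1000)
print(f"{p:.2f}")
```

Under these assumptions the chance comes out at about 0.62, so a pundit who searches through enough candidate rules will more likely than not find one with a perfect track record that means nothing.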

Violence as a Source of Trust in Mafia-type Organizations

Criminals have great difficulty in trusting each other – they often have conflicting interests (and may sometimes have incentives to inform on each other) but have no very good equivalent of the state to enforce contracts. One traditional solution is to rely on family members, who are presumably more trustworthy.

But there are others – scholars such as Thomas Schelling and Diego Gambetta have speculated that shared information about violent acts might help to cement cooperation. If I know that you have committed a violent act, and you know that I have committed a violent act, we each have information on each other that we might threaten to use if relations go sour (Schelling notes that one of the most valuable rights in business relations is the right to be sued – this is a functional equivalent). Of course, it’s difficult to establish this empirically – as Gambetta notes in his classic book on the Sicilian mafia, active mafiosi make poor interview subjects – at the very best they are likely to be reticent about their activities.

Paolo Campana and Federico Varese have a very nice new article in Rationality and Society which tests how both traditional sources of trust such as family ties, and less conventional sources such as shared information about violence, might work among real criminals.

This article relies on two unique datasets we have collected on two Mafialike organizations: a Neapolitan Camorra clan based some 50 kilometres north of Naples, and a Russian Mafia group operating in Rome. … Both groups had been under extensive police surveillance, during which investigators were able to monitor all the telephone lines used by the key players, and listen to their conversations. … In both instances we had access to the files prepared by the police for the Prosecutor Office to be used as evidence in court; they include the transcripts of the wiretapped phone conversations. … The network of contacts between the core members of the Neapolitan Camorra clan amounts to 1370 while the core members of the Russian group have exchanged a total of 295 contacts among them.
Kinship does indeed have a statistically significant effect in the Camorra clan: the frequency of contacts between two associates increases when both are near-relatives of the boss. This finding confirms the importance of kinship within this particular Mafia. Extended kinship appears to play a role in the case of the Russian–Italian group. Rather more surprisingly, in both models violence does have an impact on tie formation between two actors. Having shared information about violent acts increases the frequency of contacts occurring among two actors. The ‘violence effect’ is fairly strong, and greater than that recorded for kinship in both cases, including the Camorra. This would suggest that even in clans made of relatives, having discussed violence is a better predictor of cooperation than kinship itself. This further suggests that there is nothing ontological in the role of kinship in organized crime. When better and more reliable mechanisms to increase commitment are available, criminals will use them, just as organizations in advanced societies tend to rely on merit rather than kinship when recruiting employees.
There is additional, non-statistical evidence of the use of violence as a form of credible commitment. The boss of the Camorra clan discussed here would instruct all his men to shoot together at the same time when committing a murder. Everybody in the firing squad had to fire at least one shot. … Each perpetrator is made ‘a hostage’ to all the others, in order to reduce his incentives to defect and/or inform on his fellow associates.

Robert Putnam on funding the social sciences

In Politico

This week, I was one of 12 Americans to receive a National Humanities Medal, based in part on research I began more than 40 years ago on civil society and democracy. Making Democracy Work has become one of the most cited works of social science in the past half-century, because it offered hard scientific evidence for the classic idea that grass-roots civic engagement — what the English conservative Edmund Burke called “the little platoons” of society — is the crucial ingredient in successful democracies. … Because my findings resonated broadly, American leaders from Bill Clinton to Jeb Bush and from Mike Huckabee to Al Gore have discussed the implications of this work for the challenges facing our country today. One of the harshest critics of National Science Foundation funding of political science has even praised my study as “one of the most influential pieces of practical research in the last half-century.”
Ironically, however, if the recent amendment by Sen. Tom Coburn (R-Okla.) that restricts NSF funding for political science had been in effect when I began this research, it never would have gotten off the ground since the foundational grant that made this project possible came from the NSF Political Science Program. The NSF is now grappling with what Coburn’s narrow criteria mean for the $10 million of political science research it supports each year.

Distinguishing Offense from Defense in Cybersecurity

This New York Times article about Edward Snowden implicitly highlights the perceived dilemmas of US cybersecurity policy.

In 2010, while working for a National Security Agency contractor, Edward J. Snowden learned to be a hacker. He took a course that trains security professionals to think like hackers and understand their techniques, all with the intent of turning out “certified ethical hackers” who can better defend their employers’ networks. But the certification, listed on a résumé that Mr. Snowden later prepared, would also have given him some of the skills he needed to rummage undetected through N.S.A. computer systems and gather the highly classified surveillance documents that he leaked last month, security experts say.
Some intelligence experts say that the types of files he improperly downloaded at Booz Allen suggest that he had shifted to the offensive side of electronic spying or cyberwarfare, in which the N.S.A. examines other nations’ computer systems to steal information or to prepare attacks. The N.S.A.’s director, Gen. Keith B. Alexander, has encouraged workers to try their skills both defensively and offensively, and moving to offense from defense is a common career pattern, officials say.

How Cities Compete For Business

Local economic development theory argues that it’s usually a bad idea for cities to offer special incentives to try to attract businesses. These incentives weaken the fiscal position of the city, but often provide freebies for location decisions that the businesses would have taken anyway. So why do cities do it? In a new paper, Nate Jensen, Edward Malesky and Matthew Walsh argue that these bad decisions are driven by voters. Voters know enough to want to vote for politicians who take active steps to attract business to the city, while not knowing enough to realize that the costs of these active steps often outweigh the benefits. Jensen et alia argue that this theory is supported by evidence of real differences between cities run by elected mayors, and cities run by managers responsible to a council. They argue that managers are more insulated from electoral politics than mayors, and hence less likely to take wasteful decisions to offer incentives.

We test the impact of electoral institutions on a dataset of over 2,000 project-level incentives ( 2013), finding significant support for our electoral pandering hypothesis. Elected mayors: offer 14% more money than council-managers overall and 20% on a per-firm basis ($822,000 on the average project); are 16% more likely to offer an incentive to an individual firm; and are 7.6% less likely to have an oversight program in place for the use of investment incentives. The larger generosity of elected mayors is facilitated by the fact that they face less oversight in the targeting of incentives and requirements on the size of incentives than comparable cities subject to council-manager systems.

Conservatives for Better Childcare

This bit in a New York Times article about Angela Merkel’s election manifesto may seem a little strange to US observers.

Addressing some 600 representatives of her Christian Democratic Union and its Bavarian sister party, the Christian Social Union, on Monday, Ms. Merkel vowed that if returned to office, she would increase child subsidy, as well as retirement benefits for older mothers and lower-earning workers. … Building on her long-nurtured image as frugal, commonsense homemaker, the chancellor painted the 127-page election manifesto as a solid foundation for the country’s future — forgoing tax increases, but still chipping away at Germany’s outstanding debts, and ensuring that budget consolidation would remain a cornerstone of a “moderate and balanced” program.

Merkel is the leader of Germany’s conservative party and (as the article suggests) views her reputation for fiscal probity as an asset. Why would conservatives who pride themselves on their stinginess boast about their plans to spend more money on childcare and benefits for women?

Kimberly Morgan has a recent article in World Politics (temporarily ungated) which explains why Merkel and other European conservatives have taken up the cause of spending more on women. The reasons have much to do with straightforward electoral politics.

In the past, women tended to be more politically conservative than men, and their greater support for conservative parties was highly consequential in that it enabled the electoral dominance of these parties in the post–1945 period. However, the rise in women’s workforce participation has transformed women’s views on both gender relations and politics, as women are increasingly more likely to vote for left parties, embrace gender egalitarian norms, and support social welfare spending. … Given the significant size of the female electorate and women’s importance to conservative party successes in the past, parties have competed for increasingly dealigned segments of the female vote through two types of changes. First, they have recruited more female members, promoted women within the party leadership, and adopted targets or quotas to increase their representation. … Second, political parties have altered their electoral platforms to appeal to female voters, a process reinforced by the growing influence of women within these parties.
Worried about the erosion of female support, in 1996 Chancellor Kohl persuaded the CDU to adopt a weak “quorum” for elected offices and positions within the party, although conservatives successfully resisted hard targets. … In Germany, efforts to help parents balance work and family were taken up toward the end of an SPD-Green government, but it was under a coalition of the CDU/CSU-SPD, with a female CDU politician in charge at the family ministry, that significant policy change occurred. … Traditionally a party that disproportionately drew the support of women, the CDU had been steadily losing the female vote, particularly among younger women. Electoral defeats in 1998 and 2002 were particularly jarring, as the party lost even older female voters to the SPD and faced declines among urban voters, another critical group.
… Merkel’s overarching strategy was to “steal themes—as well as young, urban, and female swing voters”—from the SPD, and modernizing family policies was one way to achieve this. In 2007 and 2008 the Grand Coalition government enacted transformative changes in Germany’s work-family policies, including tax breaks for child care costs, the parental leave law, and an agreement with the Länder that required them to pass laws in their assemblies giving parents the right to a place in day care for children aged one to two by 2013. The federal government promised several billion dollars for the creation of these centers and continuing assistance with operational costs after 2013, with the goal of covering 35 percent of children under age three by that date.

Morgan makes it clear that these changes took place only after internal struggles within the party, between pragmatists who wanted to win votes and conservatives who rejected this new orientation on principle. Nonetheless, they seem to have helped shore up support among an important electoral constituency that might otherwise have defected. Whether US conservatives could ever carry out such a volte-face is of course an interesting question.

Do Human Rights Treaties Work?

One of the weirder findings of the recent political science literature is that states which sign up to human rights treaties are more likely, not less, to commit human rights abuses. This is hard to explain. In a new article for the American Journal of Political Science (ungated draft), Yonatan Lupu argues that this can probably be accounted for by selection effects. In other words, the finding reflects the fact that some states are more likely to sign up to human rights treaties than others, not the consequences of the treaty for subsequent state behavior. To show this, Lupu uses an interesting trick – he borrows the NOMINATE methodology that Americanists use to figure out the ideal points of Congressional legislators, and uses it to estimate the ideal points of states instead. The results suggest that human rights treaties do not cause states to commit more human rights violations; instead, states that commit more human rights violations are more likely to join human rights treaties. When selection artefacts are taken into account, treaties on torture and civil and political rights appear to have no consequences for human rights violations, but the Convention on the Elimination of All Forms of Discrimination against Women has a substantial impact on a wide variety of women’s rights.
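
The selection-effect logic can be illustrated with a toy simulation. This is a minimal sketch under loudly stated assumptions, not Lupu's method (which estimates state ideal points with NOMINATE): each hypothetical state has a fixed abuse level, joining the treaty has no effect on it at all, and more abusive states are simply more likely to join. The function name, probabilities and sample size are all invented for illustration.

```python
import random


def simulate_selection(n_states: int = 10000, seed: int = 0):
    """Return mean abuse level of joiners and non-joiners when the
    treaty itself has no causal effect on behavior."""
    rng = random.Random(seed)
    joined, stayed_out = [], []
    for _ in range(n_states):
        # Latent abuse level in [0, 1); the treaty never changes it.
        abuse = rng.random()
        # Assumption: probability of joining rises with abuse level.
        if rng.random() < 0.2 + 0.6 * abuse:
            joined.append(abuse)
        else:
            stayed_out.append(abuse)

    def mean(values):
        return sum(values) / len(values)

    return mean(joined), mean(stayed_out)


joined_mean, out_mean = simulate_selection()
print(f"joiners: {joined_mean:.2f}, non-joiners: {out_mean:.2f}")
```

In this toy world the joiners look worse than the non-joiners (in expectation, roughly 0.6 against 0.4) even though the treaty does nothing, which is the pattern a naive comparison would misread as treaties causing abuses.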

