WikiLeaks and Small-n Research

I couldn’t get access to WikiLeaks this afternoon. “The site could be temporarily unavailable or too busy.” Unsurprisingly, the announcement of the publication of classified State Department documents has triggered an enormous media resonance, inviting many people to check ‘what’s up(loaded)’. From a professional perspective, I wonder whether WikiLeaks has the potential to become an important data source for political scientists, providing them with background information about domestic and foreign politics around the globe. Of course, we can ask whether the diplomatic reports reveal more about the world or about their authors’ perceptions of it. The data are subjective, but that is the case for most information we collect in expert interviews. The good news is that this whistle-blower data is primary data. As always, though, before relying on a single source in small-n research (assuming one does not object to using such sources on ethical grounds), the data should be triangulated with other sources.

Speaking of triangulation: in a recent – so far unpublished – paper, Susumu Shikano, Stefanie Walter and I analyzed different data aggregation strategies commonly used in small-n research. In the paper, we are interested in how best to aggregate different sources under different conditions in triangulation. Using computer simulations, we have so far tested a set of five very simple aggregation strategies: a random selection of sources, a simple average, a weighted average, the mode, and a winner-takes-all strategy.
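To make the five strategies concrete, here is a minimal sketch in Python. The function name `aggregate` and the rule labels are mine, not from the paper; the weights stand in for some prior qualification of how well informed each source is.

```python
import random
import statistics

def aggregate(estimates, rule, weights=None):
    """Aggregate a list of expert estimates under one of five simple rules.

    `weights` are assumed to reflect each source's information level
    (higher = better informed); obtaining them is a separate step of
    qualifying the sources.
    """
    if rule == "random":
        # Random selection: pick one source and take its estimate
        return random.choice(estimates)
    if rule == "mean":
        # Simple (unweighted) average of all estimates
        return statistics.fmean(estimates)
    if rule == "weighted":
        # Weighted average: better-informed sources count for more
        total = sum(weights)
        return sum(w * e for w, e in zip(weights, estimates)) / total
    if rule == "mode":
        # Most frequent value (estimates discretized by rounding here)
        return statistics.mode(round(e) for e in estimates)
    if rule == "winner":
        # Winner-takes-all: trust only the best-informed source
        best = max(range(len(estimates)), key=lambda i: weights[i])
        return estimates[best]
    raise ValueError(f"unknown rule: {rule}")
```

For instance, `aggregate([40, 60], "weighted", weights=[1, 3])` returns 55, pulling the result toward the better-informed second source.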

In our simulations we a priori define a uni-dimensional continuous scale for a concept whose true value is assumed, without loss of generality, to be 50. We further assume that every expert is uncertain about this true value to some degree. That is, every expert perceives the true value with a certain cognition error. For the errors, we assume a normal distribution whose expected value equals zero, so that small errors are more likely than large ones. We differentiate this error probability between experts with different information levels (technically, this is achieved by varying the standard deviation).

Once we define the number of experts holding different information levels, we randomly draw information from the corresponding normal distributions independently for each expert. We repeat this random draw and aggregation 1,000 times and thereby obtain 1,000 measures for each aggregation rule. These results are, in turn, evaluated against the ‘true’ value. For this purpose, we utilize the mean absolute error (MAE).
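The simulation loop can be sketched as follows. This is my own illustrative reconstruction, not the paper’s code: each expert’s estimate is the true value plus a normally distributed cognition error, the standard deviations (2 for well-informed, 20 for badly informed sources) follow the figures discussed below, and only two of the five rules are shown for brevity.

```python
import random
import statistics

TRUE_VALUE = 50  # the concept's true value, fixed without loss of generality

def simulate_mae(n_good, n_bad, rule, sd_good=2, sd_bad=20,
                 n_runs=1000, seed=1):
    """Mean absolute error (MAE) of one aggregation rule over repeated draws.

    Each run draws one noisy estimate per expert, aggregates them, and
    records the absolute deviation from the true value; the MAE is the
    average of these deviations across all runs.
    """
    rng = random.Random(seed)
    sds = [sd_good] * n_good + [sd_bad] * n_bad
    abs_errors = []
    for _ in range(n_runs):
        estimates = [rng.gauss(TRUE_VALUE, sd) for sd in sds]
        if rule == "mean":
            agg = statistics.fmean(estimates)
        else:  # "weighted": weight each source by the inverse of its variance
            w = [1 / sd ** 2 for sd in sds]
            agg = sum(wi * e for wi, e in zip(w, estimates)) / sum(w)
        abs_errors.append(abs(agg - TRUE_VALUE))
    return statistics.fmean(abs_errors)
```

With one well-informed and five badly informed sources, the weighted average yields a clearly lower MAE than the simple average, matching the pattern reported below.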

The left panel of the figure below (please follow the link) highlights our simulation results with one well-informed source (stdev of 2) and an increasing number of worse informed sources (stdev of 20). In the right panel we increase the number of well-informed sources.

Assuming that small-n researchers commonly use only up to six sources, we find that a weighted average is to be recommended. Assuming unsystematic errors, the simple average in our simulations only reaches a similar performance when six or more sources are used. The winner-takes-all strategy also displays a satisfactory performance. Both the winner-takes-all strategy and the weighted average, however, require qualifying the sources, i.e., assessing how well informed each one is. But how would you, for instance, classify the documents released by WikiLeaks?


2 thoughts on “WikiLeaks and Small-n Research”

  1. I came across a recent article in Sociological Methods & Research that seems relevant in this context. Here’s the abstract:

    Multiple Informant Methodology: A Critical Review and Recommendations
    The value of multiple informant methodology for improving the validity in determining organizational properties has been increasingly recognized. However, the majority of empirical research still relies on single (key) informants. This is partly due to the lack of comprehensive methodological narratives and precise recommendations on the application of this important methodology. Therefore, the authors have developed a critical review and derived clear recommendations for the key challenges that researchers face in using multiple informants: (1) Which and how many informants should be considered? (2) How should the consensus among the informants be judged? (3) How are multiple responses combined into a single, organizational response to conduct further data analyses?
