Tuesday, October 25, 2005

INTERESTING NEW DATA SET WATCH: This came over a listserv I participate in, and I thought it might interest some of our readers.
From: Gary King

I thought you might be interested in a newly updated dataset of almost 10 million individually coded international events (1990-2004). Each event is summarized in the data as "Actor A does something to Actor B", with Actors A and B coded for about 450 countries (and other actors) and "does something to" coded in an ontology of about 200 types of actions. The data are coded by a computer "reading" millions of Reuters news reports. Will Lowe and I wrote an article* that evaluated the software system (produced by VRA) that performs this task and found that for the numbers of events it was possible to convince humans (trained Harvard undergraduates) to code by hand, the machine did as well as the humans. However, in part since there is only so much pizza you can feed undergraduates, the machine clearly dominates for larger numbers of events. We previously released a dataset with 3.5 million events; this one is bigger, more accurate (since the software has been improved), and covers a longer time period.

Most international relations data are limited to analyses aggregated to the year or month. Yet, as we say in the article, when the Palestinians launch a mortar attack into Israel, the Israeli army does not wait until the end of the calendar year to react. We think there is much to be learned about international relations from data like these.
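
To make the point about aggregation concrete, here is a minimal sketch of what "Actor A does something to Actor B" records might look like and how the same events read at daily versus yearly resolution. The field names, the actor codes ("AAA", "BBB"), the action labels, and the records themselves are all hypothetical placeholders, not the dataset's actual codes or contents.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    """One dyadic event: source does action to target on a given day."""
    day: date
    source: str  # Actor A (hypothetical code)
    action: str  # event type (hypothetical label, not the real ontology)
    target: str  # Actor B (hypothetical code)


# Made-up records for illustration only; none come from the dataset.
events = [
    Event(date(2001, 3, 1), "AAA", "ATTACK", "BBB"),
    Event(date(2001, 3, 2), "BBB", "ATTACK", "AAA"),
    Event(date(2001, 3, 2), "BBB", "CONDEMN", "AAA"),
    Event(date(2001, 7, 9), "AAA", "ATTACK", "BBB"),
]


def count_events(events, period):
    """Count events per (period, source, target) directed dyad.

    `period` maps a date to the time unit to aggregate by.
    """
    return Counter((period(e.day), e.source, e.target) for e in events)


daily = count_events(events, lambda d: d)        # keeps day-level timing
yearly = count_events(events, lambda d: d.year)  # collapses to one row per year

for key, n in sorted(daily.items()):
    print(key, n)
for key, n in sorted(yearly.items()):
    print(key, n)
```

In the daily counts, the action by BBB on March 2 visibly follows the action by AAA on March 1. In the yearly counts, both directions simply show two events each for 2001, and the ordering is lost.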

For the data, documentation, and our article, see



*Gary King and Will Lowe. 2003. "An Automated Information Extraction
Tool For International Conflict Data with Performance as Good as Human
Coders: A Rare Events Evaluation Design" International Organization,
57, 3 (July, 2003): Pp. 617-642.

Gary King
David Florence Professor of Government,
Director, Institute for Quantitative Social Science
Harvard University, 1737 Cambridge St, Cambridge, MA 02138
