This assignment must be turned in by the start of class on Thursday, January 21. You must follow the instructions for submitting an assignment.

Background

You will be working with the dataset capability-data.csv, a modified version of the National Material Capabilities dataset from the Correlates of War project.

capability_data <- read.csv("capability-data.csv")
dim(capability_data)
## [1] 198 109
head(names(capability_data), 10)
##  [1] "ccode"     "irst.1990" "irst.1991" "irst.1992" "irst.1993"
##  [6] "irst.1994" "irst.1995" "irst.1996" "irst.1997" "irst.1998"
tail(names(capability_data), 10)
##  [1] "upop.1998" "upop.1999" "upop.2000" "upop.2001" "upop.2002"
##  [6] "upop.2003" "upop.2004" "upop.2005" "upop.2006" "upop.2007"

Each row of the dataset is a country. The first column, ccode, gives its Correlates of War country code. You can look up which countries have which codes in the COW State System Membership dataset. Each subsequent column gives how much of a particular military capability component a state possessed in a particular year. There are six of these components:

Missing data are coded with the value -9.

Main Task

You will tidy this ugly dataset, and along the way you will calculate some commonly used transformations of the data. Your goal is to transform capability-data.csv into a dataset where each row is a country-year pair. The dataset should contain nine columns:

I strongly recommend using the dplyr and tidyr packages to perform the cleaning and transformation.

The first few rows of your final output should look something like the following. The values here are made up, so don’t expect yours to match exactly.

##   ccode year irst milex milper  pec tpop upop     cinc
## 1     2 1990 0.25  0.10   0.05 0.15 0.00 0.00 0.091667
## 2     2 1991 0.20  0.05   0.20 0.00 0.25 0.10 0.133333
## 3     2 1992 0.10  0.20   0.00 0.10 0.15 0.25 0.133333

Your R script should save your final output in a CSV file called final-data.csv. You don’t have to commit this file to your Git repository (though you may if you choose).

Your grade will reflect not only whether you get the right answer, but also whether your code is readable and follows the best practices laid out in “Best Practices for Scientific Computing” and the other readings from this week. The same will hold true for the rest of the assignments this semester.

Weekly Visualization Challenge

How best to communicate empirical findings visually could be—should be—an entire course of its own. So, unfortunately, we don’t have the time to cover data visualization in a systematic way. As a cheap ad hoc substitute, each weekly assignment will end with a “visualization challenge” asking you to make a graph of some slice of the assignment results.

You will receive credit for the visualization portion of the assignment as long as you make a good faith effort. But whoever turns in the best visualization each week, as chosen by me, will receive a $5 Starbucks gift card. We will discuss each week’s winning entry and what makes it good, in hopes that we will all absorb some good taste in visualization by osmosis, despite the scant coverage elsewhere in the curriculum. I recommend, but do not require, using the ggplot2 package to make graphics.

For this week’s challenge, we’ll start simple: Track the changes over time in the CINC scores of the five countries with a U.N. Security Council veto.