Your probability of dying from a firearm (or, how to manipulate statistics)

probability of dying from a firearm

The above is a line chart of your probability of dying from a firearm each year from 2003 to 2010 (it would have included more years, but finding death statistics is a pain for some reason). I made it to demonstrate how easy it is to manipulate numbers to tell a specific story. The numbers themselves are accurate: for each year, I found the probability of dying in the United States, found the probability that a death was caused by a firearm, and multiplied the two together.
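Written out, that calculation is a product of two ratios. This is a sketch assuming each probability is estimated as a simple count ratio, with N the US population in a given year, N_D the total number of deaths that year, and N_F the number of firearm deaths (these symbols are introduced here for illustration, not taken from the original data):

```latex
P(\text{firearm death})
  = P(\text{death}) \cdot P(\text{firearm} \mid \text{death})
  = \frac{N_D}{N} \cdot \frac{N_F}{N_D}
  = \frac{N_F}{N}
```

The total number of deaths cancels out, so under these assumptions the result is simply firearm deaths divided by population.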

The chart would lead you to believe that your chances of dying from a firearm have fallen sharply. The problem is that it doesn’t include nearly enough information to tell the full story. For that, you have to look at the actual year-by-year statistics: the total number of deaths, the number of firearm deaths, and the probabilities derived from them.

When you look at the actual numbers, the number of gun deaths per year did indeed fall from 2003 to 2010. However, the resulting change in the probability of dying from a firearm was negligible: it fell from 0.0038% to 0.0032%.
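To put those two figures in perspective, here is a short R sketch that uses only the values quoted above. It computes the absolute change, the same change relative to 2003, and the equivalent number of deaths per million people. The same drop can be described as 0.0006 percentage points or as roughly a 16% decline, and which one a chart emphasizes shapes the story it tells.

```r
# Annual chance of dying from a firearm, in percent, as quoted in the post
p2003 <- 0.0038
p2010 <- 0.0032

abs_drop <- p2003 - p2010            # absolute change, in percentage points
rel_drop <- abs_drop / p2003 * 100   # the same change relative to 2003, in percent
per_million <- round(c(p2003, p2010) / 100 * 1e6)  # deaths per million people

sprintf("Absolute drop: %.4f percentage points", abs_drop)  # "Absolute drop: 0.0006 percentage points"
sprintf("Relative drop: %.1f%%", rel_drop)                  # "Relative drop: 15.8%"
per_million                                                 # 38 32
```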

(Also, the number of deaths from firearms last year was higher than in 2003, so the trend didn’t continue.)

Keep this in mind when looking at statistics and graphics about political issues.
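The post doesn’t say how its chart’s y-axis was scaled, but one common way a line chart makes a small change look large is an axis that starts just below the data instead of at zero. This R sketch draws the two endpoint values quoted above both ways, side by side. Only 2003 and 2010 are plotted, since the intermediate years aren’t given here.

```r
# The two endpoint values quoted in the post (percent chance per year).
# Intermediate years are not given in the text, so only 2003 and 2010 are drawn.
years <- c(2003, 2010)
probs <- c(0.0038, 0.0032)

op <- par(mfrow = c(1, 2))  # two panels side by side; keep the old settings in op

# Left: R's default y-axis hugs the data range, so the drop fills the panel
plot(years, probs, type = "b", main = "Default y-axis",
     xlab = "Year", ylab = "Chance of dying from a firearm (%)")

# Right: the same two numbers with the y-axis starting at zero
plot(years, probs, type = "b", ylim = c(0, max(probs)), main = "Y-axis from zero",
     xlab = "Year", ylab = "Chance of dying from a firearm (%)")

par(op)  # restore the previous panel layout
```

In the left panel the line runs nearly from the top of the plot to the bottom; in the right panel it is a small step down near the top.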

Prices are higher in small towns than in big cities (also, a tutorial for R)

I’ve decided to start learning statistical computing ahead of the harder stats classes I’ll be taking this fall (my subfield within the political science major is Empirical Theory and Quantitative Methods). As a first small project to teach myself the basics of the R language and environment, I decided to compare the consumer price index (CPI) in small cities (population under 50,000) with the CPI in large cities (population over 1,500,000). To do that, I needed to get the data, format it in an R-friendly way, and then present it in a way that makes sense. Since many of the R tutorials out there aren’t very clear on some points, I decided to document my steps as I figured out what worked.

Getting data

The Bureau of Labor Statistics gives anyone access to its consumer price index database and lets you view the data for specific areas. The two series I chose were Size Class A (population over 1,500,000) and Size Class D (under 50,000) for 1993 to 2012. After retrieving the data as tables, I pasted each into a separate Numbers spreadsheet (this is on my MacBook Air) and exported them to my Downloads folder as “cpibig19932012.csv” and “cpi19932012.csv”, respectively.

Getting it into R

Working in RStudio, I clicked the Files tab in the bottom-right window, clicked Home, clicked Downloads (or wherever you saved the .csv files), clicked More, then Set As Working Directory. This makes that folder R’s working directory, so the .csv files can be opened by name alone.
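If you’d rather keep that step in your script, setwd() does the same thing from code. This is a sketch that assumes the files are in ~/Downloads; the data_dir variable is a hypothetical name introduced here, so point it at wherever you actually saved the files.

```r
# Assumed location of the exported CSV files (hypothetical; edit to match your setup)
data_dir <- path.expand("~/Downloads")

# Same effect as Files > More > Set As Working Directory in RStudio
if (dir.exists(data_dir)) {
  setwd(data_dir)
}

getwd()                          # print the current working directory
list.files(pattern = "\\.csv$")  # the .csv files in it, which read.csv() can open by name
```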

In a new script in the top left window, I import the data into variables cpi and cpiBig for the small cities and big cities, respectively:

cpi <- read.csv(file="cpi19932012.csv", header=TRUE, sep=",")       # small cities (Size Class D)
cpiBig <- read.csv(file="cpibig19932012.csv", header=TRUE, sep=",") # large cities (Size Class A)
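Before plotting, it helps to check that the import worked. Since the real BLS files aren’t available here, this sketch writes a tiny stand-in CSV to a temporary file; the numbers in it are placeholders, not real CPI values, and the csv_path and demo names are hypothetical. Note that header = TRUE makes R use the first row as column names (R also accepts the abbreviation head=TRUE, because it partially matches argument names), and sep = "," is already read.csv()’s default.

```r
# Placeholder CSV shaped like the exported tables (Year and Annual columns).
# These values are made up for illustration and are not real BLS figures.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Year,Annual",
             "1993,100.0",
             "1994,102.5",
             "1995,105.0"), csv_path)

# First row becomes the column names
demo <- read.csv(file = csv_path, header = TRUE)

str(demo)   # 3 obs. of 2 variables: Year (int) and Annual (num)
nrow(demo)  # one row per year
```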

Making a graph

I decided that the best way to show the data over time would be a line chart with both data sets on the same graph. I started by choosing a title, “Consumer Price Index in small vs. large cities 1993-2012”:

heading <- "Consumer Price Index in small vs. large cities 1993-2012"

Next, I had to set up the axes of the graph:

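plot(cpi$Year,
     cpi$Annual,
     type="n",
     main=heading,
     xlab = "Year",
     ylab = "Average Annual CPI",
     ylim = range(cpi$Annual, cpiBig$Annual))

This call:

  • sets the x-axis values to the years from the small cities data set,
  • sets the y-axis values to the Average Annual consumer price index from the small cities data set,
  • tells R not to draw any data yet (type="n" creates an empty plot frame for the lines added next),
  • uses heading as the chart title,
  • labels the x-axis as Year,
  • labels the y-axis as Average Annual CPI,
  • stretches the y-axis to cover both data sets, so the large-city line isn’t cut off where it falls below the small-city values.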

To see all the columns in a data set that you can assign to the axes, type the following into the Console in the bottom-left window:

names(cpi)  

You can replace “cpi” with the name of whichever data frame you’re interested in.

Then we graph the data as lines, with small cities colored red and large cities colored blue:

lines(cpi$Year, cpi$Annual, type="l", col="red")         # small cities
lines(cpiBig$Year, cpiBig$Annual, type="l", col="blue")  # large cities

Finally, we give the chart a legend:

legend("topleft", title="City Size", cex=0.75, lty=1,
       col=c("red", "blue"), legend=c("Pop. < 50,000", "Pop. > 1,500,000"), ncol=2)

This tells R to put the legend in the top-left corner of the chart, title it City Size, shrink its text to 75% (cex=0.75), give each entry the matching line color, label each entry with its city size, and lay the two entries out side by side (ncol=2).

To see the output of your script, click Source at the top of the script window (top left). You should see something like this in the Plots tab of the bottom-right window:
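To save the chart as an image file rather than viewing it in the Plots tab, you can wrap the same plotting calls between png() and dev.off(). This is a self-contained sketch: the two data frames hold placeholder values, not real BLS figures, so skip the first two lines if you’ve already loaded the real cpi and cpiBig. The out_file path is a hypothetical name chosen for illustration.

```r
# Placeholder data frames shaped like cpi and cpiBig (Year and Annual columns).
# The values are made up for illustration and are not real BLS figures.
cpi    <- data.frame(Year = 1993:1996, Annual = c(140, 144, 148, 152))
cpiBig <- data.frame(Year = 1993:1996, Annual = c(138, 141, 145, 149))

heading  <- "Consumer Price Index in small vs. large cities"
out_file <- file.path(tempdir(), "cpi_small_vs_big.png")  # hypothetical output path

png(out_file, width = 800, height = 600)  # send the plot to a PNG file

plot(cpi$Year, cpi$Annual, type = "n", main = heading,
     xlab = "Year", ylab = "Average Annual CPI",
     ylim = range(cpi$Annual, cpiBig$Annual))  # room for both lines
lines(cpi$Year, cpi$Annual, col = "red")       # small cities
lines(cpiBig$Year, cpiBig$Annual, col = "blue")  # large cities
legend("topleft", title = "City Size", cex = 0.75, lty = 1,
       col = c("red", "blue"),
       legend = c("Pop. < 50,000", "Pop. > 1,500,000"), ncol = 2)

dev.off()              # close the device so the file is written
file.exists(out_file)  # TRUE once the image has been saved
```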

Plot of CPI in small and big cities

So what’s happening?

The line for small cities is consistently higher than the line for big cities. How does that make sense? Aren’t small towns full of poor rednecks, and cities full of wealthy-ish hipster urbanites? 

I asked my friend Jason Zeng, an economic analyst here in Berkeley, about it, and he gave the following explanation: it comes down to rich suburbanites and urban squalor. The poor in big cities can’t afford the higher-quality goods that wealthier suburban commuters buy, so the prices they pay are lower. There are more poor people in the cities than in the suburbs, so the CPI for cities is dragged below the CPI for suburbs.
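To check the claim that the small-city line is higher in every year, rather than eyeballing the chart, you can line the two data sets up by year and look at the gap. This sketch uses placeholder data frames with the same Year and Annual columns as cpi and cpiBig (the values are made up, not real BLS figures); the both and gap names are hypothetical.

```r
# Placeholder data frames with the same columns as cpi and cpiBig.
# The values are made up; substitute the real data frames loaded earlier.
cpi    <- data.frame(Year = 1993:1996, Annual = c(140, 144, 148, 152))
cpiBig <- data.frame(Year = 1993:1996, Annual = c(138, 141, 145, 149))

# Match rows by year; the shared Annual column gets a suffix for each source
both <- merge(cpi, cpiBig, by = "Year", suffixes = c(".small", ".big"))

# Positive when the small-city CPI is higher that year
both$gap <- both$Annual.small - both$Annual.big

both
all(both$gap > 0)  # TRUE only if small cities are higher in every year
```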