It’s incredibly difficult to climb the income ladder in the South

If you’re poor and raising kids in the South, you owe it to your children to move elsewhere as soon as possible. David Leonhardt, for The New York Times:

Especially intriguing is the fact that children who moved at a young age from a low-mobility area to a high-mobility area did almost as well as those who spent their entire childhoods in a higher-mobility area. But children who moved as teenagers did less well.


Republicans have the statistics backwards on pregnancy from rape


Proving yet again that Republicans don’t know when to stop talking, here’s Celeste Greig, president of the California Republican Assembly:

“Granted, the percentage of pregnancies due to rape is small because it’s an act of violence, because the body is traumatized,” Greig told the newspaper. “I don’t know what percentage of pregnancies are due to the violence of rape. Because of the trauma the body goes through, I don’t know what percentage of pregnancy results from the act.”

Yeah, the percentage is “small” if small means, you know, “twice the per-incident rate for consensual sex”:

The newspaper cited statistics from a 2003 study by St. Lawrence University that showed women get pregnant after rape at a rate that is more than double that for a single act of consensual sex. Relying on data from U.S. National Violence Against Women survey, the study said the per-incident rape-pregnancy rate was 6.4 percent while the same rate for women having consensual sex was 3.1 percent per encounter.
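As a quick check on the "more than double" claim, the ratio of the two quoted per-incident rates can be worked out directly. This is only arithmetic on the figures quoted above; the variable names are mine.

```r
# Per-incident pregnancy rates as quoted from the study (in percent)
rape_rate       <- 6.4
consensual_rate <- 3.1

# Ratio of the two rates; anything above 2 means "more than double"
rate_ratio <- rape_rate / consensual_rate
round(rate_ratio, 2)
```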

Your probability of dying from a firearm (or, how to manipulate statistics)

probability of dying from a firearm

The above is a line chart of your probability of dying from a firearm each year from 2003 to 2010 (it would have included more years, but finding death statistics is a pain for some reason). I made it to demonstrate how easy it is to manipulate numbers in order to tell a specific story. The numbers work out: I found the probability of dying in the United States in a given year, found the probability that a death was caused by a firearm, and multiplied the two together.
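Here is a minimal sketch of that multiplication in R. The counts below are made-up round numbers chosen only to show the mechanics, not real statistics, and the variable names are my own.

```r
# Hypothetical round numbers, for illustration only (NOT real statistics)
population     <- 1000000  # people alive during the year
deaths         <- 8000     # deaths from all causes that year
firearm_deaths <- 100      # deaths involving a firearm that year

# P(dying this year)
p_death <- deaths / population

# P(death was from a firearm, given that you died)
p_firearm_given_death <- firearm_deaths / deaths

# P(dying from a firearm this year) is the product of the two
p_firearm_death <- p_death * p_firearm_given_death

# The product collapses to firearm deaths divided by population
all.equal(p_firearm_death, firearm_deaths / population)
```

Note that the total number of deaths cancels out of the product, so the final figure depends only on firearm deaths and population.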

The chart would lead you to believe that your chances of dying from a firearm have fallen. The problem is that the chart doesn’t include anywhere near enough information to tell the full story. For that, you have to look at the actual year-by-year statistics: the number of deaths, the number of firearm deaths, and the probabilities involved.

When you look at the actual numbers, the number of gun deaths per year did indeed fall from 2003 to 2010. However, the change in the final probability of dying from a firearm was negligible in absolute terms: it fell from 0.0038% to 0.0032%.
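The same two figures can be framed in two very different ways, which is part of what makes this kind of chart easy to spin. A small sketch using only the two percentages quoted above (variable names are mine):

```r
# Probabilities quoted in the post, in percent
p_2003 <- 0.0038
p_2010 <- 0.0032

# Absolute change, in percentage points
absolute_change <- p_2003 - p_2010

# Relative change, as a share of the 2003 value
relative_change <- absolute_change / p_2003

round(absolute_change, 4)        # change in percentage points
round(100 * relative_change, 1)  # change as a percent of the 2003 figure
```

A change of a few ten-thousandths of a percentage point sounds like nothing, while the relative figure sounds substantial, even though both describe the same data.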

(Also, the number of deaths from firearms last year was higher than in 2003, so the trend didn’t continue.)

Keep this in mind when looking at statistics and graphics about political issues.

Prices are higher in small towns than in big cities (also, a tutorial for R)

So I’ve decided to start learning about statistical computing ahead of the harder stats classes that I’ll be taking this fall (my subfield within the political science major is Empirical Theory and Quantitative Methods). As my first little project to teach myself the basics of the R language/environment, I decided to take a look at the consumer price index in small cities (population less than 50,000) versus large cities (population greater than 1,500,000). To do that, I needed to get the data, format it in a way that was R-friendly, and then present it in a way that makes sense. Since I noticed that many of the R tutorials out there aren’t very clear on some things, I decided to document my steps as I figured out what worked.

Getting data

The Bureau of Labor Statistics gives anyone access to their consumer price index database, and lets you see the information for specific regions. The two pieces of data I chose were Size Class A (over 1,500,000) and Size Class D (under 50,000) for 1993 to 2012. Retrieving the data as tables, I pasted each into a separate Numbers spreadsheet (this is on my MacBook Air) and exported them to my Downloads folder as “cpibig19932012.csv” and “cpi19932012.csv”, respectively.

Getting it into R

Working in RStudio, I clicked on the Files tab in the bottom right window, clicked Home, clicked Downloads (or wherever you decided to save the .csv files), clicked More, then Set As Working Directory. This lets us access the .csv files in the R environment.

In a new script in the top left window, I import the data into variables cpi and cpiBig for the small cities and big cities, respectively:

# Small cities (Size Class D) and big cities (Size Class A)
cpi <- read.csv(file="cpi19932012.csv", header=TRUE, sep=",")
cpiBig <- read.csv(file="cpibig19932012.csv", header=TRUE, sep=",")
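
After importing, it is worth checking that R parsed the file the way you expected. The sketch below writes a tiny stand-in CSV so it runs without the BLS files. The column names Year and Annual match the ones used later in this post, but the values are placeholders and `demo` is just an illustrative name.

```r
# Write a tiny stand-in CSV (placeholder values, NOT BLS data)
tmp <- tempfile(fileext = ".csv")
writeLines(c("Year,Annual", "1993,100.0", "1994,102.5"), tmp)

# Same call as in the post, pointed at the stand-in file
demo <- read.csv(file = tmp, header = TRUE, sep = ",")

str(demo)   # column types: both columns should be numeric, not text
head(demo)  # first rows, to confirm the header row was picked up
```

If a numeric column shows up as text in `str()`, the usual culprit is stray formatting in the exported spreadsheet.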

Making a graph

I decided that the best way to represent the data over time would be a line chart showing both data sets on the same graph. I start by deciding on a heading, “Consumer Price Index in small vs. large cities 1993-2012”:

heading <- "Consumer Price Index in small vs. large cities 1993-2012"

Next, I had to set up the axes of the graph:

plot(cpi$Year,
     cpi$Annual,
     type="n",
     main=heading,
     xlab="Year",
     ylab="Average Annual CPI")

This call:

  • sets the x-axis to the years from the small cities dataset,
  • sets the y-axis to the Average Annual consumer price index from the small cities dataset,
  • tells R to draw only the empty axes for now, without plotting the data points (type="n"),
  • uses the heading from above as the chart title,
  • labels the x-axis as Year,
  • labels the y-axis as Average Annual CPI

To see all of the columns in a dataset that you could assign to an axis, type the following into the Console in the bottom left window:

names(cpi)

You can replace “cpi” with whichever variable you’re interested in.

Then we graph the data as lines, with small cities colored red and large cities colored blue:

lines(cpi$Year, cpi$Annual, type="l", col="red")
lines(cpiBig$Year, cpiBig$Annual, type="l", col="blue")

Finally, we give the chart a legend:

legend("topleft", title="City Size", cex=0.75, pch=16,
       col=c("red", "blue"), legend=c("Pop. < 50,000", "Pop. > 1,500,000"), ncol=2)

This tells R to put the legend in the top left of the chart, title it City Size, color the legend markers to match the lines, and give each one the correct label.

To see the output of your script, click Source and then Run in the top left window. You should have something like this show up in the bottom right window:

Plot of CPI in small and big cities
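
If you don’t have the BLS files handy, here is a self-contained sketch of the whole script. The two data frames hold placeholder values, not BLS figures. It also makes three choices that differ from the steps above and are my own assumptions about what is convenient: it sets the y-axis range from both series so neither line gets clipped, it uses lty=1 so the legend shows lines rather than dots, and it draws to a temporary PDF so it runs outside RStudio too.

```r
# Placeholder values for illustration only; these are NOT BLS figures
cpi    <- data.frame(Year = 1993:1996, Annual = c(100, 103, 106, 109))
cpiBig <- data.frame(Year = 1993:1996, Annual = c(98, 100, 102, 104))

heading <- "CPI in small vs. large cities (placeholder data)"

# Use both series for the y-axis range so neither line is cut off
yRange <- range(c(cpi$Annual, cpiBig$Annual))

# Draw to a temporary PDF instead of the RStudio plot pane
outFile <- tempfile(fileext = ".pdf")
pdf(outFile)

# Empty axes first, then one line per dataset
plot(cpi$Year, cpi$Annual, type = "n", ylim = yRange,
     main = heading, xlab = "Year", ylab = "Average Annual CPI")
lines(cpi$Year, cpi$Annual, col = "red")
lines(cpiBig$Year, cpiBig$Annual, col = "blue")

# lty = 1 shows solid line samples in the legend
legend("topleft", title = "City Size", cex = 0.75, lty = 1,
       col = c("red", "blue"),
       legend = c("Pop. < 50,000", "Pop. > 1,500,000"), ncol = 2)

# Close the device so the file is written out
invisible(dev.off())
```

In RStudio you can drop the pdf() and dev.off() lines and the chart will appear in the Plots tab as described above.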

So what’s happening?

The line for small cities is consistently higher than the line for big cities. How does that make sense? Aren’t small towns full of poor rednecks, and cities full of wealthy-ish hipster urbanites? 

I asked my friend Jason Zeng, an economic analyst here in Berkeley, about it and he gave the following explanation: it comes down to rich suburbanites and urban squalor. The poor in big cities can’t buy the quality goods that the wealthier commuters in suburbs do, so the prices they pay are lower. There are more poor people in the cities than in the suburbs, so the CPI for cities is dragged lower than the CPI for suburbs.

Nate Silver on whether or not gun control would make America safer

From his AMA on reddit:

It’s a tricky problem, statistically. The issue is that while gun ownership rates could plausibly be a cause of fatal crimes and accidents, it can also be a reaction to it, i.e. people purchase guns because they feel unsafe.

I’m not saying that the issue is intrinsically inscrutable. But it’s something that more requires a PhD-thesis-level treatment than a blog post to really add much insight, I think.

You were probably hoping for a clear, concise answer. The problem is that the issue of gun control really isn’t as cut-and-dried as many on both sides of the debate make it out to be.