The idea started as I was driving back from a caving trip in southern Utah. My friend Mark and I drove through Colorado City, Arizona, known for its community of polygamists.
“Wow,” we thought, “this is not a normal town.”
As I mentioned, we had just spent a few days in Utah, a state known for its disproportionate percentage of followers of the Mormon faith. And as we drove back we crossed through Nevada, a state you always know you have entered on account of the large casinos sitting right on the border. Not exactly normal.
As we pondered what it was that we meant by ‘normal’, we realized that our beloved home of California certainly wouldn’t fit the mold.
“So,” we asked ourselves, “which state is the normal one?”
If the states were family members in a bizarre sitcom – which one would be the main character? The relatable one. The one whose character wasn’t saturated with stereotypes and caricature.
We debated for the better part of a few hours – which is nice on a 12-hour drive. Illinois and South Carolina seemed to come out on top – but none of this was based on anything solid.
So, operating under the belief that one state is more normal than the others – I am going to attempt to pull together some data that will make clear which one it is.
Wish me luck.
Step 1: What makes a state normal?
As we drove, we began to brainstorm a list of state attributes that could help determine a state’s normalness, or lack thereof.
We settled on a few major categories:
- Social Norms
- Geography & Climate
Each category could then be broken down into a number of more quantifiable attributes. For example – in politics, we would compare the state’s popular vote breakdown to the nation’s in the most recent presidential election.
Our methodology for this project is to compute, for each attribute, the absolute value of each state’s z-score, meaning the number of standard deviations the state sits from the mean, with the sign thrown away. This will show which states are closest to the national mean – the ones with low z-scores – and which ones are not – those with higher z-scores.
We realized that it didn’t matter whether a state was above or below the national average, because we will be comparing across attributes that are in no way related. If we tried to keep track of positive and negative values, we would be presented with the challenge of determining whether high amounts of rain were more of a Democrat or Republican thing. We don’t care – if you’re not close to the mean – you’re not normal.
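To make the scoring concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration: the state names and numbers are made-up placeholders, the mean and standard deviation are taken across the states themselves as a stand-in for the national figure, and averaging the absolute z-scores across attributes is just one possible way to combine them.

```python
from statistics import mean, pstdev


def abs_z_scores(values):
    """Return {state: |z|} for one attribute.

    Assumption: mean and standard deviation are computed across the
    states given, unweighted, as a stand-in for the national figure.
    """
    mu = mean(values.values())
    sigma = pstdev(values.values())
    if sigma == 0:
        # Every state is identical on this attribute, so none deviates.
        return {state: 0.0 for state in values}
    return {state: abs(v - mu) / sigma for state, v in values.items()}


def normalness(attributes):
    """Average each state's |z| over all attributes (lower = more normal).

    Assumption: a plain average is used to combine attributes.
    """
    per_attribute = [abs_z_scores(a) for a in attributes]
    states = per_attribute[0].keys()
    return {s: mean(z[s] for z in per_attribute) for s in states}


# Made-up placeholder data, not real measurements.
rainfall = {"State A": 10.0, "State B": 20.0, "State C": 30.0, "State D": 20.0}
vote_share = {"State A": 50.0, "State B": 50.0, "State C": 40.0, "State D": 60.0}

scores = normalness([rainfall, vote_share])
most_normal = min(scores, key=scores.get)
print(most_normal, scores)
```

Because only the distance from the mean is kept, a very rainy state and a very dry one are penalized equally, which is exactly the point.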
Step 2: Getting the data
Now that we know what we are measuring and comparing against, we need some data. I’m really hoping there is a massive state info API out there – but in the likely case that there is not – it might be time to start web scraping & Mechanical Turking. This is going to be the fun part.
I will be keeping all code publicly viewable on GitHub – so feel free to contribute: Most Normal State Github Repository
You can read part 2 here, where I attempt to get data from the US Census.