CanViz: Popular surnames as recorded by the U.S. Census [2017-10-11]
Attention
- Slugline
- padjo-2017 can-viz census-popular-surnames your_sunet_id
Turn-in instructions
Send an email to dun@stanford.edu
The subject line should match the following exactly (spacing, case, etc.), except with your own SUNet ID in place of the example one:
padjo-2017 can-viz census-popular-surnames lelandjr
Send me an attachment containing a Google Document (i.e. word processor page). This page can contain several charts, with words, WordArt, etc. if necessary.
- Due
- 2017-10-11 23:59
- Description
- The Census has 2000 and 2010 data on Americans’ last names, including how many Americans share each common last name and the ethnic/racial makeup of the people who have it.
Background
Among the many data points collected by the U.S. Census surveys are the survey respondents’ names. While the Census doesn’t publish its socioeconomic/demographic data in a form so granular that names are attached, it does publish name data in aggregate form – namely, the number of Americans who share a surname, and the demographics of that group.
Here is the Census’s landing page for this information, where you can download the original spreadsheets yourself:
https://www.census.gov/topics/population/genealogy/data.html
I recommend just copying the spreadsheets I’ve made in Google Sheets for yourself:
However, I do recommend reading the technical documentation on the Census site. I believe 2000 and 2010 are pretty similar, but you should double-check for yourself. Reading the docs will also help you make sense of the column names:
- https://www2.census.gov/topics/genealogy/2000surnames/surnames.pdf
- https://www2.census.gov/topics/genealogy/2010surnames/surnames.pdf
Real-life uses of surname data
If you need examples of visualizations, you can find a few on the Census site itself:
- https://www.census.gov/library/visualizations/2016/comm/cb16-tps154_top_surnames.html
- https://www.census.gov/library/visualizations/2016/comm/cb16-tps154_surnames_top15.html
(yes, lists/tables count as visualizations)
Here’s an example from Randy Olson, though he had to use the much more limited 1990 version of the data:
http://www.randalolson.com/2014/05/31/fastest-growing-and-declining-surnames-in-the-u-s/
Beyond visualizations, the hypothesis that someone’s ethnicity might correlate with their last name has led to interesting applications, such as the government using surnames to guesstimate whom to send mailers about race-specific lawsuits. The Wall Street Journal developed a nice little interactive that simulates the methodology:
Directions
Make me a visualization that tells me something interesting about American last names: their frequency, the demographics of the people who have them, how things changed between 2000 and 2010, or any combination thereof.
The visualization can consist of several charts if Google Sheets’ Wizard isn’t enough. I recommend pasting them into a Google Document and using the drawing tools if you need extra annotation.
Send me an email with the slugline:
padjo-2017 can-viz census-popular-surnames your_sunet_id
The email should contain an attached document (or an image file).
Beware big data
OK, the Census data isn’t really “big data”. But for spreadsheets, especially Google Sheets, it has too many cells to run a calculation across all of them.
So my advice is that after you make copies of my 2 spreadsheets, make a new spreadsheet into which you’ve copied/pasted just a subset of the full Census data. What could that subset be? The top 10/100/1000 is a start. But consider all the other attributes that we have for each surname that you can sort by…
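If you would rather build the subset outside of a spreadsheet, here is a minimal Python sketch of the same idea: sort by a column, then keep only the first N rows. The column names (`name`, `rank`, `count`, `pcthispanic`) are my assumption about how the Census files are labeled, so check them against the technical documentation. The rows are made-up placeholders, not real Census figures.

```python
import csv
import io

# Made-up placeholder rows, NOT real Census figures. The column names
# (name, rank, count, pcthispanic) are assumed to match the Census files.
SAMPLE_CSV = """name,rank,count,pcthispanic
SURNAME_A,1,5000,2.5
SURNAME_B,2,4000,93.0
SURNAME_C,3,3000,10.0
SURNAME_D,4,2000,95.5
SURNAME_E,5,1000,1.0
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, converting numeric columns."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["rank"] = int(row["rank"])
        row["count"] = int(row["count"])
        row["pcthispanic"] = float(row["pcthispanic"])
        rows.append(row)
    return rows


def top_n(rows, n, key="count"):
    """Return the n rows with the largest value in the given column."""
    return sorted(rows, key=lambda r: r[key], reverse=True)[:n]


rows = load_rows(SAMPLE_CSV)
most_common = top_n(rows, 3)                       # top 3 by number of people
most_hispanic = top_n(rows, 2, key="pcthispanic")  # top 2 by a different attribute
```

Swapping the `key` argument is the equivalent of sorting the spreadsheet by a different column before copying the first N rows into a new sheet.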