Categorizing the Census Surname Data
(Maybe do this in class?)
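What is a categorical variable vs. a continuous variable?
A categorical variable takes one of a limited set of labels (for example, a name's first letter), while a continuous variable is a number that can fall anywhere in a range (for example, a percentage).
The problem with the Census surname data is that nearly all of its columns are continuous variables, which leaves few ready-made categories for grouping or filtering names.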
Calculating attributes of surnames
- Length of name
- First character
- Last character
- First/last 2 characters
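The attributes listed above can each be computed from the surname string alone. The following is a minimal sketch in plain Python; the function name `surname_attributes` and the dictionary keys are hypothetical names chosen for illustration, not part of the Census data.

```python
def surname_attributes(name):
    """Return simple categorical attributes derived from a surname string.

    The keys below are illustrative names, not columns in the Census file.
    """
    return {
        "length": len(name),        # length of name
        "first_char": name[:1],     # first character
        "last_char": name[-1:],     # last character
        "first_two": name[:2],      # first 2 characters
        "last_two": name[-2:],      # last 2 characters
    }


attrs = surname_attributes("SMITH")
print(attrs["length"], attrs["first_char"], attrs["last_two"])
```

Slicing (`name[:1]`, `name[-2:]`) is used instead of indexing so that very short or empty names return a shorter string rather than raising an error.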
Deriving more values from the pct columns
- White vs. non-white
- Estimated number of people in each group (white, etc.) for a given name
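Both derived values come from simple arithmetic on a row's total count and its percentage columns. The sketch below assumes each row is a dictionary with a `count` key and a `pctwhite` key already parsed as numbers; those key names are assumptions about the file's headers, and the sample row is made up for illustration, not real Census data.

```python
def derive_values(row):
    """Derive white vs. non-white values from one row of surname data.

    Assumes row["count"] is the number of people with the name and
    row["pctwhite"] is a percentage between 0 and 100.
    """
    pct_white = row["pctwhite"]
    est_white = round(row["count"] * pct_white / 100)
    return {
        "pct_nonwhite": round(100.0 - pct_white, 2),
        "est_white": est_white,
        "est_nonwhite": row["count"] - est_white,
    }


# Made-up example row (not real Census figures).
sample = {"name": "EXAMPLE", "count": 1000, "pctwhite": 70.0}
print(derive_values(sample))
```

The estimated counts are rounded because the percentages in the source file are themselves rounded, so the results are approximations rather than exact head counts.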
Categorizing the group-percentage values
- Diverse vs. non-diverse name?
- Labeling a name by its most dominant racial/ethnic group
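Both ideas turn a set of continuous percentages into a single categorical label. The sketch below is one possible approach under stated assumptions: the column names in `GROUP_COLUMNS` are assumed to match the file's headers, the 50% cutoff for "diverse" is an arbitrary threshold picked for illustration, and the sample rows are made up.

```python
# Assumed names of the group-percentage columns; adjust to the actual file.
GROUP_COLUMNS = [
    "pctwhite", "pctblack", "pctapi",
    "pctaian", "pct2prace", "pcthispanic",
]


def dominant_group(row, columns=GROUP_COLUMNS):
    """Return the name of the column holding the largest percentage."""
    return max(columns, key=lambda col: row[col])


def is_diverse(row, threshold=50.0, columns=GROUP_COLUMNS):
    """Label a name diverse when no single group reaches the threshold.

    The default threshold of 50.0 is an illustrative assumption.
    """
    return max(row[col] for col in columns) < threshold


# Made-up example rows (not real Census figures).
row_a = {"pctwhite": 70.0, "pctblack": 20.0, "pctapi": 2.0,
         "pctaian": 1.0, "pct2prace": 2.0, "pcthispanic": 5.0}
row_b = {"pctwhite": 40.0, "pctblack": 35.0, "pctapi": 5.0,
         "pctaian": 1.0, "pct2prace": 4.0, "pcthispanic": 15.0}

print(dominant_group(row_a), is_diverse(row_a))
print(dominant_group(row_b), is_diverse(row_b))
```

Note that a name can have a dominant group and still count as diverse under this definition, as `row_b` does; whether that is the desired behavior depends on how "diverse" is defined for the lesson.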