Sorting a spreadsheet by single columns¶
Overview¶
How to sort the rows in a spreadsheet based on values in a single column. Choosing what to sort by is one of the most fundamental and powerful data analysis techniques, as basic as it might seem.
This is meant to be a basic lesson for those folks very new to modern spreadsheets. If you know enough about spreadsheets to sort a list of names in alphabetical order, and then to sort those names in reverse-alphabetical order, you may want to skip ahead to:
Sorting a spreadsheet using multiple columns
Data links¶
For this lesson, I’ve created a public Google sheet that you can view and copy for yourself. It contains multiple sheets, each one corresponding to the sorting examples in this lesson:
Sorting the sheet by a single column¶
Ascending vs. Descending order; from A to Z, and Z to A¶
The resulting transformation to the sheet is that the rows are now ordered based on the each row’s respective value in the count
column:
Note that the rank_within_sex
values for rows now at the top of the sheet are 100
– i.e. these names are the 100th of the the top 100 names, i.e. the least frequent in baby count.
Sorting by A -> Z means to sort by alphabetical order, i.e. first to last.
When the values are numbers, the sort is done from smallest to biggest number. In the case of our data, the count
of 2533
comes before 2544
.
At this point, you should Undo the sort so that the data rows are in their original order, i.e. Mary
and David
at the top for the year 1960, with respective counts of ``51477
and 85940
.
Descending order, i.e. Z to A¶
Now sort the count
column in reverse order, i.e. Sort sheet Z -> A.
When the column of values involves string text, the values are sorted from “Z to A”, i.e. reverse alphabetical order. For numerical values, the sort is done from biggest to smallest.
For our data, note how many of the top names seem to be the male names of “David” and “Mike” – apparently, these names for boys were far more popular than whatever the most popular girl names were in the respective years:
The need for tie-breakers when sorting¶
Just in case you’re wondering what a normal alphabetical sort looks like, try sorting the name
column. You don’t have to Undo the previous reverse-sort-by-count operation – there’s no reason why a sort-by-name
operation has to care about how the data was previously sorted by count
:
Multiple rows have “Zoey” as the name
value. So how are all these “Zoey” rows sorted? It appears to be in descending of the count
column, i.e. most Zoeys to least Zoeys. Why is that? Most likely coincidence, from how the rows were originally sorted.
In fact, coincidence is the most likely explanation for how the Zoey (and other rows) are sorted. And it should be an unsatisfactory explanation.
We haven’t encountered an option in the spreadsheet software that lets us sort by anything more than a single column, which means we haven’t yet had any input on tie values. So if we want anything else besides the row with the highest count of Zoeys, such as the row with the least count of Zoeys. Or the row with “Zoey” for name with the smallest (i.e. oldest) year
value – we’re currently out of luck.