Sorting a spreadsheet by single columns

Overview

How to sort the rows in a spreadsheet based on values in a single column. Choosing what to sort by is one of the most fundamental and powerful data analysis techniques, as basic as it might seem.

This is meant to be a basic lesson for those folks very new to modern spreadsheets. If you know enough about spreadsheets to sort a list of names in alphabetical order, and then to sort those names in reverse-alphabetical order, you may want to skip ahead to:

Sorting a spreadsheet using multiple columns

Sorting the sheet by a single column

Using the column label dropdown menu

../../../_images/single-column-sort-hover-col-dropdown.png

Click the dropdown symbol to show the (lengthy) dropdown menu. Then click the menu action, Sort sheet A -> Z, and the effect should be immediate.

../../../_images/single-column-sort-column-label-dropdown-sort-sheet-a-to-z.png

Ascending vs. Descending order; from A to Z, and Z to A

The resulting transformation to the sheet is that the rows are now ordered based on the each row’s respective value in the count column:

Note that the rank_within_sex values for rows now at the top of the sheet are 100 – i.e. these names are the 100th of the the top 100 names, i.e. the least frequent in baby count.

Sorting by A -> Z means to sort by alphabetical order, i.e. first to last.

When the values are numbers, the sort is done from smallest to biggest number. In the case of our data, the count of 2533 comes before 2544.

At this point, you should Undo the sort so that the data rows are in their original order, i.e. Mary and David at the top for the year 1960, with respective counts of ``51477 and 85940.

Descending order, i.e. Z to A

Now sort the count column in reverse order, i.e. Sort sheet Z -> A.

When the column of values involves string text, the values are sorted from “Z to A”, i.e. reverse alphabetical order. For numerical values, the sort is done from biggest to smallest.

For our data, note how many of the top names seem to be the male names of “David” and “Mike” – apparently, these names for boys were far more popular than whatever the most popular girl names were in the respective years:

../../../_images/single-column-sort-mostly-michaels.png

The need for tie-breakers when sorting

Just in case you’re wondering what a normal alphabetical sort looks like, try sorting the name column. You don’t have to Undo the previous reverse-sort-by-count operation – there’s no reason why a sort-by-name operation has to care about how the data was previously sorted by count:

../../../_images/single-column-sort-name-zoeys.png

Multiple rows have “Zoey” as the name value. So how are all these “Zoey” rows sorted? It appears to be in descending of the count column, i.e. most Zoeys to least Zoeys. Why is that? Most likely coincidence, from how the rows were originally sorted.

In fact, coincidence is the most likely explanation for how the Zoey (and other rows) are sorted. And it should be an unsatisfactory explanation.

We haven’t encountered an option in the spreadsheet software that lets us sort by anything more than a single column, which means we haven’t yet had any input on tie values. So if we want anything else besides the row with the highest count of Zoeys, such as the row with the least count of Zoeys. Or the row with “Zoey” for name with the smallest (i.e. oldest) year value – we’re currently out of luck.

Sorting using the Data menu

Using the dropdown-column menu to sort by columns is by far the most convenient way to sort data by a single column.

In Google Sheets, this action can also be found in the Data menu. First you have to click anywhere on the desired column, such as the column D for count in our example:

../../../_images/single-column-sort-click-d.png

Then click the Data menu, which will include the alphabetical/reverse-alphabetical sorting actions:

  • Sort sheet by column D, A -> Z
  • Sort sheet by column D, Z -> A
../../../_images/single-column-sort-click-d-data-menu-actions.png

Never NOT sort the entire sheet

Sometimes, the Data menu will have 2 seemingly similar sort actions:

  • Sort range by column D, A -> Z
  • Sort range by column D, Z -> A

I won’t elaborate on how these options come up in the interface. It’s enough to tell you with almost absolute certainty that you should never sort using those options. Here’s a helpful diagram:

../../../_images/single-column-sort-click-d-data-menu-sort-range-beware.png

To “sort range” by a single column basically means to sort the values in a single column independently of their respective rows, i.e. to completely disconnect the values of a column from all the other columns. I can’t think of any reason why that is useful, other than this being a classic way for people to unwittingly ruin their data.

Using our original dataset, here’s what a ascending-order sort (i.e. A to Z) by the count column looks like using the “sort range by [single] column”:

../../../_images/single-column-sort-d-sort-range-bad.png

Sort range

The main reason why I bring up the Data menu in the context of sorting is that there is an action named Sort range… that will be immensely useful to us:

../../../_images/multi-column-sort-data-sort-range-action.png

This action provides the functionality in Google Sheets to sort by multiple columns, i.e. tell the spreadsheets how to break a tie when sorting by one column just isn’t enough.