SQL Counting and Filtering Lesson¶
The SQLite data¶
Here are the quick links to the downloadable SQLite databases that we’re using today. The .sqlite
file extension is arbitrary, it’s just there so that when we look at the filename (from among many that we download), we have a reminder of what the file may be. The DB Browser can open any file you point it to.
However, SQLite databases are indeed a binary file format. Open it in Excel and it will throw an error. Open it in a plain text editor and you’ll get gibberish.
Census Most Frequent Surnames for 2000 and 2010¶
As practiced in this assignment – except with all of the data in one table, as opposed to separate spreadsheets. : CanViz: Popular surnames as recorded by the U.S. Census [2017-10-11]
Download: census_surnames.sqlite
Palo Alto Public Salaries, from 2011 to 2016¶
As practiced in this assignment – except with all of the data, 2011 through 2016, in one table, not just 2014 and 2016: : Palo Alto Pay Aggregated, 2014 vs 2016 [2017-10-16]
Download: palo-alto-salaries.sqlite
Reference¶
Previous lessons for Reading¶
I’ve been wanting to streamline and prettify these lessons, but they pretty much cover the concepts exactly as I would in any year. The lessons refer to different downloaded datasets – this is a minor detail that you can figure out, or if you can’t, you can download the past versions of data. It doesn’t matter for our purposes, as the SQL programming is the same.
- Getting started with SQLite Browser
- SELECTing rows FROM data tables
- LIMIT and ORDER BY in SQL queries
- USING the WHERE clause to filter data in SQL
- More Boolean Expressions to Filter SQL Queries Using LIKE, IN, BETWEEN, and wildcards to match multiple values in SQL
- Functions for transforming text and numbers in SQL
- Aliasing Columns and Tables in SQL Databases <http://2015.padjo.org/tutorials/sql-basics/aliasing-values-in-sql/>`_
Counting, sorting, and filtering with SQL, plus Answers¶
Assuming you’ve dowloaded this SQLite data:
- ssa-babynames-nationwide.sqlite <http://stash.padjo.org/data/ssa-babynames-nationwide.sqlite>`_
Attempt to answer these questions: SQL Counting/Sorting/Filtering (Questions)
The answers for ssa-babynames¶
The first records (just name, sex, year) in the babynames database
SELECT name, sex, year FROM babynames LIMIT 5;
Top 5 popular baby names of all time
SELECT name, sex, count, year FROM babynames ORDER BY count DESC LIMIT 5;
Top 5 female babynames in 1991
(no need to inclue sex, since it’s all female)
SELECT name, count, year FROM babynames WHERE sex = 'F' AND year = 1991 ORDER BY count DESC LIMIT 5;
How the female name “Jenny” compares to “Jennifer” in 2001
SELECT name, count FROM babynames WHERE sex = 'F' AND year = '2001' AND (name = 'Jennifer' OR name = 'Jenny');
How the male names ‘Noah’, ‘Jordan’, and David compare, in 1980 and 2010
Include the
year
column and sort it by bothyear
andorder
to get a listing in which it’s easy to compare the years.SELECT name, year, count FROM babynames WHERE sex = 'M' AND year IN(1980, 2010) AND name IN('David', 'Noah', 'Jordan') ORDER BY year ASC, count DESC;
The 10 most popular names in 2009 that began with a vowel
SELECT name, sex, count FROM babynames WHERE SUBSTR(name, 1, 1) IN ('A', 'E', 'I', 'O', 'U') AND year = 2009 ORDER BY count DESC LIMIT 10;
The 10 most popular names in the year 2015 that begin with ‘Z’ and have at least 5 characters
SELECT name, sex, year, LENGTH(name) AS name_length, count FROM babynames WHERE name_length >= 5 AND SUBSTR(name, 1, 1) = 'Z' AND year = 2015 ORDER BY count DESC LIMIT 10;
Social Security Baby Names, Nationwide, Since 1980¶
Referred to in previous lessons:
Download: ssa-babynames-nationwide.sqlite