Using SQL: The Select statement, using Mode Analytics ~ Data-Driven Reporting

(Postdated lecture notes/handout)

SQL: SELECT FROM WHERE ORDER BY

Throughout these handouts SQL has been written out so you have a record of what different things do. I HEAVILY ENCOURAGE YOU TO TYPE IT OUT AND NOT COPY/PASTE. You will learn more if you make typos and mistakes along the way and have to figure out why and correct it.

There aren’t that many commands in SQL, and you can guess what they do by looking at them.

SQL commands

SELECT

used to find data

INSERT

used to add data

UPDATE

used to change data

DELETE

used to remove data

SELECT is your workhorse. You’ll build your SELECT statements with some basic keywords:

SELECT

followed by the fields you are looking for

FROM

the table (or tables) where your are looking

WHERE

certain conditions exists.

GROUP BY

deals with how you combine data

HAVING

allows you to filter data after it is grouped

ORDER BY

the sorting order your want your data

SELECT and FROM are required.

The syntax is a lot like English, so at it's basic level it is simple to understand, but it can do a lot.

SELECT

name,

address,

hometown

FROM

customers

WHERE

hometown = 'Austin';

That says give me the name, address and hometown of all the customers that have a hometown of Austin. Simple, right?

The SELECT basics

We are going to use ModeAnalytics to learn basic SQL. Log into Mode. (You should've gotten an invitation from me already and already have created a login.) Click on New Query.

For the following exercise, we are going to use a table called "est2013_48" in the schema "utdata." It sometimes helps to have the table open to reference, so in a new browser tab go to this url:

https://modeanalytics.com/utdata/tables/est2013_48

Using SELECT to view everything

We are going to query some population estimate data that is already uploaded into Mode. Type in the following, and then click Run.

SELECT *

FROM utdata.est2013_48;

What it means

SELECT means to find, and it is followed by what fields you want. In this case, we want every field, so we are using the wildcard *
FROM utdata.est2013_48 means to look in the schema “utdata” for the table “est2013_48”.
The semicolon at the end of the statement says that it is ended. It’s like -30- at the end of a story. (Wait, you youngsters probably don’t get that.)

(I’ll take a minute to explain what each column means in this data set because I know the data, but I’m not going to write it out here.)

Using LIMIT

Did anyone notice how many results we got? Do you think there are only 100 records in this data set?

LIMIT saves you time by only returning a limited number of records. SQL managers like Mode and MySQL Workbench set a base limit by default, but you can override it.

Click off the “Limit 100” button at the top of the Mode window and see how many records you get. Should be 3102.

You can also set the limit within your query:

SELECT *

FROM utdata.est2013_48

LIMIT 2000;

Adding the LIMIT to your query will override Mode’s “Limit” button. For now, you can turn that button back on and set it to 100.

Use SELECT for just some information

But maybe we don’t want to show every columns. Maybe we only want the fields for the city name and estimates for each year. So the fields we need are: name, estimatesbase2010, popestimate2011, popestimate2012, popestimate2013. So we will select those by name, separating them with a comma.

SELECT

name,

estimatesbase2010,

popestimate2011,

popestimate2012,

popestimate2013

FROM

utdata.est2013_48;

You’ll see that I’ve indented some of the lines. This is good coding practice because it helps you discern what goes where and why.

Setting column header names

Your data may have long or hard to understand names with not spaces, etc, but you can rename them by using as "New Name" in your query. Use double quotes for these names:

SELECT

name as "City",

estimatesbase2010 as "2010",

popestimate2011 as "2011",

popestimate2012 as "2012",

popestimate2013 as "2013"

FROM

utdata.est2013_48;

What we are doing here is just changing the name of the column that is displayed.

Syntax differences

I may have mentioned somewhere that SQL syntax may differ depending on the database and client you use, and that comes into play here. Mode wants us to use double quotes when when we are referencing column names, and single quotes when we are representing values within the data, which we do below. This is different than MySQL/Navicat, where you can call values in double quotes. You just have to deal with the nuances. (For instance, I sent a support ticket into Mode when I wanted to fine the name value 'CHILI'S GRILL & BAR' because the query got confused with the apostrophe in CHILI'S. The answer was I used two apostrophes inside the quotes: 'CHILI''S GRILL & BAR' …which looks like a doublequote, but isn't. Confused? Yeah, me, too.)

Let's add some ORDER (BY)

ORDER BY is the term for sorting in SQL. You'll notice that our cities are in no particular order at all, and some are repeated but not together. It's kind of a mess. Let's ORDER BY name.

SELECT

name as "City",

estimatesbase2010 as "2010",

popestimate2011 as "2011",

popestimate2012 as "2012",

popestimate2013 as "2013"

FROM

utdata.est2013_48

ORDER BY

name;

Using WHERE to filter data

What if we don’t want all the data? What if we only want certain data where a condition is true?

Did anyone notice that some cities are listed more than once in our data? This data includes population estimates not only for "places" but it also breaks out where different places may cross boundaries, like how parts of Austin are in both Travis and Williamson counties. It’s quite a mess. Just believe me when I say if we want just the populations of a whole "place", we want to find the records where sumlev = 162, but first we're going to add the sumlev and county fields so you can see the filtering.

SELECT

sumlev,

county,

name as "City",

estimatesbase2010 as "2010",

popestimate2011 as "2011",

popestimate2012 as "2012",

popestimate2013 as "2013"

FROM

utdata.est2013_48

ORDER BY

name;

Now you can at least see how the sumlev and county relate to each other. Now let's use the WHERE keyword to filter these to just the rows with a sumlev of 162

SELECT

sumlev,

county,

name as "City",

estimatesbase2010 as "2010",

popestimate2011 as "2011",

popestimate2012 as "2012",

popestimate2013 as "2013"

FROM

utdata.est2013_48

WHERE

sumlev = '162'

ORDER BY

name

A couple of notes about this line:

We are putting '162' in single quotes because this field is considered a string instead of a number. This is a quirk of census data. If this were really a number, we wouldn't put it in quotes. (Numbers come next.) But sometimes these codes start with 0, which would be dropped if this field were made into a number, and that would cause problems for us later on. But notice the use of single quotes here when you are referencing a value in the table.
We are using an equals sign here like a math operator. And yes you can actually do math here, too. (Also next)
Also note the syntax difference here with Mode. In MySQL/Navicat, I would've used sumlev = "162".

Compound WHERE and operators

So, now we are going to add on another filter. We want just the cities (sumlev = '162'), but only with a 2013 population greater than 50,000.

SELECT

sumlev,

county,

name as "City",

estimatesbase2010 as "2010",

popestimate2011 as "2011",

popestimate2012 as "2012",

popestimate2013 as "2013"

FROM

utdata.est2013_48

WHERE

sumlev = '162' AND popestimate2013 > 50000

ORDER BY

name;

QUIZ QUESTION

Take a screen shot of your SQL and the result and use it to answer the question in today's assignment: What were the estimated populations of Abilene and Austin for the years 2010 and 2013?

-- 30 --

Data-Driven Reporting | UT Journalism

Wednesday, October 15, 2014

Using SQL: The Select statement, using Mode Analytics

The SELECT basics

Using SELECT to view everything

What it means

Using LIMIT

Use SELECT for just some information

Setting column header names

Syntax differences

Let's add some ORDER (BY)

Using WHERE to filter data

Compound WHERE and operators

0 comments:

Post a Comment