Wednesday, October 15, 2014

Introduction to databases lecture


Introduction to databases

What is a relational database?

Simply put, it is an organized collection of data. It’s made of “tables” and information about how they relate to each other. You can think of a “table” as a single spreadsheet, but it’s a bit more strict about what can be in each row and column. A database can have multiple tables, and they can tie to each other.

A simple database example

A store may keep a table of all the customers that they do business with. It has their name, address and such. Each customer would be listed only once and they would be given an ID or “primary key.”

The store may keep another table of all the purchases made at the store. It might contain the name of the item, where it comes from, and how much it sold for and on what date. It might also include a customerID, which is a “foreign key” that is the same as the ID in the customer table.

By using this primary and foreign key combination, the store can tie each purchase with the customer who made it, without having to record that customer’s personal information for each purchase.

Why use a relational database?

How much time do we have?
  • To manage complexity. Databases help you keep similar information together without repeating it in other places.
  • To manage large data sets. The software that runs them make it possible to “query” very large amounts of data quickly.
  • To join different data sets together. Sometimes two different data sets may not be created to work together, but by “joining” them together, you can discover relationships that you might not be able to discern by looking at each datum independently.

How is a table different than a spreadsheet?

A well-structure spreadsheet with a header row is very much like a table, but the table enforces rules about what can be in each cell based on its column, or “field”. Each field is designed to hold text, dates or numbers, and you can’t mix them. So if a table of purchases has a date_purchased field, it will need to be in a date format, and each record entered must use the same format.

It’s this structure that you’ll find most frustrating when importing data. If you have census data of populations that include an X to signify there wasn’t enough data, then your import will fail because the database is expecting a number and not a letter.

What is a schema?

Getting a little bit into the weeds here, but essentially a schema is the collection that has all the related tables together. The “database” is just software and it can hold anything, it is the “schema” that says different tables are related to each other. So you might have a schema that holds all your tables about campaign finance, and then another schema with all your tables about use of force.

What is SQL?

Structured Query Language is a common language that many databases are built to understand. It’s **probably, officially** pronounced S-Q-L, but many people say “sequel” and they are just as right. It’s the simplest programming language you will ever learn, but also very powerful.

What is MySQL vs Postgres vs Access?

These are different flavors of databases that all use SQL or a variant of it, and there are a ton more. MySQL is the backbone of the Internet, is free, and it’s what we will use. Microsoft Access has been popular among journalist in the past because it is part of MS Office for PCs (only. No Macs) and plays so nice with Excel.

They are all based on the same language, but each have their own little changes and tweaks, just enough to drive you crazy for a minute or two.

We will use ModeAnalytics to learn the basics of SQL because it is cloud based and doesn’t require you to download, install or configure anything (though you do need to learn that). It will let us focus on the learning first. Also, Mode has a pretty awesome SQL School from which I drew inspiration for these lessons. That would be a great review or supplement to what we do in class.

You need to sign up for Mode and sent me your email address so I can add you to the “utdata” organization.


Post a Comment