Wednesday, October 1, 2014

Keeping a data diary


It’s good practice to keep a log of the work you do with data files, leave it open while you work, and update it as you go. It serves several purposes:
  • You’ll make so many changes to your data you won’t remember what you did to reach a specific value used in a story. If you record your steps, you can defend those actions later, if required.
  • You’ll sometimes work on data at different times, and not remember what you did the last time you worked on it. This will keep you from redoing work you’ve already done once.

The Data Diary is a document written for yourself, so it doesn’t have to be nicely formatted and readable like a Data Report, which will be read by reporters and editors. I keep a simple text file in a code editor (TextWrangler on Mac, Notepad++ on PC, or Sublime on both, though it’s badgerware), and I use a syntax called Markdown. This is a personal preference, but there are some advantages. A plain text file keeps quote marks exactly as you typed them, whereas MS Word will convert them into characters that Excel and MySQL won’t understand. Markdown is also the language that GitHub uses, and it is easily converted to HTML.
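If a formula or query has already passed through Word, the quote problem can usually be undone. Here is a minimal Python sketch, assuming the only damage is curly quotes; the function name and the sample query (with its made-up `donors` table) are illustrations, not something from my projects.

```python
# Map the curly "smart" quotes a word processor inserts back to plain ASCII.
SMART_TO_PLAIN = {
    "\u2018": "'",  # left single quote
    "\u2019": "'",  # right single quote / apostrophe
    "\u201c": '"',  # left double quote
    "\u201d": '"',  # right double quote
}


def straighten_quotes(text: str) -> str:
    """Return text with curly quotes replaced by straight ones."""
    return text.translate(str.maketrans(SMART_TO_PLAIN))


# Hypothetical query that was pasted through a word processor.
query = "SELECT * FROM donors WHERE city = \u2018Austin\u2019;"
print(straighten_quotes(query))
```

Keeping the diary in a plain text file from the start means you rarely need this, which is the point of using a code editor.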

Some types of things you might want to record:
  • A list of your original data, where you got it from and how. Did you download it from the web? Include the URL so you can find it again. Did it come from a source? Include the contact information in case you have questions later.
  • A list of files you create and why. I find myself importing and exporting different queries and subsets of data, and I’ll fill a folder full of files I don’t remember if I don’t keep track of them.
  • Any time you manipulate data, you should include the formula or syntax somewhere with an explanation of why you did it. (Of course, best practice is to keep both the original and the modified data so you can compare them if needed.) While you can include comments in your MySQL queries, you might record in your diary which query you used to get whatever result you were looking for.
  • A list of tasks you hope to perform as you think of them.
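Typing entries by hand works fine, but if you want each one dated automatically, a small script can append them to the diary for you. This is a rough sketch under my own assumptions: the entry layout (a Markdown heading with a timestamp and a label), the function names, and the example URL are all made up for illustration.

```python
from datetime import datetime
from pathlib import Path


def format_entry(kind: str, note: str, when: datetime) -> str:
    """Build one Markdown diary entry: a dated heading, then the note."""
    return f"\n## {when:%Y-%m-%d %H:%M} [{kind}]\n\n{note}\n"


def add_entry(diary_path: str, kind: str, note: str) -> None:
    """Append an entry stamped with the current time to the diary file."""
    entry = format_entry(kind, note, datetime.now())
    # Mode "a" creates the file if needed and never overwrites old entries.
    with Path(diary_path).open("a", encoding="utf-8") as diary:
        diary.write(entry)


# Preview an entry without touching any file (URL is a placeholder).
print(format_entry(
    "source",
    "Downloaded from https://example.com/data.csv",
    datetime(2014, 10, 1, 9, 30),
))
```

Calling `add_entry("data_diary.md", "query", "...")` after each step would cover the kinds of notes listed above: sources, files created, formulas or queries, and tasks to come.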

Here is a simple example from one of my projects.

