Udacity’s Secret Ingredient: Positivity, Support and Encouragement

Last week, I finished the six-course Data Analyst Nanodegree offered by Sebastian Thrun’s Udacity Inc. Even with an above-average number of academic credentials behind me, I will venture to say that this was the best educational endeavor I’ve been through yet. When I worked through my undergrad and graduate degrees in economics, I … [Read more…]

A Look at Election Campaign Contributions with R

Every four years, Florida is a definitive swing state in the US Presidential election. Since 1996 (five election cycles ago), the candidate who captured Florida’s electoral votes has become the next US President. Assuming a strong correlation between campaign contributions and election results and having perfect hindsight of knowing the 2012 Presidential election … [Read more…]

MongoDB & pymongo: Tutorial

In this post I’ll pretend that I am teaching a data science course on collecting, cleaning, storing, and updating data. Rather than pretending my students are computer scientists or software developers, I’ll pretend that they are business analysts or college grads going on to become business analysts. However, I’ll assume they know some programming (Python), … [Read more…]

MongoDB & pymongo: Step by Step

As I ventured into Lesson 4 in Udacity’s Data Wrangling with MongoDB, I really wanted to run the first script (inserting a record into the database) locally. I feel like I really damaged the sanctity of my files by installing, uninstalling, messing with permissions, etc. for hours in all different locations in my … [Read more…]

Worksheet for Udacity’s Intro Statistics Courses

I’ve created an ‘in-progress’ Google spreadsheet for a lot of the exercises and examples in Udacity’s Intro Statistics courses (Intro to Descriptive Statistics & Intro to Statistical Inference); the link is below. When I used to teach finance, I taught class with spreadsheets like this one rather than with PowerPoint. Preparing this kind of document … [Read more…]

What is the Most Harmful Storm in the US?

The National Oceanic and Atmospheric Administration (NOAA) regularly publishes data on storm occurrences in the US. It makes annual data available dating back to 1950, including time-series, geographic proximity, and financial destruction information as well as storm characteristics (event type, width of tornado, wind gust estimates, etc.). While you could do thousands of … [Read more…]

Fixing Excel’s Sci Not Faux Pas with R

Encountered what I think is a pretty common Excel problem at work today. A colleague showed me an Excel spreadsheet that was reading warehouse locations as scientific notation. For example, location 05E03 was being read into Excel as 5.00E+03, and if you tried to edit the cell or convert it to text, you’d be given … [Read more…]

Predicting Fuel Economy for 1974 Automobiles

Based on the infamous mtcars dataset, I used a stepwise selection process to generate a predictive model of fuel economy for 1974 automobiles*. My process was done primarily in R and can be found here on RPubs. In short, the model uses a car’s weight and 1/4-mile time to predict mpg. At the same … [Read more…]