Based on the famous mtcars dataset, I used a stepwise selection process to build a predictive model of fuel economy for 1974 automobiles*. My process was done primarily in R and can be found here on RPubs. In short, the model uses a car’s weight and 1/4-mile time to predict mpg. The prediction also depends on the car’s transmission type: automatic or manual. The results? As you might suspect, heavier cars get worse gas mileage than lighter cars. Slower cars (meaning a longer 1/4-mile time) get better gas mileage than faster cars. Also, manual transmission vehicles get better gas mileage than automatics. Take a look at the multiple linear regression model in action…
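For anyone who wants to poke at a model of this shape, here is a minimal sketch in R. It assumes the three predictors enter as plain additive terms (`wt`, `qsec`, `am`), and the `step()` call is just one possible stepwise setup. The exact formula and selection settings are in the RPubs write-up and may differ from what is shown here.

```r
# mtcars ships with base R: 32 cars, mpg plus 10 other columns
data(mtcars)

# Assumed formula: weight (wt, in 1000 lbs), 1/4-mile time (qsec, seconds),
# and transmission (am: 0 = automatic, 1 = manual) as additive terms
fit <- lm(mpg ~ wt + qsec + am, data = mtcars)

# The sign of each coefficient shows the direction of that effect on mpg
print(coef(fit))

# One possible stepwise search (AIC-based, both directions) starting from
# the full model; this is an assumption, not the documented procedure
full <- lm(mpg ~ ., data = mtcars)
stepped <- step(full, direction = "both", trace = 0)
print(formula(stepped))
```

A negative `wt` coefficient and positive `qsec` and `am` coefficients would match the directions described above.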
As you can see, the predictions are … kind of good. There are 11 instances where the prediction, including its confidence interval error bars, does not span the car’s actual mpg, which means there are 21 instances where the prediction is … close. From here, all I had were more questions and a barrage of ideas.
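A tally like the one above can be sketched as follows. This assumes the same additive formula as a stand-in for the real model and a 95% confidence level, neither of which is confirmed here, so the count it prints may not match the 11 reported above. Listing the missed cars by name is also a natural starting point for the first question below.

```r
data(mtcars)

# Assumed model; the actual formula is in the RPubs write-up
fit <- lm(mpg ~ wt + qsec + am, data = mtcars)

# Matrix with columns fit, lwr, upr: one row per car
# (level = 0.95 is an assumption)
ci <- predict(fit, interval = "confidence", level = 0.95)

# TRUE where the actual mpg falls outside the interval
missed <- mtcars$mpg < ci[, "lwr"] | mtcars$mpg > ci[, "upr"]

cat("intervals missing actual mpg:", sum(missed), "of", nrow(mtcars), "\n")

# Names of the cars whose actual mpg fell outside the interval
print(rownames(mtcars)[missed])
```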
Is there a commonality among the automobiles where our prediction wasn’t even close? Would a more advanced machine learning technique predict mpg better? Would these results be similar if we used a larger dataset (than just the 32 instances used above)? Would these results be similar if we used a 2013/2014 dataset? What about practicality: what could this be used for in a business context? Could I put similar models to work at my 3PL? How are these methods being utilized in the supply chain industry?
And then I started googling “multiple linear regression supply chain management” ….
* This was an assignment for the linear regression course in the Coursera Data Science Signature Track.