jsheremat

Jeff Sheremata

Recently Published

Mondrian Plot
Circular Packing Plot
My Top 50 Places
Machine Learning Income Prediction Using Census Data
The goal of this project is to use data from the US Census to develop models that predict whether an individual has an income higher than $50,000/year. The following predictive methods were evaluated: logistic regression, CART tree models, random forest, C50 boosted tree models, and GBM boosted tree models. A cross-validated CART tree model was also constructed. Each model was evaluated using several metrics, including accuracy and AUC. A C50 boosted tree model was found to have the best performance.
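As a rough illustration of the two metrics named above, the following Python sketch computes accuracy and AUC for a binary classifier. The labels and scores are invented for the example and are not results from the Census models; the function names `accuracy` and `auc` are hypothetical helpers, not part of the original project.

```python
# Toy illustration only: these labels and scores are invented, not Census results.

def accuracy(y_true, y_score, threshold=0.5):
    """Fraction of cases where the thresholded score matches the true label."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    return sum(p == t for p, t in zip(y_pred, y_true)) / len(y_true)


def auc(y_true, y_score):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is equivalent to the
    area under the ROC curve.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# 1 = income above $50,000/year, 0 = otherwise (hypothetical cases).
y_true = [1, 0, 1, 1, 0, 0]
y_score = [0.9, 0.4, 0.6, 0.3, 0.2, 0.7]

print("accuracy:", accuracy(y_true, y_score))
print("AUC:", auc(y_true, y_score))
```

Accuracy depends on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is one reason to report both when comparing models.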
Moneyball Regression
This RPubs document gives a quick overview of the linear regression methods used by the Oakland A's in the early 2000s. The A's had the idea that analytics and data mining would allow them to assemble a team on a modest budget that would make the playoffs.
Publish Presentation
Publish Document
CapstoneMilestone
Motor Trend MPG Regression Model Development
Motor Trend is interested in the following two questions: 1. Is an automatic or manual transmission better for MPG? 2. What is the MPG difference between automatic and manual transmissions? The approach taken was to use the mtcars dataset to build a linear regression model with MPG as the response variable and transmission and other variables as predictors. A linear regression model (R^2 = 0.866) determined that, with the other model variables (hp, cyl, and wt) held constant, manual transmissions improved fuel economy by an average of 1.8 MPG compared to automatic transmissions.
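To show what a transmission coefficient "with the other variables held constant" means, here is a minimal Python sketch of ordinary least squares with a 0/1 manual-transmission indicator. It is not the author's analysis: the cars below are made up (not mtcars rows), only weight is used as a second predictor, and the response is generated from a known formula so the recovered coefficient can be checked. The helper `ols` is a hypothetical name.

```python
def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    for c in range(k):
        # Partial pivoting: move the largest remaining entry into the pivot row.
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]


# Made-up cars: (weight in 1000 lb, manual flag). These are NOT mtcars values.
cars = [(2.0, 1), (2.5, 1), (3.0, 0), (3.5, 0),
        (2.8, 1), (3.2, 0), (4.0, 0), (2.2, 1)]

# MPG generated exactly from 35 - 4*wt + 2*manual, so the true effect is known.
mpg = [35 - 4 * wt + 2 * am for wt, am in cars]

# Design matrix: intercept column, weight, manual indicator.
X = [[1.0, wt, float(am)] for wt, am in cars]
intercept, b_wt, b_manual = ols(X, mpg)

print("manual-vs-automatic difference at equal weight:", round(b_manual, 6))
```

The coefficient `b_manual` is the difference in predicted MPG between a manual and an automatic car of the same weight, which is the sense in which the adjusted transmission effect should be read.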
Major League Baseball Yearly Team Statistics App
For each current (2015) team in Major League Baseball, yearly batting, pitching, and fielding statistics are mined and returned in this app. Statistics are from 1872-2013.
US Public Health and Economic Impacts of Severe Weather Events
The goal of this assignment is to explore the NOAA Storm Database and answer basic questions about severe weather events. Fatality and casualty data were used to analyze impacts on public health. Crop and property damage estimates were used to analyze economic impacts. Based on data from 1995-2011, tornadoes, heat, and floods were found to have the most severe impacts on population health in the USA. For the US economy, droughts, floods, hurricanes, and tornadoes have the most severe impacts.