A machine learning project in which I train a random forest classifier to predict activity class from wearable sensor data. I also explore ways of reducing training time and improving product design.
An exploratory analysis, including principal component analysis and pair plots, of the activity classification data used in my machine learning project.
A nested multiple regression analysis of the effect of transmission type on fuel economy.
I simulate a distribution of sample means and compare it to the standard normal distribution using quantile-quantile plots and statistical tests.
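A minimal sketch of this kind of simulation, using only the Python standard library. The exponential population, the rate of 0.2, the sample size of 40, and the function names are assumptions made for illustration, not details taken from the project itself. The sketch standardizes each sample mean and pairs the sorted values with standard normal quantiles, which are the points a quantile-quantile plot would draw.

```python
import random
import statistics
from statistics import NormalDist


def simulate_standardized_means(n_samples=1000, sample_size=40, rate=0.2, seed=0):
    """Draw repeated samples and return their standardized means.

    Assumption: the population is exponential with the given rate,
    so its mean and standard deviation are both 1 / rate.
    """
    rng = random.Random(seed)
    mu = 1 / rate
    sigma = 1 / rate
    standard_error = sigma / sample_size ** 0.5
    means = []
    for _ in range(n_samples):
        sample = [rng.expovariate(rate) for _ in range(sample_size)]
        means.append((statistics.fmean(sample) - mu) / standard_error)
    return means


def qq_points(values):
    """Pair each sorted value with the matching standard normal quantile."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [(std_normal.inv_cdf((i + 0.5) / n), v) for i, v in enumerate(ordered)]


z = simulate_standardized_means()
points = qq_points(z)
```

If the sample means are close to normal, the pairs in `points` should fall near the line y = x. A formal normality test could then be applied to `z` with a statistics library.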
A data cleaning project using a National Oceanic and Atmospheric Administration (NOAA) database. NOAA defines 48 types of storms, but the database contains more than 400 unique values for storm type. I use regular expressions and logical statements to consolidate these values into a true factor variable with a limited number of levels. I then create a visualization showing the number of injuries and fatalities and the financial costs associated with each storm type in the USA.
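The cleaning step can be sketched roughly as follows. The patterns, the handful of target levels, the `OTHER` fallback, and the function name are hypothetical choices for illustration. They are not the rules used in the project, and they cover only a few of the 48 official types.

```python
import re

# Hypothetical rules mapping messy labels to canonical levels.
# Order matters: more specific patterns come before more general ones.
RULES = [
    (re.compile(r"tstm|thunderstorm", re.IGNORECASE), "THUNDERSTORM WIND"),
    (re.compile(r"flash.*flood", re.IGNORECASE), "FLASH FLOOD"),
    (re.compile(r"flood", re.IGNORECASE), "FLOOD"),
    (re.compile(r"hurricane|typhoon", re.IGNORECASE), "HURRICANE (TYPHOON)"),
    (re.compile(r"torn", re.IGNORECASE), "TORNADO"),
]


def clean_storm_type(raw):
    """Return the first canonical level whose pattern matches the raw label."""
    label = raw.strip()
    for pattern, level in RULES:
        if pattern.search(label):
            return level
    return "OTHER"  # assumed catch-all level for unmatched values


# Made-up examples of inconsistent raw values.
raw_values = [" TSTM WIND ", "Flash Flooding", "URBAN FLOOD", "TORNDAO", "Typhoon", "???"]
cleaned = [clean_storm_type(v) for v in raw_values]
levels = sorted(set(cleaned))
```

Putting the flash flood rule ahead of the general flood rule keeps the more specific label from being absorbed by the broader one. That ordering concern is the main design choice in a rule list like this.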