Recently Published
Final Project - DATA622
Use of Decision Tree, Random Forest, KNN, XGBoost, and SVM for modeling which data scientists are looking to leave their current jobs
Time Series and Decomposition
Group 4 - Subhalaxmi Rout, Kenan Sooklall, Devin Teran, Christian Thieme, Leo Yi
Influenza and Pneumonia Mortality during the Global COVID-19 Pandemic and the Impact of Local Government Restrictions
Final project can be viewed here: https://github.com/christianthieme/Business-Analytics-and-Data-Mining-with-Regression/blob/main/Final%20Project%20-%20COVID-19%20Effect%20on%20Pneumonia%20%26%20Influenza/final_project_report.pdf
CODE APPENDIX - COVID-19 Effect on Pneumonia & Influenza
Code Appendix for 621 final project
Poisson, quasi-Poisson, Negative Binomial, and Zero Inflated Regression
Our objective is to build a count regression model to predict the number of cases of wine that will be sold given certain properties of the wine.
Understanding Common Classification Metrics - Titanic Style
Discussion on Confusion Matrices, Accuracy, Classification Error Rate, Precision, Sensitivity, and Specificity
Understanding Classification Metrics
Review of accuracy, precision, sensitivity, specificity, F1 score, and ROC curve
Moneyball Multiple Regression Analysis
Full EDA, imputation of nulls, model building, and model diagnostics
Ink-Data Ratio ggplot2 EDA
Using principles from Edward Tufte, use ggplot2 to come up with visuals for the EDA questions provided.
Inc 5,000 Fastest Growing Companies Web Scrape
The website Inc. 5000 has a list of the top 5,000 fastest growing private companies from 2020. I will use xml2 and rvest to scrape the data from the website.
Computational Mathematics Final Exam
DATA605 Final Exam
Predicting House Prices: Regression Techniques
Final Project DATA605
Part II - NBA Player Salary Analysis with Multiple Regression
Can We Predict an NBA Player’s Salary Using His Statistics from the Prior Year?
WHO dataset Regression Analysis
Forecasting Life Expectancy
Markov Chains and Random Walks
Prisoner's dilemma questions
Project 4: Document Classification - Using Machine Learning to Build a SPAM Predictor
The purpose of this project is to build a classification model that can accurately distinguish spam email messages from ham email messages. We will do this by using pre-classified email messages to build a training set and then building a predictive model to classify unseen email messages as either spam or ham.
Thieme-Proposal DATA606
DATA606 research proposal
Tidyverse Extend Assignment - Lubridate
Extension of Ken Popkin's Lubridate Tidyverse Create Assignment
Week 10 Assignment DATA607 - Sentiment Analysis
The purpose of this project is two-fold: first, to take a deep dive into the mechanics and application of sentiment analysis by following an example provided by Julia Silge and David Robinson in their book “Text Mining with R: A Tidy Approach”; second, to choose another corpus and incorporate another lexicon, not used in the example below, to perform sentiment analysis.
Week 9 Assignment - Working with Web APIs
For this project, I will work with The New York Times website API.
Week 7 Assignment - Working with HTML, XML, and JSON in R
The purpose of this project is to demonstrate knowledge of HTML, XML, and JSON, as well as how to parse and extract information from each.
Project 2 - Data Transformation
The purpose of this project is to demonstrate the ability to transform data from various wide formats into a more digestible format for analysis. As part of the project, I will also clean/tidy the data and perform analysis. Below you will see three different data sets that were provided by fellow classmates. In addition to providing the data set, each classmate was asked to suggest analysis that could be completed using the data set. I will show the loading, tidying, and analysis of each data set below.
Week 5 Assignment DATA607 - Tidying and Transforming Data with tidyr
The purpose of this assignment is to:
1. Demonstrate how to transform data between wide and long formats with tidyr
2. Demonstrate how to tidy messy/untidy data using tidyr - single entries on multiple lines and missing data
3. Perform data analysis using ggplot
Week 4 Project 1 - Chess Tournament Data - Regular Expressions
In this project we will take a raw text file containing the results of a chess tournament, extract key information from the file, and perform some calculations. What makes this project particularly challenging is that each entry in the file (a single chess player) has data points spanning two rows. Our task will be to find a way to extract the information that we need so that we can create a CSV file where all of the data that we want for an entry, including data we will calculate, is on one row.
Week 3 Assignment DATA607 - R Character Manipulation
The assignment below is geared toward jumping into character extraction and manipulation in R using regular expressions. The examples show how to use regex in a variety of ways, such as identifying rows of a data frame that contain certain words, extracting key data from messy datasets, and using capture groups, lookbacks, and more to solve tricky word-extraction scenarios.
Final Project - Student Performance Analysis - R Bridge
Analysis of student performance on tests based on socioeconomic factors
Week 2 Assignment R Bridge
Week 2 Assignment R Bridge for Christian Thieme
Week 1 Assignment - R Bridge Course
This is the completed assignment for Week 1.