Recently Published
Air Quality and Asthma Emergency Department Visits in New York City
Analysis of PM2.5 air pollution and asthma emergency department visit rates in New York City using official public health data and statistical modeling.
A Statistical Analysis of U.S. College Majors
This study examines whether college majors classified as STEM (Science, Technology, Engineering, and Mathematics) are associated with higher median annual salaries than non-STEM majors in the United States.
U.S. Energy Production and Dependency by State
This report uses EIA State Energy Data System (SEDS) data to map state-level energy production, consumption, and import dependency in the lower 48 states, highlighting which states are net exporters and which rely heavily on external energy sources.
Extended TidyVerse Analysis – NYC Housing Data
This vignette extends a classmate’s original TidyVerse analysis of NYC housing data by adding new summaries, transformations, and visualizations. Additional insights include price-per-square-foot trends, property size categories, price distributions, and bedroom-price relationships.
Data 607 - Project 4
Classifying emails as spam or ham using the SpamAssassin dataset with Naive Bayes and Logistic Regression. Includes text preprocessing, model comparison, and predictions on new messages.
Kuhn & Johnson – Chapter 8
This document presents my complete solutions to Exercises 8.1, 8.2, 8.3, and 8.7 from Applied Predictive Modeling by Kuhn and Johnson.
Data 607 Assignment 11
Scenario Design Analysis of Amazon’s Recommender System
Kuhn & Johnson Chapter 7
This report explores nonlinear regression modeling concepts from Applied Predictive Modeling (Kuhn & Johnson, Chapter 7).
Kuhn & Johnson Chapter 6: Problems 6.2 & 6.3
This report presents the solutions to Problems 6.2 and 6.3 from Kuhn & Johnson’s Applied Predictive Modeling.
It explores different regression models, including PLS, Elastic Net, and Random Forest, to predict compound permeability and manufacturing yield.
The analysis includes model tuning, performance comparison, and visualization of top predictors with concise interpretations for each step.
Sentiment Analysis Healthcare Management
This report follows Text Mining with R: A Tidy Approach (Silge & Robinson, 2017), Chapter 2 to implement baseline sentiment analysis and then extends it in two ways as required:
Project 1: Predictive Analytics
This project applies time series forecasting to three datasets: ATM cash withdrawals, residential power usage, and hourly waterflow. Using the fpp3 framework, ETS and ARIMA models are fit to analyze patterns, generate forecasts, and visualize results. Forecast outputs are exported to Excel for further validation.
Homework 6 - ARIMA models (Hyndman Ch. 9)
This report includes time series analysis exercises. Each question includes visualizations and concise explanations based on the Hyndman textbook.
Working with XML and JSON in R
This document loads book data from three manually created files in HTML, XML, and JSON formats.
IS 607 – Project 2
The purpose of this project is to practice preparing and tidying wide-format datasets for analysis.
Homework 5 - Exponential Smoothing (Hyndman Ch. 8)
This document contains the assignment for Homework 5 - Exponential Smoothing (Hyndman Ch. 8) from Hyndman & Athanasopoulos.
Applied Predictive Modeling - Chapter 3 (Problems 3.1 & 3.2)
This report addresses Chapter 3, Exercises 3.1 and 3.2 from Applied Predictive Modeling by Kuhn & Johnson.
Tidying and Transforming Data
This assignment focuses on tidying and transforming data related to airline arrival delays.
Chess Tournament Project 1
In this project, we parse the chess tournament cross table and generate a dataset with the following fields:
Assignment – Exercise 3.10 (Time Series decomposition)
This document contains the assignment for Chapter 3, Exercises 1–10, from Hyndman & Athanasopoulos, Forecasting: Principles and Practice (3rd ed.).
The fpp3 package is used for the time series analysis.
Exercise 2.10 (Time Series)
This assignment explores different time series datasets from the fpp3 package and related sources.
Week 1: Loading & Transformations — Multilingual App Reviews (2025)
This report loads a GitHub-hosted CSV of multilingual mobile app reviews and prepares a small, tidy data frame for basic analysis. It selects a clear subset of variables, assigns readable names, converts rating to numeric, parses review_date, and expands common language codes (e.g., “en” → “English”). A short overview links to background reading on differences between star ratings and review text.
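The preparation steps described in this last summary can be illustrated with a minimal sketch. It is written in Python with only the standard library and is not the report's own code. The column names `app_name` and `review_language`, the `%Y-%m-%d` date format, the inline sample rows, and the entries in the language lookup are all assumptions made for illustration; only `rating`, `review_date`, and the “en” → “English” mapping come from the summary.

```python
import csv
import io
from datetime import date, datetime

# Hypothetical lookup table; the report's actual mapping is not shown in the summary.
LANGUAGE_NAMES = {"en": "English", "es": "Spanish", "fr": "French"}


def tidy_reviews(csv_text):
    """Return a list of tidy review records parsed from raw CSV text."""
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        code = raw["review_language"].strip().lower()
        rows.append({
            # Keep a small subset of columns under readable names.
            "app": raw["app_name"],
            # Convert the rating from text to a number.
            "rating": float(raw["rating"]),
            # Parse the date string (assumed ISO format) into a date object.
            "review_date": datetime.strptime(raw["review_date"], "%Y-%m-%d").date(),
            # Expand known language codes; unknown codes pass through unchanged.
            "language": LANGUAGE_NAMES.get(code, code),
        })
    return rows


# Made-up sample rows standing in for the GitHub-hosted CSV.
SAMPLE = """app_name,rating,review_date,review_language
MapsGo,4.5,2025-01-15,en
ChatNow,2,2025-02-03,es
NoteBox,3,2025-03-09,xx
"""

reviews = tidy_reviews(SAMPLE)
```

Passing unknown language codes through unchanged, rather than dropping the row, keeps every review available for later analysis.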