Recently Published
STA 490 Sampling Presentation of Combined Bank Loan Data
A group presentation applying various sampling methods to a combined U.S. bank loan data set. We calculate the default rates for the population and under each random sampling method in order to determine which sampling method is best for our data.
Sampling Design with the Combined Bank Loan Data
In this assignment, we will look at a bank loan data set and prepare it for sampling and further analysis. We will address concerns such as potential missing values, and create or redefine some of the variables in the original data set to make them more meaningful and relevant for future analysis. We will then conduct the random sampling process, taking a simple random sample, a systematic sample, a stratified sample, and a cluster sample.
EDA with the Combined Bank Loan Data Set
In this assignment, we will look at a bank loan data set and conduct exploratory data analysis to address concerns such as missing values, along with creating new variables and redefining existing ones to add more meaning and relevance to our data set.
STA 321 Logistic Regression Project Presentation: Predicting a Patient's Odds of CHD
The finalized group presentation of my STA 321 logistic regression project. This project uses logistic regression to create models which predict a patient's odds of being at risk for developing CHD based upon various personal and medical factors. This presentation was created as a group, and the contributions of each group member are listed at the end.
STA 321 Logistic Regression Project: Predicting a Patient's Odds of CHD
An HTML presentation on a logistic regression project which uses binary predictive modeling to predict a patient's odds of being at risk for developing CHD over a 10-year period.
Practice Ninja Presentation
A practice ninja presentation using my logistic regression project from STA 321.
Kepler Exoplanet Search: Detecting and Confirming Planets Beyond our Solar System
In this project, I will use the Kepler Space Observatory data to analyze observations of detected and confirmed exoplanets along with potential candidate exoplanets. This project looks at both unsupervised learning models, such as PCA, and supervised learning models, such as linear regression and classification, to learn more about the factors that are significant in whether a sighting is confirmed as a true exoplanet or is merely a false positive.
Monthly Average Air Travel Passengers: Time Series with Exponential Smoothing
In this project, I looked at a time series which recorded the monthly average number of air travel passengers on all U.S. flights from 2003 to 2023. I used several different exponential smoothing methods to create various models and determine which one provided the best performance.
House Sale Prices from 2007 to 2019: Time Series Forecasting with Decomposing
In this project, I analyzed a collection of home sale prices from 2007 to 2019. I created a monthly time series of the average home sale price by month. I used classical and STL decomposition to observe the patterns and trends in this time series, and forecasting to estimate the average home sale prices for the next twelve months. I also looked at the ideal number of observations for a training data set made from this time series.
Quasi-Poisson Regression Model of the Cyclists on the Williamsburg Bridge
In this project, I created standard Poisson regression models on frequency counts and rates of the cyclists on the Williamsburg Bridge, along with a quasi-Poisson regression model to examine the dispersion.
Poisson Regression of the Counts and Rates of Cyclists on the Williamsburg Bridge
A Poisson regression model project of the counts and rates of cyclists on the Williamsburg Bridge.
Predicting a Patient's Odds of Being at Risk for Developing CHD: Binary Predictive Modeling
In this project, we will create several candidate multiple logistic regression models to predict the odds of an individual being at risk for developing coronary heart disease (CHD) over a 10-year period. We will use cross-validation to determine which candidate model has the greatest predictive power. We will also use ROC analysis to determine which candidate model has the greatest global goodness.
Predicting a Patient's Odds of Being at Risk for Developing CHD: Multiple Logistic Regression
This project utilizes multiple logistic regression to build a model which can be used to predict a patient's odds of being at risk for developing coronary heart disease (CHD) based upon various medical and personal risk factors.
Using Diastolic Blood Pressure to Predict a Patient’s Odds of Being at Risk for Developing CHD: Simple Logistic Regression
In this project, I created and analyzed a simple logistic regression model to predict the odds of a patient being at risk for developing coronary heart disease (CHD) over a 10-year period based on their diastolic blood pressure level.
Factors Affecting Forest Fires MLR Project Report
An analysis of the factors that affect the area of land burned by forest fires, using multiple regression models and bootstrap confidence intervals.
Factors Affecting Forest Fires: Multiple Linear Regression
A statistical analysis of the various factors affecting the area burned by a forest fire. This project tests several multiple regression models for this data to see which one provides the best utility for the prediction and estimation of the area affected by a forest fire.