Recently Published
Stroke Prediction Model with Logistic Regression, Random Forest, & SVMs
This R Markdown file reports the data analysis for a project on building and deploying a stroke prediction model in R. It covers data exploration, summary statistics, and building the prediction models.
Logistic Regression and Random Forest to Determine Credit Card Default (3rd Edition)
This edition improves the logistic regression model using an ROC curve and optimal cutoff points.
Logistic Regression and Random Forest to Determine Credit Card Default (2nd Edition)
In this session, I revised the logistic regression and random forest prediction models using random sampling of the training data (oversampling and undersampling). In addition, I printed the accuracy, precision, and recall values for each prediction model.
Logistic Regression and Random Forest to Determine Credit Card Default
In this session, I examine the Credit Card Clients data set from the UCI Machine Learning Repository website to predict whether a person will default on their credit card, using logistic regression and random forest prediction models.
Clustering of US Arrests Data Set
In this R Markdown session, I will use the built-in "USArrests" data set and perform hierarchical and k-means clustering.
Clustering of Student Performance
In this R Markdown session, I will revisit the "Students Performance in Exams" dataset and apply a few clustering techniques.
Word Cloud for Baby Gender Names
In this R Markdown session, I will demonstrate how to create a word cloud in R.
Determine the Best Regression Model for Houses in the United States
In this R Markdown session, I will examine the characteristics of the "USA Real Estate Dataset" available on the Kaggle website. In addition, I will use AIC (Akaike Information Criterion) based model selection to see whether there is a viable regression model for predicting a house's price from the available variables.
Discriminant Analysis and Random Forest Using FIFA 19 Data
Data cleaning and manipulation, discriminant analysis, and random forest with the FIFA 19 data set. This was done as part of a group project in my final semester of college.
Exploratory Data Analysis, Linear Regression, and Prediction Modeling of Student Performance
This document demonstrates exploratory data analysis techniques in addition to linear regression and prediction modeling. The dataset used is the "Students Performance in Exams" dataset from the Kaggle site.
Basic Data Cleaning and Statistical Methods Implemented Using UCI Heart Disease Data Set
In this document, I detail a few basic techniques in R for checking the cleanliness of a data set. In addition, I demonstrate a few basic, well-known statistical and graphical methods (box plot, scatter plot, linear regression) for viewing and interpreting data.