Kevin_Nguyen_Tran

Kevin Tran

Recently Published

AdhereR - Medication Adherence
An exercise in learning and using the AdhereR package. This analysis explores medication adherence in the hypothetical data set included with the AdhereR package, working in RStudio. It begins by confirming that every patient was required to take both medication A (medA) and medication B (medB). Once that is confirmed, we narrow the focus to two patients and give a general overview of each patient's medication events. We then analyze their adherence, expressed as a percentage. The data, along with definitions of the terminology used in this analysis, can be found at https://cran.r-project.org/web/packages/AdhereR/vignettes/AdhereR-overview.html.
Employee Attrition EDA
This analysis explores categorical and quantitative variables and their relationship to employee attrition. We begin with a general overview of how the data set is organized, then calculate and visualize the correlation between each variable and employee attrition, and finally fit a generalized linear model to identify good predictors of employee attrition. The data in this analysis are artificial/hypothetical and can be found on kaggle.com.
Pima Indians Diabetes Report
Diabetes is defined as a group of metabolic diseases characterized by chronic hyperglycemia, which can result from defects in insulin secretion, insulin action, or both (Kharroubi and Darwish 2015). Throughout this analysis, we examine multiple variables and their association with diabetes. We first analyze each variable and its correlation with diabetes. From there, we group variables that share similar correlations (positive, negative, or neutral) and analyze them further. Lastly, we fit a linear regression model to determine which variables are good predictors of diabetes and which are not.
Medical Appointment No Show Analysis
This analysis explores several variables and their relationship to no-show rates for medical appointments. The variables explored are gender, the number of days in advance an appointment is scheduled, age, financial aid via scholarships, pre-existing health conditions, and SMS reminders. The data were collected from patients in Brazil, comprise 110,527 medical appointments, and can be found on kaggle.com.
Breast Cancer UCI Analysis
Heart Disease UCI Analysis
This analysis explores heart disease status and its relationship with a number of variables. Specifically, we aim to better understand how sex (male and female), age, serum cholesterol level, chest pain, blood pressure (bp), and maximum heart rate (hr) play a part in the prevalence of heart disease. These variables are analyzed individually and then used to build a binomial generalized linear model that predicts the probability of not having heart disease. The data in this analysis can be found at archive.ics.uci.edu.
COVID-19 Analysis - Death Rate
First data analysis project (roughly 4 months of experience coding in R and statistics). This analysis explores COVID-19 death trends in the United States, with Florida as the focal point. The exploration of Florida begins with analyzing positive cases and trends, visualizing the positive and negative cases, and inspecting the total number of deaths. On the basis of population comparisons, Florida is then compared to California and Texas, because they have the most similar total numbers of positive COVID-19 cases. The two main factors investigated between the states are the average percent increase of deaths per day and the average deaths per day. The data in this analysis can be found on <https://covidtracking.com/> and cover the time frame of 2020-03-04 to 2020-08-07.
The questions asked during the analysis were:
* How does Florida compare to other states in the U.S. in terms of positive COVID-19 cases, total number of deaths, death rate, etc.?
* Which states currently (2020-08-07) have the most positive cases?
* Since positive cases and deaths are positively correlated, which state has the highest death rate? Can we hypothesize that the state with the most total deaths has the highest death rate?
* Based on linear models, can we predict which states will have the most deaths in the long run?
The data analysis is broken down into the following sections:
* Florida Data
* Data for All States
* Narrow Focus - Florida, California, and Texas
* Florida vs California
* Florida vs Texas
* Florida vs California vs Texas
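The two comparison metrics named above, average deaths per day and average percent increase of deaths per day, can be illustrated with a small sketch. The original analysis was written in R and does not state its exact formulas, so the definitions below are assumptions: deaths per day is taken as the difference between consecutive cumulative totals, and percent increase as the day-over-day change in the cumulative total. The function names and the sample numbers are hypothetical and are not taken from the COVID Tracking Project data.

```python
from statistics import mean


def daily_deaths(cumulative):
    """Convert a series of cumulative death totals into new deaths per day."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


def average_deaths_per_day(cumulative):
    """Mean number of new deaths per day over the whole series."""
    return mean(daily_deaths(cumulative))


def average_percent_increase(cumulative):
    """Mean day-over-day percent change in the cumulative total.

    Days whose previous total is zero are skipped (an assumption made
    here to avoid dividing by zero).
    """
    changes = [
        100 * (later - earlier) / earlier
        for earlier, later in zip(cumulative, cumulative[1:])
        if earlier > 0
    ]
    return mean(changes)


# Hypothetical cumulative totals for three consecutive days.
example = [100, 110, 121]
print(average_deaths_per_day(example))    # 10.5
print(average_percent_increase(example))  # 10.0
```

Under these assumed definitions, a state can have a high average number of deaths per day but a low average percent increase if its cumulative total is already large, which is why the two metrics are worth comparing separately.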