Recently Published
Correspondence Analysis
Correspondence Analysis for 2 categorical variables (Race/Restaurants)
MYOPIA Study
This dataset is a subset of data from the Orinda Longitudinal Study of Myopia (OLSM), a cohort study of ocular component development and risk factors for the onset of myopia in children. Data collection began in the 1989–1990 school year and continued annually through the 2000–2001 school year.
All data on the ocular components (the parts that make up the eye) were collected in an examination held during the school day. Data on family history and visual activities were collected yearly in a survey completed by a parent or guardian. The dataset used in this text is from 618 of the subjects who had at least five years of follow-up and were not myopic when they entered the study.
Data Visualization with ggplot
Data Visualization with ggplot2
R Coding Samples
Some R coding samples
Libraries:
MASS
Basics
moments
dplyr
haven
Data Visualization with R
Some samples for data visualization with R
Logistic Linear Regression, Model Selection
Apply logistic model selection methods to find the best model for the dataset birthweight_final.csv.
Multiple Linear Regression, Model Selection
Apply model selection methods to find the best model for the dataset birthweight_final.csv.
ANOVA
We will use dataset birthweight.csv.
The birthweight data record live, singleton births to mothers between the ages of 18 and 45 in the United States who were classified as black or white. There are a total of 295 observations in birthweight.
We will perform an ANOVA.
Perform
Logistic Regression, Stepwise Model Selection with AIC- II
Find and specify the best set of predictors via stepwise selection with the AIC criterion for the "Sleep" dataset.
Logistic Regression, Stepwise Model Selection with AIC
For females only in the data set, find and specify the best set of predictors via stepwise selection with the AIC criterion for a logistic regression model predicting whether a female is a liver patient.
Linear Regression, Stepwise Model selection
Consider stepwise model selection for the Cholesterol model. Before performing the model selection, consider which observations have a Cook's distance larger than 0.015.
Linear Regression, Best Subset Selection
Consider best subset selection for the Cholesterol model. Before performing the model selection, consider which observations have a Cook's distance larger than 0.015.
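As a rough illustration of the Cook's distance screen mentioned above, the sketch below computes Cook's distance for a one-predictor least-squares fit using only the Python standard library. The original analyses are in R, and the Cholesterol model may have several predictors; the function name `cooks_distance`, the toy data, and the restriction to a single predictor are assumptions made for this example. The 0.015 cutoff is the one quoted in the text.

```python
from statistics import mean

def cooks_distance(x, y):
    """Cook's distance for each point of a simple linear regression y ~ x."""
    n = len(x)
    p = 2  # number of fitted coefficients: intercept and slope
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - p)
    # Leverage of each point in a one-predictor model.
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return [
        (e ** 2 / (p * mse)) * h / (1 - h) ** 2
        for e, h in zip(residuals, leverage)
    ]

# Hypothetical toy data (not from the Cholesterol data set).
weight = [1.0, 2.0, 3.0, 4.0, 5.0]
outcome = [1.0, 2.0, 3.0, 4.0, 10.0]

distances = cooks_distance(weight, outcome)
# Indices of observations above the 0.015 cutoff quoted in the text.
flagged = [i for i, d in enumerate(distances) if d > 0.015]
```

In R, the same quantity is available for a fitted `lm` object through `cooks.distance()`.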
Linear Regression
The medical director wants to know if blood pressures and weight can better predict cholesterol outcome. Consider modeling cholesterol as a function of diastolic, systolic, and weight.
Linear Regression
The medical director at your company wants to know if Weight alone can predict Cholesterol outcomes. Consider modeling Cholesterol as a function of Weight.
two-way ANOVA
Analysis of Variance
Use the cars_new.csv data set. See HW1 for detailed information on the variables.
two-way ANOVA
The psychology department at a hypothetical university has been accused of underpaying female faculty members. The data represent salary (in thousands of dollars) for all 22 professors in the department. This problem is from Maxwell and Delaney (2004).
one-way ANOVA
For this problem, we will use the bupa.csv data set.
See the UCI Machine Learning Repository for more information (http://archive.ics.uci.edu/ml/datasets/Liver+Disorders).
The mean corpuscular volume and alkaline phosphatase are blood tests thought to be sensitive to liver disorder related to excessive alcohol consumption. We assume that normality and independence assumptions are valid.
Analysis of Variance
The heartbpchol.csv data set contains continuous cholesterol (Cholesterol) and blood pressure status (BP_Status) (categories: High/Normal/Optimal) for living patients.
For the heartbpchol.csv data set, consider a one-way ANOVA model to identify differences between group cholesterol means. The normality assumption is reasonable.
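To make the one-way ANOVA comparison concrete, here is a minimal sketch of the F statistic (between-group mean square divided by within-group mean square) in plain Python. The original analysis is in R; the function name `one_way_anova_f` and the cholesterol values below are hypothetical and are not taken from the heartbpchol data.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, k - 1, n - k

# Hypothetical cholesterol values by BP_Status group.
high = [250.0, 240.0, 262.0, 255.0]
normal = [228.0, 235.0, 221.0, 230.0]
optimal = [205.0, 214.0, 199.0, 210.0]

f_stat, df_between, df_within = one_way_anova_f([high, normal, optimal])
```

The F statistic is then compared against an F distribution with `df_between` and `df_within` degrees of freedom; in R, `aov()` followed by `summary()` reports the same table.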
Hypothesis Testing II
The airquality data will be used for this exercise.
Perform a hypothesis test of whether Wind in July has a different speed (mph) than Wind in August.
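One common way to frame this comparison is a two-sample t-test that does not assume equal variances (Welch's test, which is also the default of R's `t.test`). The sketch below computes the Welch t statistic and its approximate degrees of freedom in plain Python. The function name `welch_t` and the wind speeds are hypothetical, and the choice of Welch's test is an assumption rather than the method confirmed by the source.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(x, y):
    """Welch two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    nx, ny = len(x), len(y)
    # Squared standard error of each sample mean.
    vx, vy = variance(x) / nx, variance(y) / ny
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df

# Hypothetical wind speeds in mph (not the actual airquality values).
july = [8.6, 9.7, 16.1, 9.2, 8.6]
august = [6.9, 13.8, 7.4, 6.9, 7.4]

t_stat, df = welch_t(july, august)
```

The null hypothesis of equal mean wind speed is rejected when `abs(t_stat)` exceeds the critical value of a t distribution with `df` degrees of freedom at the chosen significance level.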
Hypothesis Testing I
Cars.csv will be used for this exercise. Perform a hypothesis test of whether SUV has different horsepower than Truck, and state your conclusions.
Descriptive Statistics
Create a combined mpg variable called MPG_Combo which combines 60% of the MPG_City and 40% of the MPG_Highway. Obtain a box plot for MPG_Combo and comment on what the plot tells us about fuel efficiencies.
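The weighting described above amounts to MPG_Combo = 0.6 × MPG_City + 0.4 × MPG_Highway. A minimal Python sketch of that arithmetic follows, together with the median and quartiles that a box plot would display. The (city, highway) pairs and the function name `mpg_combo` are hypothetical; they are not rows from Cars.csv.

```python
from statistics import median, quantiles

def mpg_combo(city, highway):
    """Combined fuel economy: 60% city mpg plus 40% highway mpg."""
    return 0.6 * city + 0.4 * highway

# Hypothetical (MPG_City, MPG_Highway) pairs.
cars = [(20, 30), (18, 26), (25, 35), (15, 20), (22, 28)]

combo = [mpg_combo(city, highway) for city, highway in cars]

# Values a box plot summarizes: quartiles, median, and the extremes.
q1, q2, q3 = quantiles(combo, n=4)
summary = {
    "min": min(combo),
    "q1": q1,
    "median": median(combo),
    "q3": q3,
    "max": max(combo),
}
```

Quartile conventions differ between tools, so the box edges drawn by R's `boxplot()` may not match `statistics.quantiles` exactly on small samples.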
One-Way ANOVA - Sample 1
The heartbpchol.csv data set contains continuous cholesterol (Cholesterol) and blood pressure status (BP_Status) (categories: High/Normal/Optimal) for living patients.
For the heartbpchol.csv data set, consider a one-way ANOVA model to identify differences between group cholesterol means.