SophieXZ

Xue Zhang

Recently Published

presentation
Presentation
Presentation module7
Exploratory Data Analysis
test
test
Practical Machine Learning Project
In this project, we used data from accelerometers on the belt, forearm, arm, and dumbbell of 6 participants. They were asked to perform barbell lifts correctly and incorrectly in 5 different ways. Our goal is to identify which of those 5 ways a lift was performed in, based on the accelerometer recordings in the training data set. After PCA, we applied several classification models: decision tree, random forest, LDA, QDA, KNN, and SVM. The random forest performed best in cross-validation, with an accuracy of 97%.
PM2.5 EDA and Optimized Visualization
The overall goal of this study is to explore the PM2.5 data from 1999 to 2008 and answer a set of questions. The data and questions come from a homework assignment in the "Exploratory Data Analysis" course of the Johns Hopkins University "Data Science Specialization" on Coursera. The assignment only requires answering the questions by exploring the data set with any kind of plot. Now that I have a better understanding of data science, I feel it is crucial for the plots to speak for themselves, so I reworked them to communicate more efficiently. Some of the improvements were inspired by the book "Storytelling with Data" by Cole Nussbaumer Knaflic.
Reducing redundant predictors
This study set out to find the most critical factor(s) in predicting a numeric response. The main challenge was dealing with redundant predictors, especially given the limited number of observations. After EDA, we first reduced the number of predictors from 22 to 6 by comparing correlations, then applied the best subset method to reduce the final set to only 3 predictors. The final model uses 3 predictors out of 21 and explains over 99% of the variation in speed (the response). There were many details to consider in this project, from data cleaning, exploration, and transformation to the selection of statistical methods. Here I post only the main ideas for easier understanding.
Understanding point estimates (method of maximum likelihood vs. method of moments) in one example
Understanding point estimates (method of maximum likelihood vs. method of moments) in one example
[Training Notes] Example: visit date
Deriving a visit based on visit windowing: calculating study day and difference; choosing the earlier of the two observations in the event of a tie on both sides of the target.
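The tie-breaking rule above can be illustrated with a minimal Python sketch (Python rather than SAS, purely for illustration). The function names, the dates, and the target day are hypothetical, and the study-day convention used here (the reference date is day 1, with no day 0) is an assumption rather than something stated in the notes.

```python
from datetime import date


def study_day(obs_date, ref_date):
    """Study day relative to ref_date, assuming the reference date is day 1 (no day 0)."""
    delta = (obs_date - ref_date).days
    return delta + 1 if delta >= 0 else delta


def pick_observation(obs_dates, ref_date, target_day):
    """Return the observation closest to target_day; the earlier one wins a tie."""
    def sort_key(d):
        day = study_day(d, ref_date)
        # First compare the distance from the target, then the study day itself,
        # so that two equally distant observations resolve to the earlier one.
        return (abs(day - target_day), day)

    return min(obs_dates, key=sort_key)


# Hypothetical data: observations on study days 6 and 8, target day 7 (a tie).
ref = date(2020, 1, 1)
observations = [date(2020, 1, 8), date(2020, 1, 6)]
chosen = pick_observation(observations, ref, target_day=7)
```

Here both observations are one day away from the target, so `chosen` is the earlier date, 2020-01-06.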
[Training Notes] AE summarizing examples
Summarizing Free-Text Adverse Event Data, Summarizing Coded Adverse Event Data
[Training Notes] data analysis examples
Frequencies, Cochran-Mantel-Haenszel Statistics, Cochran-Armitage Trend Test, Friedman Chi-Square Test, Cochran Q Test.
[Training Notes] Combine data in SAS
To get a longer table (more rows): SET (or PROC APPEND). To get a wider table (more columns): MERGE. When dataA and dataB do not have the same number of rows: some details; match by BY; SORT and then BY.
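As a rough analogue of the two ideas above, the following Python sketch stacks rows (the "longer" case) and matches rows on a key (the "wider" case). It is not SAS and does not reproduce every detail of SET or MERGE semantics; the data, the column names, and the `merge_by` helper are hypothetical.

```python
# Hypothetical data sets with different numbers of rows.
data_a = [{"id": 1, "age": 34}, {"id": 2, "age": 51}, {"id": 3, "age": 47}]
data_b = [{"id": 1, "dose": 10}, {"id": 3, "dose": 20}]

# Longer table: stack the rows of both data sets, one after the other.
stacked = data_a + data_b


def merge_by(left, right, key):
    """Wider table: combine rows that share the same key value.

    Keys present in only one input are kept, with None for the missing columns.
    """
    merged = {}
    for row in left + right:
        merged.setdefault(row[key], {}).update(row)
    columns = sorted({col for row in left + right for col in row})
    # Emit rows in key order, mirroring the "sort, then match by key" idea.
    return [{col: merged[k].get(col) for col in columns} for k in sorted(merged)]


wide = merge_by(data_a, data_b, "id")
```

`stacked` has five rows, while `wide` has one row per `id`; the row for `id` 2 has `dose` set to `None` because `data_b` has no matching record.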
[Training Notes] Combine data in SQL
Join only matching records (WHERE or ON); Left join; Right join; Full join; Cross join; Union join; Producing Unique Rows from Both Queries (UNION); Producing Rows from Both Queries (UNION ALL); Producing Rows That Are in Only the First Query Result (EXCEPT); Producing Rows That Belong to Both Query Results (INTERSECT); Concatenating the Query Results (OUTER UNION); Concatenating the Query Results (OUTER UNION CORR); Producing Rows from the First Query or the Second Query
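The set operators listed above can be tried out with a small sketch using Python's built-in `sqlite3` module. The tables `a` and `b` and their contents are hypothetical. OUTER UNION and the CORR keyword are PROC SQL features that SQLite does not provide, so they are left out here.

```python
import sqlite3

# In-memory database with two hypothetical single-column tables.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (x INTEGER);
    INSERT INTO a VALUES (1), (2), (2), (3);
    INSERT INTO b VALUES (2), (3), (4);
""")


def run(op):
    """Combine the two queries with the given set operator and return sorted values."""
    sql = f"SELECT x FROM a {op} SELECT x FROM b"
    return sorted(row[0] for row in con.execute(sql))


union = run("UNION")          # unique rows from both queries
union_all = run("UNION ALL")  # all rows from both queries, duplicates kept
only_first = run("EXCEPT")    # rows that are in only the first query result
in_both = run("INTERSECT")    # rows that belong to both query results
```

With this data, `union` is `[1, 2, 3, 4]`, `union_all` is `[1, 2, 2, 2, 3, 3, 4]`, `only_first` is `[1]`, and `in_both` is `[2, 3]`.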