Easy web publishing from R
Write R Markdown documents in RStudio.
Share them here on RPubs.
(It’s free, and couldn’t be simpler!)
Recently Published
PAF 516 Final Dashboard - Christopher Ogino
Statistics for Data Science (229711) - Chapter 6: Data Preprocessing
This chapter dives into the "engine room" of Data Science: Preprocessing. Students will learn that the quality of a model is determined long before it is trained, focusing on the critical steps required to turn messy, real-world data into a "model-ready" format.
Core Topics covered:
Why Preprocessing Matters
Handling Missing Data
Outlier Detection and Treatment
Data Transformation
Encoding Categorical Variables
Feature Scaling
Data Integration and Reshaping
Chapter Lab Activity: Full Preprocessing Pipeline with msleep
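Two of the listed steps, handling missing data and feature scaling, can be illustrated with a minimal sketch. This is not taken from the chapter, which presumably works in R with the msleep dataset. The sleep durations below are made up, and the function names impute_median and standardize are hypothetical.

```python
from statistics import mean, median, pstdev


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Rescale values to mean 0 and population standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


# Made-up sleep durations in hours; None marks a missing measurement.
sleep_hours = [12.0, None, 14.0, 10.0, None, 8.0]

filled = impute_median(sleep_hours)   # missing entries become the median
scaled = standardize(filled)          # z-scores of the imputed values
```

Median imputation is only one option among several; it is used here because it is less sensitive to outliers than the mean.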
Statistics for Data Science (229711) - Chapter 5: Data Sampling Techniques
This chapter addresses the foundational question of data science: "How do we ensure our data truly represents the world?" It explores the mechanics of selection, the math of sample size, and the power of computational resampling.
Core Topics covered:
Why Sampling Matters
Probability Sampling Methods
Non-Probability Sampling Methods
Sample Size Determination
Sampling Bias and Common Pitfalls
Bootstrap Resampling
Evaluating Sample Quality
Chapter Lab Activity: Exploring Sampling with nhanes-Style Data
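Bootstrap resampling, one of the topics listed above, can be sketched as a percentile confidence interval for a sample mean. This is an illustration under stated assumptions, not the chapter's own code: the measurements are invented, and bootstrap_ci with its parameters is a hypothetical helper.

```python
import random
from statistics import mean


def bootstrap_ci(sample, stat=mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    # Resample with replacement, recompute the statistic, and sort the results.
    estimates = sorted(
        stat(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Invented measurements standing in for a survey variable.
sample = [118, 122, 130, 125, 140, 135, 128, 121, 133, 127]
lower, upper = bootstrap_ci(sample)
```

The percentile method is the simplest bootstrap interval; other variants exist and may behave better for skewed statistics.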
Statistics for Data Science (229711) - Chapter 4: Test of Independence of Variables
This chapter explores the statistical frameworks used to detect and quantify relationships between variables. It moves from testing the independence of categorical factors to measuring the strength and direction of associations in both discrete and continuous data.
Core Topics covered:
The Concept of Independence
Chi-Square Test of Independence
Fisher’s Exact Test
Cramér’s V and Effect Size for Categorical Association
Correlation Tests
Point-Biserial and Phi Coefficients
Partial Correlation
Chapter Lab Activity: Exploring Independence with the titanic and mtcars Datasets
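The chi-square statistic and Cramér's V listed above can be computed by hand for a small contingency table. The sketch below is an illustration only: the counts are invented, the function names are hypothetical, and no continuity correction or p-value is computed.

```python
from math import sqrt


def chi_square_stat(table):
    """Pearson chi-square statistic for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def cramers_v(table):
    """Cramér's V effect size derived from the chi-square statistic."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0]))
    return sqrt(chi_square_stat(table) / (n * (k - 1)))


# Invented 2x2 counts: rows are one category, columns another.
table = [[30, 10], [10, 30]]
chi2 = chi_square_stat(table)
v = cramers_v(table)
```

For this table every expected count is 20, so the statistic is 4 × (10² / 20) = 20 and V = sqrt(20 / 80) = 0.5.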
Emerging and Developed Markets: The Stories Data can Tell
STAT 3280 Project