
aweber47

Alex Weber

Recently Published

Final Project
My Final Project for Stats :D
Week 12 - Data Dive - Critiquing Models and Analyses
The purpose of this week's data dive is to think about identifying analytical, ethical, and epistemological issues with statistical models.
Week 11 - Data Dive - Time Series (Alex Weber)
A time series analysis of Wikipedia page views for cat- and dog-related articles. :D
Week 10 - Data Dive - GLMs - Part 2 (Alex Weber)
In this week's data dive, I built a linear model to predict `expected.mean` using the variables `year`, `pop`, `acm.mean`, and `excess.mean`. The model diagnostics indicated a strong fit for `pop`, `acm.mean`, and `excess.mean`, with no multicollinearity issues. Yet the unusually perfect R-squared values and some large residuals suggest a need for further investigation, including scrutiny of the model's predictive reliability and of potential data anomalies.
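A minimal sketch of how a model like this could be fit in R. The `covid` data frame below is simulated placeholder data, not the actual dataset; only the column names follow the post.

```r
# Placeholder data standing in for the real COVID dataset (values are illustrative only)
set.seed(1)
covid <- data.frame(
  year        = rep(2020:2021, each = 50),
  pop         = runif(100, 1e5, 1e7),
  acm.mean    = runif(100, 500, 5000),
  excess.mean = runif(100, -100, 1000)
)
covid$expected.mean <- covid$acm.mean - covid$excess.mean + rnorm(100, sd = 5)

# Fit the linear model described above and inspect diagnostics
fit <- lm(expected.mean ~ year + pop + acm.mean + excess.mean, data = covid)
summary(fit)          # coefficients, standard errors, R-squared
plot(fit, which = 1)  # residuals vs fitted values, to spot unusually large residuals

# Variance inflation factors to check for multicollinearity (requires the car package)
car::vif(fit)
```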
Week 9 - Data Dive - GLM (Alex Weber)
In this data dive, I built a logistic regression model that works but doesn't produce the best results. Regardless, the model was able to predict the binary outcome `Interested_Cat` from the `sex` variable. I interpreted the coefficients and assessed the model's precision with standard errors and confidence intervals (the bonus exercise).
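A rough sketch of the kind of model this describes. The data below are simulated placeholders; only the names `Interested_Cat` and `sex` follow the post.

```r
# Simulated placeholder data; the real dataset's values differ
set.seed(2)
dat <- data.frame(
  sex            = factor(sample(c("female", "male"), 200, replace = TRUE)),
  Interested_Cat = rbinom(200, 1, 0.5)
)

# Logistic regression of the binary outcome on sex
model <- glm(Interested_Cat ~ sex, data = dat, family = binomial)
summary(model)    # coefficients (log-odds) with standard errors
exp(coef(model))  # odds-ratio interpretation of the coefficients
confint(model)    # profile-likelihood confidence intervals
```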
Week 8 - Data Dive - Regression (Alex Weber)
In this week's data dive, I looked at how different features affect `excess.mean` in the COVID dataset. An ANOVA test showed that `sex` had no statistically significant effect on `excess.mean`. Then, using `pop`, I constructed a linear regression model and found a statistically significant association, though the practical impact was minor. Adding `sex` as a variable did not significantly increase the model's explanatory power. More research is needed to uncover more influential predictors or interactions for `excess.mean`.
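A short sketch of this workflow in R, again with simulated stand-in data rather than the actual COVID dataset.

```r
# Simulated placeholder columns named after the post's variables
set.seed(3)
covid <- data.frame(
  sex         = factor(sample(c("female", "male"), 150, replace = TRUE)),
  pop         = runif(150, 1e5, 1e7),
  excess.mean = rnorm(150, mean = 200, sd = 50)
)

# One-way ANOVA: does excess.mean differ by sex?
summary(aov(excess.mean ~ sex, data = covid))

# Simple regression on population, then adding sex as a second predictor
m1 <- lm(excess.mean ~ pop, data = covid)
m2 <- lm(excess.mean ~ pop + sex, data = covid)
summary(m1)
anova(m1, m2)   # does sex significantly improve the fit?
```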
Week 7 - Data Dive - Hypothesis Testing (Alex Weber)
Using RMarkdown, I conducted a thorough analysis of our dataset. I examined the data for hours before developing two distinct null hypotheses, each with its own alpha, power, and minimal effect size, and then explained why I made those choices. Where appropriate, I explored both Fisher-style significance tests and Neyman-Pearson hypothesis testing, offering interpretations of the p-values. In addition, I produced two visualizations that provide useful insights into the data by highlighting the outcome of each hypothesis test. The analysis yielded substantive findings as well as new research ideas.
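The actual hypotheses and variables live in the notebook; this sketch only shows the general workflow in R, with made-up numbers for alpha, power, and the minimal effect size.

```r
# Power / sample-size planning for a Neyman-Pearson style test
# (alpha, power, and minimal effect size here are illustrative choices)
power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.8)

# Simulated placeholder groups, then a two-sample t-test reporting a p-value
set.seed(4)
group_a <- rnorm(50, mean = 0.0, sd = 1)
group_b <- rnorm(50, mean = 0.4, sd = 1)
t.test(group_a, group_b)   # Fisher-style reading: weigh the p-value as evidence
```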
Week 6 - Data Dive - Confidence Intervals (Alex Weber)
For this week's data dive, I constructed three sets of variable combinations in my RMarkdown notebook. For each set, I ensured that all variables were continuous or ordered by including at least one calculated column. I built visualizations for each response–explanatory relationship, checked the plots for outliers, calculated correlation coefficients, and used the visualizations to highlight the importance of the results. Along with deriving specific demographic findings, I provided confidence intervals for the response variables and noted considerations for further research. This exercise demonstrated the need to consult the data documentation and to document models.
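A minimal sketch of one such response–explanatory pair in R, using simulated placeholder variables rather than the dataset's actual columns.

```r
# Placeholder continuous variables standing in for one response–explanatory pair
set.seed(5)
x <- runif(100, 0, 10)
y <- 2 * x + rnorm(100, sd = 3)

plot(x, y)           # visual check for outliers and overall shape
cor(x, y)            # correlation coefficient
cor.test(x, y)       # correlation test with a confidence interval
t.test(y)$conf.int   # confidence interval for the mean of the response
```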
Week 5 - Data Dive - Documentation (Alex Weber)
This week's data dive aimed to locate and clarify any initially ambiguous columns or values in the dataset, identify any elements that remained unclear even after consulting the documentation, create a visualization that draws attention to the problem, and address potential pitfalls in the data analysis process.
Week 4 - Data Dive - Sampling and Drawing Conclusions (Alex Weber)
I completed the assignment by generating several random subsamples of the data, each containing roughly 50% of the main COVID dataset, and storing them in separate data frames (`sub_dfs` 1:6). The analysis of these subsamples revealed variation in their characteristics, demonstrating the impact of random sampling and underlining the importance of drawing inferences cautiously from any single sample. However, more research is required to understand the underlying data patterns and relationships.
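A small sketch of the subsampling step in R; the `covid` data frame here is a simulated placeholder, and the 50% fraction follows the post.

```r
# Placeholder data frame standing in for the COVID dataset
set.seed(6)
covid <- data.frame(id = 1:1000, excess.mean = rnorm(1000, mean = 200, sd = 50))

# Draw six random subsamples, each roughly half of the rows,
# and store them in a list (sub_dfs[[1]] ... sub_dfs[[6]])
sub_dfs <- lapply(1:6, function(i) {
  covid[sample(nrow(covid), size = floor(0.5 * nrow(covid))), ]
})

# Compare a summary statistic across subsamples to see sampling variability
sapply(sub_dfs, function(d) mean(d$excess.mean))
```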
Week 3 - Data Dive - Probabilities and Anomalies [UPDATED]
[UPDATED EDITION - MORE VISUALS!] I created at least three different data frames for my data analysis work using the `group_by()` function on categorical columns, each of which presented the data in a different way. I calculated the expected probabilities for several groups inside these data frames and labeled the group with the lowest probability as an "anomaly," adding this useful information back into the original dataset. Through careful analysis, I was able to draw insightful conclusions from the computed data and develop well-founded theories to explain why some groupings are relatively rare. I explored combinations of two categorical variables to identify missing combinations and to understand the prevalence of the various combinations, while considering the underlying factors that may contribute to these observed patterns.
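A minimal dplyr sketch of the group-probability-and-anomaly step described here. The categorical columns and values below are simulated placeholders, not the dataset's real ones.

```r
library(dplyr)

# Placeholder categorical data; the real dataset's columns differ
set.seed(7)
covid <- data.frame(
  age_group = sample(c("0-24", "25-64", "65+"), 500, replace = TRUE,
                     prob = c(0.45, 0.45, 0.10)),
  sex       = sample(c("female", "male"), 500, replace = TRUE)
)

# Group, compute each group's share of rows, and flag the rarest group as an anomaly
probs <- covid %>%
  group_by(age_group, sex) %>%
  summarise(n = n(), .groups = "drop") %>%
  mutate(p = n / sum(n),
         anomaly = p == min(p))

# Join the anomaly flag back onto the original data
covid <- covid %>% left_join(probs, by = c("age_group", "sex"))
```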
Week 2 - Data Dive - Summaries (Alex Weber)
In this week's data dive, I supplied numerical summaries for at least 10 columns, including key statistics for numerical data, categorical values, and counts. To find intriguing insights, I also posed five further questions based on these summaries and the project objectives. Visual summaries for five columns let me study distributions and trends, which improved my understanding of the data. Along the way, I highlighted open questions for further research and described the relevance of each finding. To better understand the dataset I'm working with, I also provided explanatory code comments and breakdowns.
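A tiny sketch of the kinds of summaries this refers to, using a simulated placeholder data frame with only two columns.

```r
library(dplyr)

# Placeholder data frame; the real dataset has many more columns
set.seed(8)
covid <- data.frame(
  pop = runif(100, 1e5, 1e7),
  sex = sample(c("female", "male"), 100, replace = TRUE)
)

summary(covid$pop)     # five-number summary plus mean for a numeric column
covid %>% count(sex)   # counts for a categorical column
hist(covid$pop)        # visual summary of a distribution
```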