Recently Published
Aviation Accident Analysis - Final Project
Beyond a fascination with the technical side of aircraft, it is just as important to care about their safety. I am especially interested in understanding what went wrong in past accidents.
How can we learn from what went wrong, and how can we improve the quality and safety of air travel? For a comprehensive analysis, I will draw on the topics covered in this course from Week 1 to date, and I will make sure that every step is carefully documented and explained so that it is easy to follow. These are the reasons for selecting this dataset and conducting an extensive analysis to answer my WHAT, HOW, WHY, and WHAT NEXT questions.
Model Critique - Week 13
The purpose of this week's data dive is to think about identifying analytical, ethical, and epistemological issues with statistical models.
It is a group review of some of the labs used in the course, specifically those from Week 11.
Bike Data Analysis - Week 12 - Time Series Modelling
This week's data dive is on Time Series Modelling, a statistical concept that uses time as an explanatory variable in a model. It is a crucial aspect of analyzing and forecasting time-dependent data, which involves studying the patterns and characteristics of a sequence of observations collected over time, to understand the underlying data-generating process and make accurate predictions about future values.
As discussed earlier in the week, the key components of a time series include trend, seasonality, cyclical patterns, and random noise (irregular fluctuations).
Time series modeling involves several steps, including data exploration and visualization, stationarity testing, model identification, parameter estimation, diagnostic checking, and forecasting. The choice of model depends on the characteristics of the data we are looking at, the presence of trend and seasonality, and the desired level of accuracy and interpretability.
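To make the trend and seasonality components concrete, here is a minimal sketch in Python (the data dives themselves are R Markdown documents, so this is an illustration and not the code used in the analysis). It assumes an additive series, an odd moving-average window, and a made-up series; the function names are hypothetical.

```python
from statistics import mean

def centered_moving_average(values, window):
    """Estimate the trend with a centered moving average (odd windows only)."""
    if window % 2 == 0:
        raise ValueError("this sketch only handles odd windows")
    half = window // 2
    trend = [None] * len(values)  # the edges have no full window
    for i in range(half, len(values) - half):
        trend[i] = mean(values[i - half:i + half + 1])
    return trend

def seasonal_effects(values, trend, period):
    """Average the detrended values at each position in the seasonal cycle."""
    buckets = [[] for _ in range(period)]
    for i, (value, level) in enumerate(zip(values, trend)):
        if level is not None:
            buckets[i % period].append(value - level)
    return [mean(bucket) for bucket in buckets]

# Hypothetical series: a linear trend plus a repeating 3-step pattern.
pattern = [2.0, -1.0, -1.0]
series = [10 + 0.5 * i + pattern[i % 3] for i in range(12)]

trend = centered_moving_average(series, window=3)
season = seasonal_effects(series, trend, period=3)
```

Because the window covers exactly one seasonal cycle, the moving average cancels the seasonal pattern and leaves the trend; subtracting it and averaging by cycle position recovers the pattern. Real data would also leave a noise term after both are removed.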
Week 11 | Generalized Linear Models (Part 2)
Last week's analysis covered Generalized Linear Models (GLMs), transformations, and logistic regression; this data dive is mostly a continuation of the GLM work. As discussed earlier, GLMs are a broad class of statistical models that extend linear regression to accommodate response variables that are not well modeled by a normal distribution.
As discussed in last week's class, this data dive will follow these statistical topics:
Generalized Linear Models (GLMs): These are a class of models that extend linear regression to handle non-normally distributed response variables by using a link function.
Maximum Likelihood Estimation (MLE) is a method used in GLMs to estimate the model parameters by maximizing the likelihood function, providing the most probable parameter values given the observed data.
When comparing GLMs, metrics like Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), analysis of variance (ANOVA), and deviance help assess model fit, complexity, and predictive performance, aiding in model selection.
Variable Expansion involves transforming predictors to fit the data distribution better, while Variable Selection aims to identify the most relevant predictors, enhancing model interpretability and predictive accuracy in GLMs.
Continuing from last week's analysis of Generalized Linear Models (GLMs), this part takes another deep dive into how they relate to and inform our decisions, conclusions, and insights for the bike sales dataset.
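The model comparison metrics listed above can be illustrated with a short Python sketch (not the R code from the data dive). It uses the standard definitions AIC = 2k - 2 ln(L) and BIC = k ln(n) - 2 ln(L); the log-likelihoods and parameter counts are invented for illustration and are not results from the bike sales analysis.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2*ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k*ln(n) - 2*ln(L). Lower is better."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

# Hypothetical fits: name -> (maximized log-likelihood, estimated parameters).
candidates = {"small": (-125.0, 2), "large": (-120.0, 6)}
n_obs = 100

scores = {
    name: (aic(ll, k), bic(ll, k, n_obs))
    for name, (ll, k) in candidates.items()
}
best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_bic = min(scores, key=lambda name: scores[name][1])
```

In this made-up case the two criteria disagree: AIC prefers the larger model, while BIC, which penalizes each extra parameter by ln(n) instead of 2, prefers the smaller one.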
Week 10 | Generalized Linear Models
This data dive is focused on Generalized Linear Models (GLM), Transformations, and Logistic Regression as discussed in this week's class.
Generalized Linear Models (GLM) are a broad class of statistical models that extend linear regression to accommodate response variables that are not well modeled by a normal distribution.
In a standard linear regression model, we tend to predict an outcome variable based on a linear combination of predictor variables, and we assume that the outcome variable is normally distributed around that linear prediction.
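A GLM keeps the linear combination of predictors but connects it to the mean of the response through a link function. The Python sketch below shows the inverse logit link (logistic regression) and the inverse log link (Poisson regression); the coefficients and predictor values are hypothetical and are not estimates from the bike sales dataset.

```python
import math

def linear_predictor(intercept, coefs, x):
    """Linear combination of predictors: eta = b0 + b1*x1 + b2*x2 + ..."""
    return intercept + sum(b * xi for b, xi in zip(coefs, x))

def inverse_logit(eta):
    """Inverse of the logit link: maps eta to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def inverse_log(eta):
    """Inverse of the log link: maps eta to a positive expected count."""
    return math.exp(eta)

# Hypothetical coefficients and one hypothetical observation.
intercept = -1.0
coefs = [0.5, 0.25]
x = [2.0, 0.0]

eta = linear_predictor(intercept, coefs, x)   # -1.0 + 0.5*2.0 + 0.25*0.0
probability = inverse_logit(eta)              # logistic regression prediction
expected_count = inverse_log(eta)             # Poisson regression prediction
```

With the identity link, the prediction would simply be `eta` itself, which is the standard linear regression case described above.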
Bike Data Analysis - Week 9 | Regression Diagnostics
This week's data dive continues last week's, with a primary focus on critically interpreting and diagnosing regression models. Building on last week's regression modeling, I will present a comprehensive data analysis focused on understanding the factors that influence bike purchases.
Bike Data Analysis - Week 8 | Regression Modeling
This task will give me experience running ANOVA tests and building regression models using the bike sales dataset. Earlier in the week, regression model building was discussed alongside other topics such as:
ANOVA.
F-test.
Bonferroni Correction.
Linear Regression Theory.
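As a rough illustration of the first three topics, the Python sketch below computes a one-way ANOVA F statistic by hand and a Bonferroni-adjusted significance level. The group values are invented, the function names are hypothetical, and this is not the R code from the data dive.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([v for g in groups for v in g])
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level that keeps the family-wise rate at alpha."""
    return alpha / n_tests

# Hypothetical measurements for three groups.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_between, df_within = one_way_anova_f(groups)

# Three pairwise follow-up comparisons among three groups.
adjusted_alpha = bonferroni_alpha(0.05, n_tests=3)
```

Turning the F statistic into a p-value requires the F distribution, which is not in Python's standard library; `scipy.stats.f.sf(f_stat, df_between, df_within)` is one way to get it if SciPy is available.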
Bike Data Analysis - Week 7 | Hypothesis Testing
This week's data dive applies hypothesis testing to the bike sales dataset. After exploring this dataset over the past few weeks, I sometimes came to a halt, confused and lost in my analysis process. I ran into various questions that I could not answer right away, and I will try to address some of them in this week's data dive.
So, this week's data dive will analyze the dataset around the following topics as discussed this week:
a. Empiricism vs. Rationalism
b. Hypothesis Testing Paradigms
c. Null Hypothesis
d. Type I vs. Type II Error
e. p-value
f. AB Testing
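To tie the null hypothesis, p-value, and A/B testing topics together, here is a minimal Python sketch of a two-sided, two-proportion z-test. The counts are hypothetical (they are not from the bike sales dataset), and the normal approximation is an assumption that only holds for reasonably large samples.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test of H0: both groups share the same success rate."""
    p_a, p_b = success_a / n_a, success_b / n_b
    # Under H0 the best estimate of the common rate pools both groups.
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical A/B result: 50 of 100 buyers in group A, 70 of 100 in group B.
alpha = 0.05  # accepted Type I error rate
z, p_value = two_proportion_z_test(50, 100, 70, 100)
reject_null = p_value < alpha
```

Rejecting the null when it is actually true is a Type I error, which `alpha` controls; failing to reject it when the rates really differ is a Type II error, which depends on the sample size and the true difference.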
Bike Data Analysis - Week 6 | Confidence Intervals
This week’s data dive presents a comprehensive data analysis focused on understanding the factors that influence bike purchases. I will explore various relationships within the bike sales dataset, emphasizing the importance of data documentation, detailed analysis, and referencing the documentation for the data that I am using.
Through this analysis, I aim to uncover insights that can guide strategic decisions and highlight the value of rigorous data examination, which is a bit similar to last week’s data dive.
Bike Data Analysis - Week 5 | Documentation
This week's data dive explores my usual dataset on bike purchases. The main task of this week's analysis is data documentation: I will focus on why it matters and on making this week's R Markdown document as simple and easy to read as possible for anyone who opens it.
I will delve into specific columns that require documentation for clarity, identify elements that remain unclear even after consulting documentation, and visualize these findings to highlight potential issues.
Bike Data Analysis - Week 4 | Sampling and Drawing Conclusions
This analysis still uses the bike dataset and focuses on thinking critically about what might go wrong when it comes time to draw conclusions from my data.
In this analysis, I drew a collection of 5 random samples, scrutinized these subsamples, and considered how this investigation affects how I might draw conclusions about the data in the future.
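The idea of drawing several random subsamples and comparing them can be sketched in Python as follows. The income values, the 50% sampling fraction, and the choice to sample without replacement are all assumptions made for illustration; they are not taken from the original R Markdown analysis.

```python
import random
from statistics import mean

def draw_subsamples(rows, n_samples=5, frac=0.5, seed=0):
    """Draw n_samples random subsamples, each without replacement."""
    rng = random.Random(seed)  # fixed seed so the draws are reproducible
    size = max(1, int(len(rows) * frac))
    return [rng.sample(rows, size) for _ in range(n_samples)]

# Hypothetical income column with 100 evenly spaced values.
incomes = [30000 + 1000 * i for i in range(100)]

samples = draw_subsamples(incomes, n_samples=5, frac=0.5)
sample_means = [mean(sample) for sample in samples]
spread = max(sample_means) - min(sample_means)
```

The spread between the subsample means gives a feel for how much a conclusion could change just because a different set of rows happened to be collected.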
Bike Data Analysis - Week 3 | Group By and Probabilities
This data dive focuses on individual rows of data, and groups of them. For each row/group, I investigated the probability of that row or group.
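A minimal Python sketch of the group-by-and-probability idea: count how often each combination of values appears, then divide by the total number of rows. The rows and the column names `region` and `purchased` are hypothetical stand-ins, not the actual dataset schema.

```python
from collections import Counter

def group_probabilities(rows, keys):
    """Empirical probability of each combination of values for the given keys."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}

# Hypothetical rows standing in for the bike dataset.
rows = [
    {"region": "Europe", "purchased": "Yes"},
    {"region": "Europe", "purchased": "No"},
    {"region": "Pacific", "purchased": "Yes"},
    {"region": "Europe", "purchased": "Yes"},
]

by_region = group_probabilities(rows, ["region"])
by_region_and_purchase = group_probabilities(rows, ["region", "purchased"])
rarest_group = min(by_region_and_purchase, key=by_region_and_purchase.get)
```

Groups with very low probability are worth a second look, since any conclusion about them rests on only a handful of rows.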
Bike Data Analysis - Week 2 | Summaries
This dataset tracks customer demographics (age, income, number of children, education level, etc.) for a bike manufacturing company. The data gathered is important in deciding what the future target customer base should be.
The dataset also includes detailed segmentation variables (region, commute distance ranges, and occupation categories), which suggests a desire to understand behavioral differences between subgroups precisely.