Recently Published
Naive Bayes/K-Nearest Neighbor Implementation
Predicts the outcome y (diagnosis: benign or malignant) from the input variables x: “radius_mean”, “area_mean”, “smoothness_mean”
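As a minimal sketch of the k-nearest-neighbor idea (not the published implementation), the snippet below classifies one query point by majority vote among its closest training rows. The function name `knn_predict` and all data values are invented for illustration; in practice the features would usually be rescaled first, since `area_mean` is on a much larger scale than the other two.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training label with its Euclidean distance to the query,
    # then sort so the closest rows come first.
    distances = sorted(
        (math.dist(row, query), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical (radius_mean, area_mean, smoothness_mean) rows, not real data.
train_X = [
    (12.0, 450.0, 0.09),
    (13.1, 520.0, 0.10),
    (20.5, 1300.0, 0.11),
    (19.8, 1200.0, 0.12),
]
train_y = ["benign", "benign", "malignant", "malignant"]

prediction = knn_predict(train_X, train_y, (12.5, 480.0, 0.095), k=3)
```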
Decision Tree Implementation - creating a model and checking accuracy
Implementing a decision tree to classify which species a flower belongs to (output variable: setosa, versicolor, or virginica) based on the input variables sepal length, sepal width, petal length, and petal width
Multiple Linear Regression
Predicts mpg for a car from hp, vol, sp, and wt
Cars dataset - independent (input) variables: x1 hp, x2 vol, x3 sp, x4 wt
- dependent (output) variable: y mpg
Simple Linear Regression - Fat~WaistCircumference
Linear Regression
E.g., predict body fat (adipose tissue, AT) from waist circumference.
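A simple linear regression of this kind can be sketched with the ordinary least-squares formulas for slope and intercept. The helper name `fit_line` and the waist/AT numbers below are made up for illustration and are not taken from the published analysis.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical waist circumference (cm) and adipose tissue (AT) values.
waist = [70, 80, 90, 100]
at = [30, 60, 90, 120]

intercept, slope = fit_line(waist, at)
predicted_at = intercept + slope * 85  # prediction for a waist of 85 cm
```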
7. Binomial and Multinomial Logistic Regression
A logistic regression was conducted to test whether the type of advertisement predicts whether a participant clicks the link. A significant regression equation was found (χ²(2) = 31.13, p < .001), with a pseudo-R² of .01. The intercept (z = 5.26, p < .001, b = .40), advertisement one (z = 2.23, p = .026, b = .24), and advertisement two (z = 5.51, p < .001, b = .63) were all statistically significant. The odds ratio for advertisement one is 1.28, which suggests that individuals who see advertisement one have 1.28 times the odds of clicking through to the related website compared with individuals who see no advertisement; the odds ratio for advertisement two is 1.87, which suggests that individuals who see advertisement two have 1.87 times the odds of clicking through compared with individuals who see no advertisement.
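The odds ratios above are the exponentiated logistic coefficients. The sketch below applies that conversion to the rounded coefficients from the write-up; because b is rounded to two decimals, the results (about 1.27 and 1.88) differ slightly from the reported 1.28 and 1.87, which were presumably computed from unrounded estimates. The function name `odds_ratio` is illustrative.

```python
import math

def odds_ratio(b):
    """Convert a logistic regression coefficient (log-odds) to an odds ratio."""
    return math.exp(b)

# Rounded coefficients as reported in the write-up.
or_ad_one = odds_ratio(0.24)
or_ad_two = odds_ratio(0.63)
```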
6. Multiple Linear Regression - MLR
A multiple linear regression was conducted to predict participants’ weight gain from their sugar and water intake. All regression assumptions were met, and no adjustments were made. A significant regression equation was found (F(2, 17) = 21.47, p < .001), with an R² of .72, which suggests that 72% of the variance in participants’ weight gain can be explained by the two predictors. Both sugar (t = 4.07, p < .001, b = 1.57) and water (t = 3.65, p = .002, b = .02) were statistically significant. Each additional unit of sugar intake predicts a 1.57-gram increase in weight, and each additional cup of water predicts a 0.02-gram increase in weight, holding the other predictor constant.
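The overall F statistic and R² of a regression are two views of the same fit, linked through the degrees of freedom. As a rough consistency check under that standard relationship, the sketch below recovers R² from the reported F(2, 17) = 21.47; the function names are illustrative.

```python
def r_squared_from_f(f, df_model, df_resid):
    """R^2 implied by an overall F statistic and its degrees of freedom."""
    return (f * df_model) / (f * df_model + df_resid)

def f_from_r_squared(r2, df_model, df_resid):
    """Overall F statistic implied by R^2 and the degrees of freedom."""
    return (r2 / df_model) / ((1 - r2) / df_resid)

# Reported values: F(2, 17) = 21.47 and R^2 = .72 (rounded).
implied_r2 = r_squared_from_f(21.47, df_model=2, df_resid=17)
implied_f = f_from_r_squared(0.72, df_model=2, df_resid=17)
```

The implied R² is about .716, which rounds to the reported .72; going the other way from the rounded .72 gives an F near 21.86 rather than 21.47, a gap explained by rounding.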
5. Simple Linear Regression - SLR
A simple linear regression was conducted to predict participants’ money-saving motivation from their annual income. All regression assumptions were met, and no adjustments were made. A significant regression equation was found (F(1, 97) = 18.37, p < .001), with an R² of .16. Both the intercept (p = .002) and the predictor (p < .001) were statistically significant. The result suggests that each one-dollar increase in income predicts a 0.0000001097 percent increase in savings.
4. Introduction of Correlation
1. Bivariate Correlation: measures the strength and direction of the association between X and Y, i.e., how X tends to change when Y changes.
(Parametric and nonparametric)
2. Partial and Semi-Partial Correlation: looks at the relationship between two variables while controlling for the effect of one or more additional variables
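For a single control variable, the partial and semi-partial correlations can be written in terms of the three pairwise Pearson correlations. The sketch below uses those textbook formulas; the function names and data are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """Correlation of x and y with z removed from both."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

def semi_partial_corr(x, y, z):
    """Correlation of y with x after z is removed from x only."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt(1 - r_xz**2)

# Hypothetical data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
z = [1, 3, 2, 5, 4]

r_partial = partial_corr(x, y, z)
r_semi = semi_partial_corr(x, y, z)
```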
3. Factorial (2 or more Way) ANOVA
A study was conducted to investigate whether a person’s status and the form of the item (product or gamble), taken together, affected individuals’ valuation of the item. Observations from the study were analyzed with a two-way analysis of variance with two independent variables (condition; product or gamble) and one dependent variable (valuation), using R version 3.6.1.
First, all assumptions were met and no adjustments were made. The ANOVA revealed no statistically significant interaction effect (F(2, 234) = 2.698, p = .070). The results also suggest that valuations of an item’s worth were affected by the participants’ status (buyers, sellers, or choosers) (F(2, 234) = 23.45, p < .001) and by whether the item was a product or a gamble (F(1, 234) = 43.98, p < .001).
2. ANOVA with Blocking Design
Observations from the study were analyzed by conducting a one-way analysis of variance using R version 3.6.1. First, all assumptions were met and no adjustments were made. The results suggest that task condition (predictor) has a significant effect on performance score (outcome) (F(2, 86) = 29.89, p < .001). [We cannot consider age group as a block because it also has a significant effect on the outcome, F(1, 87) = 60.21, p < .001.]
To determine which task conditions produced significantly different performance scores, a Tukey post hoc test was conducted. The results suggest significant differences between task conditions 1 and 2, conditions 2 and 3, and conditions 1 and 3 (all p < .001). The effects were large (Cohen’s d = 0.95, 1.05, and 2.02).
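Cohen's d for a pair of conditions is the mean difference divided by a pooled standard deviation. The sketch below assumes the common pooled-variance definition, which may differ from the exact variant used in the analysis; the function name and scores are hypothetical.

```python
import math
import statistics

def cohens_d(a, b):
    """Standardized mean difference between two groups using a pooled SD."""
    na, nb = len(a), len(b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)

# Hypothetical performance scores for two task conditions.
condition_1 = [2, 4, 6]
condition_2 = [1, 3, 5]

d = cohens_d(condition_1, condition_2)
```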
1. ANOVA - Intro to Analysis of Variance
The statement of the research/study purpose
H0: BrewMethod1 = BrewMethod2 = BrewMethod3
H1: at least one pair of levels differs from one another
The type of analysis conducted, e.g., D’Agostino test, scatterplot of residuals, Bartlett test, etc.
Descriptive statistics: basic information about the data, e.g., age and gender of the participants.
The ANOVA test
Post-hoc analysis
Effect size
Conclusions
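The ANOVA step in the outline above comes down to comparing between-group and within-group variability. A minimal sketch of the one-way F statistic follows; the function name and the three groups of scores are invented, standing in for the three brew methods.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    # Within-group sum of squares: observations around their own group mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical scores for three brew methods.
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```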
Chi Square Test
Chi-Square Test - used when variables are categorical (not continuous)
TYPE 1) Chi-Square Goodness of Fit – when we wish to compare observed frequencies to expected ones.
TYPE 2) Chi-Square Test of Independence – when we wish to see if two groups differ in their observed frequencies across a categorical dependent variable.
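Both tests use the same statistic, the sum of (observed - expected)² / expected, and differ only in where the expected counts come from. The sketch below shows both; the function names and counts are illustrative.

```python
def chi_square_gof(observed, expected):
    """Goodness-of-fit statistic: expected counts are supplied by the hypothesis."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi_square_independence(table):
    """Test-of-independence statistic and df: expected counts come from the margins."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

# Hypothetical counts for illustration.
gof_stat = chi_square_gof([10, 20, 30], [20, 20, 20])
ind_stat, ind_df = chi_square_independence([[10, 20], [20, 10]])
```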
Paired Samples T Test
Paired Samples T Test - compares paired measurements on the same subjects, such as before vs. after or with vs. without treatment
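A paired t statistic is computed on the per-subject differences. A minimal sketch, with a hypothetical function name and made-up before/after measurements:

```python
import math
import statistics

def paired_t(before, after):
    """Return (t, df) for a paired-samples t test on after - before."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    # Standard error of the mean difference.
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1

# Hypothetical measurements on the same four subjects.
t_paired, df_paired = paired_t([10, 12, 14, 16], [11, 14, 15, 18])
```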
Independent or Two Sample T Test
Independent or Welch Two Sample T Test - compares the means of two independent groups
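The Welch version does not assume equal variances and uses the Welch-Satterthwaite approximation for the degrees of freedom. A short sketch with an illustrative function name and invented data:

```python
import math
import statistics

def welch_t(a, b):
    """Return (t, df) for Welch's two-sample t test."""
    # Per-group variance of the mean.
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite degrees of freedom (generally not an integer).
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df

# Hypothetical scores for two independent groups.
t_welch, df_welch = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```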
One Sample T Test
One Sample T Test - compares the mean of a single data sample with a known or hypothesized value
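The one-sample t statistic measures how far a sample mean lies from a reference value, in units of its standard error. A minimal sketch; the function name, sample, and reference value are invented.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return (t, df) for a one-sample t test against the reference mean mu0."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - mu0) / se, n - 1

# Hypothetical sample tested against a reference mean of 5.
t_one, df_one = one_sample_t([5, 6, 7, 8, 9], mu0=5)
```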
Linear Regression - Basics
Simple Linear Regression
Confirmatory Factor Analysis - CFA
In the study, the researcher conducted a confirmatory factor analysis to evaluate the hypothesized behavioral factor structure. Four factors were specified: attitude, loyalty, interest, and satisfaction. Each of the items (interests, preference, purchase, etc.) was allowed to load onto its respective factor. The results are reported in Table 1. The first hypothesized four-factor model fit the data well: RMSEA = .068 (90% confidence interval: .065 – .072), CFI = .891, SRMR = .044. Indicator loadings for each factor were statistically significant and high (.40 ≤ λs ≤ .85). A test of modification indices on CFA2 found a high MI score of 59.838, indicating that Q14R may load better on Factor 4. Therefore, we created another model, moving Q14R from Factor 1 to Factor 4. The new model fit the data well: RMSEA = .066 (90% confidence interval: .062 – .07), CFI = .903, SRMR = .044. Indicator loadings for each factor were statistically significant and high (.40 ≤ λs ≤ .85). A chi-square test indicated no significant difference between the two models.
Exploratory Factor Analysis - EFA
Initially, the factorability of the 10 items was examined. Several well-recognized criteria for the factorability of a correlation matrix were used. Firstly, it was observed that 6 of the 10 items correlated at least 0.3 with at least one other item, suggesting reasonable factorability. Secondly, the Kaiser-Meyer-Olkin measure of sampling adequacy was .93 (overall MSA), above the commonly recommended value of .6. Given these overall indicators, factor analysis was deemed to be suitable with all 10 items.
An exploratory factor analysis with oblique rotation was conducted because the factors are correlated. In total, 45% of the variance is explained by the two factors. Factor one includes 6 items and explains 23% of the variance; factor two includes 4 items and explains 22% of the variance.
Cronbach's alpha was also computed to assess internal consistency. It was .454 for factor one and .042 for factor two.
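Cronbach's alpha compares the sum of the item variances with the variance of the total score. The sketch below uses the standard formula; the function name `cronbach_alpha` and the item scores are hypothetical and unrelated to the survey data.

```python
import statistics

def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score lists.

    Each inner list holds one item's scores, with respondents in the same order.
    """
    k = len(items)
    sum_item_vars = sum(statistics.variance(item) for item in items)
    # Total score per respondent, summed across items.
    totals = [sum(scores) for scores in zip(*items)]
    return k / (k - 1) * (1 - sum_item_vars / statistics.variance(totals))

# Hypothetical scores: three items answered by four respondents.
alpha = cronbach_alpha([[1, 2, 3, 4], [2, 3, 4, 5], [1, 3, 3, 5]])
```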
EDA Project Cov19 Mortality
The statement of the research/ study purpose
H0: Death rate in White = Death rate in Latino/Hispanic = Death rate in Black Americans
H1: at least one pair of death rates differs from one another
The type of analysis conducted, e.g., D’Agostino test, scatterplot of residuals, Bartlett test, etc.
Descriptive statistics: basic information about the data, e.g., age and gender of the participants.
The ANOVA test
Post-hoc analysis
Effect size
Summary Write Up/Conclusions
Probability of Normal Distribution and Transformation
Probability of Normal Distribution and Transformation
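Normal probabilities can be computed directly from the cumulative distribution function, or after transforming a value to a z-score on the standard normal. A small sketch using Python's standard library; the mean of 100 and standard deviation of 15 are arbitrary illustrative values.

```python
from statistics import NormalDist

# Hypothetical distribution: mean 100, standard deviation 15.
dist = NormalDist(mu=100, sigma=15)

# Probability that a value falls below 115.
p_below = dist.cdf(115)

# The same probability after standardizing to a z-score.
z = (115 - 100) / 15
p_below_standard = NormalDist().cdf(z)

# Probability that a value falls within one standard deviation of the mean.
p_between = dist.cdf(115) - dist.cdf(85)
```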