Handling class imbalance in classification data using different sampling methods.
Introduction to artificial neural networks (ANNs). Comparison with biological neural networks, and an explanation of backpropagation. An ANN for an AND gate built from scratch in R.
Step-by-step explanation of CART decision tree classification using the Titanic dataset.
Explaining the curse of dimensionality using a relevant example.
Step-by-step explanation of CHAID decision trees using the Titanic dataset.
A discussion of factor analysis. After a brief introduction to PCA and CFA, the KMO measure of sampling adequacy and Bartlett's test of sphericity are introduced. For PCA, the scree plot, eigenvalues, validation, and interpretation of the factors are discussed.
The Dickey-Fuller unit root test and the Ljung-Box independence test are discussed using an attendance data set.
Hierarchical clustering using a dendrogram for beer customer segmentation.
Discussion of stationarity, random walks, deterministic drift, and other vocabulary that forms the foundation of time series analysis.
ARIMA using the Box-Jenkins approach. Discusses the Dickey-Fuller, Ljung-Box, and KPSS tests. An ARIMA model is built from scratch and validated.
Forecasting sales of new products using the Bass model. Calculating p, q, and m for iPhone sales using gradient descent. Cool visualizations and code provided.
Customer lifetime value and steady-state retention probability using Markov chains. Markov chains, steady state, homogeneity, the Anderson-Goodman test, and CLT explained. Uses data from the UCI Machine Learning Repository.
Linear programming in R along with sensitivity analysis and cool visualizations.
A complete analytical journey through linear regression: EDA, model building, model diagnostics, residual plots, outlier treatment, collinearity effects, transformation of variables, model rebuilding, and validation for the Boston housing price prediction problem.
Understanding part (semi-partial) and partial correlation coefficients in a multiple regression model. Deriving the multiple R-squared and beta coefficients from basics. Inspired by Business Analytics: The Science of Data-Driven Decision Making by Dinesh Kumar.
Logistic regression using caret. Validation using multiple tests and plots, insights, EDA, and analysis. A complete analytical journey for solving classification problems.
Handling missing values in the original mtcars data set by imputation using the KNN algorithm.
ANOVA hypothesis test to check whether unemployment is similar across Bangalore. Post hoc analysis and visualizations are presented.
Tutorial on univariate analysis, the first part of EDA. Explained using the in-time problem with reusable R code.
Tutorial on multivariate analysis, the second part of EDA. Explained using the in-time problem with reusable R code.
Explaining the class size paradox by web scraping placement data from the Amrita University website.
Analyzing the safety of different types of vehicles in Bangalore using a chi-square test of independence.
Implementing the chi-square goodness-of-fit test in R. Testing whether a sample follows a normal or an exponential distribution.
z-test and t-test for one-sample location tests.
Tutorial on multicollinearity, the third part of EDA. Plots of the correlation matrix and correlation network for the in-time problem, with reusable code.
Recommendation systems using association rule mining.