Harsha Achyutuni

Recently Published

Handling imbalanced classes
Handling class imbalance in classification data using different sampling methods.
Artificial Neural Network from scratch - part 1
Introduction to artificial neural networks. Comparison with natural neural networks, and backpropagation. An ANN for the AND gate built from scratch in R.
CART Classification
Step-by-step explanation of CART decision tree classification using the Titanic dataset.
Curse of dimensionality
Explaining the curse of dimensionality using a relevant example.
Step-by-step explanation of CHAID decision trees using the Titanic dataset.
Exploratory Factor analysis
Factor analysis is discussed. After a brief introduction to PCA and CFA, hypothesis tests such as KMO and Bartlett's test of sphericity are introduced. For PCA, the scree plot, eigenvalues, validation and interpretation of the factors are discussed.
Stationarity tests
The Dickey-Fuller unit root test and the Ljung-Box independence test are discussed using an attendance data set.
Hierarchical Clustering
Hierarchical clustering using a dendrogram for beer customer segmentation.
Stationarity introduction
Discussion of stationarity, random walks, deterministic drift and other vocabulary that forms the foundation of time series analysis.
ARIMA using the Box-Jenkins approach. Discusses the Dickey-Fuller, Ljung-Box and KPSS tests. Builds an ARIMA model from scratch and validates it.
Adoption of new product - non linear programming
Forecasting sales of new products using the Bass model. Calculating p, q and m for iPhone sales using gradient descent. Visualizations and code provided.
Customer Lifetime Value
Customer lifetime value and steady-state retention probability using Markov chains. Markov chains, steady state, homogeneity, the Anderson-Goodman test and CLT explained. Uses data from the UCI Machine Learning Repository.
Linear Programming
Linear programming in R, along with sensitivity analysis and visualizations.
Linear regression
A complete analytical journey through linear regression: EDA, model building, model diagnostics, residual plots, outlier treatment, collinearity effects, transformation of variables, model rebuilding and validation for the Boston housing price prediction problem.
Part and partial correlation
Understanding part (semi-partial) and partial correlation coefficients in a multiple regression model. Deriving the multiple R-squared and beta coefficients from basics. Inspired by Business Analytics: The Science of Data-Driven Decision Making by Dinesh Kumar.
Logistic regression
Logistic regression using caret. Validation using multiple tests and plots, insights, EDA and analysis. A complete analytical journey for solving classification problems.
KNN Imputation
Handling missing values in the original mtcars data set by imputation using the KNN algorithm.
Analysis of Variance
ANOVA hypothesis test of whether unemployment is similar across Bangalore. Post hoc analysis and visualizations are presented.
Univariate analysis
Tutorial on univariate analysis, which is the first part of EDA. Explained using the in-time problem with reusable R code.
Multivariate analysis
Tutorial on multivariate analysis, which is the second part of EDA. Explained using the in-time problem with reusable R code.
Web scraping in R
Explaining the class size paradox by web scraping placements data from the Amrita University website.
Chi Square test of independence
Analyzing the safety of different types of vehicles in Bangalore using a chi-square test of independence.
Chi-square goodness of fit test
Implementing the chi-square goodness of fit test in R. Testing whether a sample follows a normal or an exponential distribution.
One Sample Location test
z-test and t-test for one-sample location tests.
Tutorial on multicollinearity, which is the third part of EDA. Plot of the correlation matrix and network for the in-time problem with reusable code.
Recommendation Systems
Recommendation systems using association mining.