Recently Published
Comparison of models for credit risk purposes - logistic regression vs random forest
In this research I compared the predictive power of Logistic Regression and Random Forest in the context of a Credit Scorecard. First, I estimated a GLM in accordance with best practices; then I trained a Random Forest, using cross-validation to optimise its hyperparameters. Random Forest seems to outperform Logistic Regression, but the difference is small. For building a Credit Scorecard, GLMs are still preferable because they show clearly how each trait of a client contributes to their Credit Score.
Evolution of language in literature throughout ages
The goal of our project is to compare how the English language evolved over four consecutive centuries, from the 17th to the 20th. To do this, we collected books from each century from Project Gutenberg.
Using Association Rules for transactional data
Using Association Rules, I mine interesting relationships between items in transactional data. I also create a simple recommender system in R Shiny.
Dimension reduction in Analysis of Human Interests
The main goal of this research was to examine whether dimension reduction can describe human interests with a smaller number of latent concepts. In the analysis I used Multidimensional Scaling with k-means and compared the results to Hierarchical Clustering.
Extreme Value Theory
A short introduction to Extreme Value Theory, with a brief literature review.
Using K-means and PAM clustering for Customer Segmentation
In this article, I use data mining techniques such as K-means and PAM to divide customers into groups with different characteristics. The data come from a small online shop.