RPubs

by RStudio

Shaahin

Shahin Ashkiani

Recently Published

Visualization of Cross-Efficiency Matrix as a Network

Part of my working paper on the visualization of data envelopment analysis problems. This document records my attempts at using three different packages for network visualization in R.

about 6 years ago

Extra Baggage? A Prediction Modelling on Airline Data

Using logistic regression (Lasso), Random Forest, and Gradient Boosting, I predicted whether or not a customer would purchase extra baggage. The models are compared on average AUC from 5-fold cross-validation repeated 5 times.

over 7 years ago

Analysis of a Mobile Game Dataset

Mobile gaming is a growing market worth roughly $200 billion. Data scientists are needed to work on game data and extract valuable insights and actionable suggestions from it. In this report, I analyse a set of three real datasets from an anonymous game.

almost 8 years ago

Time-Series Analysis of Online Ads

A comprehensive look at a dataset of 100,000 records covering 71 companies, their ad types, and the impressions and costs of the ads. EDA + time-series analysis.

almost 8 years ago

Class Imbalance Problem - Part3

In this part, a new approach is used to tackle the problem. First, the training dataset is clustered using PAM (Partitioning Around Medoids), then a logistic regression model is fitted in each cluster. On the test data, this new approach shows a large advantage over the previous methods.

almost 8 years ago

Class Imbalance Problem - Part 2

In this part of the study, I try to improve classification by fitting logit models to each segment of the feature space. The segmentation is done using a decision tree. The results are poor.

almost 8 years ago

Class Imbalance Problems - Part I

Class imbalance problems are classification problems in which the classes of the response variable are not equally frequent. In such situations one class overshadows the other, which makes it difficult for classifiers such as logistic regression to detect observations of the dominated class. This study has two parts, and this report is part one. In this report, undersampling and ROSE are compared with logistic regression.

almost 8 years ago

A/B Testing - Udacity Course Final Project

A/B testing is a very common form of experimental design and analysis in the digital world. The method is used to check the effect of changes before they are implemented, and it is vital in the optimisation of websites, online games, etc.

almost 8 years ago

Bi-gram, tri-gram and word network analysis

The third part of a text analysis of medical forum posts related to anxiety disorder. In this analysis, I use bi-gram and tri-gram analyses. In addition, bi-gram networks are presented to give a more holistic view of the text.

about 8 years ago

Sentiment Analysis of a Medical Forum

Sentiment analysis of a medical forum about anxiety, using the NRC and AFINN lexicons.

about 8 years ago

Sentiment Analysis of tweets about Klopp

Jurgen Klopp is one of my favourite managers, probably my favourite. However, he has not had good results with Liverpool FC recently. This study is a text analysis, including word frequency and sentiment analysis, of 1000 tweets about Klopp sent from the city of Liverpool.

about 8 years ago

How To Cope With Anxiety? : A Text Analysis Study

The data comes from a medical forum and was scraped using R. About 500 posts describing members' experiences of dealing with anxiety are analysed using word frequency. The word clouds reveal some interesting points.

about 8 years ago

Visual Exploration of CO2 Emission of Copenhagen

The Denmark Open Data website has various datasets, including a small dataset about CO2 emissions in Copenhagen. The visualisations in the original spreadsheet were very limited, so I decided to explore the data further visually, using PCA bi-plots and time-series. The new plots revealed some hidden insights.

about 8 years ago

Exploration of Local/Global Average Annual Temperature

This report compares Barcelona's annual average temperature with the world average. It also links to a Shiny applet that generalises the report and can be used to explore various cities and different time periods.

about 8 years ago

SciMagViz: Interactive visualization of a SciMago Journal dataset

Scientometrics has interested me since 2011, when I was first introduced to it and did my MSc dissertation on citation analysis. This report is an introduction to a Shiny applet that I recently developed out of personal curiosity about the performance of open access journals. At its current stage the applet has filtering features for the data and relatively comprehensive visualisation features based on bi-plots and MDS. The GitHub repo and the applet address are available inside the report.

about 8 years ago

Rare Event dataset: logistic regression, Firth's logit and downsampling

I was happy that the logit model had 85% accuracy! However, accuracy in isolation means almost nothing! The truth about the logit model is revealed by checking the TPR (true positive rate)... When the proportion of successes in the dataset is very low, we have the rare event phenomenon. But what to do about it? This report investigates the problem and then examines some remedies.

about 8 years ago
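
The point made in the rare-event report, that accuracy alone can hide a useless classifier, can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the labels are made up, the 15% event rate is chosen only so that the accuracy matches the 85% quoted above, and the helper name `accuracy_and_tpr` is an assumption of this example, not code from the report (which was written in R).

```python
def accuracy_and_tpr(y_true, y_pred):
    """Return (accuracy, true positive rate) for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # events correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-events correctly passed
    positives = sum(y_true)                             # number of actual events
    accuracy = (tp + tn) / len(y_true)
    tpr = tp / positives if positives else float("nan")
    return accuracy, tpr


# Hypothetical data (not from the report): 15 events among 100 cases.
y_true = [1] * 15 + [0] * 85
# A degenerate "model" that always predicts the majority class.
y_pred = [0] * 100

accuracy, tpr = accuracy_and_tpr(y_true, y_pred)
print(f"accuracy = {accuracy:.2f}, TPR = {tpr:.2f}")
```

In this made-up case the classifier never detects a single event, yet it scores 85% accuracy, which is why a metric such as TPR is worth checking alongside accuracy when one class is rare.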

Exploration of DEMO's Alumni data using Alluvial Diagram

DEMO (Doctorate in Economics, Management and Organisation) is one of the PhD programs at the Business Department of the Autonomous University of Barcelona (UAB). This report is a concise visual investigation of DEMO's alumni data, focusing on their placements after graduation.

over 8 years ago

Predictive Analytics on Framingham Dataset

A study on a dataset of 4240 cases from the Framingham Heart Study centre, in order to build predictive models using logistic regression and Random Forest. The evaluation is done using 5-fold cross-validation.

over 8 years ago

Data Visualisation Using Multidimensional Scaling

This publication is the first part of a series on Multidimensional Scaling and Unfolding that I am going to publish on RPubs. It is a practical introduction to the SMACOF package and to Multidimensional Scaling, a dimension reduction and data visualisation technique. As I explain in the text, it is intended to fill the gap between the overly short reports on MDS available on the web and comprehensive textbooks such as Borg and Groenen (2005). I did my best to make this introduction concise, incisive and accurate.

almost 9 years ago

R-Movies: Rise and Fall

The ratio of R-rated movies to all movies produced in the same year is calculated for each decade from the 1970s to the 2010s, based on a small dataset of 650 titles. The significance of the differences between consecutive proportions is evaluated. This report is one of the hobby mini-projects that I do for fun and to practise statistics. Any critical appraisal of technical or non-technical aspects is welcome.

almost 9 years ago
