Recently Published
Analysing COVID-19 (SARS-CoV-2) outbreak in France (Part 1)
This document is the first part of an analysis of the COVID-19 pandemic in France. In this notebook, we look at daily case incidence data at the national and regional levels using several tools, including fitting log-linear models and calculating the effective reproduction number R and lambda, a relative measure of the "force of infection".
Introduction to tidyverse (Part 2): Data manipulation with dplyr
This document provides a basic introduction to data manipulation with the verbs of the dplyr package, part of the tidyverse suite: filter, arrange, select, mutate, sample and summarize. It also covers different ways of joining two datasets, using an approach similar to SQL.
Correlation between discrete (categorical) variables
This markdown introduces measures of the strength of correlation between two categorical variables. It first covers the contingency table, the chi-squared test and related coefficients. Then, because those coefficients are symmetric, it turns to the uncertainty coefficient (Theil's U), which is based on the concept of information entropy.
Correlation between numerical variables
This markdown discusses the three most common correlation coefficients for continuous/numerical variables (Pearson, Kendall and Spearman) and how to build a correlation matrix/heatmap.
Tidyverse in R (Part 1): An Overview
This RMarkdown is an overview of 'tidyverse' in R, one of the most popular and coherent suites of packages for data scientists and analysts.