RPubs

by RStudio

mammykins

Matthew Gregory

Recently Published

Classifier performance quantification

When we build statistical models from data using Machine Learning tools how do we go about assessing the models performance? We introduce some commonly used packages and functions therein which help us address this problem. We look at a decision tree classifier we built in a previous post to predict the outcome of a student's end of year exam result from Portuguese Maths students and assess its accuracy and other characteristics.

about 8 years ago

Merging Government data

We simulate some student data from the national pupil database and merge it with demographic and social deprivation data matched by Lower Super Output Area (LSOA).

about 8 years ago

lookup_tables_npd

We elucidate base tools for the problem of looking up translations using a lookup table in R. Thus we can recode or translate or mutate variables as required.

about 8 years ago

Regression trees for wine quality prediction

To develop a red wine rating model, we use data donated to the UCI Machine Learning Data Repository by Cortez et al. We are interested in predicting the quality of a red wine taste based on its quantified chemical characteristics. We fit a regression tree model to training data then evaluate with a test data set.

about 8 years ago

Forest fire prediction with support vector machines

This post is based on a paper by Cortez & Morais (2007). Forest fires are a major environmental issue, creating economical and ecological damage while endangering human lives. Fast detection is a key element for controlling such phenomenon. To achieve this, one alternative is to use automatic tools based on local sensors, such as microclimate and weather data provided by meteorological stations. We demonstrate the workflow in R for implementing a svm model using available functions.

about 8 years ago

Clustering analysis with K-means

To assess the patterning of the sex ratios and whether my intuition that there were three distinct groups or clusters in the data following RNAi treatment of female Tribolium castaneum beetles, K-means clustering was used to objectively (it is reproducible given the random seed) partition the data into K distinct, non-overlapping clusters. Essentially we wished to identify sub-groups within the data, with the three groups representing the non-fertile crosses, crosses yielding progeny with a 1:1 sex ration and crosses yielding all males. Code pertaining to k-means in R and how best to display the results are described.

about 8 years ago

Logistic Regression

Fit a logistic regression model in order to predict student Maths exam pass at the end of the year based on pertinent variables introduced in previous posts. Data derived from Cortez et al., publicly available on the UCI machine learning site (https://archive.ics.uci.edu/ml/datasets/Student+Performance).

about 8 years ago

cem_vig_matching_methods

Observational data tend to be technically and logistically easier to collect than randomised experimental data, hence tends to be more readily available. However, it comes with issues such as the treatment assignment mechanism which is likely to be non-random and unknown. This results in the distribution of the covariates (X) in the treatment groups being dissimilar. Thus we must try to match the treatment and control groups across variables to mitigate bias. We demonstrate that here with the cem package in R.

about 8 years ago

synthpop package in R with student attainment data

Synthesising Sensitive Student dataset to test machine learning tools for prediction using the synthpop package in R with Cortez (2008) secondary school student mathematics score data.

about 8 years ago

Neural Networks and student attainment prediction

An introduction to using neural networks to predict a continuous response variable output; the final year exam score in Maths, at a couple of Secondary schools in Portugal.

about 8 years ago

Decision trees for prediction in education

Using the machine learning decision tree algorithm to predict exam performance of students in secondary education. The decision tree is human interpretable.

about 8 years ago

Baccarat

Elucidates betting strategy for Baccarat and the long term loss one can expect under standard Casino rules.

about 8 years ago

Diamonds - linear regression

We are interested in predicting the price of a diamond based on 9 predictors. We want to know if we are getting ripped off for the price we pay for a diamond compared to historical prices.

about 8 years ago

Dota 2 - the mechanics of effective hp

How to calculate how many effective hit points one has to resist physical damage. Includes elucidation of several core functions that make up the mathematics of the Dota 2 universe.

over 8 years ago

Dota 2 - item choice to optimise ehp

Bang for your buck! How much effective hit points is an item purchase giving you?

over 8 years ago

Sign In

mammykins

Matthew Gregory

Recently Published