RPubs

by RStudio

juanhklopper

Dr Juan H Klopper

Recently Published

Sampling and sampling distributions

In this R Markdown file I consider sampling and sampling distributions, and show how randomisation under the null hypothesis can help us understand how likely a test statistic, such as the difference in means between two groups, is given the sample data.

over 3 years ago
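
The randomisation idea in the "Sampling and sampling distributions" entry can be sketched roughly as follows. This is a minimal illustration in Python, not the author's R code: the two groups are hypothetical made-up values, and `permutation_p_value` is a name introduced here for illustration. Group labels are shuffled many times under the null hypothesis of no difference, and the observed difference in means is compared with the shuffled differences.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for the difference in means by shuffling labels."""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are exchangeable.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_permutations


# Hypothetical sample data, for illustration only.
group_a = [5.1, 4.9, 6.2, 5.8, 6.0, 5.5]
group_b = [4.2, 4.8, 4.5, 5.0, 4.1, 4.6]

observed, p_value = permutation_p_value(group_a, group_b)
print(f"observed difference = {observed:.3f}, estimated p-value = {p_value:.4f}")
```

The returned proportion is an estimate of how often a difference at least as large as the observed one arises from random relabelling alone.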

Odds ratio and confidence intervals

Calculating odds, odds ratios, and the confidence intervals for odds ratios using bootstrapping.

about 4 years ago

Astra Zeneca efficacy SARS-CoV-2 vaccine trial on healthy individuals in SA

Replicating the results of the Astra-Zeneca SARS-CoV-2 vaccine trial in South Africa. This small trial recruited (relatively) young, healthy individuals. The outcome showed poor efficacy and led to the discontinuation of the use of the vaccine in South Africa.

about 4 years ago

Uncertainty in relative risk

Calculating confidence intervals for relative risk

about 4 years ago

JHU coronavirus analysis end 2020

A short notebook on analysing coronavirus data from the JHU dataset. Considering what lies ahead for RSA in 2021.

over 4 years ago

Discrete time series modeling - Modeling with R series

In this publication we take a look at modeling in discrete time steps. It is part of a series showing modeling using R.

about 5 years ago

Logistic regression - Modeling with R series

In this publication we look at logistic regression. It is part of a series showing modeling using R.

about 5 years ago

Nonlinear modeling - Modeling with R series

In this publication we take a look at nonlinear modeling. It is part of a series showing modeling using R.

about 5 years ago

Plotting - Modeling with R series

In this publication we take a look at plotting data. It is part of a series showing modeling using R.

about 5 years ago

Statistics using the tidyverse - Modeling with R series

In this publication we take a look at conducting basic statistical analysis using the R language. It is part of a series showing modeling using R.

about 5 years ago

Introduction to R - Modeling with R series

This publication introduces you to the R language. It is part of a series showing modeling using R.

about 5 years ago

Predicted skin lesion images

How to batch images from your local drive using image data generation.

over 5 years ago

Simple gradient descent

This tutorial gives an intuitive view of gradient descent (finding the minimum of a function) by way of the simple case of a parabola in one variable.

almost 6 years ago
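
As a rough illustration of the "Simple gradient descent" entry, the sketch below minimises a parabola in one variable. It is written in Python rather than R, and the particular parabola, starting point, learning rate, and step count are assumptions chosen for illustration, not values taken from the tutorial.

```python
def gradient_descent(grad, x0, learning_rate=0.1, n_steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(n_steps):
        x -= learning_rate * grad(x)
    return x


def f(x):
    # Hypothetical parabola with its minimum at x = 3.
    return (x - 3.0) ** 2 + 2.0


def df(x):
    # Derivative of f with respect to x.
    return 2.0 * (x - 3.0)


x_min = gradient_descent(df, x0=0.0)
print(f"x_min = {x_min:.6f}, f(x_min) = {f(x_min):.6f}")
```

With this learning rate each step shrinks the distance to the minimum by a constant factor, so the iterate settles close to x = 3.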

Working with data

Importing spreadsheet files and manipulating data with dplyr.

about 6 years ago

Introduction to R and RStudio

The first lecture in my new course on medical statistics for residents (specialists in training) in the Health Sciences Faculty of the University of Cape Town. Lectures are held on a Tuesday evening and repeated on a Thursday evening.

about 6 years ago

Forcats library for categorical variables

This document looks at managing categorical variables in a dataset. In base R, these are seen as factors. Not so in the tidyverse, which creates tibbles instead of dataframes. The forcats library allows us to make use of the advantages of factors in the tidyverse.

about 6 years ago

Box-and-whisker plots

In this tutorial we take a look at the ubiquitous box-and-whisker plot. It is ideal for visualizing the spread in a numerical variable by way of quartile values. It can also display statistical outliers, which can be of great importance.

over 6 years ago

Adding color to Plotly plots

In this tutorial we take a look at all the color options available in Plotly for R. You can indeed color your plots and charts to your heart's content.

over 6 years ago

Control color in histograms using Plotly for R

In this tutorial we add some control over the color of our histograms. It is an extension of a previous tutorial on histograms.

over 6 years ago

G tests for categorical variables

In this tutorial we explore alternative tests for categorical variables.

over 6 years ago

Tests for categorical variables

In this tutorial we take a look at the most common tests to analyze categorical variables.

over 6 years ago

Exact test of goodness of fit

In this tutorial we take a look at the exact test of goodness of fit for binomial categorical variables. This hypothesis test compares the actual counts found during data capture against an expected count.

over 6 years ago

Multivariate comparison of the means of two groups

This tutorial explains the nature and use of Hotelling's T-squared test to compare the means of several numerical variables between two groups. Along the way we also consider some of the assumptions for the use of this test and look at the packages and functions required for the test.

over 6 years ago

Introducing R for biostatistics

Accompanying RPubs file for my YouTube tutorial introducing R for biostatistics.

over 6 years ago

Example of a convolutional neural network

This document contains a first look at an example of a convolutional neural network. It uses one of the built-in Keras image datasets and shows the use of convolutional operation layers, maximum pooling layers, and a flatten layer.

over 6 years ago

Introduction to convolutional neural networks

Convolutional neural networks (CNNs) are ideal for image classification. This post provides an explanation of the concepts required to construct a CNN. These include the convolution operation, pooling, stride length, padding, and more.

over 6 years ago

Cross entropy

Cross entropy describes a method for determining the difference between an actual and a predicted categorical data point value.

over 6 years ago
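
To make the "Cross entropy" entry concrete, here is a small Python sketch (not the author's R code) that assumes the actual value is one-hot encoded and the prediction is a vector of class probabilities. The function name `cross_entropy`, the `eps` guard, and the example vectors are all illustrative assumptions.

```python
import math


def cross_entropy(actual, predicted, eps=1e-12):
    """Cross entropy between a one-hot target and predicted class probabilities."""
    # eps guards against log(0) when a predicted probability is exactly zero.
    return -sum(a * math.log(max(p, eps)) for a, p in zip(actual, predicted))


# Hypothetical three-class example: the true class is the second one.
actual = [0, 1, 0]
good_prediction = [0.1, 0.8, 0.1]
poor_prediction = [0.6, 0.2, 0.2]

loss_good = cross_entropy(actual, good_prediction)
loss_poor = cross_entropy(actual, poor_prediction)
print(f"good prediction loss = {loss_good:.4f}")
print(f"poor prediction loss = {loss_poor:.4f}")
```

With a one-hot target only the probability assigned to the true class contributes, so the loss is smaller when that probability is closer to 1.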

Deep neural networks for regression problems

Regression problems have a numerical target variable and require specific loss functions and output layer design.

over 6 years ago

Improvement techniques in neural network training

Researchers have added many mathematical changes to the original concept of a layered perceptron model. This chapter discusses some of these improvements and aims to provide an overview of the topic. This will be useful when implementing techniques such as RMSprop and batch normalization in code.

over 6 years ago

Implementing regularization and dropout in Keras

This chapter implements regularization and dropout to help reduce overfitting (high variance). The IMDB dataset is used as a prime example of high variance, especially when large networks are used.

over 6 years ago

Dropout regularization

Dropout is a regularization technique that can be implemented when a network overfits (i.e. when high variance exists). It randomly removes the values of some nodes in a network, leading to a simpler model and hence a reduced hypothesis space.

over 6 years ago
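
The random removal of node values described in the "Dropout regularization" entry might look something like the Python sketch below. It is an assumption-laden illustration, not the author's Keras code: it uses the common "inverted dropout" variant, which rescales the surviving values so their expected magnitude is unchanged, and the activations and rate are made-up numbers.

```python
import random


def dropout(activations, rate, rng):
    """Zero each activation with probability `rate`; rescale the survivors."""
    keep_probability = 1.0 - rate  # rate must be below 1
    return [
        a / keep_probability if rng.random() < keep_probability else 0.0
        for a in activations
    ]


rng = random.Random(42)  # fixed seed so the sketch is repeatable
activations = [0.5, 1.2, 0.3, 0.9, 1.5, 0.7]  # hypothetical layer output
dropped = dropout(activations, rate=0.5, rng=rng)
print(dropped)
```

Dropout of this kind is normally applied only during training; at prediction time all node values are kept.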

Regularization

This document describes the use of regularization in deep neural networks. Regularization is introduced as a form of complexity measurement and a constraint on the hypothesis space of a network. It constrains the hypothesis space by creating simpler networks that generalize better and reduce high variance.

over 6 years ago

Poor performance of a deep learning model

This post discusses the issues around creating a proper training and test set. It also discusses the subjects of bias and variance in a model, showing how these are recognized and what steps to take to correct for them.

over 6 years ago

Example of a deep neural network using Keras

This chapter shows the actual code to construct a neural network using Keras in R. It introduces the concept of splitting the data into a training and test set. This allows for both training the neural network and testing its accuracy.

over 6 years ago

R Primer

A very short introduction to R

over 6 years ago

Basic neural network

This post describes the basics of a neural network.

over 6 years ago

Logistic regression as a single layer network

The last puzzle piece required to understand the fundamentals of deep neural networks is introduced in this post. It considers expressing the predicted variable as a probability so that it can be used in classification problems.

over 6 years ago

Multiple linear regression as a shallow network

This chapter describes a multivariable linear regression model as a neural network with a single hidden layer. This is done so as to create a familiarity with the terms and processes of deep learning.

over 6 years ago

Linear regression as a simple learning network

An example of linear regression serves to illustrate the basic concepts of deep learning through the explanation of terms such as cost functions, backpropagation, global minima, and gradient descent.

over 6 years ago

Regression as a first step in understanding deep learning

This post explains the concepts of a model and an error through the use of linear regression. These concepts will play an important role in creating deep neural networks.

over 6 years ago

Introducing Keras for deep learning

This post introduces Keras in R, a deep learning framework that uses Google's TensorFlow as a backend. The example is a multi-class classification problem from the University of California at Irvine database for machine learning. The dataset is converted to a .csv file and is available in my GitHub repository.

over 6 years ago
