 # juanhklopper

## Recently Published

##### Discrete time series modeling - Modeling with R series
In this publication we take a look at modeling in discrete time steps. It is part of a series showing modeling using R.

##### Statistics using the tidyverse - Modeling with R series
In this publication we take a look at conducting basic statistical analysis using the R language. It is part of a series showing modeling using R.

This tutorial gives an intuitive view of gradient descent (finding the minimum of a function) by way of the simple case of a parabola in one variable.

##### Introduction to R and RStudio
The first lecture in my new course on medical statistics for residents (specialists in training) in the Health Sciences Faculty of the University of Cape Town. Lectures are held on Tuesday evenings and repeated on Thursday evenings.

##### Forcats library for categorical variables
This document looks at managing categorical variables in a dataset. In base R, these are represented as factors. The tidyverse, which creates tibbles instead of data frames, reads such columns in as character vectors by default. The forcats library lets us make use of the advantages of factors in the tidyverse.

##### Box-and-whisker plots
In this tutorial we take a look at the ubiquitous box-and-whisker plot. It is ideal for visualizing the spread of a numerical variable by way of its quartile values. It can also display statistical outliers, which can be of great importance.

##### Adding color to Plotly plots
In this tutorial we take a look at all the color options available in Plotly for R. You can indeed color your plots and charts to your heart's content.

##### Control color in histograms using Plotly for R
In this tutorial we add some control over the color of our histograms. It is an extension of a previous tutorial on histograms.

##### Exact test of goodness of fit
In this tutorial we take a look at the exact test of goodness of fit for categorical variables with two categories (binomial data). This hypothesis test compares the observed counts found during data capture against the expected counts.

##### Multivariate comparison of the means of two groups
This tutorial explains the nature and use of Hotelling's T-squared test to compare the means of several numerical variables between two groups. Along the way we also consider some of the assumptions for the use of this test and look at the packages and functions it requires.

##### Example of a convolutional neural network
This document contains a first look at an example of a convolutional neural network. It uses one of the built-in Keras image datasets and shows the use of convolution layers, max pooling layers, and a flatten layer.

##### Introduction to convolutional neural networks
Convolutional neural networks (CNNs) are ideal for image classification. This post explains the concepts required to construct a CNN, including the convolution operation, pooling, stride length, and padding.

##### Deep neural networks for regression problems
Regression problems have a numerical target variable and require specific loss functions and output layer designs.

##### Improvement techniques in neural network training
Researchers have made many mathematical refinements to the original concept of a layered perceptron model. This chapter discusses some of these improvements and aims to provide an overview of the topic, which will be useful when implementing techniques such as RMSprop and batch normalization in code.

##### Implementing regularization and dropout in Keras
This chapter implements regularization and dropout to help reduce overfitting (high variance). The IMDB dataset is used as a prime example of high variance, especially when large networks are used.

##### Dropout regularization
Dropout is a regularization technique that can be implemented when a network overfits (i.e. when there is high variance). During training, it randomly sets the values of some nodes in the network to zero, leading to a simpler model and hence a smaller hypothesis space.

##### Regularization
This document describes the use of regularization in deep neural networks. Regularization is introduced as a measure of complexity and a constraint on the hypothesis space of a network. It constrains the hypothesis space by favoring simpler networks that generalize better, thereby reducing high variance.

##### Poor performance of a deep learning model
This post discusses the issues around creating proper training and test sets. It also covers bias and variance in a model, showing how to recognize them and what steps to take to correct them.

##### Example of a deep neural network using Keras
This chapter shows the actual code to construct a neural network using Keras in R. It introduces the concept of splitting the data into training and test sets, which allows us both to train the neural network and to test its accuracy.

##### Logistic regression as a single layer network
The last puzzle piece required to understand the fundamentals of deep neural networks is introduced in this post. It considers expressing the predicted variable as a probability so that it can be used in classification problems.

##### Multiple linear regression as a shallow network
This chapter describes a multiple linear regression model as a neural network with a single hidden layer. This is done to build familiarity with the terms and processes of deep learning.

##### Linear regression as a simple learning network
An example of linear regression serves to illustrate the basic concepts of deep learning through the explanation of terms such as cost functions, backpropagation, global minima, and gradient descent.

##### Regression as a first step in understanding deep learning
This post explains the concepts of a model and an error through the use of linear regression. These concepts will play an important role in creating deep neural networks.

##### Introducing Keras for deep learning
This post introduces Keras in R, a deep learning framework that uses Google's TensorFlow as its backend. The example is a multi-class classification problem from the University of California, Irvine (UCI) Machine Learning Repository. The dataset has been converted to a .csv file and is available in my GitHub repository.

##### Biserial correlation in R
Biserial and point-biserial correlations allow for the calculation of a correlation coefficient when one of the variables is dichotomous.

##### Testing assumptions for the use of parametric tests
In this post, with R code snippets, I discuss some of the assumptions that must be met before parametric tests can be used.