
ardalby

Andrew Dalby

Recently Published

Highly Pathogenic Avian Influenza in Cattle in the US
This is a series of choropleth plots showing the evolution of the highly pathogenic avian influenza (H5N1) outbreak in cattle in the US between March 2024 and March 2025.
Choropleth of Highly Pathogenic Avian Influenza in the USA
This is a graph showing bird mortality data recorded by the CDC for the 2024-2025 outbreak of highly pathogenic avian influenza in the United States.
Using Linear Models with the Algal Bloom Data
This uses the algal bloom data from Data Mining with R (the DMwR2 package). I have carried out an in-depth analysis, fitting linear models and using exploratory data analysis to direct the investigation. The results suggest that the data will always yield poor predictions of the outcomes because some causative variables are missing.
The Iris Dataset Analysed
I work through a quick machine learning approach to the Anderson/Fisher Iris data using R. First I carry out exploratory data analysis to show that only the petal measurements are needed to distinguish the species, and then I use a decision tree as the classification method.
The Poisson Distribution
This is part of the series based on the textbook Further Mechanics and Probability by L. Bostock and S. Chandler.
Summary Statistics of Discrete Probability Distributions
This short section defines the expectation and variance of discrete probability distributions and discusses how functions of random variables affect the expectation.
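The definitions can be checked with a minimal Python sketch (the original pages use R). The fair six-sided die is a hypothetical example, and the helper names `expectation` and `variance` are illustrative. The sketch uses exact fractions to show E[X], Var(X) = E[X^2] - (E[X])^2, and the rules for a linear function Y = aX + b.

```python
from fractions import Fraction

def expectation(dist, g=lambda x: x):
    """E[g(X)]: sum of g(x) * P(X = x) over the support of X."""
    return sum(g(x) * p for x, p in dist.items())

def variance(dist):
    """Var(X) = E[X^2] - (E[X])^2."""
    mu = expectation(dist)
    return expectation(dist, lambda x: x * x) - mu * mu

# Hypothetical example: a fair six-sided die
die = {x: Fraction(1, 6) for x in range(1, 7)}
mu = expectation(die)
var = variance(die)

# Linear function Y = aX + b: E[Y] = aE[X] + b and Var(Y) = a^2 Var(X)
a, b = 3, 2
y = {a * x + b: p for x, p in die.items()}

print(mu, var)                       # 7/2 35/12
print(expectation(y), a * mu + b)    # 25/2 25/2
print(variance(y), a * a * var)      # 105/4 105/4
# Non-linear functions do not commute with E: E[X^2] = 91/6, but (E[X])^2 = 49/4
```

The constant b shifts the expectation but leaves the variance unchanged.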
Discrete Random Variables: The Geometric Distribution
This covers the geometric distribution and its relationship to the binomial and Poisson distributions.
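One way to see the link to the binomial distribution is sketched below in Python (the original pages use R). The success probability p = 0.25 is hypothetical. The first success falls on trial r exactly when the first r - 1 binomial trials all fail and trial r succeeds. The sketch also checks numerically that the mean is 1/p.

```python
from math import comb, isclose

def binomial_pmf(k, n, p):
    """P(k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def geometric_pmf(r, p):
    """P(first success on trial r) = (1 - p)^(r - 1) * p, for r = 1, 2, ..."""
    return (1 - p) ** (r - 1) * p

p = 0.25  # hypothetical success probability
for r in range(1, 6):
    # zero successes in the first r - 1 trials, then a success on trial r
    via_binomial = binomial_pmf(0, r - 1, p) * p
    assert isclose(geometric_pmf(r, p), via_binomial)

# Mean is 1/p; the infinite sum is truncated where the terms are negligible
mean = sum(r * geometric_pmf(r, p) for r in range(1, 2000))
print(round(mean, 6))  # 4.0
```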
Discrete Random Variables: The Uniform Distribution
The third part of this series looks at the uniform distribution but also starts to develop a more experimental approach and an understanding that samples do not give the same results as the theoretical distributions.
Discrete Random Variables: The Binomial Distribution
The second part of the series focuses on the binomial distribution and how it can be simulated in R. The examples are again biological, covering genetics and sample size calculations.
Discrete Random Variables
This is an introduction to discrete random variables inspired by "Further Mechanics and Probability" by L. Bostock and S. Chandler (Stanley Thornes, 1985). The book provides the theoretical background, and I have used its framework to build a short series of pages about probability distributions, with examples linked to biology and in particular to sequence analysis.
A Second R Session
In this session I show how to import a CSV file, rename the column headings, deal with missing values and calculate summary statistics for the entire dataset, as well as how to use factors to define subgroups. I also carry out an independent-samples t-test comparing the two levels of the factor gender.
A First R Session
This is a simple R session that uses a single-variable dataset, created as a vector within R and then summarised. It also shows how R operations act on vectors and how to create a simple function in R.
Covid Deaths as a Percentage of Total Deaths by Age in England and Wales for 2023
This is an analysis of the ONS data for the first 12 weeks of 2023, up to and including the 24th of March. The aim of the analysis was to explain the anomalous percentage of deaths among children aged between 1 and 14.
Example Linear Regression
This is a linear regression calculated in R, both from the formulae and with the linear model function (lm).
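The formula route can be sketched in Python (the original uses R). The data below are hypothetical. The slope is b = Sxy / Sxx and the intercept is a = ȳ - b·x̄. These are the ordinary least squares estimates, so R's `lm(y ~ x)` should give the same coefficients on the same data.

```python
def least_squares(x, y):
    """Ordinary least squares fit of y = a + b*x using the summary formulae."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx          # slope = Sxy / Sxx
    a = ybar - b * xbar    # the fitted line passes through (xbar, ybar)
    return a, b

x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]
a, b = least_squares(x, y)
print(round(a, 3), round(b, 3))  # 2.2 0.6
```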
Father and Son Heights Regression
This is an example of a major axis regression showing that ordinary least squares significantly underestimates the slope of the relationship.
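The sketch below compares the two slopes in Python (the original uses R). The data are hypothetical, not the father and son heights. The OLS slope of y on x minimises only vertical distances. The major axis slope is the direction of the leading eigenvector of the covariance matrix, which minimises perpendicular distances: b = (Syy - Sxx + √((Syy - Sxx)² + 4Sxy²)) / (2Sxy).

```python
from math import sqrt

def ols_and_major_axis_slopes(x, y):
    """Return (OLS slope of y on x, major axis slope) for paired data.
    Assumes the cross-product term sxy is non-zero."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    ols = sxy / sxx  # minimises vertical distances only
    # Major axis: slope of the leading eigenvector of [[sxx, sxy], [sxy, syy]],
    # i.e. the line minimising perpendicular distances to the points.
    ma = (syy - sxx + sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return ols, ma

x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]
ols, ma = ols_and_major_axis_slopes(x, y)
print(round(ols, 3), round(ma, 3))  # 0.6 0.721
```

The major axis slope is not scale-invariant. It is therefore most natural when both variables share the same units, as heights do.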
Children Height Data
This is a set of data of children's heights for three groups of 10. I have used it to show how to create data files for R, Excel and SPSS. I have also used it to show how you can summarise the data in R and how you can visualise it as well as carry out null hypothesis significance testing.
An A-level Question based on Capture and Recapture proportions
This calculates confidence intervals for a sample proportion so that you can estimate confidence intervals for the population size in a capture and recapture experiment. It illustrates how complex these experiments really are: because the population size is estimated from a proportion, its confidence interval is asymmetric.
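A minimal Python sketch of where the asymmetry comes from follows (the original uses R). It makes several assumptions. The point estimate is the Lincoln-Petersen estimator N = Mn/m. The proportion interval is a simple Wald interval, which the original page may not use. The counts are hypothetical. Because N = M/p is a decreasing, non-linear function of p, a symmetric interval for p maps to a lopsided interval for N.

```python
from statistics import NormalDist

def capture_recapture(marked, second_sample, recaptured, level=0.95):
    """Estimate population size N from a capture-recapture experiment.

    marked        -- animals marked and released in the first sample (M)
    second_sample -- size of the second sample (n)
    recaptured    -- marked animals found in the second sample (m)
    """
    p_hat = recaptured / second_sample          # estimates M / N
    n_hat = marked / p_hat                      # Lincoln-Petersen estimate M*n/m
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for 95%
    se = (p_hat * (1 - p_hat) / second_sample) ** 0.5
    p_low, p_high = p_hat - z * se, p_hat + z * se  # Wald interval (assumption)
    # N = M / p flips the interval: the upper p bound gives the lower N bound.
    # If p_low <= 0 the Wald interval breaks down and no upper bound exists.
    return n_hat, marked / p_high, marked / p_low

n_hat, lower, upper = capture_recapture(marked=50, second_sample=40, recaptured=10)
print(round(n_hat), round(lower), round(upper))  # 200 130 432
```

The estimate of 200 sits about 70 above the lower bound but about 232 below the upper bound.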
Alternative Methods of Calculating the Mean
This describes the geometric and harmonic means as alternatives to the arithmetic mean and shows how to calculate them in R.
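The three means can be computed directly, as in this Python sketch (the original uses R). The data are hypothetical and must be strictly positive. The geometric mean is taken on the log scale to avoid overflow from the product. The harmonic mean is the reciprocal of the mean of the reciprocals.

```python
from math import exp, log

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    """n-th root of the product, computed on the log scale (positive values only)."""
    return exp(sum(log(x) for x in xs) / len(xs))

def harmonic_mean(xs):
    """Reciprocal of the mean of the reciprocals (positive values only)."""
    return len(xs) / sum(1 / x for x in xs)

data = [1, 2, 4]  # hypothetical data
print(round(arithmetic_mean(data), 4),
      round(geometric_mean(data), 4),
      round(harmonic_mean(data), 4))  # 2.3333 2.0 1.7143
```

For positive data the ordering arithmetic ≥ geometric ≥ harmonic always holds, with equality only when all values are equal.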
Pictorial Representation
These examples come from how we used to think about data visualisation when I did my A-levels in the 1980s.
Simple Analysis of Categorical Data
These are examples of the analysis of categorical data, including contingency tables and chi-squared tests, using data from Statistical Methods in Biology, 2nd Edition, N.T.J. Bailey, Hodder and Stoughton, London, 1981.
Paired t-Test of Analgesics
A paired t-test of analgesic effectiveness using data from Statistical Methods in Biology, N.T.J. Bailey, Hodder and Stoughton, 1981.
Multiple Comparison Tests in R - Miller Data
This is the Miller dataset used in Toothaker's book to explore the reliability of multiple comparison tests when the data have a clearly defined pattern in their structure.
Multiple Comparison Tests in R
This is an R file to accompany Larry Toothaker's book Multiple Comparison Procedures (SAGE, Newbury Park, 1993).
Some Examples of the Poisson Distribution
These are examples of the Poisson distribution taken from Statistical Methods in Biology, 2nd Edition, N.T.J. Bailey, Hodder and Stoughton, London, 1981.
Example E
This is Example E from Cox and Snell's Applied Statistics, analysed using a paired t-test.
Quick Guide to Meta Analysis
This is a very quick guide to producing meta-analysis plots in R. It takes you from the CSV file through to generating production-quality plots. It is not a guide to the theory or the concepts.
Normality Tests Confusion Matrix
In the first investigation I looked at the p-value distributions of the normality tests, and these suggested an interesting simulation: randomly pick data from one of the three distributions and test it for normality. In this case I have balanced the data between normal and non-normal distributions (either exponential or gamma).
Normality Tests
I wanted to explore how useful normality tests are. I found some interesting results.
Bias in the Standard Error of the Mean
The standard error of the mean is estimated as the sample standard deviation divided by the square root of the sample size. Using simulations, I examined how this estimator performs as the sample size increases.
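One known source of bias can be shown with a small Python simulation (the original uses R). For normal data the sample standard deviation s underestimates σ on average, since E[s] = c4·σ with c4 < 1. As a result, s/√n tends to underestimate σ/√n in small samples. The sketch assumes standard normal data. The function names, seed and replicate count are illustrative choices, and the output is not the original page's results.

```python
import random
from math import sqrt

def sample_sd(xs):
    """Sample standard deviation with the n - 1 denominator."""
    n = len(xs)
    m = sum(xs) / n
    return sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))

def standard_error(xs):
    """Estimated standard error of the mean: s / sqrt(n)."""
    return sample_sd(xs) / sqrt(len(xs))

def se_ratio(n, reps=5000, sigma=1.0, seed=1):
    """Average estimated SE divided by the true SE, sigma / sqrt(n)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        total += standard_error(xs)
    return (total / reps) / (sigma / sqrt(n))

for n in (2, 5, 10, 50):
    print(n, round(se_ratio(n), 3))  # ratio below 1, approaching 1 as n grows
```

For n = 2 the theoretical ratio is √(2/π) ≈ 0.80, so the simulated values should rise towards 1 as n increases.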
Capture and Release Models
This is the latest version of my capture and release models with an improved work through of the original probability calculations.
De Moivre's Theorem and the Standard Error of the Mean
This document uses simulation to show that the standard error of the mean is proportional to the reciprocal of the square root of the sample size, not the reciprocal of the sample size.
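A comparable simulation can be sketched in Python (the original uses R). It assumes Uniform(0, 1) data, so σ = 1/√12. The sample sizes, seed and replicate count are illustrative. If the standard error scales as 1/√n, quadrupling n should roughly halve the spread of the sample means rather than quarter it.

```python
import random
from math import sqrt

def sd_of_sample_means(n, reps=4000, seed=42):
    """Empirical standard deviation of the means of `reps` samples of size n
    drawn from Uniform(0, 1), whose standard deviation is 1 / sqrt(12)."""
    rng = random.Random(seed)
    means = [sum(rng.random() for _ in range(n)) / n for _ in range(reps)]
    centre = sum(means) / reps
    return sqrt(sum((m - centre) ** 2 for m in means) / (reps - 1))

sigma = 1 / sqrt(12)
for n in (10, 40, 160):
    # simulated spread of the mean next to the theoretical sigma / sqrt(n)
    print(n, round(sd_of_sample_means(n), 4), round(sigma / sqrt(n), 4))

# Quadrupling n should roughly halve the spread (1/sqrt(n)), not quarter it (1/n)
ratio = sd_of_sample_means(10) / sd_of_sample_means(40)
print(round(ratio, 2))
```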
Fitting Non-linear Models
This is an example of non-linear regression in which the data were transformed to fit a logistic model.
Combined Likert Scores as Quantitative Data
This shows how you can combine Likert scores from multiple questions to create quantitative data.
European Parliament Voting in GGPlot
These are some examples of using GGPlot on the data for European Parliament voting.
Statistics A-level Questions
These are some example A-level Statistics questions, solved and documented in R.
Reflective Commentary on an A-level Question
While I was helping my son with an A-level paper, I was also thinking about how to solve the problem from first principles and what I remember of my own A-level from 30 years ago.
Getting Started with RStudio
This is a basic guide to using RStudio which shows the functions of the different panels.
A simple example of using R-exams
This is an example of using R-exams to produce a contingency table and calculate a conditional probability from it.
Analysing Breast Cancer Gene Expression Data from TCGA Using R and Bioconductor
This is an example of gene expression analysis comparing stage I and stage IV breast cancer using Bioconductor and R in RStudio.