Two groups of people are playing a guessing game. Each individual has to guess the answer to a single question. Although responses are individual, the final results are tallied by group. Does the smaller group have an advantage or disadvantage?
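A minimal simulation sketch of the idea, with assumed numbers (per-person accuracy `p`, group sizes): the smaller group's mean accuracy is more variable, so it is more likely to be the best (and the worst) performer.

```r
set.seed(42)
p <- 0.6                  # assumed probability each person guesses correctly
n_small <- 5              # assumed small-group size
n_large <- 50             # assumed large-group size
n_sims <- 10000

# Group accuracy = number of correct guesses / group size
small_acc <- rbinom(n_sims, n_small, p) / n_small
large_acc <- rbinom(n_sims, n_large, p) / n_large

# The small group's accuracy has a larger standard deviation,
# so extreme group scores (high or low) come from it more often.
sd(small_acc)
sd(large_acc)
mean(small_acc > large_acc)  # how often the small group "wins"
```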
Let’s say we’re looking at the total deaths by quarter at a large hospital. The question we’re interested in is whether there are meaningful differences across quarters. For instance, anecdotal evidence indicates that deaths often rise in the 4th quarter of the year.
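One simple way to check for quarterly differences is a chi-squared goodness-of-fit test against a uniform distribution of deaths across quarters. The counts below are made up for illustration.

```r
# Hypothetical quarterly death counts (illustrative numbers only)
deaths <- c(Q1 = 210, Q2 = 195, Q3 = 205, Q4 = 240)

# Null hypothesis: deaths are spread evenly across the four quarters
chisq.test(deaths)
```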
`stat_summary()` can be used to return either a single summary point (using a function such as `mean`), or a data frame with `y`, `ymin`, and `ymax` (e.g. when using `Hmisc::mean_cl_normal`).
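A quick sketch of both usages on a built-in dataset (`mtcars` here is just a stand-in):

```r
library(ggplot2)

# Single summary point per group: the mean
p1 <- ggplot(mtcars, aes(factor(cyl), mpg)) +
  stat_summary(fun = mean, geom = "point")

# Point plus interval: Hmisc::mean_cl_normal returns a data frame
# with y, ymin, and ymax, which pointrange knows how to draw
p2 <- ggplot(mtcars, aes(factor(cyl), mpg)) +
  stat_summary(fun.data = Hmisc::mean_cl_normal, geom = "pointrange")
```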
Unlike daily ED visits data, daily census count data is likely to have correlated errors. So it’s not a good idea to use classical normal-based regression for inference or prediction. Instead, let’s use time series methods.
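As a minimal illustration of the problem, here is a simulated census series with AR(1) errors fit with base R's `arima()` (the actual post may use different models; `forecast::auto.arima` is a common alternative):

```r
set.seed(1)
# Simulate a year of daily census counts with autocorrelated errors
census <- 100 + arima.sim(model = list(ar = 0.7), n = 365)

# Fit an AR(1) model; classical OLS inference would understate
# uncertainty here because successive errors are correlated
fit <- arima(census, order = c(1, 0, 0))
coef(fit)                       # AR coefficient should be near 0.7
predict(fit, n.ahead = 7)$pred  # one-week-ahead forecast
```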
At first glance, the probability density function (pdf) of the normal distribution seems like it has a lot going on. At its heart, though, is a fairly simple function: `exp(-x^2)`.
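To make that concrete: standardizing `x` and adding the normalizing constant turns the simple kernel into the full pdf, matching R's `dnorm()`:

```r
# Build the normal pdf from the kernel exp(-z^2 / 2)
my_dnorm <- function(x, mean = 0, sd = 1) {
  z <- (x - mean) / sd
  exp(-z^2 / 2) / (sd * sqrt(2 * pi))  # normalizing constant: 1 / (sd * sqrt(2*pi))
}

# Agrees with the built-in density
all.equal(my_dnorm(0.5, mean = 1, sd = 2), dnorm(0.5, mean = 1, sd = 2))
```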
R has many packages for optimization and linear programming. I'm trying out two of them here on a simple linear programming problem.
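For example, `lpSolve` (one package commonly used for this) solves a small LP in a few lines; the problem below is illustrative, not necessarily the one from the post:

```r
library(lpSolve)

# Maximize 2x + 3y subject to x + y <= 4 and x + 3y <= 6, with x, y >= 0
res <- lp(direction    = "max",
          objective.in = c(2, 3),
          const.mat    = rbind(c(1, 1),
                               c(1, 3)),
          const.dir    = c("<=", "<="),
          const.rhs    = c(4, 6))

res$solution  # optimal (x, y): 3 and 1
res$objval    # optimal objective value: 9
```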
Demonstrating the approach used for long-term ED visits forecasting, using projected population numbers from BC Stats. These projections should be treated as rough estimates, but they do capture heterogeneous behavior across age groups, which is one of the main goals of ED visits projections.
Normal and lognormal distributions are closely related, but the variability of the former is the result of independent additive effects, whereas the variability of the latter is based on independent multiplicative effects. Here we do a simple simulation to explore these two types of effects.
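A sketch of that simulation: summing many small independent effects yields an approximately normal result, while multiplying them yields an approximately lognormal one (since the log of a product is a sum of logs).

```r
set.seed(123)
n_effects <- 50
n_obs <- 10000

# Additive effects: sums of many small independent shocks -> ~ normal
additive <- replicate(n_obs, sum(runif(n_effects, 0.9, 1.1)))

# Multiplicative effects: products of the same shocks -> ~ lognormal
multiplicative <- replicate(n_obs, prod(runif(n_effects, 0.9, 1.1)))

# The multiplicative series is right-skewed, but its log looks normal:
# hist(additive); hist(multiplicative); hist(log(multiplicative))
```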
It's important to track how long patients spend in the emergency department (ED), and to ensure that they are treated and discharged within a reasonable amount of time. At any given moment, some patients have no "end time" for their ED stay, because they are still in the ED. This suggests using methods that can handle censored data when analyzing ED length of stay (LOS). Here, we use a Kaplan-Meier curve.
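A minimal sketch with the `survival` package, using made-up data: patients still in the ED are treated as censored observations (`discharged = 0`).

```r
library(survival)

# Hypothetical ED records: los_hours = time in ED so far;
# discharged = 1 if the stay ended, 0 if still in the ED (censored)
ed <- data.frame(
  los_hours  = c(2.1, 3.5, 1.2, 6.8, 4.0, 5.5, 2.9, 7.2),
  discharged = c(1,   1,   1,   0,   1,   0,   1,   1)
)

km <- survfit(Surv(los_hours, discharged) ~ 1, data = ed)
summary(km)  # estimated P(still in ED beyond t hours)
# plot(km, xlab = "Hours in ED", ylab = "Proportion still in ED")
```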
A simple regression model is reasonably good at explaining some of the variability in daily ED visits using the following predictors: day-of-week, month, and year.
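A sketch of that model on simulated data (the real post uses actual ED visit counts; the variable names here are assumptions):

```r
set.seed(7)
dates <- seq(as.Date("2018-01-01"), as.Date("2019-12-31"), by = "day")
visits <- data.frame(
  dow = factor(weekdays(dates)),
  mon = factor(months(dates)),
  yr  = factor(format(dates, "%Y"))
)
# Simulated daily counts with a day-of-week effect
visits$n <- rpois(nrow(visits), lambda = 100 + 5 * as.integer(visits$dow))

fit <- lm(n ~ dow + mon + yr, data = visits)
summary(fit)$r.squared  # share of variability explained by the predictors
```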
Very often, a time series has no values on many dates. When pulling from databases, this means that there will be no rows associated with those dates. I often need to "fill in" those dates for time series plotting, calculating rates, etc. This function makes it very quick to do that "filling in".
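A minimal base-R version of such a helper (the function in the post may differ; `tidyr::complete()` is a common alternative):

```r
# Fill in missing dates in a daily time series, replacing the
# resulting NAs with a default value (0 here)
fill_dates <- function(df, date_col = "date", fill = 0) {
  all_dates <- data.frame(
    seq(min(df[[date_col]]), max(df[[date_col]]), by = "day")
  )
  names(all_dates) <- date_col
  out <- merge(all_dates, df, by = date_col, all.x = TRUE)
  out[is.na(out)] <- fill
  out
}

x <- data.frame(date = as.Date(c("2024-01-01", "2024-01-04")), n = c(3, 5))
fill_dates(x)  # rows for Jan 2 and Jan 3 added with n = 0
```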
Segmented regression is used to estimate intervention effects, allowing us to find both immediate and long-term changes due to the intervention. It is best used on de-seasonalized data. Since the data here had fewer than 24 time points, most de-seasonalizing methods would not work, and we used Fourier components to fit a seasonal model.
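A sketch of the model structure on simulated monthly data: a level-change term, a slope-change term, and first-order Fourier terms for a 12-period season (the intervention time and effect sizes below are made up).

```r
set.seed(11)
t  <- 1:22      # fewer than 24 monthly time points
t0 <- 12        # assumed intervention time

post       <- as.numeric(t > t0)   # immediate (level) change
time_since <- pmax(0, t - t0)      # gradual (slope) change
s1 <- sin(2 * pi * t / 12)         # first-order Fourier pair
c1 <- cos(2 * pi * t / 12)         # for a 12-month season

# Simulated outcome with a -4 level drop and -0.5 post-intervention slope
y <- 50 - 4 * post - 0.5 * time_since + 3 * s1 + rnorm(22, sd = 1)

fit <- lm(y ~ t + post + time_since + s1 + c1)
coef(fit)[c("post", "time_since")]  # immediate and gradual effects
```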
To me, Twitter is for keeping up with the stats/ML/AI/R/Python world. I use this to find interesting content that I've liked before.
Exploring the "standard" and Wilson CIs for binomial proportion, where the proportion is close to 0.
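With the formulas written out, the contrast is easy to see: near zero, the "standard" (Wald) interval collapses or goes negative, while the Wilson interval stays sensible.

```r
# Wald ("standard") and Wilson intervals for x successes in n trials
binom_cis <- function(x, n, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)
  p <- x / n

  # Wald: p +/- z * sqrt(p(1-p)/n); degenerate when p = 0 or 1
  wald <- p + c(-1, 1) * z * sqrt(p * (1 - p) / n)

  # Wilson: recenters toward 1/2 and widens appropriately
  centre <- (p + z^2 / (2 * n)) / (1 + z^2 / n)
  half   <- z * sqrt(p * (1 - p) / n + z^2 / (4 * n^2)) / (1 + z^2 / n)
  wilson <- centre + c(-1, 1) * half

  list(wald = wald, wilson = wilson)
}

binom_cis(0, 30)  # Wald interval is (0, 0); Wilson is still informative
```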
Bootstrapping can be used to generate sampling distributions for statistics, and to construct confidence intervals. Here I use the `rsample` package to bootstrap difference between means in two groups.
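A sketch of that workflow with `rsample` (the data below are simulated; the post uses its own two-group data):

```r
library(rsample)
library(purrr)

set.seed(2024)
df <- data.frame(
  group = rep(c("a", "b"), each = 50),
  value = c(rnorm(50, mean = 10), rnorm(50, mean = 12))
)

# 1000 bootstrap resamples; compute the difference in means on each
boots <- bootstraps(df, times = 1000)
diffs <- map_dbl(boots$splits, function(s) {
  d <- analysis(s)
  mean(d$value[d$group == "b"]) - mean(d$value[d$group == "a"])
})

# Percentile bootstrap CI for the difference in means
quantile(diffs, c(0.025, 0.975))
```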
This is an example of how I use RMarkdown docs to organize exploratory data analysis, so that I can keep code, graphs, and notes in one place; and to ensure reproducibility.
A basic task in data analysis is to compare different entities using a set of measurements on several variables. When the set of variables includes both numeric and categorical variables, Gower's dissimilarity measure is often used.
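A minimal example with `cluster::daisy()`, which computes Gower dissimilarities for mixed data (the variables below are invented for illustration):

```r
library(cluster)

# Mixed numeric and categorical variables
patients <- data.frame(
  age  = c(34, 58, 61, 45),
  sex  = factor(c("f", "m", "f", "m")),
  unit = factor(c("ED", "ICU", "ED", "ward"))
)

# Gower dissimilarity handles both variable types, scaling each to [0, 1]
d <- daisy(patients, metric = "gower")
as.matrix(d)  # pairwise dissimilarities between patients
```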
In this session, we do some basic data manipulation and visualization.