RPubs

by RStudio

timothyfraser

Timothy Fraser

Recently Published

Making Adjacency Matrices in R: The Pokemon Matrix

All social network analysis tutorials need good sample data, so why go with a boring one when you could catch 'em all? Here's a short tutorial for producing a Pokemon Damage Adjacency Matrix, for making a weighted, directed unipartite network.

almost 3 years ago

Hurricane Sandy Collaboration

Description of steps for collaborating on our study of disaster recovery committees after Hurricane Sandy.

almost 3 years ago

Workshop: Marginal Effects for Logged Outcomes

Quick tutorial on using the margins package with gamma models to get around the tricky back-transformation problems that occur when using logged outcome variables.

about 3 years ago

Social Capital Indices for Census Tracts, Zipcodes, & County Subdivisions

This RMarkdown document identifies each step necessary for creating a census-tract level social capital index. Download our indices at: https://doi.org/10.7910/DVN/OSVCRC

about 3 years ago

Social Connectedness in Boston

about 3 years ago

Teaching Portfolio

Details teaching statement, diversity statement, sample courses, research with undergrads, and sample syllabi

over 3 years ago

Using Multiple Imputation in R

over 3 years ago

Working with OSMData

A brief tutorial, still in the works, for getting hyperlocal neighborhood boundaries from OpenStreetMap

over 3 years ago

Research Team: BlueBikes Project Guide

over 3 years ago

Research Team: Social Infrastructure Project Guide

over 3 years ago

Mapping Urban Policy: Social Infrastructure in Worcester, MA

over 3 years ago

Reviewing Modeling and Visualization

Welcome to Workshop 14: Final Review! In this workshop, we will review all key workshop concepts relevant to modeling, simulation, and geospatial visualization, through several review problems. Answers are posted below.

over 3 years ago

Lab: Divides in Social Infrastructure in Boston

This lab investigates divisions in social infrastructure in neighborhoods in the center of Boston. Social infrastructure refers to the physical spaces in our communities that build social ties between residents. These include community spaces, like libraries or community centers, places of worship, like mosques, synagogues, and churches, social businesses, like cafes, barbershops, and nail salons, and parks, like green belts, squares, and fountains. What if trends in social infrastructure differ more when we break them down by the type of social infrastructure? In this lab, you will use mapping in the sf package to visualize and tabulate your data, and then unleash your tidyverse and lm toolkit to analyze trends in one type of social infrastructure.

over 3 years ago

Workshop: Mapping Boston Social Infrastructure

This workshop introduces basic mapping visualization and spatial joins by examining why some Boston neighborhoods host more social infrastructure than others. "Social infrastructure" refers to the physical spaces in our communities that build social ties between residents. These include community spaces, like libraries or community centers, places of worship, like mosques, synagogues, and churches, social businesses, like cafes, barbershops, and nail salons, and parks, like green belts, squares, and fountains.

over 3 years ago

Lab: Boston Vaccination Rates over Time

This lab examines vaccination rates in Boston zipcodes, using 448 weekly records of Boston’s 29 zipcodes over 16 weeks, from July 20 to November 2, 2021. Peer effects, where you are more likely to get vaccinated if your friends and family nearby have been vaccinated, have been documented in past studies of vaccination campaigns. However, might other key variables, like time, partisanship, and identity groups, shape the strength of these peer effects on our effort to stem the tide of the COVID-19 pandemic? In this lab, we will break our dataset into smaller groups of zipcode-weeks to examine how these key variables shape vaccination rates. You will practice using ggcorr(), ggplot(), and Zelig to investigate your model results.

over 3 years ago

Workshop: Using GIS to find the Nearest Polling Place in Boston

How close is your nearest polling place? Questions like these are vital to the democratic process and to enabling voter turnout, but are perniciously difficult to answer! This tutorial uses the case of polling places in Boston precincts to demonstrate how to identify your distance from your nearest polling place, focusing on the sf and tidyverse packages in R.

over 3 years ago

Workshop: Testing Regression Assumptions when Modeling Boston Vaccination Rates

This workshop examines a dataset of 448 weekly records of Boston’s 29 zipcodes over 16 weeks, from July 20 to November 2, 2021. This data documents what percentage of residents got vaccinated in the past week. Past studies suggest peer effects - whether friends and neighbors get vaccinated - can boost vaccination rates, but what about during COVID, in Boston? We are going to investigate to what degree we can predict changing local vaccination rates based on peer effects - whether people tend to go get their first shot if more people from their neighborhood got fully vaccinated 2 weeks prior. We will learn to use the GGally and texreg packages to investigate collinearity, heteroskedasticity, and linear trends in our data.

over 3 years ago

Workshop: Using Time-Lagged Variables to Model Solar Farm Adoption in Japan

Some longitudinal studies use time-lagged variables, but what do we mean by a time-lagged variable? Lagging refers to when we investigate the effect of a past outcome (e.g. COVID-19 cases last week) on a future outcome (e.g. COVID-19 cases this week). This workshop introduces time-lagged variables to modeling in R, applying tools from the tidyverse, lmtest, and Zelig packages to a dataset of Japanese solar farm adoption rates.

over 3 years ago

Workshop: Using Statistical Simulation to Predict Food Deserts

Social scientists, policy analysts, and coders often have vitally important findings to convey to key decision-makers. Instead of telling policymakers about our beta-coefficients, what if we could show them what our model predicts instead? This tutorial introduces statistical simulation using the example of the Food Environment Index from the University of Wisconsin's County Health Rankings, which measures access to healthy foods in each county. This workshop highlights how racial and ethnic minority groups face systemic barriers to health, like "food deserts." Learn to use the Zelig package to simulate your model trends!

over 3 years ago

Lab: Visualizing Economic Effects of Disasters

This lab examines 96 disaster-affected Japanese municipalities over 7 years, from 2011 to 2017. You will learn to make dot-and-whisker plots of regression coefficients using ggplot in R, by testing how disaster damage affected income per capita in the years following Japan's 2011 earthquake and tsunami.

over 3 years ago

Workshop: Testing Effects of Disaster on Social Capital with Multiple Regression

Why do some communities see stronger social capital than others? Recent studies suggest that after disasters, residents' social capital actually increases, because people recognize the value of friends and family as they work to recover and rebuild. This workshop uses regression analysis to test this hypothesis on 151 Japanese municipalities over 7 years, from 2011 to 2017, totaling 1057 city-year observations. This includes 85 coastal municipalities hit by the 2011 tsunami and 66 municipalities as similar as possible, just next door, that were not hit. You'll learn to use multiple OLS regression and the texreg package to summarize your results.

over 3 years ago

Lab: Modeling Carbon Footprints in Japan

This lab investigates trends in carbon emissions in Japanese municipalities, from 2005 to 2017. Japan hosted the adoption of the landmark Kyoto Protocol in 1997, which started international commitments to reduce greenhouse gas emissions. How much have Japanese cities reduced their carbon footprint, and which ones have succeeded the most? In this lab, we hypothesize that each passing year has led to a statistically significant reduction in cities’ carbon footprints. Let’s test that hypothesis using regression models and lines of best fit!

over 3 years ago

Regression and the Line of Best Fit

Today, we're going to investigate a popular dataset on commerce. The ggplot2 package's diamonds dataset contains 53,940 diamond sales gathered from the Loose Diamonds Search Engine in 2017. We're going to examine a random sample of 1000 of these diamonds, and learn how to generate scatterplots, correlation tests, and bivariate regression models with this data in R.

over 3 years ago

Choosing a Dataset for Your Project

For your first project, you will need a dataset! But how should you go about choosing your dataset and your variables? I’ve laid out some steps below that will help you get started. Here are a few great datasets on politics, the environment, economics, and health.

over 3 years ago

Lab: Effects of Political Polarization on Health

This lab investigates why some US counties experience worse health outcomes than others. You will conduct several experiments on county data using the t_test() function in the infer package in R. You have been commissioned to test two main hypotheses: 1. Is political polarization linked to worse health outcomes? 2. In communities that are polarized, is greater social capital linked to better health?

over 3 years ago

Reviewing the Tidyverse (Answer Key)

Answer Key for the following workshop: http://rpubs.com/timothyfraser/workshop_6_review This workshop contains 9 quick examples designed to review new coders' knowledge of the tidyverse, descriptive statistics, and inferential statistics. Students can do it themselves on RStudio.Cloud at the following link: https://rstudio.cloud/project/3023360

over 3 years ago

Reviewing the Tidyverse

This workshop contains 9 quick examples designed to review new coders' knowledge of the tidyverse, descriptive statistics, and inferential statistics. Students can do it themselves on RStudio.Cloud at the following link: https://rstudio.cloud/project/3023360

over 3 years ago

Lab: Testing the Fukushima Effect on Japanese Elections

This lab examines why some Japanese municipalities saw higher voter turnout than others in local elections in 2011, testing the effect of several aspects of the 3/11 disaster. It uses an adapted version of Yusaku Horiuchi and Ryota Natori's terrific local elections dataset (https://doi.org/10.7910/DVN/TLC5R4). Students can apply their knowledge of t-tests and the difference of means in R.

over 3 years ago

Workshop 5: Learning Inferential Statistics

This tutorial introduces students to t-tests, chi-squared tests, and correlation tests in R. Through a series of short activities, students will gain foundations in these techniques, ask questions about a novel dataset about Japanese elections, and develop their own statistical tests.

over 3 years ago

Lab: Political Polarization in US Counties

This lab examines changes in political polarization over time, to verify whether American voters have grown more polarized. Users will work with County Elections Outcomes from every presidential election since the year 2000, drawing on data from the MIT Elections Data. Designed for classroom use!

over 3 years ago

Mapping Raster Data in the Tidyverse

This tutorial introduces intermediate coders to working with raster data in the sf/tidyverse ecosystem. I demo several processing steps, including converting geospatial data recorded as rasters to a tidier sf spatial dataframe format, aggregating values on a fishnet grid, and measuring correlations between rasters at different levels.

over 3 years ago

Using Difference-in-Differences Models for Environmental Social Science

Increasingly, political scientists are using data on multiple cities, countries, or individuals, recorded over time. How do we work with this data? In addition to fixed effects or random effects, which show constant effects over time, we might also be interested in change over time. This is the main advantage of Difference-in-Differences, which can be achieved using a simple OLS model with an interaction effect, with several simple steps for scaling up to more powerful models. This tutorial introduces several short lines of R code for running a Difference-in-Differences model and getting useful visuals, which you can do in SPSS, Stata, or R.

over 3 years ago

Simulation in Zelig: How it Works

What is statistical simulation, and how does it work? This tutorial describes how to simulate your model's effects in the Zelig package in R, and then goes under the hood to make our own simulations.

over 3 years ago

Visualizing Predictions in R

Data scientists need compelling ways to visualize our regression model predictions. This tutorial introduces several common options, including functions like predict(), ggeffects(), and zelig() from popular packages.

over 3 years ago

Matching Experiments for Disaster Social Science

This tutorial introduces viewers to matching experiments, using coarsened exact matching in the CEM package in R. Using the case of COVID-19 Test Positivity Rates in US cities, I demonstrate how you can quickly make better causal inferences and visualize your results, all in R!

over 3 years ago

Importing and Tidying Data in R

This workshop in R introduces users to importing data into R, how to pivot data, and creative ways to visualize and compare categories using ggplot. Part of my Quantitative Techniques in R sequence. This version is completed, intended for instructors or viewers. Please see the link below for a version students can work through on their own: http://rpubs.com/timothyfraser/quant_workshop_4

over 3 years ago

15 Ways to Visualize Regression Results

We're often taught in statistics courses to interpret models using beta coefficients, but are there more visually appealing, exciting ways to convey this information for our readers? Here are 15 different ways you could convey results, all using statistical simulation using the Zelig package in R.

over 3 years ago

Importing and Tidying Data in R (Student Version)

This workshop in R introduces users to importing data into R, how to pivot data, and creative ways to visualize and compare categories using ggplot. Part of my Quantitative Techniques in R sequence. This version is intentionally left incomplete, without visuals, for students to work through to discover the output.

over 3 years ago
