RPubs

by RStudio

kieroneil

Kier O Neil

Recently Published

COVID-19: Those leading days are messing up your model

Optimize your forecast with run-length encoding

about 5 years ago

Economics in R: US Distributional National Accounts

A gentle introduction

about 5 years ago

Economics in R: US Distributional National Accounts

7. One Big Loading & Formatting function

about 5 years ago

Economics in R: US Distributional National Accounts

6. Creating Income & Wealth Distributions

about 5 years ago

Economics in R: US Distributional National Accounts

5. Creating proportions

over 5 years ago

Economics in R: US Distributional National Accounts

4. Reconciling the data

over 5 years ago

Economics in R: US Distributional National Accounts

3. Renaming the variables

over 5 years ago

Economics in R: US Distributional National Accounts

2. Getting the Data

over 5 years ago

Bulk Processing Zip Files

The power of functional programming in R

over 5 years ago

World Happiness Report - Prediction with tidymodels

Machine Learning, tidymodels, tidyverse

over 5 years ago

Does stress impact happiness?

Explores data from Gallup and World Happiness Report to determine correlation between two variables.

about 6 years ago

The Clash of Clans Curve - How MMO Strategy Games Get You Hooked

This curve is built into every successful MMO game for good reason.

almost 7 years ago

Machine Learning - Decoding Happiness

This paper uses data from The Happiness Report which collects metrics from each country annually and produces a final score of happiness of the country's citizens. The machine learning solution reverse-engineers the scoring algorithm used in The Happiness Report.

almost 7 years ago

NCAA March Madness - How the Receptionist Wins the Office Brackets

Hypothesis testing of three general approaches to picking brackets.

about 7 years ago

Data Wrangling - Wine Reviews v2

In this part I focus on getting the data into the optimal format for further exploratory analysis. I am using the 130,000-record file because it has a few more columns than the 150,000-record version. The end product has 115,000 records with no NA’s.

about 7 years ago

How to create 50,000 Argentines ... or more

This is about creating mock data. Popular names and a city, state, postal code lookup table help to create simulated Argentine customers that will look plausible to a citizen.

over 7 years ago

Fraud Detection with Machine Learning

In this paper, I use a publicly available dataset of 6.3 million European credit card transactions to determine which of Recursive Partitioning, Random Forest, C5.0, and Support Vector Machines fits best, based on accuracy and the prevalence of false positives.

over 7 years ago

Predictive Data Product - The Word Genie

Predict the next word of a short phrase using US News, Blogs, and Tweets supplied from Swiftkey.

over 7 years ago

Text Mining US Tweets, Blogs & News

In this analysis I look at a dataset of US news, blogs, and tweets to examine the most common words, both as a whole and per source. The analysis includes charts for visualizing the data and comparing one set to another.

over 7 years ago

Mapping Census Data with tidycensus and leaflet

I was inspired by a blog post from Julia Silge to explore tidycensus. I have an interest in demographics and economics, so I wanted to see if it would make my research easier. It does.

almost 8 years ago

What’s New in dplyr 0.7.0?

Executive Summary: The dplyr package is one of my workhorses when manipulating data frames in R. The verb-based approach fits comfortably with my SQL background. It is part of Hadley Wickham’s tidyverse, a toolbox with a common grammar for almost anything a data scientist would need. This paper walks exclusively through the new functionality and datasets available in the new version of dplyr. At the end we will look at dplyr’s implementation of rlang functionality to better provide column references for functions and apps.

almost 8 years ago

Canadian Job Prospects through 2024

Executive Summary: For Canadians looking at job opportunities, it is useful to have an idea of how the job market will perform in the future for their chosen occupation or industry. Many countries have open data sets that offer this kind of data. In these exercises provided by Lauro Silva, we will use R to analyze future Canadian job prospects through 2024.

almost 8 years ago

Officer Shinn, you have some explaining to do

This is derived from the San Francisco, CA, USA public employee dataset that includes income details of every person working for the city from 2011-2014.

about 8 years ago

Data Product - San Francisco Police, Fire, Nurse Pay Distribution

This is the documentation for a Shiny application that uses a dataset of the pay distribution of all San Francisco public employees. It was filtered to include only people working in the Police, Nurse, and Fire professions. The app allows you to select any of the four years to use, the type of pay, and the income range to display. The output is a violin plot of the pay distribution in each profession. Enjoy.

about 8 years ago

Mapping US Hospital Ratings with Plotly

This deck shows a map of all US Hospitals that were reported with each marker colored to indicate their overall rating.

over 8 years ago

The Twitterverse of Good Mornings around the Globe

This geographic analysis uses a dataset of all the Twitter users who tweeted "Good Morning" on either Dec 7 or 8, 2016 and plots their locations on a map. The data was acquired from: https://www.kaggle.com/tentotheminus9/good-morning-tweets

over 8 years ago

Machine Learning Movement-Class based on 150 Variables

In this analysis of dumbbell lifting form, I explore four different models to determine how best to predict the classe given 150+ variables.

over 8 years ago

Throwing Dice - Proving Regression toward the Mean

This example builds on the dice-throwing function published at http://rpubs.com/Lionel/11497. I mainly changed the variable names for clarity and changed the plot type to a bar chart.

over 8 years ago

Exploratory Data Analysis - The Rise and Fall of Kaitlyn

Do you know a Kaitlyn? Of course you do. Find out when this baby name experienced its growth and why it's headed to the scrapheap of history. Uses the US Baby Names database.

over 8 years ago

On the Greatness of People Named Kier born between 1968 and 1972

I got my hands on the database of US Baby Names from the Kaggle website. It covers babies born in the US between 1880 and 2014. Initially I was interested to see how the popularity of the names that my siblings and parents have changed over this period. I probably shouldn't have been surprised, but I was, that the popularity of their names peaked right around their birth years. For instance, Ryan and Erin saw their names' greatest popularity in the years around their birth years, 1970 and 1971, respectively. My name, Kier, shows an anomaly: there is a blip of popularity around my birth year, 1968, and then, as we Kiers reached adulthood, the popularity of the name continued to grow over time. A reasonable conclusion is that as the greatness of the late-1960s Kiers came to be recognized when they reached adulthood, more women were inspired to name their babies after them.

over 8 years ago

US Storm Damage to Health & Property

This analysis looks at damage to health and property caused by storms since 1950.

over 8 years ago
