Explores data from Gallup and the World Happiness Report to determine the correlation between two variables.
This curve is built into every successful MMO game for good reason.
This paper uses data from The Happiness Report, which collects metrics from each country annually and produces a final happiness score for the country's citizens. The machine learning solution reverse-engineers the scoring algorithm used in The Happiness Report.
Hypothesis testing of three general approaches to picking brackets.
In this part, I focus on getting the data into the optimal format for further exploratory analysis. I am using the 130,000-record file because it has a few more columns than the 150,000-record version. The end product has 115,000 records with no NAs.
This is about creating mock data. Popular names and a city, state, postal code lookup table help to create simulated Argentine customers that will look plausible to a citizen.
In this paper, I use a publicly available dataset of 6.3 million European credit card transactions to determine the best model fit among Recursive Partitioning, Random Forest, C5.0, and Support Vector Machines, based on accuracy and the prevalence of false positives.
Predict the next word of a short phrase using US news, blogs, and tweets supplied by SwiftKey.
In this analysis I look at a dataset of US news, blogs, and tweets to examine the most common words, both as a whole and per source. This analysis includes charts for visualizing the data and comparing one set to another.
I was inspired by a blog post from Julia Silge to explore tidycensus. I have an interest in demographics and economics, so I wanted to see if it would make my research easier. Yes, it does.
Executive Summary: The dplyr package is one of my workhorses when manipulating data frames in R. The verb-based approach fits comfortably with my SQL background. It is part of Hadley Wickham’s tidyverse, a toolbox with a common grammar for almost anything a data scientist would need. This paper walks exclusively through the functionality and datasets introduced in the new version of dplyr. At the end we will look at dplyr's implementation of rlang functionality, which provides better column references for functions and apps.
Executive Summary: For Canadians looking at job opportunities, it is nice to have an idea of how the job market will perform in the future for their chosen occupation or industry. Many countries have open datasets that offer this kind of data. In these exercises provided by Lauro Silva, we will use R to analyze Canadian job prospects through 2024.
This is derived from the San Francisco, CA, USA public employee dataset that includes income details of every person working for the city from 2011-2014.
This is the documentation for a Shiny application that uses a dataset of the pay distribution of all San Francisco public employees. The dataset was filtered to include only people working in the Police, Nurse, and Fire professions. The app allows you to select any combination of the four years, the type of pay, and the income range to display. The output is a violin plot of the pay distribution in each profession. Enjoy.
This deck shows a map of all reported US hospitals, with each marker colored to indicate the hospital's overall rating.
This geo-data analysis uses a dataset of all of the Twitter users who tweeted "Good Morning" on either Dec 7 or 8, 2016, and plots their locations on a map. The data was acquired from: https://www.kaggle.com/tentotheminus9/good-morning-tweets
In this analysis of dumbbell lifting form, I explore four different models to determine how best to predict the classe variable given 150+ variables.
This example builds on the dice-throwing function published at http://rpubs.com/Lionel/11497. I mainly changed the variable names for clarity and changed the plot type to a bar chart.
Do you know a Kaitlyn? Of course you do. Find out when this baby name experienced its growth and why it's headed to the scrapheap of history. Uses the US Baby Names database.
I got my hands on the database of US Baby Names from the Kaggle website. It covers babies born in the US between 1880 and 2014. Initially I was interested to see how the popularity of my siblings' and parents' names changed over this period. I probably shouldn't have been surprised, but I was, that the popularity of each name peaked right around that person's birth year. For instance, Ryan and Erin saw their names' greatest popularity in the years around their birth years, 1970 and 1971, respectively. My name, Kier, is an anomaly: there is a blip of popularity around my birth year, 1968, and then, as we Kiers reached adulthood, the popularity of the name continued to grow. A reasonable inference is that as the greatness of the late-1960s Kiers came to be recognized in adulthood, more women were inspired to name their babies after them.
This analysis looks at damage to health and property caused by storms since 1950.