
mdancho84

Matt Dancho

Recently Published

Coursera Data Science Capstone Project: Word Prediction Web App
This slide deck is part of the final capstone project for the Coursera Data Science course. It pitches a web application that predicts the next word from a phrase.
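The listing does not describe the app's prediction algorithm. As an illustration only, here is one common approach for n-gram-based next-word prediction, a simple longest-match backoff over raw n-gram counts. It is a sketch under that assumption, not the app's actual implementation. The names `build_model` and `predict_next` are hypothetical. The sketch tries the longest available context (the last two words for trigrams) and backs off to shorter contexts when the longer one was never seen.

```python
from collections import Counter, defaultdict


def build_model(sentences, max_n=3):
    """Map each context tuple (1 to max_n-1 words) to a Counter of following words.

    Hypothetical helper: a simple count-based model, not the app's real code.
    """
    model = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for n in range(2, max_n + 1):
            for i in range(len(tokens) - n + 1):
                context = tuple(tokens[i:i + n - 1])
                model[context][tokens[i + n - 1]] += 1
    return model


def predict_next(model, phrase, max_n=3):
    """Return the most frequent next word, backing off from long to short contexts."""
    tokens = phrase.lower().split()
    # Start with the longest usable context and shrink it one word at a time.
    for size in range(min(max_n - 1, len(tokens)), 0, -1):
        context = tuple(tokens[-size:])
        if context in model:  # membership test does not create defaultdict entries
            return model[context].most_common(1)[0][0]
    return None  # no known context: nothing to predict


if __name__ == "__main__":
    corpus = ["i love data science", "i love data", "i love r", "we love r programming"]
    model = build_model(corpus)
    print(predict_next(model, "I love"))    # trigram context ("i", "love")
    print(predict_next(model, "big data"))  # backs off to bigram context ("data",)
```

A production model would usually smooth or score the counts (for example, by discounting shorter contexts) rather than always preferring the longest match.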
Capstone Project: Milestone Report
The goal of this milestone report is to perform an exploratory analysis using text mining that will eventually lead to a text prediction algorithm and a Shiny application. In this report, three files (en_US.blogs.txt, en_US.news.txt, and en_US.twitter.txt) containing unstructured text are loaded. The data is subset to reduce the time needed for pre-processing and tokenization. Pre-processing cleans the data by removing punctuation, stripping white space, removing stop words and profanity, and stemming the words. Tokenization then turns the text into n-gram word vectors of length one (unigrams), two (bigrams), and three (trigrams). Finally, exploratory analysis identifies the highest-frequency n-grams using both bar plots and word clouds.
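The cleaning and n-gram steps described above can be sketched roughly as follows. This is a minimal illustration, not the report's own code. It assumes tiny, hypothetical stop-word and profanity lists, and it omits stemming because the Python standard library has no stemmer. The helper names `clean`, `ngrams`, and `top_ngrams` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny word lists; a real pipeline would load
# complete stop-word and profanity lists from files.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "in"}
PROFANITY = {"darn"}  # placeholder entry for illustration


def clean(text):
    """Lowercase, remove punctuation/digits, split on whitespace, drop unwanted words.

    Stemming is intentionally omitted in this sketch.
    """
    text = re.sub(r"[^a-z\s]", " ", text.lower())  # keep only letters and whitespace
    return [w for w in text.split() if w not in STOP_WORDS and w not in PROFANITY]


def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(lines, n, k):
    """Count n-grams within each line (never across lines) and return the k most frequent."""
    counts = Counter()
    for line in lines:
        counts.update(ngrams(clean(line), n))
    return counts.most_common(k)


if __name__ == "__main__":
    sample = ["The cat sat on the mat.", "The cat ate the fish!"]
    print(top_ngrams(sample, 1, 3))  # unigrams
    print(top_ngrams(sample, 2, 3))  # bigrams
```

Counting within each line keeps n-grams from spanning unrelated blog posts, news articles, or tweets.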
Advanced R, Chapter 11: Functionals
This is a replication of Hadley Wickham's Advanced R, Chapter 11: Functionals, mostly for my own learning. I skipped much of the extraneous material and focused on the apply and Map functions.
Advanced R, Chapter 10: Functional Programming
This is a replication of Hadley Wickham's Advanced R, Chapter 10: Functional Programming, mostly for my own learning. I have taken a few creative liberties and expanded on some of the concepts. Answers are also provided for most of the exercises.
R4DS Chapter 23: Model Basics
This is an extension of Hadley Wickham's R for Data Science, Chapter 23: Model Basics. The goal is to reproduce and extend the chapter.
R4DS Chapter 24: Model Building
This is an extension of Hadley Wickham's R for Data Science, Chapter 24: Model Building. The goal is to reproduce and extend the chapter.
R4DS Chapter 25: Many Models
This is an extension of Hadley Wickham's R for Data Science, Chapter 25: Many Models. The goal is to reproduce and extend the many models chapter.
Practical Machine Learning: Prediction Assignment
This is the course project for the Practical Machine Learning course that is part of the Coursera Data Science Specialization.
Bike Sales By Geographic Location
This map shows customer sales by geographic location. The map will be posted on the orderSimulatoR blog post at www.mattdancho.com. The bikes data set was simulated with the orderSimulatoR scripts, which can be found at https://github.com/mdancho84/orderSimulatoR.
Effects of Severe Weather Events on Population and Economy
This report reviews the effects of severe weather events on both the population and the economy. It seeks to determine which event types have historically caused the most fatalities (a measure of population impact) and the most property damage (a measure of economic cost). Data was obtained from the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database. The events start in 1950 and end in November 2011. The conclusion is that tornadoes cause the most harm to the population in terms of fatalities, and floods cause the greatest economic cost.