This Milestone Report presents the code and results for downloading, unzipping, and loading the Data Science Capstone project data into R. It also contains the data processing commands needed to build a corpus from the data, filter out profane terms, and clean it by removing punctuation, converting it to plain text, and converting all characters to lowercase. Finally, some exploratory data analysis was performed to gain at least a partial understanding of the data and to establish the next steps toward the course’s end goal: a Shiny app that illustrates how a model can predict the next word from the user’s input.
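The cleaning steps described above can be sketched in base R. This is only an illustration: the report does not state which packages or functions it uses, so the helper name `clean_text`, its `profanity` argument, and the one-word placeholder profanity list are all hypothetical, and the sample lines are made up.

```r
# Hypothetical helper (not from the report): clean a character vector of text.
# The default profanity list is a placeholder; a real list would be loaded from a file.
clean_text <- function(lines, profanity = c("badword")) {
  x <- tolower(lines)                       # lowercase characters only
  x <- gsub("[[:punct:]]", "", x)           # remove punctuation
  # Remove whole-word matches of any profane term
  pattern <- paste0("\\b(", paste(profanity, collapse = "|"), ")\\b")
  x <- gsub(pattern, "", x, perl = TRUE)
  x <- gsub("\\s+", " ", x)                 # collapse runs of whitespace
  trimws(x)                                 # drop leading/trailing spaces
}

# Made-up sample input for illustration
sample_lines <- c("Hello, World!", "This is a BADWORD example...")
cleaned <- clean_text(sample_lines)
print(cleaned)
```

Lowercasing before the profanity filter means the word list only needs lowercase entries.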
Event Types with the Most Economic Consequences and Harm to the Population’s Health (NOAA Storm Database)
This document describes all of the steps required to complete Course Project 2 of the JHU-Coursera Reproducible Research course. These include reading and exploring the NOAA Storm Database, which identified tornadoes as the most harmful event type with respect to population health and floods as the most economically impactful event type across the United States. To achieve this, several pre-processing actions were carried out, such as validating the event type field values against the official event list provided at the following link: https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf. After that, the harm to population health was calculated by adding the total number of fatalities and injuries. The economic consequences, on the other hand, were calculated by adding the total value of property and crop damage. Finally, both the population harm and the economic impacts were plotted using ggplot2.
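The calculation described above can be sketched in base R on a toy data frame. The column names follow the NOAA Storm Database (`EVTYPE`, `FATALITIES`, `INJURIES`, `PROPDMG`, `PROPDMGEXP`, `CROPDMG`, `CROPDMGEXP`), but the values, the three-entry `official_events` list, and the helper `exp_multiplier` are made up for illustration. The sketch also assumes that only the exponent codes K, M, and B need handling and treats any other code as a multiplier of 1; the report may handle them differently.

```r
# Toy data with NOAA-style column names; all values are invented for illustration.
storm <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "OTHER"),
  FATALITIES = c(2, 0, 1, 0, 0),
  INJURIES   = c(10, 1, 5, 0, 3),
  PROPDMG    = c(25, 1.5, 10, 2, 1),
  PROPDMGEXP = c("K", "M", "K", "K", "K"),
  CROPDMG    = c(0, 500, 0, 1, 0),
  CROPDMGEXP = c("", "K", "", "K", ""),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the official event list: keep only valid event types.
official_events <- c("TORNADO", "FLOOD", "HAIL")
storm <- storm[storm$EVTYPE %in% official_events, ]

# Hypothetical helper: map exponent codes to multipliers (assumption: unknown codes -> 1).
exp_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  m <- lookup[toupper(code)]
  m[is.na(m)] <- 1
  unname(m)
}

# Population harm = fatalities + injuries; economic damage = property + crop damage.
storm$HARM   <- storm$FATALITIES + storm$INJURIES
storm$DAMAGE <- storm$PROPDMG * exp_multiplier(storm$PROPDMGEXP) +
                storm$CROPDMG * exp_multiplier(storm$CROPDMGEXP)

# Total both measures per event type.
totals <- aggregate(cbind(HARM, DAMAGE) ~ EVTYPE, data = storm, FUN = sum)
top_harm   <- totals$EVTYPE[which.max(totals$HARM)]
top_damage <- totals$EVTYPE[which.max(totals$DAMAGE)]
print(totals[order(-totals$HARM), ])
```

The resulting `totals` data frame has one row per event type, which is a convenient shape for a ggplot2 bar chart.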