Recently Published
Data Science Capstone - Slide Deck
NextWord is an app that predicts the next word from previously entered text.
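As a rough illustration of next-word prediction, here is a minimal bigram sketch in Python. It is not the NextWord implementation, whose model is not described here. The function names `train_bigrams` and `predict_next` and the toy corpus are hypothetical and exist only for this example.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for every word, which words follow it in the text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following


def predict_next(following, phrase, k=3):
    """Return up to k candidate next words, most frequent first."""
    words = phrase.lower().split()
    if not words or words[-1] not in following:
        return []  # no evidence for the last word: no prediction
    return [word for word, _ in following[words[-1]].most_common(k)]


# Toy corpus, hypothetical and far smaller than any real training set.
corpus = "the cat sat on the mat and the cat ate the fish"
model = train_bigrams(corpus)
suggestions = predict_next(model, "I saw the")
print(suggestions)
```

A practical predictor would likely use longer n-grams with a fallback to shorter ones when the typed context has not been seen; the sketch only looks at the last word.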
Data Science Capstone - Report
This report covers what we have researched so far, together with the ideas and plans taking shape for the development of the prediction application, “Next word.”
We start from the data provided by “HC Corpora.” Several languages are available but, at least initially, we only address the files in the “en_US” folder. This folder contains 3 files: Blogs, News and Twitter (one of them has to be opened in binary mode in order to read all of its data).
Data Science Capstone - Report (Spanish version)
This report covers what we have researched so far, together with the ideas and plans we are identifying for the development of our “Next word” prediction application.
We start from the data provided by “HC Corpora.” Several languages are available but, at least initially, we only address the files in the “en_US” folder. This folder contains 3 files: Blogs, News and Twitter (one of them has to be opened in binary mode in order to read all of its data).
Processing 'Datasets' with Random Forest
Shiny App. Processing 'Datasets' with Random Forest
Plotly forever
Plotly 3D Surface Graphic
Leaflet Map
Boulder, CO, US vs Madrid, ES
Prediction Model using Weight Lifting Exercises Dataset
We are going to create an algorithm to predict, as accurately as possible, how well a weight lifting exercise is performed. To do so, we use the public Weight Lifting Exercises Dataset and Machine Learning techniques, principally Random Forest, Generalized Boosted Models, Linear Discriminant Analysis, Recursive Partitioning and Regression Trees and, of course, Cross Validation.
Prediction Model using Weight Lifting Exercises Dataset (Spanish version)
We are going to create an algorithm to predict, as accurately as possible, how well a weight lifting exercise is performed. To do so, we use a public dataset (Weight Lifting Exercises Dataset) and Machine Learning techniques, principally Random Forest, Generalized Boosted Models, Linear Discriminant Analysis, Recursive Partitioning and Regression Trees and, of course, Cross Validation.
Efficiency analysis in MPG (miles per gallon) for automatic vs. manual transmission cars
We are going to investigate gas efficiency in cars, measured in miles per gallon, by using the well-known public data set mtcars. In this case, we are going to focus on comparing the efficiency of automatic and manual transmission cars. We are also going to evaluate the impact of introducing all or some covariates in the linear fitting to get a holistic view of the reality of the data set.
Efficiency analysis in MPG (miles per gallon) for automatic vs. manual transmission cars (Spanish version)
We are going to investigate fuel efficiency in cars, measured in miles per gallon, using the well-known public data set mtcars. We focus on comparing automatic transmission cars with manual transmission cars. We are also going to evaluate the impact of introducing all or some covariates in the linear fit to get a holistic view of the reality of the data set. The conclusions emerge over the course of this report, and an executive summary is given at the end.
Tooth Growth in Guinea Pigs - Orange Juice vs Ascorbic Acid
We are going to research the growth of Guinea Pigs’ teeth using a treatment based on Vitamin C. To do so, we are going to give each animal different doses of two components that contain high amounts of Vitamin C: on the one hand Orange Juice, and on the other Ascorbic Acid. The conclusions have been outlined at the end of the report. We are going to use the following tools: Plots, Confidence Interval and Hypothesis Test.
Exponential Distribution vs Central Limit Theorem
We are going to demonstrate how the distribution of sample means drawn from an exponential distribution can be approximated by a normal distribution. To do so, we compare a random sample of data and its empirical characteristics with the theoretical values given by the Central Limit Theorem.
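The comparison described above can be sketched in a few lines of Python. This is only an illustration under assumed parameters: the rate (0.2), the sample size (40), the number of simulations (1000) and the seed are choices made for this example, not values taken from the report, and `simulate_sample_means` is a hypothetical helper name.

```python
import random
import statistics


def simulate_sample_means(rate, sample_size, n_simulations, seed=42):
    """Draw n_simulations exponential samples and return each sample's mean."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [
        statistics.fmean(rng.expovariate(rate) for _ in range(sample_size))
        for _ in range(n_simulations)
    ]


# Assumed parameters for the illustration.
rate, sample_size, n_simulations = 0.2, 40, 1000
means = simulate_sample_means(rate, sample_size, n_simulations)

# Empirical characteristics of the simulated sample means.
empirical_mean = statistics.fmean(means)
empirical_sd = statistics.stdev(means)

# Theoretical values: an exponential distribution has mean 1/rate and
# standard deviation 1/rate, so by the Central Limit Theorem the sample
# mean is approximately normal with mean 1/rate and
# standard deviation (1/rate) / sqrt(sample_size).
theoretical_mean = 1 / rate
theoretical_sd = (1 / rate) / sample_size ** 0.5

print(f"mean: empirical {empirical_mean:.3f} vs theoretical {theoretical_mean:.3f}")
print(f"sd:   empirical {empirical_sd:.3f} vs theoretical {theoretical_sd:.3f}")
```

With enough simulations the two pairs of values should be close, and a histogram of `means` should look roughly bell-shaped even though the underlying exponential distribution is strongly skewed.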
Tooth Growth in Guinea Pigs - Orange Juice vs Ascorbic Acid (Spanish version)
We are going to research the growth of Guinea Pigs’ teeth using a treatment based on Vitamin C. To do so, we are going to give each “individual”, if they can be called that, different doses of two components with a high Vitamin C content: on the one hand Orange Juice, and on the other Ascorbic Acid. The conclusions can be found at the end of the report. We are going to use the following tools: Plots, Confidence Interval and Hypothesis Test.
Exponential Distribution vs Central Limit Theorem (Spanish version)
We are going to demonstrate how the distribution of sample means drawn from an exponential distribution can be approximated by a normal distribution. To do so, we compare a random sample of data and its empirical characteristics with the theoretical values calculated from the Central Limit Theorem.
Consequence analysis of severe meteorological events on the health and economy of the United States - (NOAA).
The objective of this report is to identify the severe meteorological events that are most harmful, both to the health of the population and to the U.S. economy. To do so, we use the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which covers the period from 1950 through 2011. This CSV file records a large number of these events.
RR-CP2-JCCC - Analysis of the Storm Database - NOAA
Final course project for Reproducible Research - Spanish version