ykv001

Yaakov Miller

Recently Published

Milestone Report - Capstone Project
Main features of the corpora.
Data Science Capstone Next Word Prediction Webapp
Natural language processing (NLP) was applied to English documents coming from Tweets, News and Blogs.
Analysis of the ToothGrowth data in the R datasets package
This project analyses the ToothGrowth data in the R datasets package. The analysis is structured as follows:
1. Basic exploratory data analysis
2. Basic summary of the data
3. Use of confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose
4. Conclusion
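The tests used in the project itself are not shown on this page. As a minimal sketch of step 3, assuming Welch two-sample t-tests (the default of R's `t.test`) and pairwise dose comparisons without any multiple-comparison correction, the comparisons could look like this. The names `supp_test`, `dose_pairs` and `dose_p` are illustrative only.

```r
# ToothGrowth ships with base R's datasets package:
# len = tooth length, supp = "OJ" or "VC", dose = 0.5, 1 or 2 mg/day
data(ToothGrowth)

# Welch two-sample t-test (R's default): does mean length differ by supplement?
supp_test <- t.test(len ~ supp, data = ToothGrowth)
print(supp_test$conf.int)  # 95% CI for mean(OJ) - mean(VC)
print(supp_test$p.value)

# Pairwise Welch t-tests between dose levels, two doses at a time
dose_pairs <- list(c(0.5, 1), c(1, 2), c(0.5, 2))
dose_p <- sapply(dose_pairs, function(pair) {
  sub <- ToothGrowth[ToothGrowth$dose %in% pair, ]
  t.test(len ~ factor(dose), data = sub)$p.value
})
names(dose_p) <- c("0.5 vs 1", "1 vs 2", "0.5 vs 2")
print(dose_p)
```

Run as written, the supplement comparison should give a p-value of about 0.06 with a confidence interval that includes zero, while all three dose comparisons come out highly significant. With three dose tests, a Bonferroni or Holm adjustment (`p.adjust`) would be a reasonable refinement.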
The exponential distribution and its comparison with the Central Limit Theorem
This project investigated the exponential distribution in R and compared it with the Central Limit Theorem. Lambda was set to 0.2 for all simulations, and the distribution of the averages of 40 exponentials was examined across 1,000 simulations.
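For reference, the theoretical values that the simulated averages are compared against follow directly from the mean and variance of the exponential distribution, using λ = 0.2 and n = 40 as stated above:

```latex
\begin{aligned}
X_i &\sim \operatorname{Exp}(\lambda), \qquad E[X_i] = \frac{1}{\lambda} = 5, \qquad \operatorname{Var}(X_i) = \frac{1}{\lambda^2} = 25 \\
\bar{X}_{40} &= \frac{1}{40}\sum_{i=1}^{40} X_i, \qquad E[\bar{X}_{40}] = \frac{1}{\lambda} = 5, \qquad \operatorname{Var}(\bar{X}_{40}) = \frac{1}{n\lambda^2} = \frac{25}{40} = 0.625 \\
\operatorname{SD}(\bar{X}_{40}) &= \sqrt{0.625} \approx 0.791, \qquad \bar{X}_{40} \;\overset{\text{approx.}}{\sim}\; \mathcal{N}(5,\ 0.625)
\end{aligned}
```

So, by the CLT, the 1,000 simulated averages should be roughly normally distributed around 5 with a standard deviation of about 0.79.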
Developing Data Products
Week 4 Assignment
World Latest Earthquakes
Latest earthquakes as of 2017-07-30 21:55:42 (UTC)
Plotly & ggplot - Diamonds set
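library(ggplot2)
library(plotly)  # provides ggplotly()
set.seed(100)
# Random sample of 1,000 diamonds
d <- diamonds[sample(nrow(diamonds), 1000), ]
p <- ggplot(data = d, aes(x = carat, y = price)) +
  geom_point(aes(text = paste("Clarity:", clarity)), size = 4) +
  geom_smooth(aes(colour = cut, fill = cut)) +
  facet_wrap(~ cut)
# Interactive version; the text aesthetic appears in the hover tooltip
(gg <- ggplotly(p))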
PLOTLY stocks
library(plotly)
library(tidyr)
library(dplyr)
data("EuStockMarkets")
# Reshape the four index columns to long format (index, price) and attach the time index
stocks <- as.data.frame(EuStockMarkets) %>%
  gather(index, price) %>%
  mutate(time = rep(time(EuStockMarkets), 4))
plot_ly(stocks, x = ~time, y = ~price, color = ~index,
        type = "scatter", mode = "lines")
Airmiles plotly
library(plotly)
data("airmiles")
plot_ly(x = ~time(airmiles), y = ~airmiles, type = "scatter", mode = "lines")
US pop 1975 plotly
Plotly R API
NBA dashboard example
Exploring the US NOAA storm DB - Week 4 Project
The basic goal is to explore the NOAA Storm Database in relation to severe weather events and answer the following questions:
1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?
ggplot2 Plot (diamonds)
library(ggplot2)
qplot(carat, price, data = diamonds, color = cut, facets = . ~ cut) + geom_smooth(method = "lm")
Histogram ggplot2 (mpg)
library(ggplot2)
qplot(hwy, data = mpg, fill = drv)
ggplot2 Boxplot (mpg)
library(ggplot2)
qplot(drv, hwy, data = mpg, geom = "boxplot", color = manufacturer)
ggplot2 Plot (mpg)
library(ggplot2)
qplot(displ, hwy, data = mpg, color = drv, geom = c("point", "smooth"))
Lattice Plot Diamonds Data Set
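With strip=F
library(lattice)
# Hypothetical label strings; the original definitions are not shown
myxlab <- "Carat"; myylab <- "Price"; mymain <- "Price vs. carat by color and cut"
xyplot(price ~ carat | color * cut, data = ggplot2::diamonds, pch = 20,
       xlab = myxlab, ylab = myylab, main = mymain)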
Lattice Plot Diamonds Data Set
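library(lattice)
# Hypothetical label strings; the original definitions are not shown
myxlab <- "Carat"; myylab <- "Price"; mymain <- "Price vs. carat by color and cut"
xyplot(price ~ carat | color * cut, data = ggplot2::diamonds, strip = FALSE, pch = 20,
       xlab = myxlab, ylab = myylab, main = mymain)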
Lattice State Plot
d3radarR Library
d3hiver library
Shared names
Visualize the most popular names that are shared by both males and females.
Most popular names (2014)
Identify the top 5 male and female names from 2014. Visualize the popularity trend over time.
Most popular names (1986)
Identify the top 5 male and female names from 1986. Visualize the popularity trend over time.
Total US births
Plot total US births recorded by the Social Security Administration.
Analysis of babynames with dplyr
Use dplyr syntax to write Apache Spark SQL queries. Use select, where, group by, joins, and window functions in Apache Spark SQL.
d3treeR Library 2
d3treeR Library
D3scatter Library
D3heatmap Library
Number of Clusters
Wholesale Customers Data
Box Plot
Test
Histogram
test
Heating Load
Conditioned scatter plots
Pairwise scatter plot
Test Energy Efficiency Regression Data