Seattle has had a very dry fall. To put it into context, I obtained weather records for the last 100 years and analyzed them.
An analysis of the upward trend of nighttime temperatures in Aberdeen WA
Regional analysis of COVID stats, including presidential politics of 2016
Looking at the Red vs. Blue State Covid infection rate breakdown. (In an era where speaking truth is a political act).
Analysis of New York Times data
Based on NYTimes data. #rstats
Analysis based on NY Times dataset.
Analysis of COVID-19 Pandemic for the Continental US
This is an analysis of the Hottest 10 COVID states.
This is a map of the COVID cases similar to the one the New York Times publishes.
Evolving analysis of COVID cases.
Data are from the NYTimes. Data analysis shows total cases and growth rate.
Heat map of COVID cases in California counties. Data from NY Times Github https://github.com/nytimes/covid-19-data
Heat Map of COVID-19 Growth. Points are color coded to reflect growth rate.
Heatmap of COVID-19 cases in North Carolina Counties. From data compiled by the NYTimes.
Heatmap of South Carolina from data compiled by the NYTimes
Heatmap of Covid 19 cases in Georgia. From data compiled by the NYTimes.
Heat Map of COVID-19 cases. From NYTimes data.
Heat Map of COVID-19 cases and doubling times. Based on NYTimes Data
Heatmap of Washington State Covid Cases. From NYTimes data
Covid data from NYTimes mapped with doubling rates encoded as color.
Heat Map of COVID cases in the US. Case doubling time is estimated using a spline fit of the case data. Extrapolated values are used to estimate regional severity.
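A rough illustration of how a doubling time can be estimated from case counts follows. It is a minimal sketch in Python, not the spline fit described above: it substitutes a plain least-squares line through log2 of the cumulative cases, and the function name `doubling_time` and the sample series are hypothetical.

```python
import math

def doubling_time(cases):
    """Estimate doubling time (in days) from daily cumulative case counts.

    Fits a least-squares line to log2(cases) versus day index. The slope is
    the number of doublings per day, so its reciprocal is the doubling time.
    Assumes all counts are positive and the series is growing.
    """
    n = len(cases)
    days = list(range(n))
    logs = [math.log2(c) for c in cases]
    mean_x = sum(days) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logs))
    slope = sxy / sxx
    return 1.0 / slope

# Hypothetical series (not real data) constructed to double every 3 days
sample = [100 * 2 ** (d / 3) for d in range(10)]
print(round(doubling_time(sample), 2))
```

A spline fit, as used in the analysis, would let the growth rate vary over time; the straight-line version assumes a single constant rate over the whole window.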
#COVID19 Regional Growth
COVID-19 Case analysis from the #NYTimes #COVID19 data
An adaptation of the New York Times COVID-19 Map which includes an analysis of regional case growth rates.
Tracking regional trends in COVID-19 growth rates. Cases are spreading much faster outside the Puget Sound region, with doubling times on the order of four days, while within the Puget Sound region, cases are doubling approximately every 9 days. #COVID #rCOVID
Regional Trends for the Bay Area, Greater California, and Southern California
Trends in COVID-19 cases in the Willamette Valley versus Greater Oregon.
Analysis of COVID-19 case growth in Washington State. While cases are higher in the Puget Sound region, their growth rate is now effectively half of that in the rest of the State.
Compares COVID-19 cases in Greater Oregon to the Willamette Valley region.
Analysis of COVID-19 cases for the Puget Sound Region versus Greater Washington. Growth in Puget Sound has slowed while the rest of the state is accelerating.
A regional analysis of COVID-19 Cases for Washington State. Reveals substantial differences between the (largely urban) Puget Sound region and the (more rural) rest of the State. Growth in Puget Sound has achieved doubling times ~ 6 days while the rest of the State is still near ~3.
Mapping analysis of the network of Opioid transactions involving WA, 2006 to 2012, based on the Washington Post Opioid Data
Part of ongoing casual analysis of the Washington Post Opioid dataset.
This is an analysis of the Washington Post Opioid Dataset looking at the top million shipments (by volume) from 2006 to 2012. An interesting relationship between shipment size and rank, akin to Zipf's Law, is observed.
This is a draft document of learning from initial Education Survey
I used some sentiment analysis tools just to look at trends
This is an attempt to play around with animating sentiment analysis
#AI tweet fluxes measured using twitteR package. Now normalized to metro areas.
Collection of #TrumpWon tweets from the 2016 Presidential Debate
Updated method to include metro populations and areas.
I use `twitteR` and `ggmap` to display data on the usage of #rstats in tweets.
I use the recently published OkCupid data to understand from a practical standpoint what private information can be inferred from more public information.
Exploratory analysis of OkCupid data set from library(okcupid)
Several options for reverse geocoding (i.e. determining a specific State and County from (latitude, longitude) coordinate pairs) are explored for both performance and accuracy. Direct reverse-geocoding API calls, which take about 200 ms per point, are compared to computation via “point-in-polygon”, as well as machine-learning randomForest and nnet classification models. A Random Forest model with accuracy approaching 98% improves throughput by a factor of 104 over a web-based API call. A neural network model has faster prediction times, but its accuracy was lower and modeling times were prohibitively long.
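For readers unfamiliar with the “point-in-polygon” approach mentioned above, here is a minimal Python sketch of the standard ray-casting test. It is not the code used in the analysis; the function name `point_in_polygon` and the rectangular boundary are hypothetical, and real county boundaries would come from a shapefile or similar source.

```python
def point_in_polygon(lon, lat, polygon):
    """Return True if (lon, lat) lies inside the polygon.

    polygon is a list of (lon, lat) vertices in order. The test casts a ray
    toward increasing longitude and counts edge crossings: an odd count
    means the point is inside. Points exactly on an edge are not handled
    specially.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed
        if (y1 > lat) != (y2 > lat):
            # Longitude at which this edge crosses the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

# Hypothetical rectangular "county" boundary (made-up coordinates)
county = [(-122.5, 47.0), (-121.5, 47.0), (-121.5, 48.0), (-122.5, 48.0)]
print(point_in_polygon(-122.0, 47.5, county))
print(point_in_polygon(-120.0, 47.5, county))
```

Each lookup costs one pass over the polygon's vertices, with no network round trip, which is the kind of trade-off against a web API call that the comparison above examines.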
This explores some canonical examples of word vectors based on the GloVe vectors from Pennington et al.
Plots a heat map of frequent words. Provides ability to filter for specific topics
Playing around with GloVe 100d word vectors
Playing around with GloVe Vectors in R
This is just some playing around with word vectors
Using the GloVe word vectors I explore some well known examples of word vector relations. The GloVe word vectors are from Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014, http://nlp.stanford.edu/pubs/glove.pdf
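The kind of word vector relation explored above can be sketched with vector arithmetic and cosine similarity. The Python example below uses made-up two-dimensional toy vectors, not actual GloVe vectors, and the helper names `cosine` and `analogy` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def analogy(vectors, a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c).

    The three input words are excluded from the candidates, as is
    customary when evaluating analogies.
    """
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

# Toy vectors (assumed values for illustration only, NOT GloVe data).
# Dimension 0 loosely encodes "royalty", dimension 1 encodes "gender".
toy = {
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man": [0.0, 1.0],
    "woman": [0.0, -1.0],
    "apple": [-0.5, 0.2],
}
print(analogy(toy, "king", "man", "woman"))
```

With real GloVe vectors the same arithmetic applies, only in 50 to 300 dimensions and over a vocabulary of hundreds of thousands of words.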
An analysis of word frequencies used in the Debates by Presidential Candidates for the 2016 election cycle.
Analysis of Santiam Pass Oregon Crashes and their correlation to snow and other factors.
Visualization of ski days at Mt Bachelor Ski Area for the season of 2015-2016
Draft of Ski Analytics Visualization
Looks at different methods of analyzing speech from Presidential Debates 2015
This is a word-frequency analysis of the 2015 Presidential debate texts. The point of the analysis is to explore whether word analytics can reveal biases in the positions of candidates.
A data visualization exploration and analysis of the June 2015 Top500 and Green500 Supercomputers.
Presentation on Coursera Capstone NLP word predictor
This is a "preliminary" analysis of data from 10 seasons of skiing.
A short exploratory analysis of four different text corpora for the Coursera Capstone. In addition to looking at basic statistics like word number and frequency, adherence to Zipf's Law is examined.
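Adherence to Zipf's Law can be checked by regressing log frequency on log rank: a slope near -1 indicates Zipf-like behavior. The Python sketch below is an illustration under that assumption, not the method used in the analysis; the function names `rank_frequencies` and `zipf_slope` and the sample counts are hypothetical.

```python
import math
from collections import Counter

def rank_frequencies(text):
    """Return word counts sorted from most to least frequent."""
    words = text.lower().split()
    return sorted(Counter(words).values(), reverse=True)

def zipf_slope(frequencies):
    """Least-squares slope of log(frequency) versus log(rank).

    frequencies must be sorted in descending order and contain at least
    two positive counts. A slope close to -1 is consistent with Zipf's Law.
    """
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(f) for f in frequencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Made-up counts that follow frequency = 1200 / rank exactly
ideal = [1200, 600, 400, 300, 240, 200]
print(round(zipf_slope(ideal), 3))

# Counts from a tiny hypothetical sentence
print(rank_frequencies("the cat and the dog and the bird"))
```

Real corpora rarely follow the law across all ranks, so in practice the fit is usually restricted to a middle range of ranks.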
This is my presentation on the motor trend cars data. Very swank.
This analysis uses data from the Oregon DOT real time Twitter feeds on road conditions to understand historical trends affecting road safety. It shows the data have reasonable predictive value.
An Exascalar analysis of the Top500/Green500 Supercomputers for architectural influencers. This is a first quick analysis. To be updated.
The Top500 and Green500 lists appear to be largely uncorrelated. However, by using Exascalar as an intermediate analysis point, a linkage can be understood.
An analysis of the change of the Top500 and Green500 supercomputer populations between November and April 2014 using Exascalar
Short blog on Exascalar trends and some extrapolation
This is an analysis of the locations, dates, and density of accidents on Santiam Pass, Oregon, based on data from the ODOT Twitter feed, which reports accidents in real time.
This analysis uses threshold detection to look for increases in TMIN (daily minimum temperatures) in data from the NOAA National Climatic Data Center http://www.ncdc.noaa.gov/cdo-web/datasets.
This is a quick analysis of the dependency of CPU performance, as measured by SPEC CPU2006, on core count and MHz.
Hardly a week goes by without a data breach being reported. I found some data online that I was able to clean and analyze to partially answer the question: are the number and severity of hacking attacks increasing relative to the overall population of data breaches? The answer is surprising.
Analysis of storm event data for reproducible research class from JHU on Coursera.