RPubs

by RStudio

mdat

Recently Published

Hard Drive Failure

Exploring a dataset on hard drive failure, and then building a predictive (classification) model with H2O. This is a basic model built for the purpose of testing how to implement the model with Plumber as an API. As the subsequent steps are not done in Markdown, they will not be placed here. All files are available on GitHub at: https://github.com/supermdat/hard_drive_failure

almost 5 years ago

Chicago Subway Ridership - Step 08

Step 08 of a multi-part forecasting of Chicago subway ridership. Focusing on the visualization of accuracy metrics from the various models used.

over 5 years ago

Chicago Subway Ridership - Step 07

Step 07 of a multi-part forecasting of Chicago subway ridership. Focusing on Keras LSTM models.

over 5 years ago

Chicago Subway Ridership - Step 06

Step 06 of a multi-part forecasting of Chicago subway ridership. Focusing on calculating accuracy stats for all models to this point.

over 5 years ago

Chicago Subway Ridership - Step 05

Step 05 of a multi-part forecasting of Chicago subway ridership. Focusing on H2O AutoML models.

over 5 years ago

Chicago Subway Ridership - Step 04

Step 04 of a multi-part forecasting of Chicago subway ridership. Focusing on ARIMA and Prophet models.

over 5 years ago

Chicago Subway Ridership - Step 03

Step 03 of a multi-part forecasting of Chicago subway ridership. Focusing on Random Forest and XGBTree models.

over 5 years ago

Chicago Subway Ridership - Step 02

Step 02 of a multi-part forecasting of Chicago subway ridership. Focusing on setting up the modeling infrastructure.

over 5 years ago

Chicago Subway Ridership - Step 01

Step 01 of a multi-part forecasting of Chicago subway ridership. Focusing on obtaining data and feature engineering.

over 5 years ago

Comparing Regex Models Classifying Rat Service Calls

This notebook compares two regex models classifying the outcomes of service calls for rat abatement (i.e., classifying the outcome as `rats_found`, or `no_rats_found`, or `unknown`). Each model is run on two texts, a “raw text” and a “cleaned text,” for a total of four comparisons. A sample of manually categorized text is also used for creating Confusion Matrix Statistics.

over 6 years ago

Classify Rat Service Calls Regex

This is a work-in-progress notebook that uses natural language processing (primarily Latent Dirichlet Allocation) to inform a regex model which classifies the outcome of service calls for rat abatement (i.e., classifying the outcome as `rats_found`, or `no_rats_found`, or `unknown`). The specific dataset used is the "newer" version: dc_311-2017-10-07.csv

over 6 years ago

Classify Rat Service Calls LDA

This is a work-in-progress notebook that uses natural language processing (primarily Latent Dirichlet Allocation) to classify the outcome of service calls for rat abatement (i.e., classifying the outcome as `rats_found`, or `no_rats_found`, or `unknown`). The specific dataset used is the "older" version: dc_311-2017-01-16.csv

over 6 years ago

DCMetroBus_Notebook

Notebook for analyzing data on DC buses.

about 7 years ago
