mdat

Recently Published

Hard Drive Failure
Exploring a dataset on hard drive failure, and then building a predictive (classification) model with H2O. This is a basic model built to test how to serve the model as an API with Plumber. The subsequent steps are not done in Markdown, so they are not included here. All files are available on GitHub at: https://github.com/supermdat/hard_drive_failure
Chicago Subway Ridership - Step 08
Step 08 of a multi-part forecasting of Chicago subway ridership. Focusing on the visualization of accuracy metrics from the various models used.
Chicago Subway Ridership - Step 07
Step 07 of a multi-part forecasting of Chicago subway ridership. Focusing on Keras LSTM models.
Chicago Subway Ridership - Step 06
Step 06 of a multi-part forecasting of Chicago subway ridership. Focusing on calculating accuracy stats for all models to this point.
Chicago Subway Ridership - Step 05
Step 05 of a multi-part forecasting of Chicago subway ridership. Focusing on H2O AutoML models.
Chicago Subway Ridership - Step 04
Step 04 of a multi-part forecasting of Chicago subway ridership. Focusing on ARIMA and Prophet models.
Chicago Subway Ridership - Step 03
Step 03 of a multi-part forecasting of Chicago subway ridership. Focusing on Random Forest and XGBTree models.
Chicago Subway Ridership - Step 02
Step 02 of a multi-part forecasting of Chicago subway ridership. Focusing on setting up the modeling infrastructure.
Chicago Subway Ridership - Step 01
Step 01 of a multi-part forecasting of Chicago subway ridership. Focusing on obtaining data and feature engineering.
Comparing Regex Models Classifying Rat Service Calls
This notebook compares two regex models classifying the outcomes of service calls for rat abatement (i.e., classifying the outcome as `rats_found`, or `no_rats_found`, or `unknown`). Each model is run on two texts, a “raw text” and a “cleaned text,” for a total of four comparisons. A sample of manually categorized text is also used for creating Confusion Matrix Statistics.
Classify Rat Service Calls Regex
This is a work-in-progress notebook that uses natural language processing (primarily Latent Dirichlet Allocation) to inform a regex model that classifies the outcome of service calls for rat abatement (i.e., classifying the outcome as `rats_found`, or `no_rats_found`, or `unknown`). The specific dataset used is the "newer" version: dc_311-2017-10-07.csv
Classify Rat Service Calls LDA
This is a work-in-progress notebook that uses natural language processing (primarily Latent Dirichlet Allocation) to classify the outcome of service calls for rat abatement (i.e., classifying the outcome as `rats_found`, or `no_rats_found`, or `unknown`). The specific dataset used is the "older" version: dc_311-2017-01-16.csv
DCMetroBus_Notebook
Notebook for analyzing data on DC buses.