Recently Published
DATA624 Brewing Mode
New regulations require us to understand our manufacturing process and its predictive factors through modeling. We also need to choose an optimal predictive model of pH to report to our boss. We have been given historical data regarding our manufacturing process, already split into a training set ("StudentData.xlsx") and a test set ("StudentEvaluation.xlsx").
This will be a technical report showcasing our process of tidying and exploring the received data.
DATA622 Final Project
Our goal is to utilize job salary data (retrieved from the Ask A Manager Salary Survey here: https://docs.google.com/spreadsheets/d/1IPS5dBSGtwYVbjsfbaMCYIWnOuRmJcbequohNxCyGVw/edit?resourcekey#gid=1625408792) and demographic data (Federal Reserve Economic Data: https://fred.stlouisfed.org/release/tables?eid=257197&rid=110) to predict whether the salaries in the survey end up above or below per capita personal income for their state. Being a real-world survey dataset, the data itself is a bit messy, so this will take a decent amount of transformation and cleaning. After cleaning and preparation, we build three classification models: a logit model, an SVM model, and a neural network model. The end goal of this analysis is a model that best predicts whether someone’s qualifications place their salary above or below their state’s per capita income, which workers can use to identify the factors that contribute to being paid above or below that benchmark, and which businesses can use to decide whether a worker’s qualifications warrant pay above or below it.
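A minimal sketch of the modeling stage in R, assuming a cleaned frame `salary_df` with a binary factor `above_income`; the predictor names here are hypothetical stand-ins for the survey’s actual columns:

```r
# Hypothetical cleaned data: salary_df with factor above_income ("above"/"below")
library(e1071)  # svm()
library(nnet)   # nnet()

set.seed(622)
idx   <- sample(nrow(salary_df), 0.8 * nrow(salary_df))
train <- salary_df[idx, ]
test  <- salary_df[-idx, ]

# Logit model
logit_fit <- glm(above_income ~ years_experience + education + industry,
                 data = train, family = binomial)

# SVM with a radial kernel
svm_fit <- svm(above_income ~ years_experience + education + industry,
               data = train, kernel = "radial")

# Single-hidden-layer neural network
nn_fit <- nnet(above_income ~ years_experience + education + industry,
               data = train, size = 5, decay = 0.01, maxit = 200)

# Held-out accuracy for the SVM as one comparison point
mean(predict(svm_fit, test) == test$above_income)
```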
DATA621 Final SalPred
Our goal is to utilize job salary data (retrieved from the Ask A Manager Salary Survey here: https://docs.google.com/spreadsheets/d/1IPS5dBSGtwYVbjsfbaMCYIWnOuRmJcbequohNxCyGVw/edit?resourcekey#gid=1625408792) and demographic data to predict whether the salaries in the survey end up above or below per capita personal income for their state.
DATA622 Support Vector Machines
Our objective for this analysis is to fit a support vector machine to a dataset and compare how the results change relative to our previously built decision tree models. For our dataset we will utilize the sample sales dataset from https://excelbianalytics.com/wp/downloads-18-sample-csv-files-data-sets-for-testing-sales/ containing 1,000,000 records.
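As a rough sketch of the SVM fit, assuming the million-row CSV is loaded as `sales`; the file name and column names are illustrative guesses at the sample data’s fields:

```r
library(e1071)

# File and column names are assumptions about the sample download
sales <- read.csv("1000000 Sales Records.csv")
sales$Order.Priority <- factor(sales$Order.Priority)

# SVMs scale poorly to a million rows, so train on a subsample
set.seed(622)
sub <- sales[sample(nrow(sales), 10000), ]

svm_fit <- svm(Order.Priority ~ Units.Sold + Unit.Price + Total.Profit,
               data = sub, kernel = "radial")
table(predicted = predict(svm_fit, sub), actual = sub$Order.Priority)
```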
DATA624 JK Chapter 8
In this document, we will be going through exercises 8.1, 8.2, 8.3, and 8.7 from Applied Predictive Modeling - Kuhn and Johnson.
DATA624 JK Chapter 7
In this document, we will be going through exercises 7.2 and 7.5 from Applied Predictive Modeling - Kuhn and Johnson.
DATA621 HW4 Insurance
We will explore, analyze, and model a data set containing approximately 8,000 records. Each record represents a customer at an auto insurance company and has various predictor variables regarding the customer’s car, job, and demographics. The response variables are a binary label indicating whether the customer was in a car crash and the value of the damages if they were.
DATA624 JK Chapter 6
In this document, we will be going through exercises 6.2 and 6.3 from Applied Predictive Modeling - Kuhn and Johnson.
DATA622 Decision Tree Algorithms
Our objective for this analysis is to build two decision trees with different sets of variables, plus a random forest model, from a dataset, and compare how the results change depending on the model. For our dataset we will utilize the sample sales dataset from https://excelbianalytics.com/wp/downloads-18-sample-csv-files-data-sets-for-testing-sales/ containing 1,000,000 records.
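A minimal sketch of the tree and forest fits, reusing the same assumed `sales` frame and illustrative column names as the SVM sketch above:

```r
library(rpart)         # decision trees
library(randomForest)  # random forest

set.seed(622)
sub <- sales[sample(nrow(sales), 10000), ]  # subsample for speed

# Two trees with different variable sets, then a forest
tree1 <- rpart(Order.Priority ~ Units.Sold + Unit.Price, data = sub)
tree2 <- rpart(Order.Priority ~ Total.Revenue + Total.Cost, data = sub)
rf    <- randomForest(Order.Priority ~ Units.Sold + Unit.Price + Total.Profit,
                      data = sub, ntree = 200)
```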
DATA621 HW4 Insurance Predictions
Processing insurance customer data to generate a logistic regression model for determining whether a customer is likely to crash, and a linear regression model for how much the crash would cost if so.
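A sketch of the two-stage idea; the frame `insurance` and the response names TARGET_FLAG and TARGET_AMT follow the assignment data as we recall it, so treat them as assumptions:

```r
# Stage 1: probability of a crash (binary TARGET_FLAG)
crash_fit <- glm(TARGET_FLAG ~ . - TARGET_AMT, data = insurance,
                 family = binomial)

# Stage 2: damage cost, fit only on customers who crashed
cost_fit <- lm(TARGET_AMT ~ . - TARGET_FLAG,
               data = subset(insurance, TARGET_FLAG == 1))

# Expected cost combines crash probability with predicted severity
p_crash       <- predict(crash_fit, insurance, type = "response")
expected_cost <- p_crash * predict(cost_fit, insurance)
```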
Curse of Dimensionality
A basic empirical exploration into the curse of dimensionality.
DATA624 Project 1 Time Series Forecasting
The full tidy forecasting workflow applied to untidy time series with multiple keys.
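A condensed view of that workflow with the fpp3 stack; `series_ts` is a hypothetical keyed tsibble standing in for the project data:

```r
library(fpp3)  # loads tsibble, fable, feasts

# One model set per key; fable fits each key's series automatically
fit <- series_ts |>
  model(ets = ETS(value), arima = ARIMA(value))

fit |>
  forecast(h = "1 month") |>
  autoplot(series_ts)
```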
DATA624 FPP Chapter 9
In this document, we will be going through exercises 9.1, 9.2, 9.3, 9.5, 9.6, 9.7, and 9.8 from Forecasting: Principles and Practice (3rd ed).
DATA622 Exploratory Machine Learning Analysis
We will explore, analyze, and model two sample sales datasets from https://excelbianalytics.com/wp/downloads-18-sample-csv-files-data-sets-for-testing-sales/ containing 100 and 1,000,000 records, respectively.
DATA624 FPP Chapter 8
In this document, we will be going through exercises 8.1, 8.5, 8.6, 8.7, 8.8, and 8.9 from Forecasting: Principles and Practice (3rd ed).
DATA624 JK Chapter 3
In this document, we will be going through exercises 3.1 and 3.2 from Applied Predictive Modeling - Kuhn and Johnson.
DATA624 FPP Chapter 5
In this document, we will be going through exercises 5.1, 5.2, 5.3, 5.4, and 5.7 from Forecasting: Principles and Practice (3rd ed).
DATA621 HW1 Moneyball
We will explore, analyze, and model a data set containing approximately 2,200 records. Each record represents a professional baseball team from the years 1871 to 2006 inclusive, with the team’s performance for the given year and all statistics adjusted to match the performance of a 162-game season.
DATA624 FPP Chapter 3
In this document, we will be going through exercises 3.1, 3.2, 3.3, 3.4, 3.5, 3.7, 3.8, and 3.9 from Forecasting: Principles and Practice (3rd ed).
DATA624 FPP Chapter 2
Exercises from Chapter 2 of Forecasting: Principles and Practice (3rd ed).
DATA 607 Final Project
Predictive modeling for Steam game playtime.
DATA 605 Final Project
Fundamentals of computational mathematics final project culminating in a Kaggle submission for a regression model.
DATA 606 Final Project
In this project, we take data regarding games listed on Valve’s Steam platform from the Steam Spy API. Specifically, we’re interested in the median playtime of the top 100 games by users in the past two weeks. With the additional variables of user rating, number of game owners, and game price, we attempt to answer the question of whether the user ratings reported by Steam are related to median playtime. This is answered by creating regression models from the data after processing it, transforming it, and removing outliers.
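A minimal sketch of the final modeling step, with hypothetical column names standing in for the Steam Spy fields; the log transform is illustrative:

```r
# steam is an assumed frame of the pulled API data
games <- subset(steam, median_playtime > 0)

# Trim playtime outliers beyond 1.5 * IQR before fitting
q    <- quantile(games$median_playtime, c(0.25, 0.75))
keep <- games$median_playtime <= q[2] + 1.5 * diff(q)

fit <- lm(log(median_playtime) ~ rating + owners + price,
          data = games[keep, ])
summary(fit)
```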
DATA 607 Week 12 Project
Classification problems with large sets of publicly available data are best resolved by training a classifier on that data. In this project we will utilize a public dataset of spam and ham (non-spam) emails from https://spamassassin.apache.org/ to train a model to detect the difference between spam and ham emails.
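One plausible shape for such a classifier, assuming the emails are already parsed into a frame `emails` with `text` and `label` columns; the tokenizing uses tidytext and the model is a naive Bayes from e1071:

```r
library(dplyr)
library(tidyr)
library(tidytext)
library(e1071)

# Term counts per email, keeping only reasonably common words
dtm <- emails |>
  mutate(id = row_number()) |>
  unnest_tokens(word, text) |>
  count(id, label, word) |>
  group_by(word) |> filter(n() > 50) |> ungroup() |>
  pivot_wider(names_from = word, values_from = n, values_fill = 0)

nb_fit <- naiveBayes(x = select(dtm, -id, -label), y = factor(dtm$label))
```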
DATA 605 Multiple Regression Discussion
Multiple regression model applied to video game based data.
DATA 605 Week 11 Discussion
Building a simple regression model and checking its fit for Steam Spy data.
DATA 607 Week 10 Assignment
With the textbook Text Mining with R by Julia Silge and David Robinson, we explore sentiment analysis on text. We begin by mimicking code examples from the text, and then extend them to a different corpus and different sentiment lexicons.
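For flavor, here is the shape of the book’s pipeline on a built-in corpus; the post swaps in a different corpus and lexicons, but the steps are the same:

```r
library(dplyr)
library(tidytext)
library(janeaustenr)

# Tokenize, attach Bing sentiment labels, and tally net sentiment per book
austen_books() |>
  unnest_tokens(word, text) |>
  inner_join(get_sentiments("bing"), by = "word") |>
  count(book, sentiment) |>
  tidyr::pivot_wider(names_from = sentiment, values_from = n) |>
  mutate(net = positive - negative)
```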
DATA 607 Week 9 Project
For this assignment, we’ll be practicing our knowledge of Tidyverse functions by creating vignette examples for the packages that make up the Tidyverse. In my case, I wanted to go over the forcats package, which focuses on manipulating factors in a data frame, as I had no experience with it up to this point.
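A taste of the verbs the vignette walks through, using the gss_cat survey data that ships with forcats:

```r
library(dplyr)
library(forcats)

gss_cat |>
  mutate(
    relig = fct_lump_n(relig, n = 5),  # collapse rare levels into "Other"
    relig = fct_infreq(relig)          # reorder levels by frequency
  ) |>
  count(relig)
```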
DATA 607 Week 9 Assignment
For this assignment, we’ll be testing our ability to access APIs and pull JSON data from them into data frames. Specifically, we’ll be looking at data from the New York Times Books API.
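A minimal sketch of one such pull against the Books API’s best-seller list endpoint; `NYT_KEY` stands in for a registered API key, and the response shape follows the list endpoint’s documented JSON:

```r
library(httr)
library(jsonlite)

resp <- GET(
  "https://api.nytimes.com/svc/books/v3/lists/current/hardcover-fiction.json",
  query = list(`api-key` = Sys.getenv("NYT_KEY"))
)

# The list payload nests the books table under results$books
books <- fromJSON(content(resp, as = "text"))$results$books
head(books[, c("title", "author", "rank")])
```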
DATA 607 Week 7 Assignment
The goal of this assignment is to begin building the ability to process data from web sources that does not come as a convenient direct download to CSV or some other tabular format. The formats of focus in this assignment are HTML files, which are typical of direct scraping, along with XML and JSON files, which are more likely to be retrieved through API utilization.
To get directly familiar with these formats, we will create a representation of information regarding three books of a certain genre in each of the three formats. After the data has been created, we will utilize various R packages to load the information back in as data frames.
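A sketch of the loading side, with placeholder file names for the three hand-built files:

```r
library(rvest)     # HTML
library(XML)       # XML
library(jsonlite)  # JSON

books_html <- html_table(read_html("books.html"))[[1]]
books_xml  <- xmlToDataFrame("books.xml")
books_json <- fromJSON("books.json")
```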
DATA 607 Week 6 Project
Tidying data is reportedly one of the biggest uses of a data scientist’s time, which is why having the methods for tidying data down is important. In this assignment we will be importing untidy data from .csv files, tidying the data up, and then performing analysis on it. The data we will be working on are three different untidy datasets provided by our classmates.
DATA 607 Week 5 Assignment
In this assignment we will be importing untidy data from a .csv file, tidying the data up, and then performing analysis on the data. The data set we will be working with is a small chart describing arrival delays for two airlines across five destinations. Ultimately, we want to compare the arrival delays of the two airlines in our analysis.
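A sketch of the tidying step, assuming the CSV loads with airline and status columns followed by one column per destination; the file, column, and value names are guesses at the loaded layout:

```r
library(dplyr)
library(tidyr)

flights <- read.csv("airline_delays.csv")  # file name is an assumption

tidy_flights <- flights |>
  pivot_longer(cols = -c(airline, status),
               names_to = "destination", values_to = "count") |>
  # assumes the status column holds "delayed" / "on_time"
  pivot_wider(names_from = status, values_from = count) |>
  mutate(delay_rate = delayed / (delayed + on_time))
```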
DATA 605 Assignment 4
Utilizing eigenvalues and eigenvectors to generate new images that account for variance amongst a given set of original images.
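The core of the idea in a few lines; `imgs` is a hypothetical matrix with one flattened image per row:

```r
# Center the images and eigendecompose their covariance
X   <- scale(imgs, center = TRUE, scale = FALSE)
eig <- eigen(cov(X))

# Reconstruct images from the top-k eigenvectors, the directions of
# greatest variance among the originals
k     <- 5
V     <- eig$vectors[, 1:k]
recon <- X %*% V %*% t(V) +
  matrix(colMeans(imgs), nrow(imgs), ncol(imgs), byrow = TRUE)
```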
DATA 607 Week 4 Project
Project 1 is structured around scraping data from an unfriendly text table of chess statistics to get it into R. Once the data has been wrangled, the next focus is consolidating information that is spread across multiple rows into a single row for each player. Our ultimate goal is a CSV with the data formatted into the columns: Player’s Name, Player’s State, Total Number of Points, Player’s Pre-Rating, and Average Pre Chess Rating of Opponents.
DATA 607 Week 3 Assignment
A simple assignment revolving around R’s pattern matching functionality, mainly focusing on regex.
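A small taste of the kind of matching involved, using base R regex; the strings and pattern are illustrative only:

```r
phones <- c("home: 555-853-0172", "cell 555.555.0101", "n/a")
regmatches(phones, regexpr("[0-9]{3}[-.][0-9]{3}[-.][0-9]{4}", phones))
#> "555-853-0172" "555.555.0101"
```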
DATA 607 Week 2 Assignment
A simple assignment in taking survey data, loading it into SQL, and then creating a connection between SQL and R to load the data back into R.
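The round trip in miniature with DBI; RSQLite stands in here for whatever database the assignment actually used, and `survey_df` is the loaded survey frame:

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), "survey.db")
dbWriteTable(con, "survey", survey_df, overwrite = TRUE)  # R -> SQL
survey_back <- dbGetQuery(con, "SELECT * FROM survey")    # SQL -> R
dbDisconnect(con)
```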
DATA 607 Week 1 Assignment
A simple exercise in loading data into R, then modifying and subsetting it. The data used regards college fight songs.