Nathaniel Cooper

Recently Published

Correlation: Pearson v. Spearman v. Kendall
An example I made for a DATA 621 discussion.
Second Lab Quiz
Lab Quiz 1
Regression Analysis of Crime Statistics
In this report, I analyze crime statistics for Boston in 1978 using Logistic Regression to classify neighborhoods as high crime or low crime. I find that the clearest indicator that a neighborhood is high crime is high air pollution, measured by the concentration of nitrogen oxides, which are produced by fossil fuel combustion.
DATA 621 Question
A write-up for a discussion question in DATA 621. The question asks how to explain to others the way outliers can influence regression.
DATA 606 Final Project
In this paper, I analyze salary and unemployment data pertaining to college majors. These data were obtained from's GitHub page. I find that STEM majors earn more and face less unemployment than Humanities majors, and that gender inequality in these two categories could play a role in the difference in pay.
DATA 607 Final Project
This is the final version of the analysis that Chunhui Zhu and I did of world electrical energy production.
DATA 607 Final Project Draft
This is a draft of Chunhui Zhu's and my Final Project where we examine the production of electricity and the flow of energy resources.
DATA 605 HW15
Our final assignment is about multivariate calculus.
DATA 607 Final Project Draft
Draft of the final project for DATA 607 where we analyze energy production and usage for the top 10 economies.
DATA 605 Week 15
This week is multivariate calculus.
Calculating pi
I use Euler's formula for pi to calculate pi and compare the result to R's default value.
DATA 605 HW14
DATA 605 Week 14
This week's assignment covers series and sequences.
DATA 607 Database Migration
I migrate a MySQL database to a Neo4J database.
DATA 606 Lab8
This week's lab is about multiple regression.
DATA 605 Week 13 part 2
Checking another student's work by request.
DATA 606 HW8
This assignment covers Logistic and Multiple Regression models.
DATA 605 Week 13
This week's discussion is solving a Surface Area integral using trigonometric substitution and substitution methods.
DATA 605 Week 13 HW
This week's homework is a primer in basic calculus.
Data 605 HW12
This week's assignment covers multiple linear regression models and transformations to make data fit a linear regression.
Kinematics Presentation
A sample lesson on the derivation and use of kinematic equations.
DATA 607 Final Project Proposal
Our proposal for the Final Project
DATA 605 Week 12 part 2
I explain a couple methods to make non-linear data usable for linear regression.
DATA 605 Week 12
This week I perform a multiple regression analysis on Human Resources data from to see if the factors measured predict job satisfaction.
DATA 606 HW7
This assignment looks at the interpretation of linear models and calculating parameters such as slope from R and standard deviations.
DATA 606 Lab7
We use linear regression to reproduce the analysis done in the movie Moneyball. I find that team Batting Average is the best traditional predictor of runs and On-Base%+Slugging is the best modern predictor of runs. All modern statistics outperformed traditional statistics in predicting runs.
DATA 605 HW11
This week we perform linear regression on the braking distance of a car vs. speed, and see that a low p-value alone does not mean the model is valid.
DATA 607 Discussion Week 11 - Draft
This week we are tasked with qualitatively reverse engineering a recommender system. Our group selected Grubhub. Essentially, we test what recommendations are made based on our input and conjecture what models Grubhub uses.
DATA 605 Week11
We begin linear regression by examining the relationship between video duration and views for TED talks.
DATA 607 Project 4
We were tasked with using R's text mining package, 'tm', and supervised learning techniques to classify emails as 'spam' or not. I was able to get >96% accuracy.
DATA 607 Context Presentation
This draft of my "Data Science in Context" presentation provides a basic formula for creating a word cloud from a data frame.
DATA 605 HW10
This assignment explores the Gambler's Ruin problem using two different strategies: first a constant bet, and then an increasing bet. Markov Chains, the Binomial Distribution, and simulations are used.
DATA 605 Week10
This week's discussion question involves the Gambler's Ruin problem.
DATA 607 HW9
The goal of this assignment is to extract data from the NYT's API in the form of a JSON file and to format it as an R data frame.
DATA 606 Lab6
This lab covers inference of proportions.
DATA 605 HW9
This week's assignment covers the CLT for independent random variables and Moment Generating Functions.
DATA 606 HW6
This chapter covers proportion tests, calculating confidence intervals and Chi-sq tests.
DATA 606 HW Chapter 5
This assignment covers hypothesis testing using Confidence Intervals, t-tests and ANOVA.
DATA 606 Lab5
In this week's lab we calculate confidence intervals and perform hypothesis testing on data about pregnancies.
DATA 605 Week9
This week's discussion in DATA 605 tests the Central Limit Theorem for proportions.
DATA 607 Project 3 Presentation
Slides for DATA 607 presentation.
Data 606 Project Proposal
Minor correction made on the original.
Data 606 Project Proposal
This is my proposal for the final project for DATA 606. I will take an in-depth look at income and employment statistics for 173 college majors using data obtained from the GitHub page.
Project 3 Presentation Draft
This is a rough draft of the presentation slide for DATA 607 Project 3.
DATA 607 Project 3 Salaries
Extended Silverio's work to include Confidence Intervals, t-tests, and KS tests for salaries. I also adjusted salaries for Cost of Living Index.
DATA 605 HW8
This week's assignment covers convolution of discrete and continuous random variables.
DATA 605 Week8
In time for the World Series, here I calculate the probability of at-bat outcomes given a probability distribution for 4 at-bats.
DATA 607 HW7
This week's assignment covers loading data from web-based formats (HTML, XML, and JSON) into R in the form of data frames.
DATA 605 HW7
This week's assignment covers important probability densities and distributions, such as the Beta, Geometric, Exponential, Binomial, and Poisson.
DATA 606 HW4
This assignment covers calculating confidence intervals, p-values, and hypothesis testing.
DATA 606 Lab4b
This lab examines how to define confidence intervals. Please note that the data in this file will be different from the data I had in RStudio while writing, so the answers may not match the graphs and summary statistics.
DATA 605 Week7
This week's discussion covers common continuous probability densities and discrete distributions.
DATA 607 Project 2
The goal of this project is to take data from three different sources, in this case two .csv files and one data set scraped from a web page, and use tidyr and dplyr to clean and reorganize the data for further analysis.
DATA 605 HW6
This assignment covers combinatorics and probability.
DATA 605 Week6
This week's discussion covers combinatorics and conditional probability.
This lab explores behaviors of sampling and populations needed to introduce the Central Limit Theorem and Confidence Intervals.
DATA 605 HW5
This assignment covers defining probability distributions and calculating probabilities from probability distributions.
DATA 607 HW5
We were tasked with tidying and analyzing a data set using R's tidyr and dplyr. I opted to use an SQL database for the starting data instead of a .csv file.
DATA 606 HW3
This homework assignment covers probability distributions, namely the Normal, Geometric and Binomial distributions.
DATA 605 Week5
This week's topic in DATA 605 is probability distributions. Here I use a basic simulation to examine a popular urban legend about a professor who tricks the students after they try to trick the professor.
This lab covers the properties of the Normal Distribution.
DATA 605 HW4
This week's homework covers the singular value decomposition (SVD) of a matrix and finding the inverse of a matrix from its cofactors.
DATA 607 Project 1
In this project we were tasked with taking a semi-structured .txt file and creating an R Markdown file that would output a .csv that can be used to populate a SQL database.
DATA 605 Week4
This week's topic is Linear Transformations. In this discussion question I show that a transformation is linear.
DATA 607 Extra Credit Question
This was an optional question for Week 3 HW.
This week's DATA 605 homework covers matrix ranks, eigenvalues, and eigenvectors.
This assignment covers using regular expressions to extract data from files.
This homework set covers Probability and discrete random variables.
This homework covers Matrix operations such as trace, transpose, matrix multiplication, and factorization.
This lab analyzes Kobe Bryant's shooting performance during the 2009 NBA finals to test the "hot hands" hypothesis. We find that Kobe performed no better than a simulation where the simulated shooter's hit percentage was set at Kobe's hit percentage.
Presentation Question for DATA 606
We have to present a practice problem from the text. I am presenting 1.23 which evaluates the methodology used in a survey.
605 Discussion addition
Something I wanted to add to my discussion for DATA 605
N Cooper 605 Discussion
This is my week 2 discussion question to DATA 605.
This lab demonstrates techniques for initial data summaries and visualizations, and shows how to subset data.
My solutions to the first homework set for CUNY DATA 606, Statistics and Probability for Data Analytics.
This is my first homework for DATA 605, Fundamentals of Computational Mathematics. This assignment mostly covers vector and matrix operations and solving systems of equations.
This is a lab for CUNY DATA 606 to familiarize the student with the functionality of R and RStudio.
This is my submission for DATA 607 Homework 1 in the CUNY MS in Data Science program. The objectives were to load the data on mushrooms from a website into an R data frame, then create a subset of that data frame with 3 or 4 columns from the original data. Finally, we were tasked with relabeling the data headers and categories into a more readable format. I also added a few visualizations.
Pittsburgh Bridges
This is the test lab for DATA 607 for the CUNY MS in Data Science program.
Crime Data
This is an analysis of arrest records.
Week1 R Bridge
This is the homework for week 1 of the MSDA Bridge program in R.