## Recently Published

##### Data 643 Discussion 2

I briefly discuss Hadoop vs. Spark.

##### 621 Final Presentation Slides - Draft

Slides for the analysis in the DATA 621 Final Project.

##### Correlation: Pearson v. Spearman v. Kendall

An example I made for a DATA 621 discussion.

##### Regression Analysis of Crime Statistics

In this report, I analyze crime statistics for Boston in 1978 using Logistic Regression to classify neighborhoods as high crime or low crime. I find that the clearest indicator that a neighborhood is high crime is high air pollution, measured by the concentration of nitrogen oxides, which are produced by fossil fuel combustion.

##### DATA 621 Question

A write-up for a discussion question in DATA 621. The question asks how to explain to others the way outliers can influence regression.

##### DATA 606 Final Project

In this paper, I analyze salary and unemployment data pertaining to college majors. These data were obtained from fivethirtyeight.com's GitHub page.
I find that STEM majors earn more and face less unemployment than Humanities majors, and that gender inequality in these two categories could play a role in the difference in pay.

##### DATA 607 Final Project

This is the final version of the analysis that Chunhui Zhu and I did of world electrical energy production.

##### DATA 607 Final Project Draft

This is a draft of Chunhui Zhu's and my Final Project where we examine the production of electricity and the flow of energy resources.

##### DATA 605 HW15

Our final assignment is about multivariate calculus.

##### DATA 607 Final Project Draft

Draft of the final project for DATA 607 where we analyze energy production and usage for the top 10 economies.

##### DATA 605 Week 15

This week is multivariate calculus.

##### Calculating pi

I use Euler's formula for pi to calculate pi and compare the result to R's built-in value.
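
As a rough illustration of the idea, here is a minimal Python sketch (not the R code from the report). It assumes that "Euler's formula" refers to Euler's solution of the Basel problem, $\sum_{k=1}^{\infty} 1/k^2 = \pi^2/6$; the report may use a different series. The function name `basel_pi` is hypothetical.

```python
import math


def basel_pi(n_terms: int) -> float:
    """Approximate pi from the first n_terms of the Basel series.

    Assumption: uses sum(1/k^2) = pi^2 / 6, so pi ~ sqrt(6 * partial_sum).
    """
    partial = sum(1.0 / (k * k) for k in range(1, n_terms + 1))
    return math.sqrt(6.0 * partial)


if __name__ == "__main__":
    # Compare the approximation to the library constant as terms are added.
    for n in (10, 1_000, 100_000):
        approx = basel_pi(n)
        print(n, approx, abs(approx - math.pi))
```

The series converges slowly: the gap to the library value shrinks roughly in proportion to 1/n, so many terms are needed for a few correct digits.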

##### DATA 605 Week 14

This week's assignment covers series and sequences.

##### DATA 607 Database Migration

I migrate a MySQL database to a Neo4J database.

##### DATA 606 Lab8

This week's lab is about multiple regression.

##### DATA 605 Week 13 part 2

Checking another student's work by request.

##### DATA 606 HW8

This Assignment covers Logistic and Multiple Regression models.

##### DATA 605 Week 13

This week's discussion is solving a Surface Area integral using trigonometric substitution and substitution methods.

##### DATA 605 Week 13 HW

This week's homework is a primer in basic calculus.

##### Data 605 HW12

This week's assignment covers multiple linear regression models and transformations to make data fit a linear regression.

##### Kinematics Presentation

A sample lesson on the derivation and use of kinematic equations.

##### DATA 607 Final Project Proposal

Our proposal for the Final Project.

##### DATA 605 Week 12 part 2

I explain a couple of methods to make non-linear data usable for linear regression.

##### DATA 605 Week 12

This week I perform a multiple regression analysis on Human Resources data from Kaggle.com (https://www.kaggle.com/ludobenistant/hr-analytics/data) to see whether the measured factors predict job satisfaction.

##### DATA 606 HW7

This assignment looks at interpreting linear models and calculating parameters such as the slope from R and the standard deviations.

##### DATA 606 Lab7

We use linear regression to reproduce the analysis done in the movie Moneyball. I find that team Batting Average is the best traditional predictor of runs and On-Base%+Slugging is the best modern predictor of runs. All modern statistics outperformed traditional statistics in predicting runs.

##### DATA 605 HW11

This week we perform linear regression on the braking distance of a car vs. speed, and see that a low p-value alone does not mean the model is valid.

##### DATA 607 Discussion Week 11 - Draft

This week we are tasked with qualitatively reverse engineering a recommender system. Our group selected Grubhub. Essentially, we test what recommendations are made based on our input and conjecture which models Grubhub uses.

##### DATA 605 Week11

We begin linear regression by examining the relationship between video duration and views for TED talks.

##### DATA 607 Project 4

We were tasked with using R's text mining package, 'tm', and supervised learning techniques to classify emails as 'spam' or not. I was able to get >96% accuracy.

##### DATA 607 Context Presentation

This draft of my "Data Science in Context" presentation provides a basic formula for creating a word cloud from a data frame.

##### DATA 605 HW10

This assignment explores the Gambler's Ruin problem using two different strategies: first a constant bet, and then an increasing bet.
Markov Chains, the Binomial Distribution, and Simulations are used.
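
For readers unfamiliar with the problem, here is a minimal Python sketch of the constant-bet case only (not the R code from the assignment). It compares the standard closed-form win probability for a unit-bet random walk with a simple simulation. The function names and the example values (start with 1, target 8, win probability 0.4) are hypothetical and chosen for illustration.

```python
import random


def win_probability(start: int, target: int, p: float) -> float:
    """Closed-form chance of reaching `target` before 0 with unit bets.

    p is the probability of winning a single bet.
    """
    if p == 0.5:
        return start / target
    r = (1.0 - p) / p
    return (1.0 - r ** start) / (1.0 - r ** target)


def simulate(start: int, target: int, p: float, trials: int, seed: int = 0) -> float:
    """Estimate the same probability by playing `trials` games."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    wins = 0
    for _ in range(trials):
        stake = start
        # Bet one unit at a time until the gambler is ruined or hits the target.
        while 0 < stake < target:
            stake += 1 if rng.random() < p else -1
        wins += stake == target
    return wins / trials


if __name__ == "__main__":
    # Hypothetical example values, not taken from the assignment.
    print(win_probability(1, 8, 0.4))
    print(simulate(1, 8, 0.4, trials=20_000))
```

The simulated estimate should land close to the closed-form value; an increasing-bet strategy would need a different transition rule and is not covered by this sketch.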

##### DATA 605 Week10

This week's discussion question involves the Gambler's Ruin problem.

##### DATA 607 HW9

The goal of this assignment is to extract data from the NYT's API in the form of a JSON file and to format it as an R data frame.

##### DATA 606 Lab6

This lab covers inference of proportions.

##### DATA 605 HW9

This week's assignment covers the Central Limit Theorem (CLT) for independent random variables and Moment Generating Functions.

##### DATA 606 HW6

This chapter covers proportion tests, calculating confidence intervals, and Chi-squared tests.

##### DATA 606 HW Chapter 5

This assignment covers Hypothesis testing using Confidence Intervals, t-tests, and ANOVA.

##### DATA 606 Lab5

In this week's lab we calculate confidence intervals and perform hypothesis testing on data about pregnancies.

##### DATA 605 Week9

This week's discussion in DATA 605 tests the Central Limit Theorem for proportions.

##### DATA 607 Project 3 Presentation

Slides for DATA 607 presentation.

##### Data 606 Project Proposal

Minor correction made on the original.

##### Data 606 Project Proposal

This is my proposal for the final project for DATA 606. I will take an in-depth look at income and employment statistics for 173 college majors using data obtained from the fivethirtyeight.com GitHub page.

##### Project 3 Presentation Draft

This is a rough draft of the presentation slides for DATA 607 Project 3.

##### DATA 607 Project 3 Salaries

Extended Silverio's work to include Confidence Intervals, t-tests, and KS tests for salaries. I also adjusted salaries for Cost of Living Index.

##### DATA 605 HW8

This week's assignment covers convolution of discrete and continuous random variables.

##### DATA 605 Week8

In time for the World Series, I calculate the probability of at-bat outcomes given a probability distribution for 4 at-bats.

##### DATA 607 HW7

This week's assignment covers loading data from web-based formats (HTML, XML, and JSON) into R in the form of data frames.

##### DATA 605 HW7

This week's assignment covers important probability densities and distributions, such as the Beta, Geometric, Exponential, Binomial, and Poisson.

##### DATA 606 HW4

This assignment covers calculating confidence intervals, p-values, and hypothesis testing.

##### DATA 606 Lab4b

This lab examines how to define confidence intervals. Please note that the data in this file will be different from the data I had in RStudio while writing, so the answers may not match the graphs and summary statistics.

##### DATA 605 Week7

This week's discussion covers common continuous probability densities and discrete distributions.

##### DATA 607 Project 2

The goal of this project is to take data from three different sources (in this case, two .csv files and one data set scraped from a web page) and use tidyr and dplyr to clean and reorganize the data for further analysis.

##### DATA 605 HW6

This assignment covers combinatorics and probability.

##### DATA 605 Week6

This week's discussion covers combinatorics and conditional probability.

##### DATA_606_Lab4a

This lab explores behaviors of sampling and populations needed to introduce the Central Limit Theorem and Confidence Intervals.

##### DATA 605 HW5

This assignment covers defining probability distributions and calculating probabilities from probability distributions.

##### DATA 607 HW5

We were tasked with tidying and analyzing a data set using R's tidyr and dplyr. I opted to use an SQL database for the starting data instead of a .csv file.

##### DATA 606 HW3

This homework assignment covers probability distributions, namely the Normal, Geometric and Binomial distributions.

##### DATA 605 Week5

This week's topic in DATA 605 is probability distributions. Here I use a basic simulation to examine a popular urban legend about a professor who tricks the students after they try to trick the professor.

##### DATA_606_Lab3

This Lab covers the properties of the Normal Distribution.

##### DATA 605 HW4

This week's homework covers the singular value decomposition (SVD) of a matrix and finding the inverse matrix from its cofactors.

##### DATA 607 Project 1

In this project we were tasked with taking a semi-structured .txt file and creating an R Markdown file that would output a .csv that can be used to populate a SQL database.

##### DATA 605 Week4

This week's topic is Linear Transformations. In this discussion question I show that a transformation is linear.

##### DATA 607 Extra Credit Questio

This was an optional question for the Week 3 HW.

##### DATA_605_HW3

This week's DATA 605 homework covers matrix ranks, eigenvalues, and eigenvectors.

##### DATA_607_HW3

This assignment covers using regular expressions to extract data from files.

##### DATA_606_HW2

This homework set covers Probability and discrete random variables.

##### DATA_605_HW2

This homework covers Matrix operations such as trace, transpose, matrix multiplication, and factorization.

##### DATA_606_Lab2

This lab analyzes Kobe Bryant's shooting performance during the 2009 NBA finals to test the "hot hands" hypothesis.
We find that Kobe performed no better than a simulation where the simulated shooter's hit percentage was set at Kobe's hit percentage.

##### Presentation Question for DATA 606

We have to present a practice problem from the text. I am presenting 1.23 which evaluates the methodology used in a survey.

##### 605 Discussion addition

Something I wanted to add to my discussion for DATA 605.

##### N Cooper 605 Discussion

This is my week 2 discussion question for DATA 605.

##### DATA_606_Lab1

This Lab demonstrates techniques for initial data summaries and visualizations and how to subset data.

##### DATA_606_HW1

My solutions to the first homework set for CUNY DATA 606, Statistics and Probability for Data Analytics.

##### DATA_605_HW1

This is my first homework for DATA 605, Fundamentals of Computational Mathematics. This assignment mostly covers vector and matrix operations and solving systems of equations.

##### DATA_606_Lab0

This is a lab for CUNY DATA 606 to familiarize the student with the functionality of R and RStudio.

##### DATA_607_HW1

This is my submission for Homework 1 of DATA 607 in CUNY's MS in Data Science program. The objectives were to load the data on mushrooms from a website into an R data frame, then create a subset of that data frame with 3 or 4 columns from the original data. Finally, we were tasked with relabeling the data headers and categories into a more readable format. I also added a few visualizations.

##### Pittsburgh Bridges

This is the test lab for DATA 607 for the CUNY MS in Data Science program.

##### Crime Data

This is an analysis of arrest records.

##### Week1 R Bridge

This is the homework for week 1 of the MSDA Bridge program in R.