gravatar

dhnanjay

Dhananjay Kumar

Recently Published

Data 608
DATA 609 - HW 4
HW 3
DATA 609: Week2 Homework
DATA609 - HW1
Netflix Recommendation Engine
Final Project - DATA 605
HW14
HW-12
NUMERICAL INTEGRATION AND DIFFERENTIATION
HW 12
HW 11 Data 605
Data 605 - hw10
HW 9 - Data 605
HW 7 - Data 605
HW8 - Data 605
Assignment 6
hw5
DATA 605 - Assignment 3
Data 605 Assignment 1
Final Project Data607
Data606 Presentation
Final Project Data606
Multiple linear regression
Multiple linear regression Lab 8
Data606 - Project Proposal
Introduction to linear regression
Introduction to linear regression (DATA606 - Lab 7)
Inference for Categorical Data
Inference for categorical data - Lab 6 (DATA606)
Lab 5
Inference for numerical data
Lab 4b
Foundations for statistical inference - Confidence intervals
Data606 Lab 4a
Foundations for statistical inference - Sampling distributions
NoSQL Migration
MySQL to MongoDB Migration using R
Week 10 Assignment: Document Classification
It can be useful to be able to classify new “test” documents using already classified “training” documents. A common example is using a corpus of labeled spam and ham (non-spam) e-mails to predict whether or not a new document is spam. For this project, you can start with a spam/ham dataset, then predict the class of new documents (either withheld from the training dataset or from another source such as your own spam folder). One example corpus: https://spamassassin.apache.org/publiccorpus/
Web APIs
Our task for the week 9 assignment is to choose one of the New York Times APIs, construct an interface in R to read in the JSON data, and transform it to an R dataframe. For this assignment I opted for the Best Seller Books API from New York Times.
Working with XML and JSON in R
Pick three of your favorite books on one of your favorite subjects. At least one of the books should have more than one author. For each book, include the title, authors, and two or three other attributes that you find interesting. Take the information that you’ve selected about these three books, and separately create three files which store the book’s information in HTML (using an html table), XML, and JSON formats (e.g. “books.html”, “books.xml”, and “books.json”).
Tidying and Transforming Data
Assignment 1. Create a .CSV file that includes the Airline information . 2. Read the information from your .CSV file into R, and use tidyr and dplyr as needed to tidy and transform your data. 3. Perform analysis to compare the arrival delays for the two airlines.
Project 1: Chess Tournament
In this project, you’re given a text file with chess tournament results where the information has some structure. Your job is to create an R Markdown file that generates a .CSV file (that could for example be imported into a SQL database) with the following information for all of the players: Player’s Name, Player’s State, Total Number of Points, Player’s Pre-Rating, and Average Pre Chess Rating of Opponents
Regular Expression
Solution for problems 3, 4 and 9 from chapter 8 of Automated Data Collection in R.
Movie Ratings
Choose six recent popular movies. Ask at least five people that you know (friends, family, classmates, imaginary friends) to rate each of these movie that they have seen on a scale of 1 to 5. Take the results (observations) and store them in a SQL database. Load the information into an R dataframe.
R-Week-5 Final Project
R Final Presentation
R-Week 4 assignment
Week 4 Assignment Your assignment should be published to RPubs.com. Place your script in a public repository on GitHub and submit a link to the URL here. You may work with up to three partners of your own choosing on this assignment. Group work should be more impressive.  Exploratory Data Analysis in R. Choose an interesting dataset and use R graphics to describe the data. You may use base R graphics, or a graphics package of your choice. You should include at least one example of each of the following: • histogram • boxplot • scatterplot Do the graphics provide insight into any relationships in the data? You may use R code that already exists in a book or a web site as a starting point for your work. You need to cite your source, and you need to clearly delineate what you were given to start with, and what is your own original analysis, so that your grade can be based on your own work! This is an open-ended assignment. Full credit will be given for any reasonable effort, but as with most technical learning work, the more sustained effort over time that you invest, the more you will get out of it. If you don’t know where to start, there is a list of over 700 R datasets here: http://vincentarelbundock.github.io/Rdatasets/. You can of course also load data from another source, such as a .CSV file. You’ll want to pick a dataset where it makes sense to use histograms, boxplots, and scatterplots.