RPubs

by RStudio

UniversitySolutionsHub

University Solutions Hub

Recently Published

Week 5 BDT

Read this article about doing statistics with categorical variables. Write at least 500 words discussing how to use these statistics to help understand big data.

almost 2 years ago

Write at least 500 words discussing one or more use cases for Spark. Use at least three sources. Include at least three quotes from your sources, enclosed in quotation marks and cited in-line by reference to your reference list. Example: "words you copied" (citation). Each quote should be one full sentence, not altered or paraphrased. Cite your sources using APA format. Use the quotes in your paragraphs. Write in essay format, not in bulleted, numbered, or other list format.

about 2 years ago

Week 2 BDT

Write at least 500 words discussing what Spark is and does. Explain what problems it solves. Use at least three sources. Include at least three quotes from your sources, enclosed in quotation marks and cited in-line by reference to your reference list. Example: "words you copied" (citation). Each quote should be one full sentence, not altered or paraphrased. Cite your sources using APA format. Use the quotes in your paragraphs.

about 2 years ago

Week 1 BDT

Write at least 500 words on what 'Big' means in Big Data. What exposure have you had to Big Data? Use at least three sources. Include at least three quotes from your sources, enclosed in quotation marks and cited in-line by reference to your reference list. Example: "words you copied" (citation). Each quote should be one full sentence, not altered or paraphrased. Cite your sources using APA format. Use the quotes in your paragraphs. Write in essay format, not in bulleted, numbered, or other list format.

about 2 years ago

Week 15 ML

Optimization of regression models. Write a summary of the three most useful algorithms learned during the course.

about 2 years ago

Week 14 ML

Human Activity Recognition Final Case Analysis

about 2 years ago

Week 13 ML

Optimization of regression models. Describe various aspects of the DBSCAN and mixture clustering methods. Describe the process of anomaly detection using clustering.

about 2 years ago

Week 12 ML

Clustering techniques, Part 1. Describe in detail the difference between SOM and LLE. Which technique do you think is more effective than K-means clustering?

about 2 years ago

Week 11 ML

Unsupervised Learning using Dimension Reduction

about 2 years ago

Week 10 ML

Nutrition Case Study. Write a formal report on your findings from the last several weeks for the regression problem of the Nutrition Case Study. A sample template for the final report is provided that contains the minimum requirements for the report, including the following sections: Introduction, Analysis and Results, Methodology, Limitations, and Conclusion. The main objective is to write a fully executed R Markdown program that performs regression prediction using the best models found and compares their cost functions and R-squared values. Make sure to describe the final hyperparameter settings of all algorithms that were used for comparison purposes.

about 2 years ago

Week 9 ML

Optimization of regression models. Describe various aspects of machine learning training for regression, such as the cost function, gradient descent, and the bias-variance tradeoff.

about 2 years ago

Week 8 ML

Preventing overfitting. Describe the difference between the LASSO and Ridge regression techniques. Do you have a preference for one over the other when predicting the response variable in the Nutrition case study?

about 2 years ago

Week 7 ML

Ensemble Methods. Describe how linear regression models are different from nonlinear models. Also describe the main idea behind polynomial terms in a nonlinear model.

about 2 years ago

Week 6 ML

Santander Bank Case Study. Write a formal report on your findings from the last several weeks for the classification problem of the Santander Bank Case Study. The main objective is to write a fully executed R Markdown program that performs classification using the best models found for the logistic regression, SVM, Random Forest, and XGBoost algorithms, and compares the values of their cost functions and accuracy scores. Make sure to describe the final hyperparameter settings of all algorithms that were used for comparison purposes. Clean and merge all the files using the proper IDs discussed in the second week. Create one master data file to be analyzed for case study 2. Now perform EDA and share your findings in the form of an R Markdown report.

about 2 years ago

Week 5 ML

Ensemble Methods. Describe the fundamental difference between a Random Forest and a boosted tree. Describe the out-of-bag sample and the cost functions for both types of models.

about 2 years ago

Week 4 ML

Support Vector Machine. Describe a kernel function in SVM. Also describe the process of feature engineering and boundary creation in SVM.

about 2 years ago

Week 3 ML

Logistic Regression

about 2 years ago

Week 2 ML

Challenges of ML. Visit the Kaggle website for the Santander Bank classification challenge and describe the main problem. Also describe various challenges and limitations of this case analysis.

about 2 years ago

Week 1 ML

You are expected to be able to program in R prior to taking this class. Use the Titanic dataset and perform EDA on various columns. Without using any modeling algorithms, and using only basic methods such as frequency distributions, describe the most important predictors of survival for Titanic passengers. For example, were males or females more likely to survive? Were young, rich females more likely to survive than old, poor males?

about 2 years ago

Week 15 VA

about 2 years ago

Week 14 VA

Refinements in ggplot

about 2 years ago

Week 13 VA

Find a graph of COVID-19 disease or economic data in a newspaper, journal, or website and recreate it. Find a map of COVID-19 disease or economic data in a newspaper, journal, or website and recreate it.

about 2 years ago

Week 12 VA

about 2 years ago

Week 11 VA

about 2 years ago

Week 10 VA

Load the broom library. Use tidy() on the out dataframe to produce a new dataframe of component-level information. Store the result in out_comp. Round all the columns to two decimal places using round_df(). Produce a flipped scatter plot of Term v. Estimate. Produce a new tidy output of out including confidence intervals. Store it in a variable called out_conf after rounding the dataframe to two decimals. Remove the intercept column and the term continent from the label, make a plot of points with whiskers to show the coefficients with a confidence range, and order the output from smallest to largest. Use the head function to see the first six rows after applying the augment function to out. Store the result in out_aug. Add the data back into out_aug with the data = argument. Plot the .fitted data v. the .resid data. What does this graph show? Using the pipe, round the output of glance(out).

about 2 years ago

Week 9 VA

about 2 years ago

Week 8 VA

Return to the visualization for Presidential Elections: Popular and Electoral College margins, subset by party, and use that to add color to your points. Recreate Figure 5.28 using functions from the dplyr library. Using gss_sm data, calculate the mean and median number of children by degree. Using gapminder data, create a boxplot of life expectancy over time. Using gapminder data, create a violin plot of population over time.

about 2 years ago

Recently Published

Week 5 BDT

Week 4 BDT

Week 3 BDT

Week 2 BDT

Week 1 BDT

Week 15 ML

Week 14 ML

Week 13 ML

Week 12 ML

Week 11 ML

Week 10 ML

Week 9 ML

Week 8 ML

Week 7 ML

Week 6 ML

Week 5 ML

Week 4 ML

Week 3 ML

Week 2 ML

Week 1 ML

Week 15 VA

Week 14 VA

Week 13 VA

Week 12 VA

Week 11 VA

Week 10 VA

Week 9 VA

Week 8 VA

Week 7 VA

Week 6 VA

Week 5 VA

Week 4 VA

Visual Analytics Week 1