RPubs

by RStudio

Mansi_joshi

Mansi

Recently Published

Final_Project_Mansi

News Popularity in Social Media: Insights through Statistical Analysis

about 2 years ago

Week 13 | Data Dive — Critiquing Models and Analyses

Think about the context of the lab (Lab8 chosen here) and consider the following:
- Analytical issues, such as model assumptions
- Overcoming biases (existing or potential)
- Possible risks or societal implications
- Crucial issues which might not be measurable

about 2 years ago

Mansi_Data_Dive_Time_based_Data

- Select a column of your data that encodes time (e.g., "date", "timestamp", "year", etc.). Convert this into a Date in R.
- Choose a column of data to analyze over time. This should be a "response-like" variable that is of particular interest.
- Create a tsibble object of just the date and response variable. Then, plot your data over time. Consider different windows of time. What stands out immediately?
- Use linear regression to detect any upwards or downwards trends. Do you need to subset the data for multiple trends? How strong are these trends?
- Use smoothing to detect at least one season in your data, and interpret your results. Can you illustrate the seasonality using ACF or PACF?
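The trend and seasonality steps can be sketched in a few lines. The notebook itself is an R exercise (Date, tsibble, ACF/PACF); what follows is only a language-neutral illustration in plain Python, and the series, the helper names `ols_fit` and `acf`, and the period of 4 are all assumptions made up for the example, not anything taken from the notebook.

```python
from statistics import mean

def ols_fit(y):
    """Least-squares intercept and slope of y against the time index 0, 1, 2, ..."""
    t = list(range(len(y)))
    t_bar, y_bar = mean(t), mean(y)
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    slope = sxy / sxx
    return y_bar - slope * t_bar, slope

def acf(y, lag):
    """Sample autocorrelation of y at the given lag."""
    y_bar = mean(y)
    denom = sum((yi - y_bar) ** 2 for yi in y)
    num = sum((y[i] - y_bar) * (y[i + lag] - y_bar) for i in range(len(y) - lag))
    return num / denom

# Hypothetical series: an upward trend plus a pattern that repeats every 4 steps.
season = [2.0, 0.0, -2.0, 0.0]
series = [0.5 * t + season[t % 4] for t in range(40)]

intercept, slope = ols_fit(series)

# Remove the fitted line so the autocorrelation reflects the seasonal part only.
residuals = [y - (intercept + slope * t) for t, y in enumerate(series)]

acf_lag2 = acf(residuals, 2)  # half a period: expected to be negative
acf_lag4 = acf(residuals, 4)  # a full period: expected to be strongly positive
```

A positive slope indicates an upward trend, and an autocorrelation peak at the seasonal lag is the pattern an ACF plot would show as a recurring spike.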

about 2 years ago

Mansi_Data_Dive_GLMs_Part2

- Build a linear (or generalized linear) model as you like
  - Use whatever response variable and explanatory variables you prefer
- Use the tools from previous weeks to diagnose the model
  - Highlight any issues with the model
- Interpret at least one of the coefficients

about 2 years ago

Mansi_Data_Dive_GLMs

- Select an interesting binary column of data, or one which can be reasonably converted into a binary variable
  - This should be something worth modeling
- Build a logistic regression model for this variable, using between 1 and 4 explanatory variables
- Interpret the coefficients, and explain what they mean in your notebook
- (Bonus) Using the Standard Error for at least one coefficient, build a C.I. for that coefficient, and interpret its meaning
- Consider a transformation for any explanatory variable, and illustrate why you need the transformation (or why you do not)
- Scatter Plots ...
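The bonus step, building a confidence interval from a coefficient's standard error, can be illustrated with a Wald interval. This is a minimal sketch in plain Python rather than the notebook's R code; the coefficient and standard error are hypothetical numbers, and `wald_ci` is a helper name invented for the example.

```python
import math

def wald_ci(estimate, std_error, z=1.96):
    """Approximate 95% Wald interval: estimate plus or minus z standard errors."""
    return estimate - z * std_error, estimate + z * std_error

# Hypothetical logistic regression output (log-odds scale), not from the notebook.
beta, se = 0.80, 0.25

ci_low, ci_high = wald_ci(beta, se)

# Exponentiating the endpoints gives the interval on the odds-ratio scale.
odds_low, odds_high = math.exp(ci_low), math.exp(ci_high)

print(f"log-odds CI:   ({ci_low:.2f}, {ci_high:.2f})")
print(f"odds-ratio CI: ({odds_low:.2f}, {odds_high:.2f})")
```

If the log-odds interval excludes 0 (equivalently, the odds-ratio interval excludes 1), the data are consistent with a real association between that explanatory variable and the binary response.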

about 2 years ago

Data Dive — Regression

Your RMarkdown notebook for this data dive should contain the following:
- Select a continuous (or ordered integer) column of data that seems most "valuable" given the context of your data, and call this your response variable.
  - For example, in the Ames housing data, the price of the house is likely of the most value to both buyers and sellers. This is the thing most people will ask about when it comes to houses.
- Select a categorical column of data (explanatory variable) that you expect might influence the response variable.
- Devise a null hypothesis for an ANOVA test given this situation. Test this hypothesis using ANOVA, and summarize your results. Be clear about how the R output relates to your conclusions.
  - If there are more than 10 categories, consider consolidating them before running the test using the methods we've learned in class.
  - Explain what this might mean for people who may be interested in your data. E.g., "there is not enough evidence to conclude [----], so it would be safe to assume that we can [------]".
- Find at least one other continuous (or ordered integer) column of data that might influence the response variable. Make sure the relationship between this variable and the response is roughly linear.
- Build a linear regression model of the response using just this column, and evaluate its fit.
  - Run appropriate hypothesis tests and summarize their results. Use diagnostic plots to identify any issues with your model.
  - Interpret the coefficients of your model, and explain how they relate to the context of your data. For example, can you make any recommendations about an optimal way of doing something?
- Include at least one other variable into your regression model (e.g., you might use the one from the ANOVA), and evaluate how it helps (or doesn't).
  - Maybe include an interaction term, but explain why you included it.
  - You can add up to 4 variables if you like.

about 2 years ago

Mansi Data Dive — Confidence Interval

## Part 1: Build at least three sets of variable combinations
- For each set of variables, include at least one column that you created (i.e., calculated based on others)
- All variables for this data dive should be either continuous (i.e., numeric) or ordered (e.g., ['small', 'medium', 'large'] is okay, but ["apples", "oranges", "bananas"] is not)
- For each set, there should be one response variable with the others as explanatory variables

## Part 2: Plot a visualization for each response-explanatory relationship, and draw some conclusions based on the plot
- Use what we've covered so far in class to scrutinize the plot (e.g., are there any outliers?)

## Part 3: Calculate the appropriate correlation coefficient for each of these combinations
- Explain why the value makes sense (or doesn't) based on the visualization(s)

## Part 4: Build a confidence interval for each of the response variables. Provide a detailed conclusion of the response variable (i.e., the population) based on your confidence interval.
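Parts 3 and 4 can be sketched as follows. This is an illustration in plain Python, not the notebook's R code: the data are made up, `pearson_r` and `mean_ci` are hypothetical helper names, and the interval uses a normal approximation (for a sample this small, a t-based interval would be the more careful choice).

```python
import math
from statistics import NormalDist, mean, stdev

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mean_ci(values, level=0.95):
    """Normal-approximation confidence interval for the population mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * stdev(values) / math.sqrt(len(values))
    centre = mean(values)
    return centre - half_width, centre + half_width

# Hypothetical explanatory and response values with a near-linear relationship.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

r = pearson_r(x, y)
ci_low, ci_high = mean_ci(y)
```

Pearson's coefficient suits continuous variables with a roughly linear relationship; for ordered categories such as small/medium/large, a rank-based coefficient such as Spearman's would be the usual alternative.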

about 2 years ago

Mansi_Data_Dive_Documentation

1. A list of at least 3 columns (or values) in your data which are unclear until you read the documentation.
   - E.g., this could be a column name, or just some value inside a cell of your data
   - Why do you think they chose to encode the data the way they did? What could have happened if you didn't read the documentation?
2. At least one element of your data that is unclear even after reading the documentation
   - You may need to do some digging, but is there anything about the data that your documentation does not explain?
3. Build a visualization which uses a column of data that is affected by the issue you brought up in bullet #2, above. In this visualization, find a way to highlight the issue, and explain what is unclear and why it might be unclear.
   - You can use color or an annotation, but also make sure to explain your thoughts using Markdown
   - Do you notice any significant risks? If so, what could you do to reduce negative consequences?

about 2 years ago

Mansi Assignment: Sampling n Drawing Conclusions

Your RMarkdown notebook for this data dive should contain the following:
- A collection of 5-10 random samples of data (with replacement) from at least 6 columns of data
  - Each subsample should be roughly 50% as long as your data. We are simulating the act of collecting data from a population where the "population" is represented by the data set you already have.
  - Store each sample set in a separate data frame (e.g., df_i might contain m rows from columns 1-6)
  - These subsamples should include both categorical and continuous (numeric) data
- Scrutinize these subsamples. How different are they? What would you have called an anomaly in one sub-sample that you wouldn't in another? Are there aspects of the data that are consistent among all sub-samples?
- Consider how this investigation affects how you might draw conclusions about the data in the future.
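The sampling step can be sketched as below. This is plain Python for illustration, not the notebook's R code; the rows are made-up records with one categorical and one numeric field, and `draw_samples` is a hypothetical helper name.

```python
import random
from statistics import mean

def draw_samples(rows, n_samples=5, fraction=0.5, seed=0):
    """Draw n_samples subsamples with replacement, each about `fraction` of the data."""
    rng = random.Random(seed)          # fixed seed so the draws are reproducible
    k = round(len(rows) * fraction)
    return [rng.choices(rows, k=k) for _ in range(n_samples)]

# Hypothetical "population": 100 rows with a categorical and a numeric column.
population = [
    {"topic": ["economy", "sports", "tech"][i % 3], "shares": 10 + (i * 7) % 50}
    for i in range(100)
]

samples = draw_samples(population)

# Compare a summary statistic across subsamples to see how much it varies.
sample_means = [mean(row["shares"] for row in s) for s in samples]
population_mean = mean(row["shares"] for row in population)
```

Comparing `sample_means` against `population_mean` shows how much a summary can shift from one subsample to the next, which is the point of asking what counts as an anomaly in one sample but not in another.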

about 2 years ago

Mansi_Assignment2_Week3

Week 3 | Data Dive — Probabilities and Anomalies

over 2 years ago

Assignment: Data Dive - Summaries

RMarkdown notebook for this data dive on News Popularity in Multiple Social Media Platforms

over 2 years ago
