Recently Published
Reformatted Report for General Readers
Produce a version of your recent report on voter data that is formatted for a non-technical reader.
Hide all code, and improve the formatting of the output so that it reads like an informative document on the topic you are studying. Use every formatting option available to you so that the output is presented in a professional, neat, and well-organized manner.
Use Markdown syntax to format the text in your report.
Use the ggplot2 and ggthemes packages to produce polished visualizations.
Use the knitr and kableExtra packages to format tables cleanly.
Use chunk options to hide messages, warnings, and code from the output.
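Here is a minimal sketch of how these requirements are often met in an R Markdown file. It assumes the knitr package is installed. The `opts_chunk$set()` call would normally go in a setup chunk declared with `include=FALSE` at the top of the .Rmd file, so every later chunk hides its code, messages, and warnings. The `turnout_summary` table is a hypothetical placeholder, not output from the voter data.

```r
library(knitr)

# Global chunk options: hide code (echo), messages, and warnings in all later chunks.
# In the .Rmd file this belongs in a setup chunk declared with include=FALSE.
opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE)

# Hypothetical placeholder table, used only to illustrate kable() formatting.
turnout_summary <- data.frame(
  group      = c("Group A", "Group B"),
  mean_score = c(3.412, 2.987)
)

# kable() renders a data frame as a formatted table; digits rounds numeric columns.
formatted <- kable(
  turnout_summary,
  col.names = c("Group", "Mean score"),
  digits    = 2,
  caption   = "Mean score by group (illustrative values)"
)
formatted

# With kableExtra installed, the table could be styled further, for example:
# kableExtra::kable_styling(formatted, bootstrap_options = c("striped", "hover"),
#                           full_width = FALSE)
```

Setting the options once, globally, keeps individual chunks uncluttered. A single chunk can still override them (for example with `echo = TRUE`) if one code listing should be shown.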
Conducting Analysis & Running Statistical Tests
Step 1: State your research question.
Step 2: Describe the variables you will use to answer your research question. Be sure to include at least one variable of each type (categorical and continuous).
Step 3: Prepare your data by recoding those variables.
Step 4: Generate crosstabs or tables comparing averages that help answer your research question.
Step 5: For comparisons of a categorical variable with a continuous variable, generate sampling distributions that show how the distributions of sample means differ between groups. Perform t-tests to determine whether the difference in mean scores is statistically significant.
Step 6: For comparisons of two categorical variables, perform a chi-squared test to determine whether the variables are independent of one another. Present the table of frequencies expected under the null hypothesis (complete independence) alongside the table of actual (observed) frequencies.
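The sketch below illustrates Steps 5 and 6 in base R. It runs on a simulated, hypothetical data frame (`survey`, with made-up columns `party`, `age`, and `voted`) rather than the actual voter data, so its p-values carry no substantive meaning. Bootstrap resampling is used to approximate the sampling distributions. This is one common approach and an assumption here, not necessarily the method used in class.

```r
set.seed(333)  # reproducible illustration

# Hypothetical respondents: 'party' and 'voted' are categorical, 'age' is continuous.
n <- 400
survey <- data.frame(
  party = factor(sample(c("Democrat", "Republican"), n, replace = TRUE)),
  age   = round(rnorm(n, mean = 48, sd = 15)),
  voted = factor(sample(c("Yes", "No"), n, replace = TRUE), levels = c("Yes", "No"))
)

# Step 5a: bootstrap approximation of each group's sampling distribution of the mean.
boot_means <- function(x, reps = 1000) {
  replicate(reps, mean(sample(x, replace = TRUE)))
}
dem_means <- boot_means(survey$age[survey$party == "Democrat"])
rep_means <- boot_means(survey$age[survey$party == "Republican"])
# hist(dem_means); hist(rep_means)  # compare the two distributions visually

# Step 5b: two-sample t-test (Welch's version is t.test's default) for mean age by party.
age_test <- t.test(age ~ party, data = survey)
age_test$p.value

# Step 6: chi-squared test of independence between two categorical variables.
observed <- table(survey$party, survey$voted)
chi <- chisq.test(observed)  # 2x2 tables get Yates' continuity correction by default
chi$expected                 # frequencies expected under the null hypothesis of independence
chi$observed                 # actual frequency distribution
chi$p.value
```

`t.test()` defaults to Welch's test, which does not assume equal variances in the two groups. `chisq.test()` applies Yates' continuity correction to 2x2 tables unless `correct = FALSE` is passed.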
DATA 333 - Recoding Variables
Look through the guide to the 2018 Survey data to become familiar with the variables available in this dataset.
Come up with a research question that you could answer using some of the variables in this data.
Specifically, identify 5-6 variables that you might use to answer your research question.
Recode these variables from their numeric form to their labeled form.
Generate some tables (crosstabs, averages) that might help you begin answering the research question you identified above.
Post your file on RPubs, and submit the file here by the deadline.
Be sure that your post includes:
Research Question
List of variables you will use to answer the research question.
Code showing how the variables above were recoded from their numeric to their labeled formats.
Tables showing preliminary investigation of the relationships between variables.
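Here is a minimal base-R sketch of recoding. The column names (`gender_code`, `vote_code`) and code values are hypothetical. The real names and codes must be taken from the 2018 survey guide.

```r
# Hypothetical numeric codes; check the survey guide for the real variable names and codes.
voters <- data.frame(
  gender_code = c(1, 2, 2, 1, 2, 1),
  vote_code   = c(1, 1, 2, 2, 1, 3)
)

# Recode numeric codes to labeled factors: 'levels' lists the codes and
# 'labels' gives the matching text, in the same order.
voters$gender <- factor(voters$gender_code,
                        levels = c(1, 2),
                        labels = c("Male", "Female"))
voters$vote <- factor(voters$vote_code,
                      levels = c(1, 2, 3),
                      labels = c("Candidate A", "Candidate B", "Did not vote"))

# Preliminary crosstab of the two recoded variables.
table(voters$gender, voters$vote)
```

Any code not listed in `levels` (for example, a "skipped" or "don't know" code) becomes `NA`. This is often the intended treatment for non-substantive responses, but it is worth checking against the guide.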
DATA 333 - Assignment 10
Choose 3 of the continuous variables from the voter dataset.
Also, choose a categorical variable from the dataset that will segment respondents into two or more groups. For example, the gender variable segments respondents into two groups: Male and Female.
Load the voter data into R with read_csv().
Use group_by(), summarize(), and mean() to compare how subsets of respondents differ in their mean scores for the continuous variables that you selected.
Use ggplot to create histograms that show how subsets of respondents differ in their distributions of the continuous variables that you selected.
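This sketch of the Assignment 10 workflow assumes the readr, dplyr, and ggplot2 packages are installed. To stay self-contained, it writes a small simulated CSV to a temporary file and reads it back with `read_csv()`. The columns `gender`, `score_1`, `score_2`, and `score_3` are hypothetical stand-ins for the categorical and continuous variables chosen from the voter dataset.

```r
library(readr)
library(dplyr)
library(ggplot2)

# Self-contained stand-in for the voter file: write a small simulated CSV, then read it back.
set.seed(10)
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(
  gender  = sample(c("Male", "Female"), 200, replace = TRUE),
  score_1 = rnorm(200, mean = 50, sd = 10),
  score_2 = rnorm(200, mean = 60, sd = 12),
  score_3 = rnorm(200, mean = 40, sd = 8)
), csv_path, row.names = FALSE)

voters <- read_csv(csv_path)

# Mean of each continuous variable for each subset of respondents.
group_means <- voters %>%
  group_by(gender) %>%
  summarize(
    mean_score_1 = mean(score_1, na.rm = TRUE),
    mean_score_2 = mean(score_2, na.rm = TRUE),
    mean_score_3 = mean(score_3, na.rm = TRUE)
  )
group_means

# Overlaid histograms of one continuous variable, split by group.
score_hist <- ggplot(voters, aes(x = score_1, fill = gender)) +
  geom_histogram(bins = 30, alpha = 0.5, position = "identity") +
  labs(x = "score_1", y = "Number of respondents", fill = "Gender")
```

Overlaid histograms with `alpha` transparency work well for two groups. With more groups, `facet_wrap(~ gender)` (one panel per group) is usually easier to read.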
DATA 333 - Assignment 9
Modify your recent Voter Data Analysis by reordering the levels of factor variables so that they appear in your tables and charts in their logical order, making them easier to interpret.
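By default, R orders factor levels alphabetically, which scrambles ordinal scales. Here is a base-R sketch with hypothetical response labels:

```r
# Hypothetical labeled responses whose alphabetical order is not their logical order.
approval <- c("Somewhat approve", "Strongly disapprove", "Strongly approve",
              "Somewhat disapprove", "Strongly approve")

levels(factor(approval))  # default: levels sorted alphabetically

# Reorder levels explicitly so tables and charts follow the scale's logical order.
approval_f <- factor(approval,
                     levels = c("Strongly approve", "Somewhat approve",
                                "Somewhat disapprove", "Strongly disapprove"))
table(approval_f)         # counts now run from most to least approving
```

ggplot2 also uses the level order for discrete axes and legends, so reordering once fixes both tables and charts. `forcats::fct_relevel()` is a tidyverse alternative.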
Assignment 8: Distributions with group_by(), summarize(), and mutate()
Replicate the analysis that you conducted for Assignment 7, this time using group_by(), summarize(), and mutate() to generate percent distributions rather than table() and prop.table().
Other than the change from table()/prop.table() to group_by()/summarize()/mutate(), follow all of the same instructions listed for Assignment 7.
For 1 point of extra credit, produce a bar chart to visualize the relationships illustrated by each of your tables.
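This sketch shows the group_by()/summarize()/mutate() pattern. It assumes dplyr and ggplot2 are installed and uses a simulated, hypothetical data frame with made-up columns `age_group` and `voted`. It produces row percentages within each age group, matching `100 * prop.table(table(age_group, voted), margin = 1)`.

```r
library(dplyr)
library(ggplot2)

# Hypothetical labeled survey extract.
set.seed(8)
survey <- data.frame(
  age_group = sample(c("18-29", "30-44", "45-64", "65+"), 300, replace = TRUE),
  voted     = sample(c("Yes", "No"), 300, replace = TRUE)
)

# Count each combination, then convert counts to percentages within each age group.
vote_dist <- survey %>%
  group_by(age_group, voted) %>%
  summarize(n = n(), .groups = "drop_last") %>%  # result stays grouped by age_group
  mutate(pct = 100 * n / sum(n)) %>%             # sum(n) is computed per age_group
  ungroup()
vote_dist

# Extra credit: bar chart of the same distribution.
vote_plot <- ggplot(vote_dist, aes(x = age_group, y = pct, fill = voted)) +
  geom_col(position = "dodge") +
  labs(x = "Age group", y = "Percent of age group", fill = "Voted")
```

Grouping by `voted` first instead (`group_by(voted, age_group)`) would give column-style percentages within each vote category.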
DATA 333 - Assignment 7
1. Import the attached .csv file [Abbreviated Dataset Labeled(October Only).csv] into R.
2. Use the head() command to preview the data and confirm that it has been imported.
3. Generate crosstabs to investigate the relationships between variables of your choice. At a minimum, you should investigate 6 different variables. Use column percentages or row percentages, as appropriate, to identify meaningful relationships between variables.
4. What are the tables telling you? Interpret your crosstabs in plain language, describing the most important relationships that each crosstab uncovers.
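Here is a base-R sketch of steps 2 and 3. The attached file is not available here, so the code builds a small simulated, hypothetical stand-in with made-up columns `party` and `vote`. The commented `read.csv()` line shows where the real file would be loaded.

```r
# With the real data: voters <- read.csv("Abbreviated Dataset Labeled(October Only).csv")
set.seed(7)
voters <- data.frame(
  party = sample(c("Democrat", "Republican", "Independent"), 250, replace = TRUE),
  vote  = sample(c("Candidate A", "Candidate B"), 250, replace = TRUE)
)
head(voters)  # preview the first six rows

tab <- table(voters$party, voters$vote)

# Row percent: each row sums to 100 (how each party's respondents split between candidates).
round(100 * prop.table(tab, margin = 1), 1)

# Column percent: each column sums to 100 (party makeup of each candidate's voters).
round(100 * prop.table(tab, margin = 2), 1)
```

Which margin to use depends on the question. Compare row percentages when the row variable is the explanatory (grouping) variable, and column percentages when the column variable is.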
DATA 333 - Assignment 5
Choose 1 of the datasets that we have used in class:
- gapminder data from the gapminder package
- flights from the nycflights13 package
- murders from the dslabs package
Summarize this data to arrive at a noteworthy insight. Then use the ggplot2 package to produce a visualization that highlights this relationship.
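One possible starting point for the gapminder option assumes the gapminder, dplyr, and ggplot2 packages are installed. The sketch below summarizes average life expectancy by continent and year and plots the trend. It illustrates the summarize-then-plot workflow and makes no claim about which insight the assignment expects.

```r
library(gapminder)
library(dplyr)
library(ggplot2)

# Unweighted mean life expectancy across countries, per continent and survey year.
continent_trend <- gapminder %>%
  group_by(continent, year) %>%
  summarize(mean_life_exp = mean(lifeExp), .groups = "drop")

# One line per continent shows how average life expectancy changed over time.
trend_plot <- ggplot(continent_trend,
                     aes(x = year, y = mean_life_exp, color = continent)) +
  geom_line() +
  labs(x = "Year", y = "Mean life expectancy (years)", color = "Continent")
```

These averages weight every country equally, regardless of population. A population-weighted mean (`weighted.mean(lifeExp, pop)`) would answer a slightly different question.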
Voice Recognition Software
DATA 333 - Assignment 2
Identify any aspect of society that has been, or could be, affected by modern data science/analytics. Gather 2-3 sources (news articles, peer-reviewed journal articles, etc.) to help you develop an informed understanding of this topic. Please be sure to provide links to your sources at the bottom of your post.
Write 2-3 sentences describing some of the potential advantages that modern data science affords us in this area.
Write 2-3 sentences describing some of the potential risks posed by modern data science in this area.