RPubs

by RStudio

conya1

Njenwieh Onya Clovis

Recently Published

Project 3: Climate Change

Climate change is one of the most fiercely debated scientific issues of the past 20 years. Because human-induced warming is superimposed on a naturally varying climate, the temperature rise has not been, and will not be, uniform or smooth across the country or over time.

over 4 years ago

Suicide Rate Analysis

Suicide presents a major challenge to public health in the United States and worldwide. It contributes to premature death, morbidity, lost productivity, and health care costs. In 2015 (the most recent year of available death data), suicide was responsible for 44,193 deaths in the U.S., which is approximately one suicide every 12 minutes. In 2015, suicide ranked as the 10th leading cause of death in the U.S., and it has been among the top 12 leading causes of death since 1975. Overall suicide rates increased 28% from 2000 to 2015. Suicide is a problem throughout the life span: it is the third leading cause of death for youth 10–14 years of age; the second leading cause among people 15–24 and 25–34 years of age; the fourth leading cause among people 35–44 years of age; the fifth leading cause among people 45–54 years of age; and the eighth leading cause among people 55–64 years of age.
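The "one suicide every 12 minutes" figure can be checked directly from the 2015 death count quoted above. A minimal arithmetic sketch in Python, assuming a 365-day year (2015 was not a leap year):

```python
# Check the "one suicide every 12 minutes" figure from the 2015 death count.
DEATHS_2015 = 44_193                # U.S. suicide deaths in 2015 (from the text)
MINUTES_PER_YEAR = 365 * 24 * 60    # 525,600 minutes in a non-leap year

minutes_between_deaths = MINUTES_PER_YEAR / DEATHS_2015
print(f"{minutes_between_deaths:.1f} minutes between deaths")  # prints "11.9 minutes between deaths"
```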

over 4 years ago

Creating Maps in R

Complete steps 1–10 to create a static map and an interactive map.

over 4 years ago

Web Scraping

1. What is Web Scraping? Web scraping is a technique for converting data that is present on the web in an unstructured format (HTML tags) into a structured format that can easily be accessed and used. Almost all major programming languages provide ways to perform web scraping. In this article, we’ll use R to scrape data on the most popular feature films of 2016 from the IMDb website. We’ll get a number of features for each of the 100 most popular feature films released in 2016. We’ll also look at the most common problems one might face while scraping data from the internet because of inconsistencies in the website code, and at how to solve them.
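The idea of turning HTML tags into structured records can be sketched as follows. The article itself uses R; this is only an illustrative Python version using the standard library, and the HTML snippet, class names and `FilmParser` helper are hypothetical (real IMDb markup is different and changes over time).

```python
from html.parser import HTMLParser

# Hypothetical HTML snippet; real pages (IMDb included) use different markup.
HTML = """
<div class="film"><span class="title">Film One</span><span class="year">2016</span></div>
<div class="film"><span class="title">Film Two</span><span class="year">2016</span></div>
"""

class FilmParser(HTMLParser):
    """Collects the text of <span class="title"> and <span class="year"> into dict rows."""

    def __init__(self):
        super().__init__()
        self.films = []      # structured output: one dict per film
        self._field = None   # name of the span currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "film":
            self.films.append({})            # start a new record
        elif tag == "span" and attrs.get("class") in ("title", "year"):
            self._field = attrs["class"]     # remember which field the text belongs to

    def handle_data(self, data):
        if self._field and self.films:
            self.films[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

parser = FilmParser()
parser.feed(HTML)
films = parser.films
print(films)
```

Inconsistent markup, such as a film with no year span, would leave a key missing from that record, which is the kind of problem the article goes on to discuss.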

over 4 years ago

DSlabs using Highcharter and Ggplot2

According to NASA, global warming is the unusually rapid increase in Earth's average surface temperature over the past century, primarily due to the greenhouse gases released by people burning fossil fuels. The three most important greenhouse gases in the atmosphere are carbon dioxide (CO2), methane (CH4) and nitrous oxide (N2O). While carbon dioxide is the greenhouse gas we hear the most about, methane and nitrous oxide have a greater global warming potential (GWP) and thus warm the climate more quickly, which calls for a response from everyone.

over 4 years ago

Representation of the GDP of Africa's Largest Economy

Details for Nations Dataset Charts Assignment
• For both charts, you will first need to create a new variable in the data, using mutate from dplyr, giving the GDP of each country in trillions of dollars, by multiplying gdp_percap by population and dividing by a trillion.
• Draw both charts with ggplot2.
• For the first chart, you will need to filter the data with dplyr for the four desired countries. When making the chart with ggplot2, you will need to add both geom_point and geom_line layers, and use the Set1 ColorBrewer palette using scale_color_brewer(palette = "Set1").
• For the second chart, using dplyr you will need to group_by region and year, and then summarize on your mutated value for gdp using summarise(GDP = sum(gdp, na.rm = TRUE)). (There will be null values, or NAs, in this data, so you will need to use na.rm = TRUE.)
• Each region’s area will be generated by the command geom_area().
• When drawing the chart with ggplot2, you will need to use the Set2 ColorBrewer palette using scale_fill_brewer(palette = "Set2").
• Think about the difference between fill and color when making the chart, and where the above fill command needs to go in order for the regions to fill with the different colors. Put a very thin white line around each area.
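The data preparation described above (the mutate step, then group_by and summarise with missing values dropped) can be sketched as follows. The assignment itself uses dplyr in R; this is only a Python illustration of the arithmetic, and the sample rows and the `gdp_trillions` helper are hypothetical, not taken from the nations dataset.

```python
from collections import defaultdict

# Hypothetical sample rows; None stands in for the NA values the assignment mentions.
rows = [
    {"country": "A", "region": "North", "year": 2000, "gdp_percap": 20_000.0, "population": 50_000_000},
    {"country": "B", "region": "North", "year": 2000, "gdp_percap": None, "population": 10_000_000},
    {"country": "C", "region": "South", "year": 2000, "gdp_percap": 5_000.0, "population": 200_000_000},
    {"country": "A", "region": "North", "year": 2001, "gdp_percap": 21_000.0, "population": 51_000_000},
]

def gdp_trillions(row):
    """Mirror of the mutate step: gdp_percap * population / 1e12, or None if a value is missing."""
    if row["gdp_percap"] is None or row["population"] is None:
        return None
    return row["gdp_percap"] * row["population"] / 1e12

# Mirror of group_by(region, year) followed by summarise(GDP = sum(gdp, na.rm = TRUE)).
regional_gdp = defaultdict(float)
for row in rows:
    gdp = gdp_trillions(row)
    key = (row["region"], row["year"])
    regional_gdp[key] += gdp if gdp is not None else 0.0  # missing values are skipped

for (region, year), total in sorted(regional_gdp.items()):
    print(region, year, round(total, 3))
```

Without the missing-value handling, the North 2000 total would itself be missing, which is why the assignment insists on na.rm = TRUE.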

over 4 years ago

Document

almost 5 years ago

NYCFlights13

The nycflights13 dataset is a collection of data on the airlines flying from different airports in NYC. It also captures flight-specific and plane-specific details for the year 2013.

almost 5 years ago

Treemaps Heatmaps Streamgraphs and Alluvials

Treemaps display hierarchical (tree-structured) data as a set of nested rectangles. Each branch of the tree is given a rectangle, which is then tiled with smaller rectangles representing sub-branches. A leaf node’s rectangle has an area proportional to a specified dimension of the data. Often the leaf nodes are colored to show a separate dimension of the data. When the color and size dimensions are correlated in some way with the tree structure, one can often easily see patterns that would be difficult to spot in other ways, such as whether a certain color is particularly relevant. A second advantage of treemaps is that, by construction, they make efficient use of space. As a result, they can legibly display thousands of items on the screen simultaneously. The downside of treemaps is that as the aspect ratio is optimized, the order of placement becomes less predictable. As the order becomes more stable, the aspect ratio is degraded. (Wikipedia)
Use Nathan Yau’s dataset from the flowingdata website (http://datasets.flowingdata.com/post-data.txt). You will need the “treemap” and “RColorBrewer” packages.
A heatmap is a literal way of visualizing a table of numbers, where you substitute the numbers with colored cells. There are two fundamentally different categories of heat maps: the cluster heat map and the spatial heat map. In a cluster heat map, magnitudes are laid out into a matrix of fixed cell size whose rows and columns are discrete categories, and the sorting of rows and columns is intentional. The size of the cell is arbitrary but large enough to be clearly visible. By contrast, the position of a magnitude in a spatial heat map is forced by the location of the magnitude in that space, and there is no notion of cells; the phenomenon is considered to vary continuously. (Wikipedia)
A streamgraph is a variation of a stacked area graph, but instead of plotting values against a fixed, straight axis, a streamgraph has values displaced around a varying central baseline. Streamgraphs display the changes in data over time of different categories through the use of flowing, organic shapes that somewhat resemble a river-like stream. This makes streamgraphs aesthetically pleasing and more engaging to look at. The size of each individual stream shape is proportional to the values in each category. The axis that a streamgraph flows parallel to is used for the timescale. Color can be used either to distinguish each category or to visualize each category’s additional quantitative values by varying the color shade. Streamgraphs are ideal for displaying high-volume datasets, in order to discover trends and patterns over time across a wide range of categories. For example, seasonal peaks and troughs in the stream shape can suggest a periodic pattern. A streamgraph could also be used to visualize the volatility for a large group of assets over a certain period of time. The downside of streamgraphs is that they suffer from legibility issues, as they are often very cluttered. The categories with smaller values are often drowned out to make way for categories with much larger values, making it impossible to see all the data. Also, it’s impossible to read the exact values visualized, as there is no axis to use as a reference.

almost 5 years ago

Hate Crimes in NY from 2010-2016

almost 5 years ago

Practicing programming with R

This piece of work will help anyone begin practicing programming. It is almost entirely the work of my professor, except for the last two histogram plots. The "airquality" dataset is a built-in dataset in R.

almost 5 years ago
