Recently Published
Tests
https://bookdown.org/yihui/rmarkdown/basics.html
First Quarto Document
Workshop: Producing Automated Reports with R Markdown/Quarto
Amandine BLIN <amandine.blin@mnhn.fr>
Report with R Markdown
Amandine Blin
UAR 2700 2AD, Service Analyse de Données
Demo Iris
Amandine Blin
UAR 2700 2AD
Service Analyse de Données
Pôle Analyse de Données
Demo 1
Amandine Blin
UAR 2700 2AD
Service Analyse de Données
Pôle Analyse de Données
Methods and Skills for Sustainability
A Quick Demo on How to Plot SDGs on Maps, with R
SCGIS6
Geoprocessing 1 - Geometric Measurements
SCGIS5
Getting data with APIs
SCGIS4
Wrangling 2D Data with dplyr
SCGIS3
Import raster GIS data into R
SCGIS2
Use tmap to make a map
SCGIS
How to import various types of vector GIS data into R.
Multiple Testing
[mySWIRL notes: Statistical_Inference](https://github.com/DataScienceSpecialization/courses/)
Multiple testing arises when you use data to test several hypotheses at once: for example, if we set $\alpha = 0.05$, then on average one in every 20 true-null hypotheses tested will come out as an error, so even a significantly low p-value could be a false positive. In these cases error measures come into play, especially for big-data analysis. The questions asked are:
_"Which of the variables matter among the thousands should be measured?"_
_"How do we relate unrelated information?"_
P for POWER
library(knitr)    # for creating a PDF document
library(ggplot2)  # for making plots
library(reshape2) # for reshaping data frames
Power is the probability of rejecting the null hypothesis when it is false.
- Power is used to determine whether your sample size was big enough to yield a meaningful, rather than random, result.
- It helps detect whether your alternative hypothesis is true, lowering the risk of a Type II error.
As $\beta$ is the probability of a _Type II error, accepting a false null hypothesis_, power is its complement:
\[
P = 1 - \beta
\]
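In R, power can be computed with the base stats function power.t.test(); the numbers below are illustrative, not taken from the post:

```r
# Power of a one-sided, one-sample t-test for illustrative inputs
power.t.test(n = 16, delta = 2, sd = 4, sig.level = 0.05,
             type = "one.sample", alternative = "one.sided")$power

# Or solve for the sample size needed to reach 80% power
power.t.test(power = 0.8, delta = 2, sd = 4, sig.level = 0.05,
             type = "one.sample", alternative = "one.sided")$n
```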
Basic Inferential Data Analysis Exercise
The project consists of two parts: (i) [a simulation exercise](https://rpubs.com/lindangulopez/709073) and (ii) a basic inferential data analysis; the latter is presented here. The hypothesis tested was that `the dosage and supplement do not affect tooth length`; the alternative is that they do. The population was assumed to be near-normally distributed. A p-value of 0.03032 was found when comparing orange juice to vitamin C, so we can reject the null hypothesis. Further investigation showed that orange juice is linked to greater tooth growth length at dose = 0.5 mg and dose = 1.0 mg, but that there is no significant difference in tooth length at dose = 2.0 mg, where the p-value was almost 1, at 0.9639.
Exponential Distribution Simulation Exercise
The hypothesis tested was that `the sampling distribution of the exponential distribution is normal, with a mean that matches the population mean and a variance that matches the theoretical result`. The simulation confirmed this: the generated distribution's mean matched the population mean and its variance matched the theoretical result; in addition, the distributions had similar values at the 5%, 25%, 50%, 75% and 95% quantiles.
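A minimal sketch of such a simulation, assuming the usual course settings of $\lambda = 0.2$ and $n = 40$:

```r
# Sampling distribution of the mean of 40 exponentials
set.seed(42)
lambda <- 0.2; n <- 40
means <- replicate(1000, mean(rexp(n, rate = lambda)))
mean(means)   # close to the population mean 1/lambda = 5
var(means)    # close to the theoretical (1/lambda)^2 / n = 0.625
quantile(means, c(.05, .25, .5, .75, .95))
```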
The Central Limit Theorem, a Swiss Army Knife of Statistics
The project consists of two parts: (i) a simulation exercise and (ii) a basic inferential data analysis; the former is presented here.
Asymptotics form the basis for the frequency interpretation of probabilities, describing the behavior of statistics as the sample size, or some other relevant quantity, limits to infinity or to zero. These limits are [the Swiss Army knives of statistics](https://github.com/bcaffo/courses/raw/master/06_StatisticalInference/07_Asymptopia/index.pdf), Brian Caffo.
Simulations were made in R 4.0 to investigate the asymptotic distribution of the exponential distribution, a continuous case, compared to test statistics which are expected to be Gaussian, a strong form of the [Central Limit Theorem](https://youtu.be/hgtMWR3TFnY). Results show that, just as coverage improves with greater $n$ in the CLT, increasing the rate parameter $\lambda$ improves the coverage and adherence to the CLT.
Power
Worked Examples
Bootstrap in action
Resampling Basics
multiple-comparisons
https://www.coursera.org/learn/statistical-inference/lecture/7c7Ns/12-01-multiple-comparisons
Power Basics
When designing an experiment we usually know $\mu_0$ and $\alpha$, and we want to know whether we have, or can get, enough data at the power we want. The more power the better the experimental design; the probability of rejecting the null when it is in fact false is called the power:
Power = 1 - $\beta$
Practice Exercises, S_p & CI
We know that S_p = sqrt(((n_x - 1) * S_x^2 + (n_y - 1) * S_y^2)/(n_x + n_y - 2)), and we know that CI = mu_x - mu_y + c(-1, 1) * qnorm(quantile) * S_p * (1 / n_x + 1 / n_y)^.5, so we can plug in ...
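A quick plug-in with made-up summary statistics (all the values below are hypothetical):

```r
# Hypothetical two-group summary statistics
n_x <- 10; n_y <- 10; S_x <- 2; S_y <- 3; mu_x <- 5; mu_y <- 3
S_p <- sqrt(((n_x - 1) * S_x^2 + (n_y - 1) * S_y^2) / (n_x + n_y - 2))
mu_x - mu_y + c(-1, 1) * qnorm(0.975) * S_p * (1 / n_x + 1 / n_y)^.5
```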
How to Generate & Interpret P Values.
The question motivating p-values is: given some null hypothesis concerning our data, how unusual or extreme is the sample value, for example the sample mean, that we get from our data?
Is our [test statistic](https://rpubs.com/lindangulopez/702246) consistent with our hypothesis? There are, implicitly, three steps we have to take to answer these types of questions.
- Create a null hypothesis
- Calculate a test statistic from the given data
- Compare the test statistic to the hypothetical distribution
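The three steps, sketched on a built-in dataset (the null value of 20 is just an assumed example):

```r
# H0: the mean mpg in mtcars is 20
tt <- t.test(mtcars$mpg, mu = 20)  # calculates the test statistic from the data
tt$statistic                       # the observed T statistic
tt$p.value                         # how extreme it is under the hypothetical t distribution
```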
Making decisions about populations using observed data.
An important concept in hypothesis testing is `the NULL hypothesis`, usually denoted as H_0:
This is the hypothesis that represents the status quo, which is assumed to be true.
It's a baseline against which you're testing alternative hypotheses, usually denoted by H_a.
- Statistical evidence is required to reject H_0 in favor of the research or alternative hypothesis.
Statistical methods for dealing with large & small datasets.
Central Limit Theorem (CLT) - Z statistic - Student’s or Gosset’s t distribution - t confidence intervals
P Values
P-values are a convenient way to communicate the results of a hypothesis test. When communicating a P-value, the reader can perform the test at whatever Type I error rate that they would like. Just compare the P-value to the desired Type I error rate and if the P-value is smaller, reject the null hypothesis.
Formally, the P-value is the probability of getting data as or more extreme than the observed data in favor of the alternative. The probability calculation is done assuming that the null is true. In other words if we get a very large T statistic the P-value answers the question "How likely would it be to get a statistic this large or larger if the null was actually true?". If the answer to that question is "very unlikely", in other words the P-value is very small, then it sheds doubt on the null being true, since you actually observed a statistic that extreme.
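For example, the tail probability for an observed T statistic can be read straight from pt(); the statistic and degrees of freedom here are illustrative:

```r
# Two-sided p-value for an observed T statistic of 2.5 with 15 degrees of freedom
2 * pt(2.5, df = 15, lower.tail = FALSE)
```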
Hypothesis testing
Statistical hypothesis testing is the formal inferential framework around choosing between hypotheses. The null hypothesis is assumed true, H0, and statistical evidence is required to reject it in favor of a research or alternative hypothesis, Ha.
T Confidence Intervals
# Uncomment & Run manipulate in RStudio
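A minimal sketch of a t interval, computed by hand and checked against t.test(), on placeholder data:

```r
set.seed(1)
x <- rnorm(20, mean = 10, sd = 2)  # placeholder sample
mean(x) + c(-1, 1) * qt(0.975, df = length(x) - 1) * sd(x) / sqrt(length(x))
t.test(x)$conf.int                 # same 95% interval
```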
NOAA Reproducible Research
Contents:
- Synopsis, Reproducible Research Checklist
- Summary of Results, Past Significant Weather Events in the USA, 1950-2011
- Raw Data: NOAA StormData.csv.bz2
- Data Transformation, storm_data_corrected2
- Data Processing, Storm Event Type Damage
- Data Analysis, Tables and Plots
Maps with R
Using base maps from R's maps package, and also the ggmap package together with ggplot2.
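A minimal base-map sketch with maps and ggplot2 (the ggmap step is omitted here, since ggmap's map sources generally require an API key):

```r
library(ggplot2)
library(maps)  # supplies the outlines behind map_data()

states <- map_data("state")
ggplot(states, aes(long, lat, group = group)) +
  geom_polygon(fill = "grey90", colour = "grey40") +
  coord_quickmap()
```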
Plot in Plot
Demo
echo = FALSE
knitr
FirstKnitr
Demo
R Markdown
Quick Demo
Reproducible reporting with R
R Markdown files can be used to generate reproducible reports.
Explore the National Emissions Inventory database
## [Assignment](https://www.coursera.org/learn/exploratory-data-analysis/peer/b5Ecl/course-project-2)
The overall goal of this assignment is to explore the National Emissions Inventory database and see what it says about fine particulate matter pollution in the United States over the 10-year period 1999 to 2008.
Changes in PM2.5 levels
This plot still needs a bit of work as it's not easy to read by eye, but it looks as though many states have decreased their average PM2.5 levels from 1999 to 2012, while a few states actually increased their levels.
Clustering Tips
Finding differences in patterns is useful for modeling.
Colours in R
Quick demo
Grouping Data in R
When exploring data there are two principal uses of grouping: (i) to point out groups of similar data, where the distance/similarity measure has to be chosen to match the problem, and (ii) to create a set of variables which are uncorrelated but representative of the data, explaining as much variance as possible. The first goal is statistical and is solved by PCA; the second goal is data compression, which can be solved by SVD.
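A minimal sketch of both goals on a built-in dataset (mtcars is just a stand-in):

```r
# Statistical goal: PCA gives uncorrelated, variance-maximising components
pc <- prcomp(mtcars, scale. = TRUE)
summary(pc)$importance[, 1:3]  # variance explained by the first components

# Compression goal: the SVD of the same scaled matrix; scores equal u %*% diag(d)
sv <- svd(scale(mtcars))
all.equal(abs(sv$u %*% diag(sv$d)), abs(pc$x), check.attributes = FALSE)
```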
ggplot2(maacs)
Plotting in ggplot2 allows a plot to be built up in layers, for example: (i) plot the data, (ii) overlay a summary, (iii) then add metadata and annotation.
[Data set: MAACS Cohort] A mouse allergen and asthma cohort study of children (aged 5-17) with persistent asthma; data was collected 5 times per child over a year.
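A sketch of that layering on a built-in dataset, since the MAACS data itself is not public:

```r
library(ggplot2)
# (i) plot the data, (ii) overlay a summary, (iii) add metadata and annotation
ggplot(airquality, aes(Temp, Ozone)) +
  geom_point(alpha = 0.5) +                 # the data
  geom_smooth(method = "lm", se = TRUE) +   # a summary layer
  labs(title = "Ozone vs Temperature",      # metadata and annotation
       x = "Temperature (F)", y = "Ozone (ppb)")
```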
The Lattice Plotting System
The lattice plotting system is implemented using the following packages:
- lattice: contains code for producing Trellis graphics, which are independent of the “base” graphics system; includes functions like xyplot, bwplot, levelplot.
- grid: implements a different graphing system independent of the “base” system; the lattice package builds on top of grid.
The lattice plotting system does not have a “two-phase” aspect with separate plotting and annotation like in base plotting: all plotting/annotation is done at once with a single function call.
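For example, a conditioned Trellis plot comes from one xyplot() call (airquality stands in for any panel data):

```r
library(lattice)
# Ozone vs Wind, one panel per month, all from a single function call
airquality$Month <- factor(airquality$Month)
xyplot(Ozone ~ Wind | Month, data = airquality, layout = c(5, 1))
```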
Exploratory household energy usage data analysis, with R
Examine how household energy usage varies over a 2-day period in February 2007. Your task is to reconstruct the plots below, all of which were constructed using the base plotting system.
Step 1: fork and clone this GitHub repository
Step 2: Download the data to your working directory.
...
Base Plotting on Graphic Devices
screen, pdf, png
Base Plot
Examples for screen device display
Plotting in R
There are three key plotting systems in R: the base system, which follows an “artist's palette” model; the lattice system, which lets you specify an entire plot with one function call and has conditioning built in; and ggplot2, which mixes elements of base and lattice.
Basics, of Exploratory Data Analysis in R
Replaces http://rpubs.com/lindangulopez/656231
HAR-subject-activity-mean Code Book
The course Getting and Cleaning Data is course 3 of the [Data Science Specialization]; it is taught by Jeff Leek, Brian Caffo and Roger D. Peng, all professors at the [Johns Hopkins School of Public Health]. You will learn how to obtain data from the web, from APIs, from databases and from colleagues, in various formats.
The main aim was to understand the basics of data cleaning and how to make data “tidy”. Tidy data, together with the other components of a complete data set (raw data, processing instructions, codebooks, and processed data), dramatically speeds up downstream data analysis tasks; these are the focus of this blog, which presents the
Tidy Data Assignment Results:
- a [README.md] which explains the purpose and content of the repository.
- the R script [run_analysis.R] which transforms the given data to a tidy data set.
- the tidy dataset [HAR-subject-activity-mean.txt] which is produced as an output from the R script.
- a csv file [HAR-subject-activity-mean.csv] for easy data analysis with csv tools.
- a metadata file, for statisticians, the [CodeBook.md] which lists the variables of the tidy data set.
Tidy Data
with tidyverse
Case Studies
Getting & Cleaning Data
Working with Dates
Week 4 Course Notes: data-cleaning in R, by [Linda] (@lindangulopez)
Regular Expressions, in R
Week 4 Course Notes: data-cleaning
Regular expressions are useful when searching text with broad patterns, combining metacharacters, which form the 'grammar' of the search logic, with literals, which are similar to the 'words' used in Natural Language, NL.
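A small illustration of metacharacters versus literals with base R's pattern functions (the strings are made up):

```r
x <- c("data-cleaning", "Data Science", "cleaning data", "metadata")
grepl("^data", x)               # ^ is a metacharacter: anchor to the start of the string
grep("data$", x, value = TRUE)  # $ anchors to the end
sub("clean(ing)?", "tidy", x)   # (ing)? makes the group optional; "clean" is a literal
```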
clean text variables
Week 4 Course Notes: data-cleaning
Run these R code chunks to clean text variables whenever you need to download and clean data.
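A typical cleaning chunk might look like this; the raw names are invented for illustration:

```r
names_raw <- c("Income.Level ", "ZIP Code", "wgtp15")
cleaned <- tolower(trimws(names_raw))  # lower-case and strip stray whitespace
cleaned <- gsub("[. ]", "_", cleaned)  # replace dots and spaces with underscores
strsplit(cleaned, "_")                 # split names into their word components
```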
Week 3 Quiz: data-cleaning
Extracts of a few Case Studies
Managing Data Frames with tidyverse [dplyr]
Merging Data in R4.0
The magrittr package, installed as part of the [dplyr] package, facilitates the rendering of Tidy Data, that is when:
each variable forms a column
each observation forms a row
and each table/file stores data about one kind of observation
Managing Data Frames with dplyr
The magrittr package, installed as part of the dplyr package, facilitates the rendering of Tidy Data, that is when:
Each variable forms a column
Each Observation forms a row
Each table/file stores data about one kind of observation
through these types of data manipulation:
revealing new variables, new observations, and new ways to describe data, as well as subsetting data, doing group-wise operations, etc.
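A minimal piped sketch of those manipulations on a built-in dataset (the kpl conversion is just an example of a new variable):

```r
library(dplyr)
mtcars %>%
  mutate(kpl = mpg * 0.425) %>%  # reveal a new variable (km per litre)
  filter(cyl != 6) %>%           # subset observations
  group_by(cyl) %>%              # group-wise operations
  summarise(mean_kpl = mean(kpl), n = n())
```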
Reshaping Data & Using Pipes
This is an R Markdown document, feel free to [reach out](https://www.linkedin.com/in/lindangulopez/) for finer details.
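A small sketch of reshaping through a pipe, using reshape2 (loaded earlier in these notes) on a built-in dataset:

```r
library(dplyr)
library(reshape2)
mtcars %>%
  melt(id.vars = "cyl", measure.vars = c("mpg", "hp")) %>%  # wide -> long
  dcast(cyl ~ variable, fun.aggregate = mean)               # long -> wide, with a summary
```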
New Variables
Create & Add the New Variables to the Data Frame
Summarizing Data
from summary() to ftable()
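For example, on a built-in dataset (chosen only for illustration):

```r
summary(warpbreaks)                                 # quick per-column summary
ftable(xtabs(~ wool + tension, data = warpbreaks))  # flat contingency table
```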
Subsetting & Sorting
which(), sort(), arrange()
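A quick look at the three functions side by side, on an invented data frame:

```r
library(dplyr)
df <- data.frame(x = c(3, 1, 2), y = c("b", "c", "a"))
which(df$x > 1)                # integer positions of the matching rows
sort(df$x, decreasing = TRUE)  # sort a vector
arrange(df, desc(x), y)        # dplyr: sort a whole data frame by several columns
```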
data-cleaning/week-2-quiz
TO DO: test Q in Ubuntu & R (not RStudio)
Accessing Geospatial Data Using API’s with R
Lesson 7. Programmatically Accessing Geospatial Data Using API’s - Working with and Mapping JSON Data from the Colorado Information Warehouse in R
Web Scraping in R
To avoid getting your IP address blocked, respect the website's terms of service.
Reading & writing to HDF5 with R
Environmental Setup & Code Chunks
Connecting & Sourcing MySQL Data in R
This is an R Markdown document on sourcing data from MariaDB, a MySQL server, with R.
Getting & Cleaning Data Lab
Answers to Quiz 1
Data Sourcing and Examination, in R
This is an R 3.6 Markdown document on **Data Sourcing and Examination, in R**, showing how processing steps can impact results.
Hospital Quality Study
Programming in R Assignment
Document
This is an attempt at the Caching the Inverse of a Matrix assignment from Programming with R; the exercise is to demonstrate closures.
Closures get their name because they enclose the environment of the parent function and can access all its variables. Closures are useful for making function factories, and are one way to manage mutable state in R.
It allows us to have two levels of parameters: a parent level that controls operation and a child level that does the work. The example, makeCacheMatrix(), uses this idea to store x and inv in the enclosing environment of the set, get, setInverse and getInverse functions, that is, the environment within which they were defined: the environment created by makeCacheMatrix().
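A sketch along the lines described, paired with a cacheSolve() in the style of the assignment:

```r
makeCacheMatrix <- function(x = matrix()) {
  inv <- NULL                                  # cached inverse lives in this environment
  set <- function(y) { x <<- y; inv <<- NULL }
  get <- function() x
  setInverse <- function(i) inv <<- i
  getInverse <- function() inv
  list(set = set, get = get,
       setInverse = setInverse, getInverse = getInverse)
}

cacheSolve <- function(cm, ...) {
  inv <- cm$getInverse()
  if (!is.null(inv)) return(inv)  # reuse the cached inverse if it exists
  inv <- solve(cm$get(), ...)     # otherwise compute it,
  cm$setInverse(inv)              # cache it,
  inv                             # and return it
}
```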
Cars Data Set
My First R Markdown