Recently Published
Data Science Project
This project aims to:
Explore the US diabetes dataset using the R programming language.
Predict the chance of admission and analyze the relationships between variables using machine learning approaches.
North Carolina births record
The project aims to explore the relationship between expectant mothers' habits and practices and their children's births, based on the “nc” dataset. In 2004, the state of North Carolina released an extensive data set containing information on all births recorded in the state.
The main objective of this study is to determine whether a mother's smoking status during pregnancy is associated with low birth weight in her baby.
We will investigate the association using essential statistical tools, such as 'Descriptive Statistics', 'Data Visualization' (normality check, distribution of variables, histogram, and boxplot), and 'Quantitative Data Analysis' using linear and nonlinear regression models.
Probability
Basketball players who make several baskets in succession are described as having a hot hand. Fans and players have long believed in the hot hand phenomenon, which rejects the assumption that each shot is independent of the next. However, a 1985 paper by Gilovich, Vallone, and Tversky collected evidence that contradicted this belief and showed that successive shots are independent events. This paper started a great controversy that continues to this day, as you can see by Googling hot hand basketball.
We do not expect to resolve this controversy today. However, in this lab we’ll apply one approach to answering questions like this. The goals for this lab are to (1) think about the effects of independent and dependent events, (2) learn how to simulate shooting streaks in R, and (3) compare a simulation to actual data in order to determine if the hot hand phenomenon appears to be real.
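As a rough illustration of goal (2), the sketch below simulates an independent shooter in base R and tabulates the lengths of the resulting hit streaks. The 45% shooting percentage, the 100 shots, and the helper name hit_streaks are assumptions made for this example; they are not taken from the lab.

```r
# Minimal sketch: simulate an independent shooter and measure hit streaks.
# Assumptions (not from the lab): 45% shooting percentage, 100 shots.
set.seed(1)  # make the simulation reproducible

outcomes  <- c("H", "M")  # H = hit, M = miss
sim_shots <- sample(outcomes, size = 100, replace = TRUE,
                    prob = c(0.45, 0.55))

# Hypothetical helper: lengths of every run of consecutive hits.
hit_streaks <- function(shots) {
  runs <- rle(shots)                 # run-length encoding of the sequence
  runs$lengths[runs$values == "H"]   # keep only the runs of hits
}

streaks <- hit_streaks(sim_shots)
table(streaks)  # distribution of streak lengths under independence
```

Running the same helper on a real player's shot sequence and comparing the two tables gives a first look at whether that player's streaks are longer than independence alone would produce.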
Internal migration Exploratory Data Analysis (EDA)
Internal migration is the movement of households from one address to another within the same town, county, or state, or between states, without leaving the country. The U.S. Census Bureau estimates that about 14% of the people living in the U.S. move within the U.S. each year (as of Oct 4, 2019).
In this analysis, we use the ‘county’ data set (available in the ‘usdata’ package for R) to visualize this phenomenon in the context of the United States and to corroborate it. We also aim, to a certain degree, to explore and identify key factors that encourage such behavior.
Preliminary Statistical Analysis
Lab 2 covers the creation of vectors, matrices, lists of variables, and data frames. Finally, we investigate the association between two variables using a simple linear regression model, with a scatter plot for visual inspection.
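The objects named above can be sketched in a few lines of base R. The toy values and the variable names (hours, score, info) are made up for illustration and are not the lab's data.

```r
# Minimal sketch of the basic R structures, using hypothetical toy data.
hours <- c(1, 2, 3, 4, 5)        # numeric vector
score <- c(52, 58, 61, 70, 74)   # second numeric vector

m    <- matrix(1:6, nrow = 2, ncol = 3)             # 2 x 3 matrix, filled by column
info <- list(course = "stats", n = length(hours))   # list mixing types
df   <- data.frame(hours = hours, score = score)    # data frame with two columns

# Simple linear regression of score on hours
fit <- lm(score ~ hours, data = df)
coef(fit)  # intercept and slope

# Scatter plot with the fitted line for visual inspection
plot(df$hours, df$score, xlab = "hours", ylab = "score")
abline(fit)
```

The slope from coef(fit) estimates the average change in score per additional hour in this toy data; the plot shows how closely the points follow the fitted line.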
Flights data analysis
Some define statistics as the field that focuses on turning information into knowledge. The first step in that process is to summarize and describe the raw information – the data. In this homework, we will explore flights, specifically a random sample of domestic flights that departed from the three major New York City airports in 2013. We will generate simple graphical and numerical summaries of data on these flights and explore delay times. As this is a large data set, we will also learn indispensable data processing and subsetting skills.
Investigate causal relationships
Lab 1 uses the “Wage” dataset to investigate causal relationships. In particular, we will try to see whether we can predict Wage based on Education, Race, Health, Age, and other variables.
Historical Arbuthnot data exploration
Homework 1 uses the “arbuthnot” and “present” datasets to investigate whether more boys or more girls were christened over the 82 years covered by the Arbuthnot data. We will also compare the results across the two periods, 1629 to 1710 and 1940 to 2002.
The Arbuthnot data set refers to Dr. John Arbuthnot, an 18th-century physician, writer, and mathematician. He was interested in the ratio of newborn boys to newborn girls, so he gathered the christening records for children born in London every year from 1629 to 1710. The “present” data set is an updated counterpart of the historical Arbuthnot data set: it records the numbers of boys and girls born in the United States between 1940 and 2002.