# Easy web publishing from R

## Write R Markdown documents in RStudio.

Share them here on RPubs. (It’s free, and couldn’t be simpler!)

**Get Started**

## Recently Published

##### Graficas_Canciones

Most_Streamed_Spotify_Songs_2024.csv

##### Document

The objective of this project is to develop a predictive model that accurately forecasts the selling prices of used cars based on various features such as year of manufacture, kilometers driven, fuel type, seller type, transmission, ownership, mileage, engine capacity, and maximum power. By employing regression analysis techniques, this project aims to provide valuable insights into the factors that influence car prices and to create a reliable tool for predicting the market value of used cars.

##### Milestone Report

Milestone Report for the Johns Hopkins Data Science Capstone project

##### Simulacion inicial

2 factors, n = 100, 8 items, 4 response options

##### Comprehensive Analysis of Silkworm Phenotypic Traits

###### Overall Structure & Purpose

The R Markdown script analyzes silkworm phenotypic traits to uncover patterns, relationships, and differences among strains. It combines data exploration, statistical testing, multivariate analysis, and correlation analysis. The ultimate goal is to generate insights that can inform further research and potential breeding strategies for silkworms.

###### Code Sections & Explanations

**Setup**

`knitr::opts_chunk$set(echo = TRUE, warning = FALSE, dpi = 1000)` configures the R Markdown knitting process.

- `echo = TRUE`: Displays the R code in the output alongside the results.
- `warning = FALSE`: Suppresses warnings raised during code execution, keeping the output clean.
- `dpi = 1000`: Sets the resolution of generated plots to 1000 dots per inch, ensuring high-quality images.
**Libraries**

`library(...)` loads the R packages required for the analysis.

- tidyverse: Provides a collection of data manipulation and visualization tools.
- emmeans: Facilitates the estimation and analysis of marginal means in statistical models.
- ggplot2: A grammar of graphics for creating customizable plots.
- lme4 & lmerTest: Enable fitting and analysis of linear mixed-effects models.
- Hmisc: Offers various statistical functions and tools.
- performance: Provides tools for assessing model quality and checking model assumptions.
- optimx: Provides additional optimization algorithms for model fitting.
- multcompView: Assists in creating compact letter displays for multiple comparison results.
- readxl: Enables reading data from Excel files.
- dplyr: Provides a grammar of data manipulation.
- agricolae: Offers tools for agricultural research and analysis, including ANOVA and Tukey's HSD test.
- broom: Converts statistical analysis outputs into tidy data frames.
- ggpubr: Simplifies the creation of publication-ready plots, including those with mean values and error bars.
- FactoMineR & factoextra: Support principal component analysis (PCA) and provide visualization tools for PCA results.
- corrplot: Generates correlation plots (correlograms).
- GGally: Creates scatterplot matrices to visualize pairwise relationships between variables.
- car: Contains the `leveneTest` function for testing homogeneity of variance.
- FSA: Includes the `dunnTest` function for non-parametric post-hoc analysis.
**Data Loading and Pre-processing**

- `data <- read_excel("Hatching_percentage.xls")`: Reads the phenotypic data from an Excel file named "Hatching_percentage.xls" into an R data frame called `data`.
- `data$Strain` processing: Cleans and standardizes the `Strain` column by extracting the relevant strain information and converting it into a factor (categorical variable).
- `traits <- c(...)`: Defines a vector `traits` containing the names of the six phenotypic traits to be analyzed.
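
The description does not say how the strain information is extracted, so the following is only a minimal sketch of one way the `Strain` clean-up could work. The raw labels, the replicate suffix in parentheses, and the regular expression are all hypothetical; only base R is used.

```r
# Hypothetical raw labels: the real file's label format is not given in the source.
raw <- data.frame(
  Strain = c(" S1 (rep 1)", "S1 (rep 2)", "s2 (rep 1) ", "S2 (rep 2)"),
  stringsAsFactors = FALSE
)

# Drop a trailing "(...)" suffix, trim whitespace, and normalise the case.
cleaned <- sub("\\s*\\(.*\\)\\s*$", "", raw$Strain)
cleaned <- toupper(trimws(cleaned))

# Convert the cleaned labels into a factor (categorical variable).
raw$Strain <- factor(cleaned)
print(levels(raw$Strain))
```

Converting to a factor after cleaning matters because differently spelled labels would otherwise become separate levels.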
**Data Exploration and Assumption Checking**

- `check_model` function: Defines a function to create diagnostic plots for a linear model, aiding in the assessment of model assumptions (normality of residuals, homoscedasticity, etc.).
- Loop through traits: Iterates over each trait in the `traits` vector.
  - Fits a temporary linear model (`temp_model`) to assess assumptions without saving it in the global environment.
  - Creates a boxplot for each trait, visualizing its distribution across different strains and highlighting outliers.
  - Performs the Shapiro-Wilk test for normality and Levene's test for homogeneity of variance, printing the results.
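
The per-trait assumption checks can be sketched in base R as follows. This is an illustration on simulated data, not the script's own code: the data frame `sim`, the trait names, and the helper `levene_p` are hypothetical. The script uses `car::leveneTest`; here a median-centred Levene test is computed by hand as a one-way ANOVA on absolute deviations from group medians, so that the example needs no extra packages. Plotting is left out.

```r
set.seed(1)

# Simulated stand-in for the phenotypic data (hypothetical values and names).
sim <- data.frame(
  Strain = factor(rep(c("A", "B", "C"), each = 10)),
  trait1 = rnorm(30, mean = rep(c(50, 55, 60), each = 10), sd = 5),
  trait2 = rexp(30, rate = 0.1)
)
traits <- c("trait1", "trait2")

# Median-centred Levene test: ANOVA on |y - group median|.
levene_p <- function(y, g) {
  abs_dev <- abs(y - ave(y, g, FUN = median))
  anova(lm(abs_dev ~ g))[["Pr(>F)"]][1]
}

# For each trait, fit a temporary model and collect both p-values.
checks <- t(sapply(traits, function(tr) {
  temp_model <- lm(sim[[tr]] ~ sim$Strain)
  c(shapiro_p = shapiro.test(residuals(temp_model))$p.value,
    levene_p  = levene_p(sim[[tr]], sim$Strain))
}))
print(round(checks, 4))
```

A p-value below the chosen threshold (commonly 0.05) in either column suggests the corresponding assumption is violated for that trait.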
**ANOVA, Transformation, and Non-Parametric Testing**

- Subsets `analysis_data`: Creates a new data frame `analysis_data` by selecting only the numeric columns (traits) from the original data, excluding the "Replicate" column.
- Defines `anova_traits` and `non_parametric_traits`: Explicitly separates traits suitable for ANOVA from those requiring transformation or non-parametric tests.
- Loop through traits: For each trait, the script
  - fits a linear model;
  - checks the normality and homogeneity assumptions;
  - if assumptions are violated, attempts a log transformation and re-checks the assumptions;
  - if at least one assumption is met (originally or after transformation), performs ANOVA and Tukey's HSD (if significant);
  - if both assumptions remain violated, performs the Kruskal-Wallis test and Dunn's test (if significant).
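
The decision flow for a single trait might be sketched like this. It is one reading of the description, under stated assumptions: the function names `assumptions_met` and `analyse_trait` are hypothetical, the log scale is kept only when it satisfies more assumptions than the raw scale, and `pairwise.wilcox.test` with Holm adjustment stands in for Dunn's test (which lives in the FSA package) to keep the example in base R.

```r
# TRUE/FALSE for normality of residuals and homogeneity of variance.
assumptions_met <- function(y, g) {
  fit <- lm(y ~ g)
  abs_dev <- abs(y - ave(y, g, FUN = median))
  c(normal      = shapiro.test(residuals(fit))$p.value > 0.05,
    homogeneous = anova(lm(abs_dev ~ g))[["Pr(>F)"]][1] > 0.05)
}

analyse_trait <- function(values, groups, alpha = 0.05) {
  y <- values
  ok <- assumptions_met(y, groups)
  transformed <- FALSE

  # Try a log transformation only if something is violated and it is defined.
  if (!all(ok) && all(values > 0)) {
    ok_log <- assumptions_met(log(values), groups)
    if (sum(ok_log) > sum(ok)) {
      y <- log(values)
      ok <- ok_log
      transformed <- TRUE
    }
  }

  if (any(ok)) {
    # At least one assumption holds: ANOVA, then Tukey's HSD if significant.
    fit <- aov(y ~ groups)
    p <- summary(fit)[[1]][["Pr(>F)"]][1]
    posthoc <- if (p < alpha) TukeyHSD(fit) else NULL
    test <- "ANOVA"
  } else {
    # Both violated: Kruskal-Wallis, then a pairwise rank-based follow-up.
    p <- kruskal.test(y, groups)$p.value
    posthoc <- if (p < alpha) {
      pairwise.wilcox.test(y, groups, p.adjust.method = "holm")
    } else NULL
    test <- "Kruskal-Wallis"
  }
  list(test = test, transformed = transformed, p.value = p, posthoc = posthoc)
}

# Usage on simulated data (hypothetical values).
set.seed(42)
groups <- factor(rep(c("A", "B", "C"), each = 12))
values <- rnorm(36, mean = rep(c(50, 56, 62), each = 12), sd = 4)
res <- analyse_trait(values, groups)
print(res$test)
print(res$p.value)
```

Returning the chosen test and the transformation flag alongside the p-value keeps it visible which path each trait took.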
**Multivariate Analysis**

- Performs PCA on `analysis_data`.
- Creates a scree plot to visualize the variance explained by each principal component.
- Generates separate plots for individuals (colored by strain) and variables.
- Performs hierarchical clustering on the first three principal components and visualizes the dendrogram with vertical labels.
- Determines the optimal number of clusters for k-means clustering using the elbow method.
- Performs k-means clustering and visualizes the clusters in the PCA space.
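
The multivariate steps can be approximated with base R alone. The sketch below swaps in `prcomp`, `hclust`, and `kmeans` for the FactoMineR and factoextra functions the script loads, and runs on simulated data. The trait names, the Ward linkage, and the choice of three clusters are assumptions made for illustration, and the plots are omitted.

```r
set.seed(7)

# Simulated stand-in: 3 strains x 10 individuals, 6 traits (hypothetical).
strain <- factor(rep(c("A", "B", "C"), each = 10))
shift <- rep(c(0, 2, 4), each = 10)
analysis_data <- as.data.frame(sapply(1:6, function(i) rnorm(30) + shift))
names(analysis_data) <- paste0("trait", 1:6)

# PCA on centred, scaled traits; proportions of variance feed a scree plot.
pca <- prcomp(analysis_data, center = TRUE, scale. = TRUE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Hierarchical clustering on the first three principal components.
scores <- pca$x[, 1:3]
tree <- hclust(dist(scores), method = "ward.D2")

# Elbow method: total within-cluster sum of squares for k = 1..6.
wss <- sapply(1:6, function(k) {
  kmeans(scores, centers = k, nstart = 20)$tot.withinss
})

# k-means with an assumed k = 3, compared against the known strains.
km <- kmeans(scores, centers = 3, nstart = 20)
print(round(var_explained, 3))
print(table(strain, cluster = km$cluster))
```

Plotting `wss` against k and looking for the bend is the usual way to read the elbow. Here k = 3 is fixed in advance only because the simulation has three strains.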
**Correlation Analysis**

- Calculates Pearson and Spearman correlation matrices for `analysis_data`.
- Visualizes the correlation matrices using correlograms.
- Defines a function `plot_correlation` to perform correlation tests (Pearson or Spearman) between specific trait pairs and visualize the relationship with scatterplots and regression lines.
- Creates scatterplot matrices with Pearson and Spearman correlations, adjusting label sizes and margins for readability.
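
A rough base R equivalent of the correlation step is shown below on simulated data. The helper `test_pair` is a hypothetical, plot-free counterpart to the script's `plot_correlation`; its name, arguments, and return value are assumptions, as are the trait names.

```r
set.seed(3)

# Simulated traits: trait2 depends on trait1, trait3 is independent (hypothetical).
x <- rnorm(40)
analysis_data <- data.frame(
  trait1 = x,
  trait2 = 0.8 * x + rnorm(40, sd = 0.5),
  trait3 = rnorm(40)
)

# Pearson (linear) and Spearman (rank-based) correlation matrices.
pearson  <- cor(analysis_data, method = "pearson")
spearman <- cor(analysis_data, method = "spearman")

# Correlation test for one pair of traits, returning estimate and p-value.
test_pair <- function(data, a, b, method = c("pearson", "spearman")) {
  method <- match.arg(method)
  res <- cor.test(data[[a]], data[[b]], method = method)
  c(estimate = unname(res$estimate), p.value = res$p.value)
}

print(round(pearson, 2))
print(test_pair(analysis_data, "trait1", "trait2", method = "pearson"))
```

Comparing the two matrices is informative because Spearman's coefficient captures monotonic but non-linear relationships that Pearson's can understate.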
**Conclusion**

Summarizes the key findings and insights gained from the analysis.

###### Scientific Rigor

This R Markdown document demonstrates scientific rigor through:

- Explicit Assumption Checking: It systematically evaluates the assumptions of normality and homogeneity of variance before applying statistical tests.
- Data Transformation: It attempts to address violations of assumptions using transformations (log transformation in this case).
- Appropriate Statistical Tests: It employs ANOVA when assumptions are met and non-parametric tests (Kruskal-Wallis) when they are not.
- Post-hoc Analysis: It uses Tukey's HSD and Dunn's test for pairwise comparisons after ANOVA and Kruskal-Wallis, respectively.
- Multivariate Exploration: It utilizes PCA and clustering to uncover underlying patterns and relationships in the data.
- Correlation Analysis: It explores both Pearson and Spearman correlations to understand the associations between traits.
- Clear Visualization: It generates informative plots (boxplots, PCA plots, dendrograms, correlograms, scatterplot matrices) to visualize the data and results.
- Interpretation and Conclusion: It provides guidance for interpreting the results and encourages further exploration and analysis based on the findings.

##### ActividadClase6

Class activity 06