Recently Published
Snout-vent length (SVL; Portuguese: comprimento rostro-cloacal, CRC): comparing samples (Anura) by season and linear relationship with temperature and rainfall using R
Minicourse presented at the 1° ENCONTRO AMAZÔNICO DE CIÊNCIAS AMBIENTAIS: COP 30 e Amazônia no Contexto das Mudanças Climáticas (1st Amazonian Meeting on Environmental Sciences: COP 30 and the Amazon in the Context of Climate Change).
Contents:
- Descriptive Statistics
- Normality Test
- Student's t-test
- Simple and Multiple Linear Regression and Correlation Analysis
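The minicourse itself works in R. Purely as an illustration of its last two topics, the sketch below computes a pooled-variance Student t statistic for two seasonal samples and a simple least-squares fit in plain Python. The function names and all numbers (`dry`, `wet`, `temperature`) are hypothetical, invented for this example, and are not taken from the course material.

```python
from math import sqrt
from statistics import mean, variance

def student_t(a, b):
    """Two-sample Student t statistic with pooled variance (equal variances assumed)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2  # statistic and degrees of freedom

def simple_linear_fit(x, y):
    """Least-squares slope, intercept and Pearson r for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / sqrt(sxx * syy)

# Hypothetical snout-vent lengths in mm, invented for illustration only
dry = [30.1, 31.4, 29.8, 32.0, 30.6]
wet = [33.2, 34.0, 32.5, 35.1, 33.8]
temperature = [24.0, 25.5, 26.1, 27.3, 28.0]  # hypothetical, in degrees Celsius

t_stat, df = student_t(dry, wet)
slope, intercept, r = simple_linear_fit(temperature, wet)
print(f"t = {t_stat:.2f} (df = {df}), slope = {slope:.2f}, r = {r:.2f}")
```

Turning the t statistic into a p-value requires the t distribution, which the Python standard library does not provide; in R, `t.test()` and `lm()` report it directly.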
Elevation in Martinique
Elevation shown as relief contours (Tanaka, 1950). Packages: tanaka, sf, cartography and terra. Author: Florent Demoraes, September 2024.
MiBici Data Analytics
Data collection, cleaning, and initial processing for the MiBici public bike share system.
Introduction to Artificial Neural Networks.
A simple (not very accurate) implementation of an ANN using Haberman's Survival Dataset.
Preprocessing and Exploratory Analysis of the Stroke Prediction Dataset
In this RMarkdown document, I perform data preparation and exploratory analysis on a stroke prediction dataset obtained from Kaggle. The steps include:
Data Preparation:
Setup: Disable warnings and messages in R to ensure cleaner output during execution.
Data Import: Load the stroke prediction dataset from a CSV file and inspect its structure by viewing the first few rows in a tabular format.
Data Cleaning: Remove the first column (patient ID), which is not relevant for analysis, and delete rows with missing values (NA). Convert integer columns to numeric types, rename columns for clarity, and recode categorical variables.
Data Analysis:
Data Visualization: Display the cleaned dataset.
Factor Conversion: Convert the 'stroke' variable into a factor.
Outlier Handling: Exclude the 'Other' category in gender, as it represents only one individual and is not relevant for further study.
Missing Value Imputation: Impute missing values in all predictor variables, including BMI, using k-nearest neighbors (KNN).
Numerical Variable Distribution: Analyze the distribution of numerical variables and their relationships with the response variable using box plots.
Categorical Variables Distribution: Create mosaic plots to show relationships between categorical variables.
Contingency Tables and Association Tests: Perform statistical tests to examine associations between categorical variables and the response.
Correlation Matrix: Compute and visualize the correlation matrix for numerical variables.
Non-linear Relationships: Explore non-linear relationships between numerical variables and the response through scatter plots with non-linear fit lines.
Heatmap: Create a heatmap to show the distribution of average age by job type.
Data Filtering and Standardization: Filter out patients younger than 50 and those with job statuses of never worked or children.
This document aims to clean and explore the dataset thoroughly to prepare it for further predictive modeling and analysis.
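The KNN imputation step listed above can be sketched as follows. This is a minimal plain-Python illustration, not the author's R code: the function `knn_impute`, the column layout (age, average glucose, BMI) and every value are hypothetical. It assumes only one column has gaps and that the other columns are complete.

```python
from math import sqrt

def knn_impute(rows, target, k=3):
    """Fill None in column `target` with the mean of that column over the k
    nearest complete rows, using Euclidean distance on the other columns."""
    complete = [r for r in rows if r[target] is not None]
    filled = []
    for r in rows:
        if r[target] is not None:
            filled.append(list(r))
            continue

        def dist(other):
            # Distance ignores the target column, which is missing in r
            return sqrt(sum((r[j] - other[j]) ** 2
                            for j in range(len(r)) if j != target))

        neighbours = sorted(complete, key=dist)[:k]
        value = sum(n[target] for n in neighbours) / len(neighbours)
        filled.append([value if j == target else r[j] for j in range(len(r))])
    return filled

# Hypothetical rows: (age, avg_glucose, bmi); values invented for illustration
patients = [
    (61, 110.0, 28.0),
    (63, 112.0, 30.0),
    (59, 108.0, 26.0),
    (25, 85.0, 21.0),
    (62, 111.0, None),  # BMI missing
]
print(knn_impute(patients, target=2, k=3)[-1])
```

In practice the predictors would usually be put on a common scale before computing distances, so that a wide-ranging variable such as glucose does not dominate the choice of neighbours.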
NN model on Haberman's survival dataset
This document covers EDA, preprocessing, and building an NN model.