RPubs

by RStudio

mz03

Marta Zawada

Recently Published

Clustering Specialty Coffee Flavor Profiles

This study applies unsupervised clustering to specialty arabica coffee flavor profiles, using expert-rated sensory attributes (Aroma, Flavor, Aftertaste, Acidity, Body, and Balance) sourced from the Coffee Quality Institute. Both k-means and k-medoids (PAM) were tested, with the number of clusters evaluated via silhouette width, the elbow method, and the gap statistic. The Hopkins statistic was high when sweetness was included, but removing that near-constant variable reduced clusterability sharply (Hopkins ≈ 0.54). Even after preprocessing to handle outliers, neither algorithm produced well-separated, meaningful clusters. The findings suggest that high-quality arabica coffees are remarkably homogeneous in their sensory profiles, making data-driven flavor grouping unreliable on this dataset.

3 months ago
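
The Hopkins statistic mentioned in the coffee abstract above can be illustrated with a minimal sketch. It assumes the convention in which values near 0.5 indicate spatially random data and values near 1 indicate clustered data, which matches how the abstract reads; some packages report the complement instead. The function names, the sample size `m`, and the synthetic `blobs` and `noise` data are all hypothetical and are not taken from the original analysis.

```python
import math
import random


def nearest_distance(point, others):
    """Euclidean distance from `point` to the closest element of `others`."""
    return min(math.dist(point, other) for other in others)


def hopkins(data, m, rng):
    """Hopkins statistic under the 'near 1 means clustered' convention."""
    dims = len(data[0])
    lows = [min(row[j] for row in data) for j in range(dims)]
    highs = [max(row[j] for row in data) for j in range(dims)]

    # u: distances from m uniform probes in the bounding box to the real data
    u_total = 0.0
    for _ in range(m):
        probe = [rng.uniform(lows[j], highs[j]) for j in range(dims)]
        u_total += nearest_distance(probe, data)

    # w: distances from m sampled real points to their nearest other real point
    w_total = 0.0
    for i in rng.sample(range(len(data)), m):
        rest = data[:i] + data[i + 1:]
        w_total += nearest_distance(data[i], rest)

    return u_total / (u_total + w_total)


# Synthetic illustration only: two tight blobs versus uniform noise.
rng = random.Random(0)
blobs = [[rng.gauss(c, 0.05), rng.gauss(c, 0.05)]
         for c in (0.0, 1.0) for _ in range(100)]
noise = [[rng.random(), rng.random()] for _ in range(200)]

h_blobs = hopkins(blobs, 50, rng)
h_noise = hopkins(noise, 50, rng)
print(f"blobs: {h_blobs:.2f}, noise: {h_noise:.2f}")
```

Under this convention, a value such as the reported 0.54 sits close to what uniform noise produces, which is consistent with the abstract's conclusion that the profiles are hard to separate.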

Car Price Prediction - Principal Component Regression Approach

This project investigates the effectiveness of Principal Component Regression (PCR) in predicting car prices, specifically addressing the multicollinearity inherent in automotive specifications. The analysis reveals that the first two components explain approximately 70% of the variance and represent intuitive axes of "size-and-power" and "sporty versus economy". Using three principal components provided the best balance between information retention and model stability. Still, the Ordinary Least Squares (OLS) model achieved marginally higher predictive accuracy in 10-fold cross-validation.

3 months ago

Association Rules for COVID-19 triage

This project applies association rule mining and classification based on association rules (CBA) to a synthetic COVID-19 dataset with the goal of supporting patient triage. After cleaning the data (notably dropping body temperature due to flaws in the synthetic generation), variables are discretized both manually and automatically (MDLP), and Apriori rules are mined separately for positive and negative diagnoses. Key findings are that contact with a COVID-19 patient, dry cough, and fever together nearly guarantee a positive result, that age and comorbidities do not predict infection in this dataset, and that gender influences which symptoms are most predictive. The CBA classifier achieves ~96% accuracy on the full dataset and somewhat lower but still strong performance on held-out test data, with perfect specificity on training data.

3 months ago
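
The rule-quality measures behind the association-rule abstract above (support, confidence, and lift) can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the `rule_metrics` helper, the item names, and the eight toy records are invented for the example and do not come from the study's dataset.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= t)         # records matching the antecedent
    n_c = sum(1 for t in transactions if c <= t)         # records matching the consequent
    n_ac = sum(1 for t in transactions if (a | c) <= t)  # records matching both
    if n_a == 0 or n_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    support = n_ac / n
    confidence = n_ac / n_a
    lift = confidence / (n_c / n)
    return support, confidence, lift


# Toy records, made up for illustration only.
records = [
    {"contact", "dry_cough", "fever", "positive"},
    {"contact", "dry_cough", "fever", "positive"},
    {"contact", "dry_cough", "fever", "positive"},
    {"dry_cough", "fever", "positive"},
    {"dry_cough", "negative"},
    {"fever", "negative"},
    {"contact", "negative"},
    {"negative"},
]

support, confidence, lift = rule_metrics(
    records, {"contact", "dry_cough", "fever"}, {"positive"}
)
print(support, confidence, lift)  # 0.375 1.0 2.0
```

A confidence of 1.0 on the toy data mirrors the kind of "nearly guarantees a positive result" rule the abstract describes, while lift above 1 shows the antecedent is more informative than the base rate alone.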
