Recently Published
Chocolate Bar Ratings: Exploratory Data Analysis of Chocolate Ratings Data from Kaggle
This is an exploratory data analysis of chocolate ratings data from Kaggle.
The dataset contains almost 1800 expert ratings of chocolate bars covering various cocoa bean types, bean origins, companies and manufacturing countries. Each rating also records the specific bean origin (if known) and the cocoa percentage.
Clustering of Italian Olive Oils with their Fatty Acid Composition
Use of K-Means, Gaussian Mixture Models and HDBSCAN
Use of Search Methods for Maximizing the Compressive Strength of Concrete
Use and Comparison of Local and Population-based Search Methods for Concrete Mixture Optimization
Modeling the Thermophysical Properties of Milk
Predict the heat capacity, thermal conductivity and density of cow's milk with temperature, water and fat content
The objective of this document is to develop statistical models that predict the heat capacity, thermal conductivity and density of milk with various concentrations of fat and water. Various polynomial functions were trained and tested on a training set using best subset regression, and the selected final models were assessed on a hold-out test set.
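Best subset regression fits every combination of candidate terms and keeps, for each model size, the combination with the lowest training error. The following is a minimal Python sketch of that idea, not the project's own code. It assumes a single hypothetical predictor `x` with candidate polynomial terms `x`, `x2` and `x3` and uses made-up data; the actual models used temperature, water and fat content, and their exact candidate terms are not listed here.

```python
from itertools import combinations

# Hypothetical candidate terms: powers of one predictor x (toy example only;
# the real models used temperature, water and fat content).
TERMS = {
    "x": lambda x: x,
    "x2": lambda x: x ** 2,
    "x3": lambda x: x ** 3,
}


def solve(a, b):
    """Solve the square system a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = m[i][n] - sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = s / m[i][i]
    return w


def fit_rss(xs, ys, subset):
    """Fit y ~ intercept + terms in `subset` by least squares; return the training RSS."""
    X = [[1.0] + [TERMS[t](x) for t in subset] for x in xs]
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(p)]
    w = solve(xtx, xty)  # normal equations
    return sum((y - sum(wi * xi for wi, xi in zip(w, row))) ** 2 for row, y in zip(X, ys))


def best_subsets(xs, ys):
    """For each model size k, return (subset, RSS) with the lowest training RSS."""
    best = {}
    for k in range(1, len(TERMS) + 1):
        rss, subset = min((fit_rss(xs, ys, s), s) for s in combinations(TERMS, k))
        best[k] = (subset, rss)
    return best


# Made-up training data: a quadratic plus small alternating noise.
xs = list(range(10))
ys = [1 + 2 * x + 0.5 * x ** 2 + 0.1 * (-1) ** x for x in xs]
best = best_subsets(xs, ys)
print(best[2][0])  # ('x', 'x2')
```

Choosing among the model sizes would then rely on a criterion such as cross-validated error or BIC, with the chosen model assessed once on the hold-out test set.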
Search Methods for Hyperparameter Tuning in R
Use various search methods to find the optimal hyperparameter values of a machine learning model. The population-based search methods tested are genetic algorithms (GA), differential evolution (DE) and particle swarm optimization (PSO), with grid search and random search performed as benchmarks.
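As a rough illustration of one of these methods, the Python sketch below implements the classic DE/rand/1/bin variant of differential evolution alongside a random-search benchmark with the same number of evaluations. It is not the project's R code, and GA and PSO are not shown. The objective `toy_validation_error` is a hypothetical stand-in for a model's validation error over two hyperparameters, and the settings (`pop_size`, `F`, `CR`, bounds) are illustrative assumptions.

```python
import random


def toy_validation_error(params):
    """Hypothetical stand-in for a model's validation error (minimum at (1, -2))."""
    a, b = params
    return (a - 1.0) ** 2 + (b + 2.0) ** 2


def differential_evolution(f, bounds, pop_size=20, generations=100, F=0.8, CR=0.9, seed=0):
    """Minimize f over a box with the classic DE/rand/1/bin scheme."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [f(p) for p in pop]
    for _ in range(generations):
        new_pop, new_scores = [p[:] for p in pop], scores[:]
        for i in range(pop_size):
            # Mutant built from three distinct members, none equal to i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # at least one coordinate comes from the mutant
            trial = []
            for d in range(dim):
                if d == j_rand or rng.random() < CR:  # binomial crossover
                    v = pop[a][d] + F * (pop[b][d] - pop[c][d])
                else:
                    v = pop[i][d]
                lo, hi = bounds[d]
                trial.append(min(max(v, lo), hi))  # clip back into the search box
            s = f(trial)
            if s <= scores[i]:  # greedy one-to-one replacement
                new_pop[i], new_scores[i] = trial, s
        pop, scores = new_pop, new_scores
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def random_search(f, bounds, n_evals, seed=0):
    """Benchmark: evaluate f at n_evals uniformly random points in the box."""
    rng = random.Random(seed)
    best_p, best_s = None, float("inf")
    for _ in range(n_evals):
        p = [rng.uniform(lo, hi) for lo, hi in bounds]
        s = f(p)
        if s < best_s:
            best_p, best_s = p, s
    return best_p, best_s


bounds = [(-5.0, 5.0), (-5.0, 5.0)]
de_params, de_score = differential_evolution(toy_validation_error, bounds)
# Same budget as DE: 20 initial evaluations + 20 per generation * 100 generations.
rs_params, rs_score = random_search(toy_validation_error, bounds, n_evals=20 * 101)
```

With a real model, each call to the objective would involve training and validating the model, so matching the evaluation budget is what makes the comparison with the benchmarks fair.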
Text Prediction with the Swiftkey Dataset
The goal of this project was to develop an algorithm that predicts words based on previous text and to create a user interface that others can access.
The project is part of the capstone project of the Data Science Specialization by Johns Hopkins University in partnership with Swiftkey.
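One common approach to this kind of word prediction is an n-gram model built from word counts. The Python sketch below shows a minimal bigram version with a crude fallback to the most frequent words. It is an illustration under that assumption, not necessarily the model used in the project, and the tiny corpus is made up.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count unigrams and word -> next-word bigrams in lowercased, whitespace-split text."""
    words = text.lower().split()
    unigrams = Counter(words)
    bigrams = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        bigrams[prev][nxt] += 1
    return unigrams, bigrams


def predict_next(unigrams, bigrams, context, k=3):
    """Return up to k candidate next words for the last word of `context`.

    Falls back to the overall most frequent words when the last word was
    never seen followed by another word (a crude back-off).
    """
    words = context.lower().split()
    if words and words[-1] in bigrams:
        counts = bigrams[words[-1]]
    else:
        counts = unigrams
    return [w for w, _ in counts.most_common(k)]


corpus = "I want to go home . I want to eat . you want to go out"
unigrams, bigrams = train_bigrams(corpus)
print(predict_next(unigrams, bigrams, "want to", k=2))  # ['go', 'eat']
```

A production model would typically use higher-order n-grams and a smoother back-off scheme, trained on a much larger corpus.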