Recently Published
Data Mining Analysis of Transfermarkt Football Data
This project applies data mining techniques to football player data from Transfermarkt, exploring performance indicators, clustering, and predictive modeling to uncover insights about player value and team dynamics.
Credit Risk Classification with Decision Trees, Random Forest, and SVM
This project applies supervised machine learning techniques to the German Credit dataset, comparing Decision Trees, Random Forest, and Support Vector Machines (SVM). The analysis includes preprocessing, exploratory data analysis, model evaluation with confusion matrices and key metrics, and variable importance. Results highlight the trade-offs between accuracy, sensitivity, and specificity, offering insights for credit risk assessment.
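As a rough illustration of the metrics named above, the sketch below derives accuracy, sensitivity, and specificity from the four cells of a binary confusion matrix. The function name `classification_metrics` and the counts are hypothetical and are not taken from the project; treating one class as "positive" is also an assumption.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity, and specificity from confusion-matrix counts.

    tp/fn count the positive class (correctly / incorrectly classified);
    tn/fp count the negative class (correctly / incorrectly classified).
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    # Sensitivity (recall): share of actual positives that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: share of actual negatives that were correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
    }


# Hypothetical counts for illustration only, not results from the project.
metrics = classification_metrics(tp=50, fp=10, tn=30, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A model can score well on accuracy while one of the other two metrics lags, which is why comparing classifiers on all three can be more informative than accuracy alone.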
Unsupervised Learning in Action: Discovering Patterns Without Labels
This analysis explores clustering techniques, specifically comparing K-means and DBSCAN, to uncover patterns in raptor bird data. While K-means excels with uniform data, DBSCAN is more effective for datasets with noise and varying densities, offering deeper insights into complex data structures.
Traffic Accident Analysis
This study analyzes fatal traffic accidents, focusing on emergency response times, weather conditions, and risk factors. Findings show most accidents occur in clear weather, with fatalities peaking at night. Emergency response times play a key role, emphasizing the need for targeted road safety measures.
Analysis of Factors Influencing Voting Behavior and Survey Duration
This study uses statistical modeling techniques to examine how household income, voting probability, and ideological self-placement affect voting behavior, and how factors such as age, gender, ideology, and telephone type influence survey duration.
Multivariate Analysis of Salaries, Survival, and Popularity Trends
This study examines salary trends, survival rates, and song popularity using clustering and regression techniques to uncover meaningful patterns and insights across multiple datasets.
Multivariate Analysis of Car Pricing and Social Data Using Clustering Techniques
This project analyzes car pricing and social data using clustering methods, identifying patterns in vehicle affordability and demographic trends to aid decision-making and policy development.
A Descriptive and Correlation Study
This report presents a multivariate analysis of two datasets: California school data and Canadian political contributions. The analysis involves descriptive statistics, correlation studies, and data visualization to explore relationships between key variables.
Key findings include:
A strong positive correlation (0.87) between the number of teachers and computers in California schools.
Descriptive statistics indicate variations in the number of teachers (ranging from 20 to 150) and computers (ranging from 50 to 1200) across schools.
A moderate correlation (0.34) between political parties and pharmaceutical industry contributions in Canada.
Visual representations such as scatter plots and bar charts provide insights into these relationships, highlighting patterns that could be further examined through advanced multivariate techniques.
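For readers unfamiliar with how a correlation coefficient such as those reported above is obtained, here is a minimal sketch of the Pearson correlation in plain Python. It assumes the report used Pearson's r, which the summary does not state. The function name `pearson_r` and the sample values are hypothetical: they are made-up numbers within the ranges mentioned above, not the report's data.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up values for illustration only; NOT the California school data.
teachers = [20, 45, 80, 150]
computers = [50, 300, 500, 1200]
print(f"r = {pearson_r(teachers, computers):.2f}")
```

Values near 1 or -1 indicate a strong linear relationship, while values near 0 indicate a weak one.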