Recently Published

Multivariate Statistics: Second Assignment
Plot
Document
Plot
26_SOK1004_C5
Clustering Universities: A K-Means Clustering Approach in Python
This project applies K-Means clustering, an unsupervised learning algorithm, to group universities into two clusters intended to correspond to Private and Public institutions. K-Means does not use labels, but the dataset includes the true labels, so for educational purposes we compare the cluster assignments against them using a classification report and a confusion matrix. The analysis begins with exploratory data visualization and summary statistics to understand the dataset’s structure and relationships. We then fit K-Means with two clusters and assess the results against the actual classifications. The findings highlight the limitations of K-Means, including its sensitivity to feature scaling and its assumptions about cluster shape and size. Although the algorithm does not perform well in this specific context, the project underscores the importance of pre-processing, feature selection, and domain expertise in unsupervised learning, and illustrates the practical challenges of applying clustering techniques to real-world datasets.
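The scaling sensitivity mentioned above can be illustrated with a minimal sketch. This is not the project's code: it is a small pure-Python K-Means written for illustration, run on synthetic data. The three columns (tuition in thousands, graduation rate, enrollment), their distributions, and every function name are assumptions made for this example. Enrollment is deliberately uninformative but measured on a much larger scale, so it dominates the distance computation until the features are standardized.

```python
import random
from statistics import mean, pstdev


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def standardize(rows):
    """Rescale every column to zero mean and unit variance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(x - m) / s for x, m, s in zip(r, mus, sds)] for r in rows]


def kmeans(rows, k=2, n_init=10, max_iter=100, seed=0):
    """Lloyd's algorithm with random restarts; keeps the lowest-inertia run."""
    rng = random.Random(seed)
    best_inertia, best_labels = float("inf"), None
    for _ in range(n_init):
        centers = [list(c) for c in rng.sample(rows, k)]
        labels = None
        for _ in range(max_iter):
            # Assignment step: nearest center for every row.
            new_labels = [
                min(range(k), key=lambda j: sq_dist(r, centers[j])) for r in rows
            ]
            if new_labels == labels:
                break  # assignments stopped changing
            labels = new_labels
            # Update step: move each center to the mean of its members.
            for j in range(k):
                members = [r for r, lab in zip(rows, labels) if lab == j]
                if members:
                    centers[j] = [mean(col) for col in zip(*members)]
        inertia = sum(sq_dist(r, centers[lab]) for r, lab in zip(rows, labels))
        if inertia < best_inertia:
            best_inertia, best_labels = inertia, labels
    return best_labels


def confusion_matrix(true, pred, k=2):
    """Rows are true classes, columns are cluster ids."""
    m = [[0] * k for _ in range(k)]
    for t, p in zip(true, pred):
        m[t][p] += 1
    return m


def aligned_accuracy(true, pred):
    """Cluster ids are arbitrary, so score both labelings and keep the better."""
    acc = sum(t == p for t, p in zip(true, pred)) / len(true)
    return max(acc, 1 - acc)


def make_data(n_per_group=100, seed=42):
    """Synthetic stand-in for the university data (hypothetical columns)."""
    rng = random.Random(seed)
    rows, labels = [], []
    # label 0 = "Public", label 1 = "Private"; (mean tuition, mean grad rate)
    for label, (tuition, grad) in enumerate([(10.0, 0.55), (20.0, 0.75)]):
        for _ in range(n_per_group):
            rows.append([
                rng.gauss(tuition, 2.0),   # tuition, thousands
                rng.gauss(grad, 0.05),     # graduation rate, fraction
                rng.gauss(10000, 3000),    # enrollment, same for both groups
            ])
            labels.append(label)
    return rows, labels


rows, truth = make_data()
raw_pred = kmeans(rows)
scaled_pred = kmeans(standardize(rows))

print("raw features   :", confusion_matrix(truth, raw_pred),
      aligned_accuracy(truth, raw_pred))
print("standardized   :", confusion_matrix(truth, scaled_pred),
      aligned_accuracy(truth, scaled_pred))
```

On the raw features the clusters follow enrollment and carry almost no information about the true group, while the standardized run recovers the two groups closely. Because cluster ids are arbitrary, the confusion matrix may come out with its columns swapped, which is why the accuracy is computed under both labelings. In a real analysis a library implementation of K-Means and of the evaluation metrics would normally replace these hand-written helpers.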
Biostatistics - Day 2/3
Weather Forecasting Dashboard
This interactive dashboard, built with R Markdown and Flexdashboard, compares the performance of several machine learning models (SVM, Naive Bayes, KNN, Decision Tree, and Logistic Regression) on a weather forecasting dataset. Additional visualizations of data trends and distributions support the key insights and give an overview of predictive model accuracy and weather patterns.
Plot
top genes by cluster