Emily C. Li

Recently Published

Titanic Survival Analysis Using Logistic Regression and SVM
Packages used: Amelia, ROCR, aod, pscl, and kernlab.
Analyzing Social Networking Data Using K-Means Clustering
Package used: stats.
Market Basket Analysis Using Association Rules
Using association rules to find purchase patterns and associations between items in a grocery store. Package used: arules.
Letter Recognition Using Support Vector Machines (SVM)
Using SVM to build a model and predict the alphabetical letter class (26 letters) of unseen data. Package used: kernlab.
Concrete Strength Modeling Using Artificial Neural Networks (ANN)
Using ANN to model concrete strength. Packages used: neuralnet and NeuralNetTools.
Random Forest Analysis for Risky Bank Default
Random forest analysis is applied to credit data to predict bank loan default status. The model is improved by tuning its parameters.
Logistic Regression Analysis on Risky Bank Default
Applying logistic regression to a credit dataset to predict a bank loan applicant's default status. Packages used: Amelia, ROCR, and caret.
Logistic Regression Analysis on Distress Condition in Challenger Space Shuttle
Logistic regression analysis of the distress conditions for the Challenger space shuttle. Model improvements include bootstrapping and k-fold cross-validation. Packages used: ROCR, Amelia, and caret.
Regression Tree & Model Tree for Analyzing Red Wine Quality
Packages highlighted: rpart, rpart.plot, and RWeka.
Linear Regression Analysis Using Insurance Data
Multiple linear regression analysis relating individuals' personal and family information to their medical expenses. The model predicts the medical expenses of future clients to help a medical insurer set premiums. Package highlighted: psych.
Rules Learners for Identifying Poisonous Mushrooms
Using both the OneR and RIPPER rule learner algorithms to identify poisonous mushrooms. The raw dataset contains about 8000 observations and 23 features. Packages used: RWeka, gmodels, and C50.
Decision Tree Classification for Identifying Risky Bank Loans
Packages used are C50 and gmodels.
Naive Bayes Classification for House Prediction
Using Naive Bayes to make predictions for House of Representatives members in the test dataset, based on the training dataset and its labels. Packages used: e1071, mlbench, and gmodels.
Naive Bayes Classification for SMS Spam Detection
Using Naive Bayes classification to detect incoming spam SMS messages with high accuracy. Packages used: e1071, tm, SnowballC, wordcloud, and gmodels. (This is an exercise from Chapter 3 in Machine Learning.)