Recently Published
Movie Recommendation with Market Basket Analysis
In this project, we applied a data mining algorithm, Apriori, to mine relationships among films and build a movie recommendation engine. Apriori is a technique from Market Basket Analysis used to discover items that are frequently sold together. A frequently purchased itemset suggests a marketing opportunity whenever a customer shows interest in a subset of its items. In this case, movies can be viewed as the items. We obtained our training data from MovieLens’s website (http://grouplens.org/datasets/movielens/). We used the MovieLens 20M Dataset, which consists of 20,000,263 ratings across 27,278 movies from 138,493 raters. We found that the mining technique can uncover underlying connections among the movies. It can also be used for movie recommendation, but the number of suggested films can be quite limited and the quality of the suggestions can vary. Additionally, we built a web interface that allows users to explore our mining results. The web application is available at https://vitidn.shinyapps.io/MovieRecommendationWithMarketBasketAnalysis/.
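As a rough illustration of the Apriori idea described above, the following is a minimal pure-Python sketch, not the project's actual implementation. It assumes each "basket" is the set of movies one user rated favorably (the abstract does not say how baskets were formed), and the function names `apriori` and `rules` as well as the thresholds are hypothetical choices made for this example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset whose support
    (fraction of transactions containing it) is at least min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]): support(frozenset([i])) for i in items}
    current = {s: v for s, v in current.items() if v >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {c: support(c) for c in candidates}
        current = {c: v for c, v in current.items() if v >= min_support}
        frequent.update(current)
        k += 1
    return frequent


def rules(frequent, min_confidence):
    """Derive (antecedent, consequent, confidence) rules from frequent itemsets.
    confidence(A -> B) = support(A and B together) / support(A)."""
    out = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(itemset, r):
                antecedent = frozenset(antecedent)
                confidence = sup / frequent[antecedent]
                if confidence >= min_confidence:
                    out.append((antecedent, itemset - antecedent, confidence))
    return out


# Toy baskets (illustrative only, not taken from the MovieLens data).
baskets = [
    {"Toy Story", "Aladdin", "The Lion King"},
    {"Toy Story", "Aladdin"},
    {"Toy Story", "The Lion King"},
    {"Aladdin", "The Lion King"},
    {"Toy Story", "Aladdin", "The Lion King"},
]
frequent_itemsets = apriori(baskets, min_support=0.5)
for antecedent, consequent, confidence in rules(frequent_itemsets, min_confidence=0.7):
    print(sorted(antecedent), "->", sorted(consequent), round(confidence, 2))
```

A rule such as "users who liked A also tend to like B" can then serve as a recommendation when a user has shown interest in A. Raising `min_support` shrinks the set of rules, which is one plausible reason the number of suggestions can be limited.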
Predict LendingClub's Loan Data
In this report, we attempt to predict the risk of a loan defaulting based on past loan data. We obtained the data from LendingClub’s website (https://www.lendingclub.com/info/download-data.action). We use loan data from 2012 to 2014 as the training and cross-validation set and loan data from 2015 as the test set. We also compare our investment performance against a baseline algorithm. We found that, among the machine learning algorithms we tried, Logistic Regression offered a reasonable performance trade-off, and that a higher return than the naive loan-picking strategy can be achieved.
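The evaluation protocol described above (train on earlier years, test on a later year, then compare a model-guided selection against a naive one) can be sketched as follows. This is only an illustration under several assumptions: the field names `year`, `ret`, and `score` are hypothetical, `score` stands in for a fitted model's predicted default probability, and the naive baseline is assumed to mean investing equally in every available loan, which the report does not spell out here.

```python
def split_by_year(loans):
    """Time-based split: 2012-2014 for training/cross-validation, 2015 for testing."""
    train = [loan for loan in loans if 2012 <= loan["year"] <= 2014]
    test = [loan for loan in loans if loan["year"] == 2015]
    return train, test


def mean_return(loans):
    """Average realized return over a set of loans."""
    return sum(loan["ret"] for loan in loans) / len(loans)


def pick_lowest_risk(loans, predict_default, fraction):
    """Keep the given fraction of loans with the lowest predicted default risk."""
    ranked = sorted(loans, key=predict_default)
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]


# Toy records (made up for illustration, not LendingClub data).
loans = [
    {"year": 2013, "ret": 0.07, "score": 0.15},
    {"year": 2014, "ret": -0.40, "score": 0.80},
    {"year": 2015, "ret": 0.08, "score": 0.10},
    {"year": 2015, "ret": 0.06, "score": 0.20},
    {"year": 2015, "ret": -0.30, "score": 0.60},
    {"year": 2015, "ret": -0.50, "score": 0.90},
]

train, test = split_by_year(loans)
# In a real pipeline a classifier would be fitted on `train` here;
# the precomputed `score` field is a stand-in for its predictions.
selected = pick_lowest_risk(test, lambda loan: loan["score"], fraction=0.5)
print("naive baseline:", mean_return(test))
print("model-guided:  ", mean_return(selected))
```

Splitting by year rather than at random keeps information from later loans out of training, which matches how such a model would be used in practice.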