This ML project is designed to predict the quality, and therefore the price, of wine from the Bordeaux region of France. It is widely known that older wines typically taste better and consequently fetch a higher price, so wine buyers profit from knowing in advance which wines will taste better in the future. I used a multiple linear regression in RStudio, with price as the dependent (outcome) variable and weather and age as the independent variables, to predict wine prices. I created a total of five ML models and used a test dataset covering 1979 to 1980 to verify their accuracy. Comparing the test data with the predictTest output shows that model4 gives the closest prediction of the wine prices of 6.95 for 1979 and 6.5 for 1980.
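The train/test workflow described above can be sketched in base R. This is only a minimal illustration under stated assumptions: the column names (Age, AvgTemp, HarvestRain), the simulated numbers, and the formula that generates Price are all invented here and are not the project's data or its model4.

```r
# Toy data: every value below is simulated for illustration only.
set.seed(1)
n <- 27
wine <- data.frame(
  Age         = seq(31, 5),                     # hypothetical: years since vintage
  AvgTemp     = round(rnorm(n, 16.5, 0.7), 2),  # hypothetical: growing-season temperature
  HarvestRain = round(runif(n, 40, 300))        # hypothetical: harvest rainfall
)
# Invented relationship used to simulate a price-like outcome.
wine$Price <- -3 + 0.02 * wine$Age + 0.6 * wine$AvgTemp -
  0.004 * wine$HarvestRain + rnorm(n, 0, 0.3)

# Hold out the two most recent vintages as the test set.
wineTrain <- wine[1:25, ]
wineTest  <- wine[26:27, ]

# Multiple linear regression: price explained by age and weather.
model <- lm(Price ~ Age + AvgTemp + HarvestRain, data = wineTrain)

# Predict the held-out vintages and compare with their actual prices.
predictTest <- predict(model, newdata = wineTest)

# Out-of-sample R^2, using the training mean as the baseline prediction.
SSE <- sum((wineTest$Price - predictTest)^2)
SST <- sum((wineTest$Price - mean(wineTrain$Price))^2)
testR2 <- 1 - SSE / SST

print(cbind(actual = wineTest$Price, predicted = predictTest))
print(testR2)
```

With only two test observations the out-of-sample R^2 is very noisy, so it is best read alongside the side-by-side comparison of actual and predicted prices.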
The maiden voyage of the Titanic, made popular by James Cameron's epic 1997 film, is one of the most infamous tragedies in human history. The "unsinkable" ocean liner struck an iceberg and sank to the bottom of the Atlantic, killing 1,502 of its 2,224 passengers and crew. Today, with the help of the ship's manifest, data analysts and data scientists from around the globe are able to discover numerous insights by applying machine learning algorithms to predict who lived and who ultimately died onboard the RMS Titanic. My goal here is to use machine learning and feature engineering to visualize the factors that distinguish the passengers who were more likely to survive the shipwreck from those who ultimately perished. I used the ggplot2 package to visualize the prediction that women in first class without children were more likely to survive the Titanic, while men in third class with children were more likely to perish.
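A comparison of survival by sex and class can be sketched with the `Titanic` table that ships with base R. Note the assumptions: this built-in table is not the manifest data used in the project (its totals differ, and it has no family or children-aboard feature), and the plot layout is just one plausible choice. The ggplot2 step is skipped if the package is not installed.

```r
# Built-in contingency table: Class x Sex x Age x Survived.
titanic_df <- as.data.frame(Titanic)

# Survival rate for each Class/Sex combination.
counts <- xtabs(Freq ~ Class + Sex + Survived, data = titanic_df)
rate <- prop.table(counts, margin = c(1, 2))[, , "Yes"]
print(round(rate, 2))

# Stacked proportion bars per class, one panel per sex (needs ggplot2).
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(
    titanic_df,
    ggplot2::aes(x = Class, y = Freq, fill = Survived)
  ) +
    ggplot2::geom_col(position = "fill") +
    ggplot2::facet_wrap(~ Sex) +
    ggplot2::labs(y = "Proportion of people", title = "Survival by class and sex")
  print(p)
}
```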
NBA Moneyball uses a multiple linear regression of NBA stats and previous win/loss team records to predict the chances of NBA teams reaching the playoffs. The term Moneyball was made popular by the 2011 movie starring Brad Pitt as Billy Beane, the general manager of the struggling Oakland A's, who used a data analytics approach to winning ball games. In this regression, Model4 has an accuracy of 0.8127.
This Biomedical Text Mining project uses the R statistical package to extract cancer-related abstracts using the RISmed package. A term-document matrix was created and used to find associations between the terms "tumor", "poor", and "rich" and the frequency of other terms in each document. A word cloud and plots were generated showing a clustering of related documents in the corpus.
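The idea of a term-document matrix and term associations can be illustrated in base R. This is a sketch, not the project's code: the four "abstracts" are invented sentences rather than RISmed output, and association is measured here as the correlation of term counts across documents. The tm package provides `TermDocumentMatrix` and `findAssocs` for the same purpose, though the source does not say which functions were used.

```r
# Toy corpus: invented sentences standing in for retrieved abstracts.
docs <- c(
  d1 = "tumor growth linked to poor prognosis",
  d2 = "poor response of tumor to therapy",
  d3 = "protein rich diet and recovery",
  d4 = "tumor cells in rich stromal tissue"
)

# Tokenise: lower-case, split on non-letters, drop empty strings.
tokens <- lapply(strsplit(tolower(docs), "[^a-z]+"), function(w) w[nzchar(w)])
terms <- sort(unique(unlist(tokens)))

# Term-document matrix: one row per term, one column per document.
tdm <- vapply(
  tokens,
  function(w) as.integer(table(factor(w, levels = terms))),
  integer(length(terms))
)
rownames(tdm) <- terms
colnames(tdm) <- names(docs)

# Association of "tumor" with every other term: correlation of their
# count vectors across the documents.
assoc <- cor(t(tdm))["tumor", ]
assoc <- sort(assoc[names(assoc) != "tumor"], decreasing = TRUE)
print(round(assoc, 2))
```

In this toy corpus "poor" co-occurs with "tumor" more often than chance, so its correlation is positive, while "rich" comes out negative. On a real corpus the same matrix would also feed the word cloud and document clustering.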
The objective of this project is to create a sentiment analysis using a corpus of four negative online reviews of the Penske Truck Rentals company. The analysis will help reveal and isolate any apparent systemic problems within the process of leasing Penske trucks. I copied four online reviews from the website http://www.ripoffreport.com and created a separate .txt document for each review.