Recently Published
Word Prediction Model Presentation
Presentation slides explaining the use, methodology and accuracy of my word prediction model.
Word Prediction Model
As part of the Data Science Specialisation on Coursera, we are tasked with building a predictive text product. In this R Markdown file, we develop a modified Kneser-Ney prediction algorithm to predict the most likely next words from a user's text input.
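To give a feel for the idea behind Kneser-Ney smoothing, here is a minimal Python sketch of the standard interpolated bigram form. It is not the modified algorithm from the report (which is written in R Markdown); the function names `train_bigram_kn` and `predict_next`, the fixed discount of 0.75, and the toy sentence are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_kn(tokens, discount=0.75):
    """Return prob(w, v), an interpolated Kneser-Ney estimate of P(w | v).

    Illustrative sketch only: bigrams only, one fixed discount (assumed value).
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    context_count = Counter()      # c(v): how often v is followed by anything
    followers = defaultdict(set)   # distinct words seen after v
    preceders = defaultdict(set)   # distinct words seen before w
    for (v, w), c in bigrams.items():
        context_count[v] += c
        followers[v].add(w)
        preceders[w].add(v)
    n_bigram_types = len(bigrams)

    def prob(w, v):
        # Continuation probability: in how many distinct contexts does w appear?
        p_cont = len(preceders.get(w, ())) / n_bigram_types
        if context_count[v] == 0:
            return p_cont  # unseen context: fall back to continuation probability
        discounted = max(bigrams[(v, w)] - discount, 0) / context_count[v]
        backoff_weight = discount * len(followers[v]) / context_count[v]
        return discounted + backoff_weight * p_cont

    return prob


def predict_next(prob, vocab, context_word, k=3):
    """Rank the vocabulary by P(w | context_word) and keep the top k words."""
    return sorted(vocab, key=lambda w: prob(w, context_word), reverse=True)[:k]


tokens = "the cat sat on the mat the cat ate".split()  # toy corpus (assumed)
prob = train_bigram_kn(tokens)
vocab = set(tokens)
print(predict_next(prob, vocab, "the"))
```

The discount moves a little probability mass from seen bigrams to a lower-order distribution that rewards words appearing after many different contexts, which is the property that distinguishes Kneser-Ney from simple backoff.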
Coursera - Data Science Capstone: Milestone - Report
As part of the Data Science Specialisation on Coursera, we are tasked with building a predictive text product. Before building our model, we explore and clean the data. We then conduct preliminary data analysis to identify the most frequent unigrams, bigrams and trigrams.
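The frequency analysis described above can be sketched in a few lines of Python. This is an illustration rather than the report's own R code; the helper names `tokenize` and `ngram_counts`, the lowercase letters-and-apostrophes tokenisation rule, and the sample text are assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes (assumed rule)."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens.

    Note: this simple version lets n-grams cross sentence boundaries.
    """
    return Counter(zip(*(tokens[i:] for i in range(n))))


sample = "The cat sat. The cat ran."  # toy text (assumed)
sample_tokens = tokenize(sample)
for n in (1, 2, 3):
    # Show the three most frequent n-grams of each order.
    print(n, ngram_counts(sample_tokens, n).most_common(3))
```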
Titanic: Machine Learning from Disaster (Data Cleaning)
Conduct simple exploratory data analysis, data cleaning and data imputation.
Reproducible Pitch: Distribution Visualisation Application
In this app, the student first chooses a distribution to explore, then sets its parameters to visualize how the distribution looks.
Fuel Efficiency of Various Car Makes
In this data visualisation exercise, we track the city-driving fuel efficiency of specific car makes from 1985 to 2000.
Visualizing Taxi Availability in Singapore
In this report, we call the taxi-availability real-time API from data.gov.sg and visualize the current locations of all available taxis on a leaflet map. The API returns the latitude and longitude of all available taxis at a given time, which you can specify by setting the date_time parameter.
To call the API, you will have to apply for an API key by creating an account at data.gov.sg. To obtain the most current data, we leave the date_time parameter blank.
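As a rough illustration of the request described above, the Python sketch below builds the query URL with and without date_time and pulls coordinates out of a response. The endpoint URL, the GeoJSON-style response shape (a `features` list whose geometry holds `[longitude, latitude]` pairs), and the helper names are assumptions, not details taken from the report. The network call and API key handling are left out, so the parsing is shown on a hand-made sample payload.

```python
from urllib.parse import urlencode

# Assumed endpoint for the taxi-availability API; check data.gov.sg for the current one.
BASE_URL = "https://api.data.gov.sg/v1/transport/taxi-availability"


def build_url(date_time=None):
    """Return the request URL; omitting date_time asks for the latest data."""
    if date_time is None:
        return BASE_URL
    return BASE_URL + "?" + urlencode({"date_time": date_time})


def extract_lat_lon(payload):
    """Convert an assumed GeoJSON-style payload into (lat, lon) tuples for mapping."""
    points = []
    for feature in payload.get("features", []):
        # Assumed layout: coordinates is a list of [longitude, latitude] pairs.
        for lon, lat in feature["geometry"]["coordinates"]:
            points.append((lat, lon))
    return points


# Hand-made sample in the assumed response shape (not real API output).
sample_payload = {
    "features": [
        {"geometry": {"coordinates": [[103.85, 1.29], [103.70, 1.34]]}}
    ]
}
print(build_url())
print(build_url("2019-01-01T09:00:00"))
print(extract_lat_lon(sample_payload))
```

Swapping each pair into (latitude, longitude) order is a convenience for mapping libraries such as leaflet, which expect latitude first.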
Peer Graded Assignment - Prediction Assignment
In this prediction exercise, our goal is to predict the manner in which the participants carried out the various exercises, using data from accelerometers on the belt, forearm, arm and dumbbell of 6 participants.
Titanic: Machine Learning from Disaster
The sinking of the RMS Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her maiden voyage, the Titanic sank after colliding with an iceberg, killing 1502 out of 2224 passengers and crew. This sensational tragedy shocked the international community and led to better safety regulations for ships.
One of the reasons that the shipwreck led to such loss of life was that there were not enough lifeboats for the passengers and crew. Although there was some element of luck involved in surviving the sinking, some groups of people were more likely to survive than others, such as women, children, and the upper-class.
In this challenge, we are tasked with analysing what sorts of people were likely to survive.
Taxi Temporal Analysis
Illustrating the impact of the entry of private-hire car companies on the taxi industry through simple data visualisations.
Reproducible Research - Course Project 2
In this report, we identify the severe weather events that are most harmful to population health, as well as those with the greatest economic consequences. To identify these events, we draw on data from the NOAA Storm Database, which tracks characteristics of major storms and weather events in the United States. From our analysis, we find that tornadoes are the most harmful to population health, having killed over 5,000 people and injured more than 90,000, while floods have the greatest economic impact, causing the US more than $100 billion in damage.