mszczepaniak

Michael Szczepaniak

Recently Published

Introduction to Maximum Likelihood
Introduces the concept of maximum likelihood and provides the background for where the equation in Part I came from.
Predicting Next Word Using Katz Back-Off: Part 3 - Understanding and Implementing the Model
The goal of this part was to develop the conceptual framework and the code to implement the Katz Backoff Trigram algorithm as the model used to predict the next word of a given phrase.
Assessing Goodness of Classification Models
Discusses two ways to assess how well a classification model performs: overall error rate and cross-entropy. Provides an example to build intuition for how both metrics work.
Part 4 - Determining Discount Parameters With Cross-Validation
The goal of this part was to determine good values for the discount rates using 5-fold cross-validation on the training partitions of the corpus. The analysis concludes with an accuracy estimate using the held-out test set.
Predicting Next Word Using Katz Back-Off: Part 2 - N-grams Generation and Exploratory Data Analysis
The goal of this part of the project was to generate the unigram, bigram, and trigram tables used by the KBO Trigram language model and to do some exploratory data analysis along the way.
Predicting Next Word Using Katz Back-Off: Part 1 - Overview and Pre-Processing
The goal of this project was to build a data product which uses a Katz Backoff Trigram language model to predict the next word from a series of prior words. This is being implemented as a Shiny R web application, which will be accessible at https://michael-szczepaniak.shinyapps.io/predictnextkbo/. The main goal of this part was to convert the raw corpus data into a form that could be easily used by the next step: building n-gram tables and performing exploratory data analysis (EDA).