RPubs

by RStudio

kouimette

Kimberly Ouimette

Recently Published

Real or Fake? Using a Naïve Bayes Classifier to Identify Fraudulent Job Postings

Abstract: Fraudulent job postings are a significant global concern, with negative consequences for both individuals and economies. In this study, I aim to replicate the preprocessing steps outlined by Amaari et al. (2022) and to extend their work by comparing the performance of Naïve Bayes classifiers with that of other machine learning algorithms (e.g., random forest, support vector machines) in predicting fraudulent job postings. Using the same dataset of real job advertisements analyzed by Amaari et al. (2022), I replicated their rigorous text preprocessing techniques and their oversampling methods for addressing imbalanced data, then applied a Naïve Bayes classification model. Results indicate that although Naïve Bayes models perform well, they do not surpass the other machine learning algorithms tested, outperforming only K-nearest neighbors. Notably, further refining the preprocessing steps (i.e., reducing the feature space) significantly improved Naïve Bayes performance, highlighting the importance of preprocessing in machine-learning-based text analysis. These findings contribute to a growing literature on how different machine learning algorithms perform in detecting fraudulent job postings and emphasize the need for comparative analysis in text classification.

over 1 year ago

Assignment 2: Exploratory Data Analysis

over 2 years ago

American National Election Studies Data Exploration

Johns Hopkins University, Programming and Data Management Course Final