Recently Published
Data management ideas for researchers
My sense is that data management is a challenge for researchers. In an academic context, some fields may receive greater institutional support than others. My experience in business schools was that there was very little support for data curation and management. While many of the ideas I discuss here are general in nature, for concreteness, I focus on the special case of a WRDS user maintaining a local Parquet data library of the kind discussed in Appendix E of Empirical Research in Accounting: Tools and Methods and provide examples using my Python package db2pq.
Category Learning Analyses
Scripts with analyses reported in the manuscript "Stimulus Presentation Duration Affects Category-Learning Accuracy", by Fotis A. Fotiadis, Iris Antonatou Stamatopoulou, & Argiro Vatakis
Predictive_Data_AT: A Structured Workflow for Data Cleaning, Imputation, and Descriptive Analytics in R
Predictive_Data_AT is a comprehensive R‑based workflow that guides users through the full lifecycle of preparing a raw dataset for predictive modeling. The script automates essential preprocessing tasks, including directory setup, data import, sampling, missing‑value detection, blank‑to‑NA conversion, factor re‑encoding, and multi‑stage imputation using mean, mode, and interpolation strategies. It also generates detailed exploratory summaries and descriptive statistics such as central tendency, dispersion, skewness, and kurtosis to help users evaluate the effects of cleaning and imputation on data structure and distribution. This framework provides a reproducible, pedagogically structured template for developing data literacy, ensuring data integrity, and preparing high‑quality inputs for downstream predictive analytics.
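The blank-to-NA conversion and the mean and mode imputation steps described above can be illustrated with a minimal sketch. This is not the Predictive_Data_AT script itself (which is written in R); it is a standard-library Python illustration, and the function names and sample values are hypothetical.

```python
from statistics import mean, mode


def blank_to_none(values):
    """Treat empty or whitespace-only strings as missing (None)."""
    return [None if isinstance(v, str) and v.strip() == "" else v for v in values]


def impute_mean(values):
    """Replace missing numeric entries with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def impute_mode(values):
    """Replace missing categorical entries with the most frequent observed value."""
    observed = [v for v in values if v is not None]
    fill = mode(observed)
    return [fill if v is None else v for v in values]


# Hypothetical sample columns, made up purely for illustration.
ages = [20, None, 30, 40]
colors = ["red", "", "blue", "red"]

ages_imputed = impute_mean(ages)
colors_imputed = impute_mode(blank_to_none(colors))
```

Comparing summary statistics before and after a step like this is one way to see how much imputation shifts a column's distribution.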
Assignment 5 - RQ 2
Research Scenario B2
A university researcher is interested in examining whether student type (domestic or international) is associated with pet ownership. The researcher surveys a group of students and collects information on each student’s status (domestic or international) and whether they currently own a pet (yes or no). Understanding this relationship can help the university and local housing providers better anticipate student needs, such as pet-friendly housing options and support services. Conduct the correct Chi-Square analysis to test the hypothesis.
Start with Research Scenario A2.
Download DatasetA2 (provided above).
First, read Lecture 4: R Basics. Make sure you have downloaded RStudio and have a basic understanding of the RStudio layout.
Next, read the overview sections of Lecture 5: Chi-Square Goodness of Fit and Lecture 5: Chi-Square Test of Independence.
Determine which Chi-Square test should be used to test the research question.
Once you have selected the correct Chi-Square test, read the full instructions line by line.
Write the code line by line.
Report your interpretation of the code as comments (lines beginning with #) inside your RScript file.
Save your RScript File.
Once you have finished conducting the test, read Lecture 4: R Markdown & Rpubs.
Read the instructions line by line in order to create an RMarkdown file. Save your RMarkdown file.
Next, create an Rpubs html document. Save your Rpubs URL.
After you finish conducting the analysis for Research Scenario A2, repeat steps 1 through 6 for Research Scenario B2
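For a scenario such as B2, where two categorical variables are cross-tabulated, the Pearson chi-square statistic for a test of independence can be sketched as follows. This is a standard-library Python illustration rather than the R code the assignment asks for; the counts are hypothetical and are not taken from the assignment's dataset. The sketch applies no continuity correction, whereas R's `chisq.test` applies Yates' correction to 2x2 tables by default, so the two results can differ.

```python
from math import erfc, sqrt


def chi_square_independence(table):
    """Return (statistic, degrees of freedom) for a 2-D table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / n
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


def p_value_df1(statistic):
    """Upper-tail p-value, valid only for 1 degree of freedom (a 2x2 table)."""
    return erfc(sqrt(statistic / 2))


# Hypothetical counts: rows = domestic, international; columns = pet yes, pet no.
observed_counts = [
    [30, 20],
    [10, 40],
]

statistic, dof = chi_square_independence(observed_counts)
p_value = p_value_df1(statistic)
print(f"chi-square = {statistic:.3f}, df = {dof}, p = {p_value:.5f}")
```

A small p-value would indicate that, for these made-up counts, pet ownership is not independent of student type.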
R4DS Data Story Report - Axe Throwing
Final rendered version of my semester-long project for R for Data Science, Fall '24.