Recently Published
Mind the Gap: Unveiling Educational Divides in PISA 2018 - A Tale of Top and Bottom Performers
This article analyzes data from the PISA 2018 assessment, focusing on student performance in reading, math, and science across 10 countries. The study compares four bottom-performing countries with six top-performing countries to explore performance gaps and the influence of various factors.
Comprehensive Text Mining and Natural Language Processing Analysis of a Ph.D. Dissertation: Insights from Multi-Faceted Linguistic Exploration
This project aims to conduct a comprehensive linguistic analysis of a Ph.D. dissertation using advanced text mining and natural language processing techniques. By employing a diverse set of analytical methods, including word frequency analysis, sentiment analysis, topic modeling, named entity recognition, and readability assessment, among others, the study seeks to uncover deeper insights into the dissertation’s content, structure, and stylistic features. This multi-faceted approach will not only provide a quantitative understanding of the text but also offer qualitative insights into the dissertation’s themes, coherence, and overall academic contribution. The project’s findings are expected to demonstrate the potential of computational linguistics in enhancing the evaluation and understanding of complex academic texts, potentially paving the way for more sophisticated tools in academic writing and assessment.
Profiling
Testing how to do profiling in R
Complete Propensity Score Matching with Simulated Data
Propensity Score Matching (PSM): Employed PSM to assess the Average Treatment Effect on the Treated (ATT) using various covariates.
Variable Selection and Interaction Terms: Introduced interaction terms to explore potential synergies or contrasting effects between variables.
Polynomial Transformation: Incorporated polynomial terms to capture potential non-linear relationships between test scores and the outcome variable.
Balance Diagnostics: Utilized balance diagnostics to meticulously evaluate the quality of matching in the models.
Average Treatment Effect on the Treated (ATT): Derived different ATT estimates from linear regression models using matched samples to understand treatment effectiveness.
Multiple Correspondence Analysis (MCA) in Educational Data
- Explore the relationships between different demographic variables, such as age, gender, and education level.
- Explore the relationships between different product features, such as price, color, and size.
- Explore the relationships between different customer segments, such as brand loyalists, price-sensitive shoppers, and impulse buyers.
Scraping Google Scholar Using R
# Set the system property for the chromedriver executable path
# Defining search keyword
# Function to Extract Data from a page
Time Series Analysis
- Run Time Series Analysis
- Decompose the time series into its components
- Make a forecast
- Evaluate the Performance of Time Series Model
Linear Model, Quadratic Model, Cubic Model, Periodic Signal
Plot the final fit $F_n(t_i) + P_i$. Your plot should clearly show the final model on top of the entire time series, while indicating the split between the training and testing data.
Plot the periodic signal $P_i$. (Your plot should have 1 data point for each month, so 12 in total.) Clearly state the definition of $P_i$, and make sure your plot is clearly labeled.
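The trend-plus-periodic-signal fit can be sketched in base R on simulated monthly data. This is only an illustration under stated assumptions: the linear trend, the 96/24 train/test split, and the definition of the periodic signal as the mean training residual per calendar month are all choices made for this sketch, and the original post may define them differently. All variable names are hypothetical.

```r
set.seed(1)
n <- 120                                   # 10 years of simulated monthly data
time_idx <- seq_len(n)
month <- ((time_idx - 1) %% 12) + 1        # calendar month 1..12
y <- 0.05 * time_idx + 2 * sin(2 * pi * month / 12) + rnorm(n, sd = 0.3)

train <- time_idx <= 96                    # assumed split: first 8 years train

# Trend F_1(t): linear model fitted on the training data only
train_df <- data.frame(y = y[train], time_idx = time_idx[train])
trend_fit <- lm(y ~ time_idx, data = train_df)
trend <- as.vector(predict(trend_fit, newdata = data.frame(time_idx = time_idx)))

# Periodic signal P_i (assumed definition): mean training residual per month
resid_train <- y[train] - trend[train]
P <- tapply(resid_train, month[train], mean)   # 12 values, one per month

# Final fit F_1(t_i) + P_i and its error on the held-out data
final_fit <- trend + as.vector(P[month])
rmse_test <- sqrt(mean((y[!train] - final_fit[!train])^2))

# Final model on top of the whole series, with the train/test split marked
plot(time_idx, y, type = "l", col = "grey40",
     xlab = "Month index", ylab = "Simulated value")
lines(time_idx, final_fit, col = "red")
abline(v = 96.5, lty = 2)
```

Swapping `y ~ time_idx` for `y ~ poly(time_idx, 2)` or `poly(time_idx, 3)` gives the quadratic and cubic variants of the trend.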
Missing Data Imputation and Getting Ready for Real Analysis
- How to run Multivariate Imputation by Chained Equations (MICE) on real data
- How to use Visualization and Imputation of Missing Values (VIM)
Using ChatGPT in R
How do you use the amazing AI chatbot tool from OpenAI in R? It may sound tough, but it's a piece of cake. Here's how we do it.
Survey of Reading Strategies (SORS) Mining
A study of the texts included in the Survey of Reading Strategies (SORS) using text mining features in R.
Text Mining in R
Tokenizing, Term Document Matrix, Word Analysis, N-Grams
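As a rough illustration of these four steps, here is a base R sketch on two made-up sentences. The post itself may well use dedicated text mining packages; the `tokenize` and `bigrams` helpers and the sample documents are hypothetical and exist only for this example.

```r
docs <- c(doc1 = "Reading strategies help students read.",
          doc2 = "Students use reading strategies often.")

# Tokenizing: lower-case, strip punctuation, split on whitespace
tokenize <- function(x) {
  x <- tolower(gsub("[[:punct:]]", "", x))
  strsplit(x, "\\s+")[[1]]
}
tokens <- lapply(docs, tokenize)

# Term-document matrix: rows are terms, columns are documents
terms <- sort(unique(unlist(tokens)))
tdm <- sapply(tokens, function(tk) table(factor(tk, levels = terms)))

# Word analysis: term frequencies across the whole corpus
word_freq <- sort(rowSums(tdm), decreasing = TRUE)

# N-grams (here bigrams): paste each token with its successor
bigrams <- function(tk) paste(head(tk, -1), tail(tk, -1))
bigram_freq <- table(unlist(lapply(tokens, bigrams)))
```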
Quarto Document with EdSurvey Package
Wondering about how EdSurvey works?
- Use this post to check some examples with PISA 2018 data.
- Why use a weighted sample, and what is its significance?
Data Wrangling
A demonstration of how multiple datasets saved in different formats were loaded, prepared, and merged for a study.
#datascience #datawrangling #rstudio #educationalresearch #SEM #regression
Data Wrangling in R
This demonstration is part of the actual data wrangling that I conducted for my research. It uses some functions that were new to me, such as those for dealing with the levelled data that came with an SPSS dataset. It may be very useful for anyone interested.
Advanced Two Level Hierarchical Models
Highlights
1. Null Model Using lmer() function
2. Random Intercept, Fixed Slope Models
3. Random Intercept, Random Slope Models
4. Best Fitting Model
5. Model Comparison
6. Plotting Models (Random effects and Fixed effects)
7. APA Tables
8. Calculating R-Squared, Omega-Squared values etc.
Full Study- H1B Facts and Figures
Advanced Data Manipulation for H1B Longitudinal Analyses
Importing Data from Multiple Online Sources,
1. Tidying Data,
2. Transforming Data,
3. Visualizing Data,
4. Modeling data, and
5. Communication of Results
Facts and Figures of H1B Employees in the US: Exploratory and Trend Analyses
Any US employer that wants to hire an international employee under the so-called “specialty occupations” needs to go through a standard procedure. One such stage is the Labor Condition Application (LCA), which the employer files with the Department of Labor (DoL). It works much like a written affidavit that the company will pay the new international hire and treat them fairly. It takes roughly 7 days to get a decision on the application, which opens the door for the employer to file for the non-immigrant temporary visa called H1B with the United States Citizenship and Immigration Services (USCIS). Currently, I am waiting on my H1B approval, which gave me enough motivation to look into this data set.
Regression Tutorial and Poor Model Fitting Alternatives
Sample regression analysis from data manipulation to results reporting in Education.
- Exploratory Data Analysis
- Data Visualization
- Regression Analysis
- Poor Model Fitting and Testing Alternate Model: LASSO Regression
Simulating Data
Simulation of Data to be modeled in Education
Table Manipulation for GRADE Analysis, Part 3
Some key information and demonstration of data manipulation in R
Reading Table and Data Manipulation
- Reading a table
- Evaluating the variables
- Splitting the table into small tables
- Writing the Tables in the Local Disk
Data Table Manipulation
- Splitting a Data Table into many small data tables.
- Working independently in small data tables, and
- Merging the manipulated tables into a single data table
Simple Text Analysis with Tidy in R
A demonstration of simple text mining using the tidytext package. This file shows tokenization by words and culminates in a word cloud.
Exponential Simulation, Experimental Research, and Linear Regression
Variety of exploratory data analysis and modeling techniques
Practical Data Manipulation
The solution for Task 1a and 1b must be programmatic and not resolved using copy/paste. You may use programs such as Access, SQL, SPSS or SAS. Save the program and logic you used to complete these tasks with your final files. You could also include your assumptions and explanations of your methodology if you would like.
Working on purrr
Simple tricks using map and other purrr functions
Program Evaluation Analyst Performance Task
R-Basic Part VII
* The str function
* Simulation - Generating Random Numbers
* Simulation - Simulating a Linear Model
* Simulation - Random Sampling
* R Profiler
R-Basic Part VI
- Loop Functions and Debugging in R
- Loop Functions- lapply
- Loop Functions- apply
- Loop Functions- mapply
- Loop Functions- tapply
- Loop Functions- split
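For orientation, a compact base R sketch of the loop functions listed above, run on small made-up inputs (the object names are hypothetical):

```r
scores <- list(math = c(80, 90, 70), reading = c(60, 75))

# lapply: apply a function to each list element, always returns a list
means_list <- lapply(scores, mean)
# sapply: same idea, but simplifies the result to a named vector
means_vec <- sapply(scores, mean)

# apply: apply a function over the rows (1) or columns (2) of a matrix
m <- matrix(1:6, nrow = 2)
row_totals <- apply(m, 1, sum)
col_totals <- apply(m, 2, sum)

# mapply: vectorize over several arguments in parallel,
# here rep(1, 3), rep(2, 2), rep(3, 1)
reps <- mapply(rep, 1:3, 3:1)

# tapply: apply a function to subsets of a vector defined by a grouping
group <- c("a", "b", "a", "b")
value <- c(1, 2, 3, 4)
group_means <- tapply(value, group, mean)

# split: break a vector into a list of groups (often followed by lapply)
pieces <- split(value, group)
```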
R Basic: Part V
- If-else expression
- For-loop, While-loop, and Repeat-loop
- Writing a Function and Return Value
- Lexical Scoping and the values of Free Variables
- Lexical Scoping vs. Dynamic Scoping
- Character String representing date/time into an R datetime object
R-Basic Part IV
* Understand some of the programming capabilities of R
* Use basic conditional expression to perform different operations
* Check if any or all elements of logical vector are TRUE
* Define and call functions to perform various operations
* Pass arguments to functions, and return variables/objects from functions
* Use for-loops to perform repeated operations
* Articulate in-built functions of R that one can try oneself
R Basics: Part III
* Subset a vector based on properties of another vector.
* Use multiple logical operators to index vectors.
* Extract the indices of vector elements satisfying one or more logical conditions.
* Extract the indices of vector elements matching with another vector.
* Determine which elements in one vector are present in another vector.
* Wrangle data tables using functions in the dplyr package.
* Modify a data table by adding or changing columns.
* Subset rows in a data table.
* Subset columns in a data table.
* Perform a series of operations using the pipe operator.
* Create data frames.
* Plot data in scatter plots, box plots, and histograms.
R-Basic Part II
* Create numeric and character vectors.
* Name the elements of a vector.
* Generate numeric sequences.
* Access specific elements or parts of a vector.
* Coerce data into different data types as needed.
* Sort vectors in ascending and descending order.
* Extract the indices of the sorted elements from the original vector.
* Find the maximum and minimum elements, as well as their indices, in a vector.
* Rank the elements of a vector in increasing order.
* Perform arithmetic between a vector and a single number.
* Perform arithmetic between two vectors of the same length.
* Some sample Q/As
R-Basic Part I
1. Creating Objects
1.i. Checking Available Objects in the Active Workspace
1.ii. Solving the Quadratic Equation for the Value of X
1.iii. Removing Unnecessary Objects from the Workspace
2. Functions
2.i. Five Basic Characteristics
3. Data Sets and Variables
4. Variable Types
5. Writing Loops in R
5.i. Write Code for all Possible Outcomes One by One
5.ii. Give a Range
5.iii. Easy for loop
5.iii.a. Easy for loop b
5.iv. for loop 2
5.v. Professional for loop
5.vi. Extra for loop
5.vii. Sample Question/Answer *for loop
6. Writing a for loop that prints the lyrics to the children's songs “Alice the Camel” and “Five Little Monkeys”
Creating a Fake Data Set
This is a demonstration of creating a fake data set. I am going to use this data in various analytical demonstrations. You can copy the code and simulate your own data.
Doing More with Regression Output in R
The default R output for linear and other models includes a lot of unnecessary information besides what we want to see, which can easily overwhelm and sometimes confuse us. Having a few strategies that narrow the output down to what we need, in a visually appealing way, helps immensely.
Examples on Endogeneity, Instrumental Variables, and Experimental Design
In this part of the problem set, we are going to replicate part of the results of Joshua Angrist and William Evans' article "Children and Their Parents' Labor Supply: Evidence from Exogenous Variation in Family Size."
Sample GGPLOT Codes and Plots
A good source of sample GGPLOT code and plots
Simple Linear Model Tutorial
1. Define the Linear Model
2. Fit Linear Model using Fake Data
3. Plot Linear Model
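The three steps can be sketched in base R as follows. The true intercept (2), slope (0.5), noise level, and sample size are arbitrary values assumed for this illustration, not taken from the tutorial.

```r
set.seed(42)

# 1. Define the linear model: y = 2 + 0.5 * x + noise (assumed values)
n <- 100
x <- runif(n, min = 0, max = 10)
y <- 2 + 0.5 * x + rnorm(n, sd = 1)
fake <- data.frame(x = x, y = y)

# 2. Fit the linear model to the fake data
fit <- lm(y ~ x, data = fake)
estimates <- coef(fit)   # should land near the assumed intercept and slope

# 3. Plot the data with the fitted line on top
plot(y ~ x, data = fake, main = "Fake data with fitted line")
abline(fit, col = "red")
```

Because the data are simulated, the estimates can be checked directly against the values used to generate them.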
Powerball Simulation
For a long time I was trying to come up with a function that would simulate a Powerball draw and record the lucky numbers (the complete set of six). I recently realized that a simple function could do such a big task.
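A minimal sketch of such a function, not necessarily the one used in the post. It assumes the current game format of five white balls drawn without replacement from 1 to 69 plus one red Powerball from 1 to 26; the function name `draw_powerball` is hypothetical.

```r
# Simulate one draw: five distinct white balls and one red Powerball
draw_powerball <- function() {
  white <- sort(sample(1:69, 5))   # sample() draws without replacement by default
  red <- sample(1:26, 1)
  c(white, powerball = red)
}

set.seed(2024)
one_draw <- draw_powerball()

# Record many draws: one row per draw, six columns per row
draws <- t(replicate(1000, draw_powerball()))
```

From `draws`, something like `table(draws[, 6])` tabulates how often each red ball came up in the simulation.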
Fisher's Exact P-value, Sharp Null Hypothesis, and KS Test
Causality, Experimental Design, Fisher's Exact P-value, Sharp Null Hypothesis, Kolmogorov-Smirnov Test Statistics
Special Distribution: Simulation and Graph
Based on the lectures from Dr. Duflo at MIT
Downloading and Cleaning Data, Some Preliminary Steps
A demonstration of downloading data from an online repository and carrying out some intuitive data preparation and/or cleaning steps for further analyses.
Fictional Nepalese Research Study
This project is classified into two phases: A. Fictional Data Generation, and B. Exploratory Data Analyses/Data Visualization
Exploratory Data Analysis: Histogram, Kernel Distribution, and CDF
A Self Help Guide to Exploratory Data Analyses using Histogram (including double histogram), Kernel Distribution, and CDF using GGPLOT2, TIDYVERSE, and COWPLOT
sample(), Simulation, and Apply Functions
Self Help Tutorial
Using GGPLOT2
Self help practice of GGPLOT package
Creating Matrix and Doing Fun Things
Self help practice of Matrix in R
Correlation Among Students' Pretest and Posttest Scores_1
Changing the X-axis tick text and size
Capstone Modified Week 2
Modified Document
Presentation Deck _ Capstone Project
Final Assignment for the Coursera Capstone Project by Nirmal Ghimire
Week 2 Capstone
Capstone Project Week 2 Peer Review Assignment
Data Science Capstone_Week 1
This component also includes solutions to Quiz 1. These programs may not run properly on your system because the paths to the data files are specific to my setup. You will need to change the paths for your own system.
Creating Interactive Lessons on R
Using the 'swirlify' package, we can create interactive lessons and tests in R.
- Based on Dr. Peng's Class
Plot.ly Project, Developing Data Products
Assignment
R Markdown Files and Sharing Them
Based on the lectures from Dr. Caffo
Peer Reviewed Project_PML
Practical Machine Learning Final Project
Combining Predictors, Forecasting, & Unsupervised Prediction
Based on Dr. J Leek's Lectures.
Boosting and Model-Based Prediction
Based on Dr. J. Leek's Lecture
Predicting with Trees, Bagging, and Bootstrapping
Based on the lecture by Dr. J. Leek
Predicting with Regression
Using a manual method and the caret package in R.
Based on the lectures by Dr. J Leek.
Preprocessing with PCA
It shows the combination of several prediction packages, including 'caret' and 'rpart'. The functions 'train' and 'predict' did not work for some reason. However, I completed this section with the help of the 'rpart' package.
This section is based on the lectures by Dr. J Leek.
Covariate Creation
Based on Professor J. Leek's lectures
Preprocessing for Prediction Variables
Based on Dr. J. Leek's Class
Plotting Prediction on ML
Based on Professor J. Leek's class
Regression Model Project
Regression Model Project through Coursera
Linear Regression_Residuals
Based on Dr. Caffo's lecture
Document
Part 2
Statistical Inference Project
Document
Part 1
Statistical Inference
Project
Document
Statistical Inference Project
Part 1
Part 2
Reproducible Research Project 2
A small research study in partial fulfillment of the Reproducible Research course
Reproducible Research_Project 1
https://ghimirenirmal.blogspot.com/