Recently Published
K-Nearest Neighbor - Predicting Bug Covering by Ranking
Train a KNN classifier to identify bug-covering questions from each question's rank. The rank is based on the number of YES answers the question received: the question(s) with the most YES answers are ranked one, those with the second-highest number of YES answers are ranked two, and so forth.
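The ranking rule described above can be sketched as a dense ranking, where tied questions share a rank. This is only an illustration: the function name `dense_rank_by_yes` and the sample YES counts are hypothetical and do not come from the published analysis.

```python
def dense_rank_by_yes(yes_counts):
    """Map question id -> rank, where rank 1 has the most YES answers.

    Questions with the same YES count share a rank, and the next
    distinct count gets the next integer (dense ranking).
    """
    distinct_counts = sorted(set(yes_counts.values()), reverse=True)
    rank_of_count = {count: i + 1 for i, count in enumerate(distinct_counts)}
    return {q: rank_of_count[count] for q, count in yes_counts.items()}


# Hypothetical YES counts per question, for illustration only.
yes_counts = {"q1": 12, "q2": 7, "q3": 12, "q4": 3}
ranks = dense_rank_by_yes(yes_counts)
print(ranks)
```

The resulting rank could then serve as the single feature given to the KNN classifier.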
K-Nearest Neighbor - Predicting Bug Covering by Threshold Voting
Estimate the number of YES votes needed to predict which code fragments are related to a failure. This minimum level is called the threshold voting metric. I used k-nearest neighbor with leave-one-out cross-validation. The best classification came when a question received at least 6 YES votes.
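As a rough sketch of the evaluation procedure named above, the following shows k-nearest neighbor with leave-one-out cross-validation on a single feature (the YES count). The data, the choice of k = 3, and the helper names `knn_predict` and `loocv_accuracy` are assumptions made for illustration; they are not the study's data or code.

```python
from collections import Counter


def knn_predict(train, x, k=3):
    """Predict a label for x by majority vote among the k nearest training points.

    train is a list of (feature, label) pairs with a numeric feature.
    """
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def loocv_accuracy(data, k=3):
    """Leave-one-out cross-validation: hold out each point once, train on the rest."""
    hits = 0
    for i, (x, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += knn_predict(train, x, k) == label
    return hits / len(data)


# Hypothetical (yes_votes, covers_bug) pairs, for illustration only.
data = [(1, 0), (2, 0), (3, 0), (4, 0), (8, 1), (9, 1), (10, 1), (11, 1)]
accuracy = loocv_accuracy(data, k=3)
print(accuracy)
```

Leave-one-out is a natural fit when the number of questions is small, since every question is used for testing exactly once.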
Bug prediction based on Majority Voting
Estimate the majority-vote level needed to predict which code fragments are related to a failure. The majority vote is computed as the number of YES votes minus the number of NO votes. I used k-nearest neighbor with leave-one-out cross-validation. The best classification came when the difference between YES and NO votes was greater than or equal to -2.
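The majority-vote metric and the reported cutoff can be written as a simple decision rule. This is a minimal sketch: the function names and the vote counts are hypothetical, and only the cutoff of -2 is taken from the description above.

```python
def majority_margin(yes_votes, no_votes):
    """Majority-vote metric: YES votes minus NO votes."""
    return yes_votes - no_votes


def predicts_bug_covering(yes_votes, no_votes, cutoff=-2):
    """Flag a question as bug covering when the margin is at least the cutoff."""
    return majority_margin(yes_votes, no_votes) >= cutoff


# Hypothetical vote counts, for illustration only.
borderline = predicts_bug_covering(yes_votes=8, no_votes=10)  # margin is -2
rejected = predicts_bug_covering(yes_votes=5, no_votes=12)    # margin is -7
print(borderline, rejected)
```

A negative cutoff means a question can still be flagged even when NO votes slightly outnumber YES votes.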
How are workers distributed by bug-covering questions?
Since I did not control for workers across different types of questions, I would like to know whether there is a statistically significant concentration of highly skilled workers on the questions that cover bugs.
Are answers more accurate for easier questions?
Analysis of answer accuracy versus difficulty of questions.
Were some questions answered by more experienced workers?
Analysis of how workers were distributed across questions in terms of their years of programming experience.
How are professions distributed across questions?
Workers self-declared their profession: professional developer, graduate student, hobbyist, undergraduate student, or other. Since workers were randomly allocated to questions, I would like to know whether some questions were answered by a disproportionate number of workers from certain professions. I am particularly interested in the professions with lower answer accuracy, because a concentration of such workers can cause some questions to be overlooked.
Worker score distribution across questions
Each question was asked of 20 different workers, and each worker received a score from a computer programming test. The score ranges from 3 to 5 and corresponds to the number of correct answers on the test (5 was the maximum).
How is answer accuracy distributed by question?
How is answer accuracy distributed over all the questions asked?
What are the questions with the most and least accurate answers?
Do these questions have anything in common?
Are there meaningful clusters of Mechanical Turk workers grouped by age and years of programming experience?
An experiment in which MTurk workers helped locate faults in open source software. This shows how the workers were distributed by age and years of programming experience. Yes, many MTurkers declared that they can read source code!