helenabakic

Helena Bakic

Recently Published

women-in-open-data
We used metadata in RDF form to analyze the prevalence of gender-related data among 54,682 data sets from ‘govdata.de’. The data was retrieved through an API and cleaned. For the analysis, we subset different parts of the metadata to filter for gendered data. We applied four analysis approaches: calculating the share of gendered data sets among all data sets and comparing it year by year, counting the gendered and non-gendered data sets per topic, unsupervised topic modeling with LDA on the descriptions, and term frequency analysis on the keywords. We found that gendered data sets make up only a small share (~6 percent) of all data sets and are confined to 7 out of 12 topics. Furthermore, the most prevalent data appears to concern construction and traffic. Our analysis shows a stark gender data gap; to be inclusive, the portal would need to apply inclusive standards to the data it publishes.
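The first and last of the four approaches can be illustrated with a minimal Python sketch. It is not the code used in the analysis: the record layout (`year`, `title`, `description`, `keywords`), the term list `GENDER_TERMS`, and the toy records are all assumptions made for illustration, since the abstract does not state how gendered data sets were identified.

```python
from collections import Counter

# Hypothetical search terms; the terms actually used in the analysis are not given.
GENDER_TERMS = ("frau", "geschlecht", "gender", "weiblich")


def is_gendered(record):
    """Flag a record if any search term occurs in its title, description, or keywords."""
    parts = [record.get("title", ""), record.get("description", "")]
    parts.extend(record.get("keywords", []))
    text = " ".join(parts).lower()
    return any(term in text for term in GENDER_TERMS)


def share_by_year(records):
    """Return the percentage of gendered data sets per publication year."""
    totals = Counter()
    gendered = Counter()
    for record in records:
        totals[record["year"]] += 1
        if is_gendered(record):
            gendered[record["year"]] += 1
    return {year: 100.0 * gendered[year] / totals[year] for year in sorted(totals)}


def keyword_frequencies(records):
    """Count lowercased keywords across the gendered data sets only."""
    counts = Counter()
    for record in records:
        if is_gendered(record):
            counts.update(keyword.lower() for keyword in record.get("keywords", []))
    return counts


# Toy records standing in for cleaned metadata (invented for this sketch).
records = [
    {"year": 2019, "title": "Verkehrszählung", "description": "",
     "keywords": ["Verkehr"]},
    {"year": 2019, "title": "Bevölkerung nach Geschlecht", "description": "",
     "keywords": ["Bevölkerung", "Geschlecht"]},
    {"year": 2020, "title": "Baugenehmigungen", "description": "",
     "keywords": ["Bauen"]},
]

print(share_by_year(records))
print(keyword_frequencies(records).most_common(3))
```

Plain substring matching is a crude filter and can flag unrelated words that happen to contain a search term, so a real pipeline would likely need tokenization and a vetted term list.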