2021 Survey Results

The following are steps we took to anonymize published survey data in order to preserve the privacy of respondents.
 
1.# Download the survey data into a spreadsheet.<br>
2.# Create pivot tables based on demographic data, language used, and location data (country) to identify groups, particularly demographic groups, with fewer than 20 observations (e.g., fewer than 20 from any country, fewer than 20 using any language).<br>
3.# Identify comments by individuals who did not grant permission to share their comments, and mark appropriately (highlight in red, for example). Ensure that those comments will not leave control of the Board.<br>
4.# Copy comments to a separate spreadsheet and translate them into English using DeepL or Microsoft Translator. Publish translations only of comments whose authors granted permission, and publish them separately from all other data, because comments can potentially be used to identify respondents.<br>
5.# Given the relatively small number of females and representatives of non-binary and other genders in the OSM community, gender data must be released separately from other data in a manner that does not allow identification (deanonymization) of individual respondents. This is also true of the "time in project" variable, which could be used to identify individual respondents if matched to country of residence.<br>
5a.## Create two duplicate files of the survey data, for a total of three copies (original and two duplicates).<br>
5b.## In all spreadsheets, replace labels of all demographic and country groups with fewer than 20 observations with more generic labels (e.g., if fewer than 20 respondents are from Saudi Arabia, change 'Saudi Arabia' to 'Middle East'; if fewer than 20 respondents use Arabic, change the 'ar' language code to 'Middle East').<br>
5c.## In the duplicates of the survey data, delete the columns containing language codes, country codes and names, and all demographic data except gender (in one duplicate) or "time in project" (in the other duplicate). <br>
5d.## Verify that each of the two duplicates retains exactly one of these columns (gender in one, "time in project" in the other) and that all other demographic columns have been deleted.<br>
6.# Delete the 'Response ID', 'Date submitted', 'Last page', 'Start language', 'Seed', and 'May we share (anonymously, of course) your comments with the OpenStreetMap community?' columns from all worksheets to make merging the datasets less likely.<br>
7.# Run a script to randomly scramble the order of responses by gender and "time in project", to prevent the data from being merged with the original spreadsheet and thereby deanonymized.
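
The steps above can be sketched in a short script. This is only an illustration under stated assumptions, not the script that was actually used: the column names (<code>response_id</code>, <code>country</code>, <code>gender</code>, <code>time_in_project</code>, <code>comment</code>, <code>may_share</code>), the consent value <code>"Yes"</code>, the fallback label <code>"Other"</code>, and all function names are hypothetical. It also reads step 7 as shuffling the row order of each duplicate independently, which is one possible interpretation.

```python
import random
from collections import Counter

MIN_GROUP_SIZE = 20  # threshold named in step 2


def generalize_small_groups(rows, column, broader, min_size=MIN_GROUP_SIZE):
    """Step 5b: relabel values of `column` that occur fewer than `min_size` times.

    `broader` maps a specific label to a more generic one
    (e.g. "Saudi Arabia" -> "Middle East"); unmapped labels become "Other".
    """
    counts = Counter(row[column] for row in rows)
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        if counts[row[column]] < min_size:
            new_row[column] = broader.get(row[column], "Other")
        result.append(new_row)
    return result


def drop_columns(rows, columns):
    """Steps 5c and 6: remove the named columns from every row."""
    unwanted = set(columns)
    return [{k: v for k, v in row.items() if k not in unwanted} for row in rows]


def shuffled(rows, seed=None):
    """Step 7 (as interpreted here): return the rows in random order."""
    rng = random.Random(seed)
    result = list(rows)
    rng.shuffle(result)
    return result


def publishable_comments(rows, comment_col="comment", consent_col="may_share"):
    """Steps 3 and 4: keep only non-empty comments whose authors consented."""
    return [
        row[comment_col]
        for row in rows
        if row.get(consent_col) == "Yes" and row.get(comment_col)
    ]


def make_duplicates(rows, broader_country):
    """Steps 5a to 5d: build one gender-only and one tenure-only duplicate."""
    generalized = generalize_small_groups(rows, "country", broader_country)
    always_dropped = ["response_id", "country", "comment", "may_share"]
    gender_copy = shuffled(
        drop_columns(generalized, always_dropped + ["time_in_project"])
    )
    tenure_copy = shuffled(
        drop_columns(generalized, always_dropped + ["gender"])
    )
    return gender_copy, tenure_copy
```

Note that relabelling a small group does not guarantee that the broader group reaches 20 observations, so the counts should be checked again after generalization. The real export also contains more identifying columns than this sketch drops (see step 6).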
 
=== Country clustering ===
|Middle East
|-
| Netherlands
|-
|Other Europe (Non-EU), not elsewhere specified