2021 Survey Results
The 2021 OSMF Community Survey concluded on February 14th. Summary statistics and anonymized raw data are posted here. See the links below.
The 2021 OSMF Community Survey ran from January 16 to February 14, 2021. It was initially available in 14 languages; 4 more were added later. The survey was advertised via social media, mailing lists, direct emails to user groups and working groups, and banners on websites. The target audience was members of the OpenStreetMap community.
Final response statistics
Full responses 2958
Incomplete responses 1270
Total responses 4228
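As a quick consistency check on the figures above, the following minimal sketch confirms that full and incomplete responses add up to the total and computes the share of responses that were complete. The variable and function names are illustrative only and are not part of the survey tooling.

```python
# Figures copied from the "Final response statistics" table.
FULL_RESPONSES = 2958
INCOMPLETE_RESPONSES = 1270
TOTAL_RESPONSES = 4228


def completion_share(full: int, total: int) -> float:
    """Return the fraction of all responses that were completed."""
    return full / total


# The two categories should account for every response.
assert FULL_RESPONSES + INCOMPLETE_RESPONSES == TOTAL_RESPONSES

share = completion_share(FULL_RESPONSES, TOTAL_RESPONSES)
print(f"{share:.1%} of responses were complete")
```

Roughly 70% of the responses that were started were completed.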
Numbers of responses to specific optional demographic questions
What is your gender? 2831
Country of residence? 2284
Are you a...? 2958
Survey participant summary
Total tokens issued 4958
Total surveys completed 2958
Responses came from 121 countries, based on respondents' declarations of countries of residence. You may download the list of countries with numbers of respondents from each here. The sum of responses to this question is 2,284 (not all respondents provided this information). The top 20 countries of declared residence are:
Paul Norman has analyzed survey participation by country, comparing it with active contributor data and with views of openstreetmap.org.
This list consists of the languages used initially by respondents. The sum of these numbers is 4,228. Not all of these respondents completed the survey.
Anonymized raw data and summary statistics based on raw data
A spreadsheet containing anonymized raw data from the 2021 survey can be downloaded from here. In some cases the responses have been separated into individual worksheets to preserve anonymity as far as possible. Some countries and languages were grouped for the same reason.
A spreadsheet containing summary statistics calculated from the raw data can be downloaded from here. These statistics were calculated using raw data before anonymization.
Guillaume Rischard (aka Stereo) has done a separate calculation of the ranking of community sentiment question S1 here.
Cross-sectional analyses of responses from various segments of the OSM community, with associated P-tests (indicating the level of significance of differences), can be downloaded from here.
Weighted (normalized) data
Weighted (normalized) means of questions F1-F5 and S2-S3 of the survey are in this document.
Dr. Jennings Anderson has kindly provided the data used for normalization.
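This page does not describe the normalization procedure itself. As a rough illustration only, the sketch below shows one common way to compute a weighted mean (post-stratification), under the assumption that each response is weighted by the ratio of its group's share in a reference population to its share in the sample. The function name, group labels, shares, and scores are all hypothetical; the linked document may use a different method.

```python
from collections import Counter


def weighted_mean(responses, population_share):
    """Post-stratified mean of survey scores.

    responses: list of (group, score) pairs, one per respondent.
    population_share: dict mapping group -> share of the reference
        population (hypothetical values; shares should sum to 1).
    """
    sample_counts = Counter(group for group, _ in responses)
    n = len(responses)
    weighted_total = 0.0
    weight_total = 0.0
    for group, score in responses:
        sample_share = sample_counts[group] / n
        # Over-represented groups get weights below 1, under-represented above 1.
        weight = population_share[group] / sample_share
        weighted_total += weight * score
        weight_total += weight
    return weighted_total / weight_total


# Hypothetical example: group "A" is over-represented in the sample.
example = [("A", 4), ("A", 4), ("A", 4), ("B", 2)]
print(weighted_mean(example, {"A": 0.5, "B": 0.5}))
```

In this made-up example the unweighted mean is 3.5, while the weighted mean is pulled toward group "B" because that group is under-represented in the sample relative to the assumed reference population.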
Comments received in the survey
Respondents could offer comments at two points: at the end of the feedback section of the survey, and between the community sentiment and demographic sections. They were given the option of allowing anonymous public release of their comments or asking that they be read only by members of the OSMF Board of Directors. All comments in languages other than English were machine translated into English prior to publication. The spreadsheet containing comments eligible for public release is here.
Anonymization of survey data to preserve privacy
The following are steps we took to anonymize published survey data in order to preserve the privacy of respondents.
1. Download the survey data into a spreadsheet.
2. Create pivot tables based on demographic data, language used, and location data (country) to identify groups, particularly demographic groups, with fewer than 20 observations (e.g., fewer than 20 from any country, fewer than 20 using any language).
3. Identify comments by individuals who did not grant permission to share their comments, and mark appropriately (highlight in red, for example). Ensure that those comments will not leave control of the Board.
4. Copy comments to a separate spreadsheet and translate them into English using DeepL or Microsoft Translator. Publish translations of only those comments for which such assent has been granted, and publish them separately from all other data (comments can potentially be used to identify respondents).
5. Given the relatively small number of females and representatives of non-binary and other genders in the OSM community, gender data must be released separately from other data in a manner that does not allow identification (deanonymization) of individual respondents. This is also true of the "time in project" variable, which could be used to identify individual respondents if matched to country of residence.
5a. Create two duplicate files of the survey data, for a total of three copies (original and two duplicates).
5b. In all spreadsheets, replace labels of all demographic and country groups with fewer than 20 observations with more generic labels (e.g., if fewer than 20 respondents are from Saudi Arabia, change 'Saudi Arabia' to 'Middle East'; if fewer than 20 respondents use Arabic, change the 'ar' language code to 'Middle East').
5c. In the duplicates of the survey data, delete the columns containing language codes, country codes and names, and all demographic data except gender (in one duplicate) or "time in project" (in the other duplicate).
5d. Ensure that in the two duplicates, one or the other of these columns is preserved and all other demographic columns are deleted.
6. Delete the 'Response ID', 'Date submitted', 'Last page', 'Start language', 'Seed', and 'May we share (anonymously, of course) your comments with the OpenStreetMap community?' columns from the worksheets to make merging of the datasets less likely.
7. Run a script to randomly scramble the order of responses in the gender and "time in project" worksheets, in order to prevent merging with the original spreadsheet and thereby deanonymizing the data.
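The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the script actually used: the column names `country`, `language`, `gender`, and `time_in_project`, the label mappings, and the `"Other"` fallback are hypothetical, and rows are modeled as plain dictionaries rather than spreadsheet cells.

```python
import random
from collections import Counter

MIN_GROUP_SIZE = 20  # threshold named in steps 2 and 5b

# Hypothetical label mappings; the real groupings are listed on this page.
REGION_OF = {"Saudi Arabia": "Middle East"}
LANGUAGE_GROUP_OF = {"ar": "Middle East"}

# Columns removed in step 6 so the sheets are harder to merge.
LINKING_COLUMNS = {
    "Response ID", "Date submitted", "Last page", "Start language", "Seed",
    "May we share (anonymously, of course) your comments with the "
    "OpenStreetMap community?",
}


def generalize_small_groups(rows, column, mapping, fallback="Other"):
    """Step 5b: relabel values that occur fewer than MIN_GROUP_SIZE times."""
    counts = Counter(row[column] for row in rows)
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows stay untouched
        if counts[row[column]] < MIN_GROUP_SIZE:
            new_row[column] = mapping.get(row[column], fallback)
        result.append(new_row)
    return result


def drop_columns(rows, columns):
    """Steps 5c and 6: return copies of the rows without the given columns."""
    return [{k: v for k, v in row.items() if k not in columns} for row in rows]


def shuffled(rows, rng):
    """Step 7: return the rows in random order."""
    rows = list(rows)
    rng.shuffle(rows)
    return rows


def build_release_sheets(rows, rng=None):
    """Return (main, gender_sheet, tenure_sheet) ready for release."""
    rng = rng or random.Random()
    rows = generalize_small_groups(rows, "country", REGION_OF)
    rows = generalize_small_groups(rows, "language", LANGUAGE_GROUP_OF)
    rows = drop_columns(rows, LINKING_COLUMNS)
    # Assumption: the main sheet carries neither gender nor time in project.
    main = drop_columns(rows, {"gender", "time_in_project"})
    gender_sheet = shuffled(
        drop_columns(rows, {"country", "language", "time_in_project"}), rng)
    tenure_sheet = shuffled(
        drop_columns(rows, {"country", "language", "gender"}), rng)
    return main, gender_sheet, tenure_sheet
```

Calling `build_release_sheets(rows)` on a list of response dictionaries yields three lists that can be written out as separate worksheets. Note that a generalized group can itself still contain fewer than 20 respondents, so the counts should be re-checked after relabeling, and comments (step 4) are assumed to be handled separately.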
To preserve anonymity, certain countries are clustered (aggregated) in the publicly released dataset. The country/region list is as follows:
|Americas, other, not elsewhere specified|
|East Asia/Oceania, not elsewhere specified|
|European Union, other, not elsewhere specified|
|Other Europe (Non-EU), not elsewhere specified|
|South & West Asia, not elsewhere specified|
Plan for presentation of survey results
Participation numbers, the list of participating countries, releasable comments, anonymized raw data, and summary statistics have been released (see above). Weighted (normalized) summary statistics for questions F1-F5 and S2-S3 have been released (see above).
The main effort to release data is complete. If you are interested in analyses that are absent from the statistics presented above and cannot be calculated from the anonymized raw data, please contact the Board of Directors at firstname.lastname@example.org with your analytical request and we will seek to accommodate it as time permits. Respecting the privacy and anonymity of respondents is our highest priority, followed closely by presenting and analyzing the data as quickly as we can.
Remaining to be done:
Mar 4 – GeoMob podcast interview with Ed Freyfogle on the survey.
Mar 10-11 - Series of three BBB video conferences, open to the OSM community and spaced 8 hours apart, to answer questions about the data: at noon UTC and 2000 UTC on March 10, and at 0400 UTC on March 11 (these correspond to 0700, 1500, and 2300 EST).