2021 Survey Results
The 2021 OSMF Community Survey concluded on February 14th. Summary statistics and anonymized raw data are posted here. See the links below.
Final response statistics
The 2021 OSMF Community Survey ran from January 16 to February 14, 2021. It was made available initially in 14 languages, and 4 more were added. The survey was advertised via social media, mailing lists, direct emails to user and working groups, and via banners on websites. The target audience was members of the OpenStreetMap community.
Full responses 2958
Incomplete responses 1270
Total responses 4228
Numbers of responses to specific optional demographic questions
What is your gender? 2831
Country of residence? 2284
Are you a...? 2958
Survey participant summary
Total tokens issued 4958
Total surveys completed 2958
Responses came from 121 countries, based on respondents' declarations of countries of residence. You may download the list of countries with numbers of respondents from each here. The sum of responses to this question is 2,284 (not all respondents provided this information). The top 20 countries of declared residence are:
Paul Norman (aka pnorman) has analyzed participation in the survey by country compared to active contributor data as well as by views of openstreetmap.org.
This list consists of the languages used initially by respondents. The sum of these numbers is 4,228. Not all of these respondents completed the survey.
Anonymized raw data and summary statistics based on raw data
A spreadsheet containing anonymized raw data from the 2021 survey can be downloaded from here. The responses have been segregated in some cases into individual worksheets to preserve anonymity to the maximum degree. Some countries and languages were grouped in order to preserve anonymity.
A spreadsheet containing summary statistics calculated from the raw data can be downloaded from here. These statistics were calculated using raw data before anonymization.
Guillaume Rischard (aka Stereo) has done a separate calculation of the ranking of community sentiment question S1 here. User TheSwavu added a Condorcet ranking of that question to Stereo's diary post.
Cross-sectional analyses of responses of various segments of the OSM community and associated P-tests (to indicate level of significance of differences) can be downloaded from here.
Weighted (normalized) data
Weighted (normalized) means of questions F1-F5 and S2-S3 of the survey are in this document.
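The weighting procedure is not spelled out on this page, so the following is only a minimal sketch of one common approach (post-stratification), not necessarily the method used for the published figures. It assumes each response carries a stratum label such as country, and that the share of each stratum in a reference population (for example, active contributors) is known. The function name, the strata, and the numbers are all hypothetical.

```python
from collections import Counter

def poststratified_mean(responses, population_share):
    """Weighted mean of answers, reweighting each stratum to its population share.

    responses: list of (stratum, numeric_answer) pairs.
    population_share: dict mapping stratum -> share of the reference population.
    """
    n = len(responses)
    sample_count = Counter(stratum for stratum, _ in responses)
    # Weight = population share divided by sample share of the stratum.
    weight = {s: population_share[s] / (sample_count[s] / n) for s in sample_count}
    total_weight = sum(weight[s] for s, _ in responses)
    return sum(weight[s] * answer for s, answer in responses) / total_weight

# Made-up data: stratum "A" is oversampled relative to its population share.
example = [("A", 4), ("A", 4), ("A", 4), ("B", 2)]
shares = {"A": 0.5, "B": 0.5}  # hypothetical reference shares
unweighted = sum(answer for _, answer in example) / len(example)
weighted = poststratified_mean(example, shares)
```

In this toy case the weighted mean is pulled toward the answers of the undersampled stratum, which is the intended effect of this kind of normalization.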
Comments received in the survey
Two opportunities were provided to offer comments, at the end of the feedback section of the survey, and between the community sentiment and demographic sections of the survey. Respondents were given the option of allowing anonymous public release of comments or asking that they be read only by members of the OSMF Board of Directors. All comments in languages other than English were machine translated into English prior to publication. The spreadsheet containing comments eligible for public release is here.
The target audience was the OSM community, and was specifically not limited only to members of the OSM Foundation. The OSM community is ill defined and definition of it has been controversial. For purposes of this survey, the community was defined as consisting of mappers, communicators, data users, developers/maintainers, event organizers, and hardware/system operators involved in the OSM project. A freeform "other" answer was also made available in that demographic question.
Selection biases apparent in the survey
- The survey required Internet access (but so do contribution to and use of OpenStreetMap, so this particular bias is of relatively little concern).
- It required knowledge of one of the 18 languages used in the survey (this is the largest number of languages used to date in any OSMF survey, and was part of a concerted effort to reduce bias toward speakers and readers of English).
- It was announced and advertised through OSM-centric communications media (OSM mail lists; direct email to user groups and working groups; social media channels used by OSM contributors and users, local chapters, and local communities; newsletters; banners on osm.org and in editors) but access to the survey instrument was unrestricted (this created potential for oversampling, i.e., inclusion of respondents external to the OSM community).
A member of the community independently published the questions separately from the survey. This injected an additional selection bias by alerting some potential respondents to the specific questions to be asked, causing some of them to decline to participate. The Board had deliberately declined to publish the questions in advance so as to avoid this problem.
Post-survey, we noted that members of the OpenStreetMap Foundation were overrepresented. Based on the demographic data, slightly more than 1/5 of respondents declared themselves to be Foundation members. However, comparison of summary statistics of the entire sample with summary statistics of Foundation members revealed no substantial bias in the data due to this. It is highly probable that employees of NGOs and firms using OSM data, members of working groups, as well as members of local chapters and communities were also overrepresented. Again, however, no substantial bias in the data due to this could be discerned when results from those segments were compared to other segments.
Assumptions
- That the sample would be biased toward individuals with above-average engagement in the project, and that casual mappers and data users would be underrepresented. This assumption implies that mappers eligible for the "active contributor program", who number fewer than 8,000, would constitute a significant portion of the sample and likely be overrepresented.
- That Foundation members would be overrepresented, since Foundation members are typically more engaged (as noted above, this assumption was justified).
- That some degree of geographic normalization of data would be possible by comparing responses to known characteristics of mappers in the community, such as locations and volumes of edits and changesets from OSMstats (this normalization effort has indicated that the results are not significantly biased in any direction in terms of geographic coverage).
- That making demographic data optional would increase overall participation in the survey (the results justified this assumption).
- That forcing answers to the "feedback" and "community sentiment" questions would reduce bias against respondents who either did not care about or had no opinion on the specific issue (the high number of "neutral" responses to question F1 is a strong indicator that this assumption was justified).
- That the sample would be biased toward individuals interested in the OSM project.
Anonymization of survey data to preserve privacy
The following are steps we took to anonymize published survey data in order to preserve the privacy of respondents.
- Download the survey data into a spreadsheet.
- Create pivot tables based on demographic data, language used, and location data (country) to identify groups, particularly demographic groups, with fewer than 20 observations (e.g., fewer than 20 from any country, fewer than 20 using any language).
- Identify comments by individuals who did not grant permission to share their comments, and mark appropriately (highlight in red, for example). Ensure that those comments will not leave control of the Board.
- Copy comments to a separate spreadsheet, translate them into English using DeepL or Microsoft Translator, and publish translations of only those comments for which such assent has been granted, and publish only separately from all other data (comments can potentially be used to identify respondents).
- Given the relatively small number of females and representatives of non-binary and other genders in the OSM community, gender data must be released separately from other data in a manner that does not allow identification (deanonymization) of individual respondents. This is also true of the "time in project" variable, which could be used to identify individual respondents if matched to country of residence.
- Create two duplicate files of the survey data, for a total of three copies (original and two duplicates).
- In all spreadsheets, replace labels of all demographic and country groups with fewer than 20 observations with more generic labels (e.g., if fewer than 20 respondents are from Saudi Arabia, change 'Saudi Arabia' to 'Middle East'; if fewer than 20 respondents use Arabic, change the 'ar' language code to 'Middle East').
- In the duplicates of the survey data, delete the columns containing language codes, country codes and names, and all demographic data except gender (in one duplicate) or "time in project" (in the other duplicate).
- Ensure that in the two duplicates, one or the other of these columns is preserved and all other demographic columns are deleted.
- Delete the 'Response ID', 'Date submitted', 'Last page', 'Start language', 'Seed', 'May we share (anonymously, of course) your comments with the OpenStreetMap community?' columns from worksheets to make the possibility of merging of the datasets less likely.
- Run a script to randomly scramble the order of responses in the gender and "time in project" duplicates, in order to prevent merging with the original spreadsheet and thereby deanonymizing the data.
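The steps above were carried out in spreadsheets. As an illustration only, the core operations (relabeling groups with fewer than 20 observations, keeping gender and "time in project" in separate files, dropping identifying columns, and shuffling row order) could be sketched as follows. The column names, the region mapping, and the data are hypothetical and do not reflect the actual survey export.

```python
import random
from collections import Counter

MIN_GROUP_SIZE = 20  # threshold named in the steps above

def generalize_small_groups(rows, column, broader_label, min_size=MIN_GROUP_SIZE):
    """Relabel values of `column` that occur fewer than `min_size` times."""
    counts = Counter(row[column] for row in rows)
    result = []
    for row in rows:
        value = row[column]
        if counts[value] < min_size:
            # broader_label is a hypothetical lookup, e.g. country -> region
            value = broader_label.get(value, "Other, not elsewhere specified")
        result.append({**row, column: value})
    return result

def keep_columns(rows, columns):
    """Return copies of the rows containing only the listed columns."""
    return [{name: row[name] for name in columns} for row in rows]

def shuffled(rows, seed=None):
    """Return the rows in random order so row position cannot re-join sheets."""
    rng = random.Random(seed)
    copy = list(rows)
    rng.shuffle(copy)
    return copy

# Made-up data: 25 respondents from one country, 3 from another.
rows = [
    {"response_id": i, "country": "Germany" if i < 25 else "Saudi Arabia",
     "gender": "prefer not to say", "time_in_project": "1-3 years", "F1": 3}
    for i in range(28)
]
regions = {"Saudi Arabia": "Middle East"}  # hypothetical mapping

main_sheet = keep_columns(generalize_small_groups(rows, "country", regions),
                          ["country", "F1"])
gender_sheet = shuffled(keep_columns(rows, ["gender", "F1"]))
tenure_sheet = shuffled(keep_columns(rows, ["time_in_project", "F1"]))
```

Note that a merged label can itself end up with fewer than 20 observations (as in this toy data), so a real run would need to re-check group sizes after relabeling and merge further where necessary.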
To preserve anonymity, certain countries are clustered (aggregated) in the publicly released dataset. The country/region list is as follows:
|Americas, other, not elsewhere specified|
|East Asia/Oceania, not elsewhere specified|
|European Union, other, not elsewhere specified|
|Other Europe (Non-EU), not elsewhere specified|
|South & West Asia, not elsewhere specified|
Plan for presentation of survey results
Participation numbers, the list of participating countries, releasable comments, anonymized raw data, and summary statistics have been released (see above). Weighted (normalized) summary statistics for questions F1-F5 and S2-S3 have been released (see above).
The main effort to release data is complete. If you are interested in analyses absent from the statistics presented above and that cannot be calculated from the anonymized raw data, please contact the Board of Directors at email@example.com with your analytical request and we will seek to accommodate it as time permits. Respecting privacy and anonymity of respondents is our highest priority, followed closely by presentation and analysis of the data as quickly as we can.
A GeoMob podcast on the survey results was recorded on March 4 and can be heard here.
Remaining to be done:
Answering all questions posed in various fora
Questions and answers
Q. How important is the takeover protection (question S1, choice 1) for a) the survey respondents who are a corporate sponsor of OSM or a commercial company using OSM data (question D1, choices 5, 6, or and b) those respondents who are neither? Is there a difference between the two groups? If yes, is it (statistically) significant?
A. There is a statistically significant difference between those who answered yes and those who answered no to these two questions. I did not break out the "double no" group ("no" in both categories) because, to be frank, doing that complicates the problem, and the difference on initial examination was not that great. You can see the numbers in the spreadsheet here. Unfortunately, because the raw data must be kept separate, this spreadsheet contains only the values of the calculations and not the formulae, but you can reproduce the calculations from the 28 pivot tables if you wish to check the math.
The methodology was to assign scores of 7 down to 1 to the first choice, second choice, et cetera for the option "Takeover protection". I multiplied the score of each choice by the number of respondents who made that choice, then took the means and standard deviations (the range, of course, was 1 through 7) and counted the number of observations.
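For readers who want to reproduce this kind of calculation from pivot-table counts, here is a minimal sketch. It computes the number of observations, mean, and sample standard deviation of the 7-to-1 scores for one group. The answer above does not say which significance test was used, so the Welch t statistic shown is an assumption, and the counts are made up.

```python
from math import sqrt

def score_stats(counts_by_score):
    """Return (n, mean, sample standard deviation) from a score -> count table.

    counts_by_score maps a score (7 = ranked first ... 1 = ranked last)
    to the number of respondents who gave "Takeover protection" that rank.
    """
    n = sum(counts_by_score.values())
    mean = sum(score * count for score, count in counts_by_score.items()) / n
    variance = sum(count * (score - mean) ** 2
                   for score, count in counts_by_score.items()) / (n - 1)
    return n, mean, sqrt(variance)

def welch_t(stats_a, stats_b):
    """Welch's t statistic for two groups (an assumed choice of test)."""
    n_a, mean_a, sd_a = stats_a
    n_b, mean_b, sd_b = stats_b
    return (mean_a - mean_b) / sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)

# Made-up counts for two hypothetical respondent groups.
group_yes = score_stats({7: 10, 6: 5, 5: 5})
group_no = score_stats({7: 5, 6: 5, 5: 10})
t_value = welch_t(group_yes, group_no)
```

Working from counts rather than individual rows matches the constraint described above: only the pivot tables are public, not the raw responses.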