Difference between revisions of "2021 Survey Results"

From OpenStreetMap Foundation

Revision as of 18:32, 28 February 2021


The 2021 OSMF Community Survey concluded on February 14th. Summary statistics and anonymized raw data are posted here. See the links below.

Quick Links
Anonymized raw survey data https://wiki.osmfoundation.org/w/images/9/95/2021_OSMF_survey_split_anonymized.ods
Summary statistics https://wiki.osmfoundation.org/w/images/4/44/2021_OSMF_survey_summary_stats.ods
Probability tests (P-tests) of cross-sectional comparisons https://wiki.osmfoundation.org/w/images/7/79/2021_OSMF_survey_P-tests.ods
Weighted (normalized) responses https://wiki.osmfoundation.org/w/images/3/36/2021_OSMF_survey_normalized_weighted_responses.odt
Normalization data https://wiki.osmfoundation.org/w/images/1/1b/Normalization-data.ods
Comments https://wiki.osmfoundation.org/w/images/4/45/2021_survey_releasable_comments_ENG.ods
Mike Migurski's maps https://www.openstreetmap.org/user/migurski/diary/395957
Borda and Condorcet rankings of question S1 (Board priorities for 2021) https://www.openstreetmap.org/user/Stereo/diary/395903

Final response statistics

The 2021 OSMF Community Survey ran from January 16 to February 14, 2021. It was made available initially in 14 languages, and 4 more were added. The survey was advertised via social media, mailing lists, direct emails to user and working groups, and via banners on websites. The target audience was members of the OpenStreetMap community.

Response summary

Full responses 2958
Incomplete responses 1270
Total responses 4228

Numbers of responses to specific optional demographic questions

What is your gender? 2831
Country of residence? 2284
Are you a...? 2958

Survey participant summary

Total tokens issued 4958
Total surveys completed 2958

Countries participating

Responses came from 121 countries, based on respondents' declarations of countries of residence. You may download the list of countries with numbers of respondents from each here. The sum of responses to this question is 2,284 (not all respondents provided this information). The top 20 countries of declared residence are:

Germany 411
United States 294
France 165
United Kingdom 146
Italy 125
Russian Federation 61
Poland 56
Spain 53
Brazil 52
Switzerland 50
Austria 47
Canada 41
India 41
Netherlands 39
Australia 38
Ukraine 36
Belgium 28
Japan 27
Philippines 27
Belarus 26

Paul Norman (aka pnorman) has analyzed participation in the survey by country compared to active contributor data as well as by views of openstreetmap.org

Languages used

This list consists of the languages used initially by respondents. The sum of these numbers is 4,228. Not all of these respondents completed the survey.

English 2365
German 616
French 367
Spanish 169
Italian 158
Russian 150
Polish 97
Portuguese (Brazilian) 87
Ukrainian 54
Japanese 42
Chinese 42
Indonesian 22
Persian 21
Turkish 15
Korean 14
Arabic 7
Vietnamese 2

Anonymized raw data and summary statistics based on raw data

A spreadsheet containing anonymized raw data from the 2021 survey can be downloaded from here. The responses have been segregated in some cases into individual worksheets to preserve anonymity to the maximum degree. Some countries and languages were grouped in order to preserve anonymity.

A spreadsheet containing summary statistics calculated from the raw data can be downloaded from here. These statistics were calculated using raw data before anonymization.

Guillaume Rischard (aka Stereo) has done a separate calculation of the ranking of community sentiment question S1 here. User TheSwavu added a Condorcet ranking of that question to Stereo's diary post.

Mike Migurski (aka migurski) produced maps showing respondents to individual questions by country or anonymized region.


Cross-sectional analyses of responses of various segments of the OSM community and associated P-tests (to indicate level of significance of differences) can be downloaded from here.

Weighted (normalized) data

Weighted (normalized) means of questions F1-F5 and S2-S3 of the survey are in this document.

Dr. Jennings Anderson has kindly provided the data used for normalization. For a detailed discussion of normalization of the data, please see his diary post here.
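
For readers unfamiliar with weighting, the sketch below shows one common approach, post-stratification, in which each respondent is weighted by the ratio of their group's share of a reference population to its share of the sample. It is an illustration only: this page does not state that the published weighted means were computed exactly this way, and the function names, group labels, and figures are made up for the example.

```python
def poststratification_weights(sample_counts, population_counts):
    """Weight per group = (population share) / (sample share).

    Both arguments are hypothetical dicts mapping a group label
    (for example a country) to a count.
    """
    total_sample = sum(sample_counts.values())
    total_population = sum(population_counts[g] for g in sample_counts)
    return {
        group: (population_counts[group] / total_population)
        / (sample_counts[group] / total_sample)
        for group in sample_counts
    }


def weighted_mean(responses, weights):
    """Weighted mean of (group, score) pairs using per-group weights."""
    numerator = sum(weights[group] * score for group, score in responses)
    denominator = sum(weights[group] for group, _ in responses)
    return numerator / denominator


# Made-up example: group A is over-represented in the sample.
sample_counts = {"A": 3, "B": 1}
population_counts = {"A": 50, "B": 50}
responses = [("A", 4), ("A", 4), ("A", 4), ("B", 2)]

weights = poststratification_weights(sample_counts, population_counts)
unweighted = sum(score for _, score in responses) / len(responses)
weighted = weighted_mean(responses, weights)
print(unweighted, weighted)
```

In this made-up case the weighted mean is pulled toward group B, which is under-represented in the sample relative to the reference population.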

Comments received in the survey

Two opportunities were provided to offer comments, at the end of the feedback section of the survey, and between the community sentiment and demographic sections of the survey. Respondents were given the option of allowing anonymous public release of comments or asking that they be read only by members of the OSMF Board of Directors. All comments in languages other than English were machine translated into English prior to publication. The spreadsheet containing comments eligible for public release is here.

Anonymization of survey data to preserve privacy

(See also the survey privacy policy.)

In the interests of transparency, the OSMF strives to publish as much survey data as it can while preserving the privacy and anonymity of respondents. This balances the need for transparency against the obligation to respect the privacy of OSM community members, in compliance with applicable privacy laws and the OSMF privacy policy.

The following are steps we took to anonymize published survey data in order to preserve the privacy of respondents.

  1. Download the survey data into a spreadsheet.
  2. Create pivot tables based on demographic data, language used, and location data (country) to identify groups, particularly demographic groups, with fewer than 20 observations (e.g., fewer than 20 from any country, fewer than 20 using any language).
  3. Identify comments by individuals who did not grant permission to share their comments, and mark appropriately (highlight in red, for example). Ensure that those comments will not leave control of the Board.
  4. Copy comments to a separate spreadsheet, translate them into English using DeepL or Microsoft Translator, and publish translations of only those comments for which such assent has been granted, and publish only separately from all other data (comments can potentially be used to identify respondents).
  5. Given the relatively small number of females and representatives of non-binary and other genders in the OSM community, gender data must be released separately from other data in a manner that does not allow identification (deanonymization) of individual respondents. This is also true of the "time in project" variable, which could be used to identify individual respondents if matched to country of residence.
    1. Create two duplicate files of the survey data, for a total of three copies (original and two duplicates).
    2. In all spreadsheets, replace labels of all demographic and country groups with fewer than 20 observations with more generic labels (e.g., if fewer than 20 respondents are from Saudi Arabia, change 'Saudi Arabia' to 'Middle East'; if fewer than 20 respondents use Arabic, change the 'ar' language code to 'Middle East').
    3. In the duplicates of the survey data, delete the columns containing language codes, country codes and names, and all demographic data except gender (in one duplicate) or "time in project" (in the other duplicate).
    4. Ensure that in the two duplicates, one or the other of these columns is preserved and all other demographic columns are deleted.
  6. Delete the 'Response ID', 'Date submitted', 'Last page', 'Start language', 'Seed', and 'May we share (anonymously, of course) your comments with the OpenStreetMap community?' columns from the worksheets to make merging the datasets less likely.
  7. Run a script to randomly scramble the order of responses in the gender and "time in project" worksheets, so that they cannot be merged with the original spreadsheet and thereby deanonymized.
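
The relabeling, column deletion, and scrambling steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the script the OSMF actually used: the function names, column names, and sample rows are hypothetical, and only the threshold of 20 and the 'Saudi Arabia' to 'Middle East' example come from the steps above.

```python
import random
from collections import Counter

MIN_GROUP_SIZE = 20  # threshold named in the steps above


def generalize_small_groups(rows, column, broader_labels, min_size=MIN_GROUP_SIZE):
    """Replace values of `column` seen fewer than `min_size` times
    with a broader label (hypothetical mapping supplied by the caller)."""
    counts = Counter(row[column] for row in rows)
    result = []
    for row in rows:
        new_row = dict(row)
        if counts[row[column]] < min_size:
            new_row[column] = broader_labels.get(row[column], "Other")
        result.append(new_row)
    return result


def make_release_copy(rows, keep_column, demographic_columns, id_columns, rng):
    """Keep one sensitive column, drop the other demographic and
    identifying columns, and shuffle the row order."""
    dropped = (set(demographic_columns) - {keep_column}) | set(id_columns)
    copy = [{k: v for k, v in row.items() if k not in dropped} for row in rows]
    rng.shuffle(copy)
    return copy


# Made-up sample data: 25 respondents from one country, 3 from another.
rows = [
    {"Response ID": i, "country": "Germany", "gender": "m", "time_in_project": "5+", "F1": 4}
    for i in range(25)
] + [
    {"Response ID": 100 + i, "country": "Saudi Arabia", "gender": "f", "time_in_project": "<1", "F1": 3}
    for i in range(3)
]

generalized = generalize_small_groups(rows, "country", {"Saudi Arabia": "Middle East"})
gender_copy = make_release_copy(
    generalized,
    keep_column="gender",
    demographic_columns=["country", "gender", "time_in_project"],
    id_columns=["Response ID"],
    rng=random.Random(),
)
print(Counter(row["country"] for row in generalized))
print(sorted(gender_copy[0].keys()))
```

Note that a broader group can itself still fall below the threshold after relabeling, as it does in this tiny sample, so in practice the counts would need to be checked again after grouping.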

Country clustering

To preserve anonymity, certain countries are clustered (aggregated) in the publicly released dataset. The country/region list is as follows:

Africa, anglophone
Africa, romanophone
Americas, other, not elsewhere specified
East Asia/Oceania, not elsewhere specified
European Union, other, not elsewhere specified
Middle East
Other Europe (Non-EU), not elsewhere specified
South & West Asia, not elsewhere specified
United States

Plan for presentation of survey results

Participation numbers, the list of participating countries, releasable comments, anonymized raw data, and summary statistics have been released (see above). Weighted (normalized) summary statistics for questions F1-F5 and S2-S3 have been released (see above).

The main effort to release data is complete. If you are interested in analyses absent from the statistics presented above and that cannot be calculated from the anonymized raw data, please contact the Board of Directors at board@osmfoundation.org with your analytical request and we will seek to accommodate it as time permits. Respecting privacy and anonymity of respondents is our highest priority, followed closely by presentation and analysis of the data as quickly as we can.

Remaining to be done:

Mar 4 – GeoMob podcast interview with Ed Freyfogle on the survey.

Mar 10-11 – Series of three BBB video conferences, open to the OSM community and held 8 hours apart, to answer questions about the data: noon UTC and 2000 UTC on March 10, and 0400 UTC on March 11 (these correspond to 0700, 1500, and 2300 EST on March 10).