2021 Survey Results

From OpenStreetMap Foundation
Revision as of 16:40, 5 April 2021 by Apm-wa

Overview

The 2021 OSMF Community Survey concluded on February 14, 2021. Summary statistics and anonymized raw data are available through the links below.

Quick Links
  • Anonymized raw survey data: https://wiki.osmfoundation.org/w/images/9/95/2021_OSMF_survey_split_anonymized.ods
  • Summary statistics: https://wiki.osmfoundation.org/w/images/4/44/2021_OSMF_survey_summary_stats.ods
  • Probability tests (P-tests) of cross-sectional comparisons: https://wiki.osmfoundation.org/w/images/7/79/2021_OSMF_survey_P-tests.ods
  • Weighted (normalized) responses: https://wiki.osmfoundation.org/w/images/3/36/2021_OSMF_survey_normalized_weighted_responses.odt
  • Normalization data: https://wiki.osmfoundation.org/w/images/1/1b/Normalization-data.ods
  • Releasable comments: https://wiki.osmfoundation.org/w/images/4/45/2021_survey_releasable_comments_ENG.ods
  • Mike Migurski's maps ("Summary Maps of OSMF 2021 Survey Results"): https://www.openstreetmap.org/user/migurski/diary/395957
  • Dr. Jennings Anderson's maps ("Editing Activity by Country Since 2020 and the 2021 OSMF Survey"): https://www.openstreetmap.org/user/Jennings%20Anderson/diary/395950
  • Paul Norman's graphic comparisons to contributor and user statistics: https://www.paulnorman.ca/blog/2021/02/openstreetmap-survey/ and https://www.openstreetmap.org/user/pnorman/diary/395896
  • Borda and Condorcet rankings of question S1 (Board priorities for 2021): https://www.openstreetmap.org/user/Stereo/diary/395903
  • Slide presentation on survey results, including speaker's notes (PDF): https://wiki.osmfoundation.org/w/images/2/27/2021_survey_slides.pdf
  • GeoMob podcast on the survey results: https://thegeomob.com/podcast/episode-63

Final response statistics

The 2021 OSMF Community Survey ran from January 16 to February 14, 2021. It was initially available in 14 languages, and 4 more were added later. The survey was advertised via social media, mailing lists, direct emails to user groups and working groups, and banners on websites. The target audience was members of the OpenStreetMap community.

Response summary

Full responses 2958
Incomplete responses 1270
Total responses 4228

Numbers of responses to specific optional demographic questions

What is your gender? 2831
Country of residence? 2284
Are you a...? 2958

Survey participant summary

Total tokens issued 4958
Total surveys completed 2958
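The response figures above are internally consistent, and a few rates follow directly from them. The following minimal Python check uses only the numbers on this page; the token-usage rate assumes that each started response used exactly one issued token, which the page does not state explicitly.

```python
full, incomplete, total = 2958, 1270, 4228
tokens_issued = 4958

assert full + incomplete == total  # the response summary adds up

completion_rate = full / total            # share of started surveys that were finished
tokens_used = total / tokens_issued       # assumption: one token per started response
tokens_completed = full / tokens_issued   # share of issued tokens that led to a full response
print(f"{completion_rate:.1%} {tokens_used:.1%} {tokens_completed:.1%}")
# prints: 70.0% 85.3% 59.7%
```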

Countries participating

Based on respondents' declared countries of residence, responses came from 121 countries. A list of countries with the number of respondents from each can be downloaded here. In total, 2,284 respondents answered this question (not all respondents provided this information). The top 20 countries of declared residence are:

Germany 411
United States 294
France 165
United Kingdom 146
Italy 125
Russian Federation 61
Poland 56
Spain 53
Brazil 52
Switzerland 50
Austria 47
Canada 41
India 41
Netherlands 39
Australia 38
Ukraine 36
Belgium 28
Japan 27
Philippines 27
Belarus 26
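How concentrated is participation? A short Python sketch, using only the counts listed above, sums the top 20 countries and compares them with the 2,284 respondents who declared a country.

```python
top20 = {
    "Germany": 411, "United States": 294, "France": 165, "United Kingdom": 146,
    "Italy": 125, "Russian Federation": 61, "Poland": 56, "Spain": 53,
    "Brazil": 52, "Switzerland": 50, "Austria": 47, "Canada": 41, "India": 41,
    "Netherlands": 39, "Australia": 38, "Ukraine": 36, "Belgium": 28,
    "Japan": 27, "Philippines": 27, "Belarus": 26,
}
declared_total = 2284   # respondents who answered the country question
countries_total = 121   # countries with at least one declared respondent

top20_sum = sum(top20.values())
rest = declared_total - top20_sum  # respondents from the remaining countries
print(top20_sum, f"{top20_sum / declared_total:.1%}", rest, countries_total - len(top20))
# prints: 1763 77.2% 521 101
```

In other words, roughly three quarters of respondents who declared a country came from these 20 countries, and the other 101 countries account for the remaining 521.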

Paul Norman (aka pnorman) has compared participation in the survey by country with active-contributor data and with views of openstreetmap.org.

Languages used

This list shows the languages in which respondents started the survey. The numbers sum to 4,228; not all of these respondents completed the survey.

English 2365
German 616
French 367
Spanish 169
Italian 158
Russian 150
Polish 97
Portuguese (Brazilian) 87
Ukrainian 54
Japanese 42
Chinese 42
Indonesian 22
Persian 21
Turkish 15
Korean 14
Arabic 7
Vietnamese 2
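As a consistency check, the per-language counts can be summed and compared with the total in the response summary. The minimal Python sketch below uses only the numbers listed above (17 languages appear in this list).

```python
languages = {
    "English": 2365, "German": 616, "French": 367, "Spanish": 169,
    "Italian": 158, "Russian": 150, "Polish": 97, "Portuguese (Brazilian)": 87,
    "Ukrainian": 54, "Japanese": 42, "Chinese": 42, "Indonesian": 22,
    "Persian": 21, "Turkish": 15, "Korean": 14, "Arabic": 7, "Vietnamese": 2,
}
total_responses = 4228  # full + incomplete responses, from the response summary

started = sum(languages.values())
assert started == total_responses  # each response has exactly one start language
non_english_share = 1 - languages["English"] / started
print(f"{len(languages)} languages, {non_english_share:.1%} started in a language other than English")
# prints: 17 languages, 44.1% started in a language other than English
```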

Anonymized raw data and summary statistics based on raw data

A spreadsheet containing anonymized raw data from the 2021 survey can be downloaded from here. In some cases, responses have been split into separate worksheets to maximize anonymity, and some countries and languages were grouped for the same reason.

A spreadsheet containing summary statistics calculated from the raw data can be downloaded from here. These statistics were calculated using raw data before anonymization.

Guillaume Rischard (aka Stereo) separately calculated a Borda ranking of community sentiment question S1 here. User TheSwavu added a Condorcet ranking of that question to Stereo's diary post.
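For readers unfamiliar with the two methods, here is a minimal Python sketch of a Borda count and a Condorcet (pairwise majority) check on a few hypothetical ballots. It is not the code Stereo or TheSwavu used, and it assumes every ballot ranks all options. With k options it awards k-1 points for first place down to 0 for last; shifting every score by a constant (for example, the 7-to-1 scoring used later on this page) leaves the ranking unchanged.

```python
from itertools import combinations


def borda(ballots, options):
    """Borda count: with k options, first place earns k-1 points and last earns 0.
    Each ballot must rank every option (best first)."""
    k = len(options)
    scores = {option: 0 for option in options}
    for ballot in ballots:
        for position, option in enumerate(ballot):
            scores[option] += k - 1 - position
    return scores


def condorcet_winner(ballots, options):
    """Return the option that beats every other option in head-to-head majority
    comparisons, or None if no such option exists (e.g. a cycle)."""
    beaten = {option: set() for option in options}
    for a, b in combinations(options, 2):
        a_first = sum(1 for ballot in ballots if ballot.index(a) < ballot.index(b))
        b_first = len(ballots) - a_first
        if a_first > b_first:
            beaten[a].add(b)
        elif b_first > a_first:
            beaten[b].add(a)
    for option in options:
        if len(beaten[option]) == len(options) - 1:
            return option
    return None


# Hypothetical ballots over three options (not survey data).
options = ["A", "B", "C"]
ballots = [["A", "B", "C"], ["A", "C", "B"], ["B", "C", "A"],
           ["C", "B", "A"], ["B", "A", "C"]]
print(borda(ballots, options))             # {'A': 5, 'B': 6, 'C': 4}
print(condorcet_winner(ballots, options))  # B
```

Here the two methods agree, but they need not: a Condorcet winner may not exist at all (for instance, with the three cyclic ballots A>B>C, B>C>A, C>A>B), whereas a Borda count always produces a ranking.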

Mike Migurski (aka migurski) produced maps showing respondents to individual questions by country or anonymized region.

P-tests

Cross-sectional analyses comparing the responses of various segments of the OSM community, with associated P-tests indicating whether the differences between segments are statistically significant, can be downloaded from here.

Weighted (normalized) data

Weighted (normalized) means of questions F1-F5 and S2-S3 of the survey are in this document.

Dr. Jennings Anderson has kindly provided the data used for normalization. For a detailed discussion of normalization of the data, please see his diary post here.
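This page does not describe exactly how the normalization data were turned into weights; Dr. Anderson's diary post is the authority on that. As a generic illustration only, the Python sketch below shows post-stratification, one common way to reweight survey responses. Each respondent in group g receives the weight (share of g in a reference population) / (share of g in the sample). The group names, shares, and answers are hypothetical.

```python
def poststratification_weights(sample_counts, population_shares):
    """Weight for each group = its share of the reference population divided by
    its share of the sample, so overrepresented groups count for less."""
    n = sum(sample_counts.values())
    return {g: population_shares[g] / (sample_counts[g] / n) for g in sample_counts}


def weighted_mean(responses, weights):
    """responses is a list of (group, answer) pairs."""
    numerator = sum(weights[g] * answer for g, answer in responses)
    denominator = sum(weights[g] for g, _ in responses)
    return numerator / denominator


# Hypothetical: group X supplies 3 of 4 respondents but only half of the reference population.
responses = [("X", 4), ("X", 4), ("X", 4), ("Y", 2)]
weights = poststratification_weights({"X": 3, "Y": 1}, {"X": 0.5, "Y": 0.5})
print(weighted_mean(responses, weights))  # about 3.0 (the unweighted mean is 3.5)
```

The weighted mean equals the average of the group means (4 and 2) taken in population proportions, which is the point of the correction.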

Comments received in the survey

Respondents had two opportunities to comment: at the end of the feedback section, and between the community sentiment and demographic sections. They could either allow anonymous public release of their comments or ask that the comments be read only by members of the OSMF Board of Directors. All comments in languages other than English were machine-translated into English before publication. The spreadsheet containing the comments eligible for public release is here.

Target Audience

The target audience was the OSM community and was specifically not limited to members of the OSM Foundation. The OSM community is ill-defined, and defining it has been controversial. For the purposes of this survey, the community was defined as the mappers, communicators, data users, developers/maintainers, event organizers, and hardware/system operators involved in the OSM project. A free-form "other" answer was also available in that demographic question.

Selection biases apparent in the survey

  • The survey required Internet access (but so do contributing to and using OpenStreetMap, so this particular bias is of relatively little concern).
  • It required knowledge of one of the 18 languages used in the survey (this is the largest number of languages used to date in any OSMF survey, and was part of a concerted effort to reduce bias toward speakers and readers of English).
  • It was announced and advertised through OSM-centric communications media (OSM mail lists; direct email to user groups and working groups; social media channels used by OSM contributors and users, local chapters, and local communities; newsletters; banners on osm.org and in editors) but access to the survey instrument was unrestricted (this created potential for oversampling, i.e., inclusion of respondents external to the OSM community).

A member of the community independently published the questions outside the survey itself. This introduced an additional selection bias: it alerted some potential respondents to the specific questions to be asked, causing some of them to decline to participate. The Board had deliberately declined to publish the questions in advance to avoid this problem.

After the survey, we noted that members of the OpenStreetMap Foundation were overrepresented. Based on the demographic data, slightly more than one fifth of respondents declared themselves to be Foundation members. However, comparing summary statistics for the entire sample with those for Foundation members revealed no substantial bias in the data as a result. It is highly probable that employees of NGOs and of firms using OSM data, members of working groups, and members of local chapters and communities were also overrepresented. Again, however, no substantial bias in the data could be discerned when results from those segments were compared with other segments.

Assumptions

  • That the sample would be biased toward individuals with above-average engagement in the project, and that casual mappers and data users would be underrepresented.  This assumption implies that mappers eligible for the "active contributor program", who number fewer than 8,000, would constitute a significant portion of the sample and likely be overrepresented.
  • That Foundation members would be overrepresented, since Foundation members are typically more engaged (as noted above, this assumption was justified).
  • That some degree of geographic normalization of the data would be possible by comparing responses with known characteristics of mappers in the community, such as the locations and volumes of edits and changesets from OSMstats (this normalization effort has indicated that the results are not significantly biased in any direction in terms of geographic coverage).
  • That making demographic data optional would increase overall participation in the survey (the results justified this assumption).
  • That forcing answers to the "feedback" and "community sentiment" questions would reduce bias against respondents who did not care about, or had no opinion on, the specific issue (the high number of "neutral" responses to question F1 is a strong indicator that this assumption was justified).
  • That the sample would be biased toward individuals interested in the OSM project.

Anonymization of survey data to preserve privacy

(See also the survey privacy policy.)

In the interests of transparency, the OSMF strives to publish as much survey data as it can while preserving the privacy and anonymity of respondents, both to comply with applicable privacy laws and the OSMF privacy policy and to balance the need for transparency against the obligation to respect the privacy of OSM community members.

The following are steps we took to anonymize published survey data in order to preserve the privacy of respondents.

  1. Download the survey data into a spreadsheet.
  2. Create pivot tables based on demographic data, language used, and location data (country) to identify groups, particularly demographic groups, with fewer than 20 observations (e.g., fewer than 20 from any country, fewer than 20 using any language).
  3. Identify comments by individuals who did not grant permission to share their comments, and mark appropriately (highlight in red, for example). Ensure that those comments will not leave control of the Board.
  4. Copy comments to a separate spreadsheet, translate them into English using DeepL or Microsoft Translator, and publish translations only of those comments whose authors granted permission, keeping them separate from all other data (comments can potentially be used to identify respondents).
  5. Given the relatively small numbers of women and of people of non-binary and other genders in the OSM community, gender data must be released separately from other data, in a manner that does not allow identification (deanonymization) of individual respondents. The same is true of the "time in project" variable, which could be used to identify individual respondents if matched to country of residence.
    1. Create two duplicate files of the survey data, for a total of three copies (original and two duplicates).
    2. In all spreadsheets, replace labels of all demographic and country groups with fewer than 20 observations with more generic labels (e.g., if fewer than 20 respondents are from Saudi Arabia, change 'Saudi Arabia' to 'Middle East'; if fewer than 20 respondents use Arabic, change the 'ar' language code to 'Middle East').
    3. In the duplicates of the survey data, delete the columns containing language codes, country codes and names, and all demographic data except gender (in one duplicate) or "time in project" (in the other duplicate).
    4. Ensure that in the two duplicates, one or the other of these columns is preserved and all other demographic columns are deleted.
  6. Delete the 'Response ID', 'Date submitted', 'Last page', 'Start language', 'Seed', and 'May we share (anonymously, of course) your comments with the OpenStreetMap community?' columns from the worksheets to reduce the risk that the datasets could be merged.
  7. Run a script to randomly shuffle the order of responses in the gender and "time in project" worksheets, to prevent them from being merged with the original spreadsheet and thereby deanonymized.
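Steps 2 and 5 through 7 can be sketched in Python. This is a hypothetical illustration, not the Board's actual script (which is not published here): rows are plain dictionaries, and the column names "country", "gender" and "F1", the fallback label "Other", and the sample data are all assumptions. Only the step 6 column names and the threshold of 20 come from the steps above; comment handling (steps 3 and 4) is not shown.

```python
import random
from collections import Counter

MIN_GROUP = 20  # groups with fewer observations than this get a broader label

# Columns deleted in step 6 (names as given in the steps above).
DROP_COLUMNS = {
    "Response ID", "Date submitted", "Last page", "Start language", "Seed",
    "May we share (anonymously, of course) your comments with the OpenStreetMap community?",
}


def generalize(rows, column, broader_label):
    """Steps 2 and 5.2: count each value (the pivot-table step) and relabel
    values seen fewer than MIN_GROUP times, e.g. 'Saudi Arabia' -> 'Middle East'.
    Unmapped small values fall back to 'Other'. A merged label can itself still
    be small, so the counts should be re-checked after relabelling."""
    counts = Counter(row[column] for row in rows)
    result = []
    for row in rows:
        row = dict(row)  # copy so the caller's rows are not modified
        if counts[row[column]] < MIN_GROUP:
            row[column] = broader_label.get(row[column], "Other")
        result.append(row)
    return result


def strip_columns(rows, drop):
    """Step 6: remove the listed columns from every row."""
    return [{k: v for k, v in row.items() if k not in drop} for row in rows]


def single_demographic_sheet(rows, answer_columns, demographic, seed=None):
    """Steps 5.3-5.4 and 7: keep the answers plus exactly one demographic
    column, then shuffle the rows so they cannot be lined up with other sheets.
    Leave seed=None in practice; a fixed seed is only useful for testing."""
    sheet = [{c: row[c] for c in answer_columns + [demographic]} for row in rows]
    random.Random(seed).shuffle(sheet)
    return sheet


# Hypothetical data: 25 respondents from Germany, 3 from Saudi Arabia.
rows = [{"Response ID": i, "Seed": 0, "country": "Germany", "gender": "m", "F1": 3}
        for i in range(25)]
rows += [{"Response ID": 25 + i, "Seed": 0, "country": "Saudi Arabia", "gender": "f", "F1": 4}
         for i in range(3)]

main_sheet = strip_columns(generalize(rows, "country", {"Saudi Arabia": "Middle East"}),
                           DROP_COLUMNS | {"gender"})
gender_sheet = single_demographic_sheet(rows, ["F1"], "gender")
print(Counter(r["country"] for r in main_sheet))
```

In this toy data the merged "Middle East" group still has only 3 rows, which is why a real pipeline would repeat the count after relabelling and merge further (or suppress the group) until every label reaches the threshold.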

Country clustering

To preserve anonymity, certain countries are clustered (aggregated) in the publicly released dataset. The country/region list is as follows:

Africa, anglophone
Africa, romanophone
Americas, other, not elsewhere specified
Australia
Belarus
Belgium
Brazil
Canada
China
East Asia/Oceania, not elsewhere specified
European Union, other, not elsewhere specified
France
Germany
India
Indonesia
Italy
Japan
Middle East
Netherlands
Other Europe (Non-EU), not elsewhere specified
Philippines
Poland
Russia
South & West Asia, not elsewhere specified
Spain
Sweden
Switzerland
Ukraine
United States

Plan for presentation of survey results

Participation numbers, the list of participating countries, releasable comments, anonymized raw data, and summary statistics have been released (see above). Weighted (normalized) summary statistics for questions F1-F5 and S2-S3 have been released (see above).

The main effort to release data is complete. If you would like an analysis that is not included in the statistics above and cannot be calculated from the anonymized raw data, please contact the Board of Directors at board@osmfoundation.org with your request, and we will try to accommodate it as time permits. Respecting the privacy and anonymity of respondents is our highest priority, followed closely by presenting and analyzing the data as quickly as we can.

A GeoMob podcast on the survey results was recorded on March 4 and can be heard here.

A link to a PDF copy of the slide show presented in the March 10-11 video briefings is in the quick links table above.

Questions and answers

Q. How important is the takeover protection (question S1, choice 1) for a) the survey respondents who are a corporate sponsor of OSM or a commercial company using OSM data (question D1, choices 5, 6, or and b) those respondents who are neither? Is there a difference between the two groups? If yes, is it (statistically) significant?

A. There is a statistically significant difference between those who answered yes and those who answered no to these two questions. I did not break out the "double no" group ("no" in both categories) because, frankly, doing so complicates the problem, and on initial examination the difference in results was not that great. You can see the numbers in the spreadsheet here. Unfortunately, because the raw data must be kept separate, this spreadsheet contains only the calculated values and not the formulae, but you can reproduce the calculations from the 28 pivot tables if you wish to check the math. The results were:

Group          Mean score (7 = first choice, 1 = last choice)
sponsor=yes    3.014285714
sponsor=no     4.455867082
company=yes    3.92733564
company=no     4.475280899

In other words, employees of corporate sponsors ranked takeover protection 3rd from the bottom (5th in priority), while non-employees of corporate sponsors ranked it 4th from the bottom (4th in priority). Employees of companies using OSM data ranked takeover protection 4th in priority, and non-employees of such companies ranked it in the same position but with a higher mean score.

The methodology was to assign scores from 7 down to 1 to first choice, second choice, and so on, for the option "Takeover protection". I multiplied each score by the number of respondents who made that choice, then calculated the mean and standard deviation (the range, of course, was 1 through 7) and counted the total number of observations. These figures were then entered into the MedCalc online comparison-of-means calculator to determine statistical significance.
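The calculation described above can be sketched in Python. This is a minimal sketch, not the script actually used: the frequency counts are hypothetical, and the comparison step assumes a pooled-variance two-sample t-test, with a normal approximation to the p-value (reasonable for samples of a few dozen or more), standing in for the MedCalc calculator, whose exact method is not stated here.

```python
import math


def score_stats(counts):
    """Mean, sample standard deviation, and n from frequency counts.

    counts maps a score (7 = first choice ... 1 = seventh choice) to the
    number of respondents who placed the option at that position.
    """
    n = sum(counts.values())
    mean = sum(score * k for score, k in counts.items()) / n
    variance = sum(k * (score - mean) ** 2 for score, k in counts.items()) / (n - 1)
    return mean, math.sqrt(variance), n


def compare_means(m1, s1, n1, m2, s2, n2):
    """Pooled-variance two-sample t statistic with an approximate two-sided p-value."""
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (m1 - m2) / se
    # Normal approximation to the t distribution; adequate for large samples.
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, p


# Hypothetical counts for two groups (not the survey's actual data).
group_a = {7: 5, 6: 8, 5: 10, 4: 12, 3: 15, 2: 12, 1: 8}
group_b = {7: 60, 6: 70, 5: 80, 4: 90, 3: 80, 2: 60, 1: 60}
t, p = compare_means(*score_stats(group_a), *score_stats(group_b))
print(f"t = {t:.2f}, p = {p:.3f}")
```

For small groups, an exact t-distribution p-value (for example from a statistics library that accepts summary statistics) would be preferable to the normal approximation used here.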

---

Q. Can you do a crosscut of respondents from Africa who work for an NGO or for a private company using OSM data?

A. Observations for Africa were n=117. Of these, 29 reported working for non-profits, and 2 reported working for companies using OSM data.

---