Operations/Minutes/2020-04-10


Draft

OpenStreetMap Foundation, Operations Meeting* - Agenda & Minutes
Friday April 10th 2020, 18:00 London time
Location: Video room at https://osmvideo.cloud68.co

  • Please note that this was not an OWG meeting.

Participants

Present:

Tracker

Action items

  • 2020-04-10 Tom to try GitHub properly (the live tags on GitHub are an old version) and see if that works. [Topic: Fixing the failing CI tests]
  • 2020-04-10 Tom to put a copy of the website on another machine (not at UCL) to see whether UCL networking is the problem. [Topic: Fixing the failing CI tests]
  • 2020-04-10 OWG to push up the tile usage policy (commercial entities and vehicle tracking applications, which are heavy Nominatim users and probably not attributing either). [Topic: Commercial CDN for Bulk Tile Users]
  • 2020-04-10 Ian will continue talking with Fastly, as decisions will take some time. Fastly said they would send a contract; once Ian knows more, he will share. [Topic: Commercial CDN for Bulk Tile Users]
  • 2020-04-10 Grant and Tom to draw up a table of the different kinds of data, work out how each is backed up and what could be improved. [Topic: High Availability / Redundancy of OpenStreetMap.org (and primary services)]
  • 2020-04-10 Not assigned: Potentially move more of the backup data into long-term S3 buckets. [Topic: High Availability / Redundancy of OpenStreetMap.org (and primary services)]
  • 2020-04-10 Not assigned: Get a better handle on the number of requests. [Topic: Commercial CDN for Bulk Tile Users]

Decisions and pending items

  • 2020-04-10 [Topic: Commercial CDN for Bulk Tile Users] Follow up at the next meeting.
  • 2020-04-10 [Topic: New Data Centre Space?] Pending question: Can we find expertise to run it?
  • 2020-04-10 [Topic: Tile CDN usage policy (OSM, Friends, Others?)] Discussion deferred - depends on decision on Fastly.
  • 2020-04-10 [Topic: Tile CDN] Discussion deferred.

Fixing the failing CI tests

Issue: unreliable CI runs at https://github.com/openstreetmap/chef/actions
Should we make GitHub primary? Should we move git.openstreetmap.org to another data centre?

Background

  • The website lives both on OSM.org and GitHub (most development work happens on GitHub, where we deploy for production).

Historic

  • When we switched to GitHub, people were reluctant to make a repository we didn't control the main one.
  • Chef, DNS, etc. were gradually added to GitHub. Most things are now pushed to both places.

On the problem
Brief testing by Grant at UCL, where osm.org is based.

  • Occasional network problems: dropped packets, mainly outbound.
  • Bounces: ~6%.
  • Jobs running on GitHub Actions often fail to check things out from the OSM Git server. Nothing else seems to have problems checking out from this server.
  • UCL's connectivity is not very reliable; the firewall has caused problems before.
  • Michal had opened a PR that would change the source repo to GitHub, but it required a group decision and was closed. The PR also failed to run because the tags our deployment uses do not exist in our GitHub repo.

Other suggestion

  • Move Git and related services elsewhere; Chef doesn't have to move.
    • We have decent HP servers in Amsterdam.

On asking the community again about making the GitHub repo the master

  • Not the main issue.
  • Part of the community dislikes relying on non-open-source services.
    • False dichotomy, as GitLab is not truly open either.
  • Sarah has concerns regarding Nominatim; these might be resolved with a branch.

Action items

  • Tom to try GitHub properly (the live tags on GitHub are an old version) and see if that works. Michal's PR just needs the correct tags.
  • Tom to put a copy of the website on another machine (not at UCL) to see whether UCL networking is the problem.

New Data Centre Space?

There are a few European ISPs/transit providers eager to give us transit or hosting space, but they want us to become a real ISP:

  • DE: DE-CIX may be able to offer us some rack space, but we would need our own ASN and address space allocation, and probably upgraded hardware (e.g. routers).
    • DE-CIX is the 2nd or 3rd largest IX in the world and has points of presence all over the world.
    • Offered a location in western Germany (probably Frankfurt).
  • AMS AM6: "In this list, I would suggest approaching Core-Backbone: they already sponsor some community projects such as https://www.community-ix.de/sponsoren/ " (Baptiste Jonglez)
  • US: Jamie has found an ISP with a data center in California that offered free resources (offer received by email).
  • HE.net offered us free or near-free transit.

Points mentioned during discussion

  • If we can find the expertise to run it, it is probably worth it; otherwise not.
  • Some costs are involved (joining the European registry).
  • A new data center would address shortcomings of Slough and UCL and help with general OSM services.
  • IPv4: we can still get some blocks, but only very small ones.
  • If Ops wants to restrict tile usage, the OSMF board would probably be supportive.

Concerns

  • Time and finding people to help.

Pending question

Can we find expertise to run it?

Commercial CDN for Bulk Tile Users

Fastly has expressed interest in supporting OSM.
Proposal: move tile.osm.org traffic to the Fastly CDN with a longer cache timeout, and move friendly clients that need more up-to-date tiles to the existing CDN.

  • Their initial approach came from the marketing side (they mentioned "co-marketing"; Ian suggested a blog post).
  • They would be on the front page, with the data centers (see usage, below).
  • They seem to want to support as much traffic as OSM can send them.
  • Paul had also contacted them in the past.
  • They might be able to help with writing VCL to block abusers.

On cost and OSM US potentially covering the amount

  • Initial offer: USD 1k plus covering anything else we would need. Ian mentioned 10k and she seemed OK with it.
  • OSM US could write the amount off as an in-kind donation.
  • Estimated cost: ~5 million/year at our current front-end usage. We would have to check with OSM US.
  • Fastly is cheaper for NA and Europe. Cost could be lower if we commit to high traffic.

Concern
Avoid ending up managing two tile networks.

On having just the commercial CDN - concerns

  • Mappers care about up-to-date tiles.
  • We can't lose all ability to block the worst abusers.
  • Fastly does allow VCL, so we could block there.
  • Serve abusers an HTTP response with "do not cache"; they would just get what is in the cache.
  • We could not then switch back to running our own CDN.

On stats

  • Nginx numbers seem wrong.
  • The majority of traffic comes from NA and Europe.

On tile usage
OSMF policy on tiles is clear: tile use is not a "free for all", and abusers of the current policy as published on the website may be kicked off. The Board will support the OWG in any effort to enforce the current tile policy. https://operations.osmfoundation.org/policies/tiles/

  • We consider users like QGIS as friends.
  • Commercial websites using our tiles just because we're free should go away.
  • Encourage people not to use us.
  • Relevant to increasing capacity in the US.

On whether there is a dire need to increase capacity

  • Taiwan is getting slow.
  • Continuous need.
  • Usage can double year over year.
  • Having additional capacity for redundancy is useful.

Next steps

  • Push up the tile usage policy (commercial entities and vehicle tracking applications, which are heavy Nominatim users and probably not attributing either).
  • Action item: Ian will continue talking with Fastly, as decisions will take some time. Fastly said they would send a contract; once Ian knows more, he will share.
  • Get a better handle on the number of requests.
  • Follow up at the next meeting.

48'33 - Guillaume joined

Tile CDN usage policy (OSM, Friends, Others?)

https://operations.osmfoundation.org/policies/tiles/
Can we continue to be open to the world? What policies (not yet actions) can we put in place now to encourage people to move to other providers? E.g. do we continue to allow commercial sites to use tile.openstreetmap.org?

If we go to Fastly, this issue becomes moot. Not discussed further.

High Availability / Redundancy of OpenStreetMap.org (and primary services)

  • Where do we currently stand? (Hot/Cold/Redundancy etc)
  • What do we still need to do?
  • Should we get another Data Centre? (See Below)

We have 2 main data centers for osm.org

  • AMS: Everything runs from here for osm.org.
  • Bytemark: Warm/coldish copy - needs some preparation before moving across. Not fully highly available.
  • Blocking: GPX traces blocked on some pages in Rails (coming in the next version).
  • Still to do: planet files and replication diffs (work in progress with Johan).

On backups

  • Main geodata is replicated and backed-up.
  • GPX files only synced manually and occasionally.

Action items

  • Grant and Tom to draw up a table of the different kinds of data, work out how each is backed up and what could be improved.
  • Potentially move more of the backup data into long-term S3 buckets.

Open Ops Tickets

Review open tickets: what needs a policy decision and what needs someone to help.

Migrate help.openstreetmap.org from OSQA

  • Long-standing, non-urgent.
  • Options:

Need a new way of doing diff replication

  • Johan is working on it.

Adding API key support for tile.osm.org?

  • Discussion postponed

Tile CDN

Squid / nginx / GeoDNS? Is the software right? Issues and improvements.
What experiments should we be doing to improve the network?

Discussion deferred.