Operations/Minutes/2021-05-19

From OpenStreetMap Foundation

OpenStreetMap Foundation, Operations Meeting* - Agenda & draft minutes
Wednesday 19 May 2021 18:00 London time
Location: Video room at https://osmvideo.cloud68.co

* Please note that this was not strictly an OWG meeting.

Participants

Present:

Minutes by Dorothea Kazazi.

Apologies:

Administrative

Previous minutes

2021-05-05

Action items

  • 2021-05-19 Grant to give Twitter credentials to Paul (was: 2021-05-05 Grant to check/fix GroupTweet for osm_tech Twitter account)
  • 2021-05-05 Grant to email Toby from WMF and suggest chatting to MapTiler. [Topic: Wikimedia]
  • 2021-05-05 Paul to do a circular for ISP in a couple of days. [Topic: Dublin updates - Network requirements change]
  • 2021-05-05 Grant to provide switches model and vendor to Paul. [Topic: Dublin updates - Someone to handle network purchasing]
  • 2021-04-21 Paul to work out where we need the new HP DL360 servers. [Topic: New HP DL360 servers] # 2021-05-19 on the agenda.
  • 2021-04-21 Paul to tweet asking for recommendation of HP resellers in Ireland. [Topic: New HP DL360 servers] # 2021-05-19 will tweet once he gets the info.
  • 2021-04-21 Paul to check how raid1 with hot spare works out with the budget. [Topic: New Rendering server]
  • 2021-04-07 Grant to get contact info of ISP to Guillaume [Topic: Improving networking in AMS]. # 2021-04-21 couldn't find it. He'll try to figure out who's available in the new data center in AMS. # 2021-05-19 decision to be removed
  • 2021-03-24 Paul to create ticket related to API PostgreSQL update [Topic: API PostgreSQL update]
  • 2021-03-24 Hrvoje to check power supplies on Viserion/Drogon [Topic: Old tile caches: Viserion and Drogon] # 2021-05-19 decision to be removed.
  • 2021-03-10 Paul to have a look at TimescaleDB. [Topic: TimescaleDB] # 2021-05-19 decision to be removed as talking to TimescaleDB community, which has more expertise.
  • 2021-02-24 Tom to report back on TimescaleDB again at next meeting. [Topic: Reportage] [was: 2021-01-13 Tom to evaluate TimescaleDB] [Topic: Longer term metric retention] # 2021-04-21 SSD Disk Failing in US # 2021-05-19 decision to leave on the agenda.
  • 2021-02-24 OWG --> Grant to install a Discourse instance to get us started. [Topic: Discourse] # 2021-04-21 on the agenda. # 2021-05-19 pending.
  • 2021-01-13 OWG to send message to the servers we want to keep. [Reportage. Existing CDN servers] # 2021-03-24 Three servers stopped talking to us (shenron, naga and one more) # 2021-05-19 pending.
  • 2021-01-13 Grant to wipe thorns and the 3 other machines [AMS] [Topic: Longer term metric retention] # 2021-05-19 pending.
  • 2021-01-13 Paul to create ticket with Equinix to scrap the wiped thorns and the other 3 machines [Topic: Longer term metric retention]
  • 2021-01-13 Paul to create a ticket related to tile geographical localisation. [Topic: Lack of render capacity] # 2021-05-19 Done.
  • 2020-12-02 Grant to develop some thoughts on what is next for us using AWS. [Topic: AWS] # 2021-05-19 pending.
  • 2020-11-04 Grant to do heavy integrity checks on katla to test its response to heavy load. # 2021-03-24 Grant has got some disks to replace. Needs to open ticket with Bytemark. # 2021-05-19 Done.
  • 2020-11-04 OWG to work out tile log archival and deletion policy at later stage. [Topic: Commercial CDN] # 2021-03-24 & 2021-05-19 deferred to future point.
  • 2020-10-21 Paul to write to Discourse ticket and email the board [Topic: Discourse]
  • 2020-09-23 Grant to put Guillaume and Toby in touch. [Topic: Wikimedia challenges with Tile CDN delivery] Grant to check up on status. # 2021-05-19 superseded by email to be written.
  • 2020-09-23 OWG to pencil out what is needed. [Topic: Wikimedia challenges with Tile CDN delivery] # 2021-05-19 superseded by email to be written.
  • 2020-09-23 Toby Negrin (Wikimedia) to ask Wikimedia whether they would be interested in OSMF running a tile service available to Wikipedia and if they would be willing to share hardware resources or expertise. [Topic: Wikimedia challenges with Tile CDN delivery] # 2021-05-19 superseded by email to be written.
  • 2020-09-09 Tom to update OAuth ticket https://github.com/openstreetmap/openstreetmap-website/issues/1408 [2020-09-09 Reportage, related to 2020-08-26 action item] # 2021-05-19 Done.
  • 2020-09-09 Grant [Topic: AWS] Speak to AWS person about going ahead with open data program with official OSM S3 bucket. # 2021-05-19 pending.
  • 2020-09-09 [Not assigned] [Topic: AWS] Decide on services we need to run on AWS. Need clearance. # 2021-05-19 overlap with future AWS usage - decision to have a single ticket.
  • 2020-09-09 [Not assigned] [Topic: AWS] Work out rough budget. # 2021-05-19 decision to remove as budget will be worked out once decided what to run.
  • 2020-09-09 Grant [Topic: AWS] Talk to OpenAerial Map/HOT. # 2021-05-19 pending.
  • 2020-09-09 [Not assigned] [Topic: Federating OSM communities' rooms through OSMF-hosted Matrix servers] Evaluate effort required. Constrain the scope to what we can support and perhaps ask volunteers to step in. # 2021-05-19 decision to remove. Stick with Discourse for the time being.
  • 2020-09-09 [Topic: Ironbelly replacement] Paul to work out a proposal for the ironbelly replacement. # 2021-05-19 on agenda.
  • 2020-08-26 Tom to look at road ahead for OAuth. [Topic: Merge forums, OSQA, MLs to discourse?] https://github.com/openstreetmap/openstreetmap-website/issues/1408 # 2020-09-09 Did some investigation - branch with some code. Better understanding of OAuth 2 and options. Doable. # 2021-05-19 decision to remove as superseded by more recent action items.
  • 2020-08-26 Grant to talk to Ian about migrating old content to Discourse. [Topic: Merge forums, OSQA, MLs to discourse?] # 2020-09-09 pending. # 2021-05-19 Paul has stricken this through.
  • 2020-08-26 [Not assigned] Create Github ticket for updated OAuth. [Topic: Merge forums, OSQA, MLs to discourse?]
  • 2020-08-12 Michal to try to rekindle excitement about people helping with imagery (on dev channel/imagery channel or Slack). # 2020-08-26 No progress.
  • 2020-07-29 Grant to enable background sync to Amazon Web Services (AWS) S3. [Topic: Ironbelly] # 2020-08-12&26 Manually run, automated scripting to be added. # 2021-05-19 Grant to run the script again.
  • 2020-07-29 Grant to check with Wiki Admins on hCaptcha (reCaptcha replacement). [Topic: Wiki reCaptcha issue] https://github.com/openstreetmap/operations/issues/454 # 2020-08-12 hCaptcha people reached out and happy to help. Blocker on Mediawiki 1.35 being released in August. # 2021-05-19 blocker removed.
  • 2020-07-15 Paul and Grant to quote up a server to replace errol/kessie. [Topic: Replacement of Errol/Kessie]. # 2020-08-12 A new person in OWG asked to do Errol. Need to replace it at some point - at University College London. # 2021-05-19 pending.
  • 2020-07-15 Ian to try converting fluxBB DB to go into Discourse. [Topic: OSM Forum (FluxBB) update]. # Evaluating whether moving is an option. Need to see about history, user log-in. # 2021-05-19 decision to leave the action item open.
  • 2020-07-01 Paul to create a ticket about solutions to reduce incoming comms. [Topic: Revision of acceptable use policy to reduce incoming comms] # 2021-05-19 decision to leave the action item open.
  • 2020-07-01 Grant to work out some of the questions for an online form as a solution to reduce incoming comms. [Topic: Revision of acceptable use policy to reduce incoming comms] # 2020-08-12 need to think about the reply # 2021-05-19 decision to leave the action item open.
  • 2020-07-01 Michal to reach out to Amazon Web Services (AWS) (need a story for AWS to show how their help will lead to AWS spending from users). [Topic: Commercial CDN for Bulk Tile Users] https://lists.openstreetmap.org/pipermail/talk/2020-May/084700.html # 2020-08-12 Michal feels blocked, could draft something. We got contacted by AWS, not replied yet. More info at 2020-08-12 reportage. # 2021-05-19 decision to remove.
  • 2020-06-04 Paul to update the Github ticket "Adding API key support for tile.osm.org" https://github.com/openstreetmap/operations/issues/342
  • 2020-06-04 OPS team: draft an email (regarding a call for proposals), ask for comments. [Topic: Adding API key support for tile.osm.org https://github.com/openstreetmap/operations/issues/342] # 2021-05-19 decision to remove.
  • 2020-04-10 OWG to push up tile usage policy (commercial entities, vehicle tracking applications - which are heavy on Nominatim and probably not attributing as well) [Topic: Commercial CDN for Bulk Tile Users] # 2021-05-19 decision to remove.
  • 2020-04-10 Grant to work out a table of different data bits, work out how they are backed up and what can be potentially improved. [Topic: High Availability / Redundancy of OpenStreetMap.org (and primary services)] # 2021-05-19 decision to leave the action item open.
  • 2020-04-10 [Not assigned]: Potentially move some more of backup data into long term S3 buckets. [Topic: High Availability / Redundancy of OpenStreetMap.org (and primary service)] # 2021-05-19 decision to remove.

Reportage

Netlify related request

Request for updated permissions - to be granted.

From action item updates

Replacing reCaptcha with hCaptcha.

  • hCaptcha better supported now in Mediawiki.
  • Bugs of reCaptcha and simple editor.

HP DL360 Gen9 servers for Dublin

Paul arrived at a different number of required servers now than at budgeting time: 7 then, 10 now. (https://github.com/openstreetmap/operations/issues/525)

Improved locality of backend tile requests

https://github.com/openstreetmap/operations/issues/527
Europe tile requests are being split based on metatile coordinate into server groups of one powerful server + one weak server. Significant reduction in rendering workload.

Will this cause problems with failover?

Decision: Test failover by stopping Apache on odin or ysera for 10 minutes at 22:00 or 23:00 UK time.
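
For illustration only, the idea of splitting requests by metatile coordinate can be sketched as below. This is not the configuration from the ticket: the group names and the selection rule are hypothetical, and the 8x8 metatile size is assumed from the mod_tile/renderd default. The point is that every tile inside one metatile maps to the same server group, so a metatile is rendered by one group only.

```python
METATILE = 8  # tiles per metatile side (assumed: mod_tile/renderd default)

# Hypothetical group names; the real groups are described in the ticket.
SERVER_GROUPS = ["group-a", "group-b"]


def metatile_origin(x: int, y: int) -> tuple[int, int]:
    """Return the top-left tile coordinate of the metatile containing (x, y)."""
    return (x - x % METATILE, y - y % METATILE)


def server_group(x: int, y: int, groups=SERVER_GROUPS) -> str:
    """Pick a group from the metatile coordinate (hypothetical rule).

    All tiles of one metatile share an origin, so they always resolve
    to the same group.
    """
    mx, my = metatile_origin(x, y)
    index = (mx // METATILE + my // METATILE) % len(groups)
    return groups[index]


# Two tiles in the same metatile resolve to the same group.
print(server_group(0, 0), server_group(7, 7))
# A tile in the neighbouring metatile may resolve to a different one.
print(server_group(8, 0))
```

A failover test such as the one decided above would check what happens to the metatiles assigned to a group when one of its servers stops answering.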

Planet servers

We need to decide the plan for the planet servers: whether they will need a large (>30TB) RAID array or whether we will use an object store. What needs to happen?

3 things that need storage

  • planet server
    • has backups, probably a significant portion of that
    • backups should be moved to AWS
    • Grant to provide breakdown of usage of planet server storage space
    • Paul to look at new server with enough storage to replace ironbelly
  • dev server
  • imagery server

Action: Grant to provide breakdown of planet server files.

Breakdown of planet server files (incomplete run):

  • 3.4 TB Backups
  • 400 G log files
  • 300 G Current run of a planet dump
  • RAILS storage (old user images and gpx files)

AWS

  • Planet serving portion of S3 might be provided for free (potentially replication and services we need to run the S3 bucket). Wouldn't run tile services.
  • Concern: development time.

Decision: 2U machine. Suggestion: 32TB + 25%. Will depend on run.

Future: add extra disks to slots. No concern for unmatched disks.
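
As a rough check of the sizing figures in this topic, the arithmetic can be sketched as follows. It assumes that "400 G" and "300 G" mean gigabytes with 1 TB = 1000 GB, and that "32TB + 25%" means 32 TB plus 25% headroom on top. The RAILS storage entry is left out because no size was recorded for it.

```python
# Figures from the incomplete breakdown in the minutes, in TB.
# Assumption: "400 G" and "300 G" are gigabytes, 1 TB = 1000 GB.
known_usage_tb = {
    "backups": 3.4,
    "log files": 0.4,
    "current planet dump run": 0.3,
}

known_total_tb = sum(known_usage_tb.values())

# Assumption: "32TB + 25%" means 32 TB of capacity plus 25% headroom.
base_capacity_tb = 32
headroom = 0.25
suggested_capacity_tb = base_capacity_tb * (1 + headroom)

print(f"Known usage so far: {known_total_tb:.1f} TB")
print(f"Suggested capacity: {suggested_capacity_tb:.0f} TB")
```

Under these assumptions the recorded items add up to about 4.1 TB and the suggestion works out to 40 TB. As noted in the decision, the final figure will depend on the complete run.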

QGIS

Topic added after request of Sarah Hoffmann (Nominatim).

Deferred until we can talk to Sarah.

10Gb Switches

£3500 each minimum

Action: Grant to price up options for review and decision.

RAM for new DB server in Dublin

DB server: 11.04 TB used.

Decision: 0.5 TB RAM

Dublin tickets

https://github.com/openstreetmap/operations/milestone/5

Suggestion: split ticket "cabling and accessories" https://github.com/openstreetmap/operations/issues/529

Open Ops Tickets

Review open tickets: what needs policy and what needs someone to help with.
https://github.com/openstreetmap/operations/issues

Action items from this meeting

  • Grant to give Twitter credentials to Paul. [Action item updates]
  • Grant to provide breakdown of planet server files. [Topic: Planet servers]
  • Grant to price up 10Gb Switches options for review and decision. [Topic: 10Gb Switches]

Next meeting

Wednesday 2 June 2021 18:00 London time

Operations meetings are currently being held every 2 Wednesdays, at 18:00 London time.
Online calendar showing the OPS meetings.