Operations/Minutes/2024-02-08

From OpenStreetMap Foundation

OpenStreetMap Foundation, Operations Meeting - Draft minutes

These minutes do not go through a formal acceptance process.
This is not strictly an Operations Working Group (OWG) meeting.

Thursday 8 February, 19:00 London time
Location: Video room at https://osmvideo.cloud68.co

Participants

Minutes by Dorothea Kazazi


New action items from this meeting

  • 2024-02-08 Paul to ping Sarah on the ticket regarding the Overpass move. [Topic: New general purpose machines for DB4]
  • 2024-02-08 Grant to check: remote access, the procedures to go back to the previous firmware if something fails, read the links on NSSU. [Topic: Switch upgrade maintenance window]
  • 2024-02-08 Grant to announce the Discourse maintenance window on Twitter/Mastodon/Community.osm.org. [Topic: Switch upgrade maintenance window]
  • 2024-02-08 OWG to review the Editor policy during one of the next calls and possibly vote on it. [Topic: Editor Policy adding to OpenStreetMap.org]

Reportage

Setting-up PagerDuty

2024-01-25 Grant to set up PagerDuty and provide instructions to OPS on how to trigger PagerDuty manually if we notice the website is down (there is a ticket). [Topic: When to page someone for a problem]

Grant set up StatusCake to send SMS notifications in the interim. He is setting up PagerDuty for alerts and will probably finish tomorrow.

Publishing OAuth 1.0a schedule

2024-01-25 Grant to publish the schedule for OAuth 1 deprecation on mailing lists/Discourse/Mastodon/Twitter. [Topic: OAuth 1.0a]

Mostly done, except mailing lists (dev).

Disabling OAuth 1.0a and OAuth 1

OAuth 1.0a

  • Tom opened a pull request to add proper support for disabling OAuth 1.0a separately and for returning sensible error messages for both basic auth and OAuth 1.0a.
  • We added OAuth 1.0a support 10 years ago.

OAuth 1

  • There's a switch we can set in CGImap to stop it accepting OAuth 1 tokens, but it will not send any message mentioning that OAuth 1.0a is disabled.
  • We can turn off OAuth 1.0 at any time, since it is insecure by design. We don't know how many applications are using it.
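
Application maintainers who are unsure which scheme their client sends can often tell from the Authorization header. The following is a minimal sketch, not taken from CGImap or the website code. It assumes the standard scheme names "Basic" (HTTP Basic auth), "OAuth" (OAuth 1.0/1.0a) and "Bearer" (OAuth 2.0 bearer tokens). The function name is hypothetical, and OAuth 1 parameters sent in the query string or request body would not be detected this way.

```python
def classify_authorization(header_value: str) -> str:
    """Return a coarse label for an HTTP Authorization header value.

    Hypothetical helper for illustration only; it inspects the scheme
    name and does not validate the credentials that follow it.
    """
    # The scheme is the first whitespace-separated token; compare case-insensitively.
    parts = header_value.strip().split(None, 1)
    if not parts:
        return "none"
    scheme = parts[0].lower()
    if scheme == "basic":
        return "basic"
    if scheme == "oauth":
        return "oauth1"
    if scheme == "bearer":
        return "oauth2"
    return "unknown"
```

A client whose requests are labelled "basic" or "oauth1" by a check like this would need to move to OAuth 2.0 before the deprecation takes effect.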

Creating a test parallel version of tile CDN

2024-01-11 Grant to create a test parallel version of tile CDN. [Topic: Serving different images instead of errors to tile requests]

  • Depends on OpenTofu - currently working on OpenTofu for PagerDuty.
  • Paul has created the distribution and Fastly settings have been changed.
  • Goal: use OpenTofu features that allow us to change the environment.

New general purpose machines for DB4

https://github.com/openstreetmap/operations/issues/917
We bought replacements for gorwen and jakelong, the last Gen8 machines used for our services. They are installed as fume and grisu and are turned on, but the services have not yet been moved.

  • Suggestion: Add the recipe to the new machine, Chef up, install docker containers and then move the database.
  • Downtime: necessary for the move, probably next week.
  • Move: includes more than the Postgres database, but it's all a set of directories.
    • If containers are shut down, it's safe to move.

Overpass:

  • Downtime announcement: none needed, as we can redirect.
  • Sarah might want to rebuild it.
  • In case Sarah does not want the machine:
    • we can move OSMF data or
    • we can use it as a container server, as it's going to be the first Debian deployment that we have production stuff on.
    • we wouldn't have Overpass.

Action item: Paul to ping Sarah on the ticket regarding the Overpass move.


Switch upgrade maintenance window

On test run

  • Done a few days ago. Tested both versions.
  • Process involves double reboot and double install: completes the initial partition update, then it reboots and then it does the full partition update.
  • Duration: took a long time. 40 minutes to update 1 switch.

Suggestions

  • Maintenance window: 2-hour window. No outage is expected if everything goes OK.
  • Order: Will do Dublin first, as it is lower risk.
  • Check xmodem transfers over serial to get the firmware.
    • There is back-up firmware on Juniper and we can sync it.

Main issue: The upgrade requires as much free space as possible. It is necessary to perform a clean-up process, delete old configs and manually purge some cache database.

Suggestions related to timing

  • Dublin on Sunday 11 February 2024.
  • Do both the same day and cancel the second one if the first one goes badly.
  • 4 to 6 hour window - hopefully 0 outage.
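
As a rough check on the window sizes discussed, the sketch below counts how many switch upgrades fit in a window. It assumes the switches are upgraded one after another and that each takes the 40 minutes observed in the test run. The number of switches per site is not stated in these minutes, and the function name and the optional safety margin are hypothetical.

```python
def upgrades_that_fit(window_minutes: int,
                      per_switch_minutes: int = 40,
                      margin_minutes: int = 0) -> int:
    """Count sequential switch upgrades that fit in a maintenance window.

    per_switch_minutes defaults to the 40 minutes seen in the test run;
    margin_minutes is an assumed buffer reserved for rollback or checks.
    """
    if per_switch_minutes <= 0:
        raise ValueError("per_switch_minutes must be positive")
    usable = window_minutes - margin_minutes
    if usable <= 0:
        return 0
    # Integer division: a partially completed upgrade does not count.
    return usable // per_switch_minutes
```

Under these assumptions a 2-hour window fits three sequential upgrades with no margin, and a 4-hour window with a 30-minute margin fits five.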

Other points mentioned during discussion

  • There's a slightly different upgrade procedure for the 2 switches.

Action items

  • Grant to check: remote access, the procedures to go back to the previous firmware if something fails, read the links on NSSU.
  • Grant to announce the Discourse maintenance window on Twitter/Mastodon/Community.osm.org.

Purchase of openstreetmap.mg - is this a decision for OPS?

(topic added by Dorothea)
There will be a SotM Madagascar 2024 in April. The organisers, whose SotM trademark licence request was approved, suggested that OSMF purchase openstreetmap.mg and point it to a location of the organisers' choosing. This was raised with the board and there was a comment that domains are managed by OWG.

  • The OWG policy has been that we'll buy domains and redirect them for local communities.
  • Cost: $117.00 (EUR 95) for the first year, then $129.99/year.
  • Agreement that it is expensive.

Grant registered the domain.
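
For budgeting, the multi-year cost follows directly from the two prices quoted above. This is a small sketch that assumes the renewal price stays at $129.99 for every year after the first. The function name is hypothetical.

```python
from decimal import Decimal


def domain_cost(years: int,
                first_year: Decimal = Decimal("117.00"),
                renewal: Decimal = Decimal("129.99")) -> Decimal:
    """Total cost in USD of holding the domain for the given number of years.

    Assumes a constant renewal price after the first year.
    """
    if years < 1:
        raise ValueError("years must be at least 1")
    # First-year price once, renewal price for each remaining year.
    return first_year + renewal * (years - 1)
```

For example, domain_cost(3) gives 376.98 USD: 117.00 for the first year plus two renewals at 129.99.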


Editor Policy adding to OpenStreetMap.org

https://community.openstreetmap.org/t/proposal-osm-org-editor-inclusion-policy/106589/71
We will revisit in an upcoming meeting to review a revised copy.

  • Grant was contacted by someone on behalf of Rapid, asking about the completion of the policy for adding editors to www.osm.org.
  • Decision on Rapid might be bounced to the board.

Action item: OWG to review the Editor policy during one of the next calls and possibly vote on it.


OWG 2024 budget (no changes)

  • The board will not change the OWG budget and nothing will be trimmed.
  • The contingency might not get spent. The double contingency got removed because it was added by mistake.
  • Earlier in the year it was known that the 2024 budget would be tight, which is why there was a push for limiting hardware upgrades.
  • Next year's capital spending is going to be much bigger.

Suggestion: Optimise communication for fundraising for hardware.


Next agenda on hackmd

  • Guillaume created a pad for the next agenda on hackmd.io instead of ccc.de

Issue: Naming of the pads is no longer sequential.


rhaegal found - next steps?

(Topic added by Hrvoje Bogner)
Hrvoje Bogner found rhaegal (https://hardware.openstreetmap.org/servers/rhaegal.openstreetmap.org/) and has access to it, but the server is currently not connected to the network.

Suggestions

  • Use it for testing Vector Tiles.
  • Install Debian.
  • Put it in the data center.

rhaegal has six 1 TB drives.


SotM Croatia 2024 conference invitation

Hrvoje Bogner is on the organising committee of a local IT conference, which will have a "State of the Map" track. The trademark licence has been approved.

Enquiry whether someone from OPS would like to give the keynote speech.

Hrvoje Bogner to ask if travel is provided and may send a formal invitation.


Action items reviewed at the beginning of the meeting

  • 2024-01-25 Grant to set up PagerDuty and provide instructions to OPS on how to trigger PagerDuty manually if we notice the website is down (there is a ticket). [Topic: When to page someone for a problem]
  • 2024-01-25 OPS to publish the schedule for OAuth [Topic: OAuth 1.0a]
  • 2024-01-25 Grant to contact the CWG about a blogpost on OAuth 1 deprecation. [Topic: OAuth 1.0a]
  • 2024-01-25 Grant to publish the schedule for OAuth 1 deprecation on mailing lists/Discourse/Mastodon/Twitter [Topic: OAuth 1.0a]
  • 2024-01-25 Tom to investigate what happens to existing OAuth 1 tokens when OAuth 1 gets turned off, and whether error messages can be sent back in the html of the response to basic/OAuth 1 users (yes, PR in progress). [Topic: OAuth 1.0a]
  • 2024-01-11 Grant to create a test parallel version of tile CDN. [Topic: Serving different images instead of errors to tile requests] -> Grant to create GitHub ticket
  • 2024-01-11 OPS to move to Fastly config as code. [Topic: Serving different images instead of errors to tile requests]
  • 2023-11-30 Grant to revisit the "policy for purchasing" document, which currently is focused on specs, and add information such as the process for obtaining approval for purchases. [Reportage] Added info: Who Approves / Steps etc -> Grant to create GitHub ticket
  • 2023-11-30 OPS to review the issue of spam reports to ISPs in 6 months (May 2024) -> Grant to create GitHub ticket
  • 2023-05-18 Paul to start an open document listing goals for longer-term planning. [Topic: Longer-term planning]