Operations/Minutes/2023-02-09

From OpenStreetMap Foundation

OpenStreetMap Foundation, Operations Meeting - Draft minutes

These minutes do not go through a formal acceptance process.
This is not strictly an Operations Working Group (OWG) meeting.

Thursday 9 February 2023, 19:00 London time
Location: Video room at https://osmvideo.cloud68.co

Participants

Minutes by Dorothea Kazazi.

Absent

New action items from this meeting

  • Tom Hughes to add command to output the error related to podman container failing to stop. [Topic: Docker]
  • Paul Norman to create a ticket for making the directory structure of planet more sensible. [Topic: Planet server]
  • Tom Hughes to create a ticket for replication or what it will take us to move to the alternative server. [Topic: Planet server]
  • OPS need ticket for moving to S3. [Topic: Planet server]
  • Grant Slater to create a ticket regarding switching from Ubuntu, probably to Debian. [Any other business: Mailman 3 migration]

Reportage

Katla

  • Its cost was GBP 28,000, ten years ago. We used it for ~8 years.
  • 36 kg without any disks. Can take up to ~72 disks.
  • Katla donated - will replace a 20-year-old server.
  • Donation saved us the cost of sending it to recycling.

Forum content migration to Discourse

openstreetmap/operations ticket 604: forum content migration to discourse
  • Thanks to Tom's help, user merging looks good.
  • Grant asked the Russian community for feedback on test imports and they highlighted 5 issues - Grant considers the issue with nested quotes to be the one that needs fixing.
  • Messages on forum.osm.org plus community.osm.org: ~ 930,000 messages.
    • Megathread on the Russian forum with 114,000-117,000 messages - seems related to tagging.

Chef testing

Ongoing.

Docker

  • Tom helped.
  • Switched to Podman - looks good.
  • Current issue: Podman container is failing to stop.

Action item: Tom to add command to output the error related to Podman container failing to stop.

  • Pull request to do the welcome mat, then do the State of the Map websites, with the same template and resources.

On the plan for where containers will be built and stored

  • Grant's preference: GitHub Actions using GitHub Packages.
  • Grant wrote the code that creates and publishes the packages for the Welcome and State of the Map (SotM) websites.

On making sure that all copies of Apache will be up to date

  • Nginx and weekly schedule that rebuilds the image.
  • Podman can manage the updates in any of the containers, on a schedule that we can define.

Suggestion for future: use triggers for Apache updates. Have to consider security.

Suggested update cycles

Planet server

  • Still using Ironbelly (12 years old) - the disks (especially the Western Digital ones) are becoming troublesome.
  • Database dump is run on the database server and copied to Ironbelly when it has finished.
  • Grant has trimmed down some of the published Planet and history files.

Reasons preventing the switch to the planet server in Amsterdam

  • lack of extra disk capacity.
  • weekly planet dump issue - runs from a database backup (DB backup -> Ironbelly -> triggers planet dump on Ironbelly)

Backups: the DB dumps make up 2/3 to 3/4 of the backups.

On disk space

  • Ironbelly: 33 TB storage
  • Norbert: 35 TB storage

Suggestions

  • Rsync to new machine.
  • Make the new planet server the new backup server as well.
  • Split the backups. Some backups to Ironbelly for the moment and the Postgres DB dumps and diffs to Norbert https://hardware.openstreetmap.org/servers/norbert.openstreetmap.org/
  • Have proper automatic mirroring to a machine in Dublin, so that we can switch over if necessary.

Jochen's process (not implemented yet): Dump logical changelog to disk -> sync that to AWS -> flush the logical changelogs from the database -> generate the published diffs. Advantage: if anything goes wrong, intermediate files can be pulled from AWS.

Link shared: https://github.com/openstreetmap/osmdbt#3-copy-log-file-to-separate-host-optional
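
The ordering in Jochen's process matters: the changelog should only be flushed from the database after the copy to AWS has succeeded. A minimal sketch of that ordering, assuming each step is wrapped in a function (the four step functions below are hypothetical placeholders, not osmdbt commands or anything the OPS team has written):

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None]]


def run_in_order(steps: List[Step]) -> List[str]:
    """Run steps in order, stop at the first failure, return the completed step names."""
    completed: List[str] = []
    for name, action in steps:
        try:
            action()
        except Exception as exc:
            # A failed step prevents every later step from running,
            # so a failed sync never leads to a flush.
            print(f"step '{name}' failed: {exc}; later steps skipped")
            break
        completed.append(name)
    return completed


# Hypothetical placeholders: the real steps would call the actual tools.
def dump_changelog_to_disk() -> None:
    pass


def sync_to_aws() -> None:
    pass


def flush_changelog_from_db() -> None:
    pass


def generate_published_diffs() -> None:
    pass


PIPELINE: List[Step] = [
    ("dump", dump_changelog_to_disk),
    ("sync", sync_to_aws),
    ("flush", flush_changelog_from_db),
    ("diffs", generate_published_diffs),
]

if __name__ == "__main__":
    print(run_in_order(PIPELINE))
```

If the sync step raises, the function returns only the steps completed before it, and the intermediate files remain on disk for a retry.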

Suggestion: create GitHub issue.

Next steps

  • Get off Ironbelly and move to Norbert (urgent). The new server has fewer warnings.
  • Plan the move of planet stuff to S3.
  • Possibly make the directory structure of the planet more sensible.

Action items

  • Paul to create a ticket for making the directory structure of planet more sensible.
  • Tom to create a ticket for replication or what it will take us to move to the alternative server.
  • Need ticket for moving to S3.

Network Upgrades AM6

  • Switches are not getting monitored. Junipers do not report fan speeds.
  • As soon as planet comes out, 20 people try to download it.

Priority

Getting AMS redundant and off Cogent (we still have routing problems with them).

Dublin

  • All machines have 10G-capable networking.
  • 2 machines have fiber ports - the rest have RJ45.
  • Switches: 1G (2G with the bonding). Have 4x10G capacity.
  • Might need 10G internal networking.

Suggestions

  • Do not make network upgrades, as hitting 1G is rare, and downgrade Dublin from 10G to 1G.
    • Re-evaluate upgrade when we move to S3.
  • Grant to remove the planet rate limits for a couple of days.
    • Not do it atm.
  • Get one or two 10G switches and interlink with Junipers.
  • Consider making Dublin the primary site for a while.
  • If we stick with HE.net, get a second port and have them switch the existing port - can be done seamlessly.
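
For a rough feel of the numbers in this discussion (about 20 people downloading the planet at once, and links of 1G, 2G with bonding, or 10G), the per-client share can be worked out as below. This is only a back-of-the-envelope sketch that assumes an even split of the link; the file size is a hypothetical example value, not a figure from the meeting.

```python
def per_client_mbit(link_gbit: float, clients: int) -> float:
    """Even share of a link in Mbit/s when all clients download at the same time."""
    return link_gbit * 1000 / clients


def download_hours(file_gb: float, rate_mbit: float) -> float:
    """Hours needed to transfer file_gb gigabytes (decimal) at rate_mbit Mbit/s."""
    seconds = file_gb * 8 * 1000 / rate_mbit
    return seconds / 3600


# Hypothetical file size, chosen only to make the arithmetic concrete.
EXAMPLE_FILE_GB = 100
CLIENTS = 20  # "20 people try to download it"

if __name__ == "__main__":
    for link_gbit in (1, 2, 10):
        share = per_client_mbit(link_gbit, CLIENTS)
        hours = download_hours(EXAMPLE_FILE_GB, share)
        print(f"{link_gbit}G link: {share:.0f} Mbit/s per client, {hours:.1f} h per download")
```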

Action item

Guillaume to ask for quotes again for 1G, 2 ports [Network Upgrades AM6]

Plan

After planet, switch to Dublin as the primary site and then switch Amsterdam.

Any other business

On OWG budget

The OWG 2023 proposed budget got approved by the board at the high tier.

US rendering server

Arizona State University (ASU) wants to donate a US rendering server and would like to go for Dell - it will be paid for by them.

Travel policy

  • Travel policy suggested by board includes approval by the Operations Working Group (OWG) for OWG-related travels.
  • Grant will go to Slough, probably before the next OPS meeting.
  • Grant charges mileage if he uses his car - standard AA travel rate, mileage for 70' evening travel.

Bytemark

  • They know that they should leave Shenron* plugged-in - the server is owned by them.
  • Some of the hardware will be donated.

* Mailing lists server and OSQA server for help.openstreetmap.org

Mailman 3 migration

Tom will do a test migration to Mailman 3.

Mailman 3

  • More complicated than Mailman 2.
  • Not 1-to-1 equivalence in functionality.
  • Ubuntu 20.04 - Canonical is holding back security updates, unless you go Pro. Debian seems an attractive option with long term support.

Action item: Grant to create a ticket regarding switching from Ubuntu, probably to Debian.

Action items

  • 2023-01-26 Grant Slater to check whether Tabaluga is still under supply warranty. [Any other business: TagInfo server]
  • 2023-01-12 Guillaume Rischard to look at bandwidth stats for next OPS meeting to make predictions for future consumption. [Reportage: Network upgrades AM6]
  • 2023-01-12 Grant Slater to experiment with Netbox. [Topic: Asset management]
  • 2023-01-12 Grant Slater to email Slough to see if he can make any arrangements. [Reportage: Decommissioning]
  • 2022-12-15 Guillaume Rischard to make sure Grant does the forum to Discourse migration. [Topic: Forum to Discourse https://github.com/openstreetmap/operations/issues/604 ] # 2022-12-29 In progress. # 2023-01-12 In progress.
  • 2022-12-15 Grant Slater to produce a pull request and finish the OSQA one. [Topic: Containerisation of small services https://github.com/openstreetmap/operations/issues/807] # 2022-12-29 In progress.
  • 2022-11-03 [Network upgrades AM6] Guillaume Rischard to talk with Clement / Open Source ISP about reliability and get assurances that the virtualised Layer2 links to Paris will be reliable. # 2022-11-17 Pending # 2022-12-29 Changed from "OPS" to "Guillaume".
  • 2022-09-22 [Network upgrades @ AM6] Grant Slater and Guillaume Rischard to talk to Clement about the options for network upgrades. # 2022-10-06 Guillaume talked with Clement.
  • 2022-09-22 [Network upgrades @ AM6] Guillaume Rischard to provide a schema at next meeting.
  • 2022-09-22 [Network upgrades @ AM6] Grant Slater and Guillaume Rischard to work out all the costs involved for option 1.
  • 2022-09-08 Grant Slater to document Chef testing. [Topic: How to get more people involved] # 2022-09-22 Chef kitchen tests running locally. # 2022-11-03 pending. # 2022-11-17 Pending # 2022-12-29 In progress. # 2023-01-12 In progress - one of the tests fails randomly.
  • 2022-07-14 Guillaume Rischard to ask Brian Sperlongano about OpenMapTiles and YAML, to do a test run. [Topic: Vector tile status]. # 2022-09-22: Suspended for a while. Brian is back at school and has no time to work on this. Guillaume will try to move vector tiles forward with someone else. # 2022-12-29 Still suspended.
  • 2020-12-02 [AWS] Grant Slater to develop some thoughts on what is next for us using AWS. [Topic: AWS] # 2021-05-19 & 2021-06-02 & 2021-06-16 postponed for a few weeks. # 2022-12-29 In progress.
  • 2020-07-29 [AWS] Grant Slater to enable background sync to AWS S3. [Topic: Ironbelly] #2020-08-12&26 & 2021-06-02 Manually run, automated scripting to be added. # 2021-05-19 Grant to run the script again. # 2022-04-09 Still manually run. # 2022-09-22 wants to make sure there are absolutely minimum permissions. # 2022-12-29 In progress. # 2023-01-12 Still manually run. Grant to look at the policies together with Paul.
  • 2020-07-01 Paul Norman to create a ticket about solutions to reduce incoming communications. [Topic: Revision of acceptable use policy to reduce incoming comms] # 2021-05-19 decision to leave the action item open. # 2021-06-02 discussion about priority for account deletion. # 2022-04-09 Grant can show Paul how to do that with autoresponder which Tom built. Might be better to work on an online form (action item below).
  • 2020-07-01 Grant Slater to work out some of the questions for an online form as a solution to reduce incoming comms. [Topic: Revision of acceptable use policy to reduce incoming comms] 2020-08-12 need to think about the reply # 2021-05-19 decision to leave the action item open. # 2022-04-09 Grant is thinking about examples. Suggestion to add what is considered large for tile usage.
  • 2020-04-10 Grant Slater to work out a table of different data bits, work out how they are backed up and what can be potentially improved. [Topic: High Availability / redundancy of OpenStreetMap.org (and primary services)] # 2021-05-19 decision to leave the action item open. # 2021-06-02 pending.

Meeting adjourned 34' after start.


Next meeting

Thursday 23 February 2023, 19:00 London time, unless rescheduled.

Operations meetings are currently being held every two Thursdays, at 19:00 London time.
Online calendar showing the OPS meetings.