Difference between revisions of "Operations/Minutes/2020-07-01"

From OpenStreetMap Foundation

Latest revision as of 09:08, 6 August 2020

Draft minutes

OpenStreetMap Foundation, Operations Meeting* - Agenda & Minutes
Wednesday July 1st 2020, 18:00 London time
Location: Video room at https://osmvideo.cloud68.co

* Please note that this was not strictly an OWG meeting.

Participants

Present:

Apologies:

Minutes by Dorothea.

Administrative

Previous minutes

2020-06-04

Action items from this meeting

  • Paul to follow up with Ian again about Fastly. [Topic: Commercial CDN for Bulk Tile Users].
  • Paul to create a ticket about solutions to reduce incoming comms. [Topic: Revision of acceptable use policy to reduce incoming comms].
  • Grant to work out some of the questions for an online form as a solution to reduce incoming comms. [Topic: Revision of acceptable use policy to reduce incoming comms].
  • Grant to send IPv4 trace data for US servers with packet loss [Topic: US servers packet loss].
  • Michal to reach out to AWS (need a story for AWS (Amazon Web Services) to show how their help will lead to AWS spending from users). [Topic: Commercial CDN for Bulk Tile Users].
  • Michal to write a summary about archiving help.openstreetmap.org and post it to the thread. Say that we are looking at long-term replacements, point to the ticket, include a link to the uptime graphs and follow up in the original thread.

Commercial CDN for Bulk Tile Users

Cloudflare

  • We might not be paying the published rate.
  • Pricing does not depend on number of requests.

Amazon Web Services (AWS)

  • They're interested in ways in which the usage might translate into higher EC2 income or Athena income.
  • Paul asked for someone else to reach out to AWS, as he currently works for Amazon.

Akamai

  • Not known for giving stuff away.

Fastly

Possible point to consider: CDN being able to handle snarly geographically diverse back ends.

  • Not critical in our current design.
  • Currently routing people to particular caches and caches to particular render servers, so one gets some locality.

Multi-node machines

  • US: ok
  • FR and DE:
    • some are not good.
    • six or seven machines with widely different capabilities.
    • we completely turned off one DE machine.
  • Administering a non-homogeneous set of machines as a node adds complexity.

Suggestion: replace some of the small machines with VMs that we rent ourselves. This would improve European reliability.

Decision deferred till we look at commercial options.

Action items

  • Paul to follow up with Ian again.
  • Michal to reach out to AWS (need a story for AWS to show how their help will lead to AWS spending from users).

Long term plans for data center space

Grant joined

Even if we move stuff to virtual providers and hosting companies, we think we still want to maintain a physical presence somewhere as a second site. The current situation in Slough works, but not for the next 5 years.

Options:

  • Add more stuff in Bytemark (UK-based server hosting and datacentre provider).
    • Bought by another company, and all the open-source-friendly people have left.
  • Get commercial hosting, probably in London.
    • Probably 50-100% more expensive than AMS but not unreasonable.
    • Slough data center probably cheaper than central London.
    • There are advantages to having a location that is easy to travel to,
      • though we've done well with remote hands in AMS (though expensive). The only thing we couldn't do through them is disk disposal, which goes to IT waste.

On rack space: We have half to three quarters of a rack.

Suggestion: Find out the cost of duplicating our rack in Slough or the Isle of Dogs.

Open Ops Tickets

Review open tickets: what needs policy and what needs someone to help with.

Archiving help.openstreetmap.org

Related to action item: "2020-05-22 Michal to look into archiving help (priority) and Trac".
Related links:

  • OPS May meeting: Migrate help.osm.org from OSQA
  • GitHub: Migrate help.openstreetmap.org from OSQA #149

Points mentioned:

  • 10-year-old machine with 6 cores and 8 GB. Running Trac.
  • Trac uses a lot of memory if many people hit it - Apache restarting usually fixes it.
  • Pressure is intermittent and the impact is rare, so no direct or urgent action needs to be taken right now. Latest problems: 6th June, 24th April.
  • Potentially move it to a different server at some point or get rid of Trac to relieve some of the pressure.

On monitoring

Action item

  • Michal to write a summary about archiving help.openstreetmap.org and post it to the thread. Say that we are looking at long-term replacements, point to the ticket, include a link to the uptime graphs and follow up in the original thread.

Revision of acceptable use policy to reduce incoming comms

Suggestion:
Website with wizard questions to gauge how big a user's potential use is and to capture their user agent/referrer. The answers are then logged into a database for future contact in case of problems.

Factors:

  • Number of users.
  • Bulk downloading data or casually looking at map.
  • Vehicle tracking company.
  • Small experiment.
  • Duration.
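
A minimal sketch of how such a form submission could be logged, assuming a SQLite database. Nothing here was decided at the meeting: the table name, the field names and the helper functions are all hypothetical and only mirror the factors listed above.

```python
import sqlite3
from dataclasses import dataclass, astuple


@dataclass
class UsageDeclaration:
    """One wizard submission (hypothetical fields based on the factors above)."""
    contact_email: str
    expected_users: int      # number of users
    bulk_download: bool      # bulk downloading vs. casually looking at the map
    use_case: str            # e.g. "vehicle tracking", "small experiment"
    duration_days: int       # how long the use is expected to last
    user_agent: str
    referrer: str


SCHEMA = """
CREATE TABLE IF NOT EXISTS usage_declarations (
    contact_email  TEXT    NOT NULL,
    expected_users INTEGER NOT NULL,
    bulk_download  INTEGER NOT NULL,
    use_case       TEXT    NOT NULL,
    duration_days  INTEGER NOT NULL,
    user_agent     TEXT    NOT NULL,
    referrer       TEXT    NOT NULL
)
"""


def open_db(path=":memory:"):
    """Open the database and make sure the (assumed) table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def log_declaration(conn, decl):
    """Store one submission; the transaction commits when the block exits."""
    with conn:
        conn.execute(
            "INSERT INTO usage_declarations VALUES (?, ?, ?, ?, ?, ?, ?)",
            astuple(decl),
        )


def find_contacts_by_user_agent(conn, user_agent):
    """Look up who to contact when a given user agent causes problems."""
    rows = conn.execute(
        "SELECT contact_email FROM usage_declarations WHERE user_agent = ?",
        (user_agent,),
    )
    return [row[0] for row in rows]
```

With something like this, a problematic user agent seen in the logs could be matched back to a contact address instead of requiring an email exchange.
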

Simplest version: Change the email address to an auto-responder / OTRS (open source issue tracking system, used by several OSMF working groups).

Action items

  • Paul to create a ticket about solutions to reduce incoming comms.
  • Grant to work out some of the questions for an online form.

US servers packet loss

2 new tile caches in the U.S. have persistent packet loss:

  • Around 1.7% packet loss (IN) for the last 3 hours on final hop - using MTR. Also small artifact in Munin graphs (TCP connections).
  • Got network cards to do high amounts of traffic.
  • Recently upgraded servers to new Ubuntu version.
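
For reference, a small sketch of how a loss percentage like the one above is derived from probe counts. The function name and the sample counts are illustrative assumptions, not measurements from the affected servers.

```python
def packet_loss_percent(sent, received):
    """Return the share of probes that got no reply, as a percentage."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    if not 0 <= received <= sent:
        raise ValueError("received must be between 0 and sent")
    lost = sent - received
    return 100 * lost / sent


# Illustrative numbers only: 1000 probes sent, 983 replies received.
loss = packet_loss_percent(1000, 983)
print(f"{loss:.1f}% packet loss")
```
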

Action item:

  • Grant to send IPv4 trace data.

Next meeting

This time every two weeks works for the participants of this meeting.