Working Group Minutes/EWG 2013-05-20

Attendees

IRC nick Real name
apmon Kai Krueger
Firefishy Grant Slater
gravitystorm Andy Allan
RichardF Richard Fairhurst
TomH Tom Hughes
zere Matt Amos

Summary

  • rails_port README
    • gravitystorm has made progress on the READMEs, and had a few items for discussion by the group.
    • ACTION: gravitystorm create platform-specific install documentation in INSTALL.$platform.md files
    • ACTION: gravitystorm change the databases in example.config.yml (and the docs) to be osm-development/osm-test/osm-production
  • Carto benchmarking
    • pnorman has done some benchmarking and found a slowdown between 12% and 22% [1].
    • It was generally agreed that the real-world slowdown would fall somewhere between these figures, and would not be so bad that it couldn't be handled with some hardware mitigations.


IRC Log

17:03:30 <zere> minutes of the last meeting: http://www.osmfoundation.org/wiki/Working_Group_Minutes/EWG_2013-05-13
17:04:42 <zere> gravitystorm: any news on the READMEs?
17:05:43 <gravitystorm> zere: nope, still in progress. A question for the group though - how best to deal with platform-specific notes? I know the ones on the wiki are all hopelessly outdated, but I haven't nailed a strategy for replacements. Options:
17:06:27 <gravitystorm> 1) All crammed into INSTALL.md 2) Extra pages on the openstreetmap-website/wiki on github 3) extra pages on the osm.org wiki 4) INSTALL.windows.md etc
17:06:31 <gravitystorm> thoughts?
17:07:28 <zere> i would prefer the INSTALL.$platform.md approach, but having them in a .gh-pages branch would also be good.
17:07:52 <zere> i suppose the main idea would be that they're well-written, and not overflowing with outdated information like the wiki is
17:08:17 <apmon> If we go the route of having the main doc in INSTALL.md, I think I'd vote for INSTALL.$platform.md
17:09:25 <gravitystorm> OK, unless anyone suggests otherwise, that's what I'll go for
17:10:10 <zere> awesome. before the meeting started, pnorman reported: "best guess on osm.xml vs osm-carto is 10%-20% slower for carto. 10% on in-memory working set, 20% if it has to hit the disk. other changes like upgrading postgres/postgis may speed up things."
17:10:51 <gravitystorm> Second question, which falls under the "worth the effort?" category: I'd like to rename the databases. Currently they are osm/osm-test/openstreetmap, and I think that can lead to confusion for newbies. I propose osm-development/osm-test/osm-production and amending the docs accordingly
17:11:22 <RichardF> +1. that confused me a bit when setting up the rails_port last week.
17:11:35 <RichardF> (do most people need to set up anything apart from development anyway?)
17:12:04 <gravitystorm> RichardF: few people need production, but development + test is common
17:12:41 <zere> if you're doing development... you should have test as well!
17:12:54 <zere> s/have/want/, rather.
17:13:47 <gravitystorm> Does anyone know if there is other documentation that would be impacted by this change, e.g. switch2osm or something similar?
17:15:03 <zere> i don't think so... isn't switch2osm more about the tiles?
17:15:46 <RichardF> yep
17:16:09 <apmon> zere: Wasn't it the other way round? 20% in-memory and 10% if it hits disk?
17:16:47 <gravitystorm> OK, I'm done on rails-port docs
17:17:16 <gravitystorm> #action gravitystorm create platform-specific install documentation in INSTALL.$platform.md files
17:17:30 <RichardF> gravitystorm: yell if you need any help on OS X stuff
17:17:32 <zere> apmon: that wasn't what pnorman said 45 mins ago.
17:17:54 <apmon> Can you check with him again on that then?
17:18:07 <gravitystorm> #action gravitystorm change the databases in example.config.yml (and the docs) to be osm-development/osm-test/osm-production
17:18:58 <zere> apmon: yup. hopefully he'll be back before the end of the meeting and can explain the results.
17:19:08 <gravitystorm> apmon: you're correct on this, going by his original mailing list post
17:19:45 <gravitystorm> http://lists.openstreetmap.org/pipermail/tile-serving/2013-May/000217.html
17:19:51 <apmon> Yes, all of the previous discussions were that way round, so I guess he might have just said it wrong in the last discussion with zere
17:20:04 <gravitystorm> For the from-ram [...] a decrease of 22%.
17:20:28 <gravitystorm> For the larger [from-disk] set [...] a decrease of 12%.
17:21:10 <apmon> So one question is, when moving from EC2 disks to SSDs on yevaud / orm, will it behave more like in memory or like from-disk
17:22:12 <gravitystorm> Well, perhaps we shouldn't guess too much, and say that it's likely to be between a 12-22% slowdown. What consequences are there for that?
17:22:40 <apmon> Not too many
17:24:45 <zere> splitting the difference - 17% slowdown... probably not a massive problem. does mean that the queues might go up a bit.
17:25:07 <apmon> On yevaud, we'd likely hit the queue full situation more often, but it should be fine most of the time. On Orm, the faster server should compensate for the carto slowdown
17:25:42 <zere> if the from-ram is a 22% slowdown, does that mean that the carto is making mapnik do 22% more work? is that the right conclusion to draw?
17:27:06 <zere> (well, to be precise 27% more work)
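zere's correction from 22% to 27% turns on the difference between a drop in throughput and an increase in work per tile. A minimal Python sketch of that conversion, assuming the quoted percentages are drops in tiles rendered per unit time (if they were instead increases in render time, the extra work would simply equal the percentage). The function names are illustrative only:

```python
def extra_work(drop: float) -> float:
    """Extra work per tile implied by a fractional drop in rendering throughput.

    Rendering (1 - drop) times as many tiles per second on the same hardware
    means each tile costs 1 / (1 - drop) times as much, i.e. 1 / (1 - drop) - 1 extra.
    """
    if not 0.0 <= drop < 1.0:
        raise ValueError("drop must be in [0, 1)")
    return 1.0 / (1.0 - drop) - 1.0


def throughput_drop(extra: float) -> float:
    """Inverse of extra_work: the throughput drop implied by `extra` more work."""
    return 1.0 - 1.0 / (1.0 + extra)


# Figures from pnorman's mailing-list post, read (as an assumption) as throughput drops.
for label, drop in [("from-disk", 0.12), ("from-RAM", 0.22)]:
    print(f"{label}: {drop:.0%} fewer tiles/s -> {extra_work(drop):.1%} more work per tile")
print(f"27% more work -> {throughput_drop(0.27):.1%} throughput drop")
```

By this reckoning a 22% throughput drop means about 28% more work per tile, while 27% more work corresponds to a drop of about 21.3%, so the gap is plausibly just rounding in the quoted figure.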
17:27:28 <apmon> It is possible that there are some differences in postgresql as well
17:28:28 <apmon> It is possible that mapnik hits postgresql with the same queries multiple times (as I think layer caching wasn't turned on)
17:28:55 <apmon> The subsequent times, the queries would come from RAM.
17:30:38 <gravitystorm> yes, as the style has been developed, the SQL queries are diverging between the two styles
17:31:41 <gravitystorm> I don't think there's anything to do (or much to discuss) until specific actions are identified as per a+b in http://lists.openstreetmap.org/pipermail/tile-serving/2013-May/000232.html
17:32:23 <gravitystorm> But the overarching point is - do we *need* to improve performance? Or is it just a nice-to-have?
17:33:20 <zere> somewhere in between the two :-)
17:34:30 <gravitystorm> :-)
17:34:41 <zere> on the one hand, no - we don't *need* to improve performance. we had a discussion last week about various ways of using multiple servers for rendering which would eliminate that need. but they're all ideas at this point - not much working code.
17:35:38 <zere> on the other hand, if performance improvements were made, it would extend the life and the capabilities of the existing servers. and possibly lead to the ability to do more resource-intensive cartography.
17:35:46 <Firefishy> orm can now be safely pulled from the tile caches. I need to do final spec and purchasing of SSD for orm.
17:36:53 <Firefishy> I have just pulled orm from tile caches.
17:37:01 <TomH> and we need to fix the style to download coastline data from somewhere that isn't tile.osm...
17:37:03 <gravitystorm> zere: OK, but I think what you're saying is that without some kind of change, then we *can't* put openstreetmap-carto onto the tileserver?
17:37:28 <zere> no, i think pnorman's benchmarking shows that the slowdown won't be too bad.
17:37:35 <gravitystorm> TomH: OK, I can wget it onto a different server :-)
17:38:23 <zere> but if there are some things (e.g: z13) which are clearly slow then it looks like that would be something worth looking into.
17:38:49 <Firefishy> We can re-point parent cache within minutes and per tile cache... so we can test load on orm per tile cache region and during specific periods.
17:38:57 <gravitystorm> zere: OK. I just want to clarify what's blocking deployment, as opposed to what osm-ewg thinks is worth working on
17:39:46 <Firefishy> So for example, I can move Australia to orm for a 24 hour testing period and then keep adding regions.
17:41:25 <zere> gravitystorm: yeah. unless i'm proved horribly wrong by spiralling queues - i think a 15%-ish slowdown is a nuisance not a disaster.
17:41:44 <Firefishy> We would need to slowly migrate the tile traffic anyway, so that orm is able to build up a base of tiles. 2 or 3 weeks at a guess.
17:44:20 <zere> worth pointing out that, from pnorman's results, z13 is 33% of the total render time. yevaud's statistics put z13+z14 at 29% of the total render time.
17:45:26 <zere> might just be an artefact of the benchmarking sample, but it would seem that a 10-20% speedup could be reclaimed from z13.
17:45:36 <apmon> So that matches reasonably well
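Reclaiming time from z13 is an Amdahl's-law style calculation: speeding up one zoom level only saves time in proportion to that level's share of the total. A small sketch using the 33% z13 share from pnorman's benchmark quoted above; the function names are illustrative only:

```python
def overall_time_saving(share: float, local_saving: float) -> float:
    """Fraction of total render time saved by cutting one zoom level's time.

    share        -- fraction of total render time spent at that zoom (e.g. 0.33)
    local_saving -- fraction by which that zoom's own render time is reduced
    """
    return share * local_saving


def required_local_saving(share: float, target_overall: float) -> float:
    """How much one zoom level's render time must be cut to save target_overall in total."""
    if target_overall > share:
        raise ValueError("cannot save more than that zoom level's whole share")
    return target_overall / share


z13_share = 0.33  # pnorman's benchmark figure
for target in (0.10, 0.20):
    need = required_local_saving(z13_share, target)
    print(f"{target:.0%} overall needs z13 render time cut by {need:.0%}")
```

With a 33% share, saving 10% of total render time would need z13's own render time cut by about 30%, and saving 20% would need a cut of roughly 61%, so it matters whether the 10-20% figure is read as a z13-local or an overall saving.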
17:46:46 <apmon> Are there any more statistics one could build into mod_tile / renderd to check things are running smoothly once in production?
17:47:20 <apmon> zere: Btw, renderd should spit out the detailed statistics per zoom level; it's only munin that bunches them up into groups of zoom levels
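Grouping zoom levels, as the munin graphs do, can hide which individual level dominates, which is one reason yevaud's z13+z14 figure and pnorman's z13-only figure are not directly comparable. A sketch of computing per-zoom and grouped shares of render time; the timings and the grouping below are hypothetical placeholders, not renderd's or munin's actual output format:

```python
from typing import Dict, List, Tuple


def zoom_shares(time_by_zoom: Dict[int, float]) -> Dict[int, float]:
    """Fraction of total render time spent at each zoom level."""
    total = sum(time_by_zoom.values())
    return {z: t / total for z, t in sorted(time_by_zoom.items())}


def grouped_shares(time_by_zoom: Dict[int, float],
                   groups: List[Tuple[int, int]]) -> Dict[str, float]:
    """Sum per-zoom shares into inclusive (low, high) zoom ranges."""
    shares = zoom_shares(time_by_zoom)
    return {
        f"z{lo}-z{hi}": sum(s for z, s in shares.items() if lo <= z <= hi)
        for lo, hi in groups
    }


# Hypothetical render-time totals in seconds per zoom level (not real data).
times = {12: 10.0, 13: 30.0, 14: 20.0, 15: 40.0}
print(zoom_shares(times))
print(grouped_shares(times, [(12, 12), (13, 14), (15, 15)]))
```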
17:48:44 <Firefishy> The SSD IOPs rate should increase from yevaud to orm: 39500 IOPs (Intel 320) -> 100000 IOPs (Samsung 840 Pro)
17:49:05 <Firefishy> 270 MB/s -> 540 MB/s
17:49:35 <zere> i wonder whether it's possible to have some tool to just look at the quantity of data that each layer's query pulls in from the database across a bunch of zoom levels?
17:50:38 <zere> presumably it's easy enough to pull the queries out of the carto mml file. is it easy to figure out which queries are used at which zoom levels?
17:51:02 <apmon> woodpeck had a tool like that a while ago
17:51:52 <apmon> https://github.com/openstreetmap/mapnik-stylesheets/blob/master/utils/stylecheck.pl
17:51:59 <apmon> Not sure if it still works
17:55:00 <apmon> Looks like it does still work on the default osm.xml
17:55:53 <apmon> http://pastebin.com/V3hGU9Mu is the result for the default osm.xml
18:00:07 <zere> quite a lot of stuff kicks in at 13, according to that...
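A rough sketch of the kind of tool zere describes, in the spirit of stylecheck.pl but not a port of it: given Mapnik XML (such as osm.xml, or the output of `carto project.mml > osm-carto.xml`), list each layer's SQL and the zoom levels at which any rule of its styles is active. It assumes the usual 256-pixel spherical-mercator scale denominators, treats a rule as active when MinScaleDenominator <= scale < MaxScaleDenominator, and ignores layer-level scale limits. osm.xml's external XML entities would need expanding first (for example with `xmllint --noent`), since ElementTree does not resolve them. The sample stylesheet is hypothetical:

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable

# Approximate scale denominator at zoom 0 for 256-pixel spherical-mercator
# tiles (0.28 mm "standard pixel"); each zoom level halves it.
Z0_SCALE = 559082264.028


def scale_at_zoom(z: int) -> float:
    return Z0_SCALE / (2 ** z)


def rule_active(rule: ET.Element, z: int) -> bool:
    """True if a <Rule>'s scale limits cover zoom z (missing limits = unbounded)."""
    scale = scale_at_zoom(z)
    lo = float(rule.findtext("MinScaleDenominator", "0"))
    hi = float(rule.findtext("MaxScaleDenominator", "inf"))
    return lo <= scale < hi


def layer_zooms(xml_text: str, zooms: Iterable[int] = range(19)) -> Dict[str, dict]:
    """Map each layer name to its SQL ('table' parameter) and active zoom levels.

    Layer-level scale limits are ignored, and a repeated layer name
    overwrites the earlier entry.
    """
    root = ET.fromstring(xml_text)
    styles = {st.get("name"): st.findall("Rule") for st in root.findall("Style")}
    zooms = list(zooms)
    result = {}
    for layer in root.findall("Layer"):
        params = {p.get("name"): (p.text or "").strip()
                  for p in layer.findall("Datasource/Parameter")}
        rules = [rule
                 for style_name in layer.findall("StyleName")
                 for rule in styles.get((style_name.text or "").strip(), [])]
        result[layer.get("name")] = {
            "sql": params.get("table"),
            "zooms": [z for z in zooms if any(rule_active(r, z) for r in rules)],
        }
    return result


# Hypothetical minimal stylesheet for illustration.
SAMPLE = """<Map>
  <Style name="roads-z13">
    <Rule>
      <MaxScaleDenominator>100000</MaxScaleDenominator>
      <LineSymbolizer/>
    </Rule>
  </Style>
  <Layer name="roads">
    <StyleName>roads-z13</StyleName>
    <Datasource>
      <Parameter name="type">postgis</Parameter>
      <Parameter name="table">(select way from planet_osm_line) as roads</Parameter>
    </Datasource>
  </Layer>
</Map>"""

for name, info in layer_zooms(SAMPLE).items():
    print(name, info["zooms"], info["sql"])
```

Counting how many layers switch on at each zoom in the resulting dictionary would give a quick view of how much kicks in at z13.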
18:01:11 <zere> i guess the next step might be to execute those queries, or at least find out how many features they return.
18:01:20 <zere> something for next week perhaps.
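As a possible next step, a sketch that turns a layer's SQL into a feature-count query for one tile, roughly mimicking how Mapnik's PostGIS datasource wraps a layer query with a bounding-box filter. Assumptions: the geometry column is osm2pgsql's `way`, the database SRID is 900913, and `ST_MakeEnvelope` is available in the installed PostGIS. The `!bbox!` and `!scale_denominator!` tokens are substituted in case the layer SQL uses them, and the tile coordinates in the example are arbitrary:

```python
# Half the width of the spherical-mercator world, in metres.
EXTENT = 20037508.342789244
# Approximate scale denominator at zoom 0 for 256-pixel tiles.
Z0_SCALE = 559082264.028


def tile_bounds(z: int, x: int, y: int):
    """(xmin, ymin, xmax, ymax) of tile z/x/y in spherical-mercator metres."""
    size = 2 * EXTENT / (2 ** z)
    xmin = -EXTENT + x * size
    ymax = EXTENT - y * size  # tile rows count downwards from the top (north)
    return xmin, ymax - size, xmin + size, ymax


def count_query(layer_sql: str, z: int, x: int, y: int,
                geometry_field: str = "way", srid: int = 900913) -> str:
    """SQL counting the features a layer query returns inside one tile.

    Assumes layer_sql is a Mapnik-style '(select ...) as alias' subquery.
    """
    xmin, ymin, xmax, ymax = tile_bounds(z, x, y)
    envelope = f"ST_MakeEnvelope({xmin!r}, {ymin!r}, {xmax!r}, {ymax!r}, {srid})"
    sql = layer_sql.replace("!bbox!", envelope)
    sql = sql.replace("!scale_denominator!", repr(Z0_SCALE / 2 ** z))
    return f"SELECT count(*) FROM {sql} WHERE {geometry_field} && {envelope}"


print(count_query("(select way, highway from planet_osm_line) as roads", 13, 4093, 2723))
```

Each resulting string could then be run with any PostgreSQL client (for example `psql -c`) against the rendering database, across a sample of tiles at each zoom level.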
18:01:27 <zere> thanks to everyone for coming!