Working Group Minutes/EWG 2013-09-30

Attendees

IRC nick      Real name
gravitystorm  Andy Allan
pnorman       Paul Norman
RichardF      Richard Fairhurst
TomH          Tom Hughes
zere          Matt Amos

Summary

  • Switch2OSM Carto / tile server setup instructions
    • pnorman's work in progress: https://gist.github.com/pnorman/6739765
    • Difference between "how-to" and "full guide":
      • ideally the how-to would get people up & running quickly,
      • with separate sections for tuning PostgreSQL, osm2pgsql, etc.,
      • which can hold the more detailed information that might otherwise obscure the beginner's route to just getting something working.
  • PostGIS dumps
    • restoring a postgres dump of the planet / an extract would get people up & running faster than importing from OSM XML/PBF, mainly because a restore does not need to "gather" nodes/ways to build relations/polygons.
    • ACTION: pnorman to talk to iandees and AWS and see if they're willing to give a machine for osm2pgsql dump generation.
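
As an illustration of the dump-and-restore route discussed above, the following is a minimal Python sketch that only builds the pg_dump and pg_restore command lines; it does not run them. The database name "gis", the dump path, the job count, and the helper names are assumptions for illustration, not values from the meeting. It relies on pg_dump's directory format, which is the only format pg_dump can write with parallel jobs.

```python
from typing import List, Sequence


def pg_dump_command(
    dbname: str,
    out_dir: str,
    jobs: int = 4,
    exclude_tables: Sequence[str] = (),
) -> List[str]:
    """Build a parallel, directory-format pg_dump command line.

    exclude_tables holds table-name patterns to leave out of the dump,
    for example slim tables that a restored rendering database does not need.
    """
    cmd = [
        "pg_dump",
        "--format=directory",  # parallel dumps require the directory format
        f"--jobs={jobs}",
        f"--file={out_dir}",
    ]
    for pattern in exclude_tables:
        cmd.append(f"--exclude-table={pattern}")
    cmd.append(dbname)
    return cmd


def pg_restore_command(dbname: str, dump_dir: str, jobs: int = 4) -> List[str]:
    """Build a parallel pg_restore command line for a directory-format dump."""
    return ["pg_restore", f"--dbname={dbname}", f"--jobs={jobs}", dump_dir]


if __name__ == "__main__":
    # Hypothetical names: "gis" and /tmp/planet_dump are placeholders.
    # planet_osm_nodes assumes osm2pgsql's default table prefix.
    print(" ".join(pg_dump_command("gis", "/tmp/planet_dump",
                                   exclude_tables=["planet_osm_nodes"])))
    print(" ".join(pg_restore_command("gis", "/tmp/planet_dump")))
```

The returned lists can be handed to subprocess.run on a machine with the PostgreSQL client tools installed. The exclude_tables argument covers the case raised in the meeting where the source database still holds slim tables; with a --slim --drop import those tables are already gone and the argument can be left empty.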

IRC Log

17:03:20 <pnorman> leaving aside the stuff inside it, is https://github.com/gravitystorm/openstreetmap-carto/blob/master/roads.mss#L1070 with [feature = 'highway_track'] {...} the right way to select it?
17:03:29 <pnorman> ah, after the meeting
17:03:54 <zere> during is fine - that's what we're here for ;-)
17:03:57 <pnorman> I actually got stuff done on one of my outstanding items, switch2osm rewrite to carto
17:03:58 <gravitystorm> zere: cool. I'm around until 20 past the hour, if there's anything that I'm needed for
17:04:16 <pnorman> https://gist.github.com/pnorman/6739765
17:05:03 <zere> that's pretty long!
17:05:05 <pnorman> pulling from postgres repo (9.3+2.1), apmon's repo, mapnik repo and chris-lea/node.js
17:05:06 <gravitystorm> pnorman: we should discuss the intricacies elsewhere, I think
17:06:20 <pnorman> I'm of two minds about the tuning section. it's really its own separate topic, on the other hand, some basic tuning is suggested because the defaults are horribly off for a planet-sized osm2pgsql DB
17:08:55 <gravitystorm> pnorman: I'm generally of the opinion that guides to osm2pgsql import should assume a first-run with a small-ish (country/state) import, and save performance tuning for a second run with a full planet.
17:09:56 <pnorman> it's horribly wrong for a country/state import too, the values are suited for a machine with <1GB of RAM
17:10:34 <gravitystorm> by "horribly wrong" how do you mean? I mean, is it making something take 5 mins instead of 3 on a typical laptop?
17:10:59 <zere> i guess the emphasis is on whether you're looking to set up a proper, hardcore tile server. or looking to set up *something* like a tile server to get started learning the intricacies of the toolset.
17:11:24 <zere> s/i guess the/i guess the question is whether the/
17:11:46 <zere> btw: minutes of the last meeting - let me know if there are any problems: http://www.osmfoundation.org/wiki/Working_Group_Minutes/EWG_2013-09-23
17:11:53 <RichardF> I've been surprised how many people wander into #osm wanting to set up a full planet. I suspect a lot of devs are being told "give me a rendering server, I don't want to pay for Google Maps" by their management, and don't have the first idea where to start.
17:11:58 <zere> #topic tile server setup instructions
17:12:05 <pnorman> about a 10-20% difference in speed between memory settings according to frederik's benchmarks
17:12:31 <RichardF> so I'd support having the tuning instructions on there, but _potentially_ put them on a separate, linked page to avoid confusing the issue for the "just trying it out" people
17:12:38 <pnorman> that'd work
17:13:30 <zere> similar to postgres itself - the instructions on how to set it up don't really go into all the tuning parameters, but instead link to the docs for them.
17:15:01 <pnorman> that helps the size out. largest part written so far is now about putting together an osm2pgsql command line
17:16:07 <zere> the defaults are probably good enough for a small extract, aren't they?
17:17:08 <pnorman> maintenance_work_mem is probably the most critical one for the loading that you'd see on a small import, but for a state/small country it should be okay
17:19:46 <zere> cool. anything else on this topic?
17:20:20 <pnorman> nope. still working through installs when doing ec2 testing
17:20:54 <zere> it doesn't look like apmon is around, so i guess that wraps up the updates section.
17:21:03 <pnorman> which brings me to another topic to put on the agenda
17:21:12 <zere> cool, what's that?
17:22:09 <pnorman> if you prepare a dump with pg_dump of a full planet database (--slim --drop, default.style) it takes about 2.5 hours to restore on a fairly modest EC2 instance
17:23:26 <pnorman> I know back in 2011 someone was providing pre-imported osm2pgsql databases somehow with ec2 and I think it's worth doing again
17:24:30 <zere> is it possible to dump one from the database apmon is running on dev?
17:25:05 <zere> i don't think that was --slim --drop, but wouldn't it be possible to just not include the slim tables in the dump?
17:25:21 <pnorman> yes, it'd work. my concern is errol can be very slow
17:25:49 <pnorman> also errol's postgres is too old, you need 9.2 for parallel dump/restore
17:26:21 <TomH> no you don't
17:26:25 <TomH> well not for restore anyway
17:26:54 <pnorman> for restore from folder dumps you need it, and those are the only kind you can dump in parallel
17:29:04 <pnorman> oh wait, errol's database has the extra attributes which you don't want
17:31:21 <TomH> I suspect amazon would be amenable to giving us resources to create and maintain such an image anyway, as it will draw people in to use their services
17:31:29 <zere> any idea why restoring a backup is so much faster than doing an import?
17:31:38 <pnorman> i'll talk to ian, some amazon people had talked to him at sotm-us
17:31:54 <pnorman> ya, all the linestrings/polygons are already computed
17:32:03 <TomH> it's the kind of thing they were suggesting at SOTM
17:32:17 <zere> indeed, he emailed me and i haven't yet emailed him back... he was suggesting some other stuff too...
17:33:57 <pnorman> even if you want to consume updates it's faster to do a pg_restore of a week-old database and update with diffs than do a fresh import
17:33:59 <zere> i find it odd that computing the linestrings / polygons takes up a long time (like 10 hours).
17:34:15 <pnorman> well for relations it has to fetch the linestrings from the database
17:35:08 <zere> ah, so not compute as in CPU calculation... the time is mostly spent in the database.
17:35:46 <pnorman> ya, mainly. also it has to do the slim tables, which you don't need if restoring
17:36:50 <pnorman> the other big advantage is that you need >16GB of RAM to do an import in a reasonable time (node cache), but don't need near that for a pg_restore
17:36:54 <zere> sure, but there are probably ways to not need those on the initial import as well
17:37:03 <pnorman> ya. non-slim mode
17:37:19 <zere> does that still work? i thought it was necessary now
17:37:37 <pnorman> it still works, and it still works for large files. You just need over 64GB of RAM for just Europe
17:38:43 <zere> just wondering if anyone had tried https://github.com/omniscale/imposm3 ?
17:39:42 <zere> they benchmarked a 6.5h import time for a full planet with generalised geometry tables on a reasonably good machine.
17:40:42 <pnorman> simon had faster, but that's pretty good
17:42:23 <pnorman> anyways, pnorman to talk to ian to get ec2/amazon contact stuff?
17:43:29 <zere> so i guess we've got 2 osm2pgsql databases (not counting nominatim), and we could do a dump from either of those. but otherwise, seems a bit excessive to get a decent machine solely for the purpose of dumping for other people to use.
17:43:39 <pnorman> do it on amazon
17:43:51 <zere> and i think gravitystorm has an opinion on the matter, but i don't want to put words in his mouth.
17:44:23 <zere> pnorman: exactly, as long as amazon are happy to give us those resources.
17:44:28 <pnorman> well yes
17:44:30 <zere> so yes
17:44:55 <pnorman> pg_dump took under an hour for what it's worth
17:45:04 <zere> #action pnorman to talk to iandees and AWS and see if they're willing to give a machine for osm2pgsql dump generation.
17:48:03 <pnorman> anyways, that's all I had.
17:51:29 <zere> cool. i wanted to also get everyone to think about what we want to do in 2014.
17:51:37 <zere> what resources we'll need, etc...
17:52:10 <zere> given that we've not used any of our budget this year, it might be worth thinking about alternatives - other things we can do to help out.
17:54:16 <zere> i'd like to see API 0.7, but that might be just me ;-)
17:57:22 <zere> ok, i guess that's it for this week.
17:57:36 <zere> hope to see you next week, and we'll discuss plans for 2014.
17:57:47 <zere> thanks for coming :-)