Working Group Minutes/EWG 2012-10-29

Attendees

IRC nick	Real name
apmon_	Kai Krueger
pnorman / pnorman__	Paul Norman
ppawel	Paweł Paprota
shaunmcdonald	Shaun McDonald
TomH	Tom Hughes
zere	Matt Amos
Summary

Matters arising:
- Nothing to report on the help.osm.org task; leave it pending for future meetings.
Clickable POIs
- Image-based methods (e.g. UTFGrid) seem more intuitive, since they resemble the clickable methods already used on other sites.
- One mitigation for the "POIs too close together" problem would be to add another zoom level to the rendering.
- Image-based methods don't help where POIs are "stacked" (e.g. several shops in one multi-storey building), and they don't expose items which aren't rendered.
- AGREED: the current clickable POIs task should be split into two: one for rendering-dependent lookup and another for discovering features that are not rendered.
- ACTION: zere update the TTTs page to include this, and other recently agreed updates.
- Vector tiles would be a good solution to this, but at the moment they suffer from many backwards-compatibility problems with older browsers, and it isn't clear whether existing implementations perform as well as image tiles.
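
To illustrate the rendering-dependent lookup discussed above, here is a minimal sketch of how a click could be resolved against a UTFGrid tile. It assumes the usual UTFGrid JSON layout ("grid", "keys", "data"); the function name, the sample tile and the feature names are made up for illustration and are not part of any agreed design. Each grid cell resolves to a single key in this sketch, which is why stacked POIs and un-rendered features would need the separate lookup task.

```python
def utfgrid_lookup(tile, x, y, tile_size=256):
    """Return the data for the feature under pixel (x, y), or None.

    `tile` is a dict with "grid" (list of equal-length strings),
    "keys" (list of feature keys) and "data" (key -> attributes).
    """
    rows = tile["grid"]
    # Each grid cell covers a square block of pixels.
    resolution = tile_size // len(rows)
    char = rows[y // resolution][x // resolution]

    # Undo the UTFGrid character encoding, which skips '"' and '\\'.
    code = ord(char)
    if code >= 93:
        code -= 1
    if code >= 35:
        code -= 1
    code -= 32

    key = tile["keys"][code]
    # By convention the empty key means "no feature rendered here".
    return tile["data"].get(key) if key else None


# Hypothetical 4x4 grid for a 256px tile (64px per cell).
SAMPLE_TILE = {
    "grid": ["  !!", "  !!", "##  ", "##  "],
    "keys": ["", "atm", "bank"],
    "data": {
        "atm": {"name": "ATM"},
        "bank": {"name": "Example Bank"},
    },
}

if __name__ == "__main__":
    print(utfgrid_lookup(SAMPLE_TILE, 200, 10))
    print(utfgrid_lookup(SAMPLE_TILE, 10, 200))
    print(utfgrid_lookup(SAMPLE_TILE, 10, 10))
```

Because the lookup is a pure function of the tile JSON and the pixel position, it can run entirely in the browser once the grid tile is fetched; anything not drawn into the grid is invisible to it.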
AoB
- Notes/bugs:
  - There is concern that people who haven't accepted the CTs could submit tainted data (not usable by OSM) to the DB through notes, and that another mapper could then use that information directly to add to the map.
  - ACTION: zere contact LWG to get a better idea of the legal implications of anonymous notes.
IRC Log

18:01:38 <zere> welcome. i'm afraid i'm a bit jet-lagged, so i might be slow on the uptake today :-)
18:01:59 <ppawel> hello
18:02:10 <zere> and apologies, but i haven't written up last week's minutes on the wiki yet. the raw logs are here: http://matt.dev.openstreetmap.org/osm-ewg/2012/osm-ewg.2012-10-22-18.03.html
18:02:59 <zere> the agenda today is: matters arising from last meeting & a continuation of the clickable discussion we were having last time, then AoB
18:03:45 <zere> but a quick note before that - i'd like to thank everyone who was involved in the hack weekend over the past couple of days, both in-person and on IRC. thank you all!
18:03:57 * pnorman checks in
18:04:12 <ppawel> yeah it was very cool
18:04:40 <zere> actions from previous minutes - the only one i'm seeing is about help.osm.org. anything to report on this, or leave it pending?
18:04:58 <ppawel> no nothing to report, I've been working exclusively on OWL
18:05:16 <zere> i may be biased, but: cool :-)
18:05:36 <zere> #info nothing to report on help.osm.org task - leave pending for future meetings.
18:05:48 <zere> that's the only action i can see, so...
18:05:58 <zere> #topic Clickable POIs
18:07:36 <zere> as i recall, last meeting we were at the point where we were trying to decide what this feature is actually *for*, so that we could use that information to make technology decisions about whether returned information should be tied to a rendering style (e.g: utfgrid) or expose all available information (e.g: vector tiles / overpass).
18:08:02 <zere> does anyone have any new thoughts they'd like to share on this?
18:08:12 <pnorman> I know the use case on Kompza's clickables has been for where there are too many POIs to render. Or at least one of them.
18:10:58 <zere> there are a few options we can consider: 1. utfgrid, 2. overpass, 3. polling to see which feature set is most desirable, and maybe even 4. do both
18:11:20 <zere> of course, doing both is clearly harder than doing either
18:12:47 <apmon_> pnorman: The easiest "fix" for "where there are too many POIs to render" is to add another zoom level to the rendering
18:13:38 <apmon_> Imho that should be considered. But I guess that is a somewhat different debate
18:13:55 <pnorman> apmon_: until you get to japan and you have pois stacked - I believe with his you could click on a building (e.g. mall) and it would give a list of shops
18:14:05 <zere> indeed. it doesn't help when you have POIs that are always too close (like pnorman just said)
18:14:30 <zere> there are plenty of ATMs outside banks which are positioned so close they might not render even on z19 or z20
18:15:29 <zere> anyone know if lists are possible in utfgrid? i.e: the info for a particular pixel / char is the collection of things that might have been rendered there if there was space?
18:15:57 <pnorman> and even if they render you're not likely to get names - it'd be nice to be able to get the name associated with a rendered icon
18:16:31 <zere> i feel like we're working our way towards wanting both...
18:17:11 <zere> should we split the task into a) click to see what's there in the rendering and b) click to see everything that's nearby?
18:17:58 <pnorman> does it make sense from a UI perspective to split them? personally I'm happy to download the area in JOSM to see so I'm not exactly an average user here
18:18:00 <apmon_> I think at least as important as too many POIs is the data that doesn't get visualised by the map at all.
18:18:09 <apmon_> E.g. the things that openlinkmap shows
18:18:29 <ppawel> I'm trying to find a place with 'too many pois' on google maps
18:18:36 <zere> indeed. we'd need an almost infinite set of tiles to possibly show everything.
18:18:53 <pnorman> Yes - and /browse/ isn't a great solution, we need a "rendering" to convert tags into human-readable text
18:19:47 <pnorman> ppawel: gmaps has wide-spread POI coverage but I don't think they have the density we have in some downtown retail areas where we have every store mapped
18:20:17 <apmon_> gmaps also has one zoom level more than osm...
18:20:33 <ppawel> something like this ... http://maps.google.com/?ll=40.715491,-74.006165&spn=0.001813,0.003919&t=m&z=19
18:20:50 <ppawel> when you zoom out they remove some pois
18:21:11 <zere> one place it happens for sure is central tokyo - it seemed like most POIs were stacked in multi-storey buildings, so it's near-impossible to render them all
18:21:17 <apmon_> ppawel: Yes, so at z18 they also don't show many of them
18:21:18 <zere> and it seems that google doesn't even try
18:21:44 <pnorman> church between reade and duane seems a good example
18:22:04 <pnorman> i suspect google has an advantage here at data to determine which POIs are more interesting by search habits, etc
18:23:15 <ppawel> I think making what is already rendered clickable should be a priority
18:23:25 <ppawel> 'browsing an area' looks like a different problem
18:23:26 <zere> i'm looking at Church St on streetview and there are clearly more shops than google is showing. although, of course, it's hard to tell whether that's because they're occluded or because google doesn't have that data
18:23:44 <zere> therefore: clearly google needs an overpass-style click method :-)
18:24:03 <shaunmcdonald> Click and get a list of things under that point to choose from
18:24:12 <zere> s/under/near/ i think...
18:24:30 <pnorman> yes - never count on pixel-perfect clicking
18:24:52 <shaunmcdonald> yeah, near is probably more accurate
18:25:54 <zere> so... it sounds like we're really talking about doing both. should we split "clickable POIs" into a rendering-based approach and a complementary overpass-style approach? can i get a +1/-1 from each of you on that, please?
18:27:05 <pnorman> I think we need to be clear as to what we mean by both - I'm not quite sure
18:27:49 <ppawel> #yes
18:28:03 <ppawel> #help
18:28:15 <ppawel> #agreed
18:28:15 <apmon_> zere: Are you proposing a clickable POI as an overlay on osm.org in addition to offering vector based data tiles?
18:28:35 <zere> i think something like the google method to click visible objects on the map. *plus* some sort of inspection tool (maybe right-click, maybe something else) to get a list of nearby objects, possibly in an infobox, maybe in the side-bar?
18:28:59 <ppawel> +1 to splitting it as zere said
18:30:02 <zere> apmon_: i think vector tiles will be cool. but my gut feeling is that they've got further to go than the other options on the table, and even then we'd still need some backwards-compatibility with browsers not capable of showing vector tiles (performantly).
18:30:32 <pnorman> I think something like the google method for getting details on a POI by clicking and something else for showing un-rendered stuff in the area. I'm not sure if that's what zere is suggesting
18:31:17 <zere> pnorman: that's what i'm suggesting, and further; that those two approaches should be separate tasks in the TTTs
18:31:46 <zere> it's currently a little unclear which, if either, of those approaches is the task in the TTTs
18:32:12 <pnorman> +1 then, and +1 to separate tasks on the TTT although there may be some common parts when it comes to the UI
18:33:12 <apmon_> +1 for both
18:33:35 <zere> ok, so far i see 3 +1s, no -1s. does anyone else want to add their voice?
18:36:06 <zere> #agreed the current clickable pois task should be split into two; one for rendering-dependent lookup and another for discovering features not rendered.
18:36:29 <zere> #action zere update the TTTs page to include this, and other recently agreed updates
18:37:08 <zere> apmon_: while we're talking about this - do you want to say anything about vector tiles, or shall we agenda that for a future meeting?
18:38:50 <shaunmcdonald> +1 for both
18:39:02 <pnorman> I have a couple of AOB matters (vandalism detection and vector layers for editing) if we have time at the end - not sure how long they'll take
18:40:00 <zere> we're at the end of the agenda items held over from previous meetings, so...
18:40:11 <zere> anyone have any other business?
18:40:14 <zere> #topic AoB
18:41:18 <apmon_> TomH: Any updates on the notes branch?
18:42:59 <apmon_> particularly, do you have thoughts on what steps are still necessary to get it ready for deployment?
18:43:41 <zere> i've not seen any updates, so i guess it's the same status as in previous meetings
18:44:39 <TomH> well I guess we need to make a decision about the anon user issue
18:45:30 <zere> wasn't there a suggestion that anon users would be able to add notes, but not update or close them? that seemed fairly sensible.
18:45:43 <pnorman> -1 to anon users for legal reasons if nothing else, and registration (particularly if facebook login gets integrated) is not a high bar. also, it might be easier to convert note adding users to mapper users if they're registered
18:47:05 <apmon_> So who makes that final decision if anon users are allowed? OWG?
18:47:36 <zere> might it be possible to build it in such a way that adding anon users in the future isn't too much work, but leave that disabled until the issues have been explored?
18:47:48 <TomH> well it already has them
18:48:04 <TomH> but they do make everything a billion times more complicated
18:48:20 <TomH> and at the moment it's all a bit crap - if we're going to keep them then it needs some work
18:48:21 <zere> personally, i think the registration is a barrier and, no matter how low it is, it's going to put some people off.
18:49:21 <apmon_> disabling anon users (in the current system) would simply require adding an authentication pre-filter to the API call.
18:49:47 <zere> for example - if i'm using some routing app and it routes me to my "destination", but i carry on to some other point then it seems that it would be good to have a feedback mechanism where people could say "my original destination (address, POI) is actually here"
18:50:19 <TomH> apmon_: yes that's the easy bit - cleaning up the whole way the name field might be an authenticated name or might not etc is the hard work
18:50:34 <TomH> and the stupid (A) suffix thing it does
18:50:37 <zere> and i think that a large number of people won't want to bother signing up to submit such feedback.
18:50:46 <TomH> and having to provide a field for people to enter a name if they're not authenticated
18:51:14 <TomH> maybe we should allow anon reports with no name rather than trying to prompt for a largely meaningless name?
18:51:32 <pnorman> Perhaps bounce the legal question off of the LWG first?
18:52:04 <zere> ok. can we be more precise about the legal concerns? what might actually be a problem?
18:53:17 <pnorman> people who haven't accepted the CTs submitting data to the DB and the bugs being directly used by another mapper to add to the map
18:53:25 <apmon_> TomH: Yes, if anon users remain permanently disallowed, then there are bunch of things that need cleaning up
18:53:26 <shaunmcdonald> zere: someone copying from a commercial map and submitting such feedback, which is then used
18:53:43 <apmon_> But a temporary ban to see how things go should be easy.
18:53:49 <TomH> apmon_: well if they are allowed then there are, because the current system is horrid
18:54:19 <pnorman> e.g. an anon user submits a bunch of "There's an Esso gas station here" bugs but they're all from a commercial database/map.
18:54:21 <TomH> what I mean is that I'm not prepared to deploy anon users in the current state, so if we want to keep support for them then I need to do a bit of work
18:55:29 <zere> is it enough to put a notice on anon notes saying "the source has not been verified, please use acceptable sources for gathering concrete information"?
18:55:35 <pnorman> It's come to my attention that my large-scale vandalism, mechanical edit and import detection tools have had some weaknesses published. Some additional cases (people "testing" our anti-vandalism measures and 4chan) that the DWG has had come up have also established that we don't have ways of detecting small-scale non-subtle vandalism (e.g. changing name=* to profanity)
18:55:47 <apmon_> TomH: Is there a better way to support both?
18:56:16 <pnorman> zere: I don't know - in the esso scenario I don't think you could distribute the notes themselves
18:56:35 <TomH> apmon_: well the main thing is I want to get rid of the (A) thing as nobody will know what it means
18:56:54 <TomH> that's why I was wondering about not allowing a name - it has no meaning after all
18:57:01 <zere> that's ok - we're covered by DMCA protections as far as distributing the notes goes. we only need to remove them once we're made aware of the infringement
18:57:07 <TomH> there is no link between two notes reported using the same name
18:57:16 <TomH> and no way to use it to contact the reporter
18:57:21 <TomH> so what purpose does it serve?
18:57:45 <apmon_> Not much I guess.
18:58:18 <zere> ok... i guess we move forward on this by checking out the legal concerns
18:58:23 <apmon_> Other than perhaps adding a more friendly/personal feel to it.
18:58:47 <zere> #action zere contact LWG to get a better idea of the legal implications of anon notes
18:59:01 <pnorman> +1 to getting rid of the name for anon notes if we keep them, -1 to keeping anon notes (knowing that registration does add a small barrier) and +1 to checking the legal concerns because if the LWG says we can't take anon notes then it renders this moot
18:59:50 <zere> any other actions/minutable items re: notes, before we move on to pnorman's vandalism detection item?
19:01:32 <zere> ok. i know it's late, but pnorman: do you have any ideas for how we could write tools for subtle vandalism detection?
19:01:45 <pnorman> basically, I can detect someone adding/modifying/deleting a lot of nodes/ways/relations in one changeset with my tools.
19:03:08 <pnorman> properly? no. my scripts are fairly hackish and work, just. I figure I'll add parsing of name=* to detect people adding profanity but not disclose the list of what words i'm checking
19:04:00 <pnorman> this *will* become an issue as we grow in popularity. the 4chan vandalism would not have been detected normally.
19:04:00 <zere> i guess this is a catch-22 question, but do you know any other kinds of subtle vandalism are going on?
19:04:30 <pnorman> I'm looking at non-subtle small-scale vandalism right now
19:04:49 <zere> one thing that will help (i hope) is when we get OWL back, people will be better able to monitor all kinds of activity in their local areas, which should help in detecting this stuff
19:05:24 <ppawel> pnorman, +1 to your popularity/scale/4chan comment
19:06:10 <ppawel> yes OWL can play a role at some point but any vandalism tools could be helpful...
19:06:13 <pnorman> we're not talking sophisticated stuff - 4chan was locker room mentality stuff and they drew attention to themselves by posting about it on 4chan
19:06:49 <ppawel> I don't know much about it but to my eyes the redaction process did not really go THAT smoothly with respect to tools
19:07:26 <ppawel> i.e. reverting stuff took a long time etc... I hope there's a lesson somewhere in there that someone is learning from...
19:07:30 <pnorman> if they had not posted about it to 4chan I don't know if it would have been picked up right away. If they hadn't picked a place with an active community to start I know it'd have been missed
19:07:37 <pnorman> our revert tools suck.
19:08:12 <ppawel> not that it was wrong or something - just that with growing scale there will be problems that (ideally) tools should be tested for beforehand, like vandalism which pnorman talks about
19:08:16 <pnorman> it takes me a week of uploading with the perl scripts to revert 1-2 hours of bad imports.
19:09:44 <ppawel> pnorman, do you have suggestions how to move forward with this? do you propose this as a TTT?
19:10:07 <pnorman> I suppose that's a second issue - we need better undo tools. I might look into a mode to run the redaction bot which skips the redaction api call and only do the revert changeset uploads
19:10:23 <zere> is that the slow part?
19:10:29 <pnorman> I'm not sure - I see two issues, one of detection, the other of undoing.
19:10:46 <pnorman> zere: well, the problem with the perl scripts is that they upload object by object.
19:11:01 <zere> hmm... that should be easy enough to fix
19:11:15 <pnorman> and the problem with the redaction bot is it does the redaction api calls which we don't want for a normal revert.
19:11:42 <zere> a quick fix would just be to comment that bit out ;-)
19:12:03 <pnorman> yes, and I should be able to do that with even my knowledge of rails :)
19:13:13 <pnorman> It's the detection which is going to be a problem - I'm sure most of the people here could sit down and code a revert tool given the time to do so and they wouldn't run into any major issues.
19:13:47 <ppawel> do you mean coming up with heuristics for detecting bad changes?
19:13:57 <ppawel> or e.g. infrastructure issue because of size of database?
19:14:11 <zere> i think the former...
19:14:19 <pnorman> yes. I know there was a gsoc project around it, but it never went anywhere, and I'm personally skeptical of the machine learning approach
19:15:03 <zere> pnorman: are you keeping a log of all these vandalism changesets that we might be able to later mine for commonalities? (e.g: age of user account, type of change, etc...)
19:15:33 <pnorman> zere: heh - that's another problem. I don't think anyone on the DWG likes OTRS (our ticket tracking system) so we don't use it like we should
19:18:07 <zere> this might not be a totally EWG thing, but do you think you could list the features that you use / would need, and maybe we can scout around for a replacement?
19:18:08 <apmon_> pnorman: It would be good to build up a database of vandalism / typical problems.
19:18:16 <pnorman__> Lost connection to my home machine, waiting for it to come back
19:18:26 <zere> this might not be a totally EWG thing, but do you think you could list the features that you use / would need, and maybe we can scout around for a replacement?
19:18:27 <pnorman__> last message was: 12:15 < pnorman> zere: heh - that's another problem. I don't think anyone on the DWG likes OTRS (our ticket tracking system) so we don't use it like we should
19:18:28 <apmon_> Both machine learning approaches and "expert system" like approaches would benefit from that
19:19:09 <zere> sure, but if you published it, then the vandals could use it to filter their edits to be less detectable
19:19:28 <apmon_> without there being clear patterns it will be difficult to do any automated detection
19:19:29 <pnorman__> zere: already ran into that with import detection
19:20:54 <zere> apmon_: yeah, i think the only long-term solution to this is the "many eyes" approach - to make it easier to monitor areas of interest, and try to recruit all users of the site to report problems
19:21:23 <pnorman__> Okay, I'll a) talk internally about what we need in a tracking system
19:21:29 <pnorman__> b) see if I can come up with a list
19:21:33 <zere> it'll probably never catch everything, but it's a sad fact of life that there are some trolls out there who relish the challenge of destruction
19:21:59 <pnorman__> zere: yes and no - I think that's what we need for some stuff, but for others (profanity in names) we need to be able to be proactive
19:22:13 <apmon_> I do think if we have a reasonably sized database of vandalism examples (you don't need to publish the detection algorithm) that some automated detection can work fairly well. Although it obviously has to be in addition to "Many eyes"
19:23:20 <pnorman__> c) look at using the rails redaction code for reverts
19:23:31 <zere> pnorman__: first order, sure. but then you get the addition of special chars to break simple string matching. then you get people using interesting parts of the UTF charset as look-alike replacements for characters, etc...
19:24:07 <pnorman__> zere: you're assuming the vandals are smart and sophisticated.
19:24:30 <zere> sadly, many are smart. sophisticated only in a technical sense, not social ;-)
19:25:15 <zere> we're well over time, so i'm going to close the meeting
19:25:23 <pnorman__> also, keep in mind if people see their edit accepted they're likely to assume it worked, even if it gets reverted 2 minutes later
19:25:36 <zere> thanks to everyone for coming, and hope to see you next week :-)
19:25:52 <zere> pnorman__: sure, that's the concept behind general dreedle, i guess...
19:26:20 <zere> would it be useful to add a ban mode that just returns 200 OK for all uploads, but in reality just throws them away?
19:26:49 <pnorman__> I'll email dev@ about ticketing systems - someone in the community must have used something applicable
19:26:59 <pnorman__> I only mentioned it to explain why we don't have any lists