-
-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
remove public_transport from features list #556
base: master
Are you sure you want to change the base?
Conversation
"_type": "_doc", | ||
"data": { | ||
"name": { | ||
"default": "Vancouver Pacific Central Station" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one would have been nice to keep since the name is fairly unique.
Fortunately that node has since been deleted and replaced with https://www.openstreetmap.org/way/32603955 which we will bring in via building~name
…me from features list
removed highway:bus_stop and railway:tram_stop, these smaller local stops generally don't have a unique name. |
Hm, I am torn on this one. On the one hand, I can absolutely see how this can in many cases improve results, especially for "plain" queries without any filtering on layers. But I also think there's a good chance this will impact some queries. Lots of Pelias users are particularly interested in transit stops, so ensuring we don't break anything is worth some thought. Here are some scenarios to think about
All that said, I 100% support the goal of cleaner query results in cases like this, so investigating solutions to the problem are worth it. |
Based on the changes in fixture diff in this PR it appears to be positive, although I can understand some hesitation. I've generated a much larger diff for the whole of Berlin (1MB) which shows all the records which would be removed as part of this feature. This was generated by running the comm -2 -3 <(jq -cM -s 'sort_by(._id) | .[]' berlin-latest.osm.pbf.jsonl) <(jq -cM -s 'sort_by(._id) | .[]' berlin-latest.osm.pbf.no_public_transport.jsonl) > berlin-latest.osm.pbf.diff.jsonl
cat berlin-latest.osm.pbf.diff.jsonl | jq -r '(._id + " " + .data.name.default)' I chose a popular transit stop Potsdamer Platz as a test case since it has a variety of public transport options, a couple streets with the same name and it's also the name of the surrounding neighbourhood: grep -i 'potsdamer' berlin.osm.public_transport.diff.jsonl | grep -i 'platz'
openstreetmap:venue:node/1874617251 S Potsdamer Platz/Voßstraße
openstreetmap:venue:node/1877576992 Varian-Fry-Straße/Potsdamer Platz
openstreetmap:venue:node/1877576997 Varian-Fry-Straße/Potsdamer Platz
openstreetmap:venue:node/1877848269 S+U Potsdamer Platz
openstreetmap:venue:node/1877848274 S+U Potsdamer Platz
openstreetmap:venue:node/1877848279 S+U Potsdamer Platz
openstreetmap:venue:node/1877848283 S+U Potsdamer Platz
openstreetmap:venue:node/2093326467 S Potsdamer Platz
openstreetmap:venue:node/2098466340 U Potsdamer Platz
openstreetmap:venue:node/2451831547 S Potsdamer Platz
openstreetmap:venue:node/2456043650 Berlin Potsdamer Platz
openstreetmap:venue:node/2456043651 Berlin Potsdamer Platz
openstreetmap:venue:node/2456043652 Berlin Potsdamer Platz
openstreetmap:venue:node/27424921 Berlin Potsdamer Platz
openstreetmap:venue:node/29495049 U Potsdamer Platz
openstreetmap:venue:node/298516304 S Potsdamer Platz/Voßstraße
openstreetmap:venue:node/30353079 S Potsdamer Platz
openstreetmap:venue:node/322796203 S Potsdamer Platz
openstreetmap:venue:relation/2364886 Berlin, S Potsdamer Platz/Voßstraße
openstreetmap:venue:relation/2368542 Berlin, Varian-Fry-Straße/Potsdamer Platz
openstreetmap:venue:relation/3198309 S Potsdamer Platz
openstreetmap:venue:relation/3200537 S Potsdamer Platz
openstreetmap:venue:relation/5686702 Berlin Potsdamer Platz, Bahnhof
openstreetmap:venue:relation/5686709 Berlin, S+U Potsdamer Platz [Bus Stresemannstraße]
openstreetmap:venue:relation/6865726 U Potsdamer Platz
openstreetmap:venue:relation/8435158 Berlin, S+U Potsdamer Platz [Bus Leipziger Straße]
openstreetmap:venue:way/237532199 Berlin Potsdamer Platz
openstreetmap:venue:way/237532200 Berlin Potsdamer Platz
openstreetmap:venue:way/382265010 S Potsdamer Platz/Voßstraße
openstreetmap:venue:way/382265011 S Potsdamer Platz/Voßstraße
openstreetmap:venue:way/382265012 S+U Potsdamer Platz
openstreetmap:venue:way/382265013 S+U Potsdamer Platz
openstreetmap:venue:way/382265014 S+U Potsdamer Platz
openstreetmap:venue:way/382265015 S+U Potsdamer Platz
openstreetmap:venue:way/382265023 Varian-Fry-Straße/Potsdamer Platz
openstreetmap:venue:way/382265024 Varian-Fry-Straße/Potsdamer Platz The most popular public transit option is "U Potsdamer Platz" (the underground) and we can see that 3x records with So a couple things worth mentioning here:
And looking at what didn't get removed: cat berlin-latest.osm.pbf.no_public_transport.jsonl | jq -r '(._id + " " + .data.name.default)' | grep -i 'potsdamer' | grep -i 'platz' | grep 'venue'
openstreetmap:venue:node/84644779 Suite Novotel Berlin City Potsdamer Platz
openstreetmap:venue:node/340388625 Dalí - Die Ausstellung am Potsdamer Platz
openstreetmap:venue:node/365726272 Mövenpick Hotel Berlin am Potsdamer Platz
openstreetmap:venue:node/394948618 U Potsdamer Platz
openstreetmap:venue:node/394948620 U Potsdamer Platz
openstreetmap:venue:node/396004773 U Potsdamer Platz
openstreetmap:venue:node/1251201121 U-Bhf. Potsdamer Platz/Linkstraße
openstreetmap:venue:node/1259545063 ibis Berlin City Potsdamer Platz
openstreetmap:venue:node/1318190070 Potsdamer Platz
openstreetmap:venue:node/1336883756 Stage Theater am Potsdamer Platz
openstreetmap:venue:node/1573193294 S+U Potsdamer Platz
openstreetmap:venue:node/1573193317 S+U Potsdamer Platz
openstreetmap:venue:node/1576240058 S+U Potsdamer Platz
openstreetmap:venue:node/1788588105 Potsdamer-Platz-Apotheke
openstreetmap:venue:node/2451831603 S Potsdamer Platz
openstreetmap:venue:node/2451831708 S Potsdamer Platz
openstreetmap:venue:node/2454155640 S+U Potsdamer Platz
openstreetmap:venue:node/2456291607 S Potsdamer Platz
openstreetmap:venue:node/2491792013 S+U Potsdamer Platz
openstreetmap:venue:node/2491824683 S Potsdamer Platz
openstreetmap:venue:node/2491846867 S Potsdamer Platz
openstreetmap:venue:node/2491848735 S Potsdamer Platz
openstreetmap:venue:node/2491871589 S+U Potsdamer Platz
openstreetmap:venue:node/2491871593 S+U Potsdamer Platz
openstreetmap:venue:node/2491872658 S Potsdamer Platz
openstreetmap:venue:node/2581112669 Hotel Potsdamer Hof Berlin Am Potsdamer Platz
openstreetmap:venue:node/2749149916 Potsdamer Platz/Ebertstraße
openstreetmap:venue:node/3493177833 Berlin Potsdamer Platz
openstreetmap:venue:node/3519366793 Motel One Potsdamer Platz
openstreetmap:venue:node/3780937605 NH Berlin Potsdamer Platz
openstreetmap:venue:node/3837620167 Alper Hotel am Potsdamer Platz
openstreetmap:venue:node/3837620172 B&B Hotel Berlin-Potsdamer Platz
openstreetmap:venue:node/3854590996 S Potsdamer Platz
openstreetmap:venue:node/3854591000 U Potsdamer Platz
openstreetmap:venue:node/5333707421 WeWork Potsdamer Platz
openstreetmap:venue:node/6143046477 Potsdamer-Platz-Apotheke
openstreetmap:venue:way/22989670 Potsdamer Platz Arkaden
openstreetmap:venue:way/81456696 Scandic Berlin Potsdamer Platz
openstreetmap:venue:way/92335625 Crowne Plaza Berlin Potsdamer Platz
openstreetmap:venue:relation/3200536 Potsdamer Platz There's still 4x "U Potsdamer Platz" and 7x with some form of "S+U Potsdamer Platz" 😆 Some entries like "Varian-Fry-Straße/Potsdamer Platz" have been completely removed, which is to be expected since this PR removes all bus/tram stops too. Nominatim does actually include one version of that bus stop and I guess it's a product decision about whether to include those or not and how much value they provide, it's most likely being included from being tagged |
When searching a Berlin street, Pelias returned a bus stop with the same name:
https://pelias.github.io/compare/#/v1/autocomplete?text=gleimstrasse
https://www.openstreetmap.org/relation/6378676
When I dug a little deeper I realised that this is actually quite a common issue, many of these
public_transport
tags are named literally the same as a nearby street or the containing neighbourhood.I checked the
nominatim
andosm2pgsql
source to see how they handle it, and... they don't, from what I can tell they don't target the tagpublic_transport
for thegazetteer
style export fromosm2pgsql
at all 🤔This PR removes
public_transport:*
from our builds, the diff of which returned a bunch of results like I mentioned above, where the name was easily confused with a nearby street, neighbourhood.It seems counterintuitive to remove
public_transport
however many of the things we were intending to target with that are actually being covered by other more descriptive tags such asrailway~station+name
.I believe this will be very beneficial for users as there will be less duplication and confusion with similarly named entities in the results.
It's possible that we end up removing some entries which some users would like, although I suspect this will be rare, if at all, and we can always add more specific tags again in the future as required.
note: as part of this work I rewrote some of the
end-to-end
test code since the diff output was basically impossible to read.