Quality of Postal Codes #358

Open
bartek5186 opened this issue Apr 24, 2019 · 14 comments

Comments

@bartek5186

bartek5186 commented Apr 24, 2019

Why doesn't Pelias use the better-quality postal codes from GeoNames?
http://www.geonames.org/export/zip/

WOF has a weak source for its postal code database:
whosonfirst-data/whosonfirst-data#1584

@missinglink
Member

Hi @bartek5186, can you confirm this is the case globally or is it only better in Poland?

@bartek5186
Author

bartek5186 commented Apr 24, 2019

I have worked on PL, CZ, DE... I haven't noticed the problem in other countries yet.

There are also postal code records with a wrong code and a wrong position, for example:

EDIT: You can obtain the lat and lng of the parent location of a specific zip code via parser/findbyid?ids=101841989&lang=pol

image

@Isabel-pena

I have found some entries in the WOF postal code database that have incomplete data, which produces inconsistent search results.
This is an example from ES:

{"id":554829649,"type":"Feature","properties":{"edtf:cessation":"uuuu","edtf:inception":"uuuu","geom:area":0,"geom:bbox":"0.0,0.0,0.0,0.0","geom:latitude":0,"geom:longitude":0,"gp:parent_id":"12602116","iso:country":"ES","mz:hierarchy_label":1,"src:geom":"geoplanet","wof:belongsto":[],"wof:breaches":[],"wof:concordances":{"gp:id":"22664266"},"wof:country":"ES","wof:geomhash":"fc4d4085e55d16b479f231dbf54d3cfb","wof:hierarchy":[],"wof:id":554829649,"wof:lastmodified":1474569770,"wof:name":"09151","wof:parent_id":-1,"wof:placetype":"postalcode","wof:repo":"whosonfirst-data-postalcode-es","wof:superseded_by":[],"wof:supersedes":[],"wof:tags":[]},"bbox":[0,0,0,0],"geometry":{"coordinates":[0,0],"type":"Point"}}

It is even more difficult when you search for a postal code that also exists in another country.
You then get the result for the other country and not the one from Spain.

@missinglink
Member

missinglink commented Mar 14, 2021

The WOF dataset contains a lot of those 0,0 postcodes. I believe the WOF team leaves them as placeholders until the correct coordinates become available.

Pelias should not import null island places, so the 0,0 records you pasted will not enter the search index. If you see results with a location of 0,0 in the index, then it's a bug.
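
For anyone who wants to pre-filter such records in their own tooling, here is a minimal sketch of a null island check. It is not the Pelias importer code; the function name `is_null_island` is made up for illustration, and the sample is a trimmed copy of the ES record pasted above.

```python
import json


def is_null_island(feature: dict) -> bool:
    """Return True when a GeoJSON Point feature sits at exactly 0,0."""
    geometry = feature.get("geometry") or {}
    coords = geometry.get("coordinates") or []
    if geometry.get("type") != "Point" or len(coords) < 2:
        return False
    lon, lat = coords[0], coords[1]
    return lon == 0 and lat == 0


# Trimmed copy of the ES postal code record from this thread.
sample = json.loads(
    '{"id":554829649,"type":"Feature",'
    '"properties":{"wof:name":"09151","wof:placetype":"postalcode"},'
    '"geometry":{"coordinates":[0,0],"type":"Point"}}'
)

features = [sample]
# Keep only the features that have a usable location.
importable = [f for f in features if not is_null_island(f)]
print(len(importable))
```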

@missinglink
Member

I had a quick look at this today and opened up whosonfirst-data/whosonfirst-data-postalcode-pl#1 to discuss with the WOF team.

@bartek5186 I pulled down http://www.geonames.org/export/zip/PL.zip to have a look and I'm not sure the data is very good quality? The coordinates appear to be duplicated and rounded to two decimal places of precision in many cases.

Could you please confirm that the data is actually correct for Poland before we continue?

@missinglink
Member

head PL.txt
PL	00-001	Warszawa	Mazowieckie		Warszawa			52.25	21	4
PL	00-002	Warszawa	Mazowieckie		Warszawa			52.25	21	4
PL	00-003	Warszawa	Mazowieckie		Warszawa			52.25	21	4
PL	00-004	Warszawa	Mazowieckie		Warszawa			52.25	21	4
PL	00-005	Warszawa	Mazowieckie		Warszawa			52.25	21	4
PL	00-006	Warszawa	Mazowieckie		Warszawa			52.25	21	4
PL	00-007	Warszawa	Mazowieckie		Warszawa			52.25	21	4
PL	00-008	Warszawa	Mazowieckie		Warszawa			52.25	21	4
PL	00-009	Warszawa	Mazowieckie		Warszawa			52.25	21	4
PL	00-010	Warszawa	Mazowieckie		Warszawa			52.25	21	4
head -n1000 PL.txt | tail
PL	01-193	Warszawa	Mazowieckie		Warszawa			52.25	21	4
PL	01-194	Warszawa	Mazowieckie		Warszawa			52.25	21	4
PL	01-195	Warszawa	Mazowieckie		Warszawa			52.25	21	4
PL	01-196	Warszawa	Mazowieckie		Warszawa			52.25	21	4
PL	01-197	Warszawa	Mazowieckie		Warszawa			52.25	21	4
PL	01-198	Warszawa	Mazowieckie		Warszawa			52.25	21	4
PL	01-199	Warszawa	Mazowieckie		Warszawa			52.25	21	4
PL	01-201	Warszawa	Mazowieckie		Warszawa			52.25	21	4
PL	01-202	Warszawa	Mazowieckie		Warszawa			52.25	21	4
PL	01-203	Warszawa	Mazowieckie		Warszawa			52.25	21	4
head -n5000 PL.txt | tail
PL	10-537	Olsztyn	Warmińsko-Mazurskie		Olsztyn				53.7833	20.4833	4
PL	10-538	Olsztyn	Warmińsko-Mazurskie		Olsztyn				53.7833	20.4833	4
PL	10-539	Olsztyn	Warmińsko-Mazurskie		Olsztyn				53.7833	20.4833	4
PL	10-540	Olsztyn	Warmińsko-Mazurskie		Olsztyn				53.7833	20.4833	4
PL	10-541	Olsztyn	Warmińsko-Mazurskie		Olsztyn				53.7833	20.4833	4
PL	10-542	Olsztyn	Warmińsko-Mazurskie		Olsztyn				53.7833	20.4833	4
PL	10-543	Olsztyn	Warmińsko-Mazurskie		Olsztyn				53.7833	20.4833	4
PL	10-544	Olsztyn	Warmińsko-Mazurskie		Olsztyn				53.7833	20.4833	4
PL	10-545	Olsztyn	Warmińsko-Mazurskie		Olsztyn				53.7833	20.4833	4
PL	10-546	Olsztyn	Warmińsko-Mazurskie		Olsztyn				53.7833	20.4833	4
head -n10000 PL.txt | tail
PL	40-094	Katowice	Śląskie		Katowice				50.2667	19.0167	4
PL	40-095	Katowice	Śląskie		Katowice				50.2667	19.0167	4
PL	40-096	Katowice	Śląskie		Katowice				50.2667	19.0167	4
PL	40-097	Katowice	Śląskie		Katowice				50.2667	19.0167	4
PL	40-098	Katowice	Śląskie		Katowice				50.2667	19.0167	4
PL	40-100	Katowice	Śląskie		Katowice				50.2667	19.0167	4
PL	40-101	Katowice	Śląskie		Katowice				50.2667	19.0167	4
PL	40-102	Katowice	Śląskie		Katowice				50.2667	19.0167	4
PL	40-103	Katowice	Śląskie		Katowice				50.2667	19.0167	4
PL	40-104	Katowice	Śląskie		Katowice				50.2667	19.0167	4
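
The duplication visible in these samples can be quantified with a short script. This is only a sketch: it assumes the file is tab-separated and that the last three fields of every row are latitude, longitude and accuracy, as in the rows above. The helper names `count_coordinates` and `decimal_places` are made up for illustration.

```python
from collections import Counter


def decimal_places(value: str) -> int:
    """Number of digits after the decimal point in a coordinate string."""
    return len(value.split(".", 1)[1]) if "." in value else 0


def count_coordinates(lines) -> Counter:
    """Count how many rows share each (lat, lon) pair."""
    pairs = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        # Assumption: the row ends with latitude, longitude, accuracy.
        lat, lon = fields[-3], fields[-2]
        pairs[(lat, lon)] += 1
    return pairs


# A few rows copied from the output above.
sample = [
    "PL\t00-001\tWarszawa\tMazowieckie\t\tWarszawa\t\t\t52.25\t21\t4",
    "PL\t00-002\tWarszawa\tMazowieckie\t\tWarszawa\t\t\t52.25\t21\t4",
    "PL\t10-537\tOlsztyn\tWarmińsko-Mazurskie\t\tOlsztyn\t\t\t\t53.7833\t20.4833\t4",
    "PL\t40-094\tKatowice\tŚląskie\t\tKatowice\t\t\t\t50.2667\t19.0167\t4",
]

pairs = count_coordinates(sample)
rows = sum(pairs.values())
print(f"{rows} rows, {len(pairs)} distinct coordinate pairs")
for (lat, lon), count in pairs.most_common():
    print(lat, lon, count, "max decimals:", max(decimal_places(lat), decimal_places(lon)))
```

To run it against the whole file, pass an open file handle for `PL.txt` (opened with `encoding="utf-8"`) to `count_coordinates` instead of `sample`.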

@Isabel-pena

You are right about the 0,0 coordinates: at first I couldn't find these postal codes in searches, but I have since updated the geometry with coordinates queried from GeoNames.

I don't really have many problems with coordinates. I am working with ES postal codes, not PL. For now I update the coordinates in the WOF postalcode-es data with the coordinates from GeoNames (I really need to be able to find these postal codes).
The worst thing about this data is that too many postal codes have no hierarchy in the GeoJSON: the field is empty, and belongsto has the same issue.
I updated this data manually, looking up the hierarchy in admin-es for the cases where the postal code has a parent_id; again, I can complete it with the help of the GeoJSON.
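
A rough sketch of that kind of patching is shown below. It is not the actual script used here: the function names are made up for illustration, the property keys are the ones from the ES record earlier in the thread, and the coordinates in `lookup` are placeholders rather than the real location of 09151.

```python
import copy


def patch_coordinates(feature: dict, lookup: dict) -> dict:
    """Return a copy of a postal code feature with 0,0 replaced from lookup.

    lookup maps a postal code string to a (lat, lon) tuple. Features that
    already have a location, or have no lookup entry, are returned unchanged.
    """
    code = feature["properties"].get("wof:name")
    lon, lat = feature["geometry"]["coordinates"][:2]
    if (lon, lat) != (0, 0) or code not in lookup:
        return feature
    new_lat, new_lon = lookup[code]
    patched = copy.deepcopy(feature)
    patched["geometry"]["coordinates"] = [new_lon, new_lat]  # GeoJSON order is lon, lat
    patched["properties"]["geom:latitude"] = new_lat
    patched["properties"]["geom:longitude"] = new_lon
    # The bbox fields would need the same treatment; omitted for brevity.
    return patched


def missing_hierarchy(feature: dict) -> bool:
    """True when wof:hierarchy or wof:belongsto is empty or absent."""
    props = feature["properties"]
    return not props.get("wof:hierarchy") or not props.get("wof:belongsto")


sample = {
    "type": "Feature",
    "properties": {"wof:name": "09151", "wof:hierarchy": [], "wof:belongsto": [],
                   "geom:latitude": 0, "geom:longitude": 0},
    "geometry": {"type": "Point", "coordinates": [0, 0]},
}
# PLACEHOLDER coordinates for illustration only, not real data.
lookup = {"09151": (42.0, -3.0)}

patched = patch_coordinates(sample, lookup)
print(patched["geometry"]["coordinates"], missing_hierarchy(patched))
```

The hierarchy check only flags records; filling the hierarchy in still needs a lookup of the parent in the admin data, as described above.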

My team and I also have problems with postal codes that don't exist in Who's On First but are registered and exist in Spain; some of them are in GeoNames. I have now built my index with the WOF Spain data updated by myself plus GeoNames. The postal codes whose hierarchy is now fixed appear in searches with the correct locality, localadmin, region and so on; with the original WOF data this doesn't happen. The bad thing is that I can't find the postal codes from GeoNames that don't exist in WOF, and we need them for our work.

Is there any way we can update it and also fix the hierarchy of the postal codes that I currently have to update manually?

@bartek5186
Author

bartek5186 commented Mar 17, 2021

> Could you please confirm that the data is actually correct for Poland before we continue?
> I'm not sure the data is very good quality?

In Poland, some of the bigger cities have multiple postal codes (based on, for example, streets, zones, offices or districts).
So this dataset is of low quality, without any detailed lat/lon positions.

In Poland, postal codes are tied to, for example, streets, so it is possible to build a high-quality database.

The Polish PNA (postal code) dataset is here:
https://www.poczta-polska.pl/hermes/uploads/2013/11/spispna.pdf
It contains no lat/lng positions, but it does contain address names, for example:
image
which is located, for example, here:
52°14'00.5"N 20°58'37.9"E
52.233480, 20.977189

Not at: 52.21, 21

That simple lat/lng looks like a high-level container for a bigger city such as "Warszawa".
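
To put a number on that gap, here is a small sketch that converts the DMS position above to decimal degrees and measures its great-circle distance from the city-level point. It assumes a spherical Earth with a 6371 km radius, and it uses 52.25, 21 as the city-level point because that is the value in the PL.txt rows quoted earlier in the thread.

```python
import math


def dms_to_decimal(degrees: float, minutes: float, seconds: float) -> float:
    """Convert degrees, minutes, seconds to decimal degrees (N/E positive)."""
    return degrees + minutes / 60 + seconds / 3600


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km on a sphere of radius 6371 km."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


# Street-level position from this comment: 52°14'00.5"N 20°58'37.9"E
street_lat = dms_to_decimal(52, 14, 0.5)
street_lon = dms_to_decimal(20, 58, 37.9)

# City-level point taken from the PL.txt rows for Warszawa.
city_lat, city_lon = 52.25, 21.0

distance_km = haversine_km(street_lat, street_lon, city_lat, city_lon)
print(round(street_lat, 6), round(street_lon, 6), round(distance_km, 2))
```

The converted values agree with the decimal pair above to about five decimal places, and the distance to the city-level point comes out at a couple of kilometres.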

@InteNs

InteNs commented Nov 11, 2021

For NL, GeoNames is also way better; the WOF data is 4 years out of date and incomplete.

GeoNames is updated daily from official government sources.
Unfortunately, it can't be imported into Pelias.

@missinglink
Member

Which is the official source that geonames uses?
You might be better off just using the csv-importer to import those files directly.

We've found the Geonames postcode files to be a mixed bag, generally not very good; NL might be an exception.

@InteNs

InteNs commented Nov 30, 2021

For the Dutch data it uses https://www.cbs.nl (Statistics Netherlands) and www.kadaster.nl (The Netherlands’ Cadastre, Land Registry and Mapping Agency), which are both officially related (fully or partially) to our government.

We successfully used the csv-importer for that dataset. Thanks for the heads-up, I didn't know there was a csv-importer :)

@bartek5186
Author

bartek5186 commented Nov 30, 2021

I also used the csv-importer for this, and it works great. The built-in postal codes are useless in this case (Europe). I imported the whole custom-prepared Europe region via the csv-importer. The postal code data was prepared from official sources and manually reviewed. I noticed a small bug in the importer: imported data is named csv:postalcode but should be named bdp:postalcode (I set the source name to "bdp" in the importer config file, but that was ignored during the CSV import and the name in the output is csv).
Because I need autocomplete to work with postal codes too, I put multiple codes into name_json.
The import file looks like this:

source,popularity,layer,id,lat,lon,name,postalcode,country,name_json
bdp,100,postalcode,71ff447b-972b-4f7d-a8c1-e0c8c02a1a19,53.468958363988,18.760770296251,Grudziądz,86-300,PL,"[""86-300"", "" 86-301"", "" 86-302"", "" 86-303"", "" 86-304"", "" 86-305"", "" 86-306"", "" 86-307"", "" 86-308"", "" 86-309"", "" 86-310"", "" 86-311""]"

Searching works great with multiple codes in name_json, and in the output I get the postal code from the postalcode column.

Output:
image
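
For reference, a row like the one above can be generated rather than typed by hand. This is a sketch using only the Python standard library, not part of the csv-importer; `build_row` is a made-up helper, the column list mirrors the header shown above, and the last column is written as `name_json` here, which is an assumption about the intended column name.

```python
import csv
import io
import json

FIELDS = ["source", "popularity", "layer", "id", "lat", "lon",
          "name", "postalcode", "country", "name_json"]


def build_row(source, record_id, lat, lon, name, codes, country):
    """Build one import row; the first code becomes the postalcode column."""
    return {
        "source": source,
        "popularity": 100,
        "layer": "postalcode",
        "id": record_id,
        "lat": lat,
        "lon": lon,
        "name": name,
        "postalcode": codes[0],
        "country": country,
        # All codes go into one JSON array so each of them is searchable.
        "name_json": json.dumps(codes),
    }


buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
writer.writeheader()
writer.writerow(build_row(
    "bdp", "71ff447b-972b-4f7d-a8c1-e0c8c02a1a19",
    "53.468958363988", "18.760770296251",
    "Grudziądz", ["86-300", "86-301", "86-302"], "PL",
))
csv_text = buffer.getvalue()
print(csv_text)
```

The `csv` module doubles the quotes inside the JSON array automatically, which gives the same `""86-300""` escaping as in the row above.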

@missinglink
Member

Hi @bartek5186 I had a quick look at the issue you reported and I wasn't able to reproduce the error where the source you provide is not the same as the source of the document.

We actually have a testcase here which ensures that functionality works as expected.

If you're able to reproduce this could you please open a ticket.

@bartek5186
Author

> Hi @bartek5186 I had a quick look at the issue you reported and I wasn't able to reproduce the error where the source you provide is not the same as the source of the document.
>
> We actually have a testcase here which ensures that functionality works as expected.
>
> If you're able to reproduce this could you please open a ticket.

I have already done that before..
pelias/csv-importer#89
