Importing large maps #43

Open
blegat opened this issue Jul 10, 2021 · 8 comments

Comments

Contributor

blegat commented Jul 10, 2021

I just started importing Belgium. belgium-latest.osm.bz2 is 750 MB, and the unpacked belgium-latest.osm takes 8.6 GB. I tried get_map_data("belgium-latest.osm"); it ran for some time until my computer exhausted its 16 GB of RAM and 6 GB of swap, and then the Julia process was killed.
I'm wondering whether it would be possible to load such a map given enough time, e.g. by storing intermediate data on disk. Is that what graphhopper does with the _gh directory?
Another solution that might help for medium-sized .osm files would be to support .pbf. Is that feature planned, or in the scope of OpenStreetMapX?

Owner

pszufe commented Jul 11, 2021

One possible way to go would be to use osmfilter and have a reduced dataset (several pieces of information can be dropped without affecting the actual map content). Still the performance bottleneck is the XML parser used with the library, and, perhaps, a different one could be tried.
Thanks for the PR - I will review it and I will be glad to help with that.

Contributor Author

blegat commented Jul 12, 2021

> One possible way to go would be to use osmfilter and have a reduced dataset (several pieces of information can be dropped without affecting the actual map content).

Does MapData contain all the information that was in the .osm file? And does it contain information that is not useful for computing shortest paths?

> Still the performance bottleneck is the XML parser used with the library, and, perhaps, a different one could be tried.

Indeed, but I would have expected the memory footprint to be lower. Since we use the callback API, it should not have to represent the full XML at any time. So what is taking all the memory?
With Andorra, I get:

julia> d = @time get_map_data("/home/blegat/Downloads/andorra-latest.osm", use_cache=false);
  1.666097 seconds (10.15 M allocations: 805.471 MiB, 19.43% gc time)

julia> d = @time get_map_data("/home/blegat/Downloads/andorra-latest.osm", use_cache=false);
  1.444814 seconds (10.15 M allocations: 805.486 MiB, 8.79% gc time)

julia> d = @time get_map_data("/home/blegat/Downloads/andorra-latest.osm", use_cache=false);
  1.638368 seconds (10.15 M allocations: 805.471 MiB, 18.30% gc time)

julia> d = @time get_map_data("/home/blegat/Downloads/andorra-latest.osm", use_cache=false);
  1.446085 seconds (10.15 M allocations: 805.477 MiB, 10.57% gc time)

julia> Base.summarysize(d)
5150378

julia> d = @time get_map_data("/home/blegat/Downloads/andorra-latest.osm");
[ Info: Read map data from cache /home/blegat/Downloads/andorra-latest.osm.cache
  0.058867 seconds (280.02 k allocations: 15.695 MiB)

andorra-latest.osm.bz2 is 3.4 MB, andorra-latest.osm is 37.3 MB, andorra-latest.osm.cache is 2.1 MB and andorra-latest.osm.pbf is 1.8 MB.
The size used by d seems to be 5.15 MB (Base.summarysize reports bytes), so for Belgium we could expect MapData to be around 1.2 GB (8.6 GB / 37.3 MB * 5.15 MB).
That should fit in my RAM. Do you know what else was using so much memory that I ran out of RAM?
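
The estimate above can be checked with a few lines of Julia. This is only a sketch: it assumes that the size of MapData scales linearly with the size of the .osm file, which the thread does not verify, and it uses decimal units (1 GB = 1000 MB). The variable names are illustrative.

```julia
# Sizes reported in this thread.
andorra_osm_mb = 37.3                # andorra-latest.osm, in MB
belgium_osm_mb = 8.6 * 1000          # belgium-latest.osm, 8.6 GB expressed in MB
andorra_mapdata_bytes = 5_150_378    # Base.summarysize(d), which reports bytes

# Convert the Andorra MapData size to MB.
andorra_mapdata_mb = andorra_mapdata_bytes / 1e6

# Assumption: MapData size grows linearly with the .osm file size.
scale = belgium_osm_mb / andorra_osm_mb
belgium_mapdata_gb = scale * andorra_mapdata_mb / 1000

println("Estimated MapData size for Belgium: ",
        round(belgium_mapdata_gb, digits=2), " GB")
```

The result is a little under 1.2 GB, well below 16 GB, so under this assumption the final MapData is not what exhausts the memory; the intermediate allocations during parsing are the more likely suspect.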

Owner

pszufe commented Jul 13, 2021

There are two versions of the map parser: routing-oriented and raw.

Routing oriented (does additional processing)

julia> sample_file = joinpath(dirname(pathof(OpenStreetMapX)),"..","test","data","reno_east3.osm");

julia> @btime get_map_data($sample_file;use_cache=false);
  93.730 ms (667494 allocations: 51.23 MiB)

Raw version (about 20% faster, with about 16% less memory allocated):

julia> @btime OpenStreetMapX.parseOSM($sample_file);
  74.493 ms (576298 allocations: 42.94 MiB)

The code for collecting elements can be found at the beginning of parseMap.jl file.
You can see that only a subset of nodes is parsed.

I actually ran the profiler:

ProfileView.@profview OpenStreetMapX.parseOSM(sample_file);

If you try running it, you can see that around 20% of the time is spent in OSMX while the rest is spent in LibExpat. So perhaps one option would be to try a faster XML parser. Looking at the number of allocations, it seems that LibExpat.jl operates on Strings (rather than the much faster Symbols) and is inefficient for large files.

Owner

pszufe commented Jul 13, 2021

One more test:

julia> dat = String(read(sample_file));

julia> @btime xp_parse($dat);
  53.914 ms (739840 allocations: 66.95 MiB)

Hence the XML parser is currently the major source of problems. At the time we started repairing https://github.com/tedsteiner/OpenStreetMap.jl, LibExpat.jl was the best option available: there were not many good stream-based XML parsers for Julia back then. Perhaps EzXML.jl could be a good new choice?
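
To illustrate what a stream-based parse with EzXML.jl could look like, here is a minimal sketch using its StreamReader. It is not OpenStreetMapX code: the function name count_osm_elements and the tiny inline sample are made up for illustration, and the sketch only counts elements instead of building MapData. The point is that the reader visits one node at a time, so the full document is never held in memory as a tree.

```julia
using EzXML

# Hypothetical helper: count elements by tag name while streaming.
function count_osm_elements(io::IO)
    counts = Dict{String,Int}()
    reader = EzXML.StreamReader(io)
    for typ in reader
        # Only opening (or self-closing) element nodes are counted;
        # end tags and whitespace nodes have other node types.
        if typ == EzXML.READER_ELEMENT
            name = nodename(reader)
            counts[name] = get(counts, name, 0) + 1
        end
    end
    close(reader)
    return counts
end

# A tiny hand-written OSM-like document, for illustration only.
sample = """
<osm version="0.6">
  <node id="1" lat="42.5" lon="1.5"/>
  <node id="2" lat="42.6" lon="1.6"/>
  <way id="10"><nd ref="1"/><nd ref="2"/></way>
</osm>
"""

counts = count_osm_elements(IOBuffer(sample))
println(counts)
```

For a real file, the same loop could be driven by open(EzXML.StreamReader, path). Whether this is actually faster or lighter than LibExpat.jl on large maps would have to be benchmarked.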

Contributor Author

blegat commented Jul 13, 2021

Yes, I think moving to EzXML might help.

Owner

pszufe commented Aug 1, 2021

Hi, thanks for *.pbf support! I have also updated the tests since they were relying on RNG and this has changed in Julia 1.6. Now all tests pass locally on the current Julia version. I can also see that Travis migrated their servers from travis-ci.org to travis-ci.com. Somehow I am not able to change the unit testing mechanism from *.org to *.com. Travis.com seems not to be aware of the OpenStreetMapX (I just do not see the project in Travis list) - I still need to sort that out.

Owner

pszufe commented Aug 1, 2021

I managed to reconfigure Travis and get everything to work, so now we have a new OpenStreetMapX release with pbf support! Should you need other functionality for your project (perhaps with some support on my side), please let me know. Thank you.

Contributor Author

blegat commented Aug 27, 2021

Thanks! I made a few changes that are mixed up in a branch of my fork: https://github.com/blegat/OpenStreetMapX.jl/tree/mixed_changes
as well as in my fork of ProtoBuf.jl:
https://github.com/blegat/ProtoBuf.jl/tree/mixed_changes
I have started making separate PRs for each repo, with each change precisely motivated, to make them easier to review and to make sure each change is an improvement and doesn't break anything for existing users.
