The Road Graph Tool is a project for processing data from various sources into a road graph usable as an input for transportation problems. Version 0.1.0 of the project aims to provide a road network with the following features:

  • geographical location of vertices and edges, and
  • geographical shape of edges

Version 0.1.0 uses the following data source:

  • OpenStreetMap (OSM) data for the road network and its geographical properties.

The processing and storage of the data are done in a PostgreSQL/PostGIS database. To manipulate the database, import data to the database, and export data from the database, the project provides a set of Python scripts.

Dependencies

To run the tool, you need access to a local or remote PostgreSQL database with the following extensions installed:

Refer to the Prerequisites section for details on installing the dependencies required for importing data into the database.

Quick Start Guide

After setting up the configuration file, your next step is to edit the main.py file to execute only the steps you need. Currently, the content of main.py includes Python wrappers for the provided SQL functions in the SQL/ directory, an example of an argument parser, and a main execution pipeline, which may be of interest to you.

To execute the configured pipeline, follow these steps:

  1. Install the Road Graph Tool Python package: pip install -e <clone dir>/Python.

  2. Configure the database in the config.ini file (see the config-EXAMPLE.ini file for an example configuration). The remote database can be accessed through an SSH tunnel. The SSH tunneling is handled at the application level.

  3. In the python/ directory, run py scripts/install_sql.py. If some of the necessary extensions are not available in your database, the execution will fail with a corresponding logging message. Additionally, this script will initialize the needed tables, procedures, functions, etc., in your database.

  4. Import data into the database and then postprocess the data in the database. There are two methods to achieve this:

    1. Execute process_osm.py u COUNTRY.osm.pbf. This triggers the import_osm_to_db() function, which requires the OSM file path as an argument.

      DATABASE and SSH CONFIGURATION: The osm2pgsql tool connects to the database using the credentials specified in the config.ini file, so make sure that the connection details are correct and that the database server is running. Some databases require a password: either enter it when prompted, or use the -P flag to have a pgpass.conf file set up in the root folder of the project (use CREDENTIALS.setup_pgpass() and CREDENTIALS.remove_pgpass() when connecting to the database).

      SSH TUNNEL: To ensure the SSH tunnel is correctly set up for a remote database, provide the SSH details in config.ini. The SSH tunnel setup is handled by set_ssh_to_db_server_and_set_port().

    2. Run the main.py script with the -i or --import flag, which also calls the import_osm_to_db() function along with other SQL queries.

  5. Your database is now ready. You can execute main.py in the python/ directory.

In the end, the execution order may look like this:

alias py=python3
# Run the following commands from the python/ directory
echo 'Pre-processing database...'
py scripts/install_sql.py
echo "Importing OSM data to database"
py scripts/process_osm.py u COUNTRY.osm.pbf
echo 'Executing main.py...'
py main.py -a 1 -s 4326 -f False

Configuration

The Road Graph Tool is configured using the YAML format. The path to the configuration file should be specified as the first argument when running the main script. All relative paths specified in the configuration file are resolved relative to the configuration file itself, unless specified otherwise. The main configuration affecting the whole tool is in the root of the configuration file. Other parameters are in the following sections:

  • db: database configuration
  • import: configuration for the import component
  • export: configuration for the export component

In the root of the project, there is an example configuration file named config-example.yml.
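The rule that relative paths are resolved against the configuration file, not the current working directory, can be sketched as follows. This is only an illustration: resolve_config_path is a hypothetical helper, not a function of the Road Graph Tool, and the paths are made up.

```python
from pathlib import PurePosixPath


def resolve_config_path(config_file: str, value: str) -> PurePosixPath:
    """Resolve a path taken from the config relative to the config file.

    Hypothetical helper for illustration only. PurePosixPath keeps the
    example platform-independent; real code would more likely use Path.
    """
    path = PurePosixPath(value)
    if path.is_absolute():
        # Absolute paths are used as they are.
        return path
    # Relative paths are anchored at the directory that contains the
    # configuration file, not at the current working directory.
    return PurePosixPath(config_file).parent / path


print(resolve_config_path("/home/user/rgt/config.yml", "data/area.geojson"))
print(resolve_config_path("/home/user/rgt/config.yml", "/srv/osm/area.geojson"))
```

With this convention, a configuration file can be moved together with its data directory without editing any of the paths inside it.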

Testing

For testing the PostgreSQL procedures that are the core of the Road Graph Tool, we use the pgTAP testing framework. To learn how to use pgTAP, see the pgTAP manual.

To run the tests, follow these steps:

  1. Install the pgTAP extension for your PostgreSQL database cluster according to the pgTAP manual.
  2. If you haven't already, create and initialize the database:
    1. create a new database using CREATE DATABASE <database_name>;
    2. copy the config-EXAMPLE.ini file to config.ini and fill in the necessary information
    3. initialize the new database using the script <rgt root>/python/scripts/install_db.py.
      • this script will install all necessary extensions and create all necessary tables, procedures, and functions.
      • the configuration for the database is loaded from the config.ini file.
  3. Execute the tests by running the following query in your PostgreSQL console:
    SELECT * FROM run_all_tests();
    • This query will return a result set containing the execution status of each test.

Components

The Road Graph Tool consists of a set of components that are responsible for individual processing steps, importing data, or exporting data. Each component is implemented as a PostgreSQL procedure or Python script, possibly calling other procedures or functions. Additionally, each component has its own Python wrapper script that connects to the database and calls the procedure. Currently, the following components are implemented:

  • OSM file processing and importing: processes the data in an OSM file and imports it into the PostgreSQL database for further use.
  • Graph Contraction: simplifies the road graph by contracting nodes and creating edges between the contracted nodes.

OSM file processing and importing

This component processes the data in an OpenStreetMap (OSM) XML file format and imports it into a PostgreSQL database.

Prerequisites

Before processing OSM data (which can be downloaded from Geofabrik) and loading it into the database, install the following tools:

  • psql for PostgreSQL
  • osmium: osmium-tool (macOS: brew install osmium-tool, Ubuntu: apt install osmium-tool) for preprocessing of OSM files
  • osm2pgsql (macOS: brew install osm2pgsql, Ubuntu: apt install osm2pgsql for version 1.6.0) for importing. The current version of RGT is compatible with osm2pgsql versions 1.11.0 and 2.0.0.

The PostgreSQL database needs the PostGIS extension, which enables the spatial and geographic capabilities that are essential for working with OSM data. Loading large OSM files into the database is memory-demanding, so the documentation suggests having at least as much RAM as the size of the OSM file.

1. Preprocessing of OSM file

Preprocessing an OSM file with osmium aims to make the import with the osm2pgsql tool faster and more efficient. The two most common actions are sorting and renumbering. For these options, you can use the provided process_osm.py Python script:

python3 process_osm.py [option_flag] [input_file] -o [output_file]

Call python3 process_osm.py -h or python3 process_osm.py --help for more information.

  • Sorting: Sorts objects based on IDs in ascending order.
python3 process_osm.py s [input_file] -o [output_file]
  • Renumbering: Negative IDs usually represent unofficial non-OSM data (so they do not clash with OSM data), but osm2pgsql can only handle positive sorted IDs (negative IDs are used internally for geometries). Renumbering starts at index 1 and proceeds in ascending order.
python3 process_osm.py r [input_file] -o [output_file]
  • Sorting and renumbering: Sorts and renumbers IDs in ascending order starting from index 1.
python3 process_osm.py sr [input_file] -o [output_file]
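For orientation, the three modes above correspond to the sort and renumber commands of osmium-tool. The sketch below only builds the command lines and does not run them. The function osmium_preprocess_commands and the intermediate file name are assumptions made for this example; the actual implementation in process_osm.py may differ.

```python
import shlex
from typing import List


def osmium_preprocess_commands(mode: str, input_file: str, output_file: str) -> List[List[str]]:
    """Build osmium command lines for mode 's', 'r', or 'sr'.

    Hypothetical helper: it mirrors the flags described above, but it is
    not taken from process_osm.py.
    """
    if mode not in {"s", "r", "sr"}:
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "s":
        return [["osmium", "sort", input_file, "-o", output_file]]
    if mode == "r":
        return [["osmium", "renumber", input_file, "-o", output_file]]
    # 'sr': sort into an intermediate file first, then renumber it.
    intermediate = output_file + ".sorted.osm.pbf"  # assumed naming
    return [
        ["osmium", "sort", input_file, "-o", intermediate],
        ["osmium", "renumber", intermediate, "-o", output_file],
    ]


for command in osmium_preprocess_commands("sr", "lithuania-latest.osm.pbf", "prepared.osm.pbf"):
    print(shlex.join(command))
```

Each returned command could be executed with subprocess.run(command, check=True) once osmium-tool is installed.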

2. Importing to database using Flex output

The primary function of the process_osm.py script is to import OSM data into the database using the osm2pgsql tool configured by Flex output. Flex output allows more flexible configuration, such as filtering logic and creating additional types (e.g. areas, boundaries, multipolygons) and tables for various POIs (e.g. restaurants, theme parks), to get the desired output. To use it, we define a Flex style file (a Lua script) that contains all the logic for processing the data in the OSM file.

The u flag triggers the import_osm_to_db() function, which requires the OSM file path as an argument.

  • Imports the data into the database (the default schema is public, but a different schema can be specified) with the provided Lua style file. If the style file is omitted, the default style file pipeline.lua is used. To customize the style file, set a new path for the DEFAULT_STYLE_FILE.
  • Postprocesses the data in the database if specified in POSTPROCESS_DICT, which can be configured based on the style file used during importing
python3 process_osm.py u [input_file] [-l style_file]

WARNING: Running this command will overwrite existing data in the relevant tables (these tables are specified in schema.py). If you wish to proceed, use the --force flag to overwrite, or create a new schema for the new data.

For example, the commands below process the OSM file of Lithuania using Flex output and upload it into the database (all configuration should be provided in config.ini in the root folder of the project).

# runs with pipeline.lua
python3 process_osm.py u lithuania-latest.osm.pbf
# runs with simple.lua script
python3 process_osm.py u lithuania-latest.osm.pbf -l resources/lua_styles/simple.lua

[Figure: nodes in Lithuania, displayed in QGIS]

3. Filtering and extraction

OSM data are often huge, and frequently we only need certain extracts or objects of interest in our database. It is therefore better practice to extract only what we need and work with that in the database.

3.1 Geographical extracts

3.1.1 Box boundary extracts

Both osmium and osm2pgsql filter data inside a bounding box given in the following format: bottom-left corner (minlon,minlat), top-right corner (maxlon,maxlat).
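As a small illustration of this format, the sketch below parses a bounding box string and checks whether a point lies inside it. BBox and parse_bbox are hypothetical names introduced for this example and are not part of the tool.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """Bounding box in the order minlon, minlat, maxlon, maxlat."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def contains(self, lon: float, lat: float) -> bool:
        """Return True if the point lies inside the box or on its border."""
        return (self.min_lon <= lon <= self.max_lon
                and self.min_lat <= lat <= self.max_lat)


def parse_bbox(text: str) -> BBox:
    """Parse 'minlon,minlat,maxlon,maxlat' and validate the corner order."""
    parts = [float(part) for part in text.split(",")]
    if len(parts) != 4:
        raise ValueError("expected four comma-separated numbers")
    box = BBox(*parts)
    if box.min_lon >= box.max_lon or box.min_lat >= box.max_lat:
        raise ValueError("bottom-left corner must come before top-right corner")
    return box


box = parse_bbox("25.12,54.57,25.43,54.75")
print(box.contains(25.28, 54.69))  # a point inside the example box
print(box.contains(24.00, 54.69))  # a point west of the example box
```

Validating the corner order early catches the common mistake of swapping latitude and longitude or the two corners.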

[Figure: nodes inside the bounding box in Lithuania, displayed in QGIS]

Osmium
  • These commands process the OSM file using bounding box coordinates to filter data within the bounding box. The file resources/extracted-bbox.osm.pbf is created and can be further processed with Flex output.
# bounding box specified directly
python3 filter_osm.py b [input_file] -c [left],[bottom],[right],[top]
# bounding box specified in config file:
python3 filter_osm.py b [input_file] -c [config_file]
  • E.g. extract bounding box of Lithuania OSM file:
python3 filter_osm.py b lithuania-latest.osm.pbf -c 25.12,54.57,25.43,54.75
# or:
python3 filter_osm.py b lithuania-latest.osm.pbf -c resources/extract-bbox.geojson
Flex output
  • We can calculate the greatest bounding box coordinates using python3 process_osm.py b, based on the ID of the relation (mentioned in 3.1.2) that specifies the area of interest (e.g. Vilnius, the capital of Lithuania). This command processes the OSM file using the calculated bounding box coordinates with Flex output and imports the bounded data into the database.
# find bbox (uses Python script find_bbox.py)
python3 process_osm.py b [input_file] -id [relation_id] -s [style_file]
  • E.g. this command extracts the greatest bounding box for the given relation ID from the Lithuania OSM file and uploads it to the PostgreSQL database using osm2pgsql:
python3 process_osm.py b lithuania-latest.osm.pbf -id 1529146
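One plausible reading of the "greatest bounding box" is the smallest axis-aligned box that encloses every coordinate belonging to the relation. The sketch below computes such a box from a list of (lon, lat) pairs. The function name and the sample coordinates are illustrative only, and the logic in find_bbox.py may differ.

```python
from typing import Iterable, Tuple


def enclosing_bbox(points: Iterable[Tuple[float, float]]) -> Tuple[float, float, float, float]:
    """Return (minlon, minlat, maxlon, maxlat) enclosing all (lon, lat) points."""
    points = list(points)
    if not points:
        raise ValueError("at least one point is required")
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    return (min(lons), min(lats), max(lons), max(lats))


# Made-up boundary coordinates, for illustration only.
boundary = [(25.20, 54.60), (25.40, 54.65), (25.30, 54.80), (25.10, 54.70)]
bbox = enclosing_bbox(boundary)
print(",".join(str(value) for value in bbox))
```

The printed string follows the left,bottom,right,top order used by the -c option shown earlier.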

3.1.2 Multipolygon ID extracts

For more precise extraction, we define a multipolygon. Its specification is based on a relation ID: https://www.openstreetmap.org/api/0.6/relation/RELATION-ID/full.

It's better to extract only what we need with osmium (before processing with Flex output), as suggested.

[Figure: ways inside the multipolygon of Vilnius, displayed in QGIS]

Osmium
  • The ID can be found by specific filtering using resources/expressions-example.txt or on OpenStreetMap - more on how to filter

    • use name:en for easiest filtering

    NOTE: The admin_level=* expression represents the administrative level of a feature (borders of territorial political entities). Each country (even county) can have different numbering.

  • e.g. to find the relation ID that bounds the city of Vilnius (ID: 1529146), run a double tag filtration:

# expressions-example.txt should contain: r/type=boundary
python3 filter_osm.py f lithuania-latest.osm.pbf -e expressions-example.txt
# expressions-example.txt should contain: r/name:en=Vilnius
python3 filter_osm.py f lithuania-latest.osm.pbf -e expressions-example.txt
  • get multipolygon extract that can be further processed with Flex output:
python3 filter_osm.py id [input_file] -rid [relation_id] [-s strategy] 
# E.g. extract multipolygon based on relation ID of Vilnius city:
python3 filter_osm.py id lithuania-latest.osm.pbf -rid 1529146 # creates: id_extract.osm
python3 process_osm.py u id_extract.osm
  • Strategies (optional for the id and b flags in filter_osm.py) determine how the region is extracted. Use [-s strategy] to set the strategy:
    • simple: faster, but ways that cross the multipolygon boundary are not complete
    • complete ways: ways are reference-complete
    • smart: ways and multipolygon relations (by default) are reference-complete
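The three strategies match the ones offered by the osmium extract command, where they are spelled simple, complete_ways, and smart. As a hedged sketch, the function below builds an osmium extract command line for a bounding box and validates the strategy name. osmium_extract_command is a hypothetical helper, and it is an assumption that filter_osm.py passes the strategy through to osmium in this way.

```python
from typing import List, Tuple

# Strategy names as accepted by `osmium extract --strategy`.
OSMIUM_STRATEGIES = ("simple", "complete_ways", "smart")


def osmium_extract_command(
    input_file: str,
    output_file: str,
    bbox: Tuple[float, float, float, float],
    strategy: str = "complete_ways",
) -> List[str]:
    """Build (but do not run) an osmium extract command for a bounding box."""
    if strategy not in OSMIUM_STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")
    left, bottom, right, top = bbox
    return [
        "osmium", "extract",
        "-b", f"{left},{bottom},{right},{top}",  # LEFT,BOTTOM,RIGHT,TOP
        "-s", strategy,
        input_file,
        "-o", output_file,
    ]


print(osmium_extract_command(
    "lithuania-latest.osm.pbf",
    "extracted-bbox.osm.pbf",
    (25.12, 54.57, 25.43, 54.75),
    strategy="smart",
))
```

The simple strategy trades completeness for speed, so it is mainly useful when only the nodes inside the region matter.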

3.2 Filter tags

Filter specific objects based on tags.

[Figure: ways with the highway tag in Lithuania, displayed in QGIS]

3.2.1 Osmium

  • use resources/expressions-example.txt to specify the tags to be filtered, in the format [object_type]/[expression], where:
    • object_type: n (nodes), w (ways), r (relations) - can be combined
    • expression: what it should match against
    • more details
python3 filter_osm.py t [input_file] -e [expression_file] [-R]
  • Optional -R flag: nodes referenced in ways and members referenced in relations will not be added to output if -R flag is used
  • e.g. to keep only highway objects, use:
# expression file contains: nwr/highway
python3 filter_osm.py t [input_file] -e [expression_file]
  • use filter_osm.py h to filter objects with highway tags (including referenced and untagged objects)
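To make the [object_type]/[expression] format concrete, the following sketch splits one line of an expression file into the object types it applies to and a key with an optional value. It is a deliberately small illustration that handles only the key and key=value forms used in this README. The function parse_filter_expression is hypothetical, and osmium itself accepts a richer syntax.

```python
from typing import Optional, Set, Tuple

OBJECT_TYPES = {"n": "node", "w": "way", "r": "relation"}


def parse_filter_expression(line: str) -> Tuple[Set[str], str, Optional[str]]:
    """Parse '[object_type]/[expression]' into (types, key, value).

    Only 'key' and 'key=value' expressions are handled in this sketch.
    """
    prefix, separator, expression = line.strip().partition("/")
    if not separator or not prefix or not expression:
        raise ValueError(f"expected [object_type]/[expression], got {line!r}")
    unknown = set(prefix) - set(OBJECT_TYPES)
    if unknown:
        raise ValueError(f"unknown object type(s): {sorted(unknown)}")
    types = {OBJECT_TYPES[letter] for letter in prefix}
    if "=" in expression:
        key, value = expression.split("=", 1)
        return types, key, value
    return types, expression, None


print(parse_filter_expression("nwr/highway"))
print(parse_filter_expression("r/name:en=Vilnius"))
```

The letters n, w, and r can be combined in any order, which is why the types are returned as a set.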

3.2.2 Flex output

  • Use Lua style files to keep only the objects that have the desired tag.
    • e.g. to keep only highway objects, use resources/lua_styles/filter-highway.lua, which filters nodes, ways and relations with the highway tag
python3 process_osm.py u lithuania-latest.osm.pbf -s resources/lua_styles/filter-highway.lua

NOTE: Unfortunately, untagged nodes referenced in ways and members referenced in relations can't be included, because osm2pgsql processes objects in a certain order. Use filter_osm.py to include referenced objects as well.

Logging

Both filter_osm.py and process_osm.py output some basic logging information. Use -v/--verbose for more detailed debugging output.

Graph Contraction

This script contracts the road graph within a specified area.

  • function: contract_graph_in_area
  • SQL procedure: contract_graph_in_area
  • location: python/main.py
  • required tables:
    • nodes
    • edges
    • road_segments

Processing details

The SQL procedure contract_graph_in_area processes the graph in the following steps, visualized in the diagram below:

  1. Road Segments Table Creation: Generates a temporary table containing road segments within a target area. A road segment is a line between two subsequent nodes from the OSM data.
  2. Graph Contraction: Contracts the graph by creating a temporary table that holds the contraction information for each node.
  3. Node Updates: Updates the nodes in the database to mark some of them as contracted.
  4. Edge Creation: Generates edges for both contracted and non-contracted road segments.
  5. Contraction Segments Generation: Creates contraction segments to facilitate the creation of edges for contracted road segments.

[Figure: diagram of the contract_graph_in_area procedure]
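The general idea of contracting a graph built from road segments can be illustrated in a few lines of Python. The sketch below is not the logic of the contract_graph_in_area SQL procedure. It assumes a common simplification in which the graph is undirected, every node with exactly two neighbours is contracted, and each chain of contracted nodes becomes one edge between the two remaining nodes. All names are hypothetical. Cycles consisting only of contracted nodes, duplicate segments, and self-loops are not handled.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def contract_degree_two_nodes(
    segments: Iterable[Tuple[int, int]],
) -> Tuple[Set[int], List[List[int]]]:
    """Contract chains of degree-2 nodes in an undirected graph.

    Returns the kept (non-contracted) nodes and the edges, where each edge
    is the list of node ids along the path between two kept nodes.
    """
    neighbors: Dict[int, Set[int]] = defaultdict(set)
    for a, b in segments:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Nodes with exactly two neighbours only continue a road: contract them.
    kept = {node for node, adjacent in neighbors.items() if len(adjacent) != 2}

    edges: List[List[int]] = []
    used = set()  # segments already assigned to an edge
    for start in sorted(kept):
        for first in sorted(neighbors[start]):
            if frozenset((start, first)) in used:
                continue
            path = [start, first]
            used.add(frozenset((start, first)))
            # Follow the chain until the next kept node is reached.
            while path[-1] not in kept:
                previous, current = path[-2], path[-1]
                (following,) = neighbors[current] - {previous}
                used.add(frozenset((current, following)))
                path.append(following)
            edges.append(path)
    return kept, edges


segments = [(1, 2), (2, 3), (3, 4), (4, 5), (4, 6)]
kept, edges = contract_degree_two_nodes(segments)
print(sorted(kept))  # [1, 4, 5, 6]
print(edges)         # [[1, 2, 3, 4], [4, 5], [4, 6]]
```

In the example, nodes 2 and 3 are contracted, so the three segments between nodes 1 and 4 collapse into a single edge that still remembers its intermediate nodes.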

Exporter

The exporter component is responsible for exporting the processed data from the database. Currently, the following formats are supported:

  • CSV: exports the data to two CSV files: one for nodes and one for edges. The columns are separated by a tab character.
  • Shapefile: exports the data to two shapefiles: one for nodes and one for edges.

The output files contain the following fields:

The nodes file contains:

  • id: the unique identifier of the node. The id goes from 0 to the number of exported nodes - 1, so it can be used as an index.
  • db_id: the unique identifier of the node in the database.
  • x: the x-coordinate of the node.
  • y: the y-coordinate of the node.

The edges file contains:

  • u: the id of the starting node of the edge.
  • v: the id of the ending node of the edge.
  • db_id_from: the unique identifier of the starting node in the database.
  • db_id_to: the unique identifier of the ending node in the database.
  • length: the length of the edge in meters.
  • speed: the speed on the edge in km/h.
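The relationship between the index-like id and the database db_id can be shown with a short sketch that writes both files as tab-separated text. The function export_csv, the header rows, and the sample values are assumptions made for this example and do not describe the exporter's actual implementation.

```python
import csv
import io
from typing import List, Tuple

Node = Tuple[int, float, float]       # (db_id, x, y)
Edge = Tuple[int, int, float, float]  # (db_id_from, db_id_to, length, speed)


def export_csv(nodes: List[Node], edges: List[Edge]) -> Tuple[str, str]:
    """Return the nodes and edges files as tab-separated text."""
    # Exported ids run from 0 to len(nodes) - 1, so they double as indices.
    index_of = {db_id: index for index, (db_id, _, _) in enumerate(nodes)}

    nodes_out = io.StringIO()
    writer = csv.writer(nodes_out, delimiter="\t", lineterminator="\n")
    writer.writerow(["id", "db_id", "x", "y"])
    for db_id, x, y in nodes:
        writer.writerow([index_of[db_id], db_id, x, y])

    edges_out = io.StringIO()
    writer = csv.writer(edges_out, delimiter="\t", lineterminator="\n")
    writer.writerow(["u", "v", "db_id_from", "db_id_to", "length", "speed"])
    for db_from, db_to, length, speed in edges:
        # u and v reference the exported ids, not the database ids.
        writer.writerow([index_of[db_from], index_of[db_to], db_from, db_to, length, speed])

    return nodes_out.getvalue(), edges_out.getvalue()


# Made-up sample data, for illustration only.
nodes_text, edges_text = export_csv(
    nodes=[(1001, 25.1, 54.6), (1005, 25.2, 54.7)],
    edges=[(1005, 1001, 120.5, 50)],
)
print(nodes_text)
print(edges_text)
```

Because u and v are zero-based positions in the nodes file, a consumer can load the nodes into an array and look up edge endpoints directly.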