In this file, all the file formats used by this project are described.
This part described all the file formats that can be used as input for the preprocessor
(the mapping file can also be used as input for the library
).
One of the three input formats that can be used to describe the input graph for the preprocessing. This format can also be used when benchmarking Dijkstra's Algorithm performance.
The format is a plain text file that looks as follows:
- It begins with a line
XGI n e
whereXGI
is a fixed string which serves as a magic constant.n
is then a positive integer denoting the number of nodes, whilee
is a positive integer denoting the number of edges. - After that follows exactly
e
lines in the formats t w f
each representing one edge. In this case,s
is the source node of the edge,t
is the target node of the edge, both must be in the range from 0 ton
-1 (nodes are indexed from 0).w
is the weight of the edge, in our case a positive integer.f
is a flag denoting whether the edge is a one way edge. It must be either 1 or 0. Iff
is one, the edge is a one way edge and only an edge froms
tot
is added to the graph. Iff
is 0, we treat the edge as a bidirectional edge and therefore two edges are added into the graph, an edge froms
tot
and an edge fromt
tos
, both with the weightw
.
The expected suffix for XenGraph files is .xeng
although it is not enforced.
One of the three input formats that can be used to describe the input graph for the preprocessing.
This input format was used during the 9th DIMACS Implementation Challenge on shortest paths. Example graphs in this format can therefore be downloaded from the website of the challenge: http://users.diag.uniroma1.it/challenge9/download.shtml . The graph file is a plain text file and it looks as follows:
- Everywhere in the file lines beginning with the character
c
can occur. Those lines are comment lines and are skipped during loading. - First line of the file not starting with the character
c
must be in the formatp sp n e
. Herep sp
is a fixed string which serves as a magic constant.n
is a positive integer denoting the number of nodes,e
is another positive integer denoting the number of edges. - Then there must be exactly
e
lines of the formata s t w
each representing one edge. In this case,a
is a fixed character constant indicating that the line describes one edge (so it is not a comment line).s
is the source node of the edge andt
is the target node of the edge. In this format, nodes are indexed from 1, so boths
andt
must be in the range from 1 ton
(Internally, nodes are indexed from 0, so during loading, each node ID is automatically decreased by one). In this format, each edge is considered to be a one way edge, so if we want a bidirectional edge betweens
tot
, we must provide two lines, one for an edge froms
tot
and one for an edge fromt
tos
, both with the same weight.
The expected suffix for DIMACS graph files is .gr
although it is not enforced.
One of the three input formats that can be used to describe the input graph for the preprocessing.
A CSV file represents an adjacency matrix of a graph.
- The file is expected to not have a header.
- The file must have its amount of rows equal to its amount of columns.
- Each value on row
i
and columnj
represents the weight of a directed edge from nodei
to nodej
. - Weights are expected to be valid positive integers, or
nan
for where there is no directed edge joining the nodes.
A pair of files named nodes.csv
and edges.csv
located in the same directory.
- Path to the directory containing these files must be specified in the
input_path
argument. nodes.csv
represents a list of alln
nodes identified by numbers from 0 ton
-1 (id
column) and optionally (for A* benchmarking) columnsx
(longitude) andy
(latitude).edges.csv
is a list of edges. The required columns areu
(source edge identifier),v
(target edge identifier) andlength
(edge weight, non-negative integer or floating point number).- In this format, each edge is considered to be a directional edge, so if we want a bidirectional edge between
u
andv
, we must provide two lines, one for an edge fromu
tov
and one for an edge fromv
tou
, both with the same weight. - Values must be separated by a Tab character (
\t
).
For easier benchmarking, the user can provide a set of queries which will be used for the benchmark in a very simple plain text format:
- The file begins with a line that contains only a single integer
cnt
denoting the number of queries. - The first line is followed by exactly
cnt
lines in the formats g
each denoting one query.s
andg
are both positive integers. When benchmarking without mapping,s
andg
must be in the range from 0 ton
-1 wheren
is the number of nodes in the graph. When benchmarking with mapping,s
andg
can be arbitrary positive integers as long as they fit in the long long unsigned int datatype (usually 64 bit).
If your application internally for some reason uses some node IDs that are not in the range from 0 to n
-1 (where n
is the number of nodes in the graph), you can use a mapping file that will allow the application to transform node IDs from your application to IDs in the range from 0 to n
-1. Using those mapping files, you can let the library do all the mapping work, so will not need to change the IDs in your application. The mapping file is a plain text file in the following format:
- The file begins with a line
XID n
whereXID
is a fixed string which serves as a magic constant andn
is the amount of nodes in the graph. - The first line is followed by exactly
n
lines each only containing one integeri
. Thej
-th line represents the original ID of the(j-2)
-th node. So the second line contains the original ID of the node with the ID 0 in our application. The third line contains the original ID of the node with the ID 1, and so on up to the linen+1
contains the original ID for the node with the IDn-1
in our application.
The expected suffix for the mapping files is .xeni
although it is not enforced.
All the data needed for the Contraction Hierarchies shortest distance query algorithm is stored in a binary file that is organised as follows:
- "CH\r\n" (0x43 0x48 0x0d 0x0a)
- unsigned int: version (currently "1", shold be == compared)
- unsigned int: number of nodes (= n)
- unsigned int: number of original edges (= m1)
- unsigned int: number of shortcut edges (= m2)
- n times, for each node 0..(n-1):
- unsigned int: rank
- m1 times, original edges:
- unsigned int: source node
- unsigned int: target node
- unsigned int: weight
- unsigned int: flags
- m2 times, shortcut edges:
- unsigned int: source node
- unsigned int: target node
- unsigned int: weight
- unsigned int: flags
- unsigned int: shortcut middle node
- unsigned int: 0x12345678 as terminator
- possible (bit) flags are:
- 1 = forward edge
- 2 = backward edge
- 4 = shortcut edge
- the actual flag is obtained as sum of those, so for example a forward shortcut edge will have the flag set to 5 (1+4)
Contraction Hierarchies data structure files are automatically generated with the .ch
suffix. This suffix is not enforced when loading the data structure.
All the data needed for the Transit Node Routing shortest distance query algorithm is stored in a binary file that is organised as follows:
- "TNRG" (0x54 0x4e 0x52 0x47)
- unsigned int: number of nodes (= n)
- unsigned int: number of edges (= e, both original edges and shortcut edges are contained)
- unsigned int: number of transit nodes (= t)
- e times, for each edge 0..(e-1):
- unsigned int: source node
- unsigned int: target node
- unsigned int: weight
- bool: forward flag (determines whether the edge can be used in the forward direction)
- bool: backward flag (determines whether the edge can be used in the backward direction)
- n times, for each node 0..(n-1):
- unsigned int: rank
- t times:
- unsigned int: a transit node ID (indices of all the nodes in the transit node set)
- t*t times, transit node-set distance table values:
- unsigned int: weight (shortest distance from the i-th transit node to the j-th transit node)
- n times: access nodes
- unsigned int: the number of forward access nodes for node i (k)
- k times: forward access nodes for node i
- unsigned int: forward access node ID
- unsigned int: forward access node weight
- unsigned int: the number of backward access nodes for node i (l)
- l times: backward access nodes for node i
- unsigned int: backward access node ID
- unsigned int: backward access node weight
- n times: search spaces
- unsigned int: forward search space size (k)
- k times: forward search space for node i
- unsigned int: a node ID of a node in the forward search space of node i
- unsigned int: backward search space size (l)
- l times: backward search space for node i
- unsigned int: a node ID of a node in the backward search space of node i
Transit Node Routing data structure files are automatically generated with the .tnrg
suffix. This suffix is not enforced when loading the data structure.
All the data needed for the Transit Node Routing with Arc Flags shortest distance query algorithm is stored in a binary file that is organised as follows:
- "TGAF" (0x54 0x47 0x41 0x46)
- unsigned int: number of nodes (= n)
- unsigned int: number of edges (= e, both original edges and shortcut edges are contained)
- unsigned int: number of transit nodes (= t)
- unsigned int: number of regions for Arc Flags (= k)
- e times, for each edge 0..(e-1):
- unsigned int: source node
- unsigned int: target node
- unsigned int: weight
- bool: forward flag (determines whether the edge can be used in the forward direction)
- bool: backward flag (determines whether the edge can be used in the backward direction)
- n times, for each node 0..(n-1):
- unsigned int: rank
- n times, for each node 0..(n-1):
- unsigned int: region (for Arc Flags)
- t times:
- unsigned int: a transit node ID (indices of all the nodes in the transit node set)
- t*t times, transit node-set distance table values:
- unsigned int: weight (shortest distance from the i-th transit node to the j-th transit node)
- n times: access nodes
- unsigned int: the number of forward access nodes for node i (x)
- x times: forward access nodes for node i
- unsigned int: forward access node ID
- unsigned int: forward access node weight
- uint32_t: arc flags for the forward access node encoded into a 32 bit unsigned integer
- unsigned int: the number of backward access nodes for node i (y)
- y times: backward access nodes for node i
- unsigned int: backward access node ID
- unsigned int: backward access node weight
- uint32_t: arc flags for the backward access node encoded into a 32 bit unsigned integer
- n times: search spaces
- unsigned int: forward search space size (x)
- x times: forward search space for node i
- unsigned int: a node ID of a node in the forward search space of node i
- unsigned int: backward search space size (y)
- y times: backward search space for node i
- unsigned int: a node ID of a node in the backward search space of node i
Transit Node Routing with Arc Flags data structure files are automatically generated with the .tgaf
suffix. This suffix is not enforced when loading the data structure.
The implementation can compute Distance Matrices and answer queries using those
matrices. This can not be used from the library
, but can be used from the
preprocessor
, as described above.
This was used in the thesis during benchmarks (examples can be found in src/thesisTestScenarios.cpp). The Distance Matrix is stored in a binary file that is organized as follows:
- "XDM" (0x58 0x44 0x4d)
- unsigned int: number of nodes (= n)
- n*n times, the values for the matrix:
- unsigned int: value for the field[i][j] in the distance matrix
The Distance Matrices obtained from our implementation automatically have the .xdm
suffix. This suffix is not enforced when loading the Distance Matrix.
Very similar to the CSV graph input format described previously: This output is a CSV file containing the distance matrix of a graph.
- The file does not have a header.
- The file has its amount of rows equal to its amount of columns.
- Each value on row
i
and columnj
represents the distance from nodei
to nodej
. - All values are positive integers.
When benchmarking, the user can specify an optional file into which the values for all the queries computed by the chosen method will be stored. Those values can be used to verify the correctness of the algorithms. The output format is a very simple plain text file that looks as follows:
- A string containing the name of the query set file (for example
Prague10000randomQueries.txt
). - After that follows exactly
q
lines whereq
is the amount of queries in the query set. Each line contains exactly one number corresponding to the answer to the query. So the second line contains the answer to the first query, the third line contains the answer to the second query and so on.