This repository presents a set of large-scale ridesharing Dial-a-Ride Problem (DARP) instances, created as a standardized benchmark for comparing different solution methods. The instances are based on real demand and realistic travel time data from three US cities: Chicago, New York City, and Washington, DC. Each instance consists of real travel requests from the selected period, positions of vehicles with their capacities, and realistic shortest travel times between all pairs of locations in the city.
The instances and results of two solution methods, the Insertion Heuristic and the optimal Vehicle-group Assignment method, can be found in the linked dataset.
The dataset and methodology used to create it are described in the paper Large-scale Ridesharing DARP Instances Based on Real Travel Demand. This paper was accepted to the Intelligent Transportation Systems Conference 2023 (see the PowerPoint presentation).
- Instances and Results download
- Instances structure
- Results structure
- Instance creation
- Citation
- License
The dataset of instances and associated results is available through the dataset repository Zenodo. The dataset is compressed with 7-Zip to adhere to the Zenodo dataset size limits, with some of the archives split into multiple parts. The distance matrices, instances, and results are in separate archives; however, the folder structure inside the archives follows the schema described below, so unpacking the distance matrix archives places them in the appropriate directory in the `Instances` folder. The dataset is licensed under the Creative Commons Attribution 4.0 International license.
Each area has its own folder in the `Instances` directory. This directory contains the distance matrix (travel time model) and map files. The instances are then organized into directories based on their parameters. That is, an instance in an area, with a given start time, duration, and max delay, is located in:

```
📁 Instances/<area>/
├── 📄 dm.hd5
└── 📁 instances/start_<start time>/duration_<duration>/max_delay_<max delay>/
    ├── 📄 vehicles.csv
    ├── 📄 requests.csv
    └── 📄 config.yaml
```

and consists of three files: `vehicles.csv`, `requests.csv`, and `config.yaml`.
📄 `Instances/<area>/instances/start_<start time>/duration_<duration>/max_delay_<max delay>/config.yaml`

The instance configuration file has the following structure:

```yaml
area_dir: # path to the area directory; only used for automatic construction of paths to files shared between instances in the same area
demand:
  - filepath: # path to the demand file; ./requests.csv if not provided
dm_filepath: # path to the distance matrix file; <area_dir>/dm.hd5 if not provided
max_prolongation: # maximum delay in seconds (deprecated)
max_pickup_delay: # maximum delay for the pickup in seconds
max_travel_time_delay:
  mode: # mode of the delay calculation; 'absolute' for an absolute delay or 'relative' for a delay relative to the minimal travel time
  relative: # proportion of the delay relative to the minimal travel time; 1.0 means the maximal delay equals the minimal travel time
  seconds: # absolute delay in seconds
```
The maximum delay for each request should be interpreted as follows:

- if `max_travel_time_delay` is provided:
    - if `mode` is `absolute`, the maximum delay is equal to `max_travel_time_delay.seconds`
    - if `mode` is `relative`, the maximum delay is equal to `max_travel_time_delay.relative` * minimal travel time
- else, if `max_prolongation` is provided, the maximum delay is equal to `max_prolongation`
- else, the maximum delay is 0.
The logic for the maximum time for each action is as follows:

- The maximum pickup time is equal to `max_pickup_delay` if provided; otherwise, it is equal to the maximum delay for the request.
- The maximum dropoff time is more complicated. It is calculated as desired pickup time + minimal travel time + maximum delay + `max_pickup_delay` (if provided). The logic is as follows:
    - If `max_pickup_delay` is provided, we need to add it to the maximum delay, as we consider the maximum delay to be extra time that can be added to the minimal travel time. Therefore, the maximum dropoff time is equal to the maximum pickup time + minimal travel time + maximum delay.
    - If `max_pickup_delay` is not provided, the maximum dropoff time is simply the maximum pickup time + minimal travel time, as the maximum delay is already included in the maximum pickup time.
- Finally, the maximum dropoff time is rounded up to the nearest second.
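The delay and time-window rules above can be sketched in Python as follows. This is an illustrative reimplementation, not the repository's code; we also assume the "maximum pickup time" is measured from the desired pickup time (i.e., desired pickup time plus the allowed delay):

```python
import math

def max_delay(config: dict, min_travel_time: float) -> float:
    """Maximum delay for a request, following the precedence described above."""
    mtd = config.get("max_travel_time_delay")
    if mtd is not None:
        if mtd["mode"] == "absolute":
            return mtd["seconds"]
        # 'relative': the delay is a proportion of the minimal travel time
        return mtd["relative"] * min_travel_time
    if config.get("max_prolongation") is not None:
        return config["max_prolongation"]
    return 0

def action_time_limits(config: dict, desired_pickup_time: float, min_travel_time: float):
    """Return (maximum pickup time, maximum dropoff time) for one request."""
    delay = max_delay(config, min_travel_time)
    pickup_delay = config.get("max_pickup_delay")
    if pickup_delay is not None:
        # max_pickup_delay given: the maximum delay is extra time on top of
        # the minimal travel time, counted from the latest possible pickup
        max_pickup = desired_pickup_time + pickup_delay
        max_dropoff = max_pickup + min_travel_time + delay
    else:
        # the maximum delay is already included in the maximum pickup time
        max_pickup = desired_pickup_time + delay
        max_dropoff = max_pickup + min_travel_time
    return max_pickup, math.ceil(max_dropoff)  # dropoff rounded up to a second
```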
Apart from the configuration above, there can be other fields used for the instance generation. These fields have no effect on the instance itself and can be safely ignored when using the instances.
The instance folder contains the two main instance files:

📁 `Instances/<area>/instances/start_<start time>/duration_<duration>/max_delay_<max delay>/`

- 📄 `requests.csv` - a 3- (or 4-) column `<tab>`-separated file containing the list of requests $R$, with a header defining the following columns:
    - `time_ms` - request time in milliseconds from the start of the day $t$
    - `origin` - index of the origin node $o$, used for indexing into the distance matrix
    - `dest` - index of the destination node $d$
    - `min_travel_time` (optional) - direct, minimal travel time in seconds between the origin and destination nodes
- 📄 `vehicles.csv` - a 2-column `<tab>`-separated file containing the set of vehicles $V$, with no header row and the following column meaning:
    - vehicle starting node $s$
    - vehicle capacity $c$
A concrete example of an instance path is `Instances/NYC/instances/start_18-00/duration_05_min/max_delay_03_min/`.
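Loading the two files can be sketched with Python's standard library (the paths and function names are illustrative; only the documented columns are read):

```python
import csv

def load_requests(path):
    """Read the tab-separated requests file (with a header row)."""
    with open(path, newline="") as f:
        return [
            {
                "time_ms": int(row["time_ms"]),
                "origin": int(row["origin"]),
                "dest": int(row["dest"]),
            }
            for row in csv.DictReader(f, delimiter="\t")
        ]

def load_vehicles(path):
    """Read the tab-separated vehicles file (no header): (start node, capacity)."""
    with open(path, newline="") as f:
        return [(int(node), int(cap)) for node, cap in csv.reader(f, delimiter="\t")]
```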
📄 `Instances/<area>/dm.hd5`

The travel time model is stored in the HDF5 format. To load the distance matrix into Python, use the `h5py` package. The loading of the distance matrix is implemented in `MatrixTravelTimeProvider.from_hdf`. The method `get_travel_time(from_index, to_index)` implements the access to the distance matrix and is equivalent to indexing the matrix directly by the origin and destination node indices.
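The access pattern can be illustrated with a plain nested list standing in for the HDF5 dataset (the class below mirrors the interface named above, but its implementation is our own sketch, not the repository's):

```python
class MatrixTravelTimeProviderSketch:
    """Illustrative stand-in: a 2-D travel time matrix indexed by node ids."""

    def __init__(self, matrix):
        self.matrix = matrix  # with h5py, this would be the HDF5 dataset

    def get_travel_time(self, from_index, to_index):
        # equivalent to indexing the distance matrix directly
        return self.matrix[from_index][to_index]
```

With the real file, the matrix would typically be obtained via `h5py.File("dm.hd5")`; the dataset name inside the file depends on how it was written.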
In addition to the main instance files, the instance and area folders contain several additional files holding metadata about the instance used for instance generation, visualization, or analysis. The list of the files with their location in the directory tree is below.
```
📁 Instances/
├── 📁 NYC/
│   └── ...
├── 📁 Manhattan/
│   └── ...
├── 📁 DC/
│   └── ...
└── 📁 Chicago/
    ├── 📄 dm.h5                   # Area-specific distance matrix
    ├── 📁 map/
    │   ├── 📄 nodes.csv           # List of nodes present in the area
    │   ├── 📄 edges.csv           # List of edges present in the area
    │   └── 📁 shapefiles/         # Area shapefiles for visualization
    │       ├── 📄 nodes.[shx, shp, prj, dbf, cpg]
    │       └── 📄 edges.[shx, shp, prj, dbf, cpg]
    └── 📁 instances/
        ├── 📁 start_<time>/
        │   ├── 📁 duration_<duration>/
        │   │   ├── 📁 max_delay_<delay>/
        │   │   │   ├── 📄 config.yaml              # Instance generation config file
        │   │   │   ├── 📄 requests.csv             # Requests file
        │   │   │   ├── 📄 vehicles.csv             # Vehicles file
        │   │   │   ├── 📄 sizing.csv               # (optional) data on the instance sizing process
        │   │   │   ├── 📄 vehicles_pre_sizing.csv  # (optional) vehicles before the sizing process
        │   │   │   └── 📁 shapefiles/              # Instance shapefiles for visualization
        │   │   │       ├── 📄 vehicles.[shx, shp, prj, dbf, cpg]
        │   │   │       ├── 📄 pickup.[shx, shp, prj, dbf, cpg]
        │   │   │       └── 📄 dropoff.[shx, shp, prj, dbf, cpg]
        │   │   └── ...
        │   └── ...
        └── ...
```
📁 `Instances/<area>/instances/start_<start time>/duration_<duration>/max_delay_<max delay>/`

- 📄 `config.yaml` contains metadata used in the instance generation. Notable fields are:
    - `demand:`
        - `min_time` and `max_time` give the interval for the demand used in the instance. The format is `yy-mm-dd HH:MM:SS` in the local timezone.
    - `max_prolongation`: the maximum delay $\Delta$ (in seconds)
    - `vehicles:`
        - `start_time`: the datetime of the start of the vehicle operation. The format is the same as for the demand interval.
        - `vehicle_capacity`: sets the capacity parameter $c$ for the instance generation
        - `vehicle_count`: sets the number of vehicles for the instance generation
    - `map:` the object for the map configuration
        - `SRID`: the SRID of the map projection. Example: `4326` (GPS)
        - `SRID_plane`: the SRID of the map planar projection. Example: `32618` (UTM zone 18N)
- 📄 `sizing.csv` contains the results of the instance sizing, which is the step in the instance generation process that selects the number of vehicles for the instance so that the solution found by the insertion heuristic can service all requests in the instance. See the article for details. The file uses a comma as a separator and contains three columns with a header:
    - `vehicle_count` - the number of vehicles used at a given step of the sizing process
    - `dropped_requests` - the number of requests that cannot be serviced by the given number of vehicles when solved by the insertion heuristic
    - `interval_size` - the size of the interval-halving step used in the sizing process
📁 `Instances/<area>/map/`

- 📄 `nodes.csv` contains information about the processed road network nodes in the area. The file uses `<tab>` as a separator and contains four columns with a header:
    - `id` - node index in the distance matrix
    - `db_id` - node id in the database that was used for the instance generation
    - `x` - node x coordinate in the map planar projection
    - `y` - node y coordinate in the map planar projection
- 📄 `edges.csv` contains information about the processed road network edges in the area, including the speed. The file uses `<tab>` as a separator and contains six columns with a header:
    - `u` - from node `id`
    - `v` - to node `id`
    - `db_id_from` - from node `db_id`
    - `db_id_to` - to node `db_id`
    - `length` - length of the edge in meters
    - `speed` - speed of the edge used in travel time calculations, in km/h
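Since `length` is in meters and `speed` is in km/h, a per-edge travel time can be derived by converting the speed to m/s (an illustration of the unit conversion; the dataset's exact travel time calculation may differ, e.g., in rounding):

```python
def edge_travel_time_s(length_m: float, speed_kmh: float) -> float:
    """Travel time over an edge in seconds: meters / (km/h converted to m/s)."""
    return length_m / (speed_kmh / 3.6)
```

For example, a 100 m edge at 36 km/h (10 m/s) takes 10 seconds to traverse.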
The shapefiles contain area and instance files for visualization in, e.g., QGIS:

📁 `Instances/<area>/map/shapefiles/`

- 📄 `nodes.[shx, shp, prj, dbf, cpg]`
- 📄 `edges.[shx, shp, prj, dbf, cpg]`

📁 `Instances/<area>/instances/start_<start time>/duration_<duration>/max_delay_<max delay>/shapefiles/`

- 📄 `vehicles.[shx, shp, prj, dbf, cpg]` - starting vehicle locations
- 📄 `pickup.[shx, shp, prj, dbf, cpg]` - request pickup points
- 📄 `dropoff.[shx, shp, prj, dbf, cpg]` - request dropoff points
The results are stored in the 📁 `Results/` folder. The folder structure follows a similar pattern to the 📁 `Instances/` folder:

```
📁 Results/<area>/start_<start time>/duration_<duration>/max_delay_<max delay>/<method>/
├── 📄 config.yaml                   # experiment config file
├── 📄 config.yaml-solution.json     # results from the experiment defined by `config.yaml`
└── 📄 config.yaml-performance.json  # performance metrics from the experiment defined by `config.yaml`
```

The `<method>` folders are `ih` for the Insertion Heuristic and `vga` for the Vehicle-group Assignment method.
The solution is stored in 📄 `config.yaml-solution.json` and contains the following fields:

- `cost` - total cost (total travel time of all vehicles) of the solution, in seconds
- `cost_minutes` - total cost of the solution in minutes, rounded
- `dropped_requests` - list of requests that were dropped in this solution
- `plans` - list of vehicle plans; each plan contains a list of actions determining which requests are served by the given vehicle and in which order. The actions are "pickup" and "drop_off".

All locations in the solution file are node IDs from the road network. The node IDs are the same as in the 📄 `nodes.csv` file in the instance folder. All times are in seconds from the start of the day.

A complete description of the solution format is given by the JSON schema in this repository.
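Reading a solution file can be sketched as follows (the field names `cost`, `dropped_requests`, and `plans` follow the description above; the name of the per-plan action list, `actions` here, is our assumption — consult the JSON schema for the exact format):

```python
import json

def summarize_solution(path):
    """Load a solution JSON and report basic statistics."""
    with open(path) as f:
        solution = json.load(f)
    return {
        "cost_s": solution["cost"],
        "dropped": len(solution["dropped_requests"]),
        # count vehicles whose plan has at least one action ('actions' is assumed)
        "vehicles_used": sum(1 for plan in solution["plans"] if plan["actions"]),
    }
```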
There are two files with metadata for the solution: 📄 `config.yaml` and 📄 `config.yaml-performance.json`.

The 📄 `config.yaml` file contains the experiment configuration, such as the relative path to the instance, method-specific configuration, and so on.

The 📄 `config.yaml-performance.json` file contains logged information on the run of the solver. The JSON has the following fields:

- `total_time` - total time of the solver run in seconds
- `peak_memory_KiB` - peak memory usage of the solver in KiB
- `solver_stats` - solver-specific statistics, if available. For example, for the VGA method, `group_generation_time` and `vehicle_assignment_time` are logged separately.
The methodology for the instance creation is described in the article, where the process is divided into several steps. Many of the steps are implemented in the associated repository, but some of them rely on external binaries. That is why the published dataset contains full distance matrices for every area instead of smaller, instance-specific distance matrices.
The sizing of the instances is performed with the insertion heuristic (IH):
- We find the lowest number of vehicles for an instance for which the IH solution drops 0 requests.
- We multiply this number by 1.05, adding a buffer of 5% of vehicles.
This number is then used in all the experimental results.
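The sizing loop described above can be sketched as a binary search over the vehicle count; `ih_dropped_requests` is a hypothetical callback standing in for a full insertion heuristic run:

```python
import math

def size_instance(ih_dropped_requests, lo=1, hi=10_000):
    """Find the lowest vehicle count with zero dropped IH requests,
    then add a 5% buffer (interval-halving / binary search)."""
    # assumes the number of dropped requests is non-increasing in the fleet size
    while lo < hi:
        mid = (lo + hi) // 2
        if ih_dropped_requests(mid) == 0:
            hi = mid  # enough vehicles: try fewer
        else:
            lo = mid + 1  # not enough: need more
    return math.ceil(lo * 1.05)
```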
The following data sources were used to generate the demand and travel time data:

| Area | Demand Dataset | Zone Dataset | Request times |
|---|---|---|---|
| New York City and Manhattan | NYC Taxi and Limousine Commission | NYC taxi zones | exact |
| Chicago | City of Chicago | Census tracts and community areas | generated |
| Washington, DC | City of Washington, DC | Master Address Repository | generated |
To check the validity of the solutions, we provide a solution checker implemented in Python in the file `darpinstances/solution_checker.py`. It can be used in two ways:

- by running the script from the command line
- by invoking the `check_solution` function from your code

The script can be run from the command line with the following arguments:

```shell
python darpinstances/solution_checker.py <solution_file> [-i, --instance <instance_path>]
```

where:

- `<solution_file>` is the path to the JSON solution file to be checked and
- `<instance_path>` is the path to the YAML instance configuration file. If the instance path is not provided, the script will use the `instance` field from the experiment configuration file named `config.yaml` located in the same directory as the solution file.
When using the instances or the code, please cite the following paper:
[1] D. Fiedler and J. Mrkos, βLarge-scale Ridesharing DARP Instances Based on Real Travel Demand.β arXiv, May 30, 2023. doi: 10.48550/arXiv.2305.18859.
Bibtex entry:
```bibtex
@misc{fiedler2023largescale,
  title={Large-scale Ridesharing DARP Instances Based on Real Travel Demand},
  author={David Fiedler and Jan Mrkos},
  year={2023},
  eprint={2305.18859},
  archivePrefix={arXiv},
  primaryClass={cs.AI}
}
```
The code in the repository used to generate the instances is licensed under the GNU General Public License v3.
The dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.