Skip to content

aicenter/Ridesharing_DARP_instances

Repository files navigation

Collage displaying different steps of the instance generation, the speed model, demand zones and generated request origins.

Large-scale Ridesharing DARP Instances Based on Real Travel Demand

arXiv link Dataset DOI Licence badge

This repository presents a set of large-scale ridesharing Dial-a-Ride Problem (DARP) instances. The instances were created as a standardized set of ridesharing DARP problems for the purpose of benchmarking and comparing different solution methods.

The instances are based on real demand and realistic travel time data from 3 different US cities, Chicago, New York City and Washington, DC. The instances consist of real travel requests from the selected period, positions of vehicles with their capacities and realistic shortest travel times between all pairs of locations in each city.

The instances and results of two solution methods, the Insertion Heuristic and the optimal Vehicle-group Assignment method, can be found in the linked dataset.

The dataset and methodology used to create it are described in the paper Large-scale Ridesharing DARP Instances Based on Real Travel Demand. This paper was accepted to the Intelligent Transportation Systems Conference 2023 (see the PowerPoint presentation).

Table of contents

Instances and Results download

Dataset DOI

The dataset of instances and associated results are available through the dataset repository Zenodo. The dataset is compressed by 7zip to adhere to the Zenodo dataset size limits, with some of the archives split into multiple parts. The distance matrices, instances, and results are in separate archives. However, the folder structure inside the archives follows the schema described below. Thus, unpacking the distance matrix archives places them in an appropriate directory in the Instances folder. The dataset is licensed under the Creative Commons Attribution 4.0 International license.

Instances structure

Each area has its own folder in the `πŸ“Instances directory. This direcory contains the distance matrix (travel time model) and map files. The instances are then organized into directories based on their parameters. That is, an instance in an area, with a given start time, duration and max delay $\Delta$ is in the following directory structure:

πŸ“ Instances/<area>/
β”œβ”€β”€ πŸ—Ž dm.hd5
└── πŸ“instances/start_<start time>/duration_<duration>/max_delay_<max delay>/
    β”œβ”€β”€ πŸ—Ž vehicles.csv
    └── πŸ—Ž requests.csv
    └── πŸ–Ί config.yaml

and consists of three files, vehicles.csv, requests.csv and config.yaml.

Instance configuration file

πŸ“ Instances/<area>/instances/start_<start time>/duration_<duration>/max_delay_<max delay>/config.yaml

The instance configuration file has the following structure:

area_dir: # path to the area directory. Only used for automatic construction of paths to files shared between instances in the same area
demand:
- filepath: # path to the demand file. ./requests.csv if not provided 
dm_filepath: # path to the distance matrix file. <area_dir>/dm.hd5 if not provided
max_prolongation: # maximum delay in seconds (deprecated)
max_pickup_delay: # maximum delay for the pickup in seconds. 
max_travel_time_delay:
- mode: # mode of the delay calculation. Can be 'absolute' for the absolute delay or 'relative' for the delay relative to the minimal travel time
- relative: # proportion of the delay relative to the minimal travel time. 1.0 means the maximal delay is equal to the minimal travel time
- seconds: # absolute delay in seconds

The maximum delay for each request should be interpreted as follows:

  1. if maxing_travel_time_delay is provided:
    1. if mode is absolute, the maximum delay is equal to max_travel_time_delay.seconds
    2. if mode is relative, the maximum delay is equal to max_travel_time_delay.relative * minimal travel time
  2. else, if max_prolongation is provided, the maximum delay is equal to max_prolongation
  3. else, the maximum delay is 0.

The logic for the maximum time for each action is as follows:

  • The maximum pickup time: is equal to max_pickup_delay if provided, otherwise, it is equal to the maximum delay for the request.
  • The maximum dropoff time: is more compliacted. It is calculated as desired pickup time + minimal travel time + maximum delay + max_pickup_delay if provided.
    • The logic is as follows:
      • If the max_pickup_delay is provided, we need to add it to the maximum delay as we consider maximum delay to be an extra time that can be added to the minimal travel time. Therefore, the maximum dropoff time is equal to the maximum pickup time + minimal travel time + maximum delay.
      • If the max_pickup_delay is not provided, the maximum dropoff time is simply the maximum pickup time + minimal travel time, as the maximum delay is already included in the maximum pickup time.
    • Finally, the maximum dropoff time is rounded up to the nearest second.

Apart from the configuration above, there can be other fields used for the instance generation. These fields has no effect on the instance itself, and can be safely ignored when using the instances.

Requests and Vehicles files

The instance folder contains the two main instance files:

πŸ“Instances/<area>/instances/start_<start time>/duration_<duration>/max_delay_<max delay>/

  • πŸ—Ž requests.csv - a 3 (4) column <tab> separated file containing the list of requests $R$ with a header defining the following columns:
    • time_ms - a request time in milliseconds from the start of the day $t$
    • origin - index of the origin node $o$. Used for indexing into the distance matrix
    • dest - index of the destination node $d$
    • min_travel_time (optional) - direct, minimal travel time in seconds between origin and destination nodes
  • πŸ—Ž vehicles.csv - a 2-column <tab> separated file containing the set of vehicles $V$ with no header row and the following column meaning:
    • vehicle starting node $s$
    • vehicle capacity $c$

A concrete example of an instance path is Instances/NYC/instances/start_18-00/duration_05_min/max_delay_03_min/.

Distance Matrix - the travel time model

πŸ—Ž Instances/<area>/dm.hd5

The travel time model $f_t(l, l')$ that determines the shortest travel time between any two nodes $l$ and $l'$ has a form of distance matrix and is shared by all instances in the same area. Since, for some areas, the matrix is quite large, it is saved using the hdf5 format. To load the distance matrix into Python, use h5py python package. The loading of the distance matrix is implemented in the MatrixTravelTimeProvider.from_hdf. Method get_travel_time(from_index, to_index) implements the access to the distance matrix and is equivalent to $f_t(l, l')$

Instance metadata and supporting files

In addition to the main instance files, the instance and area folders contain several additional files holding metadata about the instance used for instance generation, visualization, or analysis. The list of the files with their location in the directory tree is below.

πŸ“Instances/
β”œβ”€β”€ πŸ“NYC/
β”‚   └── ...
β”œβ”€β”€ πŸ“Manhattan/
β”‚   └── ...
β”œβ”€β”€ πŸ“DC/
β”‚   └── ...
└── πŸ“Chicago/
    β”œβ”€β”€ πŸ—Ž dm.h5                              # Area-specific distance matrix                 
    β”œβ”€β”€ πŸ“map/
    β”‚   β”œβ”€β”€ πŸ–Ί nodes.csv                      # List of nodes present in the area          
    β”‚   β”œβ”€β”€ πŸ–Ί edges.csv                      # List of edges present in the area       
    β”‚   └── πŸ“shapefiles/                    # Area shapefiles for visualization
    β”‚       β”œβ”€β”€ πŸ—Ί nodes.[shx, shp, prh, dbf, cpg]
    β”‚       └── πŸ—Ί edges.[shx, shp, prh, dbf, cpg]
    └── πŸ“instances/
        β”œβ”€β”€ πŸ“start_<time>/
        β”‚   β”œβ”€β”€ πŸ“duration_<duration>/
        β”‚   β”‚   β”œβ”€β”€ πŸ“max_delay_<delay>/
        β”‚   β”‚   β”‚   β”œβ”€β”€ πŸ–Ί config.yaml        # Instance generation config file
        β”‚   β”‚   β”‚   β”œβ”€β”€ πŸ—Ž requests.csv       # Requests file
        β”‚   β”‚   β”‚   β”œβ”€β”€ πŸ—Ž vehicles.csv       # Vehicles file
        β”‚   β”‚   β”‚   β”œβ”€β”€ πŸ–Ί sizing.csv         # (optional) - file holding data on the instance sizing process
        β”‚   β”‚   β”‚   β”œβ”€β”€ πŸ–Ί vehicles_pre_sizing.csv    # (optional) - file holding data on the vehicles before the sizing process
        β”‚   β”‚   β”‚   └── πŸ“shapefiles/        # Instance shapefiles for visualization
        β”‚   β”‚   β”‚       β”œβ”€β”€ πŸ—Ί vehicles.[shx, shp, prh, dbf, cpg] 
        β”‚   β”‚   β”‚       β”œβ”€β”€ πŸ—Ί pickup.[shx, shp, prh, dbf, cpg]
        β”‚   β”‚   β”‚       └── πŸ—Ί dropoff.[shx, shp, prh, dbf, cpg]
        β”‚   β”‚   └── ...
        β”‚   └── ...
        └── ...

Instance generation config files

πŸ“ Instances/<area>/instances/start_<start time>/duration_<duration>/max_delay_<max delay>/

  • πŸ–Ί config.yaml contains metadata used in the instance generation. Notable fields are
    • demand:
      • min_time and demand: max_time give the interval for the demand used in the instance. The format is yy-mm-dd HH:MM:SS in the local timezone.
    • max_prolongation: the maximum delay $\Delta$ (in seconds)
    • vehicles:
      • start_time: The datetime of the start of the vehicle operation. The format is the same as for the demand interval.
      • vehicle_capacity - sets the capacity parameter $c$ for the instance generation
      • vehicle_count - sets the number of vehicles for the instance generation
    • map: the object for the map configuration
      • SRID: The SRID of the map projection. Example: 4326 (GPS)
      • SRID_plane: The SRID of the map planar projection. Example: 32618 (UTM zone 18N)
  • πŸ–Ί sizing.csv contains the results of the instance sizing, which is the step in the instance generation process that selects the number of vehicles for the instance so that the solution found by the insertion heuristic can service all requests in the instance. See the article for details. The file uses a comma as a separator and contains three columns with a header:
    • vehicle_count - the number of vehicles used at a given step of the sizing process
    • dropped_requests - the number of requests that cannot be serviced by the given number of vehicles when solved by the insertion heuristic
    • interval_size - the size of the interval-halving step used in the sizing process

πŸ“ Instances/<area>/map/

  • πŸ–Ί nodes.csv contains information about processed road network nodes in the area. The file uses <tab> as a separator and contains four columns with a header:
    • id - node index in the distance matrix
    • db_id - node id in the database that was used for the instance generation
    • x - node x coordinate in the map planar projection
    • y - node y coordinate in the map planar projection
  • πŸ–Ί edges.csv contains information about processed road network edges in the area, including the speed. The file uses <tab> as a separator and contains six columns with a header:
    • u - from node id
    • v - to node id
    • db_id_from - from node db_id
    • db_id_to - to node db_id
    • length - length of the edge in meters
    • speed - speed of the edge used in travel time calculations, in km/h.

Visualization files

Contains area and instance files for visualization in e.g. Q-GIS

πŸ“ Instances/<area>/map/shapefiles/

  • πŸ—Ί nodes.[shx, shp, prh, dbf, cpg]
  • πŸ—Ί edges.[shx, shp, prh, dbf, cpg]

πŸ“ Instances/<area>/instances/start_<start time>/duration_<duration>/max_delay_<max delay>/shapefiles/

  • πŸ—Ί vehicles.[shx, shp, prh, dbf, cpg] - starting vehicle locations
  • πŸ—Ί pickup.[shx, shp, prh, dbf, cpg] - request pickup points
  • πŸ—Ί dropoff.[shx, shp, prh, dbf, cpg] - request dropoff points

Results structure

The results are stored in the πŸ“ Results/ folder. The folder structure follows a similar pattern as the πŸ“ Instance/ folder:

πŸ“ Results/<area>/start_<start time>/duration_<duration>/max_delay_<max delay>/<method>/
β”œβ”€β”€ πŸ–Ί config.yaml                   # experiment config file
β”œβ”€β”€ πŸ—Ž config.yaml-solution.json      # results from experiment defined by `config.yaml`
└── πŸ–Ί config.yaml-performance.json  # performance metrics from experiment defined by `config.yaml`

The <method> folders are ih for Insertion Heuristic and vga for Vehicle Group Assignment method.

Solution file

The solution is stored in πŸ—Ž config.yaml-solution.json and contains the following fields:

πŸ—Ž config.yaml-solution.json

  • cost - total cost (total travel time of all vehicles) of the solution in seconds.
  • cost_minutes - total cost of the solution in minutes, rounded.
  • dropped_requests - list of requests that were dropped in this solution.
  • plans - list of vehicle plans; each plan contains a list of actions determining which requests are served by the given vehicle and in which order. The actions are "pickup" and "drop_off".

All locations in the solution file are node IDs from the road network. The node IDs are the same as in the πŸ–Ί nodes.csv file in the instance folder. All times are in seconds from the start of the day.

A complete description of the solution format is given by the JSON schema in this repository.

Solution meta-data

There are two files with meta-data for the solution, πŸ–Ί config.yaml and πŸ–Ί config.yaml-performance.json

πŸ–Ί config.yaml file contains the experiment configuration, such as the relative path to the instance, method-specific configuration and so on.

πŸ–Ί config.yaml-performance.json file contains logged information on the run of the solver. The JSON has the following fields

  • total_time - total time of the solver run in seconds
  • peak_memory_KiB - peak memory usage of the solver in KiB
  • solver_stats- solver-specific statistics, if available. For example, for the VGA method, group_generation_time and vehicle_assignment_time are logged separately.

Instance Creation

The methodology for the instance creation is described in the article. The process is divided into the following steps:

FlowChart_v3-1.png

Many of the steps are implemented in the associated repository, but some of them rely on external binaries. That is why the published dataset contains full distance matrices for every area instead of the instance-specific, smaller distance matrices.

Sizing

The sizing of the instances is performed with the insertion heuristic (IH):

  1. We find the lowest number of vehicles for an instance for which the IH solution drops 0 requests.
  2. We multiply this number by 1.05, adding a buffer of 5% of vehicles.

This number is then used in all the experimental results.

Public Datasets used in the creation of the instances

The following data sources were used to generate demand and travel time data:

Area Demand Dataset Zone Dataset Request times
New York City and Manhattan NYC Taxi and Limousine Commission NYC taxi zones exact
Chicago City of Chicago Census tracts and community areas generated
Washington, DC City of Washington, DC Master Address Repository generated

Solution Checker

To check the validity of the solutions, we provide a solution checker implemented in Python in file darpinstances.solution_checker.py. It can be used in two ways:

  • by running the script from the command line
  • be invoking the check_solution function from your code

Command line usage

The script can be run from the command line with the following arguments:

python darpinstances/solution_checker.py <solution_file> [-i, --instance <instance_path>]

where:

  • <solution_file> is the path to the JSON solution file to be checked and
  • <instance_path> is the path to the YAML instance configuration file. If the instance path is not provided, the script will use the instance field from the experiment configuration file named config.yaml located in the same directory as the solution file.

Citation

When using the instances or the code, please cite the following paper:

[1] D. Fiedler and J. Mrkos, β€œLarge-scale Ridesharing DARP Instances Based on Real Travel Demand.” arXiv, May 30, 2023. doi: 10.48550/arXiv.2305.18859.

Bibtex entry:

@misc{fiedler2023largescale,
      title={Large-scale Ridesharing DARP Instances Based on Real Travel Demand}, 
      author={David Fiedler and Jan Mrkos},
      year={2023},
      eprint={2305.18859},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
}

License

The code in the repository used to generate the instances is licensed using the GNU GENERAL PUBLIC LICENSE v3.

The dataset is licensed using the Creative Commons Attribution 4.0 International (CC BY 4.0) license.