Graph Scaling Tool

A Sampling-based Method for Scaling Graph Datasets

This tool allows users to scale a graph up or down using sampling as a basis for both operations.

Paper: https://dl.acm.org/doi/10.1145/3358960.3379144

Scaling Guidelines and Results

Refer to the scaling up guidelines to target certain property changes whenever scaling up a graph.
The scaling-results document shows the results of several scaling operations conducted on different graph datasets.

Usage

Compilation

Dependencies (install them by running install_dependencies.sh):

googletest - Optional for compiling the test project.
snap-stanford - Required for the auto-tuner.

mkdir build
cd build
cmake ../
make -j$NPROC

Single thread and machine

Running graph_scaling_tool without any parameter should prompt the user several input requests accordingly, including whether a scaling up or scaling down operation should be performed.

To perform one of the two operations directly, the user must specify at least:

The input graph path: -i (e.g., -i /home/user/graph.csv). Note that this can be any format, as long as the input graph is an edge list (where each vertex is represented numerically)
The output (folder) path: -o (e.g., -o /home/user/)

To directly perform a scaling down operation, the -s parameter is required to specify the preferred sample size (based on the number of vertices). This parameter should be between 0 and 1. By default, TIES is used as sampling algorithm. This can be changed by adding the -a option with randomedge, randomnode, ties or forestfire -v {sourcevertex}. If forest fire is used, replace {sourcevertex} with the source vertex.

Example scaling down: ./graph_scaling_tool -i /home/user/graph.txt -o /home/user/graph_output -s 0.4 -a ties

To directly perform a scaling up operation, the following parameters are required:

Scaling factor -u (based on the number of vertices)
Sample size -s for each sampled graph. This value should be between 0 and 1.
Sampling algorithm -a either: ties, randomedge, randomnode or forestfire -v {sourcevertex}.
Topology -t either: star, chain, ring or fullyconnected
Bridging type -b, either: random or high (degrees)
Number of interconnections between each sample: -n
Force directed bridges -d. Setting this to true would add directed bridges.

Example scaling up: ./graph_scaling_tool -i /home/user/graph.txt -o /home/user/graph_output -u 3.0 -s 0.5 -a ties -d false -n 10 -b random -t star

Parallel & distributed

Switch to the 'distributed' branch; compile the project using:

make release

Run on a cluster with:

prun -np <number of compute nodes> -v -1 -reserve <reservation id> -sge-script $PRUN_ETC/prun-openmpi sample <input file> <output file> <sampling fraction> <CPU mem limit (bytes, per node)> <GPU mem limit (bytes, per node)>

Example with 2 nodes, "wiki" input file, 0.5 sampling fraction, 32 GB CPU memory and 12 GB GPU memory limits:

prun -np 2 -v -1 -reserve <reservation id> -sge-script $PRUN_ETC/prun-openmpi sample OUTPUT_PATH 0.5 34359738368 12884901888

Note that the input file must be an edge list, ordered by source vertex, and then by destination vertex. The number of vertices and edges in the graph must be denoted at the beginning of the file, on a separate rule, like: "! nVertices nEdges" (without quotes) The input graph should not have 'missing' vertices (vertices without incoming/outgoing edges and are therefore not present in the input file).

Name		Name	Last commit message	Last commit date
Latest commit History 144 Commits
docs		docs
graph		graph
io		io
kernel		kernel
scaling		scaling
tests		tests
.gitignore		.gitignore
CITATION.cff		CITATION.cff
CMakeLists.txt		CMakeLists.txt
LICENSE		LICENSE
Main.cpp		Main.cpp
README.md		README.md
install_dependencies.sh		install_dependencies.sh
scaling-guidelines.pdf		scaling-guidelines.pdf

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Graph Scaling Tool

A Sampling-based Method for Scaling Graph Datasets

Scaling Guidelines and Results

Usage

Compilation

Single thread and machine

Parallel & distributed

About

Releases

Packages

Languages

License

amusaafir/graph-scaling

Folders and files

Latest commit

History

Repository files navigation

Graph Scaling Tool

A Sampling-based Method for Scaling Graph Datasets

Scaling Guidelines and Results

Usage

Compilation

Single thread and machine

Parallel & distributed

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages