You'll need to grab a copy of the source dataset from OGB. Once you've pulled the data down, stage it in BigQuery.
See DATAPREP.md for details on the data preparation.
Spin up a Vertex AI Workbench instance and stage the files there. You may need to grant the service account backing the GCE VM access to BigQuery.
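As a quick sanity check that the staged data is reachable from the notebook environment, something like the following should work (the project, dataset, and table names here are placeholders for whatever you created during data prep):

```python
from google.cloud import bigquery

# Uses the credentials of the service account backing the Workbench VM.
client = bigquery.Client()

# Hypothetical table name -- substitute the one from your data prep.
rows = client.query("SELECT COUNT(*) AS n FROM `my-project.mag240.papers`").result()
print(next(iter(rows)).n)
```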
Install Neo4j Enterprise v4.4 on a GCE VM. (You can install it anywhere you like, but we'll be using a GCP environment to make integration with BigQuery easier.)
Follow the Bloom docs and install Bloom 2.2.
Once you have Bloom running, you can import the perspective.
Make sure to install GDS 2.1 as well. In the neo4j.conf file, enable the Apache Arrow features by adding:
gds.arrow.listen_address=0.0.0.0:8491
gds.arrow.enabled=true
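After restarting Neo4j, you can confirm the Arrow endpoint is up. One way to check (a sketch using the graphdatascience Python client; host and password are placeholders):

```python
from graphdatascience import GraphDataScience

gds = GraphDataScience(
    "neo4j://localhost:7687",  # placeholder -- your VM's Bolt endpoint
    auth=("neo4j", "<password>"),
    arrow=True,                # negotiate the Arrow endpoint configured above
)

# gds.debug.arrow() reports whether the Arrow server is running and where it listens.
print(gds.debug.arrow())
```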
For the demo, we originally used 512g of heap for the JVM. A heap that size isn't required for the data import, but if you'd like to analyze the graph using the full suite of GDS, we recommend having the headroom.
dbms.memory.heap.initial_size=512g
dbms.memory.heap.max_size=512g
Setting initial_size preallocates memory for the JVM during Neo4j startup.
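You can verify the settings took effect after a restart, e.g. with the official Python driver (host and password are placeholders):

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "<password>"))
with driver.session() as session:
    # dbms.listConfig filters the live configuration by a search string.
    for row in session.run(
        "CALL dbms.listConfig('dbms.memory.heap') YIELD name, value RETURN name, value"
    ):
        print(row["name"], row["value"])
driver.close()
```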
Lastly, make sure to set up the neo4j password and put a copy of it in a new file called pass.txt in the notebook environment.
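The notebooks can then pick the password up from that file rather than hard-coding it; a minimal sketch (assuming the default neo4j user and a reachable Bolt endpoint):

```python
from graphdatascience import GraphDataScience

# Read the password staged alongside the notebooks.
with open("pass.txt") as f:
    password = f.read().strip()

gds = GraphDataScience("neo4j://localhost:7687", auth=("neo4j", password), arrow=True)
print(gds.version())  # quick connectivity check
```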
Run the neo4j_arrow_mag240 notebook to load the data.
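The notebook drives the actual import. Purely as an illustration of what an Arrow-backed load looks like through the GDS Python client (tiny, hypothetical DataFrames; this builds an in-memory graph, not the notebook's exact database-import path):

```python
import pandas as pd
from graphdatascience import GraphDataScience

gds = GraphDataScience(
    "neo4j://localhost:7687",
    auth=("neo4j", open("pass.txt").read().strip()),
    arrow=True,  # bulk transfers go over the Arrow Flight endpoint
)

# Stand-in DataFrames; the notebook streams the full MAG240M tables instead.
nodes = pd.DataFrame({
    "nodeId": [0, 1, 2],
    "labels": ["Paper", "Paper", "Author"],
})
relationships = pd.DataFrame({
    "sourceNodeId": [2, 2],
    "targetNodeId": [0, 1],
    "relationshipType": ["WRITES", "WRITES"],
})

G = gds.graph.construct("toy-graph", nodes, relationships)
print(G.node_count(), G.relationship_count())
```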
The GDS_code notebook contains examples using the Neo4j GDS Python client.
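For a flavor of what's in there, a representative call might look like this (graph name and projection are placeholders, not the notebook's exact code):

```python
from graphdatascience import GraphDataScience

gds = GraphDataScience(
    "neo4j://localhost:7687",
    auth=("neo4j", open("pass.txt").read().strip()),
)

# Project a slice of the graph into memory, then run PageRank over it.
G, _ = gds.graph.project("papers", "Paper", "CITES")
top = gds.pageRank.stream(G).sort_values("score", ascending=False).head(10)
print(top)

G.drop()  # free the in-memory projection when done
```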