A command-line utility for migrating data from Neo4j to Neptune.
java -jar neo4j-to-neptune.jar convert-csv -i /tmp/neo4j-export.csv -d output --infer-types
mvn clean install
Migration of data from Neo4j to Neptune is a multi-step process:
- Export CSV from Neo4j – Use the APOC export procedures to export data from Neo4j to CSV.
- Convert CSV – Use the convert-csv command-line utility to convert the exported CSV into the Neptune Gremlin bulk load CSV format.
- Bulk Load into Neptune – Use the Neptune bulk load API to load data into Neptune.
Use the apoc.export.csv.all procedure from Neo4j's APOC library to export data from Neo4j to CSV.
Follow the instructions for installing the APOC library for either Neo4j Desktop or Neo4j Server.
Update the neo4j.conf configuration file to enable exports:
apoc.export.file.enabled=true
CALL apoc.export.csv.all(
"neo4j-export.csv",
{d:','}
)
The path that you specify for the export file is resolved relative to the Neo4j import directory. apoc.export.csv.all creates a single CSV file containing data for all nodes and relationships.
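Because nodes and relationships share one file, it can help to check the export before converting it. The following is a minimal sketch, not part of the utility. It assumes the export uses the `_id`, `_labels`, `_start`, `_end` and `_type` columns, with node rows filling `_id` and relationship rows leaving it empty. The sample data and the `split_export` helper are hypothetical.

```python
import csv
import io

# Hypothetical sample in the shape the combined export is assumed to have:
# node rows fill _id/_labels, relationship rows fill _start/_end/_type.
SAMPLE = '''"_id","_labels","name","since","_start","_end","_type"
"0",":Person","Alice","",,,
"1",":Person","Bob","",,,
,,"","2019","0","1","KNOWS"
'''


def split_export(text):
    """Separate node rows from relationship rows in a combined export."""
    nodes, relationships = [], []
    for row in csv.DictReader(io.StringIO(text)):
        # Assumption: a non-empty _id marks a node row; anything else is a relationship.
        if row["_id"]:
            nodes.append(row)
        else:
            relationships.append(row)
    return nodes, relationships


nodes, relationships = split_export(SAMPLE)
print(len(nodes), "node rows,", len(relationships), "relationship rows")
```

Comparing these counts with the node and relationship counts in the source database is a quick sanity check before running the conversion.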
Use the convert-csv command-line utility to convert the CSV exported from Neo4j into the Neptune Gremlin bulk load CSV format.
The utility has two required parameters: the path to the Neo4j export file and the name of a directory where the converted CSV files will be written. There are also optional parameters that allow you to specify node and relationship multi-valued property policies and turn on data type inferencing.
Neo4j allows 'homogeneous lists of simple types' to be stored as properties on both nodes and edges. These lists can contain duplicate values.
Neptune provides for set and single cardinality for vertex properties, and single cardinality for edge properties. Hence, there is no straightforward migration of Neo4j node list properties containing duplicate values into Neptune vertex properties, or Neo4j relationship list properties into Neptune edge properties.
The --node-property-policy and --relationship-property-policy parameters allow you to control the migration of multi-valued properties into Neptune.
--node-property-policy takes one of four values, the default being PutInSetIgnoringDuplicates:
- LeaveAsString – Store a multi-valued Neo4j node property as a string representation of a JSON-formatted list
- Halt – Halt (throw an exception) if a multi-valued Neo4j node property is encountered
- PutInSetIgnoringDuplicates – Convert a multi-valued Neo4j node property to a set cardinality Neptune property, discarding duplicate values
- PutInSetButHaltIfDuplicates – Convert a multi-valued Neo4j node property to a set cardinality Neptune property, but halt (throw an exception) if the property contains duplicate values
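The effect of each node property policy on a single list value can be sketched as follows. This illustrates the semantics described above and is not the utility's actual implementation. The `apply_node_policy` function and its error messages are hypothetical.

```python
import json


def apply_node_policy(values, policy="PutInSetIgnoringDuplicates"):
    """Return what would be written for one multi-valued node property."""
    if policy == "LeaveAsString":
        # Keep the whole list, duplicates included, as a JSON-formatted string.
        return json.dumps(values)
    if policy == "Halt":
        raise ValueError("multi-valued node property encountered")
    if policy not in ("PutInSetIgnoringDuplicates", "PutInSetButHaltIfDuplicates"):
        raise ValueError(f"unknown policy: {policy}")
    # Order-preserving de-duplication, since set cardinality cannot hold repeats.
    unique = list(dict.fromkeys(values))
    if policy == "PutInSetButHaltIfDuplicates" and len(unique) != len(values):
        raise ValueError("duplicate values in multi-valued node property")
    return unique


print(apply_node_policy(["a", "b", "a"]))
print(apply_node_policy(["a", "b", "a"], "LeaveAsString"))
```

With the default policy the duplicate `"a"` is silently dropped, which is why PutInSetButHaltIfDuplicates exists for cases where losing a repeated value would matter.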
--relationship-property-policy takes one of two values, the default being LeaveAsString:
- LeaveAsString – Store a multi-valued Neo4j relationship property as a string representation of a JSON-formatted list
- Halt – Halt (throw an exception) if a multi-valued Neo4j relationship property is encountered
When importing data into Neptune using the bulk loader, you can specify the data type for each property. If you supply the --infer-types flag to convert-csv, the utility will attempt to infer the narrowest supported type for each column in the output CSV.
Note that convert-csv will always use a double for values with decimal or scientific notation.
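To make "narrowest supported type" concrete, here is a rough sketch of per-column inference. It is not the utility's actual algorithm. The candidate types (Bool, Byte, Short, Int, Long, Double, String) are a subset of the bulk loader's types chosen for illustration, and the widening rule for mixed numeric columns is an assumption of this sketch.

```python
import re

INTEGER = re.compile(r"[+-]?\d+")
DECIMAL = re.compile(r"[+-]?(\d+\.\d*|\.\d+|\d+)([eE][+-]?\d+)?")

# Integer types from narrowest to widest, with their inclusive ranges.
INT_TYPES = [
    ("Byte", -2**7, 2**7 - 1),
    ("Short", -2**15, 2**15 - 1),
    ("Int", -2**31, 2**31 - 1),
    ("Long", -2**63, 2**63 - 1),
]
NUMERIC_ORDER = ["Byte", "Short", "Int", "Long", "Double"]


def infer_value_type(value):
    """Infer the narrowest type that can hold a single CSV value."""
    if value.lower() in ("true", "false"):
        return "Bool"
    if INTEGER.fullmatch(value):
        number = int(value)
        for name, low, high in INT_TYPES:
            if low <= number <= high:
                return name
        return "String"  # too large for a Long
    if DECIMAL.fullmatch(value):
        # Decimal and scientific notation always map to Double.
        return "Double"
    return "String"


def infer_column_type(values):
    """Infer one type for a column; empty cells are ignored."""
    types = {infer_value_type(v) for v in values if v != ""}
    if not types:
        return "String"
    if len(types) == 1:
        return types.pop()
    if types <= set(NUMERIC_ORDER):
        # Assumption of this sketch: mixed numeric columns widen to the widest type seen.
        return max(types, key=NUMERIC_ORDER.index)
    return "String"


for column in (["1", "200"], ["1", "2.5"], ["1e3"], ["1", "abc"]):
    print(column, "->", infer_column_type(column))
```

A column mixing numbers and text falls back to String here, since no narrower type can hold every value.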
Use the Neptune bulk loader to load data into Neptune from the converted CSV files.
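As a rough illustration, a bulk load is started by sending an HTTP POST to the cluster's loader endpoint, with the converted files already copied to an Amazon S3 location that the cluster can read. The sketch below only builds the request and does not send it. The endpoint, bucket, role ARN and region are placeholders, and the `build_load_request` helper is hypothetical.

```python
import json
import urllib.request


def build_load_request(endpoint, source, iam_role_arn, region):
    """Build (but do not send) a bulk load request for the loader endpoint."""
    payload = {
        "source": source,            # S3 location of the converted CSV files
        "format": "csv",             # Gremlin CSV format
        "iamRoleArn": iam_role_arn,  # role that lets Neptune read the bucket
        "region": region,
        "failOnError": "FALSE",
    }
    return urllib.request.Request(
        url=f"https://{endpoint}:8182/loader",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# All values below are placeholders, not real resources.
request = build_load_request(
    endpoint="your-neptune-endpoint.example.com",
    source="s3://your-bucket/output/",
    iam_role_arn="arn:aws:iam::123456789012:role/YourNeptuneLoadRole",
    region="us-east-1",
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

To submit the request, pass it to `urllib.request.urlopen` from a host that can reach the cluster.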