Generates auxiliary data based on an LDBC SNB social network dataset.
For example, it can generate fake names for existing people such as:
<http://www.ldbc.eu/ldbc_socialnet/1.0/data/pers00000000000000000471> a <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/Person>;
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/firstName> "Zulma";
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/lastName> "Tulma";
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/hasMaliciousCreator> <http://www.ldbc.eu/ldbc_socialnet/1.0/data/pers00000032985348840411>.
All generated auxiliary data is annotated with the predicate http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/hasMaliciousCreator, which can refer to an existing person who acts as a malicious actor.
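Because every generated resource carries this predicate, the auxiliary data can be told apart from the original dataset by looking for it. The following is a minimal sketch of that idea, not part of the tool: the helper names are hypothetical, and the regex scan assumes full IRIs in angle brackets, as in the example above. For real data, a proper Turtle parser is the safer choice.

```typescript
// Predicate IRI taken from the text above.
const HAS_MALICIOUS_CREATOR =
  "http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/hasMaliciousCreator";

// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: collect every IRI that appears as the object of
// hasMaliciousCreator. Illustration only; prefixed names are not handled.
function extractMaliciousCreators(turtle: string): string[] {
  const pattern = new RegExp(
    "<" + escapeRegExp(HAS_MALICIOUS_CREATOR) + ">\\s*<([^>]+)>",
    "g",
  );
  const creators: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(turtle)) !== null) {
    creators.push(match[1]);
  }
  return creators;
}

// Sample input: the generated person shape shown above.
const sampleTurtle = [
  "<http://www.ldbc.eu/ldbc_socialnet/1.0/data/pers00000000000000000471> a <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/Person>;",
  '<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/firstName> "Zulma";',
  '<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/lastName> "Tulma";',
  "<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/hasMaliciousCreator> <http://www.ldbc.eu/ldbc_socialnet/1.0/data/pers00000032985348840411>.",
].join("\n");

const maliciousCreators = extractMaliciousCreators(sampleTurtle);
console.log(maliciousCreators);
```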
$ npm install -g ldbc-snb-enhancer
or
$ yarn global add ldbc-snb-enhancer
This tool can be used on the command line as ldbc-snb-enhancer, which takes as its single parameter the path to a config file:
$ ldbc-snb-enhancer path/to/config.json
The config file that should be passed to the command line tool has the following JSON structure:
{
  "@context": "https://linkedsoftwaredependencies.org/bundles/npm/ldbc-snb-enhancer/^2.0.0/components/context.jsonld",
  "@id": "urn:ldbc-snb-enhancer:default",
  "@type": "Enhancer",
  "personsPath": "path/to/social_network_person_0_0.ttl",
  "activitiesPath": "path/to/social_network_activity_0_0.ttl",
  "staticPath": "path/to/social_network_static_0_0.ttl",
  "destinationPathData": "path/to/social_network_auxiliary.ttl",
  "logger": {
    "@type": "LoggerStdout"
  },
  "dataSelector": {
    "@type": "DataSelectorRandom",
    "seed": 12345
  },
  "handlers": [
    {
      "@type": "EnhancementHandlerPersonNames",
      "chance": 0.3
    }
  ]
}
The important parts in this config file are:
"personsPath": Path to the persons output file of LDBC SNB.
"destinationPathData": Path of the destination file to create.
"logger": An optional logger for tracking the generation process. (LoggerStdout prints to standard output.)
"dataSelector": A strategy for selecting values from a collection. (DataSelectorRandom selects random values based on a given seed.)
"handlers": An array of enhancement handlers, which are strategies for generating data.
"parameterEmitterPosts": An optional parameter emitter for the extracted posts.
"parameterEmitterComments": An optional parameter emitter for the extracted comments.
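When several datasets need enhancing, it can be convenient to generate this config file from a script instead of editing it by hand. The sketch below assembles the same JSON structure as the example above. The function and interface names are hypothetical, the paths are placeholders, and combining two handlers in one array is an assumption based on "handlers" being an array.

```typescript
// Hypothetical type for one entry of the "handlers" array.
interface HandlerConfig {
  "@type": string;
  [option: string]: unknown;
}

// Hypothetical helper: build a config object with the fields shown above.
// dataDir is a placeholder for the directory holding the LDBC SNB output.
function buildEnhancerConfig(
  dataDir: string,
  handlers: HandlerConfig[],
): Record<string, unknown> {
  return {
    "@context":
      "https://linkedsoftwaredependencies.org/bundles/npm/ldbc-snb-enhancer/^2.0.0/components/context.jsonld",
    "@id": "urn:ldbc-snb-enhancer:default",
    "@type": "Enhancer",
    personsPath: `${dataDir}/social_network_person_0_0.ttl`,
    activitiesPath: `${dataDir}/social_network_activity_0_0.ttl`,
    staticPath: `${dataDir}/social_network_static_0_0.ttl`,
    destinationPathData: `${dataDir}/social_network_auxiliary.ttl`,
    logger: { "@type": "LoggerStdout" },
    dataSelector: { "@type": "DataSelectorRandom", seed: 12345 },
    handlers,
  };
}

// Serialize to the JSON text that would be saved as config.json.
const configJson = JSON.stringify(
  buildEnhancerConfig("path/to", [
    { "@type": "EnhancementHandlerPersonNames", chance: 0.3 },
    { "@type": "EnhancementHandlerPosts", chance: 0.3 },
  ]),
  null,
  2,
);
console.log(configJson);
```

The resulting text can be written to a file and passed to ldbc-snb-enhancer as described above.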
The following handlers can be configured.
Generate additional names for existing people. People are selected randomly from the friends that are known by the given person.
{
  "handlers": [
    {
      "@type": "EnhancementHandlerPersonNames",
      "chance": 0.3
    }
  ]
}
Parameters:
"chance": The chance for a name to be generated. The number of new names will be the number of people times this chance, where names are randomly assigned to people.
"parameterEmitter": An optional parameter emitter.
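To make the "chance" arithmetic concrete, the sketch below computes the number of generated items as the number of people times the chance, and picks the target people with a seeded random generator so that the same seed gives the same selection. This is an illustration only: the generator (mulberry32), the rounding, and sampling with replacement are assumptions, not the tool's documented behaviour.

```typescript
// Small deterministic PRNG (mulberry32), used here as a stand-in for
// whatever generator the tool uses internally.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical helper: choose the people that receive a generated item.
// count = people * chance (rounded, an assumption); sampling is done with
// replacement, so a person may be chosen more than once.
function pickTargets(people: string[], chance: number, seed: number): string[] {
  if (people.length === 0) {
    return [];
  }
  const count = Math.round(people.length * chance);
  const random = mulberry32(seed);
  const picked: string[] = [];
  for (let i = 0; i < count; i++) {
    picked.push(people[Math.floor(random() * people.length)]);
  }
  return picked;
}

// Ten placeholder person identifiers and a chance of 0.3 give three targets.
const examplePeople = Array.from({ length: 10 }, (_, i) => `person-${i}`);
const firstRun = pickTargets(examplePeople, 0.3, 12345);
const secondRun = pickTargets(examplePeople, 0.3, 12345);
console.log(firstRun.length, firstRun, secondRun);
```

Running the selection twice with the same seed yields identical lists, which is the property a seeded selector provides for reproducible datasets.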
Generated shape:
<http://www.ldbc.eu/ldbc_socialnet/1.0/data/pers00000000000000000471> a <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/Person>;
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/firstName> "Zulma";
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/lastName> "Tulma";
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/hasMaliciousCreator> <http://www.ldbc.eu/ldbc_socialnet/1.0/data/pers00000032985348840411>.
Generate additional names for existing people where the malicious creator refers to a city. Cities will be selected based on the city the random person is located in.
This is a variant of the Person Names Handler.
{
  "handlers": [
    {
      "@type": "EnhancementHandlerPersonNamesCities",
      "chance": 0.3
    }
  ]
}
Parameters:
"chance": The chance for a name to be generated. The number of new names will be the number of people times this chance, where names are randomly assigned to people.
"parameterEmitter": An optional parameter emitter.
Generated shape:
<http://www.ldbc.eu/ldbc_socialnet/1.0/data/pers00000021990232555617> a <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/Person>;
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/firstName> "Zulma";
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/lastName> "Tulma";
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/hasMaliciousCreator> <http://dbpedia.org/resource/Dingzhou>.
Generate additional triples attached to existing people. People are selected randomly.
{
  "handlers": [
    {
      "@type": "EnhancementHandlerPersonNoise",
      "chance": 0.3
    }
  ]
}
Parameters:
"chance": The chance for an additional triple to be generated. The number of new triples will be the number of people times this chance. This value can be larger than 1 to generate multiple triples per person.
Generated shape:
<http://www.ldbc.eu/ldbc_socialnet/1.0/data/pers00000000000000000471-noise-1>
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/noise> "NOISE-1";
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/hasCreator> <http://www.ldbc.eu/ldbc_socialnet/1.0/data/pers00000000000000000471>.
Generate posts and assign them to existing people.
{
  "handlers": [
    {
      "@type": "EnhancementHandlerPosts",
      "chance": 0.3
    }
  ]
}
Parameters:
"chance": The chance for posts to be generated. The number of posts will be the number of people times this chance, where people are randomly assigned to posts.
Generated shape:
<http://www.ldbc.eu/ldbc_socialnet/1.0/data/post-fake2967> a <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/Post>;
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/id> "2967";
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/hasCreator> <http://www.ldbc.eu/ldbc_socialnet/1.0/data/pers00000004398046512167>;
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/hasMaliciousCreator> <http://www.ldbc.eu/ldbc_socialnet/1.0/data/pers00000010995116283441>;
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/creationDate> "2021-02-22T10:39:31.595Z";
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/locationIP> "200.200.200.200";
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/browserUsed> "Firefox";
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/content> "Tomatoes are blue";
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/length> "17";
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/language> "en";
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/locatedIn> <http://dbpedia.org/resource/Belgium>;
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/hasTag> <http://www.ldbc.eu/ldbc_socialnet/1.0/tag/Georges_Bizet>.
Generate comments and assign them to existing people as replies to existing posts.
{
  "handlers": [
    {
      "@type": "EnhancementHandlerComments",
      "chance": 0.3
    }
  ]
}
Parameters:
"chance": The chance for comments to be generated. The number of comments will be the number of people times this chance, where people are randomly assigned to comments.
Generated shape:
<http://www.ldbc.eu/ldbc_socialnet/1.0/data/comment-fake9> a <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/Comment>;
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/id> "9";
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/hasCreator> <http://www.ldbc.eu/ldbc_socialnet/1.0/data/pers00000008796093024878>;
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/hasMaliciousCreator> <http://www.ldbc.eu/ldbc_socialnet/1.0/data/pers00000032985348839704>;
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/creationDate> "2021-02-22T10:39:31.595Z";
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/locationIP> "200.200.200.200";
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/browserUsed> "Firefox";
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/content> "Tomatoes are blue";
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/length> "17";
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/replyOf> <http://www.ldbc.eu/ldbc_socialnet/1.0/data/post00000000274877908873>;
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/language> "en";
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/locatedIn> <http://dbpedia.org/resource/Belgium>;
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/hasTag> <http://www.ldbc.eu/ldbc_socialnet/1.0/tag/Georges_Bizet>.
Generate additional contents for existing posts.
{
  "handlers": [
    {
      "@type": "EnhancementHandlerPostContents",
      "chance": 0.3
    }
  ]
}
Parameters:
"chance": The chance for post content to be generated. The number of new post contents will be the number of posts times this chance, where contents are randomly assigned to posts.
Generated shape:
<http://www.ldbc.eu/ldbc_socialnet/1.0/data/post00000000206158430485> <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/id> "962072675046";
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/content> "Tomatoes are blue";
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/hasMaliciousCreator> <http://www.ldbc.eu/ldbc_socialnet/1.0/data/pers00000017592186048516>.
Generate additional authors for existing posts.
{
  "handlers": [
    {
      "@type": "EnhancementHandlerPostAuthors",
      "chance": 0.3
    }
  ]
}
Parameters:
"chance": The chance for a post author to be generated. The number of new post authors will be the number of posts times this chance, where authors are randomly assigned to posts.
Generated shape:
<http://www.ldbc.eu/ldbc_socialnet/1.0/data/post00000000962072675046> <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/id> "962072675046";
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/hasCreator> <http://www.ldbc.eu/ldbc_socialnet/1.0/data/pers00000006597069770017>;
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/hasMaliciousCreator> <http://www.ldbc.eu/ldbc_socialnet/1.0/data/pers00000019791209301543>.
Generates vocabulary information.
{
  "handlers": [
    {
      "@type": "EnhancementHandlerVocabulary"
    }
  ]
}
Generated shape:
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/id> a rdf:Property.
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/hasCreator> a rdf:Property.
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/Person> a rdfs:Class.
Generates vocabulary information about the domain of a specific predicate.
{
  "handlers": [
    {
      "@type": "EnhancementHandlerVocabularyPredicateDomain",
      "classIRI": "http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/Comment",
      "predicateIRI": "http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/locationIP"
    }
  ]
}
Generated shape:
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/locationIP> rdfs:domain
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/Comment>.
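Since each handler entry covers one class and one predicate, declaring domains for several predicates presumably means listing several entries. The sketch below generates such entries from a small mapping. The helper name and the mapping are hypothetical, and using more than one handler of this type in the same "handlers" array is an assumption.

```typescript
// Vocabulary namespace used throughout the examples above.
const VOCABULARY_NS = "http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/";

// Hypothetical helper: turn { className: [predicateName, ...] } into one
// EnhancementHandlerVocabularyPredicateDomain entry per class/predicate pair.
function buildDomainHandlers(
  domains: Record<string, string[]>,
): Array<{ "@type": string; classIRI: string; predicateIRI: string }> {
  const handlers: Array<{ "@type": string; classIRI: string; predicateIRI: string }> = [];
  for (const className of Object.keys(domains)) {
    for (const predicateName of domains[className]) {
      handlers.push({
        "@type": "EnhancementHandlerVocabularyPredicateDomain",
        classIRI: VOCABULARY_NS + className,
        predicateIRI: VOCABULARY_NS + predicateName,
      });
    }
  }
  return handlers;
}

// Illustrative mapping: two predicates that occur on comments in the shapes above.
const domainHandlers = buildDomainHandlers({ Comment: ["locationIP", "replyOf"] });
console.log(JSON.stringify({ handlers: domainHandlers }, null, 2));
```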
Multiply the number of posts by a given amount.
{
  "handlers": [
    {
      "@type": "EnhancementHandlerPostsMultiply",
      "factor": 10
    }
  ]
}
Generated shape:
<http://www.ldbc.eu/ldbc_socialnet/1.0/data/post00000000618475290624000001>
a <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/Post>;
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/id> "618475290624000001";
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/browserUsed> "Firefox";
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/content> "About Rupert Murdoch ... COPY 1";
Certain handlers allow their internal parameters to be emitted.
Such parameters can then be used, for instance, as query substitution parameters.
Emits parameters as CSV files.
{
  "handlers": [
    {
      "@type": "EnhancementHandlerPersonNames",
      "chance": 0.3,
      "parameterEmitter": {
        "@type": "ParameterEmitterCsv",
        "destinationPath": "parameters-person-names.csv"
      }
    }
  ]
}
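As an illustration of query substitution, the sketch below fills a query template with values read from CSV text. The layout of the emitted CSV is not described here, so the header row, the column name "person", the sample row, and the $name placeholder syntax are all assumptions made for this example. The parser is deliberately naive and does not handle quoted fields.

```typescript
// Naive CSV parser: first line is treated as the header (an assumption),
// fields are split on commas, and quoting is not supported.
function parseSimpleCsv(text: string): Record<string, string>[] {
  const lines = text.split(/\r?\n/).filter((line) => line.trim() !== "");
  if (lines.length === 0) {
    return [];
  }
  const header = lines[0].split(",");
  return lines.slice(1).map((line) => {
    const cells = line.split(",");
    const row: Record<string, string> = {};
    header.forEach((column, index) => {
      row[column] = cells[index] ?? "";
    });
    return row;
  });
}

// Replace each $name placeholder with the matching column value;
// placeholders without a matching column are left untouched.
function substituteParameters(template: string, row: Record<string, string>): string {
  return template.replace(/\$(\w+)/g, (placeholder: string, name: string) =>
    name in row ? row[name] : placeholder,
  );
}

// Hypothetical CSV content and query template.
const csvText =
  "person\nhttp://www.ldbc.eu/ldbc_socialnet/1.0/data/pers00000000000000000471\n";
const queryTemplate =
  "SELECT ?o WHERE { <$person> <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/firstName> ?o }";

const queries = parseSimpleCsv(csvText).map((row) =>
  substituteParameters(queryTemplate, row),
);
console.log(queries);
```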
This software is written by Ruben Taelman.
This code is released under the MIT license.