Efficient RDF Interchange (ERI) Format for RDF Data Streams
RDF streams are sequences of timestamped RDF statements or graphs, which can be generated by several types of data sources (sensors, social networks, etc.). They may provide data at high volumes and rates, and be consumed by applications that require real-time responses. Hence it is important to publish and interchange them efficiently.
The Efficient RDF Interchange (ERI) format is a compressed serialization for RDF streams. It exploits a key feature of RDF data streams, the regularity of their structure and data values, to reduce the amount of data transmitted when processing them. ERI achieves significant space savings with respect to standard data streaming compression while remaining efficient in performance.
The ERI proposal was published at the International Semantic Web Conference 2014:
Fernández, J. D., Llaves, A., & Corcho, O. (2014, October). Efficient RDF interchange (ERI) format for RDF data streams. In International Semantic Web Conference (pp. 244-259). Springer, Cham.
- Javier D. Fernández, Vienna University of Economics and Business (Austria);
- Alejandro Llaves, Fujitsu Laboratories of Europe (Spain);
- Óscar Corcho, Ontology Engineering Group (OEG), Univ. Politécnica de Madrid (Spain);
The research leading to these results has received funding from the European Union’s Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 257641, PlanetData network of excellence.
Import the code and use the available tools in /src/org/oegupm/compactstreaming/tools:
RDF2StreamFile <input RDF> <outputRDF_Comp>
Converts a given RDF input to ERI. Parameters:
-rdftype : Type of RDF Input (ntriples, nquad, n3, turtle, rdfxml)
-base : Base URI for the dataset
-config : Config file for the conversion
-prefixes : File including the URIs to be treated as prefixes during the conversion (one per line)
-discrete : File including the URIs of the discrete predicates (one per line), i.e. those predicates followed by few different object values.
-uniq : File including the URIs of those predicates (one per line) whose objects are mostly unrepeated.
-block : Number of Triples per Block
-quiet : Do not show progress of the conversion
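The -discrete and -uniq files have to be prepared by hand. As a rough aid, the following sketch scans an N-Triples sample and proposes candidate predicates for each file. It is not part of ERI: the class name PredicateProfiler, the two thresholds, the output file names and the choice to write URIs without angle brackets are all assumptions made for illustration, and the parsing is a naive whitespace split rather than a full N-Triples parser.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Stream;

// Hypothetical helper, not part of ERI: proposes candidate predicates
// for the -discrete and -uniq files from an N-Triples sample.
class PredicateProfiler {
    // Assumed thresholds; tune them for the stream at hand.
    static final int MAX_DISCRETE_VALUES = 10;
    static final double MIN_UNIQUE_RATIO = 0.9;

    private final Map<String, Integer> tripleCount = new HashMap<>();
    private final Map<String, Set<String>> distinctObjects = new HashMap<>();

    // Records one N-Triples line of the form: <s> <p> <o> .
    // Comments, blank lines and malformed lines are skipped.
    void add(String line) {
        String t = line.trim();
        if (t.isEmpty() || t.startsWith("#")) return;
        String[] parts = t.split("\\s+", 3);
        if (parts.length < 3 || !parts[2].endsWith(".")) return;
        String predicate = parts[1].replaceAll("^<|>$", "");
        // Drop the terminating dot; the rest is the object term.
        String object = parts[2].substring(0, parts[2].length() - 1).trim();
        tripleCount.merge(predicate, 1, Integer::sum);
        distinctObjects.computeIfAbsent(predicate, k -> new HashSet<>()).add(object);
    }

    // Returns "uniq", "discrete" or "other" for a predicate seen by add().
    String classify(String predicate) {
        int triples = tripleCount.get(predicate);
        int distinct = distinctObjects.get(predicate).size();
        if ((double) distinct / triples >= MIN_UNIQUE_RATIO) return "uniq";
        if (distinct <= MAX_DISCRETE_VALUES) return "discrete";
        return "other";
    }

    // Sorted predicate URIs whose classification equals the given kind.
    Set<String> predicatesOf(String kind) {
        Set<String> result = new TreeSet<>();
        for (String p : tripleCount.keySet()) {
            if (classify(p).equals(kind)) result.add(p);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        PredicateProfiler profiler = new PredicateProfiler();
        try (Stream<String> lines = Files.lines(Path.of(args[0]))) {
            lines.forEach(profiler::add);
        }
        // Output file names are placeholders, one URI per line.
        Files.write(Path.of("discrete.txt"), profiler.predicatesOf("discrete"));
        Files.write(Path.of("uniq.txt"), profiler.predicatesOf("uniq"));
    }
}
```

The sketch keeps every distinct object in memory, so it is meant for a representative sample rather than a full stream. Predicates that occur only a handful of times will be reported as "uniq", so the resulting lists are best reviewed before being passed to RDF2StreamFile.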
StreamFile2RDF <input RDF_Comp> <outputRDF>
Converts ERI back to plain RDF (only ntriples supported). Parameters:
-quiet : Do not show progress of the conversion
Please specify conversion parameters via a config file using the schema <property>=<value>
Example:
store_subject_dictionary=false
store_object_dictionary=false
disable_consistent_predicates=false
block_size=4096
- store_subject_dictionary : Boolean to indicate if an LRU cache of subjects is used (improves compression if subjects are highly repeated)
- store_object_dictionary : Boolean to indicate if an LRU cache of objects is used (improves compression if objects are highly repeated)
- disable_consistent_predicates: By default, ERI assumes all literal values of a given predicate are of the same data type (float, string, dateTime, etc.). If this cannot be assumed, use this property (set it to false) to disable this feature.
- block_size : Integer value indicating the number of triples per block.
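Since the config file follows the <property>=<value> schema, it can be read with the standard java.util.Properties class. The sketch below shows one way to load and validate the four properties listed above. It is an illustration only, not ERI's own loader: the class name EriConfigSketch is hypothetical, and the fallback values are simply copied from the example rather than documented defaults.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical holder for the four documented properties; not ERI's own class.
class EriConfigSketch {
    final boolean storeSubjectDictionary;
    final boolean storeObjectDictionary;
    final boolean disableConsistentPredicates;
    final int blockSize;

    private EriConfigSketch(Properties p) {
        // Fallbacks mirror the example config; they are assumptions.
        storeSubjectDictionary = flag(p, "store_subject_dictionary", false);
        storeObjectDictionary = flag(p, "store_object_dictionary", false);
        disableConsistentPredicates = flag(p, "disable_consistent_predicates", false);
        blockSize = Integer.parseInt(p.getProperty("block_size", "4096").trim());
        if (blockSize <= 0) {
            throw new IllegalArgumentException("block_size must be positive: " + blockSize);
        }
    }

    // Reads a boolean property, trimming stray whitespace around the value.
    private static boolean flag(Properties p, String key, boolean fallback) {
        return Boolean.parseBoolean(p.getProperty(key, String.valueOf(fallback)).trim());
    }

    // Parses <property>=<value> lines from any character source.
    static EriConfigSketch load(Reader in) {
        Properties p = new Properties();
        try {
            p.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new EriConfigSketch(p);
    }

    public static void main(String[] args) {
        String example = "store_subject_dictionary=false\n"
                + "store_object_dictionary=false\n"
                + "disable_consistent_predicates=false\n"
                + "block_size=4096\n";
        EriConfigSketch config = load(new StringReader(example));
        System.out.println("block_size=" + config.blockSize);
    }
}
```

To read an actual file, pass a java.io.FileReader (or Files.newBufferedReader) to load instead of the StringReader used in main.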