Daffodil is an open-source implementation of the Data Format Description Language (DFDL). DFDL builds upon the XML Schema language to give authors a standard way to describe a variety of data formats. The DFDL specification is managed by the Open Grid Forum. See the resources section below for links to helpful user documentation about Daffodil as well as XML/XSD.
The example presented here models a set of physics data and includes the following files:
- data/ - Folder containing data files corresponding to the schema.
  - scdms_raw.bin - A sample set of SCDMS data in its raw binary format.
  - scdms_xml_data.xml - An XML file previously generated using the schema definition, showing what the result of a correct parse into XML looks like.
- src/scdms/xsd/ - Source code for the schema.
  - scdms.dfdl.xsd - DFDL schema file that describes the data format. This is the file that Daffodil uses as a template to parse your raw binary file.
  - config.xml - Configuration file to demonstrate the use of tunable parameters. More information on tunables can be found on the Daffodil Configuration documentation page.
- src/scdms/SuperCDMS_Data_Format.xlsx - Excel file with a human-readable description of the format.
To run the provided example, first make sure that Java 8 or later is installed on your machine, then download Daffodil. These instructions and commands assume that the Daffodil folder was extracted into the dfdl/ directory, so that your directory structure looks something like:
dfdl/
|-apache-daffodil-2.2.0-incubating-bin/
|   |-bin/
|   |-lib/
|   .
|   .
|   .
|-data/
|   |-scdms_raw.bin
|   |-scdms_xml_data.xml
|-src/
    |-scdms/
        |-xsd/
            |-config.xml
            |-scdms.dfdl.xsd
but the example can be run from anywhere as long as you adjust the file paths. Daffodil runs on Java, so a single download works on both Windows and Linux.
- First move into the Daffodil directory:
$ cd apache-daffodil-2.2.0-incubating-bin
- The command to parse the raw binary file into xml is:
$ ./bin/daffodil parse -s ../src/scdms/xsd/scdms.dfdl.xsd -c ../src/scdms/xsd/config.xml -o ../data/my_xml_file.xml ../data/scdms_raw.bin
or on Windows use the daffodil.bat script:
> .\bin\daffodil.bat parse -s ..\src\scdms\xsd\scdms.dfdl.xsd -c ..\src\scdms\xsd\config.xml -o ..\data\my_xml_file.xml ..\data\scdms_raw.bin
- Your output file should now be in the data/ folder. You can compare it with the existing scdms_xml_data.xml file to see if the parse worked correctly.
- To parse the file into JSON, the command needs only a different flag (and a different output file extension). Daffodil defaults to XML output; add the -I json flag to switch to JSON:
$ ./bin/daffodil parse -s ../src/scdms/xsd/scdms.dfdl.xsd -c ../src/scdms/xsd/config.xml -I json -o ../data/my_xml_file.json ../data/scdms_raw.bin
and on Windows:
> .\bin\daffodil.bat parse -s ..\src\scdms\xsd\scdms.dfdl.xsd -c ..\src\scdms\xsd\config.xml -I json -o ..\data\my_xml_file.json ..\data\scdms_raw.bin
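The steps above can be collected into a pair of small shell helpers. This is only a minimal sketch and not part of the original example: the function names parse_scdms and compare_infosets, the DAFFODIL variable, and the my_output.* file names are hypothetical choices made for illustration. It assumes a Linux shell started inside the Daffodil directory with the layout shown earlier, and it uses only the parse options already shown (-s, -c, -I, -o).

```shell
# Sketch only: helper names and output file names are illustrative assumptions.

# Path to the Daffodil launcher; override it if Daffodil lives elsewhere.
DAFFODIL="${DAFFODIL:-./bin/daffodil}"

# Parse the sample binary into the requested infoset type ("xml" or "json").
# The output is written to ../data/my_output.<type>.
parse_scdms() {
    infoset_type="$1"
    "$DAFFODIL" parse \
        -s ../src/scdms/xsd/scdms.dfdl.xsd \
        -c ../src/scdms/xsd/config.xml \
        -I "$infoset_type" \
        -o "../data/my_output.$infoset_type" \
        ../data/scdms_raw.bin
}

# Report whether two files are byte-for-byte identical.
# Prints "match" or "differ"; a missing file also counts as "differ".
compare_infosets() {
    if cmp -s "$1" "$2" 2>/dev/null; then
        echo "match"
    else
        echo "differ"
    fi
}
```

For example, running `parse_scdms xml` and then `compare_infosets ../data/my_output.xml ../data/scdms_xml_data.xml` reproduces the XML steps. A byte-for-byte comparison is strict: differences in whitespace or line endings alone will report "differ" even when the parsed values agree, so a "differ" result is a prompt to inspect the two files rather than proof that the parse failed.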
DFDL is built on top of eXtensible Markup Language (XML) and XML Schema (XSD). It provides an extra set of attributes that allows an author to define data formats with much more granularity and nuance. Because of this, a basic understanding of XML and XSD is necessary to begin writing (and understanding) your own schema.
As of this writing, the Data Format Description Language v1.0 standard is defined in GFD-P-R.207.
Daffodil is currently under development as an Apache Incubator project. Documentation and downloads can be found at https://daffodil.apache.org/. This example was completed using version 2.2.0.
- Getting Started - Includes information on the command line options etc.
- Mailing List and Community - The Daffodil community is very responsive and extremely helpful for questions.
- Github Repository
- Examples
Daffodil can also serialize (unparse) data, i.e. write an XML or JSON file back into the binary format specified by your .dfdl.xsd schema file. This process is not covered here, but more information can be found in the Daffodil documentation.