cowtowncoder edited this page Feb 15, 2013 · 9 revisions

BDB Data Format

General

  • All values are in Big-Endian format, that is, starting with the Most Significant Bytes (and bits)
  • Variable-length integers are only used for lengths, and thus only support non-negative values (which removes the need for ZigZag encoding). The sign bit (high bit) of each byte is used to mark the last byte; every byte carries 7 data bits.
  • Hash codes are calculated using Murmur3/32 hash, with seed value of 0. Hash value of 0 must be masked as 1, as 0 is used as the marker for "not available"
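The two encoding rules above can be sketched in Python. This is a minimal illustration, not StoreMate's own code: the function names are hypothetical, the sketch assumes that a *set* high bit marks the final byte of a variable-length integer, and it assumes the standard Murmur3 x86 32-bit variant (which reads input blocks in little-endian order).

```python
MASK32 = 0xFFFFFFFF


def encode_vint(value):
    """Encode a non-negative int as 7-bit groups, most significant group first."""
    if value < 0:
        raise ValueError("only non-negative values are supported")
    groups = [value & 0x7F]
    value >>= 7
    while value > 0:
        groups.append(value & 0x7F)
        value >>= 7
    groups.reverse()       # big-endian: most significant group first
    groups[-1] |= 0x80     # assumption: a set high bit marks the last byte
    return bytes(groups)


def decode_vint(buf, pos=0):
    """Return (value, next_pos) for the variable-length int starting at buf[pos]."""
    value = 0
    while True:
        b = buf[pos]
        pos += 1
        value = (value << 7) | (b & 0x7F)
        if b & 0x80:       # high bit set: this was the last byte
            return value, pos


def _rotl32(x, r):
    return ((x << r) | (x >> (32 - r))) & MASK32


def _mix_block(k):
    k = (k * 0xCC9E2D51) & MASK32
    k = _rotl32(k, 15)
    return (k * 0x1B873593) & MASK32


def murmur3_32(data, seed=0):
    """Standard Murmur3 x86 32-bit hash; returns an unsigned 32-bit value."""
    h = seed & MASK32
    n = len(data)
    full = n - (n % 4)
    for i in range(0, full, 4):
        h ^= _mix_block(int.from_bytes(data[i:i + 4], "little"))
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK32
    tail = data[full:]
    if tail:
        h ^= _mix_block(int.from_bytes(tail, "little"))
    h ^= n
    # Final avalanche step
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


def masked_hash(data):
    """Hash with seed 0; a result of 0 is replaced by 1, since 0 means 'not available'."""
    h = murmur3_32(data, 0)
    return 1 if h == 0 else h
```

For example, `encode_vint(300)` yields the two bytes `02 ac`, and `masked_hash(b"")` returns 1 because the unmasked hash of empty input with seed 0 is 0.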

BDB Entries

Entry metadata is stored in "raw" format with a simple structure.

The structure can be thought of as consisting of multiple sections:

First section: fixed data

This section has fixed offsets and is unlikely to change between data format versions.

  1. #0-#7: long "lastMod"; last modified timestamp (used for secondary index)
  2. #8-#11: Status section
  • #8: byte "version"; data entry version, hard-coded to 0x11 for the current version (other values reserved for future compatibility needs)
  • #9: byte "status"; entry status flags:
    • 0x01: soft-deleted? (active vs tombstone)
    • 0x02: is-replicated? (0->primary, not replicated; 1->secondary, created by replication)
    • others (0x04 - 0x80) are reserved for future use; should be left as 0
  • #10: byte "compression"; Compression method, with allowed values of:
  • #11: byte "externalPathLength": 8-bit unsigned length of external storage path; 0 for inlined storage
  3. #12-#15: int "contentHash"; hash code over uncompressed content
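Because the first section has fixed offsets, it can be read with a single big-endian unpack. The following is a rough sketch using Python's `struct` module; the function name, the dictionary keys and the decision to reject unknown versions are assumptions made for illustration, and the hash is read as an unsigned 32-bit value.

```python
import struct

# Big-endian: long lastMod, 4 single-byte fields, 32-bit contentHash = 16 bytes
FIXED_HEADER = struct.Struct(">qBBBBI")

CURRENT_VERSION = 0x11
STATUS_SOFT_DELETED = 0x01
STATUS_REPLICATED = 0x02


def parse_fixed_header(buf):
    """Parse bytes #0-#15 of an entry into a dict (key names are illustrative)."""
    (last_mod, version, status, compression,
     external_path_length, content_hash) = FIXED_HEADER.unpack_from(buf, 0)
    if version != CURRENT_VERSION:
        # Assumption: this sketch simply rejects versions it does not know
        raise ValueError("unsupported entry version: 0x%02x" % version)
    return {
        "last_mod": last_mod,
        "version": version,
        "soft_deleted": bool(status & STATUS_SOFT_DELETED),
        "replicated": bool(status & STATUS_REPLICATED),
        "compression": compression,
        "external_path_length": external_path_length,
        "content_hash": content_hash,   # 0 means "not available"
    }
```

The optional second section, if present, starts right after these 16 bytes.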

Second section: optional fields

Currently this section only contains data if the entry is compressed:

  1. #16-#19: int "compressedHash"; hash code over compressed data -- only included if compression is used (i.e. compression value is NOT 0)
  2. #20...: vlong "originalLength"; original (uncompressed) length of data -- only included if compression is used
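A reader has to consult the "compression" byte from the first section before it knows whether these two fields exist. The sketch below shows one way to handle that; the function names are hypothetical, and the variable-length decoder assumes that a set high bit marks the final byte.

```python
import struct


def read_vint(buf, pos):
    """Decode a big-endian 7-bit variable-length int; returns (value, next_pos)."""
    value = 0
    while True:
        b = buf[pos]
        pos += 1
        value = (value << 7) | (b & 0x7F)
        if b & 0x80:   # assumption: set high bit marks the last byte
            return value, pos


def parse_optional_section(buf, compression, pos=16):
    """Return (compressed_hash, original_length, next_pos).

    Both values are None when the entry is not compressed, in which case
    the section is empty and next_pos equals pos.
    """
    if compression == 0:
        return None, None, pos
    (compressed_hash,) = struct.unpack_from(">I", buf, pos)
    original_length, pos = read_vint(buf, pos + 4)
    return compressed_hash, original_length, pos
```

The returned `next_pos` is where the third section (opaque metadata) begins.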

Third section: opaque metadata

This section contains metadata used by the application that uses StoreMate. It is stored and exposed as-is: StoreMate neither modifies it nor attaches any semantics to it.

  1. #? vint "metadataLength"; length in bytes of opaque metadata
  2. #? byte[metadataLength] opaque metadata itself
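Reading this section amounts to a length-prefixed byte copy. A minimal sketch follows, with hypothetical function names and the assumption that a set high bit marks the last byte of the length.

```python
def read_vint(buf, pos):
    """Decode a big-endian 7-bit variable-length int; returns (value, next_pos)."""
    value = 0
    while True:
        b = buf[pos]
        pos += 1
        value = (value << 7) | (b & 0x7F)
        if b & 0x80:   # assumption: set high bit marks the last byte
            return value, pos


def read_opaque_metadata(buf, pos):
    """Return (metadata_bytes, next_pos); the bytes are passed through untouched."""
    length, pos = read_vint(buf, pos)
    end = pos + length
    if end > len(buf):
        raise ValueError("metadata length exceeds entry size")
    return bytes(buf[pos:end]), end
```

The returned `next_pos` points at the start of the payload section.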

Fourth section: payload

This section contains either:

  • Inlined entry data (for small entries; threshold configurable), OR
  • External path (ASCII String) for larger entries

Either way, it starts with:

  1. #? vlong "storageLength"; length of stored data, in bytes; either length of storage file, or number of inlined bytes.

and continues, depending on the value of "externalPathLength":

Inlined entry data

If "externalPathLength" is 0 ("no external data"):

  1. #? byte[storageLength] "inlinedData"; actual inlined data

External data

If "externalPathLength" is greater than 0:

  1. #? byte[externalPathLength] "externalPath"; relative filename (ASCII characters only) of the data file that contains the payload bytes
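The branch between inlined and external payload can be sketched as follows. The function names, the returned dictionary keys and the example path are hypothetical, and the variable-length decoder assumes that a set high bit marks the final byte.

```python
def read_vint(buf, pos):
    """Decode a big-endian 7-bit variable-length int; returns (value, next_pos)."""
    value = 0
    while True:
        b = buf[pos]
        pos += 1
        value = (value << 7) | (b & 0x7F)
        if b & 0x80:   # assumption: set high bit marks the last byte
            return value, pos


def read_payload(buf, pos, external_path_length):
    """Parse the payload section starting at buf[pos].

    external_path_length comes from byte #11 of the fixed section.
    """
    storage_length, pos = read_vint(buf, pos)
    if external_path_length == 0:
        # Inlined: storage_length counts the payload bytes that follow
        end = pos + storage_length
        if end > len(buf):
            raise ValueError("inlined data exceeds entry size")
        return {"storage_length": storage_length,
                "inlined_data": bytes(buf[pos:end]),
                "external_path": None}
    # External: storage_length is the size of the file named by the path
    end = pos + external_path_length
    if end > len(buf):
        raise ValueError("external path exceeds entry size")
    return {"storage_length": storage_length,
            "inlined_data": None,
            "external_path": bytes(buf[pos:end]).decode("ascii")}
```

In the external case, `storage_length` describes the file on disk, so the entry itself holds only the path bytes.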