
We support several common data schemas. Below are a few examples with their corresponding configuration files; pick the nearest match to your data as a starting point.

Note: across all examples, the number of iterations is set to a small value to keep the end-to-end (E2E) test quick. To generate high-quality synthetic data, we recommend increasing the number of iterations according to your experience and available computational resources.

Prerequisite

We support four field types:

  1. Bit field (encoded as bit strings) e.g.,

    {
        "column": "srcip",
        "type": "integer",
        "encoding": "bit",
        "n_bits": 32
    }

    This field accepts an optional boolean property, truncate, which defaults to false. If truncate is set to true, large integers are truncated so that only the most significant n_bits bits are kept.

  2. Word2Vec field (encoded as Word2Vec vectors), e.g.,

    {
        "column": "srcport",
        "type": "integer",
        "encoding": "word2vec_port"
    }
  3. Categorical field (encoded as one-hot encoding), e.g.,

    {
        "column": "type",
        "type": "string",
        "encoding": "categorical"
    }
  4. Continuous field, e.g.,

    {
        "column": "pkt",
        "type": "float",
        "normalization": "ZERO_ONE",
        "log1p_norm": true
    }
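To make the encodings above concrete, here is a minimal Python sketch of what the bit, categorical, and continuous fields could look like once encoded. It is an illustration only, not the library's implementation: the helper names (`encode_bits`, `one_hot`, `zero_one`) are hypothetical, the error raised when a value does not fit and `truncate` is false is an assumption, and so is the choice to apply `log1p` before the zero-to-one scaling. The Word2Vec field is omitted because it depends on a trained embedding model.

```python
import math
from typing import List


def encode_bits(value: int, n_bits: int, truncate: bool = False) -> List[int]:
    """Encode a non-negative integer as n_bits bits, most significant bit first."""
    if value < 0:
        raise ValueError("value must be non-negative")
    extra = value.bit_length() - n_bits
    if extra > 0:
        if not truncate:
            # Assumption: without truncate, an oversized value is rejected.
            raise ValueError(f"{value} does not fit in {n_bits} bits")
        # Keep only the most significant n_bits bits.
        value >>= extra
    return [(value >> i) & 1 for i in range(n_bits - 1, -1, -1)]


def one_hot(value: str, categories: List[str]) -> List[int]:
    """One-hot encode value against a fixed, ordered list of categories."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]


def zero_one(value: float, lo: float, hi: float, log1p_norm: bool = False) -> float:
    """Scale value into [0, 1] given the column minimum lo and maximum hi.

    Assumption: when log1p_norm is true, log(1 + x) is applied to the value
    and to both bounds before scaling.
    """
    if log1p_norm:
        value, lo, hi = math.log1p(value), math.log1p(lo), math.log1p(hi)
    if hi == lo:
        return 0.0  # constant column: nothing to scale
    return (value - lo) / (hi - lo)
```

For example, `encode_bits(5, 4)` gives `[0, 1, 0, 1]`, and `encode_bits(0b110101, 4, truncate=True)` keeps the top four bits and gives `[1, 1, 0, 1]`.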

Dataset type 1: single-event

A single-event schema contains one timeseries per row.

Data schema

Timestamp (optional) Metadata 1 Metadata 2 ... Timeseries 1 Timeseries 2 ...
t1
t2
...

Examples

  1. PCAP

    Timestamp Srcip Dstip Srcport Dstport Proto Pkt_size ...
    t1
    t2
    ...
  2. NetFlow (configuration_file)

Dataset type 2: multi-event

A multi-event schema contains multiple timeseries per row.

Data schema

Metadata 1 Metadata 2 ... {Timestamp (optional), Timeseries 1, Timeseries 2, ...} {Timestamp (optional), Timeseries 1, Timeseries 2, ...} ...

Examples

  1. Wikipedia dataset (configuration_file)
    Domain Access type Agent {Date 1, page view} {Date 2, page view} ...
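
To show how the two schemas relate, the following Python sketch unrolls one multi-event row into single-event rows by repeating the metadata for each {timestamp, value} pair. The function name `explode_multi_event`, the column names, and the sample values are all hypothetical and chosen only for illustration; this is not part of the library.

```python
from typing import Dict, List, Sequence, Tuple


def explode_multi_event(
    metadata: Dict[str, str],
    events: Sequence[Tuple[str, float]],
    ts_name: str = "timestamp",
    value_name: str = "page_view",
) -> List[dict]:
    """Turn one multi-event row into one single-event row per event.

    metadata holds the per-row metadata columns; events is the sequence of
    (timestamp, value) pairs that follow the metadata in the multi-event row.
    """
    return [{**metadata, ts_name: ts, value_name: value} for ts, value in events]


# Made-up sample row in the shape of the Wikipedia example.
row_metadata = {"domain": "en.wikipedia.org", "access_type": "desktop", "agent": "spider"}
row_events = [("2015-07-01", 12.0), ("2015-07-02", 15.0)]

single_event_rows = explode_multi_event(row_metadata, row_events)
```

Each resulting dictionary carries the same three metadata values plus one timestamp and one page-view value, which matches the single-event layout described earlier.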