
Augmented PCAP Next Generation Dump File Format

Brandon Carpenter edited this page May 2, 2014 · 13 revisions


Introduction

The PCAP format has become the standard format for dumping captured packets in the free and open-source software community. Hone strives to adhere to this format as much as possible, both to help achieve acceptance within the networking community and to allow interoperability with other software. The original PCAP format, however, cannot describe anything except packets. Fortunately, a new format, PCAP-NG (PCAP Next Generation), is on the horizon, with initial support in libpcap, Wireshark, and other analysis software.

PCAP-NG uses a generic block format that allows nearly any type of data to be dumped in addition to packets. Applications that do not support a particular block type can skip it and continue processing the blocks they do support. This makes it possible to include process information alongside packets while remaining compatible with applications that lack process support.
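Every block starts with its type and total length, so a reader can step over blocks it does not understand. Below is a minimal Python sketch of that skipping logic. It assumes a single section written in little-endian byte order and omits byte-order detection from the Section Header Block. The helper names `make_block` and `iter_blocks` are hypothetical.

```python
import struct

def make_block(block_type: int, body: bytes) -> bytes:
    """Wrap a body (already padded to 32 bits) in the generic block framing:
    Block Type, Block Total Length, body, Block Total Length."""
    total = 12 + len(body)
    return struct.pack("<II", block_type, total) + body + struct.pack("<I", total)

def iter_blocks(data: bytes, supported: set):
    """Yield (block_type, body) for supported block types and skip all others."""
    offset = 0
    while offset + 12 <= len(data):
        btype, total = struct.unpack_from("<II", data, offset)
        if total < 12 or total % 4 or offset + total > len(data):
            raise ValueError(f"bad Block Total Length {total} at offset {offset}")
        if btype in supported:
            yield btype, data[offset + 8 : offset + total - 4]
        # Unsupported types (e.g. 0x101/0x102 for a reader without process
        # support) are stepped over using only the Block Total Length.
        offset += total
```

A legacy reader that asks only for Enhanced Packet Blocks (type 6) never sees Hone's process or connection blocks.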

At least two pieces of data must be captured to correlate packets with processes. A third piece is optional, but it provides additional information and simplifies capture in the Linux kernel. The first required piece is the packet data itself. Its format is already well defined, but correlating a packet requires one more piece of information: the identifier of the process it belongs to. That leads to the second required piece: process information. The third, optional piece is connection information. Collecting it does not directly help correlate packets with processes, but it provides information that may be impossible to glean from the packet data alone, such as when a socket was created and destroyed.

Below is a description of how the PCAP-NG format is extended to support the data required by Hone. See the PCAP-NG draft specification for details, as the sections below build on ideas and terminology from that document.

Note: The timestamps in the process and connection event blocks below are treated the same way as timestamps in Enhanced Packet Blocks. Such timestamps depend on an Interface Description Block for their offset, resolution, and time zone. Unlike Enhanced Packet Blocks, however, the process and connection event blocks have no Interface ID field. Instead, the optional if_index option (code 20) names the interface to use. If if_index is absent, the first Interface Description Block is assumed.
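Here is a sketch of how such a timestamp might be interpreted. It recombines the high and low halves and scales the result using the if_tsresol and if_tsoffset values of the Interface Description Block chosen for the timestamp. The resolution rule follows the PCAP-NG draft: with the most significant bit of if_tsresol clear, the rest is a negative power of 10; with it set, a negative power of 2. The default is 6 (microseconds). The function names are hypothetical, and if_tzone is ignored.

```python
from fractions import Fraction

def split_timestamp(ticks: int):
    """Split a 64-bit tick count into (Timestamp (High), Timestamp (Low))."""
    return ticks >> 32, ticks & 0xFFFFFFFF

def timestamp_seconds(ts_high: int, ts_low: int, if_tsresol: int = 6,
                      if_tsoffset: int = 0) -> Fraction:
    """Convert a split timestamp to seconds, using the resolution and offset
    of the interface chosen for this block."""
    ticks = (ts_high << 32) | ts_low
    exponent = if_tsresol & 0x7F
    # MSB clear: resolution is 10**-exponent; MSB set: 2**-exponent.
    base = 2 if if_tsresol & 0x80 else 10
    # if_tsoffset is a whole number of seconds added to every timestamp.
    return Fraction(ticks, base ** exponent) + if_tsoffset
```

An exact `Fraction` avoids losing sub-microsecond precision that a float might drop for large tick counts.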

Section Information

Hone uses the Section Header Block defined in Section 3.1, with additional Hone-specific options. In addition to the options defined in Section 3.1 (Section Header Block Options), the following options are valid within this block:

| Name | Code | Length | Description | Example(s) |
|------|------|--------|-------------|------------|
| shb_host_id | 257 | Variable | This option specifies the unique host ID. The first byte indicates the ID type, while the following bytes contain the actual identifier, the size of which is determined by the ID type. The ID type byte can be 0 (a UTF-8 string) or 1 (a 16-byte binary GUID in native endianness and prepended with 3 bytes of zeros). | 0 "myhost.mydomain.tld"; 1 0 0 0 21EC2020-3AEA-1069-A2DD-08002B30309D |

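If host ID type 1 (GUID) is used, the type byte must be followed by three zero bytes so that the GUID is aligned on a 32-bit boundary. This takes no extra space in the output, because option values are padded to a 32-bit boundary anyway: a 1-byte type plus a 16-byte GUID would be padded to 20 bytes. The zero bytes keep the data aligned, which simplifies reading into and writing from structures.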
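The sketch below encodes both host ID forms to make the option layout concrete. It assumes a little-endian section. It also reads the GUID's "native endianness" as the Windows GUID layout on a little-endian host, which is what Python's `uuid.UUID.bytes_le` produces; that reading is an assumption. The helper names are hypothetical.

```python
import struct
import uuid

SHB_HOST_ID = 257

def encode_option(code: int, value: bytes) -> bytes:
    """Encode one option: 16-bit code, 16-bit length, value padded to 32 bits."""
    pad = (-len(value)) % 4
    return struct.pack("<HH", code, len(value)) + value + b"\x00" * pad

def host_id_string(name: str) -> bytes:
    # ID type 0: the UTF-8 host name follows the type byte directly.
    return encode_option(SHB_HOST_ID, b"\x00" + name.encode("utf-8"))

def host_id_guid(guid: uuid.UUID) -> bytes:
    # ID type 1: type byte, three zero bytes, then the 16-byte GUID, so the
    # GUID itself starts on a 32-bit boundary within the option value.
    return encode_option(SHB_HOST_ID, b"\x01\x00\x00\x00" + guid.bytes_le)
```

Both example IDs occupy 20 value bytes. Without the three zero bytes, the GUID form would be 17 bytes and would still be padded to 20.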

Process Information

Process information requires a new PCAP-NG block type as defined below.

Process Event Block

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +---------------------------------------------------------------+
  0 |                    Block Type = 0x00000101                    |
    +---------------------------------------------------------------+
  4 |                      Block Total Length                       |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  8 |                          Process ID                           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 12 |                        Timestamp (High)                       |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 16 |                        Timestamp (Low)                        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 20 /                                                               /
    /                      Options (variable)                       /
    /                                                               /
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                      Block Total Length                       |
    +---------------------------------------------------------------+

The meaning of the fields is:

  • Block Type: the block type of the Process Event Block is 257.
  • Block Total Length: total size of this block, as described in Section 2.1 (General Block Structure).
  • Process ID: the process ID (PID) as given by the OS.
  • Timestamp (High) and Timestamp (Low): the high and low 32 bits of a 64-bit timestamp, as described in Section 3.3 (Enhanced Packet Block). If present, the if_index option specifies the index of the interface to use when interpreting the timestamp.
  • Options: optionally, a list of options (formatted according to the rules defined in Section 2.5 (Options)) can be present.
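Here is a minimal sketch of packing a Process Event Block according to the layout above. It assumes a little-endian section and an option list that has already been encoded. `process_event_block` is a hypothetical helper, not part of Hone.

```python
import struct

PROCESS_EVENT_BLOCK = 0x00000101

def process_event_block(pid: int, ticks: int, options: bytes = b"") -> bytes:
    """Pack a Process Event Block.

    `ticks` is the 64-bit timestamp in units of the interface's if_tsresol;
    `options` is an already-encoded option list whose length is a
    multiple of 4 (empty, or ending in opt_endofopt).
    """
    if len(options) % 4:
        raise ValueError("options must be padded to 32 bits")
    # Process ID, Timestamp (High), Timestamp (Low), then the options.
    body = struct.pack("<III", pid, ticks >> 32, ticks & 0xFFFFFFFF) + options
    total = 8 + len(body) + 4  # type + length, body, trailing length
    return (struct.pack("<II", PROCESS_EVENT_BLOCK, total) + body
            + struct.pack("<I", total))
```

With no options, the block is 24 bytes: the 20 bytes at offsets 0 through 19 in the diagram, plus the trailing Block Total Length.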
In addition to the options defined in Section 2.5 (Options), the following options are valid within this block:

| Name | Code | Length | Description | Example(s) |
|------|------|--------|-------------|------------|
| proc_event | 2 | 4 | A value describing the state of the process; see the list of proc_event values below. | 1 |
| proc_path | 3 | Variable | A UTF-8 string containing the full path to the executable. | "/usr/bin/bash" |
| proc_argv | 4 | Variable | Zero-terminated UTF-8 strings containing the argv components. | "bash\0--login\0" |
| proc_ppid | 5 | 4 | Process ID (PID) of the parent. | 1 |
| proc_uid | 6 | 4 | Effective user ID of the executing user. | 1000 |
| proc_gid | 7 | 4 | Effective group ID. | 100 |
| proc_user | 8 | Variable | Effective user name of the executing user. | "jdoe" |
| proc_group | 9 | Variable | Effective group name. | "users" |
| proc_sid | 10 | Variable | A UTF-8 string containing the Windows Security ID (SID) of the process owner. | "S-1-5-21-1180699209-877415012-3182924384-1004" |
| proc_raw_args | 11 | Variable | A UTF-8 string containing the raw Windows process argument string. | "notepad  C:\Users\user\Documents\Mailing List.txt" |
| if_index | 20 | 4 | The index of the interface used to interpret the timestamp (using if_tsresol, if_tsoffset, and if_tzone). If missing, the first interface is assumed. | 2 |

If the proc_event option (code 2) is not included, the event is assumed to be process started (0x00000000).

The 16-bit option length field limits a single option value to 65,535 bytes, and some systems allow paths and command-line parameters to exceed that. The proc_path and proc_argv options may therefore be used multiple times within the same block, with the value spread across several entries of the same option type. When read, these entries should be concatenated in order to form the actual option value.
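Splitting and reassembling such values might look like the sketch below. It assumes a little-endian section, and the helper names are hypothetical. Only proc_path and proc_argv are concatenated; for any other code, the last value read wins, which is a simplification. Because pieces are joined before any decoding, a chunk boundary may safely fall inside a multi-byte UTF-8 character.

```python
import struct

PROC_PATH, PROC_ARGV = 3, 4
CONCATENATED = {PROC_PATH, PROC_ARGV}
MAX_OPT_LEN = 0xFFFF  # largest value a 16-bit Option Length can describe

def encode_split(code: int, value: bytes, chunk: int = MAX_OPT_LEN) -> bytes:
    """Spread one long value across as many same-code options as needed."""
    out = b""
    for start in range(0, len(value), chunk):
        part = value[start:start + chunk]
        out += struct.pack("<HH", code, len(part)) + part + b"\x00" * (-len(part) % 4)
    return out

def parse_options(raw: bytes) -> dict:
    """Parse an option list into {code: bytes}, concatenating repeated
    proc_path/proc_argv entries in the order they appear."""
    opts, offset = {}, 0
    while offset + 4 <= len(raw):
        code, length = struct.unpack_from("<HH", raw, offset)
        if code == 0:  # opt_endofopt
            break
        value = raw[offset + 4 : offset + 4 + length]
        if code in CONCATENATED:
            opts[code] = opts.get(code, b"") + value
        else:
            opts[code] = value
        offset += 4 + length + (-length % 4)  # skip value and its padding
    return opts
```

Splitting the proc_argv value on `b"\0"` then yields the individual argv components.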

List of proc_event values:

  • 0x00000000 - process started (includes exec and post-CreateProcess)
  • 0x00000001 - child process spawned (includes fork and CreateProcess)
  • 0x00000002 - kernel thread created
  • 0xFFFFFFFF - process ended
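The proc_event values and the "started" default fit naturally in a small enumeration. This sketch assumes the options have already been parsed into a dictionary that maps option codes to raw little-endian bytes. `ProcEvent` and `proc_event` are hypothetical names. A value not in the list raises `ValueError`.

```python
import struct
from enum import IntEnum

class ProcEvent(IntEnum):
    STARTED = 0x00000000   # exec and post-CreateProcess
    SPAWNED = 0x00000001   # fork and CreateProcess
    KTHREAD = 0x00000002   # kernel thread created
    ENDED = 0xFFFFFFFF     # process ended

PROC_EVENT = 2

def proc_event(options: dict) -> ProcEvent:
    """Return the block's event, defaulting to STARTED when the proc_event
    option (code 2) is absent."""
    raw = options.get(PROC_EVENT)
    if raw is None:
        return ProcEvent.STARTED
    (value,) = struct.unpack("<I", raw)
    return ProcEvent(value)
```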

Connection Information (Optional)

Connection information requires a new PCAP-NG block type as defined below.

Connection Event Block

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +---------------------------------------------------------------+
  0 |                    Block Type = 0x00000102                    |
    +---------------------------------------------------------------+
  4 |                      Block Total Length                       |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  8 |                        Connection ID                          |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 12 |                          Process ID                           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 16 |                        Timestamp (High)                       |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 20 |                        Timestamp (Low)                        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 24 /                                                               /
    /                      Options (variable)                       /
    /                                                               /
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                      Block Total Length                       |
    +---------------------------------------------------------------+

The meaning of the fields is:

  • Block Type: the block type of the Connection Event Block is 258.
  • Block Total Length: total size of this block, as described in Section 2.1 (General Block Structure).
  • Connection ID: an identifier for the connection. The ID is unique only over the lifetime of the connection.
  • Process ID: the process ID (PID) of the process associated with the connection.
  • Timestamp (High) and Timestamp (Low): the high and low 32 bits of a 64-bit timestamp, as described in Section 3.3 (Enhanced Packet Block). If present, the if_index option specifies the index of the interface to use when interpreting the timestamp.
  • Options: optionally, a list of options (formatted according to the rules defined in Section 2.5 (Options)) can be present.
In addition to the options defined in Section 2.5 (Options), the following options are valid within this block:

| Name | Code | Length | Description | Example(s) |
|------|------|--------|-------------|------------|
| conn_event | 2 | 4 | Value describing the event: 0x00 = open, 0xFFFFFFFF = close (others?). | 0xFFFFFFFF |
| if_index | 20 | 4 | The index of the interface used to interpret the timestamp (using if_tsresol, if_tsoffset, and if_tzone). If missing, the first interface is assumed. | 2 |

If conn_event is not specified, it is assumed to be open (0x00).
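A decoder for the Connection Event Block that applies the "open" default could look like this sketch. It assumes a little-endian section, reads only the conn_event option, and uses hypothetical names.

```python
import struct

CONNECTION_EVENT_BLOCK = 0x00000102
CONN_EVENT = 2
CONN_OPEN, CONN_CLOSE = 0x00000000, 0xFFFFFFFF

def parse_connection_event(block: bytes) -> dict:
    """Decode the fixed fields and conn_event of a Connection Event Block."""
    btype, total = struct.unpack_from("<II", block, 0)
    if btype != CONNECTION_EVENT_BLOCK or total != len(block):
        raise ValueError("not a complete Connection Event Block")
    conn_id, pid, ts_high, ts_low = struct.unpack_from("<IIII", block, 8)
    event, offset = CONN_OPEN, 24          # options start at offset 24
    while offset + 4 <= total - 4:         # stop before the trailing length
        code, length = struct.unpack_from("<HH", block, offset)
        if code == 0:                      # opt_endofopt
            break
        if code == CONN_EVENT and length == 4:
            (event,) = struct.unpack_from("<I", block, offset + 4)
        offset += 4 + length + (-length % 4)
    return {"conn_id": conn_id, "pid": pid,
            "ticks": (ts_high << 32) | ts_low, "event": event}
```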

Packet Information

Packet information uses the Enhanced Packet Block defined in Section 3.3, with additional Hone-specific options. In addition to the options defined in Section 3.3 (Enhanced Packet Block Options), the following options are valid within this block:

| Name | Code | Length | Description | Example(s) |
|------|------|--------|-------------|------------|
| epb_connid | 257 | 4 | Connection ID of connection associated with the packet. | 4816326 |
| epb_pid | 258 | 4 | Process ID of process associated with the packet. | 219 |
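These two options are what make packet-process correlation possible. A reader can track process events in file order and look up each packet's epb_pid. The sketch below works on already-decoded records, using hypothetical `ProcessEvent` and `Packet` types. It records a path only when a process event carries proc_path, and it forgets the entry when the process ends, because PIDs can be reused.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union

PROC_ENDED = 0xFFFFFFFF  # proc_event value for "process ended"

@dataclass
class ProcessEvent:
    """Hypothetical decoded Process Event Block."""
    pid: int
    event: int
    path: str = ""  # proc_path, if the block carried one

@dataclass
class Packet:
    """Hypothetical decoded Enhanced Packet Block."""
    data: bytes
    epb_pid: Optional[int] = None
    epb_connid: Optional[int] = None

def attribute_packets(records: List[Union[ProcessEvent, Packet]]
                      ) -> List[Tuple[Packet, Optional[str]]]:
    """Pair each packet with the executable path of the process named by its
    epb_pid option, using only process events seen earlier in the file."""
    live: Dict[int, str] = {}
    result = []
    for rec in records:
        if isinstance(rec, ProcessEvent):
            if rec.event == PROC_ENDED:
                live.pop(rec.pid, None)   # the PID may be reused later
            elif rec.path:
                live[rec.pid] = rec.path
        else:
            # Packets without epb_pid (None) are never found in `live`.
            result.append((rec, live.get(rec.epb_pid)))
    return result
```

In this sketch, a child reported by a spawn event without proc_path stays unattributed. A fuller reader might inherit the parent's path through proc_ppid, or key connection events by epb_connid in the same way.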

Event Statistics

Process and connection event statistics augment the Interface Statistics Block defined in Section 3.7 with additional Hone-specific statistics. In addition to the options defined in Section 3.7 (Interface Statistics Block Options), the following options are valid within this block:

| Name | Code | Length | Description | Example(s) |
|------|------|--------|-------------|------------|
| isb_proc_recv | 257 | 8 | Number of process events (only incremented when one or more readers are attached). | 100 |
| isb_proc_drop | 258 | 8 | Number of process events dropped by OS due to lack of resources. | 2 |
| isb_proc_rdrdrop | 259 | 8 | Number of process events dropped by reader due to lack of buffer space starting from the beginning of the capture. | 1 |
| isb_proc_usrdeliv | 260 | 8 | Number of process events delivered to the reader starting from the beginning of the capture. | 99 |
| isb_conn_recv | 261 | 8 | Number of connection events (only incremented when one or more readers are attached). | 100 |
| isb_conn_drop | 262 | 8 | Number of connection events dropped by OS due to lack of resources. | 2 |
| isb_conn_rdrdrop | 263 | 8 | Number of connection events dropped by reader due to lack of buffer space starting from the beginning of the capture. | 1 |
| isb_conn_usrdeliv | 264 | 8 | Number of connection events delivered to the reader starting from the beginning of the capture. | 99 |

The isb_ifrecv and isb_ifdrop values are initialized to zero when the sensor module is first loaded and incremented for packets received and dropped only while readers are attached.
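Here is a sketch of pulling Hone's counters out of an Interface Statistics Block option list. It assumes a little-endian section and 64-bit values as in the table above. `hone_stats` is a hypothetical helper, and standard options such as isb_ifrecv are skipped.

```python
import struct

# Hone statistics option codes from the table above; all values are 64-bit.
HONE_ISB = {
    257: "proc_recv", 258: "proc_drop", 259: "proc_rdrdrop", 260: "proc_usrdeliv",
    261: "conn_recv", 262: "conn_drop", 263: "conn_rdrdrop", 264: "conn_usrdeliv",
}

def hone_stats(options: bytes) -> dict:
    """Extract Hone counters from an ISB option list, skipping other options."""
    stats, offset = {}, 0
    while offset + 4 <= len(options):
        code, length = struct.unpack_from("<HH", options, offset)
        if code == 0:  # opt_endofopt
            break
        if code in HONE_ISB and length == 8:
            (stats[HONE_ISB[code]],) = struct.unpack_from("<Q", options, offset + 4)
        offset += 4 + length + (-length % 4)
    return stats
```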