Autogenerated Bytewise SIMD-Optimized Look-Up Tables

fuzzypixelz/absolut

Absolut

Absolut stands for "Autogenerated Bytewise SIMD-Optimized Look-Up Tables". The following is a breakdown of this jargon:

  • Bytewise Lookup Table: A mapping from input bytes to output bytes.
  • SIMD-Optimized: Said lookup tables are implemented using SIMD (Single Instruction Multiple Data) instructions, such as PSHUFB on x86_64 and TBL on AArch64.
  • Autogenerated: This crate utilizes procedural macros to generate (if possible) SIMD lookup tables given a human-readable byte-to-byte mapping.

Why?

SIMD instructions allow for greater data parallelism when performing table lookups on bytes. This has proved incredibly useful for high-performance data processing.

Unfortunately, SIMD table lookup instructions (or byte shuffling instructions) operate on tables too small to cover the entire 8-bit integer space, which has 256 values. These tables typically hold 16 entries on x86_64, while AArch64 supports tables of up to 64 entries.
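To make the size limit concrete, here is a minimal scalar model of what a 16-lane byte shuffle with PSHUFB semantics computes. The function name `shuffle16` is made up for this illustration and is not part of Absolut.

```rust
/// Scalar model of a 16-lane byte shuffle (PSHUFB semantics).
/// Each index byte selects one of the 16 table entries using only its
/// low 4 bits; an index with its top bit set produces 0 instead.
fn shuffle16(table: &[u8; 16], indices: &[u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for (o, &i) in out.iter_mut().zip(indices.iter()) {
        *o = if i & 0x80 != 0 {
            0
        } else {
            table[(i & 0x0F) as usize]
        };
    }
    out
}
```

Because only four index bits are used, the input bytes 0x0C and 0x2C select the same table entry. A single shuffle therefore cannot tell all 256 byte values apart, which is why lookups over full bytes have to be built from several small tables.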

This library facilitates the generation of SIMD lookup tables from high-level descriptions of byte-to-byte mappings. The goal is to avoid the need to hardcode manually-computed SIMD lookup tables, thus enabling a wider audience to utilize these techniques more easily.

How?

Absolut is essentially a set of procedural macros that accept byte-to-byte mapping descriptions in the form of Rust enums:

#[absolut::one_hot]
pub enum JsonTable {
    #[matches(b',')]
    Comma,
    #[matches(b':')]
    Colon,
    #[matches(b'[', b']', b'{', b'}')]
    Brackets,
    #[matches(b'\r', b'\n', b'\t')]
    Control,
    #[matches(b' ')]
    Space,
    #[wildcard]
    Other,
}

The above JsonTable enum encodes the following byte-to-byte mapping:

| Input                  | Output   |
|------------------------|----------|
| 0x2C                   | Comma    |
| 0x3A                   | Colon    |
| 0x5B, 0x5D, 0x7B, 0x7D | Brackets |
| 0x0D, 0x0A, 0x09       | Control  |
| 0x20                   | Space    |
| *                      | Other    |

Where * denotes all other bytes not explicitly mapped.
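For reference, the same mapping can be written as a plain scalar function. This is only a sketch for comparison: `JsonClass` and `classify` are hypothetical names introduced here, not items generated by Absolut.

```rust
/// Hypothetical stand-in for the output classes of `JsonTable`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum JsonClass {
    Comma,
    Colon,
    Brackets,
    Control,
    Space,
    Other,
}

/// Scalar reference for the mapping in the table above.
fn classify(byte: u8) -> JsonClass {
    match byte {
        b',' => JsonClass::Comma,
        b':' => JsonClass::Colon,
        b'[' | b']' | b'{' | b'}' => JsonClass::Brackets,
        b'\r' | b'\n' | b'\t' => JsonClass::Control,
        b' ' => JsonClass::Space,
        // The wildcard row: every byte not listed explicitly.
        _ => JsonClass::Other,
    }
}
```

A function like this processes one byte per call, whereas a SIMD table lookup classifies a whole vector of bytes at once. It can also serve as an oracle when testing a SIMD lookup routine.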

Output values need not be defined explicitly: Absolut solves for them automatically. In the previous code snippet, the expression JsonTable::Space as u8 evaluates to the output byte produced by a table lookup on 0x20.

Absolut supports multiple techniques, called algorithms, for constructing SIMD lookup tables. Each algorithm is implemented as a procedural macro that accepts a byte-to-byte mapping described as an enum with attribute-annotated variants, as illustrated above with the absolut::one_hot algorithm.
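As an illustration of how such a table can be built, the sketch below uses a common nibble-split technique: one 16-entry table indexed by the low nibble, one indexed by the high nibble, and a bitwise AND of the two results, with each class assigned its own bit. Whether absolut::one_hot works exactly this way is an assumption. The names `CLASSES`, `build_tables`, `lookup`, and `is_exact` are hypothetical, and the bit assignment is chosen by hand rather than solved for.

```rust
/// Byte sets of the five explicit classes; class `k` is reported as bit `1 << k`.
/// Bytes in no class (the wildcard) are reported as 0.
const CLASSES: [&[u8]; 5] = [b",", b":", b"[]{}", b"\r\n\t", b" "];

/// Builds two 16-entry tables, indexed by the low and the high nibble.
fn build_tables() -> ([u8; 16], [u8; 16]) {
    let (mut lo, mut hi) = ([0u8; 16], [0u8; 16]);
    for (k, class) in CLASSES.iter().enumerate() {
        for &b in class.iter() {
            lo[(b & 0x0F) as usize] |= 1u8 << k;
            hi[(b >> 4) as usize] |= 1u8 << k;
        }
    }
    (lo, hi)
}

/// Scalar version of the lookup: AND the two nibble-indexed entries.
fn lookup(lo: &[u8; 16], hi: &[u8; 16], byte: u8) -> u8 {
    lo[(byte & 0x0F) as usize] & hi[(byte >> 4) as usize]
}

/// Checks all 256 bytes: the lookup must return exactly the class bit,
/// or 0 for bytes that belong to no class.
fn is_exact(lo: &[u8; 16], hi: &[u8; 16]) -> bool {
    (0..=255u8).all(|b| {
        let expected = CLASSES
            .iter()
            .position(|c| c.contains(&b))
            .map_or(0, |k| 1u8 << k);
        lookup(lo, hi, b) == expected
    })
}
```

Under this scheme a class is only representable if its bytes form every combination of its low nibbles with its high nibbles. A single class containing both 0x2C and 0x3A, for example, would also match 0x2A and 0x3C, which is one way a mapping can turn out to be unsatisfiable for a given technique.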

Known issues

Error messages

If a byte-to-byte mapping cannot be implemented using a given Absolut algorithm (i.e. the table is unsatisfiable), the resulting error messages do not explain why the algorithm failed to solve for the table. Unless the user is at least vaguely familiar with how the algorithm in question works, it will be difficult for them to figure out how to change the mapping so that it becomes satisfiable and stays useful for their purposes.

SIMD lookup routines

Absolut currently does not provide SIMD implementations of lookup routines for the generated lookup tables. However, the library tests contain lookup routines for SSSE3 and NEON.
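Until such routines are provided, a lookup over 16 input bytes might look like the sketch below. It is not taken from the library tests. It assumes a table layout of two 16-byte tables indexed by the low and high nibble and combined with a bitwise AND, which may differ from what a given Absolut algorithm generates. The function names are hypothetical; the intrinsics are the standard ones from `std::arch::x86_64`.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Scalar fallback: AND of the low-nibble and high-nibble table entries.
fn lookup16_scalar(lo: &[u8; 16], hi: &[u8; 16], input: &[u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for (o, &b) in out.iter_mut().zip(input.iter()) {
        *o = lo[(b & 0x0F) as usize] & hi[(b >> 4) as usize];
    }
    out
}

/// SSSE3 version: two PSHUFB lookups per 16 input bytes.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "ssse3")]
unsafe fn lookup16_ssse3(lo: &[u8; 16], hi: &[u8; 16], input: &[u8; 16]) -> [u8; 16] {
    unsafe {
        let lo_t = _mm_loadu_si128(lo.as_ptr() as *const __m128i);
        let hi_t = _mm_loadu_si128(hi.as_ptr() as *const __m128i);
        let v = _mm_loadu_si128(input.as_ptr() as *const __m128i);
        let nibble_mask = _mm_set1_epi8(0x0F);
        // Low nibble of every byte.
        let lo_idx = _mm_and_si128(v, nibble_mask);
        // High nibble: shift 16-bit lanes right by 4, then mask off stray bits.
        let hi_idx = _mm_and_si128(_mm_srli_epi16::<4>(v), nibble_mask);
        let r = _mm_and_si128(
            _mm_shuffle_epi8(lo_t, lo_idx),
            _mm_shuffle_epi8(hi_t, hi_idx),
        );
        let mut out = [0u8; 16];
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, r);
        out
    }
}

/// Dispatches to SSSE3 when available, otherwise to the scalar fallback.
fn lookup16(lo: &[u8; 16], hi: &[u8; 16], input: &[u8; 16]) -> [u8; 16] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("ssse3") {
            // SAFETY: SSSE3 support was just checked at run time.
            return unsafe { lookup16_ssse3(lo, hi, input) };
        }
    }
    lookup16_scalar(lo, hi, input)
}
```

A NEON variant would follow the same shape, with the two shuffles replaced by TBL-based lookups. Comparing the SIMD path against the scalar fallback on every input is a cheap way to validate hand-written routines like this one.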

License

Absolut is open-source software licensed under the terms of the MIT License.