From 036e821c565a143db80391723e80f6c456ca86f0 Mon Sep 17 00:00:00 2001
From: Stella Laurenzo
Date: Sat, 20 Apr 2024 20:04:22 -0700
Subject: [PATCH] Tweak

---
 docs/programming_guide.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/programming_guide.md b/docs/programming_guide.md
index fa2d73a58..ba64aa0c9 100644
--- a/docs/programming_guide.md
+++ b/docs/programming_guide.md
@@ -51,11 +51,13 @@ usage in a few key ways:
    a. `PrimitiveInferenceTensor`: Simply backed by a PyTorch tensor (typically
       from a memory mapped array in a `Dataset` on storage but can be arbitrary).
+
    b. Packed `QuantizedTensor`: These tensors are backed by a single at-rest
       PyTorch tensor with a specific manner of packing scheme, logically
       represented by a `Layout`. In practice, each GGUF quantization scheme
       has a distinct type of packed `QuantizedTensor` implementation. It is an
       open world, and arbitrary implementations are easily created.
+
    c. Planar `QuantizedTensor`: These tensors are backed by an arbitrary
       dictionary of tensors (i.e. "planes"), logically represented by a
       `Layout`. Typically, packed `QuantizedTensors` can be converted to
       planar form.
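
As a rough illustration of the distinction the patched doc text draws between a packed `QuantizedTensor` (one at-rest buffer interpreted through a `Layout`) and a planar `QuantizedTensor` (a dictionary of plane tensors), here is a minimal Python sketch. This is not the project's API: the class names `PackedQuantizedTensor`, `PlanarQuantizedTensor`, and `BlockScaledLayout`, the plane names `d`/`qs`, and the `to_planar`/`dequantize` methods are all hypothetical, chosen only to show how a single packed buffer can be reinterpreted as planes via a layout.

```python
# Hypothetical sketch only; not the library's API. All names are illustrative
# assumptions, not the types described in this patch's documentation.
from dataclasses import dataclass

import torch


@dataclass
class BlockScaledLayout:
    """Logical layout: blocks of `block_size` values sharing one scale."""
    block_size: int


@dataclass
class PlanarQuantizedTensor:
    """Backed by an arbitrary dictionary of tensors ("planes")."""
    layout: BlockScaledLayout
    planes: dict  # e.g. {"d": per-block scales, "qs": quantized values}

    def dequantize(self) -> torch.Tensor:
        d = self.planes["d"]    # shape (num_blocks, 1)
        qs = self.planes["qs"]  # shape (num_blocks, block_size)
        return (d * qs.to(d.dtype)).reshape(-1)


@dataclass
class PackedQuantizedTensor:
    """Backed by a single at-rest tensor packed according to the layout."""
    layout: BlockScaledLayout
    raw: torch.Tensor  # flat buffer: [scale, values..., scale, values..., ...]

    def to_planar(self) -> PlanarQuantizedTensor:
        # Reinterpret the packed buffer as per-block planes.
        blocks = self.raw.reshape(-1, self.layout.block_size + 1)
        return PlanarQuantizedTensor(
            layout=self.layout,
            planes={"d": blocks[:, :1], "qs": blocks[:, 1:]},
        )


if __name__ == "__main__":
    layout = BlockScaledLayout(block_size=4)
    raw = torch.tensor([0.5, 1, 2, 3, 4, 0.25, 4, 8, 12, 16])
    packed = PackedQuantizedTensor(layout, raw)
    print(packed.to_planar().dequantize())
    # tensor([0.5000, 1.0000, 1.5000, 2.0000, 1.0000, 2.0000, 3.0000, 4.0000])
```

Under these assumptions, the packed form stays memory-map friendly (one contiguous buffer on disk), while the planar form exposes named component tensors that kernels can consume directly; the shared `Layout` is what gives both forms the same logical meaning.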