# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## 2.9.1 - 2024-11-27

### Fixed

- `TensorSerializer` no longer sometimes fails to serialize very large
  1-dimensional tensors with multi-byte `dtype`s
- `RedisStreamFile.readable()` and `RedisStreamFile.seekable()` now correctly
  return `True`
## 2.9.0 - 2024-04-17

### Added

- Multiple file readers during deserialization (#87)
  - Controlled by the new `num_readers` `int` parameter to the
    `TensorDeserializer` constructor
  - Files capable of having multiple readers opened to the same source can
    make use of this parameter to increase deserialization speed
  - Files on the filesystem and HTTP(S) & S3 streams from
    `stream_io.open_stream` are eligible to be reopened this way
  - The default number of readers is dynamic based on the type of file used
  - To disable concurrent readers, pass `num_readers=1` as a parameter
- Structured object serialization (#115)
  - `TensorSerializer.write_state_dict` can now write nested mappings,
    sequences, and other mixtures of mappings and sequences nested in each
    other
  - When accessing an object serialized this way with a `TensorDeserializer`,
    sequences are converted to mappings with integer keys
  - `TensorDeserializer.tree` allows converting the deserialized objects back
    to a compatible collection type
    - Serialized as a sequence → `collections.abc.Sequence`
    - Serialized as a mapping → `collections.abc.Mapping`
  - For more information, see:
    - The `TensorSerializer.write_state_dict` docstring
    - The `TensorDeserializer.tree` docstring
    - PR #115
- Configurable CPU concurrency limit during serialization
  - Controlled by the new `limit_cpu_concurrency` `int` parameter to the
    `TensorSerializer` constructor
- New optional keyword parameters to `stream_io.open_stream`:
  - Object storage connection settings `s3_region_name` &
    `s3_signature_version`
  - File byte range markers `start` & `end`
    - `start` applies to all files and streams
    - `end` applies only to HTTP(S) & S3 streams, for which it is interpreted
      together with `start` as the `start` and `end` parameters for the
      created `CURLStreamFile` object
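As a pure-Python sketch (not tensorizer code) of the `start`/`end` byte-range
markers described above, applied to a local file; the helper name and the
assumption that `end` is an exclusive offset are illustrative only:

```python
def read_range(path, start=0, end=None):
    """Read bytes [start, end) from a local file.

    Hypothetical helper for illustration; treating `end` as exclusive is an
    assumption here, not a statement about tensorizer's API.
    """
    with open(path, "rb") as f:
        f.seek(start)  # `start` skips ahead, as it does for all streams
        if end is None:
            return f.read()  # no `end`: read through to EOF
        return f.read(end - start)  # bounded read up to `end`
```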
### Changed

- The `plaid_mode` and `plaid_mode_buffers` parameters to `TensorDeserializer`
  no longer have an effect
  - The previous default behaviour (`plaid_mode=True` wherever available) is
    now always applied
- Serialization performance has been improved
- `TensorDeserializer.read_tensors` now returns tensors on the target device,
  and functions more efficiently
  - Previously, the returned values were always on the CPU
- `TensorDeserializer.read_tensors`' behaviour is no longer affected by the
  position of the file descriptor at the time of the call
  - Sequential calls to `read_tensors` still read consecutive parts of the
    file
- Importing `tensorizer` doesn't implicitly initialize `torch.cuda` whenever
  a GPU is available
  - This allows forking after importing `tensorizer`, and using the library
    in a subprocess
- `TensorDeserializer.read_numpy_arrays` now throws an error when used with
  CUDA deserialization, since numpy arrays can't be deserialized to CUDA

### Fixed

- Fixed a bug where `stream_io.CURLStreamFile` objects constructed with an
  `end` parameter would read one byte past their end when calling
  `CURLStreamFile.read` with no argument
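The sequence-to-mapping conversion described under "Structured object
serialization" above can be illustrated in plain Python. This mirrors the
documented behaviour only; it is not the library's implementation, and
`to_int_keyed` / `from_int_keyed` are hypothetical names:

```python
def to_int_keyed(obj):
    """Mirror the documented read-back form: sequences become mappings
    with integer keys, recursively."""
    if isinstance(obj, (list, tuple)):
        return {i: to_int_keyed(v) for i, v in enumerate(obj)}
    if isinstance(obj, dict):
        return {k: to_int_keyed(v) for k, v in obj.items()}
    return obj

def from_int_keyed(obj):
    """Inverse sketch of what TensorDeserializer.tree enables: mappings
    whose keys are exactly 0..n-1 convert back to sequences."""
    if isinstance(obj, dict):
        converted = {k: from_int_keyed(v) for k, v in obj.items()}
        if converted and set(converted) == set(range(len(converted))):
            return [converted[i] for i in range(len(converted))]
        return converted
    return obj
```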
## 2.8.1 - 2024-02-15

### Changed

- Performance has been improved when serializing to some filesystems
  (e.g. NFS, CephFS) by skipping `fallocate` pre-allocation where it is not
  natively supported
  - Previously, `posix_fallocate`'s fallback behaviour was used, which wasted
    time writing out zeroes that would only be overwritten later

### Fixed

- `examples/hf_serialization.py` is now more robust when overwriting an
  existing serialized model in an object storage bucket
  - Previously, it would sometimes find and use outdated, cached data, and
    thus erroneously skip serialization and/or fail validation
## 2.8.0 - 2024-02-08

### Added

- Tensors on the `meta` device may now be serialized
  - These store no tensor data (only metadata) in the tensorized file
  - These have no hashes for their tensor data, since there is nothing to
    hash
  - These cannot have their data encrypted, since there is nothing to encrypt
  - During deserialization, these are returned as zero-filled buffers on the
    same device as other tensors
    - Essentially equivalent to `torch.zeros_like(meta_tensor, device=...)`

### Changed

- `TensorDeserializer` now defaults to `plaid_mode=True` when deserializing
  to CUDA devices for better performance
  - There is no difference between `plaid_mode`-deserialized tensors and
    regular deserialized tensors (beyond deserialization performance), so
    this is not a breaking change
- Removed incorrect warnings in the documentation about `plaid_mode` being
  unsafe

### Fixed

- Passing `include_non_persistent_buffers=False` to
  `TensorSerializer.write_module()` now works as intended
  - Previously, setting this flag to `False` filtered out both non-persistent
    buffers and parameters, leaving only persistent buffers
  - The corrected behaviour only filters out non-persistent buffers, leaving
    parameters untouched
- Very large individual tensors (over approximately 2147479552 bytes) now
  serialize correctly
  - Previously, anything over the limit for a single `write` or `pwrite`
    syscall could not be fully written, and an error was raised during
    serialization
  - Now, multiple writes are used
  - This also fixes large writes to unbuffered file-like objects if `pwrite`
    is not supported, as they would encounter the same issue
## 2.7.2 - 2024-01-30

### Fixed

- File objects opened with `stream_io.open_stream("s3://...", "wb")` for
  writing to object storage now correctly upload their content when closed
  implicitly at the end of a `with` block, without requiring an explicit call
  to their `.close()` method
  - Since `TensorSerializer` objects already call `.close()` explicitly on
    their output file objects, either when `TensorSerializer.close()` is
    invoked or when the `TensorSerializer` is garbage collected, this bug
    mainly applies to manual usage of `stream_io.open_stream()` for object
    storage uploads not involving a `TensorSerializer`
## 2.7.1 - 2023-12-06

### Fixed

- Fixed a bug where a `CURLStreamFile` would report itself as unreadable,
  causing HTTP(S) and S3 deserialization to fail
## 2.7.0 - 2023-12-06

### Added

- Tensor encryption
  - Refer to docs/encryption.md for details
  - Encrypts all tensor weights in a file with minimal overhead
  - Doesn't encrypt tensor metadata, such as:
    - Tensor name
    - Tensor `dtype`
    - Tensor shape & size
  - Requires an up-to-date version of `libsodium`
    - Use `apt-get install libsodium23` on Ubuntu or Debian
    - On other platforms, follow the installation instructions from the
      libsodium documentation
    - Takes up less than 500 KiB once installed
  - Uses a parallelized version of XSalsa20-Poly1305 as its encryption
    algorithm
    - Splits each tensor's weights into ≤ 2 MiB chunks, encrypted separately
  - Example usage: see examples/encryption.py
  - Example CLI tool to add or remove encryption from pre-serialized models:
    examples/encrypt_existing.py
- Added more error checking against deserializing corrupted files
- Added stricter error checking for file writes during serialization

### Fixed

- Fixed cases where the `pynvml` library was available on a node with no NVML
  devices
  - This allows CPU-only deployments to work with `pynvml` in the image
- Fixed serialization for tensors with discontiguous memory
- Fixed a bug where the `module_idx` on bulk serialized tensors was
  misaligned
  - During bulk writes (`write_module()`, `write_state_dict()`), each tensor
    was receiving the preceding one's `module_idx` instead of its own
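The ≤ 2 MiB chunking scheme noted above can be sketched in plain Python
(illustration only; the per-chunk encryption itself uses libsodium's
XSalsa20-Poly1305 and is not shown, and `split_chunks` is a hypothetical
helper name):

```python
CHUNK_SIZE = 2 * 1024 * 1024  # 2 MiB, the documented maximum chunk size

def split_chunks(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split tensor weight bytes into chunks of at most `chunk_size` bytes;
    each chunk would then be encrypted separately."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
```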
## 2.6.0 - 2023-10-30

### Added

- `TensorSerializer.write_module` now accepts
  `include_non_persistent_buffers` as a keyword-only boolean argument that
  can be set to `False` to exclude buffers from serialization that were
  originally registered to the module through calling
  `torch.nn.Module.register_buffer` with `persistent=False`
  - `torch.nn.Module.state_dict` never includes non-persistent buffers, so
    setting this to `False` will more closely match the behaviour of
    `state_dict` serialization
  - `TensorSerializer.write_module` used to always include non-persistent
    buffers
  - The default (`include_non_persistent_buffers=True`) matches the old
    behaviour
- `stream_io.open_stream` and `stream_io.CURLStreamFile` now accept an
  additional, optional `certificate_handling` argument to customize the
  verification of SSL certificates
  - This corresponds to the flags `--cacert`, `--capath`, and `-k` /
    `--insecure` in `curl`
  - Customization is achieved by passing an instance of `stream_io.CAInfo`
    to `open_stream` or the `CURLStreamFile` constructor
  - Example usages:
    - `open_stream("https://localhost/model.tensors", certificate_handling=CAInfo(cacert="./localhost.pem"))`
    - `open_stream("https://127.0.0.1/model.tensors", certificate_handling=CAInfo(allow_untrusted=True))`
  - Pass `certificate_handling=None` (the default) to use default
    certificate verification as compiled into cURL
## 2.5.1 - 2023-10-17

### Changed

- `TensorSerializer.write_state_dict` has been optimized to better match the
  speed of `TensorSerializer.write_module`
- Improved error tracebacks reported during bulk tensor deserialization

### Fixed

- Serializing to a buffered file-like object with a large buffer size no
  longer sometimes corrupts the resulting serialized file
## 2.5.0 - 2023-10-13

### Added

- `TensorDeserializer` now takes a `plaid_mode_buffers` argument specifying
  a fixed number of buffers to allocate when `plaid_mode=True`
  - Previously, `plaid_mode` used a single buffer
  - More buffers help when loading from very fast sources or when
    `verify_hash=True`
  - The new default number of buffers is contextual
    - 1 for HTTP/S3 streams
    - 2 for other streams (e.g. local files, Redis)
    - 8 when `verify_hash=True`
- `TensorDeserializer` objects can now be used as context managers to safely
  call `TensorDeserializer.close` when they are done being used

### Changed

- `TensorDeserializer` methods that load multiple tensors at a time are now
  faster
- `TensorDeserializer`'s `verify_hash` mode is much, much faster
- Specifying `plaid_mode=True` for a `TensorDeserializer` no longer implies
  (or requires) `lazy_load=True`
  - The old default behaviour can be restored by specifying both
    `plaid_mode=True, lazy_load=True`
- `plaid_mode` no longer prohibits accessing previously loaded tensors
- `dtype` conversion is more efficient for CUDA tensor deserialization
  - Conversions are now performed on-device rather than on the CPU
- CPU memory is now freed immediately after `TensorDeserializer`
  initialization for CUDA tensor deserialization when `lazy_load=False`

### Fixed

- `TensorDeserializer`'s `lazy_load` mode no longer eagerly allocates memory
  that is never used
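The contextual buffer defaults listed above can be summarized as a small
helper (hypothetical sketch assuming `verify_hash` takes precedence over the
stream type; the actual selection logic is internal to `TensorDeserializer`):

```python
def default_plaid_mode_buffers(stream_type: str, verify_hash: bool = False) -> int:
    """Return the documented default buffer count for plaid_mode."""
    if verify_hash:
        return 8  # hash verification benefits most from extra buffers
    if stream_type in ("http", "s3"):
        return 1
    return 2  # other streams, e.g. local files and Redis
```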
## 2.4.0 - 2023-10-05

### Added

- Support for `redis://` URIs in `stream_io.open_stream`
  - E.g. `redis://localhost:6379/mymodel`
- New `stream_io.RedisStreamFile` class
  - Similar to `stream_io.CURLStreamFile`
- `TensorDeserializer.to_redis` method for initially loading tensors into a
  Redis data store
- `force_http` parameter to `stream_io.open_stream` to downgrade an S3
  connection from HTTPS to HTTP
  - Warning! This will stream all data completely unencrypted
  - Warning! If accessing a private S3 bucket, this will also send your
    object-scoped access key to the server unencrypted
- `buffer_size` parameter to `stream_io.open_stream` to control the amount
  of data buffered in advance during HTTP(S) loading
  - Defaults to 16 MiB for HTTP(S) streams and 1 to 8 MiB for Redis streams
  - Previously, this was fixed at 256 MiB

### Changed

- `TensorSerializer.write_module` has been optimized further for a speedup
  of ~3.6x on CUDA modules and ~3.1x on CPU modules
- `redis` and `hiredis` are now required package dependencies

### Fixed

- `CURLStreamFile.response_headers` no longer has a chance to contain
  incomplete header information
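A `redis://` URI like the example above decomposes into connection parameters
as follows — a standard-library sketch with a hypothetical helper name;
`stream_io` performs its own parsing internally:

```python
from urllib.parse import urlparse

def parse_redis_uri(uri: str):
    """Split a redis:// URI into (host, port, key) for illustration."""
    parts = urlparse(uri)
    assert parts.scheme == "redis", "expected a redis:// URI"
    # The path component (minus its leading slash) names the stored model
    return parts.hostname, parts.port, parts.path.lstrip("/")
```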
## 2.3.0 - 2023-09-06

### Added

- `CURLStreamFile` now tracks response headers in
  `CURLStreamFile.response_headers`
  - This can be used to track cache hits and misses during deserialization
    through the `TensorDeserializer.cache_status` property
## 2.2.0 - 2023-09-05

### Changed

- Model serialization has been optimized for a speedup of approximately 2x
## 2.1.2 - 2023-08-17

### Changed

- Requests now include a custom `User-Agent` header specific to `tensorizer`
## 2.1.1 - 2023-08-10

### Added

- `verify_hash` parameter for `TensorDeserializer.read_tensors`
  - Matches the one for `TensorDeserializer.read_numpy_arrays`
## 2.1.0 - 2023-08-09

### Added

- Hash verification of deserialized models
  - During deserialization, specify `verify_hash=True` in either:
    - The `TensorDeserializer` constructor,
    - `TensorDeserializer.read_numpy_arrays`, or
    - `TensorDeserializer.load_into_module` (only while lazy loading)
  - Comparing a model already in memory against its `.tensors` file:
    `TensorDeserializer.verify_module`
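Conceptually, hash verification compares a digest stored at serialization
time against one recomputed while reading. A minimal sketch, using SHA-256
purely as a stand-in (the changelog does not specify tensorizer's actual
hash algorithm, and `verify` is a hypothetical helper):

```python
import hashlib

def verify(data: bytes, expected_digest: str) -> bool:
    """Recompute a digest over raw tensor bytes and compare it against the
    digest recorded at serialization time."""
    return hashlib.sha256(data).hexdigest() == expected_digest
```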
## 2.0.0 - 2023-06-07

### Added

- `bfloat16` and `complex32` support

### Changed

- Newly serialized files now use the `TENSORIZER_VERSION = 2` binary format
  - Format v2 allows for `bfloat16` and `complex32` dtypes to be stored
  - Existing format v1 files can still be deserialized
    (backwards-compatible)
- `TensorDeserializer`'s `dtype` parameter now only accepts the types
  `torch.dtype` and `None`
  - It previously accepted `numpy.dtype`, `str`, and `None`
- `TensorDeserializer.read_tensors` now yields `torch.Tensor` objects
  instead of `numpy.ndarray` objects
  - `TensorDeserializer.read_numpy_arrays` provides the old functionality
    - Will error when deserializing `bfloat16` or `complex32` by default,
      since they are not valid dtypes in `numpy`
    - The parameter `allow_raw_data` can be specified to read `bfloat16`
      and `complex32` arrays anyway, but with an invalid dtype

### Fixed

- `TensorDeserializer`'s `plaid_mode` now correctly implies `lazy_load`
## 1.1.0 - 2023-05-05

### Added

- Better docstrings for the public `tensorizer` interface
- More memory utilities in `utils`:
  - `MemoryUsage`: Same information as `get_mem_usage` as a structured type
  - `GlobalGPUMemoryUsage`: GPU information subset of `MemoryUsage`
  - `TorchGPUMemoryUsage`: Torch information subset of `MemoryUsage`
  - `CPUMemoryUsage`: CPU information subset of `MemoryUsage`
- `utils.no_init_or_tensor` can now be used as a context manager
## 1.0.1 - 2023-03-21

### Fixed

- Loading from public-read S3 buckets no longer requires blank credentials
  to be explicitly specified via `stream_io.open_stream`
## 1.0.0 - 2023-03-21

### Added

- `TensorSerializer` class
- `TensorDeserializer` class
- State dict compatibility
- File, HTTP(S), and S3 stream compatibility
- `stream_io` module and `stream_io.open_stream` interface
- `s3://tensorized` public bucket hosting pre-serialized models
- `utils` module including:
  - `convert_bytes`
  - `get_device`
  - `get_mem_usage`
  - `get_gpu_name`
  - `no_init_or_tensor`