yalloc is yet another memory allocator providing affordable safety in a compact package.
You can use it as a drop-in replacement for the system’s malloc(), free() and related library functions. The behaviour is fully compatible with C11.
Many memory allocators prioritize performance over safety aka security. Some others prioritize the other way around.
Essential safety, such as detecting double and invalid frees, and safeguarding internal control structures from overwriting, comes with unavoidable implementation cost.
yalloc’s main goal is to provide such safety with low enough cost to make the added value affordable in terms of efficiency.
Secondary goal is to keep implementation size, complexity and dependencies low.
The main features of this allocator are :
- segregated metadata / no headers
-
Metadata is kept separate from user data. This reduces the risk of metadata corruption and results in a more cache / TLB friendly layout.
- double free and invalid free detection
-
With configurable handling and reporting. Includes concurrent access from multiple threads.
- multithreading supported by nonblocking atomics
-
creating heaps on-demand to avoid contention.
- configurable statistics, tracing and diagnostics
-
low-overhead detailed stats and tracing.
- modest code size & complexity
-
˜ 8 KLOC inclusive, compiling into ˜ 60 KB code.
- good performance
-
compares well in many benchmarks.
Alpha, no outstanding issues.
Most benchmarks and tests work on 64-bit Linux and MacOS and show decent results.
Integrated in musl C library, installed systemwide on Linux (Alpine) system, working.
Preliminary 32-bit Linux test passes.
Some synthetic tests show high memory usage.
-
64 or 32 bit Unix-like operating system with mmap() support. Linux, FreeBSD, MacOs and Haiku tested.
-
C11 C compiler or C99 compiler with atomics support. If atomics are not available, free(p) and realloc(p,n) need to be called from the same thread malloc() was called.
-
Either thread local store or pthreads.
yalloc has minimal OS dependencies and, as expected, quite minimal standard library dependencies. For diagnostics and statistics formatting, a miniature self-contained printf is included.
Review settings in config.h
.
Review compiler and options in build.sh
.
./build.sh
See Install for details and other options.
yalloc has various provisions to help troubleshoot issues at your app’s side as well as on yalloc’s side. The overhead is low enough that these can be enabled by default.
You can enable support with compile time variable Yal_enable_stats
. This will add minimal overhead.
Set environment variable Yalloc_stats
to a value as per config.h
to print statistics at program exit.
You can enable support with compile time variable Yal_enable_trace
. This will add minimal overhead.
Set environment variable Yalloc_trace
to enable at runtime, or call yal_options()
from your app.
You can have file coordinates included in a trace by replacing malloc(len)
with yal_alloc(len,tag)
This tag can contain encoded file coordinates or a symbolic callsite label. See yal_options(Yal_trace_name,…)
for details
If you also enable Yal_enable_tag
, this callsite info is retained and shown at invalid free diagnostics.
Valgrind is a dynamic analyzer. Its client request mechanism is used to mark allocated and freed memory as well as protect metadata. Thus, when running your app linked with yalloc under valgrind without replacing malloc, illegal access on both sides of the API is detected.
Enable Valgrind support in config.h
and run ./vg_mc.sh myprog
Usage patterns can vary considerably. Some pattens align better with yalloc than others.
-
short-lived blocks, e.g. allocating and freeing a small number of blocks within a loop. Favourable.
-
many similar-sized blocks, e.g. building a large graph. Favourable.
-
allocating a high number of same-sized small blocks, then use them many times. Very favourable.
-
free and realloc from another thread than the block was allocated. Less favourable due to double directory lookup.
-
allocating blocks from a large size distribution. Popular sizes go in fixed-size bins, others into a bump allocator. Moderately favourable (more memory overhead)
-
creating a large number of threads, each allocating some blocks. With low contention, only a small number of heaps will be created.
yalloc development is helped by using the following tools:
valgrind - dynamic analyzer
PVS-Studio - static analyzer for C, C++, C#, and Java code
Coverity - static analysis
A heap is the toplevel structure to hold all user data and admin aka metadata. Memory ranges are obtained from the OS as large regions. Each region has separate user data and metadata blocks. User blocks above a given size are obtained directly directly, described by a virtual region. Other blocks are arranged from fixed-sized pools named regions. Initial regions are of a given size, subsequent regions of the same size class will be successively larger.
Regions are described by a directory, similar to how multi-level page tables describe virtual memory. A single top-level directory holds entries to mid-level tables. These in turn hold entries to leaf tables. The latter holds a region pointer per OS memory page. free() and realloc() uses these to locate a block. Pointers are validated by leading to a region and being at a valid cell start.
Within a region, user data is kept separate from admin aka metadata. This protects metadata from being overwriitten and aligns user blocks favourably. The user data is a single block, consisting of fixed-size cells. The metadata contain an entries per cell. User blocks have no header or trailer. Consecutively allocated blocks are adjacent without a gap. This helps cache and TLB efficiency. Once a region becomes fully free, it is aged gradually and eventually released to the OS. During this period, it can be reused for similar or other size classes.
Blocks are aligned following weak alignment as in C11 WG14 / N2293 Thus, small blocks follow the alignment of the largest type that fits in. 2=2 3=4 4=4 5=8 … unless configured otherwise.
Freed blocks are held in a recycling bin aka freelist.. A subsequent malloc() of similar size hands these out most recently freed first. In additon, each cell has a marker used to detect double free or invalid free.
Multiple threads are supported by having each thread use a private heap during the call, from a pool of several heaps. The number of heaps is determined by detecting contention and grows on demand. Allocations are always local in a thread’s own heap. Synchronization is done by opportunistic trylocks using atomic compare-swap instructions. If free / realloc cannot locate a block [in the local heap], a global directory is consulted. This directory holds an aggregate region directory and is updated atomically. Each region contains a local and remote freelist, the latter allocated on demand. A free or realloc from the same thread is taken from the local freelist without atomics (except double-free detect) or locking. Free or realloc from a different thread is handled by adding it to the owner region’s remote freelist. A subsequent alloc request will inspect the local freelist first. Periodically, the remote freelist is checked and a nonblocking opportunistic lock is used to remove the entry.
Single-threaded programs are detected by the absence of additional threads and the locking as described above is bypassed.
For realloc(), the size can be obtained first. If a change is needed, a new block is allocated from the local heap, and the free of the original block is handled as with a regular free().
Double-free detection is done using atomic compare-swap, to detect double or concurrent free / realloc in the presence of multiple threads. This is independent from the freelist binning described above. Without such check, a doubly freed block would result in the same block being handed out by subsequent mallocs of a similar size.