add atomic-ref-list post

SamWindell · Mar 16, 2024 · 9bfbd07
1 parent 9266aa6
commit 9bfbd07
Showing 4 changed files with 304 additions and 4 deletions.
diff --git a/_posts/2023-11-29-lock-free-queue.md b/_posts/2023-11-29-lock-free-queue.md
@@ -13,8 +13,8 @@ Let's start with an overview of things we might want to consider first:
 
 The simplest design that I've found for an SPSC queue is from the paper ["Correct and Efficient Bounded FIFO Queues by Nhat Minh Lê, Adrien Guatto, Albert Cohen, Antoniu Pop"](https://inria.hal.science/hal-00862450/document). It includes an implementation, using C11 atomics, of an improved version of Lamport's queue (from an older paper). It also proposes a faster version called WeakRB that might be worth looking at. For now, let's just take a look at the improved Lamport's queue because it's incredibly simple. The underlying data structure is a fixed-size ring buffer.
 
-<pre>
-<code>atomic_size_t front;
+```cpp
+atomic_size_t front;
 size_t pfront;
 atomic_size_t back;
 size_t cback;
@@ -44,8 +44,8 @@ static inline bool pop(T *elem) {
     *elem = data[b];
     atomic_store_explicit(&front, (f + 1) % SIZE, memory_order_release);
     return true;
-}</code>
-</pre>
+}
+```
 
 It would be very simple to translate this into C++11 using std::atomic if needed: C11 and C++ use the same atomics model.
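
As a rough illustration of that translation, here is a minimal sketch of a cached-index SPSC ring buffer written with std::atomic. The class name `SpscQueue`, its template parameters, the trailing-underscore member names and the `spsc_demo` helper are assumptions made for this example and do not come from the paper or the post. One slot is left unused so that a full queue can be told apart from an empty one.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical C++11 sketch; names are illustrative only.
template <typename T, std::size_t SIZE>
class SpscQueue {
    static_assert(SIZE >= 2, "one slot is always left empty");

public:
    // Producer thread only.
    bool push(const T &elem) {
        const std::size_t b = back_.load(std::memory_order_relaxed);
        const std::size_t next = (b + 1) % SIZE;
        if (next == pfront_) {
            // The cached consumer index says "full": refresh it and re-check.
            pfront_ = front_.load(std::memory_order_acquire);
            if (next == pfront_) return false;
        }
        data_[b] = elem;
        back_.store(next, std::memory_order_release);
        return true;
    }

    // Consumer thread only.
    bool pop(T *elem) {
        const std::size_t f = front_.load(std::memory_order_relaxed);
        if (cback_ == f) {
            // The cached producer index says "empty": refresh it and re-check.
            cback_ = back_.load(std::memory_order_acquire);
            if (cback_ == f) return false;
        }
        *elem = data_[f];
        front_.store((f + 1) % SIZE, std::memory_order_release);
        return true;
    }

private:
    std::atomic<std::size_t> front_{0};
    std::size_t pfront_{0}; // producer's cached copy of front_
    std::atomic<std::size_t> back_{0};
    std::size_t cback_{0}; // consumer's cached copy of back_
    T data_[SIZE];
};

// Single-threaded walk-through; in real use push() and pop() run on different threads.
inline int spsc_demo() {
    SpscQueue<int, 4> q;
    int accepted = 0;
    for (int i = 1; i <= 5; ++i)
        if (q.push(i)) ++accepted; // only SIZE - 1 items fit
    assert(accepted == 3);
    int v = 0, sum = 0;
    while (q.pop(&v)) sum += v; // drains in FIFO order
    return sum;
}
```

The cached `pfront_` and `cback_` values mean each side only touches the other side's atomic index when its cached copy suggests the queue is full or empty.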
 
@@ -57,6 +57,7 @@ If multiple-producers or multiple-consumers are needed I would suggest that the
 - [FreeBSD buf_ring.h](https://svnweb.freebsd.org/base/release/12.2.0/sys/sys/buf_ring.h?revision=367086&view=markup)
 - [The Book of Gehn - Lock-Free Queue](https://book-of-gehn.github.io/articles/2020/03/22/Lock-Free-Queue-Part-I.html)
 - [Loki lock-free queue library](https://github.com/eldipa/loki)
+- [Correct and Efficient Work-Stealing for Weak Memory Models](https://fzn.fr/readings/ppopp13.pdf)
 
 On a slightly unrelated note, here's a link to a neat trick regarding the read/write indexes of a ring buffer: [Juho Snellman's Weblog - I've been writing ring buffers wrong all these years](https://www.snellman.net/blog/archive/2016-12-13-ring-buffers).
 

diff --git a/_posts/2024-03-16-atomic-ref-list.md b/_posts/2024-03-16-atomic-ref-list.md
@@ -0,0 +1,232 @@
+---
+title: Atomic lock-free linked list
+layout: post
+tags: programming c/c++
+---
+
+For [Mirage](https://frozenplain.com/mirage), I need to load sample-library configuration files from disk, including lots of audio files which need decoding into memory for other threads to use. A [new update](https://frozenplain.com/code-your-own-libraries-devlog-5/) that I'm working on makes this a little trickier because it introduces the requirement that the set of audio files can change on the fly; changes to the Lua configuration script should cause Mirage to automatically apply them.
+
+This is a problem of sharing memory across threads. The requirements are as follows:
+- Communication is between exactly 2 threads: the reading thread, and the writing thread.
+- The reading thread (the GUI thread) should be able to access and iterate over the available sample-libraries and audio files with the absolute minimum of overhead.
+- The writing thread (the thread that reads files from disk) should be able to modify or remove items from the sample-library list, and to periodically delete unreferenced items.
+
+I think the data structure that I've come up with to solve this fiddly problem is pretty neat. It combines 3 well-known patterns: singly-linked lists (3 of them), a memory-arena and weak reference counting. It makes extensive use of atomic operations instead of locks. See the whole code [here](https://gist.github.com/SamWindell/5f9eb5226eeb110f91e3917015e98e8e).
+
+Let's start with what the struct looks like:
+
+```cpp
+template <typename ValueType>
+struct AtomicRefList {
+    Atomic<Node *> live_list {}; // reader-thread or writer-thread
+    Node *dead_list {}; // writer-thread
+    Node *free_list {}; // writer-thread
+    ArenaAllocator arena {PageAllocator::Instance()}; // writer-thread
+};
+```
+
+Above, we have the 3 intrusive singly-linked lists. We also have an ArenaAllocator. I'm not going to include the implementation of the ArenaAllocator here because things will get a bit complicated. Essentially it can be used to allocate memory that is mostly contiguous and can be freed all at once. When memory is requested, it tries to bump-allocate the requested amount by incrementing a cursor on a large memory region. If there's no room in its current region, it will create a new region and return memory from that instead. The important things to note here are: memory never moves, and memory is freed all at once in its destructor. Read about [arena allocators here](https://en.wikipedia.org/wiki/Region-based_memory_management), or have a look at [Zig's implementation](https://github.com/ziglang/zig/blob/master/lib/std/heap/arena_allocator.zig) of one.
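
To make the bump-allocation idea concrete, here is a minimal sketch of an arena. It is not the ArenaAllocator used in the post: the `BumpArena` class, its `Allocate` signature, the default region size and the `arena_demo` helper are all assumptions for illustration, and it uses malloc for regions where the post uses a page allocator. It only supports alignments up to `alignof(std::max_align_t)`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical minimal arena; names and sizes are illustrative only.
class BumpArena {
public:
    explicit BumpArena(std::size_t region_size = 4096) : region_size_(region_size) {}
    BumpArena(const BumpArena &) = delete;
    BumpArena &operator=(const BumpArena &) = delete;

    // Everything is freed at once; individual allocations are never released.
    ~BumpArena() {
        for (void *region : regions_) std::free(region);
    }

    // 'align' must be a power of two no larger than alignof(std::max_align_t).
    void *Allocate(std::size_t size, std::size_t align = alignof(std::max_align_t)) {
        std::size_t offset = (cursor_ + align - 1) & ~(align - 1);
        if (regions_.empty() || offset + size > current_size_) {
            // No room: start a new region. Existing allocations stay where they are.
            current_size_ = size > region_size_ ? size : region_size_;
            void *region = std::malloc(current_size_);
            if (!region) throw std::bad_alloc();
            regions_.push_back(region);
            offset = 0;
        }
        cursor_ = offset + size; // bump the cursor past this allocation
        return static_cast<char *>(regions_.back()) + offset;
    }

private:
    std::size_t region_size_;
    std::size_t current_size_ = 0;
    std::size_t cursor_ = 0;
    std::vector<void *> regions_;
};

inline bool arena_demo() {
    BumpArena arena(64);
    int *a = static_cast<int *>(arena.Allocate(sizeof(int), alignof(int)));
    int *b = static_cast<int *>(arena.Allocate(sizeof(int), alignof(int)));
    *a = 1;
    *b = 2;
    void *big = arena.Allocate(128); // does not fit, so a new region is created
    assert(big != nullptr);
    // Consecutive small allocations are adjacent, and earlier memory never moved.
    return b == a + 1 && *a == 1 && *b == 2;
}
```

Because pointers handed out by the arena stay valid until the arena itself is destroyed, nodes allocated this way can be safely referenced from several lists at once.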
+
+Using linked lists is often discouraged for performance reasons. It's a valid recommendation if the nodes are allocated using a general-purpose allocator, such as malloc/free or new/delete. In these cases, the memory of each node is likely to be in a completely different location, so when iterating through the list the CPU cannot effectively cache or prefetch memory. However, if the nodes of a linked list are allocated using an arena allocator, they are most often near-contiguous, and so the CPU's cache and prefetch systems are effective.
+
+Back to the implementation of this atomic linked list. Next, let's define what the Node structure looks like:
+
+
+```cpp
+// Nodes are never destroyed or freed until this class is destroyed so use-after-free is not an issue. To
+// get around the issues of using-after-destructor, we use weak reference counting involving a bit flag.
+struct Node {
+    // reader
+    ValueType *TryRetain() {
+        const auto r = reader_uses.FetchAdd(1, MemoryOrder::Relaxed);
+        if (r & k_dead_bit) [[unlikely]] {
+            reader_uses.FetchSub(1, MemoryOrder::Relaxed);
+            return nullptr;
+        }
+        return &value;
+    }
+
+    // reader, if TryRetain() returned non-null
+    void Release() {
+        const auto r = reader_uses.FetchSub(1, MemoryOrder::Relaxed);
+        ASSERT(r != 0);
+    }
+
+    // Presence of this bit signifies that this node should not be read. However, increment and decrement operations
+    // will still work fine regardless of whether it is set - there will be 31-bits of data that track
+    // changes. Doing it this way moves the more expensive operations onto the writer thread rather than
+    // the reader thread. The writer thread does an atomic bitwise-AND (which is sometimes a CAS loop in
+    // implementation), but the reader thread can do an atomic increment and then check the bit on the
+    // result, non-atomically. The alternative might be to get the reader thread to do an atomic CAS to
+    // determine if reader_uses is zero, and only increment it if it's not, but this is likely more
+    // expensive.
+    static constexpr u32 k_dead_bit = 1u << 31;
+
+    Atomic<u32> reader_uses;
+    ValueType value;
+    Atomic<Node *> next;
+    Node *writer_next;
+};
+```
+
+From the 2 snippets above we can begin to see how these structs might be used:
+- Once allocated, Nodes are always a valid memory location.
+- The reader-thread can access the live_list of nodes, but it must acquire access to the value by doing TryRetain(), and afterwards, Release(). This is like std::weak_ptr::lock.
+- The writer-thread can move nodes from the live_list into the dead_list, and subsequently when no readers are using the node, it can destroy the node->value and add it to the free_list, ready to be used again.
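
The retain/release protocol in the list above can be tried in isolation. This is a minimal sketch using std::atomic rather than the post's own `Atomic` wrapper; the `Slot` struct, `MarkDead`, `CanDestroy` and `refcount_demo` are hypothetical names for this example. It also uses acquire/release orderings as a conservative assumption, whereas the post's code uses relaxed ordering on the reader side.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a list node, holding a plain int as its value.
struct Slot {
    static constexpr std::uint32_t k_dead_bit = 1u << 31;

    // reader: returns nullptr if the writer has marked the slot dead
    int *TryRetain() {
        const std::uint32_t r = uses.fetch_add(1, std::memory_order_acquire);
        if (r & k_dead_bit) {
            uses.fetch_sub(1, std::memory_order_relaxed); // undo our increment
            return nullptr;
        }
        return &value;
    }

    // reader: only after a successful TryRetain()
    void Release() { uses.fetch_sub(1, std::memory_order_release); }

    // writer: refuse new readers; existing readers keep their reference
    void MarkDead() { uses.fetch_add(k_dead_bit, std::memory_order_acq_rel); }

    // writer: true once the slot is dead and no reader holds it
    bool CanDestroy() const { return uses.load(std::memory_order_acquire) == k_dead_bit; }

    std::atomic<std::uint32_t> uses{0};
    int value{0};
};

// Single-threaded walk-through of the reader and writer steps.
inline bool refcount_demo() {
    Slot s;
    s.value = 42;
    int *v = s.TryRetain(); // reader takes a reference
    assert(v != nullptr);
    s.MarkDead();                               // writer removes the node
    if (s.CanDestroy()) return false;           // still held by the reader
    if (s.TryRetain() != nullptr) return false; // new readers are refused
    s.Release();                                // reader is done
    return *v == 42 && s.CanDestroy();          // value was never destroyed here
}
```

The reader only ever performs an increment and a decrement; all the decision making about when a value may be destroyed stays on the writer side.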
+
+Here's what the reader-thread can use:
+
+```cpp
+struct Iterator {
+    friend bool operator==(const Iterator &a, const Iterator &b) { return a.node == b.node; };
+    friend bool operator!=(const Iterator &a, const Iterator &b) { return a.node != b.node; };
+    Node &operator*() const { return *node; }
+    Node *operator->() { return node; }
+    Iterator &operator++() {
+        prev = node;
+        node = node->next.Load(MemoryOrder::Relaxed);
+        return *this;
+    }
+    Node *node {};
+    Node *prev {};
+};
+
+// reader or writer
+// If you are the reader the values should be considered weak references; you MUST call TryRetain (and
+// afterwards Release) on the object before using it.
+Iterator begin() const { return Iterator(live_list.Load(MemoryOrder::Relaxed), nullptr); }
+Iterator end() const { return Iterator(nullptr, nullptr); }
+```
+
+And the features that the writer-thread can use:
+
+```cpp
+// writer, call placement-new on node->value
+Node *AllocateUninitialised() {
+    if (free_list) {
+        auto node = free_list;
+        free_list = free_list->writer_next;
+        ASSERT(node->reader_uses.Load() & Node::k_dead_bit);
+        return node;
+    }
+
+    auto node = arena.NewUninitialised<Node>();
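+    // start with the dead bit set, matching nodes that come from the free_list; Insert() clears it
+    node->reader_uses.Raw() = Node::k_dead_bit;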
+    return node;
+}
+
+// writer, only pass a node just acquired from AllocateUninitialised and placement-new'ed
+void DiscardAllocatedInitialised(Node *node) {
+    node->value.~ValueType();
+    node->writer_next = free_list;
+    free_list = node;
+}
+
+// writer, node from AllocateUninitialised
+void Insert(Node *node) {
+    // insert so the memory is sequential for better cache locality
+    Node *insert_after {};
+    {
+        Node *prev {};
+        for (auto n = live_list.Load(MemoryOrder::Relaxed); n != nullptr;
+                n = n->next.Load(MemoryOrder::Relaxed)) {
+            if (n > node) {
+                insert_after = prev;
+                break;
+            }
+            prev = n;
+        }
+    }
+
+    // put it into the live list
+    if (insert_after) {
+        node->next.Store(insert_after->next.Load());
+        insert_after->next.Store(node);
+    } else {
+        node->next.Store(live_list.Load());
+        live_list.Store(node);
+    }
+
+    // signal that the reader can now use this node
+    node->reader_uses.FetchAnd(~Node::k_dead_bit);
+}
+
+// writer, returns next iterator (i.e. instead of ++it in a loop)
+Iterator Remove(Iterator iterator) {
+    if constexpr (DEBUG_CHECKS_ENABLED) {
+        bool found = false;
+        for (auto n = live_list.Load(MemoryOrder::Relaxed); n != nullptr;
+                n = n->next.Load(MemoryOrder::Relaxed)) {
+            if (n == iterator.node) {
+                found = true;
+                break;
+            }
+        }
+        ASSERT(found);
+    }
+
+    // remove it from the live_list
+    if (iterator.prev)
+        iterator.prev->next.Store(iterator.node->next.Load());
+    else
+        live_list.Store(iterator.node->next.Load());
+
+    // add it to the dead list. we use a separate 'next' variable for this because the reader still might
+    // be using the node and it needs to know how to correctly iterate through the live list rather than
+    // suddenly being redirected into iterating the dead list
+    iterator.node->writer_next = dead_list;
+    dead_list = iterator.node;
+
+    // signal that the reader should no longer use this node
+    iterator.node->reader_uses.FetchAdd(Node::k_dead_bit);
+
+    return Iterator {.node = iterator.node->next.Load(), .prev = iterator.prev};
+}
+
+// writer
+void Remove(Node *node) {
+    Node *previous {};
+    for (auto it = begin(); it != end(); ++it) {
+        if (it.node == node) break;
+        previous = it.node;
+    }
+    Remove(Iterator {node, previous});
+}
+
+// writer
+void RemoveAll() {
+    for (auto it = begin(); it != end();)
+        it = Remove(it);
+}
+
+// writer, call this regularly
+void DeleteRemovedAndUnreferenced() {
+    Node *previous = nullptr;
+    for (auto i = dead_list; i != nullptr;) {
+        ASSERT(i->writer_next != i);
+        ASSERT(previous != i);
+        if (previous) ASSERT(previous != i->writer_next);
+
+        if (i->reader_uses.Load() == Node::k_dead_bit) {
+            if (!previous)
+                dead_list = i->writer_next;
+            else
+                previous->writer_next = i->writer_next;
+            auto next = i->writer_next;
+            i->value.~ValueType();
+            i->writer_next = free_list;
+            free_list = i;
+            i = next;
+        } else {
+            previous = i;
+            i = i->writer_next;
+        }
+    }
+}
+```
+
+There's a slightly strange pattern in the above snippet where adding items to the list is a 3-step process. First you must call AllocateUninitialised(), then placement-new the node->value and finally pass it to Insert(). This could certainly be combined into a single operation. However, in my case a placement-new is necessary when the ValueType is non-copyable and non-moveable.
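
To show why placement-new helps here, this is a minimal sketch of constructing a non-copyable, non-movable value directly inside storage that already exists. The `Library` type, the `Uninitialised` union wrapper, `DemoNode` and `placement_demo` are hypothetical names for this example; the post's Node stores its `ValueType` directly and gets its memory from the arena instead.

```cpp
#include <cassert>
#include <new>
#include <string>
#include <utility>

// Hypothetical value type that can be neither copied nor moved.
struct Library {
    explicit Library(std::string n) : name(std::move(n)) {}
    Library(const Library &) = delete;
    Library &operator=(const Library &) = delete;
    std::string name;
};

// Storage for a T whose constructor and destructor are run manually.
template <typename T>
union Uninitialised {
    Uninitialised() {}  // deliberately does not construct 'value'
    ~Uninitialised() {} // deliberately does not destroy 'value'
    T value;
};

// Stand-in for a list node whose value is constructed after allocation.
struct DemoNode {
    Uninitialised<Library> slot;
    DemoNode *next = nullptr;
};

inline std::string placement_demo() {
    DemoNode node; // step 1: the node exists, but its value is not constructed
    Library *lib = new (&node.slot.value) Library("strings"); // step 2: construct in place
    assert(lib == &node.slot.value);
    std::string seen = lib->name; // step 3 would hand the node to Insert()
    lib->~Library();              // what the writer does before recycling a node
    return seen;
}
```

Since the value is built in its final location, no copy or move constructor is ever required, which is what allows such types to live in the list.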
+
+The writer-thread is probably a thread that always runs in the background of your application. It is probably some sort of event-loop that wakes up to respond to requests. In my case, the writer-thread wakes up when it is informed of changes to sample-library configuration files. It then reads and decodes files and makes changes to the AtomicRefList, and finally calls the DeleteRemovedAndUnreferenced() method to clean up any unused items.
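
One possible shape for such an event loop is sketched below. The post does not say how the writer-thread is woken, so the use of a mutex and condition variable here is an assumption, as are the `WriterLoop` class, its method names and the `writer_loop_demo` helper. The places where the list would be modified are marked with comments only.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical writer-side event loop; names are illustrative only.
class WriterLoop {
public:
    // Any thread: tell the writer that a configuration file changed.
    void NotifyChanged(std::string path) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_.push_back(std::move(path));
        }
        cv_.notify_one();
    }

    // Any thread: ask the loop to exit once pending requests are handled.
    void RequestStop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cv_.notify_one();
    }

    // Writer thread: sleeps until there is work, returns the number of requests handled.
    int Run() {
        int handled = 0;
        for (;;) {
            std::vector<std::string> batch;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stop_ || !pending_.empty(); });
                if (pending_.empty()) return handled; // only reachable when stopping
                batch.swap(pending_);
            }
            for (const std::string &path : batch) {
                (void)path; // here: read and decode files, then Insert()/Remove() on the list
                ++handled;
            }
            // here: DeleteRemovedAndUnreferenced() to recycle nodes no reader is using
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::vector<std::string> pending_;
    bool stop_ = false;
};

// Runs the loop on the calling thread for simplicity; normally Run() has its own thread.
inline int writer_loop_demo() {
    WriterLoop loop;
    loop.NotifyChanged("a.lua");
    loop.NotifyChanged("b.lua");
    loop.RequestStop();
    const int handled = loop.Run();
    assert(handled == 2);
    return handled;
}
```

The lock in this sketch only guards the request queue; the reader-thread never takes it, so iterating the list stays lock-free.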
+
+I hope that this is interesting or helpful to someone. Please leave any suggestions or comments below.
diff --git a/style.css b/style.css
@@ -1,3 +1,5 @@
+@import "syntax-theme.css";
+
 :root {
    --background-1: rgb(32, 35, 39);
    --background-2: rgb(31, 31, 31);

diff --git a/syntax-theme.css b/syntax-theme.css
@@ -0,0 +1,65 @@
+.highlight pre { background-color: #272822; }
+.highlight .hll { background-color: #272822; }
+.highlight .c { color: #75715e } /* Comment */
+.highlight .err { color: #960050; background-color: #1e0010 } /* Error */
+.highlight .k { color: #66d9ef } /* Keyword */
+.highlight .l { color: #ae81ff } /* Literal */
+.highlight .n { color: #f8f8f2 } /* Name */
+.highlight .o { color: #f92672 } /* Operator */
+.highlight .p { color: #f8f8f2 } /* Punctuation */
+.highlight .cm { color: #75715e } /* Comment.Multiline */
+.highlight .cp { color: #75715e } /* Comment.Preproc */
+.highlight .c1 { color: #75715e } /* Comment.Single */
+.highlight .cs { color: #75715e } /* Comment.Special */
+.highlight .ge { font-style: italic } /* Generic.Emph */
+.highlight .gs { font-weight: bold } /* Generic.Strong */
+.highlight .kc { color: #66d9ef } /* Keyword.Constant */
+.highlight .kd { color: #66d9ef } /* Keyword.Declaration */
+.highlight .kn { color: #f92672 } /* Keyword.Namespace */
+.highlight .kp { color: #66d9ef } /* Keyword.Pseudo */
+.highlight .kr { color: #66d9ef } /* Keyword.Reserved */
+.highlight .kt { color: #66d9ef } /* Keyword.Type */
+.highlight .ld { color: #e6db74 } /* Literal.Date */
+.highlight .m { color: #ae81ff } /* Literal.Number */
+.highlight .s { color: #e6db74 } /* Literal.String */
+.highlight .na { color: #a6e22e } /* Name.Attribute */
+.highlight .nb { color: #f8f8f2 } /* Name.Builtin */
+.highlight .nc { color: #a6e22e } /* Name.Class */
+.highlight .no { color: #66d9ef } /* Name.Constant */
+.highlight .nd { color: #a6e22e } /* Name.Decorator */
+.highlight .ni { color: #f8f8f2 } /* Name.Entity */
+.highlight .ne { color: #a6e22e } /* Name.Exception */
+.highlight .nf { color: #a6e22e } /* Name.Function */
+.highlight .nl { color: #f8f8f2 } /* Name.Label */
+.highlight .nn { color: #f8f8f2 } /* Name.Namespace */
+.highlight .nx { color: #a6e22e } /* Name.Other */
+.highlight .py { color: #f8f8f2 } /* Name.Property */
+.highlight .nt { color: #f92672 } /* Name.Tag */
+.highlight .nv { color: #f8f8f2 } /* Name.Variable */
+.highlight .ow { color: #f92672 } /* Operator.Word */
+.highlight .w { color: #f8f8f2 } /* Text.Whitespace */
+.highlight .mf { color: #ae81ff } /* Literal.Number.Float */
+.highlight .mh { color: #ae81ff } /* Literal.Number.Hex */
+.highlight .mi { color: #ae81ff } /* Literal.Number.Integer */
+.highlight .mo { color: #ae81ff } /* Literal.Number.Oct */
+.highlight .sb { color: #e6db74 } /* Literal.String.Backtick */
+.highlight .sc { color: #e6db74 } /* Literal.String.Char */
+.highlight .sd { color: #e6db74 } /* Literal.String.Doc */
+.highlight .s2 { color: #e6db74 } /* Literal.String.Double */
+.highlight .se { color: #ae81ff } /* Literal.String.Escape */
+.highlight .sh { color: #e6db74 } /* Literal.String.Heredoc */
+.highlight .si { color: #e6db74 } /* Literal.String.Interpol */
+.highlight .sx { color: #e6db74 } /* Literal.String.Other */
+.highlight .sr { color: #e6db74 } /* Literal.String.Regex */
+.highlight .s1 { color: #e6db74 } /* Literal.String.Single */
+.highlight .ss { color: #e6db74 } /* Literal.String.Symbol */
+.highlight .bp { color: #f8f8f2 } /* Name.Builtin.Pseudo */
+.highlight .vc { color: #f8f8f2 } /* Name.Variable.Class */
+.highlight .vg { color: #f8f8f2 } /* Name.Variable.Global */
+.highlight .vi { color: #f8f8f2 } /* Name.Variable.Instance */
+.highlight .il { color: #ae81ff } /* Literal.Number.Integer.Long */
+
+.highlight .gh { } /* Generic Heading & Diff Header */
+.highlight .gu { color: #75715e; } /* Generic.Subheading & Diff Unified/Comment? */
+.highlight .gd { color: #f92672; } /* Generic.Deleted & Diff Deleted */
+.highlight .gi { color: #a6e22e; } /* Generic.Inserted & Diff Inserted */