Concurrency support #8
base: concurrent
@@ -139,6 +139,7 @@ struct cache_obj;
typedef struct cache_obj {
struct cache_obj *hash_next;
obj_id_t obj_id;
uint64_t fingerprint;
Can we use less space for fingerprint? like one-byte? or guard with a macro? Because we may have billions of cache_obj in simulations, we are very sensitive to the memory usage.
you can leave it if you do not plan to merge into develop branch
@@ -196,13 +196,13 @@ cache_obj_t *chained_hashtable_insert(hashtable_t *hashtable, request_t *req) {
cache_obj->hash_next = new_cache_obj;
cache_obj = new_cache_obj;
}
hashtable->n_obj += 1;
__sync_fetch_and_add(&hashtable->n_obj, 1);
neat, __sync_fetch_and_add is legacy code according to GCC doc (https://gcc.gnu.org/onlinedocs/gcc/_005f_005fsync-Builtins.html), it might be better to use __atomic_add_fetch
Yes, __sync_fetch_and_add has been outdated. However, when I used <stdatomic.h> and atomic_add_fetch, I got an error: ‘_Atomic’ does not name a type. I'm not sure whether it is caused by my GCC version. Maybe you can help me to solve this problem :)
Hmm, that's weird, I think atomic does not need a header. Let's just leave it as it is for now.
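For reference, a minimal sketch of the __atomic form suggested above. The GCC/Clang __atomic builtins need no header and operate on plain integer types, so no _Atomic qualifier is involved. The counter_t type and the helper names are hypothetical stand-ins, not part of libCacheSim. The "‘_Atomic’ does not name a type" message is commonly seen when a C++ compiler processes the C header <stdatomic.h>; whether that is what happened here is only a guess.

```c
#include <stdint.h>

/* Hypothetical stand-in for the hashtable's object counter. */
typedef struct {
  int64_t n_obj;
} counter_t;

/* Atomically add delta and return the updated value.
 * __atomic_add_fetch is a compiler builtin: no header, no _Atomic type. */
static inline int64_t counter_add(counter_t *c, int64_t delta) {
  return __atomic_add_fetch(&c->n_obj, delta, __ATOMIC_RELAXED);
}

/* Atomically read the current value. */
static inline int64_t counter_get(counter_t *c) {
  return __atomic_load_n(&c->n_obj, __ATOMIC_RELAXED);
}
```

Relaxed ordering is assumed to be enough for a statistics counter; a counter that other threads use for synchronization would need a stronger memory order.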
Looks great to me! :)
if (cur_obj != NULL) {
prev_obj->hash_next = cur_obj->hash_next;
if (!hashtable->external_obj) free_cache_obj(cur_obj);
hashtable->n_obj -= 1;
It looks like there was a bug in the old code. Good catch! Can you port this change to the develop branch?
Of course I will. This is just a minor bug, because it seems that this function is not used in the project.
Thank you!
Fix function chained_hashtable_delete_obj_id_v2
1. fix warnings 2. fix turning off zstd not working 3. fix bugs at several places 4. it is still challenging to build on macos
* add SIEVE to be the main cache of TinyLFU algorithm * log fix * Add options for trace print formatting * add a note how to run the caffeine simulator * Update table of contents in README.md
* fix compile warnings and errors * fix bugs in CMakeLists * remove xxhash from source and use system-level xxhash so that macos can compile * change global variable names * add n_core function for macos
Signed-off-by: zztaki <[email protected]>
* Update README.md * Update README.md * Update README.md
Signed-off-by: zztaki <[email protected]>
update documentation of traceConv; add better description in Belady algorithm
Merge the 'cacheMon-develop' into concurrency-support
Great work! Should I take a look now or wait until you finish? BTW, it would be great if you can push to https://github.com/1a1a11a/libCacheSim, but if it is too much trouble, I can do the work. :)
The coding work is almost complete, so you can check the code now. However, I made many updates, which may take several days to review. I'll try to answer any questions you have when I'm free. I'll use larger traces to test the program in the next few days. If there are no serious errors, I'd like to push the code to the given repo. Sadly, the current code cannot be merged into the original project yet, because the cache behaviors with multiple threads differ from the single-threaded ones.
Support for concurrent cache instance
The current libCacheSim has limited support for concurrent simulation. It allows n threads to access n cache instances, but it does not permit n threads to read/write 1 cache instance. This limitation hinders researchers from evaluating the concurrent performance of their algorithms or systems. Therefore, we plan to add concurrent support to libCacheSim. The goal of this project is to implement thread-safe cache instances while maintaining the original performance of miss ratio and single-thread throughput.
Part 1: Concurrent Support to Hashtable
Con-HashTable Implementation
LibCacheSim uses a chainedHashTable, as shown:
The hashtable is naturally friendly to concurrency because operations on different buckets do not compete with each other. To make the hashtable support concurrency, we only need to control operations on the same bucket. A naive method is to use one rwlock for each bucket, as shown:
However, this method is not space-efficient. In the implementation, we initialize an rwlock pool whose size is a power of 2. We use the function
rw_index = bucket_index & (rw_count-1)
to map buckets to rwlocks, allowing multiple buckets to share the same rwlock. For simplicity, the con-HashTable is a static hashtable whose default size is 2^23 (around 8 million). The con-HashTable supports thread-safe inserts, deletes, and reads. It is the basic data structure for cache instances.
Test for con-HashTable
We conducted some simple tests for the concurrent hash table (con-HashTable).
Environment:
Test:
Single-thread Test: We tested the single-thread throughput of con-HashTable and the original chained hash table separately for insertions, reads, and deletes. We used MOPS (million operations per second) as the metric.
Multi-thread Test: We tested the multi-thread throughput of con-HashTable. Each thread runs as follows:
Step 1: It generates $2^{19}$ random numbers as keys to insert.
Step 2: It reads the $2^{19}$ random keys for 10 rounds.
Step 3: It deletes the $2^{19}$ random keys.
Repeat steps 1-3 until the program ends. We used MOPS as the metric to evaluate the throughput of con-HashTable.
We also tested the original chained hash table for comparison. Since the original hash table is not thread-safe, we tested it with one thread only.
Result:
Single-thread Test: The throughput of con-HashTable is lower than the original hash table. This is because con-HashTable uses rwlocks to protect buckets, which introduces some overhead.
Multi-thread Test: The maximum throughput of con-HashTable is 22.6x higher than the original hash table.
Reproduce:
Single-thread Test (default time is 20s):
bin/testCHashtable --ht-type 1
bin/testCHashtable
Multi-thread Test:
bin/testCHashtable --test-type 1 --ht-type 1
bin/testCHashtable --test-type 1
bin/testCHashtable --test-type 1 --thread-num 10
bin/testCHashtable --test-type 1 --thread-num 110
Support for libcuckoo hash table.
Part 2: Thread-safe Cache Operations
To ensure consistency, we introduce a new flag, in_cache, for each object to indicate whether the object is in the cache. in_cache is set to true only if the object is in both the hashtable and the eviction metadata. If the object is missing from either the hashtable or the eviction metadata, in_cache is set to false. Users can only access cache objects with in_cache set to true.
In part 1, we implemented a thread-safe hashtable with three fundamental behaviors: inserting an object into the hashtable (insert), removing an object from the hashtable (remove), and finding an object in the hashtable (find). Similarly, we define three fundamental behaviors for the eviction metadata: inserting an object into the eviction metadata (insert), evicting an object from the eviction metadata (evict), and accessing an object (access). Assuming the eviction metadata behaviors are thread-safe (or atomic), the basic cache operations can be based on the above behaviors, which are implemented as follows.
To simplify the design of eviction metadata, the function
remove_base does not immediately remove the object from the cache. Instead, it only sets the in_cache flag to false and removes the object's pointer from the hashtable. The object remains in the eviction metadata and is removed from it only when it is evicted. Since the object will never be accessed again, it should be evicted soon.
The interactions between pairs of base operations are as follows.
find_base and evict_base: the result is consistent. When an object is evicted from the eviction metadata, it is marked as in_cache = false. Therefore, find_base will return false even if the object is still in the hashtable.
find_base and insert_base: the result is consistent. The in_cache flag is set to true only once the object is in both the hashtable and the eviction metadata; only then can the object be found by the client.
evict_base and insert_base: the result is possibly inconsistent, but this does not affect correctness. Inconsistency may occur when an object is inserted and then immediately evicted. In this case, the object is not in the cache, but the in_cache flag is set to true. However, this discrepancy on an evicted object does not impact the correctness of the cache.
We can implement other cache operations based on these thread-safe basic cache operations.
Part 3: Thread-safe Eviction Algorithms
Implementation of LRU
As discussed in the previous section, eviction metadata involves three basic operations: insert, evict, and access. We use a doubly linked list with a mutex to implement thread-safe operations for the LRU policy. Here's the code:
Implementation of FIFO
While the LRU policy ensures thread safety by using a mutex for the doubly linked list, this introduces locking overhead. To reduce this overhead, we implement the FIFO policy without locks. It uses a singly linked list and CAS commands for thread safety. Here's the code:
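As a rough illustration of the CAS approach (a sketch, not the code from this PR), the following shows only the insert side: a new node is linked at the head of a singly linked list with a compare-and-swap retry loop. The type and function names are hypothetical. Eviction from the other end of the queue and safe memory reclamation (the ABA problem) are deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical FIFO queue node. */
typedef struct fifo_node {
  uint64_t obj_id;
  struct fifo_node *next; /* points to the previously inserted node */
} fifo_node_t;

typedef struct {
  fifo_node_t *head; /* most recently inserted node */
} fifo_list_t;

/* Lock-free insert at the head.
 * On CAS failure, old_head is refreshed with the current head and the
 * loop retries, so a concurrent insert is never overwritten. */
static void fifo_insert(fifo_list_t *q, fifo_node_t *node) {
  fifo_node_t *old_head = __atomic_load_n(&q->head, __ATOMIC_RELAXED);
  do {
    node->next = old_head;
  } while (!__atomic_compare_exchange_n(&q->head, &old_head, node,
                                        1 /* weak CAS, retried in loop */,
                                        __ATOMIC_RELEASE, __ATOMIC_RELAXED));
}
```

The release order on success makes the node's fields visible to any thread that later reads the head with acquire semantics.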
Similarly, we implement thread-safe versions of LFU, SIEVE, and CLOCK. LFU and SIEVE are similar to LRU, using mutexes, while CLOCK is similar to FIFO, utilizing CAS.
Part 4: Thread-safe CacheSim
We make the following changes to the original CacheSim to support concurrent simulation.
Changes to CacheSim Parameters
The original CacheSim doesn't support multiple threads accessing the same cache instance, but it has a parameter called num-thread. This parameter specifies the number of threads accessing cache instances. For example:
This command runs CacheSim with four cache instances (LRU caches with sizes of 1GB, 2GB, 3GB, and 4GB, respectively) and creates a thread pool with 16 threads. However, each thread can only access one cache instance, underutilizing the thread pool. This parameter is used for efficient batch evaluations on multi-core machines.
We repurpose this parameter so that num-thread threads access one cache instance. For instance, the above command would create a thread pool with 64 threads, with each thread accessing one of the four cache instances. This ensures each cache instance is accessed by 16 threads. The default value of num-thread is 1.
Scaling traces
To make the traces suitable for concurrent simulation, we scale them. Each thread reads the trace and tags every object ID with its own thread ID. For example, with 16 threads, object ID 1 becomes 1001, 1002, ..., 1016, one ID per thread. This ensures each thread has a unique trace with the same distribution.
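One encoding consistent with the example above is sketched below. It is an assumption inferred from the numbers 1001 to 1016, not the PR's actual formula: the thread ID is placed in the last three decimal digits. MAX_THREADS and scale_obj_id are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical limit: thread IDs occupy the last three decimal digits. */
#define MAX_THREADS 1000

/* Combine an object ID with a thread ID so that each thread sees a
 * disjoint ID space with the same access distribution.
 * Note: very large obj_id values wrap around in 64-bit arithmetic. */
static inline uint64_t scale_obj_id(uint64_t obj_id, uint32_t thread_id) {
  assert(thread_id < MAX_THREADS);
  return obj_id * MAX_THREADS + thread_id;
}
```

With this mapping, two threads never produce the same scaled ID, as long as the original object IDs are small enough that the multiplication does not overflow.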
Outputting throughput
In addition to miss ratio and the number of misses, CacheSim in multi-thread tests now outputs throughput, measured in million operations per second (MOPS). The simulation ends when any thread finishes its trace, and throughput is calculated as the total number of operations divided by the total time.
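The throughput calculation amounts to the following, shown as a minimal sketch with a hypothetical function name. It assumes the caller has already summed the operations of all threads and measured the elapsed wall-clock time in seconds.

```c
#include <stdint.h>

/* Throughput in million operations per second (MOPS).
 * total_ops: operations completed by all threads combined.
 * elapsed_sec: wall-clock duration of the run in seconds. */
static inline double throughput_mops(uint64_t total_ops, double elapsed_sec) {
  if (elapsed_sec <= 0.0) return 0.0; /* avoid division by zero */
  return (double)total_ops / elapsed_sec / 1e6;
}
```

For example, 4 threads that together complete 4,000,000 requests in 2 seconds give 2 MOPS.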
Now, we can use the new CacheSim to evaluate the multi-thread performance of different cache instances. Here's an example command:
This command tests an LRU cache with a size of 10 KiB using the twitter_cluster52.csv trace, with 4 threads accessing the same cache instance, each launching 1M requests.
We then test the performance of the new CacheSim with various cache algorithms (LRU, LFU, Sieve, FIFO, and CLOCK) using 2 to 20 threads, with each thread corresponding to 1M requests and 10 KiB cache space. The results are depicted in the following figure.
Part 5: Further Developments
Thread-safe Admission and Prefetch Algorithms
The implementation of thread-safe admission and prefetch algorithms is a crucial next step. While I lack familiarity with these algorithms, they play a vital role in cache management and optimization. Therefore, I'll leave this part to be handled by others who specialize in these areas.
More eviction algorithms
Expanding CacheSim to support more eviction algorithms like ARC, 2Q, and MQ would enhance its utility and applicability. However, integrating these policies will require additional modifications and careful consideration of their implementation specifics.
Extensive Testing
The current test suite is simple and does not cover all edge cases or scenarios. Extensive testing is essential to validate the correctness and robustness of the implementation.
Current Insufficiencies and Bugs
in_cache = true. However, it has been removed from the hashtable. This cache item is not accessible by the client, so it is in a wrong state. However, it will eventually be evicted by the eviction policy, so the cache keeps working.
in_cache = true. This cache item is in a wrong state. However, it will eventually be evicted by the eviction policy, so the cache keeps working.
The remove operation in the cache doesn't remove objects from the eviction metadata. It only sets the in_cache flag to false and removes the pointer from the hashtable. This design reduces the update overhead of the eviction metadata, but leads to space inefficiency. Additionally, this design is not suitable for random-based eviction policies, which cannot guarantee that the invalid (and never accessed) cache items will eventually be evicted.