Add partial support for cancelling async write mutex requests #6486

tgoyne · 2023-04-10T19:14:55Z

While we can't cancel the actual wait on the write mutex, we can dequeue specific Transactions which are waiting for their turn to write, and only block when the DB itself is destroyed. This makes it so that individual Transactions with cancelled async writes can be cleaned up while the write lock is held.

This is done by changing the async write queue in DB::AsyncCommitHelper from arbitrary callbacks to a queue of Transaction instances. It holds unowned Transactions to avoid ever having the final reference to the DB held on the worker thread. This required some adjustments to the locking to ensure that we're holding a lock whenever we need a pointer to remain valid. To avoid lock order inversions, all of the calls to AsyncCommitHelper have to be done without a lock held. We only make those calls from the Transaction's thread, so that didn't actually cause many problems.
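As a rough illustration of the queue change described above, here is a minimal sketch of a wait queue that stores non-owning pointers and lets a specific waiter be removed. `Txn` and `WriteWaitQueue` are hypothetical stand-ins, not the real `Transaction` or `DB::AsyncCommitHelper`, and the sketch leaves out the worker thread and the write mutex itself.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <mutex>

// Hypothetical stand-in for a Transaction waiting for the write lock.
struct Txn {
    int id;
};

// Hypothetical queue of waiters. It stores raw, non-owning pointers, so the
// caller must cancel a waiter before destroying it.
class WriteWaitQueue {
public:
    // Register a transaction that wants its turn to write.
    void enqueue(Txn* txn)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_pending.push_back(txn);
    }

    // Remove one specific waiter. Returns true if it was still queued.
    bool cancel(Txn* txn)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        auto it = std::find(m_pending.begin(), m_pending.end(), txn);
        if (it == m_pending.end())
            return false;
        m_pending.erase(it);
        return true;
    }

    // Take the next waiter in FIFO order, or nullptr if the queue is empty.
    Txn* pop_next()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_pending.empty())
            return nullptr;
        Txn* next = m_pending.front();
        m_pending.pop_front();
        return next;
    }

    std::size_t size() const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_pending.size();
    }

private:
    mutable std::mutex m_mutex;
    std::deque<Txn*> m_pending;
};
```

With a queue of opaque callbacks there is nothing to search for, which is why storing the instances themselves makes targeted cancellation straightforward.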

The changes to BowlOfStonesSemaphore scoping in the sync tests fix a pre-existing race condition in those tests which tsan is now complaining about. The semaphore was often being captured by something which outlived it, and could theoretically be destroyed before the call to pthread_cond_signal() returned. I doubt this ever caused any actual problems, but it could explain extremely rare crashes in sync tests.
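The lifetime hazard described above can be sketched as follows. This uses shared ownership to keep the semaphore alive until the signalling thread is done with it, which is a different technique from the scoping change this PR makes in the tests. `SimpleSemaphore` and `run_with_shared_semaphore` are hypothetical names, not the real BowlOfStonesSemaphore.

```cpp
#include <condition_variable>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical counting semaphore in the spirit of a "bowl of stones".
class SimpleSemaphore {
public:
    // Add a stone and wake one waiter.
    void add_stone()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        ++m_stones;
        m_cond.notify_one();
    }

    // Block until a stone is available, then take it.
    void get_stone()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cond.wait(lock, [this] { return m_stones > 0; });
        --m_stones;
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cond;
    int m_stones = 0;
};

// The worker captures the shared_ptr by value, so the semaphore cannot be
// destroyed while add_stone() is still running, even if the waiter has
// already returned from get_stone().
int run_with_shared_semaphore()
{
    auto sem = std::make_shared<SimpleSemaphore>();
    int result = 0;
    std::thread worker([sem, &result] {
        result = 42;       // written before signalling
        sem->add_stone();  // publishes the write via the mutex
    });
    sem->get_stone();
    worker.join();
    return result;
}
```

Capturing a stack-allocated semaphore by reference instead would allow the waiter to wake, leave scope, and destroy the semaphore while the signalling thread is still inside the notify call.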

This fixes the same problem as #6413, but without the part where we'd sometimes end up closing the DB from the async commit helper thread, as that really didn't work.

While we can't cancel the actual wait on the write mutex, we can dequeue specific Transactions which are waiting for their turn to write, and only block when the DB itself is destroyed. This makes it so that individual Transactions with cancelled async writes can be cleaned up while the write lock is held. This is done by changing the async write queue in DB::AsyncCommitHelper from arbitrary callbacks to a queue of Transaction instances. This increases the coupling between the types, but makes it easier to dequeue specific instances and relying on the specific details of what Transaction will do simplifies the locking involved.

tgoyne self-assigned this Apr 10, 2023

cla-bot bot added the cla: yes label Apr 10, 2023

tgoyne mentioned this pull request Apr 10, 2023

Avoid blocking in Transaction::close() when there's a cancelled async write when possible #6413

Closed

tgoyne force-pushed the tg/async-write-cancel-2 branch 9 times, most recently from 4ede7ad to 2604832 Compare April 13, 2023 05:27

tgoyne force-pushed the tg/async-write-cancel-2 branch from 2604832 to e28a395 Compare April 13, 2023 17:01
