Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable the usage of withF and withG context for DB transactions and provide transactional iterators #10

Merged
merged 5 commits into from
Mar 23, 2022

Conversation

CMCDragonkai
Copy link
Member

@CMCDragonkai CMCDragonkai commented Mar 16, 2022

Description

The withF and withG generalises resource acquisition and release. The DB transaction is one of those resources. Since we share a DB being used in PK, having these methods would be useful.

Long term, the withF and withG types and utility functions would be extracted into its own package like js-context so it can be shared.

Over time this should replace the DB.transact and DB.withLocks methods, and instead just provide the DB.transaction method.

The transactional iterator implementation follows the iterator on LevelDB including the exception behaviour like in leveldown and abstract-leveldown packages. This is a stop-gap solution until we migrate the whole DB implementation over abstract-level #11.

Downstream packages EFS and PK will need to be updated.

Issues Fixed

Tasks

  • 1. Use resource control functions for transactions (withF and withG) by integrating ResourceAcquire and ResourceRelease types into a DB.transaction method.

  • 2. Add an iterator method into Transaction and DBTransaction interface so that we can perform transactional iteration. This will still use the AbstractIterator type, which is not currently up to date in upstream which will necessitate the usage of @ts-ignore comments. See: [@types/level] iterator type is lacking `[Symbol.asyncIterator]()` DefinitelyTyped/DefinitelyTyped#59265
    - This iterator must act like a "overlay", in that writes in the transaction take priority over data that is currently in the DB.

      | KEYS | DB    | SNAPSHOT | RESULT |
      |------|-------|----------|--------|
      | a    | a = a | a = 1    | a = 1  |
      | b    | b = b |          | b = b  |
      | c    |       | c = 3    | c = 3  |
      | d    | d = d |          | d = d  |
      | e    | e = e | e = 5    | e = 5  |
      | f    |       | f = 6    | f = 6  |
      | g    |       |          |        |
      | h    | h = h |          | h = h  |
      | i    |       |          |        |
      | j    |       | j = 10   | j = 10 |
      | k    | k = k | k = 11   | k = 11 |
    
  • 3. Remove the methods DB.withLocks and DB.transact as this uses locking code, and locks are not necessary for our read-committed isolation level transactions, advisory-locks for transactions are now up to the user to manage.

  • 4. Create root sublevels such as data and transactions and eventually index to support temporary state, so that users of DB will only be storing data under data sublevel.
    - This means that db.db shouldn't be used anymore, as the root of data is now in db.dataDb, the db.db is the still the underlying root LevelDB database, but this shouldn't be used unless you're doing something hacky on the LevelDB database.

  • 5. Move withF and withG and ResourceAcquire and ResourceRelease types into @matrixai/resources package, which will be used by PK as well and subsequently EFS.

  • 6. Add Transaction.test.ts to formally test the Transaction class, and formalise the 4 isolation-levels vs read-phenomena properties into the tests.
    image

  • 5. Removed DBTransaction interface, and renamed Transaction class to DBTransaction, this just simplifies the types

  • 6. Removed AbstractIterator in favour of our own DBIterator since the upstream type is out of date, and this avoids having users having to do // @ts-ignore all the time.

Final checklist

  • Domain specific tests
  • Full tests
  • Updated inline-comment documentation
  • Lint fixed
  • Squash and rebased
  • Sanity check the final build

@CMCDragonkai
Copy link
Member Author

In order to allow some streaming based functionality, we have to allow the creation of iterators inside the transaction.

We will preserve the read-committed isolation level, which means we create iterators against the DB inside the transaction, and each iterator then needs to operate against the snapshot and the real DB.

This means our snapshot needs to become an ordered key value structure.

This means we are creating another temporary DB to be the snapshot. However I'm going to first just try with a sublevel.

This would mean that there is a reserved area in the DB where transactions are working against, and this sublevel should never be used by any other places.

@CMCDragonkai
Copy link
Member Author

The usage of the snapshot sublevel does impact any usage of streaming on the root DB though, because using would iterate over the sublevel snapshots too.

Furthermore we cannot just use snapshot sublevel, we would need to create a new sublevel for each snapshot that is bound to a transaction.

Alternatively we forego using a snapshot level entirely and just use an in-memory ordered map structure, but that is its own can of worms.

Most efficient would be:

  1. Snapshots sublevel with snapshot sublevels keyed to unique transaction IDs
  2. This enables later persistent snapshots too, though not sure if this would be useful. At the very least, snapshot can grow beyond memory size?
  3. This would need the root DB to create a sublevel to actually be used, which is to say, the root db level woudn't be the real root db, but instead something under the root database. We could have data and snapshots as the 2 root sublevels.
  4. Would this impact existing users? I don't think so, it should be fine.

@CMCDragonkai
Copy link
Member Author

CMCDragonkai commented Mar 16, 2022

These additional "root" sublevels can be extended to deal with automatic indexing #1 which would then supersede #2. Which would give us:

!data
!snapshots
!index

Then index sublevels can be accessed separately and managed separately.

Additional relevant issue Level/subleveldown#111

Also I noticed that a abstract-level is being created and that may evolve the leveldb system to be more sophisticated. In particular supporting Uint8Array https://github.com/Level/abstract-level

@CMCDragonkai
Copy link
Member Author

The https://github.com/Level/abstract-level might be a good foundation for a new DB, which would then still use leveldown "now known as classic-level" as the node underlying database.

@CMCDragonkai
Copy link
Member Author

I've rolled up a bunch of issues in js-db into larger epic #11 that would be one big change to DB, that would be breaking relative to EFS and PK. So this PR will focus on just integrating withF and withG and the ability to use iterator inside a transaction. However this may be removing the transact and withLocks method which does break EFS, but that is an easy fix, since EFS doesn't actually use any of the DB locking, so it can just have withF integrated with the resource types.

@CMCDragonkai
Copy link
Member Author

The DBTransaction now uses transactionDb instead of snapshot. The tran.iterator(['d1', 'd2']) can take DBDomain, this makes our iterator method a bit different from leveldb's iterator method. This is because our tran methods tend to operate from the root, and that's how the get, and put works. Remember that since levels are not using the DB interface, we had to do this.

Furthermore the transactionDb is encrypted because it is on disk.

The iterator method is a bit strange, I need to test what happens when the keys are different between transactionDb and the dataDb it is operating over... And the usage of seek. Further prototyping required.

@CMCDragonkai
Copy link
Member Author

The changes to DB has impacted some tests.

    ✓ async construction constructs the filesystem state (13 ms)
    ✓ async destruction removes filesystem state (3 ms)
    ✓ async start and stop preserves state (18 ms)
    ✓ async start and stop preserves state without crypto (4 ms)
    ✓ async start and stop requires recreation of db levels (6 ms)
    ✓ creating fresh db (5 ms)
    ✓ get and put and del (11 ms)
    ✓ batch put and del (13 ms)
    ✕ db levels are leveldbs (6 ms)
    ✓ db levels are just ephemeral abstractions (4 ms)
    ✓ db levels are facilitated by key prefixes (12 ms)
    ✓ clearing a db level clears all sublevels (10 ms)
    ✕ lexicographic iteration order (9 ms)
    ✕ lexicographic buffer iteration order (112 ms)
    ✕ lexicographic integer iteration (7 ms)
    ✓ db level lexicographic iteration (10 ms)
    ✓ get and put and del on string and buffer keys (6 ms)
    ✕ streams can be consumed with promises (11 ms)
    ✓ counting sublevels (15 ms)
    ✓ parallelized get and put and del (428 ms)
    ✓ parallelized batch put and del (464 ms)
    ✓ works without crypto (6 ms)

I'm creating Transaction.test.ts now to test different usages of transaction and prototype the iterator usage.

@CMCDragonkai
Copy link
Member Author

CMCDragonkai commented Mar 21, 2022

I have a working prototype for the tran.iterator.

It is able to handle this sort of data:

KEYS DB SNAPSHOT RESULT
a a = a a = 1 a = 1
b b = b b = b
c c = 3 c = 3
d d = d d = d
e e = e e = 5 e = 5
f f = 6 f = 6
g
h h = h h = h
i
j j = 10 j = 10
k k = k k = 11 k = 11

@CMCDragonkai
Copy link
Member Author

Tests have been done for the transactional iterator.

Now it's time to clean up the tests for the DB.test.ts, and we can start to merge this and apply it to the NodeGraph.

@CMCDragonkai
Copy link
Member Author

CMCDragonkai commented Mar 22, 2022

In this PR, we are removing the withLocks and other async-mutex usages.

Having locks in the DB muddles the separation of responsibilities.

It turns out our read-committed transactions doesn't actually need locking at all.

You only need locks if you want to also prevent:

  • non-repeatable reads
  • phantom reads
  • lost-updates

However this all depends on the end-user and their specific situation when using transactions. Sometimes they may use only simple mutexes, in other cases they need to use read/write locks, OCC, PCC... etc.

So therefore locking functionality is removed from the DB.

We are however using withF and withG here. And we may import this from: https://github.com/MatrixAI/js-resources

@CMCDragonkai
Copy link
Member Author

We now have tests for each isolation property:

[nix-shell:~/Projects/js-db]$ npm test -- ./tests/Transaction.test.ts 

> @matrixai/[email protected] test /home/cmcdragonkai/Projects/js-db
> jest "./tests/Transaction.test.ts"

Determining test suites to run...
GLOBAL SETUP
 PASS  tests/Transaction.test.ts
  Transaction
    ✓ snapshot state is cleared after releasing transactions (34 ms)
    ✓ get, put and del (17 ms)
    ✓ no dirty reads (17 ms)
    ✓ non-repeatable reads (9 ms)
    ✓ phantom reads (11 ms)
    ✓ lost updates (7 ms)
    ✓ iterator with same largest key (16 ms)
    ✓ iterator with same largest key in reverse (15 ms)
    ✓ iterator with snapshot largest key (12 ms)
    ✓ iterator with snapshot largest key in reverse (12 ms)
    ✓ iterator with db largest key (12 ms)
    ✓ iterator with db largest key in reverse (10 ms)
    ✓ iterator with undefined values (20 ms)
    ✓ iterator using seek and next (19 ms)
    ✓ queue success hooks (5 ms)
    ✓ queue failure hooks (7 ms)
    ✓ rollback on error (7 ms)

Test Suites: 1 passed, 1 total
Tests:       17 passed, 17 total
Snapshots:   0 total
Time:        1.003 s, estimated 3 s
Ran all test suites matching /.\/tests\/Transaction.test.ts/i.
GLOBAL TEARDOWN

And matches this table:

image

@CMCDragonkai CMCDragonkai self-assigned this Mar 22, 2022
@CMCDragonkai CMCDragonkai changed the title WIP: Integrate withF and withG context to transactions WIP: Enable the usage of withF and withG context for DB transactions and provide transactional iterators Mar 22, 2022
@CMCDragonkai
Copy link
Member Author

The db.db should no longer be used when attempting to access the internal DB at root.

You can use db.dataDb.

However that shouldn't be necessary either.

We now expose:

  • DB.clear() - use this to clear the dataDb
  • DB.dump() - use this to dump the dataDb
  • DB.iterator() - use this to get an iterator on dataDB
  • DB.count() - use this to count from dataDB

And with that, the DB.test.ts now passes.

@CMCDragonkai
Copy link
Member Author

One issue is the difference in API between DB and Transaction for some methods like clear, iterator, dump and count.

And the fact that I'm only supporting promise-API, like for iterator. So callback style won't work.

@CMCDragonkai
Copy link
Member Author

CMCDragonkai commented Mar 23, 2022

The DBTransaction interface is removed. In place, I'm going to rename Transaction class to DBTransaction. This is just because Transaction is quite an overloaded name, so this just ensures that when importing like import { DB, DBTransaction } avoids less renaming.

@CMCDragonkai
Copy link
Member Author

CMCDragonkai commented Mar 23, 2022

Rather than using the AbstractIterator, both DB.iterator and DBTransaction.iterator now both return DBIterator. This will be used as a stopgap until #11 is done.

The APIs between DB and DBTransaction is still different because we want to eventually move to where you can just specify a key path instead of directly maintaining particular sublevel objects. Once all methods can just take a full keypath, there's no need to manage a bunch of sublevel objects, and that should reduce the object maintenance overhead.

This means later in #11, we should have:

// root iterator at dataDb
DB.iterator(options);
// same
DB.iterator(options, []);
// down level 1
DB.iterator(options, ['level1']);
DB.iterator(options, ['level1', 'level2]);
DBTransaction.iterator(options, ['level1', 'level2']);

DB.count();
DB.count(['level1', 'level2']);
DBTransaction.count();
DBTransaction.count(['level1']);

DB.clear();
DB.clear(['level1']);
DBTransaction.clear(['level1']); // this should be a transactional clear, to do this, we must iterator and then do `db.del`

DB.get('key');
DB.get(['level1', 'key']);
// these are not allowed, or they are equivalent to asking for undefined
DB.get([]);
DB.get();

DB.put('key', 'value');
DB.put(['level1', 'key'], 'value');

DB.del('key');
DB.del(['level1', 'key']);
DBTransaction.del('key');
DBTransaction.del(['level1', 'key']);

… `ResourceRelease` types

* Provided `iterator` methods to `DB` and `DBTransaction` to enable transactional iterators
* Remove `DB.withLocks` and `DB.transact` as advisory locking is now the responsibility of the user
* Root sublevels have been created `data` and `transactions`
* Transactions have isolation level and read-phenomena tests
* New methods on DB: `iterator`, `clear`, `count`, `dump`
* New methods on DBTransaction: `iterator`, `clear`, `count`, `dump`
@CMCDragonkai
Copy link
Member Author

Still need to add DBTransaction.clear method to be a transactional clear that iterates over a sublevel and deletes all the entries.

@CMCDragonkai CMCDragonkai changed the title WIP: Enable the usage of withF and withG context for DB transactions and provide transactional iterators Enable the usage of withF and withG context for DB transactions and provide transactional iterators Mar 23, 2022
@CMCDragonkai
Copy link
Member Author

Ok this is ready to merge.

@CMCDragonkai CMCDragonkai merged commit 3bf40af into master Mar 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

1 participant