Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Upgrading @matrixai/async-init, @matrixai/async-locks, @matrixai/db, @matrixai/errors, @matrixai/workers, Node.js and integrating @matrixai/resources #366

Closed
wants to merge 78 commits into from

Conversation

CMCDragonkai
Copy link
Member

@CMCDragonkai CMCDragonkai commented Mar 28, 2022

Description

Several core libraries have been updated and so they need to be integrated into PK. Refer to MatrixAI/js-encryptedfs#63 to see how this was handled for EFS and repeat here.

  1. @matrixai/async-init - no major changes
  2. @matrixai/async-locks - all instances of Mutex should change to use Lock or RWLockWriter
  3. @matrixai/db - the DB now supports proper DB transactions and has introduced iterator, however no more sublevels (use KeyPath/LevelPath instead)
  4. @matrixai/errors - we make all of our errors extend AbstractError<T> and also provide static descriptions to all of them, as well as use the cause chain
  5. @matrixai/workers - no major changes here
  6. node.js - we should upgrade to Node 16 in order to integrate promise cancellation, as well as using Promise.any for ICE protocol
  7. @matrixai/resources - since the @matrixai/db no longer does any locking, the acquisition of the DBTransaction and Lock has to be done together with withF or withG

Along the way, we can explore how to deal with indexing #188 and #257, which should be easier now that DB has root levels of data, transactions.

Locking changes

There were three important types of locks brought in by js-async-locks:

  1. Lock (this is the equivalent of Mutex which we were using previously)
  2. RWLockWriter (a write-preferring read/write lock)
  3. LockBox (a collection of related locks)

In most cases, we are removing locks in favour of using optimistic concurrency control (OCC). This means that most domain locks should be removed with a few exceptions:

  • Counters/serialised data - places where we have strict serialisation, such as counters and lexicographically ordered keys, require locking the DB in order to maintain this order. This applies to the keys used by the Discovery Queue in Discovery.ts, the NotificationIds and message count used by NotificationsManager.ts, the ClaimIds and sequence numbers used by Sigchain.ts, and if the set/ping node or refresh buckets queues introduced in Testnet Deployment #326 become persistent then we would need locking there as well. This could be achieved for these cases by introducing a LockBox to the affected domains and locking the relevant keys when we access the db, e.g. withF([this.db.transaction(), this.locks.lock(...lockRequests)], async ([tran]) => {}) where this.locks is a LockBox and ...lockRequests is an array of [KeyPath, Lock] (see EFS INodeManager.ts).
  • Object maps - in NodeConnectionManager and VaultManager we need to lock groups of objects, this can be done using a LockBox where we are referencing the same ID each time we want to lock a specific object.

Everywhere else we expect that conflicts will be rare, so we don't use locks in order to simplify our codebase. In the case of a conflict, we can either retry (if safe) or bubble up the error to the user.

Errors changes

The new js-errors allows us to bring in error chaining, along with more standardised JSON serialisation/deserialisation (for sending errors across gRPC). With this error chaining ability, there are now three ways that we can handle/propagate errors:

  1. Error re-raise/re-throw: This re-raises the same error, in this case it just means you couldn't handle the error.
try {
  throw new ErrorPolykey('something happened');
} catch (e) {
  throw e;
}
  1. Error override: This overrides the error with a whole new error. Use this when you are wrapping a library of internal errors and presenting transformed external errors to the user.
try {
  throw new ErrorPolykey('something happened');
} catch (e) {
  throw new ErrorPolykey2(e.message);
}
  1. Error chain: This raises a new error and chains the new error with the old error. Use this when your code represents an onion of different layers each capable of handling different errors.
try {
  throw new ErrorPolykey('something happened');
} catch (e) {
  throw new ErrorPolykey2(e.message, { cause: e });
}

In all places where we are catching one error and throwing a different error in its place, we should be using approach 3 (error chain). If we just want to bubble the original exception upwards then use approach 1 (re-raise/re-throw). Finally, if we want to hide the original error from the user (perhaps it contains irrelevant implementation details or could be confusing and thus requires additional context) we can use approach 2 (error override). There is a fourth approach that exists in Python for errors that occur as a direct result of handling another error, however, this does not exist in TypeScript (in such a case we would use approach 3). When using approach 2 (and in some cases approach 3) you may want to log out the original error in addition to throwing the new error.

JSON serialisation/deserialisation

When sending errors between agents/from client to agent we need to serialise errors (including the error chain if this exists). Then, on the receiving side, we need to be able to deserialise back into the original error types.

We are able to do this using JSON.stringify() (serialisation) and JSON.parse() (deserialisation). These methods allow us to pass in a replacer/reviver to aid with converting our error data structure, as well as being combined with toJSON() and fromJSON() utility methods on the error class itself. These are implemented on AbstractError from js-errors, however, we need to extend these to work with ErrorPolykey in order to handle the additional exitCode property. While toJSON() can simply call super.toJSON() and add in the extra field, fromJSON() needs to be completely reimplemented (although this can be copied from AbstractError for the most part). Similarly, the replacer and reviver can be based on the replacer and reviver used in the js-errors tests.

ErrorPolykeyRemote

Errors are propagated between agents and clients as follows:

toError: ServiceError -> ErrorPolykeyRemote                         fromError: Error -> ServerStatusResponse

                                                                         ┌─────────────────────────────┐
      ┌───────────────────────────────┐                                  │ NodeId 1 - Server           │
      │ Client                        │             Request              │                             │
      │                               │                                  │ ┌─────────────────────────┐ │
      │ Receives toError ServiceError ├──────────────────────────────────┼─► Service Handlers        │ │
      │ Throws ErrorPolykeyRemote     │                                  │ │                         │ │
      │ cause: ErrorX | ErrorUnknown  ◄──────────────────────────────────┼─┤ Sends fromError ErrorX  │ │
      │ data: { nodeID: 1, ... }      │                                  │ └─────────────────────────┘ │
      └───────────────────────────────┘      Response with error         │                             │
                                             JSON string in metadata     └─────────────────────────────┘
                                          error: { type: ErrorX, ... }

ErrorPolykeyRemote is constructed on the client side (not the server side) inside our gRPC toError() utility. After the received error is deserialised, it is wrapped as the cause property of a new ErrorPolykeyRemote, which should also contain the nodeId, host, and port of the agent that originally sent the error in its data property. In order to access this information, it needs to be passed through from wherever the client/agent method is called (this would be bin commands for the client service and domain methods for the agent service). The data can then be passed through to the promisify...() methods, which in turn call toError().

Testing

Now that we have an error chain, we need to adjust our tests to be able to perform checks on these. In many cases where we were originally expecting some specific ErrorPolykey we will now be receiving an ErrorPolykeyRemote with the original error in its cause property. For simple cases like this it is simple enough to just perform the existing checks on the cause property of the received error rather than the top-level error, however this approach becomes more complicated for longer error chains. Additionally, we may want to perform checks on the top-level ErrorPolykeyRemote (such as checking the metadata for the sending agent).

In this case, it would be useful to create an expectation utility that allows one to perform checks on the entire error chain, from the top-level error to the final error in the chain. This could look something like this:

function expectErrorChain(thrownError: any, expectedErrors: Array<any>) {
  let checkingError = thrownError;
  for (error in errors) {
    // Check expected type first
    expect(checkingError).toBeInstanceOf(...);
    // Match against every expected property
    expect(checkingError.message).toBe(error.message);
    if (error instanceof ErrorPolykey) {
      expect(checkingError.data).toEqual(error.data);
      // ...
    }
    // Move along to the next in the cause chain
    checkingError = checkingError.cause;
  }
}

We could also think about using the toJSON() method on each error to allow us to use jest's expect().toMatchObject() matcher rather than having to check every error property individually. Also potentially including parameters to specify which properties of the error you do and don't want to check against.

Additional context

Database changes

Changes include but are not limited to

  1. Updating any transact wrappers by removing them or using withF, withG locking directly.
  2. In all cases where there is a conflict just throw it up the stack. We will expect to handle them within the handlers or look deeper into it later.
  3. ErrorDBTransactionConflict Error should never be seen by the user. We should catch and override it with a more descriptive error for the context.
  4. Transactions should be started within the handlers and passed all the way down to where they are needed. The idea is to make each handler attomic.
  5. Concurrency testing should be introduced but only after SI transactions has be implemented.
  6. All usage of DB should be updated to use the new API. This means removing sublevels and utilising LevelPath and KeyPaths instead.
  7. All usage of db streams should be replaced with the db iterator.
  8. All instances of db.put, db.get and db.del should be using transactions via tran.put/get/del

This applies to all domains that make use of DB OR domains that depend on others that make use of DB. The goal here is to make any even starting from the handlers atomic.

There are limitations to this however. Since a transaction can fail if there is overlapping edits between transactions. We can't really include changes to the db that will commonly or guarantee conflict. Example of this are counters or commonly updated fields. So far this has been seen in;

  1. NotificationsManager. Makes use of a counter so any transactions that include Adding or removing a notification WILL conflict. Reads also update metadata so concurrently reading the same message WILL conflict.
  2. More to follow?

Some cases we will need to make use of locking along with a transaction. A good example of this is in the NotificationManager where we are locking the counter update. When this is the case we need to take extra care with the locking. Unless the lock wraps the whole transaction it is still possible to conflict on the transaction. we can't compose operations that rely on this locking with larger transactions.

An example of this problem is.

start tran1
start tran2

start lock
	tran1 update counter
end lock

start lock
	tran2 update counter
end lock

end tran1
end tran2 // conflict!

To avoid this tran2 has to start after tran1 ends. 
We need to force serialisation of the transactions here.
As a result we can't compose with a transaction outside of
the lock.

This means that some operations or domains can't be composed with larger transactions. It has yet to be seen if this will cause an issue since more testing is required to confirm any problem. I suppose this means we can't mix pessimistic and optimistic transactions. So far it seems it will be a problem with the following domains.

  1. Vaults domain - Lifecycle and editing of vaults relies heavily on locks.
  2. NotificationsManager - Locking is needed for the counter and preventing other conflicts.

Node.js changes

After upgrading to Node v16 we will be able to bring in some new features.

  1. Promise.any - we are currently using Promise.all in our pingNode() method in NodeConnectionManager (this is being done in Testnet Deployment #326) but this should change to using Promise.any. This is because Promise.all waits for every promise to resolve/reject, however we only care about whichever finishes first and we want to cancel the rest. This change can be made in Testnet Deployment #326 after this PR is merged.
  2. AggregateError - this error is emitted by Promise.any if all of the given promises are rejected. In our case this would mean that we were unable to ping the desired node via direct connection or signalling (and eventually relaying once this is implemented), so we may want to catch this and re-throw some other error to represent this. We will also need to add this into our error serialisation/deserialisation.
  3. AbortController - there are a number of places that have been identified in Asynchronous Promise Cancellation with Cancellable Promises, AbortController and Generic Timer #297 where we could use the new AbortController/AbortSignal. This can be done in a separate PR when rebasing Testnet Deployment #326.

Issues Fixed

Tasks

  • 1. Upgrade all exceptions to use AbstractError and use cause chain and static descriptions
  • 2. Remove all sublevel object maintenance
  • 3. Update all usages of db.get and db.put and db.del to use KeyPath
  • 4. Remove all uses of batch and domain locking and instead migrate to using DBTransaction and the withF and withG from @matrixai/resources
  • 5. Replace all uses of createReadStream, createKeyStream and createValueStream with db.iterator
  • 6. All domain methods that involve DB state should now take a tran?: DBTransaction optional parameter in the last parameter to allow one to compose a transaction context
  • 7. If possible, get EFS to use the new DB as well, which can be done by making DB take a "prefix", thus recreating sublevels, but only for this usecase. This would mean both EFS and PK use the same DB, with EFS completely controlling a lower level.
  • 8. See where RWLock can be used instead when doing concurrency control on the DBTransaction to raise the isolation level to avoid non-repeatable reads, or phantom reads or lost-updates.
  • 9. Upgrade to Node 16 (follow procedure from Upgrading to node 16.x LTS (and pkgs.nix upgrade to 21.11 or later) TypeScript-Demo-Lib#37)
  • 10. Integrate @matrixai/async-locks to use the Lock class instead of Mutex from async-mutex
  • 11. Use @matrixai/resources withF and withG to replace transactions
  • 12. Clean up left over commentary from migration
  • 13. Update nodeConnectionManager and VaultManager to use lockbox for the object maps.
  • 14. Update NodeConnectionManager withConnF to properly use new resources API features. This includes the ResourceAquire making use of error handling.

Tasks are divided by domain for conversion. Assignments are as follows. Domains labelled 'last' depend on a lot of other domains and will be divided later.

  • acl - Emma
  • agent - last
  • bootstrap - last
  • client - last
  • discovery - Brain
  • gestalts - Emma
  • identities - Emma
  • nodes - Emma , depends on sigchain
  • notifications - Brian
  • sessions - Emma
  • sigchain - Brain
  • vaults - Brian
  • polykeyAgent - last

Final checklist

  • Domain specific tests
  • Full tests
  • Updated inline-comment documentation
  • Lint fixed
  • Squash and rebased
  • Sanity check the final build

TBD spec:

Error changes:

Testing for an error with a cause.

  1. Pending changes from JSON serialisation/deserialisation from the new AbstractError
  2. ErrorPolykeyRemote needs to have remote node metadata
  3. Work out how ErrorPolykeyRemote is propagated from agent to agent to client.
  4. Create new expectation utilities that can standardise how we expect things to fail and the checks on those exceptions, as well as the cause property
  5. Bring in the standards of exception override vs exception chain into the spec

DB changes:

domains - Each domain used to have a transact or withTransaction wrapper. Unless this wrapper actually needs some additional context, and it is just calling withF and withG directly, then avoid this and just replace it with direct usage of withF and withG utilities. We should identify domains that require additional context, in those cases we continue to use a with transaction wrapper.

transaction, and its utilisation of SI transactions

withF([db.transaction()], async ([tran]) => {
    await tran.put(k1, ...);
    await tran.put(k2, ...);
});

withF([db.transaction()], async ([tran]) => {
    await tran.put(k2, ...);
    await tran.put(k3, ...);
});
ErrorDBTransactionConflict
  .data => [k1, k2, k3]

In all cases, you just throw this exception up, and propagate it all the way to the client.

The pk client however will change the message to be more specific to the user's action.

Retrying this can be identified and sprinkled into the codebase afterwards.

  1. Work out a way to write a user-friendly message when a ErrorDBTransactionConflict occurs in the PolykeyClient (at the CLI)

  2. Identify where automatic retries can occur, and make those catch the ErrorDBTransactionConflict and resubmit the transaction

  3. Identify write-skews, this is where we must use solutions for dealing with write skews => locking and materialising the write conflict

  4. Identify serialisation requirements - counter updates, and where PCC is demanded as part of the API, in those situations use the LockBox or lock

Handlers need to now start the transaction. For agent and client.

We have tran?: DBTransaction where it allows public methods to create their own transaction context.

But this is only for debugging and convenience.

During production, the transaction context is meant to be setup at the handler level.

This means that each handler is its own atomic operation.

By default they start out at the handler level, this will make it easier for us to identify our atomic operations and to compare them.

Tests - introduce concurrency tests, consider some new cocurrency expectation combinators that allow us to more easily say one of these results has to be X, and the rest has to be Y.

Concurrency tests can be introduced domain by domain. - Use Promise.allSettled

We can however focus our concurrency testing at the handler level, because we expect that to be the unit of atomic operations.

All ops removed, changed to using keypath and levelpath.

In the iterator:

  • For tran.iterator you will always have the key no matter what because it is being used for matching between the overlay and underlying db data.
  • The keyAsBuffer and valueAsBuffer are by default true, if you change them to false, it will decode the 2, and return as decoded values. If you need both, use dbUtils.deserialize().
  • More docs for tran.iterator
  • Eliminate all subleveldown, we are no longer using that (can be brought back later, would only be useful to share db betweeen EFS and PK DB)
  • Eliminate all streams
withF([db.transaction()], async ([tran]) => {
    await tran.put('k1', 'adfdsf);
    const i = tran.iterator();
    const r1 = await tran.get('k1')
    i.seek('k1')
    const r2 = i.next()
    // r1 !== r2 (possible)
    // with SI r1 === r2
});

Just think that the iterator has the same snapshot as the original transaction.

Don't start writing concurrency tests until we have SI transactions. Serial tests are still fine.

Nodev16 Changes:

  1. Promise.any - for ICE protocol for the DC and signalling
  2. AggregateError - its in our serialisation and deserialisation, but it's only emitted by Promise.any
  3. AbortController - separate PR (when rebasing testnet deployment) - identify where these may be changed polling

Locking Changes:

All async locks and async mutex is to be replaced with js-async-locks.

In the js-async-locks:

  1. Lock - mutex
  2. RWLockWriter - read/write
  3. LockBox - locking collection

Convert object-map pattern to using LockBox by referencing the same ID.

Use INodeManager in EFS as a guide.

@CMCDragonkai
Copy link
Member Author

CMCDragonkai commented Mar 28, 2022

RWLocks is now in https://github.com/MatrixAI/js-async-locks. Locking tests have moved to there too. This can now be used together here. Note that we pretty much only use RWLockWriter as it is now write-preferring to differentiate from read-preferring RWLockReader.

Standard mutex usage is also Lock.

Why use these?

Because they fit the ResourceAcquire type.

Now anything that needs to acquire a single lock is just a matter of doing:

const lock = new Lock();
withF([lock.acquire], async ([lock]) => { ... });

The lock.acquire are arrow functions so they won't lose the this context. This was important! And so any usage of withF or withG needs to deal with this.

Note that DB did a bit differently:

withF([db.transaction()], ...);

In this case, it returned the ResourceAcquire.

In the locks case, it is just acquire or acquireRead or acquireWrite. Although in our domains, we may further provide abstractions especially in terms of collections of locks that enable us to lock specific keys or ranges of keys (possibly using ES6 symbols) to ensure that we can do concurrency control beyond read-committed transactions.

Furthermore, the withG has some special behaviour https://stackoverflow.com/questions/71645327/typescript-doesnt-infer-tnext-unknown-for-asyncgenerator-function-expressi. @tegefaulkes this could be one of the problems you faced, but there's a simple fix is to just either the generator function in another line or explicitly type the TNext.

So in PK, we would bring in @matrixai/resources and @matrixai/async-locks to replace both locking and with contexts.

The @matrixai/async-init also can use @matrixai/async-locks too, but not super important.

@CMCDragonkai
Copy link
Member Author

MatrixAI/js-db#13 allows us now to use levels arbitrarily, no need to worry about reserved characters anymore. No more base encoding of arbitrary level parts. Key parts always had the ability to have any bytes.

@CMCDragonkai
Copy link
Member Author

CMCDragonkai commented Mar 31, 2022

Example of fixing the sameNodePerm with DBTransaction:

  @ready(new aclErrors.ErrorACLNotRunning())
  public async sameNodePerm(
    nodeId1: NodeId,
    nodeId2: NodeId,
    tran?: DBTransaction,
  ): Promise<boolean> {
    const nodeId1Path = [...this.aclNodesDbPath, nodeId1.toBuffer()] as unknown as KeyPath;
    const nodeId2Path = [...this.aclNodesDbPath, nodeId2.toBuffer()] as unknown as KeyPath;
    if (tran == null) {
      return withF(
        [
          this.db.transaction(),
          this.locks.lockRead(
            dbUtils.keyPathToKey(nodeId1Path).toString('binary'),
            dbUtils.keyPathToKey(nodeId2Path).toString('binary')
          ),
        ],
        async ([tran]) => this.sameNodePerm(nodeId1, nodeId2, tran)
      );
    }
    const permId1 = await tran.get(nodeId1Path, true);
    const permId2 = await tran.get(nodeId2Path, true);
    if (permId1 != null && permId2 != null) {
      return IdInternal.fromBuffer(permId1).equals(IdInternal.fromBuffer(permId2));
    }
    return false;
  }

Furthermore, integration of timeouts into js-async-locks also impacts the API of js-async-locks. But so it's best to merge that before going ahead with the current API of @matrixai/async-locks. MatrixAI/js-async-locks#3

@CMCDragonkai
Copy link
Member Author

#369 discovered that withConnF isn't using the exception parameter for ResourceRelease. Since this PR brings in js-resources, will need to remember to do that here.

@CMCDragonkai
Copy link
Member Author

3.6.3 of js-db builds in the canary check. We should catch the ErrorDBKey exception upon DB.createDB and rethrow as a PK error indicating incorrect key. We technically have something like this already in a number of places.

@CMCDragonkai
Copy link
Member Author

TS demo lib has now been upgraded to v16. You should be able to bring it in here in this PR. Make sure it's a separate commit.

@CMCDragonkai
Copy link
Member Author

@emmacasolin after v16 upgrades, please refer to MatrixAI/js-encryptedfs#63 in order to know how to start upgrading all of our dependencies. Start with the easiest ones like js-errors then to the harder ones like js-resources. We should meet and discuss further once you get there.

I may be upgrading our transaction system further based on lessons from EFS.

@emmacasolin emmacasolin changed the title WIP: Integration of DB Transactions for all Domains WIP: Upgrading @matrixai/async-init, @matrixai/async-locks, @matrixai/db, @matrixai/errors, @matrixai/workers, Node.js and integrating @matrixai/resources Apr 26, 2022
Updated pkgs.nix to a5774e76bb8c3145eac524be62375c937143b80c
Updated node2nix to 1.11.0 and loading directly from GitHub
Updated pkg to 5.6.0 and using pkg-fetch 3.3 base binaries
Updated to TypeScript 4.5+
Updated @types/node to 16.11.7
Updated node-gyp-build to 4.4.0
Updated typedoc to 0.22.15, the .nojekyll generation is automatic
Changed to target ES2021 as node 16 supports it
Bin executable duct-tape in default.nix because node2nix no longer generates bin executables
Updated pkg builds hashes to v16.14.2 and specified target node range
@emmacasolin
Copy link
Contributor

I've brought in the Node 16 upgrade here. Nix shell is working fine along with npm commands (e.g. lint), however, anything involving a build is failing due to the existing WIP changes here since there are errors. Will need to check the builds at some point before this PR is merged.

@CMCDragonkai
Copy link
Member Author

certain kinds of mocks such as import { mocked } from 'ts-jest/utils'; is now deprecated. We should look into replacing it. Maybe with jest.spyon?

This path should now be import { mocked } from 'ts-jest'. That should work.

@CMCDragonkai
Copy link
Member Author

I think I should remove the option to compose transactions with the NotificationsManager methods. We should be taking a pessimistic approach with it. I'll have to look over it again.

The NodeManager needs some of the methods updated with the transactions. I'll fix that as well.

As explained in our chat, there's no problem with mixing PCC and OCC together. During any serialised counter update, add a PCC lock there for that specific key. This will serialise updates to the counter. If other conflicts occur due to overlapping write-set, an ErrorDBTransactionConflict can be thrown but this is by-design. SI always ends up with potentially unnecessary conflicts, it is trade off being made to ensure programming consistency is easier. The work around is to sprinkle PCC where necessary but that specialisation can be done after we setup the foundation of OCC.

@emmacasolin
Copy link
Contributor

No you don't need to do that. You want to add metadata at the toError which is called on the client side. The client side has all the knowledge about where it is calling the remote node. You need to pass the GRPC client information to the toError call. This can be done without doing anything to fromError.

So is the nodeId of the GRPCClientClient/GRPCClientAgent the nodeId of the remote node? In that case it's easy to pass that information through for most calls, except for duplex streams where generatorDuplex is called in the service handlers as well. So we still need to acquire the nodeId from somewhere for those.

@CMCDragonkai
Copy link
Member Author

CMCDragonkai commented May 19, 2022

No you don't need to do that. You want to add metadata at the toError which is called on the client side. The client side has all the knowledge about where it is calling the remote node. You need to pass the GRPC client information to the toError call. This can be done without doing anything to fromError.

So is the nodeId of the GRPCClientClient/GRPCClientAgent the nodeId of the remote node? In that case it's easy to pass that information through for most calls, except for duplex streams where generatorDuplex is called in the service handlers as well. So we still need to acquire the nodeId from somewhere for those.

The nodeId property of GRPCClientClient and GRPCClientAgent is always the NodeID of what the client is connecting to. That's the purpose of this property.

For generatorDuplex, use promisifyDuplexStreamCall. Inject the this.nodeId there as an additional parameter.

@CMCDragonkai
Copy link
Member Author

CMCDragonkai commented May 19, 2022

@emmacasolin to make it a bit more generic, instead of passing just the nodeId: NodeId, you can do something like:

function promisifyUnaryCall<T>(
  client: Client,
  f: (...args: any[]) => ClientUnaryCall,
  clientMetadata?: POJO
): (...args: any[]) => PromiseUnaryCall<T> {

This would allow you to pass any arbitrary data into the ErrorPolykeyRemote.

It's a generic POJO and optional too. Alternatively don't make it optional so that way it's always enforced.

If we're changing the API, and you make a mandatory parameter, then change it to be:

function promisifyUnaryCall<T>(
  client: Client,
  clientMetadata: POJO
  f: (...args: any[]) => ClientUnaryCall,
): (...args: any[]) => PromiseUnaryCall<T> {

@emmacasolin
Copy link
Contributor

For generatorDuplex, use promisifyDuplexStreamCall. Inject the this.nodeId there as an additional parameter.

The place I'm referring to is here for example

function nodesCrossSignClaim({
  db,
  keyManager,
  nodeManager,
  sigchain,
  logger,
}: {
  db: DB;
  keyManager: KeyManager;
  nodeManager: NodeManager;
  sigchain: Sigchain;
  logger: Logger;
}) {
  return async (
    call: grpc.ServerDuplexStream<nodesPB.CrossSign, nodesPB.CrossSign>,
  ) => {
    const genClaims = grpcUtils.generatorDuplex(call, true);
    try {
      ...

generatorDuplex calls generatorReadable, which calls toError(), so the nodeId needs to be passed through from the handler.

@CMCDragonkai
Copy link
Member Author

For generatorDuplex, use promisifyDuplexStreamCall. Inject the this.nodeId there as an additional parameter.

The place I'm referring to is here for example

function nodesCrossSignClaim({
  db,
  keyManager,
  nodeManager,
  sigchain,
  logger,
}: {
  db: DB;
  keyManager: KeyManager;
  nodeManager: NodeManager;
  sigchain: Sigchain;
  logger: Logger;
}) {
  return async (
    call: grpc.ServerDuplexStream<nodesPB.CrossSign, nodesPB.CrossSign>,
  ) => {
    const genClaims = grpcUtils.generatorDuplex(call, true);
    try {
      ...

generatorDuplex calls generatorReadable, which calls toError(), so the nodeId needs to be passed through from the handler.

Well because all of the promisify* utilities ultimately call the generator* utilities, then when you parameterise the promisify* utilities to include the new client metadata, then you have to do the same for the generator* utilities. You're just pushing/passing down information from the top of the call hierarchy.

Also I realised that clientMetadata may be confusing, perhaps connectionMetadata? Or just metadata could work too.

@emmacasolin
Copy link
Contributor

Well because all of the promisify* utilities ultimately call the generator* utilities, then when you parameterise the promisify* utilities to include the new client metadata, then you have to do the same for the generator* utilities. You're just pushing/passing down information from the top of the call hierarchy.

Yes that's what I'm saying. But from my previous comment:

  1. In order for the service handlers to have access to the nodeId, host, and port, they would need to be injected with both the keyManager and proxy (they can't be injected with these values (specifically the nodeId) directly since the keyManager is not yet started when the client/agent services are created in PolykeyAgent).
  2. There's also the issue of the nodeId/host/port needing to be defined outside of the try-catch handlers of the service handlers, since we need these values inside the catch block. This means that if there are any errors while we are trying to acquire the values they won't be handled correctly.

Ignoring the host/port since that can be retrieved from call.getPeer()

@CMCDragonkai
Copy link
Member Author

CMCDragonkai commented May 19, 2022

Well because all of the promisify* utilities ultimately call the generator* utilities, then when you parameterise the promisify* utilities to include the new client metadata, then you have to do the same for the generator* utilities. You're just pushing/passing down information from the top of the call hierarchy.

Yes that's what I'm saying. But from my previous comment:

  1. In order for the service handlers to have access to the nodeId, host, and port, they would need to be injected with both the keyManager and proxy (they can't be injected with these values (specifically the nodeId) directly since the keyManager is not yet started when the client/agent services are created in PolykeyAgent).
  2. There's also the issue of the nodeId/host/port needing to be defined outside of the try-catch handlers of the service handlers, since we need these values inside the catch block. This means that if there are any errors while we are trying to acquire the values they won't be handled correctly.

Ignoring the host/port since that can be retrieved from call.getPeer()

Well now that we allow generic POJO data, you can just write mock data. It doesn't have to be real node ID. This is for a testing situation.

As for acquiring the values, in the catch handler. Fetching these values cannot throw exceptions. They are static values that is set by the time you have started the connection.

Even if ClientMetadata had a specific nodeId: NodeId, then you can easily mock one with IdInternal.from<NodeId>(...).

@CMCDragonkai
Copy link
Member Author

What's the reason to encode the NodeID? Why not leave it as normal NodeId when setting it in the data?

@emmacasolin
Copy link
Contributor

What's the reason to encode the NodeID? Why not leave it as normal NodeId when setting it in the data?

I'm adding it into the metadata of the ServiceError that gets passed into toError, and if I leave it as a NodeId then it loses its type during the serialisation/deserialisation. I could pass the metadata as a POJO instead as a second argument to toError? And that way it could stay as a NodeId. That would preserve the types of the host/port and any other data as well so I might do that.

@CMCDragonkai
Copy link
Member Author

CMCDragonkai commented May 20, 2022

Yes pass it as a second argument. Don't mutate the types, ensure separation of responsibility here.

@CMCDragonkai
Copy link
Member Author

CMCDragonkai commented May 20, 2022

@tegefaulkes

The usual suspects of things breaking due to node v16 are:

  1. network
  2. nodes
  3. vaults

Because vaults rely on nodes, and node rely on network. Focus on fixing the network domain first, then investigate the problem with nodes (at the same time migrating to using the LockBox).

Finally the vaults requires LockBox over the object map, and try fixing the isogit package version to what we have in package.json on master branch in case the isogit introduces changes to their internals, we manipulate a bunch of stuff in isogit to get the cloning system working.

…ors properly

Making use of the updated API to use the error provided to the resourceRelease.
@tegefaulkes tegefaulkes deleted the dbtran branch May 20, 2022 08:45
@CMCDragonkai CMCDragonkai restored the dbtran branch May 20, 2022 08:47
@CMCDragonkai CMCDragonkai reopened this May 20, 2022
@CMCDragonkai
Copy link
Member Author

The dbtran was renamed to feature-dep-upgrades due to security requirements. This PR information has to be copied to the new PR.

@CMCDragonkai
Copy link
Member Author

And also need to reset the links in zenhub.

@tegefaulkes
Copy link
Contributor

New PR at #374. Work from there from now on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment