
Coalesce concurrent locking when writing the session token during GRPC requests #18

Open
emmacasolin opened this issue Dec 1, 2021 · 11 comments
Labels: development (Standard development), r&d:polykey:core activity 1 (Secret Vault Sharing and Secret History Management), respeck

Comments

@emmacasolin

Specification

Currently, requests to refresh the session token are not dropped if another process is holding a write lock on the session file; they wait for the other process to finish and then refresh the session token as well. This is unnecessary: we can simply drop the request, because the other process is guaranteed to refresh the session token when it completes.

In order to do this we need to make changes to the fd-lock library we're currently using (likely in its native C++ code) so that we can see which processes currently hold a lock on the session file, as well as what type of lock it is.
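
As a rough illustration of the intended behaviour (not the current implementation), here's a minimal sketch assuming a hypothetical non-blocking `tryLockWrite` primitive, which fd-lock does not expose today:

```ts
import fs from 'fs';

// Hypothetical non-blocking primitive that fd-lock would need to expose:
// resolves to a releasable handle, or null if another process already
// holds the write lock on the file
declare function tryLockWrite(
  path: string,
): Promise<{ release(): Promise<void> } | null>;

async function refreshSessionToken(
  sessionFile: string,
  token: string,
): Promise<void> {
  const lock = await tryLockWrite(sessionFile);
  if (lock == null) {
    // Another process holds the write lock and will refresh the token
    // when it completes, so this request can be dropped (coalesced)
    return;
  }
  try {
    await fs.promises.writeFile(sessionFile, token);
  } finally {
    await lock.release();
  }
}
```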

Additional context

```ts
test('Parallel processes should not refresh the session token', async () => {
  let tokenP1 = 'token1' as SessionToken;
  let tokenP2 = 'token2' as SessionToken;

  const tokenBuffer = await fs.promises.readFile(sessionFile);
  const prevTokenParallel = tokenBuffer.toString() as SessionToken;

  async function runListCommand(): Promise<SessionToken> {
    // At least 1 second of delay
    // Expiry time resolution is in seconds
    await sleep(1000);
    await testUtils.pkStdio(command, {}, dataDir);
    const buffer = await fs.promises.readFile(sessionFile);
    return buffer.toString() as SessionToken;
  }

  [tokenP1, tokenP2] = await Promise.all([
    runListCommand(),
    runListCommand(),
  ]);

  // Verify both tokens
  // Note: `expect(async () => ...).not.toThrow()` never awaits the promise,
  // so a rejection would be silently missed; use `resolves` instead
  await expect(
    polykeyAgent.sessionManager.verifyToken(tokenP1),
  ).resolves.not.toThrow();
  await expect(
    polykeyAgent.sessionManager.verifyToken(tokenP2),
  ).resolves.not.toThrow();

  // Check that both commands were completed
  expect(tokenP1).not.toEqual('token1');
  expect(tokenP2).not.toEqual('token2');

  // Check that the session token was refreshed exactly one time
  if (tokenP1 === prevTokenParallel) {
    expect(tokenP2).not.toEqual(prevTokenParallel);
  } else if (tokenP2 === prevTokenParallel) {
    expect(tokenP1).not.toEqual(prevTokenParallel);
  } else {
    expect(tokenP1).toEqual(tokenP2);
  }
});
```

Tasks

  1. Make necessary changes to fd-lock library
  2. Write tests to check behaviour
@emmacasolin emmacasolin added the bug (Something isn't working) and development (Standard development) labels Dec 1, 2021
@CMCDragonkai
Member

MatrixAI/Polykey#290 is the issue for making changes to fd-lock

@CMCDragonkai
Member

Originally the idea was that on concurrent requests, the lock on the session token would be checked, and if it was already held, the write would be skipped. This ensures that concurrent requests don't block each other merely because they are queuing up to write the session token.

Because the onReceiveMetadata interceptor used by GRPC is called synchronously, writing the token happens asynchronously and therefore doesn't block the GRPC call anyway.

This turned out to be a problem later in our tests when using pkStdio, because the main CLI function would finish without having written the token. Details here: MatrixAI/Polykey#296 (comment) and MatrixAI/Polykey#296 (comment)

Upstream is now potentially fixing this so that onReceiveMetadata will be called asynchronously, and therefore async operations there will complete before the GRPC call finishes.

When that happens, our concurrent GRPC calls will block each other, and we will need to address this by changing the way our locking works:

  1. We will need to ensure that if the lock is already held, we can drop the locking and just complete, thus coalescing the session token write between concurrent GRPC calls.
  2. We should also bring in the IPC locking library @matrixai/js-file-locks (MatrixAI/Polykey#290) to introduce RWLocks in IPC, so that our overall locking design improves in general; right now reading also blocks other reads. See the sketch below for the intended semantics.
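
To illustrate the RWLock semantics we're after, here is a sketch where `FileRWLock` is a hypothetical stand-in for what @matrixai/js-file-locks could provide, not an existing export:

```ts
import fs from 'fs';

// Hypothetical cross-process read-write lock keyed on a file path
declare class FileRWLock {
  constructor(path: string);
  withRead<T>(f: () => Promise<T>): Promise<T>; // shared: reads don't block reads
  withWrite<T>(f: () => Promise<T>): Promise<T>; // exclusive: one writer at a time
}

async function readToken(sessionFile: string): Promise<string> {
  const lock = new FileRWLock(sessionFile);
  // Many concurrent CLI invocations can read the token at the same time
  return lock.withRead(async () =>
    (await fs.promises.readFile(sessionFile)).toString(),
  );
}

async function refreshToken(sessionFile: string, token: string): Promise<void> {
  const lock = new FileRWLock(sessionFile);
  // Only a token refresh needs to exclude readers and other writers
  await lock.withWrite(async () => {
    await fs.promises.writeFile(sessionFile, token);
  });
}
```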

@CMCDragonkai CMCDragonkai changed the title Concurrent requests to write to the session file should be dropped Coalesce concurrent locking when writing the session token Dec 13, 2021
@CMCDragonkai CMCDragonkai changed the title Coalesce concurrent locking when writing the session token Coalesce concurrent locking when writing the session token during GRPC requests Dec 13, 2021
@CMCDragonkai
Member

Not a bug atm, removing the label.

@CMCDragonkai CMCDragonkai removed the bug (Something isn't working) label Dec 15, 2021
@CMCDragonkai CMCDragonkai added the epic (Big issue with multiple subissues) and r&d:polykey:core activity 1 (Secret Vault Sharing and Secret History Management) labels Jul 23, 2022
@CMCDragonkai
Member

The transition to JSON RPC means we need to consider how to implement middleware for the streams.

Right now we have client side GRPC interceptors and also utility functions on the server side handlers.

We could use transform streams for the JSON streams, and use them as middleware for the RPC client and RPC server.

These middleware may need to be able to cancel the subsequent handling, for example on authentication failure.

We could make this simpler by just providing utility functions to be used.
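
A rough sketch of what such stream middleware could look like using web TransformStreams; the frame shape, metadata location, and error handling here are assumptions for illustration, not the actual js-rpc types:

```ts
// Hypothetical JSON RPC frame shape with auth metadata on it
type JSONRPCFrame = { method?: string; metadata?: { token?: string } };

// Middleware as a TransformStream: passes frames through, but errors the
// stream (cancelling subsequent handling) on authentication failure
function authMiddleware(
  verifyToken: (token: string) => Promise<boolean>,
): TransformStream<JSONRPCFrame, JSONRPCFrame> {
  return new TransformStream({
    async transform(frame, controller) {
      const token = frame.metadata?.token;
      if (token == null || !(await verifyToken(token))) {
        controller.error(new Error('Unauthenticated'));
        return;
      }
      controller.enqueue(frame);
    },
  });
}

// Usage: compose onto the frame stream before it reaches the handler, e.g.
// frameStream.pipeThrough(authMiddleware(verify)).pipeTo(handlerSink);
```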

@CMCDragonkai
Member

The RPC server always has to consume the first frame (though it can re-yield it to the next consumer), because it has to know what the method is in order to know which RPC handler will handle the call. At this point it can also perform middleware duties and determine whether the call is authenticated or not. It could then return a failure frame to the peer and close the call, or provide metadata to the end consumer.
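
For example, the dispatch step could look roughly like this; the frame shape, handler signature, and failure-frame mechanics are all assumed for illustration:

```ts
type Frame = { method?: string; metadata?: { token?: string } };
type Handler = (input: AsyncIterable<Frame>) => Promise<void>;

async function dispatch(
  frames: AsyncIterator<Frame>,
  handlers: Map<string, Handler>,
  isAuthenticated: (frame: Frame) => Promise<boolean>,
  sendFailureAndClose: (reason: string) => Promise<void>,
): Promise<void> {
  // The first frame must be consumed to know which handler to route to
  const first = await frames.next();
  if (first.done) return;
  const handler = handlers.get(first.value.method ?? '');
  if (handler == null) {
    return sendFailureAndClose('unknown method');
  }
  // Middleware duties can happen here too, before the handler runs
  if (!(await isAuthenticated(first.value))) {
    return sendFailureAndClose('unauthenticated');
  }
  // Re-yield the consumed first frame so the handler sees the whole stream
  await handler(
    (async function* () {
      yield first.value;
      for (let f = await frames.next(); !f.done; f = await frames.next()) {
        yield f.value;
      }
    })(),
  );
}
```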

@CMCDragonkai
Member

So the only thing we need here is to change fd-lock so that it supports a 0 timeout, which makes it a non-blocking locking library: if something else is holding the lock and our timeout is 0, we just skip.
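
Something like the following wrapper semantics; `tryLock` and `lockWithTimeout` are hypothetical names for illustration, not fd-lock's actual exports:

```ts
// Assumed non-blocking primitive in the style of fd-lock: returns true if
// the exclusive lock was acquired, false if another process holds it
declare function tryLock(fd: number): boolean;

// A timeout of 0 means a single attempt: if the lock is held elsewhere,
// skip immediately instead of waiting (coalescing the token write)
async function lockWithTimeout(
  fd: number,
  timeout: number,
  interval: number = 50,
): Promise<boolean> {
  const deadline = Date.now() + timeout;
  for (;;) {
    if (tryLock(fd)) return true;
    if (timeout === 0 || Date.now() >= deadline) return false;
    await new Promise((resolve) => setTimeout(resolve, interval));
  }
}
```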

@CMCDragonkai CMCDragonkai self-assigned this Jul 10, 2023
@CMCDragonkai
Member

Moving to Polykey-CLI.

@CMCDragonkai CMCDragonkai transferred this issue from MatrixAI/Polykey Aug 11, 2023
@CMCDragonkai CMCDragonkai removed their assignment Aug 18, 2023
@CMCDragonkai
Member

CMCDragonkai commented Aug 11, 2024

GRPC and its middleware design aren't relevant anymore, but the general problem of coalescing the locking is still an issue, because the PK CLI must support concurrent operations when interacting with the PK agent due to its client-to-server authentication.

The PK CLI still uses fd-lock, and I'm pretty sure this stub repo https://github.com/MatrixAI/js-file-locks was created to eventually replace it.

This issue should be re-specced in line with the current RPC structure, and the middleware that handles the authentication.

@CMCDragonkai
Member

@tegefaulkes can you verify this (or delegate it to someone).

@CMCDragonkai CMCDragonkai removed the epic (Big issue with multiple subissues) label Aug 12, 2024
@tegefaulkes tegefaulkes added the respeck label Aug 13, 2024 — with Linear
@tegefaulkes
Contributor

I'll have a look into this.

@CMCDragonkai
Member

CMCDragonkai commented Aug 16, 2024

The parent issue of this is actually MatrixAI/Polykey#290.

While the issue title talks about GRPC, I think this is still relevant to the current js-rpc, so the spec should be updated.
