Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

s2a: Add gRPC S2A #11113

Merged
merged 28 commits into from
Sep 14, 2024
Merged

s2a: Add gRPC S2A #11113

merged 28 commits into from
Sep 14, 2024

Conversation

rmehta19
Copy link
Contributor

@rmehta19 rmehta19 commented Apr 18, 2024

Add S2A Java client to gRPC Java.

Context: https://github.com/google/s2a-go/blob/main/README.md

Copy link

linux-foundation-easycla bot commented Apr 18, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

@matthewstevenson88
Copy link
Contributor

Thanks @rmehta19. Can you PTAL at the test failures?

@rmehta19 rmehta19 force-pushed the s2a branch 5 times, most recently from 604f9a0 to 3c867c9 Compare April 22, 2024 18:14
@rmehta19
Copy link
Contributor Author

rmehta19 commented Apr 22, 2024

@matthewstevenson88, I made 2 changes and it looks like 2/3 linux runs are passing. tests(11) is failing with an error in code that was not affected by this PR, so I don't think that failure is related.

The changes:

  • The build originally failed because the generated grpc code in this PR did not match the code generated by the codegen plugin. I couldn't easily compile the codegen myself(due to reasons explained in: protoc-gen-grpc-java not available on apple m1 #7690). So instead I used the latest published binary available on https://mvnrepository.com/artifact/io.grpc/protoc-gen-grpc-java (1.63) to generate S2AServiceGrpc.java (Created and used a separate simple maven project just to run the plugin and get grpc gen code). Then I changed line 8 & 9 of S2AServiceGrpc.java to match other generated code in this repo (ex:

    value = "by gRPC proto compiler",
    comments = "Source: grpc/gcp/handshaker.proto")
    ). This resolved the errors thrown by
    - name: Check for modified codegen
    run: test -z "$(git status --porcelain)" || (git status && echo Error Working directory is not clean. Forget to commit generated files? && false)

  • A failure in the tests was caused when IntegrationTest.java is in package io.grpc.s2a: GetAuthenticationMechanismsTest.java fails because the flag is not being set by JCommander (or perhaps it is getting overwritten). When I moved IntegrationTest.java into the same package as GetAuthenticationMechanismsTest.java: io.grpc.s2a.handshaker, this error does not happen.

@rmehta19 rmehta19 force-pushed the s2a branch 2 times, most recently from b7b51bc to 73cc37a Compare April 24, 2024 22:22
@matthewstevenson88
Copy link
Contributor

LGTM, thanks @rmehta19! When you have a chance, please remove "draft" to the PR title and can we say a bit more in the CL description including pointing to the S2A-Go README (similar to what we did in the proto PR)?

After that, can we get sign-off from @erm-g and @ejona86?

@rmehta19 rmehta19 changed the title [DRAFT]s2a: Add gRPC S2A s2a: Add gRPC S2A Apr 26, 2024
@rmehta19
Copy link
Contributor Author

LGTM, thanks @rmehta19! When you have a chance, please remove "draft" to the PR title and can we say a bit more in the CL description including pointing to the S2A-Go README (similar to what we did in the proto PR)?

After that, can we get sign-off from @erm-g and @ejona86?

Done -- thank you for the review @matthewstevenson88!

Copy link
Member

@ejona86 ejona86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make sure I understand, s2a is the public API, and s2a/channel and s2a/handshaker are internal APIs. Are those internal APIs to be used anywhere, even within Google?

s2a/src/main/java/io/grpc/s2a/S2AChannelCredentials.java Outdated Show resolved Hide resolved
s2a/src/main/proto/grpc/gcp/common.proto Outdated Show resolved Hide resolved
@rmehta19
Copy link
Contributor Author

Thanks for the review @larry-safran ! Please let me know if there is anything else to address.

@larry-safran larry-safran added the kokoro:run Add this label to a PR to tell Kokoro the code is safe and tests can be run label Sep 13, 2024
@grpc-kokoro grpc-kokoro removed the kokoro:run Add this label to a PR to tell Kokoro the code is safe and tests can be run label Sep 13, 2024
@larry-safran larry-safran merged commit b8c1aa5 into grpc:master Sep 14, 2024
15 checks passed
Copy link
Member

@ejona86 ejona86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some random things I noticed.

Copy link
Member

@ejona86 ejona86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More random stuff. This is mostly from just skimming randomly and noticing unexpected code shapes.

s2a/src/test/resources/client.csr Show resolved Hide resolved
subjectAltName = @alt_names

[alt_names]
IP.1 = ::
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is happening here? How would this ever work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is so that the generated certs have localhost IP addr, so that certificate verification passes in tests

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't localhost ::1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is true. I ran a local test modifying the IP to be ::1 and regenerating these certs, confirming our tests still pass. I'll modify this in a separate PR along with the other comments on the EC cert data: #11540 (comment) .

I'll also update our go client tests for consistency: https://github.com/google/s2a-go/blob/main/testdata/config.cnf

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 7a879a2

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the confusion and thanks for pointing out. To be clear, out tests rely on the fact that Common Name is set to localhost (not actually in this config, but when this config is used to create the CSR, it is set. I noted this in the README). So that's why this field was left as ::, and had no effect on tests when set to ::1.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then let me repeat my initial question: What is happening here? What is it doing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use the config to generate certs to be used for client & S2A when testing mTLS to S2A. This is used in an integration test (also in S2AChannelCredentialsTest unit test, but that's just a unit test, no channel actually gets established between the client and S2A). As part of mTLS peer cert verification process, it is required that certificate CommonName matches the hostname (which is localhost).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand. This is alt_names aka SAN, which is not CommonName. And :: is not ::1 and that would not validate if the client used "localhost". So that all seems irrelevant.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(As I said before, as a hint for integration tests, don't put localhost in the cert, which is actually invalid. Instead, use managedChannelBuilder.overrideAuthority(). It is available in all languages; WithAuthority() in Go.)


Channel ch = channelPool.getChannel();
S2AServiceGrpc.S2AServiceStub stub = S2AServiceGrpc.newStub(ch);
S2AStub s2aStub = S2AStub.newInstance(stub);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Stub is never closed. This causes IntegrationTest (after Larry's reordering of awaitTermination) to not complete tests quickly, as the RPC is outstanding so awaitTermination() hangs. I think once we shut down the top-level channel that releases the S2AHandshakerServiceChannel.ChannelResource after 1 second, which then calls shutdownNow() on the s2a channel which cancels the RPC.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we never call close on the stub directly. When the
close gets invoked, everything you mentioned happens.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not okay. That means we leak an RPC per connection for the life of all s2a channels, which is probably the lifetime of the process.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would closing the stub in close resolve the issue?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It resolves "never" but that is still busted. The stub needs to be closed when it is no longer needed. That could be when the future completes (e.g., sslContextFuture.addListener(s2aStub::close, service)). But I don't see a reason to split up the lifetimes between the threads like this. It seems clearer to:

@Override
protected void handlerAdded0(ChannelHandlerContext ctx) {
  // Buffer all reads until the TLS Handler is added.
  BufferReadsHandler bufferReads = new BufferReadsHandler();
  ctx.pipeline().addBefore(ctx.name(), /* name= */ null, bufferReads);
  ListenableFuture<SslContext> sslContextFuture = service.submit(this::createSslContext);
  ...
}

private SslContext createSslContext() {
  Channel ch = channelPool.getChannel();
  try (S2AStub s2aStub = S2AStub.newInstance(S2AServiceGrpc.newStub(ch))) {
    return SslContextFactory.createForClient(s2aStub, hostname, localIdentity);
  } finally {
    channelPool.returnToPool(ch);
  }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The channel is still needed after the SslContext future is complete (we make RPCs to S2A to verify the peer, and perform private key operations during the handshake). The SslContextBuilder configures the trustManager and PrivateKeyMethod with the stub. In this example, the channel gets returned to the pool before the handshake is complete, which causes an error when S2ATrustManager uses the stub to verify the peer cert.

The stub is no longer needed when the handshake is complete. A probably not great option would be to pass around the channel instead, and create a new stub for every RPC? Or perhaps we plumb the stub down to the ClientTlsProtocolNegotiator and close it when the handshake is done?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ejona86, WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. A new stub for every RPC is fine. I thought y'all needed to use the same streaming RPC to preserve context. If separate RPCs are fine, that is definitely superior.

We don't want to inject into the handshake completion code itself, but that code is triggering a ProtocolNegotiationEvent. We can use that.

private static final class S2AStubCleanupNegotiationHandler extends ProtocolNegotiationHandler {
  private final s2aStub; // pass in constructor
  @Override
  protected void protocolNegotiationEventTriggered(ChannelHandlerContext ctx) {
    s2aStub.close();
    fireProtocolNegotiationEvent(ctx);
    ctx.pipeline().remove(ctx);
  }
}

And then in S2aProtocolNegotiationHandler, ctx.pipeline().addAfter() that S2AStubCleanupNegotiationHandler before adding the tls handler. (So the TLS handler will be between that cleanup handler and the s2a handler.)

We could do alternatives like adding the tls handler before the s2a handler, but they take a bit more reorganization.

Copy link
Contributor Author

@rmehta19 rmehta19 Oct 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Eric! I've implemented what you mention in #11600

I initially did think that we could do a new stub for each RPC, since we attach the localIdentity to each RPC, but chatted with @matthewstevenson88 and he mentioned there is a requirement from S2A that the lifetime of the bidirectional stream from the client to the S2A must be tied to the lifetime of the handshake.

@rmehta19
Copy link
Contributor Author

Thanks for the comments @ejona86; I'll address them in separate commits at #11534. Let me know if you prefer separate PRs for them, or any other way of organizing.

@ejona86
Copy link
Member

ejona86 commented Sep 18, 2024

There's plenty of smaller/isolated items. Feel free to group those together as you wish, as they are easy to review. A few of the comments might turn into larger/plumbing changes. Some of those may be best to be separate. But overall, if you do lots of the "easy" stuff together that'd be fine to review/merge while you work on something more difficult.

@rmehta19
Copy link
Contributor Author

@ejona86 , @larry-safran I have addressed all the comments on this PR in the following 3 PRs:

PTAL at these 3 and leave any comments.

I will address the comment to move things to internal package and combine the MTLS and non-MTLS apis in the following 2 PRs, once the above 3 are merged.

Besides these changes, please let me know anything else we can do to enable s2a to be included in a grpc-java release. Thank you!


Channel ch = channelPool.getChannel();
S2AServiceGrpc.S2AServiceStub stub = S2AServiceGrpc.newStub(ch);
S2AStub s2aStub = S2AStub.newInstance(stub);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It resolves "never" but that is still busted. The stub needs to be closed when it is no longer needed. That could be when the future completes (e.g., sslContextFuture.addListener(s2aStub::close, service)). But I don't see a reason to split up the lifetimes between the threads like this. It seems clearer to:

@Override
protected void handlerAdded0(ChannelHandlerContext ctx) {
  // Buffer all reads until the TLS Handler is added.
  BufferReadsHandler bufferReads = new BufferReadsHandler();
  ctx.pipeline().addBefore(ctx.name(), /* name= */ null, bufferReads);
  ListenableFuture<SslContext> sslContextFuture = service.submit(this::createSslContext);
  ...
}

private SslContext createSslContext() {
  Channel ch = channelPool.getChannel();
  try (S2AStub s2aStub = S2AStub.newInstance(S2AServiceGrpc.newStub(ch))) {
    return SslContextFactory.createForClient(s2aStub, hostname, localIdentity);
  } finally {
    channelPool.returnToPool(ch);
  }
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants