Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Integrate op-node with op-signer for block payload signing #103

Conversation

mininny
Copy link
Contributor

@mininny mininny commented Oct 2, 2024

Description

This PR introduces how we can integrate op-node with op-signer when signing the block payload for sequencer gossiping.

@mininny mininny force-pushed the feat/op-node-op-signer-integration branch from 71302c4 to bf4c4bb Compare October 2, 2024 21:59
@tynes
Copy link
Contributor

tynes commented Oct 2, 2024

This is awesome! Great design doc

# Risks & Uncertainties

1. Because op-signer will sign any payloads sent to its RPC, we must ensure that only trusted entities can access the RPC of op-signer. Otherwise, an attacker might request op-signer to sign malicious block payloads, and use its signature to propagate invalid block data to peers in the network. To prevent this, we must ensure that we have safe mechanisms in place to only allow the connection between trusted entities.
2. As we create secure connection between the op-signer and op-node (whether by mTLS or other methods), we must ensure that the time for signing the payload is quick enough. It must be always faster than the block build time, and quick enough for good gossiping across the network.
Copy link

@zhwrd zhwrd Oct 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think we need to double check a few other things related to performance, capacity and availability since the signing frequency will be increased a lot from what we do with batcher, proposer & challenger today. Also none of those services have as strict uptime requirements as a sequencer does.

  • What is the downstream rate limit / quota for signing requests to HSM backed key in KMS? Do we expect to be within that quota?
  • What is the impact to the sequencing if we have an op-signer or KMS outage. Should we have a fallback mechanism?
  • What is the acceptable max latency for signing requests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • The rate limit of hsm is about 50 QPS for assymetric signing, so it's enough for signing every 2 seconds per block.
  • The HSM has uptime SLA as 99.95%-99.99%: https://cloud.google.com/kms/sla?hl=en. AWS provides better uptime with 99.999%.
  • I couldn't find any concrete information about the latency of HSM signing request, but I guess we could do some testings and figure out. I think the acceptable max latency is 2 seconds since we have to sign every new block, and I assume mTLS latency + HSM signing operation to be less than 2 seconds at the most.

As you pointed out, the availability of the op-signer and KMS integration is very important as it is directly connected with the sequencer's ability to propagate new l2 blocks. And it is true that HSM is the single point of failure here. In order to mitigate this, we could consider using multi-region HSM as well as multi-provider KMS by integrating with other cloud providers like AWS.

However, I think it's also worth noting that this integration is optional and only crucial when the operator of the sequencer and signer is different. For crucial chains, we should still use the fixed secret key on the sequencer as the main code path.

@raffaele-oplabs raffaele-oplabs self-assigned this Oct 16, 2024
Copy link

@alfonso-op alfonso-op left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mininny, great doc. I left some comments.

I agree with Zach there are a few other performance and availability considerations we'll want to expand on.


### Considerations

Above signing structure is a custom scheme, and does not align with other more official signing schemes like the ones introduced by EIP-712 or ERC-191, so we can’t use something like `eth_sign` or `eth_signTypedData`.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems desirable to try and align on a more standard signing scheme. Provided it doesn't add considerable complexity / additional engineering effort, I recommend we implement against a standard and preserve backwards compatibility for older versions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree on using standard signing scheme and adding backwards compatibility to existing scheme when possible. I did look at other schemes like eth_sign and eth_signTypedData, but they require more payload and parsing as they contain ethereum-related information necessary for signing.

I think this rpc is specific to the op-node's p2p signing behavior, and will most likely just be handled by the op-signer(or may beother optimism-related clients in the future), rather than being used in general clients like geth, etc. It'd be best to minimize the payload size to keep the signature latency as minimal possible, and that would require optimism-specific rpc.

Rather, what do you think about changing the rpc name from eth_signBlockPayload to something like optimism_signBlockPayload to remove the impression that this operation is related to ethereum when it's not?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 one introducing it into the optimism RPC namespace, or a new opsigner namespace, rather than the ethereum namespace.

Signing with eth_signTypedData over typed data would make sense, but that API also has a few flaws (more complicated schema decoding and communication, on every signing request). We can look into unifying that in a future upgrade, if desired. Perhaps optimism_signBlockPayload can become a front-end, wrapping eth_signTypedData and applying the schema, type checks and logging needed for block signing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the design doc and the implementation to use the new opsigner namespace: ethereum-optimism/infra@dda416c. And left unifying the signing scheme to a possible future work in the docs

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to not adding to the eth namespace. It doesn't look like neither ethereum-optimism/infra#59 in the signer client nor ethereum-optimism/optimism#12325 docs were updated to the new name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@trianglesphere Oh you're right, the code were updated but not the descriptions. I just updated the description for those PRs to the new opsigner_signBlockPayload

```
This means we are sending more data to the op-signer, but may be more secure since the op-signer controls all of the logic for signing the payload.

# Risks & Uncertainties

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another potential risk to keep in mind is ensuring we enforce the invariant that only the active sequencer is able to successfully sign blocks without odd race conditions (e.g., during sequencer leadership transfer).

To prevent such issues, does the design require any further integration / communication with op-conductor, or will this be guaranteed with the existing op-node implementation?

for each signing request, op-signer will look at the requesting server's domain, and look at this yaml file for a matching config. Then, it will look for the gcp kms key that matches the `key` name.


## Security

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note that we'll want to conduct an FMA as part of this project.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course! I will write it before we deploy it :) thanks

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good callout, @alfonso-op, I will mention this in the FMA of SiaB project!


The signing of block payload in op-node is a custom signing structure where the signing hash is: `[32]byte keccak256Hash([32]byte domain, [32]byte chainId, [32]byte keccak256(payloadBytes))` (see op-node/p2p/signer.go). This signing hash is signed with the private key, to produce a ECDSA signature of 65 bytes.

# Proposed Solution

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the security-sensitive changes involved, these changes will require governance approval. We'll want to factor this into planning and timelines cc @taempark

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the scope this design doc is just adding more functionality to the op-signer. So I'm not sure if we'd need the governance approval for this feature. But, I think mentioning this to the FMA seem to be necessary.

One point I've thought about the governance approval is relevant is when we hand over the sequencer fully. I briefly discussed this with @mslipper, but that seem to not need the governance approval as well. But the decision of the conversation was let's double-check with @ben-chain just in case.


Thankfully, op-signer already has mTLS support, as integrated in op-batcher and op-proposer.

Here, users can specify

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'user can specify': does this refers to op-signer configuration service?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the tls certificate has to be set in both the op-signer / op-node service. Perhaps the wording users can is little misleading, as it's more of a must


# Risks & Uncertainties

1. Because op-signer will sign any payloads sent to its RPC, we must ensure that only trusted entities can access the RPC of op-signer. Otherwise, an attacker might request op-signer to sign malicious block payloads, and use its signature to propagate invalid block data to peers in the network. To prevent this, we must ensure that we have safe mechanisms in place to only allow the connection between trusted entities.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have read the code of op-signer and it technically does not manage authorization as far as I see.
Any authenticated user ( the one that have a valid client certificate) will be able to access the key with the same name as the one embedded in the certificate.
This may work in the assumption that any client own the key with the same name, and any client can perform all operations with that key.
This may be a valid assumption, I just want to make sure this is something we are aware in the moment we manage the certificates signing. As I was not able to see if there is a revocation list, the whole system security depends on how well the KMS for the client certificate is managed, and that there is an explicit written procedure that spell out the assumption that the authorization works with "DNSNames[0]" = access to a specific key+ all the functionalities of op-signer.
On top of that op-signer at the moment does not log network informations for auditing purposes like origin ip, or certificate signature as far as I can see.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree that we have an assumption that client with the specific DNS name + correct mTLS certificate will get access to all of the functionalities of op-signer using that specific key.

I'm all in for adding logs to op-signer for log network information and incoming request information.

Given that we safely set the mTLS certificate pair on the op-node and op-signer, what other security measures do you think we need to apply here?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before proceeding I think we should look at this more carefully. Or add a blocker to this before we are ready to deploy. At the moment, for how this is implemented, is possible for anyone able to deploy in the same kluster, to obtain a valid certificate with the characteristic mentioned.
@alfonso-op how could we manage this? As part of the MFA mentioned above?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@raffaele-oplabs yeah I suggest we cover as part of the FMA. We should allocate some extra time, since we know there is some additional security hardening we will have to implement here.

We could start with request logs and alerts, but should brainstorm other potential approaches.

security/op-node-op-signer-integration.md Outdated Show resolved Hide resolved
Comment on lines 71 to 78
## New RPC: eth_signBlockPayload

new eth-domain RPC (or under some other namespace)

```go
func eth_signBlockPayload(signingHash [32]byte) (sig *[65]byte, err error)
// where signingHash is keccak256Hash([32]byte domain, [32]byte chainId, [32]byte keccak256(payloadBytes))
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better if the 3 attributes are communicated over RPC fully. That way the op-signer server side can log the request, like it does with transaction signing requests, to clarify what is being signed.

Additionally, the chainID should be checked, as well as the signingHash, to not accidentally sign things not intended to be signed. E.g. when reusing the same sequencer key for multiple chains.

In a later protocol upgrade we might also want to structure it even more, to bind the block-number to the domain, or add one more layer of hashing to the payloadBytes part (like a head and a payload-body). That way the signing request can be more expressive, while summarizing the body content, to not pass through the full payload content which might otherwise slow things down.


### Considerations

Above signing structure is a custom scheme, and does not align with other more official signing schemes like the ones introduced by EIP-712 or ERC-191, so we can’t use something like `eth_sign` or `eth_signTypedData`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 one introducing it into the optimism RPC namespace, or a new opsigner namespace, rather than the ethereum namespace.

Signing with eth_signTypedData over typed data would make sense, but that API also has a few flaws (more complicated schema decoding and communication, on every signing request). We can look into unifying that in a future upgrade, if desired. Perhaps optimism_signBlockPayload can become a front-end, wrapping eth_signTypedData and applying the schema, type checks and logging needed for block signing.

payloadHash [32]byte `json:"payloadHash"`
}
```
This means we are sending more data to the op-signer, but may be more secure since the op-signer controls all of the logic for signing the payload.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think op-signer being configured with expected parameters and checking the RPC method arguments match is a better trade-off. That way you can detect misuse of the API, rather than producing a signature for the assumed domain and finding it's invalid much later when first doing a verification of the produced signature.

Perhaps the RPC should also include the "sender address", so we can assert that matches the expected address. And then it can be used as an intent, in case this same op-signer RPC server can sign for multiple different p2p signer keys.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment! then I'll modify the design doc and code to send the domain, chainId, payloadBytesHash and senderAddress, and add validation for these parameters. Then we could add configurable parameter for chainId and senderAddress when spinning up the op-signer, and validate the incoming rpc requests to match these configurations :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And maybe we should bundle them in a struct, rather than specify them as individual function arguments. This will make it easier to extend the input in the future, by adding optional struct fields.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is updated in the design doc as well as the op-node pr here! ethereum-optimism/optimism@12c4820

Copy link

@ImTei ImTei Oct 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the purpose of the sender address to verify if the request is valid in an expected restricted environment? I mean, if it has the security purpose, attackers can easily change the address of a request.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The address should not be trusted, but it should help specify which account is meant to be used, if there are multiple options. The eth signing API supports a from-address, so it makes sense to do the same here.

@mininny mininny force-pushed the feat/op-node-op-signer-integration branch from 5ac3619 to 0266c3c Compare October 18, 2024 22:07
@mininny mininny merged commit 4e9938a into ethereum-optimism:main Oct 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants