va: log MPIC summaries #7817

jsha · 2024-11-15T17:56:44Z

Add "Perspective" and "RIR" configuration at the primary VA for each of its backends. This information is passed through to the VA as part of the RemoteVA struct.

Remove the "Hostname" field from the RemoteVA struct. It was previously used in logging errors, but Perspective will be a clearer way to indicate those errors.

Hoist the logic for "am I primary?" from performRemoteValidation to PerformValidation. This allows performRemoteValidation to more consistently return a set of succeeded and failed results.

Note: this is a stacked change on top of #7815. DO NOT MERGE until #7815 is merged. Also, this change pulls in the change from good++ / bad++ to slices of passed / failed from #7814.

github-actions · 2024-11-15T17:56:55Z

@jsha, this PR appears to contain configuration and/or SQL schema changes. Please ensure that a corresponding deployment ticket has been filed with the new values.

jsha · 2024-11-15T17:58:33Z

Pulling in some feedback from #7815 (comment):

I previously considered this approach but deemed it too risky. The remote VA backends are configured using SRV records from an in-DC Consul, populated by cloud automation. If the primary VA is set to treat certain backends as "perspective A," but their DNS actually points to "perspective B" backends, it could take a long time to discover the mismatch. The resulting revocation event would include every certificate using an authorization validated by "perspective A".

Instead, we should rely on the remote VAs to report their perspective as part of the ValidationResult. The result of a DNS misconfiguration or a primary VA configured to look up the wrong SRV record would be an increase in validations which now fail because they cannot reach quorum.

These are good points, and MPIC is important enough compliance-wise that we should do what we can to programmatically detect config problems. There are also big advantages to the primary knowing what its perspectives are, like being able to abort at startup if the perspectives are wrong. And the primary can report failures by perspective even when that perspective is down.

What do you think about this: primary and remotes both know about perspectives, and in each RPC to a remote, the primary asserts what it believes the perspective and RIR are. If the remote disagrees, it returns an error.

beautifulentropy · 2024-11-15T20:42:19Z

What do you think about this: primary and remotes both know about perspectives, and in each RPC to a remote, the primary asserts what it believes the perspective and RIR are. If the remote disagrees, it returns an error.

Agreed, comparing the results of both is probably the best call.

beautifulentropy · 2024-11-15T20:43:58Z

I'll hold off on reviewing this until main is merged.

aarongable · 2024-11-15T20:53:51Z

What do you think about this: primary and remotes both know about perspectives, and in each RPC to a remote, the primary asserts what it believes the perspective and RIR are. If the remote disagrees, it returns an error.

I'm fine with this approach in the long term. But the purpose of these smaller PRs was to refactor Samantha's big PR into a collection of more easily reviewable changes, not to change the approach taken in that PR. For now, let's stick with having each remote know (and report) its own perspective/RIR, and leave the gRPC request to those remotes alone.

We can add having the primary VA know about its remotes' perspectives/RIRs and add checks that they match up later.

jsha · 2024-11-15T21:14:56Z

Sounds good, I'll split that part out (and fix up merge conflicts).

beautifulentropy · 2024-11-15T21:12:38Z

va/va.go

+	InternalError     string       `json:",omitempty"`
+	Perspective       string       `json:",omitempty"`
+	RIR               string       `json:",omitempty"`
+	MPICSummary       *mpicSummary `json:",omitempty"`


The MPIC summary must always be logged, even when it's empty because we caught a local failure. Our logs don't distinguish between a local problem and a remote problem, so it's useful to state positively that we did not attempt MPIC.

Good point, will change this to not omitempty.
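To illustrate why dropping `omitempty` matters: with it, a nil summary pointer disappears from the JSON log line entirely; without it, the key is always present and renders as `null` when no MPIC was attempted. The struct shapes below are simplified assumptions based on the diff, not the real log event type.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mpicSummary mirrors the field names shown in this PR's diff; the field
// types are assumptions for illustration.
type mpicSummary struct {
	PassedPerspectives []string
	FailedPerspectives []string
	PassedRIRs         []string
	QuorumResult       string
}

// withOmitempty drops MPICSummary from the output when it is nil.
type withOmitempty struct {
	Perspective string       `json:",omitempty"`
	MPICSummary *mpicSummary `json:",omitempty"`
}

// withoutOmitempty always emits MPICSummary, as null when it is nil.
type withoutOmitempty struct {
	Perspective string `json:",omitempty"`
	MPICSummary *mpicSummary
}

func marshal(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// A local failure: remote validation was never attempted.
	fmt.Println(marshal(withOmitempty{}))    // {}
	fmt.Println(marshal(withoutOmitempty{})) // {"MPICSummary":null}
}
```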

beautifulentropy · 2024-11-15T21:26:45Z

va/va.go

+		// Note: len(va.remoteVAs) is greater than len(passed) + len(failed) because some
+		// are canceled on reaching quorum.
+		QuorumResult: fmt.Sprintf("%d/%d", len(passed), len(va.remoteVAs)),


If we make an "attempt", I believe we have to log the outcome of that attempt according to the BRs:

5.4.1 Types of events recorded

...

The CA SHALL record at least the following events:

...

Multi-Perspective Issuance Corroboration attempts from each Network Perspective, minimally recording the following information:

a. an identifier that uniquely identifies the Network Perspective used;

b. the attempted domain name and/or IP address; and

c. the result of the attempt (e.g., "domain validation pass/fail", "CAA permission/prohibition").

Multi-Perspective Issuance Corroboration quorum results for each attempted domain name or IP address represented in a Certificate request (i.e., "3/4" which should be interpreted as "Three (3) out of four (4) attempted Network Perspectives corroborated the determinations made by the Primary Network Perspective").
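A sketch of building the summary and a BR-style quorum string from per-perspective outcomes. `remoteResult` and `summarize` are hypothetical. The denominator here is the number of attempted perspectives, following the BR wording; that is an assumption, since this thread questions whether `len(va.remoteVAs)` is the right count.

```go
package main

import (
	"fmt"
	"sort"
)

// remoteResult is a hypothetical per-perspective outcome.
type remoteResult struct {
	Perspective string
	RIR         string
	Passed      bool
}

type mpicSummary struct {
	PassedPerspectives []string
	FailedPerspectives []string
	PassedRIRs         []string
	QuorumResult       string
}

// summarize records each perspective's outcome and formats the quorum result
// as "corroborated/attempted", in the style of the BRs' "3/4" example.
// attempted is passed separately because some RPCs may not have returned
// by the time the summary is logged.
func summarize(results []remoteResult, attempted int) mpicSummary {
	var s mpicSummary
	rirs := map[string]bool{}
	for _, r := range results {
		if r.Passed {
			s.PassedPerspectives = append(s.PassedPerspectives, r.Perspective)
			rirs[r.RIR] = true
		} else {
			s.FailedPerspectives = append(s.FailedPerspectives, r.Perspective)
		}
	}
	for rir := range rirs {
		s.PassedRIRs = append(s.PassedRIRs, rir)
	}
	sort.Strings(s.PassedRIRs) // map iteration order is random; sort for stable logs
	s.QuorumResult = fmt.Sprintf("%d/%d", len(s.PassedPerspectives), attempted)
	return s
}

func main() {
	results := []remoteResult{
		{"p1", "ARIN", true},
		{"p2", "RIPE", true},
		{"p3", "RIPE", true},
		{"p4", "APNIC", false},
	}
	fmt.Printf("%+v\n", summarize(results, 4))
}
```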

beautifulentropy · 2024-11-15T21:28:13Z

va/va.go

+		// Note: len(va.remoteVAs) is greater than len(passed) + len(failed) because some
+		// are canceled on reaching quorum.


I do not believe these are canceled on reaching quorum; they simply hit the timeout after we've already returned the ValidationResult.

In grpc/interceptors.go we apply a deadline and then defer cancel(). The cancel runs after the handler returns.

boulder/grpc/interceptors.go

Lines 114 to 116 in 2502113

localCtx, cancel := context.WithDeadline(ctx, deadline)

defer cancel()

beautifulentropy · 2024-11-15T21:30:24Z

va/va.go

-	prob = va.performRemoteValidation(ctx, req)
+	mpicSummary, prob := va.performRemoteValidation(ctx, req)
+	logEvent.MPICSummary = mpicSummary
+
 	return bgrpc.ValidationResultToPB(records, filterProblemDetails(prob))


bgrpc.ValidationResultToPB must be modified to pass through the RIR and Perspective; otherwise the primary VA will just get back empty strings.
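A hypothetical sketch of the pass-through problem, using stand-in types rather than Boulder's real core and protobuf types: if the conversion function does not copy the new fields, the receiving side sees zero values.

```go
package main

import "fmt"

// validationResult and pbValidationResult are hypothetical stand-ins,
// not Boulder's real types.
type validationResult struct {
	Records     []string
	Perspective string
	RIR         string
}

type pbValidationResult struct {
	Records     []string
	Perspective string
	Rir         string
}

// toPBDropsFields shows the bug: the new fields are never copied.
func toPBDropsFields(r validationResult) pbValidationResult {
	return pbValidationResult{Records: r.Records}
}

// toPB copies Perspective and RIR through to the wire type.
func toPB(r validationResult) pbValidationResult {
	return pbValidationResult{
		Records:     r.Records,
		Perspective: r.Perspective,
		Rir:         r.RIR,
	}
}

func main() {
	r := validationResult{Records: []string{"rec"}, Perspective: "p1", RIR: "ARIN"}
	fmt.Printf("%q %q\n", toPBDropsFields(r).Perspective, toPBDropsFields(r).Rir) // "" ""
	fmt.Printf("%q %q\n", toPB(r).Perspective, toPB(r).Rir)                       // "p1" "ARIN"
}
```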

beautifulentropy · 2024-11-15T21:32:27Z

va/va.go

+	return &mpicSummary{
+		PassedPerspectives: passedPerspectives,
+		FailedPerspectives: failedPerspectives,
+		PassedRIRs:         passedRIRs,


We're collecting up the RIRs but we're not ensuring that passing validations came from at least two unique RIRs.

This is true. My goal here was to add logging, not enforcement.
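For illustration only (the PR deliberately adds logging, not enforcement), a check that corroborating perspectives span at least two distinct RIRs could look like this. `hasRIRDiversity` is a hypothetical helper, and it assumes one RIR entry per passing perspective.

```go
package main

import "fmt"

// minDistinctRIRs reflects the requirement raised in review: passing
// validations must come from at least two unique RIRs.
const minDistinctRIRs = 2

// hasRIRDiversity reports whether the RIRs of passing perspectives contain
// at least minDistinctRIRs distinct values. Empty RIRs are ignored.
func hasRIRDiversity(passedRIRs []string) bool {
	seen := make(map[string]struct{})
	for _, rir := range passedRIRs {
		if rir != "" {
			seen[rir] = struct{}{}
		}
	}
	return len(seen) >= minDistinctRIRs
}

func main() {
	fmt.Println(hasRIRDiversity([]string{"ARIN", "ARIN", "ARIN"})) // false
	fmt.Println(hasRIRDiversity([]string{"ARIN", "RIPE"}))         // true
}
```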

beautifulentropy · 2024-11-15T21:37:23Z

va/caa.go

@@ -255,17 +256,17 @@ func (va *ValidationAuthorityImpl) performRemoteCAACheck(
 					result.Problem = probs.ServerInternal("Remote VA IsCAAValid RPC cancelled")
 				} else {
 					// Handle validation error.
-					va.log.Errf("Remote VA %q.IsCAAValid failed: %s", rva.Address, err)
+					va.log.Errf("Remote VA %q.IsCAAValid failed: %s", perspective, err)


rva.Perspective is probably less useful than rva.Address when trying to find a remote VA instance that's acting up so you can terminate it.

beautifulentropy · 2024-11-15T21:42:15Z

va/caa.go

 			result := &remoteVAResult{
-				VAHostname: rva.Address,


This was removed. If you add it back, as I suggested in other comments, it should be Address, since that is the [IP Address]:Port of the remote VA.

beautifulentropy · 2024-11-15T21:44:22Z

va/caa_test.go

+				{brokenVA, testPerspective1, testRIR1},
+				{remoteVA, testPerspective2, testRIR2},
+				{remoteVA, testPerspective3, testRIR3},


All of the test cases here stop at 3 perspectives, but the number of allowed failures increases to 2 at 6+ perspectives:

Table: Quorum Requirements

| # of Distinct Remote Network Perspectives Used | # of Allowed non-Corroborations |
|---|---|
| 2-5 | 1 |
| 6+ | 2 |

We should add tests just below and just above that threshold (5 and 6 perspectives).
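The quorum table translates into a small function, and the suggested boundary cases fall at 5 and 6 perspectives. `allowedNonCorroborations` and `quorumMet` are hypothetical names; treating fewer than two perspectives as never meeting quorum is an assumption, since the table starts at 2.

```go
package main

import "fmt"

// allowedNonCorroborations returns how many remote perspectives may fail to
// corroborate, per the quorum table. It returns -1 below two perspectives,
// where the table does not apply.
func allowedNonCorroborations(distinctPerspectives int) int {
	switch {
	case distinctPerspectives < 2:
		return -1
	case distinctPerspectives <= 5:
		return 1
	default:
		return 2
	}
}

// quorumMet reports whether the number of failures is within the allowance.
func quorumMet(distinctPerspectives, failed int) bool {
	allowed := allowedNonCorroborations(distinctPerspectives)
	return allowed >= 0 && failed <= allowed
}

func main() {
	// Boundary cases: just below and at the 6+ threshold.
	for _, n := range []int{5, 6} {
		fmt.Printf("%d perspectives: 2 failures ok? %v\n", n, quorumMet(n, 2))
	}
}
```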

beautifulentropy · 2024-11-15T21:52:38Z

va/va.go

+	for i, va1 := range remoteVAs {
+		for j, va2 := range remoteVAs {
+			if i != j && va1.Perspective == va2.Perspective && va1.Perspective != "" {
+				return nil, fmt.Errorf("duplicate remote VA perspective %q", va1.Perspective)
+			}
+		}
+	}


I think a check like this could give someone a false sense of confidence, when in reality the backends it starts with will be steadily replaced as remote VAs scale up and down daily. It probably doesn't need to go away, but we should comment that these are also evaluated when receiving each validation/check RPC.
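A map-based variant of the same startup check, sketched with a hypothetical `remoteVA` type, runs in linear time and names both conflicting addresses. As the comment above notes, it only covers the initial configuration.

```go
package main

import "fmt"

// remoteVA is a hypothetical stand-in for a configured backend.
type remoteVA struct {
	Address     string
	Perspective string
}

// checkDuplicatePerspectives returns an error if two configured remote VAs
// claim the same non-empty perspective. This inspects startup configuration
// only; per the review, backends behind an SRV record change as remote VAs
// scale, so perspectives are also evaluated on each RPC.
func checkDuplicatePerspectives(remoteVAs []remoteVA) error {
	seen := make(map[string]string) // perspective -> first address claiming it
	for _, rva := range remoteVAs {
		if rva.Perspective == "" {
			continue
		}
		if addr, dup := seen[rva.Perspective]; dup {
			return fmt.Errorf("duplicate remote VA perspective %q (%s and %s)",
				rva.Perspective, addr, rva.Address)
		}
		seen[rva.Perspective] = rva.Address
	}
	return nil
}

func main() {
	fmt.Println(checkDuplicatePerspectives([]remoteVA{{"a:1", "p1"}, {"b:1", "p2"}}))
	fmt.Println(checkDuplicatePerspectives([]remoteVA{{"a:1", "p1"}, {"b:1", "p1"}}))
}
```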

aarongable · 2024-11-15T22:08:19Z

Given the complexity of the requirements here (as exemplified by the comments that Samantha just left) I'm unsure if the "lots of small refactors" plan should go this far. Right now I'm thinking it may be best for the two biggest pieces of Samantha's PR -- MPIC enforcement, and dropping CAA -- to remain in her original PR, rebased on top of the great small cleanups landed this week. Do we think that the changes made so far have improved the reviewability of that PR sufficiently?

Maybe I'm totally wrong, and this is the right approach to be taking. But it worries me to be going over MPIC requirements (like "what's the denominator in the quorum fraction") again, when that PR had already resolved those.

jsha · 2024-11-15T22:54:38Z

You both make the good point that this PR got too big! I noticed the issue of "oops sometimes we log hostnames instead of perspectives," and fixing it, well... snowballed.

I've updated this to only add the logging, and leave the hostnames alone for a future PR. I still need to add some test cases that check for the logging.

I've also changed the base to main.

jsha requested a review from a team as a code owner November 15, 2024 17:56

jsha requested review from beautifulentropy and removed request for a team November 15, 2024 17:56

This was referenced Nov 15, 2024:

VA: Cleanup performRemoteValidation #7814 (Merged)

MPIC: both primary and remote should know about perspectives #7819 (Open)

jsha force-pushed the min-3-rvas branch from cbd00b9 to 69655b5 November 15, 2024 20:23

jsha force-pushed the mpic-summary branch from 6eaf282 to e14920d November 15, 2024 21:14

beautifulentropy requested changes Nov 15, 2024

jsha force-pushed the mpic-summary branch from e14920d to 1fb129b November 15, 2024 22:51

jsha changed the base branch from min-3-rvas to main November 15, 2024 22:54

va: log an mpicSummary for remote validations (6059269)

jsha force-pushed the mpic-summary branch from 1fb129b to 6059269 November 16, 2024 00:33
