Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gorouter does not count 404s (route not found) as requests in metrics #365

Open
domdom82 opened this issue Nov 22, 2023 · 6 comments
Open
Assignees

Comments

@domdom82
Copy link
Contributor

Is this a security vulnerability?

No

Issue

Gorouter ignores unknown_route errors for the purpose of metrics. It does handle the request and returns a 404 to the client but it does not count that as a request that's been handled by gorouter, nor does it count the 404 as a responses_4xx value in metrics.

Affected Versions

All

Context

In production, we have HAProxy sitting in front of gorouter for TLS offloading. The 4xx status code metrics between HAProxy differ in that 404s coming from gorouter are recorded as 4xx errors in HAProxy but not in gorouter. There, the 404s are "invisible" on the metrics dashboards. We feel this is wrong, because even though there was no backend app that responded to the request, there still was a request and it should be counted as such.

Steps to Reproduce

  1. Push some-app.cf-app.com
  2. curl https://some-app.cf-app.com
  3. curl -u <user:pwd> http://gorouter:8082/varz
  4. Notice how the requests and responses_2xx metrics have increased
  5. curl https://does-not-exist.cf-app.com
  6. curl -u <user:pwd> http://gorouter:8082/varz
  7. Notice how the requests and responses_4xx metrics have not increased

Expected result

requests and responses_4xx metrics should include 404 unknown_route requests.

Note:
By extension, this should also apply to these errors:

  • 400 empty_host
  • 400 invalid_cf_app_instance_header
  • 503 no_endpoints
  • 503 Connection Limit Reached

Current result

requests and responses_4xx remain the same

Possible Fix

The request handling is stopped early in the lookup handler if the request cannot be forwarded.
A likely solution will be to also record the correct metrics in this handler.

maxmoehl pushed a commit to sap-contributions/routing-release that referenced this issue Jan 16, 2024
Submodule src/code.cloudfoundry.org/gorouter 1dafeaa9..62f4579a:
  > Check trace ID length before parsing the UUID (cloudfoundry#365)
Submodule src/code.cloudfoundry.org/routing-api-cli 5c9811fd..5b19c9d6:
  > Add release-binaries.bash
@MarcPaquette MarcPaquette moved this from Inbox to Waiting for Changes | Open for Contribution in Application Runtime Platform Working Group Feb 13, 2024
@tpbrisco
Copy link

tpbrisco commented Feb 20, 2024

There are 2 sorts of "4xx" series messages that can be returned -- there's the curl https://does-not-exist.cf-app.com version, but also the version where the targeted application returns a 400. That might account for the difference in metrics between haproxy and gorouter.
As per above, the "does-not-exist" does not increment the requests nor responses_4xx metrics. However, using a simple echo application (with ?status=401, see here and here) an application that exists and returns a 4xx message causes both metrics to be incremented.
In the case above for "does-not-exist", note that the "bad_requests" metric is incremented.

The "bad_request" reflects the non-existant request (as no such application could be delivered), whereas the requests and response_XXX metrics represent the back-end health. For purposes of correlation with a load-balancer/proxy in front of the platform, including the "bad_request" metric will probably allow the numbers to align.

@MarcPaquette
Copy link
Contributor

Hey @domdom82 ,
Did @tpbrisco's analysis meet your needs?

@mariash mariash self-assigned this Mar 26, 2024
@tpbrisco
Copy link

tpbrisco commented Apr 2, 2024

@domdom82 -- Dominc, I wanted to check in if there are open questions on this item. I'd like to submit it for closure. Check the above, it might answer your questions - or if there are other inquiries, let me know
Tom

@domdom82
Copy link
Contributor Author

domdom82 commented Apr 4, 2024

Hi @tpbrisco @MarcPaquette , thanks for your replies.
I think we disagree on what might be considered a "request". Looking at the code, current metrics suggest that only a request that made it through to the backend successfully is reported in the "total_requests" bucket.

To me, the term "total_requests" includes all requests handled by gorouter, whether they are successful or not, because work was done on the proxy that needs to be accounted for.

@tpbrisco following your logic I could combine the two buckets "total_requests" and "rejected_requests" (referred to as bad_requests in the code), however that would still not give me the full picture.

To gain the true number of requests handled by gorouter at a certain point in time, I would have to combine all of these metrics:

So my argument is to include not just requests that made it to the backend (better: ProxyRoundTripper) but also include requests that were handled by gorouter and got stopped in lookup.go - they are still requests and should be counted as such.

One solution could be, to

  • add l.reporter.CaptureRoutingRequest(nil) to the calls to l.reporter.CaptureBadRequest() in lookup.go or
  • add m.Batcher.BatchIncrementCounter("total_requests") to m.Batcher.BatchIncrementCounter("rejected_requests") and m.Batcher.BatchIncrementCounter("backend_exhausted_conns") in metricsreporter.go

@tpbrisco
Copy link

tpbrisco commented Apr 7, 2024 via email

@domdom82
Copy link
Contributor Author

@tpbrisco yep, you got that right. Exactly my sentiment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Waiting for Changes | Open for Contribution
Development

No branches or pull requests

4 participants