Proxies (NGINX and HAProxy) return 504's on the /proto.Woodpecker/Next gRPC route between agent and server #4503

Open
3 tasks done
IvoWingelaar opened this issue Dec 2, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@IvoWingelaar

IvoWingelaar commented Dec 2, 2024

Component

server, agent

Describe the bug

I configured NGINX as a gRPC reverse proxy with TLS offloading between the agents and the server. However, the NGINX logs quickly fill with lines complaining about timeouts:

upstream timed out (110: Connection timed out) while reading response header from upstream, client: x.x.x.x, server: example.org, request: "POST /proto.Woodpecker/Next HTTP/2.0", upstream: "grpc://127.0.0.1:3002", host: "example.org"

As a result, the NGINX access log shows 504's being sent back to the clients:

[02/Dec/2024:13:27:57 +0000] "POST /proto.Woodpecker/Next HTTP/2.0" 504 167 "-" "grpc-go/1.65.0"

From the debug logs of NGINX it can be determined that the timeouts happen after 60s.
Setting WOODPECKER_KEEPALIVE_TIME=10s on the agent to try and keep the connection open does nothing.

Steps to reproduce

  1. Install the latest Woodpecker server. Let it listen to gRPC on port 3002.
  2. Configure an NGINX reverse proxy to perform gRPC TLS offloading as follows:
upstream wp {
  server 127.0.0.1:3002;
}

server {
  listen 443 ssl http2;
  listen [::]:443 ssl http2;

  server_name example.org;

  ssl_certificate /etc/example.org/fullchain.pem;
  ssl_certificate_key /etc/example.org/privkey.pem;

  location / {
    grpc_pass grpc://wp;
  }
}
  3. Configure a Woodpecker agent with WOODPECKER_SERVER=example.org, and WOODPECKER_GRPC_SECURE=true.
  4. Notice the Woodpecker agent can connect to the server and take tasks from the queue successfully, but there are frequent (~1 minute interval) 504's in the access.log and upstream timeouts in the error.log of NGINX.
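
The 60s cutoff seen in the NGINX debug logs matches the default value of the grpc_read_timeout directive (60s), which limits how long NGINX waits between two successive reads from the upstream. A possible workaround, sketched here as an assumption and not as a confirmed fix, is to raise that timeout in the gRPC location from the configuration above, so that an idle long-poll on /proto.Woodpecker/Next is not cut off. The 1h value is an arbitrary placeholder, not a recommendation from the Woodpecker project:

```nginx
location / {
  grpc_pass grpc://wp;

  # Assumed workaround: the default grpc_read_timeout is 60s, which matches
  # the interval of the 504's. 1h is a placeholder value, adjust as needed.
  grpc_read_timeout 1h;
}
```

This only moves the limit: a request that stays idle longer than the configured value would still be answered with a 504.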

Expected behavior

The agent appears to poll the server for new tasks on the /proto.Woodpecker/Next route. This form of long-polling is fragile, because an intermediary infrastructure component such as NGINX in the request path can terminate connections that are too long-lived.

I would have expected that the WOODPECKER_KEEPALIVE_TIME argument on the agent would prevent this from happening, but it does not keep the connection alive when used.

System Info

{"source":"https://github.com/woodpecker-ci/woodpecker","version":"2.7.3"}

Additional context

No response

Validations

  • Read the docs.
  • Check that there isn't already an issue that reports the same bug to avoid creating a duplicate.
  • Checked that the bug isn't fixed in the next version already [https://woodpecker-ci.org/faq#which-version-of-woodpecker-should-i-use]
@IvoWingelaar IvoWingelaar added the bug Something isn't working label Dec 2, 2024
@zc-devs
Contributor

zc-devs commented Dec 2, 2024

#693

@IvoWingelaar
Author

I don't see the relevance of that linked PR @zc-devs, as none of the variables removed by that PR refer to timeouts at all.

@zc-devs
Contributor

zc-devs commented Dec 7, 2024

Sorry, I thought this change was the cause.

> Setting WOODPECKER_KEEPALIVE_TIME=10s on the agent to try and keep the connection open does nothing.

> WOODPECKER_KEEPALIVE_TIME argument on the agent ... does not keep the connection alive when used.

@IvoWingelaar
Author

For the record, I tried the same setup using HAProxy to rule out an NGINX-specific bug, and I got the same result:

Dec 07 18:57:37 test-vm haproxy[353712]: x.x.x.x:42374 [07/Dec/2024:18:56:47.682] y.y.y.y~ grpc_wp/wp 0/0/0/-1/50002 504 198 - - sH-- 1/1/0/0/0 0/0 "POST https://y.y.y.y:5001/proto.Woodpecker/Next HTTP/2.0"
Dec 07 18:57:37 test-vm woodpecker-agent[353750]: {"level":"warn","error":"rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 504 (Gateway Timeout); transport: received unexpected content-type \"text/html\"","time":"2024-12-07T18:57:37Z","message":"grpc error: next(): code: Unavailable"}

@IvoWingelaar IvoWingelaar changed the title from "NGINX 504's on the /proto.Woodpecker/Next route between agent and server when using grpc_pass" to "Proxies (NGINX and HAProxy) return 504's on the /proto.Woodpecker/Next gRPC route between agent and server" Dec 7, 2024