Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add coordinatorId to /v1/info #23910

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

oneonestar
Copy link
Member

@oneonestar oneonestar commented Oct 25, 2024

Description

Add coordinatorId to /v1/info.

Motivation

Trino Gateway works as a proxy between client and coordinator. After the initial query submission, subsequent requests for the same query must be routed to the same coordinator. Currently, gateway store the queryId after the initial response from coordinator for later routing purposes.

If the queryId doesn't exist in the store (which happens when there are multiple gateway instances being deployed without sticky session), gateway will query every coordinator to find the correct destination.

By adding coordinatorId to /v1/info, the above routing logic can be simplified to:

  • During health check, get and store the coordinatorId in gateway.
  • Extract coordinatorId from queryId and route it accordingly.

Other concerns

coordinatorId is just a random string being generated during server startup. The only usage so far is for generating queryId. I don't think there are security concerns with exposing it in the public API.

Additional context and related issues

From coordinator:

$ curl localhost:8080/v1/info | jq
{
  "nodeVersion": {
    "version": "dev"
  },
  "environment": "test",
  "coordinator": true,
  "starting": true,
  "uptime": "13.03s",
  "coordinatorId": "mv3x6"
}

From worker:

$ curl localhost:8080/v1/info | jq
{
  "nodeVersion": {
    "version": "dev"
  },
  "environment": "test",
  "coordinator": false,
  "starting": true,
  "uptime": "16.89s"
}

Release notes

(x) Release notes are required, with the following suggested text:

## Section
* Add `coordinatorId` to `/v1/info`. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Oct 25, 2024
@martint
Copy link
Member

martint commented Oct 25, 2024

Each node in a Trino cluster already has a node id. Why is that not sufficient?

@martint
Copy link
Member

martint commented Oct 25, 2024

❯ curl 'localhost:8080/v1/status?pretty'

{
  "nodeId" : "1735155ae4e7",
  "nodeVersion" : {
    "version" : "463"
  },
  "environment" : "docker",
  "coordinator" : true,
  "uptime" : "9.71h",
  ...
}

@oneonestar
Copy link
Member Author

oneonestar commented Oct 25, 2024

Each node in a Trino cluster already has a node id. Why is that not sufficient?

We need to make routing decision base on queryId in the subsequent requests from client.


First request from client:

https://trino.com/v1/statement

First response from coordinator:

{
  "id":"20241025_020208_00989_w2sqk",
  "nextUri":"https://trino.com/v1/statement/queued/20241025_020208_00989_w2sqk/y24e7eac652b9e9e8953f71c3647de8a45d6af653/1",
  ...
}

The following request from client:

https://trino.com/v1/statement/queued/20241025_020208_00989_w2sqk/y24e7eac652b9e9e8953f71c3647de8a45d6af653/1

We want to route the request to the same coordinator. What we have here is the queryId
20241025_020208_00989_w2sqk which contains the coordinatorId.

Trino Gateway acts as a transparent proxy for one or more Trino clusters (ref). Multiple Trino clusters share the same domain trino.com.

@martint
Copy link
Member

martint commented Oct 25, 2024

The query id is an opaque identifier. It's not meant to be parsed and interpreted by clients. Its format is subject to change at any time.

@martint
Copy link
Member

martint commented Oct 25, 2024

One possible solution would be to rewrite the nextUri in the gateway to include the node id of the coordinator (e.g, in a query parameter), so that subsequent requests can be interpreted and routed to the right place.

@oneonestar
Copy link
Member Author

Yes, that's a possible solution. There are some alternatives being discussed:

All these modify the protocol between client and gateway. We don't have a specification
for the Trino client protocol. One might argue the protocol could also subject to change
at any time and break the assumptions used by the above approaches.

Another possible solution is to introduce some kind of sessionId into the protocol officially.
That could solve the START TRANSACTION issue in trinodb/trino-gateway#526.

@mosabua
Copy link
Member

mosabua commented Oct 31, 2024

Can @wendigo maybe look here as well and then maybe we should discuss as group of @oneonestar @martint @electrum @mosabua and others interested.

@martint
Copy link
Member

martint commented Oct 31, 2024

We don't have a specification for the Trino client protocol. One might argue the protocol could also subject to change
at any time and break the assumptions used by the above approaches.

The protocol is specified here, and is guaranteed to be stable: https://trino.io/docs/current/develop/client-protocol.html

Another possible solution is to introduce some kind of sessionId into the protocol officially.

That's a much larger topic. There's no state kept in the server at this time.

@electrum
Copy link
Member

electrum commented Nov 3, 2024

I don't see a problem with having the Trino gateway depend on the format of query ID, if that is the simplest solution. The query ID format has existed since the beginning of Trino and it seems unlikely we would need to change it in the future.

One consideration is that the coordinator ID is random, 5 digit base-32 number, so it only has ~33 million (32^5) possible values. What would happen in the unlikely event that multiple clusters choose the same coordinator ID?

Copy link

This pull request has gone a while without any activity. Tagging for triage help: @mosabua

@github-actions github-actions bot added the stale label Nov 25, 2024
Copy link

Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time.

@github-actions github-actions bot closed this Dec 16, 2024
@mosabua mosabua reopened this Jan 9, 2025
@mosabua mosabua added stale-ignore Use this label on PRs that should be ignored by the stale bot so they are not flagged or closed. and removed stale labels Jan 9, 2025
@mosabua
Copy link
Member

mosabua commented Jan 9, 2025

Now that @electrum chimed in I think we can go ahead and rebase and get this towards merge. We should just add a test here and maybe also in Trino Gateway so we catch any changes that might break things

@wendigo
Copy link
Contributor

wendigo commented Jan 9, 2025

Can we use node id that we already have instead of generating new id?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed stale-ignore Use this label on PRs that should be ignored by the stale bot so they are not flagged or closed.
Development

Successfully merging this pull request may close these issues.

5 participants