-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Downsample in OpenTSDB #31
Comments
I don't think there's enough detail in the Thanos store api's SeriesRequest to determine what aggregations could help. Thanos needs more work I think |
It's very unlikely thanos will change there, the store API fairly closely matches Prometheus's tsdb so it's unclear how work could even be done there. The main thing we can do would be downsampling as often OpenTSDB data has more points than Prometheus data would due to the push rather than scrape model. |
If you compare PromQL to resulting SeriesRequest, you'll see that downsampling window size is not transmitted, and functions are mapped to the Aggr value but it seems to be just a stub of an idea and is unused: |
A configurable minimum resolution could be used to downsample all queries regardless of what is specified in a user's PromQL. |
So, having done some digging, a maximum resolution window is passed down to the series request when you select a downsample mode other than "Only raw data" on the Thanos query UI (raw data is the default). If a non-zero value is passed from the query (auto is a valid value) then this is set to a sensible value which would then allow downsampling, which can be combined with the aggregate type to do the right thing. OpenTSDB 2.4 also supports rollups and pre-aggregates, but we certainly don't run this (it supports storage of such, and retrieval via new APIs, but not the actual calculation thereof) so I don't see much value in supporting at this stage. |
We should also investigate setting https://github.com/thanos-io/thanos/blob/master/docs/components/query.md#auto-downsampling on the querier (as the thanos UI has the downsampling controls mentioned, but many queries will come from Grafana...). And document this behaviour... |
Yeah, that's a request-level param. We could either add explicit Thanos support in Grafana datasources, and then configure this at the datasource level (or at least default it in the UI) or default this on at the querier level - but that feels more insidious.. |
Hmm, so the Prometheus datasource in Grafana supports custom query parameters (turns out these were added for Thanos - grafana/grafana#16665): https://grafana.com/docs/grafana/latest/features/datasources/prometheus/ However it would be nice to set on a per-query basis - extending the Prometheus datasource to also support Thanos might be desirable. In fact in the grafana issue above @davkal made a suggestion along the lines I was thinking - might be worth trying to properly implement that. |
Re per-query parameters: we recommend using two datasources with different query parameters pointing to the same thanos instead. In the panel editor you then choose the Mixed datasource and can have per-query datasources. |
I think my concern with the current recommendation is that as the set of additional parameters the Thanos Query component supports over Prometheus grows, the potential set of configurations you might want to choose from grows. Given the flexibility of Thanos, we're finding that we're deploying many different Query components depending on the Store API implementations we want to be able to query over, so multiplying the two could get very unwieldy. This might be best discussed over in a Grafana issue, but I'd be keen to gauge appetite on the LTS extensions for the Prometheus datasource such that we can have a single datasource used in different ways set at query time with appropriate validation. |
It may be possible to do some aggregations in OpenTSDB to avoid sending the raw OpenTSDB data back. Not all the aggregations make sense, in particular the Thanos aggrchunks are about downsampling mostly, the labels don't seem to change.
Maybe useful to do #15 first so we can actually measure the impact.
The text was updated successfully, but these errors were encountered: