Skip to content

Commit

Permalink
Merge branch 'db_main' into pranavmishradatabricks/block-expensive-qu…
Browse files Browse the repository at this point in the history
…eries

Signed-off-by: pranavmishradatabricks <[email protected]>
  • Loading branch information
pranavmishradatabricks authored Jun 4, 2024
2 parents c3f390b + 94eb766 commit 7846702
Show file tree
Hide file tree
Showing 96 changed files with 4,238 additions and 1,366 deletions.
58 changes: 54 additions & 4 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,28 +12,76 @@ We use *breaking :warning:* to mark changes that are not backward compatible (re

### Fixed

### Added

### Changed

### Removed

## [v0.35.1](https://github.com/thanos-io/thanos/tree/release-0.35) - 28.05.2024

### Fixed

- [#7323](https://github.com/thanos-io/thanos/pull/7323) Sidecar: wait for prometheus on startup
- [#6948](https://github.com/thanos-io/thanos/pull/6948) Receive: fix goroutines leak during series requests to thanos store api.
- [#7382](https://github.com/thanos-io/thanos/pull/7382) *: Ensure objstore flag values are masked & disable debug/pprof/cmdline
- [#7392](https://github.com/thanos-io/thanos/pull/7392) Query: fix broken min, max for pre 0.34.1 sidecars
- [#7373](https://github.com/thanos-io/thanos/pull/7373) Receive: Fix stats for remote write
- [#7318](https://github.com/thanos-io/thanos/pull/7318) Compactor: Recover from panic to log block ID

### Added

### Changed

### Removed

## [v0.35.0](https://github.com/thanos-io/thanos/tree/release-0.35) - 02.05.2024

### Fixed

- [#7083](https://github.com/thanos-io/thanos/pull/7083) Store Gateway: Fix lazy expanded postings with 0 length failed to be cached.
- [#7080](https://github.com/thanos-io/thanos/pull/7080) Receive: race condition in handler Close() when stopped early
- [#7132](https://github.com/thanos-io/thanos/pull/7132) Documentation: fix broken helm installation instruction
- [#7134](https://github.com/thanos-io/thanos/pull/7134) Store, Compact: Revert the recursive block listing mechanism introduced in https://github.com/thanos-io/thanos/pull/6474 and use the same strategy as in 0.31. Introduce a `--block-discovery-strategy` flag to control the listing strategy so that a recursive lister can still be used if the tradeoff of slower but cheaper discovery is preferred.
- [#7122](https://github.com/thanos-io/thanos/pull/7122) Store Gateway: Fix lazy expanded postings estimate base cardinality using posting group with remove keys.
- [#7166](https://github.com/thanos-io/thanos/pull/7166) Receive/MultiTSDB: Do not delete non-uploaded blocks
- [#7179](https://github.com/thanos-io/thanos/pull/7179) Query: Fix merging of query analysis
- [#7224](https://github.com/thanos-io/thanos/pull/7224) Query-frontend: Add Redis username to the client configuration.
- [#7220](https://github.com/thanos-io/thanos/pull/7220) Store Gateway: Fix lazy expanded postings caching partial expanded postings and bug of estimating remove postings with non existent value. Added `PromQLSmith` based fuzz test to improve correctness.
- [#7225](https://github.com/thanos-io/thanos/pull/7225) Compact: Don't halt due to overlapping sources when vertical compaction is enabled
- [#7244](https://github.com/thanos-io/thanos/pull/7244) Query: Fix Internal Server Error unknown targetHealth: "unknown" when trying to open the targets page.
- [#7248](https://github.com/thanos-io/thanos/pull/7248) Receive: Fix RemoteWriteAsync was sequentially executed causing high latency in the ingestion path.
- [#7271](https://github.com/thanos-io/thanos/pull/7271) Query: fixing dedup iterator when working on mixed sample types.
- [#7289](https://github.com/thanos-io/thanos/pull/7289) Query Frontend: show warnings from downstream queries.
- [#7308](https://github.com/thanos-io/thanos/pull/7308) Store: Batch TSDB Infos for blocks.

### Added

- [#7155](https://github.com/thanos-io/thanos/pull/7155) Receive: Add tenant globbing support to hashring config
- [#7231](https://github.com/thanos-io/thanos/pull/7231) Tracing: added missing sampler types
- [#7194](https://github.com/thanos-io/thanos/pull/7194) Downsample: retry objstore related errors
- [#7105](https://github.com/thanos-io/thanos/pull/7105) Rule: add flag `--query.enable-x-functions` to allow usage of extended promql functions (xrate, xincrease, xdelta) in loaded rules
- [#6867](https://github.com/thanos-io/thanos/pull/6867) Query UI: Tenant input box added to the Query UI, in order to be able to specify which tenant the query should use.
- [#7175](https://github.com/thanos-io/thanos/pull/7175): Query: Add `--query.mode=distributed` which enables the new distributed mode of the Thanos query engine.
- [#7199](https://github.com/thanos-io/thanos/pull/7199): Reloader: Add support for watching and decompressing Prometheus configuration directories
- [#7200](https://github.com/thanos-io/thanos/pull/7175): Query: Add `--selector.relabel-config` and `--selector.relabel-config-file` flags which allows scoping the Querier to a subset of matched TSDBs.
- [#7233](https://github.com/thanos-io/thanos/pull/7233): UI: Showing Block Size Stats
- [#7186](https://github.com/thanos-io/thanos/pull/7186) Query UI: Only show tenant input box when query tenant enforcement is enabled
- [#7175](https://github.com/thanos-io/thanos/pull/7175) Query: Add `--query.mode=distributed` which enables the new distributed mode of the Thanos query engine.
- [#7199](https://github.com/thanos-io/thanos/pull/7199) Reloader: Add support for watching and decompressing Prometheus configuration directories
- [#7200](https://github.com/thanos-io/thanos/pull/7175) Query: Add `--selector.relabel-config` and `--selector.relabel-config-file` flags which allows scoping the Querier to a subset of matched TSDBs.
- [#7233](https://github.com/thanos-io/thanos/pull/7233) UI: Showing Block Size Stats
- [#7256](https://github.com/thanos-io/thanos/pull/7256) Receive: Split remote-write HTTP requests via tenant labels of series
- [#7269](https://github.com/thanos-io/thanos/pull/7269) Query UI: Show peak/total samples in query analysis
- [#7280](https://github.com/thanos-io/thanos/pull/7281) *: Adding User-Agent to request logs
- [#7219](https://github.com/thanos-io/thanos/pull/7219) Receive: add `--remote-write.client-tls-secure` and `--remote-write.client-tls-skip-verify` flags to stop relying on grpc server config to determine grpc client secure/skipVerify.
- [#7297](https://github.com/thanos-io/thanos/pull/7297) *: mark as not queryable if status is not ready
- [#7302](https://github.com/thanos-io/thanos/pull/7303) Considering the `X-Forwarded-For` header for the remote address in the logs.
- [#7304](https://github.com/thanos-io/thanos/pull/7304) Store: Use loser trees for merging results

### Changed

- [#7123](https://github.com/thanos-io/thanos/pull/7123) Rule: Change default Alertmanager API version to v2.
- [#7192](https://github.com/thanos-io/thanos/pull/7192) Rule: Do not turn off ruler even if resolving fails
- [#7223](https://github.com/thanos-io/thanos/pull/7223) Automatic detection of memory limits and configure GOMEMLIMIT to match.
- [#7283](https://github.com/thanos-io/thanos/pull/7283) Compact: *breaking :warning:* Replace group with resolution in compact downsample metrics to avoid cardinality explosion with large numbers of groups.
- [#7305](https://github.com/thanos-io/thanos/pull/7305) Query|Receiver: Do not log full request on ProxyStore by default.

### Removed

Expand All @@ -44,6 +92,7 @@ We use *breaking :warning:* to mark changes that are not backward compatible (re
- [#7078](https://github.com/thanos-io/thanos/pull/7078) *: Bump gRPC to 1.57.2

### Added
- [#7105](https://github.com/thanos-io/thanos/pull/7105) Rule: add flag `--query.enable-x-functions` to allow usage of extended promql functions (xrate, xincrease, xdelta) in loaded rules

### Changed

Expand Down Expand Up @@ -147,6 +196,7 @@ We use *breaking :warning:* to mark changes that are not backward compatible (re
- [#6692](https://github.com/thanos-io/thanos/pull/6692) Store: Fix matching bug when using empty alternative in regex matcher, for example (a||b).
- [#6679](https://github.com/thanos-io/thanos/pull/6697) Store: Fix block deduplication
- [#6706](https://github.com/thanos-io/thanos/pull/6706) Store: Series responses should always be sorted
- [#7286](https://github.com/thanos-io/thanos/pull/7286) Query: Propagate instant query warnings in distributed execution mode.

### Added

Expand Down
2 changes: 1 addition & 1 deletion VERSION
Original file line number Diff line number Diff line change
@@ -1 +1 @@
0.35.0-dev
0.35.1
6 changes: 3 additions & 3 deletions cmd/thanos/compact.go
Original file line number Diff line number Diff line change
Expand Up @@ -458,9 +458,9 @@ func runCompact(
}

for _, meta := range filteredMetas {
groupKey := meta.Thanos.GroupKey()
downsampleMetrics.downsamples.WithLabelValues(groupKey)
downsampleMetrics.downsampleFailures.WithLabelValues(groupKey)
resolutionLabel := meta.Thanos.ResolutionString()
downsampleMetrics.downsamples.WithLabelValues(resolutionLabel)
downsampleMetrics.downsampleFailures.WithLabelValues(resolutionLabel)
}

if err := downsampleBucket(
Expand Down
18 changes: 9 additions & 9 deletions cmd/thanos/downsample.go
Original file line number Diff line number Diff line change
Expand Up @@ -50,16 +50,16 @@ func newDownsampleMetrics(reg *prometheus.Registry) *DownsampleMetrics {
m.downsamples = promauto.With(reg).NewCounterVec(prometheus.CounterOpts{
Name: "thanos_compact_downsample_total",
Help: "Total number of downsampling attempts.",
}, []string{"group"})
}, []string{"resolution"})
m.downsampleFailures = promauto.With(reg).NewCounterVec(prometheus.CounterOpts{
Name: "thanos_compact_downsample_failures_total",
Help: "Total number of failed downsampling attempts.",
}, []string{"group"})
}, []string{"resolution"})
m.downsampleDuration = promauto.With(reg).NewHistogramVec(prometheus.HistogramOpts{
Name: "thanos_compact_downsample_duration_seconds",
Help: "Duration of downsample runs",
Buckets: []float64{60, 300, 900, 1800, 3600, 7200, 14400}, // 1m, 5m, 15m, 30m, 60m, 120m, 240m
}, []string{"group"})
}, []string{"resolution"})

return m
}
Expand Down Expand Up @@ -130,9 +130,9 @@ func RunDownsample(
}

for _, meta := range metas {
groupKey := meta.Thanos.GroupKey()
metrics.downsamples.WithLabelValues(groupKey)
metrics.downsampleFailures.WithLabelValues(groupKey)
resolutionLabel := meta.Thanos.ResolutionString()
metrics.downsamples.WithLabelValues(resolutionLabel)
metrics.downsampleFailures.WithLabelValues(resolutionLabel)
}
if err := downsampleBucket(ctx, logger, metrics, insBkt, metas, dataDir, downsampleConcurrency, blockFilesConcurrency, hashFunc, false); err != nil {
return errors.Wrap(err, "downsampling failed")
Expand Down Expand Up @@ -263,11 +263,11 @@ func downsampleBucket(
errMsg = "downsampling to 60 min"
}
if err := processDownsampling(workerCtx, logger, bkt, m, dir, resolution, hashFunc, metrics, acceptMalformedIndex, blockFilesConcurrency); err != nil {
metrics.downsampleFailures.WithLabelValues(m.Thanos.GroupKey()).Inc()
metrics.downsampleFailures.WithLabelValues(m.Thanos.ResolutionString()).Inc()
errCh <- errors.Wrap(err, errMsg)

}
metrics.downsamples.WithLabelValues(m.Thanos.GroupKey()).Inc()
metrics.downsamples.WithLabelValues(m.Thanos.ResolutionString()).Inc()
}
}()
}
Expand Down Expand Up @@ -391,7 +391,7 @@ func processDownsampling(
downsampleDuration := time.Since(begin)
level.Info(logger).Log("msg", "downsampled block",
"from", m.ULID, "to", id, "duration", downsampleDuration, "duration_ms", downsampleDuration.Milliseconds())
metrics.downsampleDuration.WithLabelValues(m.Thanos.GroupKey()).Observe(downsampleDuration.Seconds())
metrics.downsampleDuration.WithLabelValues(m.Thanos.ResolutionString()).Observe(downsampleDuration.Seconds())

stats, err := block.GatherIndexHealthStats(ctx, logger, filepath.Join(resdir, block.IndexFilename), m.MinTime, m.MaxTime)
if err == nil {
Expand Down
5 changes: 5 additions & 0 deletions cmd/thanos/main.go
Original file line number Diff line number Diff line change
Expand Up @@ -214,6 +214,11 @@ func getFlagsMap(flags []*kingpin.FlagModel) map[string]string {
if boilerplateFlags.GetFlag(f.Name) != nil {
continue
}
// Mask inline objstore flag which can have credentials.
if f.Name == "objstore.config" || f.Name == "objstore.config-file" {
flagsMap[f.Name] = "<REDACTED>"
continue
}
flagsMap[f.Name] = f.Value.String()
}

Expand Down
6 changes: 3 additions & 3 deletions cmd/thanos/main_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -157,7 +157,7 @@ func TestRegression4960_Deadlock(t *testing.T) {
testutil.Ok(t, err)

metrics := newDownsampleMetrics(prometheus.NewRegistry())
testutil.Equals(t, 0.0, promtest.ToFloat64(metrics.downsamples.WithLabelValues(meta.Thanos.GroupKey())))
testutil.Equals(t, 0.0, promtest.ToFloat64(metrics.downsamples.WithLabelValues(meta.Thanos.ResolutionString())))
baseBlockIDsFetcher := block.NewConcurrentLister(logger, bkt)
metaFetcher, err := block.NewMetaFetcher(nil, block.FetcherConcurrency, bkt, baseBlockIDsFetcher, "", nil, nil)
testutil.Ok(t, err)
Expand Down Expand Up @@ -197,15 +197,15 @@ func TestCleanupDownsampleCacheFolder(t *testing.T) {
testutil.Ok(t, err)

metrics := newDownsampleMetrics(prometheus.NewRegistry())
testutil.Equals(t, 0.0, promtest.ToFloat64(metrics.downsamples.WithLabelValues(meta.Thanos.GroupKey())))
testutil.Equals(t, 0.0, promtest.ToFloat64(metrics.downsamples.WithLabelValues(meta.Thanos.ResolutionString())))
baseBlockIDsFetcher := block.NewConcurrentLister(logger, bkt)
metaFetcher, err := block.NewMetaFetcher(nil, block.FetcherConcurrency, bkt, baseBlockIDsFetcher, "", nil, nil)
testutil.Ok(t, err)

metas, _, err := metaFetcher.Fetch(ctx)
testutil.Ok(t, err)
testutil.Ok(t, downsampleBucket(ctx, logger, metrics, bkt, metas, dir, 1, 1, metadata.NoneFunc, false))
testutil.Equals(t, 1.0, promtest.ToFloat64(metrics.downsamples.WithLabelValues(meta.Thanos.GroupKey())))
testutil.Equals(t, 1.0, promtest.ToFloat64(metrics.downsamples.WithLabelValues(meta.Thanos.ResolutionString())))

_, err = os.Stat(dir)
testutil.Assert(t, os.IsNotExist(err), "index cache dir should not exist at the end of execution")
Expand Down
42 changes: 21 additions & 21 deletions cmd/thanos/query.go
Original file line number Diff line number Diff line change
Expand Up @@ -127,6 +127,9 @@ func registerQuery(app *extkingpin.App) {
queryReplicaLabels := cmd.Flag("query.replica-label", "Labels to treat as a replica indicator along which data is deduplicated. Still you will be able to query without deduplication using 'dedup=false' parameter. Data includes time series, recording rules, and alerting rules.").
Strings()

enableDedupMerge := cmd.Flag("query.dedup-merge", "Enable deduplication merge of multiple time series with the same labels.").
Default("false").Bool()

instantDefaultMaxSourceResolution := extkingpin.ModelDuration(cmd.Flag("query.instant.default.max_source_resolution", "default value for max_source_resolution for instant queries. If not set, defaults to 0s only taking raw resolution into account. 1h can be a good value if you use instant queries over time ranges that incorporate times outside of your raw-retention.").Default("0s").Hidden())

defaultMetadataTimeRange := cmd.Flag("query.metadata.default-time-range", "The default metadata time range duration for retrieving labels through Labels and Series API when the range parameters are not specified. The zero value means range covers the time since the beginning.").Default("0s").Duration()
Expand Down Expand Up @@ -218,7 +221,7 @@ func registerQuery(app *extkingpin.App) {
storeSelectorRelabelConf := *extflag.RegisterPathOrContent(
cmd,
"selector.relabel-config",
"YAML with relabeling configuration that allows the Querier to select specific TSDBs by their external label. It follows native Prometheus relabel-config syntax. See format details: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config ",
"YAML file with relabeling configuration that allows selecting blocks to query based on their external labels. It follows the Thanos sharding relabel-config syntax. For format details see: https://thanos.io/tip/thanos/sharding.md/#relabelling ",
extflag.WithEnvSubstitution(),
)

Expand Down Expand Up @@ -374,6 +377,7 @@ func registerQuery(app *extkingpin.App) {
*enforceTenancy,
*tenantLabel,
*enableGroupReplicaPartialStrategy,
*enableDedupMerge,
)
})
}
Expand Down Expand Up @@ -457,6 +461,7 @@ func runQuery(
enforceTenancy bool,
tenantLabel string,
groupReplicaPartialResponseStrategy bool,
enableDedupMerge bool,
) error {
if alertQueryURL == "" {
lastColon := strings.LastIndex(httpBindAddr, ":")
Expand Down Expand Up @@ -566,24 +571,19 @@ func runQuery(
exemplarsProxy = exemplars.NewProxy(logger, endpoints.GetExemplarsStores, selectorLset)
queryableCreator query.QueryableCreator
)
if groupReplicaPartialResponseStrategy {
level.Info(logger).Log("msg", "Enabled group-replica partial response strategy")
queryableCreator = query.NewQueryableCreatorWithGroupReplicaPartialResponseStrategy(
logger,
extprom.WrapRegistererWithPrefix("thanos_query_", reg),
proxy,
maxConcurrentSelects,
queryTimeout,
)
} else {
queryableCreator = query.NewQueryableCreator(
logger,
extprom.WrapRegistererWithPrefix("thanos_query_", reg),
proxy,
maxConcurrentSelects,
queryTimeout,
)
opts := query.Options{
GroupReplicaPartialResponseStrategy: groupReplicaPartialResponseStrategy,
EnableDedupMerge: enableDedupMerge,
}
level.Info(logger).Log("msg", "databricks querier features", "opts", opts)
queryableCreator = query.NewQueryableCreatorWithOptions(
logger,
extprom.WrapRegistererWithPrefix("thanos_query_", reg),
proxy,
maxConcurrentSelects,
queryTimeout,
opts,
)

// Run File Service Discovery and update the store set when the files are modified.
if fileSD != nil {
Expand Down Expand Up @@ -803,7 +803,7 @@ func runQuery(
infoSrv := info.NewInfoServer(
component.Query.String(),
info.WithLabelSetFunc(func() []labelpb.ZLabelSet { return proxy.LabelSet() }),
info.WithStoreInfoFunc(func() *infopb.StoreInfo {
info.WithStoreInfoFunc(func() (*infopb.StoreInfo, error) {
if httpProbe.IsReady() {
mint, maxt := proxy.TimeRange()
return &infopb.StoreInfo{
Expand All @@ -812,9 +812,9 @@ func runQuery(
SupportsSharding: true,
SupportsWithoutReplicaLabels: true,
TsdbInfos: proxy.TSDBInfos(),
}
}, nil
}
return nil
return nil, errors.New("Not ready")
}),
info.WithExemplarsInfoFunc(),
info.WithRulesInfoFunc(),
Expand Down
Loading

0 comments on commit 7846702

Please sign in to comment.