
update docs after legacy configs have been removed (#17634)
* update docs after legacy configs have been removed
* better runtime.properties validation with new StartupInjectorBuilder.PropertiesValidator to fail fast if removed properties are set
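A fail-fast check of this kind could look roughly like the following sketch (illustrative Python only; the real implementation is the Java `StartupInjectorBuilder.PropertiesValidator`, and the property set here is taken from the docs changes below):

```python
# Hypothetical sketch of fail-fast startup validation: reject any
# runtime.properties that still sets a removed legacy config.
# Not the actual Druid code.

REMOVED_PROPERTIES = {
    "druid.generic.useDefaultValueForNull",
    "druid.generic.useThreeValueLogicForNativeFilters",
    "druid.generic.ignoreNullsForStringCardinality",
    "druid.expressions.useStrictBooleans",
}

def validate_runtime_properties(props: dict) -> None:
    """Raise immediately if any removed property is still configured."""
    offenders = sorted(REMOVED_PROPERTIES & props.keys())
    if offenders:
        raise ValueError(
            "properties [" + ", ".join(offenders) + "] have been removed; "
            "please remove them from runtime.properties"
        )

validate_runtime_properties({"druid.storage.type": "local"})  # passes
```

Failing fast at startup surfaces stale configs immediately instead of silently ignoring them at query time.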
clintropolis authored Jan 17, 2025
1 parent 05431fe commit e805c9a
Showing 25 changed files with 216 additions and 206 deletions.
1 change: 0 additions & 1 deletion docs/api-reference/service-status-api.md
Original file line number Diff line number Diff line change
@@ -400,7 +400,6 @@ Host: http://ROUTER_IP:ROUTER_PORT
"druid.emitter": "noop",
"sun.io.unicode.encoding": "UnicodeBig",
"druid.storage.type": "local",
-"druid.expressions.useStrictBooleans": "true",
"java.class.version": "55.0",
"socksNonProxyHosts": "local|*.local|169.254/16|*.169.254/16",
"druid.server.hiddenProperties": "[\"druid.s3.accessKey\",\"druid.s3.secretKey\",\"druid.metadata.storage.connector.password\", \"password\", \"key\", \"token\", \"pwd\"]"
20 changes: 0 additions & 20 deletions docs/configuration/index.md
@@ -823,20 +823,6 @@ Support for 64-bit floating point columns was released in Druid 0.11.0, so if yo
|--------|-----------|-------|
|`druid.indexing.doubleStorage`|Set to "float" to use 32-bit double representation for double columns.|double|

-### SQL compatible null handling
-
-These configurations are deprecated and will be removed in a future release at which point Druid will always have SQL compatible null handling.
-
-Prior to version 0.13.0, Druid string columns treated `''` and `null` values as interchangeable, and numeric columns were unable to represent `null` values, coercing `null` to `0`. Druid 0.13.0 introduced a mode which enabled SQL compatible null handling, allowing string columns to distinguish empty strings from nulls, and numeric columns to contain null rows.
-
-|Property|Description|Default|
-|--------|-----------|-------|
-|`druid.generic.useDefaultValueForNull`|Set to `false` to store and query data in SQL compatible mode. This configuration has been deprecated and will be removed in a future release, taking on the `false` behavior. When set to `true` (deprecated legacy mode), `null` values will be stored as `''` for string columns and `0` for numeric columns.|`false`|
-|`druid.generic.useThreeValueLogicForNativeFilters`|Set to `true` to use SQL compatible three-value logic when processing native Druid filters when `druid.generic.useDefaultValueForNull=false` and `druid.expressions.useStrictBooleans=true`. This configuration has been deprecated and will be removed in a future release, taking on the `true` behavior. When set to `false` Druid uses 2 value logic for filter processing, even when `druid.generic.useDefaultValueForNull=false` and `druid.expressions.useStrictBooleans=true`. See [boolean handling](../querying/sql-data-types.md#boolean-logic) for more details|`true`|
-|`druid.generic.ignoreNullsForStringCardinality`|When set to `true`, `null` values will be ignored for the built-in cardinality aggregator over string columns. Set to `false` to include `null` values while estimating cardinality of only string columns using the built-in cardinality aggregator. This setting takes effect only when `druid.generic.useDefaultValueForNull` is set to `true` and is ignored in SQL compatibility mode. Additionally, empty strings (equivalent to null) are not counted when this is set to `true`. This configuration has been deprecated and will be removed in a future release since it has no effect when `druid.generic.useDefaultValueForNull=false`. |`false`|
-
-This mode does have a storage size and query performance cost; see [segment documentation](../design/segments.md#handling-null-values) for more details.

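The three-value logic referenced in the removed `druid.generic.useThreeValueLogicForNativeFilters` row can be sketched as follows (an illustrative model of SQL TRUE/FALSE/UNKNOWN semantics, with `None` standing in for UNKNOWN; not actual Druid filter code):

```python
# Illustrative model of SQL three-valued logic; None represents UNKNOWN.

def tvl_and(a, b):
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def tvl_or(a, b):
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

def tvl_not(a):
    return None if a is None else not a

# A comparison like `x = 1` evaluates to UNKNOWN for a null x, so
# NOT (x = 1) is also UNKNOWN and the row does not match the filter.
```

Under two-valued logic, by contrast, UNKNOWN collapses to FALSE before negation, so `NOT (x = 1)` would match null rows.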
### HTTP client

All Druid components can communicate with each other over HTTP.
@@ -2213,12 +2199,6 @@ Supported query contexts:
|`sortByDimsFirst`|Sort the results first by dimension values and then by timestamp.|false|
|`forceLimitPushDown`|When all fields in the orderby are part of the grouping key, the broker will push limit application down to the Historical processes. When the sorting order uses fields that are not in the grouping key, applying this optimization can result in approximate results with unknown accuracy, so this optimization is disabled by default in that case. Enabling this context flag turns on limit push down for limit/orderbys that contain non-grouping key columns.|false|

-#### Expression processing configurations
-
-|Key|Description|Default|
-|---|-----------|-------|
-|`druid.expressions.useStrictBooleans`|Controls the behavior of Druid boolean operators and functions, if set to `true` all boolean values are either `1` or `0`. This configuration has been deprecated and will be removed in a future release, taking on the `true` behavior. See [expression documentation](../querying/math-expr.md#logical-operator-modes) for more information.|true|
-
### Router

#### Router process configs
11 changes: 1 addition & 10 deletions docs/design/segments.md
@@ -82,16 +82,7 @@ For each row in the list of column data, there is only a single bitmap that has

## Handling null values

-By default Druid stores segments in a SQL compatible null handling mode. String columns always store the null value as id 0, the first position in the value dictionary and an associated entry in the bitmap value indexes used to filter null values. Numeric columns also store a null value bitmap index to indicate the null valued rows, which is used to null check aggregations and for filter matching null values.
-
-Druid also has a legacy mode which uses default values instead of nulls, which was the default prior to Druid 28.0.0. This legacy mode is deprecated and will be removed in a future release, but can be enabled by setting `druid.generic.useDefaultValueForNull=true`.
-
-In legacy mode, Druid segments created _at ingestion time_ have the following characteristics:
-
-* String columns can not distinguish `''` from `null`, they are treated interchangeably as the same value
-* Numeric columns can not represent `null` valued rows, and instead store a `0`.
-
-In legacy mode, numeric columns do not have the null value bitmap, and so can have slightly decreased segment sizes, and queries involving numeric columns can have slightly increased performance in some cases since there is no need to check the null value bitmap.
+String columns always store the null value if present in any row as id 0, the first position in the value dictionary and an associated entry in the bitmap value indexes used to filter null values. Numeric columns also store a null value bitmap index to indicate the null valued rows, which is used to null check aggregations and for filter matching null values.
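The null value bitmap described above can be illustrated with a toy model (conceptual only, not Druid's actual segment format — the bitmap marks which rows are null so the value array never needs a null sentinel):

```python
# Toy model of a numeric column backed by a null bitmap index.
# Conceptual illustration, not Druid's storage layout.

class NumericColumn:
    def __init__(self, rows):
        # Null rows hold a placeholder 0 in the value array; the bitmap
        # records which rows are actually null.
        self.values = [0 if v is None else v for v in rows]
        self.null_bitmap = [v is None for v in rows]

    def get(self, row):
        # Null check consults the bitmap, not the value array.
        return None if self.null_bitmap[row] else self.values[row]

    def sum_ignoring_nulls(self):
        return sum(v for v, is_null in zip(self.values, self.null_bitmap)
                   if not is_null)

col = NumericColumn([3, None, 5])
```

Filters matching null values scan the bitmap directly, and aggregations use it to skip null rows.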

## Segments with different schemas

Original file line number Diff line number Diff line change
@@ -73,7 +73,7 @@ There are currently no configuration properties specific to Moving Average.
## Limitations
* movingAverage is missing support for the following groupBy properties: `subtotalsSpec`, `virtualColumns`.
* movingAverage is missing support for the following timeseries properties: `descending`.
-* movingAverage is missing support for [SQL-compatible null handling](https://github.com/apache/druid/issues/4349) (So setting druid.generic.useDefaultValueForNull in configuration will give an error).
* movingAverage averagers consider empty buckets and null aggregation values as 0 unless otherwise noted.

## Query spec
* Most properties in the query spec derived from [groupBy query](../../querying/groupbyquery.md) / [timeseries](../../querying/timeseriesquery.md), see documentation for these query types.
4 changes: 1 addition & 3 deletions docs/development/extensions-core/approximate-histograms.md
@@ -226,9 +226,7 @@ For performance and accuracy reasons, we recommend avoiding aggregation of histo

### Null handling

-If `druid.generic.useDefaultValueForNull` is false, null values will be tracked in the `missingValueCount` field of the histogram.
-
-If `druid.generic.useDefaultValueForNull` is true, null values will be added to the histogram as the default 0.0 value.
+Druid tracks null values in the `missingValueCount` field of the histogram.

## Histogram post-aggregators

12 changes: 6 additions & 6 deletions docs/development/extensions-core/stats.md
@@ -62,12 +62,12 @@ You can use the variance and standard deviation aggregation functions in the SEL

|Function|Notes|Default|
|--------|-----|-------|
-|`VAR_POP(expr)`|Computes variance population of `expr`.|`null` or `0` if `druid.generic.useDefaultValueForNull=true` (deprecated legacy mode)|
-|`VAR_SAMP(expr)`|Computes variance sample of `expr`.|`null` or `0` if `druid.generic.useDefaultValueForNull=true` (deprecated legacy mode)|
-|`VARIANCE(expr)`|Computes variance sample of `expr`.|`null` or `0` if `druid.generic.useDefaultValueForNull=true` (deprecated legacy mode)|
-|`STDDEV_POP(expr)`|Computes standard deviation population of `expr`.|`null` or `0` if `druid.generic.useDefaultValueForNull=true` (deprecated legacy mode)|
-|`STDDEV_SAMP(expr)`|Computes standard deviation sample of `expr`.|`null` or `0` if `druid.generic.useDefaultValueForNull=true` (deprecated legacy mode)|
-|`STDDEV(expr)`|Computes standard deviation sample of `expr`.|`null` or `0` if `druid.generic.useDefaultValueForNull=true` (deprecated legacy mode)|
+|`VAR_POP(expr)`|Computes variance population of `expr`.|`null`|
+|`VAR_SAMP(expr)`|Computes variance sample of `expr`.|`null`|
+|`VARIANCE(expr)`|Computes variance sample of `expr`.|`null`|
+|`STDDEV_POP(expr)`|Computes standard deviation population of `expr`.|`null`|
+|`STDDEV_SAMP(expr)`|Computes standard deviation sample of `expr`.|`null`|
+|`STDDEV(expr)`|Computes standard deviation sample of `expr`.|`null`|
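The `null` defaults in the table above reflect that these functions return null when there is nothing to aggregate. A rough sketch of the sample-variance semantics (illustrative only, not Druid's implementation; here null inputs are simply skipped):

```python
# Sketch of VAR_SAMP / STDDEV_SAMP semantics: ignore null inputs and
# return None (SQL null) when no result can be computed.
import math

def var_samp(values):
    xs = [v for v in values if v is not None]
    if len(xs) < 2:
        return None  # sample variance is undefined for fewer than two values
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)

def stddev_samp(values):
    v = var_samp(values)
    return None if v is None else math.sqrt(v)
```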

### Pre-aggregating variance at ingestion time

3 changes: 1 addition & 2 deletions docs/ingestion/schema-design.md
@@ -258,8 +258,7 @@ You can have Druid infer the schema and types for your data partially or fully b
When performing type-aware schema discovery, Druid can discover all the columns of your input data (that are not present in
the exclusion list). Druid automatically chooses the most appropriate native Druid type among `STRING`, `LONG`,
`DOUBLE`, `ARRAY<STRING>`, `ARRAY<LONG>`, `ARRAY<DOUBLE>`, or `COMPLEX<json>` for nested data. For input formats with
-native boolean types, Druid ingests these values as longs if `druid.expressions.useStrictBooleans` is set to `true`
-(the default) or strings if set to `false`. Array typed columns can be queried using
+native boolean types, Druid ingests these values as longs. Array typed columns can be queried using
the [array functions](../querying/sql-array-functions.md) or [UNNEST](../querying/sql-functions.md#unnest). Nested
columns can be queried with the [JSON functions](../querying/sql-json-functions.md).
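The type choice described above can be sketched as a simplified mapping (illustrative only; `discover_type` is a hypothetical helper, not Druid's actual inference code):

```python
# Hedged sketch of type-aware schema discovery: map a sampled value to a
# native Druid type name, ingesting native booleans as longs (1/0).

def discover_type(value):
    if isinstance(value, bool):   # must check bool before int: bool subclasses int
        return "LONG"             # native booleans are ingested as longs
    if isinstance(value, int):
        return "LONG"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, str):
        return "STRING"
    if isinstance(value, list):
        inner = {discover_type(v) for v in value}
        # homogeneous arrays keep their element type; mixed falls back to nested
        return f"ARRAY<{inner.pop()}>" if len(inner) == 1 else "COMPLEX<json>"
    if isinstance(value, dict):
        return "COMPLEX<json>"    # nested data
    return "STRING"
```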

14 changes: 7 additions & 7 deletions docs/querying/aggregations.md
@@ -103,7 +103,7 @@ Example:

#### `doubleMin` aggregator

-`doubleMin` computes the minimum of all input values and null if `druid.generic.useDefaultValueForNull` is false or Double.POSITIVE_INFINITY if true.
+`doubleMin` computes the minimum of all input values and null.

Example:
```json
@@ -112,7 +112,7 @@ Example:

#### `doubleMax` aggregator

-`doubleMax` computes the maximum of all input values and null if `druid.generic.useDefaultValueForNull` is false or Double.NEGATIVE_INFINITY if true.
+`doubleMax` computes the maximum of all input values and null.

Example:
```json
@@ -121,7 +121,7 @@ Example:

#### `floatMin` aggregator

-`floatMin` computes the minimum of all input values and null if `druid.generic.useDefaultValueForNull` is false or Float.POSITIVE_INFINITY if true.
+`floatMin` computes the minimum of all input values and null.

Example:
```json
@@ -130,7 +130,7 @@ Example:

#### `floatMax` aggregator

-`floatMax` computes the maximum of all input values and null if `druid.generic.useDefaultValueForNull` is false or Float.NEGATIVE_INFINITY if true.
+`floatMax` computes the maximum of all input values and null.

Example:
```json
@@ -139,7 +139,7 @@ Example:

#### `longMin` aggregator

-`longMin` computes the minimum of all input values and null if `druid.generic.useDefaultValueForNull` is false or Long.MAX_VALUE if true.
+`longMin` computes the minimum of all input values and null.

Example:
```json
@@ -148,7 +148,7 @@ Example:

#### `longMax` aggregator

-`longMax` computes the maximum of all metric values and null if `druid.generic.useDefaultValueForNull` is false or Long.MIN_VALUE if true.
+`longMax` computes the maximum of all metric values and null.

Example:
```json
@@ -485,7 +485,7 @@ Aggregator applicable only at query time. Aggregates results using [Druid expres
| `finalize` | The finalize expression which can only refer to a single input variable, `o`. This expression is used to perform any final transformation of the output of the `fold` or `combine` expressions. If not set, then the value is not transformed. | No |
| `initialValue` | The initial value of the accumulator for the `fold` (and `combine`, if `InitialCombineValue` is null) expression. | Yes |
| `initialCombineValue` | The initial value of the accumulator for the `combine` expression. | No. Default `initialValue`. |
-| `isNullUnlessAggregated` | Indicates that the default output value should be `null` if the aggregator does not process any rows. If true, the value is `null`, if false, the result of running the expressions with initial values is used instead. | No. Defaults to the value of `druid.generic.useDefaultValueForNull`. |
+| `isNullUnlessAggregated` | If true, sets the default output value to `null` when the aggregator does not process any rows. If false, Druid computes the value as the result of running the expressions with initial values. | No. Defaults to `true`. |
| `shouldAggregateNullInputs` | Indicates if the `fold` expression should operate on any `null` input values. | No. Defaults to `true`. |
| `shouldCombineAggregateNullInputs` | Indicates if the `combine` expression should operate on any `null` input values. | No. Defaults to the value of `shouldAggregateNullInputs`. |
| `maxSizeBytes` | Maximum size in bytes that variably sized aggregator output types such as strings and arrays are allowed to grow to before the aggregation fails. | No. Default is 8192 bytes. |
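The fold/finalize flow and the `isNullUnlessAggregated` default in the table above can be sketched like this (a simplified single-node illustration, not the actual expression aggregator; the combine phase is omitted):

```python
# Simplified sketch of the expression aggregator's fold/finalize behavior.

def aggregate(rows, fold, initial_value, finalize=lambda o: o,
              is_null_unless_aggregated=True,
              should_aggregate_null_inputs=True):
    acc = initial_value
    touched = False
    for x in rows:
        if x is None and not should_aggregate_null_inputs:
            continue  # skip null inputs when configured to
        acc = fold(acc, x)
        touched = True
    if not touched and is_null_unless_aggregated:
        return None  # default output when no rows were aggregated
    return finalize(acc)

# e.g. a longSum-like fold that treats null inputs as 0
total = aggregate([1, 2, None, 3],
                  fold=lambda acc, x: acc + (x or 0),
                  initial_value=0)
```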
6 changes: 3 additions & 3 deletions docs/querying/filters.md
@@ -68,7 +68,7 @@ When the selector filter matches against numeric inputs, the string `value` will

The equality filter is a replacement for the selector filter with the ability to match against any type of column. The equality filter is designed to have more SQL compatible behavior than the selector filter and so can not match null values. To match null values use the null filter.

-Druid's SQL planner uses the equality filter by default instead of selector filter whenever `druid.generic.useDefaultValueForNull=false`, or if `sqlUseBoundAndSelectors` is set to false on the [SQL query context](./sql-query-context.md).
+Druid's SQL planner uses the equality filter by default instead of selector filter unless `sqlUseBoundAndSelectors` is set to true on the [SQL query context](./sql-query-context.md).

| Property | Description | Required |
| -------- | ----------- | -------- |
@@ -99,7 +99,7 @@ Druid's SQL planner uses the equality filter by default instead of selector filt

The null filter is a partial replacement for the selector filter. It is dedicated to matching NULL values.

-Druid's SQL planner uses the null filter by default instead of selector filter whenever `druid.generic.useDefaultValueForNull=false`, or if `sqlUseBoundAndSelectors` is set to false on the [SQL query context](./sql-query-context.md).
+Druid's SQL planner uses the null filter by default instead of selector filter unless `sqlUseBoundAndSelectors` is set to true on the [SQL query context](./sql-query-context.md).

| Property | Description | Required |
| -------- | ----------- | -------- |
@@ -310,7 +310,7 @@ Note that the bound filter matches null values if you don't specify a lower boun

The range filter is a replacement for the bound filter. It compares against any type of column and is designed to have more SQL compliant behavior than the bound filter. It won't match null values, even if you don't specify a lower bound.

-Druid's SQL planner uses the range filter by default instead of bound filter whenever `druid.generic.useDefaultValueForNull=false`, or if `sqlUseBoundAndSelectors` is set to false on the [SQL query context](./sql-query-context.md).
+Druid's SQL planner uses the range filter by default instead of bound filter unless `sqlUseBoundAndSelectors` is set to true on the [SQL query context](./sql-query-context.md).
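For a rough picture of the filters the planner now prefers, here are example specs written as Python dicts; the property names shown are assumptions drawn from the Druid native filter docs, so treat the filter tables in this page as authoritative:

```python
# Hedged examples of the SQL-compatible native filter specs (equality,
# null, range). Property names are assumptions, for illustration only.

equality_filter = {
    "type": "equals", "column": "x",
    "matchValueType": "LONG", "matchValue": 1,   # never matches null
}
null_filter = {"type": "null", "column": "x"}    # matches only null rows
range_filter = {
    "type": "range", "column": "x",
    "matchValueType": "LONG",
    "lower": 0, "lowerOpen": False,              # inclusive lower bound
    # null rows never match, even with no lower bound specified
}
```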

| Property | Description | Required |
| -------- | ----------- | -------- |
6 changes: 1 addition & 5 deletions docs/querying/lookups.md
@@ -210,11 +210,7 @@ Injective lookups are eligible for the largest set of query rewrites. Injective
- The lookup table must have a key-value pair defined for every input that the `LOOKUP` function call may
encounter. For example, when calling `LOOKUP(sku, 'sku_to_name')`, the `sku_to_name` lookup table must have a key
for all possible `sku`.
-- In SQL-compatible null handling mode (when `druid.generic.useDefaultValueForNull = false`, the default) injective
-  lookup tables are not required to have keys for `null`, since `LOOKUP` of `null` is always `null` itself.
-- When `druid.generic.useDefaultValueForNull = true`, a `LOOKUP` of `null` retrieves the value mapped to the
-  empty-string key (`""`). In this mode, injective lookup tables must have an empty-string key if the `LOOKUP`
-  function may encounter null input values.
+- Injective lookup tables are not required to have keys for `null`, since `LOOKUP` of `null` is always `null` itself.
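The `LOOKUP`-of-null rule above can be sketched as follows (illustrative only; `lookup` is a hypothetical helper, not a Druid API):

```python
# Sketch: LOOKUP of null is always null, so the lookup table is never
# consulted for a null key and needs no entry for it.

def lookup(key, table):
    if key is None:
        return None           # null in, null out; table not consulted
    return table.get(key)     # missing keys also yield null (None)

sku_to_name = {"s1": "shoes", "s2": "socks"}
```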

To determine whether a lookup is injective, Druid relies on an `injective` property that you can set in the
[lookup definition](./lookups-cached-global.md). In general, you should set
