Add cooperative-sticky partition_assignment_strategy option to kafka_reader (#882)

This PR makes the following changes:

- Bump the kafka-asset version from 5.3.1 to 5.4.0
- Update the `kafka_reader` and `kafka_reader_api` to allow for
`cooperative-sticky` as a `partition_assignment_strategy` option.
- Set `minimum_teraslice_version` to 2.9.0, as this version uses
node-rdkafka v3.2.0, which is the first version to allow for incremental
rebalancing using `cooperative-sticky`.
- Remove the `partition_assignment_strategy` option from `kafka_sender`,
`kafka_sender_api` and `kafka_dead_letter`. These look like copy/paste
errors, as this is not a valid property on a Kafka producer config.

Ref: #869, #873
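
The consumer-only nature of the option can be illustrated with a minimal sketch. It assumes the reader applies the setting the same way the removed producer code did, by copying it into the rdkafka options only when it is non-empty. The names `ConsumerSettings` and `buildConsumerOptions` are hypothetical and are not part of this asset; only the `group.id` and `partition.assignment.strategy` property keys come from librdkafka.

```typescript
// Hypothetical helper for illustration; not the asset's actual implementation.
type AssignmentStrategy = 'range' | 'roundrobin' | 'cooperative-sticky';

interface ConsumerSettings {
    group: string;
    // An empty string means "not set", matching the schema default of ''.
    partition_assignment_strategy?: AssignmentStrategy | '';
}

function buildConsumerOptions(settings: ConsumerSettings): Record<string, string> {
    const options: Record<string, string> = { 'group.id': settings.group };

    // Only forward the strategy when one was chosen, so that librdkafka
    // falls back to its own default otherwise.
    if (settings.partition_assignment_strategy) {
        options['partition.assignment.strategy'] = settings.partition_assignment_strategy;
    }

    return options;
}

const withStrategy = buildConsumerOptions({
    group: 'example-group',
    partition_assignment_strategy: 'cooperative-sticky'
});

const withoutStrategy = buildConsumerOptions({
    group: 'example-group',
    partition_assignment_strategy: ''
});
```

A producer never joins a consumer group, so there is no equivalent step on the sender side, which is consistent with the option being dropped there.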
busma13 authored Dec 6, 2024
1 parent 97dcb16 commit 11cd106
Showing 16 changed files with 23 additions and 53 deletions.
3 changes: 2 additions & 1 deletion asset/asset.json
@@ -1,5 +1,6 @@
{
"name": "kafka",
"description": "Kafka reader and writer support.",
"version": "5.3.1"
"version": "5.4.0",
"minimum_teraslice_version": "2.9.0"
}
2 changes: 1 addition & 1 deletion asset/package.json
@@ -1,7 +1,7 @@
{
"name": "kafka-assets",
"displayName": "Asset",
"version": "5.3.1",
"version": "5.4.0",
"private": true,
"description": "Teraslice asset for kafka operations",
"license": "MIT",
11 changes: 2 additions & 9 deletions asset/src/kafka_dead_letter/api.ts
@@ -78,7 +78,7 @@ export default class KafkaDeadLetter extends OperationAPI<KafkaDeadLetterConfig>
}

private clientConfig() {
const config = {
return {
type: 'kafka',
endpoint: this.apiConfig.connection,
options: {
@@ -93,14 +93,7 @@ export default class KafkaDeadLetter extends OperationAPI<KafkaDeadLetterConfig>
'log.connection.close': false
} as Record<string, any>,
autoconnect: false
};

const assignmentStrategy = this.apiConfig.partition_assignment_strategy;
if (assignmentStrategy) {
config.rdkafka_options['partition.assignment.strategy'] = assignmentStrategy;
}

return config as ConnectionConfig;
} as ConnectionConfig;
}

private async createClient(): Promise<kafka.Producer> {
5 changes: 0 additions & 5 deletions asset/src/kafka_dead_letter/interfaces.ts
@@ -34,9 +34,4 @@ export interface KafkaDeadLetterConfig extends APIConfig {
Set to -1 to disable polling.
*/
metadata_refresh: number;
/**
* Name of partition assignment strategy to use
* when elected group leader assigns partitions to group members.
*/
partition_assignment_strategy?: 'range' | 'roundrobin';
}
5 changes: 0 additions & 5 deletions asset/src/kafka_dead_letter/schema.ts
@@ -43,11 +43,6 @@ export default class Schema extends ConvictSchema<KafkaDeadLetterConfig> {
doc: 'How often the producer will poll the broker for metadata information. Set to -1 to disable polling.',
default: 300000,
format: Number
},
partition_assignment_strategy: {
doc: 'Name of partition assignment strategy to use when elected group leader assigns partitions to group members.',
default: '',
format: ['range', 'roundrobin', '']
}
};
}
2 changes: 1 addition & 1 deletion asset/src/kafka_reader/interfaces.ts
@@ -59,7 +59,7 @@ export interface KafkaReaderConfig extends OpConfig {
* Name of partition assignment strategy to use when elected group leader
* assigns partitions to group members.
*/
partition_assignment_strategy?: 'range' | 'roundrobin';
partition_assignment_strategy?: 'range' | 'roundrobin' | 'cooperative-sticky';
/**
* Name of kafka api used for reader, if none is provided, then one is made
* and the name is kafka_reader_api, and is injected into the execution
2 changes: 1 addition & 1 deletion asset/src/kafka_reader_api/schema.ts
@@ -70,7 +70,7 @@ export const schema = {
partition_assignment_strategy: {
doc: 'Name of partition assignment strategy to use when elected group leader assigns partitions to group members.',
default: '',
format: ['range', 'roundrobin', '']
format: ['range', 'roundrobin', 'cooperative-sticky', '']
}
};
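
As a rough illustration of what the `format` array above expresses (the value must be one of the listed strings, with `''` meaning unset), here is a standalone sketch. It does not use the schema library itself; `ALLOWED_STRATEGIES` and `isAllowedStrategy` are hypothetical names introduced only for this example.

```typescript
// Hypothetical standalone check mirroring the enumerated format above.
const ALLOWED_STRATEGIES = ['range', 'roundrobin', 'cooperative-sticky', ''] as const;

type StrategySetting = typeof ALLOWED_STRATEGIES[number];

function isAllowedStrategy(value: unknown): value is StrategySetting {
    // Reject non-strings first, then require an exact match against the list.
    return typeof value === 'string'
        && (ALLOWED_STRATEGIES as readonly string[]).includes(value);
}

const acceptsCooperativeSticky = isAllowedStrategy('cooperative-sticky');
const acceptsEmptyDefault = isAllowedStrategy('');
const rejectsUnknownName = !isAllowedStrategy('sticky');
```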

5 changes: 0 additions & 5 deletions asset/src/kafka_sender/interfaces.ts
@@ -38,11 +38,6 @@ export interface KafkaSenderConfig extends OpConfig {
Set to -1 to disable polling.
*/
metadata_refresh: number;
/**
* Name of partition assignment strategy to use when elected
* group leader assigns partitions to group members.
*/
partition_assignment_strategy?: 'range' | 'roundrobin';
/**
* This field indicates the number of acknowledgements the leader broker
* must receive from ISR brokers before responding to the request:
12 changes: 2 additions & 10 deletions asset/src/kafka_sender_api/api.ts
@@ -22,7 +22,6 @@ export default class KafkaSenderApi extends APIFactory<KafkaRouteSender, KafkaSe
if (isNotNil(config.id_field) && !isString(config.id_field)) throw new Error(`Parameter id_field must be provided and be of type string, got ${getTypeOf(config.id_field)}`);
if (isNotNil(config.timestamp_field) && !isString(config.timestamp_field)) throw new Error(`Parameter timestamp_field must be provided and be of type string, got ${getTypeOf(config.timestamp_field)}`);
if (isNotNil(config.timestamp_now) && !isBoolean(config.timestamp_now)) throw new Error(`Parameter timestamp_now must be provided and be of type string, got ${getTypeOf(config.timestamp_now)}`);
if (isNil(config.partition_assignment_strategy) || !isString(config.partition_assignment_strategy)) throw new Error(`Parameter partition_assignment_strategy must be provided and be of type string, got ${getTypeOf(config.partition_assignment_strategy)}`);
if (isNil(config.compression) || !isString(config.compression)) throw new Error(`Parameter compression must be provided and be of type string, got ${getTypeOf(config.compression)}`);
if (isNil(config.wait) || !isNumber(config.wait)) throw new Error(`Parameter wait must be provided and be of type number, got ${getTypeOf(config.wait)}`);
if (isNil(config.metadata_refresh) || !isNumber(config.metadata_refresh)) throw new Error(`Parameter metadata_refresh must be provided and be of type number, got ${getTypeOf(config.metadata_refresh)}`);
@@ -40,7 +39,7 @@ export default class KafkaSenderApi extends APIFactory<KafkaRouteSender, KafkaSe

private clientConfig(clientConfig: KafkaSenderAPIConfig = {}) {
const kafkaConfig = Object.assign({}, this.apiConfig, clientConfig);
const config = {
return {
type: 'kafka',
endpoint: kafkaConfig.connection,
options: {
@@ -59,14 +58,7 @@ export default class KafkaSenderApi extends APIFactory<KafkaRouteSender, KafkaSe
'request.required.acks': kafkaConfig.required_acks
} as Record<string, any>,
autoconnect: false
};

const assignmentStrategy = kafkaConfig.partition_assignment_strategy;
if (assignmentStrategy) {
config.rdkafka_options['partition.assignment.strategy'] = assignmentStrategy;
}

return config as ConnectionConfig;
} as ConnectionConfig;
}

async create(
5 changes: 0 additions & 5 deletions asset/src/kafka_sender_api/schema.ts
@@ -81,11 +81,6 @@ export const schema = {
default: '5 minutes',
format: 'duration'
},
partition_assignment_strategy: {
doc: 'Name of partition assignment strategy to use when elected group leader assigns partitions to group members.',
default: '',
format: ['range', 'roundrobin', '']
},
required_acks: {
doc: 'The number of required broker acknowledgements for a given request, set to -1 for all.',
default: 1,
1 change: 0 additions & 1 deletion docs/apis/kafka_dead_letter.md
@@ -96,5 +96,4 @@ are sent to topic "failed_record_topic" at the end of the slice
| wait | How long to wait for size messages to become available on the producer, in milliseconds. | String/Duration/Number | optional, defaults to `500` |
| connection | Name of the kafka connection to use when sending data | String | optional, defaults to the 'default' connection in the kafka terafoundation connector config |
| metadata_refresh | How often the producer will poll the broker for metadata information. Set to -1 to disable polling. | String/Duration/Number | optional, defaults to `"5 minutes"` |
| partition_assignment_strategy | Name of partition assignment strategy to use when elected group leader assigns partitions to group members. May be set to `range`, `roundrobin` or `""` | String | optional, defaults to `""` |
| _encoding | Used for specifying the data encoding type when using DataEntity.fromBuffer. May be set to `json` or `raw` | String | optional, defaults to `json` |
2 changes: 1 addition & 1 deletion docs/apis/kafka_reader_api.md
@@ -195,7 +195,7 @@ const results = await api.consume(query)
| connection | Name of the kafka connection to use when sending data | String | optional, defaults to the 'default' connection in the kafka terafoundation connector config |
| max_poll_interval | The maximum delay between invocations of poll() when using consumer group management. This places an upper bound on the amount of time that the consumer can be idle before fetching more records. If poll() is not called before expiration of this timeout, then the consumer is considered failed and the group will rebalance in order to reassign the partitions to another member| String/Duration | optional, defaults to `"5 minutes"` |
| offset_reset | How offset resets should be handled when there are no valid offsets for the consumer group. May be set to `smallest`, `earliest`, `beginning`, `largest`, `latest` or `error` | String | optional, defaults to `smallest` |
| partition_assignment_strategy | Name of partition assignment strategy to use when elected group leader assigns partitions to group members. May be set to `range`, `roundrobin` or `""` | String | optional, defaults to `""` |
| partition_assignment_strategy | Name of partition assignment strategy to use when elected group leader assigns partitions to group members. May be set to `range`, `roundrobin`, `cooperative-sticky` or `""` | String | optional, defaults to `""` |
| rollback_on_failure | Controls whether the consumer state is rolled back on failure. This will protect against data loss, however this can have an unintended side effect of blocking the job from moving if failures are minor and persistent. NOTE: This currently defaults to false due to the side effects of the behavior, at some point in the future it is expected this will default to true.| Boolean | optional, defaults to `false` |
| use_commit_sync | Use commit sync instead of async (usually not recommended) | Boolean | optional, defaults to `false` |
| wait |How long to wait for a full chunk of data to be available. Specified in milliseconds if you use a number. | String/Duration/Number | optional, defaults to `30 seconds` |
1 change: 0 additions & 1 deletion docs/apis/kafka_sender_api.md
@@ -205,5 +205,4 @@ await api.send([
| connection | Name of the kafka connection to use when sending data | String | optional, defaults to the 'default' connection in the kafka terafoundation connector config |
| required_acks | The number of required broker acknowledgements for a given request, set to -1 for all. | Number | optional, defaults to `1` |
| metadata_refresh | How often the producer will poll the broker for metadata information. Set to -1 to disable polling. | String/Duration/Number | optional, defaults to `"5 minutes"` |
| partition_assignment_strategy | Name of partition assignment strategy to use when elected group leader assigns partitions to group members. May be set to `range`, `roundrobin` or `""` | String | optional, defaults to `""` |
| api_name | Name of `kafka_sender_api` used for the sender, if none is provided, then one is made and assigned the name to `kafka_sender_api`, and is injected into the execution | String | optional, defaults to `kafka_sender_api`|| _encoding | Used for specifying the data encoding type when using DataEntity.fromBuffer. May be set to `json` or `raw` | String | optional, defaults to `json` |
2 changes: 1 addition & 1 deletion docs/operations/kafka_reader.md
@@ -75,7 +75,7 @@ results.length === 5000;
| connection | Name of the kafka connection to use when sending data | String | optional, defaults to the 'default' connection in the kafka terafoundation connector config |
| max_poll_interval | The maximum delay between invocations of poll() when using consumer group management. This places an upper bound on the amount of time that the consumer can be idle before fetching more records. If poll() is not called before expiration of this timeout, then the consumer is considered failed and the group will rebalance in order to reassign the partitions to another member| String/Duration | optional, defaults to `"5 minutes"` |
| offset_reset | How offset resets should be handled when there are no valid offsets for the consumer group. May be set to `smallest`, `earliest`, `beginning`, `largest`, `latest` or `error` | String | optional, defaults to `smallest` |
| partition_assignment_strategy | Name of partition assignment strategy to use when elected group leader assigns partitions to group members. May be set to `range`, `roundrobin` or `""` | String | optional, defaults to `""` |
| partition_assignment_strategy | Name of partition assignment strategy to use when elected group leader assigns partitions to group members. May be set to `range`, `roundrobin`, `cooperative-sticky` or `""` | String | optional, defaults to `""` |
| rollback_on_failure | Controls whether the consumer state is rolled back on failure. This will protect against data loss, however this can have an unintended side effect of blocking the job from moving if failures are minor and persistent. NOTE: This currently defaults to false due to the side effects of the behavior, at some point in the future it is expected this will default to true.| Boolean | optional, defaults to `false` |
| use_commit_sync | Use commit sync instead of async (usually not recommended) | Boolean | optional, defaults to `false` |
| wait | How long to wait for a full chunk of data to be available. Specified in milliseconds if you use a number. | String/Duration/Number | optional, defaults to `30 seconds` |
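
For orientation, a job that opts into the new value might look like the sketch below. It is written as a TypeScript object so the shape can be type-checked; the docs' JSON job format would carry the same fields. The job name, topic, group, worker count and slice size are made-up values, and the `JobConfig` and `OpConfig` interfaces are simplified stand-ins rather than the real Teraslice types.

```typescript
// Simplified, hypothetical types for illustration only.
interface OpConfig {
    _op: string;
    [key: string]: unknown;
}

interface JobConfig {
    name: string;
    lifecycle: 'once' | 'persistent';
    workers: number;
    assets: string[];
    operations: OpConfig[];
}

const exampleJob: JobConfig = {
    name: 'example-cooperative-reader',
    lifecycle: 'persistent',
    workers: 3,
    assets: ['kafka'],
    operations: [
        {
            _op: 'kafka_reader',
            topic: 'example-topic',
            group: 'example-group',
            size: 10000,
            // New in asset 5.4.0; the commit sets minimum_teraslice_version to 2.9.0.
            partition_assignment_strategy: 'cooperative-sticky'
        },
        { _op: 'noop' }
    ]
};
```

Consumers in the same group generally need to agree on a compatible strategy, so it is probably safest to set the same value on every job that shares a group.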
16 changes: 11 additions & 5 deletions docs/operations/kafka_sender.md
@@ -1,4 +1,5 @@
# kafka_sender

The kafka_sender is used to send data to a kafka topic. This is a high throughput operation.

This uses [node-rdkafka](https://github.com/Blizzard/node-rdkafka) underneath the hood.
@@ -8,9 +9,11 @@ For this sender to function properly, you will need a running kafka cluster and
## Usage

### Send data to topic, use key and time from fields on record

In this example, the kafka_sender will send data to the kafka-test-sender topic using the uuid field of the record. It will also annotate the kafka record timestamp metadata with the date specified on the created field on the record.

Example job

```json
{
"name": "test-job",
@@ -64,9 +67,11 @@ results === data;
```

### Send data to topic, use _key metadata and create its own timestamp
In this example, the kafka_sender will send data to the kafka-test-sender topic using the _key metadata value, which happens when the `id_field` is not set. It will also annotate the kafka record timestamp metadata with a new date at processing time.

In this example, the kafka_sender will send data to the kafka-test-sender topic using the_key metadata value, which happens when the `id_field` is not set. It will also annotate the kafka record timestamp metadata with a new date at processing time.

Example job

```json
{
"name": "test-job",
@@ -133,19 +138,20 @@ results === data;
| connection | Name of the kafka connection to use when sending data | String | optional, defaults to the 'default' connection in the kafka terafoundation connector config |
| required_acks | The number of required broker acknowledgements for a given request, set to -1 for all. | Number | optional, defaults to `1` |
| metadata_refresh | How often the producer will poll the broker for metadata information. Set to -1 to disable polling. | String/Duration/Number | optional, defaults to `"5 minutes"` |
| partition_assignment_strategy | Name of partition assignment strategy to use when elected group leader assigns partitions to group members. May be set to `range`, `roundrobin` or `""` | String | optional, defaults to `""` |
| api_name | Name of `kafka_sender_api` used for the sender, if none is provided, then one is made and assigned the name to `kafka_sender_api`, and is injected into the execution | String | optional, defaults to `kafka_sender_api`|| _encoding | Used for specifying the data encoding type when using DataEntity.fromBuffer. May be set to `json` or `raw` | String | optional, defaults to `json` |
| api_name | Name of `kafka_sender_api` used for the sender, if none is provided, then one is made and assigned the name to `kafka_sender_api`, and is injected into the execution | String | optional, defaults to `kafka_sender_api`|
| _encoding | Used for specifying the data encoding type when using DataEntity.fromBuffer. May be set to `json` or `raw` | String | optional, defaults to `json` |
| _dead_letter_action | action will specify what to do when failing to parse or transform a record. It may be set to `throw`, `log` or `none`. If none of the actions are specified it will try and use a registered Dead Letter Queue API under that name.The API must be already be created by a operation before it can used. | String | optional, defaults to `throw` |

### API usage

#### API usage
In kafka_assets v3, many core components were made into teraslice apis. When you use an kafka processor it will automatically setup the api for you, but if you manually specify the api, then there are restrictions on what configurations you can put on the operation so that clashing of configurations are minimized. The api configs take precedence.

If submitting the job in long form, here is a list of parameters that will throw an error if also specified on the opConfig, since these values should be placed on the api:
- `topic`

- `topic`

`SHORT FORM (no api specified)`

```json
{
"name": "test-job",
2 changes: 1 addition & 1 deletion package.json
@@ -1,7 +1,7 @@
{
"name": "kafka-asset-bundle",
"displayName": "Kafka Asset Bundle",
"version": "5.3.1",
"version": "5.4.0",
"private": true,
"description": "A bundle of Kafka operations and processors for Teraslice",
"repository": "git@github.com:terascope/kafka-assets.git",