From c0080ee34f2571a16184cff053e82298f6444c34 Mon Sep 17 00:00:00 2001
From: Grace Cai
Date: Thu, 26 Dec 2024 18:19:04 +0800
Subject: [PATCH 1/9] Add temp.md

---
 temp.md | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 temp.md

diff --git a/temp.md b/temp.md
new file mode 100644
index 0000000000000..af27ff4986a7b
--- /dev/null
+++ b/temp.md
@@ -0,0 +1 @@
+This is a test file.
\ No newline at end of file

From c901e79d6fa9f4e81a862907bfc91744170f4487 Mon Sep 17 00:00:00 2001
From: Grace Cai
Date: Thu, 26 Dec 2024 18:19:09 +0800
Subject: [PATCH 2/9] Delete temp.md

---
 temp.md | 1 -
 1 file changed, 1 deletion(-)
 delete mode 100644 temp.md

diff --git a/temp.md b/temp.md
deleted file mode 100644
index af27ff4986a7b..0000000000000
--- a/temp.md
+++ /dev/null
@@ -1 +0,0 @@
-This is a test file.
\ No newline at end of file

From 74fcca56d654ce606fdfda2a6aa84ca2f6cb694e Mon Sep 17 00:00:00 2001
From: qiancai
Date: Fri, 27 Dec 2024 10:22:52 +0800
Subject: [PATCH 3/9] Create batch-processing.md

---
 batch-processing.md | 97 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 97 insertions(+)
 create mode 100644 batch-processing.md

diff --git a/batch-processing.md b/batch-processing.md
new file mode 100644
index 0000000000000..564bdbd41694d
--- /dev/null
+++ b/batch-processing.md
@@ -0,0 +1,97 @@
---
title: Batch Data Processing
summary: Introduces batch data processing features in TiDB, including Pipelined DML, non-transactional DML, the `IMPORT INTO` statement, and the deprecated batch-dml feature.
---

# Batch Data Processing

Batch data processing is a common and essential operation in real-world scenarios. It enables efficient handling of large datasets for tasks such as data migration, bulk imports, archiving, and large-scale updates.

To optimize the performance of batch operations, TiDB has introduced the following features across its releases:

- Data import
    - `IMPORT INTO` statement (introduced in TiDB v7.2.0 and GA in v7.5.0)
- Data inserts, updates, and deletions
    - Pipelined DML (experimental, introduced in TiDB v8.0.0)
    - Non-transactional DML (introduced in TiDB v6.1.0)
    - Batch-dml (deprecated)

This document outlines the key benefits, limitations, and use cases of these features to help you choose the most suitable solution for efficient batch data processing.

## Data import

The `IMPORT INTO` statement is designed for data import tasks. It allows you to quickly import data in formats such as CSV, SQL, or PARQUET into an empty TiDB table, without the need to deploy [TiDB Lightning](/tidb-lightning/tidb-lightning-overview.md) separately.

### Key benefits

- Extremely fast import speed
- Easier to use compared to TiDB Lightning

### Limitations

- No transactional [ACID](/glossary.md#acid) guarantees
- Subject to various usage restrictions

### Use cases

- Suitable for data import scenarios such as data migration or recovery. It is recommended to use `IMPORT INTO` instead of TiDB Lightning where applicable.

For more information, see [`IMPORT INTO`](/sql-statements/sql-statement-import-into.md).

## Data inserts, updates, and deletions

### Pipelined DML

Pipelined DML is an experimental feature introduced in TiDB v8.0.0. In v8.5.0, the feature was enhanced with significant performance improvements.
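For a quick sense of the workflow (a minimal sketch; the `orders` and `orders_archive` tables are hypothetical), you can switch the current session to Pipelined DML and then run a large batch statement directly:

```sql
-- Switch the current session to the bulk (Pipelined DML) execution mode.
SET tidb_dml_type = "bulk";

-- This large batch write now streams to the storage layer
-- instead of being cached entirely in TiDB memory.
INSERT INTO orders_archive SELECT * FROM orders WHERE created_at < '2023-01-01';
```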
#### Key benefits

- Streams data to the storage layer during transaction execution instead of caching it entirely in memory, so that transaction size is no longer limited by TiDB memory, supporting ultra-large-scale data processing
- Achieves faster performance compared to standard DML
- Can be enabled through system variables without SQL modifications

#### Limitations

- Only supports [autocommit](/transaction-overview.md#autocommit) `INSERT`, `REPLACE`, `UPDATE`, and `DELETE` statements.

#### Use cases

- Suitable for general batch data processing tasks, such as bulk data inserts, updates, and deletions.

For more information, see [Pipelined DML](/pipelined-dml.md).

### Non-transactional DML statements

Non-transactional DML was introduced in TiDB v6.1.0. Initially, only the `DELETE` statement supported this feature. Starting from v6.5.0, the `INSERT`, `REPLACE`, and `UPDATE` statements also support it.

#### Key benefits

- Splits a single SQL statement into multiple smaller statements, bypassing memory limitations.
- Achieves performance that is slightly faster than or comparable to standard DML.

#### Limitations

- Only supports [autocommit](/transaction-overview.md#autocommit) statements
- Requires modifications to SQL statements
- Imposes strict requirements on SQL syntax; some statements might need rewriting
- Lacks full transactional ACID guarantees; in case of failures, partial execution of a statement might occur

#### Use cases

- Suitable for scenarios involving bulk data inserts, updates, and deletions. Due to its limitations, it is recommended to consider non-transactional DML only when Pipelined DML is not applicable.

For more details, refer to the [Non-transactional DML](/non-transactional-dml.md) documentation.

### Deprecated batch-dml feature

The batch-dml feature, available in TiDB versions prior to v4.0, is now deprecated and no longer recommended. This feature is controlled by the following system variables:

- `tidb_batch_insert`
- `tidb_batch_delete`
- `tidb_batch_commit`
- `tidb_enable_batch_dml`
- `tidb_dml_batch_size`

Due to the risk of data corruption or loss caused by inconsistent data and indexes, these variables have been deprecated and are planned for removal in future releases.

It is **NOT RECOMMENDED** to use the deprecated batch-dml feature under any circumstances. Instead, consider the alternative features outlined in this document.
\ No newline at end of file

From 57cbb3cf1c786bec26fabaefebba36f3277519aa Mon Sep 17 00:00:00 2001
From: qiancai
Date: Fri, 27 Dec 2024 16:42:38 +0800
Subject: [PATCH 4/9] add pipelined dml

---
 TOC.md           |   2 +
 pipelined-dml.md | 149 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 151 insertions(+)
 create mode 100644 pipelined-dml.md

diff --git a/TOC.md b/TOC.md
index 50c3a78800875..b4557cb81615a 100644
--- a/TOC.md
+++ b/TOC.md
@@ -397,6 +397,7 @@
     - [Use Load Base Split](/configure-load-base-split.md)
     - [Use Store Limit](/configure-store-limit.md)
     - [DDL Execution Principles and Best Practices](/ddl-introduction.md)
+    - [Batch Data Processing](/batch-processing.md)
 - Use PD Microservices
     - [PD Microservices Overview](/pd-microservices.md)
     - [Scale PD Microservice Nodes Using TiUP](/scale-microservices-using-tiup.md)
@@ -937,6 +938,7 @@
     - [Optimistic Transactions](/optimistic-transaction.md)
     - [Pessimistic Transactions](/pessimistic-transaction.md)
     - [Non-Transactional DML Statements](/non-transactional-dml.md)
+    - [Pipelined DML](/pipelined-dml.md)
     - [Views](/views.md)
     - [Partitioning](/partitioned-table.md)
     - [Temporary Tables](/temporary-tables.md)

diff --git a/pipelined-dml.md b/pipelined-dml.md
new file mode 100644
index 0000000000000..601c553a63625
--- /dev/null
+++ b/pipelined-dml.md
@@ -0,0 +1,149 @@
---
title: Pipelined DML
summary: Introduces the use cases, methods, limitations, and FAQs of Pipelined DML. Pipelined DML enhances TiDB's batch processing capabilities, allowing transaction sizes to bypass TiDB's memory limits.
---

# Pipelined DML

> **Warning:**
>
> Pipelined DML is an experimental feature. It is not recommended that you use it in the production environment. This feature might be changed or removed without prior notice. If you find a bug, you can report an [issue](https://github.com/pingcap/tidb/issues) on GitHub.

This document introduces the use cases, methods, limitations, and common issues related to Pipelined DML.

## Overview

Pipelined DML is an experimental feature introduced in TiDB v8.0.0 to improve the performance of large-scale data write operations. When this feature is enabled, TiDB streams data directly to the storage layer during DML operations, instead of caching it entirely in memory. This pipeline-like approach simultaneously reads data (input) and writes it to the storage layer (output), effectively resolving common challenges in large-scale DML operations as follows:

- Memory limits: traditional DML operations might encounter out-of-memory (OOM) errors when handling large datasets.
- Performance bottlenecks: large transactions are often inefficient and prone to causing workload fluctuations.
- Operational limits: TiDB memory limits make it difficult to execute ultra-large data processing tasks.

With Pipelined DML enabled, you can achieve the following:

- Perform large-scale data operations without being constrained by TiDB memory limits.
- Maintain a smoother workload and lower operation latency.
- Keep transaction memory usage predictable, typically within 1 GiB.
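For example (a hedged sketch; the `logs` table and the retention cutoff are hypothetical), a large cleanup that might previously have hit TiDB's memory limit can run as a single statement, and you can confirm afterwards whether Pipelined DML was actually used:

```sql
-- Enable Pipelined DML for the session, then run a large deletion.
SET tidb_dml_type = "bulk";
DELETE FROM logs WHERE log_time < '2023-01-01';

-- The `pipelined` field in the output is true if Pipelined DML was used.
SELECT @@tidb_last_txn_info;
```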
It is recommended to enable Pipelined DML in the following scenarios:

- Processing data writes involving millions of rows or more
- Encountering insufficient memory errors during DML operations
- Experiencing noticeable workload fluctuations during large-scale data operations

Note that although Pipelined DML significantly reduces memory usage during transaction processing, you still need to configure a [reasonable memory threshold](/system-variables.md#tidb_mem_quota_query) (at least 2 GiB recommended) to ensure other modules (such as executors) function properly during large-scale data operations.

## Limitations

Currently, Pipelined DML has the following limitations:

- Pipelined DML is currently incompatible with TiCDC, TiFlash, and BR. Avoid using Pipelined DML on tables associated with these components, as it might lead to issues such as blocking or OOM in these components.
- Pipelined DML is not suitable for scenarios with write conflicts, because it might lead to significant performance degradation or operation failures that require rollback.
- Make sure that the [metadata lock](/metadata-lock.md) is enabled during Pipelined DML operations.
- When executing DML statements with Pipelined DML enabled, TiDB checks the following conditions. If any condition is not met, TiDB falls back to standard DML execution and generates a warning:
    - Only [autocommit](/transaction-overview.md#autocommit) statements are supported.
    - Only `INSERT`, `UPDATE`, `REPLACE`, and `DELETE` statements are supported.
    - Target tables must not include [temporary tables](/temporary-tables.md) or [cached tables](/cached-tables.md).
    - When [foreign key constraints](/foreign-key.md) are enabled (`foreign_key_checks = ON`), target tables must not include foreign key relationships.
- When executing `INSERT IGNORE ... ON DUPLICATE KEY UPDATE` statements, conflicting updates might result in `Duplicate entry` errors.

## Usage

This section describes how to enable Pipelined DML and verify whether it takes effect.

### Enable Pipelined DML

You can enable Pipelined DML using one of the following methods:

- To enable Pipelined DML for the current session, set the [`tidb_dml_type`](/system-variables.md#tidb_dml_type-new-in-v800) variable to `"bulk"`:

    ```sql
    SET tidb_dml_type = "bulk";
    ```

- To enable Pipelined DML for a specific statement, add the [`SET_VAR`](/optimizer-hints.md#set_varvar_namevar_value) hint to the statement.

    - Data archiving example:

        ```sql
        INSERT /*+ SET_VAR(tidb_dml_type='bulk') */ INTO target_table SELECT * FROM source_table;
        ```

    - Bulk data update example:

        ```sql
        UPDATE /*+ SET_VAR(tidb_dml_type='bulk') */ products
        SET price = price * 1.1
        WHERE category = 'electronics';
        ```

    - Bulk deletion example:

        ```sql
        DELETE /*+ SET_VAR(tidb_dml_type='bulk') */ FROM logs WHERE log_time < '2023-01-01';
        ```

### Verify Pipelined DML

After executing a DML statement, you can verify whether Pipelined DML is used for the statement execution by checking the [`tidb_last_txn_info`](/system-variables.md#tidb_last_txn_info-new-in-v409) variable:

```sql
SELECT @@tidb_last_txn_info;
```

If the `pipelined` field in the output is `true`, it indicates that Pipelined DML is successfully used.

## Best practices

- Increase the value of [`tidb_mem_quota_query`](/system-variables.md#tidb_mem_quota_query) slightly to ensure that memory usage for components such as executors does not exceed the limit. A value of at least 2 GiB is recommended. For environments with sufficient TiDB memory, you can increase this value further.
- In scenarios where data is inserted into new tables, the performance of Pipelined DML might be affected by hotspots. To achieve optimal performance, it is recommended to address hotspots in advance. For more information, see [Troubleshoot Hotspot Issues](/troubleshoot-hot-spot-issues.md).

## Related configurations

- The [`tidb_dml_type`](/system-variables.md#tidb_dml_type-new-in-v800) system variable controls whether Pipelined DML is enabled at the session level.
- When [`tidb_dml_type`](/system-variables.md#tidb_dml_type-new-in-v800) is set to `"bulk"`, the [`pessimistic-auto-commit`](/tidb-configuration-file.md#pessimistic-auto-commit) configuration item behaves as if it is set to `false`.
- Transactions executed using Pipelined DML are not subject to the size limit specified by the TiDB configuration item [`txn-total-size-limit`](/tidb-configuration-file.md#txn-total-size-limit).
- For large transactions executed using Pipelined DML, the transaction duration might increase. In such cases, the maximum TTL for the transaction lock is the larger value between [`max-txn-ttl`](/tidb-configuration-file.md#max-txn-ttl) and 24 hours.
- If the execution time of a transaction exceeds the value set by [`tidb_gc_max_wait_time`](/system-variables.md#tidb_gc_max_wait_time-new-in-v610), garbage collection (GC) might force the transaction to roll back, causing it to fail.

## Monitor Pipelined DML

You can monitor the execution of Pipelined DML using the following methods:

- Check the [`tidb_last_txn_info`](/system-variables.md#tidb_last_txn_info-new-in-v409) system variable to get information about the last transaction executed in the current session, including whether Pipelined DML was used.
- Look for lines containing `"[pipelined dml]"` in TiDB logs to understand the execution process and progress of Pipelined DML, including the current stage and the amount of data written.
- View the `affected rows` field in the [`expensive query`](/identify-expensive-queries.md#expensive-query-log-example) logs to track the progress of long-running statements.
- Query the [`INFORMATION_SCHEMA.PROCESSLIST`](/information-schema/information-schema-processlist.md) table to view transaction execution progress. Pipelined DML is typically used for large transactions, so you can use this table monitor their execution progress.

## FAQs

### Why wasn’t my query executed using Pipelined DML?

When TiDB does not execute a statement using Pipelined DML, it generates a warning message accordingly. You can execute `SHOW WARNINGS;` to check the warning and identify the cause.

Common reasons:

- The DML statement is not executed in autocommit mode.
- The statement involves unsupported table types, such as [temporary tables](/temporary-tables.md) or [cached tables](/cached-tables.md).
- The operation involves foreign keys, and foreign key checks are enabled.

### Does Pipelined DML affect the isolation level of transactions?

No. Pipelined DML only changes the data-writing mechanism during transactions and does not affect the isolation guarantees of TiDB transactions.

### Why do I still encounter out-of-memory (OOM) errors after enabling Pipelined DML?

Even with Pipelined DML enabled, you might still encounter query termination caused by memory limit issues:

```
The query has been canceled due to exceeding the memory limit allowed for a single SQL query. Please try to narrow the query scope or increase the tidb_mem_quota_query limit, and then try again.
```
This error occurs because Pipelined DML only controls the memory used by transaction data during execution. However, the total memory consumed during statement execution also includes memory used by other components, such as executors. If the total memory required exceeds the TiDB memory limit, out-of-memory (OOM) errors might still occur.

In most cases, you can increase the system variable [`tidb_mem_quota_query`](/system-variables.md#tidb_mem_quota_query) to a higher value to resolve this issue. A value of at least 2 GiB is recommended. For SQL statements with complex operators or involving large datasets, you might need to increase this value further.

## Learn more

- [Batch Data Processing](/batch-processing.md)
- [TiDB Memory Control](/configure-memory-usage.md)
\ No newline at end of file

From 874e45ee8c1347c706f6439bd74db4686ccf2a37 Mon Sep 17 00:00:00 2001
From: qiancai
Date: Fri, 27 Dec 2024 16:47:10 +0800
Subject: [PATCH 5/9] Update system-variables.md

---
 system-variables.md | 13 +------------
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/system-variables.md b/system-variables.md
index a7336e557a2b0..112365d58af7a 100644
--- a/system-variables.md
+++ b/system-variables.md
@@ -1830,18 +1830,7 @@ mysql> SELECT job_info FROM mysql.analyze_jobs ORDER BY end_time DESC LIMIT 1;
 - Value options: `"standard"`, `"bulk"`
 - This variable controls the execution mode of DML statements.
     - `"standard"` indicates the standard DML execution mode, where TiDB transactions are cached in memory before being committed. This mode is suitable for high-concurrency transaction scenarios with potential conflicts and is the default recommended execution mode.
-    - `"bulk"` indicates the bulk DML execution mode, which is suitable for scenarios where a large amount of data is written, causing excessive memory usage in TiDB.
-        - During the execution of TiDB transactions, the data is not fully cached in the TiDB memory, but is continuously written to TiKV to reduce memory usage and smooth the write pressure.
-        - Only `INSERT`, `UPDATE`, `REPLACE`, and `DELETE` statements are affected by the `"bulk"` mode. Due to the pipelined execution in `"bulk"` mode, the usage of `INSERT IGNORE ... ON DUPLICATE UPDATE ...` might result in a `Duplicate entry` error when updates cause conflicts. In contrast, in `"standard"` mode, because the `IGNORE` keyword is set, this error would be ignored and not be returned to the user.
-        - `"bulk"` mode is only suitable for scenarios where a large amount of **data is written without conflicts**. This mode is not efficient for handling write conflicts, as write-write conflicts might cause large transactions to fail and be rolled back.
-        - `"bulk"` mode only takes effect on statements with auto-commit enabled, and requires the [`pessimistic-auto-commit`](https://docs.pingcap.com/tidb/stable/tidb-configuration-file#pessimistic-auto-commit-new-in-v600) configuration item to be set to `false`.
-        - When using the `"bulk"` mode to execute statements, ensure that the [metadata lock](/metadata-lock.md) remains enabled during the execution process.
-        - `"bulk"` mode cannot be used on [temporary tables](/temporary-tables.md) and [cached tables](/cached-tables.md).
-        - `"bulk"` mode cannot be used on tables containing foreign keys and tables referenced by foreign keys when the foreign key constraint check is enabled (`foreign_key_checks = ON`).
-        - In situations that the environment does not support or is incompatible with the `"bulk"` mode, TiDB falls back to the `"standard"` mode and returns a warning message. To verify if the `"bulk"` mode is used, you can check the `pipelined` field using [`tidb_last_txn_info`](#tidb_last_txn_info-new-in-v409). A `true` value indicates that the `"bulk"` mode is used.
-        - When executing large transactions in the `"bulk"` mode, the transaction duration might be long. For transactions in this mode, the maximum TTL of the transaction lock is the greater value between [`max-txn-ttl`](https://docs.pingcap.com/tidb/stable/tidb-configuration-file#max-txn-ttl) and 24 hours. Additionally, if the transaction execution time exceeds the value set by [`tidb_gc_max_wait_time`](#tidb_gc_max_wait_time-new-in-v610), the GC might force a rollback of the transaction, leading to its failure.
-        - When TiDB executes transactions in the `"bulk"` mode, transaction size is not limited by the TiDB configuration item [`txn-total-size-limit`](https://docs.pingcap.com/tidb/stable/tidb-configuration-file#txn-total-size-limit).
-        - This mode is implemented by the Pipelined DML feature. For detailed design and GitHub issues, see [Pipelined DML](https://github.com/pingcap/tidb/blob/master/docs/design/2024-01-09-pipelined-DML.md) and [#50215](https://github.com/pingcap/tidb/issues/50215).
+    - `"bulk"` indicates the pipelined DML execution mode, which is suitable for scenarios where a large amount of data is written, causing excessive memory usage in TiDB. For more information, see [Pipelined DML](/pipelined-dml.md).
 
 ### tidb_enable_1pc New in v5.0

From 4e031cb44a8e03c5777be014741ff87cfd6a9d93 Mon Sep 17 00:00:00 2001
From: Grace Cai
Date: Fri, 27 Dec 2024 16:54:55 +0800
Subject: [PATCH 6/9] Update pipelined-dml.md

---
 pipelined-dml.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pipelined-dml.md b/pipelined-dml.md
index 601c553a63625..cdafe1e53c1f8 100644
--- a/pipelined-dml.md
+++ b/pipelined-dml.md
@@ -31,7 +31,7 @@ It is recommended to enable Pipelined DML in the following scenarios:
 - Encountering insufficient memory errors during DML operations
 - Experiencing noticeable workload fluctuations during large-scale data operations
 
-Note that although Pipelined DML significantly reduces memory usage during transaction processing, you still need to configure a [reasonable memory threshold](/system-variables.md#tidb_mem_quota_query) (at least 2 GiB recommended) to ensure other modules (such as executors) function properly during large-scale data operations.
+Note that although Pipelined DML significantly reduces memory usage during transaction processing, you still need to configure a [reasonable memory threshold](/system-variables.md#tidb_mem_quota_query) (at least 2 GiB recommended) to ensure other components (such as executors) function properly during large-scale data operations.

From e779ee797e33aee69056750c1b0629c03b54d6a3 Mon Sep 17 00:00:00 2001
From: Grace Cai
Date: Fri, 27 Dec 2024 16:57:14 +0800
Subject: [PATCH 7/9] Update pipelined-dml.md

---
 pipelined-dml.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pipelined-dml.md b/pipelined-dml.md
index cdafe1e53c1f8..6b5a896eb97a8 100644
--- a/pipelined-dml.md
+++ b/pipelined-dml.md
@@ -117,7 +117,7 @@
 ## FAQs
 
-### Why wasn’t my query executed using Pipelined DML?
+### Why wasn't my query executed using Pipelined DML?
 
 When TiDB does not execute a statement using Pipelined DML, it generates a warning message accordingly. You can execute `SHOW WARNINGS;` to check the warning and identify the cause.

From a5ab9e10ca537a2586dfc90d4ef68c6ec8b8cf2d Mon Sep 17 00:00:00 2001
From: qiancai
Date: Fri, 27 Dec 2024 17:25:33 +0800
Subject: [PATCH 8/9] Update batch-processing.md

---
 batch-processing.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/batch-processing.md b/batch-processing.md
index 564bdbd41694d..c280e27bcd4d3 100644
--- a/batch-processing.md
+++ b/batch-processing.md
@@ -80,7 +80,7 @@ Non-transactional DML was introduced in TiDB v6.1.0. Initially, only the `DELETE` statement supported this feature.
 
 - Suitable for scenarios involving bulk data inserts, updates, and deletions. Due to its limitations, it is recommended to consider non-transactional DML only when Pipelined DML is not applicable.
 
-For more details, refer to the [Non-transactional DML](/non-transactional-dml.md) documentation.
+For more information, see [Non-transactional DML](/non-transactional-dml.md).

From 3c394f1f2d826827b85b2b0939540922cc40fe7c Mon Sep 17 00:00:00 2001
From: Grace Cai
Date: Tue, 31 Dec 2024 09:15:21 +0800
Subject: [PATCH 9/9] Apply suggestions from code review

Co-authored-by: ekexium
---
 batch-processing.md | 2 +-
 pipelined-dml.md    | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/batch-processing.md b/batch-processing.md
index c280e27bcd4d3..7e34e12243221 100644
--- a/batch-processing.md
+++ b/batch-processing.md
@@ -46,7 +46,7 @@ Pipelined DML is an experimental feature introduced in TiDB v8.0.0. In v8.5.0, the feature was enhanced with significant performance improvements.
 
 #### Key benefits
 
-- Streams data to the storage layer during transaction execution instead of caching it entirely in memory, so that transaction size is no longer limited by TiDB memory, supporting ultra-large-scale data processing
+- Streams data to the storage layer during transaction execution instead of buffering it entirely in memory, so that transaction size is no longer limited by TiDB memory, supporting ultra-large-scale data processing
 - Achieves faster performance compared to standard DML
 - Can be enabled through system variables without SQL modifications

diff --git a/pipelined-dml.md b/pipelined-dml.md
index 6b5a896eb97a8..c36cfeb707dd0 100644
--- a/pipelined-dml.md
+++ b/pipelined-dml.md
@@ -13,7 +13,7 @@ This document introduces the use cases, methods, limitations, and common issues related to Pipelined DML.
 
 ## Overview
 
-Pipelined DML is an experimental feature introduced in TiDB v8.0.0 to improve the performance of large-scale data write operations. When this feature is enabled, TiDB streams data directly to the storage layer during DML operations, instead of caching it entirely in memory. This pipeline-like approach simultaneously reads data (input) and writes it to the storage layer (output), effectively resolving common challenges in large-scale DML operations as follows:
+Pipelined DML is an experimental feature introduced in TiDB v8.0.0 to improve the performance of large-scale data write operations. When this feature is enabled, TiDB streams data directly to the storage layer during DML operations, instead of buffering it entirely in memory. This pipeline-like approach simultaneously reads data (input) and writes it to the storage layer (output), effectively resolving common challenges in large-scale DML operations as follows:
 
 - Memory limits: traditional DML operations might encounter out-of-memory (OOM) errors when handling large datasets.
 - Performance bottlenecks: large transactions are often inefficient and prone to causing workload fluctuations.
@@ -113,7 +113,7 @@ You can monitor the execution of Pipelined DML using the following methods:
 - Check the [`tidb_last_txn_info`](/system-variables.md#tidb_last_txn_info-new-in-v409) system variable to get information about the last transaction executed in the current session, including whether Pipelined DML was used.
 - Look for lines containing `"[pipelined dml]"` in TiDB logs to understand the execution process and progress of Pipelined DML, including the current stage and the amount of data written.
 - View the `affected rows` field in the [`expensive query`](/identify-expensive-queries.md#expensive-query-log-example) logs to track the progress of long-running statements.
-- Query the [`INFORMATION_SCHEMA.PROCESSLIST`](/information-schema/information-schema-processlist.md) table to view transaction execution progress. Pipelined DML is typically used for large transactions, so you can use this table monitor their execution progress.
+- Query the [`INFORMATION_SCHEMA.PROCESSLIST`](/information-schema/information-schema-processlist.md) table to view transaction execution progress. Pipelined DML is typically used for large transactions, so you can use this table to monitor their execution progress.
 
 ## FAQs