title | summary |
---|---|
Pipelined DML |
Introduce the use cases, methods, limitations, and FAQs of Pipelined DML. Pipelined DML enhances TiDB's batch processing capabilities, allowing transaction sizes to bypass TiDB's memory limits. |
Warning:
Pipelined DML is an experimental feature. It is not recommended that you use it in the production environment. This feature might be changed or removed without prior notice. If you find a bug, you can report an issue on GitHub.
This document introduces the use cases, methods, limitations, and common issues related to Pipelined DML.
Pipelined DML is an experimental feature introduced in TiDB v8.0.0 to improve the performance of large-scale data write operations. When this feature is enabled, TiDB streams data directly to the storage layer during DML operations, instead of buffering it entirely in memory. This pipeline-like approach simultaneously reads data (input) and writes it to the storage layer (output), effectively resolving common challenges in large-scale DML operations as follows:
- Memory limits: traditional DML operations might encounter out-of-memory (OOM) errors when handling large datasets.
- Performance bottlenecks: large transactions are often inefficient and is prone to causing workload fluctuations.
With pipelined DML enabled, you can achieve the following:
- Perform large-scale data operations without being constrained by TiDB memory limits.
- Maintain smoother workloads and lower operation latency.
- Keep transaction memory usage predictable, typically within 1 GiB.
It is recommended to enable Pipelined DML in the following scenarios:
- Processing data writes involving millions of rows or more.
- Encountering memory insufficient errors during DML operations.
- Experiencing noticeable workload fluctuations during large-scale data operations.
Note that although Pipelined DML significantly reduces memory usage during transaction processing, you still need to configure a reasonable memory threshold (at least 2 GiB recommended) to ensure other components (such as executors) function properly during large-scale data operations.
Currently, Pipelined DML has the following limitations:
- Pipelined DML is currently incompatible with TiCDC, TiFlash, or BR. Avoid using Pipelined DML on tables associated with these components, as it might lead to issues such as blocking or OOM in these components.
- Pipelined DML is not suitable for scenarios with write conflicts, because it might lead to significant performance degradation or operation failures that require rollback.
- Make sure that the metadata lock is enabled during Pipelined DML operations.
- When executing DML statements with Pipelined DML enabled, TiDB checks the following conditions. If any condition is not met, TiDB falls back to standard DML execution and generates a warning:
- Only autocommit statements are supported.
- Only
INSERT
,UPDATE
,REPLACE
, andDELETE
statements are supported. - Target tables must not include temporary tables or cached tables.
- When foreign key constraints are enabled (
foreign_key_checks = ON
), target tables must not include foreign key relationships.
- When executing
INSERT IGNORE ... ON DUPLICATE KEY UPDATE
statements, conflicting updates might result inDuplicate entry
errors.
This section describes how to enable Pipelined DML and verify whether it takes effect.
You can enable Pipelined DML in one of the following methods:
-
To enable Pipelined DML for the current session, set the
tidb_dml_type
variable to"bulk"
:SET tidb_dml_type = "bulk";
-
To enable Pipelined DML for a specific statement, add the
SET_VAR
hint in the statement.-
Data archiving example:
INSERT /*+ SET_VAR(tidb_dml_type='bulk') */ INTO target_table SELECT * FROM source_table;
-
Bulk data update example:
UPDATE /*+ SET_VAR(tidb_dml_type='bulk') */ products SET price = price * 1.1 WHERE category = 'electronics';
-
Bulk deletion example:
DELETE /*+ SET_VAR(tidb_dml_type='bulk') */ FROM logs WHERE log_time < '2023-01-01';
-
After executing a DML statement, you can verify whether Pipelined DML is used for the statement execution by checking the tidb_last_txn_info
variable:
SELECT @@tidb_last_txn_info;
If the pipelined
field in the output is true
, it indicates that Pipelined DML is successfully used.
- Increase the value of
tidb_mem_quota_query
slightly to ensure that memory usage for components such as executors does not exceed the limit. A value of at least 2 GiB is recommended. For environments with sufficient TiDB memory, you can increase this value further. - In scenarios where data is inserted into new tables, the performance of Pipelined DML might be affected by hotspots. To achieve optimal performance, it is recommended to address hotspots in advance. For more information, see Troubleshoot Hotspot Issues.
- The
tidb_dml_type
system variable controls whether Pipelined DML is enabled at the session level. - When
tidb_dml_type
is set to"bulk"
, thepessimistic-auto-commit
configuration item behaves as if it is set tofalse
. - Transactions executed using Pipelined DML are not subject to the size limit specified by the TiDB configuration item
txn-total-size-limit
. - For large transactions executed using Pipelined DML, transaction duration might increase. In such cases, the maximum TTL for the transaction lock is the larger value of
max-txn-ttl
or 24 hours. - If the execution time of a transaction exceeds the value set by
tidb_gc_max_wait_time
, garbage collection (GC) might force the transaction to roll back, causing it to fail.
- The
tidb_dml_type
system variable controls whether Pipelined DML is enabled at the session level. - When
tidb_dml_type
is set to"bulk"
, thepessimistic-auto-commit
configuration item behaves as if it is set tofalse
. - Transactions executed using Pipelined DML are not subject to the size limit specified by the TiDB configuration item
txn-total-size-limit
. - For large transactions executed using Pipelined DML, transaction duration might increase. In such cases, the maximum TTL for the transaction lock is the larger value of
max-txn-ttl
or 24 hours. - If the execution time of a transaction exceeds the value set by
tidb_gc_max_wait_time
, garbage collection (GC) might force the transaction to roll back, causing it to fail.
You can monitor the execution of Pipelined DML using the following methods:
- Check the
tidb_last_txn_info
system variable to get information about the last transaction executed in the current session, including whether Pipelined DML was used. - Look for lines containing
"[pipelined dml]"
in TiDB logs to understand the execution process and progress of Pipelined DML, including the current stage and the amount of data written. - View the
affected rows
field in theexpensive query
logs to track the progress of long-running statements. - Query the
INFORMATION_SCHEMA.PROCESSLIST
table to view transaction execution progress. Pipelined DML is typically used for large transactions, so you can use this table to monitor their execution progress.
When TiDB rejects to execute a statement using Pipelined DML, it generates a warning message accordingly. You can execute SHOW WARNINGS;
to check the warning and identify the cause.
Common reasons:
- The DML statement is not autocommitted.
- The statement involves unsupported table types, such as temporary tables or cached tables.
- The operation involves foreign keys, and foreign key checks are enabled.
No. Pipelined DML only changes the data-writing mechanism during transactions and does not affect isolation guarantees of TiDB transactions.
Even with Pipelined DML enabled, you might still encounter query termination caused by memory limit issues:
The query has been canceled due to exceeding the memory limit allowed for a single SQL query. Please try to narrow the query scope or increase the tidb_mem_quota_query limit, and then try again.
This error occurs because Pipelined DML only controls the memory usage by data during transaction execution. However, the total memory consumed during statement execution also includes memory used by other components, such as executors. If the total memory required exceeds the TiDB memory limit, out-of-memory (OOM) errors might still occur.
In most cases, you can increase the system variable tidb_mem_quota_query
to a higher value to resolve this issue. A value of at least 2 GiB is recommended. For SQL statements with complex operators or involving large datasets, you might need to increase this value further.