memory_usage_limit is deduced to be much less than expected #9745

Open
CalvinNeo opened this issue Dec 26, 2024 · 1 comment · May be fixed by pingcap/tidb-engine-ext#408
Labels
  • affects-5.4: This bug affects the 5.4.x(LTS) versions.
  • affects-6.1: This bug affects the 6.1.x(LTS) versions.
  • affects-6.5: This bug affects the 6.5.x(LTS) versions.
  • affects-7.1: This bug affects the 7.1.x(LTS) versions.
  • affects-7.5: This bug affects the 7.5.x(LTS) versions.
  • affects-8.1: This bug affects the 8.1.x(LTS) versions.
  • affects-8.5: This bug affects the 8.5.x(LTS) versions.
  • component/storage
  • severity/major
  • type/bug: The issue is confirmed as a bug.

Comments


CalvinNeo commented Dec 26, 2024

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

2. What did you expect to see? (Required)

3. What did you see instead (Required)

4. What is your TiFlash version? (Required)

memory_usage_limit is inferred from block_cache_cap:

            let mut limit =
                (block_cache_cap.0 as f64 / BLOCK_CACHE_RATE * MEMORY_USAGE_LIMIT_RATE) as u64;

https://github.com/pingcap/tidb-engine-ext/blob/521fd9dbc55e58646045d88f91c3c35db50b5981/src/config/mod.rs#L3597-L3598
which is set by

        if let Some(a) = self.rocksdb.defaultcf.block_cache_size
            && let Some(b) = self.rocksdb.writecf.block_cache_size
            && let Some(c) = self.rocksdb.lockcf.block_cache_size
        {
            let d = self
                .raftdb
                .defaultcf
                .block_cache_size
                .map(|s| s.0)
                .unwrap_or_default();
            let sum = a.0 + b.0 + c.0 + d;
            self.storage.block_cache.capacity = Some(ReadableSize(sum));
        }

https://github.com/pingcap/tidb-engine-ext/blob/521fd9dbc55e58646045d88f91c3c35db50b5981/src/config/mod.rs#L3795-L3805

and modified by

    config.raftdb.defaultcf.block_cache_size = proxy_config.raftdb.defaultcf.block_cache_size;
    config.rocksdb.defaultcf.block_cache_size = proxy_config.rocksdb.defaultcf.block_cache_size;
    config.rocksdb.writecf.block_cache_size = proxy_config.rocksdb.writecf.block_cache_size;
    config.rocksdb.lockcf.block_cache_size = proxy_config.rocksdb.lockcf.block_cache_size;

https://github.com/pingcap/tidb-engine-ext/blob/521fd9dbc55e58646045d88f91c3c35db50b5981/proxy_components/proxy_server/src/config.rs#L405-L408

See pingcap/tidb-engine-ext@2f2900a.


In addition, the TiFlash proxy limits the memory for each CF to a small number.

pub fn memory_limit_for_cf(is_raft_db: bool, cf: &str, total_mem: u64) -> ReadableSize {
    let (ratio, min, max) = match (is_raft_db, cf) {
        (true, CF_DEFAULT) => (0.05, 256 * MIB as usize, usize::MAX),
        (false, CF_DEFAULT) => (0.25, 0, 128 * MIB as usize),
        (false, CF_LOCK) => (0.02, 0, 128 * MIB as usize),
        (false, CF_WRITE) => (0.15, 0, 128 * MIB as usize),
        _ => unreachable!(),
    };
    let mut size = (total_mem as f64 * ratio) as usize;
    size = size.clamp(min, max);
    ReadableSize::mb(size as u64 / MIB)
}

https://github.com/pingcap/tidb-engine-ext/blob/521fd9dbc55e58646045d88f91c3c35db50b5981/proxy_components/proxy_server/src/config.rs#L123-L134


This logic affects all released versions, including LTS 6.1/6.5/7.1/7.5/8.1/8.5.

As a result, on a machine with sufficiently large memory, the block cache sizes become:

  • kv cf: 128 MiB * 3 = 0.375 GiB
  • raft cf: 0.05 * total available machine memory

So basically, ignoring the small kv cf term, the memory limit is approximately 0.05 * total available machine memory / 0.45 * 0.75.

  • If the machine memory is 376 GiB, then memory limit is 31.3GiB.
  • If the machine memory is 32 GiB, then memory limit is 2.66 GiB.
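
The arithmetic above can be checked with a small standalone sketch. It re-implements the ratios and clamps of memory_limit_for_cf in simplified form (plain u64 bytes instead of ReadableSize, string CF names instead of the CF_* constants) and assumes BLOCK_CACHE_RATE = 0.45 and MEMORY_USAGE_LIMIT_RATE = 0.75, as implied by the formula above. The helper names are hypothetical and not part of the proxy. The figures in the list correspond to the raft CF term alone; the three capped kv CFs add at most 0.375 GiB / 0.45 * 0.75 = 0.625 GiB on top.

```rust
const MIB: u64 = 1024 * 1024;
const GIB: u64 = 1024 * MIB;

// Assumed values, implied by the "/ 0.45 * 0.75" formula in the issue.
const BLOCK_CACHE_RATE: f64 = 0.45;
const MEMORY_USAGE_LIMIT_RATE: f64 = 0.75;

/// Simplified stand-in for memory_limit_for_cf: returns bytes,
/// rounded down to whole MiB like ReadableSize::mb(size / MIB).
fn cf_limit(is_raft_db: bool, cf: &str, total_mem: u64) -> u64 {
    let (ratio, min, max): (f64, u64, u64) = match (is_raft_db, cf) {
        (true, "default") => (0.05, 256 * MIB, u64::MAX),
        (false, "default") => (0.25, 0, 128 * MIB),
        (false, "lock") => (0.02, 0, 128 * MIB),
        (false, "write") => (0.15, 0, 128 * MIB),
        _ => panic!("unexpected column family"),
    };
    let size = (total_mem as f64 * ratio) as u64;
    size.clamp(min, max) / MIB * MIB
}

/// Applies the inference formula to a block cache capacity in bytes.
fn deduce_limit(block_cache_cap: u64) -> u64 {
    (block_cache_cap as f64 / BLOCK_CACHE_RATE * MEMORY_USAGE_LIMIT_RATE) as u64
}

/// Returns (limit from the raft CF alone, limit from raft CF plus the three kv CFs).
fn deduced_limits(total_mem: u64) -> (u64, u64) {
    let raft = cf_limit(true, "default", total_mem);
    let kv: u64 = ["default", "lock", "write"]
        .iter()
        .map(|cf| cf_limit(false, cf, total_mem))
        .sum();
    (deduce_limit(raft), deduce_limit(raft + kv))
}

fn main() {
    for gib in [376u64, 32] {
        let (raft_only, with_kv) = deduced_limits(gib * GIB);
        println!(
            "{} GiB machine: {:.2} GiB (raft CF only), {:.2} GiB (with kv CFs)",
            gib,
            raft_only as f64 / GIB as f64,
            with_kv as f64 / GIB as f64
        );
    }
}
```

Either way, the deduced limit stays below a tenth of the machine memory for the 376 GiB case, which is the gap this issue describes.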

Note that even if raft-engine is used, the raft_db memory size is still taken into account.

And because there is a memory high-water mechanism, the usable memory on the proxy is multiplied by a further factor of 0.1:

#[allow(clippy::derivable_impls)]
impl Default for ProxyConfig {
    fn default() -> Self {
        ProxyConfig {
            raft_store: RaftstoreConfig::default(),
            server: ServerConfig::default(),
            rocksdb: RocksDbConfig::default(),
            raftdb: RaftDbConfig::default(),
            storage: StorageConfig::default(),
            enable_io_snoop: false,
            memory_usage_high_water: 0.1,
            readpool: ReadPoolConfig::default(),
            import: ImportConfig::default(),
            engine_store: EngineStoreConfig::default(),
            memory: MemoryConfig::default(),
        }
    }
}

As a result:

  • If the machine memory is 376 GiB, then the memory limit is 31.3 GiB. TiFlash would complain if the memory usage is more than 3.1 GiB.
  • If the machine memory is 32 GiB, then the memory limit is 2.66 GiB. TiFlash would complain if the memory usage is more than 0.26 GiB.
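
As a rough illustration of the last step, the sketch below multiplies the deduced limits by the default memory_usage_high_water of 0.1. The function high_water_bytes is a hypothetical helper for this example only; how the proxy actually applies the ratio is not shown in this issue.

```rust
const GIB: f64 = 1024.0 * 1024.0 * 1024.0;

/// Hypothetical helper: the usage (in bytes) above which the high-water
/// check would trip, assuming it is simply limit * ratio.
fn high_water_bytes(memory_usage_limit: f64, memory_usage_high_water: f64) -> f64 {
    memory_usage_limit * memory_usage_high_water
}

fn main() {
    // Default taken from the ProxyConfig snippet quoted in the issue.
    let high_water = 0.1;
    // Deduced limits from the issue, in GiB.
    for limit_gib in [31.3, 2.66] {
        let threshold = high_water_bytes(limit_gib * GIB, high_water) / GIB;
        println!("limit {limit_gib} GiB -> high water at about {threshold:.2} GiB");
    }
}
```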
@CalvinNeo CalvinNeo added type/bug The issue is confirmed as a bug. component/storage severity/major labels Dec 26, 2024
@CalvinNeo CalvinNeo added affects-5.4 This bug affects the 5.4.x(LTS) versions. affects-6.1 This bug affects the 6.1.x(LTS) versions. affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-7.1 This bug affects the 7.1.x(LTS) versions. affects-7.5 This bug affects the 7.5.x(LTS) versions. affects-8.1 This bug affects the 8.1.x(LTS) versions. affects-8.5 This bug affects the 8.5.x(LTS) versions. component/storage severity/major type/bug The issue is confirmed as a bug. and removed type/bug The issue is confirmed as a bug. severity/major may-affects-5.4 component/storage may-affects-6.1 may-affects-6.5 may-affects-7.1 may-affects-7.5 may-affects-8.1 may-affects-8.5 labels Dec 26, 2024

JaySon-Huang commented Dec 26, 2024

Once "memory_usage_high_water" is hit and the memory usage of "raft_msg_usage + cached_entries + applying_entries" reaches reject_messages_on_memory_ratio, TiFlash may reject append messages.

In that case, logs like the following, containing "Raft raft: cannot step raft local message", may appear on the TiKV leader side:

[2024/12/02 05:15:23.768 +09:00] [ERROR] [peer.rs:618] ["handle raft message err"] [err_code=KV:Raft:StepLocalMsg] [err="Raft raft: cannot step raft local message"] [peer_id=5569340825] [region_id=5569340823]
[2024/12/02 05:15:25.771 +09:00] [ERROR] [peer.rs:618] ["handle raft message err"] [err_code=KV:Raft:StepLocalMsg] [err="Raft raft: cannot step raft local message"] [peer_id=5569340825] [region_id=5569340823]

@CalvinNeo CalvinNeo linked a pull request Dec 30, 2024 that will close this issue