Catnip panics with `failed to clone mbuf` every time under higher load with multiple connections. The cause is that the DPDK mempool runs out of free memory to clone mbufs from; this has been verified using `rte_mempool_avail_count()`. It was tested using a custom example static-page HTTP server and the Caladan client generator at 150K requests per second over 100 connections (this number is specific to my setup, but there is always a threshold above which the mempool runs out of memory). The issue did not occur on commit 803c363a765ae14f3eff2a9de021eabf23f8d10c, dated 30 September 2022.
The pattern is similar every time:

1. The server works fine for a short while (free memory in the mempool stays roughly constant).
2. The free memory then starts decreasing. The decreases are always big jumps (large allocations), followed by a slow rise for a while (presumably due to `free()`s) before the next big drop, with a net-negative trend overall.
3. This continues until the mempool is exhausted and the server panics.
Note: this issue does not happen with fewer connections. A possible cause is the O(N) time complexity of `wait_any()`: as the number of connections (and hence the number of `QToken`s passed to `wait_any()`) grows, the interval between calls to `poll()` grows with it. Arrivals then back up at the NIC and are polled all at once, which requires a large burst allocation of DPDK memory. This still does not explain why the issue was absent at the commit above, so something else was probably modified that introduces this memory leak.