
Large (30-40MB) data chunks loaded through imports are copied over each VU #3515

Closed
Dasha27 opened this issue Dec 20, 2023 · 9 comments

@Dasha27

Dasha27 commented Dec 20, 2023

Brief summary

k6 consumes a lot of memory when using imports. Any non-empty import increases the test's memory usage, which causes problems when running stability tests.

k6 version

k6 v0.46.0, go1.21.0

OS

Debian 11

Docker version and image (if applicable)

No response

Steps to reproduce the problem

We use several scripts with imports like this:

import http from 'k6/http';
import exec from 'k6/execution';
import {check, fail, sleep, group} from 'k6';
import {sha512} from 'k6/crypto';
import {getSession} from '../start_session.js';
import {headers, env, testType, testConfig, trendStats, testTags} from '../constants.js';
import {name, config} from '../configuration.js';
import {generatePayload1, generatePayload2, generatePayload3, generatePayload4} from '../payload.js';
import {function1, function2, function3} from '../functions.js';

The test with such imports consumes all the memory of the load generator (16 GB) within 1 hour of a fixed-load test with 5000 VUs.

The attached file shows the memory consumption for 500 VUs:
k6_mem_deduplicated.log

Expected behaviour

The test should run without memory leaks, even under high load and with multiple imports.

Actual behaviour

The test consumes all the memory within a fairly short period. This happens even without using any xk6 extensions or writing any logs/artifacts. Using a single file without any imports (other than k6 modules) works fine, with no memory leaks.

@joanlopez
Contributor

Hi @Dasha27,

Thanks for the details. I'll try to reproduce it, but in the meantime, would you be able to share a minimal reproducible example?
I see a lot of custom files (e.g. start_session.js, constants.js, configuration.js, payload.js, etc.), and I suspect whatever is in them could make the difference.

Thanks!

@Dasha27
Author

Dasha27 commented Dec 20, 2023

Hi @joanlopez,

Sure, here are some examples of the custom files.
start_session.txt
constants.txt
functions.txt
payload.txt
configuration.txt

@joanlopez
Contributor

joanlopez commented Dec 20, 2023

Sorry @Dasha27, but I still cannot see what the main (default) test function actually does in your case. I appreciate that you shared the imported helper files (which you pointed to as the apparent cause of the huge memory consumption), but I'd need at least a rough idea of what the test itself looks like (from the initial message I can only see the list of imports) to understand how those 5000 VUs will behave and what could be causing the memory consumption (and, ideally, to reproduce and profile it).

So, please could you shed some light? Thanks!

@Dasha27
Author

Dasha27 commented Dec 20, 2023

Sure, sorry for the misunderstanding; here is the main file:
main.txt

@joanlopez
Contributor

joanlopez commented Dec 27, 2023

Hi @Dasha27,

I've spent some time trying to reproduce the same behavior (different memory consumption with and without imports), with no luck. So, at this point, I have two suggestions. Either:

  • Try to extract some memory profiles from your high-memory-consumption runs (see here for how to enable profiling endpoints in k6), so we can identify which pieces are consuming the most memory and reason about why.
  • Try to strip your test down piece by piece, so you will either:
    • Identify which specific piece is causing the high memory consumption, or
    • End up with a much simpler test that you can share with us, making it easier for us to reproduce your case (ideally just a few lines, even if spread across different files).

Honestly, I've spent some time trying to reproduce it with the bits you shared so far, but I had no luck, and I struggled a bit because they contain many details specific to your environment/scenario. That said, after a quick look I haven't yet spotted any red flag that could explain memory consumption as high as you mention. So, still curious.


Also, note that high memory consumption might be expected for certain large and long tests. You can find some reference numbers in these benchmarks; they are a bit outdated but shouldn't differ much from the most recent releases. Additionally, if you're curious about related conversations, you can take a look at the recent discussion in #3498 and at what's described in #2367 (which is still TBD, by the way).

Thanks!

@joanlopez joanlopez added awaiting user waiting for user to respond and removed triage labels Jan 10, 2024
@joanlopez joanlopez removed their assignment Mar 18, 2024
@metaturso

metaturso commented Jul 18, 2024

I'm in a similar predicament. In my case, the test loads a large (30-40MB) CSV file and immediately runs out of memory.

I initially thought parsing large amounts of CSV data was the problem. It wasn't. Then I suspected some funky business was going on with SharedArray (#3237), but it wasn't that either.

I kept removing code from the script until I got a minimal scenario that eats up 64GB of memory in a couple of minutes:

import { data } from "large-csv-file-now-converted-to-javascript.js";

export function setup() {
    return {};
}

export const options = {
    /* cloud */
    scenarios: {
        leak: {
            executor: "ramping-arrival-rate",
            exec: "leak",
            timeUnit: "1m",
            startRate: 288,
            preAllocatedVUs: 400,
            maxVUs: 400,
            stages: [
                {target: 3378, duration: "10m"},
                {target: 3378, duration: "175m"},
                {target: 0, duration: "5m"},
            ],
        },
    }
};

export function leak() {
    // empty body.
}

This is the data file. The objects have 3 fields.

// This is 30MB worth of captured request data.

// The original file was a CSV parsed with papaparse.
// Then it was a JSON file, but it always exported { default: {} } without data...
// Then we got to this file.

export const data = [
    {
        path: "/",
        query: "",
        method: "GET",
    },
    { /* ... */ },
    { /* ... */ },
      /* ... */
    { /* ... */ },
    { /* ... */ },
];

@joanlopez
Contributor

joanlopez commented Jul 22, 2024

Hi @metaturso,

From the k6 docs, you can read:

In general, all external modules added to a test project have a negative impact on performance, as they further increase the memory footprint and CPU usage.

Usually, this is not a big problem as each application only allocates these resources once. In k6, however, every VU has a separate JavaScript virtual machine (VM), duplicating the resource usage once each.

So, looking at the example you provided, I think that huge memory usage is simply expected, as those ~35MB get copied into each VU (because each VU is an isolated JS runtime, and the data variable needs to be set in each of them).

In fact, I just profiled the memory usage of that example, and the consumption is around ~14GB, which matches the rough math: 400 VUs x 35MB. The memory usage from the OS standpoint (the process) is much higher (around ~50GB), but that's probably because Go's garbage collector isn't well suited to this usage pattern.

That said, to avoid such large memory consumption, I'd recommend that you:

  • Use SharedArray, which is designed precisely to handle such scenarios.
  • Make data a function that returns the data, or use a data file (CSV, JSON, etc.) and open it with k6/experimental/fs.open (a sketch of this option follows the example below).

For instance, note the difference in your script from:

// script.js
import { data } from "large-csv-file-now-converted-to-javascript.js";

export function setup() {
    ...
}

export const options = {
   ...
};

export function leak() {
    // empty body.
}

// large-csv-file-now-converted-to-javascript.js
export const data = [
    {
        path: "/",
        query: "",
        method: "GET",
    },
    { /* ... */ },
    { /* ... */ },
      /* ... */
    { /* ... */ },
    { /* ... */ },
];

vs

// script.js
import { getData } from "large-csv-file-now-converted-to-javascript.js";
import { SharedArray } from 'k6/data';

const data = new SharedArray('data', function () {
  return getData();
});

export function setup() {
    ...
}

export const options = {
   ...
};

export function leak() {
    // empty body.
}

// large-csv-file-now-converted-to-javascript.js
export function getData() {
    return [
        {
            path: "/",
            query: "",
            method: "GET",
        },
        { /* ... */ },
        { /* ... */ },
          /* ... */
        { /* ... */ },
        { /* ... */ },
    ];
}

Please note that modifying your example to use SharedArray alone isn't enough: you also want to avoid allocating data in every VU, which would still happen if you import { data } from the file directly as raw data.
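
For the second suggestion (keeping the data in a file rather than in a JS module), here is a minimal sketch of one way to do it. It uses the classic init-context open() together with the papaparse jslib and a SharedArray, rather than k6/experimental/fs; the ./data.csv path and the column names are placeholders matching the example above:

// script.js
import papaparse from 'https://jslib.k6.io/papaparse/5.1.1/index.js';
import { SharedArray } from 'k6/data';

// The CSV is read and parsed only once; all VUs then read from the same
// shared, read-only copy instead of each holding its own ~35MB duplicate.
const data = new SharedArray('csv data', function () {
    // open() is only available in the init context.
    return papaparse.parse(open('./data.csv'), { header: true }).data;
});

export default function () {
    // Pick a row; its fields correspond to the CSV headers (path, query, method here).
    const row = data[Math.floor(Math.random() * data.length)];
    // e.g. use row.method, row.path and row.query to build the request.
}

The parsed rows are kept once per k6 process and exposed read-only to every VU, so the per-VU cost stays small.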

I hope that helps! @Dasha27, could you confirm whether that would also help in your case (which I guess is a more complex version of the scenario @metaturso shared)? If so, I'd suggest closing the issue, as I'd consider what's described here expected behavior and the suggestions above the solution.

Thanks! 🙇🏻

PS: Thanks @metaturso for providing such an easy to reproduce example! 🙌🏻

@metaturso

@joanlopez: Thank you so much for looking into this and for debugging the scenario.

I also expected to see memory allocations in the region of 15GB. However, my concern was that k6 never actually stopped allocating until both memory and the swap file were completely filled, rather than staying below 20GB.

I didn't realise this might have been an issue at a lower level. I'm happy to call this working as intended and blame it on Go's garbage collection 😅

Regarding the use of SharedArray, the reason I tried using an import to load the data is that large elements in a SharedArray also tend to leak memory significantly, as described in #3237.

Fortunately, in my case, I can split my data into chunks small enough that they don't clog the SharedArray 😃
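
For reference, a rough sketch of one possible reading of that chunking idea, purely as an illustration (the chunk count, module path, and getData helper below are assumptions, not something shared in this thread): each chunk goes into its own SharedArray, and a small helper maps a global index to the right chunk.

import { SharedArray } from 'k6/data';
// Placeholder for however the full dataset is produced in your setup.
import { getData } from './large-csv-file-now-converted-to-javascript.js';

const CHUNK_COUNT = 8; // illustrative; tune so each chunk stays comfortably small

// One SharedArray per chunk, all created in the init context.
const chunks = [];
for (let i = 0; i < CHUNK_COUNT; i++) {
    chunks.push(
        new SharedArray(`data-chunk-${i}`, function () {
            const all = getData();
            const size = Math.ceil(all.length / CHUNK_COUNT);
            return all.slice(i * size, (i + 1) * size);
        })
    );
}

// Map an index over the whole dataset to the right chunk and offset.
export function rowAt(index) {
    const size = chunks[0].length;
    return chunks[Math.floor(index / size)][index % size];
}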

@joanlopez joanlopez removed their assignment Jul 22, 2024
@joanlopez joanlopez added performance and removed bug awaiting user waiting for user to respond labels Jul 22, 2024
@joanlopez joanlopez changed the title k6 memory leak using imports Large (30-40MB) data chunks loaded through imports are copied over each VU Oct 7, 2024
@joanlopez
Contributor

Considering that there has been no news for more than two months, and that the exploration was quite successful (the root cause is clearly identified, as is a possible workaround), I'm going to close this issue, as I believe the only remaining work from this discussion is already captured in other issues, like #3237.

If you have any other trouble, feel free to open another issue. Thanks! 🙇🏻
