Can you configure the NetVips/libvips runtime temp dir? #245

Open
blasky opened this issue Dec 15, 2024 · 9 comments
Labels
question Further information is requested

Comments

@blasky

blasky commented Dec 15, 2024

I have a deployed Linux container in which NetVips is running. It has a local volume (which holds /tmp) and a larger volume shared across multiple containers.

My image processing code is set to use the shared volume so I can share some resources and write large BigTIFF files. However, when doing a TIFF save to the shared volume, NetVips/libvips still appears to use the local /tmp folder for its temporary files, and these files can be large. We've already had some transient out-of-disk-space errors when running at scale.

drwxrwxrwt 1 root root 4096 Dec 14 23:38 .
drwxr-xr-x 1 root root 4096 Dec 14 23:35 ..
-rw-r--r-- 1 root root 13587157 Dec 14 23:39 vips-0-1949487261.tif
-rw-r--r-- 1 root root 3267551 Dec 14 23:39 vips-1-790835135.tif
-rw-r--r-- 1 root root 853294 Dec 14 23:39 vips-2-2498032444.tif
-rw-r--r-- 1 root root 273136 Dec 14 23:39 vips-3-1228004951.tif
-rw-r--r-- 1 root root 66977 Dec 14 23:39 vips-4-2867559397.tif
-rw-r--r-- 1 root root 16 Dec 14 23:38 vips-5-1162753056.tif

Is there a way to configure NetVips to use an explicit working directory location for its temp work so I can move it over to the shared volume?

I know I can set the linux TMPDIR env var and NetVips will respect that and shift over. However, that will move all the container's linux /tmp operations over to the shared volume which could create problems for other code. Can I configure just the NetVips/libvips temporary workspace directory without altering the system TMPDIR?

@blasky blasky changed the title Can you configure NetVip's runtime temp dir? Can you configure the NetVips/libvips runtime temp dir? Dec 15, 2024
@kleisauke kleisauke added the question Further information is requested label Dec 15, 2024
@kleisauke
Owner

> Can I configure just the NetVips/libvips temporary workspace directory without altering the system TMPDIR?

The temporary directory used by libvips is handled here:
https://github.com/libvips/libvips/blob/681a06340f22620559eb26b4037c9e9acc3e90b8/libvips/iofuncs/util.c#L1500

So, to specify a different temporary path, you must use the TMPDIR environment variable.

However, you can adjust the threshold for open-via-memory versus open-via-disk using the VIPS_DISC_THRESHOLD environment variable. For more information, see:
https://www.libvips.org/API/current/How-it-opens-files.html

By default, this threshold is set to 100 MiB (of uncompressed pixel data). Increasing this value allows larger temporary images to be stored in RAM.
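A rough way to see which side of that threshold an image falls on is to compute its uncompressed size as width × height × bands × bytes per sample. The sketch below does only that arithmetic against the 100 MiB default. `DiscThresholdEstimate` and its methods are hypothetical names; the code does not call NetVips, and it ignores the other factors the linked page describes (such as sequential access) as well as the exact boundary case.

```csharp
// Hypothetical helper, not part of NetVips: models only the size comparison
// behind VIPS_DISC_THRESHOLD (default 100 MiB of uncompressed pixel data).
public static class DiscThresholdEstimate
{
    public const long DefaultThresholdBytes = 100L * 1024 * 1024; // 100 MiB

    // Uncompressed size = width * height * bands * bytes per band sample.
    public static long UncompressedBytes(int width, int height, int bands, int bytesPerSample) =>
        (long)width * height * bands * bytesPerSample;

    // True when the image is larger than the threshold, i.e. when a
    // random-access open would be expected to go via a temporary file on disc.
    // Behaviour at exactly the threshold is not modelled precisely.
    public static bool WouldUseDisc(int width, int height, int bands, int bytesPerSample,
                                    long thresholdBytes = DefaultThresholdBytes) =>
        UncompressedBytes(width, height, bands, bytesPerSample) > thresholdBytes;
}
```

For example, a 6000 × 6000 8-bit RGB image is 108,000,000 bytes, just over the 104,857,600-byte default, while a 5000 × 5000 one (75,000,000 bytes) stays under it.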

> However, that will move all the container's linux /tmp operations over to the shared volume which could create problems for other code.

Note that you could set the TMPDIR environment variable for a specific dotnet process or in the code itself. For example:

Environment.SetEnvironmentVariable("TMPDIR", "/dev/shm");

using var image = Image.NewFromFile("huge.jpg");
image.WriteToFile("x.tif");

Environment.SetEnvironmentVariable("TMPDIR", null);

@blasky
Author

blasky commented Dec 15, 2024

Thanks.

Moving TMPDIR over is risky, like I said. And it means we would have to take on the maintenance of these temp spaces too (...as opposed to /tmp going away when the container goes away).

So the temporary runtime adjustment you suggest is attractive, but I worry about temp files getting fragmented or lost if other temp files are written during the period when TMPDIR has been redirected.

Hmm. I will think and play.

@blasky
Author

blasky commented Dec 16, 2024

FYI...

From the Microsoft docs:
https://learn.microsoft.com/en-us/dotnet/api/system.environment.setenvironmentvariable?view=net-6.0

On Unix-like systems, calls to the SetEnvironmentVariable(String, String) method have no effect on any native libraries that are, or will be, loaded. (Conversely, in-process environment modifications made by native libraries aren't seen by managed callers.)

This aligns with what I'm seeing. I'm making a runtime call to set TMPDIR and libvips is still writing to /tmp.

So far, only hard-coding TMPDIR in the container YAML works. But that means all instances will share a single temp location. Dangerous. Still digging.
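The documented behaviour can be observed by comparing .NET's view of the environment with what native code sees through libc's `getenv`, which is the kind of lookup a native library such as libvips ends up doing. This is a Linux-only sketch under a few assumptions: it P/Invokes `getenv` from `"libc"`, `NativeEnvironment` is a hypothetical helper, and it does not involve NetVips at all.

```csharp
#nullable enable
using System;
using System.Runtime.InteropServices;

// Hypothetical helper for illustration (Linux): reads a variable through
// libc's getenv(), i.e. the process environment that native code sees,
// instead of .NET's managed copy of the environment.
public static class NativeEnvironment
{
    [DllImport("libc", EntryPoint = "getenv")]
    private static extern IntPtr NativeGetEnv(string name);

    public static string? Get(string name)
    {
        IntPtr value = NativeGetEnv(name);
        // getenv returns NULL when the variable is not set.
        return value == IntPtr.Zero ? null : Marshal.PtrToStringAnsi(value);
    }
}
```

Calling `Environment.SetEnvironmentVariable("VIPS_DOC_DEMO", "managed-only")` (a made-up variable name) and reading it back shows the value through `Environment.GetEnvironmentVariable`, while `NativeEnvironment.Get("VIPS_DOC_DEMO")` still returns `null`, which matches what was observed here for TMPDIR.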

@kleisauke
Copy link
Owner

TIL, I wasn't aware of that; I should have tested that sample.

We could consider adding an additional check for another env variable (such as VIPS_TMPDIR) within vips__temp_dir(). Would that help?

Note that if you know you will just be doing simple top-to-bottom operations on the image, like arithmetic or filtering or resize, you can tell libvips that you only need sequential access to pixels:

using var image = Image.NewFromFile("huge.jpg", access: Enums.Access.Sequential);
image.WriteToFile("x.tif");

This avoids the creation of temporary files, and you should see a nice drop in memory and CPU usage.

@blasky
Author

blasky commented Dec 16, 2024

Sequential access might be possible in this case. Will do a quick test.

@blasky
Author

blasky commented Dec 16, 2024

The operation in question opens a BigTIFF (created elsewhere) and re-saves it as a pyramid with LZW compression. The NewFromFile() call here did not use sequential access, so I added that. But I assume something about the pyramid save or the compression is still triggering libvips temp writes, since I still see them in /tmp.

using (NetVips.Image baseImage = NetVips.Image.NewFromFile(_filename, access: Enums.Access.Sequential))
{
    baseImage.Tiffsave(
        filename: outputFile,
        compression: Enums.ForeignTiffCompression.Lzw,
        tile: true,
        tileWidth: tileWidth,
        tileHeight: tileHeight,
        pyramid: true,
        bigtiff: true,
        xres: xResolution,
        yres: yResolution,
        resunit: Enums.ForeignTiffResunit.Cm,
        miniswhite: false
    );
}

@blasky
Author

blasky commented Dec 16, 2024

And then I have another case where we create a Black() image at the full output size and then insert a series of other images into it. I will check whether this writes temp files as well.

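NetVips.Image baseImage = NetVips.Image.Black(_tiffWidth, _tiffHeight);

// ...other code

using (NetVips.Image tileImage = NetVips.Image.NewFromFile(tileFile, access: Enums.Access.Sequential))
{
    // baseImage is not declared in a using statement, so it can be reassigned
    baseImage = baseImage.Insert(tileImage, xPx, yPx);
}

// another pyramid with LZW compression
baseImage.Tiffsave(
    filename: outputFile,
    compression: Enums.ForeignTiffCompression.Lzw,
    tile: true,
    pyramid: true,
    bigtiff: true
);
baseImage.Dispose();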

@blasky
Author

blasky commented Dec 16, 2024

Yup. Looks like Tiffsave() in both these cases is the point at which libvips is writing TMPDIR files.
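The file sizes in both listings fall by roughly three to four times from one file to the next, which fits the reduced pyramid levels (each half the width and half the height of the level above) being staged as temporary files during the pyramid save. Assuming that reading is right, and that each staged level takes about a quarter of the bytes of the level above it, the scratch space needed in TMPDIR can be estimated with a short geometric series. `PyramidTempEstimate` is a hypothetical helper, and this is a back-of-the-envelope estimate, not a description of libvips internals.

```csharp
// Rough estimate only (hypothetical helper). Assumes that every pyramid level
// below full resolution is staged as a temporary file, and that each level
// takes about a quarter of the bytes of the level above it
// (half the width times half the height).
public static class PyramidTempEstimate
{
    public static double TempBytes(long fullResBytes, int reducedLevels)
    {
        double total = 0;
        double levelBytes = fullResBytes;
        for (int i = 0; i < reducedLevels; i++)
        {
            levelBytes /= 4.0;    // next level down: 1/2 width x 1/2 height
            total += levelBytes;  // assumed to sit in TMPDIR during the save
        }
        return total;
    }
}
```

For a hypothetical 54,000,000-byte full-resolution level with six reduced levels, this gives about 18 MB. However many levels there are, the series 1/4 + 1/16 + ... stays below one third of the full-resolution size, which may help when sizing the volume TMPDIR points at.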

@blasky
Author

blasky commented Dec 16, 2024

Been doing a bunch of reading today. It looks like there is no way to set TMPDIR for the running Linux container from inside the .NET runtime; it can only be set at startup, via the container's environment config.

Are these libvips filenames "thread safe"? If I allowed multiple containers to point toward the same hard coded TMPDIR would these temp filenames be protected from collisions?

-rw-r--r-- 1 root root 13510885 Dec 16 23:14 vips-0-687469131.target
-rw-r--r-- 1 root root  3264827 Dec 16 23:14 vips-1-998795644.target
-rw-r--r-- 1 root root   850570 Dec 16 23:14 vips-2-4115502984.target
-rw-r--r-- 1 root root   270412 Dec 16 23:13 vips-3-3327477743.target
-rw-r--r-- 1 root root    66977 Dec 16 23:13 vips-4-1201693285.target
-rw-r--r-- 1 root root        0 Dec 16 23:12 vips-5-2544428402.target

Update: going by the comment in vips__temp_name, maybe yes, maybe no:

* g_random_int() should be safe enough -- it's seeded from time(), so
* it ought not to collide often -- and on linux at least we never
* actually use these filenames in the filesystem anyway.
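To put a rough number on "maybe yes, maybe no": judging by the listings, each name combines what looks like a counter (the first number) with a random 32-bit value from g_random_int() (the second), so two processes sharing a TMPDIR can only clash on names with the same counter value. Assuming the random values are independent and uniformly distributed, the birthday bound estimates the chance of a clash. It does not cover correlated seeds (for example, two generators seeded from the same time() value), and it ignores whether the two files would exist at the same time. `TempNameCollision` is a hypothetical helper.

```csharp
using System;

// Hypothetical helper: birthday-bound estimate for clashing temp names.
// Assumes the random part of each name is an independent, uniformly
// distributed 32-bit value; correlated seeds are not modelled.
public static class TempNameCollision
{
    private const double RandomValues = 4294967296.0; // 2^32 possible g_random_int() results

    // Approximate probability that at least two of n names sharing the same
    // counter value also share the random suffix: 1 - exp(-n(n-1) / (2 * 2^32)).
    public static double Probability(long n)
    {
        double pairs = n * (n - 1) / 2.0;
        return 1.0 - Math.Exp(-pairs / RandomValues);
    }
}
```

With 10 containers sharing one directory, each counter value gives 45 pairs and an estimate of about 1e-8; with 1,000 it rises to roughly 1.2e-4.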

2 participants