
Severe performance degradation in 4.9.3-rc2 #3067

Closed
xavierabellan opened this issue Dec 20, 2024 · 12 comments · Fixed by #3077

@xavierabellan

This issue is related to Unidata/netcdf4-python#1393 but narrowed down to the netcdf-c library in its 4.9.3-rc2 version.

We have noticed a very significant degradation in read performance when accessing a variable in a loop.

As a reproducer, the International Best Track Archive for Climate Stewardship (IBTrACS) data can be used:

wget https://www.ncei.noaa.gov/data/international-best-track-archive-for-climate-stewardship-ibtracs/v04r01/access/netcdf/IBTrACS.ALL.v04r01.nc

With the following minimal C code:

#include <netcdf.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define FILE_NAME "IBTrACS.ALL.v04r01.nc"
#define DIM_STORM "storm"
#define VAR_LAT "lat"

int main() {
    int ncid;                 // File ID
    int storm_dimid;          // ID for the "storm" dimension
    int lat_varid;            // ID for the "lat" variable
    size_t nstorms;           // Total number of storms
    size_t start[2], count[2]; // Start and count arrays for reading lat data
    float lat_value;          // Latitude value to read
    double total_time = 0.0;  // Total time of getting lat value
    int iterations = 0;       // Number of iterations

    // Open the NetCDF file
    nc_open(FILE_NAME, NC_NOWRITE, &ncid);

    // Get the "storm" dimension ID and size
    nc_inq_dimid(ncid, DIM_STORM, &storm_dimid);
    nc_inq_dimlen(ncid, storm_dimid, &nstorms);

    // Get the "lat" variable ID
    nc_inq_varid(ncid, VAR_LAT, &lat_varid);

    // Iterate over the last 10 storms
    for (size_t ns = nstorms - 10; ns < nstorms; ns++) {
        // Set start and count for reading a single value
        start[0] = ns;  // Index of the storm
        start[1] = 0;   // First time step
        count[0] = 1;   // Read one storm
        count[1] = 1;   // Read one time step

        clock_t cstart, cend;
        cstart = clock();

        // Read the latitude value
        nc_get_vara_float(ncid, lat_varid, start, count, &lat_value);

        cend = clock();
        double time_taken = ((double)(cend - cstart)) * 1000 / CLOCKS_PER_SEC; // Convert to ms
        total_time += time_taken;
        iterations++;

        // Print the latitude value
        printf("Storm %zu: Latitude is %f, Time taken = %.3f ms\n", ns, lat_value, time_taken);

    }

    double average_time = total_time / iterations;
    printf("\nAverage runtime per iteration: %.3f ms\n", average_time);

    // Close the NetCDF file
    nc_close(ncid);

    return 0;
}
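
For reference, the reproducer can be built with something along the lines of the following (assuming the netcdf-c nc-config helper is on the PATH and the source is saved as reproducer.c):

cc reproducer.c -o reproducer $(nc-config --cflags --libs)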

The program iterates over the last 10 storms in the input file and reads the corresponding lat value. With version 4.9.2 and below, only the first read takes a significant amount of time, while the rest take virtually nothing:

Storm 13801: Latitude is -12.000000, Time taken = 80.000 ms
Storm 13802: Latitude is -13.400000, Time taken = 0.000 ms
Storm 13803: Latitude is -12.700001, Time taken = 0.000 ms
Storm 13804: Latitude is -13.100001, Time taken = 0.000 ms
Storm 13805: Latitude is -13.800000, Time taken = 0.000 ms
Storm 13806: Latitude is -8.600000, Time taken = 0.000 ms
Storm 13807: Latitude is -9.100000, Time taken = 0.000 ms
Storm 13808: Latitude is -2.300000, Time taken = 0.000 ms
Storm 13809: Latitude is 11.199999, Time taken = 0.000 ms
Storm 13810: Latitude is 18.800001, Time taken = 0.000 ms

Average runtime per iteration: 8.000 ms

Whereas with 4.9.3-rc2, every iteration takes a similar amount of time:

Storm 13801: Latitude is -12.000000, Time taken = 80.000 ms
Storm 13802: Latitude is -13.400000, Time taken = 90.000 ms
Storm 13803: Latitude is -12.700001, Time taken = 50.000 ms
Storm 13804: Latitude is -13.100001, Time taken = 60.000 ms
Storm 13805: Latitude is -13.800000, Time taken = 40.000 ms
Storm 13806: Latitude is -8.600000, Time taken = 50.000 ms
Storm 13807: Latitude is -9.100000, Time taken = 50.000 ms
Storm 13808: Latitude is -2.300000, Time taken = 50.000 ms
Storm 13809: Latitude is 11.199999, Time taken = 50.000 ms
Storm 13810: Latitude is 18.800001, Time taken = 50.000 ms

Average runtime per iteration: 57.000 ms

For certain programs, this may result in a very noticeable slowdown.

This has been reproduced on Linux and macOS.

@WardF
Member

WardF commented Dec 21, 2024

Thank you for reporting this, and thank you for providing the test program. I'll take a look at this and see if I can sort it out. Out of curiosity, do you know which version of libhdf5 is being linked against?

@xavierabellan
Author

The results pasted above were using HDF5 1.14.5, but I could also observe the same behaviour with 1.12.x.

@edwardhartnett
Contributor

OK, this is very significant and I need to take a look at this. @WardF I don't think you should release 4.9.3 until we get to the bottom of this problem.

The access pattern in use here is very common indeed, so this is not an edge case. We can't just slow down everyone's supercomputer programs for more I/O! There will be a crowd of angry scientists outside Unidata offices with pitchforks and torches...

@edwardhartnett
Contributor

Also it would be great if we could work out some basic performance testing in CI. It's a difficult problem, but surely we can try to catch such an obvious slowdown somehow...

@DennisHeimbigner
Collaborator

My first thought was that the implementation of get_vara/get_vars had changed. So I took the get_vars code from 4.9.2 and inserted it into the main branch. It still shows the same performance problem. So with high probability, it is not get_vars (or at least not get_vars alone). BTW, I was using HDF5 1.12.3.

@WardF
Member

WardF commented Jan 13, 2025

Using git bisect, initial indicators are that the issue was introduced in f1a3a64. The commit description is as follows:

    Cleanup the handling of cache parameters.
    
    re: https://github.com/Unidata/netcdf-c/issues/2733
    
    When addressing the above issue, I noticed that there was a disconnect
    in NCZarr between nc_set_chunk_cache and nc_set_var_chunk_cache.
    Specifically, setting nc_set_chunk_cache had no impact on the per-variable cache parameters when nc_set_var_chunk_cache was not used.
    
    So, modified the NCZarr code so that the per-variable cache parameters are set in this order (#1 is first choice):
    1. The values set by nc_set_var_chunk_cache
    2. The values set by nc_set_chunk_cache
    3. The defaults set by configure.ac

This seems like it could be related (or at least is not obviously unrelated); I will dig into it.
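
For readers following along, the two calls named in the commit description are the global and the per-variable chunk-cache controls in netcdf-c. A minimal sketch of the precedence described above, with illustrative (not default) size, nelems, and preemption values:

/* Choice 2: global default; applies to files opened after this call,
   for variables that get no explicit per-variable setting. */
nc_set_chunk_cache(64 * 1024 * 1024, 4096, 0.75f);

/* Choice 1: per-variable override; takes precedence over the global default. */
nc_set_var_chunk_cache(ncid, lat_varid, 64 * 1024 * 1024, 4096, 0.75f);

/* Choice 3: if neither call is made, the configure-time defaults apply. */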

@WardF
Member

WardF commented Jan 13, 2025

The issue appears to be related to the following change in logic in libhdf5/nc4hdf.c (around line 1147) and to the chunk-cache-related default values:

Prior to the first bad commit, chunkcache.size was checked against (and capped at, if exceeding) MAX_DEFAULT_CACHE_SIZE, which was defined as 67108864 (64 MiB). In the first bad commit, this was changed to check against (and cap at) DEFAULT_CHUNK_CACHE_SIZE, which is 16777216 (16 MiB) by default, with MAX_DEFAULT_CACHE_SIZE having been removed altogether. This quartering of the default cache size is why we are seeing the roughly tenfold (on my system, at least) increase in access time.

The naive fix on our end is to quadruple the DEFAULT_CHUNK_CACHE_SIZE, but I want to wrap my head around why we removed MAX_DEFAULT_CACHE_SIZE in the first place. @DennisHeimbigner do you have any recollection? I'll work through the code change on my end.
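
Paraphrasing the clamp described above (not the literal source), the change in libhdf5/nc4hdf.c amounts to something like:

/* 4.9.2 and earlier (paraphrased): cap the cache at the old 64 MiB ceiling. */
if (chunkcache.size > MAX_DEFAULT_CACHE_SIZE)      /* 67108864 */
    chunkcache.size = MAX_DEFAULT_CACHE_SIZE;

/* 4.9.3-rc2 (paraphrased): cap at the 16 MiB default instead. */
if (chunkcache.size > DEFAULT_CHUNK_CACHE_SIZE)    /* 16777216 */
    chunkcache.size = DEFAULT_CHUNK_CACHE_SIZE;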

@WardF
Member

WardF commented Jan 13, 2025

At first read-through, it appears that prior to this change, MAX_DEFAULT_CACHE_SIZE was used as the per-variable chunk cache size, and CHUNK_CACHE_SIZE was stored as part of the global state but wasn't referenced(?). I need to double-check that last part, since I'm only looking at the diff logs right now. But it appears that perhaps in cleaning up this discrepancy, we inadvertently quartered the size of the per-variable chunk cache, which would explain what we're observing.

@DennisHeimbigner
Collaborator

If memory serves, this change was in response to a user-posted issue about cache size.
Possibilities:

@WardF
Member

WardF commented Jan 13, 2025

Thanks for reminding me, @DennisHeimbigner. That makes sense. The question now is how to proceed, and I think the answer may be as much about education as it is about setting default cache size values. As @edwardhartnett points out, this is a pretty common use case (as reported in this issue), so I am inclined to lean towards default values that ameliorate the observed jump in access time. This might result in issues being opened about increased memory usage, but ultimately these are parameters that the end user has control over, provided we remember to point users to them.
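
As a stop-gap for the reproducer above, an application can restore the earlier behaviour by raising the cache for the affected variable explicitly, for example back to the old 64 MiB ceiling (the nelems and preemption values below are illustrative, not library defaults):

/* After nc_inq_varid() and before the read loop: */
size_t cache_size       = 64 * 1024 * 1024; /* matches the old MAX_DEFAULT_CACHE_SIZE */
size_t cache_nelems     = 4096;             /* illustrative */
float  cache_preemption = 0.75f;            /* illustrative */
nc_set_var_chunk_cache(ncid, lat_varid, cache_size, cache_nelems, cache_preemption);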

@DennisHeimbigner
Collaborator

I found the original pull request: #2734
It refers to issue #2733.

@edwardhartnett
Contributor

I commented in #2734
