Restructure cudf spill metrics and test #8984

TomAugspurger · 2025-01-10T14:48:02Z

This updates how we define the cuDF spilling metric and its test. The primary motivation is to make it easier to test this within the main pytest process. Previously, the test was skipped unless several environment variables controlling behavior in both distributed and cudf were set.

The cuDF config can be changed programmatically within the test process without issue, so I added a new fixture that sets & unsets the values we need for this test.
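The set-and-unset pattern behind such a fixture can be sketched as follows. This is a minimal sketch, not the fixture from this PR: `_OPTIONS`, `get_option`, `set_option`, and `temporary_options` are hypothetical stand-ins for a library's global options, used here so the example runs without cuDF or a GPU.

```python
from contextlib import contextmanager

# Hypothetical stand-in for a library's global option store.
_OPTIONS = {"spill": False, "spill_stats": 0}


def get_option(name):
    """Return the current value of a known option."""
    return _OPTIONS[name]


def set_option(name, value):
    """Set a known option; unknown names are rejected."""
    if name not in _OPTIONS:
        raise KeyError(name)
    _OPTIONS[name] = value


@contextmanager
def temporary_options(**overrides):
    """Apply the overrides, then restore the previous values on exit."""
    # Record the old values first so they can be restored even if the
    # body of the `with` block raises.
    previous = {name: get_option(name) for name in overrides}
    try:
        for name, value in overrides.items():
            set_option(name, value)
        yield
    finally:
        for name, value in previous.items():
            set_option(name, value)
```

In a test suite, the same body could be wrapped in a pytest fixture so that each test sees the overridden options and the process-wide state is restored afterwards.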

The distributed configuration is a bit harder, since we rely on side effects of importing distributed.worker to define a new metric and add it to the list of DEFAULT_METRICS. This makes it challenging to change in our tests (not to mention for users).

I changed distributed.worker to always define this metric-collecting function, but only add it to DEFAULT_METRICS when distributed.diagnostics.cudf is set. This should result in the same behavior (whether you have dask-cudf installed or not, and whether that option is set or not) for most cases, but makes it a bit easier to test. The only change in behavior is if you have distributed.diagnostics.cudf set but don't have dask-cudf installed on your workers. Now we'll error when trying to start the metric rather than silently failing to collect that metric.
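The define-always, register-conditionally structure described above can be sketched like this. It is a simplified illustration under stated assumptions, not the code in this PR: `spill_metric`, `default_metrics`, the plain-dict `config`, and the `library` parameter are all hypothetical, and the real metric reads spill statistics rather than returning the module name.

```python
import importlib


def spill_metric(worker, library="cudf"):
    """Always defined. The import is deferred to call time, so merely
    defining this function never fails when the library is missing."""
    module = importlib.import_module(library)  # raises ImportError if absent
    return {"library": module.__name__}


def default_metrics(config):
    """Register the metric only when the config option is set."""
    metrics = {}
    if config.get("distributed.diagnostics.cudf", False):
        metrics["cudf"] = spill_metric
    return metrics
```

With this shape, a test can add `spill_metric` to a worker's metrics explicitly regardless of configuration, while a configured environment that lacks the library surfaces an ImportError when the metric is called instead of silently omitting it.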

I also split the test in two:

- The original test for the metrics, updated to explicitly add the cuDF spill metric to the worker metrics, since the environment might not be configured to do that automatically.
- A second test that ensures the metric is present by default when the environment is configured (by monkeypatching the environment and clearing the import cache before re-importing distributed.worker).
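The second test depends on forcing import-time side effects to run again under a different environment. A minimal sketch of that technique, using only the standard library: the module `sketch_worker_module`, the variable `SKETCH_ENABLE_METRIC`, and the helper `import_with_env` are hypothetical and stand in for the real worker module and its configuration.

```python
import importlib
import os
import sys
import tempfile

MODULE_NAME = "sketch_worker_module"  # hypothetical module name
ENV_VAR = "SKETCH_ENABLE_METRIC"      # hypothetical environment variable

# A tiny module whose contents depend on the environment at import time.
SOURCE = (
    "import os\n"
    "DEFAULT_METRICS = {}\n"
    "if os.environ.get('SKETCH_ENABLE_METRIC') == '1':\n"
    "    DEFAULT_METRICS['cudf'] = lambda worker: {}\n"
)


def import_with_env(value):
    """Import the module fresh with ENV_VAR set to `value` (or unset if
    None) and return the sorted names of its default metrics."""
    saved = os.environ.get(ENV_VAR)
    with tempfile.TemporaryDirectory() as directory:
        with open(os.path.join(directory, MODULE_NAME + ".py"), "w") as f:
            f.write(SOURCE)
        sys.path.insert(0, directory)
        try:
            if value is None:
                os.environ.pop(ENV_VAR, None)
            else:
                os.environ[ENV_VAR] = value
            # Drop any cached copy so the module body executes again.
            sys.modules.pop(MODULE_NAME, None)
            importlib.invalidate_caches()
            module = importlib.import_module(MODULE_NAME)
            return sorted(module.DEFAULT_METRICS)
        finally:
            # Undo every piece of global state touched above.
            sys.path.remove(directory)
            sys.modules.pop(MODULE_NAME, None)
            if saved is None:
                os.environ.pop(ENV_VAR, None)
            else:
                os.environ[ENV_VAR] = saved
```

In a pytest test, `monkeypatch.setenv` and `monkeypatch.delitem(sys.modules, ...)` can take over the set-and-restore bookkeeping that the `finally` block does by hand here.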

This updates the cudf-spilling test to rely on fewer environment variables.

github-actions · 2025-01-10T15:40:39Z

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

27 files ±0 | 27 suites ±0 | 11h 33m 52s ⏱️ +38m 1s
4 112 tests -2 | 3 998 ✅ +1 | 111 💤 -2 | 3 ❌ ±0
51 563 runs +1 408 | 49 270 ✅ +1 372 | 2 288 💤 +37 | 5 ❌ ±0

For more details on these failures, see this check.

Results for commit 49322cc. ± Comparison against base commit 0657de2.

This pull request removes 2 tests.

distributed.cli.tests.test_dask_worker.test_listen_address_ipv6[tcp:..[ ‑ 1]:---nanny]
distributed.cli.tests.test_dask_worker.test_listen_address_ipv6[tcp:..[ ‑ 1]:---no-nanny]

Updated cudf-spill test

49322cc

This updates the cudf-spilling test to rely on fewer environment variables.

TomAugspurger requested a review from fjetter as a code owner January 10, 2025 14:48

TomAugspurger requested review from charlesbluca and removed request for fjetter January 10, 2025 14:48

TomAugspurger mentioned this pull request Jan 10, 2025

Initial setup for dask-upstream-testing rapidsai/dask-upstream-testing#1

Merged
