Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restructure cudf spill metrics and test #8984

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

TomAugspurger
Copy link
Member

This updates how we define the cuDF spilling metric and test. The primary motivation is to make it easier to test this within the main pytest process. Previously, the test was skipped if the environment wasn't configured with several environment variables controlling behavior in both distributed and cudf.

The cuDF config can be changed programmatically within the test process without issue, so I added a new fixture that sets & unsets the values we need for this test.

The distributed configuration is a bit harder, since we rely on some side-effects of import dask.worker to define a new metric and add it to the list of DEFAULT_METRICS. This makes it challenging for us to change this in our tests (not to mention users).

I changed dask.worker to always define this metric-collecting function, but only add it to DEFAULT_METRICS when distributed.diagnostics.cudf is set. This should result in the same behavior (whether you have dask-cudf installed or not, and whether that option is set or not) for most cases, but makes it a bit easier to test. The only change in behavior is if you have distributed.diagnostics.cudf set but don't have dask-cudf installed on your workers. Now we'll error when trying to start the metric rather than silently failing to collect that metric.

And I split the test in two:

  1. The original test for the metrics, but updated to explicitly add the cuDF spill metric to the worker metrics, since the environment might not be configured to do that automatically
  2. A second test that ensures that the metric is present by default when the environment is configured (by monkeypatching the environment and clearing the import cache before re-importing dask.worker).

This updates the cudf-spilling test to rely on a few less environment
variables.
Copy link
Contributor

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

    27 files  ±    0      27 suites  ±0   11h 33m 52s ⏱️ + 38m 1s
 4 112 tests  -     2   3 998 ✅ +    1    111 💤  -  2  3 ❌ ±0 
51 563 runs  +1 408  49 270 ✅ +1 372  2 288 💤 +37  5 ❌ ±0 

For more details on these failures, see this check.

Results for commit 49322cc. ± Comparison against base commit 0657de2.

This pull request removes 2 tests.
distributed.cli.tests.test_dask_worker.test_listen_address_ipv6[tcp:..[ ‑ 1]:---nanny]
distributed.cli.tests.test_dask_worker.test_listen_address_ipv6[tcp:..[ ‑ 1]:---no-nanny]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant