Restructure cudf spill metrics and test #8984
Open
This updates how we define the cuDF spilling metric and test. The primary motivation is to make it easier to test this within the main `pytest` process. Previously, the test was skipped if the environment wasn't configured with several environment variables controlling behavior in both distributed and cudf.

The cuDF config can be changed programmatically within the test process without issue, so I added a new fixture that sets & unsets the values we need for this test.
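For illustration, the set-and-unset pattern behind such a fixture might look like the sketch below. It is not the fixture from this PR: `temporary_options` is a hypothetical helper, and the in-memory option store with its `spill` and `spill_stats` keys is a stand-in so the example runs without cuDF installed.

```python
from contextlib import contextmanager


@contextmanager
def temporary_options(get_option, set_option, overrides):
    """Apply each option in `overrides`, then restore the previous values on exit."""
    # Remember the current values before touching anything.
    previous = {name: get_option(name) for name in overrides}
    try:
        for name, value in overrides.items():
            set_option(name, value)
        yield
    finally:
        # Restore even if the body of the `with` block raised.
        for name, value in previous.items():
            set_option(name, value)


# Stand-in option store (hypothetical names and defaults, not cuDF itself).
_fake_options = {"spill": False, "spill_stats": 0}


def fake_get_option(name):
    return _fake_options[name]


def fake_set_option(name, value):
    _fake_options[name] = value


with temporary_options(fake_get_option, fake_set_option, {"spill": True, "spill_stats": 1}):
    inside = dict(_fake_options)  # values seen while the overrides are active

after = dict(_fake_options)  # values seen once the overrides are restored
```

In a real test, the same helper could presumably be wrapped in a `pytest` fixture that yields inside the `with` block, passing cuDF's own option getter and setter in place of the fake ones. Which options need to be set is an assumption here.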
The distributed configuration is a bit harder, since we rely on some side-effects of `import dask.worker` to define a new metric and add it to the list of `DEFAULT_METRICS`. This makes it challenging for us to change this in our tests (not to mention users).

I changed `dask.worker` to always define this metric-collecting function, but only add it to `DEFAULT_METRICS` when `distributed.diagnostics.cudf` is set. This should result in the same behavior (whether you have dask-cudf installed or not, and whether that option is set or not) for most cases, but makes it a bit easier to test. The only change in behavior is if you have `distributed.diagnostics.cudf` set but don't have dask-cudf installed on your workers. Now we'll error when trying to start the metric rather than silently failing to collect that metric.

And I split the test in two: