Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validation bug when using feature values as slicing specs #150

Open
zywind opened this issue Jan 24, 2022 · 2 comments
Open

Validation bug when using feature values as slicing specs #150

zywind opened this issue Jan 24, 2022 · 2 comments

Comments

@zywind
Copy link

zywind commented Jan 24, 2022

TFMA version: 0.33

Hi team,

I found a bug in TFMA validation. It happens when I use multiple feature values to specify slicing like so:

eval_config = text_format.Parse("""
  model_specs {
    signature_name: "serving_default"
    preprocessing_function_names: ["transform_features"]
    label_key: "income_bracket"
  }
  metrics_specs {
    metrics {
      class_name: "ExampleCount"
    }
    metrics {
      class_name: "BinaryAccuracy"
      config: '{"name": "accuracy"}'
      per_slice_thresholds {
        slicing_specs {
          feature_values: {key: "race", value: "White"}
          feature_values: {key: "sex", value: "Male"}
        }
        threshold {
          value_threshold {
            lower_bound { value: 0.7 }
          }
        }
      }
    }
  }
  slicing_specs {
    feature_keys: ["race", "sex"]
  }
""", tfma.EvalConfig())

In this example, I specify that I just want to validate accuracy on a particular group: white males. The evaluation will run, but when I load validation results, I get:

tfma.load_validation_result(evaluator.outputs['evaluation'].get()[0].uri)

missing_slices {
  feature_values {
    key: "race"
    value: "White"
  }
  feature_values {
    key: "sex"
    value: "Male"
  }
}

I've tracked down the cause of this bug, and I believe it's the following two lines:

slice_hash = details.slicing_spec.SerializeToString()

slice_hash = slice_spec.SerializeToString()

Here, you're relying on serialization of slicing_specs for equality comparison. The problem is that feature_values is a map field, and the entry order is random in the serialized string. Sometimes you get race serialized before sex, and sometimes the reverse happens, which is why I get the missing slices message.

I think custom serialization using sorted map fields would fix this bug.

@genehwung
Copy link
Contributor

Thanks a lot for reporting the bug. We are actively working on a fix.

@embr
Copy link
Collaborator

embr commented Feb 7, 2022

This issue should be fixed at head. Let us know if you are still seeing the problem after d31e583

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants