Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support custom validators for TryFromBytes #1330

Open
Tracked by #5
kupiakos opened this issue May 20, 2024 · 3 comments
Open
Tracked by #5

Support custom validators for TryFromBytes #1330

kupiakos opened this issue May 20, 2024 · 3 comments
Labels
customer-request Documents customer requests.

Comments

@kupiakos
Copy link
Contributor

kupiakos commented May 20, 2024

These are kinds of validity that users may need to have checked before transmutation from &[u8] to &T:

  1. The language-level validity of the bits for the type of each field in T, e.g. a bool must be either 0 or 1. This is implemented by the derive.
  2. The library-level validity of the bits in T, e.g. an invariant that the first field is less than the second. This can be referenced by the derive but inherently must be user-controlled.
  3. The library-level validity of the individual fields in T, based on the above library-level check applied to each field. This is also implemented by the derive.
  4. The library-level validity of the length of the struct given the header contents, e.g. the length field is equal to the size of the tail slice. This only applies to dynamically sized structs ending in a slice.

The plan discussed in #5 and #372 is to support the concept of a custom validator, a function or closure provided to derive(TryFromBytes) that will always be called before allowing a TryFromBytes transmute to succeed.

Open Questions

  • Should users be able to provide their own error type, to communicate the way in which the validation failed? How would this be exposed to the higher-level TryFromBytes APIs?
  • How will data validation interact with custom DSTs? Is the validator also responsible for validating the correct length of the tail slice, or even deriving the correct length (as proposed in Support casting to a KnownLayout type with a user-provided length computed dynamically #1289 (comment))?
    • What happens if the user provides both a #[length] and custom validator function and they disagree?
    • Given struct Outer { x: u8, y: Inner }/struct Inner { a: [u8; 4], b: [u8] }, is the validator for Outer allowed to communicate a maximum length for the tail slice located inside of Inner? What if Inner has a custom validator that returns "valid if the tail slice is truncated to N", but Outer has a different return from its custom validator?
  • What are the required signature(s) for the custom validators? In theory the derive can support many return types simultaneously, including bool, Result<(), CustomError>, or Result<Option<usize>, CustomError> (to communicate a required length).
  • When zerocopy's expectations surrounding validator behavior are violated, should it panic (terrible for embedded), only panic in debug mode (middle ground a la + overflow checking), or always reject the input (could miss bugs in validators).
  • Should we ban mutation inside of a validator (either programmatically or as a safety invariant)? See also: TryFromBytes::is_bit_valid should promise not to mutate its referent #1831
@kupiakos kupiakos added the customer-request Documents customer requests. label May 20, 2024
@joshlf
Copy link
Member

joshlf commented May 20, 2024

See also: #590

@joshlf
Copy link
Member

joshlf commented Oct 4, 2024

@jswrenn and I met and discussed potential designs. He has his own design proposal that he'll share at some point. Here's mine.

This design is based on the observation that some users may want to not only validate safety conditions, but actively transform their type - for example to construct a new witness wrapper type. It permits a distinction between the type being constructed and its "raw" equivalent.

This design is also explicitly meant to support custom length fields (#1289). In order to do that, validation can return a rich error message which might be required when a length field cannot be parsed.

We didn't have time during our discussion to think deeply about how this composes with a #[length] attribute. For example, is the length extracted before or after calling the user's custom validator? Can the #[length] attribute itself provide a "custom extractor" that also has error cases? These will need to be thought through.

// Used in the following examples
type MaybeValid<T, A> = Ptr<T, (A, Any, AsInitialized)>;
type MaybeAligned<T, A> = Ptr<T, (A, Any, Valid)>;

unsafe trait TryFromBytes {
    // Set by `#[zerocopy(raw = ...)]`, defaults to `Self`.
    #[doc(hidden)]
    type Raw: TryFromBytes;
    
    // Set by `#[zerocopy(error = ...)]`, defaults to
    // `()` or similar.
    //
    // Not doc(hidden)!
    type ValidationError;

    // Replaces `is_bit_valid`.
    fn try_from_maybe_valid_raw<A>(
        maybe_raw: MaybeValid<Self::Raw, A>,
    ) -> Result<MaybeAligned<Self, A>, Self::ValidationError>;
}

Here's how this would be used by a hypothetical user:

#[derive(TryFromBytes)]
#[zerocopy(raw = FooRaw, error = FooError, validator = Foo::try_from_raw)]
struct Foo(...);

#[derive(TryFromBytes)]
struct FooRaw(...);

struct FooError(...);

impl Foo {
    fn try_from_raw<A: Aliasing>(
        r: Result<MaybeAligned<Self::Raw, A>, Self::Raw::ValidationError>
    ) -> Result<MaybeAligned<Self, A>, Self::ValidationError> {
        ...
    }
}

Our derive would emit the following impl:

unsafe impl TryFromBytes {
    type Raw = FooRaw;
    type ValidationError = FooError;

    fn try_from_maybe_valid_raw<A>(
        maybe_raw: MaybeValid<FooRaw, A>,
    ) -> Result<MaybeAligned<Self, A>, Self::ValidationError> {
        let raw_result = FooRaw::is_bit_valid(maybe_raw);
        Foo::try_from_raw(raw_result)
    }
}

As written, this gives the user full power: they are responsible for converting Self::Raw::ValidationError into Self::ValidationError and MaybeAligned<Self::Raw> into MaybeAligned<Self>. However, the user may not want to deal with all of these details. Thus, we can abstract somewhat and support the following simplifications:

  • The validator can take a value rather than a Result; the framework will handle propagating errors
  • The validator can return a bool rather than a Result; the framework will handle generating Ok or Err values as appropriate

First, we introduce the following trait and implement it for different function types. Each impl (except for the most general one, which puts all of the onus on the user) carries restrictions:

  • When the validator takes a value rather than a Result, we add a T::ValidationError: From<<T::Raw as TryFromBytes>::ValidationError> bound to ensure that, if an error is encountered, the framework can convert the error
  • When the validator returns a bool, we additionally require T::ValidationError: Default to ensure that our framework can synthesize new errors when the validator returns false
trait Validator<T: TryFromBytes, A, Disambiguator> {
    fn try_from_raw(
        self,
        r: Result<
            MaybeAligned<<T as TryFromBytes>::Raw, A>,
            <<T as TryFromBytes>::Raw as TryFromBytes>::ValidationError,
        >,
    ) -> Result<MaybeAligned<T, A>, T::ValidationError>;
}

impl<T, F, A> Validator<T, A, ()> for F
where
    T: TryFromBytes,
    F: FnOnce(
        Result<MaybeAligned<T::Raw, A>, <T::Raw as TryFromBytes>::ValidationError>,
    ) -> Result<MaybeAligned<T, A>, T::ValidationError>,
{
    fn try_from_raw(
        self,
        r: Result<MaybeAligned<T::Raw, A>, <T::Raw as TryFromBytes>::ValidationError>,
    ) -> Result<MaybeAligned<T, A>, T::ValidationError> {
        self(r)
    }
}

impl<T, F, A> Validator<T, A, ((),)> for F
where
    T: TryFromBytes,
    T::ValidationError: From<<T::Raw as TryFromBytes>::ValidationError>,
    F: FnOnce(MaybeAligned<T::Raw, A>) -> Result<Maybe<T, A>, T::ValidationError>,
{
    fn try_from_raw(
        self,
        r: Result<MaybeAligned<T::Raw, A>, <T::Raw as TryFromBytes>::ValidationError>,
    ) -> Result<MaybeAligned<T, A>, T::ValidationError> {
        match r {
            Ok(r) => self(r),
            Err(err) => Err(err.into()),
        }
    }
}

impl<T, F, A> Validator<T, A, (((),),)> for F
where
    T: TryFromBytes<Raw = T>,
    T::ValidationError: Default + From<<T::Raw as TryFromBytes>::ValidationError>,
    F: FnOnce(MaybeAligned<T::Raw, A>) -> bool,
{
    fn try_from_raw(
        self,
        r: Result<MaybeAligned<T, A>, <T::Raw as TryFromBytes>::ValidationError>,
    ) -> Result<MaybeAligned<T, A>, T::ValidationError> {
        match r {
            Ok(r) => if self(r) {
                Ok(r)  
            } else {
                Err(Default::default())
            },
            Err(err) => Err(err.into()),
        }
    }
}

Finally, we can change the derive-generated code like so:

    fn try_from_maybe_valid_raw<A>(
        maybe_raw: MaybeValid<FooRaw, A>,
    ) -> Result<MaybeAligned<Self, A>, Self::ValidationError> {
        let raw_result = FooRaw::is_bit_valid(maybe_raw);
        Validator::try_from_raw(Foo::try_from_raw, raw_result)
    }

Note that this also supports closures rather than named validators. For example, if the user specifies #[zerocopy(validator = |foo| foo.0.is_valid())], this desugars as expected:

    fn try_from_maybe_valid_raw<A>(
        maybe_raw: MaybeValid<FooRaw, A>,
    ) -> Result<MaybeAligned<Self, A>, Self::ValidationError> {
        let raw_result = FooRaw::is_bit_valid(maybe_raw);
        Validator::try_from_raw(|foo| foo.0.is_valid(), raw_result)
    }

@djkoloski if we added validation context to this design, would it support your rkyv use case? In particular, is the ability to perform mutation in the validator enough to do your fix-up operation?

We'd at a minimum need a slight tweak: we'd have to have a way of signaling that a TryFromBytes impl only works on mutable input in order to do the fix-up. But let's assume we've done that for the sake of this question.

@newpavlov
Copy link
Contributor

@joshlf
What is motivation for passing Result to try_from_raw? I would assume that if we have failed to get a raw value, then there is no reason to run user-provided validators.

As for errors, in my use cases returning bool should be sufficient since failed validation means we got corrupted data and we can not do any reasonable error processing based on failure kind inside the program. Maybe it can be done by using simple_validator for bool-based functions and validator for Result-based?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
customer-request Documents customer requests.
Projects
None yet
Development

No branches or pull requests

3 participants