-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
overhaul python error reporting for structured failure #7
base: main
Are you sure you want to change the base?
Conversation
…et/rx into rx_error_messages
Thanks very much for this! I wanted to let you know that I have received it and look forward to reviewing it, but just haven't made the time yet. |
So, as usual, I am late in delivering. I'm afraid some conferencing got in the way. I'm very pleased to see Python 3 compatibility added, but I have to balk at the structured errors. It's not that I don't want structure failure information — I do, I do! — it's that the spec tests have added information on how failures should be reported. They're implemented by the If you're stoked at the chance to implement them, that would be excellent. I would be pleased to help. If not, I'll probably merge at least the changes to make this Python 3 compatible, and hope to make time ot do the structure failure information myself. |
Ack, this is what I get for not reading all the documentation. I'm pretty free this week so I will get these errors implemented properly. |
I just started reviewing the Perl code that implements the structured failures, and maybe it's because I'm unfamiliar with Perl, but I'm not getting a feel for these error structures or how best to translate them into Python. Do you have any language-agnostic specification for how exactly these should work? I saw the spec.pod file but would like a more detailed description. |
I wanted to note, too, that you are not at fault for missing some sort of glaring documentation. This was done (nicely, I think) in Perl, but there was no big spec update or anything, or big TODO documentation. It just sort of sat there. So, it's been a while since worked on this, so I'm going to sort of give you a unstructured braindump, and hope it's useful.
For a good example of a leafy type, look at
(At this point, I am panicking, because I can't force any tests to fail to double-check things. New task for rjbs: figure out what key fact I have forgotten in the last year!) A more complex tree-like check is The same thing, by the way, applies to These deep paths are important because they let you get the "data path" and "check path" of a failure. These are critical. You can see an example of what you can, and why it's so important, at http://rjbs.manxome.org/rubric/entry/1743 You end up getting, in Perl, a stringified exception that says something like "the data at ->[0]->{foo}->[1] fails the check at ->{contents}->[0] because of 'range' violation" I hope this was useful. If not, please let me know and I will try again! Also, let me know if I've just left out something helpful. Tomorrow, I hope I can fix the weird problem I had with making tests fail! If I can do that, I can add a few more demonstrative tests! |
The path data structures make sense to me in a context where only one subcheck fails, but what does the error structure look like in more complicated situations where, say, the 0th and 1st data are both no good, or a I apologize if all this should be obvious by reviewing the Perl code, languages that demand you prefix your variable names with special characters give me the willies, so thank you for putting it all in natural language. :) |
(…come to the Perl side … but not until you finish the Python updates… 😉) For an extra-complex failure, check out It's this: ---
type: //rec
required: { key: //int }
optional: { opt: //bool }
rest: { type: //map, values //bool } In other words, it's a dict where you must have an int as the value for "key", you may have an "opt" that's a bool, plus any number of other entries with bool values. (This would be more interesting if "opt" had a non-bool type, but so it goes.) You can fail this in a bunch of ways. You could have something bad for "opt" or for "key" or you could be missing "opt" or you could have an entry for "foobar" with a non-bool value. The tests run this schema against the input { "opt": "pants", "rest": "pants" } …and which has three obvious problems:
Dutifully, we expect three failures, as we see in the first file I mentioned: "opt-pants-rest-pants": {
"errors":
[
{
"data": [ ],
"check": [ ],
"error": [ "missing" ]
},
{
"data": [ "opt" ],
"check": [ "optional", "opt" ],
"error": [ "type" ]
}, {
"data": [ "rest" ],
"check": [ "rest", "values" ],
"error": [ "type" ]
}
]
}
For the question about //any, I'm going to force you to look at a little Perl. This is the sub assert_valid {
return 1 unless $_[0]->{of};
my ($self, $value) = @_;
my @failures;
for my $i (0 .. $#{ $self->{of} }) {
my $check = $self->{of}[ $i ];
return 1 if eval { $check->assert_valid($value) };
my $failure = $@;
$failure->contextualize({
type => $self->type,
check_path => [ [ 'of', 'key'], [ $i, 'index' ] ],
});
push @failures, $failure;
}
$self->fail({
error => [ qw(none) ],
message => "matched none of the available alternatives",
value => $value,
failures => \@failures,
});
} It tries each subcheck. As soon as one matches, it returns 1 — any match is a total success! Otherwise, it accumulates each failure as it goes. The After all the contextualized failures are collected in the array Let me know if this did or didn't help. The tests here are not as comprehensive as they could be, which makes them unclear. I should update the spec, of course. I still use Rx heavily in Perl, and lightly in some other languages, but because the Perl implementation is done, I have been lax on touching anything else. I keep finding out other people use it, though, so I really need to make the time! |
There are some aspects of this that I still don't understand. How do the failure structures look for nested errors? Say, for instance, that Also how do you feel about implementing these errors as Python objects instead of straight dictionaries, and then possibly including a method that converts them to dictionaries of the form that you expect for your tests? |
They should absolutely be thrown as objects. In the Perl code, for example, you get a Data::Rx::Failure object. The spec is describing the properties of the error expected, and absolutely not defining the exact representation. Data::Rx::Failure and Data::Rx::FailureSet do not have a method like the one you describe. Instead, there is a helper library for the tests which does the comparison of a Failure object to a spec entry. You are of course free to do whichever you prefer. I think I'd advise to follow the Perl code's model. That way, if the test definitions change, only the test code needs updating. Still, I wouldn't expect it to be a big deal. As for the failure structures looking for nested errors: well, I'm not sure I fully understand your question, and I hope I don't sound too thick or pedantic when I say that of course the failure structures don't need to look for nested errors -- they already have the record of them. I think you're asking one of two things, and I'll try to answer both. When it turns out I'm wrong, maybe you'll know how to set me right. If you mean "How do you manage to return a structured error when each checker checks one thing" When you call assert_valid on a complex schema and it can't reject right away (as you could if, say, you're trying to assert that So the call stack, going deeper, eventually hits an exception. As the exception handlers run on the way back up, each one contextualizes it more and eventually it goes uncaught back to the caller. In the example above, it would add "3" to the path for the data (for index 3 into the data) and "contents" to the path for the check, because it's the "contents" check of This not the only way to do this. It might not always be the best way. It's just how Perl's implementation works. In the example you gave, the top-level If you mean "How does the user look at the exception they receive and figure out what happened?" Well, there are a few ways. Me, I just print the exception to the screen and it tells me — but of course the implementation of When you get a failure on a deep structure, the thing you get back has all the context to find the data that was rejected and the check that rejected it. That's what got added to it by the contextualization, above. The way that the Perl errors implement the path data is with a pair of arrayrefs (Lists). When we note that we got this exception at (data path 3, check path "contents") we do the equivalent of: error.context.append({ "data_path": 3, "check_path": "contents" }) That happens a bunch of times as you go up the handler and eventually you end up with a context that is a list of frames, just like an exception's stack trace, except it's tracing the parts of the schema, not the program. I seem to have explained how the structure is built, more than the actual question, but that's okay, because this information makes the answer clear: To explain the context of the error, there is a method called Stringifying a Data::Rx::Failure is: my $str = sprintf "Failed %s: %s (error: %s at %s)",
$struct->[0]{type},
$struct->[0]{message},
$self->error_string,
$self->data_string; The type and error there are from the very first bit of context: what was the real error? This will tell us, for example, that we had a "range" error with the message "5 is too big, 2 was the max" when talking about an int found deep in a dict. Finally, one more elaboration: note that I used error_string and data_string. These format the paths into something like Double finally, all this is just how the Perl code does it. I think it's fine, but you may have another idea that will work better for you. Let me know! |
I am sorry, but neither of those answers quite address the situation I am stuck on. What I meant to ask was how these structured errors work for a multi-error failure where errors exist across multiple levels in the data, in what I picture as a recursive error tree. As I understand it right now (quite possibly incorrectly), your final error structure would hold all the discovered errors in one denormalized list; if two elements in an array nested deeply within some complicated structure both failed a test, these failures would be represented as two objects in your top-level list of errors, with their To my mind it would be more natural to represent the failure structure as a tree, where container errors (map, rec, seq, arr) can act as branch nodes pointing down to the errors in their children. I have to go right now, but I believe there are several advantages to this structure over the mental model of that perl structure I currently have. I'll be back and flesh this out in a couple hours. |
Thanks for clarifying! Let's have a look at what happens. Here's a test program in Perl: use strict;
use warnings;
use Data::Rx;
use YAML::XS qw(Load);
my $schema_struct = Load("
---
type: //rec
required:
FOO:
type: //rec
required:
BAR:
type: //all
of :
- { type: //int, range: { min: 1 } }
- { type: //int, range: { min: 2 } }
");
my $rx = Data::Rx->new;
my $schema = $rx->make_schema( $schema_struct );
$schema->assert_valid({ FOO => { BAR => 0 } }); It prints this:
So far so good, but you want to know whether we've created a tree that diverges only as necessary or whether we've got a list of paths, some of which have common prefixes. So, let's dump the error object's guts: bless({
'failures' => [
# Here's the first error we get
bless({
'struct' => [ {
'type' => '//int',
'message' => 'value is outside allowed range',
'value' => 0,
'error' => [ 'range' ]
},
{
'check_path' => [ [ 'of', 'key' ], [ 0, 'index' ] ],
'type' => '//all'
},
{
'data_path' => [ [ 'BAR', 'key' ] ],
'check_path' => [ [ 'required', 'key' ], [ 'BAR', 'key' ] ],
'type' => '//rec'
},
{
'type' => '//rec',
'check_path' => [ [ 'required', 'key' ], [ 'FOO', 'key' ] ],
'data_path' => [ [ 'FOO', 'key' ] ] }
],
},
'Data::Rx::Failure'
),
# Here's the second one
bless({
'struct' => [ {
'error' => [ 'range' ],
'value' => 0,
'message' => 'value is outside allowed range',
'type' => '//int'
},
{
'type' => '//all',
'check_path' => [ [ 'of', 'key' ], [ 1, 'index' ] ]
},
$VAR1->{'failures'}[0]{'struct'}[2],
$VAR1->{'failures'}[0]{'struct'}[3]
],
},
'Data::Rx::Failure'
) ]
},
'Data::Rx::FailureSet'
); Notable facts:
Certainly what you describe could be done, and it wouldn't be too hard. It may be better for some applications, if you're going to walk the error object, especially if you walk it in sync with the data structure. The existing implementation may have no particular benefit over it, in fact. I think it's just how we ended up doing it! If I think of some benefit beyond "was easy to implement" I'll add it here. ;) |
wanted reference to variable `name`, not the string '`name`'
Back. I think it would be easier for me to implement the tree structure, and personally I think a tree better conceptually represents the nature of this data. Additionally, a tree makes it easy to implement error messages which are less redundant, by grouping all the failures associated with a particular container under a "section" for that container. For example, my current implementation of the error messages does this with your example: import Rx
import yaml
schema_struct = yaml.load('''
---
type: //rec
required:
FOO:
type: //rec
required:
BAR:
type: //all
of :
- { type: //int, range: { min: 1 } }
- { type: //int, range: { min: 2 } }
''')
rx = Rx.Factory()
schema = rx.make_schema(schema_struct);
schema.validate({'FOO': {'BAR': 0}}) raises an error with the message:
which, while some details could be improved upon, I think is a generally nicer message. With the error tree, the |
I realize now that maybe you posted the above with the thought that I would reply. Just in case: your plan sounds good. 😉 |
Update: I've been working on this and am close to done, but school's started up again and its slowing me down a lot. I'll have a commit with the structured errors as soon as I can, hopefully within a week. |
Hi, it seems that this PR has already been merged. Except the last commit of CesiumLifeJacket, everything is already in master! |
Just applied that last commit as e0368c9. Because this ticket acquired another purpose, I'm leaving it open, but I'll change the title to avoid the apeparance that we're not py3 ready. Thanks for the nudge! |
👍 @CesiumLifeJacket I look at your proposition for the error management and I like it (especially having the full path to the key that raised the error(s) ). @rjbs I think |
Thanks for commenting Oliverpool, you made me actually follow up on this. I've updated my version of Rx.py with a rough draft of the structured error classes. The schemas that can act like branches now raise >>> import Rx
>>> f = Rx.Factory()
>>> import yaml
>>> s = f.make_schema(yaml.load('''
... type: //seq
... contents:
... - //int
... - //nil
... - type: //rec
... required:
... foo: //int
... bar: //int
... optional:
... baz:
... type : //arr
... contents: //int
... length:
... min: 1
... max: 3
... '''))
>>> s.validate([1, None, {'foo': 1, 'baz': [3, 4, 5, 6.2, 7], 'bar': 2}])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/jeremy/Documents/repos/rx/python/Rx.py", line 608, in validate
raise mismatch;
Rx.SchemaTreeMismatch: [2] ['baz'] does not match schema requirements:
length must be in range [1, 3]
[3] must be an integer
There are some things about this system I'm not in love with and am already thinking about changing, but if you have any feedback about the direction I'm taking this before I get in too deep, it would be appreciated. |
I really like it! 👍 To emphasis the tree structure, maybe you could print The errors could also report the values that where found:
but I think the formulation can be improved |
Thanks for the feedback. What do you mean by formulation? Can you be more specific? One thing I was thinking about changing was to make all the Mismatch exceptions more like the |
is aimed at my proposal: the words and sentences that I propose could be improved. Your "data storage proposition" sounds great to me: it allows on the developer side to have a direct insight of the error if needed. |
Latest commit is a rework of the way Mismatch exceptions are designed, with your proposed changes implemented. Still probably needs a little polishing but I'm much happier with the overall structure now. That same code now produces an error message like this:
|
While I'm at it, another thing I've been thinking about is the Factory class. What utility does this class provide that couldn't be gotten from simply making the Rx module behave like an instance of Factory? If, for example, instead of writing import Rx
rx = Rx.Factory()
schema = rx.make_schema(... you would just write import Rx
schema = Rx.make_schema(... I feel like the vast majority of the time, the ladder would serve just fine, and saves a line of code. If there is some other functionality this eliminates, would it be possible to add that functionality back in as the exception, instead of the rule? I think this change would make using Rx for the first time a little more approachable. |
I'm totally satisfied with the error message that is now displayed (the
I agree with this proposal! |
The no-factory branch of my fork of the repository makes that change, but I'd like to hear Ricardo's opinion on this, because it feels like I'm removing some functionality that I don't quite appreciate. Also, minor stylistic decision: The Type classes are ordered alphabetically, so should the Mismatch exceptions be ordered this way as well? Right now they're kind of thematically arranged, with similar exceptions grouped together and generally increasing in complexity as you go down. I like the idea of alphabetical ordering, but it would separate some very similar exception classes, such as |
Personally, I either read the complete code linearly (so grouping similar exceptions makes sense) or when I search for something I use my editor tools (so I don't care about the ordering). Apart from that, I can't tell... |
Thanks for your patience during the holidays while I ignored most of my repositories! I am opposed to removing the factory. The factory provides a useful layer of indirection so you can do things like provide your own implementation of core types on a per-Factory basis. Your patch makes this impossible because it unconditionally registers the core types. To allow different sets of checkers to exist in one process, it needs to be possible to have two factories. Put another way: removing the factory makes Rx configuration global and less flexible. |
It's nice to see this getting some love! I agree with your suggestion: better to have more data-rich errors than to package it all up into human-readable strings. Also, note that the Rx test suite wants keywords on errors. These should be easy to add with that change, though! |
I think I understand your point. Instead of removing the import Rx
schema = Factory.standard_schema(yaml_like_object) I think the default core types are enough for the majority of people: it can be good to simplify their first contact with |
I'm not sure I 100% follow your example, but I think what you're suggesting is the same thing I was going import Rx
schema = Rx.make_schema(...) ...where that method would be something like: std_factory = None
def make_schema(self, schema):
global std_factory
if std_factory is None: std_factory = Factory()
return std_factory.make_schema(schema) Right? |
Right! Latest commit is a file with the old The next step seems to me to update |
I am back after years in the wilderness, and wonder whether this PR should be closed, left open, or other. If nothing else, maybe I am finally ready to split the repo, as it's often asked for… |
For my own project, I have updated Rx.py to be compatible with both Python 2 and 3. I have also rewritten a lot of the code to be slightly more concise/modern/pythonic.
The other major feature is that the types'
check()
methods now have an augmented version calledvalidate()
, which instead of simply returningTrue
orFalse
, raises aSchemaMismatch
exception with an informative error message when the value doesn't match the schema.I've also changed the behavior of
Factory.__init__()
to register core types automatically, and takeregister_core_types
as a boolean argument instead of as a key in a dictionary. I am unsure why this argument was a dictionary in the first place, and what this afforded the user, so this may be a change worth reverting.