Allow per item validators to be specified by string #419

Open
wants to merge 3 commits into
base: master
11 changes: 11 additions & 0 deletions docs/source/item-validation.rst
@@ -141,6 +141,17 @@ as a `dict`:
OtherItem: '/path/to/otheritem_schema.json',
}

Keys of ``dict`` can also be strings matching item class names:

.. code-block:: python

SPIDERMON_VALIDATION_SCHEMAS = {
'DummyItem': '/path/to/dummyitem_schema.json',
'OtherItem': '/path/to/otheritem_schema.json',
Comment on lines +149 to +150
Member

@Gallaecio Gallaecio Sep 15, 2023

Hmm… I don’t like that this is done by name instead of import path. Would you be OK with switching approaches? It is how Scrapy settings handle such situations (in fact, Scrapy settings first supported import paths as strings, and later also added support for actual objects).

}



Validation in Monitors
----------------------

2 changes: 1 addition & 1 deletion spidermon/contrib/scrapy/pipelines.py
@@ -56,7 +56,7 @@ def set_validators(loader, schema):
if type(schema) in (list, tuple):
schema = {Item: schema}
for obj, paths in schema.items():
-key = obj.__name__
+key = obj.__name__ if hasattr(obj, "__name__") else str(obj)
paths = paths if type(paths) in (list, tuple) else [paths]
objects = [loader(v) for v in paths]
validators[key].extend(objects)
Member

Hmm…

So, I am not very familiar with Spidermon code, but it feels wrong that the pre-existing code was only taking into account obj.__name__. Sounds like different item types with the same name but imported from different modules would be validated with the same schema.

I find the pre-existing code a bit hard to read, though, so I might be misreading.

Author

Your read is correct. The find_validators method at line 126 only uses item.__class__.__name__ to select a validator from the dict, so item classes with the same name from different modules still get the same schema. That is why I went with plain class names for the string alternative: they were already stored that way internally, so the change was small.
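To make the collision concrete, here is a minimal, self-contained sketch. It is not Spidermon's actual code: `make_item_class`, the module names, and the schema file names are hypothetical, and the two functions only imitate the keying described in this thread (store by `obj.__name__`, look up by `item.__class__.__name__`).

```python
from collections import defaultdict


def make_item_class(module_name):
    # Hypothetical helper: builds a class named "Product" that reports
    # the given module as its home, imitating two same-named item classes.
    return type("Product", (), {"__module__": module_name})


ShopProduct = make_item_class("shop.items")
BlogProduct = make_item_class("blog.items")

validators = defaultdict(list)


def set_validators(schema):
    # Imitates the keying shown in the diff: only the bare class name is kept.
    for obj, paths in schema.items():
        key = obj.__name__
        paths = paths if isinstance(paths, (list, tuple)) else [paths]
        validators[key].extend(paths)


def find_validators(item):
    # Imitates the lookup described above: by bare class name only.
    return validators.get(item.__class__.__name__, [])


set_validators({ShopProduct: "shop_schema.json", BlogProduct: "blog_schema.json"})

shop_validators = find_validators(ShopProduct())
blog_validators = find_validators(BlogProduct())
```

Under these assumptions both lookups return the same two schemas, because both classes collapse to the key `"Product"`.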

I like the idea of using fully qualified import paths as keys, but it would take some careful thought to stay backwards compatible (unless we want a breaking change?). Internally we'd have to store validators by import path, and then during lookup, if we fail to find a given item class, we could loop over the keys and compare only the class-name part.

Author

Actually, it might be cleaner to just store two keys internally, the fully qualified path and the class name, and then look them up in that order. No more looping over strings and splitting them.
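A rough sketch of that two-key idea, under stated assumptions: `register`, `qualified_name`, `make_item_class`, and the module and schema names are all hypothetical and do not exist in Spidermon. Each class is stored under both its qualified path and its bare name, and lookup tries the qualified path first.

```python
def make_item_class(module_name):
    # Hypothetical helper: a class named "Product" that reports the given module.
    return type("Product", (), {"__module__": module_name})


def qualified_name(cls):
    # Fully qualified key, e.g. "shop.items.Product".
    return f"{cls.__module__}.{cls.__qualname__}"


validators = {}


def register(cls, schema_path):
    # Store the schema under both keys: qualified path and bare class name.
    for key in (qualified_name(cls), cls.__name__):
        validators.setdefault(key, []).append(schema_path)


def find_validators(item):
    cls = item.__class__
    # Prefer the precise key; fall back to the legacy bare-name key.
    for key in (qualified_name(cls), cls.__name__):
        if key in validators:
            return validators[key]
    return []


ShopProduct = make_item_class("shop.items")
BlogProduct = make_item_class("blog.items")
OtherProduct = make_item_class("other.items")  # never registered

register(ShopProduct, "shop_schema.json")
register(BlogProduct, "blog_schema.json")

shop_result = find_validators(ShopProduct())
blog_result = find_validators(BlogProduct())
other_result = find_validators(OtherProduct())
```

In this sketch registered classes resolve to their own schema, while an unregistered class with the same name still falls back to the shared bare-name entry, which is the legacy behavior the thread wants to avoid breaking.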

Author

I finally came back to look at this again, and it is a bigger lift than I have time for right now, especially since I am less familiar with this code myself. Besides requiring changes to the existing internal representation and all the related tests, it would likely be considered a breaking change, since someone could be inadvertently relying on the (admittedly bad) behavior of mapping only by the unqualified class name, and it would need careful documentation.

Can I request that you either accept this PR as is, since it's consistent with the existing code, or reject it and turn @Gallaecio's request into its own issue?

Member

@Gallaecio Gallaecio Nov 17, 2023

What if we keep the current behavior but require strings to be complete import paths, load them with load_object, and then use their name?

obj = load_object(obj)
key = obj.__name__

It’s backward-compatible, and while not fixing the “comparison by name” issue now, it does allow us to fix it in the future if we want without requiring user code to change, which would not be possible if we started allowing class names as strings now.
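As an illustration only, the suggestion could look roughly like the sketch below. The `load_object` defined here is a simplified stand-in written with `importlib`, not Scrapy's `scrapy.utils.misc.load_object`, and standard-library classes stand in for item classes so the snippet runs without Scrapy or Spidermon installed. The schema file names are made up.

```python
from collections import defaultdict
from fractions import Fraction
from importlib import import_module


def load_object(path):
    # Simplified stand-in (an assumption, not Scrapy's implementation):
    # non-string objects pass through, strings are resolved as import paths.
    if not isinstance(path, str):
        return path
    module_path, _, name = path.rpartition(".")
    return getattr(import_module(module_path), name)


validators = defaultdict(list)


def set_validators(schema):
    for obj, paths in schema.items():
        obj = load_object(obj)  # accepts a class or a full import path
        key = obj.__name__      # internal keying stays by bare class name
        paths = paths if isinstance(paths, (list, tuple)) else [paths]
        validators[key].extend(paths)


set_validators(
    {
        "collections.OrderedDict": "ordered_schema.json",  # key given as import path
        Fraction: ["fraction_schema.json"],                # key given as a class
    }
)

result = dict(validators)
```

Because users would write full import paths, the internal key could later switch from the bare name to the qualified path without any change to user settings.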

Author

That seems like it would work well as a compromise, keeping the existing behavior intact while solving the problem I was originally dealing with.

12 changes: 12 additions & 0 deletions tests/contrib/scrapy/test_pipelines.py
@@ -203,6 +203,18 @@ class PipelineJSONSchemaValidator(PipelineTest):
assert_type_in_stats(TreeItem),
],
),
DataTest(
name="validators specified by str rather than class",
item=TreeItem(),
settings={
SETTING_SCHEMAS: {"TestItem": test_schema, "TreeItem": tree_schema}
},
cases=[
f"{{stats}}['{STATS_MISSINGS}'] is 1",
assert_type_in_stats(TestItem),
assert_type_in_stats(TreeItem),
],
),
]

