
[DAR-4036][External] E2E tests for push and some coverage for import #928

Merged

JBWilkie merged 6 commits into master on Sep 26, 2024

Conversation

@JBWilkie (Collaborator) commented Sep 20, 2024

Problem

darwin-py's E2E tests are sparse

Solution

Add an `e2e_tests/cli/test_push.py` file with the following tests:

  • test_push_mixed_filetypes - Test pushing a directory of files containing various filetypes and verify they finish processing
  • test_push_nested_directory_of_images - Test pushing a nested directory structure of images with the `preserve_folders` flag. Verify that they finish processing and end up in the correct remote paths
  • test_push_videos_with_non_native_fps - Test that if FPS is set, the value is respected in the resulting video items

These tests will wait a maximum of 10 minutes for all items to finish processing. If this timeout is exceeded, the test will fail.
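As a rough sketch, the waiting behaviour described above could look like the following. This is not the PR's actual implementation; `get_item_processing_statuses` is a hypothetical helper standing in for however the tests query item status.

```python
import time


def wait_until_items_processed(
    dataset, timeout_seconds: int = 600, poll_interval: int = 10
) -> None:
    """Poll until every item in the dataset finishes processing, or time out."""
    deadline = time.time() + timeout_seconds
    while time.time() < deadline:
        # Hypothetical helper: returns one status string per pushed item.
        statuses = get_item_processing_statuses(dataset)
        if all(status == "complete" for status in statuses):
            return
        time.sleep(poll_interval)
    raise TimeoutError("Items did not finish processing within the timeout")
```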

Add an `e2e_tests/cli/test_import.py` file with the following tests:

  • test_import_basic_annotations_to_images - Test importing a set of basic image annotations (no sub-types or properties) to a set of pre-registered files in a dataset
  • test_import_annotations_with_subtypes_to_images - Test importing a set of image annotations including subtypes & properties to a set of pre-registered files in a dataset
  • test_annotation_classes_are_created_on_import - Test that importing non-existent annotation classes creates those classes in the target Darwin team
  • test_annotation_classes_are_created_with_properties_on_import - Test that importing non-existent annotation classes with properties creates those classes and properties in the target Darwin team
  • test_appending_annotations - Test that appending annotations to an item with already existing annotations does not overwrite the original annotations
  • test_overwriting_annotations - Test that the `--overwrite` flag allows bypassing the overwrite warning when importing to items with already existing annotations
  • test_annotation_overwrite_warning - Test that importing annotations to an item with already existing annotations throws a warning if not using the `--append` or `--overwrite` flags (a sketch of these invocations follows this list)
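Taken together, the append/overwrite behaviour above corresponds to CLI invocations like these. This is a sketch using the `run_cli_command` helper quoted later in this review; `dataset_name` and `import_dir` are placeholders.

```python
# First import: creates the annotations.
result = run_cli_command(f"darwin dataset import {dataset_name} darwin {import_dir}")

# Re-importing with --append keeps the originals and adds the new annotations.
result = run_cli_command(
    f"darwin dataset import {dataset_name} darwin {import_dir} --append"
)

# Re-importing with --overwrite bypasses the overwrite warning and replaces them.
result = run_cli_command(
    f"darwin dataset import {dataset_name} darwin {import_dir} --overwrite"
)

# Re-importing with neither flag triggers the overwrite warning described above.
```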

Changelog

Expanded darwin-py E2E tests


linear bot commented Sep 20, 2024

@wiz-inc-4ad3b29aa7 bot commented Sep 23, 2024

Wiz Scan Summary

| Scan Module           | Critical | High | Medium | Low | Info | Total |
| --------------------- | -------- | ---- | ------ | --- | ---- | ----- |
| IaC Misconfigurations | 0        | 0    | 0      | 0   | 0    | 0     |
| Vulnerabilities       | 0        | 2    | 1      | 0   | 0    | 3     |
| Sensitive Data        | 0        | 0    | 0      | 0   | 0    | 0     |
| Secrets               | 0        | 0    | 0      | 0   | 0    | 0     |
| Total                 | 0        | 2    | 1      | 0   | 0    | 3     |

View scan details in Wiz

To detect these findings earlier in the dev lifecycle, try using Wiz Code VS Code Extension.

Comment on lines 72 to 75:

    for annotation in actual_annotations
    if annotation.data == expected_annotation_data
    and annotation.annotation_class.annotation_type
    == expected_annotation_type

JBWilkie (Collaborator, Author):

This is necessary because the `data` field of both the tag and mask types is just `{}`, so we need to check that `annotation_type` matches too.

Comment on lines +75 to +76:

    # Prefix the command with 'poetry run' to ensure it runs in the Poetry shell
    command = f"poetry run {command}"

JBWilkie (Collaborator, Author):

This makes running E2Es locally less error-prone by forcing them to run in the Poetry shell that's guaranteed to point to the correct darwin-py installation.

Member:

btw, Python also has `sys.executable` 😄

JBWilkie (Collaborator, Author):

I didn't know this, it's much nicer. Thank you!
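For context, a minimal sketch of what the suggestion enables: invoking the CLI through the interpreter that is running the tests, so the command is guaranteed to use the same environment (and darwin-py install) as the test runner, with no `poetry run` prefix needed. The `darwin.cli` module path and this helper signature are assumptions for illustration, not confirmed by the PR.

```python
import subprocess
import sys


def run_cli_command(command_args: list[str]) -> subprocess.CompletedProcess:
    # sys.executable points at the interpreter running this process, so the
    # subprocess inherits the same virtual environment automatically.
    # "darwin.cli" as a runnable module is an assumption for this sketch.
    return subprocess.run(
        [sys.executable, "-m", "darwin.cli", *command_args],
        capture_output=True,
        text=True,
    )
```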

@umbertoDifa left a comment:

Really excited for these E2Es!

@@ -0,0 +1,33 @@

    # from pathlib import Path

Reviewer:

Is this supposed to be commented out?

Reviewer:

I see there are TODOs also in other files; let's make sure to clean them up before merge.

JBWilkie (Collaborator, Author):

Removed all the TODOs. They are for future E2E PRs, so they don't have to be present here.

Comment on lines 17 to 38:

    def validate_downloaded_annotations(
        tmp_dir: Path, import_dir: Path, appending: bool = False
    ):
        """
        Validate that the annotations downloaded from a release match the annotations in
        a particular directory, ignoring hidden files.

        If `appending` is set, then the number of actual annotations should exceed the
        number of expected annotations
        """
        annotations_dir = tmp_dir / "annotations"
        with zipfile.ZipFile(tmp_dir / "dataset.zip") as z:
            z.extractall(annotations_dir)
            expected_annotation_files = {
                file.name: str(file)
                for file in import_dir.iterdir()
                if file.is_file() and not file.name.startswith(".")
            }
            actual_annotation_files = {
                file.name: str(file)
                for file in annotations_dir.iterdir()
                if file.is_file() and not file.name.startswith(".")
            }

Reviewer:

If I read `validate_downloaded_annotations`, what can I expect `tmp_dir` to be? Is `tmp_dir` the expected or the actual result? This is of course more visible to me, not knowing the internals, whilst I understand it might be obvious to you.
I'm thinking that we should stick to either export/import naming or actual/expected. It looks like we're using both naming conventions within the same function, and IMO this might be confusing.
e.g.

    def validate_downloaded_annotations(
        export_dir: Path, import_dir: Path, appending: bool = False
    ):

or

    def validate_downloaded_annotations(
        actual_annotations_dir: Path, expected_annotations_dir: Path, appending: bool = False
    ):

Reviewer:

We could also think about `compare_annotations_export(export_dir, import_dir, ...)`.
I'm nitpicking over naming just because it makes it easier for me to read the code, but it's up to you to evaluate these comments.

JBWilkie (Collaborator, Author):

This is all sensible feedback; no reason not to improve the naming convention for others who will work on these tests in the future.

Comment on lines 34 to 38:

    actual_annotation_files = {
        file.name: str(file)
        for file in annotations_dir.iterdir()
        if file.is_file() and not file.name.startswith(".")
    }

Reviewer:

This doesn't have to be inside the `with` block, right?

Comment on lines 48 to 56:

    # Delete generated UUIDs as these will break asserting equality
    for annotation in expected_annotations:
        del annotation.id  # type: ignore
        if annotation.annotation_class.annotation_type == "raster_layer":
            del annotation.data["mask_annotation_ids_mapping"]  # type: ignore
    for annotation in actual_annotations:
        del annotation.id  # type: ignore
        if annotation.annotation_class.annotation_type == "raster_layer":
            del annotation.data["mask_annotation_ids_mapping"]  # type: ignore

Reviewer:

My rule of thumb: if I need to add a comment to explain a section, then that section is a method, e.g. `delete_annotations_uuids`.
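A minimal sketch of the suggested extraction, assuming the helper name from the reviewer's comment; the annotation shape is taken from the quoted snippet.

```python
def delete_annotations_uuids(annotations) -> None:
    """Strip generated UUIDs that would otherwise break equality assertions."""
    for annotation in annotations:
        del annotation.id  # type: ignore
        if annotation.annotation_class.annotation_type == "raster_layer":
            del annotation.data["mask_annotation_ids_mapping"]  # type: ignore


# The two loops in the quoted snippet then collapse to:
#     delete_annotations_uuids(expected_annotations)
#     delete_annotations_uuids(actual_annotations)
```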

Comment on lines 65 to 93:

    expected_annotation_data = expected_annotation.data
    expected_annotation_type = (
        expected_annotation.annotation_class.annotation_type
    )
    actual_annotation = next(
        (
            annotation
            for annotation in actual_annotations
            if annotation.data == expected_annotation_data
            and annotation.annotation_class.annotation_type
            == expected_annotation_type
        ),
        None,
    )
    assert (
        actual_annotation is not None
    ), "Annotation not found in actual annotations"

    # Properties must be compared separately because `properties` is a list with
    # variable order; differences in order would cause the assertion to fail
    if expected_annotation.properties:
        assert actual_annotation.properties is not None
        expected_properties = expected_annotation.properties
        del expected_annotation.properties
        actual_properties = actual_annotation.properties
        del actual_annotation.properties
        for expected_property in expected_properties:
            assert expected_property in actual_properties
    assert expected_annotation == actual_annotation

Reviewer:

I'd split this into assert_same_annotation_data and assert_same_annotations_properties. Ideally this function then reads more smoothly, with no need for comments, and I can check the sub-functions separately without keeping all the context in mind.
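A rough sketch of that split, reusing the matching logic from the quoted snippet. The function names follow the reviewer's suggestion; the signatures are illustrative, not the PR's final code.

```python
def assert_same_annotation_data(expected_annotation, actual_annotations):
    """Return the actual annotation whose data and type match the expected one."""
    actual_annotation = next(
        (
            annotation
            for annotation in actual_annotations
            if annotation.data == expected_annotation.data
            and annotation.annotation_class.annotation_type
            == expected_annotation.annotation_class.annotation_type
        ),
        None,
    )
    assert actual_annotation is not None, "Annotation not found in actual annotations"
    return actual_annotation


def assert_same_annotations_properties(expected_annotation, actual_annotation):
    """Compare properties irrespective of their order in the list."""
    if expected_annotation.properties:
        assert actual_annotation.properties is not None
        for expected_property in expected_annotation.properties:
            assert expected_property in actual_annotation.properties
```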

    def test_import_basic_annotations_to_images(
        local_dataset: E2EDataset, config_values: ConfigValues
    ) -> None:
        """
        Test importing a set of basic annotations (no sub-types or properties) to a set of pre-registered files in a dataset.

Reviewer:

*basic annotations*: I think this is an arbitrary concept. I see what you're trying to do, and I don't have a better way to define it anyway, so I think the doc is very useful.

Reviewer:

Reading below, I'm wondering if this should just be `test_import_annotations_without_subtypes_to_images`.

Comment on lines 110 to 113:

    with tempfile.TemporaryDirectory() as tmp_dir_str:
        tmp_dir = Path(tmp_dir_str)
        export_and_download_annotations(tmp_dir, local_dataset, config_values)
        validate_downloaded_annotations(tmp_dir, import_dir)

Reviewer:

Nice and clean 💯

Comment on lines 191 to 210:

    Test that appending annotations to an item with already existing annotations does not overwrite the original annotations
    """
    local_dataset.register_read_only_items(config_values)
    import_dir = (
        Path(__file__).parents[1] / "data" / "import" / "image_basic_annotations"
    )
    # 1st import to create annotations
    result = run_cli_command(
        f"darwin dataset import {local_dataset.name} darwin {import_dir}"
    )
    assert_cli(result, 0)
    # 2nd import to append more annotations
    result = run_cli_command(
        f"darwin dataset import {local_dataset.name} darwin {import_dir} --append"
    )
    assert_cli(result, 0)
    with tempfile.TemporaryDirectory() as tmp_dir_str:
        tmp_dir = Path(tmp_dir_str)
        export_and_download_annotations(tmp_dir, local_dataset, config_values)
        validate_downloaded_annotations(tmp_dir, import_dir, appending=True)

Reviewer:

I see.
Or: we could import image_basic_annotations in two steps (half and half); I don't think this is super easy, as it means truncating the file in the middle. That would allow us to validate without `appending=True`, because we'd know exactly that we expect the full image_basic_annotations in the export.
Or: we could split image_basic_annotations into two smaller files to begin with.

The only reason I'm thinking this is that `validate_downloaded_annotations` only checks that the export is bigger than the import, but it's not a super-strict check on the fact that (in this case) they have to be exactly double. Am I missing something?

JBWilkie (Collaborator, Author):

This makes sense. What I can do is create a new import_dir specifically for this test containing 2 files (half & half); then I can remove the appending concept from `validate_downloaded_annotations`.
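A hypothetical sketch of that plan, following the command shape in the quoted test; the `first_half_dir` and `second_half_dir` names are placeholders, not the PR's actual directory names.

```python
# Import each half separately, then validate strictly against the union of
# both directories rather than relying on a looser "appending" check.
result = run_cli_command(
    f"darwin dataset import {local_dataset.name} darwin {first_half_dir}"
)
assert_cli(result, 0)
result = run_cli_command(
    f"darwin dataset import {local_dataset.name} darwin {second_half_dir} --append"
)
assert_cli(result, 0)
# validate_downloaded_annotations can now assert exact equality against the
# combined expected set, with no `appending` escape hatch.
```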

@@ -0,0 +1,112 @@

    from pathlib import Path

Reviewer:

Beautiful tests.

/ "image_annotations_split_in_two_files"
)
result = run_cli_command(
f"darwin dataset import {local_dataset.name} darwin {expected_annotations_dir}"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this include --append?

f"darwin dataset import {local_dataset.name} darwin {expected_annotations_dir}"
)
assert_cli(result, 0)
assert_cli(result, 0)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double assert?

@JBWilkie merged commit 44ef00d into master on Sep 26, 2024. 27 checks passed.