[DAR-4036][External] E2E tests for push and some coverage for import #928
Conversation
e2e_tests/cli/test_import.py
for annotation in actual_annotations
if annotation.data == expected_annotation_data
and annotation.annotation_class.annotation_type
== expected_annotation_type
This is necessary because the data field of both the tag and mask types is just {}, so we need to check that annotation_type matches too.
# Prefix the command with 'poetry run' to ensure it runs in the Poetry shell
command = f"poetry run {command}"
This makes running E2Es locally less error prone by forcing them to run in the poetry shell that's guaranteed to point to the correct darwin-py installation.
btw, python also has sys.executable 😄
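For illustration, the poetry run prefix could be swapped for the interpreter that is running the tests. A minimal sketch, assuming the CLI can be invoked as the darwin.cli module; the helper shape below is hypothetical, not the PR's actual change:

import subprocess
import sys

def run_cli_command(command: str) -> "subprocess.CompletedProcess[str]":
    # Run the darwin CLI through the same interpreter that is executing the
    # tests, so the correct darwin-py installation is used without relying on
    # an active poetry shell. (Assumes darwin.cli is runnable with -m.)
    assert command.startswith("darwin ")
    args = [sys.executable, "-m", "darwin.cli", *command.split()[1:]]
    return subprocess.run(args, capture_output=True, text=True)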
I didn't know this, it's much nicer. thank you!
Really excited for these e2e!
e2e_tests/cli/test_convert.py
@@ -0,0 +1,33 @@
# from pathlib import Path
Is this supposed to be commented out?
I see there are TODOs in other files too; let's make sure to clean them up before merge.
Removed all the TODOs. They are for future E2E PRs, so they don't have to be present here.
e2e_tests/cli/test_import.py
    tmp_dir: Path, import_dir: Path, appending: bool = False
):
    """
    Validate that the annotations downloaded from a release match the annotations in
    a particular directory, ignoring hidden files.

    If `appending` is set, then the number of actual annotations should exceed the
    number of expected annotations
    """
    annotations_dir = tmp_dir / "annotations"
    with zipfile.ZipFile(tmp_dir / "dataset.zip") as z:
        z.extractall(annotations_dir)
    expected_annotation_files = {
        file.name: str(file)
        for file in import_dir.iterdir()
        if file.is_file() and not file.name.startswith(".")
    }
    actual_annotation_files = {
        file.name: str(file)
        for file in annotations_dir.iterdir()
        if file.is_file() and not file.name.startswith(".")
    }
If I read validate_downloaded_annotations, what can I expect tmp_dir to be? Is tmp_dir the expected or the actual result? This is of course more visible to me, not knowing the internals, whilst I understand it might be obvious to you.
I'm thinking that we should either stick to export/import naming or actual/expected. It looks like we're using both naming conventions within the same function and IMO this might be confusing.
e.g.
def validate_downloaded_annotations(
    export_dir: Path, import_dir: Path, appending: bool = False
):
or
def validate_downloaded_annotations(
    actual_annotations_dir: Path, expected_annotations_dir: Path, appending: bool = False
):
We could also think about something like compare_annotations_export(export_dir, import_dir, ...).
I'm nitpicking over naming, just because it makes it easier for me to read the code, but up to you to evaluate these comments.
This is all sensible feedback; there's no reason not to improve the naming convention for others who will work on these tests in the future.
e2e_tests/cli/test_import.py
actual_annotation_files = {
    file.name: str(file)
    for file in annotations_dir.iterdir()
    if file.is_file() and not file.name.startswith(".")
}
This doesn't have to be in the with block, right?
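For example, the listing could move outside the context manager. A small sketch based on the snippet quoted earlier (variable names come from that snippet); behaviour is unchanged:

# The archive only needs to be open while extracting; the directory listing
# can run after the with block has closed.
with zipfile.ZipFile(tmp_dir / "dataset.zip") as z:
    z.extractall(annotations_dir)

actual_annotation_files = {
    file.name: str(file)
    for file in annotations_dir.iterdir()
    if file.is_file() and not file.name.startswith(".")
}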
e2e_tests/cli/test_import.py
# Delete generated UUIDs as these will break asserting equality
for annotation in expected_annotations:
    del [annotation.id]  # type: ignore
    if annotation.annotation_class.annotation_type == "raster_layer":
        del [annotation.data["mask_annotation_ids_mapping"]]  # type: ignore
for annotation in actual_annotations:
    del [annotation.id]  # type: ignore
    if annotation.annotation_class.annotation_type == "raster_layer":
        del [annotation.data["mask_annotation_ids_mapping"]]  # type: ignore
My rule of thumb: if I need to add a comment to explain a section, then that section is a method, e.g. delete_annotations_uuids.
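A rough sketch of that extraction; the helper name follows the suggestion above and the body reuses the loop from the quoted diff, so this is illustrative rather than the PR's actual change:

def delete_annotations_uuids(annotations) -> None:
    # Strip generated UUIDs (and the raster-layer ID mapping) so that equality
    # assertions are not broken by values created at import time.
    for annotation in annotations:
        del [annotation.id]  # type: ignore
        if annotation.annotation_class.annotation_type == "raster_layer":
            del [annotation.data["mask_annotation_ids_mapping"]]  # type: ignore

delete_annotations_uuids(expected_annotations)
delete_annotations_uuids(actual_annotations)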
e2e_tests/cli/test_import.py
expected_annotation_data = expected_annotation.data
expected_annotation_type = (
    expected_annotation.annotation_class.annotation_type
)
actual_annotation = next(
    (
        annotation
        for annotation in actual_annotations
        if annotation.data == expected_annotation_data
        and annotation.annotation_class.annotation_type
        == expected_annotation_type
    ),
    None,
)
assert (
    actual_annotation is not None
), "Annotation not found in actual annotations"

# Properties must be compared separately because the order of properties
# is a list with variable order. Differences in order will cause assertion failure
if expected_annotation.properties:
    assert actual_annotation.properties is not None
    expected_properties = expected_annotation.properties
    del expected_annotation.properties
    actual_properties = actual_annotation.properties
    del actual_annotation.properties
    for expected_property in expected_properties:
        assert expected_property in actual_properties
assert expected_annotation == actual_annotation
I'd split this into assert_same_annotation_data and assert_same_annotations_properties. Ideally this function then reads more smoothly, there's no need for comments, and I can check the sub-functions separately without keeping all the context in mind.
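One possible shape for that split, as a sketch only: the signatures and the returned value are assumptions layered on the quoted code, and the names follow the suggestion above.

def assert_same_annotation_data(expected_annotation, actual_annotations):
    # Find the actual annotation whose data and type both match the expected one
    # (tag and mask annotations share empty data, so the type check is required).
    actual_annotation = next(
        (
            annotation
            for annotation in actual_annotations
            if annotation.data == expected_annotation.data
            and annotation.annotation_class.annotation_type
            == expected_annotation.annotation_class.annotation_type
        ),
        None,
    )
    assert actual_annotation is not None, "Annotation not found in actual annotations"
    return actual_annotation


def assert_same_annotations_properties(expected_annotation, actual_annotation):
    # Properties are compared order-independently, since their order can vary.
    if expected_annotation.properties:
        assert actual_annotation.properties is not None
        for expected_property in expected_annotation.properties:
            assert expected_property in actual_annotation.properties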
e2e_tests/cli/test_import.py
    local_dataset: E2EDataset, config_values: ConfigValues
) -> None:
    """
    Test importing a set of basic annotations (no sub-types or properties) to a set of pre-registered files in a dataset.
Regarding "basic annotations": I think this is an arbitrary concept. I see what you are trying to do and I don't have a better way to define it anyway, so I think the doc is very useful.
Reading below, I'm wondering if this should just be test_import_annotations_without_subtypes_to_images.
e2e_tests/cli/test_import.py
with tempfile.TemporaryDirectory() as tmp_dir_str:
    tmp_dir = Path(tmp_dir_str)
    export_and_download_annotations(tmp_dir, local_dataset, config_values)
    validate_downloaded_annotations(tmp_dir, import_dir)
nice and clean 💯
e2e_tests/cli/test_import.py
    Test that appending annotations to an item with already existing annotations does not overwrite the original annotations
    """
    local_dataset.register_read_only_items(config_values)
    import_dir = (
        Path(__file__).parents[1] / "data" / "import" / "image_basic_annotations"
    )
    # 1st import to create annotations
    result = run_cli_command(
        f"darwin dataset import {local_dataset.name} darwin {import_dir}"
    )
    assert_cli(result, 0)
    # 2nd import to append more annotations
    result = run_cli_command(
        f"darwin dataset import {local_dataset.name} darwin {import_dir} --append"
    )
    assert_cli(result, 0)
    with tempfile.TemporaryDirectory() as tmp_dir_str:
        tmp_dir = Path(tmp_dir_str)
        export_and_download_annotations(tmp_dir, local_dataset, config_values)
        validate_downloaded_annotations(tmp_dir, import_dir, appending=True)
I see. Or: we could import image_basic_annotations in two steps (half and half), though I don't think this is super easy since it means truncating the file in the middle. This would allow us to validate without appending=True, because we'd know exactly that we expect the full annotations from image_basic_annotations in the export. Or: we could split image_basic_annotations into two smaller files to begin with.
The only reason I'm thinking this is that validate_downloaded_annotations only checks that the export is bigger than the import, but it's not a super-strict check on the fact that (in this case) they have to be exactly double. Am I missing something?
This makes sense. What I can do is create a new import_dir specifically for this test containing 2 files (half & half), then I can remove the appending concept from validate_downloaded_annotations.
@@ -0,0 +1,112 @@
from pathlib import Path
beautiful tests
e2e_tests/cli/test_import.py
/ "image_annotations_split_in_two_files" | ||
) | ||
result = run_cli_command( | ||
f"darwin dataset import {local_dataset.name} darwin {expected_annotations_dir}" |
shouldn't this include --append?
e2e_tests/cli/test_import.py
f"darwin dataset import {local_dataset.name} darwin {expected_annotations_dir}" | ||
) | ||
assert_cli(result, 0) | ||
assert_cli(result, 0) |
double assert?
Problem

darwin-py's E2E tests are sparse.

Solution

Add an e2e_tests/cli/test_push.py file with the following tests:

- test_push_mixed_filetypes - Test pushing a directory of files containing various filetypes and verify they finish processing
- test_push_nested_directory_of_images - Test pushing a nested directory structure of some images with the preserve_folders flag. Verify they finish processing and end up in the correct remote paths
- test_push_videos_with_non_native_fps - Test that if FPS is set, the value is respected in the resulting video items

These tests will wait a maximum of 10 minutes for all items to finish processing. If this timeout is exceeded, the test will fail; a sketch of this waiting behaviour follows.
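A minimal sketch of that wait-for-processing behaviour, assuming a polling approach; the helper name and the check callback are hypothetical, not the actual fixture code:

import time

def wait_until_processed(all_items_processed, timeout_seconds: int = 600, poll_seconds: int = 10) -> None:
    # Poll until every pushed item has finished processing; fail the test run
    # if the 10-minute budget described above is exhausted.
    deadline = time.time() + timeout_seconds
    while time.time() < deadline:
        if all_items_processed():
            return
        time.sleep(poll_seconds)
    raise TimeoutError("Items did not finish processing within the timeout")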
Add an e2e_tests/cli/test_import.py file with the following tests:

- test_import_basic_annotations_to_images - Test importing a set of basic image annotations (no sub-types or properties) to a set of pre-registered files in a dataset
- test_import_annotations_with_subtypes_to_images - Test importing a set of image annotations including subtypes & properties to a set of pre-registered files in a dataset
- test_annotation_classes_are_created_on_import - Test that importing non-existent annotation classes creates those classes in the target Darwin team
- test_annotation_classes_are_created_with_properties_on_import - Test that importing non-existent annotation classes with properties creates those classes and properties in the target Darwin team
- test_appending_annotations - Test that appending annotations to an item with already existing annotations does not overwrite the original annotations
- test_overwriting_annotations - Test that the --overwrite flag allows bypassing of the overwrite warning when importing to items with already existing annotations
- test_annotation_overwrite_warning - Test that importing annotations to an item with already existing annotations throws a warning if not using the --append or --overwrite flags

Changelog

Expanded darwin-py E2E tests