Convert some tests to pytest #1693
Open
ZeyadTarekk wants to merge 8 commits into facebook:main from ZeyadTarekk:convert_some_tests_to_pytest (base: main)
+190 −183
8 commits:
- a45f986 change test_hash_from_x
- 890ac4b change test_md5_hash
- a4ccca2 change test_raw_text
- 485f8f7 change test_url_md5_hash
- e71df22 change test_pdq_index
- 29c27d4 Fix formatting
- f303965 Update pdf test
- be67fa4 fix test raw text

All commits by ZeyadTarekk.
python-threatexchange/threatexchange/signal_type/tests/test_md5_hash.py (28 changes: 17 additions & 11 deletions)
```diff
@@ -1,23 +1,29 @@
 # Copyright (c) Meta Platforms, Inc. and affiliates.

-import unittest
 import pathlib

+import pytest
 from threatexchange.signal_type.md5 import VideoMD5Signal

 # Define the test file path
 TEST_FILE = pathlib.Path(__file__).parent.parent.parent.parent.joinpath(
     "data", "sample-b.jpg"
 )


-class VideoMD5SignalTestCase(unittest.TestCase):
-    def setUp(self):
-        self.a_file = open(TEST_FILE, "rb")
-
-    def tearDown(self):
-        self.a_file.close()
-
-    def test_can_hash_simple_files(self):
-        assert "d35c785545392755e7e4164457657269" == VideoMD5Signal.hash_from_bytes(
-            self.a_file.read()
-        ), "MD5 hash does not match"
+@pytest.fixture
+def file_content():
+    """
+    Fixture to open and yield file content for testing,
+    then close the file after the test.
+    """
+    with open(TEST_FILE, "rb") as f:
+        yield f.read()
+
+
+def test_can_hash_simple_files(file_content):
+    """
+    Test that the VideoMD5Signal produces the expected hash.
+    """
+    expected_hash = "d35c785545392755e7e4164457657269"
+    computed_hash = VideoMD5Signal.hash_from_bytes(file_content)
+    assert computed_hash == expected_hash, "MD5 hash does not match"
```
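For readers unfamiliar with the pattern, the yield-style fixture in this diff is pytest's idiomatic replacement for `setUp`/`tearDown`: code before the `yield` is setup, and code after it runs as teardown. A minimal runnable sketch of the same shape, using `hashlib` and pytest's built-in `tmp_path` fixture in place of the repository's sample asset and `VideoMD5Signal` (both swaps are assumptions for illustration):

```python
import hashlib
import pathlib

import pytest


@pytest.fixture
def file_content(tmp_path: pathlib.Path):
    # Hypothetical stand-in for the repo's sample file.
    sample = tmp_path / "sample.bin"
    sample.write_bytes(b"hello world")
    with open(sample, "rb") as f:
        yield f.read()
    # Anything after the yield would run as teardown; here the
    # with-block has already closed the handle, mirroring tearDown().


def test_md5_of_sample(file_content):
    # md5 of b"hello world"
    assert hashlib.md5(file_content).hexdigest() == (
        "5eb63bbbe01eeed093cb22bb8f5acdc3"
    )
```

Note that the fixture only activates under the pytest runner (`pytest test_file.py`); running the module with plain `python` does nothing.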
```diff
@@ -1,8 +1,6 @@
 # Copyright (c) Meta Platforms, Inc. and affiliates.

-import unittest
 import pickle
 import typing as t
+import pytest
 import functools

 from threatexchange.signal_type.index import (
@@ -13,139 +11,72 @@
 test_entries = [
     (
         "0000000000000000000000000000000000000000000000000000000000000000",
-        dict(
-            {
-                "hash_type": "pdq",
-                "system_id": 9,
-            }
-        ),
+        {"hash_type": "pdq", "system_id": 9},
     ),
     (
         "000000000000000000000000000000000000000000000000000000000000ffff",
-        dict(
-            {
-                "hash_type": "pdq",
-                "system_id": 8,
-            }
-        ),
+        {"hash_type": "pdq", "system_id": 8},
     ),
     (
         "0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f",
-        dict(
-            {
-                "hash_type": "pdq",
-                "system_id": 7,
-            }
-        ),
+        {"hash_type": "pdq", "system_id": 7},
     ),
     (
         "f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0",
-        dict(
-            {
-                "hash_type": "pdq",
-                "system_id": 6,
-            }
-        ),
+        {"hash_type": "pdq", "system_id": 6},
     ),
     (
         "ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff",
-        dict(
-            {
-                "hash_type": "pdq",
-                "system_id": 5,
-            }
-        ),
+        {"hash_type": "pdq", "system_id": 5},
     ),
 ]


-class TestPDQIndex(unittest.TestCase):
-    def setUp(self):
-        self.index = PDQIndex.build(test_entries)
+@pytest.fixture
+def index():
+    return PDQIndex.build(test_entries)

-    def assertEqualPDQIndexMatchResults(
-        self, result: t.List[PDQIndexMatch], expected: t.List[PDQIndexMatch]
-    ):
-        self.assertEqual(
-            len(result), len(expected), "search results not of expected length"
-        )
-
-        accum_type = t.Dict[int, t.Set[int]]
-
-        # Between python 3.8.6 and 3.8.11, something caused the order of results
-        # from the index to change. This was noticed for items which had the
-        # same distance. To allow for this, we convert result and expected
-        # arrays from
-        # [PDQIndexMatch, PDQIndexMatch] to { distance: <set of PDQIndexMatch.metadata hash> }
-        # This allows you to compare [PDQIndexMatch A, PDQIndexMatch B] with
-        # [PDQIndexMatch B, PDQIndexMatch A] as long as A.distance == B.distance.
-        def quality_indexed_dict_reducer(
-            acc: accum_type, item: PDQIndexMatch
-        ) -> accum_type:
-            acc[item.similarity_info.distance] = acc.get(
-                item.similarity_info.distance, set()
-            )
-            # Instead of storing the unhashable item.metadata dict, store its
-            # hash so we can compare using self.assertSetEqual
-            acc[item.similarity_info.distance].add(hash(frozenset(item.metadata)))
-            return acc
-
-        # Convert results to distance -> set of metadata map
-        distance_to_result_items_map: accum_type = functools.reduce(
-            quality_indexed_dict_reducer, result, {}
-        )
-
-        # Convert expected to distance -> set of metadata map
-        distance_to_expected_items_map: accum_type = functools.reduce(
-            quality_indexed_dict_reducer, expected, {}
-        )
-
-        assert len(distance_to_expected_items_map) == len(
-            distance_to_result_items_map
-        ), "Unequal number of items in expected and results."
-
-        for distance, result_items in distance_to_result_items_map.items():
-            assert (
-                distance in distance_to_expected_items_map
-            ), f"Unexpected distance {distance} found"
-            self.assertSetEqual(result_items, distance_to_expected_items_map[distance])
-
-    def test_search_index_for_matches(self):
-        entry_hash = test_entries[1][0]
-        result = self.index.query(entry_hash)
-        self.assertEqualPDQIndexMatchResults(
-            result,
-            [
-                PDQIndexMatch(
-                    SignalSimilarityInfoWithIntDistance(0), test_entries[1][1]
-                ),
-                PDQIndexMatch(
-                    SignalSimilarityInfoWithIntDistance(16), test_entries[0][1]
-                ),
-            ],
-        )
-
-    def test_search_index_with_no_match(self):
-        query_hash = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
-        result = self.index.query(query_hash)
-        self.assertEqualPDQIndexMatchResults(result, [])
-
-    def test_supports_pickling(self):
-        pickled_data = pickle.dumps(self.index)
-        assert pickled_data != None, "index does not support pickling to a data stream"
-
-        reconstructed_index = pickle.loads(pickled_data)
-        assert (
-            reconstructed_index != None
-        ), "index does not support unpickling from data stream"
-        assert (
-            reconstructed_index.index.faiss_index != self.index.index.faiss_index
-        ), "unpickling should create it's own faiss index in memory"
+
+def assert_equal_pdq_index_match_results(
+    result: t.List[PDQIndexMatch], expected: t.List[PDQIndexMatch]
+):
+    assert len(result) == len(expected), "Search results not of expected length"
+
+    def quality_indexed_dict_reducer(
+        acc: t.Dict[int, t.Set[int]], item: PDQIndexMatch
+    ) -> t.Dict[int, t.Set[int]]:
+        acc[item.similarity_info.distance] = acc.get(
+            item.similarity_info.distance, set()
+        )
+        acc[item.similarity_info.distance].add(hash(frozenset(item.metadata)))
+        return acc
+
+    distance_to_result_items_map = functools.reduce(
+        quality_indexed_dict_reducer, result, {}
+    )
+    distance_to_expected_items_map = functools.reduce(
+        quality_indexed_dict_reducer, expected, {}
+    )
+
+    assert len(distance_to_expected_items_map) == len(
+        distance_to_result_items_map
+    ), "Unequal number of distance groups"
+
+    for distance, result_items in distance_to_result_items_map.items():
+        assert (
+            distance in distance_to_expected_items_map
+        ), f"Unexpected distance {distance} found in results"
+        assert result_items == distance_to_expected_items_map[distance], (
+            f"Mismatch at distance {distance}. "
+            f"Expected: {distance_to_expected_items_map[distance]}, Got: {result_items}"
+        )
+
+
+@pytest.mark.parametrize(
+    "entry_hash, expected_matches",
+    [
+        (
+            test_entries[1][0],
+            [
+                PDQIndexMatch(
+                    SignalSimilarityInfoWithIntDistance(0), test_entries[1][1]
+                ),
@@ -154,4 +85,46 @@ def test_supports_pickling(self):
-
-        query = test_entries[0][0]
-        result = reconstructed_index.query(query)
-        self.assertEqualPDQIndexMatchResults(
-            result,
-            [
-                PDQIndexMatch(
-                    SignalSimilarityInfoWithIntDistance(0), test_entries[1][1]
-                ),
-                PDQIndexMatch(
-                    SignalSimilarityInfoWithIntDistance(16), test_entries[0][1]
-                ),
-            ],
-        )
+                PDQIndexMatch(
+                    SignalSimilarityInfoWithIntDistance(16), test_entries[0][1]
+                ),
+            ],
+        ),
+        (
+            "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
+            [],
+        ),
+    ],
+)
+def test_search_index(index, entry_hash, expected_matches):
+    result = index.query(entry_hash)
+    assert_equal_pdq_index_match_results(result, expected_matches)
+
+
+def test_partial_match_below_threshold(index):
+    query_hash = "ffffffffffffffffffffffffffffffffffffffffffffffffffffffff00000000"
+    result = index.query(query_hash)
+    assert_equal_pdq_index_match_results(result, [])
+
+
+def test_supports_pickling(index):
+    pickled_data = pickle.dumps(index)
+    assert pickled_data is not None, "Index does not support pickling to a data stream"
+
+    reconstructed_index = pickle.loads(pickled_data)
+    assert (
+        reconstructed_index is not None
+    ), "Index does not support unpickling from data stream"
+    assert (
+        reconstructed_index.index.faiss_index != index.index.faiss_index
+    ), "Unpickling should create its own FAISS index in memory"
+
+    assert (
+        reconstructed_index.index_size == index.index_size
+    ), "Index size mismatch after unpickling"
+
+    query = test_entries[0][0]
+    result = reconstructed_index.query(query)
+    assert_equal_pdq_index_match_results(
+        result,
+        [
+            PDQIndexMatch(SignalSimilarityInfoWithIntDistance(0), test_entries[1][1]),
+            PDQIndexMatch(SignalSimilarityInfoWithIntDistance(16), test_entries[0][1]),
+        ],
+    )
```

> Review comment on the `index` fixture: ignorable: While this might be a misuse of the feature, fixtures are basically a 1:1 mapping for setUp, so I think this is a faithful translation.
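The `assert_equal_pdq_index_match_results` helper above makes the comparison order-insensitive by reducing each match list into a `{distance: set of metadata hashes}` map before comparing, per the original comment about result ordering changing between Python 3.8.6 and 3.8.11 for equal-distance items. A minimal self-contained sketch of that grouping technique, with a simplified `Match` dataclass standing in for `PDQIndexMatch` (an assumption for illustration; the real class nests the distance under `similarity_info`, and this sketch hashes `metadata.items()` rather than the bare dict):

```python
import functools
import typing as t
from dataclasses import dataclass


@dataclass
class Match:
    # Simplified stand-in for PDQIndexMatch: a distance plus metadata.
    distance: int
    metadata: t.Dict[str, int]


def group_by_distance(matches: t.List[Match]) -> t.Dict[int, t.Set[int]]:
    """Collapse an ordered match list into {distance: set of metadata hashes}."""

    def reducer(acc: t.Dict[int, t.Set[int]], m: Match) -> t.Dict[int, t.Set[int]]:
        acc[m.distance] = acc.get(m.distance, set())
        # Dicts are unhashable, so store a hash of their items instead.
        acc[m.distance].add(hash(frozenset(m.metadata.items())))
        return acc

    return functools.reduce(reducer, matches, {})


a = Match(0, {"system_id": 8})
b = Match(0, {"system_id": 9})
# Equal-distance matches compare equal regardless of result ordering.
assert group_by_distance([a, b]) == group_by_distance([b, a])
```

Because the sets discard ordering within a distance bucket, `[A, B]` and `[B, A]` compare equal whenever `A.distance == B.distance`, which is exactly the tolerance the test needs.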
Review comment: I am not sure that you need a fixture for this; the test can just open the file itself. Fixtures are helpful when you are sharing setup between tests, and here there is only one test.
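A sketch of the reviewer's suggested shape, where the single test does its own setup inline instead of taking a fixture. A temporary file and `hashlib` stand in for the repository's sample asset and `VideoMD5Signal` (both are assumptions for illustration):

```python
import hashlib
import pathlib
import tempfile


def test_can_hash_simple_files():
    # Inline setup: with a single consumer there is no shared state to
    # justify a fixture, so the test opens (a stand-in for) the file itself.
    with tempfile.TemporaryDirectory() as d:
        test_file = pathlib.Path(d) / "sample.bin"
        test_file.write_bytes(b"hello world")
        content = test_file.read_bytes()
    # md5 of b"hello world"
    assert hashlib.md5(content).hexdigest() == "5eb63bbbe01eeed093cb22bb8f5acdc3"
```

The trade-off is the one the reviewer names: a fixture earns its indirection only once a second test needs the same setup.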