⚡️ Speed up function all_identical by 7% in pydantic/_internal/_utils.py #33

Open
wants to merge 1 commit into
base: main

Conversation

codeflash-ai[bot]

@codeflash-ai codeflash-ai bot commented Nov 22, 2024

📄 all_identical() in pydantic/_internal/_utils.py

📈 Performance improved by 7% (about 1.07x the original speed)

⏱️ Runtime went down from 55.0 milliseconds to 51.5 milliseconds (best of 68 runs)
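
As a quick check of the arithmetic, the snippet below recomputes the improvement from the two reported timings. It assumes the 7% figure is the speedup relative to the new runtime (original / optimized - 1); that definition is an assumption, not something stated in this PR.

```python
# Timings reported above, in milliseconds.
original_ms = 55.0
optimized_ms = 51.5

# Speedup relative to the new runtime (assumed definition behind the "7%" figure).
speedup = original_ms / optimized_ms - 1

# Fraction of the original runtime that was saved.
time_saved = (original_ms - optimized_ms) / original_ms

print(f"speedup: {speedup:.1%}")               # speedup: 6.8%
print(f"runtime reduction: {time_saved:.1%}")  # runtime reduction: 6.4%
```

Under that assumption the reported 7% is the 6.8% speedup rounded up; measured as a share of the original runtime, the saving is closer to 6.4%.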

Explanation and details

To optimize the function, we can use the built-in zip instead of itertools.zip_longest. zip stops as soon as the shorter iterable is exhausted, so no fill value is needed, which may make iteration slightly faster when both inputs have the same length. Because zip silently drops the unmatched tail of the longer input, the lengths must be compared separately so that inputs of different lengths are not reported as identical.

Here is the optimized version.

Key changes:

  1. Replaced zip_longest with zip, which stops iterating when either iterable is exhausted.
  2. Added a check that both iterables have the same length, since they can only be identical if they do.

This makes the function faster when the input sequences have the same length, while inputs of different lengths are still reported as not identical.
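
The optimized code itself is not reproduced in this description. The following is only a sketch of how the two key changes could fit together, not the code from this PR: the function names are hypothetical, the baseline is an assumed shape of the zip_longest version, and the fallback for inputs without `len()` (such as generators) is an added assumption.

```python
from collections.abc import Sized
from itertools import zip_longest
from typing import Any, Iterable

_SENTINEL = object()


def all_identical_zip_longest(left: Iterable[Any], right: Iterable[Any]) -> bool:
    """Assumed baseline: pad the shorter input with a sentinel and compare by identity."""
    for left_item, right_item in zip_longest(left, right, fillvalue=_SENTINEL):
        if left_item is not right_item:
            return False
    return True


def all_identical_zip(left: Iterable[Any], right: Iterable[Any]) -> bool:
    """Sketch of the variant: compare lengths up front, then iterate with plain zip."""
    if not (isinstance(left, Sized) and isinstance(right, Sized)):
        # Assumption: generators and other unsized iterables have no len(),
        # so fall back to the padded comparison for them.
        return all_identical_zip_longest(left, right)
    if len(left) != len(right):
        return False
    for left_item, right_item in zip(left, right):
        if left_item is not right_item:
            return False
    return True


a, b = object(), object()
print(all_identical_zip([a, b], [a, b]))            # True
print(all_identical_zip([a], [a, b]))               # False
print(all_identical_zip(iter([a]), iter([a, b])))   # False
```

The fallback matters because a bare `len()` call would raise `TypeError` on a generator, which the zip_longest version accepts.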

Correctness verification

The new optimized code was tested for correctness. The results are listed below.

🔘 (none found) − ⚙️ Existing Unit Tests

✅ 23 Passed − 🌀 Generated Regression Tests

# imports for the function under test
import pytest  # used for our unit tests
from pydantic._internal._utils import all_identical

# unit tests

def test_basic_identical():
    # Identical iterables with same objects
    codeflash_output = all_identical([1, 2, 3], [1, 2, 3])
    codeflash_output = all_identical(['a', 'b', 'c'], ['a', 'b', 'c'])
    # Outputs were verified to be equal to the original implementation

def test_basic_non_identical():
    # Non-identical iterables with different objects
    codeflash_output = all_identical([1, 2, 3], [1, 2, 4])
    codeflash_output = all_identical(['a', 'b', 'c'], ['a', 'b', 'd'])
    # Outputs were verified to be equal to the original implementation

def test_edge_empty_iterables():
    # Empty iterables
    codeflash_output = all_identical([], [])
    codeflash_output = all_identical([], [1])
    codeflash_output = all_identical([1], [])
    # Outputs were verified to be equal to the original implementation

def test_edge_different_lengths():
    # Different lengths
    codeflash_output = all_identical([1, 2, 3], [1, 2])
    codeflash_output = all_identical([1], [1, 2, 3])
    # Outputs were verified to be equal to the original implementation

def test_object_identity():
    # Identical objects (same instance)
    a = object()
    codeflash_output = all_identical([a, a], [a, a])
    # Outputs were verified to be equal to the original implementation

def test_equal_but_not_identical_objects():
    # Equal but not identical objects (different instances)
    codeflash_output = all_identical([[], []], [[], []])
    # Outputs were verified to be equal to the original implementation

def test_mixed_types():
    # Different data types in iterables
    codeflash_output = all_identical([1, 'a', 3.0], [1, 'a', 3.0])
    codeflash_output = all_identical([1, 'a', 3.0], [1, 'a', 3])
    # Outputs were verified to be equal to the original implementation

def test_nested_structures_identical():
    # Nested lists with identical objects
    a = object()
    codeflash_output = all_identical([a, [a]], [a, [a]])
    # Outputs were verified to be equal to the original implementation

def test_nested_structures_non_identical():
    # Nested lists with equal but not identical objects
    codeflash_output = all_identical([1, [2]], [1, [2]])
    # Outputs were verified to be equal to the original implementation

def test_large_scale_identical():
    # Large identical lists
    codeflash_output = all_identical(list(range(1000)), list(range(1000)))
    # Outputs were verified to be equal to the original implementation

def test_large_scale_non_identical():
    # Large non-identical lists
    codeflash_output = all_identical(list(range(1000)), list(range(1000)) + [10001])
    # Outputs were verified to be equal to the original implementation

def test_performance_large_data():
    # Performance with large data
    codeflash_output = all_identical([object()] * 1000000, [object()] * 1000000)
    a = object()
    codeflash_output = all_identical([a] * 1000000, [a] * 1000000)
    # Outputs were verified to be equal to the original implementation

def test_special_cases_none_values():
    # Iterables with `None` values
    codeflash_output = all_identical([None, None], [None, None])
    codeflash_output = all_identical([None, 1], [None, 2])
    # Outputs were verified to be equal to the original implementation

def test_special_cases_sentinel_like_objects():
    # Iterables with `_SENTINEL`-like objects
    sentinel = object()
    codeflash_output = all_identical([sentinel], [sentinel])
    sentinel1, sentinel2 = object(), object()
    codeflash_output = all_identical([sentinel1], [sentinel2])
    # Outputs were verified to be equal to the original implementation

if __name__ == "__main__":
    pytest.main()

🔘 (none found) − ⏪ Replay Tests
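
The generated tests above store each result in `codeflash_output` and rely on an external comparison with the original implementation. As a sketch of what the same checks might look like with explicit expectations, the block below uses a local stand-in for `all_identical` (an assumption, so that it runs without pydantic installed) and only cases whose result does not depend on interpreter-level caching of ints or strings.

```python
from itertools import zip_longest

_SENTINEL = object()


def all_identical(left, right):
    # Local stand-in for pydantic._internal._utils.all_identical (assumed behavior):
    # True only if both inputs yield the same objects in the same order.
    return all(x is y for x, y in zip_longest(left, right, fillvalue=_SENTINEL))


def test_same_instances():
    a, b = object(), object()
    assert all_identical([a, b, a], [a, b, a])


def test_equal_but_distinct_instances():
    # Two separate empty lists are equal but are not the same object.
    assert not all_identical([[]], [[]])


def test_length_mismatch():
    a = object()
    assert not all_identical([a], [a, a])
    assert not all_identical([a, a], [a])
    assert not all_identical([], [a])


def test_empty_inputs():
    assert all_identical([], [])
```

Run under pytest, each function is collected as a test; in the PR itself the import from `pydantic._internal._utils` would replace the stand-in.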

@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Nov 22, 2024
@codeflash-ai codeflash-ai bot requested a review from alvin-r November 22, 2024 00:17