gh-128150: improve performances of `uuid.uuid*` constructor functions. #128151

picnixz · 2024-12-21T10:12:01Z

There are some points that can be addressed:

We can drop some micro-optimizations to reduce the diff. Most of the time is taken by function calls and loading integers.
HACL* MD5 is faster than OpenSSL MD5 so it's better to use the former. However, using _md5.md5 or from _md5 import md5 is a micro-optimization that can be dropped without affecting performances too much (see cff86e9 and 7095aa4) For consistency, we'll rely on OpenSSL-based implementation even if it's a bit slower.

The rationale of expanding not 0 <= x < 1 << 128 into x < 0 or x > 0xffff_ffff_ffff_ffff_ffff_ffff_ffff_ffff is due to the non-equivalent bytecodes.
Similar arguments apply to expanding not 0 <= x < (1 << C) into x < 0 or x > B where B is the hardcoded hexadecimal value of (1 << C) - 1.

Bytecode comparisons (not useful, constant folding will make them similarly performant)

   1           LOAD_SMALL_INT           0
               LOAD_NAME                0 (x)
               SWAP                     2
               COPY                     2
               COMPARE_OP              42 (<=)
               COPY                     1
               TO_BOOL
               POP_JUMP_IF_FALSE        9 (to L1)
               NOT_TAKEN
               POP_TOP
               LOAD_SMALL_INT           1
               LOAD_SMALL_INT         128
               BINARY_OP                3 (<<)
               COMPARE_OP               2 (<)
               JUMP_FORWARD             2 (to L2)
       L1:     SWAP                     2
               POP_TOP
       L2:     TO_BOOL
               UNARY_NOT
               POP_TOP
               LOAD_CONST               0 (None)
               RETURN_VALUE

versus

   1           LOAD_NAME                0 (x)
               LOAD_SMALL_INT           0
               COMPARE_OP               2 (<)
               COPY                     1
               TO_BOOL
               POP_JUMP_IF_TRUE         8 (to L1)
               POP_TOP
               LOAD_NAME                0 (x)
               LOAD_CONST               0 (340282366920938463463374607431768211455)
               COMPARE_OP             132 (>)
               POP_TOP
               LOAD_CONST               1 (None)
               RETURN_VALUE
       L1:     POP_TOP
               LOAD_CONST               1 (None)
               RETURN_VALUE

Issue: Improve performances of uuid.* functions #128150

📚 Documentation preview 📚: https://cpython-previews--128151.org.readthedocs.build/

eendebakpt · 2024-12-21T13:16:55Z

The changes itself look good at first glance. On the other hand: if performance is really important, there there dedicated packages to calculate uuids (binding to rust or C) that are much faster.

One more idea to improve performance: add a dedicated constructor that skips the checks. For example add to UUID:

    @classmethod
    def _from_int(cls, int,  is_safe=SafeUUID.unknown):
        v= cls.__new__(cls)
        object.__setattr__(v, 'int', int)
        object.__setattr__(v, 'is_safe', is_safe)
        return v

Results in

%timeit UUID._from_int(123 )
%timeit UUID(int=123, version=None)
451 ns ± 15.7 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
767 ns ± 41.2 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

(the UUID._from_int can be used from inside uuid4 for example)

picnixz · 2024-12-21T13:53:31Z

On the other hand: if performance is really important, there there dedicated packages to calculate uuids (binding to rust or C) that are much faster.

I also thought about expanding the C interface for the module but it would have been too complex as a first iteration. As for third-party packages, I do know about them but there might be slightly differences in which methods they use for the UUID (and this could be a stop for existing code, namely switching to another implementation).

One more idea to improve performance: add a dedicated constructor that skips the checks

I also had this idea but haven't tested it as a first iteration. I wanted to get some feedback (I feel that performance gains are fine but OTOH, the code is a bit uglier =/)

picnixz · 2024-12-21T14:37:41Z

Ok the benchmarks are not always very stable but I do see improvements sith the dedicated constructor. I need to go now but I'll try to see which version is the best and the most stable.

picnixz · 2024-12-22T10:16:58Z

So, we're now stable and consistent:

+----------------------------------------+---------+-----------------------+-----------------------+
| Benchmark                              | ref     | new                   | opt                   |
+========================================+=========+=======================+=======================+
| uuid3(NAMESPACE_DNS, os.urandom(16))   | 1.13 us | 767 ns: 1.47x faster  | 767 ns: 1.47x faster  |
+----------------------------------------+---------+-----------------------+-----------------------+
| uuid3(NAMESPACE_DNS, os.urandom(1024)) | 2.05 us | 1.82 us: 1.13x faster | 1.78 us: 1.15x faster |
+----------------------------------------+---------+-----------------------+-----------------------+
| uuid4()                                | 1.15 us | 867 ns: 1.33x faster  | 860 ns: 1.34x faster  |
+----------------------------------------+---------+-----------------------+-----------------------+
| uuid5(NAMESPACE_DNS, os.urandom(16))   | 1.10 us | 810 ns: 1.35x faster  | 778 ns: 1.41x faster  |
+----------------------------------------+---------+-----------------------+-----------------------+
| uuid5(NAMESPACE_DNS, os.urandom(1024)) | 1.52 us | 1.22 us: 1.24x faster | 1.19 us: 1.27x faster |
+----------------------------------------+---------+-----------------------+-----------------------+
| uuid8()                                | 926 ns  | 673 ns: 1.38x faster  | 671 ns: 1.38x faster  |
+----------------------------------------+---------+-----------------------+-----------------------+
| Geometric mean                         | (ref)   | 1.21x faster          | 1.22x faster          |
+----------------------------------------+---------+-----------------------+-----------------------+

Benchmark hidden because not significant (3): uuid1(), uuid1(node, None), uuid1(None, clock_seq)

Strictly speaking, the uuid1() benchmarks can be considered significant but only if you consider a 4% improvement as significant, which I did not. I only kept improvements over 10%. The last column is the same as the second one (PGO, no LTO) but using python -OO (namely assertions are removed).

Lib/uuid.py

eendebakpt

Nice improvement overall! Personally I am not a fan of the lazy imports here, but I'll let someone else decide on that.

Doc/whatsnew/3.14.rst

picnixz · 2024-12-23T08:11:34Z

The entire module has been written so to reduce import times but I understand. I'll adress your comments tomorrow and will also check if I can remove some unnecessary micro optimizations. Thank you!

In this commit, we move the rationale for using HACL*-based MD5 instead of its OpenSSL implementation from the code to this note. HACL*-based MD5 is 2x faster than its OpenSSL implementation for creating the hash object via `h = md5(..., usedforsecurity=False)` but `h.digest()` is slightly (yet noticeably) slower. Overall, HACL*-based MD5 still remains faster than its OpenSSL-based implementation, whence the choice of `_md5.md5` over `hashlib.md5`.

In this commit, we move the rationale for using OpenSSL-based SHA-1 instead of its HACL* implementation from the code to this note. HACL*-based SHA-1 is 2x faster than its OpenSSL implementation for creating the hash object via `h = sha1(..., usedforsecurity=False)` but `h.digest()` is almost 3x slower. Unlike HACL* MD5, HACL*-based SHA-1 is slower than its OpenSSL-based implementation, whence the choice of `hashlib.sha1` over `_sha1.sha1`.

picnixz · 2024-12-25T12:47:01Z

Ok, here are the final benchmarks:

+----------------------------------------+---------+-----------------------+
| Benchmark                              | ref     | upd                   |
+========================================+=========+=======================+
| uuid1(node, None)                      | 1.21 us | 1.17 us: 1.03x faster |
+----------------------------------------+---------+-----------------------+
| uuid1(None, clock_seq)                 | 1.18 us | 1.10 us: 1.08x faster |
+----------------------------------------+---------+-----------------------+
| uuid3(NAMESPACE_DNS, os.urandom(16))   | 1.13 us | 789 ns: 1.43x faster  |
+----------------------------------------+---------+-----------------------+
| uuid3(NAMESPACE_DNS, os.urandom(1024)) | 2.05 us | 1.86 us: 1.10x faster |
+----------------------------------------+---------+-----------------------+
| uuid4()                                | 1.15 us | 879 ns: 1.31x faster  |
+----------------------------------------+---------+-----------------------+
| uuid5(NAMESPACE_DNS, os.urandom(16))   | 1.10 us | 837 ns: 1.31x faster  |
+----------------------------------------+---------+-----------------------+
| uuid5(NAMESPACE_DNS, os.urandom(1024)) | 1.52 us | 1.24 us: 1.22x faster |
+----------------------------------------+---------+-----------------------+
| uuid8()                                | 926 ns  | 693 ns: 1.34x faster  |
+----------------------------------------+---------+-----------------------+
| Geometric mean                         | (ref)   | 1.19x faster          |
+----------------------------------------+---------+-----------------------+

Agreed that this is slightly slower (roughly a constant 20 ns slower, which may be because I switched from int_ to int, though it's weird) but I think it's clearer now (we don't really care about those 20ns imo). I've also rounded down the numbers.

eendebakpt

I left a few small remarks, but am also ok with leaving the code as is. The PR improves performance for the main cases, does not make the other cases smaller and code complexity is roughly the same.

Lib/uuid.py

picnixz · 2024-12-27T10:12:03Z

Not sure what happens but I'm seeing slow downs. How can I check that constant folding was done?

EDIT: I'll regenerate the benchmarks to be sure. Wait a bit.
EDIT 2: Nevermind the numbers align now and constant folding seems to correctly appear. My bad.

picnixz · 2024-12-27T10:33:16Z

Here are the final benchmarks:

+----------------------------------------+---------+-----------------------+
| Benchmark                              | ref     | upd                   |
+========================================+=========+=======================+
| uuid1()                                | 2.24 us | 2.13 us: 1.05x faster |
+----------------------------------------+---------+-----------------------+
| uuid1(node, None)                      | 1.26 us | 1.17 us: 1.07x faster |
+----------------------------------------+---------+-----------------------+
| uuid1(None, clock_seq)                 | 1.19 us | 1.11 us: 1.07x faster |
+----------------------------------------+---------+-----------------------+
| uuid3(NAMESPACE_DNS, os.urandom(16))   | 1.17 us | 695 ns: 1.68x faster  |
+----------------------------------------+---------+-----------------------+
| uuid3(NAMESPACE_DNS, os.urandom(1024)) | 2.08 us | 1.72 us: 1.21x faster |
+----------------------------------------+---------+-----------------------+
| uuid4()                                | 1.15 us | 883 ns: 1.30x faster  |
+----------------------------------------+---------+-----------------------+
| uuid5(NAMESPACE_DNS, os.urandom(16))   | 1.14 us | 797 ns: 1.43x faster  |
+----------------------------------------+---------+-----------------------+
| uuid5(NAMESPACE_DNS, os.urandom(1024)) | 1.55 us | 1.20 us: 1.29x faster |
+----------------------------------------+---------+-----------------------+
| uuid8()                                | 953 ns  | 658 ns: 1.45x faster  |
+----------------------------------------+---------+-----------------------+
| Geometric mean                         | (ref)   | 1.27x faster          |
+----------------------------------------+---------+-----------------------+

We are indeed faster. Note that with a manual constant folding, I also have the same numbers (I just regenerated everything from scratch). I think sometimes we have noise. I'll update the NEWS as well to reflect the latest numbers.

Lib/uuid.py

picnixz · 2025-01-12T11:06:32Z

Replacing hashlib.md5 instead of using fallback MD5 is as follows:

uuid3(NAMESPACE_DNS, os.urandom(16)): Mean +- std dev: 781 ns +- 10 ns
uuid3(NAMESPACE_DNS, os.urandom(1024)): Mean +- std dev: 1.74 us +- 0.02 us

On main it is:

uuid3(NAMESPACE_DNS, os.urandom(16)): Mean +- std dev: 1.11 us +- 0.03 us
uuid3(NAMESPACE_DNS, os.urandom(1024)): Mean +- std dev: 2.04 us +- 0.02 us

With fallback MD5, we're a bit faster for small lengths:

uuid3(NAMESPACE_DNS, os.urandom(16)): Mean +- std dev: 702 ns +- 17 ns
uuid3(NAMESPACE_DNS, os.urandom(1024)): Mean +- std dev: 1.75 us +- 0.02 us

I think I can live with hashlib.md5. Smaller diff and consistent implementaion. I'll update the PR and the numbers.

encukou

Since we're always bundling HACL* MD5 implementation, I wondered whether we could just use it. Or do you think users would prefer if we use hashlib explicitly (and OpenSSL when available)?

It's possible to build Python without HACL*, mainly for environments where cryptography is somehow regulated (i.e. the infamous FIPS mode).

It might be OK to use HACL* by default in hashlib, possibly making it a bit faster for everyone (who didn't opt out).

But UUID should, IMO, use the default. We agree on that, hit the green button :)

picnixz added 2 commits December 19, 2024 21:43

improve performance of UUIDs creation

0d49ccb

add What's New entry

603335f

bedevere-app bot mentioned this pull request Dec 21, 2024

Improve performances of uuid.* functions #128150

Open

bedevere-app bot added the awaiting review label Dec 21, 2024

picnixz added 5 commits December 21, 2024 11:13

blurb

154ff8b

fix issue number

b965887

fix typos

a8a1894

ensure 14-bit clock sequence

c8aa752

Merge branch 'main' into perf/uuid/init-128150

8c9d5cf

picnixz added 2 commits December 21, 2024 15:18

add dedicated private fast constructor

a2278b8

revert UUIDv1 construction

0710549

picnixz force-pushed the perf/uuid/init-128150 branch from 4f2744a to 0710549 Compare December 21, 2024 14:25

change eager check into an assertion check for internal constructor

5b6922f

update performance results

e631593

eendebakpt reviewed Dec 23, 2024

View reviewed changes

Lib/uuid.py Outdated Show resolved Hide resolved

Lib/uuid.py Outdated Show resolved Hide resolved

Lib/uuid.py Outdated Show resolved Hide resolved

eendebakpt reviewed Dec 23, 2024

View reviewed changes

Doc/whatsnew/3.14.rst Outdated Show resolved Hide resolved

Doc/whatsnew/3.14.rst Outdated Show resolved Hide resolved

Doc/whatsnew/3.14.rst Outdated Show resolved Hide resolved

Doc/whatsnew/3.14.rst Outdated Show resolved Hide resolved

picnixz added 9 commits December 23, 2024 16:29

describe constants

1c10901

revert UUIDv1 optimizations to reduce the diff

0bc7321

simplify _from_int private constructor as per Pieter's review

26b1eb1

revert micro-optimization of not a <= x <= b

df50a7a

use built-in int when it is not shadowed

c1ffa7d

clear variant and version bits using dedicated mask

4af1535

fix typos

0d4c008

update benchmarks

9854f69

eendebakpt approved these changes Dec 25, 2024

View reviewed changes

Lib/uuid.py Outdated Show resolved Hide resolved

Lib/uuid.py Outdated Show resolved Hide resolved

Lib/uuid.py Show resolved Hide resolved

bedevere-app bot added awaiting core review and removed awaiting review labels Dec 25, 2024

picnixz added 2 commits December 26, 2024 09:40

remove un-necessary assertions

897902b

use object.__new__ instead of cls.__new__

a8a19e1

hauntsaninja reviewed Dec 26, 2024

View reviewed changes

Lib/uuid.py Outdated Show resolved Hide resolved

hauntsaninja reviewed Dec 26, 2024

View reviewed changes

Lib/uuid.py Outdated Show resolved Hide resolved

Merge branch 'main' into perf/uuid/init-128150

ccb972b

picnixz added 2 commits December 27, 2024 11:37

remove dedicated constant folding

e2b8b08

update benchmarks

1d4216a

picnixz requested review from hauntsaninja and eendebakpt December 27, 2024 10:50

eendebakpt approved these changes Dec 27, 2024

View reviewed changes

picnixz mentioned this pull request Jan 3, 2025

gh-128427: Add uuid.NIL and uuid.MAX #128429

Open

encukou reviewed Jan 8, 2025

View reviewed changes

Lib/uuid.py Outdated Show resolved Hide resolved

picnixz added 2 commits January 12, 2025 12:17

Always use hashlib.md5 for consistency as per Petr's comment.

5c87adf

update benchmarks

ea23629

picnixz requested a review from encukou January 12, 2025 12:37

encukou approved these changes Jan 13, 2025

View reviewed changes

bedevere-app bot added awaiting merge and removed awaiting core review labels Jan 13, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

gh-128150: improve performances of `uuid.uuid*` constructor functions. #128151

gh-128150: improve performances of `uuid.uuid*` constructor functions. #128151

picnixz commented Dec 21, 2024 •

edited

Loading

eendebakpt commented Dec 21, 2024

picnixz commented Dec 21, 2024

picnixz commented Dec 21, 2024

picnixz commented Dec 22, 2024 •

edited

Loading

eendebakpt left a comment

picnixz commented Dec 23, 2024

picnixz commented Dec 25, 2024 •

edited

Loading

eendebakpt left a comment

picnixz commented Dec 27, 2024 •

edited

Loading

picnixz commented Dec 27, 2024

picnixz commented Jan 12, 2025

encukou left a comment •

edited

Loading

gh-128150: improve performances of uuid.uuid* constructor functions. #128151

Are you sure you want to change the base?

gh-128150: improve performances of uuid.uuid* constructor functions. #128151

Conversation

picnixz commented Dec 21, 2024 • edited Loading

eendebakpt commented Dec 21, 2024

picnixz commented Dec 21, 2024

picnixz commented Dec 21, 2024

picnixz commented Dec 22, 2024 • edited Loading

eendebakpt left a comment

Choose a reason for hiding this comment

picnixz commented Dec 23, 2024

picnixz commented Dec 25, 2024 • edited Loading

eendebakpt left a comment

Choose a reason for hiding this comment

picnixz commented Dec 27, 2024 • edited Loading

picnixz commented Dec 27, 2024

picnixz commented Jan 12, 2025

encukou left a comment • edited Loading

Choose a reason for hiding this comment

gh-128150: improve performances of `uuid.uuid*` constructor functions. #128151

gh-128150: improve performances of `uuid.uuid*` constructor functions. #128151

picnixz commented Dec 21, 2024 •

edited

Loading

picnixz commented Dec 22, 2024 •

edited

Loading

picnixz commented Dec 25, 2024 •

edited

Loading

picnixz commented Dec 27, 2024 •

edited

Loading

encukou left a comment •

edited

Loading