Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-115999: Specialize loading attributes from modules in free-threaded builds #127711

Merged
merged 11 commits into from
Dec 13, 2024

Conversation

mpage
Copy link
Contributor

@mpage mpage commented Dec 6, 2024

This is the first in a series of PRs to specialize the LOAD_ATTR family of opcodes in free-threaded builds. PRs specializing LOAD_ATTR for instance and class receivers will follow.

We use the same approach that was used for specialization of LOAD_GLOBAL in free-threaded builds:

_CHECK_ATTR_MODULE is renamed to _CHECK_ATTR_MODULE_PUSH_KEYS; it pushes the keys object for the following _LOAD_ATTR_MODULE_FROM_KEYS (nee _LOAD_ATTR_MODULE). This arrangement avoids having to recheck the keys version.

_LOAD_ATTR_MODULE is renamed to _LOAD_ATTR_MODULE_FROM_KEYS; it loads the value from the keys object pushed by the preceding _CHECK_ATTR_MODULE_PUSH_KEYS at the cached index.

A few other changes were necessary to support this arrangement:

  • Added POP_DEAD_INPUTS() to the cases generator. It removes dead inputs from the stack and updates the stack pointer. This is used in _LOAD_ATTR_MODULE_FROM_KEYS to remove the keys object from the stack before deopting.
  • Added a tier2-only implementation of _LOAD_ATTR_MODULE. This is used by the tier2 optimizer when we successfully erase a _CHECK_ATTR_MODULE_PUSH_KEYS but cannot convert the conjugate _LOAD_ATTR_MODULE_FROM_KEYS into a LOAD_CONST.
  • Refactored the specialization logic to use the specialize and unspecialize helpers. This isn't strictly necessary, but it makes the implementation of gradually adding specialization support cleaner.
  • Fixed test_c_subclass_of_heap_ctype_with_del_modifying_dunder_class_only_decrefs_once so that it works with deferred refcounting in the free-threaded build. Also refactored the test so that the evaluation of the argument expression to sys.getrefcount always uses deferred refcounting regardless of whether or not it's specialized.

Thread Safety

Specialization

A critical section on the module's dictionary is held during specialization. This prevents concurrent mutation of the dictionary and its keys version. _PyDict_GetKeysVersionForCurrentState is used to assign the keys version, ensuring that the keys object is freed using QSBR.

UOPS

  • Atomics are used where needed. Notably, loading the module's dictionary does not require an atomic: the md_dict field, but not the dictionary itself, is immutable.
  • It's safe to access the keys object without holding the dictionary's per object lock: version assignment during specialization ensured it is freed using QSBR. A racing update will never store a different key at the cached index: once a unicode key has been stored in a keys object for a combined dictionary the offset that it was stored in will not be reused in free-threaded builds.

Single-threaded Performance

Scaling Benchmarks

This improves the scaling of calls to module functions (2.1x slower -> 13.9x faster), while other benchmarks appear unchanged.

Running benchmarks with 28 threads
                   Main          PR
object_cfunction    1.3x slower  1.4x slower
cmodule_function    1.5x slower  1.8x slower
mult_constant      11.8x faster  12.5x faster
generator          13.5x faster  12.6x faster
pymethod            2.0x slower  1.9x slower
pyfunction         13.7x faster  12.4x faster
module_function     2.1x slower  13.9x faster
load_string_const  13.6x faster  12.8x faster
load_tuple_const   13.6x faster  13.2x faster
create_pyobject    15.4x faster  15.9x faster
create_closure     14.4x faster  14.2x faster
create_dict        13.3x faster  13.5x faster
thread_local_read   3.6x slower  4.4x slower

@mpage mpage added the 🔨 test-with-refleak-buildbots Test PR w/ refleak buildbots; report in status section label Dec 7, 2024
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @mpage for commit 38140d9 🤖

If you want to schedule another build, you need to add the 🔨 test-with-refleak-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-refleak-buildbots Test PR w/ refleak buildbots; report in status section label Dec 7, 2024
@mpage
Copy link
Contributor Author

mpage commented Dec 7, 2024

Refleaks build failures for the default build are preexisting. The free-threaded failure looks like its caused by this PR.

mpage added 2 commits December 8, 2024 21:27
…y_decrefs_once

Refactor the test so that the specialized and unspecialized implementation of loading
the argument to sys.getrefcount use the same refcounting approach. Previously, the
argument would be evaluated by loading an attribute from a module. This specializes
to LOAD_ATTR_MODULE. In free-threaded builds the unspecialized form of LOAD_ATTR
always creates a new reference for its result, while the specialized form does not
create a reference if the result uses deferred refcounting. This causes a difference
in the result returned from sys.getrefcount, depending on whether or not the bytecode
has been specialized (e.g. on runs > 1 in refleak tests).

The refactored version uses LOAD_GLOBAL, whose specialized and unspecialized forms both
do not create references when the result uses deferred refcounting.

Also refactor the test to handle the difference in the result returned from sys.getrefcount
in default builds (includes the temporary reference on the operand stack) and free-threaded
builds (no temporary reference is created for deferred values).
@mpage
Copy link
Contributor Author

mpage commented Dec 9, 2024

!buildbot nogil refleak

@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @mpage for commit 550f955 🤖

The command will test the builders whose names match following regular expression: nogil refleak

The builders matched are:

  • AMD64 Fedora Rawhide NoGIL refleaks PR
  • PPC64LE Fedora Rawhide NoGIL refleaks PR
  • aarch64 Fedora Rawhide NoGIL refleaks PR
  • AMD64 CentOS9 NoGIL Refleaks PR

@mpage mpage requested a review from colesbury December 9, 2024 23:34
@mpage mpage marked this pull request as ready for review December 9, 2024 23:34
Copy link
Contributor

@colesbury colesbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Python/bytecodes.c Outdated Show resolved Hide resolved
@@ -665,19 +667,46 @@ def test_c_subclass_of_heap_ctype_with_del_modifying_dunder_class_only_decrefs_o
del subclass_instance

# Test that setting __class__ modified the reference counts of the types
#
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit of an aside, but I'm skeptical of the value of this test considering the increasing complexity. If we want to make sure we don't decref too many times, we should test that we don't crash. Depending on the exact refcounts feels too likely to break with implementation changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Checking that we don't crash sounds like a better approach to me. Filed #127881 to track that.

@mpage mpage added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Dec 12, 2024
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @mpage for commit a483958 🤖

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Dec 12, 2024
@mpage
Copy link
Contributor Author

mpage commented Dec 12, 2024

!buildbot AMD64 Android

@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @mpage for commit a483958 🤖

The command will test the builders whose names match following regular expression: AMD64 Android

The builders matched are:

  • AMD64 Android PR

@mpage
Copy link
Contributor Author

mpage commented Dec 13, 2024

Buildbot failures look unrelated to this PR:

  • AMD64 Arch Linux Asan PR, AMD64 Arch Linux Perf PR, AMD64 Windows11 Non-Debug PR, PPC64LE Fedora Stable LTO PR, PPC64LE Fedora Stable PR, PPC64LE RHEL8 LTO PR - All the same failure that exists on main
  • AMD64 FreeBSD PR - Exists on main
  • ARM64 Windows Non-Debug PR, ARM64 Windows PR - Buildbots are brokens

@mpage mpage merged commit 2de048c into python:main Dec 13, 2024
63 checks passed
@mpage mpage deleted the gh-115999-load-attr-module branch December 13, 2024 18:17
srinivasreddy pushed a commit to srinivasreddy/cpython that referenced this pull request Jan 8, 2025
…hreaded builds (python#127711)

We use the same approach that was used for specialization of LOAD_GLOBAL in free-threaded builds:

_CHECK_ATTR_MODULE is renamed to _CHECK_ATTR_MODULE_PUSH_KEYS; it pushes the keys object for the following _LOAD_ATTR_MODULE_FROM_KEYS (nee _LOAD_ATTR_MODULE). This arrangement avoids having to recheck the keys version.

_LOAD_ATTR_MODULE is renamed to _LOAD_ATTR_MODULE_FROM_KEYS; it loads the value from the keys object pushed by the preceding _CHECK_ATTR_MODULE_PUSH_KEYS at the cached index.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants