Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-115999: Specialize STORE_ATTR in free-threaded builds. #127838

Merged
merged 12 commits into from
Dec 19, 2024

Conversation

nascheme
Copy link
Member

@nascheme nascheme commented Dec 11, 2024

  • Fix thread safety issues with specialized opcodes (STORE_ATTR_INSTANCE_VALUE, STORE_ATTR_SLOT, STORE_ATTR_WITH_HINT). Need a combination of locks and atomics to be safe.
  • Fix thread safety issues with _Py_Specialize_StoreAttr . Avoid using borrowed references. Save and store the tp_version_tag from the beginning of the specialization process since it might change. Use helper functions to update opcode.
  • Add unit tests to ensure specialization is happening in free-threaded builds. Add addtional tests to try to trigger data races.

@nascheme nascheme changed the title gh-115999: Enable specialization of STORE_ATTR free-threaded builds. gh-115999: Specialize STORE_ATTR in free-threaded builds. Dec 11, 2024
@nascheme nascheme marked this pull request as ready for review December 11, 2024 21:54
@nascheme nascheme requested a review from mpage December 11, 2024 21:54
Python/bytecodes.c Outdated Show resolved Hide resolved
Python/bytecodes.c Outdated Show resolved Hide resolved
Copy link
Contributor

@mpage mpage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this on! This looks like a regression on the default build. I haven't had a chance to dig into it, but I suspect it might either be due to the check that Sam flagged in _STORE_ATTR_INSTANCE_VALUE or the change to when we read the type version in _Py_Specialize_StoreAttr. It looks like the richards benchmark is the most heavily affected, so that might be a good isolated benchmark to use for debugging.

Lib/test/test_opcache.py Outdated Show resolved Hide resolved
Python/specialize.c Outdated Show resolved Hide resolved
Python/specialize.c Outdated Show resolved Hide resolved
Python/specialize.c Outdated Show resolved Hide resolved
Python/specialize.c Outdated Show resolved Hide resolved
@nascheme
Copy link
Member Author

nascheme commented Dec 13, 2024

Thanks for taking this on! This looks like a regression on the default build. I haven't had a chance to dig into it, but I suspect it might either be due to the check that Sam flagged in _STORE_ATTR_INSTANCE_VALUE or the change to when we read the type version in _Py_Specialize_StoreAttr. It looks like the richards benchmark is the most heavily affected, so that might be a good isolated benchmark to use for debugging.

Based on my benchmarking my second commit, the regression with richards is gone. Likely it was the fact that STORE_ATTR_INSTANCE_VALUE was not actually working.

Regarding the tp_version_tag not getting set due to type_get_version() being hoisted, fixing it as you suggest by using _PyType_LookupRefAndVersion is a bit complex to do. So, I deferred doing that for now. I'll look at it again.

@nascheme
Copy link
Member Author

Regarding the tp_version_tag not getting set due to type_get_version() being hoisted, fixing it as you suggest by using _PyType_LookupRefAndVersion is a bit complex to do. So, I deferred doing that for now. I'll look at it again.

This was actually not hard to fix. I was thinking that _PyType_LookupRefAndVersion would not set the tag in the case the lookup fails. However, that's not the case.

Python/bytecodes.c Outdated Show resolved Hide resolved
Python/bytecodes.c Outdated Show resolved Hide resolved
@nascheme nascheme force-pushed the gh-115999-specialize-store-attr branch from f920dcd to 4c484ab Compare December 13, 2024 19:17
@nascheme
Copy link
Member Author

I rebased on main to resolve merge conflicts and force pushed.

Copy link
Contributor

@mpage mpage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a couple of small comments inline, but LGTM overall.

Python/specialize.c Outdated Show resolved Hide resolved
Python/specialize.c Outdated Show resolved Hide resolved
@nascheme nascheme requested a review from methane as a code owner December 17, 2024 21:59
* Fix locking for `STORE_ATTR_INSTANCE_VALUE`.  Create
  `_GUARD_TYPE_VERSION_AND_LOCK` so that object stays locked and
  `tp_version_tag` cannot change.  Fix inverted logic bug that caused
  erroneous deopt.

* Fix locking for `_STORE_ATTR_WITH_HINT`.  Double check that
  `_PyObject_GetManagedDict()` hasn't disappeared since we locked the
  dict.

- Pass `tp_version_tag` to `specialize_dict_access()`, ensuring
  the version we store on the cache is the correct one (in case of
  it changing during the specalize analysis).

- Split `analyze_descriptor` into `analyze_descriptor_load` and
  `analyze_descriptor_store` since those don't share much logic.
  Add `descriptor_is_class` helper function.

- In `specialize_dict_access`, double check `_PyObject_GetManagedDict()`
  in case we race and dict was materialized before the lock.
If the type is new and a version tag hasn't yet been assigned, we would
fail to specialize it.  Use `_PyType_LookupRefAndVersion()` instead of
`type_get_version()`, which will assign a version.
Use provided value of `tp_version` to store in cache.
This also fixes the case if the dict is replaced with a different one.
The function is only used in with-GIL builds.
For `specialize_dict_access_inline()`, we need to lock the keys object.

* Add `_PyDictKeys_StringLookupSplit` which does required locking and
  use in place of `_PyDictKeys_StringLookup`.

* Change `_PyObject_TryGetInstanceAttribute` to use that function
  in the case of split keys.

* Add `unicodekeys_lookup_split` helper which allows code sharing
  between `_Py_dict_lookup` and `_PyDictKeys_StringLookupSplit`.
@nascheme nascheme force-pushed the gh-115999-specialize-store-attr branch from 699f4e9 to 6bf9016 Compare December 18, 2024 06:16
Copy link
Contributor

@mpage mpage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Just a couple of small comments inline.

Objects/dictobject.c Outdated Show resolved Hide resolved
Objects/dictobject.c Outdated Show resolved Hide resolved
Copy link
Contributor

@mpage mpage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Benchmark results:

@nascheme nascheme merged commit 1b15c89 into python:main Dec 19, 2024
55 checks passed
srinivasreddy pushed a commit to srinivasreddy/cpython that referenced this pull request Dec 23, 2024
…thongh-127838)

* Add `_PyDictKeys_StringLookupSplit` which does locking on dict keys and
  use in place of `_PyDictKeys_StringLookup`.

* Change `_PyObject_TryGetInstanceAttribute` to use that function
  in the case of split keys.

* Add `unicodekeys_lookup_split` helper which allows code sharing
  between `_Py_dict_lookup` and `_PyDictKeys_StringLookupSplit`.

* Fix locking for `STORE_ATTR_INSTANCE_VALUE`.  Create
  `_GUARD_TYPE_VERSION_AND_LOCK` uop so that object stays locked and
  `tp_version_tag` cannot change.

* Pass `tp_version_tag` to `specialize_dict_access()`, ensuring
  the version we store on the cache is the correct one (in case of
  it changing during the specalize analysis).

* Split `analyze_descriptor` into `analyze_descriptor_load` and
  `analyze_descriptor_store` since those don't share much logic.
  Add `descriptor_is_class` helper function.

* In `specialize_dict_access`, double check `_PyObject_GetManagedDict()`
  in case we race and dict was materialized before the lock.

* Avoid borrowed references in `_Py_Specialize_StoreAttr()`.

* Use `specialize()` and `unspecialize()` helpers.

* Add unit tests to ensure specializing happens as expected in FT builds.

* Add unit tests to attempt to trigger data races (useful for running under TSAN).

* Add `has_split_table` function to `_testinternalcapi`.
srinivasreddy pushed a commit to srinivasreddy/cpython that referenced this pull request Jan 8, 2025
…thongh-127838)

* Add `_PyDictKeys_StringLookupSplit` which does locking on dict keys and
  use in place of `_PyDictKeys_StringLookup`.

* Change `_PyObject_TryGetInstanceAttribute` to use that function
  in the case of split keys.

* Add `unicodekeys_lookup_split` helper which allows code sharing
  between `_Py_dict_lookup` and `_PyDictKeys_StringLookupSplit`.

* Fix locking for `STORE_ATTR_INSTANCE_VALUE`.  Create
  `_GUARD_TYPE_VERSION_AND_LOCK` uop so that object stays locked and
  `tp_version_tag` cannot change.

* Pass `tp_version_tag` to `specialize_dict_access()`, ensuring
  the version we store on the cache is the correct one (in case of
  it changing during the specalize analysis).

* Split `analyze_descriptor` into `analyze_descriptor_load` and
  `analyze_descriptor_store` since those don't share much logic.
  Add `descriptor_is_class` helper function.

* In `specialize_dict_access`, double check `_PyObject_GetManagedDict()`
  in case we race and dict was materialized before the lock.

* Avoid borrowed references in `_Py_Specialize_StoreAttr()`.

* Use `specialize()` and `unspecialize()` helpers.

* Add unit tests to ensure specializing happens as expected in FT builds.

* Add unit tests to attempt to trigger data races (useful for running under TSAN).

* Add `has_split_table` function to `_testinternalcapi`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants