Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-127119: Faster check for small ints in long_dealloc #127620

Open
wants to merge 39 commits into
base: main
Choose a base branch
from

Conversation

eendebakpt
Copy link
Contributor

@eendebakpt eendebakpt commented Dec 4, 2024

  • Use a faster check for small ints in long_dealloc
  • Remove check for small ints in the free-threaded build
  • Add reference to PEP 683

Microbenchmark from #127120:

bench_long: Mean +- std dev: [main_pgo_bm_long] 195 ns +- 5 ns -> [pr_v2_pgo_bm_long] 191 ns +- 4 ns: 1.02x faster
bench_alloc: Mean +- std dev: [main_pgo_bm_long] 425 us +- 37 us -> [pr_v2_pgo_bm_long] 394 us +- 20 us: 1.08x faster

Benchmark hidden because not significant (1): bench_collatz

Geometric mean: 1.04x faster 

(benchmark is with PGO, without PGO I see no performance improvement)

Copy link
Member

@markshannon markshannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is technically not legal C.
Pointer comparisons between separately allocated blocks of memory are undefined.

Which is a shame, because this is clearly faster.

@eendebakpt
Copy link
Contributor Author

eendebakpt commented Dec 5, 2024

Pointer comparisions are indeed not always well-defined. For reference: see https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf, section 6.5.8.6, or https://www.gnu.org/software/c-intro-and-ref/manual/html_node/Pointer-Comparison.html.

I updated the PR to use the immortality bit instead.

(maybe there are some more places where we need to update documentation on the immortality bit)

Objects/longobject.c Outdated Show resolved Hide resolved
@@ -0,0 +1 @@
Slightly optimize the :class:`int` deallocator.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe include the benchmarks results (it's a 4% improvement which is still noticable IMO). You should mention that it only concerns PGO builds as well.

Tools/gdb/libpython.py Outdated Show resolved Hide resolved
@eendebakpt eendebakpt marked this pull request as draft December 5, 2024 22:08
@eendebakpt
Copy link
Contributor Author

Turns out the immortal bit was assumed to be zero in _PyLong_IsNonNegativeCompact. The updated _PyLong_IsNonNegativeCompact makes some of the operations with ints a (tiny) bit slower, so we have to measure performance a bit more carefull.

@eendebakpt eendebakpt marked this pull request as ready for review December 5, 2024 22:50
@chris-eibl
Copy link

Turns out the immortal bit was assumed to be zero in _PyLong_IsNonNegativeCompact. The updated _PyLong_IsNonNegativeCompact makes some of the operations with ints a (tiny) bit slower, so we have to measure performance a bit more carefull.

@eduardo-elizondo already spotted that back then #102464 (comment) and created #103403, especially to target long_dealloc, see #103403 (comment).

@eendebakpt
Copy link
Contributor Author

@chris-eibl Thanks for that little bit of history.

@markshannon @eduardo-elizondo This PR is more or less identical to #103403 which has been closed. Unless you feel different about it now, I suggest we close this.

@chris-eibl
Copy link

chris-eibl commented Dec 8, 2024

But your microbenchmarks for the initial version seemed promising.
Do you have numbers for the current version of the PR as well?

The results of the pyperformance fleet would be interesting, too, since you had to touch _PyLong_IsNonNegativeCompact.

I like that in your code the comment about Accidental De-Immortalizing is now exactly before _Py_SetImmortal.

In the previous version, I had to read carefully, especially because it starts with "This should never get called", which of course does not comment on the invocation of long_dealloc itself.

Every none-small-int will be deallocated, and for all of them we pay the price of _PyLong_IsCompact. If it is a compact int (very likely), then a bunch more code is executed. Especially for small int values we have to check whether they match the singletons to care about accidential de-immortaliziation. And we have to go down to the if (pylong == small_pylong) for all small int values for everything that derives from int.

I don't care much about IntEnums or their "predecessors" class _NamedIntConstant(int) , etc, since those most likely aren't deallocated before the end of the programm. But e.g. extensions that derive from int.

Copy link
Member

@markshannon markshannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few very minor issues

Modules/_testcapi/immortal.c Outdated Show resolved Hide resolved
Modules/_testcapi/immortal.c Outdated Show resolved Hide resolved
Include/cpython/longintrepr.h Outdated Show resolved Hide resolved
}

void
_PyLong_ExactDealloc(PyObject *self)
{
assert(PyLong_CheckExact(self));
#ifndef Py_GIL_DISABLED
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should excluding the no-gil build.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. According to https://peps.python.org/pep-0683/#stable-abi the no-gil implementation can work with older stable API extensions.

_Py_FREELIST_FREE(ints, self, PyObject_Free);
return;
}
#ifndef Py_GIL_DISABLED
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise

@bedevere-app
Copy link

bedevere-app bot commented Dec 20, 2024

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@eendebakpt
Copy link
Contributor Author

@markshannon Adding the small int check for the free-threaded build uncovered a bug: we have to prevent copying the immortal bit when subclassing int. I addresed the bug in long_subtype_new and added an assert to _long_is_small_int to detect similar issues.

@eendebakpt
Copy link
Contributor Author

I have made the requested changes; please review again

@bedevere-app
Copy link

bedevere-app bot commented Jan 8, 2025

Thanks for making the requested changes!

@markshannon: please review the changes made to this pull request.

@bedevere-app bedevere-app bot requested a review from markshannon January 8, 2025 21:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants