-
-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-126868: Add freelist for compact int objects #126865
Conversation
I'm running this PR over pyperformance on our benchmarking hardware. It will take ~3 hours. |
Actually, scratch that -- I'll wait until the tests are passing here. That's required for PGO builds. |
Tests are passing now, but I disabled returning objects to the freelist at a couple of places. Changing |
What happens exactly when you change these lines: PyStackRef_CLOSE_SPECIALIZED(left, (destructor)PyObject_Free); // needs to be converted to freelist to
? |
@markshannon When I change the lines several tests related to refleaks in the CI are failing. For example test_no_memleak, which can be reproduced from the command line with:
The output should be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "leaks" are due to the freelist itself. They are a sign the freelist is working :).
Can you please convert all to _PyLong_ExactDealloc and apply the suggestion below, and tell me how many allocations this removes?
@eendebakpt sorry I pushed to your branch as I'm really eager to get benchmark results on this :). |
On my machine using the benchmark script provided above (release build, no PGO, no LTO):
|
Objects/longobject.c
Outdated
_Py_SetImmortal(self); | ||
return; | ||
} | ||
} | ||
} | ||
|
||
if (PyLong_CheckExact(self)) { | ||
if (_PyLong_IsCompact((PyLongObject *)self)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Fidget-Spinner The _PyLong_IsCompact
check has already been done in this method, can we move this into the if (pylong && _PyLong_IsCompact(pylong))
part?
Also, can we remove the pylong &&
part? I think pylong
can never be NULL. (PyLong_CheckExact
assumes the pointer is not NULL I think)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok that sounds good
Results:
This is a great result! Congrats and great work @eendebakpt ! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity, are compact integers allowed to be signed or not? (you mentioned that we only focus on single digit numbers but the C API considers compact objects as being an implementation detail IIRC). If not, how hard would it be to make them support free lists? (I don't have much knowledge in free lists; for instance it's a mystery to me for how the free list grows)
Python/bytecodes.c
Outdated
@@ -26,6 +26,7 @@ | |||
#include "pycore_pyerrors.h" // _PyErr_GetRaisedException() | |||
#include "pycore_pystate.h" // _PyInterpreterState_GET() | |||
#include "pycore_range.h" // _PyRangeIterObject | |||
#include "pycore_long.h" // void _PyLong_ExactDealloc(PyLongObject *op); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we align the // (on mobile it seems 1 char off)?
Objects/longobject.c
Outdated
@@ -6615,7 +6642,7 @@ PyTypeObject PyLong_Type = { | |||
0, /* tp_init */ | |||
0, /* tp_alloc */ | |||
long_new, /* tp_new */ | |||
PyObject_Free, /* tp_free */ | |||
(freefunc)PyObject_Free, /* tp_free */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we align the .tp_free comment? (maybe tabs and spaces are mixed, hence that's why I see them unaligned on monile). Also, is the cast necessary to avoid UBSan failures? If not, you can just use .tp_free = ...
to emphasize the semantic
Objects/longobject.c
Outdated
* we accidentally decref small Ints out of existence. Instead, | ||
* since small Ints are immortal, re-set the reference count. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* we accidentally decref small Ints out of existence. Instead, | |
* since small Ints are immortal, re-set the reference count. | |
*/ | |
* we accidentally decref small Ints out of existence. Instead, | |
* since small Ints are immortal, re-set the reference count. | |
*/ |
Objects/longobject.c
Outdated
@@ -6,6 +6,7 @@ | |||
#include "pycore_bitutils.h" // _Py_popcount32() | |||
#include "pycore_initconfig.h" // _PyStatus_OK() | |||
#include "pycore_call.h" // _PyObject_MakeTpCall | |||
#include "pycore_freelist.h" // _Py_FREELIST_FREE(), _Py_FREELIST_POP() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment alignment.
@@ -55,6 +55,8 @@ extern void _PyLong_FiniTypes(PyInterpreterState *interp); | |||
|
|||
/* other API */ | |||
|
|||
PyAPI_FUNC(void) _PyLong_ExactDealloc(PyObject *self); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need it to be exported like that or could you live with a simple extern
? If not, can you explain which file needs this export?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's needed by the JIT on Windows, as it's used in bytecodes.c. This is a pretty common pattern in the internal C API unfortunately
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For semantic purposes, would it be better to have a macro named differently? (it would be a simple alias but it could help semantics and reviewers)
Objects/longobject.c
Outdated
@@ -42,7 +43,7 @@ static inline void | |||
_Py_DECREF_INT(PyLongObject *op) | |||
{ | |||
assert(PyLong_CheckExact(op)); | |||
_Py_DECREF_SPECIALIZED((PyObject *)op, (destructor)PyObject_Free); | |||
_Py_DECREF_SPECIALIZED((PyObject *)op, (destructor) _PyLong_ExactDealloc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_Py_DECREF_SPECIALIZED((PyObject *)op, (destructor) _PyLong_ExactDealloc); | |
_Py_DECREF_SPECIALIZED((PyObject *)op, (destructor)_PyLong_ExactDealloc); |
I'm also not sure whether you need the cast. If you need the cast, I'd suggest changing the signature of the drstructor itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this.
I've one suggestion for a name change, otherwise it looks good.
The generated files need to be regenerated, and I'd like to rerun the benchmarks to confirm the additional small int checks don't impact performance too much.
Objects/longobject.c
Outdated
@@ -3611,24 +3615,60 @@ long_richcompare(PyObject *self, PyObject *other, int op) | |||
Py_RETURN_RICHCOMPARE(result, 0, op); | |||
} | |||
|
|||
static inline int | |||
_PyLong_IsSmallInt(PyObject *self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With a name starting with _PyLong
this looks like an API function.
Since it is static and can only be used with compact ints maybe name it something like compact_int_is_small
?
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
I assume you mean the small int checks in |
Misc/NEWS.d/next/Core_and_Builtins/2024-11-16-22-37-46.gh-issue-126868.yOoHSY.rst
Outdated
Show resolved
Hide resolved
…e-126868.yOoHSY.rst
I have made the requested changes; please review again |
Thanks for making the requested changes! @markshannon: please review the changes made to this pull request. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this. The code looks good now.
The speedup may not be as large as initially claimed due to addition checks for small integers, but it is a real speedup and should be further improved by #127620
It seems likely the thread sanitizer failure is not caused by this PR, but this PR does expose it. |
TSan failure is addressed in #127880. |
The TSan failure is fixed in main now, if you want to merge main back into the PR. |
|
We can add freelists for the int object to improve performance. Using the new methods from #121934 the amount of code needed for adding a freelist is quite small. We only implement the freelist for compact ints (e.g. a single digit). For multi-digit int objects adding freelists is more complex (we need a size-based freelist) and the gains are smaller (for very large int objects the allocation is not a significant part of the computation time)
Notes:
long_dealloc
and_PyLong_ExactDealloc
are almost identical, we could keep justlong_dealloc
at the cost of a tiny bit of performance.Some references to discussions on freelists
The freelist improves performance of
int
operations in microbenchmarks:Benchmark script
On the pyperformance test suite (actually, a subset of the suite, not all benchmarks run on my system) shows the percentage of successfull freelist allocations increases significantly
Main:
PR