gh-126868: Add freelist for compact int objects #126865

eendebakpt · 2024-11-15T11:20:45Z

We can add freelists for the int object to improve performance. Using the new methods from #121934, the amount of code needed to add a freelist is quite small. We only implement the freelist for compact ints (i.e. ints that fit in a single digit). For multi-digit int objects, adding freelists is more complex (we would need a size-based freelist) and the gains are smaller (for very large int objects, allocation is not a significant part of the computation time).

Notes:

The freelist size was chosen to be 100 (equal to the freelist size of float), but perhaps this can be tuned better.
long_dealloc and _PyLong_ExactDealloc are almost identical; we could keep just long_dealloc at the cost of a tiny bit of performance.
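As a rough illustration of the idea described above, here is a minimal Python model of a bounded freelist. It is only a sketch: the real change is written in C inside CPython, and the names `Box`, `BoundedFreelist`, `alloc`, and `free` are hypothetical, chosen for this example. The cap of 100 mirrors the size mentioned in the notes.

```python
class Box:
    """Stand-in for a small fixed-size object such as a one-digit int."""
    __slots__ = ("value",)

    def __init__(self, value=0):
        self.value = value


class BoundedFreelist:
    """Toy model of a bounded freelist (hypothetical, not CPython's API)."""

    def __init__(self, maxsize=100):
        self.maxsize = maxsize
        self._items = []
        self.reused = 0  # allocations served from the freelist
        self.fresh = 0   # allocations that needed a brand-new object

    def alloc(self, value):
        # Reuse a parked object when one is available.
        if self._items:
            obj = self._items.pop()
            obj.value = value
            self.reused += 1
        else:
            obj = Box(value)
            self.fresh += 1
        return obj

    def free(self, obj):
        # Park the object only while the list is below its cap;
        # otherwise let the normal allocator reclaim it.
        if len(self._items) < self.maxsize:
            self._items.append(obj)
            return True
        return False

    def __len__(self):
        return len(self._items)


fl = BoundedFreelist(maxsize=100)
live = [fl.alloc(i) for i in range(150)]   # all 150 are fresh allocations
kept = sum(fl.free(obj) for obj in live)   # only 100 fit on the freelist
again = [fl.alloc(i) for i in range(120)]  # 100 reused, 20 fresh
```

Because every parked object has the same size, a single list is enough; a multi-digit variant would need one list per size, which is the extra complexity the description refers to.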

Some references to discussions on freelists

The freelist improves performance of int operations in microbenchmarks:

bench_long: Mean +- std dev: [main_long] 106 ns +- 5 ns -> [pr_long1c] 99.8 ns +- 4.4 ns: 1.07x faster
bench_alloc: Mean +- std dev: [main_long] 210 us +- 6 us -> [pr_long1c] 177 us +- 10 us: 1.19x faster

Benchmark hidden because not significant (1): bench_collatz

Geometric mean: 1.08x faster

Benchmark script

# Quick benchmark for cpython long objects

import pyperf


def collatz(a):
    while a > 1:
        if a % 2 == 0:
            a = a // 2
        else:
            a = 3 * a + 1


def bench_collatz(loops):
    range_it = range(loops)
    t0 = pyperf.perf_counter()
    for ii in range_it:
        collatz(ii)
    return pyperf.perf_counter() - t0


def bench_long(loops):
    range_it = range(loops)
    t0 = pyperf.perf_counter()
    x = 10
    for ii in range_it:
        x = x * x
        y = x // 2
        x = y + ii + x
        if x > 10**10:
            x = x % 1000
    return pyperf.perf_counter() - t0


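def bench_alloc(loops):
    range_it = range(loops)
    t0 = pyperf.perf_counter()
    for ii in range_it:
        # Iterating the inner range creates a new int object for every value
        # outside the small-int cache; del drops the reference right away.
        for kk in range(20_000):
            del kk
    return pyperf.perf_counter() - t0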


# %timeit bench_long(1000)

if __name__ == "__main__":
    runner = pyperf.Runner()
    runner.bench_time_func("bench_collatz", bench_collatz)
    runner.bench_time_func("bench_long", bench_long)
    runner.bench_time_func("bench_alloc", bench_alloc)

On the pyperformance test suite (actually a subset of the suite, since not all benchmarks run on my system), the percentage of successful freelist allocations increases significantly:

Main:

Allocations from freelist 	2,004,971,371 	39.8%
Frees to freelist 	2,005,350,418 	
Allocations 	3,034,877,938 	60.2%
Allocations to 512 bytes 	3,008,791,812 	59.7%
Allocations to 4 kbytes 	18,648,072 	0.4%
Allocations over 4 kbytes 	7,438,054 	0.1%
Frees 	3,142,033,922

PR:

Allocations from freelist 	3,058,347,887 	58.6%
Frees to freelist 	3,058,576,117 	
Allocations 	2,159,771,546 	41.4%
Allocations to 512 bytes 	2,133,373,693 	40.9%
Allocations to 4 kbytes 	18,802,328 	0.4%
Allocations over 4 kbytes 	7,595,525 	0.1%
Frees 	2,267,538,686
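The percentages in the two tables can be reproduced from the raw counts. The sketch below assumes that each percentage is taken relative to the sum of the "Allocations from freelist" and "Allocations" rows; the helper name `freelist_share` is made up for this example.

```python
def freelist_share(from_freelist, from_allocator):
    """Percentage of all allocations that were served from a freelist."""
    total = from_freelist + from_allocator
    return 100.0 * from_freelist / total


# Counts copied from the "Main" and "PR" tables above.
main_share = freelist_share(2_004_971_371, 3_034_877_938)
pr_share = freelist_share(3_058_347_887, 2_159_771_546)

print(f"main: {main_share:.1f}%  PR: {pr_share:.1f}%")
# prints: main: 39.8%  PR: 58.6%
```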

Issue: Add freelist for compact int objects #126868

mdboom · 2024-11-15T13:48:40Z

I'm running this PR over pyperformance on our benchmarking hardware. It will take ~3 hours.

mdboom · 2024-11-15T13:50:12Z

> I'm running this PR over pyperformance on our benchmarking hardware. It will take ~3 hours.

Actually, scratch that -- I'll wait until the tests are passing here. That's required for PGO builds.

eendebakpt · 2024-11-16T23:00:24Z

Tests are passing now, but I disabled returning objects to the freelist at a couple of places. Changing PyObject_Free to _PyLong_ExactDealloc on the lines of code marked with "needs to be converted to freelist" introduces refleaks. It seems related to #125323. @fidgetSpinner Do you perhaps have an idea what is going on?

markshannon · 2024-11-18T15:05:01Z

What happens exactly when you change these lines:

PyStackRef_CLOSE_SPECIALIZED(left, (destructor)PyObject_Free); // needs to be converted to freelist

to

PyStackRef_CLOSE_SPECIALIZED(left, (destructor)_PyLong_ExactDealloc);

?

eendebakpt · 2024-11-18T15:58:46Z

@markshannon When I change the lines, several refleak-related tests fail in the CI. One example is test_no_memleak, which can be reproduced from the command line with:

python -Xshowrefcount -c "[x+x*x for x in range(1000)]"

The output should be [0 refs, 0 blocks], but instead I get [982 refs, 0 blocks] (exact numbers depending on which lines I change and the exact code executed).

Fidget-Spinner

The "leaks" are due to the freelist itself. They are a sign the freelist is working :).
Can you please convert all to _PyLong_ExactDealloc and apply the suggestion below, and tell me how many allocations this removes?

Objects/longobject.c

Fidget-Spinner · 2024-11-20T19:17:52Z

@eendebakpt sorry I pushed to your branch as I'm really eager to get benchmark results on this :).

Fidget-Spinner · 2024-11-20T19:30:21Z

On my machine using the benchmark script provided above (release build, no PGO, no LTO):

bench_collatz: Mean +- std dev: [without_freelist] 9.41 us +- 0.08 us -> [with_freelist] 9.22 us +- 0.08 us: 1.02x faster
bench_long: Mean +- std dev: [without_freelist] 187 ns +- 1 ns -> [with_freelist] 164 ns +- 2 ns: 1.14x faster
bench_alloc: Mean +- std dev: [without_freelist] 411 us +- 4 us -> [with_freelist] 333 us +- 2 us: 1.24x faster

Geometric mean: 1.13x faster
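For reference, the geometric mean reported above can be checked by hand from the three rounded speedup factors. This is only a sketch: pyperf works from the unrounded timings, so its result may differ slightly in later digits, and the helper `geometric_mean` is written here for illustration.

```python
import math


def geometric_mean(ratios):
    """Geometric mean computed via logarithms to avoid overflow."""
    ratios = list(ratios)
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Rounded speedup factors from the three benchmark lines above.
speedups = {"bench_collatz": 1.02, "bench_long": 1.14, "bench_alloc": 1.24}
gm = geometric_mean(speedups.values())

print(f"Geometric mean: {gm:.2f}x faster")
# prints: Geometric mean: 1.13x faster
```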

eendebakpt · 2024-11-20T20:12:32Z

Objects/longobject.c

                _Py_SetImmortal(self);
                return;
            }
        }
    }
+
+    if (PyLong_CheckExact(self)) {
+        if (_PyLong_IsCompact((PyLongObject *)self)) {


@Fidget-Spinner The _PyLong_IsCompact check has already been done in this method, can we move this into the if (pylong && _PyLong_IsCompact(pylong)) part?

Also, can we remove the pylong && part? I think pylong can never be NULL. (PyLong_CheckExact assumes the pointer is not NULL I think)

Ok that sounds good
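The order of checks being discussed in this thread can be modelled in Python roughly as follows. This is a sketch under assumptions, not the C code from the PR: the small-int range of -5 to 256 is CPython's usual cache range and is assumed here, compactness is approximated as "the magnitude fits in one digit", and `dealloc_action` is a hypothetical helper that only names the outcome.

```python
import sys

# Assumed range of CPython's preallocated (immortal) small ints.
SMALL_INT_MIN, SMALL_INT_MAX = -5, 256
# Bits per internal digit on this build (30 on most platforms).
DIGIT_BITS = sys.int_info.bits_per_digit


def is_compact(n):
    """Approximation: the magnitude fits in a single internal digit."""
    return abs(n) < (1 << DIGIT_BITS)


def is_small(n):
    return SMALL_INT_MIN <= n <= SMALL_INT_MAX


def dealloc_action(n, exact_type=True):
    """Name the path a deallocation would take in this toy model."""
    if is_compact(n):
        if is_small(n):
            # Small ints must never be freed; restore immortality instead.
            return "keep (immortal small int)"
        if exact_type:
            # Exact, compact ints are the only ones parked on the freelist.
            return "push to freelist"
    return "free normally"
```

Nesting the exact-type check inside the compact branch means the compactness test runs only once, which is the restructuring suggested in the comment above.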

Fidget-Spinner · 2024-11-21T07:33:52Z

Results:

1.1% speedup on macOS: https://github.com/faster-cpython/benchmarking-public/blob/main/results/bm-20241121-3.14.0a1%2B-d1e4aa2/bm-20241121-darwin-arm64-eendebakpt-int_freelist-3.14.0a1%2B-d1e4aa2-vs-base.svg
0.3% speedup on x86-Linux: https://github.com/faster-cpython/benchmarking-public/blob/main/results/bm-20241121-3.14.0a1%2B-d1e4aa2/bm-20241121-linux-x86_64-eendebakpt-int_freelist-3.14.0a1%2B-d1e4aa2-vs-base.svg
No speedup on Windows

This is a great result! Congrats and great work @eendebakpt !

picnixz

Out of curiosity, are compact integers allowed to be signed or not? (You mentioned that we only focus on single-digit numbers, but the C API considers compact objects an implementation detail IIRC.) If not, how hard would it be to make them support free lists? (I don't know much about free lists; for instance, how the free list grows is a mystery to me.)

picnixz · 2024-11-21T07:41:45Z

Python/bytecodes.c

@@ -26,6 +26,7 @@
 #include "pycore_pyerrors.h"      // _PyErr_GetRaisedException()
 #include "pycore_pystate.h"       // _PyInterpreterState_GET()
 #include "pycore_range.h"         // _PyRangeIterObject
+#include "pycore_long.h"         // void _PyLong_ExactDealloc(PyLongObject *op);


Can we align the // (on mobile it seems 1 char off)?

picnixz · 2024-11-21T07:42:29Z

Objects/longobject.c

@@ -6615,7 +6642,7 @@ PyTypeObject PyLong_Type = {
    0,                                          /* tp_init */
    0,                                          /* tp_alloc */
    long_new,                                   /* tp_new */
-    PyObject_Free,                              /* tp_free */
+    (freefunc)PyObject_Free,                              /* tp_free */


Can we align the .tp_free comment? (Maybe tabs and spaces are mixed, which is why I see them unaligned on mobile.) Also, is the cast necessary to avoid UBSan failures? If not, you can just use .tp_free = ... to emphasize the semantics.

picnixz · 2024-11-21T07:44:00Z

Objects/longobject.c

+                * we accidentally decref small Ints out of existence. Instead,
+                * since small Ints are immortal, re-set the reference count.
+                */


Suggested change

* we accidentally decref small Ints out of existence. Instead,

* since small Ints are immortal, re-set the reference count.

*/

picnixz · 2024-11-21T07:44:49Z

Objects/longobject.c

@@ -6,6 +6,7 @@
 #include "pycore_bitutils.h"      // _Py_popcount32()
 #include "pycore_initconfig.h"    // _PyStatus_OK()
 #include "pycore_call.h"          // _PyObject_MakeTpCall
+#include "pycore_freelist.h" // _Py_FREELIST_FREE(), _Py_FREELIST_POP()


Comment alignment.

picnixz · 2024-11-21T07:45:37Z

Include/internal/pycore_long.h

@@ -55,6 +55,8 @@ extern void _PyLong_FiniTypes(PyInterpreterState *interp);

 /* other API */

+PyAPI_FUNC(void) _PyLong_ExactDealloc(PyObject *self);


Do you need it to be exported like that or could you live with a simple extern? If not, can you explain which file needs this export?

It's needed by the JIT on Windows, as it's used in bytecodes.c. This is a pretty common pattern in the internal C API unfortunately

For semantic purposes, would it be better to have a macro named differently? (it would be a simple alias but it could help semantics and reviewers)

picnixz · 2024-11-21T07:52:40Z

Objects/longobject.c

@@ -42,7 +43,7 @@ static inline void
 _Py_DECREF_INT(PyLongObject *op)
 {
    assert(PyLong_CheckExact(op));
-    _Py_DECREF_SPECIALIZED((PyObject *)op, (destructor)PyObject_Free);
+    _Py_DECREF_SPECIALIZED((PyObject *)op, (destructor) _PyLong_ExactDealloc);


Suggested change

_Py_DECREF_SPECIALIZED((PyObject *)op, (destructor) _PyLong_ExactDealloc);

_Py_DECREF_SPECIALIZED((PyObject *)op, (destructor)_PyLong_ExactDealloc);

I'm also not sure whether you need the cast. If you do need it, I'd suggest changing the signature of the destructor itself.

markshannon

Thanks for doing this.
I've one suggestion for a name change, otherwise it looks good.

The generated files need to be regenerated, and I'd like to rerun the benchmarks to confirm the additional small int checks don't impact performance too much.

markshannon · 2024-12-04T12:29:06Z

Objects/longobject.c

@@ -3611,24 +3615,60 @@ long_richcompare(PyObject *self, PyObject *other, int op)
    Py_RETURN_RICHCOMPARE(result, 0, op);
 }

+static inline int
+_PyLong_IsSmallInt(PyObject *self)


With a name starting with _PyLong this looks like an API function.
Since it is static and can only be used with compact ints maybe name it something like compact_int_is_small?

bedevere-app · 2024-12-04T13:49:26Z

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

eendebakpt · 2024-12-07T22:20:27Z

> The generated files need to be regenerated, and I'd like to rerun the benchmarks to confirm the additional small int checks don't impact performance too much.

I assume you mean the small int checks in _PyLong_ExactDealloc? In current main these checks are not present (PyObject_Free is called directly), so we could argue they are not needed in _PyLong_ExactDealloc either. On the other hand, if I understand things correctly, they should be added for correctness.

Misc/NEWS.d/next/Core_and_Builtins/2024-11-16-22-37-46.gh-issue-126868.yOoHSY.rst

eendebakpt · 2024-12-09T10:42:53Z

I have made the requested changes; please review again

bedevere-app · 2024-12-09T10:42:58Z

Thanks for making the requested changes!

@markshannon: please review the changes made to this pull request.

markshannon · 2024-12-12T10:47:42Z

0.6% speedup

markshannon

Thanks for doing this. The code looks good now.

The speedup may not be as large as initially claimed due to the additional checks for small integers, but it is a real speedup and should be further improved by #127620.

markshannon · 2024-12-12T14:23:51Z

It seems likely the thread sanitizer failure is not caused by this PR, but this PR does expose it.
Hopefully it will be fixed soon, but we'll need to wait a bit before merging this PR.

colesbury · 2024-12-12T17:31:01Z

TSan failure is addressed in #127880.

colesbury · 2024-12-12T17:59:55Z

The TSan failure is fixed in main now, if you want to merge main back into the PR.

bedevere-bot · 2024-12-13T10:36:49Z

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot iOS ARM64 Simulator 3.x has failed when building commit 5fc6bb2.

What do you need to do:

Don't panic.
Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
Go to the page of the buildbot that failed (https://buildbot.python.org/#/builders/1380/builds/2109) and take a look at the build logs.
Check if the failure is related to this commit (5fc6bb2) or if it is a false positive.
If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/#/builders/1380/builds/2109

Add freelist of compact int objects

0713034

eendebakpt requested review from ericsnowcurrently and markshannon as code owners November 15, 2024 11:20

bedevere-app bot added the awaiting review label Nov 15, 2024

eendebakpt added 2 commits November 15, 2024 12:32

fix build

be58ade

cleanup freelist at exit

3f50b54

eendebakpt changed the title ~~Draft: Add freelist of compact int objects~~ Draft: gh-126868: Add freelist for compact int objects Nov 15, 2024

eendebakpt marked this pull request as draft November 15, 2024 12:50

bedevere-app bot removed the awaiting review label Nov 15, 2024

bedevere-app bot mentioned this pull request Nov 15, 2024

Add freelist for compact int objects #126868

Closed

eendebakpt and others added 7 commits November 16, 2024 19:59

fix memory leak; align with float implementation

fa97302

remove stale comment

d72486f

remove unused function

e07c218

avoid some problematic decrefs

328e0c1

jit build

e1dc2b3

📜🤖 Added by blurb_it.

9df776b

Merge branch 'main' into int_freelist

6b73046

Fidget-Spinner reviewed Nov 20, 2024

Apply review suggestions

db8247e

Fixup

d1e4aa2

eendebakpt commented Nov 20, 2024

picnixz reviewed Nov 21, 2024

markshannon requested changes Dec 4, 2024

bedevere-app bot added awaiting changes and removed awaiting review labels Dec 4, 2024

eendebakpt added 3 commits December 7, 2024 23:11

review comment

efde111

Merge branch 'main' into int_freelist

928e912

regenerate

437c24c

eendebakpt commented Dec 7, 2024

Update Misc/NEWS.d/next/Core_and_Builtins/2024-11-16-22-37-46.gh-issu…

b034948

…e-126868.yOoHSY.rst

bedevere-app bot added awaiting change review and removed awaiting changes labels Dec 9, 2024

bedevere-app bot requested a review from markshannon December 9, 2024 10:42

markshannon approved these changes Dec 12, 2024

bedevere-app bot added awaiting merge and removed awaiting change review labels Dec 12, 2024

colesbury mentioned this pull request Dec 12, 2024

gh-127879: Fix data race in _PyFreeList_Push #127880

Merged

eendebakpt added 2 commits December 12, 2024 21:49

Merge branch 'main' into int_freelist

19f64f6

Merge branch 'main' into int_freelist

14681c1

markshannon merged commit 5fc6bb2 into python:main Dec 13, 2024
57 checks passed

bedevere-app bot removed the awaiting merge label Dec 13, 2024

markshannon mentioned this pull request Dec 13, 2024

gh-127119: Faster check for small ints in long_dealloc #127620

Open

srinivasreddy pushed a commit to srinivasreddy/cpython that referenced this pull request Jan 8, 2025

pythongh-126868: Add freelist for compact int objects (pythonGH-126865)

abf3b83
