time.sleep(0) is slower on Python 3.11 than on Python 3.10 #125997

Open
charles-cooper opened this issue Oct 26, 2024 · 17 comments
Labels
extension-modules C modules in the Modules dir performance Performance or resource usage type-bug An unexpected behavior, bug, or error

Comments

@charles-cooper

charles-cooper commented Oct 26, 2024

Bug report

Bug description:

the following script runs roughly 80x slower on python 3.11 and python 3.12 than on python 3.10:

# relax.py

import time

def main():
    for _ in range(1_000_000):
        time.sleep(0.0)

main()
~ $ time python3.10 relax.py 

real	0m0.643s
user	0m0.345s
sys	0m0.314s
~ $ time python3.11 relax.py 

real	0m52.212s
user	0m0.744s
sys	0m1.115s
~ $ time python3.12 relax.py 

real	0m52.826s
user	0m1.317s
sys	0m0.997s

here is my system+python information:

~ $ uname -r
5.15.0-122-generic
~ $ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 22.04.5 LTS
Release:	22.04
Codename:	jammy
~ $ python3.10 -VV
Python 3.10.12 (main, Sep 11 2024, 15:47:36) [GCC 11.4.0]
~ $ python3.11 -VV
Python 3.11.10 (main, Sep  7 2024, 18:35:41) [GCC 11.4.0]
~ $ python3.12 -VV
Python 3.12.7 (main, Oct  1 2024, 08:52:12) [GCC 11.4.0]

CPython versions tested on:

3.10, 3.11, 3.12, 3.13

Operating systems tested on:

Linux

@charles-cooper charles-cooper added the type-bug An unexpected behavior, bug, or error label Oct 26, 2024
@dg-pb
Contributor

dg-pb commented Oct 26, 2024

A while ago did some benchmarking on 3.11 and 3.12 with sleep (on OSX) and it did seem unreasonably slow.

Was expecting nano-seconds, but was in micro-second order.

Just assumed that this is how it should be.
But if this is actually a fixable regression, it would be great.

@ZeroIntensity
Member

I wasn't able to reproduce this on my end:

$ time python3.11 relax.py
python3.11 relax.py  1.07s user 1.54s system 4% cpu 55.396 total
$ time python3.12 relax.py
python3.12 relax.py  2.04s user 0.47s system 4% cpu 55.842 total

Though, timeit does pick up a tiny slowdown:

$ cat relax.py | python3.11 -m timeit
20000000 loops, best of 5: 10.8 nsec per loop
$ cat relax.py | python3.12 -m timeit
50000000 loops, best of 5: 7.47 nsec per loop

Unfortunately, 3.11 is security-only, so there's nothing that can get fixed here unless this is somehow causing a security issue somewhere.

@ZeroIntensity ZeroIntensity added extension-modules C modules in the Modules dir 3.11 only security fixes pending The issue will be closed if no feedback is provided labels Oct 26, 2024
@dg-pb
Contributor

dg-pb commented Oct 26, 2024

Can not reproduce on OSX either:

python -m timeit -s 'import time' '[time.sleep(0) for _ in range(1_000_000)]'
# 3.10: 1 loop, best of 5: 631 msec per loop
# 3.11: 1 loop, best of 5: 550 msec per loop
# 3.12: 1 loop, best of 5: 556 msec per loop

@charles-cooper
Author

hmm -- fwiw, here is my CPU info (from cat /proc/cpuinfo):

vendor_id	: GenuineIntel
cpu family	: 6
model		: 166
model name	: Intel(R) Core(TM) i7-10710U CPU @ 1.10GHz
stepping	: 0
microcode	: 0xfe
cpu MHz		: 1600.000
cache size	: 12288 KB
physical id	: 0
siblings	: 12
core id		: 4
cpu cores	: 6
apicid		: 9
initial apicid	: 9
fpu		: yes
fpu_exception	: yes

i am reproducing this in python3.13 as well.

~ $ python3.13 -VV
Python 3.13.0 (main, Oct  8 2024, 08:51:28) [GCC 11.4.0]
~ $ time python3.13 relax.py

real	0m52.886s
user	0m1.305s
sys	0m1.023s

and using timeit:

~ $ python3.10 -m timeit -s 'import time' 'time.sleep(0)'
500000 loops, best of 5: 404 nsec per loop
~ $ python3.11 -m timeit -s 'import time' 'time.sleep(0)'
5000 loops, best of 5: 51.9 usec per loop
~ $ python3.12 -m timeit -s 'import time' 'time.sleep(0)'
5000 loops, best of 5: 51.9 usec per loop
~ $ python3.13 -m timeit -s 'import time' 'time.sleep(0)'
5000 loops, best of 5: 51.9 usec per loop
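
For anyone who wants to check their own machine from inside Python rather than from the shell, the measurement above can be sketched with the timeit module. This is only a minimal sketch, not the exact method used in this thread; the helper name per_call_ns and the loop counts are made up for illustration.

```python
import time
import timeit


def per_call_ns(stmt="time.sleep(0)", number=2_000, repeat=5):
    """Return the best-of-`repeat` cost of one `stmt` execution, in nanoseconds.

    `per_call_ns` is a hypothetical helper name used only for this sketch.
    """
    timer = timeit.Timer(stmt, globals={"time": time})
    # Each entry is the total wall time in seconds for `number` executions.
    totals = timer.repeat(repeat=repeat, number=number)
    return min(totals) / number * 1e9


if __name__ == "__main__":
    print(f"time.sleep(0): {per_call_ns():.0f} ns per call")
```

Running it under each interpreter in turn should show the same gap as the command-line numbers if the machine is affected.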

@charles-cooper
Author

charles-cooper commented Oct 26, 2024

i am reproducing on another linux machine, too - this time with ubuntu 24.04

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 24.04.1 LTS
Release:	24.04
Codename:	noble
$ python -m timeit -s 'import time' 'time.sleep(0)'
5000 loops, best of 5: 51.8 usec per loop

@hauntsaninja hauntsaninja added pending The issue will be closed if no feedback is provided and removed pending The issue will be closed if no feedback is provided labels Oct 26, 2024
@hauntsaninja
Contributor

hauntsaninja commented Oct 26, 2024

I can reproduce on Linux (but not on macOS) — sleep goes from comfortably under 1us to >50us

I'm assuming this is caused by this change: #65501 (it's also mentioned in the docs for time.sleep).
Reading online, it sounds like you could get a lower interval by fiddling with the Linux scheduler.

> Unfortunately, 3.11 is security-only, so there's nothing that can get fixed here

The change is present in Python 3.11 onwards, so it's possible we could change something here. That said, unless someone has a concrete change to suggest, this issue probably isn't going to go anywhere.

@ZeroIntensity ZeroIntensity added the 3.12 bugs and security fixes label Oct 26, 2024
@ZeroIntensity
Member

> The change is present in Python 3.11 onwards, so it's possible we could change something here. That said, unless someone has a concrete change to suggest, this issue probably isn't going to go anywhere.

Oh, I do see that in the original report now. I thought this only affected 3.11, my bad! But yeah, I doubt there's anything that can be done here. Maybe this is somehow an upstream Linux problem.

@charles-cooper charles-cooper changed the title time.sleep regression in 3.11 time.sleep regression in 3.11 onwards Oct 26, 2024
@charles-cooper
Author

> I'm assuming this is caused by this change: #65501 (it's also mentioned in the docs for time.sleep). Reading online, it sounds like you could get a lower interval by fiddling with the Linux scheduler.

@hauntsaninja what is the connection with that change? is it that linux defaults to clock_nanosleep() instead of select()?

@charles-cooper
Author

charles-cooper commented Oct 26, 2024

just an update -- i can reproduce 3.10 behavior with the following code:

#!/usr/bin/env python
import select

#import time  # 259ms

def main():
    poll = select.poll()
    for _ in range(1_000_000):
        #time.sleep(0.0)
        #select.select([],[],[],0)
        poll.poll(0)

main()  # 408ms

@picnixz
Member

picnixz commented Dec 2, 2024

> That said, unless someone has a concrete change to suggest

How about specializing the sleep when the time is 0? we could just switch to select for this specific case and let the rest use clock_nanosleep.

@hauntsaninja
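
To make the suggestion concrete, the idea can be sketched at the Python level. This is only an illustration under stated assumptions: the real change would live in the C implementation of time.sleep(), the name sleep_specialized is hypothetical, and calling select() with three empty lists is known to work on Unix but not on Windows.

```python
import select
import time


def sleep_specialized(seconds):
    """Hypothetical wrapper: use a select() fast path only for a zero sleep."""
    if seconds < 0:
        raise ValueError("sleep length must be non-negative")
    if seconds == 0:
        # A zero-timeout select() with no file descriptors returns right away.
        # Unix-only assumption: Windows rejects three empty sequences.
        select.select([], [], [], 0)
        return
    # Every non-zero duration keeps using the regular time.sleep() path.
    time.sleep(seconds)
```

The zero case still makes one syscall, so it is not equivalent to pass; it only avoids the slower sleep path for that single value.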

@picnixz picnixz added the performance Performance or resource usage label Dec 2, 2024
@picnixz picnixz removed 3.11 only security fixes 3.12 bugs and security fixes pending The issue will be closed if no feedback is provided labels Dec 26, 2024
@picnixz picnixz changed the title time.sleep regression in 3.11 onwards time.sleep(0) regression in 3.11 onwards Dec 26, 2024
@picnixz
Member

picnixz commented Jan 5, 2025

This is subject to updates as a consensus has not been fully reached:

We discussed a lot on the PR but I'll try to summarize our conclusions:

  1. This will not be treated as a bug but rather as a performance regression. As such, any PR trying to reduce the time spent in time.sleep(0) will not be backported but considered as a feature.

  2. Using time.sleep(0) may or may not be what users want to do. If they want to wait for polling, they should explicitly use select.poll.poll(0). If they want to explicitly relinquish the CPU, they should use os.sched_yield() with a real-time scheduling policy, though the usage of that function is (among the various SO posts I've read) quite controversial.

If users want a zero sleep that is effectively a no-op, they should use pass instead. If they want a syscall, they should use the select() or sched_yield() alternatives. Currently time.sleep(0) takes roughly 50us per call, mainly because of implementation details and other checks. However, I managed to reduce this to 2us. While that is still above the ~150ns of select.poll.poll(0) or os.sched_yield(), it would at least not penalize existing code by much.

The reason we decided not to fall back to select() is essentially its implementation details, and because it's probably better to expect that sleep calls the underlying sleep function and not a "sleep-like" function such as select or sched_yield.
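
The three intents listed above can be spelled out as separate helpers so that the call site says which one is meant. A minimal sketch, assuming a Unix system where select.poll() and os.sched_yield() are available; the helper names are hypothetical and not part of any proposed API.

```python
import os
import select

# One reusable poll object with no file descriptors registered.
_poller = select.poll()


def poll_once():
    """Zero-timeout poll; returns the list of ready descriptors (none registered)."""
    return _poller.poll(0)


def yield_cpu():
    """Ask the scheduler to run something else; Unix only."""
    os.sched_yield()


def no_op():
    """The "fake" zero sleep: no syscall is made at all."""
    pass
```

Which helper is appropriate depends on what the original time.sleep(0) call was meant to achieve, which is exactly the ambiguity discussed in this comment.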

@vstinner
Member

vstinner commented Jan 7, 2025

I ran some benchmarks on Linux (Fedora 41, Linux kernel 6.12.6).

On Python 3.10, sleep(0) took 289 ns, whereas it takes 52.6 us on Python 3.11: Python 3.11 is around 182x slower than Python 3.10.

$ python3.10 -m pyperf timeit -s 'import time; sleep=time.sleep' 'sleep(0)'
Mean +- std dev: 289 ns +- 22 ns
$ python3.11 -m pyperf timeit -s 'import time; sleep=time.sleep' 'sleep(0)'
Mean +- std dev: 52.6 us +- 0.0 us

Python 3.11 was modified to use nanosleep() / clock_nanosleep() which explains this difference.

Note that a sleep of 1 nanosecond takes exactly 52.6 us on both Python 3.10 and 3.11:

$ python3.10 -m pyperf timeit -s 'import time; sleep=time.sleep' 'sleep(1e-9)'
Mean +- std dev: 52.6 us +- 0.0 us
$ python3.11 -m pyperf timeit -s 'import time; sleep=time.sleep' 'sleep(1e-9)'
Mean +- std dev: 52.6 us +- 0.0 us

@vstinner
Member

vstinner commented Jan 7, 2025

FreeBSD doesn't have this issue, even if Python 3.11 implements time.sleep() with clock_nanosleep().

$ python3.10 -m pyperf timeit -s 'import time; sleep=time.sleep' 'sleep(0)' 
Mean +- std dev: 306 ns +- 25 ns

$ python3.11 -m pyperf timeit -s 'import time; sleep=time.sleep' 'sleep(0)' 
Mean +- std dev: 252 ns +- 19 ns

Moreover, Python 3.11 is 1.21x faster than Python 3.10.


For a sleep of 1 nanosecond, Python 3.11 is 25.5x faster than Python 3.10:

$ python3.10 -m pyperf timeit -s 'import time; sleep=time.sleep' 'sleep(1e-9)' 
Mean +- std dev: 5.97 us +- 0.09 us

$ python3.11 -m pyperf timeit -s 'import time; sleep=time.sleep' 'sleep(1e-9)' 
Mean +- std dev: 234 ns +- 4 ns
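
The ratios quoted in this comment and the previous one follow directly from the pyperf means. As a quick check of the arithmetic (the variable names are made up for this sketch):

```python
NS_PER_US = 1_000


def ratio(slow_ns, fast_ns):
    """How many times larger the slow mean is than the fast mean."""
    return slow_ns / fast_ns


# Linux, sleep(0): 52.6 us on Python 3.11 vs 289 ns on Python 3.10.
linux_sleep0 = ratio(52.6 * NS_PER_US, 289)
# FreeBSD, sleep(0): 306 ns on Python 3.10 vs 252 ns on Python 3.11.
freebsd_sleep0 = ratio(306, 252)
# FreeBSD, sleep(1e-9): 5.97 us on Python 3.10 vs 234 ns on Python 3.11.
freebsd_sleep_1ns = ratio(5.97 * NS_PER_US, 234)

print(f"Linux sleep(0):      {linux_sleep0:.0f}x slower on 3.11")
print(f"FreeBSD sleep(0):    {freebsd_sleep0:.2f}x faster on 3.11")
print(f"FreeBSD sleep(1e-9): {freebsd_sleep_1ns:.1f}x faster on 3.11")
```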

@vstinner
Member

vstinner commented Jan 7, 2025

os.sched_yield() benchmark:

python3 -m pyperf timeit -s 'import os; sched_yield=os.sched_yield' 'sched_yield()' 
  • Linux (Python 3.13): 316 ns +- 25 ns
  • FreeBSD (Python 3.11): 215 ns +- 15 ns

@vstinner
Member

vstinner commented Jan 7, 2025

An alternative would be to implement time.sleep(0) as calling sched_yield(). The problem is that I'm not sure if sched_yield() semantics is the one expected by time.sleep(0) callers?

@vstinner vstinner changed the title time.sleep(0) regression in 3.11 onwards time.sleep(0) is slower on Python 3.11 than on Python 3.10 Jan 7, 2025
@picnixz
Member

picnixz commented Jan 12, 2025

> The problem is that I'm not sure if sched_yield() semantics is the one expected by time.sleep(0) callers?

sched_yield() used with a non-deterministic scheduling policy (which is the default AFAIR) is not recommended (see its manpage). Relinquishing the CPU using time.sleep(0) is a Windows-only implementation (though I don't know why it was designed like this).

As a caller, I would expect:

  • time.sleep(0) to use sleep(), nanosleep(), or clock_nanosleep(). In this case, I shouldn't expect time.sleep(0) to be faster than a pure C call to clock_nanosleep().

  • Alternatively, it should return immediately (no call to any libc function). Now, do we want to specialize this or not? For now, I decided not to and suggested some alternatives. I see two cases: either the caller writes time.sleep(0) explicitly, and this can be changed into pass or something else, or this is something like time.sleep(s) where s = 0. In the latter case, I think the caller does not necessarily know what s is and thus expects that some libc function is called.

What I do think, however, is that time.sleep() was previously incorrectly using select(). We should have always used a sleep function rather than emulating a sleep via select() or sched_yield(). The purpose of those functions is different from "sleeping". The former is for waiting for file descriptors to be ready (but we're actually using the timeout value to emulate a sleep) while the latter is for voluntarily relinquishing the CPU. So both are distinct from just "sleeping" (although sched_yield() would be closer to what is expected, since it requires a real-time scheduling policy, we shouldn't use it either).

Note that using select() with very small sleep values also gives the same performance as with clock_nanosleep() and nanosleep(). The gap is only present when we use a sleep time of 0.
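
One rough way to see this on a given machine is to time both calls for a zero and for a very small timeout. This is a sketch rather than a careful benchmark (pyperf, as used earlier in the thread, is better suited for that); the helpers mean_ns and select_sleep are hypothetical names, and the empty-list select() call assumes a Unix system.

```python
import select
import time


def mean_ns(func, arg, loops=1_000):
    """Mean wall-clock cost of func(arg) in nanoseconds over `loops` calls."""
    start = time.perf_counter_ns()
    for _ in range(loops):
        func(arg)
    return (time.perf_counter_ns() - start) / loops


def select_sleep(seconds):
    """Emulate a sleep with select() and no file descriptors (Unix only)."""
    select.select([], [], [], seconds)


if __name__ == "__main__":
    for seconds in (0, 1e-9, 1e-6):
        sleep_cost = mean_ns(time.sleep, seconds)
        select_cost = mean_ns(select_sleep, seconds)
        print(f"timeout {seconds!r}: time.sleep {sleep_cost:.0f} ns, "
              f"select {select_cost:.0f} ns")
```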
