Help the compiler avoid inlining lazy init functions. #443
I did some comparisons on
It seems like (2) and (3) generate nearly identical code, while (3) is (to me) easier to read. So I think we should just mark |
Before this change, the compiler generates code that looks like this:

```
if not initialized:
    goto init
do_work:
    do the actual work
    goto exit
init:
    inlined init()
    goto do_work
exit:
    ret
```

If the initialization code is small, this works fine. But, for (bad) reasons, `is_rdrand_good` is particularly huge. Thus, jumping over its inlined code is wasteful because it puts unnecessary pressure on the instruction cache.

With this change, the generated code looks like this:

```
if not initialized:
    goto init
do_work:
    do the actual work
    goto exit
init:
    call init()
    goto do_work
exit:
    ret
```

I verified these claims by running:

```
$ cargo asm --rust getrandom_inner --lib --target=x86_64-fortanix-unknown-sgx
```

This is also what other implementations (e.g. OnceCell) do.

While here, I made the analogous change to `LazyPtr`, and rewrote `LazyPtr` to the same form as `LazyUsize`. I didn't check the generated code for `LazyPtr` though.

(Why is `is_rdrand_good` huge? The compiler unrolls the 10-iteration retry loop, and then it unrolls the 8-iteration self-test loop, so `rdrand()` ends up inlined 80 times (10 × 8) inside `is_rdrand_good`. This is something to address separately, as it also affects `getrandom_inner` itself.)
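For readers who want to see the shape being described, here is a minimal sketch of a lazily initialized `usize` whose slow path is split into a separate `#[cold]` function. It is an illustration under assumptions, not the code in this PR: the names `LazyUsize`, `UNINIT`, `unsync_init`, and `do_init` are chosen to match the discussion, and the exact layout in the crate may differ.

```rust
use core::sync::atomic::{AtomicUsize, Ordering};

/// Lazily initialized `usize`; `UNINIT` marks "not yet computed".
pub struct LazyUsize(AtomicUsize);

impl LazyUsize {
    /// Sentinel value (assumed here to never be a valid result).
    pub const UNINIT: usize = usize::MAX;

    pub const fn new() -> Self {
        Self(AtomicUsize::new(Self::UNINIT))
    }

    /// Fast path: one relaxed load and a compare.
    /// The slow path lives in a separate `#[cold]` function, which hints to
    /// the compiler that it is unlikely to run and discourages inlining it.
    pub fn unsync_init(&self, init: impl FnOnce() -> usize) -> usize {
        #[cold]
        fn do_init(this: &LazyUsize, init: impl FnOnce() -> usize) -> usize {
            let val = init();
            this.0.store(val, Ordering::Relaxed);
            val
        }

        let val = self.0.load(Ordering::Relaxed);
        if val != Self::UNINIT {
            val
        } else {
            do_init(self, init)
        }
    }
}
```

With this shape, the caller's hot path contains only the load, the compare, and a call instruction for the rare case. Whether the compiler actually keeps `do_init` out of line is still its decision, so checking the generated assembly (as done above with `cargo asm`) remains the way to confirm.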
First, this change doesn't just affect the RDRAND case. It also affects Linux/Android because it affects the `LazyBool` used to cache whether the `getrandom` syscall is available.

Secondly, the large effect on the Linux/Android implementation is clearer when `use_file` is refactored in the same way. I've updated this PR with that change.
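To make the `LazyBool` point concrete, here is a rough sketch of how a cached availability check might gate the choice of backend. Everything in it is an assumption for illustration: `probe_syscall`, `HAS_SYSCALL`, and `pick_backend` are hypothetical names, and the probe is a stand-in that always reports the syscall as missing rather than a real system call.

```rust
use core::sync::atomic::{AtomicUsize, Ordering};

const UNINIT: usize = usize::MAX;

/// Lazily computed boolean, stored as 0 or 1 with `UNINIT` as the sentinel.
pub struct LazyBool(AtomicUsize);

impl LazyBool {
    pub const fn new() -> Self {
        Self(AtomicUsize::new(UNINIT))
    }

    pub fn unsync_init(&self, init: impl FnOnce() -> bool) -> bool {
        // Slow path kept out of line via the `#[cold]` hint.
        #[cold]
        fn do_init(cell: &AtomicUsize, init: impl FnOnce() -> bool) -> bool {
            let val = init();
            cell.store(val as usize, Ordering::Relaxed);
            val
        }

        match self.0.load(Ordering::Relaxed) {
            UNINIT => do_init(&self.0, init),
            v => v != 0,
        }
    }
}

/// Hypothetical probe; a real one would try the syscall and inspect the error.
fn probe_syscall() -> bool {
    false
}

static HAS_SYSCALL: LazyBool = LazyBool::new();

/// Returns which (hypothetical) backend would be used.
pub fn pick_backend() -> &'static str {
    if HAS_SYSCALL.unsync_init(probe_syscall) {
        "syscall"
    } else {
        "file"
    }
}
```

Because every call goes through `unsync_init`, any code inlined into its slow path is carried by every caller, which is why the change matters beyond the RDRAND backend.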
Fair point, I like having all our init/error paths marked `#[cold]`, as it makes the intent of these paths a little clearer.
@@ -19,6 +19,8 @@ use core::{
const FILE_PATH: &[u8] = b"/dev/urandom\0";
const FD_UNINIT: usize = usize::max_value();

// Do not inline this when it is the fallback implementation.
#[cfg_attr(any(target_os = "android", target_os = "linux"), inline(never))]
Elsewhere we use `#[cold]` but here we use `inline(never)`, should we be consistent?
Well, when `getrandom` is not available it isn't cold. I could go either way.
I added a comment explaining why it is `#[inline(never)]` but not `#[cold]`.
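The distinction in this thread can be sketched as follows. This is an illustration only: `fallback_fill` and `fill` are hypothetical names, and the bodies write fixed bytes as stand-ins (they are not random and not the crate's implementation). The point is the attribute choice: `inline(never)` asks the compiler to keep the body out of the caller without claiming the function is rarely called, whereas `#[cold]` additionally hints that calls are unlikely.

```rust
/// Hypothetical fallback backend. A real one would read from a file such as
/// /dev/urandom; this stand-in writes a fixed byte and is NOT random.
///
/// When the preferred syscall is missing, this runs on every request, so it
/// is not "cold". `cfg_attr` applies `inline(never)` only on the listed
/// targets, where this function is the fallback rather than the main path.
#[cfg_attr(any(target_os = "android", target_os = "linux"), inline(never))]
pub fn fallback_fill(dest: &mut [u8]) {
    for byte in dest.iter_mut() {
        *byte = 0xAA;
    }
}

/// Hypothetical dispatcher: `syscall_available` stands in for a cached probe.
pub fn fill(dest: &mut [u8], syscall_available: bool) {
    if syscall_available {
        // Stand-in for the preferred path.
        dest.fill(0x55);
    } else {
        fallback_fill(dest);
    }
}
```

On targets other than Linux/Android the attribute is not applied at all, so the compiler remains free to inline the function where it is the primary implementation.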