Help the compiler avoid inlining lazy init functions. #443

briansmith · 2024-05-31T19:56:09Z

Before this change, the compiler generates code that looks like this:

  if not initialized:
     goto init
do_work:
  do the actual work
  goto exit
init:
  inlined init()
  goto do_work
exit:
  ret

If the initialization code is small, this works fine. But, for (bad) reasons, is_rdrand_good is particularly huge. Thus, jumping over its inlined code on every call is wasteful because it puts needless pressure on the instruction cache.

With this change, the generated code looks like this:

  if not initialized:
     goto init
do_work:
  do the actual work
  goto exit
init:
  call init()
  goto do_work
exit:
  ret

I verified these claims by running:

$ cargo asm --rust getrandom_inner --lib --target=x86_64-fortanix-unknown-sgx

This is also what other implementations (e.g. OnceCell) do.
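To make the shape above concrete, here is a minimal sketch of a lazily initialized value whose slow path is kept out of line. The type and the helper name `do_init` are illustrative assumptions, not the crate's actual code; only the general idea (fast inline check, non-inlined init call) comes from the description above.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for a lazily initialized `usize`; the real
/// `LazyUsize` in the crate may differ in names and details.
pub struct LazyUsize(AtomicUsize);

impl LazyUsize {
    /// Sentinel meaning "not initialized yet".
    pub const UNINIT: usize = usize::MAX;

    pub const fn new() -> Self {
        Self(AtomicUsize::new(Self::UNINIT))
    }

    /// Slow path: asked to stay out of line, so the caller only needs a
    /// `call` instruction here instead of the whole init body.
    #[cold]
    #[inline(never)]
    fn do_init(&self, init: impl FnOnce() -> usize) -> usize {
        let val = init();
        self.0.store(val, Ordering::Relaxed);
        val
    }

    /// Fast path: one relaxed load and one compare. Without extra
    /// synchronization, `init` may run more than once if several threads
    /// race, so it should be idempotent.
    #[inline]
    pub fn unsync_init(&self, init: impl FnOnce() -> usize) -> usize {
        let val = self.0.load(Ordering::Relaxed);
        if val != Self::UNINIT {
            val
        } else {
            self.do_init(init)
        }
    }
}
```

After the first successful call, later calls return the cached value without touching the init closure.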

While here, I made the analogous change to LazyPtr, and rewrote LazyPtr to the same form as LazyUsize. I didn't check the generated code for LazyPtr though.

(Why is is_rdrand_good huge? The compiler unrolls the 10-iteration retry loop, and then it unrolls the 8-iteration self-test loop, so rdrand() ends up inlined 80 times inside is_rdrand_good. This is something to address separately as it also affects getrandom_inner itself.)
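The 80 comes from the nesting of the two loops (10 × 8). The sketch below models that nesting with a counter instead of real hardware. The constants mirror the iteration counts quoted above; `rdrand_step` is a hypothetical stand-in that always fails so that every retry is exercised. The dynamic worst-case call count it measures equals the number of copies a full unroll of both loops would produce.

```rust
/// Retry limit, taken from the "10-iteration retry loop" mentioned above.
const RETRY_LIMIT: usize = 10;
/// Self-test rounds, taken from the "8-iteration self-test loop" mentioned above.
const SELF_TEST_ROUNDS: usize = 8;

/// Hypothetical stand-in for one hardware attempt. It always fails and
/// counts how often it was called; it produces no random data.
fn rdrand_step(calls: &mut usize) -> Option<u64> {
    *calls += 1;
    None
}

/// Inner loop: retry a single draw up to `RETRY_LIMIT` times.
fn rdrand(calls: &mut usize) -> Option<u64> {
    for _ in 0..RETRY_LIMIT {
        if let Some(v) = rdrand_step(calls) {
            return Some(v);
        }
    }
    None
}

/// Outer loop: attempt `SELF_TEST_ROUNDS` draws and return how many succeeded.
fn self_test(calls: &mut usize) -> usize {
    let mut ok = 0;
    for _ in 0..SELF_TEST_ROUNDS {
        if rdrand(calls).is_some() {
            ok += 1;
        }
    }
    ok
}
```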

josephlr · 2024-06-03T23:06:35Z

I did some comparisons on x86_64-unknown-linux-gnu and opt-level=3 with a few different implementations:

(1) Current implementation: https://rust.godbolt.org/z/3nWjdbW7j
(2) The implementation in this PR: https://rust.godbolt.org/z/384jec4ba
(3) Marking is_rdrand_good as #[cold]: https://rust.godbolt.org/z/MnnavhYWT

It seems like (2) and (3) generate nearly identical code, while (3) is (to me) easier to read. So I think we should just mark is_rdrand_good (and self_test) as #[cold].
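For reference, option (3) could look roughly like the sketch below: the attribute goes on the expensive function itself rather than on a helper inside the lazy wrapper. The body of `is_rdrand_good` and the caching static are placeholders assumed for illustration. Note that `#[cold]` is only a hint that the function is unlikely to be called; it does not by itself guarantee that the function is never inlined.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Sentinel meaning "not checked yet" (illustrative, not the crate's layout).
const UNINIT: u8 = u8::MAX;
static RDRAND_GOOD: AtomicU8 = AtomicU8::new(UNINIT);

/// Placeholder for the expensive one-time check. Marking it `#[cold]`
/// tells the optimizer that calls to it are unlikely.
#[cold]
fn is_rdrand_good() -> bool {
    // Assumption for the sketch: the real check would probe the CPU and
    // run a self-test here.
    true
}

/// Cached query: a relaxed load on the hot path, the cold check only
/// while the cache is still uninitialized.
pub fn rdrand_available() -> bool {
    match RDRAND_GOOD.load(Ordering::Relaxed) {
        UNINIT => {
            let good = is_rdrand_good();
            RDRAND_GOOD.store(good as u8, Ordering::Relaxed);
            good
        }
        cached => cached != 0,
    }
}
```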

briansmith · 2024-06-04T00:55:42Z

> It seems like (2) and (3) generate nearly identical code, while (3) is (to me) easier to read. So I think we should just mark is_rdrand_good (and self_test) as #[cold].

First, this change doesn't just affect the RDRAND case. It also affects Linux/Android because it affects the LazyBool used to cache whether the getrandom syscall is available.

Secondly, the large effect on the Linux/Android implementation is clearer when use_file is refactored in the same way. I've updated this PR with that change.

josephlr

> First, this change doesn't just affect the RDRAND case. It also affects Linux/Android because it affects the LazyBool used to cache whether the getrandom syscall is available.
>
> Secondly, the large effect on the Linux/Android implementation is clearer when use_file is refactored in the same way. I've updated this PR with that change.

Fair point, I like having all our init/error paths marked #[cold], as it makes the intent of these paths a little clearer.

josephlr · 2024-06-04T01:01:06Z

src/use_file.rs

@@ -19,6 +19,8 @@ use core::{
 const FILE_PATH: &[u8] = b"/dev/urandom\0";
 const FD_UNINIT: usize = usize::max_value();

+// Do not inline this when it is the fallback implementation.
+#[cfg_attr(any(target_os = "android", target_os = "linux"), inline(never))]


Elsewhere we use #[cold], but here we use inline(never). Should we be consistent?

Well, when getrandom is not available it isn't cold. I could go either way.

I added a comment explaining why it is #[inline(never)] but not #[cold].
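A small sketch of the distinction discussed here, under assumed names: `Source`, `syscall_path`, `use_file_path`, and `pick_source` are hypothetical and only illustrate how the attribute is applied. On Linux/Android the file-based path is a fallback, so it is kept out of line; but when the syscall is unavailable it runs on every call, which is why `inline(never)` describes it better than `#[cold]`.

```rust
/// Which backend served the request (illustrative only).
#[derive(Debug, PartialEq)]
pub enum Source {
    Syscall,
    File,
}

/// Hypothetical primary path, cheap enough to inline.
#[inline]
fn syscall_path() -> Source {
    Source::Syscall
}

/// Hypothetical fallback path. On Linux/Android it is not inlined into the
/// caller; on other targets the attribute is not applied at all. It is not
/// marked `#[cold]` because it is the regular path whenever the syscall is
/// missing.
#[cfg_attr(any(target_os = "android", target_os = "linux"), inline(never))]
fn use_file_path() -> Source {
    Source::File
}

/// Dispatch on a previously cached availability flag.
pub fn pick_source(has_getrandom: bool) -> Source {
    if has_getrandom {
        syscall_path()
    } else {
        use_file_path()
    }
}
```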

src/lazy.rs

josephlr mentioned this pull request Jun 4, 2024

rdrand: Avoid inlining unrolled retry loops. #444

Closed

briansmith force-pushed the b/lazy-init-is-not-usually-done branch from 7d346d4 to dd0cf76 Compare June 4, 2024 00:53

use_file: Stop inlining uncommonly-taken path.

594dcf6

briansmith force-pushed the b/lazy-init-is-not-usually-done branch from dd0cf76 to bf84b06 Compare June 4, 2024 01:12

josephlr reviewed Jun 4, 2024

josephlr approved these changes Jun 4, 2024

briansmith added 2 commits June 4, 2024 10:14

Do not inline use_file when it is the fallback.

510c5fc

Make hot/cold path logic for one-time initialization more uniform.

7afea71

briansmith force-pushed the b/lazy-init-is-not-usually-done branch from 4ca1f51 to 7afea71 Compare June 4, 2024 17:14

newpavlov approved these changes Jun 4, 2024

newpavlov merged commit 8933c05 into rust-random:master Jun 4, 2024
52 checks passed
