Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Help the compiler avoid inlining lazy init functions. #443

Merged

Conversation

briansmith
Copy link
Contributor

Before this change, the compiler generates code that looks like this:

  if not initialized:
     goto init
do_work:
  do the actual work
  goto exit
init:
  inilned init()
  goto do_work
exit:
  ret

If the initialization code is small, this works fine. But, for (bad) reasons, is_rdrand_good is particularly huge. Thus, jumping over its inined code is wasteful because it puts bad pressure on the instruction cache.

With this change, the generated code looks like this:

  if not initialized:
     goto init
do_work:
  do the actual work
  goto exit
init:
  call init()
  goto do_work
exit:
  ret

I verified these claims by running:

$ cargo asm --rust getrandom_inner --lib --target=x86_64-fortanix-unknown-sgx

This is also what other implementations (e.g. OnceCell) do.

While here, I made the analogous change to LazyPtr, and rewrote LazyPtr to the same form as LazyUsize. I didn't check the generated code for LazyPtr though.

(Why is is_rdrand_good huge? The compiler unrolls the 10 iteration retry loop, and then it unrolls the 8 iteration self-test loop, so the result is rdrand() is inlined 80 times inside is_rdrand_good. This is something to address separately as it also affects getrandom_inner itself.)

@josephlr
Copy link
Member

josephlr commented Jun 3, 2024

I did some comparisons on x86_64-unknown-linux-gnu and opt-level=3 with a few different implementations:

  1. Current implementation: https://rust.godbolt.org/z/3nWjdbW7j
  2. The implementation in this PR: https://rust.godbolt.org/z/384jec4ba
  3. Marking is_rdrand_good as #[cold]: https://rust.godbolt.org/z/MnnavhYWT

It seems like (2) and (3) generate nearly identical code, while (3) is (to me) easier to read. So I think we should just mark is_rdrand_good (and self_test) as #[cold].

Before this change, the compiler generates code that looks like this:

```
  if not initialized:
     goto init
do_work:
  do the actual work
  goto exit
init:
  inilned init()
  goto do_work
exit:
  ret
```

If the initialization code is small, this works fine. But, for (bad)
reasons, is_rdrand_good is particularly huge. Thus, jumping over its
inined code is wasteful because it puts bad pressure on the
instruction cache.

With this change, the generated code looks like this:

```
  if not initialized:
     goto init
do_work:
  do the actual work
  goto exit
init:
  call init()
  goto do_work
exit:
  ret
```

I verified these claims by running:
```
$ cargo asm --rust getrandom_inner --lib --target=x86_64-fortanix-unknown-sgx
```

This is also what other implementations (e.g. OnceCell) do.

While here, I made the analogous change to LazyPtr, and rewrote LazyPtr
to the same form as LazyUsize. I didn't check the generated code for
LazyPtr though.

(Why is `is_rdrand_good` huge? The compiler unrolls the 10 iteration
retry loop, and then it unrolls the 8 iteration self-test loop, so the
result is `rdrand()` is inlined 80 times inside is_rdrand_good. This
is something to address separately as it also affects `getrandom_inner`
itself.)
@briansmith briansmith force-pushed the b/lazy-init-is-not-usually-done branch from 7d346d4 to dd0cf76 Compare June 4, 2024 00:53
@briansmith
Copy link
Contributor Author

It seems like (2) and (3) generate nearly identical code, while (3) is (to me) easier to read. So I think we should just mark is_rdrand_good (and self_test) as #[cold].

First, this change doesn't just affect the RDRAND case. It also affects Linux/Android because it affects the LazyBool used to cache whether the getrandom syscall is available.

Secondly, the large effect on the Linux/Android implementation is clearer when use_file is refactored in the same way. I've updated this PR with that change.

@briansmith briansmith force-pushed the b/lazy-init-is-not-usually-done branch from dd0cf76 to bf84b06 Compare June 4, 2024 01:12
Copy link
Member

@josephlr josephlr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, this change doesn't just affect the RDRAND case. It also affects Linux/Android because it affects the LazyBool used to cache whether the getrandom syscall is available.

Secondly, the large effect on the Linux/Android implementation is clearer when use_file is refactored in the same way. I've updated this PR with that change.

Fair point, I like having all our init/error paths marked #[cold], as it makes the intent of these paths a little clearer.

@@ -19,6 +19,8 @@ use core::{
const FILE_PATH: &[u8] = b"/dev/urandom\0";
const FD_UNINIT: usize = usize::max_value();

// Do not inline this when it is the fallback implementation.
#[cfg_attr(any(target_os = "android", target_os = "linux"), inline(never))]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Elsewhere we use #[cold] but here we use inline(never), should we be consistent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, when getrandom is not available it isn't cold. I could go either way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment explaining why it is #[inlne(never)] but not #[cold].

src/lazy.rs Outdated Show resolved Hide resolved
@briansmith briansmith force-pushed the b/lazy-init-is-not-usually-done branch from 4ca1f51 to 7afea71 Compare June 4, 2024 17:14
@newpavlov newpavlov merged commit 8933c05 into rust-random:master Jun 4, 2024
52 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants