-
Notifications
You must be signed in to change notification settings - Fork 192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Web optimizations #559
Web optimizations #559
Conversation
1853f3b
to
d5f59dd
Compare
There is a strange bug in Firefox where using In the meantime I implemented a workaround where I check the name of the constructor instead. |
b711996
to
0e3e163
Compare
Alright, I just remembered that I encountered this issue before. So the problem is that We can detect COOP & COEP via See w3c/ServiceWorker#1545 and Bugzilla #1 and #2. |
5aab925
to
04a539e
Compare
Alright, after playing around with the implementation and accounting for various scenarios, I decided its probably best if we just always check the constructor name for simplicities sake. Let me know if you think otherwise. |
This is a fair amount of additional complexity. It would be nice to provide performance measurements to demonstrate that this optimization is worth the trouble. It also may be worth to split this PR into two parts: And wouldn't the unconditional dependency on |
The first change, caching the The second change is not using an I did a quick local benchmark:
Here is the code (no const RUNS: u32 = 10_000_000;
const CRYPTO_BUFFER_SIZE: u32 = 256;
fn main() {
let global: Global = js_sys::global().unchecked_into();
let crypto = global.crypto();
let mut buffer = [0; CRYPTO_BUFFER_SIZE as usize];
let start = Instant::now();
for _ in 0..RUNS {
crypto.get_random_values_ref(&mut buffer).unwrap();
}
console::log_1(&format!("{:?}", start.elapsed() / RUNS).into());
let mut buffer = [0; CRYPTO_BUFFER_SIZE as usize];
let start = Instant::now();
for _ in 0..RUNS {
let js_buffer = Uint8Array::new_with_length(CRYPTO_BUFFER_SIZE);
let js_buffer = js_buffer.subarray(0, CRYPTO_BUFFER_SIZE);
crypto.get_random_values(&js_buffer).unwrap();
unsafe { js_buffer.raw_copy_to_ptr(buffer.as_mut_ptr()) };
}
console::log_1(&format!("{:?}", start.elapsed() / RUNS).into());
} I will let you decide on how representative this is for actual real world usage. I assume the most offensive part of the code additions is the I briefly tested just not caching it and doing the detection on every single call. It would still be an improvement, but we loose around 10% of the performance gain. Let me know what you think.
This is a workaround for a macro-hygiene bug in |
aa62dd1
to
2f9ec56
Compare
I removed the caching of the Let me know if you want me to split out the |
Alright, I made rustwasm/wasm-bindgen#4325 to solve this issue. Its a dance I've done a couple of times already and its pretty bad. The Wasm module already contains all that information so we will just let So this will wait for a new release of |
ae2616d
to
15a1de5
Compare
I decided to remove the |
30fb78f
to
20f7a91
Compare
This PR is ready again. I updated the OP so all the relevant information can be found there. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @daxpedda for trying to help us clean this code up. I also really like the idea of using target_feature = "atomics"
to check for if we need to care about shared memory or not.
I have some suggestions below on how to furthur simplify.
src/backends/wasm_js.rs
Outdated
// Crypto.getRandomValues() | ||
#[cfg(not(target_feature = "atomics"))] | ||
#[wasm_bindgen(method, js_name = getRandomValues, catch)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of using methods, static
s, and thread_local_v2
, can we use js_namespace
like:
// Web Crypto API bindings (https://www.w3.org/TR/WebCryptoAPI/)
#[wasm_bindgen(js_namespace = crypto)]
extern "C" {
#[cfg(not(target_feature = "atomics"))]
#[wasm_bindgen(js_name = getRandomValues, catch)]
fn get_random_values(buf: &mut [MaybeUninit<u8>]) -> Result<(), JsValue>;
#[cfg(target_feature = "atomics")]
#[wasm_bindgen(js_name = getRandomValues, catch)]
fn get_random_values(buf: &Uint8Array) -> Result<(), JsValue>;
}
(I'm not sure if js_namespace
should be crypto
or ["globalThis", "crypto"])
).
Then the code can just be:
// If atomics are not enabled, we assume that Rust's memory is not backed by a
// SharedArrayBuffer, so we can directly call getRandomValues() on the buffer.
#[cfg(not(target_feature = "atomics"))]
pub fn fill_inner(dest: &mut [MaybeUninit<u8>]) -> Result<(), Error> {
// The byteLength of the input must not exceed 65536 bytes, see:
// https://developer.mozilla.org/en-US/docs/Web/API/Crypto/getRandomValues
const MAX_LENGTH: usize = 65536;
for chunk in dest.chunks_mut(MAX_LENGTH) {
if crypto.get_random_values(chunk).is_err() {
return Err(Error::WEB_GET_RANDOM_VALUES);
}
}
Ok(())
}
// If atomics are enabled, Rust's memory can be backed by a SharedArrayBuffer,
// so we use a temporary Uint8Array buffer when calling getRandomValues().
#[cfg(target_feature = "atomics")]
pub fn fill_inner(dest: &mut [MaybeUninit<u8>]) -> Result<(), Error> {
// We are rarely called with long slices, so we allocate a small buffer.
const BUF_LEN: u16 = 256; // Less than MAX_LENGTH
let tmp = js_sys::Uint8Array::new_with_length(BUF_LEN.into());
for chunk in dest.chunks_mut(BUF_LEN.into()) {
// The chunk can be smaller than tmp's length, so we call to
// JS to create a smaller view of buf without allocation.
let buf = if chunk.len() == BUF_LEN.into() {
tmp
} else {
let len: u32 = chunk.len().try_into().expect("chunk.len() <= BUF_LEN <= u32::MAX");
tmp.subarray(0, len)
}
if crypto.get_random_values(&buf).is_err() {
return Err(Error::WEB_GET_RANDOM_VALUES);
}
// SAFETY: buf's length is equal the chunk's length.
unsafe { buf.raw_copy_to_ptr(chunk.as_mut_ptr().cast::<u8>()) };
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are forfeiting the check if Crypto
is available though.
But if your intention was just to get rid of the nesting, how about prefacing the function like this:
if CRYPTO.with(Option::is_none) {
return Err(Error::WEB_CRYPTO)
}
// use `get_random_values()` without nesting:
get_random_values(&buf);
Let me know how you want to proceed.
My recommendation is to keep the check in place, there's all sort of strange JS environments out there, getrandom
is very foundational and should support these imo. And CRYPTO
is cached after all so there is no performance overhead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking if the global crypto
object exists doesn't really change what platforms/environments we support. It just changes what error message we get when the call to crypto.getRandomValues
fails. Before, we were checking for existence of the object as part of our platform detection (and to cache the value in RngSource
). Now that we don't have two different implementations for Web vs Node, we don't need to do any of that.
So if I'm understanding things correctly, having the check for the existence of the object just lets us return Error::WEB_CRYPTO
to the user, informing them that "Web Crypto API is unavailable"
instead of my implementation above which would make us return Error::WEB_GET_RANDOM_VALUES
in all places.
I think returning good error messages is valuable, so I think keeping some sort of check is OK, provided that we aren't making an additional FFI call every time (which I don't think we do because the result of is_none
is cached).
I personally find the un-nested version easier to read, so we might try a version that avoids too much nesting (potentially by using a helper function).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking if the global
crypto
object exists doesn't really change what platforms/environments we support. It just changes what error message we get when the call tocrypto.getRandomValues
fails.
Ah, you are relying on the fact that WBG just catches any exception when tagging it with catch
. While this is currently the case, its not intended to be used like this. Ideally, this would only catch exceptions thrown by the function call. However, I guess this behavior has to stay in place because otherwise it would be a breaking change.
I personally find the un-nested version easier to read, so we might try a version that avoids too much nesting (potentially by using a helper function).
Sure!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, let me know if you like this better!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see!
From my understanding the issue was getting rid of the nesting, but that has been achieved, right? Is there some other motivation to get rid of thread_local_v2
?
On that note: your proposal will store the check globally. But different threads may have different outcomes. This isn't hypothetical either, some worklets don't have access to Crypto
. So this needs to be in a thread-local to work correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not familiar with the wasm-bindgen
inner workings, but wouldn't js_namespace
-based code result in a lighter-weight WASM file? Plus, the resulting source code will be slightly smaller, assuming that we do not care about the WEB_CRYPTO
error. If we want to keep it, then I think the current code is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not familiar with the
wasm-bindgen
inner workings, but wouldn'tjs_namespace
-based code result in a lighter-weight WASM file?
Yes, we have one FFI function less present in the Wasm file. To put it into perspective: the function that actually ends up in the Wasm file after post-processing is done is just a single FFI signature, one byte cast and an if condition. Not counting the storing and loading ofc.
Plus, the resulting source code will be slightly smaller (assuming that we do not care about the
WEB_CRYPTO
error).
If you don't care about the WEB_CRYPTO
error yes, then we can simply go with the original proposal by @josephlr. I will do some digging into the specification and actually find out if Crypto.getRandomValues()
can realistically fail or not actually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, that fast: according to the specification, Crypto.getRandomValues()
can only fail if we don't supply the right buffer type or we exceed the maximum byte length of 65536. So instead of getting rid of WEB_CRYPTO
I will go ahead and get rid of WEB_GET_RANDOM_VALUES
.
Let me know what you think!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 8cf6832.
0095636
to
04f830f
Compare
Co-Authored-By: Joe Richey <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gating on target_feature = "atomics"
looks a bit surprising to me, but I am not familiar with the wasm-bindgen
and the wider WASM ecosystem, so I can not make an informed decision here.
I'm happy to answer any questions, but here is some context: The base problem is that The underlying Wasm memory in browsers is represented by a This is connected to the Wasm threads proposal (toolchains use the target feature
This is to say:
Lastly, Rust by default marks the memory with
Within the spec, there is one thing we might want to consider: it is allowed to enable the Outside the spec, the other case was using a |
CI failure is unrelated. |
src/backends/wasm_js.rs
Outdated
#[wasm_bindgen(method, getter)] | ||
fn crypto(this: &Global) -> Crypto; | ||
// Holds the global `Crypto` object. | ||
#[wasm_bindgen(thread_local_v2, js_name = crypto)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I initially avoided using globalThis
here because I have no idea how it works exactly on Node.js. In the meantime I have looked up the documentation and asked around: it should be perfectly fine to use globalThis
in Node.js.
For context: the current setup just accesses the object directly from the global scope. According to the spec this is perfectly fine. However, this is usually discouraged because global variables can simply be overwritten. While I'm not sure if there is more history behind this, fact is that read-only properties like crypto
can't be overwritten. So both approaches, accessing Crypto
through the global crypto
variable or globalThis.crypto
, will work the same.
But to stick with "how it should be done" I will add globalThis
now.
EDIT: Support should also not be a concern after #554, but you can make your own evaluations via "Can I Use".
Thank you! |
As discussed in #541. This PR adds the following improvements:
Crypto
object.SharedArrayBuffer
. If not, we can copy bytes directly into our memory instead of having to go through JS. This saves allocating the buffer in JS and copying the bytes into Wasm memory. This is also the most common path.SharedArrayBuffer
requirestarget_feature = "atomics"
, which is unstable and requires Rust nightly. See Web optimizations #559 (comment) for full context.Uint8Array
with the minimum amount of bytes necessary instead of a fixed size.Uint8Array
size for the atomic paths have been increased to 65536 bytes: the maximum allowed buffer size forCrypto.getRandomValues()
.All in all this should give a performance improvement of ~5% to ~500% depending on the amount of requested bytes and which path is taken. See #559 (comment) for some benchmark results.
This spawned a bunch of improvements and fixes in
wasm-bindgen
that are being used here:std::thread_local!
rustwasm/wasm-bindgen#4315MaybeUninit<T>
rustwasm/wasm-bindgen#4316atomics
for Node.js rustwasm/wasm-bindgen#4318undefined
static imports withOption
rustwasm/wasm-bindgen#4319copy_to_uninit()
to allTypedArray
s rustwasm/wasm-bindgen#4340