feature request: with_randomized_context method #1567

Open
apoelstra opened this issue Jul 8, 2024 · 9 comments
@apoelstra
Contributor

apoelstra commented Jul 8, 2024

In rust-secp256k1 we're exploring how we can best support a signing API for systems which potentially have no allocator, may be operating multiple threads, but which have very limited threading primitives (e.g. we have atomics with acquire/release semantics but no stdlib with prepackaged mutexes or other locks).

I think a useful function would be something like

int secp256k1_with_randomized_context(const unsigned char* seed32, cb callback, void* callback_data) {
    /* create context object on the stack, which can't be done from the public API */
    secp256k1_context_rerandomize(&ctx, seed32);
    return callback(&ctx, callback_data);
}

(where cb is a type alias for a callback that takes a context and void pointer and returns an int).

Our usage here would be to implement a signing function that uses a freshly randomized context but does not require the user to pass context objects through every API function, nor require us to maintain a global mutable context object, which is really hard to do without mutexes. The resulting function would be ~twice as slow as a normal signing function, but for many use cases this is acceptable since signing is not a frequent operation.
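To make the shape of the proposal concrete, here is a minimal sketch of the callback pattern. Everything prefixed with `toy_` is hypothetical and exists only for illustration: the real `secp256k1_context` is opaque, so a plain struct stands in for it, and a `memcpy` stands in for `secp256k1_context_rerandomize`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the library's opaque context. The real
 * secp256k1_context layout is internal and is NOT reproduced here. */
typedef struct {
    unsigned char blind[32];
} toy_context;

/* The `cb` alias described above: takes a context and a void pointer,
 * returns an int. */
typedef int (*toy_cb)(const toy_context *ctx, void *callback_data);

/* Toy analogue of the proposed function: build a context on the stack,
 * seed it, hand it to the callback, then wipe it. */
static int toy_with_randomized_context(const unsigned char *seed32,
                                       toy_cb callback,
                                       void *callback_data) {
    toy_context ctx;
    int ret;
    assert(seed32 != NULL && callback != NULL);
    /* Stands in for rerandomizing the context from the 32-byte seed. */
    memcpy(ctx.blind, seed32, sizeof(ctx.blind));
    ret = callback(&ctx, callback_data);
    /* Best-effort wipe; real code would need a clear that the compiler
     * cannot optimize away. */
    memset(&ctx, 0, sizeof(ctx));
    return ret;
}

/* Example callback: reports the first blinding byte through callback_data
 * to show that the context is live for the duration of the call. */
static int toy_sign(const toy_context *ctx, void *callback_data) {
    unsigned char *out = (unsigned char *)callback_data;
    *out = ctx->blind[0];
    return 1;
}
```

The caller never holds a context: it passes a seed and a callback, and the context's lifetime is bounded by the call.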

On the other hand, maintaining a global mutable 32-byte random seed would be super easy because we don't need any synchronization beyond the use of atomics to avoid UB.

cc #780 which is closely related to this but more general.

@Kixunil

Kixunil commented Jul 8, 2024

It'd also be extra nice if it were possible to guarantee that the callback will be called (abort if it can't).

@real-or-random
Contributor

    /* create context object on the stack, which can't be done from the public API */

Couldn't you create a context on the stack using secp256k1_context_preallocated_create? I'm not saying it's elegant, and you'll need to take care of alignment, but it should be doable.

@Kixunil

Kixunil commented Jul 9, 2024

@real-or-random Rust doesn't have alloca so it wouldn't work for dynamically-linked system libraries without going through C. We could in principle do that but to my knowledge alloca has some problems (I don't remember the details). Calling into the library code which knows the exact size of its context sounds much more appealing.

@real-or-random
Contributor

> without going through C. We could in principle do that but to my knowledge alloca has some problems (I don't remember the details).

alloca is simply obsolete and nonstandard, but there's nothing wrong with it. Well, except that it allocates on the stack, but this is precisely what you want to do here. The "modern" (available since C99) equivalent is variable-length arrays.

> Calling into the library code which knows the exact size of its context sounds much more appealing.

I see that, I'm just trying to understand the nature of the problem and its urgency.

@real-or-random
Contributor

Concept ACK

I think that's a simple way to give users the ability to rerandomize every operation, and it works without breaking our context API. We could even provide convenience wrappers for key generation and signing.

Contexts are pretty small now after we've removed all the dynamic tables. One caveat is that we need to get the stack allocation right in C89... But that's doable, we know the size of the context at compilation time of the library, and we have BIGGEST_ALIGNMENT.
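As a rough illustration of the C89 stack-allocation concern, one common approach is a union whose members force suitable alignment on a byte buffer. This is only a sketch: `TOY_CONTEXT_SIZE` is a made-up number rather than the real context size, and the member list is an assumption about which types need the strictest alignment on a given target.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical size; inside the library the real value would be known
 * at compile time. */
#define TOY_CONTEXT_SIZE 208

/* C89 has no alignas, so the buffer is placed in a union with types that
 * typically carry the strictest alignment requirements. */
typedef union {
    unsigned char bytes[TOY_CONTEXT_SIZE];
    long l;
    double d;
    long double ld;
    void *p;
    void (*fp)(void);
} toy_context_storage;

/* Probe struct used to measure the alignment the compiler gives the union. */
struct toy_align_probe {
    char c;
    toy_context_storage s;
};
#define TOY_STORAGE_ALIGN offsetof(struct toy_align_probe, s)

/* Returns the start of the aligned byte buffer inside the storage object. */
static void *toy_storage_ptr(toy_context_storage *s) {
    assert(s != NULL);
    return s->bytes;
}
```

A `toy_context_storage` declared as a local variable then lives on the stack with that alignment, and `toy_storage_ptr` yields the address that would be handed to whatever initializes the context.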

Would one of you be willing to work on a PR?

@apoelstra
Contributor Author

Yes, I can take this on.

@gmaxwell
Contributor

gmaxwell commented Aug 3, 2024

I avoided stateless randomization in the initial implementation not just for performance reasons, but so that additional calls amplified uncertainty.

Consider an attacker that can observe high resolution power traces or EMI. If the attacker can learn about the timing of a signing, the attacker can also likely learn something about the randomization process itself. But with stateful randomization the uncertainty of the state is cumulative, so the attacker's problem gets worse rather than better with multiple tries. The attacker is better off restarting the device every attempt (if able). But if the context is randomized each time, then the strategy becomes to just try in a loop.

To make it concrete, say that the attacker's timing trace allows them to view the magnitude of the scalar used in a point-scalar multiply. Without randomization, the attacker's strategy is to sign many times, filter out the signatures where the scalar was atypically small, then use LLL to recover the private key. With one-shot randomization, the attacker would look for traces where the offset and key were both unusually small, which would take quadratically more work but would still likely be reasonable if the base attack was. With stateful randomization, this attack won't work except on the first signature (where it reduces to the stateless case), so the attacker has to be able to restart the device between tries, which might not be possible at all (e.g. no access to restart it, or it needs a PIN after restart).

Another reason I'd preferred the explicit randomization was latency: the randomization could potentially be done in the background after some signing completed and so it would essentially be free from a delay perspective. But that advantage is also lost here.

I think if stateless operation is important it probably makes sense to implement a different kind of randomization, where the scalar is 512-bit (or even just 320-bit) and the result is equal to the intended point mod N. This can be done without precomputing an offset, and so without an offset calculation that could leak data. I believe it's also a more common blinding technique in industry. Of course, this could be combined with other forms of randomization such as the stateful offset stuff.
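One reading of this suggestion is classic scalar blinding: replace the scalar k with k' = k + r*N for a random r, so that k' is longer than k but yields the same group element, because N times the generator is the identity. The sketch below shows that identity in a tiny multiplicative group instead of an elliptic curve; the parameters (p = 23, g = 2, subgroup order 11) and all `toy_` names are illustrative assumptions, and the exponentiation is not constant-time.

```c
#include <assert.h>

#define TOY_P 23u /* small prime modulus */
#define TOY_G 2u  /* generator of a subgroup of order TOY_N */
#define TOY_N 11u /* order of TOY_G mod TOY_P: 2^11 = 2048 = 89*23 + 1 */

/* Square-and-multiply exponentiation mod TOY_P. Variable-time: fine for
 * showing the algebra, unsuitable for real secrets. */
static unsigned int toy_pow(unsigned int base, unsigned long exp) {
    unsigned int acc = 1u;
    base %= TOY_P;
    while (exp > 0) {
        if (exp & 1u) {
            acc = (acc * base) % TOY_P;
        }
        base = (base * base) % TOY_P;
        exp >>= 1;
    }
    return acc;
}

/* Blind the scalar k by adding a random multiple of the group order.
 * The caller supplies r; a real implementation would draw it from a CSPRNG. */
static unsigned long toy_blind_scalar(unsigned int k, unsigned int r) {
    return (unsigned long)k + (unsigned long)r * TOY_N;
}
```

For any r, `toy_pow(TOY_G, toy_blind_scalar(k, r))` equals `toy_pow(TOY_G, k)`. In the elliptic-curve setting the same identity would let a 320-bit or 512-bit blinded scalar (64 or 256 random bits on top of a 256-bit order) produce the intended point with no precomputed offset.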

@apoelstra
Contributor Author

@gmaxwell our intended usage here isn't stateless randomization, it's randomization where we can maintain the state outside of the context object. If we are manipulating a blob of 32 bytes that we control, we can do this using builtin atomics from the Rust standard library (where we can use an array of 32 atomic u8s which we access with no specified memory ordering, which should be nearly as fast as not using atomics at all, but will prevent completely unsynchronized access which would be UB.)
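The comment describes Rust atomics; a C11 analogue of the same idea, using `<stdatomic.h>` with relaxed ordering, might look like the sketch below. The `toy_` names are hypothetical. Each byte is accessed atomically, so there is no data race, but nothing makes the 32-byte snapshot as a whole atomic: a concurrent writer can leave a reader with a mix of old and new bytes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Global seed held as 32 independent atomic bytes (zero-initialized). */
static _Atomic unsigned char toy_seed[32];

/* Write a new seed byte by byte with relaxed ordering. */
static void toy_seed_store(const unsigned char *src32) {
    size_t i;
    assert(src32 != NULL);
    for (i = 0; i < 32; i++) {
        atomic_store_explicit(&toy_seed[i], src32[i], memory_order_relaxed);
    }
}

/* Read the current seed byte by byte with relaxed ordering. The result may
 * interleave bytes from concurrent stores, which is tolerated here. */
static void toy_seed_load(unsigned char *dst32) {
    size_t i;
    assert(dst32 != NULL);
    for (i = 0; i < 32; i++) {
        dst32[i] = atomic_load_explicit(&toy_seed[i], memory_order_relaxed);
    }
}
```

No lock or spinlock is involved, which is the point of keeping the mutable state in a plain byte blob and not in an opaque context object.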

But if we have to manipulate a pointer to an opaque libsecp context, that would require a mutex (which, given that our only low-level synchronization primitives are atomics, would likely require an ad-hoc spinlock-based implementation).

Now, I think that you will argue that if we're updating a 32-byte blob and then doing a fresh ecmult on that, this is still "stateless" because the actual EC mult is started from scratch. Would this be improved if we were to, say, only update half the blob (or even, only one/a few bits)?

Suppose we took the following approach:

  • Initially choose 32 random bytes
  • On subsequent rerandomizations, "left shift" the lower 16 bytes so that they replace the upper 16 bytes, and choose 16 new bytes randomly. (As mentioned above, this will be done with no effort to be atomic so the operation might get mangled; but this is unlikely and shouldn't hurt anything.)

I believe this form of rerandomization will be "stateful" in the way that you want it to be, albeit with only 128 bits of fresh randomness per rerandomization.
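The update rule in the two bullets above can be sketched as follows. Which end of the array counts as "upper" is an assumption here (indices 0 to 15 are treated as the upper half), the function name is hypothetical, and the 16 fresh bytes are supplied by the caller, who would get them from a real random source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Shift-and-refill update of a 32-byte seed:
 *   - the lower 16 bytes (indices 16..31) replace the upper 16 (indices 0..15)
 *   - 16 caller-supplied fresh random bytes become the new lower half
 * fresh16 must not alias seed32. */
static void toy_seed_shift_update(unsigned char *seed32,
                                  const unsigned char *fresh16) {
    assert(seed32 != NULL && fresh16 != NULL);
    memcpy(seed32, seed32 + 16, 16);  /* halves do not overlap, memcpy is fine */
    memcpy(seed32 + 16, fresh16, 16); /* new randomness fills the lower half */
}
```

Every byte survives exactly two updates, so each seed shares half of its material with its predecessor. This only illustrates the mechanics of the update; whether it delivers the statefulness property is taken up in the next comment.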

@apoelstra
Contributor Author

I have convinced myself that the above solution fails to achieve "statefulness" in a useful sense.

To see why, consider an attacker who can detect when the random seed has a low Hamming weight, and in this case somehow gains an advantage in learning about secret data. In the existing stateful model, to exploit this he would need to continually reboot the device as @gmaxwell says, because each rerandomization starts from an existing point (which the attacker knows little about, except at best the Hamming weights of the summands that led to its discrete log). So knowing that a particular rerandomization had a low Hamming weight tells the attacker very little (nothing, in the limit) about the point that's actually used to blind the secret key.

In contrast, no matter what games I play with the seed passed to with_randomized_context, the attacker will be able to measure the full derivation starting from zero, determine the Hamming weight of the blinding factor, and selectively ignore the ones that he can't make use of. So we are giving him the full advantage of being able to reboot the device.

Having said this, I think stateless rerandomization is much better than no rerandomization at all, and we can get stateless rerandomization in a no_std Rust environment (as well as in other similar freestanding environments), while we have spent years trying to get stateful rerandomization and have not come up with a satisfactory solution. I think we should add this method and put a giant doc comment on it with @gmaxwell's warnings, explaining that if you have the ability to use mutexes (or just don't care about synchronization) then you should use a global context and repeatedly rerandomize it.
