Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SnarkyJS SHA/Keccak #9

Open
wants to merge 13 commits into
base: main
Choose a base branch
from
Open

SnarkyJS SHA/Keccak #9

wants to merge 13 commits into from

Conversation

Trivo25
Copy link
Member

@Trivo25 Trivo25 commented Jun 15, 2023

ECDSA in separate PR #14, based on this one.

0005-snarkyjs-ecdsa-sha copy.md Outdated Show resolved Hide resolved
0005-snarkyjs-ecdsa-sha copy.md Outdated Show resolved Hide resolved
@Trivo25 Trivo25 changed the title SnarkyJS ECDSA and SHA/Keccak SnarkyJS SHA/Keccak Jun 20, 2023
// ..
```

### Alternative Approach
Copy link

@MartinMinkov MartinMinkov Jun 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've spent some time researching and thinking about our approach to the API, and I feel a Hash interface could be a great fit for us.

I took a look into the functionality of various popular JS runtimes like Node.JS, Deno, and Bun, I noticed they offer builder patterns for creating hash functions. While their approach is great, considering the diverse configuration options they have to deal with, our needs aren't as complex (I think). So, perhaps we might not need to follow the builder pattern for our hash functions.

I also looked into how Noir structures its approach: https://noir-lang.org/standard_library/cryptographic_primitives/hashes.

They follow a std:hash:[hash_name] style, influenced by Rust, which I found quite appealing. It feels efficient and neat to call the function directly without any additional building process. My vote goes towards an interface-styled approach, happy to hear other opinions, though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with having the different hashes directly!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checking... in the snark these hashes come with some circuit constants and it may be more efficient to include those in the circuit once at the start of the circuit. If the "direct hash" API isn't able to facilitate this, then the "builder" approach could have an advantage since in that approach you could add those constraints when building the hash configuration.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jspada, don't we have a way to include constants only once anyway? "Constant unification"? When you use a constant that already exists in the circuit, it will just be wired to the existing one?
If that's true and we can use it, it seems to me we don't need a special setup command to avoid adding the same constants multiple times. cc @mrmr1993?

0005-snarkyjs-ecdsa-sha.md Outdated Show resolved Hide resolved
Comment on lines 146 to 150
The usage of both Keccak/SHA3 and Poseidon is the same: The user passes an array of Field elements into the hash function and the output is returned. One important detail is that Keccak/SHA3 only accepts an array of Field element that each are no larger than 1 byte. We will also provide additional helper functions that allow conversion between Field element arrays and hexadecimal encoded strings.

```ts
function fieldBytesFromHex(xs: Field[]): string;
function hexToFieldBytes(hex: string): Field[];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for calling it out that the Keccak interface expects single-byte field elements.

I think we should follow our usual approach here, and introduce a provable type which encodes the constraint of being no more than a single byte long: UInt8 or Byte

The Keccak / SHA3 interface should therefore be:

function hash(message: UInt8[]): UInt8[]
// TODO what is the return type? we could return a fixed length tuple!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw @Trivo25, your RFC has the fieldBytesFromHex and hexToFieldBytes interfaces mixed up

Copy link

@MartinMinkov MartinMinkov Jun 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to ask if adding Field as an acceptable type in the array is helpful. When hash is called, we can check if the field is exactly a byte at runtime. That way, we can be more flexible with our types using (Field | UInt8)[]. The drawback of this flexibility is that we introduce errors when we could avoid them and just make the user do their type conversations. wdyt?

Returning tuples would be neat here! I think you keep all the normal array methods, so the added type safety is helpful!

EDIT: I don't think the above idea makes sense actually if we are removing the range checks in the gadgets.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to removing the constraint checks in the gadgets and forcing input to be UInt8 👍

@Trivo25 Trivo25 marked this pull request as ready for review June 21, 2023 12:41
@Trivo25 Trivo25 mentioned this pull request Jun 21, 2023
@barriebyron
Copy link
Contributor

I removed the extraneous Markdown that breaks CI, the template was updated after this RFC was created #10

## SHA3/Keccak

In order to test the implementation of SHA3 and Keccak in SnarkyJS, we will follow the testing approach we already apply to other gadgets and gates.
This includes testing the out-of- and in-snark variants using our testing framework, as well as adding a SHA3 and Keccak regression test. The regression tests will also include a set of predetermined digests to make sure that the algorithm doesn't unexpectedly change over time (similar to the tests implemented for the OCaml gadget).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a set of predetermined digests

If we're considering to utilize the test vectors from organizations like NIST (to ensure the correctness of the implementation), then we should specify it.

regression test

It will be good to also mention what will form the regression testing suite (for example, functional tests, specific edge cases, errors handling, backward compatibility, etc.).

Copy link
Contributor

@mitschabaude mitschabaude Jun 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, we have a "regression test suite" in use in snarkyjs CI, which this RFC refers to. it checks that constraints generated using a given list of contracts / provable methods don't change: https://github.com/o1-labs/snarkyjs/blob/main/src/examples/vk_regression.ts

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for specific edge cases, errors handling

Since padding is important for security and there are at least two kinds of padding supported by our Keccak gadgets, from a security perspective testing different length messages (e.g. empty, zero, one less than padding lengths, one more, multiples of, etc) are great edge-cases to test imho!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks everyone! I slightly edited this section to reflect your ideas. I am a big fan of adding (maybe not all, but at least some) of the NIST test vectors - what are your thoughts on this? Is this already being done in the crypto side? (didn't explicitly see that)

In order to test the implementation of SHA3 and Keccak in SnarkyJS, we will follow the testing approach we already apply to other gadgets and gates.
This includes testing the out-of- and in-snark variants using our testing framework, as well as adding a SHA3 and Keccak regression test. The regression tests will also include a set of predetermined digests to make sure that the algorithm doesn't unexpectedly change over time (similar to the tests implemented for the OCaml gadget).

In addition to that, we should provide a dedicated integration test that handles SHA3/Keccak hashing within a smart contract (proving enabled). This will allow us to not only provide developers with an example, but also ensure that SHA3 and Keccak proofs can be generated.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Are we concerned about performance testing?
    • For example, hashing large amounts of data or rapidly making hash requests.
  • Are we concerned about collision testing?
  • Are we concerned about security testing?
    • Possible/known vulnerabilities, cryptographic weakness, etc.
  • Are we concerned about usability testing?
    • In a form of gathering the feedback from a small group of developers to improve the design and usability of the API before releasing it.

  • The RFC indicates that hashing with SHA3 and Keccak is more expensive than using Poseidon. Performance metrics, such as time and memory usage, should be gathered and analyzed to provide developers with a clear understanding of the subject.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question - @querolita to what extend has the original crypto implementation be tested? Do you have any recommendations to what we should test, if at all?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests performed to the implementation were:

  • Checking different output lengths and different configurations of the padding (NIST's for SHA3 and submission's for Keccak)
  • Checking different length outputs with different numbers of blocks
  • Edge cases of padding to make sure that a whole new block is created, and messages that already look padded
  • Checking different endianness of inputs and outputs
  • Check that non-byte length inputs cannot be passed
  • Can reuse constraint system for the same circuit structure
  • Cannot reuse constraint system for different output lengths
  • Cannot reuse constraint system for same output length but different padding
  • Cannot reuse constraint system for different endianness

I would create similar tests for SnarkyJS, but probably more high level since you don't have access to low level details such as constraint system creation (that I am aware of). In particular, I would make sure that inputs which are not "correct formatted hex values" (meaning an odd number of hex digits) is not passed to the function. And of course making sure that all combinations actually produce the expected value (pairs of output lengths and padding types), for some random values. I wouldn't go much deeper than that regarding the snarkyjs level alone.

In that sense, we are not testing against collision, just comparing against official outputs of the hash function on those given inputs, nor performance testing (all we know is 1 block of encryption, which is enough for 1080bits of input data, fits in 15k kimchi rows, thus 4 full blocks can be hashed without chunking), we haven't looked for keccak's potential vulnerabilities, and we have not gathered usability info from users other than ourselves.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shimkiv does this sounds good to you? Is there anything else you would change here?

Copy link
Member

@shimkiv shimkiv Jul 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we will follow the testing approach we already apply to other gadgets and gates

It will be good to add some implementation example links here.

will also include a set of predetermined digests

Can we please be more specific here? What kind of set? Is it standardised set? If not then how we determine the set to be good enough?
Open question

similar to the tests implemented for the OCaml gadget

Can we please include links here?

We will include a range of edge cases in the tests (e.g. empty input, zero, etc).

It will be good if we can be more explicit here and at least highlight the major cases we're aware of and that will be included into the testing plan.

In that sense, we are not testing against collision, just comparing against official outputs of the hash function on those given inputs, nor performance testing (all we know is 1 block of encryption, which is enough for 1080bits of input data, fits in 15k kimchi rows, thus 4 full blocks can be hashed without chunking), we haven't looked for keccak's potential vulnerabilities, and we have not gathered usability info from users other than ourselves.

If we agree on this then it would be good if we can add listed items into the "out of the scope" testing section with some rationale.

Copy link
Member

@querolita querolita left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving for now but please take a look at the comments to update the names, and whether or not you want to extend non-NIST to support other output lengths as well.

0005-snarkyjs-ecdsa-sha.md Outdated Show resolved Hide resolved
0005-snarkyjs-ecdsa-sha.md Outdated Show resolved Hide resolved
0005-snarkyjs-ecdsa-sha.md Outdated Show resolved Hide resolved
0005-snarkyjs-ecdsa-sha.md Outdated Show resolved Hide resolved
0005-snarkyjs-ecdsa-sha.md Show resolved Hide resolved
In order to reduce the bindings surface, Keccak and SHA3 will be exposed via a `create` function, which behaves like a factory pattern.
By calling this function inside SnarkyJS, we can define and expose all possible variants of SHA3(224/256/384/512) and Keccak without the need to have an individual function in the bindings layer for each variant.

In order to differentiate Poseidon, which works over native Field elements, and SHA3 and Keccak which works over byte-sized Field elements, it will be beneficial to introduce a new type `UInt8` (a Field element that is exactly a byte) to draw a clan line between both hash functions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems useful. Nonetheless, the gadget already performs all necessary constraints to make sure that the inputs are indeed 8 bits at most. That means it is sound. But if you want to provide the user interface with a much more explicit type for this so they can see if it is ok at compile time, then I believe it is a great idea for user experience.

Copy link
Contributor

@mitschabaude mitschabaude Jun 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is important information! If we had a UInt8 type, it would actually be preferable to constrain its range as it is introduced into provable code. This is how we usually do it, so that any user of UInt8 in provable code can rely on the fact that this value can only be 8 bits long. (You might have seen the same done for booleans in snarky).

So, could it be a possibility to add a flag to the existing gadgets that disables the 8 bit range check on inputs? @querolita

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea @mitschabaude - makes sense.

Note typo: draw a clan line

Copy link
Member

@querolita querolita Jun 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Answering your question @mitschabaude, yes, it could be easily added so that you have access to it from SnarkyJS. From a cryptographic point of view, I still have the feeling that I do prefer to leave those constraints in the circuit. Meaning, even if the JS code won't allow you to introduce anything larger than UInt8, it still relies on trusting that the types are correct rather than a mathematical proof that it is indeed the case. But if this is something that SnarkyJS does often, I guess it is fine if you can disable the check in case the user introduces a UInt8 as input.

Copy link
Contributor

@mitschabaude mitschabaude Jun 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@querolita I may have been not clear enough -- what I'm proposing is to add constraints which prove that inputs fit in 8 bits. It's just better to add those constraints when a value is introduced, and not inside a specific gadget.

Assume that we have multiple gadgets which operate on UInt8. If every gadget adds constraints to check that the Uint8 is only 8 bits long, then we will add those gates multiple times (if we use multiple of these gadgets).

If, instead, we add the constraints whenever a Uint8 is introduced (with exists), and not in each gadget, then we will have those constraints only once.

In snarkyjs and in snarky, we already have a way to add constraints to the proof everytime a variable of a certain type is introduced with exists. In snarky, every typ can have a check method which contains those "type-defining" constraints.

exists calls typ.check:
https://github.com/o1-labs/snarky/blob/14f8e2ff981a9c9ea48c94b2cc1d8c161301537b/src/base/checked_runner.ml#L277

and types such as booleans include their defining constraints as part of typ.check:
https://github.com/o1-labs/snarky/blob/14f8e2ff981a9c9ea48c94b2cc1d8c161301537b/src/base/utils.ml#L278-L281

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, I completely agree with you @querolita that we can never leave out constraints like this.

even if snarkyjs would throw an error on creating a UInt8 which doesn't fit in 8 bits, that would not prevent adversarial provers from including such a value in their proof: they can simply create their proof using a modified version of snarkyjs which doesn't throw this error. a modification like this is trivial and doesn't change the constraints, so an adversarial proof like that would be valid against an on-chain verification key, which simply means smart contracts would have a security vulnerability

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooooooh!!! Okay I get you now! Nice very nice indeed! Actually we had discussions early in the days of the design of Keccak where we were analysing the possibility to add a "Byte" type, together with existing ones like Boolean or Field. But we thought it might not be super urgent at the Snarky level, but it indeed looks interesting at the SnarkyJS level.

I really like this idea now. And yes, I can create a flag to disable these checks inside the gadget, if they have already been constrained at UInt8 creation time. I will create a subsequent PR with this, together with other improvements that have been suggested along this discussion.

0005-snarkyjs-ecdsa-sha.md Outdated Show resolved Hide resolved
0005-snarkyjs-ecdsa-sha.md Show resolved Hide resolved
0005-snarkyjs-ecdsa-sha.md Outdated Show resolved Hide resolved
In order to reduce the bindings surface, Keccak and SHA3 will be exposed via a `create` function, which behaves like a factory pattern.
By calling this function inside SnarkyJS, we can define and expose all possible variants of SHA3(224/256/384/512) and Keccak without the need to have an individual function in the bindings layer for each variant.

In order to differentiate Poseidon, which works over native Field elements, and SHA3 and Keccak which works over byte-sized Field elements, it will be beneficial to introduce a new type `UInt8` (a Field element that is exactly a byte) to draw a clan line between both hash functions.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea @mitschabaude - makes sense.

Note typo: draw a clan line

// ..
```

### Alternative Approach
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checking... in the snark these hashes come with some circuit constants and it may be more efficient to include those in the circuit once at the start of the circuit. If the "direct hash" API isn't able to facilitate this, then the "builder" approach could have an advantage since in that approach you could add those constraints when building the hash configuration.

## SHA3/Keccak

In order to test the implementation of SHA3 and Keccak in SnarkyJS, we will follow the testing approach we already apply to other gadgets and gates.
This includes testing the out-of- and in-snark variants using our testing framework, as well as adding a SHA3 and Keccak regression test. The regression tests will also include a set of predetermined digests to make sure that the algorithm doesn't unexpectedly change over time (similar to the tests implemented for the OCaml gadget).
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for specific edge cases, errors handling

Since padding is important for security and there are at least two kinds of padding supported by our Keccak gadgets, from a security perspective testing different length messages (e.g. empty, zero, one less than padding lengths, one more, multiples of, etc) are great edge-cases to test imho!

0005-snarkyjs-ecdsa-sha.md Show resolved Hide resolved
@querolita
Copy link
Member

querolita commented Jun 22, 2023

Hey! Since @mitschabaude suggested to include some flags to reduce duplication of byte-checks when lists UInt8 will be used, I thought maybe this could as well be interesting for outside users:

Right now, all keccak implementations consider the first element of the list to be the most significant byte of the string. Similar for the output. I wonder if it could be interesting to add a flag to indicate whether the input/output is desired to be in big endian or in little endian, to be more transparent about this point. What do you think? @Trivo25 @jspada

In SnarkyJS, these new primitives will be declared as followed:

```ts
function buildSha(length: 224 | 256 | 385 | 512, nist: boolean) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update this with the most object too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What exactly do you mean by "the most object"?

0005-snarkyjs-ecdsa-sha.md Outdated Show resolved Hide resolved
In order to test the implementation of SHA3 and Keccak in SnarkyJS, we will follow the testing approach we already apply to other gadgets and gates.
This includes testing the out-of- and in-snark variants using our testing framework, as well as adding a SHA3 and Keccak regression test. The regression tests will also include a set of predetermined digests to make sure that the algorithm doesn't unexpectedly change over time (similar to the tests implemented for the OCaml gadget).

In addition to that, we should provide a dedicated integration test that handles SHA3/Keccak hashing within a smart contract (proving enabled). This will allow us to not only provide developers with an example, but also ensure that SHA3 and Keccak proofs can be generated.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shimkiv does this sounds good to you? Is there anything else you would change here?


Overall, exposing new gadgets and gates follow a strict pattern that has been used all over in the SnarkyJS bindings layer. As an example, the [Poseidon](https://github.com/o1-labs/snarkyjs-bindings/blob/main/ocaml/lib/snarky_js_bindings_lib.ml#L386) implementation behaves similarly. From the point of view of SnarkyJS, these gadgets are just another set of function calls to OCaml.

## ECDSA
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this section if there will be separate RFC?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! It will be removed with the other PR, which is stacked on top of this PR #14


## Test plan and functional requirements

## SHA3/Keccak
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test plan and functional requirements and SHA3/Keccak sections are of the same level. Later one probably should be a sub-section (###).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants