How to provide data availability guarantees inside the EVM #6
So when a proof happens we know that
Which is 4,000 gas per tx. It's not great, but it should work. Though it means that if we move to Pedersen commitments, we would need to implement them very efficiently, or else our data availability guarantees would limit transactions per second. This also works out much cheaper than passing public inputs to the snark verification.
The process is:
So, the 'delta' is the new leaves added between each SNARK proof. The circuit takes these leaves as private inputs, proves that the hash of all of the leaves in sequence is equal to the one calculated on-chain, then adds them all into the tree and calculates the new root. So:

```
def circuit(public: old_root, public: new_root, public: leafs_sequence_hash, private: leafs):
    leaf_hash = 0
    for leaf in leafs:
        leaf_hash = H(leaf_hash, leaf)
    assert leaf_hash == leafs_sequence_hash
    root = merkle_append(old_root, leafs)
    assert root == new_root
```

This verifies that the leaves provided to the circuit match the ones added on-chain, without having to provide the leaves as public inputs. It also attests that the Merkle root has been updated specifically to include those leaves. To insert a leaf, a person calls an on-chain contract and provides it with the leaf hashes; this could be done one at a time, or in batches. But that's for cheaper inserts into the Merkle tree only, with aggregation of inserts via zkSNARK proof - which is different from what ...
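As a sanity check of the circuit logic above, here is a minimal out-of-circuit sketch in Python. It assumes SHA-256 for both the sequence hash and the tree, and recomputes the whole toy tree instead of appending via Merkle paths as `merkle_append` would; it is an illustration, not the project's actual code:

```python
import hashlib

def H(a: bytes, b: bytes) -> bytes:
    return hashlib.sha256(a + b).digest()

def sequence_hash(leafs):
    # Running hash over the batch, as the on-chain contract would compute it.
    acc = b"\x00" * 32
    for leaf in leafs:
        acc = H(acc, leaf)
    return acc

def merkle_root(leafs):
    # Toy Merkle root: pad the leaf list to a power of two, hash pairwise upward.
    level = list(leafs)
    while len(level) & (len(level) - 1):
        level.append(b"\x00" * 32)
    while len(level) > 1:
        level = [H(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

old_leafs = [hashlib.sha256(bytes([i])).digest() for i in range(4)]
batch = [hashlib.sha256(bytes([i])).digest() for i in range(4, 8)]

leafs_sequence_hash = sequence_hash(batch)   # public input, computed on-chain
new_root = merkle_root(old_leafs + batch)    # public input, claimed by the prover

# The circuit's two assertions, mirrored outside the circuit.
assert sequence_hash(batch) == leafs_sequence_hash
assert merkle_root(old_leafs + batch) == new_root
```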
To re-cap some discussion earlier: data availability can be guaranteed by splitting responsibility between the 'tx pool' and the 'prover', which also provides an anti-censorship mechanism. There is a consensus of nodes which represent the 'tx pool', and they guarantee data availability.

If I want a tx included in a block, I must first prove to the 'tx pool' nodes that it's a valid transaction for a given merkle tree root (e.g. the current block). The tx pool signs this information, confirming they have received proof that it's a valid transaction and hold all the data necessary for the prover to include it in a block. I then take this information to the prover and say 'I have proof, from the tx pool, that I have a valid transaction', without revealing any details about that transaction to the prover. The prover then gives me a token/signature which guarantees they will include this transaction in the block, even though they don't know any information about that transaction (yet). For the next block, the prover then provides all the guarantees for transactions to the tx pool nodes, which reply with the details of the transactions.

This provides a mechanism which splits the responsibility for data availability from proving transactions: the prover must commit to what it will include without knowing what it is, and all the information it needs is guaranteed to be available. So it would require collusion between the tx pool nodes and the prover to censor transactions. This is a half-baked idea, but it is worth investigating.
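A rough sketch of that flow, with real signatures replaced by plain hash commitments purely for illustration - every name below is hypothetical, not part of any spec:

```python
import hashlib, os

def commit(data: bytes) -> bytes:
    # Stand-in for a real signature/commitment scheme.
    return hashlib.sha256(data).digest()

# 1. The user proves tx validity to the tx pool nodes, revealing the tx to them only.
tx = b"transfer: A -> B, 10 units"
blinding = os.urandom(32)
tx_handle = commit(tx + blinding)   # opaque handle, reveals nothing about the tx

# 2. The tx pool signs the handle: "we hold the data and it is valid for root R".
pool_receipt = commit(b"txpool-ack" + tx_handle)

# 3. The user shows only (tx_handle, pool_receipt) to the prover, which commits
#    to include the still-unknown transaction in the next block.
inclusion_token = commit(b"prover-commit" + tx_handle + pool_receipt)

# 4. At proving time the tx pool reveals (tx, blinding); the prover checks the
#    handle matches what it committed to, so it cannot selectively censor.
assert commit(tx + blinding) == tx_handle
```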
If whatever consensus mechanism is used involves every tx pool node seeing the plain transaction, then it should be fairly easy for a prover to also run a tx pool node in order to view the plain transactions submitted by users, and as a result the prover would still be able to censor effectively. An alternative could involve DFINITY-style random sampling, such that only a subset of tx pool nodes check the validity of a user's transaction, which might make it more difficult for a prover to see all users' plain transactions.

This would be a UX hit for users, but instead of relying on a tx node pool, maybe users could generate a SNARK proving the validity of a transaction without revealing it to the prover. The prover would then provide a signature guaranteeing the inclusion of the valid transaction without knowing its contents until it actually receives the plain transaction later on for inclusion in the Merkle tree.
Good insight.
Yes, this was a line of thinking I had too. But then the 'data availability' problem rears its head again: if the prover commits to including a transaction, you need to guarantee that it has all the data available for it and that the transaction owner can't withhold the information. At least, if the prover will be penalised for not including that data, you shouldn't be able to deliberately withhold information. E.g. we want some kind of game-theoretic commitment by the prover, where it's penalised for not following the rules of the game (where it must prove everything it's committed to)?
IMO we should provide strong data availability guarantees. For me this means being able to recover from failure modes. Ethereum provides this by saying that if data ever becomes unavailable, then the chain is invalid, and we will roll back to the last state where data is available and continue mining from there. Our in-EVM approaches to data availability provide this by relying on Ethereum for these guarantees.
However, this approach does not allow that. If the tx pool is malicious, they can lock all users' funds by making data unavailable. Is this a proposal for a complete data availability guarantee, or just one layer of it?
Where is the input for these sha256 operations coming from? From the tx data? A byte of Ethereum transaction data costs 68 gas (for non-zero bytes).
Even if this can somehow be optimized to only posting the addresses and hashes of the leaves on the mainchain, it's still around 5k gas per tx. A batched token transfer can already be achieved in the EVM at 15k gas per tx, so the saving is not as radical, especially when the costs of the zk-proof are considered. Have you guys considered approaches from the link below (erasure coding, probabilistic availability checks by light clients)?
We are kind of moving away from this proposal in favor of an idea from @HarryR where we pass all of the leaves that are updated, hash them together in the EVM, and then pass that to the snark, which again hashes the new leaves and verifies that the result equals the public input from the EVM. So in that case we would only need to pass the list of leaves to the EVM, which would cost noTx * cost_per_word == 2,176,000 gas. We also have to do 2046 hashes inside the EVM, which is feasible for sha256, but for our more snark-efficient hash functions it gets more expensive. The problem here is that this increases the number of constraints required in the snark. For noTx = 1000 ...
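Spelling out the calldata arithmetic above (a sketch assuming 68 gas per non-zero calldata byte and one 32-byte word per leaf):

```python
GAS_PER_BYTE = 68     # pre-EIP-2028 cost of a non-zero calldata byte
WORD_BYTES = 32       # one leaf hash = one EVM word

no_tx = 1000
cost_per_word = WORD_BYTES * GAS_PER_BYTE   # 2,176 gas
print(no_tx * cost_per_word)                # 2,176,000 gas of calldata for the batch
```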
Ah okay, this makes it seem impossible to provide data availability guarantees inside the EVM. Maybe not impossible, but a 3x optimization is less exciting. So perhaps we have to do something similar to Plasma and allow people to exit on a stale state. Then we don't need to have data available, but we do need exit time limits.
My thinking on erasure coding is that in this context we only have one chunk, so it turns into data compression. But because all of the data that we have to publish are hashes, and therefore pseudorandom, we probably can't get decent compression out of them. Though I am open to investigating this approach a little more.
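That intuition is easy to check empirically; a quick sketch:

```python
import os, zlib

# Stand-in for 1000 pseudorandom 32-byte leaf hashes.
leaves = os.urandom(32 * 1000)
compressed = zlib.compress(leaves, 9)
print(len(leaves), len(compressed))   # the "compressed" form is no smaller
```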
Ideally we can construct a system in which users can go offline for prolonged periods of time and still reasonably expect their funds to be safe. Let's call it the long exit guarantee. Zk-SNARKs cover the most difficult part of it: no incorrect transactions can be posted. Now, hopefully we can find a solution to the remaining challenge, data availability. If I understand correctly, the stale-state exit approach would not have the long exit guarantee, because users must act immediately when they face data unavailability for their branch. Right?
Can you give me some info on this? I want to check if the batched txs are unique.
We can probably compress the leaves down from their current 32 bytes, so we can get below 5k gas per tx.
So that is 7 addresses per 32 bytes, and the addresses are probably compressible on top of that. I still think that a 1.5x to 3x improvement justifies MVP 1 having EVM-based data availability guarantees. But let's talk about that at our next meeting #16 :)
So let's move this into other issues to keep things a little cleaner here. #15 is for data availability outside the EVM, and this issue is for in-EVM data availability.
The scheme can look as follows. Transfer data contains compressed ...
What kind of compression do they use?
Just 4 bytes per account.
So the 'account' is the leaf offset? E.g. each transaction submitted on-chain in the EVM is:
Then that data is hashed into a sequence, and the same hash sequence is verified in-snark? In that form each tx is 128 bits, and it accurately describes the movement of funds from one account to another, so you can calculate the balance of each account after each block. And you can pack two transactions into each EVM word. So the gas cost is:
Then, you can pass the 'old_root' and 'new_root' parameters to the EVM too, and hash them, so the SNARK circuit only has 1 public input, keeping verification costs to a minimum. I see what @barryWhiteHat means about the 128-bit security level: if every tx has 128 bits of tx info, plus a truncated 128-bit hash of the public key, then you can fit the whole leaf into 256 bits. The problem, though, is that the fast in-circuit hash algorithms don't work on 256 bits - you need to fit them into 253 bits. But you could make the ...
Then the on-chain costs become:
And all on-chain tx data has enough info to fully re-create and verify the tree. Say snarkVerification costs 500k gas. With 100 txs, the gas cost per tx for snark verification is 5k.
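A quick check of that amortization (500k is the assumed verification cost from above, not a measured figure):

```python
SNARK_VERIFY_GAS = 500_000   # assumed fixed cost of verifying one proof

for n_txs in (10, 100, 1000):
    print(n_txs, SNARK_VERIFY_GAS // n_txs)   # 50000, 5000, 500 gas per tx
```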
So you can see that the larger the number of txs included per block, the cheaper it gets. However, the cost is dominated by the 'cheap' hash cost - and maybe 5k per hash is too much; it might be a lot cheaper than that... The signatures for the transactions don't need to be included on-chain, and the pubkey hash doesn't need to be either. But it looks like zkSNARKs can do an account model, similar to batched updates, cheaper than the EVM alone. If you use SHA256 as the on-chain hash function this becomes very cheap on-chain (much cheaper than batched EVM transactions), and the break-even threshold is reduced at the expense of a snark circuit which takes much longer to prove.
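As an aside, one hypothetical packing of the 128-bit per-tx record described above - the field widths are illustrative guesses, not anything specified in this thread:

```python
import struct

# Hypothetical layout: 4-byte from-index + 4-byte to-index + 8-byte amount = 16 bytes.
def pack_tx(from_idx: int, to_idx: int, amount: int) -> bytes:
    return struct.pack(">IIQ", from_idx, to_idx, amount)

tx = pack_tx(7, 42, 10**9)
assert len(tx) == 16                  # 128 bits per transaction
word = tx + pack_tx(42, 7, 5)
assert len(word) == 32                # two txs fit in one 32-byte EVM word
```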
So right now our gas cost per tx using Harry's data availability guarantees
@gluk64 does that look correct to you? Where we can optimize:
Yes. Is 5k the expected cost of a hash with a Pedersen commitment? Do we actually need ...? Still, the cost of even 10k per tx is very high. Remember that SNARK proof generation is equally expensive, if not more so. And as shown above, a simple transfer under the same conditions can be implemented in pure EVM in under 20k gas. So what do you think?

Besides, Ethereum transaction data does not have the same data availability guarantees as EVM storage data. Any miner can safely discard all transactions prior to a block he believes is sufficiently irreversible. So the entire solution depends on the existence of altruistic archive nodes who will be willing to provide all the gigabytes of historic data upon request. I'm not sure such nodes exist even now...
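One way to frame that comparison is as a break-even point between the fixed verification cost and the pure-EVM baseline. A sketch using the numbers assumed in this thread (500k verification, 10k per tx, 20k baseline):

```python
EVM_BASELINE = 20_000   # gas per transfer in a pure-EVM batched implementation
FIXED = 500_000         # assumed per-block snark verification cost
PER_TX = 10_000         # assumed per-tx data availability cost

# The roll-up beats the baseline once FIXED / n + PER_TX < EVM_BASELINE.
break_even = FIXED / (EVM_BASELINE - PER_TX)
print(break_even)       # 50.0 txs per block under these assumptions
```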
It depends on the implementation and how much I can optimize. I think 5k is optimistic. I would think more like 10k.
No, we do not include these. The user can calculate them from the transactions they send. When someone receives a payment they also get the actual leaf data; otherwise it does not count as a receipt.
Agreed, this is a major limitation. I am thinking about data availability outside the EVM and will post about it in a while. I'm not sure if we should move forward with EVM data availability - I would like to hear others weigh in @HarryR @yondonfu @PhABC
If they throw away the txs, how can anyone sync the chain? They need to confirm signatures to calculate the current state?
There is no need to verify the entire chain from genesis. All clients can have a trusted block number hardcoded and download a snapshot of the state. Parity and geth both support such a quick-sync mode; I suspect that most new users do not download the full chain. And the absolute majority use light clients with Infura.
Due to #3 it's quite expensive to add a public input to the snark, so we cannot provide data availability that way. We need to provide this somehow in Minimal Viable Roll_up (MVR).