Making Celestia fully real-time verifiable (and, in turn, scale nearly for free)

This is a post about one possible (weird) vision for enabling additional verifiability in Celestia. The tl;dr: by making many parts of the stack verifiable, we can (a) remove a lot of the round-trip costs that come from validation, fraud proofs, etc., and (b) enable all sorts of niceties (such as permissionless bridging with no wait), all with very little additional work.

This also lets us do neat stuff like scale to ~GiB-sized blocks for validators/proposers with a fast-but-not-insane internet connection (home fiber, say) on somewhat-fast consumer hardware. (Think an Apple Silicon MacBook Pro with decent specs.)

For this post, I will assume reasonable familiarity with the very high level of the Celestia architecture, plus some familiarity with a weird blog post I wrote a while back. In a sense, this post is an elaboration of that one with an eye towards Celestia in particular, so it’s worth taking a peek first, even if just to agree on the high level before getting into the nitty-gritty.

The bit

Encoding verifiability

The basic observation here is that we can make the encoding verifiable for almost no additional cost over just performing the encoding, as we do today. In contrast to the current system, this would remove the explicit need for fraud proofs, which has all sorts of nice downstream consequences.

What exactly are we making verifiable here? Let’s say we are given some data \tilde{X}, and call its tensor encoding X. It is very easy for anyone who has the unencoded block \tilde{X} (i.e., not just the block proposer!) to construct a succinct proof that sampled rows and columns of the encoded X correspond exactly to a unique \tilde{X}, and that this X is decodable.

This would (almost-but-not-quite) enable Dev Sharding, of course: with this construction there is ~no need for anyone in consensus to download the whole block; all that has to be downloaded are the inclusion proofs and some guarantee that execution on the base layer was performed correctly. (Unfortunately, right now, this would still mean that most consensus nodes need to download most of \tilde{X}, since we need to verify blob inclusion + payments, but we will get to this in a second.)

What changes need to be made? Well, first, the particular way we use Reed–Solomon codes (via “shares”) is simply not high-distance enough for the proofs to be small. One thing we might want to do is set the share size to 1 (which is the same as a “standard” Reed–Solomon code) or, equivalently, just tensor encode the matrix with RS codes directly. We might also want to decrease the rate of the code to 1/4, instead of 1/2, which will help shorten the proofs even further. (There is no need to change the current encoding scheme or base field until we get to 1 GiB blocks, which is neat.)
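To make “share size = 1, tensor encode with RS directly” concrete, here is a minimal sketch. It uses a toy prime field and naive O(k^2) interpolation purely for readability; the field, evaluation points, and rate below are illustrative assumptions on my part, not the parameters Celestia would actually use.

```python
# Toy sketch of a tensor (2-D) Reed–Solomon encoding with share size = 1,
# i.e., every matrix entry is a single field element. Illustrative only:
# the prime field and naive interpolation are stand-ins, not production choices.

P = 2**31 - 1  # toy prime modulus (assumption; Celestia's codes live over binary fields)

def rs_extend(symbols, n):
    """Extend k data symbols to an n-symbol codeword by evaluating the unique
    degree-<k polynomial through (0, s_0), ..., (k-1, s_{k-1}) at x = 0, ..., n-1.
    Systematic: the first k outputs are the original symbols."""
    k = len(symbols)
    out = []
    for x in range(n):
        acc = 0
        for i, s_i in enumerate(symbols):  # Lagrange interpolation, O(k^2) per point
            num, den = 1, 1
            for j in range(k):
                if j != i:
                    num = num * (x - j) % P
                    den = den * (i - j) % P
            acc = (acc + s_i * num * pow(den, P - 2, P)) % P
        out.append(acc)
    return out

def tensor_encode(data, rate=0.25):
    """Tensor-encode a k x k data matrix into an n x n matrix (n = k / rate)
    by RS-extending every row, then every column of the result."""
    k = len(data)
    n = round(k / rate)
    rows = [rs_extend(row, n) for row in data]             # extend all rows
    cols = [rs_extend([rows[r][c] for r in range(k)], n)   # then all columns
            for c in range(n)]
    return [[cols[c][r] for c in range(n)] for r in range(n)]  # row-major n x n
```

The rate=0.25 default just mirrors the 1/4 suggestion above; nothing structural changes if you keep 1/2.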

What does this give? For a 32MiB block:

  • Nodes that want ~100 bits of security for decoding and reconstruction only need ~2 MiB total for a 32 MiB block, proofs, elements, and Merkle openings included. This would be for people like roll-up providers or consensus nodes.
  • For nodes that want, say, 10 bits of security, this drops to 382 KiB, by my estimation.

For a 1 GiB block (this will require going to the slightly larger base field \mathbf{F}_{2^{32}}):

  • Nodes that want to sample 100 bits of security need ~19 MiB of proofs
  • For nodes that want 10 bits: ~2.4 MiB
  • 7 bits: ~1.9 MiB

The latter is probably slightly too large for now, but alas. For nodes which can’t handle this, we can probably do some sort of proof recursion and so on.
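For intuition on what “bits of security” buys you, here is the generic sampling calculation. This is not the exact accounting behind the sizes above (those depend on the specific proof system and the code’s distance), and the 25% figure in the comment is just an illustrative assumption:

```python
import math

def samples_for_security(bits: float, bad_fraction: float) -> int:
    """If breaking decodability requires at least a `bad_fraction` of positions to be
    withheld or corrupted, then s uniform samples all miss the bad set with
    probability at most (1 - bad_fraction)^s; solve for s given a target of 2^-bits."""
    return math.ceil(bits / -math.log2(1 - bad_fraction))

def download_estimate(samples: int, symbol_bytes: int, path_bytes: int) -> int:
    """Very rough total download: each sample costs one symbol plus one Merkle opening."""
    return samples * (symbol_bytes + path_bytes)

# e.g., if at least ~25% of positions must be bad before decoding fails, then
# samples_for_security(100, 0.25) == 241 samples for ~100 bits, and
# samples_for_security(10, 0.25)  == 25 samples for ~10 bits.
```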

With encoding comes reconstruction

This also lets us do distributed reconstruction without requiring that any one node hold all of the data. In particular, let’s say a node samples a row that is unavailable.

We now need to (a) convince the node that there actually is a unique block that was committed to and encoded, so the leader isn’t just screwing with all of us, and (b) provide enough samples of that row (via the column openings we have) that it can error-decode to the correct thing.

First, (a) is very easy: anyone can do the sampling for a 100-bit proof using verifiable randomness (say, by generating a random seed and then using the randomness that comes from hashing that seed), and the small node can then take this proof and be convinced. (Yes, it might be slightly large, but this is a bad case!)
Second, (b) is, well, mostly done for us here too: once (a) is checked, we know that most columns are good, so decoding the given row from the columns means we’re chillin’.
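To make the verifiable randomness in (a) concrete: the sampler can derive its query positions by hashing a seed together with a counter, so anyone can re-derive (and therefore audit) exactly which cells were sampled. A minimal sketch; the function name and layout are hypothetical, not an existing API:

```python
import hashlib

def sample_coordinates(seed: bytes, n_rows: int, n_cols: int, num_samples: int):
    """Derive (row, column) sample positions deterministically from `seed` by
    hashing seed || counter; publishing the seed makes the sampling auditable."""
    coords = []
    counter = 0
    while len(coords) < num_samples:
        digest = hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        r = int.from_bytes(digest[:8], "big") % n_rows    # note: slight modulo bias,
        c = int.from_bytes(digest[8:16], "big") % n_cols  # fine for a sketch
        coords.append((r, c))
        counter += 1
    return coords
```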

Note that there is no need to actually check or fetch the Merkle path for this particular row either; it doesn’t matter, since we know the error-corrected row is good from the guarantees of the protocol. (If we had an incorrect path, though, it could function as a fraud proof, but that’s really not necessary here.)
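And for (b): each verified column opening hands the node one authenticated symbol of the missing row, so once it has any k of them it can erasure-decode the row by interpolation. A toy sketch matching the encoding conventions above; a true error decoder (e.g., Berlekamp–Welch) would only be needed if the symbols were unauthenticated:

```python
P = 2**31 - 1  # same toy prime field as the encoding sketch above

def recover_row(known: dict, k: int):
    """Erasure-decode a row from any k authenticated symbols. `known` maps a
    column index (the RS evaluation point) to the symbol seen at that column;
    returns the row's first k (data) positions."""
    assert len(known) >= k, "need at least k symbols to erasure-decode"
    pts = list(known.items())[:k]
    def eval_at(x):
        acc = 0
        for i, (xi, yi) in enumerate(pts):  # Lagrange interpolation through k points
            num, den = 1, 1
            for j, (xj, _) in enumerate(pts):
                if j != i:
                    num = num * (x - xj) % P
                    den = den * (xi - xj) % P
            acc = (acc + yi * num * pow(den, P - 2, P)) % P
        return acc
    return [eval_at(x) for x in range(k)]
```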

With encoding verifiability comes computational verifiability

Ok, we did skip one important (and major) part of the above: consensus nodes still need to make sure that the Celestia state machine (pay-for-blobs, etc.) is ticking along correctly.

In the current scheme, consensus nodes cannot get away with downloading just small parts of the block: they would not be able to re-execute the transactions to verify them, since, well, they wouldn’t have them!

It turns out that there’s a second, very neat, observation. Once blocks function as their own proofs of encoding, they also function as a polynomial commitment to the data they contain. This means that we can use the samples (not the whole block!) to verify a succinct proof of some statement over all transactions in the block.

In other words: sampling for data availability not only lets you verify that the data is available, but also that the encoding was done correctly, and, even more, it lets you verify a succinct proof over the complete data in the block without needing to see all of it. (Of course, constructing said proof requires knowing all of the data, but verifying it only requires the samples needed for DA.)
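To make “the block is its own commitment” a bit more concrete: one natural (Ligero/Ligerito-flavored) instantiation, which I am assuming here for illustration rather than asserting as the plan, is a Merkle tree over the encoded columns, so every DA sample doubles as an opening of the commitment:

```python
import hashlib

def _h(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def merkle_root(leaves):
    """Root of a binary Merkle tree over (serialized) encoded columns."""
    level = [_h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # pad odd levels by duplicating the last node
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_open(leaves, index):
    """Sibling path for leaf `index`, consistent with merkle_root above."""
    level = [_h(leaf) for leaf in leaves]
    path = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        path.append(level[index ^ 1])
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return path

def merkle_verify(root, index, leaf, path):
    """Check that `leaf` sits at position `index` under `root`."""
    node = _h(leaf)
    for sibling in path:
        node = _h(node + sibling) if index % 2 == 0 else _h(sibling + node)
        index //= 2
    return node == root
```

A DA sample is then just (index, column, path) checked with merkle_verify, and the same root is what any succinct proof over the block’s contents gets checked against.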

This is quite cool! During encoding, we can require the block proposer to construct a (short) proof that the transactions in the given block have netted out correctly, and that the blobs paid for were included in the provided block. We then have this short proof over the block’s data, and can use our DA samples + this proof to verify that the whole block is gucci. (A side note for the nerds: the operations on the base layer are very, very simple and embarrassingly parallel. Since we also get a commitment to the block’s data for free, it should be possible to construct an insanely efficient proof that everything is good.)

Now nodes only need to perform DA sampling and check this quite-short succinct proof over the given block to verify that everything is good.

Even more, we can also recurse the proof of the previous block being good into the proof of this block! If we do this (a la Mina) then verifying the current block + DA tells us that the chain is good all the way until now and that the data for the current block is available… with almost no overhead compared to having just done the basic DA sampling.

With computational verifiability comes tiny consensus nodes

Ok, finally, here’s the punchline: since everyone can verify the block just by doing DA sampling, and we can do distributed reconstruction if enough people sample, well, no consensus node needs to download the whole block anymore.

Any consensus node just needs the DA samples + the proof of correctness, which means that, for a 1 GiB block, a node in consensus only needs to download ~20 MiB to verify that everything (state transition, DA, and so on) is good.

This means we can probably drastically simplify block distribution since samples are very cheap, and now even RPis can participate in consensus! (While still having 1GiB-sized blocks.)
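Concretely, the whole job of such a tiny consensus node boils down to something like the following structural sketch. The two verifier callables are placeholders for whichever Merkle and proof-system verifiers end up being used; none of these names are real APIs:

```python
from typing import Callable, Iterable, Tuple

Sample = Tuple[int, bytes, list]  # (column index, column bytes, Merkle path)

def verify_block(root: bytes,
                 samples: Iterable[Sample],
                 stf_proof: bytes,
                 verify_opening: Callable[[bytes, int, bytes, list], bool],
                 verify_stf_proof: Callable[[bytes, bytes], bool]) -> bool:
    """Tiny-node verification: DA sampling against the block commitment, plus one
    succinct proof that the state transition (and, if recursed, all history) is good."""
    # 1. DA + encoding check: every sampled column must open against the root.
    if not all(verify_opening(root, i, col, path) for i, col, path in samples):
        return False
    # 2. One succinct proof over the committed data covers the state machine.
    return verify_stf_proof(root, stf_proof)
```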

Now your phone, too, can just download a few MB every 10 seconds or so and know that everything is good :slight_smile:

Icing on the cake: roll-up proofs

Assuming roll-up data is posted in rows of this matrix X, note that most of the cost of constructing a proof over this data comes from encoding those rows.

But, wait! We’ve already done this work for DA anyways! That’s the whole idea of DA! So roll-ups can just reuse this encoding directly, and continue with the (much cheaper) rest of the proof. This should enable real-time proving very, very easily since, in this scenario, we aren’t trying to do backflips to fit a complicated proof into a tiny (Groth16, etc.) proof that is verified by the EVM or whatever. We just use a standard but insanely fast FRI/WHIR/Ligerito proof (which is “insanely large” by EVM standards) and that’s all anyone needs, since we’re not verifying it on an EVM; we’re verifying it on a phone.
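To spell out the reuse (under the assumed layout where a roll-up’s data occupies whole rows of X, and using rs_extend from the encoding sketch earlier):

```python
def rollup_rows(X, row_indices, k):
    """Rows of the encoded block double as the roll-up prover's encoding: the first
    k entries of each row are the raw data, the full row is its RS codeword."""
    raw = [X[r][:k] for r in row_indices]   # the roll-up's original data
    encoded = [X[r] for r in row_indices]   # already-computed encoding, reusable as-is
    return raw, encoded

def row_is_reusable(X, r, k, rs_extend):
    """Sanity check: row r really is the RS extension of its own first k symbols,
    i.e., exactly the encoding the roll-up prover would otherwise have to compute."""
    return X[r] == rs_extend(X[r][:k], len(X[r]))
```

The point is just that `encoded` is free: it is literally the set of rows the proposer already produced and committed to.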

Open questions

Here’s where we get to the open questions, of which there are a few.

  1. How easy and fast is it to prove the Celestia state transition function, given the block data and a free commitment to this data? Proving this would involve verifying the signatures on blobs and recursing these checks into a succinct proof, which imo is the only real open question about how quickly we can do this. My assumption is that this can be pretty fast, since we have essentially infinite flexibility in the proof system, commitment scheme, and so on.
  2. Can we make the 7-bit node sizes smaller for the 1 GiB blocks? Do we need to? Or do we wish to give those nodes slightly worse guarantees as a tradeoff? For example, we can probably recurse a 100-bit proof, at some decent cost, down to a much smaller proof, and then prove that a very small number of Merkle openings are fine. Such nodes can then decode over rows, but might be missing some parts of the block, which makes it hard to do distributed decoding for missing rows/cols unless there are millions of such nodes (which might be a fine assumption). (On the other hand, they can help decode a complete block, if needed, with far fewer people!)
  3. How badly does the change from share_size=whatever it is today to share_size=1 affect the current state of things? Encoding times? Do we need a slightly faster library in this special case? (Should be very cheap, given our somewhat-naive Julia implementation is already pretty fast.)

Pardon any random typos or other nonsense; I’m writing this all out stream-of-consciousness style and have to head out. But hopefully this enables some amount of interesting discussion :slight_smile:


As a quick side note: we can get even fancier and do all sorts of nonsense I didn’t touch upon here.

One example would be to “enshrine” a basic verifier (similar in spirit to this post) which can then just get rolled up into the block proof. Verifiers are, by construction, relatively simple machines, so in many cases they should, in turn, not cost too much to recurse. (This is the point of WHIR and Ligerito, e.g.)

Now, verifying DA + a <500 KiB proof gets you: correctness of the encoding, DA, correctness of the state transition for the Celestia blocks, and correctness of roll-up updates. All without having to eat the ~1000x cost of rolling the proof up into a “small” Groth16 proof and so on.
