When a node syncs from genesis, should it DA sample from past historical blocks?
The node can learn the canonical chain by following consensus, but it must also be able to validate state transitions. For example, if consensus finalizes a block that contains an invalid signature or invalid erasure coding, the syncing node must know to reject that block. Historical fraud proofs are therefore a requirement for genesis sync, and historical DA sampling is a requirement for fraud proofs.
However, recall that Celestia is a Data Availability solution, and not a “Data Storage” solution. Guaranteeing the availability of historical data essentially makes it a Data Storage solution, which is a much larger scope. Should nodes syncing from genesis reject historical blocks that are missing data? (This applies to other blockchains, BTC and friends included).
Suppose consensus is on block #390,512, but all replicas around the world of block #25 get deleted. Does that mean we need to go all the way back to block #25, since the chain is now built on a missing (thus invalid) block?
Validity proofs of Celestia’s state transition function (minimal coin balances, transfers, staking) would eliminate this requirement altogether, restoring the scope to DA.
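To make the strict rule concrete, here is a minimal Python sketch of a genesis sync that refuses to build on a block it cannot sample or validate. The names `Header`, `sync_from_genesis`, `is_available`, and `is_valid` are hypothetical and not taken from any Celestia client; real sampling and fraud-proof checks are stubbed out as callbacks.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Header:
    height: int


def sync_from_genesis(
    headers: Iterable[Header],
    is_available: Callable[[Header], bool],
    is_valid: Callable[[Header], bool],
) -> int:
    """Return the height of the last block a strict syncing node accepts.

    Assumes headers arrive in increasing height order starting at 1, and
    that genesis (height 0) is trusted.
    """
    accepted = 0
    for header in headers:
        # Stand-in for historical DA sampling: missing data means the node
        # cannot check fraud proofs, so it refuses to build on this block.
        if not is_available(header):
            break
        # Stand-in for state-transition checks (signatures, erasure coding).
        if not is_valid(header):
            break
        accepted = header.height
    return accepted


# Scaled-down version of the example: consensus is at height 100, block 25 is gone.
chain = [Header(h) for h in range(1, 101)]
stuck_at = sync_from_genesis(chain, lambda h: h.height != 25, lambda h: True)
print(stuck_at)
```

Under this rule the node stops just before the missing block no matter how far consensus has advanced, which is exactly the outcome the question is asking about.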
If Celestia doesn’t guarantee the availability of historical data, then there is no way for a syncing node to tell if historical blocks are valid.
The majority could be corrupted, finalize a block with data withheld, and move on. Sampling only protects the nodes who are synced and are following the head. However, if they reject a block finalized by the majority and consensus never finalizes an available block for that height, they’re basically fucked.
I believe that since Celestia is PoS we already make weak subjectivity assumptions and periodically checkpoint. If you’re okay with weak subjectivity, you can assume that all the previous blocks before the checkpoint were valid and only bootstrap the most recent blocks. AFAIK nodes are expected to store block data up until the latest checkpoint, so Celestia does guarantee their storage during that period.
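As a rough sketch of what that assumption buys you, the snippet below compares which heights a node has to fetch and verify with and without a trusted checkpoint. The function names and the checkpoint height 390,000 are made up for illustration; this does not model how checkpoints are actually chosen or distributed.

```python
from typing import Optional, Set


def heights_to_verify(head: int, checkpoint: Optional[int] = None) -> range:
    """Heights the node must fetch and verify itself.

    Without a checkpoint this is everything after genesis; with one, blocks
    at or below the checkpoint are assumed valid (weak subjectivity).
    """
    start = 1 if checkpoint is None else checkpoint + 1
    return range(start, head + 1)


def sync_blocked_by(
    missing: Set[int], head: int, checkpoint: Optional[int] = None
) -> Optional[int]:
    """Return the lowest missing height the node actually needs, if any."""
    needed = heights_to_verify(head, checkpoint)
    blocked = [h for h in sorted(missing) if h in needed]
    return blocked[0] if blocked else None


# Block 25 is gone everywhere; consensus is at height 390,512.
from_genesis = sync_blocked_by({25}, head=390_512)
from_checkpoint = sync_blocked_by({25}, head=390_512, checkpoint=390_000)
print(from_genesis, from_checkpoint)
```

A node syncing from genesis is blocked at the missing block, while a node that trusts the checkpoint never asks for it. Whether that trust is justified is a separate question.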
Although blocks before the most recent checkpoint aren’t stored in protocol, it’s likely that they will be stored by someone in perpetuity. Storage, after all, is an honest 1-of-N assumption. In fact, you need not assume altruism on the part of the storer: they may have very strong economic incentives to store all the block data from genesis.
–
Having written this, I’m now wondering whether making a weak subjectivity assumption actually allows you to assume that all blocks before the checkpoint were valid. Does it?
Made a diagram of weak subjectivity. Genesis sync is still possible with PoS, but the sync function might return multiple valid chain heads if a long-range attack is in progress, at which point, it is up to your discretion which chain head you support. Checkpoints are useful as some trusted external timestamping information you can use to help disambiguate forks in the event of a long-range attack, but I don’t see how they remove the need to verify all historical state transitions.
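A toy version of that sync function, assuming the block tree is given as a hypothetical `parents` map from block ID to parent ID (with `None` for genesis), could look like the sketch below. It is not how any real client represents forks; it only shows that a checkpoint filters heads and does nothing else.

```python
from typing import Dict, List, Optional, Set

BlockTree = Dict[str, Optional[str]]  # block id -> parent id (None for genesis)


def candidate_heads(parents: BlockTree) -> Set[str]:
    """Heads are blocks that no other block names as its parent."""
    non_heads = {p for p in parents.values() if p is not None}
    return set(parents) - non_heads


def ancestors(parents: BlockTree, block_id: str) -> List[str]:
    """Walk from block_id back to genesis, inclusive of both ends."""
    path: List[str] = []
    current: Optional[str] = block_id
    while current is not None:
        path.append(current)
        current = parents[current]
    return path


def heads_consistent_with(parents: BlockTree, checkpoint: str) -> Set[str]:
    """Keep only the heads whose history contains the trusted checkpoint."""
    return {
        head
        for head in candidate_heads(parents)
        if checkpoint in ancestors(parents, head)
    }


# Two forks from genesis "g": an honest chain a1 -> a2 and a long-range fork b1 -> b2.
tree: BlockTree = {"g": None, "a1": "g", "a2": "a1", "b1": "g", "b2": "b1"}
all_heads = candidate_heads(tree)
trusted_heads = heads_consistent_with(tree, checkpoint="a1")
print(sorted(all_heads), sorted(trusted_heads))
```

Note that `heads_consistent_with` only narrows down the fork choice. Nothing in it checks a state transition, so validity of the blocks behind the checkpoint still has to come from somewhere else.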
In this thread I am asking whether Celestia validators finalizing a block with missing data should count as an invalid state transition. I think ideally it would, but that isn’t really guaranteed for any blockchain.
The question is broader than Celestia. Should Bitcoin revert if copies of historical blocks are deleted? It’s not likely, since the Bitcoin blockchain is small, but for blockchains that scale, e.g. Celestia, it’s much more expensive to store these replicas.
Solana’s blockchain is absolutely massive and very few contiguous replicas of the full chain history exist. Their community has basically agreed to defer to social consensus on whether the current head is canonical: a very strong honest majority assumption with lots of trust placed in the RPC cartel and the exchanges. I wonder if there is any way to spare Celestia from the same fate.