In traditional blockchains, users are expected to run full nodes that sync the entire chain, verifying that all the data committed to by block headers (i.e. as Merkle roots) was indeed published (data availability verification). This doesn’t scale with increasing block sizes, as users would need ever-increasing bandwidth resources. Therefore in newer blockchains, there is a focus on supporting light nodes that can verify the chain’s data availability with lower resource requirements, using techniques such as data availability sampling.
In this post, I map different light node data availability verification techniques, from least secure to most secure.
Level -1: full node (reference level)
Full nodes verify the data availability of a chain by downloading all the block data themselves, and only accept a block as valid if they were able to successfully do so.
We use this as a reference level with the baseline security assumption, where each level below adds extra security assumptions compared to a full node.
Level 0: no data availability guarantee
There is just a commitment to the data but no guarantee that the data is actually available. Examples include IPFS URIs. This can be useful for use cases where data availability is not required for safety, such as NFTs.
Security assumption: none; there is no data availability guarantee.
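To make the distinction concrete, here is a minimal sketch (with a hypothetical NFT payload) of a plain hash commitment: it binds the committer to the data, but nothing forces anyone to keep serving the data, so verification is only possible if some holder still provides it.

```python
import hashlib

def commit(data: bytes) -> str:
    # A plain hash commitment: binds the committer to this exact data,
    # but carries no guarantee that anyone will serve the data later.
    return hashlib.sha256(data).hexdigest()

def verify(commitment: str, data: bytes) -> bool:
    # Verification requires someone to actually provide the data;
    # if every holder discards it, the commitment is unverifiable forever.
    return commit(data) == commitment

metadata = b'{"name": "example NFT"}'  # hypothetical payload
c = commit(metadata)
assert verify(c, metadata)         # integrity holds if the data is served
assert not verify(c, b"tampered")  # a different payload does not match
```

This is why a Level 0 commitment is fine for NFTs, where losing the data is an inconvenience rather than a safety failure.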
Level 1: data availability committee
There is a committee that attests that the data for a particular commitment was made available. The threshold of signatures required from the committee can be adjusted to trade off the data availability guarantee against liveness.
Security assumption: honest majority assumption on the committee.
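The threshold logic can be sketched as follows. Signatures are modeled abstractly as the set of member IDs that attested (a real system would verify e.g. BLS signatures over the commitment); the committee members and names here are hypothetical.

```python
def dac_accepts(attesters: set[str], committee: set[str], threshold: int) -> bool:
    # Count only attestations from recognized committee members.
    valid = attesters & committee
    return len(valid) >= threshold

committee = {"a", "b", "c", "d", "e"}
# A higher threshold strengthens the availability guarantee but weakens
# liveness: more members must be online and honest to accept a block.
assert dac_accepts({"a", "b", "c", "d"}, committee, threshold=4)
assert not dac_accepts({"a", "b", "mallory"}, committee, threshold=3)
```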
Level 2: data availability committee with cryptoeconomic security
There is a committee that attests that the data for a particular commitment was made available. The committee can be slashed or halted if they lie about the availability of data.
Note this requires the committee to be recognized by the consensus rules of a sovereign chain (e.g. the committee could be the validator set of an L1 chain, such as Celestia). This is because it is not possible to slash data availability failures on-chain due to the fisherman’s dilemma.
If an implementation of this technique does not support data availability sampling light nodes, then the slashing or halting can only be done by full nodes, which, e.g. in the case of Tendermint, halt if a block is unavailable.
Security assumption: honest majority assumption on the committee, with cryptoeconomic incentives.
Level 2.1-2.3
If an implementation of this technique does support data availability sampling light nodes, then the slashing or halting can also be done by light nodes. This reduces the minimum node requirements of the technique, which has a positive effect on the security of the technique.
Level 2.1 refers to a level 3 light node, level 2.2 refers to a level 4 light node, and level 2.3 refers to a level 5 light node.
Level 3: data availability sampling, without an honest minority of light nodes
In addition to a data availability committee, it is possible to support data availability sampling light nodes, which are light nodes that can verify the data availability of the chain with very high probability without downloading all the data, by downloading random chunks of data from each (erasure coded) block.
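A back-of-the-envelope sketch of why sampling works, assuming for illustration a rate-1/2 erasure code: a block is only unrecoverable if at least half of its chunks are withheld, so a light node sampling s random chunks fails to notice the withholding with probability at most (1/2)^s.

```python
def miss_probability(samples: int, withheld_fraction: float = 0.5) -> float:
    # Probability that every one of `samples` uniformly random chunks
    # happens to land on the available (non-withheld) portion of the block.
    # (Sampling with replacement; an upper bound on the true probability.)
    return (1.0 - withheld_fraction) ** samples

assert miss_probability(1) == 0.5
assert miss_probability(20) < 1e-6  # under one in a million after 20 samples
```

The confidence grows exponentially in the number of samples, which is why light nodes can get very high assurance at a tiny fraction of a full node's bandwidth.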
Ideally, there should be an honest minority of light nodes such that they can collectively reconstruct the block if an adversarial block producer withheld any data in the block. This also requires a peer-to-peer block reconstruction protocol. If this is not the case, the scheme has similar security properties to proof of retrievability schemes.
In proof of retrievability schemes, data can only be reconstructed under the assumption that the sampling interface remains live. The block producer or data availability committee would be required to keep the sampling interface live, if they want to pass the data availability checks of all light nodes. If this assumption holds true, then it is possible for a full node to recover all of the data by making enough sampling requests via the sampling interface.
In case the nodes that expose the sampling interface discriminate against full nodes performing sampling, the full nodes could mask themselves by pretending to be many light nodes. In effect, the “honest minority of light nodes assumption” could effectively be simulated by forcing full nodes to use the same sampling interface that light nodes use.
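Recovery through the sampling interface can be sketched under a simplifying assumption: a single-parity erasure code (n = k + 1 chunks, any k of which recover the block). Real designs use Reed-Solomon codes, but the recover-by-sampling logic is the same.

```python
def encode(chunks: list[int]) -> list[int]:
    # Append one XOR parity chunk: any k of the k + 1 coded chunks
    # suffice to reconstruct the original block.
    parity = 0
    for c in chunks:
        parity ^= c
    return chunks + [parity]

def reconstruct(sampled: dict[int, int], k: int) -> list[int]:
    # `sampled` maps chunk index -> value for the chunks obtained
    # via the sampling interface.
    if len(sampled) < k:
        raise ValueError("not enough samples to reconstruct")
    missing = set(range(k + 1)) - set(sampled)
    if missing:
        # Recover the one missing chunk as the XOR of the others.
        (m,) = missing
        sampled[m] = 0
        for i, v in sampled.items():
            if i != m:
                sampled[m] ^= v
    return [sampled[i] for i in range(k)]

block = [3, 7, 42]
coded = encode(block)
# A full node (perhaps masquerading as light nodes) samples any 3 of the
# 4 coded chunks; chunk 2 is withheld but the block is still recovered.
samples = {0: coded[0], 1: coded[1], 3: coded[3]}
assert reconstruct(samples, k=3) == block
```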
Security assumption: honest majority assumption on the committee, with cryptoeconomic incentives, or the data availability sampling interface remains live for full nodes.
Level 4: data availability sampling, with an honest minority of light nodes
The protocol supports data availability sampling, and there is an honest minority of light nodes (k of N, where k is a constant regardless of the total number of light nodes N) such that they can collectively reconstruct the block if an adversarial block producer withheld any data in the block.
For it to be possible to reconstruct the data, this additionally requires a synchronous network with a bounded network delay, as light nodes need to be able to share their samples with full nodes so that the block can be reconstructed.
As noted in Level 3, each light node does not necessarily need to be run by a distinct individual for the honest minority of light nodes assumption to hold; users could run multiple light nodes. What matters is that, from the perspective of the nodes responding to sampling requests, all light nodes are indistinguishable from each other (e.g. by connecting from sufficiently unique IP addresses, or ideally via Level 5 private data availability sampling). This prevents responding nodes from discriminating between, say, light nodes that sample more data and light nodes that sample less, in order to fool a greater portion of light nodes. Interestingly, this is a unique example where sybils can add security to a protocol. However, these light nodes need to be honest, as they must be willing to share their samples with full nodes to enable block reconstruction, so it is likely also a bad idea to rely on a single entity, or a few entities, running all the light nodes.
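The collective reconstruction condition can be sketched as follows (assumed toy model: each light node holds a few sampled chunk indices, fixed here for clarity where a real system samples at random; a full node can reconstruct once the pooled samples of honest light nodes cover at least k distinct chunks).

```python
def pooled_coverage(light_node_samples: list[set[int]]) -> set[int]:
    covered: set[int] = set()
    for samples in light_node_samples:
        covered |= samples  # honest nodes share their samples with full nodes
    return covered

def can_reconstruct(light_node_samples: list[set[int]], k: int) -> bool:
    return len(pooled_coverage(light_node_samples)) >= k

# 8 honest light nodes holding 4 chunks each jointly cover k = 32 chunks.
# The required honest count is a constant, independent of how many light
# nodes exist in total, which is why sybils can only help here.
honest_minority = [set(range(4 * i, 4 * i + 4)) for i in range(8)]
assert can_reconstruct(honest_minority, k=32)
```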
Under a standard network model however, block producers could perform a “selective share disclosure” attack (section 5.4 in https://arxiv.org/pdf/1809.09044.pdf), where the block producer responds to the sampling requests of the first few light nodes but denies the requests of later light nodes, without releasing enough samples for the block to be reconstructed. This would cause the first few light nodes to incorrectly believe that an unavailable block is available. In such an attack, the block would be rejected from the canonical chain (or at least cause a fork), as the other light nodes (and full nodes) would not accept it, causing the chain to halt or the committee to be slashed, so such an attack would likely be costly. At minimum, this would prevent validators from arbitrarily changing the state transition function unilaterally for the entire network of light nodes, ensuring that validators don’t unilaterally have governance rights over the protocol rules.
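The attack can be simulated with assumed toy parameters: the producer answers only the first `budget` sampling requests, chosen to stay below the number of shares needed for reconstruction, so the earliest nodes see every sample succeed while later nodes get refusals.

```python
def run_attack(node_requests: list[int], budget: int) -> list[bool]:
    # Returns, per light node (in arrival order), whether all of its
    # requests were answered, i.e. whether it wrongly concludes the
    # unavailable block is available.
    fooled, answered = [], 0
    for requests in node_requests:
        if answered + requests <= budget:
            answered += requests
            fooled.append(True)   # every sample succeeded
        else:
            fooled.append(False)  # at least one request denied -> reject
    return fooled

# 10 light nodes making 5 requests each; the producer answers at most
# 15 shares, fooling only the first three nodes to arrive.
assert run_attack([5] * 10, budget=15) == [True] * 3 + [False] * 7
```

The attack hinges entirely on the producer knowing which requests belong to which node and in what order they arrived, which is what Level 5 removes.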
Security assumption: honest majority assumption on the committee, with cryptoeconomic incentives or (honest minority of light nodes and synchronous network and the light node is not among the first few to sample).
Level 5: unlinkable data availability sampling
To prevent the “selective share disclosure” attack where the first few light nodes incorrectly believe that an unavailable block is available, we require an enhanced network model where different sample requests cannot be linked to the same light node, and the order in which they are received by the network is uniformly random with respect to other requests. This could be achieved using mixnets and other anonymization technology, but further research is required on this topic. For more information, see this research report on the evaluation of private networks for bypassing selective disclosure attacks in data availability sampling.
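A toy sketch (with assumed parameters) of why unlinkability helps: when individual requests arrive in a uniformly random order with hidden origins, the producer's best strategy is to answer some prefix of requests, and a node is fooled only if all of its requests happen to land inside that answered prefix.

```python
import random

def fooled_fraction(num_nodes: int, reqs_per_node: int, budget: int,
                    rng: random.Random) -> float:
    # One entry per request, labeled by its (hidden) originating node.
    order = [node for node in range(num_nodes) for _ in range(reqs_per_node)]
    rng.shuffle(order)  # mixnet model: random arrival order, unlinkable origins
    answered = order[:budget]
    fooled = sum(1 for node in range(num_nodes)
                 if answered.count(node) == reqs_per_node)
    return fooled / num_nodes

# With linkable requests, a budget of 75 answers could fool 75/5 = 15 of
# 50 nodes (30%); under random ordering, a given node is fooled only with
# probability about (75/250)**5, i.e. well under 1% on average.
frac = fooled_fraction(num_nodes=50, reqs_per_node=5, budget=75,
                       rng=random.Random(1))
assert frac < 0.3
```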
Security assumption: honest majority assumption on the committee, with cryptoeconomic incentives or (honest minority of light nodes and synchronous network).