Separating Retrievability from Availability: Implications for the DA Network

Introduction

This document examines the separation between data availability and data retrievability in Celestia. The distinction between these two concepts has significant implications for network design, resource allocation, and security guarantees. Separating them lets us optimize network performance without weakening the guarantees that sampling actually provides: light nodes can focus on sampling, while retrievability does not have to be handled by the DA network.

Availability vs. Retrievability

Data Availability refers to the property that data has been properly distributed across the network so that it can be reconstructed by any participant if needed. This is a one-time verification event.

Data Retrievability refers to the ongoing ability to access previously distributed data over an extended period. This is a continuous service guarantee.

These properties are fundamentally different:

  • Availability is a one-time event and verifiable through sampling
  • Retrievability is a continuous service guarantee that cannot be cryptographically enforced through sampling alone
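
To make "verifiable through sampling" concrete, here is a minimal sketch of how a single light node's confidence grows with the number of samples. It assumes a 2D Reed-Solomon scheme in which a k x k square is extended to 2k x 2k, so that at least (k + 1)^2 shares must be withheld to block reconstruction, and it models samples as independent and uniform. The function name and parameters are illustrative and not part of any Celestia API.

```python
def min_confidence(k: int, samples: int) -> float:
    """Lower bound on the chance that at least one sample hits a withheld
    share, given that the data cannot be reconstructed.

    Assumptions (not taken from the post):
      * a k x k square is erasure-coded to 2k x 2k shares
      * at least (k + 1)^2 shares must be withheld to block reconstruction
      * samples are uniform and drawn with replacement, which is the
        conservative case compared to sampling without replacement
    """
    if k < 1 or samples < 0:
        raise ValueError("k must be >= 1 and samples must be >= 0")
    total_shares = (2 * k) ** 2
    min_withheld = (k + 1) ** 2
    # Probability that one sample lands on a share that is still served.
    p_served = 1 - min_withheld / total_shares
    # All samples succeed with probability p_served ** samples.
    return 1 - p_served ** samples
```

Under these assumptions, min_confidence(128, 16) comes out at about 0.99. The bound says nothing about whether the data is still there tomorrow, which is exactly the gap between availability and retrievability.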

Retrievability Limitations

It is important to acknowledge that without specialized protocols (such as Proof-of-Spacetime), sampling-based networks cannot cryptographically guarantee continuous retrievability. Sampling only shows that the data was available at the moment it was sampled. It cannot attribute fault if a node deletes the data prematurely after most light nodes have already agreed that it is available.

Any claims about long-term retrievability beyond what sampling can verify amount to a “pinky promise” rather than a cryptographic guarantee, which is totally fine as long as you are aware of it.

Sample Storage

Current Inefficiencies

The current approach often conflates these properties, leading to inefficient resource allocation. Light nodes (LNs) keep samples well beyond what is necessary for availability verification in an attempt to ensure retrievability, a task for which samples and the sampling process are not well suited. Currently, the pruning time of samples is coupled to how long full nodes promise to keep data retrievable.

Protocol change example

  1. Light nodes perform sampling to verify data availability
  2. After verification, light nodes propagate samples to those connected full nodes (FNs) that request them for reconstruction
    2.1 For clarity, “full node” here refers to a generic node that is connected to the DA network and can perform reconstruction.
  3. Once propagation is complete, light nodes can safely and immediately delete samples
    3.1 In the absence of reconstruction, samples can be deleted immediately after sampling is complete.

The protocol is complete once every connected FN either:

  • confirm receipt of samples
  • indicate they do not need the samples
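
The completion rule above can be sketched as a small tracker that a light node keeps per block. This is only an illustration: the class, enum, and method names are hypothetical and do not correspond to an existing celestia-node API.

```python
from dataclasses import dataclass, field
from enum import Enum


class Reply(Enum):
    PENDING = "pending"        # the FN has not answered yet
    RECEIVED = "received"      # the FN confirmed receipt of the samples
    NOT_NEEDED = "not_needed"  # the FN said it does not need the samples


@dataclass
class SampleHandoff:
    """Tracks, for one block, what each connected full node has replied."""

    replies: dict[str, Reply] = field(default_factory=dict)

    def add_peer(self, peer_id: str) -> None:
        # A newly connected FN starts out as pending.
        self.replies.setdefault(peer_id, Reply.PENDING)

    def record(self, peer_id: str, reply: Reply) -> None:
        if peer_id not in self.replies:
            raise KeyError(f"unknown peer: {peer_id}")
        self.replies[peer_id] = reply

    def can_delete_samples(self) -> bool:
        # Complete once no FN is still pending. With no connected FNs this
        # is trivially True; a real node would likely handle that case
        # separately, since the design assumes at least one honest FN.
        return all(r is not Reply.PENDING for r in self.replies.values())
```

Note that the check depends only on replies, not on whether any full node has finished reconstruction.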

This interaction can occur before full reconstruction is complete. One additional change that still needs to be discussed is whether full nodes that fail to respond appropriately to sample sharing requests should receive negative peer scores, potentially leading to connection timeouts, or whether light nodes can simply delete samples after a timeout even if full nodes do not respond.
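
The two options for unresponsive full nodes are not mutually exclusive. The sketch below shows one hypothetical way to express both as policy knobs: a timeout after which silent peers stop blocking deletion, and an optional peer-score penalty. The function name, the score representation, and the default values are assumptions for illustration only, not a proposal for concrete parameters.

```python
def handle_silent_peers(
    pending_since: dict[str, float],
    now: float,
    timeout_s: float,
    penalty: int = 0,
) -> tuple[bool, dict[str, int]]:
    """Decide what to do about FNs that have not answered a sample offer.

    pending_since maps a peer id to the time (in seconds) the offer was sent.
    Returns (may_delete, score_deltas):
      * may_delete is True when every still-pending peer has timed out
      * score_deltas holds a negative score change per timed-out peer,
        or is empty when penalty is 0 (timeout-only policy)
    """
    if timeout_s <= 0:
        raise ValueError("timeout_s must be positive")
    timed_out = sorted(
        peer for peer, sent_at in pending_since.items()
        if now - sent_at >= timeout_s
    )
    score_deltas = {peer: -penalty for peer in timed_out} if penalty > 0 else {}
    may_delete = len(timed_out) == len(pending_since)
    return may_delete, score_deltas
```

Setting penalty to 0 gives the pure timeout variant; a positive penalty adds the peer-scoring variant on top of it.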

This approach builds on the security assumption that light nodes maintain connections to at least one honest full node, as we assume no eclipse attacks.

If the one honest full node needs the samples, it will get them. If malicious full nodes request samples they do not need, that is also fine, because serving samples on request is the intended behaviour.

Whether full nodes pull from light nodes or light nodes ask full nodes is an implementation detail that can be discussed here independently.

Extension to prune headers early

Light nodes can now prune samples much earlier than previously thought. Another observation is that most users do not need to keep the header in storage. The header is used to verify samples and proofs of inclusion against the data and state root. If you are a user (light node) who does not consume blobs, you can safely delete the header after verifying and deleting the samples. This should also be a simple change, and a good option for light nodes running in a wallet or browser.
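
As a minimal sketch, the header pruning rule described here reduces to a single predicate over per-block state. The type and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BlockState:
    samples_verified: bool  # sampling for this block succeeded
    samples_deleted: bool   # samples were handed off (if requested) and removed
    consumes_blobs: bool    # the user reads blobs or inclusion proofs for this block


def can_prune_header(state: BlockState) -> bool:
    # The header is only needed to check samples and inclusion proofs.
    # Once the samples are verified and gone, and the user does not
    # consume blobs, nothing is left to verify against it.
    return (
        state.samples_verified
        and state.samples_deleted
        and not state.consumes_blobs
    )
```

A wallet or browser light node that never reads blobs would set consumes_blobs to False and could drop each header as soon as its samples are gone.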

Acknowledgments

Thank you, @walldiss and @adlerjohn, for reviewing, and thank you to many others for discussions and comments that led to this forum post.
