Frame by Frame: Unpacking Celestia’s Windows and Parameters

Introduction

The design of Celestia’s data availability network relies upon a series of interdependent parameters—time windows—that govern security, liveness, and resource utilization. In particular, the concepts of weak subjectivity, unbonding period, pruning window, and sampling window play pivotal roles in ensuring that light nodes, full nodes, and rollups can securely participate, verify, and recover from adversarial conditions.
This post examines each window, shows how the windows relate to one another, and derives practical parameters from those relationships.

\textbf{Definitions}\\ WSP := \text{Weak Subjectivity Period}\\ UBP := \text{Unbonding Period}\\ PW := \text{Pruning Window}\\ SW := \text{Sampling Window}\\ SSW := \text{Social Slashing Window}\\ FPW := \text{Fraud Proof Window}

What Is Weak Subjectivity?

The weak subjectivity period denotes the maximum duration for which a node may safely remain offline and later rejoin the network without requiring an external trust assumption (such as a trusted checkpoint). It is also called the trusting period, which is currently set to 14 days.
In proof-of-stake contexts, prolonged offline periods expose nodes to the risk of misidentifying the canonical chain—an opening for long-range attacks.

The period should be long enough to accommodate ordinary downtime and routine maintenance. If it is too short, then a node would always have to rely on a trusted checkpoint if it cannot stay online continuously. If it is longer than the unbonding period, validators can exit before they are slashed, as we will see in the next section.

Unbonding Period Must Exceed The Weak Subjectivity

The unbonding period specifies how long validators must wait after initiating a withdrawal before their staked tokens become liquid. It provides a buffer to detect and punish malicious behavior, and it ensures that any stake used to compromise consensus can be slashed if necessary. Otherwise, validators could misbehave and withdraw their stake before being punished.

If the unbonding period were shorter than the weak subjectivity period, a node that is offline for longer than the unbonding period but less than the weak subjectivity period could rejoin without knowing that certain validators had unbonded (and potentially misbehaved) in its absence. This node could then be subjected to a long-range attack, because validators that have already unbonded could sell their private keys and a new (malicious) fork could be created.

Therefore, the unbonding period must be longer than the weak subjectivity period.

(1)\; UBP > WSP

Data Pruning Window Can Be Arbitrarily Large

The pruning window defines how long full nodes retain raw block data (including blobs). Its lower bound is the sampling window: if an honest full node deleted data while a light node still expected it to be sampleable, the light node would raise a false positive and report a data withholding attack that never happened. An additional safety buffer (for example, an hour to a day) is necessary to accommodate the practical syncing time and network delays a light node might incur. The constraint runs in one direction only: the pruning window has to be at least the sampling window, but not the other way around. The pruning window can be much larger and depend on storage costs, incentives, and how long the network wants to promise data retrievability.

(2)\; PW > SW
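The false positive described above can be made concrete with a small sketch. The function names, the one-day buffer, and the example windows are illustrative assumptions, not Celestia parameters or node APIs.

```python
from datetime import timedelta


def is_still_served(block_age: timedelta, pruning_window: timedelta) -> bool:
    """An honest full node serves a block only while it is inside its pruning window."""
    return block_age <= pruning_window


def false_positive_possible(
    sampling_window: timedelta,
    pruning_window: timedelta,
    buffer: timedelta = timedelta(days=1),  # assumed buffer for sync time and network delay
) -> bool:
    """True if a light node may sample a block that honest full nodes already pruned."""
    oldest_sampled_age = sampling_window + buffer
    return not is_still_served(oldest_sampled_age, pruning_window)


# Hypothetical values: a 7-day sampling window against two pruning windows.
print(false_positive_possible(timedelta(days=7), timedelta(days=7)))   # True
print(false_positive_possible(timedelta(days=7), timedelta(days=14)))  # False
```

With the buffer included, a pruning window equal to the sampling window is not enough, which is why the constraint is written as a strict inequality.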

Sampling Window and Weak Subjectivity Are Equivalent

The sampling window denotes how far back a light node must sample the chain to confirm data availability. Intuitively, sampling once at the tip suffices for tip availability, but blocks in between might have been withheld while the light node was offline. The node needs to check that it has a continuous chain of available blocks.

Weak subjectivity already captures the maximum safe offline period, so the sampling window must match it. Sampling less would leave the node blind to blocks that were withheld during its absence. Sampling more can also lead to issues: an honest full node could have already pruned the data, so the light node would falsely conclude that the data has been withheld, and actions taken based on this observation would lead to incorrect outcomes.

Light nodes must sample exactly over the weak subjectivity period—no more, no less.

(3)\; SW = WSP
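As a rough sketch of what this means for a returning light node: it samples from its last trusted point up to the tip, and it falls back to a trusted checkpoint once that point is older than the weak subjectivity period. The function name and signature are hypothetical and do not correspond to any celestia-node API.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def sampling_range(
    last_trusted: datetime,
    now: datetime,
    weak_subjectivity_period: timedelta,
) -> Optional[Tuple[datetime, datetime]]:
    """Return the (start, end) interval a returning light node has to sample.

    Returns None when the last trusted point is older than the weak
    subjectivity period; the node then needs a trusted checkpoint first.
    """
    if now - last_trusted > weak_subjectivity_period:
        return None
    return (last_trusted, now)


now = datetime(2025, 1, 15, tzinfo=timezone.utc)  # arbitrary example date
wsp = timedelta(days=14)                          # current trusting period from the post

print(sampling_range(now - timedelta(days=3), now, wsp))   # interval covering the last 3 days
print(sampling_range(now - timedelta(days=20), now, wsp))  # None
```

The interval never reaches further back than the weak subjectivity period, which is the "no more" half of the constraint; covering every block since the last trusted point is the "no less" half.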

Social Consensus of Slashing Must Wait for Weak Subjectivity Period

When a data withholding attack is detected, Celestia resorts to a social slashing mechanism: affected parties agree off-chain to slash the responsible validators and hard-fork to restore liveness. It will take some time to reach social consensus among stakeholders. The claim is that this fork cannot happen before the weak subjectivity period ends.
Because offline nodes may have missed the initial sampling and detection of the attack, they cannot confidently claim slashing outcomes until they have sampled across the entire weak subjectivity duration. Prematurely enacting slashing on-chain before this interval would risk splitting the network.
Let’s say the validators get slashed too fast. Then, the validators could reveal the samples post-slashing, meaning that new light nodes syncing just before the weak subjectivity period ends would get fooled.
The weak subjectivity period is the lower bound, but there is also an upper bound: the unbonding period. It would be useless to slash a validator after they have already unbonded.

(4)\; WSP < SSW < UBP

Fraud Proof Time of Optimistic Rollups Must Respect Weak Subjectivity

Optimistic rollups on top of Celestia challenge state updates via fraud proofs. The fraud proof window defines how long a challenger may contest an invalid rollup batch. For rollups relying on Celestia’s data layer, the fraud proof window must allow challengers to sample data and verify state transitions within the underlying network’s security.
If the fraud proof window were shorter than Celestia’s weak subjectivity period, a malicious rollup operator could temporarily withhold data, trigger a malicious state transition, and have that transition exit the fraud proof period before light nodes complete their required sampling. Consequently, invalid batches could become irreversible despite ongoing attacks.
Therefore, the fraud proof window for optimistic rollups must be strictly longer than Celestia’s weak subjectivity period, with some additional buffer to create, distribute, and settle the fraud proof.

(5)\; FPW > WSP
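One way to spell out the additional buffer is to split it into its parts. The delta symbols below are assumed names for the time needed to create, distribute, and settle a fraud proof; they are not part of the definitions at the top of the post.

```latex
FPW \ge WSP + \Delta_{\text{create}} + \Delta_{\text{distribute}} + \Delta_{\text{settle}},
\qquad \Delta_{\text{create}},\, \Delta_{\text{distribute}},\, \Delta_{\text{settle}} > 0
\;\Rightarrow\; FPW > WSP
```

For example, with the current 14-day trusting period and a purely hypothetical combined buffer of one day, the fraud proof window would need to be at least 15 days.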

Weak Subjectivity Is Uniform Across Node Types

Celestia’s weak subjectivity period applies universally to light nodes, full nodes, and validator clients. Regardless of role, everyone must share the same offline tolerance.
If light nodes have a shorter weak subjectivity, slashing might happen afterward, and vice versa. For the whole network to follow the same chain, nodes will need the same weak subjectivity period.

(6)\; WSP_{\text{light}} = WSP_{\text{full}}

Conclusion

The protocol has to balance how often nodes have to be online without a trusted checkpoint against the rollups’ desire to keep the weak subjectivity period of Celestia as low as possible and to socially slash as early as possible. As long as we satisfy the constraints described in this post, we can change the weak subjectivity period to a more favourable value, let’s say 1 week. This change could also lower the unbonding period closer to 1 week. Additionally, because we established that the sampling window is equivalent to the weak subjectivity period, light nodes must sample for exactly 1 week. Finally, because the pruning window is decoupled from these parameters, we can set it to 2 weeks, as this seems sufficient and would already be in line with the current weak subjectivity period.

\textbf{Constraint Summary}\\ (1)\; UBP > WSP\\ (2)\; PW > SW\\ (3)\; SW = WSP\\ (4)\; WSP < SSW < UBP\\ (5)\; FPW > WSP\\ (6)\; WSP_{\text{light}} = WSP_{\text{full}}
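The summary translates directly into a small checker. This is a minimal sketch: the field names mirror the abbreviations from the definitions, all values are in days, and the UBP, SSW, and FPW numbers in the example are hypothetical placeholders, since the post only fixes WSP, SW, and PW.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Windows:
    """All windows in days."""
    wsp_light: float
    wsp_full: float
    ubp: float
    pw: float
    sw: float
    ssw: float
    fpw: float


def violated_constraints(w: Windows) -> List[str]:
    """Return the labels of the constraints (1)-(6) that the given windows break."""
    wsp = w.wsp_light
    checks = {
        "(1) UBP > WSP": w.ubp > wsp,
        "(2) PW > SW": w.pw > w.sw,
        "(3) SW = WSP": w.sw == wsp,
        "(4) WSP < SSW < UBP": wsp < w.ssw < w.ubp,
        "(5) FPW > WSP": w.fpw > wsp,
        "(6) WSP_light = WSP_full": w.wsp_light == w.wsp_full,
    }
    return [label for label, ok in checks.items() if not ok]


# WSP = SW = 7 and PW = 14 follow the conclusion; UBP, SSW, FPW are assumed values.
proposal = Windows(wsp_light=7, wsp_full=7, ubp=8, pw=14, sw=7, ssw=7.5, fpw=8)
print(violated_constraints(proposal))  # []

# Lowering the unbonding period all the way to 7 days breaks (1) and (4).
too_tight = Windows(wsp_light=7, wsp_full=7, ubp=7, pw=14, sw=7, ssw=7.5, fpw=8)
print(violated_constraints(too_tight))  # ['(1) UBP > WSP', '(4) WSP < SSW < UBP']
```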

Acknowledgments

Thank you, @walldiss and @adlerjohn, for reviewing, and thank you to many others for discussions and comments that led to this forum post.

Why, what if pruning window is larger than sampling window and thus the data wouldn’t have been pruned?

You are right. The conclusion is still correct, but the reasoning was wrong.

First, we must agree that the Social slashing window is very close to the weak subjectivity window. Waiting for the validators’ slashing would not make much sense if the validators are malicious. Maybe it should be the same. I’m interested in hearing your thoughts.

If that is correct and you sample past the social slashing window but before the pruning window, your sampling does not give you any additional guarantees because no action will be taken if sampling fails.

Therefore, you should not sample past a point where it no longer makes a difference.

You could even simplify the point: You don’t have to sample before a trusted checkpoint, and the trusted checkpoint always has to be inside the weak subjectivity window; therefore, you should never sample beyond the weak subjectivity period.

Historically, TrustingPeriod has been a parameter that was subjective to the node. It could take different values so long as it was less than the UnbondingPeriod plus some delta needed to propagate evidence and act on the misbehaviour (i.e. slashing). Here you propose that it in fact needs to be global. Can you describe in more detail why? It may be more helpful to describe a case where the TrustingPeriod differs between two nodes and what that can lead to.

I think that before, we had the following assumption:

Pick any TrustingPeriod < UnbondingPeriod; as long as ≥ ⅔ of the current validator weight is honest, a light client is safe because any equivocation can still be proved on-chain and slashed.

Now that we are talking about a light node that does not want to make a 2/3 honest majority assumption, we need to revisit the above statement. Because data withholding is not an attributable fault, every node has to sample for itself. And I think this is where a network split can occur if these windows diverge.

Scenario:

Below is the timeline with three nodes:

  • Node A: TrustingPeriod = 7 days, offline until Day 8
  • Node B: TrustingPeriod = 14 days, offline until Day 8
  • Node C: TrustingPeriod = 7 days, online the whole time, runs DAS

| Day | Network / Chain Event | Node A – TP 7d (offline) | Node B – TP 14d (offline) | Node C – online + DAS |
|---|---|---|---|---|
| 0 | All nodes trust header H | Stores H | Stores H | Stores H |
| 1 | ⅔ validators sign H + 1 but withhold its block data | (offline) | (offline) | Fails DAS ⇒ flags misbehaviour, halts at H |
| 1-6 | Validators keep building on the withheld fork; community discusses response | | | Stays halted; coordinates with other nodes through social consensus |
| 7 | Social-consensus hard fork & slashing of malicious validators; new canonical chain starts at F₀ | Still offline | Still offline | Upgrades to fork F₀; resumes producing / following blocks |
| 8 | Validators finally release withheld data; offline nodes reconnect | Gap = 8 d > 7 d ⇒ needs fresh checkpoint → seeds F₀, syncs | Gap = 8 d ≤ 14 d ⇒ uses existing header path → syncs, DAS now passes | Fully synced on fork; DAS passes |

In conclusion, Nodes A and C stay on the same hard fork because they have the same trusting period. Node B thinks it is still safe to sync from its existing header and ends up on a different fork. If slashing could happen on-chain with evidence, this coordination would not be needed, but because it is only possible through social consensus (a hard fork), the timing of social consensus has to be agreed on beforehand.
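The divergence between Node A and Node B on Day 8 comes down to a single comparison. Here is a minimal sketch of that decision; the function name and the returned strings are made up for illustration and are not taken from any client implementation.

```python
def action_on_reconnect(offline_days: int, trusting_period_days: int) -> str:
    """Decide what a node does when it comes back online."""
    if offline_days > trusting_period_days:
        # The old header is no longer trusted, so the node asks for a fresh
        # checkpoint, which in the scenario seeds the hard fork F0.
        return "request trusted checkpoint"
    # The old header is still trusted, so the node follows the pre-fork header path.
    return "sync from last trusted header"


trusting_periods = {"A": 7, "B": 14}  # days, as in the scenario
for name, tp in trusting_periods.items():
    print(name, "->", action_on_reconnect(offline_days=8, trusting_period_days=tp))
# A -> request trusted checkpoint
# B -> sync from last trusted header
```

Both nodes were offline for the same 8 days, yet they take different branches purely because their trusting periods differ.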

Now, writing this, I realize that IBC light clients have an honest majority assumption, so it might not be needed or possible to do this hard fork, and they would be rekt anyway. But just as you redeploy a new Blobstream, you should also redeploy a new Celestia IBC light client.

The only way I can imagine doing this is through permissionless (through security council decided) intersubjective forking, where you can deploy a new IBC client and the community decides which fork to follow, which effectively would be slashing.

That fork would have the same amount on the receiving chain as the malicious light client at deployment, but with an upgrade exit window. And this upgrade exit window would be exactly the trusting period, as it would decide for how long you can exit if there is a malicious security council.

In summary, all nodes should have the same trusting period. On-chain light clients, which include IBC, cannot have this feature without intersubjective forking.

My current thinking is that as long as this feature does not exist for IBC, it runs on a higher trust assumption, so the trusting period just needs to be lower than the unbonding period, and nothing has to change as long as we don’t change the unbonding period.

Thanks for the extensive write-up. I think I understand your explanation. I feel that the murky, undefined area is still how the social consensus is conducted: there is no pre-agreed procedure. Some thoughts:

  1. Can rollups have different criteria for migrating than the Celestia network does? For example, some might want to fall back or migrate to another DA layer far earlier than the 7-day window.
  2. Can the network choose to fork earlier than the 7-day period, or does the data need to be withheld for a minimum of 7 days before that counts as evidence?
  3. We need to remember that data withholding is still subjective. Some nodes could receive some portion of the shares. The colluding nodes may want to provide just below the reconstructable threshold of shares to convince some minority that data is available.

We always assume that we don’t have a selective disclosure attack. This might not reflect reality, but all our security models rely on that, so I don’t think that argument holds. For the agreement property to hold, either all nodes know that the data is available or none of them do.

Rollups should not have different criteria, especially if the assets issued on the rollup come from the base chain. If you are a sovereign rollup, you can do whatever you want and socially hard-fork whenever you want. The assets issued natively on this chain would be fine. Assets from Celestia would not. Rollups forking with the base layer is more of a feature than a bug.

You can probably have a fork earlier and go back to an honest majority assumption for rollups, but it would not be the canonical one for light nodes. The problem with forking earlier is that light nodes that have been offline for a longer period are fooled. It’s the same setup as above, but instead you now fork at day 3, which fools Node B when it comes online and the attackers release the data that was withheld before.

That’s why you want the trusting period to be as short as possible: it gives the tightest security and the fastest recovery. The downside is obviously that you have to be online more often.

So there is an argument to go even lower than 7 days.

But given possible infrequent use of the light client - perhaps you only check your balance once a week - lowering the trusting period would lead to more frequent reinitialisation which is also not great as you are more frequently vulnerable to trusting an incorrect or forged header.

I would imagine this would be the most common way an attack would play out if we were ever to experience one. I disagree that our security models should assume that.

I guess even as a sovereign rollup, if many of your assets are held up in bridges you may want to coordinate some form of hard fork to ensure they are not lost.

I get your point that if you fork early, existing light nodes might not be aware unless the operator is notified out of band. In the event it actually happened, I would think the community would lean towards forking earlier rather than waiting the full trusting period, relying on out-of-band consensus to inform light node operators to reinitialize.

“would lead to more frequent reinitialisation which is also not great as you are more frequently vulnerable to trusting an incorrect or forged header.”

You would never get an incorrect or forged header, because it is a trusted checkpoint. If you trust the checkpoint, it is correct by definition. The argument should be “asking for a trusted checkpoint is bad UX, because I don’t want to call my friends every time I want to use my wallet”.

“I would imagine this would be the most common way an attack would play out if we were ever to experience one. I disagree that our security models should assume that.”

The reality is that our security model does assume that and we can only fix that with private sampling. Look at “5.7 Properties Security Analysis Standard Model” [1809.09044] Fraud and Data Availability Proofs: Maximising Light Client Security and Scaling Blockchains with Dishonest Majorities

“Due to the selective share disclosure attack described in Section 5.4, this means that the block producer can violate soundness and agreement of the first c clients that make sample requests, as the block producer can stop releasing shares just before it is about to release the final shares to allow the block to be recoverable”

“I get your point that if you fork early, existing light nodes might not be aware unless the operator is notified out of band. In the event it actually happened, I would think the community would lean to forking earlier than waiting the full trusted period and relying on out of band consensus to inform light node operators to reinitialize”

That is the same as lowering the trusting period. If the reality is different, then we should lower the trusting period to match that reality.

Asking a friend out of band if there was a fork is isomorphic to asking them for a trusted checkpoint.