Parameter Change Proposal: Increase SignedBlocksWindow to 20000 for Celestia Mainnet

Details

As of the Celestia v3 upgrade (block 2993219), the theoretical block time was reduced from 12 seconds to 6 seconds. However, in practice, average block times have decreased from ~10s to ~5s. Accordingly, the SignedBlocksWindow was doubled from 5000 to 10000 to maintain the same ~3h30min effective slashing window.

{
  "SignedBlocksWindow": 10000,
  "MinSignedPerWindow": 0.75
}

With these parameters, a validator is jailed if they miss 2,500 consecutive blocks (25% of 10,000), which—at 5 seconds per block—equates to roughly 3 hours and 30 minutes.
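For clarity, the threshold above can be reproduced with a few lines of arithmetic. This is only a sanity check, assuming the ~5s average block time observed since v3:

```python
# Sanity check of the jailing arithmetic under the current parameters.
# Assumes the ~5s average block time cited above (an observed figure, not a constant).
signed_blocks_window = 10_000
min_signed_per_window = 0.75
block_time_s = 5

# A validator is jailed after missing 25% of the window.
missed_blocks_to_jail = int(signed_blocks_window * (1 - min_signed_per_window))
window_hours = missed_blocks_to_jail * block_time_s / 3600

print(missed_blocks_to_jail)   # 2500
print(round(window_hours, 2))  # 3.47
```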

While this may seem sufficient on paper, our recent experience shows that it is not enough to realistically respond to real-world infrastructure incidents—especially those that occur during nighttime hours.



Case Study: node101 Incident – What We Experienced

On December 29, 2024 at 03:58:18 UTC (block 3268199), our validator was jailed due to a server failure. The machine we had rented suddenly entered read-only mode, disabling the validator. Despite having a fully functional monitoring setup, no alert could resolve the underlying hardware issue fast enough—especially since it occurred during nighttime in our region (UTC+3, Turkey).

We immediately changed providers and rebuilt our infrastructure, but the damage was done. A one-time, unpredictable server failure resulted in a jailing penalty that undermined months of continuous, high-quality validator operation.


We’re Not Alone

After discussing with other validators, we discovered that many were jailed around the same time, and surely there are others whose experiences we don't yet know about. These were not cases of negligence, but rather unavoidable technical interruptions, happening during a slashing window too short to reasonably recover from.


Why This Matters

We fully recognize that being jailed is not a light matter—it directly impacts network liveness and the staking rewards of delegators who trust us with their assets. At node101, we treat this responsibility as paramount and always aim to operate with the highest standards of reliability and responsiveness.

However, real-world infrastructure issues can still arise. Hardware failures, kernel panics, or sudden provider-side restrictions are situations no validator is immune to, no matter how robust their monitoring systems are.

The current 3.5-hour jailing window leaves an extremely narrow margin for recovery from such incidents. It’s not about time zones or sleep schedules—at any given moment, somewhere in the world it’s night, and validators operate across all continents. Even the most committed teams can’t ensure human intervention 24/7 without unsustainable operational overhead.

We believe the slashing window should reflect both technical realities and the effort validators already put into proactive monitoring.


Proposal

We propose to double the SignedBlocksWindow from 10000 to 20000. This change increases the effective jailing threshold from ~3h30min to ~7h, calculated as:

20000 × (1 - 0.75) = 5000 blocks
5000 blocks × 5s = 25,000 seconds = ~6.94 hours

{
  "SignedBlocksWindow": 20000,
  "MinSignedPerWindow": 0.75
}
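The same arithmetic under the proposed parameters, again assuming ~5s blocks:

```python
# Jailing threshold under the proposed window of 20000 blocks.
# Assumes the ~5s average block time cited in the proposal.
signed_blocks_window = 20_000
min_signed_per_window = 0.75
block_time_s = 5

missed_blocks_to_jail = int(signed_blocks_window * (1 - min_signed_per_window))
window_hours = missed_blocks_to_jail * block_time_s / 3600

print(missed_blocks_to_jail)   # 5000
print(round(window_hours, 2))  # 6.94
```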

Uptime Perspective

Let’s put this into an annual reliability context:

  • Total hours in a year:
    365 days × 24 hours = 8,760 hours
  • If a validator is down once for 7 hours:
    8,760 - 7 = 8,753 hours uptime
    (8,753 ÷ 8,760) × 100 ≈ 99.92% uptime
  • If down twice for 7 hours (14 hours total):
    8,760 - 14 = 8,746 hours uptime
    (8,746 ÷ 8,760) × 100 ≈ 99.84% uptime
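The annual-uptime figures above can be verified with a short script:

```python
# Annual uptime if a validator is down for one or two 7-hour incidents.
hours_per_year = 365 * 24  # 8760

def uptime_pct(downtime_hours: float) -> float:
    """Percentage of the year the validator was up, rounded to 2 decimals."""
    return round((hours_per_year - downtime_hours) / hours_per_year * 100, 2)

print(uptime_pct(7))   # 99.92
print(uptime_pct(14))  # 99.84
```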

These are extremely high availability levels—comparable to enterprise-grade SLAs—yet still susceptible to jailing under the current 3.5-hour window. That’s the imbalance we aim to address.

This proposal doesn’t weaken slashing—it preserves validator accountability while recognizing real-world ops. It’s a step toward greater fairness and resilience in a globally decentralized network.


Voting

  • ✅ YES — Increase the SignedBlocksWindow to 20000 blocks (~7h window)
  • ❌ NO — Keep the current SignedBlocksWindow at 10000 blocks (~3h30min window)

Authors


I agree with this sentiment to increase the window to ~7hrs.

Achieving stable four nines of uptime throughout the year is already a beast of its own.


First, I want to express my gratitude to both the Node101 and P-OPS teams for putting together such a thorough and well-detailed proposal.

Qubelabs fully supports this proposal, as the 3.5-hour jailing window is too restrictive. Despite 24/7 monitoring, the limited time available for intervention is unsustainable in the long term.


Thanks for making this.

I politely disagree.

Jailing doesn’t have severe consequences for the delegators. There’s no slashing of funds, instead only a small amount of inflation is missed while down, which I think is fair as the validator shouldn’t be getting rewarded for not being up.

However, keeping a validator in the set while it’s down does have meaningful consequences for the rest of the network. Each proposal slot it misses is an empty slot, decreasing throughput and increasing block times. Users suffer while the validator gets a small amount of inflation. I don’t think that’s fair.

It’s okay to get jailed. It’s okay for validators with higher uptimes to get slightly more inflation. In my opinion, we should actually lower the number of missed blocks. Good validators will remain up a higher percentage of the time, and thus will have higher APRs. That makes sense to me.

Edit/TL;DR: the goal shouldn’t be to never get jailed. The goal should be to fairly reward good validators and keep the network healthy.


Thanks for your thoughtful response!

I actually agree with much of what you said—validators with higher uptime should earn more inflationary rewards, and jailing, in itself, is a fair consequence for downtime.

That said, the concern we’re raising is a bit more nuanced: jailing has external consequences beyond inflation rewards—notably exclusion from the delegation program even for validators who have been consistently contributing to the network and community.

In that sense, it’s not just about missed inflation or uptime-based ranking anymore. One rare, unlucky event—especially during off-hours—can erase months of good performance and community engagement.

So while we support rewarding performance, we think the current 3.5-hour window is too unforgiving in edge cases. Doubling it to ~7 hours still keeps standards high but introduces a bit more tolerance for rare, unpredictable failures.


notably exclusion from the delegation program even for validators who have been consistently contributing to the network and community

That’s a great point! I think we found the root of the problem, then, and should address that instead of changing protocol parameters in an attempt to correct or push back against that off-chain policy.


You’re right—ideally, off-chain delegation policies should be designed with more nuance, so a single jailing event doesn’t negate a validator’s long-term contributions. But until that changes, protocol-level parameters remain the only lever we have to protect against disproportionate outcomes from rare events.

This proposal isn’t trying to “fix” delegation policies via chain config—but rather to give all validators (especially smaller teams) a more reasonable buffer to recover from incidents that are genuinely out of their control. It doesn’t reduce jailing severity or weaken uptime expectations—it just softens the cliff that currently exists.

Still, I completely agree we should also be having this conversation with the delegation program teams. It’s a two-sided issue—and both sides probably need attention.

There are of course ways to be less dependent on a specific server. Nevertheless, less than 4h is short in terms of alerting and responding. In normal business settings, alerting, notifying, and resolving a P1 is budgeted at a total of 4h during business hours; outside business hours this normally scales to 6 or 8h.

The impact here is indeed harsh if a validator maintains uptime for a long period and then, due to an infrastructure failure, a jail event occurs and they lose delegation. For that reason I am actually up for both: a more reasonable time to get things fixed, and rethinking jailing in the program.

Nevertheless, less than 4h is short in terms of alerting and responding.

I don’t think the goal should be to never get jailed. If that is the goal, why not increase the downtime to a week or a month? The goal should be to fairly reward validators and keep the network healthy. This means short jail times and a fixed off-chain policy.

The impact here is indeed harsh if a validator maintains uptime for a long period and then, due to an infrastructure failure, a jail event occurs and they lose delegation.

IMO, we should change the off-chain policy for delegations. For other delegations, all validators get jailed for the same amount of downtime, so more jailings will occur as the window shortens. This is actually a good thing! Most people don’t undelegate if someone gets jailed a few times for a short period because it doesn’t have an effect on APR. If we change the off-chain policy to not be ridiculous, then we solve the problem at the root. IMO, that’s what we should push for.


I agree a week or month would be too much—but as Bart pointed out, 7 hours matches normal ops standards, especially outside business hours. It still requires ~99.9% uptime annually, so it’s far from lenient.

I also don’t think jailing should be seen as the default outcome. Validators build up long-term track records, and one rare infra failure can result in losing delegation—not just from Celestia, but other chains too. That’s a big hit for something out of their control.

Also, with a 7-hour window, the validator actually loses more APR before jailing, not less. So, it’s not about avoiding consequences—it’s about making them proportional.


I agree with the proposal here.

If we target a shorter SignedBlocksWindow for network quality and consistent block times, this might lead to geographical centralization. In our experience, nodes in Asia sometimes struggle to find good peers, while cloud providers in Europe offer much better service. To avoid getting jailed as often, validators will be incentivized to choose nodes in the EU.


Just to share some neutral perspective based on a recent incident we experienced:

Our validator was affected by a server hardware issue that triggered alerts overnight. It took around 4–5 hours for the datacenter to diagnose the problem, migrate to a new machine, and replace one of the faulty drives.

After recovery, we ran into the known P2P peering issues that others have reported and are currently experiencing. Despite the correct configs, syncing took additional time, involving trial and error with persistent_peers, unconditional_peer_ids, seeds, and other settings. What ultimately helped was relying on our own RPC/relayer/snapshot nodes and making sure our validator was added as an unconditional_peer on those nodes.
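For reference, the settings mentioned here live in the [p2p] section of CometBFT’s config.toml. This is only a sketch with placeholder node IDs and hosts, not our actual configuration:

```toml
[p2p]
# Seed nodes used for initial peer discovery (id@host:port).
seeds = "<seed-id>@<seed-host>:26656"

# Peers the node keeps trying to stay connected to (id@host:port).
persistent_peers = "<node-id-1>@<host-1>:26656,<node-id-2>@<host-2>:26656"

# Node IDs (without addresses) exempt from connection limits.
# Adding the validator's ID here on our own RPC/snapshot nodes is what
# ultimately helped in our case.
unconditional_peer_ids = "<validator-node-id>"
```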

In total, the recovery took about 6–8 hours from failure to full sync. While we responded quickly, this case shows how infrastructure and networking issues combined can stretch beyond the current jail window, even with active maintenance.

Not taking a side here, but just adding context that supports concerns on both ends - the importance of maintaining network liveness and accountability of staying online/signing blocks, while also recognizing the need for some flexibility when operators face unavoidable downtime beyond their control.


We’re working on blocksync (which is a peering issue under the hood) before increasing block size further, so it’s a very high priority. We have a good plan to fix this soon: see issue #1727 in celestiaorg/celestia-core on GitHub, “Fix getting to the tip of large chains quickly”.

It does feel unfair to get jailed when you responded in time. I don’t think it’s fair to expect insane multi-node setups to avoid downtime either.

I still feel like the point of Jailing isn’t clear, or like I’m going crazy and explaining it wrong. It’s to keep the network healthy and reward validators who can keep their node up. Jailing isn’t bad. Being jailed for less than 24 hours should have essentially 0 consequences.

I will find whoever is in charge of the delegation program and ask them to fix it, but increasing the parameter is not addressing the root of the problem. Picking parameters to address symptoms is a terrible practice. We do our best to avoid that for as long as we can.

Increasing this parameter is bad for rollups because it directly increases the number of missed slots, decreasing throughput and increasing finality time. If it’s bad for rollups, it’s bad for the network.

Please wait to vote on this parameter until the peering issues (and therefore blocksync and statesync) are fixed.
