Woods Attack on Celestia

nusret1996 · 7 September 2021 21:23

This post was co-authored with Nima Vaziri (@NimaVaziri). Special thanks to John Adler (@adlerjohn), Mustafa Al-Bassam (@musalbas), Ismail Khoffi (@ismail) and other members of the Celestia team for their reviews.

HackMD mirror: Woods Attack on Celestia - HackMD

Introduction

This document describes a denial-of-service (DoS) attack that targets rollup nodes trying to sync with the latest state of a rollup on Celestia. While bootstrapping, these nodes download the headers of all past rollup blocks to determine the canonical tip of the rollup chain. Thus, the time required for the completion of bootstrapping depends on the number of rollup blocks previously published on the parent chain. In this context, attackers could delay bootstrapping considerably by publishing many invalid rollup blocks on the parent chain and forcing syncing nodes to download all of their headers.

Rollups on Celestia is prerequisite for a better understanding of this article which assumes a basic knowledge of the rollups and namespaces.

Section Background explains how rollup-specific messages are posted on the parent chain in Celestia. Section Attack Description gives a detailed description of how the attacks exploit the mechanism explained in Section Background to spam the parent chain with invalid rollup blocks. Sections Costs and Mitigation focus on the attacker’s costs and the mitigations strategies for the attack respectively.

Background

In Celestia, the parent chain does not host any contract that interprets or processes rollup-specific messages. It is used by the rollup nodes solely as an ordering and data availability service for their messages. Consensus nodes select messages sent by rollup nodes using a fee-market and accept any message that is paid for. As they do not validate the states within rollup blocks, consensus nodes cannot distinguish or censor invalid messages submitted to the parent chain.

parent-chain-block
Parent chain blocks consist of multiple shares each associated with a namespace ID (NID) and arranged in a square. Colored rectangles represent the messages published for the namespace IDs of the shares. In the given example, shares 5 to 16 are spammed by invalid messages (shown in red) posted for the namespace ID (0xFF) of a rollup under attack.

Each rollup is associated with a unique namespace ID which signifies to the consensus nodes the location of the rollup-specific messages within a parent chain block. These messages are structured to first list the contents of the rollup block headers, e.g state roots, and then the transactions contained in these block. To download the past rollup blocks included in a parent chain block B, rollup full nodes would first ask for the messages within B with their rollup’s namespace ID from the consensus nodes. Similarly, rollup light clients would ask for the first few bytes of the same messages corresponding to the rollup block headers. Upon receiving the respective messages from the consensus nodes, rollup nodes can verify that they were returned all messages with the queried namespace ID using the Namespaced Merkle Trees.

To bootstrap with their rollup, new full nodes retrieve headers and transactions contained within all past rollup blocks published on the parent chain. Thus, they can process all of the older transactions, verify the validity of the blocks and determine the canonical rollup chain and the latest application state. On the other hand, new light clients download only the headers of past rollup blocks published on the parent chain. Using past validity or fraud proofs generated for these blocks, they can detect the invalid ones and determine the canonical rollup chain consisting of the valid blocks.

Given the explanations above, there are multiple stakeholders that can potentially be affected by the attack:

Consensus (parent chain) nodes
Rollup full nodes under attack
Rollup light clients under attack
Attacker: Any entity that can post rollup blocks to the parent chain can be an attacker, such as malicious full nodes in a rollup.

Attack Description

In the descriptions below, we assume that the attackers target the rollup full nodes and light clients belonging to a certain rollup with a known namespace ID.

Attack on Rollup Light Clients

In this version of the attack, attacker pays consensus nodes to spam each parent chain block with headers of invalid rollup blocks. Suppose there are k blocks on the parent chain and each such block consists of n fixed sized shares that can be assigned a namespace. Then, the attacker publishes the header of an invalid rollup block at each share and sets their namespace IDs to be same as the rollup. Hence, when a new light client attempts to bootstrap, it will receive rollup block headers from all of the shares within each parent chain block, and will have to check the validity of the state roots within all of them using validity or fraud proofs. Assuming that a single state root can be checked in constant time, identifying all of the invalid rollup blocks and finding the canonical rollup chain will take time on the order of k \times n, i.e the total size of the parent chain at the given moment. This is equivalent to having rollup light clients download all of the data within the parent chain.

Attack on Rollup Full Nodes

In this version, attacker pays consensus nodes to spam each parent chain block with not only the headers of invalid rollup blocks but also their transactions. For this purpose, it reserves all of the n shares in each parent chain block for the namespace ID of the rollup and fill them with headers and transactions of invalid rollup blocks. Thus, even though a syncing rollup full node will download and process O(k \times n) transactions, equivalent to the total size of the parent chain, the number of valid rollup blocks verified by the node will be a much smaller fraction of the downloaded blocks.

Costs

When rollup light clients are under attack, they have to download O(n) messages from each parent chain block, where n is the total number of shares per block. Considering that a light client for the parent chain downloads only O(\sqrt{n}) messages per block for data availability checks, downloading O(n) messages is a large fixed cost for light clients syncing with a rollup. On the other hand, even though rollup full nodes can be expected to process as many as O(n) rollup-specific transactions per parent chain block, only a small fraction will end up in the canonical rollup chain under attack.

For a numerical estimate, if (i) there are n=1024 shares that can be dedicated to a given namespace in a Tendermint block, (ii) each share costs \$9 in transaction fees (\$9 was calculated to be on par with Ethereum transaction fees), and (iii) there is one block per minute, a rough estimate for the attacker’s cost would be 1024∗9∗(30∗24∗60)\approx400 million USD for an attack period of four weeks during which there will be k\approx43,000 parent chain blocks. This is not only a reasonable cost for a billion dollar adversary, e.g, a nation state, but also a serious DoS vector for rollup light clients as they will have to download the entirety of 43,000 parent chain blocks spammed with invalid rollup-specific messages.

To see the effect of the attack above on the bootstrapping time, suppose one year worth of Celestia blocks can be downloaded by light clients in a single day when they receive messages from only O(\sqrt{n}) of the shares per block. Then, downloading the entirety of 43,000 blocks spammed by the attacker would require approximately 3 extra days on top of the original 1 day, enough to dissuade most light clients from ever following the rollup under attack!

Mitigation

One way bootstrapping rollup nodes can mitigate the attack is by downloading rollup block headers from full nodes via the rollup’s peer-to-peer network. This way, they can avoid scanning the parent chain and downloading messages from every share. However, this solution cannot be implemented by rollup light clients that rely on the parent chain to receive rollup-specific messages. Moreover, it might facilitate other DoS attacks on these syncing light clients by malicious rollup full nodes. (These can potentially be avoided using privacy preserving spam protection methods on the network level.)

Another mitigation strategy is to rate limit the number of shares that can be reserved for a single namespace within each parent chain block. For instance, when the total number of shares per parent chain block is n, rollup-specific messages with a single namespace ID can be restricted to occupy at most \sqrt{n} shares per block. In this case, even a light client under attack would have to retrieve messages from at most O(\sqrt{n}) shares per parent chain block. As light clients already download O(\sqrt{n}) messages per parent chain block for data availability checks under normal circumstances, this restriction removes the DoS vector caused by the attacks as compared to the normal operation of the rollup.

The proposed restriction on the number of shares per NID can be implemented in multiple ways. For instance, the price of reserving a new share for a given NID can be made exponentially large as the number of shares with that NID grows. It can also be enforced directly by rollup nodes that would refuse to download messages from more than \sqrt{n} shares per parent chain block.

Rollups that are willing to pay for more space can always use multiple namespace IDs for their messages to circumvent the constraint on the number of shares per namespace. However, this comes with a greater DoS risk for the syncing nodes, implying a tradeoff with the benefits of more data storage.

When calculating the attacker’s costs in Section Costs, we ignored the fact that the attacker’s funds could be slashed by the rollup nodes for publishing invalid blocks on the parent chain. However, as the balances of rollup nodes are stored as part of the application state in Celestia, attacker can be penalized only if it has already locked funds within the state. If this is not the case, rollup nodes can always reject blocks by the attacker; however, this does not change the fact that rollup nodes would have to download these blocks first (at least the signatures to verify the identity of the staked rollup node). Thus, financial penalties do not necessarily provide a mitigation for the syncing rollup nodes in the context of the attacks above.

nick · 15 September 2021 10:48

Why I think trusting that rollup full nodes will give light clients valid block headers is the best solution

Reason 1: PoW and PoS light clients also rely on the p2p network to sync

In theory this same spam attack could be performed in a proof of work or proof of stake system as follows.

In PoW I could publish a bunch of blocks that appear to have the right amount of work performed but the hash doesn’t check out. In PoS I could publish a bunch of blocks where the signatures don’t check out. In both cases the light client would none the less have to download each header and verify that the hash or signatures don’t match before discarding the block header as invalid and knowing which is the canonical chain.

The reason this attack doesn’t work in practice is that in PoS and PoW the nodes verify the signatures and hashes before propagating them. Therefore these invalid blocks never get distributed throughout the network and are “forgotten”.

We don’t have this luxury in Celestia because these invalid blocks get posted to the main-chain regardless of their validity and will be “remembered” by the main network. Hence why we need new mechanisms to combat this attack.

Note however, that these invalid blocks are “forgotten” by the rollup peer to peer network in the same way that the PoW and PoS peer to peer networks forget invalid blocks by not propagating them.

Trusting the rollup sub-network full nodes to only give you the valid block headers while syncing your light client seems equivalent in that you are assuming the sub-network only “remembers” valid blocks. There may be some formal equivalence there which implies that the trust assumptions in the rollup context are not any worse than in standard PoW and PoS light clients.

Reason 2: You probably need to get historical fraud proofs from an honest node anyway

Syncing a new light client not only requires downloading all the block headers but also downloading and verifying all the historical fraud proofs to be able to tell which headers are valid and invalid. This brings up an important question. Where does the light client gather historical fraud proofs?

I can imagine 2 scenarios:

find the historical fraud proofs on the Celestia main chain
trust honest full nodes to share the historical fraud proofs as they sync

For 1 to work the rollup aggregator would have to post a fraud proof on the Celestia main-chain for each and every invalid block at the previous height. This only worsens the effects of the spam attack, forcing more and more data to be posted on chain.

In the case of 2, you are essentially trusting that there’s an honest full node who will give you all the fraud proofs you need to tell which chain is valid. This is the same assumption you would be making if you just downloaded the block headers directly from a full node.

Hence 1 is impractical and 2 requires the same trust assumption as just downloading the block headers directly from an honest full node. Therefore you may as well just trust that there is an honest full node who will give you all the valid block headers and bypass this whole process.

Reason 3: It would be easy to recover from syncing with a dishonest full node

There is still the danger that while bootstrapping, you peered with a dishonest full node who tells you the wrong chain history. How do we make sure if this happens, the light client can easily recover?

So long as the light client is connected to the wider p2p network i.e. is not being eclipse attacked, then it should receive messages showing that the rest of the network is building on a different chain. This could alert the light client to connect to other full nodes to check if they are on the wrong chain. A single fraud proof from an honest full node would be enough to discard the false chain and re-sync to the right chain.

I believe this vulnerability to eclipse attacks is the same in light clients of PoW and PoS chains. As @musalbas likes to point out, all blockchains need to assume that you can connect to at least 1 honest node as a peer for anything to work.

nusret1996 · 15 September 2021 19:07

I agree that receiving valid block headers from rollup full nodes could provide a sound protection against the Woods Attack for rollup light clients connected to the p2p network of the rollup. However, we should also consider light clients that receive rollup-specific messages purely over the Celestia parent chain. Consideration for such light clients has in the past guided discussions around publishing fraud proofs on the parent chain besides propagating them over the p2p network. It is possible that applications on other rollups could prefer to trust the parent chain or IBC as relay mechanisms over the p2p network of the rollup towards which they act as light clients.

Although I agree that light clients can recover from syncing with a dishonest full node, in the lack of other spam protection mechanism on the network level, they could still be spammed by the adversary. After all, pricing on the parent chain stands as a spam protection mechanism, yet, the adversary was willing to spend 100 million USD to stage the Woods Attack. In this context, starting many new ‘malicious’ full nodes and spamming the syncing light client with invalid block headers might not only be doable but also easy for such an adversary. Thankfully, mechanisms like Rate Limiting Nullifiers could help with avoiding malicious nodes sending many messages and p2p networks could be enhanced to rate-limit the number of new joining nodes via mechanisms like staking, cryptographic puzzles etc.

nick · 16 September 2021 08:45

That’s very true. In some cases for interoperability people may not want to use a relayer. In those cases you can’t rely on an honest node.

But is the tradeoff of potentially opening yourself up to spamming attacks better than not having to rely on a relayer? Interesting questions to consider.

musalbas · 16 September 2021 08:58

In the case of an IBC-like relayer, they could only relay the valid block headers. If they relay invalid block headers, it could be ignored by the bridged chain, or they could be penalized.