Informational CIP: Rename `data availability` to `data publication`

TLDR

The term data availability is often considered confusing by community members:

  • People who are new to modular networks may confuse it with “data storage”.
  • People who are familiar with the term may still find it confusing (1, 2).
  • Even Celestia finds it difficult to explain.

As originally publicly proposed by musalbas, favored by community members, and used by ecosystem projects, this CIP suggests renaming data availability to data publication.

Below is the content of the CIP itself for further discussion.

Abstract

The term data availability isn’t as straightforward as it should be and could lead to misunderstandings within the community. To address this, this CIP proposes replacing data availability with data publication.

Motivation

The term data availability has caused confusion within the community due to its lack of intuitive clarity. For instance, in Celestia’s Glossary, there isn’t a clear definition of data availability; instead, it states that data availability addresses the question of whether this data has been published. Additionally, numerous community members have misinterpreted data availability as meaning data storage.

Specification

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “NOT RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119 and RFC 8174.

The term data availability is RECOMMENDED to be renamed to data publication.

Rationale

Motivations:

  • Data publication aligns more precisely with the intended meaning, which revolves around whether data has been published.
  • The community already favors and commonly uses the term data publication.
  • Data publication maintains a similar structure to data availability, making it easier for those familiar with the latter term to transition.

Alternative designs:

  • Proof of publication: While intuitive, it differs in structure from data availability and may be too closely associated with terms like proof of work, potentially causing confusion within consensus-related mechanisms.
  • Data availability proof: While logically coherent, it may create issues when used in conjunction with other terms, as the emphasis falls on “proof”. For instance, “verify a rollup’s data availability” and “verify a rollup’s data availability proof” might not refer to the same concept.

Backwards Compatibility

No backward compatibility issues found.

Security Considerations

No security issues found.

Copyright

Copyright and related rights waived via CC0.


This CIP is inspired by musalbas’s tweet, modularmedia_'s tweet, and EIP-6789.

For further discussion: a tweet by Nick White and its replies demonstrate some community support for this proposal.

Also, high_byte raises a further point about “data”-related terms: anything with “data” in it may be ambiguous.

We are glad that CIP-5 has entered the draft phase, and we will continue the renaming discussion here.

Here is some additional supporting evidence:

  • Nick White’s tweet about this CIP (twitter: /nickwh8te/status/1721823895353696263)
  • A poll by Modular Media shows a preference for “data publication” over “data availability” (twitter: /modularmedia_/status/1700182744884510760)
  • AltLayer’s founder used data publication to refer to DA (twitter: /jiayaoqi/status/1681044622099861504)
  • Namada uses both terms interchangeably (namada.net/blog/namada-and-celestia-exploring-a-path-toward-shielded-data-availability)
  • Mustafa thinks “we should commit to officially renaming data availability to data publication” (twitter: /musalbas/status/1696521530409394253)

Generally, I agree that data publication is a more understandable and intuitive term than data availability. But I don’t think it’s a perfect solution.

Ideally, data publication should be cohesive with data availability sampling (DAS). That would require a change to data publication sampling (DPS), breaking the acronym’s pronounceability.

But more importantly, if data publication is a better term, I’d like to see more real usage by the broader modular community before committing to such a change. Usage across mediums like explainer blog posts, public communications from teams, and docs, to name a few. That would be a stronger signal of preference for data publication in communications beyond a few examples of people tweeting that they like it better.

Should the term also take into account the intended time frame of the existence of proof of publication?

If you look at https://github.com/celestiaorg/CIPs/blob/main/cips/cip-4.md for example, it’s pretty clear that Celestia and other DP/DA-layer-focused projects are opinionated about a short- to medium-term storage interval. So it’s clearly not just publication.

Looking back at terms people might already be familiar with, maybe the term caching could be useful here?

I propose to call it a DPC layer - a Data Publication and Caching layer.

I agree, DAS is much better than DPS.

I wonder if it would be possible to just change DA itself to DP, and keep the terminology like DAS (since for most community members, it may be necessary to know only DA / DP, and not DAS at the technical level)?

I think you bring up a very important concept. “Caching” can also be a good indicator of the intended time frame of the existence of proof of publication.

But in terms of usage, caching is not a common term for a blockchain network / layer, is it?

Also adding one related resource: A brief data availability and retrievability FAQ

One of the challenges to the renaming is that previous literature likely won’t be changed to reflect the name change.

This is not an insurmountable concern by any means, but it should be considered, as the prior literature will remain relevant in the future.

Yes, I think this is a problem for any renaming.

Therefore, I believe that this CIP, if standardized, will serve as a recommended guideline for future work.