Notes on Release, Build and CI Engineering

The curse of bug-to-bug compatibility

Disclaimer: I am a Senior Principal Engineer at Red Hat. I was a member of the RHEL 9/CentOS Stream 9 Bootstrap team. Opinions are my own.

Tl;dr

The chase for “bug-to-bug compatibility” hurts the community, hurts RHEL customers and hurts the industry as a whole. The real innovation behind CentOS Stream is the attempt to change that.

What is ABI compatibility?

ABI compatibility is a requirement for certain interfaces to not change for a certain length of time, so that you can safely rely on the availability of a certain function and certain library which behaves in a predictable way.

It is important to know that RHEL ABI compatibility doesn't say “all ABIs and APIs are stable forever”. The real ABI compatibility of RHEL is described in detail in the official ABI Compatibility guidelines:

https://access.redhat.com/articles/rhel9-abi-compatibility

I recommend taking a look and checking which compatibility level this guide assigns to your favorite library.

What is bug-to-bug compatibility?

The term bug-to-bug compatibility, when applied in isolation, means that when we implement a new system we deliberately carry one specific bug over to the new implementation. Your users rely on the broken behavior so much that they require it on the new system, see xkcd/Workflow.

The way the term is applied in conversations about RHEL is very generic, and therefore makes much less sense. It implies that one Linux distribution has the same bugs as another.

A Linux distribution is not a static thing, though. It is a pool of package builds, each versioned and updated on its own, which are then combined into different subsets based on different rules. This is both the power and the weakness of a Linux distribution. The power, because the ability to mix and match packages allows you to create solutions for a large range of tasks. The weakness, because different combinations of components may lead to different behavior.

Add the branching structure of RHEL to it, with all of its powerful minor-stream complexity, and you'll realize that there is simply no single state of RHEL that you can be bug-to-bug compatible with.
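To put a rough number on the "no single state" point, here is a minimal sketch. It assumes a tiny pool of three made-up packages (libfoo, libbar, footool) with made-up build identifiers; none of these names or versions come from RHEL. It only illustrates how fast the number of installable combinations grows when every package is updated on its own.

```python
from itertools import product
from math import prod
from typing import Dict, Iterator, List

# Hypothetical pool: package name -> builds shipped during one minor release.
# Names and versions are invented for illustration only.
POOL: Dict[str, List[str]] = {
    "libfoo": ["1.0-1", "1.0-2"],
    "libbar": ["2.3-1", "2.3-4", "2.4-1"],
    "footool": ["0.9-1", "0.9-2"],
}


def count_states(pool: Dict[str, List[str]]) -> int:
    """Number of distinct systems if each package may sit at any of its builds."""
    return prod(len(builds) for builds in pool.values())


def enumerate_states(pool: Dict[str, List[str]]) -> Iterator[Dict[str, str]]:
    """Yield every combination as a mapping of package name -> installed build."""
    names = sorted(pool)
    for combo in product(*(pool[name] for name in names)):
        yield dict(zip(names, combo))
```

With only three packages and seven builds, count_states(POOL) already gives 12 distinct systems. A real distribution has thousands of packages, and a real deployment may pin any of them, so "the same bugs as RHEL" has no single system to refer to.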

Thus, at the distribution level the “bug-to-bug compatibility” concept does not exist. It is an overhyped buzzword people use without putting too much thought into it.

Why does it hurt?

The ABI compatibility guidelines are the open formal standard for the RHEL-compatible ecosystem. They are the .odt kind of specification for Linux distributions. The mythical bug-to-bug compatibility, on the other hand, is the .docx. You chase an always-moving target which you cannot control or predict.

The huge number of issues coming to RHEL support, and the demand from RHEL customers for longer, more extensive and never-ending support cycles, come from the fact that the ecosystem doesn't follow the standard and relies instead on the “undefined behavior” of one specific implementation of it.

Think about it: Even a single RHEL minor release, the most stable and most restricted flow of updates we have at hand, is not bug-to-bug compatible with itself. Yes, we change things and we fix bugs. That's what updates do.

Moreover, even a static snapshot of a RHEL minor release is not a good reference for anything. Paying RHEL customers never have a system deployed in exactly the state in which RHEL engineers test it. Every single customer changes the system: they cherry-pick certain updates, freeze others and install custom compatible versions of certain things. And that is generally OK. Until it isn't, and it generates an issue and a support case.

The ABI compatibility standard, which we enforce, gives us both the limits which we shouldn't cross and the flexibility to adjust within those limits. Any kind of “pinning” to the undefined behavior of a specific snapshot of RHEL at a specific point in time, whether implemented by a third party or by ourselves, creates an issue for future updates, which do not know about that hidden requirement.

So yes, relying on undefined behavior is bad for business. For all businesses. As well as for the community. It is simply bad for all people on Earth who use whatever those businesses and non-businesses create.

What is CentOS Stream really?

CentOS Stream is sort of “RHEL Stable Proposed Updates”. Yes, Red Hat Marketing and Branding folks do not want it to be explained this way, and probably cringe at this very moment. Thus, instead of looking at labels, look at the tech side: It is an ABI-stable and continuous Linux distribution, the mainline of RHEL, from which we branch RHEL minor releases.

But CentOS Stream also represents an open reference implementation of the ABI compatibility standard of the RHEL-compatible ecosystem.

If your RHEL-compatible application or service doesn't work on CentOS Stream, you are doing it wrong.

And I'll rephrase it: people think that they need a RHEL “clone” because it gives them access to vendors who develop and test for RHEL. The point, though, is that to get access to the RHEL ecosystem, the community needs vendors to start developing and testing for CentOS Stream. And this is where Red Hat's interest and the community's interest overlap.


Now some may ask,

Then let's adjust the standard. Bring those requirements in. Do not assume that requirements which do not work on CentOS Stream will somehow magically be fulfilled by RHEL. Because they won't.

And some may say,

Then bring your tests. The easy way to write a standard is to turn it into a distribution-agnostic test.

If you are worried that CentOS Stream will break a certain behavior, write a test and let's gate all CentOS Stream updates with it. (And while we are at it, we can also gate all Fedora updates and even upstream updates using the same test, see Packit.)
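As a rough illustration of what "gate updates with a test" can mean, here is a minimal sketch in Python. Everything in it is an assumption made for the example: gate_update, the test names and the two sample checks are hypothetical and do not represent the CentOS Stream or Packit gating API. The idea is only that each test exercises a documented behavior through a public interface, so the same test can run unchanged on any distribution, and an update is blocked when any test fails.

```python
from typing import Callable, Dict, List

# A behavior test takes no arguments and returns True when the behavior holds.
BehaviorTest = Callable[[], bool]


def gate_update(tests: Dict[str, BehaviorTest]) -> List[str]:
    """Run every behavior test; return the names of the tests that failed.

    An empty list means the candidate update may pass the gate.
    A test that raises an exception counts as a failure.
    """
    failed = []
    for name, test in tests.items():
        try:
            passed = bool(test())
        except Exception:
            passed = False
        if not passed:
            failed.append(name)
    return failed


def sort_is_stable() -> bool:
    """Stand-in for a documented guarantee: sorted() keeps equal keys in order."""
    records = [("b", 1), ("a", 1), ("c", 0)]
    expected = [("c", 0), ("b", 1), ("a", 1)]
    return sorted(records, key=lambda r: r[1]) == expected


def missing_interface() -> bool:
    """Stand-in for a requirement the candidate update does not satisfy."""
    raise LookupError("hypothetical interface not provided")
```

Calling gate_update({"sort-is-stable": sort_is_stable, "missing-interface": missing_interface}) returns ["missing-interface"], so this candidate would be held back. The checks themselves say nothing about which distribution they run on, which is what makes them reusable for Fedora or upstream.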

And then some may say,

Then you don't need bug-to-bug compatibility. You need the power of remixing.

Do your own customization via a Special Interest Group (SIG), see for example the Hyperscale SIG. Make a version which fits your goals, your schedule, your workflow and your quality requirements, but base it on the open shared standard of the ecosystem.
