Notes on Release, Build and CI Engineering

CI System vs CI Pipeline

(This is a repost of an article I wrote in 2017. I think it is as relevant now as it was then.)

For any code you write, you need several steps to transform it from a set of text files to a certain release artifact or a running service. You go through these steps manually at first, but sooner or later you decide to automate them. And this is how the CI/CD pipeline of a project is born.

But there are different ways to organize the automation.

The classical approach (CI System) is formed around a standalone CI engine (for example Jenkins, Buildbot, Bamboo...). In the CI system you configure scenarios (jobs, build factories, plans...) and triggers. You can also add dependencies, manual triggers, parametrization, post-build hooks, dashboards and much more. The CI system usually becomes a world of its own, with a limited number of almighty CI administrators managing multiple interdependent projects at once.

There is also a new, “postmodern” way of setting up CI, which is essentially inspired by Travis CI and its integration with GitHub. If we followed the common trend, we’d call it a System-less CI Pipeline.

In this approach the recipes, triggers and actions of the CI/CD pipeline are stored in a configuration file together with the project code. The CI system (yes, there still is one) is hidden somewhere behind your repository frontend. You do not interact with the CI system directly; rather, you see it responding to your actions on the codebase. Pull requests, merges, review comments, tags: every action can trigger a certain recipe configured inside the project. Feedback from the CI system is also provided via the repository frontend, in the form of comments, tags and labels.
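As a minimal sketch of this idea, the snippet below models an in-repository pipeline definition as a plain Python mapping from repository events to recipes. The event names, step names and the `recipes_for` and `report` helpers are hypothetical and not taken from any particular CI service; real services usually read a configuration file with their own schema.

```python
# Hypothetical in-repo pipeline definition: repository event -> ordered recipe.
# Event and step names are illustrative, not any real service's schema.
PIPELINE = {
    "pull_request": ["lint", "unit_tests"],
    "merge": ["unit_tests", "build_artifact"],
    "tag": ["build_artifact", "publish_release"],
}


def recipes_for(event: str, pipeline: dict[str, list[str]] = PIPELINE) -> list[str]:
    """Return the steps to run for a repository event, or an empty list."""
    return list(pipeline.get(event, []))


def report(event: str, results: dict[str, bool]) -> str:
    """Format feedback the way a repository frontend might post it as a comment."""
    failed = [step for step, ok in results.items() if not ok]
    status = "failed: " + ", ".join(failed) if failed else "all steps passed"
    return f"[{event}] {status}"


if __name__ == "__main__":
    steps = recipes_for("pull_request")
    # Pretend every step succeeded; a real runner would execute them.
    print(report("pull_request", {step: True for step in steps}))
```

The point of the sketch is that everything the pipeline does is readable from the project itself: an event with no entry in the mapping simply triggers nothing.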

To highlight the difference between these setups, let us consider multiple projects – A, B and C (see diagram).

In the CI system case, as soon as the code enters the CI system, it is no longer owned by the project. It becomes part of a larger infrastructure and follows the lifecycle defined at the system level. This allows complex dependency graphs, common integration gates and cross-project communication.
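To make the dependency-graph point concrete, here is a small sketch of what a system-level view can compute that a single project cannot: which projects must be rebuilt, and in what order, when one of them changes. The dependencies between A, B and C are an assumption made for illustration, as are the `affected_by` and `rebuild_order` helpers; only `graphlib.TopologicalSorter` comes from the Python standard library.

```python
from graphlib import TopologicalSorter

# Hypothetical system-level view: each project lists the projects it depends on.
DEPENDS_ON = {
    "A": set(),
    "B": {"A"},
    "C": {"A", "B"},
}


def affected_by(changed: str, depends_on: dict[str, set[str]]) -> set[str]:
    """Return the changed project plus everything depending on it, directly or transitively."""
    affected = {changed}
    grew = True
    while grew:
        grew = False
        for project, deps in depends_on.items():
            if project not in affected and deps & affected:
                affected.add(project)
                grew = True
    return affected


def rebuild_order(changed: str, depends_on: dict[str, set[str]]) -> list[str]:
    """Order the affected projects so that dependencies are built first."""
    affected = affected_by(changed, depends_on)
    # Keep only the edges between affected projects.
    subgraph = {p: depends_on[p] & affected for p in affected}
    return list(TopologicalSorter(subgraph).static_order())


if __name__ == "__main__":
    print(rebuild_order("B", DEPENDS_ON))
```

With these assumed dependencies, a change in B triggers a rebuild of B and then C, while A is left alone.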

In the pipeline case, the pipeline carries the project code through a set of steps defined at the project level. Each project gets an isolated, independent lifecycle, which generally doesn’t interact with the outside world.

And which one is better?

You can probably guess my answer: yes, you need both.

But let’s look into one common discussion point.

CI system silo

Silo is one of the scariest words in the modern IT environment, and also one of the deadliest weapons. And the argument goes as follows:

A standalone CI system creates a silo, cutting the dev team out of the project lifecycle. A pipeline is better because it allows developers to keep full control of the project.

As a Quantum Integration Engineer I would love to dismiss this argument altogether, as it compares apples to oranges, but it does have a point. Or rather, I can see where it comes from:

Widely known and widely adopted classical CI systems, like Jenkins, were not designed for collaboration and are therefore extremely bad at it. There are no “native” tools for code review, tests, approvals or visualization of the CI system configuration. Projects like Jenkins Job Builder, which address some of these issues, are still considered alien to the classical CI ecosystem. A high entry barrier, outdated and unusable visualizations, lack of code reuse, no naming policies or common practices, together with a generic code-centric development culture (in which CI organization is not considered worth any effort): all of this leads to complex, legacy CI systems. And none of us wants to be there.

Thus, from a developer’s point of view the choice looks as follows: either you work with some unpredictable, unknown third-party CI system, or you bring every piece of the infrastructure into your project and manage it yourself.

Now, given that I admit the problem, there is some bad news:

Switching to a CI pipeline doesn’t make you immune to the silo issue; rather, it encourages you to create one. In fact, the pipeline is a silo by its very definition:

An information silo, or a group of such silos, is an insular management system in which one information system or subsystem is incapable of reciprocal operation with others that are, or should be, related. — Wikipedia

As a project developer you might have a feeling that the pipeline improves the situation, while the only thing that has improved is your position with respect to the silo wall: you are inside your own silo now.

While you may feel good about getting full control over the pipeline definition for your project, you significantly reduce your ability to integrate with other projects: each of them now has its own pipeline, which is not visible to you.

The other way

Let me add a disclaimer first: if you run a project that produces a single artifact (a binary or a container), and you expect every project contributor to be fully aware of its entire lifecycle, then the CI Pipeline approach might be enough for your use case. Do use it; it is better than nothing.

But if we talk about continuous integration of hundreds of components... we need more than that.

Of course there are certain project-level tasks which the project team can control on its own. But then control needs to be passed on to a shared system, where representatives of different projects can design, discuss and develop common integration workflows together with the people who understand the whole.
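One way to picture this hand-off is a two-stage check: each project reports the result of its own checks, and a shared gate then decides whether the combined set may proceed. The sketch below is an illustration under stated assumptions, not a description of any existing tool: `Candidate`, `integration_gate` and the `same_major` test (which pretends that components integrate only if they share a major version) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A project build together with the outcome of its project-level checks."""
    project: str
    version: str
    project_checks_passed: bool


def integration_gate(candidates, integration_test):
    """Shared gate: accept the set only if every project passed its own checks
    and the combined set passes the shared integration test."""
    rejected = [c.project for c in candidates if not c.project_checks_passed]
    if rejected:
        return False, "project-level failures: " + ", ".join(rejected)
    if not integration_test(candidates):
        return False, "integration test failed for the combined set"
    return True, "accepted"


def same_major(candidates):
    """Hypothetical integration test: all components share a major version."""
    return len({c.version.split(".")[0] for c in candidates}) == 1


if __name__ == "__main__":
    builds = [Candidate("A", "2.1", True), Candidate("B", "3.0", True)]
    # Both projects pass on their own, yet the combination is rejected.
    print(integration_gate(builds, same_major))
```

The case worth noticing is the one in the usage example: every project is green in isolation, and the failure only becomes visible at the shared level.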

Rather than treating CI as a simple collection of pipeline configurations hidden in the individual project repositories, we need to see a shared CI as a standalone open project, which needs architecture, governance, configuration as code, a change process, code review, documentation, on-boarding guides and simplified drive-by contributions.

To solve the collaboration problem you don’t give every collaborator a personal isolated playground. Instead you give collaborators tools and processes to work in a shared environment. And that is what a good CI system is supposed to be. It is an integration system, after all.