autogits/doc
Adam Majer d14ebd0727 .
2024-08-18 22:22:21 +02:00

Introduction

The OBS (Open Build Service) was created at a time when the VCS field was still evolving. One of the main problems left unsolved back then was the handling of large files. Large files are at the core of package sources -- think of upstream source archives.

Today, this has changed. Git is the most popular and widely used VCS in history. Entire businesses are built around providing services for Git. Git can also deal with large files via the Git LFS extension.

Here, we'll detail how Git and Git LFS can be leveraged to provide a superior contributor environment while increasing flexibility, transparency and traceability in OBS project management.

Overview of current project

OBS is used to build projects. It does not build individual packages or images; it only builds projects. And while package management is exposed directly, project management is hidden behind APIs, legacy workflows and a monolithic codebase.

The goal of this project is to move project management to Git and to facilitate project workflows via external, adaptable helpers. As a consequence, OBS will be able to build any project, whether it lives in Git or in the legacy VCS internal to OBS. Git-based projects, however, will no longer be constrained by internal OBS machinery and can adopt any project-specific workflow in a modular fashion, without the need to change OBS sources.

The goal is to move the current workflows for openSUSE:Factory, as well as SLFO, out of OBS and into Git. OBS will still be used to build such projects, but everything else, from approvals to maintainer definitions to project configs, must be moved to Git. Doing so will not only simplify workflows for package maintainers but will also bring transparency to project history and secure our infrastructure with modern cryptography.

How Git works

Git has only four basic object types:

  • blob -- file data
  • tree -- directory listing; its entries refer to blobs, other trees, or commits (commit entries are submodules)
  • commit -- contains parent commit information and a tree object, forming an unchangeable, sealed history backed by a cryptographic hash function (kind of like a Bitcoin blockchain)
  • tag -- an additional label, usually associated with a commit

A good way of thinking about Git is not as a VCS, but as a multi-version file system, where each revision is sealed by the revisions that follow it.

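Each object is referenced from other objects by the cryptographic hash of its content. Git uses SHA-1 by default, and SHA-256 in repositories created with the SHA-256 object format. The integrity of a Git repository, along with the entire evolution of its sources, is therefore backed by that hash function.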

In contrast, the integrity function used by OBS is MD5.
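As a minimal sketch of how this content addressing works: Git hashes a short header (`<type> <size>` followed by a NUL byte) together with the object body, and the resulting digest is the object ID. The helper name `git_object_id` is illustrative and not part of any Git tooling. SHA-1 is Git's default object format; passing `sha256` shows what a repository created with the SHA-256 object format would compute.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes, algo: str = "sha1") -> str:
    """Return the ID Git assigns to an object with this type and body."""
    # Git prepends "<type> <size>\0" to the body before hashing.
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.new(algo, header + body).hexdigest()


# Same value as: printf 'hello\n' | git hash-object --stdin
print(git_object_id("blob", b"hello\n"))

# Object ID of the same blob under the SHA-256 object format.
print(git_object_id("blob", b"hello\n", algo="sha256"))
```

Because trees embed the IDs of their entries and commits embed the IDs of their tree and parents, changing any byte of any file changes every ID above it in the history.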

Workflow of Git Projects

OBS ties each package to a project. A commit to a package updates the project in which that instance of the package resides.

Git has no notion of projects and packages. It simply manages source trees. The work associated with managing a project and its packages is now done externally.

The basic structure of a Git-managed project is shown below. The package repository contains the package sources, while the project repository contains all the information associated with the project, including pointers (Git submodules) to all the package sources.

ProjectGit submodule points to commit in PackageGit
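To illustrate why a package update has to show up in the project repository, the sketch below serializes a tree consisting only of submodule entries and hashes it the way Git hashes a tree object. A submodule entry is stored with mode `160000` and the raw commit ID of the package. The package names and commit IDs are made up, the function name `project_tree_id` is hypothetical, and the plain name sort is only adequate for simple names like these.

```python
import hashlib

GITLINK_MODE = b"160000"  # tree-entry mode Git uses for a submodule (commit reference)


def project_tree_id(submodules: dict[str, str]) -> str:
    """Hash a tree whose entries are all submodule pointers (name -> commit ID)."""
    body = b""
    for name in sorted(submodules):  # simplified ordering, fine for these names
        # Each entry is "<mode> <name>\0" followed by the raw (binary) object ID.
        body += GITLINK_MODE + b" " + name.encode() + b"\x00"
        body += bytes.fromhex(submodules[name])
    header = b"tree " + str(len(body)).encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


# Hypothetical package names and commit IDs, for illustration only.
before = {"bash": "11" * 20, "zlib": "22" * 20}
after = dict(before, zlib="33" * 20)  # zlib's PackageGit received a new commit

print(project_tree_id(before))
print(project_tree_id(after))  # a different ID: the update is recorded in ProjectGit
```

The project tree pins an exact commit of every package, so the state of the whole project at any point in its history can be reconstructed from the project repository alone.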

An update to a package must be represented as an update to the project. This can happen in two ways: either directly, using prjgit-updater, which updates the project Git on every push,

ProjectGit submodule is updated following push to PackageGit

or indirectly, via the pr-review workflow, which updates the project Git through pull requests.

PackageGit submodule is updated via PR rerouted through ProjectGit
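The difference between the two paths can be sketched with a toy in-memory model. This is not how prjgit-updater or the pr-review workflow are implemented; all class and function names are hypothetical, and the approval gate on the pull request is an assumption made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ProjectGit:
    """Toy project repository: package name -> pinned package commit."""
    submodules: dict[str, str] = field(default_factory=dict)
    log: list[str] = field(default_factory=list)

    def pin(self, package: str, commit: str, reason: str) -> None:
        # Move the submodule pointer and record why it moved.
        self.submodules[package] = commit
        self.log.append(f"{package} -> {commit} ({reason})")


@dataclass
class PullRequest:
    package: str
    commit: str
    approved: bool = False


def direct_update(project: ProjectGit, package: str, commit: str) -> None:
    """Direct path: a push to PackageGit moves the pointer immediately."""
    project.pin(package, commit, "direct push")


def merge_pull_request(project: ProjectGit, pr: PullRequest) -> bool:
    """Indirect path: the pointer moves only after the PR is approved (assumed)."""
    if not pr.approved:
        return False
    project.pin(pr.package, pr.commit, "merged PR")
    return True


project = ProjectGit()
direct_update(project, "zlib", "abc123")

pr = PullRequest("bash", "def456")
merged_early = merge_pull_request(project, pr)  # not approved: project unchanged
pr.approved = True
merged_late = merge_pull_request(project, pr)   # approved: pointer moves

print(project.submodules)
print(project.log)
```

In both paths the end result is the same kind of change, a moved submodule pointer in the project repository; only the gate in front of it differs.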

In all cases, the project must be updated for the changes to be built. This is akin to OBS today, except that in OBS the project is internal state, mostly hidden from inspection.

Centralization of package management

The proposal is to move all "official" package sources under a /pool organization. Each "official" project would then have one branch assigned to help with package updates.

The branches represent the current state of packages in a given project. Basic package updates follow the pr-review workflow.
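One way to picture this layout is a lookup from project to branch inside each `pool` repository. The branch names below are invented for illustration, as is the helper `pool_ref`; the actual naming scheme is not specified here.

```python
# Hypothetical branch names; only the project names come from this document.
PROJECT_BRANCH = {
    "openSUSE:Factory": "factory",
    "SLFO": "slfo-main",
}


def pool_ref(package: str, project: str) -> tuple[str, str]:
    """Return (repository path, branch) holding `package` as used in `project`."""
    return f"pool/{package}", PROJECT_BRANCH[project]


print(pool_ref("zlib", "openSUSE:Factory"))
```

With such a mapping, the sources of a package in a given project are simply the tip of that project's branch in the package's pool repository.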