Open Source Software and GNU/Linux

Introduction

This article discusses some of the workings of open source software with particular reference to the Linux kernel and GNU. It will explain some of the processes involved for those of us who know Linux but are not software engineers. Perhaps you are a Linux administrator or enthusiast, but find yourself unsure of terms such as upstream, downstream, mainline, patch, merge and other words developers love to bandy about? Read on.

Open Source Software

If you are a Linux user, you might be familiar with installing software using yum or apt-get. For example:

yum install vlc

or

apt-get install vlc

and you have probably used yum or apt-get to update your system, either with a GUI or directly on the command line. For example, either of the following will update all of the software on a Linux system to the latest available version:

yum update

or

apt-get update
apt-get dist-upgrade

But how does that actually work? Who is writing this software, and where does it come from? Why are certain parts of your system updated and others not? This article will offer some explanations in language understandable to those of us who use Linux but are not software engineers.

GNU/Linux

What we are talking about here is Linux, GNU, and other open source software. Linux, strictly speaking, is just the kernel. GNU software covers auxiliary parts of the operating system such as gcc, the C compiler and its libraries, the shell and a wealth of command line tools like grep, diff, awk and so on. Applications such as the office suite LibreOffice, VLC the popular video client, K3B the DVD burner and languages like Perl and Python fall into the third group – other open source software.

Who Writes all this Stuff?

Let’s start with the Linux kernel. This is a large binary file that gets loaded into memory when your system boots and basically runs the whole show. It was first written in 1991 by Linus Torvalds and is now developed and maintained by a large number of people around the world. This development community uses a piece of software called Git to work in collaboration: each developer copies (“clones”) the kernel source, modifies it locally and then submits the changes, which pass through the maintainers of the relevant part of the kernel on their way to the main repository. That repository is controlled by Linus Torvalds, who makes a new release of the kernel every two to three months.

Git is an open source revision control system written in 2005 by Linus specially for Linux kernel development. It is applicable to any software project where a distributed community of developers needs to work in unison, and it is now used by a number of other software projects including Perl, VLC, Gnome, KDE and Android.

Not all open source software has always used Git. The free web server Apache, GCC and Ruby, for example, were long developed using Subversion, an alternative revision control system, although GCC and Ruby have since moved to Git.

The latest stable Linux kernel is always available at http://www.kernel.org/, downloadable through a very obvious link on that page.

Linux Kernel Releases at kernel.org

Besides the stable release, you will see a number of other kernel releases and branches on the main kernel.org page. Each release (or branch) is a copy of the entire Linux kernel source code, but at a different stage of development. The latest stable release is, as the name suggests, a fully tested, complete set of kernel source code frozen and released for general consumption by the public. Then there is something called the “mainline” branch. This started off as a copy of the latest stable release but has been developed further, and contains modifications that will appear in the next stable release. Another entry in the list is called linux-next. It collects modifications (patches) that are queued for merging into mainline once the current release is finished, that is, changes intended for the release after the one now taking shape in mainline.

Below is a description of some of the software engineering terms introduced above.

Branches

Within a software project, a branch is a copy of all or part of the source created to allow parallel development work. The two copies of the source might be worked on by two sets of developers. Although they are working on different parts of the code, each set of developers has a copy of the entire code base, which they can use for testing to see how their changes affect the whole project. When development is complete, the two branches are “merged” together again to form a single code set once more, containing both sets of changes.
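The branch-and-merge cycle can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not how Git or any real revision control system works internally: the source tree is a dictionary mapping invented file names to their contents, and the hypothetical merge function takes each file from whichever branch changed it.

```python
# Toy model of branching and merging. All file names and contents are invented.

def merge(base, ours, theirs):
    """Three-way merge at whole-file granularity."""
    merged = {}
    for name in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(name), ours.get(name), theirs.get(name)
        if o == t:        # both branches agree, or neither changed the file
            result = o
        elif o == b:      # only "theirs" changed this file
            result = t
        elif t == b:      # only "ours" changed this file
            result = o
        else:             # both changed the same file in different ways
            raise ValueError(f"conflict in {name}")
        if result is not None:   # None means the file was deleted
            merged[name] = result
    return merged

base = {"net.c": "network v1", "disk.c": "disk v1"}

# Each branch starts life as a full copy of the code base.
branch_a = dict(base)
branch_b = dict(base)

branch_a["net.c"] = "network v2"   # team A works on networking
branch_b["disk.c"] = "disk v2"     # team B works on storage

merged = merge(base, branch_a, branch_b)
print(merged)
```

The merged tree contains both teams' changes. If both branches had changed the same file differently, the sketch would stop with a conflict, which in a real project a developer has to resolve by hand.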

Patches

A “patch” is a small set of changes to the source code, usually made to fix a problem or to extend functionality. For example, a project might consist of many hundreds of files containing source code written in C or some other language. A developer might address a bug in the end product by copying and modifying (say) two of these source files. The differences between the original and modified files can be held separately, typically as a “diff” file listing only the lines that changed, in which case they form a patch applicable to the main source tree.

The patch could also be released to the public, allowing users to obtain the fix just by downloading the patch or (more usually) an updated package prepared from it. The changes can also be put back into the main development source tree, replacing the original code. In this way the fix will form part of the main code and will be present in the next overall release of the product. A developer might describe this action as “merging into mainline”.
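Here is a minimal sketch of how a patch can be generated, assuming a hypothetical one-function source file called add.c. Python's standard difflib module compares the original and corrected versions and prints the differences in the unified format that tools such as patch understand. The file name and its contents are invented for illustration.

```python
import difflib

# Original version of the (hypothetical) file add.c, containing a bug.
original = [
    "int add(int a, int b)\n",
    "{\n",
    "    return a - b;\n",
    "}\n",
]

# Corrected version of the same file.
fixed = [
    "int add(int a, int b)\n",
    "{\n",
    "    return a + b;\n",
    "}\n",
]

# The patch records only what changed, plus a few lines of context.
patch = list(
    difflib.unified_diff(original, fixed, fromfile="a/add.c", tofile="b/add.c")
)
print("".join(patch), end="")
```

In the output, the line beginning with a minus sign is removed and the line beginning with a plus sign is added. Everything else is context that helps a tool find the right place in the file.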

Linux Distributions

Linux distributions like Red Hat, Fedora, Ubuntu and Mint all make use of the Linux kernel. And they include a great deal of GNU and other open source software too – GUIs like Gnome, KDE and XFCE, applications like Firefox, Transmission, Wine and Shotwell, and smaller tools like the file editors vi and gedit. The Linux distributions are said to be “downstream” of the software they include. Likewise, KDE and Gnome can be called “upstream” of the distros that include them. All distros are downstream of the Linux kernel, because they all include it in their releases. And there are similar dependencies across many open source projects.

Note that distribution projects do not primarily develop code. They are more involved with collecting and integrating existing free software projects than with writing original code.

Dependencies

Since downstream projects make use of upstream code, they will be affected by any changes made upstream. Adapting and integrating upstream code is a large part of the work that goes into any Linux distribution or indeed any open source project that has an upstream dependency. The next release of Ubuntu, for example, will need to include a version of the latest Linux kernel, which must then be modified (“patched”) to work with the rest of Ubuntu.

Yum can show dependencies. For example, the useful “wget” downloader depends on openssl to provide secure socket support, and the zlib library for compressing/decompressing data, among many other things:

yum deplist wget
...
Finding dependencies:
package: wget.i686 1.12-4.fc14
...
  dependency: /bin/sh
   provider: bash.i686 4.1.7-3.fc14
   provider: bash.i686 4.1.7-4.fc14
  dependency: libz.so.1
   provider: zlib.i686 1.2.5-2.fc14
  dependency: libssl.so.10
   provider: openssl.i686 1.0.0a-2.fc14
   provider: openssl.i686 1.0.0e-1.fc14
...
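A package manager has to follow such dependencies all the way down and install them in a sensible order. The sketch below shows the idea on a tiny, hypothetical dependency graph loosely based on the wget listing. The entry making openssl depend on zlib is an assumption added for illustration, and this is not how yum is actually implemented.

```python
# Hypothetical dependency graph. Only wget's direct dependencies echo the
# listing above; the openssl -> zlib entry is assumed for illustration.
DEPENDS = {
    "wget": ["bash", "zlib", "openssl"],
    "openssl": ["zlib"],
    "bash": [],
    "zlib": [],
}

def install_order(package, depends):
    """Return packages so that every dependency precedes what needs it."""
    order = []
    visiting = set()

    def visit(name):
        if name in order:
            return                      # already scheduled
        if name in visiting:
            raise ValueError(f"dependency cycle at {name}")
        visiting.add(name)
        for dep in depends.get(name, []):
            visit(dep)                  # dependencies go in first
        visiting.discard(name)
        order.append(name)

    visit(package)
    return order

print(install_order("wget", DEPENDS))
```

Each package appears only once, even though zlib is needed by both wget and openssl, and wget itself comes last because everything it relies on must already be in place.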

Upstream Merges

Sometimes included code will be modified by downstream authors. The modifications then form patches (see above) to be applied to the inherited code. The more modifications that are made in this way, the greater the ongoing burden on the downstream project, which must reapply the same modifications each time it includes a future release of the upstream code.

An upstream merge happens when the downstream authors send the modifications to the upstream project for inclusion there. There are two benefits. First, the downstream project is relieved of the ongoing burden of supporting the modified code. Second, other projects dependent on the same code will all benefit from the code improvement.
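The first benefit can be illustrated with a deliberately simple sketch. The patch names and release contents below are invented, and a real distribution tracks its patches with far more machinery. The hypothetical function patches_to_carry lists the local patches that a downstream project must still reapply itself for a given upstream release.

```python
def patches_to_carry(local_patches, upstream_changes):
    """Local patches that upstream does not yet include."""
    return [p for p in local_patches if p not in upstream_changes]

# Patches maintained by the downstream project (names are invented).
local_patches = ["fix-wifi-resume", "add-distro-logo"]

# Upstream release 1 contains neither patch.
release_1 = {"new-scheduler"}

# The wifi fix was sent upstream and merged in time for release 2.
release_2 = {"new-scheduler", "fix-wifi-resume"}

print(patches_to_carry(local_patches, release_1))
print(patches_to_carry(local_patches, release_2))
```

After the upstream merge, only the distribution-specific logo patch remains a downstream responsibility. The wifi fix now arrives with every upstream release, for this project and for every other downstream project as well.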

Closing Remarks

I hope that the above has been helpful to its intended audience. It is by no means complete or completely correct, and will be modified and extended as appropriate.
