ἅπαξ λεγόμενον
146.190.13.172

hier(7) revised


July 2024

Introduction

One of the things most users of POSIX-compatible systems (or the ones made in their image) take for granted is the file system hierarchy. Many of the traditional file locations are usually not even questioned (like the executable files residing in various bin directories), but some might seem weird without understanding the historical context behind their appearance (like the /usr directory not containing users’ directories). Some of the file locations became a de facto standard to the point where their absence can be fatal to a working system, an example being the musl standard C library, which expects the POSIX-compatible shell language interpreter to reside in /bin/sh (Felker 2019), even though the POSIX standard itself does not specify its location and even recommends against hardcoding its path (IEEE and The Open Group 2024).

The purpose of this article is to document some of the peculiarities of the “standard” file system hierarchy, provide some historical context, compare it with other hierarchy standards, and present a scheme the author uses for his computer systems.

Two major file system hierarchy schemes

While one might say there are as many file system directory schemes as there are operating systems employing them, the author believes they can be divided into two groups: user-centric and distributor-centric. When it comes to personal workstations, there are cases where the user of the workstation at some point also becomes the distributor of their own system, since some specialized software might not have been thought of by the official distributor. Fortunately, some distributor-centric file system schemes do provide ways to install such software without ruining the integrity of the system and without going against the wishes of a present package manager, but they usually do not provide as much freedom as user-centric schemes.

A user-centric approach is usually taken in systems originally designed to be used with personal computers where being connected to a network (with a potential software repository being available online) was not taken for granted. DOS-like systems are a great example. While the startup process requires certain files, e.g., IO.SYS, to be present at certain places (or even positioned in a special way on the partition), various DOS distributions tend to not care about the exact file locations and are usually configured when the system starts. The user is expected to maintain the PATH environment variable to be able to run installed programs from the command prompt, i.e., to append a new executable file location every time a new software package is installed. Obviously, this is the case only if the program is to be called from places other than its own directory. Other programs, like video games, which are not called as often (unless it is a gaming distribution), can be run by changing the current directory to their location and executing them there (or maybe even by specifying the full path from various places).

The author has experience with various DOS systems having either a top-level directory for every installed software package or a dedicated directory with subdirectories for every package to keep the root directory clean. Various installers also usually propose creating a top-level directory for their package. To summarize, the user is expected to keep their system clean and act as a package manager. The PATH variable also tends to be rather long in this scheme.

A distributor-centric approach is usually seen in systems belonging to the Unix family of operating systems and either systems directly based on them or systems trying to maintain some compatibility with them, like GNU, BSD, or Plan 9. This time, the scheme itself is directly standardized and the system integrity highly depends on it. One of the first documented hierarchy standards actually appears in the Unix Programmer’s Manual (Bell Telephone Laboratories 1979) and is rendered, in the manual page convention, as hier(7). It outlines the layout and serves as a reference for the user to know where to look for certain files. One thing to take into account when comparing this kind of scheme with the user-centric approach is that the systems employing it were not usually available to average consumers and were rather used in laboratories and research centers, so most of the time they came with a lot more software, configuration files and/or documentation than, e.g., an average DOS distribution. The result was a scheme designed for central maintenance when it comes to packages. We can see it in the most defining characteristic of this scheme: the subdirectories aggregate the files not by the package they belong to, but by their purpose, e.g., there is a common place for all executable files (actually, a few places, described below). This differs greatly from the user-centric approach, but it also shows the idea behind the system employing this scheme: the system works as a whole and depends on all the packages present to create a distribution of this system.

The hier(7) standard carries a lot of historical baggage, like the names of the directories, but the most annoying one is the /usr directory. Originally a place for users’ personal directories, at some point it became basically a second root directory (since it was usually mounted on a second drive with a lot more space for user files). We can find there, e.g., a second bin directory. An alert reader (and the one with some experience with such systems) will realize this is the reason the PATH variable is usually shorter in this case when compared to, e.g., DOS. By default, it only needs to contain the /bin and /usr/bin directories to allow the user to call most of the executable files comprising their system.

Such an alternative root directory (or the root directory itself) is called a prefix (and can possibly be empty).

There are systems that try to solve the /usr issue, e.g., the recommended practice in the GNU system (one that its distributions with the Linux kernel do not really follow) is to basically create a symbolic link from /usr to / to satisfy the programs with hardcoded paths to the /usr prefix and try to eliminate the redundant prefix itself (Free Software Foundation 2010). There are also clean systems, such as Plan 9 from Bell Labs, which actually uses /usr for its intended purpose, i.e., the users’ directories, keeps shared files in /sys and prefixes the architecture-dependent subdirectories with the top-level directories named after the architectures (binding them together later with, e.g., executable script directories as single top-level directories like /bin).

The scheme employed in most systems today does not actually reference hier(7), but the Filesystem Hierarchy Standard (The Linux Foundation 2015) instead. The FHS tries to keep the directory scheme clean even when taking into account the historical remnants of the old systems to the point of defining what programs should actually be located in /bin or other bins. Not only that, it actually defines another root directory: /usr/local—in the most basic sense, its purpose is to keep the packages that were not part of the distribution itself. There is also an /opt directory for this purpose.

Some system distributions (e.g., CentOS) employ the practice of linking the /usr directories with their / equivalents, similar to the recommended GNU practice. The author believes it makes an unnecessary mess and symbolic links should not be used in this case at all.

We can summarize the two approaches with a simple example. Let us say there are two packages: x and y. The first one contains a program a with its documentation, a library a and some kind of a configuration file for both. The second one contains two programs: b and c, of which only the first one has corresponding documentation.

On an example system with a user-centric directory scheme, the file system layout would probably look as shown below.

\x\a.exe
\x\a.doc
\x\a.cfg
\x\a.lib

\y\b.exe
\y\b.doc
\y\c.exe

The user would have to add \x and \y to their PATH variable and make their linker become aware of the \x\a.lib file (possibly by adding \x to some other path variable).

It is possible to apply the FHS rules to the above example (assuming the /usr/local prefix).

/usr/local/bin/a
/usr/local/bin/b
/usr/local/bin/c

/usr/local/lib/liba.a

/usr/local/etc/a.cfg

/usr/local/share/doc/x/a.doc
/usr/local/share/doc/y/b.doc

This example actually shows the crux of the distinction the author proposes to draw between these two approaches: the user-centric one can actually be managed by hand without any problems. If the user wants to remove an entire package, either x or y, they simply have to remove its entire directory (possibly letting other packages in the system know they should not look for it anymore). The distributor-centric approach is a little more tricky: how does the user know which files to remove? The documentation files are aggregated by the package names in the share directory, so it is as easy as with the user-centric approach, but removing the executable files would require knowing exactly which files should be removed. One can try to match them by name (assuming they do remember which program belongs to which package or they are keeping notes), but things get complicated with the y package, as it contains two programs (and having more than one program within a single package is not an uncommon practice). Therefore, the package maintenance is usually either left to the distributor, who does it instead of the user, or delegated to a dedicated program called a package manager with an interface the user can use to maintain their packages (along with possibly downloading their sources from online repositories).
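The bookkeeping problem described above can be sketched in shell. In the user-centric layout, removing a package is a single recursive delete of its directory. In the distributor-centric layout, something has to remember which files belong to the package. The manifest file (one path per line, relative to the prefix) and the function name remove_from_manifest below are hypothetical; they only illustrate the kind of record a package manager keeps and do not describe any particular tool.

```shell
# Hypothetical sketch: remove every file listed in a manifest.
# $1 is the prefix (e.g. /usr/local), $2 is a manifest with one
# prefix-relative path per line. The manifest format is an assumption.
remove_from_manifest() {
    prefix=$1
    manifest=$2
    while IFS= read -r file; do
        # Skip empty lines.
        [ -n "$file" ] || continue
        # Remove only regular files that still exist.
        if [ -f "$prefix/$file" ]; then
            rm -f -- "$prefix/$file"
        fi
    done < "$manifest"
    return 0
}
```

For the y package from the example, the manifest would list bin/b, bin/c and share/doc/y/b.doc. Without such a list, nothing in the file system itself says that b and c belong together.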

Revised file system scheme

The author does not hide the fact he is partial to DOS-like operating systems, including their approach to the file system scheme. Fortunately, the Linux kernel (which the author uses, if only for the lack of better alternatives) provides a lot of freedom when it comes to the files and the directories of the underlying system.

The example below shows the root directory of one of the author’s systems.

$ ls /
app   dev   etc   proc   run   sys   tmp   usr   vol

A reader used to orthodox file system schemes might find it a little surprising, especially the lack of some directories usually taken for granted. If a potential investigator were to follow the FHS, the next point of interest would be the /usr directory.

$ ls /usr
hapax   system

Apparently, none of these directories contain the bin subdirectory (and it seems the /usr directory serves its original purpose). If that is so, where is ls(1)?

$ which ls
/app/busybox/busybox-1.34.1/bin/ls

The above example shows exactly how the packages are organized. The top-level /app directory is similar to the Program Files directory found on Microsoft Windows systems. While not required by the system itself, as there are no variables referencing it, it serves as a way to keep the root directory clean, i.e., to avoid what usually happened in the root directory of the DOS C: device. The /app directory contains subdirectories for every vendor or provider of an installed package or the package directory itself if it is developed as a separate project without a clear and separate vendor, the examples being the above /app/busybox, or /app/gnu. Vendor subdirectories contain other subdirectories for every installed version of the package, the directory name being the name of the package and the version string after a -, the examples being the above /app/busybox/busybox-1.34.1, or /app/gnu/gcc-4.9.4.

As we can see, the bin directory resides directly in the individual package subdirectory. This approach serves two main purposes. First, it provides a rather easy way to do manual package management (allowing the system administrator to simply add or remove entire directories containing the packages in question). Second, it allows the users to manually specify what packages exactly they want to have in their path variables instead of having an entire system at their disposal with unintuitive schemes like the bin/sbin distinction. A rather pleasant side effect of this approach is the ability to keep the previous package versions intact in their original locations, the only required change being modifying the paths referencing them. The author found it useful to keep around (at least) the previous version of every package.
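Since every version lives in its own directory, finding out which versions of a package are installed needs nothing more than a directory listing. The following is a minimal sketch, assuming the vendor/name-version layout described above; the function name package_versions is made up for this illustration.

```shell
# List the installed versions of a package, one per line.
# $1 is the vendor directory (e.g. /app/gnu), $2 is the package name.
# Assumes directories are named <name>-<version>; a package whose name
# merely starts with "<name>-" would also match this simple glob.
package_versions() {
    vendor_dir=$1
    name=$2
    for dir in "$vendor_dir/$name"-*/; do
        [ -d "$dir" ] || continue    # the glob matched nothing
        dir=${dir%/}                 # drop the trailing slash
        base=${dir##*/}              # keep only the last path component
        printf '%s\n' "${base#"$name"-}"
    done
}
```

Switching back to a previous version then comes down to pointing the relevant path variables at another directory from this list.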

Assuming the example packages x and y were released by vendor Z and were both versioned 1.0, below is how the directory structure would look when employing this scheme.

/app/z/x-1.0/bin/a
/app/z/x-1.0/doc/a.doc
/app/z/x-1.0/etc/a.cfg
/app/z/x-1.0/lib/liba.a

/app/z/y-1.0/bin/b
/app/z/y-1.0/bin/c
/app/z/y-1.0/doc/b.doc

The reader can compare it with the original example. While the scheme retains the user-centric properties, it also uses the classic hier(7)/FHS layout locally to organize particular files.

Of course, setting up a system like this will have its drawbacks, since most packages do not support such a scheme by default.

One thing to notice is the location of the shell command language interpreter. The reader should not be surprised by the following which(1) output.

$ which sh
/app/busybox/busybox-1.34.1/bin/sh

As shown in the examples above, there is no /bin, so there is no /bin/sh either. It does not break POSIX compatibility, but if the target system supports the shebang line in script files, the scripts will have to be maintained and possibly updated when installing the new package containing sh(1). It can easily be mitigated by explicitly calling sh(1) instead of running scripts directly, or cleverly passing the SHELL environment variable (for programs respecting it). The author has found his peculiar systems only benefited from removing explicit references to various interpreters, so he never had to compromise by creating the /bin directory or finding a way to keep the location constant. It might not be the case for the reader, but the author encourages them to try it.
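Calling sh(1) explicitly can be wrapped in a small helper, so that the shebang line of a script no longer matters. This is only a sketch of the idea; the name run_script is hypothetical and the helper simply relies on sh(1) being found through PATH.

```shell
# Run a script with whatever sh(1) is found first in PATH.
# The script's own shebang line is treated as a comment by the
# interpreter, so it may point to a location that does not exist.
run_script() {
    script=$1
    shift
    sh "$script" "$@"
}
```

A script starting with a line such as #!/bin/sh will run this way even on a system without /bin, as long as the directory containing sh(1) is present in PATH.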

As mentioned before, the musl standard C library requires the interpreter in a constant location, but all the references can be easily rewritten, so that the functions using it traverse the PATH variable in search of sh(1) instead of calling it explicitly. It places another constraint on the system: the PATH variable must always contain a directory with the shell command language interpreter. The author has found out it is not entirely unreasonable and has never had any problems with it, but as always, the reader’s mileage may vary.

Installing new packages is another thing to watch out for. While civilized packages employ Makefiles or other build systems, some packages might not define the directories as variables, so the user has to be careful and thoroughly read the source code to avoid leaving a reference to the “standard” location. Fortunately, the heaviest packages the author had to deal with usually employed the schemes where it was easy to manipulate the directories. All of the GNU packages follow the GNU Coding Standards (Free Software Foundation 2024) and the directories can be fine-tuned during installation. It usually boils down to running configure with a proper prefix parameter and watching out for files like install.sh which are run explicitly without calling sh(1) through the variable. The example below usually works without any issues, even for the heaviest packages such as GCC.

$ sh configure --prefix=/app/provider/package-version SHELL=$SHELL

Other variables can obviously be used to make the contents even more sane, like keeping all of the manual pages in a single man directory without the preceding share container, etc. The author likes to even remove the individual numbered subdirectories for manual sections and to keep the pkgconfig subdirectory outside of lib, among other things to keep his system clean.
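As a rough illustration, the directory options can be generated from the vendor, name and version of the package. The options --prefix, --mandir and --docdir are standard in configure scripts generated by GNU Autoconf, but the particular values chosen here (man and doc directly under the package directory) and the function name configure_args are assumptions made for this sketch, not the author's exact settings.

```shell
# Print configure options for a package installed under /app.
# $1 is the vendor, $2 the package name, $3 the version.
# The chosen --mandir and --docdir values are illustrative only.
configure_args() {
    prefix=/app/$1/$2-$3
    printf '%s\n' "--prefix=$prefix --mandir=$prefix/man --docdir=$prefix/doc"
}

# Possible use from a package source directory (left unquoted on
# purpose, so that the options are split into separate words):
#   sh configure $(configure_args gnu gcc 4.9.4) SHELL="$SHELL"
```

Packages that do not use Autoconf will need their own equivalents, usually Makefile variables with similar names.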

Obviously, since the system is starting to look DOS-like, the user will have to maintain their PATH variable to be able to call the installed programs from the command prompt (or allow other packages to see them).
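Maintaining the variable by hand gets tedious, so it can be assembled from a list of package directories. The following is a minimal sketch; the function name build_path is hypothetical and the package list in the usage comment is only an example.

```shell
# Build a PATH value from a list of package directories, keeping only
# the packages that actually contain a bin subdirectory.
build_path() {
    result=
    for pkg in "$@"; do
        [ -d "$pkg/bin" ] || continue
        # Prepend a colon separator only when result is not empty.
        result=${result:+$result:}$pkg/bin
    done
    printf '%s\n' "$result"
}

# Possible use in a shell profile:
#   PATH=$(build_path /app/busybox/busybox-1.34.1 /app/gnu/gcc-4.9.4)
#   export PATH
```

The order of the arguments is the search order, so a newer version of a package can shadow an older one simply by being listed first.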

The moment one starts to compile packages, another seemingly obvious problem becomes clear: there is no /lib directory. (The author does not use dynamic linking and his systems do not contain dynamic linkers, so the reader is on their own if they wish to keep the program loader in a variable versioned location, but the author strongly recommends against even installing it in the first place. Static linking has its obvious benefits in this entire scheme, but properly describing them is beyond the scope of this article.) Among other packages, ld(1) will try to look for the C runtime files. If we were to follow the conventions described above, an example C library location would be /app/musl/musl-1.2.5/lib, along with all the other required files like crt0.o. Fortunately, the entire GCC stack allows us to set up a sysroot, i.e., a path referring to the place where all the necessary C library files can be found. It can even be relative to the location of the compiler, so it can be used to provide a rather elaborate scheme in case of some special system needs. The author usually just passes the --sysroot argument through the LDFLAGS variable, since packages using GNU autotools respect it and properly pass it to the compiler and linker. One thing to note is that the compiler and the GNU binutils have to be compiled with --sysroot support, so a run-of-the-mill GCC might not be suitable for the task and would have to be recompiled.
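Passing the sysroot through LDFLAGS can be sketched as follows. The helper name sysroot_flags is hypothetical, and the check for a lib subdirectory is only a sanity test added for this illustration; which subdirectories the compiler and linker actually search inside the sysroot depends on how the toolchain was configured.

```shell
# Print a --sysroot flag for the given C library directory.
# Fails if the directory has no lib subdirectory, which most likely
# means the path is mistyped or the library is not installed there.
sysroot_flags() {
    sysroot=$1
    if [ -d "$sysroot/lib" ]; then
        printf '%s\n' "--sysroot=$sysroot"
    else
        echo "no lib directory under $sysroot" >&2
        return 1
    fi
}

# Possible use when configuring a package:
#   LDFLAGS=$(sysroot_flags /app/musl/musl-1.2.5) sh configure ...
```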

Other variables possibly in need of maintaining (in case the user uses the GNU software development stack or a compatible one) are CPATH for the C header files, now scattered across various include directories among the packages in the /app directory, LDFLAGS since the linker has to know where to look for the libraries, and PKG_CONFIG_PATH in case the system uses pkg-config or equivalent to build packages. The package configuration files can also be traitorous to this scheme, so the author recommends checking them after installation, as they might contain hardcoded paths to other packages (like even the C library as seen in, e.g., ncurses). The author actually found pkg-config and the package configuration files really helpful in maintaining parts of his systems, especially because of the ability of pkg-config to return proper header and library file locations, so package build systems aware of pkg-config did not require all of the paths present in the CPATH and LDFLAGS variables. The configuration files are so simple they can be created by the user and kept in their respective package directories, or maybe even in a shared directory for all package definitions as some kind of a central package registry for the entire system. The author found it is possible to write wrapper scripts returning even the --sysroot switch for the C library in a clever fashion when trying to automate builds without any predefined variables, but later abandoned this scheme for simpler ways to maintain the variables, since the number of the packages he needed was finite.
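To show how simple a package configuration file is, the sketch below prints one for a library kept in its own package directory. The field names (Name, Description, Version, Libs, Cflags) and the variable syntax are those of the pkg-config file format; the function name write_pc, the description text and the assumption that the library is linked as -l followed by its name are made up for this example.

```shell
# Print a minimal pkg-config file for a library in a package directory.
# $1 is the package directory, $2 the library name, $3 the version.
write_pc() {
    prefix=$1
    name=$2
    version=$3
    # Escaped dollar signs are written literally, so that pkg-config
    # expands ${prefix}, ${libdir} and ${includedir} itself.
    cat <<EOF
prefix=$prefix
libdir=\${prefix}/lib
includedir=\${prefix}/include

Name: $name
Description: $name installed under $prefix
Version: $version
Libs: -L\${libdir} -l$name
Cflags: -I\${includedir}
EOF
}

# Possible use for the example package x (the target path is illustrative):
#   write_pc /app/z/x-1.0 a 1.0 > /app/z/x-1.0/pkgconfig/a.pc
```

With the directory holding a.pc listed in PKG_CONFIG_PATH, running pkg-config --cflags --libs a would then return the include and library paths of the package.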

Some packages using pkg-config (like libpng) try to version the libraries or header directories themselves, so that the proper paths can be returned during build. Since the above conventions version the entire package directories, the user can remove the obvious redundancy during installation if they wish. The author does this and so far has found no problems with his approach.

There is one more thing to mention when it comes to package directories and it has been purposefully omitted until now: cross-compilation. This scheme should not be surprising for anyone maintaining large cross-compilation environments, but will be described here nonetheless. Since the author’s GCC is heavily patched, he keeps all of the compilers (and binutils) for all the supported architectures and recompiles them as needed. As an example, while the main native GCC compiler is /app/gnu/gcc-4.9.4/bin/gcc, the cross-compiler for, e.g., DPMI for i686 is /app/gnu/gcc-4.9.4/bin/i686-pc-pe-gcc. C++ programs require the standard C++ library (which is a part of GCC), so the native one is being kept in /app/gnu/gcc-4.9.4/lib while the one for i686 is in /app/gnu/gcc-4.9.4/i686-pc-linux-musl/lib, the obvious convention being simply a subdirectory named after the target triplet. The same convention applies to other libraries, such as ncurses, i.e., /app/gnu/ncurses-6.3/i686-pc-linux-musl/lib. The locations of programs (in case of GCC or GNU binutils) can be fine-tuned by the configure or Makefile variables (the reader should be extra careful when handling the tooldir variable) and the library locations can be either configured using the EPREFIX variable or individual subdirectory variables like LIBDIR. The reader should choose the appropriate way to deal with it to make it compatible with their cross-compilation environment, but setting up such an environment is beyond the scope of this article.
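The target triplet convention is easy to express as a helper that build scripts can share. This is only a sketch of the naming rule described above; the function name target_libdir is hypothetical.

```shell
# Print the library directory of a package for a given target.
# $1 is the package directory, $2 is an optional target triplet;
# an empty or missing triplet means the native libraries.
target_libdir() {
    pkg=$1
    target=${2:-}
    if [ -n "$target" ]; then
        printf '%s\n' "$pkg/$target/lib"
    else
        printf '%s\n' "$pkg/lib"
    fi
}

# Possible use when linking against a cross-compiled ncurses:
#   LDFLAGS="-L$(target_libdir /app/gnu/ncurses-6.3 i686-pc-linux-musl)"
```

The same value can be handed to the --libdir option of an Autoconf-generated configure script when installing a library for a foreign target.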

Conclusions

The author hopes this article will help people unsatisfied with their file system directory layouts realize that alternatives are possible.

Customizing the file system directory structure to such an extent is obviously pretty advanced and might not even be possible for some systems. One might also run into unexpected obstacles which are not covered by this article when trying to make the entire system consistent. The reader should always understand what they are doing and should not attempt it if they do not feel confident about their skills in operating system maintenance and/or distribution.

The presented directory layout is used by the author and is pretty stable (especially the hard constraints about the vendor and versioned package subdirectories), but the author sometimes makes subtle modifications to this scheme whenever they are needed, so the entire text should be treated only as a suggestion. To quote hier(7), “the position of files is subject to change without notice.”

References

Bell Telephone Laboratories. 1979. “hier - file system hierarchy,” Unix Programmer’s Manual: Seventh Edition, Volume 7. Murray Hill: Bell Telephone Laboratories. https://man.cat-v.org/unix_7th/7/hier.

Felker, Rich. 2019. musl 1.1.24 (Draft) Reference Manual. https://musl.libc.org/doc/1.1.24/manual.html.

Free Software Foundation. 2010. The GNU/Hurd User’s Guide. https://www.gnu.org/software/hurd/users-guide/using_gnuhurd.html.

Free Software Foundation. 2024. GNU Coding Standards. https://www.gnu.org/prep/standards/standards.html.

IEEE and The Open Group. 2024. The Open Group Base Specifications Issue 8. https://pubs.opengroup.org/onlinepubs/9799919799/.

The Linux Foundation. 2015. Filesystem Hierarchy Standard, Version 3.0. https://refspecs.linuxfoundation.org/FHS_3.0/fhs/index.html.