[Tutorial] Understanding and working with UNIX filesystems and permissions

Aragorn · 9 May 2021 13:00

Difficulty: ★★★☆☆

Note: This post is meant as a tutorial. Please do not post on this thread regarding any problems you’re having with permissions, but start a new thread instead. Thank you.

PREAMBLE: THE HISTORICAL BACKGROUND

If you are new to the world of GNU/Linux ─ and especially if you come from the Microsoft Windows ecosystem ─ then you will undoubtedly have already noticed that GNU/Linux handles permissions and storage volumes quite differently from what you might be used to and what you might expect. This is because GNU/Linux is a UNIX-family operating system, and UNIX is an operating system architecture that was specifically designed for concurrent multiuser access. As such, security is an integral aspect of the UNIX paradigm, unlike in Microsoft Windows, which started its life as a graphical user interface on top of MS-DOS, itself a single-user and single-tasking operating system for standalone computers without a network connection, and with a processor that had no memory management unit and no process privilege separation.

UNIX started off as a multitasking, multiuser operating system that was developed both on and for minicomputers at AT&T Bell Labs in 1969. Unlike what the name “minicomputer” may lead you to believe, those were pretty big machines that easily took up an entire wall in a room, but they were called “minicomputers” because the only other type of computer that existed at the time was a mainframe, which would readily take up an entire floor in a company building.

Design-wise, UNIX was heavily inspired by and modeled after the then experimental Multics mainframe operating system, which Dennis Ritchie, Ken Thompson, Joe Ossanna and Douglas McIlroy were administrating at AT&T Bell Labs.

Back at the time, the aforementioned gentlemen liked playing a (non-graphical) computer game ─ which they themselves had written ─ against each other while the mainframe was crunching away on some batch job. The only problem was that their game did of course consume CPU cycles on the mainframe, which is why they decided to turn toward an unused DEC PDP-7 minicomputer. But this particular PDP-7 didn’t have any operating system on it yet, and this is why UNIX was created as a scaled-down, less bulky version of Multics.

Later on, when their superiors at AT&T Bell Labs discovered UNIX, the aforementioned gentlemen convinced them that UNIX was perfectly capable of applying typesetting to documents for professional printing, and thus the system was accepted for internal use within the company. UNIX was then further developed and later on rewritten in C, a more or less platform-independent programming language Dennis Ritchie had created in the meantime.

Due to an antitrust case against AT&T Bell Labs as a US-government-subsidized corporation, AT&T Bell Labs was not allowed to market UNIX as a commercial operating system, and they were obligated to license the source code to any other entity that requested it. As such, the University of California, Berkeley adopted UNIX and continued developing and distributing it as the Berkeley Software Distribution (BSD), and with great success. Many other companies followed in their footsteps, and UNIX would eventually become an industry standard because of its reliability, scalability, flexibility, security and robustness.

As such, in 1983, UNIX also became the inspiration for the GNU Project ─ which was specifically created so as to promote software freedom ─ and in 1991, for Linus Torvalds, a then student at the University of Helsinki, who created his own kernel called… Linux.

Despite the high quality of the GNU software and the accolades it received from the professional world, GNU’s own kernel, although quite advanced in concept, was still not ready by 1991 ─ and it’s still not what one would consider production-ready even today as I’m writing this post. Conversely, Linus Torvalds had written a high-quality UNIX-style kernel, but he didn’t have any userland software of his own, and so he started using some of the GNU tools for the further development of his kernel, because the GNU software was readily available and it was excellent.

Cutting a long story short, eventually the GNU people started porting all of GNU’s userland to the Linux kernel, and as such, GNU/Linux was born, as a complete Free & Open Source Software operating system under the GNU General Public License.

DRIVE LETTERS vs. A UNIFIED DIRECTORY HIERARCHY

Having originated as a graphical user interface on top of MS-DOS (which was itself Tim Paterson’s not-too-legal 16-bit rewrite of Digital Research’s 8-bit CP/M operating system), Microsoft Windows still largely builds upon that legacy in how it approaches storage. As such, like MS-DOS and CP/M before it, as well as OS/2 (once intended as the more powerful successor to MS-DOS/PC-DOS), Microsoft Windows assigns drive letters to individual storage volumes.

The use of drive letters goes back to CP/M and the very earliest computers that ran MS-DOS or PC-DOS (the latter was IBM’s own version of MS-DOS, specifically intended for IBM-branded personal computers with a built-in BASIC interpreter). Back at the time, those consumer-oriented machines didn’t have any hard disk drives in them, and neither CP/M nor DOS even supported hard disk drives at first. Furthermore, CP/M and DOS also had no support for directories, or “folders”, as you millennials now call them. Everything was stored on a primitive, flat filesystem.

As the early personal computers didn’t have any hard disk drives, the most common storage media in those days were floppy disks, which were quite limited in storage capacity. As such, the drive letters were a convenient way to allow one to copy files from one floppy disk to another. If the machine had two floppy drives, then the drive from which it booted would be drive A: and the other one would be drive B:.

If the machine had only one floppy drive, then CP/M and DOS could use the designations A: and B: to distinguish between two different floppy disks alternatingly being inserted into the same drive, so that one could still copy a file from one floppy to another, with RAM as the intermediate storage while the floppies were being swapped ─ the operating system would prompt you to replace the floppy when the file(s) had been copied to RAM and press Return when the target floppy had been inserted into the drive. For bigger files, this often necessitated repeatedly swapping the floppies in the drive because of the limited amount of RAM that either of these two operating systems could access, not to mention the limitations of the hardware itself.

As of MS-DOS 2.x, DOS gained support for directories as well as for hard disk drives, or “fixed disks”, as IBM used to call them. For quite a while, there was however a 32-MiB size limitation on partitions, and only one primary partition was usable per hard disk drive after DOS had booted, even though many hard disk drives of the era had capacities of 40 MiB or more. This is why support for an extended partition container with multiple so-called logical partitions was added in DOS 3.3. The drive letters A: and B: remained in use for floppy drives, whether you had only one or two of them, and the marked-bootable primary partition on the first hard disk drive found by the system at boot time became designated as drive C:. If the machine had two distinct hard disk drives, then drive D: would be the marked-bootable primary partition on the second hard disk drive, and if not, then the drive letter D: was reserved for the first logical partition on the drive that held the volume C:.

Now, at the very lowest level, each individual filesystem naturally has its own root directory, and Microsoft Windows ─ like DOS ─ exposes this to the user. So if you’re on a Windows machine that has two storage volumes, you’ll have a drive C: with its own root directory, C:\, as well as a drive D: with its own root directory, D:\.

This is where UNIX systems do things differently. Although it is only natural that each individual filesystem has its own root directory ─ and just as an interesting note, btrfs and zfs can actually have multiple root directories within a single filesystem ─ UNIX does not expose this distinction to the end-user. Instead, like in Multics, UNIX uses a single logical root directory, and all other storage volumes are simply mounted to directories in the tree structure. This mounting can be done manually by someone with superuser privileges, or it can be done automatically at boot time, based upon information stored in the file /etc/fstab.

For instance, if you opt to create only a single partition during the installation of Manjaro, then the directory /home ─ which holds the home directories and data of all unprivileged user accounts in the system ─ will live on the root filesystem, and so will its contents. However, if you have opted to use a separate filesystem for /home, then /home itself is still a directory on the root filesystem ─ i.e. the filesystem that the kernel looks for when it boots, and which holds the logical root directory of the system ─ but the contents of /home will reside on a physically distinct filesystem, which is itself mounted at the directory /home. This is very transparent, because navigating to /home will then always take you to the home directories of the user accounts on your computer, regardless of whether this storage medium is separate from the root filesystem itself or not, and regardless of what physical storage medium these contents reside on, even if it’s a drive on another computer across the LAN.

We’ve already briefly touched upon the file /etc/fstab. Essentially, this file is a simple table, or otherwise put, a flat database, in which each line is a record, and each record contains six fields separated by whitespace ─ i.e. space characters or tabs. The layout of the fields (with an example) is as follows…


# STORAGE DEVICE  |  MOUNTPOINT |  TYPE  |  MOUNT OPTIONS  |  DUMP  |  PASS
/dev/sda3            /home         ext4     defaults          0        2

Now let’s take a look at what each of these fields represents…

The first field refers to the storage device that is to be mounted. In the example above, I have used the designation /dev/sda3, but because the order of the different drives in a machine with multiple drives attached is never guaranteed to remain the same across reboots, much better is to use the UUID, which is a unique identifier stored in the filesystem header when the filesystem is created. Alternatively, one can also use a LABEL, which is also stored in the filesystem header, but a LABEL is not guaranteed to be unique, given that it must be set by the biological unit between the keyboard and the chair. A couple of examples follow below…

UUID=some-long-string   /home                   ext4   defaults  0   2
LABEL=movies            /home/my-name/movies    ext4   defaults  0   0

Note: Instead of the UUID or a LABEL, both of which are stored in the filesystem’s own headers, it is also possible to use the PARTUUID or (if present) PARTLABEL, which are similar, but which are stored in the partition table itself, and which, unlike the regular UUID or LABEL, will not change if the filesystem is reformatted. (Note: PARTLABEL only exists on GPT drives. On drives with an MS-DOS MBR partition table, the PARTUUID is not a true UUID but a shorter identifier derived from the disk signature and the partition number.)

The second field is easy. It contains the name of the directory that the filesystem is to be mounted on. However, do note that when a filesystem is mounted on this directory, whatever was in the directory before the filesystem was mounted will be obscured until the filesystem is unmounted again. You can actually use this as a trick to help you diagnose problems with mounting, i.e. if you create a directory that another filesystem is to be mounted on later, then you can create a zero-length file in this directory with the name NOT_MOUNTED. By consequence, if you later on visit this directory by way of a file manager or you attempt to list the contents of this directory by way of the command line, and you then get to see a file named NOT_MOUNTED while you were expecting a whole list of other files, then you know what the problem is.
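The NOT_MOUNTED trick described above could be scripted roughly as follows. This is only a sketch: check_mountpoint is a name I made up for the occasion, and the scratch directory merely stands in for a real mountpoint.

```shell
# Hypothetical helper: report whether the marker file is still visible.
check_mountpoint() {
    if [ -e "$1/NOT_MOUNTED" ]; then
        echo "not mounted"
    else
        echo "mounted (or marker missing)"
    fi
}

# Demo in a scratch directory that stands in for the real mountpoint.
mp=$(mktemp -d)
: > "$mp/NOT_MOUNTED"      # zero-length marker, created while nothing is mounted
check_mountpoint "$mp"     # the marker is visible, so nothing covers the directory
rm -rf "$mp"
```

Once a filesystem is mounted on the directory, the marker is hidden and the helper takes the other branch. Note that it cannot tell a mounted filesystem apart from a directory in which the marker was never created.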

The third field specifies the type of filesystem, although in the event of a swap partition or a swap file, there actually isn’t any filesystem (the kernel directly accesses the raw drive blocks for paging data to the swap partition or swap file), so you should specify swap here as a placeholder. Note that for NTFS filesystems, one should use the type ntfs-3g here and not ntfs. The ntfs driver in the kernel is only good for read-only access and for editing a file by overwriting existing bytes without the file changing in size. So it’s not really usable. It is therefore much better to use the ntfs-3g type, which has a more functional driver. This more functional driver runs in userspace, not in the kernel (it could not be included in the kernel itself because of licensing reasons), which is why it must be explicitly specified in place of the built-in ntfs driver.

The fourth field comprises the mount options. Multiple mount options can be specified, so long as they are separated by commas without spaces. Which options are independent of the filesystem type, and which ones are the defaults for each supported type of filesystem, can be gleaned from the man page:

man mount

The fifth field can be set to either 0 or 1, but these days it should not be set to anything other than 0 anymore, although technically speaking, the value of 1 is still functional. This field denotes whether the filesystem must be backed up when using the dump command, an older utility for making backups. Nowadays much better and more flexible backup solutions exist, and so dump isn’t used much anymore, but it’s still supported for compatibility reasons.

The sixth field can be set to 0, 1 or 2. If set to 1 or 2, it denotes the order in which a filesystem must be checked for errors at boot: 1 is meant for the root filesystem, and 2 for all other filesystems that should be checked. If set to 0, the filesystem will not be checked, but concretely in Manjaro, the root filesystem is always checked from within the initramfs at every boot, even if the value is set to 0 in /etc/fstab.
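To make the six fields a bit more tangible, here is a small sketch that labels them for every record in an fstab-style file. The function name show_fstab and the sample content are my own inventions; on a real system you would pass it /etc/fstab instead.

```shell
# Hypothetical helper: name the six whitespace-separated fields of each record.
show_fstab() {
    awk '
        NF == 0 || $1 ~ /^#/ { next }   # skip blank lines and comments
        {
            printf "device=%s mountpoint=%s type=%s options=%s dump=%s pass=%s\n",
                   $1, $2, $3, $4, $5, $6
        }
    ' "$1"
}

# Demo on a throwaway sample rather than on the real /etc/fstab.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# STORAGE DEVICE        MOUNTPOINT   TYPE   MOUNT OPTIONS      DUMP   PASS
UUID=some-long-string   /home        ext4   defaults,noatime   0      2
EOF
show_fstab "$sample"
rm -f "$sample"
```

Because the fields are separated by whitespace, this simple approach would trip over a mountpoint that contains a space, which fstab expects to be written in escaped form anyway.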

Now, as a useful tip, Manjaro uses systemd as a system manager, and if you’re going to be using external storage media ─ e.g. a USB-connected drive ─ then systemd will create a mountpoint for the device on the fly, somewhere under the /run hierarchy. Only ─ and we’ll get back to this later ─ /run is a tmpfs, which means that its contents only exist in virtual memory; they start out living in RAM, but they can be paged out to the swap partition or swap file if needed. And quite often, the permissions that systemd applies to the mountpoint or to the filesystem if it is not a UNIX-native filesystem ─ we’ll get back to this later as well ─ do not give you write access to the filesystem in question. You could then of course change the permissions on the mountpoint, but given that /run is a tmpfs, your change to the permissions will not persist across reboots.

This is why it’s best to forego the “automagic” that systemd applies and instead set up a static mountpoint for the filesystem in /etc/fstab. And if it’s a filesystem containing personal data, then it’s best to set up a mountpoint inside of your own home directory. That way, the mountpoint ─ i.e. a directory ─ will itself already have the correct ownership and permissions. This too I will get back to farther down in this post.

FILE OWNERSHIP AND PERMISSIONS

From all the way back in its earliest days, UNIX has had two very important aspects about it…

1. Everything is a file

In a UNIX system, every aspect of the system ─ from the hardware over to various other subsystems and communication between processes ─ is represented both toward the users and toward applications by way of an abstraction layer in the form of device special files, all of which live under /dev. In GNU/Linux concretely, /dev is a special type of tmpfs ─ a filesystem in virtual memory ─ and its contents are dynamically managed by the udev subsystem of the systemd project, even in GNU/Linux distributions that do not use systemd as a system manager daemon; the Gentoo distribution has developed its own udev fork called eudev, which is now also used by yet other non-systemd-based distributions such as Slackware.

To give you a concrete example, say that your computer has a single HDD (hard disk drive) or SSD (solid-state drive) in it, connected to the motherboard by way of an SATA cable. Your drive will thus be presented to userspace as the device special file /dev/sda. The first partition on that drive will be presented to userspace as /dev/sda1, and so on for the following partitions. (Note: I am not going to touch upon the partition numbering differences between GPT-partitioned drives and MBR-partitioned drives in the context of this tutorial, because most members of this forum are using machines that boot in native UEFI mode and have GPT-partitioned drives.)

There are different types of device special files, which I will get into farther down when we talk about the permissions masks, but what you should keep in mind about this abstraction of the hardware in the form of a filesystem is that it allows for the various aspects of the system ─ e.g. a drive, a console or some other peripheral ─ to be read from and written to just as you would read from or write to any other file, provided that you yourself have the permission to do so.
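As a harmless illustration of that last point, the sketch below reads from a device special file with exactly the same tools one would use on a regular file. The helper name first_bytes is hypothetical, and I am assuming that /dev/zero and /dev/null exist, as they do on any normal GNU/Linux system.

```shell
# Hypothetical helper: dump the first N bytes of any file as hexadecimal.
first_bytes() {
    head -c "$2" "$1" | od -An -tx1
}

# /dev/zero is a character device, yet it is read like any other file.
first_bytes /dev/zero 4

# Writing to /dev/null is an ordinary write; the data is simply discarded.
echo "discard me" > /dev/null && echo "write succeeded"
```

The same helper works unchanged on a regular file, which is precisely the point of the abstraction.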

2. Every file has an owner, a group and a permissions mask

Whenever you look at a detailed directory listing in the file manager of your choice, or you issue the command…

ls -lh

… on a non-empty directory, you will get to see not just the filename and the size of the file, but also a bunch of additional information, among which the permissions masks of the files or directories. An example follows below…

[nx-74205:/dev/pts/3][/home/aragorn]
[08:03:49][aragorn] > ls -lh /tmp
total 4.0K
drwx------ 3 aragorn aragorn 80 May  9 03:54 checkup-db-1000
drwx------ 2 aragorn aragorn 60 May  7 02:05 claws-mail-1000
srwxrwxrwx 1 aragorn aragorn  0 May  7 02:03 dbus-3WQJz0SEHW
drwx------ 2 aragorn aragorn 40 May  7 01:59 plasma-csd-generator.cyxiXk
drwx------ 2 aragorn aragorn 60 May  7 01:59 plasmashell-rbRNjZ
-rw------- 1 aragorn aragorn  0 May  7 01:59 qipc_sharedmemory_MSMNotifierforPlasmad7cd3d9ed002d00038761dcd548b1461629c9f02
-rw------- 1 aragorn aragorn  0 May  7 01:59 qipc_systemsem_MSMNotifierforPlasmad7cd3d9ed002d00038761dcd548b1461629c9f02
srwx------ 1 sddm    sddm     0 May  7 01:59 sddm-:0-DRFlYV
srwxr-xr-x 1 root    root     0 May  7 01:59 sddm-auth4ea63e88-3580-43c7-9944-9193bd94913b
drwx------ 3 root    root    60 May  7 01:59 systemd-private-17c791c340b64bffb6e46b7f1ed9fe99-systemd-logind.service-4c67ar
drwx------ 3 root    root    60 May  7 01:59 systemd-private-17c791c340b64bffb6e46b7f1ed9fe99-systemd-timesyncd.service-jGuaP4
drwx------ 3 root    root    60 May  7 01:59 systemd-private-17c791c340b64bffb6e46b7f1ed9fe99-upower.service-IaWjxI
drwx------ 2 aragorn aragorn 40 May  8 03:26 trizen-aragorn
-rw------- 1 aragorn aragorn 53 May  7 01:59 xauth-1000-_0

The first column in the terminal output pasted above represents the permissions masks of the contents of my /tmp directory at the time of my writing this post. Each permissions mask is comprised of 10 characters, which I will divide into four groups below for clarification. Let’s take the file dbus-3WQJz0SEHW from the directory listing above as an example.

s rwx rwx rwx

The first group comprises only a single character. In this case, it is the letter s, which denotes that the file dbus-3WQJz0SEHW is a special type of file, i.e. a UNIX domain socket, which is a file used for interprocess communication. There are also other file types, of course.

-  = a regular file
d  = a directory
l  = a symbolic link (also called "symlink" or "soft-link")
p  = a named pipe
s  = a socket
b  = a block device special file 
c  = a character device special file

Now, if we go over these different types of files, I guess I don’t need to explain what a regular file and a directory are, but the other ones do require a bit of explanation.

A symbolic link is a very special type of file that points toward another (real or non-existing) file or directory by way of an absolute or relative path. You could compare it to the “shortcut” in Microsoft Windows, except that a Windows shortcut is only implemented in the graphical user interface, whereas a symbolic link in UNIX is implemented in the filesystem itself.

Symbolic links can be broken, i.e. if the file or directory they point at has in the meantime been moved, deleted or renamed. Also, the path to the file or directory the symbolic link points at can be absolute ─ i.e. it contains the full path from the root directory down to the file or directory ─ or it can be relative to the location of the symlink in the directory hierarchy, as in the examples below…

somefile            : this file exists in the same directory as the symbolic link itself
./somefile          : this file exists in the same directory as the symbolic link itself
somedir/somefile    : this file exists in the directory "somedir", which itself exists in the same directory as the symbolic link itself
../somefile         : this file exists in the parent directory of the directory that the symlink sits in

An important note here is that if you try changing the permissions on a symbolic link, then the permissions of the symlink itself are not changed, but instead the changes are imparted upon the file or directory that the symlink points to, although when traversing directories in a recursive operation, the permissions on any targets that symlinks in subdirectories point at will not be changed.

In most cases, you will find that the symbolic links on your system all have the permissions lrwxrwxrwx, but this is just a convention, as the permissions on a symlink itself are never even read and ─ at least in GNU/Linux ─ cannot be altered.
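The behavior of relative and broken symlinks can be tried out safely in a scratch directory. The following is merely a sketch, and is_broken_link is a helper name of my own choosing.

```shell
# Hypothetical helper: a link is broken if it exists but its target does not.
is_broken_link() {
    [ -L "$1" ] && [ ! -e "$1" ]
}

work=$(mktemp -d)
mkdir "$work/somedir"
echo "hello" > "$work/somedir/somefile"

# Relative target: resolved against the directory that holds the link.
ln -s somedir/somefile "$work/link"
cat "$work/link"          # follows the link and prints the file contents
readlink "$work/link"     # prints the stored path, not the resolved one

# Removing the target leaves a broken (dangling) link behind.
rm "$work/somedir/somefile"
if is_broken_link "$work/link"; then
    echo "link is broken"
fi
rm -rf "$work"
```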

A named pipe is a form of I/O redirection along the FIFO (“first in, first out”) principle, whereby a process can temporarily store its output in the FIFO, so that another process can later read said data from the pipe as its input.

A socket is a way for distinct userspace processes to communicate with one another on the same host. Processes use them through the same programming interface as network connections, but the data never leaves the machine.

A block device special file is a special type of file under the /dev hierarchy that represents a storage device. Such devices are read from and written to by way of entire blocks.

A character device special file is a special type of file under the /dev hierarchy that represents any aspect of the system that can be read from or written to byte by byte. Examples would be the local console (/dev/console), a character mode virtual console (e.g. /dev/tty2), a serial console or modem (e.g. /dev/ttyS0), a terminal emulator in a GUI environment (e.g. /dev/pts/1), the NULL device (/dev/null), and so on.
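The file type characters listed above can also be determined from within a script by way of the shell’s own file tests. What follows is a sketch under that assumption, with file_type_char being a made-up name.

```shell
# Hypothetical helper: print the type character that ls -l would show.
file_type_char() {
    # Test for symlinks first, because the other tests follow a link to its target.
    if   [ -L "$1" ]; then printf '%s\n' l
    elif [ -d "$1" ]; then printf '%s\n' d
    elif [ -p "$1" ]; then printf '%s\n' p
    elif [ -S "$1" ]; then printf '%s\n' s
    elif [ -b "$1" ]; then printf '%s\n' b
    elif [ -c "$1" ]; then printf '%s\n' c
    elif [ -f "$1" ]; then printf '%s\n' -
    else printf '%s\n' '?'
    fi
}

# Demo: create one specimen of the types that need no special privileges.
work=$(mktemp -d)
: > "$work/regular"
mkdir "$work/directory"
ln -s regular "$work/symlink"
mkfifo "$work/pipe"
for f in "$work/regular" "$work/directory" "$work/symlink" "$work/pipe" /dev/null; do
    printf '%s  %s\n' "$(file_type_char "$f")" "$f"
done
rm -rf "$work"
```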

Now that we’ve discussed the different types of files, let’s take a look again at the rest of the permissions mask. As you could see higher up, I split the whole mask up into, first, the character that identifies the type of file, and then three more groups of three characters each. We call these triads, and there are three triads for every file and every directory…:

for the user who owns the file (u)
for the group of the file (g)
for all others (o)

Now let’s take a look at those triads again by way of the typical permissions mask on an executable.

[nx-74205:/dev/pts/3][/home/aragorn]
[09:01:25][aragorn] > ls -l /usr/bin/kate
-rwxr-xr-x 1 root root 788464 Apr 16 23:20 /usr/bin/kate

As you can see, the first character is - because kate is a regular file. Then next we have the three triads formed by the letters rwx, r-x and again r-x. Behind that you see the digit 1 and two instances of the word root. This is important, because it means that the file /usr/bin/kate is owned by the root user, and that the group for the file is also the root user’s group.

The first triad rwx represents the permissions that the user has who owns the file ─ in this case, the root user.

r means that this user (the root user) can read the file
w means that this user (the root user) can write to the file
x means that this user (the root user) can execute the file

Note: The x permission on a directory means that the directory is searchable. If you happen to have r permission on a directory but you don’t have the x permission, then you will be able to list the directory contents by way of the ls command, but you won’t be able to cd into it or open it with a file manager.

As you can see, the second and third triad have r-x permissions, which means that neither the root’s group nor anyone else in the system ─ and that includes processes as well as people ─ have write access to the file, but they do all have permission to read the file and execute it. (For those who do not know, kate stands for “KDE Advanced Text Editor”, which is an application, and thus it needs to have execute permission for everyone.)

Permissions can be set or altered by way of the chmod command, which accepts them either symbolically, as letters, or in the form of an octal number. Likewise, you may also come across permission references here at the forum in this numerical form. So let’s dig a little deeper into how that works.

First of all, in their basic form ─ there are special exceptions, which I’ll get into farther down ─ the r, w and x permissions are bits. This means that they can only have two values; either they are set or they are unset. In other words, their value is binary ─ it is either “1” or “0”.

Now, if you then consider that each triad is comprised of three bits, then the combination of permissions that are set within that triad can be represented by an octal number, because the minimum value is 0 (when all three bits are unset), and the maximum value is 7 (when all three bits are set). And now you’re probably thinking, “Why an octal number? Why not a decimal number?” Well, three bits can hold exactly eight different values (0 through 7), and that is precisely the range of a single octal digit. As such, each triad maps onto exactly one digit, with nothing left over.

For those of you who aren’t good with numbers, here’s a little table with the possible permissions values within a single triad. Note that some of these permissions below are nonsensical from the practical standpoint, but they are shown just for the sake of being able to understand how a permissions mask translates into an octal value.

PERMISSIONS           BINARY VALUE    OCTAL VALUE

    ---                  000              0
    --x                  001              1
    -w-                  010              2
    -wx                  011              3
    r--                  100              4
    r-x                  101              5
    rw-                  110              6
    rwx                  111              7

When it comes to file permissions, the most common way you’ll see these octal numbers being used is in reference to all three of the triads, i.e. user, group and others. A typical example for a system directory would be 755, which translates into drwxr-xr-x permissions. For non-executable files, a typical example would be 644, which translates to -rw-r--r--.
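For those who like to see the arithmetic spelled out, the translation from a permissions mask to its octal form can be sketched as below. Both function names are hypothetical, and the input is assumed to be the nine permission characters without the leading file type character.

```shell
# Hypothetical helper: convert one triad such as r-x into its octal digit.
triad_to_octal() {
    value=0
    case $1 in r??) value=$((value + 4)) ;; esac   # r contributes 4
    case $1 in ?w?) value=$((value + 2)) ;; esac   # w contributes 2
    case $1 in ??x) value=$((value + 1)) ;; esac   # x contributes 1
    echo "$value"
}

# Hypothetical helper: convert all three triads, e.g. rwxr-xr-x.
mask_to_octal() {
    u=$(triad_to_octal "$(printf '%s' "$1" | cut -c1-3)")
    g=$(triad_to_octal "$(printf '%s' "$1" | cut -c4-6)")
    o=$(triad_to_octal "$(printf '%s' "$1" | cut -c7-9)")
    echo "$u$g$o"
}

mask_to_octal rwxr-xr-x   # prints 755
mask_to_octal rw-r--r--   # prints 644
```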

Important note: What permissions you actually have with regard to any given file not only depends on the permissions of the file itself, but also on the permissions of the directory containing the file.

In order to modify the contents of a file, you need write permission on the file itself.
In order to delete the file or rename the file, you need write permission on the directory containing the file, because those are write operations on the directory itself, not on the actual file. This is because the name of the file is not a property of the file itself. The name of the file is an entry in the table of contents of the directory. The file itself is identified by its inode, which is its entry in the filesystem itself, and which is also where the permissions and ownership of the file are stored.
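That the name is merely a directory entry and not a property of the file can be checked by looking at the inode number before and after a rename. This is a sketch that assumes GNU stat, and inode_of is a name I made up.

```shell
# Hypothetical helper: print the inode number of a file (GNU stat syntax).
inode_of() {
    stat -c '%i' "$1"
}

work=$(mktemp -d)
echo "data" > "$work/old-name"
before=$(inode_of "$work/old-name")

# Renaming only rewrites the directory entry; the inode stays the same.
mv "$work/old-name" "$work/new-name"
after=$(inode_of "$work/new-name")

if [ "$before" = "$after" ]; then
    echo "same inode, different name"
fi
rm -rf "$work"
```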

Now, related to the concept of the filename and the inode is the number that you see behind the permissions but before the names of the owner and the group, as in the example below.

-rw------- 1 aragorn aragorn 53 May  7 01:59 xauth-1000-_0

In this example, the number is “1”. When it comes to a file, this number represents the link counter for the inode of that particular file. The link counter is the counter of names that the file has, because in UNIX, a file can have multiple names all at the same time, and each of those names can exist in other directories than the one you’re looking at, provided that these directories all exist within the same filesystem ─ this is important. And in the case of a file having multiple filenames, we speak of hard-links ─ as opposed to symlinks, soft-links or symbolic links, all three of which mean the same thing.

Indeed, a hard-link is nothing other than an additional name for any given file, and this additional name can be an entry in the same or another directory. So for instance, you could have a file /home/your-username/Documents/Invoice.pdf, which has a second link as /home/your-username/BadNews/I-dont-wanna-pay-this-much.pdf. Different directories, and different names ─ although the name doesn’t need to differ if it’s located in another directory ─ but for the filesystem, it’ll all be one and the same file, and its link counter will say “2” instead of “1”. If both filenames are deleted, then the link counter will be set to zero and the drive blocks occupied by the file will be marked as free for reuse again.

Notes:

When the link counter is reset to zero, the drive blocks are however not erased ─ they will only be overwritten when they are needed again for storing something else. Utilities for zeroing the blocks do exist, but better be careful with those, and there’s no point in using them on an SSD.
Because hard-links are simply additional links to the same inode, hard-linking cannot span across different filesystems.
Because the permissions, owner and group of any file or directory are stored in the inode itself, you cannot set different permissions, a different owner or a different group on the individual hard-links of a single file.
The Linux kernel no longer supports (user-created) hard-links for directories, although every directory will always contain two hard-links, i.e. . (which points at itself) and .. (which points at its parent directory). In the event of the root directory, the link to the parent also points at the root directory itself. Given that these two links have a name that starts with a period, these two entries will not show up in a normal directory listing. Also, because directories cannot be hard-linked anymore, the link counter for a directory is predictable: on most filesystems, such as ext4, it equals 2 plus the number of subdirectories it contains, i.e. one link for its name in the parent directory, one for its own . entry, and one for the .. entry in each of its subdirectories. (Some filesystems, such as btrfs, always report 1 for a directory instead.)
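The hard-link example with the invoice from higher up can be replayed in a scratch directory. Consider this a sketch that assumes GNU stat and a filesystem with hard-link support; link_count is a hypothetical helper.

```shell
# Hypothetical helper: print the link counter of a file (GNU stat syntax).
link_count() {
    stat -c '%h' "$1"
}

work=$(mktemp -d)
echo "invoice" > "$work/Invoice.pdf"
link_count "$work/Invoice.pdf"    # one name so far

# Add a second name for the same inode.
ln "$work/Invoice.pdf" "$work/I-dont-wanna-pay-this-much.pdf"
stat -c '%i %h %n' "$work/Invoice.pdf" "$work/I-dont-wanna-pay-this-much.pdf"

# Removing one name only decrements the counter; the data stays reachable.
rm "$work/Invoice.pdf"
cat "$work/I-dont-wanna-pay-this-much.pdf"
rm -rf "$work"
```

Both lines printed by the second stat call should start with the same inode number, which is what makes the two names one and the same file.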

SPECIAL PERMISSIONS

In addition to the permissions explained above, there are a couple of extra permissions, which when set will show a different presentation of the permissions mask, albeit that their values are stored in an additional field, and not in the place they appear in the permissions mask. After all, the regular permissions are bits, and they can have only two values.

SUID (“Set UID”) ─ -rwsr-x-r-x. When the SUID bit is set on an executable, then the executable will always be run with the UID (“user ID”) of the owner, regardless who executes it. When set on a directory, the Linux kernel ignores it, as well as if it is set on a file which is a script ─ e.g. a Bash, Python or Perl script. The octal value for setting the SUID bit by way of chmod is 4.
SGID (“Set GID”) ─ -rwxr-sr-x. When the SGID bit is set on an executable, then the executable will be run with the GID (“group ID”) of the file’s group, regardless of whether the user running the executable belongs to said group. When set on a directory, the SGID bit causes all files newly added to the directory to inherit the group ID of the directory. This is quite useful when multiple user accounts who all belong to the same user group have to work on files stored in a commonly writable directory. The octal value for setting the SGID bit by way of chmod is 2, given as a leading fourth digit, e.g. 2755.
Restricted Deletion Flag, also known as Sticky Bit ─ -rwxr-xr-t. When set on a file, this historically meant that the file’s text image would be kept on the swap device for faster access, but in practice it is never used for that anymore, and Linux ignores it on files. On a world-writable directory like /tmp or /var/tmp ─ which have drwxrwxrwt permissions ─ it prevents users from deleting or renaming each other’s files. The octal value for setting the sticky bit by way of chmod is 1, given as a leading fourth digit, e.g. 1777.
Restricted Execution ─ X. Unlike the three flags above, this is not a stored permission and it never shows up in the output of ls -l; it is a symbolic mode understood by chmod. On a file, execute permission is added only if at least one of the owner, the group or the others already has execute permission on it. On a directory, it simply means that the directory can be traversed. In practice, it is rarely needed outside of recursive chmod runs that must make directories traversable without making regular files executable ─ in my 20+ years of exclusively using GNU/Linux, I’ve never come across it. There is no octal mode associated with it, so it must literally be set with…

chmod +X /path/to/filename
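
To see how these flags show up in the permissions mask, the short sketch below sets each of them on scratch files and reads the result back. It assumes GNU stat and an unprivileged user working on files of their own; all file and directory names are invented for the example.

```shell
demo=$(mktemp -d)
touch "$demo/prog" "$demo/plain" "$demo/script"
mkdir "$demo/shared" "$demo/subdir"

# SUID: the owner's x becomes s
chmod 4755 "$demo/prog"
suid_mask=$(stat -c '%A' "$demo/prog")

# SGID: the group's x becomes s
chmod 2755 "$demo/prog"
sgid_mask=$(stat -c '%A' "$demo/prog")

# Sticky bit on a world-writable directory: the others' x becomes t
chmod 1777 "$demo/shared"
sticky_mask=$(stat -c '%A' "$demo/shared")

# Capital X: adds execute only to directories and to files that
# already have at least one execute bit set.
chmod 644 "$demo/plain"
chmod 744 "$demo/script"
chmod 700 "$demo/subdir"
chmod a+X "$demo/plain" "$demo/script" "$demo/subdir"
plain_mode=$(stat -c '%a' "$demo/plain")
script_mode=$(stat -c '%a' "$demo/script")
subdir_mode=$(stat -c '%a' "$demo/subdir")

printf 'SUID:   %s\nSGID:   %s\nsticky: %s\n' "$suid_mask" "$sgid_mask" "$sticky_mask"
printf 'a+X on 644 file: %s, on 744 file: %s, on 700 directory: %s\n' \
    "$plain_mode" "$script_mode" "$subdir_mode"
```

Note that the plain file stays at 644, because none of its execute bits were set beforehand.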

UMASK

In UNIX systems, the command umask is used in combination with a three-digit octal number ─ one digit for each permissions triad ─ to determine the permissions that a new file or new directory will be created with, but in an inverse way, i.e. the octal number does not represent the permissions that the file or directory will be created with, but the permission bits that must be removed from 777 for directories and 666 for files. For the commonly used values, this works out the same as subtracting the umask, so a umask of 022 means that new directories will be created with a 755 (drwxr-xr-x) permissions mask, and new files with a 644 (-rw-r--r--) permissions mask.
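
The effect of a umask is easy to verify in a scratch directory. The sketch below assumes GNU stat and a directory without default ACLs; the file names are invented. It also tries a umask of 033, where the bits are cleared rather than arithmetically subtracted: 666 minus 033 would give 633, but the file really ends up with 644.

```shell
demo=$(mktemp -d)

# Run in a subshell so the umask of the calling shell is left alone.
(
    umask 022
    touch "$demo/file022"
    mkdir "$demo/dir022"

    umask 033
    touch "$demo/file033"
    mkdir "$demo/dir033"
)

stat -c '%a %n' "$demo/file022" "$demo/dir022" "$demo/file033" "$demo/dir033"

# The same result computed by hand: base mode AND NOT umask.
file_mode=$(printf '%o' $(( 0666 & ~0033 )))
dir_mode=$(printf '%o' $(( 0777 & ~0033 )))
echo "umask 033 -> files $file_mode, directories $dir_mode"
```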

Normally, the root account’s umask should always be 022 ─ notwithstanding the fact that the permissions on the root user’s home directory /root should always be set to 700 or 750 ─ and this is usually also the case for unprivileged user accounts, but personally, I prefer a umask of 077 for unprivileged user accounts, for reasons of privacy on systems with multiple unprivileged accounts.

The umask is commonly set at login and in the shell’s configuration files ─ systemwide, the default usually comes from the UMASK entry in /etc/login.defs or from /etc/profile, and individual users can set it in ~/.bashrc or ~/.zshrc, depending on what shell they use.

UID vs. USER NAME

As you could see higher up, the command…

ls -l

… lists not only the permissions, the link counter, the size of the file, the time and date of last modification and the filename, but also the owner and group of each file. However, it is important to note here that the names of the owner and group are only mnemonics, and that to the system internally, they correspond to a numerical UID and GID.

The reason why this is important is that not all GNU/Linux distributions start numbering the UIDs and GIDs of unprivileged user accounts at the same value. In Arch and Manjaro, the first created unprivileged user account at installation time will have UID 1000 and GID 1000, but other distributions ─ e.g. PCLinuxOS ─ start creating their unprivileged user accounts with a UID and GID of 500. As such, it is quite possible for you to try accessing a Linux-native filesystem belonging to another GNU/Linux distribution on your machine and discover that your UID and GID don’t match anymore, and that as such you don’t have read and/or write access to those files anymore, because suddenly they are owned by a user who’s simply called “500”, and a group that’s called “500” as well.

Problems like that can be remedied with chown ─ executed with root privileges, of course ─ but you had better beware that if you then boot into the other distribution again, the UID and GID will be wrong there, because they no longer match the user account you have in that system.
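
The difference between the stored numbers and the displayed names can be made visible with a few commands. This is only a sketch, assuming GNU stat; the path in the commented-out chown line is hypothetical and would have to be adapted.

```shell
demo=$(mktemp -d)
touch "$demo/mine"

my_uid=$(id -u)
my_gid=$(id -g)

# %u is the numeric UID stored in the inode; %U and %G are the names
# that the system looks up for the stored numbers.
file_uid=$(stat -c '%u' "$demo/mine")
file_names=$(stat -c '%U:%G' "$demo/mine")

echo "I am UID $my_uid / GID $my_gid"
echo "owner as stored in the inode: UID $file_uid"
echo "owner as translated into names: $file_names"

# ls -n prints the numeric IDs instead of the names.
ls -ln "$demo"

# Re-owning files from another distribution needs root privileges
# (hypothetical path, shown as a comment only):
#   chown -R 1000:1000 /mnt/other-distro/home/your-user-account
```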

NTFS/FAT/VFAT/exFAT

A special word needs to be said about mounting NTFS or FAT and derivative filesystems in GNU/Linux. These filesystems do not support UNIX/POSIX permissions and file ownership, but because UNIX systems need POSIX permissions for proper security handling, the kernel will emulate such permissions in its virtual filesystem layer at mount time. What’s important to keep in mind here is that…

these permissions are virtual and are not stored on the actual NTFS or FAT(-derivative) filesystem; and that
these permissions are set for the whole filesystem at mount time and cannot be altered with chmod, nor can they be set for individual files and/or directories on the NTFS or FAT(-derivative) filesystem.

The same is true for the ownership and group. Both are set in the virtual filesystem layer when the filesystem is mounted, and remain the same throughout the whole time that the filesystem remains mounted into the tree.

Now, one can rely on the “automagic” provided by systemd, but it is more reliable to add a static mountpoint under your home directory for these alien filesystems, and have them mounted via an entry in /etc/fstab, where you can set the required mount options and the required fake UID and permissions that must be applied when the filesystem is mounted. An example of a /etc/fstab entry for an NTFS filesystem follows below…

UUID=some-long-string  /home/your-user-account/my-winfs  ntfs-3g  auto,nofail,uid=1000,gid=1000,utf8,umask=022,defaults   0   0
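
Because the uid= and gid= values have to match your own account, it can help to let the shell fill them in. The function below is a hypothetical helper, not part of any package, and the UUID passed to it is a placeholder; the real one can be looked up with lsblk -f or blkid. It only prints a line, so nothing is written to /etc/fstab.

```shell
# Hypothetical helper: print an /etc/fstab line for an NTFS volume,
# using the numeric UID and GID of the user who runs it.
ntfs_fstab_line() {
    fs_uuid=$1      # filesystem UUID, e.g. as reported by lsblk -f
    mountpoint=$2   # existing directory to mount the filesystem on
    printf 'UUID=%s  %s  ntfs-3g  auto,nofail,uid=%s,gid=%s,utf8,umask=022,defaults   0   0\n' \
        "$fs_uuid" "$mountpoint" "$(id -u)" "$(id -g)"
}

# "some-long-string" is a placeholder, not a real UUID.
ntfs_fstab_line "some-long-string" "$HOME/my-winfs"
```

The printed line can then be reviewed and added to /etc/fstab with root privileges.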

More information about the different mount options for NTFS, FAT/FAT32 and exFAT can be found by perusing…

man mount

Hopefully this post will have taught you something new. And just as hopefully, you won’t be breaking your system with this newfound knowledge. But if you do, you get to keep all the pieces.

linux-aarhus · 9 February 2022 17:41