1 .. SPDX-License-Identifier: GPL-2.0
6 Most filesystem developers will have encountered idmappings. They are used when
7 reading from or writing ownership to disk, reporting ownership to userspace, or
8 for permission checking. This document is aimed at filesystem developers that
9 want to know how idmappings work.
14 An idmapping is essentially a translation of a range of ids into another or the
15 same range of ids. The notational convention for idmappings that is widely used
20 ``u`` indicates the first element in the upper idmapset ``U`` and ``k``
21 indicates the first element in the lower idmapset ``K``. The ``r`` parameter
22 indicates the range of the idmapping, i.e. how many ids are mapped. From now
23 on, we will always prefix ids with ``u`` or ``k`` to make it clear whether
24 we're talking about an id in the upper or lower idmapset.
26 To see what this looks like in practice, let's take the following idmapping::
30 and write down the mappings it will generate::
36 From a mathematical viewpoint ``U`` and ``K`` are well-ordered sets and an
37 idmapping is an order isomorphism from ``U`` into ``K``. So ``U`` and ``K`` are
38 order isomorphic. In fact, ``U`` and ``K`` are always well-ordered subsets of
39 the set of all possible ids useable on a given system.
41 Looking at this mathematically briefly will help us highlight some properties
42 that make it easier to understand how we can translate between idmappings. For
43 example, we know that the inverse idmapping is an order isomorphism as well::
49 Given that we are dealing with order isomorphisms plus the fact that we're
50 dealing with subsets we can embed idmappings into each other, i.e. we can
51 sensibly translate between different idmappings. For example, assume we've been
52 given the three idmappings::
58 and id ``k11000`` which has been generated by the first idmapping by mapping
59 ``u1000`` from the upper idmapset down to ``k11000`` in the lower idmapset.
61 Because we're dealing with order isomorphic subsets it is meaningful to ask
62 what id ``k11000`` corresponds to in the second or third idmapping. The
63 straightforward algorithm to use is to apply the inverse of the first idmapping,
64 mapping ``k11000`` up to ``u1000``. Afterwards, we can map ``u1000`` down using
65 either the second or the third idmapping. The second
66 idmapping would map ``u1000`` down to ``k21000``. The third idmapping would map
67 ``u1000`` down to ``k31000``.
69 If we were given the same task for the following three idmappings::
75 we would fail to translate as the sets aren't order isomorphic over the full
76 range of the first idmapping anymore (however, they are order isomorphic over
77 the full range of the second idmapping). Neither the second nor the third idmapping
78 contains ``u1000`` in the upper idmapset ``U``. This is equivalent to not having
79 an id mapped. We can simply say that ``u1000`` is unmapped in the second and
80 third idmapping. The kernel will report unmapped ids as the overflowuid
81 ``(uid_t)-1`` or overflowgid ``(gid_t)-1`` to userspace.
83 The algorithm to calculate what a given id maps to is pretty simple. First, we
84 need to verify that the range can contain our target id. We will skip this step
85 for simplicity. After that if we want to know what ``id`` maps to we can do
88 - If we want to map from left to right::
93 - If we want to map from right to left::
98 Instead of "left to right" we can also say "down" and instead of "right to
99 left" we can also say "up". Obviously mapping down and up invert each other.
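
As a minimal sketch, the two directions can be written as a pair of Python
functions. The ``Idmapping`` tuple and the names ``map_down()`` and
``map_up()`` are made up for this illustration and, like the text above, the
sketch skips the range check:

.. code-block:: python

    from typing import NamedTuple


    class Idmapping(NamedTuple):
        u: int  # first id in the upper idmapset U
        k: int  # first id in the lower idmapset K
        r: int  # number of ids covered by the idmapping


    def map_down(m: Idmapping, id_: int) -> int:
        """Map left to right: from the upper into the lower idmapset."""
        return id_ - m.u + m.k


    def map_up(m: Idmapping, id_: int) -> int:
        """Map right to left: from the lower into the upper idmapset."""
        return id_ - m.k + m.u


    m = Idmapping(u=0, k=10000, r=10000)
    assert map_down(m, 1000) == 11000            # u1000 -> k11000
    assert map_up(m, 11000) == 1000              # k11000 -> u1000
    assert map_up(m, map_down(m, 1234)) == 1234  # down and up invert each other
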
101 To see whether the simple formulas above work, consider the following two
105 2. u500:k30000:r10000
107 Assume we are given ``k21000`` in the lower idmapset of the first idmapping. We
108 want to know what id this was mapped from in the upper idmapset of the first
109 idmapping. So we're mapping up in the first idmapping::
112 k21000 - k20000 + u0 = u1000
114 Now assume we are given the id ``u1100`` in the upper idmapset of the second
115 idmapping and we want to know what this id maps down to in the lower idmapset
116 of the second idmapping. This means we're mapping down in the second
120 u1100 - u500 + k30000 = k30600
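
The range check we skipped is easy to add. The following sketch is an
illustration only: the function names are hypothetical, ``None`` stands in
for "unmapped", and the range ``r10000`` of the first idmapping is an
assumption made for the example:

.. code-block:: python

    from typing import NamedTuple, Optional


    class Idmapping(NamedTuple):
        u: int  # first id in the upper idmapset
        k: int  # first id in the lower idmapset
        r: int  # number of ids mapped


    def map_down(m: Idmapping, id_: int) -> Optional[int]:
        """Map an upper id down, or return None if it is outside the range."""
        if not (m.u <= id_ < m.u + m.r):
            return None
        return id_ - m.u + m.k


    def map_up(m: Idmapping, id_: int) -> Optional[int]:
        """Map a lower id up, or return None if it is outside the range."""
        if not (m.k <= id_ < m.k + m.r):
            return None
        return id_ - m.k + m.u


    first = Idmapping(u=0, k=20000, r=10000)     # range assumed for this sketch
    second = Idmapping(u=500, k=30000, r=10000)

    assert map_up(first, 21000) == 1000      # k21000 -> u1000
    assert map_down(second, 1100) == 30600   # u1100 -> k30600
    assert map_down(second, 100) is None     # u100 lies below u500: unmapped
    assert map_up(second, 40000) is None     # k40000 is one past the last mapped id
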
125 In the context of the kernel an idmapping can be interpreted as mapping a range
126 of userspace ids into a range of kernel ids::
128 userspace-id:kernel-id:range
130 A userspace id is always an element in the upper idmapset of an idmapping of
131 type ``uid_t`` or ``gid_t`` and a kernel id is always an element in the lower
132 idmapset of an idmapping of type ``kuid_t`` or ``kgid_t``. From now on
133 "userspace id" will be used to refer to the well known ``uid_t`` and ``gid_t``
134 types and "kernel id" will be used to refer to ``kuid_t`` and ``kgid_t``.
136 The kernel is mostly concerned with kernel ids. They are used when performing
137 permission checks and are stored in an inode's ``i_uid`` and ``i_gid`` field.
138 A userspace id on the other hand is an id that is reported to userspace by the
139 kernel, or is passed by userspace to the kernel, or a raw device id that is
140 written or read from disk.
142 Note that we are only concerned with idmappings as the kernel stores them not
143 how userspace would specify them.
145 For the rest of this document we will prefix all userspace ids with ``u`` and
146 all kernel ids with ``k``. Ranges of idmappings will be prefixed with ``r``. So
147 an idmapping will be written as ``u0:k10000:r10000``.
149 For example, the id ``u1000`` is an id in the upper idmapset or "userspace
150 idmapset" starting with ``u0``. And it is mapped to ``k11000`` which is a
151 kernel id in the lower idmapset or "kernel idmapset" starting with ``k10000``.
153 A kernel id is always created by an idmapping. Such idmappings are associated
154 with user namespaces. Since we mainly care about how idmappings work we're not
155 going to be concerned with how idmappings are created nor how they are used
156 outside of the filesystem context. This is best left to an explanation of user
159 The initial user namespace is special. It always has an idmapping of the
164 which is an identity idmapping over the full range of ids available on this
167 Other user namespaces usually have non-identity idmappings such as::
171 When a process creates or wants to change ownership of a file, or when the
172 ownership of a file is read from disk by a filesystem, the userspace id is
173 immediately translated into a kernel id according to the idmapping associated
174 with the relevant user namespace.
176 For instance, consider a file that is stored on disk by a filesystem as being
179 - If a filesystem were to be mounted in the initial user namespace (as most
180 filesystems are) then the initial idmapping will be used. As we saw this is
181 simply the identity idmapping. This would mean id ``u1000`` read from disk
182 would be mapped to id ``k1000``. So an inode's ``i_uid`` and ``i_gid`` field
183 would contain ``k1000``.
185 - If a filesystem were to be mounted with an idmapping of ``u0:k10000:r10000``
186 then ``u1000`` read from disk would be mapped to ``k11000``. So an inode's
187 ``i_uid`` and ``i_gid`` would contain ``k11000``.
189 Translation algorithms
190 ----------------------
192 We've already seen briefly that it is possible to translate between different
193 idmappings. We'll now take a closer look at how that works.
198 This translation algorithm is used by the kernel in quite a few places. For
199 example, it is used when reporting back the ownership of a file to userspace
200 via the ``stat()`` system call family.
202 If we've been given ``k11000`` from one idmapping we can map that id up in
203 another idmapping. In order for this to work both idmappings need to contain
204 the same kernel id in their kernel idmapsets. For example, consider the
205 following idmappings::
208 2. u20000:k10000:r10000
210 and we are mapping ``u1000`` down to ``k11000`` in the first idmapping. We can
211 then translate ``k11000`` into a userspace id in the second idmapping using the
212 kernel idmapset of the second idmapping::
214 /* Map the kernel id up into a userspace id in the second idmapping. */
215 from_kuid(u20000:k10000:r10000, k11000) = u21000
217 Note how we can get back to the kernel id in the first idmapping by inverting
220 /* Map the userspace id down into a kernel id in the second idmapping. */
221 make_kuid(u20000:k10000:r10000, u21000) = k11000
223 /* Map the kernel id up into a userspace id in the first idmapping. */
224 from_kuid(u0:k10000:r10000, k11000) = u1000
226 This algorithm allows us to answer the question what userspace id a given
227 kernel id corresponds to in a given idmapping. In order to be able to answer
228 this question both idmappings need to contain the same kernel id in their
229 respective kernel idmapsets.
231 For example, when the kernel reads a raw userspace id from disk it maps it down
232 into a kernel id according to the idmapping associated with the filesystem.
233 Let's assume the filesystem was mounted with an idmapping of
234 ``u0:k20000:r10000`` and it reads a file owned by ``u1000`` from disk. This
235 means ``u1000`` will be mapped to ``k21000`` which is what will be stored in
236 the inode's ``i_uid`` and ``i_gid`` field.
238 When someone in userspace calls ``stat()`` or a related function to get
239 ownership information about the file the kernel can't simply map the id back up
240 according to the filesystem's idmapping as this would give the wrong owner if
241 the caller is using an idmapping.
243 So the kernel will map the id back up in the idmapping of the caller. Let's
244 assume the caller has the slightly unconventional idmapping
245 ``u3000:k20000:r10000`` then ``k21000`` would map back up to ``u4000``.
246 Consequently the user would see that this file is owned by ``u4000``.
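
The ``stat()`` example can be replayed with a small Python sketch. The
functions below only borrow the names ``make_kuid()`` and ``from_kuid()``;
they are simplified stand-ins that take a ``(u, k, r)`` tuple where the kernel
functions take a user namespace, and they return ``None`` for unmapped ids:

.. code-block:: python

    from typing import Optional, Tuple

    Idmapping = Tuple[int, int, int]  # (u, k, r)


    def make_kuid(m: Idmapping, uid: int) -> Optional[int]:
        """Map a userspace id down into a kernel id."""
        u, k, r = m
        return uid - u + k if u <= uid < u + r else None


    def from_kuid(m: Idmapping, kuid: int) -> Optional[int]:
        """Map a kernel id up into a userspace id."""
        u, k, r = m
        return kuid - k + u if k <= kuid < k + r else None


    filesystem = (0, 20000, 10000)   # u0:k20000:r10000
    caller = (3000, 20000, 10000)    # u3000:k20000:r10000

    # Reading u1000 from disk yields the kernel id stored in the inode.
    i_uid = make_kuid(filesystem, 1000)
    assert i_uid == 21000

    # stat() maps the kernel id up in the caller's idmapping, not the filesystem's.
    assert from_kuid(caller, i_uid) == 4000
    assert from_kuid(filesystem, i_uid) == 1000
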
251 It is possible to translate a kernel id from one idmapping to another one via
252 the userspace idmapset of the two idmappings. This is equivalent to remapping
255 Let's look at an example. We are given the following two idmappings::
260 and we are given ``k11000`` in the first idmapping. In order to translate this
261 kernel id in the first idmapping into a kernel id in the second idmapping we
262 need to perform two steps:
264 1. Map the kernel id up into a userspace id in the first idmapping::
266 /* Map the kernel id up into a userspace id in the first idmapping. */
267 from_kuid(u0:k10000:r10000, k11000) = u1000
269 2. Map the userspace id down into a kernel id in the second idmapping::
271 /* Map the userspace id down into a kernel id in the second idmapping. */
272 make_kuid(u0:k20000:r10000, u1000) = k21000
274 As you can see we used the userspace idmapset in both idmappings to translate
275 the kernel id in one idmapping to a kernel id in another idmapping.
277 This allows us to answer the question what kernel id we would need to use to
278 get the same userspace id in another idmapping. In order to be able to answer
279 this question both idmappings need to contain the same userspace id in their
280 respective userspace idmapsets.
282 Note how we can easily get back to the kernel id in the first idmapping by
283 inverting the algorithm:
285 1. Map the kernel id up into a userspace id in the second idmapping::
287 /* Map the kernel id up into a userspace id in the second idmapping. */
288 from_kuid(u0:k20000:r10000, k21000) = u1000
290 2. Map the userspace id down into a kernel id in the first idmapping::
292 /* Map the userspace id down into a kernel id in the first idmapping. */
293 make_kuid(u0:k10000:r10000, u1000) = k11000
295 Another way to look at this translation is to treat it as inverting one
296 idmapping and applying another idmapping if both idmappings have the relevant
297 userspace id mapped. This will come in handy when working with idmapped mounts.
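
Seen this way, remapping is just "map up in one idmapping, map down in the
other". A minimal sketch, where ``remap()`` is a hypothetical helper, the
tuple-based ``make_kuid()`` and ``from_kuid()`` are simplified stand-ins for
the kernel functions, and the third idmapping is invented to show the
unmapped case:

.. code-block:: python

    from typing import Optional, Tuple

    Idmapping = Tuple[int, int, int]  # (u, k, r)


    def make_kuid(m: Idmapping, uid: int) -> Optional[int]:
        u, k, r = m
        return uid - u + k if u <= uid < u + r else None


    def from_kuid(m: Idmapping, kuid: int) -> Optional[int]:
        u, k, r = m
        return kuid - k + u if k <= kuid < k + r else None


    def remap(kuid: int, src: Idmapping, dst: Idmapping) -> Optional[int]:
        """Translate a kernel id of src into a kernel id of dst."""
        uid = from_kuid(src, kuid)   # invert the source idmapping
        if uid is None:
            return None
        return make_kuid(dst, uid)   # apply the destination idmapping


    first = (0, 10000, 10000)
    second = (0, 20000, 10000)

    assert remap(11000, first, second) == 21000
    assert remap(21000, second, first) == 11000   # the inverse direction

    # Invented idmapping that does not have u1000 mapped: remapping fails.
    third = (5000, 30000, 1000)
    assert remap(11000, first, third) is None
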
302 It is never valid to use an id in the kernel idmapset of one idmapping as the
303 id in the userspace idmapset of another or the same idmapping. While the kernel
304 idmapset always indicates an idmapset in the kernel id space the userspace
305 idmapset indicates a userspace id. So the following translations are forbidden::
307 /* Map the userspace id down into a kernel id in the first idmapping. */
308 make_kuid(u0:k10000:r10000, u1000) = k11000
310 /* INVALID: Map the kernel id down into a kernel id in the second idmapping. */
311 make_kuid(u10000:k20000:r10000, k11000) = k21000
316 /* Map the kernel id up into a userspace id in the first idmapping. */
317 from_kuid(u0:k10000:r10000, k11000) = u1000
319 /* INVALID: Map the userspace id up into a userspace id in the second idmapping. */
320 from_kuid(u20000:k0:r10000, u1000) = u21000
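
One way to make such mistakes hard to write is to give userspace ids and
kernel ids distinct types, which is also the idea behind having separate
``uid_t`` and ``kuid_t`` types. The sketch below is only an illustration: the
``Uid``, ``Kuid`` and ``Idmapping`` classes are hypothetical, and the two
functions merely reuse the kernel function names:

.. code-block:: python

    from dataclasses import dataclass
    from typing import Optional


    @dataclass(frozen=True)
    class Uid:
        val: int   # id in a userspace idmapset


    @dataclass(frozen=True)
    class Kuid:
        val: int   # id in a kernel idmapset


    @dataclass(frozen=True)
    class Idmapping:
        u: int
        k: int
        r: int


    def make_kuid(m: Idmapping, uid: Uid) -> Optional[Kuid]:
        """Map a userspace id down; refuse anything that is not a Uid."""
        if not isinstance(uid, Uid):
            raise TypeError("make_kuid() only accepts userspace ids")
        if not (m.u <= uid.val < m.u + m.r):
            return None
        return Kuid(uid.val - m.u + m.k)


    def from_kuid(m: Idmapping, kuid: Kuid) -> Optional[Uid]:
        """Map a kernel id up; refuse anything that is not a Kuid."""
        if not isinstance(kuid, Kuid):
            raise TypeError("from_kuid() only accepts kernel ids")
        if not (m.k <= kuid.val < m.k + m.r):
            return None
        return Uid(kuid.val - m.k + m.u)


    first = Idmapping(0, 10000, 10000)
    second = Idmapping(10000, 20000, 10000)

    kuid = make_kuid(first, Uid(1000))
    assert kuid == Kuid(11000)

    try:
        make_kuid(second, kuid)      # the first forbidden translation above
    except TypeError as err:
        print("rejected:", err)
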
323 Idmappings when creating filesystem objects
324 -------------------------------------------
326 The concepts of mapping an id down or mapping an id up are expressed in the two
327 kernel functions filesystem developers are rather familiar with and which we've
328 already used in this document::
330 /* Map the userspace id down into a kernel id. */
331 make_kuid(idmapping, uid)
333 /* Map the kernel id up into a userspace id. */
334 from_kuid(idmapping, kuid)
336 We will take an abbreviated look into how idmappings figure into creating
337 filesystem objects. For simplicity we will only look at what happens when the
338 VFS has already completed path lookup right before it calls into the filesystem
339 itself. So we're concerned with what happens when e.g. ``vfs_mkdir()`` is
340 called. We will also assume that the directory we're creating filesystem
341 objects in is readable and writable for everyone.
343 When creating a filesystem object the caller will look at the caller's
344 filesystem ids. These are just regular ``uid_t`` and ``gid_t`` userspace ids
345 but they are exclusively used when determining file ownership which is why they
346 are called "filesystem ids". They are usually identical to the uid and gid of
347 the caller but can differ. We will just assume they are always identical to not
348 get lost in too many details.
350 When the caller enters the kernel two things happen:
352 1. Map the caller's userspace ids down into kernel ids in the caller's
354 (To be precise, the kernel will simply look at the kernel ids stashed in the
355 credentials of the current task but for our education we'll pretend this
356 translation happens just in time.)
357 2. Verify that the caller's kernel ids can be mapped up to userspace ids in the
358 filesystem's idmapping.
360 The second step is important as regular filesystems will ultimately need to map
361 the kernel id back up into a userspace id when writing to disk.
362 So with the second step the kernel guarantees that a valid userspace id can be
363 written to disk. If it can't the kernel will refuse the creation request to not
364 even remotely risk filesystem corruption.
366 The astute reader will have realized that this is simply a variation of the
367 crossmapping algorithm we mentioned in a previous section. First, the
368 kernel maps the caller's userspace id down into a kernel id according to the
369 caller's idmapping and then maps that kernel id up according to the
370 filesystem's idmapping.
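
The two steps can be condensed into a short predicate. This is a sketch under
assumptions: ``may_create()`` is a hypothetical name, the tuple-based
``make_kuid()`` and ``from_kuid()`` are simplified stand-ins for the kernel
functions, and a caller id that is unmapped in step 1 is simply treated as a
refusal:

.. code-block:: python

    from typing import Optional, Tuple

    Idmapping = Tuple[int, int, int]  # (u, k, r)
    IDENTITY: Idmapping = (0, 0, 4294967295)


    def make_kuid(m: Idmapping, uid: int) -> Optional[int]:
        u, k, r = m
        return uid - u + k if u <= uid < u + r else None


    def from_kuid(m: Idmapping, kuid: int) -> Optional[int]:
        u, k, r = m
        return kuid - k + u if k <= kuid < k + r else None


    def may_create(caller: Idmapping, filesystem: Idmapping, uid: int) -> bool:
        # Step 1: map the caller's userspace id down in the caller's idmapping.
        kuid = make_kuid(caller, uid)
        if kuid is None:
            return False
        # Step 2: the kernel id must map up in the filesystem's idmapping.
        return from_kuid(filesystem, kuid) is not None


    assert may_create(IDENTITY, IDENTITY, 1000)
    assert not may_create((0, 10000, 10000), (0, 20000, 10000), 1000)
    assert may_create((0, 10000, 10000), IDENTITY, 1000)
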
378 caller idmapping: u0:k0:r4294967295
379 filesystem idmapping: u0:k0:r4294967295
381 Both the caller and the filesystem use the identity idmapping:
383 1. Map the caller's userspace ids into kernel ids in the caller's idmapping::
385 make_kuid(u0:k0:r4294967295, u1000) = k1000
387 2. Verify that the caller's kernel ids can be mapped to userspace ids in the
388 filesystem's idmapping.
390 For this second step the kernel will call the function
391 ``fsuidgid_has_mapping()`` which ultimately boils down to calling
394 from_kuid(u0:k0:r4294967295, k1000) = u1000
396 In this example both idmappings are the same so there's nothing exciting going
397 on. Ultimately the userspace id that lands on disk will be ``u1000``.
405 caller idmapping: u0:k10000:r10000
406 filesystem idmapping: u0:k20000:r10000
408 1. Map the caller's userspace ids down into kernel ids in the caller's
411 make_kuid(u0:k10000:r10000, u1000) = k11000
413 2. Verify that the caller's kernel ids can be mapped up to userspace ids in the
414 filesystem's idmapping::
416 from_kuid(u0:k20000:r10000, k11000) = u-1
418 It's immediately clear that while the caller's userspace id could be
419 successfully mapped down into kernel ids in the caller's idmapping the kernel
420 ids could not be mapped up according to the filesystem's idmapping. So the
421 kernel will deny this creation request.
423 Note that while this example is less common, because most filesystems can't be
424 mounted with non-initial idmappings, this is a general problem as we can see in
433 caller idmapping: u0:k10000:r10000
434 filesystem idmapping: u0:k0:r4294967295
436 1. Map the caller's userspace ids down into kernel ids in the caller's
439 make_kuid(u0:k10000:r10000, u1000) = k11000
441 2. Verify that the caller's kernel ids can be mapped up to userspace ids in the
442 filesystem's idmapping::
444 from_kuid(u0:k0:r4294967295, k11000) = u11000
446 We can see that the translation always succeeds. The userspace id that the
447 filesystem will ultimately put to disk will always be identical to the value of
448 the kernel id that was created in the caller's idmapping. This has mainly two
451 First, that we can't allow a caller to ultimately write to disk with another
452 userspace id. We could only do this if we were to mount the whole filesystem
453 with the caller's or another idmapping. But that solution is limited to a few
454 filesystems and not very flexible. But this is a use-case that is pretty
455 important in containerized workloads.
457 Second, the caller will usually not be able to create any files or access
458 directories that have stricter permissions because none of the filesystem's
459 kernel ids map up into valid userspace ids in the caller's idmapping
461 1. Map raw userspace ids down to kernel ids in the filesystem's idmapping::
463 make_kuid(u0:k0:r4294967295, u1000) = k1000
465 2. Map kernel ids up to userspace ids in the caller's idmapping::
467 from_kuid(u0:k10000:r10000, k1000) = u-1
475 caller idmapping: u0:k10000:r10000
476 filesystem idmapping: u0:k0:r4294967295
478 In order to report ownership to userspace the kernel uses the crossmapping
479 algorithm introduced in a previous section:
481 1. Map the userspace id on disk down into a kernel id in the filesystem's
484 make_kuid(u0:k0:r4294967295, u1000) = k1000
486 2. Map the kernel id up into a userspace id in the caller's idmapping::
488 from_kuid(u0:k10000:r10000, k1000) = u-1
490 The crossmapping algorithm fails in this case because the kernel id in the
491 filesystem idmapping cannot be mapped up to a userspace id in the caller's
492 idmapping. Thus, the kernel will report the ownership of this file as the
501 caller idmapping: u0:k10000:r10000
502 filesystem idmapping: u0:k20000:r10000
504 In order to report ownership to userspace the kernel uses the crossmapping
505 algorithm introduced in a previous section:
507 1. Map the userspace id on disk down into a kernel id in the filesystem's
510 make_kuid(u0:k20000:r10000, u1000) = k21000
512 2. Map the kernel id up into a userspace id in the caller's idmapping::
514 from_kuid(u0:k10000:r10000, k21000) = u-1
516 Again, the crossmapping algorithm fails in this case because the kernel id in
517 the filesystem idmapping cannot be mapped to a userspace id in the caller's
518 idmapping. Thus, the kernel will report the ownership of this file as the
521 Note how in the last two examples things would be simple if the caller were
522 using the initial idmapping. For a filesystem mounted with the initial
523 idmapping it would be trivial. So we only consider a filesystem with an
524 idmapping of ``u0:k20000:r10000``:
526 1. Map the userspace id on disk down into a kernel id in the filesystem's
529 make_kuid(u0:k20000:r10000, u1000) = k21000
531 2. Map the kernel id up into a userspace id in the caller's idmapping::
533 from_kuid(u0:k0:r4294967295, k21000) = u21000
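
The ownership reporting examples differ only in the idmappings involved, so
they can be replayed with one helper. In this sketch ``reported_owner()`` is a
hypothetical name, the tuple-based ``make_kuid()`` and ``from_kuid()`` are
simplified stand-ins for the kernel functions, and ``None`` marks the cases
the text writes as ``u-1``:

.. code-block:: python

    from typing import Optional, Tuple

    Idmapping = Tuple[int, int, int]  # (u, k, r)
    IDENTITY: Idmapping = (0, 0, 4294967295)


    def make_kuid(m: Idmapping, uid: int) -> Optional[int]:
        u, k, r = m
        return uid - u + k if u <= uid < u + r else None


    def from_kuid(m: Idmapping, kuid: int) -> Optional[int]:
        u, k, r = m
        return kuid - k + u if k <= kuid < k + r else None


    def reported_owner(filesystem: Idmapping, caller: Idmapping,
                       disk_uid: int) -> Optional[int]:
        """Crossmap an on-disk userspace id into the caller's idmapping."""
        kuid = make_kuid(filesystem, disk_uid)
        if kuid is None:
            return None
        return from_kuid(caller, kuid)


    container = (0, 10000, 10000)

    # Non-initial caller: the kernel id cannot be mapped up.
    assert reported_owner(IDENTITY, container, 1000) is None
    assert reported_owner((0, 20000, 10000), container, 1000) is None

    # Caller with the initial idmapping: the translation succeeds.
    assert reported_owner((0, 20000, 10000), IDENTITY, 1000) == 21000
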
535 Idmappings on idmapped mounts
536 -----------------------------
538 The examples we've seen in the previous section where the caller's idmapping
539 and the filesystem's idmapping are incompatible cause various issues for
540 workloads. For a more complex but common example, consider two containers
541 started on the host. To completely prevent the two containers from affecting
542 each other, an administrator may often use different non-overlapping idmappings
543 for the two containers::
545 container1 idmapping: u0:k10000:r10000
546 container2 idmapping: u0:k20000:r10000
547 filesystem idmapping: u0:k30000:r10000
549 An administrator wanting to provide easy read-write access to the following set
556 to both containers currently can't.
558 Of course the administrator has the option to recursively change ownership via
559 ``chown()``. For example, they could change ownership so that ``dir`` and all
560 files below it can be crossmapped from the filesystem's into the container's
561 idmapping. Let's assume they change ownership so it is compatible with the
562 first container's idmapping::
568 This would still leave ``dir`` rather useless to the second container. In fact,
569 ``dir`` and all files below it would continue to appear owned by the overflowid
570 for the second container.
572 Or consider another increasingly popular example. Some service managers such as
573 systemd implement a concept called "portable home directories". A user may want
574 to use their home directories on different machines where they are assigned
575 different login userspace ids. Most users will have ``u1000`` as the login id
576 on their machine at home and all files in their home directory will usually be
577 owned by ``u1000``. At uni or at work they may have another login id such as
578 ``u1125``. This makes it rather difficult to interact with their home directory
579 on their work machine.
581 In both cases changing ownership recursively has grave implications. The most
582 obvious one is that ownership is changed globally and permanently. In the home
583 directory case this change in ownership would even need to happen every time the
584 user switches from their home to their work machine. For really large sets of
585 files this becomes increasingly costly.
587 If the user is lucky, they are dealing with a filesystem that is mountable
588 inside user namespaces. But this would also change ownership globally and the
589 change in ownership is tied to the lifetime of the filesystem mount, i.e. the
590 superblock. The only way to change ownership is to completely unmount the
591 filesystem and mount it again in another user namespace. This is usually
592 impossible because it would mean that all users currently accessing the
593 filesystem can't anymore. And it means that ``dir`` still can't be shared
594 between two containers with different idmappings.
595 But usually the user doesn't even have this option since most filesystems
596 aren't mountable inside containers. And not having them mountable might be
597 desirable as it doesn't require the filesystem to deal with malicious
600 But the use cases mentioned above and more can be handled by idmapped mounts.
601 They allow exposing the same set of dentries with different ownership at
602 different mounts. This is achieved by marking the mounts with a user namespace
603 through the ``mount_setattr()`` system call. The idmapping associated with it
604 is then used to translate from the caller's idmapping to the filesystem's
605 idmapping and vice versa using the remapping algorithm we introduced above.
607 Idmapped mounts make it possible to change ownership in a temporary and
608 localized way. The ownership changes are restricted to a specific mount and the
609 ownership changes are tied to the lifetime of the mount. All other users and
610 locations where the filesystem is exposed are unaffected.
612 Filesystems that support idmapped mounts don't have any real reason to support
613 being mountable inside user namespaces. A filesystem could be exposed
614 completely under an idmapped mount to get the same effect. This has the
615 advantage that filesystems can leave the creation of the superblock to
616 privileged users in the initial user namespace.
618 However, it is perfectly possible to combine idmapped mounts with filesystems
619 mountable inside user namespaces. We will touch on this further below.
624 Idmapping functions were added that translate between idmappings. They make use
625 of the remapping algorithm we've introduced earlier. We're going to look at
628 - ``i_uid_into_mnt()`` and ``i_gid_into_mnt()``
630 The ``i_*id_into_mnt()`` functions translate filesystem's kernel ids into
631 kernel ids in the mount's idmapping::
633 /* Map the filesystem's kernel id up into a userspace id in the filesystem's idmapping. */
634 from_kuid(filesystem, kid) = uid
636 /* Map the filesystem's userspace id down into a kernel id in the mount's idmapping. */
637 make_kuid(mount, uid) = kuid
639 - ``mapped_fsuid()`` and ``mapped_fsgid()``
641 The ``mapped_fs*id()`` functions translate the caller's kernel ids into
642 kernel ids in the filesystem's idmapping. This translation is achieved by
643 remapping the caller's kernel ids using the mount's idmapping::
645 /* Map the caller's kernel id up into a userspace id in the mount's idmapping. */
646 from_kuid(mount, kid) = uid
648 /* Map the mount's userspace id down into a kernel id in the filesystem's idmapping. */
649 make_kuid(filesystem, uid) = kuid
651 Note that these two functions invert each other. Consider the following
654 caller idmapping: u0:k10000:r10000
655 filesystem idmapping: u0:k20000:r10000
656 mount idmapping: u0:k10000:r10000
658 Assume a file owned by ``u1000`` is read from disk. The filesystem maps this id
659 to ``k21000`` according to its idmapping. This is what is stored in the
660 inode's ``i_uid`` and ``i_gid`` fields.
662 When the caller queries the ownership of this file via ``stat()`` the kernel
663 would usually simply use the crossmapping algorithm and map the filesystem's
664 kernel id up to a userspace id in the caller's idmapping.
666 But when the caller is accessing the file on an idmapped mount the kernel will
667 first call ``i_uid_into_mnt()`` thereby translating the filesystem's kernel id
668 into a kernel id in the mount's idmapping::
670 i_uid_into_mnt(k21000):
671 /* Map the filesystem's kernel id up into a userspace id. */
672 from_kuid(u0:k20000:r10000, k21000) = u1000
674 /* Map the filesystem's userspace id down into a kernel id in the mount's idmapping. */
675 make_kuid(u0:k10000:r10000, u1000) = k11000
677 Finally, when the kernel reports the owner to the caller it will turn the
678 kernel id in the mount's idmapping into a userspace id in the caller's
681 from_kuid(u0:k10000:r10000, k11000) = u1000
683 We can test whether this algorithm really works by verifying what happens when
684 we create a new file. Let's say the user is creating a file with ``u1000``.
686 The kernel maps this to ``k11000`` in the caller's idmapping. Usually the
687 kernel would now apply the crossmapping, verifying that ``k11000`` can be
688 mapped to a userspace id in the filesystem's idmapping. Since ``k11000`` can't
689 be mapped up in the filesystem's idmapping directly this creation request
692 But when the caller is accessing the file on an idmapped mount the kernel will
693 first call ``mapped_fs*id()`` thereby translating the caller's kernel id into
694 a kernel id according to the mount's idmapping::
696 mapped_fsuid(k11000):
697 /* Map the caller's kernel id up into a userspace id in the mount's idmapping. */
698 from_kuid(u0:k10000:r10000, k11000) = u1000
700 /* Map the mount's userspace id down into a kernel id in the filesystem's idmapping. */
701 make_kuid(u0:k20000:r10000, u1000) = k21000
703 When finally writing to disk the kernel will then map ``k21000`` up into a
704 userspace id in the filesystem's idmapping::
706 from_kuid(u0:k20000:r10000, k21000) = u1000
708 As we can see, we end up with an invertible and therefore information
709 preserving algorithm. A file created from ``u1000`` on an idmapped mount will
710 also be reported as being owned by ``u1000`` and vice versa.
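
The invertibility claim can be checked mechanically. The sketch below follows
the two-step descriptions given in the text; the Python functions reuse the
names ``i_uid_into_mnt()`` and ``mapped_fsuid()`` for readability, but their
parameters are an assumption of this sketch and do not match the kernel
signatures. ``None`` is passed through to represent an unmapped id:

.. code-block:: python

    from typing import Optional, Tuple

    Idmapping = Tuple[int, int, int]  # (u, k, r)


    def make_kuid(m: Idmapping, uid: Optional[int]) -> Optional[int]:
        u, k, r = m
        if uid is None or not (u <= uid < u + r):
            return None
        return uid - u + k


    def from_kuid(m: Idmapping, kuid: Optional[int]) -> Optional[int]:
        u, k, r = m
        if kuid is None or not (k <= kuid < k + r):
            return None
        return kuid - k + u


    def i_uid_into_mnt(filesystem: Idmapping, mount: Idmapping,
                       kuid: Optional[int]) -> Optional[int]:
        """Filesystem kernel id -> kernel id in the mount's idmapping."""
        return make_kuid(mount, from_kuid(filesystem, kuid))


    def mapped_fsuid(mount: Idmapping, filesystem: Idmapping,
                     kuid: Optional[int]) -> Optional[int]:
        """Caller kernel id -> kernel id in the filesystem's idmapping."""
        return make_kuid(filesystem, from_kuid(mount, kuid))


    filesystem = (0, 20000, 10000)
    mount = (0, 10000, 10000)

    assert i_uid_into_mnt(filesystem, mount, 21000) == 11000
    assert mapped_fsuid(mount, filesystem, 11000) == 21000

    # The two translations invert each other over the whole mapped range.
    for k in range(20000, 30000):
        assert mapped_fsuid(mount, filesystem,
                            i_uid_into_mnt(filesystem, mount, k)) == k
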
712 Let's now briefly reconsider the failing examples from earlier in the context
715 Example 2 reconsidered
716 ~~~~~~~~~~~~~~~~~~~~~~
721 caller idmapping: u0:k10000:r10000
722 filesystem idmapping: u0:k20000:r10000
723 mount idmapping: u0:k10000:r10000
725 When the caller is using a non-initial idmapping the common case is to attach
726 the same idmapping to the mount. We now perform three steps:
728 1. Map the caller's userspace ids into kernel ids in the caller's idmapping::
730 make_kuid(u0:k10000:r10000, u1000) = k11000
732 2. Translate the caller's kernel id into a kernel id in the filesystem's
735 mapped_fsuid(k11000):
736 /* Map the kernel id up into a userspace id in the mount's idmapping. */
737 from_kuid(u0:k10000:r10000, k11000) = u1000
739 /* Map the userspace id down into a kernel id in the filesystem's idmapping. */
740 make_kuid(u0:k20000:r10000, u1000) = k21000
742 3. Verify that the caller's kernel ids can be mapped to userspace ids in the
743 filesystem's idmapping::
745 from_kuid(u0:k20000:r10000, k21000) = u1000
747 So the ownership that lands on disk will be ``u1000``.
749 Example 3 reconsidered
750 ~~~~~~~~~~~~~~~~~~~~~~
755 caller idmapping: u0:k10000:r10000
756 filesystem idmapping: u0:k0:r4294967295
757 mount idmapping: u0:k10000:r10000
759 The same translation algorithm works with the third example.
761 1. Map the caller's userspace ids into kernel ids in the caller's idmapping::
763 make_kuid(u0:k10000:r10000, u1000) = k11000
765 2. Translate the caller's kernel id into a kernel id in the filesystem's
768 mapped_fsuid(k11000):
769 /* Map the kernel id up into a userspace id in the mount's idmapping. */
770 from_kuid(u0:k10000:r10000, k11000) = u1000
772 /* Map the userspace id down into a kernel id in the filesystem's idmapping. */
773 make_kuid(u0:k0:r4294967295, u1000) = k1000
775 3. Verify that the caller's kernel ids can be mapped to userspace ids in the
776 filesystem's idmapping::
778 from_kuid(u0:k0:r4294967295, k1000) = u1000
780 So the ownership that lands on disk will be ``u1000``.
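
Both reconsidered creation examples follow the same three steps, so a single
function can replay them. This is a sketch, not the kernel's code path:
``uid_written_to_disk()`` is a hypothetical name, the tuple-based
``make_kuid()`` and ``from_kuid()`` are simplified stand-ins, and ``None``
means the creation request would be refused:

.. code-block:: python

    from typing import Optional, Tuple

    Idmapping = Tuple[int, int, int]  # (u, k, r)
    IDENTITY: Idmapping = (0, 0, 4294967295)


    def make_kuid(m: Idmapping, uid: Optional[int]) -> Optional[int]:
        u, k, r = m
        if uid is None or not (u <= uid < u + r):
            return None
        return uid - u + k


    def from_kuid(m: Idmapping, kuid: Optional[int]) -> Optional[int]:
        u, k, r = m
        if kuid is None or not (k <= kuid < k + r):
            return None
        return kuid - k + u


    def uid_written_to_disk(caller: Idmapping, mount: Idmapping,
                            filesystem: Idmapping, uid: int) -> Optional[int]:
        # 1. Map the caller's userspace id down in the caller's idmapping.
        caller_kuid = make_kuid(caller, uid)
        # 2. Remap via the mount's idmapping into the filesystem's idmapping.
        fs_kuid = make_kuid(filesystem, from_kuid(mount, caller_kuid))
        # 3. Map the result up in the filesystem's idmapping.
        return from_kuid(filesystem, fs_kuid)


    container = (0, 10000, 10000)

    # Example 2 reconsidered: filesystem idmapping u0:k20000:r10000.
    assert uid_written_to_disk(container, container, (0, 20000, 10000), 1000) == 1000
    # Example 3 reconsidered: filesystem uses the initial idmapping.
    assert uid_written_to_disk(container, container, IDENTITY, 1000) == 1000
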
782 Example 4 reconsidered
783 ~~~~~~~~~~~~~~~~~~~~~~
788 caller idmapping: u0:k10000:r10000
789 filesystem idmapping: u0:k0:r4294967295
790 mount idmapping: u0:k10000:r10000
792 In order to report ownership to userspace the kernel now does three steps using
793 the translation algorithm we introduced earlier:
795 1. Map the userspace id on disk down into a kernel id in the filesystem's
798 make_kuid(u0:k0:r4294967295, u1000) = k1000
800 2. Translate the kernel id into a kernel id in the mount's idmapping::
802 i_uid_into_mnt(k1000):
803 /* Map the kernel id up into a userspace id in the filesystem's idmapping. */
804 from_kuid(u0:k0:r4294967295, k1000) = u1000
806 /* Map the userspace id down into a kernel id in the mount's idmapping. */
807 make_kuid(u0:k10000:r10000, u1000) = k11000
809 3. Map the kernel id up into a userspace id in the caller's idmapping::
811 from_kuid(u0:k10000:r10000, k11000) = u1000
813 Earlier, the caller's kernel id couldn't be crossmapped in the filesystem's
814 idmapping. With the idmapped mount in place it now can be crossmapped into the
815 filesystem's idmapping via the mount's idmapping. The file will now be created
816 with ``u1000`` according to the mount's idmapping.
818 Example 5 reconsidered
819 ~~~~~~~~~~~~~~~~~~~~~~

::

 file id:              u1000
 caller idmapping:     u0:k10000:r10000
 filesystem idmapping: u0:k20000:r10000
 mount idmapping:      u0:k10000:r10000

Again, in order to report ownership to userspace the kernel now does three
steps using the translation algorithm we introduced earlier:

1. Map the userspace id on disk down into a kernel id in the filesystem's
   idmapping::

    make_kuid(u0:k20000:r10000, u1000) = k21000

2. Translate the kernel id into a kernel id in the mount's idmapping::

    i_uid_into_mnt(k21000):
      /* Map the kernel id up into a userspace id in the filesystem's idmapping. */
      from_kuid(u0:k20000:r10000, k21000) = u1000

      /* Map the userspace id down into a kernel id in the mount's idmapping. */
      make_kuid(u0:k10000:r10000, u1000) = k11000

3. Map the kernel id up into a userspace id in the caller's idmapping::

    from_kuid(u0:k10000:r10000, k11000) = u1000

Earlier, the file's kernel id couldn't be crossmapped in the caller's
idmapping. With the idmapped mount in place it can now be crossmapped into the
caller's idmapping via the mount's idmapping. The file is now reported as owned
by ``u1000`` according to the mount's idmapping.

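To make the contrast concrete, the following sketch reports ownership for this
example once without and once with the idmapped mount. It is a simplified
userspace model, not the kernel implementation: the ``(u, k, r)`` tuples, the
``INVALID`` marker and the helper ``reported_owner()`` are assumptions
introduced for illustration.

```python
INVALID = -1  # assumption: stand-in for an id that has no mapping


def make_kuid(idmap, uid):
    """Map a userspace id down into a kernel id; idmap is a (u, k, r) tuple."""
    u, k, r = idmap
    return k + (uid - u) if u <= uid < u + r else INVALID


def from_kuid(idmap, kid):
    """Map a kernel id up into a userspace id; idmap is a (u, k, r) tuple."""
    u, k, r = idmap
    return u + (kid - k) if k <= kid < k + r else INVALID


def reported_owner(disk_uid, fs_idmap, caller_idmap, mount_idmap=None):
    """Hypothetical helper: the userspace id a caller sees for an on-disk id."""
    kid = make_kuid(fs_idmap, disk_uid)
    if mount_idmap is not None and kid != INVALID:
        # i_uid_into_mnt(): up through the filesystem, down through the mount.
        kid = make_kuid(mount_idmap, from_kuid(fs_idmap, kid))
    return from_kuid(caller_idmap, kid)


fs_idmap = (0, 20000, 10000)      # u0:k20000:r10000
caller_idmap = (0, 10000, 10000)  # u0:k10000:r10000
mount_idmap = (0, 10000, 10000)   # u0:k10000:r10000

without_mount = reported_owner(1000, fs_idmap, caller_idmap)
with_mount = reported_owner(1000, fs_idmap, caller_idmap, mount_idmap)
print(without_mount, with_mount)  # -1 1000
```

Without the mount's idmapping, ``k21000`` lies outside the caller's range and
the model yields ``INVALID``. With it, the id resolves to ``u1000``.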
854 Changing ownership on a home directory
855 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We've seen above how idmapped mounts can be used to translate between
idmappings when either the caller, the filesystem or both use a non-initial
idmapping. A wide range of use cases exists when the caller is using
a non-initial idmapping. This mostly happens in the context of containerized
workloads. The consequence, as we have seen, is that for both filesystems
mounted with the initial idmapping and filesystems mounted with non-initial
idmappings, access to the filesystem fails because the kernel ids can't
be crossmapped between the caller's and the filesystem's idmapping.

As we've seen above, idmapped mounts provide a solution to this by remapping the
caller's or filesystem's idmapping according to the mount's idmapping.

Aside from containerized workloads, idmapped mounts have the advantage that
they also work when both the caller and the filesystem use the initial
idmapping, which means users on the host can change the ownership of directories
and files on a per-mount basis.

874 Consider our previous example where a user has their home directory on portable
875 storage. At home they have id ``u1000`` and all files in their home directory
876 are owned by ``u1000`` whereas at uni or work they have login id ``u1125``.
878 Taking their home directory with them becomes problematic. They can't easily
879 access their files, they might not be able to write to disk without applying
880 lax permissions or ACLs and even if they can, they will end up with an annoying
881 mix of files and directories owned by ``u1000`` and ``u1125``.
Idmapped mounts solve this problem. A user can create an idmapped
mount for their home directory on their work computer or their computer at home,
depending on what ownership they would prefer to end up on the portable storage
itself.

Let's assume they want all files on disk to belong to ``u1000``. When the user
plugs in their portable storage at their workstation they can set up a job that
creates an idmapped mount with the minimal idmapping ``u1000:k1125:r1``. So now
when they create a file the kernel performs the following steps we already know
from above::

 caller id:            u1125
 caller idmapping:     u0:k0:r4294967295
 filesystem idmapping: u0:k0:r4294967295
 mount idmapping:      u1000:k1125:r1

1. Map the caller's userspace ids into kernel ids in the caller's idmapping::

    make_kuid(u0:k0:r4294967295, u1125) = k1125

2. Translate the caller's kernel id into a kernel id in the filesystem's
   idmapping::

    mapped_fsuid(k1125):
      /* Map the kernel id up into a userspace id in the mount's idmapping. */
      from_kuid(u1000:k1125:r1, k1125) = u1000

      /* Map the userspace id down into a kernel id in the filesystem's idmapping. */
      make_kuid(u0:k0:r4294967295, u1000) = k1000

3. Verify that the caller's kernel ids can be mapped to userspace ids in the
   filesystem's idmapping::

    from_kuid(u0:k0:r4294967295, k1000) = u1000

918 So ultimately the file will be created with ``u1000`` on disk.
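The effect of the minimal idmapping ``u1000:k1125:r1`` on file creation can be
checked with a short model. This is a sketch only: idmappings are hypothetical
``(u, k, r)`` tuples, ``INVALID`` is an assumed marker for an unmapped id, and
``on_disk_owner()`` is a helper invented here to chain the three steps.

```python
INVALID = -1  # assumption: stand-in for an id that has no mapping
INITIAL = (0, 0, 4294967295)    # u0:k0:r4294967295
MOUNT_IDMAP = (1000, 1125, 1)   # u1000:k1125:r1


def make_kuid(idmap, uid):
    """Map a userspace id down into a kernel id; idmap is a (u, k, r) tuple."""
    u, k, r = idmap
    return k + (uid - u) if u <= uid < u + r else INVALID


def from_kuid(idmap, kid):
    """Map a kernel id up into a userspace id; idmap is a (u, k, r) tuple."""
    u, k, r = idmap
    return u + (kid - k) if k <= kid < k + r else INVALID


def mapped_fsuid(mount_idmap, fs_idmap, kid):
    """Translate the caller's kernel id into the filesystem's idmapping via the mount."""
    uid = from_kuid(mount_idmap, kid)
    return INVALID if uid == INVALID else make_kuid(fs_idmap, uid)


def on_disk_owner(caller_uid):
    """Hypothetical helper: the id written to disk for a file created by caller_uid."""
    caller_kid = make_kuid(INITIAL, caller_uid)               # step 1
    fs_kid = mapped_fsuid(MOUNT_IDMAP, INITIAL, caller_kid)   # step 2
    return from_kuid(INITIAL, fs_kid)                         # step 3


print(on_disk_owner(1125))  # 1000
print(on_disk_owner(1126))  # -1
```

Because the range is ``r1``, only ``k1125`` is covered by the mount's
idmapping. Any other caller id, such as ``u1126``, has no mapping in this model.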
920 Now let's briefly look at what ownership the caller with id ``u1125`` will see
921 on their work computer:

::

 file id:              u1000
 caller idmapping:     u0:k0:r4294967295
 filesystem idmapping: u0:k0:r4294967295
 mount idmapping:      u1000:k1125:r1

1. Map the userspace id on disk down into a kernel id in the filesystem's
   idmapping::

    make_kuid(u0:k0:r4294967295, u1000) = k1000

2. Translate the kernel id into a kernel id in the mount's idmapping::

    i_uid_into_mnt(k1000):
      /* Map the kernel id up into a userspace id in the filesystem's idmapping. */
      from_kuid(u0:k0:r4294967295, k1000) = u1000

      /* Map the userspace id down into a kernel id in the mount's idmapping. */
      make_kuid(u1000:k1125:r1, u1000) = k1125

3. Map the kernel id up into a userspace id in the caller's idmapping::

    from_kuid(u0:k0:r4294967295, k1125) = u1125

So ultimately the kernel will report to the caller that the file belongs to
``u1125``, which is the caller's userspace id on their workstation in our
example.

951 The raw userspace id that is put on disk is ``u1000`` so when the user takes
952 their home directory back to their home computer where they are assigned
953 ``u1000`` using the initial idmapping and mount the filesystem with the initial
954 idmapping they will see all those files owned by ``u1000``.
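The two views of the same on-disk id, at work through the idmapped mount and at
home without one, can be compared in a small model. As before this is an
illustrative sketch, not kernel code: the ``(u, k, r)`` tuples, ``INVALID`` and
the helper ``reported_owner()`` are assumptions made for the example.

```python
INVALID = -1  # assumption: stand-in for an id that has no mapping
INITIAL = (0, 0, 4294967295)  # u0:k0:r4294967295


def make_kuid(idmap, uid):
    """Map a userspace id down into a kernel id; idmap is a (u, k, r) tuple."""
    u, k, r = idmap
    return k + (uid - u) if u <= uid < u + r else INVALID


def from_kuid(idmap, kid):
    """Map a kernel id up into a userspace id; idmap is a (u, k, r) tuple."""
    u, k, r = idmap
    return u + (kid - k) if k <= kid < k + r else INVALID


def reported_owner(disk_uid, mount_idmap=None):
    """Hypothetical helper: owner seen by a caller using the initial idmapping."""
    kid = make_kuid(INITIAL, disk_uid)
    if mount_idmap is not None and kid != INVALID:
        # i_uid_into_mnt(): up through the filesystem, down through the mount.
        kid = make_kuid(mount_idmap, from_kuid(INITIAL, kid))
    return from_kuid(INITIAL, kid)


at_work = reported_owner(1000, mount_idmap=(1000, 1125, 1))  # u1000:k1125:r1
at_home = reported_owner(1000)                               # no idmapped mount
print(at_work, at_home)  # 1125 1000
```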
Currently, the implementation of idmapped mounts enforces that the filesystem
is mounted with the initial idmapping. The reason is simply that none of the
filesystems that we targeted were mountable with a non-initial idmapping. But
that might change soon enough. As we've seen above, thanks to the properties of
idmappings the translation works for both filesystems mounted with the initial
idmapping and filesystems mounted with non-initial idmappings.

Based on this current restriction to filesystems mounted with the initial
idmapping, two noticeable shortcuts have been taken:

969 1. We always stash a reference to the initial user namespace in ``struct
970 vfsmount``. Idmapped mounts are thus mounts that have a non-initial user
971 namespace attached to them.

   In order to support idmapped mounts on filesystems mounted with a
   non-initial idmapping, this needs to be changed. Instead of stashing the
   initial user namespace, the user namespace the filesystem was mounted with
   must be stashed. An idmapped mount is then any mount that has a different
   user namespace attached than the filesystem was mounted with. This has no
   user-visible consequences.

2. The translation algorithms in ``mapped_fs*id()`` and ``i_*id_into_mnt()``
   are simplified.

   Let's consider ``mapped_fs*id()`` first. This function translates the
   caller's kernel id into a kernel id in the filesystem's idmapping via
   a mount's idmapping. The full algorithm is::

    mapped_fsuid(kid):
      /* Map the kernel id up into a userspace id in the mount's idmapping. */
      from_kuid(mount-idmapping, kid) = uid

      /* Map the userspace id down into a kernel id in the filesystem's idmapping. */
      make_kuid(filesystem-idmapping, uid) = kuid

   We know that the filesystem is always mounted with the initial idmapping as
   we enforce this in ``mount_setattr()``. So this can be shortened to::

    mapped_fsuid(kid):
      /* Map the kernel id up into a userspace id in the mount's idmapping. */
      from_kuid(mount-idmapping, kid) = uid

      /* Map the userspace id down into a kernel id in the filesystem's idmapping. */
      KUIDT_INIT(uid) = kuid

   Similarly, for ``i_*id_into_mnt()`` which translates the filesystem's kernel
   id into a mount's kernel id::

    i_uid_into_mnt(kid):
      /* Map the kernel id up into a userspace id in the filesystem's idmapping. */
      from_kuid(filesystem-idmapping, kid) = uid

      /* Map the userspace id down into a kernel id in the mount's idmapping. */
      make_kuid(mount-idmapping, uid) = kuid

   Again, we know that the filesystem is always mounted with the initial
   idmapping as we enforce this in ``mount_setattr()``. So this can be
   shortened to::

    i_uid_into_mnt(kid):
      /* Map the kernel id up into a userspace id in the filesystem's idmapping. */
      __kuid_val(kid) = uid

      /* Map the userspace id down into a kernel id in the mount's idmapping. */
      make_kuid(mount-idmapping, uid) = kuid

1024 Handling filesystems mounted with non-initial idmappings requires that the
1025 translation functions be converted to their full form. They can still be
1026 shortcircuited on non-idmapped mounts. This has no user-visible consequences.
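
Why the shortened forms are safe only under the current restriction can be
checked with a small model. The sketch below is not the kernel's code:
idmappings are hypothetical ``(u, k, r)`` tuples, kernel ids are plain
integers, and ``KUIDT_INIT()`` and ``__kuid_val()`` are modeled as reusing the
raw value unchanged, which is an assumption of this model. The ``*_full`` and
``*_short`` names are invented for the comparison.

```python
INVALID = -1  # assumption: stand-in for an id that has no mapping
INITIAL = (0, 0, 4294967295)  # u0:k0:r4294967295


def make_kuid(idmap, uid):
    """Map a userspace id down into a kernel id; idmap is a (u, k, r) tuple."""
    u, k, r = idmap
    return k + (uid - u) if u <= uid < u + r else INVALID


def from_kuid(idmap, kid):
    """Map a kernel id up into a userspace id; idmap is a (u, k, r) tuple."""
    u, k, r = idmap
    return u + (kid - k) if k <= kid < k + r else INVALID


def mapped_fsuid_full(mount_idmap, fs_idmap, kid):
    uid = from_kuid(mount_idmap, kid)
    return INVALID if uid == INVALID else make_kuid(fs_idmap, uid)


def mapped_fsuid_short(mount_idmap, kid):
    # Models KUIDT_INIT(uid): the userspace id is reused as the kernel id.
    return from_kuid(mount_idmap, kid)


def i_uid_into_mnt_full(mount_idmap, fs_idmap, kid):
    uid = from_kuid(fs_idmap, kid)
    return INVALID if uid == INVALID else make_kuid(mount_idmap, uid)


def i_uid_into_mnt_short(mount_idmap, kid):
    # Models __kuid_val(kid): the kernel id's raw value is reused as the userspace id.
    return make_kuid(mount_idmap, kid)


mount_idmap = (0, 10000, 10000)  # u0:k10000:r10000
samples = [0, 5, 1000, 9999, 10000, 11000, 19999, 20000]

# With the initial filesystem idmapping both forms agree on every sample.
all_equal = all(
    mapped_fsuid_full(mount_idmap, INITIAL, k) == mapped_fsuid_short(mount_idmap, k)
    and i_uid_into_mnt_full(mount_idmap, INITIAL, k) == i_uid_into_mnt_short(mount_idmap, k)
    for k in samples
)

# With a non-initial filesystem idmapping the shortened form gives a different id.
fs_idmap = (0, 20000, 10000)  # u0:k20000:r10000
full = mapped_fsuid_full(mount_idmap, fs_idmap, 11000)
short = mapped_fsuid_short(mount_idmap, 11000)
print(all_equal, full, short)  # True 21000 1000
```

In this model the two forms diverge as soon as the filesystem idmapping is not
the initial one, which is why the full form is needed in that case.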