Message ID | 1486041664-25836-1-git-send-email-zygmunt.krynicki@canonical.com |
---|---|
State | New |
On Thu, Feb 02, 2017 at 02:21:04PM +0100, Zygmunt Krynicki wrote:
> This patch adds a new MNT_ flag that is set for bind mounts (it mirrors
> MS_BIND) and surfaces it via mountinfo. This allows for easier
> identification of mount entries that are bind mounted from somewhere
> else.
>
> Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>

This is a change in userspace ABI, so this is not something that we're
likely to take into Ubuntu without being sure that it will also be
accepted upstream.

However my expectation is that this patch would meet resistance or be
rejected outright upstream (explanation follows). Can you explain why
you need this?

Fundamentally "bind" refers to the mount operation and not to some
property of the mount itself. Once you've performed a bind mount the new
mount is more or less equivalent to the original - they have a peer
relationship and not a parent/child sort of relationship. The kernel
knows about the relationships between mounts, but whether or not one is
the original and the other was created via a bind mount operation is
irrelevant.

The relationship between two mounts can be seen from userspace using
/proc/<pid>/mountinfo. For example:

  # mount -o loop fs.img a
  # mount --bind a b
  # mount --bind a/foo c
  # cat /proc/self/mountinfo
  ...
  158 26 7:0 / /home/ubuntu/bind-test/a rw,relatime shared:135 - ext4 /dev/loop0 rw,data=ordered
  162 26 7:0 / /home/ubuntu/bind-test/b rw,relatime shared:135 - ext4 /dev/loop0 rw,data=ordered
  166 26 7:0 /foo /home/ubuntu/bind-test/c rw,relatime shared:135 - ext4 /dev/loop0 rw,data=ordered

The "shared:135" indicates that these mounts are all part of the same
peer group (i.e. peer group 135). The mounts at .../a and .../b are
completely equivalent to the kernel.

It does show you that the mount at .../c is a little different. The 4th
column indicates that it mounts only a subtree of the filesystem. In all
other ways it is equivalent to the other two mounts.

Seth
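For anyone who wants to look at these fields from a script, here is a minimal Python sketch that splits a mountinfo line into the columns discussed above and groups the three example mounts by peer group. The names MountEntry and parse_mountinfo_line are hypothetical helpers invented for this illustration, not part of any library, and the sketch assumes paths contain no escaped characters (mountinfo encodes spaces as \040, which is not decoded here).

```python
from collections import defaultdict
from typing import NamedTuple, Tuple


class MountEntry(NamedTuple):
    mount_id: int
    parent_id: int
    root: str                  # 4th column: path within the filesystem
    mount_point: str           # 5th column: path within the mount namespace
    optional: Tuple[str, ...]  # e.g. ("shared:135",); may be empty
    fs_type: str
    mount_source: str


def parse_mountinfo_line(line: str) -> MountEntry:
    """Split one mountinfo line (hypothetical helper, no unescaping)."""
    fields = line.split()
    # Zero or more optional fields follow the mount options (index 5)
    # and are terminated by a lone "-".
    sep = fields.index("-", 6)
    return MountEntry(
        mount_id=int(fields[0]),
        parent_id=int(fields[1]),
        root=fields[3],
        mount_point=fields[4],
        optional=tuple(fields[6:sep]),
        fs_type=fields[sep + 1],
        mount_source=fields[sep + 2],
    )


SAMPLE = """\
158 26 7:0 / /home/ubuntu/bind-test/a rw,relatime shared:135 - ext4 /dev/loop0 rw,data=ordered
162 26 7:0 / /home/ubuntu/bind-test/b rw,relatime shared:135 - ext4 /dev/loop0 rw,data=ordered
166 26 7:0 /foo /home/ubuntu/bind-test/c rw,relatime shared:135 - ext4 /dev/loop0 rw,data=ordered
"""

entries = [parse_mountinfo_line(line) for line in SAMPLE.splitlines()]

# Group mount points by their "shared:N" peer group tag.
peers = defaultdict(list)
for entry in entries:
    for tag in entry.optional:
        if tag.startswith("shared:"):
            peers[tag].append(entry.mount_point)

for tag, mount_points in peers.items():
    print(tag, mount_points)
```

All three mounts land in the same group, and only the root column distinguishes the mount at .../c from the other two.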
On Thu, Feb 2, 2017 at 4:45 PM, Seth Forshee <seth.forshee@canonical.com> wrote:
> On Thu, Feb 02, 2017 at 02:21:04PM +0100, Zygmunt Krynicki wrote:
>> This patch adds a new MNT_ flag that is set for bind mounts (it mirrors
>> MS_BIND) and surfaces it via mountinfo. This allows for easier
>> identification of mount entries that are bind mounted from somewhere
>> else.
>>
>> Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
>
> This is a change in userspace ABI, so this is not something that we're
> likely to take into Ubuntu without being sure that it will also be
> accepted upstream.
>
> However my expectation is that this patch would meet resistance or be
> rejected outright upstream (explanation follows). Can you explain why
> you need this?

Yes. I will explain at the end of this message.

> Fundamentally "bind" refers to the mount operation and not to some
> property of the mount itself. Once you've performed a bind mount the new
> mount is more or less equivalent to the original - they have a peer
> relationship and not a parent/child sort of relationship. The kernel
> knows about the relationships between mounts, but whether or not one is
> the original and the other was created via a bind mount operation is
> irrelevant.
>
> The relationship between two mounts can be seen from userspace using
> /proc/<pid>/mountinfo. For example:
>
>   # mount -o loop fs.img a
>   # mount --bind a b
>   # mount --bind a/foo c
>   # cat /proc/self/mountinfo
>   ...
>   158 26 7:0 / /home/ubuntu/bind-test/a rw,relatime shared:135 - ext4 /dev/loop0 rw,data=ordered
>   162 26 7:0 / /home/ubuntu/bind-test/b rw,relatime shared:135 - ext4 /dev/loop0 rw,data=ordered
>   166 26 7:0 /foo /home/ubuntu/bind-test/c rw,relatime shared:135 - ext4 /dev/loop0 rw,data=ordered
>
> The "shared:135" indicates that these mounts are all part of the same
> peer group (i.e. peer group 135). The mounts at .../a and .../b are
> completely equivalent to the kernel.

Until you change the sharing among them and...

  # mount -o loop fs.img a
  # mount --bind a b
  # mount --bind a/foo c
  # cat /proc/self/mountinfo | tail -n 3
  223 199 7:19 / /home/zyga/experiment/a rw,relatime shared:262 - ext4 /dev/loop19 rw,data=ordered
  236 199 7:19 / /home/zyga/experiment/b rw,relatime shared:262 - ext4 /dev/loop19 rw,data=ordered
  257 199 7:19 /foo /home/zyga/experiment/c rw,relatime,bind shared:262 - ext4 /dev/loop19 rw,data=ordered
  # mount --make-private b
  # cat /proc/self/mountinfo | tail -n 3
  223 199 7:19 / /home/zyga/experiment/a rw,relatime shared:262 - ext4 /dev/loop19 rw,data=ordered
  236 199 7:19 / /home/zyga/experiment/b rw,relatime - ext4 /dev/loop19 rw,data=ordered
  257 199 7:19 /foo /home/zyga/experiment/c rw,relatime,bind shared:262 - ext4 /dev/loop19 rw,data=ordered

EDIT: writing this I realized what my problem really is... let me try
to explain below (ignore that paste above)

My original problem is: given a declaration that "$source should be
bind-mounted in $destination" and the state of mountinfo, should this
operation be performed or is it already done?

  # mount --bind /snap/snapd-hacker-toolbelt/14/src/ /snap/snapd-hacker-toolbelt/14/dst/
  # cat /proc/self/mountinfo | tail -n 1
  678 737 7:8 /src /snap/snapd-hacker-toolbelt/14/dst rw,relatime,bind master:30 - squashfs /dev/loop8 ro

The problem with the way mountinfo presents the facts is that I was
implicitly looking for "$source" somewhere to cross-reference. I think
I now realize that what I want to check for is different but already
present in the data that I have. I just need to come up with a set of
things that are equivalent to the $source mount and see if my
$destination mount is present there.

First, I want to find what $source really is. In the example above
$source is "/snap/snapd-hacker-toolbelt/14/src/". Scanning the mount
table I can find:

  737 730 7:8 / /snap/snapd-hacker-toolbelt/14 rw,relatime master:30 - squashfs /dev/loop8 ro

Now this is not exactly $source but it is the longest prefix of
$source that I can find. A quick guess would be to look for a perfect
match and, if none is found, discard the final component and look again.
Once I know what the source really is (it is /dev/loop8 + the
concatenation of the discarded final components) I can state a
different question:

What is the set of mount entries using /dev/loop8?

  /home/zyga # cat /proc/self/mountinfo | grep loop8
  461 224 7:8 / /var/lib/snapd/hostfs/snap/snapd-hacker-toolbelt/14 rw,relatime master:30 - squashfs /dev/loop8 ro
  737 730 7:8 / /snap/snapd-hacker-toolbelt/14 rw,relatime master:30 - squashfs /dev/loop8 ro
  678 737 7:8 /src /snap/snapd-hacker-toolbelt/14/dst rw,relatime,bind master:30 - squashfs /dev/loop8 ro

I can now infer that both /snap/snapd-hacker-toolbelt/14/dst and
/snap/snapd-hacker-toolbelt/14/src represent the same object: /src
from /dev/loop8.

My original question (should I do that bind mount, or is it already
done?) can be answered by checking if the $destination is present in the
set above.

Do you think I am on the right track?

ZK
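The "set of mount entries using /dev/loop8" step can also be done without grep. The following Python sketch groups mountinfo lines by their mount source and lists which root each mount point exposes. The function name views_by_source is made up for this illustration, the input is the three lines pasted above, and escaped characters in paths are assumed not to occur.

```python
from collections import defaultdict

SAMPLE = """\
461 224 7:8 / /var/lib/snapd/hostfs/snap/snapd-hacker-toolbelt/14 rw,relatime master:30 - squashfs /dev/loop8 ro
737 730 7:8 / /snap/snapd-hacker-toolbelt/14 rw,relatime master:30 - squashfs /dev/loop8 ro
678 737 7:8 /src /snap/snapd-hacker-toolbelt/14/dst rw,relatime,bind master:30 - squashfs /dev/loop8 ro
"""


def views_by_source(text):
    """Map each mount source to a list of (root, mount_point) pairs.

    Hypothetical helper: it only splits on whitespace and does not
    decode escapes such as \\040.
    """
    views = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The optional fields end at the lone "-" separator; the mount
        # source is the second field after it.
        sep = fields.index("-", 6)
        views[fields[sep + 2]].append((fields[3], fields[4]))
    return views


for root, mount_point in views_by_source(SAMPLE)["/dev/loop8"]:
    print(root, "->", mount_point)
```

The pair with root /src is the one that identifies the bind mount at .../14/dst.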
On Thu, Feb 02, 2017 at 06:48:11PM +0100, Zygmunt Krynicki wrote:
>
> Do you think I am on the right track?

I think you're very close. Let's walk through it step by step.

A prerequisite is to really understand the data in mountinfo. Take a
look at the description of mountinfo in
Documentation/filesystems/proc.txt in the kernel source tree (you'll
also want to understand it well in order to parse it programmatically;
there's a section with zero or more optional fields).

The root question is, "has $source been bind mounted at $dest?" Working
with your example above, we'd have:

  $source = /snap/snapd-hacker-toolbelt/14/src
  $dest   = /snap/snapd-hacker-toolbelt/14/dst

First let's determine if anything is even mounted at $dest, by scanning
mountinfo and looking for $dest (expressed as the full path relative to
the process's root) in the "mount point" field of one of mountinfo's
entries. Note however that the mount point could appear more than once,
with mounts covering other mounts, and I assume you really want to know
about the topmost mount (the one whose contents you'd see in the mount
point). I suspect this will always be the last entry in mountinfo with
$dest as its mount point, but I need to confirm this.

If nothing is mounted at $dest, then you've got your answer. Otherwise
you need to determine exactly what's mounted there. In our example we
end up with this entry:

  678 737 7:8 /src /snap/snapd-hacker-toolbelt/14/dst rw,relatime master:30 - squashfs /dev/loop8 ro

We're going to be interested in three fields - root, mount point, and
mount source. Extracting these:

  $dest:
    root         = /src
    mount_point  = /snap/snapd-hacker-toolbelt/14/dst
    mount_source = /dev/loop8

From this we know the root of the mount at $dest is /src within the
filesystem at /dev/loop8. One important thing to note here is that
mount_source is filesystem specific, so for some filesystems such as
fuse you can't count on it being a path to a block device. You can count
on it being the same for any two entries which represent mounts from the
same filesystem, however.

Now, as you said, we should search mountinfo for the entry whose mount
point is the longest prefix of $source (keeping in mind that there could
be more than one). That gives us:

  737 730 7:8 / /snap/snapd-hacker-toolbelt/14 rw,relatime master:30 - squashfs /dev/loop8 ro

Or:

  $source:
    root         = /
    mount_point  = /snap/snapd-hacker-toolbelt/14
    mount_source = /dev/loop8

If $source and $dest have different mount_source values, you know you
don't have the bind mount you're looking for. Otherwise you should strip
the mount_point path components from $source, then prepend its root
components, which will give you the path of $source within the
mount_source. If this is the same as the root for $dest then you have
what you want mounted there.

I believe that should give you a definite answer as to whether the bind
mount you desire has already been done. Let me know if you find any
mistakes or have any questions. I'll look into how to be sure that
you've identified the topmost mount for a given path and let you know.

Seth
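The walk-through above can be put together as a short Python sketch. It is only an illustration under stated assumptions: the helper names (parse_mountinfo, last_mount_at, containing_mount, is_bind_mounted) are invented here, paths must be absolute and free of escaped characters, mount_source strings are compared as plain text, and the last mountinfo entry for a mount point is assumed to be the topmost one.

```python
import posixpath

SAMPLE = """\
737 730 7:8 / /snap/snapd-hacker-toolbelt/14 rw,relatime master:30 - squashfs /dev/loop8 ro
678 737 7:8 /src /snap/snapd-hacker-toolbelt/14/dst rw,relatime master:30 - squashfs /dev/loop8 ro
"""


def parse_mountinfo(text):
    """Return one dict per mountinfo line, in file order."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        sep = fields.index("-", 6)  # end of the optional fields
        entries.append({
            "root": fields[3],
            "mount_point": fields[4],
            "mount_source": fields[sep + 2],
        })
    return entries


def last_mount_at(path, entries):
    """Last entry mounted exactly at path, or None.

    Assumption: the last matching entry is the topmost mount.
    """
    for entry in reversed(entries):
        if entry["mount_point"] == path:
            return entry
    return None


def containing_mount(path, entries):
    """Entry whose mount point is the longest prefix of path, or None."""
    probe = path
    while True:
        entry = last_mount_at(probe, entries)
        if entry is not None or probe == "/":
            return entry
        probe = posixpath.dirname(probe)  # discard the final component


def is_bind_mounted(source, dest, entries):
    if not (posixpath.isabs(source) and posixpath.isabs(dest)):
        raise ValueError("source and dest must be absolute paths")
    source = posixpath.normpath(source)
    dest = posixpath.normpath(dest)
    dest_entry = last_mount_at(dest, entries)
    if dest_entry is None:
        return False  # nothing is mounted at dest
    src_entry = containing_mount(source, entries)
    if src_entry is None:
        return False
    if src_entry["mount_source"] != dest_entry["mount_source"]:
        return False  # mounts come from different filesystems
    # Path of source inside its filesystem: strip the mount point,
    # then prepend the root of the mount that contains it.
    rel = posixpath.relpath(source, src_entry["mount_point"])
    inner = posixpath.normpath(posixpath.join(src_entry["root"], rel))
    return inner == dest_entry["root"]


entries = parse_mountinfo(SAMPLE)
print(is_bind_mounted("/snap/snapd-hacker-toolbelt/14/src/",
                      "/snap/snapd-hacker-toolbelt/14/dst/", entries))
```

A real implementation would read /proc/self/mountinfo instead of SAMPLE and would need to decode the octal escapes that mountinfo uses in paths.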
On Thu, Feb 02, 2017 at 01:27:11PM -0600, Seth Forshee wrote:
> Note however that the mount point could appear more than once,
> with mounts covering other mounts, and I assume you really want to know
> about the topmost mount (the one whose contents you'd see in the mount
> point). I suspect this will always be the last entry in mountinfo with
> $dest as its mount point, but I need to confirm this.

This should be true. mountinfo displays mounts in the order that they
appear in the mount namespace's list of mounts, and as new mounts are
added at the end of the list a new mount will always appear after any
mounts which are shadowed by it.

Seth
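As a small illustration of that ordering rule, the Python sketch below picks the last mountinfo line for a given mount point. The two sample lines are made up for this example (a tmpfs mounted over an existing mount at /mnt/x) and do not come from a real system, and topmost_mount is a hypothetical helper name.

```python
# Made-up mountinfo lines: mount 101 is stacked on top of mount 100.
SAMPLE = """\
100 26 8:1 /data /mnt/x rw,relatime - ext4 /dev/sda1 rw
101 100 0:40 / /mnt/x rw,relatime - tmpfs tmpfs rw
"""


def topmost_mount(mount_point, text):
    """Return the last mountinfo line mounted at mount_point, or None."""
    found = None
    for line in text.splitlines():
        fields = line.split()
        # Field 5 (index 4) is the mount point.
        if len(fields) > 4 and fields[4] == mount_point:
            found = line  # keep overwriting: the last match wins
    return found


print(topmost_mount("/mnt/x", SAMPLE))
```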
diff --git a/fs/namespace.c b/fs/namespace.c
index c9ba9d1..e2fc0c1 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2162,6 +2162,7 @@ static int do_loopback(struct path *path, const char *old_name,
 	}
 
 	mnt->mnt.mnt_flags &= ~MNT_LOCKED;
+	mnt->mnt.mnt_flags |= MNT_BIND;
 
 	err = graft_tree(mnt, parent, mp);
 	if (err) {
diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c
index 876459559..e510585 100644
--- a/fs/proc_namespace.c
+++ b/fs/proc_namespace.c
@@ -67,6 +67,7 @@ static void show_mnt_opts(struct seq_file *m, struct vfsmount *mnt)
 		{ MNT_NOATIME, ",noatime" },
 		{ MNT_NODIRATIME, ",nodiratime" },
 		{ MNT_RELATIME, ",relatime" },
+		{ MNT_BIND, ",bind" },
 		{ 0, NULL }
 	};
 	const struct proc_fs_info *fs_infop;
diff --git a/include/linux/mount.h b/include/linux/mount.h
index 1172cce..81f7bec 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -62,6 +62,7 @@ struct mnt_namespace;
 #define MNT_SYNC_UMOUNT	0x2000000
 #define MNT_MARKED	0x4000000
 #define MNT_UMOUNT	0x8000000
+#define MNT_BIND	0x10000000
 
 struct vfsmount {
 	struct dentry *mnt_root;	/* root of the mounted tree */
This patch adds a new MNT_ flag that is set for bind mounts (it mirrors
MS_BIND) and surfaces it via mountinfo. This allows for easier
identification of mount entries that are bind mounted from somewhere
else.

Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
---
 fs/namespace.c        | 1 +
 fs/proc_namespace.c   | 1 +
 include/linux/mount.h | 1 +
 3 files changed, 3 insertions(+)