fs: store MS_BIND as MNT_BIND and show it in mountinfo

Message ID

1486041664-25836-1-git-send-email-zygmunt.krynicki@canonical.com

State

New

Headers

From: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
To: kernel-team@lists.ubuntu.com
Subject: [PATCH] fs: store MS_BIND as MNT_BIND and show it in mountinfo
Date: Thu,  2 Feb 2017 14:21:04 +0100
Message-Id: <1486041664-25836-1-git-send-email-zygmunt.krynicki@canonical.com>
Precedence: list
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: kernel-team-bounces@lists.ubuntu.com
Sender: kernel-team-bounces@lists.ubuntu.com

Commit Message

Zygmunt Krynicki Feb. 2, 2017, 1:21 p.m. UTC

This patch adds a new MNT_ flag that is set for bind mounts (it mirrors
MS_BIND) and surfaces it via mountinfo. This allows for easier
identification of mount entries that are bind mounted from somewhere
else.

Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
---
 fs/namespace.c        | 1 +
 fs/proc_namespace.c   | 1 +
 include/linux/mount.h | 1 +
 3 files changed, 3 insertions(+)

Comments

Seth Forshee Feb. 2, 2017, 3:45 p.m. UTC | #1

On Thu, Feb 02, 2017 at 02:21:04PM +0100, Zygmunt Krynicki wrote:
> This patch adds a new MNT_ flag that is set for bind mounts (it mirrors
> MS_BIND) and surfaces it via mountinfo. This allows for easier
> identification of mount entries that are bind mounted from somewhere
> else.
> 
> Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>

This is a change in userspace ABI, so this is not something that we're
likely to take into Ubuntu without being sure that it will also be
accepted upstream.

However my expectation is that this patch would meet resistance or be
rejected outright upstream (explanation follows). Can you explain why
you need this?

Fundamentally "bind" refers to the mount operation and not to some
property of the mount itself. Once you've performed a bind mount the new
mount is more or less equivalent to the original - they have a peer
relationship and not a parent/child sort of relationship. The kernel
knows about the relationships between mounts, but whether or not one is
the original and the other was created via a bind mount operation is
irrelevant.

The relationship between two mounts can be seen from userspace using
/proc/<pid>/mountinfo. For example:

# mount -o loop fs.img a
# mount --bind a b
# mount --bind a/foo c
# cat /proc/self/mountinfo
...
158 26 7:0 / /home/ubuntu/bind-test/a rw,relatime shared:135 - ext4 /dev/loop0 rw,data=ordered
162 26 7:0 / /home/ubuntu/bind-test/b rw,relatime shared:135 - ext4 /dev/loop0 rw,data=ordered
166 26 7:0 /foo /home/ubuntu/bind-test/c rw,relatime shared:135 - ext4 /dev/loop0 rw,data=ordered

The "shared:135" indicates that these mounts are all part of the same
peer group (i.e. peer group 135). The mounts at .../a and .../b are
completely equivalent to the kernel.

It does show you that the mount at .../c is a little different. The 4th
column (root) indicates that it mounts only a subtree of the filesystem.
In all other ways it is equivalent to the other two mounts.
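To make the field layout concrete, here is a minimal C sketch that pulls the root (4th field) and the "shared:N" peer group out of a single mountinfo line. The function name mountinfo_peer_group is hypothetical. The sketch assumes a well-formed line that fits in a fixed buffer, and it does not decode the octal escapes the kernel uses for spaces in paths.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: copy the root field (4th field) of one mountinfo
 * line into root_out and return the peer group ID from an optional
 * "shared:N" field, or -1 if the mount is not shared. Assumes the line
 * fits in the local buffer; escaped characters are left as-is.
 */
static long mountinfo_peer_group(const char *line, char *root_out, size_t root_len)
{
    char buf[4096];
    char *save = NULL;
    char *tok;
    int field = 0;
    long peer = -1;

    assert(line != NULL && root_out != NULL && root_len > 0);
    snprintf(buf, sizeof(buf), "%s", line);
    root_out[0] = '\0';

    for (tok = strtok_r(buf, " \n", &save); tok != NULL;
         tok = strtok_r(NULL, " \n", &save)) {
        field++;
        if (field == 4)
            snprintf(root_out, root_len, "%s", tok);
        if (field <= 6)
            continue;               /* fields 1-6 are always present */
        if (strcmp(tok, "-") == 0)
            break;                  /* end of the optional fields */
        if (strncmp(tok, "shared:", 7) == 0)
            peer = strtol(tok + 7, NULL, 10);
    }
    return peer;
}
```

Applied to the three entries above, it reports peer group 135 for all of them, with a root of "/" for .../a and .../b and "/foo" for .../c.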

Seth

Zygmunt Krynicki Feb. 2, 2017, 5:48 p.m. UTC | #2

On Thu, Feb 2, 2017 at 4:45 PM, Seth Forshee <seth.forshee@canonical.com> wrote:
> On Thu, Feb 02, 2017 at 02:21:04PM +0100, Zygmunt Krynicki wrote:
>> This patch adds a new MNT_ flag that is set for bind mounts (it mirrors
>> MS_BIND) and surfaces it via mountinfo. This allows for easier
>> identification of mount entries that are bind mounted from somewhere
>> else.
>>
>> Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
>
> This is a change in userspace ABI, so this is not something that we're
> likely to take into Ubuntu without being sure that it will also be
> accepted upstream.
>
> However my expectation is that this patch would meet resistance or be
> rejected outright upstream (explanation follows). Can you explain why
> you need this?

Yes. I will explain at the end of this message.

> Fundamentally "bind" refers to the mount operation and not to some
> property of the mount itself. Once you've performed a bind mount the new
> mount is more or less equivalent to the original - they have a peer
> relationship and not a parent/child sort of relationship. The kernel
> knows about the relationships between mounts, but whether or not one is
> the original and the other was created via a bind mount operation is
> irrelevant.
>
> The relationship between two mounts can be seen from userspace using
> /proc/<pid>/mountinfo. For example:
>
> # mount -o loop fs.img a
> # mount --bind a b
> # mount --bind a/foo c
> # cat /proc/self/mountinfo
> ...
> 158 26 7:0 / /home/ubuntu/bind-test/a rw,relatime shared:135 - ext4 /dev/loop0 rw,data=ordered
> 162 26 7:0 / /home/ubuntu/bind-test/b rw,relatime shared:135 - ext4 /dev/loop0 rw,data=ordered
> 166 26 7:0 /foo /home/ubuntu/bind-test/c rw,relatime shared:135 - ext4 /dev/loop0 rw,data=ordered
>
> The "shared:135" indicates that these mounts are all part of the same
> peer group (i.e. peer group 135). The mounts at .../a and .../b are
> completely equivalent to the kernel.

Until you change the sharing among them and...

# mount -o loop fs.img a
# mount --bind a b
# mount --bind a/foo c
# cat /proc/self/mountinfo | tail -n 3
223 199 7:19 / /home/zyga/experiment/a rw,relatime shared:262 - ext4 /dev/loop19 rw,data=ordered
236 199 7:19 / /home/zyga/experiment/b rw,relatime shared:262 - ext4 /dev/loop19 rw,data=ordered
257 199 7:19 /foo /home/zyga/experiment/c rw,relatime,bind shared:262 - ext4 /dev/loop19 rw,data=ordered
# mount --make-private b
# cat /proc/self/mountinfo | tail -n 3
223 199 7:19 / /home/zyga/experiment/a rw,relatime shared:262 - ext4 /dev/loop19 rw,data=ordered
236 199 7:19 / /home/zyga/experiment/b rw,relatime - ext4 /dev/loop19 rw,data=ordered
257 199 7:19 /foo /home/zyga/experiment/c rw,relatime,bind shared:262 - ext4 /dev/loop19 rw,data=ordered

EDIT: writing this I realized what my problem really is... let me try
to explain below (ignore that paste above)

My original problem is: given a declaration that "$source should be
bind-mounted in $destination" and the state of mountinfo, should this
operation be performed or is it already done?

# mount --bind /snap/snapd-hacker-toolbelt/14/src/ /snap/snapd-hacker-toolbelt/14/dst/
# cat /proc/self/mountinfo | tail -n 1
678 737 7:8 /src /snap/snapd-hacker-toolbelt/14/dst rw,relatime,bind master:30 - squashfs /dev/loop8 ro

The problem with the way mountinfo presents the facts is that I was
implicitly looking for "$source" somewhere to cross-reference. I think
I now realize what I want to check for is different but already
present in the data that I have. I just need to come up with a set of
things that are equivalent to the $source mount and see if my
$destination mount is present there.

First, I want to find what $source really is. In the example above,
$source is "/snap/snapd-hacker-toolbelt/14/src/". Scanning the mount
table, I can find:

737 730 7:8 / /snap/snapd-hacker-toolbelt/14 rw,relatime master:30 - squashfs /dev/loop8 ro

Now this is not exactly $source but it is the longest prefix of
$source that I can find. A quick guess would be to look for a perfect
match and if not found, discard the final component and look again.
Once I know what the source really is (it is /dev/loop8 + the
concatenation of the discarded final components) I can state a
different question:

What is the set of mount entries using /dev/loop8?

/home/zyga # cat /proc/self/mountinfo | grep loop8
461 224 7:8 / /var/lib/snapd/hostfs/snap/snapd-hacker-toolbelt/14 rw,relatime master:30 - squashfs /dev/loop8 ro
737 730 7:8 / /snap/snapd-hacker-toolbelt/14 rw,relatime master:30 - squashfs /dev/loop8 ro
678 737 7:8 /src /snap/snapd-hacker-toolbelt/14/dst rw,relatime,bind master:30 - squashfs /dev/loop8 ro

I can now infer that both /snap/snapd-hacker-toolbelt/14/dst and
/snap/snapd-hacker-toolbelt/14/src represent the same object: /src
from /dev/loop8.

My original question (should I do that bind mount, or is it already
done?) can be answered by checking whether $destination is present in
the set above.
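The prefix search described above (try an exact match, otherwise discard the final component and look again, remembering what was discarded) could be sketched in C roughly as follows. find_covering_mount is a hypothetical name. The sketch assumes mount_points holds the already-unescaped mount point field of each mountinfo entry in file order, and that the path is absolute and free of "."/".." components and symlinks. When several entries share a mount point it prefers the last one, on the assumption that this is the topmost mount.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return the index of the entry whose mount point is
 * the longest prefix of path, found by repeatedly discarding the final
 * component. The discarded suffix (e.g. "/src") is written to rest; it is
 * "" on an exact match. Returns -1 if not even "/" is listed.
 */
static int find_covering_mount(const char *const *mount_points, int n,
                               const char *path, char *rest, size_t rest_len)
{
    char full[4096], prefix[4096];
    size_t len;

    assert(path != NULL && path[0] == '/');
    snprintf(full, sizeof(full), "%s", path);
    len = strlen(full);
    while (len > 1 && full[len - 1] == '/')   /* ".../src/" -> ".../src" */
        full[--len] = '\0';
    snprintf(prefix, sizeof(prefix), "%s", full);

    for (;;) {
        /* Scan backwards so the last (assumed topmost) match wins. */
        for (int i = n - 1; i >= 0; i--) {
            if (strcmp(mount_points[i], prefix) != 0)
                continue;
            const char *tail = strcmp(prefix, "/") == 0 ? full : full + strlen(prefix);
            if (strcmp(tail, "/") == 0)
                tail = "";
            snprintf(rest, rest_len, "%s", tail);
            return i;
        }
        if (strcmp(prefix, "/") == 0)
            return -1;
        /* Discard the final component and look again. */
        char *slash = strrchr(prefix, '/');
        if (slash == prefix)
            prefix[1] = '\0';
        else
            *slash = '\0';
    }
}
```

With the mount points from the loop8 example, /snap/snapd-hacker-toolbelt/14/src/ resolves to the /snap/snapd-hacker-toolbelt/14 entry with "/src" as the discarded remainder, i.e. /src within /dev/loop8.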

Do you think I am on the right track?
ZK

Seth Forshee Feb. 2, 2017, 7:27 p.m. UTC | #3

On Thu, Feb 02, 2017 at 06:48:11PM +0100, Zygmunt Krynicki wrote:
> On Thu, Feb 2, 2017 at 4:45 PM, Seth Forshee <seth.forshee@canonical.com> wrote:
> > On Thu, Feb 02, 2017 at 02:21:04PM +0100, Zygmunt Krynicki wrote:
> >> This patch adds a new MNT_ flag that is set for bind mounts (it mirrors
> >> MS_BIND) and surfaces it via mountinfo. This allows for easier
> >> identification of mount entries that are bind mounted from somewhere
> >> else.
> >>
> >> Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
> >
> > This is a change in userspace ABI, so this is not something that we're
> > likely to take into Ubuntu without being sure that it will also be
> > accepted upstream.
> >
> > However my expectation is that this patch would meet resistance or be
> > rejected outright upstream (explanation follows). Can you explain why
> > you need this?
> 
> Yes. I will explain at the end of this message.
> 
> > Fundamentally "bind" refers to the mount operation and not to some
> > property of the mount itself. Once you've performed a bind mount the new
> > mount is more or less equivalent to the original - they have a peer
> > relationship and not a parent/child sort of relationship. The kernel
> > knows about the relationships between mounts, but whether or not one is
> > the original and the other was created via a bind mount operation is
> > irrelevant.
> >
> > The relationship between two mounts can be seen from userspace using
> > /proc/<pid>/mountinfo. For example:
> >
> > # mount -o loop fs.img a
> > # mount --bind a b
> > # mount --bind a/foo c
> > # cat /proc/self/mountinfo
> > ...
> > 158 26 7:0 / /home/ubuntu/bind-test/a rw,relatime shared:135 - ext4 /dev/loop0 rw,data=ordered
> > 162 26 7:0 / /home/ubuntu/bind-test/b rw,relatime shared:135 - ext4 /dev/loop0 rw,data=ordered
> > 166 26 7:0 /foo /home/ubuntu/bind-test/c rw,relatime shared:135 - ext4 /dev/loop0 rw,data=ordered
> >
> > The "shared:135" indicates that these mounts are all part of the same
> > peer group (i.e. peer group 135). The mounts at .../a and .../b are
> > completely equivalent to the kernel.
> 
> Until you change the sharing among them and...
> 
> # mount -o loop fs.img a
> # mount --bind a b
> # mount --bind a/foo c
> # cat /proc/self/mountinfo | tail -n 3
> 223 199 7:19 / /home/zyga/experiment/a rw,relatime shared:262 - ext4 /dev/loop19 rw,data=ordered
> 236 199 7:19 / /home/zyga/experiment/b rw,relatime shared:262 - ext4 /dev/loop19 rw,data=ordered
> 257 199 7:19 /foo /home/zyga/experiment/c rw,relatime,bind shared:262 - ext4 /dev/loop19 rw,data=ordered
> # mount --make-private b
> # cat /proc/self/mountinfo | tail -n 3
> 223 199 7:19 / /home/zyga/experiment/a rw,relatime shared:262 - ext4 /dev/loop19 rw,data=ordered
> 236 199 7:19 / /home/zyga/experiment/b rw,relatime - ext4 /dev/loop19 rw,data=ordered
> 257 199 7:19 /foo /home/zyga/experiment/c rw,relatime,bind shared:262 - ext4 /dev/loop19 rw,data=ordered
> 
> EDIT: writing this I realized what my problem really is... let me try
> to explain below (ignore that paste above)
> 
> My original problem is: given a declaration that "$source should be
> bind-mounted in $destination" and the state of mountinfo, should this
> operation be performed or is it already done?
> 
> # mount --bind /snap/snapd-hacker-toolbelt/14/src/ /snap/snapd-hacker-toolbelt/14/dst/
> # cat /proc/self/mountinfo | tail -n 1
> 678 737 7:8 /src /snap/snapd-hacker-toolbelt/14/dst rw,relatime,bind master:30 - squashfs /dev/loop8 ro
> 
> The problem with the way mountinfo presents the facts is that I was
> implicitly looking for "$source" somewhere to cross-reference. I think
> I now realize what I want to check for is different but already
> present in the data that I have. I just need to come up with a set of
> things that are equivalent to the $source mount and see if my
> $destination mount is present there.
> 
> First, I want to find what $source really is. In the example above,
> $source is "/snap/snapd-hacker-toolbelt/14/src/". Scanning the mount
> table, I can find:
> 
> 737 730 7:8 / /snap/snapd-hacker-toolbelt/14 rw,relatime master:30 - squashfs /dev/loop8 ro
> 
> Now this is not exactly $source but it is the longest prefix of
> $source that I can find. A quick guess would be to look for a perfect
> match and if not found, discard the final component and look again.
> Once I know what the source really is (it is /dev/loop8 + the
> concatenation of the discarded final components) I can state a
> different question:
> 
> What is the set of mount entries using /dev/loop8?
> 
> /home/zyga # cat /proc/self/mountinfo | grep loop8
> 461 224 7:8 / /var/lib/snapd/hostfs/snap/snapd-hacker-toolbelt/14 rw,relatime master:30 - squashfs /dev/loop8 ro
> 737 730 7:8 / /snap/snapd-hacker-toolbelt/14 rw,relatime master:30 - squashfs /dev/loop8 ro
> 678 737 7:8 /src /snap/snapd-hacker-toolbelt/14/dst rw,relatime,bind master:30 - squashfs /dev/loop8 ro
> 
> I can now infer that both /snap/snapd-hacker-toolbelt/14/dst and
> /snap/snapd-hacker-toolbelt/14/src represent the same object: /src
> from /dev/loop8.
> 
> My original question (should I do that bind mount, or is it already
> done?) can be answered by checking whether $destination is present in
> the set above.
> 
> Do you think I am on the right track?

I think you're very close. Let's walk through it step by step.

A prerequisite is to really understand the data in mountinfo. Take a
look at the description of mountinfo in
Documentation/filesystems/proc.txt in the kernel source tree. You'll
also want to understand it well in order to parse it programmatically,
since there's a section with zero or more optional fields.
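As an illustration of the optional-field handling, a minimal C parser might look like the following. The struct and function names are hypothetical; the sketch keeps only the fields used later in this thread and decodes the octal escapes (e.g. "\040" for a space) that the kernel applies to the path fields.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fields of one mountinfo entry used in this thread (names are illustrative). */
struct mountinfo_entry {
    int mount_id;
    int parent_id;
    char root[4096];
    char mount_point[4096];
    char fstype[64];
    char mount_source[4096];
};

/* Undo the kernel's octal escapes such as "\040" (space) in path fields. */
static void unescape_octal(char *s)
{
    char *out = s;

    while (*s != '\0') {
        if (s[0] == '\\' && s[1] >= '0' && s[1] <= '3' &&
            s[2] >= '0' && s[2] <= '7' && s[3] >= '0' && s[3] <= '7') {
            *out++ = (char)((s[1] - '0') * 64 + (s[2] - '0') * 8 + (s[3] - '0'));
            s += 4;
        } else {
            *out++ = *s++;
        }
    }
    *out = '\0';
}

/*
 * Parse one mountinfo line in place. Fields 1-6 are fixed; then come zero
 * or more optional fields (shared:N, master:N, ...) terminated by a lone
 * "-", followed by the filesystem type, mount source and super options.
 * Returns 0 on success and -1 on a malformed line.
 */
static int parse_mountinfo_line(char *line, struct mountinfo_entry *e)
{
    char *save = NULL;
    char *f[6];
    char *tok;

    assert(line != NULL && e != NULL);
    for (int i = 0; i < 6; i++) {
        f[i] = strtok_r(i == 0 ? line : NULL, " \n", &save);
        if (f[i] == NULL)
            return -1;
    }
    /* Skip the optional fields; there may be none at all. */
    do {
        tok = strtok_r(NULL, " \n", &save);
        if (tok == NULL)
            return -1;
    } while (strcmp(tok, "-") != 0);

    char *fstype = strtok_r(NULL, " \n", &save);
    char *source = strtok_r(NULL, " \n", &save);
    if (fstype == NULL || source == NULL)
        return -1;
    if (sscanf(f[0], "%d", &e->mount_id) != 1 ||
        sscanf(f[1], "%d", &e->parent_id) != 1)
        return -1;

    unescape_octal(f[3]);
    unescape_octal(f[4]);
    unescape_octal(source);
    snprintf(e->root, sizeof(e->root), "%s", f[3]);
    snprintf(e->mount_point, sizeof(e->mount_point), "%s", f[4]);
    snprintf(e->fstype, sizeof(e->fstype), "%s", fstype);
    snprintf(e->mount_source, sizeof(e->mount_source), "%s", source);
    return 0;
}
```

Skipping forward to the lone "-" rather than counting fields is what keeps the parser correct whether an entry carries no optional fields (a private mount) or several (e.g. both shared:N and master:N).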

The root question is, "has $source been bind mounted at $dest?" Working
with your example above, we'd have:

 $source = /snap/snapd-hacker-toolbelt/14/src
 $dest = /snap/snapd-hacker-toolbelt/14/dst

First, let's determine whether anything is even mounted at $dest. Do
this by scanning mountinfo and looking for $dest (expressed as the full
path relative to the process's root) in the "mount point" field of one
of mountinfo's entries. Note however that the mount point could appear
more than once, with mounts covering other mounts, and I assume you
really want to know about the topmost mount (the one whose contents
you'd see in the mount point). I suspect this will always be the last
entry in mountinfo with $dest as its mount point, but I need to confirm
this.
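Assuming, as suggested above, that the last entry with $dest as its mount point is the topmost one, the lookup reduces to a backwards scan. A minimal C sketch follows; topmost_mount_at is a hypothetical name, and $dest is assumed to be an absolute, canonical path compared against already-unescaped mount point fields.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: return the index of the topmost mount at dest, i.e.
 * the last entry (in mountinfo order) whose mount point equals dest, or -1
 * when nothing is mounted there. Relies on a covering mount being listed
 * after the mounts it shadows.
 */
static int topmost_mount_at(const char *const *mount_points, int n, const char *dest)
{
    assert(dest != NULL && dest[0] == '/');
    for (int i = n - 1; i >= 0; i--)
        if (strcmp(mount_points[i], dest) == 0)
            return i;
    return -1;
}
```

A return value of -1 corresponds to the "nothing is mounted at $dest" case, so the bind mount has not been done.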

If nothing is mounted at $dest, then you've got your answer. Otherwise
you need to determine exactly what's mounted there. In our example we
end up with this entry:

 678 737 7:8 /src /snap/snapd-hacker-toolbelt/14/dst rw,relatime master:30 - squashfs /dev/loop8 ro

We're going to be interested in three fields - root, mount point, and
mount source. Extracting these:

 $dest:
   root = /src
   mount_point = /snap/snapd-hacker-toolbelt/14/dst
   mount_source = /dev/loop8

From this we know the root of the mount at $dest is /src within the
filesystem at /dev/loop8.

One important thing to note here is that mount_source is filesystem
specific, so for some filesystems such as fuse you can't count on it
being a path to a block device. You can count on it being the same for
any two entries which represent mounts from the same filesystem,
however.

Now, as you said, we should search mountinfo for the entry whose mount
point is the longest prefix of $source (keeping in mind that there could
be more than one). That gives us:

 737 730 7:8 / /snap/snapd-hacker-toolbelt/14 rw,relatime master:30 - squashfs /dev/loop8 ro

Or:

 $source:
   root = /
   mount_point = /snap/snapd-hacker-toolbelt/14
   mount_source = /dev/loop8

If $source and $dest have different mount_source values, you know you
don't have the bind mount you're looking for. Otherwise, strip the
mount_point prefix from $source, then prepend the root of the $source
entry; this gives you the path of $source within the mount_source
filesystem. If this is the same as the root for $dest, then you have
what you want mounted there.
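Putting the comparison together, a minimal C sketch of the final check might look like this. The struct and function names are hypothetical: src is the entry whose mount point is the longest prefix of $source, dst is the topmost entry at $dest, and all paths are assumed to be absolute, normalized and unescaped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* The three mountinfo fields the check needs (names are illustrative). */
struct mnt_fields {
    const char *root;
    const char *mount_point;
    const char *mount_source;
};

/*
 * Sketch of the check described above: same mount_source, and the path of
 * source_path within that filesystem equals the root of the $dest mount.
 * The caller guarantees src->mount_point is a path prefix of source_path.
 */
static bool already_bind_mounted(const char *source_path,
                                 const struct mnt_fields *src,
                                 const struct mnt_fields *dst)
{
    char want[4096];
    const char *rel;
    size_t mp_len = strlen(src->mount_point);

    /* Different mount sources: not the bind mount we are looking for. */
    if (strcmp(src->mount_source, dst->mount_source) != 0)
        return false;

    /* Strip the mount point from the source path ("/" strips nothing). */
    assert(strncmp(source_path, src->mount_point, mp_len) == 0);
    rel = strcmp(src->mount_point, "/") == 0 ? source_path : source_path + mp_len;

    /* Prepend the source entry's root, avoiding a doubled "/". */
    if (strcmp(src->root, "/") == 0)
        snprintf(want, sizeof(want), "%s", *rel != '\0' ? rel : "/");
    else
        snprintf(want, sizeof(want), "%s%s", src->root, rel);

    return strcmp(want, dst->root) == 0;
}
```

With the loop8 entries above this returns true for $source = /snap/snapd-hacker-toolbelt/14/src. One caveat follows from the note on mount_source: equal mount_source strings do not by themselves prove that two entries come from the same filesystem (separate tmpfs instances can all report a source such as "tmpfs"), so also comparing the major:minor field may be a useful extra safeguard.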

I believe that should give you a definite answer as to whether the bind
mount you desire has already been done. Let me know if you find any
mistakes or have any questions. I'll look into how to be sure that
you've identified the topmost mount for a given path and let you know.

Seth

Seth Forshee Feb. 2, 2017, 9:12 p.m. UTC | #4

On Thu, Feb 02, 2017 at 01:27:11PM -0600, Seth Forshee wrote:
> Note however that the mount point could appear more than once,
> with mounts covering other mounts, and I assume you really want to know
> about the topmost mount (the one whose contents you'd see in the mount
> point). I suspect this will always be the last entry in mountinfo with
> $dest as its mount point, but I need to confirm this.

This should be true. mountinfo displays mounts in the order that they
appear in the mount namespace's list of mounts, and because new mounts
are added at the end of the list, a new mount will always appear after
any mounts which are shadowed by it.

Seth

Patch

diff --git a/fs/namespace.c b/fs/namespace.c
index c9ba9d1..e2fc0c1 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2162,6 +2162,7 @@  static int do_loopback(struct path *path, const char *old_name,
 	}
 
 	mnt->mnt.mnt_flags &= ~MNT_LOCKED;
+	mnt->mnt.mnt_flags |= MNT_BIND;
 
 	err = graft_tree(mnt, parent, mp);
 	if (err) {
diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c
index 876459559..e510585 100644
--- a/fs/proc_namespace.c
+++ b/fs/proc_namespace.c
@@ -67,6 +67,7 @@  static void show_mnt_opts(struct seq_file *m, struct vfsmount *mnt)
 		{ MNT_NOATIME, ",noatime" },
 		{ MNT_NODIRATIME, ",nodiratime" },
 		{ MNT_RELATIME, ",relatime" },
+		{ MNT_BIND, ",bind" },
 		{ 0, NULL }
 	};
 	const struct proc_fs_info *fs_infop;
diff --git a/include/linux/mount.h b/include/linux/mount.h
index 1172cce..81f7bec 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -62,6 +62,7 @@  struct mnt_namespace;
 #define MNT_SYNC_UMOUNT		0x2000000
 #define MNT_MARKED		0x4000000
 #define MNT_UMOUNT		0x8000000
+#define MNT_BIND		0x10000000
 
 struct vfsmount {
 	struct dentry *mnt_root;	/* root of the mounted tree */
