diff mbox series

[B,2/2] unfuck sysfs_mount()

Message ID 20210630190001.6112-3-gpiccoli@canonical.com
State New
Headers show
Series Kernel oops due to uninitialized list on kernfs | expand

Commit Message

Guilherme G. Piccoli June 30, 2021, 7 p.m. UTC
From: Al Viro <viro@zeniv.linux.org.uk>

BugLink: https://bugs.launchpad.net/bugs/1934175

new_sb is left uninitialized in case of early failures in kernfs_mount_ns(),
and while IS_ERR(root) is true in all such cases, using IS_ERR(root) || !new_sb
is not a solution - IS_ERR(root) is true in some cases when new_sb is true.

Make sure new_sb is initialized (and matches the reality) in all cases and
fix the condition for dropping kobj reference - we want it done precisely
in those situations where the reference has not been transferred into a new
super_block instance.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
(cherry picked from commit 7b745a4e4051e1bbce40e0b1c2cf636c70583aa4)
Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
---
 fs/sysfs/mount.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Comments

Thadeu Lima de Souza Cascardo July 1, 2021, 12:32 p.m. UTC | #1
On Wed, Jun 30, 2021 at 04:00:01PM -0300, Guilherme G. Piccoli wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> BugLink: https://bugs.launchpad.net/bugs/1934175
> 
> new_sb is left uninitialized in case of early failures in kernfs_mount_ns(),
> and while IS_ERR(root) is true in all such cases, using IS_ERR(root) || !new_sb
> is not a solution - IS_ERR(root) is true in some cases when new_sb is true.
> 
> Make sure new_sb is initialized (and matches the reality) in all cases and
> fix the condition for dropping kobj reference - we want it done precisely
> in those situations where the reference has not been transferred into a new
> super_block instance.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> (cherry picked from commit 7b745a4e4051e1bbce40e0b1c2cf636c70583aa4)
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
> ---
>  fs/sysfs/mount.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
> index fb49510c5dcf..88b388415d0e 100644
> --- a/fs/sysfs/mount.c
> +++ b/fs/sysfs/mount.c
> @@ -28,7 +28,7 @@ static struct dentry *sysfs_mount(struct file_system_type *fs_type,
>  {
>  	struct dentry *root;
>  	void *ns;
> -	bool new_sb;
> +	bool new_sb = false;
>  
>  	if (!(flags & SB_KERNMOUNT)) {
>  		if (!kobj_ns_current_may_mount(KOBJ_NS_TYPE_NET))
> @@ -38,9 +38,9 @@ static struct dentry *sysfs_mount(struct file_system_type *fs_type,
>  	ns = kobj_ns_grab_current(KOBJ_NS_TYPE_NET);
>  	root = kernfs_mount_ns(fs_type, flags, sysfs_root,
>  				SYSFS_MAGIC, &new_sb, ns);
> -	if (IS_ERR(root) || !new_sb)
> +	if (!new_sb)
>  		kobj_ns_drop(KOBJ_NS_TYPE_NET, ns);
> -	else if (new_sb)
> +	else if (!IS_ERR(root))
>  		root->d_sb->s_iflags |= SB_I_USERNS_VISIBLE;
>  
>  	return root;
> -- 
> 2.31.1

I still need to wrap my head around sget_userns (called from
kernfs_mount_ns), but it looks like we can get away with dropping the
kobj_ns reference here when this is not a new superblock, because there
will be a reference to this "old" superblock, which will prevent it from
being killed, and we only drop the kobj_ns on sysfs_kill_sb (defined below
sysfs_mount) when all superblock references are done with.

So we only need to keep the kobj_ns reference in the case of new_sb.

However, if this is a new_sb, but we return an error
(fs/kernfs/mount.c:340), with the old code, we would drop the kobj_ns
reference, as expected. With this patch, it would not drop it, leading to a
kernel leak.

This code has been totally revamped after that change, with commit
23bf1b6be9c2 ("kernfs, sysfs, cgroup, intel_rdt: Support fs_context"),
which means that whatever bug this is fixing needs to be fixed specifically
for our 4.15 kernel (and upstream 4.14.y, and v4.19.y, it seems).

Cascardo.
Guilherme G. Piccoli July 1, 2021, 12:50 p.m. UTC | #2
Cascardo, thanks a lot for the deep analysis!
The first commit in the series was the real fix, and you acked it! So,
we're fine.

Regarding this one, honestly I always had a terrible time with VFS
commits due to the complete lack of good commit messages from Al plus
the complexity of the code. This one, for me seemed a very
straightforward fix - I've also asked feedback in linux-stable (from
Al himself!) and the patch was accepted for 4.14.y .

You seem to have a good understanding of this area, so I'd suggest you
to reply [0] with your analysis to prevent it to get added on stable
4.14.y - and if it is added there, we can either remove it later or
just not merge this stable fix in our kernels.

Cheers,


Guilherme


[0] https://lore.kernel.org/stable/20210628143628.33342-64-sashal@kernel.org/
Thadeu Lima de Souza Cascardo July 1, 2021, 3:09 p.m. UTC | #3
On Thu, Jul 01, 2021 at 09:32:48AM -0300, Thadeu Lima de Souza Cascardo wrote:
> On Wed, Jun 30, 2021 at 04:00:01PM -0300, Guilherme G. Piccoli wrote:
> > From: Al Viro <viro@zeniv.linux.org.uk>
> > 
> > BugLink: https://bugs.launchpad.net/bugs/1934175
> > 
> > new_sb is left uninitialized in case of early failures in kernfs_mount_ns(),
> > and while IS_ERR(root) is true in all such cases, using IS_ERR(root) || !new_sb
> > is not a solution - IS_ERR(root) is true in some cases when new_sb is true.
> > 
> > Make sure new_sb is initialized (and matches the reality) in all cases and
> > fix the condition for dropping kobj reference - we want it done precisely
> > in those situations where the reference has not been transferred into a new
> > super_block instance.
> > 
> > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> > (cherry picked from commit 7b745a4e4051e1bbce40e0b1c2cf636c70583aa4)
> > Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
> > ---
> >  fs/sysfs/mount.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> > 
> > diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
> > index fb49510c5dcf..88b388415d0e 100644
> > --- a/fs/sysfs/mount.c
> > +++ b/fs/sysfs/mount.c
> > @@ -28,7 +28,7 @@ static struct dentry *sysfs_mount(struct file_system_type *fs_type,
> >  {
> >  	struct dentry *root;
> >  	void *ns;
> > -	bool new_sb;
> > +	bool new_sb = false;
> >  
> >  	if (!(flags & SB_KERNMOUNT)) {
> >  		if (!kobj_ns_current_may_mount(KOBJ_NS_TYPE_NET))
> > @@ -38,9 +38,9 @@ static struct dentry *sysfs_mount(struct file_system_type *fs_type,
> >  	ns = kobj_ns_grab_current(KOBJ_NS_TYPE_NET);
> >  	root = kernfs_mount_ns(fs_type, flags, sysfs_root,
> >  				SYSFS_MAGIC, &new_sb, ns);
> > -	if (IS_ERR(root) || !new_sb)
> > +	if (!new_sb)
> >  		kobj_ns_drop(KOBJ_NS_TYPE_NET, ns);
> > -	else if (new_sb)
> > +	else if (!IS_ERR(root))
> >  		root->d_sb->s_iflags |= SB_I_USERNS_VISIBLE;
> >  
> >  	return root;
> > -- 
> > 2.31.1
> 
> I still need to wrap my head around sget_userns (called from
> kernfs_mount_ns), but it looks like we can get away with dropping the
> kobj_ns reference here when this is not a new superblock, because there
> will be a reference to this "old" superblock, which will prevent it from
> being killed, and we only drop the kobj_ns on sysfs_kill_sb (defined below
> sysfs_mount) when all superblock references are done with.
> 
> So we only need to keep the kobj_ns reference in the case of new_sb.
> 
> However, if this is a new_sb, but we return an error
> (fs/kernfs/mount.c:340), with the old code, we would drop the kobj_ns
> reference, as expected. With this patch, it would not drop it, leading to a
> kernel leak.
> 

It was called to my attention that we drop the reference by calling kill_sb
from deactive_locked_super, which is called right before returning the error.
And this was the motivation for the fix in the first place, which somehow did
not seem clear from the commit message.

So, we have:

1) early failure on kernfs_mount_ns, new_sb is false, deactivate_locked_super
has not been called, kill_sb will not be called: we need to drop the reference.

2) only other failure on kernfs_mount_ns is when new_sb is true and
deactive_locked_super has been called: no need to drop the reference.

3) success on kernfs_mount_ns, either new_sb is set or not, we want to drop
when new_sb is false to avoid a leak, because we only grabbed an extra
reference on struct super_block s_active member.

With that,

Acked-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>

> This code has been totally revamped after that change, with commit
> 23bf1b6be9c2 ("kernfs, sysfs, cgroup, intel_rdt: Support fs_context"),
> which means that whatever bug this is fixing needs to be fixed specifically
> for our 4.15 kernel (and upstream 4.14.y, and v4.19.y, it seems).
> 
> Cascardo.
> 
> -- 
> kernel-team mailing list
> kernel-team@lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team
Guilherme G. Piccoli July 1, 2021, 3:14 p.m. UTC | #4
On Thu, Jul 1, 2021 at 12:09 PM Thadeu Lima de Souza Cascardo
<cascardo@canonical.com> wrote:
>
> On Thu, Jul 01, 2021 at 09:32:48AM -0300, Thadeu Lima de Souza Cascardo wrote:
> > On Wed, Jun 30, 2021 at 04:00:01PM -0300, Guilherme G. Piccoli wrote:
> > > From: Al Viro <viro@zeniv.linux.org.uk>
> > >
> > > BugLink: https://bugs.launchpad.net/bugs/1934175
> > >
> > > new_sb is left uninitialized in case of early failures in kernfs_mount_ns(),
> > > and while IS_ERR(root) is true in all such cases, using IS_ERR(root) || !new_sb
> > > is not a solution - IS_ERR(root) is true in some cases when new_sb is true.
> > >
> > > Make sure new_sb is initialized (and matches the reality) in all cases and
> > > fix the condition for dropping kobj reference - we want it done precisely
> > > in those situations where the reference has not been transferred into a new
> > > super_block instance.
> > >
> > > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> > > (cherry picked from commit 7b745a4e4051e1bbce40e0b1c2cf636c70583aa4)
> > > Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
> > > ---
> > >  fs/sysfs/mount.c | 6 +++---
> > >  1 file changed, 3 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
> > > index fb49510c5dcf..88b388415d0e 100644
> > > --- a/fs/sysfs/mount.c
> > > +++ b/fs/sysfs/mount.c
> > > @@ -28,7 +28,7 @@ static struct dentry *sysfs_mount(struct file_system_type *fs_type,
> > >  {
> > >     struct dentry *root;
> > >     void *ns;
> > > -   bool new_sb;
> > > +   bool new_sb = false;
> > >
> > >     if (!(flags & SB_KERNMOUNT)) {
> > >             if (!kobj_ns_current_may_mount(KOBJ_NS_TYPE_NET))
> > > @@ -38,9 +38,9 @@ static struct dentry *sysfs_mount(struct file_system_type *fs_type,
> > >     ns = kobj_ns_grab_current(KOBJ_NS_TYPE_NET);
> > >     root = kernfs_mount_ns(fs_type, flags, sysfs_root,
> > >                             SYSFS_MAGIC, &new_sb, ns);
> > > -   if (IS_ERR(root) || !new_sb)
> > > +   if (!new_sb)
> > >             kobj_ns_drop(KOBJ_NS_TYPE_NET, ns);
> > > -   else if (new_sb)
> > > +   else if (!IS_ERR(root))
> > >             root->d_sb->s_iflags |= SB_I_USERNS_VISIBLE;
> > >
> > >     return root;
> > > --
> > > 2.31.1
> >
> > I still need to wrap my head around sget_userns (called from
> > kernfs_mount_ns), but it looks like we can get away with dropping the
> > kobj_ns reference here when this is not a new superblock, because there
> > will be a reference to this "old" superblock, which will prevent it from
> > being killed, and we only drop the kobj_ns on sysfs_kill_sb (defined below
> > sysfs_mount) when all superblock references are done with.
> >
> > So we only need to keep the kobj_ns reference in the case of new_sb.
> >
> > However, if this is a new_sb, but we return an error
> > (fs/kernfs/mount.c:340), with the old code, we would drop the kobj_ns
> > reference, as expected. With this patch, it would not drop it, leading to a
> > kernel leak.
> >
>
> It was called to my attention that we drop the reference by calling kill_sb
> from deactive_locked_super, which is called right before returning the error.
> And this was the motivation for the fix in the first place, which somehow did
> not seem clear from the commit message.
>
> So, we have:
>
> 1) early failure on kernfs_mount_ns, new_sb is false, deactivate_locked_super
> has not been called, kill_sb will not be called: we need to drop the reference.
>
> 2) only other failure on kernfs_mount_ns is when new_sb is true and
> deactive_locked_super has been called: no need to drop the reference.
>
> 3) success on kernfs_mount_ns, either new_sb is set or not, we want to drop
> when new_sb is false to avoid a leak, because we only grabbed an extra
> reference on struct super_block s_active member.
>
> With that,
>
> Acked-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
>


Thanks again, very informative analysis!
diff mbox series

Patch

diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
index fb49510c5dcf..88b388415d0e 100644
--- a/fs/sysfs/mount.c
+++ b/fs/sysfs/mount.c
@@ -28,7 +28,7 @@  static struct dentry *sysfs_mount(struct file_system_type *fs_type,
 {
 	struct dentry *root;
 	void *ns;
-	bool new_sb;
+	bool new_sb = false;
 
 	if (!(flags & SB_KERNMOUNT)) {
 		if (!kobj_ns_current_may_mount(KOBJ_NS_TYPE_NET))
@@ -38,9 +38,9 @@  static struct dentry *sysfs_mount(struct file_system_type *fs_type,
 	ns = kobj_ns_grab_current(KOBJ_NS_TYPE_NET);
 	root = kernfs_mount_ns(fs_type, flags, sysfs_root,
 				SYSFS_MAGIC, &new_sb, ns);
-	if (IS_ERR(root) || !new_sb)
+	if (!new_sb)
 		kobj_ns_drop(KOBJ_NS_TYPE_NET, ns);
-	else if (new_sb)
+	else if (!IS_ERR(root))
 		root->d_sb->s_iflags |= SB_I_USERNS_VISIBLE;
 
 	return root;