diff mbox series

ext4: fix memory leak in ext4_fill_super

Message ID 20210428172828.12589-1-paskripkin@gmail.com
State Superseded
Headers show
Series ext4: fix memory leak in ext4_fill_super | expand

Commit Message

Pavel Skripkin April 28, 2021, 5:28 p.m. UTC
syzbot reported memory leak in ext4 subsyetem.
The problem appears, when thread_stop() call happens
before wake_up_process().

Normally, this data will be freed by
created thread, but if kthread_stop()
returned -EINTR, this data should be freed manually

Reported-by: syzbot+d9e482e303930fa4f6ff@syzkaller.appspotmail.com
Tested-by: syzbot+d9e482e303930fa4f6ff@syzkaller.appspotmail.com
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
---
 fs/ext4/super.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

Vegard Nossum April 29, 2021, 10:01 a.m. UTC | #1
On 2021-04-28 19:28, Pavel Skripkin wrote:
> syzbot reported memory leak in ext4 subsyetem.
> The problem appears, when thread_stop() call happens
> before wake_up_process().
> 
> Normally, this data will be freed by
> created thread, but if kthread_stop()
> returned -EINTR, this data should be freed manually
> 
> Reported-by: syzbot+d9e482e303930fa4f6ff@syzkaller.appspotmail.com
> Tested-by: syzbot+d9e482e303930fa4f6ff@syzkaller.appspotmail.com
> Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
> ---
>   fs/ext4/super.c | 6 ++++--
>   1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index b9693680463a..9c33e97bd5c5 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -5156,8 +5156,10 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
>   failed_mount3:
>   	flush_work(&sbi->s_error_work);
>   	del_timer_sync(&sbi->s_err_report);
> -	if (sbi->s_mmp_tsk)
> -		kthread_stop(sbi->s_mmp_tsk);
> +	if (sbi->s_mmp_tsk) {
> +		if (kthread_stop(sbi->s_mmp_tsk) == -EINTR)
> +			kfree(kthread_data(sbi->s_mmp_tsk));
> +	}
>   failed_mount2:
>   	rcu_read_lock();
>   	group_desc = rcu_dereference(sbi->s_group_desc);
> 

So I've looked at this, and the puzzling thing is that ext4 uses
kthread_run() which immediately calls wake_up_process() -- according to
the kerneldoc for kthread_stop(), it shouldn't return -EINTR in this
case:

  * Returns the result of threadfn(), or %-EINTR if wake_up_process()
  * was never called.
  */
int kthread_stop(struct task_struct *k)

So it really looks like kthread_stop() can return -EINTR even when
wake_up_process() has been called but the thread hasn't had a chance to
run yet?

If this is true, then we either have to fix kthread_create() to make
sure it respects the behaviour that is claimed by the comment OR we have
to audit every single kthread_stop() in the kernel which does not check
for -EINTR.


Vegard
Pavel Skripkin April 29, 2021, 11:08 a.m. UTC | #2
On Thu, 29 Apr 2021 12:01:46 +0200
Vegard Nossum <vegard.nossum@oracle.com> wrote:

> 
> On 2021-04-28 19:28, Pavel Skripkin wrote:
> > syzbot reported memory leak in ext4 subsyetem.
> > The problem appears, when thread_stop() call happens
> > before wake_up_process().
> > 
> > Normally, this data will be freed by
> > created thread, but if kthread_stop()
> > returned -EINTR, this data should be freed manually
> > 
> > Reported-by: syzbot+d9e482e303930fa4f6ff@syzkaller.appspotmail.com
> > Tested-by: syzbot+d9e482e303930fa4f6ff@syzkaller.appspotmail.com
> > Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
> > ---
> >   fs/ext4/super.c | 6 ++++--
> >   1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> > index b9693680463a..9c33e97bd5c5 100644
> > --- a/fs/ext4/super.c
> > +++ b/fs/ext4/super.c
> > @@ -5156,8 +5156,10 @@ static int ext4_fill_super(struct
> > super_block *sb, void *data, int silent) failed_mount3:
> >   	flush_work(&sbi->s_error_work);
> >   	del_timer_sync(&sbi->s_err_report);
> > -	if (sbi->s_mmp_tsk)
> > -		kthread_stop(sbi->s_mmp_tsk);
> > +	if (sbi->s_mmp_tsk) {
> > +		if (kthread_stop(sbi->s_mmp_tsk) == -EINTR)
> > +			kfree(kthread_data(sbi->s_mmp_tsk));
> > +	}
> >   failed_mount2:
> >   	rcu_read_lock();
> >   	group_desc = rcu_dereference(sbi->s_group_desc);
> > 
> 
> So I've looked at this, and the puzzling thing is that ext4 uses
> kthread_run() which immediately calls wake_up_process() -- according
> to the kerneldoc for kthread_stop(), it shouldn't return -EINTR in
> this case:
> 
>   * Returns the result of threadfn(), or %-EINTR if wake_up_process()
>   * was never called.
>   */
> int kthread_stop(struct task_struct *k)
> 
> So it really looks like kthread_stop() can return -EINTR even when
> wake_up_process() has been called but the thread hasn't had a chance
> to run yet?
> 
> If this is true, then we either have to fix kthread_create() to make
> sure it respects the behaviour that is claimed by the comment OR we
> have to audit every single kthread_stop() in the kernel which does
> not check for -EINTR.
> 

Me and Vegard found the root case of this bug:

static int kthread(void *_create) 
{
	....
	ret = -EINTR;
	if (!test_bit(KTHREAD_SHOULD_STOP, &self->flags)) {
		cgroup_kthread_ready();
		__kthread_parkme(self);
		ret = threadfn(data);
	}
	
	do_exit(ret);
}

There is a change, that kthread_stop() call will happen before this
. It means, that all kthread_stop() return value must be checked
everywhere

Vegard wrote code snippet, which reproduces this behavior:

#include <linux/printk.h>
#include <linux/proc_fs.h>
#include <linux/kthread.h>

static int test_thread(void *data)
{
        printk(KERN_ERR "test_thread()\n");
        return 0;
}

static int test_show(struct seq_file *seq, void *data)
{
        struct task_struct *t = kthread_run(test_thread, NULL, "test");
        if (!IS_ERR(t)) {
                int ret = kthread_stop(t);
                printk(KERN_ERR "kthread_stop() = %d\n", ret);
        }

        return 0;
}

static void __init init_test(void)
{
        proc_create_single("test", 0444, NULL, &test_show);
}

late_initcall(init_test);


> 
> Vegard



With regards,
Pavel Skripkin
Pavel Skripkin April 29, 2021, 11:33 a.m. UTC | #3
On Thu, 29 Apr 2021 12:01:46 +0200
Vegard Nossum <vegard.nossum@oracle.com> wrote:

> 
> On 2021-04-28 19:28, Pavel Skripkin wrote:
> > syzbot reported memory leak in ext4 subsyetem.
> > The problem appears, when thread_stop() call happens
> > before wake_up_process().
> > 
> > Normally, this data will be freed by
> > created thread, but if kthread_stop()
> > returned -EINTR, this data should be freed manually
> > 
> > Reported-by: syzbot+d9e482e303930fa4f6ff@syzkaller.appspotmail.com
> > Tested-by: syzbot+d9e482e303930fa4f6ff@syzkaller.appspotmail.com
> > Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
> > ---
> >   fs/ext4/super.c | 6 ++++--
> >   1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> > index b9693680463a..9c33e97bd5c5 100644
> > --- a/fs/ext4/super.c
> > +++ b/fs/ext4/super.c
> > @@ -5156,8 +5156,10 @@ static int ext4_fill_super(struct
> > super_block *sb, void *data, int silent) failed_mount3:
> >   	flush_work(&sbi->s_error_work);
> >   	del_timer_sync(&sbi->s_err_report);
> > -	if (sbi->s_mmp_tsk)
> > -		kthread_stop(sbi->s_mmp_tsk);
> > +	if (sbi->s_mmp_tsk) {
> > +		if (kthread_stop(sbi->s_mmp_tsk) == -EINTR)
> > +			kfree(kthread_data(sbi->s_mmp_tsk));
> > +	}
> >   failed_mount2:
> >   	rcu_read_lock();
> >   	group_desc = rcu_dereference(sbi->s_group_desc);
> > 
> 
> So I've looked at this, and the puzzling thing is that ext4 uses
> kthread_run() which immediately calls wake_up_process() -- according
> to the kerneldoc for kthread_stop(), it shouldn't return -EINTR in
> this case:
> 
>   * Returns the result of threadfn(), or %-EINTR if wake_up_process()
>   * was never called.
>   */
> int kthread_stop(struct task_struct *k)
> 
> So it really looks like kthread_stop() can return -EINTR even when
> wake_up_process() has been called but the thread hasn't had a chance
> to run yet?
> 
> If this is true, then we either have to fix kthread_create() to make
> sure it respects the behaviour that is claimed by the comment OR we
> have to audit every single kthread_stop() in the kernel which does
> not check for -EINTR.
> 
> 
> Vegard

I am sorry for my complitely broken mail client :(

Me and Vegard found the root case of this bug:

static int kthread(void *_create) 
{
	....
	ret = -EINTR;
	if (!test_bit(KTHREAD_SHOULD_STOP, &self->flags)) {
		cgroup_kthread_ready();
		__kthread_parkme(self);
		ret = threadfn(data);
	}
	
	do_exit(ret);
}

There is a chance, that kthread_stop() call will happen before
threadfn call. It means, that kthread_stop() return value must be checked everywhere,
isn't it? Otherwise, there are a lot of potential memory leaks,
because some developers rely on the fact, that data allocated for the thread will
be freed _inside_ thread function.

Vegard wrote the code snippet, which reproduces this behavior:

#include <linux/printk.h>
#include <linux/proc_fs.h>
#include <linux/kthread.h>

static int test_thread(void *data)
{
        printk(KERN_ERR "test_thread()\n");
        return 0;
}

static int test_show(struct seq_file *seq, void *data)
{
        struct task_struct *t = kthread_run(test_thread, NULL, "test");
        if (!IS_ERR(t)) {
                int ret = kthread_stop(t);
                printk(KERN_ERR "kthread_stop() = %d\n", ret);
        }

        return 0;
}

static void __init init_test(void)
{
        proc_create_single("test", 0444, NULL, &test_show);
}

late_initcall(init_test);

So, is this behavior is expected or not? Should maintainers rewrite
code, which doesn't check kthread_stop() return value?


With regards,
Pavel Skripkin
Theodore Ts'o April 29, 2021, 5:05 p.m. UTC | #4
On Thu, Apr 29, 2021 at 02:33:54PM +0300, Pavel Skripkin wrote:
> 
> There is a chance, that kthread_stop() call will happen before
> threadfn call. It means, that kthread_stop() return value must be checked everywhere,
> isn't it? Otherwise, there are a lot of potential memory leaks,
> because some developers rely on the fact, that data allocated for the thread will
> be freed _inside_ thread function.

That's not the only potential way that we could leak memory.  Earlier
in kthread(), if this memory allocation fails,

	self = kzalloc(sizeof(*self), GFP_KERNEL);

we will exit with -ENOMEM.  So at the very least all callers of
kthread_stop() also need to check for -ENOMEM as well as -EINTR ---
or, be somehow sure that the thread function was successfully called
and started.  In this particular case, the ext4 mount code had just
started the kmmpd thread, and then detected that something else had
gone wrong, and failed the mount before the kmmpd thread ever had a
chance to run.

I think if we want to fix this more generally across the whole kernel,
we would need to have a variant of kthread_run which supplies two
functions --- one which is the thread function, and the other which is
a cleanup function.  The cleanup function could just be kfree, but
there will be other cases where the cleanup function will need to do
other work before freeing the data structure (e.g., brelse((struct
mmpd_data *)data->bh)).

Is it worth it to provide such a cleanup function, which if present
would be called any time the thread exits or is killed?  I dunno.
It's probably simpler to just strongly recommend that the cleanup work
should never be done in the thread function, but after kthread_stop()
is called, whether it returns an error or not.  That's probably the
right fix for ext4, I think.

(Although note that kthread_stop(sbi->s_mmp_task) is called in
multiple places in fs/ext4/super.c, not just in the single location
which this patch touches.)

						- Ted
Pavel Skripkin April 29, 2021, 7:20 p.m. UTC | #5
Hi! Thanks for your reply.

On Thu, 29 Apr 2021 13:05:01 -0400
"Theodore Ts'o" <tytso@mit.edu> wrote:
> On Thu, Apr 29, 2021 at 02:33:54PM +0300, Pavel Skripkin wrote:
> > 
> > There is a chance, that kthread_stop() call will happen before
> > threadfn call. It means, that kthread_stop() return value must be
> > checked everywhere, isn't it? Otherwise, there are a lot of
> > potential memory leaks, because some developers rely on the fact,
> > that data allocated for the thread will be freed _inside_ thread
> > function.
> 
> That's not the only potential way that we could leak memory.  Earlier
> in kthread(), if this memory allocation fails,
> 
> 	self = kzalloc(sizeof(*self), GFP_KERNEL);
> 
> we will exit with -ENOMEM.  So at the very least all callers of
> kthread_stop() also need to check for -ENOMEM as well as -EINTR ---
> or, be somehow sure that the thread function was successfully called
> and started.  In this particular case, the ext4 mount code had just
> started the kmmpd thread, and then detected that something else had
> gone wrong, and failed the mount before the kmmpd thread ever had a
> chance to run.
> 
> I think if we want to fix this more generally across the whole kernel,
> we would need to have a variant of kthread_run which supplies two
> functions --- one which is the thread function, and the other which is
> a cleanup function.  The cleanup function could just be kfree, but
> there will be other cases where the cleanup function will need to do
> other work before freeing the data structure (e.g., brelse((struct
> mmpd_data *)data->bh)).

I skimmed through kernel code and I didn't find any code
examples, except ext4, where kthread is freeing something. Maybe, this
API isn't required, but, as Vegard said, comment over
kthread_stop() should be changed, because it's confusing.

I have already added kthread.c developers (I hope, I chose
the right emails) to CC. Maybe, they will think about this API. 

> 
> Is it worth it to provide such a cleanup function, which if present
> would be called any time the thread exits or is killed?  I dunno.
> It's probably simpler to just strongly recommend that the cleanup work
> should never be done in the thread function, but after kthread_stop()
> is called, whether it returns an error or not.  That's probably the
> right fix for ext4, I think.
> 
> (Although note that kthread_stop(sbi->s_mmp_task) is called in
> multiple places in fs/ext4/super.c, not just in the single location
> which this patch touches.)
> 

Good point, I'll add this and -ENOMEM checks and will send v2.

Thanks!

> 						- Ted



With regards,
Pavel Skripkin
Pavel Skripkin April 29, 2021, 8:09 p.m. UTC | #6
On Thu, 29 Apr 2021 13:05:01 -0400
"Theodore Ts'o" <tytso@mit.edu> wrote:

> On Thu, Apr 29, 2021 at 02:33:54PM +0300, Pavel Skripkin wrote:
> > 
> > There is a chance, that kthread_stop() call will happen before
> > threadfn call. It means, that kthread_stop() return value must be
> > checked everywhere, isn't it? Otherwise, there are a lot of
> > potential memory leaks, because some developers rely on the fact,
> > that data allocated for the thread will be freed _inside_ thread
> > function.
> 
> That's not the only potential way that we could leak memory.  Earlier
> in kthread(), if this memory allocation fails,
> 
> 	self = kzalloc(sizeof(*self), GFP_KERNEL);
> 
> we will exit with -ENOMEM.  So at the very least all callers of
> kthread_stop() also need to check for -ENOMEM as well as -EINTR ---
> or, be somehow sure that the thread function was successfully called
> and started.  In this particular case, the ext4 mount code had just
> started the kmmpd thread, and then detected that something else had
> gone wrong, and failed the mount before the kmmpd thread ever had a
> chance to run.

There is a small problem about -ENOMEM:

static int kmmpd(void *data)
{
...
			retval = read_mmp_block(sb, &bh_check, mmp_block);
			if (retval) {
				ext4_error_err(sb, -retval,
					       "error reading MMP data: %d",
					       retval);
				goto exit_thread;
			}
...

exit_thread:
	EXT4_SB(sb)->s_mmp_tsk = NULL;
	kfree(data);
	brelse(bh);
	return retval;
}

read_mmp_block can return -ENOMEM. In this case double free will happen.
I believe, we can change `return retval;` to `return 0;`, because
kthread return value isn't checked anywhere.

What do You think about it? 


With regards,
Pavel Skripkin
Theodore Ts'o April 29, 2021, 9:41 p.m. UTC | #7
On Thu, Apr 29, 2021 at 11:09:56PM +0300, Pavel Skripkin wrote:
> > we will exit with -ENOMEM.  So at the very least all callers of
> > kthread_stop() also need to check for -ENOMEM as well as -EINTR ---
> > or, be somehow sure that the thread function was successfully called
> > and started.  In this particular case, the ext4 mount code had just
> > started the kmmpd thread, and then detected that something else had
> > gone wrong, and failed the mount before the kmmpd thread ever had a
> > chance to run.
> 
> There is a small problem about -ENOMEM...

What I'd suggest is that we simply move

> exit_thread:
> 	EXT4_SB(sb)->s_mmp_tsk = NULL;
> 	kfree(data);
> 	brelse(bh);
> 	return retval;
> }

out of the thread function.  That means hanging struct mmpd_data off
the struct ext4_sb_info structure, and then adding a function like
this to fs/ext4/mmp.c

static void ext4_stop_mmpd(struct ext4_sb_info *sbi)
{
	if (sbi->s_mmp_tsk) {
		kthread_stop(sbi->s_mmp_tsk);
		brelse(sbi->s_mmp_data->bh);
		kfree(sbi->s_mmp_data);
		sbi->s_mmp_data = NULL;
		sbi->s_mmp_tsk = NULL;
	}
}

Basically, just move all of the cleanup so it is done after the
kthread is stopped, so we don't have to do any fancy error checking.
We just do it unconditionally.

					- Ted

P.S.  Actually, we could drop the struct mmpd_data altogether, and
just hang the buffer head as sbi->s_mmp_bh.  Then we can just pass the
struct super * as the private pointer to kmmpd().
Pavel Skripkin April 29, 2021, 10:05 p.m. UTC | #8
On Thu, 29 Apr 2021 17:41:12 -0400
"Theodore Ts'o" <tytso@mit.edu> wrote:

> On Thu, Apr 29, 2021 at 11:09:56PM +0300, Pavel Skripkin wrote:
> > > we will exit with -ENOMEM.  So at the very least all callers of
> > > kthread_stop() also need to check for -ENOMEM as well as -EINTR
> > > --- or, be somehow sure that the thread function was successfully
> > > called and started.  In this particular case, the ext4 mount code
> > > had just started the kmmpd thread, and then detected that
> > > something else had gone wrong, and failed the mount before the
> > > kmmpd thread ever had a chance to run.
> > 
> > There is a small problem about -ENOMEM...
> 
> What I'd suggest is that we simply move
> 
> > exit_thread:
> > 	EXT4_SB(sb)->s_mmp_tsk = NULL;
> > 	kfree(data);
> > 	brelse(bh);
> > 	return retval;
> > }
> 
> out of the thread function.  That means hanging struct mmpd_data off
> the struct ext4_sb_info structure, and then adding a function like
> this to fs/ext4/mmp.c
> 
> static void ext4_stop_mmpd(struct ext4_sb_info *sbi)
> {
> 	if (sbi->s_mmp_tsk) {
> 		kthread_stop(sbi->s_mmp_tsk);
> 		brelse(sbi->s_mmp_data->bh);
> 		kfree(sbi->s_mmp_data);
> 		sbi->s_mmp_data = NULL;
> 		sbi->s_mmp_tsk = NULL;
> 	}
> }
> 
> Basically, just move all of the cleanup so it is done after the
> kthread is stopped, so we don't have to do any fancy error checking.
> We just do it unconditionally.

This sound much better than my idea. Will do it in v2.

Thanks!

> 
> 					- Ted
> 
> P.S.  Actually, we could drop the struct mmpd_data altogether, and
> just hang the buffer head as sbi->s_mmp_bh.  Then we can just pass the
> struct super * as the private pointer to kmmpd().
> 



With regards,
Pavel Skripkin
Theodore Ts'o April 30, 2021, 3:44 a.m. UTC | #9
On Fri, Apr 30, 2021 at 01:05:47AM +0300, Pavel Skripkin wrote:
> > out of the thread function.  That means hanging struct mmpd_data off
> > the struct ext4_sb_info structure, and then adding a function like
> > this to fs/ext4/mmp.c
> > 
> > static void ext4_stop_mmpd(struct ext4_sb_info *sbi)

That should "extern void ...", since it will be called from
fs/ext4/super.c.  I had originally was thinking to put this function
in fs/ext4/super.c, but from the perspective of keeping the MMP code
in the same source file, it probably makes sense to keep this function
with the other MMP functions.

> > {
> > 	if (sbi->s_mmp_tsk) {
> > 		kthread_stop(sbi->s_mmp_tsk);
> > 		brelse(sbi->s_mmp_data->bh);
> > 		kfree(sbi->s_mmp_data);
> > 		sbi->s_mmp_data = NULL;
> > 		sbi->s_mmp_tsk = NULL;
> > 	}
> > }
> > 
> > Basically, just move all of the cleanup so it is done after the
> > kthread is stopped, so we don't have to do any fancy error checking.
> > We just do it unconditionally.

Cheers,

						- Ted
diff mbox series

Patch

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index b9693680463a..9c33e97bd5c5 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -5156,8 +5156,10 @@  static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 failed_mount3:
 	flush_work(&sbi->s_error_work);
 	del_timer_sync(&sbi->s_err_report);
-	if (sbi->s_mmp_tsk)
-		kthread_stop(sbi->s_mmp_tsk);
+	if (sbi->s_mmp_tsk) {
+		if (kthread_stop(sbi->s_mmp_tsk) == -EINTR)
+			kfree(kthread_data(sbi->s_mmp_tsk));
+	}
 failed_mount2:
 	rcu_read_lock();
 	group_desc = rcu_dereference(sbi->s_group_desc);