Message ID | 20210428172828.12589-1-paskripkin@gmail.com |
---|---|
State | Superseded |
Headers | show |
Series | ext4: fix memory leak in ext4_fill_super | expand |
On 2021-04-28 19:28, Pavel Skripkin wrote: > syzbot reported memory leak in ext4 subsyetem. > The problem appears, when thread_stop() call happens > before wake_up_process(). > > Normally, this data will be freed by > created thread, but if kthread_stop() > returned -EINTR, this data should be freed manually > > Reported-by: syzbot+d9e482e303930fa4f6ff@syzkaller.appspotmail.com > Tested-by: syzbot+d9e482e303930fa4f6ff@syzkaller.appspotmail.com > Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> > --- > fs/ext4/super.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/fs/ext4/super.c b/fs/ext4/super.c > index b9693680463a..9c33e97bd5c5 100644 > --- a/fs/ext4/super.c > +++ b/fs/ext4/super.c > @@ -5156,8 +5156,10 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) > failed_mount3: > flush_work(&sbi->s_error_work); > del_timer_sync(&sbi->s_err_report); > - if (sbi->s_mmp_tsk) > - kthread_stop(sbi->s_mmp_tsk); > + if (sbi->s_mmp_tsk) { > + if (kthread_stop(sbi->s_mmp_tsk) == -EINTR) > + kfree(kthread_data(sbi->s_mmp_tsk)); > + } > failed_mount2: > rcu_read_lock(); > group_desc = rcu_dereference(sbi->s_group_desc); > So I've looked at this, and the puzzling thing is that ext4 uses kthread_run() which immediately calls wake_up_process() -- according to the kerneldoc for kthread_stop(), it shouldn't return -EINTR in this case: * Returns the result of threadfn(), or %-EINTR if wake_up_process() * was never called. */ int kthread_stop(struct task_struct *k) So it really looks like kthread_stop() can return -EINTR even when wake_up_process() has been called but the thread hasn't had a chance to run yet? If this is true, then we either have to fix kthread_create() to make sure it respects the behaviour that is claimed by the comment OR we have to audit every single kthread_stop() in the kernel which does not check for -EINTR. Vegard
On Thu, 29 Apr 2021 12:01:46 +0200 Vegard Nossum <vegard.nossum@oracle.com> wrote: > > On 2021-04-28 19:28, Pavel Skripkin wrote: > > syzbot reported memory leak in ext4 subsyetem. > > The problem appears, when thread_stop() call happens > > before wake_up_process(). > > > > Normally, this data will be freed by > > created thread, but if kthread_stop() > > returned -EINTR, this data should be freed manually > > > > Reported-by: syzbot+d9e482e303930fa4f6ff@syzkaller.appspotmail.com > > Tested-by: syzbot+d9e482e303930fa4f6ff@syzkaller.appspotmail.com > > Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> > > --- > > fs/ext4/super.c | 6 ++++-- > > 1 file changed, 4 insertions(+), 2 deletions(-) > > > > diff --git a/fs/ext4/super.c b/fs/ext4/super.c > > index b9693680463a..9c33e97bd5c5 100644 > > --- a/fs/ext4/super.c > > +++ b/fs/ext4/super.c > > @@ -5156,8 +5156,10 @@ static int ext4_fill_super(struct > > super_block *sb, void *data, int silent) failed_mount3: > > flush_work(&sbi->s_error_work); > > del_timer_sync(&sbi->s_err_report); > > - if (sbi->s_mmp_tsk) > > - kthread_stop(sbi->s_mmp_tsk); > > + if (sbi->s_mmp_tsk) { > > + if (kthread_stop(sbi->s_mmp_tsk) == -EINTR) > > + kfree(kthread_data(sbi->s_mmp_tsk)); > > + } > > failed_mount2: > > rcu_read_lock(); > > group_desc = rcu_dereference(sbi->s_group_desc); > > > > So I've looked at this, and the puzzling thing is that ext4 uses > kthread_run() which immediately calls wake_up_process() -- according > to the kerneldoc for kthread_stop(), it shouldn't return -EINTR in > this case: > > * Returns the result of threadfn(), or %-EINTR if wake_up_process() > * was never called. > */ > int kthread_stop(struct task_struct *k) > > So it really looks like kthread_stop() can return -EINTR even when > wake_up_process() has been called but the thread hasn't had a chance > to run yet? > > If this is true, then we either have to fix kthread_create() to make > sure it respects the behaviour that is claimed by the comment OR we > have to audit every single kthread_stop() in the kernel which does > not check for -EINTR. > Me and Vegard found the root case of this bug: static int kthread(void *_create) { .... ret = -EINTR; if (!test_bit(KTHREAD_SHOULD_STOP, &self->flags)) { cgroup_kthread_ready(); __kthread_parkme(self); ret = threadfn(data); } do_exit(ret); } There is a change, that kthread_stop() call will happen before this . It means, that all kthread_stop() return value must be checked everywhere Vegard wrote code snippet, which reproduces this behavior: #include <linux/printk.h> #include <linux/proc_fs.h> #include <linux/kthread.h> static int test_thread(void *data) { printk(KERN_ERR "test_thread()\n"); return 0; } static int test_show(struct seq_file *seq, void *data) { struct task_struct *t = kthread_run(test_thread, NULL, "test"); if (!IS_ERR(t)) { int ret = kthread_stop(t); printk(KERN_ERR "kthread_stop() = %d\n", ret); } return 0; } static void __init init_test(void) { proc_create_single("test", 0444, NULL, &test_show); } late_initcall(init_test); > > Vegard With regards, Pavel Skripkin
On Thu, 29 Apr 2021 12:01:46 +0200 Vegard Nossum <vegard.nossum@oracle.com> wrote: > > On 2021-04-28 19:28, Pavel Skripkin wrote: > > syzbot reported memory leak in ext4 subsyetem. > > The problem appears, when thread_stop() call happens > > before wake_up_process(). > > > > Normally, this data will be freed by > > created thread, but if kthread_stop() > > returned -EINTR, this data should be freed manually > > > > Reported-by: syzbot+d9e482e303930fa4f6ff@syzkaller.appspotmail.com > > Tested-by: syzbot+d9e482e303930fa4f6ff@syzkaller.appspotmail.com > > Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> > > --- > > fs/ext4/super.c | 6 ++++-- > > 1 file changed, 4 insertions(+), 2 deletions(-) > > > > diff --git a/fs/ext4/super.c b/fs/ext4/super.c > > index b9693680463a..9c33e97bd5c5 100644 > > --- a/fs/ext4/super.c > > +++ b/fs/ext4/super.c > > @@ -5156,8 +5156,10 @@ static int ext4_fill_super(struct > > super_block *sb, void *data, int silent) failed_mount3: > > flush_work(&sbi->s_error_work); > > del_timer_sync(&sbi->s_err_report); > > - if (sbi->s_mmp_tsk) > > - kthread_stop(sbi->s_mmp_tsk); > > + if (sbi->s_mmp_tsk) { > > + if (kthread_stop(sbi->s_mmp_tsk) == -EINTR) > > + kfree(kthread_data(sbi->s_mmp_tsk)); > > + } > > failed_mount2: > > rcu_read_lock(); > > group_desc = rcu_dereference(sbi->s_group_desc); > > > > So I've looked at this, and the puzzling thing is that ext4 uses > kthread_run() which immediately calls wake_up_process() -- according > to the kerneldoc for kthread_stop(), it shouldn't return -EINTR in > this case: > > * Returns the result of threadfn(), or %-EINTR if wake_up_process() > * was never called. > */ > int kthread_stop(struct task_struct *k) > > So it really looks like kthread_stop() can return -EINTR even when > wake_up_process() has been called but the thread hasn't had a chance > to run yet? > > If this is true, then we either have to fix kthread_create() to make > sure it respects the behaviour that is claimed by the comment OR we > have to audit every single kthread_stop() in the kernel which does > not check for -EINTR. > > > Vegard I am sorry for my complitely broken mail client :( Me and Vegard found the root case of this bug: static int kthread(void *_create) { .... ret = -EINTR; if (!test_bit(KTHREAD_SHOULD_STOP, &self->flags)) { cgroup_kthread_ready(); __kthread_parkme(self); ret = threadfn(data); } do_exit(ret); } There is a chance, that kthread_stop() call will happen before threadfn call. It means, that kthread_stop() return value must be checked everywhere, isn't it? Otherwise, there are a lot of potential memory leaks, because some developers rely on the fact, that data allocated for the thread will be freed _inside_ thread function. Vegard wrote the code snippet, which reproduces this behavior: #include <linux/printk.h> #include <linux/proc_fs.h> #include <linux/kthread.h> static int test_thread(void *data) { printk(KERN_ERR "test_thread()\n"); return 0; } static int test_show(struct seq_file *seq, void *data) { struct task_struct *t = kthread_run(test_thread, NULL, "test"); if (!IS_ERR(t)) { int ret = kthread_stop(t); printk(KERN_ERR "kthread_stop() = %d\n", ret); } return 0; } static void __init init_test(void) { proc_create_single("test", 0444, NULL, &test_show); } late_initcall(init_test); So, is this behavior is expected or not? Should maintainers rewrite code, which doesn't check kthread_stop() return value? With regards, Pavel Skripkin
On Thu, Apr 29, 2021 at 02:33:54PM +0300, Pavel Skripkin wrote: > > There is a chance, that kthread_stop() call will happen before > threadfn call. It means, that kthread_stop() return value must be checked everywhere, > isn't it? Otherwise, there are a lot of potential memory leaks, > because some developers rely on the fact, that data allocated for the thread will > be freed _inside_ thread function. That's not the only potential way that we could leak memory. Earlier in kthread(), if this memory allocation fails, self = kzalloc(sizeof(*self), GFP_KERNEL); we will exit with -ENOMEM. So at the very least all callers of kthread_stop() also need to check for -ENOMEM as well as -EINTR --- or, be somehow sure that the thread function was successfully called and started. In this particular case, the ext4 mount code had just started the kmmpd thread, and then detected that something else had gone wrong, and failed the mount before the kmmpd thread ever had a chance to run. I think if we want to fix this more generally across the whole kernel, we would need to have a variant of kthread_run which supplies two functions --- one which is the thread function, and the other which is a cleanup function. The cleanup function could just be kfree, but there will be other cases where the cleanup function will need to do other work before freeing the data structure (e.g., brelse((struct mmpd_data *)data->bh)). Is it worth it to provide such a cleanup function, which if present would be called any time the thread exits or is killed? I dunno. It's probably simpler to just strongly recommend that the cleanup work should never be done in the thread function, but after kthread_stop() is called, whether it returns an error or not. That's probably the right fix for ext4, I think. (Although note that kthread_stop(sbi->s_mmp_task) is called in multiple places in fs/ext4/super.c, not just in the single location which this patch touches.) - Ted
Hi! Thanks for your reply. On Thu, 29 Apr 2021 13:05:01 -0400 "Theodore Ts'o" <tytso@mit.edu> wrote: > On Thu, Apr 29, 2021 at 02:33:54PM +0300, Pavel Skripkin wrote: > > > > There is a chance, that kthread_stop() call will happen before > > threadfn call. It means, that kthread_stop() return value must be > > checked everywhere, isn't it? Otherwise, there are a lot of > > potential memory leaks, because some developers rely on the fact, > > that data allocated for the thread will be freed _inside_ thread > > function. > > That's not the only potential way that we could leak memory. Earlier > in kthread(), if this memory allocation fails, > > self = kzalloc(sizeof(*self), GFP_KERNEL); > > we will exit with -ENOMEM. So at the very least all callers of > kthread_stop() also need to check for -ENOMEM as well as -EINTR --- > or, be somehow sure that the thread function was successfully called > and started. In this particular case, the ext4 mount code had just > started the kmmpd thread, and then detected that something else had > gone wrong, and failed the mount before the kmmpd thread ever had a > chance to run. > > I think if we want to fix this more generally across the whole kernel, > we would need to have a variant of kthread_run which supplies two > functions --- one which is the thread function, and the other which is > a cleanup function. The cleanup function could just be kfree, but > there will be other cases where the cleanup function will need to do > other work before freeing the data structure (e.g., brelse((struct > mmpd_data *)data->bh)). I skimmed through kernel code and I didn't find any code examples, except ext4, where kthread is freeing something. Maybe, this API isn't required, but, as Vegard said, comment over kthread_stop() should be changed, because it's confusing. I have already added kthread.c developers (I hope, I chose the right emails) to CC. Maybe, they will think about this API. > > Is it worth it to provide such a cleanup function, which if present > would be called any time the thread exits or is killed? I dunno. > It's probably simpler to just strongly recommend that the cleanup work > should never be done in the thread function, but after kthread_stop() > is called, whether it returns an error or not. That's probably the > right fix for ext4, I think. > > (Although note that kthread_stop(sbi->s_mmp_task) is called in > multiple places in fs/ext4/super.c, not just in the single location > which this patch touches.) > Good point, I'll add this and -ENOMEM checks and will send v2. Thanks! > - Ted With regards, Pavel Skripkin
On Thu, 29 Apr 2021 13:05:01 -0400 "Theodore Ts'o" <tytso@mit.edu> wrote: > On Thu, Apr 29, 2021 at 02:33:54PM +0300, Pavel Skripkin wrote: > > > > There is a chance, that kthread_stop() call will happen before > > threadfn call. It means, that kthread_stop() return value must be > > checked everywhere, isn't it? Otherwise, there are a lot of > > potential memory leaks, because some developers rely on the fact, > > that data allocated for the thread will be freed _inside_ thread > > function. > > That's not the only potential way that we could leak memory. Earlier > in kthread(), if this memory allocation fails, > > self = kzalloc(sizeof(*self), GFP_KERNEL); > > we will exit with -ENOMEM. So at the very least all callers of > kthread_stop() also need to check for -ENOMEM as well as -EINTR --- > or, be somehow sure that the thread function was successfully called > and started. In this particular case, the ext4 mount code had just > started the kmmpd thread, and then detected that something else had > gone wrong, and failed the mount before the kmmpd thread ever had a > chance to run. There is a small problem about -ENOMEM: static int kmmpd(void *data) { ... retval = read_mmp_block(sb, &bh_check, mmp_block); if (retval) { ext4_error_err(sb, -retval, "error reading MMP data: %d", retval); goto exit_thread; } ... exit_thread: EXT4_SB(sb)->s_mmp_tsk = NULL; kfree(data); brelse(bh); return retval; } read_mmp_block can return -ENOMEM. In this case double free will happen. I believe, we can change `return retval;` to `return 0;`, because kthread return value isn't checked anywhere. What do You think about it? With regards, Pavel Skripkin
On Thu, Apr 29, 2021 at 11:09:56PM +0300, Pavel Skripkin wrote: > > we will exit with -ENOMEM. So at the very least all callers of > > kthread_stop() also need to check for -ENOMEM as well as -EINTR --- > > or, be somehow sure that the thread function was successfully called > > and started. In this particular case, the ext4 mount code had just > > started the kmmpd thread, and then detected that something else had > > gone wrong, and failed the mount before the kmmpd thread ever had a > > chance to run. > > There is a small problem about -ENOMEM... What I'd suggest is that we simply move > exit_thread: > EXT4_SB(sb)->s_mmp_tsk = NULL; > kfree(data); > brelse(bh); > return retval; > } out of the thread function. That means hanging struct mmpd_data off the struct ext4_sb_info structure, and then adding a function like this to fs/ext4/mmp.c static void ext4_stop_mmpd(struct ext4_sb_info *sbi) { if (sbi->s_mmp_tsk) { kthread_stop(sbi->s_mmp_tsk); brelse(sbi->s_mmp_data->bh); kfree(sbi->s_mmp_data); sbi->s_mmp_data = NULL; sbi->s_mmp_tsk = NULL; } } Basically, just move all of the cleanup so it is done after the kthread is stopped, so we don't have to do any fancy error checking. We just do it unconditionally. - Ted P.S. Actually, we could drop the struct mmpd_data altogether, and just hang the buffer head as sbi->s_mmp_bh. Then we can just pass the struct super * as the private pointer to kmmpd().
On Thu, 29 Apr 2021 17:41:12 -0400 "Theodore Ts'o" <tytso@mit.edu> wrote: > On Thu, Apr 29, 2021 at 11:09:56PM +0300, Pavel Skripkin wrote: > > > we will exit with -ENOMEM. So at the very least all callers of > > > kthread_stop() also need to check for -ENOMEM as well as -EINTR > > > --- or, be somehow sure that the thread function was successfully > > > called and started. In this particular case, the ext4 mount code > > > had just started the kmmpd thread, and then detected that > > > something else had gone wrong, and failed the mount before the > > > kmmpd thread ever had a chance to run. > > > > There is a small problem about -ENOMEM... > > What I'd suggest is that we simply move > > > exit_thread: > > EXT4_SB(sb)->s_mmp_tsk = NULL; > > kfree(data); > > brelse(bh); > > return retval; > > } > > out of the thread function. That means hanging struct mmpd_data off > the struct ext4_sb_info structure, and then adding a function like > this to fs/ext4/mmp.c > > static void ext4_stop_mmpd(struct ext4_sb_info *sbi) > { > if (sbi->s_mmp_tsk) { > kthread_stop(sbi->s_mmp_tsk); > brelse(sbi->s_mmp_data->bh); > kfree(sbi->s_mmp_data); > sbi->s_mmp_data = NULL; > sbi->s_mmp_tsk = NULL; > } > } > > Basically, just move all of the cleanup so it is done after the > kthread is stopped, so we don't have to do any fancy error checking. > We just do it unconditionally. This sound much better than my idea. Will do it in v2. Thanks! > > - Ted > > P.S. Actually, we could drop the struct mmpd_data altogether, and > just hang the buffer head as sbi->s_mmp_bh. Then we can just pass the > struct super * as the private pointer to kmmpd(). > With regards, Pavel Skripkin
On Fri, Apr 30, 2021 at 01:05:47AM +0300, Pavel Skripkin wrote: > > out of the thread function. That means hanging struct mmpd_data off > > the struct ext4_sb_info structure, and then adding a function like > > this to fs/ext4/mmp.c > > > > static void ext4_stop_mmpd(struct ext4_sb_info *sbi) That should "extern void ...", since it will be called from fs/ext4/super.c. I had originally was thinking to put this function in fs/ext4/super.c, but from the perspective of keeping the MMP code in the same source file, it probably makes sense to keep this function with the other MMP functions. > > { > > if (sbi->s_mmp_tsk) { > > kthread_stop(sbi->s_mmp_tsk); > > brelse(sbi->s_mmp_data->bh); > > kfree(sbi->s_mmp_data); > > sbi->s_mmp_data = NULL; > > sbi->s_mmp_tsk = NULL; > > } > > } > > > > Basically, just move all of the cleanup so it is done after the > > kthread is stopped, so we don't have to do any fancy error checking. > > We just do it unconditionally. Cheers, - Ted
diff --git a/fs/ext4/super.c b/fs/ext4/super.c index b9693680463a..9c33e97bd5c5 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -5156,8 +5156,10 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) failed_mount3: flush_work(&sbi->s_error_work); del_timer_sync(&sbi->s_err_report); - if (sbi->s_mmp_tsk) - kthread_stop(sbi->s_mmp_tsk); + if (sbi->s_mmp_tsk) { + if (kthread_stop(sbi->s_mmp_tsk) == -EINTR) + kfree(kthread_data(sbi->s_mmp_tsk)); + } failed_mount2: rcu_read_lock(); group_desc = rcu_dereference(sbi->s_group_desc);