Message ID | 20210805151418.30659-1-paskripkin@gmail.com |
---|---|
State | New |
Headers | show |
Series | ext4: avoid huge mmp update interval value | expand |
On Thu, Aug 05, 2021 at 06:14:18PM +0300, Pavel Skripkin wrote: > Syzbot reported task hung bug in ext4_fill_super(). The problem was in > too huge mmp update interval. > > Syzkaller reproducer setted s_mmp_update_interval to 39785 seconds. This > update interaval is unreasonable huge and it can cause tasks to hung on > kthread_stop() call, since it will wait until timeout timer expires. I must be missing something. kthread_stop() should wake up the kmmpd() thread, which should see kthread_should_stop(), and then it should exit. What is causing it to wait until the timeout timer expires? - Ted
On 8/5/21 10:45 PM, Theodore Ts'o wrote: > On Thu, Aug 05, 2021 at 06:14:18PM +0300, Pavel Skripkin wrote: >> Syzbot reported task hung bug in ext4_fill_super(). The problem was in >> too huge mmp update interval. >> >> Syzkaller reproducer setted s_mmp_update_interval to 39785 seconds. This >> update interaval is unreasonable huge and it can cause tasks to hung on >> kthread_stop() call, since it will wait until timeout timer expires. > > I must be missing something. kthread_stop() should wake up the > kmmpd() thread, which should see kthread_should_stop(), and then it > should exit. What is causing it to wait until the timeout timer > expires? > > - Ted > Hi, Ted! I guess, I've explained my idea badly, sorry :) I mean, that there is a chance to hit this situation: CPU0 CPU1 kthread_should_stop() <-- false kthread_stop() set_bit(KTHREAD_SHOULD_STOP) wake_up_process() wait_for_completion() schedule_timeout_interruptible() *waits until timer expires* Since there wasn't any validation checks for mmp_update_interval, CPU0 will wait for up to (1 << 16) seconds (s_mmp_update_interval it __le16). With regards, Pavel Skripkin
On Thu, Aug 05, 2021 at 11:12:42PM +0300, Pavel Skripkin wrote: > On 8/5/21 10:45 PM, Theodore Ts'o wrote: > > On Thu, Aug 05, 2021 at 06:14:18PM +0300, Pavel Skripkin wrote: > > > Syzbot reported task hung bug in ext4_fill_super(). The problem was in > > > too huge mmp update interval. > > > > > > Syzkaller reproducer setted s_mmp_update_interval to 39785 seconds. This > > > update interaval is unreasonable huge and it can cause tasks to hung on > > > kthread_stop() call, since it will wait until timeout timer expires. > > > > I must be missing something. kthread_stop() should wake up the > > kmmpd() thread, which should see kthread_should_stop(), and then it > > should exit. What is causing it to wait until the timeout timer > > expires? > > > > - Ted > > > > > Hi, Ted! > > I guess, I've explained my idea badly, sorry :) > > I mean, that there is a chance to hit this situation: > > CPU0 CPU1 > kthread_should_stop() <-- false > kthread_stop() > set_bit(KTHREAD_SHOULD_STOP) > wake_up_process() > wait_for_completion() > schedule_timeout_interruptible() > > *waits until timer expires* Yeah, so the bug here is checking kthread_should_stop() while the task state is TASK_RUNNING. What you need to do here is: while (run) { .... set_current_state(TASK_INTERRUPTIBLE); if (kthread_should_stop()) { __set_current_state(TASK_RUNNING); break; } schedule_timeout(tout); ..... } That means in the case above where schedule() occurs after the kthread_should_stop() check has raced with kthread_stop(), then wake_up_process() will handle any races with schedule() correctly. i.e. wake_up_process() will set the task state to TASK_RUNNING and schedule() will not sleep if it is called after wake_up_process(). Or if schedule() runs first then wake_up_process() will wake it correctly after setting the state to TASK_RUNNING. Either way, the loop then runs around again straight away to the next kthread_should_stop() call, at which point it breaks out. I note that the "wait_to_exit:" code in the same function does this properly.... Cheers, Dave.
On 8/6/21 1:59 AM, Dave Chinner wrote: > On Thu, Aug 05, 2021 at 11:12:42PM +0300, Pavel Skripkin wrote: >> On 8/5/21 10:45 PM, Theodore Ts'o wrote: >> > On Thu, Aug 05, 2021 at 06:14:18PM +0300, Pavel Skripkin wrote: >> > > Syzbot reported task hung bug in ext4_fill_super(). The problem was in >> > > too huge mmp update interval. >> > > >> > > Syzkaller reproducer setted s_mmp_update_interval to 39785 seconds. This >> > > update interaval is unreasonable huge and it can cause tasks to hung on >> > > kthread_stop() call, since it will wait until timeout timer expires. >> > >> > I must be missing something. kthread_stop() should wake up the >> > kmmpd() thread, which should see kthread_should_stop(), and then it >> > should exit. What is causing it to wait until the timeout timer >> > expires? >> > >> > - Ted >> > >> >> >> Hi, Ted! >> >> I guess, I've explained my idea badly, sorry :) >> >> I mean, that there is a chance to hit this situation: >> >> CPU0 CPU1 >> kthread_should_stop() <-- false >> kthread_stop() >> set_bit(KTHREAD_SHOULD_STOP) >> wake_up_process() >> wait_for_completion() >> schedule_timeout_interruptible() >> >> *waits until timer expires* > > Yeah, so the bug here is checking kthread_should_stop() while > the task state is TASK_RUNNING. > > What you need to do here is: > > while (run) { > > .... > set_current_state(TASK_INTERRUPTIBLE); > if (kthread_should_stop()) { > __set_current_state(TASK_RUNNING); > break; > } > schedule_timeout(tout); > > ..... > } > > > That means in the case above where schedule() occurs after the > kthread_should_stop() check has raced with kthread_stop(), then > wake_up_process() will handle any races with schedule() correctly. > > i.e. wake_up_process() will set the task state to TASK_RUNNING and > schedule() will not sleep if it is called after wake_up_process(). > Or if schedule() runs first then wake_up_process() will wake it > correctly after setting the state to TASK_RUNNING. > > Either way, the loop then runs around again straight away to the next > kthread_should_stop() call, at which point it breaks out. > > I note that the "wait_to_exit:" code in the same function does this > properly.... > Hi, Dave! I've tested your suggestion with syzbot and it works, thank you! Anyway, @Ted, does it make sense to add boundaries for s_mmp_update_interval? I think, too big update interval for mmp isn't reasonable. I can send patch series with Dave's suggestion and previous patch. What do you think? With regards, Pavel Skripkin
diff --git a/fs/ext4/mmp.c b/fs/ext4/mmp.c index bc364c119af6..160abee66dce 100644 --- a/fs/ext4/mmp.c +++ b/fs/ext4/mmp.c @@ -7,6 +7,9 @@ #include "ext4.h" +#define EXT4_KMMP_MAX_INTERVAL 100 +#define EXT4_KMMP_MIN_INTERVAL 5 + /* Checksumming functions */ static __le32 ext4_mmp_csum(struct super_block *sb, struct mmp_struct *mmp) { @@ -140,6 +143,12 @@ static int kmmpd(void *data) unsigned long diff; int retval; + /* We should avoid unreasonable huge update interval, since it can cause + * task hung bug on umount or on error handling path in ext4_fill_super() + */ + mmp_update_interval = clamp(mmp_update_interval, EXT4_KMMP_MIN_INTERVAL, + EXT4_KMMP_MAX_INTERVAL); + mmp_block = le64_to_cpu(es->s_mmp_block); mmp = (struct mmp_struct *)(bh->b_data); mmp->mmp_time = cpu_to_le64(ktime_get_real_seconds()); @@ -156,6 +165,9 @@ static int kmmpd(void *data) memcpy(mmp->mmp_nodename, init_utsname()->nodename, sizeof(mmp->mmp_nodename)); + ext4_msg(sb, KERN_INFO, "Started kmmp thread with update interval = %u\n", + mmp_update_interval); + while (!kthread_should_stop() && !sb_rdonly(sb)) { if (!ext4_has_feature_mmp(sb)) { ext4_warning(sb, "kmmpd being stopped since MMP feature"
Syzbot reported task hung bug in ext4_fill_super(). The problem was in too huge mmp update interval. Syzkaller reproducer setted s_mmp_update_interval to 39785 seconds. This update interaval is unreasonable huge and it can cause tasks to hung on kthread_stop() call, since it will wait until timeout timer expires. To avoid this sutiation, I've added MIN and MAX constants for kmmp interval and clamped s_mmp_update_interval within the boundaries Reported-and-tested-by: syzbot+c9ff4822a62eee994ea3@syzkaller.appspotmail.com Fixes: c5e06d101aaf ("ext4: add support for multiple mount protection") Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> --- Hi, Ted and ext4 maintainers! I am not sure about min/max values for interval, so I look forward to receiving your views on these values and patch in general! With regards, Pavel Skripkin --- fs/ext4/mmp.c | 12 ++++++++++++ 1 file changed, 12 insertions(+)