diff mbox series

ext4: avoid huge mmp update interval value

Message ID 20210805151418.30659-1-paskripkin@gmail.com
State New
Headers show
Series ext4: avoid huge mmp update interval value | expand

Commit Message

Pavel Skripkin Aug. 5, 2021, 3:14 p.m. UTC
Syzbot reported task hung bug in ext4_fill_super(). The problem was in
too huge mmp update interval.

Syzkaller reproducer setted s_mmp_update_interval to 39785 seconds. This
update interaval is unreasonable huge and it can cause tasks to hung on
kthread_stop() call, since it will wait until timeout timer expires.

To avoid this sutiation, I've added MIN and MAX constants for kmmp
interval and clamped s_mmp_update_interval within the boundaries

Reported-and-tested-by: syzbot+c9ff4822a62eee994ea3@syzkaller.appspotmail.com
Fixes: c5e06d101aaf ("ext4: add support for multiple mount protection")
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
---

Hi, Ted and ext4 maintainers!

I am not sure about min/max values for interval, so I look forward to
receiving your views on these values and patch in general!



With regards,
Pavel Skripkin

---
 fs/ext4/mmp.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

Comments

Theodore Ts'o Aug. 5, 2021, 7:45 p.m. UTC | #1
On Thu, Aug 05, 2021 at 06:14:18PM +0300, Pavel Skripkin wrote:
> Syzbot reported task hung bug in ext4_fill_super(). The problem was in
> too huge mmp update interval.
> 
> Syzkaller reproducer setted s_mmp_update_interval to 39785 seconds. This
> update interaval is unreasonable huge and it can cause tasks to hung on
> kthread_stop() call, since it will wait until timeout timer expires.

I must be missing something.  kthread_stop() should wake up the
kmmpd() thread, which should see kthread_should_stop(), and then it
should exit.  What is causing it to wait until the timeout timer
expires?

					- Ted
Pavel Skripkin Aug. 5, 2021, 8:12 p.m. UTC | #2
On 8/5/21 10:45 PM, Theodore Ts'o wrote:
> On Thu, Aug 05, 2021 at 06:14:18PM +0300, Pavel Skripkin wrote:
>> Syzbot reported task hung bug in ext4_fill_super(). The problem was in
>> too huge mmp update interval.
>> 
>> Syzkaller reproducer setted s_mmp_update_interval to 39785 seconds. This
>> update interaval is unreasonable huge and it can cause tasks to hung on
>> kthread_stop() call, since it will wait until timeout timer expires.
> 
> I must be missing something.  kthread_stop() should wake up the
> kmmpd() thread, which should see kthread_should_stop(), and then it
> should exit.  What is causing it to wait until the timeout timer
> expires?
> 
> 					- Ted
> 


Hi, Ted!

I guess, I've explained my idea badly, sorry :)

I mean, that there is a chance to hit this situation:

CPU0				CPU1
				kthread_should_stop()  <-- false
kthread_stop()
set_bit(KTHREAD_SHOULD_STOP)				
wake_up_process()
wait_for_completion()
				schedule_timeout_interruptible()

*waits until timer expires*


Since there wasn't any validation checks for mmp_update_interval, CPU0 
will wait for up to (1 << 16) seconds (s_mmp_update_interval it __le16).


With regards,
Pavel Skripkin
Dave Chinner Aug. 5, 2021, 10:59 p.m. UTC | #3
On Thu, Aug 05, 2021 at 11:12:42PM +0300, Pavel Skripkin wrote:
> On 8/5/21 10:45 PM, Theodore Ts'o wrote:
> > On Thu, Aug 05, 2021 at 06:14:18PM +0300, Pavel Skripkin wrote:
> > > Syzbot reported task hung bug in ext4_fill_super(). The problem was in
> > > too huge mmp update interval.
> > > 
> > > Syzkaller reproducer setted s_mmp_update_interval to 39785 seconds. This
> > > update interaval is unreasonable huge and it can cause tasks to hung on
> > > kthread_stop() call, since it will wait until timeout timer expires.
> > 
> > I must be missing something.  kthread_stop() should wake up the
> > kmmpd() thread, which should see kthread_should_stop(), and then it
> > should exit.  What is causing it to wait until the timeout timer
> > expires?
> > 
> > 					- Ted
> > 
> 
> 
> Hi, Ted!
> 
> I guess, I've explained my idea badly, sorry :)
> 
> I mean, that there is a chance to hit this situation:
> 
> CPU0				CPU1
> 				kthread_should_stop()  <-- false
> kthread_stop()
> set_bit(KTHREAD_SHOULD_STOP)				
> wake_up_process()
> wait_for_completion()
> 				schedule_timeout_interruptible()
> 
> *waits until timer expires*

Yeah, so the bug here is checking kthread_should_stop() while
the task state is TASK_RUNNING.

What you need to do here is:

while (run) {

	....
	set_current_state(TASK_INTERRUPTIBLE);
	if (kthread_should_stop()) {
		__set_current_state(TASK_RUNNING);
		break;
	}
	schedule_timeout(tout);

	.....
}


That means in the case above where schedule() occurs after the
kthread_should_stop() check has raced with kthread_stop(), then
wake_up_process() will handle any races with schedule() correctly.

i.e.  wake_up_process() will set the task state to TASK_RUNNING and
schedule() will not sleep if it is called after wake_up_process().
Or if schedule() runs first then wake_up_process() will wake it
correctly after setting the state to TASK_RUNNING.

Either way, the loop then runs around again straight away to the next
kthread_should_stop() call, at which point it breaks out.

I note that the "wait_to_exit:" code in the same function does this
properly....

Cheers,

Dave.
Pavel Skripkin Aug. 6, 2021, 10:20 a.m. UTC | #4
On 8/6/21 1:59 AM, Dave Chinner wrote:
> On Thu, Aug 05, 2021 at 11:12:42PM +0300, Pavel Skripkin wrote:
>> On 8/5/21 10:45 PM, Theodore Ts'o wrote:
>> > On Thu, Aug 05, 2021 at 06:14:18PM +0300, Pavel Skripkin wrote:
>> > > Syzbot reported task hung bug in ext4_fill_super(). The problem was in
>> > > too huge mmp update interval.
>> > > 
>> > > Syzkaller reproducer setted s_mmp_update_interval to 39785 seconds. This
>> > > update interaval is unreasonable huge and it can cause tasks to hung on
>> > > kthread_stop() call, since it will wait until timeout timer expires.
>> > 
>> > I must be missing something.  kthread_stop() should wake up the
>> > kmmpd() thread, which should see kthread_should_stop(), and then it
>> > should exit.  What is causing it to wait until the timeout timer
>> > expires?
>> > 
>> > 					- Ted
>> > 
>> 
>> 
>> Hi, Ted!
>> 
>> I guess, I've explained my idea badly, sorry :)
>> 
>> I mean, that there is a chance to hit this situation:
>> 
>> CPU0				CPU1
>> 				kthread_should_stop()  <-- false
>> kthread_stop()
>> set_bit(KTHREAD_SHOULD_STOP)				
>> wake_up_process()
>> wait_for_completion()
>> 				schedule_timeout_interruptible()
>> 
>> *waits until timer expires*
> 
> Yeah, so the bug here is checking kthread_should_stop() while
> the task state is TASK_RUNNING.
> 
> What you need to do here is:
> 
> while (run) {
> 
> 	....
> 	set_current_state(TASK_INTERRUPTIBLE);
> 	if (kthread_should_stop()) {
> 		__set_current_state(TASK_RUNNING);
> 		break;
> 	}
> 	schedule_timeout(tout);
> 
> 	.....
> }
> 
> 
> That means in the case above where schedule() occurs after the
> kthread_should_stop() check has raced with kthread_stop(), then
> wake_up_process() will handle any races with schedule() correctly.
> 
> i.e.  wake_up_process() will set the task state to TASK_RUNNING and
> schedule() will not sleep if it is called after wake_up_process().
> Or if schedule() runs first then wake_up_process() will wake it
> correctly after setting the state to TASK_RUNNING.
> 
> Either way, the loop then runs around again straight away to the next
> kthread_should_stop() call, at which point it breaks out.
> 
> I note that the "wait_to_exit:" code in the same function does this
> properly....
> 

Hi, Dave!

I've tested your suggestion with syzbot and it works, thank you!


Anyway, @Ted, does it make sense to add boundaries for 
s_mmp_update_interval? I think, too big update interval for mmp isn't 
reasonable. I can send patch series with Dave's suggestion and previous 
patch. What do you think?




With regards,
Pavel Skripkin
diff mbox series

Patch

diff --git a/fs/ext4/mmp.c b/fs/ext4/mmp.c
index bc364c119af6..160abee66dce 100644
--- a/fs/ext4/mmp.c
+++ b/fs/ext4/mmp.c
@@ -7,6 +7,9 @@ 
 
 #include "ext4.h"
 
+#define EXT4_KMMP_MAX_INTERVAL		100
+#define EXT4_KMMP_MIN_INTERVAL		5
+
 /* Checksumming functions */
 static __le32 ext4_mmp_csum(struct super_block *sb, struct mmp_struct *mmp)
 {
@@ -140,6 +143,12 @@  static int kmmpd(void *data)
 	unsigned long diff;
 	int retval;
 
+	/* We should avoid unreasonable huge update interval, since it can cause
+	 * task hung bug on umount or on error handling path in ext4_fill_super()
+	 */
+	mmp_update_interval = clamp(mmp_update_interval, EXT4_KMMP_MIN_INTERVAL,
+							 EXT4_KMMP_MAX_INTERVAL);
+
 	mmp_block = le64_to_cpu(es->s_mmp_block);
 	mmp = (struct mmp_struct *)(bh->b_data);
 	mmp->mmp_time = cpu_to_le64(ktime_get_real_seconds());
@@ -156,6 +165,9 @@  static int kmmpd(void *data)
 	memcpy(mmp->mmp_nodename, init_utsname()->nodename,
 	       sizeof(mmp->mmp_nodename));
 
+	ext4_msg(sb, KERN_INFO, "Started kmmp thread with update interval = %u\n",
+		 mmp_update_interval);
+
 	while (!kthread_should_stop() && !sb_rdonly(sb)) {
 		if (!ext4_has_feature_mmp(sb)) {
 			ext4_warning(sb, "kmmpd being stopped since MMP feature"