[BUG] aborted ext4 leads to infinite loop in balance_dirty_pages

Message ID 20111108000335.GA7518@quack.suse.cz
State Not Applicable, archived

Commit Message

Jan Kara Nov. 8, 2011, 12:03 a.m. UTC
On Fri 28-10-11 14:34:31, Kazuya Mio wrote:
> 2011/10/25 22:40, Jan Kara wrote:
> >  Please no. Generally this boils down to what we do with dirty data
> >when there's an error writing it out. Currently we just throw it away
> >(e.g. in the media error case) but I don't think that's generally a good
> >thing because e.g. the admin may want to copy the data to other working
> >storage. So I think we should rather keep the data and provide a mechanism
> >for userspace to ask the kernel to get rid of the data (so that we don't
> >eventually run out of memory).
> 
> I see. I agree with you.
> 
> >>Do you have any ideas?
> >  So the question is what you would like to achieve. If you just want to
> >unblock a thread, then a solution would be to make a thread in
> >balance_dirty_pages() killable. If you generally want to get rid of dirty
> >memory, then I don't have a really good answer, but throwing dirty data away
> >seems like a bad answer to me.
> 
> The problem is that we cannot unmount the corrupted filesystem because of
> an unkillable dd process. We must bring down the system to resume the
> service with no dirty pages. I think it is important for service continuity
> to be able to kill a thread stuck in balance_dirty_pages().
  OK, attached are two patches based on latest Linus's tree that should
make your task killable. Can you test them?

									Honza

Comments

Kazuya Mio Nov. 9, 2011, 8:28 a.m. UTC | #1
2011/11/08 9:03, Jan Kara wrote:
>    OK, attached are two patches based on latest Linus's tree that should
> make your task killable. Can you test them?

I'm trying to reproduce now, but it's hard. Could you wait a few days?
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Kara Nov. 9, 2011, 11:15 a.m. UTC | #2
On Wed 09-11-11 17:28:20, Kazuya Mio wrote:
> 2011/11/08 9:03, Jan Kara wrote:
> >    OK, attached are two patches based on latest Linus's tree that should
> > make your task killable. Can you test them?
> 
> I'm trying to reproduce now, but it's hard. Could you wait a few days?
  Sure, take as much time as you need.

								Honza
Kazuya Mio Nov. 14, 2011, 10:06 a.m. UTC | #3
2011/11/08 9:03, Jan Kara wrote:
>    OK, attached are two patches based on latest Linus's tree that should
> make your task killable. Can you test them?

Sorry for the late reply.
I confirmed that these patches fix the problem.

Reported-and-tested-by: Kazuya Mio <k-mio@sx.jp.nec.com>

Regards,
Kazuya Mio
Jan Kara Nov. 14, 2011, 11:11 a.m. UTC | #4
On Mon 14-11-11 19:06:31, Kazuya Mio wrote:
> 2011/11/08 9:03, Jan Kara wrote:
> >    OK, attached are two patches based on latest Linus's tree that should
> > make your task killable. Can you test them?
> 
> Sorry for the late reply.
> I confirmed that these patches fix the problem.
> 
> Reported-and-tested-by: Kazuya Mio <k-mio@sx.jp.nec.com>
  Thanks for testing! I've sent patches for inclusion...

									Honza

Patch

From 6eefa10d92cc35b66a8166cc26472d383b572b0d Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Mon, 7 Nov 2011 18:46:39 +0100
Subject: [PATCH 2/2] fs: Make write(2) interruptible by a signal

Currently write(2) to a file is not interruptible by a signal. Sometimes
being able to interrupt it is desirable (e.g. when you want to quickly kill
a process hogging your disk or when some process gets blocked in
balance_dirty_pages() indefinitely due to a filesystem being in an error
condition).

Signed-off-by: Jan Kara <jack@suse.cz>
---
 mm/filemap.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index c0018f2..6b01d2f 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2407,6 +2407,10 @@  static ssize_t generic_perform_write(struct file *file,
 						iov_iter_count(i));
 
 again:
+		if (signal_pending(current)) {
+			status = -EINTR;
+			break;
+		}
 
 		/*
 		 * Bring in the user page that we will copy from _first_.
-- 
1.7.1
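
For userspace, the visible effect of the hunk above is that a buffered
write(2) can now fail with EINTR when a signal arrives before any data has
been copied. If some data had already been copied, callers would most likely
see a short write instead; that is an assumption based on how the function
reports progress, since the hunk does not show its return path. A minimal
sketch of a caller that copes with both cases follows. write_all() is a
hypothetical helper for illustration and is not part of the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * write_all() - hypothetical helper, not part of the patch.
 *
 * Keeps calling write(2) until the whole buffer has been accepted.
 * A short write counts as progress; -1 with errno == EINTR means the
 * call was interrupted before any byte was written, so it is retried.
 * Returns 0 on success, -1 with errno set on any other error.
 */
static int write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, nothing written yet */
			return -1;		/* real error, errno is set */
		}
		p += n;				/* skip the bytes already accepted */
		len -= (size_t)n;
	}
	return 0;
}
```

Retrying on EINTR only matters for signals the process handles. A fatal
signal such as SIGKILL terminates the task, so the retry never runs and the
process can be killed as intended. A program that wants a handled signal to
cancel the write would return from the EINTR branch instead of continuing.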