Patchwork [3/3] ext4: Block mmapped writes while the fs is frozen

Submitter Jan Kara
Date May 10, 2011, 10:29 p.m.
Message ID <1305066574-1573-4-git-send-email-jack@suse.cz>
Permalink /patch/95058/
State Superseded

Comments

Jan Kara - May 10, 2011, 10:29 p.m.
We should not allow file modification via mmap while the filesystem is
frozen. So block in ext4_page_mkwrite() while the filesystem is frozen.

We have to check for frozen filesystem with the page marked dirty and under
page lock with which we then return from ext4_page_mkwrite(). Only that way we
cannot race with writeback done by freezing code - either we mark the page
dirty after the writeback has started, see freezing in progress and block, or
writeback will wait for our page lock which is released only when the fault is
done and then writeback will writeout and writeprotect the page again.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/inode.c |   20 ++++++++++++++++++++
 1 files changed, 20 insertions(+), 0 deletions(-)
Christoph Hellwig - May 17, 2011, 3:11 p.m.
On Wed, May 11, 2011 at 12:29:34AM +0200, Jan Kara wrote:
> We should not allow file modification via mmap while the filesystem is
> frozen. So block in ext4_page_mkwrite() while the filesystem is frozen.
> 
> We have to check for frozen filesystem with the page marked dirty and under
> page lock with which we then return from ext4_page_mkwrite(). Only that way we
> cannot race with writeback done by freezing code - either we mark the page
> dirty after the writeback has started, see freezing in progress and block, or
> writeback will wait for our page lock which is released only when the fault is
> done and then writeback will writeout and writeprotect the page again.

This really should be done in (__)block_page_mkwrite.  I'd also return
VM_FAULT_RETRY instead of retrying inside the block_mkwrite handler
in case you hit the race.

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Kara - May 18, 2011, 7:56 a.m.
On Tue 17-05-11 11:11:08, Christoph Hellwig wrote:
> On Wed, May 11, 2011 at 12:29:34AM +0200, Jan Kara wrote:
> > We should not allow file modification via mmap while the filesystem is
> > frozen. So block in ext4_page_mkwrite() while the filesystem is frozen.
> > 
> > We have to check for frozen filesystem with the page marked dirty and under
> > page lock with which we then return from ext4_page_mkwrite(). Only that way we
> > cannot race with writeback done by freezing code - either we mark the page
> > dirty after the writeback has started, see freezing in progress and block, or
> > writeback will wait for our page lock which is released only when the fault is
> > done and then writeback will writeout and writeprotect the page again.
> 
> This really should be done in (__)block_page_mkwrite.  I'd also return
> VM_FAULT_RETRY instead of retrying inside the block_mkwrite handler
> in case you hit the race.
  Well, we can do the non-blocking check at the end of
__block_page_mkwrite() and return some error value (EAGAIN translating to
VM_FAULT_RETRY would look logical, I just have to think of a better error
value for VM_FAULT_NOPAGE). But vfs_check_frozen() cannot be in
__block_page_mkwrite() since ext4 needs to call that with a transaction
started so that would create a deadlock and we need to call
vfs_check_frozen() somewhere so that we don't busyloop.

I can call vfs_check_frozen() inside block_page_mkwrite() but it would be a
bit surprising difference from __block_page_mkwrite() to me. Not sure what
the cleanest solution would be here...

									Honza
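
The errno-to-fault-code translation discussed above could look roughly like
the following user-space sketch. The enum values and the function name
sketch_mkwrite_return() are made up for illustration and are not the kernel's
VM_FAULT_* definitions. VM_FAULT_NOPAGE is left out on purpose because the
thread has not settled on an error value for it.

```c
#include <errno.h>

/* Stand-in fault codes for this sketch only; not the kernel's VM_FAULT_* values. */
enum sketch_fault {
	SKETCH_FAULT_LOCKED,	/* success, page is returned locked */
	SKETCH_FAULT_RETRY,	/* caller should retry the fault */
	SKETCH_FAULT_OOM,	/* allocation failure */
	SKETCH_FAULT_SIGBUS	/* any other error */
};

/* Hypothetical helper: translate an errno from the mkwrite path to a fault code. */
static enum sketch_fault sketch_mkwrite_return(int err)
{
	switch (err) {
	case 0:
		return SKETCH_FAULT_LOCKED;
	case -EAGAIN:
		/* Freeze detected by the non-blocking check: ask for a retry. */
		return SKETCH_FAULT_RETRY;
	case -ENOMEM:
		return SKETCH_FAULT_OOM;
	default:
		return SKETCH_FAULT_SIGBUS;
	}
}
```

With this shape the handler itself never loops; the retry decision moves to
the caller that receives the retry code.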
Christoph Hellwig - May 18, 2011, 8:07 a.m.
On Wed, May 18, 2011 at 09:56:14AM +0200, Jan Kara wrote:
> __block_page_mkwrite() and return some error value (EAGAIN translating to
> VM_FAULT_RETRY would look logical, I just have to think of a better error
> value for VM_FAULT_NOPAGE). But vfs_check_frozen() cannot be in
> __block_page_mkwrite() since ext4 needs to call that with a transaction
> started so that would create a deadlock and we need to call
> vfs_check_frozen() somewhere so that we don't busyloop.
> 
> I can call vfs_check_frozen() inside block_page_mkwrite() but it would be a
> bit surprising difference from __block_page_mkwrite() to me. Not sure what
> the cleanest solution would be here...

block_page_mkwrite is supposed to be used directly by filesystems and
do all the right things.  IIRC Eric even mentioned he added
vfs_check_frozen to it for RHEL, but forgot to push it upstream.
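
The split being argued about can be pictured with a small user-space model:
an outer helper that is allowed to block on the freeze, and an inner helper
that may run with a transaction open and therefore only reports the freeze.
All names here (sketch_sb, sketch_check_frozen, sketch_inner_mkwrite,
sketch_outer_mkwrite) are hypothetical stand-ins, and the "wait" is simulated
by thawing immediately.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy superblock: just a frozen flag and a counter of simulated waits. */
struct sketch_sb {
	bool frozen;
	int blocked;
};

/* Stand-in for a blocking wait until thaw; in this model the thaw is immediate. */
static void sketch_check_frozen(struct sketch_sb *sb)
{
	if (sb->frozen) {
		sb->blocked++;
		sb->frozen = false;
	}
}

/*
 * Inner helper: may be called with a transaction started, so it must not
 * sleep waiting for a thaw. It only reports the freeze to its caller.
 */
static int sketch_inner_mkwrite(struct sketch_sb *sb)
{
	/* Block allocation and dirtying of the page would happen here. */
	return sb->frozen ? -EAGAIN : 0;
}

/* Outer helper: no transaction is held here, so blocking on the freeze is safe. */
static int sketch_outer_mkwrite(struct sketch_sb *sb)
{
	sketch_check_frozen(sb);
	return sketch_inner_mkwrite(sb);
}
```

A filesystem that starts its own transaction would call the inner helper and
do the blocking wait itself, outside the transaction.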

Eric Sandeen - May 18, 2011, 2:03 p.m.
On 5/18/11 3:07 AM, Christoph Hellwig wrote:
> On Wed, May 18, 2011 at 09:56:14AM +0200, Jan Kara wrote:
>> __block_page_mkwrite() and return some error value (EAGAIN translating to
>> VM_FAULT_RETRY would look logical, I just have to think of a better error
>> value for VM_FAULT_NOPAGE). But vfs_check_frozen() cannot be in
>> __block_page_mkwrite() since ext4 needs to call that with a transaction
>> started so that would create a deadlock and we need to call
>> vfs_check_frozen() somewhere so that we don't busyloop.
>>
>> I can call vfs_check_frozen() inside block_page_mkwrite() but it would be a
>> bit surprising difference from __block_page_mkwrite() to me. Not sure what
>> the cleanest solution would be here...
> 
> block_page_mkwrite is supposed to be used directly by filesystems and
> do all the right things.  IIRC Eric even mentioned he added
> vfs_check_frozen to it for RHEL, but forgot to push it upstream.

Well, I tried, but it was rejected IIRC.  Still, mea culpa....

I can resurrect what I did for RHEL5 and repost if desired...

-Eric
Jan Kara - May 18, 2011, 3:25 p.m.
On Wed 18-05-11 09:03:27, Eric Sandeen wrote:
> On 5/18/11 3:07 AM, Christoph Hellwig wrote:
> > On Wed, May 18, 2011 at 09:56:14AM +0200, Jan Kara wrote:
> >> __block_page_mkwrite() and return some error value (EAGAIN translating to
> >> VM_FAULT_RETRY would look logical, I just have to think of a better error
> >> value for VM_FAULT_NOPAGE). But vfs_check_frozen() cannot be in
> >> __block_page_mkwrite() since ext4 needs to call that with a transaction
> >> started so that would create a deadlock and we need to call
> >> vfs_check_frozen() somewhere so that we don't busyloop.
> >>
> >> I can call vfs_check_frozen() inside block_page_mkwrite() but it would be a
> >> bit surprising difference from __block_page_mkwrite() to me. Not sure what
> >> the cleanest solution would be here...
> > 
> > block_page_mkwrite is supposed to be used directly by filesystems and
> > do all the right things.  IIRC Eric even mentioned he added
> > vfs_check_frozen to it for RHEL, but forgot to push it upstream.
> 
> Well, I tried, but it was rejected IIRC.  Still, mea culpa....
> 
> I can resurrect what I did for RHEL5 and repost if desired...
  I've just submitted the second version of the patch series. So please check
whether it does all you need... Thanks.

									Honza
Eric Sandeen - May 18, 2011, 4:40 p.m.
On May 18, 2011, at 10:25 AM, Jan Kara <jack@suse.cz> wrote:

> On Wed 18-05-11 09:03:27, Eric Sandeen wrote:
>> On 5/18/11 3:07 AM, Christoph Hellwig wrote:
>>> On Wed, May 18, 2011 at 09:56:14AM +0200, Jan Kara wrote:
>>>> __block_page_mkwrite() and return some error value (EAGAIN translating to
>>>> VM_FAULT_RETRY would look logical, I just have to think of a better error
>>>> value for VM_FAULT_NOPAGE). But vfs_check_frozen() cannot be in
>>>> __block_page_mkwrite() since ext4 needs to call that with a transaction
>>>> started so that would create a deadlock and we need to call
>>>> vfs_check_frozen() somewhere so that we don't busyloop.
>>>> 
>>>> I can call vfs_check_frozen() inside block_page_mkwrite() but it would be a
>>>> bit surprising difference from __block_page_mkwrite() to me. Not sure what
>>>> the cleanest solution would be here...
>>> 
>>> block_page_mkwrite is supposed to be used directly by filesystems and
>>> do all the right things.  IIRC Eric even mentioned he added
>>> vfs_check_frozen to it for RHEL, but forgot to push it upstream.
>> 
>> Well, I tried, but it was rejected IIRC.  Still, mea culpa....
>> 
>> I can resurrect what I did for RHEL5 and repost if desired...
>  I've just submitted the second version of the patch series. So please check
> whether it does all you need... Thanks.
> 
Thanks!  Btw the rejection I mentioned was years ago... Not you :)

-Eric

>                                    Honza

Patch

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 04b164d..a7e13b6 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5801,6 +5801,12 @@  int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 	get_block_t *get_block;
 	int retries = 0;
 
+restart:
+	/*
+	 * This check is racy but catches the common case. The check at the
+	 * end of this function is reliable.
+	 */
+	vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE);
 	/* Delalloc case is easy... */
 	if (test_opt(inode->i_sb, DELALLOC) &&
 	    !ext4_should_journal_data(inode) &&
@@ -5870,5 +5876,19 @@  out_ret:
 			ret = VM_FAULT_SIGBUS;
 	}
 out:
+	/*
+	 * Freezing in progress? We check after the page is marked dirty and
+	 * with page lock held so if the test here fails, we are sure freezing
+	 * code will wait during syncing until the page fault is done - at that
+	 * point page will be dirty and unlocked so freezing code will write it
+	 * and writeprotect it again.
+	 */
+	if (ret == VM_FAULT_LOCKED) {
+		set_page_dirty(page);
+		if (inode->i_sb->s_frozen != SB_UNFROZEN) {
+			unlock_page(page);
+			goto restart;
+		}
+	}
 	return ret;
 }
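
For readers who want to step through the control flow of the hunks above
outside the kernel, here is a single-threaded user-space model. It is only a
sketch: the types, the sketch_ names and the freeze_mid_fault test hook are
invented for illustration, and the blocking wait is simulated by thawing
immediately. It mirrors the patch in three places: the racy check at entry,
dirtying the page under the page lock, and the reliable re-check that unlocks
the page and restarts.

```c
#include <stdbool.h>

enum sketch_freeze { SKETCH_UNFROZEN = 0, SKETCH_FREEZE_WRITE = 1 };

struct sketch_sb {
	enum sketch_freeze frozen;
	bool freeze_mid_fault;	/* test hook: start a freeze after the first check */
	int waits;		/* times the fault path had to wait for a thaw */
	int restarts;		/* times the reliable check sent us back to the top */
};

struct sketch_page {
	bool locked;
	bool dirty;
};

/* Stand-in for the blocking freeze check; here the thaw is immediate. */
static void sketch_wait_unfrozen(struct sketch_sb *sb)
{
	if (sb->frozen != SKETCH_UNFROZEN) {
		sb->waits++;
		sb->frozen = SKETCH_UNFROZEN;
	}
}

/* Returns true with the page locked and dirty, like the VM_FAULT_LOCKED path. */
static bool sketch_page_mkwrite(struct sketch_sb *sb, struct sketch_page *page)
{
restart:
	/* Racy check: catches the common case of an already frozen fs. */
	sketch_wait_unfrozen(sb);

	/* Block allocation would happen here; we end up holding the page lock. */
	page->locked = true;

	/* Simulate a freeze that starts after the first check has passed. */
	if (sb->freeze_mid_fault) {
		sb->freeze_mid_fault = false;
		sb->frozen = SKETCH_FREEZE_WRITE;
	}

	/* Reliable check: page is dirty and locked when we look at the state. */
	page->dirty = true;
	if (sb->frozen != SKETCH_UNFROZEN) {
		page->locked = false;
		sb->restarts++;
		goto restart;
	}
	return true;
}
```

In the model, a freeze that begins mid-fault costs exactly one restart and one
wait, and the function still returns with the page locked and dirty.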