Patchwork ext4: fix blkdev_issue_flush() failure handling

Submitter: Bartlomiej Zolnierkiewicz
Date: March 29, 2009, 5:28 p.m.
Message ID: <200903291928.14451.bzolnier@gmail.com>
Permalink: /patch/25290/
State: New

Comments

Bartlomiej Zolnierkiewicz - March 29, 2009, 5:28 p.m.
blkdev_issue_flush() may fail (e.g. due to a media error on FLUSH CACHE
command execution), so its users should check the return value.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
---
 fs/ext4/fsync.c |    9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Sandeen - March 29, 2009, 5:43 p.m.
Bartlomiej Zolnierkiewicz wrote:
> blkdev_issue_flush() may fail (e.g. due to a media error on FLUSH CACHE
> command execution), so its users should check the return value.
> 
> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
> ---
>  fs/ext4/fsync.c |    9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> Index: b/fs/ext4/fsync.c
> ===================================================================
> --- a/fs/ext4/fsync.c
> +++ b/fs/ext4/fsync.c
> @@ -48,7 +48,7 @@ int ext4_sync_file(struct file *file, st
>  {
>  	struct inode *inode = dentry->d_inode;
>  	journal_t *journal = EXT4_SB(inode->i_sb)->s_journal;
> -	int ret = 0;
> +	int ret = 0, tmp_ret;
>  
>  	J_ASSERT(ext4_journal_current_handle() == NULL);
>  
> @@ -92,8 +92,11 @@ int ext4_sync_file(struct file *file, st
>  			.nr_to_write = 0, /* sys_fsync did this */
>  		};
>  		ret = sync_inode(inode, &wbc);
> -		if (journal && (journal->j_flags & JBD2_BARRIER))
> -			blkdev_issue_flush(inode->i_sb->s_bdev, NULL);
> +		if (journal && (journal->j_flags & JBD2_BARRIER)) {
> +			tmp_ret = blkdev_issue_flush(inode->i_sb->s_bdev, NULL);
> +			if (ret == 0 && tmp_ret < 0 && tmp_ret != -EOPNOTSUPP)
> +				ret = tmp_ret;
> +		}
>  	}
>  out:
>  	return ret;

As long as we keep the call there this is probably good, but after
talking w/ Chris Mason, I think the call is extraneous anyway and should
probably just be removed...

-Eric
--
Theodore Ts'o - March 30, 2009, 2:25 a.m.
On Sun, Mar 29, 2009 at 12:43:22PM -0500, Eric Sandeen wrote:
> 
> As long as we keep the call there this is probably good, but after
> talking w/ Chris Mason, I think the call is extraneous anyway and should
> probably just be removed...
> 

Yes, I agree, but it takes a lot of digging to be completely sure
that it's safe to remove it.  Interestingly, it was you who added the
patch which added the call to blkdev_issue_flush():

commit d755fb384250d6bd7fd18a0930e71965acc8e72e
Author: Eric Sandeen <sandeen@redhat.com>
Date:   Fri Jul 11 19:27:31 2008 -0400

    ext4: call blkdev_issue_flush on fsync
    
    To ensure that bits are truly on-disk after an fsync,
    we should call blkdev_issue_flush if barriers are supported.
    
    Inspired by an old thread on barriers, by reiserfs & xfs
    which do the same, and by a patch SuSE ships with their kernel
    
    Signed-off-by: Eric Sandeen <sandeen@redhat.com>
    Signed-off-by: Mingming Cao <cmm@us.ibm.com>
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

When we remove it we should add a comment noting why it's not
necessary.

						- Ted
--
Eric Sandeen - March 30, 2009, 3:22 a.m.
Theodore Tso wrote:
> On Sun, Mar 29, 2009 at 12:43:22PM -0500, Eric Sandeen wrote:
>> As long as we keep the call there this is probably good, but after
>> talking w/ Chris Mason, I think the call is extraneous anyway and should
>> probably just be removed...
>>
> 
> Yes, I agree, but it takes a lot of digging to be completely sure
> that it's safe to remove it.  Interestingly, it was you who added the
> patch which added the call to blkdev_issue_flush():

> commit d755fb384250d6bd7fd18a0930e71965acc8e72e
> Author: Eric Sandeen <sandeen@redhat.com>
> Date:   Fri Jul 11 19:27:31 2008 -0400


Yes, it was.  Although I got the idea when hch pointed out that SuSE did
this... thanks to Chris.  It's come full circle.  :)

-Eric
--
Chris Mason - March 30, 2009, 11:47 a.m.
On Sun, 2009-03-29 at 22:22 -0500, Eric Sandeen wrote:
> Theodore Tso wrote:
> > On Sun, Mar 29, 2009 at 12:43:22PM -0500, Eric Sandeen wrote:
> >> As long as we keep the call there this is probably good, but after
> >> talking w/ Chris Mason, I think the call is extraneous anyway and should
> >> probably just be removed...
> >>
> > 
> > Yes, I agree, but it takes a lot of digging to be completely sure
> > that it's safe to remove it.  Interestingly, it was you who added the
> > patch which added the call to blkdev_issue_flush():
> 
> > commit d755fb384250d6bd7fd18a0930e71965acc8e72e
> > Author: Eric Sandeen <sandeen@redhat.com>
> > Date:   Fri Jul 11 19:27:31 2008 -0400
> 
> 
> Yes, it was.  Although I got the idea when hch pointed out that SuSE did
> this... thanks to Chris.  It's come full circle.  :)

Grin.  I'm not sure the I_DIRTY checks alone are enough to decide that a
commit is required though.  I think the inode could be clean but still
have metadata that needs commit.

-chris


--
Fernando Luis Vázquez Cao - March 30, 2009, 1:01 p.m.
Chris Mason wrote:
> On Sun, 2009-03-29 at 22:22 -0500, Eric Sandeen wrote:
>> Theodore Tso wrote:
>>> On Sun, Mar 29, 2009 at 12:43:22PM -0500, Eric Sandeen wrote:
>>>> As long as we keep the call there this is probably good, but after
>>>> talking w/ Chris Mason, I think the call is extraneous anyway and should
>>>> probably just be removed...
>>>>
>>> Yes, I agree, but it takes a lot of digging to be completely sure
>>> that it's safe to remove it.  Interestingly, it was you who added the
>>> patch which added the call to blkdev_issue_flush():
>>> commit d755fb384250d6bd7fd18a0930e71965acc8e72e
>>> Author: Eric Sandeen <sandeen@redhat.com>
>>> Date:   Fri Jul 11 19:27:31 2008 -0400
>>
>> Yes, it was.  Although I got the idea when hch pointed out that SuSE did
>> this... thanks to Chris.  It's come full circle.  :)
> 
> Grin.  I'm not sure the I_DIRTY checks alone are enough to decide that a
> commit is required though.  I think the inode could be clean but still
> have metadata that needs commit.

Chris, I have just sent patches that attempt to fix both ext3 and
ext4 while also adding a per-device sysfs knob to disable
write-flushes. A previous version of this patch set added a new
generic mount option but comments from Christoph and others
convinced me to turn it into a per-device tunable. Could you take
a look at the patches?

Bartlomiej, I have just noticed that I happened to be working on
patches for reiserfs and xfs similar to the ones you sent earlier
this week. I picked some bits from your submission, so I took the
liberty of adding your Signed-off-by to my patches. Could you take a
look at them and let me know if you are comfortable with that?

Latest patches: http://lkml.org/lkml/2009/3/30/100
Beginning of the sub-thread: http://lkml.org/lkml/2009/3/29/28

Best regards,

Fernando
--
Chris Mason - March 30, 2009, 1:24 p.m.
On Mon, 2009-03-30 at 22:01 +0900, Fernando Luis Vázquez Cao wrote:
> Chris Mason wrote:
> > On Sun, 2009-03-29 at 22:22 -0500, Eric Sandeen wrote:
> >> Theodore Tso wrote:
> >>> On Sun, Mar 29, 2009 at 12:43:22PM -0500, Eric Sandeen wrote:
> >>>> As long as we keep the call there this is probably good, but after
> >>>> talking w/ Chris Mason, I think the call is extraneous anyway and should
> >>>> probably just be removed...
> >>>>
> >>> Yes, I agree, but it takes a lot of digging to be completely sure
> >>> that it's safe to remove it.  Interestingly, it was you who added the
> >>> patch which added the call to blkdev_issue_flush():
> >>> commit d755fb384250d6bd7fd18a0930e71965acc8e72e
> >>> Author: Eric Sandeen <sandeen@redhat.com>
> >>> Date:   Fri Jul 11 19:27:31 2008 -0400
> >>
> >> Yes, it was.  Although I got the idea when hch pointed out that SuSE did
> >> this... thanks to Chris.  It's come full circle.  :)
> > 
> > Grin.  I'm not sure the I_DIRTY checks alone are enough to decide that a
> > commit is required though.  I think the inode could be clean but still
> > have metadata that needs commit.
> 
> Chris, I have just sent patches that attempt to fix both ext3 and
> ext4 while also adding a per-device sysfs knob to disable
> write-flushes. A previous version of this patch set added a new
> generic mount option but comments from Christoph and others
> convinced me to turn it into a per-device tunable. Could you take
> a look at the patches?
> 

Jens' comments are right on, I think.  If we get that fixed up we can get
rid of all the filesystem mount -o barrier=flush,0,1,xyz confusion and
set it via the block devices directly.

That would be nice ;)

-chris


--
Theodore Ts'o - March 30, 2009, 2:25 p.m.
On Mon, Mar 30, 2009 at 10:01:16PM +0900, Fernando Luis Vázquez Cao wrote:
>
> Chris, I have just sent patches that attempt to fix both ext3 and
> ext4 while also adding a per-device sysfs knob to disable
> write-flushes. A previous version of this patch set added a new
> generic mount option but comments from Christoph and others
> convinced me to turn it into a per-device tunable. Could you take
> a look at the patches?

Fernando, see my comments on those patches.  We don't need to issue a
barrier after a call to sync_inode() or ext[34]_force_commit(), since
those functions will issue a barrier for us.  It would probably be a
good idea to use blktrace to test and make sure that we have one and
exactly one barrier op issued for each fsync().

						- Ted
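
A blktrace check like the one suggested above needs a workload that issues a known number of fsync() calls. The following is a minimal user-space sketch of such a workload, not part of the patch: write_and_fsync() is a hypothetical helper name, and the file path and round count are up to the tester. The idea is to run it against a file on the filesystem under test while blktrace is tracing the underlying device, then compare the number of barrier/flush requests in the trace with the number of rounds.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical test workload (an assumption, not from the thread):
 * dirty a file and fsync() it `rounds` times.
 * Returns 0 on success or a negative errno value on the first failure.
 */
int write_and_fsync(const char *path, int rounds)
{
	static const char buf[] = "fsync probe\n";
	const size_t len = sizeof(buf) - 1;
	int fd, i, err = 0;

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -errno;

	for (i = 0; i < rounds; i++) {
		/* Dirty the file so that every fsync() has work to do. */
		ssize_t n = write(fd, buf, len);

		if (n < 0) {
			err = -errno;
			break;
		}
		if ((size_t)n != len) {
			err = -EIO;	/* treat a short write as an I/O error */
			break;
		}
		/* Each successful round is one fsync() to look for in the trace. */
		if (fsync(fd) < 0) {
			err = -errno;
			break;
		}
	}

	if (close(fd) < 0 && err == 0)
		err = -errno;
	return err;
}
```

Calling write_and_fsync() with, say, three rounds should then correspond to three barrier operations in the trace if the "one and exactly one" expectation holds; more than that would point at a redundant flush.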

--
Bartlomiej Zolnierkiewicz - March 30, 2009, 3:26 p.m.
Hi,

On Monday 30 March 2009, Fernando Luis Vázquez Cao wrote:
> Chris Mason wrote:
> > On Sun, 2009-03-29 at 22:22 -0500, Eric Sandeen wrote:
> >> Theodore Tso wrote:
> >>> On Sun, Mar 29, 2009 at 12:43:22PM -0500, Eric Sandeen wrote:
> >>>> As long as we keep the call there this is probably good, but after
> >>>> talking w/ Chris Mason, I think the call is extraneous anyway and should
> >>>> probably just be removed...
> >>>>
> >>> Yes, I agree, but it takes a lot of digging to be completely sure
> >>> that it's safe to remove it.  Interestingly, it was you who added the
> >>> patch which added the call to blkdev_issue_flush():
> >>> commit d755fb384250d6bd7fd18a0930e71965acc8e72e
> >>> Author: Eric Sandeen <sandeen@redhat.com>
> >>> Date:   Fri Jul 11 19:27:31 2008 -0400
> >>
> >> Yes, it was.  Although I got the idea when hch pointed out that SuSE did
> >> this... thanks to Chris.  It's come full circle.  :)
> > 
> > Grin.  I'm not sure the I_DIRTY checks alone are enough to decide that a
> > commit is required though.  I think the inode could be clean but still
> > have metadata that needs commit.
> 
> Chris, I have just sent patches that attempt to fix both ext3 and
> ext4 while also adding a per-device sysfs knob to disable
> write-flushes. A previous version of this patch set added a new
> generic mount option but comments from Christoph and others
> convinced me to turn it into a per-device tunable. Could you take
> a look at the patches?
> 
> Bartlomiej, I have just noticed that I happened to be working on
> patches for reiserfs and xfs similar to the ones you sent earlier
> this week. I picked some bits from your submission, so I took the
> liberty of adding your Signed-off-by to my patches. Could you take a
> look at them and let me know if you are comfortable with that?

I'm fine with people building bigger changes on top of my patches,
but if you do so, please clearly note in the patch description
what changes you have applied to the original patch...

Thanks,
Bart
--
Theodore Ts'o - March 30, 2009, 5:46 p.m.
On Mon, Mar 30, 2009 at 07:47:47AM -0400, Chris Mason wrote:
> > Yes, it was.  Although I got the idea when hch pointed out that SuSE did
> > this... thanks to Chris.  It's come full circle.  :)
> 
> Grin.  I'm not sure the I_DIRTY checks alone are enough to decide that a
> commit is required though.  I think the inode could be clean but still
> have metadata that needs commit.

So to close this hole, I think what we can do is to track the last
transaction id where ext4_do_update_inode() was called, and if that
transaction id == the currently running transaction id, then we need
to call ext4_force_commit() even though the inode is clean.  I think
that should fix up the race that you're concerned about.

						- Ted
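
The transaction-id idea above can be sketched in isolation. This is a simplified user-space model under stated assumptions, not ext4 or jbd2 code: the sketch_* types and functions are hypothetical, the `dirty` flag stands in for the existing I_DIRTY checks, and sketch_note_inode_update() models the point where ext4_do_update_inode() would record the running transaction id.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a journal transaction id. */
typedef unsigned int sketch_tid_t;

/* Hypothetical, simplified view of the per-inode state involved. */
struct sketch_inode {
	bool dirty;			/* models the I_DIRTY checks */
	sketch_tid_t last_update_tid;	/* transaction that last logged this inode */
};

/* Hypothetical, simplified view of the journal state involved. */
struct sketch_journal {
	bool has_running;		/* is there a running transaction? */
	sketch_tid_t running_tid;	/* its id, valid if has_running */
};

/* Models the bookkeeping done when the inode is written into a transaction. */
void sketch_note_inode_update(struct sketch_inode *inode,
			      const struct sketch_journal *journal)
{
	inode->last_update_tid = journal->running_tid;
}

/*
 * Decide whether fsync must force a commit: either the inode is dirty,
 * or it is clean but its last update still sits in the running
 * (not yet committed) transaction.
 */
bool sketch_fsync_needs_commit(const struct sketch_inode *inode,
			       const struct sketch_journal *journal)
{
	if (inode->dirty)
		return true;
	return journal->has_running &&
	       inode->last_update_tid == journal->running_tid;
}
```

In this model a clean inode whose last update belongs to the running transaction still reports that a commit is needed, which is the case the I_DIRTY checks alone would miss. Once the running transaction id moves on, the same clean inode no longer needs a forced commit.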
--
Fernando Luis Vázquez Cao - March 31, 2009, 4:23 a.m.
Bartlomiej Zolnierkiewicz wrote:
> Hi,
> 
> On Monday 30 March 2009, Fernando Luis Vázquez Cao wrote:
>> Chris Mason wrote:
>>> On Sun, 2009-03-29 at 22:22 -0500, Eric Sandeen wrote:
>>>> Theodore Tso wrote:
>>>>> On Sun, Mar 29, 2009 at 12:43:22PM -0500, Eric Sandeen wrote:
>>>>>> As long as we keep the call there this is probably good, but after
>>>>>> talking w/ Chris Mason, I think the call is extraneous anyway and should
>>>>>> probably just be removed...
>>>>>>
>>>>> Yes, I agree, but it takes a lot of digging to be completely sure
>>>>> that it's safe to remove it.  Interestingly, it was you who added the
>>>>> patch which added the call to blkdev_issue_flush():
>>>>> commit d755fb384250d6bd7fd18a0930e71965acc8e72e
>>>>> Author: Eric Sandeen <sandeen@redhat.com>
>>>>> Date:   Fri Jul 11 19:27:31 2008 -0400
>>>> Yes, it was.  Although I got the idea when hch pointed out that SuSE did
>>>> this... thanks to Chris.  It's come full circle.  :)
>>> Grin.  I'm not sure the I_DIRTY checks alone are enough to decide that a
>>> commit is required though.  I think the inode could be clean but still
>>> have metadata that needs commit.
>> Chris, I have just sent patches that attempt to fix both ext3 and
>> ext4 while also adding a per-device sysfs knob to disable
>> write-flushes. A previous version of this patch set added a new
>> generic mount option but comments from Christoph and others
>> convinced me to turn it into a per-device tunable. Could you take
>> a look at the patches?
>>
>> Bartlomiej, I have just noticed that I happened to be working on
>> patches for reiserfs and xfs similar to the ones you sent earlier
>> this week. I picked some bits from your submission, so I took the
>> liberty of adding your Signed-off-by to my patches. Could you take a
>> look at them and let me know if you are comfortable with that?
> 
> I'm fine with people building bigger changes on top of my patches,
> but if you do so, please clearly note in the patch description
> what changes you have applied to the original patch...

You are right, sorry about that. I will add a short changelog when I
resubmit the patches.

Thanks!

- Fernando
--
Fernando Luis Vázquez Cao - March 31, 2009, 4:28 a.m.
Chris Mason wrote:
> On Mon, 2009-03-30 at 22:01 +0900, Fernando Luis Vázquez Cao wrote:
>> Chris Mason wrote:
>>> On Sun, 2009-03-29 at 22:22 -0500, Eric Sandeen wrote:
>>>> Theodore Tso wrote:
>>>>> On Sun, Mar 29, 2009 at 12:43:22PM -0500, Eric Sandeen wrote:
>>>>>> As long as we keep the call there this is probably good, but after
>>>>>> talking w/ Chris Mason, I think the call is extraneous anyway and should
>>>>>> probably just be removed...
>>>>>>
>>>>> Yes, I agree, but it takes a lot of digging to be completely sure
>>>>> that it's safe to remove it.  Interestingly, it was you who added the
>>>>> patch which added the call to blkdev_issue_flush():
>>>>> commit d755fb384250d6bd7fd18a0930e71965acc8e72e
>>>>> Author: Eric Sandeen <sandeen@redhat.com>
>>>>> Date:   Fri Jul 11 19:27:31 2008 -0400
>>>> Yes, it was.  Although I got the idea when hch pointed out that SuSE did
>>>> this... thanks to Chris.  It's come full circle.  :)
>>> Grin.  I'm not sure the I_DIRTY checks alone are enough to decide that a
>>> commit is required though.  I think the inode could be clean but still
>>> have metadata that needs commit.
>> Chris, I have just sent patches that attempt to fix both ext3 and
>> ext4 while also adding a per-device sysfs knob to disable
>> write-flushes. A previous version of this patch set added a new
>> generic mount option but comments from Christoph and others
>> convinced me to turn it into a per-device tunable. Could you take
>> a look at the patches?
>>
> 
> Jens' comments are right on, I think.  If we get that fixed up we can get
> rid of all the filesystem mount -o barrier=flush,0,1,xyz confusion and
> set it via the block devices directly.
> 
> That would be nice ;)

Thank you for your feedback, Chris! I will address some of the issues raised
on the mailing list and resend the whole patch set.

Regards,

Fernando
--
Fernando Luis Vázquez Cao - March 31, 2009, 4:33 a.m.
Theodore Tso wrote:
> On Mon, Mar 30, 2009 at 10:01:16PM +0900, Fernando Luis Vázquez Cao wrote:
>> Chris, I have just sent patches that attempt to fix both ext3 and
>> ext4 while also adding a per-device sysfs knob to disable
>> write-flushes. A previous version of this patch set added a new
>> generic mount option but comments from Christoph and others
>> convinced me to turn it into a per-device tunable. Could you take
>> a look at the patches?
> 
> Fernando, see my comments on those patches.  We don't need to issue a
> barrier after a call to sync_inode() or ext[34]_force_commit(), since
> those functions will issue a barrier for us.  It would probably be a
> good idea to use blktrace to test and make sure that we have one and
> exactly one barrier op issued for each fsync().

I'll give blktrace a spin and check if things are working as expected.

Thanks!

- Fernando
--

Patch

Index: b/fs/ext4/fsync.c
===================================================================
--- a/fs/ext4/fsync.c
+++ b/fs/ext4/fsync.c
@@ -48,7 +48,7 @@  int ext4_sync_file(struct file *file, st
 {
 	struct inode *inode = dentry->d_inode;
 	journal_t *journal = EXT4_SB(inode->i_sb)->s_journal;
-	int ret = 0;
+	int ret = 0, tmp_ret;
 
 	J_ASSERT(ext4_journal_current_handle() == NULL);
 
@@ -92,8 +92,11 @@  int ext4_sync_file(struct file *file, st
 			.nr_to_write = 0, /* sys_fsync did this */
 		};
 		ret = sync_inode(inode, &wbc);
-		if (journal && (journal->j_flags & JBD2_BARRIER))
-			blkdev_issue_flush(inode->i_sb->s_bdev, NULL);
+		if (journal && (journal->j_flags & JBD2_BARRIER)) {
+			tmp_ret = blkdev_issue_flush(inode->i_sb->s_bdev, NULL);
+			if (ret == 0 && tmp_ret < 0 && tmp_ret != -EOPNOTSUPP)
+				ret = tmp_ret;
+		}
 	}
 out:
 	return ret;
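
The error-merging rule in the second hunk can be exercised outside the kernel. The following is a minimal user-space sketch, not kernel code: merge_flush_error() is a hypothetical helper name, `ret` stands for the result of sync_inode() and `flush_ret` for the result of blkdev_issue_flush(). It keeps an earlier error if there is one, ignores -EOPNOTSUPP from the flush, and otherwise reports the flush failure.

```c
#include <errno.h>

/*
 * Hypothetical helper (an assumption, not in the kernel): fold the
 * result of a cache flush into an earlier result, using the same
 * condition as the patch.
 */
int merge_flush_error(int ret, int flush_ret)
{
	/*
	 * Only overwrite success. A flush that fails with -EOPNOTSUPP
	 * (the device does not support the operation) is not treated
	 * as an fsync failure.
	 */
	if (ret == 0 && flush_ret < 0 && flush_ret != -EOPNOTSUPP)
		return flush_ret;
	return ret;
}
```

With this rule a media error on the flush (for example -EIO) reaches the fsync() caller only when the preceding writeback succeeded, so the first error encountered is the one that gets reported.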