RFC PATCH: ext4 no journal corruption with locale-gen

Message ID 6601abe90906220942se70fb70w5481e178f1525dd8@mail.gmail.com
State Superseded, archived

Commit Message

Curt Wohlgemuth June 22, 2009, 4:42 p.m. UTC
Hi Ted:

I think the following patch is sufficient.  It explicitly sets the aops to
ext4_writeback_aops if there is no delayed allocation and no journal.

I tested the locale-gen example with all combinations of

   data=writeback
   data=ordered
   data=journal
   <no journal at all>

and

   delalloc
   nodelalloc

and it works correctly now.  The paths for writeback seem fine to me for an
inode w/o a journal.


       Signed-off-by: Curt Wohlgemuth <curtw@google.com>
---


On Wed, Jun 17, 2009 at 4:46 PM, Theodore Tso<tytso@mit.edu> wrote:
> Hi Curt,
>
> Thanks for your analysis of the bug.  The reason for the strange logic
> in ext4_set_aops() is because at the moment the code doesn't support
> the combination of data=journalled && delalloc.  That's why it was
> explicitly checking for ext4_should_order_data() and
> ext4_should_writeback_data().
>
> We have a check for this in ext4_fill_super(), so your patch should be
> safe, since the combination of ext4_should_journal_data &&
> test_opt(inode->i_sb, DELALLOC) should never happen.
>
> As to your question of whether the nodelalloc and nojournal case
> should really be ext4_journalled_aops, I suspect ext4_writeback_aops
> makes more sense.  I haven't audited all of the code paths to make
> sure they DTRT in the non-journalled case yet, though.
>
>                                                        - Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Aneesh Kumar K.V June 22, 2009, 4:56 p.m. UTC | #1
On Mon, Jun 22, 2009 at 09:42:25AM -0700, Curt Wohlgemuth wrote:
> Hi Ted:
> 
> I think the following patch is sufficient.  It explicitly sets the aops to
> ext4_writeback_aops if there is no delayed allocation and no journal.
> 
> I tested the locale-gen example with all combinations of
> 
>    data=writeback
>    data=ordered
>    data=journal
>    <no journal at all>
> 
> and
> 
>    delalloc
>    nodelalloc
> 
> and it works correctly now.  The paths for writeback seem fine to me for an
> inode w/o a journal.
> 
> 
>        Signed-off-by: Curt Wohlgemuth <curtw@google.com>
> ---
> --- 2.6.26/fs/ext4/inode.c.orig	2009-06-09 20:05:27.000000000 -0700
> +++ 2.6.26/fs/ext4/inode.c	2009-06-22 08:55:13.000000000 -0700
> @@ -3442,15 +3442,12 @@ static const struct address_space_operat
> 
>  void ext4_set_aops(struct inode *inode)
>  {
> -	if (ext4_should_order_data(inode) &&
> -		test_opt(inode->i_sb, DELALLOC))
> +	if (test_opt(inode->i_sb, DELALLOC))

This change is not related to the fix, right?

>  		inode->i_mapping->a_ops = &ext4_da_aops;
>  	else if (ext4_should_order_data(inode))
>  		inode->i_mapping->a_ops = &ext4_ordered_aops;
> -	else if (ext4_should_writeback_data(inode) &&
> -		 test_opt(inode->i_sb, DELALLOC))
> -		inode->i_mapping->a_ops = &ext4_da_aops;
> -	else if (ext4_should_writeback_data(inode))
> +	else if (ext4_should_writeback_data(inode) ||
> +	                         EXT4_JOURNAL(inode) == NULL)
>  		inode->i_mapping->a_ops = &ext4_writeback_aops;

Can you send a patch with this hunk alone? The previous one is not
related to the fix, right?


>  	else
>  		inode->i_mapping->a_ops = &ext4_journalled_aops;
> 
> 

-aneesh
Curt Wohlgemuth June 22, 2009, 5:01 p.m. UTC | #2
On Mon, Jun 22, 2009 at 9:56 AM, Aneesh Kumar
K.V<aneesh.kumar@linux.vnet.ibm.com> wrote:
> On Mon, Jun 22, 2009 at 09:42:25AM -0700, Curt Wohlgemuth wrote:
>> Hi Ted:
>>
>> I think the following patch is sufficient.  It explicitly sets the aops to
>> ext4_writeback_aops if there is no delayed allocation and no journal.
>>
>> I tested the locale-gen example with all combinations of
>>
>>    data=writeback
>>    data=ordered
>>    data=journal
>>    <no journal at all>
>>
>> and
>>
>>    delalloc
>>    nodelalloc
>>
>> and it works correctly now.  The paths for writeback seem fine to me for an
>> inode w/o a journal.
>>
>>
>>        Signed-off-by: Curt Wohlgemuth <curtw@google.com>
>> ---
>> --- 2.6.26/fs/ext4/inode.c.orig       2009-06-09 20:05:27.000000000 -0700
>> +++ 2.6.26/fs/ext4/inode.c    2009-06-22 08:55:13.000000000 -0700
>> @@ -3442,15 +3442,12 @@ static const struct address_space_operat
>>
>>  void ext4_set_aops(struct inode *inode)
>>  {
>> -     if (ext4_should_order_data(inode) &&
>> -             test_opt(inode->i_sb, DELALLOC))
>> +     if (test_opt(inode->i_sb, DELALLOC))
>
> This change is not related to the fix, right?
>
>>               inode->i_mapping->a_ops = &ext4_da_aops;
>>       else if (ext4_should_order_data(inode))
>>               inode->i_mapping->a_ops = &ext4_ordered_aops;
>> -     else if (ext4_should_writeback_data(inode) &&
>> -              test_opt(inode->i_sb, DELALLOC))
>> -             inode->i_mapping->a_ops = &ext4_da_aops;
>> -     else if (ext4_should_writeback_data(inode))
>> +     else if (ext4_should_writeback_data(inode) ||
>> +                              EXT4_JOURNAL(inode) == NULL)
>>               inode->i_mapping->a_ops = &ext4_writeback_aops;
>
> Can you send a patch with this hunk alone? The previous one is not
> related to the fix, right?

You need them both.  We want all inodes with delayed allocation to use
ext4_da_aops.  The fall-through cases are just for nodelalloc.

Curt

>
>
>>       else
>>               inode->i_mapping->a_ops = &ext4_journalled_aops;
>>
>>
>
> -aneesh
>
Andreas Dilger June 22, 2009, 9:13 p.m. UTC | #3
On Jun 22, 2009  09:42 -0700, Curt Wohlgemuth wrote:
> I tested the locale-gen example with all combinations of
> 
>    data=writeback
>    data=ordered
>    data=journal
>    <no journal at all>

On an unrelated note - would it be useful to mount an ext4 filesystem
with a journal using "data=none" (or similar) to run without a journal?

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

Curt Wohlgemuth June 29, 2009, 5:50 p.m. UTC | #4
Ted, did you have any comments or objections to this patch?

Thanks,
Curt


On Mon, Jun 22, 2009 at 9:42 AM, Curt Wohlgemuth<curtw@google.com> wrote:
> Hi Ted:
>
> I think the following patch is sufficient.  It explicitly sets the aops to
> ext4_writeback_aops if there is no delayed allocation and no journal.
>
> I tested the locale-gen example with all combinations of
>
>   data=writeback
>   data=ordered
>   data=journal
>   <no journal at all>
>
> and
>
>   delalloc
>   nodelalloc
>
> and it works correctly now.  The paths for writeback seem fine to me for an
> inode w/o a journal.
>
>
>       Signed-off-by: Curt Wohlgemuth <curtw@google.com>
> ---
> --- 2.6.26/fs/ext4/inode.c.orig 2009-06-09 20:05:27.000000000 -0700
> +++ 2.6.26/fs/ext4/inode.c      2009-06-22 08:55:13.000000000 -0700
> @@ -3442,15 +3442,12 @@ static const struct address_space_operat
>
>  void ext4_set_aops(struct inode *inode)
>  {
> -       if (ext4_should_order_data(inode) &&
> -               test_opt(inode->i_sb, DELALLOC))
> +       if (test_opt(inode->i_sb, DELALLOC))
>                inode->i_mapping->a_ops = &ext4_da_aops;
>        else if (ext4_should_order_data(inode))
>                inode->i_mapping->a_ops = &ext4_ordered_aops;
> -       else if (ext4_should_writeback_data(inode) &&
> -                test_opt(inode->i_sb, DELALLOC))
> -               inode->i_mapping->a_ops = &ext4_da_aops;
> -       else if (ext4_should_writeback_data(inode))
> +       else if (ext4_should_writeback_data(inode) ||
> +                                EXT4_JOURNAL(inode) == NULL)
>                inode->i_mapping->a_ops = &ext4_writeback_aops;
>        else
>                inode->i_mapping->a_ops = &ext4_journalled_aops;
>
>
> On Wed, Jun 17, 2009 at 4:46 PM, Theodore Tso<tytso@mit.edu> wrote:
>> Hi Curt,
>>
>> Thanks for your analysis of the bug.  The reason for the strange logic
>> in ext4_set_aops() is because at the moment the code doesn't support
>> the combination of data=journalled && delalloc.  That's why it was
>> explicitly checking for ext4_should_order_data() and
>> ext4_should_writeback_data().
>>
>> We have a check for this in ext4_fill_super(), so your patch should be
>> safe, since the combination of ext4_should_journal_data &&
>> test_opt(inode->i_sb, DELALLOC) should never happen.
>>
>> As to your question of whether the nodelalloc and nojournal case
>> should really be ext4_journalled_aops, I suspect ext4_writeback_aops
>> makes more sense.  I haven't audited all of the code paths to make
>> sure they DTRT in the non-journalled case yet, though.
>>
>>                                                        - Ted
Aneesh Kumar K.V July 1, 2009, 6:31 p.m. UTC | #5
On Mon, Jun 22, 2009 at 09:42:25AM -0700, Curt Wohlgemuth wrote:
> Hi Ted:
> 
> I think the following patch is sufficient.  It explicitly sets the aops to
> ext4_writeback_aops if there is no delayed allocation and no journal.
> 
> I tested the locale-gen example with all combinations of
> 
>    data=writeback
>    data=ordered
>    data=journal
>    <no journal at all>
> 
> and
> 
>    delalloc
>    nodelalloc
> 
> and it works correctly now.  The paths for writeback seem fine to me for an
> inode w/o a journal.
> 
> 
>        Signed-off-by: Curt Wohlgemuth <curtw@google.com>
> ---
> --- 2.6.26/fs/ext4/inode.c.orig	2009-06-09 20:05:27.000000000 -0700
> +++ 2.6.26/fs/ext4/inode.c	2009-06-22 08:55:13.000000000 -0700
> @@ -3442,15 +3442,12 @@ static const struct address_space_operat
> 
>  void ext4_set_aops(struct inode *inode)
>  {
> -	if (ext4_should_order_data(inode) &&
> -		test_opt(inode->i_sb, DELALLOC))
> +	if (test_opt(inode->i_sb, DELALLOC))
>  		inode->i_mapping->a_ops = &ext4_da_aops;
>  	else if (ext4_should_order_data(inode))
>  		inode->i_mapping->a_ops = &ext4_ordered_aops;
> -	else if (ext4_should_writeback_data(inode) &&
> -		 test_opt(inode->i_sb, DELALLOC))
> -		inode->i_mapping->a_ops = &ext4_da_aops;
> -	else if (ext4_should_writeback_data(inode))
> +	else if (ext4_should_writeback_data(inode) ||
> +	                         EXT4_JOURNAL(inode) == NULL)
>  		inode->i_mapping->a_ops = &ext4_writeback_aops;
>  	else
>  		inode->i_mapping->a_ops = &ext4_journalled_aops;
> 
> 

I looked at the patch in detail, and I think we should instead force
data=writeback mode if the filesystem was created without a journal.
I am not sure what data=ordered or data=journal would mean without a
journal. So if we find that the filesystem doesn't have a journal, we
should update the default mount option in the filesystem to
data=writeback. Also, if the user tries to mount with data=ordered or
data=journal, we should print an appropriate message and force
data=writeback.

Once data=writeback is set, ext4_set_aops() will handle the case
properly.

-aneesh
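
A minimal sketch of the fallback proposed in the message above, written
as a user-space model rather than kernel code. The enum, the function
name effective_data_mode() and the warning text are hypothetical and do
not come from the ext4 sources; the only behaviour taken from the thread
is "no journal means data=writeback, with a message if the user asked
for data=ordered or data=journal".

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of the data= mount option; not a kernel type. */
enum data_mode { DATA_UNSET, DATA_JOURNAL, DATA_ORDERED, DATA_WRITEBACK };

/*
 * Pick the data mode that would actually be used.
 *
 * has_journal: whether the filesystem has a journal.
 * requested:   mode given on the mount command line, or DATA_UNSET.
 * fs_default:  default mode stored in the filesystem.
 */
static enum data_mode effective_data_mode(bool has_journal,
                                          enum data_mode requested,
                                          enum data_mode fs_default)
{
    if (!has_journal) {
        /* Ordered and journalled modes have no meaning without a journal. */
        if (requested == DATA_ORDERED || requested == DATA_JOURNAL)
            fprintf(stderr,
                    "no journal: ignoring requested data mode, "
                    "using data=writeback\n");
        return DATA_WRITEBACK;
    }
    /* With a journal, an explicit option wins over the stored default. */
    return requested != DATA_UNSET ? requested : fs_default;
}
```

With a mode forced this way, a later writeback check would select the
writeback aops without an explicit no-journal test.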
Aneesh Kumar K.V July 1, 2009, 6:35 p.m. UTC | #6
On Mon, Jun 22, 2009 at 11:13:27PM +0200, Andreas Dilger wrote:
> On Jun 22, 2009  09:42 -0700, Curt Wohlgemuth wrote:
> > I tested the locale-gen example with all combinations of
> > 
> >    data=writeback
> >    data=ordered
> >    data=journal
> >    <no journal at all>
> 
> On an unrelated note - would it be useful to mount an ext4 filesystem
> with a journal using "data=none" (or similar) to run without a journal?
> 

I think this is better. I would suggest data=nojournal. That way we can
check the mount options to figure out whether we are running with a
journal or not. Also, I guess this would let us run without a
journal even if mke2fs created one for us.

-aneesh
Michael Rubin July 1, 2009, 6:44 p.m. UTC | #7
On Wed, Jul 1, 2009 at 11:35 AM, Aneesh Kumar
K.V<aneesh.kumar@linux.vnet.ibm.com> wrote:
> On Mon, Jun 22, 2009 at 11:13:27PM +0200, Andreas Dilger wrote:
>> On Jun 22, 2009  09:42 -0700, Curt Wohlgemuth wrote:
> I think this is better. I would suggest data=nojournal. That way we can
> check the mount options to figure out whether we are running with a
> journal or not. Also, I guess this would let us run without a
> journal even if mke2fs created one for us.
>

As a heavy consumer of the non-journal mode, I really like this idea.
It will make many things easier and clearer for us.

mrubin
Xiang Wang July 2, 2009, 11:27 p.m. UTC | #8
On Wed, Jul 1, 2009 at 11:35 AM, Aneesh Kumar
K.V<aneesh.kumar@linux.vnet.ibm.com> wrote:
> On Mon, Jun 22, 2009 at 11:13:27PM +0200, Andreas Dilger wrote:
>> On Jun 22, 2009  09:42 -0700, Curt Wohlgemuth wrote:
>> > I tested the locale-gen example with all combinations of
>> >
>> >    data=writeback
>> >    data=ordered
>> >    data=journal
>> >    <no journal at all>
>>
>> On an unrelated note - would it be useful to mount an ext4 filesystem
>> with a journal using "data=none" (or similar) to run without a journal?
>>
>
> I think this is better. I would suggest data=nojournal. That way we can
> check the mount options to figure out whether we are running with a
> journal or not. Also, I guess this would let us run without a
> journal even if mke2fs created one for us.

We think adding this "data=nojournal" mount option is a very good idea.
To be more specific, after adding this option, the semantics will be:

If mke2fs does not create a journal, then we should use the
"data=nojournal" mount option only.

If mke2fs creates a journal for us, we are allowed to use one of the
following 4 mount options:
data=nojournal
data=writeback
data=ordered
data=journal
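
The semantics listed above can be sketched as a small validity check.
This is only an illustration: data=nojournal is the option proposed in
this thread, not an existing mount option, and the function name
data_option_allowed() is hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical check for the proposed semantics.
 *
 * mkfs_created_journal: whether mke2fs created a journal.
 * opt: the value given to data=, without the "data=" prefix.
 */
static bool data_option_allowed(bool mkfs_created_journal, const char *opt)
{
    /* Options accepted when a journal exists, per the list above. */
    static const char *const with_journal[] = {
        "nojournal", "writeback", "ordered", "journal",
    };
    size_t i;

    /* Without a journal, only the proposed data=nojournal is valid. */
    if (!mkfs_created_journal)
        return strcmp(opt, "nojournal") == 0;

    for (i = 0; i < sizeof(with_journal) / sizeof(with_journal[0]); i++)
        if (strcmp(opt, with_journal[i]) == 0)
            return true;
    return false;
}
```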

We are now working on this patch and will submit it in a couple of weeks.

Meanwhile if Ted could take the most recent patch from Curt to fix the data
corruption problem, that would be great!

Thanks,
Xiang

> -aneesh

Patch

--- 2.6.26/fs/ext4/inode.c.orig	2009-06-09 20:05:27.000000000 -0700
+++ 2.6.26/fs/ext4/inode.c	2009-06-22 08:55:13.000000000 -0700
@@ -3442,15 +3442,12 @@  static const struct address_space_operat

 void ext4_set_aops(struct inode *inode)
 {
-	if (ext4_should_order_data(inode) &&
-		test_opt(inode->i_sb, DELALLOC))
+	if (test_opt(inode->i_sb, DELALLOC))
 		inode->i_mapping->a_ops = &ext4_da_aops;
 	else if (ext4_should_order_data(inode))
 		inode->i_mapping->a_ops = &ext4_ordered_aops;
-	else if (ext4_should_writeback_data(inode) &&
-		 test_opt(inode->i_sb, DELALLOC))
-		inode->i_mapping->a_ops = &ext4_da_aops;
-	else if (ext4_should_writeback_data(inode))
+	else if (ext4_should_writeback_data(inode) ||
+	                         EXT4_JOURNAL(inode) == NULL)
 		inode->i_mapping->a_ops = &ext4_writeback_aops;
 	else
 		inode->i_mapping->a_ops = &ext4_journalled_aops;
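
For readers who want to check the resulting decision order outside the
kernel, here is a rough user-space model of ext4_set_aops() with the
patch applied. The struct, enums and function name are hypothetical
stand-ins. The model also assumes that ext4_should_order_data() and
ext4_should_writeback_data() are false when there is no journal, which
is what the discussion implies but is not shown in the patch itself.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel state the patch consults. */
enum data_mode { DATA_JOURNAL, DATA_ORDERED, DATA_WRITEBACK };
enum aops { DA_AOPS, ORDERED_AOPS, WRITEBACK_AOPS, JOURNALLED_AOPS };

struct model_inode {
    bool delalloc;       /* models test_opt(inode->i_sb, DELALLOC) */
    bool has_journal;    /* models EXT4_JOURNAL(inode) != NULL */
    enum data_mode mode; /* models the data= mount option */
};

/* Assumption: both helpers report false when there is no journal. */
static bool should_order(const struct model_inode *inode)
{
    return inode->has_journal && inode->mode == DATA_ORDERED;
}

static bool should_writeback(const struct model_inode *inode)
{
    return inode->has_journal && inode->mode == DATA_WRITEBACK;
}

/* Same branch order as the patched ext4_set_aops(). */
static enum aops model_set_aops(const struct model_inode *inode)
{
    if (inode->delalloc)
        return DA_AOPS;          /* every delalloc inode */
    else if (should_order(inode))
        return ORDERED_AOPS;
    else if (should_writeback(inode) || !inode->has_journal)
        return WRITEBACK_AOPS;   /* includes nodelalloc without a journal */
    else
        return JOURNALLED_AOPS;
}
```

In this model the nodelalloc, no-journal combination selects the
writeback aops instead of falling through to the journalled aops.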