diff mbox

ext4: Use s_csum_seed instead of i_csum_seed for xattr block csum.

Message ID 1340547236-2838-1-git-send-email-tm@tao.ma
State Not Applicable, archived
Headers show

Commit Message

Tao Ma June 24, 2012, 2:13 p.m. UTC
From: Tao Ma <boyu.mt@taobao.com>

In xattr block operation, we use h_refcount to indicate whether the
xattr block is shared among many inodes. And xattr block csum uses
s_csum_seed if it is shared and i_csum_seed if it belongs to
one inode. But this has a problem. So consider the block is shared
first bewteen inode A and B, and B has some xattr update and CoW
the xattr block. When it updates the *old* xattr block(because
of the h_refcount change) and calls ext4_xattr_release_block, we
has no idea that inode A is the real owner of the *old* xattr
block and we can't use the i_csum_seed of inode A either in xattr
block csum calculation. And I don't think we have an easy way to
find inode A.

So this patch just removes the tricky i_csum_seed and we now uses
s_csum_seed every time for the xattr block csum. The corresponding
patch for the e2fsprogs will be sent in another patch.

This is spotted by xfstests 117.

Cc: Darrick J. Wong <djwong@us.ibm.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
---
 fs/ext4/xattr.c |   11 ++++-------
 1 files changed, 4 insertions(+), 7 deletions(-)

Comments

Theodore Ts'o June 26, 2012, 2:23 a.m. UTC | #1
On Sun, Jun 24, 2012 at 10:13:56PM +0800, Tao Ma wrote:
> From: Tao Ma <boyu.mt@taobao.com>
> 
> In xattr block operation, we use h_refcount to indicate whether the
> xattr block is shared among many inodes. And xattr block csum uses
> s_csum_seed if it is shared and i_csum_seed if it belongs to
> one inode. But this has a problem. So consider the block is shared
> first bewteen inode A and B, and B has some xattr update and CoW
> the xattr block. When it updates the *old* xattr block(because
> of the h_refcount change) and calls ext4_xattr_release_block, we
> has no idea that inode A is the real owner of the *old* xattr
> block and we can't use the i_csum_seed of inode A either in xattr
> block csum calculation. And I don't think we have an easy way to
> find inode A.
> 
> So this patch just removes the tricky i_csum_seed and we now uses
> s_csum_seed every time for the xattr block csum. The corresponding
> patch for the e2fsprogs will be sent in another patch.

This makes sense to me; it's an on-disk format change, but we haven't
released the e2fsprogs patches in anything other than the proposed
updates branch of the e2fsprogs repo, so it seems reasonable to make
this change as there's really no other way to fix this.

Darrick, any objections to this change?

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Darrick J. Wong June 28, 2012, 8:15 p.m. UTC | #2
On Mon, Jun 25, 2012 at 10:23:00PM -0400, Theodore Ts'o wrote:
> On Sun, Jun 24, 2012 at 10:13:56PM +0800, Tao Ma wrote:
> > From: Tao Ma <boyu.mt@taobao.com>
> > 
> > In xattr block operation, we use h_refcount to indicate whether the
> > xattr block is shared among many inodes. And xattr block csum uses
> > s_csum_seed if it is shared and i_csum_seed if it belongs to
> > one inode. But this has a problem. So consider the block is shared
> > first bewteen inode A and B, and B has some xattr update and CoW
> > the xattr block. When it updates the *old* xattr block(because
> > of the h_refcount change) and calls ext4_xattr_release_block, we
> > has no idea that inode A is the real owner of the *old* xattr
> > block and we can't use the i_csum_seed of inode A either in xattr
> > block csum calculation. And I don't think we have an easy way to
> > find inode A.
> > 
> > So this patch just removes the tricky i_csum_seed and we now uses
> > s_csum_seed every time for the xattr block csum. The corresponding
> > patch for the e2fsprogs will be sent in another patch.
> 
> This makes sense to me; it's an on-disk format change, but we haven't
> released the e2fsprogs patches in anything other than the proposed
> updates branch of the e2fsprogs repo, so it seems reasonable to make
> this change as there's really no other way to fix this.
> 
> Darrick, any objections to this change?

Nope. iirc the only reason we had that weird code to start with was that
someone suggested that we use the inode checksum in the "xattr block only has
one owner" case, though I seem to have FUBAR'd it anyway.

--D
> 
> 						- Ted
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andreas Dilger June 29, 2012, 2:27 a.m. UTC | #3
On 2012-06-28, at 2:15 PM, Darrick J. Wong wrote:
> On Mon, Jun 25, 2012 at 10:23:00PM -0400, Theodore Ts'o wrote:
>> On Sun, Jun 24, 2012 at 10:13:56PM +0800, Tao Ma wrote:
>>> From: Tao Ma <boyu.mt@taobao.com>
>>> 
>>> In xattr block operation, we use h_refcount to indicate whether the
>>> xattr block is shared among many inodes. And xattr block csum uses
>>> s_csum_seed if it is shared and i_csum_seed if it belongs to
>>> one inode. But this has a problem. So consider the block is shared
>>> first bewteen inode A and B, and B has some xattr update and CoW
>>> the xattr block. When it updates the *old* xattr block(because
>>> of the h_refcount change) and calls ext4_xattr_release_block, we
>>> has no idea that inode A is the real owner of the *old* xattr
>>> block and we can't use the i_csum_seed of inode A either in xattr
>>> block csum calculation. And I don't think we have an easy way to
>>> find inode A.
>>> 
>>> So this patch just removes the tricky i_csum_seed and we now uses
>>> s_csum_seed every time for the xattr block csum. The corresponding
>>> patch for the e2fsprogs will be sent in another patch.
>> 
>> This makes sense to me; it's an on-disk format change, but we haven't
>> released the e2fsprogs patches in anything other than the proposed
>> updates branch of the e2fsprogs repo, so it seems reasonable to make
>> this change as there's really no other way to fix this.
>> 
>> Darrick, any objections to this change?
> 
> Nope. iirc the only reason we had that weird code to start with was that
> someone suggested that we use the inode checksum in the "xattr block only has
> one owner" case, though I seem to have FUBAR'd it anyway.

That was my fault.  I was trying to avoid the case when some node gets
wrong xattr block, and there is no way to detect this.  Going from a
single owner (using ino as seed) to multiple owners (using block as seed)
is easy.  It's just the case of going from multiple owners to a single
one that is tricky.

We could take the easy way out, and fall back to the block-seed checksum
if the inode seed checksum fails.  In the most common cases (no shared
block checksum, or many inodes sharing a checksum) this works fine, and
in the uncommon case (formerly shared block only used by one inode) it
would need to compute the checksum twice.  This means there are only a
small number of xattr blocks for which two checksums work, instead of
all xattr blocks which can be incorrectly referenced by any inode since
if the inode is pointing at that block it will also compute the right
checksum using the block seed.

Note that I'm not dead set on this, but wanted to explain the reasoning
about how this decision got made.

Cheers, Andreas





--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o June 29, 2012, 6:13 p.m. UTC | #4
On Thu, Jun 28, 2012 at 08:27:06PM -0600, Andreas Dilger wrote:
> That was my fault.  I was trying to avoid the case when some node gets
> wrong xattr block, and there is no way to detect this.  Going from a
> single owner (using ino as seed) to multiple owners (using block as seed)
> is easy.  It's just the case of going from multiple owners to a single
> one that is tricky.
> 
> We could take the easy way out, and fall back to the block-seed checksum
> if the inode seed checksum fails.  In the most common cases (no shared
> block checksum, or many inodes sharing a checksum) this works fine, and
> in the uncommon case (formerly shared block only used by one inode) it
> would need to compute the checksum twice.  This means there are only a
> small number of xattr blocks for which two checksums work, instead of
> all xattr blocks which can be incorrectly referenced by any inode since
> if the inode is pointing at that block it will also compute the right
> checksum using the block seed.
> 
> Note that I'm not dead set on this, but wanted to explain the reasoning
> about how this decision got made.

I thought about proposing adding a flag to the xattr header to
indicate whether the superblock seed or the per-inode seed should be
used; I ultimately decided it wasn't worth the complexity that it
would add, so I didn't bring it up.

It seems to me that in the vast majority of the cases, the xattr will
fit in the inode, especially in the SElinux and ACL case.  So the
question is how often will we have really large xattrs.  It *should*
be rare, since if it is happening all the time for ACL's and SElinux
cases, performance will be disastrous.  If we do have lots of xattr
blocks, it would make sense to be worried about per-inode seeds, but
is it worth the extra complexity?

Eh.....  In the end it's kind of on the line as I see it.  I don't
feel strongly enough to engineer it myself, but if someone were to
send me the patch, I'd probably accept it.

Regards,

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tao Ma June 30, 2012, 2:19 p.m. UTC | #5
On 06/30/2012 02:13 AM, Ted Ts'o wrote:
> On Thu, Jun 28, 2012 at 08:27:06PM -0600, Andreas Dilger wrote:
>> That was my fault.  I was trying to avoid the case when some node gets
>> wrong xattr block, and there is no way to detect this.  Going from a
>> single owner (using ino as seed) to multiple owners (using block as seed)
>> is easy.  It's just the case of going from multiple owners to a single
>> one that is tricky.
>>
>> We could take the easy way out, and fall back to the block-seed checksum
>> if the inode seed checksum fails.  In the most common cases (no shared
>> block checksum, or many inodes sharing a checksum) this works fine, and
>> in the uncommon case (formerly shared block only used by one inode) it
>> would need to compute the checksum twice.  This means there are only a
>> small number of xattr blocks for which two checksums work, instead of
>> all xattr blocks which can be incorrectly referenced by any inode since
>> if the inode is pointing at that block it will also compute the right
>> checksum using the block seed.
>>
>> Note that I'm not dead set on this, but wanted to explain the reasoning
>> about how this decision got made.
> 
> I thought about proposing adding a flag to the xattr header to
> indicate whether the superblock seed or the per-inode seed should be
> used; I ultimately decided it wasn't worth the complexity that it
> would add, so I didn't bring it up.
yeah, it is type of *too much complicated* for a xattr block csum
calculation. And what's more, the case will be that csum is controlled
by the field it checksums. ;) So consider if these of 2 fields don't
meet with each other, we should trust the field and recalc the csum or
trust the csum and change the field? It will add too much complexity to
the code, both the kernel and the e2fsprogs.

Thanks
Tao
> 
> It seems to me that in the vast majority of the cases, the xattr will
> fit in the inode, especially in the SElinux and ACL case.  So the
> question is how often will we have really large xattrs.  It *should*
> be rare, since if it is happening all the time for ACL's and SElinux
> cases, performance will be disastrous.  If we do have lots of xattr
> blocks, it would make sense to be worried about per-inode seeds, but
> is it worth the extra complexity?
> 
> Eh.....  In the end it's kind of on the line as I see it.  I don't
> feel strongly enough to engineer it myself, but if someone were to
> send me the patch, I'd probably accept it.
> 
> Regards,
> 
> 						- Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o July 9, 2012, 2:12 p.m. UTC | #6
On Sun, Jun 24, 2012 at 10:13:56PM +0800, Tao Ma wrote:
> From: Tao Ma <boyu.mt@taobao.com>
> 
> In xattr block operation, we use h_refcount to indicate whether the
> xattr block is shared among many inodes. And xattr block csum uses
> s_csum_seed if it is shared and i_csum_seed if it belongs to
> one inode. But this has a problem. So consider the block is shared
> first bewteen inode A and B, and B has some xattr update and CoW
> the xattr block. When it updates the *old* xattr block(because
> of the h_refcount change) and calls ext4_xattr_release_block, we
> has no idea that inode A is the real owner of the *old* xattr
> block and we can't use the i_csum_seed of inode A either in xattr
> block csum calculation. And I don't think we have an easy way to
> find inode A.
> 
> So this patch just removes the tricky i_csum_seed and we now uses
> s_csum_seed every time for the xattr block csum. The corresponding
> patch for the e2fsprogs will be sent in another patch.
> 
> This is spotted by xfstests 117.
> 
> Cc: Darrick J. Wong <djwong@us.ibm.com>
> Cc: "Theodore Ts'o" <tytso@mit.edu>
> Signed-off-by: Tao Ma <boyu.mt@taobao.com>

Thanks, applied.

					- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tao Ma Sept. 3, 2012, 3:14 a.m. UTC | #7
Hi Ted,
	As the corresponding kernel change has been merged, can this patch be
merged to the e2fsporgs also?

Thanks
Tao
On 06/24/2012 10:13 PM, Tao Ma wrote:
> From: Tao Ma <boyu.mt@taobao.com>
> 
> In xattr block operation, we use h_refcount to indicate whether the
> xattr block is shared among many inodes. And xattr block csum uses
> s_csum_seed if it is shared and i_csum_seed if it belongs to
> one inode. But this has a problem. So consider the block is shared
> first bewteen inode A and B, and B has some xattr update and CoW
> the xattr block. When it updates the *old* xattr block(because
> of the h_refcount change) and calls ext4_xattr_release_block, we
> has no idea that inode A is the real owner of the *old* xattr
> block and we can't use the i_csum_seed of inode A either in xattr
> block csum calculation. And I don't think we have an easy way to
> find inode A.
> 
> So this patch just removes the tricky i_csum_seed and we now uses
> s_csum_seed every time for the xattr block csum. The corresponding
> patch for the e2fsprogs will be sent in another patch.
> 
> This is spotted by xfstests 117.
> 
> Cc: Darrick J. Wong <djwong@us.ibm.com>
> Cc: "Theodore Ts'o" <tytso@mit.edu>
> Signed-off-by: Tao Ma <boyu.mt@taobao.com>
> ---
>  fs/ext4/xattr.c |   11 ++++-------
>  1 files changed, 4 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
> index e56c9ed..2cdb98d 100644
> --- a/fs/ext4/xattr.c
> +++ b/fs/ext4/xattr.c
> @@ -127,19 +127,16 @@ static __le32 ext4_xattr_block_csum(struct inode *inode,
>  				    struct ext4_xattr_header *hdr)
>  {
>  	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
> -	struct ext4_inode_info *ei = EXT4_I(inode);
>  	__u32 csum, old;
>  
>  	old = hdr->h_checksum;
>  	hdr->h_checksum = 0;
> -	if (le32_to_cpu(hdr->h_refcount) != 1) {
> -		block_nr = cpu_to_le64(block_nr);
> -		csum = ext4_chksum(sbi, sbi->s_csum_seed, (__u8 *)&block_nr,
> -				   sizeof(block_nr));
> -	} else
> -		csum = ei->i_csum_seed;
> +	block_nr = cpu_to_le64(block_nr);
> +	csum = ext4_chksum(sbi, sbi->s_csum_seed, (__u8 *)&block_nr,
> +			   sizeof(block_nr));
>  	csum = ext4_chksum(sbi, csum, (__u8 *)hdr,
>  			   EXT4_BLOCK_SIZE(inode->i_sb));
> +
>  	hdr->h_checksum = old;
>  	return cpu_to_le32(csum);
>  }
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o Sept. 3, 2012, 3:39 a.m. UTC | #8
On Mon, Sep 03, 2012 at 11:14:19AM +0800, Tao Ma wrote:
> Hi Ted,
> 	As the corresponding kernel change has been merged, can this patch be
> merged to the e2fsporgs also?

Hi Tao,

I folded your change into the metadata checksum patches before I
applied them to the next branch (and which is now included in the
master branch).  So if you use the master/next branch, it should be
fully in sync with the kernel implementation of metadata checksum.
This was along with a number of other cleanups that I did to collapse
related patches before they got fully applied into e2fsprogs.

I have run xfstests with metadata checksum enabled in the kernel with
the development branch of e2fsprogs, and it passes, so I'm fairly
confident they are in sync.

One of the questions that we should consider in the near future is
when to cut a e2fsprogs 1.43 release.  Things which we might want to
consider adding before we cut 1.43 include the changes to fully
support online resize for 64-bit file systems, and possibly the inline
data patches, if we are confident in the e2fsprogs changes and that
the format of inline data is ready to be cast into concrete.

Regards,

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tao Ma Sept. 3, 2012, 5:37 a.m. UTC | #9
On 09/03/2012 11:39 AM, Theodore Ts'o wrote:
> On Mon, Sep 03, 2012 at 11:14:19AM +0800, Tao Ma wrote:
>> Hi Ted,
>> 	As the corresponding kernel change has been merged, can this patch be
>> merged to the e2fsporgs also?
> 
> Hi Tao,
> 
> I folded your change into the metadata checksum patches before I
> applied them to the next branch (and which is now included in the
> master branch).  So if you use the master/next branch, it should be
> fully in sync with the kernel implementation of metadata checksum.
> This was along with a number of other cleanups that I did to collapse
> related patches before they got fully applied into e2fsprogs.
oh, yeah, I found it after I sent this e-mail.
> 
> I have run xfstests with metadata checksum enabled in the kernel with
> the development branch of e2fsprogs, and it passes, so I'm fairly
> confident they are in sync.
cool. I will use it+inline_data to test the new inline data kernel.
> 
> One of the questions that we should consider in the near future is
> when to cut a e2fsprogs 1.43 release.  Things which we might want to
> consider adding before we cut 1.43 include the changes to fully
> support online resize for 64-bit file systems, and possibly the inline
> data patches, if we are confident in the e2fsprogs changes and that
> the format of inline data is ready to be cast into concrete.
Sure, I have finished the new inline data patches(your suggestion of the
new "..") and run the xfstests on it. Hope the patch set can be sent
this week.

Thanks
Tao
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index e56c9ed..2cdb98d 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -127,19 +127,16 @@  static __le32 ext4_xattr_block_csum(struct inode *inode,
 				    struct ext4_xattr_header *hdr)
 {
 	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
-	struct ext4_inode_info *ei = EXT4_I(inode);
 	__u32 csum, old;
 
 	old = hdr->h_checksum;
 	hdr->h_checksum = 0;
-	if (le32_to_cpu(hdr->h_refcount) != 1) {
-		block_nr = cpu_to_le64(block_nr);
-		csum = ext4_chksum(sbi, sbi->s_csum_seed, (__u8 *)&block_nr,
-				   sizeof(block_nr));
-	} else
-		csum = ei->i_csum_seed;
+	block_nr = cpu_to_le64(block_nr);
+	csum = ext4_chksum(sbi, sbi->s_csum_seed, (__u8 *)&block_nr,
+			   sizeof(block_nr));
 	csum = ext4_chksum(sbi, csum, (__u8 *)hdr,
 			   EXT4_BLOCK_SIZE(inode->i_sb));
+
 	hdr->h_checksum = old;
 	return cpu_to_le32(csum);
 }