Patchwork [RESEND,2/2] ext4: ext4_inode_is_fast_symlink should use cluster size

login
register
mail settings
Submitter Yongqiang Yang
Date Dec. 20, 2013, 5:13 a.m.
Message ID <CAGBYx2Zx7jnK-DmM7=RCCVwz4N0FfE_-=eETO8OkLt5McCXttg@mail.gmail.com>
Download mbox | patch
Permalink /patch/303891/
State Awaiting Upstream
Headers show

Comments

Yongqiang Yang - Dec. 20, 2013, 5:13 a.m.
From: Yongqiang Yang <xiaoqiangnk@gmail.com>

can be reproduced by xfstests 62 with  bigalloc and 128bit size inode.

Signed-off-by: Yongqiang Yang <yangyongqiang01@baidu.com>
---
 fs/ext4/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

 }
--
1.8.5.2
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Carlos Maiolino - Dec. 23, 2013, 12:17 p.m.
On Fri, Dec 20, 2013 at 01:13:33PM +0800, Yongqiang Yang wrote:
> From: Yongqiang Yang <xiaoqiangnk@gmail.com>
> 
> can be reproduced by xfstests 62 with  bigalloc and 128bit size inode.
> 
> Signed-off-by: Yongqiang Yang <yangyongqiang01@baidu.com>
> ---
>  fs/ext4/inode.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 9115f28..1869fcf 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -145,7 +145,7 @@ static int ext4_meta_trans_blocks(struct inode
> *inode, int lblocks,
>  static int ext4_inode_is_fast_symlink(struct inode *inode)
>  {
>         int ea_blocks = EXT4_I(inode)->i_file_acl ?
> -               (inode->i_sb->s_blocksize >> 9) : 0;
> +                       EXT4_CLUSTER_SIZE(inode->i_sb) >> 9 : 0;

Code looks good, but looks like it has an extra TAB here. Just a cosmetic thing;
despite that, consider it

Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Andreas Dilger - Dec. 23, 2013, 7:33 p.m.
If actually prefer if we changed this function from checking the blocks count to checking the symlink length (i_size) to determine if this is a fast or slow symlink.

I don't think there has ever been a kernel that stores symlinks < 60 chars in an external block, and it would definitely avoid the many, many times this function has been wrong because of xattrs and other things that change the number of blocks allocated to an inode. Using the block count instead of i_size makes it impossible to change the number of blocks associated with a symlink without breaking older kernels (and I'm sure this will break again in the future, just as it did when xattrs started appearing on symlinks in the first place. I'd really prefer that we move away from that. 

Also, it doesn't seem obvious to me that a symlink in a bigalloc filesystem should account for ALL of the blocks in the cluster to the inode vs. just the blocks actually needed to store the symlink name. 

Cheers, Andreas

> On Dec 23, 2013, at 5:17, Carlos Maiolino <cmaiolino@redhat.com> wrote:
> 
>> On Fri, Dec 20, 2013 at 01:13:33PM +0800, Yongqiang Yang wrote:
>> From: Yongqiang Yang <xiaoqiangnk@gmail.com>
>> 
>> can be reproduced by xfstests 62 with  bigalloc and 128bit size inode.
>> 
>> Signed-off-by: Yongqiang Yang <yangyongqiang01@baidu.com>
>> ---
>> fs/ext4/inode.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>> index 9115f28..1869fcf 100644
>> --- a/fs/ext4/inode.c
>> +++ b/fs/ext4/inode.c
>> @@ -145,7 +145,7 @@ static int ext4_meta_trans_blocks(struct inode
>> *inode, int lblocks,
>> static int ext4_inode_is_fast_symlink(struct inode *inode)
>> {
>>        int ea_blocks = EXT4_I(inode)->i_file_acl ?
>> -               (inode->i_sb->s_blocksize >> 9) : 0;
>> +                       EXT4_CLUSTER_SIZE(inode->i_sb) >> 9 : 0;
> 
> Code looks good, but looks like it has an extra TAB here. Just a cosmetic thing;
> despite that, consider it
> 
> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
> 
> 
> -- 
> Carlos
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o - Jan. 6, 2014, 2:51 p.m.
On Fri, Dec 20, 2013 at 01:13:33PM +0800, Yongqiang Yang wrote:
> From: Yongqiang Yang <xiaoqiangnk@gmail.com>
> 
> can be reproduced by xfstests 62 with  bigalloc and 128bit size inode.
> 
> Signed-off-by: Yongqiang Yang <yangyongqiang01@baidu.com>

Thanks, applied.

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andreas Dilger - Jan. 6, 2014, 5:47 p.m.
Ted, 
What about the idea to stop using the blocks count for doing the fast/slow symlink check, and instead use i_size?  IMHO this is far more robust than using the blocks count, since we've repeatedly seen bugs when the number of blocks allocated to an inode changes for done reason (e.g. xattrs, bigalloc, multi-block xattrs in the future).

AFAIK the kernel and e2fsprogs have always been consistent with symlinks <= 60 bytes being stored in the inode. 

Cheers, Andreas

> On Jan 6, 2014, at 7:51, Theodore Ts'o <tytso@mit.edu> wrote:
> 
>> On Fri, Dec 20, 2013 at 01:13:33PM +0800, Yongqiang Yang wrote:
>> From: Yongqiang Yang <xiaoqiangnk@gmail.com>
>> 
>> can be reproduced by xfstests 62 with  bigalloc and 128bit size inode.
>> 
>> Signed-off-by: Yongqiang Yang <yangyongqiang01@baidu.com>
> 
> Thanks, applied.
> 
>                        - Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o - Jan. 6, 2014, 7:45 p.m.
On Mon, Jan 06, 2014 at 10:47:03AM -0700, Andreas Dilger wrote:
> What about the idea to stop using the blocks count for doing the
> fast/slow symlink check, and instead use i_size?  IMHO this is far
> more robust than using the blocks count, since we've repeatedly seen
> bugs when the number of blocks allocated to an inode changes for
> done reason (e.g. xattrs, bigalloc, multi-block xattrs in the
> future).

I did see your earlier proposal on this front, but I didn't want to
this change without thinking about it a bit more closely.  In
particular, we probably would want to enforce this change in e2fsck
for a while first.

Currently, if we have a slow symlink where i_size is less than 60
bytes, both e2fsprogs and the kernel handles this case.  See the
attached file system image.

Yes, I created it synthetically, but keep in mind that that there are
other implementations of ext2/3/4 other than just in the Linux kernel.
In particular, the GNU Hurd and *BSD have their own independent
implementation of ext2.  So even if the Linux kernel has never created
a slow symlink with i_size <= 60 bytes, but that doesn't mean that
it's for certain that there are no such implementations out there in the wild.

That doesn't mean that we should never make such a change, but it does
mean that it's not something I want to do lightly.

							- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 9115f28..1869fcf 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -145,7 +145,7 @@  static int ext4_meta_trans_blocks(struct inode
*inode, int lblocks,
 static int ext4_inode_is_fast_symlink(struct inode *inode)
 {
        int ea_blocks = EXT4_I(inode)->i_file_acl ?
-               (inode->i_sb->s_blocksize >> 9) : 0;
+                       EXT4_CLUSTER_SIZE(inode->i_sb) >> 9 : 0;

        return (S_ISLNK(inode->i_mode) && inode->i_blocks - ea_blocks == 0);