Patchwork [RFC] ext4: handle fast symlink properly with inline_data

login
register
mail settings
Submitter Zheng Liu
Date Feb. 13, 2014, 7:07 a.m.
Message ID <1392275237-6998-1-git-send-email-wenqing.lz@taobao.com>
Download mbox | patch
Permalink /patch/319895/
State New
Headers show

Comments

Zheng Liu - Feb. 13, 2014, 7:07 a.m.
From: Zheng Liu <wenqing.lz@taobao.com>

This commit tries to fix a bug that we can't read fast symlink properly.
The following script can hit the bug.

  #!/bin/bash

  cd ${MNT}
  filename=ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789
  rm -rf test
  mkdir test
  cd test
  echo "hello" >$filename
  ln -s $filename symlinkfile
  cd
  sudo umount /mnt/sda1
  sudo mount -t ext4 /dev/sda1 /mnt/sda1
  readlink /mnt/sda1/test/symlinkfile

The root cause is that we don't handle inline data in ext4_follow_link.
In this commit it makes ext4_follow_link handle inline data properly.

Reported-by: Ian Nartowicz <claws@nartowicz.co.uk>
Cc: Ian Nartowicz <claws@nartowicz.co.uk>
Cc: Tao Ma <tm@tao.ma>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
---
Hi all,

I am not sure whether or not we need to enable inline_data for a fast
symlink inode.  Obviously, it brings a benefit that after enabling
inline_data feature for a fast symlink we can get more space to store
the path.  But it seems that the original patch doesn't want to do this
Another solution for fixing this bug is to disable inline_data for a
fast symlink.  Any comment?

Thanks,
						- Zheng

 fs/ext4/ext4.h    |    2 ++
 fs/ext4/inline.c  |    5 ++---
 fs/ext4/symlink.c |   46 +++++++++++++++++++++++++++++++++++++++++++---
 3 files changed, 47 insertions(+), 6 deletions(-)
Ian Nartowicz - Feb. 13, 2014, 5:05 p.m.
Zheng Liu <gnehzuil.liu <at> gmail.com> writes:

> 
> From: Zheng Liu <wenqing.lz <at> taobao.com>
> 
> This commit tries to fix a bug that we can't read fast symlink properly.
> The following script can hit the bug.
> 

> Hi all,
> 
> I am not sure whether or not we need to enable inline_data for a fast
> symlink inode.  Obviously, it brings a benefit that after enabling
> inline_data feature for a fast symlink we can get more space to store
> the path.  But it seems that the original patch doesn't want to do this
> Another solution for fixing this bug is to disable inline_data for a
> fast symlink.  Any comment?
> 
> Thanks,
> 						- Zheng

I compiled a kernel with this patch and the long symlinks work OK now.  fsck
still complains, of course.

Seems like a fairly rare case, but is there any real downside to supporting it?

--ian

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o - Feb. 18, 2014, 1:52 a.m.
On Thu, Feb 13, 2014 at 03:07:17PM +0800, Zheng Liu wrote:
> 
> I am not sure whether or not we need to enable inline_data for a fast
> symlink inode.  Obviously, it brings a benefit that after enabling
> inline_data feature for a fast symlink we can get more space to store
> the path.  But it seems that the original patch doesn't want to do this
> Another solution for fixing this bug is to disable inline_data for a
> fast symlink.  Any comment?

Well, if we are using inline data, and we have a symlink which is
longer than 60 bytes, but less than extra space available for an
inline data, it seems like a good thing to support.

The downside is that it is a bit more complication to add the kernel's
code in both the kernel as well as e2fsprogs, but it doesn't seem that
bad.

So I don't have any objections to adding this functionality.  What do
other folks think?

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andreas Dilger - Feb. 18, 2014, 7:07 p.m.
I suspect that the stats for symlinks > 60 but < ~150 chars is only a very
small fraction of files. If the code complexity of handling this is very
small (i.e. it is just handled as a natural consequence of writing "data" 
of this size) then I would be OK with it. 

Otherwise, I expect the code and maintenance overhead of supporting
the 0.01% (?) of symlinks that are this size is probably lot worth it. 

People could check what the actual usage is via the "fsstats" tool at:

http://www.pdsi-scidac.org/fsstats/

There is also data there already that reports stats on symlink length, but
it is mostly HPC filesystems and it might be better to redo this with a
desktop-type workload. 

Cheers, Andreas

>> On Feb 17, 2014, at 17:52, Theodore Ts'o <tytso@mit.edu> wrote:
>> 
>> On Thu, Feb 13, 2014 at 03:07:17PM +0800, Zheng Liu wrote:
>> 
>> I am not sure whether or not we need to enable inline_data for a fast
>> symlink inode.  Obviously, it brings a benefit that after enabling
>> inline_data feature for a fast symlink we can get more space to store
>> the path.  But it seems that the original patch doesn't want to do this
>> Another solution for fixing this bug is to disable inline_data for a
>> fast symlink.  Any comment?
> 
> Well, if we are using inline data, and we have a symlink which is
> longer than 60 bytes, but less than extra space available for an
> inline data, it seems like a good thing to support.
> 
> The downside is that it is a bit more complication to add the kernel's
> code in both the kernel as well as e2fsprogs, but it doesn't seem that
> bad.
> 
> So I don't have any objections to adding this functionality.  What do
> other folks think?
> 
>                       - Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Darrick J. Wong - May 30, 2014, 11:26 p.m.
On Tue, Feb 18, 2014 at 11:07:24AM -0800, Andreas Dilger wrote:
> I suspect that the stats for symlinks > 60 but < ~150 chars is only a very
> small fraction of files. If the code complexity of handling this is very
> small (i.e. it is just handled as a natural consequence of writing "data" 
> of this size) then I would be OK with it. 
> 
> Otherwise, I expect the code and maintenance overhead of supporting
> the 0.01% (?) of symlinks that are this size is probably lot worth it. 
> 
> People could check what the actual usage is via the "fsstats" tool at:
> 
> http://www.pdsi-scidac.org/fsstats/
> 
> There is also data there already that reports stats on symlink length, but
> it is mostly HPC filesystems and it might be better to redo this with a
> desktop-type workload. 

I think we should either put in this kernel patch so that we can read inline
data fast symlinks, or remove the ability to write inline data fast symlinks.
It's a bit surprising that I can do:

# mke2fs -t ext4 -O inline_data /dev/sdb         
# mount /dev/sdb /mnt/
# ln -s "Fuzzy Wuzzy was a bear. Fuzzy Wuzzy had no hair. I guess he wasn't fuzzy, was he?" /mnt/biglink
# readlink /mnt/biglink
Fuzzy Wuzzy was a bear. Fuzzy Wuzzy had no hair. I guess he wasn't fuzzy, was he?
# umount /mnt
# mount /dev/sdb /mnt/
# readlink /mnt/biglink
Fuzzy Wuzzy was a bear. Fuzzy Wuzzy had no hair. I guess he

What happened to the punchline of the limerick? ------------^^^^^^^ ???? :)

e2fsck still seems to think that you can't have inline_data fast symlinks.  I
don't see a downside to continuing to allow them.

--D

> 
> Cheers, Andreas
> 
> >> On Feb 17, 2014, at 17:52, Theodore Ts'o <tytso@mit.edu> wrote:
> >> 
> >> On Thu, Feb 13, 2014 at 03:07:17PM +0800, Zheng Liu wrote:
> >> 
> >> I am not sure whether or not we need to enable inline_data for a fast
> >> symlink inode.  Obviously, it brings a benefit that after enabling
> >> inline_data feature for a fast symlink we can get more space to store
> >> the path.  But it seems that the original patch doesn't want to do this
> >> Another solution for fixing this bug is to disable inline_data for a
> >> fast symlink.  Any comment?
> > 
> > Well, if we are using inline data, and we have a symlink which is
> > longer than 60 bytes, but less than extra space available for an
> > inline data, it seems like a good thing to support.
> > 
> > The downside is that it is a bit more complication to add the kernel's
> > code in both the kernel as well as e2fsprogs, but it doesn't seem that
> > bad.
> > 
> > So I don't have any objections to adding this functionality.  What do
> > other folks think?
> > 
> >                       - Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zheng Liu - June 2, 2014, 6:42 a.m.
Hi all,

On Fri, May 30, 2014 at 04:26:33PM -0700, Darrick J. Wong wrote:
> On Tue, Feb 18, 2014 at 11:07:24AM -0800, Andreas Dilger wrote:
> > I suspect that the stats for symlinks > 60 but < ~150 chars is only a very
> > small fraction of files. If the code complexity of handling this is very
> > small (i.e. it is just handled as a natural consequence of writing "data" 
> > of this size) then I would be OK with it. 
> > 
> > Otherwise, I expect the code and maintenance overhead of supporting
> > the 0.01% (?) of symlinks that are this size is probably lot worth it. 
> > 
> > People could check what the actual usage is via the "fsstats" tool at:
> > 
> > http://www.pdsi-scidac.org/fsstats/
> > 
> > There is also data there already that reports stats on symlink length, but
> > it is mostly HPC filesystems and it might be better to redo this with a
> > desktop-type workload. 
> 
> I think we should either put in this kernel patch so that we can read inline
> data fast symlinks, or remove the ability to write inline data fast symlinks.
> It's a bit surprising that I can do:
> 
> # mke2fs -t ext4 -O inline_data /dev/sdb         
> # mount /dev/sdb /mnt/
> # ln -s "Fuzzy Wuzzy was a bear. Fuzzy Wuzzy had no hair. I guess he wasn't fuzzy, was he?" /mnt/biglink
> # readlink /mnt/biglink
> Fuzzy Wuzzy was a bear. Fuzzy Wuzzy had no hair. I guess he wasn't fuzzy, was he?
> # umount /mnt
> # mount /dev/sdb /mnt/
> # readlink /mnt/biglink
> Fuzzy Wuzzy was a bear. Fuzzy Wuzzy had no hair. I guess he
> 
> What happened to the punchline of the limerick? ------------^^^^^^^ ???? :)

Do *not* apply this patch, please.  I revise this patch and I think it
is not right solution, although it can fix the bug.

The root cause is in ext4_inode_is_fast_symlink() function that it
doesn't check whether or not an inode has inline data.  After creating
a normal symlink, the symlink will be stored in ->i_block and extra
space if the length of symlink is greater than 60 bytes and less than
extra space in inline data.  In the mean time, this inode has inline
data flag.  If the file system is remounted, ext4_inode_is_fast_symlink
thinks this inode is a fast symlink, and the data in ->i_block is copied
to user.  I will send a new patch to fix this bug.

> 
> e2fsck still seems to think that you can't have inline_data fast symlinks.  I
> don't see a downside to continuing to allow them.

Meanwhile another patch for e2fsck also will be sent out soon in order
to handle symlink properly with inline data in e2fsck.

Regards,
                                                - Zheng
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ian Nartowicz - June 2, 2014, 11:06 a.m.
On Mon, 2 Jun 2014 14:42:51 +0800
Zheng Liu <gnehzuil.liu@gmail.com> wrote:

>Hi all,
>
>On Fri, May 30, 2014 at 04:26:33PM -0700, Darrick J. Wong wrote:
>> On Tue, Feb 18, 2014 at 11:07:24AM -0800, Andreas Dilger wrote:
>> > I suspect that the stats for symlinks > 60 but < ~150 chars is only a very
>> > small fraction of files. If the code complexity of handling this is very
>> > small (i.e. it is just handled as a natural consequence of writing "data" 
>> > of this size) then I would be OK with it. 
>> > 
>> > Otherwise, I expect the code and maintenance overhead of supporting
>> > the 0.01% (?) of symlinks that are this size is probably lot worth it. 
>> > 
>> > People could check what the actual usage is via the "fsstats" tool at:
>> > 
>> > http://www.pdsi-scidac.org/fsstats/
>> > 
>> > There is also data there already that reports stats on symlink length, but
>> > it is mostly HPC filesystems and it might be better to redo this with a
>> > desktop-type workload. 
>> 
>> I think we should either put in this kernel patch so that we can read inline
>> data fast symlinks, or remove the ability to write inline data fast symlinks.
>> It's a bit surprising that I can do:
>> 
>> # mke2fs -t ext4 -O inline_data /dev/sdb         
>> # mount /dev/sdb /mnt/
>> # ln -s "Fuzzy Wuzzy was a bear. Fuzzy Wuzzy had no hair. I guess he wasn't
>> fuzzy, was he?" /mnt/biglink # readlink /mnt/biglink
>> Fuzzy Wuzzy was a bear. Fuzzy Wuzzy had no hair. I guess he wasn't fuzzy,
>> was he? # umount /mnt
>> # mount /dev/sdb /mnt/
>> # readlink /mnt/biglink
>> Fuzzy Wuzzy was a bear. Fuzzy Wuzzy had no hair. I guess he
>> 
>> What happened to the punchline of the limerick? ------------^^^^^^^ ???? :)
>
>Do *not* apply this patch, please.  I revise this patch and I think it
>is not right solution, although it can fix the bug.
>
>The root cause is in ext4_inode_is_fast_symlink() function that it
>doesn't check whether or not an inode has inline data.  After creating
>a normal symlink, the symlink will be stored in ->i_block and extra
>space if the length of symlink is greater than 60 bytes and less than
>extra space in inline data.  In the mean time, this inode has inline
>data flag.  If the file system is remounted, ext4_inode_is_fast_symlink
>thinks this inode is a fast symlink, and the data in ->i_block is copied
>to user.  I will send a new patch to fix this bug.
>
>> 
>> e2fsck still seems to think that you can't have inline_data fast symlinks.  I
>> don't see a downside to continuing to allow them.
>
>Meanwhile another patch for e2fsck also will be sent out soon in order
>to handle symlink properly with inline data in e2fsck.
>
>Regards,
>                                                - Zheng

fsstats on root on my desktop, percentage of symlink targets 64 - 150
characters long is 22%, almost all in the 64-71 char bucket.  Lots of them are
theme icons and python packages, some shared library objects, nothing that
many people won't have.

--ian
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zheng Liu - June 2, 2014, 11:54 a.m.
On Mon, Jun 02, 2014 at 12:06:35PM +0100, Ian Nartowicz wrote:
> On Mon, 2 Jun 2014 14:42:51 +0800
> Zheng Liu <gnehzuil.liu@gmail.com> wrote:
> 
> >Hi all,
> >
> >On Fri, May 30, 2014 at 04:26:33PM -0700, Darrick J. Wong wrote:
> >> On Tue, Feb 18, 2014 at 11:07:24AM -0800, Andreas Dilger wrote:
> >> > I suspect that the stats for symlinks > 60 but < ~150 chars is only a very
> >> > small fraction of files. If the code complexity of handling this is very
> >> > small (i.e. it is just handled as a natural consequence of writing "data" 
> >> > of this size) then I would be OK with it. 
> >> > 
> >> > Otherwise, I expect the code and maintenance overhead of supporting
> >> > the 0.01% (?) of symlinks that are this size is probably lot worth it. 
> >> > 
> >> > People could check what the actual usage is via the "fsstats" tool at:
> >> > 
> >> > http://www.pdsi-scidac.org/fsstats/
> >> > 
> >> > There is also data there already that reports stats on symlink length, but
> >> > it is mostly HPC filesystems and it might be better to redo this with a
> >> > desktop-type workload. 
> >> 
> >> I think we should either put in this kernel patch so that we can read inline
> >> data fast symlinks, or remove the ability to write inline data fast symlinks.
> >> It's a bit surprising that I can do:
> >> 
> >> # mke2fs -t ext4 -O inline_data /dev/sdb         
> >> # mount /dev/sdb /mnt/
> >> # ln -s "Fuzzy Wuzzy was a bear. Fuzzy Wuzzy had no hair. I guess he wasn't
> >> fuzzy, was he?" /mnt/biglink # readlink /mnt/biglink
> >> Fuzzy Wuzzy was a bear. Fuzzy Wuzzy had no hair. I guess he wasn't fuzzy,
> >> was he? # umount /mnt
> >> # mount /dev/sdb /mnt/
> >> # readlink /mnt/biglink
> >> Fuzzy Wuzzy was a bear. Fuzzy Wuzzy had no hair. I guess he
> >> 
> >> What happened to the punchline of the limerick? ------------^^^^^^^ ???? :)
> >
> >Do *not* apply this patch, please.  I revise this patch and I think it
> >is not right solution, although it can fix the bug.
> >
> >The root cause is in ext4_inode_is_fast_symlink() function that it
> >doesn't check whether or not an inode has inline data.  After creating
> >a normal symlink, the symlink will be stored in ->i_block and extra
> >space if the length of symlink is greater than 60 bytes and less than
> >extra space in inline data.  In the mean time, this inode has inline
> >data flag.  If the file system is remounted, ext4_inode_is_fast_symlink
> >thinks this inode is a fast symlink, and the data in ->i_block is copied
> >to user.  I will send a new patch to fix this bug.
> >
> >> 
> >> e2fsck still seems to think that you can't have inline_data fast symlinks.  I
> >> don't see a downside to continuing to allow them.
> >
> >Meanwhile another patch for e2fsck also will be sent out soon in order
> >to handle symlink properly with inline data in e2fsck.
> >
> >Regards,
> >                                                - Zheng
> 
> fsstats on root on my desktop, percentage of symlink targets 64 - 150
> characters long is 22%, almost all in the 64-71 char bucket.  Lots of them are
> theme icons and python packages, some shared library objects, nothing that
> many people won't have.

Hi Ian,

Thanks for sharing this with us.  It is useful for us to determine
whether or not we should support symlink with inline data feature, and
according to your report, we'd better support it. :-)

Regards,
                                                - Zheng
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index ece5556..d943f9c 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2589,6 +2589,8 @@  extern int ext4_has_inline_data(struct inode *inode);
 extern int ext4_get_inline_size(struct inode *inode);
 extern int ext4_get_max_inline_size(struct inode *inode);
 extern int ext4_find_inline_data_nolock(struct inode *inode);
+extern int ext4_read_inline_data(struct inode *inode, void *buffer,
+				 unsigned int len, struct ext4_iloc *iloc);
 extern void ext4_write_inline_data(struct inode *inode,
 				   struct ext4_iloc *iloc,
 				   void *buffer, loff_t pos,
diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index 82edf5b..6dda3c4 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -165,9 +165,8 @@  out:
 	return error;
 }
 
-static int ext4_read_inline_data(struct inode *inode, void *buffer,
-				 unsigned int len,
-				 struct ext4_iloc *iloc)
+int ext4_read_inline_data(struct inode *inode, void *buffer,
+			  unsigned int len, struct ext4_iloc *iloc)
 {
 	struct ext4_xattr_entry *entry;
 	struct ext4_xattr_ibody_header *header;
diff --git a/fs/ext4/symlink.c b/fs/ext4/symlink.c
index ff37119..66b62f8 100644
--- a/fs/ext4/symlink.c
+++ b/fs/ext4/symlink.c
@@ -25,9 +25,48 @@ 
 
 static void *ext4_follow_link(struct dentry *dentry, struct nameidata *nd)
 {
-	struct ext4_inode_info *ei = EXT4_I(dentry->d_inode);
-	nd_set_link(nd, (char *) ei->i_data);
-	return NULL;
+	struct inode *inode = dentry->d_inode;
+	struct ext4_inode_info *ei = EXT4_I(inode);
+	struct ext4_iloc iloc;
+	char *inline_data = NULL;
+	int ret, inline_size = 0;
+	int error;
+
+	if (!ext4_has_inline_data(inode)) {
+		nd_set_link(nd, (char *) ei->i_data);
+		return NULL;
+	}
+
+	error = ext4_get_inode_loc(inode, &iloc);
+	if (error)
+		return ERR_PTR(error);
+
+	down_read(&ei->xattr_sem);
+	if (!ext4_has_inline_data(inode)) {
+		up_read(&ei->xattr_sem);
+		return ERR_PTR(-EAGAIN);
+	}
+	inline_size = ext4_get_inline_size(inode);
+	inline_data = kmalloc(inline_size + 1, GFP_NOFS);
+	if (!inline_data) {
+		up_read(&ei->xattr_sem);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	ret = ext4_read_inline_data(inode, inline_data, inline_size, &iloc);
+	up_read(&ei->xattr_sem);
+	if (ret < 0)
+		return ERR_PTR(-EIO);
+
+	inline_data[inline_size] = '\0';
+	nd_set_link(nd, inline_data);
+	return inline_data;
+}
+
+static void ext4_put_link(struct dentry *dentry, struct nameidata *nd, void *cookie)
+{
+	if (cookie)
+		kfree(cookie);
 }
 
 const struct inode_operations ext4_symlink_inode_operations = {
@@ -44,6 +83,7 @@  const struct inode_operations ext4_symlink_inode_operations = {
 const struct inode_operations ext4_fast_symlink_inode_operations = {
 	.readlink	= generic_readlink,
 	.follow_link	= ext4_follow_link,
+	.put_link	= ext4_put_link,
 	.setattr	= ext4_setattr,
 	.setxattr	= generic_setxattr,
 	.getxattr	= generic_getxattr,