Patchwork Far too long mount time

login
register
mail settings
Submitter Theodore Ts'o
Date Aug. 16, 2012, 6:53 p.m.
Message ID <20120816185337.GB31346@thunk.org>
Download mbox | patch
Permalink /patch/178075/
State Superseded
Headers show

Comments

Theodore Ts'o - Aug. 16, 2012, 6:53 p.m.
On Thu, Aug 16, 2012 at 10:42:19AM -0400, Theodore Ts'o wrote:
> On Thu, Aug 16, 2012 at 10:16:48AM -0400, Calvin Walton wrote:
> > On Thu, 2012-08-16 at 03:09 -0600, Andreas Dilger wrote:
> > Is there any fix for this issue queued up for an upcoming stable
> > release? It still reverts cleanly on 3.5.2.
> 
> There isn't a fix queued up yet, but there will be one soon....

This patch should solve the problem (as an alternative to reverting
8aeb00ff85a).

						- Ted


From dc43c7a8a6c266c31aa4f0408000c4d1b9f3c787 Mon Sep 17 00:00:00 2001
From: Theodore Ts'o <tytso@mit.edu>
Date: Thu, 16 Aug 2012 11:59:04 -0400
Subject: [PATCH] ext4: fix long mount times on very big file systems

Commit 8aeb00ff85a: "ext4: fix overhead calculation used by
ext4_statfs()" introduced a O(n**2) calculation which makes very large
file systems take forever to mount.  Fix this with an optimization for
non-bigalloc file systems.  (For bigalloc file systems the overhead
needs to be set in the the superblock.)

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
---
 fs/ext4/super.c | 4 ++++
 1 file changed, 4 insertions(+)
Eric Sandeen - Aug. 16, 2012, 7:23 p.m.
On 8/16/12 1:53 PM, Theodore Ts'o wrote:
> On Thu, Aug 16, 2012 at 10:42:19AM -0400, Theodore Ts'o wrote:
>> On Thu, Aug 16, 2012 at 10:16:48AM -0400, Calvin Walton wrote:
>>> On Thu, 2012-08-16 at 03:09 -0600, Andreas Dilger wrote:
>>> Is there any fix for this issue queued up for an upcoming stable
>>> release? It still reverts cleanly on 3.5.2.
>>
>> There isn't a fix queued up yet, but there will be one soon....
>
> This patch should solve the problem (as an alternative to reverting
> 8aeb00ff85a).
>
> 						- Ted
>
>
>  From dc43c7a8a6c266c31aa4f0408000c4d1b9f3c787 Mon Sep 17 00:00:00 2001
> From: Theodore Ts'o <tytso@mit.edu>
> Date: Thu, 16 Aug 2012 11:59:04 -0400
> Subject: [PATCH] ext4: fix long mount times on very big file systems
>
> Commit 8aeb00ff85a: "ext4: fix overhead calculation used by
> ext4_statfs()" introduced a O(n**2) calculation which makes very large
> file systems take forever to mount.  Fix this with an optimization for
> non-bigalloc file systems.  (For bigalloc file systems the overhead
> needs to be set in the the superblock.)

And mkfs with bigalloc will do that, right?

Hm.
         /*
          * Get the # of file system overhead blocks from the
          * superblock if present.
          */
         if (es->s_overhead_clusters)
                 sbi->s_overhead = le32_to_cpu(es->s_overhead_clusters);
         else {
                 ret = ext4_calculate_overhead(sb);
                 if (ret)
                         goto failed_mount_wq;
         }

so if we mkfs'd with bigalloc, s_overhead_clusters will be set and we 
won't call ext4_calculate_overhead, right?

But if we didn't mkfs with bigalloc, we won't have the feature, and
ext4_calc_overhead will exit early.

So when does all of the code after the short-circuit ever run?

-Eric

> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
> Cc: stable@vger.kernel.org
> ---
>   fs/ext4/super.c | 4 ++++
>   1 file changed, 4 insertions(+)
>
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 603023b..055c65b 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -3129,6 +3129,10 @@ static int count_overhead(struct super_block *sb, ext4_group_t grp,
>   	ext4_group_t		i, ngroups = ext4_get_groups_count(sb);
>   	int			s, j, count = 0;
>
> +	if (!EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_BIGALLOC))
> +		return (ext4_bg_has_super(sb, grp) + ext4_bg_num_gdb(sb, i) +
> +			sbi->s_itb_per_group + 2);
> +
>   	first_block = le32_to_cpu(sbi->s_es->s_first_data_block) +
>   		(grp * EXT4_BLOCKS_PER_GROUP(sb));
>   	last_block = first_block + EXT4_BLOCKS_PER_GROUP(sb) - 1;
>

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o - Aug. 16, 2012, 7:57 p.m.
On Thu, Aug 16, 2012 at 02:23:37PM -0500, Eric Sandeen wrote:
> 
> And mkfs with bigalloc will do that, right?

Not yet.

We also need to make sure s_overhead_clusters gets updated after a
resize operation, but I wanted to get a fix for non-bigalloc file
systems out there as soon as possible.

> So when does all of the code after the short-circuit ever run?

Until we get ths rest of the code fixed up, it will needed for all
bigalloc file systems.  The one saving grace is that bigalloc file
systems will have fewer block groups, so the O(n**2) won't be as bad
--- but it is something we really should fix.

					- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o - Aug. 17, 2012, 1:55 p.m.
On Thu, Aug 16, 2012 at 02:53:37PM -0400, Theodore Ts'o wrote:
> This patch should solve the problem (as an alternative to reverting
> 8aeb00ff85a).
> 
> 
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 603023b..055c65b 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -3129,6 +3129,10 @@ static int count_overhead(struct super_block *sb, ext4_group_t grp,
>  	ext4_group_t		i, ngroups = ext4_get_groups_count(sb);
>  	int			s, j, count = 0;
>  
> +	if (!EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_BIGALLOC))
> +		return (ext4_bg_has_super(sb, grp) + ext4_bg_num_gdb(sb, i) +
                                                                        ^^^

This should be "grp"; stupid typo on my part.

This patch with the fix-up will be pushed to Linus shortly and tagged
for the stable tree.

     	       	      	     	     	- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Justin Piszcz - Aug. 17, 2012, 2:48 p.m.
-----Original Message-----
From: Theodore Ts'o [mailto:tytso@mit.edu] 
Sent: Friday, August 17, 2012 9:55 AM
To: Calvin Walton
Cc: Andreas Dilger; Javier Marcet; Linux Ext4 Mailing List;
jpiszcz@lucidpixels.com
Subject: Re: Far too long mount time

On Thu, Aug 16, 2012 at 02:53:37PM -0400, Theodore Ts'o wrote:
> This patch should solve the problem (as an alternative to reverting
> 8aeb00ff85a).

This should be "grp"; stupid typo on my part.

This patch with the fix-up will be pushed to Linus shortly and tagged
for the stable tree.

--

Hi Theo,

Found the patch here:
http://www.spinics.net/lists/linux-ext4/msg33498.html

Patched against the fresh+stable 3.5.1 tree:

p34:/usr/src/linux-3.5.1# patch -p1 < ../tso.patch
patching file fs/ext4/super.c
Hunk #1 succeeded at 3109 (offset -20 lines).
p34:/usr/src/linux-3.5.1#

Rebooted, but we have a problem, my 60TB RAID-6 is now a 46TB RAID-6?
I am rebooting back into the 3.5.1+earlier reverted patch.

p34:~# uptime; uname -a ; df -h | grep /r1
 10:46:51 up 0 min,  1 user,  load average: 1.04, 0.27, 0.09
Linux p34 3.5.1 #5 SMP Fri Aug 17 10:24:22 EDT 2012 x86_64 GNU/Linux
/dev/sda1        46T  1.5T   45T   4% /r1
p34:~#

Justin.

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o - Aug. 17, 2012, 2:58 p.m.
I'm confused; I didn't see your original message.  Your complaint was
with my buggy patch at:

   http://www.spinics.net/lists/linux-ext4/msg33498.html

without the s/i/grp/ fix applied, right?

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Justin Piszcz - Aug. 17, 2012, 3 p.m.
-----Original Message-----
From: Justin Piszcz [mailto:jpiszcz@lucidpixels.com] 
Sent: Friday, August 17, 2012 10:48 AM
To: 'Theodore Ts'o'; 'Calvin Walton'; ap@solarrain.com
Cc: 'Andreas Dilger'; 'Javier Marcet'; 'Linux Ext4 Mailing List'
Subject: RE: Far too long mount time



-----Original Message-----
From: Theodore Ts'o [mailto:tytso@mit.edu] 
Sent: Friday, August 17, 2012 9:55 AM
To: Calvin Walton
Cc: Andreas Dilger; Javier Marcet; Linux Ext4 Mailing List;
jpiszcz@lucidpixels.com
Subject: Re: Far too long mount time

On Thu, Aug 16, 2012 at 02:53:37PM -0400, Theodore Ts'o wrote:
> This patch should solve the problem (as an alternative to reverting
> 8aeb00ff85a).

This should be "grp"; stupid typo on my part.

This patch with the fix-up will be pushed to Linus shortly and tagged
for the stable tree.

--

Hi Theo,

Found the patch here:
http://www.spinics.net/lists/linux-ext4/msg33498.html

Patched against the fresh+stable 3.5.1 tree:

p34:/usr/src/linux-3.5.1# patch -p1 < ../tso.patch
patching file fs/ext4/super.c
Hunk #1 succeeded at 3109 (offset -20 lines).
p34:/usr/src/linux-3.5.1#

Rebooted, but we have a problem, my 60TB RAID-6 is now a 46TB RAID-6?
I am rebooting back into the 3.5.1+earlier reverted patch.

p34:~# uptime; uname -a ; df -h | grep /r1
 10:46:51 up 0 min,  1 user,  load average: 1.04, 0.27, 0.09
Linux p34 3.5.1 #5 SMP Fri Aug 17 10:24:22 EDT 2012 x86_64 GNU/Linux
/dev/sda1        46T  1.5T   45T   4% /r1
p34:~#
--

Went back to old kernel (w/reverted patch, it is ok now)

p34:~# uname -a; uptime; df -h |grep /r1
Linux p34 3.5.1+ #3 SMP Sun Aug 12 10:31:34 EDT 2012 x86_64 GNU/Linux
 10:59:51 up 9 min,  1 user,  load average: 0.00, 0.20, 0.23
/dev/sda1        61T   16T   45T  27% /r1
p34:~#

Justin.



--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o - Aug. 17, 2012, 3:13 p.m.
On Fri, Aug 17, 2012 at 11:00:19AM -0400, Justin Piszcz wrote:
> 
> 
> -----Original Message-----
> From: Justin Piszcz [mailto:jpiszcz@lucidpixels.com] 
> Sent: Friday, August 17, 2012 10:48 AM
> To: 'Theodore Ts'o'; 'Calvin Walton'; ap@solarrain.com
> Cc: 'Andreas Dilger'; 'Javier Marcet'; 'Linux Ext4 Mailing List'
> Subject: RE: Far too long mount time
> 
> 
> 
> -----Original Message-----
> From: Theodore Ts'o [mailto:tytso@mit.edu] 
> Sent: Friday, August 17, 2012 9:55 AM
> To: Calvin Walton
> Cc: Andreas Dilger; Javier Marcet; Linux Ext4 Mailing List;
> jpiszcz@lucidpixels.com
> Subject: Re: Far too long mount time

I think there's something wrong with your mailer?  You just forwarded
your original message without any further additional text? 

Still confused,

     	      	      	      	  	  - Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o - Aug. 17, 2012, 3:27 p.m.
OK, I think what you were trying to say; that you've only gone back to
the original kernel and you have not tried the fixed patch.  Yes?

Sorry, all of the excess quoting made it very hard for me to parse
your reply....

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andreas Dilger - Aug. 17, 2012, 3:58 p.m.
On 2012-08-16, at 12:53, Theodore Ts'o <tytso@mit.edu> wrote:
> On Thu, Aug 16, 2012 at 10:42:19AM -0400, Theodore Ts'o wrote:
>> On Thu, Aug 16, 2012 at 10:16:48AM -0400, Calvin Walton wrote:
>>> On Thu, 2012-08-16 at 03:09 -0600, Andreas Dilger wrote:
>>> Is there any fix for this issue queued up for an upcoming stable
>>> release? It still reverts cleanly on 3.5.2.
>> 
>> There isn't a fix queued up yet, but there will be one soon....
> 
> This patch should solve the problem (as an alternative to reverting
> 8aeb00ff85a).

Why not save the newly calculated value for s_oberhead_blocks to disk after the first time it is calculated?  The only time it would need to be changed again is after a resize operation, which could be done by calling count_overhead() at the end again. 

Cheers, Andreas

> From dc43c7a8a6c266c31aa4f0408000c4d1b9f3c787 Mon Sep 17 00:00:00 2001
> From: Theodore Ts'o <tytso@mit.edu>
> Date: Thu, 16 Aug 2012 11:59:04 -0400
> Subject: [PATCH] ext4: fix long mount times on very big file systems
> 
> Commit 8aeb00ff85a: "ext4: fix overhead calculation used by
> ext4_statfs()" introduced a O(n**2) calculation which makes very large
> file systems take forever to mount.  Fix this with an optimization for
> non-bigalloc file systems.  (For bigalloc file systems the overhead
> needs to be set in the the superblock.)
> 
> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
> Cc: stable@vger.kernel.org
> ---
> fs/ext4/super.c | 4 ++++
> 1 file changed, 4 insertions(+)
> 
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 603023b..055c65b 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -3129,6 +3129,10 @@ static int count_overhead(struct super_block *sb, ext4_group_t grp,
>    ext4_group_t        i, ngroups = ext4_get_groups_count(sb);
>    int            s, j, count = 0;
> 
> +    if (!EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_BIGALLOC))
> +        return (ext4_bg_has_super(sb, grp) + ext4_bg_num_gdb(sb, i) +
> +            sbi->s_itb_per_group + 2);
> +
>    first_block = le32_to_cpu(sbi->s_es->s_first_data_block) +
>        (grp * EXT4_BLOCKS_PER_GROUP(sb));
>    last_block = first_block + EXT4_BLOCKS_PER_GROUP(sb) - 1;
> -- 
> 1.7.12.rc0.22.gcdd159b
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Justin Piszcz - Aug. 17, 2012, 7:18 p.m.
-----Original Message-----
From: Theodore Ts'o [mailto:tytso@mit.edu] 
Sent: Friday, August 17, 2012 11:28 AM
To: Justin Piszcz
Cc: 'Calvin Walton'; ap@solarrain.com; 'Andreas Dilger'; 'Javier Marcet';
'Linux Ext4 Mailing List'
Subject: Re: Far too long mount time

OK, I think what you were trying to say; that you've only gone back to
the original kernel and you have not tried the fixed patch.  Yes?

Sorry, all of the excess quoting made it very hard for me to parse
your reply....

						- Ted


Hi Theo,

I used the patch from this URL:
http://www.spinics.net/lists/linux-ext4/msg33498.html

The filesystem does mount quickly and normally BUT it shows up as 45TB and
not the 60TB it is normally.
I went back to the 3.5.1 kernel with 8aeb00ff85a reverted and all is back to
normal.

Justin.

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Sandeen - Aug. 17, 2012, 7:44 p.m.
On 8/17/12 2:18 PM, Justin Piszcz wrote:
>
>
> -----Original Message-----
> From: Theodore Ts'o [mailto:tytso@mit.edu]
> Sent: Friday, August 17, 2012 11:28 AM
> To: Justin Piszcz
> Cc: 'Calvin Walton'; ap@solarrain.com; 'Andreas Dilger'; 'Javier Marcet';
> 'Linux Ext4 Mailing List'
> Subject: Re: Far too long mount time
>
> OK, I think what you were trying to say; that you've only gone back to
> the original kernel and you have not tried the fixed patch.  Yes?
>
> Sorry, all of the excess quoting made it very hard for me to parse
> your reply....
>
> 						- Ted
>
>
> Hi Theo,
>
> I used the patch from this URL:
> http://www.spinics.net/lists/linux-ext4/msg33498.html
>
> The filesystem does mount quickly and normally BUT it shows up as 45TB and
> not the 60TB it is normally.
> I went back to the 3.5.1 kernel with 8aeb00ff85a reverted and all is back to
> normal.

Didn't this all start w/ df reporting issues?  Seems like we need an xfs 
test explicitly for extN, with and without minixdf ....

-Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 603023b..055c65b 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -3129,6 +3129,10 @@  static int count_overhead(struct super_block *sb, ext4_group_t grp,
 	ext4_group_t		i, ngroups = ext4_get_groups_count(sb);
 	int			s, j, count = 0;
 
+	if (!EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_BIGALLOC))
+		return (ext4_bg_has_super(sb, grp) + ext4_bg_num_gdb(sb, i) +
+			sbi->s_itb_per_group + 2);
+
 	first_block = le32_to_cpu(sbi->s_es->s_first_data_block) +
 		(grp * EXT4_BLOCKS_PER_GROUP(sb));
 	last_block = first_block + EXT4_BLOCKS_PER_GROUP(sb) - 1;