ext4: Automatic setting of {INODE,BLOCK}_UNINIT flags

Message ID 1351149943-4827-1-git-send-email-tracek@redhat.com
State Superseded, archived
Commit Message

Tomas Racek Oct. 25, 2012, 7:25 a.m. UTC
When the last inode in a block group is freed, set the INODE_UNINIT flag;
similarly, when the last block is freed, set the BLOCK_UNINIT flag. This
can speed up a subsequent fsck run.

Signed-off-by: Tomas Racek <tracek@redhat.com>
---
 fs/ext4/ialloc.c  | 4 ++++
 fs/ext4/mballoc.c | 4 ++++
 2 files changed, 8 insertions(+)

Comments

Yongqiang Yang Oct. 25, 2012, 7:44 a.m. UTC | #1
Does it make sense in no journal mode?

Thanks,
Yongqiang.

On Thu, Oct 25, 2012 at 3:25 PM, Tomas Racek <tracek@redhat.com> wrote:
> When last inode from bg is freed, set the INODE_UNINIT flag, similarly
> when last block is freed, set BLOCK_UNINIT flag. This can speed up
> subsequent fsck run.
>
> Signed-off-by: Tomas Racek <tracek@redhat.com>
> ---
>  fs/ext4/ialloc.c  | 4 ++++
>  fs/ext4/mballoc.c | 4 ++++
>  2 files changed, 8 insertions(+)
>
> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
> index 4facdd2..6e4b982 100644
> --- a/fs/ext4/ialloc.c
> +++ b/fs/ext4/ialloc.c
> @@ -289,6 +289,10 @@ void ext4_free_inode(handle_t *handle, struct inode *inode)
>
>         count = ext4_free_inodes_count(sb, gdp) + 1;
>         ext4_free_inodes_set(sb, gdp, count);
> +
> +       if (count == EXT4_INODES_PER_GROUP(sb))
> +               gdp->bg_flags |= cpu_to_le16(EXT4_BG_INODE_UNINIT);
> +
>         if (is_directory) {
>                 count = ext4_used_dirs_count(sb, gdp) - 1;
>                 ext4_used_dirs_set(sb, gdp, count);
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 526e553..28bce35 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -4665,6 +4665,10 @@ do_more:
>
>         ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
>         ext4_free_group_clusters_set(sb, gdp, ret);
> +
> +       if(ret == EXT4_BLOCKS_PER_GROUP(sb))
> +               gdp->bg_flags |= cpu_to_le16(EXT4_BG_BLOCK_UNINIT);
> +
>         ext4_block_bitmap_csum_set(sb, block_group, gdp, bitmap_bh);
>         ext4_group_desc_csum_set(sb, block_group, gdp);
>         ext4_unlock_group(sb, block_group);
> --
> 1.7.11.7
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lukas Czerner Oct. 25, 2012, 9:44 a.m. UTC | #2
On Thu, 25 Oct 2012, Yongqiang Yang wrote:

> Date: Thu, 25 Oct 2012 15:44:48 +0800
> From: Yongqiang Yang <xiaoqiangnk@gmail.com>
> To: Tomas Racek <tracek@redhat.com>
> Cc: linux-ext4@vger.kernel.org, lczerner@redhat.com
> Subject: Re: [PATCH] ext4: Automatic setting of {INODE,BLOCK}_UNINIT flags
> 
> Does it make sense in no journal mode?

I am not sure why it would not. It only changes the block group
descriptors, marking the block/inode table as uninitialized so we can
skip them in fsck.

Originally, these were marked uninitialized only when the file system
was created; once a particular inode table or block group bitmap was
used, it stayed initialized forever, even if it had been completely
freed since then. This patch fixes that, so a group can become
uninitialized again (granted, the flag name is not the best).

The advantages are that we can skip such groups in fsck, and also that
we might save a read from disk, because uninitialized bitmaps are
initialized in memory rather than read from the disk.
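
The "initialized in memory rather than read from the disk" point can be
sketched in user space. This is only a toy model, not the kernel code:
the struct, the function name and the bitmap size are made up for
illustration, and only the flag value matches EXT4_BG_BLOCK_UNINIT.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_BG_BLOCK_UNINIT 0x0002 /* same value as EXT4_BG_BLOCK_UNINIT */
#define TOY_BITMAP_BYTES    16     /* hypothetical, tiny bitmap size */

/* Hypothetical stand-in for a block group; not a kernel structure. */
struct toy_group {
	uint16_t flags;                /* host byte order in this toy */
	const uint8_t *on_disk_bitmap; /* plays the role of the disk */
	unsigned int disk_reads;       /* how many "reads" were issued */
};

/*
 * Fill 'out' with the group's block bitmap.
 * Returns 1 if the bitmap was "read from disk", 0 if it was synthesized.
 * Simplification: an uninit group yields an all-free bitmap here. The
 * real kernel also marks the group's own metadata blocks as in use.
 */
int toy_load_block_bitmap(struct toy_group *g, uint8_t out[TOY_BITMAP_BYTES])
{
	if (g->flags & TOY_BG_BLOCK_UNINIT) {
		memset(out, 0, TOY_BITMAP_BYTES);
		return 0;
	}
	memcpy(out, g->on_disk_bitmap, TOY_BITMAP_BYTES);
	g->disk_reads++;
	return 1;
}
```

With the flag set, the caller gets a usable bitmap and disk_reads stays
at zero; clearing the flag forces the copy from the "disk" buffer.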

So my question is: why do you think this might not make sense in
no-journal mode? Maybe I am missing something.

Thanks!
-Lukas

Yongqiang Yang Oct. 25, 2012, 11:39 a.m. UTC | #3
>
> So my question is, why do you think this might not make sense in no
> journal mode ? Maybe I am missing something.
Yep, the advantage is obvious. In no-journal mode, if we delete a file
whose inode is the last one in a block group, and the uninit flag for the
inode bitmap is flushed to disk but the directory referring to the inode
is not, I don't know how fsck currently handles the situation. If fsck
handles it, everything is OK. I meant that maybe we should check fsck
too.

Yongqiang.
Zheng Liu Oct. 25, 2012, 12:54 p.m. UTC | #4
On Thu, Oct 25, 2012 at 07:39:06PM +0800, Yongqiang Yang wrote:
> >
> > So my question is, why do you think this might not make sense in no
> > journal mode ? Maybe I am missing something.
> Yep, advantage is obvious, in no journal mode, if we delete a file
> which is the last inode in a block group, and the uninit flag of inode
> bitmap is flushed to disk and directory referring the inode is not
> flushed,  I don't know how fsck handles the situation currently.  If
> fsck handles the situation, everything is ok. I meant maybe we should
> check fsck too.

Hi Yongqiang,

It seems that this cannot happen, in either no-journal or journal mode.
When a file is deleted, the dir entry is updated first, and only then is
the block freed. So by the time the last inode is freed, the dir entry
must already have been flushed to the disk. Am I missing something?

Regards,
Zheng
Lukas Czerner Oct. 25, 2012, 1:59 p.m. UTC | #5
On Thu, 25 Oct 2012, Zheng Liu wrote:

> Date: Thu, 25 Oct 2012 20:54:09 +0800
> From: Zheng Liu <gnehzuil.liu@gmail.com>
> To: Yongqiang Yang <xiaoqiangnk@gmail.com>
> Cc: Lukáš Czerner <lczerner@redhat.com>, Tomas Racek <tracek@redhat.com>,
>     linux-ext4@vger.kernel.org
> Subject: Re: [PATCH] ext4: Automatic setting of {INODE,BLOCK}_UNINIT flags
> 
> On Thu, Oct 25, 2012 at 07:39:06PM +0800, Yongqiang Yang wrote:
> > >
> > > So my question is, why do you think this might not make sense in no
> > > journal mode ? Maybe I am missing something.
> > Yep, advantage is obvious, in no journal mode, if we delete a file
> > which is the last inode in a block group, and the uninit flag of inode
> > bitmap is flushed to disk and directory referring the inode is not
> > flushed,  I don't know how fsck handles the situation currently.  If
> > fsck handles the situation, everything is ok. I meant maybe we should
> > check fsck too.
> 
> Hi Yongqiang,
> 
> It seems that it couldn't happen whether it is in no journal mode or
> journal mode.  When a file is deleted, the dir entry will be updated
> firstly, and then the block will be freed.  So the block is freed after
> the dir entry is updated.  So when the last inode is freed, the dir
> entry must be flushed to the disk.  Am I missing something?

I think you're right. Doing this the other way around would be a bug
regardless of this patch.

Thanks!
-Lukas

Andreas Dilger Oct. 25, 2012, 2:37 p.m. UTC | #6
On 2012-10-25, at 0:44, Yongqiang Yang <xiaoqiangnk@gmail.com> wrote:
> Does it make sense in no journal mode?

This is even more important in nojournal mode, since such a file system will need an e2fsck run after every boot.

Cheers, Andreas
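
To make the fsck saving concrete, here is a small sketch of a scan that
skips groups flagged INODE_UNINIT. It is not e2fsck code: the struct and
function names are hypothetical, and bg_flags is kept in host byte order
for simplicity (on disk it is little-endian). Only the flag value
matches EXT4_BG_INODE_UNINIT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_BG_INODE_UNINIT 0x0001 /* same value as EXT4_BG_INODE_UNINIT */

/* Hypothetical, stripped-down group descriptor. */
struct toy_group_desc {
	uint16_t bg_flags; /* host byte order in this toy */
};

/*
 * Count the inode tables a full scan would still have to read:
 * every group whose INODE_UNINIT flag is clear.
 */
size_t toy_inode_tables_to_scan(const struct toy_group_desc *gd,
				size_t ngroups)
{
	size_t n = 0;

	for (size_t i = 0; i < ngroups; i++) {
		if (!(gd[i].bg_flags & TOY_BG_INODE_UNINIT))
			n++;
	}
	return n;
}
```

Each group that regains the flag after being emptied is one inode table
fewer to read on the next check.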

Theodore Ts'o Nov. 8, 2012, 5:48 p.m. UTC | #7
On Thu, Oct 25, 2012 at 09:25:43AM +0200, Tomas Racek wrote:
> When last inode from bg is freed, set the INODE_UNINIT flag, similarly
> when last block is freed, set BLOCK_UNINIT flag. This can speed up
> subsequent fsck run.
> 
> Signed-off-by: Tomas Racek <tracek@redhat.com>

Could you make the following enhancements to your patch?

1)  Only do this if ext4_has_group_desc_csum(sb) is true

2)  Check to make sure the inode bitmap has only one bit set (the inode
    to be freed).  Basically, I don't want to set the BLOCK/INODE_UNINIT
    based just on the block group descriptor count, in case it got corrupted
     --- what, me paranoid?

    If there are other inodes set, then we need to call ext4_error()
    since we know the file system has gotten corrupted.

3)  If we can set BLOCK/INODE_UNINIT, we can skip modifying the bitmap
    block (since we won't consult the bitmap block if uninit is set).
    So that means we can skip calling get_write_access(), modifying
    the bitmap, updating the checksum, and then calling
    handle_dirty_metadata.  In fact, we can call ext4_forget(), so we
    can drop it from the buffer cache, and if it had been modified during
    the current transaction --- as part of an rm -rf, for example ---
    we don't need to include the bitmap block in the transaction.

(2) makes this patch much safer, and (3) should more than make up for
the overhead of scanning the bitmap.
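
Points (1) and (2) above could look roughly like the following
user-space sketch. It is an illustration under assumptions, not a
proposed kernel change: all names are hypothetical, the bitmap uses the
usual little-endian bit order (bit i lives in byte i / 8), and the
TOY_CORRUPT result marks the place where the kernel would call
ext4_error(). Point (3), skipping the bitmap update, is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum toy_result {
	TOY_KEEP_INIT,  /* leave the group flags alone */
	TOY_SET_UNINIT, /* safe to set INODE_UNINIT */
	TOY_CORRUPT     /* counter and bitmap disagree */
};

/* True if 'bit' is the one and only bit set in the first 'nbits' bits. */
bool toy_only_bit_set(const uint8_t *bitmap, size_t nbits, size_t bit)
{
	for (size_t i = 0; i < nbits; i++) {
		bool set = (bitmap[i / 8] >> (i % 8)) & 1;

		if (set != (i == bit))
			return false;
	}
	return true;
}

/*
 * Decide what to do when an inode is being freed.
 * free_after is the descriptor's free count after the increment;
 * the bitmap still has the bit of the inode being freed set.
 */
enum toy_result toy_check_last_inode(bool has_group_desc_csum,
				     uint32_t free_after,
				     uint32_t inodes_per_group,
				     const uint8_t *bitmap, size_t bit)
{
	if (!has_group_desc_csum)           /* point (1) */
		return TOY_KEEP_INIT;
	if (free_after != inodes_per_group) /* not the last inode */
		return TOY_KEEP_INIT;
	if (!toy_only_bit_set(bitmap, inodes_per_group, bit))
		return TOY_CORRUPT;         /* point (2) */
	return TOY_SET_UNINIT;
}
```

The bitmap scan is linear in the number of inodes per group, which is
the overhead that (3) is expected to pay back.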

Thanks!!!

						- Ted

P.S.  Although the block bitmap is metadata, you can pass 0 for
is_metadata, since we don't need to revoke the block --- that's only
needed for extent tree blocks, indirect blocks, or directory blocks,
which could potentially get reused as a data block.  This isn't an
issue for the bitmap blocks, so we don't need to include a revoke
record in the journal.


Patch

diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 4facdd2..6e4b982 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -289,6 +289,10 @@  void ext4_free_inode(handle_t *handle, struct inode *inode)
 
 	count = ext4_free_inodes_count(sb, gdp) + 1;
 	ext4_free_inodes_set(sb, gdp, count);
+
+	if (count == EXT4_INODES_PER_GROUP(sb))
+		gdp->bg_flags |= cpu_to_le16(EXT4_BG_INODE_UNINIT);
+
 	if (is_directory) {
 		count = ext4_used_dirs_count(sb, gdp) - 1;
 		ext4_used_dirs_set(sb, gdp, count);
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 526e553..28bce35 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4665,6 +4665,10 @@  do_more:
 
 	ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
 	ext4_free_group_clusters_set(sb, gdp, ret);
+
+	if (ret == EXT4_BLOCKS_PER_GROUP(sb))
+		gdp->bg_flags |= cpu_to_le16(EXT4_BG_BLOCK_UNINIT);
+
 	ext4_block_bitmap_csum_set(sb, block_group, gdp, bitmap_bh);
 	ext4_group_desc_csum_set(sb, block_group, gdp);
 	ext4_unlock_group(sb, block_group);