diff mbox

[1/3] ext4: remove block bitmap initialization in ext4_new_inode()

Message ID 1326490249-1685-2-git-send-email-tytso@mit.edu
State Rejected, archived
Headers show

Commit Message

Theodore Ts'o Jan. 13, 2012, 9:30 p.m. UTC
We don't need to initialize the block bitmap when we allocate a new
inode.  This is old code from the very early days that is just
confusing things, and also has the problem of modifying the block
group descriptor without obeying the ext4_journal_get_write_access() /
ext4_handle_dirty_metadata() modification protocols.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---
 fs/ext4/ialloc.c |   31 -------------------------------
 1 files changed, 0 insertions(+), 31 deletions(-)

Comments

Andreas Dilger Jan. 14, 2012, 5:02 a.m. UTC | #1
On 2012-01-13, at 2:30 PM, Theodore Ts'o wrote:
> We don't need to initialize the block bitmap when we allocate a new
> inode.

The reason the block bitmap is initialized when an inode is allocated
in that group is to indicate that the inode bitmap is in use, and the
inode table blocks are also in use.

> This is old code from the very early days that is just
> confusing things, and also has the problem of modifying the block
> group descriptor without obeying the ext4_journal_get_write_access() /
> ext4_handle_dirty_metadata() modification protocols.

The group descriptor was already modified earlier on when the inode was
being allocated from the group, so the group descriptor block was already
accounted for in the transaction credits after "repeat_in_this_group:"

repeat_in_this_group:
                ino = ext4_find_next_zero_bit((unsigned long *)
                                              inode_bitmap_bh->b_data,
                                              EXT4_INODES_PER_GROUP(sb), ino);

                if (ino < EXT4_INODES_PER_GROUP(sb)) {

                        BUFFER_TRACE(inode_bitmap_bh, "get_write_access");
                        err = ext4_journal_get_write_access(handle,
                                                            inode_bitmap_bh);
                        if (err)
                                goto fail;

                        BUFFER_TRACE(group_desc_bh, "get_write_access");
*****>>>>>              err = ext4_journal_get_write_access(handle,
                                                            group_desc_bh);

That is why ext4_handle_dirty_metadata() isn't called until after the group
descriptor is modified during the block bitmap initialization.


I'm not wholly against removing this code, but we have to do it with the
clear understanding that we will have inodes in use for which the block
bitmap is showing that the in-use blocks are free.  This doesn't seem like
a great situation to me.


> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
> ---
> fs/ext4/ialloc.c |   31 -------------------------------
> 1 files changed, 0 insertions(+), 31 deletions(-)
> 
> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
> index 72fc989..a4ce10f 100644
> --- a/fs/ext4/ialloc.c
> +++ b/fs/ext4/ialloc.c
> @@ -807,37 +807,6 @@ repeat_in_this_group:
> 	goto out;
> 
> got:
> -	/* We may have to initialize the block bitmap if it isn't already */
> -	if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM) &&
> -	    gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
> -		struct buffer_head *block_bitmap_bh;
> -
> -		block_bitmap_bh = ext4_read_block_bitmap(sb, group);
> -		BUFFER_TRACE(block_bitmap_bh, "get block bitmap access");
> -		err = ext4_journal_get_write_access(handle, block_bitmap_bh);
> -		if (err) {
> -			brelse(block_bitmap_bh);
> -			goto fail;
> -		}
> -
> -		BUFFER_TRACE(block_bitmap_bh, "dirty block bitmap");
> -		err = ext4_handle_dirty_metadata(handle, NULL, block_bitmap_bh);
> -		brelse(block_bitmap_bh);
> -
> -		/* recheck and clear flag under lock if we still need to */
> -		ext4_lock_group(sb, group);
> -		if (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
> -			gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
> -			ext4_free_group_clusters_set(sb, gdp,
> -				ext4_free_clusters_after_init(sb, group, gdp));
> -			gdp->bg_checksum = ext4_group_desc_csum(sbi, group,
> -								gdp);
> -		}
> -		ext4_unlock_group(sb, group);
> -
> -		if (err)
> -			goto fail;
> -	}
> 	BUFFER_TRACE(group_desc_bh, "call ext4_handle_dirty_metadata");
> 	err = ext4_handle_dirty_metadata(handle, NULL, group_desc_bh);
> 	if (err)
> -- 
> 1.7.8.11.gefc1f.dirty
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Cheers, Andreas





--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Amir Goldstein Jan. 14, 2012, 6:41 p.m. UTC | #2
On Sat, Jan 14, 2012 at 7:02 AM, Andreas Dilger <adilger@dilger.ca> wrote:
>
> On 2012-01-13, at 2:30 PM, Theodore Ts'o wrote:
> > We don't need to initialize the block bitmap when we allocate a new
> > inode.
>
> The reason the block bitmap is initialized when an inode is allocated
> in that group is to indicate that the inode bitmap is in use, and the
> inode table blocks are also in use.

but the inode table blocks will not be from this block group when flex_bg
is used and if it's the first group of a flex_bg group, then it's bitmaps
are already initialized by mkfs/resize.

>
> > This is old code from the very early days that is just
> > confusing things, and also has the problem of modifying the block
> > group descriptor without obeying the ext4_journal_get_write_access() /
> > ext4_handle_dirty_metadata() modification protocols.
>
> The group descriptor was already modified earlier on when the inode was
> being allocated from the group, so the group descriptor block was already
> accounted for in the transaction credits after "repeat_in_this_group:"
>
> repeat_in_this_group:
>                ino = ext4_find_next_zero_bit((unsigned long *)
>                                              inode_bitmap_bh->b_data,
>                                              EXT4_INODES_PER_GROUP(sb), ino);
>
>                if (ino < EXT4_INODES_PER_GROUP(sb)) {
>
>                        BUFFER_TRACE(inode_bitmap_bh, "get_write_access");
>                        err = ext4_journal_get_write_access(handle,
>                                                            inode_bitmap_bh);
>                        if (err)
>                                goto fail;
>
>                        BUFFER_TRACE(group_desc_bh, "get_write_access");
> *****>>>>>              err = ext4_journal_get_write_access(handle,
>                                                            group_desc_bh);
>
> That is why ext4_handle_dirty_metadata() isn't called until after the group
> descriptor is modified during the block bitmap initialization.
>
>
> I'm not wholly against removing this code, but we have to do it with the
> clear understanding that we will have inodes in use for which the block
> bitmap is showing that the in-use blocks are free.  This doesn't seem like
> a great situation to me.
>
>
> > Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
> > ---
> > fs/ext4/ialloc.c |   31 -------------------------------
> > 1 files changed, 0 insertions(+), 31 deletions(-)
> >
> > diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
> > index 72fc989..a4ce10f 100644
> > --- a/fs/ext4/ialloc.c
> > +++ b/fs/ext4/ialloc.c
> > @@ -807,37 +807,6 @@ repeat_in_this_group:
> >       goto out;
> >
> > got:
> > -     /* We may have to initialize the block bitmap if it isn't already */
> > -     if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM) &&
> > -         gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
> > -             struct buffer_head *block_bitmap_bh;
> > -
> > -             block_bitmap_bh = ext4_read_block_bitmap(sb, group);
> > -             BUFFER_TRACE(block_bitmap_bh, "get block bitmap access");
> > -             err = ext4_journal_get_write_access(handle, block_bitmap_bh);
> > -             if (err) {
> > -                     brelse(block_bitmap_bh);
> > -                     goto fail;
> > -             }
> > -
> > -             BUFFER_TRACE(block_bitmap_bh, "dirty block bitmap");
> > -             err = ext4_handle_dirty_metadata(handle, NULL, block_bitmap_bh);
> > -             brelse(block_bitmap_bh);
> > -
> > -             /* recheck and clear flag under lock if we still need to */
> > -             ext4_lock_group(sb, group);
> > -             if (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
> > -                     gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
> > -                     ext4_free_group_clusters_set(sb, gdp,
> > -                             ext4_free_clusters_after_init(sb, group, gdp));
> > -                     gdp->bg_checksum = ext4_group_desc_csum(sbi, group,
> > -                                                             gdp);
> > -             }
> > -             ext4_unlock_group(sb, group);
> > -
> > -             if (err)
> > -                     goto fail;
> > -     }
> >       BUFFER_TRACE(group_desc_bh, "call ext4_handle_dirty_metadata");
> >       err = ext4_handle_dirty_metadata(handle, NULL, group_desc_bh);
> >       if (err)
> > --
> > 1.7.8.11.gefc1f.dirty
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
> Cheers, Andreas
>
>
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andreas Dilger Jan. 15, 2012, 5:25 p.m. UTC | #3
On 2012-01-14, at 11:41, Amir Goldstein <amir73il@gmail.com> wrote:

> On Sat, Jan 14, 2012 at 7:02 AM, Andreas Dilger <adilger@dilger.ca> wrote:
>> 
>> On 2012-01-13, at 2:30 PM, Theodore Ts'o wrote:
>>> We don't need to initialize the block bitmap when we allocate a new
>>> inode.
>> 
>> The reason the block bitmap is initialized when an inode is allocated
>> in that group is to indicate that the inode bitmap is in use, and the
>> inode table blocks are also in use.
> 
> but the inode table blocks will not be from this block group when flex_bg
> is used and if it's the first group of a flex_bg group, then it's bitmaps
> are already initialized by mkfs/resize.

In that case the code could detect that  the bitmap covering the inode bitmap and inode table is already initialized and not do anything. 

>>> This is old code from the very early days that is just
>>> confusing things, and also has the problem of modifying the block
>>> group descriptor without obeying the ext4_journal_get_write_access() /
>>> ext4_handle_dirty_metadata() modification protocols.
>> 
>> The group descriptor was already modified earlier on when the inode was
>> being allocated from the group, so the group descriptor block was already
>> accounted for in the transaction credits after "repeat_in_this_group:"
>> 
>> repeat_in_this_group:
>>                ino = ext4_find_next_zero_bit((unsigned long *)
>>                                              inode_bitmap_bh->b_data,
>>                                              EXT4_INODES_PER_GROUP(sb), ino);
>> 
>>                if (ino < EXT4_INODES_PER_GROUP(sb)) {
>> 
>>                        BUFFER_TRACE(inode_bitmap_bh, "get_write_access");
>>                        err = ext4_journal_get_write_access(handle,
>>                                                            inode_bitmap_bh);
>>                        if (err)
>>                                goto fail;
>> 
>>                        BUFFER_TRACE(group_desc_bh, "get_write_access");
>> *****>>>>>              err = ext4_journal_get_write_access(handle,
>>                                                            group_desc_bh);
>> 
>> That is why ext4_handle_dirty_metadata() isn't called until after the group
>> descriptor is modified during the block bitmap initialization.
>> 
>> 
>> I'm not wholly against removing this code, but we have to do it with the
>> clear understanding that we will have inodes in use for which the block
>> bitmap is showing that the in-use blocks are free.  This doesn't seem like
>> a great situation to me.
>> 
>> 
>>> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
>>> ---
>>> fs/ext4/ialloc.c |   31 -------------------------------
>>> 1 files changed, 0 insertions(+), 31 deletions(-)
>>> 
>>> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
>>> index 72fc989..a4ce10f 100644
>>> --- a/fs/ext4/ialloc.c
>>> +++ b/fs/ext4/ialloc.c
>>> @@ -807,37 +807,6 @@ repeat_in_this_group:
>>>       goto out;
>>> 
>>> got:
>>> -     /* We may have to initialize the block bitmap if it isn't already */
>>> -     if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM) &&
>>> -         gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
>>> -             struct buffer_head *block_bitmap_bh;
>>> -
>>> -             block_bitmap_bh = ext4_read_block_bitmap(sb, group);
>>> -             BUFFER_TRACE(block_bitmap_bh, "get block bitmap access");
>>> -             err = ext4_journal_get_write_access(handle, block_bitmap_bh);
>>> -             if (err) {
>>> -                     brelse(block_bitmap_bh);
>>> -                     goto fail;
>>> -             }
>>> -
>>> -             BUFFER_TRACE(block_bitmap_bh, "dirty block bitmap");
>>> -             err = ext4_handle_dirty_metadata(handle, NULL, block_bitmap_bh);
>>> -             brelse(block_bitmap_bh);
>>> -
>>> -             /* recheck and clear flag under lock if we still need to */
>>> -             ext4_lock_group(sb, group);
>>> -             if (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
>>> -                     gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
>>> -                     ext4_free_group_clusters_set(sb, gdp,
>>> -                             ext4_free_clusters_after_init(sb, group, gdp));
>>> -                     gdp->bg_checksum = ext4_group_desc_csum(sbi, group,
>>> -                                                             gdp);
>>> -             }
>>> -             ext4_unlock_group(sb, group);
>>> -
>>> -             if (err)
>>> -                     goto fail;
>>> -     }
>>>       BUFFER_TRACE(group_desc_bh, "call ext4_handle_dirty_metadata");
>>>       err = ext4_handle_dirty_metadata(handle, NULL, group_desc_bh);
>>>       if (err)
>>> --
>>> 1.7.8.11.gefc1f.dirty
>>> 
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> 
>> 
>> Cheers, Andreas
>> 
>> 
>> 
>> 
>> 
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o Jan. 16, 2012, 3:52 p.m. UTC | #4
On Fri, Jan 13, 2012 at 10:02:46PM -0700, Andreas Dilger wrote:
> On 2012-01-13, at 2:30 PM, Theodore Ts'o wrote:
> > This is old code from the very early days that is just
> > confusing things, and also has the problem of modifying the block
> > group descriptor without obeying the ext4_journal_get_write_access() /
> > ext4_handle_dirty_metadata() modification protocols.
> 
> The group descriptor was already modified earlier on when the inode was
> being allocated from the group, so the group descriptor block was already
> accounted for in the transaction credits after "repeat_in_this_group:"

That's true, but we also have to modify the block bitmap block itself;
and that's a journal credit which is currently not accounted for.
Fortunately we're over conservative enough in how we calculate the
number of journal credits necessary that in practice we don't BUG_ON
due to running out ouf journal credits.

> I'm not wholly against removing this code, but we have to do it with the
> clear understanding that we will have inodes in use for which the block
> bitmap is showing that the in-use blocks are free.  This doesn't seem like
> a great situation to me.

We have backup block groups descriptors and (in the case of meta_bg,
which is needed for 64-bit resize) primary block group descriptors
which are also only accounted for by the fields in the block group
descriptors.

We can't allocate blocks in that block group without initializing the
block bitmap, and ext4_init_block_bitmap() will reserve the
appropriate fields; furthermore, if the block group checksum is
incorrect, it will lock out the block group by marking all blocks as
used and setting the number of free inodes and blocks to zero.

So it should be a safe thing to do.

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 72fc989..a4ce10f 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -807,37 +807,6 @@  repeat_in_this_group:
 	goto out;
 
 got:
-	/* We may have to initialize the block bitmap if it isn't already */
-	if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM) &&
-	    gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
-		struct buffer_head *block_bitmap_bh;
-
-		block_bitmap_bh = ext4_read_block_bitmap(sb, group);
-		BUFFER_TRACE(block_bitmap_bh, "get block bitmap access");
-		err = ext4_journal_get_write_access(handle, block_bitmap_bh);
-		if (err) {
-			brelse(block_bitmap_bh);
-			goto fail;
-		}
-
-		BUFFER_TRACE(block_bitmap_bh, "dirty block bitmap");
-		err = ext4_handle_dirty_metadata(handle, NULL, block_bitmap_bh);
-		brelse(block_bitmap_bh);
-
-		/* recheck and clear flag under lock if we still need to */
-		ext4_lock_group(sb, group);
-		if (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
-			gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
-			ext4_free_group_clusters_set(sb, gdp,
-				ext4_free_clusters_after_init(sb, group, gdp));
-			gdp->bg_checksum = ext4_group_desc_csum(sbi, group,
-								gdp);
-		}
-		ext4_unlock_group(sb, group);
-
-		if (err)
-			goto fail;
-	}
 	BUFFER_TRACE(group_desc_bh, "call ext4_handle_dirty_metadata");
 	err = ext4_handle_dirty_metadata(handle, NULL, group_desc_bh);
 	if (err)