diff mbox

Add largedir feature

Message ID 2C3E39F6-8143-415F-B9D5-F3363BDEBC45@dilger.ca
State Superseded, archived
Headers show

Commit Message

Andreas Dilger May 1, 2017, 11:39 p.m. UTC
On May 1, 2017, at 12:58 PM, Eric Biggers <ebiggers3@gmail.com> wrote:
> 
> On Sat, Apr 29, 2017 at 08:59:53PM -0400, Theodore Ts'o wrote:
>> On Thu, Mar 16, 2017 at 05:51:17AM -0400, Artem Blagodarenko wrote:
>>> From: Artem Blagodarenko <artem.blagodarenko@gmail.com>
>>> 
>>> This INCOMPAT_LARGEDIR feature allows larger directories to be created
>>> in ldiskfs, both with directory sizes over 2GB and and a maximum htree
>>> depth of 3 instead of the current limit of 2. These features are needed
>>> in order to exceed the current limit of approximately 10M entries in a
>>> single directory.
>>> 
>>> Signed-off-by: Yang Sheng <yang.sheng@intel.com>
>>> Signed-off-by: Artem Blagodarenko <artem.blagodarenko@seagate.com>
>> 
>> Thanks, applied.
>> 
>> 					- Ted
> 
> FYI, this patch is causing a problem when creating many files in a directory
> (without the largedir feature enabled).  I haven't looked into it but maybe it
> will ring a bell for someone.

Hmm, is this also a problem without the patch, when creating large numbers
of entries in a directory?

It looks like the directory index is clobbering the directory index block
checksum, which is stored in struct ext2_dx_tail at the end of each index
block.  One possibility is that the patch is miscalculating the maximum
number of entries in the index blocks when the metadata_csum feature is
enabled?  I don't think that feature is enabled by default with e2fsprogs
yet, so it is entirely possible that these two features aren't quite playing
nice together in the sandbox.

That said, I looked through the patch again with this in mind, and I don't
see anything related to the dx_limit.  The dx_node_limit() function correctly
takes the metadata_csum feature into account when calculating the limit of
the htree entries in interior index blocks, so it isn't clear where this
checksum error is coming from.

It might be useful to print out the checksum values, just in case e.g. this
is being checked on a block that was just zeroed out?

Alternately, it might be that we are not initializing the checksum properly
in the first place?  Looking at it from this angle, I see what could be a
problem.  Can you try out the following (untested) patch (also attached in
case of mangling)?  I'm not totally sure it is correct, since the change from
ext4_handle_dirty_metadata() to ext4_handle_dirty_dx_block() could potentially
be ext4_handle_dirty_dirent_block(), but I _think_ the current change is right.

There is also a separate question of whether we need to dirty ALL of the buffers
in this code, compared to the pre-patch changes which had fewer calls to dirty
buffers, but at least this patch should fix the checksum errors.

Cheers, Andreas
----


> seq -f "abcdefghijklmnopqrstuvwxyz012345%.0f" 100000 | xargs touch
> 
> [   42.726480] EXT4-fs error (device vdc): dx_probe:840: inode #2: block 508: comm touch: Directory index failed checksum
> [   42.729472] EXT4-fs error (device vdc): dx_probe:840: inode #2: block 508: comm touch: Directory index failed checksum
> [   42.731689] EXT4-fs error (device vdc): dx_probe:840: inode #2: block 508: comm touch: Directory index failed checksum
> [   42.734303] EXT4-fs error (device vdc): dx_probe:840: inode #2: block 508: comm touch: Directory index failed checksum
> [   42.736383] EXT4-fs error (device vdc): dx_probe:840: inode #2: block 508: comm touch: Directory index failed checksum
> [   42.739133] EXT4-fs error (device vdc): dx_probe:840: inode #2: block 508: comm touch: Directory index failed checksum
> [   42.741307] EXT4-fs error (device vdc): dx_probe:840: inode #2: block 508: comm touch: Directory index failed checksum
> [   42.743963] EXT4-fs error (device vdc): dx_probe:840: inode #2: block 508: comm touch: Directory index failed checksum
> [   42.745998] EXT4-fs error (device vdc): dx_probe:840: inode #2: block 508: comm touch: Directory index failed checksum
> ...


Cheers, Andreas

Comments

Theodore Ts'o May 2, 2017, 2:44 a.m. UTC | #1
On Mon, May 01, 2017 at 05:39:41PM -0600, Andreas Dilger wrote:
> > FYI, this patch is causing a problem when creating many files in a directory
> > (without the largedir feature enabled).  I haven't looked into it but maybe it
> > will ring a bell for someone.
> 
> Hmm, is this also a problem without the patch, when creating large numbers
> of entries in a directory?

No, it works fine without the patch.  Turnkey reproduction for
kvm-xfstests which works fine without the largedir patch, but which
fails with the largedir patch:

   mke2fs -t ext4 -Fq /dev/vdc
   mount /vdc
   mkdir /vdc/a
   cd /vdc/a
   seq -f "abcdefghijklmnopqrstuvwxyz012345%.0f" 100000 | xargs touch
   umount /vdc
   e2fsck -fy /dev/vdc

The output of e2fsck:

e2fsck 1.43.5-WIP-ed1e950f (26-Apr-2017)
Pass 1: Checking inodes, blocks, and sizes
Inode 131073, i_size is 2080768, should be 2084864.  Fix? yes

Pass 2: Checking directory structure
Problem in HTREE directory inode 131073: root node fails checksum.
Clear HTree index? yes

Pass 3: Checking directory connectivity
Pass 3A: Optimizing directories
Pass 4: Checking reference counts
Pass 5: Checking group summary information

/dev/vdc: ***** FILE SYSTEM WAS MODIFIED *****
/dev/vdc: 30705/327680 files (0.0% non-contiguous), 42515/1310720 blocks

It looks the large_dir patch is responsible for a test regression ---
it is causing generic/339 to fail.

I'm going to drop the large dir patch for the 4.12 merge window.
Eric, thanks for noticing and pointing this out!

						- Ted
diff mbox

Patch

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index bd189c04..f0e8a6c 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2309,21 +2309,23 @@  static int ext4_dx_add_entry(handle_t *handle,
 			dx_insert_block((frame - 1), hash2, newblock);
 			dxtrace(dx_show_index("node", frame->entries));
 			dxtrace(dx_show_index("node",
-				((struct dx_node *) bh2->b_data)->entries));
+				((struct dx_node *)bh2->b_data)->entries));
 			err = ext4_handle_dirty_dx_node(handle, dir, bh2);
 			if (err)
 				goto journal_error;
-			brelse (bh2);
-			ext4_handle_dirty_metadata(handle, dir,
-						   (frame - 1)->bh);
+			brelse(bh2);
+			err = ext4_handle_dirty_dx_node(handle, dir,
+							(frame - 1)->bh);
+			if (err)
+				goto journal_error;
 			if (restart) {
-				ext4_handle_dirty_metadata(handle, dir,
-							   frame->bh);
-				goto cleanup;
+				err = ext4_handle_dirty_dirty_dx_node(handle,
+								dir, frame->bh);
+				goto journal_error;
 			}
-		} else {
+		} else /* add_level */ {
 			struct dx_root *dxroot;
-			memcpy((char *) entries2, (char *) entries,
+			memcpy((char *)entries2, (char *)entries,
 			       icount * sizeof(struct dx_entry));
 			dx_set_limit(entries2, dx_node_limit(dir));

@@ -2335,15 +2337,13 @@  static int ext4_dx_add_entry(handle_t *handle,
 			dxtrace(printk(KERN_DEBUG
 				       "Creating %d level index...\n",
 				       info->indirect_levels));
-			ext4_handle_dirty_metadata(handle, dir, frame->bh);
-			ext4_handle_dirty_metadata(handle, dir, bh2);
+			err = ext4_handle_dirty_dx_node(handle, dir, frame->bh);
+			if (err)
+				goto journal_error;
+			err = ext4_handle_dirty_dx_node(handle, dir, bh2);
 			brelse(bh2);
 			restart = 1;
-			goto cleanup;
-		}
-		if (err) {
-			ext4_std_error(inode->i_sb, err);
-			goto cleanup;
+			goto journal_error;
 		}
 	}
 	de = do_split(handle, dir, &bh, frame, &fname->hinfo);
@@ -2355,7 +2355,7 @@  static int ext4_dx_add_entry(handle_t *handle,
 	goto cleanup;

 journal_error:
-	ext4_std_error(dir->i_sb, err);
+	ext4_std_error(dir->i_sb, err); /* this is a no-op if err == 0 */
 cleanup:
 	brelse(bh);
 	dx_release(frames);