Message ID: 20110329060300.GA27142@bitwizard.nl
On 29.03.2011 08:03, Rogier Wolff wrote:

>>> Is the slow performance with lots of hard links a known issue?
>
> Yes, it is a known issue.

At least it's not my fault :-) Thanks for the info.

> You get to test my patch. :-)
>
> I strongly suspect that (just like me) sometime in the past you've
> seen e2fsck run out of memory and were advised to enable the
> on-disk-databases.

Something like that... The drive was formatted recently, but a bad
controller corrupted vital information on mount, and some more on the
next fsck. I hit Ctrl-C pretty quickly when I saw lots of rather
confusing kernel errors interleaved with the fsck output. Could this
have left the drive in a similar state?

--
Christian Brandt

life is short and in most cases it ends with death
but my tombstone will carry the hiscore

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Mar 29, 2011 at 10:26:54PM +0200, Christian Brandt wrote:
> On 29.03.2011 08:03, Rogier Wolff wrote:
>
>>>> Is the slow performance with lots of hard links a known issue?
>>>
>>> Yes, it is a known issue.
>
> At least it's not my fault :-) Thanks for the info.
>
>> You get to test my patch. :-)
>>
>> I strongly suspect that (just like me) sometime in the past you've
>> seen e2fsck run out of memory and were advised to enable the
>> on-disk-databases.
>
> Something like that... The drive was formatted recently, but a bad
> controller corrupted vital information on mount, and some more on the
> next fsck. I hit Ctrl-C pretty quickly when I saw lots of rather
> confusing kernel errors interleaved with the fsck output. Could this
> have left the drive in a similar state?

The code I "fixed" is the code that uses an on-disk database instead of
in-memory data structures. Those in-memory data structures can move to
swap if you have enough of it and enough address space. In my case,
normal fsck memory usage plus those two flexible data structures would
have exceeded 3 GB, which exceeds the 32-bit Linux process size limit.

So if you haven't touched the config file that tells e2fsck to put
these structures on disk, you are not experiencing the same problem
that I was... or someone else changed the configuration file for you.

The patch is against a CVS checkout (or whatever SCM is used) of
e2fsprogs.

	Roger.
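[Editor's note: for readers wondering which config file is meant, the on-disk-database mode is enabled through e2fsck.conf's [scratch_files] section. A minimal fragment might look like the following; the directory path is just an example, and the file is usually /etc/e2fsck.conf:]

```
[scratch_files]
	directory = /var/cache/e2fsck
```

When the directory is set and present, e2fsck keeps its icount and dirinfo structures in tdb files there instead of in memory.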
diff --git a/e2fsck/dirinfo.c b/e2fsck/dirinfo.c
index 901235c..9b29f23 100644
--- a/e2fsck/dirinfo.c
+++ b/e2fsck/dirinfo.c
@@ -62,7 +62,7 @@ static void setup_tdb(e2fsck_t ctx, ext2_ino_t num_dirs)
 	uuid_unparse(ctx->fs->super->s_uuid, uuid);
 	sprintf(db->tdb_fn, "%s/%s-dirinfo-XXXXXX", tdb_dir, uuid);
 	fd = mkstemp(db->tdb_fn);
-	db->tdb = tdb_open(db->tdb_fn, 0, TDB_CLEAR_IF_FIRST,
+	db->tdb = tdb_open(db->tdb_fn, 999931, TDB_NOLOCK | TDB_NOSYNC,
 			   O_RDWR | O_CREAT | O_TRUNC, 0600);
 	close(fd);
 }
diff --git a/lib/ext2fs/icount.c b/lib/ext2fs/icount.c
index bec0f5f..bdd5b26 100644
--- a/lib/ext2fs/icount.c
+++ b/lib/ext2fs/icount.c
@@ -173,6 +173,19 @@ static void uuid_unparse(void *uu, char *out)
 		uuid.node[3], uuid.node[4], uuid.node[5]);
 }
 
+static unsigned int my_tdb_hash(TDB_DATA *key)
+{
+	unsigned int value;	/* Used to compute the hash value. */
+	int i;			/* Used to cycle through random values. */
+
+	/* Initial value 0 is as good as any. */
+	for (value = 0, i = 0; i < key->dsize; i++)
+		value = value * 256 + key->dptr[i] + (value >> 24) * 241;
+
+	return value;
+}
+
+
 errcode_t ext2fs_create_icount_tdb(ext2_filsys fs, char *tdb_dir,
 				   int flags, ext2_icount_t *ret)
 {
@@ -180,6 +193,7 @@ errcode_t ext2fs_create_icount_tdb(ext2_filsys fs, char *tdb_dir,
 	errcode_t retval;
 	char *fn, uuid[40];
 	int fd;
+	int hash_size;
 
 	retval = alloc_icount(fs, flags, &icount);
 	if (retval)
@@ -192,9 +206,20 @@ errcode_t ext2fs_create_icount_tdb(ext2_filsys fs, char *tdb_dir,
 	sprintf(fn, "%s/%s-icount-XXXXXX", tdb_dir, uuid);
 	fd = mkstemp(fn);
 
+	/*
+	 * hash_size should be on the same order as the number of entries
+	 * actually used.  The tdb default used to be 131, which gives us a
+	 * big performance penalty with normal inode numbers.  We now trust
+	 * the superblock.  If it's wrong, don't worry: tdb will manage, it
+	 * will just cost a little more CPU time.
+	 *
+	 * If the hash function is good and distributes the values uniformly
+	 * across the 32-bit output space, it doesn't really matter that we
+	 * didn't choose a prime.  The default tdb hash function is pretty
+	 * worthless.  Someone didn't read Knuth.
+	 */
+	hash_size = fs->super->s_inodes_count - fs->super->s_free_inodes_count;
 	icount->tdb_fn = fn;
-	icount->tdb = tdb_open(fn, 0, TDB_CLEAR_IF_FIRST,
-			       O_RDWR | O_CREAT | O_TRUNC, 0600);
+	icount->tdb = tdb_open_ex(fn, hash_size, TDB_NOLOCK | TDB_NOSYNC,
+				  O_RDWR | O_CREAT | O_TRUNC, 0600,
+				  NULL, my_tdb_hash);
 	if (icount->tdb) {
 		close(fd);
 		*ret = icount;