Message ID | 20141031214949.GA644@thunk.org |
---|---|
State | Accepted, archived |
On Fri, Oct 31, 2014 at 2:49 PM, Theodore Ts'o <tytso@mit.edu> wrote:
>
> Theodore Ts'o (1):
>       jbd2: use a better hash function for the revoke table

Does it really make sense to use hash_u64()? It can be quite expensive
(mainly on 32-bit targets), and since the low bits are where all the
information is anyway, I'd suggest using hash_32() here even if the
block number in theory can have a few bits above the 32-bit mark.

Anyway, pulled, but I just reacted to that small detail.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

On Fri, Oct 31, 2014 at 04:26:16PM -0700, Linus Torvalds wrote:
> On Fri, Oct 31, 2014 at 2:49 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> >
> > Theodore Ts'o (1):
> >       jbd2: use a better hash function for the revoke table
>
> Does it really make sense to use hash_u64()? It can be quite expensive
> (mainly on 32-bit targets), and since the low bits are where all the
> information is anyway, I'd suggest using hash_32() here even if the
> block number in theory can have a few bits above the 32-bit mark.

Hmm... the problem is that since the block group size is normally
32768 blocks, and most metadata blocks (which is what needs to be
revoked) is located at the beginning of the block groups, if we drop
the high 32-bits, then there would be some hash aliasing going on.

What we could do is use hash_32() unless we have a file system large
enough that it matters, and then if we still wanted to avoid using
hash_u64(), we could do something like this:

    hash_32(__swab32(blk >> 32) | (blk & 0xFFFFFFFF))

That way we get the information from the block group number as well,
and in a way where it doesn't interfere with the information in the
low bits of the block number.

I didn't think hash_64 was *that* slow, so it's not clear the above
would be faster, though. And if someone is using a > 16TB file system
on a 32-bit platform, I suspect they might be having other problems. :-)

- Ted

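
The folding idea in the message above can be sketched in userspace C. This is only an illustration under stated assumptions, not the jbd2 code: swab32_sketch() stands in for the kernel's __swab32(), hash32_sketch() is a generic multiplicative hash whose constant is chosen for illustration, and revoke_hash_sketch() and its fs_blocks parameter are hypothetical names for "use hash_32() unless the file system is large enough that it matters".

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's __swab32(): reverse the byte order. */
static uint32_t swab32_sketch(uint32_t x)
{
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) << 8)  |
           ((x & 0x00FF0000u) >> 8)  |
           ((x & 0xFF000000u) >> 24);
}

/*
 * Fold a 64-bit block number into 32 bits, as in the expression above.
 * Byte-swapping the high word moves its low-order (most variable) byte
 * into the top byte of the result, away from the low bits of blk.
 */
static uint32_t fold_blk_sketch(uint64_t blk)
{
    uint32_t hi = (uint32_t)(blk >> 32);
    uint32_t lo = (uint32_t)(blk & 0xFFFFFFFFu);

    return swab32_sketch(hi) | lo;
}

/*
 * Illustrative 32-bit multiplicative hash returning the top `bits` bits.
 * The multiplier is an assumption made for this sketch.
 */
static uint32_t hash32_sketch(uint32_t val, unsigned int bits)
{
    assert(bits >= 1 && bits <= 32);
    return (uint32_t)(val * 0x9E370001u) >> (32 - bits);
}

/*
 * Hypothetical selector: if every block number fits in 32 bits, hash the
 * low word directly; otherwise fold the high word in first.
 */
static uint32_t revoke_hash_sketch(uint64_t blk, uint64_t fs_blocks,
                                   unsigned int bits)
{
    if (fs_blocks <= (1ULL << 32))
        return hash32_sketch((uint32_t)blk, bits);
    return hash32_sketch(fold_blk_sketch(blk), bits);
}
```

With this fold, block 0x100000000 maps to 0x01000000 rather than 0, so it no longer aliases with block 0 the way plain truncation would. Because the two halves are combined with OR, high-word bits can still overlap the top byte of the low word on very large block numbers.
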
On Sat, Nov 1, 2014 at 6:38 AM, Theodore Ts'o <tytso@mit.edu> wrote:
>
> I didn't think hash_64 was *that* slow, so it's not clear the above
> would be faster, though. And if someone is using a > 16TB file system
> on a 32-bit platform, I suspect they might be having other problems. :-)

Fair enough, hash_64() isn't *that* slow. But it _is_ 6 64-bit shifts
and adds/subtracts, which on a 32-bit machine tends to be quite
expensive. On some of them it's function calls etc.

And your point about >16TB filesystems is completely buggy. That was
*my* point. Most people - even on 64-bit - do *not* have 16TB
filesystems, and the high 32 bits are zero or contain very very little
information (ie even on a multi-terabyte filesystem, it's one or two
bits worth of information). So hash_32() is not only much more
reasonable on a 32-bit machine, the end result is basically as good
for 99.999% of all uses. Exactly *because* people don't have those big
filesystems.

Linus

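
To make the "6 64-bit shifts and adds/subtracts" remark concrete, here is a userspace sketch of a shift-and-add 64-bit hash of that shape. The shift sequence and the multiplier it is equivalent to are assumptions recalled from kernels of that era, not quoted from this thread, and the function names are hypothetical. On a 32-bit target each of these 64-bit operations typically has to be built from several 32-bit instructions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Shift-and-add form: six 64-bit shifts and six 64-bit adds/subtracts,
 * followed by a final shift that keeps the top `bits` bits.
 */
static uint64_t hash64_shift_add_sketch(uint64_t val, unsigned int bits)
{
    uint64_t hash = val;
    uint64_t n = val;

    assert(bits >= 1 && bits <= 64);
    n <<= 18; hash -= n;
    n <<= 33; hash -= n;
    n <<= 3;  hash += n;
    n <<= 3;  hash -= n;
    n <<= 4;  hash += n;
    n <<= 2;  hash += n;
    return hash >> (64 - bits);
}

/*
 * The same value computed with one 64-bit multiply: the shift sequence
 * above sums to 1 - 2^18 - 2^51 + 2^54 - 2^57 + 2^61 + 2^63.
 */
static uint64_t hash64_multiply_sketch(uint64_t val, unsigned int bits)
{
    assert(bits >= 1 && bits <= 64);
    return (val * 0x9E37FFFFFFFC0001ULL) >> (64 - bits);
}
```

A 32-bit hash, by contrast, needs a single 32-bit multiply and one shift, which is the cost difference being argued about here.
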
> And your point about >16TB filesystems is completely buggy. That was
> *my* point. Most people - even on 64-bit - do *not* have 16TB
> filesystems, and the high 32 bits are zero or contain very very little
> information (ie even on a multi-terabyte filesystem, it's one or two
> bits worth of information). So hash_32() is not only much more
> reasonable on a 32-bit machine, the end result is basically as good
> for 99.999% of all uses. Exactly *because* people don't have those big
> filesystems.

I agree; that's why I suggested using hash_32() if the number of
blocks in the file system is less than 2**32. I did look at hash_u64
and didn't think it was that bad, but that's probably because compared
to crypto checksums it's positively fast, and it's really easy to get
into the bad habit of thinking that all the world's an x86_64. :-)

- Ted