[v2,07/67] fscache: Implement a hash function

Message ID 163906888735.143852.10944614318596881429.stgit@warthog.procyon.org.uk
State New
Series fscache, cachefiles: Rewrite

Commit Message

David Howells Dec. 9, 2021, 4:54 p.m. UTC
Implement a function to generate hashes.  It needs to be stable over time
and endianness-independent as the hashes will appear on disk in future
patches.  It can assume that its input is a multiple of four bytes in size
and alignment.

This is borrowed from the VFS and simplified.
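
For illustration, a caller is expected to pad its key out to a whole number
of 32-bit words before hashing.  A minimal sketch of such a caller, modelled
on (but simpler than) the volume-key handling later in this series; the names
here are illustrative:

	size_t klen = strlen(volume_key);
	size_t hlen = round_up(klen + 1, sizeof(unsigned int));
	unsigned int hash;
	void *key;

	/* kzalloc() zero-fills, so the tail padding hashes consistently */
	key = kzalloc(hlen, GFP_KERNEL);
	if (!key)
		return -ENOMEM;
	memcpy(key, volume_key, klen);
	hash = fscache_hash(0, (unsigned int *)key, hlen / sizeof(unsigned int));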

Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/163819586113.215744.1699465806130102367.stgit@warthog.procyon.org.uk/ # v1
---

 fs/fscache/internal.h |    2 ++
 fs/fscache/main.c     |   39 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 41 insertions(+)

Comments

Linus Torvalds Dec. 9, 2021, 5:12 p.m. UTC | #1
On Thu, Dec 9, 2021 at 8:54 AM David Howells <dhowells@redhat.com> wrote:
>
> Implement a function to generate hashes.  It needs to be stable over time
> and endianness-independent as the hashes will appear on disk in future
> patches.

I'm not actually seeing this being endianness-independent.

Is the input just regular 32-bit data in native word order? Because
then it's not endianness-independent, it's purely that there *is* no
endianness to the data at all and it is purely native data.

So the code may be correct, but the explanation is confusing. There is
absolutely nothing here that is about endianness.

           Linus
David Howells Dec. 9, 2021, 9:57 p.m. UTC | #2
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> > Implement a function to generate hashes.  It needs to be stable over time
> > and endianness-independent as the hashes will appear on disk in future
> > patches.
> 
> I'm not actually seeing this being endianness-independent.
> 
> Is the input just regular 32-bit data in native word order? Because
> then it's not endianness-independent, it's purely that there *is* no
> endianness to the data at all and it is purely native data.
>
> So the code may be correct, but the explanation is confusing. There is
> absolutely nothing here that is about endianness.

What I'm trying to get at is that the hash needs to be consistent, no matter
the endianness of the cpu, for any particular input blob.  The hashing
function shouldn't need to know the structure of the input blob.  In the case
of the volume key, it's a padded printable string; in the case of the cookie
key, it's probably some sort of structured blob, quite possibly an actual
array of be32.

The reason it needs to be consistent is that people seem to like seeding the
cache by tarring up the cache from one machine and untarring it on another.
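
To see why hashing raw native words is endian-dependent, consider the same
key bytes on both kinds of CPU (an illustrative sketch, not code from the
patch):

	/* The key bytes 01 02 03 04 become different native words
	 * depending on the CPU:
	 *
	 *   read as a native u32 on little-endian:  0x04030201
	 *   read as a native u32 on big-endian:     0x01020304
	 *
	 * so HASH_MIX() is fed different values and the final hashes
	 * differ.  Converting through a fixed byte order makes the
	 * input, and hence the hash, stable:
	 */
	a = le32_to_cpu(*(const __le32 *)data);	/* same value on LE and BE */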

And looking again at my code:

unsigned int fscache_hash(unsigned int salt, unsigned int *data, unsigned int n)
{
	unsigned int a, x = 0, y = salt;

	for (; n; n--) {
		a = *data++;   <<<<<<<
		HASH_MIX(x, y, a);
	}
	return fold_hash(x, y);
}

The marked line should probably use something like le/be32_to_cpu().

I also need to fix 9p to canonicalise its cookie key.

Thanks for catching that,
David
Linus Torvalds Dec. 9, 2021, 10:07 p.m. UTC | #3
On Thu, Dec 9, 2021 at 1:57 PM David Howells <dhowells@redhat.com> wrote:
>
> What I'm trying to get at is that the hash needs to be consistent, no matter
> the endianness of the cpu, for any particular input blob.

Yeah, if that's the case, then you should probably make that "unsigned
int *data" argument probably just be "void *" and then:

>                 a = *data++;   <<<<<<<
>                 HASH_MIX(x, y, a);
>         }
>         return fold_hash(x, y);
> }
>
> The marked line should probably use something like le/be32_to_cpu().

Yes, it should be using a '__le32 *' inside that function and you
should use le32_to_cpu(). Obviously, BE would work too, but would cause
unnecessary work on common hardware.

But as mentioned for the other patches, you should then also be a lot
more careful about always using the end result as an 'unsigned int'
(or maybe 'u32') too, and when comparing hashes for binary search or
other purposes, you should always do the compare in some stable format.

Because doing

        return (long)hash_a - (long)hash_b;

and looking at the sign doesn't actually result in a stable ordering
on 32-bit architectures. You don't get a transitive ordering (i.e. a < b
and b < c don't imply a < c).
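
To make that concrete, take three hypothetical hash values on a machine
where long is 32 bits (made-up numbers, two's-complement arithmetic):

	unsigned int ha = 0x00000000, hb = 0x80000000, hc = 0xffffffff;

	/* (long)ha - (long)hb = 0 - (-2147483648): overflows and wraps
	 * to 0x80000000, i.e. negative           =>  "ha < hb"
	 * (long)hb - (long)hc = -2147483648 - (-1) = -2147483647,
	 * negative                               =>  "hb < hc"
	 * (long)ha - (long)hc = 0 - (-1) = 1, positive  =>  "ha > hc"
	 *
	 * ha < hb and hb < hc, yet ha > hc: the ordering is not
	 * transitive (and the first subtraction is signed overflow,
	 * which is undefined behaviour in C anyway).
	 */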

And presumably if the hashes are meaningful across machines, then hash
comparisons should also be meaningful across machines.

So when comparing hashes, you need to compare them either in a truly
bigger signed type (and make sure that doesn't get truncated) - kind
of like how a lot of 'memcmp()' functions do 'unsigned char'
subtractions in an 'int' - or you need to compare them _as_ 'unsigned
int'.

Otherwise the comparisons will be all kinds of messed up.

          Linus
David Howells Dec. 10, 2021, 2:35 p.m. UTC | #4
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> > What I'm trying to get at is that the hash needs to be consistent, no matter
> > the endianness of the cpu, for any particular input blob.
> 
> Yeah, if that's the case, then you should probably make that "unsigned
> int *data" argument probably just be "void *" and then:
> 
> >                 a = *data++;   <<<<<<<
> >                 HASH_MIX(x, y, a);
> >         }
> >         return fold_hash(x, y);
> > }
> >
> > The marked line should probably use something like le/be32_to_cpu().
> 
> Yes, it should be using a '__le32 *' inside that function and you
> should use le32_to_cpu(). Obviously, BE would work too, but would cause
> unnecessary work on common hardware.

Okay, how about I make the attached change to make the hashing stable?  This
will make fscache_hash() take an opaque buffer and a length (the length must
be a multiple of four).

David
---
diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index e287952292c5..65cf2ae22a70 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -269,22 +269,23 @@ EXPORT_SYMBOL(fscache_caching_failed);
 static int fscache_set_key(struct fscache_cookie *cookie,
 			   const void *index_key, size_t index_key_len)
 {
-	u32 *buf;
-	int bufs;
+	void *buf;
+	size_t buf_size;
 
-	bufs = DIV_ROUND_UP(index_key_len, sizeof(*buf));
+	buf_size = round_up(index_key_len, sizeof(__le32));
 
 	if (index_key_len > sizeof(cookie->inline_key)) {
-		buf = kcalloc(bufs, sizeof(*buf), GFP_KERNEL);
+		buf = kzalloc(buf_size, GFP_KERNEL);
 		if (!buf)
 			return -ENOMEM;
 		cookie->key = buf;
 	} else {
-		buf = (u32 *)cookie->inline_key;
+		buf = cookie->inline_key;
 	}
 
 	memcpy(buf, index_key, index_key_len);
-	cookie->key_hash = fscache_hash(cookie->volume->key_hash, buf, bufs);
+	cookie->key_hash = fscache_hash(cookie->volume->key_hash,
+					buf, buf_size);
 	return 0;
 }
 
diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index 87884f4b34fb..f121c21590dc 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -86,7 +86,7 @@ static inline void fscache_end_operation(struct netfs_cache_resources *cres)
  */
 extern unsigned fscache_debug;
 
-extern unsigned int fscache_hash(unsigned int salt, unsigned int *data, unsigned int n);
+extern unsigned int fscache_hash(unsigned int salt, const void *data, size_t len);
 
 /*
  * proc.c
diff --git a/fs/fscache/main.c b/fs/fscache/main.c
index 01d57433702c..dad85fd84f6f 100644
--- a/fs/fscache/main.c
+++ b/fs/fscache/main.c
@@ -53,15 +53,16 @@ static inline unsigned int fold_hash(unsigned long x, unsigned long y)
 /*
  * Generate a hash.  This is derived from full_name_hash(), but we want to be
  * sure it is arch independent and that it doesn't change as bits of the
- * computed hash value might appear on disk.  The caller also guarantees that
- * the hashed data will be a series of aligned 32-bit words.
+ * computed hash value might appear on disk.  The caller must guarantee that
+ * the source data is a multiple of four bytes in size.
  */
-unsigned int fscache_hash(unsigned int salt, unsigned int *data, unsigned int n)
+unsigned int fscache_hash(unsigned int salt, const void *data, size_t len)
 {
-	unsigned int a, x = 0, y = salt;
+	const __le32 *p = data;
+	unsigned int a, x = 0, y = salt, n = len / sizeof(__le32);
 
 	for (; n; n--) {
-		a = *data++;
+		a = le32_to_cpu(*p++);
 		HASH_MIX(x, y, a);
 	}
 	return fold_hash(x, y);
diff --git a/fs/fscache/volume.c b/fs/fscache/volume.c
index edd3c245010e..26a6b8f315e1 100644
--- a/fs/fscache/volume.c
+++ b/fs/fscache/volume.c
@@ -131,7 +131,7 @@ static long fscache_compare_volume(const struct fscache_volume *a,
 	if (a->key[0] != b->key[0])
 		return (long)a->key[0]   - (long)b->key[0];
 
-	klen = round_up(a->key[0] + 1, sizeof(unsigned int));
+	klen = round_up(a->key[0] + 1, sizeof(__le32));
 	return memcmp(a->key, b->key, klen);
 }
 
@@ -225,7 +225,7 @@ static struct fscache_volume *fscache_alloc_volume(const char *volume_key,
 	 * hashing easier.
 	 */
 	klen = strlen(volume_key);
-	hlen = round_up(1 + klen + 1, sizeof(unsigned int));
+	hlen = round_up(1 + klen + 1, sizeof(__le32));
 	key = kzalloc(hlen, GFP_KERNEL);
 	if (!key)
 		goto err_vol;
@@ -233,8 +233,7 @@ static struct fscache_volume *fscache_alloc_volume(const char *volume_key,
 	memcpy(key + 1, volume_key, klen);
 
 	volume->key = key;
-	volume->key_hash = fscache_hash(0, (unsigned int *)key,
-					hlen / sizeof(unsigned int));
+	volume->key_hash = fscache_hash(0, key, hlen);
 
 	volume->debug_id = atomic_inc_return(&fscache_volume_debug_id);
 	down_write(&fscache_addremove_sem);
David Howells Dec. 10, 2021, 2:41 p.m. UTC | #5
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> But as mentioned for the other patches, you should then also be a lot
> more careful about always using the end result as an 'unsigned int'
> (or maybe 'u32') too, and when comparing hashes for binary search or
> other purposes, you should always do the compare in some stable format.
> 
> Because doing
> 
>         return (long)hash_a - (long)hash_b;
> 
> and looking at the sign doesn't actually result in a stable ordering
> on 32-bit architectures. You don't get a transitive ordering (i.e. a < b
> and b < c don't imply a < c).
> 
> And presumably if the hashes are meaningful across machines, then hash
> comparisons should also be meaningful across machines.
> 
> So when comparing hashes, you need to compare them either in a truly
> bigger signed type (and make sure that doesn't get truncated) - kind
> of like how a lot of 'memcmp()' functions do 'unsigned char'
> subtractions in an 'int' - or you need to compare them _as_ 'unsigned
> int'.
> 
> Otherwise the comparisons will be all kinds of messed up.

I don't think it should matter in this case, as the in-memory hash tables are
independent of what's on disk (and not even necessarily the same size).
They're only used for duplicate detection.  The idea was to be able to shorten
the scanning of a hash bucket by half on average, but I didn't make it do
that.  It's just that I use the same hash value as a quick check first.

However, since the comparator functions are only used to see if they're the
same or different, the attached change makes them return bool instead, no
cast or subtraction required.

David
---
diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index 65cf2ae22a70..ca36b598d6b5 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -289,17 +289,15 @@ static int fscache_set_key(struct fscache_cookie *cookie,
 	return 0;
 }
 
-static long fscache_compare_cookie(const struct fscache_cookie *a,
-				   const struct fscache_cookie *b)
+static bool fscache_cookie_same(const struct fscache_cookie *a,
+				const struct fscache_cookie *b)
 {
 	const void *ka, *kb;
 
-	if (a->key_hash != b->key_hash)
-		return (long)a->key_hash - (long)b->key_hash;
-	if (a->volume != b->volume)
-		return (long)a->volume - (long)b->volume;
-	if (a->key_len != b->key_len)
-		return (long)a->key_len - (long)b->key_len;
+	if (a->key_hash	!= b->key_hash ||
+	    a->volume	!= b->volume ||
+	    a->key_len	!= b->key_len)
+		return false;
 
 	if (a->key_len <= sizeof(a->inline_key)) {
 		ka = &a->inline_key;
@@ -308,7 +306,7 @@ static long fscache_compare_cookie(const struct fscache_cookie *a,
 		ka = a->key;
 		kb = b->key;
 	}
-	return memcmp(ka, kb, a->key_len);
+	return memcmp(ka, kb, a->key_len) == 0;
 }
 
 static atomic_t fscache_cookie_debug_id = ATOMIC_INIT(1);
@@ -399,7 +397,7 @@ static bool fscache_hash_cookie(struct fscache_cookie *candidate)
 
 	hlist_bl_lock(h);
 	hlist_bl_for_each_entry(cursor, p, h, hash_link) {
-		if (fscache_compare_cookie(candidate, cursor) == 0) {
+		if (fscache_cookie_same(candidate, cursor)) {
 			if (!test_bit(FSCACHE_COOKIE_RELINQUISHED, &cursor->flags))
 				goto collision;
 			wait_for = fscache_get_cookie(cursor,
diff --git a/fs/fscache/volume.c b/fs/fscache/volume.c
index 26a6b8f315e1..0e80fd8fd14a 100644
--- a/fs/fscache/volume.c
+++ b/fs/fscache/volume.c
@@ -119,20 +119,18 @@ void fscache_end_volume_access(struct fscache_volume *volume,
 }
 EXPORT_SYMBOL(fscache_end_volume_access);
 
-static long fscache_compare_volume(const struct fscache_volume *a,
-				   const struct fscache_volume *b)
+static bool fscache_volume_same(const struct fscache_volume *a,
+				const struct fscache_volume *b)
 {
 	size_t klen;
 
-	if (a->key_hash != b->key_hash)
-		return (long)a->key_hash - (long)b->key_hash;
-	if (a->cache != b->cache)
-		return (long)a->cache    - (long)b->cache;
-	if (a->key[0] != b->key[0])
-		return (long)a->key[0]   - (long)b->key[0];
+	if (a->key_hash	!= b->key_hash ||
+	    a->cache	!= b->cache ||
+	    a->key[0]	!= b->key[0])
+		return false;
 
 	klen = round_up(a->key[0] + 1, sizeof(__le32));
-	return memcmp(a->key, b->key, klen);
+	return memcmp(a->key, b->key, klen) == 0;
 }
 
 static bool fscache_is_acquire_pending(struct fscache_volume *volume)
@@ -170,7 +168,7 @@ static bool fscache_hash_volume(struct fscache_volume *candidate)
 
 	hlist_bl_lock(h);
 	hlist_bl_for_each_entry(cursor, p, h, hash_link) {
-		if (fscache_compare_volume(candidate, cursor) == 0) {
+		if (fscache_volume_same(candidate, cursor)) {
 			if (!test_bit(FSCACHE_VOLUME_RELINQUISHED, &cursor->flags))
 				goto collision;
 			fscache_see_volume(cursor, fscache_volume_get_hash_collision);
@@ -335,7 +333,7 @@ static void fscache_wake_pending_volume(struct fscache_volume *volume,
 	struct hlist_bl_node *p;
 
 	hlist_bl_for_each_entry(cursor, p, h, hash_link) {
-		if (fscache_compare_volume(cursor, volume) == 0) {
+		if (fscache_volume_same(cursor, volume)) {
 			fscache_see_volume(cursor, fscache_volume_see_hash_wake);
 			clear_bit(FSCACHE_VOLUME_ACQUIRE_PENDING, &cursor->flags);
 			wake_up_bit(&cursor->flags, FSCACHE_VOLUME_ACQUIRE_PENDING);
Linus Torvalds Dec. 10, 2021, 5:33 p.m. UTC | #6
On Fri, Dec 10, 2021 at 6:41 AM David Howells <dhowells@redhat.com> wrote:
>
> However, since the comparator functions are only used to see if they're the
> same or different, the attached change makes them return bool instead, no
> cast or subtraction required.

Ok, thanks, that resolves my worries.

Which is not to say it all works - I obviously only scanned the
patches rather than testing anything.

                Linus

Patch

diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index ea52f8594a77..64767992bd15 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -22,6 +22,8 @@ 
  */
 extern unsigned fscache_debug;
 
+extern unsigned int fscache_hash(unsigned int salt, unsigned int *data, unsigned int n);
+
 /*
  * proc.c
  */
diff --git a/fs/fscache/main.c b/fs/fscache/main.c
index 819de2ee1276..a4afba1b9d3b 100644
--- a/fs/fscache/main.c
+++ b/fs/fscache/main.c
@@ -24,6 +24,45 @@  MODULE_PARM_DESC(fscache_debug,
 struct workqueue_struct *fscache_wq;
 EXPORT_SYMBOL(fscache_wq);
 
+/*
+ * Mixing scores (in bits) for (7,20):
+ * Input delta: 1-bit      2-bit
+ * 1 round:     330.3     9201.6
+ * 2 rounds:   1246.4    25475.4
+ * 3 rounds:   1907.1    31295.1
+ * 4 rounds:   2042.3    31718.6
+ * Perfect:    2048      31744
+ *            (32*64)   (32*31/2 * 64)
+ */
+#define HASH_MIX(x, y, a)	\
+	(	x ^= (a),	\
+	y ^= x,	x = rol32(x, 7),\
+	x += y,	y = rol32(y,20),\
+	y *= 9			)
+
+static inline unsigned int fold_hash(unsigned long x, unsigned long y)
+{
+	/* Use arch-optimized multiply if one exists */
+	return __hash_32(y ^ __hash_32(x));
+}
+
+/*
+ * Generate a hash.  This is derived from full_name_hash(), but we want to be
+ * sure it is arch independent and that it doesn't change as bits of the
+ * computed hash value might appear on disk.  The caller also guarantees that
+ * the hashed data will be a series of aligned 32-bit words.
+ */
+unsigned int fscache_hash(unsigned int salt, unsigned int *data, unsigned int n)
+{
+	unsigned int a, x = 0, y = salt;
+
+	for (; n; n--) {
+		a = *data++;
+		HASH_MIX(x, y, a);
+	}
+	return fold_hash(x, y);
+}
+
 /*
  * initialise the fs caching module
  */