diff mbox

net: fold network name hash (v2)

Message ID 20091027150436.56e673cd@nehalam
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

stephen hemminger Oct. 27, 2009, 10:04 p.m. UTC
The full_name_hash does not produce a value that is evenly distributed
over the lower 8 bits. This causes name hash to be unbalanced with large
number of names. There is a standard function to fold in upper bits
so use that.

This is independent of possible improvements to full_name_hash()
in future.
 
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Eric Dumazet Oct. 28, 2009, 6:07 a.m. UTC | #1
Stephen Hemminger a écrit :
> The full_name_hash does not produce a value that is evenly distributed
> over the lower 8 bits. This causes name hash to be unbalanced with large
> number of names. There is a standard function to fold in upper bits
> so use that.
> 
> This is independent of possible improvements to full_name_hash()
> in future.

>  static inline struct hlist_head *dev_name_hash(struct net *net, const char *name)
>  {
>  	unsigned hash = full_name_hash(name, strnlen(name, IFNAMSIZ));
> -	return &net->dev_name_head[hash & ((1 << NETDEV_HASHBITS) - 1)];
> +	return &net->dev_name_head[hash_long(hash, NETDEV_HASHBITS)];
>  }
>  
>  static inline struct hlist_head *dev_index_hash(struct net *net, int ifindex)

full_name_hash() returns an "unsigned int", which is guaranteed to be 32 bits

You should therefore use hash_32(hash, NETDEV_HASHBITS),
not hash_long() that maps to hash_64() on 64 bit arches, which is
slower and certainly not any better with a 32bits input.



/* Compute the hash for a name string. */
static inline unsigned int
full_name_hash(const unsigned char *name, unsigned int len)
{
        unsigned long hash = init_name_hash();
        while (len--)
                hash = partial_name_hash(*name++, hash);
        return end_name_hash(hash);
}

static inline u32 hash_32(u32 val, unsigned int bits)
{
        /* On some cpus multiply is faster, on others gcc will do shifts */
        u32 hash = val * GOLDEN_RATIO_PRIME_32;

        /* High bits are more random, so use them. */
        return hash >> (32 - bits);
}


static inline u64 hash_64(u64 val, unsigned int bits)
{
        u64 hash = val;

        /*  Sigh, gcc can't optimise this alone like it does for 32 bits. */
        u64 n = hash;
        n <<= 18;
        hash -= n;
        n <<= 33;
        hash -= n;
        n <<= 3;
        hash += n;
        n <<= 3;
        hash -= n;
        n <<= 4;
        hash += n;
        n <<= 2;
        hash += n;

        /* High bits are more random, so use them. */
        return hash >> (64 - bits);
}
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Oct. 28, 2009, 9:28 a.m. UTC | #2
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 28 Oct 2009 07:07:10 +0100

> You should therefore use hash_32(hash, NETDEV_HASHBITS),
> not hash_long() that maps to hash_64() on 64 bit arches, which is
> slower and certainly not any better with a 32bits input.

Agreed.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
stephen hemminger Oct. 28, 2009, 3:57 p.m. UTC | #3
On Wed, 28 Oct 2009 07:07:10 +0100
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> Stephen Hemminger a écrit :
> > The full_name_hash does not produce a value that is evenly distributed
> > over the lower 8 bits. This causes name hash to be unbalanced with large
> > number of names. There is a standard function to fold in upper bits
> > so use that.
> > 
> > This is independent of possible improvements to full_name_hash()
> > in future.
> 
> >  static inline struct hlist_head *dev_name_hash(struct net *net, const char *name)
> >  {
> >  	unsigned hash = full_name_hash(name, strnlen(name, IFNAMSIZ));
> > -	return &net->dev_name_head[hash & ((1 << NETDEV_HASHBITS) - 1)];
> > +	return &net->dev_name_head[hash_long(hash, NETDEV_HASHBITS)];
> >  }
> >  
> >  static inline struct hlist_head *dev_index_hash(struct net *net, int ifindex)
> 
> full_name_hash() returns an "unsigned int", which is guaranteed to be 32 bits
> 
> You should therefore use hash_32(hash, NETDEV_HASHBITS),
> not hash_long() that maps to hash_64() on 64 bit arches, which is
> slower and certainly not any better with a 32bits input.

OK, I was following precedent. Only a couple places use hash_32, most use
hash_long().

Using the upper bits does give better distribution.
With 100,000 network names:

               Time       Ratio       Max   StdDev
hash_32       0.002123     1.00       422  11.07
hash_64       0.002927     1.00       400   3.97

The time field is pretty meaningless for such a small sample
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

--- a/net/core/dev.c	2009-10-27 14:54:21.922563076 -0700
+++ b/net/core/dev.c	2009-10-27 15:04:16.733813459 -0700
@@ -86,6 +86,7 @@ 
 #include <linux/socket.h>
 #include <linux/sockios.h>
 #include <linux/errno.h>
+#include <linux/hash.h>
 #include <linux/interrupt.h>
 #include <linux/if_ether.h>
 #include <linux/netdevice.h>
@@ -199,7 +200,7 @@  EXPORT_SYMBOL(dev_base_lock);
 static inline struct hlist_head *dev_name_hash(struct net *net, const char *name)
 {
 	unsigned hash = full_name_hash(name, strnlen(name, IFNAMSIZ));
-	return &net->dev_name_head[hash & ((1 << NETDEV_HASHBITS) - 1)];
+	return &net->dev_name_head[hash_long(hash, NETDEV_HASHBITS)];
 }
 
 static inline struct hlist_head *dev_index_hash(struct net *net, int ifindex)