[v6,net-next,1/5] net: add napi_id and hash

Message ID 20130529063925.27486.46649.stgit@ladj378.jer.intel.com
State Superseded, archived
Delegated to: David Miller

Commit Message

Eliezer Tamir May 29, 2013, 6:39 a.m. UTC
Adds a napi_id and a hashing mechanism to look up a napi by id.
This will be used by subsequent patches to implement low latency
Ethernet device polling.
Based on a code sample by Eric Dumazet.

Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
---

 include/linux/netdevice.h |   29 ++++++++++++++++++++++++
 net/core/dev.c            |   54 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 83 insertions(+), 0 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
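A rough userspace model of what the commit message describes: each context gets a non-zero id from a global counter and is chained into one of 256 buckets chosen by `id % 256`, and a lookup walks only that bucket. This is a sketch under stated assumptions, not the kernel code. `struct fake_napi`, `model_hash_add()` and `model_by_id()` are hypothetical names, and the patch's hlist/RCU/spinlock machinery is replaced by a plain linked list with no locking.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only (not kernel code): fake_napi stands in for
 * struct napi_struct. The patch uses hlist + RCU + a spinlock; this
 * sketch uses a plain singly linked chain per bucket and no locking. */
struct fake_napi {
	unsigned int napi_id;
	struct fake_napi *next;		/* next entry in the same bucket */
};

#define MODEL_HASH_BITS  8
#define MODEL_HASH_SLOTS (1u << MODEL_HASH_BITS)	/* 256, like DEFINE_HASHTABLE(napi_hash, 8) */

static struct fake_napi *model_hash[MODEL_HASH_SLOTS];
static unsigned int model_gen_id;

/* Assign the next non-zero id and push the context onto its bucket. */
void model_hash_add(struct fake_napi *n)
{
	n->napi_id = 0;				/* 0 means "no id" */
	while (!n->napi_id)
		n->napi_id = ++model_gen_id;	/* skips 0 after wrap-around */
	assert(n->napi_id != 0);

	unsigned int slot = n->napi_id % MODEL_HASH_SLOTS;
	n->next = model_hash[slot];		/* insert at head, like hlist_add_head_rcu() */
	model_hash[slot] = n;
}

/* Walk one bucket's chain and return the context with this id, or NULL. */
struct fake_napi *model_by_id(unsigned int napi_id)
{
	struct fake_napi *n;

	for (n = model_hash[napi_id % MODEL_HASH_SLOTS]; n; n = n->next)
		if (n->napi_id == napi_id)
			return n;
	return NULL;
}
```

Adding three fresh contexts assigns ids 1, 2 and 3; `model_by_id(2)` then returns the second context, and an id that was never assigned returns `NULL`.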

Comments

Eric Dumazet May 29, 2013, 12:56 p.m. UTC | #1
On Wed, 2013-05-29 at 09:39 +0300, Eliezer Tamir wrote:
> Adds a napi_id and a hashing mechanism to lookup a napi by id.
> This will be used by subsequent patches to implement low latency
> Ethernet device polling.
> Based on a code sample by Eric Dumazet.
> 
> Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
> ---

OK, this looks good enough for inclusion.

If a v7 ever is submitted, please add a 'static' for

static DEFINE_SPINLOCK(napi_hash_lock);
static unsigned int napi_gen_id;
static DEFINE_HASHTABLE(napi_hash, 8);

If David chooses to apply v6, I'll submit a follow-up patch for this.

Signed-off-by: Eric Dumazet <edumazet@google.com>



Eric Dumazet May 29, 2013, 1:43 p.m. UTC | #2
On Wed, 2013-05-29 at 14:09 +0100, David Laight wrote:
> > > Adds a napi_id and a hashing mechanism to lookup a napi by id.
> 
> Is this one of the places where the 'id' can be selected
> so that the 'hash' lookup never collides?

Very few devices will ever call napi_hash_add()

[ Real NIC RX queues, not virtual devices ]

We use a hash table with 256 slots, so the chance of a collision is close to zero.

Let's not over-engineer the thing before it's even used.
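The arithmetic behind the low collision risk can be checked directly. Ids come from an incrementing counter and the bucket is `id % 256`, so any 256 consecutive ids fall into 256 distinct buckets; two live contexts share a bucket only when their ids differ by a multiple of 256. Even then the lookup stays correct, because it compares the full id while walking the bucket's chain, so a shared bucket only costs an extra comparison. The helpers below are hypothetical and model only the modulo step.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_HASH_SLOTS 256u	/* HASH_SIZE() of a table declared with 8 bits */

/* Bucket index for an id, mirroring napi_id % HASH_SIZE(napi_hash). */
unsigned int model_bucket(unsigned int napi_id)
{
	return napi_id % MODEL_HASH_SLOTS;
}

/* Return true if ids first .. first+count-1 all land in different buckets. */
bool model_ids_spread(unsigned int first, unsigned int count)
{
	bool used[MODEL_HASH_SLOTS] = { false };

	for (unsigned int i = 0; i < count; i++) {
		unsigned int b = model_bucket(first + i);

		if (used[b])
			return false;	/* two ids share a bucket */
		used[b] = true;
	}
	return true;
}
```

Ids 1 through 256 occupy every bucket exactly once; id 257 is the first to share a bucket (with id 1).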


Eliezer Tamir May 29, 2013, 3:04 p.m. UTC | #3
On 29/05/2013 15:56, Eric Dumazet wrote:
> On Wed, 2013-05-29 at 09:39 +0300, Eliezer Tamir wrote:
>> Adds a napi_id and a hashing mechanism to lookup a napi by id.
>> This will be used by subsequent patches to implement low latency
>> Ethernet device polling.
>> Based on a code sample by Eric Dumazet.
>>
>> Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
>> ---
>
> OK this looks good enough for inclusion.
>
> If a v7 ever is submitted, please add a 'static' for
>
> static DEFINE_SPINLOCK(napi_hash_lock);
> static unsigned int napi_gen_id;
> static DEFINE_HASHTABLE(napi_hash, 8);
>

I will post a v7 with the changes you suggested for 2/5. I will wait a
bit to see if there are other things to fix.

Thanks,
Eliezer
Ben Hutchings May 29, 2013, 8:09 p.m. UTC | #4
On Wed, 2013-05-29 at 09:39 +0300, Eliezer Tamir wrote:
> Adds a napi_id and a hashing mechanism to lookup a napi by id.
> This will be used by subsequent patches to implement low latency
> Ethernet device polling.
> Based on a code sample by Eric Dumazet.
> 
> Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
[...]
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
[...]
> @@ -4136,6 +4143,53 @@ void napi_complete(struct napi_struct *n)
>  }
>  EXPORT_SYMBOL(napi_complete);
>  
> +void napi_hash_add(struct napi_struct *napi)
> +{
> +	if (!test_and_set_bit(NAPI_STATE_HASHED, &napi->state)) {
> +
> +		spin_lock(&napi_hash_lock);
> +
> +		/* 0 is not a valid id */
> +		napi->napi_id = 0;
> +		while (!napi->napi_id)
> +			napi->napi_id = ++napi_gen_id;

Suppose we're loading/unloading one driver repeatedly while another one
remains loaded the whole time.  Then once napi_gen_id wraps around, the
same ID can be assigned to multiple contexts.

So far as I can see, assigning the same ID twice will just make polling
stop working for one of the NAPI contexts; I don't think it causes a
crash.  And it is exceedingly unlikely to happen in production.  But if
you're going to the trouble of handling wrap-around at all, you'd better
handle this.
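The wrap-around case can be reproduced in a small userspace model. The v6 loop only skips 0, so after the counter wraps it can hand out an id that a long-lived context still holds. One possible fix, sketched here as an assumption rather than what was eventually merged, is to keep advancing the counter while the candidate id is already in use; in the kernel that check could be a `napi_by_id()` lookup done under `napi_hash_lock`. `id_in_use()`, `mark_live()`, `alloc_id_v6()` and `alloc_id_unique()` are hypothetical helpers.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Userspace model: a tiny table of ids currently in use stands in for
 * "is a napi_struct already hashed under this id?" */
#define MODEL_MAX_LIVE 16

static unsigned int live_ids[MODEL_MAX_LIVE];
static unsigned int nr_live;
static unsigned int model_gen_id;

bool id_in_use(unsigned int id)
{
	for (unsigned int i = 0; i < nr_live; i++)
		if (live_ids[i] == id)
			return true;
	return false;
}

void mark_live(unsigned int id)
{
	assert(nr_live < MODEL_MAX_LIVE);
	live_ids[nr_live++] = id;
}

/* Allocation as in the v6 patch: skips 0 but not ids still in use. */
unsigned int alloc_id_v6(void)
{
	unsigned int id = 0;

	while (!id)
		id = ++model_gen_id;
	return id;
}

/* Possible fix: also skip any id that is still in use. */
unsigned int alloc_id_unique(void)
{
	unsigned int id = 0;

	while (!id || id_in_use(id))
		id = ++model_gen_id;
	return id;
}
```

With id 1 still held and the counter parked just below `UINT_MAX`, the v6-style allocator returns `UINT_MAX` and then 1 again, while the checking allocator skips both 0 and 1 and returns 2. In the kernel the extra check would add one hash lookup per allocation, normally a single short bucket walk.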

[...]
> +/* must be called under rcu_read_lock(), as we dont take a reference */
> +struct napi_struct *napi_by_id(int napi_id)
> +{
> +	unsigned int hash = napi_id % HASH_SIZE(napi_hash);
[...]

napi_id should be declared unsigned int here, as elsewhere.  The
division can't actually yield a negative result because HASH_SIZE() has
type size_t and napi_id is promoted to match, but I had to go and look
at hashtable.h to check that.
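The operand-type point can be demonstrated in plain C. `HASH_SIZE()` is defined via `ARRAY_SIZE()`, a `sizeof` expression, so it has type `size_t`; in `int % size_t` the int is converted to `size_t` before the division, which is why a negative id cannot produce a negative bucket. With an `int` divisor, the remainder of a negative id would be negative. `MODEL_SLOTS` and the `bucket_*()` functions are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for HASH_SIZE(napi_hash): a size_t constant of 256. */
#define MODEL_SLOTS ((size_t)256)

/* int % size_t: the int is converted to size_t first, so a negative
 * id wraps to a huge unsigned value and the result is never negative. */
size_t bucket_mixed(int napi_id)
{
	return napi_id % MODEL_SLOTS;
}

/* int % int: C99 truncates toward zero, so a negative id yields a
 * negative remainder, which would be an invalid bucket index. */
int bucket_signed(int napi_id, int slots)
{
	return napi_id % slots;
}

/* unsigned int id, as suggested: no sign question arises at all. */
unsigned int bucket_unsigned(unsigned int napi_id)
{
	return napi_id % 256u;
}
```

For an id of -1, `bucket_mixed()` yields 255 because -1 first becomes `SIZE_MAX`, while `bucket_signed(-1, 256)` yields -1.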

Ben.
Eliezer Tamir May 30, 2013, 6:51 a.m. UTC | #5
On 29/05/2013 23:09, Ben Hutchings wrote:
> On Wed, 2013-05-29 at 09:39 +0300, Eliezer Tamir wrote:
>> +void napi_hash_add(struct napi_struct *napi)
>> +{
>> +	if (!test_and_set_bit(NAPI_STATE_HASHED, &napi->state)) {
>> +
>> +		spin_lock(&napi_hash_lock);
>> +
>> +		/* 0 is not a valid id */
>> +		napi->napi_id = 0;
>> +		while (!napi->napi_id)
>> +			napi->napi_id = ++napi_gen_id;
>
> Suppose we're loading/unloading one driver repeatedly while another one
> remains loaded the whole time.  Then once napi_gen_id wraps around, the
> same ID can be assigned to multiple contexts.
>
> So far as I can see, assigning the same ID twice will just make polling
> stop working for one of the NAPI contexts; I don't think it causes a
> crash.  And it is exceedingly unlikely to happen in production.  But if
> you're going to the trouble of handling wrap-around at all, you'd better
> handle this.

OK


> [...]
>> +/* must be called under rcu_read_lock(), as we dont take a reference */
>> +struct napi_struct *napi_by_id(int napi_id)
>> +{
>> +	unsigned int hash = napi_id % HASH_SIZE(napi_hash);
> [...]
>
> napi_id should be declared unsigned int here, as elsewhere.  The
> division can't actually yield a negative result because HASH_SIZE() has
> type size_t and napi_id is promoted to match, but I had to go and look
> at hashtable.h to check that.

Good catch,

Thanks,
Eliezer

Patch

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 8f967e3..964648e 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -324,12 +324,15 @@  struct napi_struct {
 	struct sk_buff		*gro_list;
 	struct sk_buff		*skb;
 	struct list_head	dev_list;
+	struct hlist_node	napi_hash_node;
+	unsigned int		napi_id;
 };
 
 enum {
 	NAPI_STATE_SCHED,	/* Poll is scheduled */
 	NAPI_STATE_DISABLE,	/* Disable pending */
 	NAPI_STATE_NPSVC,	/* Netpoll - don't dequeue from poll_list */
+	NAPI_STATE_HASHED,	/* In NAPI hash */
 };
 
 enum gro_result {
@@ -446,6 +449,32 @@  extern void __napi_complete(struct napi_struct *n);
 extern void napi_complete(struct napi_struct *n);
 
 /**
+ *	napi_hash_add - add a NAPI to global hashtable
+ *	@napi: napi context
+ *
+ * generate a new napi_id and store a @napi under it in napi_hash
+ */
+extern void napi_hash_add(struct napi_struct *napi);
+
+/**
+ *	napi_hash_del - remove a NAPI from global table
+ *	@napi: napi context
+ *
+ * Warning: caller must observe rcu grace period
+ * before freeing memory containing @napi
+ */
+extern void napi_hash_del(struct napi_struct *napi);
+
+/**
+ *	napi_by_id - lookup a NAPI by napi_id
+ *	@napi_id: hashed napi_id
+ *
+ * lookup @napi_id in napi_hash table
+ * must be called under rcu_read_lock()
+ */
+extern struct napi_struct *napi_by_id(int napi_id);
+
+/**
  *	napi_disable - prevent NAPI from scheduling
  *	@n: napi context
  *
diff --git a/net/core/dev.c b/net/core/dev.c
index b2e9057..0f39481 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -129,6 +129,7 @@ 
 #include <linux/inetdevice.h>
 #include <linux/cpu_rmap.h>
 #include <linux/static_key.h>
+#include <linux/hashtable.h>
 
 #include "net-sysfs.h"
 
@@ -166,6 +167,12 @@  static struct list_head offload_base __read_mostly;
 DEFINE_RWLOCK(dev_base_lock);
 EXPORT_SYMBOL(dev_base_lock);
 
+/* protects napi_hash addition/deletion and napi_gen_id */
+DEFINE_SPINLOCK(napi_hash_lock);
+
+unsigned int napi_gen_id;
+DEFINE_HASHTABLE(napi_hash, 8);
+
 seqcount_t devnet_rename_seq;
 
 static inline void dev_base_seq_inc(struct net *net)
@@ -4136,6 +4143,53 @@  void napi_complete(struct napi_struct *n)
 }
 EXPORT_SYMBOL(napi_complete);
 
+void napi_hash_add(struct napi_struct *napi)
+{
+	if (!test_and_set_bit(NAPI_STATE_HASHED, &napi->state)) {
+
+		spin_lock(&napi_hash_lock);
+
+		/* 0 is not a valid id */
+		napi->napi_id = 0;
+		while (!napi->napi_id)
+			napi->napi_id = ++napi_gen_id;
+
+		hlist_add_head_rcu(&napi->napi_hash_node,
+			&napi_hash[napi->napi_id % HASH_SIZE(napi_hash)]);
+
+		spin_unlock(&napi_hash_lock);
+	}
+}
+EXPORT_SYMBOL_GPL(napi_hash_add);
+
+/* Warning : caller is responsible to make sure rcu grace period
+ * is respected before freeing memory containing @napi
+ */
+void napi_hash_del(struct napi_struct *napi)
+{
+	spin_lock(&napi_hash_lock);
+
+	if (test_and_clear_bit(NAPI_STATE_HASHED, &napi->state))
+		hlist_del_rcu(&napi->napi_hash_node);
+
+	spin_unlock(&napi_hash_lock);
+}
+EXPORT_SYMBOL_GPL(napi_hash_del);
+
+/* must be called under rcu_read_lock(), as we dont take a reference */
+struct napi_struct *napi_by_id(int napi_id)
+{
+	unsigned int hash = napi_id % HASH_SIZE(napi_hash);
+	struct napi_struct *napi;
+
+	hlist_for_each_entry_rcu(napi, &napi_hash[hash], napi_hash_node)
+		if (napi->napi_id == napi_id)
+			return napi;
+
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(napi_by_id);
+
 void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
 		    int (*poll)(struct napi_struct *, int), int weight)
 {
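The `NAPI_STATE_HASHED` bit is what makes `napi_hash_add()` and `napi_hash_del()` safe to call more than once: `test_and_set_bit()` lets only the first caller insert, and `test_and_clear_bit()` lets only one caller unlink. A userspace analogue using C11 `atomic_fetch_or()` and `atomic_fetch_and()` is sketched below. `struct model_napi`, the `model_*` functions and the `hash_adds` counter are hypothetical; the real functions also hold `napi_hash_lock` around the list update (the add path sets the bit before taking the lock, while the delete path tests and clears it under the lock).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Bit 3 matches the enum order in the hunk above
 * (SCHED=0, DISABLE=1, NPSVC=2, HASHED=3); the model does not rely on it. */
enum { MODEL_STATE_HASHED = 3 };

struct model_napi {
	atomic_ulong state;
	unsigned int hash_adds;	/* counts real insertions, for illustration */
};

/* Analogue of test_and_set_bit(): returns the bit's previous value. */
static bool model_test_and_set(int nr, atomic_ulong *addr)
{
	unsigned long mask = 1UL << nr;

	return atomic_fetch_or(addr, mask) & mask;
}

/* Analogue of test_and_clear_bit(): returns the bit's previous value. */
static bool model_test_and_clear(int nr, atomic_ulong *addr)
{
	unsigned long mask = 1UL << nr;

	return atomic_fetch_and(addr, ~mask) & mask;
}

void model_hashed_add(struct model_napi *n)
{
	/* only the caller that sees the bit clear performs the insertion */
	if (!model_test_and_set(MODEL_STATE_HASHED, &n->state))
		n->hash_adds++;
}

bool model_hashed_del(struct model_napi *n)
{
	/* true only if the context was actually hashed */
	return model_test_and_clear(MODEL_STATE_HASHED, &n->state);
}
```

Calling `model_hashed_add()` twice performs one insertion; the first `model_hashed_del()` reports that the context was hashed and the second reports that it was not.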