
[net-next] net: introduce dev_consume_skb_any()

Message ID 1386247508.30495.180.camel@edumazet-glaptop2.roam.corp.google.com
State Accepted, archived
Delegated to: David Miller

Commit Message

Eric Dumazet Dec. 5, 2013, 12:45 p.m. UTC
From: Eric Dumazet <edumazet@google.com>

Some network drivers use the dev_kfree_skb_any() and dev_kfree_skb_irq()
helpers to free skbs, both for dropped packets and for TX-completed ones.

We need to separate the two causes to get better diagnostics from
dropwatch or "perf record -e skb:kfree_skb".

This patch provides two new helpers, dev_consume_skb_any() and
dev_consume_skb_irq() to be used for consumed skbs.

__dev_kfree_skb_irq() is slightly optimized to remove one
atomic_dec_and_test() from the fast path, and to use this_cpu_{r|w} accessors.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/netdevice.h |   53 +++++++++++++++++++++++++++++-------
 net/core/dev.c            |   45 ++++++++++++++++++++----------
 2 files changed, 74 insertions(+), 24 deletions(-)
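
For illustration, here is how a hypothetical driver would use the new
helpers: the consume variant on the TX-completion path, the kfree variant
on a real drop. This is a sketch only; the mydrv_* names are made up and
not part of the patch.

static struct sk_buff *mydrv_reap_completed(struct net_device *dev);
/* ^ made-up helper standing in for the driver's own completion-ring walk */

/* TX-completion path: every skb reaped here reached the wire, so it is
 * "consumed", not dropped, and no longer shows up as a drop in
 * dropwatch or the skb:kfree_skb tracepoint.
 */
static void mydrv_tx_clean(struct net_device *dev)
{
	struct sk_buff *skb;

	while ((skb = mydrv_reap_completed(dev)) != NULL)
		dev_consume_skb_any(skb);
}

/* Error path: this skb never made it out, so keep the kfree variant
 * and let drop monitoring see it.
 */
static void mydrv_tx_error(struct sk_buff *skb)
{
	dev_kfree_skb_any(skb);
}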




Comments

Hannes Frederic Sowa Dec. 5, 2013, 2:13 p.m. UTC | #1
On Thu, Dec 05, 2013 at 04:45:08AM -0800, Eric Dumazet wrote:
> -		local_irq_save(flags);
> -		sd = &__get_cpu_var(softnet_data);
> -		skb->next = sd->completion_queue;
> -		sd->completion_queue = skb;
> -		raise_softirq_irqoff(NET_TX_SOFTIRQ);
> -		local_irq_restore(flags);
> +	if (likely(atomic_read(&skb->users) == 1)) {
> +		smp_rmb();

Could you give me a hint why this barrier is needed? IMHO the volatile
access in atomic_read should get rid of the control dependency so I
don't see a need for this barrier. Without the volatile access a
compiler-barrier would still suffice, I guess?


> +		atomic_set(&skb->users, 0);
> +	} else if (likely(!atomic_dec_and_test(&skb->users))) {
> +		return;

Or does this memory barrier deal with the part below this return?

Thanks,

  Hannes

Eric Dumazet Dec. 5, 2013, 2:45 p.m. UTC | #2
On Thu, 2013-12-05 at 15:13 +0100, Hannes Frederic Sowa wrote:
> On Thu, Dec 05, 2013 at 04:45:08AM -0800, Eric Dumazet wrote:
> > -		local_irq_save(flags);
> > -		sd = &__get_cpu_var(softnet_data);
> > -		skb->next = sd->completion_queue;
> > -		sd->completion_queue = skb;
> > -		raise_softirq_irqoff(NET_TX_SOFTIRQ);
> > -		local_irq_restore(flags);
> > +	if (likely(atomic_read(&skb->users) == 1)) {
> > +		smp_rmb();
> 
> Could you give me a hint why this barrier is needed? IMHO the volatile
> access in atomic_read should get rid of the control dependency so I
> don't see a need for this barrier. Without the volatile access a
> compiler-barrier would still suffice, I guess?

Please take a look at kfree_skb() implementation.

If you think a comment is needed there, please feel free to add it.


Eric Dumazet Dec. 5, 2013, 3:05 p.m. UTC | #3
On Thu, 2013-12-05 at 06:45 -0800, Eric Dumazet wrote:
> On Thu, 2013-12-05 at 15:13 +0100, Hannes Frederic Sowa wrote:
> > On Thu, Dec 05, 2013 at 04:45:08AM -0800, Eric Dumazet wrote:
> > > -		local_irq_save(flags);
> > > -		sd = &__get_cpu_var(softnet_data);
> > > -		skb->next = sd->completion_queue;
> > > -		sd->completion_queue = skb;
> > > -		raise_softirq_irqoff(NET_TX_SOFTIRQ);
> > > -		local_irq_restore(flags);
> > > +	if (likely(atomic_read(&skb->users) == 1)) {
> > > +		smp_rmb();
> > 
> > Could you give me a hint why this barrier is needed? IMHO the volatile
> > access in atomic_read should get rid of the control dependency so I
> > don't see a need for this barrier. Without the volatile access a
> > compiler-barrier would still suffice, I guess?
> 
> Please take a look at kfree_skb() implementation.
> 
> If you think a comment is needed there, please feel free to add it.
> 

My understanding of this (old) barrier here is that it pairs with an
implicit wmb in skb_get().

This probably needs something like:

static inline struct sk_buff *skb_get(struct sk_buff *skb)
{
	smp_mb__before_atomic_inc(); /* check {consume|kfree}_skb() */
	atomic_inc(&skb->users);
	return skb;
}



Hannes Frederic Sowa Dec. 5, 2013, 3:44 p.m. UTC | #4
On Thu, Dec 05, 2013 at 07:05:52AM -0800, Eric Dumazet wrote:
> On Thu, 2013-12-05 at 06:45 -0800, Eric Dumazet wrote:
> > On Thu, 2013-12-05 at 15:13 +0100, Hannes Frederic Sowa wrote:
> > > On Thu, Dec 05, 2013 at 04:45:08AM -0800, Eric Dumazet wrote:
> > > > -		local_irq_save(flags);
> > > > -		sd = &__get_cpu_var(softnet_data);
> > > > -		skb->next = sd->completion_queue;
> > > > -		sd->completion_queue = skb;
> > > > -		raise_softirq_irqoff(NET_TX_SOFTIRQ);
> > > > -		local_irq_restore(flags);
> > > > +	if (likely(atomic_read(&skb->users) == 1)) {
> > > > +		smp_rmb();
> > > 
> > > Could you give me a hint why this barrier is needed? IMHO the volatile
> > > access in atomic_read should get rid of the control dependency so I
> > > don't see a need for this barrier. Without the volatile access a
> > > compiler-barrier would still suffice, I guess?
> > 
> > Please take a look at kfree_skb() implementation.
> > 
> > If you think a comment is needed there, please feel free to add it.
> > 
> 
> My understanding of this (old) barrier here is that it pairs with an
> implicit wmb in skb_get().
> 
> This probably needs something like:
> 
> static inline struct sk_buff *skb_get(struct sk_buff *skb)
> {
> 	smp_mb__before_atomic_inc(); /* check {consume|kfree}_skb() */
> 	atomic_inc(&skb->users);
> 	return skb;
> }

Thanks for the pointer to kfree_skb. I found this commit which added the
barrier in kfree_skb (from history.git):

commit 09d3e84de438f217510b604a980befd07b0c8262
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date:   Sat Feb 5 03:23:27 2005 -0800

    [NET]: Add missing memory barrier to kfree_skb().
    
    Also kill kfree_skb_fast(), that is a relic from fast switching
    which was killed off years ago.
    
    The bug is that in the case where we do the atomic_read()
    optimization, we need to make sure that reads of skb state
    later in __kfree_skb() processing (particularly the skb->list
    BUG check) are not reordered to occur before the counter
    read by the cpu.
    
    Thanks to Olaf Kirch and Anton Blanchard for discovering
    and helping fix this bug.
    
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    Signed-off-by: David S. Miller <davem@davemloft.net>

It makes some sense, but I have not grasped the whole ->users dependency
picture yet. I guess the barrier is only needed when the refcount drops
to 0, and we don't necessarily need one when incrementing ->users.

Thank you,

  Hannes

Eric Dumazet Dec. 5, 2013, 4:38 p.m. UTC | #5
On Thu, 2013-12-05 at 16:44 +0100, Hannes Frederic Sowa wrote:

> It makes some sense, but I have not grasped the whole ->users dependency
> picture yet. I guess the barrier is only needed when the refcount drops
> to 0, and we don't necessarily need one when incrementing ->users.

If you are the only user of this skb, no smp barrier is needed at
all.

The problem comes when another cpu is working on the skb, and finally
releases its reference to it.

Before releasing its reference, it must commit all changes it might have
made to the skb. Otherwise, another cpu might read stale data.

The smp_wmb() is provided by atomic_dec_and_test(), as it implies a
full barrier.

So the smp_rmb() pairs with the barrier implied by atomic_dec_and_test().
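
A minimal sketch of this pairing, under the atomic_t skb->users of this
era (illustrative only; the cpu_a_put()/cpu_b_free() names are made up):

/* CPU A modifies the skb, then drops its reference. atomic_dec_and_test()
 * implies a full barrier, so the write to skb state is visible to other
 * cpus before the decremented counter value is.
 */
static void cpu_a_put(struct sk_buff *skb)
{
	skb->priority = 6;		/* some write to skb state */
	if (atomic_dec_and_test(&skb->users))
		__kfree_skb(skb);
}

/* CPU B observes users == 1 and takes the fast path. The smp_rmb() pairs
 * with CPU A's full barrier: without it, reads of skb state in the free
 * path could be satisfied before the counter read and return stale data.
 */
static void cpu_b_free(struct sk_buff *skb)
{
	if (likely(atomic_read(&skb->users) == 1)) {
		smp_rmb();
		atomic_set(&skb->users, 0);
		__kfree_skb(skb);
	}
}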



Hannes Frederic Sowa Dec. 5, 2013, 4:54 p.m. UTC | #6
On Thu, Dec 05, 2013 at 08:38:05AM -0800, Eric Dumazet wrote:
> On Thu, 2013-12-05 at 16:44 +0100, Hannes Frederic Sowa wrote:
> 
> > It makes some sense, but I have not grasped the whole ->users dependency
> > picture yet. I guess the barrier is only needed when the refcount drops
> > to 0, and we don't necessarily need one when incrementing ->users.
> 
> If you are the only user of this skb, no smp barrier is needed at
> all.
> 
> The problem comes when another cpu is working on the skb, and finally
> releases its reference to it.
> 
> Before releasing its reference, it must commit all changes it might have
> made to the skb. Otherwise, another cpu might read stale data.
> 
> The smp_wmb() is provided by atomic_dec_and_test(), as it implies a
> full barrier.
> 
> So the smp_rmb() pairs with the barrier implied by atomic_dec_and_test().

Ha, it all makes sense now! Thanks, Eric!

(Sorry for the noise, but I find this kind of problem very interesting.)

David Miller Dec. 6, 2013, 8:24 p.m. UTC | #7
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 05 Dec 2013 04:45:08 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> Some network drivers use the dev_kfree_skb_any() and dev_kfree_skb_irq()
> helpers to free skbs, both for dropped packets and for TX-completed ones.
> 
> We need to separate the two causes to get better diagnostics from
> dropwatch or "perf record -e skb:kfree_skb".
> 
> This patch provides two new helpers, dev_consume_skb_any() and
> dev_consume_skb_irq() to be used for consumed skbs.
> 
> __dev_kfree_skb_irq() is slightly optimized to remove one
> atomic_dec_and_test() from the fast path, and to use this_cpu_{r|w} accessors.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, thanks Eric.

Patch

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 7f0ed423a360..c6d64d20050c 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2368,17 +2368,52 @@  static inline int netif_copy_real_num_queues(struct net_device *to_dev,
 #define DEFAULT_MAX_NUM_RSS_QUEUES	(8)
 int netif_get_num_default_rss_queues(void);
 
-/* Use this variant when it is known for sure that it
- * is executing from hardware interrupt context or with hardware interrupts
- * disabled.
- */
-void dev_kfree_skb_irq(struct sk_buff *skb);
+enum skb_free_reason {
+	SKB_REASON_CONSUMED,
+	SKB_REASON_DROPPED,
+};
+
+void __dev_kfree_skb_irq(struct sk_buff *skb, enum skb_free_reason reason);
+void __dev_kfree_skb_any(struct sk_buff *skb, enum skb_free_reason reason);
 
-/* Use this variant in places where it could be invoked
- * from either hardware interrupt or other context, with hardware interrupts
- * either disabled or enabled.
+/*
+ * It is not allowed to call kfree_skb() or consume_skb() from hardware
+ * interrupt context or with hardware interrupts being disabled.
+ * (in_irq() || irqs_disabled())
+ *
+ * We provide four helpers that can be used in following contexts :
+ *
+ * dev_kfree_skb_irq(skb) when caller drops a packet from irq context,
+ *  replacing kfree_skb(skb)
+ *
+ * dev_consume_skb_irq(skb) when caller consumes a packet from irq context.
+ *  Typically used in place of consume_skb(skb) in TX completion path
+ *
+ * dev_kfree_skb_any(skb) when caller doesn't know its current irq context,
+ *  replacing kfree_skb(skb)
+ *
+ * dev_consume_skb_any(skb) when caller doesn't know its current irq context,
+ *  and consumed a packet. Used in place of consume_skb(skb)
  */
-void dev_kfree_skb_any(struct sk_buff *skb);
+static inline void dev_kfree_skb_irq(struct sk_buff *skb)
+{
+	__dev_kfree_skb_irq(skb, SKB_REASON_DROPPED);
+}
+
+static inline void dev_consume_skb_irq(struct sk_buff *skb)
+{
+	__dev_kfree_skb_irq(skb, SKB_REASON_CONSUMED);
+}
+
+static inline void dev_kfree_skb_any(struct sk_buff *skb)
+{
+	__dev_kfree_skb_any(skb, SKB_REASON_DROPPED);
+}
+
+static inline void dev_consume_skb_any(struct sk_buff *skb)
+{
+	__dev_kfree_skb_any(skb, SKB_REASON_CONSUMED);
+}
 
 int netif_rx(struct sk_buff *skb);
 int netif_rx_ni(struct sk_buff *skb);
diff --git a/net/core/dev.c b/net/core/dev.c
index ba3b7ea5ebb3..aa54a742f392 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2145,30 +2145,42 @@  void __netif_schedule(struct Qdisc *q)
 }
 EXPORT_SYMBOL(__netif_schedule);
 
-void dev_kfree_skb_irq(struct sk_buff *skb)
+struct dev_kfree_skb_cb {
+	enum skb_free_reason reason;
+};
+
+static struct dev_kfree_skb_cb *get_kfree_skb_cb(const struct sk_buff *skb)
+{
+	return (struct dev_kfree_skb_cb *)skb->cb;
+}
+
+void __dev_kfree_skb_irq(struct sk_buff *skb, enum skb_free_reason reason)
 {
-	if (atomic_dec_and_test(&skb->users)) {
-		struct softnet_data *sd;
-		unsigned long flags;
+	unsigned long flags;
 
-		local_irq_save(flags);
-		sd = &__get_cpu_var(softnet_data);
-		skb->next = sd->completion_queue;
-		sd->completion_queue = skb;
-		raise_softirq_irqoff(NET_TX_SOFTIRQ);
-		local_irq_restore(flags);
+	if (likely(atomic_read(&skb->users) == 1)) {
+		smp_rmb();
+		atomic_set(&skb->users, 0);
+	} else if (likely(!atomic_dec_and_test(&skb->users))) {
+		return;
 	}
+	get_kfree_skb_cb(skb)->reason = reason;
+	local_irq_save(flags);
+	skb->next = __this_cpu_read(softnet_data.completion_queue);
+	__this_cpu_write(softnet_data.completion_queue, skb);
+	raise_softirq_irqoff(NET_TX_SOFTIRQ);
+	local_irq_restore(flags);
 }
-EXPORT_SYMBOL(dev_kfree_skb_irq);
+EXPORT_SYMBOL(__dev_kfree_skb_irq);
 
-void dev_kfree_skb_any(struct sk_buff *skb)
+void __dev_kfree_skb_any(struct sk_buff *skb, enum skb_free_reason reason)
 {
 	if (in_irq() || irqs_disabled())
-		dev_kfree_skb_irq(skb);
+		__dev_kfree_skb_irq(skb, reason);
 	else
 		dev_kfree_skb(skb);
 }
-EXPORT_SYMBOL(dev_kfree_skb_any);
+EXPORT_SYMBOL(__dev_kfree_skb_any);
 
 
 /**
@@ -3306,7 +3318,10 @@  static void net_tx_action(struct softirq_action *h)
 			clist = clist->next;
 
 			WARN_ON(atomic_read(&skb->users));
-			trace_kfree_skb(skb, net_tx_action);
+			if (likely(get_kfree_skb_cb(skb)->reason == SKB_REASON_CONSUMED))
+				trace_consume_skb(skb);
+			else
+				trace_kfree_skb(skb, net_tx_action);
 			__kfree_skb(skb);
 		}
 	}