[v3] net: batch skb dequeueing from softnet input_pkt_queue

Message ID 1271238738-8386-1-git-send-email-xiaosuo@gmail.com
State Rejected, archived
Delegated to: David Miller

Commit Message

Changli Gao April 14, 2010, 9:52 a.m. UTC
batch skb dequeueing from softnet input_pkt_queue

batch skb dequeueing from softnet input_pkt_queue to reduce potential lock
contention and irq disabling/enabling.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
---
 include/linux/netdevice.h |    1 +
 net/core/dev.c            |   56 ++++++++++++++++++++++++++++++++--------------
 2 files changed, 40 insertions(+), 17 deletions(-)

Comments

jamal April 14, 2010, 11:58 a.m. UTC | #1
On Wed, 2010-04-14 at 17:52 +0800, Changli Gao wrote:
> batch skb dequeueing from softnet input_pkt_queue
> 
> batch skb dequeueing from softnet input_pkt_queue to reduce potential lock
> contention and irq disabling/enabling.
> 
> Signed-off-by: Changli Gao <xiaosuo@gmail.com>

It seems we are now going to generate a lot more IPIs with such a
change. At least this is what I am imagining.
CPU0: packet comes in, queue empty, generate an IPI to CPU1
CPU0: second packet comes in, enqueue
CPU1: grab two packets to process and run with them
CPU0: packet comes in, queue empty, generate an IPI to CPU1
...

IPIs add to latency (refer to my other email). Did you test this
to reach some conclusion that it improves things, or was it just by
inspection?

cheers,
jamal

Changli Gao April 14, 2010, 12:13 p.m. UTC | #2
On Wed, Apr 14, 2010 at 7:58 PM, jamal <hadi@cyberus.ca> wrote:
>
> It seems we are now going to generate a lot more IPIs with such a
> change. At least this is what I am imagining.
> CPU0: packet comes in, queue empty, generate an IPI to CPU1
> CPU0: second packet comes in, enqueue
> CPU1: grab two packets to process and run with them
> CPU0: packet comes in, queue empty, generate an IPI to CPU1

No extra IPI is needed.

+       qlen = queue->input_pkt_queue.qlen + queue->processing_queue.qlen;
+       if (qlen <= netdev_max_backlog) {
+               if (qlen) {

The packets in processing_queue are counted too.
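
To illustrate, here is a minimal user-space model of that check
(hypothetical names, not kernel code): the IPI fires only on the
empty-to-busy transition, so a packet arriving while CPU1 is still
draining its processing_queue raises no extra IPI.

#include <stdio.h>
#include <stdbool.h>

/* Simplified model of the enqueue_to_backlog() accounting. */
struct backlog_model {
	int input_qlen;      /* input_pkt_queue.qlen */
	int processing_qlen; /* processing_queue.qlen */
	int ipis;            /* IPIs a remote CPU would raise */
};

/* A remote CPU enqueues one skb; an IPI is needed only when the
 * backlog looks empty. count_processing selects the accounting mode. */
static void enqueue(struct backlog_model *q, bool count_processing)
{
	int qlen = q->input_qlen +
		   (count_processing ? q->processing_qlen : 0);

	if (qlen == 0)
		q->ipis++;	/* would napi_schedule() -> IPI */
	q->input_qlen++;
}

static int ipis_for(bool count_processing)
{
	struct backlog_model q = { 0, 0, 0 };

	enqueue(&q, count_processing);    /* first packet: one IPI */
	q.processing_qlen = q.input_qlen; /* target splices whole queue */
	q.input_qlen = 0;
	enqueue(&q, count_processing);    /* arrives while draining */
	return q.ipis;
}

int main(void)
{
	printf("input_pkt_queue only: %d IPIs\n", ipis_for(false)); /* 2 */
	printf("both queues counted:  %d IPIs\n", ipis_for(true));  /* 1 */
	return 0;
}

With only input_pkt_queue counted, the second enqueue would see an
empty queue and raise a second IPI - the case you describe.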

>
> IPIs add to latency (refer to my other email). Did you test this
> to reach some conclusion that it improves things, or was it just by
> inspection?
>

:( only inspection.

jamal April 14, 2010, 12:28 p.m. UTC | #3
On Wed, 2010-04-14 at 20:13 +0800, Changli Gao wrote:
> On Wed, Apr 14, 2010 at 7:58 PM, jamal <hadi@cyberus.ca> wrote:

> No extra IPI is needed.
> 
> +       qlen = queue->input_pkt_queue.qlen + queue->processing_queue.qlen;
> +       if (qlen <= netdev_max_backlog) {
> +               if (qlen) {
> 
> The packets in processing_queue are counted too.

OK - looks reasonable.

> > IPIs add to latency (refer to my other email). Did you test this
> > to reach some conclusion that it improves things, or was it just by
> > inspection?
> >
> 
> :( only inspection.

I am probably being pushy, but one simple test for the latency of a
single flow is:

from machine 1, send ping -f

on the rps machine:
Base test: no rps on (a fresh boot with no sysctls should do fine)
Test 1: irq affinity on cpuX, rps to cpuY
Test 2: repeat test 1 with your change.

It should show no difference between tests 1 and 2. If it shows an
improvement, even better - but showing worse latency is bad.

cheers,
jamal

Eric Dumazet April 14, 2010, 3:20 p.m. UTC | #4
On Wednesday 14 April 2010 at 17:52 +0800, Changli Gao wrote:
> batch skb dequeueing from softnet input_pkt_queue
> 
> batch skb dequeueing from softnet input_pkt_queue to reduce potential lock
> contention and irq disabling/enabling.
> 
> Signed-off-by: Changli Gao <xiaosuo@gmail.com>

Adding stop_machine() with no explanation?

No ack of my previous comments and suggestions, and still the same logic?

Are we supposed to read the patch, test it, run some benchmarks, correct
bugs, and say Amen?

This is becoming silly, if you ask me.

This is a NACK of this patch, obviously.

Changli Gao April 14, 2010, 11:13 p.m. UTC | #5
On Wed, Apr 14, 2010 at 11:20 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wednesday 14 April 2010 at 17:52 +0800, Changli Gao wrote:
>> batch skb dequeueing from softnet input_pkt_queue
>>
>> batch skb dequeueing from softnet input_pkt_queue to reduce potential lock
>> contention and irq disabling/enabling.
>>
>> Signed-off-by: Changli Gao <xiaosuo@gmail.com>
>
> Adding stop_machine() with no explanation?

stop_machine() is added to flush the processing_queue. Because the old
flush_backlog() ran in IRQ context, it couldn't touch things that are
only valid in softirq context.

>
> No ack of my previous comments and suggestions, and still the same logic?

In this patch, the volatile variable flush_processing_queue is
removed, and the corresponding lines are removed too.

Oh, I should splice the old message back, as stop_machine is used
instead. So the processing queue will be removed from softnet_data,
and a single int counter will be used instead to count the packets
which are being processed.

One issue you raised is the potential cache miss when summing the two
queue lengths in the enqueue function. It won't happen all the time;
in fact, I think it should happen seldom, and it is the hardware's job
to cache frequently used data.
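
A rough user-space model of that direction (a hypothetical sketch with
invented names, not the posted patch): the second queue disappears, and
the enqueue side sums the real queue length with the in-flight counter.

#include <stdio.h>

static int input_qlen;	/* stands in for input_pkt_queue.qlen */
static int processing;	/* the proposed int counter */

/* What the enqueue side would compare against netdev_max_backlog and
 * test against zero before raising an IPI; reading 'processing' from
 * a remote CPU is the potential cache miss discussed above. */
static int backlog_len(void)
{
	return input_qlen + processing;
}

static void process_backlog_model(void)
{
	/* under the queue lock: claim the whole batch at once */
	processing = input_qlen;
	input_qlen = 0;

	/* outside the lock: deliver the packets one by one */
	while (processing > 0)
		processing--;	/* one __netif_receive_skb() per skb */
}

int main(void)
{
	input_qlen = 4;
	printf("before drain: backlog_len = %d\n", backlog_len()); /* 4 */
	process_backlog_model();
	printf("after drain:  backlog_len = %d\n", backlog_len()); /* 0 */
	return 0;
}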

>
> Are we supposed to read the patch, test it, run some benchmarks, correct
> bugs, and say Amen?
>

OK, I'll test it.

Patch

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index d1a21b5..898bc62 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1335,6 +1335,7 @@ struct softnet_data {
 	struct call_single_data	csd ____cacheline_aligned_in_smp;
 #endif
 	struct sk_buff_head	input_pkt_queue;
+	struct sk_buff_head	processing_queue;
 	struct napi_struct	backlog;
 };
 
diff --git a/net/core/dev.c b/net/core/dev.c
index a10a216..c635a71 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -131,6 +131,7 @@ 
 #include <linux/random.h>
 #include <trace/events/napi.h>
 #include <linux/pci.h>
+#include <linux/stop_machine.h>
 
 #include "net-sysfs.h"
 
@@ -2332,6 +2333,7 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu)
 {
 	struct softnet_data *queue;
 	unsigned long flags;
+	u32 qlen;
 
 	queue = &per_cpu(softnet_data, cpu);
 
@@ -2339,8 +2341,9 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu)
 	__get_cpu_var(netdev_rx_stat).total++;
 
 	rps_lock(queue);
-	if (queue->input_pkt_queue.qlen <= netdev_max_backlog) {
-		if (queue->input_pkt_queue.qlen) {
+	qlen = queue->input_pkt_queue.qlen + queue->processing_queue.qlen;
+	if (qlen <= netdev_max_backlog) {
+		if (qlen) {
 enqueue:
 			__skb_queue_tail(&queue->input_pkt_queue, skb);
 			rps_unlock(queue);
@@ -2791,19 +2794,31 @@ int netif_receive_skb(struct sk_buff *skb)
 EXPORT_SYMBOL(netif_receive_skb);
 
 /* Network device is going away, flush any packets still pending  */
-static void flush_backlog(void *arg)
+static void __flush_backlog(struct sk_buff_head *head, struct net_device *dev)
 {
-	struct net_device *dev = arg;
-	struct softnet_data *queue = &__get_cpu_var(softnet_data);
 	struct sk_buff *skb, *tmp;
 
-	rps_lock(queue);
-	skb_queue_walk_safe(&queue->input_pkt_queue, skb, tmp)
+	skb_queue_walk_safe(head, skb, tmp) {
 		if (skb->dev == dev) {
-			__skb_unlink(skb, &queue->input_pkt_queue);
+			__skb_unlink(skb, head);
 			kfree_skb(skb);
 		}
-	rps_unlock(queue);
+	}
+}
+
+static int flush_backlog(void *arg)
+{
+	struct net_device *dev = arg;
+	struct softnet_data *queue;
+	int cpu;
+
+	for_each_online_cpu(cpu) {
+		queue = &per_cpu(softnet_data, cpu);
+		__flush_backlog(&queue->input_pkt_queue, dev);
+		__flush_backlog(&queue->processing_queue, dev);
+	}
+
+	return 0;
 }
 
 static int napi_gro_complete(struct sk_buff *skb)
@@ -3118,20 +3133,23 @@ static int process_backlog(struct napi_struct *napi, int quota)
 
 		local_irq_disable();
 		rps_lock(queue);
-		skb = __skb_dequeue(&queue->input_pkt_queue);
-		if (!skb) {
+		skb_queue_splice_tail_init(&queue->input_pkt_queue,
+					   &queue->processing_queue);
+		if (skb_queue_empty(&queue->processing_queue)) {
 			__napi_complete(napi);
 			rps_unlock(queue);
 			local_irq_enable();
-			break;
+			return work;
 		}
 		rps_unlock(queue);
 		local_irq_enable();
 
-		__netif_receive_skb(skb);
-	} while (++work < quota && jiffies == start_time);
-
-	return work;
+		while ((skb = __skb_dequeue(&queue->processing_queue))) {
+			__netif_receive_skb(skb);
+			if (++work >= quota || jiffies != start_time)
+				return work;
+		}
+	} while (1);
 }
 
 /**
@@ -5027,7 +5045,7 @@ void netdev_run_todo(void)
 
 		dev->reg_state = NETREG_UNREGISTERED;
 
-		on_each_cpu(flush_backlog, dev, 1);
+		stop_machine(flush_backlog, dev, NULL);
 
 		netdev_wait_allrefs(dev);
 
@@ -5487,6 +5505,9 @@ static int dev_cpu_callback(struct notifier_block *nfb,
 	raise_softirq_irqoff(NET_TX_SOFTIRQ);
 	local_irq_enable();
 
+	while ((skb = __skb_dequeue(&oldsd->processing_queue)))
+		netif_rx(skb);
+
 	/* Process offline CPU's input_pkt_queue */
 	while ((skb = __skb_dequeue(&oldsd->input_pkt_queue)))
 		netif_rx(skb);
@@ -5709,6 +5730,7 @@ static int __init net_dev_init(void)
 
 		queue = &per_cpu(softnet_data, i);
 		skb_queue_head_init(&queue->input_pkt_queue);
+		skb_queue_head_init(&queue->processing_queue);
 		queue->completion_queue = NULL;
 		INIT_LIST_HEAD(&queue->poll_list);