
softirq: punt to ksoftirqd if __do_softirq recently looped

Message ID 20140410115706.662fb5e7@annuminas.surriel.com
State RFC, archived
Delegated to: David Miller

Commit Message

Rik van Riel April 10, 2014, 3:57 p.m. UTC
Jiri noticed that netperf throughput had gotten worse in recent years,
for smaller message sizes. In the past, ksoftirqd would take around 80%
of a CPU, and netserver would take around 100% of another CPU.

On current kernels, sometimes all the softirq processing is done in the
context of the netperf process, which can result in as much as a 50%
performance drop, due to netserver spending all its CPU time "delivering"
packets to a socket it rarely empties, and dropping the packets on the
floor as a result.

This seems silly in an age where even cell phones are multi-core, and
we could simply let the ksoftirqd thread handle the softirq load, so
the scheduler can migrate the userspace task to another CPU.

This patch accomplishes that in a very simplistic way. The code
remembers when __do_softirq last looped, and will punt softirq
handling to ksoftirqd if another softirq happens in the same jiffie.

Netperf results:

			without patch		with patch
UDP_STREAM      1472    957.17 / 954.18         957.15 / 951.73
UDP_STREAM      978     936.85 / 930.06         936.84 / 927.63
UDP_STREAM      466     875.98 / 865.62         875.98 / 868.65
UDP_STREAM      210     760.88 / 748.70         760.88 / 748.61
UDP_STREAM      82      554.06 / 329.96         554.06 / 505.95
                        unstable ^^^^^^
UDP_STREAM      18      158.99 / 108.95         160.73 / 112.68

Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Tested-by: Jiri Benc <jbenc@redhat.com>
Reported-by: Jiri Benc <jbenc@redhat.com>
---
 kernel/softirq.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)



Comments

Eric Dumazet April 10, 2014, 6:15 p.m. UTC | #1
On Thu, 2014-04-10 at 11:57 -0400, Rik van Riel wrote:
> Jiri noticed that netperf throughput had gotten worse in recent years,
> for smaller message sizes. In the past, ksoftirqd would take around 80%
> of a CPU, and netserver would take around 100% of another CPU.
> 
> On current kernels, sometimes all the softirq processing is done in the
> context of the netperf process, which can result in as much as a 50%
> performance drop, due to netserver spending all its CPU time "delivering"
> packets to a socket it rarely empties, and dropping the packets on the
> floor as a result.
> 
> This seems silly in an age where even cell phones are multi-core, and
> we could simply let the ksoftirqd thread handle the softirq load, so
> the scheduler can migrate the userspace task to another CPU.
> 
> This patch accomplishes that in a very simplistic way. The code
> remembers when __do_softirq last looped, and will punt softirq
> handling to ksoftirqd if another softirq happens in the same jiffie.
> 
> Netperf results:
> 
> 			without patch		with patch
> UDP_STREAM      1472    957.17 / 954.18         957.15 / 951.73
> UDP_STREAM      978     936.85 / 930.06         936.84 / 927.63
> UDP_STREAM      466     875.98 / 865.62         875.98 / 868.65
> UDP_STREAM      210     760.88 / 748.70         760.88 / 748.61
> UDP_STREAM      82      554.06 / 329.96         554.06 / 505.95
>                         unstable ^^^^^^
> UDP_STREAM      18      158.99 / 108.95         160.73 / 112.68
> 
> [...]

I guess this is the tradeoff between latency and throughput.

Have you tried TCP_RR / UDP_RR tests with one or multiple instances?
And have you tried drivers that use deferred skb freeing (hard irq
calling the TX completion handler, then dev_kfree_skb_any() scheduling
the TX softirq), which increases the chance of a non-zero
local_softirq_pending()?
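
For illustration, a TX completion handler along those lines might look
like this (an untested sketch; the foo_* names are hypothetical):

	/*
	 * Hypothetical TX completion handler: it runs in hard irq
	 * context, so dev_kfree_skb_any() defers the actual free to
	 * the NET_TX softirq via the per-cpu completion queue, leaving
	 * local_softirq_pending() non-zero on irq exit.
	 */
	static irqreturn_t foo_tx_irq(int irq, void *dev_id)
	{
		struct foo_priv *priv = dev_id;
		struct sk_buff *skb;

		while ((skb = foo_next_completed_skb(priv)) != NULL)
			dev_kfree_skb_any(skb);

		return IRQ_HANDLED;
	}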

Calling the skb destructor on a different CPU can have a huge false
sharing effect. A TCP socket is really big.

Your UDP_STREAM test stresses only the RX part, and UDP sockets don't
use the complex callbacks that TCP sockets use.



David Miller April 11, 2014, 8:33 p.m. UTC | #2
From: Rik van Riel <riel@redhat.com>
Date: Thu, 10 Apr 2014 11:57:06 -0400

> @@ -330,7 +334,11 @@ void irq_enter(void)
>  
>  static inline void invoke_softirq(void)
>  {
> -	if (!force_irqthreads) {
> +	/*
> +	 * If force_irqthreads is set, or if we looped in __do_softirq this
> +	 * jiffie, punt to ksoftirqd to prevent userland starvation.
> +	 */
> +	if (!force_irqthreads && this_cpu_read(softirq_looped) != jiffies) {

If we do this, which I'm not convinced of yet, I think we should use two
jiffies as the cutoff.

Because if we are at the tail end of one jiffie, we'll prematurely go
to ksoftirqd when we really have no reason to do so.
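
Something like this (an untested sketch of one reading of that
suggestion: treat the loop stamp as fresh for two jiffies, so a stamp
written at the tail end of one jiffy still covers roughly a full jiffy):

	/*
	 * Hypothetical two-jiffy cutoff: run softirqs inline only if
	 * __do_softirq has not looped within the last two jiffies
	 * (this jiffy or the previous one).  time_before() copes with
	 * jiffies wraparound.
	 */
	if (!force_irqthreads &&
	    !time_before(jiffies, this_cpu_read(softirq_looped) + 2)) {
		/* ... execute softirq inline as before ... */
	}
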
Rik van Riel April 12, 2014, 12:02 a.m. UTC | #3
On 04/11/2014 04:33 PM, David Miller wrote:
> From: Rik van Riel <riel@redhat.com>
> Date: Thu, 10 Apr 2014 11:57:06 -0400
> 
>> @@ -330,7 +334,11 @@ void irq_enter(void)
>>  
>>  static inline void invoke_softirq(void)
>>  {
>> -	if (!force_irqthreads) {
>> +	/*
>> +	 * If force_irqthreads is set, or if we looped in __do_softirq this
>> +	 * jiffie, punt to ksoftirqd to prevent userland starvation.
>> +	 */
>> +	if (!force_irqthreads && this_cpu_read(softirq_looped) != jiffies) {
> 
> If we do this, which I'm not convinced of yet, I think we should use two
> jiffies as the cutoff.

I am not fully convinced, either.  This patch mostly just illustrates
the issue, and gives something that solves Jiri's immediate problem.

It is quite likely that there is a better way to solve the underlying
problem of softirq handling starving out userspace processing, whether
by moving the userspace process elsewhere or by shifting softirq
processing to ksoftirqd.

A working patch seems to be one of the better ways to start a
discussion, though.

If anybody has a nicer idea on how to solve the problem, I'd even be
willing to implement your idea, and give Jiri another patch to test :)

Patch

diff --git a/kernel/softirq.c b/kernel/softirq.c
index 787b3a0..020be2f 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -56,6 +56,7 @@ EXPORT_SYMBOL(irq_stat);
 static struct softirq_action softirq_vec[NR_SOFTIRQS] __cacheline_aligned_in_smp;
 
 DEFINE_PER_CPU(struct task_struct *, ksoftirqd);
+DEFINE_PER_CPU(unsigned long, softirq_looped);
 
 char *softirq_to_name[NR_SOFTIRQS] = {
 	"HI", "TIMER", "NET_TX", "NET_RX", "BLOCK", "BLOCK_IOPOLL",
@@ -271,6 +272,9 @@ asmlinkage void __do_softirq(void)
 
 	pending = local_softirq_pending();
 	if (pending) {
+		/* Still busy? Remember this for invoke_softirq() below... */
+		this_cpu_write(softirq_looped, jiffies);
+
 		if (time_before(jiffies, end) && !need_resched() &&
 		    --max_restart)
 			goto restart;
@@ -330,7 +334,11 @@ void irq_enter(void)
 
 static inline void invoke_softirq(void)
 {
-	if (!force_irqthreads) {
+	/*
+	 * If force_irqthreads is set, or if we looped in __do_softirq this
+	 * jiffie, punt to ksoftirqd to prevent userland starvation.
+	 */
+	if (!force_irqthreads && this_cpu_read(softirq_looped) != jiffies) {
 		/*
 		 * We can safely execute softirq on the current stack if
 		 * it is the irq stack, because it should be near empty