From patchwork Fri Feb 27 16:08:04 2009
X-Patchwork-Submitter: Eric Dumazet
X-Patchwork-Id: 23817
X-Patchwork-Delegate: davem@davemloft.net
Message-ID: <49A80FE4.6030508@cosmosbay.com>
Date: Fri, 27 Feb 2009 17:08:04 +0100
From: Eric Dumazet
User-Agent: Thunderbird 2.0.0.19 (Windows/20081209)
To: "Paul E. McKenney"
CC: Stephen Hemminger, David Miller, Patrick McHardy, Rick Jones,
 netdev@vger.kernel.org, netfilter-devel@vger.kernel.org, linux kernel
Subject: [PATCH] rcu: increment quiescent state counter in ksoftirqd()
References: <20090218051906.174295181@vyatta.com>
 <20090218052747.321329022@vyatta.com> <20090219114719.560999b5@extreme>
 <499DEF49.3040602@cosmosbay.com> <49A7F262.8040805@cosmosbay.com>
In-Reply-To: <49A7F262.8040805@cosmosbay.com>

Eric Dumazet wrote:
> Eric Dumazet wrote:
>> Stephen Hemminger wrote:
>>> The reader/writer lock in ip_tables is acquired in the critical path
>>> of processing packets and is one of the reasons just loading iptables
>>> can cause a 20% performance loss. The rwlock serves two functions:
>>>
>>> 1) It prevents changes to table state (xt_replace) while the table is
>>>    in use. This is now handled by using RCU on the xt_table: when the
>>>    table is replaced, the new table(s) are put in place and the old
>>>    table(s) are freed after an RCU grace period.
>>>
>>> 2) It provides synchronization when accessing the counter values.
>>>    This is now handled by swapping in new table_info entries for each
>>>    cpu, then summing the old values and putting the result back onto
>>>    one cpu. On a busy system this may cause sampling to occur at
>>>    different times on each cpu, but no packet/byte counts are lost in
>>>    the process.
>>>
>>> Signed-off-by: Stephen Hemminger
>>
>> Acked-by: Eric Dumazet
>>
>> Successfully tested on my dual quad core machine too, but iptables only
>> (no ipv6 here).
>>
>> BTW, my new "tbench 8" result is 2450 MB/s (it was 2150 MB/s not so
>> long ago).
>>
>> Thanks Stephen, that's very cool stuff, yet another rwlock out of the
>> kernel :)
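The replace-then-free scheme Stephen describes above is the classic RCU
publish/retire idiom. A minimal sketch of that idiom in kernel style follows;
the names (struct my_table, cur_table, table_replace) are made up for
illustration and are not the actual x_tables code:

#include <linux/rcupdate.h>
#include <linux/mutex.h>
#include <linux/slab.h>

struct my_table {
	unsigned int size;
	void *entries;
};

static struct my_table *cur_table;	/* read by the packet path under RCU */
static DEFINE_MUTEX(table_mutex);	/* serializes writers only */

/* Reader side (per-packet fast path): no rwlock, only an RCU read section. */
static unsigned int table_lookup(void)
{
	struct my_table *t;
	unsigned int size;

	rcu_read_lock();
	t = rcu_dereference(cur_table);
	size = t->size;			/* use the table */
	rcu_read_unlock();
	return size;
}

/* Writer side (xt_replace-style): publish the new table, then wait one
 * grace period so no cpu can still hold a reference to the old one. */
static void table_replace(struct my_table *newt)
{
	struct my_table *old;

	mutex_lock(&table_mutex);
	old = cur_table;
	rcu_assign_pointer(cur_table, newt);	/* readers now see newt */
	mutex_unlock(&table_mutex);

	synchronize_rcu();	/* all pre-existing readers have finished */
	kfree(old);		/* safe: the old table is unreachable */
}

The packet path now pays only the cost of rcu_read_lock()/rcu_read_unlock(),
which is essentially free on non-preempt configurations; that is what removes
the rwlock from the hot path.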
> While testing multicast flooding stuff, I found that "iptables -nvL" can
> have a *very* slow response time on my dual quad core machine...
>
> # time iptables -nvL
> Chain INPUT (policy ACCEPT 416M packets, 64G bytes)
>  pkts bytes target prot opt in  out  source  destination
>
> Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
>  pkts bytes target prot opt in  out  source  destination
>
> Chain OUTPUT (policy ACCEPT 401M packets, 62G bytes)
>  pkts bytes target prot opt in  out  source  destination
>
> real	0m1.810s   <<<< HERE >>>>
> user	0m0.000s
> sys	0m0.001s
>
> CONFIG_NO_HZ=y
> CONFIG_HZ_1000=y
> CONFIG_HZ=1000
>
> One cpu is 100% handling softirqs, could it be the problem?
>
> Cpu0 :  1.0%us, 14.7%sy,  0.0%ni, 83.3%id,  0.0%wa,  0.0%hi,   1.0%si,  0.0%st
> Cpu1 :  3.6%us, 23.2%sy,  0.0%ni, 71.6%id,  0.0%wa,  0.0%hi,   1.7%si,  0.0%st
> Cpu2 :  0.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi, 100.0%si,  0.0%st
> Cpu3 :  2.7%us, 23.9%sy,  0.0%ni, 71.1%id,  0.7%wa,  0.0%hi,   1.7%si,  0.0%st
> Cpu4 :  1.3%us, 14.3%sy,  0.0%ni, 83.3%id,  0.0%wa,  0.0%hi,   1.0%si,  0.0%st
> Cpu5 :  1.0%us, 14.2%sy,  0.0%ni, 83.4%id,  0.0%wa,  0.0%hi,   1.3%si,  0.0%st
> Cpu6 :  0.3%us,  7.0%sy,  0.0%ni, 92.4%id,  0.0%wa,  0.0%hi,   0.3%si,  0.0%st
> Cpu7 :  0.7%us,  8.0%sy,  0.0%ni, 90.0%id,  0.7%wa,  0.0%hi,   0.7%si,  0.0%st

Hi Paul

I found the following patch helps if one cpu is looping inside ksoftirqd():
synchronize_rcu() now completes in 40 ms instead of 1800 ms.

Thank you

[PATCH] rcu: increment quiescent state counter in ksoftirqd()

If a machine is flooded by network frames, a cpu can loop 100% of its time
inside ksoftirqd() without calling schedule(). This can delay the RCU grace
period to insane values.

Adding a rcu_qsctr_inc() call in ksoftirqd() solves this problem.

Signed-off-by: Eric Dumazet
Reviewed-by: Paul E. McKenney
---
diff --git a/kernel/softirq.c b/kernel/softirq.c
index bdbe9de..9041ea7 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -626,6 +626,7 @@ static int ksoftirqd(void * __bind_cpu)
 			preempt_enable_no_resched();
 			cond_resched();
 			preempt_disable();
+			rcu_qsctr_inc((long)__bind_cpu);
 		}
 		preempt_enable();
 		set_current_state(TASK_INTERRUPTIBLE);
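For context, here is roughly how the inner loop of ksoftirqd() reads with the
one-line addition applied. Everything outside the three context lines of the
hunk above is reconstructed from the 2.6.29-era kernel/softirq.c and is a
best-effort sketch, not the authoritative source:

	/* Inner loop of ksoftirqd(), sketched with the patch applied.
	 * Lines not shown in the hunk above are approximated. */
	while (local_softirq_pending()) {
		/* Preempt disable stops this cpu from going offline. */
		if (cpu_is_offline((long)__bind_cpu))
			goto wait_to_die;
		do_softirq();
		preempt_enable_no_resched();
		cond_resched();		/* enters schedule() only when a
					 * reschedule is pending; under a frame
					 * flood ksoftirqd may be the only
					 * runnable task, so it never does */
		preempt_disable();
		/* Report a quiescent state by hand: no RCU read-side
		 * critical section can span this point, so the grace
		 * period need not wait for a context switch that never
		 * happens on this cpu. */
		rcu_qsctr_inc((long)__bind_cpu);
	}

This is why cond_resched() alone is not enough: with nothing else runnable on
the flooded cpu, schedule() is never called, so RCU sees no quiescent state
there until the flood stops, and synchronize_rcu() stalls for the duration.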