From patchwork Fri Oct 7 08:58:08 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Koehrer Mathias (ETAS/ESW5)" X-Patchwork-Id: 679178 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from whitealder.osuosl.org (smtp1.osuosl.org [140.211.166.138]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3sr3Mn28XZz9s2Q for ; Fri, 7 Oct 2016 19:58:28 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by whitealder.osuosl.org (Postfix) with ESMTP id E1F4692DE9; Fri, 7 Oct 2016 08:58:25 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from whitealder.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Lqm4bEzPuJSr; Fri, 7 Oct 2016 08:58:21 +0000 (UTC) Received: from ash.osuosl.org (ash.osuosl.org [140.211.166.34]) by whitealder.osuosl.org (Postfix) with ESMTP id 4D73E92DDD; Fri, 7 Oct 2016 08:58:21 +0000 (UTC) X-Original-To: intel-wired-lan@lists.osuosl.org Delivered-To: intel-wired-lan@lists.osuosl.org Received: from hemlock.osuosl.org (smtp2.osuosl.org [140.211.166.133]) by ash.osuosl.org (Postfix) with ESMTP id 95E2F1C283A for ; Fri, 7 Oct 2016 08:58:20 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by hemlock.osuosl.org (Postfix) with ESMTP id 8D03B95FD1 for ; Fri, 7 Oct 2016 08:58:20 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from hemlock.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id VvNmnLJQHD64 for ; Fri, 7 Oct 2016 08:58:17 +0000 (UTC) X-Greylist: from auto-whitelisted by SQLgrey-1.7.6 Received: from imta22.fe.bosch.de (imta22.fe.bosch.de [139.15.243.26]) by hemlock.osuosl.org (Postfix) with ESMTPS id 771E595D0E for ; Fri, 7 Oct 2016 08:58:17 +0000 (UTC) Received: from smtp6-v.fe.bosch.de (imta23.fe.bosch.de [139.15.243.227]) by imta22.fe.bosch.de (Postfix) with ESMTP id 7557A8403B4 for ; Fri, 7 Oct 2016 10:58:15 +0200 (CEST) Received: from vsmta13.fe.internet.bosch.com (unknown [10.4.98.53]) by imta23.fe.bosch.de (Postfix) with ESMTP id 15B8215800F1; Fri, 7 Oct 2016 10:58:09 +0200 (CEST) Received: from SI-MBX1012.de.bosch.com (vsgw24.fe.internet.bosch.com [10.4.98.24]) by vsmta13.fe.internet.bosch.com (Postfix) with ESMTP id 034612E4024A; Fri, 7 Oct 2016 10:58:09 +0200 (CEST) Received: from FE-MBX1012.de.bosch.com (10.3.230.70) by SI-MBX1012.de.bosch.com (10.3.230.45) with Microsoft SMTP Server (TLS) id 15.0.1178.4; Fri, 7 Oct 2016 10:58:08 +0200 Received: from FE-MBX1012.de.bosch.com ([fe80::9db6:aaff:1980:b83a]) by FE-MBX1012.de.bosch.com ([fe80::9db6:aaff:1980:b83a%16]) with mapi id 15.00.1178.000; Fri, 7 Oct 2016 10:58:08 +0200 From: "Koehrer Mathias (ETAS/ESW5)" To: "Williams, Mitch A" , Julia Cartwright , "Kirsher, Jeffrey T" , Greg Thread-Topic: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest Thread-Index: AdIUzS/19ba6z4msSlWtJALmxToZ5gABboeAACOjRvAACqwnIP//8zsA///T88CAAFArAP/7akzAgAzGM4D/9s4gcAJTnT2A//8hO/D//crngP/6fRQQ//RfcAD/56V84A== Date: Fri, 7 Oct 2016 08:58:08 +0000 Message-ID: <29250f87b1d84aacb8aa312935582291@FE-MBX1012.de.bosch.com> References: <20160923123224.odybv2uos6tot6it@linutronix.de> <20160923144140.5tkzeymamrb5qnsv@linutronix.de> <20160928194519.GA32423@jcartwri.amer.corp.natinst.com> <487032ca81f84e70bdacc39a024eff5e@FE-MBX1012.de.bosch.com> <20161004193445.GF10625@jcartwri.amer.corp.natinst.com> <584755c2766e4b94a604ece16760fe14@FE-MBX1012.de.bosch.com> <20161005155959.GH10625@jcartwri.amer.corp.natinst.com> <13c3cd3ffee4490fb22b8de383e51361@FE-MBX1012.de.bosch.com> In-Reply-To: Accept-Language: de-DE, en-US Content-Language: de-DE X-MS-Has-Attach: X-MS-TNEF-Correlator: x-ms-exchange-transport-fromentityheader: Hosted x-originating-ip: [10.4.162.35] MIME-Version: 1.0 X-TM-AS-MML: disable X-TM-AS-Product-Ver: IMSS-7.1.0.1679-8.0.0.1202-22618.006 X-TMASE-MatchedRID: u6ojmU07PKwn2WEbWzq9rZYsKSXWWrsHgGa+oYp5i6o3XlVVGX0Okcbc GZNtpcQlcODoO2WaU5VwH4NDqFwAOIzQp7mciZRI5cQFbdjxan4Y15SV2JK8fVJxdAzF6l77CF7 o0j5/guXvAMTWRRjk0wp3gn5iC/R6Mpb/RI2lqNVU/2M0RAfTihiue5Gf9pmpgzTyKHJgM//U3a MpoM9oQuQtpM4qNt7n+2YkBWRPfvNnzOcaBOv1CNxajlW+zwxCZdKh+/+x0Y68NrbzjPvzJ5VBh I8VoTbc69P6E+c3GM7+qrXTNqMyZiIOuxx4sxXhmq3agDaCfdKcngj41L6zDuOxOq7LQlGL4swy bZuY2NSiY80alckuhvPMeS1edoBcxzvHgMQjo/bd+fuf9kcapvQhZRRMg92Eh8BhJvgqWBlQ8jS 6UmjcjMtCdlRKqkfFmSldS/TQ6AHxeC4/s6RY31VN8laWo90MiuiCZsC0BEgv/EcQ7KhR/JQtgT +OmrFw0Mr2Kar8/fMhTMvDIC4gwHrSP9RtGZYoU+OjsPhIWDiw3bSBLOCKst9RlPzeVuQQZ10wK XUWXnLVnTylJSIHbB4PMBj1+jNNHMuTzWfQQg2Lm2Ke1GK7PaAFZFUcbb6RAIZpxx3hp72nXhAD gppWjXG3lf0p6oT9ziUXUukS/yzlRxm3A2wKuv7E6GNqs6ce2n9TdfAvmQIXvQkGi3tjz46HM5r qDwqtA8pekTaelKIvmcIEXouhV1cibA/x8UtkiGxjJ6RtbQQ/Kv58cRuXrQ== Cc: "netdev@vger.kernel.org" , "intel-wired-lan@lists.osuosl.org" , "linux-rt-users@vger.kernel.org" , Sebastian Andrzej Siewior Subject: Re: [Intel-wired-lan] Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest X-BeenThere: intel-wired-lan@lists.osuosl.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Intel Wired Ethernet Linux Kernel Driver Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-wired-lan-bounces@lists.osuosl.org Sender: "Intel-wired-lan" Hi Mitch, > > > > > > Although, to be clear, it isn't the fact that there exists 8 > > > threads, it's > > that the device is > > > firing all 8 interrupts at the same time. The time spent in hardirq > > context just waking > > > up all 8 of those threads (and the cyclictest wakeup) is enough to > > > cause > > your > > > regression. > > > > > > netdev/igb folks- > > > > > > Under what conditions should it be expected that the i350 trigger > > > all of > > the TxRx > > > interrupts simultaneously? Any ideas here? > > I can answer that! I wrote that code. > > We trigger the interrupts once a second because MSI and MSI-X interrupts are NOT > guaranteed to be delivered. If this happens, the queues being serviced by this "lost" > interrupt are completely stuck. > > The device automatically masks each interrupt vector after it fires, expecting the > ISR to re-enable the vector after processing is complete. If the interrupt is lost, the > ISR doesn't run, so the vector ends up permanently masked. At this point, any > queues associated with that vector are stuck. The only recovery is through the > netdev watchdog, which initiates a reset. > > During development of igb, we had several platforms with chipsets that routinely > dropped MSI messages under stress. Things would be running fine and then, pow, > all the traffic on a queue would stop. > > So, I added code to fire each vector once per second. Just unmasking the interrupt > isn't enough - we need to trigger the ISR to get the queues cleaned up so the device > can work again. > > Is this workaround still needed? I don't know. Modern chipsets don't break a sweat > handling gigabit-speed traffic, and they *probably* don't drop interrupts. But I'd still > rather have that insurance. > > You could try to remove the write to the EICS registers in the watchdog task to see if > that takes care of your problem. But I wouldn't want to remove that code > permanently, because we have seen lost interrupts in the past. > > You also could try staggering the writes so that not all vectors fire each second. But > then you'll potentially incur a much longer delay if an interrupt does get lost, which > means you could trigger netdev watchdog events. > > -Mitch Thanks for the explanation. Now I have switched over to kernel 4.8-rt1. This shows the very same effect. I modified the in-kernel's igb_main.c (function igb_watchdog_task) to comment out the access to the EICS registers: The result is now slighty better, however the max latency that has been measured by cyclictest is still much higher compared to the values of kernel 3.18.27-rt27. I have now enabled all events, the execution of # cyclictest -a -i 105 -m -n -p 80 -t 1 -b 30 -C delivers the following trace values cyclicte-10062 0....... 10025813402us : sys_exit: NR 230 = 0 cyclicte-10062 0....... 10025813402us : sys_enter: NR 230 (1, 1, 7ffff73ff930, 0, 0, 2bd7e12e) cyclicte-10062 0....... 10025813402us : hrtimer_init: hrtimer=ffff88040a017e18 clockid=CLOCK_MONOTONIC mode=HRTIMER_MODE_ABS cyclicte-10062 0d...1.. 10025813403us : hrtimer_start: hrtimer=ffff88040a017e18 function=hrtimer_wakeup expires=10024735653388 softexpires=10024735653388 cyclicte-10062 0d...1.. 10025813403us : write_msr: 6e0, value 20ca630b9aef cyclicte-10062 0d...1.. 10025813403us : rcu_utilization: Start context switch cyclicte-10062 0d...1.. 10025813403us : rcu_utilization: End context switch cyclicte-10062 0d...2.. 10025813404us : sched_switch: prev_comm=cyclictest prev_pid=10062 prev_prio=19 prev_state=S ==> next_comm=kworker/0:3 next_pid=1388 next_prio=120 cyclicte-10062 0d...2.. 10025813404us+: x86_fpu_regs_deactivated: x86/fpu: ffff88040c603ec0 fpregs_active: 0 fpstate_active: 1 counter: 69 xfeatures: 2 xcomp_bv: 0 kworker/-1388 0d..h... 10025813468us : irq_handler_entry: irq=48 name=eth2-tx-0 kworker/-1388 0d..h... 10025813468us : irq_handler_exit: irq=48 ret=handled kworker/-1388 0d..h1.. 10025813469us : sched_waking: comm=irq/48-eth2-tx- pid=10057 prio=49 target_cpu=000 kworker/-1388 0dN.h2.. 10025813469us : sched_wakeup: comm=irq/48-eth2-tx- pid=10057 prio=49 target_cpu=000 kworker/-1388 0dN.h1.. 10025813470us : irq_handler_entry: irq=47 name=eth2-rx-0 kworker/-1388 0dN.h1.. 10025813470us : irq_handler_exit: irq=47 ret=handled kworker/-1388 0dN.h2.. 10025813471us : sched_waking: comm=irq/47-eth2-rx- pid=10056 prio=49 target_cpu=000 kworker/-1388 0dN.h3.. 10025813471us : sched_wakeup: comm=irq/47-eth2-rx- pid=10056 prio=49 target_cpu=000 kworker/-1388 0dN..1.. 10025813472us : rcu_utilization: Start context switch kworker/-1388 0dN..1.. 10025813472us : rcu_utilization: End context switch kworker/-1388 0dN..2.. 10025813472us : sched_stat_runtime: comm=kworker/0:3 pid=1388 runtime=67566 [ns] vruntime=101216288332 [ns] kworker/-1388 0d...2.. 10025813472us : sched_switch: prev_comm=kworker/0:3 prev_pid=1388 prev_prio=120 prev_state=R+ ==> next_comm=irq/48-eth2-tx- next_pid=10057 next_prio=49 irq/48-e-10057 0d....11 10025813473us : softirq_raise: vec=3 [action=NET_RX] irq/48-e-10057 0.....12 10025813474us : softirq_entry: vec=3 [action=NET_RX] irq/48-e-10057 0.....12 10025813475us : napi_poll: napi poll on napi struct ffff88040a582850 for device eth2 work 0 budget 64 irq/48-e-10057 0.....12 10025813475us : softirq_exit: vec=3 [action=NET_RX] irq/48-e-10057 0d...1.. 10025813475us : rcu_utilization: Start context switch irq/48-e-10057 0d...1.. 10025813476us : rcu_utilization: End context switch irq/48-e-10057 0d...2.. 10025813476us : sched_switch: prev_comm=irq/48-eth2-tx- prev_pid=10057 prev_prio=49 prev_state=S ==> next_comm=irq/47-eth2-rx- next_pid=10056 next_prio=49 irq/47-e-10056 0d....11 10025813477us : softirq_raise: vec=3 [action=NET_RX] irq/47-e-10056 0.....12 10025813477us : softirq_entry: vec=3 [action=NET_RX] irq/47-e-10056 0.....12 10025813478us : napi_poll: napi poll on napi struct ffff88040a580850 for device eth2 work 0 budget 64 irq/47-e-10056 0.....12 10025813478us : softirq_exit: vec=3 [action=NET_RX] irq/47-e-10056 0d...1.. 10025813479us : rcu_utilization: Start context switch irq/47-e-10056 0d...1.. 10025813479us : rcu_utilization: End context switch irq/47-e-10056 0d...2.. 10025813479us+: sched_switch: prev_comm=irq/47-eth2-rx- prev_pid=10056 prev_prio=49 prev_state=S ==> next_comm=kworker/0:3 next_pid=1388 next_prio=120 kworker/-1388 0d..h... 10025813516us : local_timer_entry: vector=239 kworker/-1388 0d..h1.. 10025813516us : hrtimer_interrupt: cpu=0 offset=-28999 curr=kworker/0:3[120] thread=cyclictest[19] kworker/-1388 0d..h1.. 10025813517us : hrtimer_cancel: hrtimer=ffff88040a017e18 kworker/-1388 0d..h... 10025813517us : hrtimer_expire_entry: hrtimer=ffff88040a017e18 function=hrtimer_wakeup now=10024735682387 kworker/-1388 0d..h1.. 10025813517us : sched_waking: comm=cyclictest pid=10062 prio=19 target_cpu=000 kworker/-1388 0dN.h2.. 10025813517us : sched_wakeup: comm=cyclictest pid=10062 prio=19 target_cpu=000 kworker/-1388 0dN.h... 10025813517us : hrtimer_expire_exit: hrtimer=ffff88040a017e18 kworker/-1388 0dN.h1.. 10025813518us : hrtimer_interrupt: cpu=0 offset=317613 curr=kworker/0:3[120] thread=[-1] kworker/-1388 0dN.h... 10025813518us : write_msr: 6e0, value 20ca631e99f8 kworker/-1388 0dN.h... 10025813518us : local_timer_exit: vector=239 kworker/-1388 0dN..1.. 10025813518us : rcu_utilization: Start context switch kworker/-1388 0dN..1.. 10025813518us : rcu_utilization: End context switch kworker/-1388 0dN..2.. 10025813518us : sched_stat_runtime: comm=kworker/0:3 pid=1388 runtime=38173 [ns] vruntime=101216326505 [ns] kworker/-1388 0d...2.. 10025813519us : sched_switch: prev_comm=kworker/0:3 prev_pid=1388 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest next_pid=10062 next_prio=19 kworker/-1388 0d...2.. 10025813519us : x86_fpu_regs_activated: x86/fpu: ffff88040c603ec0 fpregs_active: 1 fpstate_active: 1 counter: 70 xfeatures: 2 xcomp_bv: 0 kworker/-1388 0d...2.. 10025813519us : write_msr: c0000100, value 7ffff7400700 cyclicte-10062 0....... 10025813519us : sys_exit: NR 230 = 0 cyclicte-10062 0....... 10025813524us : sys_enter: NR 1 (5, 7ffff7400300, 1f, 7ffff77a5460, 2, 7ffff744c99a) cyclicte-10062 0.....11 10025813526us : tracing_mark_write: hit latency threshold (32 > 30) cyclicte-10062 0....... 10025813526us : sys_exit: NR 1 = 31 cyclicte-10062 0....... 10025813527us : sys_enter: NR 1 (4, 4076b0, 2, 7ffff77a5460, 2, 7ffff744c99a) Also, I notice that there is still roughly one irq every 2 seconds on irq/47 (eth2-rx-0) and irq/48 (eth2-tx-0). Any idea or feedback on this? Thanks Mathias --- igb_main.c.orig 2016-10-07 10:43:37.855873754 +0200 +++ igb_main.c 2016-10-07 10:31:51.451346292 +0200 @@ -4613,6 +4613,7 @@ no_wait: } /* Cause software interrupt to ensure Rx ring is cleaned */ +#if 0 if (adapter->flags & IGB_FLAG_HAS_MSIX) { u32 eics = 0; @@ -4622,6 +4623,7 @@ no_wait: } else { wr32(E1000_ICS, E1000_ICS_RXDMT0); } +#endif igb_spoof_check(adapter); igb_ptp_rx_hang(adapter);