Message ID | 1394448955-19915-1-git-send-email-nhorman@tuxdriver.com
---|---
State | Accepted, archived
Delegated to | David Miller
From: Neil Horman <nhorman@tuxdriver.com>
Date: Mon, 10 Mar 2014 06:55:55 -0400

> vmxnet3's netpoll driver is incorrectly coded. It directly calls
> vmxnet3_do_poll, which is the driver-internal napi poll routine. As the netpoll
> controller method doesn't block real napi polls in any way, there is a potential
> for race conditions in which the netpoll controller method and the napi poll
> method run concurrently. The result is data corruption causing panics such as
> this one recently observed:
>
> PID: 1371  TASK: ffff88023762caa0  CPU: 1  COMMAND: "rs:main Q:Reg"
>  #0 [ffff88023abd5780] machine_kexec at ffffffff81038f3b
>  #1 [ffff88023abd57e0] crash_kexec at ffffffff810c5d92
>  #2 [ffff88023abd58b0] oops_end at ffffffff8152b570
>  #3 [ffff88023abd58e0] die at ffffffff81010e0b
>  #4 [ffff88023abd5910] do_trap at ffffffff8152add4
>  #5 [ffff88023abd5970] do_invalid_op at ffffffff8100cf95
>  #6 [ffff88023abd5a10] invalid_op at ffffffff8100bf9b
>     [exception RIP: vmxnet3_rq_rx_complete+1968]
>     RIP: ffffffffa00f1e80  RSP: ffff88023abd5ac8  RFLAGS: 00010086
>     RAX: 0000000000000000  RBX: ffff88023b5dcee0  RCX: 00000000000000c0
>     RDX: 0000000000000000  RSI: 00000000000005f2  RDI: ffff88023b5dcee0
>     RBP: ffff88023abd5b48  R8:  0000000000000000  R9:  ffff88023a3b6048
>     R10: 0000000000000000  R11: 0000000000000002  R12: ffff8802398d4cd8
>     R13: ffff88023af35140  R14: ffff88023b60c890  R15: 0000000000000000
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>  #7 [ffff88023abd5b50] vmxnet3_do_poll at ffffffffa00f204a [vmxnet3]
>  #8 [ffff88023abd5b80] vmxnet3_netpoll at ffffffffa00f209c [vmxnet3]
>  #9 [ffff88023abd5ba0] netpoll_poll_dev at ffffffff81472bb7
>
> The fix is to do as other drivers do, and have the poll controller call the
> top-half interrupt handler, which schedules a napi poll properly to receive
> frames.
>
> Tested by myself, successfully.
>
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

VMware folks, please review.
-- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> > The fix is to do as other drivers do, and have the poll controller call
> > the top-half interrupt handler, which schedules a napi poll properly to
> > receive frames.
> >
> > Tested by myself, successfully.
> >
> > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
>
> VMware folks, please review

Neil, thanks for the updated change. It looks good.

Reviewed-by: Shreyas N Bhatewara <sbhatewara@vmware.com>
On Tue, Mar 11, 2014 at 11:24:08AM -0700, Shreyas Bhatewara wrote:
> > > The fix is to do as other drivers do, and have the poll controller call
> > > the top-half interrupt handler, which schedules a napi poll properly to
> > > receive frames.
> > >
> > > Tested by myself, successfully.
> > >
> > > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> >
> > VMware folks, please review
>
> Neil, thanks for the updated change. It looks good.
>
> Reviewed-by: Shreyas N Bhatewara <sbhatewara@vmware.com>

Thanks guys!
Neil
From: Neil Horman <nhorman@tuxdriver.com>
Date: Mon, 10 Mar 2014 06:55:55 -0400

> vmxnet3's netpoll driver is incorrectly coded. It directly calls
> vmxnet3_do_poll, which is the driver-internal napi poll routine. As the netpoll
> controller method doesn't block real napi polls in any way, there is a potential
> for race conditions in which the netpoll controller method and the napi poll
> method run concurrently. The result is data corruption causing panics such as
> this one recently observed:
 ...
> The fix is to do as other drivers do, and have the poll controller call the
> top-half interrupt handler, which schedules a napi poll properly to receive
> frames.
>
> Tested by myself, successfully.
>
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

Applied.
diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
index 3be786f..b7daa02 100644
--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c
@@ -1761,12 +1761,18 @@ static void
 vmxnet3_netpoll(struct net_device *netdev)
 {
 	struct vmxnet3_adapter *adapter = netdev_priv(netdev);
+	int i;
 
-	if (adapter->intr.mask_mode == VMXNET3_IMM_ACTIVE)
-		vmxnet3_disable_all_intrs(adapter);
-
-	vmxnet3_do_poll(adapter, adapter->rx_queue[0].rx_ring[0].size);
-	vmxnet3_enable_all_intrs(adapter);
+	switch (adapter->intr.type) {
+	case VMXNET3_IT_MSIX:
+		for (i = 0; i < adapter->num_rx_queues; i++)
+			vmxnet3_msix_rx(0, &adapter->rx_queue[i]);
+		break;
+	case VMXNET3_IT_MSI:
+	default:
+		vmxnet3_intr(0, adapter->netdev);
+		break;
+	}
 }
 
 #endif /* CONFIG_NET_POLL_CONTROLLER */
vmxnet3's netpoll driver is incorrectly coded. It directly calls
vmxnet3_do_poll, which is the driver-internal napi poll routine. As the netpoll
controller method doesn't block real napi polls in any way, there is a potential
for race conditions in which the netpoll controller method and the napi poll
method run concurrently. The result is data corruption causing panics such as
this one recently observed:

PID: 1371  TASK: ffff88023762caa0  CPU: 1  COMMAND: "rs:main Q:Reg"
 #0 [ffff88023abd5780] machine_kexec at ffffffff81038f3b
 #1 [ffff88023abd57e0] crash_kexec at ffffffff810c5d92
 #2 [ffff88023abd58b0] oops_end at ffffffff8152b570
 #3 [ffff88023abd58e0] die at ffffffff81010e0b
 #4 [ffff88023abd5910] do_trap at ffffffff8152add4
 #5 [ffff88023abd5970] do_invalid_op at ffffffff8100cf95
 #6 [ffff88023abd5a10] invalid_op at ffffffff8100bf9b
    [exception RIP: vmxnet3_rq_rx_complete+1968]
    RIP: ffffffffa00f1e80  RSP: ffff88023abd5ac8  RFLAGS: 00010086
    RAX: 0000000000000000  RBX: ffff88023b5dcee0  RCX: 00000000000000c0
    RDX: 0000000000000000  RSI: 00000000000005f2  RDI: ffff88023b5dcee0
    RBP: ffff88023abd5b48  R8:  0000000000000000  R9:  ffff88023a3b6048
    R10: 0000000000000000  R11: 0000000000000002  R12: ffff8802398d4cd8
    R13: ffff88023af35140  R14: ffff88023b60c890  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffff88023abd5b50] vmxnet3_do_poll at ffffffffa00f204a [vmxnet3]
 #8 [ffff88023abd5b80] vmxnet3_netpoll at ffffffffa00f209c [vmxnet3]
 #9 [ffff88023abd5ba0] netpoll_poll_dev at ffffffff81472bb7

The fix is to do as other drivers do, and have the poll controller call the
top-half interrupt handler, which schedules a napi poll properly to receive
frames.

Tested by myself, successfully.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Shreyas Bhatewara <sbhatewara@vmware.com>
CC: "VMware, Inc." <pv-drivers@vmware.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: stable@vger.kernel.org
---
Change notes:
v2)
 * Fixed missing break statements
 * Added loop for each rx queue
---
 drivers/net/vmxnet3/vmxnet3_drv.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)