Message ID | 1442844109.7367.5.camel@infradead.org |
---|---|
State | Not Applicable, archived |
Delegated to: | David Miller |
David Woodhouse <dwmw2@infradead.org> :
> From: David Woodhouse <David.Woodhouse@intel.com>
>
> The TX timeout handling has been observed to trigger RX IRQ storms. And
> since cp_interrupt() just keeps saying that it handled the interrupt,
> the machine then dies. Fix the return value from cp_interrupt(), and
> the offending IRQ gets disabled and the machine survives.

I am not fond of the way it dissociates the hardware status word and the
software "handled" variable.

What you are describing - RX IRQ storms - looks like a problem between
the irq and poll handlers. That's where I expect the problem to be solved.

Sprinkling "handled" operations does not make me terribly comfortable,
especially as I'd happily trade the old-style part irq, part napi
processing for a plain napi processing (I can get over it though :o) ).

Once the code is past the "if (!status || (status == 0xFFFF))" test - or
whatever test against some mask - I don't see why the driver could refuse
to take ownership of the irq.
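The `!status || (status == 0xFFFF)` test Francois refers to can be sketched outside the kernel. This is an illustrative stand-alone C fragment (names invented, not the driver code): on a shared PCI line, a status of 0 means the interrupt was not ours, and 0xFFFF means reads are returning all-ones, i.e. the device is gone or unpowered.

```c
#include <stdint.h>

/* Hypothetical sketch of the ownership test: only when nothing is
 * asserted (0x0000) or the device reads back all-ones (0xFFFF) can
 * the driver clearly disclaim a shared interrupt. */
static int irq_is_ours(uint16_t status)
{
    if (status == 0x0000 || status == 0xFFFF)
        return 0;   /* disclaim: not our interrupt, or dead device */
    return 1;       /* some status bit is set: the device raised it */
}
```

Francois's position is that once this test passes, the device demonstrably asserted the line, so the driver has no business returning "not handled".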
On Mon, 2015-09-21 at 22:25 +0200, Francois Romieu wrote:
> David Woodhouse <dwmw2@infradead.org> :
> > From: David Woodhouse <David.Woodhouse@intel.com>
> >
> > The TX timeout handling has been observed to trigger RX IRQ storms. And
> > since cp_interrupt() just keeps saying that it handled the interrupt,
> > the machine then dies. Fix the return value from cp_interrupt(), and
> > the offending IRQ gets disabled and the machine survives.
>
> I am not fond of the way it dissociates the hardware status word and the
> software "handled" variable.

Oh, I like that part very much :)

The practice of returning a 'handled' status only when you've actually
*done* something you expect to mitigate the interrupt is a useful way of
protecting against both hardware misbehaviour and software bugs.

> What you are describing - RX IRQ storms - looks like a problem between
> the irq and poll handlers. That's where I expect the problem to be solved.

I already fixed that, in the next patch in the series. But the failure
mode *should* have been 'IRQ disabled' and the device continuing to work
via polling, not a complete death of the machine. That's the difference
this patch makes.

> Sprinkling "handled" operations does not make me terribly comfortable,
> especially as I'd happily trade the old-style part irq, part napi
> processing for a plain napi processing (I can get over it though :o) ).

The existing cp_rx_poll() function mostly runs without taking cp->lock,
but cp_tx() *does* need cp->lock for the tx_head/tx_tail manipulations.
I'm guessing that's why it's still being called directly from the hard
IRQ handler? I suppose we could probably find a way to move that out,
but I was mostly just trying to fix stuff that was actually broken... :)
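The convention David defends here (set 'handled' only when real mitigating work was done) can be sketched as a stand-alone C fragment. Everything below is illustrative: the bit masks and the two stand-in globals are invented, not driver code.

```c
#include <stdint.h>

#define RX_BITS  0x000F   /* stands in for RxOK|RxErr|RxEmpty|RxFIFOOvr */
#define TX_BITS  0x00F0   /* stands in for TxOK|TxErr|TxEmpty|SWInt */
#define LINK_BIT 0x0100   /* stands in for LinkChg */

static int fake_napi_schedule_ok;  /* stands in for napi_schedule_prep() */
static int fake_tx_progress;       /* stands in for cp_tx() completing work */

/* Each condition contributes to 'handled' only when the handler
 * actually did something to quench the interrupt. */
static int service_interrupt(uint16_t status)
{
    int handled = 0;

    if ((status & RX_BITS) && fake_napi_schedule_ok)
        handled = 1;                 /* RX work really was deferred to NAPI */
    if (status & TX_BITS)
        handled |= fake_tx_progress; /* TX counts only if descriptors completed */
    if (status & LINK_BIT)
        handled = 1;                 /* link check always does real work */
    return handled;
}
```

With this shape, an RX bit that is asserted while NAPI refuses to schedule (the storm scenario from the patch description) yields 'not handled', which is exactly what lets the core IRQ code step in.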
From: David Woodhouse <dwmw2@infradead.org>
Date: Mon, 21 Sep 2015 15:01:49 +0100

> From: David Woodhouse <David.Woodhouse@intel.com>
>
> The TX timeout handling has been observed to trigger RX IRQ storms. And
> since cp_interrupt() just keeps saying that it handled the interrupt,
> the machine then dies. Fix the return value from cp_interrupt(), and
> the offending IRQ gets disabled and the machine survives.
>
> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>

Like Francois, I don't like this.

First of all, there are only 3 bits not handled explicitly by
cp_interrupt(). And for those, if they are set and no other condition
was raised, you should report the event and the status bits set, and
then forcibly clear the interrupt.

And if we are getting Rx* interrupts with napi_schedule_prep() returning
false, that's a serious problem. It can mean that the TX timeout
handler's resetting of the chip is either miscoded or is racing with
either NAPI polling or this interrupt handler.

And if that's the case, your patch is making the chip's IRQ line get
disabled when this race triggers.

This change is even worse, in my opinion, if patch #2 indeed makes the
problem go away.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2015-09-22 at 16:45 -0700, David Miller wrote:
> And if we are getting Rx* interrupts with napi_schedule_prep()
> returning false, that's a serious problem. It can mean that the TX
> timeout handler's resetting of the chip is either miscoded or is
> racing with either NAPI polling or this interrupt handler.

Such bugs happen. That's why we have IRQ-storm detection, to help
protect us.

> And if that's the case your patch is making the chip's IRQ line get
> disabled when this race triggers.
>
> This change is even worse, in my opinion, if patch #2 indeed makes
> the problem go away.

Even worse than what? When a bug like the one I've fixed in #2 happens,
let's look at the options:

 - With this patch, the IRQ storm is detected. We disable the IRQ line
   and the driver continues to work, albeit with higher latency.

 - Without this patch, the box just dies completely. First the hardware
   watchdog was triggering an apparently spontaneous reset. Then, when I
   disabled the hardware watchdog, the machine locked up and didn't even
   respond to serial sysrq. Eventually I got a little bit of sense out
   of it using the NMI watchdog.

Trust me, the disabled IRQ you get with my patch really *isn't* worse
than what happened without it :)

If cp_interrupt() had already worked the way I suggest, I would
literally have two days of my life back right now. I spent a *lot* of
time working out that it was dying in an interrupt storm, and trying to
find the underlying problem while having to physically visit the box
under the stairs and reset it all the time.

In fact, I've been seeing undebuggable spontaneous resets of this
router for a while now. If I'd had an 'IRQ disabled' backtrace and a
smoking gun, I might have been able to fix it a long time ago. I even
discounted the IRQ-storm hypothesis early on, on the basis that "Linux
handles those quite gracefully now, and disables the interrupt". Except
of course *that* relies on the IRQ handler returning an appropriate
value.
So it doesn't work with 8139cp as things stand. That's why I'm really
quite keen on fixing cp_interrupt() to be more defensive about when it
claims to have handled the interrupt.

Surely that's *why* we have the IRQ-storm detection in the first place:
to help us survive precisely this situation. IRQ storms are caused
either by software bugs like the one fixed in patch #2, or by hardware
misbehaviour. And if IRQ handlers blindly return 'handled' just because
the hardware actually *did* assert an interrupt, regardless of whether
they've made any *progress*, then we don't get that protection in a lot
of the situations where we'd want it.

But sure, for now I'll drop this from the series and I can try to
convince you separately. In fact I think it only affects the last patch
in the series, which reduces the banging on TxPoll, and I'm going to
drop that one too for now anyway.

I'll repost shortly. I think I'll also add a new patch at the end,
enabling TX checksums, SG and TSO by default.
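The storm detection David is relying on can be sketched in a few lines of stand-alone C. This is a loose model of the heuristic in the kernel's kernel/irq/spurious.c (the 100,000 / 99,900 constants come from that file; the struct and function names here are invented): every 100,000 interrupts, if almost all of them went unhandled, the line is declared stormy and disabled. The whole mechanism depends on handlers honestly returning "not handled" when they make no progress.

```c
#include <stdbool.h>

struct irq_stats {
    unsigned count;      /* interrupts seen in the current window */
    unsigned unhandled;  /* of those, how many nobody claimed */
    bool disabled;       /* line shut off; device polled from now on */
};

/* Called once per interrupt with the handler's honest verdict. */
static void note_irq(struct irq_stats *s, bool handled)
{
    if (!handled)
        s->unhandled++;
    if (++s->count < 100000)
        return;
    if (s->unhandled > 99900)
        s->disabled = true;  /* "irq: nobody cared" */
    s->count = s->unhandled = 0;
}
```

A driver that always reports `handled = true` never trips the threshold, which is exactly the failure mode described above: the storm rages on and the machine dies instead of falling back to polling.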
From: David Woodhouse <dwmw2@infradead.org>
Date: Wed, 23 Sep 2015 09:14:09 +0100

> But sure, for now I'll drop this from the series and I can try to
> convince you separately.

Yes, let's discuss this independently of the nice bug fixing that's
happening in the rest of this series.

Thanks.
On Wed, 2015-09-23 at 10:58 -0700, David Miller wrote:
> From: David Woodhouse <dwmw2@infradead.org>
> Date: Wed, 23 Sep 2015 09:14:09 +0100
>
> > But sure, for now I'll drop this from the series and I can try to
> > convince you separately.
>
> Yes, let's discuss this independently of the nice bug fixing
> that's happening in the rest of this series.

Since you explicitly call them 'bug fixes'... I suppose the last patch
in the latest series (enabling the various offloads by default) isn't a
bug fix and should be held back for net-next. I think I can live with
pushing the rest for the net tree. Just to recap, the series is:

  8139cp: Do not re-enable RX interrupts in cp_tx_timeout()
  8139cp: Fix tx_queued debug message to print correct slot numbers
  8139cp: Fix TSO/scatter-gather descriptor setup
  8139cp: Reduce duplicate csum/tso code in cp_start_xmit()
  8139cp: Fix DMA unmapping of transmitted buffers
  8139cp: Dump contents of descriptor ring on TX timeout
  8139cp: Enable offload features by default

The penultimate patch is also not strictly a bug fix, but it only adds
additional diagnostics to a code path which was already broken until I
fixed it anyway, so I can live with that in net, although I'll also be
happy if you want to defer it.

The fourth patch, which removes the three duplicate versions of the
csum/tso checks, might also be deferrable, but really it's *also*
fixing the failure mode when we see an inappropriate CHECKSUM_PARTIAL
packet, and it's touching a code path which I've *also* fixed in this
patch set. So it might as well stay.

I can shuffle things around if you disagree, but assuming Francois
concurs I'd suggest merging patches 1-6 (or just pulling from
git://git.infradead.org/users/dwmw2/linux-8139cp.git fixes-only) and
then I'll resubmit the last patch some time later, after you next pull
net into net-next. Otherwise, please let me know if/how you'd like me
to reorganise it.
From: David Woodhouse <dwmw2@infradead.org>
Date: Wed, 23 Sep 2015 20:45:30 +0100

> Since you explicitly call them 'bug fixes'... I suppose the last patch
> in the latest series (enabling the various offloads by default) isn't a
> bug fix and should be held back for net-next.
>
> I think I can live with pushing the rest for the net tree. Just to
> recap, the series is:

I've applied patches #1-#6 and left #7 sitting around in patchwork for
when I next merge net into net-next, thanks.
On Wed, 2015-09-23 at 14:48 -0700, David Miller wrote:
> I've applied patches #1-#6 and left #7 sitting around in patchwork
> for when I next merge net into net-next, thanks.

Great, thanks. I've got one more for the net tree. Just masking the
gso_size to the maximum (0xfff) that the hardware can handle, and never
telling the net stack that that's our maximum, doesn't seem like such a
good idea...

I think I really am going to stop looking now :)
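The gso_size problem mentioned here can be sketched in isolation. This is an illustrative fragment, not the driver code: the hardware's MSS field only holds 12 bits (the 0xfff maximum from the email), so silently masking a larger MSS down to fit changes the segment size the chip will actually use. The cleaner approach is to advertise the limit to the stack up front (the kernel has `netif_set_gso_max_size()` for this, though whether the eventual fix used it is an assumption here).

```c
#define MSS_MAX 0xfff   /* 12-bit MSS field in the TX descriptor */

/* What silent masking would do to an oversized MSS: the low 12 bits
 * survive and the rest is simply lost. */
static unsigned masked_mss(unsigned gso_size)
{
    return gso_size & MSS_MAX;
}
```

An MSS of 0x1001 would be quietly turned into 1, which is why masking without informing the stack "doesn't seem like such a good idea".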
David Woodhouse <dwmw2@infradead.org> :
[...]
> But sure, for now I'll drop this from the series and I can try to
> convince you separately.

You may as well change the IRQ storm avoidance logic so that it does
not require conformant driver code, if you do not want to waste more
time :o)
David Woodhouse <dwmw2@infradead.org> :
[...]
> I can shuffle things around if you disagree, but assuming Francois
> concurs I'd suggest merging patches 1-6 (or just pulling from
> git://git.infradead.org/users/dwmw2/linux-8139cp.git fixes-only)
> and then I'll resubmit the last patch some time later, after you
> next pull net into net-next.

No objection, but I do not really trust whatever review I do this late.

Did you try #5 with the DMA allocation debug code enabled?
On Thu, 2015-09-24 at 00:44 +0200, Francois Romieu wrote:
> David Woodhouse <dwmw2@infradead.org> :
> [...]
> > I can shuffle things around if you disagree, but assuming Francois
> > concurs I'd suggest merging patches 1-6 (or just pulling from
> > git://git.infradead.org/users/dwmw2/linux-8139cp.git fixes-only)
> > and then I'll resubmit the last patch some time later, after you
> > next pull net into net-next.
>
> No objection but I do not really trust whatever review I do this late.
>
> Did you try #5 with the DMA allocation debug code enabled ?

Only in QEMU (where there wasn't a problem in the first place). But yes.
On Thu, 2015-09-24 at 00:44 +0200, Francois Romieu wrote:
> David Woodhouse <dwmw2@infradead.org> :
> [...]
> > But sure, for now I'll drop this from the series and I can try to
> > convince you separately.
>
> You may as well change the IRQ storm avoidance logic so that it does not
> require conformant driver code if you do not want to waste more time :o)

That's not entirely trivial; at a certain level we *do* require that
drivers don't lie to the core IRQ logic.

I suppose that even if a rogue driver is returning IRQ_HANDLED in an
IRQ storm, when it's *not* actually doing anything or making any
progress, there might be a way to detect that we're always going
straight back into the next interrupt handler as *soon* as we exit the
previous one. Perhaps there's merit in exploring that option.

I'm not sure that gives you a valid excuse for not wanting to fix the
driver, though :)

I might start by "clarifying" the documentation on the IRQ handler's
return value...
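The alternative David floats (catching a storm even when the handler claims IRQ_HANDLED, by noticing immediate re-entry) could look something like the following stand-alone sketch. The thresholds and names are invented purely for illustration; nothing like this exists in the kernel as described.

```c
#include <stdbool.h>
#include <stdint.h>

#define STORM_GAP_NS  2000    /* "immediately" = re-entry within 2us */
#define STORM_LIMIT   10000   /* this many back-to-back IRQs = storm */

struct storm_detect {
    uint64_t last_exit_ns;    /* when the previous handler invocation ended */
    unsigned back_to_back;    /* current streak of immediate re-entries */
};

/* Called at each handler entry; returns true once the line looks stormy
 * regardless of what return value the handler has been claiming. */
static bool irq_entry_is_storm(struct storm_detect *d,
                               uint64_t entry_ns, uint64_t exit_ns)
{
    if (d->last_exit_ns && entry_ns - d->last_exit_ns < STORM_GAP_NS)
        d->back_to_back++;
    else
        d->back_to_back = 0;  /* a real gap resets the streak */
    d->last_exit_ns = exit_ns;
    return d->back_to_back >= STORM_LIMIT;
}
```

This would catch lying drivers, but at the cost of timestamping every interrupt, which is presumably part of why the simpler unhandled-count heuristic is what the kernel actually uses.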
On Wed, 2015-09-23 at 10:58 -0700, David Miller wrote:
> From: David Woodhouse <dwmw2@infradead.org>
> Date: Wed, 23 Sep 2015 09:14:09 +0100
>
> > But sure, for now I'll drop this from the series and I can try to
> > convince you separately.
>
> Yes, let's discuss this independently of the nice bug fixing
> that's happening in the rest of this series.
>
> Thanks.

This is the patch we discussed earlier, and I came away from the
conversation believing that I had achieved my goal of trying to
convince you separately :)

The patch in patchwork at https://patchwork.ozlabs.org/patch/520318/
still applies cleanly; is that sufficient, or would you like it
resubmitted (perhaps after -rc1)?

Thanks.
```diff
diff --git a/drivers/net/ethernet/realtek/8139cp.c b/drivers/net/ethernet/realtek/8139cp.c
index ba3dab7..f1054ad 100644
--- a/drivers/net/ethernet/realtek/8139cp.c
+++ b/drivers/net/ethernet/realtek/8139cp.c
@@ -371,7 +371,7 @@ struct cp_private {
 static void __cp_set_rx_mode (struct net_device *dev);
-static void cp_tx (struct cp_private *cp);
+static int cp_tx (struct cp_private *cp);
 static void cp_clean_rings (struct cp_private *cp);
 #ifdef CONFIG_NET_POLL_CONTROLLER
 static void cp_poll_controller(struct net_device *dev);
@@ -587,8 +587,6 @@ static irqreturn_t cp_interrupt (int irq, void *dev_instance)
 	if (!status || (status == 0xFFFF))
 		goto out_unlock;
 
-	handled = 1;
-
 	netif_dbg(cp, intr, dev, "intr, status %04x cmd %02x cpcmd %04x\n",
 		  status, cpr8(Cmd), cpr16(CpCmd));
 
@@ -596,25 +594,30 @@ static irqreturn_t cp_interrupt (int irq, void *dev_instance)
 	/* close possible race's with dev_close */
 	if (unlikely(!netif_running(dev))) {
+		handled = 1;
 		cpw16(IntrMask, 0);
 		goto out_unlock;
 	}
 
-	if (status & (RxOK | RxErr | RxEmpty | RxFIFOOvr))
+	if (status & (RxOK | RxErr | RxEmpty | RxFIFOOvr)) {
 		if (napi_schedule_prep(&cp->napi)) {
+			handled = 1;
 			cpw16_f(IntrMask, cp_norx_intr_mask);
 			__napi_schedule(&cp->napi);
 		}
-
+	}
 	if (status & (TxOK | TxErr | TxEmpty | SWInt))
-		cp_tx(cp);
-	if (status & LinkChg)
-		mii_check_media(&cp->mii_if, netif_msg_link(cp), false);
+		handled |= cp_tx(cp);
+	if (status & LinkChg) {
+		handled = 1;
+		mii_check_media(&cp->mii_if, netif_msg_link(cp), false);
+	}
 
 	if (status & PciErr) {
 		u16 pci_status;
 
+		handled = 1;
 		pci_read_config_word(cp->pdev, PCI_STATUS, &pci_status);
 		pci_write_config_word(cp->pdev, PCI_STATUS, pci_status);
 		netdev_err(dev, "PCI bus error, status=%04x, PCI status=%04x\n",
@@ -645,11 +648,12 @@ static void cp_poll_controller(struct net_device *dev)
 }
 #endif
 
-static void cp_tx (struct cp_private *cp)
+static int cp_tx (struct cp_private *cp)
 {
 	unsigned tx_head = cp->tx_head;
 	unsigned tx_tail = cp->tx_tail;
 	unsigned bytes_compl = 0, pkts_compl = 0;
+	int handled = 0;
 
 	while (tx_tail != tx_head) {
 		struct cp_desc *txd = cp->tx_ring + tx_tail;
@@ -661,6 +665,7 @@ static void cp_tx (struct cp_private *cp)
 		if (status & DescOwn)
 			break;
 
+		handled = 1;
 		skb = cp->tx_skb[tx_tail];
 		BUG_ON(!skb);
 
@@ -704,6 +709,8 @@ static void cp_tx (struct cp_private *cp)
 	netdev_completed_queue(cp->dev, pkts_compl, bytes_compl);
 	if (TX_BUFFS_AVAIL(cp) > (MAX_SKB_FRAGS + 1))
 		netif_wake_queue(cp->dev);
+
+	return handled;
 }
 
 static inline u32 cp_tx_vlan_tag(struct sk_buff *skb)
```