Message ID | 20120601125949.GA11973@electric-eye.fr.zoreil.com |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Thanks for the quick reply.

On Friday 01 June 2012 14:59:49, you wrote:
> Same thing if you reset and remove the pci device through sysfs then ask
> the PCI bridge to scan it again?

I didn't try it before, but I should have, I know this.
rmmod; reset; modprobe -> doesn't work
rmmod; reset; remove; rescan -> doesn't work either (?!)

> https://bugzilla.kernel.org/show_bug.cgi?id=42899 contains similar if not
> identical IOMMU messages (this #bz is messy but it may be of interest to
> add yourself to the Cc: list btw).

I found it shortly after my post (while watching the archives, in case someone
replied without CC :) ). I posted on that bug as I couldn't find a way to just
add myself to the bug's CC list.

> The r8169 bug is real but the IOMMU message seems rather useless if not
> bogus.

Just being curious, feel free to skip over my questions:
If it's bogus, could it be a mis-interpretation of its state when the error
occurs (I don't know how the CPU knows a fault happened, I guess some IRQ +
some registers contain error status, address of error, some process/context
identifier)? Or a hardware bug? Or an MMU misconfiguration for some reason?
If it's not bogus, would it be the sign of a firmware bug (accessing some
unpredictable memory upon certain conditions)?

> You can apply the attached patch but it may not do much for your problem.
> The patch below could make a difference though. Does it?

I'll try either and both. Given the poor result I got from
reset/remove/rescan, I guess I should reboot between attempts, right?
Should I prevent the original module from auto-loading at boot? Maybe more
than just r8169?

Regards,
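The reset/remove/rescan sequence tried above drives standard PCI sysfs attributes: a per-device `reset` and `remove` file, plus a rescan trigger. Below is a minimal C sketch of that sequence, equivalent to `echo 1 >` into each file. It assumes the bus-wide `/sys/bus/pci/rescan` attribute rather than a bridge-specific one; the function names and the BDF string are hypothetical placeholders. It does not unload or reload r8169, which still needs `rmmod`/`modprobe`.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Write "1" to a sysfs attribute, like `echo 1 > path` in a shell.
 * Returns 0 on success, -1 on failure (real attributes need root). */
int sysfs_write_one(const char *path)
{
	FILE *f = fopen(path, "w");
	if (f == NULL)
		return -1;
	int ok = fputs("1", f) >= 0;
	/* stdio buffers the write, so a kernel-side error only shows up
	 * when fclose() flushes it. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Build "/sys/bus/pci/devices/<bdf>/<attr>" into buf.
 * Returns -1 if buf is too small or a component contains '/'. */
int pci_attr_path(char *buf, size_t len, const char *bdf, const char *attr)
{
	/* Refuse components that would escape the device directory. */
	if (strchr(bdf, '/') != NULL || strchr(attr, '/') != NULL)
		return -1;
	int n = snprintf(buf, len, "/sys/bus/pci/devices/%s/%s", bdf, attr);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Reset the NIC, remove it from the PCI tree, then rescan all buses.
 * nic_bdf is a placeholder such as "0000:02:00.0" (hypothetical). */
int reset_remove_rescan(const char *nic_bdf)
{
	char path[256];

	if (pci_attr_path(path, sizeof path, nic_bdf, "reset") != 0 ||
	    sysfs_write_one(path) != 0)
		return -1;
	if (pci_attr_path(path, sizeof path, nic_bdf, "remove") != 0 ||
	    sysfs_write_one(path) != 0)
		return -1;
	/* Bus-wide rescan; rediscovers the device if it still responds. */
	return sysfs_write_one("/sys/bus/pci/rescan");
}
```

Stopping at the first failing step mirrors a shell script run with `set -e`: there is no point removing a device whose reset was refused.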
Vincent Pelletier <plr.vincent@gmail.com> :
[...]
> If it's bogus, could it be a mis-interpretation of its state when the error
> occurs (I don't know how the CPU knows a fault happened, I guess some IRQ +
> some registers contain error status, address of error, some process/context
> identifier)?

See "AMD I/O Virtualization Technology (IOMMU) Specification".

> Or a hardware bug? Or an MMU misconfiguration for some reason?

I don't have time to poke deeply enough into the iommu code.

[...]
> If it's not bogus, would it be the sign of a firmware bug (accessing some
> unpredictable memory upon certain conditions)?

That's what I thought at first. Or that I should have added something to the
r8169 driver. However, it's quite reproducible: the failing address is one of
the mapped Rx or Tx descriptor ring addresses (I don't remember which one, see
the PR at korg) and it does not fit the timing pattern.

[...]
> I'll try either and both. Given the poor result I got from
> reset/remove/rescan, I guess I should reboot between attempts, right?

Yes. The inlined patch could help avoid the problem but it is not supposed to
help a failed network adapter recover.

> Should I prevent the original module from auto-loading at boot? Maybe more
> than just r8169?

It should not be required. YMMV.
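Saying that the faulting address is one of the mapped Rx or Tx descriptor ring addresses amounts to a range check against each ring's DMA mapping. A minimal C sketch of that check follows. It assumes 16-byte descriptors with 64 Tx and 256 Rx entries (illustrative assumptions; verify them against r8169.c in your tree). The `dma_ring` struct and both functions are hypothetical, and in practice the addresses would come from the IOMMU log and the driver's mappings.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed sizes for illustration only; verify against r8169.c. */
#define DESC_BYTES  16u   /* one Rx/Tx descriptor */
#define NUM_TX_DESC 64u
#define NUM_RX_DESC 256u

/* Hypothetical record of one DMA-mapped descriptor ring. */
struct dma_ring {
	const char *name;   /* "Rx" or "Tx" */
	uint64_t base;      /* bus address programmed into the NIC */
	uint64_t bytes;     /* length of the mapping */
};

/* Return the ring whose mapping contains addr, or NULL if none does. */
const struct dma_ring *ring_for_fault(const struct dma_ring *rings, size_t n,
				      uint64_t addr)
{
	for (size_t i = 0; i < n; i++) {
		/* Subtract instead of computing base + bytes, which could overflow. */
		if (addr >= rings[i].base && addr - rings[i].base < rings[i].bytes)
			return &rings[i];
	}
	return NULL;
}

/* Index of the descriptor that addr falls into (addr must lie inside r). */
uint64_t desc_index(const struct dma_ring *r, uint64_t addr)
{
	return (addr - r->base) / DESC_BYTES;
}
```

With a Tx ring mapped at a hypothetical 0xfff00000 and an Rx ring at 0xfff10000, a fault at 0xfff10040 maps to descriptor 4 of the Rx ring.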
diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index bbacb37..da46588 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -3766,6 +3766,7 @@ static void rtl_init_rxcfg(struct rtl8169_private *tp)
 	case RTL_GIGA_MAC_VER_22:
 	case RTL_GIGA_MAC_VER_23:
 	case RTL_GIGA_MAC_VER_24:
+	case RTL_GIGA_MAC_VER_34:
 		RTL_W32(RxConfig, RX128_INT_EN | RX_MULTI_EN | RX_DMA_BURST);
 		break;
 	default:
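The patch only moves RTL_GIGA_MAC_VER_34 into the group of chips whose RxConfig gets RX_MULTI_EN. Below is a self-contained C model of that switch, which returns the value instead of writing the register. The enum values and bit positions are assumptions for illustration, and the cases before VER_22 are omitted. The hunk does not show the body of the `default:` branch, so the sketch assumes it writes RX128_INT_EN | RX_DMA_BURST without RX_MULTI_EN.

```c
#include <stdint.h>

/* Hypothetical subset of MAC versions; the real enum has many more. */
enum mac_version { MAC_VER_22 = 22, MAC_VER_23, MAC_VER_24,
		   MAC_VER_34 = 34, MAC_VER_35 };

/* Assumed bit values for illustration; check r8169.c for the real ones. */
#define RX128_INT_EN (1u << 15)
#define RX_MULTI_EN  (1u << 14)
#define RX_DMA_BURST (7u << 8)

/* RxConfig value chosen for a given chip after the patch. */
uint32_t rxcfg_for(enum mac_version v)
{
	switch (v) {
	case MAC_VER_22:
	case MAC_VER_23:
	case MAC_VER_24:
	case MAC_VER_34:	/* the case added by the patch */
		return RX128_INT_EN | RX_MULTI_EN | RX_DMA_BURST;
	default:		/* assumed body; not shown in the hunk */
		return RX128_INT_EN | RX_DMA_BURST;
	}
}
```

Because the new case falls through to the existing branch, VER_34 gets exactly the same RxConfig value as VER_24. Every other chip is left unchanged.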