Message ID | 1386614601-9453-1-git-send-email-natg@google.com |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
Headers | show |
On Mon, 2013-12-09 at 10:43 -0800, Nat Gurumoorthy wrote: > The new tg3 driver leaves REG_BASE_ADDR (PCI config offset 120) > uninitialized. From power on reset this register may have garbage in it. The > Register Base Address register defines the device local address of a > register. The data pointed to by this location is read or written using > the Register Data register (PCI config offset 128). When REG_BASE_ADDR has > garbage any read or write of Register Data Register (PCI 128) will cause the > PCI bus to lock up. The TCO watchdog will fire and bring down the system. > > Is this to prevent problem from other software that may be scanning the PCI config space? It won't help if this happens before the tg3 driver is loaded, right? Thanks. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Michael, We had crashes when the PCI config space got scanned via /sys/devices/pcixxxx/....../config. I agree that this fix will not help if the scan happens before the tg3 driver gets loaded. Regards Nat On Mon, Dec 9, 2013 at 11:00 AM, Michael Chan <mchan@broadcom.com> wrote: > On Mon, 2013-12-09 at 10:43 -0800, Nat Gurumoorthy wrote: >> The new tg3 driver leaves REG_BASE_ADDR (PCI config offset 120) >> uninitialized. From power on reset this register may have garbage in it. The >> Register Base Address register defines the device local address of a >> register. The data pointed to by this location is read or written using >> the Register Data register (PCI config offset 128). When REG_BASE_ADDR has >> garbage any read or write of Register Data Register (PCI 128) will cause the >> PCI bus to lock up. The TCO watchdog will fire and bring down the system. >> >> > Is this to prevent problem from other software that may be scanning the > PCI config space? It won't help if this happens before the tg3 driver > is loaded, right? > > Thanks. >
On Tue, 2013-12-10 at 00:18 +0300, Sergei Shtylyov wrote: > > We had crashes when the PCI config space got scanned via > > /sys/devices/pcixxxx/....../config. > > > I agree that this fix will not help if the scan happens before the tg3 > > driver gets loaded. > > Then perhaps a better place for such fixup would be a PCI quirk? > Yes, I agree. Thanks. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello. On 12/09/2013 10:22 PM, Natarajan Gurumoorthy wrote: > Michael, > We had crashes when the PCI config space got scanned via > /sys/devices/pcixxxx/....../config. > I agree that this fix will not help if the scan happens before the tg3 > driver gets loaded. Then perhaps a better place for such fixup would be a PCI quirk? > Regards > Nat > On Mon, Dec 9, 2013 at 11:00 AM, Michael Chan <mchan@broadcom.com> wrote: >> On Mon, 2013-12-09 at 10:43 -0800, Nat Gurumoorthy wrote: >>> The new tg3 driver leaves REG_BASE_ADDR (PCI config offset 120) >>> uninitialized. From power on reset this register may have garbage in it. The >>> Register Base Address register defines the device local address of a >>> register. The data pointed to by this location is read or written using >>> the Register Data register (PCI config offset 128). When REG_BASE_ADDR has >>> garbage any read or write of Register Data Register (PCI 128) will cause the >>> PCI bus to lock up. The TCO watchdog will fire and bring down the system. >> Is this to prevent problem from other software that may be scanning the >> PCI config space? It won't help if this happens before the tg3 driver >> is loaded, right? >> Thanks. WBR, Sergei -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 2013-12-09 at 13:07 -0800, Michael Chan wrote: > On Tue, 2013-12-10 at 00:18 +0300, Sergei Shtylyov wrote: > > > We had crashes when the PCI config space got scanned via > > > /sys/devices/pcixxxx/....../config. > > > > > I agree that this fix will not help if the scan happens before the tg3 > > > driver gets loaded. > > > > Then perhaps a better place for such fixup would be a PCI quirk? > > > Yes, I agree. Thanks. > On second thought, I think your original patch should be sufficient and we don't need to add the PCI quirk to cover so many devices. The reason is that indirect access needs to be explicitly enabled in the MISC_HOST_CTRL (0x68) register. The default value for register 0x68 should have indirect access disabled. Nat, does this match what you're seeing? Did you ever see any system crash before tg3 loads? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Michael, The only time I see crashes is after the tg3 driver has been loaded into the system. I our use case we are poking around /sys/devices/pcixxxx/......../config. I guess you will incorporate the original patch into the driver and we can abandon this patch. Regards Nat On Mon, Dec 9, 2013 at 10:58 PM, Michael Chan <mchan@broadcom.com> wrote: > On Mon, 2013-12-09 at 13:07 -0800, Michael Chan wrote: >> On Tue, 2013-12-10 at 00:18 +0300, Sergei Shtylyov wrote: >> > > We had crashes when the PCI config space got scanned via >> > > /sys/devices/pcixxxx/....../config. >> > >> > > I agree that this fix will not help if the scan happens before the tg3 >> > > driver gets loaded. >> > >> > Then perhaps a better place for such fixup would be a PCI quirk? >> > >> Yes, I agree. Thanks. >> > > On second thought, I think your original patch should be sufficient and > we don't need to add the PCI quirk to cover so many devices. The reason > is that indirect access needs to be explicitly enabled in the > MISC_HOST_CTRL (0x68) register. The default value for register 0x68 > should have indirect access disabled. > > Nat, does this match what you're seeing? Did you ever see any system > crash before tg3 loads? >
On Tue, 2013-12-10 at 07:01 -0800, Natarajan Gurumoorthy wrote: > The only time I see crashes is after the tg3 driver has been > loaded into the system. I our use case we are poking around > /sys/devices/pcixxxx/......../config. David, please apply this first patch. Acked-by: Michael Chan <mchan@broadcom.com> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Michael Chan <mchan@broadcom.com> Date: Mon, 9 Dec 2013 22:58:44 -0800 > On Mon, 2013-12-09 at 13:07 -0800, Michael Chan wrote: >> On Tue, 2013-12-10 at 00:18 +0300, Sergei Shtylyov wrote: >> > > We had crashes when the PCI config space got scanned via >> > > /sys/devices/pcixxxx/....../config. >> > >> > > I agree that this fix will not help if the scan happens before the tg3 >> > > driver gets loaded. >> > >> > Then perhaps a better place for such fixup would be a PCI quirk? >> > >> Yes, I agree. Thanks. >> > > On second thought, I think your original patch should be sufficient and > we don't need to add the PCI quirk to cover so many devices. The reason > is that indirect access needs to be explicitly enabled in the > MISC_HOST_CTRL (0x68) register. The default value for register 0x68 > should have indirect access disabled. > > Nat, does this match what you're seeing? Did you ever see any system > crash before tg3 loads? What if the kernel is booted via kexec, and the driver in the kernel we are kexec'ing from left indirect access enabled in MISC_HOST_CTRL? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2013-12-10 at 13:43 -0500, David Miller wrote: > What if the kernel is booted via kexec, and the driver in the kernel > we are kexec'ing from left indirect access enabled in MISC_HOST_CTRL? That should be ok. The driver will only use valid register offsets in indirect mode during run time, so register 0x78 should point to a valid register. If indirect mode is never used by the driver, it will point to zero with this patch. So register 0x78 will be valid (or zero) in the kexec'ed kernel. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Michael Chan <mchan@broadcom.com> Date: Tue, 10 Dec 2013 10:49:39 -0800 > On Tue, 2013-12-10 at 13:43 -0500, David Miller wrote: > >> What if the kernel is booted via kexec, and the driver in the kernel >> we are kexec'ing from left indirect access enabled in MISC_HOST_CTRL? > > That should be ok. The driver will only use valid register offsets in > indirect mode during run time, so register 0x78 should point to a valid > register. If indirect mode is never used by the driver, it will point > to zero with this patch. So register 0x78 will be valid (or zero) in > the kexec'ed kernel. Ok, that may be true, but I'd like to consider the much larger issue at hand. If the indirect mechanism is enabled, some of the offsets that may be in there might be value, but would be entirely undesirable to be read because the read has side effects. What if the interrupt status register is what gets read if the user scans config space at just the wrong moment, and therefore an interrupt gets lost? I understand that the patch we are discussing is a serious improvement from the current situation, so I will apply it and queue it up for -stable. However I think we need to do something reasonable to prevent the kinds of situations I described above. Thanks. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2013-12-10 at 22:23 -0500, David Miller wrote: > Ok, that may be true, but I'd like to consider the much larger issue > at hand. > > If the indirect mechanism is enabled, some of the offsets that may be > in there might be value, but would be entirely undesirable to be read > because the read has side effects. > > What if the interrupt status register is what gets read if the user > scans config space at just the wrong moment, and therefore an > interrupt gets lost? > I did a quick audit of the driver to see if we are seriously exposed to this problem. Only very old chips (5700 to 5703) use indirect register access as workarounds for various issues. I looked at the IRQ path which uses status block to determine if there is work to do. If status block contains no new data, we check the PCI_STATE register and the bits don't get cleared on read. MAC status register which is used to determine link status does not have bits that are cleared on read. So we should be ok. We use the statistics block for counters on these older chips so there is no conflict. On newer chips, we read counters from registers which get cleared on read, but we never use indirect methods to read these counters. On all new chips, indirect method is not used except to download firmware (on 57766 only). Userspace read will not interfere with firmware download. So it looks to me that we are in good shape. Thanks. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Michael Chan <mchan@broadcom.com> Date: Wed, 11 Dec 2013 15:21:18 -0800 > So it looks to me that we are in good shape. Thanks. Thanks for checking this out, and providing such detailed analysis. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c index 369b736..9a904fd 100644 --- a/drivers/net/ethernet/broadcom/tg3.c +++ b/drivers/net/ethernet/broadcom/tg3.c @@ -16499,6 +16499,9 @@ static int tg3_get_invariants(struct tg3 *tp, const struct pci_device_id *ent) /* Clear this out for sanity. */ tw32(TG3PCI_MEM_WIN_BASE_ADDR, 0); + /* Clear TG3PCI_REG_BASE_ADDR to prevent hangs. */ + tw32(TG3PCI_REG_BASE_ADDR, 0); + pci_read_config_dword(tp->pdev, TG3PCI_PCISTATE, &pci_state_reg); if ((pci_state_reg & PCISTATE_CONV_PCI_MODE) == 0 &&
The new tg3 driver leaves REG_BASE_ADDR (PCI config offset 120) uninitialized. From power on reset this register may have garbage in it. The Register Base Address register defines the device local address of a register. The data pointed to by this location is read or written using the Register Data register (PCI config offset 128). When REG_BASE_ADDR has garbage any read or write of Register Data Register (PCI 128) will cause the PCI bus to lock up. The TCO watchdog will fire and bring down the system. Signed-off-by: Nat Gurumoorthy <natg@google.com> --- drivers/net/ethernet/broadcom/tg3.c | 3 +++ 1 file changed, 3 insertions(+)