Message ID | 20200423173955.GA193359@google.com |
---|---|
State | New |
Series | [GIT,PULL] PCI fixes for v5.7 |

On Thu, Apr 23, 2020 at 10:40 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
>   - Workaround Apex TPU class code issue that prevents resource
>     assignment (Bjorn Helgaas)

Hmm.

I have no objections to that patch, but I do wonder if it might not be better to try to actually assign the resource at enable_resource time?

Put another way: if I read the situation correctly, what happened is that the hardware is broken and doesn't have the proper class code, and so the resource is not initially assigned at all. But then the driver matches on the device ID, and tries to use the device, and then we get into trouble at pci_enable_resources().

But is there any reason we don't just at least try to do pci_assign_resource() at that point? Yeah, because we didn't do it at bus scanning, maybe there's no room for it, but that's what we do for the PCI ROM resources (which I think we also don't claim by default) when drivers ask to map them.

The pci/rom.c code does

        /* assign the ROM an address if it doesn't have one */
        if (res->parent == NULL && pci_assign_resource(pdev, PCI_ROM_RESOURCE))
                return NULL;

could we perhaps do the same in enable_resource?

Your patch is obviously much better for an -rc kernel, so this is more of a longer-term "wouldn't it be less fragile to ..." query.

Alternatively, maybe we should do resource assignment even for PCI_CLASS_NOT_DEFINED?

Linus
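
The difference between failing on an unassigned resource and making one last attempt to assign it at enable time can be sketched in a small userspace model. This is an illustration only, not kernel code: every `toy_*` name below is hypothetical, the `assigned` flag merely stands in for the `res->parent != NULL` test quoted above, and the allocator is a trivial bump allocator rather than anything the PCI core does.

```c
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct toy_window {             /* free address space behind a bridge */
	unsigned long next;     /* next free address */
	unsigned long end;      /* one past the last usable address */
};

struct toy_bar {
	unsigned long start;
	unsigned long size;     /* power of two; 0 means "not implemented" */
	int assigned;           /* stands in for "res->parent != NULL" */
};

/* Carve the BAR out of the window; 0 on success, -1 if there is no room. */
static int toy_assign(struct toy_window *w, struct toy_bar *bar)
{
	/* Naturally align the BAR to its own size. */
	unsigned long start = (w->next + bar->size - 1) & ~(bar->size - 1);

	if (start > w->end || bar->size > w->end - start)
		return -1;

	bar->start = start;
	bar->assigned = 1;
	w->next = start + bar->size;
	return 0;
}

/* Strict policy: any implemented but unassigned BAR is an error. */
static int toy_enable_strict(const struct toy_bar *bars, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (bars[i].size && !bars[i].assigned)
			return -1;
	return 0;
}

/* Lazy policy: try to assign at enable time, fail only if that fails. */
static int toy_enable_lazy(struct toy_window *w, struct toy_bar *bars, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!bars[i].size || bars[i].assigned)
			continue;
		if (toy_assign(w, &bars[i]))
			return -1;      /* no room left after bus scanning */
	}
	return 0;
}
```

In this model the lazy policy succeeds whenever the window still has room and fails in the same way as the strict one when it does not, which matches the "maybe there's no room for it" caveat in the message above.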
The pull request you sent on Thu, 23 Apr 2020 12:39:55 -0500:
> git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git tags/pci-v5.7-fixes-1
has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/25b1fa8dfb3fe2578c04a077953b13c534f30902
Thank you!

On Thu, Apr 23, 2020 at 11:22:20AM -0700, Linus Torvalds wrote:
> On Thu, Apr 23, 2020 at 10:40 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> >   - Workaround Apex TPU class code issue that prevents resource
> >     assignment (Bjorn Helgaas)
>
> Hmm.
>
> I have no objections to that patch, but I do wonder if it might not be
> better to try to actually assign the resource at enable_resource time?
>
> Put another way: if I read the situation correctly, what happened is
> that the hardware is broken and doesn't have the proper class code,
> and so the resource is not initially assigned at all. But then the
> driver matches on the device ID, and tries to use the device, and then
> we get into trouble at pci_enable_resources().

Exactly.

> But is there any reason we don't just at least try to do
> pci_assign_resource() at that point? Yeah, because we didn't do it at
> bus scanning, maybe there's no room for it, but that's what we do for
> the PCI ROM resources (which I think we also don't claim by default)
> when drivers ask to map them.

That might make sense, but I think we should be consistent with the checking __dev_sort_resources() does, e.g., skipping PCI_CLASS_NOT_DEFINED, or at least understand why it's safe to be different.

> The pci/rom.c code does
>
>         /* assign the ROM an address if it doesn't have one */
>         if (res->parent == NULL && pci_assign_resource(pdev, PCI_ROM_RESOURCE))
>                 return NULL;
>
> could we perhaps do the same in enable_resource?
>
> Your patch is obviously much better for an -rc kernel, so this is more
> of a longer-term "wouldn't it be less fragile to ..." query.
>
> Alternatively, maybe we should do resource assignment even for
> PCI_CLASS_NOT_DEFINED?

Yeah. I don't know the history of why we skip PCI_CLASS_NOT_DEFINED. I did consider about the fact that we're skipping it, to make it easier to debug next time.

PCI_CLASS_NOT_DEFINED is supposed to be for devices built before the Class Code field was defined. That note is at least as old as PCI 2.2 from 1998, so there shouldn't be *that* many of those devices left.

Bjorn

On Thu, Apr 23, 2020 at 10:23:05PM -0500, Bjorn Helgaas wrote:
> Yeah. I don't know the history of why we skip PCI_CLASS_NOT_DEFINED.
> I did consider about the fact that we're skipping it, to make it
> easier to debug next time.

I did consider *warning* about ...

I think a "warning" would be of great value, as it would make it easy to identify the root cause of such issues pretty quickly.

On Fri, Apr 24, 2020 at 4:55 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Thu, Apr 23, 2020 at 10:23:05PM -0500, Bjorn Helgaas wrote:
> > Yeah. I don't know the history of why we skip PCI_CLASS_NOT_DEFINED.
> > I did consider about the fact that we're skipping it, to make it
> > easier to debug next time.
>
> I did consider *warning* about ...
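
The warning discussed above could look roughly like the following userspace sketch. It is not the kernel's `__dev_sort_resources()`: the function name `toy_skip_assignment`, its signature, and the message text are hypothetical, and the warning goes to `stderr` rather than the kernel log. The two class constants use the same values as the kernel's `pci_ids.h`, and the sketch assumes the check compares the upper 16 bits (base class and subclass) of the 24-bit class code.

```c
#include <stdio.h>

/* Same numeric values as the kernel's pci_ids.h definitions. */
#define PCI_CLASS_NOT_DEFINED   0x0000
#define PCI_CLASS_BRIDGE_HOST   0x0600

/*
 * Hypothetical helper: returns 1 if resource assignment would be skipped
 * for this device, 0 otherwise. class24 is the 24-bit class code
 * (base class, subclass, programming interface).
 */
static int toy_skip_assignment(unsigned int class24, const char *name)
{
	unsigned int class = class24 >> 8;      /* drop the prog-if byte */

	if (class == PCI_CLASS_NOT_DEFINED) {
		/* The part under discussion: say why nothing gets assigned. */
		fprintf(stderr,
			"%s: class code not defined, skipping resource assignment\n",
			name);
		return 1;
	}

	if (class == PCI_CLASS_BRIDGE_HOST)
		return 1;                       /* skipped silently */

	return 0;
}
```

With a message like this, a device that reports an undefined class code leaves a trace at scan time, before its driver later fails to enable it.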