Message ID | 20210514080025.1828197-6-chenhuacai@loongson.cn |
---|---|
State | New |
Headers | show |
Series | PCI: Loongson-related pci quirks | expand |
Hi Huacai, [...] > The ASpeed AST2500 hardward does not set the VGA Enable bit on its It would be "hardware" in the above sentence. > bridge control register, which causes vgaarb subsystem don't think the Probably better to say "don't consider the" rather than use "think". > VGA card behind the bridge as a valid boot vga device. It would be "VGA" in the above sentence, to be consistent. > So we provide a quirk to fix Xorg auto-detection. A nit pick. Technically it's "X.org", but I see that the canon in the commit messages is to use "Xorg", so either would be valid. > See similar bug: > > https://patchwork.kernel.org/project/linux-pci/patch/20170619023528.11532-1-dja@axtens.net/ A small nit pick. Linking to https://lore.kernel.org/linux-pci/ is often preferred. [...] > + struct pci_dev *bridge; > + struct pci_bus *bus; > + struct pci_dev *vdevp = NULL; > + u16 config; > + > + bus = pdev->bus; > + bridge = bus->self; Preferred style would be: struct pci_bus *bus = pdev->bug; struct pci_dev *bridge = bus->self; > + /* Is VGA routed to us? */ > + if (bridge && (pci_is_bridge(bridge))) { > + pci_read_config_word(bridge, PCI_BRIDGE_CONTROL, &config); > + > + /* Yes, this bridge is PCI bridge-to-bridge spec compliant, > + * just return! > + */ Second line of the comment has some errand space. Also, wording of this comment could be improved a bit. > + if (config & PCI_BRIDGE_CTL_VGA) > + return; > + > + dev_warn(&pdev->dev, "VGA bridge control is not enabled\n"); > + } > + > + /* Just return if the system already have a default device */ Missing period at the end of the sentence in the comment. I am also not sure this comment is useful, as the if-statement immediately below is quite self-explanatory. > + if (vga_default_device()) > + return; > + > + /* No default vga device */ It would be "VGA" here, plus missing period at the end. > + while ((vdevp = pci_get_class(PCI_CLASS_DISPLAY_VGA << 8, vdevp))) { > + if (vdevp->vendor != 0x1a03) { > + /* Have other vga devcie in the system, do nothing */ It would be "VGA" and "device" in the comment above, missing period at the end. I am also not sure if this comment is useful, given the log line just below it. > + dev_info(&pdev->dev, "Another boot vga device: 0x%x:0x%x\n", It would be "VGA" in the message above. Also, using "Another" is a bit vague. What makes a VGA device "another" in this case? > + dev_info(&pdev->dev, "Boot vga device set as 0x%x:0x%x\n", > + pdev->vendor, pdev->device); You need to align the second line in the above with the open bracket, like in the log line above. > +DECLARE_PCI_FIXUP_CLASS_FINAL(0x1a03, 0x2000, PCI_CLASS_DISPLAY_VGA, 8, aspeed_fixup_vgaarb); You might need to wrap this line above, see: https://lore.kernel.org/linux-pci/20171026223701.GA25649@bhelgaas-glaptop.roam.corp.google.com/ Krzysztof
On Fri, May 14, 2021 at 04:00:25PM +0800, Huacai Chen wrote: > According to PCI-to-PCI bridge spec, bit 3 of Bridge Control Register is > VGA Enable bit which modifies the response to VGA compatible addresses. The bridge spec is pretty old, and most of the content has been incorporated into the PCIe spec. I think you can cite "PCIe r5.0, sec 7.5.1.3.13" here instead. > If the VGA Enable bit is set, the bridge will decode and forward the > following accesses on the primary interface to the secondary interface. *Which* following accesses? The structure of English requires that if you say "the following accesses," you must continue by *listing* the accesses. > The ASpeed AST2500 hardward does not set the VGA Enable bit on its > bridge control register, which causes vgaarb subsystem don't think the > VGA card behind the bridge as a valid boot vga device. s/hardward/bridge/ s/vga/VGA/ (also in code comments and dmesg strings below) From the code, it looks like AST2500 ([1a03:2000]) is a VGA device, since it apparently has a VGA class code. But here you say the AST2500 has a Bridge Control register, which suggests that it's a bridge. If AST2500 is some sort of combination that includes both a bridge and a VGA device, please outline that topology. But the hardware defect is that some bridges forward VGA accesses even though their VGA Enable bit is not set? The quirk should be attached to broken *bridges*, not to VGA devices. If a bridge forwards VGA accesses regardless of how its VGA Enable bit is set, that means VGA arbitration (in vgaarb.c) cannot work correctly, so merely setting the default VGA device once in a quirk is not sufficient. You would have to somehow disable any future attempts to use other VGA devices. Only the VGA device below this defective bridge is usable. Any other VGA devices in the system would be useless. > So we provide a quirk to fix Xorg auto-detection. > > See similar bug: > > https://patchwork.kernel.org/project/linux-pci/patch/20170619023528.11532-1-dja@axtens.net/ This patch was never merged. If we merged a revised version, please cite the SHA1 instead. > Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> > Signed-off-by: Jingfeng Sui <suijingfeng@loongson.cn> > --- > drivers/pci/quirks.c | 47 ++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 47 insertions(+) > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c > index 6ab4b3bba36b..adf5490706ad 100644 > --- a/drivers/pci/quirks.c > +++ b/drivers/pci/quirks.c > @@ -28,6 +28,7 @@ > #include <linux/platform_data/x86/apple.h> > #include <linux/pm_runtime.h> > #include <linux/switchtec.h> > +#include <linux/vgaarb.h> > #include <asm/dma.h> /* isa_dma_bridge_buggy */ > #include "pci.h" > > @@ -297,6 +298,52 @@ static void loongson_mrrs_quirk(struct pci_dev *dev) > } > DECLARE_PCI_FIXUP_ENABLE(PCI_ANY_ID, PCI_ANY_ID, loongson_mrrs_quirk); > > + > +static void aspeed_fixup_vgaarb(struct pci_dev *pdev) > +{ > + struct pci_dev *bridge; > + struct pci_bus *bus; > + struct pci_dev *vdevp = NULL; > + u16 config; > + > + bus = pdev->bus; > + bridge = bus->self; > + > + /* Is VGA routed to us? */ > + if (bridge && (pci_is_bridge(bridge))) { > + pci_read_config_word(bridge, PCI_BRIDGE_CONTROL, &config); > + > + /* Yes, this bridge is PCI bridge-to-bridge spec compliant, > + * just return! > + */ > + if (config & PCI_BRIDGE_CTL_VGA) > + return; > + > + dev_warn(&pdev->dev, "VGA bridge control is not enabled\n"); > + } You cannot assume that a bridge is defective just because PCI_BRIDGE_CTL_VGA is not set. > + /* Just return if the system already have a default device */ > + if (vga_default_device()) > + return; > + > + /* No default vga device */ > + while ((vdevp = pci_get_class(PCI_CLASS_DISPLAY_VGA << 8, vdevp))) { > + if (vdevp->vendor != 0x1a03) { > + /* Have other vga devcie in the system, do nothing */ > + dev_info(&pdev->dev, "Another boot vga device: 0x%x:0x%x\n", > + vdevp->vendor, vdevp->device); > + return; > + } > + } > + > + vga_set_default_device(pdev); > + > + dev_info(&pdev->dev, "Boot vga device set as 0x%x:0x%x\n", > + pdev->vendor, pdev->device); > +} > +DECLARE_PCI_FIXUP_CLASS_FINAL(0x1a03, 0x2000, PCI_CLASS_DISPLAY_VGA, 8, aspeed_fixup_vgaarb); > + > + > /* > * The Mellanox Tavor device gives false positive parity errors. Disable > * parity error reporting. > -- > 2.27.0 >
Hi, Bjorn, On Fri, May 14, 2021 at 11:10 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Fri, May 14, 2021 at 04:00:25PM +0800, Huacai Chen wrote: > > According to PCI-to-PCI bridge spec, bit 3 of Bridge Control Register is > > VGA Enable bit which modifies the response to VGA compatible addresses. > > The bridge spec is pretty old, and most of the content has been > incorporated into the PCIe spec. I think you can cite "PCIe r5.0, sec > 7.5.1.3.13" here instead. > > > If the VGA Enable bit is set, the bridge will decode and forward the > > following accesses on the primary interface to the secondary interface. > > *Which* following accesses? The structure of English requires that if > you say "the following accesses," you must continue by *listing* the > accesses. > > > The ASpeed AST2500 hardward does not set the VGA Enable bit on its > > bridge control register, which causes vgaarb subsystem don't think the > > VGA card behind the bridge as a valid boot vga device. > > s/hardward/bridge/ > s/vga/VGA/ (also in code comments and dmesg strings below) > > From the code, it looks like AST2500 ([1a03:2000]) is a VGA device, > since it apparently has a VGA class code. But here you say the > AST2500 has a Bridge Control register, which suggests that it's a > bridge. If AST2500 is some sort of combination that includes both a > bridge and a VGA device, please outline that topology. > > But the hardware defect is that some bridges forward VGA accesses even > though their VGA Enable bit is not set? The quirk should be attached > to broken *bridges*, not to VGA devices. > > If a bridge forwards VGA accesses regardless of how its VGA Enable bit > is set, that means VGA arbitration (in vgaarb.c) cannot work > correctly, so merely setting the default VGA device once in a quirk is > not sufficient. You would have to somehow disable any future attempts > to use other VGA devices. Only the VGA device below this defective > bridge is usable. Any other VGA devices in the system would be > useless. > > > So we provide a quirk to fix Xorg auto-detection. > > > > See similar bug: > > > > https://patchwork.kernel.org/project/linux-pci/patch/20170619023528.11532-1-dja@axtens.net/ > > This patch was never merged. If we merged a revised version, please > cite the SHA1 instead. This patch has never merged, and I found that it is unnecessary after commit a37c0f48950b56f6ef2ee637 ("vgaarb: Select a default VGA device even if there's no legacy VGA"). Maybe this ASpeed patch is also unnecessary. If it is still needed, I'll investigate the root cause. Huacai > > > Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> > > Signed-off-by: Jingfeng Sui <suijingfeng@loongson.cn> > > --- > > drivers/pci/quirks.c | 47 ++++++++++++++++++++++++++++++++++++++++++++ > > 1 file changed, 47 insertions(+) > > > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c > > index 6ab4b3bba36b..adf5490706ad 100644 > > --- a/drivers/pci/quirks.c > > +++ b/drivers/pci/quirks.c > > @@ -28,6 +28,7 @@ > > #include <linux/platform_data/x86/apple.h> > > #include <linux/pm_runtime.h> > > #include <linux/switchtec.h> > > +#include <linux/vgaarb.h> > > #include <asm/dma.h> /* isa_dma_bridge_buggy */ > > #include "pci.h" > > > > @@ -297,6 +298,52 @@ static void loongson_mrrs_quirk(struct pci_dev *dev) > > } > > DECLARE_PCI_FIXUP_ENABLE(PCI_ANY_ID, PCI_ANY_ID, loongson_mrrs_quirk); > > > > + > > +static void aspeed_fixup_vgaarb(struct pci_dev *pdev) > > +{ > > + struct pci_dev *bridge; > > + struct pci_bus *bus; > > + struct pci_dev *vdevp = NULL; > > + u16 config; > > + > > + bus = pdev->bus; > > + bridge = bus->self; > > + > > + /* Is VGA routed to us? */ > > + if (bridge && (pci_is_bridge(bridge))) { > > + pci_read_config_word(bridge, PCI_BRIDGE_CONTROL, &config); > > + > > + /* Yes, this bridge is PCI bridge-to-bridge spec compliant, > > + * just return! > > + */ > > + if (config & PCI_BRIDGE_CTL_VGA) > > + return; > > + > > + dev_warn(&pdev->dev, "VGA bridge control is not enabled\n"); > > + } > > You cannot assume that a bridge is defective just because > PCI_BRIDGE_CTL_VGA is not set. > > > + /* Just return if the system already have a default device */ > > + if (vga_default_device()) > > + return; > > + > > + /* No default vga device */ > > + while ((vdevp = pci_get_class(PCI_CLASS_DISPLAY_VGA << 8, vdevp))) { > > + if (vdevp->vendor != 0x1a03) { > > + /* Have other vga devcie in the system, do nothing */ > > + dev_info(&pdev->dev, "Another boot vga device: 0x%x:0x%x\n", > > + vdevp->vendor, vdevp->device); > > + return; > > + } > > + } > > + > > + vga_set_default_device(pdev); > > + > > + dev_info(&pdev->dev, "Boot vga device set as 0x%x:0x%x\n", > > + pdev->vendor, pdev->device); > > +} > > +DECLARE_PCI_FIXUP_CLASS_FINAL(0x1a03, 0x2000, PCI_CLASS_DISPLAY_VGA, 8, aspeed_fixup_vgaarb); > > + > > + > > /* > > * The Mellanox Tavor device gives false positive parity errors. Disable > > * parity error reporting. > > -- > > 2.27.0 > >
Hi, Bjorn, On Sat, May 15, 2021 at 5:09 PM Huacai Chen <chenhuacai@gmail.com> wrote: > > Hi, Bjorn, > > On Fri, May 14, 2021 at 11:10 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > > > On Fri, May 14, 2021 at 04:00:25PM +0800, Huacai Chen wrote: > > > According to PCI-to-PCI bridge spec, bit 3 of Bridge Control Register is > > > VGA Enable bit which modifies the response to VGA compatible addresses. > > > > The bridge spec is pretty old, and most of the content has been > > incorporated into the PCIe spec. I think you can cite "PCIe r5.0, sec > > 7.5.1.3.13" here instead. > > > > > If the VGA Enable bit is set, the bridge will decode and forward the > > > following accesses on the primary interface to the secondary interface. > > > > *Which* following accesses? The structure of English requires that if > > you say "the following accesses," you must continue by *listing* the > > accesses. > > > > > The ASpeed AST2500 hardward does not set the VGA Enable bit on its > > > bridge control register, which causes vgaarb subsystem don't think the > > > VGA card behind the bridge as a valid boot vga device. > > > > s/hardward/bridge/ > > s/vga/VGA/ (also in code comments and dmesg strings below) > > > > From the code, it looks like AST2500 ([1a03:2000]) is a VGA device, > > since it apparently has a VGA class code. But here you say the > > AST2500 has a Bridge Control register, which suggests that it's a > > bridge. If AST2500 is some sort of combination that includes both a > > bridge and a VGA device, please outline that topology. > > > > But the hardware defect is that some bridges forward VGA accesses even > > though their VGA Enable bit is not set? The quirk should be attached > > to broken *bridges*, not to VGA devices. > > > > If a bridge forwards VGA accesses regardless of how its VGA Enable bit > > is set, that means VGA arbitration (in vgaarb.c) cannot work > > correctly, so merely setting the default VGA device once in a quirk is > > not sufficient. You would have to somehow disable any future attempts > > to use other VGA devices. Only the VGA device below this defective > > bridge is usable. Any other VGA devices in the system would be > > useless. > > > > > So we provide a quirk to fix Xorg auto-detection. > > > > > > See similar bug: > > > > > > https://patchwork.kernel.org/project/linux-pci/patch/20170619023528.11532-1-dja@axtens.net/ > > > > This patch was never merged. If we merged a revised version, please > > cite the SHA1 instead. > This patch has never merged, and I found that it is unnecessary after > commit a37c0f48950b56f6ef2ee637 ("vgaarb: Select a default VGA device > even if there's no legacy VGA"). Maybe this ASpeed patch is also > unnecessary. If it is still needed, I'll investigate the root cause. I found that vga_arb_device_init() and pcibios_init() are both wrapped by subsys_initcall(), which means their sequence is unpredictable. And unfortunately, in our platform vga_arb_device_init() is called before pcibios_init(), which makes vga_arb_device_init() fail to set a default vga device. This is the root cause why we thought that we still need a quirk for AST2500. I think the best solution is make vga_arb_device_init() be wrapped by subsys_initcall_sync(), do you think so? Huacai > > Huacai > > > > > Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> > > > Signed-off-by: Jingfeng Sui <suijingfeng@loongson.cn> > > > --- > > > drivers/pci/quirks.c | 47 ++++++++++++++++++++++++++++++++++++++++++++ > > > 1 file changed, 47 insertions(+) > > > > > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c > > > index 6ab4b3bba36b..adf5490706ad 100644 > > > --- a/drivers/pci/quirks.c > > > +++ b/drivers/pci/quirks.c > > > @@ -28,6 +28,7 @@ > > > #include <linux/platform_data/x86/apple.h> > > > #include <linux/pm_runtime.h> > > > #include <linux/switchtec.h> > > > +#include <linux/vgaarb.h> > > > #include <asm/dma.h> /* isa_dma_bridge_buggy */ > > > #include "pci.h" > > > > > > @@ -297,6 +298,52 @@ static void loongson_mrrs_quirk(struct pci_dev *dev) > > > } > > > DECLARE_PCI_FIXUP_ENABLE(PCI_ANY_ID, PCI_ANY_ID, loongson_mrrs_quirk); > > > > > > + > > > +static void aspeed_fixup_vgaarb(struct pci_dev *pdev) > > > +{ > > > + struct pci_dev *bridge; > > > + struct pci_bus *bus; > > > + struct pci_dev *vdevp = NULL; > > > + u16 config; > > > + > > > + bus = pdev->bus; > > > + bridge = bus->self; > > > + > > > + /* Is VGA routed to us? */ > > > + if (bridge && (pci_is_bridge(bridge))) { > > > + pci_read_config_word(bridge, PCI_BRIDGE_CONTROL, &config); > > > + > > > + /* Yes, this bridge is PCI bridge-to-bridge spec compliant, > > > + * just return! > > > + */ > > > + if (config & PCI_BRIDGE_CTL_VGA) > > > + return; > > > + > > > + dev_warn(&pdev->dev, "VGA bridge control is not enabled\n"); > > > + } > > > > You cannot assume that a bridge is defective just because > > PCI_BRIDGE_CTL_VGA is not set. > > > > > + /* Just return if the system already have a default device */ > > > + if (vga_default_device()) > > > + return; > > > + > > > + /* No default vga device */ > > > + while ((vdevp = pci_get_class(PCI_CLASS_DISPLAY_VGA << 8, vdevp))) { > > > + if (vdevp->vendor != 0x1a03) { > > > + /* Have other vga devcie in the system, do nothing */ > > > + dev_info(&pdev->dev, "Another boot vga device: 0x%x:0x%x\n", > > > + vdevp->vendor, vdevp->device); > > > + return; > > > + } > > > + } > > > + > > > + vga_set_default_device(pdev); > > > + > > > + dev_info(&pdev->dev, "Boot vga device set as 0x%x:0x%x\n", > > > + pdev->vendor, pdev->device); > > > +} > > > +DECLARE_PCI_FIXUP_CLASS_FINAL(0x1a03, 0x2000, PCI_CLASS_DISPLAY_VGA, 8, aspeed_fixup_vgaarb); > > > + > > > + > > > /* > > > * The Mellanox Tavor device gives false positive parity errors. Disable > > > * parity error reporting. > > > -- > > > 2.27.0 > > >
On Mon, May 17, 2021 at 08:53:43PM +0800, Huacai Chen wrote: > On Sat, May 15, 2021 at 5:09 PM Huacai Chen <chenhuacai@gmail.com> wrote: > > On Fri, May 14, 2021 at 11:10 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > > On Fri, May 14, 2021 at 04:00:25PM +0800, Huacai Chen wrote: > > > > According to PCI-to-PCI bridge spec, bit 3 of Bridge Control Register is > > > > VGA Enable bit which modifies the response to VGA compatible addresses. > > > > > > The bridge spec is pretty old, and most of the content has been > > > incorporated into the PCIe spec. I think you can cite "PCIe r5.0, sec > > > 7.5.1.3.13" here instead. > > > > > > > If the VGA Enable bit is set, the bridge will decode and forward the > > > > following accesses on the primary interface to the secondary interface. > > > > > > *Which* following accesses? The structure of English requires that if > > > you say "the following accesses," you must continue by *listing* the > > > accesses. > > > > > > > The ASpeed AST2500 hardward does not set the VGA Enable bit on its > > > > bridge control register, which causes vgaarb subsystem don't think the > > > > VGA card behind the bridge as a valid boot vga device. > > > > > > s/hardward/bridge/ > > > s/vga/VGA/ (also in code comments and dmesg strings below) > > > > > > From the code, it looks like AST2500 ([1a03:2000]) is a VGA device, > > > since it apparently has a VGA class code. But here you say the > > > AST2500 has a Bridge Control register, which suggests that it's a > > > bridge. If AST2500 is some sort of combination that includes both a > > > bridge and a VGA device, please outline that topology. > > > > > > But the hardware defect is that some bridges forward VGA accesses even > > > though their VGA Enable bit is not set? The quirk should be attached > > > to broken *bridges*, not to VGA devices. > > > > > > If a bridge forwards VGA accesses regardless of how its VGA Enable bit > > > is set, that means VGA arbitration (in vgaarb.c) cannot work > > > correctly, so merely setting the default VGA device once in a quirk is > > > not sufficient. You would have to somehow disable any future attempts > > > to use other VGA devices. Only the VGA device below this defective > > > bridge is usable. Any other VGA devices in the system would be > > > useless. > > > > > > > So we provide a quirk to fix Xorg auto-detection. > > > > > > > > See similar bug: > > > > > > > > https://patchwork.kernel.org/project/linux-pci/patch/20170619023528.11532-1-dja@axtens.net/ > > > > > > This patch was never merged. If we merged a revised version, please > > > cite the SHA1 instead. > > > > This patch has never merged, and I found that it is unnecessary after > > commit a37c0f48950b56f6ef2ee637 ("vgaarb: Select a default VGA device > > even if there's no legacy VGA"). Maybe this ASpeed patch is also > > unnecessary. If it is still needed, I'll investigate the root cause. > > I found that vga_arb_device_init() and pcibios_init() are both wrapped > by subsys_initcall(), which means their sequence is unpredictable. And > unfortunately, in our platform vga_arb_device_init() is called before > pcibios_init(), which makes vga_arb_device_init() fail to set a > default vga device. This is the root cause why we thought that we > still need a quirk for AST2500. Does this mean there is no hardware defect here? The VGA Enable bit works correctly? > I think the best solution is make vga_arb_device_init() be wrapped by > subsys_initcall_sync(), do you think so? Hmm. Unfortunately the semantics of subsys_initcall_sync() are not documented, so I'm not sure exactly *why* such a change would work and whether we could rely on it to continue working. pcibios_init() isn't very consistent across arches. On some, including alpha, microblaze, some MIPS platforms, powerpc, and sh, it enumerates PCI devices. On others (ia64, parisc, sparc, x86), it does basically nothing. That makes life a little difficult. vga_arb_device_init() is a subsys_initcall() and wants to look through all the PCI devices. That's a little problematic for arches where pcibios_init() is also a subsys_initcall() and enumerates PCI devices. Sorry, that's no answer for you. Just more questions :) > > > > Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> > > > > Signed-off-by: Jingfeng Sui <suijingfeng@loongson.cn> > > > > --- > > > > drivers/pci/quirks.c | 47 ++++++++++++++++++++++++++++++++++++++++++++ > > > > 1 file changed, 47 insertions(+) > > > > > > > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c > > > > index 6ab4b3bba36b..adf5490706ad 100644 > > > > --- a/drivers/pci/quirks.c > > > > +++ b/drivers/pci/quirks.c > > > > @@ -28,6 +28,7 @@ > > > > #include <linux/platform_data/x86/apple.h> > > > > #include <linux/pm_runtime.h> > > > > #include <linux/switchtec.h> > > > > +#include <linux/vgaarb.h> > > > > #include <asm/dma.h> /* isa_dma_bridge_buggy */ > > > > #include "pci.h" > > > > > > > > @@ -297,6 +298,52 @@ static void loongson_mrrs_quirk(struct pci_dev *dev) > > > > } > > > > DECLARE_PCI_FIXUP_ENABLE(PCI_ANY_ID, PCI_ANY_ID, loongson_mrrs_quirk); > > > > > > > > + > > > > +static void aspeed_fixup_vgaarb(struct pci_dev *pdev) > > > > +{ > > > > + struct pci_dev *bridge; > > > > + struct pci_bus *bus; > > > > + struct pci_dev *vdevp = NULL; > > > > + u16 config; > > > > + > > > > + bus = pdev->bus; > > > > + bridge = bus->self; > > > > + > > > > + /* Is VGA routed to us? */ > > > > + if (bridge && (pci_is_bridge(bridge))) { > > > > + pci_read_config_word(bridge, PCI_BRIDGE_CONTROL, &config); > > > > + > > > > + /* Yes, this bridge is PCI bridge-to-bridge spec compliant, > > > > + * just return! > > > > + */ > > > > + if (config & PCI_BRIDGE_CTL_VGA) > > > > + return; > > > > + > > > > + dev_warn(&pdev->dev, "VGA bridge control is not enabled\n"); > > > > + } > > > > > > You cannot assume that a bridge is defective just because > > > PCI_BRIDGE_CTL_VGA is not set. > > > > > > > + /* Just return if the system already have a default device */ > > > > + if (vga_default_device()) > > > > + return; > > > > + > > > > + /* No default vga device */ > > > > + while ((vdevp = pci_get_class(PCI_CLASS_DISPLAY_VGA << 8, vdevp))) { > > > > + if (vdevp->vendor != 0x1a03) { > > > > + /* Have other vga devcie in the system, do nothing */ > > > > + dev_info(&pdev->dev, "Another boot vga device: 0x%x:0x%x\n", > > > > + vdevp->vendor, vdevp->device); > > > > + return; > > > > + } > > > > + } > > > > + > > > > + vga_set_default_device(pdev); > > > > + > > > > + dev_info(&pdev->dev, "Boot vga device set as 0x%x:0x%x\n", > > > > + pdev->vendor, pdev->device); > > > > +} > > > > +DECLARE_PCI_FIXUP_CLASS_FINAL(0x1a03, 0x2000, PCI_CLASS_DISPLAY_VGA, 8, aspeed_fixup_vgaarb); > > > > + > > > > + > > > > /* > > > > * The Mellanox Tavor device gives false positive parity errors. Disable > > > > * parity error reporting. > > > > -- > > > > 2.27.0 > > > >
> -----Original Messages----- > From: "Bjorn Helgaas" <helgaas@kernel.org> > Sent Time: 2021-05-18 02:28:10 (Tuesday) > To: "Huacai Chen" <chenhuacai@gmail.com> > Cc: "Huacai Chen" <chenhuacai@loongson.cn>, "Bjorn Helgaas" <bhelgaas@google.com>, linux-pci <linux-pci@vger.kernel.org>, "Jiaxun Yang" <jiaxun.yang@flygoat.com>, "Jingfeng Sui" <suijingfeng@loongson.cn> > Subject: Re: [PATCH 5/5] PCI: Support ASpeed VGA cards behind a misbehaving bridge > > On Mon, May 17, 2021 at 08:53:43PM +0800, Huacai Chen wrote: > > On Sat, May 15, 2021 at 5:09 PM Huacai Chen <chenhuacai@gmail.com> wrote: > > > On Fri, May 14, 2021 at 11:10 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > > > On Fri, May 14, 2021 at 04:00:25PM +0800, Huacai Chen wrote: > > > > > According to PCI-to-PCI bridge spec, bit 3 of Bridge Control Register is > > > > > VGA Enable bit which modifies the response to VGA compatible addresses. > > > > > > > > The bridge spec is pretty old, and most of the content has been > > > > incorporated into the PCIe spec. I think you can cite "PCIe r5.0, sec > > > > 7.5.1.3.13" here instead. > > > > > > > > > If the VGA Enable bit is set, the bridge will decode and forward the > > > > > following accesses on the primary interface to the secondary interface. > > > > > > > > *Which* following accesses? The structure of English requires that if > > > > you say "the following accesses," you must continue by *listing* the > > > > accesses. > > > > > > > > > The ASpeed AST2500 hardward does not set the VGA Enable bit on its > > > > > bridge control register, which causes vgaarb subsystem don't think the > > > > > VGA card behind the bridge as a valid boot vga device. > > > > > > > > s/hardward/bridge/ > > > > s/vga/VGA/ (also in code comments and dmesg strings below) > > > > > > > > From the code, it looks like AST2500 ([1a03:2000]) is a VGA device, > > > > since it apparently has a VGA class code. But here you say the > > > > AST2500 has a Bridge Control register, which suggests that it's a > > > > bridge. If AST2500 is some sort of combination that includes both a > > > > bridge and a VGA device, please outline that topology. > > > > > > > > But the hardware defect is that some bridges forward VGA accesses even > > > > though their VGA Enable bit is not set? The quirk should be attached > > > > to broken *bridges*, not to VGA devices. > > > > > > > > If a bridge forwards VGA accesses regardless of how its VGA Enable bit > > > > is set, that means VGA arbitration (in vgaarb.c) cannot work > > > > correctly, so merely setting the default VGA device once in a quirk is > > > > not sufficient. You would have to somehow disable any future attempts > > > > to use other VGA devices. Only the VGA device below this defective > > > > bridge is usable. Any other VGA devices in the system would be > > > > useless. > > > > > > > > > So we provide a quirk to fix Xorg auto-detection. > > > > > > > > > > See similar bug: > > > > > > > > > > https://patchwork.kernel.org/project/linux-pci/patch/20170619023528.11532-1-dja@axtens.net/ > > > > > > > > This patch was never merged. If we merged a revised version, please > > > > cite the SHA1 instead. > > > > > > This patch has never merged, and I found that it is unnecessary after > > > commit a37c0f48950b56f6ef2ee637 ("vgaarb: Select a default VGA device > > > even if there's no legacy VGA"). Maybe this ASpeed patch is also > > > unnecessary. If it is still needed, I'll investigate the root cause. > > > > I found that vga_arb_device_init() and pcibios_init() are both wrapped > > by subsys_initcall(), which means their sequence is unpredictable. And > > unfortunately, in our platform vga_arb_device_init() is called before > > pcibios_init(), which makes vga_arb_device_init() fail to set a > > default vga device. This is the root cause why we thought that we > > still need a quirk for AST2500. > > Does this mean there is no hardware defect here? The VGA Enable bit > works correctly? The AST2500 BMC card we are using consist of a PCI-to-PCI Bridge (1a03:1150) and a PCI VGA device (1a03:2000). The value of the Bridge Control register in its PCI-to-PCI Bridge Configuration Registers is 0x0000. Thus, the VGA Enable bit in the Bridge Control register do not set, and the VGA Enable bit is not writable. The topology of this BMC card is illustrated as following: /sys/devices/pci0000:00 |-- 0000:00:0c.0 | |-- class (0x060400) | |-- vendor (0x0014) | |-- device (0x7a09) | |-- ... | |-- 0000:04:00.0 | | | -- class (0x060400) | | | -- device (0x1150) | | | -- vendor (0x1a03) | | | -- revision (0x04) | | | -- ... | | | -- 0000:05:00.0 | | | | -- class (0x030000) | | | | -- device (0x2000) | | | | -- vendor (0x1a03) | | | | -- irq (51) | | | | -- i2c-6 | | | | -- drm | | | | -- graphics | | | | -- ... | `-- uevent `-- ... The following information is getted from lspci -vvxx: 04:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 04) (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <tabort- <mabort-="">SERR- <perr- intx-="" latency:="" 0="" interrupt:="" pin="" a="" routed="" to="" irq="" 51="" numa="" node:="" bus:="" primary="04," secondary="05," subordinate="05," sec-latency="0" i="" o="" behind="" bridge:="" 00004000-00004fff="" memory="" 41000000-427fffff="" status:="" 66mhz+="" fastb2b-="" parerr-="" devsel="medium">TAbort- <tabort- <mabort-="" <serr-="" <perr-="" bridgectl:="" parity-="" serr-="" noisa-="" vga-="" mabort-="">Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: <access denied=""> 00: 03 1a 50 11 07 00 10 00 04 00 04 06 00 00 01 00 10: 00 00 00 00 00 00 00 00 04 05 05 00 41 41 20 02 20: 00 41 70 42 f1 ff 01 00 00 00 00 00 00 00 00 00 30: 00 00 00 00 50 00 00 00 00 00 00 00 63 01 00 00 05:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 41) (prog-if 00 [VGA controller]) Subsystem: ASPEED Technology, Inc. ASPEED Graphics Family Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop+ ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <tabort- <mabort-="">SERR- <perr- intx-="" latency:="" 0="" interrupt:="" pin="" a="" routed="" to="" irq="" 51="" numa="" node:="" region="" 0:="" memory="" at="" 41000000="" (32-bit,="" non-prefetchable)="" [size="16M]" 1:="" 42000000="" 2:="" i="" o="" ports="" 4000="" capabilities:="" <access="" denied=""> Kernel driver in use: ast 00: 03 1a 00 20 27 00 10 02 41 00 00 03 00 00 00 00 10: 00 00 00 41 00 00 00 42 01 40 00 00 00 00 00 00 20: 00 00 00 00 00 00 00 00 00 00 00 00 03 1a 00 20 30: 00 00 00 00 40 00 00 00 00 00 00 00 63 01 00 00 > > > I think the best solution is make vga_arb_device_init() be wrapped by > > subsys_initcall_sync(), do you think so? > > Hmm. Unfortunately the semantics of subsys_initcall_sync() are not > documented, so I'm not sure exactly *why* such a change would work and > whether we could rely on it to continue working. > > pcibios_init() isn't very consistent across arches. On some, > including alpha, microblaze, some MIPS platforms, powerpc, and sh, it > enumerates PCI devices. On others (ia64, parisc, sparc, x86), it does > basically nothing. That makes life a little difficult. > > vga_arb_device_init() is a subsys_initcall() and wants to look through > all the PCI devices. That's a little problematic for arches where > pcibios_init() is also a subsys_initcall() and enumerates PCI devices. > > Sorry, that's no answer for you. Just more questions :) > > > > > > Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> > > > > > Signed-off-by: Jingfeng Sui <suijingfeng@loongson.cn> > > > > > --- > > > > > drivers/pci/quirks.c | 47 ++++++++++++++++++++++++++++++++++++++++++++ > > > > > 1 file changed, 47 insertions(+) > > > > > > > > > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c > > > > > index 6ab4b3bba36b..adf5490706ad 100644 > > > > > --- a/drivers/pci/quirks.c > > > > > +++ b/drivers/pci/quirks.c > > > > > @@ -28,6 +28,7 @@ > > > > > #include <linux platform_data="" x86="" apple.h=""> > > > > > #include <linux pm_runtime.h=""> > > > > > #include <linux switchtec.h=""> > > > > > +#include <linux vgaarb.h=""> > > > > > #include <asm dma.h=""> /* isa_dma_bridge_buggy */ > > > > > #include "pci.h" > > > > > > > > > > @@ -297,6 +298,52 @@ static void loongson_mrrs_quirk(struct pci_dev *dev) > > > > > } > > > > > DECLARE_PCI_FIXUP_ENABLE(PCI_ANY_ID, PCI_ANY_ID, loongson_mrrs_quirk); > > > > > > > > > > + > > > > > +static void aspeed_fixup_vgaarb(struct pci_dev *pdev) > > > > > +{ > > > > > + struct pci_dev *bridge; > > > > > + struct pci_bus *bus; > > > > > + struct pci_dev *vdevp = NULL; > > > > > + u16 config; > > > > > + > > > > > + bus = pdev->bus; > > > > > + bridge = bus->self; > > > > > + > > > > > + /* Is VGA routed to us? */ > > > > > + if (bridge && (pci_is_bridge(bridge))) { > > > > > + pci_read_config_word(bridge, PCI_BRIDGE_CONTROL, &config); > > > > > + > > > > > + /* Yes, this bridge is PCI bridge-to-bridge spec compliant, > > > > > + * just return! > > > > > + */ > > > > > + if (config & PCI_BRIDGE_CTL_VGA) > > > > > + return; > > > > > + > > > > > + dev_warn(&pdev->dev, "VGA bridge control is not enabled\n"); > > > > > + } > > > > > > > > You cannot assume that a bridge is defective just because > > > > PCI_BRIDGE_CTL_VGA is not set. > > > > > > > > > + /* Just return if the system already have a default device */ > > > > > + if (vga_default_device()) > > > > > + return; > > > > > + > > > > > + /* No default vga device */ > > > > > + while ((vdevp = pci_get_class(PCI_CLASS_DISPLAY_VGA << 8, vdevp))) { > > > > > + if (vdevp->vendor != 0x1a03) { > > > > > + /* Have other vga devcie in the system, do nothing */ > > > > > + dev_info(&pdev->dev, "Another boot vga device: 0x%x:0x%x\n", > > > > > + vdevp->vendor, vdevp->device); > > > > > + return; > > > > > + } > > > > > + } > > > > > + > > > > > + vga_set_default_device(pdev); > > > > > + > > > > > + dev_info(&pdev->dev, "Boot vga device set as 0x%x:0x%x\n", > > > > > + pdev->vendor, pdev->device); > > > > > +} > > > > > +DECLARE_PCI_FIXUP_CLASS_FINAL(0x1a03, 0x2000, PCI_CLASS_DISPLAY_VGA, 8, aspeed_fixup_vgaarb); > > > > > + > > > > > + > > > > > /* > > > > > * The Mellanox Tavor device gives false positive parity errors. Disable > > > > > * parity error reporting. > > > > > -- > > > > > 2.27.0 > > > > > </asm></linux></linux></linux></linux></suijingfeng@loongson.cn></chenhuacai@loongson.cn></perr-></tabort-></access></tabort-></perr-></tabort-></helgaas@kernel.org></chenhuacai@gmail.com></suijingfeng@loongson.cn></jiaxun.yang@flygoat.com></linux-pci@vger.kernel.org></bhelgaas@google.com></chenhuacai@loongson.cn></chenhuacai@gmail.com></helgaas@kernel.org>
On Tue, May 18, 2021 at 10:31:56AM +0800, 隋景峰 wrote: > > -----Original Messages----- Wow, this is ugly (the ">" instead of ">"). Can you figure out how to respond in the usual plain-text way? > > From: "Bjorn Helgaas" <helgaas@kernel.org> > > Sent Time: 2021-05-18 02:28:10 (Tuesday) > > To: "Huacai Chen" <chenhuacai@gmail.com> > > Cc: "Huacai Chen" <chenhuacai@loongson.cn>, "Bjorn Helgaas" <bhelgaas@google.com>, linux-pci <linux-pci@vger.kernel.org>, "Jiaxun Yang" <jiaxun.yang@flygoat.com>, "Jingfeng Sui" <suijingfeng@loongson.cn> > > Subject: Re: [PATCH 5/5] PCI: Support ASpeed VGA cards behind a misbehaving bridge > > > > On Mon, May 17, 2021 at 08:53:43PM +0800, Huacai Chen wrote: > > > On Sat, May 15, 2021 at 5:09 PM Huacai Chen <chenhuacai@gmail.com> wrote: > > > > On Fri, May 14, 2021 at 11:10 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > > > > On Fri, May 14, 2021 at 04:00:25PM +0800, Huacai Chen wrote: > > > > > > According to PCI-to-PCI bridge spec, bit 3 of Bridge Control Register is > > > > > > VGA Enable bit which modifies the response to VGA compatible addresses. > > > > > > > > > > The bridge spec is pretty old, and most of the content has been > > > > > incorporated into the PCIe spec. I think you can cite "PCIe r5.0, sec > > > > > 7.5.1.3.13" here instead. > > > > > > > > > > > If the VGA Enable bit is set, the bridge will decode and forward the > > > > > > following accesses on the primary interface to the secondary interface. > > > > > > > > > > *Which* following accesses? The structure of English requires that if > > > > > you say "the following accesses," you must continue by *listing* the > > > > > accesses. > > > > > > > > > > > The ASpeed AST2500 hardward does not set the VGA Enable bit on its > > > > > > bridge control register, which causes vgaarb subsystem don't think the > > > > > > VGA card behind the bridge as a valid boot vga device. > > > > > > > > > > s/hardward/bridge/ > > > > > s/vga/VGA/ (also in code comments and dmesg strings below) > > > > > > > > > > From the code, it looks like AST2500 ([1a03:2000]) is a VGA device, > > > > > since it apparently has a VGA class code. But here you say the > > > > > AST2500 has a Bridge Control register, which suggests that it's a > > > > > bridge. If AST2500 is some sort of combination that includes both a > > > > > bridge and a VGA device, please outline that topology. > > > > > > > > > > But the hardware defect is that some bridges forward VGA accesses even > > > > > though their VGA Enable bit is not set? The quirk should be attached > > > > > to broken *bridges*, not to VGA devices. > > > > > > > > > > If a bridge forwards VGA accesses regardless of how its VGA Enable bit > > > > > is set, that means VGA arbitration (in vgaarb.c) cannot work > > > > > correctly, so merely setting the default VGA device once in a quirk is > > > > > not sufficient. You would have to somehow disable any future attempts > > > > > to use other VGA devices. Only the VGA device below this defective > > > > > bridge is usable. Any other VGA devices in the system would be > > > > > useless. > > > > > > > > > > > So we provide a quirk to fix Xorg auto-detection. > > > > > > > > > > > > See similar bug: > > > > > > > > > > > > https://patchwork.kernel.org/project/linux-pci/patch/20170619023528.11532-1-dja@axtens.net/ > > > > > > > > > > This patch was never merged. If we merged a revised version, please > > > > > cite the SHA1 instead. > > > > > > > > This patch has never merged, and I found that it is unnecessary after > > > > commit a37c0f48950b56f6ef2ee637 ("vgaarb: Select a default VGA device > > > > even if there's no legacy VGA"). Maybe this ASpeed patch is also > > > > unnecessary. If it is still needed, I'll investigate the root cause. > > > > > > I found that vga_arb_device_init() and pcibios_init() are both wrapped > > > by subsys_initcall(), which means their sequence is unpredictable. And > > > unfortunately, in our platform vga_arb_device_init() is called before > > > pcibios_init(), which makes vga_arb_device_init() fail to set a > > > default vga device. This is the root cause why we thought that we > > > still need a quirk for AST2500. > > > > Does this mean there is no hardware defect here? The VGA Enable bit > > works correctly? > > > The AST2500 BMC card we are using consist of a PCI-to-PCI Bridge (1a03:1150) > and a PCI VGA device (1a03:2000). The value of the Bridge Control register > in its PCI-to-PCI Bridge Configuration Registers is 0x0000. Thus, the VGA > Enable bit in the Bridge Control register do not set, and the VGA > Enable bit is not writable. If VGA Enable is 0 and cannot be set to 1, the bridge should *never* forward VGA accesses to its secondary bus. The generic VGA driver that uses the legacy [mem 0xa0000-0xbffff] range should not work with the VGA device at 05:00.0, and that device cannot participate in the VGA arbitration scheme, which relies on the VGA Enable bit. If you have a driver for 05:00.0 that doesn't need the legacy memory range, it's possible that it may work. But VGA arbitration will be broken, and if 05:00.0 needs to be initialized by an option ROM, that probably won't work either. If the 04:00.0 bridge *always* forwards VGA accesses, even though its VGA Enable bit is always zero, then the bridge is broken. In that case, the generic VGA driver should work with the 05:00.0 device, but VGA arbitration will be limited. I'm not sure, but the arbiter *might* be able to use the VGA Enable bit in the 00:0c.0 bridge to control VGA access to 05:00.0. You wouldn't be able to have more than one VGA device below 00:0c.0, and you may not be able have more than one in the entire system. > The topology of this BMC card is illustrated as following: > > /sys/devices/pci0000:00 > |-- 0000:00:0c.0 > | |-- class (0x060400) > | |-- vendor (0x0014) > | |-- device (0x7a09) > | |-- ... > | |-- 0000:04:00.0 > | | | -- class (0x060400) > | | | -- device (0x1150) > | | | -- vendor (0x1a03) > | | | -- revision (0x04) > | | | -- ... > | | | -- 0000:05:00.0 > | | | | -- class (0x030000) > | | | | -- device (0x2000) > | | | | -- vendor (0x1a03) > | | | | -- irq (51) > | | | | -- i2c-6 > | | | | -- drm > | | | | -- graphics > | | | | -- ... > | `-- uevent > `-- ... > > The following information is getted from lspci -vvxx: Generally it's better to use "sudo lspci -vvxx" so you decode all the capabilities, too. But in this case, all we care about is the Bridge Control register ("bridgectl"), which *is* included (and apparently is set to 0, since it's decoded as "vga-"). > 04:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 04) (prog-if 00 [Normal decode]) > Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <tabort- <mabort-="">SERR- <perr- intx-="" latency:="" 0="" interrupt:="" pin="" a="" routed="" to="" irq="" 51="" numa="" node:="" bus:="" primary="04," secondary="05," subordinate="05," sec-latency="0" i="" o="" behind="" bridge:="" 00004000-00004fff="" memory="" 41000000-427fffff="" status:="" 66mhz+="" fastb2b-="" parerr-="" devsel="medium">TAbort- <tabort- <mabort-="" <serr-="" <perr-="" bridgectl:="" parity-="" serr-="" noisa-="" vga-="" mabort-="">Reset- FastB2B- > PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- > Capabilities: <access denied=""> > 00: 03 1a 50 11 07 00 10 00 04 00 04 06 00 00 01 00 > 10: 00 00 00 00 00 00 00 00 04 05 05 00 41 41 20 02 > 20: 00 41 70 42 f1 ff 01 00 00 00 00 00 00 00 00 00 > 30: 00 00 00 00 50 00 00 00 00 00 00 00 63 01 00 00 > > 05:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 41) (prog-if 00 [VGA controller]) > Subsystem: ASPEED Technology, Inc. ASPEED Graphics Family > Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop+ ParErr- Stepping- SERR- FastB2B- DisINTx- > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <tabort- <mabort-="">SERR- <perr- intx-="" latency:="" 0="" interrupt:="" pin="" a="" routed="" to="" irq="" 51="" numa="" node:="" region="" 0:="" memory="" at="" 41000000="" (32-bit,="" non-prefetchable)="" [size="16M]" 1:="" 42000000="" 2:="" i="" o="" ports="" 4000="" capabilities:="" <access="" denied=""> > Kernel driver in use: ast > 00: 03 1a 00 20 27 00 10 02 41 00 00 03 00 00 00 00 > 10: 00 00 00 41 00 00 00 42 01 40 00 00 00 00 00 00 > 20: 00 00 00 00 00 00 00 00 00 00 00 00 03 1a 00 20 > 30: 00 00 00 00 40 00 00 00 00 00 00 00 63 01 00 00
Hi, Bjorn, On Tue, May 18, 2021 at 2:28 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Mon, May 17, 2021 at 08:53:43PM +0800, Huacai Chen wrote: > > On Sat, May 15, 2021 at 5:09 PM Huacai Chen <chenhuacai@gmail.com> wrote: > > > On Fri, May 14, 2021 at 11:10 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > > > On Fri, May 14, 2021 at 04:00:25PM +0800, Huacai Chen wrote: > > > > > According to PCI-to-PCI bridge spec, bit 3 of Bridge Control Register is > > > > > VGA Enable bit which modifies the response to VGA compatible addresses. > > > > > > > > The bridge spec is pretty old, and most of the content has been > > > > incorporated into the PCIe spec. I think you can cite "PCIe r5.0, sec > > > > 7.5.1.3.13" here instead. > > > > > > > > > If the VGA Enable bit is set, the bridge will decode and forward the > > > > > following accesses on the primary interface to the secondary interface. > > > > > > > > *Which* following accesses? The structure of English requires that if > > > > you say "the following accesses," you must continue by *listing* the > > > > accesses. > > > > > > > > > The ASpeed AST2500 hardward does not set the VGA Enable bit on its > > > > > bridge control register, which causes vgaarb subsystem don't think the > > > > > VGA card behind the bridge as a valid boot vga device. > > > > > > > > s/hardward/bridge/ > > > > s/vga/VGA/ (also in code comments and dmesg strings below) > > > > > > > > From the code, it looks like AST2500 ([1a03:2000]) is a VGA device, > > > > since it apparently has a VGA class code. But here you say the > > > > AST2500 has a Bridge Control register, which suggests that it's a > > > > bridge. If AST2500 is some sort of combination that includes both a > > > > bridge and a VGA device, please outline that topology. > > > > > > > > But the hardware defect is that some bridges forward VGA accesses even > > > > though their VGA Enable bit is not set? The quirk should be attached > > > > to broken *bridges*, not to VGA devices. > > > > > > > > If a bridge forwards VGA accesses regardless of how its VGA Enable bit > > > > is set, that means VGA arbitration (in vgaarb.c) cannot work > > > > correctly, so merely setting the default VGA device once in a quirk is > > > > not sufficient. You would have to somehow disable any future attempts > > > > to use other VGA devices. Only the VGA device below this defective > > > > bridge is usable. Any other VGA devices in the system would be > > > > useless. > > > > > > > > > So we provide a quirk to fix Xorg auto-detection. > > > > > > > > > > See similar bug: > > > > > > > > > > https://patchwork.kernel.org/project/linux-pci/patch/20170619023528.11532-1-dja@axtens.net/ > > > > > > > > This patch was never merged. If we merged a revised version, please > > > > cite the SHA1 instead. > > > > > > This patch has never merged, and I found that it is unnecessary after > > > commit a37c0f48950b56f6ef2ee637 ("vgaarb: Select a default VGA device > > > even if there's no legacy VGA"). Maybe this ASpeed patch is also > > > unnecessary. If it is still needed, I'll investigate the root cause. > > > > I found that vga_arb_device_init() and pcibios_init() are both wrapped > > by subsys_initcall(), which means their sequence is unpredictable. And > > unfortunately, in our platform vga_arb_device_init() is called before > > pcibios_init(), which makes vga_arb_device_init() fail to set a > > default vga device. This is the root cause why we thought that we > > still need a quirk for AST2500. > > Does this mean there is no hardware defect here? The VGA Enable bit > works correctly? > No, VGA Enable bit still doesn't set, but with commit a37c0f48950b56f6ef2ee637 ("vgaarb: Select a default VGA device even if there's no legacy VGA") we no longer depend on VGA Enable. > > I think the best solution is make vga_arb_device_init() be wrapped by > > subsys_initcall_sync(), do you think so? > > Hmm. Unfortunately the semantics of subsys_initcall_sync() are not > documented, so I'm not sure exactly *why* such a change would work and > whether we could rely on it to continue working. > > pcibios_init() isn't very consistent across arches. On some, > including alpha, microblaze, some MIPS platforms, powerpc, and sh, it > enumerates PCI devices. On others (ia64, parisc, sparc, x86), it does > basically nothing. That makes life a little difficult. subsys_initcall_sync() is ensured after all subsys_initcall() functions, so at least it can solve the problem on platforms which use pcibios_init() to enumerate PCI devices (x86 and other ACPI-based platforms are also OK, because they use acpi_init() -->acpi_scan_init() -->pci_acpi_scan_root() to enumerate devices). Huacai > > vga_arb_device_init() is a subsys_initcall() and wants to look through > all the PCI devices. That's a little problematic for arches where > pcibios_init() is also a subsys_initcall() and enumerates PCI devices. > > Sorry, that's no answer for you. Just more questions :) > > > > > > Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> > > > > > Signed-off-by: Jingfeng Sui <suijingfeng@loongson.cn> > > > > > --- > > > > > drivers/pci/quirks.c | 47 ++++++++++++++++++++++++++++++++++++++++++++ > > > > > 1 file changed, 47 insertions(+) > > > > > > > > > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c > > > > > index 6ab4b3bba36b..adf5490706ad 100644 > > > > > --- a/drivers/pci/quirks.c > > > > > +++ b/drivers/pci/quirks.c > > > > > @@ -28,6 +28,7 @@ > > > > > #include <linux/platform_data/x86/apple.h> > > > > > #include <linux/pm_runtime.h> > > > > > #include <linux/switchtec.h> > > > > > +#include <linux/vgaarb.h> > > > > > #include <asm/dma.h> /* isa_dma_bridge_buggy */ > > > > > #include "pci.h" > > > > > > > > > > @@ -297,6 +298,52 @@ static void loongson_mrrs_quirk(struct pci_dev *dev) > > > > > } > > > > > DECLARE_PCI_FIXUP_ENABLE(PCI_ANY_ID, PCI_ANY_ID, loongson_mrrs_quirk); > > > > > > > > > > + > > > > > +static void aspeed_fixup_vgaarb(struct pci_dev *pdev) > > > > > +{ > > > > > + struct pci_dev *bridge; > > > > > + struct pci_bus *bus; > > > > > + struct pci_dev *vdevp = NULL; > > > > > + u16 config; > > > > > + > > > > > + bus = pdev->bus; > > > > > + bridge = bus->self; > > > > > + > > > > > + /* Is VGA routed to us? */ > > > > > + if (bridge && (pci_is_bridge(bridge))) { > > > > > + pci_read_config_word(bridge, PCI_BRIDGE_CONTROL, &config); > > > > > + > > > > > + /* Yes, this bridge is PCI bridge-to-bridge spec compliant, > > > > > + * just return! > > > > > + */ > > > > > + if (config & PCI_BRIDGE_CTL_VGA) > > > > > + return; > > > > > + > > > > > + dev_warn(&pdev->dev, "VGA bridge control is not enabled\n"); > > > > > + } > > > > > > > > You cannot assume that a bridge is defective just because > > > > PCI_BRIDGE_CTL_VGA is not set. > > > > > > > > > + /* Just return if the system already have a default device */ > > > > > + if (vga_default_device()) > > > > > + return; > > > > > + > > > > > + /* No default vga device */ > > > > > + while ((vdevp = pci_get_class(PCI_CLASS_DISPLAY_VGA << 8, vdevp))) { > > > > > + if (vdevp->vendor != 0x1a03) { > > > > > + /* Have other vga devcie in the system, do nothing */ > > > > > + dev_info(&pdev->dev, "Another boot vga device: 0x%x:0x%x\n", > > > > > + vdevp->vendor, vdevp->device); > > > > > + return; > > > > > + } > > > > > + } > > > > > + > > > > > + vga_set_default_device(pdev); > > > > > + > > > > > + dev_info(&pdev->dev, "Boot vga device set as 0x%x:0x%x\n", > > > > > + pdev->vendor, pdev->device); > > > > > +} > > > > > +DECLARE_PCI_FIXUP_CLASS_FINAL(0x1a03, 0x2000, PCI_CLASS_DISPLAY_VGA, 8, aspeed_fixup_vgaarb); > > > > > + > > > > > + > > > > > /* > > > > > * The Mellanox Tavor device gives false positive parity errors. Disable > > > > > * parity error reporting. > > > > > -- > > > > > 2.27.0 > > > > >
On 2021/5/18 上午11:09, Bjorn Helgaas wrote: > If VGA Enable is 0 and cannot be set to 1, the bridge should *never* > forward VGA accesses to its secondary bus. The generic VGA driver > that uses the legacy [mem 0xa0000-0xbffff] range should not work with > the VGA device at 05:00.0, and that device cannot participate in the > VGA arbitration scheme, which relies on the VGA Enable bit. > > If you have a driver for 05:00.0 that doesn't need the legacy memory > range, it's possible that it may work. But VGA arbitration will be > broken, and if 05:00.0 needs to be initialized by an option ROM, that > probably won't work either. We are not using a "generic VGA driver", in user space, we are using the modesetting driver come with X server, and it seems work normally. The real problems is VGA arbitration will not set 05:00.0 as the default VGA which means that when X server read /sys/devices/pci0000:00/0000:00:0c.0/0000:04:00.0/0000:05:00.0/boot_vga will get a "0". This break Xorg auto-detection. We want the boot_vga sysfs file be "1". > If the 04:00.0 bridge *always* forwards VGA accesses, even though its > VGA Enable bit is always zero, then the bridge is broken. In that > case, the generic VGA driver should work with the 05:00.0 device, but > VGA arbitration will be limited. I'm not sure, but the arbiter > *might* be able to use the VGA Enable bit in the 00:0c.0 bridge to > control VGA access to 05:00.0. You wouldn't be able to have more than > one VGA device below 00:0c.0, and you may not be able have more than > one in the entire system. We have only one VGA device(05:00.0) below 00:0c.0, but we do able to have more than one in the entire system. We could even mount a AMDGPU on this server. But in reality, there is a render only GPU and a self-designed display controller integrated in LS7A1000 bridge. Both the render only GPU and the display controller is PCI device, they are located at PCI root bus directly without a PCI-to-PCI bridge in the middle. The display controller is blocked by the firmware if ASPEED BMC card is present, it can't be accessed under linux kernel. Let me show you a updated version of the PCI topology of our server(machine): /sys/devices/pci0000:00 |-- 0000:00:06.0 | | -- class (0x040000) | | -- vendor (0x0014) | | -- device (0x7a15) | | -- drm | | -- ... |-- 0000:00:0c.0 | |-- class (0x060400) | |-- vendor (0x0014) | |-- device (0x7a09) | |-- ... | |-- 0000:04:00.0 | | | -- class (0x060400) | | | -- device (0x1150) | | | -- vendor (0x1a03) | | | -- revision (0x04) | | | -- ... | | | -- 0000:05:00.0 | | | | -- class (0x030000) | | | | -- device (0x2000) | | | | -- vendor (0x1a03) | | | | -- boot_vga | | | | -- i2c-6 | | | | -- drm | | | | -- graphics | | | | -- ... | `-- uevent `-- ... Even through the render only GPU(00:06.0) is not a VGA device, it still can disturb X server choose a primary device to use. But the root cause is the kernel side does not set 05:00.0 as default VGA. In this case X server will fallback to the first device found to use. and 00:06.0 is always found before 05:00.0. If kernel side set 05:00.0 as default VGA, all other problems is secondary.
On Tue, May 18, 2021 at 05:30:38PM +0800, suijingfeng wrote: > On 2021/5/18 上午11:09, Bjorn Helgaas wrote: > > > If VGA Enable is 0 and cannot be set to 1, the bridge should *never* > > forward VGA accesses to its secondary bus. The generic VGA driver > > that uses the legacy [mem 0xa0000-0xbffff] range should not work with > > the VGA device at 05:00.0, and that device cannot participate in the > > VGA arbitration scheme, which relies on the VGA Enable bit. > > > > If you have a driver for 05:00.0 that doesn't need the legacy memory > > range, it's possible that it may work. But VGA arbitration will be > > broken, and if 05:00.0 needs to be initialized by an option ROM, that > > probably won't work either. > > We are not using a "generic VGA driver", in user space, we are using the > modesetting driver come with X server, and it seems work normally. The real > problems is VGA arbitration will not set 05:00.0 as the default VGA which > means that when X server read > /sys/devices/pci0000:00/0000:00:0c.0/0000:04:00.0/0000:05:00.0/boot_vga will > get a "0". This break Xorg auto-detection. We want the boot_vga sysfs file > be "1". I don't think it's true; I think VGA arbitration *will* set 05:00.0 as the default VGA, as long as 05:00.0 has been enumerated before vga_arb_device_init(). As far as I can tell, the problem is not that the bridge is broken or that vga_arb_device_init() is broken. The problem is that on your system, vga_arb_device_init() runs before 05:00.0 has been enumerated, so it *can't* set 05:00.0 as the default VGA because it doesn't know about 05:00.0 at all. > > If the 04:00.0 bridge *always* forwards VGA accesses, even though its > > VGA Enable bit is always zero, then the bridge is broken. In that > > case, the generic VGA driver should work with the 05:00.0 device, but > > VGA arbitration will be limited. I'm not sure, but the arbiter > > *might* be able to use the VGA Enable bit in the 00:0c.0 bridge to > > control VGA access to 05:00.0. You wouldn't be able to have more than > > one VGA device below 00:0c.0, and you may not be able have more than > > one in the entire system. > > We have only one VGA device(05:00.0) below 00:0c.0, but we do able to have > more than one in the entire system. We could even mount a AMDGPU > on this server. But in reality, there is a render only GPU and a > self-designed display controller integrated in LS7A1000 bridge. Both the > render only GPU and the display controller is PCI device, they are located > at PCI root bus directly without a PCI-to-PCI bridge in the middle. The > display controller is blocked by the firmware if ASPEED BMC card is present, > it can't be accessed under linux kernel. Let me show you a updated version > of the PCI topology of our server(machine): > > /sys/devices/pci0000:00 > |-- 0000:00:06.0 > | | -- class (0x040000) > | | -- vendor (0x0014) > | | -- device (0x7a15) > | | -- drm > | | -- ... > |-- 0000:00:0c.0 > | |-- class (0x060400) > | |-- vendor (0x0014) > | |-- device (0x7a09) > | |-- ... > | |-- 0000:04:00.0 > | | | -- class (0x060400) > | | | -- device (0x1150) > | | | -- vendor (0x1a03) > | | | -- revision (0x04) > | | | -- ... > | | | -- 0000:05:00.0 > | | | | -- class (0x030000) > | | | | -- device (0x2000) > | | | | -- vendor (0x1a03) > | | | | -- boot_vga > | | | | -- i2c-6 > | | | | -- drm > | | | | -- graphics > | | | | -- ... > | `-- uevent > `-- ... > > Even through the render only GPU(00:06.0) is not a VGA device, it still can > disturb X server choose a primary device to use. But the root cause is the > kernel side does not set 05:00.0 as default VGA. In this case X server will > fallback to the first device found to use. and 00:06.0 is always found > before 05:00.0. If kernel side set 05:00.0 as default VGA, > all other problems is secondary. The fact that X selects 00:06.0 is a user-level thing and I don't know what's involved. Its class code (0x0400) looks like PCI_CLASS_MULTIMEDIA_VIDEO, so vga_arb_device_init() should completely ignore it. vga_arb_device_init() should set 05:00.0 as the *kernel's* default device as long as 05:00.0 has already been enumerated. Even if vga_arb_device_init() runs before 05:00.0 has been enumerated, it looks like vga_arbiter_add_pci_device() should notice when 05:00.0 is enumerated, and you should see the "VGA device added:" message for it. vga_arbiter_add_pci_device() looks like it *would* set 05:00.0 as the default device in that case, except for the fact that 04:00.0 doesn't support PCI_BRIDGE_CTL_VGA. That's probably a bug in a37c0f48950b ("vgaarb: Select a default VGA device even if there's no legacy VGA"). I think the logic added by a37c0f48950b probably should be shared between the vga_arb_device_init() initcall path and the vga_arbiter_add_pci_device() device-add path. Can you collect your dmesg output, so we can see the enumeration order and what vgaarb does with it? Bjorn
On Tue, May 18, 2021 at 03:13:43PM +0800, Huacai Chen wrote: > On Tue, May 18, 2021 at 2:28 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Mon, May 17, 2021 at 08:53:43PM +0800, Huacai Chen wrote: > > > On Sat, May 15, 2021 at 5:09 PM Huacai Chen <chenhuacai@gmail.com> wrote: > > > > On Fri, May 14, 2021 at 11:10 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > > > > On Fri, May 14, 2021 at 04:00:25PM +0800, Huacai Chen wrote: > > > > > > According to PCI-to-PCI bridge spec, bit 3 of Bridge Control Register is > > > > > > VGA Enable bit which modifies the response to VGA compatible addresses. > > > > > > > > > > The bridge spec is pretty old, and most of the content has been > > > > > incorporated into the PCIe spec. I think you can cite "PCIe r5.0, sec > > > > > 7.5.1.3.13" here instead. > > > > > > > > > > > If the VGA Enable bit is set, the bridge will decode and forward the > > > > > > following accesses on the primary interface to the secondary interface. > > > > > > > > > > *Which* following accesses? The structure of English requires that if > > > > > you say "the following accesses," you must continue by *listing* the > > > > > accesses. > > > > > > > > > > > The ASpeed AST2500 hardward does not set the VGA Enable bit on its > > > > > > bridge control register, which causes vgaarb subsystem don't think the > > > > > > VGA card behind the bridge as a valid boot vga device. > > > > > > > > > > s/hardward/bridge/ > > > > > s/vga/VGA/ (also in code comments and dmesg strings below) > > > > > > > > > > From the code, it looks like AST2500 ([1a03:2000]) is a VGA device, > > > > > since it apparently has a VGA class code. But here you say the > > > > > AST2500 has a Bridge Control register, which suggests that it's a > > > > > bridge. If AST2500 is some sort of combination that includes both a > > > > > bridge and a VGA device, please outline that topology. > > > > > > > > > > But the hardware defect is that some bridges forward VGA accesses even > > > > > though their VGA Enable bit is not set? The quirk should be attached > > > > > to broken *bridges*, not to VGA devices. > > > > > > > > > > If a bridge forwards VGA accesses regardless of how its VGA Enable bit > > > > > is set, that means VGA arbitration (in vgaarb.c) cannot work > > > > > correctly, so merely setting the default VGA device once in a quirk is > > > > > not sufficient. You would have to somehow disable any future attempts > > > > > to use other VGA devices. Only the VGA device below this defective > > > > > bridge is usable. Any other VGA devices in the system would be > > > > > useless. > > > > > > > > > > > So we provide a quirk to fix Xorg auto-detection. > > > > > > > > > > > > See similar bug: > > > > > > > > > > > > https://patchwork.kernel.org/project/linux-pci/patch/20170619023528.11532-1-dja@axtens.net/ > > > > > > > > > > This patch was never merged. If we merged a revised version, please > > > > > cite the SHA1 instead. > > > > > > > > This patch has never merged, and I found that it is unnecessary after > > > > commit a37c0f48950b56f6ef2ee637 ("vgaarb: Select a default VGA device > > > > even if there's no legacy VGA"). Maybe this ASpeed patch is also > > > > unnecessary. If it is still needed, I'll investigate the root cause. > > > > > > I found that vga_arb_device_init() and pcibios_init() are both wrapped > > > by subsys_initcall(), which means their sequence is unpredictable. And > > > unfortunately, in our platform vga_arb_device_init() is called before > > > pcibios_init(), which makes vga_arb_device_init() fail to set a > > > default vga device. This is the root cause why we thought that we > > > still need a quirk for AST2500. > > > > Does this mean there is no hardware defect here? The VGA Enable bit > > works correctly? > > > No, VGA Enable bit still doesn't set, but with commit > a37c0f48950b56f6ef2ee637 ("vgaarb: Select a default VGA device even if > there's no legacy VGA") we no longer depend on VGA Enable. Correct me if I'm wrong: - On the AST2500 bridge [1a03:1150], the VGA Enable bit is read-only 0. - The AST2500 bridge never forwards VGA accesses ([mem 0xa0000-0xbffff], [io 0x3b0-0x3bb], [io 0x3c0-0x3df]) to its secondary bus. The VGA Enable bit is optional, and if both the above are true, the bridge is working correctly per spec, and the quirk below is not the right solution, and whatever solution we come up with should not claim that the bridge is misbehaving. > > > I think the best solution is make vga_arb_device_init() be wrapped by > > > subsys_initcall_sync(), do you think so? > > > > Hmm. Unfortunately the semantics of subsys_initcall_sync() are not > > documented, so I'm not sure exactly *why* such a change would work and > > whether we could rely on it to continue working. > > > > pcibios_init() isn't very consistent across arches. On some, > > including alpha, microblaze, some MIPS platforms, powerpc, and sh, it > > enumerates PCI devices. On others (ia64, parisc, sparc, x86), it does > > basically nothing. That makes life a little difficult. > > subsys_initcall_sync() is ensured after all subsys_initcall() > functions, so at least it can solve the problem on platforms which use > pcibios_init() to enumerate PCI devices (x86 and other ACPI-based > platforms are also OK, because they use acpi_init() > -->acpi_scan_init() -->pci_acpi_scan_root() to enumerate devices). More details in my response to suijingfeng: https://lore.kernel.org/r/20210518193100.GA148462@bjorn-Precision-5520 I'd rather not fiddle with the initcall ordering. That mechanism is fragile and I'd prefer something more robust. I'm wondering whether it's practical to do something in the normal PCI enumeration path, e.g., in pci_init_capabilities(). Maybe we can detect the default VGA device as we enumerate it. Then we wouldn't have this weird process of "find all PCI devices first, then scan for the default VGA device, and oh, by the way, also check for VGA devices hot-added later." Bjorn
Hi, Bjorn, On Wed, May 19, 2021 at 3:35 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Tue, May 18, 2021 at 03:13:43PM +0800, Huacai Chen wrote: > > On Tue, May 18, 2021 at 2:28 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > > > On Mon, May 17, 2021 at 08:53:43PM +0800, Huacai Chen wrote: > > > > On Sat, May 15, 2021 at 5:09 PM Huacai Chen <chenhuacai@gmail.com> wrote: > > > > > On Fri, May 14, 2021 at 11:10 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > > > > > On Fri, May 14, 2021 at 04:00:25PM +0800, Huacai Chen wrote: > > > > > > > According to PCI-to-PCI bridge spec, bit 3 of Bridge Control Register is > > > > > > > VGA Enable bit which modifies the response to VGA compatible addresses. > > > > > > > > > > > > The bridge spec is pretty old, and most of the content has been > > > > > > incorporated into the PCIe spec. I think you can cite "PCIe r5.0, sec > > > > > > 7.5.1.3.13" here instead. > > > > > > > > > > > > > If the VGA Enable bit is set, the bridge will decode and forward the > > > > > > > following accesses on the primary interface to the secondary interface. > > > > > > > > > > > > *Which* following accesses? The structure of English requires that if > > > > > > you say "the following accesses," you must continue by *listing* the > > > > > > accesses. > > > > > > > > > > > > > The ASpeed AST2500 hardward does not set the VGA Enable bit on its > > > > > > > bridge control register, which causes vgaarb subsystem don't think the > > > > > > > VGA card behind the bridge as a valid boot vga device. > > > > > > > > > > > > s/hardward/bridge/ > > > > > > s/vga/VGA/ (also in code comments and dmesg strings below) > > > > > > > > > > > > From the code, it looks like AST2500 ([1a03:2000]) is a VGA device, > > > > > > since it apparently has a VGA class code. But here you say the > > > > > > AST2500 has a Bridge Control register, which suggests that it's a > > > > > > bridge. If AST2500 is some sort of combination that includes both a > > > > > > bridge and a VGA device, please outline that topology. > > > > > > > > > > > > But the hardware defect is that some bridges forward VGA accesses even > > > > > > though their VGA Enable bit is not set? The quirk should be attached > > > > > > to broken *bridges*, not to VGA devices. > > > > > > > > > > > > If a bridge forwards VGA accesses regardless of how its VGA Enable bit > > > > > > is set, that means VGA arbitration (in vgaarb.c) cannot work > > > > > > correctly, so merely setting the default VGA device once in a quirk is > > > > > > not sufficient. You would have to somehow disable any future attempts > > > > > > to use other VGA devices. Only the VGA device below this defective > > > > > > bridge is usable. Any other VGA devices in the system would be > > > > > > useless. > > > > > > > > > > > > > So we provide a quirk to fix Xorg auto-detection. > > > > > > > > > > > > > > See similar bug: > > > > > > > > > > > > > > https://patchwork.kernel.org/project/linux-pci/patch/20170619023528.11532-1-dja@axtens.net/ > > > > > > > > > > > > This patch was never merged. If we merged a revised version, please > > > > > > cite the SHA1 instead. > > > > > > > > > > This patch has never merged, and I found that it is unnecessary after > > > > > commit a37c0f48950b56f6ef2ee637 ("vgaarb: Select a default VGA device > > > > > even if there's no legacy VGA"). Maybe this ASpeed patch is also > > > > > unnecessary. If it is still needed, I'll investigate the root cause. > > > > > > > > I found that vga_arb_device_init() and pcibios_init() are both wrapped > > > > by subsys_initcall(), which means their sequence is unpredictable. And > > > > unfortunately, in our platform vga_arb_device_init() is called before > > > > pcibios_init(), which makes vga_arb_device_init() fail to set a > > > > default vga device. This is the root cause why we thought that we > > > > still need a quirk for AST2500. > > > > > > Does this mean there is no hardware defect here? The VGA Enable bit > > > works correctly? > > > > > No, VGA Enable bit still doesn't set, but with commit > > a37c0f48950b56f6ef2ee637 ("vgaarb: Select a default VGA device even if > > there's no legacy VGA") we no longer depend on VGA Enable. > > Correct me if I'm wrong: > > - On the AST2500 bridge [1a03:1150], the VGA Enable bit is > read-only 0. > > - The AST2500 bridge never forwards VGA accesses ([mem > 0xa0000-0xbffff], [io 0x3b0-0x3bb], [io 0x3c0-0x3df]) to its > secondary bus. > > The VGA Enable bit is optional, and if both the above are true, the > bridge is working correctly per spec, and the quirk below is not the > right solution, and whatever solution we come up with should not > claim that the bridge is misbehaving. Yes, you are right, the bridge is working correctly, which is similar to HiSilicon D05. > > > > > I think the best solution is make vga_arb_device_init() be wrapped by > > > > subsys_initcall_sync(), do you think so? > > > > > > Hmm. Unfortunately the semantics of subsys_initcall_sync() are not > > > documented, so I'm not sure exactly *why* such a change would work and > > > whether we could rely on it to continue working. > > > > > > pcibios_init() isn't very consistent across arches. On some, > > > including alpha, microblaze, some MIPS platforms, powerpc, and sh, it > > > enumerates PCI devices. On others (ia64, parisc, sparc, x86), it does > > > basically nothing. That makes life a little difficult. > > > > subsys_initcall_sync() is ensured after all subsys_initcall() > > functions, so at least it can solve the problem on platforms which use > > pcibios_init() to enumerate PCI devices (x86 and other ACPI-based > > platforms are also OK, because they use acpi_init() > > -->acpi_scan_init() -->pci_acpi_scan_root() to enumerate devices). > > More details in my response to suijingfeng: > https://lore.kernel.org/r/20210518193100.GA148462@bjorn-Precision-5520 > > I'd rather not fiddle with the initcall ordering. That mechanism is > fragile and I'd prefer something more robust. > > I'm wondering whether it's practical to do something in the normal PCI > enumeration path, e.g., in pci_init_capabilities(). Maybe we can > detect the default VGA device as we enumerate it. Then we wouldn't > have this weird process of "find all PCI devices first, then scan for > the default VGA device, and oh, by the way, also check for VGA devices > hot-added later." If we don't want to rely on initcall order, and want to solve the hot-added case, then can we add vga_arb_select_default_device() in pci_notify() when (action == BUS_NOTIFY_ADD_DEVICE && !vga_default_device())? Huacai > > Bjorn
On Wed, May 19, 2021 at 10:17:14AM +0800, Huacai Chen wrote: > On Wed, May 19, 2021 at 3:35 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Tue, May 18, 2021 at 03:13:43PM +0800, Huacai Chen wrote: > > > On Tue, May 18, 2021 at 2:28 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > > > > On Mon, May 17, 2021 at 08:53:43PM +0800, Huacai Chen wrote: > > > > > On Sat, May 15, 2021 at 5:09 PM Huacai Chen <chenhuacai@gmail.com> wrote: > > > > > > On Fri, May 14, 2021 at 11:10 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > > > > > > On Fri, May 14, 2021 at 04:00:25PM +0800, Huacai Chen wrote: > > > > > > > > According to PCI-to-PCI bridge spec, bit 3 of Bridge Control Register is > > > > > > > > VGA Enable bit which modifies the response to VGA compatible addresses. > > > > > > > > > > > > > > The bridge spec is pretty old, and most of the content has been > > > > > > > incorporated into the PCIe spec. I think you can cite "PCIe r5.0, sec > > > > > > > 7.5.1.3.13" here instead. > > > > > > > > > > > > > > > If the VGA Enable bit is set, the bridge will decode and forward the > > > > > > > > following accesses on the primary interface to the secondary interface. > > > > > > > > > > > > > > *Which* following accesses? The structure of English requires that if > > > > > > > you say "the following accesses," you must continue by *listing* the > > > > > > > accesses. > > > > > > > > > > > > > > > The ASpeed AST2500 hardward does not set the VGA Enable bit on its > > > > > > > > bridge control register, which causes vgaarb subsystem don't think the > > > > > > > > VGA card behind the bridge as a valid boot vga device. > > > > > > > > > > > > > > s/hardward/bridge/ > > > > > > > s/vga/VGA/ (also in code comments and dmesg strings below) > > > > > > > > > > > > > > From the code, it looks like AST2500 ([1a03:2000]) is a VGA device, > > > > > > > since it apparently has a VGA class code. But here you say the > > > > > > > AST2500 has a Bridge Control register, which suggests that it's a > > > > > > > bridge. If AST2500 is some sort of combination that includes both a > > > > > > > bridge and a VGA device, please outline that topology. > > > > > > > > > > > > > > But the hardware defect is that some bridges forward VGA accesses even > > > > > > > though their VGA Enable bit is not set? The quirk should be attached > > > > > > > to broken *bridges*, not to VGA devices. > > > > > > > > > > > > > > If a bridge forwards VGA accesses regardless of how its VGA Enable bit > > > > > > > is set, that means VGA arbitration (in vgaarb.c) cannot work > > > > > > > correctly, so merely setting the default VGA device once in a quirk is > > > > > > > not sufficient. You would have to somehow disable any future attempts > > > > > > > to use other VGA devices. Only the VGA device below this defective > > > > > > > bridge is usable. Any other VGA devices in the system would be > > > > > > > useless. > > > > > > > > > > > > > > > So we provide a quirk to fix Xorg auto-detection. > > > > > > > > > > > > > > > > See similar bug: > > > > > > > > > > > > > > > > https://patchwork.kernel.org/project/linux-pci/patch/20170619023528.11532-1-dja@axtens.net/ > > > > > > > > > > > > > > This patch was never merged. If we merged a revised version, please > > > > > > > cite the SHA1 instead. > > > > > > > > > > > > This patch has never merged, and I found that it is unnecessary after > > > > > > commit a37c0f48950b56f6ef2ee637 ("vgaarb: Select a default VGA device > > > > > > even if there's no legacy VGA"). Maybe this ASpeed patch is also > > > > > > unnecessary. If it is still needed, I'll investigate the root cause. > > > > > > > > > > I found that vga_arb_device_init() and pcibios_init() are both wrapped > > > > > by subsys_initcall(), which means their sequence is unpredictable. And > > > > > unfortunately, in our platform vga_arb_device_init() is called before > > > > > pcibios_init(), which makes vga_arb_device_init() fail to set a > > > > > default vga device. This is the root cause why we thought that we > > > > > still need a quirk for AST2500. > > > > > > > > Does this mean there is no hardware defect here? The VGA Enable bit > > > > works correctly? > > > > > > > No, VGA Enable bit still doesn't set, but with commit > > > a37c0f48950b56f6ef2ee637 ("vgaarb: Select a default VGA device even if > > > there's no legacy VGA") we no longer depend on VGA Enable. > > > > Correct me if I'm wrong: > > > > - On the AST2500 bridge [1a03:1150], the VGA Enable bit is > > read-only 0. > > > > - The AST2500 bridge never forwards VGA accesses ([mem > > 0xa0000-0xbffff], [io 0x3b0-0x3bb], [io 0x3c0-0x3df]) to its > > secondary bus. > > > > The VGA Enable bit is optional, and if both the above are true, the > > bridge is working correctly per spec, and the quirk below is not the > > right solution, and whatever solution we come up with should not > > claim that the bridge is misbehaving. > Yes, you are right, the bridge is working correctly, which is similar > to HiSilicon D05. > > > > > > > > > I think the best solution is make vga_arb_device_init() be wrapped by > > > > > subsys_initcall_sync(), do you think so? > > > > > > > > Hmm. Unfortunately the semantics of subsys_initcall_sync() are not > > > > documented, so I'm not sure exactly *why* such a change would work and > > > > whether we could rely on it to continue working. > > > > > > > > pcibios_init() isn't very consistent across arches. On some, > > > > including alpha, microblaze, some MIPS platforms, powerpc, and sh, it > > > > enumerates PCI devices. On others (ia64, parisc, sparc, x86), it does > > > > basically nothing. That makes life a little difficult. > > > > > > subsys_initcall_sync() is ensured after all subsys_initcall() > > > functions, so at least it can solve the problem on platforms which use > > > pcibios_init() to enumerate PCI devices (x86 and other ACPI-based > > > platforms are also OK, because they use acpi_init() > > > -->acpi_scan_init() -->pci_acpi_scan_root() to enumerate devices). > > > > More details in my response to suijingfeng: > > https://lore.kernel.org/r/20210518193100.GA148462@bjorn-Precision-5520 > > > > I'd rather not fiddle with the initcall ordering. That mechanism is > > fragile and I'd prefer something more robust. > > > > I'm wondering whether it's practical to do something in the normal PCI > > enumeration path, e.g., in pci_init_capabilities(). Maybe we can > > detect the default VGA device as we enumerate it. Then we wouldn't > > have this weird process of "find all PCI devices first, then scan for > > the default VGA device, and oh, by the way, also check for VGA devices > > hot-added later." > > If we don't want to rely on initcall order, and want to solve the > hot-added case, then can we add vga_arb_select_default_device() in > pci_notify() when (action == BUS_NOTIFY_ADD_DEVICE && > !vga_default_device())? I think I would see if it's possible to call vga_arb_select_default_device() from vga_arbiter_add_pci_device() instead of from vga_arb_device_init(). I would also (as a separate patch) try to get rid of this loop in vga_arb_device_init(): list_for_each_entry(vgadev, &vga_list, list) { struct device *dev = &vgadev->pdev->dev; if (vgadev->bridge_has_one_vga) vgaarb_info(dev, "bridge control possible\n"); else vgaarb_info(dev, "no bridge control possible\n"); } and do the vgaarb_info() in vga_arbiter_check_bridge_sharing(), where the loop would not be needed. Bjorn
Hi, Bjorn, On Thu, May 20, 2021 at 3:33 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Wed, May 19, 2021 at 10:17:14AM +0800, Huacai Chen wrote: > > On Wed, May 19, 2021 at 3:35 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > > > On Tue, May 18, 2021 at 03:13:43PM +0800, Huacai Chen wrote: > > > > On Tue, May 18, 2021 at 2:28 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > > > > > On Mon, May 17, 2021 at 08:53:43PM +0800, Huacai Chen wrote: > > > > > > On Sat, May 15, 2021 at 5:09 PM Huacai Chen <chenhuacai@gmail.com> wrote: > > > > > > > On Fri, May 14, 2021 at 11:10 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > > > > > > > On Fri, May 14, 2021 at 04:00:25PM +0800, Huacai Chen wrote: > > > > > > > > > According to PCI-to-PCI bridge spec, bit 3 of Bridge Control Register is > > > > > > > > > VGA Enable bit which modifies the response to VGA compatible addresses. > > > > > > > > > > > > > > > > The bridge spec is pretty old, and most of the content has been > > > > > > > > incorporated into the PCIe spec. I think you can cite "PCIe r5.0, sec > > > > > > > > 7.5.1.3.13" here instead. > > > > > > > > > > > > > > > > > If the VGA Enable bit is set, the bridge will decode and forward the > > > > > > > > > following accesses on the primary interface to the secondary interface. > > > > > > > > > > > > > > > > *Which* following accesses? The structure of English requires that if > > > > > > > > you say "the following accesses," you must continue by *listing* the > > > > > > > > accesses. > > > > > > > > > > > > > > > > > The ASpeed AST2500 hardward does not set the VGA Enable bit on its > > > > > > > > > bridge control register, which causes vgaarb subsystem don't think the > > > > > > > > > VGA card behind the bridge as a valid boot vga device. > > > > > > > > > > > > > > > > s/hardward/bridge/ > > > > > > > > s/vga/VGA/ (also in code comments and dmesg strings below) > > > > > > > > > > > > > > > > From the code, it looks like AST2500 ([1a03:2000]) is a VGA device, > > > > > > > > since it apparently has a VGA class code. But here you say the > > > > > > > > AST2500 has a Bridge Control register, which suggests that it's a > > > > > > > > bridge. If AST2500 is some sort of combination that includes both a > > > > > > > > bridge and a VGA device, please outline that topology. > > > > > > > > > > > > > > > > But the hardware defect is that some bridges forward VGA accesses even > > > > > > > > though their VGA Enable bit is not set? The quirk should be attached > > > > > > > > to broken *bridges*, not to VGA devices. > > > > > > > > > > > > > > > > If a bridge forwards VGA accesses regardless of how its VGA Enable bit > > > > > > > > is set, that means VGA arbitration (in vgaarb.c) cannot work > > > > > > > > correctly, so merely setting the default VGA device once in a quirk is > > > > > > > > not sufficient. You would have to somehow disable any future attempts > > > > > > > > to use other VGA devices. Only the VGA device below this defective > > > > > > > > bridge is usable. Any other VGA devices in the system would be > > > > > > > > useless. > > > > > > > > > > > > > > > > > So we provide a quirk to fix Xorg auto-detection. > > > > > > > > > > > > > > > > > > See similar bug: > > > > > > > > > > > > > > > > > > https://patchwork.kernel.org/project/linux-pci/patch/20170619023528.11532-1-dja@axtens.net/ > > > > > > > > > > > > > > > > This patch was never merged. If we merged a revised version, please > > > > > > > > cite the SHA1 instead. > > > > > > > > > > > > > > This patch has never merged, and I found that it is unnecessary after > > > > > > > commit a37c0f48950b56f6ef2ee637 ("vgaarb: Select a default VGA device > > > > > > > even if there's no legacy VGA"). Maybe this ASpeed patch is also > > > > > > > unnecessary. If it is still needed, I'll investigate the root cause. > > > > > > > > > > > > I found that vga_arb_device_init() and pcibios_init() are both wrapped > > > > > > by subsys_initcall(), which means their sequence is unpredictable. And > > > > > > unfortunately, in our platform vga_arb_device_init() is called before > > > > > > pcibios_init(), which makes vga_arb_device_init() fail to set a > > > > > > default vga device. This is the root cause why we thought that we > > > > > > still need a quirk for AST2500. > > > > > > > > > > Does this mean there is no hardware defect here? The VGA Enable bit > > > > > works correctly? > > > > > > > > > No, VGA Enable bit still doesn't set, but with commit > > > > a37c0f48950b56f6ef2ee637 ("vgaarb: Select a default VGA device even if > > > > there's no legacy VGA") we no longer depend on VGA Enable. > > > > > > Correct me if I'm wrong: > > > > > > - On the AST2500 bridge [1a03:1150], the VGA Enable bit is > > > read-only 0. > > > > > > - The AST2500 bridge never forwards VGA accesses ([mem > > > 0xa0000-0xbffff], [io 0x3b0-0x3bb], [io 0x3c0-0x3df]) to its > > > secondary bus. > > > > > > The VGA Enable bit is optional, and if both the above are true, the > > > bridge is working correctly per spec, and the quirk below is not the > > > right solution, and whatever solution we come up with should not > > > claim that the bridge is misbehaving. > > Yes, you are right, the bridge is working correctly, which is similar > > to HiSilicon D05. > > > > > > > > > > > > > I think the best solution is make vga_arb_device_init() be wrapped by > > > > > > subsys_initcall_sync(), do you think so? > > > > > > > > > > Hmm. Unfortunately the semantics of subsys_initcall_sync() are not > > > > > documented, so I'm not sure exactly *why* such a change would work and > > > > > whether we could rely on it to continue working. > > > > > > > > > > pcibios_init() isn't very consistent across arches. On some, > > > > > including alpha, microblaze, some MIPS platforms, powerpc, and sh, it > > > > > enumerates PCI devices. On others (ia64, parisc, sparc, x86), it does > > > > > basically nothing. That makes life a little difficult. > > > > > > > > subsys_initcall_sync() is ensured after all subsys_initcall() > > > > functions, so at least it can solve the problem on platforms which use > > > > pcibios_init() to enumerate PCI devices (x86 and other ACPI-based > > > > platforms are also OK, because they use acpi_init() > > > > -->acpi_scan_init() -->pci_acpi_scan_root() to enumerate devices). > > > > > > More details in my response to suijingfeng: > > > https://lore.kernel.org/r/20210518193100.GA148462@bjorn-Precision-5520 > > > > > > I'd rather not fiddle with the initcall ordering. That mechanism is > > > fragile and I'd prefer something more robust. > > > > > > I'm wondering whether it's practical to do something in the normal PCI > > > enumeration path, e.g., in pci_init_capabilities(). Maybe we can > > > detect the default VGA device as we enumerate it. Then we wouldn't > > > have this weird process of "find all PCI devices first, then scan for > > > the default VGA device, and oh, by the way, also check for VGA devices > > > hot-added later." > > > > If we don't want to rely on initcall order, and want to solve the > > hot-added case, then can we add vga_arb_select_default_device() in > > pci_notify() when (action == BUS_NOTIFY_ADD_DEVICE && > > !vga_default_device())? > > I think I would see if it's possible to call > vga_arb_select_default_device() from vga_arbiter_add_pci_device() > instead of from vga_arb_device_init(). > > I would also (as a separate patch) try to get rid of this loop in > vga_arb_device_init(): > > list_for_each_entry(vgadev, &vga_list, list) { > struct device *dev = &vgadev->pdev->dev; > > if (vgadev->bridge_has_one_vga) > vgaarb_info(dev, "bridge control possible\n"); > else > vgaarb_info(dev, "no bridge control possible\n"); > } > > and do the vgaarb_info() in vga_arbiter_check_bridge_sharing(), where > the loop would not be needed. Any updates? Huacai > > Bjorn
On Tue, May 25, 2021 at 07:03:05PM +0800, Huacai Chen wrote: > On Thu, May 20, 2021 at 3:33 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Wed, May 19, 2021 at 10:17:14AM +0800, Huacai Chen wrote: > > > On Wed, May 19, 2021 at 3:35 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > > > > On Tue, May 18, 2021 at 03:13:43PM +0800, Huacai Chen wrote: > > > > > On Tue, May 18, 2021 at 2:28 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > > > > > > On Mon, May 17, 2021 at 08:53:43PM +0800, Huacai Chen wrote: > > > > > > > On Sat, May 15, 2021 at 5:09 PM Huacai Chen <chenhuacai@gmail.com> wrote: > > > > > > > > On Fri, May 14, 2021 at 11:10 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > > > > > > > > On Fri, May 14, 2021 at 04:00:25PM +0800, Huacai Chen wrote: > > > > > > > > > > According to PCI-to-PCI bridge spec, bit 3 of Bridge Control Register is > > > > > > > > > > VGA Enable bit which modifies the response to VGA compatible addresses. > > > > > > > > > > > > > > > > > > The bridge spec is pretty old, and most of the content has been > > > > > > > > > incorporated into the PCIe spec. I think you can cite "PCIe r5.0, sec > > > > > > > > > 7.5.1.3.13" here instead. > > > > > > > > > > > > > > > > > > > If the VGA Enable bit is set, the bridge will decode and forward the > > > > > > > > > > following accesses on the primary interface to the secondary interface. > > > > > > > > > > > > > > > > > > *Which* following accesses? The structure of English requires that if > > > > > > > > > you say "the following accesses," you must continue by *listing* the > > > > > > > > > accesses. > > > > > > > > > > > > > > > > > > > The ASpeed AST2500 hardward does not set the VGA Enable bit on its > > > > > > > > > > bridge control register, which causes vgaarb subsystem don't think the > > > > > > > > > > VGA card behind the bridge as a valid boot vga device. > > > > > > > > > > > > > > > > > > s/hardward/bridge/ > > > > > > > > > s/vga/VGA/ (also in code comments and dmesg strings below) > > > > > > > > > > > > > > > > > > From the code, it looks like AST2500 ([1a03:2000]) is a VGA device, > > > > > > > > > since it apparently has a VGA class code. But here you say the > > > > > > > > > AST2500 has a Bridge Control register, which suggests that it's a > > > > > > > > > bridge. If AST2500 is some sort of combination that includes both a > > > > > > > > > bridge and a VGA device, please outline that topology. > > > > > > > > > > > > > > > > > > But the hardware defect is that some bridges forward VGA accesses even > > > > > > > > > though their VGA Enable bit is not set? The quirk should be attached > > > > > > > > > to broken *bridges*, not to VGA devices. > > > > > > > > > > > > > > > > > > If a bridge forwards VGA accesses regardless of how its VGA Enable bit > > > > > > > > > is set, that means VGA arbitration (in vgaarb.c) cannot work > > > > > > > > > correctly, so merely setting the default VGA device once in a quirk is > > > > > > > > > not sufficient. You would have to somehow disable any future attempts > > > > > > > > > to use other VGA devices. Only the VGA device below this defective > > > > > > > > > bridge is usable. Any other VGA devices in the system would be > > > > > > > > > useless. > > > > > > > > > > > > > > > > > > > So we provide a quirk to fix Xorg auto-detection. > > > > > > > > > > > > > > > > > > > > See similar bug: > > > > > > > > > > > > > > > > > > > > https://patchwork.kernel.org/project/linux-pci/patch/20170619023528.11532-1-dja@axtens.net/ > > > > > > > > > > > > > > > > > > This patch was never merged. If we merged a revised version, please > > > > > > > > > cite the SHA1 instead. > > > > > > > > > > > > > > > > This patch has never merged, and I found that it is unnecessary after > > > > > > > > commit a37c0f48950b56f6ef2ee637 ("vgaarb: Select a default VGA device > > > > > > > > even if there's no legacy VGA"). Maybe this ASpeed patch is also > > > > > > > > unnecessary. If it is still needed, I'll investigate the root cause. > > > > > > > > > > > > > > I found that vga_arb_device_init() and pcibios_init() are both wrapped > > > > > > > by subsys_initcall(), which means their sequence is unpredictable. And > > > > > > > unfortunately, in our platform vga_arb_device_init() is called before > > > > > > > pcibios_init(), which makes vga_arb_device_init() fail to set a > > > > > > > default vga device. This is the root cause why we thought that we > > > > > > > still need a quirk for AST2500. > > > > > > > > > > > > Does this mean there is no hardware defect here? The VGA Enable bit > > > > > > works correctly? > > > > > > > > > > > No, VGA Enable bit still doesn't set, but with commit > > > > > a37c0f48950b56f6ef2ee637 ("vgaarb: Select a default VGA device even if > > > > > there's no legacy VGA") we no longer depend on VGA Enable. > > > > > > > > Correct me if I'm wrong: > > > > > > > > - On the AST2500 bridge [1a03:1150], the VGA Enable bit is > > > > read-only 0. > > > > > > > > - The AST2500 bridge never forwards VGA accesses ([mem > > > > 0xa0000-0xbffff], [io 0x3b0-0x3bb], [io 0x3c0-0x3df]) to its > > > > secondary bus. > > > > > > > > The VGA Enable bit is optional, and if both the above are true, the > > > > bridge is working correctly per spec, and the quirk below is not the > > > > right solution, and whatever solution we come up with should not > > > > claim that the bridge is misbehaving. > > > Yes, you are right, the bridge is working correctly, which is similar > > > to HiSilicon D05. > > > > > > > > > > > > > > > > > I think the best solution is make vga_arb_device_init() be wrapped by > > > > > > > subsys_initcall_sync(), do you think so? > > > > > > > > > > > > Hmm. Unfortunately the semantics of subsys_initcall_sync() are not > > > > > > documented, so I'm not sure exactly *why* such a change would work and > > > > > > whether we could rely on it to continue working. > > > > > > > > > > > > pcibios_init() isn't very consistent across arches. On some, > > > > > > including alpha, microblaze, some MIPS platforms, powerpc, and sh, it > > > > > > enumerates PCI devices. On others (ia64, parisc, sparc, x86), it does > > > > > > basically nothing. That makes life a little difficult. > > > > > > > > > > subsys_initcall_sync() is ensured after all subsys_initcall() > > > > > functions, so at least it can solve the problem on platforms which use > > > > > pcibios_init() to enumerate PCI devices (x86 and other ACPI-based > > > > > platforms are also OK, because they use acpi_init() > > > > > -->acpi_scan_init() -->pci_acpi_scan_root() to enumerate devices). > > > > > > > > More details in my response to suijingfeng: > > > > https://lore.kernel.org/r/20210518193100.GA148462@bjorn-Precision-5520 > > > > > > > > I'd rather not fiddle with the initcall ordering. That mechanism is > > > > fragile and I'd prefer something more robust. > > > > > > > > I'm wondering whether it's practical to do something in the normal PCI > > > > enumeration path, e.g., in pci_init_capabilities(). Maybe we can > > > > detect the default VGA device as we enumerate it. Then we wouldn't > > > > have this weird process of "find all PCI devices first, then scan for > > > > the default VGA device, and oh, by the way, also check for VGA devices > > > > hot-added later." > > > > > > If we don't want to rely on initcall order, and want to solve the > > > hot-added case, then can we add vga_arb_select_default_device() in > > > pci_notify() when (action == BUS_NOTIFY_ADD_DEVICE && > > > !vga_default_device())? > > > > I think I would see if it's possible to call > > vga_arb_select_default_device() from vga_arbiter_add_pci_device() > > instead of from vga_arb_device_init(). > > > > I would also (as a separate patch) try to get rid of this loop in > > vga_arb_device_init(): > > > > list_for_each_entry(vgadev, &vga_list, list) { > > struct device *dev = &vgadev->pdev->dev; > > > > if (vgadev->bridge_has_one_vga) > > vgaarb_info(dev, "bridge control possible\n"); > > else > > vgaarb_info(dev, "no bridge control possible\n"); > > } > > > > and do the vgaarb_info() in vga_arbiter_check_bridge_sharing(), where > > the loop would not be needed. > > Any updates? Are you waiting for me to do something else? I suggested an approach above, but I don't have time to actually do the work for you. Bjorn
Hi, Bjorn, On Tue, May 25, 2021 at 9:55 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Tue, May 25, 2021 at 07:03:05PM +0800, Huacai Chen wrote: > > On Thu, May 20, 2021 at 3:33 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > > > On Wed, May 19, 2021 at 10:17:14AM +0800, Huacai Chen wrote: > > > > On Wed, May 19, 2021 at 3:35 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > > > > > On Tue, May 18, 2021 at 03:13:43PM +0800, Huacai Chen wrote: > > > > > > On Tue, May 18, 2021 at 2:28 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > > > > > > > On Mon, May 17, 2021 at 08:53:43PM +0800, Huacai Chen wrote: > > > > > > > > On Sat, May 15, 2021 at 5:09 PM Huacai Chen <chenhuacai@gmail.com> wrote: > > > > > > > > > On Fri, May 14, 2021 at 11:10 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > > > > > > > > > On Fri, May 14, 2021 at 04:00:25PM +0800, Huacai Chen wrote: > > > > > > > > > > > According to PCI-to-PCI bridge spec, bit 3 of Bridge Control Register is > > > > > > > > > > > VGA Enable bit which modifies the response to VGA compatible addresses. > > > > > > > > > > > > > > > > > > > > The bridge spec is pretty old, and most of the content has been > > > > > > > > > > incorporated into the PCIe spec. I think you can cite "PCIe r5.0, sec > > > > > > > > > > 7.5.1.3.13" here instead. > > > > > > > > > > > > > > > > > > > > > If the VGA Enable bit is set, the bridge will decode and forward the > > > > > > > > > > > following accesses on the primary interface to the secondary interface. > > > > > > > > > > > > > > > > > > > > *Which* following accesses? The structure of English requires that if > > > > > > > > > > you say "the following accesses," you must continue by *listing* the > > > > > > > > > > accesses. > > > > > > > > > > > > > > > > > > > > > The ASpeed AST2500 hardward does not set the VGA Enable bit on its > > > > > > > > > > > bridge control register, which causes vgaarb subsystem don't think the > > > > > > > > > > > VGA card behind the bridge as a valid boot vga device. > > > > > > > > > > > > > > > > > > > > s/hardward/bridge/ > > > > > > > > > > s/vga/VGA/ (also in code comments and dmesg strings below) > > > > > > > > > > > > > > > > > > > > From the code, it looks like AST2500 ([1a03:2000]) is a VGA device, > > > > > > > > > > since it apparently has a VGA class code. But here you say the > > > > > > > > > > AST2500 has a Bridge Control register, which suggests that it's a > > > > > > > > > > bridge. If AST2500 is some sort of combination that includes both a > > > > > > > > > > bridge and a VGA device, please outline that topology. > > > > > > > > > > > > > > > > > > > > But the hardware defect is that some bridges forward VGA accesses even > > > > > > > > > > though their VGA Enable bit is not set? The quirk should be attached > > > > > > > > > > to broken *bridges*, not to VGA devices. > > > > > > > > > > > > > > > > > > > > If a bridge forwards VGA accesses regardless of how its VGA Enable bit > > > > > > > > > > is set, that means VGA arbitration (in vgaarb.c) cannot work > > > > > > > > > > correctly, so merely setting the default VGA device once in a quirk is > > > > > > > > > > not sufficient. You would have to somehow disable any future attempts > > > > > > > > > > to use other VGA devices. Only the VGA device below this defective > > > > > > > > > > bridge is usable. Any other VGA devices in the system would be > > > > > > > > > > useless. > > > > > > > > > > > > > > > > > > > > > So we provide a quirk to fix Xorg auto-detection. > > > > > > > > > > > > > > > > > > > > > > See similar bug: > > > > > > > > > > > > > > > > > > > > > > https://patchwork.kernel.org/project/linux-pci/patch/20170619023528.11532-1-dja@axtens.net/ > > > > > > > > > > > > > > > > > > > > This patch was never merged. If we merged a revised version, please > > > > > > > > > > cite the SHA1 instead. > > > > > > > > > > > > > > > > > > This patch has never merged, and I found that it is unnecessary after > > > > > > > > > commit a37c0f48950b56f6ef2ee637 ("vgaarb: Select a default VGA device > > > > > > > > > even if there's no legacy VGA"). Maybe this ASpeed patch is also > > > > > > > > > unnecessary. If it is still needed, I'll investigate the root cause. > > > > > > > > > > > > > > > > I found that vga_arb_device_init() and pcibios_init() are both wrapped > > > > > > > > by subsys_initcall(), which means their sequence is unpredictable. And > > > > > > > > unfortunately, in our platform vga_arb_device_init() is called before > > > > > > > > pcibios_init(), which makes vga_arb_device_init() fail to set a > > > > > > > > default vga device. This is the root cause why we thought that we > > > > > > > > still need a quirk for AST2500. > > > > > > > > > > > > > > Does this mean there is no hardware defect here? The VGA Enable bit > > > > > > > works correctly? > > > > > > > > > > > > > No, VGA Enable bit still doesn't set, but with commit > > > > > > a37c0f48950b56f6ef2ee637 ("vgaarb: Select a default VGA device even if > > > > > > there's no legacy VGA") we no longer depend on VGA Enable. > > > > > > > > > > Correct me if I'm wrong: > > > > > > > > > > - On the AST2500 bridge [1a03:1150], the VGA Enable bit is > > > > > read-only 0. > > > > > > > > > > - The AST2500 bridge never forwards VGA accesses ([mem > > > > > 0xa0000-0xbffff], [io 0x3b0-0x3bb], [io 0x3c0-0x3df]) to its > > > > > secondary bus. > > > > > > > > > > The VGA Enable bit is optional, and if both the above are true, the > > > > > bridge is working correctly per spec, and the quirk below is not the > > > > > right solution, and whatever solution we come up with should not > > > > > claim that the bridge is misbehaving. > > > > Yes, you are right, the bridge is working correctly, which is similar > > > > to HiSilicon D05. > > > > > > > > > > > > > > > > > > > > > I think the best solution is make vga_arb_device_init() be wrapped by > > > > > > > > subsys_initcall_sync(), do you think so? > > > > > > > > > > > > > > Hmm. Unfortunately the semantics of subsys_initcall_sync() are not > > > > > > > documented, so I'm not sure exactly *why* such a change would work and > > > > > > > whether we could rely on it to continue working. > > > > > > > > > > > > > > pcibios_init() isn't very consistent across arches. On some, > > > > > > > including alpha, microblaze, some MIPS platforms, powerpc, and sh, it > > > > > > > enumerates PCI devices. On others (ia64, parisc, sparc, x86), it does > > > > > > > basically nothing. That makes life a little difficult. > > > > > > > > > > > > subsys_initcall_sync() is ensured after all subsys_initcall() > > > > > > functions, so at least it can solve the problem on platforms which use > > > > > > pcibios_init() to enumerate PCI devices (x86 and other ACPI-based > > > > > > platforms are also OK, because they use acpi_init() > > > > > > -->acpi_scan_init() -->pci_acpi_scan_root() to enumerate devices). > > > > > > > > > > More details in my response to suijingfeng: > > > > > https://lore.kernel.org/r/20210518193100.GA148462@bjorn-Precision-5520 > > > > > > > > > > I'd rather not fiddle with the initcall ordering. That mechanism is > > > > > fragile and I'd prefer something more robust. > > > > > > > > > > I'm wondering whether it's practical to do something in the normal PCI > > > > > enumeration path, e.g., in pci_init_capabilities(). Maybe we can > > > > > detect the default VGA device as we enumerate it. Then we wouldn't > > > > > have this weird process of "find all PCI devices first, then scan for > > > > > the default VGA device, and oh, by the way, also check for VGA devices > > > > > hot-added later." > > > > > > > > If we don't want to rely on initcall order, and want to solve the > > > > hot-added case, then can we add vga_arb_select_default_device() in > > > > pci_notify() when (action == BUS_NOTIFY_ADD_DEVICE && > > > > !vga_default_device())? > > > > > > I think I would see if it's possible to call > > > vga_arb_select_default_device() from vga_arbiter_add_pci_device() > > > instead of from vga_arb_device_init(). > > > > > > I would also (as a separate patch) try to get rid of this loop in > > > vga_arb_device_init(): > > > > > > list_for_each_entry(vgadev, &vga_list, list) { > > > struct device *dev = &vgadev->pdev->dev; > > > > > > if (vgadev->bridge_has_one_vga) > > > vgaarb_info(dev, "bridge control possible\n"); > > > else > > > vgaarb_info(dev, "no bridge control possible\n"); > > > } > > > > > > and do the vgaarb_info() in vga_arbiter_check_bridge_sharing(), where > > > the loop would not be needed. > > > > Any updates? > > Are you waiting for me to do something else? > > I suggested an approach above, but I don't have time to actually do > the work for you. Yes, I am really waiting... but I am also investigating history and thinking. If I haven't missed something (correct me if I'm wrong). For the original HiSilicon problem, the first attempt is to modify vga_arbiter_add_pci_device() and remove the VGA_RSRC_LEGACY_MASK check. But vga_arbiter_add_pci_device() is called for each PCI device, so removing that check will cause the first VGA device to be the default VGA device. This breaks some x86 platforms, so after that you don't touch vga_arbiter_add_pci_device(), but add vga_arb_select_default_device() in vga_arb_device_init(). If the above history is correct, then we cannot add vga_arb_select_default_device() in vga_arbiter_add_pci_device() directly. So it seems we can only add vga_arb_select_default_device() in pci_notify(). And if we don't care about hotplug, we can simply use subsys_initcall_sync() to wrap vga_arb_device_init(). And DRM developers, please let me know what do you think about? Huacai > > Bjorn
> > > > I think I would see if it's possible to call > > > > vga_arb_select_default_device() from vga_arbiter_add_pci_device() > > > > instead of from vga_arb_device_init(). > > > > > > > > I would also (as a separate patch) try to get rid of this loop in > > > > vga_arb_device_init(): > > > > > > > > list_for_each_entry(vgadev, &vga_list, list) { > > > > struct device *dev = &vgadev->pdev->dev; > > > > > > > > if (vgadev->bridge_has_one_vga) > > > > vgaarb_info(dev, "bridge control possible\n"); > > > > else > > > > vgaarb_info(dev, "no bridge control possible\n"); > > > > } > > > > > > > > and do the vgaarb_info() in vga_arbiter_check_bridge_sharing(), where > > > > the loop would not be needed. > > > > > > Any updates? > > > > Are you waiting for me to do something else? > > > > I suggested an approach above, but I don't have time to actually do > > the work for you. > Yes, I am really waiting... but I am also investigating history and thinking. > > If I haven't missed something (correct me if I'm wrong). For the > original HiSilicon problem, the first attempt is to modify > vga_arbiter_add_pci_device() and remove the VGA_RSRC_LEGACY_MASK > check. But vga_arbiter_add_pci_device() is called for each PCI device, > so removing that check will cause the first VGA device to be the > default VGA device. This breaks some x86 platforms, so after that you > don't touch vga_arbiter_add_pci_device(), but add > vga_arb_select_default_device() in vga_arb_device_init(). > > If the above history is correct, then we cannot add > vga_arb_select_default_device() in vga_arbiter_add_pci_device() > directly. So it seems we can only add vga_arb_select_default_device() > in pci_notify(). And if we don't care about hotplug, we can simply use > subsys_initcall_sync() to wrap vga_arb_device_init(). > > And DRM developers, please let me know what do you think about? I'm not 100% following what is going on here. Do you need call vga_arb_select_default_device after hotplug for some reason, or it this just a race with subsys_init? I think just adding subsys_initcall_sync should be fine I don't see why you'd want to care about making a hotplug VGA device the default at this point. Dave.
On Wed, May 26, 2021 at 01:00:33PM +1000, Dave Airlie wrote: > > > > > I think I would see if it's possible to call > > > > > vga_arb_select_default_device() from vga_arbiter_add_pci_device() > > > > > instead of from vga_arb_device_init(). > > > > > > > > > > I would also (as a separate patch) try to get rid of this loop in > > > > > vga_arb_device_init(): > > > > > > > > > > list_for_each_entry(vgadev, &vga_list, list) { > > > > > struct device *dev = &vgadev->pdev->dev; > > > > > > > > > > if (vgadev->bridge_has_one_vga) > > > > > vgaarb_info(dev, "bridge control possible\n"); > > > > > else > > > > > vgaarb_info(dev, "no bridge control possible\n"); > > > > > } > > > > > > > > > > and do the vgaarb_info() in vga_arbiter_check_bridge_sharing(), where > > > > > the loop would not be needed. > > > > > > > > Any updates? > > > > > > Are you waiting for me to do something else? > > > > > > I suggested an approach above, but I don't have time to actually do > > > the work for you. > > > > Yes, I am really waiting... but I am also investigating history > > and thinking. Well, don't wait for me because this work is not on my to-do list :) > > If I haven't missed something (correct me if I'm wrong). For the > > original HiSilicon problem, the first attempt is to modify > > vga_arbiter_add_pci_device() and remove the VGA_RSRC_LEGACY_MASK > > check. But vga_arbiter_add_pci_device() is called for each PCI device, > > so removing that check will cause the first VGA device to be the > > default VGA device. This breaks some x86 platforms, so after that you > > don't touch vga_arbiter_add_pci_device(), but add > > vga_arb_select_default_device() in vga_arb_device_init(). > > > > If the above history is correct, then we cannot add > > vga_arb_select_default_device() in vga_arbiter_add_pci_device() > > directly. So it seems we can only add vga_arb_select_default_device() > > in pci_notify(). And if we don't care about hotplug, we can simply use > > subsys_initcall_sync() to wrap vga_arb_device_init(). > > > > And DRM developers, please let me know what do you think about? > > I'm not 100% following what is going on here. > > Do you need call vga_arb_select_default_device after hotplug for some > reason, or it this just a race with subsys_init? > > I think just adding subsys_initcall_sync should be fine Doing subsys_initcall_sync(vga_arb_device_init) is probably "OK". I don't think it's *great* because initcalls don't make dependencies explicit so it won't be obvious *why* it's subsys_initcall_sync, and it feels a little like a band-aid. > I don't see why you'd want to care about making a hotplug VGA device > the default at this point. I don't think hotplug per se is relevant for this ASpeed case. But I think the current design is slightly broken in that we set up the machinery to call vga_arbiter_add_pci_device() for hot-added devices, but a hot-added device can never be set as the default VGA device. Imagine a system with a single VGA device. If that device is plugged in before boot, it becomes the default VGA device. If it is hot-added after boot, it does not. That inconsistency feels wrong to me. If it were possible to set the default VGA device in vga_arbiter_add_pci_device(), it would fix that inconsistency and solve the ASpeed case. But maybe that's not practical. Bjorn
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index 6ab4b3bba36b..adf5490706ad 100644 --- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c @@ -28,6 +28,7 @@ #include <linux/platform_data/x86/apple.h> #include <linux/pm_runtime.h> #include <linux/switchtec.h> +#include <linux/vgaarb.h> #include <asm/dma.h> /* isa_dma_bridge_buggy */ #include "pci.h" @@ -297,6 +298,52 @@ static void loongson_mrrs_quirk(struct pci_dev *dev) } DECLARE_PCI_FIXUP_ENABLE(PCI_ANY_ID, PCI_ANY_ID, loongson_mrrs_quirk); + +static void aspeed_fixup_vgaarb(struct pci_dev *pdev) +{ + struct pci_dev *bridge; + struct pci_bus *bus; + struct pci_dev *vdevp = NULL; + u16 config; + + bus = pdev->bus; + bridge = bus->self; + + /* Is VGA routed to us? */ + if (bridge && (pci_is_bridge(bridge))) { + pci_read_config_word(bridge, PCI_BRIDGE_CONTROL, &config); + + /* Yes, this bridge is PCI bridge-to-bridge spec compliant, + * just return! + */ + if (config & PCI_BRIDGE_CTL_VGA) + return; + + dev_warn(&pdev->dev, "VGA bridge control is not enabled\n"); + } + + /* Just return if the system already have a default device */ + if (vga_default_device()) + return; + + /* No default vga device */ + while ((vdevp = pci_get_class(PCI_CLASS_DISPLAY_VGA << 8, vdevp))) { + if (vdevp->vendor != 0x1a03) { + /* Have other vga devcie in the system, do nothing */ + dev_info(&pdev->dev, "Another boot vga device: 0x%x:0x%x\n", + vdevp->vendor, vdevp->device); + return; + } + } + + vga_set_default_device(pdev); + + dev_info(&pdev->dev, "Boot vga device set as 0x%x:0x%x\n", + pdev->vendor, pdev->device); +} +DECLARE_PCI_FIXUP_CLASS_FINAL(0x1a03, 0x2000, PCI_CLASS_DISPLAY_VGA, 8, aspeed_fixup_vgaarb); + + /* * The Mellanox Tavor device gives false positive parity errors. Disable * parity error reporting.