Message ID | 20231221-thunderbolt-pci-patch-4-v4-1-2e136e57c9bc@chromium.org |
---|---|
State | New |
Headers | show |
Series | [v4] PCI: Relabel JHL6540 on Lenovo X1 Carbon 7,8 | expand |
On Thu, Dec 21, 2023 at 03:53:42PM -0500, Esther Shimanovich wrote: > +static void carbon_X1_fixup_relabel_alpine_ridge(struct pci_dev *dev) > +{ > + /* Is this JHL6540 PCI component embedded in a Lenovo device? */ > + if (dev->subsystem_vendor != PCI_VENDOR_ID_LENOVO) > + return; > + > + /* Is this JHL6540 PCI component embedded in an X1 Carbon Gen 7/8? */ > + if (dev->subsystem_device != 0x22be && // Gen 8 > + dev->subsystem_device != 0x2292) { // Gen 7 > + return; > + } > + > + dev_set_removable(&dev->dev, DEVICE_FIXED); > + > + /* Not all 0x15d3 components are external facing */ > + if (dev->device == 0x15d3 && > + dev->devfn != 0x08 && > + dev->devfn != 0x20) { > + return; > + } > + > + dev->external_facing = true; > +} > + > +/* > + * We also need to relabel the root port as a consequence of changing > + * the JHL6540's PCIE hierarchy. > + */ > +static void carbon_X1_fixup_rootport_not_removable(struct pci_dev *dev) > +{ > + /* Is this JHL6540 PCI component embedded in a Lenovo device? */ > + if (dev->subsystem_vendor != PCI_VENDOR_ID_LENOVO) > + return; > + > + /* Is this JHL6540 PCI component embedded in an X1 Carbon Gen 7/8? */ > + if (dev->subsystem_device != 0x22be && // Gen 8 > + dev->subsystem_device != 0x2292) { // Gen 7 > + return; > + } This ventures into the realm of nitpicking, but this can be factored out into a helper. I'll shut up now and let PCI folks handle this. Thanks.
On Thu, Dec 21, 2023 at 03:53:42PM -0500, Esther Shimanovich wrote: > On Lenovo X1 Carbon Gen 7/8 devices, when a platform enables a policy to > distrust removable PCI devices, the build-in USB-C ports stop working at > all. > This happens because these X1 Carbon models have a unique feature; a > Thunderbolt controller that is discrete from the SoC. The software sees > this controller, and incorrectly assumes it is a removable PCI device, > even though it is fixed to the computer and is wired to the computer's > own USB-C ports. > > Relabel all the components of the JHL6540 controller as DEVICE_FIXED, > and where applicable, external_facing. > > Ensure that the security policy to distrust external PCI devices works > as intended, and that the device's USB-C ports are able to enumerate > even when the policy is enabled. Thanks for all your work here. This is going to be a maintenance problem. We typically use quirks to work around hardware defects, e.g., a device that advertises a feature that doesn't work per spec. But I don't see where the defect is here. And I doubt that this is really a unique situation. So it's likely that this will happen on other systems, and we don't want to have to add quirks every time another one shows up. If this is a firmware defect, e.g., if this platform is using "ExternalFacingPort" incorrectly, we can add a quirk to work around that, too. But I'm not sure that's the case. > Signed-off-by: Esther Shimanovich <eshimanovich@chromium.org> > --- > Changes in v4: > - replaced a dmi check in the rootport quirk with a subsystem vendor and > device check. > - Link to v3: https://lore.kernel.org/r/20231220-thunderbolt-pci-patch-4-v3-1-056fd1717d06@chromium.org > > Changes in v3: > - removed redundant dmi check, as the subsystem vendor check is > sufficient > - switched to PCI_VENDOR_ID_LENOVO instead of hex code > - Link to v2: https://lore.kernel.org/r/20231219-thunderbolt-pci-patch-4-v2-1-ec2d7af45a9b@chromium.org > > Changes in v2: > - nothing new, v1 was just a test run to see if the ASCII diagram would > be rendered properly in mutt and k-9 > - for folks using gmail, make sure to select "show original" on the top > right, as otherwise the diagram will be garbled by the standard > non-monospace font > - Link to v1: https://lore.kernel.org/r/20231219-thunderbolt-pci-patch-4-v1-1-4e8e3773f0a9@chromium.org > --- > drivers/pci/quirks.c | 112 +++++++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 112 insertions(+) > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c > index ea476252280a..34e43323ff14 100644 > --- a/drivers/pci/quirks.c > +++ b/drivers/pci/quirks.c > @@ -3873,6 +3873,118 @@ DECLARE_PCI_FIXUP_SUSPEND_LATE(PCI_VENDOR_ID_INTEL, > quirk_apple_poweroff_thunderbolt); > #endif > > +/* > + * On most ThinkPad Carbon 7/8s, JHL6540 Thunderbolt 3 bridges are set > + * incorrectly as DEVICE_REMOVABLE despite being built into the device. > + * This is the side effect of a unique hardware configuration. > + * > + * Normally, Thunderbolt functionality is integrated to the SoC and > + * its root ports. > + * > + * Most devices: > + * root port --> USB-C port > + * > + * But X1 Carbon Gen 7/8 uses Whiskey Lake and Comet Lake SoC, which > + * don't come with Thunderbolt functionality. Therefore, a discrete > + * Thunderbolt Host PCI controller was added between the root port and > + * the USB-C port. > + * > + * Thinkpad Carbon 7/8s > + * (w/ Whiskey Lake and Comet Lake SoC): > + * root port --> JHL6540 --> USB-C port > + * > + * Because the root port is labeled by FW as "ExternalFacingPort", as > + * required by the DMAR ACPI spec, the JHL6540 chip is inaccurately Can you include a citation (spec name, revision, section) for this DMAR requirement? > + * labeled as DEVICE_REMOVABLE by the kernel pci driver. > + * Therefore, the built-in USB-C ports do not enumerate when policies > + * forbidding external pci devices are enforced. > + * > + * This fix relabels the pci components in the built-in JHL6540 chip as > + * DEVICE_FIXED, ensuring that the built-in USB-C ports always enumerate > + * properly as intended. > + * > + * This fix also labels the external facing components of the JHL6540 as > + * external_facing, so that the pci attach policy works as intended. > + * > + * The ASCII diagram below describes the pci layout of the JHL6540 chip. > + * > + * Root Port > + * [8086:02b4] or [8086:9db4] > + * | > + * JHL6540 Chip > + * __________________________________________________ > + * | Bridge | > + * | PCI ID -> [8086:15d3] | > + * | DEVFN -> (00) | > + * | _________________|__________________ | > + * | | | | | | > + * | Bridge Bridge Bridge Bridge | > + * | [8086:15d3] [8086:15d3] [8086:15d3] [8086:15d3] | > + * | (00) (08) (10) (20) | > + * | | | | | | > + * | NHI | USB Controller | | > + * | [8086:15d2] | [8086:15d4] | | > + * | (00) | (00) | | > + * | |___________| |___________| | > + * |____________|________________________|____________| > + * | | > + * USB-C Port USB-C Port > + * > + * > + * Based on what a JHL6549 pci component's pci id, subsystem device id > + * and devfn are, we can infer if it is fixed and if it faces a usb port; > + * which would mean it is external facing. > + * This quirk uses these values to identify the pci components and set the > + * properties accordingly. Random nits: Capitalize spec terms like "PCI", "USB", "Root Port", etc., consistently in English text. Rewrap to fill 78 columns or so. Add blank lines between paragraphs. This applies to the commit log (which should be wrapped to 75 to allow for "git log" indent) as well as comments. But honestly, I hope we can figure out a solution that doesn't require a big comment like this. Checking for random device IDs to deduce the topology is not the way PCI is supposed to work. PCI is designed to be enumerable, so software can learn what it needs to know by interrogating the hardware directly. For cases where that's impossible, ACPI is supposed to fill the gaps, e.g., with "ExternalFacingPort". If this patch covers over a gap that firmware doesn't handle yet, maybe we need to add some new firmware interface. If that's the case, we can add quirks for platforms that don't have the new interface. But we at least need a plan that doesn't require quirks indefinitely. > + */ > +static void carbon_X1_fixup_relabel_alpine_ridge(struct pci_dev *dev) > +{ > + /* Is this JHL6540 PCI component embedded in a Lenovo device? */ > + if (dev->subsystem_vendor != PCI_VENDOR_ID_LENOVO) > + return; > + > + /* Is this JHL6540 PCI component embedded in an X1 Carbon Gen 7/8? */ > + if (dev->subsystem_device != 0x22be && // Gen 8 > + dev->subsystem_device != 0x2292) { // Gen 7 > + return; > + } > + > + dev_set_removable(&dev->dev, DEVICE_FIXED); > + > + /* Not all 0x15d3 components are external facing */ > + if (dev->device == 0x15d3 && > + dev->devfn != 0x08 && > + dev->devfn != 0x20) { > + return; > + } > + > + dev->external_facing = true; > +} > + > +/* > + * We also need to relabel the root port as a consequence of changing > + * the JHL6540's PCIE hierarchy. > + */ > +static void carbon_X1_fixup_rootport_not_removable(struct pci_dev *dev) > +{ > + /* Is this JHL6540 PCI component embedded in a Lenovo device? */ > + if (dev->subsystem_vendor != PCI_VENDOR_ID_LENOVO) > + return; > + > + /* Is this JHL6540 PCI component embedded in an X1 Carbon Gen 7/8? */ > + if (dev->subsystem_device != 0x22be && // Gen 8 > + dev->subsystem_device != 0x2292) { // Gen 7 > + return; > + } > + > + dev->external_facing = false; > +} > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x15d3, carbon_X1_fixup_relabel_alpine_ridge); > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x15d2, carbon_X1_fixup_relabel_alpine_ridge); > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x15d4, carbon_X1_fixup_relabel_alpine_ridge); > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x02b4, carbon_X1_fixup_rootport_not_removable); > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x9db4, carbon_X1_fixup_rootport_not_removable); > + > /* > * Following are device-specific reset methods which can be used to > * reset a single function if other methods (e.g. FLR, PM D0->D3) are > > --- > base-commit: b85ea95d086471afb4ad062012a4d73cd328fa86 > change-id: 20231219-thunderbolt-pci-patch-4-ede71cb833c4 > > Best regards, > -- > Esther Shimanovich <eshimanovich@chromium.org> >
[adding Mika and Mario to the list of recipients, original patch is here: https://lore.kernel.org/all/20231221-thunderbolt-pci-patch-4-v4-1-2e136e57c9bc@chromium.org/ a lot of folks are on vacation, replies might not be sent before Jan 8; full quote without further comments below] On Thu, Dec 21, 2023 at 03:53:42PM -0500, Esther Shimanovich wrote: > On Lenovo X1 Carbon Gen 7/8 devices, when a platform enables a policy to > distrust removable PCI devices, the build-in USB-C ports stop working at > all. > This happens because these X1 Carbon models have a unique feature; a > Thunderbolt controller that is discrete from the SoC. The software sees > this controller, and incorrectly assumes it is a removable PCI device, > even though it is fixed to the computer and is wired to the computer's > own USB-C ports. > > Relabel all the components of the JHL6540 controller as DEVICE_FIXED, > and where applicable, external_facing. > > Ensure that the security policy to distrust external PCI devices works > as intended, and that the device's USB-C ports are able to enumerate > even when the policy is enabled. > > Signed-off-by: Esther Shimanovich <eshimanovich@chromium.org> > --- > Changes in v4: > - replaced a dmi check in the rootport quirk with a subsystem vendor and > device check. > - Link to v3: https://lore.kernel.org/r/20231220-thunderbolt-pci-patch-4-v3-1-056fd1717d06@chromium.org > > Changes in v3: > - removed redundant dmi check, as the subsystem vendor check is > sufficient > - switched to PCI_VENDOR_ID_LENOVO instead of hex code > - Link to v2: https://lore.kernel.org/r/20231219-thunderbolt-pci-patch-4-v2-1-ec2d7af45a9b@chromium.org > > Changes in v2: > - nothing new, v1 was just a test run to see if the ASCII diagram would > be rendered properly in mutt and k-9 > - for folks using gmail, make sure to select "show original" on the top > right, as otherwise the diagram will be garbled by the standard > non-monospace font > - Link to v1: https://lore.kernel.org/r/20231219-thunderbolt-pci-patch-4-v1-1-4e8e3773f0a9@chromium.org > --- > drivers/pci/quirks.c | 112 +++++++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 112 insertions(+) > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c > index ea476252280a..34e43323ff14 100644 > --- a/drivers/pci/quirks.c > +++ b/drivers/pci/quirks.c > @@ -3873,6 +3873,118 @@ DECLARE_PCI_FIXUP_SUSPEND_LATE(PCI_VENDOR_ID_INTEL, > quirk_apple_poweroff_thunderbolt); > #endif > > +/* > + * On most ThinkPad Carbon 7/8s, JHL6540 Thunderbolt 3 bridges are set > + * incorrectly as DEVICE_REMOVABLE despite being built into the device. > + * This is the side effect of a unique hardware configuration. > + * > + * Normally, Thunderbolt functionality is integrated to the SoC and > + * its root ports. > + * > + * Most devices: > + * root port --> USB-C port > + * > + * But X1 Carbon Gen 7/8 uses Whiskey Lake and Comet Lake SoC, which > + * don't come with Thunderbolt functionality. Therefore, a discrete > + * Thunderbolt Host PCI controller was added between the root port and > + * the USB-C port. > + * > + * Thinkpad Carbon 7/8s > + * (w/ Whiskey Lake and Comet Lake SoC): > + * root port --> JHL6540 --> USB-C port > + * > + * Because the root port is labeled by FW as "ExternalFacingPort", as > + * required by the DMAR ACPI spec, the JHL6540 chip is inaccurately > + * labeled as DEVICE_REMOVABLE by the kernel pci driver. > + * Therefore, the built-in USB-C ports do not enumerate when policies > + * forbidding external pci devices are enforced. > + * > + * This fix relabels the pci components in the built-in JHL6540 chip as > + * DEVICE_FIXED, ensuring that the built-in USB-C ports always enumerate > + * properly as intended. > + * > + * This fix also labels the external facing components of the JHL6540 as > + * external_facing, so that the pci attach policy works as intended. > + * > + * The ASCII diagram below describes the pci layout of the JHL6540 chip. > + * > + * Root Port > + * [8086:02b4] or [8086:9db4] > + * | > + * JHL6540 Chip > + * __________________________________________________ > + * | Bridge | > + * | PCI ID -> [8086:15d3] | > + * | DEVFN -> (00) | > + * | _________________|__________________ | > + * | | | | | | > + * | Bridge Bridge Bridge Bridge | > + * | [8086:15d3] [8086:15d3] [8086:15d3] [8086:15d3] | > + * | (00) (08) (10) (20) | > + * | | | | | | > + * | NHI | USB Controller | | > + * | [8086:15d2] | [8086:15d4] | | > + * | (00) | (00) | | > + * | |___________| |___________| | > + * |____________|________________________|____________| > + * | | > + * USB-C Port USB-C Port > + * > + * > + * Based on what a JHL6549 pci component's pci id, subsystem device id > + * and devfn are, we can infer if it is fixed and if it faces a usb port; > + * which would mean it is external facing. > + * This quirk uses these values to identify the pci components and set the > + * properties accordingly. > + */ > +static void carbon_X1_fixup_relabel_alpine_ridge(struct pci_dev *dev) > +{ > + /* Is this JHL6540 PCI component embedded in a Lenovo device? */ > + if (dev->subsystem_vendor != PCI_VENDOR_ID_LENOVO) > + return; > + > + /* Is this JHL6540 PCI component embedded in an X1 Carbon Gen 7/8? */ > + if (dev->subsystem_device != 0x22be && // Gen 8 > + dev->subsystem_device != 0x2292) { // Gen 7 > + return; > + } > + > + dev_set_removable(&dev->dev, DEVICE_FIXED); > + > + /* Not all 0x15d3 components are external facing */ > + if (dev->device == 0x15d3 && > + dev->devfn != 0x08 && > + dev->devfn != 0x20) { > + return; > + } > + > + dev->external_facing = true; > +} > + > +/* > + * We also need to relabel the root port as a consequence of changing > + * the JHL6540's PCIE hierarchy. > + */ > +static void carbon_X1_fixup_rootport_not_removable(struct pci_dev *dev) > +{ > + /* Is this JHL6540 PCI component embedded in a Lenovo device? */ > + if (dev->subsystem_vendor != PCI_VENDOR_ID_LENOVO) > + return; > + > + /* Is this JHL6540 PCI component embedded in an X1 Carbon Gen 7/8? */ > + if (dev->subsystem_device != 0x22be && // Gen 8 > + dev->subsystem_device != 0x2292) { // Gen 7 > + return; > + } > + > + dev->external_facing = false; > +} > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x15d3, carbon_X1_fixup_relabel_alpine_ridge); > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x15d2, carbon_X1_fixup_relabel_alpine_ridge); > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x15d4, carbon_X1_fixup_relabel_alpine_ridge); > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x02b4, carbon_X1_fixup_rootport_not_removable); > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x9db4, carbon_X1_fixup_rootport_not_removable); > + > /* > * Following are device-specific reset methods which can be used to > * reset a single function if other methods (e.g. FLR, PM D0->D3) are > > --- > base-commit: b85ea95d086471afb4ad062012a4d73cd328fa86 > change-id: 20231219-thunderbolt-pci-patch-4-ede71cb833c4 > > Best regards, > -- > Esther Shimanovich <eshimanovich@chromium.org>
Thanks Lukas for including me. On Thu, Dec 28, 2023 at 02:25:17PM +0100, Lukas Wunner wrote: > [adding Mika and Mario to the list of recipients, original patch is here: > https://lore.kernel.org/all/20231221-thunderbolt-pci-patch-4-v4-1-2e136e57c9bc@chromium.org/ > a lot of folks are on vacation, replies might not be sent before Jan 8; > full quote without further comments below] > > On Thu, Dec 21, 2023 at 03:53:42PM -0500, Esther Shimanovich wrote: > > On Lenovo X1 Carbon Gen 7/8 devices, when a platform enables a policy to > > distrust removable PCI devices, the build-in USB-C ports stop working at > > all. > > This happens because these X1 Carbon models have a unique feature; a > > Thunderbolt controller that is discrete from the SoC. The software sees > > this controller, and incorrectly assumes it is a removable PCI device, > > even though it is fixed to the computer and is wired to the computer's > > own USB-C ports. Yes, the ExternalFacingPort applies to root ports so that includes everything below that. > > Relabel all the components of the JHL6540 controller as DEVICE_FIXED, > > and where applicable, external_facing. > > > > Ensure that the security policy to distrust external PCI devices works > > as intended, and that the device's USB-C ports are able to enumerate > > even when the policy is enabled. This has been working just fine so far and as far as I can tell there is no such "policy" in place in the mainline kernel. > > > > Signed-off-by: Esther Shimanovich <eshimanovich@chromium.org> > > --- > > Changes in v4: > > - replaced a dmi check in the rootport quirk with a subsystem vendor and > > device check. > > - Link to v3: https://lore.kernel.org/r/20231220-thunderbolt-pci-patch-4-v3-1-056fd1717d06@chromium.org > > > > Changes in v3: > > - removed redundant dmi check, as the subsystem vendor check is > > sufficient > > - switched to PCI_VENDOR_ID_LENOVO instead of hex code > > - Link to v2: https://lore.kernel.org/r/20231219-thunderbolt-pci-patch-4-v2-1-ec2d7af45a9b@chromium.org > > > > Changes in v2: > > - nothing new, v1 was just a test run to see if the ASCII diagram would > > be rendered properly in mutt and k-9 > > - for folks using gmail, make sure to select "show original" on the top > > right, as otherwise the diagram will be garbled by the standard > > non-monospace font > > - Link to v1: https://lore.kernel.org/r/20231219-thunderbolt-pci-patch-4-v1-1-4e8e3773f0a9@chromium.org > > --- > > drivers/pci/quirks.c | 112 +++++++++++++++++++++++++++++++++++++++++++++++++++ > > 1 file changed, 112 insertions(+) > > > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c > > index ea476252280a..34e43323ff14 100644 > > --- a/drivers/pci/quirks.c > > +++ b/drivers/pci/quirks.c > > @@ -3873,6 +3873,118 @@ DECLARE_PCI_FIXUP_SUSPEND_LATE(PCI_VENDOR_ID_INTEL, > > quirk_apple_poweroff_thunderbolt); > > #endif > > > > +/* > > + * On most ThinkPad Carbon 7/8s, JHL6540 Thunderbolt 3 bridges are set > > + * incorrectly as DEVICE_REMOVABLE despite being built into the device. > > + * This is the side effect of a unique hardware configuration. There is really nothing "unique" here. It's exactly as specified by this: https://learn.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports#identifying-externally-exposed-pcie-root-ports and being used in many many system already out there and those have been working just fine. > > + * > > + * Normally, Thunderbolt functionality is integrated to the SoC and > > + * its root ports. > > + * > > + * Most devices: > > + * root port --> USB-C port > > + * > > + * But X1 Carbon Gen 7/8 uses Whiskey Lake and Comet Lake SoC, which > > + * don't come with Thunderbolt functionality. Therefore, a discrete > > + * Thunderbolt Host PCI controller was added between the root port and > > + * the USB-C port. > > + * > > + * Thinkpad Carbon 7/8s > > + * (w/ Whiskey Lake and Comet Lake SoC): > > + * root port --> JHL6540 --> USB-C port > > + * > > + * Because the root port is labeled by FW as "ExternalFacingPort", as > > + * required by the DMAR ACPI spec, the JHL6540 chip is inaccurately > > + * labeled as DEVICE_REMOVABLE by the kernel pci driver. > > + * Therefore, the built-in USB-C ports do not enumerate when policies > > + * forbidding external pci devices are enforced. > > + * > > + * This fix relabels the pci components in the built-in JHL6540 chip as > > + * DEVICE_FIXED, ensuring that the built-in USB-C ports always enumerate > > + * properly as intended. > > + * > > + * This fix also labels the external facing components of the JHL6540 as > > + * external_facing, so that the pci attach policy works as intended. > > + * > > + * The ASCII diagram below describes the pci layout of the JHL6540 chip. > > + * > > + * Root Port > > + * [8086:02b4] or [8086:9db4] > > + * | > > + * JHL6540 Chip > > + * __________________________________________________ > > + * | Bridge | > > + * | PCI ID -> [8086:15d3] | > > + * | DEVFN -> (00) | > > + * | _________________|__________________ | > > + * | | | | | | > > + * | Bridge Bridge Bridge Bridge | > > + * | [8086:15d3] [8086:15d3] [8086:15d3] [8086:15d3] | > > + * | (00) (08) (10) (20) | > > + * | | | | | | > > + * | NHI | USB Controller | | > > + * | [8086:15d2] | [8086:15d4] | | > > + * | (00) | (00) | | > > + * | |___________| |___________| | > > + * |____________|________________________|____________| > > + * | | > > + * USB-C Port USB-C Port > > + * > > + * > > + * Based on what a JHL6549 pci component's pci id, subsystem device id > > + * and devfn are, we can infer if it is fixed and if it faces a usb port; > > + * which would mean it is external facing. > > + * This quirk uses these values to identify the pci components and set the > > + * properties accordingly. > > + */ > > +static void carbon_X1_fixup_relabel_alpine_ridge(struct pci_dev *dev) > > +{ > > + /* Is this JHL6540 PCI component embedded in a Lenovo device? */ > > + if (dev->subsystem_vendor != PCI_VENDOR_ID_LENOVO) > > + return; > > + > > + /* Is this JHL6540 PCI component embedded in an X1 Carbon Gen 7/8? */ > > + if (dev->subsystem_device != 0x22be && // Gen 8 > > + dev->subsystem_device != 0x2292) { // Gen 7 > > + return; > > + } > > + > > + dev_set_removable(&dev->dev, DEVICE_FIXED); > > + > > + /* Not all 0x15d3 components are external facing */ > > + if (dev->device == 0x15d3 && > > + dev->devfn != 0x08 && > > + dev->devfn != 0x20) { > > + return; > > + } > > + > > + dev->external_facing = true; > > +} > > + > > +/* > > + * We also need to relabel the root port as a consequence of changing > > + * the JHL6540's PCIE hierarchy. > > + */ > > +static void carbon_X1_fixup_rootport_not_removable(struct pci_dev *dev) > > +{ > > + /* Is this JHL6540 PCI component embedded in a Lenovo device? */ > > + if (dev->subsystem_vendor != PCI_VENDOR_ID_LENOVO) > > + return; > > + > > + /* Is this JHL6540 PCI component embedded in an X1 Carbon Gen 7/8? */ > > + if (dev->subsystem_device != 0x22be && // Gen 8 > > + dev->subsystem_device != 0x2292) { // Gen 7 > > + return; > > + } > > + > > + dev->external_facing = false; > > +} > > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x15d3, carbon_X1_fixup_relabel_alpine_ridge); > > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x15d2, carbon_X1_fixup_relabel_alpine_ridge); > > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x15d4, carbon_X1_fixup_relabel_alpine_ridge); > > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x02b4, carbon_X1_fixup_rootport_not_removable); > > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x9db4, carbon_X1_fixup_rootport_not_removable); This is not scalable at all. You would need to include lots of systems here. And there should be no issue at all anyways. Can you elaborate what the issue is and which mainline kernel you are using to reproduce this?
Thank you for all your comments! I really appreciate all your help with this. I will address the style feedback once we reach a decision on how we will fix this bug. I first will respond to your comments, and then I will list out the possible solutions to this bug, in a way that takes into account all of your insights. On Tue, Dec 26, 2023 at 7:15 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > Can you include a citation (spec name, revision, section) for this > DMAR requirement? > This was my mistake–I misinterpreted what a firmware developer told me. This is a firmware ACPI requirement from windows, which is not in the DMAR spec. Windows uses it to identify externally exposed PCIE root ports. https://learn.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports#identifying-externally-exposed-pcie-root-ports > But I don't see where the defect is here. And I doubt that this is > really a unique situation. So it's likely that this will happen on > other systems, and we don't want to have to add quirks every time > another one shows up. ... > don't have the new interface. But we at least need a plan that > doesn't require quirks indefinitely. ... On Thu, Dec 28, 2023 at 8:41 AM Mika Westerberg <mika.westerberg@linux.intel.com> wrote: > This is not scalable at all. You would need to include lots of systems > here. And there should be no issue at all anyways. My team tests hundreds of different devices, and this is the only one which exhibited this issue that we’ve seen so far. No other devices we’ve seen so far have a discrete internal Thunderbolt controller which is treated as a removable device. Therefore, we don’t expect that a large number of devices will need this quirk. > There is really nothing "unique" here. It's exactly as specified by > this: > > https://learn.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports#identifying-externally-exposed-pcie-root-ports > > and being used in many many system already out there and those have been > working just fine. I don’t know how many computers have a discrete Thunderbolt chip that is separate from their CPU, but this doesn’t seem to be a common occurrence. These devices were made during a narrow window of time when CPUs didn’t have Thunderbolt features yet, so a separate JHL6540 chip had to be added so that Lenovo could include Thunderbolt on X1 Carbon Gen 7/8. As you said, these devices do indeed work fine in cases where you don’t care if a PCI Thunderbolt device is internal or external, which is most cases. Problems happen only whenever someone adds a security policy, or some other feature that cares about the distinction between a fixed or removable PCI device. > This has been working just fine so far and as far as I can tell there is > no such "policy" in place in the mainline kernel. Correct, there is no such policy in the mainline kernel as of now. The bug is that the linux kernel’s “removable” property is inaccurate for this device. > Can you elaborate what the issue is and which mainline kernel you are > using to reproduce this? Thanks for this question! On a Lenovo Thinkpad Gen 7/Gen 8 computer with the linux kernel installed, when you look at the properties of the JHL6540 Thunderbolt controller, you see that it is incorrectly labeled as removable. I have replicated this bug on the b85ea95d0864 Linux 6.7-rc1 kernel. Before my patch, you see that the JHL6540 controller is inaccurately labeled “removable”: $ udevadm info -a -p /sys/bus/pci/devices/0000:05:00.0 | grep -e {removable} -e {device} -e {vendor} -e looking looking at device '/devices/pci0000:00/0000:00:1d.4/0000:05:00.0': ATTR{device}=="0x15d3" ATTR{removable}=="removable" ATTR{vendor}=="0x8086" looking at parent device '/devices/pci0000:00/0000:00:1d.4': ATTRS{device}=="0x02b4" ATTRS{vendor}=="0x8086" looking at parent device '/devices/pci0000:00': After applying the patch in this ticket, we see the JHL6540 controller is now labeled as “fixed”: $ udevadm info -a -p /sys/bus/pci/devices/0000:05:00.0 | grep -e {removable} -e {device} -e {vendor} -e looking looking at device '/devices/pci0000:00/0000:00:1d.4/0000:05:00.0': ATTR{device}=="0x15d3" ATTR{removable}=="fixed" ATTR{vendor}=="0x8086" looking at parent device '/devices/pci0000:00/0000:00:1d.4': ATTRS{device}=="0x02b4" ATTRS{vendor}=="0x8086" looking at parent device '/devices/pci0000:00': OK so here is the part where I share what I’ve developed as a result of your comments: The two options I see to resolve this are as follows: 1) Either we fix this by adding a new firmware interface as Bjorn Helgaas brought up. 2) Alternatively we may address this through a cleaned-up version of this patch If the solution is to add a firmware interface, how would I go about that process? Could you put me in touch with someone with that know-how? Would we have a temporary software quirk in place while the firmware spec is being updated? I am deferring to your expertise and knowledge in solving this bug. Thank you for all your help.
On Wed, Jan 17, 2024 at 04:21:18PM -0500, Esther Shimanovich wrote: > Thank you for all your comments! I really appreciate all your help > with this. I will address the style feedback once we reach a decision > on how we will fix this bug. > I first will respond to your comments, and then I will list out the > possible solutions to this bug, in a way that takes into account all > of your insights. > > On Tue, Dec 26, 2023 at 7:15 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > Can you include a citation (spec name, revision, section) for this > > DMAR requirement? > > > This was my mistake–I misinterpreted what a firmware developer told > me. This is a firmware ACPI requirement from windows, which is not in > the DMAR spec. Windows uses it to identify externally exposed PCIE > root ports. > https://learn.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports#identifying-externally-exposed-pcie-root-ports > > > But I don't see where the defect is here. And I doubt that this is > > really a unique situation. So it's likely that this will happen on > > other systems, and we don't want to have to add quirks every time > > another one shows up. > ... > > don't have the new interface. But we at least need a plan that > > doesn't require quirks indefinitely. > ... > On Thu, Dec 28, 2023 at 8:41 AM Mika Westerberg > <mika.westerberg@linux.intel.com> wrote: > > This is not scalable at all. You would need to include lots of systems > > here. And there should be no issue at all anyways. > My team tests hundreds of different devices, and this is the only one > which exhibited this issue that we’ve seen so far. > No other devices we’ve seen so far have a discrete internal > Thunderbolt controller which is treated as a removable device. > Therefore, we don’t expect that a large number of devices will need > this quirk. Well that's pretty much all Intel Titan Ridge and Maple Ridge based systems. Some early ones did not use IOMMU but all the rest do. > > There is really nothing "unique" here. It's exactly as specified by > > this: > > > > https://learn.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports#identifying-externally-exposed-pcie-root-ports > > > > and being used in many many system already out there and those have been > > working just fine. > I don’t know how many computers have a discrete Thunderbolt chip that > is separate from their CPU, but this doesn’t seem to be a common > occurrence. > These devices were made during a narrow window of time when CPUs > didn’t have Thunderbolt features yet, so a separate JHL6540 chip had > to be added so that Lenovo could include Thunderbolt on X1 Carbon Gen > 7/8. Before Intel Ice Lake it was all discrete and it is still discrete with the Barlow Ridge controller who will have exact same ExternalFacing port as the previous models. > As you said, these devices do indeed work fine in cases where you > don’t care if a PCI Thunderbolt device is internal or external, which > is most cases. > Problems happen only whenever someone adds a security policy, or some > other feature that cares about the distinction between a fixed or > removable PCI device. Do we have such security policy in the mainline kernel? > > This has been working just fine so far and as far as I can tell there is > > no such "policy" in place in the mainline kernel. > Correct, there is no such policy in the mainline kernel as of now. The > bug is that the linux kernel’s “removable” property is inaccurate for > this device. Or perhaps the "policy" should know this better? IIRC there were some "exceptions" in the Chrome kernel that allowed to put these devices into "allowlist" is this not the case anymore? > > Can you elaborate what the issue is and which mainline kernel you are > > using to reproduce this? > Thanks for this question! On a Lenovo Thinkpad Gen 7/Gen 8 computer > with the linux kernel installed, when you look at the properties of > the JHL6540 Thunderbolt controller, you see that it is incorrectly > labeled as removable. I have replicated this bug on the b85ea95d0864 > Linux 6.7-rc1 kernel. > > Before my patch, you see that the JHL6540 controller is inaccurately > labeled “removable”: > $ udevadm info -a -p /sys/bus/pci/devices/0000:05:00.0 | grep -e > {removable} -e {device} -e {vendor} -e looking > looking at device '/devices/pci0000:00/0000:00:1d.4/0000:05:00.0': > ATTR{device}=="0x15d3" > ATTR{removable}=="removable" > ATTR{vendor}=="0x8086" This is actually accurate. The Thunderbolt controller is itself hot-removable and that BTW happens to be hot-removed when fwupd applies firmware upgrades to the device.
On 1/18/2024 00:00, Mika Westerberg wrote: >> Before my patch, you see that the JHL6540 controller is inaccurately >> labeled “removable”: >> $ udevadm info -a -p /sys/bus/pci/devices/0000:05:00.0 | grep -e >> {removable} -e {device} -e {vendor} -e looking >> looking at device '/devices/pci0000:00/0000:00:1d.4/0000:05:00.0': >> ATTR{device}=="0x15d3" >> ATTR{removable}=="removable" >> ATTR{vendor}=="0x8086" > > This is actually accurate. The Thunderbolt controller is itself > hot-removable and that BTW happens to be hot-removed when fwupd applies > firmware upgrades to the device. Depending on the consumers of this removable attribute I wonder if we need to a new ATTR of "external" instead of overloading "removable".
On Thu, Jan 18, 2024 at 09:47:07AM -0600, Mario Limonciello wrote: > On 1/18/2024 00:00, Mika Westerberg wrote: > > > Before my patch, you see that the JHL6540 controller is inaccurately > > > labeled “removable”: > > > $ udevadm info -a -p /sys/bus/pci/devices/0000:05:00.0 | grep -e > > > {removable} -e {device} -e {vendor} -e looking > > > looking at device '/devices/pci0000:00/0000:00:1d.4/0000:05:00.0': > > > ATTR{device}=="0x15d3" > > > ATTR{removable}=="removable" > > > ATTR{vendor}=="0x8086" > > > > This is actually accurate. The Thunderbolt controller is itself > > hot-removable and that BTW happens to be hot-removed when fwupd applies > > firmware upgrades to the device. This is quite interesting take. Does fwupd rip the controller out of the box to update it? By that account your touchpad is also removable as it may stop functioning when its firmware gets updated. > > Depending on the consumers of this removable attribute I wonder if we need > to a new ATTR of "external" instead of overloading "removable". Isn't this the same thing? From Documentation/ABI/testing/sysfs-devices-removable: What: /sys/devices/.../removable Date: May 2021 Contact: Rajat Jain <rajatxjain@gmail.com> Description: Information about whether a given device can be removed from the platform by the user. This is determined by its subsystem in a bus / platform-specific way. This attribute is only present for devices that can support determining such information: =========== =================================================== "removable" device can be removed from the platform by the user "fixed" device is fixed to the platform / cannot be removed by the user. Note this "by the user". Maybe we should add word "physically" here to qualify the meaning completely, but that is what it is. Not that it disappears from the bus or stops operating for some time because of firmware updates, but it can be physically detached from the platform/system. Thanks.
On Thu, Jan 18, 2024 at 08:12:56AM -0800, Dmitry Torokhov wrote: > On Thu, Jan 18, 2024 at 09:47:07AM -0600, Mario Limonciello wrote: > > On 1/18/2024 00:00, Mika Westerberg wrote: > > > > Before my patch, you see that the JHL6540 controller is inaccurately > > > > labeled “removable”: > > > > $ udevadm info -a -p /sys/bus/pci/devices/0000:05:00.0 | grep -e > > > > {removable} -e {device} -e {vendor} -e looking > > > > looking at device '/devices/pci0000:00/0000:00:1d.4/0000:05:00.0': > > > > ATTR{device}=="0x15d3" > > > > ATTR{removable}=="removable" > > > > ATTR{vendor}=="0x8086" > > > > > > This is actually accurate. The Thunderbolt controller is itself > > > hot-removable and that BTW happens to be hot-removed when fwupd applies > > > firmware upgrades to the device. > > This is quite interesting take. Does fwupd rip the controller out of the > box to update it? By that account your touchpad is also removable as it > may stop functioning when its firmware gets updated. > > > > > Depending on the consumers of this removable attribute I wonder if we need > > to a new ATTR of "external" instead of overloading "removable". > > Isn't this the same thing? From > Documentation/ABI/testing/sysfs-devices-removable: > > What: /sys/devices/.../removable > Date: May 2021 > Contact: Rajat Jain <rajatxjain@gmail.com> > Description: > Information about whether a given device can be removed from the > platform by the user. This is determined by its subsystem in a > bus / platform-specific way. This attribute is only present for > devices that can support determining such information: > > =========== =================================================== > "removable" device can be removed from the platform by the user > "fixed" device is fixed to the platform / cannot be removed > by the user. > > Note this "by the user". Maybe we should add word "physically" here to > qualify the meaning completely, but that is what it is. Not that it > disappears from the bus or stops operating for some time because of > firmware updates, but it can be physically detached from the > platform/system. FTR this is where Rajat and Bjorn have agreed what the attribute means when it was moved from USB-specific to general device property: https://lore.kernel.org/all/20210511230228.GA2429744@bjorn-Precision-5520/ Thanks.
On Thu, Jan 18, 2024 at 08:12:56AM -0800, Dmitry Torokhov wrote: > On Thu, Jan 18, 2024 at 09:47:07AM -0600, Mario Limonciello wrote: > > On 1/18/2024 00:00, Mika Westerberg wrote: > > > > Before my patch, you see that the JHL6540 controller is inaccurately > > > > labeled “removable”: > > > > $ udevadm info -a -p /sys/bus/pci/devices/0000:05:00.0 | grep -e > > > > {removable} -e {device} -e {vendor} -e looking > > > > looking at device '/devices/pci0000:00/0000:00:1d.4/0000:05:00.0': > > > > ATTR{device}=="0x15d3" > > > > ATTR{removable}=="removable" > > > > ATTR{vendor}=="0x8086" > > > > > > This is actually accurate. The Thunderbolt controller is itself > > > hot-removable and that BTW happens to be hot-removed when fwupd applies > > > firmware upgrades to the device. > > This is quite interesting take. Does fwupd rip the controller out of the > box to update it? By that account your touchpad is also removable as it > may stop functioning when its firmware gets updated. The Thunderbolt controller is connected to a hotpluggable PCIe root port so it will be dissappear from the userspace so that "removable" in that sense is accurate. > > Depending on the consumers of this removable attribute I wonder if we need > > to a new ATTR of "external" instead of overloading "removable". > > Isn't this the same thing? From > Documentation/ABI/testing/sysfs-devices-removable: > > What: /sys/devices/.../removable > Date: May 2021 > Contact: Rajat Jain <rajatxjain@gmail.com> > Description: > Information about whether a given device can be removed from the > platform by the user. This is determined by its subsystem in a > bus / platform-specific way. This attribute is only present for > devices that can support determining such information: > > =========== =================================================== > "removable" device can be removed from the platform by the user > "fixed" device is fixed to the platform / cannot be removed > by the user. > > Note this "by the user". Maybe we should add word "physically" here to > qualify the meaning completely, but that is what it is. Not that it > disappears from the bus or stops operating for some time because of > firmware updates, but it can be physically detached from the > platform/system. That would be good to fully qualify what this means. But I get you it is intented to identify devices that can be physically unplugged from the system.
On Fri, Jan 19, 2024 at 07:37:56AM +0200, Mika Westerberg wrote: > On Thu, Jan 18, 2024 at 08:12:56AM -0800, Dmitry Torokhov wrote: > > On Thu, Jan 18, 2024 at 09:47:07AM -0600, Mario Limonciello wrote: > > > On 1/18/2024 00:00, Mika Westerberg wrote: > > > > > Before my patch, you see that the JHL6540 controller is inaccurately > > > > > labeled “removable”: > > > > > $ udevadm info -a -p /sys/bus/pci/devices/0000:05:00.0 | grep -e > > > > > {removable} -e {device} -e {vendor} -e looking > > > > > looking at device '/devices/pci0000:00/0000:00:1d.4/0000:05:00.0': > > > > > ATTR{device}=="0x15d3" > > > > > ATTR{removable}=="removable" > > > > > ATTR{vendor}=="0x8086" > > > > > > > > This is actually accurate. The Thunderbolt controller is itself > > > > hot-removable and that BTW happens to be hot-removed when fwupd applies > > > > firmware upgrades to the device. > > > > This is quite interesting take. Does fwupd rip the controller out of the > > box to update it? By that account your touchpad is also removable as it > > may stop functioning when its firmware gets updated. > > The Thunderbolt controller is connected to a hotpluggable PCIe root port > so it will be dissappear from the userspace so that "removable" in that > sense is accurate. There are systems as well where the Thunderbolt (and/or xHCI) controller only appears if there is anything plugged to the physical Type-C ports and it gets removed pretty soon after the physical device gets unplugged. These are also the same Alpine Ridge and Titan Ridge controllers that this patch is dealing with. I tried to think about some sort of more generic heuristic how to figure out that the controller is actually inside the physical system but there is a problem that the same controller can appear on the bus as well, eg. you plug in Thunderbolt dock and that one has xHCI controller too. That device should definitely be "removable". With the "software CM" systems we have a couple of additional hints in the ACPI tables that can be used to identify the "tunneled" ports but this does not apply to the older systems I'm afraid. Now if I understand the reason behind this patch is actually not about "removability" that much than about identifying a trusted vs. untrusted device and attaching a driver to those. I was under impression that there is already a solution to this in ChromeOS kernel. It has an allowlist of drivers that are allowed to attach these devices and that includes the PCIe port drivers, xhci_hcd and the thunderbolt driver, possibly something else too. Is this not working for your case?
On Fri, Jan 19, 2024 at 09:48:29AM +0200, Mika Westerberg wrote: > On Fri, Jan 19, 2024 at 07:37:56AM +0200, Mika Westerberg wrote: > > On Thu, Jan 18, 2024 at 08:12:56AM -0800, Dmitry Torokhov wrote: > > > On Thu, Jan 18, 2024 at 09:47:07AM -0600, Mario Limonciello wrote: > > > > On 1/18/2024 00:00, Mika Westerberg wrote: > > > > > > Before my patch, you see that the JHL6540 controller is inaccurately > > > > > > labeled “removable”: > > > > > > $ udevadm info -a -p /sys/bus/pci/devices/0000:05:00.0 | grep -e > > > > > > {removable} -e {device} -e {vendor} -e looking > > > > > > looking at device '/devices/pci0000:00/0000:00:1d.4/0000:05:00.0': > > > > > > ATTR{device}=="0x15d3" > > > > > > ATTR{removable}=="removable" > > > > > > ATTR{vendor}=="0x8086" > > > > > > > > > > This is actually accurate. The Thunderbolt controller is itself > > > > > hot-removable and that BTW happens to be hot-removed when fwupd applies > > > > > firmware upgrades to the device. > > > > > > This is quite interesting take. Does fwupd rip the controller out of the > > > box to update it? By that account your touchpad is also removable as it > > > may stop functioning when its firmware gets updated. > > > > The Thunderbolt controller is connected to a hotpluggable PCIe root port > > so it will be dissappear from the userspace so that "removable" in that > > sense is accurate. > > There are systems as well where the Thunderbolt (and/or xHCI) controller > only appears if there is anything plugged to the physical Type-C ports > and it gets removed pretty soon after the physical device gets > unplugged. These are also the same Alpine Ridge and Titan Ridge > controllers that this patch is dealing with. > > I tried to think about some sort of more generic heuristic how to figure > out that the controller is actually inside the physical system but there > is a problem that the same controller can appear on the bus as well, eg. > you plug in Thunderbolt dock and that one has xHCI controller too. That > device should definitely be "removable". With the "software CM" systems > we have a couple of additional hints in the ACPI tables that can be used > to identify the "tunneled" ports but this does not apply to the older > systems I'm afraid. The below "might" work: 1. A device that is directly behind a PCIe root or downstream port that has ->external_facing == 1. 2. It is a PCIe endpoint. 3. It is a sibling to or has any of the below PCI IDs (see drivers/thunderbolt/nhi.h for the definitions): PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_C_4C_NHI PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_C_2C_NHI PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_LP_USBONLY_NHI PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_USBONLY_NHI PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_C_USBONLY_NHI PCI_DEVICE_ID_INTEL_TITAN_RIDGE_2C_NHI PCI_DEVICE_ID_INTEL_TITAN_RIDGE_4C_NHI And for all USB4 we can use the PCI class: PCI_CLASS_SERIAL_USB_USB4 (4. With USB4 systems we could also add the check that the device is not below the tunneled PCIe ports but that's kind of redundant). I think this covers the existing devices as well as future discrete USB4 controllers and marks both the NHI and the xHCI as "fixed" which we could also use to lift the bounce buffering restriction from them. @Mario, did I miss anything?
On Thu, Jan 18, 2024 at 1:01 AM Mika Westerberg <mika.westerberg@linux.intel.com> wrote: > > Well that's pretty much all Intel Titan Ridge and Maple Ridge based > systems. Some early ones did not use IOMMU but all the rest do. ... > Before Intel Ice Lake it was all discrete and it is still discrete with > the Barlow Ridge controller who will have exact same ExternalFacing port > as the previous models. Next week I'll try those devices in our inventory to see if I can find another one with this bug. I'll get back to you on that! On Fri, Jan 19, 2024 at 2:58 AM Mika Westerberg <mika.westerberg@linux.intel.com> wrote: > > Now if I understand the reason behind this patch is actually not about > "removability" that much than about identifying a trusted vs. untrusted > device and attaching a driver to those. I was under impression that > there is already a solution to this in ChromeOS kernel. It has an > allowlist of drivers that are allowed to attach these devices and that > includes the PCIe port drivers, xhci_hcd and the thunderbolt driver, > possibly something else too. Is this not working for your case? This device shouldn’t be treated as a removable thunderbolt device that is enabled by policy because it is an internal device that should be trusted in the first place. Even so, while learning about this problem I tried modifying the ChromeOS policy but it ended up not fixing the issue because it seems like there is an expectation for it to see an existing “fixed” thunderbolt port before it loads anything else. But the fixed thunderbolt port is prevented from enumerating during bootup, before the policy has a chance to work. On Fri, Jan 19, 2024 at 5:23 AM Mika Westerberg <mika.westerberg@linux.intel.com> wrote: > > The below "might" work: > > 1. A device that is directly behind a PCIe root or downstream port that > has ->external_facing == 1. > > 2. It is a PCIe endpoint. > > 3. It is a sibling to or has any of the below PCI IDs (see > drivers/thunderbolt/nhi.h for the definitions): External pci devices seem to have the same kinds of chips in them. So this wouldn’t distinguish the “fixed but discrete” embedded pci devices from the “removable” pcie through usb devices. My monitor with thunderbolt capabilities has the JHL7540 chip in it. From the kernel's perspective, I have only found that the subsystem id is what distinguishes these devices. That is, unless I am missing something in your proposal that would distinguish a fixed JHL6540 chip from an external JHL6540 chip. Please correct me on any assumptions I get wrong!
On Fri, Jan 19, 2024 at 11:03:18AM -0500, Esther Shimanovich wrote: > On Thu, Jan 18, 2024 at 1:01 AM Mika Westerberg > <mika.westerberg@linux.intel.com> wrote: > > > > Well that's pretty much all Intel Titan Ridge and Maple Ridge based > > systems. Some early ones did not use IOMMU but all the rest do. > ... > > Before Intel Ice Lake it was all discrete and it is still discrete with > > the Barlow Ridge controller who will have exact same ExternalFacing port > > as the previous models. > > Next week I'll try those devices in our inventory to see if I can find > another one with this bug. I'll get back to you on that! > > On Fri, Jan 19, 2024 at 2:58 AM Mika Westerberg > <mika.westerberg@linux.intel.com> wrote: > > > > Now if I understand the reason behind this patch is actually not about > > "removability" that much than about identifying a trusted vs. untrusted > > device and attaching a driver to those. I was under impression that > > there is already a solution to this in ChromeOS kernel. It has an > > allowlist of drivers that are allowed to attach these devices and that > > includes the PCIe port drivers, xhci_hcd and the thunderbolt driver, > > possibly something else too. Is this not working for your case? > > This device shouldn’t be treated as a removable thunderbolt device > that is enabled by policy because it is an internal device that should > be trusted in the first place. > Even so, while learning about this problem I tried modifying the > ChromeOS policy but it ended up not fixing the issue because it seems > like there is an expectation for it to see an existing “fixed” > thunderbolt port before it loads anything else. But the fixed > thunderbolt port is prevented from enumerating during bootup, before > the policy has a chance to work. Have you contacted the Chrome kernel folks about this? Perhaps they have thought about this already? > On Fri, Jan 19, 2024 at 5:23 AM Mika Westerberg > <mika.westerberg@linux.intel.com> wrote: > > > > The below "might" work: > > > > 1. A device that is directly behind a PCIe root or downstream port that > > has ->external_facing == 1. > > > > 2. It is a PCIe endpoint. > > > > 3. It is a sibling to or has any of the below PCI IDs (see > > drivers/thunderbolt/nhi.h for the definitions): > > External pci devices seem to have the same kinds of chips in them. So > this wouldn’t distinguish the “fixed but discrete” embedded pci > devices from the “removable” pcie through usb devices. My monitor with > thunderbolt capabilities has the JHL7540 chip in it. From the kernel's > perspective, I have only found that the subsystem id is what > distinguishes these devices. > > That is, unless I am missing something in your proposal that would > distinguish a fixed JHL6540 chip from an external JHL6540 chip. Please > correct me on any assumptions I get wrong! Yes, you are missing the 1. that it needs to be directly behind the PCIe root or downstream port that is marked as ->external_facing, and the fact that there can't be NHI's (that's the host controller with the IDs I listed in 3.) anywhere else except starting the topology according the USB4 spec (and the same applies to Thunderbolt 1-3).
On 1/19/2024 04:22, Mika Westerberg wrote: > On Fri, Jan 19, 2024 at 09:48:29AM +0200, Mika Westerberg wrote: >> On Fri, Jan 19, 2024 at 07:37:56AM +0200, Mika Westerberg wrote: >>> On Thu, Jan 18, 2024 at 08:12:56AM -0800, Dmitry Torokhov wrote: >>>> On Thu, Jan 18, 2024 at 09:47:07AM -0600, Mario Limonciello wrote: >>>>> On 1/18/2024 00:00, Mika Westerberg wrote: >>>>>>> Before my patch, you see that the JHL6540 controller is inaccurately >>>>>>> labeled “removable”: >>>>>>> $ udevadm info -a -p /sys/bus/pci/devices/0000:05:00.0 | grep -e >>>>>>> {removable} -e {device} -e {vendor} -e looking >>>>>>> looking at device '/devices/pci0000:00/0000:00:1d.4/0000:05:00.0': >>>>>>> ATTR{device}=="0x15d3" >>>>>>> ATTR{removable}=="removable" >>>>>>> ATTR{vendor}=="0x8086" >>>>>> >>>>>> This is actually accurate. The Thunderbolt controller is itself >>>>>> hot-removable and that BTW happens to be hot-removed when fwupd applies >>>>>> firmware upgrades to the device. >>>> >>>> This is quite interesting take. Does fwupd rip the controller out of the >>>> box to update it? By that account your touchpad is also removable as it >>>> may stop functioning when its firmware gets updated. >>> >>> The Thunderbolt controller is connected to a hotpluggable PCIe root port >>> so it will be dissappear from the userspace so that "removable" in that >>> sense is accurate. >> >> There are systems as well where the Thunderbolt (and/or xHCI) controller >> only appears if there is anything plugged to the physical Type-C ports >> and it gets removed pretty soon after the physical device gets >> unplugged. These are also the same Alpine Ridge and Titan Ridge >> controllers that this patch is dealing with. >> >> I tried to think about some sort of more generic heuristic how to figure >> out that the controller is actually inside the physical system but there >> is a problem that the same controller can appear on the bus as well, eg. >> you plug in Thunderbolt dock and that one has xHCI controller too. That >> device should definitely be "removable". With the "software CM" systems >> we have a couple of additional hints in the ACPI tables that can be used >> to identify the "tunneled" ports but this does not apply to the older >> systems I'm afraid. > > The below "might" work: > > 1. A device that is directly behind a PCIe root or downstream port that > has ->external_facing == 1. > > 2. It is a PCIe endpoint. > > 3. It is a sibling to or has any of the below PCI IDs (see > drivers/thunderbolt/nhi.h for the definitions): > > PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_C_4C_NHI > PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_C_2C_NHI > PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_LP_USBONLY_NHI > PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_USBONLY_NHI > PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_C_USBONLY_NHI > PCI_DEVICE_ID_INTEL_TITAN_RIDGE_2C_NHI > PCI_DEVICE_ID_INTEL_TITAN_RIDGE_4C_NHI > > And for all USB4 we can use the PCI class: > > PCI_CLASS_SERIAL_USB_USB4 > > (4. With USB4 systems we could also add the check that the device is not > below the tunneled PCIe ports but that's kind of redundant). > > I think this covers the existing devices as well as future discrete USB4 > controllers and marks both the NHI and the xHCI as "fixed" which we > could also use to lift the bounce buffering restriction from them. > > @Mario, did I miss anything? The bounce buffering restriction is only for unaligned DMA isn't it? Does that tend to happen a lot? But otherwise I think this does a good job. It will cover external enclosure cases too because of having to check it's directly behind a root port. But it should also include comments about why it's not needed on newer systems (IE the ACPI hints for someone with no prior knowledge looking at this to find).
On Mon, Jan 22, 2024 at 05:50:24PM -0600, Mario Limonciello wrote: > On 1/19/2024 04:22, Mika Westerberg wrote: > > On Fri, Jan 19, 2024 at 09:48:29AM +0200, Mika Westerberg wrote: > > > On Fri, Jan 19, 2024 at 07:37:56AM +0200, Mika Westerberg wrote: > > > > On Thu, Jan 18, 2024 at 08:12:56AM -0800, Dmitry Torokhov wrote: > > > > > On Thu, Jan 18, 2024 at 09:47:07AM -0600, Mario Limonciello wrote: > > > > > > On 1/18/2024 00:00, Mika Westerberg wrote: > > > > > > > > Before my patch, you see that the JHL6540 controller is inaccurately > > > > > > > > labeled “removable”: > > > > > > > > $ udevadm info -a -p /sys/bus/pci/devices/0000:05:00.0 | grep -e > > > > > > > > {removable} -e {device} -e {vendor} -e looking > > > > > > > > looking at device '/devices/pci0000:00/0000:00:1d.4/0000:05:00.0': > > > > > > > > ATTR{device}=="0x15d3" > > > > > > > > ATTR{removable}=="removable" > > > > > > > > ATTR{vendor}=="0x8086" > > > > > > > > > > > > > > This is actually accurate. The Thunderbolt controller is itself > > > > > > > hot-removable and that BTW happens to be hot-removed when fwupd applies > > > > > > > firmware upgrades to the device. > > > > > > > > > > This is quite interesting take. Does fwupd rip the controller out of the > > > > > box to update it? By that account your touchpad is also removable as it > > > > > may stop functioning when its firmware gets updated. > > > > > > > > The Thunderbolt controller is connected to a hotpluggable PCIe root port > > > > so it will be dissappear from the userspace so that "removable" in that > > > > sense is accurate. > > > > > > There are systems as well where the Thunderbolt (and/or xHCI) controller > > > only appears if there is anything plugged to the physical Type-C ports > > > and it gets removed pretty soon after the physical device gets > > > unplugged. These are also the same Alpine Ridge and Titan Ridge > > > controllers that this patch is dealing with. > > > > > > I tried to think about some sort of more generic heuristic how to figure > > > out that the controller is actually inside the physical system but there > > > is a problem that the same controller can appear on the bus as well, eg. > > > you plug in Thunderbolt dock and that one has xHCI controller too. That > > > device should definitely be "removable". With the "software CM" systems > > > we have a couple of additional hints in the ACPI tables that can be used > > > to identify the "tunneled" ports but this does not apply to the older > > > systems I'm afraid. > > > > The below "might" work: > > > > 1. A device that is directly behind a PCIe root or downstream port that > > has ->external_facing == 1. > > > > 2. It is a PCIe endpoint. > > > > 3. It is a sibling to or has any of the below PCI IDs (see > > drivers/thunderbolt/nhi.h for the definitions): > > > > PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_C_4C_NHI > > PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_C_2C_NHI > > PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_LP_USBONLY_NHI > > PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_USBONLY_NHI > > PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_C_USBONLY_NHI > > PCI_DEVICE_ID_INTEL_TITAN_RIDGE_2C_NHI > > PCI_DEVICE_ID_INTEL_TITAN_RIDGE_4C_NHI > > > > And for all USB4 we can use the PCI class: > > > > PCI_CLASS_SERIAL_USB_USB4 > > > > (4. With USB4 systems we could also add the check that the device is not > > below the tunneled PCIe ports but that's kind of redundant). > > > > I think this covers the existing devices as well as future discrete USB4 > > controllers and marks both the NHI and the xHCI as "fixed" which we > > could also use to lift the bounce buffering restriction from them. > > > > @Mario, did I miss anything? > > The bounce buffering restriction is only for unaligned DMA isn't it? Does > that tend to happen a lot? AFAICT no but this would allow to use IOMMU identity mappings instead of full mappings with these devices. > But otherwise I think this does a good job. It will cover external > enclosure cases too because of having to check it's directly behind a root > port. > > But it should also include comments about why it's not needed on newer > systems (IE the ACPI hints for someone with no prior knowledge looking at > this to find). Agree.
On Fri, Jan 19, 2024 at 11:03 AM Esther Shimanovich <eshimanovich@chromium.org> wrote: > Next week I'll try those devices in our inventory to see if I can find > another one with this bug. I'll get back to you on that! I ended up finding 11 additional models with this bug, from various manufacturers such as HP, Dell and Lenovo, that use the following thunderbolt controllers: JHL6240, JHL6340, JHL6540, JHL7540. So making this fix apply to all affected devices would make sense. On Mon, Jan 22, 2024 at 1:10 AM Mika Westerberg <mika.westerberg@linux.intel.com> wrote: > Yes, you are missing the 1. that it needs to be directly behind the PCIe > root or downstream port that is marked as ->external_facing, and the > fact that there can't be NHI's (that's the host controller with the IDs > I listed in 3.) anywhere else except starting the topology according the > USB4 spec (and the same applies to Thunderbolt 1-3). Thanks for the explanation. I'll write up a patch that implements this and takes into account all the feedback. Then I'll test it on multiple models, and then send it your way. Let me know if it makes sense to add you as a co-author. Very much appreciate your insights.
Hey! Asking for some help on implementation. So I implemented most of this, and successfully tested the quirk on 6 different devices with various types of discrete fixed JHL Thunderbolt chips. However I want to add an additional check. A device wouldn't need this quirk if it already has Thunderbolt functionality built in within its Root Port. I tried to use "is_thunderbolt" as an identifier for that type of device--- but when I tested on a device with a thunderbolt root port, "is_thunderbolt" was false for all the Thunderbolt PCI components listed below. ~ # lspci -nn | grep Thunderbolt 00:07.0 PCI bridge [0604]: Intel Corporation Tiger Lake-LP Thunderbolt 4 PCI Express Root Port #1 [8086:9a25] (rev 01) 00:07.2 PCI bridge [0604]: Intel Corporation Tiger Lake-LP Thunderbolt 4 PCI Express Root Port #2 [8086:9a27] (rev 01) 00:0d.0 USB controller [0c03]: Intel Corporation Tiger Lake-LP Thunderbolt 4 USB Controller [8086:9a13] (rev 01) 00:0d.2 USB controller [0c03]: Intel Corporation Tiger Lake-LP Thunderbolt 4 NHI #0 [8086:9a1b] (rev 01) 00:0d.3 USB controller [0c03]: Intel Corporation Tiger Lake-LP Thunderbolt 4 NHI #1 [8086:9a1d] (rev 01) The device I tested was the Lenovo carbon X1 Gen 9. When set_pcie_thunderbolt is run, the devices listed above return false on the pci_find_next_ext_capability check. I have spent some time trying to see if there are any ACPI values or any alternative indicators to reliably know if a root port has thunderbolt capabilities. I do see that ADBG is often set to TBT but that looks like a debugging tool in ACPI. I also looked through lspci -vvv and compared the output with a device that has no Thunderbolt built into its CPU, which gave me nothing. I also tried looking through values in /sys/bus and searching through the kernel, looking through logs etc. The only option I see now is figuring out how to get the string value returned by the lspci and parsing it, but I'm not 100% sure if all Thunderbolt CPUs would have Root ports specifically labeled as "Thunderbolt Root Port". I'm also not sure if that value is supposed to be used in that way. I was hoping that someone may have a suggestion here. Just for reference, this is the fix I have so far. I don't want to submit it as a new patch, until I resolve the above question. +static bool get_pci_exp_upstream_port(struct pci_dev *dev, + struct pci_dev **upstream_port) +{ + struct pci_dev *parent = dev; + + // If the dev is an upstream port, return itself + if (pci_pcie_type(dev) == PCI_EXP_TYPE_UPSTREAM) { + *upstream_port = dev; + return true; + } + + // Iterate through the dev's parents to find its upstream port + while ((parent = pci_upstream_bridge(parent))) { + if (pci_pcie_type(parent) == PCI_EXP_TYPE_UPSTREAM) { + *upstream_port = parent; + return true; + } + } + return false; +} + +static void relabel_internal_thunderbolt_chip(struct pci_dev *dev) +{ + struct pci_dev *upstream_port; + struct pci_dev *upstream_ports_parent; + + // Get the upstream port closest to the dev + if (!(get_pci_exp_upstream_port(dev, &upstream_port))) + return; + + // Next, we confirm if the upstream port is directly behind a PCIe root + // port. + if (!(upstream_ports_parent == pci_upstream_bridge(upstream_port))) + return; + if (!(pci_pcie_type(upstream_ports_parent) == PCI_EXP_TYPE_ROOT_PORT)) + return; // quirk does not apply + + // If a device's root port already has thunderbolt capabilities, then + // it shouldn't be using this quirk. + // TODO: THIS CHECK DOES NOT WORK + // I ALSO TRIED USING pci_is_thunderbolt_attached WHICH DIDN'T WORK EITHER + if (!(upstream_ports_parent->is_thunderbolt)) + return; // thunderbolt functionality is built into root port + + // Apply quirk + dev_set_removable(&dev->dev, DEVICE_FIXED); + + // External facing bridges must be marked as such so that devices + // below them can correctly be labeled as REMOVABLE + if ((pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM) + && (dev->devfn == 0x08 | dev->devfn == 0x20)) + dev->external_facing = true; +}
Hi, On Mon, Apr 15, 2024 at 06:34:00PM -0400, Esther Shimanovich wrote: > Hey! > Asking for some help on implementation. > So I implemented most of this, and successfully tested the quirk on 6 > different devices with various types of discrete fixed JHL Thunderbolt > chips. > > However I want to add an additional check. A device wouldn't need this > quirk if it already has Thunderbolt functionality built in within its > Root Port. There is really no "Thunderbolt functionality built in within its Root Port". More accurate is that the Thunderbolt/USB4 controller is integrated into the CPU and the PCIe tunnels go out through the CPU PCIe Root Ports. > I tried to use "is_thunderbolt" as an identifier for that type of > device--- but when I tested on a device with a thunderbolt root port, > "is_thunderbolt" was false for all the Thunderbolt PCI components > listed below. You should not use is_thunderbolt for anything else than with the legacy Apple systems. > ~ # lspci -nn | grep Thunderbolt > 00:07.0 PCI bridge [0604]: Intel Corporation Tiger Lake-LP Thunderbolt > 4 PCI Express Root Port #1 [8086:9a25] (rev 01) > 00:07.2 PCI bridge [0604]: Intel Corporation Tiger Lake-LP Thunderbolt > 4 PCI Express Root Port #2 [8086:9a27] (rev 01) > 00:0d.0 USB controller [0c03]: Intel Corporation Tiger Lake-LP > Thunderbolt 4 USB Controller [8086:9a13] (rev 01) > 00:0d.2 USB controller [0c03]: Intel Corporation Tiger Lake-LP > Thunderbolt 4 NHI #0 [8086:9a1b] (rev 01) > 00:0d.3 USB controller [0c03]: Intel Corporation Tiger Lake-LP > Thunderbolt 4 NHI #1 [8086:9a1d] (rev 01) Okay this is integrated Thunderbolt/USB4 controller. > The device I tested was the Lenovo carbon X1 Gen 9. When > set_pcie_thunderbolt is run, the devices listed above return false on > the pci_find_next_ext_capability check. > > I have spent some time trying to see if there are any ACPI values or > any alternative indicators to reliably know if a root port has > thunderbolt capabilities. I do see that ADBG is often set to TBT but > that looks like a debugging tool in ACPI. > > I also looked through lspci -vvv and compared the output with a device > that has no Thunderbolt built into its CPU, which gave me nothing. For integrated and most recent systems (that's with the software connection manager) the tunneled PCIe ports (Root or Downstream) can be identified by looking at "usb4-host-interface" ACPI _DSD property. In addition to this there is the "External Facing Port" ACPI _DSD property attached to them too. Maybe these help? With the firmware connection manager there is only the "External Facing Port" _DSD though. The Microsoft documentation for this that we also use in Linux is here: https://learn.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports
Thank you for your response! It is very much appreciated. On the Tiger Lake device I was testing on, the usb4-host-interface value is NOT listed in its ACPI. I then decided to query the ACPI values collected from devices in my office, to see if this issue is limited to my device. Ice Lake - 4 devices, none had "usb4-host-interface" Tiger Lake - 31 devices, none have "usb4-host-interface" Alder Lake - 32 devices, I see that 15 of them have "usb4-host-interface" in their ACPI Raptor Lake - 1 device, does not have "usb4-host-interface" It looks like only Alder Lake has usb4-host-interface listed in its ACPI for whatever reason. It seems like I cannot use usb4-host-interface as a determinant whether the CPU has Thunderbolt capabilities (thus not needing a discrete Thunderbolt chip). ExternalFacingPort is listed in devices that don't have CPUs with Thunderbolts, so that can't be a determinant. Am I missing something?
Hi, On Thu, Apr 18, 2024 at 03:43:08PM -0400, Esther Shimanovich wrote: > Thank you for your response! It is very much appreciated. > > On the Tiger Lake device I was testing on, the usb4-host-interface > value is NOT listed in its ACPI. > > I then decided to query the ACPI values collected from devices in my > office, to see if this issue is limited to my device. > Ice Lake - 4 devices, none had "usb4-host-interface" > Tiger Lake - 31 devices, none have "usb4-host-interface" > Alder Lake - 32 devices, I see that 15 of them have > "usb4-host-interface" in their ACPI > Raptor Lake - 1 device, does not have "usb4-host-interface" > > It looks like only Alder Lake has usb4-host-interface listed in its > ACPI for whatever reason. The reason is that it is there only for "software connection manager" systems. That's Chrome TGL and then all ADL and beyond (with integrated Thunderbolt/USB4). > It seems like I cannot use usb4-host-interface as a determinant > whether the CPU has Thunderbolt capabilities (thus not needing a > discrete Thunderbolt chip). > ExternalFacingPort is listed in devices that don't have CPUs with > Thunderbolts, so that can't be a determinant. > > Am I missing something? "ExternalFacingPort" is there for all firmware and software connection manager systems but only if they use IOMMU for DMA security (also /sys/bus/thunderbolt/devices/domainX/iommu_dma_protection == 1). So this is all integrated (Ice Lake+), then Alpine Ridge, Titan Ridge and Maple Ridge based recent systems. That leaves out still older systems with firmware connection manager with the "legacy" security settings. This is pretty much Alpine and Titan Ridge and I think those you can identify using their PCI IDs directly as the list will not be too big. Something like this taken from the drivers/thunderbolt/nhi.h: #define PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_LP_BRIDGE 0x15c0 #define PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_C_4C_BRIDGE 0x15d3 #define PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_C_2C_BRIDGE 0x15da #define PCI_DEVICE_ID_INTEL_TITAN_RIDGE_2C_BRIDGE 0x15e7 #define PCI_DEVICE_ID_INTEL_TITAN_RIDGE_4C_BRIDGE 0x15ea #define PCI_DEVICE_ID_INTEL_TITAN_RIDGE_DD_BRIDGE 0x15ef In addition to this some of the systems used the BIOS "assisted" enumeration which means that the controller is actually only present if you have either USB or Thunderbolt device connected. Othertimes it is not present at all (not sure if you want to label these differently though). In summary if you want to find out if the CPU has an integrated Thunderbolt/USB4 controller (I think this is what you ask above) then the best bet would be to use "ExternalFacingPort" because that is present in all those systems.
Thanks for the explanation! I still don't fully understand how that would work for my use case. Perhaps it would be better for me to describe the case I am trying to protect against. To rehash, this quirk was written for devices with discrete Thunderbolt controllers. For example, CometLake_CPU -> AlpineRidge_Chip -> USB-C Port This device has the ExternalFacingPort property in ACPI. My quirk relabels the Alpine Ridge chip as "fixed" and external-facing, so that devices attached to the USB-C port could be labeled as "removable" Let's say we have a TigerLake CPU, which has integrated Thunderbolt/USB4 capabilities: TigerLake_ThunderboltCPU -> USB-C Port This device also has the ExternalFacingPort property in ACPI and lacks the usb4-host-interface property in the ACPI. My worry is that someone could take an Alpine Ridge Chip Thunderbolt Dock and attach it to the TigerLake CPU TigerLake_ThunderboltCPU -> USB-C Port -> AlpineRidge_Dock If that were to happen, this quirk would incorrectly label the Alpine Ridge Dock as "fixed" instead of "removable". My thinking was that we could prevent this scenario from occurring if we filtered this quirk not to apply on CPU's like Tiger Lake, with integrated Thunderbolt/USB4 capabilities. ExternalFacingPort is found both on the Comet Lake ACPI and also on the Tiger Lake ACPI. So I can't use that to distinguish between CPUs which don't have integrated Thunderbolt, like Comet Lake, and CPUs with integrated Thunderbolt, like Tiger Lake. I am looking for something that can tell me if the device's Root Port has the Thunderbolt controller upstream to it or not. Is there anything like that? Or perhaps should I add a check which compares the name of the device's CPU with a list of CPUs that this quirk can be applied to? Or is there some way I can identify the Thunderbolt controller, then determine if it's upstream or downstream from the root port? Or are Alpine Ridge docks not something to worry about at all?
On 4/22/2024 14:17, Esther Shimanovich wrote: > Thanks for the explanation! I still don't fully understand how that > would work for my use case. > > Perhaps it would be better for me to describe the case I am trying to > protect against. > > To rehash, this quirk was written for devices with discrete > Thunderbolt controllers. > > For example, > CometLake_CPU -> AlpineRidge_Chip -> USB-C Port > This device has the ExternalFacingPort property in ACPI. > My quirk relabels the Alpine Ridge chip as "fixed" and > external-facing, so that devices attached to the USB-C port could be > labeled as "removable" > > Let's say we have a TigerLake CPU, which has integrated > Thunderbolt/USB4 capabilities: > > TigerLake_ThunderboltCPU -> USB-C Port > This device also has the ExternalFacingPort property in ACPI and lacks > the usb4-host-interface property in the ACPI. > > My worry is that someone could take an Alpine Ridge Chip Thunderbolt > Dock and attach it to the TigerLake CPU > > TigerLake_ThunderboltCPU -> USB-C Port -> AlpineRidge_Dock > > If that were to happen, this quirk would incorrectly label the Alpine > Ridge Dock as "fixed" instead of "removable". > > My thinking was that we could prevent this scenario from occurring if > we filtered this quirk not to apply on CPU's like Tiger Lake, with > integrated Thunderbolt/USB4 capabilities. > > ExternalFacingPort is found both on the Comet Lake ACPI and also on > the Tiger Lake ACPI. So I can't use that to distinguish between CPUs > which don't have integrated Thunderbolt, like Comet Lake, and CPUs > with integrated Thunderbolt, like Tiger Lake. > > I am looking for something that can tell me if the device's Root Port > has the Thunderbolt controller upstream to it or not. > Is there anything like that? > Or perhaps should I add a check which compares the name of the > device's CPU with a list of CPUs that this quirk can be applied to? > Or is there some way I can identify the Thunderbolt controller, then > determine if it's upstream or downstream from the root port? > Or are Alpine Ridge docks not something to worry about at all? My thought was once you have a device as untrusted, everything else connected to it should "also" be untrusted.
On Mon, Apr 22, 2024 at 02:21:18PM -0500, Mario Limonciello wrote: > On 4/22/2024 14:17, Esther Shimanovich wrote: > > Thanks for the explanation! I still don't fully understand how that > > would work for my use case. > > > > Perhaps it would be better for me to describe the case I am trying to > > protect against. > > > > To rehash, this quirk was written for devices with discrete > > Thunderbolt controllers. > > > > For example, > > CometLake_CPU -> AlpineRidge_Chip -> USB-C Port > > This device has the ExternalFacingPort property in ACPI. > > My quirk relabels the Alpine Ridge chip as "fixed" and > > external-facing, so that devices attached to the USB-C port could be > > labeled as "removable" > > > > Let's say we have a TigerLake CPU, which has integrated > > Thunderbolt/USB4 capabilities: > > > > TigerLake_ThunderboltCPU -> USB-C Port > > This device also has the ExternalFacingPort property in ACPI and lacks > > the usb4-host-interface property in the ACPI. > > > > My worry is that someone could take an Alpine Ridge Chip Thunderbolt > > Dock and attach it to the TigerLake CPU > > > > TigerLake_ThunderboltCPU -> USB-C Port -> AlpineRidge_Dock > > > > If that were to happen, this quirk would incorrectly label the Alpine > > Ridge Dock as "fixed" instead of "removable". > > > > My thinking was that we could prevent this scenario from occurring if > > we filtered this quirk not to apply on CPU's like Tiger Lake, with > > integrated Thunderbolt/USB4 capabilities. > > > > ExternalFacingPort is found both on the Comet Lake ACPI and also on > > the Tiger Lake ACPI. So I can't use that to distinguish between CPUs > > which don't have integrated Thunderbolt, like Comet Lake, and CPUs > > with integrated Thunderbolt, like Tiger Lake. > > > > I am looking for something that can tell me if the device's Root Port > > has the Thunderbolt controller upstream to it or not. > > Is there anything like that? > > Or perhaps should I add a check which compares the name of the > > device's CPU with a list of CPUs that this quirk can be applied to? > > Or is there some way I can identify the Thunderbolt controller, then > > determine if it's upstream or downstream from the root port? > > Or are Alpine Ridge docks not something to worry about at all? > > My thought was once you have a device as untrusted, everything else > connected to it should "also" be untrusted. I think what you are looking for is that anything behind a PCIe tunnel should not have this applied. IIRC the AMD GPU or some code there were going to add identification of "virtual" links to the bandwidth calculation functionality. @Mario, do you remember if this was done already and if that could maybe be re-used here? The other way I think is something like this: - If it does not have "usb4-host-interface" property (or behind a port that has that). These are all tunneled (e.g virtual). - It is directly connected to a PCIe root port with "ExternalFacingPort" and it has sibling device that is "Thunderbolt NHI". This is because you can only have "NHI" on a host router according to the USB4 spec. I may be forgetting something though.
On Tue, Apr 23, 2024 at 08:33:12AM +0300, Mika Westerberg wrote: > I think what you are looking for is that anything behind a PCIe tunnel > should not have this applied. IIRC the AMD GPU or some code there were > going to add identification of "virtual" links to the bandwidth > calculation functionality. I guess I could resurrect my correlation patch: https://lore.kernel.org/all/f53ea40a7487e145aa1a62c347cef1814072e140.1536517047.git.lukas@wunner.de/ The last time I forward-ported it was for v5.13. I still have that code running on my development machine. The problem is that it only allows lookup from tb_port to pci_dev. I'd have to add a pointer to struct pci_dev to allow lookups in the inverse direction. Though I think we have such PCI companion devices for CXL as well, so such a pointer could be useful in general. I'm knee-deep in PCI device authentication code but could probably dedicate a weekend to the correlation patch if there's interest? Once we have correlation, we can expose more precise bandwidth for virtual PCI links in sysfs. Thanks, Lukas
On Tue, Apr 23, 2024 at 10:31:30AM +0200, Lukas Wunner wrote: > On Tue, Apr 23, 2024 at 08:33:12AM +0300, Mika Westerberg wrote: > > I think what you are looking for is that anything behind a PCIe tunnel > > should not have this applied. IIRC the AMD GPU or some code there were > > going to add identification of "virtual" links to the bandwidth > > calculation functionality. > > I guess I could resurrect my correlation patch: > > https://lore.kernel.org/all/f53ea40a7487e145aa1a62c347cef1814072e140.1536517047.git.lukas@wunner.de/ > > The last time I forward-ported it was for v5.13. I still have that code > running on my development machine. > > The problem is that it only allows lookup from tb_port to pci_dev. > I'd have to add a pointer to struct pci_dev to allow lookups in the > inverse direction. Though I think we have such PCI companion devices > for CXL as well, so such a pointer could be useful in general. > > I'm knee-deep in PCI device authentication code but could probably > dedicate a weekend to the correlation patch if there's interest? > > Once we have correlation, we can expose more precise bandwidth > for virtual PCI links in sysfs. Sounds good to me :) There are also some additions in USB4 spec that allows discovery of mapping between PCIe adapters and the corresponding PCIe downstream/root port. Perhaps these can be added there too?
On 4/23/2024 00:33, Mika Westerberg wrote: > On Mon, Apr 22, 2024 at 02:21:18PM -0500, Mario Limonciello wrote: >> On 4/22/2024 14:17, Esther Shimanovich wrote: >>> Thanks for the explanation! I still don't fully understand how that >>> would work for my use case. >>> >>> Perhaps it would be better for me to describe the case I am trying to >>> protect against. >>> >>> To rehash, this quirk was written for devices with discrete >>> Thunderbolt controllers. >>> >>> For example, >>> CometLake_CPU -> AlpineRidge_Chip -> USB-C Port >>> This device has the ExternalFacingPort property in ACPI. >>> My quirk relabels the Alpine Ridge chip as "fixed" and >>> external-facing, so that devices attached to the USB-C port could be >>> labeled as "removable" >>> >>> Let's say we have a TigerLake CPU, which has integrated >>> Thunderbolt/USB4 capabilities: >>> >>> TigerLake_ThunderboltCPU -> USB-C Port >>> This device also has the ExternalFacingPort property in ACPI and lacks >>> the usb4-host-interface property in the ACPI. >>> >>> My worry is that someone could take an Alpine Ridge Chip Thunderbolt >>> Dock and attach it to the TigerLake CPU >>> >>> TigerLake_ThunderboltCPU -> USB-C Port -> AlpineRidge_Dock >>> >>> If that were to happen, this quirk would incorrectly label the Alpine >>> Ridge Dock as "fixed" instead of "removable". >>> >>> My thinking was that we could prevent this scenario from occurring if >>> we filtered this quirk not to apply on CPU's like Tiger Lake, with >>> integrated Thunderbolt/USB4 capabilities. >>> >>> ExternalFacingPort is found both on the Comet Lake ACPI and also on >>> the Tiger Lake ACPI. So I can't use that to distinguish between CPUs >>> which don't have integrated Thunderbolt, like Comet Lake, and CPUs >>> with integrated Thunderbolt, like Tiger Lake. >>> >>> I am looking for something that can tell me if the device's Root Port >>> has the Thunderbolt controller upstream to it or not. >>> Is there anything like that? >>> Or perhaps should I add a check which compares the name of the >>> device's CPU with a list of CPUs that this quirk can be applied to? >>> Or is there some way I can identify the Thunderbolt controller, then >>> determine if it's upstream or downstream from the root port? >>> Or are Alpine Ridge docks not something to worry about at all? >> >> My thought was once you have a device as untrusted, everything else >> connected to it should "also" be untrusted. > > I think what you are looking for is that anything behind a PCIe tunnel > should not have this applied. IIRC the AMD GPU or some code there were > going to add identification of "virtual" links to the bandwidth > calculation functionality. > > @Mario, do you remember if this was done already and if that could maybe > be re-used here? Yeah there was a series that I worked on a few spins a while back specifically in the context of eGPUs to identify virtual links and take them out of bandwidth calculations. It didn't get merged, I recall it got stalled on various feedback and I didn't dust it off because the series also did prompt discussions about the reasoning that amdgpu was doing this in the first place. It turned out to be a bad assumption in the code and I instead made a change to amdgpu to not look at the whole topology but just the link partner (466a7d115326ece682c2b60d1c77d1d0b9010b4f if anyone is curious). > > The other way I think is something like this: > > - If it does not have "usb4-host-interface" property (or behind a port > that has that). These are all tunneled (e.g virtual). > > - It is directly connected to a PCIe root port with > "ExternalFacingPort" and it has sibling device that is "Thunderbolt > NHI". This is because you can only have "NHI" on a host router > according to the USB4 spec. > > I may be forgetting something though.
On Tue, Apr 23, 2024 at 11:59:36AM -0500, Mario Limonciello wrote: > On 4/23/2024 00:33, Mika Westerberg wrote: > > On Mon, Apr 22, 2024 at 02:21:18PM -0500, Mario Limonciello wrote: > > > On 4/22/2024 14:17, Esther Shimanovich wrote: > > > > Thanks for the explanation! I still don't fully understand how that > > > > would work for my use case. > > > > > > > > Perhaps it would be better for me to describe the case I am trying to > > > > protect against. > > > > > > > > To rehash, this quirk was written for devices with discrete > > > > Thunderbolt controllers. > > > > > > > > For example, > > > > CometLake_CPU -> AlpineRidge_Chip -> USB-C Port > > > > This device has the ExternalFacingPort property in ACPI. > > > > My quirk relabels the Alpine Ridge chip as "fixed" and > > > > external-facing, so that devices attached to the USB-C port could be > > > > labeled as "removable" > > > > > > > > Let's say we have a TigerLake CPU, which has integrated > > > > Thunderbolt/USB4 capabilities: > > > > > > > > TigerLake_ThunderboltCPU -> USB-C Port > > > > This device also has the ExternalFacingPort property in ACPI and lacks > > > > the usb4-host-interface property in the ACPI. > > > > > > > > My worry is that someone could take an Alpine Ridge Chip Thunderbolt > > > > Dock and attach it to the TigerLake CPU > > > > > > > > TigerLake_ThunderboltCPU -> USB-C Port -> AlpineRidge_Dock > > > > > > > > If that were to happen, this quirk would incorrectly label the Alpine > > > > Ridge Dock as "fixed" instead of "removable". > > > > > > > > My thinking was that we could prevent this scenario from occurring if > > > > we filtered this quirk not to apply on CPU's like Tiger Lake, with > > > > integrated Thunderbolt/USB4 capabilities. > > > > > > > > ExternalFacingPort is found both on the Comet Lake ACPI and also on > > > > the Tiger Lake ACPI. So I can't use that to distinguish between CPUs > > > > which don't have integrated Thunderbolt, like Comet Lake, and CPUs > > > > with integrated Thunderbolt, like Tiger Lake. > > > > > > > > I am looking for something that can tell me if the device's Root Port > > > > has the Thunderbolt controller upstream to it or not. > > > > Is there anything like that? > > > > Or perhaps should I add a check which compares the name of the > > > > device's CPU with a list of CPUs that this quirk can be applied to? > > > > Or is there some way I can identify the Thunderbolt controller, then > > > > determine if it's upstream or downstream from the root port? > > > > Or are Alpine Ridge docks not something to worry about at all? > > > > > > My thought was once you have a device as untrusted, everything else > > > connected to it should "also" be untrusted. > > > > I think what you are looking for is that anything behind a PCIe tunnel > > should not have this applied. IIRC the AMD GPU or some code there were > > going to add identification of "virtual" links to the bandwidth > > calculation functionality. > > > > @Mario, do you remember if this was done already and if that could maybe > > be re-used here? > > Yeah there was a series that I worked on a few spins a while back > specifically in the context of eGPUs to identify virtual links and take them > out of bandwidth calculations. > > It didn't get merged, I recall it got stalled on various feedback and I > didn't dust it off because the series also did prompt discussions about the > reasoning that amdgpu was doing this in the first place. It turned out to > be a bad assumption in the code and I instead made a change to amdgpu to not > look at the whole topology but just the link partner > (466a7d115326ece682c2b60d1c77d1d0b9010b4f if anyone is curious). Okay that makes sense. Thanks!
Thank you for all your help! On Tue, Apr 23, 2024 at 1:33 AM Mika Westerberg <mika.westerberg@linux.intel.com> wrote: > > The other way I think is something like this: > > - If it does not have "usb4-host-interface" property (or behind a port > that has that). These are all tunneled (e.g virtual). > > - It is directly connected to a PCIe root port with > "ExternalFacingPort" and it has sibling device that is "Thunderbolt > NHI". This is because you can only have "NHI" on a host router > according to the USB4 spec. > I did find one example of a docking station that uses the DSL6540 chip, which has PCI IDs defined in include/linux/pci_ids.h: #define PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_4C_NHI 0x1577 #define PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_4C_BRIDGE 0x1578 It seems like it has an NHI, despite being in an external, removable docking station. This appears to contradict what you say about only having "NHI" on a host router. I am assuming that by host router, you mean the fixed discrete, fixed thunderbolt chip, or the thunderbolt controller upstream to the root port. Please correct me if I got anything wrong! Looking at 18-241_ThunderboltController_Brief_HI.pdf, it seems like these Alpine Ridge chips can be used either on a computer or a peripheral. (Expected usage: Computer or peripheral) So I'm not sure if finding an NHI would guarantee that the device is not a peripheral. My original question was how to distinguish a Thunderbolt controller that is on a removable peripheral, like a docking station-- from one that is a discrete chip fixed to a computer or upstream to the root port. So unless I am misunderstanding something, it appears that my only option is waiting for Lukas's patches. Please correct me if that is not the case!
Hi, On Thu, Apr 25, 2024 at 05:16:24PM -0400, Esther Shimanovich wrote: > Thank you for all your help! > > On Tue, Apr 23, 2024 at 1:33 AM Mika Westerberg > <mika.westerberg@linux.intel.com> wrote: > > > > The other way I think is something like this: > > > > - If it does not have "usb4-host-interface" property (or behind a port > > that has that). These are all tunneled (e.g virtual). > > > > - It is directly connected to a PCIe root port with > > "ExternalFacingPort" and it has sibling device that is "Thunderbolt > > NHI". This is because you can only have "NHI" on a host router > > according to the USB4 spec. > > > I did find one example of a docking station that uses the DSL6540 > chip, which has PCI IDs defined in include/linux/pci_ids.h: > #define PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_4C_NHI 0x1577 > #define PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_4C_BRIDGE 0x1578 > It seems like it has an NHI, despite being in an external, removable > docking station. This appears to contradict what you say about only > having "NHI" on a host router. I am assuming that by host router, you > mean the fixed discrete, fixed thunderbolt chip, or the thunderbolt > controller upstream to the root port. Please correct me if I got > anything wrong! So it goes same way with other discrete chips from Intel at least. It is the same silicon but the NHI is disabled on device routers. That said, it is entirely possible for a "malicious" device to pretend to have one so we need to be careful. > Looking at 18-241_ThunderboltController_Brief_HI.pdf, it seems like > these Alpine Ridge chips can be used either on a computer or a > peripheral. (Expected usage: Computer or peripheral) Yes as above. Most our chips are such. > So I'm not sure if finding an NHI would guarantee that the device is > not a peripheral. My original question was how to distinguish a > Thunderbolt controller that is on a removable peripheral, like a > docking station-- from one that is a discrete chip fixed to a computer > or upstream to the root port. Yes that's the problem. We can figure that out from full USB4 system by looking at the "usb4-host-interface" ACPI _DSD properties of the tunneled PCIe Root/Downstream ports. But this does not work with the pre-USB4 hosts so that requires some sort of heuristics, unfortunately. > So unless I am misunderstanding something, it appears that my only > option is waiting for Lukas's patches. Please correct me if that is > not the case! I think with Lukas' patches (Lukas please correct me if I got that wrong) you can find out the PCIe ports which have their link going over a tunnel. His series works with pre-USB4 devices so that should cover your case. (In addition to "usb4-host-interface" ACPI _DSD property there is a special router operation that allows extracting the same PCIe downstream port mapping that Lukas' patches are doing from DROM so this should also allow identify all tunneled links.) This way you can identify the xHCI (well and NHI) that are not behind PCIe tunnel so that should mean they are really part of the host system (being soldered or plugged to the PCIe slot or so). If I understood right this is what you were looking for, correct?
> This way you can identify the xHCI (well and NHI) that are not behind > PCIe tunnel so that should mean they are really part of the host system > (being soldered or plugged to the PCIe slot or so). If I understood > right this is what you were looking for, correct? Yes! Thank you for the explanation.
On Fri, Apr 26, 2024 at 07:52:07AM +0300, Mika Westerberg wrote: > On Thu, Apr 25, 2024 at 05:16:24PM -0400, Esther Shimanovich wrote: > > I did find one example of a docking station that uses the DSL6540 > > chip, which has PCI IDs defined in include/linux/pci_ids.h: > > #define PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_4C_NHI 0x1577 > > #define PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_4C_BRIDGE 0x1578 > > It seems like it has an NHI, despite being in an external, removable > > docking station. This appears to contradict what you say about only > > having "NHI" on a host router. I am assuming that by host router, you > > mean the fixed discrete, fixed thunderbolt chip, or the thunderbolt > > controller upstream to the root port. Please correct me if I got > > anything wrong! > > So it goes same way with other discrete chips from Intel at least. It is > the same silicon but the NHI is disabled on device routers. > > That said, it is entirely possible for a "malicious" device to pretend > to have one so we need to be careful. If a device (accidentally or maliciously) exposes an NHI, the thunderbolt driver will try to bind to it. Do we take any precautions to prevent that? AFAICS we'd be allocating a duplicate root_switch with route 0. Seems dangerous if two driver instances talk to the same Root Switch. This looks like something that needs to be checked by Intel Validation Engineering folks. Have they been testing this? (+cc Gil) Thanks, Lukas
On Thu, Apr 25, 2024 at 05:16:24PM -0400, Esther Shimanovich wrote: > So unless I am misunderstanding something, it appears that my only > option is waiting for Lukas's patches. I've just pushed this branch: https://github.com/l1k/linux/commits/thunderbolt_associate_v1 But it's not even compile-tested yet. 0-day is crunching through it now and I'll force-push if/when fixing bugs. It also doesn't contain support yet for the "Get PCIe Downstream Entry" functionality that's apparently needed for USB4. The branch marks PCIe Downstream Adapters on a Root Switch as fixed and external_facing. However that doesn't appear to be sufficient: I notice that in your patch, you're also clearing the external_facing bit on the Root Port above the discrete host controller. Additionally you're marking the bridges leading to the NHI and XHCI as fixed. You're also marking the NHI and XHCI themselves as fixed and external_facing. I just want to confirm whether all of this is actually necessary. At least marking the NHI and XHCI as external_facing seems nonsensical. I need to know the *minimal* set of attribute changes to fix the problem. I also need to know exactly what the user-visible issue is and how it comes about. Your commit message merely says "the build-in USB-C ports stop working at all". Does that mean that no driver is bound to the NHI or XHCI? The solution implemented by the above-linked branch hinges on the NHI driver fixing up the device attributes. But if the NHI driver is not bound, it can't fix up the attributes. I notice that commit 86eaf4a5b431 ("thunderbolt: Make iommu_dma_protection more accurate") bemoans the inability to determine external_facing ports clearly. The above-linked branch may solve that, so it seems there may be overlap between the issue discussed here and the one that the commit sought to solve. Thanks, Lukas
On Sat, Apr 27, 2024 at 07:35:33AM +0200, Lukas Wunner wrote: > On Fri, Apr 26, 2024 at 07:52:07AM +0300, Mika Westerberg wrote: > > On Thu, Apr 25, 2024 at 05:16:24PM -0400, Esther Shimanovich wrote: > > > I did find one example of a docking station that uses the DSL6540 > > > chip, which has PCI IDs defined in include/linux/pci_ids.h: > > > #define PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_4C_NHI 0x1577 > > > #define PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_4C_BRIDGE 0x1578 > > > It seems like it has an NHI, despite being in an external, removable > > > docking station. This appears to contradict what you say about only > > > having "NHI" on a host router. I am assuming that by host router, you > > > mean the fixed discrete, fixed thunderbolt chip, or the thunderbolt > > > controller upstream to the root port. Please correct me if I got > > > anything wrong! > > > > So it goes same way with other discrete chips from Intel at least. It is > > the same silicon but the NHI is disabled on device routers. > > > > That said, it is entirely possible for a "malicious" device to pretend > > to have one so we need to be careful. > > If a device (accidentally or maliciously) exposes an NHI, the thunderbolt > driver will try to bind to it. > > Do we take any precautions to prevent that? Not at the moment but it will be behind the IOMMU so it cannot access any other memory that what is reserved for it. > AFAICS we'd be allocating a duplicate root_switch with route 0. > Seems dangerous if two driver instances talk to the same Root Switch. I don't think it even works because it cannot have topology ID of 0 if it is a device router which it is in this case since it is being first enumerated by the "real" host. That said we have not tested this either so can be 100% sure.
On Thu, Apr 25, 2024 at 05:16:24PM -0400, Esther Shimanovich wrote: > I did find one example of a docking station that uses the DSL6540 > chip, which has PCI IDs defined in include/linux/pci_ids.h: > #define PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_4C_NHI 0x1577 > #define PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_4C_BRIDGE 0x1578 > It seems like it has an NHI, despite being in an external, removable > docking station. Could you provide full output of dmesg and lspci -vvv of a machine with this docking station attached? Perhaps open a bug at bugzilla.kernel.org and attach it there? Could you then try the below patch and see if it prevents the Thunderbolt driver from binding to the hot-plugged device? Thanks! -- >8 -- From a10a294a650232c16447a43c2b591f34d3cb399f Mon Sep 17 00:00:00 2001 From: Lukas Wunner <lukas@wunner.de> Date: Sat, 27 Apr 2024 16:24:18 +0200 Subject: [PATCH] thunderbolt: Do not bind to NHI exposed by a hot-plugged device An NHI should only be exposed by Thunderbolt host controllers, not by hot-plugged devices. Avoid binding to an NHI erroneously or maliciously exposed by a device by looking at the upstream port of its switch: On a host controller, the upstream port is of type TB_TYPE_NHI, whereas on hot-plugged devices it is of type TB_TYPE_PORT. Signed-off-by: Lukas Wunner <lukas@wunner.de> Cc: stable@vger.kernel.org --- drivers/thunderbolt/tb.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/thunderbolt/tb.c b/drivers/thunderbolt/tb.c index e00e62de53f3..d95ff9ed4d96 100644 --- a/drivers/thunderbolt/tb.c +++ b/drivers/thunderbolt/tb.c @@ -2786,6 +2786,13 @@ static int tb_start(struct tb *tb, bool reset) if (IS_ERR(tb->root_switch)) return PTR_ERR(tb->root_switch); + /* NHI erroneously exposed by a hot-plugged device? */ + if (!tb_port_is_nhi(tb_upstream_port(tb->root_switch))) { + tb_err(tb, "not a host controller\n"); + tb_switch_put(tb->root_switch); + return -ENODEV; + } + /* * ICM firmware upgrade needs running firmware and in native * mode that is not available so disable firmware upgrade of the
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index ea476252280a..34e43323ff14 100644 --- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c @@ -3873,6 +3873,118 @@ DECLARE_PCI_FIXUP_SUSPEND_LATE(PCI_VENDOR_ID_INTEL, quirk_apple_poweroff_thunderbolt); #endif +/* + * On most ThinkPad Carbon 7/8s, JHL6540 Thunderbolt 3 bridges are set + * incorrectly as DEVICE_REMOVABLE despite being built into the device. + * This is the side effect of a unique hardware configuration. + * + * Normally, Thunderbolt functionality is integrated to the SoC and + * its root ports. + * + * Most devices: + * root port --> USB-C port + * + * But X1 Carbon Gen 7/8 uses Whiskey Lake and Comet Lake SoC, which + * don't come with Thunderbolt functionality. Therefore, a discrete + * Thunderbolt Host PCI controller was added between the root port and + * the USB-C port. + * + * Thinkpad Carbon 7/8s + * (w/ Whiskey Lake and Comet Lake SoC): + * root port --> JHL6540 --> USB-C port + * + * Because the root port is labeled by FW as "ExternalFacingPort", as + * required by the DMAR ACPI spec, the JHL6540 chip is inaccurately + * labeled as DEVICE_REMOVABLE by the kernel pci driver. + * Therefore, the built-in USB-C ports do not enumerate when policies + * forbidding external pci devices are enforced. + * + * This fix relabels the pci components in the built-in JHL6540 chip as + * DEVICE_FIXED, ensuring that the built-in USB-C ports always enumerate + * properly as intended. + * + * This fix also labels the external facing components of the JHL6540 as + * external_facing, so that the pci attach policy works as intended. + * + * The ASCII diagram below describes the pci layout of the JHL6540 chip. + * + * Root Port + * [8086:02b4] or [8086:9db4] + * | + * JHL6540 Chip + * __________________________________________________ + * | Bridge | + * | PCI ID -> [8086:15d3] | + * | DEVFN -> (00) | + * | _________________|__________________ | + * | | | | | | + * | Bridge Bridge Bridge Bridge | + * | [8086:15d3] [8086:15d3] [8086:15d3] [8086:15d3] | + * | (00) (08) (10) (20) | + * | | | | | | + * | NHI | USB Controller | | + * | [8086:15d2] | [8086:15d4] | | + * | (00) | (00) | | + * | |___________| |___________| | + * |____________|________________________|____________| + * | | + * USB-C Port USB-C Port + * + * + * Based on what a JHL6549 pci component's pci id, subsystem device id + * and devfn are, we can infer if it is fixed and if it faces a usb port; + * which would mean it is external facing. + * This quirk uses these values to identify the pci components and set the + * properties accordingly. + */ +static void carbon_X1_fixup_relabel_alpine_ridge(struct pci_dev *dev) +{ + /* Is this JHL6540 PCI component embedded in a Lenovo device? */ + if (dev->subsystem_vendor != PCI_VENDOR_ID_LENOVO) + return; + + /* Is this JHL6540 PCI component embedded in an X1 Carbon Gen 7/8? */ + if (dev->subsystem_device != 0x22be && // Gen 8 + dev->subsystem_device != 0x2292) { // Gen 7 + return; + } + + dev_set_removable(&dev->dev, DEVICE_FIXED); + + /* Not all 0x15d3 components are external facing */ + if (dev->device == 0x15d3 && + dev->devfn != 0x08 && + dev->devfn != 0x20) { + return; + } + + dev->external_facing = true; +} + +/* + * We also need to relabel the root port as a consequence of changing + * the JHL6540's PCIE hierarchy. + */ +static void carbon_X1_fixup_rootport_not_removable(struct pci_dev *dev) +{ + /* Is this JHL6540 PCI component embedded in a Lenovo device? */ + if (dev->subsystem_vendor != PCI_VENDOR_ID_LENOVO) + return; + + /* Is this JHL6540 PCI component embedded in an X1 Carbon Gen 7/8? */ + if (dev->subsystem_device != 0x22be && // Gen 8 + dev->subsystem_device != 0x2292) { // Gen 7 + return; + } + + dev->external_facing = false; +} +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x15d3, carbon_X1_fixup_relabel_alpine_ridge); +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x15d2, carbon_X1_fixup_relabel_alpine_ridge); +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x15d4, carbon_X1_fixup_relabel_alpine_ridge); +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x02b4, carbon_X1_fixup_rootport_not_removable); +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x9db4, carbon_X1_fixup_rootport_not_removable); + /* * Following are device-specific reset methods which can be used to * reset a single function if other methods (e.g. FLR, PM D0->D3) are
On Lenovo X1 Carbon Gen 7/8 devices, when a platform enables a policy to distrust removable PCI devices, the build-in USB-C ports stop working at all. This happens because these X1 Carbon models have a unique feature; a Thunderbolt controller that is discrete from the SoC. The software sees this controller, and incorrectly assumes it is a removable PCI device, even though it is fixed to the computer and is wired to the computer's own USB-C ports. Relabel all the components of the JHL6540 controller as DEVICE_FIXED, and where applicable, external_facing. Ensure that the security policy to distrust external PCI devices works as intended, and that the device's USB-C ports are able to enumerate even when the policy is enabled. Signed-off-by: Esther Shimanovich <eshimanovich@chromium.org> --- Changes in v4: - replaced a dmi check in the rootport quirk with a subsystem vendor and device check. - Link to v3: https://lore.kernel.org/r/20231220-thunderbolt-pci-patch-4-v3-1-056fd1717d06@chromium.org Changes in v3: - removed redundant dmi check, as the subsystem vendor check is sufficient - switched to PCI_VENDOR_ID_LENOVO instead of hex code - Link to v2: https://lore.kernel.org/r/20231219-thunderbolt-pci-patch-4-v2-1-ec2d7af45a9b@chromium.org Changes in v2: - nothing new, v1 was just a test run to see if the ASCII diagram would be rendered properly in mutt and k-9 - for folks using gmail, make sure to select "show original" on the top right, as otherwise the diagram will be garbled by the standard non-monospace font - Link to v1: https://lore.kernel.org/r/20231219-thunderbolt-pci-patch-4-v1-1-4e8e3773f0a9@chromium.org --- drivers/pci/quirks.c | 112 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 112 insertions(+) --- base-commit: b85ea95d086471afb4ad062012a4d73cd328fa86 change-id: 20231219-thunderbolt-pci-patch-4-ede71cb833c4 Best regards,