diff mbox series

[v4] PCI: Relabel JHL6540 on Lenovo X1 Carbon 7,8

Message ID 20231221-thunderbolt-pci-patch-4-v4-1-2e136e57c9bc@chromium.org
State New
Headers show
Series [v4] PCI: Relabel JHL6540 on Lenovo X1 Carbon 7,8 | expand

Commit Message

Esther Shimanovich Dec. 21, 2023, 8:53 p.m. UTC
On Lenovo X1 Carbon Gen 7/8 devices, when a platform enables a policy to
distrust removable PCI devices, the build-in USB-C ports stop working at
all.
This happens because these X1 Carbon models have a unique feature; a
Thunderbolt controller that is discrete from the SoC. The software sees
this controller, and incorrectly assumes it is a removable PCI device,
even though it is fixed to the computer and is wired to the computer's
own USB-C ports.

Relabel all the components of the JHL6540 controller as DEVICE_FIXED,
and where applicable, external_facing.

Ensure that the security policy to distrust external PCI devices works
as intended, and that the device's USB-C ports are able to enumerate
even when the policy is enabled.

Signed-off-by: Esther Shimanovich <eshimanovich@chromium.org>
---
Changes in v4:
- replaced a dmi check in the rootport quirk with a subsystem vendor and
  device check.
- Link to v3: https://lore.kernel.org/r/20231220-thunderbolt-pci-patch-4-v3-1-056fd1717d06@chromium.org

Changes in v3:
- removed redundant dmi check, as the subsystem vendor check is
  sufficient
- switched to PCI_VENDOR_ID_LENOVO instead of hex code
- Link to v2: https://lore.kernel.org/r/20231219-thunderbolt-pci-patch-4-v2-1-ec2d7af45a9b@chromium.org

Changes in v2:
- nothing new, v1 was just a test run to see if the ASCII diagram would
  be rendered properly in mutt and k-9
- for folks using gmail, make sure to select "show original" on the top
  right, as otherwise the diagram will be garbled by the standard
  non-monospace font
- Link to v1: https://lore.kernel.org/r/20231219-thunderbolt-pci-patch-4-v1-1-4e8e3773f0a9@chromium.org
---
 drivers/pci/quirks.c | 112 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 112 insertions(+)


---
base-commit: b85ea95d086471afb4ad062012a4d73cd328fa86
change-id: 20231219-thunderbolt-pci-patch-4-ede71cb833c4

Best regards,

Comments

Dmitry Torokhov Dec. 21, 2023, 11:15 p.m. UTC | #1
On Thu, Dec 21, 2023 at 03:53:42PM -0500, Esther Shimanovich wrote:
> +static void carbon_X1_fixup_relabel_alpine_ridge(struct pci_dev *dev)
> +{
> +	/* Is this JHL6540 PCI component embedded in a Lenovo device? */
> +	if (dev->subsystem_vendor != PCI_VENDOR_ID_LENOVO)
> +		return;
> +
> +	/* Is this JHL6540 PCI component embedded in an X1 Carbon Gen 7/8? */
> +	if (dev->subsystem_device != 0x22be && // Gen 8
> +	    dev->subsystem_device != 0x2292) { // Gen 7
> +		return;
> +	}
> +
> +	dev_set_removable(&dev->dev, DEVICE_FIXED);
> +
> +	/* Not all 0x15d3 components are external facing */
> +	if (dev->device == 0x15d3 &&
> +	    dev->devfn != 0x08 &&
> +	    dev->devfn != 0x20) {
> +		return;
> +	}
> +
> +	dev->external_facing = true;
> +}
> +
> +/*
> + * We also need to relabel the root port as a consequence of changing
> + * the JHL6540's PCIE hierarchy.
> + */
> +static void carbon_X1_fixup_rootport_not_removable(struct pci_dev *dev)
> +{
> +	/* Is this JHL6540 PCI component embedded in a Lenovo device? */
> +	if (dev->subsystem_vendor != PCI_VENDOR_ID_LENOVO)
> +		return;
> +
> +	/* Is this JHL6540 PCI component embedded in an X1 Carbon Gen 7/8? */
> +	if (dev->subsystem_device != 0x22be && // Gen 8
> +	    dev->subsystem_device != 0x2292) { // Gen 7
> +		return;
> +	}

This ventures into the realm of nitpicking, but this can be factored out
into a helper.

I'll shut up now and let PCI folks handle this.

Thanks.
Bjorn Helgaas Dec. 27, 2023, 12:15 a.m. UTC | #2
On Thu, Dec 21, 2023 at 03:53:42PM -0500, Esther Shimanovich wrote:
> On Lenovo X1 Carbon Gen 7/8 devices, when a platform enables a policy to
> distrust removable PCI devices, the build-in USB-C ports stop working at
> all.
> This happens because these X1 Carbon models have a unique feature; a
> Thunderbolt controller that is discrete from the SoC. The software sees
> this controller, and incorrectly assumes it is a removable PCI device,
> even though it is fixed to the computer and is wired to the computer's
> own USB-C ports.
> 
> Relabel all the components of the JHL6540 controller as DEVICE_FIXED,
> and where applicable, external_facing.
> 
> Ensure that the security policy to distrust external PCI devices works
> as intended, and that the device's USB-C ports are able to enumerate
> even when the policy is enabled.

Thanks for all your work here.

This is going to be a maintenance problem.  We typically use quirks to
work around hardware defects, e.g., a device that advertises a feature
that doesn't work per spec.

But I don't see where the defect is here.  And I doubt that this is
really a unique situation.  So it's likely that this will happen on
other systems, and we don't want to have to add quirks every time
another one shows up.

If this is a firmware defect, e.g., if this platform is using
"ExternalFacingPort" incorrectly, we can add a quirk to work around
that, too.  But I'm not sure that's the case.

> Signed-off-by: Esther Shimanovich <eshimanovich@chromium.org>
> ---
> Changes in v4:
> - replaced a dmi check in the rootport quirk with a subsystem vendor and
>   device check.
> - Link to v3: https://lore.kernel.org/r/20231220-thunderbolt-pci-patch-4-v3-1-056fd1717d06@chromium.org
> 
> Changes in v3:
> - removed redundant dmi check, as the subsystem vendor check is
>   sufficient
> - switched to PCI_VENDOR_ID_LENOVO instead of hex code
> - Link to v2: https://lore.kernel.org/r/20231219-thunderbolt-pci-patch-4-v2-1-ec2d7af45a9b@chromium.org
> 
> Changes in v2:
> - nothing new, v1 was just a test run to see if the ASCII diagram would
>   be rendered properly in mutt and k-9
> - for folks using gmail, make sure to select "show original" on the top
>   right, as otherwise the diagram will be garbled by the standard
>   non-monospace font
> - Link to v1: https://lore.kernel.org/r/20231219-thunderbolt-pci-patch-4-v1-1-4e8e3773f0a9@chromium.org
> ---
>  drivers/pci/quirks.c | 112 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 112 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index ea476252280a..34e43323ff14 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -3873,6 +3873,118 @@ DECLARE_PCI_FIXUP_SUSPEND_LATE(PCI_VENDOR_ID_INTEL,
>  			       quirk_apple_poweroff_thunderbolt);
>  #endif
>  
> +/*
> + * On most ThinkPad Carbon 7/8s, JHL6540 Thunderbolt 3 bridges are set
> + * incorrectly as DEVICE_REMOVABLE despite being built into the device.
> + * This is the side effect of a unique hardware configuration.
> + *
> + * Normally, Thunderbolt functionality is integrated to the SoC and
> + * its root ports.
> + *
> + *                          Most devices:
> + *                    root port --> USB-C port
> + *
> + * But X1 Carbon Gen 7/8 uses Whiskey Lake and Comet Lake SoC, which
> + * don't come with Thunderbolt functionality. Therefore, a discrete
> + * Thunderbolt Host PCI controller was added between the root port and
> + * the USB-C port.
> + *
> + *                        Thinkpad Carbon 7/8s
> + *                 (w/ Whiskey Lake and Comet Lake SoC):
> + *                root port -->  JHL6540   --> USB-C port
> + *
> + * Because the root port is labeled by FW as "ExternalFacingPort", as
> + * required by the DMAR ACPI spec, the JHL6540 chip is inaccurately

Can you include a citation (spec name, revision, section) for this
DMAR requirement?

> + * labeled as DEVICE_REMOVABLE by the kernel pci driver.
> + * Therefore, the built-in USB-C ports do not enumerate when policies
> + * forbidding external pci devices are enforced.
> + *
> + * This fix relabels the pci components in the built-in JHL6540 chip as
> + * DEVICE_FIXED, ensuring that the built-in USB-C ports always enumerate
> + * properly as intended.
> + *
> + * This fix also labels the external facing components of the JHL6540 as
> + * external_facing, so that the pci attach policy works as intended.
> + *
> + * The ASCII diagram below describes the pci layout of the JHL6540 chip.
> + *
> + *                         Root Port
> + *                 [8086:02b4] or [8086:9db4]
> + *                             |
> + *                        JHL6540 Chip
> + *     __________________________________________________
> + *    |                      Bridge                      |
> + *    |        PCI ID ->  [8086:15d3]                    |
> + *    |         DEVFN ->      (00)                       |
> + *    |       _________________|__________________       |
> + *    |      |           |            |           |      |
> + *    |    Bridge     Bridge        Bridge      Bridge   |
> + *    | [8086:15d3] [8086:15d3]  [8086:15d3] [8086:15d3] |
> + *    |    (00)        (08)         (10)        (20)     |
> + *    |      |           |            |           |      |
> + *    |     NHI          |     USB Controller     |      |
> + *    | [8086:15d2]      |       [8086:15d4]      |      |
> + *    |    (00)          |          (00)          |      |
> + *    |      |___________|            |___________|      |
> + *    |____________|________________________|____________|
> + *                 |                        |
> + *             USB-C Port               USB-C Port
> + *
> + *
> + * Based on what a JHL6549 pci component's pci id, subsystem device id
> + * and devfn are, we can infer if it is fixed and if it faces a usb port;
> + * which would mean it is external facing.
> + * This quirk uses these values to identify the pci components and set the
> + * properties accordingly.

Random nits: Capitalize spec terms like "PCI", "USB", "Root Port",
etc., consistently in English text.  Rewrap to fill 78 columns or so.
Add blank lines between paragraphs.  This applies to the commit log
(which should be wrapped to 75 to allow for "git log" indent) as well
as comments.

But honestly, I hope we can figure out a solution that doesn't require
a big comment like this.  Checking for random device IDs to deduce the
topology is not the way PCI is supposed to work.  PCI is designed to
be enumerable, so software can learn what it needs to know by
interrogating the hardware directly.

For cases where that's impossible, ACPI is supposed to fill the gaps,
e.g., with "ExternalFacingPort".  If this patch covers over a gap that
firmware doesn't handle yet, maybe we need to add some new firmware
interface.  If that's the case, we can add quirks for platforms that
don't have the new interface.  But we at least need a plan that
doesn't require quirks indefinitely.

> + */
> +static void carbon_X1_fixup_relabel_alpine_ridge(struct pci_dev *dev)
> +{
> +	/* Is this JHL6540 PCI component embedded in a Lenovo device? */
> +	if (dev->subsystem_vendor != PCI_VENDOR_ID_LENOVO)
> +		return;
> +
> +	/* Is this JHL6540 PCI component embedded in an X1 Carbon Gen 7/8? */
> +	if (dev->subsystem_device != 0x22be && // Gen 8
> +	    dev->subsystem_device != 0x2292) { // Gen 7
> +		return;
> +	}
> +
> +	dev_set_removable(&dev->dev, DEVICE_FIXED);
> +
> +	/* Not all 0x15d3 components are external facing */
> +	if (dev->device == 0x15d3 &&
> +	    dev->devfn != 0x08 &&
> +	    dev->devfn != 0x20) {
> +		return;
> +	}
> +
> +	dev->external_facing = true;
> +}
> +
> +/*
> + * We also need to relabel the root port as a consequence of changing
> + * the JHL6540's PCIE hierarchy.
> + */
> +static void carbon_X1_fixup_rootport_not_removable(struct pci_dev *dev)
> +{
> +	/* Is this JHL6540 PCI component embedded in a Lenovo device? */
> +	if (dev->subsystem_vendor != PCI_VENDOR_ID_LENOVO)
> +		return;
> +
> +	/* Is this JHL6540 PCI component embedded in an X1 Carbon Gen 7/8? */
> +	if (dev->subsystem_device != 0x22be && // Gen 8
> +	    dev->subsystem_device != 0x2292) { // Gen 7
> +		return;
> +	}
> +
> +	dev->external_facing = false;
> +}
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x15d3, carbon_X1_fixup_relabel_alpine_ridge);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x15d2, carbon_X1_fixup_relabel_alpine_ridge);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x15d4, carbon_X1_fixup_relabel_alpine_ridge);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x02b4, carbon_X1_fixup_rootport_not_removable);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x9db4, carbon_X1_fixup_rootport_not_removable);
> +
>  /*
>   * Following are device-specific reset methods which can be used to
>   * reset a single function if other methods (e.g. FLR, PM D0->D3) are
> 
> ---
> base-commit: b85ea95d086471afb4ad062012a4d73cd328fa86
> change-id: 20231219-thunderbolt-pci-patch-4-ede71cb833c4
> 
> Best regards,
> -- 
> Esther Shimanovich <eshimanovich@chromium.org>
>
Lukas Wunner Dec. 28, 2023, 1:25 p.m. UTC | #3
[adding Mika and Mario to the list of recipients, original patch is here:
https://lore.kernel.org/all/20231221-thunderbolt-pci-patch-4-v4-1-2e136e57c9bc@chromium.org/
a lot of folks are on vacation, replies might not be sent before Jan 8;
full quote without further comments below]

On Thu, Dec 21, 2023 at 03:53:42PM -0500, Esther Shimanovich wrote:
> On Lenovo X1 Carbon Gen 7/8 devices, when a platform enables a policy to
> distrust removable PCI devices, the build-in USB-C ports stop working at
> all.
> This happens because these X1 Carbon models have a unique feature; a
> Thunderbolt controller that is discrete from the SoC. The software sees
> this controller, and incorrectly assumes it is a removable PCI device,
> even though it is fixed to the computer and is wired to the computer's
> own USB-C ports.
> 
> Relabel all the components of the JHL6540 controller as DEVICE_FIXED,
> and where applicable, external_facing.
> 
> Ensure that the security policy to distrust external PCI devices works
> as intended, and that the device's USB-C ports are able to enumerate
> even when the policy is enabled.
> 
> Signed-off-by: Esther Shimanovich <eshimanovich@chromium.org>
> ---
> Changes in v4:
> - replaced a dmi check in the rootport quirk with a subsystem vendor and
>   device check.
> - Link to v3: https://lore.kernel.org/r/20231220-thunderbolt-pci-patch-4-v3-1-056fd1717d06@chromium.org
> 
> Changes in v3:
> - removed redundant dmi check, as the subsystem vendor check is
>   sufficient
> - switched to PCI_VENDOR_ID_LENOVO instead of hex code
> - Link to v2: https://lore.kernel.org/r/20231219-thunderbolt-pci-patch-4-v2-1-ec2d7af45a9b@chromium.org
> 
> Changes in v2:
> - nothing new, v1 was just a test run to see if the ASCII diagram would
>   be rendered properly in mutt and k-9
> - for folks using gmail, make sure to select "show original" on the top
>   right, as otherwise the diagram will be garbled by the standard
>   non-monospace font
> - Link to v1: https://lore.kernel.org/r/20231219-thunderbolt-pci-patch-4-v1-1-4e8e3773f0a9@chromium.org
> ---
>  drivers/pci/quirks.c | 112 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 112 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index ea476252280a..34e43323ff14 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -3873,6 +3873,118 @@ DECLARE_PCI_FIXUP_SUSPEND_LATE(PCI_VENDOR_ID_INTEL,
>  			       quirk_apple_poweroff_thunderbolt);
>  #endif
>  
> +/*
> + * On most ThinkPad Carbon 7/8s, JHL6540 Thunderbolt 3 bridges are set
> + * incorrectly as DEVICE_REMOVABLE despite being built into the device.
> + * This is the side effect of a unique hardware configuration.
> + *
> + * Normally, Thunderbolt functionality is integrated to the SoC and
> + * its root ports.
> + *
> + *                          Most devices:
> + *                    root port --> USB-C port
> + *
> + * But X1 Carbon Gen 7/8 uses Whiskey Lake and Comet Lake SoC, which
> + * don't come with Thunderbolt functionality. Therefore, a discrete
> + * Thunderbolt Host PCI controller was added between the root port and
> + * the USB-C port.
> + *
> + *                        Thinkpad Carbon 7/8s
> + *                 (w/ Whiskey Lake and Comet Lake SoC):
> + *                root port -->  JHL6540   --> USB-C port
> + *
> + * Because the root port is labeled by FW as "ExternalFacingPort", as
> + * required by the DMAR ACPI spec, the JHL6540 chip is inaccurately
> + * labeled as DEVICE_REMOVABLE by the kernel pci driver.
> + * Therefore, the built-in USB-C ports do not enumerate when policies
> + * forbidding external pci devices are enforced.
> + *
> + * This fix relabels the pci components in the built-in JHL6540 chip as
> + * DEVICE_FIXED, ensuring that the built-in USB-C ports always enumerate
> + * properly as intended.
> + *
> + * This fix also labels the external facing components of the JHL6540 as
> + * external_facing, so that the pci attach policy works as intended.
> + *
> + * The ASCII diagram below describes the pci layout of the JHL6540 chip.
> + *
> + *                         Root Port
> + *                 [8086:02b4] or [8086:9db4]
> + *                             |
> + *                        JHL6540 Chip
> + *     __________________________________________________
> + *    |                      Bridge                      |
> + *    |        PCI ID ->  [8086:15d3]                    |
> + *    |         DEVFN ->      (00)                       |
> + *    |       _________________|__________________       |
> + *    |      |           |            |           |      |
> + *    |    Bridge     Bridge        Bridge      Bridge   |
> + *    | [8086:15d3] [8086:15d3]  [8086:15d3] [8086:15d3] |
> + *    |    (00)        (08)         (10)        (20)     |
> + *    |      |           |            |           |      |
> + *    |     NHI          |     USB Controller     |      |
> + *    | [8086:15d2]      |       [8086:15d4]      |      |
> + *    |    (00)          |          (00)          |      |
> + *    |      |___________|            |___________|      |
> + *    |____________|________________________|____________|
> + *                 |                        |
> + *             USB-C Port               USB-C Port
> + *
> + *
> + * Based on what a JHL6549 pci component's pci id, subsystem device id
> + * and devfn are, we can infer if it is fixed and if it faces a usb port;
> + * which would mean it is external facing.
> + * This quirk uses these values to identify the pci components and set the
> + * properties accordingly.
> + */
> +static void carbon_X1_fixup_relabel_alpine_ridge(struct pci_dev *dev)
> +{
> +	/* Is this JHL6540 PCI component embedded in a Lenovo device? */
> +	if (dev->subsystem_vendor != PCI_VENDOR_ID_LENOVO)
> +		return;
> +
> +	/* Is this JHL6540 PCI component embedded in an X1 Carbon Gen 7/8? */
> +	if (dev->subsystem_device != 0x22be && // Gen 8
> +	    dev->subsystem_device != 0x2292) { // Gen 7
> +		return;
> +	}
> +
> +	dev_set_removable(&dev->dev, DEVICE_FIXED);
> +
> +	/* Not all 0x15d3 components are external facing */
> +	if (dev->device == 0x15d3 &&
> +	    dev->devfn != 0x08 &&
> +	    dev->devfn != 0x20) {
> +		return;
> +	}
> +
> +	dev->external_facing = true;
> +}
> +
> +/*
> + * We also need to relabel the root port as a consequence of changing
> + * the JHL6540's PCIE hierarchy.
> + */
> +static void carbon_X1_fixup_rootport_not_removable(struct pci_dev *dev)
> +{
> +	/* Is this JHL6540 PCI component embedded in a Lenovo device? */
> +	if (dev->subsystem_vendor != PCI_VENDOR_ID_LENOVO)
> +		return;
> +
> +	/* Is this JHL6540 PCI component embedded in an X1 Carbon Gen 7/8? */
> +	if (dev->subsystem_device != 0x22be && // Gen 8
> +	    dev->subsystem_device != 0x2292) { // Gen 7
> +		return;
> +	}
> +
> +	dev->external_facing = false;
> +}
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x15d3, carbon_X1_fixup_relabel_alpine_ridge);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x15d2, carbon_X1_fixup_relabel_alpine_ridge);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x15d4, carbon_X1_fixup_relabel_alpine_ridge);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x02b4, carbon_X1_fixup_rootport_not_removable);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x9db4, carbon_X1_fixup_rootport_not_removable);
> +
>  /*
>   * Following are device-specific reset methods which can be used to
>   * reset a single function if other methods (e.g. FLR, PM D0->D3) are
> 
> ---
> base-commit: b85ea95d086471afb4ad062012a4d73cd328fa86
> change-id: 20231219-thunderbolt-pci-patch-4-ede71cb833c4
> 
> Best regards,
> -- 
> Esther Shimanovich <eshimanovich@chromium.org>
Mika Westerberg Dec. 28, 2023, 1:39 p.m. UTC | #4
Thanks Lukas for including me.

On Thu, Dec 28, 2023 at 02:25:17PM +0100, Lukas Wunner wrote:
> [adding Mika and Mario to the list of recipients, original patch is here:
> https://lore.kernel.org/all/20231221-thunderbolt-pci-patch-4-v4-1-2e136e57c9bc@chromium.org/
> a lot of folks are on vacation, replies might not be sent before Jan 8;
> full quote without further comments below]
> 
> On Thu, Dec 21, 2023 at 03:53:42PM -0500, Esther Shimanovich wrote:
> > On Lenovo X1 Carbon Gen 7/8 devices, when a platform enables a policy to
> > distrust removable PCI devices, the build-in USB-C ports stop working at
> > all.
> > This happens because these X1 Carbon models have a unique feature; a
> > Thunderbolt controller that is discrete from the SoC. The software sees
> > this controller, and incorrectly assumes it is a removable PCI device,
> > even though it is fixed to the computer and is wired to the computer's
> > own USB-C ports.

Yes, the ExternalFacingPort applies to root ports so that includes
everything below that.

> > Relabel all the components of the JHL6540 controller as DEVICE_FIXED,
> > and where applicable, external_facing.
> > 
> > Ensure that the security policy to distrust external PCI devices works
> > as intended, and that the device's USB-C ports are able to enumerate
> > even when the policy is enabled.

This has been working just fine so far and as far as I can tell there is
no such "policy" in place in the mainline kernel.

> > 
> > Signed-off-by: Esther Shimanovich <eshimanovich@chromium.org>
> > ---
> > Changes in v4:
> > - replaced a dmi check in the rootport quirk with a subsystem vendor and
> >   device check.
> > - Link to v3: https://lore.kernel.org/r/20231220-thunderbolt-pci-patch-4-v3-1-056fd1717d06@chromium.org
> > 
> > Changes in v3:
> > - removed redundant dmi check, as the subsystem vendor check is
> >   sufficient
> > - switched to PCI_VENDOR_ID_LENOVO instead of hex code
> > - Link to v2: https://lore.kernel.org/r/20231219-thunderbolt-pci-patch-4-v2-1-ec2d7af45a9b@chromium.org
> > 
> > Changes in v2:
> > - nothing new, v1 was just a test run to see if the ASCII diagram would
> >   be rendered properly in mutt and k-9
> > - for folks using gmail, make sure to select "show original" on the top
> >   right, as otherwise the diagram will be garbled by the standard
> >   non-monospace font
> > - Link to v1: https://lore.kernel.org/r/20231219-thunderbolt-pci-patch-4-v1-1-4e8e3773f0a9@chromium.org
> > ---
> >  drivers/pci/quirks.c | 112 +++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 112 insertions(+)
> > 
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index ea476252280a..34e43323ff14 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -3873,6 +3873,118 @@ DECLARE_PCI_FIXUP_SUSPEND_LATE(PCI_VENDOR_ID_INTEL,
> >  			       quirk_apple_poweroff_thunderbolt);
> >  #endif
> >  
> > +/*
> > + * On most ThinkPad Carbon 7/8s, JHL6540 Thunderbolt 3 bridges are set
> > + * incorrectly as DEVICE_REMOVABLE despite being built into the device.
> > + * This is the side effect of a unique hardware configuration.

There is really nothing "unique" here. It's exactly as specified by
this:

https://learn.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports#identifying-externally-exposed-pcie-root-ports

and being used in many many system already out there and those have been
working just fine.

> > + *
> > + * Normally, Thunderbolt functionality is integrated to the SoC and
> > + * its root ports.
> > + *
> > + *                          Most devices:
> > + *                    root port --> USB-C port
> > + *
> > + * But X1 Carbon Gen 7/8 uses Whiskey Lake and Comet Lake SoC, which
> > + * don't come with Thunderbolt functionality. Therefore, a discrete
> > + * Thunderbolt Host PCI controller was added between the root port and
> > + * the USB-C port.
> > + *
> > + *                        Thinkpad Carbon 7/8s
> > + *                 (w/ Whiskey Lake and Comet Lake SoC):
> > + *                root port -->  JHL6540   --> USB-C port
> > + *
> > + * Because the root port is labeled by FW as "ExternalFacingPort", as
> > + * required by the DMAR ACPI spec, the JHL6540 chip is inaccurately
> > + * labeled as DEVICE_REMOVABLE by the kernel pci driver.
> > + * Therefore, the built-in USB-C ports do not enumerate when policies
> > + * forbidding external pci devices are enforced.
> > + *
> > + * This fix relabels the pci components in the built-in JHL6540 chip as
> > + * DEVICE_FIXED, ensuring that the built-in USB-C ports always enumerate
> > + * properly as intended.
> > + *
> > + * This fix also labels the external facing components of the JHL6540 as
> > + * external_facing, so that the pci attach policy works as intended.
> > + *
> > + * The ASCII diagram below describes the pci layout of the JHL6540 chip.
> > + *
> > + *                         Root Port
> > + *                 [8086:02b4] or [8086:9db4]
> > + *                             |
> > + *                        JHL6540 Chip
> > + *     __________________________________________________
> > + *    |                      Bridge                      |
> > + *    |        PCI ID ->  [8086:15d3]                    |
> > + *    |         DEVFN ->      (00)                       |
> > + *    |       _________________|__________________       |
> > + *    |      |           |            |           |      |
> > + *    |    Bridge     Bridge        Bridge      Bridge   |
> > + *    | [8086:15d3] [8086:15d3]  [8086:15d3] [8086:15d3] |
> > + *    |    (00)        (08)         (10)        (20)     |
> > + *    |      |           |            |           |      |
> > + *    |     NHI          |     USB Controller     |      |
> > + *    | [8086:15d2]      |       [8086:15d4]      |      |
> > + *    |    (00)          |          (00)          |      |
> > + *    |      |___________|            |___________|      |
> > + *    |____________|________________________|____________|
> > + *                 |                        |
> > + *             USB-C Port               USB-C Port
> > + *
> > + *
> > + * Based on what a JHL6549 pci component's pci id, subsystem device id
> > + * and devfn are, we can infer if it is fixed and if it faces a usb port;
> > + * which would mean it is external facing.
> > + * This quirk uses these values to identify the pci components and set the
> > + * properties accordingly.
> > + */
> > +static void carbon_X1_fixup_relabel_alpine_ridge(struct pci_dev *dev)
> > +{
> > +	/* Is this JHL6540 PCI component embedded in a Lenovo device? */
> > +	if (dev->subsystem_vendor != PCI_VENDOR_ID_LENOVO)
> > +		return;
> > +
> > +	/* Is this JHL6540 PCI component embedded in an X1 Carbon Gen 7/8? */
> > +	if (dev->subsystem_device != 0x22be && // Gen 8
> > +	    dev->subsystem_device != 0x2292) { // Gen 7
> > +		return;
> > +	}
> > +
> > +	dev_set_removable(&dev->dev, DEVICE_FIXED);
> > +
> > +	/* Not all 0x15d3 components are external facing */
> > +	if (dev->device == 0x15d3 &&
> > +	    dev->devfn != 0x08 &&
> > +	    dev->devfn != 0x20) {
> > +		return;
> > +	}
> > +
> > +	dev->external_facing = true;
> > +}
> > +
> > +/*
> > + * We also need to relabel the root port as a consequence of changing
> > + * the JHL6540's PCIE hierarchy.
> > + */
> > +static void carbon_X1_fixup_rootport_not_removable(struct pci_dev *dev)
> > +{
> > +	/* Is this JHL6540 PCI component embedded in a Lenovo device? */
> > +	if (dev->subsystem_vendor != PCI_VENDOR_ID_LENOVO)
> > +		return;
> > +
> > +	/* Is this JHL6540 PCI component embedded in an X1 Carbon Gen 7/8? */
> > +	if (dev->subsystem_device != 0x22be && // Gen 8
> > +	    dev->subsystem_device != 0x2292) { // Gen 7
> > +		return;
> > +	}
> > +
> > +	dev->external_facing = false;
> > +}
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x15d3, carbon_X1_fixup_relabel_alpine_ridge);
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x15d2, carbon_X1_fixup_relabel_alpine_ridge);
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x15d4, carbon_X1_fixup_relabel_alpine_ridge);
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x02b4, carbon_X1_fixup_rootport_not_removable);
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x9db4, carbon_X1_fixup_rootport_not_removable);

This is not scalable at all. You would need to include lots of systems
here. And there should be no issue at all anyways.

Can you elaborate what the issue is and which mainline kernel you are
using to reproduce this?
Esther Shimanovich Jan. 17, 2024, 9:21 p.m. UTC | #5
Thank you for all your comments! I really appreciate all your help
with this. I will address the style feedback once we reach a decision
on how we will fix this bug.
I first will respond to your comments, and then I will list out the
possible solutions to this bug, in a way that takes into account all
of your insights.

On Tue, Dec 26, 2023 at 7:15 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> Can you include a citation (spec name, revision, section) for this
> DMAR requirement?
>
This was my mistake–I misinterpreted what a firmware developer told
me. This is a firmware ACPI requirement from windows, which is not in
the DMAR spec. Windows uses it to identify externally exposed PCIE
root ports.
https://learn.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports#identifying-externally-exposed-pcie-root-ports

> But I don't see where the defect is here.  And I doubt that this is
> really a unique situation.  So it's likely that this will happen on
> other systems, and we don't want to have to add quirks every time
> another one shows up.
...
> don't have the new interface.  But we at least need a plan that
> doesn't require quirks indefinitely.
...
On Thu, Dec 28, 2023 at 8:41 AM Mika Westerberg
<mika.westerberg@linux.intel.com> wrote:
> This is not scalable at all. You would need to include lots of systems
> here. And there should be no issue at all anyways.
My team tests hundreds of different devices, and this is the only one
which exhibited this issue that we’ve seen so far.
No other devices we’ve seen so far have a discrete internal
Thunderbolt controller which is treated as a removable device.
Therefore, we don’t expect that a large number of devices will need
this quirk.

> There is really nothing "unique" here. It's exactly as specified by
> this:
>
> https://learn.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports#identifying-externally-exposed-pcie-root-ports
>
> and being used in many many system already out there and those have been
> working just fine.
I don’t know how many computers have a discrete Thunderbolt chip that
is separate from their CPU, but this doesn’t seem to be a common
occurrence.
These devices were made during a narrow window of time when CPUs
didn’t have Thunderbolt features yet, so a separate JHL6540 chip had
to be added so that Lenovo could include Thunderbolt on X1 Carbon Gen
7/8.

As you said, these devices do indeed work fine in cases where you
don’t care if a PCI Thunderbolt device is internal or external, which
is most cases.
Problems happen only whenever someone adds a security policy, or some
other feature that cares about the distinction between a fixed or
removable PCI device.

> This has been working just fine so far and as far as I can tell there is
> no such "policy" in place in the mainline kernel.
Correct, there is no such policy in the mainline kernel as of now. The
bug is that the linux kernel’s “removable” property is inaccurate for
this device.

> Can you elaborate what the issue is and which mainline kernel you are
> using to reproduce this?
Thanks for this question! On a Lenovo Thinkpad Gen 7/Gen 8 computer
with the linux kernel installed, when you look at the properties of
the JHL6540 Thunderbolt controller, you see that it is incorrectly
labeled as removable. I have replicated this bug on the b85ea95d0864
Linux 6.7-rc1 kernel.

Before my patch, you see that the JHL6540 controller is inaccurately
labeled “removable”:
$ udevadm info -a -p /sys/bus/pci/devices/0000:05:00.0 | grep -e
{removable} -e {device} -e {vendor} -e looking
  looking at device '/devices/pci0000:00/0000:00:1d.4/0000:05:00.0':
    ATTR{device}=="0x15d3"
    ATTR{removable}=="removable"
    ATTR{vendor}=="0x8086"
  looking at parent device '/devices/pci0000:00/0000:00:1d.4':
    ATTRS{device}=="0x02b4"
    ATTRS{vendor}=="0x8086"
  looking at parent device '/devices/pci0000:00':

After applying the patch in this ticket, we see the JHL6540 controller
is now labeled as “fixed”:
$ udevadm info -a -p /sys/bus/pci/devices/0000:05:00.0 | grep -e
{removable} -e {device} -e {vendor} -e looking
  looking at device '/devices/pci0000:00/0000:00:1d.4/0000:05:00.0':
    ATTR{device}=="0x15d3"
    ATTR{removable}=="fixed"
    ATTR{vendor}=="0x8086"
  looking at parent device '/devices/pci0000:00/0000:00:1d.4':
    ATTRS{device}=="0x02b4"
    ATTRS{vendor}=="0x8086"
  looking at parent device '/devices/pci0000:00':

OK so here is the part where I share what I’ve developed as a result
of your comments:

The two options I see to resolve this are as follows:
1) Either we fix this by adding a new firmware interface as Bjorn
Helgaas brought up.
2) Alternatively we may address this through a cleaned-up version of this patch

If the solution is to add a firmware interface, how would I go about
that process? Could you put me in touch with someone with that
know-how?
Would we have a temporary software quirk in place while the firmware
spec is being updated?
I am deferring to your expertise and knowledge in solving this bug.
Thank you for all your help.
Mika Westerberg Jan. 18, 2024, 6 a.m. UTC | #6
On Wed, Jan 17, 2024 at 04:21:18PM -0500, Esther Shimanovich wrote:
> Thank you for all your comments! I really appreciate all your help
> with this. I will address the style feedback once we reach a decision
> on how we will fix this bug.
> I first will respond to your comments, and then I will list out the
> possible solutions to this bug, in a way that takes into account all
> of your insights.
> 
> On Tue, Dec 26, 2023 at 7:15 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > Can you include a citation (spec name, revision, section) for this
> > DMAR requirement?
> >
> This was my mistake–I misinterpreted what a firmware developer told
> me. This is a firmware ACPI requirement from windows, which is not in
> the DMAR spec. Windows uses it to identify externally exposed PCIE
> root ports.
> https://learn.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports#identifying-externally-exposed-pcie-root-ports
> 
> > But I don't see where the defect is here.  And I doubt that this is
> > really a unique situation.  So it's likely that this will happen on
> > other systems, and we don't want to have to add quirks every time
> > another one shows up.
> ...
> > don't have the new interface.  But we at least need a plan that
> > doesn't require quirks indefinitely.
> ...
> On Thu, Dec 28, 2023 at 8:41 AM Mika Westerberg
> <mika.westerberg@linux.intel.com> wrote:
> > This is not scalable at all. You would need to include lots of systems
> > here. And there should be no issue at all anyways.
> My team tests hundreds of different devices, and this is the only one
> which exhibited this issue that we’ve seen so far.
> No other devices we’ve seen so far have a discrete internal
> Thunderbolt controller which is treated as a removable device.
> Therefore, we don’t expect that a large number of devices will need
> this quirk.

Well that's pretty much all Intel Titan Ridge and Maple Ridge based
systems. Some early ones did not use IOMMU but all the rest do.

> > There is really nothing "unique" here. It's exactly as specified by
> > this:
> >
> > https://learn.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports#identifying-externally-exposed-pcie-root-ports
> >
> > and being used in many many system already out there and those have been
> > working just fine.
> I don’t know how many computers have a discrete Thunderbolt chip that
> is separate from their CPU, but this doesn’t seem to be a common
> occurrence.
> These devices were made during a narrow window of time when CPUs
> didn’t have Thunderbolt features yet, so a separate JHL6540 chip had
> to be added so that Lenovo could include Thunderbolt on X1 Carbon Gen
> 7/8.

Before Intel Ice Lake it was all discrete and it is still discrete with
the Barlow Ridge controller who will have exact same ExternalFacing port
as the previous models.

> As you said, these devices do indeed work fine in cases where you
> don’t care if a PCI Thunderbolt device is internal or external, which
> is most cases.
> Problems happen only whenever someone adds a security policy, or some
> other feature that cares about the distinction between a fixed or
> removable PCI device.

Do we have such security policy in the mainline kernel?

> > This has been working just fine so far and as far as I can tell there is
> > no such "policy" in place in the mainline kernel.
> Correct, there is no such policy in the mainline kernel as of now. The
> bug is that the linux kernel’s “removable” property is inaccurate for
> this device.

Or perhaps the "policy" should know this better? IIRC there were some
"exceptions" in the Chrome kernel that allowed to put these devices into
"allowlist" is this not the case anymore?

> > Can you elaborate what the issue is and which mainline kernel you are
> > using to reproduce this?
> Thanks for this question! On a Lenovo Thinkpad Gen 7/Gen 8 computer
> with the linux kernel installed, when you look at the properties of
> the JHL6540 Thunderbolt controller, you see that it is incorrectly
> labeled as removable. I have replicated this bug on the b85ea95d0864
> Linux 6.7-rc1 kernel.
> 
> Before my patch, you see that the JHL6540 controller is inaccurately
> labeled “removable”:
> $ udevadm info -a -p /sys/bus/pci/devices/0000:05:00.0 | grep -e
> {removable} -e {device} -e {vendor} -e looking
>   looking at device '/devices/pci0000:00/0000:00:1d.4/0000:05:00.0':
>     ATTR{device}=="0x15d3"
>     ATTR{removable}=="removable"
>     ATTR{vendor}=="0x8086"

This is actually accurate. The Thunderbolt controller is itself
hot-removable and that BTW happens to be hot-removed when fwupd applies
firmware upgrades to the device.
Mario Limonciello Jan. 18, 2024, 3:47 p.m. UTC | #7
On 1/18/2024 00:00, Mika Westerberg wrote:
>> Before my patch, you see that the JHL6540 controller is inaccurately
>> labeled “removable”:
>> $ udevadm info -a -p /sys/bus/pci/devices/0000:05:00.0 | grep -e
>> {removable} -e {device} -e {vendor} -e looking
>>    looking at device '/devices/pci0000:00/0000:00:1d.4/0000:05:00.0':
>>      ATTR{device}=="0x15d3"
>>      ATTR{removable}=="removable"
>>      ATTR{vendor}=="0x8086"
> 
> This is actually accurate. The Thunderbolt controller is itself
> hot-removable and that BTW happens to be hot-removed when fwupd applies
> firmware upgrades to the device.

Depending on the consumers of this removable attribute I wonder if we 
need to a new ATTR of "external" instead of overloading "removable".
Dmitry Torokhov Jan. 18, 2024, 4:12 p.m. UTC | #8
On Thu, Jan 18, 2024 at 09:47:07AM -0600, Mario Limonciello wrote:
> On 1/18/2024 00:00, Mika Westerberg wrote:
> > > Before my patch, you see that the JHL6540 controller is inaccurately
> > > labeled “removable”:
> > > $ udevadm info -a -p /sys/bus/pci/devices/0000:05:00.0 | grep -e
> > > {removable} -e {device} -e {vendor} -e looking
> > >    looking at device '/devices/pci0000:00/0000:00:1d.4/0000:05:00.0':
> > >      ATTR{device}=="0x15d3"
> > >      ATTR{removable}=="removable"
> > >      ATTR{vendor}=="0x8086"
> > 
> > This is actually accurate. The Thunderbolt controller is itself
> > hot-removable and that BTW happens to be hot-removed when fwupd applies
> > firmware upgrades to the device.

This is quite interesting take. Does fwupd rip the controller out of the
box to update it? By that account your touchpad is also removable as it
may stop functioning when its firmware gets updated.

> 
> Depending on the consumers of this removable attribute I wonder if we need
> to a new ATTR of "external" instead of overloading "removable".

Isn't this the same thing? From
Documentation/ABI/testing/sysfs-devices-removable:

What:		/sys/devices/.../removable
Date:		May 2021
Contact:	Rajat Jain <rajatxjain@gmail.com>
Description:
		Information about whether a given device can be removed from the
		platform by the	user. This is determined by its subsystem in a
		bus / platform-specific way. This attribute is only present for
		devices that can support determining such information:

		===========  ===================================================
		"removable"  device can be removed from the platform by the user
		"fixed"      device is fixed to the platform / cannot be removed
			     by the user.

Note this "by the user". Maybe we should add word "physically" here to
qualify the meaning completely, but that is what it is. Not that it
disappears from the bus or stops operating for some time because of
firmware updates, but it can be physically detached from the
platform/system.

Thanks.
Dmitry Torokhov Jan. 18, 2024, 4:21 p.m. UTC | #9
On Thu, Jan 18, 2024 at 08:12:56AM -0800, Dmitry Torokhov wrote:
> On Thu, Jan 18, 2024 at 09:47:07AM -0600, Mario Limonciello wrote:
> > On 1/18/2024 00:00, Mika Westerberg wrote:
> > > > Before my patch, you see that the JHL6540 controller is inaccurately
> > > > labeled “removable”:
> > > > $ udevadm info -a -p /sys/bus/pci/devices/0000:05:00.0 | grep -e
> > > > {removable} -e {device} -e {vendor} -e looking
> > > >    looking at device '/devices/pci0000:00/0000:00:1d.4/0000:05:00.0':
> > > >      ATTR{device}=="0x15d3"
> > > >      ATTR{removable}=="removable"
> > > >      ATTR{vendor}=="0x8086"
> > > 
> > > This is actually accurate. The Thunderbolt controller is itself
> > > hot-removable and that BTW happens to be hot-removed when fwupd applies
> > > firmware upgrades to the device.
> 
> This is quite interesting take. Does fwupd rip the controller out of the
> box to update it? By that account your touchpad is also removable as it
> may stop functioning when its firmware gets updated.
> 
> > 
> > Depending on the consumers of this removable attribute I wonder if we need
> > to a new ATTR of "external" instead of overloading "removable".
> 
> Isn't this the same thing? From
> Documentation/ABI/testing/sysfs-devices-removable:
> 
> What:		/sys/devices/.../removable
> Date:		May 2021
> Contact:	Rajat Jain <rajatxjain@gmail.com>
> Description:
> 		Information about whether a given device can be removed from the
> 		platform by the	user. This is determined by its subsystem in a
> 		bus / platform-specific way. This attribute is only present for
> 		devices that can support determining such information:
> 
> 		===========  ===================================================
> 		"removable"  device can be removed from the platform by the user
> 		"fixed"      device is fixed to the platform / cannot be removed
> 			     by the user.
> 
> Note this "by the user". Maybe we should add word "physically" here to
> qualify the meaning completely, but that is what it is. Not that it
> disappears from the bus or stops operating for some time because of
> firmware updates, but it can be physically detached from the
> platform/system.

FTR this is where Rajat and Bjorn have agreed what the attribute means
when it was moved from USB-specific to general device property:

https://lore.kernel.org/all/20210511230228.GA2429744@bjorn-Precision-5520/

Thanks.
Mika Westerberg Jan. 19, 2024, 5:37 a.m. UTC | #10
On Thu, Jan 18, 2024 at 08:12:56AM -0800, Dmitry Torokhov wrote:
> On Thu, Jan 18, 2024 at 09:47:07AM -0600, Mario Limonciello wrote:
> > On 1/18/2024 00:00, Mika Westerberg wrote:
> > > > Before my patch, you see that the JHL6540 controller is inaccurately
> > > > labeled “removable”:
> > > > $ udevadm info -a -p /sys/bus/pci/devices/0000:05:00.0 | grep -e
> > > > {removable} -e {device} -e {vendor} -e looking
> > > >    looking at device '/devices/pci0000:00/0000:00:1d.4/0000:05:00.0':
> > > >      ATTR{device}=="0x15d3"
> > > >      ATTR{removable}=="removable"
> > > >      ATTR{vendor}=="0x8086"
> > > 
> > > This is actually accurate. The Thunderbolt controller is itself
> > > hot-removable and that BTW happens to be hot-removed when fwupd applies
> > > firmware upgrades to the device.
> 
> This is quite interesting take. Does fwupd rip the controller out of the
> box to update it? By that account your touchpad is also removable as it
> may stop functioning when its firmware gets updated.

The Thunderbolt controller is connected to a hotpluggable PCIe root port
so it will be dissappear from the userspace so that "removable" in that
sense is accurate.

> > Depending on the consumers of this removable attribute I wonder if we need
> > to a new ATTR of "external" instead of overloading "removable".
> 
> Isn't this the same thing? From
> Documentation/ABI/testing/sysfs-devices-removable:
> 
> What:		/sys/devices/.../removable
> Date:		May 2021
> Contact:	Rajat Jain <rajatxjain@gmail.com>
> Description:
> 		Information about whether a given device can be removed from the
> 		platform by the	user. This is determined by its subsystem in a
> 		bus / platform-specific way. This attribute is only present for
> 		devices that can support determining such information:
> 
> 		===========  ===================================================
> 		"removable"  device can be removed from the platform by the user
> 		"fixed"      device is fixed to the platform / cannot be removed
> 			     by the user.
> 
> Note this "by the user". Maybe we should add word "physically" here to
> qualify the meaning completely, but that is what it is. Not that it
> disappears from the bus or stops operating for some time because of
> firmware updates, but it can be physically detached from the
> platform/system.

That would be good to fully qualify what this means. But I get you it is
intented to identify devices that can be physically unplugged from the
system.
Mika Westerberg Jan. 19, 2024, 7:48 a.m. UTC | #11
On Fri, Jan 19, 2024 at 07:37:56AM +0200, Mika Westerberg wrote:
> On Thu, Jan 18, 2024 at 08:12:56AM -0800, Dmitry Torokhov wrote:
> > On Thu, Jan 18, 2024 at 09:47:07AM -0600, Mario Limonciello wrote:
> > > On 1/18/2024 00:00, Mika Westerberg wrote:
> > > > > Before my patch, you see that the JHL6540 controller is inaccurately
> > > > > labeled “removable”:
> > > > > $ udevadm info -a -p /sys/bus/pci/devices/0000:05:00.0 | grep -e
> > > > > {removable} -e {device} -e {vendor} -e looking
> > > > >    looking at device '/devices/pci0000:00/0000:00:1d.4/0000:05:00.0':
> > > > >      ATTR{device}=="0x15d3"
> > > > >      ATTR{removable}=="removable"
> > > > >      ATTR{vendor}=="0x8086"
> > > > 
> > > > This is actually accurate. The Thunderbolt controller is itself
> > > > hot-removable and that BTW happens to be hot-removed when fwupd applies
> > > > firmware upgrades to the device.
> > 
> > This is quite interesting take. Does fwupd rip the controller out of the
> > box to update it? By that account your touchpad is also removable as it
> > may stop functioning when its firmware gets updated.
> 
> The Thunderbolt controller is connected to a hotpluggable PCIe root port
> so it will be dissappear from the userspace so that "removable" in that
> sense is accurate.

There are systems as well where the Thunderbolt (and/or xHCI) controller
only appears if there is anything plugged to the physical Type-C ports
and it gets removed pretty soon after the physical device gets
unplugged. These are also the same Alpine Ridge and Titan Ridge
controllers that this patch is dealing with.

I tried to think about some sort of more generic heuristic how to figure
out that the controller is actually inside the physical system but there
is a problem that the same controller can appear on the bus as well, eg.
you plug in Thunderbolt dock and that one has xHCI controller too. That
device should definitely be "removable". With the "software CM" systems
we have a couple of additional hints in the ACPI tables that can be used
to identify the "tunneled" ports but this does not apply to the older
systems I'm afraid.

Now if I understand the reason behind this patch is actually not about
"removability" that much than about identifying a trusted vs. untrusted
device and attaching a driver to those. I was under impression that
there is already a solution to this in ChromeOS kernel. It has an
allowlist of drivers that are allowed to attach these devices and that
includes the PCIe port drivers, xhci_hcd and the thunderbolt driver,
possibly something else too. Is this not working for your case?
Mika Westerberg Jan. 19, 2024, 10:22 a.m. UTC | #12
On Fri, Jan 19, 2024 at 09:48:29AM +0200, Mika Westerberg wrote:
> On Fri, Jan 19, 2024 at 07:37:56AM +0200, Mika Westerberg wrote:
> > On Thu, Jan 18, 2024 at 08:12:56AM -0800, Dmitry Torokhov wrote:
> > > On Thu, Jan 18, 2024 at 09:47:07AM -0600, Mario Limonciello wrote:
> > > > On 1/18/2024 00:00, Mika Westerberg wrote:
> > > > > > Before my patch, you see that the JHL6540 controller is inaccurately
> > > > > > labeled “removable”:
> > > > > > $ udevadm info -a -p /sys/bus/pci/devices/0000:05:00.0 | grep -e
> > > > > > {removable} -e {device} -e {vendor} -e looking
> > > > > >    looking at device '/devices/pci0000:00/0000:00:1d.4/0000:05:00.0':
> > > > > >      ATTR{device}=="0x15d3"
> > > > > >      ATTR{removable}=="removable"
> > > > > >      ATTR{vendor}=="0x8086"
> > > > > 
> > > > > This is actually accurate. The Thunderbolt controller is itself
> > > > > hot-removable and that BTW happens to be hot-removed when fwupd applies
> > > > > firmware upgrades to the device.
> > > 
> > > This is quite interesting take. Does fwupd rip the controller out of the
> > > box to update it? By that account your touchpad is also removable as it
> > > may stop functioning when its firmware gets updated.
> > 
> > The Thunderbolt controller is connected to a hotpluggable PCIe root port
> > so it will be dissappear from the userspace so that "removable" in that
> > sense is accurate.
> 
> There are systems as well where the Thunderbolt (and/or xHCI) controller
> only appears if there is anything plugged to the physical Type-C ports
> and it gets removed pretty soon after the physical device gets
> unplugged. These are also the same Alpine Ridge and Titan Ridge
> controllers that this patch is dealing with.
> 
> I tried to think about some sort of more generic heuristic how to figure
> out that the controller is actually inside the physical system but there
> is a problem that the same controller can appear on the bus as well, eg.
> you plug in Thunderbolt dock and that one has xHCI controller too. That
> device should definitely be "removable". With the "software CM" systems
> we have a couple of additional hints in the ACPI tables that can be used
> to identify the "tunneled" ports but this does not apply to the older
> systems I'm afraid.

The below "might" work:

1. A device that is directly behind a PCIe root or downstream port that
   has ->external_facing == 1.

2. It is a PCIe endpoint.

3. It is a sibling to or has any of the below PCI IDs (see
   drivers/thunderbolt/nhi.h for the definitions):

   PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_C_4C_NHI
   PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_C_2C_NHI
   PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_LP_USBONLY_NHI
   PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_USBONLY_NHI
   PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_C_USBONLY_NHI
   PCI_DEVICE_ID_INTEL_TITAN_RIDGE_2C_NHI
   PCI_DEVICE_ID_INTEL_TITAN_RIDGE_4C_NHI

   And for all USB4 we can use the PCI class:

   PCI_CLASS_SERIAL_USB_USB4

(4. With USB4 systems we could also add the check that the device is not
below the tunneled PCIe ports but that's kind of redundant).

I think this covers the existing devices as well as future discrete USB4
controllers and marks both the NHI and the xHCI as "fixed" which we
could also use to lift the bounce buffering restriction from them.

@Mario, did I miss anything?
Esther Shimanovich Jan. 19, 2024, 4:03 p.m. UTC | #13
On Thu, Jan 18, 2024 at 1:01 AM Mika Westerberg
<mika.westerberg@linux.intel.com> wrote:
>
> Well that's pretty much all Intel Titan Ridge and Maple Ridge based
> systems. Some early ones did not use IOMMU but all the rest do.
...
> Before Intel Ice Lake it was all discrete and it is still discrete with
> the Barlow Ridge controller who will have exact same ExternalFacing port
> as the previous models.

Next week I'll try those devices in our inventory to see if I can find
another one with this bug. I'll get back to you on that!

On Fri, Jan 19, 2024 at 2:58 AM Mika Westerberg
<mika.westerberg@linux.intel.com> wrote:
>
> Now if I understand the reason behind this patch is actually not about
> "removability" that much than about identifying a trusted vs. untrusted
> device and attaching a driver to those. I was under impression that
> there is already a solution to this in ChromeOS kernel. It has an
> allowlist of drivers that are allowed to attach these devices and that
> includes the PCIe port drivers, xhci_hcd and the thunderbolt driver,
> possibly something else too. Is this not working for your case?

This device shouldn’t be treated as a removable thunderbolt device
that is enabled by policy because it is an internal device that should
be trusted in the first place.
Even so, while learning about this problem I tried modifying the
ChromeOS policy but it ended up not fixing the issue because it seems
like there is an expectation for it to see an existing “fixed”
thunderbolt port before it loads anything else. But the fixed
thunderbolt port is prevented from enumerating during bootup, before
the policy has a chance to work.

On Fri, Jan 19, 2024 at 5:23 AM Mika Westerberg
<mika.westerberg@linux.intel.com> wrote:
>
> The below "might" work:
>
> 1. A device that is directly behind a PCIe root or downstream port that
>    has ->external_facing == 1.
>
> 2. It is a PCIe endpoint.
>
> 3. It is a sibling to or has any of the below PCI IDs (see
>    drivers/thunderbolt/nhi.h for the definitions):

External pci devices seem to have the same kinds of chips in them. So
this wouldn’t distinguish the “fixed but discrete” embedded pci
devices from the “removable” pcie through usb devices. My monitor with
thunderbolt capabilities has the JHL7540 chip in it. From the kernel's
perspective, I have only found that the subsystem id is what
distinguishes these devices.

That is, unless I am missing something in your proposal that would
distinguish a fixed JHL6540 chip from an external JHL6540 chip. Please
correct me on any assumptions I get wrong!
Mika Westerberg Jan. 22, 2024, 6:10 a.m. UTC | #14
On Fri, Jan 19, 2024 at 11:03:18AM -0500, Esther Shimanovich wrote:
> On Thu, Jan 18, 2024 at 1:01 AM Mika Westerberg
> <mika.westerberg@linux.intel.com> wrote:
> >
> > Well that's pretty much all Intel Titan Ridge and Maple Ridge based
> > systems. Some early ones did not use IOMMU but all the rest do.
> ...
> > Before Intel Ice Lake it was all discrete and it is still discrete with
> > the Barlow Ridge controller who will have exact same ExternalFacing port
> > as the previous models.
> 
> Next week I'll try those devices in our inventory to see if I can find
> another one with this bug. I'll get back to you on that!
> 
> On Fri, Jan 19, 2024 at 2:58 AM Mika Westerberg
> <mika.westerberg@linux.intel.com> wrote:
> >
> > Now if I understand the reason behind this patch is actually not about
> > "removability" that much than about identifying a trusted vs. untrusted
> > device and attaching a driver to those. I was under impression that
> > there is already a solution to this in ChromeOS kernel. It has an
> > allowlist of drivers that are allowed to attach these devices and that
> > includes the PCIe port drivers, xhci_hcd and the thunderbolt driver,
> > possibly something else too. Is this not working for your case?
> 
> This device shouldn’t be treated as a removable thunderbolt device
> that is enabled by policy because it is an internal device that should
> be trusted in the first place.
> Even so, while learning about this problem I tried modifying the
> ChromeOS policy but it ended up not fixing the issue because it seems
> like there is an expectation for it to see an existing “fixed”
> thunderbolt port before it loads anything else. But the fixed
> thunderbolt port is prevented from enumerating during bootup, before
> the policy has a chance to work.

Have you contacted the Chrome kernel folks about this? Perhaps they
have thought about this already?

> On Fri, Jan 19, 2024 at 5:23 AM Mika Westerberg
> <mika.westerberg@linux.intel.com> wrote:
> >
> > The below "might" work:
> >
> > 1. A device that is directly behind a PCIe root or downstream port that
> >    has ->external_facing == 1.
> >
> > 2. It is a PCIe endpoint.
> >
> > 3. It is a sibling to or has any of the below PCI IDs (see
> >    drivers/thunderbolt/nhi.h for the definitions):
> 
> External pci devices seem to have the same kinds of chips in them. So
> this wouldn’t distinguish the “fixed but discrete” embedded pci
> devices from the “removable” pcie through usb devices. My monitor with
> thunderbolt capabilities has the JHL7540 chip in it. From the kernel's
> perspective, I have only found that the subsystem id is what
> distinguishes these devices.
> 
> That is, unless I am missing something in your proposal that would
> distinguish a fixed JHL6540 chip from an external JHL6540 chip. Please
> correct me on any assumptions I get wrong!

Yes, you are missing the 1. that it needs to be directly behind the PCIe
root or downstream port that is marked as ->external_facing, and the
fact that there can't be NHI's (that's the host controller with the IDs
I listed in 3.) anywhere else except starting the topology according the
USB4 spec (and the same applies to Thunderbolt 1-3).
Mario Limonciello Jan. 22, 2024, 11:50 p.m. UTC | #15
On 1/19/2024 04:22, Mika Westerberg wrote:
> On Fri, Jan 19, 2024 at 09:48:29AM +0200, Mika Westerberg wrote:
>> On Fri, Jan 19, 2024 at 07:37:56AM +0200, Mika Westerberg wrote:
>>> On Thu, Jan 18, 2024 at 08:12:56AM -0800, Dmitry Torokhov wrote:
>>>> On Thu, Jan 18, 2024 at 09:47:07AM -0600, Mario Limonciello wrote:
>>>>> On 1/18/2024 00:00, Mika Westerberg wrote:
>>>>>>> Before my patch, you see that the JHL6540 controller is inaccurately
>>>>>>> labeled “removable”:
>>>>>>> $ udevadm info -a -p /sys/bus/pci/devices/0000:05:00.0 | grep -e
>>>>>>> {removable} -e {device} -e {vendor} -e looking
>>>>>>>     looking at device '/devices/pci0000:00/0000:00:1d.4/0000:05:00.0':
>>>>>>>       ATTR{device}=="0x15d3"
>>>>>>>       ATTR{removable}=="removable"
>>>>>>>       ATTR{vendor}=="0x8086"
>>>>>>
>>>>>> This is actually accurate. The Thunderbolt controller is itself
>>>>>> hot-removable and that BTW happens to be hot-removed when fwupd applies
>>>>>> firmware upgrades to the device.
>>>>
>>>> This is quite interesting take. Does fwupd rip the controller out of the
>>>> box to update it? By that account your touchpad is also removable as it
>>>> may stop functioning when its firmware gets updated.
>>>
>>> The Thunderbolt controller is connected to a hotpluggable PCIe root port
>>> so it will be dissappear from the userspace so that "removable" in that
>>> sense is accurate.
>>
>> There are systems as well where the Thunderbolt (and/or xHCI) controller
>> only appears if there is anything plugged to the physical Type-C ports
>> and it gets removed pretty soon after the physical device gets
>> unplugged. These are also the same Alpine Ridge and Titan Ridge
>> controllers that this patch is dealing with.
>>
>> I tried to think about some sort of more generic heuristic how to figure
>> out that the controller is actually inside the physical system but there
>> is a problem that the same controller can appear on the bus as well, eg.
>> you plug in Thunderbolt dock and that one has xHCI controller too. That
>> device should definitely be "removable". With the "software CM" systems
>> we have a couple of additional hints in the ACPI tables that can be used
>> to identify the "tunneled" ports but this does not apply to the older
>> systems I'm afraid.
> 
> The below "might" work:
> 
> 1. A device that is directly behind a PCIe root or downstream port that
>     has ->external_facing == 1.
> 
> 2. It is a PCIe endpoint.
> 
> 3. It is a sibling to or has any of the below PCI IDs (see
>     drivers/thunderbolt/nhi.h for the definitions):
> 
>     PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_C_4C_NHI
>     PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_C_2C_NHI
>     PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_LP_USBONLY_NHI
>     PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_USBONLY_NHI
>     PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_C_USBONLY_NHI
>     PCI_DEVICE_ID_INTEL_TITAN_RIDGE_2C_NHI
>     PCI_DEVICE_ID_INTEL_TITAN_RIDGE_4C_NHI
> 
>     And for all USB4 we can use the PCI class:
> 
>     PCI_CLASS_SERIAL_USB_USB4
> 
> (4. With USB4 systems we could also add the check that the device is not
> below the tunneled PCIe ports but that's kind of redundant).
> 
> I think this covers the existing devices as well as future discrete USB4
> controllers and marks both the NHI and the xHCI as "fixed" which we
> could also use to lift the bounce buffering restriction from them.
> 
> @Mario, did I miss anything?

The bounce buffering restriction is only for unaligned DMA isn't it? 
Does that tend to happen a lot?

But otherwise I think this does a good job.  It will cover external 
enclosure cases too because of having to check it's directly behind a 
root port.

But it should also include comments about why it's not needed on newer 
systems (IE the ACPI hints for someone with no prior knowledge looking 
at this to find).
Mika Westerberg Jan. 23, 2024, 6:18 a.m. UTC | #16
On Mon, Jan 22, 2024 at 05:50:24PM -0600, Mario Limonciello wrote:
> On 1/19/2024 04:22, Mika Westerberg wrote:
> > On Fri, Jan 19, 2024 at 09:48:29AM +0200, Mika Westerberg wrote:
> > > On Fri, Jan 19, 2024 at 07:37:56AM +0200, Mika Westerberg wrote:
> > > > On Thu, Jan 18, 2024 at 08:12:56AM -0800, Dmitry Torokhov wrote:
> > > > > On Thu, Jan 18, 2024 at 09:47:07AM -0600, Mario Limonciello wrote:
> > > > > > On 1/18/2024 00:00, Mika Westerberg wrote:
> > > > > > > > Before my patch, you see that the JHL6540 controller is inaccurately
> > > > > > > > labeled “removable”:
> > > > > > > > $ udevadm info -a -p /sys/bus/pci/devices/0000:05:00.0 | grep -e
> > > > > > > > {removable} -e {device} -e {vendor} -e looking
> > > > > > > >     looking at device '/devices/pci0000:00/0000:00:1d.4/0000:05:00.0':
> > > > > > > >       ATTR{device}=="0x15d3"
> > > > > > > >       ATTR{removable}=="removable"
> > > > > > > >       ATTR{vendor}=="0x8086"
> > > > > > > 
> > > > > > > This is actually accurate. The Thunderbolt controller is itself
> > > > > > > hot-removable and that BTW happens to be hot-removed when fwupd applies
> > > > > > > firmware upgrades to the device.
> > > > > 
> > > > > This is quite interesting take. Does fwupd rip the controller out of the
> > > > > box to update it? By that account your touchpad is also removable as it
> > > > > may stop functioning when its firmware gets updated.
> > > > 
> > > > The Thunderbolt controller is connected to a hotpluggable PCIe root port
> > > > so it will be dissappear from the userspace so that "removable" in that
> > > > sense is accurate.
> > > 
> > > There are systems as well where the Thunderbolt (and/or xHCI) controller
> > > only appears if there is anything plugged to the physical Type-C ports
> > > and it gets removed pretty soon after the physical device gets
> > > unplugged. These are also the same Alpine Ridge and Titan Ridge
> > > controllers that this patch is dealing with.
> > > 
> > > I tried to think about some sort of more generic heuristic how to figure
> > > out that the controller is actually inside the physical system but there
> > > is a problem that the same controller can appear on the bus as well, eg.
> > > you plug in Thunderbolt dock and that one has xHCI controller too. That
> > > device should definitely be "removable". With the "software CM" systems
> > > we have a couple of additional hints in the ACPI tables that can be used
> > > to identify the "tunneled" ports but this does not apply to the older
> > > systems I'm afraid.
> > 
> > The below "might" work:
> > 
> > 1. A device that is directly behind a PCIe root or downstream port that
> >     has ->external_facing == 1.
> > 
> > 2. It is a PCIe endpoint.
> > 
> > 3. It is a sibling to or has any of the below PCI IDs (see
> >     drivers/thunderbolt/nhi.h for the definitions):
> > 
> >     PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_C_4C_NHI
> >     PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_C_2C_NHI
> >     PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_LP_USBONLY_NHI
> >     PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_USBONLY_NHI
> >     PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_C_USBONLY_NHI
> >     PCI_DEVICE_ID_INTEL_TITAN_RIDGE_2C_NHI
> >     PCI_DEVICE_ID_INTEL_TITAN_RIDGE_4C_NHI
> > 
> >     And for all USB4 we can use the PCI class:
> > 
> >     PCI_CLASS_SERIAL_USB_USB4
> > 
> > (4. With USB4 systems we could also add the check that the device is not
> > below the tunneled PCIe ports but that's kind of redundant).
> > 
> > I think this covers the existing devices as well as future discrete USB4
> > controllers and marks both the NHI and the xHCI as "fixed" which we
> > could also use to lift the bounce buffering restriction from them.
> > 
> > @Mario, did I miss anything?
> 
> The bounce buffering restriction is only for unaligned DMA isn't it? Does
> that tend to happen a lot?

AFAICT no but this would allow to use IOMMU identity mappings instead of
full mappings with these devices.

> But otherwise I think this does a good job.  It will cover external
> enclosure cases too because of having to check it's directly behind a root
> port.
> 
> But it should also include comments about why it's not needed on newer
> systems (IE the ACPI hints for someone with no prior knowledge looking at
> this to find).

Agree.
Esther Shimanovich Jan. 25, 2024, 11:45 p.m. UTC | #17
On Fri, Jan 19, 2024 at 11:03 AM Esther Shimanovich
<eshimanovich@chromium.org> wrote:
> Next week I'll try those devices in our inventory to see if I can find
> another one with this bug. I'll get back to you on that!

I ended up finding 11 additional models with this bug, from various
manufacturers such as HP, Dell and Lenovo, that use the following
thunderbolt controllers: JHL6240, JHL6340, JHL6540, JHL7540. So making
this fix apply to all affected devices would make sense.

On Mon, Jan 22, 2024 at 1:10 AM Mika Westerberg
<mika.westerberg@linux.intel.com> wrote:
> Yes, you are missing the 1. that it needs to be directly behind the PCIe
> root or downstream port that is marked as ->external_facing, and the
> fact that there can't be NHI's (that's the host controller with the IDs
> I listed in 3.) anywhere else except starting the topology according the
> USB4 spec (and the same applies to Thunderbolt 1-3).

Thanks for the explanation. I'll write up a patch that implements this
and takes into account all the feedback. Then I'll test it on multiple
models, and then send it your way. Let me know if it makes sense to
add you as a co-author.
Very much appreciate your insights.
Esther Shimanovich April 15, 2024, 10:34 p.m. UTC | #18
Hey!
Asking for some help on implementation.
So I implemented most of this, and successfully tested the quirk on 6
different devices with various types of discrete fixed JHL Thunderbolt
chips.

However I want to add an additional check. A device wouldn't need this
quirk if it already has Thunderbolt functionality built in within its
Root Port.

I tried to use "is_thunderbolt" as an identifier for that type of
device--- but when I tested on a device with a thunderbolt root port,
"is_thunderbolt" was false for all the Thunderbolt PCI components
listed below.

~ # lspci -nn | grep Thunderbolt
00:07.0 PCI bridge [0604]: Intel Corporation Tiger Lake-LP Thunderbolt
4 PCI Express Root Port #1 [8086:9a25] (rev 01)
00:07.2 PCI bridge [0604]: Intel Corporation Tiger Lake-LP Thunderbolt
4 PCI Express Root Port #2 [8086:9a27] (rev 01)
00:0d.0 USB controller [0c03]: Intel Corporation Tiger Lake-LP
Thunderbolt 4 USB Controller [8086:9a13] (rev 01)
00:0d.2 USB controller [0c03]: Intel Corporation Tiger Lake-LP
Thunderbolt 4 NHI #0 [8086:9a1b] (rev 01)
00:0d.3 USB controller [0c03]: Intel Corporation Tiger Lake-LP
Thunderbolt 4 NHI #1 [8086:9a1d] (rev 01)

The device I tested was the Lenovo carbon X1 Gen 9. When
set_pcie_thunderbolt is run, the devices listed above return false on
the pci_find_next_ext_capability check.

I have spent some time trying to see if there are any ACPI values or
any alternative indicators to reliably know if a root port has
thunderbolt capabilities. I do see that ADBG is often set to TBT but
that looks like a debugging tool in ACPI.

I also looked through lspci -vvv and compared the output with a device
that has no Thunderbolt built into its CPU, which gave me nothing.

I also tried looking through values in /sys/bus and searching through
the kernel, looking through logs etc. The only option I see now is
figuring out how to get the string value returned by the lspci and
parsing it, but I'm not 100% sure if all Thunderbolt CPUs would have
Root ports specifically labeled as "Thunderbolt Root Port". I'm also
not sure if that value is supposed to be used in that way.

I was hoping that someone may have a suggestion here.


Just for reference, this is the fix I have so far. I don't want to
submit it as a new patch, until I resolve the above question.

+static bool get_pci_exp_upstream_port(struct pci_dev *dev,
+                                   struct pci_dev **upstream_port)
+{
+       struct pci_dev *parent = dev;
+
+       // If the dev is an upstream port, return itself
+       if (pci_pcie_type(dev) == PCI_EXP_TYPE_UPSTREAM) {
+               *upstream_port = dev;
+               return true;
+       }
+
+       // Iterate through the dev's parents to find its upstream port
+       while ((parent = pci_upstream_bridge(parent))) {
+               if (pci_pcie_type(parent) == PCI_EXP_TYPE_UPSTREAM) {
+                       *upstream_port = parent;
+                       return true;
+               }
+       }
+       return false;
+}
+
+static void relabel_internal_thunderbolt_chip(struct pci_dev *dev)
+{
+       struct pci_dev *upstream_port;
+       struct pci_dev *upstream_ports_parent;
+
+       // Get the upstream port closest to the dev
+       if (!(get_pci_exp_upstream_port(dev, &upstream_port)))
+               return;
+
+       // Next, we confirm if the upstream port is directly behind a PCIe root
+       // port.
+       if (!(upstream_ports_parent == pci_upstream_bridge(upstream_port)))
+               return;
+       if (!(pci_pcie_type(upstream_ports_parent) == PCI_EXP_TYPE_ROOT_PORT))
+               return; // quirk does not apply
+
+       // If a device's root port already has thunderbolt capabilities, then
+       // it shouldn't be using this quirk.
+       // TODO: THIS CHECK DOES NOT WORK
+       // I ALSO TRIED USING pci_is_thunderbolt_attached WHICH DIDN'T
WORK EITHER
+       if (!(upstream_ports_parent->is_thunderbolt))
+               return; // thunderbolt functionality is built into root port
+
+       // Apply quirk
+       dev_set_removable(&dev->dev, DEVICE_FIXED);
+
+       // External facing bridges must be marked as such so that devices
+       // below them can correctly be labeled as REMOVABLE
+       if ((pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM)
+            && (dev->devfn == 0x08 | dev->devfn == 0x20))
+               dev->external_facing = true;
+}
Mika Westerberg April 16, 2024, 5:03 a.m. UTC | #19
Hi,

On Mon, Apr 15, 2024 at 06:34:00PM -0400, Esther Shimanovich wrote:
> Hey!
> Asking for some help on implementation.
> So I implemented most of this, and successfully tested the quirk on 6
> different devices with various types of discrete fixed JHL Thunderbolt
> chips.
> 
> However I want to add an additional check. A device wouldn't need this
> quirk if it already has Thunderbolt functionality built in within its
> Root Port.

There is really no "Thunderbolt functionality built in within its Root
Port".

More accurate is that the Thunderbolt/USB4 controller is integrated into
the CPU and the PCIe tunnels go out through the CPU PCIe Root Ports.

> I tried to use "is_thunderbolt" as an identifier for that type of
> device--- but when I tested on a device with a thunderbolt root port,
> "is_thunderbolt" was false for all the Thunderbolt PCI components
> listed below.

You should not use is_thunderbolt for anything else than with the legacy
Apple systems.

> ~ # lspci -nn | grep Thunderbolt
> 00:07.0 PCI bridge [0604]: Intel Corporation Tiger Lake-LP Thunderbolt
> 4 PCI Express Root Port #1 [8086:9a25] (rev 01)
> 00:07.2 PCI bridge [0604]: Intel Corporation Tiger Lake-LP Thunderbolt
> 4 PCI Express Root Port #2 [8086:9a27] (rev 01)
> 00:0d.0 USB controller [0c03]: Intel Corporation Tiger Lake-LP
> Thunderbolt 4 USB Controller [8086:9a13] (rev 01)
> 00:0d.2 USB controller [0c03]: Intel Corporation Tiger Lake-LP
> Thunderbolt 4 NHI #0 [8086:9a1b] (rev 01)
> 00:0d.3 USB controller [0c03]: Intel Corporation Tiger Lake-LP
> Thunderbolt 4 NHI #1 [8086:9a1d] (rev 01)

Okay this is integrated Thunderbolt/USB4 controller.

> The device I tested was the Lenovo carbon X1 Gen 9. When
> set_pcie_thunderbolt is run, the devices listed above return false on
> the pci_find_next_ext_capability check.
> 
> I have spent some time trying to see if there are any ACPI values or
> any alternative indicators to reliably know if a root port has
> thunderbolt capabilities. I do see that ADBG is often set to TBT but
> that looks like a debugging tool in ACPI.
> 
> I also looked through lspci -vvv and compared the output with a device
> that has no Thunderbolt built into its CPU, which gave me nothing.

For integrated and most recent systems (that's with the software
connection manager) the tunneled PCIe ports (Root or Downstream) can be
identified by looking at "usb4-host-interface" ACPI _DSD property. In
addition to this there is the "External Facing Port" ACPI _DSD property
attached to them too. Maybe these help?

With the firmware connection manager there is only the "External Facing
Port" _DSD though.

The Microsoft documentation for this that we also use in Linux is here:

https://learn.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports
Esther Shimanovich April 18, 2024, 7:43 p.m. UTC | #20
Thank you for your response! It is very much appreciated.

On the Tiger Lake device I was testing on, the usb4-host-interface
value is NOT listed in its ACPI.

I then decided to query the ACPI values collected from devices in my
office, to see if this issue is limited to my device.
Ice Lake - 4 devices, none had "usb4-host-interface"
Tiger Lake - 31 devices, none have "usb4-host-interface"
Alder Lake - 32 devices, I see that 15 of them have
"usb4-host-interface" in their ACPI
Raptor Lake - 1 device, does not have "usb4-host-interface"

It looks like only Alder Lake has usb4-host-interface  listed in its
ACPI for whatever reason.

It seems like I cannot use usb4-host-interface as a determinant
whether the CPU has Thunderbolt capabilities (thus not needing a
discrete Thunderbolt chip).
ExternalFacingPort is listed in devices that don't have CPUs with
Thunderbolts, so that can't be a determinant.

Am I missing something?
Mika Westerberg April 19, 2024, 4:49 a.m. UTC | #21
Hi,

On Thu, Apr 18, 2024 at 03:43:08PM -0400, Esther Shimanovich wrote:
> Thank you for your response! It is very much appreciated.
> 
> On the Tiger Lake device I was testing on, the usb4-host-interface
> value is NOT listed in its ACPI.
> 
> I then decided to query the ACPI values collected from devices in my
> office, to see if this issue is limited to my device.
> Ice Lake - 4 devices, none had "usb4-host-interface"
> Tiger Lake - 31 devices, none have "usb4-host-interface"
> Alder Lake - 32 devices, I see that 15 of them have
> "usb4-host-interface" in their ACPI
> Raptor Lake - 1 device, does not have "usb4-host-interface"
> 
> It looks like only Alder Lake has usb4-host-interface  listed in its
> ACPI for whatever reason.

The reason is that it is there only for "software connection manager"
systems. That's Chrome TGL and then all ADL and beyond (with integrated
Thunderbolt/USB4).

> It seems like I cannot use usb4-host-interface as a determinant
> whether the CPU has Thunderbolt capabilities (thus not needing a
> discrete Thunderbolt chip).
> ExternalFacingPort is listed in devices that don't have CPUs with
> Thunderbolts, so that can't be a determinant.
> 
> Am I missing something?

"ExternalFacingPort" is there for all firmware and software connection
manager systems but only if they use IOMMU for DMA security (also
/sys/bus/thunderbolt/devices/domainX/iommu_dma_protection == 1). So this
is all integrated (Ice Lake+), then Alpine Ridge, Titan Ridge and Maple
Ridge based recent systems.

That leaves out still older systems with firmware connection manager
with the "legacy" security settings. This is pretty much Alpine and
Titan Ridge and I think those you can identify using their PCI IDs
directly as the list will not be too big. Something like this taken from
the drivers/thunderbolt/nhi.h:

#define PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_LP_BRIDGE      0x15c0
#define PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_C_4C_BRIDGE    0x15d3
#define PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_C_2C_BRIDGE    0x15da
#define PCI_DEVICE_ID_INTEL_TITAN_RIDGE_2C_BRIDGE       0x15e7
#define PCI_DEVICE_ID_INTEL_TITAN_RIDGE_4C_BRIDGE       0x15ea
#define PCI_DEVICE_ID_INTEL_TITAN_RIDGE_DD_BRIDGE       0x15ef

In addition to this some of the systems used the BIOS "assisted"
enumeration which means that the controller is actually only present if
you have either USB or Thunderbolt device connected. Othertimes it is
not present at all (not sure if you want to label these differently
though).

In summary if you want to find out if the CPU has an integrated
Thunderbolt/USB4 controller (I think this is what you ask above) then
the best bet would be to use "ExternalFacingPort" because that is
present in all those systems.
Esther Shimanovich April 22, 2024, 7:17 p.m. UTC | #22
Thanks for the explanation! I still don't fully understand how that
would work for my use case.

Perhaps it would be better for me to describe the case I am trying to
protect against.

To rehash, this quirk was written for devices with discrete
Thunderbolt controllers.

For example,
CometLake_CPU -> AlpineRidge_Chip -> USB-C Port
This device has the ExternalFacingPort property in ACPI.
My quirk relabels the Alpine Ridge chip as "fixed" and
external-facing, so that devices attached to the USB-C port could be
labeled as "removable"

Let's say we have a TigerLake CPU, which has integrated
Thunderbolt/USB4 capabilities:

TigerLake_ThunderboltCPU -> USB-C Port
This device also has the ExternalFacingPort property in ACPI and lacks
the usb4-host-interface property in the ACPI.

My worry is that someone could take an Alpine Ridge Chip Thunderbolt
Dock and attach it to the TigerLake CPU

TigerLake_ThunderboltCPU -> USB-C Port -> AlpineRidge_Dock

If that were to happen, this quirk would incorrectly label the Alpine
Ridge Dock as "fixed" instead of "removable".

My thinking was that we could prevent this scenario from occurring if
we filtered this quirk not to apply on CPU's like Tiger Lake, with
integrated Thunderbolt/USB4 capabilities.

ExternalFacingPort is found both on the Comet Lake ACPI and also on
the Tiger Lake ACPI. So I can't use that to distinguish between CPUs
which don't have integrated Thunderbolt, like Comet Lake, and CPUs
with integrated Thunderbolt, like Tiger Lake.

I am looking for something that can tell me if the device's Root Port
has the Thunderbolt controller upstream to it or not.
Is there anything like that?
Or perhaps should I add a check which compares the name of the
device's CPU with a list of CPUs that this quirk can be applied to?
Or is there some way I can identify the Thunderbolt controller, then
determine if it's upstream or downstream from the root port?
Or are Alpine Ridge docks not something to worry about at all?
Mario Limonciello April 22, 2024, 7:21 p.m. UTC | #23
On 4/22/2024 14:17, Esther Shimanovich wrote:
> Thanks for the explanation! I still don't fully understand how that
> would work for my use case.
> 
> Perhaps it would be better for me to describe the case I am trying to
> protect against.
> 
> To rehash, this quirk was written for devices with discrete
> Thunderbolt controllers.
> 
> For example,
> CometLake_CPU -> AlpineRidge_Chip -> USB-C Port
> This device has the ExternalFacingPort property in ACPI.
> My quirk relabels the Alpine Ridge chip as "fixed" and
> external-facing, so that devices attached to the USB-C port could be
> labeled as "removable"
> 
> Let's say we have a TigerLake CPU, which has integrated
> Thunderbolt/USB4 capabilities:
> 
> TigerLake_ThunderboltCPU -> USB-C Port
> This device also has the ExternalFacingPort property in ACPI and lacks
> the usb4-host-interface property in the ACPI.
> 
> My worry is that someone could take an Alpine Ridge Chip Thunderbolt
> Dock and attach it to the TigerLake CPU
> 
> TigerLake_ThunderboltCPU -> USB-C Port -> AlpineRidge_Dock
> 
> If that were to happen, this quirk would incorrectly label the Alpine
> Ridge Dock as "fixed" instead of "removable".
> 
> My thinking was that we could prevent this scenario from occurring if
> we filtered this quirk not to apply on CPU's like Tiger Lake, with
> integrated Thunderbolt/USB4 capabilities.
> 
> ExternalFacingPort is found both on the Comet Lake ACPI and also on
> the Tiger Lake ACPI. So I can't use that to distinguish between CPUs
> which don't have integrated Thunderbolt, like Comet Lake, and CPUs
> with integrated Thunderbolt, like Tiger Lake.
> 
> I am looking for something that can tell me if the device's Root Port
> has the Thunderbolt controller upstream to it or not.
> Is there anything like that?
> Or perhaps should I add a check which compares the name of the
> device's CPU with a list of CPUs that this quirk can be applied to?
> Or is there some way I can identify the Thunderbolt controller, then
> determine if it's upstream or downstream from the root port?
> Or are Alpine Ridge docks not something to worry about at all?

My thought was once you have a device as untrusted, everything else 
connected to it should "also" be untrusted.
Mika Westerberg April 23, 2024, 5:33 a.m. UTC | #24
On Mon, Apr 22, 2024 at 02:21:18PM -0500, Mario Limonciello wrote:
> On 4/22/2024 14:17, Esther Shimanovich wrote:
> > Thanks for the explanation! I still don't fully understand how that
> > would work for my use case.
> > 
> > Perhaps it would be better for me to describe the case I am trying to
> > protect against.
> > 
> > To rehash, this quirk was written for devices with discrete
> > Thunderbolt controllers.
> > 
> > For example,
> > CometLake_CPU -> AlpineRidge_Chip -> USB-C Port
> > This device has the ExternalFacingPort property in ACPI.
> > My quirk relabels the Alpine Ridge chip as "fixed" and
> > external-facing, so that devices attached to the USB-C port could be
> > labeled as "removable"
> > 
> > Let's say we have a TigerLake CPU, which has integrated
> > Thunderbolt/USB4 capabilities:
> > 
> > TigerLake_ThunderboltCPU -> USB-C Port
> > This device also has the ExternalFacingPort property in ACPI and lacks
> > the usb4-host-interface property in the ACPI.
> > 
> > My worry is that someone could take an Alpine Ridge Chip Thunderbolt
> > Dock and attach it to the TigerLake CPU
> > 
> > TigerLake_ThunderboltCPU -> USB-C Port -> AlpineRidge_Dock
> > 
> > If that were to happen, this quirk would incorrectly label the Alpine
> > Ridge Dock as "fixed" instead of "removable".
> > 
> > My thinking was that we could prevent this scenario from occurring if
> > we filtered this quirk not to apply on CPU's like Tiger Lake, with
> > integrated Thunderbolt/USB4 capabilities.
> > 
> > ExternalFacingPort is found both on the Comet Lake ACPI and also on
> > the Tiger Lake ACPI. So I can't use that to distinguish between CPUs
> > which don't have integrated Thunderbolt, like Comet Lake, and CPUs
> > with integrated Thunderbolt, like Tiger Lake.
> > 
> > I am looking for something that can tell me if the device's Root Port
> > has the Thunderbolt controller upstream to it or not.
> > Is there anything like that?
> > Or perhaps should I add a check which compares the name of the
> > device's CPU with a list of CPUs that this quirk can be applied to?
> > Or is there some way I can identify the Thunderbolt controller, then
> > determine if it's upstream or downstream from the root port?
> > Or are Alpine Ridge docks not something to worry about at all?
> 
> My thought was once you have a device as untrusted, everything else
> connected to it should "also" be untrusted.

I think what you are looking for is that anything behind a PCIe tunnel
should not have this applied. IIRC the AMD GPU or some code there were
going to add identification of "virtual" links to the bandwidth
calculation functionality.

@Mario, do you remember if this was done already and if that could maybe
be re-used here?

The other way I think is something like this:

  - If it does not have "usb4-host-interface" property (or behind a port
    that has that). These are all tunneled (e.g virtual).

  - It is directly connected to a PCIe root port with
    "ExternalFacingPort" and it has sibling device that is "Thunderbolt
    NHI". This is because you can only have "NHI" on a host router
    according to the USB4 spec.

I may be forgetting something though.
Lukas Wunner April 23, 2024, 8:31 a.m. UTC | #25
On Tue, Apr 23, 2024 at 08:33:12AM +0300, Mika Westerberg wrote:
> I think what you are looking for is that anything behind a PCIe tunnel
> should not have this applied. IIRC the AMD GPU or some code there were
> going to add identification of "virtual" links to the bandwidth
> calculation functionality.

I guess I could resurrect my correlation patch:

https://lore.kernel.org/all/f53ea40a7487e145aa1a62c347cef1814072e140.1536517047.git.lukas@wunner.de/

The last time I forward-ported it was for v5.13.  I still have that code
running on my development machine.

The problem is that it only allows lookup from tb_port to pci_dev.
I'd have to add a pointer to struct pci_dev to allow lookups in the
inverse direction.  Though I think we have such PCI companion devices
for CXL as well, so such a pointer could be useful in general.

I'm knee-deep in PCI device authentication code but could probably
dedicate a weekend to the correlation patch if there's interest?

Once we have correlation, we can expose more precise bandwidth
for virtual PCI links in sysfs.

Thanks,

Lukas
Mika Westerberg April 23, 2024, 8:40 a.m. UTC | #26
On Tue, Apr 23, 2024 at 10:31:30AM +0200, Lukas Wunner wrote:
> On Tue, Apr 23, 2024 at 08:33:12AM +0300, Mika Westerberg wrote:
> > I think what you are looking for is that anything behind a PCIe tunnel
> > should not have this applied. IIRC the AMD GPU or some code there were
> > going to add identification of "virtual" links to the bandwidth
> > calculation functionality.
> 
> I guess I could resurrect my correlation patch:
> 
> https://lore.kernel.org/all/f53ea40a7487e145aa1a62c347cef1814072e140.1536517047.git.lukas@wunner.de/
> 
> The last time I forward-ported it was for v5.13.  I still have that code
> running on my development machine.
> 
> The problem is that it only allows lookup from tb_port to pci_dev.
> I'd have to add a pointer to struct pci_dev to allow lookups in the
> inverse direction.  Though I think we have such PCI companion devices
> for CXL as well, so such a pointer could be useful in general.
> 
> I'm knee-deep in PCI device authentication code but could probably
> dedicate a weekend to the correlation patch if there's interest?
> 
> Once we have correlation, we can expose more precise bandwidth
> for virtual PCI links in sysfs.

Sounds good to me :) There are also some additions in USB4 spec that
allows discovery of mapping between PCIe adapters and the corresponding
PCIe downstream/root port. Perhaps these can be added there too?
Mario Limonciello April 23, 2024, 4:59 p.m. UTC | #27
On 4/23/2024 00:33, Mika Westerberg wrote:
> On Mon, Apr 22, 2024 at 02:21:18PM -0500, Mario Limonciello wrote:
>> On 4/22/2024 14:17, Esther Shimanovich wrote:
>>> Thanks for the explanation! I still don't fully understand how that
>>> would work for my use case.
>>>
>>> Perhaps it would be better for me to describe the case I am trying to
>>> protect against.
>>>
>>> To rehash, this quirk was written for devices with discrete
>>> Thunderbolt controllers.
>>>
>>> For example,
>>> CometLake_CPU -> AlpineRidge_Chip -> USB-C Port
>>> This device has the ExternalFacingPort property in ACPI.
>>> My quirk relabels the Alpine Ridge chip as "fixed" and
>>> external-facing, so that devices attached to the USB-C port could be
>>> labeled as "removable"
>>>
>>> Let's say we have a TigerLake CPU, which has integrated
>>> Thunderbolt/USB4 capabilities:
>>>
>>> TigerLake_ThunderboltCPU -> USB-C Port
>>> This device also has the ExternalFacingPort property in ACPI and lacks
>>> the usb4-host-interface property in the ACPI.
>>>
>>> My worry is that someone could take an Alpine Ridge Chip Thunderbolt
>>> Dock and attach it to the TigerLake CPU
>>>
>>> TigerLake_ThunderboltCPU -> USB-C Port -> AlpineRidge_Dock
>>>
>>> If that were to happen, this quirk would incorrectly label the Alpine
>>> Ridge Dock as "fixed" instead of "removable".
>>>
>>> My thinking was that we could prevent this scenario from occurring if
>>> we filtered this quirk not to apply on CPU's like Tiger Lake, with
>>> integrated Thunderbolt/USB4 capabilities.
>>>
>>> ExternalFacingPort is found both on the Comet Lake ACPI and also on
>>> the Tiger Lake ACPI. So I can't use that to distinguish between CPUs
>>> which don't have integrated Thunderbolt, like Comet Lake, and CPUs
>>> with integrated Thunderbolt, like Tiger Lake.
>>>
>>> I am looking for something that can tell me if the device's Root Port
>>> has the Thunderbolt controller upstream to it or not.
>>> Is there anything like that?
>>> Or perhaps should I add a check which compares the name of the
>>> device's CPU with a list of CPUs that this quirk can be applied to?
>>> Or is there some way I can identify the Thunderbolt controller, then
>>> determine if it's upstream or downstream from the root port?
>>> Or are Alpine Ridge docks not something to worry about at all?
>>
>> My thought was once you have a device as untrusted, everything else
>> connected to it should "also" be untrusted.
> 
> I think what you are looking for is that anything behind a PCIe tunnel
> should not have this applied. IIRC the AMD GPU or some code there were
> going to add identification of "virtual" links to the bandwidth
> calculation functionality.
> 
> @Mario, do you remember if this was done already and if that could maybe
> be re-used here?

Yeah there was a series that I worked on a few spins a while back 
specifically in the context of eGPUs to identify virtual links and take 
them out of bandwidth calculations.

It didn't get merged, I recall it got stalled on various feedback and I 
didn't dust it off because the series also did prompt discussions about 
the reasoning that amdgpu was doing this in the first place.  It turned 
out to be a bad assumption in the code and I instead made a change to 
amdgpu to not look at the whole topology but just the link partner 
(466a7d115326ece682c2b60d1c77d1d0b9010b4f if anyone is curious).

> 
> The other way I think is something like this:
> 
>    - If it does not have "usb4-host-interface" property (or behind a port
>      that has that). These are all tunneled (e.g virtual).
> 
>    - It is directly connected to a PCIe root port with
>      "ExternalFacingPort" and it has sibling device that is "Thunderbolt
>      NHI". This is because you can only have "NHI" on a host router
>      according to the USB4 spec.
> 
> I may be forgetting something though.
Mika Westerberg April 24, 2024, 8:56 a.m. UTC | #28
On Tue, Apr 23, 2024 at 11:59:36AM -0500, Mario Limonciello wrote:
> On 4/23/2024 00:33, Mika Westerberg wrote:
> > On Mon, Apr 22, 2024 at 02:21:18PM -0500, Mario Limonciello wrote:
> > > On 4/22/2024 14:17, Esther Shimanovich wrote:
> > > > Thanks for the explanation! I still don't fully understand how that
> > > > would work for my use case.
> > > > 
> > > > Perhaps it would be better for me to describe the case I am trying to
> > > > protect against.
> > > > 
> > > > To rehash, this quirk was written for devices with discrete
> > > > Thunderbolt controllers.
> > > > 
> > > > For example,
> > > > CometLake_CPU -> AlpineRidge_Chip -> USB-C Port
> > > > This device has the ExternalFacingPort property in ACPI.
> > > > My quirk relabels the Alpine Ridge chip as "fixed" and
> > > > external-facing, so that devices attached to the USB-C port could be
> > > > labeled as "removable"
> > > > 
> > > > Let's say we have a TigerLake CPU, which has integrated
> > > > Thunderbolt/USB4 capabilities:
> > > > 
> > > > TigerLake_ThunderboltCPU -> USB-C Port
> > > > This device also has the ExternalFacingPort property in ACPI and lacks
> > > > the usb4-host-interface property in the ACPI.
> > > > 
> > > > My worry is that someone could take an Alpine Ridge Chip Thunderbolt
> > > > Dock and attach it to the TigerLake CPU
> > > > 
> > > > TigerLake_ThunderboltCPU -> USB-C Port -> AlpineRidge_Dock
> > > > 
> > > > If that were to happen, this quirk would incorrectly label the Alpine
> > > > Ridge Dock as "fixed" instead of "removable".
> > > > 
> > > > My thinking was that we could prevent this scenario from occurring if
> > > > we filtered this quirk not to apply on CPU's like Tiger Lake, with
> > > > integrated Thunderbolt/USB4 capabilities.
> > > > 
> > > > ExternalFacingPort is found both on the Comet Lake ACPI and also on
> > > > the Tiger Lake ACPI. So I can't use that to distinguish between CPUs
> > > > which don't have integrated Thunderbolt, like Comet Lake, and CPUs
> > > > with integrated Thunderbolt, like Tiger Lake.
> > > > 
> > > > I am looking for something that can tell me if the device's Root Port
> > > > has the Thunderbolt controller upstream to it or not.
> > > > Is there anything like that?
> > > > Or perhaps should I add a check which compares the name of the
> > > > device's CPU with a list of CPUs that this quirk can be applied to?
> > > > Or is there some way I can identify the Thunderbolt controller, then
> > > > determine if it's upstream or downstream from the root port?
> > > > Or are Alpine Ridge docks not something to worry about at all?
> > > 
> > > My thought was once you have a device as untrusted, everything else
> > > connected to it should "also" be untrusted.
> > 
> > I think what you are looking for is that anything behind a PCIe tunnel
> > should not have this applied. IIRC the AMD GPU or some code there were
> > going to add identification of "virtual" links to the bandwidth
> > calculation functionality.
> > 
> > @Mario, do you remember if this was done already and if that could maybe
> > be re-used here?
> 
> Yeah there was a series that I worked on a few spins a while back
> specifically in the context of eGPUs to identify virtual links and take them
> out of bandwidth calculations.
> 
> It didn't get merged, I recall it got stalled on various feedback and I
> didn't dust it off because the series also did prompt discussions about the
> reasoning that amdgpu was doing this in the first place.  It turned out to
> be a bad assumption in the code and I instead made a change to amdgpu to not
> look at the whole topology but just the link partner
> (466a7d115326ece682c2b60d1c77d1d0b9010b4f if anyone is curious).

Okay that makes sense. Thanks!
Esther Shimanovich April 25, 2024, 9:16 p.m. UTC | #29
Thank you for all your help!

On Tue, Apr 23, 2024 at 1:33 AM Mika Westerberg
<mika.westerberg@linux.intel.com> wrote:
>
> The other way I think is something like this:
>
>   - If it does not have "usb4-host-interface" property (or behind a port
>     that has that). These are all tunneled (e.g virtual).
>
>   - It is directly connected to a PCIe root port with
>     "ExternalFacingPort" and it has sibling device that is "Thunderbolt
>     NHI". This is because you can only have "NHI" on a host router
>     according to the USB4 spec.
>
I did find one example of a docking station that uses the DSL6540
chip, which has PCI IDs defined in include/linux/pci_ids.h:
#define PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_4C_NHI     0x1577
#define PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_4C_BRIDGE  0x1578
It seems like it has an NHI, despite being in an external, removable
docking station. This appears to contradict what you say about only
having "NHI" on a host router. I am assuming that by host router, you
mean the fixed discrete, fixed thunderbolt chip, or the thunderbolt
controller upstream to the root port. Please correct me if I got
anything wrong!

Looking at 18-241_ThunderboltController_Brief_HI.pdf, it seems like
these Alpine Ridge chips can be used either on a computer or a
peripheral. (Expected usage: Computer or peripheral)
So I'm not sure if finding an NHI would guarantee that the device is
not a peripheral. My original question was how to distinguish a
Thunderbolt controller that is on a removable peripheral, like a
docking station-- from one that is a discrete chip fixed to a computer
or upstream to the root port.

So unless I am misunderstanding something, it appears that my only
option is waiting for Lukas's patches. Please correct me if that is
not the case!
Mika Westerberg April 26, 2024, 4:52 a.m. UTC | #30
Hi,

On Thu, Apr 25, 2024 at 05:16:24PM -0400, Esther Shimanovich wrote:
> Thank you for all your help!
> 
> On Tue, Apr 23, 2024 at 1:33 AM Mika Westerberg
> <mika.westerberg@linux.intel.com> wrote:
> >
> > The other way I think is something like this:
> >
> >   - If it does not have "usb4-host-interface" property (or behind a port
> >     that has that). These are all tunneled (e.g virtual).
> >
> >   - It is directly connected to a PCIe root port with
> >     "ExternalFacingPort" and it has sibling device that is "Thunderbolt
> >     NHI". This is because you can only have "NHI" on a host router
> >     according to the USB4 spec.
> >
> I did find one example of a docking station that uses the DSL6540
> chip, which has PCI IDs defined in include/linux/pci_ids.h:
> #define PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_4C_NHI     0x1577
> #define PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_4C_BRIDGE  0x1578
> It seems like it has an NHI, despite being in an external, removable
> docking station. This appears to contradict what you say about only
> having "NHI" on a host router. I am assuming that by host router, you
> mean the fixed discrete, fixed thunderbolt chip, or the thunderbolt
> controller upstream to the root port. Please correct me if I got
> anything wrong!

So it goes same way with other discrete chips from Intel at least. It is
the same silicon but the NHI is disabled on device routers.

That said, it is entirely possible for a "malicious" device to pretend
to have one so we need to be careful.

> Looking at 18-241_ThunderboltController_Brief_HI.pdf, it seems like
> these Alpine Ridge chips can be used either on a computer or a
> peripheral. (Expected usage: Computer or peripheral)

Yes as above. Most our chips are such.

> So I'm not sure if finding an NHI would guarantee that the device is
> not a peripheral. My original question was how to distinguish a
> Thunderbolt controller that is on a removable peripheral, like a
> docking station-- from one that is a discrete chip fixed to a computer
> or upstream to the root port.

Yes that's the problem. We can figure that out from full USB4 system by
looking at the "usb4-host-interface" ACPI _DSD properties of the
tunneled PCIe Root/Downstream ports. But this does not work with the
pre-USB4 hosts so that requires some sort of heuristics, unfortunately.

> So unless I am misunderstanding something, it appears that my only
> option is waiting for Lukas's patches. Please correct me if that is
> not the case!

I think with Lukas' patches (Lukas please correct me if I got that
wrong) you can find out the PCIe ports which have their link going over
a tunnel. His series works with pre-USB4 devices so that should cover
your case. (In addition to "usb4-host-interface" ACPI _DSD property there
is a special router operation that allows extracting the same PCIe
downstream port mapping that Lukas' patches are doing from DROM so this
should also allow identify all tunneled links.)

This way you can identify the xHCI (well and NHI) that are not behind
PCIe tunnel so that should mean they are really part of the host system
(being soldered or plugged to the PCIe slot or so). If I understood
right this is what you were looking for, correct?
Esther Shimanovich April 26, 2024, 3:58 p.m. UTC | #31
> This way you can identify the xHCI (well and NHI) that are not behind
> PCIe tunnel so that should mean they are really part of the host system
> (being soldered or plugged to the PCIe slot or so). If I understood
> right this is what you were looking for, correct?

Yes! Thank you for the explanation.
Lukas Wunner April 27, 2024, 5:35 a.m. UTC | #32
On Fri, Apr 26, 2024 at 07:52:07AM +0300, Mika Westerberg wrote:
> On Thu, Apr 25, 2024 at 05:16:24PM -0400, Esther Shimanovich wrote:
> > I did find one example of a docking station that uses the DSL6540
> > chip, which has PCI IDs defined in include/linux/pci_ids.h:
> > #define PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_4C_NHI     0x1577
> > #define PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_4C_BRIDGE  0x1578
> > It seems like it has an NHI, despite being in an external, removable
> > docking station. This appears to contradict what you say about only
> > having "NHI" on a host router. I am assuming that by host router, you
> > mean the fixed discrete, fixed thunderbolt chip, or the thunderbolt
> > controller upstream to the root port. Please correct me if I got
> > anything wrong!
> 
> So it goes same way with other discrete chips from Intel at least. It is
> the same silicon but the NHI is disabled on device routers.
> 
> That said, it is entirely possible for a "malicious" device to pretend
> to have one so we need to be careful.

If a device (accidentally or maliciously) exposes an NHI, the thunderbolt
driver will try to bind to it.

Do we take any precautions to prevent that?

AFAICS we'd be allocating a duplicate root_switch with route 0.
Seems dangerous if two driver instances talk to the same Root Switch.

This looks like something that needs to be checked by Intel Validation
Engineering folks.  Have they been testing this?  (+cc Gil)

Thanks,

Lukas
Lukas Wunner April 27, 2024, 7:08 a.m. UTC | #33
On Thu, Apr 25, 2024 at 05:16:24PM -0400, Esther Shimanovich wrote:
> So unless I am misunderstanding something, it appears that my only
> option is waiting for Lukas's patches.

I've just pushed this branch:

https://github.com/l1k/linux/commits/thunderbolt_associate_v1

But it's not even compile-tested yet.  0-day is crunching through it now
and I'll force-push if/when fixing bugs.  It also doesn't contain support
yet for the "Get PCIe Downstream Entry" functionality that's apparently
needed for USB4.

The branch marks PCIe Downstream Adapters on a Root Switch as fixed and
external_facing.

However that doesn't appear to be sufficient:  I notice that in your
patch, you're also clearing the external_facing bit on the Root Port
above the discrete host controller.

Additionally you're marking the bridges leading to the NHI and XHCI
as fixed.  You're also marking the NHI and XHCI themselves as fixed
and external_facing.

I just want to confirm whether all of this is actually necessary.
At least marking the NHI and XHCI as external_facing seems nonsensical.
I need to know the *minimal* set of attribute changes to fix the problem.

I also need to know exactly what the user-visible issue is and how
it comes about.  Your commit message merely says "the build-in USB-C
ports stop working at all".  Does that mean that no driver is bound
to the NHI or XHCI?

The solution implemented by the above-linked branch hinges on the
NHI driver fixing up the device attributes.  But if the NHI driver
is not bound, it can't fix up the attributes.

I notice that commit 86eaf4a5b431 ("thunderbolt: Make iommu_dma_protection
more accurate") bemoans the inability to determine external_facing ports
clearly.  The above-linked branch may solve that, so it seems there may
be overlap between the issue discussed here and the one that the commit
sought to solve.

Thanks,

Lukas
Mika Westerberg April 27, 2024, 7:41 a.m. UTC | #34
On Sat, Apr 27, 2024 at 07:35:33AM +0200, Lukas Wunner wrote:
> On Fri, Apr 26, 2024 at 07:52:07AM +0300, Mika Westerberg wrote:
> > On Thu, Apr 25, 2024 at 05:16:24PM -0400, Esther Shimanovich wrote:
> > > I did find one example of a docking station that uses the DSL6540
> > > chip, which has PCI IDs defined in include/linux/pci_ids.h:
> > > #define PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_4C_NHI     0x1577
> > > #define PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_4C_BRIDGE  0x1578
> > > It seems like it has an NHI, despite being in an external, removable
> > > docking station. This appears to contradict what you say about only
> > > having "NHI" on a host router. I am assuming that by host router, you
> > > mean the fixed discrete, fixed thunderbolt chip, or the thunderbolt
> > > controller upstream to the root port. Please correct me if I got
> > > anything wrong!
> > 
> > So it goes same way with other discrete chips from Intel at least. It is
> > the same silicon but the NHI is disabled on device routers.
> > 
> > That said, it is entirely possible for a "malicious" device to pretend
> > to have one so we need to be careful.
> 
> If a device (accidentally or maliciously) exposes an NHI, the thunderbolt
> driver will try to bind to it.
> 
> Do we take any precautions to prevent that?

Not at the moment but it will be behind the IOMMU so it cannot access
any other memory that what is reserved for it.

> AFAICS we'd be allocating a duplicate root_switch with route 0.
> Seems dangerous if two driver instances talk to the same Root Switch.

I don't think it even works because it cannot have topology ID of 0 if
it is a device router which it is in this case since it is being first
enumerated by the "real" host.

That said we have not tested this either so can be 100% sure.
Lukas Wunner April 27, 2024, 3:09 p.m. UTC | #35
On Thu, Apr 25, 2024 at 05:16:24PM -0400, Esther Shimanovich wrote:
> I did find one example of a docking station that uses the DSL6540
> chip, which has PCI IDs defined in include/linux/pci_ids.h:
> #define PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_4C_NHI     0x1577
> #define PCI_DEVICE_ID_INTEL_ALPINE_RIDGE_4C_BRIDGE  0x1578
> It seems like it has an NHI, despite being in an external, removable
> docking station.

Could you provide full output of dmesg and lspci -vvv of a machine
with this docking station attached?

Perhaps open a bug at bugzilla.kernel.org and attach it there?

Could you then try the below patch and see if it prevents the
Thunderbolt driver from binding to the hot-plugged device?

Thanks!

-- >8 --
From a10a294a650232c16447a43c2b591f34d3cb399f Mon Sep 17 00:00:00 2001
From: Lukas Wunner <lukas@wunner.de>
Date: Sat, 27 Apr 2024 16:24:18 +0200
Subject: [PATCH] thunderbolt: Do not bind to NHI exposed by a hot-plugged
 device

An NHI should only be exposed by Thunderbolt host controllers, not by
hot-plugged devices.

Avoid binding to an NHI erroneously or maliciously exposed by a device
by looking at the upstream port of its switch:

On a host controller, the upstream port is of type TB_TYPE_NHI, whereas
on hot-plugged devices it is of type TB_TYPE_PORT.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org
---
 drivers/thunderbolt/tb.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/thunderbolt/tb.c b/drivers/thunderbolt/tb.c
index e00e62de53f3..d95ff9ed4d96 100644
--- a/drivers/thunderbolt/tb.c
+++ b/drivers/thunderbolt/tb.c
@@ -2786,6 +2786,13 @@ static int tb_start(struct tb *tb, bool reset)
 	if (IS_ERR(tb->root_switch))
 		return PTR_ERR(tb->root_switch);
 
+	/* NHI erroneously exposed by a hot-plugged device? */
+	if (!tb_port_is_nhi(tb_upstream_port(tb->root_switch))) {
+		tb_err(tb, "not a host controller\n");
+		tb_switch_put(tb->root_switch);
+		return -ENODEV;
+	}
+
 	/*
 	 * ICM firmware upgrade needs running firmware and in native
 	 * mode that is not available so disable firmware upgrade of the
diff mbox series

Patch

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index ea476252280a..34e43323ff14 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3873,6 +3873,118 @@  DECLARE_PCI_FIXUP_SUSPEND_LATE(PCI_VENDOR_ID_INTEL,
 			       quirk_apple_poweroff_thunderbolt);
 #endif
 
+/*
+ * On most ThinkPad Carbon 7/8s, JHL6540 Thunderbolt 3 bridges are set
+ * incorrectly as DEVICE_REMOVABLE despite being built into the device.
+ * This is the side effect of a unique hardware configuration.
+ *
+ * Normally, Thunderbolt functionality is integrated to the SoC and
+ * its root ports.
+ *
+ *                          Most devices:
+ *                    root port --> USB-C port
+ *
+ * But X1 Carbon Gen 7/8 uses Whiskey Lake and Comet Lake SoC, which
+ * don't come with Thunderbolt functionality. Therefore, a discrete
+ * Thunderbolt Host PCI controller was added between the root port and
+ * the USB-C port.
+ *
+ *                        Thinkpad Carbon 7/8s
+ *                 (w/ Whiskey Lake and Comet Lake SoC):
+ *                root port -->  JHL6540   --> USB-C port
+ *
+ * Because the root port is labeled by FW as "ExternalFacingPort", as
+ * required by the DMAR ACPI spec, the JHL6540 chip is inaccurately
+ * labeled as DEVICE_REMOVABLE by the kernel pci driver.
+ * Therefore, the built-in USB-C ports do not enumerate when policies
+ * forbidding external pci devices are enforced.
+ *
+ * This fix relabels the pci components in the built-in JHL6540 chip as
+ * DEVICE_FIXED, ensuring that the built-in USB-C ports always enumerate
+ * properly as intended.
+ *
+ * This fix also labels the external facing components of the JHL6540 as
+ * external_facing, so that the pci attach policy works as intended.
+ *
+ * The ASCII diagram below describes the pci layout of the JHL6540 chip.
+ *
+ *                         Root Port
+ *                 [8086:02b4] or [8086:9db4]
+ *                             |
+ *                        JHL6540 Chip
+ *     __________________________________________________
+ *    |                      Bridge                      |
+ *    |        PCI ID ->  [8086:15d3]                    |
+ *    |         DEVFN ->      (00)                       |
+ *    |       _________________|__________________       |
+ *    |      |           |            |           |      |
+ *    |    Bridge     Bridge        Bridge      Bridge   |
+ *    | [8086:15d3] [8086:15d3]  [8086:15d3] [8086:15d3] |
+ *    |    (00)        (08)         (10)        (20)     |
+ *    |      |           |            |           |      |
+ *    |     NHI          |     USB Controller     |      |
+ *    | [8086:15d2]      |       [8086:15d4]      |      |
+ *    |    (00)          |          (00)          |      |
+ *    |      |___________|            |___________|      |
+ *    |____________|________________________|____________|
+ *                 |                        |
+ *             USB-C Port               USB-C Port
+ *
+ *
+ * Based on what a JHL6549 pci component's pci id, subsystem device id
+ * and devfn are, we can infer if it is fixed and if it faces a usb port;
+ * which would mean it is external facing.
+ * This quirk uses these values to identify the pci components and set the
+ * properties accordingly.
+ */
+static void carbon_X1_fixup_relabel_alpine_ridge(struct pci_dev *dev)
+{
+	/* Is this JHL6540 PCI component embedded in a Lenovo device? */
+	if (dev->subsystem_vendor != PCI_VENDOR_ID_LENOVO)
+		return;
+
+	/* Is this JHL6540 PCI component embedded in an X1 Carbon Gen 7/8? */
+	if (dev->subsystem_device != 0x22be && // Gen 8
+	    dev->subsystem_device != 0x2292) { // Gen 7
+		return;
+	}
+
+	dev_set_removable(&dev->dev, DEVICE_FIXED);
+
+	/* Not all 0x15d3 components are external facing */
+	if (dev->device == 0x15d3 &&
+	    dev->devfn != 0x08 &&
+	    dev->devfn != 0x20) {
+		return;
+	}
+
+	dev->external_facing = true;
+}
+
+/*
+ * We also need to relabel the root port as a consequence of changing
+ * the JHL6540's PCIE hierarchy.
+ */
+static void carbon_X1_fixup_rootport_not_removable(struct pci_dev *dev)
+{
+	/* Is this JHL6540 PCI component embedded in a Lenovo device? */
+	if (dev->subsystem_vendor != PCI_VENDOR_ID_LENOVO)
+		return;
+
+	/* Is this JHL6540 PCI component embedded in an X1 Carbon Gen 7/8? */
+	if (dev->subsystem_device != 0x22be && // Gen 8
+	    dev->subsystem_device != 0x2292) { // Gen 7
+		return;
+	}
+
+	dev->external_facing = false;
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x15d3, carbon_X1_fixup_relabel_alpine_ridge);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x15d2, carbon_X1_fixup_relabel_alpine_ridge);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x15d4, carbon_X1_fixup_relabel_alpine_ridge);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x02b4, carbon_X1_fixup_rootport_not_removable);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x9db4, carbon_X1_fixup_rootport_not_removable);
+
 /*
  * Following are device-specific reset methods which can be used to
  * reset a single function if other methods (e.g. FLR, PM D0->D3) are