Avoid FLR for AMD Starship/Matisse Cryptographic Coprocessor

Message ID e30f1c18-3189-774f-054e-8499ade9e398@jaundrew.com
State New
Series Avoid FLR for AMD Starship/Matisse Cryptographic Coprocessor

Commit Message

David Jaundrew Aug. 31, 2021, 5:17 a.m. UTC
This patch fixes another FLR bug for the Starship/Matisse controller:

AMD Starship/Matisse Cryptographic Coprocessor PSPCPP

This patch allows functions on the same Starship/Matisse device (such as USB controller, sound card) to properly pass through to a guest OS using vfio-pci. Without this patch, the virtual machine does not boot and eventually times out.

Excerpt from lspci -nn showing crypto function on same device as USB and sound card (which are already listed in quirks.c):
0e:00.1 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Cryptographic Coprocessor PSPCPP [1022:1486]
0e:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller [1022:149c]
0e:00.4 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse HD Audio Controller [1022:1487]

Fix tested successfully on an Asus ROG STRIX X570-E GAMING motherboard with AMD Ryzen 9 3900X.

Comments

Bjorn Helgaas Sept. 28, 2021, 9:45 p.m. UTC | #1
On Mon, Aug 30, 2021 at 10:17:15PM -0700, David Jaundrew wrote:
> This patch fixes another FLR bug for the Starship/Matisse controller:
> 
> AMD Starship/Matisse Cryptographic Coprocessor PSPCPP
> 
> This patch allows functions on the same Starship/Matisse device (such as USB controller, sound card) to properly pass through to a guest OS using vfio-pci. Without this patch, the virtual machine does not boot and eventually times out.
> 
> Excerpt from lspci -nn showing crypto function on same device as USB and sound card (which are already listed in quirks.c):
> 0e:00.1 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Cryptographic Coprocessor PSPCPP [1022:1486]
> 0e:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller [1022:149c]
> 0e:00.4 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse HD Audio Controller [1022:1487]
> 
> Fix tested successfully on an Asus ROG STRIX X570-E GAMING motherboard with AMD Ryzen 9 3900X.

This is missing a signed-off-by and the patch is corrupted somehow:

  04:44:29 ~/linux (pci/virtualization)$ git am m/20210830_david_avoid_flr_for_amd_starship_matisse_cryptographic_coprocessor.mbx
  Applying: Avoid FLR for AMD Starship/Matisse Cryptographic Coprocessor
  error: corrupt patch at line 4
  Patch failed at 0001 Avoid FLR for AMD Starship/Matisse Cryptographic Coprocessor

Can you fix?  If you can at least supply a signed-off-by, I can apply
it manually if necessary.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?id=v5.11#n361

> --- a/drivers/pci/quirks.c      2021-08-30 21:19:25.365738689 -0700
> +++ b/drivers/pci/quirks.c      2021-08-30 21:21:25.802031789 -0700
> @@ -5208,6 +5208,7 @@
>  /*
>   * FLR may cause the following to devices to hang:
>   *
> + * AMD Starship/Matisse Cryptographic Coprocessor PSPCPP 0x1486
>   * AMD Starship/Matisse HD Audio Controller 0x1487
>   * AMD Starship USB 3.0 Host Controller 0x148c
>   * AMD Matisse USB 3.0 Host Controller 0x149c
> @@ -5219,6 +5220,7 @@
>  {
>         dev->dev_flags |= PCI_DEV_FLAGS_NO_FLR_RESET;
>  }
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x1486, quirk_no_flr);
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x1487, quirk_no_flr);
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x148c, quirk_no_flr);
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x149c, quirk_no_flr);

David Jaundrew Sept. 29, 2021, 12:16 a.m. UTC | #2
This patch fixes another FLR bug for the Starship/Matisse controller:

AMD Starship/Matisse Cryptographic Coprocessor PSPCPP

This patch allows functions on the same Starship/Matisse device (such as
USB controller, sound card) to properly pass through to a guest OS using
vfio-pci. Without this patch, the virtual machine does not boot and
eventually times out.

Excerpt from lspci -nn showing crypto function on same device as USB and
sound card (which are already listed in quirks.c):

0e:00.1 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD]
  Starship/Matisse Cryptographic Coprocessor PSPCPP [1022:1486]
0e:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD]
  Matisse USB 3.0 Host Controller [1022:149c]
0e:00.4 Audio device [0403]: Advanced Micro Devices, Inc. [AMD]
  Starship/Matisse HD Audio Controller [1022:1487]

Fix tested successfully on an Asus ROG STRIX X570-E GAMING motherboard
with AMD Ryzen 9 3900X.

Signed-off-by: David Jaundrew <david@jaundrew.com>
---
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 6d74386eadc2..0d19e7aa219a 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -5208,6 +5208,7 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x443, quirk_intel_qat_vf_cap);
 /*
  * FLR may cause the following to devices to hang:
  *
+ * AMD Starship/Matisse Cryptographic Coprocessor PSPCPP 0x1486
  * AMD Starship/Matisse HD Audio Controller 0x1487
  * AMD Starship USB 3.0 Host Controller 0x148c
  * AMD Matisse USB 3.0 Host Controller 0x149c
@@ -5219,6 +5220,7 @@ static void quirk_no_flr(struct pci_dev *dev)
 {
        dev->dev_flags |= PCI_DEV_FLAGS_NO_FLR_RESET;
 }
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x1486, quirk_no_flr);
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x1487, quirk_no_flr);
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x148c, quirk_no_flr);
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x149c, quirk_no_flr);
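
For readers unfamiliar with the quirk mechanism, the effect of the added line can be pictured with a small userspace model. This is only a sketch under stated assumptions, not the kernel's implementation: the `model_*` names, the `FLAG_NO_FLR_RESET` bit, and the table layout are hypothetical stand-ins for `DECLARE_PCI_FIXUP_EARLY()`, `PCI_DEV_FLAGS_NO_FLR_RESET`, and `quirk_no_flr()`. The table lists only the four AMD IDs visible in the patch.

```c
/* Illustrative userspace model only; not the kernel's implementation. */
#include <stddef.h>
#include <stdint.h>

#define VENDOR_AMD 0x1022            /* vendor ID shown as [1022:....] by lspci */
#define FLAG_NO_FLR_RESET (1u << 0)  /* stand-in for PCI_DEV_FLAGS_NO_FLR_RESET */

struct model_dev {
    uint16_t vendor;
    uint16_t device;
    unsigned int flags;
};

struct model_fixup {
    uint16_t vendor;
    uint16_t device;
    void (*hook)(struct model_dev *dev);
};

/* Mirrors the body of quirk_no_flr(): set the "do not use FLR" flag. */
static void model_quirk_no_flr(struct model_dev *dev)
{
    dev->flags |= FLAG_NO_FLR_RESET;
}

/* One entry per DECLARE_PCI_FIXUP_EARLY() line visible in the patch. */
static const struct model_fixup model_fixups[] = {
    { VENDOR_AMD, 0x1486, model_quirk_no_flr },  /* crypto coprocessor (new) */
    { VENDOR_AMD, 0x1487, model_quirk_no_flr },  /* HD audio */
    { VENDOR_AMD, 0x148c, model_quirk_no_flr },  /* Starship USB 3.0 */
    { VENDOR_AMD, 0x149c, model_quirk_no_flr },  /* Matisse USB 3.0 */
};

/* Run every hook whose vendor/device pair matches the device. */
static void model_apply_fixups(struct model_dev *dev)
{
    size_t i;

    for (i = 0; i < sizeof(model_fixups) / sizeof(model_fixups[0]); i++) {
        if (model_fixups[i].vendor == dev->vendor &&
            model_fixups[i].device == dev->device)
            model_fixups[i].hook(dev);
    }
}

/* Returns 1 if FLR may be attempted, 0 if a quirk has forbidden it. */
static int model_flr_allowed(const struct model_dev *dev)
{
    return !(dev->flags & FLAG_NO_FLR_RESET);
}
```

In this model, a device with ID 1022:1486 has FLR disallowed after `model_apply_fixups()` runs, while a device that is not in the table is left untouched.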

On 2021-09-28 2:45 p.m., Bjorn Helgaas wrote:
> On Mon, Aug 30, 2021 at 10:17:15PM -0700, David Jaundrew wrote:
>> This patch fixes another FLR bug for the Starship/Matisse controller:
>>
>> AMD Starship/Matisse Cryptographic Coprocessor PSPCPP
>>
>> This patch allows functions on the same Starship/Matisse device (such as USB controller, sound card) to properly pass through to a guest OS using vfio-pci. Without this patch, the virtual machine does not boot and eventually times out.
>>
>> Excerpt from lspci -nn showing crypto function on same device as USB and sound card (which are already listed in quirks.c):
>> 0e:00.1 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Cryptographic Coprocessor PSPCPP [1022:1486]
>> 0e:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller [1022:149c]
>> 0e:00.4 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse HD Audio Controller [1022:1487]
>>
>> Fix tested successfully on an Asus ROG STRIX X570-E GAMING motherboard with AMD Ryzen 9 3900X.
> This is missing a signed-off-by and the patch is corrupted somehow:
>
>   04:44:29 ~/linux (pci/virtualization)$ git am m/20210830_david_avoid_flr_for_amd_starship_matisse_cryptographic_coprocessor.mbx
>   Applying: Avoid FLR for AMD Starship/Matisse Cryptographic Coprocessor
>   error: corrupt patch at line 4
>   Patch failed at 0001 Avoid FLR for AMD Starship/Matisse Cryptographic Coprocessor
>
> Can you fix?  If you can at least supply a signed-off-by, I can apply
> it manually if necessary.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?id=v5.11#n361
>
>> --- a/drivers/pci/quirks.c      2021-08-30 21:19:25.365738689 -0700
>> +++ b/drivers/pci/quirks.c      2021-08-30 21:21:25.802031789 -0700
>> @@ -5208,6 +5208,7 @@
>>  /*
>>   * FLR may cause the following to devices to hang:
>>   *
>> + * AMD Starship/Matisse Cryptographic Coprocessor PSPCPP 0x1486
>>   * AMD Starship/Matisse HD Audio Controller 0x1487
>>   * AMD Starship USB 3.0 Host Controller 0x148c
>>   * AMD Matisse USB 3.0 Host Controller 0x149c
>> @@ -5219,6 +5220,7 @@
>>  {
>>         dev->dev_flags |= PCI_DEV_FLAGS_NO_FLR_RESET;
>>  }
>> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x1486, quirk_no_flr);
>>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x1487, quirk_no_flr);
>>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x148c, quirk_no_flr);
>>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x149c, quirk_no_flr);

Krzysztof Wilczyński Sept. 29, 2021, 12:21 a.m. UTC | #3
Hi David,

Just a note: it might have been better to send this as v2, but given how
small this patch is, Bjorn might be fine with taking it as-is.

> This patch fixes another FLR bug for the Starship/Matisse controller:
> 
> AMD Starship/Matisse Cryptographic Coprocessor PSPCPP
> 
> This patch allows functions on the same Starship/Matisse device (such as
> USB controller, sound card) to properly pass through to a guest OS using
> vfio-pci. Without this patch, the virtual machine does not boot and
> eventually times out.
> 
> Excerpt from lspci -nn showing crypto function on same device as USB and
> sound card (which are already listed in quirks.c):
> 
> 0e:00.1 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD]
>   Starship/Matisse Cryptographic Coprocessor PSPCPP [1022:1486]
> 0e:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD]
>   Matisse USB 3.0 Host Controller [1022:149c]
> 0e:00.4 Audio device [0403]: Advanced Micro Devices, Inc. [AMD]
>   Starship/Matisse HD Audio Controller [1022:1487]
> 
> Fix tested successfully on an Asus ROG STRIX X570-E GAMING motherboard
> with AMD Ryzen 9 3900X.
> 
> Signed-off-by: David Jaundrew <david@jaundrew.com>
> ---
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 6d74386eadc2..0d19e7aa219a 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -5208,6 +5208,7 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x443, quirk_intel_qat_vf_cap);
>  /*
>   * FLR may cause the following to devices to hang:
>   *
> + * AMD Starship/Matisse Cryptographic Coprocessor PSPCPP 0x1486
>   * AMD Starship/Matisse HD Audio Controller 0x1487
>   * AMD Starship USB 3.0 Host Controller 0x148c
>   * AMD Matisse USB 3.0 Host Controller 0x149c
> @@ -5219,6 +5220,7 @@ static void quirk_no_flr(struct pci_dev *dev)
>  {
>         dev->dev_flags |= PCI_DEV_FLAGS_NO_FLR_RESET;
>  }
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x1486, quirk_no_flr);
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x1487, quirk_no_flr);
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x148c, quirk_no_flr);
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x149c, quirk_no_flr);

Thank you!

Reviewed-by: Krzysztof Wilczyński <kw@linux.com>

	Krzysztof

Bjorn Helgaas Sept. 29, 2021, 1:59 a.m. UTC | #4
[+cc Alex, Krzysztof, AMD folks]

On Tue, Sep 28, 2021 at 05:16:49PM -0700, David Jaundrew wrote:
> This patch fixes another FLR bug for the Starship/Matisse controller:
> 
> AMD Starship/Matisse Cryptographic Coprocessor PSPCPP
> 
> This patch allows functions on the same Starship/Matisse device (such as
> USB controller, sound card) to properly pass through to a guest OS using
> vfio-pci. Without this patch, the virtual machine does not boot and
> eventually times out.

Apparently yet another AMD device that advertises FLR support, but it
doesn't work?

I don't have a problem avoiding the FLR, but I *would* like some
indication that:

  - This is a known erratum and AMD has some plan to fix this in
    future devices so we don't have to trip over them all
    individually, and

  - This is not a security issue.  Part of the reason VFIO resets
    pass-through devices is to avoid leaking state from one guest to
    another.  If reset doesn't work, that makes me wonder, especially
    since this is a cryptographic coprocessor that sounds like it
    might be full of secrets.  So I *assume* VFIO will use a different
    type of reset instead of FLR, but I'm just double-checking.
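
The question in the second bullet can be made concrete with a small model of reset selection. This is a sketch under stated assumptions, not the kernel's reset code: the `model_*` names, the struct fields, and the three-step ordering (FLR, then a power-management reset, then a bus reset) are simplifications chosen for illustration, and nothing in this thread says which of these methods the coprocessor actually offers.

```c
/* Illustrative model of choosing a reset method; not the kernel's code. */
#include <stddef.h>

#define FLAG_NO_FLR_RESET (1u << 0)  /* stand-in for PCI_DEV_FLAGS_NO_FLR_RESET */

enum model_reset { RESET_NONE, RESET_FLR, RESET_PM, RESET_BUS };

struct model_dev {
    unsigned int flags;
    int has_flr_cap;    /* device advertises FLR support */
    int has_pm_reset;   /* hypothetical: a power-state transition resets it */
    int alone_on_bus;   /* a bus reset would disturb no other function */
};

/* Pick the first usable method, or RESET_NONE if nothing is available. */
static enum model_reset model_pick_reset(const struct model_dev *dev)
{
    /* The quirk makes FLR unusable even though the device advertises it. */
    if (dev->has_flr_cap && !(dev->flags & FLAG_NO_FLR_RESET))
        return RESET_FLR;
    if (dev->has_pm_reset)
        return RESET_PM;
    if (dev->alone_on_bus)
        return RESET_BUS;
    return RESET_NONE;
}
```

The case that matters for the security question is the last one: a function on a multi-function device with FLR quirked off and no other method gets `RESET_NONE`, meaning it would be handed over without any reset.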

> Excerpt from lspci -nn showing crypto function on same device as USB and
> sound card (which are already listed in quirks.c):
> 
> 0e:00.1 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD]
>   Starship/Matisse Cryptographic Coprocessor PSPCPP [1022:1486]
> 0e:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD]
>   Matisse USB 3.0 Host Controller [1022:149c]
> 0e:00.4 Audio device [0403]: Advanced Micro Devices, Inc. [AMD]
>   Starship/Matisse HD Audio Controller [1022:1487]
> 
> Fix tested successfully on an Asus ROG STRIX X570-E GAMING motherboard
> with AMD Ryzen 9 3900X.
> 
> Signed-off-by: David Jaundrew <david@jaundrew.com>

The patch below still doesn't apply.  Looks like maybe it was pasted
into the email and the tabs got changed to spaces?  No worries, I can
apply it manually if appropriate.

> ---
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 6d74386eadc2..0d19e7aa219a 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -5208,6 +5208,7 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x443, quirk_intel_qat_vf_cap);
>  /*
>   * FLR may cause the following to devices to hang:
>   *
> + * AMD Starship/Matisse Cryptographic Coprocessor PSPCPP 0x1486
>   * AMD Starship/Matisse HD Audio Controller 0x1487
>   * AMD Starship USB 3.0 Host Controller 0x148c
>   * AMD Matisse USB 3.0 Host Controller 0x149c
> @@ -5219,6 +5220,7 @@ static void quirk_no_flr(struct pci_dev *dev)
>  {
>         dev->dev_flags |= PCI_DEV_FLAGS_NO_FLR_RESET;
>  }
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x1486, quirk_no_flr);
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x1487, quirk_no_flr);
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x148c, quirk_no_flr);
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x149c, quirk_no_flr);
>

Deucher, Alexander Sept. 29, 2021, 1:09 p.m. UTC | #5
[AMD Official Use Only]

> -----Original Message-----
> From: Bjorn Helgaas <helgaas@kernel.org>
> Sent: Tuesday, September 28, 2021 9:59 PM
> To: David Jaundrew <david@jaundrew.com>
> Cc: Bjorn Helgaas <bhelgaas@google.com>; linux-pci@vger.kernel.org; Alex
> Williamson <alex.williamson@redhat.com>; Deucher, Alexander
> <Alexander.Deucher@amd.com>; Shah, Nehal-bakulchandra <Nehal-
> bakulchandra.Shah@amd.com>; Koenig, Christian
> <Christian.Koenig@amd.com>; Krzysztof Wilczyński <kw@linux.com>
> Subject: Re: [PATCH] Avoid FLR for AMD Starship/Matisse Cryptographic
> Coprocessor
> 
> [+cc Alex, Krzysztof, AMD folks]
> 
> On Tue, Sep 28, 2021 at 05:16:49PM -0700, David Jaundrew wrote:
> > This patch fixes another FLR bug for the Starship/Matisse controller:
> >
> > AMD Starship/Matisse Cryptographic Coprocessor PSPCPP
> >
> > This patch allows functions on the same Starship/Matisse device (such
> > as USB controller, sound card) to properly pass through to a guest OS
> > using vfio-pci. Without this patch, the virtual machine does not boot
> > and eventually times out.
> 
> Apparently yet another AMD device that advertises FLR support, but it
> doesn't work?
> 
> I don't have a problem avoiding the FLR, but I *would* like some indication
> that:
> 
>   - This is a known erratum and AMD has some plan to fix this in
>     future devices so we don't have to trip over them all
>     individually, and
> 
>   - This is not a security issue.  Part of the reason VFIO resets
>     pass-through devices is to avoid leaking state from one guest to
>     another.  If reset doesn't work, that makes me wonder, especially
>     since this is a cryptographic coprocessor that sounds like it
>     might be full of secrets.  So I *assume* VFIO will use a different
>     type of reset instead of FLR, but I'm just double-checking.
> 

Will try and get more information on these questions with the right teams internally.

Alex

> > Excerpt from lspci -nn showing crypto function on same device as USB
> > and sound card (which are already listed in quirks.c):
> >
> > 0e:00.1 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD]
> >   Starship/Matisse Cryptographic Coprocessor PSPCPP [1022:1486]
> > 0e:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD]
> >   Matisse USB 3.0 Host Controller [1022:149c]
> > 0e:00.4 Audio device [0403]: Advanced Micro Devices, Inc. [AMD]
> >   Starship/Matisse HD Audio Controller [1022:1487]
> >
> > Fix tested successfully on an Asus ROG STRIX X570-E GAMING motherboard
> > with AMD Ryzen 9 3900X.
> >
> > Signed-off-by: David Jaundrew <david@jaundrew.com>
> 
> The patch below still doesn't apply.  Looks like maybe it was pasted into the
> email and the tabs got changed to spaces?  No worries, I can apply it manually
> if appropriate.
> 
> > ---
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index 6d74386eadc2..0d19e7aa219a 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -5208,6 +5208,7 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x443, quirk_intel_qat_vf_cap);
> >  /*
> >   * FLR may cause the following to devices to hang:
> >   *
> > + * AMD Starship/Matisse Cryptographic Coprocessor PSPCPP 0x1486
> >   * AMD Starship/Matisse HD Audio Controller 0x1487
> >   * AMD Starship USB 3.0 Host Controller 0x148c
> >   * AMD Matisse USB 3.0 Host Controller 0x149c
> > @@ -5219,6 +5220,7 @@ static void quirk_no_flr(struct pci_dev *dev)
> >  {
> >         dev->dev_flags |= PCI_DEV_FLAGS_NO_FLR_RESET;
> >  }
> > +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x1486, quirk_no_flr);
> >  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x1487, quirk_no_flr);
> >  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x148c, quirk_no_flr);
> >  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x149c, quirk_no_flr);
> >

Alex Williamson Sept. 29, 2021, 6:26 p.m. UTC | #6
On Tue, 28 Sep 2021 20:59:02 -0500
Bjorn Helgaas <helgaas@kernel.org> wrote:

> [+cc Alex, Krzysztof, AMD folks]
> 
> On Tue, Sep 28, 2021 at 05:16:49PM -0700, David Jaundrew wrote:
> > This patch fixes another FLR bug for the Starship/Matisse controller:
> > 
> > AMD Starship/Matisse Cryptographic Coprocessor PSPCPP
> > 
> > This patch allows functions on the same Starship/Matisse device (such as
> > USB controller, sound card) to properly pass through to a guest OS using
> > vfio-pci. Without this patch, the virtual machine does not boot and
> > eventually times out.  
> 
> Apparently yet another AMD device that advertises FLR support, but it
> doesn't work?
> 
> I don't have a problem avoiding the FLR, but I *would* like some
> indication that:
> 
>   - This is a known erratum and AMD has some plan to fix this in
>     future devices so we don't have to trip over them all
>     individually, and
> 
>   - This is not a security issue.  Part of the reason VFIO resets
>     pass-through devices is to avoid leaking state from one guest to
>     another.  If reset doesn't work, that makes me wonder, especially
>     since this is a cryptographic coprocessor that sounds like it
>     might be full of secrets.  So I *assume* VFIO will use a different
>     type of reset instead of FLR, but I'm just double-checking.

It depends on what's available, chances are not good that we have
another means of function level reset, so this probably means it's
exposed as-is.  I agree, not great for a device managing something to
do with cryptography.  It's potentially a better security measure to
let the device wedge itself.  Thanks,

Alex
 
> > Excerpt from lspci -nn showing crypto function on same device as USB and
> > sound card (which are already listed in quirks.c):
> > 
> > 0e:00.1 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD]
> >   Starship/Matisse Cryptographic Coprocessor PSPCPP [1022:1486]
> > 0e:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD]
> >   Matisse USB 3.0 Host Controller [1022:149c]
> > 0e:00.4 Audio device [0403]: Advanced Micro Devices, Inc. [AMD]
> >   Starship/Matisse HD Audio Controller [1022:1487]
> > 
> > Fix tested successfully on an Asus ROG STRIX X570-E GAMING motherboard
> > with AMD Ryzen 9 3900X.
> > 
> > Signed-off-by: David Jaundrew <david@jaundrew.com>  
> 
> The patch below still doesn't apply.  Looks like maybe it was pasted
> into the email and the tabs got changed to spaces?  No worries, I can
> apply it manually if appropriate.
> 
> > ---
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index 6d74386eadc2..0d19e7aa219a 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -5208,6 +5208,7 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x443, quirk_intel_qat_vf_cap);
> >  /*
> >   * FLR may cause the following to devices to hang:
> >   *
> > + * AMD Starship/Matisse Cryptographic Coprocessor PSPCPP 0x1486
> >   * AMD Starship/Matisse HD Audio Controller 0x1487
> >   * AMD Starship USB 3.0 Host Controller 0x148c
> >   * AMD Matisse USB 3.0 Host Controller 0x149c
> > @@ -5219,6 +5220,7 @@ static void quirk_no_flr(struct pci_dev *dev)
> >  {
> >         dev->dev_flags |= PCI_DEV_FLAGS_NO_FLR_RESET;
> >  }
> > +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x1486, quirk_no_flr);
> >  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x1487, quirk_no_flr);
> >  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x148c, quirk_no_flr);
> >  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x149c, quirk_no_flr);
> >   
>

Bjorn Helgaas Sept. 29, 2021, 6:50 p.m. UTC | #7
On Wed, Sep 29, 2021 at 12:26:12PM -0600, Alex Williamson wrote:
> On Tue, 28 Sep 2021 20:59:02 -0500
> Bjorn Helgaas <helgaas@kernel.org> wrote:
> 
> > [+cc Alex, Krzysztof, AMD folks]
> > 
> > On Tue, Sep 28, 2021 at 05:16:49PM -0700, David Jaundrew wrote:
> > > This patch fixes another FLR bug for the Starship/Matisse controller:
> > > 
> > > > AMD Starship/Matisse Cryptographic Coprocessor PSPCPP
> > > > 
> > > > This patch allows functions on the same Starship/Matisse device (such as
> > > > USB controller, sound card) to properly pass through to a guest OS using
> > > > vfio-pci. Without this patch, the virtual machine does not boot and
> > > eventually times out.  
> > 
> > Apparently yet another AMD device that advertises FLR support, but it
> > doesn't work?
> > 
> > I don't have a problem avoiding the FLR, but I *would* like some
> > indication that:
> > 
> >   - This is a known erratum and AMD has some plan to fix this in
> >     future devices so we don't have to trip over them all
> >     individually, and
> > 
> >   - This is not a security issue.  Part of the reason VFIO resets
> >     pass-through devices is to avoid leaking state from one guest to
> >     another.  If reset doesn't work, that makes me wonder, especially
> >     since this is a cryptographic coprocessor that sounds like it
> >     might be full of secrets.  So I *assume* VFIO will use a different
> >     type of reset instead of FLR, but I'm just double-checking.
> 
> It depends on what's available, chances are not good that we have
> another means of function level reset, so this probably means it's
> exposed as-is.  I agree, not great for a device managing something to
> do with cryptography.  It's potentially a better security measure to
> let the device wedge itself.  Thanks,

OK, I think that means I need to ignore this patch until we have some
evidence that it's actually safe to allow VFIO to pass the device
through to another guest.

And I guess we are making the assumption that the audio, USB, and
ethernet controllers [1] are safe to hand off between guests?  I don't
know enough about those controllers to even have an opinion about
that.  Surely there is config space and MMIO space that could leak
data between guests?

Is there anything that tracks whether the device has been reset after
being passed through to a guest?  For example, I assume the following
would be safe if we could tell the reset had been done:

  - Pass through to guest A
  - Guest A exits
  - User resets all devices on bus (including this one)
  - Pass through to guest B
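
The sequence above can be sketched as a tiny state model. This is an assumption-laden illustration, not VFIO's implementation: the `model_*` names and the single `dirty` flag are hypothetical, and the sketch simply refuses a second assignment until a reset has been recorded.

```c
/* Illustrative model of "was the device reset since the last guest?" */
#include <stdbool.h>

struct model_assigned_dev {
    bool dirty;  /* a guest has used the device and no reset has followed */
};

/* Hand the device to a guest; refuse if state from a prior guest remains. */
static bool model_assign(struct model_assigned_dev *d)
{
    if (d->dirty)
        return false;
    d->dirty = true;  /* from now on the guest may leave state behind */
    return true;
}

/* Guest exit alone does not clean the device, so nothing changes here. */
static void model_guest_exit(struct model_assigned_dev *d)
{
    (void)d;
}

/* A reset that covers this device (for example a bus reset) clears it. */
static void model_record_reset(struct model_assigned_dev *d)
{
    d->dirty = false;
}
```

With this model, assigning to guest A succeeds, a second assignment right after guest A exits is refused, and assignment to guest B succeeds once `model_record_reset()` has been called.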

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/quirks.c?id=v5.14#n5212

> > > Excerpt from lspci -nn showing crypto function on same device as USB and
> > > sound card (which are already listed in quirks.c):
> > > 
> > > 0e:00.1 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD]
> > >   Starship/Matisse Cryptographic Coprocessor PSPCPP [1022:1486]
> > > 0e:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD]
> > >   Matisse USB 3.0 Host Controller [1022:149c]
> > > 0e:00.4 Audio device [0403]: Advanced Micro Devices, Inc. [AMD]
> > >   Starship/Matisse HD Audio Controller [1022:1487]
> > > 
> > > Fix tested successfully on an Asus ROG STRIX X570-E GAMING motherboard
> > > with AMD Ryzen 9 3900X.
> > > 
> > > Signed-off-by: David Jaundrew <david@jaundrew.com>  
> > 
> > The patch below still doesn't apply.  Looks like maybe it was pasted
> > > into the email and the tabs got changed to spaces?  No worries, I can
> > apply it manually if appropriate.
> > 
> > > ---
> > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > > index 6d74386eadc2..0d19e7aa219a 100644
> > > --- a/drivers/pci/quirks.c
> > > +++ b/drivers/pci/quirks.c
> > > @@ -5208,6 +5208,7 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x443, quirk_intel_qat_vf_cap);
> > >  /*
> > >   * FLR may cause the following to devices to hang:
> > >   *
> > > + * AMD Starship/Matisse Cryptographic Coprocessor PSPCPP 0x1486
> > >   * AMD Starship/Matisse HD Audio Controller 0x1487
> > >   * AMD Starship USB 3.0 Host Controller 0x148c
> > >   * AMD Matisse USB 3.0 Host Controller 0x149c
> > > @@ -5219,6 +5220,7 @@ static void quirk_no_flr(struct pci_dev *dev)
> > >  {
> > >         dev->dev_flags |= PCI_DEV_FLAGS_NO_FLR_RESET;
> > >  }
> > > +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x1486, quirk_no_flr);
> > >  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x1487, quirk_no_flr);
> > >  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x148c, quirk_no_flr);
> > >  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x149c, quirk_no_flr);

Alex Williamson Sept. 29, 2021, 7:34 p.m. UTC | #8
On Wed, 29 Sep 2021 13:50:29 -0500
Bjorn Helgaas <helgaas@kernel.org> wrote:

> On Wed, Sep 29, 2021 at 12:26:12PM -0600, Alex Williamson wrote:
> > On Tue, 28 Sep 2021 20:59:02 -0500
> > Bjorn Helgaas <helgaas@kernel.org> wrote:
> >   
> > > [+cc Alex, Krzysztof, AMD folks]
> > > 
> > > On Tue, Sep 28, 2021 at 05:16:49PM -0700, David Jaundrew wrote:  
> > > > This patch fixes another FLR bug for the Starship/Matisse controller:
> > > > 
> > > > > AMD Starship/Matisse Cryptographic Coprocessor PSPCPP
> > > > > 
> > > > > This patch allows functions on the same Starship/Matisse device (such as
> > > > > USB controller, sound card) to properly pass through to a guest OS using
> > > > > vfio-pci. Without this patch, the virtual machine does not boot and
> > > > eventually times out.    
> > > 
> > > Apparently yet another AMD device that advertises FLR support, but it
> > > doesn't work?
> > > 
> > > I don't have a problem avoiding the FLR, but I *would* like some
> > > indication that:
> > > 
> > >   - This is a known erratum and AMD has some plan to fix this in
> > >     future devices so we don't have to trip over them all
> > >     individually, and
> > > 
> > >   - This is not a security issue.  Part of the reason VFIO resets
> > >     pass-through devices is to avoid leaking state from one guest to
> > >     another.  If reset doesn't work, that makes me wonder, especially
> > >     since this is a cryptographic coprocessor that sounds like it
> > >     might be full of secrets.  So I *assume* VFIO will use a different
> > >     type of reset instead of FLR, but I'm just double-checking.  
> > 
> > It depends on what's available, chances are not good that we have
> > another means of function level reset, so this probably means it's
> > exposed as-is.  I agree, not great for a device managing something to
> > do with cryptography.  It's potentially a better security measure to
> > let the device wedge itself.  Thanks,  
> 
> OK, I think that means I need to ignore this patch until we have some
> evidence that it's actually safe to allow VFIO to pass the device
> through to another guest.
> 
> And I guess we are making the assumption that the audio, USB, and
> ethernet controllers [1] are safe to hand off between guests?  I don't
> know enough about those controllers to even have an opinion about
> that.  Surely there is config space and MMIO space that could leak
> data between guests?

The expectation is that there's a lot less potential for such devices.
If we were to try to restrict assignment to absolutely secure devices
we'd probably rule out anything that's not a VF and then start
excluding things from there.  Even with proper resets, there's a
potential that a user could muck with non-volatile device state (ex.
option ROMs).
 
> Is there anything that tracks whether the device has been reset after
> being passed through to a guest?  For example, I assume the following
> would be safe if we could tell the reset had been done:
> 
>   - Pass through to guest A
>   - Guest A exits
>   - User resets all devices on bus (including this one)
>   - Pass through to guest B

Yes, we do track whether a device has been reset, but in the kernel
rather than in userspace, since the latter might just invite userspace
to exploit an mmap post-reset in order to insert a payload.

Our tracking is more for the purpose of trying to do a bus reset once
all of the devices affected are released from userspace.  For example
if the multi-function set includes this crypto device, the USB
controller, and audio controller, once the user releases the last of
those, if any of them still require a reset we'd perform a bus reset.
However, if these devices are actually in separate IOMMU groups (reset
scope is not accounted for in grouping), then the user could release the
crypto device and it could be re-assigned to a new user and we never
get a chance to reset the bus.  I don't know what grouping looks like
among these devices.  Thanks,

Alex
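
The deferred bus reset described above can be sketched as follows. This is a simplified model under stated assumptions, not the vfio-pci code: the `model_*` names and the two per-function flags are hypothetical, and the set is assumed to be exactly the functions that one bus reset would affect.

```c
/* Illustrative model of deferring a bus reset until all functions are free. */
#include <stdbool.h>
#include <stddef.h>

struct model_fn {
    bool in_use;       /* currently held by a user */
    bool needs_reset;  /* released without a working function-level reset */
};

/*
 * Release function idx. fn_reset_ok says whether a function-level reset
 * succeeded on release. Returns true if a bus reset was performed.
 */
static bool model_release(struct model_fn *set, size_t n, size_t idx,
                          bool fn_reset_ok)
{
    bool any_needs_reset = false;
    size_t i;

    set[idx].in_use = false;
    set[idx].needs_reset = !fn_reset_ok;

    for (i = 0; i < n; i++) {
        if (set[i].in_use)
            return false;  /* a bus reset would disturb an active user */
        if (set[i].needs_reset)
            any_needs_reset = true;
    }
    if (!any_needs_reset)
        return false;

    /* Last user is gone: one bus reset cleans every function in the set. */
    for (i = 0; i < n; i++)
        set[i].needs_reset = false;
    return true;
}
```

If the crypto function is released first while the USB and audio functions are still in use, it stays marked `needs_reset`. That is the window in which, with separate IOMMU groups, it could be reassigned before any bus reset happens.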

Deucher, Alexander Sept. 29, 2021, 8:07 p.m. UTC | #9
[Public]

> -----Original Message-----
> From: Bjorn Helgaas <helgaas@kernel.org>
> Sent: Wednesday, September 29, 2021 2:50 PM
> To: Alex Williamson <alex.williamson@redhat.com>
> Cc: David Jaundrew <david@jaundrew.com>; Bjorn Helgaas
> <bhelgaas@google.com>; linux-pci@vger.kernel.org; Deucher, Alexander
> <Alexander.Deucher@amd.com>; Shah, Nehal-bakulchandra <Nehal-
> bakulchandra.Shah@amd.com>; Koenig, Christian
> <Christian.Koenig@amd.com>; Krzysztof Wilczyński <kw@linux.com>
> Subject: Re: [PATCH] Avoid FLR for AMD Starship/Matisse Cryptographic
> Coprocessor
> 
> On Wed, Sep 29, 2021 at 12:26:12PM -0600, Alex Williamson wrote:
> > On Tue, 28 Sep 2021 20:59:02 -0500
> > Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > > [+cc Alex, Krzysztof, AMD folks]
> > >
> > > On Tue, Sep 28, 2021 at 05:16:49PM -0700, David Jaundrew wrote:
> > > > This patch fixes another FLR bug for the Starship/Matisse controller:
> > > >
> > > > AMD Starship/Matisse Cryptographic Coprocessor PSPCPP
> > > >
> > > > This patch allows functions on the same Starship/Matisse device
> > > > (such as USB controller, sound card) to properly pass through to a
> > > > guest OS using vfio-pci. Without this patch, the virtual machine
> > > > does not boot and eventually times out.
> > >
> > > Apparently yet another AMD device that advertises FLR support, but
> > > it doesn't work?
> > >
> > > I don't have a problem avoiding the FLR, but I *would* like some
> > > indication that:
> > >
> > >   - This is a known erratum and AMD has some plan to fix this in
> > >     future devices so we don't have to trip over them all
> > >     individually, and
> > >
> > >   - This is not a security issue.  Part of the reason VFIO resets
> > >     pass-through devices is to avoid leaking state from one guest to
> > >     another.  If reset doesn't work, that makes me wonder, especially
> > >     since this is a cryptographic coprocessor that sounds like it
> > >     might be full of secrets.  So I *assume* VFIO will use a different
> > >     type of reset instead of FLR, but I'm just double-checking.
> >
> > It depends on what's available, chances are not good that we have
> > another means of function level reset, so this probably means it's
> > exposed as-is.  I agree, not great for a device managing something to
> > do with cryptography.  It's potentially a better security measure to
> > let the device wedge itself.  Thanks,
> 
> OK, I think that means I need to ignore this patch until we have some
> evidence that it's actually safe to allow VFIO to pass the device through to
> another guest.
> 
> And I guess we are making the assumption that the audio, USB, and ethernet
> controllers [1] are safe to hand off between guests?  I don't know enough
> about those controllers to even have an opinion about that.  Surely there is
> config space and MMIO space that could leak data between guests?

Adding a few more AMD people.

I doubt FLR was intended to be enabled on these consumer parts; it was probably a mistake.  I'm trying to find out more internally.  In general, I suspect most vendors don't do much passthrough validation on consumer-level hardware.

> 
> Is there anything that tracks whether the device has been reset after being
> passed through to a guest?  For example, I assume the following would be
> safe if we could tell the reset had been done:
> 
>   - Pass through to guest A
>   - Guest A exits
>   - User resets all devices on bus (including this one)
>   - Pass through to guest B
> 

Probably the safest thing to do would be to not allow passthrough on any PCI devices which don't have functional FLR support.  Do other things like PCI hot resets make the same guarantees about device state that FLR does?

Alex

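The hand-off sequence quoted above (guest A exits, the device is reset, then guest B gets it) could be modeled with a per-device "dirty" flag. This is only a userspace sketch under stated assumptions: every name in it (`sketch_dev_state`, `sketch_assign`, `sketch_release`, `sketch_mark_reset`) is hypothetical, nothing here is claimed to exist in VFIO or the PCI core, and it assumes the reset that clears the flag really does scrub device state, which is exactly the open question in this thread.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-device bookkeeping; not a kernel structure. */
struct sketch_dev_state {
	bool assigned; /* currently owned by a guest */
	bool dirty;    /* a guest has owned it and no reset has happened since */
};

/* Hand the device to a guest only if it is free and has been reset. */
int sketch_assign(struct sketch_dev_state *s)
{
	if (s->assigned)
		return -EBUSY;
	if (s->dirty)
		return -EPERM; /* state from the previous guest may remain */
	s->assigned = true;
	s->dirty = true;   /* the guest may now leave state behind */
	return 0;
}

/* Guest exits: the device is free again but still considered dirty. */
void sketch_release(struct sketch_dev_state *s)
{
	s->assigned = false;
}

/*
 * Record that some trusted reset (assumed here to be as thorough as a
 * working FLR) has completed. Ignored while a guest still owns the device.
 */
void sketch_mark_reset(struct sketch_dev_state *s)
{
	if (!s->assigned)
		s->dirty = false;
}
```

With this model, a second `sketch_assign()` after `sketch_release()` fails with `-EPERM` until `sketch_mark_reset()` has been called, which mirrors the "user resets all devices on bus" step in the list.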

> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/quirks.c?id=v5.14#n5212
> 
> > > > Excerpt from lspci -nn showing crypto function on same device as
> > > > USB and sound card (which are already listed in quirks.c):
> > > >
> > > > > 0e:00.1 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD]
> > > > >   Starship/Matisse Cryptographic Coprocessor PSPCPP [1022:1486]
> > > > > 0e:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD]
> > > > >   Matisse USB 3.0 Host Controller [1022:149c]
> > > > > 0e:00.4 Audio device [0403]: Advanced Micro Devices, Inc. [AMD]
> > > > >   Starship/Matisse HD Audio Controller [1022:1487]
> > > >
> > > > Fix tested successfully on an Asus ROG STRIX X570-E GAMING
> > > > motherboard with AMD Ryzen 9 3900X.
> > > >
> > > > Signed-off-by: David Jaundrew <david@jaundrew.com>
> > >
> > > The patch below still doesn't apply.  Looks like maybe it was pasted
> > > into the email and the tabs got changed to space?  No worries, I can
> > > apply it manually if appropriate.
> > >
> > > > ---
> > > > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > > > > index 6d74386eadc2..0d19e7aa219a 100644
> > > > > --- a/drivers/pci/quirks.c
> > > > > +++ b/drivers/pci/quirks.c
> > > > > @@ -5208,6 +5208,7 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x443, quirk_intel_qat_vf_cap);
> > > > >  /*
> > > > >   * FLR may cause the following to devices to hang:
> > > > >   *
> > > > > + * AMD Starship/Matisse Cryptographic Coprocessor PSPCPP 0x1486
> > > > >   * AMD Starship/Matisse HD Audio Controller 0x1487
> > > > >   * AMD Starship USB 3.0 Host Controller 0x148c
> > > > >   * AMD Matisse USB 3.0 Host Controller 0x149c
> > > > > @@ -5219,6 +5220,7 @@ static void quirk_no_flr(struct pci_dev *dev)
> > > > >  {
> > > > >         dev->dev_flags |= PCI_DEV_FLAGS_NO_FLR_RESET;
> > > > >  }
> > > > > +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x1486, quirk_no_flr);
> > > > >  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x1487, quirk_no_flr);
> > > > >  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x148c, quirk_no_flr);
> > > > >  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x149c, quirk_no_flr);

Patch

--- a/drivers/pci/quirks.c      2021-08-30 21:19:25.365738689 -0700
+++ b/drivers/pci/quirks.c      2021-08-30 21:21:25.802031789 -0700
@@ -5208,6 +5208,7 @@ 
 /*
  * FLR may cause the following to devices to hang:
  *
+ * AMD Starship/Matisse Cryptographic Coprocessor PSPCPP 0x1486
  * AMD Starship/Matisse HD Audio Controller 0x1487
  * AMD Starship USB 3.0 Host Controller 0x148c
  * AMD Matisse USB 3.0 Host Controller 0x149c
@@ -5219,6 +5220,7 @@ 
 {
        dev->dev_flags |= PCI_DEV_FLAGS_NO_FLR_RESET;
 }
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x1486, quirk_no_flr);
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x1487, quirk_no_flr);
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x148c, quirk_no_flr);
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x149c, quirk_no_flr);
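For readers unfamiliar with how a quirk like this takes effect, the following is a rough userspace model of the idea: a table of (vendor, device) entries runs a hook that sets a "no FLR" flag, and a later reset attempt declines FLR when the flag is set. It is a sketch, not kernel code. All `sketch_*` and `SKETCH_*` names are hypothetical stand-ins, the `has_flr_cap` field and the `-ENOTTY` return value are assumptions made for illustration, and only the device IDs (0x1486, 0x1487, 0x148c, 0x149c) and the AMD vendor ID 0x1022 come from the patch and the lspci output above.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; not the kernel's definitions. */
#define SKETCH_VENDOR_ID_AMD 0x1022
#define SKETCH_FLAG_NO_FLR   (1u << 0)

struct sketch_pci_dev {
	uint16_t vendor;
	uint16_t device;
	unsigned int dev_flags;
	int has_flr_cap; /* device advertises FLR (assumed field) */
};

struct sketch_fixup {
	uint16_t vendor;
	uint16_t device;
	void (*hook)(struct sketch_pci_dev *dev);
};

/* Models quirk_no_flr(): mark the device so FLR is never attempted. */
void sketch_quirk_no_flr(struct sketch_pci_dev *dev)
{
	dev->dev_flags |= SKETCH_FLAG_NO_FLR;
}

/* Models the DECLARE_PCI_FIXUP_EARLY() entries touched by the patch. */
const struct sketch_fixup sketch_fixups[] = {
	{ SKETCH_VENDOR_ID_AMD, 0x1486, sketch_quirk_no_flr },
	{ SKETCH_VENDOR_ID_AMD, 0x1487, sketch_quirk_no_flr },
	{ SKETCH_VENDOR_ID_AMD, 0x148c, sketch_quirk_no_flr },
	{ SKETCH_VENDOR_ID_AMD, 0x149c, sketch_quirk_no_flr },
};

/* Run every hook whose vendor and device IDs match this device. */
void sketch_apply_fixups(struct sketch_pci_dev *dev)
{
	size_t n = sizeof(sketch_fixups) / sizeof(sketch_fixups[0]);

	for (size_t i = 0; i < n; i++) {
		if (sketch_fixups[i].vendor == dev->vendor &&
		    sketch_fixups[i].device == dev->device)
			sketch_fixups[i].hook(dev);
	}
}

/* Returns 0 if FLR would be issued, -ENOTTY if it is unavailable or quirked off. */
int sketch_try_flr(const struct sketch_pci_dev *dev)
{
	if (!dev->has_flr_cap)
		return -ENOTTY;
	if (dev->dev_flags & SKETCH_FLAG_NO_FLR)
		return -ENOTTY;
	return 0;
}
```

In this model a device with ID 1022:1486 that advertises FLR still gets `-ENOTTY` from `sketch_try_flr()` once `sketch_apply_fixups()` has run, so a caller would have to fall back to some other reset method or go without one, which is the security concern raised in the comments.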