
[v4] PCI: vmd: Do not change the BIOS Hotplug setting on VMD rootports

Message ID 20230829051022.1328383-1-nirmal.patel@linux.intel.com
State New

Commit Message

Nirmal Patel Aug. 29, 2023, 5:10 a.m. UTC
Currently, during host boot-up, the VMD UEFI driver loads and configures
all the VMD endpoint devices and the devices behind VMD. Then, during
VMD root port creation, the VMD driver honors the ACPI settings for
hotplug, AER, DPC, and PM, and enables these features based on the BIOS
settings.

During guest boot-up, those ACPI settings and the VMD UEFI driver are
not present in the guest BIOS, so hotplug, AER, DPC, etc. are assigned
default values. As a result, hotplug is disabled on the VMD root ports
in the guest OS.

The VMD driver in the guest should see the same settings as the VMD
driver in the host. Because the guest BIOS has no VMD UEFI driver
implementation, hotplug is disabled on the VMD root ports in the guest
OS: hot-inserted drives do not show up, and hot-removed drives do not
disappear, even though VMD supports hotplug in the guest. This behavior
is observed with various combinations of guest OSes (RHEL, SLES) and
hypervisors (KVM and ESXi).

With this change, the VMD host and guest drivers keep the settings
implemented by the UEFI VMD DXE driver, thus honoring the user's
hotplug selections in the BIOS.

Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
---
v3->v4: Rewrite the commit log.
v2->v3: Update the commit log.
v1->v2: Update the commit log.
---
 drivers/pci/controller/vmd.c | 2 --
 1 file changed, 2 deletions(-)
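The effect of the two deleted assignments can be illustrated with a self-contained mock. The struct below is a simplified stand-in for the kernel's `struct pci_host_bridge` (only the feature-ownership flags are modeled; everything else here is an illustration, not kernel code):

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for struct pci_host_bridge: just the feature
 * ownership flags touched by vmd_copy_host_bridge_flags(). */
struct mock_host_bridge {
	bool native_pcie_hotplug;
	bool native_shpc_hotplug;
	bool native_aer;
	bool native_pme;
	bool native_ltr;
};

/* Mirrors the pci_init_host_bridge() defaults: the OS starts out
 * owning every feature natively (all flags true). */
static void mock_init_host_bridge(struct mock_host_bridge *b)
{
	b->native_pcie_hotplug = true;
	b->native_shpc_hotplug = true;
	b->native_aer = true;
	b->native_pme = true;
	b->native_ltr = true;
}

/* vmd_copy_host_bridge_flags() as it looks after this patch: the two
 * hotplug flags keep their defaults; only AER/PME/LTR are inherited
 * from the root bridge. */
static void vmd_copy_host_bridge_flags(const struct mock_host_bridge *root,
				       struct mock_host_bridge *vmd)
{
	/* native_pcie_hotplug / native_shpc_hotplug intentionally not copied */
	vmd->native_aer = root->native_aer;
	vmd->native_pme = root->native_pme;
	vmd->native_ltr = root->native_ltr;
}
```

With a guest root bridge whose _OSC left `native_pcie_hotplug` false, the VMD bridge now keeps hotplug enabled while still inheriting the AER setting.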

Comments

Lorenzo Pieralisi Aug. 29, 2023, 3:12 p.m. UTC | #1
On Tue, Aug 29, 2023 at 01:10:22AM -0400, Nirmal Patel wrote:
> Currently during Host boot up, VMD UEFI driver loads and configures
> all the VMD endpoints devices and devices behind VMD. Then during
> VMD rootport creation, VMD driver honors ACPI settings for Hotplug,
> AER, DPC, PM and enables these features based on BIOS settings.
> 
> During the Guest boot up, ACPI settings along with VMD UEFI driver are
> not present in Guest BIOS which results in assigning default values to
> Hotplug, AER, DPC, etc. As a result hotplug is disabled on the VMD
> rootports in the Guest OS.
> 
> VMD driver in Guest should be able to see the same settings as seen
> by Host VMD driver. Because of the missing implementation of VMD UEFI
> driver in Guest BIOS, the Hotplug is disabled on VMD rootport in
> Guest OS. Hot inserted drives don't show up and hot removed drives
> do not disappear even if VMD supports Hotplug in Guest. This
> behavior is observed in various combinations of guest OSes i.e. RHEL,
> SLES and hypervisors i.e. KVM and ESXI.
> 
> This change will make the VMD Host and Guest Driver to keep the settings
> implemented by the UEFI VMD DXE driver and thus honoring the user
> selections for hotplug in the BIOS.
> 
> Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
> ---
> v3->v4: Rewrite the commit log.
> v2->v3: Update the commit log.
> v1->v2: Update the commit log.
> ---
>  drivers/pci/controller/vmd.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> index 769eedeb8802..52c2461b4761 100644
> --- a/drivers/pci/controller/vmd.c
> +++ b/drivers/pci/controller/vmd.c
> @@ -701,8 +701,6 @@ static int vmd_alloc_irqs(struct vmd_dev *vmd)
>  static void vmd_copy_host_bridge_flags(struct pci_host_bridge *root_bridge,
>  				       struct pci_host_bridge *vmd_bridge)
>  {
> -	vmd_bridge->native_pcie_hotplug = root_bridge->native_pcie_hotplug;
> -	vmd_bridge->native_shpc_hotplug = root_bridge->native_shpc_hotplug;

How is the host bridge probed in the guest if there is no ACPI support ?

I would like to understand this better how this works.

>  	vmd_bridge->native_aer = root_bridge->native_aer;
>  	vmd_bridge->native_pme = root_bridge->native_pme;
>  	vmd_bridge->native_ltr = root_bridge->native_ltr;

I don't get why *only* the hotplug flag should not be copied. Either
you want to preserve them all or none.

I assume the issue is that in the host, the _OSC method is used to
probe for flags whereas in the guest you can't rely on it ?

Is there a use case where you *do* want to copy the flags from the
root_bridge to the vmd_bridge ?

This does not look solid to me.

Thanks,
Lorenzo
Bjorn Helgaas Aug. 29, 2023, 6 p.m. UTC | #2
On Tue, Aug 29, 2023 at 01:10:22AM -0400, Nirmal Patel wrote:
> Currently during Host boot up, VMD UEFI driver loads and configures
> all the VMD endpoints devices and devices behind VMD. Then during
> VMD rootport creation, VMD driver honors ACPI settings for Hotplug,
> AER, DPC, PM and enables these features based on BIOS settings.
> 
> During the Guest boot up, ACPI settings along with VMD UEFI driver are
> not present in Guest BIOS which results in assigning default values to
> Hotplug, AER, DPC, etc. As a result hotplug is disabled on the VMD
> rootports in the Guest OS.
> 
> VMD driver in Guest should be able to see the same settings as seen
> by Host VMD driver. Because of the missing implementation of VMD UEFI
> driver in Guest BIOS, the Hotplug is disabled on VMD rootport in
> Guest OS. Hot inserted drives don't show up and hot removed drives
> do not disappear even if VMD supports Hotplug in Guest. This
> behavior is observed in various combinations of guest OSes i.e. RHEL,
> SLES and hypervisors i.e. KVM and ESXI.
> 
> This change will make the VMD Host and Guest Driver to keep the settings
> implemented by the UEFI VMD DXE driver and thus honoring the user
> selections for hotplug in the BIOS.

These settings are negotiated between the OS and the BIOS.  The guest
runs a different BIOS than the host, so why should the guest setting
be related to the host setting?

I'm not a virtualization whiz, and I don't understand all of what's
going on here, so please correct me when I go wrong:

IIUC you need to change the guest behavior.  The guest currently sees
vmd_bridge->native_pcie_hotplug as FALSE, and you need it to be TRUE?
Currently this is copied from the guest's
root_bridge->native_pcie_hotplug, so that must also be FALSE.

I guess the guest sees a fabricated host bridge, which would start
with native_pcie_hotplug as TRUE (from pci_init_host_bridge()), and
then it must be set to FALSE because the guest _OSC didn't grant
ownership to the OS?  (The guest dmesg should show this, right?)

In the guest, vmd_enable_domain() allocates a host bridge via
pci_create_root_bus(), and that would again start with
native_pcie_hotplug as TRUE.  It's not an ACPI host bridge, so I don't
think we do _OSC negotiation for it.  After this patch removes the
copy from the fabricated host bridge, it would be left as TRUE.

If this is on track, it seems like if we want the guest to own PCIe
hotplug, the guest BIOS _OSC for the fabricated host bridge should
grant ownership of it.
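The guest-side flow described above can be traced with a small mock, under the assumption that the guest _OSC clears hotplug ownership on the fabricated host bridge (the struct is a simplified stand-in, not the kernel definition):

```c
#include <assert.h>
#include <stdbool.h>

struct mock_bridge {
	bool native_pcie_hotplug;
};

/* Step 1: every host bridge starts TRUE, as set up by
 * pci_init_host_bridge() / pci_create_root_bus(). */
static void mock_init(struct mock_bridge *b)
{
	b->native_pcie_hotplug = true;
}

/* Step 2: the guest _OSC on the fabricated host bridge does not grant
 * hotplug ownership to the OS, so the flag is cleared there. */
static void mock_guest_osc(struct mock_bridge *fabricated)
{
	fabricated->native_pcie_hotplug = false;
}

/* Pre-patch: vmd_copy_host_bridge_flags() copied the hotplug flag from
 * the fabricated bridge, so the VMD domain lost hotplug in the guest. */
static bool guest_hotplug_before_patch(void)
{
	struct mock_bridge fabricated, vmd;

	mock_init(&fabricated);
	mock_guest_osc(&fabricated);
	mock_init(&vmd);
	vmd.native_pcie_hotplug = fabricated.native_pcie_hotplug; /* removed copy */
	return vmd.native_pcie_hotplug;
}

/* Post-patch: no copy, so the VMD bridge keeps its TRUE default. */
static bool guest_hotplug_after_patch(void)
{
	struct mock_bridge fabricated, vmd;

	mock_init(&fabricated);
	mock_guest_osc(&fabricated);
	mock_init(&vmd);
	/* hotplug flag intentionally left at its default */
	return vmd.native_pcie_hotplug;
}
```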

> Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
> ---
> v3->v4: Rewrite the commit log.
> v2->v3: Update the commit log.
> v1->v2: Update the commit log.
> ---
>  drivers/pci/controller/vmd.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> index 769eedeb8802..52c2461b4761 100644
> --- a/drivers/pci/controller/vmd.c
> +++ b/drivers/pci/controller/vmd.c
> @@ -701,8 +701,6 @@ static int vmd_alloc_irqs(struct vmd_dev *vmd)
>  static void vmd_copy_host_bridge_flags(struct pci_host_bridge *root_bridge,
>  				       struct pci_host_bridge *vmd_bridge)
>  {
> -	vmd_bridge->native_pcie_hotplug = root_bridge->native_pcie_hotplug;
> -	vmd_bridge->native_shpc_hotplug = root_bridge->native_shpc_hotplug;
>  	vmd_bridge->native_aer = root_bridge->native_aer;
>  	vmd_bridge->native_pme = root_bridge->native_pme;
>  	vmd_bridge->native_ltr = root_bridge->native_ltr;
> -- 
> 2.31.1
>
Nirmal Patel Aug. 29, 2023, 8:54 p.m. UTC | #3
On 8/29/2023 8:12 AM, Lorenzo Pieralisi wrote:
> On Tue, Aug 29, 2023 at 01:10:22AM -0400, Nirmal Patel wrote:
>> Currently during Host boot up, VMD UEFI driver loads and configures
>> all the VMD endpoints devices and devices behind VMD. Then during
>> VMD rootport creation, VMD driver honors ACPI settings for Hotplug,
>> AER, DPC, PM and enables these features based on BIOS settings.
>>
>> During the Guest boot up, ACPI settings along with VMD UEFI driver are
>> not present in Guest BIOS which results in assigning default values to
>> Hotplug, AER, DPC, etc. As a result hotplug is disabled on the VMD
>> rootports in the Guest OS.
>>
>> VMD driver in Guest should be able to see the same settings as seen
>> by Host VMD driver. Because of the missing implementation of VMD UEFI
>> driver in Guest BIOS, the Hotplug is disabled on VMD rootport in
>> Guest OS. Hot inserted drives don't show up and hot removed drives
>> do not disappear even if VMD supports Hotplug in Guest. This
>> behavior is observed in various combinations of guest OSes i.e. RHEL,
>> SLES and hypervisors i.e. KVM and ESXI.
>>
>> This change will make the VMD Host and Guest Driver to keep the settings
>> implemented by the UEFI VMD DXE driver and thus honoring the user
>> selections for hotplug in the BIOS.
>>
>> Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
>> ---
>> v3->v4: Rewrite the commit log.
>> v2->v3: Update the commit log.
>> v1->v2: Update the commit log.
>> ---
>>  drivers/pci/controller/vmd.c | 2 --
>>  1 file changed, 2 deletions(-)
>>
>> diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
>> index 769eedeb8802..52c2461b4761 100644
>> --- a/drivers/pci/controller/vmd.c
>> +++ b/drivers/pci/controller/vmd.c
>> @@ -701,8 +701,6 @@ static int vmd_alloc_irqs(struct vmd_dev *vmd)
>>  static void vmd_copy_host_bridge_flags(struct pci_host_bridge *root_bridge,
>>  				       struct pci_host_bridge *vmd_bridge)
>>  {
>> -	vmd_bridge->native_pcie_hotplug = root_bridge->native_pcie_hotplug;
>> -	vmd_bridge->native_shpc_hotplug = root_bridge->native_shpc_hotplug;
> How is the host bridge probed in the guest if there is no ACPI support ?
>
> I would like to understand this better how this works.
>
>>  	vmd_bridge->native_aer = root_bridge->native_aer;
>>  	vmd_bridge->native_pme = root_bridge->native_pme;
>>  	vmd_bridge->native_ltr = root_bridge->native_ltr;
> I don't get why *only* the hotplug flag should not be copied. Either
> you want to preserve them all or none.
>
> I assume the issue is that in the host, the _OSC method is used to
> probe for flags whereas in the guest you can't rely on it ?

Yes. Guest OSes don't have the same _OSC implementation as host OSes.

>
> Is there a use case where you *do* want to copy the flags from the
> root_bridge to the vmd_bridge ?

VMD wasn't dependent on the root_bridge flags until this patch:
https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/drivers/pci/controller/vmd.c?id=04b12ef163d10e348db664900ae7f611b83c7a0e

The fix there could instead have been rate-limiting AER or disabling
AER in the BIOS. I think the other flags were added as part of that
AER fix for the same reason you mention above: either copy all the
flags or none. Hotplug has been broken in guest OSes since that patch.
Is reverting it an option?

Thanks

>
> This does not look solid to me.
>
> Thanks,
> Lorenzo
Nirmal Patel Aug. 29, 2023, 9:35 p.m. UTC | #4
On 8/29/2023 11:00 AM, Bjorn Helgaas wrote:
> On Tue, Aug 29, 2023 at 01:10:22AM -0400, Nirmal Patel wrote:
>> Currently during Host boot up, VMD UEFI driver loads and configures
>> all the VMD endpoints devices and devices behind VMD. Then during
>> VMD rootport creation, VMD driver honors ACPI settings for Hotplug,
>> AER, DPC, PM and enables these features based on BIOS settings.
>>
>> During the Guest boot up, ACPI settings along with VMD UEFI driver are
>> not present in Guest BIOS which results in assigning default values to
>> Hotplug, AER, DPC, etc. As a result hotplug is disabled on the VMD
>> rootports in the Guest OS.
>>
>> VMD driver in Guest should be able to see the same settings as seen
>> by Host VMD driver. Because of the missing implementation of VMD UEFI
>> driver in Guest BIOS, the Hotplug is disabled on VMD rootport in
>> Guest OS. Hot inserted drives don't show up and hot removed drives
>> do not disappear even if VMD supports Hotplug in Guest. This
>> behavior is observed in various combinations of guest OSes i.e. RHEL,
>> SLES and hypervisors i.e. KVM and ESXI.
>>
>> This change will make the VMD Host and Guest Driver to keep the settings
>> implemented by the UEFI VMD DXE driver and thus honoring the user
>> selections for hotplug in the BIOS.
> These settings are negotiated between the OS and the BIOS.  The guest
> runs a different BIOS than the host, so why should the guest setting
> be related to the host setting?
>
> I'm not a virtualization whiz, and I don't understand all of what's
> going on here, so please correct me when I go wrong:
>
> IIUC you need to change the guest behavior.  The guest currently sees
> vmd_bridge->native_pcie_hotplug as FALSE, and you need it to be TRUE?

Correct.

> Currently this is copied from the guest's
> root_bridge->native_pcie_hotplug, so that must also be FALSE.
>
> I guess the guest sees a fabricated host bridge, which would start
> with native_pcie_hotplug as TRUE (from pci_init_host_bridge()), and
> then it must be set to FALSE because the guest _OSC didn't grant
> ownership to the OS?  (The guest dmesg should show this, right?)

This is my understanding too. I don't know much in detail about Guest
expectation.

>
> In the guest, vmd_enable_domain() allocates a host bridge via
> pci_create_root_bus(), and that would again start with
> native_pcie_hotplug as TRUE.  It's not an ACPI host bridge, so I don't
> think we do _OSC negotiation for it.  After this patch removes the
> copy from the fabricated host bridge, it would be left as TRUE.

VMD was not dependent on _OSC settings and is not an ACPI host bridge.
It became _OSC-dependent after commit 04b12ef163d1:
https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/drivers/pci/controller/vmd.c?id=04b12ef163d10e348db664900ae7f611b83c7a0e

That patch was added as a quick fix for AER flooding, but the flooding
could also have been avoided by rate-limiting AER.

I don't know all the history of the VMD driver, but does it have to
depend on the root_bridge flags from _OSC? Is reverting 04b12ef163d1
a better idea than just not copying the hotplug flags from the
root_bridge?

Thanks.

>
> If this is on track, it seems like if we want the guest to own PCIe
> hotplug, the guest BIOS _OSC for the fabricated host bridge should
> grant ownership of it.

I will try to check this option.

>
>> Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
>> ---
>> v3->v4: Rewrite the commit log.
>> v2->v3: Update the commit log.
>> v1->v2: Update the commit log.
>> ---
>>  drivers/pci/controller/vmd.c | 2 --
>>  1 file changed, 2 deletions(-)
>>
>> diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
>> index 769eedeb8802..52c2461b4761 100644
>> --- a/drivers/pci/controller/vmd.c
>> +++ b/drivers/pci/controller/vmd.c
>> @@ -701,8 +701,6 @@ static int vmd_alloc_irqs(struct vmd_dev *vmd)
>>  static void vmd_copy_host_bridge_flags(struct pci_host_bridge *root_bridge,
>>  				       struct pci_host_bridge *vmd_bridge)
>>  {
>> -	vmd_bridge->native_pcie_hotplug = root_bridge->native_pcie_hotplug;
>> -	vmd_bridge->native_shpc_hotplug = root_bridge->native_shpc_hotplug;
>>  	vmd_bridge->native_aer = root_bridge->native_aer;
>>  	vmd_bridge->native_pme = root_bridge->native_pme;
>>  	vmd_bridge->native_ltr = root_bridge->native_ltr;
>> -- 
>> 2.31.1
>>
Lorenzo Pieralisi Aug. 30, 2023, 1:47 p.m. UTC | #5
On Tue, Aug 29, 2023 at 02:35:36PM -0700, Patel, Nirmal wrote:
> On 8/29/2023 11:00 AM, Bjorn Helgaas wrote:
> > On Tue, Aug 29, 2023 at 01:10:22AM -0400, Nirmal Patel wrote:
> >> Currently during Host boot up, VMD UEFI driver loads and configures
> >> all the VMD endpoints devices and devices behind VMD. Then during
> >> VMD rootport creation, VMD driver honors ACPI settings for Hotplug,
> >> AER, DPC, PM and enables these features based on BIOS settings.
> >>
> >> During the Guest boot up, ACPI settings along with VMD UEFI driver are
> >> not present in Guest BIOS which results in assigning default values to
> >> Hotplug, AER, DPC, etc. As a result hotplug is disabled on the VMD
> >> rootports in the Guest OS.
> >>
> >> VMD driver in Guest should be able to see the same settings as seen
> >> by Host VMD driver. Because of the missing implementation of VMD UEFI
> >> driver in Guest BIOS, the Hotplug is disabled on VMD rootport in
> >> Guest OS. Hot inserted drives don't show up and hot removed drives
> >> do not disappear even if VMD supports Hotplug in Guest. This
> >> behavior is observed in various combinations of guest OSes i.e. RHEL,
> >> SLES and hypervisors i.e. KVM and ESXI.
> >>
> >> This change will make the VMD Host and Guest Driver to keep the settings
> >> implemented by the UEFI VMD DXE driver and thus honoring the user
> >> selections for hotplug in the BIOS.
> > These settings are negotiated between the OS and the BIOS.  The guest
> > runs a different BIOS than the host, so why should the guest setting
> > be related to the host setting?
> >
> > I'm not a virtualization whiz, and I don't understand all of what's
> > going on here, so please correct me when I go wrong:
> >
> > IIUC you need to change the guest behavior.  The guest currently sees
> > vmd_bridge->native_pcie_hotplug as FALSE, and you need it to be TRUE?
> 
> Correct.

Why ? Can you explain to us please what the bug is ?

It is clear that "hotplug is broken in the guest" is not enough
to fix this sensibly.

> > Currently this is copied from the guest's
> > root_bridge->native_pcie_hotplug, so that must also be FALSE.
> >
> > I guess the guest sees a fabricated host bridge, which would start
> > with native_pcie_hotplug as TRUE (from pci_init_host_bridge()), and
> > then it must be set to FALSE because the guest _OSC didn't grant
> > ownership to the OS?  (The guest dmesg should show this, right?)
> 
> This is my understanding too. I don't know much in detail about Guest
> expectation.

What clears native_pcie_hotplug in the guest (if it does not have
ACPI firmware and therefore no _OSC negotiation) ?

> > In the guest, vmd_enable_domain() allocates a host bridge via
> > pci_create_root_bus(), and that would again start with
> > native_pcie_hotplug as TRUE.  It's not an ACPI host bridge, so I don't
> > think we do _OSC negotiation for it.  After this patch removes the
> > copy from the fabricated host bridge, it would be left as TRUE.
> 
> VMD was not dependent on _OSC settings and is not ACPI Host bridge. It
> became _OSC dependent after the patch 04b12ef163d1.
> https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/drivers/pci/controller/vmd.c?id=04b12ef163d10e348db664900ae7f611b83c7a0e
> 
> This patch was added as a quick fix for AER flooding but it could
> have been avoided by using rate limit for AER.
> 
> I don't know all the history of VMD driver but does it have to be
> dependent on root_bridge flags from _OSC? Is reverting 04b12ef163d1
> a better idea than not allowing just hotplug flags to be copied from
> root_bridge?

No, a better idea is to understand what the bug is and fix it for
good.

So we are back to the drawing board to define how the host bridge
driver works in the host and in the guest to understand how hardware
is configured and what do we expect from it.

Thanks,
Lorenzo

> 
> Thanks.
> 
> >
> > If this is on track, it seems like if we want the guest to own PCIe
> > hotplug, the guest BIOS _OSC for the fabricated host bridge should
> > grant ownership of it.
> 
> I will try to check this option.
> 
> >
> >> Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
> >> ---
> >> v3->v4: Rewrite the commit log.
> >> v2->v3: Update the commit log.
> >> v1->v2: Update the commit log.
> >> ---
> >>  drivers/pci/controller/vmd.c | 2 --
> >>  1 file changed, 2 deletions(-)
> >>
> >> diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> >> index 769eedeb8802..52c2461b4761 100644
> >> --- a/drivers/pci/controller/vmd.c
> >> +++ b/drivers/pci/controller/vmd.c
> >> @@ -701,8 +701,6 @@ static int vmd_alloc_irqs(struct vmd_dev *vmd)
> >>  static void vmd_copy_host_bridge_flags(struct pci_host_bridge *root_bridge,
> >>  				       struct pci_host_bridge *vmd_bridge)
> >>  {
> >> -	vmd_bridge->native_pcie_hotplug = root_bridge->native_pcie_hotplug;
> >> -	vmd_bridge->native_shpc_hotplug = root_bridge->native_shpc_hotplug;
> >>  	vmd_bridge->native_aer = root_bridge->native_aer;
> >>  	vmd_bridge->native_pme = root_bridge->native_pme;
> >>  	vmd_bridge->native_ltr = root_bridge->native_ltr;
> >> -- 
> >> 2.31.1
> >>
>
Bjorn Helgaas Aug. 30, 2023, 4:55 p.m. UTC | #6
[+cc Kai-Heng, Lorenzo, Rafael]

On Tue, Aug 29, 2023 at 02:35:36PM -0700, Patel, Nirmal wrote:
> On 8/29/2023 11:00 AM, Bjorn Helgaas wrote:
> > On Tue, Aug 29, 2023 at 01:10:22AM -0400, Nirmal Patel wrote:
> >> Currently during Host boot up, VMD UEFI driver loads and configures
> >> all the VMD endpoints devices and devices behind VMD. Then during
> >> VMD rootport creation, VMD driver honors ACPI settings for Hotplug,
> >> AER, DPC, PM and enables these features based on BIOS settings.
> >>
> >> During the Guest boot up, ACPI settings along with VMD UEFI driver are
> >> not present in Guest BIOS which results in assigning default values to
> >> Hotplug, AER, DPC, etc. As a result hotplug is disabled on the VMD
> >> rootports in the Guest OS.
> >>
> >> VMD driver in Guest should be able to see the same settings as seen
> >> by Host VMD driver. Because of the missing implementation of VMD UEFI
> >> driver in Guest BIOS, the Hotplug is disabled on VMD rootport in
> >> Guest OS. Hot inserted drives don't show up and hot removed drives
> >> do not disappear even if VMD supports Hotplug in Guest. This
> >> behavior is observed in various combinations of guest OSes i.e. RHEL,
> >> SLES and hypervisors i.e. KVM and ESXI.
> >>
> >> This change will make the VMD Host and Guest Driver to keep the settings
> >> implemented by the UEFI VMD DXE driver and thus honoring the user
> >> selections for hotplug in the BIOS.
> > These settings are negotiated between the OS and the BIOS.  The guest
> > runs a different BIOS than the host, so why should the guest setting
> > be related to the host setting?
> >
> > I'm not a virtualization whiz, and I don't understand all of what's
> > going on here, so please correct me when I go wrong:
> >
> > IIUC you need to change the guest behavior.  The guest currently sees
> > vmd_bridge->native_pcie_hotplug as FALSE, and you need it to be TRUE?
> 
> Correct.
> 
> > Currently this is copied from the guest's
> > root_bridge->native_pcie_hotplug, so that must also be FALSE.
> >
> > I guess the guest sees a fabricated host bridge, which would start
> > with native_pcie_hotplug as TRUE (from pci_init_host_bridge()), and
> > then it must be set to FALSE because the guest _OSC didn't grant
> > ownership to the OS?  (The guest dmesg should show this, right?)
> 
> This is my understanding too. I don't know much in detail about Guest
> expectation.
> 
> > In the guest, vmd_enable_domain() allocates a host bridge via
> > pci_create_root_bus(), and that would again start with
> > native_pcie_hotplug as TRUE.  It's not an ACPI host bridge, so I don't
> > think we do _OSC negotiation for it.  After this patch removes the
> > copy from the fabricated host bridge, it would be left as TRUE.
> 
> VMD was not dependent on _OSC settings and is not ACPI Host bridge. It
> became _OSC dependent after the patch 04b12ef163d1.
> https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/drivers/pci/controller/vmd.c?id=04b12ef163d10e348db664900ae7f611b83c7a0e
> 
> This patch was added as a quick fix for AER flooding but it could
> have been avoided by using rate limit for AER.
> 
> I don't know all the history of VMD driver but does it have to be
> dependent on root_bridge flags from _OSC? Is reverting 04b12ef163d1
> a better idea than not allowing just hotplug flags to be copied from
> root_bridge?

It seems like the question is who owns AER, hotplug, etc for devices
below VMD.  AER rate limiting sounds itself like a quick fix without
addressing the underlying problem.

> > If this is on track, it seems like if we want the guest to own PCIe
> > hotplug, the guest BIOS _OSC for the fabricated host bridge should
> > grant ownership of it.
> 
> I will try to check this option.

On second thought, this doesn't seem right to me.  An _OSC method
clearly applies to the hierarchy under that device, e.g., if we have a
PNP0A03 host bridge with _SEG 0000 and _CRS that includes [bus 00-ff],
its _OSC clearly applies to any devices in domain 0000, which in this
case would include the VMD bridge.

But I don't think it should apply to the devices *below* the VMD
bridge.  Those are in a different domain, and if the platform wants to
manage AER, hotplug, etc., for those devices, it would have to know
some alternate config access method in order to read the AER and
hotplug registers.  I think that config access depends on the setup
done by the VMD driver, so the platform doesn't know that.

By this argument, I would think the guest _OSC would apply to the VMD
device itself, but not to the children of the VMD, and the guest could
retain the default native control of all these services inside the VMD
domain.

But prior to 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe
features"), the guest *did* retain native control, so we would have to 
resolve the issue solved by 04b12ef163d1 in some other way.

Bjorn

> >> Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
> >> ---
> >> v3->v4: Rewrite the commit log.
> >> v2->v3: Update the commit log.
> >> v1->v2: Update the commit log.
> >> ---
> >>  drivers/pci/controller/vmd.c | 2 --
> >>  1 file changed, 2 deletions(-)
> >>
> >> diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> >> index 769eedeb8802..52c2461b4761 100644
> >> --- a/drivers/pci/controller/vmd.c
> >> +++ b/drivers/pci/controller/vmd.c
> >> @@ -701,8 +701,6 @@ static int vmd_alloc_irqs(struct vmd_dev *vmd)
> >>  static void vmd_copy_host_bridge_flags(struct pci_host_bridge *root_bridge,
> >>  				       struct pci_host_bridge *vmd_bridge)
> >>  {
> >> -	vmd_bridge->native_pcie_hotplug = root_bridge->native_pcie_hotplug;
> >> -	vmd_bridge->native_shpc_hotplug = root_bridge->native_shpc_hotplug;
> >>  	vmd_bridge->native_aer = root_bridge->native_aer;
> >>  	vmd_bridge->native_pme = root_bridge->native_pme;
> >>  	vmd_bridge->native_ltr = root_bridge->native_ltr;
> >> -- 
> >> 2.31.1
> >>
>
Nirmal Patel Sept. 12, 2023, 9:35 p.m. UTC | #7
On 8/30/2023 9:55 AM, Bjorn Helgaas wrote:
> [+cc Kai-Heng, Lorenzo, Rafael]
>
> On Tue, Aug 29, 2023 at 02:35:36PM -0700, Patel, Nirmal wrote:
>> On 8/29/2023 11:00 AM, Bjorn Helgaas wrote:
>>> On Tue, Aug 29, 2023 at 01:10:22AM -0400, Nirmal Patel wrote:
>>>> Currently during Host boot up, VMD UEFI driver loads and configures
>>>> all the VMD endpoints devices and devices behind VMD. Then during
>>>> VMD rootport creation, VMD driver honors ACPI settings for Hotplug,
>>>> AER, DPC, PM and enables these features based on BIOS settings.
>>>>
>>>> During the Guest boot up, ACPI settings along with VMD UEFI driver are
>>>> not present in Guest BIOS which results in assigning default values to
>>>> Hotplug, AER, DPC, etc. As a result hotplug is disabled on the VMD
>>>> rootports in the Guest OS.
>>>>
>>>> VMD driver in Guest should be able to see the same settings as seen
>>>> by Host VMD driver. Because of the missing implementation of VMD UEFI
>>>> driver in Guest BIOS, the Hotplug is disabled on VMD rootport in
>>>> Guest OS. Hot inserted drives don't show up and hot removed drives
>>>> do not disappear even if VMD supports Hotplug in Guest. This
>>>> behavior is observed in various combinations of guest OSes i.e. RHEL,
>>>> SLES and hypervisors i.e. KVM and ESXI.
>>>>
>>>> This change will make the VMD Host and Guest Driver to keep the settings
>>>> implemented by the UEFI VMD DXE driver and thus honoring the user
>>>> selections for hotplug in the BIOS.
>>> These settings are negotiated between the OS and the BIOS.  The guest
>>> runs a different BIOS than the host, so why should the guest setting
>>> be related to the host setting?
>>>
>>> I'm not a virtualization whiz, and I don't understand all of what's
>>> going on here, so please correct me when I go wrong:
>>>
>>> IIUC you need to change the guest behavior.  The guest currently sees
>>> vmd_bridge->native_pcie_hotplug as FALSE, and you need it to be TRUE?
>> Correct.
>>
>>> Currently this is copied from the guest's
>>> root_bridge->native_pcie_hotplug, so that must also be FALSE.
>>>
>>> I guess the guest sees a fabricated host bridge, which would start
>>> with native_pcie_hotplug as TRUE (from pci_init_host_bridge()), and
>>> then it must be set to FALSE because the guest _OSC didn't grant
>>> ownership to the OS?  (The guest dmesg should show this, right?)
>> This is my understanding too. I don't know much in detail about Guest
>> expectation.
>>
>>> In the guest, vmd_enable_domain() allocates a host bridge via
>>> pci_create_root_bus(), and that would again start with
>>> native_pcie_hotplug as TRUE.  It's not an ACPI host bridge, so I don't
>>> think we do _OSC negotiation for it.  After this patch removes the
>>> copy from the fabricated host bridge, it would be left as TRUE.
>> VMD was not dependent on _OSC settings and is not ACPI Host bridge. It
>> became _OSC dependent after the patch 04b12ef163d1.
>> https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/drivers/pci/controller/vmd.c?id=04b12ef163d10e348db664900ae7f611b83c7a0e
>>
>> This patch was added as a quick fix for AER flooding but it could
>> have been avoided by using rate limit for AER.
>>
>> I don't know all the history of VMD driver but does it have to be
>> dependent on root_bridge flags from _OSC? Is reverting 04b12ef163d1
>> a better idea than not allowing just hotplug flags to be copied from
>> root_bridge?
> It seems like the question is who owns AER, hotplug, etc for devices
> below VMD.  AER rate limiting sounds itself like a quick fix without
> addressing the underlying problem.
>
>>> If this is on track, it seems like if we want the guest to own PCIe
>>> hotplug, the guest BIOS _OSC for the fabricated host bridge should
>>> grant ownership of it.
>> I will try to check this option.
> On second thought, this doesn't seem right to me.  An _OSC method
> clearly applies to the hierarchy under that device, e.g., if we have a
> PNP0A03 host bridge with _SEG 0000 and _CRS that includes [bus 00-ff],
> its _OSC clearly applies to any devices in domain 0000, which in this
> case would include the VMD bridge.
>
> But I don't think it should apply to the devices *below* the VMD
> bridge.  Those are in a different domain, and if the platform wants to
> manage AER, hotplug, etc., for those devices, it would have to know
> some alternate config access method in order to read the AER and
> hotplug registers.  I think that config access depends on the setup
> done by the VMD driver, so the platform doesn't know that.
>
> By this argument, I would think the guest _OSC would apply to the VMD
> device itself, but not to the children of the VMD, and the guest could
> retain the default native control of all these services inside the VMD
> domain.
>
> But prior to 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe
> features"), the guest *did* retain native control, so we would have to 
> resolve the issue solved by 04b12ef163d1 in some other way.
>
> Bjorn
Yes, _OSC settings should be applied to devices on the 0000 domain. VMD
creates its own domain to manage the child devices. So it is against
the VMD design to force _OSC settings and overwrite the VMD settings.

The patch 04b12ef163d1 disables AER on the VMD rootports by using the
BIOS system settings for AER, Hotplug, etc.
It *assumes VMD is a bridge device* and copies and *imposes the system
settings* for AER, DPC, Hotplug, PM, etc. on VMD. Borrowing and
applying system settings to the VMD rootports is not correct. VMD is a
*type 0 PCI endpoint device*, and all the PCI devices under VMD are
*privately* owned by VMD, not by the OS. Also, VMD has its *own Hotplug
settings* for its rootports and child devices in the BIOS, under the
VMD settings, which are different from the BIOS system settings.
It is these settings that give VMD its own unique functionality.

For the above reason, the VMD driver should disable AER generation by
the devices it owns. There are two possible solutions.

Option 1: The VMD driver disables AER by copying the AER BIOS system
setting, as the patch 04b12ef163d1 does, but does not change the other
settings, including Hotplug. The proposed patch does that.

Option 2: Either disable AER by adding an extra BIOS setting under the
VMD settings, or disable AER in the Linux VMD driver via a boot
parameter, and remove the patch 04b12ef163d1.

Thanks
nirmal

>
>>>> Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
>>>> ---
>>>> v3->v4: Rewrite the commit log.
>>>> v2->v3: Update the commit log.
>>>> v1->v2: Update the commit log.
>>>> ---
>>>>  drivers/pci/controller/vmd.c | 2 --
>>>>  1 file changed, 2 deletions(-)
>>>>
>>>> diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
>>>> index 769eedeb8802..52c2461b4761 100644
>>>> --- a/drivers/pci/controller/vmd.c
>>>> +++ b/drivers/pci/controller/vmd.c
>>>> @@ -701,8 +701,6 @@ static int vmd_alloc_irqs(struct vmd_dev *vmd)
>>>>  static void vmd_copy_host_bridge_flags(struct pci_host_bridge *root_bridge,
>>>>  				       struct pci_host_bridge *vmd_bridge)
>>>>  {
>>>> -	vmd_bridge->native_pcie_hotplug = root_bridge->native_pcie_hotplug;
>>>> -	vmd_bridge->native_shpc_hotplug = root_bridge->native_shpc_hotplug;
>>>>  	vmd_bridge->native_aer = root_bridge->native_aer;
>>>>  	vmd_bridge->native_pme = root_bridge->native_pme;
>>>>  	vmd_bridge->native_ltr = root_bridge->native_ltr;
>>>> -- 
>>>> 2.31.1
>>>>
Bjorn Helgaas Sept. 12, 2023, 10:54 p.m. UTC | #8
On Tue, Sep 12, 2023 at 02:35:39PM -0700, Patel, Nirmal wrote:
> On 8/30/2023 9:55 AM, Bjorn Helgaas wrote:
> > On Tue, Aug 29, 2023 at 02:35:36PM -0700, Patel, Nirmal wrote:
> >> On 8/29/2023 11:00 AM, Bjorn Helgaas wrote:
> >>> On Tue, Aug 29, 2023 at 01:10:22AM -0400, Nirmal Patel wrote:
> >>>> Currently during Host boot up, VMD UEFI driver loads and configures
> >>>> all the VMD endpoints devices and devices behind VMD. Then during
> >>>> VMD rootport creation, VMD driver honors ACPI settings for Hotplug,
> >>>> AER, DPC, PM and enables these features based on BIOS settings.
> >>>>
> >>>> During the Guest boot up, ACPI settings along with VMD UEFI driver are
> >>>> not present in Guest BIOS which results in assigning default values to
> >>>> Hotplug, AER, DPC, etc. As a result hotplug is disabled on the VMD
> >>>> rootports in the Guest OS.
> >>>>
> >>>> VMD driver in Guest should be able to see the same settings as seen
> >>>> by Host VMD driver. Because of the missing implementation of VMD UEFI
> >>>> driver in Guest BIOS, the Hotplug is disabled on VMD rootport in
> >>>> Guest OS. Hot inserted drives don't show up and hot removed drives
> >>>> do not disappear even if VMD supports Hotplug in Guest. This
> >>>> behavior is observed in various combinations of guest OSes i.e. RHEL,
> >>>> SLES and hypervisors i.e. KVM and ESXI.
> >>>>
> >>>> This change will make the VMD Host and Guest Driver to keep the settings
> >>>> implemented by the UEFI VMD DXE driver and thus honoring the user
> >>>> selections for hotplug in the BIOS.
> >>> These settings are negotiated between the OS and the BIOS.  The guest
> >>> runs a different BIOS than the host, so why should the guest setting
> >>> be related to the host setting?
> >>>
> >>> I'm not a virtualization whiz, and I don't understand all of what's
> >>> going on here, so please correct me when I go wrong:
> >>>
> >>> IIUC you need to change the guest behavior.  The guest currently sees
> >>> vmd_bridge->native_pcie_hotplug as FALSE, and you need it to be TRUE?
> >> Correct.
> >>
> >>> Currently this is copied from the guest's
> >>> root_bridge->native_pcie_hotplug, so that must also be FALSE.
> >>>
> >>> I guess the guest sees a fabricated host bridge, which would start
> >>> with native_pcie_hotplug as TRUE (from pci_init_host_bridge()), and
> >>> then it must be set to FALSE because the guest _OSC didn't grant
> >>> ownership to the OS?  (The guest dmesg should show this, right?)
> >> This is my understanding too. I don't know much in detail about Guest
> >> expectation.
> >>
> >>> In the guest, vmd_enable_domain() allocates a host bridge via
> >>> pci_create_root_bus(), and that would again start with
> >>> native_pcie_hotplug as TRUE.  It's not an ACPI host bridge, so I don't
> >>> think we do _OSC negotiation for it.  After this patch removes the
> >>> copy from the fabricated host bridge, it would be left as TRUE.
> >> VMD was not dependent on _OSC settings and is not ACPI Host bridge. It
> >> became _OSC dependent after the patch 04b12ef163d1.
> >> https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/drivers/pci/controller/vmd.c?id=04b12ef163d10e348db664900ae7f611b83c7a0e
> >>
> >> This patch was added as a quick fix for AER flooding but it could
> >> have been avoided by using rate limit for AER.
> >>
> >> I don't know all the history of VMD driver but does it have to be
> >> dependent on root_bridge flags from _OSC? Is reverting 04b12ef163d1
> >> a better idea than not allowing just hotplug flags to be copied from
> >> root_bridge?
> > It seems like the question is who owns AER, hotplug, etc for devices
> > below VMD.  AER rate limiting sounds itself like a quick fix without
> > addressing the underlying problem.
> >
> >>> If this is on track, it seems like if we want the guest to own PCIe
> >>> hotplug, the guest BIOS _OSC for the fabricated host bridge should
> >>> grant ownership of it.
> >> I will try to check this option.
> > On second thought, this doesn't seem right to me.  An _OSC method
> > clearly applies to the hierarchy under that device, e.g., if we have a
> > PNP0A03 host bridge with _SEG 0000 and _CRS that includes [bus 00-ff],
> > its _OSC clearly applies to any devices in domain 0000, which in this
> > case would include the VMD bridge.
> >
> > But I don't think it should apply to the devices *below* the VMD
> > bridge.  Those are in a different domain, and if the platform wants to
> > manage AER, hotplug, etc., for those devices, it would have to know
> > some alternate config access method in order to read the AER and
> > hotplug registers.  I think that config access depends on the setup
> > done by the VMD driver, so the platform doesn't know that.
> >
> > By this argument, I would think the guest _OSC would apply to the VMD
> > device itself, but not to the children of the VMD, and the guest could
> > retain the default native control of all these services inside the VMD
> > domain.
> >
> > But prior to 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe
> > features"), the guest *did* retain native control, so we would have to 
> > resolve the issue solved by 04b12ef163d1 in some other way.
>
> Yes, _OSC settings should applied to devices on 0000 domain.

To be careful here, I think a PNP0A03 _OSC applies to the devices in
the PNP0A03 hierarchy, which could be any domain, not just 0000.

But I think we agree that devices behind a VMD bridge are in some
hierarchy separate from the PNP0A03 one because the PNP0A03 device
doesn't tell OSPM how to find things behind the VMD.

> VMD creates its own domain to manage the child devices. So it is
> against the VMD design to force _OSC settings and overwrite VMD
> settings.
> 
> The patch 04b12ef163d1 disables AER on VMD rootports by using BIOS system
> settings for AER, Hotplug, etc.
> The patch 04b12ef163d1 *assumes VMD is a bridge device* and copies
> and *imposes system settings* for AER, DPC, Hotplug, PM, etc on VMD.
> Borrowing and applying system settings on VMD rootports is
> not correct.

Yes, agreed, and I think this suggests that we really should remove
04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe features")
completely.

> VMD is *type 0 PCI endpoint device* and all the PCI devices
> under VMD are *privately* owned by VMD not by the OS. Also VMD has
> its *own Hotplug settings* for its rootports and child devices in BIOS
> under VMD settings that are different from BIOS system settings.
> It is these settings that give VMD its own unique functionality.
> 
> For the above reason, VMD driver should disable AER generation by
> devices it owns. There are two possible solutions.
>
> Possible options to fix: There are two possible solutions.
> 
> Options 1: VMD driver disables AER by copying AER BIOS system settings
> which the patch 04b12ef163d1 does but do not change other settings
> including Hotplug. The proposed patch does that.

This doesn't seem right because I don't think we should be applying
any _OSC settings to devices below VMD unless there's a PNP0A03 device
that describes the domain below VMD.

> Option 2: Either disable AER by adding an extra BIOS settings under
> VMD settings or disable AER by Linux VMD driver by adding a boot
> parameter and remove the patch 04b12ef163d1.

I think we should remove 04b12ef163d1 and figure out some other way
of handling devices below VMD.  Maybe you just hard-code those items
to be what you want.

> >>>> Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
> >>>> ---
> >>>> v3->v4: Rewrite the commit log.
> >>>> v2->v3: Update the commit log.
> >>>> v1->v2: Update the commit log.
> >>>> ---
> >>>>  drivers/pci/controller/vmd.c | 2 --
> >>>>  1 file changed, 2 deletions(-)
> >>>>
> >>>> diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> >>>> index 769eedeb8802..52c2461b4761 100644
> >>>> --- a/drivers/pci/controller/vmd.c
> >>>> +++ b/drivers/pci/controller/vmd.c
> >>>> @@ -701,8 +701,6 @@ static int vmd_alloc_irqs(struct vmd_dev *vmd)
> >>>>  static void vmd_copy_host_bridge_flags(struct pci_host_bridge *root_bridge,
> >>>>  				       struct pci_host_bridge *vmd_bridge)
> >>>>  {
> >>>> -	vmd_bridge->native_pcie_hotplug = root_bridge->native_pcie_hotplug;
> >>>> -	vmd_bridge->native_shpc_hotplug = root_bridge->native_shpc_hotplug;
> >>>>  	vmd_bridge->native_aer = root_bridge->native_aer;
> >>>>  	vmd_bridge->native_pme = root_bridge->native_pme;
> >>>>  	vmd_bridge->native_ltr = root_bridge->native_ltr;
> >>>> -- 
> >>>> 2.31.1
> >>>>
>
Kai-Heng Feng Sept. 13, 2023, 3:54 a.m. UTC | #9
On Wed, Sep 13, 2023 at 6:54 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Tue, Sep 12, 2023 at 02:35:39PM -0700, Patel, Nirmal wrote:
> > On 8/30/2023 9:55 AM, Bjorn Helgaas wrote:
> > > On Tue, Aug 29, 2023 at 02:35:36PM -0700, Patel, Nirmal wrote:
> > >> On 8/29/2023 11:00 AM, Bjorn Helgaas wrote:
> > >>> On Tue, Aug 29, 2023 at 01:10:22AM -0400, Nirmal Patel wrote:
> > >>>> Currently during Host boot up, VMD UEFI driver loads and configures
> > >>>> all the VMD endpoints devices and devices behind VMD. Then during
> > >>>> VMD rootport creation, VMD driver honors ACPI settings for Hotplug,
> > >>>> AER, DPC, PM and enables these features based on BIOS settings.
> > >>>>
> > >>>> During the Guest boot up, ACPI settings along with VMD UEFI driver are
> > >>>> not present in Guest BIOS which results in assigning default values to
> > >>>> Hotplug, AER, DPC, etc. As a result hotplug is disabled on the VMD
> > >>>> rootports in the Guest OS.
> > >>>>
> > >>>> VMD driver in Guest should be able to see the same settings as seen
> > >>>> by Host VMD driver. Because of the missing implementation of VMD UEFI
> > >>>> driver in Guest BIOS, the Hotplug is disabled on VMD rootport in
> > >>>> Guest OS. Hot inserted drives don't show up and hot removed drives
> > >>>> do not disappear even if VMD supports Hotplug in Guest. This
> > >>>> behavior is observed in various combinations of guest OSes i.e. RHEL,
> > >>>> SLES and hypervisors i.e. KVM and ESXI.
> > >>>>
> > >>>> This change will make the VMD Host and Guest Driver to keep the settings
> > >>>> implemented by the UEFI VMD DXE driver and thus honoring the user
> > >>>> selections for hotplug in the BIOS.
> > >>> These settings are negotiated between the OS and the BIOS.  The guest
> > >>> runs a different BIOS than the host, so why should the guest setting
> > >>> be related to the host setting?
> > >>>
> > >>> I'm not a virtualization whiz, and I don't understand all of what's
> > >>> going on here, so please correct me when I go wrong:
> > >>>
> > >>> IIUC you need to change the guest behavior.  The guest currently sees
> > >>> vmd_bridge->native_pcie_hotplug as FALSE, and you need it to be TRUE?
> > >> Correct.
> > >>
> > >>> Currently this is copied from the guest's
> > >>> root_bridge->native_pcie_hotplug, so that must also be FALSE.
> > >>>
> > >>> I guess the guest sees a fabricated host bridge, which would start
> > >>> with native_pcie_hotplug as TRUE (from pci_init_host_bridge()), and
> > >>> then it must be set to FALSE because the guest _OSC didn't grant
> > >>> ownership to the OS?  (The guest dmesg should show this, right?)
> > >> This is my understanding too. I don't know much in detail about Guest
> > >> expectation.
> > >>
> > >>> In the guest, vmd_enable_domain() allocates a host bridge via
> > >>> pci_create_root_bus(), and that would again start with
> > >>> native_pcie_hotplug as TRUE.  It's not an ACPI host bridge, so I don't
> > >>> think we do _OSC negotiation for it.  After this patch removes the
> > >>> copy from the fabricated host bridge, it would be left as TRUE.
> > >> VMD was not dependent on _OSC settings and is not ACPI Host bridge. It
> > >> became _OSC dependent after the patch 04b12ef163d1.
> > >> https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/drivers/pci/controller/vmd.c?id=04b12ef163d10e348db664900ae7f611b83c7a0e
> > >>
> > >> This patch was added as a quick fix for AER flooding but it could
> > >> have been avoided by using rate limit for AER.
> > >>
> > >> I don't know all the history of VMD driver but does it have to be
> > >> dependent on root_bridge flags from _OSC? Is reverting 04b12ef163d1
> > >> a better idea than not allowing just hotplug flags to be copied from
> > >> root_bridge?
> > > It seems like the question is who owns AER, hotplug, etc for devices
> > > below VMD.  AER rate limiting sounds itself like a quick fix without
> > > addressing the underlying problem.
> > >
> > >>> If this is on track, it seems like if we want the guest to own PCIe
> > >>> hotplug, the guest BIOS _OSC for the fabricated host bridge should
> > >>> grant ownership of it.
> > >> I will try to check this option.
> > > On second thought, this doesn't seem right to me.  An _OSC method
> > > clearly applies to the hierarchy under that device, e.g., if we have a
> > > PNP0A03 host bridge with _SEG 0000 and _CRS that includes [bus 00-ff],
> > > its _OSC clearly applies to any devices in domain 0000, which in this
> > > case would include the VMD bridge.
> > >
> > > But I don't think it should apply to the devices *below* the VMD
> > > bridge.  Those are in a different domain, and if the platform wants to
> > > manage AER, hotplug, etc., for those devices, it would have to know
> > > some alternate config access method in order to read the AER and
> > > hotplug registers.  I think that config access depends on the setup
> > > done by the VMD driver, so the platform doesn't know that.
> > >
> > > By this argument, I would think the guest _OSC would apply to the VMD
> > > device itself, but not to the children of the VMD, and the guest could
> > > retain the default native control of all these services inside the VMD
> > > domain.
> > >
> > > But prior to 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe
> > > features"), the guest *did* retain native control, so we would have to
> > > resolve the issue solved by 04b12ef163d1 in some other way.
> >
> > Yes, _OSC settings should applied to devices on 0000 domain.
>
> To be careful here, I think a PNP0A03 _OSC applies to the devices in
> the PNP0A03 hierarchy, which could be any domain, not just 0000.
>
> But I think we agree that devices behind a VMD bridge are in some
> hierarchy separate from the PNP0A03 one because the PNP0A03 device
> doesn't tell OSPM how to find things behind the VMD.
>
> > VMD creates its own domain to manage the child devices. So it is
> > against the VMD design to force _OSC settings and overwrite VMD
> > settings.
> >
> > The patch 04b12ef163d1 disables AER on VMD rootports by using BIOS system
> > settings for AER, Hotplug, etc.
> > The patch 04b12ef163d1 *assumes VMD is a bridge device* and copies
> > and *imposes system settings* for AER, DPC, Hotplug, PM, etc on VMD.
> > Borrowing and applying system settings on VMD rootports is
> > not correct.
>
> Yes, agreed, and I think this suggests that we really should remove
> 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe features")
> completely.
>
> > VMD is *type 0 PCI endpoint device* and all the PCI devices
> > under VMD are *privately* owned by VMD not by the OS. Also VMD has
> > its *own Hotplug settings* for its rootports and child devices in BIOS
> > under VMD settings that are different from BIOS system settings.
> > It is these settings that give VMD its own unique functionality.
> >
> > For the above reason, VMD driver should disable AER generation by
> > devices it owns. There are two possible solutions.
> >
> > Possible options to fix: There are two possible solutions.
> >
> > Options 1: VMD driver disables AER by copying AER BIOS system settings
> > which the patch 04b12ef163d1 does but do not change other settings
> > including Hotplug. The proposed patch does that.
>
> This doesn't seem right because I don't think we should be applying
> any _OSC settings to devices below VMD unless there's a PNP0A03 device
> that describes the domain below VMD.
>
> > Option 2: Either disable AER by adding an extra BIOS settings under
> > VMD settings or disable AER by Linux VMD driver by adding a boot
> > parameter and remove the patch 04b12ef163d1.
>
> I think we should remove 04b12ef163d1 and figure out some other way
> of handling devices below VMD.  Maybe you just hard-code those items
> to be what you want.

The idea was that the same physical root ports were used by the 0000
domain and the VMD domain, so it's reasonable to use the same PCIe
services.

So what items should be hard-coded, assuming 04b12ef163d1 gets reverted?

Kai-Heng

>
> > >>>> Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
> > >>>> ---
> > >>>> v3->v4: Rewrite the commit log.
> > >>>> v2->v3: Update the commit log.
> > >>>> v1->v2: Update the commit log.
> > >>>> ---
> > >>>>  drivers/pci/controller/vmd.c | 2 --
> > >>>>  1 file changed, 2 deletions(-)
> > >>>>
> > >>>> diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> > >>>> index 769eedeb8802..52c2461b4761 100644
> > >>>> --- a/drivers/pci/controller/vmd.c
> > >>>> +++ b/drivers/pci/controller/vmd.c
> > >>>> @@ -701,8 +701,6 @@ static int vmd_alloc_irqs(struct vmd_dev *vmd)
> > >>>>  static void vmd_copy_host_bridge_flags(struct pci_host_bridge *root_bridge,
> > >>>>                                         struct pci_host_bridge *vmd_bridge)
> > >>>>  {
> > >>>> -        vmd_bridge->native_pcie_hotplug = root_bridge->native_pcie_hotplug;
> > >>>> -        vmd_bridge->native_shpc_hotplug = root_bridge->native_shpc_hotplug;
> > >>>>          vmd_bridge->native_aer = root_bridge->native_aer;
> > >>>>          vmd_bridge->native_pme = root_bridge->native_pme;
> > >>>>          vmd_bridge->native_ltr = root_bridge->native_ltr;
> > >>>> --
> > >>>> 2.31.1
> > >>>>
> >
Bjorn Helgaas Sept. 13, 2023, 12:50 p.m. UTC | #10
On Wed, Sep 13, 2023 at 11:54:05AM +0800, Kai-Heng Feng wrote:
> On Wed, Sep 13, 2023 at 6:54 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Tue, Sep 12, 2023 at 02:35:39PM -0700, Patel, Nirmal wrote:
> > > On 8/30/2023 9:55 AM, Bjorn Helgaas wrote:
> > > > On Tue, Aug 29, 2023 at 02:35:36PM -0700, Patel, Nirmal wrote:
> > > >> On 8/29/2023 11:00 AM, Bjorn Helgaas wrote:
> > > >>> On Tue, Aug 29, 2023 at 01:10:22AM -0400, Nirmal Patel wrote:
> > > >>>> Currently during Host boot up, VMD UEFI driver loads and configures
> > > >>>> all the VMD endpoints devices and devices behind VMD. Then during
> > > >>>> VMD rootport creation, VMD driver honors ACPI settings for Hotplug,
> > > >>>> AER, DPC, PM and enables these features based on BIOS settings.
> > > >>>>
> > > >>>> During the Guest boot up, ACPI settings along with VMD UEFI driver are
> > > >>>> not present in Guest BIOS which results in assigning default values to
> > > >>>> Hotplug, AER, DPC, etc. As a result hotplug is disabled on the VMD
> > > >>>> rootports in the Guest OS.
> > > >>>>
> > > >>>> VMD driver in Guest should be able to see the same settings as seen
> > > >>>> by Host VMD driver. Because of the missing implementation of VMD UEFI
> > > >>>> driver in Guest BIOS, the Hotplug is disabled on VMD rootport in
> > > >>>> Guest OS. Hot inserted drives don't show up and hot removed drives
> > > >>>> do not disappear even if VMD supports Hotplug in Guest. This
> > > >>>> behavior is observed in various combinations of guest OSes i.e. RHEL,
> > > >>>> SLES and hypervisors i.e. KVM and ESXI.
> > > >>>>
> > > >>>> This change will make the VMD Host and Guest Driver to keep the settings
> > > >>>> implemented by the UEFI VMD DXE driver and thus honoring the user
> > > >>>> selections for hotplug in the BIOS.
> > > >>> These settings are negotiated between the OS and the BIOS.  The guest
> > > >>> runs a different BIOS than the host, so why should the guest setting
> > > >>> be related to the host setting?
> > > >>>
> > > >>> I'm not a virtualization whiz, and I don't understand all of what's
> > > >>> going on here, so please correct me when I go wrong:
> > > >>>
> > > >>> IIUC you need to change the guest behavior.  The guest currently sees
> > > >>> vmd_bridge->native_pcie_hotplug as FALSE, and you need it to be TRUE?
> > > >> Correct.
> > > >>
> > > >>> Currently this is copied from the guest's
> > > >>> root_bridge->native_pcie_hotplug, so that must also be FALSE.
> > > >>>
> > > >>> I guess the guest sees a fabricated host bridge, which would start
> > > >>> with native_pcie_hotplug as TRUE (from pci_init_host_bridge()), and
> > > >>> then it must be set to FALSE because the guest _OSC didn't grant
> > > >>> ownership to the OS?  (The guest dmesg should show this, right?)
> > > >> This is my understanding too. I don't know much in detail about Guest
> > > >> expectation.
> > > >>
> > > >>> In the guest, vmd_enable_domain() allocates a host bridge via
> > > >>> pci_create_root_bus(), and that would again start with
> > > >>> native_pcie_hotplug as TRUE.  It's not an ACPI host bridge, so I don't
> > > >>> think we do _OSC negotiation for it.  After this patch removes the
> > > >>> copy from the fabricated host bridge, it would be left as TRUE.
> > > >> VMD was not dependent on _OSC settings and is not ACPI Host bridge. It
> > > >> became _OSC dependent after the patch 04b12ef163d1.
> > > >> https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/drivers/pci/controller/vmd.c?id=04b12ef163d10e348db664900ae7f611b83c7a0e
> > > >>
> > > >> This patch was added as a quick fix for AER flooding but it could
> > > >> have been avoided by using rate limit for AER.
> > > >>
> > > >> I don't know all the history of VMD driver but does it have to be
> > > >> dependent on root_bridge flags from _OSC? Is reverting 04b12ef163d1
> > > >> a better idea than not allowing just hotplug flags to be copied from
> > > >> root_bridge?
> > > > It seems like the question is who owns AER, hotplug, etc for devices
> > > > below VMD.  AER rate limiting sounds itself like a quick fix without
> > > > addressing the underlying problem.
> > > >
> > > >>> If this is on track, it seems like if we want the guest to own PCIe
> > > >>> hotplug, the guest BIOS _OSC for the fabricated host bridge should
> > > >>> grant ownership of it.
> > > >> I will try to check this option.
> > > > On second thought, this doesn't seem right to me.  An _OSC method
> > > > clearly applies to the hierarchy under that device, e.g., if we have a
> > > > PNP0A03 host bridge with _SEG 0000 and _CRS that includes [bus 00-ff],
> > > > its _OSC clearly applies to any devices in domain 0000, which in this
> > > > case would include the VMD bridge.
> > > >
> > > > But I don't think it should apply to the devices *below* the VMD
> > > > bridge.  Those are in a different domain, and if the platform wants to
> > > > manage AER, hotplug, etc., for those devices, it would have to know
> > > > some alternate config access method in order to read the AER and
> > > > hotplug registers.  I think that config access depends on the setup
> > > > done by the VMD driver, so the platform doesn't know that.
> > > >
> > > > By this argument, I would think the guest _OSC would apply to the VMD
> > > > device itself, but not to the children of the VMD, and the guest could
> > > > retain the default native control of all these services inside the VMD
> > > > domain.
> > > >
> > > > But prior to 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe
> > > > features"), the guest *did* retain native control, so we would have to
> > > > resolve the issue solved by 04b12ef163d1 in some other way.
> > >
> > > Yes, _OSC settings should applied to devices on 0000 domain.
> >
> > To be careful here, I think a PNP0A03 _OSC applies to the devices in
> > the PNP0A03 hierarchy, which could be any domain, not just 0000.
> >
> > But I think we agree that devices behind a VMD bridge are in some
> > hierarchy separate from the PNP0A03 one because the PNP0A03 device
> > doesn't tell OSPM how to find things behind the VMD.
> >
> > > VMD creates its own domain to manage the child devices. So it is
> > > against the VMD design to force _OSC settings and overwrite VMD
> > > settings.
> > >
> > > The patch 04b12ef163d1 disables AER on VMD rootports by using BIOS system
> > > settings for AER, Hotplug, etc.
> > > The patch 04b12ef163d1 *assumes VMD is a bridge device* and copies
> > > and *imposes system settings* for AER, DPC, Hotplug, PM, etc on VMD.
> > > Borrowing and applying system settings on VMD rootports is
> > > not correct.
> >
> > Yes, agreed, and I think this suggests that we really should remove
> > 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe features")
> > completely.
> >
> > > VMD is *type 0 PCI endpoint device* and all the PCI devices
> > > under VMD are *privately* owned by VMD not by the OS. Also VMD has
> > > its *own Hotplug settings* for its rootports and child devices in BIOS
> > > under VMD settings that are different from BIOS system settings.
> > > It is these settings that give VMD its own unique functionality.
> > >
> > > For the above reason, VMD driver should disable AER generation by
> > > devices it owns. There are two possible solutions.
> > >
> > > Possible options to fix: There are two possible solutions.
> > >
> > > Options 1: VMD driver disables AER by copying AER BIOS system settings
> > > which the patch 04b12ef163d1 does but do not change other settings
> > > including Hotplug. The proposed patch does that.
> >
> > This doesn't seem right because I don't think we should be applying
> > any _OSC settings to devices below VMD unless there's a PNP0A03 device
> > that describes the domain below VMD.
> >
> > > Option 2: Either disable AER by adding an extra BIOS settings under
> > > VMD settings or disable AER by Linux VMD driver by adding a boot
> > > parameter and remove the patch 04b12ef163d1.
> >
> > I think we should remove 04b12ef163d1 and figure out some other way
> > of handling devices below VMD.  Maybe you just hard-code those items
> > to be what you want.
> 
> The idea was that the same physical root ports were used by 0000
> domain and VMD domain, so it's reasonable to use the same PCIe
> services.

Hmm.  In some ways the VMD device acts as a Root Port, since it
originates a new hierarchy in a separate domain, but on the upstream
side, it's just a normal endpoint.

How does AER for the new hierarchy work?  A device below the VMD can
generate ERR_COR/ERR_NONFATAL/ERR_FATAL messages.  I guess I was
assuming those messages would terminate at the VMD, and the VMD could
generate an AER interrupt just like a Root Port.  But that can't be
right because I don't think VMD would have the Root Error Command
register needed to manage that interrupt.

But if VMD just passes those messages up to the Root Port, the source
of the messages (the Requester ID) won't make any sense because
they're in a hierarchy the Root Port doesn't know anything about.

> So what items should be hard-coded, assuming 04b12ef163d1 gets reverted?

> > > >>>> Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
> > > >>>> ---
> > > >>>> v3->v4: Rewrite the commit log.
> > > >>>> v2->v3: Update the commit log.
> > > >>>> v1->v2: Update the commit log.
> > > >>>> ---
> > > >>>>  drivers/pci/controller/vmd.c | 2 --
> > > >>>>  1 file changed, 2 deletions(-)
> > > >>>>
> > > >>>> diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> > > >>>> index 769eedeb8802..52c2461b4761 100644
> > > >>>> --- a/drivers/pci/controller/vmd.c
> > > >>>> +++ b/drivers/pci/controller/vmd.c
> > > >>>> @@ -701,8 +701,6 @@ static int vmd_alloc_irqs(struct vmd_dev *vmd)
> > > >>>>  static void vmd_copy_host_bridge_flags(struct pci_host_bridge *root_bridge,
> > > >>>>                                         struct pci_host_bridge *vmd_bridge)
> > > >>>>  {
> > > >>>> -        vmd_bridge->native_pcie_hotplug = root_bridge->native_pcie_hotplug;
> > > >>>> -        vmd_bridge->native_shpc_hotplug = root_bridge->native_shpc_hotplug;
> > > >>>>          vmd_bridge->native_aer = root_bridge->native_aer;
> > > >>>>          vmd_bridge->native_pme = root_bridge->native_pme;
> > > >>>>          vmd_bridge->native_ltr = root_bridge->native_ltr;
> > > >>>> --
> > > >>>> 2.31.1
> > > >>>>
> > >
Kai-Heng Feng Sept. 19, 2023, 3:31 a.m. UTC | #11
On Wed, Sep 13, 2023 at 8:50 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
[snipped]
> Hmm.  In some ways the VMD device acts as a Root Port, since it
> originates a new hierarchy in a separate domain, but on the upstream
> side, it's just a normal endpoint.
>
> How does AER for the new hierarchy work?  A device below the VMD can
> generate ERR_COR/ERR_NONFATAL/ERR_FATAL messages.  I guess I was
> assuming those messages would terminate at the VMD, and the VMD could
> generate an AER interrupt just like a Root Port.  But that can't be
> right because I don't think VMD would have the Root Error Command
> register needed to manage that interrupt.

VMD itself doesn't seem to manage AER; the root port that "moved" from
the 0000 domain does:
[ 2113.507345] pcieport 10000:e0:06.0: AER: Corrected error received:
10000:e1:00.0
[ 2113.507380] nvme 10000:e1:00.0: PCIe Bus Error: severity=Corrected,
type=Physical Layer, (Receiver ID)
[ 2113.507389] nvme 10000:e1:00.0:   device [144d:a80a] error
status/mask=00000001/0000e000
[ 2113.507398] nvme 10000:e1:00.0:    [ 0] RxErr                  (First)

>
> But if VMD just passes those messages up to the Root Port, the source
> of the messages (the Requester ID) won't make any sense because
> they're in a hierarchy the Root Port doesn't know anything about.

Not sure what the current status is, but I think Nirmal's patch is valid
for both our cases.

Kai-Heng

> [snipped]
Bjorn Helgaas Sept. 19, 2023, 2:34 p.m. UTC | #12
On Tue, Sep 19, 2023 at 11:31:57AM +0800, Kai-Heng Feng wrote:
> On Wed, Sep 13, 2023 at 8:50 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> [snipped]
> > [snipped]
> 
> > VMD itself doesn't seem to manage AER; the Root Port that "moved" from
> > the 0000 domain does:
> [ 2113.507345] pcieport 10000:e0:06.0: AER: Corrected error received:
> 10000:e1:00.0
> [ 2113.507380] nvme 10000:e1:00.0: PCIe Bus Error: severity=Corrected,
> type=Physical Layer, (Receiver ID)
> [ 2113.507389] nvme 10000:e1:00.0:   device [144d:a80a] error
> status/mask=00000001/0000e000
> [ 2113.507398] nvme 10000:e1:00.0:    [ 0] RxErr                  (First)

Oh, I forgot how VMD works.  It sounds like there *is* a Root Port
that is logically below the VMD, e.g., (from
https://bugzilla.kernel.org/show_bug.cgi?id=215027):

  ACPI: PCI Root Bridge [PC00] (domain 0000 [bus 00-e0])
  acpi PNP0A08:00: _OSC: platform does not support [AER]
  acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug SHPCHotplug PME PCIeCapability LTR]
  pci  0000:00:0e.0: [8086:467f] type 00         # VMD
  vmd  0000:00:0e.0: PCI host bridge to bus 10000:e0
  pci 10000:e0:06.0: [8086:464d] type 01         # Root Port to [bus e1]
  pci 10000:e1:00.0: [144d:a80a] type 00         # Samsung NVMe

So ERR_* messages from the e1:00.0 Samsung device would terminate at
the e0:06.0 Root Port.  That Root Port has an AER Capability with Root
Error Command/Status/Error Source registers.

> > But if VMD just passes those messages up to the Root Port, the source
> > of the messages (the Requester ID) won't make any sense because
> > they're in a hierarchy the Root Port doesn't know anything about.
> 
> Not sure what the current status is, but I think Nirmal's patch is valid
> for both our cases.

So I think the question is whether that PNP0A08:00 _OSC applies to
domain 10000.  I think the answer is "no" because the platform doesn't
know about the existence of domain 10000, and it can't access config
space in that domain.

E.g., if _OSC negotiated that the platform owned AER in domain 0000, I
don't think it would make sense for that to mean the platform *also*
owned AER in domain 10000, because the platform doesn't know how to
configure AER or handle AER interrupts in that domain.

Nirmal's patch ignores _OSC for hotplug, but keeps the _OSC results
for AER, PME, and LTR.  I think we should ignore _OSC for *all* of
them.

That would mean reverting 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on
PCIe features") completely, so of course we'd have to figure out how
to resolve the AER message flood a different way.

Bjorn

> [snipped]
Rafael J. Wysocki Sept. 19, 2023, 3:52 p.m. UTC | #13
On Tue, Sep 19, 2023 at 4:34 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> [snipped]
>
> So I think the question is whether that PNP0A08:00 _OSC applies to
> domain 10000.  I think the answer is "no" because the platform doesn't
> know about the existence of domain 10000, and it can't access config
> space in that domain.

Well, the VMD device itself is there in domain 0000, however, and sure
enough, the platform firmware can access its config space.

> E.g., if _OSC negotiated that the platform owned AER in domain 0000, I
> don't think it would make sense for that to mean the platform *also*
> owned AER in domain 10000, because the platform doesn't know how to
> configure AER or handle AER interrupts in that domain.

I'm not sure about this.

AFAICS, domain 10000 is not physically independent of domain 0000, so
I'm not sure to what extent the above applies.

> Nirmal's patch ignores _OSC for hotplug, but keeps the _OSC results
> for AER, PME, and LTR.  I think we should ignore _OSC for *all* of
> them.
>
> That would mean reverting 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on
> PCIe features") completely, so of course we'd have to figure out how
> to resolve the AER message flood a different way.

I agree with the above approach, however.
Bjorn Helgaas Sept. 19, 2023, 5:33 p.m. UTC | #14
On Tue, Sep 19, 2023 at 05:52:33PM +0200, Rafael J. Wysocki wrote:
> On Tue, Sep 19, 2023 at 4:34 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > [snipped]
> > So I think the question is whether that PNP0A08:00 _OSC applies to
> > domain 10000.  I think the answer is "no" because the platform doesn't
> > know about the existence of domain 10000, and it can't access config
> > space in that domain.
> 
> Well, the VMD device itself is there in domain 0000, however, and sure
> enough, the platform firmware can access its config space.
> 
> > E.g., if _OSC negotiated that the platform owned AER in domain 0000, I
> > don't think it would make sense for that to mean the platform *also*
> > owned AER in domain 10000, because the platform doesn't know how to
> > configure AER or handle AER interrupts in that domain.
> 
> I'm not sure about this.
> 
> AFAICS, domain 10000 is not physically independent of domain 0000, so
> I'm not sure to what extent the above applies.

Domain 10000 definitely isn't physically independent of domain 0000
since all TLPs to/from 10000 traverse the domain 0000 link to the VMD
at 0000:00:0e.0.

The platform can access the VMD endpoint (0000:00:0e.0) config space.
But I don't think the platform can access config space of anything in
domain 10000, which in my mind makes it *logically* independent.

IIUC, config access to anything below the VMD (e.g., domain 10000) is
done by the vmd driver itself using BAR 0 of the VMD device
(vmd_cfg_addr(), vmd_pci_read(), vmd_pci_write(), see [1]).

Bjorn

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/vmd.c?id=v6.5#n378
Rafael J. Wysocki Sept. 19, 2023, 6:32 p.m. UTC | #15
On Tue, Sep 19, 2023 at 7:33 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Tue, Sep 19, 2023 at 05:52:33PM +0200, Rafael J. Wysocki wrote:
> > [snipped]
>
> Domain 10000 definitely isn't physically independent of domain 0000
> since all TLPs to/from 10000 traverse the domain 0000 link to the VMD
> at 0000:00:0e.0.
>
> The platform can access the VMD endpoint (0000:00:0e.0) config space.
> But I don't think the platform can access config space of anything in
> domain 10000, which in my mind makes it *logically* independent.
>
> IIUC, config access to anything below the VMD (e.g., domain 10000) is
> done by the vmd driver itself using BAR 0 of the VMD device
> (vmd_cfg_addr(), vmd_pci_read(), vmd_pci_write(), see [1]).
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/vmd.c?id=v6.5#n378

I know, but the platform firmware may not need to go through the VMD
BAR in order to change the configuration of the devices in the "VMD
segment".

The question is whether or not the OS should attempt to control the
_OSC features on the VMD segment if the firmware has not allowed it to
control those features on the primary "parent" segment; the way the
config space on the VMD segment is accessed may not be entirely
relevant in that respect.

Note that VMD may want to impose additional restrictions on the OS
control of those features, but it is not clear to me whether or not it
should attempt to override the refusal to grant control of them on the
primary host bridge.
Bjorn Helgaas Sept. 19, 2023, 8:09 p.m. UTC | #16
On Tue, Sep 19, 2023 at 08:32:22PM +0200, Rafael J. Wysocki wrote:
> On Tue, Sep 19, 2023 at 7:33 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > [snipped]
> > IIUC, config access to anything below the VMD (e.g., domain 10000) is
> > done by the vmd driver itself using BAR 0 of the VMD device
> > (vmd_cfg_addr(), vmd_pci_read(), vmd_pci_write(), see [1]).
> >
> > [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/vmd.c?id=v6.5#n378
> 
> I know, but the platform firmware may not need to go through the VMD
> BAR in order to change the configuration of the devices in the "VMD
> segment".

I'm assuming that firmware can only access "VMD segment" config space
if it has its own VMD driver.  It sounds like there might be another
way that I don't know about?  Is there some way to do things via AML
and _REG?

> The question is whether or not the OS should attempt to control the
> _OSC features on the VMD segment if the firmware has not allowed it to
> control those features on the primary "parent" segment and the way the
> config space on the VMD segment is accessed may not be entirely
> relevant in that respect.

All these features are managed via config space.  I don't know how to
interpret "firmware has no way to read the AER Capability of a Root
Port in the VMD segment, but firmware still owns AER".  That sounds to
me like "nobody can do anything with AER".

> Note that VMD may want to impose additional restrictions on the OS
> control of those features, but it is not clear to me whether or not it
> should attempt to override the refusal to grant control of them on the
> primary host bridge.

In practical terms, I think you're saying vmd_copy_host_bridge_flags()
could CLEAR vmd_bridge->native_* flags (i.e., restrict the OS from
using features even if the platform has granted control of them), but
it's not clear that it should SET them (i.e., override platform's
refusal to grant control)?  Is that right?

I've been assuming config space access is the critical piece, but it
sounds like maybe it's not?

Bjorn
Rafael J. Wysocki Sept. 20, 2023, 10:08 a.m. UTC | #17
On Tue, Sep 19, 2023 at 10:09 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Tue, Sep 19, 2023 at 08:32:22PM +0200, Rafael J. Wysocki wrote:
> > On Tue, Sep 19, 2023 at 7:33 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Tue, Sep 19, 2023 at 05:52:33PM +0200, Rafael J. Wysocki wrote:
> > > > On Tue, Sep 19, 2023 at 4:34 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > On Tue, Sep 19, 2023 at 11:31:57AM +0800, Kai-Heng Feng wrote:
> > > > > > On Wed, Sep 13, 2023 at 8:50 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > > [snipped]
> > > > > > > Hmm.  In some ways the VMD device acts as a Root Port, since it
> > > > > > > originates a new hierarchy in a separate domain, but on the upstream
> > > > > > > side, it's just a normal endpoint.
> > > > > > >
> > > > > > > How does AER for the new hierarchy work?  A device below the VMD can
> > > > > > > generate ERR_COR/ERR_NONFATAL/ERR_FATAL messages.  I guess I was
> > > > > > > assuming those messages would terminate at the VMD, and the VMD could
> > > > > > > generate an AER interrupt just like a Root Port.  But that can't be
> > > > > > > right because I don't think VMD would have the Root Error Command
> > > > > > > register needed to manage that interrupt.
> > > > > >
> > > > > > VMD itself doesn't seem to manage AER, the rootport that "moved" from
> > > > > > 0000 domain does:
> > > > > > [ 2113.507345] pcieport 10000:e0:06.0: AER: Corrected error received:
> > > > > > 10000:e1:00.0
> > > > > > [ 2113.507380] nvme 10000:e1:00.0: PCIe Bus Error: severity=Corrected,
> > > > > > type=Physical Layer, (Receiver ID)
> > > > > > [ 2113.507389] nvme 10000:e1:00.0:   device [144d:a80a] error
> > > > > > status/mask=00000001/0000e000
> > > > > > [ 2113.507398] nvme 10000:e1:00.0:    [ 0] RxErr                  (First)
> > > > >
> > > > > Oh, I forgot how VMD works.  It sounds like there *is* a Root Port
> > > > > that is logically below the VMD, e.g., (from
> > > > > https://bugzilla.kernel.org/show_bug.cgi?id=215027):
> > > > >
> > > > >   ACPI: PCI Root Bridge [PC00] (domain 0000 [bus 00-e0])
> > > > >   acpi PNP0A08:00: _OSC: platform does not support [AER]
> > > > >   acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug SHPCHotplug PME PCIeCapability LTR]
> > > > >   pci  0000:00:0e.0: [8086:467f] type 00         # VMD
> > > > >   vmd  0000:00:0e.0: PCI host bridge to bus 10000:e0
> > > > >   pci 10000:e0:06.0: [8086:464d] type 01         # Root Port to [bus e1]
> > > > >   pci 10000:e1:00.0: [144d:a80a] type 00         # Samsung NVMe
> > > > >
> > > > > So ERR_* messages from the e1:00.0 Samsung device would terminate at
> > > > > the e0:06.0 Root Port.  That Root Port has an AER Capability with Root
> > > > > Error Command/Status/Error Source registers.
> > > > >
> > > > > > > But if VMD just passes those messages up to the Root Port, the source
> > > > > > > of the messages (the Requester ID) won't make any sense because
> > > > > > > they're in a hierarchy the Root Port doesn't know anything about.
> > > > > >
> > > > > > Not sure what's current status is but I think Nirmal's patch is valid
> > > > > > for both our cases.
> > > > >
> > > > > So I think the question is whether that PNP0A08:00 _OSC applies to
> > > > > domain 10000.  I think the answer is "no" because the platform doesn't
> > > > > know about the existence of domain 10000, and it can't access config
> > > > > space in that domain.
> > > >
> > > > Well, the VMD device itself is there in domain 0000, however, and sure
> > > > enough, the platform firmware can access its config space.
> > > >
> > > > > E.g., if _OSC negotiated that the platform owned AER in domain 0000, I
> > > > > don't think it would make sense for that to mean the platform *also*
> > > > > owned AER in domain 10000, because the platform doesn't know how to
> > > > > configure AER or handle AER interrupts in that domain.
> > > >
> > > > I'm not sure about this.
> > > >
> > > > AFAICS, domain 10000 is not physically independent of domain 0000, so
> > > > I'm not sure to what extent the above applies.
> > >
> > > Domain 10000 definitely isn't physically independent of domain 0000
> > > since all TLPs to/from 10000 traverse the domain 0000 link to the VMD
> > > at 0000:00:0e.0.
> > >
> > > The platform can access the VMD endpoint (0000:00:0e.0) config space.
> > > But I don't think the platform can access config space of anything in
> > > domain 10000, which in my mind makes it *logically* independent.
> > >
> > > IIUC, config access to anything below the VMD (e.g., domain 10000) is
> > > done by the vmd driver itself using BAR 0 of the VMD device
> > > (vmd_cfg_addr(), vmd_pci_read(), vmd_pci_write(), see [1]).
> > >
> > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/vmd.c?id=v6.5#n378
> >
> > I know, but the platform firmware may not need to go through the VMD
> > BAR in order to change the configuration of the devices in the "VMD
> > segment".
>
> I'm assuming that firmware can only access "VMD segment" config space
> if it has its own VMD driver.  It sounds like there might be another
> way that I don't know about?  Is there some way to do things via AML
> and _REG?

I would need to ask the VMD people about this TBH.

> > The question is whether or not the OS should attempt to control the
> > _OSC features on the VMD segment if the firmware has not allowed it to
> > control those features on the primary "parent" segment; the way the
> > config space on the VMD segment is accessed may not be entirely
> > relevant in that respect.
>
> All these features are managed via config space.

As far as the OS is concerned.

> I don't know how to
> interpret "firmware has no way to read the AER Capability of a Root
> Port in the VMD segment, but firmware still owns AER".  That sounds to
> me like "nobody can do anything with AER".

This assumes that the config space access method used by Linux is the
only one available, but I'm not sure that this is the case.  It would
be nice if someone could confirm or deny this.  Nirmal?

Also, I know for a fact that the firmware running on certain tiny
cores included in the SoC may talk to the PCI devices on it via I2C or
similar (at least on servers).  This communication channel can be used
to control the features in question AFAICS.

> > Note that VMD may want to impose additional restrictions on the OS
> > control of those features, but it is not clear to me whether or not it
> > should attempt to override the refusal to grant control of them on the
> > primary host bridge.
>
> In practical terms, I think you're saying vmd_copy_host_bridge_flags()
> could CLEAR vmd_bridge->native_* flags (i.e., restrict the OS from
> using features even if the platform has granted control of them), but
> it's not clear that it should SET them (i.e., override platform's
> refusal to grant control)?  Is that right?

Yes, it is.

> I've been assuming config space access is the critical piece, but it
> sounds like maybe it's not?

It may not be in principle, so it would be good to have more
information regarding this.  I would like the VMD people to tell us
what exactly the expectations are.
Nirmal Patel Sept. 21, 2023, 12:20 a.m. UTC | #18
On 9/20/2023 3:08 AM, Rafael J. Wysocki wrote:
> On Tue, Sep 19, 2023 at 10:09 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>> On Tue, Sep 19, 2023 at 08:32:22PM +0200, Rafael J. Wysocki wrote:
>>> On Tue, Sep 19, 2023 at 7:33 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>>>> On Tue, Sep 19, 2023 at 05:52:33PM +0200, Rafael J. Wysocki wrote:
>>>>> On Tue, Sep 19, 2023 at 4:34 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>>>>>> On Tue, Sep 19, 2023 at 11:31:57AM +0800, Kai-Heng Feng wrote:
>>>>>>> On Wed, Sep 13, 2023 at 8:50 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>>>>>>> [snipped]
>>>>>>>> Hmm.  In some ways the VMD device acts as a Root Port, since it
>>>>>>>> originates a new hierarchy in a separate domain, but on the upstream
>>>>>>>> side, it's just a normal endpoint.
>>>>>>>>
>>>>>>>> How does AER for the new hierarchy work?  A device below the VMD can
>>>>>>>> generate ERR_COR/ERR_NONFATAL/ERR_FATAL messages.  I guess I was
>>>>>>>> assuming those messages would terminate at the VMD, and the VMD could
>>>>>>>> generate an AER interrupt just like a Root Port.  But that can't be
>>>>>>>> right because I don't think VMD would have the Root Error Command
>>>>>>>> register needed to manage that interrupt.
>>>>>>> VMD itself doesn't seem to manage AER, the rootport that "moved" from
>>>>>>> 0000 domain does:
>>>>>>> [ 2113.507345] pcieport 10000:e0:06.0: AER: Corrected error received:
>>>>>>> 10000:e1:00.0
>>>>>>> [ 2113.507380] nvme 10000:e1:00.0: PCIe Bus Error: severity=Corrected,
>>>>>>> type=Physical Layer, (Receiver ID)
>>>>>>> [ 2113.507389] nvme 10000:e1:00.0:   device [144d:a80a] error
>>>>>>> status/mask=00000001/0000e000
>>>>>>> [ 2113.507398] nvme 10000:e1:00.0:    [ 0] RxErr                  (First)
>>>>>> Oh, I forgot how VMD works.  It sounds like there *is* a Root Port
>>>>>> that is logically below the VMD, e.g., (from
>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=215027):
>>>>>>
>>>>>>   ACPI: PCI Root Bridge [PC00] (domain 0000 [bus 00-e0])
>>>>>>   acpi PNP0A08:00: _OSC: platform does not support [AER]
>>>>>>   acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug SHPCHotplug PME PCIeCapability LTR]
>>>>>>   pci  0000:00:0e.0: [8086:467f] type 00         # VMD
>>>>>>   vmd  0000:00:0e.0: PCI host bridge to bus 10000:e0
>>>>>>   pci 10000:e0:06.0: [8086:464d] type 01         # Root Port to [bus e1]
>>>>>>   pci 10000:e1:00.0: [144d:a80a] type 00         # Samsung NVMe
>>>>>>
>>>>>> So ERR_* messages from the e1:00.0 Samsung device would terminate at
>>>>>> the e0:06.0 Root Port.  That Root Port has an AER Capability with Root
>>>>>> Error Command/Status/Error Source registers.
>>>>>>
>>>>>>>> But if VMD just passes those messages up to the Root Port, the source
>>>>>>>> of the messages (the Requester ID) won't make any sense because
>>>>>>>> they're in a hierarchy the Root Port doesn't know anything about.
>>>>>>> Not sure what's current status is but I think Nirmal's patch is valid
>>>>>>> for both our cases.
>>>>>> So I think the question is whether that PNP0A08:00 _OSC applies to
>>>>>> domain 10000.  I think the answer is "no" because the platform doesn't
>>>>>> know about the existence of domain 10000, and it can't access config
>>>>>> space in that domain.
>>>>> Well, the VMD device itself is there in domain 0000, however, and sure
>>>>> enough, the platform firmware can access its config space.
>>>>>
>>>>>> E.g., if _OSC negotiated that the platform owned AER in domain 0000, I
>>>>>> don't think it would make sense for that to mean the platform *also*
>>>>>> owned AER in domain 10000, because the platform doesn't know how to
>>>>>> configure AER or handle AER interrupts in that domain.
>>>>> I'm not sure about this.
>>>>>
>>>>> AFAICS, domain 10000 is not physically independent of domain 0000, so
>>>>> I'm not sure to what extent the above applies.
>>>> Domain 10000 definitely isn't physically independent of domain 0000
>>>> since all TLPs to/from 10000 traverse the domain 0000 link to the VMD
>>>> at 0000:00:0e.0.
>>>>
>>>> The platform can access the VMD endpoint (0000:00:0e.0) config space.
>>>> But I don't think the platform can access config space of anything in
>>>> domain 10000, which in my mind makes it *logically* independent.
>>>>
>>>> IIUC, config access to anything below the VMD (e.g., domain 10000) is
>>>> done by the vmd driver itself using BAR 0 of the VMD device
>>>> (vmd_cfg_addr(), vmd_pci_read(), vmd_pci_write(), see [1]).
>>>>
>>>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/vmd.c?id=v6.5#n378
>>> I know, but the platform firmware may not need to go through the VMD
>>> BAR in order to change the configuration of the devices in the "VMD
>>> segment".
>> I'm assuming that firmware can only access "VMD segment" config space
>> if it has its own VMD driver.  It sounds like there might be another
>> way that I don't know about?  Is there some way to do things via AML
>> and _REG?
> I would need to ask the VMD people about this TBH.

Sorry for the previous message!

Once VMD is enabled in the BIOS, the entire PCI tree behind VMD is owned
and managed by the UEFI VMD driver. So the 10000 domain is owned by the
VMD Linux driver after boot up.

The entire PCI topology behind VMD is mapped into a memory region whose
address is programmed into BAR0/1 of the VMD endpoint. Any config access
to the rootports or NVMe devices behind VMD is an MMIO access.

>
>>> The question is whether or not the OS should attempt to control the
>>> _OSC features on the VMD segment if the firmware has not allowed it to
>>> control those features on the primary "parent" segment and the way the
>>> config space on the VMD segment is accessed may not be entirely
>>> relevant in that respect.
I agree. The way the config space on the VMD segment is accessed may
not be important, since the real question is the ownership of the VMD
domain.

When firmware disables native_aer support, the VMD driver doesn't
know about it. We can verify this by comparing the values of
native_aer for the VMD host_bridge and the VMD root_bridge
(i.e. in vmd_copy_host_bridge_flags).
The OS needs to make the VMD driver aware of whether AER is enabled or
disabled by the firmware.

We should also consider the case of a VM.
The VM firmware is not aware of VMD or VMD Hotplug support. The Host
firmware would need to provide a way to pass the Hotplug settings to
the VM firmware, but that is not possible at the moment because the VM
firmware doesn't know about VMD.

Another impact of _OSC ownership of all the settings in the VMD
domain is that the VMD Hotplug setting from the BIOS becomes irrelevant.
The BIOS menu allows enabling and disabling of Hotplug on VMD's
rootports, but it doesn't allow enabling or disabling of AER, PM, DPC,
etc.
>> All these features are managed via config space.
> As far as the OS is concerned.
>
>> I don't know how to
>> interpret "firmware has no way to read the AER Capability of a Root
>> Port in the VMD segment, but firmware still owns AER".  That sounds to
>> me like "nobody can do anything with AER".
> This assumes that the config space access method used by Linux is the
> only one available, but I'm not sure that this is the case.  It would
> be nice if someone could confirm or deny this.  Nirmal?
>
> Also, I know for a fact that the firmware running on certain tiny
> cores included in the SoC may talk to the PCI devices on it via I2C or
> similar (at least on servers).  This communication channel can be used
> to control the features in question AFAICS.
>
>>> Note that VMD may want to impose additional restrictions on the OS
>>> control of those features, but it is not clear to me whether or not it
>>> should attempt to override the refusal to grant control of them on the
>>> primary host bridge.
>> In practical terms, I think you're saying vmd_copy_host_bridge_flags()
>> could CLEAR vmd_bridge->native_* flags (i.e., restrict the OS from
>> using features even if the platform has granted control of them), but
>> it's not clear that it should SET them (i.e., override platform's
>> refusal to grant control)?  Is that right?
> Yes, it is.
That may be true for the host, but not in the case of a VM, where the
VM BIOS disables everything and sets the features to the default
power-off state.

Since the VMD driver doesn't know whether native_aer is enabled or
disabled on the platform by the firmware, we should keep using
vmd_copy_host_bridge_flags to get the native_aer settings.

I can think of two approaches at the moment that can solve our issue.

Option #1: Keep the proposed changes.

Option #2: Keep the patch 04b12ef163d1 and find a way to set Hotplug
for VMD in a VM. One way to do this is to read the Hotplug Capable bit
in SltCap, as defined in the PCIe spec, and enable or disable
native_hotplug based on that value.

For more information: since the VMD Linux driver doesn't have
independent AER handling, it should honor the platform's _OSC AER
settings. Before the patch 04b12ef163d1, AER was getting enabled on
the VMD rootports when the VMD driver called pci_create_root_bus ->
pci_alloc_host_bridge -> pci_init_host_bridge. See [1].

[1] https://elixir.bootlin.com/linux/latest/C/ident/pci_init_host_bridge

>
>> I've been assuming config space access is the critical piece, but it
>> sounds like maybe it's not?
> It may not be in principle, so it would be good to have more
> information regarding this.  I would like the VMD people to tell us
> what exactly the expectations are.
diff mbox series

Patch

diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index 769eedeb8802..52c2461b4761 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -701,8 +701,6 @@  static int vmd_alloc_irqs(struct vmd_dev *vmd)
 static void vmd_copy_host_bridge_flags(struct pci_host_bridge *root_bridge,
 				       struct pci_host_bridge *vmd_bridge)
 {
-	vmd_bridge->native_pcie_hotplug = root_bridge->native_pcie_hotplug;
-	vmd_bridge->native_shpc_hotplug = root_bridge->native_shpc_hotplug;
 	vmd_bridge->native_aer = root_bridge->native_aer;
 	vmd_bridge->native_pme = root_bridge->native_pme;
 	vmd_bridge->native_ltr = root_bridge->native_ltr;