diff mbox series

x86/PCI: Use MMCONFIG by default for KVM guests

Message ID 20200722001513.298315-1-jusual@redhat.com
State New
Headers show
Series x86/PCI: Use MMCONFIG by default for KVM guests | expand

Commit Message

Julia Suvorova July 22, 2020, 12:15 a.m. UTC
Scanning for PCI devices at boot takes a long time for KVM guests. It
can be reduced if KVM will handle all configuration space accesses for
non-existent devices without going to userspace [1]. But for this to
work, all accesses must go through MMCONFIG.
This change allows to use pci_mmcfg as raw_pci_ops for 64-bit KVM
guests making MMCONFIG the default access method.

[1] https://lkml.org/lkml/2020/5/14/936

Signed-off-by: Julia Suvorova <jusual@redhat.com>
---
 arch/x86/pci/direct.c      | 5 +++++
 arch/x86/pci/mmconfig_64.c | 3 +++
 2 files changed, 8 insertions(+)

Comments

Matthew Wilcox (Oracle) July 22, 2020, 2:59 a.m. UTC | #1
On Wed, Jul 22, 2020 at 02:15:13AM +0200, Julia Suvorova wrote:
> @@ -264,6 +265,10 @@ void __init pci_direct_init(int type)
>  {
>  	if (type == 0)
>  		return;
> +
> +	if (raw_pci_ext_ops && kvm_para_available())
> +		return;

This is a bit of a subtle way of saying "If mmconfig exists, don't use
the cf8 mechanism".  There's probably a better way of doing this, but
the x86 pci init sequence is already byzantine and I don't understand it
well enough to offer you an alternative.
Vitaly Kuznetsov July 22, 2020, 9:43 a.m. UTC | #2
Julia Suvorova <jusual@redhat.com> writes:

> Scanning for PCI devices at boot takes a long time for KVM guests. It
> can be reduced if KVM will handle all configuration space accesses for
> non-existent devices without going to userspace [1]. But for this to
> work, all accesses must go through MMCONFIG.
> This change allows to use pci_mmcfg as raw_pci_ops for 64-bit KVM
> guests making MMCONFIG the default access method.
>
> [1] https://lkml.org/lkml/2020/5/14/936
>
> Signed-off-by: Julia Suvorova <jusual@redhat.com>
> ---
>  arch/x86/pci/direct.c      | 5 +++++
>  arch/x86/pci/mmconfig_64.c | 3 +++
>  2 files changed, 8 insertions(+)
>
> diff --git a/arch/x86/pci/direct.c b/arch/x86/pci/direct.c
> index a51074c55982..8ff6b65d8f48 100644
> --- a/arch/x86/pci/direct.c
> +++ b/arch/x86/pci/direct.c
> @@ -6,6 +6,7 @@
>  #include <linux/pci.h>
>  #include <linux/init.h>
>  #include <linux/dmi.h>
> +#include <linux/kvm_para.h>
>  #include <asm/pci_x86.h>
>  
>  /*
> @@ -264,6 +265,10 @@ void __init pci_direct_init(int type)
>  {
>  	if (type == 0)
>  		return;
> +
> +	if (raw_pci_ext_ops && kvm_para_available())
> +		return;
> +
>  	printk(KERN_INFO "PCI: Using configuration type %d for base access\n",
>  		 type);
>  	if (type == 1) {
> diff --git a/arch/x86/pci/mmconfig_64.c b/arch/x86/pci/mmconfig_64.c
> index 0c7b6e66c644..9eb772821766 100644
> --- a/arch/x86/pci/mmconfig_64.c
> +++ b/arch/x86/pci/mmconfig_64.c
> @@ -10,6 +10,7 @@
>  #include <linux/init.h>
>  #include <linux/acpi.h>
>  #include <linux/bitmap.h>
> +#include <linux/kvm_para.h>
>  #include <linux/rcupdate.h>
>  #include <asm/e820/api.h>
>  #include <asm/pci_x86.h>
> @@ -122,6 +123,8 @@ int __init pci_mmcfg_arch_init(void)
>  		}
>  
>  	raw_pci_ext_ops = &pci_mmcfg;
> +	if (kvm_para_available())
> +		raw_pci_ops = &pci_mmcfg;
>  
>  	return 1;
>  }

This implies mmconfig access method is always functional (when present)
for all KVM guests, regardless of hypervisor version/which KVM userspace
is is use/... In case the assumption is true the patch looks good (to
me) but in case it isn't or if we think that more control over this
is needed we may want to introduce a PV feature bit for KVM.

Also, I'm thinking about moving this to arch/x86/kernel/kvm.c: we can
override x86_init.pci.arch_init and reassign raw_pci_ops after doing
pci_arch_init().

Cc: kvm@vger.kernel.org
Michael S. Tsirkin July 22, 2020, 10:59 a.m. UTC | #3
On Wed, Jul 22, 2020 at 02:15:13AM +0200, Julia Suvorova wrote:
> Scanning for PCI devices at boot takes a long time for KVM guests. It
> can be reduced if KVM will handle all configuration space accesses for
> non-existent devices without going to userspace [1]. But for this to
> work, all accesses must go through MMCONFIG.
> This change allows to use pci_mmcfg as raw_pci_ops for 64-bit KVM
> guests making MMCONFIG the default access method.
> 
> [1] https://lkml.org/lkml/2020/5/14/936
> 
> Signed-off-by: Julia Suvorova <jusual@redhat.com>

Thanks for the patch!
Some comments:


I guess the point is that on KVM, MMIO accesses of mmcfg are
faster than two accesses needed for classic access - is that right?
Worth mentioning in the commit log.

> ---
>  arch/x86/pci/direct.c      | 5 +++++
>  arch/x86/pci/mmconfig_64.c | 3 +++
>  2 files changed, 8 insertions(+)
> 
> diff --git a/arch/x86/pci/direct.c b/arch/x86/pci/direct.c
> index a51074c55982..8ff6b65d8f48 100644
> --- a/arch/x86/pci/direct.c
> +++ b/arch/x86/pci/direct.c
> @@ -6,6 +6,7 @@
>  #include <linux/pci.h>
>  #include <linux/init.h>
>  #include <linux/dmi.h>
> +#include <linux/kvm_para.h>
>  #include <asm/pci_x86.h>
>  
>  /*
> @@ -264,6 +265,10 @@ void __init pci_direct_init(int type)
>  {
>  	if (type == 0)
>  		return;
> +
> +	if (raw_pci_ext_ops && kvm_para_available())
> +		return;
> +
>  	printk(KERN_INFO "PCI: Using configuration type %d for base access\n",
>  		 type);
>  	if (type == 1) {
> diff --git a/arch/x86/pci/mmconfig_64.c b/arch/x86/pci/mmconfig_64.c
> index 0c7b6e66c644..9eb772821766 100644
> --- a/arch/x86/pci/mmconfig_64.c
> +++ b/arch/x86/pci/mmconfig_64.c
> @@ -10,6 +10,7 @@
>  #include <linux/init.h>
>  #include <linux/acpi.h>
>  #include <linux/bitmap.h>
> +#include <linux/kvm_para.h>
>  #include <linux/rcupdate.h>
>  #include <asm/e820/api.h>
>  #include <asm/pci_x86.h>
> @@ -122,6 +123,8 @@ int __init pci_mmcfg_arch_init(void)
>  		}
>  
>  	raw_pci_ext_ops = &pci_mmcfg;
> +	if (kvm_para_available())
> +		raw_pci_ops = &pci_mmcfg;
>  
>  	return 1;
>  }

The issue with anything like this is that it breaks if we ever do
some config accesses that affect mmconfig, e.g. to move it, or if
disabling or sizing BARs on some device (e.g. the PCI bridge)
also disables MMCFG.

I guess the explanation for why this works with QEMU is basically
that at least on QEMU right now disabling memory on the root
device does not disable MMCFG, and linux does not bother
tweaking MMCFG range set up by the bios.

Some suggestions:

1.  It's worth mentioning all this in the commit log.

2.  How do we know the above will always be correct?
    Something like checking the ID of the root might be a good idea.
    And given we know the ID, we can also make sure we don't
    disable MMCFG. Does this make sense?

3. Another idea: how about preferring pcbios on kvm instead? That can do
   what's appropriate for the platform ...

> -- 
> 2.25.4
Bjorn Helgaas July 22, 2020, 11:19 p.m. UTC | #4
On Wed, Jul 22, 2020 at 02:15:13AM +0200, Julia Suvorova wrote:
> Scanning for PCI devices at boot takes a long time for KVM guests. It
> can be reduced if KVM will handle all configuration space accesses for
> non-existent devices without going to userspace [1]. But for this to
> work, all accesses must go through MMCONFIG.
> This change allows to use pci_mmcfg as raw_pci_ops for 64-bit KVM
> guests making MMCONFIG the default access method.

The above *looks* like it's intended to be two paragraphs, which would
be easier to read with a blank line between.

The last sentence should say what the patch actually *does*, e.g.,
"Use pci_mmcfg as raw_pci_ops ..."

> [1] https://lkml.org/lkml/2020/5/14/936

Please use a lore.kernel.org URL instead because it's more usable and
I'd rather depend on kernel.org than lkml.org.

> Signed-off-by: Julia Suvorova <jusual@redhat.com>
> ---
>  arch/x86/pci/direct.c      | 5 +++++
>  arch/x86/pci/mmconfig_64.c | 3 +++
>  2 files changed, 8 insertions(+)
> 
> diff --git a/arch/x86/pci/direct.c b/arch/x86/pci/direct.c
> index a51074c55982..8ff6b65d8f48 100644
> --- a/arch/x86/pci/direct.c
> +++ b/arch/x86/pci/direct.c
> @@ -6,6 +6,7 @@
>  #include <linux/pci.h>
>  #include <linux/init.h>
>  #include <linux/dmi.h>
> +#include <linux/kvm_para.h>
>  #include <asm/pci_x86.h>
>  
>  /*
> @@ -264,6 +265,10 @@ void __init pci_direct_init(int type)
>  {
>  	if (type == 0)
>  		return;
> +
> +	if (raw_pci_ext_ops && kvm_para_available())
> +		return;
>  	printk(KERN_INFO "PCI: Using configuration type %d for base access\n",
>  		 type);
>  	if (type == 1) {
> diff --git a/arch/x86/pci/mmconfig_64.c b/arch/x86/pci/mmconfig_64.c
> index 0c7b6e66c644..9eb772821766 100644
> --- a/arch/x86/pci/mmconfig_64.c
> +++ b/arch/x86/pci/mmconfig_64.c
> @@ -10,6 +10,7 @@
>  #include <linux/init.h>
>  #include <linux/acpi.h>
>  #include <linux/bitmap.h>
> +#include <linux/kvm_para.h>
>  #include <linux/rcupdate.h>
>  #include <asm/e820/api.h>
>  #include <asm/pci_x86.h>
> @@ -122,6 +123,8 @@ int __init pci_mmcfg_arch_init(void)
>  		}
>  
>  	raw_pci_ext_ops = &pci_mmcfg;
> +	if (kvm_para_available())
> +		raw_pci_ops = &pci_mmcfg;

The idea of using MMCONFIG for *all* config space, not just extended
config space, makes sense to me, although the very long discussion at
https://lore.kernel.org/lkml/20071225032605.29147200@laptopd505.fenrus.org/
makes me wary.  Of course I realize you're talking specifically about
KVM, not doing this in general.

But it doesn't seem right to make this specific to KVM, since it's not
obvious to me that there's a basis in PCI for making this distinction.

>  	return 1;
>  }
> -- 
> 2.25.4
>
Julia Suvorova July 26, 2020, 6:35 p.m. UTC | #5
On Wed, Jul 22, 2020 at 11:43 AM Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>
> Julia Suvorova <jusual@redhat.com> writes:
>
> > Scanning for PCI devices at boot takes a long time for KVM guests. It
> > can be reduced if KVM will handle all configuration space accesses for
> > non-existent devices without going to userspace [1]. But for this to
> > work, all accesses must go through MMCONFIG.
> > This change allows to use pci_mmcfg as raw_pci_ops for 64-bit KVM
> > guests making MMCONFIG the default access method.
> >
> > [1] https://lkml.org/lkml/2020/5/14/936
> >
> > Signed-off-by: Julia Suvorova <jusual@redhat.com>
> > ---
> >  arch/x86/pci/direct.c      | 5 +++++
> >  arch/x86/pci/mmconfig_64.c | 3 +++
> >  2 files changed, 8 insertions(+)
> >
> > diff --git a/arch/x86/pci/direct.c b/arch/x86/pci/direct.c
> > index a51074c55982..8ff6b65d8f48 100644
> > --- a/arch/x86/pci/direct.c
> > +++ b/arch/x86/pci/direct.c
> > @@ -6,6 +6,7 @@
> >  #include <linux/pci.h>
> >  #include <linux/init.h>
> >  #include <linux/dmi.h>
> > +#include <linux/kvm_para.h>
> >  #include <asm/pci_x86.h>
> >
> >  /*
> > @@ -264,6 +265,10 @@ void __init pci_direct_init(int type)
> >  {
> >       if (type == 0)
> >               return;
> > +
> > +     if (raw_pci_ext_ops && kvm_para_available())
> > +             return;
> > +
> >       printk(KERN_INFO "PCI: Using configuration type %d for base access\n",
> >                type);
> >       if (type == 1) {
> > diff --git a/arch/x86/pci/mmconfig_64.c b/arch/x86/pci/mmconfig_64.c
> > index 0c7b6e66c644..9eb772821766 100644
> > --- a/arch/x86/pci/mmconfig_64.c
> > +++ b/arch/x86/pci/mmconfig_64.c
> > @@ -10,6 +10,7 @@
> >  #include <linux/init.h>
> >  #include <linux/acpi.h>
> >  #include <linux/bitmap.h>
> > +#include <linux/kvm_para.h>
> >  #include <linux/rcupdate.h>
> >  #include <asm/e820/api.h>
> >  #include <asm/pci_x86.h>
> > @@ -122,6 +123,8 @@ int __init pci_mmcfg_arch_init(void)
> >               }
> >
> >       raw_pci_ext_ops = &pci_mmcfg;
> > +     if (kvm_para_available())
> > +             raw_pci_ops = &pci_mmcfg;
> >
> >       return 1;
> >  }
>
> This implies mmconfig access method is always functional (when present)
> for all KVM guests, regardless of hypervisor version/which KVM userspace
> is is use/... In case the assumption is true the patch looks good (to
> me) but in case it isn't or if we think that more control over this
> is needed we may want to introduce a PV feature bit for KVM.

Ok, I'll introduce a feature bit and turn it on in QEMU.

> Also, I'm thinking about moving this to arch/x86/kernel/kvm.c: we can
> override x86_init.pci.arch_init and reassign raw_pci_ops after doing
> pci_arch_init().

This might be a good idea.

Best regards, Julia Suvorova.
Julia Suvorova July 26, 2020, 6:58 p.m. UTC | #6
On Thu, Jul 23, 2020 at 1:19 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Wed, Jul 22, 2020 at 02:15:13AM +0200, Julia Suvorova wrote:
> > Scanning for PCI devices at boot takes a long time for KVM guests. It
> > can be reduced if KVM will handle all configuration space accesses for
> > non-existent devices without going to userspace [1]. But for this to
> > work, all accesses must go through MMCONFIG.
> > This change allows to use pci_mmcfg as raw_pci_ops for 64-bit KVM
> > guests making MMCONFIG the default access method.
>
> The above *looks* like it's intended to be two paragraphs, which would
> be easier to read with a blank line between.
>
> The last sentence should say what the patch actually *does*, e.g.,
> "Use pci_mmcfg as raw_pci_ops ..."
>
> > [1] https://lkml.org/lkml/2020/5/14/936
>
> Please use a lore.kernel.org URL instead because it's more usable and
> I'd rather depend on kernel.org than lkml.org.
>
> > Signed-off-by: Julia Suvorova <jusual@redhat.com>
> > ---
> >  arch/x86/pci/direct.c      | 5 +++++
> >  arch/x86/pci/mmconfig_64.c | 3 +++
> >  2 files changed, 8 insertions(+)
> >
> > diff --git a/arch/x86/pci/direct.c b/arch/x86/pci/direct.c
> > index a51074c55982..8ff6b65d8f48 100644
> > --- a/arch/x86/pci/direct.c
> > +++ b/arch/x86/pci/direct.c
> > @@ -6,6 +6,7 @@
> >  #include <linux/pci.h>
> >  #include <linux/init.h>
> >  #include <linux/dmi.h>
> > +#include <linux/kvm_para.h>
> >  #include <asm/pci_x86.h>
> >
> >  /*
> > @@ -264,6 +265,10 @@ void __init pci_direct_init(int type)
> >  {
> >       if (type == 0)
> >               return;
> > +
> > +     if (raw_pci_ext_ops && kvm_para_available())
> > +             return;
> >       printk(KERN_INFO "PCI: Using configuration type %d for base access\n",
> >                type);
> >       if (type == 1) {
> > diff --git a/arch/x86/pci/mmconfig_64.c b/arch/x86/pci/mmconfig_64.c
> > index 0c7b6e66c644..9eb772821766 100644
> > --- a/arch/x86/pci/mmconfig_64.c
> > +++ b/arch/x86/pci/mmconfig_64.c
> > @@ -10,6 +10,7 @@
> >  #include <linux/init.h>
> >  #include <linux/acpi.h>
> >  #include <linux/bitmap.h>
> > +#include <linux/kvm_para.h>
> >  #include <linux/rcupdate.h>
> >  #include <asm/e820/api.h>
> >  #include <asm/pci_x86.h>
> > @@ -122,6 +123,8 @@ int __init pci_mmcfg_arch_init(void)
> >               }
> >
> >       raw_pci_ext_ops = &pci_mmcfg;
> > +     if (kvm_para_available())
> > +             raw_pci_ops = &pci_mmcfg;
>
> The idea of using MMCONFIG for *all* config space, not just extended
> config space, makes sense to me, although the very long discussion at
> https://lore.kernel.org/lkml/20071225032605.29147200@laptopd505.fenrus.org/
> makes me wary.  Of course I realize you're talking specifically about
> KVM, not doing this in general.
>
> But it doesn't seem right to make this specific to KVM, since it's not
> obvious to me that there's a basis in PCI for making this distinction.

Bugs that were fixed (or more accurately, avoided) by a0ca99096094
("PCI x86: always use conf1 to access config space below 256 bytes")
are still present. And to enable MMCONFIG for the entire config space,
we need to re-introduce all these fixes or at least identify affected
devices, which may be impossible.

We can avoid KVM-specific changes in the generic PCI code by
implementing x86_init.pci.arch_init inside KVM code, as Vitaly
suggested. What do you think?

Best regards, Julia Suvorova.
Andy Shevchenko July 27, 2020, 2:04 p.m. UTC | #7
On Wed, Jul 22, 2020 at 12:47 PM Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
> Julia Suvorova <jusual@redhat.com> writes:

> > Scanning for PCI devices at boot takes a long time for KVM guests. It
> > can be reduced if KVM will handle all configuration space accesses for
> > non-existent devices without going to userspace [1]. But for this to
> > work, all accesses must go through MMCONFIG.
> > This change allows to use pci_mmcfg as raw_pci_ops for 64-bit KVM
> > guests making MMCONFIG the default access method.

I'm not sure it won't break anything.

> > [1] https://lkml.org/lkml/2020/5/14/936

use Link: tag and better to use lore.kernel.org.

> This implies mmconfig access method is always functional (when present)
> for all KVM guests, regardless of hypervisor version/which KVM userspace
> is is use/... In case the assumption is true the patch looks good (to
> me) but in case it isn't or if we think that more control over this
> is needed we may want to introduce a PV feature bit for KVM.
>
> Also, I'm thinking about moving this to arch/x86/kernel/kvm.c: we can
> override x86_init.pci.arch_init and reassign raw_pci_ops after doing
> pci_arch_init().

% git grep -n -w x86_init.pci.arch_init -- arch/x86/
arch/x86/hyperv/hv_init.c:400:  x86_init.pci.arch_init = hv_pci_init;
arch/x86/kernel/apic/apic_numachip.c:203:       x86_init.pci.arch_init
= pci_numachip_init;
arch/x86/kernel/jailhouse.c:207:        x86_init.pci.arch_init
 = jailhouse_pci_arch_init;
arch/x86/pci/init.c:20: if (x86_init.pci.arch_init && !x86_init.pci.arch_init())
arch/x86/platform/intel-mid/intel-mid.c:172:    x86_init.pci.arch_init
= intel_mid_pci_init;
arch/x86/platform/olpc/olpc.c:309:              x86_init.pci.arch_init
= pci_olpc_init;
arch/x86/xen/enlighten_pv.c:1411:
x86_init.pci.arch_init = pci_xen_init;

Are you going to update all these? Or how this is supposed to work (I
may be missing something)?
Vitaly Kuznetsov July 27, 2020, 3:55 p.m. UTC | #8
Andy Shevchenko <andy.shevchenko@gmail.com> writes:

> On Wed, Jul 22, 2020 at 12:47 PM Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>> Julia Suvorova <jusual@redhat.com> writes:
>
>> > Scanning for PCI devices at boot takes a long time for KVM guests. It
>> > can be reduced if KVM will handle all configuration space accesses for
>> > non-existent devices without going to userspace [1]. But for this to
>> > work, all accesses must go through MMCONFIG.
>> > This change allows to use pci_mmcfg as raw_pci_ops for 64-bit KVM
>> > guests making MMCONFIG the default access method.
>
> I'm not sure it won't break anything.

It likely will as it's really hard to check all possible KVM
configurations in existence and that's why we are converging on adding a
feature bit which KVM userspace (e.g. QEMU) will set when the
configuration is known to be good.

>
>> > [1] https://lkml.org/lkml/2020/5/14/936
>
> use Link: tag and better to use lore.kernel.org.
>
>> This implies mmconfig access method is always functional (when present)
>> for all KVM guests, regardless of hypervisor version/which KVM userspace
>> is is use/... In case the assumption is true the patch looks good (to
>> me) but in case it isn't or if we think that more control over this
>> is needed we may want to introduce a PV feature bit for KVM.
>>
>> Also, I'm thinking about moving this to arch/x86/kernel/kvm.c: we can
>> override x86_init.pci.arch_init and reassign raw_pci_ops after doing
>> pci_arch_init().
>
> % git grep -n -w x86_init.pci.arch_init -- arch/x86/
> arch/x86/hyperv/hv_init.c:400:  x86_init.pci.arch_init = hv_pci_init;
> arch/x86/kernel/apic/apic_numachip.c:203:       x86_init.pci.arch_init
> = pci_numachip_init;
> arch/x86/kernel/jailhouse.c:207:        x86_init.pci.arch_init
>  = jailhouse_pci_arch_init;
> arch/x86/pci/init.c:20: if (x86_init.pci.arch_init && !x86_init.pci.arch_init())
> arch/x86/platform/intel-mid/intel-mid.c:172:    x86_init.pci.arch_init
> = intel_mid_pci_init;
> arch/x86/platform/olpc/olpc.c:309:              x86_init.pci.arch_init
> = pci_olpc_init;
> arch/x86/xen/enlighten_pv.c:1411:
> x86_init.pci.arch_init = pci_xen_init;
>
> Are you going to update all these? Or how this is supposed to work (I
> may be missing something)?

My suggestion was to do exactly the same for KVM guests instead of
switching ops in pci_mmcfg_arch_init() depending on kvm_para_available()
output. Basically, keep all KVM-related tunings in one place
(arch/x86/kernel/kvm.c).
Michael S. Tsirkin July 29, 2020, 2:43 p.m. UTC | #9
On Sun, Jul 26, 2020 at 08:58:45PM +0200, Julia Suvorova wrote:
> On Thu, Jul 23, 2020 at 1:19 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > On Wed, Jul 22, 2020 at 02:15:13AM +0200, Julia Suvorova wrote:
> > > Scanning for PCI devices at boot takes a long time for KVM guests. It
> > > can be reduced if KVM will handle all configuration space accesses for
> > > non-existent devices without going to userspace [1]. But for this to
> > > work, all accesses must go through MMCONFIG.
> > > This change allows to use pci_mmcfg as raw_pci_ops for 64-bit KVM
> > > guests making MMCONFIG the default access method.
> >
> > The above *looks* like it's intended to be two paragraphs, which would
> > be easier to read with a blank line between.
> >
> > The last sentence should say what the patch actually *does*, e.g.,
> > "Use pci_mmcfg as raw_pci_ops ..."
> >
> > > [1] https://lkml.org/lkml/2020/5/14/936
> >
> > Please use a lore.kernel.org URL instead because it's more usable and
> > I'd rather depend on kernel.org than lkml.org.
> >
> > > Signed-off-by: Julia Suvorova <jusual@redhat.com>
> > > ---
> > >  arch/x86/pci/direct.c      | 5 +++++
> > >  arch/x86/pci/mmconfig_64.c | 3 +++
> > >  2 files changed, 8 insertions(+)
> > >
> > > diff --git a/arch/x86/pci/direct.c b/arch/x86/pci/direct.c
> > > index a51074c55982..8ff6b65d8f48 100644
> > > --- a/arch/x86/pci/direct.c
> > > +++ b/arch/x86/pci/direct.c
> > > @@ -6,6 +6,7 @@
> > >  #include <linux/pci.h>
> > >  #include <linux/init.h>
> > >  #include <linux/dmi.h>
> > > +#include <linux/kvm_para.h>
> > >  #include <asm/pci_x86.h>
> > >
> > >  /*
> > > @@ -264,6 +265,10 @@ void __init pci_direct_init(int type)
> > >  {
> > >       if (type == 0)
> > >               return;
> > > +
> > > +     if (raw_pci_ext_ops && kvm_para_available())
> > > +             return;
> > >       printk(KERN_INFO "PCI: Using configuration type %d for base access\n",
> > >                type);
> > >       if (type == 1) {
> > > diff --git a/arch/x86/pci/mmconfig_64.c b/arch/x86/pci/mmconfig_64.c
> > > index 0c7b6e66c644..9eb772821766 100644
> > > --- a/arch/x86/pci/mmconfig_64.c
> > > +++ b/arch/x86/pci/mmconfig_64.c
> > > @@ -10,6 +10,7 @@
> > >  #include <linux/init.h>
> > >  #include <linux/acpi.h>
> > >  #include <linux/bitmap.h>
> > > +#include <linux/kvm_para.h>
> > >  #include <linux/rcupdate.h>
> > >  #include <asm/e820/api.h>
> > >  #include <asm/pci_x86.h>
> > > @@ -122,6 +123,8 @@ int __init pci_mmcfg_arch_init(void)
> > >               }
> > >
> > >       raw_pci_ext_ops = &pci_mmcfg;
> > > +     if (kvm_para_available())
> > > +             raw_pci_ops = &pci_mmcfg;
> >
> > The idea of using MMCONFIG for *all* config space, not just extended
> > config space, makes sense to me, although the very long discussion at
> > https://lore.kernel.org/lkml/20071225032605.29147200@laptopd505.fenrus.org/
> > makes me wary.  Of course I realize you're talking specifically about
> > KVM, not doing this in general.
> >
> > But it doesn't seem right to make this specific to KVM, since it's not
> > obvious to me that there's a basis in PCI for making this distinction.
> 
> Bugs that were fixed (or more accurately, avoided) by a0ca99096094
> ("PCI x86: always use conf1 to access config space below 256 bytes")
> are still present. And to enable MMCONFIG for the entire config space,
> we need to re-introduce all these fixes or at least identify affected
> devices, which may be impossible.

What *is* about KVM here is that there's no real benefit
to this change if not running on x86 within a hypervisor.
And this should be better documented in a code comment and
commit log.



What is *not* about KVM here is that it's known to be
safe when running on QEMU and on the specific implementation.
Other implementations - even if they are using kvm -
might freeze if you disable memory of the pci host device,
or try to size BARs so they overlap the MMCONFIG.

So to proceed with your approach, I would say either we limit this to
just a known good QEMU device, or disable this when poking at unsafe
registers.

But I have another idea: isn't it true that we can get a large part of
the benefit simply by reading the device/vendor ID through MMCONFIG?
That is almost sure to be safe anyway, even though I would limit it to
just KVM simply because other systems do not benefit.

> 
> We can avoid KVM-specific changes in the generic PCI code by
> implementing x86_init.pci.arch_init inside KVM code, as Vitaly
> suggested. What do you think?
> 
> Best regards, Julia Suvorova.

Makes sense.
diff mbox series

Patch

diff --git a/arch/x86/pci/direct.c b/arch/x86/pci/direct.c
index a51074c55982..8ff6b65d8f48 100644
--- a/arch/x86/pci/direct.c
+++ b/arch/x86/pci/direct.c
@@ -6,6 +6,7 @@ 
 #include <linux/pci.h>
 #include <linux/init.h>
 #include <linux/dmi.h>
+#include <linux/kvm_para.h>
 #include <asm/pci_x86.h>
 
 /*
@@ -264,6 +265,10 @@  void __init pci_direct_init(int type)
 {
 	if (type == 0)
 		return;
+
+	if (raw_pci_ext_ops && kvm_para_available())
+		return;
+
 	printk(KERN_INFO "PCI: Using configuration type %d for base access\n",
 		 type);
 	if (type == 1) {
diff --git a/arch/x86/pci/mmconfig_64.c b/arch/x86/pci/mmconfig_64.c
index 0c7b6e66c644..9eb772821766 100644
--- a/arch/x86/pci/mmconfig_64.c
+++ b/arch/x86/pci/mmconfig_64.c
@@ -10,6 +10,7 @@ 
 #include <linux/init.h>
 #include <linux/acpi.h>
 #include <linux/bitmap.h>
+#include <linux/kvm_para.h>
 #include <linux/rcupdate.h>
 #include <asm/e820/api.h>
 #include <asm/pci_x86.h>
@@ -122,6 +123,8 @@  int __init pci_mmcfg_arch_init(void)
 		}
 
 	raw_pci_ext_ops = &pci_mmcfg;
+	if (kvm_para_available())
+		raw_pci_ops = &pci_mmcfg;
 
 	return 1;
 }