diff mbox

[RFC,net-next,1/4] pci: Add flag indicating device has been assigned by KVM

Message ID 20110727221749.8435.19000.stgit@gitlad.jf.intel.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Rose, Gregory V July 27, 2011, 10:17 p.m. UTC
Device drivers that create and destroy SR-IOV virtual functions via
calls to pci_enable_sriov() and pci_disable_sriov can cause catastrophic
failures if they attempt to destroy VFs while they are assigned to
guest virtual machines.  By adding a flag for use by the KVM module
to indicate that a device is assigned a device driver can check that
flag and avoid destroying VFs while they are assigned and avoid system
failures.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
---

 include/linux/pci.h     |    2 ++
 virt/kvm/assigned-dev.c |    2 ++
 virt/kvm/iommu.c        |    4 ++++
 3 files changed, 8 insertions(+), 0 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Ian Campbell July 28, 2011, 3:11 p.m. UTC | #1
On Wed, 2011-07-27 at 15:17 -0700, Greg Rose wrote:
> Device drivers that create and destroy SR-IOV virtual functions via
> calls to pci_enable_sriov() and pci_disable_sriov can cause catastrophic
> failures if they attempt to destroy VFs while they are assigned to
> guest virtual machines.  By adding a flag for use by the KVM module
> to indicate that a device is assigned a device driver can check that
> flag and avoid destroying VFs while they are assigned and avoid system
> failures.
> 
> Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
> ---
> 
>  include/linux/pci.h     |    2 ++

I added Jesse and linux-pci to CC.

>  virt/kvm/assigned-dev.c |    2 ++
>  virt/kvm/iommu.c        |    4 ++++
>  3 files changed, 8 insertions(+), 0 deletions(-)

I suppose this would also be useful in Xen's pciback or any other system
which does passthrough? (Konrad CC'd for pciback)

Is there some common lower layer we could hook this in to? (does
iommu_attach/detach_device make sense?) Or shall we just add the flag
manipulations to pciback as well?

Ian.

> 
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 2d29218..a297ca2 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -174,6 +174,8 @@ enum pci_dev_flags {
>  	PCI_DEV_FLAGS_MSI_INTX_DISABLE_BUG = (__force pci_dev_flags_t) 1,
>  	/* Device configuration is irrevocably lost if disabled into D3 */
>  	PCI_DEV_FLAGS_NO_D3 = (__force pci_dev_flags_t) 2,
> +	/* Provide indication device is assigned by KVM */
> +	PCI_DEV_FLAGS_ASSIGNED = (__force pci_dev_flags_t) 4,
>  };
>  
>  enum pci_irq_reroute_variant {
> diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
> index 6cc4b97..f401de1 100644
> --- a/virt/kvm/assigned-dev.c
> +++ b/virt/kvm/assigned-dev.c
> @@ -205,6 +205,8 @@ static void kvm_free_assigned_device(struct kvm *kvm,
>  	else
>  		pci_restore_state(assigned_dev->dev);
>  
> +	assigned_dev->dev->dev_flags &= ~PCI_DEV_FLAGS_ASSIGNED;
> +
>  	pci_release_regions(assigned_dev->dev);
>  	pci_disable_device(assigned_dev->dev);
>  	pci_dev_put(assigned_dev->dev);
> diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c
> index 62a9caf..cffc530 100644
> --- a/virt/kvm/iommu.c
> +++ b/virt/kvm/iommu.c
> @@ -181,6 +181,8 @@ int kvm_assign_device(struct kvm *kvm,
>  			goto out_unmap;
>  	}
>  
> +	pdev->dev_flags |= PCI_DEV_FLAGS_ASSIGNED;
> +
>  	printk(KERN_DEBUG "assign device %x:%x:%x.%x\n",
>  		assigned_dev->host_segnr,
>  		assigned_dev->host_busnr,
> @@ -209,6 +211,8 @@ int kvm_deassign_device(struct kvm *kvm,
>  
>  	iommu_detach_device(domain, &pdev->dev);
>  
> +	pdev->dev_flags &= ~PCI_DEV_FLAGS_ASSIGNED;
> +
>  	printk(KERN_DEBUG "deassign device %x:%x:%x.%x\n",
>  		assigned_dev->host_segnr,
>  		assigned_dev->host_busnr,
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
Rose, Gregory V July 28, 2011, 3:58 p.m. UTC | #2
> -----Original Message-----

> From: Ian Campbell [mailto:ijc@hellion.org.uk]

> Sent: Thursday, July 28, 2011 8:11 AM

> To: Rose, Gregory V; Konrad Rzeszutek Wilk

> Cc: netdev@vger.kernel.org; davem@davemloft.net;

> bhutchings@solarflare.com; Kirsher, Jeffrey T; Jesse Barnes; linux-

> pci@vger.kernel.org

> Subject: Re: [RFC net-next PATCH 1/4] pci: Add flag indicating device has

> been assigned by KVM

> 

> On Wed, 2011-07-27 at 15:17 -0700, Greg Rose wrote:

> > Device drivers that create and destroy SR-IOV virtual functions via

> > calls to pci_enable_sriov() and pci_disable_sriov can cause catastrophic

> > failures if they attempt to destroy VFs while they are assigned to

> > guest virtual machines.  By adding a flag for use by the KVM module

> > to indicate that a device is assigned a device driver can check that

> > flag and avoid destroying VFs while they are assigned and avoid system

> > failures.

> >

> > Signed-off-by: Greg Rose <gregory.v.rose@intel.com>

> > ---

> >

> >  include/linux/pci.h     |    2 ++

> 

> I added Jesse and linux-pci to CC.

> 

> >  virt/kvm/assigned-dev.c |    2 ++

> >  virt/kvm/iommu.c        |    4 ++++

> >  3 files changed, 8 insertions(+), 0 deletions(-)

> 

> I suppose this would also be useful in Xen's pciback or any other system

> which does passthrough? (Konrad CC'd for pciback)


Definitely yes.  Xen experiences the same issues when the PF driver is removed while VFs are assigned to guests.

> 

> Is there some common lower layer we could hook this in to? (does

> iommu_attach/detach_device make sense?) Or shall we just add the flag

> manipulations to pciback as well?

[Greg Rose] 

I was unaware of any common lower layers, i.e I didn't know that Xen also uses the iommu_attach/detach_device calls.  It took me a week of digging around in the KVM module code just to find these hooks.  Generally I stick to Ethernet device drivers and I'm not that familiar with device pass through code.  I was just confronted with a problem and looking for some way to fix it.

;^)

That sounds like a good idea, let me have a look at it.

- Greg
Ian Campbell July 28, 2011, 4:27 p.m. UTC | #3
On Thu, 2011-07-28 at 08:58 -0700, Rose, Gregory V wrote:
> > -----Original Message-----
> > From: Ian Campbell [mailto:ijc@hellion.org.uk]
> > Sent: Thursday, July 28, 2011 8:11 AM
> > To: Rose, Gregory V; Konrad Rzeszutek Wilk
> > Cc: netdev@vger.kernel.org; davem@davemloft.net;
> > bhutchings@solarflare.com; Kirsher, Jeffrey T; Jesse Barnes; linux-
> > pci@vger.kernel.org
> > Subject: Re: [RFC net-next PATCH 1/4] pci: Add flag indicating device has
> > been assigned by KVM
> > 
> > On Wed, 2011-07-27 at 15:17 -0700, Greg Rose wrote:
> > > Device drivers that create and destroy SR-IOV virtual functions via
> > > calls to pci_enable_sriov() and pci_disable_sriov can cause catastrophic
> > > failures if they attempt to destroy VFs while they are assigned to
> > > guest virtual machines.  By adding a flag for use by the KVM module
> > > to indicate that a device is assigned a device driver can check that
> > > flag and avoid destroying VFs while they are assigned and avoid system
> > > failures.
> > >
> > > Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
> > > ---
> > >
> > >  include/linux/pci.h     |    2 ++
> > 
> > I added Jesse and linux-pci to CC.
> > 
> > >  virt/kvm/assigned-dev.c |    2 ++
> > >  virt/kvm/iommu.c        |    4 ++++
> > >  3 files changed, 8 insertions(+), 0 deletions(-)
> > 
> > I suppose this would also be useful in Xen's pciback or any other system
> > which does passthrough? (Konrad CC'd for pciback)
> 
> Definitely yes.  Xen experiences the same issues when the PF driver is
> removed while VFs are assigned to guests.
> 
> > 
> > Is there some common lower layer we could hook this in to? (does
> > iommu_attach/detach_device make sense?) Or shall we just add the flag
> > manipulations to pciback as well?
> [Greg Rose] 
> 
> I was unaware of any common lower layers, i.e I didn't know that Xen
> also uses the iommu_attach/detach_device calls.

my mistake -- under Xen the iommu is driver by the hypervisor and not
the domain 0 kernel so there is no iommu_* in pciback.

>   It took me a week of digging around in the KVM module code just to
> find these hooks.

I'm not actually sure where in pciback the right place to put this would
be is, perhaps Konrad has an idea.

>   Generally I stick to Ethernet device drivers and I'm not that
> familiar with device pass through code.  I was just confronted with a
> problem and looking for some way to fix it.
> 
> ;^)
> 
> That sounds like a good idea, let me have a look at it.
> 
> - Greg
>
Rose, Gregory V July 28, 2011, 4:42 p.m. UTC | #4
> -----Original Message-----

> From: Ian Campbell [mailto:ijc@hellion.org.uk]

> Sent: Thursday, July 28, 2011 9:28 AM

> To: Rose, Gregory V

> Cc: Konrad Rzeszutek Wilk; netdev@vger.kernel.org; davem@davemloft.net;

> bhutchings@solarflare.com; Kirsher, Jeffrey T; Jesse Barnes; linux-

> pci@vger.kernel.org

> Subject: RE: [RFC net-next PATCH 1/4] pci: Add flag indicating device has

> been assigned by KVM

> 

> On Thu, 2011-07-28 at 08:58 -0700, Rose, Gregory V wrote:

> > > -----Original Message-----

> > > From: Ian Campbell [mailto:ijc@hellion.org.uk]

> > > Sent: Thursday, July 28, 2011 8:11 AM

> > > To: Rose, Gregory V; Konrad Rzeszutek Wilk

> > > Cc: netdev@vger.kernel.org; davem@davemloft.net;

> > > bhutchings@solarflare.com; Kirsher, Jeffrey T; Jesse Barnes; linux-

> > > pci@vger.kernel.org

> > > Subject: Re: [RFC net-next PATCH 1/4] pci: Add flag indicating device

> has

> > > been assigned by KVM

> > >

> > > On Wed, 2011-07-27 at 15:17 -0700, Greg Rose wrote:

> > > > Device drivers that create and destroy SR-IOV virtual functions via

> > > > calls to pci_enable_sriov() and pci_disable_sriov can cause

> catastrophic

> > > > failures if they attempt to destroy VFs while they are assigned to

> > > > guest virtual machines.  By adding a flag for use by the KVM module

> > > > to indicate that a device is assigned a device driver can check that

> > > > flag and avoid destroying VFs while they are assigned and avoid

> system

> > > > failures.

> > > >

> > > > Signed-off-by: Greg Rose <gregory.v.rose@intel.com>

> > > > ---

> > > >

> > > >  include/linux/pci.h     |    2 ++

> > >

> > > I added Jesse and linux-pci to CC.

> > >

> > > >  virt/kvm/assigned-dev.c |    2 ++

> > > >  virt/kvm/iommu.c        |    4 ++++

> > > >  3 files changed, 8 insertions(+), 0 deletions(-)

> > >

> > > I suppose this would also be useful in Xen's pciback or any other

> system

> > > which does passthrough? (Konrad CC'd for pciback)

> >

> > Definitely yes.  Xen experiences the same issues when the PF driver is

> > removed while VFs are assigned to guests.

> >

> > >

> > > Is there some common lower layer we could hook this in to? (does

> > > iommu_attach/detach_device make sense?) Or shall we just add the flag

> > > manipulations to pciback as well?

> > [Greg Rose]

> >

> > I was unaware of any common lower layers, i.e I didn't know that Xen

> > also uses the iommu_attach/detach_device calls.

> 

> my mistake -- under Xen the iommu is driver by the hypervisor and not

> the domain 0 kernel so there is no iommu_* in pciback.

> 

> >   It took me a week of digging around in the KVM module code just to

> > find these hooks.

> 

> I'm not actually sure where in pciback the right place to put this would

> be is, perhaps Konrad has an idea.


OK, but I hope Xen can still use the dev_flag assignment bit.

Thanks,

- Greg
Jesse Barnes July 29, 2011, 4:51 p.m. UTC | #5
On Thu, 28 Jul 2011 16:11:17 +0100
Ian Campbell <ijc@hellion.org.uk> wrote:

> On Wed, 2011-07-27 at 15:17 -0700, Greg Rose wrote:
> > Device drivers that create and destroy SR-IOV virtual functions via
> > calls to pci_enable_sriov() and pci_disable_sriov can cause catastrophic
> > failures if they attempt to destroy VFs while they are assigned to
> > guest virtual machines.  By adding a flag for use by the KVM module
> > to indicate that a device is assigned a device driver can check that
> > flag and avoid destroying VFs while they are assigned and avoid system
> > failures.
> > 
> > Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
> > ---
> > 
> >  include/linux/pci.h     |    2 ++
> 
> I added Jesse and linux-pci to CC.
> 
> >  virt/kvm/assigned-dev.c |    2 ++
> >  virt/kvm/iommu.c        |    4 ++++
> >  3 files changed, 8 insertions(+), 0 deletions(-)
> 
> I suppose this would also be useful in Xen's pciback or any other system
> which does passthrough? (Konrad CC'd for pciback)
> 
> Is there some common lower layer we could hook this in to? (does
> iommu_attach/detach_device make sense?) Or shall we just add the flag
> manipulations to pciback as well?
> 
> Ian.
> 
> > 
> > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > index 2d29218..a297ca2 100644
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -174,6 +174,8 @@ enum pci_dev_flags {
> >  	PCI_DEV_FLAGS_MSI_INTX_DISABLE_BUG = (__force pci_dev_flags_t) 1,
> >  	/* Device configuration is irrevocably lost if disabled into D3 */
> >  	PCI_DEV_FLAGS_NO_D3 = (__force pci_dev_flags_t) 2,
> > +	/* Provide indication device is assigned by KVM */
> > +	PCI_DEV_FLAGS_ASSIGNED = (__force pci_dev_flags_t) 4,
> >  };

Looks fine, but I'd make the comment less redundant with the code, e.g.
"set when the device is assigned to a guest instance" or somesuch.
Rose, Gregory V July 29, 2011, 4:54 p.m. UTC | #6
> -----Original Message-----
> From: Jesse Barnes [mailto:jbarnes@virtuousgeek.org]
> Sent: Friday, July 29, 2011 9:52 AM
> To: Ian Campbell
> Cc: Rose, Gregory V; Konrad Rzeszutek Wilk; netdev@vger.kernel.org;
> davem@davemloft.net; bhutchings@solarflare.com; Kirsher, Jeffrey T; linux-
> pci@vger.kernel.org
> Subject: Re: [RFC net-next PATCH 1/4] pci: Add flag indicating device has
> been assigned by KVM
> 
> On Thu, 28 Jul 2011 16:11:17 +0100
> Ian Campbell <ijc@hellion.org.uk> wrote:
> 
> > On Wed, 2011-07-27 at 15:17 -0700, Greg Rose wrote:
> > > Device drivers that create and destroy SR-IOV virtual functions via
> > > calls to pci_enable_sriov() and pci_disable_sriov can cause
> catastrophic
> > > failures if they attempt to destroy VFs while they are assigned to
> > > guest virtual machines.  By adding a flag for use by the KVM module
> > > to indicate that a device is assigned a device driver can check that
> > > flag and avoid destroying VFs while they are assigned and avoid system
> > > failures.
> > >
> > > Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
> > > ---
> > >
> > >  include/linux/pci.h     |    2 ++
> >
> > I added Jesse and linux-pci to CC.
> >
> > >  virt/kvm/assigned-dev.c |    2 ++
> > >  virt/kvm/iommu.c        |    4 ++++
> > >  3 files changed, 8 insertions(+), 0 deletions(-)
> >
> > I suppose this would also be useful in Xen's pciback or any other system
> > which does passthrough? (Konrad CC'd for pciback)
> >
> > Is there some common lower layer we could hook this in to? (does
> > iommu_attach/detach_device make sense?) Or shall we just add the flag
> > manipulations to pciback as well?
> >
> > Ian.
> >
> > >
> > > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > > index 2d29218..a297ca2 100644
> > > --- a/include/linux/pci.h
> > > +++ b/include/linux/pci.h
> > > @@ -174,6 +174,8 @@ enum pci_dev_flags {
> > >  	PCI_DEV_FLAGS_MSI_INTX_DISABLE_BUG = (__force pci_dev_flags_t) 1,
> > >  	/* Device configuration is irrevocably lost if disabled into D3 */
> > >  	PCI_DEV_FLAGS_NO_D3 = (__force pci_dev_flags_t) 2,
> > > +	/* Provide indication device is assigned by KVM */
> > > +	PCI_DEV_FLAGS_ASSIGNED = (__force pci_dev_flags_t) 4,
> > >  };
> 
> Looks fine, but I'd make the comment less redundant with the code, e.g.
> "set when the device is assigned to a guest instance" or somesuch.

Sure, sounds good to me.

Rev 2 of the RFC patches will be out in a couple of weeks, I'm away next week.

Thanks,

- Greg

> 
> --
> Jesse Barnes, Intel Open Source Technology Center
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/linux/pci.h b/include/linux/pci.h
index 2d29218..a297ca2 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -174,6 +174,8 @@  enum pci_dev_flags {
 	PCI_DEV_FLAGS_MSI_INTX_DISABLE_BUG = (__force pci_dev_flags_t) 1,
 	/* Device configuration is irrevocably lost if disabled into D3 */
 	PCI_DEV_FLAGS_NO_D3 = (__force pci_dev_flags_t) 2,
+	/* Provide indication device is assigned by KVM */
+	PCI_DEV_FLAGS_ASSIGNED = (__force pci_dev_flags_t) 4,
 };
 
 enum pci_irq_reroute_variant {
diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
index 6cc4b97..f401de1 100644
--- a/virt/kvm/assigned-dev.c
+++ b/virt/kvm/assigned-dev.c
@@ -205,6 +205,8 @@  static void kvm_free_assigned_device(struct kvm *kvm,
 	else
 		pci_restore_state(assigned_dev->dev);
 
+	assigned_dev->dev->dev_flags &= ~PCI_DEV_FLAGS_ASSIGNED;
+
 	pci_release_regions(assigned_dev->dev);
 	pci_disable_device(assigned_dev->dev);
 	pci_dev_put(assigned_dev->dev);
diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c
index 62a9caf..cffc530 100644
--- a/virt/kvm/iommu.c
+++ b/virt/kvm/iommu.c
@@ -181,6 +181,8 @@  int kvm_assign_device(struct kvm *kvm,
 			goto out_unmap;
 	}
 
+	pdev->dev_flags |= PCI_DEV_FLAGS_ASSIGNED;
+
 	printk(KERN_DEBUG "assign device %x:%x:%x.%x\n",
 		assigned_dev->host_segnr,
 		assigned_dev->host_busnr,
@@ -209,6 +211,8 @@  int kvm_deassign_device(struct kvm *kvm,
 
 	iommu_detach_device(domain, &pdev->dev);
 
+	pdev->dev_flags &= ~PCI_DEV_FLAGS_ASSIGNED;
+
 	printk(KERN_DEBUG "deassign device %x:%x:%x.%x\n",
 		assigned_dev->host_segnr,
 		assigned_dev->host_busnr,