[for-2.11] vfio: Fix vfio-kvm group registration

Message ID 20171205205409.5348.53070.stgit@gimli.home
State New

Commit Message

Alex Williamson Dec. 5, 2017, 9:09 p.m.
Commit 8c37faa475f3 ("vfio-pci, ppc64/spapr: Reorder group-to-container
attaching") moved registration of groups with the vfio-kvm device from
vfio_get_group() to vfio_connect_container(), but it missed the case
where a group is attached to an existing container and takes an early
exit.  Perhaps this is a less common case on ppc64/spapr, but on x86
(without viommu) all groups are connected to the same container and
thus only the first group gets registered with the vfio-kvm device.
This becomes a problem if we then hot-unplug the devices associated
with that first group and we end up with KVM being misinformed about
any vfio connections that might remain.  Fix by including the call to
vfio_kvm_device_add_group() in this early exit path.

Fixes: 8c37faa475f3 ("vfio-pci, ppc64/spapr: Reorder group-to-container attaching")
Cc: qemu-stable@nongnu.org # qemu-2.10+
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

This bug also existed in QEMU 2.10, but I think the fix is sufficiently
obvious (famous last words) to propose for 2.11 at this late date.  If
the first group is hot unplugged then KVM may revert to code emulation
that assumes no non-coherent DMA is present on some systems.  Also for
KVMGT, if the vGPU is not the first device registered, then the
notifier to enable linkages to KVM would not be called.  Please review.
Thanks,

Alex

 hw/vfio/common.c |    1 +
 1 file changed, 1 insertion(+)

Comments

Alexey Kardashevskiy Dec. 6, 2017, 1:02 a.m. | #1
On 06/12/17 08:09, Alex Williamson wrote:
> Commit 8c37faa475f3 ("vfio-pci, ppc64/spapr: Reorder group-to-container
> attaching") moved registration of groups with the vfio-kvm device from
> vfio_get_group() to vfio_connect_container(), but it missed the case
> where a group is attached to an existing container and takes an early
> exit.  Perhaps this is a less common case on ppc64/spapr, but on x86
> (without viommu) all groups are connected to the same container and
> thus only the first group gets registered with the vfio-kvm device.
> This becomes a problem if we then hot-unplug the devices associated
> with that first group and we end up with KVM being misinformed about
> any vfio connections that might remain.  Fix by including the call to
> vfio_kvm_device_add_group() in this early exit path.
> 
> Fixes: 8c37faa475f3 ("vfio-pci, ppc64/spapr: Reorder group-to-container attaching")
> Cc: qemu-stable@nongnu.org # qemu-2.10+
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
> 
> This bug also existed in QEMU 2.10, but I think the fix is sufficiently
> obvious (famous last words) to propose for 2.11 at this late date.  If
> the first group is hot unplugged then KVM may revert to code emulation
> that assumes no non-coherent DMA is present on some systems.  Also for
> KVMGT, if the vGPU is not the first device registered, then the
> notifier to enable linkages to KVM would not be called.  Please review.

For what it is worth

Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>


Sorry for the breakage...

One question - how was this discovered? I'd love to set up a test
environment on my old thinkpad x230 if possible.



> Thanks,
> 
> Alex
> 
>  hw/vfio/common.c |    1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 7b2924c0ef19..7007878e345e 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -968,6 +968,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>          if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
>              group->container = container;
>              QLIST_INSERT_HEAD(&container->group_list, group, container_next);
> +            vfio_kvm_device_add_group(group);
>              return 0;
>          }
>      }
>
Alex Williamson Dec. 6, 2017, 1:30 a.m. | #2
On Wed, 6 Dec 2017 12:02:01 +1100
Alexey Kardashevskiy <aik@ozlabs.ru> wrote:

> On 06/12/17 08:09, Alex Williamson wrote:
> > Commit 8c37faa475f3 ("vfio-pci, ppc64/spapr: Reorder group-to-container
> > attaching") moved registration of groups with the vfio-kvm device from
> > vfio_get_group() to vfio_connect_container(), but it missed the case
> > where a group is attached to an existing container and takes an early
> > exit.  Perhaps this is a less common case on ppc64/spapr, but on x86
> > (without viommu) all groups are connected to the same container and
> > thus only the first group gets registered with the vfio-kvm device.
> > This becomes a problem if we then hot-unplug the devices associated
> > with that first group and we end up with KVM being misinformed about
> > any vfio connections that might remain.  Fix by including the call to
> > vfio_kvm_device_add_group() in this early exit path.
> > 
> > Fixes: 8c37faa475f3 ("vfio-pci, ppc64/spapr: Reorder group-to-container attaching")
> > Cc: qemu-stable@nongnu.org # qemu-2.10+
> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > ---
> > 
> > This bug also existed in QEMU 2.10, but I think the fix is sufficiently
> > obvious (famous last words) to propose for 2.11 at this late date.  If
> > the first group is hot unplugged then KVM may revert to code emulation
> > that assumes no non-coherent DMA is present on some systems.  Also for
> > KVMGT, if the vGPU is not the first device registered, then the
> > notifier to enable linkages to KVM would not be called.  Please review.  
> 
> For what it is worth
> 
> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Thanks!

> Sorry for the breakage...
> 
> One question - how was this discovered? I'd love to set up a test
> environment on my old thinkpad x230 if possible.

Assign two devices from separate iommu groups, hot unplug the first
device, followed by the second device.  The second unplug will trigger:

qemu-kvm: Failed to remove group ## from KVM VFIO device: No such file or directory

Laptops don't have many devices and we're not good about keeping up
with ACS quirks on laptop chipsets, so it might be difficult to find
the prerequisite setup there.  Thanks,

Alex

> >  hw/vfio/common.c |    1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> > index 7b2924c0ef19..7007878e345e 100644
> > --- a/hw/vfio/common.c
> > +++ b/hw/vfio/common.c
> > @@ -968,6 +968,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
> >          if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
> >              group->container = container;
> >              QLIST_INSERT_HEAD(&container->group_list, group, container_next);
> > +            vfio_kvm_device_add_group(group);
> >              return 0;
> >          }
> >      }
> >   
> 
>
Liu, Yi L Dec. 6, 2017, 2:44 a.m. | #3
On Tue, Dec 05, 2017 at 02:09:07PM -0700, Alex Williamson wrote:
> Commit 8c37faa475f3 ("vfio-pci, ppc64/spapr: Reorder group-to-container
> attaching") moved registration of groups with the vfio-kvm device from
> vfio_get_group() to vfio_connect_container(), but it missed the case
> where a group is attached to an existing container and takes an early
> exit.  Perhaps this is a less common case on ppc64/spapr, but on x86
> (without viommu) all groups are connected to the same container and
> thus only the first group gets registered with the vfio-kvm device.
> This becomes a problem if we then hot-unplug the devices associated
> with that first group and we end up with KVM being misinformed about
> any vfio connections that might remain.  Fix by including the call to
> vfio_kvm_device_add_group() in this early exit path.
> 
> Fixes: 8c37faa475f3 ("vfio-pci, ppc64/spapr: Reorder group-to-container attaching")
> Cc: qemu-stable@nongnu.org # qemu-2.10+
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
> 
> This bug also existed in QEMU 2.10, but I think the fix is sufficiently
> obvious (famous last words) to propose for 2.11 at this late date.  If
> the first group is hot unplugged then KVM may revert to code emulation
> that assumes no non-coherent DMA is present on some systems.  Also for
> KVMGT, if the vGPU is not the first device registered, then the
> notifier to enable linkages to KVM would not be called.  Please review.
> Thanks,

Alex, for x86, I suppose the problem doesn't exist when a viommu is exposed
to the guest?

Regards,
Yi L
> 
>  hw/vfio/common.c |    1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 7b2924c0ef19..7007878e345e 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -968,6 +968,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>          if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
>              group->container = container;
>              QLIST_INSERT_HEAD(&container->group_list, group, container_next);
> +            vfio_kvm_device_add_group(group);
>              return 0;
>          }
>      }
> 
>
Alex Williamson Dec. 6, 2017, 3:12 a.m. | #4
On Wed, 6 Dec 2017 10:44:43 +0800
"Liu, Yi L" <yi.l.liu@linux.intel.com> wrote:

> On Tue, Dec 05, 2017 at 02:09:07PM -0700, Alex Williamson wrote:
> > Commit 8c37faa475f3 ("vfio-pci, ppc64/spapr: Reorder group-to-container
> > attaching") moved registration of groups with the vfio-kvm device from
> > vfio_get_group() to vfio_connect_container(), but it missed the case
> > where a group is attached to an existing container and takes an early
> > exit.  Perhaps this is a less common case on ppc64/spapr, but on x86
> > (without viommu) all groups are connected to the same container and
> > thus only the first group gets registered with the vfio-kvm device.
> > This becomes a problem if we then hot-unplug the devices associated
> > with that first group and we end up with KVM being misinformed about
> > any vfio connections that might remain.  Fix by including the call to
> > vfio_kvm_device_add_group() in this early exit path.
> > 
> > Fixes: 8c37faa475f3 ("vfio-pci, ppc64/spapr: Reorder group-to-container attaching")
> > Cc: qemu-stable@nongnu.org # qemu-2.10+
> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > ---
> > 
> > This bug also existed in QEMU 2.10, but I think the fix is sufficiently
> > obvious (famous last words) to propose for 2.11 at this late date.  If
> > the first group is hot unplugged then KVM may revert to code emulation
> > that assumes no non-coherent DMA is present on some systems.  Also for
> > KVMGT, if the vGPU is not the first device registered, then the
> > notifier to enable linkages to KVM would not be called.  Please review.
> > Thanks,  
> 
> Alex, for x86, I suppose it doesn't exist in the case which viommu is exposed
> to guest?

With viommu, I believe each group would be in its own AddressSpace and
therefore get a separate container, so I don't think it'd be an issue.
It's only subsequent groups added to the same container which are
missed.  Thanks,

Alex
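
The reasoning above can be sketched in C as follows. This is a simplified model under assumptions, not the QEMU implementation: `ContainerTable`, `attach_group()` and `second_attach_result()` are hypothetical names, and the model assumes containers are looked up by address space identity, as the explanation suggests. A return value of 1 marks the container-reuse path, which is the early exit the patch touches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CONTAINERS 4

/* Hypothetical container record, keyed by the address space it serves. */
typedef struct {
    const void *as;   /* identity of the address space */
    int ngroups;      /* groups attached to this container */
} ContainerEntry;

typedef struct {
    ContainerEntry entries[MAX_CONTAINERS];
    size_t count;
} ContainerTable;

/*
 * Returns 1 if the group reused an existing container (early exit path),
 * 0 if a new container was created, -1 if the table is full.
 */
static int attach_group(ContainerTable *t, const void *as)
{
    assert(t != NULL && as != NULL);

    for (size_t i = 0; i < t->count; i++) {
        if (t->entries[i].as == as) {
            t->entries[i].ngroups++;
            return 1;
        }
    }
    if (t->count == MAX_CONTAINERS) {
        return -1;
    }
    t->entries[t->count].as = as;
    t->entries[t->count].ngroups = 1;
    t->count++;
    return 0;
}

/* Attach two groups and report which path the second one took. */
static int second_attach_result(bool separate_spaces)
{
    int space_a = 0, space_b = 0;   /* only their addresses are used */
    ContainerTable t = { .count = 0 };

    (void)attach_group(&t, &space_a);
    return attach_group(&t, separate_spaces ? &space_b : &space_a);
}
```

With one shared address space the second group takes the reuse path, so it depends on the fix. With a separate address space per group, every group creates its own container and never reaches the early exit.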

> >  hw/vfio/common.c |    1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> > index 7b2924c0ef19..7007878e345e 100644
> > --- a/hw/vfio/common.c
> > +++ b/hw/vfio/common.c
> > @@ -968,6 +968,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
> >          if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
> >              group->container = container;
> >              QLIST_INSERT_HEAD(&container->group_list, group, container_next);
> > +            vfio_kvm_device_add_group(group);
> >              return 0;
> >          }
> >      }
> > 
> >
Liu, Yi L Dec. 6, 2017, 4:31 a.m. | #5
On Tue, Dec 05, 2017 at 08:12:58PM -0700, Alex Williamson wrote:
> On Wed, 6 Dec 2017 10:44:43 +0800
> "Liu, Yi L" <yi.l.liu@linux.intel.com> wrote:
> 
> > On Tue, Dec 05, 2017 at 02:09:07PM -0700, Alex Williamson wrote:
> > > Commit 8c37faa475f3 ("vfio-pci, ppc64/spapr: Reorder group-to-container
> > > attaching") moved registration of groups with the vfio-kvm device from
> > > vfio_get_group() to vfio_connect_container(), but it missed the case
> > > where a group is attached to an existing container and takes an early
> > > exit.  Perhaps this is a less common case on ppc64/spapr, but on x86
> > > (without viommu) all groups are connected to the same container and
> > > thus only the first group gets registered with the vfio-kvm device.
> > > This becomes a problem if we then hot-unplug the devices associated
> > > with that first group and we end up with KVM being misinformed about
> > > any vfio connections that might remain.  Fix by including the call to
> > > vfio_kvm_device_add_group() in this early exit path.
> > > 
> > > Fixes: 8c37faa475f3 ("vfio-pci, ppc64/spapr: Reorder group-to-container attaching")
> > > Cc: qemu-stable@nongnu.org # qemu-2.10+
> > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > > ---
> > > 
> > > This bug also existed in QEMU 2.10, but I think the fix is sufficiently
> > > obvious (famous last words) to propose for 2.11 at this late date.  If
> > > the first group is hot unplugged then KVM may revert to code emulation
> > > that assumes no non-coherent DMA is present on some systems.  Also for
> > > KVMGT, if the vGPU is not the first device registered, then the
> > > notifier to enable linkages to KVM would not be called.  Please review.
> > > Thanks,  
> > 
> > Alex, for x86, I suppose it doesn't exist in the case which viommu is exposed
> > to guest?
> 
> With viommu, I believe each group would be in its own AddressSpace and
> therefore get a separate container, so I don't think it'd be an issue.
> It's only subsequent groups added to the same container which are
> missed.  Thanks,

Agreed, thanks for confirming. It's a nice fix.

Regards,
Yi L
> 
> > >  hw/vfio/common.c |    1 +
> > >  1 file changed, 1 insertion(+)
> > > 
> > > diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> > > index 7b2924c0ef19..7007878e345e 100644
> > > --- a/hw/vfio/common.c
> > > +++ b/hw/vfio/common.c
> > > @@ -968,6 +968,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
> > >          if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
> > >              group->container = container;
> > >              QLIST_INSERT_HEAD(&container->group_list, group, container_next);
> > > +            vfio_kvm_device_add_group(group);
> > >              return 0;
> > >          }
> > >      }
> > > 
> > >   
>
Peter Xu Dec. 6, 2017, 7:20 a.m. | #6
On Tue, Dec 05, 2017 at 06:30:39PM -0700, Alex Williamson wrote:
> On Wed, 6 Dec 2017 12:02:01 +1100
> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> 
> > On 06/12/17 08:09, Alex Williamson wrote:
> > > Commit 8c37faa475f3 ("vfio-pci, ppc64/spapr: Reorder group-to-container
> > > attaching") moved registration of groups with the vfio-kvm device from
> > > vfio_get_group() to vfio_connect_container(), but it missed the case
> > > where a group is attached to an existing container and takes an early
> > > exit.  Perhaps this is a less common case on ppc64/spapr, but on x86
> > > (without viommu) all groups are connected to the same container and
> > > thus only the first group gets registered with the vfio-kvm device.
> > > This becomes a problem if we then hot-unplug the devices associated
> > > with that first group and we end up with KVM being misinformed about
> > > any vfio connections that might remain.  Fix by including the call to
> > > vfio_kvm_device_add_group() in this early exit path.
> > > 
> > > Fixes: 8c37faa475f3 ("vfio-pci, ppc64/spapr: Reorder group-to-container attaching")
> > > Cc: qemu-stable@nongnu.org # qemu-2.10+
> > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > > ---
> > > 
> > > This bug also existed in QEMU 2.10, but I think the fix is sufficiently
> > > obvious (famous last words) to propose for 2.11 at this late date.  If
> > > the first group is hot unplugged then KVM may revert to code emulation
> > > that assumes no non-coherent DMA is present on some systems.  Also for
> > > KVMGT, if the vGPU is not the first device registered, then the
> > > notifier to enable linkages to KVM would not be called.  Please review.  
> > 
> > For what it is worth
> > 
> > Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> 
> Thanks!
> 
> > Sorry for the breakage...
> > 
> > One question - how was this discovered? I'd love to set up a test
> > environment on my old thinkpad x230 if possible.
> 
> Assign two devices from separate iommu groups, hot unplug the first
> device, followed by the second device.  The second unplug will trigger:
> 
> qemu-kvm: Failed to remove group ## from KVM VFIO device: No such file or directory

I reproduced this with command line:

bin=x86_64-softmmu/qemu-system-x86_64
$bin -machine q35,kernel-irqchip=split \
     -enable-kvm -m 4G -nographic \
     -monitor telnet::6666,server,nowait \
     -device ioh3420,multifunction=on,bus=pcie.0,id=port0,chassis=0 \
     -device ioh3420,bus=pcie.0,id=port1,chassis=1 \
     -netdev user,id=user.0,hostfwd=tcp::5555-:22 \
     -device e1000,netdev=user.0 \
     -device vfio-pci,host=05:00.0,id=vfio0,bus=port0 \
     -device vfio-pci,host=05:00.1,id=vfio1,bus=port1 \
     /home/images/fedora-25.qcow2

The patch fixes it, so:

Reviewed-by: Peter Xu <peterx@redhat.com>
Tested-by: Peter Xu <peterx@redhat.com>

Thanks,
Auger Eric Dec. 6, 2017, 8:14 a.m. | #7
Hi Alex,

On 05/12/17 22:09, Alex Williamson wrote:
> Commit 8c37faa475f3 ("vfio-pci, ppc64/spapr: Reorder group-to-container
> attaching") moved registration of groups with the vfio-kvm device from
> vfio_get_group() to vfio_connect_container(), but it missed the case
> where a group is attached to an existing container and takes an early
> exit.  Perhaps this is a less common case on ppc64/spapr, but on x86
> (without viommu) all groups are connected to the same container and
> thus only the first group gets registered with the vfio-kvm device.
> This becomes a problem if we then hot-unplug the devices associated
> with that first group and we end up with KVM being misinformed about
> any vfio connections that might remain.  Fix by including the call to
> vfio_kvm_device_add_group() in this early exit path.
> 
> Fixes: 8c37faa475f3 ("vfio-pci, ppc64/spapr: Reorder group-to-container attaching")
> Cc: qemu-stable@nongnu.org # qemu-2.10+
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>

Tested on arm64 Gigabyte HW with hot-detach of the 2 X540T2 PFs

I don't have the "2017-12-06T08:02:07.535373Z qemu-system-aarch64:
Failed to remove group 24 from KVM VFIO device: No such file or
directory" anymore when detaching the second PF.

Thanks

Eric


> ---
> 
> This bug also existed in QEMU 2.10, but I think the fix is sufficiently
> obvious (famous last words) to propose for 2.11 at this late date.  If
> the first group is hot unplugged then KVM may revert to code emulation
> that assumes no non-coherent DMA is present on some systems.  Also for
> KVMGT, if the vGPU is not the first device registered, then the
> notifier to enable linkages to KVM would not be called.  Please review.
> Thanks,
> 
> Alex
> 
>  hw/vfio/common.c |    1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 7b2924c0ef19..7007878e345e 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -968,6 +968,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>          if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
>              group->container = container;
>              QLIST_INSERT_HEAD(&container->group_list, group, container_next);
> +            vfio_kvm_device_add_group(group);
>              return 0;
>          }
>      }
> 
>
Alexey Kardashevskiy Dec. 7, 2017, 12:16 a.m. | #8
On 06/12/17 12:30, Alex Williamson wrote:
> On Wed, 6 Dec 2017 12:02:01 +1100
> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> 
>> On 06/12/17 08:09, Alex Williamson wrote:
>>> Commit 8c37faa475f3 ("vfio-pci, ppc64/spapr: Reorder group-to-container
>>> attaching") moved registration of groups with the vfio-kvm device from
>>> vfio_get_group() to vfio_connect_container(), but it missed the case
>>> where a group is attached to an existing container and takes an early
>>> exit.  Perhaps this is a less common case on ppc64/spapr, but on x86
>>> (without viommu) all groups are connected to the same container and
>>> thus only the first group gets registered with the vfio-kvm device.
>>> This becomes a problem if we then hot-unplug the devices associated
>>> with that first group and we end up with KVM being misinformed about
>>> any vfio connections that might remain.  Fix by including the call to
>>> vfio_kvm_device_add_group() in this early exit path.
>>>
>>> Fixes: 8c37faa475f3 ("vfio-pci, ppc64/spapr: Reorder group-to-container attaching")
>>> Cc: qemu-stable@nongnu.org # qemu-2.10+
>>> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
>>> ---
>>>
>>> This bug also existed in QEMU 2.10, but I think the fix is sufficiently
>>> obvious (famous last words) to propose for 2.11 at this late date.  If
>>> the first group is hot unplugged then KVM may revert to code emulation
>>> that assumes no non-coherent DMA is present on some systems.  Also for
>>> KVMGT, if the vGPU is not the first device registered, then the
>>> notifier to enable linkages to KVM would not be called.  Please review.  
>>
>> For what it is worth
>>
>> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> 
> Thanks!
> 
>> Sorry for the breakage...
>>
>> One question - how was this discovered? I'd love to set up a test
>> environment on my old thinkpad x230 if possible.
> 
> Assign two devices from separate iommu groups, hot unplug the first
> device, followed by the second device.  The second unplug will trigger:
> 
> qemu-kvm: Failed to remove group ## from KVM VFIO device: No such file or directory
> 
> Laptops don't have many devices and we're not good about keeping up
> with ACS quirks on laptop chipsets, so it might be difficult to find
> the prerequisite setup there.  Thanks,


This is actually easy to reproduce on the spapr platform as reusing the
same container is what we do these days, at least till we get multiple PHB
support in libvirt :-/

Patch

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 7b2924c0ef19..7007878e345e 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -968,6 +968,7 @@  static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
         if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
             group->container = container;
             QLIST_INSERT_HEAD(&container->group_list, group, container_next);
+            vfio_kvm_device_add_group(group);
             return 0;
         }
     }