[for-2.11] vfio: Fix vfio-kvm group registration

Message ID 20171205205409.5348.53070.stgit@gimli.home
State New

Commit Message

Alex Williamson Dec. 5, 2017, 9:09 p.m.
Commit 8c37faa475f3 ("vfio-pci, ppc64/spapr: Reorder group-to-container
attaching") moved registration of groups with the vfio-kvm device from
vfio_get_group() to vfio_connect_container(), but it missed the case
where a group is attached to an existing container and takes an early
exit.  Perhaps this is a less common case on ppc64/spapr, but on x86
(without viommu) all groups are connected to the same container and
thus only the first group gets registered with the vfio-kvm device.
This becomes a problem if we then hot-unplug the devices associated
with that first group and we end up with KVM being misinformed about
any vfio connections that might remain.  Fix by including the call to
vfio_kvm_device_add_group() in this early exit path.

Fixes: 8c37faa475f3 ("vfio-pci, ppc64/spapr: Reorder group-to-container attaching")
Cc: qemu-stable@nongnu.org # qemu-2.10+
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

This bug also existed in QEMU 2.10, but I think the fix is sufficiently
obvious (famous last words) to propose for 2.11 at this late date.  If
the first group is hot unplugged then KVM may revert to code emulation
that assumes no non-coherent DMA is present on some systems.  Also for
KVMGT, if the vGPU is not the first device registered, then the
notifier to enable linkages to KVM would not be called.  Please review.
Thanks,

Alex

 hw/vfio/common.c |    1 +
 1 file changed, 1 insertion(+)
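
The early-exit path described in the commit message can be illustrated with a small stand-alone model. This is only a sketch, not QEMU code: `ToyContainer`, `ToyKvmDevice`, `toy_connect_container()` and the `with_fix` flag are hypothetical names invented for illustration, and the real `vfio_connect_container()` does considerably more. The model assumes a single shared container, as in the x86 case without viommu.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_GROUPS 16

/* Hypothetical model of one container shared by several groups. */
typedef struct {
    bool in_use;   /* true once the first group has created the container */
    int ngroups;   /* number of groups attached so far */
} ToyContainer;

/* Hypothetical stand-in for what the vfio-kvm device has been told about. */
typedef struct {
    bool registered[TOY_MAX_GROUPS];
} ToyKvmDevice;

static void toy_kvm_device_add_group(ToyKvmDevice *kvm, int groupid)
{
    assert(groupid >= 0 && groupid < TOY_MAX_GROUPS);
    kvm->registered[groupid] = true;
}

/*
 * Mirrors only the control flow of interest: attaching to an existing
 * container returns early, creating a new container falls through to the
 * registration at the end. 'with_fix' selects whether the early exit also
 * registers the group, which is what the one-line patch adds.
 */
static int toy_connect_container(ToyContainer *c, ToyKvmDevice *kvm,
                                 int groupid, bool with_fix)
{
    if (c->in_use) {
        c->ngroups++;
        if (with_fix) {
            toy_kvm_device_add_group(kvm, groupid);
        }
        return 0; /* early exit */
    }

    c->in_use = true;
    c->ngroups = 1;
    toy_kvm_device_add_group(kvm, groupid);
    return 0;
}

/* Attach 'ngroups' groups to one container and count the registered ones. */
static int toy_count_registered(int ngroups, bool with_fix)
{
    ToyContainer c = { false, 0 };
    ToyKvmDevice kvm = { { false } };
    int count = 0;

    for (int id = 0; id < ngroups; id++) {
        toy_connect_container(&c, &kvm, id, with_fix);
    }
    for (int id = 0; id < TOY_MAX_GROUPS; id++) {
        if (kvm.registered[id]) {
            count++;
        }
    }
    return count;
}
```

In this model, `toy_count_registered(3, false)` counts only the first group, while `toy_count_registered(3, true)` counts all three.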

Comments

Alexey Kardashevskiy Dec. 6, 2017, 1:02 a.m. | #1
On 06/12/17 08:09, Alex Williamson wrote:
> Commit 8c37faa475f3 ("vfio-pci, ppc64/spapr: Reorder group-to-container
> attaching") moved registration of groups with the vfio-kvm device from
> vfio_get_group() to vfio_connect_container(), but it missed the case
> where a group is attached to an existing container and takes an early
> exit.  Perhaps this is a less common case on ppc64/spapr, but on x86
> (without viommu) all groups are connected to the same container and
> thus only the first group gets registered with the vfio-kvm device.
> This becomes a problem if we then hot-unplug the devices associated
> with that first group and we end up with KVM being misinformed about
> any vfio connections that might remain.  Fix by including the call to
> vfio_kvm_device_add_group() in this early exit path.
> 
> Fixes: 8c37faa475f3 ("vfio-pci, ppc64/spapr: Reorder group-to-container attaching")
> Cc: qemu-stable@nongnu.org # qemu-2.10+
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
> 
> This bug also existed in QEMU 2.10, but I think the fix is sufficiently
> obvious (famous last words) to propose for 2.11 at this late date.  If
> the first group is hot unplugged then KVM may revert to code emulation
> that assumes no non-coherent DMA is present on some systems.  Also for
> KVMGT, if the vGPU is not the first device registered, then the
> notifier to enable linkages to KVM would not be called.  Please review.

For what it is worth

Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>


Sorry for the breakage...

One question - how was this discovered? I'd love to set up a test
environment on my old thinkpad x230 if possible.



> Thanks,
> 
> Alex
> 
>  hw/vfio/common.c |    1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 7b2924c0ef19..7007878e345e 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -968,6 +968,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>          if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
>              group->container = container;
>              QLIST_INSERT_HEAD(&container->group_list, group, container_next);
> +            vfio_kvm_device_add_group(group);
>              return 0;
>          }
>      }
>
Alex Williamson Dec. 6, 2017, 1:30 a.m. | #2
On Wed, 6 Dec 2017 12:02:01 +1100
Alexey Kardashevskiy <aik@ozlabs.ru> wrote:

> On 06/12/17 08:09, Alex Williamson wrote:
> > Commit 8c37faa475f3 ("vfio-pci, ppc64/spapr: Reorder group-to-container
> > attaching") moved registration of groups with the vfio-kvm device from
> > vfio_get_group() to vfio_connect_container(), but it missed the case
> > where a group is attached to an existing container and takes an early
> > exit.  Perhaps this is a less common case on ppc64/spapr, but on x86
> > (without viommu) all groups are connected to the same container and
> > thus only the first group gets registered with the vfio-kvm device.
> > This becomes a problem if we then hot-unplug the devices associated
> > with that first group and we end up with KVM being misinformed about
> > any vfio connections that might remain.  Fix by including the call to
> > vfio_kvm_device_add_group() in this early exit path.
> > 
> > Fixes: 8c37faa475f3 ("vfio-pci, ppc64/spapr: Reorder group-to-container attaching")
> > Cc: qemu-stable@nongnu.org # qemu-2.10+
> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > ---
> > 
> > This bug also existed in QEMU 2.10, but I think the fix is sufficiently
> > obvious (famous last words) to propose for 2.11 at this late date.  If
> > the first group is hot unplugged then KVM may revert to code emulation
> > that assumes no non-coherent DMA is present on some systems.  Also for
> > KVMGT, if the vGPU is not the first device registered, then the
> > notifier to enable linkages to KVM would not be called.  Please review.  
> 
> For what it is worth
> 
> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Thanks!

> Sorry for the breakage...
> 
> One question - how was this discovered? I'd love to set up a test
> environment on my old thinkpad x230 if possible.

Assign two devices from separate iommu groups, hot unplug the first
device, followed by the second device.  The second unplug will trigger:

qemu-kvm: Failed to remove group ## from KVM VFIO device: No such file or directory

Laptops don't have many devices and we're not good about keeping up
with ACS quirks on laptop chipsets, so it might be difficult to find
the prerequisite setup there.  Thanks,

Alex
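
The unplug sequence above can be sketched as a toy model to show why the error string is "No such file or directory". This is a sketch under assumptions, not the QEMU or kernel implementation: `toy_kvm_add_group()`, `toy_kvm_del_group()` and `toy_unplug_sequence()` are hypothetical helpers, and the model simply assumes that removing a group that was never registered fails with `ENOENT`, which is consistent with the message text reported in the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define TOY_MAX_GROUPS 64

/* Hypothetical record of which groups the vfio-kvm device knows about. */
static bool toy_registered[TOY_MAX_GROUPS];

static int toy_kvm_add_group(int groupid)
{
    assert(groupid >= 0 && groupid < TOY_MAX_GROUPS);
    toy_registered[groupid] = true;
    return 0;
}

/* Assumption: deleting an unknown group reports -ENOENT. */
static int toy_kvm_del_group(int groupid)
{
    assert(groupid >= 0 && groupid < TOY_MAX_GROUPS);
    if (!toy_registered[groupid]) {
        return -ENOENT;
    }
    toy_registered[groupid] = false;
    return 0;
}

/*
 * Two groups (1 and 2) share a container. Before the fix only group 1 is
 * registered; pass second_group_registered = true to model the fixed code.
 * Returns the result of removing the second group.
 */
static int toy_unplug_sequence(bool second_group_registered)
{
    int ret;

    memset(toy_registered, 0, sizeof(toy_registered));
    toy_kvm_add_group(1);
    if (second_group_registered) {
        toy_kvm_add_group(2);
    }

    ret = toy_kvm_del_group(1);   /* first hot unplug */
    if (ret) {
        return ret;
    }

    ret = toy_kvm_del_group(2);   /* second hot unplug */
    if (ret) {
        fprintf(stderr,
                "Failed to remove group %d from KVM VFIO device: %s\n",
                2, strerror(-ret));
    }
    return ret;
}
```

With `second_group_registered` set to false the second removal fails in this model; with it set to true both removals succeed.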

> >  hw/vfio/common.c |    1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> > index 7b2924c0ef19..7007878e345e 100644
> > --- a/hw/vfio/common.c
> > +++ b/hw/vfio/common.c
> > @@ -968,6 +968,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
> >          if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
> >              group->container = container;
> >              QLIST_INSERT_HEAD(&container->group_list, group, container_next);
> > +            vfio_kvm_device_add_group(group);
> >              return 0;
> >          }
> >      }
> >   
> 
>
Liu, Yi L Dec. 6, 2017, 2:44 a.m. | #3
On Tue, Dec 05, 2017 at 02:09:07PM -0700, Alex Williamson wrote:
> Commit 8c37faa475f3 ("vfio-pci, ppc64/spapr: Reorder group-to-container
> attaching") moved registration of groups with the vfio-kvm device from
> vfio_get_group() to vfio_connect_container(), but it missed the case
> where a group is attached to an existing container and takes an early
> exit.  Perhaps this is a less common case on ppc64/spapr, but on x86
> (without viommu) all groups are connected to the same container and
> thus only the first group gets registered with the vfio-kvm device.
> This becomes a problem if we then hot-unplug the devices associated
> with that first group and we end up with KVM being misinformed about
> any vfio connections that might remain.  Fix by including the call to
> vfio_kvm_device_add_group() in this early exit path.
> 
> Fixes: 8c37faa475f3 ("vfio-pci, ppc64/spapr: Reorder group-to-container attaching")
> Cc: qemu-stable@nongnu.org # qemu-2.10+
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
> 
> This bug also existed in QEMU 2.10, but I think the fix is sufficiently
> obvious (famous last words) to propose for 2.11 at this late date.  If
> the first group is hot unplugged then KVM may revert to code emulation
> that assumes no non-coherent DMA is present on some systems.  Also for
> KVMGT, if the vGPU is not the first device registered, then the
> notifier to enable linkages to KVM would not be called.  Please review.
> Thanks,

Alex, for x86, I suppose the issue doesn't occur when a viommu is exposed
to the guest?

Regards,
Yi L
> 
>  hw/vfio/common.c |    1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 7b2924c0ef19..7007878e345e 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -968,6 +968,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>          if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
>              group->container = container;
>              QLIST_INSERT_HEAD(&container->group_list, group, container_next);
> +            vfio_kvm_device_add_group(group);
>              return 0;
>          }
>      }
> 
>
Alex Williamson Dec. 6, 2017, 3:12 a.m. | #4
On Wed, 6 Dec 2017 10:44:43 +0800
"Liu, Yi L" <yi.l.liu@linux.intel.com> wrote:

> On Tue, Dec 05, 2017 at 02:09:07PM -0700, Alex Williamson wrote:
> > Commit 8c37faa475f3 ("vfio-pci, ppc64/spapr: Reorder group-to-container
> > attaching") moved registration of groups with the vfio-kvm device from
> > vfio_get_group() to vfio_connect_container(), but it missed the case
> > where a group is attached to an existing container and takes an early
> > exit.  Perhaps this is a less common case on ppc64/spapr, but on x86
> > (without viommu) all groups are connected to the same container and
> > thus only the first group gets registered with the vfio-kvm device.
> > This becomes a problem if we then hot-unplug the devices associated
> > with that first group and we end up with KVM being misinformed about
> > any vfio connections that might remain.  Fix by including the call to
> > vfio_kvm_device_add_group() in this early exit path.
> > 
> > Fixes: 8c37faa475f3 ("vfio-pci, ppc64/spapr: Reorder group-to-container attaching")
> > Cc: qemu-stable@nongnu.org # qemu-2.10+
> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > ---
> > 
> > This bug also existed in QEMU 2.10, but I think the fix is sufficiently
> > obvious (famous last words) to propose for 2.11 at this late date.  If
> > the first group is hot unplugged then KVM may revert to code emulation
> > that assumes no non-coherent DMA is present on some systems.  Also for
> > KVMGT, if the vGPU is not the first device registered, then the
> > notifier to enable linkages to KVM would not be called.  Please review.
> > Thanks,  
> 
> Alex, for x86, I suppose it doesn't exist in the case which viommu is exposed
> to guest?

With viommu, I believe each group would be in its own AddressSpace and
therefore get a separate container, so I don't think it'd be an issue.
It's only subsequent groups added to the same container which are
missed.  Thanks,

Alex
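
The viommu reasoning above can be sketched as follows. This is a toy model under stated assumptions, not QEMU code: `ToyContainer`, `toy_connect_buggy()` and `toy_registered_groups()` are hypothetical, containers are looked up by a plain integer address-space id, and the model takes at face value the belief expressed above that each group gets its own AddressSpace when a viommu is present. Only the pre-fix behaviour is modelled.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX 8

/* Hypothetical container keyed by an address-space id. */
typedef struct {
    int as_id;
    bool used;
} ToyContainer;

/*
 * Pre-fix behaviour: a group that finds an existing container for its
 * address space exits early without being registered. Returns true only
 * when the group ends up registered with the (modelled) vfio-kvm device.
 */
static bool toy_connect_buggy(ToyContainer *containers, int as_id)
{
    for (int i = 0; i < TOY_MAX; i++) {
        if (containers[i].used && containers[i].as_id == as_id) {
            return false; /* early exit: registration skipped */
        }
    }
    for (int i = 0; i < TOY_MAX; i++) {
        if (!containers[i].used) {
            containers[i].used = true;
            containers[i].as_id = as_id;
            return true; /* new container: registration happens */
        }
    }
    assert(!"toy container table full");
    return false;
}

/* Count how many of 'ngroups' groups get registered before the fix. */
static int toy_registered_groups(int ngroups, bool viommu)
{
    ToyContainer containers[TOY_MAX] = { { 0, false } };
    int registered = 0;

    assert(ngroups >= 0 && ngroups <= TOY_MAX);
    for (int g = 0; g < ngroups; g++) {
        /* Assumption: one address space per group with viommu, else one shared. */
        int as_id = viommu ? g + 1 : 0;

        if (toy_connect_buggy(containers, as_id)) {
            registered++;
        }
    }
    return registered;
}
```

In this model every group is registered when each has its own address space, and only the first one is registered when they all share one.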

> >  hw/vfio/common.c |    1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> > index 7b2924c0ef19..7007878e345e 100644
> > --- a/hw/vfio/common.c
> > +++ b/hw/vfio/common.c
> > @@ -968,6 +968,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
> >          if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
> >              group->container = container;
> >              QLIST_INSERT_HEAD(&container->group_list, group, container_next);
> > +            vfio_kvm_device_add_group(group);
> >              return 0;
> >          }
> >      }
> > 
> >
Liu, Yi L Dec. 6, 2017, 4:31 a.m. | #5
On Tue, Dec 05, 2017 at 08:12:58PM -0700, Alex Williamson wrote:
> On Wed, 6 Dec 2017 10:44:43 +0800
> "Liu, Yi L" <yi.l.liu@linux.intel.com> wrote:
> 
> > On Tue, Dec 05, 2017 at 02:09:07PM -0700, Alex Williamson wrote:
> > > Commit 8c37faa475f3 ("vfio-pci, ppc64/spapr: Reorder group-to-container
> > > attaching") moved registration of groups with the vfio-kvm device from
> > > vfio_get_group() to vfio_connect_container(), but it missed the case
> > > where a group is attached to an existing container and takes an early
> > > exit.  Perhaps this is a less common case on ppc64/spapr, but on x86
> > > (without viommu) all groups are connected to the same container and
> > > thus only the first group gets registered with the vfio-kvm device.
> > > This becomes a problem if we then hot-unplug the devices associated
> > > with that first group and we end up with KVM being misinformed about
> > > any vfio connections that might remain.  Fix by including the call to
> > > vfio_kvm_device_add_group() in this early exit path.
> > > 
> > > Fixes: 8c37faa475f3 ("vfio-pci, ppc64/spapr: Reorder group-to-container attaching")
> > > Cc: qemu-stable@nongnu.org # qemu-2.10+
> > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > > ---
> > > 
> > > This bug also existed in QEMU 2.10, but I think the fix is sufficiently
> > > obvious (famous last words) to propose for 2.11 at this late date.  If
> > > the first group is hot unplugged then KVM may revert to code emulation
> > > that assumes no non-coherent DMA is present on some systems.  Also for
> > > KVMGT, if the vGPU is not the first device registered, then the
> > > notifier to enable linkages to KVM would not be called.  Please review.
> > > Thanks,  
> > 
> > Alex, for x86, I suppose it doesn't exist in the case which viommu is exposed
> > to guest?
> 
> With viommu, I believe each group would be in its own AddressSpace and
> therefore get a separate container, so I don't think it'd be an issue.
> It's only subsequent groups added to the same container which are
> missed.  Thanks,

Agreed, thanks for confirming. It's a nice fix.

Regards,
Yi L
> 
> > >  hw/vfio/common.c |    1 +
> > >  1 file changed, 1 insertion(+)
> > > 
> > > diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> > > index 7b2924c0ef19..7007878e345e 100644
> > > --- a/hw/vfio/common.c
> > > +++ b/hw/vfio/common.c
> > > @@ -968,6 +968,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
> > >          if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
> > >              group->container = container;
> > >              QLIST_INSERT_HEAD(&container->group_list, group, container_next);
> > > +            vfio_kvm_device_add_group(group);
> > >              return 0;
> > >          }
> > >      }
> > > 
> > >   
>
Peter Xu Dec. 6, 2017, 7:20 a.m. | #6
On Tue, Dec 05, 2017 at 06:30:39PM -0700, Alex Williamson wrote:
> On Wed, 6 Dec 2017 12:02:01 +1100
> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> 
> > On 06/12/17 08:09, Alex Williamson wrote:
> > > Commit 8c37faa475f3 ("vfio-pci, ppc64/spapr: Reorder group-to-container
> > > attaching") moved registration of groups with the vfio-kvm device from
> > > vfio_get_group() to vfio_connect_container(), but it missed the case
> > > where a group is attached to an existing container and takes an early
> > > exit.  Perhaps this is a less common case on ppc64/spapr, but on x86
> > > (without viommu) all groups are connected to the same container and
> > > thus only the first group gets registered with the vfio-kvm device.
> > > This becomes a problem if we then hot-unplug the devices associated
> > > with that first group and we end up with KVM being misinformed about
> > > any vfio connections that might remain.  Fix by including the call to
> > > vfio_kvm_device_add_group() in this early exit path.
> > > 
> > > Fixes: 8c37faa475f3 ("vfio-pci, ppc64/spapr: Reorder group-to-container attaching")
> > > Cc: qemu-stable@nongnu.org # qemu-2.10+
> > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > > ---
> > > 
> > > This bug also existed in QEMU 2.10, but I think the fix is sufficiently
> > > obvious (famous last words) to propose for 2.11 at this late date.  If
> > > the first group is hot unplugged then KVM may revert to code emulation
> > > that assumes no non-coherent DMA is present on some systems.  Also for
> > > KVMGT, if the vGPU is not the first device registered, then the
> > > notifier to enable linkages to KVM would not be called.  Please review.  
> > 
> > For what it is worth
> > 
> > Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> 
> Thanks!
> 
> > Sorry for the breakage...
> > 
> > One question - how was this discovered? I'd love to set up a test
> > environment on my old thinkpad x230 if possible.
> 
> Assign two devices from separate iommu groups, hot unplug the first
> device, followed by the second device.  The second unplug will trigger:
> 
> qemu-kvm: Failed to remove group ## from KVM VFIO device: No such file or directory

I reproduced this with command line:

bin=x86_64-softmmu/qemu-system-x86_64
$bin -machine q35,kernel-irqchip=split \
     -enable-kvm -m 4G -nographic \
     -monitor telnet::6666,server,nowait \
     -device ioh3420,multifunction=on,bus=pcie.0,id=port0,chassis=0 \
     -device ioh3420,bus=pcie.0,id=port1,chassis=1 \
     -netdev user,id=user.0,hostfwd=tcp::5555-:22 \
     -device e1000,netdev=user.0 \
     -device vfio-pci,host=05:00.0,id=vfio0,bus=port0 \
     -device vfio-pci,host=05:00.1,id=vfio1,bus=port1 \
     /home/images/fedora-25.qcow2

The patch fixes it, so:

Reviewed-by: Peter Xu <peterx@redhat.com>
Tested-by: Peter Xu <peterx@redhat.com>

Thanks,
Eric Auger Dec. 6, 2017, 8:14 a.m. | #7
Hi Alex,

On 05/12/17 22:09, Alex Williamson wrote:
> Commit 8c37faa475f3 ("vfio-pci, ppc64/spapr: Reorder group-to-container
> attaching") moved registration of groups with the vfio-kvm device from
> vfio_get_group() to vfio_connect_container(), but it missed the case
> where a group is attached to an existing container and takes an early
> exit.  Perhaps this is a less common case on ppc64/spapr, but on x86
> (without viommu) all groups are connected to the same container and
> thus only the first group gets registered with the vfio-kvm device.
> This becomes a problem if we then hot-unplug the devices associated
> with that first group and we end up with KVM being misinformed about
> any vfio connections that might remain.  Fix by including the call to
> vfio_kvm_device_add_group() in this early exit path.
> 
> Fixes: 8c37faa475f3 ("vfio-pci, ppc64/spapr: Reorder group-to-container attaching")
> Cc: qemu-stable@nongnu.org # qemu-2.10+
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>

Tested on arm64 Gigabyte HW with hot-detach of the 2 X540T2 PFs

I don't have the "2017-12-06T08:02:07.535373Z qemu-system-aarch64:
Failed to remove group 24 from KVM VFIO device: No such file or
directory" anymore when detaching the second PF.

Thanks

Eric


> ---
> 
> This bug also existed in QEMU 2.10, but I think the fix is sufficiently
> obvious (famous last words) to propose for 2.11 at this late date.  If
> the first group is hot unplugged then KVM may revert to code emulation
> that assumes no non-coherent DMA is present on some systems.  Also for
> KVMGT, if the vGPU is not the first device registered, then the
> notifier to enable linkages to KVM would not be called.  Please review.
> Thanks,
> 
> Alex
> 
>  hw/vfio/common.c |    1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 7b2924c0ef19..7007878e345e 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -968,6 +968,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>          if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
>              group->container = container;
>              QLIST_INSERT_HEAD(&container->group_list, group, container_next);
> +            vfio_kvm_device_add_group(group);
>              return 0;
>          }
>      }
> 
>
Alexey Kardashevskiy Dec. 7, 2017, 12:16 a.m. | #8
On 06/12/17 12:30, Alex Williamson wrote:
> On Wed, 6 Dec 2017 12:02:01 +1100
> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> 
>> On 06/12/17 08:09, Alex Williamson wrote:
>>> Commit 8c37faa475f3 ("vfio-pci, ppc64/spapr: Reorder group-to-container
>>> attaching") moved registration of groups with the vfio-kvm device from
>>> vfio_get_group() to vfio_connect_container(), but it missed the case
>>> where a group is attached to an existing container and takes an early
>>> exit.  Perhaps this is a less common case on ppc64/spapr, but on x86
>>> (without viommu) all groups are connected to the same container and
>>> thus only the first group gets registered with the vfio-kvm device.
>>> This becomes a problem if we then hot-unplug the devices associated
>>> with that first group and we end up with KVM being misinformed about
>>> any vfio connections that might remain.  Fix by including the call to
>>> vfio_kvm_device_add_group() in this early exit path.
>>>
>>> Fixes: 8c37faa475f3 ("vfio-pci, ppc64/spapr: Reorder group-to-container attaching")
>>> Cc: qemu-stable@nongnu.org # qemu-2.10+
>>> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
>>> ---
>>>
>>> This bug also existed in QEMU 2.10, but I think the fix is sufficiently
>>> obvious (famous last words) to propose for 2.11 at this late date.  If
>>> the first group is hot unplugged then KVM may revert to code emulation
>>> that assumes no non-coherent DMA is present on some systems.  Also for
>>> KVMGT, if the vGPU is not the first device registered, then the
>>> notifier to enable linkages to KVM would not be called.  Please review.  
>>
>> For what it is worth
>>
>> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> 
> Thanks!
> 
>> Sorry for the breakage...
>>
>> One question - how was this discovered? I'd love to set up a test
>> environment on my old thinkpad x230 if possible.
> 
> Assign two devices from separate iommu groups, hot unplug the first
> device, followed by the second device.  The second unplug will trigger:
> 
> qemu-kvm: Failed to remove group ## from KVM VFIO device: No such file or directory
> 
> Laptops don't have many devices and we're not good about keeping up
> with ACS quirks on laptop chipsets, so it might be difficult to find
> the prerequisite setup there.  Thanks,


This is actually easy to reproduce on the spapr platform as reusing the
same container is what we do these days, at least till we get multiple PHB
support in libvirt :-/
Alexey Kardashevskiy Jan. 18, 2018, 9:29 a.m. | #9
On 06/12/17 12:30, Alex Williamson wrote:
> On Wed, 6 Dec 2017 12:02:01 +1100
> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> 
>> On 06/12/17 08:09, Alex Williamson wrote:
>>> Commit 8c37faa475f3 ("vfio-pci, ppc64/spapr: Reorder group-to-container
>>> attaching") moved registration of groups with the vfio-kvm device from
>>> vfio_get_group() to vfio_connect_container(), but it missed the case
>>> where a group is attached to an existing container and takes an early
>>> exit.  Perhaps this is a less common case on ppc64/spapr, but on x86
>>> (without viommu) all groups are connected to the same container and
>>> thus only the first group gets registered with the vfio-kvm device.
>>> This becomes a problem if we then hot-unplug the devices associated
>>> with that first group and we end up with KVM being misinformed about
>>> any vfio connections that might remain.  Fix by including the call to
>>> vfio_kvm_device_add_group() in this early exit path.
>>>
>>> Fixes: 8c37faa475f3 ("vfio-pci, ppc64/spapr: Reorder group-to-container attaching")
>>> Cc: qemu-stable@nongnu.org # qemu-2.10+
>>> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
>>> ---
>>>
>>> This bug also existed in QEMU 2.10, but I think the fix is sufficiently
>>> obvious (famous last words) to propose for 2.11 at this late date.  If
>>> the first group is hot unplugged then KVM may revert to code emulation
>>> that assumes no non-coherent DMA is present on some systems.  Also for
>>> KVMGT, if the vGPU is not the first device registered, then the
>>> notifier to enable linkages to KVM would not be called.  Please review.  
>>
>> For what it is worth
>>
>> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> 
> Thanks!
> 
>> Sorry for the breakage...
>>
>> One question - how was this discovered? I'd love to set up a test
>> environment on my old thinkpad x230 if possible.
> 
> Assign two devices from separate iommu groups, hot unplug the first
> device, followed by the second device.  The second unplug will trigger:
> 
> qemu-kvm: Failed to remove group ## from KVM VFIO device: No such file or directory
> 
> Laptops don't have many devices and we're not good about keeping up
> with ACS quirks on laptop chipsets, so it might be difficult to find
> the prerequisite setup there.  Thanks,

Tried the laptop, these worked:

03:00.0 Network controller: Intel Corporation Centrino Advanced-N 6205
[Taylor Peak] (rev 34)
00:1a.0 USB controller: Intel Corporation 7 Series/C216 Chipset Family USB
Enhanced Host Controller #2 (rev 04)


However VGA did not.

$ lspci -nns 00:02.0
00:02.0 VGA compatible controller [0300]: Intel Corporation 3rd Gen Core
processor Graphics Controller [8086:0166] (rev 09)

I run like this:

pbuild/qemu-localhost-x86_64/x86_64-softmmu/qemu-system-x86_64 \
-enable-kvm -m 2G \
-netdev "tap,id=TAP0,helper=/home/aik/qemu-bridge-helper --br=br0" \
-device "virtio-net-pci,id=vnet0,mac=C0:41:49:4b:00:32,netdev=TAP0" \
virtimg/fc27-32GB.qcow2 -nodefaults \
-chardev stdio,id=STDIO0,signal=off,mux=on \
-device isa-serial,id=isa-serial0,chardev=STDIO0 \
-mon id=MON0,chardev=STDIO0,mode=readline -nographic -vga none \
-snapshot \
-device "vfio-pci,id=vfio0000_00_02_0,host=0000:00:02.0"

and it crashes pretty soon, I suppose, as @pc does not change:

(qemu) info cpus
* CPU #0: pc=0x00000000000c5afa thread_id=4024
(qemu) info cpus
* CPU #0: pc=0x00000000000c5afa thread_id=4024

and it does not seem to reach seabios, or it does and seabios is
initializing VGA; it is hard to tell. Without any VGA, seabios prints messages
to the console and shows grub. Is there any trick to try? No big deal if
not, just curious. Thanks.




> 
> Alex
> 
>>>  hw/vfio/common.c |    1 +
>>>  1 file changed, 1 insertion(+)
>>>
>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>>> index 7b2924c0ef19..7007878e345e 100644
>>> --- a/hw/vfio/common.c
>>> +++ b/hw/vfio/common.c
>>> @@ -968,6 +968,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>>>          if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
>>>              group->container = container;
>>>              QLIST_INSERT_HEAD(&container->group_list, group, container_next);
>>> +            vfio_kvm_device_add_group(group);
>>>              return 0;
>>>          }
>>>      }
>>>   
>>
>>
>
Alex Williamson Jan. 18, 2018, 10:15 p.m. | #10
On Thu, 18 Jan 2018 20:29:48 +1100
Alexey Kardashevskiy <aik@ozlabs.ru> wrote:

> On 06/12/17 12:30, Alex Williamson wrote:
> > On Wed, 6 Dec 2017 12:02:01 +1100
> > Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> >   
> >> On 06/12/17 08:09, Alex Williamson wrote:  
> >>> Commit 8c37faa475f3 ("vfio-pci, ppc64/spapr: Reorder group-to-container
> >>> attaching") moved registration of groups with the vfio-kvm device from
> >>> vfio_get_group() to vfio_connect_container(), but it missed the case
> >>> where a group is attached to an existing container and takes an early
> >>> exit.  Perhaps this is a less common case on ppc64/spapr, but on x86
> >>> (without viommu) all groups are connected to the same container and
> >>> thus only the first group gets registered with the vfio-kvm device.
> >>> This becomes a problem if we then hot-unplug the devices associated
> >>> with that first group and we end up with KVM being misinformed about
> >>> any vfio connections that might remain.  Fix by including the call to
> >>> vfio_kvm_device_add_group() in this early exit path.
> >>>
> >>> Fixes: 8c37faa475f3 ("vfio-pci, ppc64/spapr: Reorder group-to-container attaching")
> >>> Cc: qemu-stable@nongnu.org # qemu-2.10+
> >>> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> >>> ---
> >>>
> >>> This bug also existed in QEMU 2.10, but I think the fix is sufficiently
> >>> obvious (famous last words) to propose for 2.11 at this late date.  If
> >>> the first group is hot unplugged then KVM may revert to code emulation
> >>> that assumes no non-coherent DMA is present on some systems.  Also for
> >>> KVMGT, if the vGPU is not the first device registered, then the
> >>> notifier to enable linkages to KVM would not be called.  Please review.    
> >>
> >> For what it is worth
> >>
> >> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>  
> > 
> > Thanks!
> >   
> >> Sorry for the breakage...
> >>
> >> One question - how was this discovered? I'd love to set up a test
> >> environment on my old thinkpad x230 if possible.  
> > 
> > Assign two devices from separate iommu groups, hot unplug the first
> > device, followed by the second device.  The second unplug will trigger:
> > 
> > qemu-kvm: Failed to remove group ## from KVM VFIO device: No such file or directory
> > 
> > Laptops don't have many devices and we're not good about keeping up
> > with ACS quirks on laptop chipsets, so it might be difficult to find
> > the prerequisite setup there.  Thanks,  
> 
> Tried the laptop, these worked:
> 
> 03:00.0 Network controller: Intel Corporation Centrino Advanced-N 6205
> [Taylor Peak] (rev 34)
> 00:1a.0 USB controller: Intel Corporation 7 Series/C216 Chipset Family USB
> Enhanced Host Controller #2 (rev 04)

Worked as in reproduced the issue above?

> However VGA did not.
> 
> $ lspci -nns 00:02.0
> 00:02.0 VGA compatible controller [0300]: Intel Corporation 3rd Gen Core
> processor Graphics Controller [8086:0166] (rev 09)
> 
> I run like this:
> 
> pbuild/qemu-localhost-x86_64/x86_64-softmmu/qemu-system-x86_64 \
> -enable-kvm -m 2G \
> -netdev "tap,id=TAP0,helper=/home/aik/qemu-bridge-helper --br=br0" \
> -device "virtio-net-pci,id=vnet0,mac=C0:41:49:4b:00:32,netdev=TAP0" \
> virtimg/fc27-32GB.qcow2 -nodefaults \
> -chardev stdio,id=STDIO0,signal=off,mux=on \
> -device isa-serial,id=isa-serial0,chardev=STDIO0 \
> -mon id=MON0,chardev=STDIO0,mode=readline -nographic -vga none \
> -snapshot \
> -device "vfio-pci,id=vfio0000_00_02_0,host=0000:00:02.0"
> 
> and it crashes pretty soon, I suppose, as @pc does not change:
> 
> (qemu) info cpus
> * CPU #0: pc=0x00000000000c5afa thread_id=4024
> (qemu) info cpus
> * CPU #0: pc=0x00000000000c5afa thread_id=4024
> 
> and it does not seem to reach seabios or it does and seabios is
> initializing VGA - hard to tell, without any VGA - seabios prints messages
> to the console and shows grub. Is there any trick to try? Not big deal if
> none, just curious. Thanks.

Intel graphics is very "special", see docs/igd-assign.txt.  If your
goal is just to have one more device to assign that isn't too much
trouble, walk away slowly ;)  Minimally you'll need to decide if you're
trying to get legacy mode or UPT mode working (see doc), the former
needs to have the device at guest address 00:02.0.  The latter doesn't
technically support output to the display, but can be coaxed to work
with the x-igd-opregion option, but Intel is pretty fickle about
whether they actually care if this works, so YMMV.  Thanks,

Alex
Alexey Kardashevskiy Jan. 19, 2018, 12:35 a.m. | #11
On 19/01/18 09:15, Alex Williamson wrote:
> On Thu, 18 Jan 2018 20:29:48 +1100
> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> 
>> On 06/12/17 12:30, Alex Williamson wrote:
>>> On Wed, 6 Dec 2017 12:02:01 +1100
>>> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>>   
>>>> On 06/12/17 08:09, Alex Williamson wrote:  
>>>>> Commit 8c37faa475f3 ("vfio-pci, ppc64/spapr: Reorder group-to-container
>>>>> attaching") moved registration of groups with the vfio-kvm device from
>>>>> vfio_get_group() to vfio_connect_container(), but it missed the case
>>>>> where a group is attached to an existing container and takes an early
>>>>> exit.  Perhaps this is a less common case on ppc64/spapr, but on x86
>>>>> (without viommu) all groups are connected to the same container and
>>>>> thus only the first group gets registered with the vfio-kvm device.
>>>>> This becomes a problem if we then hot-unplug the devices associated
>>>>> with that first group and we end up with KVM being misinformed about
>>>>> any vfio connections that might remain.  Fix by including the call to
>>>>> vfio_kvm_device_add_group() in this early exit path.
>>>>>
>>>>> Fixes: 8c37faa475f3 ("vfio-pci, ppc64/spapr: Reorder group-to-container attaching")
>>>>> Cc: qemu-stable@nongnu.org # qemu-2.10+
>>>>> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
>>>>> ---
>>>>>
>>>>> This bug also existed in QEMU 2.10, but I think the fix is sufficiently
>>>>> obvious (famous last words) to propose for 2.11 at this late date.  If
>>>>> the first group is hot unplugged then KVM may revert to code emulation
>>>>> that assumes no non-coherent DMA is present on some systems.  Also for
>>>>> KVMGT, if the vGPU is not the first device registered, then the
>>>>> notifier to enable linkages to KVM would not be called.  Please review.    
>>>>
>>>> For what it is worth
>>>>
>>>> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>  
>>>
>>> Thanks!
>>>   
>>>> Sorry for the breakage...
>>>>
>>>> One question - how was this discovered? I'd love to set up a test
>>>> environment on my old thinkpad x230 if possible.  
>>>
>>> Assign two devices from separate iommu groups, hot unplug the first
>>> device, followed by the second device.  The second unplug will trigger:
>>>
>>> qemu-kvm: Failed to remove group ## from KVM VFIO device: No such file or directory
>>>
>>> Laptops don't have many devices and we're not good about keeping up
>>> with ACS quirks on laptop chipsets, so it might be difficult to find
>>> the prerequisite setup there.  Thanks,  
>>
>> Tried the laptop, these worked:
>>
>> 03:00.0 Network controller: Intel Corporation Centrino Advanced-N 6205
>> [Taylor Peak] (rev 34)
>> 00:1a.0 USB controller: Intel Corporation 7 Series/C216 Chipset Family USB
>> Enhanced Host Controller #2 (rev 04)
> 
> Worked as in reproduced the issue above?


Nah, that issue I reproduced on a powerpc box.


> 
>> However VGA did not.
>>
>> $ lspci -nns 00:02.0
>> 00:02.0 VGA compatible controller [0300]: Intel Corporation 3rd Gen Core
>> processor Graphics Controller [8086:0166] (rev 09)
>>
>> I run like this:
>>
>> pbuild/qemu-localhost-x86_64/x86_64-softmmu/qemu-system-x86_64 \
>> -enable-kvm -m 2G \
>> -netdev "tap,id=TAP0,helper=/home/aik/qemu-bridge-helper --br=br0" \
>> -device "virtio-net-pci,id=vnet0,mac=C0:41:49:4b:00:32,netdev=TAP0" \
>> virtimg/fc27-32GB.qcow2 -nodefaults \
>> -chardev stdio,id=STDIO0,signal=off,mux=on \
>> -device isa-serial,id=isa-serial0,chardev=STDIO0 \
>> -mon id=MON0,chardev=STDIO0,mode=readline -nographic -vga none \
>> -snapshot \
>> -device "vfio-pci,id=vfio0000_00_02_0,host=0000:00:02.0"
>>
>> and it crashes pretty soon, I suppose, as @pc does not change:
>>
>> (qemu) info cpus
>> * CPU #0: pc=0x00000000000c5afa thread_id=4024
>> (qemu) info cpus
>> * CPU #0: pc=0x00000000000c5afa thread_id=4024
>>
>> and it does not seem to reach seabios or it does and seabios is
>> initializing VGA - hard to tell, without any VGA - seabios prints messages
>> to the console and shows grub. Is there any trick to try? Not big deal if
>> none, just curious. Thanks.
> 
> Intel graphics is very "special", see docs/igd-assign.txt.  If your
> goal is just to have one more device to assign that isn't too much
> trouble, walk away slowly ;)  Minimally you'll need to decide if you're
> trying to get legacy mode or UPT mode working (see doc), the former
> needs to have the device at guest address 00:02.0.  The latter doesn't
> technically support output to the display, but can be coaxed to work
> with the x-igd-opregion option, but Intel is pretty fickle about
> whether they actually care if this works, so YMMV.  Thanks,

Wow.

Anyway, this worked - I can see a few boot prints (not many, as
console=ttyS0) and eventually the Fedora login screen.

pbuild/qemu-localhost-x86_64/x86_64-softmmu/qemu-system-x86_64 \
-enable-kvm -m 2G \
-netdev "tap,id=TAP0,helper=/home/aik/qemu-bridge-helper --br=br0" \
-device \
"virtio-net-pci,id=vnet0,bus=pci.0,addr=8.0,mac=C0:41:49:4b:00:32,netdev=TAP0" \
virtimg/fc27-32GB.qcow2 -nodefaults \
-chardev stdio,id=STDIO0,signal=off,mux=on \
-device isa-serial,id=isa-serial0,chardev=STDIO0 \
-mon id=MON0,chardev=STDIO0,mode=readline -nographic -vga none \
-snapshot \
-device \
vfio-pci-igd-lpc-bridge,id=vfio-pci-igd-lpc-bridge0,bus=pci.0,addr=1f.0 \
-device \
"vfio-pci,id=vfio0000_00_02_0,host=0000:00:02.0,bus=pci.0,addr=2.0"


In the guest:

[aik@aiktest50 ~]$ lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 09)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core processor
Graphics Controller (rev 09)
00:08.0 Ethernet controller: Red Hat, Inc. Virtio network device
00:1f.0 ISA bridge: Intel Corporation QM77 Express Chipset LPC Controller
(rev 04)

Good, thanks.


The problem I see now is that once I have run QEMU + VFIO with that IGD, I
cannot reload the vfio_pci module. It must be unrelated, but if I do not do
IGD passthrough, then I do not see it. This is the output of my Python
script, pretty straightforward:

[aik@balbir ~]$ s/pci/bind 0:00:02.0 unbind
Call: ['sudo', 'bash', '-c', "echo '0000:00:02.0' > '/sys/bus/pci/devices/0000:00:02.0/driver/unbind'"]
[aik@balbir ~]$ s/pci/bind 0:00:02.0 rebind
Succeeded: echo "0000:00:02.0" > /sys/bus/pci/drivers/i915/bind
[aik@balbir ~]$ sudo rmmod vfio_pci vfio_virqfd vfio_iommu_type1 vfio
[aik@balbir ~]$ lsmod | grep vfio
[aik@balbir ~]$ s/pci/bind 0:00:02.0
Link=/sys/bus/pci/devices/0000:00:02.0/iommu_group
Call: ['sudo', 'bash', '-c', "echo '0000:00:02.0' > '/sys/bus/pci/devices/0000:00:02.0/driver/unbind'"]
Cmd: bash -c lsmod
Call: ['sudo', 'bash', '-c', 'modprobe vfio_pci']

Message from syslogd@localhost at Jan 19 11:29:12 ...
 kernel:NMI watchdog: Watchdog detected hard LOCKUP on cpu 2

Message from syslogd@localhost at Jan 19 11:29:12 ...
 kernel:watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [modprobe:2065]

Does this look familiar, before I decide I really want to dig further?


Another question - how do you configure input (keyboard, mouse) for a guest
with IGD?

Patch

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 7b2924c0ef19..7007878e345e 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -968,6 +968,7 @@  static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
         if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
             group->container = container;
             QLIST_INSERT_HEAD(&container->group_list, group, container_next);
+            vfio_kvm_device_add_group(group);
             return 0;
         }
     }
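
The effect of the added line can be modeled outside QEMU.  What follows is
only a toy sketch, not QEMU code: every type and function name in it
(ToyKvm, toy_kvm_add_group(), toy_connect_container() and so on) is
hypothetical, and it assumes the simplest x86 case from the commit message,
where all groups share one container.  It also assumes that removing an
unregistered group fails with ENOENT, which matches the "No such file or
directory" text in the error quoted in the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not QEMU code: every name below is hypothetical. */
#define MAX_GROUPS 8

typedef struct ToyKvm {
    int groups[MAX_GROUPS];   /* group IDs the toy "KVM" knows about */
    size_t count;
} ToyKvm;

static void toy_kvm_add_group(ToyKvm *kvm, int groupid)
{
    assert(kvm->count < MAX_GROUPS);
    kvm->groups[kvm->count++] = groupid;
}

/* Returns 0 on success, -ENOENT if the group was never registered. */
static int toy_kvm_del_group(ToyKvm *kvm, int groupid)
{
    for (size_t i = 0; i < kvm->count; i++) {
        if (kvm->groups[i] == groupid) {
            kvm->groups[i] = kvm->groups[--kvm->count];
            return 0;
        }
    }
    return -ENOENT;
}

/*
 * Stand-in for vfio_connect_container(): the first group creates the
 * container, later groups join it through the early-exit path.
 */
static int toy_connect_container(ToyKvm *kvm, bool *have_container,
                                 int groupid, bool fixed)
{
    if (*have_container) {
        if (fixed) {
            toy_kvm_add_group(kvm, groupid);  /* the line the patch adds */
        }
        return 0;                             /* early exit */
    }
    *have_container = true;
    toy_kvm_add_group(kvm, groupid);
    return 0;
}

/*
 * Attach groups 1 and 2, unplug group 1 and then group 2, and return
 * the result of the second unplug.
 */
int toy_second_unplug_result(bool fixed)
{
    ToyKvm kvm = { .count = 0 };
    bool have_container = false;

    toy_connect_container(&kvm, &have_container, 1, fixed);
    toy_connect_container(&kvm, &have_container, 2, fixed);
    toy_kvm_del_group(&kvm, 1);
    return toy_kvm_del_group(&kvm, 2);
}
```

Calling toy_second_unplug_result(false) models the pre-patch behaviour and
returns -ENOENT, because group 2 was never registered on the early-exit
path; toy_second_unplug_result(true) returns 0.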