diff mbox

[RFC,v11,3/4] vfio-pci: pass the aer error to guest

Message ID 1483175588-17006-4-git-send-email-caoj.fnst@cn.fujitsu.com
State New
Headers show

Commit Message

Cao jin Dec. 31, 2016, 9:13 a.m. UTC
From: Chen Fan <chen.fan.fnst@cn.fujitsu.com>

When physical device has uncorrectable error hanppened, the vfio_pci
driver will signal the uncorrectable error status register value to
corresponding QEMU's vfio-pci device via the eventfd registered by this
device, then, the vfio-pci's error eventfd handler will be invoked in
event loop.

Construct and pass the aer message to root port, root port will trigger an
interrupt to signal guest, then, the guest driver will do the recovery.

Note: Now only support non-fatal error's recovery, fatal error will
still result in vm stop.

Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
---
 hw/vfio/pci.c | 50 ++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 42 insertions(+), 8 deletions(-)

Comments

Michael S. Tsirkin Jan. 9, 2017, 10:55 p.m. UTC | #1
On Sat, Dec 31, 2016 at 05:13:07PM +0800, Cao jin wrote:
> From: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
> 
> When physical device has uncorrectable error hanppened, the vfio_pci
> driver will signal the uncorrectable error status register value to
> corresponding QEMU's vfio-pci device via the eventfd registered by this
> device, then, the vfio-pci's error eventfd handler will be invoked in
> event loop.
> 
> Construct and pass the aer message to root port, root port will trigger an
> interrupt to signal guest, then, the guest driver will do the recovery.
> 
> Note: Now only support non-fatal error's recovery, fatal error will
> still result in vm stop.
> 
> Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
> Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
> Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
> ---
>  hw/vfio/pci.c | 50 ++++++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 42 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 76a8ac3..9861f72 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -2470,21 +2470,55 @@ static void vfio_put_device(VFIOPCIDevice *vdev)
>  static void vfio_err_notifier_handler(void *opaque)
>  {
>      VFIOPCIDevice *vdev = opaque;
> +    PCIDevice *dev = &vdev->pdev;
> +    PCIEAERMsg msg = {
> +        .severity = 0,
> +        .source_id = (pci_bus_num(dev->bus) << 8) | dev->devfn,
> +    };
> +    int len;
> +    uint64_t uncor_status;
> +
> +    /* Read uncorrectable error status from driver */
> +    len = read(vdev->err_notifier.rfd, &uncor_status, sizeof(uncor_status));
> +    if (len != sizeof(uncor_status)) {
> +        error_report("vfio-pci: uncor error status reading returns"
> +                     " invalid number of bytes: %d", len);
> +        return; //Or goto stop?

I would definitely suggest this to make sure we don't regress.

> +    }
> +
> +    if (!(vdev->features & VFIO_FEATURE_ENABLE_AER)) {
> +        goto stop;
> +    }
> +
> +    /* Populate the aer msg and send it to root port */
> +    if (dev->exp.aer_cap) {
> +        uint8_t *aer_cap = dev->config + dev->exp.aer_cap;
> +        bool isfatal = uncor_status &
> +                       pci_get_long(aer_cap + PCI_ERR_UNCOR_SEVER);
> +
> +	if (isfatal) {
> +	    goto stop;
> +	}
> +
> +        msg.severity = isfatal ? PCI_ERR_ROOT_CMD_FATAL_EN :
> +                                 PCI_ERR_ROOT_CMD_NONFATAL_EN;
>  
> -    if (!event_notifier_test_and_clear(&vdev->err_notifier)) {
> +        error_report("vfio-pci device %d sending AER to root port. uncor"
> +                     " status = 0x%"PRIx64, dev->devfn, uncor_status);
> +        pcie_aer_msg(dev, &msg);
>          return;
>      }
>  
> +stop:
>      /*
> -     * TBD. Retrieve the error details and decide what action
> -     * needs to be taken. One of the actions could be to pass
> -     * the error to the guest and have the guest driver recover
> -     * from the error. This requires that PCIe capabilities be
> -     * exposed to the guest. For now, we just terminate the
> -     * guest to contain the error.
> +     * Terminate the guest in case of
> +     * 1. AER capability is not exposed to guest.
> +     * 2. AER capability is exposed, but error is fatal, only non-fatal
> +     * error is handled now.
>       */
>  
> -    error_report("%s(%s) Unrecoverable error detected. Please collect any data possible and then kill the guest", __func__, vdev->vbasedev.name);
> +    error_report("%s(%s) fatal error detected. Please collect any data"
> +            " possible and then kill the guest", __func__, vdev->vbasedev.name);
>  
>      vm_stop(RUN_STATE_INTERNAL_ERROR);
>  }
> -- 
> 1.8.3.1
> 
>
Alex Williamson Jan. 18, 2017, 10:31 p.m. UTC | #2
On Sat, 31 Dec 2016 17:13:07 +0800
Cao jin <caoj.fnst@cn.fujitsu.com> wrote:

> From: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
> 
> When physical device has uncorrectable error hanppened, the vfio_pci
> driver will signal the uncorrectable error status register value to
> corresponding QEMU's vfio-pci device via the eventfd registered by this
> device, then, the vfio-pci's error eventfd handler will be invoked in
> event loop.
> 
> Construct and pass the aer message to root port, root port will trigger an
> interrupt to signal guest, then, the guest driver will do the recovery.
> 
> Note: Now only support non-fatal error's recovery, fatal error will
> still result in vm stop.

Please update the entire commit log, don't just add a note that this
now only covers non-fatal errors.

> 
> Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
> Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
> Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
> ---
>  hw/vfio/pci.c | 50 ++++++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 42 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 76a8ac3..9861f72 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -2470,21 +2470,55 @@ static void vfio_put_device(VFIOPCIDevice *vdev)
>  static void vfio_err_notifier_handler(void *opaque)
>  {
>      VFIOPCIDevice *vdev = opaque;
> +    PCIDevice *dev = &vdev->pdev;
> +    PCIEAERMsg msg = {
> +        .severity = 0,
> +        .source_id = (pci_bus_num(dev->bus) << 8) | dev->devfn,
> +    };
> +    int len;
> +    uint64_t uncor_status;
> +
> +    /* Read uncorrectable error status from driver */
> +    len = read(vdev->err_notifier.rfd, &uncor_status, sizeof(uncor_status));

Whoa this seems bogus.  In the kernel eventfd_signal() adds the value
to the internal counter.  We can't guarantee that the user is going to
immediately respond to every signal, multiple signals might occur, each
incrementing the counter.  I don't think we can use the eventfd like
this.

> +    if (len != sizeof(uncor_status)) {
> +        error_report("vfio-pci: uncor error status reading returns"
> +                     " invalid number of bytes: %d", len);
> +        return; //Or goto stop?

It's bogus use of the eventfd anyway afaict, but
event_notifier_test_and_clear() certainly handles at least EINTR.

> +    }
> +
> +    if (!(vdev->features & VFIO_FEATURE_ENABLE_AER)) {
> +        goto stop;
> +    }

I'm not sure this should be user selected anymore.

> +
> +    /* Populate the aer msg and send it to root port */
> +    if (dev->exp.aer_cap) {
> +        uint8_t *aer_cap = dev->config + dev->exp.aer_cap;
> +        bool isfatal = uncor_status &
> +                       pci_get_long(aer_cap + PCI_ERR_UNCOR_SEVER);
> +
> +	if (isfatal) {
> +	    goto stop;
> +	}

QEMU uses spaces not tabs.

> +
> +        msg.severity = isfatal ? PCI_ERR_ROOT_CMD_FATAL_EN :
> +                                 PCI_ERR_ROOT_CMD_NONFATAL_EN;

We don't get here if isfatal.

>  
> -    if (!event_notifier_test_and_clear(&vdev->err_notifier)) {
> +        error_report("vfio-pci device %d sending AER to root port. uncor"
> +                     " status = 0x%"PRIx64, dev->devfn, uncor_status);
> +        pcie_aer_msg(dev, &msg);
>          return;
>      }
>  
> +stop:
>      /*
> -     * TBD. Retrieve the error details and decide what action
> -     * needs to be taken. One of the actions could be to pass
> -     * the error to the guest and have the guest driver recover
> -     * from the error. This requires that PCIe capabilities be
> -     * exposed to the guest. For now, we just terminate the
> -     * guest to contain the error.
> +     * Terminate the guest in case of
> +     * 1. AER capability is not exposed to guest.
> +     * 2. AER capability is exposed, but error is fatal, only non-fatal
> +     * error is handled now.

You're also currently requiring the user to enable aer per device.

>       */
>  
> -    error_report("%s(%s) Unrecoverable error detected. Please collect any data possible and then kill the guest", __func__, vdev->vbasedev.name);
> +    error_report("%s(%s) fatal error detected. Please collect any data"
> +            " possible and then kill the guest", __func__, vdev->vbasedev.name);
>  
>      vm_stop(RUN_STATE_INTERNAL_ERROR);
>  }
Cao jin Jan. 20, 2017, 6:50 a.m. UTC | #3
On 01/19/2017 06:31 AM, Alex Williamson wrote:
> On Sat, 31 Dec 2016 17:13:07 +0800
> Cao jin <caoj.fnst@cn.fujitsu.com> wrote:
> 
>> From: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
>>

>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index 76a8ac3..9861f72 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -2470,21 +2470,55 @@ static void vfio_put_device(VFIOPCIDevice *vdev)
>>  static void vfio_err_notifier_handler(void *opaque)
>>  {
>>      VFIOPCIDevice *vdev = opaque;
>> +    PCIDevice *dev = &vdev->pdev;
>> +    PCIEAERMsg msg = {
>> +        .severity = 0,
>> +        .source_id = (pci_bus_num(dev->bus) << 8) | dev->devfn,
>> +    };
>> +    int len;
>> +    uint64_t uncor_status;
>> +
>> +    /* Read uncorrectable error status from driver */
>> +    len = read(vdev->err_notifier.rfd, &uncor_status, sizeof(uncor_status));
> 
> Whoa this seems bogus.  In the kernel eventfd_signal() adds the value
> to the internal counter.  We can't guarantee that the user is going to
> immediately respond to every signal, multiple signals might occur, each
> incrementing the counter.  I don't think we can use the eventfd like
> this.
> 

Ah, got your concern, make sense. Michael had the same comments as you,
I didn't got him at that time...

I explained the reason in v10 cover letter(5 of changelog): I find that
error status register reading in error handler sometime returns ALL F's.
 I may want to take a look at v10, incorporate your comments, and test
to see if the issue still exists.  Currently if we only handle non-fatal
error, we can still use eventfd like before.
Tian, Kevin Jan. 20, 2017, 6:57 a.m. UTC | #4
> From: Alex Williamson
> Sent: Thursday, January 19, 2017 6:32 AM
> 
> On Sat, 31 Dec 2016 17:13:07 +0800
> Cao jin <caoj.fnst@cn.fujitsu.com> wrote:
> 
> > From: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
> >
> > When physical device has uncorrectable error hanppened, the vfio_pci
> > driver will signal the uncorrectable error status register value to
> > corresponding QEMU's vfio-pci device via the eventfd registered by this
> > device, then, the vfio-pci's error eventfd handler will be invoked in
> > event loop.
> >
> > Construct and pass the aer message to root port, root port will trigger an
> > interrupt to signal guest, then, the guest driver will do the recovery.
> >
> > Note: Now only support non-fatal error's recovery, fatal error will
> > still result in vm stop.
> 
> Please update the entire commit log, don't just add a note that this
> now only covers non-fatal errors.
> 

One thing relate to vIOMMU. There is still a TODO task about forwarding
IOMMU fault thru VFIO to Qemu, so Qemu vIOMMU has the chance to
walk guest remapping structure to emulate virtual IOMMU fault. Likely
it also requires eventfd mechanism.

Wondering whether making sense to reuse same eventfd for both AER
and vIOMMU or using separate eventfd is also fine? Even go with the
former option, I don't expect substantial change to this series. Major
change is on interface definition - extensible to multiple types of 
fault/error conditions instead of assuming AER only.

Thoughts?

Thanks
Kevin
Alex Williamson Jan. 20, 2017, 6:21 p.m. UTC | #5
On Fri, 20 Jan 2017 06:57:22 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:

> > From: Alex Williamson
> > Sent: Thursday, January 19, 2017 6:32 AM
> > 
> > On Sat, 31 Dec 2016 17:13:07 +0800
> > Cao jin <caoj.fnst@cn.fujitsu.com> wrote:
> >   
> > > From: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
> > >
> > > When physical device has uncorrectable error hanppened, the vfio_pci
> > > driver will signal the uncorrectable error status register value to
> > > corresponding QEMU's vfio-pci device via the eventfd registered by this
> > > device, then, the vfio-pci's error eventfd handler will be invoked in
> > > event loop.
> > >
> > > Construct and pass the aer message to root port, root port will trigger an
> > > interrupt to signal guest, then, the guest driver will do the recovery.
> > >
> > > Note: Now only support non-fatal error's recovery, fatal error will
> > > still result in vm stop.  
> > 
> > Please update the entire commit log, don't just add a note that this
> > now only covers non-fatal errors.
> >   
> 
> One thing relate to vIOMMU. There is still a TODO task about forwarding
> IOMMU fault thru VFIO to Qemu, so Qemu vIOMMU has the chance to
> walk guest remapping structure to emulate virtual IOMMU fault. Likely
> it also requires eventfd mechanism.
> 
> Wondering whether making sense to reuse same eventfd for both AER
> and vIOMMU or using separate eventfd is also fine? Even go with the
> former option, I don't expect substantial change to this series. Major
> change is on interface definition - extensible to multiple types of 
> fault/error conditions instead of assuming AER only.
> 
> Thoughts?

We can't really convey any information through an eventfd, it's just a
signal, so I don't think we can use the same eventfd for both types of
errors.  Already here we're discussing the idea of using separate
eventfds for fatal vs non-fatal AERs.  IOMMU error processing seems
like yet another eventfd and likely some region or ioctl mechanism for
retrieving the error details since the IOMMU hardware is not directly
accessible.  Furthermore, such an event might logically be connected to
the vfio container rather than the device, so it might not even use the
same file descriptor.  Thanks,

Alex
Tian, Kevin Jan. 22, 2017, 4:43 a.m. UTC | #6
> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Saturday, January 21, 2017 2:21 AM
> 
> On Fri, 20 Jan 2017 06:57:22 +0000
> "Tian, Kevin" <kevin.tian@intel.com> wrote:
> 
> > > From: Alex Williamson
> > > Sent: Thursday, January 19, 2017 6:32 AM
> > >
> > > On Sat, 31 Dec 2016 17:13:07 +0800
> > > Cao jin <caoj.fnst@cn.fujitsu.com> wrote:
> > >
> > > > From: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
> > > >
> > > > When physical device has uncorrectable error hanppened, the vfio_pci
> > > > driver will signal the uncorrectable error status register value to
> > > > corresponding QEMU's vfio-pci device via the eventfd registered by this
> > > > device, then, the vfio-pci's error eventfd handler will be invoked in
> > > > event loop.
> > > >
> > > > Construct and pass the aer message to root port, root port will trigger an
> > > > interrupt to signal guest, then, the guest driver will do the recovery.
> > > >
> > > > Note: Now only support non-fatal error's recovery, fatal error will
> > > > still result in vm stop.
> > >
> > > Please update the entire commit log, don't just add a note that this
> > > now only covers non-fatal errors.
> > >
> >
> > One thing relate to vIOMMU. There is still a TODO task about forwarding
> > IOMMU fault thru VFIO to Qemu, so Qemu vIOMMU has the chance to
> > walk guest remapping structure to emulate virtual IOMMU fault. Likely
> > it also requires eventfd mechanism.
> >
> > Wondering whether making sense to reuse same eventfd for both AER
> > and vIOMMU or using separate eventfd is also fine? Even go with the
> > former option, I don't expect substantial change to this series. Major
> > change is on interface definition - extensible to multiple types of
> > fault/error conditions instead of assuming AER only.
> >
> > Thoughts?
> 
> We can't really convey any information through an eventfd, it's just a
> signal, so I don't think we can use the same eventfd for both types of
> errors.  Already here we're discussing the idea of using separate
> eventfds for fatal vs non-fatal AERs.  IOMMU error processing seems
> like yet another eventfd and likely some region or ioctl mechanism for
> retrieving the error details since the IOMMU hardware is not directly
> accessible.  Furthermore, such an event might logically be connected to
> the vfio container rather than the device, so it might not even use the
> same file descriptor.  Thanks,
> 

Clear enough. Thanks,
Kevin
diff mbox

Patch

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 76a8ac3..9861f72 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2470,21 +2470,55 @@  static void vfio_put_device(VFIOPCIDevice *vdev)
 static void vfio_err_notifier_handler(void *opaque)
 {
     VFIOPCIDevice *vdev = opaque;
+    PCIDevice *dev = &vdev->pdev;
+    PCIEAERMsg msg = {
+        .severity = 0,
+        .source_id = (pci_bus_num(dev->bus) << 8) | dev->devfn,
+    };
+    int len;
+    uint64_t uncor_status;
+
+    /* Read uncorrectable error status from driver */
+    len = read(vdev->err_notifier.rfd, &uncor_status, sizeof(uncor_status));
+    if (len != sizeof(uncor_status)) {
+        error_report("vfio-pci: uncor error status reading returns"
+                     " invalid number of bytes: %d", len);
+        return; //Or goto stop?
+    }
+
+    if (!(vdev->features & VFIO_FEATURE_ENABLE_AER)) {
+        goto stop;
+    }
+
+    /* Populate the aer msg and send it to root port */
+    if (dev->exp.aer_cap) {
+        uint8_t *aer_cap = dev->config + dev->exp.aer_cap;
+        bool isfatal = uncor_status &
+                       pci_get_long(aer_cap + PCI_ERR_UNCOR_SEVER);
+
+	if (isfatal) {
+	    goto stop;
+	}
+
+        msg.severity = isfatal ? PCI_ERR_ROOT_CMD_FATAL_EN :
+                                 PCI_ERR_ROOT_CMD_NONFATAL_EN;
 
-    if (!event_notifier_test_and_clear(&vdev->err_notifier)) {
+        error_report("vfio-pci device %d sending AER to root port. uncor"
+                     " status = 0x%"PRIx64, dev->devfn, uncor_status);
+        pcie_aer_msg(dev, &msg);
         return;
     }
 
+stop:
     /*
-     * TBD. Retrieve the error details and decide what action
-     * needs to be taken. One of the actions could be to pass
-     * the error to the guest and have the guest driver recover
-     * from the error. This requires that PCIe capabilities be
-     * exposed to the guest. For now, we just terminate the
-     * guest to contain the error.
+     * Terminate the guest in case of
+     * 1. AER capability is not exposed to guest.
+     * 2. AER capability is exposed, but error is fatal, only non-fatal
+     * error is handled now.
      */
 
-    error_report("%s(%s) Unrecoverable error detected. Please collect any data possible and then kill the guest", __func__, vdev->vbasedev.name);
+    error_report("%s(%s) fatal error detected. Please collect any data"
+            " possible and then kill the guest", __func__, vdev->vbasedev.name);
 
     vm_stop(RUN_STATE_INTERNAL_ERROR);
 }