diff mbox

[RFC,01/14] vfio: Start adding VFIO/EEH interface

Message ID 1442647117-2726-2-git-send-email-david@gibson.dropbear.id.au
State New
Headers show

Commit Message

David Gibson Sept. 19, 2015, 7:18 a.m. UTC
At present the code handling IBM's Enhanced Error Handling (EEH) interface
on VFIO devices operates by bypassing the usual VFIO logic with
vfio_container_ioctl().  That's a poorly designed interface with unclear
semantics about exactly what can be operated on.

As a first step to cleaning that up, start creating a new VFIO interface
for EEH operations.  Because EEH operates in units of a "Partitionable
Endpoint" (PE) - a group of devices that can't be mutually isolated), it
needs to expose host PEs (that is, IOMMU groups) to the guest.  This means
EEH needs VFIO concepts exposed that other VFIO users don't.

At present all EEH ioctl()s in use operate conceptually on a single PE and
take no parameters except the opcode itself.  So, expose a vfio_eeh_op()
function to apply a single no-argument operation to a VFIO group.

At present the kernel VFIO/EEH interface is broken, because it assumes
there is only one VFIO group per container, which is no longer always the
case.  So, add logic to detect this case and warn.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/vfio/Makefile.objs      |  1 +
 hw/vfio/eeh.c              | 64 ++++++++++++++++++++++++++++++++++++++++++++++
 include/hw/vfio/vfio-eeh.h | 42 ++++++++++++++++++++++++++++++
 3 files changed, 107 insertions(+)
 create mode 100644 hw/vfio/eeh.c
 create mode 100644 include/hw/vfio/vfio-eeh.h

Comments

Alex Williamson Sept. 23, 2015, 5:28 p.m. UTC | #1
On Sat, 2015-09-19 at 17:18 +1000, David Gibson wrote:
> At present the code handling IBM's Enhanced Error Handling (EEH) interface
> on VFIO devices operates by bypassing the usual VFIO logic with
> vfio_container_ioctl().  That's a poorly designed interface with unclear
> semantics about exactly what can be operated on.
> 
> As a first step to cleaning that up, start creating a new VFIO interface
> for EEH operations.  Because EEH operates in units of a "Partitionable
> Endpoint" (PE) - a group of devices that can't be mutually isolated), it
> needs to expose host PEs (that is, IOMMU groups) to the guest.  This means
> EEH needs VFIO concepts exposed that other VFIO users don't.
> 
> At present all EEH ioctl()s in use operate conceptually on a single PE and
> take no parameters except the opcode itself.  So, expose a vfio_eeh_op()
> function to apply a single no-argument operation to a VFIO group.
> 
> At present the kernel VFIO/EEH interface is broken, because it assumes
> there is only one VFIO group per container, which is no longer always the
> case.  So, add logic to detect this case and warn.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  hw/vfio/Makefile.objs      |  1 +
>  hw/vfio/eeh.c              | 64 ++++++++++++++++++++++++++++++++++++++++++++++
>  include/hw/vfio/vfio-eeh.h | 42 ++++++++++++++++++++++++++++++
>  3 files changed, 107 insertions(+)
>  create mode 100644 hw/vfio/eeh.c
>  create mode 100644 include/hw/vfio/vfio-eeh.h
> 
> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
> index d540c9d..384c702 100644
> --- a/hw/vfio/Makefile.objs
> +++ b/hw/vfio/Makefile.objs
> @@ -3,4 +3,5 @@ obj-$(CONFIG_SOFTMMU) += common.o
>  obj-$(CONFIG_PCI) += pci.o
>  obj-$(CONFIG_SOFTMMU) += platform.o
>  obj-$(CONFIG_SOFTMMU) += calxeda-xgmac.o
> +obj-$(CONFIG_PSERIES) += eeh.o
>  endif
> diff --git a/hw/vfio/eeh.c b/hw/vfio/eeh.c
> new file mode 100644
> index 0000000..35bd06c
> --- /dev/null
> +++ b/hw/vfio/eeh.c
> @@ -0,0 +1,64 @@
> +/*
> + * EEH (IBM Enhanced Error Handling) functions for VFIO devices
> + *
> + * Copyright Red Hat, Inc. 2015
> + *
> + * Authors:
> + *  David Gibson <dgibson@redhat.com>
> + *
> + * This program is free software: you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License as
> + * published by the Free Software Foundation, either version 2 of the
> + * License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + *
> + * Based on earlier EEH implementations:
> + * Copyright (c) 2011-2014 Alexey Kardashevskiy, IBM Corporation.
> + */
> +#include <sys/ioctl.h>
> +#include <linux/vfio.h>
> +
> +#include "qemu/error-report.h"
> +
> +#include "hw/vfio/vfio-common.h"
> +#include "hw/vfio/vfio-eeh.h"
> +
> +int vfio_eeh_op(VFIOGroup *group, uint32_t op)
> +{
> +    VFIOContainer *container = group->container;
> +    struct vfio_eeh_pe_op pe_op = {
> +        .argsz = sizeof(pe_op),
> +        .op = op,
> +    };
> +    int ret;
> +
> +    if (!container) {
> +        error_report("vfio/eeh: EEH_PE_OP 0x%x called on unattached group %d",
> +                     op, group->groupid);
> +        return -ENODEV;
> +    }
> +
> +    /* A broken kernel interface means EEH operations can't work
> +     * correctly if there are multiple groups in a container */
> +    if ((QLIST_FIRST(&container->group_list) != group)
> +        || QLIST_NEXT(group, container_next)) {
> +        error_report("vfio/eeh: EEH_PE_OP 0x%x called on container"
> +                     " with multiple groups", op);
> +        return -ENOSPC;

-EINVAL really seems more appropriate

> +    }
> +
> +    ret = ioctl(container->fd, VFIO_EEH_PE_OP, &pe_op);
> +    if (ret < 0) {
> +        error_report("vfio/eeh: EEH_PE_OP 0x%x failed on group %d: %m",
> +                     op, group->groupid);
> +    }
> +
> +    return ret;

Would -errno be more interesting in the failure case?

> +}
> diff --git a/include/hw/vfio/vfio-eeh.h b/include/hw/vfio/vfio-eeh.h
> new file mode 100644
> index 0000000..d7356f2
> --- /dev/null
> +++ b/include/hw/vfio/vfio-eeh.h
> @@ -0,0 +1,42 @@
> +/*
> + * EEH (IBM Enhanced Error Handling) functions for VFIO devices
> + *
> + * Copyright Red Hat, Inc. 2015
> + *
> + * Authors:
> + *  David Gibson <dgibson@redhat.com>
> + *
> + * This program is free software: you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License as
> + * published by the Free Software Foundation, either version 2 of the
> + * License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + *
> + * Based on earlier EEH implementations:
> + * Copyright (c) 2011-2014 Alexey Kardashevskiy, IBM Corporation.
> + */
> +#ifndef VFIO_EEH_H
> +#define VFIO_EEH_H
> +
> +#include <linux/vfio.h>
> +
> +/*
> + * EEH (Enhanced Error Handling) used in IBM Power guests, needs extra
> + * hooks in VFIO to correctly pass error handling operations between
> + * guest and host.
> + */
> +
> +/* EEH needs to be aware of VFIOGroups as a concept, though not the
> + * internals */
> +typedef struct VFIOGroup VFIOGroup;
> +
> +int vfio_eeh_op(VFIOGroup *group, uint32_t op);
> +
> +#endif /* VFIO_EEH_H */
David Gibson Sept. 24, 2015, 1:11 a.m. UTC | #2
On Wed, Sep 23, 2015 at 11:28:29AM -0600, Alex Williamson wrote:
> On Sat, 2015-09-19 at 17:18 +1000, David Gibson wrote:
> > At present the code handling IBM's Enhanced Error Handling (EEH) interface
> > on VFIO devices operates by bypassing the usual VFIO logic with
> > vfio_container_ioctl().  That's a poorly designed interface with unclear
> > semantics about exactly what can be operated on.
> > 
> > As a first step to cleaning that up, start creating a new VFIO interface
> > for EEH operations.  Because EEH operates in units of a "Partitionable
> > Endpoint" (PE) - a group of devices that can't be mutually isolated), it
> > needs to expose host PEs (that is, IOMMU groups) to the guest.  This means
> > EEH needs VFIO concepts exposed that other VFIO users don't.
> > 
> > At present all EEH ioctl()s in use operate conceptually on a single PE and
> > take no parameters except the opcode itself.  So, expose a vfio_eeh_op()
> > function to apply a single no-argument operation to a VFIO group.
> > 
> > At present the kernel VFIO/EEH interface is broken, because it assumes
> > there is only one VFIO group per container, which is no longer always the
> > case.  So, add logic to detect this case and warn.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  hw/vfio/Makefile.objs      |  1 +
> >  hw/vfio/eeh.c              | 64 ++++++++++++++++++++++++++++++++++++++++++++++
> >  include/hw/vfio/vfio-eeh.h | 42 ++++++++++++++++++++++++++++++
> >  3 files changed, 107 insertions(+)
> >  create mode 100644 hw/vfio/eeh.c
> >  create mode 100644 include/hw/vfio/vfio-eeh.h
> > 
> > diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
> > index d540c9d..384c702 100644
> > --- a/hw/vfio/Makefile.objs
> > +++ b/hw/vfio/Makefile.objs
> > @@ -3,4 +3,5 @@ obj-$(CONFIG_SOFTMMU) += common.o
> >  obj-$(CONFIG_PCI) += pci.o
> >  obj-$(CONFIG_SOFTMMU) += platform.o
> >  obj-$(CONFIG_SOFTMMU) += calxeda-xgmac.o
> > +obj-$(CONFIG_PSERIES) += eeh.o
> >  endif
> > diff --git a/hw/vfio/eeh.c b/hw/vfio/eeh.c
> > new file mode 100644
> > index 0000000..35bd06c
> > --- /dev/null
> > +++ b/hw/vfio/eeh.c
> > @@ -0,0 +1,64 @@
> > +/*
> > + * EEH (IBM Enhanced Error Handling) functions for VFIO devices
> > + *
> > + * Copyright Red Hat, Inc. 2015
> > + *
> > + * Authors:
> > + *  David Gibson <dgibson@redhat.com>
> > + *
> > + * This program is free software: you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License as
> > + * published by the Free Software Foundation, either version 2 of the
> > + * License, or (at your option) any later version.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> > + *
> > + * Based on earlier EEH implementations:
> > + * Copyright (c) 2011-2014 Alexey Kardashevskiy, IBM Corporation.
> > + */
> > +#include <sys/ioctl.h>
> > +#include <linux/vfio.h>
> > +
> > +#include "qemu/error-report.h"
> > +
> > +#include "hw/vfio/vfio-common.h"
> > +#include "hw/vfio/vfio-eeh.h"
> > +
> > +int vfio_eeh_op(VFIOGroup *group, uint32_t op)
> > +{
> > +    VFIOContainer *container = group->container;
> > +    struct vfio_eeh_pe_op pe_op = {
> > +        .argsz = sizeof(pe_op),
> > +        .op = op,
> > +    };
> > +    int ret;
> > +
> > +    if (!container) {
> > +        error_report("vfio/eeh: EEH_PE_OP 0x%x called on unattached group %d",
> > +                     op, group->groupid);
> > +        return -ENODEV;
> > +    }
> > +
> > +    /* A broken kernel interface means EEH operations can't work
> > +     * correctly if there are multiple groups in a container */
> > +    if ((QLIST_FIRST(&container->group_list) != group)
> > +        || QLIST_NEXT(group, container_next)) {
> > +        error_report("vfio/eeh: EEH_PE_OP 0x%x called on container"
> > +                     " with multiple groups", op);
> > +        return -ENOSPC;
> 
> -EINVAL really seems more appropriate

So, I agree that ENOSPC is a dubious choice, but EINVAL is definitely
wrong here.

Broad as it is, EINVAL should always indicate that the caller has
supplied some sort of bad parameter.  In this case the parameters are
just fine, it's just that the kernel is broken so we can't handle that
case.

Perhaps EBUSY?  Since there isn't an ESHOULDWORKBUTDOESNT or EDEVBROKEN.

> 
> > +    }
> > +
> > +    ret = ioctl(container->fd, VFIO_EEH_PE_OP, &pe_op);
> > +    if (ret < 0) {
> > +        error_report("vfio/eeh: EEH_PE_OP 0x%x failed on group %d: %m",
> > +                     op, group->groupid);
> > +    }
> > +
> > +    return ret;
> 
> Would -errno be more interesting in the failure case?

Oh, yes.  Too much kernel work, I'm used to things returning the error
code.
Alex Williamson Sept. 24, 2015, 2:12 a.m. UTC | #3
On Thu, 2015-09-24 at 11:11 +1000, David Gibson wrote:
> On Wed, Sep 23, 2015 at 11:28:29AM -0600, Alex Williamson wrote:
> > On Sat, 2015-09-19 at 17:18 +1000, David Gibson wrote:
> > > At present the code handling IBM's Enhanced Error Handling (EEH) interface
> > > on VFIO devices operates by bypassing the usual VFIO logic with
> > > vfio_container_ioctl().  That's a poorly designed interface with unclear
> > > semantics about exactly what can be operated on.
> > > 
> > > As a first step to cleaning that up, start creating a new VFIO interface
> > > for EEH operations.  Because EEH operates in units of a "Partitionable
> > > Endpoint" (PE) - a group of devices that can't be mutually isolated), it
> > > needs to expose host PEs (that is, IOMMU groups) to the guest.  This means
> > > EEH needs VFIO concepts exposed that other VFIO users don't.
> > > 
> > > At present all EEH ioctl()s in use operate conceptually on a single PE and
> > > take no parameters except the opcode itself.  So, expose a vfio_eeh_op()
> > > function to apply a single no-argument operation to a VFIO group.
> > > 
> > > At present the kernel VFIO/EEH interface is broken, because it assumes
> > > there is only one VFIO group per container, which is no longer always the
> > > case.  So, add logic to detect this case and warn.
> > > 
> > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > > ---
> > >  hw/vfio/Makefile.objs      |  1 +
> > >  hw/vfio/eeh.c              | 64 ++++++++++++++++++++++++++++++++++++++++++++++
> > >  include/hw/vfio/vfio-eeh.h | 42 ++++++++++++++++++++++++++++++
> > >  3 files changed, 107 insertions(+)
> > >  create mode 100644 hw/vfio/eeh.c
> > >  create mode 100644 include/hw/vfio/vfio-eeh.h
> > > 
> > > diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
> > > index d540c9d..384c702 100644
> > > --- a/hw/vfio/Makefile.objs
> > > +++ b/hw/vfio/Makefile.objs
> > > @@ -3,4 +3,5 @@ obj-$(CONFIG_SOFTMMU) += common.o
> > >  obj-$(CONFIG_PCI) += pci.o
> > >  obj-$(CONFIG_SOFTMMU) += platform.o
> > >  obj-$(CONFIG_SOFTMMU) += calxeda-xgmac.o
> > > +obj-$(CONFIG_PSERIES) += eeh.o
> > >  endif
> > > diff --git a/hw/vfio/eeh.c b/hw/vfio/eeh.c
> > > new file mode 100644
> > > index 0000000..35bd06c
> > > --- /dev/null
> > > +++ b/hw/vfio/eeh.c
> > > @@ -0,0 +1,64 @@
> > > +/*
> > > + * EEH (IBM Enhanced Error Handling) functions for VFIO devices
> > > + *
> > > + * Copyright Red Hat, Inc. 2015
> > > + *
> > > + * Authors:
> > > + *  David Gibson <dgibson@redhat.com>
> > > + *
> > > + * This program is free software: you can redistribute it and/or
> > > + * modify it under the terms of the GNU General Public License as
> > > + * published by the Free Software Foundation, either version 2 of the
> > > + * License, or (at your option) any later version.
> > > + *
> > > + * This program is distributed in the hope that it will be useful,
> > > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > > + * GNU General Public License for more details.
> > > + *
> > > + * You should have received a copy of the GNU General Public License
> > > + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> > > + *
> > > + * Based on earlier EEH implementations:
> > > + * Copyright (c) 2011-2014 Alexey Kardashevskiy, IBM Corporation.
> > > + */
> > > +#include <sys/ioctl.h>
> > > +#include <linux/vfio.h>
> > > +
> > > +#include "qemu/error-report.h"
> > > +
> > > +#include "hw/vfio/vfio-common.h"
> > > +#include "hw/vfio/vfio-eeh.h"
> > > +
> > > +int vfio_eeh_op(VFIOGroup *group, uint32_t op)
> > > +{
> > > +    VFIOContainer *container = group->container;
> > > +    struct vfio_eeh_pe_op pe_op = {
> > > +        .argsz = sizeof(pe_op),
> > > +        .op = op,
> > > +    };
> > > +    int ret;
> > > +
> > > +    if (!container) {
> > > +        error_report("vfio/eeh: EEH_PE_OP 0x%x called on unattached group %d",
> > > +                     op, group->groupid);
> > > +        return -ENODEV;
> > > +    }
> > > +
> > > +    /* A broken kernel interface means EEH operations can't work
> > > +     * correctly if there are multiple groups in a container */
> > > +    if ((QLIST_FIRST(&container->group_list) != group)
> > > +        || QLIST_NEXT(group, container_next)) {
> > > +        error_report("vfio/eeh: EEH_PE_OP 0x%x called on container"
> > > +                     " with multiple groups", op);
> > > +        return -ENOSPC;
> > 
> > -EINVAL really seems more appropriate
> 
> So, I agree that ENOSPC is a dubious choice, but EINVAL is definitely
> wrong here.
> 
> Broad as it is, EINVAL should always indicate that the caller has
> supplied some sort of bad parameter.  In this case the parameters are
> just fine, it's just that the kernel is broken so we can't handle that
> case.
> 
> Perhaps EBUSY?  Since there isn't an ESHOULDWORKBUTDOESNT or EDEVBROKEN.

The caller has supplied a bad parameter, a group that doesn't match the
existing group in the container.  If you really object, EPERM?  EACCES?

> > > +    }
> > > +
> > > +    ret = ioctl(container->fd, VFIO_EEH_PE_OP, &pe_op);
> > > +    if (ret < 0) {
> > > +        error_report("vfio/eeh: EEH_PE_OP 0x%x failed on group %d: %m",
> > > +                     op, group->groupid);
> > > +    }
> > > +
> > > +    return ret;
> > 
> > Would -errno be more interesting in the failure case?
> 
> Oh, yes.  Too much kernel work, I'm used to things returning the error
> code.
>
David Gibson Sept. 24, 2015, 4:09 a.m. UTC | #4
On Wed, Sep 23, 2015 at 08:12:30PM -0600, Alex Williamson wrote:
> On Thu, 2015-09-24 at 11:11 +1000, David Gibson wrote:
> > On Wed, Sep 23, 2015 at 11:28:29AM -0600, Alex Williamson wrote:
> > > On Sat, 2015-09-19 at 17:18 +1000, David Gibson wrote:
> > > > At present the code handling IBM's Enhanced Error Handling (EEH) interface
> > > > on VFIO devices operates by bypassing the usual VFIO logic with
> > > > vfio_container_ioctl().  That's a poorly designed interface with unclear
> > > > semantics about exactly what can be operated on.
> > > > 
> > > > As a first step to cleaning that up, start creating a new VFIO interface
> > > > for EEH operations.  Because EEH operates in units of a "Partitionable
> > > > Endpoint" (PE) - a group of devices that can't be mutually isolated), it
> > > > needs to expose host PEs (that is, IOMMU groups) to the guest.  This means
> > > > EEH needs VFIO concepts exposed that other VFIO users don't.
> > > > 
> > > > At present all EEH ioctl()s in use operate conceptually on a single PE and
> > > > take no parameters except the opcode itself.  So, expose a vfio_eeh_op()
> > > > function to apply a single no-argument operation to a VFIO group.
> > > > 
> > > > At present the kernel VFIO/EEH interface is broken, because it assumes
> > > > there is only one VFIO group per container, which is no longer always the
> > > > case.  So, add logic to detect this case and warn.
> > > > 
> > > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > > > ---
> > > >  hw/vfio/Makefile.objs      |  1 +
> > > >  hw/vfio/eeh.c              | 64 ++++++++++++++++++++++++++++++++++++++++++++++
> > > >  include/hw/vfio/vfio-eeh.h | 42 ++++++++++++++++++++++++++++++
> > > >  3 files changed, 107 insertions(+)
> > > >  create mode 100644 hw/vfio/eeh.c
> > > >  create mode 100644 include/hw/vfio/vfio-eeh.h
> > > > 
> > > > diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
> > > > index d540c9d..384c702 100644
> > > > --- a/hw/vfio/Makefile.objs
> > > > +++ b/hw/vfio/Makefile.objs
> > > > @@ -3,4 +3,5 @@ obj-$(CONFIG_SOFTMMU) += common.o
> > > >  obj-$(CONFIG_PCI) += pci.o
> > > >  obj-$(CONFIG_SOFTMMU) += platform.o
> > > >  obj-$(CONFIG_SOFTMMU) += calxeda-xgmac.o
> > > > +obj-$(CONFIG_PSERIES) += eeh.o
> > > >  endif
> > > > diff --git a/hw/vfio/eeh.c b/hw/vfio/eeh.c
> > > > new file mode 100644
> > > > index 0000000..35bd06c
> > > > --- /dev/null
> > > > +++ b/hw/vfio/eeh.c
> > > > @@ -0,0 +1,64 @@
> > > > +/*
> > > > + * EEH (IBM Enhanced Error Handling) functions for VFIO devices
> > > > + *
> > > > + * Copyright Red Hat, Inc. 2015
> > > > + *
> > > > + * Authors:
> > > > + *  David Gibson <dgibson@redhat.com>
> > > > + *
> > > > + * This program is free software: you can redistribute it and/or
> > > > + * modify it under the terms of the GNU General Public License as
> > > > + * published by the Free Software Foundation, either version 2 of the
> > > > + * License, or (at your option) any later version.
> > > > + *
> > > > + * This program is distributed in the hope that it will be useful,
> > > > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > > > + * GNU General Public License for more details.
> > > > + *
> > > > + * You should have received a copy of the GNU General Public License
> > > > + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> > > > + *
> > > > + * Based on earlier EEH implementations:
> > > > + * Copyright (c) 2011-2014 Alexey Kardashevskiy, IBM Corporation.
> > > > + */
> > > > +#include <sys/ioctl.h>
> > > > +#include <linux/vfio.h>
> > > > +
> > > > +#include "qemu/error-report.h"
> > > > +
> > > > +#include "hw/vfio/vfio-common.h"
> > > > +#include "hw/vfio/vfio-eeh.h"
> > > > +
> > > > +int vfio_eeh_op(VFIOGroup *group, uint32_t op)
> > > > +{
> > > > +    VFIOContainer *container = group->container;
> > > > +    struct vfio_eeh_pe_op pe_op = {
> > > > +        .argsz = sizeof(pe_op),
> > > > +        .op = op,
> > > > +    };
> > > > +    int ret;
> > > > +
> > > > +    if (!container) {
> > > > +        error_report("vfio/eeh: EEH_PE_OP 0x%x called on unattached group %d",
> > > > +                     op, group->groupid);
> > > > +        return -ENODEV;
> > > > +    }
> > > > +
> > > > +    /* A broken kernel interface means EEH operations can't work
> > > > +     * correctly if there are multiple groups in a container */
> > > > +    if ((QLIST_FIRST(&container->group_list) != group)
> > > > +        || QLIST_NEXT(group, container_next)) {
> > > > +        error_report("vfio/eeh: EEH_PE_OP 0x%x called on container"
> > > > +                     " with multiple groups", op);
> > > > +        return -ENOSPC;
> > > 
> > > -EINVAL really seems more appropriate
> > 
> > So, I agree that ENOSPC is a dubious choice, but EINVAL is definitely
> > wrong here.
> > 
> > Broad as it is, EINVAL should always indicate that the caller has
> > supplied some sort of bad parameter.  In this case the parameters are
> > just fine, it's just that the kernel is broken so we can't handle that
> > case.
> > 
> > Perhaps EBUSY?  Since there isn't an ESHOULDWORKBUTDOESNT or EDEVBROKEN.
> 
> The caller has supplied a bad parameter, a group that doesn't match the
> existing group in the container.  If you really object, EPERM?  EACCES?

Well, I suppose, but the parameter is only bad because the
implementation is flawed, not because the user has requested something
actually impossible.  EACCES is definitely wrong, since that should
refer to a problem with (settable) permissions.  EPERM.. I guess I can
live with, I'll go with that.
Thomas Huth Sept. 24, 2015, 5:45 a.m. UTC | #5
On 24/09/15 06:09, David Gibson wrote:
> On Wed, Sep 23, 2015 at 08:12:30PM -0600, Alex Williamson wrote:
>> On Thu, 2015-09-24 at 11:11 +1000, David Gibson wrote:
>>> On Wed, Sep 23, 2015 at 11:28:29AM -0600, Alex Williamson wrote:
>>>> On Sat, 2015-09-19 at 17:18 +1000, David Gibson wrote:
>>>>> At present the code handling IBM's Enhanced Error Handling (EEH) interface
>>>>> on VFIO devices operates by bypassing the usual VFIO logic with
>>>>> vfio_container_ioctl().  That's a poorly designed interface with unclear
>>>>> semantics about exactly what can be operated on.
>>>>>
>>>>> As a first step to cleaning that up, start creating a new VFIO interface
>>>>> for EEH operations.  Because EEH operates in units of a "Partitionable
>>>>> Endpoint" (PE) - a group of devices that can't be mutually isolated), it
>>>>> needs to expose host PEs (that is, IOMMU groups) to the guest.  This means
>>>>> EEH needs VFIO concepts exposed that other VFIO users don't.
>>>>>
>>>>> At present all EEH ioctl()s in use operate conceptually on a single PE and
>>>>> take no parameters except the opcode itself.  So, expose a vfio_eeh_op()
>>>>> function to apply a single no-argument operation to a VFIO group.
>>>>>
>>>>> At present the kernel VFIO/EEH interface is broken, because it assumes
>>>>> there is only one VFIO group per container, which is no longer always the
>>>>> case.  So, add logic to detect this case and warn.
>>>>>
>>>>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
>>>>> ---
>>>>>  hw/vfio/Makefile.objs      |  1 +
>>>>>  hw/vfio/eeh.c              | 64 ++++++++++++++++++++++++++++++++++++++++++++++
>>>>>  include/hw/vfio/vfio-eeh.h | 42 ++++++++++++++++++++++++++++++
>>>>>  3 files changed, 107 insertions(+)
>>>>>  create mode 100644 hw/vfio/eeh.c
>>>>>  create mode 100644 include/hw/vfio/vfio-eeh.h
>>>>>
>>>>> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
>>>>> index d540c9d..384c702 100644
>>>>> --- a/hw/vfio/Makefile.objs
>>>>> +++ b/hw/vfio/Makefile.objs
>>>>> @@ -3,4 +3,5 @@ obj-$(CONFIG_SOFTMMU) += common.o
>>>>>  obj-$(CONFIG_PCI) += pci.o
>>>>>  obj-$(CONFIG_SOFTMMU) += platform.o
>>>>>  obj-$(CONFIG_SOFTMMU) += calxeda-xgmac.o
>>>>> +obj-$(CONFIG_PSERIES) += eeh.o
>>>>>  endif
>>>>> diff --git a/hw/vfio/eeh.c b/hw/vfio/eeh.c
>>>>> new file mode 100644
>>>>> index 0000000..35bd06c
>>>>> --- /dev/null
>>>>> +++ b/hw/vfio/eeh.c
>>>>> @@ -0,0 +1,64 @@
>>>>> +/*
>>>>> + * EEH (IBM Enhanced Error Handling) functions for VFIO devices
>>>>> + *
>>>>> + * Copyright Red Hat, Inc. 2015
>>>>> + *
>>>>> + * Authors:
>>>>> + *  David Gibson <dgibson@redhat.com>
>>>>> + *
>>>>> + * This program is free software: you can redistribute it and/or
>>>>> + * modify it under the terms of the GNU General Public License as
>>>>> + * published by the Free Software Foundation, either version 2 of the
>>>>> + * License, or (at your option) any later version.
>>>>> + *
>>>>> + * This program is distributed in the hope that it will be useful,
>>>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>>>> + * GNU General Public License for more details.
>>>>> + *
>>>>> + * You should have received a copy of the GNU General Public License
>>>>> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>>>>> + *
>>>>> + * Based on earlier EEH implementations:
>>>>> + * Copyright (c) 2011-2014 Alexey Kardashevskiy, IBM Corporation.
>>>>> + */
>>>>> +#include <sys/ioctl.h>
>>>>> +#include <linux/vfio.h>
>>>>> +
>>>>> +#include "qemu/error-report.h"
>>>>> +
>>>>> +#include "hw/vfio/vfio-common.h"
>>>>> +#include "hw/vfio/vfio-eeh.h"
>>>>> +
>>>>> +int vfio_eeh_op(VFIOGroup *group, uint32_t op)
>>>>> +{
>>>>> +    VFIOContainer *container = group->container;
>>>>> +    struct vfio_eeh_pe_op pe_op = {
>>>>> +        .argsz = sizeof(pe_op),
>>>>> +        .op = op,
>>>>> +    };
>>>>> +    int ret;
>>>>> +
>>>>> +    if (!container) {
>>>>> +        error_report("vfio/eeh: EEH_PE_OP 0x%x called on unattached group %d",
>>>>> +                     op, group->groupid);
>>>>> +        return -ENODEV;
>>>>> +    }
>>>>> +
>>>>> +    /* A broken kernel interface means EEH operations can't work
>>>>> +     * correctly if there are multiple groups in a container */
>>>>> +    if ((QLIST_FIRST(&container->group_list) != group)
>>>>> +        || QLIST_NEXT(group, container_next)) {
>>>>> +        error_report("vfio/eeh: EEH_PE_OP 0x%x called on container"
>>>>> +                     " with multiple groups", op);
>>>>> +        return -ENOSPC;
>>>>
>>>> -EINVAL really seems more appropriate
>>>
>>> So, I agree that ENOSPC is a dubious choice, but EINVAL is definitely
>>> wrong here.
>>>
>>> Broad as it is, EINVAL should always indicate that the caller has
>>> supplied some sort of bad parameter.  In this case the parameters are
>>> just fine, it's just that the kernel is broken so we can't handle that
>>> case.
>>>
>>> Perhaps EBUSY?  Since there isn't an ESHOULDWORKBUTDOESNT or EDEVBROKEN.
>>
>> The caller has supplied a bad parameter, a group that doesn't match the
>> existing group in the container.  If you really object, EPERM?  EACCES?
> 
> Well, I suppose, but the parameter is only bad because the
> implementation is flawed, not because the user has requested something
> actually impossible.  EACCES is definitely wrong, since that should
> refer to a problem with (settable) permissions.  EPERM.. I guess I can
> live with, I'll go with that.

What about ENOTSUP ?
According to http://www.gnu.org/software/libc/manual/html_node/Error-Codes.html :
"Not supported. A function returns this error when certain parameter values
 are valid, but the functionality they request is not available."

Thomas
diff mbox

Patch

diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
index d540c9d..384c702 100644
--- a/hw/vfio/Makefile.objs
+++ b/hw/vfio/Makefile.objs
@@ -3,4 +3,5 @@  obj-$(CONFIG_SOFTMMU) += common.o
 obj-$(CONFIG_PCI) += pci.o
 obj-$(CONFIG_SOFTMMU) += platform.o
 obj-$(CONFIG_SOFTMMU) += calxeda-xgmac.o
+obj-$(CONFIG_PSERIES) += eeh.o
 endif
diff --git a/hw/vfio/eeh.c b/hw/vfio/eeh.c
new file mode 100644
index 0000000..35bd06c
--- /dev/null
+++ b/hw/vfio/eeh.c
@@ -0,0 +1,64 @@ 
+/*
+ * EEH (IBM Enhanced Error Handling) functions for VFIO devices
+ *
+ * Copyright Red Hat, Inc. 2015
+ *
+ * Authors:
+ *  David Gibson <dgibson@redhat.com>
+ *
+ * This program is free software: you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation, either version 2 of the
+ * License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Based on earlier EEH implementations:
+ * Copyright (c) 2011-2014 Alexey Kardashevskiy, IBM Corporation.
+ */
+#include <sys/ioctl.h>
+#include <linux/vfio.h>
+
+#include "qemu/error-report.h"
+
+#include "hw/vfio/vfio-common.h"
+#include "hw/vfio/vfio-eeh.h"
+
+int vfio_eeh_op(VFIOGroup *group, uint32_t op)
+{
+    VFIOContainer *container = group->container;
+    struct vfio_eeh_pe_op pe_op = {
+        .argsz = sizeof(pe_op),
+        .op = op,
+    };
+    int ret;
+
+    if (!container) {
+        error_report("vfio/eeh: EEH_PE_OP 0x%x called on unattached group %d",
+                     op, group->groupid);
+        return -ENODEV;
+    }
+
+    /* A broken kernel interface means EEH operations can't work
+     * correctly if there are multiple groups in a container */
+    if ((QLIST_FIRST(&container->group_list) != group)
+        || QLIST_NEXT(group, container_next)) {
+        error_report("vfio/eeh: EEH_PE_OP 0x%x called on container"
+                     " with multiple groups", op);
+        return -ENOSPC;
+    }
+
+    ret = ioctl(container->fd, VFIO_EEH_PE_OP, &pe_op);
+    if (ret < 0) {
+        error_report("vfio/eeh: EEH_PE_OP 0x%x failed on group %d: %m",
+                     op, group->groupid);
+    }
+
+    return ret;
+}
diff --git a/include/hw/vfio/vfio-eeh.h b/include/hw/vfio/vfio-eeh.h
new file mode 100644
index 0000000..d7356f2
--- /dev/null
+++ b/include/hw/vfio/vfio-eeh.h
@@ -0,0 +1,42 @@ 
+/*
+ * EEH (IBM Enhanced Error Handling) functions for VFIO devices
+ *
+ * Copyright Red Hat, Inc. 2015
+ *
+ * Authors:
+ *  David Gibson <dgibson@redhat.com>
+ *
+ * This program is free software: you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation, either version 2 of the
+ * License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Based on earlier EEH implementations:
+ * Copyright (c) 2011-2014 Alexey Kardashevskiy, IBM Corporation.
+ */
+#ifndef VFIO_EEH_H
+#define VFIO_EEH_H
+
+#include <linux/vfio.h>
+
+/*
+ * EEH (Enhanced Error Handling) used in IBM Power guests, needs extra
+ * hooks in VFIO to correctly pass error handling operations between
+ * guest and host.
+ */
+
+/* EEH needs to be aware of VFIOGroups as a concept, though not the
+ * internals */
+typedef struct VFIOGroup VFIOGroup;
+
+int vfio_eeh_op(VFIOGroup *group, uint32_t op);
+
+#endif /* VFIO_EEH_H */