Patchwork vhost: force vhost off for non-MSI guests

login
register
mail settings
Submitter Michael S. Tsirkin
Date Jan. 20, 2011, 3:35 p.m.
Message ID <20110120153521.GA24357@redhat.com>
Download mbox | patch
Permalink /patch/79707/
State New
Headers show

Comments

Michael S. Tsirkin - Jan. 20, 2011, 3:35 p.m.
When MSI is off, each interrupt needs to be bounced through the io
thread when it's set/cleared, so vhost-net causes more context switches and
higher CPU utilization than userspace virtio which handles networking in
the same thread.

We'll need to fix this by adding level irq support in kvm irqfd,
for now disable vhost-net in these configurations.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---

I need to report some error from virtio-pci
that would be handled specially (disable but don't
report an error) so I wanted one that's never likely to be used by a
userspace ioctl. I selected ERANGE but it'd
be easy to switch to something else. Comments?

 hw/vhost.c      |    4 +++-
 hw/virtio-net.c |    6 ++++--
 hw/virtio-pci.c |    3 +++
 hw/virtio.h     |    2 ++
 4 files changed, 12 insertions(+), 3 deletions(-)
Anthony Liguori - Jan. 20, 2011, 3:43 p.m.
On 01/20/2011 09:35 AM, Michael S. Tsirkin wrote:
> When MSI is off, each interrupt needs to be bounced through the io
> thread when it's set/cleared, so vhost-net causes more context switches and
> higher CPU utilization than userspace virtio which handles networking in
> the same thread.
>
> We'll need to fix this by adding level irq support in kvm irqfd,
> for now disable vhost-net in these configurations.
>
> Signed-off-by: Michael S. Tsirkin<mst@redhat.com>
>    

I actually think this should be a terminal error.  The user asks for 
vhost-net, if we cannot enable it, we should exit.

Or we should warn the user that they should expect bad performance.  
Silently doing something that the user has explicitly asked us not to do 
is not a good behavior.

Regards,

Anthony Liguori

> ---
>
> I need to report some error from virtio-pci
> that would be handled specially (disable but don't
> report an error) so I wanted one that's never likely to be used by a
> userspace ioctl. I selected ERANGE but it'd
> be easy to switch to something else. Comments?
>
>   hw/vhost.c      |    4 +++-
>   hw/virtio-net.c |    6 ++++--
>   hw/virtio-pci.c |    3 +++
>   hw/virtio.h     |    2 ++
>   4 files changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/hw/vhost.c b/hw/vhost.c
> index 1d09ed0..c79765a 100644
> --- a/hw/vhost.c
> +++ b/hw/vhost.c
> @@ -649,7 +649,9 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
>
>       r = vdev->binding->set_guest_notifiers(vdev->binding_opaque, true);
>       if (r<  0) {
> -        fprintf(stderr, "Error binding guest notifier: %d\n", -r);
> +	if (r != -EVIRTIO_DISABLED) {
> +		fprintf(stderr, "Error binding guest notifier: %d\n", -r);
> +	}
>           goto fail_notifiers;
>       }
>
> diff --git a/hw/virtio-net.c b/hw/virtio-net.c
> index ccb3e63..5de3fee 100644
> --- a/hw/virtio-net.c
> +++ b/hw/virtio-net.c
> @@ -121,8 +121,10 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
>       if (!n->vhost_started) {
>           int r = vhost_net_start(tap_get_vhost_net(n->nic->nc.peer),&n->vdev);
>           if (r<  0) {
> -            error_report("unable to start vhost net: %d: "
> -                         "falling back on userspace virtio", -r);
> +            if (r != -EVIRTIO_DISABLED) {
> +                error_report("unable to start vhost net: %d: "
> +                             "falling back on userspace virtio", -r);
> +            }
>           } else {
>               n->vhost_started = 1;
>           }
> diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
> index dd8887a..dbf4be0 100644
> --- a/hw/virtio-pci.c
> +++ b/hw/virtio-pci.c
> @@ -628,6 +628,9 @@ static int virtio_pci_set_guest_notifier(void *opaque, int n, bool assign)
>       EventNotifier *notifier = virtio_queue_get_guest_notifier(vq);
>
>       if (assign) {
> +        if (!msix_enabled(&proxy->pci_dev)) {
> +            return -EVIRTIO_DISABLED;
> +        }
>           int r = event_notifier_init(notifier, 0);
>           if (r<  0) {
>               return r;
> diff --git a/hw/virtio.h b/hw/virtio.h
> index d8546d5..53bbdba 100644
> --- a/hw/virtio.h
> +++ b/hw/virtio.h
> @@ -98,6 +98,8 @@ typedef struct {
>       void (*vmstate_change)(void * opaque, bool running);
>   } VirtIOBindings;
>
> +#define EVIRTIO_DISABLED ERANGE
> +
>   #define VIRTIO_PCI_QUEUE_MAX 64
>
>   #define VIRTIO_NO_VECTOR 0xffff
>
Sridhar Samudrala - Jan. 20, 2011, 4:31 p.m.
On Thu, 2011-01-20 at 17:35 +0200, Michael S. Tsirkin wrote:
> When MSI is off, each interrupt needs to be bounced through the io
> thread when it's set/cleared, so vhost-net causes more context switches and
> higher CPU utilization than userspace virtio which handles networking in
> the same thread.
> 
> We'll need to fix this by adding level irq support in kvm irqfd,
> for now disable vhost-net in these configurations.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> 
> I need to report some error from virtio-pci
> that would be handled specially (disable but don't
> report an error) so I wanted one that's never likely to be used by a
> userspace ioctl. I selected ERANGE but it'd
> be easy to switch to something else. Comments?

Should this error be EVHOST_DISABLED rather than EVIRTIO_DISABLED?

-Sridhar
> 
>  hw/vhost.c      |    4 +++-
>  hw/virtio-net.c |    6 ++++--
>  hw/virtio-pci.c |    3 +++
>  hw/virtio.h     |    2 ++
>  4 files changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/vhost.c b/hw/vhost.c
> index 1d09ed0..c79765a 100644
> --- a/hw/vhost.c
> +++ b/hw/vhost.c
> @@ -649,7 +649,9 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
>  
>      r = vdev->binding->set_guest_notifiers(vdev->binding_opaque, true);
>      if (r < 0) {
> -        fprintf(stderr, "Error binding guest notifier: %d\n", -r);
> +	if (r != -EVIRTIO_DISABLED) {
> +		fprintf(stderr, "Error binding guest notifier: %d\n", -r);
> +	}
>          goto fail_notifiers;
>      }
>  
> diff --git a/hw/virtio-net.c b/hw/virtio-net.c
> index ccb3e63..5de3fee 100644
> --- a/hw/virtio-net.c
> +++ b/hw/virtio-net.c
> @@ -121,8 +121,10 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
>      if (!n->vhost_started) {
>          int r = vhost_net_start(tap_get_vhost_net(n->nic->nc.peer), &n->vdev);
>          if (r < 0) {
> -            error_report("unable to start vhost net: %d: "
> -                         "falling back on userspace virtio", -r);
> +            if (r != -EVIRTIO_DISABLED) {
> +                error_report("unable to start vhost net: %d: "
> +                             "falling back on userspace virtio", -r);
> +            }
>          } else {
>              n->vhost_started = 1;
>          }
> diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
> index dd8887a..dbf4be0 100644
> --- a/hw/virtio-pci.c
> +++ b/hw/virtio-pci.c
> @@ -628,6 +628,9 @@ static int virtio_pci_set_guest_notifier(void *opaque, int n, bool assign)
>      EventNotifier *notifier = virtio_queue_get_guest_notifier(vq);
>  
>      if (assign) {
> +        if (!msix_enabled(&proxy->pci_dev)) {
> +            return -EVIRTIO_DISABLED;
> +        }
>          int r = event_notifier_init(notifier, 0);
>          if (r < 0) {
>              return r;
> diff --git a/hw/virtio.h b/hw/virtio.h
> index d8546d5..53bbdba 100644
> --- a/hw/virtio.h
> +++ b/hw/virtio.h
> @@ -98,6 +98,8 @@ typedef struct {
>      void (*vmstate_change)(void * opaque, bool running);
>  } VirtIOBindings;
>  
> +#define EVIRTIO_DISABLED ERANGE
> +
>  #define VIRTIO_PCI_QUEUE_MAX 64
>  
>  #define VIRTIO_NO_VECTOR 0xffff
Michael S. Tsirkin - Jan. 20, 2011, 5:47 p.m.
On Thu, Jan 20, 2011 at 08:31:53AM -0800, Sridhar Samudrala wrote:
> On Thu, 2011-01-20 at 17:35 +0200, Michael S. Tsirkin wrote:
> > When MSI is off, each interrupt needs to be bounced through the io
> > thread when it's set/cleared, so vhost-net causes more context switches and
> > higher CPU utilization than userspace virtio which handles networking in
> > the same thread.
> > 
> > We'll need to fix this by adding level irq support in kvm irqfd,
> > for now disable vhost-net in these configurations.
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> > 
> > I need to report some error from virtio-pci
> > that would be handled specially (disable but don't
> > report an error) so I wanted one that's never likely to be used by a
> > userspace ioctl. I selected ERANGE but it'd
> > be easy to switch to something else. Comments?
> 
> Should this error be EVHOST_DISABLED rather than EVIRTIO_DISABLED?
> 
> -Sridhar

The error is reported by virtio-pci which does not know about vhost.
I started with EVIRTIO_MSIX_DISABLED and made is shorter.
Would EVIRTIO_MSIX_DISABLED be better?

> > 
> >  hw/vhost.c      |    4 +++-
> >  hw/virtio-net.c |    6 ++++--
> >  hw/virtio-pci.c |    3 +++
> >  hw/virtio.h     |    2 ++
> >  4 files changed, 12 insertions(+), 3 deletions(-)
> > 
> > diff --git a/hw/vhost.c b/hw/vhost.c
> > index 1d09ed0..c79765a 100644
> > --- a/hw/vhost.c
> > +++ b/hw/vhost.c
> > @@ -649,7 +649,9 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
> >  
> >      r = vdev->binding->set_guest_notifiers(vdev->binding_opaque, true);
> >      if (r < 0) {
> > -        fprintf(stderr, "Error binding guest notifier: %d\n", -r);
> > +	if (r != -EVIRTIO_DISABLED) {
> > +		fprintf(stderr, "Error binding guest notifier: %d\n", -r);
> > +	}
> >          goto fail_notifiers;
> >      }
> >  
> > diff --git a/hw/virtio-net.c b/hw/virtio-net.c
> > index ccb3e63..5de3fee 100644
> > --- a/hw/virtio-net.c
> > +++ b/hw/virtio-net.c
> > @@ -121,8 +121,10 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
> >      if (!n->vhost_started) {
> >          int r = vhost_net_start(tap_get_vhost_net(n->nic->nc.peer), &n->vdev);
> >          if (r < 0) {
> > -            error_report("unable to start vhost net: %d: "
> > -                         "falling back on userspace virtio", -r);
> > +            if (r != -EVIRTIO_DISABLED) {
> > +                error_report("unable to start vhost net: %d: "
> > +                             "falling back on userspace virtio", -r);
> > +            }
> >          } else {
> >              n->vhost_started = 1;
> >          }
> > diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
> > index dd8887a..dbf4be0 100644
> > --- a/hw/virtio-pci.c
> > +++ b/hw/virtio-pci.c
> > @@ -628,6 +628,9 @@ static int virtio_pci_set_guest_notifier(void *opaque, int n, bool assign)
> >      EventNotifier *notifier = virtio_queue_get_guest_notifier(vq);
> >  
> >      if (assign) {
> > +        if (!msix_enabled(&proxy->pci_dev)) {
> > +            return -EVIRTIO_DISABLED;
> > +        }
> >          int r = event_notifier_init(notifier, 0);
> >          if (r < 0) {
> >              return r;
> > diff --git a/hw/virtio.h b/hw/virtio.h
> > index d8546d5..53bbdba 100644
> > --- a/hw/virtio.h
> > +++ b/hw/virtio.h
> > @@ -98,6 +98,8 @@ typedef struct {
> >      void (*vmstate_change)(void * opaque, bool running);
> >  } VirtIOBindings;
> >  
> > +#define EVIRTIO_DISABLED ERANGE
> > +
> >  #define VIRTIO_PCI_QUEUE_MAX 64
> >  
> >  #define VIRTIO_NO_VECTOR 0xffff
Alex Williamson - Jan. 20, 2011, 6:05 p.m.
On Thu, 2011-01-20 at 17:35 +0200, Michael S. Tsirkin wrote:
> When MSI is off, each interrupt needs to be bounced through the io
> thread when it's set/cleared, so vhost-net causes more context switches and
> higher CPU utilization than userspace virtio which handles networking in
> the same thread.
> 
> We'll need to fix this by adding level irq support in kvm irqfd,
> for now disable vhost-net in these configurations.

We need this if we want avoid bouncing vfio INtx through qemu too.

> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> 
> I need to report some error from virtio-pci
> that would be handled specially (disable but don't
> report an error) so I wanted one that's never likely to be used by a
> userspace ioctl. I selected ERANGE but it'd
> be easy to switch to something else. Comments?
> 
>  hw/vhost.c      |    4 +++-
>  hw/virtio-net.c |    6 ++++--
>  hw/virtio-pci.c |    3 +++
>  hw/virtio.h     |    2 ++
>  4 files changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/vhost.c b/hw/vhost.c
> index 1d09ed0..c79765a 100644
> --- a/hw/vhost.c
> +++ b/hw/vhost.c
> @@ -649,7 +649,9 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
>  
>      r = vdev->binding->set_guest_notifiers(vdev->binding_opaque, true);
>      if (r < 0) {
> -        fprintf(stderr, "Error binding guest notifier: %d\n", -r);
> +	if (r != -EVIRTIO_DISABLED) {
> +		fprintf(stderr, "Error binding guest notifier: %d\n", -r);
> +	}

style - the above is tab indented.

>          goto fail_notifiers;
>      }
>  
> diff --git a/hw/virtio-net.c b/hw/virtio-net.c
> index ccb3e63..5de3fee 100644
> --- a/hw/virtio-net.c
> +++ b/hw/virtio-net.c
> @@ -121,8 +121,10 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
>      if (!n->vhost_started) {
>          int r = vhost_net_start(tap_get_vhost_net(n->nic->nc.peer), &n->vdev);
>          if (r < 0) {
> -            error_report("unable to start vhost net: %d: "
> -                         "falling back on userspace virtio", -r);
> +            if (r != -EVIRTIO_DISABLED) {
> +                error_report("unable to start vhost net: %d: "
> +                             "falling back on userspace virtio", -r);
> +            }
>          } else {
>              n->vhost_started = 1;
>          }
> diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
> index dd8887a..dbf4be0 100644
> --- a/hw/virtio-pci.c
> +++ b/hw/virtio-pci.c
> @@ -628,6 +628,9 @@ static int virtio_pci_set_guest_notifier(void *opaque, int n, bool assign)
>      EventNotifier *notifier = virtio_queue_get_guest_notifier(vq);
>  
>      if (assign) {
> +        if (!msix_enabled(&proxy->pci_dev)) {
> +            return -EVIRTIO_DISABLED;
> +        }
>          int r = event_notifier_init(notifier, 0);
>          if (r < 0) {
>              return r;
> diff --git a/hw/virtio.h b/hw/virtio.h
> index d8546d5..53bbdba 100644
> --- a/hw/virtio.h
> +++ b/hw/virtio.h
> @@ -98,6 +98,8 @@ typedef struct {
>      void (*vmstate_change)(void * opaque, bool running);
>  } VirtIOBindings;
>  
> +#define EVIRTIO_DISABLED ERANGE
> +
>  #define VIRTIO_PCI_QUEUE_MAX 64
>  
>  #define VIRTIO_NO_VECTOR 0xffff

I'm not a fan of having this special return value.  Why doesn't
virtio-pci only setup the set_guest_notifiers function pointer when msix
is enabled?  If not that, it could at least expose some
virtio_foo_enabled() type feature that vhost could check.  Thanks,

Alex
Sridhar Samudrala - Jan. 20, 2011, 11:43 p.m.
On Thu, 2011-01-20 at 19:47 +0200, Michael S. Tsirkin wrote:
> On Thu, Jan 20, 2011 at 08:31:53AM -0800, Sridhar Samudrala wrote:
> > On Thu, 2011-01-20 at 17:35 +0200, Michael S. Tsirkin wrote:
> > > When MSI is off, each interrupt needs to be bounced through the io
> > > thread when it's set/cleared, so vhost-net causes more context switches and
> > > higher CPU utilization than userspace virtio which handles networking in
> > > the same thread.
> > > 
> > > We'll need to fix this by adding level irq support in kvm irqfd,
> > > for now disable vhost-net in these configurations.
> > > 
> > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > ---
> > > 
> > > I need to report some error from virtio-pci
> > > that would be handled specially (disable but don't
> > > report an error) so I wanted one that's never likely to be used by a
> > > userspace ioctl. I selected ERANGE but it'd
> > > be easy to switch to something else. Comments?
> > 
> > Should this error be EVHOST_DISABLED rather than EVIRTIO_DISABLED?
> > 
> > -Sridhar
> 
> The error is reported by virtio-pci which does not know about vhost.
> I started with EVIRTIO_MSIX_DISABLED and made is shorter.
> Would EVIRTIO_MSIX_DISABLED be better?

I think so. This makes it more clear.
-Sridhar
> 
> > > 
> > >  hw/vhost.c      |    4 +++-
> > >  hw/virtio-net.c |    6 ++++--
> > >  hw/virtio-pci.c |    3 +++
> > >  hw/virtio.h     |    2 ++
> > >  4 files changed, 12 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/hw/vhost.c b/hw/vhost.c
> > > index 1d09ed0..c79765a 100644
> > > --- a/hw/vhost.c
> > > +++ b/hw/vhost.c
> > > @@ -649,7 +649,9 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
> > >  
> > >      r = vdev->binding->set_guest_notifiers(vdev->binding_opaque, true);
> > >      if (r < 0) {
> > > -        fprintf(stderr, "Error binding guest notifier: %d\n", -r);
> > > +	if (r != -EVIRTIO_DISABLED) {
> > > +		fprintf(stderr, "Error binding guest notifier: %d\n", -r);
> > > +	}
> > >          goto fail_notifiers;
> > >      }
> > >  
> > > diff --git a/hw/virtio-net.c b/hw/virtio-net.c
> > > index ccb3e63..5de3fee 100644
> > > --- a/hw/virtio-net.c
> > > +++ b/hw/virtio-net.c
> > > @@ -121,8 +121,10 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
> > >      if (!n->vhost_started) {
> > >          int r = vhost_net_start(tap_get_vhost_net(n->nic->nc.peer), &n->vdev);
> > >          if (r < 0) {
> > > -            error_report("unable to start vhost net: %d: "
> > > -                         "falling back on userspace virtio", -r);
> > > +            if (r != -EVIRTIO_DISABLED) {
> > > +                error_report("unable to start vhost net: %d: "
> > > +                             "falling back on userspace virtio", -r);
> > > +            }
> > >          } else {
> > >              n->vhost_started = 1;
> > >          }
> > > diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
> > > index dd8887a..dbf4be0 100644
> > > --- a/hw/virtio-pci.c
> > > +++ b/hw/virtio-pci.c
> > > @@ -628,6 +628,9 @@ static int virtio_pci_set_guest_notifier(void *opaque, int n, bool assign)
> > >      EventNotifier *notifier = virtio_queue_get_guest_notifier(vq);
> > >  
> > >      if (assign) {
> > > +        if (!msix_enabled(&proxy->pci_dev)) {
> > > +            return -EVIRTIO_DISABLED;
> > > +        }
> > >          int r = event_notifier_init(notifier, 0);
> > >          if (r < 0) {
> > >              return r;
> > > diff --git a/hw/virtio.h b/hw/virtio.h
> > > index d8546d5..53bbdba 100644
> > > --- a/hw/virtio.h
> > > +++ b/hw/virtio.h
> > > @@ -98,6 +98,8 @@ typedef struct {
> > >      void (*vmstate_change)(void * opaque, bool running);
> > >  } VirtIOBindings;
> > >  
> > > +#define EVIRTIO_DISABLED ERANGE
> > > +
> > >  #define VIRTIO_PCI_QUEUE_MAX 64
> > >  
> > >  #define VIRTIO_NO_VECTOR 0xffff
rukhsana ansari - March 9, 2011, 12:19 p.m.
Hi,

On Thu, Jan 20, 2011 at 9:05 PM, Michael S. Tsirkin <mst@redhat.com> wrote:

> When MSI is off, each interrupt needs to be bounced through the io
> thread when it's set/cleared, so vhost-net causes more context switches and
> higher CPU utilization than userspace virtio which handles networking in
> the same thread.
>
> We'll need to fix this by adding level irq support in kvm irqfd,
> for now disable vhost-net in these configurations.
>
>
> This maybe a novice question - Would appreciate it if you can you provide a
pointer to documentation or relevant code that explains what is the
limitation in supporting level irq support in kvm irqfd.


Thanks
-Rukhsana
rukhsana ansari - March 14, 2011, 5:05 p.m.
Seeking clarification to the original question I posted:
>>
>>
> This maybe a novice question - Would appreciate it if you can you provide
a
> pointer to documentation or relevant code that explains what is the
> *limitation in supporting level irq support in kvm irqfd.*
>
>
>
After browsing the KVM kernel code, it does look like direct assignment of
PCI devices allows support for level-triggered interrupts to be injected to
the guest from the kernel.  (*as opposed to not supporting it for vhost
irqfd mechanism*)
This occurs when the guest device supports INTX.
Reference:  kvm_assigned_dev_interrupt_work_handler() in assigned-dev.c
calls kvm_set_irq()
with the guest_irq.
This function in turn invokes the assigned set function  (either
kvm_set_pic_irq or kvm_set_ioapic_irq) which was setup at kvm_irq_chip
creation time when kvm_setup_default_irq_routing () called for handling
ioctl KVM_CREATE_IRQCHIP.

So, it isn't clear why level-triggered interrupt isn't supported for irqfd
mechanism.
Would greatly appreciate clarification here

Thanks
-Rukhsana
Michael S. Tsirkin - March 14, 2011, 7 p.m.
On Mon, Mar 14, 2011 at 10:35:08PM +0530, rukhsana ansari wrote:
> Seeking clarification to the original question I posted:
> >>
> >>
> > This maybe a novice question - Would appreciate it if you can you provide a
> > pointer to documentation or relevant code that explains what is the
> > limitation in supporting level irq support in kvm irqfd.
> >
> >
> >
> After browsing the KVM kernel code, it does look like direct assignment of PCI
> devices allows support for level-triggered interrupts to be injected to the
> guest from the kernel.  (as opposed to not supporting it for vhost irqfd
> mechanism)
> This occurs when the guest device supports INTX.
> Reference:  kvm_assigned_dev_interrupt_work_handler() in assigned-dev.c calls
> kvm_set_irq()
> with the guest_irq.
> This function in turn invokes the assigned set function  (either
> kvm_set_pic_irq or kvm_set_ioapic_irq) which was setup at kvm_irq_chip creation
> time when kvm_setup_default_irq_routing () called for handling ioctl
> KVM_CREATE_IRQCHIP.
> 
> So, it isn't clear why level-triggered interrupt isn't supported for irqfd
> mechanism.
> Would greatly appreciate clarification here
> 
> Thanks
> -Rukhsana
> 

Mostly, no one came up with an implementation so far.

If the point is to use irqfd with vhost-net, there's also
a question of adding interfaces to
1. pass IO read transactions directly to another kernel module
2. add an interface to clear the irq level

Maybe the right thing is to combine the two somehow:
irqfd might get an oiption to set a bit in memory,
ioeventfd might get an option to read and clear from memory
and clear irqfd line at the same time.
Alex Williamson - March 14, 2011, 7:31 p.m.
On Mon, 2011-03-14 at 21:00 +0200, Michael S. Tsirkin wrote:
> On Mon, Mar 14, 2011 at 10:35:08PM +0530, rukhsana ansari wrote:
> > Seeking clarification to the original question I posted:
> > >>
> > >>
> > > This maybe a novice question - Would appreciate it if you can you provide a
> > > pointer to documentation or relevant code that explains what is the
> > > limitation in supporting level irq support in kvm irqfd.
> > >
> > >
> > >
> > After browsing the KVM kernel code, it does look like direct assignment of PCI
> > devices allows support for level-triggered interrupts to be injected to the
> > guest from the kernel.  (as opposed to not supporting it for vhost irqfd
> > mechanism)
> > This occurs when the guest device supports INTX.
> > Reference:  kvm_assigned_dev_interrupt_work_handler() in assigned-dev.c calls
> > kvm_set_irq()
> > with the guest_irq.
> > This function in turn invokes the assigned set function  (either
> > kvm_set_pic_irq or kvm_set_ioapic_irq) which was setup at kvm_irq_chip creation
> > time when kvm_setup_default_irq_routing () called for handling ioctl
> > KVM_CREATE_IRQCHIP.
> > 
> > So, it isn't clear why level-triggered interrupt isn't supported for irqfd
> > mechanism.
> > Would greatly appreciate clarification here
> > 
> > Thanks
> > -Rukhsana
> > 
> 
> Mostly, no one came up with an implementation so far.
> 
> If the point is to use irqfd with vhost-net, there's also
> a question of adding interfaces to
> 1. pass IO read transactions directly to another kernel module
> 2. add an interface to clear the irq level
> 
> Maybe the right thing is to combine the two somehow:
> irqfd might get an oiption to set a bit in memory,
> ioeventfd might get an option to read and clear from memory
> and clear irqfd line at the same time.

I had wanted this for VFIO too and it gets pretty complicated.  The
first problem with level triggered interrupts is that you need to know
which GSI your device triggers.  This means translating PCI INTA through
bridge swizzles and chipset mapping to an IOAPIC.  Current device
assignment does this through a complete hack in qemu.  Then you can set
the IRQ, but being level triggered, we need to know when the guest has
serviced the IRQ so we can de-assert it.  This requires a hook into the
in-kernel APIC to sent the EOI back out to userspace.

I posted RFC patches for doing all this a while back, but they didn't go
anywhere.  I think the feeling was that it was too intrusive for "slow"
interrupts.  The current thinking for VFIO based device assignment is to
use qemu for level interrupts until we find something that actually
needs low latency in this path.  We generally consider INTx to be like
supporting i/o port space or non-4k BARs, ie. necessary for
compatibility, but not necessarily a performance path.  High performance
devices should always be using some kind of MSI because it bypasses all
of the APIC complications and slowness.  Thanks,

Alex
rukhsana ansari - March 17, 2011, 3:34 p.m.
Alex, Michael,

Thank you for the clarification.

On Tue, Mar 15, 2011 at 1:01 AM, Alex Williamson <alex.williamson@redhat.com
> wrote:

> On Mon, 2011-03-14 at 21:00 +0200, Michael S. Tsirkin wrote:
> > On Mon, Mar 14, 2011 at 10:35:08PM +0530, rukhsana ansari wrote:
> > > Seeking clarification to the original question I posted:
> > > >>
> > > >>
> > > > This maybe a novice question - Would appreciate it if you can you
> provide a
> > > > pointer to documentation or relevant code that explains what is the
> > > > limitation in supporting level irq support in kvm irqfd.
> > > >
> > > >
> > > >
> > > After browsing the KVM kernel code, it does look like direct assignment
> of PCI
> > > devices allows support for level-triggered interrupts to be injected to
> the
> > > guest from the kernel.  (as opposed to not supporting it for vhost
> irqfd
> > > mechanism)
> > > This occurs when the guest device supports INTX.
> > > Reference:  kvm_assigned_dev_interrupt_work_handler() in assigned-dev.c
> calls
> > > kvm_set_irq()
> > > with the guest_irq.
> > > This function in turn invokes the assigned set function  (either
> > > kvm_set_pic_irq or kvm_set_ioapic_irq) which was setup at kvm_irq_chip
> creation
> > > time when kvm_setup_default_irq_routing () called for handling ioctl
> > > KVM_CREATE_IRQCHIP.
> > >
> > > So, it isn't clear why level-triggered interrupt isn't supported for
> irqfd
> > > mechanism.
> > > Would greatly appreciate clarification here
> > >
> > > Thanks
> > > -Rukhsana
> > >
> >
> > Mostly, no one came up with an implementation so far.
> >
> > If the point is to use irqfd with vhost-net, there's also
> > a question of adding interfaces to
> > 1. pass IO read transactions directly to another kernel module
> > 2. add an interface to clear the irq level
> >
> > Maybe the right thing is to combine the two somehow:
> > irqfd might get an oiption to set a bit in memory,
> > ioeventfd might get an option to read and clear from memory
> > and clear irqfd line at the same time.
>
> I had wanted this for VFIO too and it gets pretty complicated.  The
> first problem with level triggered interrupts is that you need to know
> which GSI your device triggers.  This means translating PCI INTA through
> bridge swizzles and chipset mapping to an IOAPIC.  Current device
> assignment does this through a complete hack in qemu.  Then you can set
> the IRQ, but being level triggered, we need to know when the guest has
> serviced the IRQ so we can de-assert it.  This requires a hook into the
> in-kernel APIC to sent the EOI back out to userspace.
>
> I posted RFC patches for doing all this a while back, but they didn't go
> anywhere.  I think the feeling was that it was too intrusive for "slow"
> interrupts.  The current thinking for VFIO based device assignment is to
> use qemu for level interrupts until we find something that actually
> needs low latency in this path.  We generally consider INTx to be like
> supporting i/o port space or non-4k BARs, ie. necessary for
> compatibility, but not necessarily a performance path.  High performance
> devices should always be using some kind of MSI because it bypasses all
> of the APIC complications and slowness.  Thanks,
>
> Alex
>
>

Patch

diff --git a/hw/vhost.c b/hw/vhost.c
index 1d09ed0..c79765a 100644
--- a/hw/vhost.c
+++ b/hw/vhost.c
@@ -649,7 +649,9 @@  int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
 
     r = vdev->binding->set_guest_notifiers(vdev->binding_opaque, true);
     if (r < 0) {
-        fprintf(stderr, "Error binding guest notifier: %d\n", -r);
+	if (r != -EVIRTIO_DISABLED) {
+		fprintf(stderr, "Error binding guest notifier: %d\n", -r);
+	}
         goto fail_notifiers;
     }
 
diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index ccb3e63..5de3fee 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -121,8 +121,10 @@  static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
     if (!n->vhost_started) {
         int r = vhost_net_start(tap_get_vhost_net(n->nic->nc.peer), &n->vdev);
         if (r < 0) {
-            error_report("unable to start vhost net: %d: "
-                         "falling back on userspace virtio", -r);
+            if (r != -EVIRTIO_DISABLED) {
+                error_report("unable to start vhost net: %d: "
+                             "falling back on userspace virtio", -r);
+            }
         } else {
             n->vhost_started = 1;
         }
diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index dd8887a..dbf4be0 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -628,6 +628,9 @@  static int virtio_pci_set_guest_notifier(void *opaque, int n, bool assign)
     EventNotifier *notifier = virtio_queue_get_guest_notifier(vq);
 
     if (assign) {
+        if (!msix_enabled(&proxy->pci_dev)) {
+            return -EVIRTIO_DISABLED;
+        }
         int r = event_notifier_init(notifier, 0);
         if (r < 0) {
             return r;
diff --git a/hw/virtio.h b/hw/virtio.h
index d8546d5..53bbdba 100644
--- a/hw/virtio.h
+++ b/hw/virtio.h
@@ -98,6 +98,8 @@  typedef struct {
     void (*vmstate_change)(void * opaque, bool running);
 } VirtIOBindings;
 
+#define EVIRTIO_DISABLED ERANGE
+
 #define VIRTIO_PCI_QUEUE_MAX 64
 
 #define VIRTIO_NO_VECTOR 0xffff