From patchwork Mon Jan 31 21:19:37 2011
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Michael S. Tsirkin" <mst@redhat.com>
X-Patchwork-Id: 81248
Return-Path: <qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from lists.gnu.org (lists.gnu.org [199.232.76.165])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(Client did not present a certificate)
	by ozlabs.org (Postfix) with ESMTPS id 16F14B70EE
	for <incoming@patchwork.ozlabs.org>;
	Tue,  1 Feb 2011 10:29:07 +1100 (EST)
Received: from localhost ([127.0.0.1]:41642 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1Pk1Pl-0005hp-Rd
	for incoming@patchwork.ozlabs.org; Mon, 31 Jan 2011 16:36:02 -0500
Received: from [140.186.70.92] (port=58660 helo=eggs.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1Pk1AK-0000gJ-Pq
	for qemu-devel@nongnu.org; Mon, 31 Jan 2011 16:21:03 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <mst@redhat.com>) id 1Pk1AD-00057Q-HA
	for qemu-devel@nongnu.org; Mon, 31 Jan 2011 16:20:03 -0500
Received: from mx1.redhat.com ([209.132.183.28]:48634)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <mst@redhat.com>) id 1Pk1AD-000565-1E
	for qemu-devel@nongnu.org; Mon, 31 Jan 2011 16:19:57 -0500
Received: from int-mx10.intmail.prod.int.phx2.redhat.com
	(int-mx10.intmail.prod.int.phx2.redhat.com [10.5.11.23])
	by mx1.redhat.com (8.13.8/8.13.8) with ESMTP id p0VLJrTr018591
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK);
	Mon, 31 Jan 2011 16:19:53 -0500
Received: from redhat.com (vpn-200-64.tlv.redhat.com [10.35.200.64])
	by int-mx10.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with
	SMTP id p0VLJn7C019675; Mon, 31 Jan 2011 16:19:50 -0500
Date: Mon, 31 Jan 2011 23:19:37 +0200
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Anthony Liguori <anthony@codemonkey.ws>
Message-ID: <20110131211937.GA8637@redhat.com>
MIME-Version: 1.0
Content-Disposition: inline
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Scanned-By: MIMEDefang 2.68 on 10.5.11.23
X-detected-operating-system: by eggs.gnu.org: Genre and OS details not
	recognized.
X-Received-From: 209.132.183.28
Cc: kvm@vger.kernel.org, Juan Quintela <quintela@redhat.com>,
	Jes.Sorensen@redhat.com, jasowang@redhat.com, qemu-devel@nongnu.org,
	Alex Williamson <alex.williamson@redhat.com>
Subject: [Qemu-devel] [PATCHv2] vhost: force vhost off for non-MSI guests
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org
Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org

When MSI is off, each interrupt needs to be bounced through the io
thread when it's set/cleared, so vhost-net causes more context switches and
higher CPU utilization than userspace virtio which handles networking in
the same thread.

We'll need to fix this by adding level irq support in kvm irqfd,
for now disable vhost-net in these configurations.

Added a vhostforce flag to force vhost-net back on.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---

Untested, I'll send a pull request later after
some testing assuming we've resolved the
command line comments to everyone's satisfaction.

Changes since v1:
	added vhostforce to enable vhost for non-MSI guests

 hw/vhost.c      |   16 +++++++++++-----
 hw/vhost.h      |    3 ++-
 hw/vhost_net.c  |    8 +++++---
 hw/vhost_net.h  |    2 +-
 hw/virtio-net.c |    6 ++++--
 hw/virtio-pci.c |    6 +++++-
 hw/virtio.h     |    4 +++-
 net/tap.c       |    6 ++++--
 qemu-options.hx |    4 +++-
 9 files changed, 38 insertions(+), 17 deletions(-)

diff --git a/hw/vhost.c b/hw/vhost.c
index 6082da2..72df75f 100644
--- a/hw/vhost.c
+++ b/hw/vhost.c
@@ -581,7 +581,7 @@ static void vhost_virtqueue_cleanup(struct vhost_dev *dev,
                               0, virtio_queue_get_desc_size(vdev, idx));
 }
 
-int vhost_dev_init(struct vhost_dev *hdev, int devfd)
+int vhost_dev_init(struct vhost_dev *hdev, int devfd, bool force)
 {
     uint64_t features;
     int r;
@@ -613,6 +613,7 @@ int vhost_dev_init(struct vhost_dev *hdev, int devfd)
     hdev->log_enabled = false;
     hdev->started = false;
     cpu_register_phys_memory_client(&hdev->client);
+    hdev->force = force;
     return 0;
 fail:
     r = -errno;
@@ -636,9 +637,12 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
         goto fail;
     }
 
-    r = vdev->binding->set_guest_notifiers(vdev->binding_opaque, true);
+    r = vdev->binding->set_guest_notifiers(vdev->binding_opaque, true,
+                                           hdev->force);
     if (r < 0) {
-        fprintf(stderr, "Error binding guest notifier: %d\n", -r);
+	if (r != -EVIRTIO_DISABLED) {
+		fprintf(stderr, "Error binding guest notifier: %d\n", -r);
+	}
         goto fail_notifiers;
     }
 
@@ -686,7 +690,8 @@ fail_vq:
     }
 fail_mem:
 fail_features:
-    vdev->binding->set_guest_notifiers(vdev->binding_opaque, false);
+    vdev->binding->set_guest_notifiers(vdev->binding_opaque, false,
+                                       hdev->force);
 fail_notifiers:
 fail:
     return r;
@@ -704,7 +709,8 @@ void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev)
     }
     vhost_client_sync_dirty_bitmap(&hdev->client, 0,
                                    (target_phys_addr_t)~0x0ull);
-    r = vdev->binding->set_guest_notifiers(vdev->binding_opaque, false);
+    r = vdev->binding->set_guest_notifiers(vdev->binding_opaque, false,
+                                           hdev->force);
     if (r < 0) {
         fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", r);
         fflush(stderr);
diff --git a/hw/vhost.h b/hw/vhost.h
index 86dd834..d5d4d86 100644
--- a/hw/vhost.h
+++ b/hw/vhost.h
@@ -38,9 +38,10 @@ struct vhost_dev {
     bool log_enabled;
     vhost_log_chunk_t *log;
     unsigned long long log_size;
+    bool force;
 };
 
-int vhost_dev_init(struct vhost_dev *hdev, int devfd);
+int vhost_dev_init(struct vhost_dev *hdev, int devfd, bool force);
 void vhost_dev_cleanup(struct vhost_dev *hdev);
 int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev);
 void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev);
diff --git a/hw/vhost_net.c b/hw/vhost_net.c
index c068be1..4f57271 100644
--- a/hw/vhost_net.c
+++ b/hw/vhost_net.c
@@ -81,7 +81,8 @@ static int vhost_net_get_fd(VLANClientState *backend)
     }
 }
 
-struct vhost_net *vhost_net_init(VLANClientState *backend, int devfd)
+struct vhost_net *vhost_net_init(VLANClientState *backend, int devfd,
+                                 bool force)
 {
     int r;
     struct vhost_net *net = qemu_malloc(sizeof *net);
@@ -98,7 +99,7 @@ struct vhost_net *vhost_net_init(VLANClientState *backend, int devfd)
         (1 << VHOST_NET_F_VIRTIO_NET_HDR);
     net->backend = r;
 
-    r = vhost_dev_init(&net->dev, devfd);
+    r = vhost_dev_init(&net->dev, devfd, force);
     if (r < 0) {
         goto fail;
     }
@@ -188,7 +189,8 @@ void vhost_net_cleanup(struct vhost_net *net)
     qemu_free(net);
 }
 #else
-struct vhost_net *vhost_net_init(VLANClientState *backend, int devfd)
+struct vhost_net *vhost_net_init(VLANClientState *backend, int devfd,
+                                 bool force)
 {
 	return NULL;
 }
diff --git a/hw/vhost_net.h b/hw/vhost_net.h
index 6c18ff7..8435094 100644
--- a/hw/vhost_net.h
+++ b/hw/vhost_net.h
@@ -6,7 +6,7 @@
 struct vhost_net;
 typedef struct vhost_net VHostNetState;
 
-VHostNetState *vhost_net_init(VLANClientState *backend, int devfd);
+VHostNetState *vhost_net_init(VLANClientState *backend, int devfd, bool force);
 
 int vhost_net_start(VHostNetState *net, VirtIODevice *dev);
 void vhost_net_stop(VHostNetState *net, VirtIODevice *dev);
diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index ccb3e63..5de3fee 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -121,8 +121,10 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
     if (!n->vhost_started) {
         int r = vhost_net_start(tap_get_vhost_net(n->nic->nc.peer), &n->vdev);
         if (r < 0) {
-            error_report("unable to start vhost net: %d: "
-                         "falling back on userspace virtio", -r);
+            if (r != -EVIRTIO_DISABLED) {
+                error_report("unable to start vhost net: %d: "
+                             "falling back on userspace virtio", -r);
+            }
         } else {
             n->vhost_started = 1;
         }
diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index d07ff97..c93ef11 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -595,12 +595,16 @@ static int virtio_pci_set_guest_notifier(void *opaque, int n, bool assign)
     return 0;
 }
 
-static int virtio_pci_set_guest_notifiers(void *opaque, bool assign)
+static int virtio_pci_set_guest_notifiers(void *opaque, bool assign, bool force)
 {
     VirtIOPCIProxy *proxy = opaque;
     VirtIODevice *vdev = proxy->vdev;
     int r, n;
 
+    if (assign && !force && !msix_enabled(&proxy->pci_dev)) {
+	    return -EVIRTIO_DISABLED;
+    }
+
     for (n = 0; n < VIRTIO_PCI_QUEUE_MAX; n++) {
         if (!virtio_queue_get_num(vdev, n)) {
             break;
diff --git a/hw/virtio.h b/hw/virtio.h
index d8546d5..156c9a3 100644
--- a/hw/virtio.h
+++ b/hw/virtio.h
@@ -93,11 +93,13 @@ typedef struct {
     int (*load_config)(void * opaque, QEMUFile *f);
     int (*load_queue)(void * opaque, int n, QEMUFile *f);
     unsigned (*get_features)(void * opaque);
-    int (*set_guest_notifiers)(void * opaque, bool assigned);
+    int (*set_guest_notifiers)(void * opaque, bool assigned, bool force);
     int (*set_host_notifier)(void * opaque, int n, bool assigned);
     void (*vmstate_change)(void * opaque, bool running);
 } VirtIOBindings;
 
+#define EVIRTIO_DISABLED ERANGE
+
 #define VIRTIO_PCI_QUEUE_MAX 64
 
 #define VIRTIO_NO_VECTOR 0xffff
diff --git a/net/tap.c b/net/tap.c
index eada34a..b8cd252 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -491,8 +491,10 @@ int net_init_tap(QemuOpts *opts, Monitor *mon, const char *name, VLANState *vlan
         }
     }
 
-    if (qemu_opt_get_bool(opts, "vhost", !!qemu_opt_get(opts, "vhostfd"))) {
+    if (qemu_opt_get_bool(opts, "vhost", !!qemu_opt_get(opts, "vhostfd") ||
+                          qemu_opt_get_bool(opts, "vhostforce", false))) {
         int vhostfd, r;
+        bool force = qemu_opt_get_bool(opts, "vhostforce", false);
         if (qemu_opt_get(opts, "vhostfd")) {
             r = net_handle_fd_param(mon, qemu_opt_get(opts, "vhostfd"));
             if (r == -1) {
@@ -502,7 +504,7 @@ int net_init_tap(QemuOpts *opts, Monitor *mon, const char *name, VLANState *vlan
         } else {
             vhostfd = -1;
         }
-        s->vhost_net = vhost_net_init(&s->nc, vhostfd);
+        s->vhost_net = vhost_net_init(&s->nc, vhostfd, force);
         if (!s->vhost_net) {
             error_report("vhost-net requested but could not be initialized");
             return -1;
diff --git a/qemu-options.hx b/qemu-options.hx
index 898561d..11c93a2 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1050,7 +1050,7 @@ DEF("net", HAS_ARG, QEMU_OPTION_net,
     "-net tap[,vlan=n][,name=str],ifname=name\n"
     "                connect the host TAP network interface to VLAN 'n'\n"
 #else
-    "-net tap[,vlan=n][,name=str][,fd=h][,ifname=name][,script=file][,downscript=dfile][,sndbuf=nbytes][,vnet_hdr=on|off][,vhost=on|off][,vhostfd=h]\n"
+    "-net tap[,vlan=n][,name=str][,fd=h][,ifname=name][,script=file][,downscript=dfile][,sndbuf=nbytes][,vnet_hdr=on|off][,vhost=on|off][,vhostfd=h][,vhostforce=on|off]\n"
     "                connect the host TAP network interface to VLAN 'n' and use the\n"
     "                network scripts 'file' (default=" DEFAULT_NETWORK_SCRIPT ")\n"
     "                and 'dfile' (default=" DEFAULT_NETWORK_DOWN_SCRIPT ")\n"
@@ -1061,6 +1061,8 @@ DEF("net", HAS_ARG, QEMU_OPTION_net,
     "                use vnet_hdr=off to avoid enabling the IFF_VNET_HDR tap flag\n"
     "                use vnet_hdr=on to make the lack of IFF_VNET_HDR support an error condition\n"
     "                use vhost=on to enable experimental in kernel accelerator\n"
+    "                    (only has effect for virtio guests which use MSIX)\n"
+    "                use vhostforce=on to force vhost on for non-MSIX virtio guests\n"
     "                use 'vhostfd=h' to connect to an already opened vhost net device\n"
 #endif
     "-net socket[,vlan=n][,name=str][,fd=h][,listen=[host]:port][,connect=host:port]\n"