From patchwork Thu Aug 8 04:45:04 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tyler Hicks
X-Patchwork-Id: 1143818
From: Tyler Hicks
To: kernel-team@lists.ubuntu.com
Subject: [PATCH 1/9] vhost: introduce vhost_vq_avail_empty()
Date: Thu, 8 Aug 2019 04:45:04 +0000
Message-Id: <1565239512-11188-2-git-send-email-tyhicks@canonical.com>
In-Reply-To: <1565239512-11188-1-git-send-email-tyhicks@canonical.com>
References: <1565239512-11188-1-git-send-email-tyhicks@canonical.com>

From: Jason Wang

This patch introduces a helper which returns true if we're sure that
the available ring is empty for a specific vq. When we're not sure
(e.g. on a vq access failure), it returns false instead. This can be
used by busy polling code to exit the busy loop.

Signed-off-by: Jason Wang
Signed-off-by: Michael S. Tsirkin

CVE-2019-3900

(cherry picked from commit d4a60603fa0b42012decfa058dfa44cffde7a10c)
Signed-off-by: Tyler Hicks
---
 drivers/vhost/vhost.c | 14 ++++++++++++++
 drivers/vhost/vhost.h |  1 +
 2 files changed, 15 insertions(+)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 2ed0a356d1d3..84a0a97b7988 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1629,6 +1629,20 @@ void vhost_add_used_and_signal_n(struct vhost_dev *dev,
 }
 EXPORT_SYMBOL_GPL(vhost_add_used_and_signal_n);
 
+/* return true if we're sure that avaiable ring is empty */
+bool vhost_vq_avail_empty(struct vhost_dev *dev, struct vhost_virtqueue *vq)
+{
+	__virtio16 avail_idx;
+	int r;
+
+	r = __get_user(avail_idx, &vq->avail->idx);
+	if (r)
+		return false;
+
+	return vhost16_to_cpu(vq, avail_idx) == vq->avail_idx;
+}
+EXPORT_SYMBOL_GPL(vhost_vq_avail_empty);
+
 /* OK, now we need to know about added descriptors. */
 bool vhost_enable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 {
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index d3f767448a72..0f9b4d22bee5 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -158,6 +158,7 @@ void vhost_add_used_and_signal_n(struct vhost_dev *, struct vhost_virtqueue *,
 			       struct vring_used_elem *heads, unsigned count);
 void vhost_signal(struct vhost_dev *, struct vhost_virtqueue *);
 void vhost_disable_notify(struct vhost_dev *, struct vhost_virtqueue *);
+bool vhost_vq_avail_empty(struct vhost_dev *, struct vhost_virtqueue *);
 bool vhost_enable_notify(struct vhost_dev *, struct vhost_virtqueue *);
 
 int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log,

From patchwork Thu Aug 8 04:45:05 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tyler Hicks
X-Patchwork-Id: 1143822
From: Tyler Hicks
To: kernel-team@lists.ubuntu.com
Subject: [PATCH 2/9] vhost_net: tx batching
Date: Thu, 8 Aug 2019 04:45:05 +0000
Message-Id: <1565239512-11188-3-git-send-email-tyhicks@canonical.com>
In-Reply-To: <1565239512-11188-1-git-send-email-tyhicks@canonical.com>
References: <1565239512-11188-1-git-send-email-tyhicks@canonical.com>

From: Jason Wang

This patch tries to utilize tuntap rx batching by peeking the tx
virtqueue during transmission: if there are more available buffers in
the virtqueue, set the MSG_MORE flag as a hint for the backend (e.g.
tuntap) to batch the packets.

Reviewed-by: Stefan Hajnoczi
Signed-off-by: Jason Wang
Acked-by: Michael S. Tsirkin
Signed-off-by: David S. Miller

CVE-2019-3900

(backported from commit 0ed005ce02fa0a88e5e6b7b5f7ff452171881610)
[tyhicks: Minor context adjustment due to missing commit 030881372460
 ("vhost_net: basic polling support")]
Signed-off-by: Tyler Hicks
---
 drivers/vhost/net.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 645b2197930e..7d467525bb38 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -287,6 +287,15 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
 	rcu_read_unlock_bh();
 }
 
+static bool vhost_exceeds_maxpend(struct vhost_net *net)
+{
+	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
+	struct vhost_virtqueue *vq = &nvq->vq;
+
+	return (nvq->upend_idx + vq->num - VHOST_MAX_PEND) % UIO_MAXIOV
+		== nvq->done_idx;
+}
+
 /* Expects to be always run from workqueue - which acts as
  * read-size critical section for our kind of RCU. */
 static void handle_tx(struct vhost_net *net)
@@ -327,8 +336,7 @@ static void handle_tx(struct vhost_net *net)
 		/* If more outstanding DMAs, queue the work.
 		 * Handle upend_idx wrap around
 		 */
-		if (unlikely((nvq->upend_idx + vq->num - VHOST_MAX_PEND)
-			      % UIO_MAXIOV == nvq->done_idx))
+		if (unlikely(vhost_exceeds_maxpend(net)))
 			break;
 
 		head = vhost_get_vq_desc(vq, vq->iov,
@@ -388,6 +396,16 @@ static void handle_tx(struct vhost_net *net)
 			msg.msg_control = NULL;
 			ubufs = NULL;
 		}
+
+		total_len += len;
+		if (total_len < VHOST_NET_WEIGHT &&
+		    !vhost_vq_avail_empty(&net->dev, vq) &&
+		    likely(!vhost_exceeds_maxpend(net))) {
+			msg.msg_flags |= MSG_MORE;
+		} else {
+			msg.msg_flags &= ~MSG_MORE;
+		}
+
 		/* TODO: Check specific error and bomb out unless ENOBUFS? */
 		err = sock->ops->sendmsg(sock, &msg, len);
 		if (unlikely(err < 0)) {
@@ -406,7 +424,6 @@ static void handle_tx(struct vhost_net *net)
 			vhost_add_used_and_signal(&net->dev, vq, head, 0);
 		else
 			vhost_zerocopy_signal_used(net, vq);
-		total_len += len;
 		vhost_net_tx_packet(net);
 		if (unlikely(total_len >= VHOST_NET_WEIGHT)) {
 			vhost_poll_queue(&vq->poll);

From patchwork Thu Aug 8 04:45:06 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tyler Hicks
X-Patchwork-Id: 1143819
From: Tyler Hicks
To: kernel-team@lists.ubuntu.com
Subject: [PATCH 3/9] vhost_net: do not stall on zerocopy depletion
Date: Thu, 8 Aug 2019 04:45:06 +0000
Message-Id: <1565239512-11188-4-git-send-email-tyhicks@canonical.com>
In-Reply-To: <1565239512-11188-1-git-send-email-tyhicks@canonical.com>
References: <1565239512-11188-1-git-send-email-tyhicks@canonical.com>

From: Willem de Bruijn

Vhost-net has a hard limit on the number of zerocopy skbs in flight.
When reached, transmission stalls. Stalls cause latency, as well as
head-of-line blocking of other flows that do not use zerocopy.

Instead of stalling, revert to copy-based transmission.

Tested by sending two udp flows from guest to host, one with payload
of VHOST_GOODCOPY_LEN, the other too small for zerocopy (1B). The
large flow is redirected to a netem instance with a 1MBps rate limit
and a deep 1000-entry queue.

  modprobe ifb
  ip link set dev ifb0 up
  tc qdisc add dev ifb0 root netem limit 1000 rate 1MBit

  tc qdisc add dev tap0 ingress
  tc filter add dev tap0 parent ffff: protocol ip \
      u32 match ip dport 8000 0xffff \
      action mirred egress redirect dev ifb0

Before the delay, both flows process around 80K pps. With the delay,
before this patch, both process around 400. After this patch, the
large flow is still rate limited, while the small one reverts to its
original rate. See also the discussion in the first link below.

Without rate limiting, {1, 10, 100}x TCP_STREAM tests continued to
send at 100% zerocopy.

The limit in vhost_exceeds_maxpend must be carefully chosen. With
vq->num >> 1, the flows remain correlated. This value happens to
correspond to VHOST_MAX_PENDING for vq->num == 256.
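A user-space sketch of the in-flight arithmetic behind the reworked vhost_exceeds_maxpend() shows how the limit behaves. It assumes UIO_MAXIOV = 1024 and VHOST_MAX_PEND = 128. The names struct zc_model, zc_in_flight() and zc_exceeds_maxpend() are hypothetical and are not kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values: UIO_MAXIOV is 1024 in <linux/uio.h>, and
 * VHOST_MAX_PEND is 128 in drivers/vhost/net.c in this series. */
#define UIO_MAXIOV     1024u
#define VHOST_MAX_PEND 128u

/* Hypothetical model of the TX zerocopy bookkeeping: upend_idx is the
 * next slot to be handed out and done_idx is the oldest slot whose DMA
 * has not completed yet. Both wrap around at UIO_MAXIOV. */
struct zc_model {
	unsigned int upend_idx;
	unsigned int done_idx;
	unsigned int vq_num; /* ring size, i.e. vq->num */
};

/* Buffers still in flight. Adding UIO_MAXIOV before the modulo keeps
 * the unsigned subtraction correct after upend_idx has wrapped. */
static unsigned int zc_in_flight(const struct zc_model *m)
{
	assert(m->upend_idx < UIO_MAXIOV && m->done_idx < UIO_MAXIOV);
	return (m->upend_idx + UIO_MAXIOV - m->done_idx) % UIO_MAXIOV;
}

/* Same comparison as the patched check: once more than
 * min(VHOST_MAX_PEND, vq->num >> 2) buffers are outstanding, the next
 * packet is sent by copy instead of stalling the queue. */
static bool zc_exceeds_maxpend(const struct zc_model *m)
{
	unsigned int limit = m->vq_num >> 2;

	if (limit > VHOST_MAX_PEND)
		limit = VHOST_MAX_PEND;
	return zc_in_flight(m) > limit;
}
```

For a 256-entry ring the threshold is min(128, 256 >> 2) = 64. With 70 buffers in flight the check trips; with 64 it does not. With upend_idx = 10 and done_idx = 1020, the wrap-around handling yields 14 buffers in flight.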
Allow smaller fractions and ensure correctness also for much smaller
values of vq->num, by testing the min() of both explicitly. See also
the discussion in the second link below.

Changes v1 -> v2
  - replaced min with typed min_t
  - avoid unnecessary whitespace change

Link:http://lkml.kernel.org/r/CAF=yD-+Wk9sc9dXMUq1+x_hh=3ThTXa6BnZkygP3tgVpjbp93g@mail.gmail.com
Link:http://lkml.kernel.org/r/20170819064129.27272-1-den@klaipeden.com
Signed-off-by: Willem de Bruijn
Signed-off-by: David S. Miller

CVE-2019-3900

(cherry picked from commit 1e6f74536de08b5e50cf0e37e735911c2cef7c62)
Signed-off-by: Tyler Hicks
---
 drivers/vhost/net.c | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 7d467525bb38..5c249b82c1a4 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -292,8 +292,8 @@ static bool vhost_exceeds_maxpend(struct vhost_net *net)
 	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
 	struct vhost_virtqueue *vq = &nvq->vq;
 
-	return (nvq->upend_idx + vq->num - VHOST_MAX_PEND) % UIO_MAXIOV
-		== nvq->done_idx;
+	return (nvq->upend_idx + UIO_MAXIOV - nvq->done_idx) % UIO_MAXIOV >
+	       min_t(unsigned int, VHOST_MAX_PEND, vq->num >> 2);
 }
 
 /* Expects to be always run from workqueue - which acts as
@@ -333,11 +333,6 @@ static void handle_tx(struct vhost_net *net)
 		if (zcopy)
 			vhost_zerocopy_signal_used(net, vq);
 
-		/* If more outstanding DMAs, queue the work.
-		 * Handle upend_idx wrap around
-		 */
-		if (unlikely(vhost_exceeds_maxpend(net)))
-			break;
 
 		head = vhost_get_vq_desc(vq, vq->iov,
 					 ARRAY_SIZE(vq->iov),
@@ -373,8 +368,7 @@ static void handle_tx(struct vhost_net *net)
 		len = msg_data_left(&msg);
 
 		zcopy_used = zcopy && len >= VHOST_GOODCOPY_LEN
-				   && (nvq->upend_idx + 1) % UIO_MAXIOV !=
-				      nvq->done_idx
+				   && !vhost_exceeds_maxpend(net)
 				   && vhost_net_tx_select_zcopy(net);
 
 		/* use msg_control to pass vhost zerocopy ubuf info to skb */

From patchwork Thu Aug 8 04:45:07 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Tyler Hicks
X-Patchwork-Id: 1143820
From: Tyler Hicks
To: kernel-team@lists.ubuntu.com
Subject: [PATCH 4/9] vhost-net: set packet weight of tx polling to 2 * vq size
Date: Thu, 8 Aug 2019 04:45:07 +0000
Message-Id: <1565239512-11188-5-git-send-email-tyhicks@canonical.com>
In-Reply-To: <1565239512-11188-1-git-send-email-tyhicks@canonical.com>
References: <1565239512-11188-1-git-send-email-tyhicks@canonical.com>

From: haibinzhang(张海斌)

handle_tx will delay rx for tens or even hundreds of milliseconds when
tx busy-polls udp packets with a small length (e.g. a 1-byte udp
payload), because VHOST_NET_WEIGHT takes into account only the bytes
sent, not the number of packets.

Ping-Latencies shown below were tested between two Virtual Machines
using netperf (UDP_STREAM, len=1), and then another machine pinged the
client:

vq size=256
Packet-Weight        Ping-Latencies(millisecond)
                   min        avg        max
Origin           3.319     18.489     57.303
64               1.643      2.021      2.552
128              1.825      2.600      3.224
256              1.997      2.710      4.295
512              1.860      3.171      4.631
1024             2.002      4.173      9.056
2048             2.257      5.650      9.688
4096             2.093      8.508     15.943

vq size=512
Packet-Weight        Ping-Latencies(millisecond)
                   min        avg        max
Origin           6.537     29.177     66.245
64               2.798      3.614      4.403
128              2.861      3.820      4.775
256              3.008      4.018      4.807
512              3.254      4.523      5.824
1024             3.079      5.335      7.747
2048             3.944      8.201     12.762
4096             4.158     11.057     19.985

Seems pretty consistent, a small dip at 2 VQ sizes.
Ring size is a hint from device about a burst size it can tolerate.
Based on benchmarks, set the weight to 2 * vq size.

To evaluate this change, further tests were done using netperf (RR, TX)
between two machines with Intel(R) Xeon(R) Gold 6133 CPU @ 2.50GHz, and
the vq size was tweaked through qemu.
The results shown below do not show obvious changes.

vq size=256 TCP_RR                 vq size=512 TCP_RR
size/sessions/+thu%/+normalize%    size/sessions/+thu%/+normalize%
   1/       1/  -7%/        -2%       1/       1/   0%/        -2%
   1/       4/  +1%/         0%       1/       4/  +1%/         0%
   1/       8/  +1%/        -2%       1/       8/   0%/        +1%
  64/       1/  -6%/         0%      64/       1/  +7%/        +3%
  64/       4/   0%/        +2%      64/       4/  -1%/        +1%
  64/       8/   0%/         0%      64/       8/  -1%/        -2%
 256/       1/  -3%/        -4%     256/       1/  -4%/        -2%
 256/       4/  +3%/        +4%     256/       4/  +1%/        +2%
 256/       8/  +2%/         0%     256/       8/  +1%/        -1%

vq size=256 UDP_RR                 vq size=512 UDP_RR
size/sessions/+thu%/+normalize%    size/sessions/+thu%/+normalize%
   1/       1/  -5%/        +1%       1/       1/  -3%/        -2%
   1/       4/  +4%/        +1%       1/       4/  -2%/        +2%
   1/       8/  -1%/        -1%       1/       8/  -1%/         0%
  64/       1/  -2%/        -3%      64/       1/  +1%/        +1%
  64/       4/  -5%/        -1%      64/       4/  +2%/         0%
  64/       8/   0%/        -1%      64/       8/  -2%/        +1%
 256/       1/  +7%/        +1%     256/       1/  -7%/         0%
 256/       4/  +1%/        +1%     256/       4/  -3%/        -4%
 256/       8/  +2%/        +2%     256/       8/  +1%/        +1%

vq size=256 TCP_STREAM             vq size=512 TCP_STREAM
size/sessions/+thu%/+normalize%    size/sessions/+thu%/+normalize%
  64/       1/   0%/        -3%      64/       1/   0%/         0%
  64/       4/  +3%/        -1%      64/       4/  -2%/        +4%
  64/       8/  +9%/        -4%      64/       8/  -1%/        +2%
 256/       1/  +1%/        -4%     256/       1/  +1%/        +1%
 256/       4/  -1%/        -1%     256/       4/  -3%/         0%
 256/       8/  +7%/        +5%     256/       8/  -3%/         0%
 512/       1/  +1%/         0%     512/       1/  -1%/        -1%
 512/       4/  +1%/        -1%     512/       4/   0%/         0%
 512/       8/  +7%/        -5%     512/       8/  +6%/        -1%
1024/       1/   0%/        -1%    1024/       1/   0%/        +1%
1024/       4/  +3%/         0%    1024/       4/  +1%/         0%
1024/       8/  +8%/        +5%    1024/       8/  -1%/         0%
2048/       1/  +2%/        +2%    2048/       1/  -1%/         0%
2048/       4/  +1%/         0%    2048/       4/   0%/        -1%
2048/       8/  -2%/         0%    2048/       8/   5%/        -1%
4096/       1/  -2%/         0%    4096/       1/  -2%/         0%
4096/       4/  +2%/         0%    4096/       4/   0%/         0%
4096/       8/  +9%/        -2%    4096/       8/  -5%/        -1%

Acked-by: Michael S. Tsirkin
Signed-off-by: Haibin Zhang
Signed-off-by: Yunfang Tai
Signed-off-by: Lidong Chen
Signed-off-by: David S. Miller

CVE-2019-3900

(cherry picked from commit a2ac99905f1ea8b15997a6ec39af69aa28a3653b)
Signed-off-by: Tyler Hicks
---
 drivers/vhost/net.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 5c249b82c1a4..359d168d26da 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -39,6 +39,10 @@ MODULE_PARM_DESC(experimental_zcopytx, "Enable Zero Copy TX;"
  * Using this limit prevents one virtqueue from starving others. */
 #define VHOST_NET_WEIGHT 0x80000
 
+/* Max number of packets transferred before requeueing the job.
+ * Using this limit prevents one virtqueue from starving rx. */
+#define VHOST_NET_PKT_WEIGHT(vq) ((vq)->num * 2)
+
 /* MAX number of TX used buffers for outstanding zerocopy */
 #define VHOST_MAX_PEND 128
 #define VHOST_GOODCOPY_LEN 256
@@ -317,6 +321,7 @@ static void handle_tx(struct vhost_net *net)
 	struct socket *sock;
 	struct vhost_net_ubuf_ref *uninitialized_var(ubufs);
 	bool zcopy, zcopy_used;
+	int sent_pkts = 0;
 
 	mutex_lock(&vq->mutex);
 	sock = vq->private_data;
@@ -419,7 +424,8 @@ static void handle_tx(struct vhost_net *net)
 		else
 			vhost_zerocopy_signal_used(net, vq);
 		vhost_net_tx_packet(net);
-		if (unlikely(total_len >= VHOST_NET_WEIGHT)) {
+		if (unlikely(total_len >= VHOST_NET_WEIGHT) ||
+		    unlikely(++sent_pkts >= VHOST_NET_PKT_WEIGHT(vq))) {
 			vhost_poll_queue(&vq->poll);
 			break;
 		}

From patchwork Thu Aug 8 04:45:08 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tyler Hicks
X-Patchwork-Id: 1143821
From: Tyler Hicks
To: kernel-team@lists.ubuntu.com
Subject: [PATCH 5/9] vhost_net: use packet weight for rx handler, too
Date: Thu, 8 Aug 2019 04:45:08 +0000
Message-Id: <1565239512-11188-6-git-send-email-tyhicks@canonical.com>
In-Reply-To: <1565239512-11188-1-git-send-email-tyhicks@canonical.com>
References: <1565239512-11188-1-git-send-email-tyhicks@canonical.com>

From: Paolo Abeni

Similar to commit a2ac99905f1e ("vhost-net: set packet weight of tx
polling to 2 * vq size"), we need a packet-based limit for handle_rx,
too - otherwise, under rx flood with small packets, tx can be delayed
for a very long time, even without busypolling.
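A small user-space model makes the starvation argument concrete. It counts how many packets one handler pass processes on an unbounded backlog before it yields the vhost worker. The byte budget is VHOST_NET_WEIGHT (0x80000) from net.c, and the packet budget is the constant 256 that this patch chooses. packets_per_pass() is a hypothetical helper written only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Budgets per handler pass, assumed from drivers/vhost/net.c after
 * this series. */
#define VHOST_NET_WEIGHT     0x80000 /* bytes */
#define VHOST_NET_PKT_WEIGHT 256     /* packets */

/* Hypothetical model of one handle_rx()/handle_tx() pass over an
 * unbounded backlog of equal-size packets. It returns how many packets
 * are processed before the handler requeues itself and lets the other
 * direction run. */
static size_t packets_per_pass(size_t pkt_len, bool use_pkt_limit)
{
	size_t total_len = 0;
	size_t pkts = 0;

	assert(pkt_len > 0);
	for (;;) {
		total_len += pkt_len;
		pkts++;
		if (total_len >= VHOST_NET_WEIGHT)
			break; /* byte budget exhausted */
		if (use_pkt_limit && pkts >= VHOST_NET_PKT_WEIGHT)
			break; /* packet budget exhausted */
	}
	return pkts;
}
```

In this model, a flood of 1-byte packets limited only by the byte budget runs 524288 packets in one pass before the other direction gets a turn. With the 256-packet limit, the pass yields after 256 packets. For 64 KiB packets, the byte budget still fires first, after 8 packets.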
The pkt limit applied to handle_rx must be the same as the one applied
by handle_tx, or we will get unfair scheduling between rx and tx.
Tying such a limit to the queue length makes it less effective for
large queue length values and can introduce large process scheduler
latencies, so a constant value is used - like the existing bytes
limit.

The selected limit has been validated with the PVP[1] performance
test with different queue sizes:

queue size    256    512    1024

baseline      366    354     362
weight 128    715    723     670
weight 256    740    745     733
weight 512    600    460     583
weight 1024   423    427     418

A packet weight of 256 gives peak performance under all the tested
scenarios.

No measurable regression in unidirectional performance tests has been
detected.

[1] https://developers.redhat.com/blog/2017/06/05/measuring-and-comparing-open-vswitch-performance/

Signed-off-by: Paolo Abeni
Acked-by: Jason Wang
Signed-off-by: David S. Miller

CVE-2019-3900

(backported from commit db688c24eada63b1efe6d0d7d835e5c3bdd71fd3)
[tyhicks: Backport to Xenial:
 - Context adjustment in call to mutex_lock_nested() in hunk #3 due to
   missing and unneeded commit aaa3149bbee9 ("vhost_net: add missing
   lock nesting notation")
 - Context adjustment in call to vhost_log_write() in hunk #4 due to
   missing and unneeded commit cc5e71075947 ("vhost: log dirty page
   correctly")
 - Context adjustment in hunk #4 due to using break instead of goto out]
Signed-off-by: Tyler Hicks
---
 drivers/vhost/net.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 359d168d26da..3e760cb19643 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -40,8 +40,10 @@ MODULE_PARM_DESC(experimental_zcopytx, "Enable Zero Copy TX;"
 #define VHOST_NET_WEIGHT 0x80000
 
 /* Max number of packets transferred before requeueing the job.
- * Using this limit prevents one virtqueue from starving rx. */
-#define VHOST_NET_PKT_WEIGHT(vq) ((vq)->num * 2)
+ * Using this limit prevents one virtqueue from starving others with small
+ * pkts.
+ */
+#define VHOST_NET_PKT_WEIGHT 256
 
 /* MAX number of TX used buffers for outstanding zerocopy */
 #define VHOST_MAX_PEND 128
@@ -425,7 +427,7 @@ static void handle_tx(struct vhost_net *net)
 			vhost_zerocopy_signal_used(net, vq);
 		vhost_net_tx_packet(net);
 		if (unlikely(total_len >= VHOST_NET_WEIGHT) ||
-		    unlikely(++sent_pkts >= VHOST_NET_PKT_WEIGHT(vq))) {
+		    unlikely(++sent_pkts >= VHOST_NET_PKT_WEIGHT)) {
 			vhost_poll_queue(&vq->poll);
 			break;
 		}
@@ -556,6 +558,7 @@ static void handle_rx(struct vhost_net *net)
 	struct socket *sock;
 	struct iov_iter fixup;
 	__virtio16 num_buffers;
+	int recv_pkts = 0;
 
 	mutex_lock(&vq->mutex);
 	sock = vq->private_data;
@@ -648,7 +651,8 @@ static void handle_rx(struct vhost_net *net)
 		if (unlikely(vq_log))
 			vhost_log_write(vq, vq_log, log, vhost_len);
 		total_len += vhost_len;
-		if (unlikely(total_len >= VHOST_NET_WEIGHT)) {
+		if (unlikely(total_len >= VHOST_NET_WEIGHT) ||
+		    unlikely(++recv_pkts >= VHOST_NET_PKT_WEIGHT)) {
 			vhost_poll_queue(&vq->poll);
 			break;
 		}

From patchwork Thu Aug 8 04:45:09 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tyler Hicks
X-Patchwork-Id: 1143823
From: Tyler Hicks
To: kernel-team@lists.ubuntu.com
Subject: [PATCH 6/9] vhost_net: introduce vhost_exceeds_weight()
Date: Thu, 8 Aug 2019 04:45:09 +0000
Message-Id: <1565239512-11188-7-git-send-email-tyhicks@canonical.com>
In-Reply-To: <1565239512-11188-1-git-send-email-tyhicks@canonical.com>
References: <1565239512-11188-1-git-send-email-tyhicks@canonical.com>

From: Jason Wang

Signed-off-by: Jason Wang
Signed-off-by: David S. Miller

CVE-2019-3900

(backported from commit 272f35cba53d088085e5952fd81d7a133ab90789)
[tyhicks: Backport to Xenial:
 - Minor context adjustment in net.c due to missing commit b0d0ea50e782
   ("vhost_net: introduce helper to initialize tx iov iter")
 - Context adjustment in call to vhost_log_write() in hunk #4 due to
   missing and unneeded commit cc5e71075947 ("vhost: log dirty page
   correctly")
 - Context adjustment in hunk #4 due to using break instead of goto out]
Signed-off-by: Tyler Hicks
---
 drivers/vhost/net.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 3e760cb19643..3c68b9b210c4 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -302,6 +302,12 @@ static bool vhost_exceeds_maxpend(struct vhost_net *net)
 	       min_t(unsigned int, VHOST_MAX_PEND, vq->num >> 2);
 }
 
+static bool vhost_exceeds_weight(int pkts, int total_len)
+{
+	return total_len >= VHOST_NET_WEIGHT ||
+	       pkts >= VHOST_NET_PKT_WEIGHT;
+}
+
 /* Expects to be always run from workqueue - which acts as
  * read-size critical section for our kind of RCU. */
 static void handle_tx(struct vhost_net *net)
@@ -397,7 +403,6 @@ static void handle_tx(struct vhost_net *net)
 			msg.msg_control = NULL;
 			ubufs = NULL;
 		}
-
 		total_len += len;
 		if (total_len < VHOST_NET_WEIGHT &&
 		    !vhost_vq_avail_empty(&net->dev, vq) &&
@@ -426,8 +431,7 @@ static void handle_tx(struct vhost_net *net)
 		else
 			vhost_zerocopy_signal_used(net, vq);
 		vhost_net_tx_packet(net);
-		if (unlikely(total_len >= VHOST_NET_WEIGHT) ||
-		    unlikely(++sent_pkts >= VHOST_NET_PKT_WEIGHT)) {
+		if (unlikely(vhost_exceeds_weight(++sent_pkts, total_len))) {
 			vhost_poll_queue(&vq->poll);
 			break;
 		}
@@ -651,8 +655,7 @@ static void handle_rx(struct vhost_net *net)
 		if (unlikely(vq_log))
 			vhost_log_write(vq, vq_log, log, vhost_len);
 		total_len += vhost_len;
-		if (unlikely(total_len >= VHOST_NET_WEIGHT) ||
-		    unlikely(++recv_pkts >= VHOST_NET_PKT_WEIGHT)) {
+		if (unlikely(vhost_exceeds_weight(++recv_pkts, total_len))) {
 			vhost_poll_queue(&vq->poll);
 			break;
 		}

From patchwork Thu Aug 8 04:45:10 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tyler Hicks
X-Patchwork-Id: 1143824
From: Tyler Hicks
To: kernel-team@lists.ubuntu.com
Subject: [PATCH 7/9] vhost: introduce vhost_exceeds_weight()
Date: Thu, 8 Aug 2019 04:45:10 +0000
Message-Id: <1565239512-11188-8-git-send-email-tyhicks@canonical.com>
In-Reply-To: <1565239512-11188-1-git-send-email-tyhicks@canonical.com>
References: <1565239512-11188-1-git-send-email-tyhicks@canonical.com>

From: Jason Wang

We used to have vhost_exceeds_weight() for vhost-net to:

- prevent the vhost kthread from hogging the cpu
- balance the time spent between TX and RX

This function could be useful for vsock and scsi as well, so move it
to vhost.c. A device must specify a weight, which counts the number of
requests, and it can also specify a byte_weight, which counts the
number of bytes that have been processed.

Signed-off-by: Jason Wang
Reviewed-by: Stefan Hajnoczi
Signed-off-by: Michael S. Tsirkin

CVE-2019-3900

(backported from commit e82b9b0727ff6d665fff2d326162b460dded554d)
[tyhicks: Backport to Xenial:
 - Adjust handle_tx() instead of handle_tx_{copy,zerocopy}() due to
   missing commit 0d20bdf34dc7 ("vhost_net: split out datacopy logic")
 - Considerable context adjustments throughout the patch due to the
   missing iov_limit member of the vhost_dev struct, which was added
   later in commit b46a0bf78ad7 ("vhost: fix OOB in get_rx_bufs()")
 - Context adjustment in call to vhost_log_write() in hunk #3 of net.c
   due to missing and unneeded commit cc5e71075947 ("vhost: log dirty
   page correctly")
 - Context adjustment in hunk #3 of net.c due to using break instead of
   goto out
 - Context adjustment in hunk #4 of net.c due to missing and unneeded
   commit c67df11f6e48 ("vhost_net: try batch dequing from skb array")
 - Don't patch vsock.c since Xenial doesn't have vhost vsock support
 - Adjust context in vhost_dev_init() to account for different local
   variables
 - Adjust context in struct vhost_dev to account for different struct
   members]
Signed-off-by: Tyler Hicks
---
 drivers/vhost/net.c   | 18 +++++-------------
 drivers/vhost/scsi.c  |  8 +++++++-
 drivers/vhost/vhost.c | 20 +++++++++++++++++++-
 drivers/vhost/vhost.h |  6 +++++-
 4 files changed, 36 insertions(+), 16 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 3c68b9b210c4..2b73e1d0776b 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -302,12 +302,6 @@ static bool vhost_exceeds_maxpend(struct vhost_net *net)
 	       min_t(unsigned int, VHOST_MAX_PEND, vq->num >> 2);
 }
 
-static bool vhost_exceeds_weight(int pkts, int total_len)
-{
-	return total_len >= VHOST_NET_WEIGHT ||
-	       pkts >= VHOST_NET_PKT_WEIGHT;
-}
-
 /* Expects to be always run from workqueue - which acts as
  * read-size critical section for our kind of RCU. */
 static void handle_tx(struct vhost_net *net)
@@ -431,10 +425,9 @@ static void handle_tx(struct vhost_net *net)
 		else
 			vhost_zerocopy_signal_used(net, vq);
 		vhost_net_tx_packet(net);
-		if (unlikely(vhost_exceeds_weight(++sent_pkts, total_len))) {
-			vhost_poll_queue(&vq->poll);
+		if (unlikely(vhost_exceeds_weight(vq, ++sent_pkts,
+						  total_len)))
 			break;
-		}
 	}
 out:
 	mutex_unlock(&vq->mutex);
@@ -655,10 +648,8 @@ static void handle_rx(struct vhost_net *net)
 		if (unlikely(vq_log))
 			vhost_log_write(vq, vq_log, log, vhost_len);
 		total_len += vhost_len;
-		if (unlikely(vhost_exceeds_weight(++recv_pkts, total_len))) {
-			vhost_poll_queue(&vq->poll);
+		if (unlikely(vhost_exceeds_weight(vq, ++recv_pkts, total_len)))
 			break;
-		}
 	}
 out:
 	mutex_unlock(&vq->mutex);
@@ -728,7 +719,8 @@ static int vhost_net_open(struct inode *inode, struct file *f)
 		n->vqs[i].vhost_hlen = 0;
 		n->vqs[i].sock_hlen = 0;
 	}
-	vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX);
+	vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX,
+		       VHOST_NET_WEIGHT, VHOST_NET_PKT_WEIGHT);
 
 	vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT, dev);
 	vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN, dev);
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 8fc62a03637a..631fa4a768d7 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -58,6 +58,12 @@
 #define VHOST_SCSI_PREALLOC_UPAGES 2048
 #define VHOST_SCSI_PREALLOC_PROT_SGLS 512
 
+/* Max number of requests before requeueing the job.
+ * Using this limit prevents one virtqueue from starving others with
+ * request.
+ */ +#define VHOST_SCSI_WEIGHT 256 + struct vhost_scsi_inflight { /* Wait for the flush operation to finish */ struct completion comp; @@ -1443,7 +1449,7 @@ static int vhost_scsi_open(struct inode *inode, struct file *f) vqs[i] = &vs->vqs[i].vq; vs->vqs[i].vq.handle_kick = vhost_scsi_handle_kick; } - vhost_dev_init(&vs->dev, vqs, VHOST_SCSI_MAX_VQ); + vhost_dev_init(&vs->dev, vqs, VHOST_SCSI_MAX_VQ, VHOST_SCSI_WEIGHT, 0); vhost_scsi_init_inflight(vs, NULL); diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 84a0a97b7988..632127f35152 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -370,8 +370,24 @@ static void vhost_dev_free_iovecs(struct vhost_dev *dev) vhost_vq_free_iovecs(dev->vqs[i]); } +bool vhost_exceeds_weight(struct vhost_virtqueue *vq, + int pkts, int total_len) +{ + struct vhost_dev *dev = vq->dev; + + if ((dev->byte_weight && total_len >= dev->byte_weight) || + pkts >= dev->weight) { + vhost_poll_queue(&vq->poll); + return true; + } + + return false; +} +EXPORT_SYMBOL_GPL(vhost_exceeds_weight); + void vhost_dev_init(struct vhost_dev *dev, - struct vhost_virtqueue **vqs, int nvqs) + struct vhost_virtqueue **vqs, int nvqs, + int weight, int byte_weight) { struct vhost_virtqueue *vq; int i; @@ -386,6 +402,8 @@ void vhost_dev_init(struct vhost_dev *dev, spin_lock_init(&dev->work_lock); INIT_LIST_HEAD(&dev->work_list); dev->worker = NULL; + dev->weight = weight; + dev->byte_weight = byte_weight; for (i = 0; i < dev->nvqs; ++i) { vq = dev->vqs[i]; diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h index 0f9b4d22bee5..aefc1af928b8 100644 --- a/drivers/vhost/vhost.h +++ b/drivers/vhost/vhost.h @@ -127,9 +127,13 @@ struct vhost_dev { spinlock_t work_lock; struct list_head work_list; struct task_struct *worker; + int weight; + int byte_weight; }; -void vhost_dev_init(struct vhost_dev *, struct vhost_virtqueue **vqs, int nvqs); +bool vhost_exceeds_weight(struct vhost_virtqueue *vq, int pkts, int total_len); +void 
vhost_dev_init(struct vhost_dev *, struct vhost_virtqueue **vqs, + int nvqs, int weight, int byte_weight); long vhost_dev_set_owner(struct vhost_dev *dev); bool vhost_dev_has_owner(struct vhost_dev *dev); long vhost_dev_check_owner(struct vhost_dev *); From patchwork Thu Aug 8 04:45:11 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tyler Hicks X-Patchwork-Id: 1143825 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=canonical.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 463wmN6Jbqz9s00; Thu, 8 Aug 2019 14:45:40 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1hvaIy-00007a-78; Thu, 08 Aug 2019 04:45:36 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.0:DHE_RSA_AES_128_CBC_SHA1:128) (Exim 4.86_2) (envelope-from ) id 1hvaIt-0008W1-VC for kernel-team@lists.ubuntu.com; Thu, 08 Aug 2019 04:45:31 +0000 Received: from 2.general.tyhicks.us.vpn ([10.172.64.53] helo=sec.ubuntu-ci) by youngberry.canonical.com with esmtpsa (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.76) (envelope-from ) id 1hvaIt-0004ab-8C; Thu, 08 Aug 2019 04:45:31 +0000 From: Tyler Hicks To: kernel-team@lists.ubuntu.com Subject: [PATCH 8/9] vhost_net: fix possible infinite loop Date: Thu, 8 Aug 2019 04:45:11 +0000 Message-Id: 
<1565239512-11188-9-git-send-email-tyhicks@canonical.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1565239512-11188-1-git-send-email-tyhicks@canonical.com> References: <1565239512-11188-1-git-send-email-tyhicks@canonical.com> X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Jason Wang When the rx buffer is too small for a packet, we will discard the vq descriptor and retry it for the next packet: while ((sock_len = vhost_net_rx_peek_head_len(net, sock->sk, &busyloop_intr))) { ... /* On overrun, truncate and discard */ if (unlikely(headcount > UIO_MAXIOV)) { iov_iter_init(&msg.msg_iter, READ, vq->iov, 1, 1); err = sock->ops->recvmsg(sock, &msg, 1, MSG_DONTWAIT | MSG_TRUNC); pr_debug("Discarded rx packet: len %zd\n", sock_len); continue; } ... } This makes it possible to trigger a infinite while..continue loop through the co-opreation of two VMs like: 1) Malicious VM1 allocate 1 byte rx buffer and try to slow down the vhost process as much as possible e.g using indirect descriptors or other. 2) Malicious VM2 generate packets to VM1 as fast as possible Fixing this by checking against weight at the end of RX and TX loop. This also eliminate other similar cases when: - userspace is consuming the packets in the meanwhile - theoretical TOCTOU attack if guest moving avail index back and forth to hit the continue after vhost find guest just add new buffers This addresses CVE-2019-3900. Fixes: d8316f3991d20 ("vhost: fix total length when packets are too short") Fixes: 3a4d5c94e9593 ("vhost_net: a kernel-level virtio server") Signed-off-by: Jason Wang Reviewed-by: Stefan Hajnoczi Signed-off-by: Michael S. 
Tsirkin CVE-2019-3900 (backported from commit e2412c07f8f3040593dfb88207865a3cd58680c0) [tyhicks: Backport to Xenial: - Adjust handle_tx() instead of handle_tx_{copy,zerocopy}() due to missing commit 0d20bdf34dc7 ("vhost_net: split out datacopy logic") - Minor context adjustments due to a lack of missing the iov_limit member of the vhost_dev struct which was added later in commit b46a0bf78ad7 ("vhost: fix OOB in get_rx_bufs()") - handle_rx() still uses peek_head_len() due to missing and unneeded commit 030881372460 ("vhost_net: basic polling support") - Context adjustment in call to vhost_log_write() in hunk #3 of net.c due to missing and unneeded commit cc5e71075947 ("vhost: log dirty page correctly") - Context adjustment in hunk #4 due to using break instead of goto out - Context adjustment in hunk #5 due to missing and unneeded commit c67df11f6e48 ("vhost_net: try batch dequing from skb array")] Signed-off-by: Tyler Hicks --- drivers/vhost/net.c | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 2b73e1d0776b..1ba2c21f0fcb 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -335,7 +335,7 @@ static void handle_tx(struct vhost_net *net) hdr_size = nvq->vhost_hlen; zcopy = nvq->ubufs; - for (;;) { + do { /* Release DMAs done buffers first */ if (zcopy) vhost_zerocopy_signal_used(net, vq); @@ -425,10 +425,7 @@ static void handle_tx(struct vhost_net *net) else vhost_zerocopy_signal_used(net, vq); vhost_net_tx_packet(net); - if (unlikely(vhost_exceeds_weight(vq, ++sent_pkts, - total_len))) - break; - } + } while (likely(!vhost_exceeds_weight(vq, ++sent_pkts, total_len))); out: mutex_unlock(&vq->mutex); } @@ -570,7 +567,11 @@ static void handle_rx(struct vhost_net *net) vq->log : NULL; mergeable = vhost_has_feature(vq, VIRTIO_NET_F_MRG_RXBUF); - while ((sock_len = peek_head_len(sock->sk))) { + do { + sock_len = peek_head_len(sock->sk); + + if (!sock_len) + break; sock_len += 
sock_hlen; vhost_len = sock_len + vhost_hlen; headcount = get_rx_bufs(vq, vq->heads, vhost_len, @@ -648,9 +649,8 @@ static void handle_rx(struct vhost_net *net) if (unlikely(vq_log)) vhost_log_write(vq, vq_log, log, vhost_len); total_len += vhost_len; - if (unlikely(vhost_exceeds_weight(vq, ++recv_pkts, total_len))) - break; - } + } while (likely(!vhost_exceeds_weight(vq, ++recv_pkts, total_len))); + out: mutex_unlock(&vq->mutex); } @@ -720,7 +720,7 @@ static int vhost_net_open(struct inode *inode, struct file *f) n->vqs[i].sock_hlen = 0; } vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX, - VHOST_NET_WEIGHT, VHOST_NET_PKT_WEIGHT); + VHOST_NET_PKT_WEIGHT, VHOST_NET_WEIGHT); vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT, dev); vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN, dev); From patchwork Thu Aug 8 04:45:12 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tyler Hicks X-Patchwork-Id: 1143826 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=canonical.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 463wmQ4b3tz9sN1; Thu, 8 Aug 2019 14:45:42 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1hvaJ0-00009p-G6; Thu, 08 Aug 2019 04:45:38 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps 
(TLS1.0:DHE_RSA_AES_128_CBC_SHA1:128) (Exim 4.86_2) (envelope-from ) id 1hvaIv-00005H-B7 for kernel-team@lists.ubuntu.com; Thu, 08 Aug 2019 04:45:33 +0000 Received: from 2.general.tyhicks.us.vpn ([10.172.64.53] helo=sec.ubuntu-ci) by youngberry.canonical.com with esmtpsa (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.76) (envelope-from ) id 1hvaIu-0004ab-KA; Thu, 08 Aug 2019 04:45:32 +0000 From: Tyler Hicks To: kernel-team@lists.ubuntu.com Subject: [PATCH 9/9] vhost: scsi: add weight support Date: Thu, 8 Aug 2019 04:45:12 +0000 Message-Id: <1565239512-11188-10-git-send-email-tyhicks@canonical.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1565239512-11188-1-git-send-email-tyhicks@canonical.com> References: <1565239512-11188-1-git-send-email-tyhicks@canonical.com> X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Jason Wang This patch will check the weight and exit the loop if we exceeds the weight. This is useful for preventing scsi kthread from hogging cpu which is guest triggerable. This addresses CVE-2019-3900. Cc: Paolo Bonzini Cc: Stefan Hajnoczi Fixes: 057cbf49a1f0 ("tcm_vhost: Initial merge for vhost level target fabric driver") Signed-off-by: Jason Wang Reviewed-by: Stefan Hajnoczi Signed-off-by: Michael S. 
Tsirkin Reviewed-by: Stefan Hajnoczi CVE-2019-3900 (backported from commit c1ea02f15ab5efb3e93fc3144d895410bf79fcf2) [tyhicks: Backport to Xenial: - Minor context adjustment in local variables - Adjust context around the loop in vhost_scsi_handle_vq() - No need to modify vhost_scsi_ctl_handle_vq() since it was added later in commit 0d02dbd68c47 ("vhost/scsi: Respond to control queue operations")] Signed-off-by: Tyler Hicks --- drivers/vhost/scsi.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c index 631fa4a768d7..3698dd66c720 100644 --- a/drivers/vhost/scsi.c +++ b/drivers/vhost/scsi.c @@ -861,7 +861,7 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq) u64 tag; u32 exp_data_len, data_direction; unsigned out, in; - int head, ret, prot_bytes; + int head, ret, prot_bytes, c = 0; size_t req_size, rsp_size = sizeof(struct virtio_scsi_cmd_resp); size_t out_size, in_size; u16 lun; @@ -880,7 +880,7 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq) vhost_disable_notify(&vs->dev, vq); - for (;;) { + do { head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov), &out, &in, NULL, NULL); @@ -1096,7 +1096,7 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq) */ INIT_WORK(&cmd->work, vhost_scsi_submission_work); queue_work(vhost_scsi_workqueue, &cmd->work); - } + } while (likely(!vhost_exceeds_weight(vq, ++c, 0))); out: mutex_unlock(&vq->mutex); }