From patchwork Mon Dec 3 20:07:14 2012 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Michael S. Tsirkin" X-Patchwork-Id: 203434 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 291462C0089 for ; Tue, 4 Dec 2012 07:04:47 +1100 (EST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751994Ab2LCUEd (ORCPT ); Mon, 3 Dec 2012 15:04:33 -0500 Received: from mx1.redhat.com ([209.132.183.28]:5765 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751526Ab2LCUEc (ORCPT ); Mon, 3 Dec 2012 15:04:32 -0500 Received: from int-mx11.intmail.prod.int.phx2.redhat.com (int-mx11.intmail.prod.int.phx2.redhat.com [10.5.11.24]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id qB3K4QST029595 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Mon, 3 Dec 2012 15:04:26 -0500 Received: from redhat.com (vpn1-5-113.ams2.redhat.com [10.36.5.113]) by int-mx11.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with SMTP id qB3K4L8h031010; Mon, 3 Dec 2012 15:04:22 -0500 Date: Mon, 3 Dec 2012 22:07:14 +0200 From: "Michael S. Tsirkin" To: "David S. Miller" Cc: Jason Wang , "Michael S. Tsirkin" , Neil Horman , Rami Rosen , Dave Jones , Michael Kerrisk , netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCHv3] tun: only queue packets on device Message-ID: <20121203200714.GA2183@redhat.com> MIME-Version: 1.0 Content-Disposition: inline X-Scanned-By: MIMEDefang 2.68 on 10.5.11.24 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Historically tun supported two modes of operation: - in default mode, a small number of packets would get queued at the device, the rest would be queued in qdisc - in one queue mode, all packets would get queued at the device This might have made sense up to a point where we made the queue depth for both modes the same and set it to a huge value (500) so unless the consumer is stuck the chance of losing packets is small. Thus in practice both modes behave the same, but the default mode has some problems: - if packets are never consumed, fragments are never orphaned which cases a DOS for sender using zero copy transmit - overrun errors are hard to diagnose: fifo error is incremented only once so you can not distinguish between userspace that is stuck and a transient failure, tcpdump on the device does not show any traffic Userspace solves this simply by enabling IFF_ONE_QUEUE but there seems to be little point in not doing the right thing for everyone, by default. Signed-off-by: Michael S. Tsirkin --- Changes since v2: Fix comment style Changes from v1: Address comment by David Miller: Now that TUN_NO_QUEUE has no real effect and is a NOP, document it as such both in if_tun.h and the places in the driver that flip the bit based upon userspace requests. drivers/net/tun.c | 24 ++++++++---------------- include/uapi/linux/if_tun.h | 2 ++ 2 files changed, 10 insertions(+), 16 deletions(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 607a3a5..038196b 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -693,21 +693,8 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev) * number of queues. */ if (skb_queue_len(&tfile->socket.sk->sk_receive_queue) - >= dev->tx_queue_len / tun->numqueues){ - if (!(tun->flags & TUN_ONE_QUEUE)) { - /* Normal queueing mode. */ - /* Packet scheduler handles dropping of further packets. */ - netif_stop_subqueue(dev, txq); - - /* We won't see all dropped packets individually, so overrun - * error is more appropriate. */ - dev->stats.tx_fifo_errors++; - } else { - /* Single queue mode. - * Driver handles dropping of all packets itself. */ - goto drop; - } - } + >= dev->tx_queue_len / tun->numqueues) + goto drop; /* Orphan the skb - required as we might hang on to it * for indefinite time. */ @@ -1322,7 +1309,6 @@ static ssize_t tun_do_read(struct tun_struct *tun, struct tun_file *tfile, schedule(); continue; } - netif_wake_subqueue(tun->dev, tfile->queue_index); ret = tun_put_user(tun, tfile, skb, iv, len); kfree_skb(skb); @@ -1485,6 +1471,9 @@ static int tun_flags(struct tun_struct *tun) if (tun->flags & TUN_NO_PI) flags |= IFF_NO_PI; + /* This flag has no real effect. We track the value for backwards + * compatibility. + */ if (tun->flags & TUN_ONE_QUEUE) flags |= IFF_ONE_QUEUE; @@ -1633,6 +1622,9 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr) else tun->flags &= ~TUN_NO_PI; + /* This flag has no real effect. We track the value for backwards + * compatibility. + */ if (ifr->ifr_flags & IFF_ONE_QUEUE) tun->flags |= TUN_ONE_QUEUE; else diff --git a/include/uapi/linux/if_tun.h b/include/uapi/linux/if_tun.h index 958497a..2835b85 100644 --- a/include/uapi/linux/if_tun.h +++ b/include/uapi/linux/if_tun.h @@ -31,6 +31,7 @@ #define TUN_FASYNC 0x0010 #define TUN_NOCHECKSUM 0x0020 #define TUN_NO_PI 0x0040 +/* This flag has no real effect */ #define TUN_ONE_QUEUE 0x0080 #define TUN_PERSIST 0x0100 #define TUN_VNET_HDR 0x0200 @@ -60,6 +61,7 @@ #define IFF_TUN 0x0001 #define IFF_TAP 0x0002 #define IFF_NO_PI 0x1000 +/* This flag has no real effect */ #define IFF_ONE_QUEUE 0x2000 #define IFF_VNET_HDR 0x4000 #define IFF_TUN_EXCL 0x8000