From patchwork Thu Apr 26 17:42:15 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Willem de Bruijn X-Patchwork-Id: 905281 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="rESHMcGD"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40X4CK0QW1z9ryr for ; Fri, 27 Apr 2018 03:42:37 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932229AbeDZRmd (ORCPT ); Thu, 26 Apr 2018 13:42:33 -0400 Received: from mail-qk0-f193.google.com ([209.85.220.193]:36257 "EHLO mail-qk0-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754377AbeDZRm3 (ORCPT ); Thu, 26 Apr 2018 13:42:29 -0400 Received: by mail-qk0-f193.google.com with SMTP id a202so26265205qkg.3 for ; Thu, 26 Apr 2018 10:42:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=Zo1JRl1RKcR5QYiFu/ta8lfHiTT60TnDwBOiOZ3blq8=; b=rESHMcGDeXAjoriNPde+2uif1cqsh4RZqKcqBBKkJd/LUbq7l9kWfL852LEbW9AvvQ Nof7rXgg4ppjOQWqOLS8qqH6Fiv2+Zu9A19E84JdmCO7BgPnJtoVmaXIx/Uo0iAzMydN 8Ug9aLncjbs+3Isq7ql3Ewu+s/6P/7IVKiDuY4QTLpwSz7NUtSTUCW6KKj9CCT0TswIO yb/r0020RHrG9NFSQKIB80N7h1SCa5iZB1meP/l2NBYbHI4LbR9+lVsORvOZkYKLjgNS LyHTAqz0Q14d6fUcQw48fccbT4SahaOgAE3p80Lg63XOqorMVzgMBSE/R2Ttf619hE0b jkDw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=Zo1JRl1RKcR5QYiFu/ta8lfHiTT60TnDwBOiOZ3blq8=; b=HO+rHft9ivVGKPS7tWOPaBbl6KXs2UXPqoM4m5RtFSDUxrXh4DSWmKjLCOwBRAb82h wTxZTsUD6Xx+XfrRVpse7EkdFerDLe4FsXHigWykZMSy0M2ULgEeBVetjewkX/8fUIu8 AqllTy3d0lFcSgJfmz5dHv+a/hOie/8PbWl96ItZi/ixSgCLoPFHDHjP6k3YgyYXHPwx LM0NADv4pcloVEgAGWZeSot+O3BqnZrpHBZzsbqZ7Vcdksm8DfzJboUwn5LwBeIoCDIE 1FXpkyQ8i4MNRbZkutnTn0vFwKWk3C2/Tmi2Bms2M9g1aYMPxk7cnyLsChSqTff4guOe TeVA== X-Gm-Message-State: ALQs6tDiEB55JOh86VQe2DoXEWFVIeEbl3TzjWN4MO6JcoXvnLq2jVNf RKq8jFrg2ZTZNYHvtN/7hHDF8Yuk X-Google-Smtp-Source: AB8JxZpN5sGIzVOOeHQ91fD992+diFvRTuAB7I4RW/SqOapq1pZ70nxQB820KC8wf6GmMAuh7ErELw== X-Received: by 10.55.45.134 with SMTP id t128mr30653912qkh.356.1524764547791; Thu, 26 Apr 2018 10:42:27 -0700 (PDT) Received: from willemb1.nyc.corp.google.com ([2620:0:1003:315:3fa1:a34c:1128:1d39]) by smtp.gmail.com with ESMTPSA id o14-v6sm12927515qta.23.2018.04.26.10.42.27 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 26 Apr 2018 10:42:27 -0700 (PDT) From: Willem de Bruijn To: netdev@vger.kernel.org Cc: davem@davemloft.net, alexander.duyck@gmail.com, Willem de Bruijn Subject: [PATCH net-next v2 01/11] udp: expose inet cork to udp Date: Thu, 26 Apr 2018 13:42:15 -0400 Message-Id: <20180426174225.246388-2-willemdebruijn.kernel@gmail.com> X-Mailer: git-send-email 2.17.0.484.g0c8726318c-goog In-Reply-To: <20180426174225.246388-1-willemdebruijn.kernel@gmail.com> References: <20180426174225.246388-1-willemdebruijn.kernel@gmail.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Willem de Bruijn UDP segmentation offload needs access to inet_cork in the udp layer. Pass the struct to ip(6)_make_skb instead of allocating it on the stack in that function itself. This patch is a noop otherwise. Signed-off-by: Willem de Bruijn --- include/net/ip.h | 2 +- include/net/ipv6.h | 1 + net/ipv4/ip_output.c | 17 ++++++++--------- net/ipv4/udp.c | 4 +++- net/ipv6/ip6_output.c | 20 ++++++++++---------- net/ipv6/udp.c | 3 ++- 6 files changed, 25 insertions(+), 22 deletions(-) diff --git a/include/net/ip.h b/include/net/ip.h index dc4a2d6e58a5..7ec543a64bbc 100644 --- a/include/net/ip.h +++ b/include/net/ip.h @@ -171,7 +171,7 @@ struct sk_buff *ip_make_skb(struct sock *sk, struct flowi4 *fl4, int len, int odd, struct sk_buff *skb), void *from, int length, int transhdrlen, struct ipcm_cookie *ipc, struct rtable **rtp, - unsigned int flags); + struct inet_cork *cork, unsigned int flags); static inline struct sk_buff *ip_finish_skb(struct sock *sk, struct flowi4 *fl4) { diff --git a/include/net/ipv6.h b/include/net/ipv6.h index 68b167d98879..0dd722cab037 100644 --- a/include/net/ipv6.h +++ b/include/net/ipv6.h @@ -950,6 +950,7 @@ struct sk_buff *ip6_make_skb(struct sock *sk, void *from, int length, int transhdrlen, struct ipcm6_cookie *ipc6, struct flowi6 *fl6, struct rt6_info *rt, unsigned int flags, + struct inet_cork_full *cork, const struct sockcm_cookie *sockc); static inline struct sk_buff *ip6_finish_skb(struct sock *sk) diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 83c73bab2c3d..2883ff1e909c 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -1470,9 +1470,8 @@ struct sk_buff *ip_make_skb(struct sock *sk, int len, int odd, struct sk_buff *skb), void *from, int length, int transhdrlen, struct ipcm_cookie *ipc, struct rtable **rtp, - unsigned int flags) + struct inet_cork *cork, unsigned int flags) { - struct inet_cork cork; struct sk_buff_head queue; int err; @@ -1481,22 +1480,22 @@ struct sk_buff *ip_make_skb(struct sock *sk, __skb_queue_head_init(&queue); - cork.flags = 0; - cork.addr = 0; - cork.opt = NULL; - err = ip_setup_cork(sk, &cork, ipc, rtp); + cork->flags = 0; + cork->addr = 0; + cork->opt = NULL; + err = ip_setup_cork(sk, cork, ipc, rtp); if (err) return ERR_PTR(err); - err = __ip_append_data(sk, fl4, &queue, &cork, + err = __ip_append_data(sk, fl4, &queue, cork, ¤t->task_frag, getfrag, from, length, transhdrlen, flags); if (err) { - __ip_flush_pending_frames(sk, &queue, &cork); + __ip_flush_pending_frames(sk, &queue, cork); return ERR_PTR(err); } - return __ip_make_skb(sk, fl4, &queue, &cork); + return __ip_make_skb(sk, fl4, &queue, cork); } /* diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 24b5c59b1c53..6b9d8017b319 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -1030,9 +1030,11 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) /* Lockless fast path for the non-corking case. */ if (!corkreq) { + struct inet_cork cork; + skb = ip_make_skb(sk, fl4, getfrag, msg, ulen, sizeof(struct udphdr), &ipc, &rt, - msg->msg_flags); + &cork, msg->msg_flags); err = PTR_ERR(skb); if (!IS_ERR_OR_NULL(skb)) err = udp_send_skb(skb, fl4); diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 6a477d54f8c7..7fa1db447405 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -1755,9 +1755,9 @@ struct sk_buff *ip6_make_skb(struct sock *sk, void *from, int length, int transhdrlen, struct ipcm6_cookie *ipc6, struct flowi6 *fl6, struct rt6_info *rt, unsigned int flags, + struct inet_cork_full *cork, const struct sockcm_cookie *sockc) { - struct inet_cork_full cork; struct inet6_cork v6_cork; struct sk_buff_head queue; int exthdrlen = (ipc6->opt ? ipc6->opt->opt_flen : 0); @@ -1768,27 +1768,27 @@ struct sk_buff *ip6_make_skb(struct sock *sk, __skb_queue_head_init(&queue); - cork.base.flags = 0; - cork.base.addr = 0; - cork.base.opt = NULL; - cork.base.dst = NULL; + cork->base.flags = 0; + cork->base.addr = 0; + cork->base.opt = NULL; + cork->base.dst = NULL; v6_cork.opt = NULL; - err = ip6_setup_cork(sk, &cork, &v6_cork, ipc6, rt, fl6); + err = ip6_setup_cork(sk, cork, &v6_cork, ipc6, rt, fl6); if (err) { - ip6_cork_release(&cork, &v6_cork); + ip6_cork_release(cork, &v6_cork); return ERR_PTR(err); } if (ipc6->dontfrag < 0) ipc6->dontfrag = inet6_sk(sk)->dontfrag; - err = __ip6_append_data(sk, fl6, &queue, &cork.base, &v6_cork, + err = __ip6_append_data(sk, fl6, &queue, &cork->base, &v6_cork, ¤t->task_frag, getfrag, from, length + exthdrlen, transhdrlen + exthdrlen, flags, ipc6, sockc); if (err) { - __ip6_flush_pending_frames(sk, &queue, &cork, &v6_cork); + __ip6_flush_pending_frames(sk, &queue, cork, &v6_cork); return ERR_PTR(err); } - return __ip6_make_skb(sk, &queue, &cork, &v6_cork); + return __ip6_make_skb(sk, &queue, cork, &v6_cork); } diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index 4ec76a87aeb8..824797f8d1ab 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -1324,12 +1324,13 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) /* Lockless fast path for the non-corking case */ if (!corkreq) { + struct inet_cork_full cork; struct sk_buff *skb; skb = ip6_make_skb(sk, getfrag, msg, ulen, sizeof(struct udphdr), &ipc6, &fl6, (struct rt6_info *)dst, - msg->msg_flags, &sockc); + msg->msg_flags, &cork, &sockc); err = PTR_ERR(skb); if (!IS_ERR_OR_NULL(skb)) err = udp_v6_send_skb(skb, &fl6); From patchwork Thu Apr 26 17:42:16 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Willem de Bruijn X-Patchwork-Id: 905291 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="I2xUVADW"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40X4D96ngPz9ry1 for ; Fri, 27 Apr 2018 03:43:21 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932352AbeDZRnT (ORCPT ); Thu, 26 Apr 2018 13:43:19 -0400 Received: from mail-qk0-f196.google.com ([209.85.220.196]:34374 "EHLO mail-qk0-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756767AbeDZRm3 (ORCPT ); Thu, 26 Apr 2018 13:42:29 -0400 Received: by mail-qk0-f196.google.com with SMTP id p186so23551554qkd.1 for ; Thu, 26 Apr 2018 10:42:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=G5hU7W1W2AIzO6iKrBciI+lIZ+cUsT9CXq+J+2HssDQ=; b=I2xUVADWXeABOBTbpWGiF/g9WgKtVpae4ZmUnwUwWRxVAmn+SpF0cUox5dS0se3yGq LxH73Bw/xSUVFFCd7XG1UdgiwqjqKiHKOEC4K8kU9A0gNq4F/TUzkP3im7DhZRPGaYtu dIaDryTSx2REGLsEoSrmXQ7QV4OGV9Up0XtqLtSlUbrSJJqjXtwoOlh7hXjnaoDQlxMp 5UHNkC5DCwXdtTMxqehAMP3LvHEAotmNsrUr57Q4eeJb/VszDFD5OFDz4SNpgZrs23Sn NDcTC2hRPN4tnOOLGJKMz2lwhFEFPecNm6J85BAVUpxP1WgoR5TK7eYCcEYoavdkaawy Ps5A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=G5hU7W1W2AIzO6iKrBciI+lIZ+cUsT9CXq+J+2HssDQ=; b=MD+JOlYjGw4JjJtjsCmwp1Kpma9kLQRghNpt5ZXw3CyURe7LfEY0rQUYzEd1oLIa4M KsPRrC0I5JgZAX/uJWDJPpGxVCB1bgVCL7VTvhtTAzK8c85tHM/qWyRarSrlOlLfNjCR fF4+02uT8IfIdNELn+UZWcV6rIfosM1HlwgYkxwyyoFNoUKh72AjvRdUhbI8n4o4vF30 dS6aJxE5q1peNQXPSrNDbYxzQs+olRtt7OsWcWpSkwVN6TRS8ONd90ymUps3gnAcvko8 44Q2IgZAVQ5GUIbnM9djqeW2zZ8hF3jLbfkX92SVJvGz5GsDNXjIF92tyop5mHyloFls LBxw== X-Gm-Message-State: ALQs6tCiv3UZseWoCQWpvleXUyHd8IonetaLR3JnU9XozMQUIj4F5AVW MyS45dvnhH5ylnXVew754GgrxOpt X-Google-Smtp-Source: AB8JxZpzsOHrX/z9Jv1BxvbH3zkkXhnErUx/CcOwfb50LcWIra7rWcjNF232WlsaIQZx+VIH03OmtQ== X-Received: by 10.55.20.232 with SMTP id 101mr35962953qku.243.1524764548377; Thu, 26 Apr 2018 10:42:28 -0700 (PDT) Received: from willemb1.nyc.corp.google.com ([2620:0:1003:315:3fa1:a34c:1128:1d39]) by smtp.gmail.com with ESMTPSA id o14-v6sm12927515qta.23.2018.04.26.10.42.27 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 26 Apr 2018 10:42:27 -0700 (PDT) From: Willem de Bruijn To: netdev@vger.kernel.org Cc: davem@davemloft.net, alexander.duyck@gmail.com, Willem de Bruijn Subject: [PATCH net-next v2 02/11] udp: add udp gso Date: Thu, 26 Apr 2018 13:42:16 -0400 Message-Id: <20180426174225.246388-3-willemdebruijn.kernel@gmail.com> X-Mailer: git-send-email 2.17.0.484.g0c8726318c-goog In-Reply-To: <20180426174225.246388-1-willemdebruijn.kernel@gmail.com> References: <20180426174225.246388-1-willemdebruijn.kernel@gmail.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Willem de Bruijn Implement generic segmentation offload support for udp datagrams. A follow-up patch adds support to the protocol stack to generate such packets. UDP GSO is not UFO. UFO fragments a single large datagram. GSO splits a large payload into a number of discrete UDP datagrams. The implementation adds a GSO type SKB_UDP_GSO_L4 to differentiate it from UFO (SKB_UDP_GSO). IPPROTO_UDPLITE is excluded, as that protocol has no gso handler registered. Signed-off-by: Willem de Bruijn --- include/linux/skbuff.h | 2 ++ include/net/udp.h | 4 ++++ net/core/skbuff.c | 2 ++ net/ipv4/udp_offload.c | 52 +++++++++++++++++++++++++++++++++++++++++- net/ipv6/ip6_offload.c | 6 +++-- net/ipv6/udp_offload.c | 19 ++++++++++++++- 6 files changed, 81 insertions(+), 4 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index d274059529eb..a4a5c0c5cba8 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -573,6 +573,8 @@ enum { SKB_GSO_ESP = 1 << 15, SKB_GSO_UDP = 1 << 16, + + SKB_GSO_UDP_L4 = 1 << 17, }; #if BITS_PER_LONG > 32 diff --git a/include/net/udp.h b/include/net/udp.h index 0676b272f6ac..741d888d0fdb 100644 --- a/include/net/udp.h +++ b/include/net/udp.h @@ -174,6 +174,10 @@ struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff *skb, struct udphdr *uh, udp_lookup_t lookup); int udp_gro_complete(struct sk_buff *skb, int nhoff, udp_lookup_t lookup); +struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb, + netdev_features_t features, + unsigned int mss, __sum16 check); + static inline struct udphdr *udp_gro_udphdr(struct sk_buff *skb) { struct udphdr *uh; diff --git a/net/core/skbuff.c b/net/core/skbuff.c index ff49e352deea..c647cfe114e0 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -4940,6 +4940,8 @@ static unsigned int skb_gso_transport_seglen(const struct sk_buff *skb) thlen = tcp_hdrlen(skb); } else if (unlikely(skb_is_gso_sctp(skb))) { thlen = sizeof(struct sctphdr); + } else if (shinfo->gso_type & SKB_GSO_UDP_L4) { + thlen = sizeof(struct udphdr); } /* UFO sets gso_size to the size of the fragmentation * payload, i.e. the size of the L4 (UDP) header is already diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c index ea6e6e7df0ee..6b09220ca407 100644 --- a/net/ipv4/udp_offload.c +++ b/net/ipv4/udp_offload.c @@ -187,6 +187,53 @@ struct sk_buff *skb_udp_tunnel_segment(struct sk_buff *skb, } EXPORT_SYMBOL(skb_udp_tunnel_segment); +struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb, + netdev_features_t features, + unsigned int mss, __sum16 check) +{ + struct sk_buff *segs, *seg; + unsigned int hdrlen; + struct udphdr *uh; + + if (gso_skb->len <= sizeof(*uh) + mss) + return ERR_PTR(-EINVAL); + + hdrlen = gso_skb->data - skb_mac_header(gso_skb); + skb_pull(gso_skb, sizeof(*uh)); + + segs = skb_segment(gso_skb, features); + if (unlikely(IS_ERR_OR_NULL(segs))) + return segs; + + for (seg = segs; seg; seg = seg->next) { + uh = udp_hdr(seg); + uh->len = htons(seg->len - hdrlen); + uh->check = check; + + /* last packet can be partial gso_size */ + if (!seg->next) + csum_replace2(&uh->check, htons(mss), + htons(seg->len - hdrlen - sizeof(*uh))); + } + + return segs; +} + +static struct sk_buff *__udp4_gso_segment(struct sk_buff *gso_skb, + netdev_features_t features) +{ + const struct iphdr *iph = ip_hdr(gso_skb); + unsigned int mss = skb_shinfo(gso_skb)->gso_size; + + if (!can_checksum_protocol(features, htons(ETH_P_IP))) + return ERR_PTR(-EIO); + + return __udp_gso_segment(gso_skb, features, mss, + udp_v4_check(sizeof(struct udphdr) + mss, + iph->saddr, iph->daddr, 0)); +} +EXPORT_SYMBOL_GPL(__udp4_gso_segment); + static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb, netdev_features_t features) { @@ -203,12 +250,15 @@ static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb, goto out; } - if (!(skb_shinfo(skb)->gso_type & SKB_GSO_UDP)) + if (!(skb_shinfo(skb)->gso_type & (SKB_GSO_UDP | SKB_GSO_UDP_L4))) goto out; if (!pskb_may_pull(skb, sizeof(struct udphdr))) goto out; + if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4) + return __udp4_gso_segment(skb, features); + mss = skb_shinfo(skb)->gso_size; if (unlikely(skb->len <= mss)) goto out; diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c index 4a87f9428ca5..5b3f2f89ef41 100644 --- a/net/ipv6/ip6_offload.c +++ b/net/ipv6/ip6_offload.c @@ -88,9 +88,11 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb, if (skb->encapsulation && skb_shinfo(skb)->gso_type & (SKB_GSO_IPXIP4 | SKB_GSO_IPXIP6)) - udpfrag = proto == IPPROTO_UDP && encap; + udpfrag = proto == IPPROTO_UDP && encap && + (skb_shinfo(skb)->gso_type & SKB_GSO_UDP); else - udpfrag = proto == IPPROTO_UDP && !skb->encapsulation; + udpfrag = proto == IPPROTO_UDP && !skb->encapsulation && + (skb_shinfo(skb)->gso_type & SKB_GSO_UDP); ops = rcu_dereference(inet6_offloads[proto]); if (likely(ops && ops->callbacks.gso_segment)) { diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c index 2a04dc9c781b..f7b85b1e6b3e 100644 --- a/net/ipv6/udp_offload.c +++ b/net/ipv6/udp_offload.c @@ -17,6 +17,20 @@ #include #include "ip6_offload.h" +static struct sk_buff *__udp6_gso_segment(struct sk_buff *gso_skb, + netdev_features_t features) +{ + const struct ipv6hdr *ip6h = ipv6_hdr(gso_skb); + unsigned int mss = skb_shinfo(gso_skb)->gso_size; + + if (!can_checksum_protocol(features, htons(ETH_P_IPV6))) + return ERR_PTR(-EIO); + + return __udp_gso_segment(gso_skb, features, mss, + udp_v6_check(sizeof(struct udphdr) + mss, + &ip6h->saddr, &ip6h->daddr, 0)); +} + static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb, netdev_features_t features) { @@ -42,12 +56,15 @@ static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb, const struct ipv6hdr *ipv6h; struct udphdr *uh; - if (!(skb_shinfo(skb)->gso_type & SKB_GSO_UDP)) + if (!(skb_shinfo(skb)->gso_type & (SKB_GSO_UDP | SKB_GSO_UDP_L4))) goto out; if (!pskb_may_pull(skb, sizeof(struct udphdr))) goto out; + if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4) + return __udp6_gso_segment(skb, features); + /* Do software UFO. Complete and fill in the UDP checksum as HW cannot * do checksum of UDP packets sent as multiple IP fragments. */ From patchwork Thu Apr 26 17:42:17 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Willem de Bruijn X-Patchwork-Id: 905290 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="Le/pQxK5"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40X4D64dlRz9ry1 for ; Fri, 27 Apr 2018 03:43:18 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932347AbeDZRnQ (ORCPT ); Thu, 26 Apr 2018 13:43:16 -0400 Received: from mail-qk0-f196.google.com ([209.85.220.196]:37737 "EHLO mail-qk0-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754014AbeDZRma (ORCPT ); Thu, 26 Apr 2018 13:42:30 -0400 Received: by mail-qk0-f196.google.com with SMTP id d74so26269657qkg.4 for ; Thu, 26 Apr 2018 10:42:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=KZBR7+8J0lWhrUdrE8qiVIGj+y48lE2M8LmOoM9t0q4=; b=Le/pQxK5fSmdfDruOTEYs5Rdxja3b/FGRegnP6snaOQnYswI62z6FhaV3KLKL07Xsu GOxDCbR90QKPM8j7dBREt6xI4xCGaUD+cWUkcXhAM1lTe2jke0B/8fGrHZt4hZ9CWICR BKzBGElCh4N5hKu8Z8kmbWL4duNsuGtxI2HSXddO+JKo+dKwyIAeltldfToofWbOm+4g ceGnfpkisuBSjNuZkxHTQHDGClvaDwZrLsD6ned0RkAG++ngiPeSbs57PR0/qINL0VHO Eykmb2F7fxkq0iVlnrJyP25InR039dxOK8hXysBW5NllwLnaH9xLqDELycES8T/c5F9c Cy3Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=KZBR7+8J0lWhrUdrE8qiVIGj+y48lE2M8LmOoM9t0q4=; b=tpp4A8urAZXJx1WRtRR6+YJ7AzAQr17UZJunFO8ZBHqiuBsUau7xe/UpaiU4xWsEe9 0aK7eB2TzodCW29ottOv3WTfA7v7hoAfmB57oBQg2OPHnNSthbrPlGBg8Uwuem0Curne v8y90BsvCgK0cqfeAd9Mq/LB9WPRfn2ohSyMoPYyw5jLHGkcayvOlCxb7iHPUo5VM76H J8bLh6RvhIg2JMALRpv+CCCdRz6eLsbWQutrfqBfCMPsgRVYqcjZSmYUtHKr+eeNr2/9 x38y0KFC6DyZRo3asXhWmbYbBhAoh9XHypYFv4854thabEfKnSQXhaAcdEVv+QyBIsEO ByAQ== X-Gm-Message-State: ALQs6tBOuvRHW/wwTvjr26j89awYE4eeoE6gccwOmvcYRDI95QMdlCDc SxnX3RZEiM1Z2IGQLmMWiWwM6BxW X-Google-Smtp-Source: AIpwx49oXYJWHmWt1SOSvzIElEfiyyfOZUzrsjE9hrgItre9OGPxd1LV/LtxDHm8W88T9jFzwqokdQ== X-Received: by 10.55.104.198 with SMTP id d189mr30456976qkc.81.1524764549449; Thu, 26 Apr 2018 10:42:29 -0700 (PDT) Received: from willemb1.nyc.corp.google.com ([2620:0:1003:315:3fa1:a34c:1128:1d39]) by smtp.gmail.com with ESMTPSA id o14-v6sm12927515qta.23.2018.04.26.10.42.28 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 26 Apr 2018 10:42:28 -0700 (PDT) From: Willem de Bruijn To: netdev@vger.kernel.org Cc: davem@davemloft.net, alexander.duyck@gmail.com, Willem de Bruijn Subject: [PATCH net-next v2 03/11] udp: generate gso with UDP_SEGMENT Date: Thu, 26 Apr 2018 13:42:17 -0400 Message-Id: <20180426174225.246388-4-willemdebruijn.kernel@gmail.com> X-Mailer: git-send-email 2.17.0.484.g0c8726318c-goog In-Reply-To: <20180426174225.246388-1-willemdebruijn.kernel@gmail.com> References: <20180426174225.246388-1-willemdebruijn.kernel@gmail.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Willem de Bruijn Support generic segmentation offload for udp datagrams. Callers can concatenate and send at once the payload of multiple datagrams with the same destination. To set segment size, the caller sets socket option UDP_SEGMENT to the length of each discrete payload. This value must be smaller than or equal to the relevant MTU. A follow-up patch adds cmsg UDP_SEGMENT to specify segment size on a per send call basis. Total byte length may then exceed MTU. If not an exact multiple of segment size, the last segment will be shorter. The implementation adds a gso_size field to the udp socket, ip(v6) cmsg cookie and inet_cork structure to be able to set the value at setsockopt or cmsg time and to work with both lockless and corked paths. Initial benchmark numbers show UDP GSO about as expensive as TCP GSO. tcp tso 3197 MB/s 54232 msg/s 54232 calls/s 6,457,754,262 cycles tcp gso 1765 MB/s 29939 msg/s 29939 calls/s 11,203,021,806 cycles tcp without tso/gso * 739 MB/s 12548 msg/s 12548 calls/s 11,205,483,630 cycles udp 876 MB/s 14873 msg/s 624666 calls/s 11,205,777,429 cycles udp gso 2139 MB/s 36282 msg/s 36282 calls/s 11,204,374,561 cycles [*] after reverting commit 0a6b2a1dc2a2 ("tcp: switch to GSO being always on") Measured total system cycles ('-a') for one core while pinning both the network receive path and benchmark process to that core: perf stat -a -C 12 -e cycles \ ./udpgso_bench_tx -C 12 -4 -D "$DST" -l 4 Note the reduction in calls/s with GSO. Bytes per syscall drops increases from 1470 to 61818. Signed-off-by: Willem de Bruijn --- include/linux/udp.h | 3 +++ include/net/inet_sock.h | 1 + include/net/ip.h | 1 + include/net/ipv6.h | 1 + include/uapi/linux/udp.h | 1 + net/ipv4/ip_output.c | 9 ++++++--- net/ipv4/udp.c | 33 ++++++++++++++++++++++++++++++--- net/ipv6/ip6_output.c | 6 ++++-- net/ipv6/udp.c | 23 ++++++++++++++++++++--- 9 files changed, 67 insertions(+), 11 deletions(-) diff --git a/include/linux/udp.h b/include/linux/udp.h index eaea63bc79bb..ca840345571b 100644 --- a/include/linux/udp.h +++ b/include/linux/udp.h @@ -55,6 +55,7 @@ struct udp_sock { * when the socket is uncorked. */ __u16 len; /* total length of pending frames */ + __u16 gso_size; /* * Fields specific to UDP-Lite. */ @@ -87,6 +88,8 @@ struct udp_sock { int forward_deficit; }; +#define UDP_MAX_SEGMENTS (1 << 6UL) + static inline struct udp_sock *udp_sk(const struct sock *sk) { return (struct udp_sock *)sk; diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h index 0a671c32d6b9..83d5b3c2ac42 100644 --- a/include/net/inet_sock.h +++ b/include/net/inet_sock.h @@ -147,6 +147,7 @@ struct inet_cork { __u8 ttl; __s16 tos; char priority; + __u16 gso_size; }; struct inet_cork_full { diff --git a/include/net/ip.h b/include/net/ip.h index 7ec543a64bbc..bada1f1f871e 100644 --- a/include/net/ip.h +++ b/include/net/ip.h @@ -76,6 +76,7 @@ struct ipcm_cookie { __u8 ttl; __s16 tos; char priority; + __u16 gso_size; }; #define IPCB(skb) ((struct inet_skb_parm*)((skb)->cb)) diff --git a/include/net/ipv6.h b/include/net/ipv6.h index 0dd722cab037..0a872a7c33c8 100644 --- a/include/net/ipv6.h +++ b/include/net/ipv6.h @@ -298,6 +298,7 @@ struct ipcm6_cookie { __s16 tclass; __s8 dontfrag; struct ipv6_txoptions *opt; + __u16 gso_size; }; static inline struct ipv6_txoptions *txopt_get(const struct ipv6_pinfo *np) diff --git a/include/uapi/linux/udp.h b/include/uapi/linux/udp.h index efb7b5991c2f..09d00f8c442b 100644 --- a/include/uapi/linux/udp.h +++ b/include/uapi/linux/udp.h @@ -32,6 +32,7 @@ struct udphdr { #define UDP_ENCAP 100 /* Set the socket to accept encapsulated packets */ #define UDP_NO_CHECK6_TX 101 /* Disable sending checksum for UDP6X */ #define UDP_NO_CHECK6_RX 102 /* Disable accpeting checksum for UDP6 */ +#define UDP_SEGMENT 103 /* Set GSO segmentation size */ /* UDP encapsulation types */ #define UDP_ENCAP_ESPINUDP_NON_IKE 1 /* draft-ietf-ipsec-nat-t-ike-00/01 */ diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 2883ff1e909c..da4abbee10f7 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -882,7 +882,8 @@ static int __ip_append_data(struct sock *sk, skb = skb_peek_tail(queue); exthdrlen = !skb ? rt->dst.header_len : 0; - mtu = cork->fragsize; + mtu = cork->gso_size ? IP_MAX_MTU : cork->fragsize; + if (cork->tx_flags & SKBTX_ANY_SW_TSTAMP && sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID) tskey = sk->sk_tskey++; @@ -906,7 +907,7 @@ static int __ip_append_data(struct sock *sk, if (transhdrlen && length + fragheaderlen <= mtu && rt->dst.dev->features & (NETIF_F_HW_CSUM | NETIF_F_IP_CSUM) && - !(flags & MSG_MORE) && + (!(flags & MSG_MORE) || cork->gso_size) && !exthdrlen) csummode = CHECKSUM_PARTIAL; @@ -1135,6 +1136,8 @@ static int ip_setup_cork(struct sock *sk, struct inet_cork *cork, *rtp = NULL; cork->fragsize = ip_sk_use_pmtu(sk) ? dst_mtu(&rt->dst) : rt->dst.dev->mtu; + + cork->gso_size = sk->sk_type == SOCK_DGRAM ? ipc->gso_size : 0; cork->dst = &rt->dst; cork->length = 0; cork->ttl = ipc->ttl; @@ -1214,7 +1217,7 @@ ssize_t ip_append_page(struct sock *sk, struct flowi4 *fl4, struct page *page, return -EOPNOTSUPP; hh_len = LL_RESERVED_SPACE(rt->dst.dev); - mtu = cork->fragsize; + mtu = cork->gso_size ? IP_MAX_MTU : cork->fragsize; fragheaderlen = sizeof(struct iphdr) + (opt ? opt->optlen : 0); maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen; diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 6b9d8017b319..bda022c5480b 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -757,7 +757,8 @@ void udp_set_csum(bool nocheck, struct sk_buff *skb, } EXPORT_SYMBOL(udp_set_csum); -static int udp_send_skb(struct sk_buff *skb, struct flowi4 *fl4) +static int udp_send_skb(struct sk_buff *skb, struct flowi4 *fl4, + struct inet_cork *cork) { struct sock *sk = skb->sk; struct inet_sock *inet = inet_sk(sk); @@ -777,6 +778,21 @@ static int udp_send_skb(struct sk_buff *skb, struct flowi4 *fl4) uh->len = htons(len); uh->check = 0; + if (cork->gso_size) { + const int hlen = skb_network_header_len(skb) + + sizeof(struct udphdr); + + if (hlen + cork->gso_size > cork->fragsize) + return -EINVAL; + if (skb->len > cork->gso_size * UDP_MAX_SEGMENTS) + return -EINVAL; + if (skb->ip_summed != CHECKSUM_PARTIAL || is_udplite) + return -EIO; + + skb_shinfo(skb)->gso_size = cork->gso_size; + skb_shinfo(skb)->gso_type = SKB_GSO_UDP_L4; + } + if (is_udplite) /* UDP-Lite */ csum = udplite_csum(skb); @@ -828,7 +844,7 @@ int udp_push_pending_frames(struct sock *sk) if (!skb) goto out; - err = udp_send_skb(skb, fl4); + err = udp_send_skb(skb, fl4, &inet->cork.base); out: up->len = 0; @@ -922,6 +938,7 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) ipc.sockc.tsflags = sk->sk_tsflags; ipc.addr = inet->inet_saddr; ipc.oif = sk->sk_bound_dev_if; + ipc.gso_size = up->gso_size; if (msg->msg_controllen) { err = ip_cmsg_send(sk, msg, &ipc, sk->sk_family == AF_INET6); @@ -1037,7 +1054,7 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) &cork, msg->msg_flags); err = PTR_ERR(skb); if (!IS_ERR_OR_NULL(skb)) - err = udp_send_skb(skb, fl4); + err = udp_send_skb(skb, fl4, &cork); goto out; } @@ -2367,6 +2384,12 @@ int udp_lib_setsockopt(struct sock *sk, int level, int optname, up->no_check6_rx = valbool; break; + case UDP_SEGMENT: + if (val < 0 || val > USHRT_MAX) + return -EINVAL; + up->gso_size = val; + break; + /* * UDP-Lite's partial checksum coverage (RFC 3828). */ @@ -2457,6 +2480,10 @@ int udp_lib_getsockopt(struct sock *sk, int level, int optname, val = up->no_check6_rx; break; + case UDP_SEGMENT: + val = up->gso_size; + break; + /* The following two cannot be changed on UDP sockets, the return is * always 0 (which corresponds to the full checksum coverage of UDP). */ case UDPLITE_SEND_CSCOV: diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 7fa1db447405..a1c4a78132d2 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -1240,6 +1240,8 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork, if (mtu < IPV6_MIN_MTU) return -EINVAL; cork->base.fragsize = mtu; + cork->base.gso_size = sk->sk_type == SOCK_DGRAM ? ipc6->gso_size : 0; + if (dst_allfrag(xfrm_dst_path(&rt->dst))) cork->base.flags |= IPCORK_ALLFRAG; cork->base.length = 0; @@ -1281,7 +1283,7 @@ static int __ip6_append_data(struct sock *sk, dst_exthdrlen = rt->dst.header_len - rt->rt6i_nfheader_len; } - mtu = cork->fragsize; + mtu = cork->gso_size ? IP6_MAX_MTU : cork->fragsize; orig_mtu = mtu; hh_len = LL_RESERVED_SPACE(rt->dst.dev); @@ -1329,7 +1331,7 @@ static int __ip6_append_data(struct sock *sk, if (transhdrlen && sk->sk_protocol == IPPROTO_UDP && headersize == sizeof(struct ipv6hdr) && length <= mtu - headersize && - !(flags & MSG_MORE) && + (!(flags & MSG_MORE) || cork->gso_size) && rt->dst.dev->features & (NETIF_F_IPV6_CSUM | NETIF_F_HW_CSUM)) csummode = CHECKSUM_PARTIAL; diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index 824797f8d1ab..86b7dd58d4b4 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -1023,7 +1023,8 @@ static void udp6_hwcsum_outgoing(struct sock *sk, struct sk_buff *skb, * Sending */ -static int udp_v6_send_skb(struct sk_buff *skb, struct flowi6 *fl6) +static int udp_v6_send_skb(struct sk_buff *skb, struct flowi6 *fl6, + struct inet_cork *cork) { struct sock *sk = skb->sk; struct udphdr *uh; @@ -1042,6 +1043,21 @@ static int udp_v6_send_skb(struct sk_buff *skb, struct flowi6 *fl6) uh->len = htons(len); uh->check = 0; + if (cork->gso_size) { + const int hlen = skb_network_header_len(skb) + + sizeof(struct udphdr); + + if (hlen + cork->gso_size > cork->fragsize) + return -EINVAL; + if (skb->len > cork->gso_size * UDP_MAX_SEGMENTS) + return -EINVAL; + if (skb->ip_summed != CHECKSUM_PARTIAL || is_udplite) + return -EIO; + + skb_shinfo(skb)->gso_size = cork->gso_size; + skb_shinfo(skb)->gso_type = SKB_GSO_UDP_L4; + } + if (is_udplite) csum = udplite_csum(skb); else if (udp_sk(sk)->no_check6_tx) { /* UDP csum disabled */ @@ -1093,7 +1109,7 @@ static int udp_v6_push_pending_frames(struct sock *sk) if (!skb) goto out; - err = udp_v6_send_skb(skb, &fl6); + err = udp_v6_send_skb(skb, &fl6, &inet_sk(sk)->cork.base); out: up->len = 0; @@ -1127,6 +1143,7 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) ipc6.hlimit = -1; ipc6.tclass = -1; ipc6.dontfrag = -1; + ipc6.gso_size = up->gso_size; sockc.tsflags = sk->sk_tsflags; /* destination address check */ @@ -1333,7 +1350,7 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) msg->msg_flags, &cork, &sockc); err = PTR_ERR(skb); if (!IS_ERR_OR_NULL(skb)) - err = udp_v6_send_skb(skb, &fl6); + err = udp_v6_send_skb(skb, &fl6, &cork.base); goto out; } From patchwork Thu Apr 26 17:42:18 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Willem de Bruijn X-Patchwork-Id: 905283 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="dqa2F+yq"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40X4CR2Kckz9ry1 for ; Fri, 27 Apr 2018 03:42:43 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932255AbeDZRml (ORCPT ); Thu, 26 Apr 2018 13:42:41 -0400 Received: from mail-qt0-f194.google.com ([209.85.216.194]:33481 "EHLO mail-qt0-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756850AbeDZRmb (ORCPT ); Thu, 26 Apr 2018 13:42:31 -0400 Received: by mail-qt0-f194.google.com with SMTP id f16-v6so28798913qth.0 for ; Thu, 26 Apr 2018 10:42:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=JIy6jWuNVmGDK0FMCsh6oGo2DOs0LSikBpfZxTWBXvQ=; b=dqa2F+yq26oxsQMBid+VUuTDd5x46OmJJudWbZMlTGWlM1tWCgebxqqVLzyHAfbv5z qqRKA6MN4/NKyy7dw2/WInsrrhPsJaVcHD8SZDEqhoNRz6EO0KRKi5dcYrT989UOeFfr gxo6sWjLNARS1d2VRDIp/CCSr8Y3EwcQuQKj8/uZMeomTuPFgU9Py5l4h++SwEnMxL+K UuFB3npUiVvMs1u02lCK9q7pzIwvzfjHkB+UVekWktsweqwWsMR9/yn4b8q+Ih9eaZz5 mYvx4Pjep9af7MQn0CEbGx6gOpI1jqb0bCLgvznmU1QsfCTLC/oI/sv9KBYuTDsJ3jq7 LwUQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=JIy6jWuNVmGDK0FMCsh6oGo2DOs0LSikBpfZxTWBXvQ=; b=IuYeODAJzpfWUJgNYRQOoGtpxbLNx6ECDrLrXjHgvHeDuZR1BdsV3NPJZt0zxJerxW PCpayw1QGQlq44WCNjCCCYk0OTELm9X1j5xiUMaXCnioAV2bkDtLxYQ8Ut9ATP9ehIUX Cz/09o5Gj+EXpF5ZMShHq9NBlmJL23uOtyc9zAkaF42Te1W8Teq6tgt7XIvvw7N/eHdr q309ruzaES1Giq9bsB/FVrOJxUJUlSZF37bDuRSuhVfJoJUMt834kG7V/IStwBJDzYLB 8cBVONFzrnY+eEPjzBuuL+RMgd1qQMKN7oJ6xzaeLYKnpuxZHHb0PiZrdJ/Tqn2OPOBR 2Usw== X-Gm-Message-State: ALQs6tBX0E2gHWHWADi8JI0NHyDEeLj2K2OTMaKTxDiHUIVlLkP7nEvp su/DeqcoJvz6hkaDymHkQnnYCzr3 X-Google-Smtp-Source: AB8JxZqZSCAgqt0iOcaWY9oEfeW9YaiWdv/fwTBq271qoi22pK1hKm3kvHTgzkJ3esq9i664uW9dDw== X-Received: by 2002:ac8:39f:: with SMTP id t31-v6mr37362243qtg.259.1524764550179; Thu, 26 Apr 2018 10:42:30 -0700 (PDT) Received: from willemb1.nyc.corp.google.com ([2620:0:1003:315:3fa1:a34c:1128:1d39]) by smtp.gmail.com with ESMTPSA id o14-v6sm12927515qta.23.2018.04.26.10.42.29 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 26 Apr 2018 10:42:29 -0700 (PDT) From: Willem de Bruijn To: netdev@vger.kernel.org Cc: davem@davemloft.net, alexander.duyck@gmail.com, Willem de Bruijn Subject: [PATCH net-next v2 04/11] udp: better wmem accounting on gso Date: Thu, 26 Apr 2018 13:42:18 -0400 Message-Id: <20180426174225.246388-5-willemdebruijn.kernel@gmail.com> X-Mailer: git-send-email 2.17.0.484.g0c8726318c-goog In-Reply-To: <20180426174225.246388-1-willemdebruijn.kernel@gmail.com> References: <20180426174225.246388-1-willemdebruijn.kernel@gmail.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Willem de Bruijn skb_segment by default transfers allocated wmem from the gso skb to the tail of the segment list. This underreports real truesize of the list, especially if the tail might be dropped. Similar to tcp_gso_segment, update wmem_alloc with the aggregate list truesize and make each segment responsible for its own share by setting skb->destructor. Clear gso_skb->destructor prior to calling skb_segment to skip the default assignment to tail. Signed-off-by: Willem de Bruijn --- net/ipv4/udp_offload.c | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c index 6b09220ca407..d806529b9b3d 100644 --- a/net/ipv4/udp_offload.c +++ b/net/ipv4/udp_offload.c @@ -191,6 +191,8 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb, netdev_features_t features, unsigned int mss, __sum16 check) { + struct sock *sk = gso_skb->sk; + unsigned int sum_truesize = 0; struct sk_buff *segs, *seg; unsigned int hdrlen; struct udphdr *uh; @@ -201,9 +203,15 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb, hdrlen = gso_skb->data - skb_mac_header(gso_skb); skb_pull(gso_skb, sizeof(*uh)); + /* clear destructor to avoid skb_segment assigning it to tail */ + WARN_ON_ONCE(gso_skb->destructor != sock_wfree); + gso_skb->destructor = NULL; + segs = skb_segment(gso_skb, features); - if (unlikely(IS_ERR_OR_NULL(segs))) + if (unlikely(IS_ERR_OR_NULL(segs))) { + gso_skb->destructor = sock_wfree; return segs; + } for (seg = segs; seg; seg = seg->next) { uh = udp_hdr(seg); @@ -214,8 +222,14 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb, if (!seg->next) csum_replace2(&uh->check, htons(mss), htons(seg->len - hdrlen - sizeof(*uh))); + + seg->destructor = sock_wfree; + seg->sk = sk; + sum_truesize += seg->truesize; } + refcount_add(sum_truesize - gso_skb->truesize, &sk->sk_wmem_alloc); + return segs; } From patchwork Thu Apr 26 17:42:19 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Willem de Bruijn X-Patchwork-Id: 905282 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="dDGt7eUb"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40X4CN0P0Sz9ry1 for ; Fri, 27 Apr 2018 03:42:40 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932240AbeDZRmi (ORCPT ); Thu, 26 Apr 2018 13:42:38 -0400 Received: from mail-qk0-f196.google.com ([209.85.220.196]:46025 "EHLO mail-qk0-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932120AbeDZRmd (ORCPT ); Thu, 26 Apr 2018 13:42:33 -0400 Received: by mail-qk0-f196.google.com with SMTP id x22so8427180qkb.12 for ; Thu, 26 Apr 2018 10:42:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=9xyO/PLneMT3TJwQoecAhs9L3I0k6Ixm6+eCSxmGGDc=; b=dDGt7eUbjBy90b8U3DpgSs6CL/hGPhUaiEi6S38e988Ix/Fc+OybXULbsX0iPvGMC8 kmLZBHuNg0G86V6xQdRRrbCyvdzqNFaLAsg8CjOLKTBjI0yAegjRi0AQAl/XMqMfHGUi YuqQbXtGSr9uKhZS4LDoOsGSP2RkA9/t5G9W8J0Ybt5DFIeHeMlaL6KNg5ncKyuY5bYy t4jB5MIEkuwYGHpKcQj9grglAGbnEZJ6lkKCafbRjptDNWvb9VkCIPe7iIZMmjBSgvc7 n/klPRIraWJ7taZxutk4V49+h9XeWu57VPsg55SFcCpa3Cet46eGuDvYtnGAPl6I7OSH pUIA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=9xyO/PLneMT3TJwQoecAhs9L3I0k6Ixm6+eCSxmGGDc=; b=I+EQO/HMDVSgx3sFduqByfkHvVQnDlKSg33cU2wYdr+7Y2c4hly0Gjs85JwO3CvNKM Pb+ropgE7N8721+EpXYfGqKb+FLJn5wF8EXjc5AHORlKbmyRCru26J6ha/PvmHR7juoV DsAlaqm2/CRcugtHlGE13teZkfOeehQcllRqBDqeOtAyRTQWXeMZgaTF6JJwm2rLi559 TRveGlF+djNjU0Zp+jOMfPBzi1W3tSGz68sO0xVtniVi4BpAE9zXv+XY57TED4QoOSkV yi8R/vq5T6qOkrmpLIhstkXrCYiXNzkFdcwAoOIYbYeAuSB1V5kChe46zx3V1LhuFRRR yc3g== X-Gm-Message-State: ALQs6tC/UK4PNjSSXlktEOTrU6poaGyC/dGgeMFTvF1nWPlRRa7reNQ1 IwuEM2iuSKiwKynLhLGElPPOPXHq X-Google-Smtp-Source: AB8JxZrvUS+5RKiTYLP7lwZnYopXzY8XLEmTbfP/SqBOjWev8nQNdkHboAEw4sURDxtt6ziMx0P7nw== X-Received: by 10.55.11.65 with SMTP id 62mr35712713qkl.287.1524764551063; Thu, 26 Apr 2018 10:42:31 -0700 (PDT) Received: from willemb1.nyc.corp.google.com ([2620:0:1003:315:3fa1:a34c:1128:1d39]) by smtp.gmail.com with ESMTPSA id o14-v6sm12927515qta.23.2018.04.26.10.42.30 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 26 Apr 2018 10:42:30 -0700 (PDT) From: Willem de Bruijn To: netdev@vger.kernel.org Cc: davem@davemloft.net, alexander.duyck@gmail.com, Willem de Bruijn Subject: [PATCH net-next v2 05/11] udp: paged allocation with gso Date: Thu, 26 Apr 2018 13:42:19 -0400 Message-Id: <20180426174225.246388-6-willemdebruijn.kernel@gmail.com> X-Mailer: git-send-email 2.17.0.484.g0c8726318c-goog In-Reply-To: <20180426174225.246388-1-willemdebruijn.kernel@gmail.com> References: <20180426174225.246388-1-willemdebruijn.kernel@gmail.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Willem de Bruijn When sending large datagrams that are later segmented, store data in page frags to avoid copying from linear in skb_segment. Signed-off-by: Willem de Bruijn --- net/ipv4/ip_output.c | 15 +++++++++++---- net/ipv6/ip6_output.c | 19 ++++++++++++++----- 2 files changed, 25 insertions(+), 9 deletions(-) diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index da4abbee10f7..f2338e40c37d 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -878,11 +878,13 @@ static int __ip_append_data(struct sock *sk, struct rtable *rt = (struct rtable *)cork->dst; unsigned int wmem_alloc_delta = 0; u32 tskey = 0; + bool paged; skb = skb_peek_tail(queue); exthdrlen = !skb ? rt->dst.header_len : 0; mtu = cork->gso_size ? IP_MAX_MTU : cork->fragsize; + paged = !!cork->gso_size; if (cork->tx_flags & SKBTX_ANY_SW_TSTAMP && sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID) @@ -934,6 +936,7 @@ static int __ip_append_data(struct sock *sk, unsigned int fraglen; unsigned int fraggap; unsigned int alloclen; + unsigned int pagedlen = 0; struct sk_buff *skb_prev; alloc_new_skb: skb_prev = skb; @@ -954,8 +957,12 @@ static int __ip_append_data(struct sock *sk, if ((flags & MSG_MORE) && !(rt->dst.dev->features&NETIF_F_SG)) alloclen = mtu; - else + else if (!paged) alloclen = fraglen; + else { + alloclen = min_t(int, fraglen, MAX_HEADER); + pagedlen = fraglen - alloclen; + } alloclen += exthdrlen; @@ -999,7 +1006,7 @@ static int __ip_append_data(struct sock *sk, /* * Find where to start putting bytes. */ - data = skb_put(skb, fraglen + exthdrlen); + data = skb_put(skb, fraglen + exthdrlen - pagedlen); skb_set_network_header(skb, exthdrlen); skb->transport_header = (skb->network_header + fragheaderlen); @@ -1015,7 +1022,7 @@ static int __ip_append_data(struct sock *sk, pskb_trim_unique(skb_prev, maxfraglen); } - copy = datalen - transhdrlen - fraggap; + copy = datalen - transhdrlen - fraggap - pagedlen; if (copy > 0 && getfrag(from, data + transhdrlen, offset, copy, fraggap, skb) < 0) { err = -EFAULT; kfree_skb(skb); @@ -1023,7 +1030,7 @@ static int __ip_append_data(struct sock *sk, } offset += copy; - length -= datalen - fraggap; + length -= copy + transhdrlen; transhdrlen = 0; exthdrlen = 0; csummode = CHECKSUM_NONE; diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index a1c4a78132d2..dfd8af41824e 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -1276,6 +1276,7 @@ static int __ip6_append_data(struct sock *sk, int csummode = CHECKSUM_NONE; unsigned int maxnonfragsize, headersize; unsigned int wmem_alloc_delta = 0; + bool paged; skb = skb_peek_tail(queue); if (!skb) { @@ -1283,6 +1284,7 @@ static int __ip6_append_data(struct sock *sk, dst_exthdrlen = rt->dst.header_len - rt->rt6i_nfheader_len; } + paged = !!cork->gso_size; mtu = cork->gso_size ? IP6_MAX_MTU : cork->fragsize; orig_mtu = mtu; @@ -1374,6 +1376,7 @@ static int __ip6_append_data(struct sock *sk, unsigned int fraglen; unsigned int fraggap; unsigned int alloclen; + unsigned int pagedlen = 0; alloc_new_skb: /* There's no room in the current skb */ if (skb) @@ -1396,11 +1399,17 @@ static int __ip6_append_data(struct sock *sk, if (datalen > (cork->length <= mtu && !(cork->flags & IPCORK_ALLFRAG) ? mtu : maxfraglen) - fragheaderlen) datalen = maxfraglen - fragheaderlen - rt->dst.trailer_len; + fraglen = datalen + fragheaderlen; + if ((flags & MSG_MORE) && !(rt->dst.dev->features&NETIF_F_SG)) alloclen = mtu; - else - alloclen = datalen + fragheaderlen; + else if (!paged) + alloclen = fraglen; + else { + alloclen = min_t(int, fraglen, MAX_HEADER); + pagedlen = fraglen - alloclen; + } alloclen += dst_exthdrlen; @@ -1422,7 +1431,7 @@ static int __ip6_append_data(struct sock *sk, */ alloclen += sizeof(struct frag_hdr); - copy = datalen - transhdrlen - fraggap; + copy = datalen - transhdrlen - fraggap - pagedlen; if (copy < 0) { err = -EINVAL; goto error; @@ -1461,7 +1470,7 @@ static int __ip6_append_data(struct sock *sk, /* * Find where to start putting bytes */ - data = skb_put(skb, fraglen); + data = skb_put(skb, fraglen - pagedlen); skb_set_network_header(skb, exthdrlen); data += fragheaderlen; skb->transport_header = (skb->network_header + @@ -1484,7 +1493,7 @@ static int __ip6_append_data(struct sock *sk, } offset += copy; - length -= datalen - fraggap; + length -= copy + transhdrlen; transhdrlen = 0; exthdrlen = 0; dst_exthdrlen = 0; From patchwork Thu Apr 26 17:42:20 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Willem de Bruijn X-Patchwork-Id: 905289 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="i/0YCXQw"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40X4D11wZ3z9ry1 for ; Fri, 27 Apr 2018 03:43:13 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932340AbeDZRnL (ORCPT ); Thu, 26 Apr 2018 13:43:11 -0400 Received: from mail-qk0-f195.google.com ([209.85.220.195]:39833 "EHLO mail-qk0-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932177AbeDZRmd (ORCPT ); Thu, 26 Apr 2018 13:42:33 -0400 Received: by mail-qk0-f195.google.com with SMTP id z75so15255881qkb.6 for ; Thu, 26 Apr 2018 10:42:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=ln7Av1ay+NAN76hv1AeROwM6UjJ1Nvmnrt7yu6GUSZc=; b=i/0YCXQw/mLTzHzerIS4tZry8/Bpo/eXnz+TM3FAOwD9DJK7gcVdulD9pLjDbwBJ+Z vtDsC6jtFBFGgoE2OOQigXznEcSh1vIoQQU08jdihJKkb0Ld6oCwh4O/1XonLiPnG+fY qYHyt8iU3uNjdUMEANzVN/wPFYfHgvI09Fm+2MZLZ2SZD0fa8uZkxL6z9GhHtmRFxASh j6XGWl3YsNPzBg8KCfpGZQCP2tera+sdv3lsXKvYuppzfaflJW2mrvWlgrzoNjkSRNhf 8lxe0A1uQqbg6BkhQvKGiw5m8KJ0zddlsOpGaAYih5Hn3v3paNovBY7ZDSvZIDDw5EWy yikA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=ln7Av1ay+NAN76hv1AeROwM6UjJ1Nvmnrt7yu6GUSZc=; b=tl4WnbYI2tZuuJcTujKAvFV0Z+eJNlKnbDChRglOtF7w+UM7resHfnQ1cEANNGO46p pfL6IQuhTPfPNZwKG0nZGN8i9AvWdpi0c+/kTZEL0aW2kh64PvgkOsBDciiO/4g88mLy C0NiFs711YyV0I+JmbEYH0Q5HWFIsAdL2LfLg1czmMEgMTx28UOJ0+XAMYdZPgrIIo2f t0j64wgeto8ybCq6iGGbYT0D9tI+9G0WZYuXr6BqaPWYmLvEgnv/UahgryDchEBBwk37 In/X/KR2PRfQ3TPgG8BdAgyx5qA2Jvp/69x1D10Cq37XKkFS4bzNG1VC/QOzeaBWg/0t Ig7w== X-Gm-Message-State: ALQs6tAHN8HL9AdB+AfOVxHM+yI54KWzcCvU5K0zwAOAJiz1HnbNS2qK UwhlBWSvFUInevwhIY7sv9cas1xc X-Google-Smtp-Source: AB8JxZobpOsIA3wjugiZOqWQHsxfcaq6Mi5gpx+3JGCuJ0e3/uBPERgEQWnHKi9forJ/QWg7TRe1wA== X-Received: by 10.55.153.133 with SMTP id b127mr30223377qke.43.1524764551965; Thu, 26 Apr 2018 10:42:31 -0700 (PDT) Received: from willemb1.nyc.corp.google.com ([2620:0:1003:315:3fa1:a34c:1128:1d39]) by smtp.gmail.com with ESMTPSA id o14-v6sm12927515qta.23.2018.04.26.10.42.31 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 26 Apr 2018 10:42:31 -0700 (PDT) From: Willem de Bruijn To: netdev@vger.kernel.org Cc: davem@davemloft.net, alexander.duyck@gmail.com, Willem de Bruijn Subject: [PATCH net-next v2 06/11] udp: add gso segment cmsg Date: Thu, 26 Apr 2018 13:42:20 -0400 Message-Id: <20180426174225.246388-7-willemdebruijn.kernel@gmail.com> X-Mailer: git-send-email 2.17.0.484.g0c8726318c-goog In-Reply-To: <20180426174225.246388-1-willemdebruijn.kernel@gmail.com> References: <20180426174225.246388-1-willemdebruijn.kernel@gmail.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Willem de Bruijn Allow specifying segment size in the send call. The new control message performs the same function as socket option UDP_SEGMENT while avoiding the extra system call. Signed-off-by: Willem de Bruijn --- include/net/udp.h | 1 + net/ipv4/udp.c | 43 +++++++++++++++++++++++++++++++++++++++++-- net/ipv6/udp.c | 5 ++++- 3 files changed, 46 insertions(+), 3 deletions(-) diff --git a/include/net/udp.h b/include/net/udp.h index 741d888d0fdb..05990746810e 100644 --- a/include/net/udp.h +++ b/include/net/udp.h @@ -273,6 +273,7 @@ int udp_abort(struct sock *sk, int err); int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len); int udp_push_pending_frames(struct sock *sk); void udp_flush_pending_frames(struct sock *sk); +int udp_cmsg_send(struct sock *sk, struct msghdr *msg, u16 *gso_size); void udp4_hwcsum(struct sk_buff *skb, __be32 src, __be32 dst); int udp_rcv(struct sk_buff *skb); int udp_ioctl(struct sock *sk, int cmd, unsigned long arg); diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index bda022c5480b..350bdafc462b 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -853,6 +853,42 @@ int udp_push_pending_frames(struct sock *sk) } EXPORT_SYMBOL(udp_push_pending_frames); +static int __udp_cmsg_send(struct cmsghdr *cmsg, u16 *gso_size) +{ + switch (cmsg->cmsg_type) { + case UDP_SEGMENT: + if (cmsg->cmsg_len != CMSG_LEN(sizeof(__u16))) + return -EINVAL; + *gso_size = *(__u16 *)CMSG_DATA(cmsg); + return 0; + default: + return -EINVAL; + } +} + +int udp_cmsg_send(struct sock *sk, struct msghdr *msg, u16 *gso_size) +{ + struct cmsghdr *cmsg; + bool need_ip = false; + int err; + + for_each_cmsghdr(cmsg, msg) { + if (!CMSG_OK(msg, cmsg)) + return -EINVAL; + + if (cmsg->cmsg_level != SOL_UDP) { + need_ip = true; + continue; + } + + err = __udp_cmsg_send(cmsg, gso_size); + if (err) + return err; + } + + return need_ip; +} + int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) { struct inet_sock *inet = inet_sk(sk); @@ -941,8 +977,11 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) ipc.gso_size = up->gso_size; if (msg->msg_controllen) { - err = ip_cmsg_send(sk, msg, &ipc, sk->sk_family == AF_INET6); - if (unlikely(err)) { + err = udp_cmsg_send(sk, msg, &ipc.gso_size); + if (err > 0) + err = ip_cmsg_send(sk, msg, &ipc, + sk->sk_family == AF_INET6); + if (unlikely(err < 0)) { kfree(ipc.opt); return err; } diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index 86b7dd58d4b4..6acfdd3e442b 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -1276,7 +1276,10 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) opt->tot_len = sizeof(*opt); ipc6.opt = opt; - err = ip6_datagram_send_ctl(sock_net(sk), sk, msg, &fl6, &ipc6, &sockc); + err = udp_cmsg_send(sk, msg, &ipc6.gso_size); + if (err > 0) + err = ip6_datagram_send_ctl(sock_net(sk), sk, msg, &fl6, + &ipc6, &sockc); if (err < 0) { fl6_sock_release(flowlabel); return err; From patchwork Thu Apr 26 17:42:21 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Willem de Bruijn X-Patchwork-Id: 905288 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="FXdI8KGW"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40X4Cw6tYRz9ry1 for ; Fri, 27 Apr 2018 03:43:08 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932327AbeDZRnH (ORCPT ); Thu, 26 Apr 2018 13:43:07 -0400 Received: from mail-qt0-f193.google.com ([209.85.216.193]:36877 "EHLO mail-qt0-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932230AbeDZRme (ORCPT ); Thu, 26 Apr 2018 13:42:34 -0400 Received: by mail-qt0-f193.google.com with SMTP id w12-v6so32604167qti.4 for ; Thu, 26 Apr 2018 10:42:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=+egsGqeOvVr4H8w7WOd5T1JNH2GybxQYydGpdtZhVxI=; b=FXdI8KGWSUkjOmDUfVuUzcDSuJMf7jnGA1p95i98qxa6nms4NtPQPNTcSE6I3mPhVG vPpHnuuX7V+rbrLvEg3fVvsRggyAAHp4sknCYWVX0XZvOXmqSAC1uYF876jYUPYmK3u3 YUmxRNYmvDOs/ElcLXvD1pRpx5e7TlZMZWUhV8iUcEo3JlcSzyuifOubDDXH4jc/sNK+ esw+jyfZXm4styx1j4CtpvgJTvVY9T2Ne5bdBVHiunKZY4tzrO+39P4582byaRfVBvoz 1GkcpsJtFAP2YxLzkC+medL+fDjzT/hmb79MVvmf6IJI4FUb0KDij3jcQAUxE85ERyRG fwkw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=+egsGqeOvVr4H8w7WOd5T1JNH2GybxQYydGpdtZhVxI=; b=HTXGYXSNnDKHcR1SYsX7uAzjVs0txQ1g1pfad8o001TXSBxtxTHx/m5kJcawiVEdSV osOP7ScF6cwIOK+vUwUaNSdbGpaON3HfuepwwrPtimKObq99N3bGYoJM5eurPa2QJmgX EjoTIv8dtyn3rrYUA9HCUceOQ5bgJiAVmcUUZM/iFaxgKEMa6F8IUip4OdcqLIBXhyNv yTD+vNsSBnHd0bc+yFpoZuixFV4MCj5So/8iqVR00hlpdSzTR8K8OSkE3NGTyLkQZ+8d RQWP/yiySQ9lZ5eJj3FBDPE2AyDda2EFiu48F4+PWrattAGEz1SASjrPOLtlg/xVv9zs p6KA== X-Gm-Message-State: ALQs6tCGjV8kMHu1VxBvFkVn4BVB6/br5IoH7bgp0MRzOeV7Xgodjc6R isEdQ117NCpViVEZTvAvBOi1/dkY X-Google-Smtp-Source: AB8JxZrgZoTwaLSYTqkEL7Ken/TXjLOcJQqlYxKpL/pWc5N92XB6snfAxWqV4IWG1mXYCbT8Eq9NIA== X-Received: by 2002:ac8:376b:: with SMTP id p40-v6mr38748315qtb.282.1524764553257; Thu, 26 Apr 2018 10:42:33 -0700 (PDT) Received: from willemb1.nyc.corp.google.com ([2620:0:1003:315:3fa1:a34c:1128:1d39]) by smtp.gmail.com with ESMTPSA id o14-v6sm12927515qta.23.2018.04.26.10.42.31 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 26 Apr 2018 10:42:32 -0700 (PDT) From: Willem de Bruijn To: netdev@vger.kernel.org Cc: davem@davemloft.net, alexander.duyck@gmail.com, Willem de Bruijn Subject: [PATCH net-next v2 07/11] udp: add gso support to virtual devices Date: Thu, 26 Apr 2018 13:42:21 -0400 Message-Id: <20180426174225.246388-8-willemdebruijn.kernel@gmail.com> X-Mailer: git-send-email 2.17.0.484.g0c8726318c-goog In-Reply-To: <20180426174225.246388-1-willemdebruijn.kernel@gmail.com> References: <20180426174225.246388-1-willemdebruijn.kernel@gmail.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Willem de Bruijn Virtual devices such as tunnels and bonding can handle large packets. Only segment packets when reaching a physical or loopback device. Signed-off-by: Willem de Bruijn --- Documentation/networking/netdev-features.txt | 7 +++++++ include/linux/netdev_features.h | 5 ++++- include/linux/netdevice.h | 1 + net/core/ethtool.c | 1 + 4 files changed, 13 insertions(+), 1 deletion(-) diff --git a/Documentation/networking/netdev-features.txt b/Documentation/networking/netdev-features.txt index c77f9d57eb91..c4a54c162547 100644 --- a/Documentation/networking/netdev-features.txt +++ b/Documentation/networking/netdev-features.txt @@ -113,6 +113,13 @@ whatever headers there might be. NETIF_F_TSO_ECN means that hardware can properly split packets with CWR bit set, be it TCPv4 (when NETIF_F_TSO is enabled) or TCPv6 (NETIF_F_TSO6). + * Transmit UDP segmentation offload + +NETIF_F_GSO_UDP_GSO_L4 accepts a single UDP header with a payload that exceeds +gso_size. On segmentation, it segments the payload on gso_size boundaries and +replicates the network and UDP headers (fixing up the last one if less than +gso_size). + * Transmit DMA from high memory On platforms where this is relevant, NETIF_F_HIGHDMA signals that diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h index 35b79f47a13d..fe2f3b30960e 100644 --- a/include/linux/netdev_features.h +++ b/include/linux/netdev_features.h @@ -55,8 +55,9 @@ enum { NETIF_F_GSO_SCTP_BIT, /* ... SCTP fragmentation */ NETIF_F_GSO_ESP_BIT, /* ... ESP with TSO */ NETIF_F_GSO_UDP_BIT, /* ... UFO, deprecated except tuntap */ + NETIF_F_GSO_UDP_L4_BIT, /* ... UDP payload GSO (not UFO) */ /**/NETIF_F_GSO_LAST = /* last bit, see GSO_MASK */ - NETIF_F_GSO_UDP_BIT, + NETIF_F_GSO_UDP_L4_BIT, NETIF_F_FCOE_CRC_BIT, /* FCoE CRC32 */ NETIF_F_SCTP_CRC_BIT, /* SCTP checksum offload */ @@ -147,6 +148,7 @@ enum { #define NETIF_F_HW_ESP_TX_CSUM __NETIF_F(HW_ESP_TX_CSUM) #define NETIF_F_RX_UDP_TUNNEL_PORT __NETIF_F(RX_UDP_TUNNEL_PORT) #define NETIF_F_HW_TLS_RECORD __NETIF_F(HW_TLS_RECORD) +#define NETIF_F_GSO_UDP_L4 __NETIF_F(GSO_UDP_L4) #define for_each_netdev_feature(mask_addr, bit) \ for_each_set_bit(bit, (unsigned long *)mask_addr, NETDEV_FEATURE_COUNT) @@ -216,6 +218,7 @@ enum { NETIF_F_GSO_GRE_CSUM | \ NETIF_F_GSO_IPXIP4 | \ NETIF_F_GSO_IPXIP6 | \ + NETIF_F_GSO_UDP_L4 | \ NETIF_F_GSO_UDP_TUNNEL | \ NETIF_F_GSO_UDP_TUNNEL_CSUM) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 14e0777ffcfb..366c32891158 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -4186,6 +4186,7 @@ static inline bool net_gso_ok(netdev_features_t features, int gso_type) BUILD_BUG_ON(SKB_GSO_SCTP != (NETIF_F_GSO_SCTP >> NETIF_F_GSO_SHIFT)); BUILD_BUG_ON(SKB_GSO_ESP != (NETIF_F_GSO_ESP >> NETIF_F_GSO_SHIFT)); BUILD_BUG_ON(SKB_GSO_UDP != (NETIF_F_GSO_UDP >> NETIF_F_GSO_SHIFT)); + BUILD_BUG_ON(SKB_GSO_UDP_L4 != (NETIF_F_GSO_UDP_L4 >> NETIF_F_GSO_SHIFT)); return (features & feature) == feature; } diff --git a/net/core/ethtool.c b/net/core/ethtool.c index 03416e6dd5d7..4650fd6d678c 100644 --- a/net/core/ethtool.c +++ b/net/core/ethtool.c @@ -92,6 +92,7 @@ static const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN] [NETIF_F_GSO_PARTIAL_BIT] = "tx-gso-partial", [NETIF_F_GSO_SCTP_BIT] = "tx-sctp-segmentation", [NETIF_F_GSO_ESP_BIT] = "tx-esp-segmentation", + [NETIF_F_GSO_UDP_L4_BIT] = "tx-udp-segmentation", [NETIF_F_FCOE_CRC_BIT] = "tx-checksum-fcoe-crc", [NETIF_F_SCTP_CRC_BIT] = "tx-checksum-sctp", From patchwork Thu Apr 26 17:42:22 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Willem de Bruijn X-Patchwork-Id: 905287 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="Oc8ooVHn"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40X4Cv4xYKz9ry1 for ; Fri, 27 Apr 2018 03:43:07 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932315AbeDZRnF (ORCPT ); Thu, 26 Apr 2018 13:43:05 -0400 Received: from mail-qk0-f193.google.com ([209.85.220.193]:37745 "EHLO mail-qk0-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754377AbeDZRmf (ORCPT ); Thu, 26 Apr 2018 13:42:35 -0400 Received: by mail-qk0-f193.google.com with SMTP id d74so26269846qkg.4 for ; Thu, 26 Apr 2018 10:42:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=ftcWgI1UqBAs2cYGlrVAVzLQgw/oWiQXgTWIMatPMPU=; b=Oc8ooVHnDpTgEPmQZu8MhqUZsW+TVWyD9ld+WrpMPc7OMxZqsJCQwEP9DE2Xe5k7B7 Sr3yK8Speg40+0ZMuvoAMySVUadTm3YXaC0DzWdsj6bYS/DD/7a45PvZKUXQciyStArM 3h9gk7qgeuT0Y+1q83lbm/Kth3glVd3ukU3CbV7Yyl7vrbhz1YrQpTRccSZK7UwdkB9g Fxyygene0qUhyUFoHHA1j3OsAnV7E5AMdjq+TGgRqOiR7+qa5jQF2dfPip6rin4OQFR7 Zmy1y9kuhEeU/17U1mToWdeAqHrU4TezlaI1g15qOItCJiHEpMIXS2aBbTymKueCgTC7 IjCw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=ftcWgI1UqBAs2cYGlrVAVzLQgw/oWiQXgTWIMatPMPU=; b=POak3muDu3VLCKmMaNbYr2tOX3H19sQ3WO47I2x0aROfivGGhPjD5EV0r3MB0VtU8/ qWIYkuqdx//HO4HBUFY0/VQTLmyGgNckBw88ZgBpDxSUPPC2Grs+YUdraTyID8ZHMS9r queK6SriNhgyae2bJYD3WdjZumWGXF0XC1h33XvAh3i0NLFN47kGmNE0o5RYq1+EVglE pbD8yEcNcLtdIVVpoirIEIjlzsKwrkh1y8P8H7Pe0hTpFAcEvxAaXwcDJB+9AwAWNc45 cMqh7iQw6xZO2RgwxKdOUMBlMs6jcctBY+SXGE8JApRrEHntcebw+Eul1ivghRWIqNHt ybrQ== X-Gm-Message-State: ALQs6tCsrRfVCLwc675uBK5yykGQGcXiiNcArGM/Tg94yRf6ixbvPfrW 2b2XpnVaRvwJWaskBK1Eca9aNxLh X-Google-Smtp-Source: AB8JxZqGAjZtBdAt1yj3D3wmuUtT2XwdXZHLNA/PQO3xvkEYCMYYluaVdrke1fJ0n5F5JT86EzY+xg== X-Received: by 10.55.31.77 with SMTP id f74mr34172071qkf.444.1524764554220; Thu, 26 Apr 2018 10:42:34 -0700 (PDT) Received: from willemb1.nyc.corp.google.com ([2620:0:1003:315:3fa1:a34c:1128:1d39]) by smtp.gmail.com with ESMTPSA id o14-v6sm12927515qta.23.2018.04.26.10.42.33 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 26 Apr 2018 10:42:33 -0700 (PDT) From: Willem de Bruijn To: netdev@vger.kernel.org Cc: davem@davemloft.net, alexander.duyck@gmail.com, Willem de Bruijn Subject: [PATCH net-next v2 08/11] selftests: udp gso Date: Thu, 26 Apr 2018 13:42:22 -0400 Message-Id: <20180426174225.246388-9-willemdebruijn.kernel@gmail.com> X-Mailer: git-send-email 2.17.0.484.g0c8726318c-goog In-Reply-To: <20180426174225.246388-1-willemdebruijn.kernel@gmail.com> References: <20180426174225.246388-1-willemdebruijn.kernel@gmail.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Willem de Bruijn Validate udp gso, including edge cases (such as min/max gso sizes). Signed-off-by: Willem de Bruijn --- tools/testing/selftests/net/.gitignore | 1 + tools/testing/selftests/net/Makefile | 3 +- tools/testing/selftests/net/udpgso.c | 486 +++++++++++++++++++++++++ tools/testing/selftests/net/udpgso.sh | 16 + 4 files changed, 505 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/net/udpgso.c create mode 100755 tools/testing/selftests/net/udpgso.sh diff --git a/tools/testing/selftests/net/.gitignore b/tools/testing/selftests/net/.gitignore index 5559f6add046..67a26240407c 100644 --- a/tools/testing/selftests/net/.gitignore +++ b/tools/testing/selftests/net/.gitignore @@ -8,3 +8,4 @@ reuseport_bpf_numa reuseport_dualstack reuseaddr_conflict tcp_mmap +udpgso diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile index c3761c35f542..db3303348315 100644 --- a/tools/testing/selftests/net/Makefile +++ b/tools/testing/selftests/net/Makefile @@ -5,12 +5,13 @@ CFLAGS = -Wall -Wl,--no-as-needed -O2 -g CFLAGS += -I../../../../usr/include/ TEST_PROGS := run_netsocktests run_afpackettests test_bpf.sh netdevice.sh rtnetlink.sh -TEST_PROGS += fib_tests.sh fib-onlink-tests.sh in_netns.sh pmtu.sh +TEST_PROGS += fib_tests.sh fib-onlink-tests.sh in_netns.sh pmtu.sh udpgso.sh TEST_GEN_FILES = socket TEST_GEN_FILES += psock_fanout psock_tpacket msg_zerocopy TEST_GEN_FILES += tcp_mmap TEST_GEN_PROGS = reuseport_bpf reuseport_bpf_cpu reuseport_bpf_numa TEST_GEN_PROGS += reuseport_dualstack reuseaddr_conflict +TEST_GEN_PROGS += udpgso include ../lib.mk diff --git a/tools/testing/selftests/net/udpgso.c b/tools/testing/selftests/net/udpgso.c new file mode 100644 index 000000000000..68a230dfee73 --- /dev/null +++ b/tools/testing/selftests/net/udpgso.c @@ -0,0 +1,486 @@ +// SPDX-License-Identifier: GPL-2.0 + +#define _GNU_SOURCE + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#ifndef ETH_MAX_MTU +#define ETH_MAX_MTU 0xFFFFU +#endif + +#ifndef UDP_SEGMENT +#define UDP_SEGMENT 103 +#endif + +#define CONST_MTU_TEST 1500 + +#define CONST_HDRLEN_V4 (sizeof(struct iphdr) + sizeof(struct udphdr)) +#define CONST_HDRLEN_V6 (sizeof(struct ip6_hdr) + sizeof(struct udphdr)) + +#define CONST_MSS_V4 (CONST_MTU_TEST - CONST_HDRLEN_V4) +#define CONST_MSS_V6 (CONST_MTU_TEST - CONST_HDRLEN_V6) + +#define CONST_MAX_SEGS_V4 (ETH_MAX_MTU / CONST_MSS_V4) +#define CONST_MAX_SEGS_V6 (ETH_MAX_MTU / CONST_MSS_V6) + +static bool cfg_do_ipv4; +static bool cfg_do_ipv6; +static bool cfg_do_connectionless; +static bool cfg_do_setsockopt; +static int cfg_specific_test_id = -1; + +static const char cfg_ifname[] = "lo"; +static unsigned short cfg_port = 9000; + +static char buf[ETH_MAX_MTU]; + +struct testcase { + int tlen; /* send() buffer size, may exceed mss */ + bool tfail; /* send() call is expected to fail */ + int gso_len; /* mss after applying gso */ + int r_num_mss; /* recv(): number of calls of full mss */ + int r_len_last; /* recv(): size of last non-mss dgram, if any */ +}; + +const struct in6_addr addr6 = IN6ADDR_LOOPBACK_INIT; +const struct in_addr addr4 = { .s_addr = __constant_htonl(INADDR_LOOPBACK + 2) }; + +struct testcase testcases_v4[] = { + { + /* no GSO: send a single byte */ + .tlen = 1, + .r_len_last = 1, + }, + { + /* no GSO: send a single MSS */ + .tlen = CONST_MSS_V4, + .r_num_mss = 1, + }, + { + /* no GSO: send a single MSS + 1B: fail */ + .tlen = CONST_MSS_V4 + 1, + .tfail = true, + }, + { + /* send a single MSS: will fail with GSO, because the segment + * logic in udp4_ufo_fragment demands a gso skb to be > MTU + */ + .tlen = CONST_MSS_V4, + .gso_len = CONST_MSS_V4, + .tfail = true, + .r_num_mss = 1, + }, + { + /* send a single MSS + 1B */ + .tlen = CONST_MSS_V4 + 1, + .gso_len = CONST_MSS_V4, + .r_num_mss = 1, + .r_len_last = 1, + }, + { + /* send exactly 2 MSS */ + .tlen = CONST_MSS_V4 * 2, + .gso_len = CONST_MSS_V4, + .r_num_mss = 2, + }, + { + /* send 2 MSS + 1B */ + .tlen = (CONST_MSS_V4 * 2) + 1, + .gso_len = CONST_MSS_V4, + .r_num_mss = 2, + .r_len_last = 1, + }, + { + /* send MAX segs */ + .tlen = (ETH_MAX_MTU / CONST_MSS_V4) * CONST_MSS_V4, + .gso_len = CONST_MSS_V4, + .r_num_mss = (ETH_MAX_MTU / CONST_MSS_V4), + }, + + { + /* send MAX bytes */ + .tlen = ETH_MAX_MTU - CONST_HDRLEN_V4, + .gso_len = CONST_MSS_V4, + .r_num_mss = CONST_MAX_SEGS_V4, + .r_len_last = ETH_MAX_MTU - CONST_HDRLEN_V4 - + (CONST_MAX_SEGS_V4 * CONST_MSS_V4), + }, + { + /* send MAX + 1: fail */ + .tlen = ETH_MAX_MTU - CONST_HDRLEN_V4 + 1, + .gso_len = CONST_MSS_V4, + .tfail = true, + }, + { + /* EOL */ + } +}; + +#ifndef IP6_MAX_MTU +#define IP6_MAX_MTU (ETH_MAX_MTU + sizeof(struct ip6_hdr)) +#endif + +struct testcase testcases_v6[] = { + { + /* no GSO: send a single byte */ + .tlen = 1, + .r_len_last = 1, + }, + { + /* no GSO: send a single MSS */ + .tlen = CONST_MSS_V6, + .r_num_mss = 1, + }, + { + /* no GSO: send a single MSS + 1B: fail */ + .tlen = CONST_MSS_V6 + 1, + .tfail = true, + }, + { + /* send a single MSS: will fail with GSO, because the segment + * logic in udp4_ufo_fragment demands a gso skb to be > MTU + */ + .tlen = CONST_MSS_V6, + .gso_len = CONST_MSS_V6, + .tfail = true, + .r_num_mss = 1, + }, + { + /* send a single MSS + 1B */ + .tlen = CONST_MSS_V6 + 1, + .gso_len = CONST_MSS_V6, + .r_num_mss = 1, + .r_len_last = 1, + }, + { + /* send exactly 2 MSS */ + .tlen = CONST_MSS_V6 * 2, + .gso_len = CONST_MSS_V6, + .r_num_mss = 2, + }, + { + /* send 2 MSS + 1B */ + .tlen = (CONST_MSS_V6 * 2) + 1, + .gso_len = CONST_MSS_V6, + .r_num_mss = 2, + .r_len_last = 1, + }, + { + /* send MAX segs */ + .tlen = (IP6_MAX_MTU / CONST_MSS_V6) * CONST_MSS_V6, + .gso_len = CONST_MSS_V6, + .r_num_mss = (IP6_MAX_MTU / CONST_MSS_V6), + }, + + { + /* send MAX bytes */ + .tlen = IP6_MAX_MTU - CONST_HDRLEN_V6, + .gso_len = CONST_MSS_V6, + .r_num_mss = CONST_MAX_SEGS_V6, + .r_len_last = IP6_MAX_MTU - CONST_HDRLEN_V6 - + (CONST_MAX_SEGS_V6 * CONST_MSS_V6), + }, + { + /* send MAX + 1: fail */ + .tlen = IP6_MAX_MTU - CONST_HDRLEN_V6 + 1, + .gso_len = CONST_MSS_V6, + .tfail = true, + }, + { + /* EOL */ + } +}; + +static unsigned int get_device_mtu(int fd, const char *ifname) +{ + struct ifreq ifr; + + memset(&ifr, 0, sizeof(ifr)); + + strcpy(ifr.ifr_name, ifname); + + if (ioctl(fd, SIOCGIFMTU, &ifr)) + error(1, errno, "ioctl get mtu"); + + return ifr.ifr_mtu; +} + +static void __set_device_mtu(int fd, const char *ifname, unsigned int mtu) +{ + struct ifreq ifr; + + memset(&ifr, 0, sizeof(ifr)); + + ifr.ifr_mtu = mtu; + strcpy(ifr.ifr_name, ifname); + + if (ioctl(fd, SIOCSIFMTU, &ifr)) + error(1, errno, "ioctl set mtu"); +} + +static void set_device_mtu(int fd, int mtu) +{ + int val; + + val = get_device_mtu(fd, cfg_ifname); + fprintf(stderr, "device mtu (orig): %u\n", val); + + __set_device_mtu(fd, cfg_ifname, mtu); + val = get_device_mtu(fd, cfg_ifname); + if (val != mtu) + error(1, 0, "unable to set device mtu to %u\n", val); + + fprintf(stderr, "device mtu (test): %u\n", val); +} + +static void set_pmtu_discover(int fd, bool is_ipv4) +{ + int level, name, val; + + if (is_ipv4) { + level = SOL_IP; + name = IP_MTU_DISCOVER; + val = IP_PMTUDISC_DO; + } else { + level = SOL_IPV6; + name = IPV6_MTU_DISCOVER; + val = IPV6_PMTUDISC_DO; + } + + if (setsockopt(fd, level, name, &val, sizeof(val))) + error(1, errno, "setsockopt path mtu"); +} + +static bool send_one(int fd, int len, int gso_len, + struct sockaddr *addr, socklen_t alen) +{ + char control[CMSG_SPACE(sizeof(uint16_t))] = {0}; + struct msghdr msg = {0}; + struct iovec iov = {0}; + struct cmsghdr *cm; + int ret; + + iov.iov_base = buf; + iov.iov_len = len; + + msg.msg_iov = &iov; + msg.msg_iovlen = 1; + + msg.msg_name = addr; + msg.msg_namelen = alen; + + if (gso_len && !cfg_do_setsockopt) { + msg.msg_control = control; + msg.msg_controllen = sizeof(control); + + cm = CMSG_FIRSTHDR(&msg); + cm->cmsg_level = SOL_UDP; + cm->cmsg_type = UDP_SEGMENT; + cm->cmsg_len = CMSG_LEN(sizeof(uint16_t)); + *((uint16_t *) CMSG_DATA(cm)) = gso_len; + } + + ret = sendmsg(fd, &msg, 0); + if (ret == -1 && (errno == EMSGSIZE || errno == ENOMEM)) + return false; + if (ret == -1) + error(1, errno, "sendmsg"); + if (ret != len) + error(1, 0, "sendto: %d != %u", ret, len); + + return true; +} + +static int recv_one(int fd, int flags) +{ + int ret; + + ret = recv(fd, buf, sizeof(buf), flags); + if (ret == -1 && errno == EAGAIN && (flags & MSG_DONTWAIT)) + return 0; + if (ret == -1) + error(1, errno, "recv"); + + return ret; +} + +static void run_one(struct testcase *test, int fdt, int fdr, + struct sockaddr *addr, socklen_t alen) +{ + int i, ret, val, mss; + bool sent; + + fprintf(stderr, "ipv%d tx:%d gso:%d %s\n", + addr->sa_family == AF_INET ? 4 : 6, + test->tlen, test->gso_len, + test->tfail ? "(fail)" : ""); + + val = test->gso_len; + if (cfg_do_setsockopt) { + if (setsockopt(fdt, SOL_UDP, UDP_SEGMENT, &val, sizeof(val))) + error(1, errno, "setsockopt udp segment"); + } + + sent = send_one(fdt, test->tlen, test->gso_len, addr, alen); + if (sent && test->tfail) + error(1, 0, "send succeeded while expecting failure"); + if (!sent && !test->tfail) + error(1, 0, "send failed while expecting success"); + if (!sent) + return; + + mss = addr->sa_family == AF_INET ? CONST_MSS_V4 : CONST_MSS_V6; + + /* Recv all full MSS datagrams */ + for (i = 0; i < test->r_num_mss; i++) { + ret = recv_one(fdr, 0); + if (ret != mss) + error(1, 0, "recv.%d: %d != %d", i, ret, mss); + } + + /* Recv the non-full last datagram, if tlen was not a multiple of mss */ + if (test->r_len_last) { + ret = recv_one(fdr, 0); + if (ret != test->r_len_last) + error(1, 0, "recv.%d: %d != %d (last)", + i, ret, test->r_len_last); + } + + /* Verify received all data */ + ret = recv_one(fdr, MSG_DONTWAIT); + if (ret) + error(1, 0, "recv: unexpected datagram"); +} + +static void run_all(int fdt, int fdr, struct sockaddr *addr, socklen_t alen) +{ + struct testcase *tests, *test; + + tests = addr->sa_family == AF_INET ? testcases_v4 : testcases_v6; + + for (test = tests; test->tlen; test++) { + /* if a specific test is given, then skip all others */ + if (cfg_specific_test_id == -1 || + cfg_specific_test_id == test - tests) + run_one(test, fdt, fdr, addr, alen); + } +} + +static void run_test(struct sockaddr *addr, socklen_t alen) +{ + struct timeval tv = { .tv_usec = 100 * 1000 }; + int fdr, fdt; + + fdr = socket(addr->sa_family, SOCK_DGRAM, 0); + if (fdr == -1) + error(1, errno, "socket r"); + + if (bind(fdr, addr, alen)) + error(1, errno, "bind"); + + /* Have tests fail quickly instead of hang */ + if (setsockopt(fdr, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv))) + error(1, errno, "setsockopt rcv timeout"); + + fdt = socket(addr->sa_family, SOCK_DGRAM, 0); + if (fdt == -1) + error(1, errno, "socket t"); + + /* Do not fragment these datagrams: only succeed if GSO works */ + set_pmtu_discover(fdt, addr->sa_family == AF_INET); + + if (cfg_do_connectionless) { + set_device_mtu(fdt, CONST_MTU_TEST); + run_all(fdt, fdr, addr, alen); + } + + if (close(fdt)) + error(1, errno, "close t"); + if (close(fdr)) + error(1, errno, "close r"); +} + +static void run_test_v4(void) +{ + struct sockaddr_in addr = {0}; + + addr.sin_family = AF_INET; + addr.sin_port = htons(cfg_port); + addr.sin_addr = addr4; + + run_test((void *)&addr, sizeof(addr)); +} + +static void run_test_v6(void) +{ + struct sockaddr_in6 addr = {0}; + + addr.sin6_family = AF_INET6; + addr.sin6_port = htons(cfg_port); + addr.sin6_addr = addr6; + + run_test((void *)&addr, sizeof(addr)); +} + +static void parse_opts(int argc, char **argv) +{ + int c; + + while ((c = getopt(argc, argv, "46Cst:")) != -1) { + switch (c) { + case '4': + cfg_do_ipv4 = true; + break; + case '6': + cfg_do_ipv6 = true; + break; + case 'C': + cfg_do_connectionless = true; + break; + case 's': + cfg_do_setsockopt = true; + break; + case 't': + cfg_specific_test_id = strtoul(optarg, NULL, 0); + break; + default: + error(1, 0, "%s: parse error", argv[0]); + } + } +} + +int main(int argc, char **argv) +{ + parse_opts(argc, argv); + + if (cfg_do_ipv4) + run_test_v4(); + if (cfg_do_ipv6) + run_test_v6(); + + fprintf(stderr, "OK\n"); + return 0; +} + diff --git a/tools/testing/selftests/net/udpgso.sh b/tools/testing/selftests/net/udpgso.sh new file mode 100755 index 000000000000..7977b97e060c --- /dev/null +++ b/tools/testing/selftests/net/udpgso.sh @@ -0,0 +1,16 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-2.0 +# +# Run a series of udpgso regression tests + +echo "ipv4 cmsg" +./in_netns.sh ./udpgso -4 -C + +echo "ipv4 setsockopt" +./in_netns.sh ./udpgso -4 -C -s + +echo "ipv6 cmsg" +./in_netns.sh ./udpgso -6 -C + +echo "ipv6 setsockopt" +./in_netns.sh ./udpgso -6 -C -s From patchwork Thu Apr 26 17:42:23 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Willem de Bruijn X-Patchwork-Id: 905286 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="FowvpSQx"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40X4Cn1v6Dz9ryr for ; Fri, 27 Apr 2018 03:43:01 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932308AbeDZRm6 (ORCPT ); Thu, 26 Apr 2018 13:42:58 -0400 Received: from mail-qk0-f193.google.com ([209.85.220.193]:38902 "EHLO mail-qk0-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756840AbeDZRmg (ORCPT ); Thu, 26 Apr 2018 13:42:36 -0400 Received: by mail-qk0-f193.google.com with SMTP id b39so26244580qkb.5 for ; Thu, 26 Apr 2018 10:42:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=oUr6zSgFTOWXInw37pxXUdObfF/NQw7+/9JkHNecV/k=; b=FowvpSQxibklcpcKotRzvP1jMegowaaG7o8hjSCQmi3U5D1TzXapyezwntFLKFeKlG TwKvGocvHi2csmaB09Rk1lteuDq5iL4g2sA7DnpPfYwsFYqbQUt4KEybUCOAteUpUbDX b8ZC38xD3uv2eqakyX/p84bJmSsjaCkXxVeuRilRD8HE64BYo18F8JQG2a86FuRLixEa QVMeG4+u+FlJNPUr0ZiW+hMt74mxJNvUFWAPw/Doz53qr90hTPKmcki/Jvr8p7DqsaZi bZgSDH0a71wWX/j6XfeshBEH2O/3+VVceMk+cSRdm/Idqsouy7FbsXLpS8jkhtXLb7b4 0QyA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=oUr6zSgFTOWXInw37pxXUdObfF/NQw7+/9JkHNecV/k=; b=SZMBw3t3eN9Z4Jvu/yuaM+GQ0IxhYIkpwaVSlHfcBvCP+L9sQbKJlVHy0fSwc2Oo0a 2dWkQrQ5tLjJSjJz7LIPMuAxmZ5xXNZzFow2t+ANgxix8Q5Ho8a3XH8GuUUtVRHZZSFx tX/mB1xL3iX2jVrcEcEkyrGk1qaoMeNdGog3MIx6Va0FeRye9wMcuM4IT5jGVlba5qyL g4nR2yy8b/T8sigL9uTc2SrafctcvTuDzAquzmbq1FLbna85mqGJ/81U5NqRIlkibpot w+MBRr9YyBhHkI84RjbDFknAsuVzGAZInAXStw7RSB8MauliqakAkuxpCYmtobElArjS YgIA== X-Gm-Message-State: ALQs6tD1gCCSigy6NhkcxmkMutp8lzLVOgEPEwDBr04N/JwAZTIkvMvH 3iyACr7fIX8uGwlr03koo9LQDlJc X-Google-Smtp-Source: AIpwx48MplJBMdsqWGBv5w1BIJrX7vh+V5sM4zdCJNrtjMJpLvr7/eHrbgV6r0x67gcYkU9TB0kmTQ== X-Received: by 10.55.26.41 with SMTP id a41mr34229201qka.401.1524764554887; Thu, 26 Apr 2018 10:42:34 -0700 (PDT) Received: from willemb1.nyc.corp.google.com ([2620:0:1003:315:3fa1:a34c:1128:1d39]) by smtp.gmail.com with ESMTPSA id o14-v6sm12927515qta.23.2018.04.26.10.42.34 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 26 Apr 2018 10:42:34 -0700 (PDT) From: Willem de Bruijn To: netdev@vger.kernel.org Cc: davem@davemloft.net, alexander.duyck@gmail.com, Willem de Bruijn Subject: [PATCH net-next v2 09/11] selftests: udp gso with connected sockets Date: Thu, 26 Apr 2018 13:42:23 -0400 Message-Id: <20180426174225.246388-10-willemdebruijn.kernel@gmail.com> X-Mailer: git-send-email 2.17.0.484.g0c8726318c-goog In-Reply-To: <20180426174225.246388-1-willemdebruijn.kernel@gmail.com> References: <20180426174225.246388-1-willemdebruijn.kernel@gmail.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Willem de Bruijn Connected sockets use path mtu instead of device mtu. Test this path by inserting a route mtu that is lower than the device mtu. Verify that the path mtu for the connection matches this lower number, then run the same test as in the connectionless case. Signed-off-by: Willem de Bruijn --- tools/testing/selftests/net/udpgso.c | 117 +++++++++++++++++++++++++- tools/testing/selftests/net/udpgso.sh | 7 ++ 2 files changed, 122 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/net/udpgso.c b/tools/testing/selftests/net/udpgso.c index 68a230dfee73..a47dca025346 100644 --- a/tools/testing/selftests/net/udpgso.c +++ b/tools/testing/selftests/net/udpgso.c @@ -47,6 +47,7 @@ static bool cfg_do_ipv4; static bool cfg_do_ipv6; +static bool cfg_do_connected; static bool cfg_do_connectionless; static bool cfg_do_setsockopt; static int cfg_specific_test_id = -1; @@ -273,6 +274,101 @@ static void set_pmtu_discover(int fd, bool is_ipv4) error(1, errno, "setsockopt path mtu"); } +static unsigned int get_path_mtu(int fd, bool is_ipv4) +{ + socklen_t vallen; + unsigned int mtu; + int ret; + + vallen = sizeof(mtu); + if (is_ipv4) + ret = getsockopt(fd, SOL_IP, IP_MTU, &mtu, &vallen); + else + ret = getsockopt(fd, SOL_IPV6, IPV6_MTU, &mtu, &vallen); + + if (ret) + error(1, errno, "getsockopt mtu"); + + + fprintf(stderr, "path mtu (read): %u\n", mtu); + return mtu; +} + +/* very wordy version of system("ip route add dev lo mtu 1500 127.0.0.3/32") */ +static void set_route_mtu(int mtu, bool is_ipv4) +{ + struct sockaddr_nl nladdr = { .nl_family = AF_NETLINK }; + struct nlmsghdr *nh; + struct rtattr *rta; + struct rtmsg *rt; + char data[NLMSG_ALIGN(sizeof(*nh)) + + NLMSG_ALIGN(sizeof(*rt)) + + NLMSG_ALIGN(RTA_LENGTH(sizeof(addr6))) + + NLMSG_ALIGN(RTA_LENGTH(sizeof(int))) + + NLMSG_ALIGN(RTA_LENGTH(0) + RTA_LENGTH(sizeof(int)))]; + int fd, ret, alen, off = 0; + + alen = is_ipv4 ? sizeof(addr4) : sizeof(addr6); + + fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE); + if (fd == -1) + error(1, errno, "socket netlink"); + + memset(data, 0, sizeof(data)); + + nh = (void *)data; + nh->nlmsg_type = RTM_NEWROUTE; + nh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE; + off += NLMSG_ALIGN(sizeof(*nh)); + + rt = (void *)(data + off); + rt->rtm_family = is_ipv4 ? AF_INET : AF_INET6; + rt->rtm_table = RT_TABLE_MAIN; + rt->rtm_dst_len = alen << 3; + rt->rtm_protocol = RTPROT_BOOT; + rt->rtm_scope = RT_SCOPE_UNIVERSE; + rt->rtm_type = RTN_UNICAST; + off += NLMSG_ALIGN(sizeof(*rt)); + + rta = (void *)(data + off); + rta->rta_type = RTA_DST; + rta->rta_len = RTA_LENGTH(alen); + if (is_ipv4) + memcpy(RTA_DATA(rta), &addr4, alen); + else + memcpy(RTA_DATA(rta), &addr6, alen); + off += NLMSG_ALIGN(rta->rta_len); + + rta = (void *)(data + off); + rta->rta_type = RTA_OIF; + rta->rta_len = RTA_LENGTH(sizeof(int)); + *((int *)(RTA_DATA(rta))) = 1; //if_nametoindex("lo"); + off += NLMSG_ALIGN(rta->rta_len); + + /* MTU is a subtype in a metrics type */ + rta = (void *)(data + off); + rta->rta_type = RTA_METRICS; + rta->rta_len = RTA_LENGTH(0) + RTA_LENGTH(sizeof(int)); + off += NLMSG_ALIGN(rta->rta_len); + + /* now fill MTU subtype. Note that it fits within above rta_len */ + rta = (void *)(((char *) rta) + RTA_LENGTH(0)); + rta->rta_type = RTAX_MTU; + rta->rta_len = RTA_LENGTH(sizeof(int)); + *((int *)(RTA_DATA(rta))) = mtu; + + nh->nlmsg_len = off; + + ret = sendto(fd, data, off, 0, (void *)&nladdr, sizeof(nladdr)); + if (ret != off) + error(1, errno, "send netlink: %uB != %uB\n", ret, off); + + if (close(fd)) + error(1, errno, "close netlink"); + + fprintf(stderr, "route mtu (test): %u\n", mtu); +} + static bool send_one(int fd, int len, int gso_len, struct sockaddr *addr, socklen_t alen) { @@ -391,7 +487,7 @@ static void run_all(int fdt, int fdr, struct sockaddr *addr, socklen_t alen) static void run_test(struct sockaddr *addr, socklen_t alen) { struct timeval tv = { .tv_usec = 100 * 1000 }; - int fdr, fdt; + int fdr, fdt, val; fdr = socket(addr->sa_family, SOCK_DGRAM, 0); if (fdr == -1) @@ -416,6 +512,20 @@ static void run_test(struct sockaddr *addr, socklen_t alen) run_all(fdt, fdr, addr, alen); } + if (cfg_do_connected) { + set_device_mtu(fdt, CONST_MTU_TEST + 100); + set_route_mtu(CONST_MTU_TEST, addr->sa_family == AF_INET); + + if (connect(fdt, addr, alen)) + error(1, errno, "connect"); + + val = get_path_mtu(fdt, addr->sa_family == AF_INET); + if (val != CONST_MTU_TEST) + error(1, 0, "bad path mtu %u\n", val); + + run_all(fdt, fdr, addr, 0 /* use connected addr */); + } + if (close(fdt)) error(1, errno, "close t"); if (close(fdr)) @@ -448,7 +558,7 @@ static void parse_opts(int argc, char **argv) { int c; - while ((c = getopt(argc, argv, "46Cst:")) != -1) { + while ((c = getopt(argc, argv, "46cCst:")) != -1) { switch (c) { case '4': cfg_do_ipv4 = true; @@ -456,6 +566,9 @@ static void parse_opts(int argc, char **argv) case '6': cfg_do_ipv6 = true; break; + case 'c': + cfg_do_connected = true; + break; case 'C': cfg_do_connectionless = true; break; diff --git a/tools/testing/selftests/net/udpgso.sh b/tools/testing/selftests/net/udpgso.sh index 7977b97e060c..7cdf0e7c1dde 100755 --- a/tools/testing/selftests/net/udpgso.sh +++ b/tools/testing/selftests/net/udpgso.sh @@ -14,3 +14,10 @@ echo "ipv6 cmsg" echo "ipv6 setsockopt" ./in_netns.sh ./udpgso -6 -C -s + +echo "ipv4 connected" +./in_netns.sh ./udpgso -4 -c + +# blocked on 2nd loopback address +# echo "ipv6 connected" +# ./in_netns.sh ./udpgso -6 -c From patchwork Thu Apr 26 17:42:24 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Willem de Bruijn X-Patchwork-Id: 905285 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="Avb0hdrX"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40X4Ch0K1dz9ry1 for ; Fri, 27 Apr 2018 03:42:56 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932288AbeDZRmx (ORCPT ); Thu, 26 Apr 2018 13:42:53 -0400 Received: from mail-qk0-f194.google.com ([209.85.220.194]:38904 "EHLO mail-qk0-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756863AbeDZRmg (ORCPT ); Thu, 26 Apr 2018 13:42:36 -0400 Received: by mail-qk0-f194.google.com with SMTP id b39so26244601qkb.5 for ; Thu, 26 Apr 2018 10:42:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=ItBig+yz11vXIHcEwutC2O/40D4GpIH3HB5GospcmS8=; b=Avb0hdrXnUXs1adkgkFUx1Yd8MXEzoTbroqHYr7O4/1KEomq+ZQfVWSX37EAu4Bbo+ vC80AN6gSXkXeav4cCkZLUQ8WKltYzMLL3fNvUCohvs+cQEsm+AEMs+bTcTC15KNfcHu hqi7wQXutYk494b3cYEDm+2BiQsocf2jc5l0Ec/eTOdUzY0vj/GbFcto625Hl/AmSjRG 91UrpUMf4X7be4YmoR6N9StQX7cJ2jQWevECkeeHuNLZdoq2EhdZMjnUkLICitAixbw9 E1CbSVY9BggRmA1wSKmhryOM5TI1e9Yl8lYRfY9WcFMNanmRAYBlB1VMAjB8h2i02pqm Tc6Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=ItBig+yz11vXIHcEwutC2O/40D4GpIH3HB5GospcmS8=; b=H0rtA70jv6JhBsE0vT+TeiNyvrC+6zDeK7L+Y69GBvHZFeCudoUv80kbExR97k1qvE gSd1xPhlz28Qu3w+ZIG8zu0djC11ERWA+WTx8jYjko3UTYuc3boPJ/wiKel5roCqIjyP yoddawsKJTmCc9u+AqQfi9SWLUih6ml1HpI4r7eVqQnHANabvuVgVATv/r+F3D4nk2JC 86Np861aBsxnKPqxtoud9GsknJMVatvMDdBe171nYQGMEVKYzpJ8GZBiX2STTblVz3GH 10DtJAnYgGvqUylhGSS1wiiQB4PiFLpuWyhPAAm3y4WqqnkEivzSdN20lyVdEr4RX8Rh oMKg== X-Gm-Message-State: ALQs6tBHdYlQxU4muMgZK5DyTUQhgIvmgSL5o+QQ1p4/xeVlYMo4y98/ 7pcBdkEbYs65Hi93/vkfWPIIoe9a X-Google-Smtp-Source: AB8JxZo0jF048uaAC1dQ1gmpB+0YQ7CakuLR6rZN0ADakar5HFlQIhVBqDIkn+RWiVPvSqgobZh7MA== X-Received: by 10.55.116.71 with SMTP id p68mr33783620qkc.29.1524764555475; Thu, 26 Apr 2018 10:42:35 -0700 (PDT) Received: from willemb1.nyc.corp.google.com ([2620:0:1003:315:3fa1:a34c:1128:1d39]) by smtp.gmail.com with ESMTPSA id o14-v6sm12927515qta.23.2018.04.26.10.42.34 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 26 Apr 2018 10:42:35 -0700 (PDT) From: Willem de Bruijn To: netdev@vger.kernel.org Cc: davem@davemloft.net, alexander.duyck@gmail.com, Willem de Bruijn Subject: [PATCH net-next v2 10/11] selftests: udp gso with corking Date: Thu, 26 Apr 2018 13:42:24 -0400 Message-Id: <20180426174225.246388-11-willemdebruijn.kernel@gmail.com> X-Mailer: git-send-email 2.17.0.484.g0c8726318c-goog In-Reply-To: <20180426174225.246388-1-willemdebruijn.kernel@gmail.com> References: <20180426174225.246388-1-willemdebruijn.kernel@gmail.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Willem de Bruijn Corked sockets take a different path to construct a udp datagram than the lockless fast path. Test this alternate path. Signed-off-by: Willem de Bruijn --- tools/testing/selftests/net/udpgso.c | 42 ++++++++++++++++++++------- tools/testing/selftests/net/udpgso.sh | 6 ++++ 2 files changed, 38 insertions(+), 10 deletions(-) diff --git a/tools/testing/selftests/net/udpgso.c b/tools/testing/selftests/net/udpgso.c index a47dca025346..6b1998b9f7df 100644 --- a/tools/testing/selftests/net/udpgso.c +++ b/tools/testing/selftests/net/udpgso.c @@ -49,6 +49,7 @@ static bool cfg_do_ipv4; static bool cfg_do_ipv6; static bool cfg_do_connected; static bool cfg_do_connectionless; +static bool cfg_do_msgmore; static bool cfg_do_setsockopt; static int cfg_specific_test_id = -1; @@ -369,6 +370,23 @@ static void set_route_mtu(int mtu, bool is_ipv4) fprintf(stderr, "route mtu (test): %u\n", mtu); } +static bool __send_one(int fd, struct msghdr *msg, int flags) +{ + int ret; + + ret = sendmsg(fd, msg, flags); + if (ret == -1 && (errno == EMSGSIZE || errno == ENOMEM)) + return false; + if (ret == -1) + error(1, errno, "sendmsg"); + if (ret != msg->msg_iov->iov_len) + error(1, 0, "sendto: %d != %lu", ret, msg->msg_iov->iov_len); + if (msg->msg_flags) + error(1, 0, "sendmsg: return flags 0x%x\n", msg->msg_flags); + + return true; +} + static bool send_one(int fd, int len, int gso_len, struct sockaddr *addr, socklen_t alen) { @@ -376,7 +394,6 @@ static bool send_one(int fd, int len, int gso_len, struct msghdr msg = {0}; struct iovec iov = {0}; struct cmsghdr *cm; - int ret; iov.iov_base = buf; iov.iov_len = len; @@ -398,15 +415,17 @@ static bool send_one(int fd, int len, int gso_len, *((uint16_t *) CMSG_DATA(cm)) = gso_len; } - ret = sendmsg(fd, &msg, 0); - if (ret == -1 && (errno == EMSGSIZE || errno == ENOMEM)) - return false; - if (ret == -1) - error(1, errno, "sendmsg"); - if (ret != len) - error(1, 0, "sendto: %d != %u", ret, len); + /* If MSG_MORE, send 1 byte followed by remainder */ + if (cfg_do_msgmore && len > 1) { + iov.iov_len = 1; + if (!__send_one(fd, &msg, MSG_MORE)) + error(1, 0, "send 1B failed"); - return true; + iov.iov_base++; + iov.iov_len = len - 1; + } + + return __send_one(fd, &msg, 0); } static int recv_one(int fd, int flags) @@ -558,7 +577,7 @@ static void parse_opts(int argc, char **argv) { int c; - while ((c = getopt(argc, argv, "46cCst:")) != -1) { + while ((c = getopt(argc, argv, "46cCmst:")) != -1) { switch (c) { case '4': cfg_do_ipv4 = true; @@ -572,6 +591,9 @@ static void parse_opts(int argc, char **argv) case 'C': cfg_do_connectionless = true; break; + case 'm': + cfg_do_msgmore = true; + break; case 's': cfg_do_setsockopt = true; break; diff --git a/tools/testing/selftests/net/udpgso.sh b/tools/testing/selftests/net/udpgso.sh index 7cdf0e7c1dde..fec24f584fe9 100755 --- a/tools/testing/selftests/net/udpgso.sh +++ b/tools/testing/selftests/net/udpgso.sh @@ -21,3 +21,9 @@ echo "ipv4 connected" # blocked on 2nd loopback address # echo "ipv6 connected" # ./in_netns.sh ./udpgso -6 -c + +echo "ipv4 msg_more" +./in_netns.sh ./udpgso -4 -C -m + +echo "ipv6 msg_more" +./in_netns.sh ./udpgso -6 -C -m From patchwork Thu Apr 26 17:42:25 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Willem de Bruijn X-Patchwork-Id: 905284 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="IOlwRQRE"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40X4Cb5t1Qz9ry1 for ; Fri, 27 Apr 2018 03:42:51 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932272AbeDZRmt (ORCPT ); Thu, 26 Apr 2018 13:42:49 -0400 Received: from mail-qk0-f194.google.com ([209.85.220.194]:41649 "EHLO mail-qk0-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756870AbeDZRmi (ORCPT ); Thu, 26 Apr 2018 13:42:38 -0400 Received: by mail-qk0-f194.google.com with SMTP id d125so10217473qkb.8 for ; Thu, 26 Apr 2018 10:42:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=jOojZT7UIDvanfUaQ9sbifNyMm6A7QgBp9ZtqtiAsiw=; b=IOlwRQRElB3NDS+Ci6eRAXQagXgv6JSoyKiiwFuN6sbL7x3VJ234f/QhEcikBcMhFM dAtSSAto8zn6fUWTXR0cVq4yKoSnlDuxOZaUS9bqbpqnjLtm9rpIQyuEQC32daZiIfI9 Su7F4qR6GSKgcYTOGQhajzmJv0OHGwgbKW002UKYRh4x89wrD58BX1+rrfqz+ZUmwp6F e4F+bZwbZEpr90aJNp7ScKM/RWlyxARAs9Z/sHDbU5ZhwwERaHsekH+ImRCcDavMYYRn BYYkcV8JvxlYV5x5aiBK0GPlDIxC9mt9clMhxksdh+aEz1fiWkhXz4zHnlV8JHZhg/i1 Mteg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=jOojZT7UIDvanfUaQ9sbifNyMm6A7QgBp9ZtqtiAsiw=; b=tCYkrxmPDS5ZrKnNxM6lu/+UKDF84tSIUmcIIaV23+2CpcBq0rL7ClZd4YVLPYzKkO 1cFr2RLf/+o8pGgzP7XZflVmgRzn45Gmslv/JARZOVa7Um/YhqTX4RiJfjnJlNey0u09 DokJFkBgvl5AeAQf2ORyXG3gJNu5zskZFsDH3Spp4MN1+5eRXNEp12nJ1J2Zto4EXW5v c04y398PFjwmd6yC3tH1r4xU0nxh32NEWQmGNi3xdtW49jhfupTskDabZos6BNt/HDnu axcoopJ8mV+1F3axd1AK6qUO6QBVtuvb+bm92M1Sw4oRl6+cBdf3B0gUAZtCs8N4Ym9H 6/mA== X-Gm-Message-State: ALQs6tBiTR5ZfnckJClqVhPknIQlyyjLPN/F505r4ju3zRwRwIFDsjBI eAv1NO1anpyCShrFMqh/Ceir0ti3 X-Google-Smtp-Source: AIpwx49mjbPTgtfTm7tQdZgVZNIZs1c6UlSdnigvUB58DfXNYldYVdVRKaQG6lKsdn9woIpo+3xFBQ== X-Received: by 10.55.86.195 with SMTP id k186mr36949492qkb.0.1524764556623; Thu, 26 Apr 2018 10:42:36 -0700 (PDT) Received: from willemb1.nyc.corp.google.com ([2620:0:1003:315:3fa1:a34c:1128:1d39]) by smtp.gmail.com with ESMTPSA id o14-v6sm12927515qta.23.2018.04.26.10.42.35 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 26 Apr 2018 10:42:35 -0700 (PDT) From: Willem de Bruijn To: netdev@vger.kernel.org Cc: davem@davemloft.net, alexander.duyck@gmail.com, Willem de Bruijn Subject: [PATCH net-next v2 11/11] selftests: udp gso benchmark Date: Thu, 26 Apr 2018 13:42:25 -0400 Message-Id: <20180426174225.246388-12-willemdebruijn.kernel@gmail.com> X-Mailer: git-send-email 2.17.0.484.g0c8726318c-goog In-Reply-To: <20180426174225.246388-1-willemdebruijn.kernel@gmail.com> References: <20180426174225.246388-1-willemdebruijn.kernel@gmail.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Willem de Bruijn Send udp data between a source and sink, optionally with udp gso. The two processes are expected to be run on separate hosts. A script is included that runs them together over loopback in a single namespace for functionality testing. Signed-off-by: Willem de Bruijn --- tools/testing/selftests/net/.gitignore | 2 + tools/testing/selftests/net/Makefile | 3 +- tools/testing/selftests/net/udpgso_bench.sh | 74 +++ tools/testing/selftests/net/udpgso_bench_rx.c | 265 +++++++++++ tools/testing/selftests/net/udpgso_bench_tx.c | 420 ++++++++++++++++++ 5 files changed, 763 insertions(+), 1 deletion(-) create mode 100755 tools/testing/selftests/net/udpgso_bench.sh create mode 100644 tools/testing/selftests/net/udpgso_bench_rx.c create mode 100644 tools/testing/selftests/net/udpgso_bench_tx.c diff --git a/tools/testing/selftests/net/.gitignore b/tools/testing/selftests/net/.gitignore index 67a26240407c..f0e6c35a93ae 100644 --- a/tools/testing/selftests/net/.gitignore +++ b/tools/testing/selftests/net/.gitignore @@ -9,3 +9,5 @@ reuseport_dualstack reuseaddr_conflict tcp_mmap udpgso +udpgso_bench_rx +udpgso_bench_tx diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile index db3303348315..df9102ec7b7a 100644 --- a/tools/testing/selftests/net/Makefile +++ b/tools/testing/selftests/net/Makefile @@ -6,12 +6,13 @@ CFLAGS += -I../../../../usr/include/ TEST_PROGS := run_netsocktests run_afpackettests test_bpf.sh netdevice.sh rtnetlink.sh TEST_PROGS += fib_tests.sh fib-onlink-tests.sh in_netns.sh pmtu.sh udpgso.sh +TEST_PROGS += udpgso_bench.sh TEST_GEN_FILES = socket TEST_GEN_FILES += psock_fanout psock_tpacket msg_zerocopy TEST_GEN_FILES += tcp_mmap TEST_GEN_PROGS = reuseport_bpf reuseport_bpf_cpu reuseport_bpf_numa TEST_GEN_PROGS += reuseport_dualstack reuseaddr_conflict -TEST_GEN_PROGS += udpgso +TEST_GEN_PROGS += udpgso udpgso_bench_tx udpgso_bench_rx include ../lib.mk diff --git a/tools/testing/selftests/net/udpgso_bench.sh b/tools/testing/selftests/net/udpgso_bench.sh new file mode 100755 index 000000000000..792fa4d0285e --- /dev/null +++ b/tools/testing/selftests/net/udpgso_bench.sh @@ -0,0 +1,74 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-2.0 +# +# Run a series of udpgso benchmarks + +wake_children() { + local -r jobs="$(jobs -p)" + + if [[ "${jobs}" != "" ]]; then + kill -1 ${jobs} 2>/dev/null + fi +} +trap wake_children EXIT + +run_one() { + local -r args=$@ + + ./udpgso_bench_rx & + ./udpgso_bench_rx -t & + + ./udpgso_bench_tx ${args} +} + +run_in_netns() { + local -r args=$@ + + ./in_netns.sh $0 __subprocess ${args} +} + +run_udp() { + local -r args=$@ + + echo "udp" + run_in_netns ${args} + + echo "udp gso" + run_in_netns ${args} -S + + echo "udp gso zerocopy" + run_in_netns ${args} -S -z +} + +run_tcp() { + local -r args=$@ + + echo "tcp" + run_in_netns ${args} -t + + echo "tcp zerocopy" + run_in_netns ${args} -t -z +} + +run_all() { + local -r core_args="-l 4" + local -r ipv4_args="${core_args} -4 -D 127.0.0.1" + local -r ipv6_args="${core_args} -6 -D ::1" + + echo "ipv4" + run_tcp "${ipv4_args}" + run_udp "${ipv4_args}" + + echo "ipv6" + run_tcp "${ipv4_args}" + run_udp "${ipv6_args}" +} + +if [[ $# -eq 0 ]]; then + run_all +elif [[ $1 == "__subprocess" ]]; then + shift + run_one $@ +else + run_in_netns $@ +fi diff --git a/tools/testing/selftests/net/udpgso_bench_rx.c b/tools/testing/selftests/net/udpgso_bench_rx.c new file mode 100644 index 000000000000..727cf67a3f75 --- /dev/null +++ b/tools/testing/selftests/net/udpgso_bench_rx.c @@ -0,0 +1,265 @@ +// SPDX-License-Identifier: GPL-2.0 + +#define _GNU_SOURCE + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +static int cfg_port = 8000; +static bool cfg_tcp; +static bool cfg_verify; + +static bool interrupted; +static unsigned long packets, bytes; + +static void sigint_handler(int signum) +{ + if (signum == SIGINT) + interrupted = true; +} + +static unsigned long gettimeofday_ms(void) +{ + struct timeval tv; + + gettimeofday(&tv, NULL); + return (tv.tv_sec * 1000) + (tv.tv_usec / 1000); +} + +static void do_poll(int fd) +{ + struct pollfd pfd; + int ret; + + pfd.events = POLLIN; + pfd.revents = 0; + pfd.fd = fd; + + do { + ret = poll(&pfd, 1, 10); + if (ret == -1) + error(1, errno, "poll"); + if (ret == 0) + continue; + if (pfd.revents != POLLIN) + error(1, errno, "poll: 0x%x expected 0x%x\n", + pfd.revents, POLLIN); + } while (!ret && !interrupted); +} + +static int do_socket(bool do_tcp) +{ + struct sockaddr_in6 addr = {0}; + int fd, val; + + fd = socket(PF_INET6, cfg_tcp ? SOCK_STREAM : SOCK_DGRAM, 0); + if (fd == -1) + error(1, errno, "socket"); + + val = 1 << 21; + if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &val, sizeof(val))) + error(1, errno, "setsockopt rcvbuf"); + val = 1; + if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &val, sizeof(val))) + error(1, errno, "setsockopt reuseport"); + + addr.sin6_family = PF_INET6; + addr.sin6_port = htons(cfg_port); + addr.sin6_addr = in6addr_any; + if (bind(fd, (void *) &addr, sizeof(addr))) + error(1, errno, "bind"); + + if (do_tcp) { + int accept_fd = fd; + + if (listen(accept_fd, 1)) + error(1, errno, "listen"); + + do_poll(accept_fd); + + fd = accept(accept_fd, NULL, NULL); + if (fd == -1) + error(1, errno, "accept"); + if (close(accept_fd)) + error(1, errno, "close accept fd"); + } + + return fd; +} + +/* Flush all outstanding bytes for the tcp receive queue */ +static void do_flush_tcp(int fd) +{ + int ret; + + while (true) { + /* MSG_TRUNC flushes up to len bytes */ + ret = recv(fd, NULL, 1 << 21, MSG_TRUNC | MSG_DONTWAIT); + if (ret == -1 && errno == EAGAIN) + return; + if (ret == -1) + error(1, errno, "flush"); + if (ret == 0) { + /* client detached */ + exit(0); + } + + packets++; + bytes += ret; + } + +} + +static char sanitized_char(char val) +{ + return (val >= 'a' && val <= 'z') ? val : '.'; +} + +static void do_verify_udp(const char *data, int len) +{ + char cur = data[0]; + int i; + + /* verify contents */ + if (cur < 'a' || cur > 'z') + error(1, 0, "data initial byte out of range"); + + for (i = 1; i < len; i++) { + if (cur == 'z') + cur = 'a'; + else + cur++; + + if (data[i] != cur) + error(1, 0, "data[%d]: len %d, %c(%hhu) != %c(%hhu)\n", + i, len, + sanitized_char(data[i]), data[i], + sanitized_char(cur), cur); + } +} + +/* Flush all outstanding datagrams. Verify first few bytes of each. */ +static void do_flush_udp(int fd) +{ + static char rbuf[ETH_DATA_LEN]; + int ret, len, budget = 256; + + len = cfg_verify ? sizeof(rbuf) : 0; + while (budget--) { + /* MSG_TRUNC will make return value full datagram length */ + ret = recv(fd, rbuf, len, MSG_TRUNC | MSG_DONTWAIT); + if (ret == -1 && errno == EAGAIN) + return; + if (ret == -1) + error(1, errno, "recv"); + if (len) { + if (ret == 0) + error(1, errno, "recv: 0 byte datagram\n"); + + do_verify_udp(rbuf, ret); + } + + packets++; + bytes += ret; + } +} + +static void usage(const char *filepath) +{ + error(1, 0, "Usage: %s [-tv] [-p port]", filepath); +} + +static void parse_opts(int argc, char **argv) +{ + int c; + + while ((c = getopt(argc, argv, "ptv")) != -1) { + switch (c) { + case 'p': + cfg_port = htons(strtoul(optarg, NULL, 0)); + break; + case 't': + cfg_tcp = true; + break; + case 'v': + cfg_verify = true; + break; + } + } + + if (optind != argc) + usage(argv[0]); + + if (cfg_tcp && cfg_verify) + error(1, 0, "TODO: implement verify mode for tcp"); +} + +static void do_recv(void) +{ + unsigned long tnow, treport; + int fd; + + fd = do_socket(cfg_tcp); + + treport = gettimeofday_ms() + 1000; + do { + do_poll(fd); + + if (cfg_tcp) + do_flush_tcp(fd); + else + do_flush_udp(fd); + + tnow = gettimeofday_ms(); + if (tnow > treport) { + if (packets) + fprintf(stderr, + "%s rx: %6lu MB/s %8lu calls/s\n", + cfg_tcp ? "tcp" : "udp", + bytes >> 20, packets); + bytes = packets = 0; + treport = tnow + 1000; + } + + } while (!interrupted); + + if (close(fd)) + error(1, errno, "close"); +} + +int main(int argc, char **argv) +{ + parse_opts(argc, argv); + + signal(SIGINT, sigint_handler); + + do_recv(); + + return 0; +} diff --git a/tools/testing/selftests/net/udpgso_bench_tx.c b/tools/testing/selftests/net/udpgso_bench_tx.c new file mode 100644 index 000000000000..e821564053cf --- /dev/null +++ b/tools/testing/selftests/net/udpgso_bench_tx.c @@ -0,0 +1,420 @@ +// SPDX-License-Identifier: GPL-2.0 + +#define _GNU_SOURCE + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#ifndef ETH_MAX_MTU +#define ETH_MAX_MTU 0xFFFFU +#endif + +#ifndef UDP_SEGMENT +#define UDP_SEGMENT 103 +#endif + +#ifndef SO_ZEROCOPY +#define SO_ZEROCOPY 60 +#endif + +#ifndef MSG_ZEROCOPY +#define MSG_ZEROCOPY 0x4000000 +#endif + +#define NUM_PKT 100 + +static bool cfg_cache_trash; +static int cfg_cpu = -1; +static int cfg_connected = true; +static int cfg_family = PF_UNSPEC; +static uint16_t cfg_mss; +static int cfg_payload_len = (1472 * 42); +static int cfg_port = 8000; +static int cfg_runtime_ms = -1; +static bool cfg_segment; +static bool cfg_sendmmsg; +static bool cfg_tcp; +static bool cfg_zerocopy; + +static socklen_t cfg_alen; +static struct sockaddr_storage cfg_dst_addr; + +static bool interrupted; +static char buf[NUM_PKT][ETH_MAX_MTU]; + +static void sigint_handler(int signum) +{ + if (signum == SIGINT) + interrupted = true; +} + +static unsigned long gettimeofday_ms(void) +{ + struct timeval tv; + + gettimeofday(&tv, NULL); + return (tv.tv_sec * 1000) + (tv.tv_usec / 1000); +} + +static int set_cpu(int cpu) +{ + cpu_set_t mask; + + CPU_ZERO(&mask); + CPU_SET(cpu, &mask); + if (sched_setaffinity(0, sizeof(mask), &mask)) + error(1, 0, "setaffinity %d", cpu); + + return 0; +} + +static void setup_sockaddr(int domain, const char *str_addr, void *sockaddr) +{ + struct sockaddr_in6 *addr6 = (void *) sockaddr; + struct sockaddr_in *addr4 = (void *) sockaddr; + + switch (domain) { + case PF_INET: + addr4->sin_family = AF_INET; + addr4->sin_port = htons(cfg_port); + if (inet_pton(AF_INET, str_addr, &(addr4->sin_addr)) != 1) + error(1, 0, "ipv4 parse error: %s", str_addr); + break; + case PF_INET6: + addr6->sin6_family = AF_INET6; + addr6->sin6_port = htons(cfg_port); + if (inet_pton(AF_INET6, str_addr, &(addr6->sin6_addr)) != 1) + error(1, 0, "ipv6 parse error: %s", str_addr); + break; + default: + error(1, 0, "illegal domain"); + } +} + +static void flush_zerocopy(int fd) +{ + struct msghdr msg = {0}; /* flush */ + int ret; + + while (1) { + ret = recvmsg(fd, &msg, MSG_ERRQUEUE); + if (ret == -1 && errno == EAGAIN) + break; + if (ret == -1) + error(1, errno, "errqueue"); + if (msg.msg_flags != (MSG_ERRQUEUE | MSG_CTRUNC)) + error(1, 0, "errqueue: flags 0x%x\n", msg.msg_flags); + msg.msg_flags = 0; + } +} + +static int send_tcp(int fd, char *data) +{ + int ret, done = 0, count = 0; + + while (done < cfg_payload_len) { + ret = send(fd, data + done, cfg_payload_len - done, + cfg_zerocopy ? MSG_ZEROCOPY : 0); + if (ret == -1) + error(1, errno, "write"); + + done += ret; + count++; + } + + return count; +} + +static int send_udp(int fd, char *data) +{ + int ret, total_len, len, count = 0; + + total_len = cfg_payload_len; + + while (total_len) { + len = total_len < cfg_mss ? total_len : cfg_mss; + + ret = sendto(fd, data, len, cfg_zerocopy ? MSG_ZEROCOPY : 0, + cfg_connected ? NULL : (void *)&cfg_dst_addr, + cfg_connected ? 0 : cfg_alen); + if (ret == -1) + error(1, errno, "write"); + if (ret != len) + error(1, errno, "write: %uB != %uB\n", ret, len); + + total_len -= len; + count++; + } + + return count; +} + +static int send_udp_sendmmsg(int fd, char *data) +{ + const int max_nr_msg = ETH_MAX_MTU / ETH_DATA_LEN; + struct mmsghdr mmsgs[max_nr_msg]; + struct iovec iov[max_nr_msg]; + unsigned int off = 0, left; + int i = 0, ret; + + memset(mmsgs, 0, sizeof(mmsgs)); + + left = cfg_payload_len; + while (left) { + if (i == max_nr_msg) + error(1, 0, "sendmmsg: exceeds max_nr_msg"); + + iov[i].iov_base = data + off; + iov[i].iov_len = cfg_mss < left ? cfg_mss : left; + + mmsgs[i].msg_hdr.msg_iov = iov + i; + mmsgs[i].msg_hdr.msg_iovlen = 1; + + off += iov[i].iov_len; + left -= iov[i].iov_len; + i++; + } + + ret = sendmmsg(fd, mmsgs, i, cfg_zerocopy ? MSG_ZEROCOPY : 0); + if (ret == -1) + error(1, errno, "sendmmsg"); + + return ret; +} + +static void send_udp_segment_cmsg(struct cmsghdr *cm) +{ + uint16_t *valp; + + cm->cmsg_level = SOL_UDP; + cm->cmsg_type = UDP_SEGMENT; + cm->cmsg_len = CMSG_LEN(sizeof(cfg_mss)); + valp = (void *)CMSG_DATA(cm); + *valp = cfg_mss; +} + +static int send_udp_segment(int fd, char *data) +{ + char control[CMSG_SPACE(sizeof(cfg_mss))] = {0}; + struct msghdr msg = {0}; + struct iovec iov = {0}; + int ret; + + iov.iov_base = data; + iov.iov_len = cfg_payload_len; + + msg.msg_iov = &iov; + msg.msg_iovlen = 1; + + msg.msg_control = control; + msg.msg_controllen = sizeof(control); + send_udp_segment_cmsg(CMSG_FIRSTHDR(&msg)); + + msg.msg_name = (void *)&cfg_dst_addr; + msg.msg_namelen = cfg_alen; + + ret = sendmsg(fd, &msg, cfg_zerocopy ? MSG_ZEROCOPY : 0); + if (ret == -1) + error(1, errno, "sendmsg"); + if (ret != iov.iov_len) + error(1, 0, "sendmsg: %u != %lu\n", ret, iov.iov_len); + + return 1; +} + +static void usage(const char *filepath) +{ + error(1, 0, "Usage: %s [-46cmStuz] [-C cpu] [-D dst ip] [-l secs] [-p port] [-s sendsize]", + filepath); +} + +static void parse_opts(int argc, char **argv) +{ + int max_len, hdrlen; + int c; + + while ((c = getopt(argc, argv, "46cC:D:l:mp:s:Stuz")) != -1) { + switch (c) { + case '4': + if (cfg_family != PF_UNSPEC) + error(1, 0, "Pass one of -4 or -6"); + cfg_family = PF_INET; + cfg_alen = sizeof(struct sockaddr_in); + break; + case '6': + if (cfg_family != PF_UNSPEC) + error(1, 0, "Pass one of -4 or -6"); + cfg_family = PF_INET6; + cfg_alen = sizeof(struct sockaddr_in6); + break; + case 'c': + cfg_cache_trash = true; + break; + case 'C': + cfg_cpu = strtol(optarg, NULL, 0); + break; + case 'D': + setup_sockaddr(cfg_family, optarg, &cfg_dst_addr); + break; + case 'l': + cfg_runtime_ms = strtoul(optarg, NULL, 10) * 1000; + break; + case 'm': + cfg_sendmmsg = true; + break; + case 'p': + cfg_port = strtoul(optarg, NULL, 0); + break; + case 's': + cfg_payload_len = strtoul(optarg, NULL, 0); + break; + case 'S': + cfg_segment = true; + break; + case 't': + cfg_tcp = true; + break; + case 'u': + cfg_connected = false; + break; + case 'z': + cfg_zerocopy = true; + break; + } + } + + if (optind != argc) + usage(argv[0]); + + if (cfg_family == PF_UNSPEC) + error(1, 0, "must pass one of -4 or -6"); + if (cfg_tcp && !cfg_connected) + error(1, 0, "connectionless tcp makes no sense"); + if (cfg_segment && cfg_sendmmsg) + error(1, 0, "cannot combine segment offload and sendmmsg"); + + if (cfg_family == PF_INET) + hdrlen = sizeof(struct iphdr) + sizeof(struct udphdr); + else + hdrlen = sizeof(struct ip6_hdr) + sizeof(struct udphdr); + + cfg_mss = ETH_DATA_LEN - hdrlen; + max_len = ETH_MAX_MTU - hdrlen; + + if (cfg_payload_len > max_len) + error(1, 0, "payload length %u exceeds max %u", + cfg_payload_len, max_len); +} + +static void set_pmtu_discover(int fd, bool is_ipv4) +{ + int level, name, val; + + if (is_ipv4) { + level = SOL_IP; + name = IP_MTU_DISCOVER; + val = IP_PMTUDISC_DO; + } else { + level = SOL_IPV6; + name = IPV6_MTU_DISCOVER; + val = IPV6_PMTUDISC_DO; + } + + if (setsockopt(fd, level, name, &val, sizeof(val))) + error(1, errno, "setsockopt path mtu"); +} + +int main(int argc, char **argv) +{ + unsigned long num_msgs, num_sends; + unsigned long tnow, treport, tstop; + int fd, i, val; + + parse_opts(argc, argv); + + if (cfg_cpu > 0) + set_cpu(cfg_cpu); + + for (i = 0; i < sizeof(buf[0]); i++) + buf[0][i] = 'a' + (i % 26); + for (i = 1; i < NUM_PKT; i++) + memcpy(buf[i], buf[0], sizeof(buf[0])); + + signal(SIGINT, sigint_handler); + + fd = socket(cfg_family, cfg_tcp ? SOCK_STREAM : SOCK_DGRAM, 0); + if (fd == -1) + error(1, errno, "socket"); + + if (cfg_zerocopy) { + val = 1; + if (setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &val, sizeof(val))) + error(1, errno, "setsockopt zerocopy"); + } + + if (cfg_connected && + connect(fd, (void *)&cfg_dst_addr, cfg_alen)) + error(1, errno, "connect"); + + if (cfg_segment) + set_pmtu_discover(fd, cfg_family == PF_INET); + + num_msgs = num_sends = 0; + tnow = gettimeofday_ms(); + tstop = tnow + cfg_runtime_ms; + treport = tnow + 1000; + + i = 0; + do { + if (cfg_tcp) + num_sends += send_tcp(fd, buf[i]); + else if (cfg_segment) + num_sends += send_udp_segment(fd, buf[i]); + else if (cfg_sendmmsg) + num_sends += send_udp_sendmmsg(fd, buf[i]); + else + num_sends += send_udp(fd, buf[i]); + num_msgs++; + + if (cfg_zerocopy && ((num_msgs & 0xF) == 0)) + flush_zerocopy(fd); + + tnow = gettimeofday_ms(); + if (tnow > treport) { + fprintf(stderr, + "%s tx: %6lu MB/s %8lu calls/s %6lu msg/s\n", + cfg_tcp ? "tcp" : "udp", + (num_msgs * cfg_payload_len) >> 20, + num_sends, num_msgs); + num_msgs = num_sends = 0; + treport = tnow + 1000; + } + + /* cold cache when writing buffer */ + if (cfg_cache_trash) + i = ++i < NUM_PKT ? i : 0; + + } while (!interrupted && (cfg_runtime_ms == -1 || tnow < tstop)); + + if (close(fd)) + error(1, errno, "close"); + + return 0; +}