From patchwork Fri May 8 18:10:51 2020
X-Patchwork-Submitter: Kelsey Skunberg
X-Patchwork-Id: 1286381
From: Kelsey Skunberg
To: kernel-team@lists.ubuntu.com
Subject: [X][PATCH 2/2] net: openvswitch: add hash info to upcall
Date: Fri, 8 May 2020 12:10:51 -0600
Message-Id: <20200508181051.25162-3-kelsey.skunberg@canonical.com>
In-Reply-To: <20200508181051.25162-1-kelsey.skunberg@canonical.com>
References: <20200508181051.25162-1-kelsey.skunberg@canonical.com>

From: Tonghao Zhang

BugLink: https://bugs.launchpad.net/bugs/1860986

When using the kernel datapath, the upcall does not include the
related skb hash info.
This causes problems, because the skb hash is important in the kernel
stack. For example, the VXLAN module uses it to select the UDP source
port, and tx queue selection may also use it.

The hash is computed in different ways: it is random for a TCP socket,
and it may be computed in hardware or by the software stack.
Recalculating the hash is not easy.

The hash of a TCP socket is computed as follows:
  tcp_v4_connect -> sk_set_txhash (random)
  __tcp_transmit_skb -> skb_set_hash_from_sk

For the first packet of a TCP session, there is one upcall to
ovs-vswitchd that carries no skb hash information. The remaining
packets are processed in the Open vSwitch kernel module with their hash
kept. If the TCP session is forwarded to the VXLAN module, the UDP
source port of the first TCP packet therefore differs from that of the
remaining packets.

TCP packets may reach Open vSwitch from the host or from Docker
containers. To fix this, store the hash info in the upcall and restore
the hash when the packet is sent back.

+---------------+          +-------------------------+
|  Docker/VMs   |          |     ovs-vswitchd        |
+----+----------+          +-+--------------------+--+
     |                       ^                    |
     |                       |                    |
     |                       |  upcall            v restore packet hash (not recalculate)
     |                     +-+--------------------+--+
     |    tap netdev       |                         |   vxlan module
     +--------------->     +-->  Open vSwitch ko     +-->
           or internal type|                         |
                           +-------------------------+

Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-October/364062.html
Signed-off-by: Tonghao Zhang
Acked-by: Pravin B Shelar
Signed-off-by: David S. Miller
(cherry picked from commit bd1903b7c4596ba6f7677d0dfefd05ba5876707d)
Signed-off-by: Kelsey Skunberg
---
 include/uapi/linux/openvswitch.h |  2 ++
 net/openvswitch/datapath.c       | 26 +++++++++++++++++++++++++-
 net/openvswitch/datapath.h       | 12 ++++++++++++
 3 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index a27222d5b413..784fb8d7e08d 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -167,6 +167,7 @@ enum ovs_packet_cmd {
  * @OVS_PACKET_ATTR_MRU: Present for an %OVS_PACKET_CMD_ACTION and
  * %OVS_PACKET_ATTR_USERSPACE action specify the Maximum received fragment
  * size.
+ * @OVS_PACKET_ATTR_HASH: Packet hash info (e.g. hash, sw_hash and l4_hash in skb).
  *
  * These attributes follow the &struct ovs_header within the Generic Netlink
  * payload for %OVS_PACKET_* commands.
@@ -184,6 +185,7 @@ enum ovs_packet_attr {
 	OVS_PACKET_ATTR_PROBE,      /* Packet operation is a feature probe,
 				       error logging should be suppressed. */
 	OVS_PACKET_ATTR_MRU,	    /* Maximum received IP fragment size. */
+	OVS_PACKET_ATTR_HASH,	    /* Packet hash. */
 	__OVS_PACKET_ATTR_MAX
 };
 
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index d21d914ad9cc..7ed8b3882961 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -383,7 +383,8 @@ static size_t upcall_msg_size(const struct dp_upcall_info *upcall_info,
 {
 	size_t size = NLMSG_ALIGN(sizeof(struct ovs_header))
 		+ nla_total_size(hdrlen) /* OVS_PACKET_ATTR_PACKET */
-		+ nla_total_size(ovs_key_attr_size()); /* OVS_PACKET_ATTR_KEY */
+		+ nla_total_size(ovs_key_attr_size()) /* OVS_PACKET_ATTR_KEY */
+		+ nla_total_size(sizeof(u64)); /* OVS_PACKET_ATTR_HASH */
 
 	/* OVS_PACKET_ATTR_USERDATA */
 	if (upcall_info->userdata)
@@ -429,6 +430,7 @@ static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb,
 	size_t len;
 	unsigned int hlen;
 	int err, dp_ifindex;
+	u64 hash;
 
 	dp_ifindex = get_dpifindex(dp);
 	if (!dp_ifindex)
@@ -513,6 +515,19 @@ static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb,
 		pad_packet(dp, user_skb);
 	}
 
+	/* Add OVS_PACKET_ATTR_HASH */
+	hash = skb_get_hash_raw(skb);
+	if (skb->sw_hash)
+		hash |= OVS_PACKET_HASH_SW_BIT;
+
+	if (skb->l4_hash)
+		hash |= OVS_PACKET_HASH_L4_BIT;
+
+	if (nla_put(user_skb, OVS_PACKET_ATTR_HASH, sizeof (u64), &hash)) {
+		err = -ENOBUFS;
+		goto out;
+	}
+
 	/* Only reserve room for attribute header, packet data is added
 	 * in skb_zerocopy() */
 	if (!(nla = nla_reserve(user_skb, OVS_PACKET_ATTR_PACKET, 0))) {
@@ -553,6 +568,7 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
 	struct ethhdr *eth;
 	struct vport *input_vport;
 	u16 mru = 0;
+	u64 hash;
 	int len;
 	int err;
 	bool log = !a[OVS_PACKET_ATTR_PROBE];
@@ -589,6 +605,14 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
 	}
 	OVS_CB(packet)->mru = mru;
 
+	if (a[OVS_PACKET_ATTR_HASH]) {
+		hash = nla_get_u64(a[OVS_PACKET_ATTR_HASH]);
+
+		__skb_set_hash(packet, hash & 0xFFFFFFFFULL,
+			       !!(hash & OVS_PACKET_HASH_SW_BIT),
+			       !!(hash & OVS_PACKET_HASH_L4_BIT));
+	}
+
 	/* Build an sw_flow for sending this packet. */
 	flow = ovs_flow_alloc();
 	err = PTR_ERR(flow);
diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
index 67bdecd9fdc1..6bc11fd27f35 100644
--- a/net/openvswitch/datapath.h
+++ b/net/openvswitch/datapath.h
@@ -138,6 +138,18 @@ struct ovs_net {
 	bool xt_label;
 };
 
+/**
+ * enum ovs_pkt_hash_types - hash info to include with a packet
+ * to send to userspace.
+ * @OVS_PACKET_HASH_SW_BIT: indicates hash was computed in software stack.
+ * @OVS_PACKET_HASH_L4_BIT: indicates hash is a canonical 4-tuple hash
+ * over transport ports.
+ */
+enum ovs_pkt_hash_types {
+	OVS_PACKET_HASH_SW_BIT = (1ULL << 32),
+	OVS_PACKET_HASH_L4_BIT = (1ULL << 33),
+};
+
 extern int ovs_net_id;
 void ovs_lock(void);
 void ovs_unlock(void);