From patchwork Wed Apr 28 21:56:52 2021
X-Patchwork-Submitter: Paolo Valerio
X-Patchwork-Id: 1471442
From: Paolo Valerio
To: dev@openvswitch.org
Cc: i.maximets@ovn.org
Date: Wed, 28 Apr 2021 23:56:52 +0200
Message-ID: <161964701197.293712.14695746221814517261.stgit@fed.void>
In-Reply-To: <161964677372.293712.9125314386461543424.stgit@fed.void>
References: <161964677372.293712.9125314386461543424.stgit@fed.void>
Subject: [ovs-dev] [PATCH v6 1/2] util.h: add token concatenation macro with argument expansion

This macro is handy when it comes to pasting two tokens when one or
both are macros. Rename CURSOR_JOIN() to OVS_JOIN() and move it to
util.h so that it can be reused.

Signed-off-by: Paolo Valerio
Acked-by: Gaetan Rivet
Acked-by: Aaron Conole
---
 lib/cmap.h |    5 +----
 lib/util.h |    7 +++++++
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/lib/cmap.h b/lib/cmap.h
index d9db3c915..c502d2311 100644
--- a/lib/cmap.h
+++ b/lib/cmap.h
@@ -245,9 +245,6 @@ void cmap_cursor_advance(struct cmap_cursor *);
 
 /* Generate a unique name for the cursor with the __COUNTER__ macro to
  * allow nesting of CMAP_FOR_EACH loops. */
-#define CURSOR_JOIN2(x,y) x##y
-#define CURSOR_JOIN(x, y) CURSOR_JOIN2(x,y)
-
 #define CMAP_FOR_EACH__(NODE, MEMBER, CMAP, CURSOR_NAME) \
     for (struct cmap_cursor CURSOR_NAME = cmap_cursor_start(CMAP); \
          CMAP_CURSOR_FOR_EACH__(NODE, &CURSOR_NAME, MEMBER); \
@@ -255,7 +252,7 @@ void cmap_cursor_advance(struct cmap_cursor *);
 
 #define CMAP_FOR_EACH(NODE, MEMBER, CMAP) \
     CMAP_FOR_EACH__(NODE, MEMBER, CMAP, \
-                    CURSOR_JOIN(cursor_, __COUNTER__))
+                    OVS_JOIN(cursor_, __COUNTER__))
 
 static inline struct cmap_node *cmap_first(const struct cmap *);
 
diff --git a/lib/util.h b/lib/util.h
index 1fe8ef32b..aea19d45f 100644
--- a/lib/util.h
+++ b/lib/util.h
@@ -105,6 +105,13 @@ ovs_prefetch_range(const void *start, size_t size)
 
 #define OVS_NOT_REACHED() abort()
 
+/* Joins two token expanding the arguments if they are macros.
+ *
+ * For token concatenation the circumlocution is needed for the
+ * expansion. */
+#define OVS_JOIN2(X, Y) X##Y
+#define OVS_JOIN(X, Y) OVS_JOIN2(X, Y)
+
 /* Use "%"PRIuSIZE to format size_t with printf(). */
 #ifdef _WIN32
 #define PRIdSIZE "Id"

From patchwork Wed Apr 28 21:56:58 2021
X-Patchwork-Submitter: Paolo Valerio
X-Patchwork-Id: 1471443
From: Paolo Valerio
To: dev@openvswitch.org
Cc: i.maximets@ovn.org
Date: Wed, 28 Apr 2021 23:56:58 +0200
Message-ID: <161964701852.293712.9625607780323662101.stgit@fed.void>
In-Reply-To: <161964677372.293712.9125314386461543424.stgit@fed.void>
References: <161964677372.293712.9125314386461543424.stgit@fed.void>
Subject: [ovs-dev] [PATCH v6 2/2] conntrack: handle SNAT with all-zero IP address

This patch introduces, for the userspace datapath, the handling of
rules like the following:

ct(commit,nat(src=0.0.0.0),...)

The kernel datapath already handles this case, which is particularly
handy in scenarios like the following:

Given A: 10.1.1.1, B: 192.168.2.100, C: 10.1.1.2

A opens a connection toward B on port 80, selecting 10000 as the
source port. B's IP gets DNATed to C's IP
(10.1.1.1:10000 -> 192.168.2.100:80).

This will result in:

tcp,orig=(src=10.1.1.1,dst=192.168.2.100,sport=10000,dport=80),reply=(src=10.1.1.2,dst=10.1.1.1,sport=80,dport=10000),protoinfo=(state=ESTABLISHED)

A now tries to establish another connection with C using source port
10000, this time using C's IP address (10.1.1.1:10000 -> 10.1.1.2:80).

This second connection, if processed by conntrack with no SNAT/DNAT
involved, collides with the reverse tuple of the first connection, so
the entry for this valid connection doesn't get created.

With this commit, adding a SNAT rule with 0.0.0.0 for
10.1.1.1:10000 -> 10.1.1.2:80 allows the conn entry to be created:

tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=10000,dport=80),reply=(src=10.1.1.2,dst=10.1.1.1,sport=80,dport=10001),protoinfo=(state=ESTABLISHED)
tcp,orig=(src=10.1.1.1,dst=192.168.2.100,sport=10000,dport=80),reply=(src=10.1.1.2,dst=10.1.1.1,sport=80,dport=10000),protoinfo=(state=ESTABLISHED)

The issue also exists in the opposite case (with A trying to connect
to C using B's IP after establishing a direct connection from A to C).

This commit refactors the relevant function so that both of the
previously mentioned cases are handled.
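
The collision described above can be sketched in a few lines of C. This is only an illustration under simplifying assumptions, not the conntrack code: `struct tuple`, `reply_of()` and `pick_reply_port()` are hypothetical names, a single existing entry stands in for the connection table, and addresses are plain host-order integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified tuple; real conntrack keys carry more fields. */
struct tuple {
    uint32_t src, dst;      /* IPv4 addresses, host byte order. */
    uint16_t sport, dport;  /* L4 ports, host byte order. */
};

/* Ephemeral range used by the patch for all-zero SNAT. */
enum { MIN_EPHEMERAL = 1024, MAX_EPHEMERAL = 65535 };

static bool
tuple_equal(const struct tuple *a, const struct tuple *b)
{
    return a->src == b->src && a->dst == b->dst
           && a->sport == b->sport && a->dport == b->dport;
}

/* Reply tuple expected for 'orig' when no NAT is applied. */
static struct tuple
reply_of(const struct tuple *orig)
{
    struct tuple r = { orig->dst, orig->src, orig->dport, orig->sport };
    return r;
}

/* Returns a reply destination port for 'orig' that does not collide with
 * 'existing_reply': the sender's own port is tried first, then the
 * following ports of the ephemeral range, wrapping around.
 * Returns 0 if every candidate collides. */
static uint16_t
pick_reply_port(const struct tuple *existing_reply, const struct tuple *orig)
{
    struct tuple r = reply_of(orig);
    uint16_t port = orig->sport;
    bool in_range = port >= MIN_EPHEMERAL;
    uint32_t attempts = (MAX_EPHEMERAL - MIN_EPHEMERAL) + (in_range ? 1 : 2);

    assert(existing_reply && orig);
    while (attempts--) {
        r.dport = port;
        if (!tuple_equal(&r, existing_reply)) {
            return port;
        }
        port = (port < MIN_EPHEMERAL || port == MAX_EPHEMERAL)
               ? MIN_EPHEMERAL : port + 1;
    }
    return 0;
}
```

With the reply tuple of the first connection (C:80 -> A:10000) as `existing_reply` and the second connection (A:10000 -> C:80) as `orig`, the sketch skips the colliding port 10000 and returns 10001, matching the `dport=10001` shown in the example entry above.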

Suggested-by: Eelco Chaudron
Signed-off-by: Paolo Valerio
Acked-by: Gaetan Rivet
Acked-by: Aaron Conole
---
 NEWS                             |    3 
 lib/conntrack-private.h          |   33 ++++
 lib/conntrack.c                  |  335 ++++++++++++++++++++++++--------------
 lib/ovs-actions.xml              |    3 
 tests/system-userspace-macros.at |    8 -
 5 files changed, 251 insertions(+), 131 deletions(-)

diff --git a/NEWS b/NEWS
index 95cf922aa..2af3ff6d1 100644
--- a/NEWS
+++ b/NEWS
@@ -5,6 +5,9 @@ Post-v2.15.0
    - Userspace datapath:
      * Auto load balancing of PMDs now partially supports cross-NUMA polling
        cases, e.g if all PMD threads are running on the same NUMA node.
+     * Add all-zero IP SNAT handling to conntrack. In case of collision,
+       using ct(src=0.0.0.0), the source port will be replaced with another
+       non-colliding port in the ephemeral range (1024, 65535).
    - ovs-ctl:
      * New option '--no-record-hostname' to disable hostname configuration
        in ovsdb on startup.
diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
index e8332bdba..cc2fb045d 100644
--- a/lib/conntrack-private.h
+++ b/lib/conntrack-private.h
@@ -148,6 +148,39 @@ enum ct_update_res {
     CT_TIMEOUT(ICMP_FIRST) \
     CT_TIMEOUT(ICMP_REPLY)
 
+#define NAT_ACTION_SNAT_ALL (NAT_ACTION_SRC | NAT_ACTION_SRC_PORT)
+#define NAT_ACTION_DNAT_ALL (NAT_ACTION_DST | NAT_ACTION_DST_PORT)
+
+enum ct_ephemeral_range {
+    MIN_NAT_EPHEMERAL_PORT = 1024,
+    MAX_NAT_EPHEMERAL_PORT = 65535
+};
+
+#define IN_RANGE(curr, min, max) \
+    (curr >= min && curr <= max)
+
+#define NEXT_PORT_IN_RANGE(curr, min, max) \
+    (curr = (!IN_RANGE(curr, min, max) || curr == max) ? min : curr + 1)
+
+/* if the current port is out of range increase the attempts by
+ * one so that in the worst case scenario the current out of
+ * range port plus all the in-range ports get tested.
+ * Note that curr can be an out of range port only in case of
+ * source port (SNAT with port range unspecified or DNAT),
+ * furthermore the source port in the packet has to be less than
+ * MIN_NAT_EPHEMERAL_PORT. */
+#define N_PORT_ATTEMPTS(curr, min, max) \
+    ((!IN_RANGE(curr, min, max)) ? (max - min) + 2 : (max - min) + 1)
+
+/* loose in-range check, the first curr port can be any port out of
+ * the range. */
+#define FOR_EACH_PORT_IN_RANGE__(curr, min, max, INAME) \
+    for (uint16_t INAME = N_PORT_ATTEMPTS(curr, min, max); \
+         INAME > 0; INAME--, NEXT_PORT_IN_RANGE(curr, min, max))
+
+#define FOR_EACH_PORT_IN_RANGE(curr, min, max) \
+    FOR_EACH_PORT_IN_RANGE__(curr, min, max, OVS_JOIN(idx, __COUNTER__))
+
 enum ct_timeout {
 #define CT_TIMEOUT(NAME) CT_TM_##NAME,
     CT_TIMEOUTS
diff --git a/lib/conntrack.c b/lib/conntrack.c
index 99198a601..bf388dbfb 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -108,8 +108,8 @@ static void set_label(struct dp_packet *, struct conn *,
 static void *clean_thread_main(void *f_);
 
 static bool
-nat_select_range_tuple(struct conntrack *ct, const struct conn *conn,
-                       struct conn *nat_conn);
+nat_get_unique_tuple(struct conntrack *ct, const struct conn *conn,
+                     struct conn *nat_conn);
 
 static uint8_t
 reverse_icmp_type(uint8_t type);
@@ -728,11 +728,11 @@ pat_packet(struct dp_packet *pkt, const struct conn *conn)
         }
     } else if (conn->nat_info->nat_action & NAT_ACTION_DST) {
         if (conn->key.nw_proto == IPPROTO_TCP) {
-            struct tcp_header *th = dp_packet_l4(pkt);
-            packet_set_tcp_port(pkt, th->tcp_src, conn->rev_key.src.port);
+            packet_set_tcp_port(pkt, conn->rev_key.dst.port,
+                                conn->rev_key.src.port);
         } else if (conn->key.nw_proto == IPPROTO_UDP) {
-            struct udp_header *uh = dp_packet_l4(pkt);
-            packet_set_udp_port(pkt, uh->udp_src, conn->rev_key.src.port);
+            packet_set_udp_port(pkt, conn->rev_key.dst.port,
+                                conn->rev_key.src.port);
         }
     }
 }
@@ -786,11 +786,9 @@ un_pat_packet(struct dp_packet *pkt, const struct conn *conn)
         }
     } else if (conn->nat_info->nat_action & NAT_ACTION_DST) {
         if (conn->key.nw_proto == IPPROTO_TCP) {
-            struct tcp_header *th = dp_packet_l4(pkt);
-            packet_set_tcp_port(pkt, conn->key.dst.port, th->tcp_dst);
+            packet_set_tcp_port(pkt, conn->key.dst.port, conn->key.src.port);
         } else if (conn->key.nw_proto == IPPROTO_UDP) {
-            struct udp_header *uh = dp_packet_l4(pkt);
-            packet_set_udp_port(pkt, conn->key.dst.port, uh->udp_dst);
+            packet_set_udp_port(pkt, conn->key.dst.port, conn->key.src.port);
         }
     }
 }
@@ -810,12 +808,10 @@ reverse_pat_packet(struct dp_packet *pkt, const struct conn *conn)
         }
     } else if (conn->nat_info->nat_action & NAT_ACTION_DST) {
         if (conn->key.nw_proto == IPPROTO_TCP) {
-            struct tcp_header *th_in = dp_packet_l4(pkt);
-            packet_set_tcp_port(pkt, th_in->tcp_src,
+            packet_set_tcp_port(pkt, conn->key.src.port,
                                 conn->key.dst.port);
         } else if (conn->key.nw_proto == IPPROTO_UDP) {
-            struct udp_header *uh_in = dp_packet_l4(pkt);
-            packet_set_udp_port(pkt, uh_in->udp_src,
+            packet_set_udp_port(pkt, conn->key.src.port,
                                 conn->key.dst.port);
         }
     }
@@ -1029,14 +1025,14 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt,
                 }
             } else {
                 memcpy(nat_conn, nc, sizeof *nat_conn);
-                bool nat_res = nat_select_range_tuple(ct, nc, nat_conn);
+                bool nat_res = nat_get_unique_tuple(ct, nc, nat_conn);
 
                 if (!nat_res) {
                     goto nat_res_exhaustion;
                 }
 
                 /* Update nc with nat adjustments made to nat_conn by
-                 * nat_select_range_tuple(). */
+                 * nat_get_unique_tuple(). */
                 memcpy(nc, nat_conn, sizeof *nc);
             }
 
@@ -2210,130 +2206,221 @@ nat_range_hash(const struct conn *conn, uint32_t basis)
     return hash_finish(hash, 0);
 }
 
-static bool
-nat_select_range_tuple(struct conntrack *ct, const struct conn *conn,
-                       struct conn *nat_conn)
-{
-    enum { MIN_NAT_EPHEMERAL_PORT = 1024,
-           MAX_NAT_EPHEMERAL_PORT = 65535 };
-
-    uint16_t min_port;
-    uint16_t max_port;
-    uint16_t first_port;
-    uint32_t hash = nat_range_hash(conn, ct->hash_basis);
+/* Ports are stored in host byte order for convenience. */
+static void
+set_sport_range(struct nat_action_info_t *ni, const struct conn_key *k,
+                uint32_t hash, uint16_t *curr, uint16_t *min,
+                uint16_t *max)
+{
+    if (((ni->nat_action & NAT_ACTION_SNAT_ALL) == NAT_ACTION_SRC) ||
+        ((ni->nat_action & NAT_ACTION_DST))) {
+        *curr = ntohs(k->src.port);
+        *min = MIN_NAT_EPHEMERAL_PORT;
+        *max = MAX_NAT_EPHEMERAL_PORT;
+    } else {
+        *min = ni->min_port;
+        *max = ni->max_port;
+        *curr = *min + (hash % ((*max - *min) + 1));
+    }
+}
 
-    if ((conn->nat_info->nat_action & NAT_ACTION_SRC) &&
-        (!(conn->nat_info->nat_action & NAT_ACTION_SRC_PORT))) {
-        min_port = ntohs(conn->key.src.port);
-        max_port = ntohs(conn->key.src.port);
-        first_port = min_port;
-    } else if ((conn->nat_info->nat_action & NAT_ACTION_DST) &&
-               (!(conn->nat_info->nat_action & NAT_ACTION_DST_PORT))) {
-        min_port = ntohs(conn->key.dst.port);
-        max_port = ntohs(conn->key.dst.port);
-        first_port = min_port;
+static void
+set_dport_range(struct nat_action_info_t *ni, const struct conn_key *k,
+                uint32_t hash, uint16_t *curr, uint16_t *min,
+                uint16_t *max)
+{
+    if (ni->nat_action & NAT_ACTION_DST_PORT) {
+        *min = ni->min_port;
+        *max = ni->max_port;
+        *curr = *min + (hash % ((*max - *min) + 1));
     } else {
-        uint16_t deltap = conn->nat_info->max_port - conn->nat_info->min_port;
-        uint32_t port_index = hash % (deltap + 1);
-        first_port = conn->nat_info->min_port + port_index;
-        min_port = conn->nat_info->min_port;
-        max_port = conn->nat_info->max_port;
+        *curr = ntohs(k->dst.port);
+        *min = *max = *curr;
     }
+}
 
-    uint32_t deltaa = 0;
-    uint32_t address_index;
-    union ct_addr ct_addr;
-    memset(&ct_addr, 0, sizeof ct_addr);
-    union ct_addr max_ct_addr;
-    memset(&max_ct_addr, 0, sizeof max_ct_addr);
-    max_ct_addr = conn->nat_info->max_addr;
+/* Gets the initial in range address based on the hash.
+ * Addresses are kept in network order. */
+static void
+get_addr_in_range(union ct_addr *min, union ct_addr *max,
+                  union ct_addr *curr, uint32_t hash,
+                  bool ipv4)
+{
+    uint32_t offt, range;
 
-    if (conn->key.dl_type == htons(ETH_TYPE_IP)) {
-        deltaa = ntohl(conn->nat_info->max_addr.ipv4) -
-                 ntohl(conn->nat_info->min_addr.ipv4);
-        address_index = hash % (deltaa + 1);
-        ct_addr.ipv4 = htonl(
-            ntohl(conn->nat_info->min_addr.ipv4) + address_index);
+    if (ipv4) {
+        range = (ntohl(max->ipv4) - ntohl(min->ipv4)) + 1;
+        offt = hash % range;
+        curr->ipv4 = htonl(ntohl(min->ipv4) + offt);
     } else {
-        deltaa = nat_ipv6_addrs_delta(&conn->nat_info->min_addr.ipv6,
-                                      &conn->nat_info->max_addr.ipv6);
-        /* deltaa must be within 32 bits for full hash coverage. A 64 or
+        range = nat_ipv6_addrs_delta(&min->ipv6,
+                                     &max->ipv6) + 1;
+        /* range must be within 32 bits for full hash coverage. A 64 or
          * 128 bit hash is unnecessary and hence not used here. Most code
          * is kept common with V4; nat_ipv6_addrs_delta() will do the
          * enforcement via max_ct_addr. */
-        max_ct_addr = conn->nat_info->min_addr;
-        nat_ipv6_addr_increment(&max_ct_addr.ipv6, deltaa);
-        address_index = hash % (deltaa + 1);
-        ct_addr.ipv6 = conn->nat_info->min_addr.ipv6;
-        nat_ipv6_addr_increment(&ct_addr.ipv6, address_index);
-    }
-
-    uint16_t port = first_port;
-    bool all_ports_tried = false;
-    /* For DNAT or for specified port ranges, we don't use ephemeral ports. */
-    bool ephemeral_ports_tried
-        = conn->nat_info->nat_action & NAT_ACTION_DST ||
-          conn->nat_info->nat_action & NAT_ACTION_SRC_PORT
-          ? true : false;
-    union ct_addr first_addr = ct_addr;
-    bool pat_enabled = conn->key.nw_proto == IPPROTO_TCP ||
-                       conn->key.nw_proto == IPPROTO_UDP;
-
-    while (true) {
+        offt = hash % range;
+        curr->ipv6 = min->ipv6;
+        nat_ipv6_addr_increment(&curr->ipv6, offt);
+    }
+}
+
+static void
+get_initial_addr(const struct conn *conn, union ct_addr *min,
+                 union ct_addr *max, union ct_addr *curr,
+                 uint32_t hash, bool ipv4)
+{
+    const union ct_addr zero_ip = {0};
+
+    /* all-zero CASE */
+    if (!memcmp(min, &zero_ip, sizeof(*min))) {
         if (conn->nat_info->nat_action & NAT_ACTION_SRC) {
-            nat_conn->rev_key.dst.addr = ct_addr;
-            if (pat_enabled) {
-                nat_conn->rev_key.dst.port = htons(port);
-            }
-        } else {
-            nat_conn->rev_key.src.addr = ct_addr;
-            if (pat_enabled) {
-                nat_conn->rev_key.src.port = htons(port);
-            }
+            *curr = conn->key.src.addr;
+        } else if (conn->nat_info->nat_action & NAT_ACTION_DST) {
+            *curr = conn->key.dst.addr;
+        }
+    } else {
+        get_addr_in_range(min, max, curr, hash, ipv4);
+    }
+}
+
+static void
+store_addr_to_key(union ct_addr *addr, struct conn_key *key,
+                  uint16_t action)
+{
+    if (action & NAT_ACTION_SRC) {
+        key->dst.addr = *addr;
+    } else {
+        key->src.addr = *addr;
+    }
+}
+
+static void
+next_addr_in_range(union ct_addr *curr, union ct_addr *min,
+                   union ct_addr *max, bool ipv4)
+{
+    if (ipv4) {
+        /* this check could be unified with IPv6, but let's avoid
+         * an unneeded memcmp() in case of IPv4. */
+        if (min->ipv4 == max->ipv4) {
+            return;
+        }
+
+        curr->ipv4 = (curr->ipv4 == max->ipv4) ?
+                     min->ipv4 :
+                     htonl(ntohl(curr->ipv4) + 1);
+    } else {
+        if (!memcmp(min, max, sizeof(*min))) {
+            return;
+        }
+
+        if (!memcmp(curr, max, sizeof(*curr))) {
+            *curr = *min;
+            return;
         }
 
-        bool found = conn_lookup(ct, &nat_conn->rev_key, time_msec(), NULL,
-                                 NULL);
-        if (!found) {
+        nat_ipv6_addr_increment(&curr->ipv6, 1);
+    }
+}
+
+static bool
+next_addr_in_range_guarded(union ct_addr *curr, union ct_addr *min,
+                           union ct_addr *max, union ct_addr *guard,
+                           bool ipv4)
+{
+    bool exhausted;
+
+    next_addr_in_range(curr, min, max, ipv4);
+
+    if (ipv4) {
+        exhausted = (curr->ipv4 == guard->ipv4);
+    } else {
+        exhausted = !memcmp(curr, guard, sizeof(*curr));
+    }
+
+    return exhausted;
+}
+
+/* This function tries to get a unique tuple.
+ * Every iteration checks that the reverse tuple doesn't
+ * collide with any existing one.
+ *
+ * in case of SNAT:
+ *    - for each src IP address in the range (if any)
+ *        - try to find a source port in range (if any)
+ *        - if no port range exists, use the whole
+ *          ephemeral range (after testing the port
+ *          used by the sender), otherwise use the
+ *          specified range
+ *
+ * in case of DNAT:
+ *    - for each dst IP address in the range (if any)
+ *        - for each dport in range (if any)
+ *            - try to find a source port in the ephemeral range
+ *              (after testing the port used by the sender)
+ *
+ * If none can be found, return exhaustion to the caller. */
+static bool
+nat_get_unique_tuple(struct conntrack *ct, const struct conn *conn,
+                     struct conn *nat_conn)
+{
+    union ct_addr min_addr = {0}, max_addr = {0}, curr_addr = {0},
+                  guard_addr = {0};
+    uint32_t hash = nat_range_hash(conn, ct->hash_basis);
+    bool pat_proto = conn->key.nw_proto == IPPROTO_TCP ||
+                     conn->key.nw_proto == IPPROTO_UDP;
+    uint16_t min_dport, max_dport, curr_dport;
+    uint16_t min_sport, max_sport, curr_sport;
+
+    min_addr = conn->nat_info->min_addr;
+    max_addr = conn->nat_info->max_addr;
+
+    get_initial_addr(conn, &min_addr, &max_addr, &curr_addr, hash,
+                     (conn->key.dl_type == htons(ETH_TYPE_IP)));
+
+    /* save the address we started from so that
+     * we can stop once we reach it. */
+    guard_addr = curr_addr;
+
+    set_sport_range(conn->nat_info, &conn->key, hash, &curr_sport,
+                    &min_sport, &max_sport);
+    set_dport_range(conn->nat_info, &conn->key, hash, &curr_dport,
+                    &min_dport, &max_dport);
+
+another_round:
+    store_addr_to_key(&curr_addr, &nat_conn->rev_key,
+                      conn->nat_info->nat_action);
+
+    if (!pat_proto) {
+        if (!conn_lookup(ct, &nat_conn->rev_key,
+                         time_msec(), NULL, NULL)) {
             return true;
-        } else if (pat_enabled && !all_ports_tried) {
-            if (min_port == max_port) {
-                all_ports_tried = true;
-            } else if (port == max_port) {
-                port = min_port;
-            } else {
-                port++;
-            }
-            if (port == first_port) {
-                all_ports_tried = true;
-            }
-        } else {
-            if (memcmp(&ct_addr, &max_ct_addr, sizeof ct_addr)) {
-                if (conn->key.dl_type == htons(ETH_TYPE_IP)) {
-                    ct_addr.ipv4 = htonl(ntohl(ct_addr.ipv4) + 1);
-                } else {
-                    nat_ipv6_addr_increment(&ct_addr.ipv6, 1);
-                }
-            } else {
-                ct_addr = conn->nat_info->min_addr;
-            }
-            if (!memcmp(&ct_addr, &first_addr, sizeof ct_addr)) {
-                if (pat_enabled && !ephemeral_ports_tried) {
-                    ephemeral_ports_tried = true;
-                    ct_addr = conn->nat_info->min_addr;
-                    first_addr = ct_addr;
-                    min_port = MIN_NAT_EPHEMERAL_PORT;
-                    max_port = MAX_NAT_EPHEMERAL_PORT;
-                } else {
-                    break;
-                }
+        }
+
+        goto next_addr;
+    }
+
+    FOR_EACH_PORT_IN_RANGE(curr_dport, min_dport, max_dport) {
+        nat_conn->rev_key.src.port = htons(curr_dport);
+        FOR_EACH_PORT_IN_RANGE(curr_sport, min_sport, max_sport) {
+            nat_conn->rev_key.dst.port = htons(curr_sport);
+            if (!conn_lookup(ct, &nat_conn->rev_key,
+                             time_msec(), NULL, NULL)) {
+                return true;
             }
-            first_port = min_port;
-            port = first_port;
-            all_ports_tried = false;
         }
     }
-    return false;
+
+    /* Check if next IP is in range and respin. Otherwise, notify
+     * exhaustion to the caller. */
+next_addr:
+    if (next_addr_in_range_guarded(&curr_addr, &min_addr,
+                                   &max_addr, &guard_addr,
+                                   conn->key.dl_type == htons(ETH_TYPE_IP))) {
+        return false;
+    }
+
+    goto another_round;
 }
 
 static enum ct_update_res
diff --git a/lib/ovs-actions.xml b/lib/ovs-actions.xml
index a0070e6c6..e815d5284 100644
--- a/lib/ovs-actions.xml
+++ b/lib/ovs-actions.xml
@@ -1839,8 +1839,7 @@ for i in [1,n_members]:
             nat(src=0.0.0.0). In this case, when a source port
             collision is detected during the commit, the source port will be
             translated to an ephemeral port. If there is no collision, no SNAT
-            is performed. Note that this is currently only implemented in the
-            Linux kernel datapath.
+            is performed.

diff --git a/tests/system-userspace-macros.at b/tests/system-userspace-macros.at
index 9f0d38dfb..f639ba53a 100644
--- a/tests/system-userspace-macros.at
+++ b/tests/system-userspace-macros.at
@@ -99,12 +99,10 @@ m4_define([CHECK_CONNTRACK_NAT])
 # CHECK_CONNTRACK_ZEROIP_SNAT()
 #
 # Perform requirements checks for running conntrack all-zero IP SNAT tests.
-# The userspace datapath does not support all-zero IP SNAT.
+# The userspace datapath always supports all-zero IP SNAT, so no check is
+# needed.
 #
-m4_define([CHECK_CONNTRACK_ZEROIP_SNAT],
-[
-    AT_SKIP_IF([:])
-])
+m4_define([CHECK_CONNTRACK_ZEROIP_SNAT])
 
 # CHECK_CONNTRACK_TIMEOUT()
 #
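
As a rough illustration of how the port-iteration macros added to lib/conntrack-private.h behave, here is a standalone sketch. The macro bodies follow the patch, with extra parentheses around the arguments; `JOIN()` stands in for `OVS_JOIN()`, and `visit_ports()` and `count_pairs()` are hypothetical helpers that exist only for this example. It assumes a compiler that provides `__COUNTER__` (a common extension, not standard C).

```c
#include <assert.h>
#include <stdint.h>

/* Two-level join: the outer macro expands its arguments (here
 * __COUNTER__) before the inner one pastes the tokens together. */
#define JOIN2(X, Y) X##Y
#define JOIN(X, Y) JOIN2(X, Y)

#define IN_RANGE(curr, min, max) \
    ((curr) >= (min) && (curr) <= (max))

#define NEXT_PORT_IN_RANGE(curr, min, max) \
    ((curr) = (!IN_RANGE(curr, min, max) || (curr) == (max)) \
              ? (min) : (curr) + 1)

/* One extra attempt when the starting port lies outside the range. */
#define N_PORT_ATTEMPTS(curr, min, max) \
    (!IN_RANGE(curr, min, max) ? ((max) - (min)) + 2 : ((max) - (min)) + 1)

#define FOR_EACH_PORT_IN_RANGE__(curr, min, max, INAME) \
    for (uint16_t INAME = N_PORT_ATTEMPTS(curr, min, max); \
         INAME > 0; INAME--, NEXT_PORT_IN_RANGE(curr, min, max))

#define FOR_EACH_PORT_IN_RANGE(curr, min, max) \
    FOR_EACH_PORT_IN_RANGE__(curr, min, max, JOIN(idx, __COUNTER__))

/* Hypothetical helper: records up to 'cap' visited ports in 'out' and
 * returns how many ports the loop visited in total. */
static int
visit_ports(uint16_t start, uint16_t min, uint16_t max,
            uint16_t *out, int cap)
{
    uint16_t curr = start;
    int n = 0;

    assert(min <= max);
    FOR_EACH_PORT_IN_RANGE(curr, min, max) {
        if (n < cap) {
            out[n] = curr;
        }
        n++;
    }
    return n;
}

/* Hypothetical helper: nests two loops, as the patch does for dport and
 * sport; each expansion gets its own counter variable name. */
static int
count_pairs(uint16_t dmin, uint16_t dmax, uint16_t smin, uint16_t smax)
{
    uint16_t dport = dmin, sport = smin;
    int n = 0;

    FOR_EACH_PORT_IN_RANGE(dport, dmin, dmax) {
        FOR_EACH_PORT_IN_RANGE(sport, smin, smax) {
            n++;
        }
    }
    return n;
}
```

Starting from port 80 with the range 1024..1026, `visit_ports()` visits 80, 1024, 1025 and 1026: the out-of-range sender port is tested once before the range is walked. Starting from 1025 it visits 1025, 1026 and then wraps to 1024.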