From patchwork Thu Nov 3 10:46:46 2016
X-Patchwork-Submitter: Gurucharan Shetty
X-Patchwork-Id: 691015
From: Gurucharan Shetty
To: dev@openvswitch.org
Date: Thu, 3 Nov 2016 03:46:46 -0700
Message-Id: <1478170006-15289-3-git-send-email-guru@ovn.org>
In-Reply-To: <1478170006-15289-1-git-send-email-guru@ovn.org>
References: <1478170006-15289-1-git-send-email-guru@ovn.org>
Subject: [ovs-dev] [PATCH 3/3] ovn-northd: SNAT in either direction of gateway router.

When multiple gateway routers exist, a packet can enter any gateway router.
Once the packet reaches its destination, the reply should return via the
same gateway router.  This is achieved by SNATing the packet in the inward
direction (towards logical space) to an IP address of the gateway router,
so that the reply travels back to the same gateway router.

This means that the NB database can hold SNAT rules for both directions.
For example, if the logical IP addresses are in the range 192.168.1.0/24,
there will be one SNAT rule that transforms packets from 192.168.1.0/24 to
an external_ip and another SNAT rule that transforms "0.0.0.0/0" (all
externally initiated traffic) to a gateway_ip.

For a particular connection, we should do SNAT in only one direction.  To
do that in the pipeline, we check whether a packet has already been
SNATted; if it has, we do not transform it again.

Signed-off-by: Gurucharan Shetty
---
 ovn/lib/actions.c           |  8 --------
 ovn/lib/logical-fields.c    |  2 ++
 ovn/lib/logical-fields.h    |  4 ++++
 ovn/northd/ovn-northd.8.xml | 49 +++++++++++++++++++++++++++++++++++----------
 ovn/northd/ovn-northd.c     | 31 ++++++++++++++++------------
 ovn/ovn-sb.xml              |  8 ++++----
 tests/ovn.at                |  2 +-
 7 files changed, 67 insertions(+), 37 deletions(-)

diff --git a/ovn/lib/actions.c b/ovn/lib/actions.c
index df526c0..7896d00 100644
--- a/ovn/lib/actions.c
+++ b/ovn/lib/actions.c
@@ -787,14 +787,6 @@ encode_ct_nat(const struct ovnact_ct_nat *cn,
     ct = ofpacts->header;
     if (cn->ip) {
         ct->flags |= NX_CT_F_COMMIT;
-    } else if (snat) {
-        /* XXX: For performance reasons, we try to prevent additional
-         * recirculations. So far, ct_snat which is used in a gateway router
-         * does not need a recirculation. ct_snat(IP) does need a
-         * recirculation. Should we consider a method to let the actions
-         * specify whether an action needs recirculation if there more use
-         * cases?. */
-        ct->recirc_table = NX_CT_RECIRC_NONE;
     }
     ofpact_finish(ofpacts, &ct->ofpact);
     ofpbuf_push_uninit(ofpacts, ct_offset);
diff --git a/ovn/lib/logical-fields.c b/ovn/lib/logical-fields.c
index d4578c3..27eba01 100644
--- a/ovn/lib/logical-fields.c
+++ b/ovn/lib/logical-fields.c
@@ -88,6 +88,8 @@ ovn_init_symtab(struct shash *symtab)
     char flags_str[16];
     snprintf(flags_str, sizeof flags_str, "flags[%d]", MLF_ALLOW_LOOPBACK_BIT);
     expr_symtab_add_subfield(symtab, "flags.loopback", NULL, flags_str);
+    snprintf(flags_str, sizeof flags_str, "flags[%d]", MLF_SNAT_DONE_BIT);
+    expr_symtab_add_subfield(symtab, "flags.snat", NULL, flags_str);
 
     /* Connection tracking state. */
     expr_symtab_add_field(symtab, "ct_mark", MFF_CT_MARK, NULL, false);
diff --git a/ovn/lib/logical-fields.h b/ovn/lib/logical-fields.h
index a1f1da6..8f12e06 100644
--- a/ovn/lib/logical-fields.h
+++ b/ovn/lib/logical-fields.h
@@ -47,6 +47,7 @@ void ovn_init_symtab(struct shash *symtab);
 enum mff_log_flags_bits {
     MLF_ALLOW_LOOPBACK_BIT = 0,
     MLF_RCV_FROM_VXLAN_BIT = 1,
+    MLF_SNAT_DONE_BIT = 2,
 };
 
 /* MFF_LOG_FLAGS_REG flag assignments */
@@ -59,6 +60,9 @@ enum mff_log_flags {
      * VXLAN encapsulation. Egress port information is available for
      * Geneve and STT tunnel types. */
     MLF_RCV_FROM_VXLAN = (1 << MLF_RCV_FROM_VXLAN_BIT),
+
+    /* Indicate that a packet has already been transformed in a SNAT zone. */
+    MLF_SNAT_DONE = (1 << MLF_SNAT_DONE_BIT),
 };
 
 #endif /* ovn/lib/logical-fields.h */
diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
index b406db6..abae49c 100644
--- a/ovn/northd/ovn-northd.8.xml
+++ b/ovn/northd/ovn-northd.8.xml
@@ -1149,7 +1149,7 @@ icmp4 {
         to change the source IP address of a packet from A to B, a
         priority-100 flow matches ip && ip4.dst == B with an action
-        ct_snat; next;.
+        ct_snat;.

@@ -1159,7 +1159,34 @@ icmp4 {
-    Ingress Table 4: DNAT
+    Ingress Table 4: POST_UNSNAT
+
+      This is to get the SNAT state of the packet after having sent the packet
+      through the SNAT zone in the previous table.
+
+          A priority-100 flow that matches ip && ct.dnat
+          with an action flags.snat = 1; next; indicating that
+          an unSNAT happened in the previous table.
+
+          A priority-90 flow with match ip
+          with an action flags.snat = 0; next;.
+
+          A priority-0 logical flow with match 1 has actions
+          next;.
+
+    Ingress Table 5: DNAT
 
       Packets enter the pipeline with destination IP address that needs to
@@ -1208,7 +1235,7 @@ icmp4 {
-    Ingress Table 5: IP Routing
+    Ingress Table 6: IP Routing
 
       A packet that arrives at this table is an IP packet that should be
@@ -1219,9 +1246,9 @@ icmp4 {
       packet's final destination, unchanged) and advances to the next
       table for ARP resolution.  It also sets reg1 (or
       xxreg1) to the IP address owned by the selected router
-      port (Table 7 will generate ARP request, if needed, with
-      reg0 as the target protocol address and reg1
-      as the source protocol address).
+      port (Ingress table ARP Request will generate an ARP request,
+      if needed, with reg0 as the target protocol address and
+      reg1 as the source protocol address).
@@ -1300,7 +1327,7 @@ next;
-    Ingress Table 6: ARP/ND Resolution
+    Ingress Table 7: ARP/ND Resolution
 
       Any packet that reaches this table is an IP packet whose next-hop
@@ -1382,7 +1409,7 @@ next;
-    Ingress Table 7: ARP Request
+    Ingress Table 8: ARP Request
 
       In the common case where the Ethernet destination has been resolved, this
@@ -1408,9 +1435,9 @@ arp {
-      (Ingress table 4 initialized reg1 with the IP address
-      owned by outport and reg0 with the next-hop
-      IP address)
+      (Ingress table IP Routing initialized reg1
+      with the IP address owned by outport and
+      reg0 with the next-hop IP address)
diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index 86504aa..72205e5 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -129,10 +129,11 @@ enum ovn_stage {
     PIPELINE_STAGE(ROUTER, IN, IP_INPUT, 1, "lr_in_ip_input") \
     PIPELINE_STAGE(ROUTER, IN, DEFRAG, 2, "lr_in_defrag") \
     PIPELINE_STAGE(ROUTER, IN, UNSNAT, 3, "lr_in_unsnat") \
-    PIPELINE_STAGE(ROUTER, IN, DNAT, 4, "lr_in_dnat") \
-    PIPELINE_STAGE(ROUTER, IN, IP_ROUTING, 5, "lr_in_ip_routing") \
-    PIPELINE_STAGE(ROUTER, IN, ARP_RESOLVE, 6, "lr_in_arp_resolve") \
-    PIPELINE_STAGE(ROUTER, IN, ARP_REQUEST, 7, "lr_in_arp_request") \
+    PIPELINE_STAGE(ROUTER, IN, POST_UNSNAT, 4, "lr_in_post_unsnat") \
+    PIPELINE_STAGE(ROUTER, IN, DNAT, 5, "lr_in_dnat") \
+    PIPELINE_STAGE(ROUTER, IN, IP_ROUTING, 6, "lr_in_ip_routing") \
+    PIPELINE_STAGE(ROUTER, IN, ARP_RESOLVE, 7, "lr_in_arp_resolve") \
+    PIPELINE_STAGE(ROUTER, IN, ARP_REQUEST, 8, "lr_in_arp_request") \
     \
     /* Logical router egress stages. */ \
     PIPELINE_STAGE(ROUTER, OUT, SNAT, 0, "lr_out_snat") \
@@ -3840,6 +3841,7 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
         /* Packets are allowed by default. */
         ovn_lflow_add(lflows, od, S_ROUTER_IN_DEFRAG, 0, "1", "next;");
         ovn_lflow_add(lflows, od, S_ROUTER_IN_UNSNAT, 0, "1", "next;");
+        ovn_lflow_add(lflows, od, S_ROUTER_IN_POST_UNSNAT, 0, "1", "next;");
         ovn_lflow_add(lflows, od, S_ROUTER_OUT_SNAT, 0, "1", "next;");
         ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 0, "1", "next;");
 
@@ -3917,6 +3919,13 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
 
         sset_destroy(&all_ips);
 
+        /* When an unSNAT happens, ct.dnat is set because the destination
+         * IP address is the one that is changed. */
+        ovn_lflow_add(lflows, od, S_ROUTER_IN_POST_UNSNAT, 100,
+                      "ip && ct.dnat", "flags.snat = 1; next;");
+        ovn_lflow_add(lflows, od, S_ROUTER_IN_POST_UNSNAT, 90,
+                      "ip", "flags.snat = 0; next;");
+
         for (int i = 0; i < od->nbr->n_nat; i++) {
             const struct nbrec_nat *nat;
 
@@ -3976,7 +3985,7 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
                 ds_clear(&match);
                 ds_put_format(&match, "ip && ip4.dst == %s", nat->external_ip);
                 ovn_lflow_add(lflows, od, S_ROUTER_IN_UNSNAT, 100,
-                              ds_cstr(&match), "ct_snat; next;");
+                              ds_cstr(&match), "ct_snat;");
             }
 
             /* Ingress DNAT table: Packets enter the pipeline with destination
@@ -4011,7 +4020,8 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
             if (!strcmp(nat->type, "snat")
                 || !strcmp(nat->type, "dnat_and_snat")) {
                 ds_clear(&match);
-                ds_put_format(&match, "ip && ip4.src == %s", nat->logical_ip);
+                ds_put_format(&match, "ip && ip4.src == %s && flags.snat == 0",
+                              nat->logical_ip);
                 ds_clear(&actions);
                 ds_put_format(&actions, "ct_snat(%s);", nat->external_ip);
 
@@ -4025,7 +4035,7 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
         }
 
         /* Re-circulate every packet through the DNAT zone.
-         * This helps with three things.
+         * This helps with two things.
          *
          * 1. Any packet that needs to be unDNATed in the reverse
          * direction gets unDNATed. Ideally this could be done in
@@ -4035,12 +4045,7 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
          * we can do it here, saving a future re-circulation.
          *
          * 2. Established load-balanced connections automatically get
-         * DNATed.
-         *
-         * 3. Any packet that was sent through SNAT zone in the
-         * previous table automatically gets re-circulated to get
-         * back the new destination IP address that is needed for
-         * routing in the openflow pipeline. */
+         * DNATed. */
         ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 50,
                       "ip", "flags.loopback = 1; ct_dnat;");
     }
diff --git a/ovn/ovn-sb.xml b/ovn/ovn-sb.xml
index 65191ed..e564c8a 100644
--- a/ovn/ovn-sb.xml
+++ b/ovn/ovn-sb.xml
@@ -1103,10 +1103,10 @@

           ct_snat sends the packet through the SNAT zone to
-          unSNAT any packet that was SNATed in the opposite direction.  If
-          the packet needs to be sent to the next tables, then it should be
-          followed by a next; action.  The next tables will not
-          see the changes in the packet caused by the connection tracker.
+          unSNAT any packet that was SNATed in the opposite direction.
+          The packet is then automatically sent to the next tables as if
+          followed by a next; action.  The next tables will see
+          the changes in the packet caused by the connection tracker.
 
           ct_snat(IP) sends the packet through the
diff --git a/tests/ovn.at b/tests/ovn.at
index 69f5277..bc8c9b8 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -838,7 +838,7 @@ ct_dnat();
 
 # ct_snat
 ct_snat;
-    encodes as ct(zone=NXM_NX_REG12[0..15],nat)
+    encodes as ct(table=27,zone=NXM_NX_REG12[0..15],nat)
     has prereqs ip
 ct_snat(192.168.1.2);
     encodes as ct(commit,table=27,zone=NXM_NX_REG12[0..15],nat(src=192.168.1.2))
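
Illustrative sketch (not part of the patch): the fragment below models, in
plain C, how the "SNAT done" flag described in the commit message is meant
to gate the egress SNAT rule.  The enum values are copied from the
logical-fields.h hunk.  The helper names flag_subfield_name(),
post_unsnat() and egress_snat_allowed() are hypothetical and exist only for
this illustration; they are not OVN functions, and the real behaviour is
implemented by logical flows, not by C code like this.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Values as they appear in the logical-fields.h hunk of the patch. */
enum mff_log_flags_bits {
    MLF_ALLOW_LOOPBACK_BIT = 0,
    MLF_RCV_FROM_VXLAN_BIT = 1,
    MLF_SNAT_DONE_BIT = 2,
};

enum mff_log_flags {
    MLF_RCV_FROM_VXLAN = (1 << MLF_RCV_FROM_VXLAN_BIT),
    MLF_SNAT_DONE = (1 << MLF_SNAT_DONE_BIT),
};

/* Hypothetical helper: builds the "flags[N]" subfield string from a bit
 * index, the same way the loopback line in ovn_init_symtab() does.
 * Returns the number of characters snprintf() would write. */
static int
flag_subfield_name(char *buf, size_t size, int bit)
{
    assert(bit >= 0 && bit < 32);
    return snprintf(buf, size, "flags[%d]", bit);
}

/* Hypothetical model of the lr_in_post_unsnat stage: the priority-100
 * flow sets the flag when ct.dnat is true, the priority-90 flow clears
 * it otherwise. */
static uint32_t
post_unsnat(uint32_t flags, int ct_dnat)
{
    return ct_dnat ? (flags | MLF_SNAT_DONE)
                   : (flags & ~(uint32_t) MLF_SNAT_DONE);
}

/* Hypothetical model of the "flags.snat == 0" term added to the egress
 * SNAT match: SNAT is applied only if the packet was not already
 * transformed in the SNAT zone. */
static int
egress_snat_allowed(uint32_t flags)
{
    return !(flags & MLF_SNAT_DONE);
}
```

Under these assumptions, a packet that was unSNATed on ingress carries the
flag and is skipped by the egress SNAT rule, while a packet that was not
unSNATed is still eligible for ct_snat(IP) on egress.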