[ovs-dev,v4] ovn: DNAT and SNAT on a gateway router.

Message ID 1465457869-5490-1-git-send-email-guru@ovn.org
State Superseded
Commit Message

Gurucharan Shetty June 9, 2016, 7:37 a.m. UTC
For traffic from physical space to virtual space we need DNAT.
The DNAT happens in the gateway router and reaches the logical
port. The return traffic should be unDNATed.

Traffic originating in virtual space heading to physical space
should be SNATed. The return traffic is unSNATed.

East-west traffic with the public destination IP address needs
a DNAT. This traffic is punted to the l3 gateway where DNAT
takes place. This traffic is also SNATed and eventually loops back to
its destination. The SNAT is needed because we need the reverse traffic
to go back to the l3 gateway and not short-circuit directly to the source.

This commit introduces 4 new logical actions.
1. ct_snat: To send the packet through SNAT zone to unSNAT packets.
2. ct_snat(IP): To SNAT to the provided IP address.
3. ct_dnat: To send the packet through DNAT zone to unDNAT packets.
4. ct_dnat(IP): To DNAT to the provided IP.

This commit only provides the ability to do IP based NAT. This will
eventually be enhanced to do PORT based NAT too.
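
To make the difference between the four forms concrete, here is a minimal
standalone C sketch. It is not code from this patch: struct nat_action and
parse_nat_action() are hypothetical names, the input is taken without the
trailing ';', and inet_pton() stands in for the OVN lexer. It only mirrors
the behaviour of parse_ct_nat() in this patch: the bare forms do not commit,
the (IP) forms commit, and only bare ct_snat skips recirculation.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical summary of one ct_snat/ct_dnat action; not an OVN type. */
struct nat_action {
    bool snat;          /* true: SNAT zone, false: DNAT zone. */
    bool commit;        /* true only for the ct_snat(IP)/ct_dnat(IP) forms. */
    bool recirculate;   /* false only for bare "ct_snat". */
    uint32_t ip;        /* NAT address in network byte order, 0 if none. */
};

/* Parses "ct_snat", "ct_snat(IP)", "ct_dnat" or "ct_dnat(IP)".
 * Returns true on success, false on a syntax error. */
bool
parse_nat_action(const char *s, struct nat_action *out)
{
    struct in_addr in;
    char addr[INET_ADDRSTRLEN];
    size_t len;

    assert(s && out);
    memset(out, 0, sizeof *out);

    if (!strncmp(s, "ct_snat", 7)) {
        out->snat = true;
    } else if (strncmp(s, "ct_dnat", 7)) {
        return false;
    }
    s += 7;

    if (*s == '\0') {
        /* Bare form: only undo an existing translation.  As in the patch,
         * bare ct_snat is the one case that does not recirculate. */
        out->recirculate = !out->snat;
        return true;
    }

    /* Argument form: "(a.b.c.d)". */
    len = strlen(s);
    if (s[0] != '(' || s[len - 1] != ')' || len - 2 >= sizeof addr) {
        return false;
    }
    memcpy(addr, s + 1, len - 2);
    addr[len - 2] = '\0';
    if (inet_pton(AF_INET, addr, &in) != 1) {
        return false;
    }

    out->ip = in.s_addr;
    out->commit = true;
    out->recirculate = true;
    return true;
}
```

For example, parse_nat_action("ct_snat(30.0.0.1)", &a) reports a committing
SNAT that recirculates, while parse_nat_action("ct_snat", &a) reports neither.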

Command hints:

Consider a distributed router "R1" that has switch foo (192.168.1.0/24)
with a lport foo1 (192.168.1.2) and bar (192.168.2.0/24) with lport bar1
(192.168.2.2) connected to it. You connect "R1" to
a gateway router "R2" via a switch "join" in the 20.0.0.0/24 network.

R2 has a switch "alice" (172.16.1.0/24) connected to it (to simulate
external network).

case1 : Add pure DNAT (north-south)

Add a DNAT rule in R2:
ovn-nbctl -- --id=@nat create nat type="dnat" logical_ip=192.168.1.2 \
external_ip=30.0.0.2 -- add logical_router R2 nat @nat

Now alice1 should be able to ping 192.168.1.2 via 30.0.0.2.

case2 : Add pure SNAT (south-north)

Add a SNAT rule in R2:

ovn-nbctl -- --id=@nat create nat type="snat" logical_ip=192.168.2.2 \
external_ip=30.0.0.1 -- add logical_router R2 nat @nat

(You need a static route in R1 to send packets destined for the outside
world through R2. The logical_ip can be a subnet.)

When bar1 pings alice1, alice1 receives traffic from 30.0.0.1.
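
Since logical_ip may be a subnet, several SNAT rules can match the same
source address. The patch resolves this by giving each egress SNAT flow a
priority of count_1bits(mask) + 1, so the longest mask wins. The following
standalone C sketch illustrates that selection. struct snat_rule,
snat_priority() and snat_lookup() are hypothetical names used only here,
and addresses are kept in host byte order for readability.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of an "snat" rule; host byte order. */
struct snat_rule {
    uint32_t logical_ip;    /* Address or network, e.g. 192.168.2.0. */
    uint32_t mask;          /* e.g. 0xffffff00 for a /24. */
    uint32_t external_ip;   /* Address the source is rewritten to. */
};

/* Number of 1-bits in the mask plus one, so that even a /0 rule
 * outranks the priority-0 "next;" fallback flow. */
int
snat_priority(uint32_t mask)
{
    int bits = 0;

    for (; mask; mask &= mask - 1) {
        bits++;
    }
    return bits + 1;
}

/* Returns the external IP chosen for 'src', or 0 if no rule matches.
 * Among matching rules the one with the highest priority wins. */
uint32_t
snat_lookup(const struct snat_rule *rules, size_t n, uint32_t src)
{
    int best = 0;
    uint32_t result = 0;

    assert(rules || !n);
    for (size_t i = 0; i < n; i++) {
        int prio = snat_priority(rules[i].mask);

        if ((src & rules[i].mask) == (rules[i].logical_ip & rules[i].mask)
            && prio > best) {
            best = prio;
            result = rules[i].external_ip;
        }
    }
    return result;
}
```

With one rule for 192.168.2.0/24 and another for 192.168.2.2 alone, bar1
(192.168.2.2) is translated by the host rule and every other address in the
subnet by the /24 rule.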

One could combine case1 and case2 with nat type="dnat_and_snat"
if the IP addresses are the same.

case3 : SNAT and DNAT (east-west traffic)

When bar1 pings 30.0.0.2, the traffic jumps to the gateway router
and loops back to foo1 with a source IP address of 30.0.0.1.
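
The address rewriting in case3 can be traced with a small standalone C
sketch. This is a simplification, not the patch's implementation: struct
packet, struct nat_entry, gateway_forward() and gateway_reverse() are
hypothetical, and a single remembered translation stands in for a real
per-connection conntrack entry. It shows why the SNAT is needed: foo1 sees
30.0.0.1 as the source, so its reply is routed back to the gateway router
instead of going directly to bar1.

```c
#include <assert.h>
#include <stdint.h>

/* Builds an IPv4 address in host byte order. */
#define IP4(a, b, c, d) \
    (((uint32_t) (a) << 24) | ((uint32_t) (b) << 16) \
     | ((uint32_t) (c) << 8) | (uint32_t) (d))

struct packet {
    uint32_t src, dst;
};

/* One remembered translation, standing in for a conntrack entry. */
struct nat_entry {
    uint32_t from, to;
};

/* Initiator to destination: DNAT in the ingress pipeline, then SNAT in
 * the egress pipeline. */
void
gateway_forward(struct packet *p, const struct nat_entry *dnat,
                const struct nat_entry *snat)
{
    assert(p && dnat && snat);
    if (p->dst == dnat->from) {
        p->dst = dnat->to;      /* ct_dnat(IP) */
    }
    if (p->src == snat->from) {
        p->src = snat->to;      /* ct_snat(IP) */
    }
}

/* Reply direction: unSNAT first, then unDNAT, both in the ingress
 * pipeline. */
void
gateway_reverse(struct packet *p, const struct nat_entry *dnat,
                const struct nat_entry *snat)
{
    assert(p && dnat && snat);
    if (p->dst == snat->to) {
        p->dst = snat->from;    /* ct_snat */
    }
    if (p->src == dnat->to) {
        p->src = dnat->from;    /* ct_dnat */
    }
}
```

Starting from bar1 (192.168.2.2) pinging 30.0.0.2, the forward pass yields
source 30.0.0.1 and destination 192.168.1.2. Swapping the two for the reply
and running the reverse pass yields source 30.0.0.2 and destination
192.168.2.2, which is what bar1 expects.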

Signed-off-by: Gurucharan Shetty <guru@ovn.org>
---
v3->v4:
1. Added unit tests and updated documentation based on blp's comments.
2. Changed schema to make it easier for OpenStack users.
 
---
 ovn/lib/actions.c           |  83 ++++++++++++++++++++
 ovn/northd/ovn-northd.8.xml | 131 ++++++++++++++++++++++++++++---
 ovn/northd/ovn-northd.c     | 187 ++++++++++++++++++++++++++++++++++++++++++--
 ovn/ovn-nb.ovsschema        |  19 ++++-
 ovn/ovn-nb.xml              |  65 +++++++++++++--
 ovn/ovn-sb.xml              |  41 ++++++++++
 ovn/utilities/ovn-nbctl.c   |   5 ++
 tests/ovn.at                |  17 ++++
 8 files changed, 524 insertions(+), 24 deletions(-)

Comments

Gurucharan Shetty June 16, 2016, 3:56 p.m. UTC | #1
On 9 June 2016 at 00:37, Gurucharan Shetty <guru@ovn.org> wrote:

This patch no longer applies on the tip of the master branch because of a
merge conflict. So for easier reference, I rebased it and pushed it here:
https://github.com/shettyg/ovs/tree/gateway


>
> ---
>  ovn/lib/actions.c           |  83 ++++++++++++++++++++
>  ovn/northd/ovn-northd.8.xml | 131 ++++++++++++++++++++++++++++---
>  ovn/northd/ovn-northd.c     | 187 ++++++++++++++++++++++++++++++++++++++++++--
>  ovn/ovn-nb.ovsschema        |  19 ++++-
>  ovn/ovn-nb.xml              |  65 +++++++++++++--
>  ovn/ovn-sb.xml              |  41 ++++++++++
>  ovn/utilities/ovn-nbctl.c   |   5 ++
>  tests/ovn.at                |  17 ++++
>  8 files changed, 524 insertions(+), 24 deletions(-)
>
> diff --git a/ovn/lib/actions.c b/ovn/lib/actions.c
> index 5f0bf19..4a486a0 100644
> --- a/ovn/lib/actions.c
> +++ b/ovn/lib/actions.c
> @@ -442,6 +442,85 @@ emit_ct(struct action_context *ctx, bool recirc_next, bool commit)
>      add_prerequisite(ctx, "ip");
>  }
>
> +static void
> +parse_ct_nat(struct action_context *ctx, bool snat)
> +{
> +    const size_t ct_offset = ctx->ofpacts->size;
> +    ofpbuf_pull(ctx->ofpacts, ct_offset);
> +
> +    struct ofpact_conntrack *ct = ofpact_put_CT(ctx->ofpacts);
> +
> +    if (ctx->ap->cur_ltable < ctx->ap->n_tables) {
> +        ct->recirc_table = ctx->ap->first_ptable + ctx->ap->cur_ltable + 1;
> +    } else {
> +        action_error(ctx,
> +                     "\"ct_[sd]nat\" action not allowed in last table.");
> +        return;
> +    }
> +
> +    if (snat) {
> +        ct->zone_src.field = mf_from_id(MFF_LOG_SNAT_ZONE);
> +    } else {
> +        ct->zone_src.field = mf_from_id(MFF_LOG_DNAT_ZONE);
> +    }
> +    ct->zone_src.ofs = 0;
> +    ct->zone_src.n_bits = 16;
> +    ct->flags = 0;
> +    ct->alg = 0;
> +
> +    add_prerequisite(ctx, "ip");
> +
> +    struct ofpact_nat *nat;
> +    size_t nat_offset;
> +    nat_offset = ctx->ofpacts->size;
> +    ofpbuf_pull(ctx->ofpacts, nat_offset);
> +
> +    nat = ofpact_put_NAT(ctx->ofpacts);
> +    nat->flags = 0;
> +    nat->range_af = AF_UNSPEC;
> +
> +    int commit = 0;
> +    if (lexer_match(ctx->lexer, LEX_T_LPAREN)) {
> +        ovs_be32 ip;
> +        if (ctx->lexer->token.type == LEX_T_INTEGER
> +            && ctx->lexer->token.format == LEX_F_IPV4) {
> +            ip = ctx->lexer->token.value.ipv4;
> +        } else {
> +            action_syntax_error(ctx, "invalid ip");
> +            return;
> +        }
> +
> +        nat->range_af = AF_INET;
> +        nat->range.addr.ipv4.min = ip;
> +        if (snat) {
> +            nat->flags |= NX_NAT_F_SRC;
> +        } else {
> +            nat->flags |= NX_NAT_F_DST;
> +        }
> +        commit = NX_CT_F_COMMIT;
> +        lexer_get(ctx->lexer);
> +        if (!lexer_match(ctx->lexer, LEX_T_RPAREN)) {
> +            action_syntax_error(ctx, "expecting `)'");
> +            return;
> +        }
> +    }
> +
> +    ctx->ofpacts->header = ofpbuf_push_uninit(ctx->ofpacts, nat_offset);
> +    ct = ctx->ofpacts->header;
> +    ct->flags |= commit;
> +
> +    /* XXX: For performance reasons, we try to prevent additional
> +     * recirculations.  So far, ct_snat which is used in a gateway router
> +     * does not need a recirculation. ct_snat(IP) does need a recirculation.
> +     * Should we consider a method to let the actions specify whether an
> +     * action needs recirculation if there are more use cases? */
> +    if (!commit && snat) {
> +        ct->recirc_table = NX_CT_RECIRC_NONE;
> +    }
> +    ofpact_finish(ctx->ofpacts, &ct->ofpact);
> +    ofpbuf_push_uninit(ctx->ofpacts, ct_offset);
> +}
> +
>  static bool
>  parse_action(struct action_context *ctx)
>  {
> @@ -469,6 +548,10 @@ parse_action(struct action_context *ctx)
>          emit_ct(ctx, true, false);
>      } else if (lexer_match_id(ctx->lexer, "ct_commit")) {
>          emit_ct(ctx, false, true);
> +    } else if (lexer_match_id(ctx->lexer, "ct_dnat")) {
> +        parse_ct_nat(ctx, false);
> +    } else if (lexer_match_id(ctx->lexer, "ct_snat")) {
> +        parse_ct_nat(ctx, true);
>      } else if (lexer_match_id(ctx->lexer, "arp")) {
>          parse_arp_action(ctx);
>      } else if (lexer_match_id(ctx->lexer, "get_arp")) {
> diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
> index 1983812..c237604 100644
> --- a/ovn/northd/ovn-northd.8.xml
> +++ b/ovn/northd/ovn-northd.8.xml
> @@ -517,11 +517,40 @@ next;
>
>        <li>
>          <p>
> -          Reply to ARP requests.  These flows reply to ARP requests for the
> -          router's own IP address.  For each router port <var>P</var> that owns
> -          IP address <var>A</var> and Ethernet address <var>E</var>, a
> -          priority-90 flow matches <code>inport == <var>P</var> &amp;&amp;
> -          arp.op == 1 &amp;&amp; arp.tpa == <var>A</var></code> (ARP request)
> +          Reply to ARP requests.
> +        </p>
> +
> +        <p>
> +          These flows reply to ARP requests for the router's own IP address.
> +          For each router port <var>P</var> that owns IP address <var>A</var>
> +          and Ethernet address <var>E</var>, a priority-90 flow matches
> +          <code>inport == <var>P</var> &amp;&amp; arp.op == 1 &amp;&amp;
> +          arp.tpa == <var>A</var></code> (ARP request) with the following
> +          actions:
> +        </p>
> +
> +        <pre>
> +eth.dst = eth.src;
> +eth.src = <var>E</var>;
> +arp.op = 2; /* ARP reply. */
> +arp.tha = arp.sha;
> +arp.sha = <var>E</var>;
> +arp.tpa = arp.spa;
> +arp.spa = <var>A</var>;
> +outport = <var>P</var>;
> +inport = ""; /* Allow sending out inport. */
> +output;
> +        </pre>
> +      </li>
> +
> +      <li>
> +        <p>
> +          These flows reply to ARP requests for the virtual IP addresses
> +          configured in the router for DNAT. For a configured DNAT IP address
> +          <var>A</var>, for each router port <var>P</var> with Ethernet
> +          address <var>E</var>, a priority-90 flow matches
> +          <code>inport == <var>P</var> &amp;&amp; arp.op == 1 &amp;&amp;
> +          arp.tpa == <var>A</var></code> (ARP request)
>            with the following actions:
>          </p>
>
> @@ -663,7 +692,62 @@ icmp4 {
>        </li>
>      </ul>
>
> -    <h3>Ingress Table 2: IP Routing</h3>
> +    <h3>Ingress Table 2: UNSNAT</h3>
> +
> +    <p>
> +      This is for already established connections' reverse traffic.
> +      i.e., SNAT has already been done in egress pipeline and now the
> +      packet has entered the ingress pipeline as part of a reply.  It is
> +      unSNATted here.
> +    </p>
> +
> +    <ul>
> +      <li>
> +        <p>
> +          For each configuration in the OVN Northbound database, that asks
> +          to change the source IP address of a packet from <var>A</var> to
> +          <var>B</var>, a priority-100 flow matches <code>ip &amp;&amp;
> +          ip4.dst == <var>B</var></code> with an action
> +          <code>ct_snat; next;</code>.
> +        </p>
> +
> +        <p>
> +          A priority-0 logical flow with match <code>1</code> has actions
> +          <code>next;</code>.
> +        </p>
> +      </li>
> +    </ul>
> +
> +    <h3>Ingress Table 3: DNAT</h3>
> +
> +    <p>
> +      Packets enter the pipeline with destination IP address that needs to
> +      be DNATted from a virtual IP address to a real IP address.  Packets
> +      in the reverse direction need to be unDNATed.
> +    </p>
> +    <ul>
> +      <li>
> +        <p>
> +          For each configuration in the OVN Northbound database, that asks
> +          to change the destination IP address of a packet from <var>A</var> to
> +          <var>B</var>, a priority-100 flow matches <code>ip &amp;&amp;
> +          ip4.dst == <var>A</var></code> with an action <code>inport = "";
> +          ct_dnat(<var>B</var>);</code>.
> +        </p>
> +
> +        <p>
> +          For all IP packets of a Gateway router, a priority-50 flow has an
> +          action <code>inport = ""; ct_dnat;</code>.
> +        </p>
> +
> +        <p>
> +          A priority-0 logical flow with match <code>1</code> has actions
> +          <code>next;</code>.
> +        </p>
> +      </li>
> +    </ul>
> +
> +    <h3>Ingress Table 4: IP Routing</h3>
>
>      <p>
>        A packet that arrives at this table is an IP packet that should be routed
> @@ -672,7 +756,7 @@ icmp4 {
>        <code>ip4.dst</code>, the packet's final destination, unchanged) and
>        advances to the next table for ARP resolution.  It also sets
>        <code>reg1</code> to the IP address owned by the selected router port
> -      (which is used later in table 4 as the IP source address for an ARP
> +      (which is used later in table 6 as the IP source address for an ARP
>        request, if needed).
>      </p>
>
> @@ -743,7 +827,7 @@ icmp4 {
>        </li>
>      </ul>
>
> -    <h3>Ingress Table 3: ARP Resolution</h3>
> +    <h3>Ingress Table 5: ARP Resolution</h3>
>
>      <p>
>        Any packet that reaches this table is an IP packet whose next-hop IP
> @@ -798,7 +882,7 @@ icmp4 {
>        </li>
>      </ul>
>
> -    <h3>Ingress Table 4: ARP Request</h3>
> +    <h3>Ingress Table 6: ARP Request</h3>
>
>      <p>
>        In the common case where the Ethernet destination has been resolved, this
> @@ -823,7 +907,7 @@ arp {
>          </pre>
>
>          <p>
> -          (Ingress table 2 initialized <code>reg1</code> with the IP address
> +          (Ingress table 4 initialized <code>reg1</code> with the IP address
>            owned by <code>outport</code>.)
>          </p>
>
> @@ -838,7 +922,32 @@ arp {
>        </li>
>      </ul>
>
> -    <h3>Egress Table 0: Delivery</h3>
> +    <h3>Egress Table 0: SNAT</h3>
> +
> +    <p>
> +      Packets that are configured to be SNATed get their source IP address
> +      changed based on the configuration in the OVN Northbound database.
> +    </p>
> +    <ul>
> +      <li>
> +        <p>
> +          For each configuration in the OVN Northbound database, that asks
> +          to change the source IP address of a packet from an IP address of
> +          <var>A</var> or to change the source IP address of a packet that
> +          belongs to network <var>A</var> to <var>B</var>, a flow matches
> +          <code>ip &amp;&amp; ip4.src == <var>A</var></code> with an action
> +          <code>ct_snat(<var>B</var>);</code>.  The priority of the flow
> +          is calculated based on the mask of <var>A</var>, with matches
> +          having larger masks getting higher priorities.
> +        </p>
> +        <p>
> +          A priority-0 logical flow with match <code>1</code> has actions
> +          <code>next;</code>.
> +        </p>
> +      </li>
> +    </ul>
> +
> +    <h3>Egress Table 1: Delivery</h3>
>
>      <p>
>        Packets that reach this table are ready for delivery.  It contains
> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> index cac0148..4683780 100644
> --- a/ovn/northd/ovn-northd.c
> +++ b/ovn/northd/ovn-northd.c
> @@ -105,12 +105,15 @@ enum ovn_stage {
>      /* Logical router ingress stages. */                              \
>      PIPELINE_STAGE(ROUTER, IN,  ADMISSION,   0, "lr_in_admission")    \
>      PIPELINE_STAGE(ROUTER, IN,  IP_INPUT,    1, "lr_in_ip_input")     \
> -    PIPELINE_STAGE(ROUTER, IN,  IP_ROUTING,  2, "lr_in_ip_routing")   \
> -    PIPELINE_STAGE(ROUTER, IN,  ARP_RESOLVE, 3, "lr_in_arp_resolve")  \
> -    PIPELINE_STAGE(ROUTER, IN,  ARP_REQUEST, 4, "lr_in_arp_request")  \
> +    PIPELINE_STAGE(ROUTER, IN,  UNSNAT,      2, "lr_in_unsnat")       \
> +    PIPELINE_STAGE(ROUTER, IN,  DNAT,        3, "lr_in_dnat")         \
> +    PIPELINE_STAGE(ROUTER, IN,  IP_ROUTING,  4, "lr_in_ip_routing")   \
> +    PIPELINE_STAGE(ROUTER, IN,  ARP_RESOLVE, 5, "lr_in_arp_resolve")  \
> +    PIPELINE_STAGE(ROUTER, IN,  ARP_REQUEST, 6, "lr_in_arp_request")  \
>                                                                        \
>      /* Logical router egress stages. */                               \
> -    PIPELINE_STAGE(ROUTER, OUT, DELIVERY,    0, "lr_out_delivery")
> +    PIPELINE_STAGE(ROUTER, OUT, SNAT,      0, "lr_out_snat")          \
> +    PIPELINE_STAGE(ROUTER, OUT, DELIVERY,  1, "lr_out_delivery")
>
>  #define PIPELINE_STAGE(DP_TYPE, PIPELINE, STAGE, TABLE, NAME)   \
>      S_##DP_TYPE##_##PIPELINE##_##STAGE                          \
> @@ -1998,6 +2001,51 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
>          free(match);
>          free(actions);
>
> +        /* ARP handling for external IP addresses.
> +         *
> +         * DNAT IP addresses are external IP addresses that need ARP
> +         * handling. */
> +        for (int i = 0; i < op->od->nbr->n_nat; i++) {
> +            const struct nbrec_nat *nat;
> +
> +            nat = op->od->nbr->nat[i];
> +
> +            if (!strcmp(nat->type, "snat")) {
> +                continue;
> +            }
> +
> +            ovs_be32 ip;
> +            if (!ip_parse(nat->external_ip, &ip) || !ip) {
> +                static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
> +                VLOG_WARN_RL(&rl, "bad ip address %s in dnat configuration "
> +                             "for router %s", nat->external_ip, op->key);
> +                continue;
> +            }
> +
> +            match = xasprintf(
> +                "inport == %s && arp.tpa == "IP_FMT" && arp.op == 1",
> +                op->json_key, IP_ARGS(ip));
> +            actions = xasprintf(
> +                "eth.dst = eth.src; "
> +                "eth.src = "ETH_ADDR_FMT"; "
> +                "arp.op = 2; /* ARP reply */ "
> +                "arp.tha = arp.sha; "
> +                "arp.sha = "ETH_ADDR_FMT"; "
> +                "arp.tpa = arp.spa; "
> +                "arp.spa = "IP_FMT"; "
> +                "outport = %s; "
> +                "inport = \"\"; /* Allow sending out inport. */ "
> +                "output;",
> +                ETH_ADDR_ARGS(op->mac),
> +                ETH_ADDR_ARGS(op->mac),
> +                IP_ARGS(ip),
> +                op->json_key);
> +            ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_INPUT, 90,
> +                          match, actions);
> +            free(match);
> +            free(actions);
> +        }
> +
>          /* Drop IP traffic to this router. */
>          match = xasprintf("ip4.dst == "IP_FMT, IP_ARGS(op->ip));
>          ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_INPUT, 60,
> @@ -2005,6 +2053,135 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
>          free(match);
>      }
>
> +    /* NAT in Gateway routers. */
> +    HMAP_FOR_EACH (od, key_node, datapaths) {
> +        if (!od->nbr) {
> +            continue;
> +        }
> +
> +        /* Packets are allowed by default. */
> +        ovn_lflow_add(lflows, od, S_ROUTER_IN_UNSNAT, 0, "1", "next;");
> +        ovn_lflow_add(lflows, od, S_ROUTER_OUT_SNAT, 0, "1", "next;");
> +        ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 0, "1", "next;");
> +
> +        /* NAT rules are only valid on Gateway routers. */
> +        if (!smap_get(&od->nbr->options, "chassis")) {
> +            continue;
> +        }
> +
> +        for (int i = 0; i < od->nbr->n_nat; i++) {
> +            const struct nbrec_nat *nat;
> +
> +            nat = od->nbr->nat[i];
> +
> +            ovs_be32 ip, mask;
> +
> +            char *error = ip_parse_masked(nat->external_ip, &ip, &mask);
> +            if (error || mask != OVS_BE32_MAX) {
> +                static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
> +                VLOG_WARN_RL(&rl, "bad external ip %s for nat",
> +                             nat->external_ip);
> +                free(error);
> +                continue;
> +            }
> +
> +            /* Check the validity of nat->logical_ip. 'logical_ip' can
> +             * be a subnet when the type is "snat". */
> +            error = ip_parse_masked(nat->logical_ip, &ip, &mask);
> +            if (!strcmp(nat->type, "snat")) {
> +                if (error) {
> +                    static struct vlog_rate_limit rl =
> +                        VLOG_RATE_LIMIT_INIT(5, 1);
> +                    VLOG_WARN_RL(&rl, "bad ip network or ip %s for snat "
> +                                 "in router "UUID_FMT"",
> +                                 nat->logical_ip, UUID_ARGS(&od->key));
> +                    free(error);
> +                    continue;
> +                }
> +            } else {
> +                if (error || mask != OVS_BE32_MAX) {
> +                    static struct vlog_rate_limit rl =
> +                        VLOG_RATE_LIMIT_INIT(5, 1);
> +                    VLOG_WARN_RL(&rl, "bad ip %s for dnat in router "
> +                        ""UUID_FMT"", nat->logical_ip, UUID_ARGS(&od->key));
> +                    free(error);
> +                    continue;
> +                }
> +            }
> +
> +
> +            char *match, *actions;
> +
> +            /* Ingress UNSNAT table: It is for already established connections'
> +             * reverse traffic. i.e., SNAT has already been done in egress
> +             * pipeline and now the packet has entered the ingress pipeline as
> +             * part of a reply. We undo the SNAT here.
> +             *
> +             * Undoing SNAT has to happen before DNAT processing.  This is
> +             * because when the packet was DNATed in ingress pipeline, it did
> +             * not know about the possibility of eventual additional SNAT in
> +             * egress pipeline. */
> +            if (!strcmp(nat->type, "snat")
> +                || !strcmp(nat->type, "dnat_and_snat")) {
> +                match = xasprintf("ip && ip4.dst == %s", nat->external_ip);
> +                ovn_lflow_add(lflows, od, S_ROUTER_IN_UNSNAT, 100,
> +                              match, "ct_snat; next;");
> +                free(match);
> +            }
> +
> +            /* Ingress DNAT table: Packets enter the pipeline with destination
> +             * IP address that needs to be DNATted from an external IP address
> +             * to a logical IP address. */
> +            if (!strcmp(nat->type, "dnat")
> +                || !strcmp(nat->type, "dnat_and_snat")) {
> +                /* Packet when it goes from the initiator to destination.
> +                 * We need to zero the inport because the router can
> +                 * send the packet back through the same interface. */
> +                match = xasprintf("ip && ip4.dst == %s", nat->external_ip);
> +                actions = xasprintf("inport = \"\"; ct_dnat(%s);",
> +                                    nat->logical_ip);
> +                ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 100,
> +                           match, actions);
> +                free(match);
> +                free(actions);
> +            }
> +
> +            /* Egress SNAT table: Packets enter the egress pipeline with
> +             * source IP address that needs to be SNATted to an external IP
> +             * address. */
> +            if (!strcmp(nat->type, "snat")
> +                || !strcmp(nat->type, "dnat_and_snat")) {
> +                match = xasprintf("ip && ip4.src == %s", nat->logical_ip);
> +                actions = xasprintf("ct_snat(%s);", nat->external_ip);
> +
> +                /* The priority here is calculated such that the
> +                 * nat->logical_ip with the longest mask gets a higher
> +                 * priority. */
> +                ovn_lflow_add(lflows, od, S_ROUTER_OUT_SNAT,
> +                              count_1bits(ntohl(mask)) + 1, match, actions);
> +                free(match);
> +                free(actions);
> +            }
> +        }
> +
> +        /* Re-circulate every packet through the DNAT zone.
> +        * This helps with two things.
> +        *
> +        * 1. Any packet that needs to be unDNATed in the reverse
> +        * direction gets unDNATed. Ideally this could be done in
> +        * the egress pipeline. But since the gateway router
> +        * does not have any feature that depends on the source
> +        * ip address being external IP address for IP routing,
> +        * we can do it here, saving a future re-circulation.
> +        *
> +        * 2. Any packet that was sent through SNAT zone in the
> +        * previous table automatically gets re-circulated to get
> +        * back the new destination IP address that is needed for
> +        * routing in the openflow pipeline. */
> +        ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 50,
> +                      "ip", "inport = \"\"; ct_dnat;");
> +    }
> +
>      /* Logical router ingress table 2: IP Routing.
>       *
>       * A packet that arrives at this table is an IP packet that should be
> @@ -2205,7 +2382,7 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
>          ovn_lflow_add(lflows, od, S_ROUTER_IN_ARP_REQUEST, 0, "1", "output;");
>      }
>
> -    /* Logical router egress table 0: Delivery (priority 100).
> +    /* Logical router egress table 1: Delivery (priority 100).
>       *
>       * Priority 100 rules deliver packets to enabled logical ports. */
>      HMAP_FOR_EACH (op, key_node, ports) {
> diff --git a/ovn/ovn-nb.ovsschema b/ovn/ovn-nb.ovsschema
> index fa21b30..ac6ca14 100644
> --- a/ovn/ovn-nb.ovsschema
> +++ b/ovn/ovn-nb.ovsschema
> @@ -1,7 +1,7 @@
>  {
>      "name": "OVN_Northbound",
> -    "version": "2.1.2",
> -    "cksum": "429668869 5325",
> +    "version": "2.1.3",
> +    "cksum": "3631923697 6121",
>      "tables": {
>          "Logical_Switch": {
>              "columns": {
> @@ -78,6 +78,11 @@
>                                     "max": "unlimited"}},
>                  "default_gw": {"type": {"key": "string", "min": 0, "max": 1}},
>                  "enabled": {"type": {"key": "boolean", "min": 0, "max": 1}},
> +                "nat": {"type": {"key": {"type": "uuid",
> +                                         "refTable": "NAT",
> +                                         "refType": "strong"},
> +                                 "min": 0,
> +                                 "max": "unlimited"}},
>                  "options": {
>                       "type": {"key": "string",
>                                "value": "string",
> @@ -104,6 +109,16 @@
>                  "ip_prefix": {"type": "string"},
>                  "nexthop": {"type": "string"},
>                  "output_port": {"type": {"key": "string", "min": 0, "max": 1}}},
> +            "isRoot": false},
> +        "NAT": {
> +            "columns": {
> +                "external_ip": {"type": "string"},
> +                "logical_ip": {"type": "string"},
> +                "type": {"type": {"key": {"type": "string",
> +                                           "enum": ["set", ["dnat",
> +                                                             "snat",
> +                                                             "dnat_and_snat"
> +                                                               ]]}}}},
>              "isRoot": false}
>      }
>  }
> diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml
> index 41092f1..93ad305 100644
> --- a/ovn/ovn-nb.xml
> +++ b/ovn/ovn-nb.xml
> @@ -631,18 +631,31 @@
>        router has all ingress and egress traffic dropped.
>      </column>
>
> +    <column name="nat">
> +      One or more NAT rules for the router. NAT rules only work on the
> +      Gateway routers.
> +    </column>
> +
>      <group title="Options">
>        <p>
>          Additional options for the logical router.
>        </p>
>
>        <column name="options" key="chassis">
> -        If set, indicates that the logical router in question is
> -        a Gateway router (which is centralized) and resides in the set
> -        chassis.  The same value is also used by <code>ovn-controller</code>
> -        to uniquely identify the chassis in the OVN deployment and
> -        comes from <code>external_ids:system-id</code> in the
> -        <code>Open_vSwitch</code> table of Open_vSwitch database.
> +        <p>
> +          If set, indicates that the logical router in question is a Gateway
> +          router (which is centralized) and resides in the set chassis.  The
> +          same value is also used by <code>ovn-controller</code> to
> +          uniquely identify the chassis in the OVN deployment and
> +          comes from <code>external_ids:system-id</code> in the
> +          <code>Open_vSwitch</code> table of Open_vSwitch database.
> +        </p>
> +
> +        <p>
> +          The Gateway router can only be connected to a distributed router
> +          via a switch if SNAT and DNAT are to be configured in the Gateway
> +          router.
> +        </p>
>        </column>
>      </group>
>
> @@ -765,4 +778,44 @@
>      </column>
>    </table>
>
> +  <table name="NAT" title="NAT rules for a Gateway router.">
> +    <p>
> +      Each record represents a NAT rule in a Gateway router.
> +    </p>
> +
> +    <column name="type">
> +      <p>Type of the NAT rule.</p>
> +      <ul>
> +        <li>
> +          When <ref column="type"/> is <code>dnat</code>, the externally
> +          visible IP address <ref column="external_ip"/> is DNATted to the IP
> +          address <ref column="logical_ip"/> in the logical space.
> +        </li>
> +        <li>
> +          When <ref column="type"/> is <code>snat</code>, IP packets
> +          with their source IP address that either matches the IP address
> +          in <ref column="logical_ip"/> or is in the network provided by
> +          <ref column="logical_ip"/> are SNATed into the IP address in
> +          <ref column="external_ip"/>.
> +        </li>
> +        <li>
> +          When <ref column="type"/> is <code>dnat_and_snat</code>, the
> +          externally visible IP address <ref column="external_ip"/> is
> +          DNATted to the IP address <ref column="logical_ip"/> in the
> +          logical space. In addition, IP packets with the source IP
> +          address that matches <ref column="logical_ip"/> are SNATed into
> +          the IP address in <ref column="external_ip"/>.
> +        </li>
> +      </ul>
> +    </column>
> +
> +    <column name="external_ip">
> +      An IPv4 address.
> +    </column>
> +
> +    <column name="logical_ip">
> +      An IPv4 network (e.g. 192.168.1.0/24) or an IPv4 address.
> +    </column>
> +  </table>
> +
>  </database>
> diff --git a/ovn/ovn-sb.xml b/ovn/ovn-sb.xml
> index 1231b4e..5665871 100644
> --- a/ovn/ovn-sb.xml
> +++ b/ovn/ovn-sb.xml
> @@ -951,6 +951,47 @@
>            </p>
>          </dd>
>
> +        <dt><code>ct_dnat;</code></dt>
> +        <dt><code>ct_dnat(<var>IP</var>);</code></dt>
> +        <dd>
> +          <p>
> +            <code>ct_dnat</code> sends the packet through the DNAT zone in
> +            the connection tracking table to unDNAT any packet that was DNATed
> +            in the opposite direction.  The packet is then automatically sent
> +            to the next tables as if followed by a <code>next;</code> action.
> +            The next tables will see the changes in the packet caused by
> +            the connection tracker.
> +          </p>
> +          <p>
> +            <code>ct_dnat(<var>IP</var>)</code> sends the packet through the
> +            DNAT zone to change the destination IP address of the packet to
> +            the one provided inside the parentheses and commits the connection.
> +            The packet is then automatically sent to the next tables as if
> +            followed by a <code>next;</code> action.  The next tables will see
> +            the changes in the packet caused by the connection tracker.
> +          </p>
> +        </dd>
> +
> +        <dt><code>ct_snat;</code></dt>
> +        <dt><code>ct_snat(<var>IP</var>);</code></dt>
> +        <dd>
> +          <p>
> +            <code>ct_snat</code> sends the packet through the SNAT zone to
> +            unSNAT any packet that was SNATed in the opposite direction.  If
> +            the packet needs to be sent to the next tables, then it should be
> +            followed by a <code>next;</code> action.  The next tables will not
> +            see the changes in the packet caused by the connection tracker.
> +          </p>
> +          <p>
> +            <code>ct_snat(<var>IP</var>)</code> sends the packet through the
> +            SNAT zone to change the source IP address of the packet to
> +            the one provided inside the parentheses and commits the connection.
> +            The packet is then automatically sent to the next tables as if
> +            followed by a <code>next;</code> action.  The next tables will see
> +            the changes in the packet caused by the connection tracker.
> +          </p>
> +        </dd>
> +
>          <dt><code>arp { <var>action</var>; </code>...<code> };</code></dt>
>          <dd>
>            <p>
> diff --git a/ovn/utilities/ovn-nbctl.c b/ovn/utilities/ovn-nbctl.c
> index 321040e..b821307 100644
> --- a/ovn/utilities/ovn-nbctl.c
> +++ b/ovn/utilities/ovn-nbctl.c
> @@ -1449,6 +1449,11 @@ static const struct ctl_table_class tables[] = {
>         NULL},
>        {NULL, NULL, NULL}}},
>
> +    {&nbrec_table_nat,
> +     {{&nbrec_table_nat, NULL,
> +       NULL},
> +      {NULL, NULL, NULL}}},
> +
>      {NULL, {{NULL, NULL, NULL}, {NULL, NULL, NULL}}}
>  };
>
> diff --git a/tests/ovn.at b/tests/ovn.at
> index 633cf35..19d5c73 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -507,6 +507,23 @@ ip.ttl => Syntax error at end of input expecting `--'.
>  ct_next; => actions=ct(table=27,zone=NXM_NX_REG5[0..15]), prereqs=ip
>  ct_commit; => actions=ct(commit,zone=NXM_NX_REG5[0..15]), prereqs=ip
>
> +# dnat
> +ct_dnat; => actions=ct(table=27,zone=NXM_NX_REG3[0..15],nat), prereqs=ip
> +ct_dnat(192.168.1.2); => actions=ct(commit,table=27,zone=NXM_NX_REG3[0..15],nat(dst=192.168.1.2)), prereqs=ip
> +ct_dnat(192.168.1.2, 192.168.1.3); => Syntax error at `,' expecting `)'.
> +ct_dnat(foo); => Syntax error at `foo' invalid ip.
> +ct_dnat(foo, bar); => Syntax error at `foo' invalid ip.
> +ct_dnat(); => Syntax error at `)' invalid ip.
> +
> +# snat
> +ct_snat; => actions=ct(zone=NXM_NX_REG4[0..15],nat), prereqs=ip
> +ct_snat(192.168.1.2); => actions=ct(commit,table=27,zone=NXM_NX_REG4[0..15],nat(src=192.168.1.2)), prereqs=ip
> +ct_snat(192.168.1.2, 192.168.1.3); => Syntax error at `,' expecting `)'.
> +ct_snat(foo); => Syntax error at `foo' invalid ip.
> +ct_snat(foo, bar); => Syntax error at `foo' invalid ip.
> +ct_snat(); => Syntax error at `)' invalid ip.
> +
> +
>  # arp
>  arp { eth.dst = ff:ff:ff:ff:ff:ff; output; }; => actions=controller(userdata=00.00.00.00.00.00.00.00.00.19.00.10.80.00.06.06.ff.ff.ff.ff.ff.ff.00.00.ff.ff.00.10.00.00.23.20.00.0e.ff.f8.40.00.00.00), prereqs=ip4
>
> --
> 1.9.1
>
>
Patch

diff --git a/ovn/lib/actions.c b/ovn/lib/actions.c
index 5f0bf19..4a486a0 100644
--- a/ovn/lib/actions.c
+++ b/ovn/lib/actions.c
@@ -442,6 +442,85 @@  emit_ct(struct action_context *ctx, bool recirc_next, bool commit)
     add_prerequisite(ctx, "ip");
 }
 
+static void
+parse_ct_nat(struct action_context *ctx, bool snat)
+{
+    const size_t ct_offset = ctx->ofpacts->size;
+    ofpbuf_pull(ctx->ofpacts, ct_offset);
+
+    struct ofpact_conntrack *ct = ofpact_put_CT(ctx->ofpacts);
+
+    if (ctx->ap->cur_ltable < ctx->ap->n_tables) {
+        ct->recirc_table = ctx->ap->first_ptable + ctx->ap->cur_ltable + 1;
+    } else {
+        action_error(ctx,
+                     "\"ct_[sd]nat\" action not allowed in last table.");
+        return;
+    }
+
+    if (snat) {
+        ct->zone_src.field = mf_from_id(MFF_LOG_SNAT_ZONE);
+    } else {
+        ct->zone_src.field = mf_from_id(MFF_LOG_DNAT_ZONE);
+    }
+    ct->zone_src.ofs = 0;
+    ct->zone_src.n_bits = 16;
+    ct->flags = 0;
+    ct->alg = 0;
+
+    add_prerequisite(ctx, "ip");
+
+    struct ofpact_nat *nat;
+    size_t nat_offset;
+    nat_offset = ctx->ofpacts->size;
+    ofpbuf_pull(ctx->ofpacts, nat_offset);
+
+    nat = ofpact_put_NAT(ctx->ofpacts);
+    nat->flags = 0;
+    nat->range_af = AF_UNSPEC;
+
+    int commit = 0;
+    if (lexer_match(ctx->lexer, LEX_T_LPAREN)) {
+        ovs_be32 ip;
+        if (ctx->lexer->token.type == LEX_T_INTEGER
+            && ctx->lexer->token.format == LEX_F_IPV4) {
+            ip = ctx->lexer->token.value.ipv4;
+        } else {
+            action_syntax_error(ctx, "invalid ip");
+            return;
+        }
+
+        nat->range_af = AF_INET;
+        nat->range.addr.ipv4.min = ip;
+        if (snat) {
+            nat->flags |= NX_NAT_F_SRC;
+        } else {
+            nat->flags |= NX_NAT_F_DST;
+        }
+        commit = NX_CT_F_COMMIT;
+        lexer_get(ctx->lexer);
+        if (!lexer_match(ctx->lexer, LEX_T_RPAREN)) {
+            action_syntax_error(ctx, "expecting `)'");
+            return;
+        }
+    }
+
+    ctx->ofpacts->header = ofpbuf_push_uninit(ctx->ofpacts, nat_offset);
+    ct = ctx->ofpacts->header;
+    ct->flags |= commit;
+
+    /* XXX: For performance reasons, we try to prevent additional
+     * recirculations.  So far, ct_snat, which is used in a gateway router,
+     * does not need a recirculation.  ct_snat(IP) does need a recirculation.
+     * Should we consider a method to let the actions specify whether an
+     * action needs recirculation if there are more use cases? */
+    if (!commit && snat) {
+        ct->recirc_table = NX_CT_RECIRC_NONE;
+    }
+    ofpact_finish(ctx->ofpacts, &ct->ofpact);
+    ofpbuf_push_uninit(ctx->ofpacts, ct_offset);
+}
+
 static bool
 parse_action(struct action_context *ctx)
 {
@@ -469,6 +548,10 @@  parse_action(struct action_context *ctx)
         emit_ct(ctx, true, false);
     } else if (lexer_match_id(ctx->lexer, "ct_commit")) {
         emit_ct(ctx, false, true);
+    } else if (lexer_match_id(ctx->lexer, "ct_dnat")) {
+        parse_ct_nat(ctx, false);
+    } else if (lexer_match_id(ctx->lexer, "ct_snat")) {
+        parse_ct_nat(ctx, true);
     } else if (lexer_match_id(ctx->lexer, "arp")) {
         parse_arp_action(ctx);
     } else if (lexer_match_id(ctx->lexer, "get_arp")) {
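
As a reading aid for the parse_ct_nat() hunk above, here is a minimal
standalone sketch of the decision it encodes: a bare ct_dnat or ct_snat only
sends the packet through the zone, an IP argument additionally commits the
connection, and only the bare ct_snat skips recirculation.  The names
struct nat_action and parse_nat_action() are hypothetical and exist only in
this illustration.  The sketch uses plain string handling and inet_pton()
instead of the OVN lexer and ofpact structures, so it is not the patch's
implementation.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical summary of one ct_[sd]nat action; not an OVS structure. */
struct nat_action {
    bool snat;          /* True for ct_snat, false for ct_dnat. */
    bool commit;        /* True when an IP argument was supplied. */
    bool recirc;        /* True when the packet continues in the next table. */
    struct in_addr ip;  /* Valid only when 'commit' is true. */
};

/* Parses "ct_snat;", "ct_dnat;", "ct_snat(IP);" or "ct_dnat(IP);".
 * Returns true on success and fills in '*out'. */
bool
parse_nat_action(const char *s, struct nat_action *out)
{
    assert(s && out);
    memset(out, 0, sizeof *out);

    if (!strncmp(s, "ct_snat", 7)) {
        out->snat = true;
    } else if (strncmp(s, "ct_dnat", 7)) {
        return false;
    }
    s += 7;

    if (*s == '(') {
        char buf[INET_ADDRSTRLEN];
        const char *end = strchr(s, ')');
        size_t len;

        if (!end) {
            return false;
        }
        len = (size_t) (end - s) - 1;
        if (len == 0 || len >= sizeof buf) {
            return false;       /* Empty or too long to be one IPv4 address. */
        }
        memcpy(buf, s + 1, len);
        buf[len] = '\0';
        if (inet_pton(AF_INET, buf, &out->ip) != 1) {
            return false;       /* Corresponds to the "invalid ip" error. */
        }
        out->commit = true;
        s = end + 1;
    }
    if (strcmp(s, ";")) {
        return false;
    }

    /* Mirrors "if (!commit && snat) recirc_table = NX_CT_RECIRC_NONE". */
    out->recirc = out->commit || !out->snat;
    return true;
}
```

Under these assumptions, "ct_snat;" is the only accepted form that leaves
'recirc' false, which is why the logical flows in this patch follow it with
an explicit "next;".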
diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
index 1983812..c237604 100644
--- a/ovn/northd/ovn-northd.8.xml
+++ b/ovn/northd/ovn-northd.8.xml
@@ -517,11 +517,40 @@  next;
 
       <li>
         <p>
-          Reply to ARP requests.  These flows reply to ARP requests for the
-          router's own IP address.  For each router port <var>P</var> that owns
-          IP address <var>A</var> and Ethernet address <var>E</var>, a
-          priority-90 flow matches <code>inport == <var>P</var> &amp;&amp;
-          arp.op == 1 &amp;&amp; arp.tpa == <var>A</var></code> (ARP request)
+          Reply to ARP requests.
+        </p>
+
+        <p>
+          These flows reply to ARP requests for the router's own IP address.
+          For each router port <var>P</var> that owns IP address <var>A</var>
+          and Ethernet address <var>E</var>, a priority-90 flow matches
+          <code>inport == <var>P</var> &amp;&amp; arp.op == 1 &amp;&amp;
+          arp.tpa == <var>A</var></code> (ARP request) with the following
+          actions:
+        </p>
+
+        <pre>
+eth.dst = eth.src;
+eth.src = <var>E</var>;
+arp.op = 2; /* ARP reply. */
+arp.tha = arp.sha;
+arp.sha = <var>E</var>;
+arp.tpa = arp.spa;
+arp.spa = <var>A</var>;
+outport = <var>P</var>;
+inport = ""; /* Allow sending out inport. */
+output;
+        </pre>
+      </li>
+
+      <li>
+        <p>
+          These flows reply to ARP requests for the virtual IP addresses
+          configured in the router for DNAT. For a configured DNAT IP address
+          <var>A</var>, for each router port <var>P</var> with Ethernet
+          address <var>E</var>, a priority-90 flow matches
+          <code>inport == <var>P</var> &amp;&amp; arp.op == 1 &amp;&amp;
+          arp.tpa == <var>A</var></code> (ARP request)
           with the following actions:
         </p>
 
@@ -663,7 +692,62 @@  icmp4 {
       </li>
     </ul>
 
-    <h3>Ingress Table 2: IP Routing</h3>
+    <h3>Ingress Table 2: UNSNAT</h3>
+
+    <p>
+      This table handles the reverse traffic of already established
+      connections, i.e., SNAT has already been done in the egress pipeline and
+      the packet has now entered the ingress pipeline as part of a reply.  It
+      is unSNATted here.
+    </p>
+
+    <ul>
+      <li>
+        <p>
+          For each configuration in the OVN Northbound database that asks
+          to change the source IP address of a packet from <var>A</var> to
+          <var>B</var>, a priority-100 flow matches <code>ip &amp;&amp;
+          ip4.dst == <var>B</var></code> with an action
+          <code>ct_snat; next;</code>.
+        </p>
+
+        <p>
+          A priority-0 logical flow with match <code>1</code> has actions
+          <code>next;</code>.
+        </p>
+      </li>
+    </ul>
+
+    <h3>Ingress Table 3: DNAT</h3>
+
+    <p>
+      Packets enter the pipeline with a destination IP address that needs to
+      be DNATted from a virtual IP address to a real IP address.  Packets
+      in the reverse direction need to be unDNATed.
+    </p>
+    <ul>
+      <li>
+        <p>
+          For each configuration in the OVN Northbound database that asks
+          to change the destination IP address of a packet from <var>A</var> to
+          <var>B</var>, a priority-100 flow matches <code>ip &amp;&amp;
+          ip4.dst == <var>A</var></code> with an action <code>inport = "";
+          ct_dnat(<var>B</var>);</code>.
+        </p>
+
+        <p>
+          For all IP packets of a Gateway router, a priority-50 flow matches
+          <code>ip</code> with an action <code>inport = ""; ct_dnat;</code>.
+        </p>
+
+        <p>
+          A priority-0 logical flow with match <code>1</code> has actions
+          <code>next;</code>.
+        </p>
+      </li>
+    </ul>
+
+    <h3>Ingress Table 4: IP Routing</h3>
 
     <p>
       A packet that arrives at this table is an IP packet that should be routed
@@ -672,7 +756,7 @@  icmp4 {
       <code>ip4.dst</code>, the packet's final destination, unchanged) and
       advances to the next table for ARP resolution.  It also sets
       <code>reg1</code> to the IP address owned by the selected router port
-      (which is used later in table 4 as the IP source address for an ARP
+      (which is used later in table 6 as the IP source address for an ARP
       request, if needed).
     </p>
 
@@ -743,7 +827,7 @@  icmp4 {
       </li>
     </ul>
 
-    <h3>Ingress Table 3: ARP Resolution</h3>
+    <h3>Ingress Table 5: ARP Resolution</h3>
 
     <p>
       Any packet that reaches this table is an IP packet whose next-hop IP
@@ -798,7 +882,7 @@  icmp4 {
       </li>
     </ul>
 
-    <h3>Ingress Table 4: ARP Request</h3>
+    <h3>Ingress Table 6: ARP Request</h3>
 
     <p>
       In the common case where the Ethernet destination has been resolved, this
@@ -823,7 +907,7 @@  arp {
         </pre>
 
         <p>
-          (Ingress table 2 initialized <code>reg1</code> with the IP address
+          (Ingress table 4 initialized <code>reg1</code> with the IP address
           owned by <code>outport</code>.)
         </p>
 
@@ -838,7 +922,32 @@  arp {
       </li>
     </ul>
 
-    <h3>Egress Table 0: Delivery</h3>
+    <h3>Egress Table 0: SNAT</h3>
+
+    <p>
+      Packets that are configured to be SNATed get their source IP address
+      changed based on the configuration in the OVN Northbound database.
+    </p>
+    <ul>
+      <li>
+        <p>
+          For each configuration in the OVN Northbound database that asks
+          to change the source IP address of a packet from an IP address of
+          <var>A</var> or to change the source IP address of a packet that
+          belongs to network <var>A</var> to <var>B</var>, a flow matches
+          <code>ip &amp;&amp; ip4.src == <var>A</var></code> with an action
+          <code>ct_snat(<var>B</var>);</code>.  The priority of the flow
+          is calculated based on the mask of <var>A</var>, with matches
+          having larger masks getting higher priorities.
+        </p>
+        <p>
+          A priority-0 logical flow with match <code>1</code> has actions
+          <code>next;</code>.
+        </p>
+      </li>
+    </ul>
+
+    <h3>Egress Table 1: Delivery</h3>
 
     <p>
       Packets that reach this table are ready for delivery.  It contains
diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index cac0148..4683780 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -105,12 +105,15 @@  enum ovn_stage {
     /* Logical router ingress stages. */                              \
     PIPELINE_STAGE(ROUTER, IN,  ADMISSION,   0, "lr_in_admission")    \
     PIPELINE_STAGE(ROUTER, IN,  IP_INPUT,    1, "lr_in_ip_input")     \
-    PIPELINE_STAGE(ROUTER, IN,  IP_ROUTING,  2, "lr_in_ip_routing")   \
-    PIPELINE_STAGE(ROUTER, IN,  ARP_RESOLVE, 3, "lr_in_arp_resolve")  \
-    PIPELINE_STAGE(ROUTER, IN,  ARP_REQUEST, 4, "lr_in_arp_request")  \
+    PIPELINE_STAGE(ROUTER, IN,  UNSNAT,      2, "lr_in_unsnat")       \
+    PIPELINE_STAGE(ROUTER, IN,  DNAT,        3, "lr_in_dnat")         \
+    PIPELINE_STAGE(ROUTER, IN,  IP_ROUTING,  4, "lr_in_ip_routing")   \
+    PIPELINE_STAGE(ROUTER, IN,  ARP_RESOLVE, 5, "lr_in_arp_resolve")  \
+    PIPELINE_STAGE(ROUTER, IN,  ARP_REQUEST, 6, "lr_in_arp_request")  \
                                                                       \
     /* Logical router egress stages. */                               \
-    PIPELINE_STAGE(ROUTER, OUT, DELIVERY,    0, "lr_out_delivery")
+    PIPELINE_STAGE(ROUTER, OUT, SNAT,      0, "lr_out_snat")          \
+    PIPELINE_STAGE(ROUTER, OUT, DELIVERY,  1, "lr_out_delivery")
 
 #define PIPELINE_STAGE(DP_TYPE, PIPELINE, STAGE, TABLE, NAME)   \
     S_##DP_TYPE##_##PIPELINE##_##STAGE                          \
@@ -1998,6 +2001,51 @@  build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
         free(match);
         free(actions);
 
+        /* ARP handling for external IP addresses.
+         *
+         * DNAT IP addresses are external IP addresses that need ARP
+         * handling. */
+        for (int i = 0; i < op->od->nbr->n_nat; i++) {
+            const struct nbrec_nat *nat;
+
+            nat = op->od->nbr->nat[i];
+
+            if (!strcmp(nat->type, "snat")) {
+                continue;
+            }
+
+            ovs_be32 ip;
+            if (!ip_parse(nat->external_ip, &ip) || !ip) {
+                static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
+                VLOG_WARN_RL(&rl, "bad ip address %s in dnat configuration "
+                             "for router %s", nat->external_ip, op->key);
+                continue;
+            }
+
+            match = xasprintf(
+                "inport == %s && arp.tpa == "IP_FMT" && arp.op == 1",
+                op->json_key, IP_ARGS(ip));
+            actions = xasprintf(
+                "eth.dst = eth.src; "
+                "eth.src = "ETH_ADDR_FMT"; "
+                "arp.op = 2; /* ARP reply */ "
+                "arp.tha = arp.sha; "
+                "arp.sha = "ETH_ADDR_FMT"; "
+                "arp.tpa = arp.spa; "
+                "arp.spa = "IP_FMT"; "
+                "outport = %s; "
+                "inport = \"\"; /* Allow sending out inport. */ "
+                "output;",
+                ETH_ADDR_ARGS(op->mac),
+                ETH_ADDR_ARGS(op->mac),
+                IP_ARGS(ip),
+                op->json_key);
+            ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_INPUT, 90,
+                          match, actions);
+            free(match);
+            free(actions);
+        }
+
         /* Drop IP traffic to this router. */
         match = xasprintf("ip4.dst == "IP_FMT, IP_ARGS(op->ip));
         ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_INPUT, 60,
@@ -2005,6 +2053,135 @@  build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
         free(match);
     }
 
+    /* NAT in Gateway routers. */
+    HMAP_FOR_EACH (od, key_node, datapaths) {
+        if (!od->nbr) {
+            continue;
+        }
+
+        /* Packets are allowed by default. */
+        ovn_lflow_add(lflows, od, S_ROUTER_IN_UNSNAT, 0, "1", "next;");
+        ovn_lflow_add(lflows, od, S_ROUTER_OUT_SNAT, 0, "1", "next;");
+        ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 0, "1", "next;");
+
+        /* NAT rules are only valid on Gateway routers. */
+        if (!smap_get(&od->nbr->options, "chassis")) {
+            continue;
+        }
+
+        for (int i = 0; i < od->nbr->n_nat; i++) {
+            const struct nbrec_nat *nat;
+
+            nat = od->nbr->nat[i];
+
+            ovs_be32 ip, mask;
+
+            char *error = ip_parse_masked(nat->external_ip, &ip, &mask);
+            if (error || mask != OVS_BE32_MAX) {
+                static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
+                VLOG_WARN_RL(&rl, "bad external ip %s for nat",
+                             nat->external_ip);
+                free(error);
+                continue;
+            }
+
+            /* Check the validity of nat->logical_ip. 'logical_ip' can
+             * be a subnet when the type is "snat". */
+            error = ip_parse_masked(nat->logical_ip, &ip, &mask);
+            if (!strcmp(nat->type, "snat")) {
+                if (error) {
+                    static struct vlog_rate_limit rl =
+                        VLOG_RATE_LIMIT_INIT(5, 1);
+                    VLOG_WARN_RL(&rl, "bad ip network or ip %s for snat "
+                                 "in router "UUID_FMT"",
+                                 nat->logical_ip, UUID_ARGS(&od->key));
+                    free(error);
+                    continue;
+                }
+            } else {
+                if (error || mask != OVS_BE32_MAX) {
+                    static struct vlog_rate_limit rl =
+                        VLOG_RATE_LIMIT_INIT(5, 1);
+                    VLOG_WARN_RL(&rl, "bad ip %s for dnat in router "
+                        ""UUID_FMT"", nat->logical_ip, UUID_ARGS(&od->key));
+                    free(error);
+                    continue;
+                }
+            }
+
+
+            char *match, *actions;
+
+            /* Ingress UNSNAT table: It is for already established connections'
+             * reverse traffic. i.e., SNAT has already been done in egress
+             * pipeline and now the packet has entered the ingress pipeline as
+             * part of a reply. We undo the SNAT here.
+             *
+             * Undoing SNAT has to happen before DNAT processing.  This is
+             * because when the packet was DNATed in ingress pipeline, it did
+             * not know about the possibility of eventual additional SNAT in
+             * egress pipeline. */
+            if (!strcmp(nat->type, "snat")
+                || !strcmp(nat->type, "dnat_and_snat")) {
+                match = xasprintf("ip && ip4.dst == %s", nat->external_ip);
+                ovn_lflow_add(lflows, od, S_ROUTER_IN_UNSNAT, 100,
+                              match, "ct_snat; next;");
+                free(match);
+            }
+
+            /* Ingress DNAT table: Packets enter the pipeline with destination
+             * IP address that needs to be DNATted from an external IP address
+             * to a logical IP address. */
+            if (!strcmp(nat->type, "dnat")
+                || !strcmp(nat->type, "dnat_and_snat")) {
+                /* Packet when it goes from the initiator to destination.
+                 * We need to zero the inport because the router can
+                 * send the packet back through the same interface. */
+                match = xasprintf("ip && ip4.dst == %s", nat->external_ip);
+                actions = xasprintf("inport = \"\"; ct_dnat(%s);",
+                                    nat->logical_ip);
+                ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 100,
+                           match, actions);
+                free(match);
+                free(actions);
+            }
+
+            /* Egress SNAT table: Packets enter the egress pipeline with
+             * source IP address that needs to be SNATted to an external IP
+             * address. */
+            if (!strcmp(nat->type, "snat")
+                || !strcmp(nat->type, "dnat_and_snat")) {
+                match = xasprintf("ip && ip4.src == %s", nat->logical_ip);
+                actions = xasprintf("ct_snat(%s);", nat->external_ip);
+
+                /* The priority here is calculated such that the
+                 * nat->logical_ip with the longest mask gets a higher
+                 * priority. */
+                ovn_lflow_add(lflows, od, S_ROUTER_OUT_SNAT,
+                              count_1bits(ntohl(mask)) + 1, match, actions);
+                free(match);
+                free(actions);
+            }
+        }
+
+        /* Re-circulate every packet through the DNAT zone.
+         * This helps with two things.
+         *
+         * 1. Any packet that needs to be unDNATed in the reverse
+         * direction gets unDNATed. Ideally this could be done in
+         * the egress pipeline. But since the gateway router
+         * does not have any feature that depends on the source
+         * ip address being external IP address for IP routing,
+         * we can do it here, saving a future re-circulation.
+         *
+         * 2. Any packet that was sent through SNAT zone in the
+         * previous table automatically gets re-circulated to get
+         * back the new destination IP address that is needed for
+         * routing in the openflow pipeline. */
+        ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 50,
+                      "ip", "inport = \"\"; ct_dnat;");
+    }
+
     /* Logical router ingress table 2: IP Routing.
      *
      * A packet that arrives at this table is an IP packet that should be
@@ -2205,7 +2382,7 @@  build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
         ovn_lflow_add(lflows, od, S_ROUTER_IN_ARP_REQUEST, 0, "1", "output;");
     }
 
-    /* Logical router egress table 0: Delivery (priority 100).
+    /* Logical router egress table 1: Delivery (priority 100).
      *
      * Priority 100 rules deliver packets to enabled logical ports. */
     HMAP_FOR_EACH (op, key_node, ports) {
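
The egress SNAT flows in the hunk above get their priority from
count_1bits(ntohl(mask)) + 1, so a rule with a longer prefix wins over a
shorter one.  The sketch below reproduces that arithmetic in isolation.
popcount32() is a stand-in for OVS's count_1bits(), and prefix_to_mask_be()
and snat_flow_priority() are hypothetical helpers named only for this
example.  One plausible reading of the "+ 1" is that it keeps even a /0 rule
above the priority-0 default "next;" flow, but the patch does not state this.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

/* Stand-in for OVS's count_1bits(): number of set bits in 'x'. */
int
popcount32(uint32_t x)
{
    int n = 0;

    while (x) {
        x &= x - 1;     /* Clear the lowest set bit. */
        n++;
    }
    return n;
}

/* Converts a prefix length (0..32) to a netmask in network byte order. */
uint32_t
prefix_to_mask_be(int prefix_len)
{
    assert(prefix_len >= 0 && prefix_len <= 32);
    return prefix_len
           ? htonl((uint32_t) (UINT32_MAX << (32 - prefix_len)))
           : 0;
}

/* Priority of the egress SNAT flow for a logical_ip with netmask 'mask_be'
 * (network byte order), following the expression used in the patch. */
int
snat_flow_priority(uint32_t mask_be)
{
    return popcount32(ntohl(mask_be)) + 1;
}
```

With these helpers, a 192.168.1.0/24 rule would get priority 25 and a single
host address (a /32 mask) priority 33, so the host rule is matched first.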
diff --git a/ovn/ovn-nb.ovsschema b/ovn/ovn-nb.ovsschema
index fa21b30..ac6ca14 100644
--- a/ovn/ovn-nb.ovsschema
+++ b/ovn/ovn-nb.ovsschema
@@ -1,7 +1,7 @@ 
 {
     "name": "OVN_Northbound",
-    "version": "2.1.2",
-    "cksum": "429668869 5325",
+    "version": "2.1.3",
+    "cksum": "3631923697 6121",
     "tables": {
         "Logical_Switch": {
             "columns": {
@@ -78,6 +78,11 @@ 
                                    "max": "unlimited"}},
                 "default_gw": {"type": {"key": "string", "min": 0, "max": 1}},
                 "enabled": {"type": {"key": "boolean", "min": 0, "max": 1}},
+                "nat": {"type": {"key": {"type": "uuid",
+                                         "refTable": "NAT",
+                                         "refType": "strong"},
+                                 "min": 0,
+                                 "max": "unlimited"}},
                 "options": {
                      "type": {"key": "string",
                               "value": "string",
@@ -104,6 +109,16 @@ 
                 "ip_prefix": {"type": "string"},
                 "nexthop": {"type": "string"},
                 "output_port": {"type": {"key": "string", "min": 0, "max": 1}}},
+            "isRoot": false},
+        "NAT": {
+            "columns": {
+                "external_ip": {"type": "string"},
+                "logical_ip": {"type": "string"},
+                "type": {"type": {"key": {"type": "string",
+                                           "enum": ["set", ["dnat",
+                                                             "snat",
+                                                             "dnat_and_snat"
+                                                               ]]}}}},
             "isRoot": false}
     }
 }
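
The schema stores external_ip and logical_ip as plain strings, so the
constraints on them are enforced later, in build_lrouter_flows().  The
following sketch restates those checks as a standalone function: external_ip
must be a single IPv4 address, and logical_ip may be a network only when the
type is "snat".  parse_ipv4_masked() and nat_rule_is_valid() are hypothetical
names for this illustration.  Unlike ip_parse_masked() in OVS, the sketch
accepts only the "a.b.c.d" and "a.b.c.d/len" forms, and it also checks the
type string, which the real code leaves to the schema's enum.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Parses "a.b.c.d" or "a.b.c.d/len".  On success stores the address in
 * '*ip' and the prefix length (32 when omitted) in '*plen'. */
bool
parse_ipv4_masked(const char *s, struct in_addr *ip, int *plen)
{
    char buf[INET_ADDRSTRLEN];
    const char *slash;
    size_t len;

    assert(s && ip && plen);
    slash = strchr(s, '/');
    len = slash ? (size_t) (slash - s) : strlen(s);
    if (len == 0 || len >= sizeof buf) {
        return false;
    }
    memcpy(buf, s, len);
    buf[len] = '\0';
    if (inet_pton(AF_INET, buf, ip) != 1) {
        return false;
    }

    *plen = 32;
    if (slash) {
        char *end;
        long v = strtol(slash + 1, &end, 10);

        if (end == slash + 1 || *end != '\0' || v < 0 || v > 32) {
            return false;
        }
        *plen = (int) v;
    }
    return true;
}

/* Returns true if a NAT row with these column values would pass the checks
 * sketched here. */
bool
nat_rule_is_valid(const char *type, const char *external_ip,
                  const char *logical_ip)
{
    struct in_addr ip;
    int plen;
    bool is_snat = !strcmp(type, "snat");

    if (!is_snat && strcmp(type, "dnat") && strcmp(type, "dnat_and_snat")) {
        return false;
    }
    /* external_ip must always be a host address. */
    if (!parse_ipv4_masked(external_ip, &ip, &plen) || plen != 32) {
        return false;
    }
    /* logical_ip may be a network only for "snat". */
    if (!parse_ipv4_masked(logical_ip, &ip, &plen)) {
        return false;
    }
    return is_snat || plen == 32;
}
```

For instance, an "snat" rule from 192.168.2.0/24 to 30.0.0.1 passes, while a
"dnat" rule whose logical_ip is 192.168.1.0/24 is rejected.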
diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml
index 41092f1..93ad305 100644
--- a/ovn/ovn-nb.xml
+++ b/ovn/ovn-nb.xml
@@ -631,18 +631,31 @@ 
       router has all ingress and egress traffic dropped.
     </column>
 
+    <column name="nat">
+      One or more NAT rules for the router. NAT rules only work on the
+      Gateway routers.
+    </column>
+
     <group title="Options">
       <p>
         Additional options for the logical router.
       </p>
 
       <column name="options" key="chassis">
-        If set, indicates that the logical router in question is
-        a Gateway router (which is centralized) and resides in the set
-        chassis.  The same value is also used by <code>ovn-controller</code>
-        to uniquely identify the chassis in the OVN deployment and
-        comes from <code>external_ids:system-id</code> in the
-        <code>Open_vSwitch</code> table of Open_vSwitch database.
+        <p>
+          If set, indicates that the logical router in question is a Gateway
+          router (which is centralized) and resides in the set chassis.  The
+          same value is also used by <code>ovn-controller</code> to
+          uniquely identify the chassis in the OVN deployment and
+          comes from <code>external_ids:system-id</code> in the
+          <code>Open_vSwitch</code> table of Open_vSwitch database.
+        </p>
+
+        <p>
+          The Gateway router can only be connected to a distributed router
+          via a switch if SNAT and DNAT are to be configured in the Gateway
+          router.
+        </p>
       </column>
     </group>
     
@@ -765,4 +778,44 @@ 
     </column>
   </table>
 
+  <table name="NAT" title="NAT rules for a Gateway router.">
+    <p>
+      Each record represents a NAT rule in a Gateway router.
+    </p>
+
+    <column name="type">
+      <p>Type of the NAT rule.</p>
+      <ul>
+        <li>
+          When <ref column="type"/> is <code>dnat</code>, the externally
+          visible IP address <ref column="external_ip"/> is DNATted to the IP
+          address <ref column="logical_ip"/> in the logical space.
+        </li>
+        <li>
+          When <ref column="type"/> is <code>snat</code>, IP packets
+          whose source IP address either matches the IP address
+          in <ref column="logical_ip"/> or is in the network provided by
+          <ref column="logical_ip"/> are SNATed to the IP address in
+          <ref column="external_ip"/>.
+        </li>
+        <li>
+          When <ref column="type"/> is <code>dnat_and_snat</code>, the
+          externally visible IP address <ref column="external_ip"/> is
+          DNATted to the IP address <ref column="logical_ip"/> in the
+          logical space. In addition, IP packets whose source IP
+          address matches <ref column="logical_ip"/> are SNATed to
+          the IP address in <ref column="external_ip"/>.
+        </li>
+      </ul>
+    </column>
+
+    <column name="external_ip">
+      An IPv4 address.
+    </column>
+
+    <column name="logical_ip">
+      An IPv4 network (e.g. 192.168.1.0/24) or an IPv4 address.
+    </column>
+  </table>
+
 </database>
diff --git a/ovn/ovn-sb.xml b/ovn/ovn-sb.xml
index 1231b4e..5665871 100644
--- a/ovn/ovn-sb.xml
+++ b/ovn/ovn-sb.xml
@@ -951,6 +951,47 @@ 
           </p>
         </dd>
 
+        <dt><code>ct_dnat;</code></dt>
+        <dt><code>ct_dnat(<var>IP</var>);</code></dt>
+        <dd>
+          <p>
+            <code>ct_dnat</code> sends the packet through the DNAT zone in
+            the connection tracking table to unDNAT any packet that was DNATed
+            in the opposite direction.  The packet is then automatically sent
+            to the next tables as if followed by a <code>next;</code> action.
+            The next tables will see the changes in the packet caused by
+            the connection tracker.
+          </p>
+          <p>
+            <code>ct_dnat(<var>IP</var>)</code> sends the packet through the
+            DNAT zone to change the destination IP address of the packet to
+            the one provided inside the parentheses and commits the connection.
+            The packet is then automatically sent to the next tables as if
+            followed by a <code>next;</code> action.  The next tables will see
+            the changes in the packet caused by the connection tracker.
+          </p>
+        </dd>
+
+        <dt><code>ct_snat;</code></dt>
+        <dt><code>ct_snat(<var>IP</var>);</code></dt>
+        <dd>
+          <p>
+            <code>ct_snat</code> sends the packet through the SNAT zone to
+            unSNAT any packet that was SNATed in the opposite direction.  If
+            the packet needs to be sent to the next tables, then it should be
+            followed by a <code>next;</code> action.  The next tables will not
+            see the changes in the packet caused by the connection tracker.
+          </p>
+          <p>
+            <code>ct_snat(<var>IP</var>)</code> sends the packet through the
+            SNAT zone to change the source IP address of the packet to
+            the one provided inside the parentheses and commits the connection.
+            The packet is then automatically sent to the next tables as if
+            followed by a <code>next;</code> action.  The next tables will see
+            the changes in the packet caused by the connection tracker.
+          </p>
+        </dd>
+
         <dt><code>arp { <var>action</var>; </code>...<code> };</code></dt>
         <dd>
           <p>
diff --git a/ovn/utilities/ovn-nbctl.c b/ovn/utilities/ovn-nbctl.c
index 321040e..b821307 100644
--- a/ovn/utilities/ovn-nbctl.c
+++ b/ovn/utilities/ovn-nbctl.c
@@ -1449,6 +1449,11 @@  static const struct ctl_table_class tables[] = {
        NULL},
       {NULL, NULL, NULL}}},
 
+    {&nbrec_table_nat,
+     {{&nbrec_table_nat, NULL,
+       NULL},
+      {NULL, NULL, NULL}}},
+
     {NULL, {{NULL, NULL, NULL}, {NULL, NULL, NULL}}}
 };
 
diff --git a/tests/ovn.at b/tests/ovn.at
index 633cf35..19d5c73 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -507,6 +507,23 @@  ip.ttl => Syntax error at end of input expecting `--'.
 ct_next; => actions=ct(table=27,zone=NXM_NX_REG5[0..15]), prereqs=ip
 ct_commit; => actions=ct(commit,zone=NXM_NX_REG5[0..15]), prereqs=ip
 
+# dnat
+ct_dnat; => actions=ct(table=27,zone=NXM_NX_REG3[0..15],nat), prereqs=ip
+ct_dnat(192.168.1.2); => actions=ct(commit,table=27,zone=NXM_NX_REG3[0..15],nat(dst=192.168.1.2)), prereqs=ip
+ct_dnat(192.168.1.2, 192.168.1.3); => Syntax error at `,' expecting `)'.
+ct_dnat(foo); => Syntax error at `foo' invalid ip.
+ct_dnat(foo, bar); => Syntax error at `foo' invalid ip.
+ct_dnat(); => Syntax error at `)' invalid ip.
+
+# snat
+ct_snat; => actions=ct(zone=NXM_NX_REG4[0..15],nat), prereqs=ip
+ct_snat(192.168.1.2); => actions=ct(commit,table=27,zone=NXM_NX_REG4[0..15],nat(src=192.168.1.2)), prereqs=ip
+ct_snat(192.168.1.2, 192.168.1.3); => Syntax error at `,' expecting `)'.
+ct_snat(foo); => Syntax error at `foo' invalid ip.
+ct_snat(foo, bar); => Syntax error at `foo' invalid ip.
+ct_snat(); => Syntax error at `)' invalid ip.
+
+
 # arp
 arp { eth.dst = ff:ff:ff:ff:ff:ff; output; }; => actions=controller(userdata=00.00.00.00.00.00.00.00.00.19.00.10.80.00.06.06.ff.ff.ff.ff.ff.ff.00.00.ff.ff.00.10.00.00.23.20.00.0e.ff.f8.40.00.00.00), prereqs=ip4