[ovs-dev,3/3] ovn-northd: SNAT in either direction of gateway router.
diff mbox

Message ID 1478170006-15289-3-git-send-email-guru@ovn.org
State Superseded
Headers show

Commit Message

Guru Shetty Nov. 3, 2016, 10:46 a.m. UTC
When multiple gateway routers exist, a packet can
enter any gateway router. Once the packet reaches its
destination, its reverse direction should be via the
same gateway router.  This is achieved by doing a SNAT
of the packet in the inward direction (towards logical space)
with a IP address of the gateway router such that packet travels back
to the same gateway router.

The above means that you can have SNAT rules in the NB
database for both directions.  For e.g. if the logical
ip address are of the range 192.168.1.0/24, you will have
one SNAT rule to transform packet from 192.168.1.0/24 to
an external_ip and another SNAT rule to transform
"0.0.0.0/0" (all external initiated traffic) to a gateway_ip.

For a particular connection, we should do SNAT in only one
direction.  And to do that in the pipeline, we check whether a packet
has already been SNATted and if it has a transformation, we should not
do it again.

Signed-off-by: Gurucharan Shetty <guru@ovn.org>
---
 ovn/lib/actions.c           |  8 --------
 ovn/lib/logical-fields.c    |  2 ++
 ovn/lib/logical-fields.h    |  4 ++++
 ovn/northd/ovn-northd.8.xml | 49 +++++++++++++++++++++++++++++++++++----------
 ovn/northd/ovn-northd.c     | 31 ++++++++++++++++------------
 ovn/ovn-sb.xml              |  8 ++++----
 tests/ovn.at                |  2 +-
 7 files changed, 67 insertions(+), 37 deletions(-)

Comments

Mickey Spiegel Nov. 3, 2016, 11:34 p.m. UTC | #1
Interesting problem. See comments inline.

On Thu, Nov 3, 2016 at 3:46 AM, Gurucharan Shetty <guru@ovn.org> wrote:

> When multiple gateway routers exist, a packet can
> enter any gateway router. Once the packet reaches its
> destination, its reverse direction should be via the
> same gateway router.  This is achieved by doing a SNAT
> of the packet in the inward direction (towards logical space)
> with a IP address of the gateway router such that packet travels back
> to the same gateway router.
>
> The above means that you can have SNAT rules in the NB
> database for both directions.  For e.g. if the logical
> ip address are of the range 192.168.1.0/24, you will have
> one SNAT rule to transform packet from 192.168.1.0/24 to
> an external_ip and another SNAT rule to transform
> "0.0.0.0/0" (all external initiated traffic) to a gateway_ip.
>
> For a particular connection, we should do SNAT in only one
> direction.


In certain deployment scenarios, including the one you have
mentioned, this might work. However, I wonder if this is really
addressing the root of the problem, or latching on to a symptom
specific to this deployment scenario.

It seems to me that the root of the problem has to do with
three issues:
1. SNAT (and DNAT) rules should not apply to ct.rpl traffic,
   instead only UNSNAT (and UNDNAT) rules should apply.
   Conntrack can tell which traffic is reply traffic, but this would
   require going through conntrack with recirc prior to SNAT.
2. If a stateful action such as DNAT or LB is taken on a
   gateway router, such that it is necessary for the reverse
   packet flow to come back to the same gateway router,
   then there should be an SNAT action coupled with the
   DNAT or LB action. If we could implement such composite
   actions, then there would be no need for a general purpose
   SNAT 0.0.0.0/0 catch all. Perhaps instead of SNAT
   0.0.0.0/0, DNAT or LB could set a flag that indicates that
   SNAT should be applied?
3. Currently your gateway router has no sense of direction.
   NAT rules are applied on all interfaces. In the scenario
   described above, you really want the 192.168.1.0/24 SNAT
   rule to apply to outward connections, while you want the
   0.0.0.0/0 SNAT rule to apply to inward connections. Since
   the gateway router has no sense of direction, both rules
   are applied in both directions. Is there a way to introduce
   a sense of direction to a gateway router?

And to do that in the pipeline, we check whether a packet
> has already been SNATted and if it has a transformation, we should not
> do it again.
>

I think this is a roundabout way of implementing 1 above. If all
gateway router traffic is subject to SNAT, then reply traffic will
always hit UNSNAT and so avoid the SNAT stage.

The question is whether a solution that requires all traffic to
be subject to SNAT is acceptable, for deployment scenarios
where SNAT rules have coarse enough IP address prefixes
to catch traffic in both directions?

Mickey
Guru Shetty Nov. 4, 2016, 1:06 a.m. UTC | #2
> On Nov 3, 2016, at 4:34 PM, Mickey Spiegel <mickeys.dev@gmail.com> wrote:
> 
> Interesting problem. See comments inline.
> 
>> On Thu, Nov 3, 2016 at 3:46 AM, Gurucharan Shetty <guru@ovn.org> wrote:
>> 
>> When multiple gateway routers exist, a packet can
>> enter any gateway router. Once the packet reaches its
>> destination, its reverse direction should be via the
>> same gateway router.  This is achieved by doing a SNAT
>> of the packet in the inward direction (towards logical space)
>> with a IP address of the gateway router such that packet travels back
>> to the same gateway router.
>> 
>> The above means that you can have SNAT rules in the NB
>> database for both directions.  For e.g. if the logical
>> ip address are of the range 192.168.1.0/24, you will have
>> one SNAT rule to transform packet from 192.168.1.0/24 to
>> an external_ip and another SNAT rule to transform
>> "0.0.0.0/0" (all external initiated traffic) to a gateway_ip.
>> 
>> For a particular connection, we should do SNAT in only one
>> direction.
> 
> 
> In certain deployment scenarios, including the one you have
> mentioned, this might work. However, I wonder if this is really
> addressing the root of the problem, or latching on to a symptom
> specific to this deployment scenario.
> 
> It seems to me that the root of the problem has to do with
> three issues:
> 1. SNAT (and DNAT) rules should not apply to ct.rpl traffic,
>   instead only UNSNAT (and UNDNAT) rules should apply.
>   Conntrack can tell which traffic is reply traffic, but this would
>   require going through conntrack with recirc prior to SNAT.

Correct

> 2. If a stateful action such as DNAT or LB is taken on a
>   gateway router, such that it is necessary for the reverse
>   packet flow to come back to the same gateway router,
>   then there should be an SNAT action coupled with the
>   DNAT or LB action. If we could implement such composite
>   actions, then there would be no need for a general purpose
>   SNAT 0.0.0.0/0 catch all. Perhaps instead of SNAT
>   0.0.0.0/0, DNAT or LB could set a flag that indicates that
>   SNAT should be applied?

That is one option. But I could not think of a nice way to express it in nb database. The other option is to set it as an option in the gateway router itself.

> 3. Currently your gateway router has no sense of direction.
>   NAT rules are applied on all interfaces. In the scenario
>   described above, you really want the 192.168.1.0/24 SNAT
>   rule to apply to outward connections, while you want the
>   0.0.0.0/0 SNAT rule to apply to inward connections. Since
>   the gateway router has no sense of direction, both rules
>   are applied in both directions. Is there a way to introduce
>   a sense of direction to a gateway router?

It complicates deployment a bit. But can be given more thought.

> 
> And to do that in the pipeline, we check whether a packet
>> has already been SNATted and if it has a transformation, we should not
>> do it again.
> 
> I think this is a roundabout way of implementing 1 above. If all
> gateway router traffic is subject to SNAT, then reply traffic will
> always hit UNSNAT and so avoid the SNAT stage.

Right.

> 
> The question is whether a solution that requires all traffic to
> be subject to SNAT is acceptable, for deployment scenarios
> where SNAT rules have coarse enough IP address prefixes
> to catch traffic in both directions?

I did not understand the above point. Can you elaborate a bit.

> 
> Mickey
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev
Mickey Spiegel Nov. 4, 2016, 2:16 a.m. UTC | #3
See reply at the bottom.

On Thu, Nov 3, 2016 at 6:06 PM, Guru Shetty <guru.ovn@gmail.com> wrote:

<snip>

> It seems to me that the root of the problem has to do with
> > three issues:
> > 1. SNAT (and DNAT) rules should not apply to ct.rpl traffic,
> >   instead only UNSNAT (and UNDNAT) rules should apply.
> >   Conntrack can tell which traffic is reply traffic, but this would
> >   require going through conntrack with recirc prior to SNAT.
>
> Correct
>
> <snip>
>


> > And to do that in the pipeline, we check whether a packet
> >> has already been SNATted and if it has a transformation, we should not
> >> do it again.
> >
> > I think this is a roundabout way of implementing 1 above. If all
> > gateway router traffic is subject to SNAT, then reply traffic will
> > always hit UNSNAT and so avoid the SNAT stage.
>
> Right.
>
> >
> > The question is whether a solution that requires all traffic to
> > be subject to SNAT is acceptable, for deployment scenarios
> > where SNAT rules have coarse enough IP address prefixes
> > to catch traffic in both directions?
>
> I did not understand the above point. Can you elaborate a bit.
>

This approach imposes symmetry in terms of which traffic is
subject to SNAT, when an SNAT rule is coarse enough to match
source IP addresses in both directions, for example 0.0.0.0/0.
If you wanted SNAT imposed for 0.0.0.0/0 in the inward direction,
then you will have SNAT imposed for 0.0.0.0/0 in the outward
direction, with no exceptions except for SNAT longest match
overrides. I guess that is why you suggested the 'nosnat'
second patch?

Mickey


> >
> > Mickey
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > http://openvswitch.org/mailman/listinfo/dev
>
Guru Shetty Nov. 4, 2016, 2:29 a.m. UTC | #4
> On Nov 3, 2016, at 7:16 PM, Mickey Spiegel <mickeys.dev@gmail.com> wrote:
> 
> See reply at the bottom.
> 
>> On Thu, Nov 3, 2016 at 6:06 PM, Guru Shetty <guru.ovn@gmail.com> wrote:
>> 
>> <snip>
>> 
>> > It seems to me that the root of the problem has to do with
>> > three issues:
>> > 1. SNAT (and DNAT) rules should not apply to ct.rpl traffic,
>> >   instead only UNSNAT (and UNDNAT) rules should apply.
>> >   Conntrack can tell which traffic is reply traffic, but this would
>> >   require going through conntrack with recirc prior to SNAT.
>> 
>> Correct
>> 
>> <snip>
>  
>> > And to do that in the pipeline, we check whether a packet
>> >> has already been SNATted and if it has a transformation, we should not
>> >> do it again.
>> >
>> > I think this is a roundabout way of implementing 1 above. If all
>> > gateway router traffic is subject to SNAT, then reply traffic will
>> > always hit UNSNAT and so avoid the SNAT stage.
>> 
>> Right.
>> 
>> >
>> > The question is whether a solution that requires all traffic to
>> > be subject to SNAT is acceptable, for deployment scenarios
>> > where SNAT rules have coarse enough IP address prefixes
>> > to catch traffic in both directions?
>> 
>> I did not understand the above point. Can you elaborate a bit.
> 
> This approach imposes symmetry in terms of which traffic is
> subject to SNAT, when an SNAT rule is coarse enough to match
> source IP addresses in both directions, for example 0.0.0.0/0.
> If you wanted SNAT imposed for 0.0.0.0/0 in the inward direction,
> then you will have SNAT imposed for 0.0.0.0/0 in the outward
> direction, with no exceptions except for SNAT longest match
> overrides. I guess that is why you suggested the 'nosnat'
> second patch?

Thanks for the explanation. You are right about the situation. Do you have any design suggestions to solve this overall problem in a easy way?

> 
> Mickey
> 
>> 
>> >
>> > Mickey
>> > _______________________________________________
>> > dev mailing list
>> > dev@openvswitch.org
>> > http://openvswitch.org/mailman/listinfo/dev
>
Mickey Spiegel Nov. 4, 2016, 3:42 a.m. UTC | #5
On Thu, Nov 3, 2016 at 6:06 PM, Guru Shetty <guru.ovn@gmail.com> wrote:

<snip>


> > 2. If a stateful action such as DNAT or LB is taken on a
> >   gateway router, such that it is necessary for the reverse
> >   packet flow to come back to the same gateway router,
> >   then there should be an SNAT action coupled with the
> >   DNAT or LB action. If we could implement such composite
> >   actions, then there would be no need for a general purpose
> >   SNAT 0.0.0.0/0 catch all. Perhaps instead of SNAT
> >   0.0.0.0/0, DNAT or LB could set a flag that indicates that
> >   SNAT should be applied?
>
> That is one option. But I could not think of a nice way to express it in
> nb database. The other option is to set it as an option in the gateway
> router itself.


Coming back to this option because IMO it is the most straightforward in
terms of understanding what the knobs do, even if figuring out what knobs
to provide is not so straightforward.

For the trigger, I am somewhat torn between putting it in the DNAT and LB
rules, versus a gateway router option. I can think of cases where it is
driven by topology (your multiple gateway case), as well as cases where it
is driven by application (whether direct server return works with the
application or not). If the trigger goes in the DNAT and LB rules, then a
force_snat column of type boolean in the NAT and Load_Balancer tables would
take care of the trigger.

The part that is not as obvious is where to specify the IP address that
replaces the packet's source IP address. I would lean towards specifying
this separately from the existing NAT table, partly because the terminology
of the existing NAT table (logical_ip/external_ip) does not match the use
case, and partly because my somewhat hazy memory of hardware load balancer
configuration involves specification of a reference to a NAT pool, i.e. it
is not a separate SNAT rule specification but just the extra bit of
specification necessary to make the SNAT part work. If my memory is right,
that involved a relatively complex reference from the DNAT or LB rule to a
NAT pool. The equivalent of your SNAT 0.0.0.0/0 rule would be just a single
SNAT IP address on the gateway router for all force_snat cases, rather than
a reference to one of many IP addresses or IP address pools.

As long as the trigger is only needed for DNAT and LB in the gateway router
itself, a force_snat flag should work. If you want a trigger for other
cases such as stateful services in a switch upstream from the gateway
router, at first glance it does not seem like this option can handle such a
case. You would need interface related configuration in order to identify
the subset of traffic, and some pipeline stage with conntrack so that you
could weed out ct.rpl traffic.

Mickey
Guru Shetty Nov. 4, 2016, 4:54 p.m. UTC | #6
On 3 November 2016 at 20:42, Mickey Spiegel <mickeys.dev@gmail.com> wrote:

> On Thu, Nov 3, 2016 at 6:06 PM, Guru Shetty <guru.ovn@gmail.com> wrote:
>
> <snip>
>
>
>> > 2. If a stateful action such as DNAT or LB is taken on a
>> >   gateway router, such that it is necessary for the reverse
>> >   packet flow to come back to the same gateway router,
>> >   then there should be an SNAT action coupled with the
>> >   DNAT or LB action. If we could implement such composite
>> >   actions, then there would be no need for a general purpose
>> >   SNAT 0.0.0.0/0 catch all. Perhaps instead of SNAT
>> >   0.0.0.0/0, DNAT or LB could set a flag that indicates that
>> >   SNAT should be applied?
>>
>> That is one option. But I could not think of a nice way to express it in
>> nb database. The other option is to set it as an option in the gateway
>> router itself.
>
>
> Coming back to this option because IMO it is the most straightforward in
> terms of understanding what the knobs do, even if figuring out what knobs
> to provide is not so straightforward.
>
> For the trigger, I am somewhat torn between putting it in the DNAT and LB
> rules, versus a gateway router option. I can think of cases where it is
> driven by topology (your multiple gateway case), as well as cases where it
> is driven by application (whether direct server return works with the
> application or not). If the trigger goes in the DNAT and LB rules, then a
> force_snat column of type boolean in the NAT and Load_Balancer tables would
> take care of the trigger.
>
> The part that is not as obvious is where to specify the IP address that
> replaces the packet's source IP address. I would lean towards specifying
> this separately from the existing NAT table, partly because the terminology
> of the existing NAT table (logical_ip/external_ip) does not match the use
> case, and partly because my somewhat hazy memory of hardware load balancer
> configuration involves specification of a reference to a NAT pool, i.e. it
> is not a separate SNAT rule specification but just the extra bit of
> specification necessary to make the SNAT part work. If my memory is right,
> that involved a relatively complex reference from the DNAT or LB rule to a
> NAT pool. The equivalent of your SNAT 0.0.0.0/0 rule would be just a
> single SNAT IP address on the gateway router for all force_snat cases,
> rather than a reference to one of many IP addresses or IP address pools.
>
> As long as the trigger is only needed for DNAT and LB in the gateway
> router itself, a force_snat flag should work. If you want a trigger for
> other cases such as stateful services in a switch upstream from the gateway
> router, at first glance it does not seem like this option can handle such a
> case. You would need interface related configuration in order to identify
> the subset of traffic, and some pipeline stage with conntrack so that you
> could weed out ct.rpl traffic.
>

Thanks. I think the trigger is only needed for LB and DNAT use cases. I am
thinking of going with the following approach:

* options:force-snat="$IP" in the gateway router.

Whenever there is a DNAT or LB done on the gateway router, you get a force
snat done with the set IP address.  If application specific use cases come
up as we go forward, then it is a matter of providing the same knob in LB
and NAT tables too - which can be designed better when the use cases are
more obvious.




> Mickey
>
Mickey Spiegel Nov. 4, 2016, 5:27 p.m. UTC | #7
Forgot to copy the list on the last reply, but also realized something and
asking for one change below.

On Fri, Nov 4, 2016 at 9:54 AM, Guru Shetty <guru@ovn.org> wrote:

>
>
> On 3 November 2016 at 20:42, Mickey Spiegel <mickeys.dev@gmail.com> wrote:
>
>> On Thu, Nov 3, 2016 at 6:06 PM, Guru Shetty <guru.ovn@gmail.com> wrote:
>>
>> <snip>
>>
>>
>>> > 2. If a stateful action such as DNAT or LB is taken on a
>>> >   gateway router, such that it is necessary for the reverse
>>> >   packet flow to come back to the same gateway router,
>>> >   then there should be an SNAT action coupled with the
>>> >   DNAT or LB action. If we could implement such composite
>>> >   actions, then there would be no need for a general purpose
>>> >   SNAT 0.0.0.0/0 catch all. Perhaps instead of SNAT
>>> >   0.0.0.0/0, DNAT or LB could set a flag that indicates that
>>> >   SNAT should be applied?
>>>
>>> That is one option. But I could not think of a nice way to express it in
>>> nb database. The other option is to set it as an option in the gateway
>>> router itself.
>>
>>
>> Coming back to this option because IMO it is the most straightforward in
>> terms of understanding what the knobs do, even if figuring out what knobs
>> to provide is not so straightforward.
>>
>> For the trigger, I am somewhat torn between putting it in the DNAT and LB
>> rules, versus a gateway router option. I can think of cases where it is
>> driven by topology (your multiple gateway case), as well as cases where it
>> is driven by application (whether direct server return works with the
>> application or not). If the trigger goes in the DNAT and LB rules, then a
>> force_snat column of type boolean in the NAT and Load_Balancer tables would
>> take care of the trigger.
>>
>> The part that is not as obvious is where to specify the IP address that
>> replaces the packet's source IP address. I would lean towards specifying
>> this separately from the existing NAT table, partly because the terminology
>> of the existing NAT table (logical_ip/external_ip) does not match the use
>> case, and partly because my somewhat hazy memory of hardware load balancer
>> configuration involves specification of a reference to a NAT pool, i.e. it
>> is not a separate SNAT rule specification but just the extra bit of
>> specification necessary to make the SNAT part work. If my memory is right,
>> that involved a relatively complex reference from the DNAT or LB rule to a
>> NAT pool. The equivalent of your SNAT 0.0.0.0/0 rule would be just a
>> single SNAT IP address on the gateway router for all force_snat cases,
>> rather than a reference to one of many IP addresses or IP address pools.
>>
>> As long as the trigger is only needed for DNAT and LB in the gateway
>> router itself, a force_snat flag should work. If you want a trigger for
>> other cases such as stateful services in a switch upstream from the gateway
>> router, at first glance it does not seem like this option can handle such a
>> case. You would need interface related configuration in order to identify
>> the subset of traffic, and some pipeline stage with conntrack so that you
>> could weed out ct.rpl traffic.
>>
>
> Thanks. I think the trigger is only needed for LB and DNAT use cases. I am
> thinking of going with the following approach:
>
> * options:force-snat="$IP" in the gateway router.
>
> Whenever there is a DNAT or LB done on the gateway router, you get a force
> snat done with the set IP address.  If application specific use cases come
> up as we go forward, then it is a matter of providing the same knob in LB
> and NAT tables too - which can be designed better when the use cases are
> more obvious.
>

That works for me. A bit of explanation is required with regard to which
traffic is subject to force-snat, but all in all I think this is easy to
understand and avoids complex side effects.

I just realized that this is the missing piece that I need to allow
centralized load balancing on a distributed router. In that case, I need
force-snat for load balancing but not for dnat. Wondering if there should
be two options instead:
dnat-force-snat="$IP"
lb-force-snat="$IP"

That also takes care of the explanation.

Mickey

Patch
diff mbox

diff --git a/ovn/lib/actions.c b/ovn/lib/actions.c
index df526c0..7896d00 100644
--- a/ovn/lib/actions.c
+++ b/ovn/lib/actions.c
@@ -787,14 +787,6 @@  encode_ct_nat(const struct ovnact_ct_nat *cn,
     ct = ofpacts->header;
     if (cn->ip) {
         ct->flags |= NX_CT_F_COMMIT;
-    } else if (snat) {
-        /* XXX: For performance reasons, we try to prevent additional
-         * recirculations.  So far, ct_snat which is used in a gateway router
-         * does not need a recirculation. ct_snat(IP) does need a
-         * recirculation.  Should we consider a method to let the actions
-         * specify whether an action needs recirculation if there more use
-         * cases?. */
-        ct->recirc_table = NX_CT_RECIRC_NONE;
     }
     ofpact_finish(ofpacts, &ct->ofpact);
     ofpbuf_push_uninit(ofpacts, ct_offset);
diff --git a/ovn/lib/logical-fields.c b/ovn/lib/logical-fields.c
index d4578c3..27eba01 100644
--- a/ovn/lib/logical-fields.c
+++ b/ovn/lib/logical-fields.c
@@ -88,6 +88,8 @@  ovn_init_symtab(struct shash *symtab)
     char flags_str[16];
     snprintf(flags_str, sizeof flags_str, "flags[%d]", MLF_ALLOW_LOOPBACK_BIT);
     expr_symtab_add_subfield(symtab, "flags.loopback", NULL, flags_str);
+    snprintf(flags_str, sizeof flags_str, "flags[%d]", MLF_SNAT_DONE);
+    expr_symtab_add_subfield(symtab, "flags.snat", NULL, flags_str);
 
     /* Connection tracking state. */
     expr_symtab_add_field(symtab, "ct_mark", MFF_CT_MARK, NULL, false);
diff --git a/ovn/lib/logical-fields.h b/ovn/lib/logical-fields.h
index a1f1da6..8f12e06 100644
--- a/ovn/lib/logical-fields.h
+++ b/ovn/lib/logical-fields.h
@@ -47,6 +47,7 @@  void ovn_init_symtab(struct shash *symtab);
 enum mff_log_flags_bits {
     MLF_ALLOW_LOOPBACK_BIT = 0,
     MLF_RCV_FROM_VXLAN_BIT = 1,
+    MLF_SNAT_DONE_BIT = 2,
 };
 
 /* MFF_LOG_FLAGS_REG flag assignments */
@@ -59,6 +60,9 @@  enum mff_log_flags {
      * VXLAN encapsulation.  Egress port information is available for
      * Geneve and STT tunnel types. */
     MLF_RCV_FROM_VXLAN = (1 << MLF_RCV_FROM_VXLAN_BIT),
+
+    /* Indicate that a packet has already been transformed in a SNAT zone. */
+    MLF_SNAT_DONE = (1 << MLF_SNAT_DONE_BIT),
 };
 
 #endif /* ovn/lib/logical-fields.h */
diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
index b406db6..abae49c 100644
--- a/ovn/northd/ovn-northd.8.xml
+++ b/ovn/northd/ovn-northd.8.xml
@@ -1149,7 +1149,7 @@  icmp4 {
           to change the source IP address of a packet from <var>A</var> to
           <var>B</var>, a priority-100 flow matches <code>ip &amp;&amp;
           ip4.dst == <var>B</var></code> with an action
-          <code>ct_snat; next;</code>.
+          <code>ct_snat;</code>.
         </p>
 
         <p>
@@ -1159,7 +1159,34 @@  icmp4 {
       </li>
     </ul>
 
-    <h3>Ingress Table 4: DNAT</h3>
+    <h3>Ingress Table 4: POST_UNSNAT</h3>
+
+    <p>
+      This is to get the SNAT state of the packet after having sent the packet
+      through the SNAT zone in the previous table.
+    </p>
+
+    <ul>
+      <li>
+        <p>
+          A priority-100 flow that matches <code>ip &amp;&amp; ct.dnat</code>
+          with an action <code>flags.snat = 1; next;</code> indicating that
+          a UNSNAT happened in the previous table.
+        </p>
+
+        <p>
+          A priority-90 flow with match <code>ip</code>
+          with an action <code>flags.snat = 0; next;</code>.
+        </p>
+
+        <p>
+          A priority-0 logical flow with match <code>1</code> has actions
+          <code>next;</code>.
+        </p>
+      </li>
+    </ul>
+
+    <h3>Ingress Table 5: DNAT</h3>
 
     <p>
       Packets enter the pipeline with destination IP address that needs to
@@ -1208,7 +1235,7 @@  icmp4 {
       </li>
     </ul>
 
-    <h3>Ingress Table 5: IP Routing</h3>
+    <h3>Ingress Table 6: IP Routing</h3>
 
     <p>
       A packet that arrives at this table is an IP packet that should be
@@ -1219,9 +1246,9 @@  icmp4 {
       packet's final destination, unchanged) and advances to the next
       table for ARP resolution.  It also sets <code>reg1</code> (or
       <code>xxreg1</code>) to the IP address owned by the selected router
-      port (Table 7 will generate ARP request, if needed, with
-      <code>reg0</code> as the target protocol address and <code>reg1</code>
-      as the source protocol address).
+      port (Ingress table <code>ARP Request</code> will generate ARP request,
+      if needed, with <code>reg0</code> as the target protocol address and
+      <code>reg1</code> as the source protocol address).
     </p>
 
     <p>
@@ -1300,7 +1327,7 @@  next;
       </li>
     </ul>
 
-    <h3>Ingress Table 6: ARP/ND Resolution</h3>
+    <h3>Ingress Table 7: ARP/ND Resolution</h3>
 
     <p>
       Any packet that reaches this table is an IP packet whose next-hop
@@ -1382,7 +1409,7 @@  next;
       </li>
     </ul>
 
-    <h3>Ingress Table 7: ARP Request</h3>
+    <h3>Ingress Table 8: ARP Request</h3>
 
     <p>
       In the common case where the Ethernet destination has been resolved, this
@@ -1408,9 +1435,9 @@  arp {
         </pre>
 
         <p>
-          (Ingress table 4 initialized <code>reg1</code> with the IP address
-          owned by <code>outport</code> and <code>reg0</code> with the next-hop
-          IP address)
+          (Ingress table <code>IP Routing</code> initialized <code>reg1</code>
+           with the IP address owned by <code>outport</code> and
+           <code>reg0</code> with the next-hop IP address)
         </p>
 
         <p>
diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index 86504aa..72205e5 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -129,10 +129,11 @@  enum ovn_stage {
     PIPELINE_STAGE(ROUTER, IN,  IP_INPUT,    1, "lr_in_ip_input")     \
     PIPELINE_STAGE(ROUTER, IN,  DEFRAG,      2, "lr_in_defrag")       \
     PIPELINE_STAGE(ROUTER, IN,  UNSNAT,      3, "lr_in_unsnat")       \
-    PIPELINE_STAGE(ROUTER, IN,  DNAT,        4, "lr_in_dnat")         \
-    PIPELINE_STAGE(ROUTER, IN,  IP_ROUTING,  5, "lr_in_ip_routing")   \
-    PIPELINE_STAGE(ROUTER, IN,  ARP_RESOLVE, 6, "lr_in_arp_resolve")  \
-    PIPELINE_STAGE(ROUTER, IN,  ARP_REQUEST, 7, "lr_in_arp_request")  \
+    PIPELINE_STAGE(ROUTER, IN,  POST_UNSNAT, 4, "lr_in_post_unsnat")  \
+    PIPELINE_STAGE(ROUTER, IN,  DNAT,        5, "lr_in_dnat")         \
+    PIPELINE_STAGE(ROUTER, IN,  IP_ROUTING,  6, "lr_in_ip_routing")   \
+    PIPELINE_STAGE(ROUTER, IN,  ARP_RESOLVE, 7, "lr_in_arp_resolve")  \
+    PIPELINE_STAGE(ROUTER, IN,  ARP_REQUEST, 8, "lr_in_arp_request")  \
                                                                       \
     /* Logical router egress stages. */                               \
     PIPELINE_STAGE(ROUTER, OUT, SNAT,      0, "lr_out_snat")          \
@@ -3840,6 +3841,7 @@  build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
         /* Packets are allowed by default. */
         ovn_lflow_add(lflows, od, S_ROUTER_IN_DEFRAG, 0, "1", "next;");
         ovn_lflow_add(lflows, od, S_ROUTER_IN_UNSNAT, 0, "1", "next;");
+        ovn_lflow_add(lflows, od, S_ROUTER_IN_POST_UNSNAT, 0, "1", "next;");
         ovn_lflow_add(lflows, od, S_ROUTER_OUT_SNAT, 0, "1", "next;");
         ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 0, "1", "next;");
 
@@ -3917,6 +3919,13 @@  build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
 
         sset_destroy(&all_ips);
 
+        /* When a UNSNAT happens, ct.dnat is set because the destination
+         * IP address is the one that is changed. */
+        ovn_lflow_add(lflows, od, S_ROUTER_IN_POST_UNSNAT, 100,
+                      "ip && ct.dnat", "flags.snat = 1; next;");
+        ovn_lflow_add(lflows, od, S_ROUTER_IN_POST_UNSNAT, 90,
+                      "ip", "flags.snat = 0; next;");
+
         for (int i = 0; i < od->nbr->n_nat; i++) {
             const struct nbrec_nat *nat;
 
@@ -3976,7 +3985,7 @@  build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
                 ds_clear(&match);
                 ds_put_format(&match, "ip && ip4.dst == %s", nat->external_ip);
                 ovn_lflow_add(lflows, od, S_ROUTER_IN_UNSNAT, 100,
-                              ds_cstr(&match), "ct_snat; next;");
+                              ds_cstr(&match), "ct_snat;");
             }
 
             /* Ingress DNAT table: Packets enter the pipeline with destination
@@ -4011,7 +4020,8 @@  build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
             if (!strcmp(nat->type, "snat")
                 || !strcmp(nat->type, "dnat_and_snat")) {
                 ds_clear(&match);
-                ds_put_format(&match, "ip && ip4.src == %s", nat->logical_ip);
+                ds_put_format(&match, "ip && ip4.src == %s && flags.snat == 0",
+                              nat->logical_ip);
                 ds_clear(&actions);
                 ds_put_format(&actions, "ct_snat(%s);", nat->external_ip);
 
@@ -4025,7 +4035,7 @@  build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
         }
 
         /* Re-circulate every packet through the DNAT zone.
-        * This helps with three things.
+        * This helps with two things.
         *
         * 1. Any packet that needs to be unDNATed in the reverse
         * direction gets unDNATed. Ideally this could be done in
@@ -4035,12 +4045,7 @@  build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
         * we can do it here, saving a future re-circulation.
         *
         * 2. Established load-balanced connections automatically get
-        * DNATed.
-        *
-        * 3. Any packet that was sent through SNAT zone in the
-        * previous table automatically gets re-circulated to get
-        * back the new destination IP address that is needed for
-        * routing in the openflow pipeline. */
+        * DNATed. */
         ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 50,
                       "ip", "flags.loopback = 1; ct_dnat;");
     }
diff --git a/ovn/ovn-sb.xml b/ovn/ovn-sb.xml
index 65191ed..e564c8a 100644
--- a/ovn/ovn-sb.xml
+++ b/ovn/ovn-sb.xml
@@ -1103,10 +1103,10 @@ 
         <dd>
           <p>
             <code>ct_snat</code> sends the packet through the SNAT zone to
-            unSNAT any packet that was SNATed in the opposite direction.  If
-            the packet needs to be sent to the next tables, then it should be
-            followed by a <code>next;</code> action.  The next tables will not
-            see the changes in the packet caused by the connection tracker.
+            unSNAT any packet that was SNATed in the opposite direction.
+            The packet is then automatically sent to the next tables as if
+            followed by <code>next;</code> action.  The next tables will see
+            the changes in the packet caused by the connection tracker.
           </p>
           <p>
             <code>ct_snat(<var>IP</var>)</code> sends the packet through the
diff --git a/tests/ovn.at b/tests/ovn.at
index 69f5277..bc8c9b8 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -838,7 +838,7 @@  ct_dnat();
 
 # ct_snat
 ct_snat;
-    encodes as ct(zone=NXM_NX_REG12[0..15],nat)
+    encodes as ct(table=27,zone=NXM_NX_REG12[0..15],nat)
     has prereqs ip
 ct_snat(192.168.1.2);
     encodes as ct(commit,table=27,zone=NXM_NX_REG12[0..15],nat(src=192.168.1.2))