[ovs-dev,1/2] ovn-northd: Ability to loop-back in a router.
diff mbox

Message ID 1467706640-10314-1-git-send-email-guru@ovn.org
State Not Applicable
Headers show

Commit Message

Gurucharan Shetty July 5, 2016, 8:17 a.m. UTC
Currently, when a client looks at a load balancer VIP,
it notices that it is in a different subnet than itself
and sends the packet to its connected router port's
MAC address. The load balancer intercepts it.

If the load balancer VIP translates to an endpoint IP in a
different subnet (than the one client has), than the
load balancing works fine because the router will send
the packet to the correct destination.

But if one of the endpoints that VIP translated into
was in the same subnet as the client, the OVN router
fails to send the packet back via the same interface.

This commit changes that behavior and lets an OVN router
loop-back the packet via the same interface.

Signed-off-by: Gurucharan Shetty <guru@ovn.org>
---
 ovn/northd/ovn-northd.8.xml | 10 ++++++++++
 ovn/northd/ovn-northd.c     | 10 ++++++++++
 2 files changed, 20 insertions(+)

Comments

Mickey Spiegel July 8, 2016, 3:30 a.m. UTC | #1
>To: dev@openvswitch.org
>From: Gurucharan Shetty 
>Sent by: "dev" 
>Date: 07/05/2016 11:15AM
>Subject: [ovs-dev] [PATCH 1/2] ovn-northd: Ability to loop-back in a router.
>
>Currently, when a client looks at a load balancer VIP,
>it notices that it is in a different subnet than itself
>and sends the packet to its connected router port's
>MAC address. The load balancer intercepts it.
>
>If the load balancer VIP translates to an endpoint IP in a
>different subnet (than the one client has), than the
>load balancing works fine because the router will send
>the packet to the correct destination.
>
>But if one of the endpoints that VIP translated into
>was in the same subnet as the client, the OVN router
>fails to send the packet back via the same interface.

So the load balancer is translating the destination IP,
but leaving the MAC address unchanged?
Based on the MAC address, the packet is forwarded to
the router patch port?

>This commit changes that behavior and lets an OVN router
>loop-back the packet via the same interface.
>
>Signed-off-by: Gurucharan Shetty <guru@ovn.org>
>---
> ovn/northd/ovn-northd.8.xml | 10 ++++++++++
> ovn/northd/ovn-northd.c     | 10 ++++++++++
> 2 files changed, 20 insertions(+)
>
>diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
>index 6bc83ea..08e9d4e 100644
>--- a/ovn/northd/ovn-northd.8.xml
>+++ b/ovn/northd/ovn-northd.8.xml
>@@ -743,6 +743,16 @@ output;
>         port's own IP address is used to SNAT packets passing through that
>         router.
>       </li>
>+
>+      <li>
>+        Allow router to send back the packet to the same router port from
>+        which it was received (for cases where the destination IP address
>+        is in the same subnet as the router port).  For router ports with an
>+        IP address of <var>A</var> and mask of <var>M</var>, a priority-20 flow
>+        is added with a match for <code>ip4.dst ==
>+        <var>A</var>/<var>M </var></code> and an action of
>+        <code>inport = ""</code>.
>+      </li>
>     </ul>
> 
>     <p>
>diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
>index f4b4435..158f10d 100644
>--- a/ovn/northd/ovn-northd.c
>+++ b/ovn/northd/ovn-northd.c
>@@ -2367,6 +2367,16 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
>                           "drop;");
>             free(match);
>         }
>+
>+        /* When destination IP address is in the same subnet as the
>+         * router port, the packet may need to be eventually sent
>+         * back the same port.  For cases like that, allow sending
>+         * out the inport. */
>+        match = xasprintf("ip4.dst == "IP_FMT"/"IP_FMT,
>+                          IP_ARGS(op->network), IP_ARGS(op->mask));
>+        ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_INPUT, 20,
>+                      match, "inport = \"\"; next;");
>+        free(match);
>     }

I am concerned about two aspects of this proposal:
1. It applies to all traffic to directly connected subnets, not just
   for load balancer traffic. That is a significant change in behavior.
2. It is removing the inport early on in the router ingress pipeline,
   which scares me and seems like it will make debugging difficult.
   You could narrow it down quite a bit by matching on inport, but
   that still leaves the behavior that concerns me for some traffic.
   Looking at my design for NAT in a distributed router, removing
   the inport would break it. I suspect there might be other
   future features that might act on inport, such as RPF.

I am not sure what is the best alternative to address the problem.
This seems related to the basic load balancer design, the questions
that I asked at the beginning.

Brainstorming about workarounds, I wonder if it would be better to
have the load balancer traffic enter the router on a different port,
but that would require changing switch datapaths coming out of the
load balancer.

Wondering whether we should relax the restriction on
inport == outport for router datapaths, relying on TTL to mitigate
loops?
Perhaps controlled by a configuration knob?

Mickey

>     /* NAT in Gateway routers. */
>-- 
>1.9.1
>
>_______________________________________________
>dev mailing list
>dev@openvswitch.org
>http://openvswitch.org/mailman/listinfo/dev
>
Gurucharan Shetty July 8, 2016, 4:28 a.m. UTC | #2
On 7 July 2016 at 20:30, Mickey Spiegel <emspiege@us.ibm.com> wrote:

> >To: dev@openvswitch.org
> >From: Gurucharan Shetty
> >Sent by: "dev"
> >Date: 07/05/2016 11:15AM
> >Subject: [ovs-dev] [PATCH 1/2] ovn-northd: Ability to loop-back in a
> router.
> >
> >Currently, when a client looks at a load balancer VIP,
> >it notices that it is in a different subnet than itself
> >and sends the packet to its connected router port's
> >MAC address. The load balancer intercepts it.
> >
> >If the load balancer VIP translates to an endpoint IP in a
> >different subnet (than the one client has), than the
> >load balancing works fine because the router will send
> >the packet to the correct destination.
> >
> >But if one of the endpoints that VIP translated into
> >was in the same subnet as the client, the OVN router
> >fails to send the packet back via the same interface.
>
> So the load balancer is translating the destination IP,
> but leaving the MAC address unchanged?
> Based on the MAC address, the packet is forwarded to
> the router patch port?
>
Yes. This does look like a common behavior. Atleast, the default Kubernetes
load balancers (or any iptables based load-balancers) seem to do that.

 --snip...

>
>
> I am concerned about two aspects of this proposal:
> 1. It applies to all traffic to directly connected subnets, not just
>    for load balancer traffic. That is a significant change in behavior.
>
Agreed. (Having said that, some Physical routers seem to do the same thing.
i.e. have the capability to send back the traffic. I am not sure whether
all Physical routers are capable of doing it.)



> 2. It is removing the inport early on in the router ingress pipeline,
>    which scares me and seems like it will make debugging difficult.
>    You could narrow it down quite a bit by matching on inport, but
>    that still leaves the behavior that concerns me for some traffic.
>    Looking at my design for NAT in a distributed router, removing
>    the inport would break it. I suspect there might be other
>    future features that might act on inport, such as RPF.
>
>
This is only true when the destination IP address is in the same subnet as
the router port. For other cases, inport is available. Do you also need to
send back traffic? I guess what I am getting at is, why do you think this
will hurt other features which won't loop-back?


(For cases like that, a workaround would be to store inport in a register
for later use? )


>
>
> >     /* NAT in Gateway routers. */
> >--
> >1.9.1
> >
> >_______________________________________________
> >dev mailing list
> >dev@openvswitch.org
> >http://openvswitch.org/mailman/listinfo/dev
> >
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev
>
Gurucharan Shetty July 8, 2016, 4:33 a.m. UTC | #3
On 7 July 2016 at 21:28, Guru Shetty <guru@ovn.org> wrote:

>
>
> On 7 July 2016 at 20:30, Mickey Spiegel <emspiege@us.ibm.com> wrote:
>
>> >To: dev@openvswitch.org
>> >From: Gurucharan Shetty
>> >Sent by: "dev"
>> >Date: 07/05/2016 11:15AM
>> >Subject: [ovs-dev] [PATCH 1/2] ovn-northd: Ability to loop-back in a
>> router.
>> >
>> >Currently, when a client looks at a load balancer VIP,
>> >it notices that it is in a different subnet than itself
>> >and sends the packet to its connected router port's
>> >MAC address. The load balancer intercepts it.
>> >
>> >If the load balancer VIP translates to an endpoint IP in a
>> >different subnet (than the one client has), than the
>> >load balancing works fine because the router will send
>> >the packet to the correct destination.
>> >
>> >But if one of the endpoints that VIP translated into
>> >was in the same subnet as the client, the OVN router
>> >fails to send the packet back via the same interface.
>>
>> So the load balancer is translating the destination IP,
>> but leaving the MAC address unchanged?
>> Based on the MAC address, the packet is forwarded to
>> the router patch port?
>>
> Yes. This does look like a common behavior. Atleast, the default
> Kubernetes load balancers (or any iptables based load-balancers) seem to do
> that.
>
>  --snip...
>
>>
>>
>> I am concerned about two aspects of this proposal:
>> 1. It applies to all traffic to directly connected subnets, not just
>>    for load balancer traffic. That is a significant change in behavior.
>>
> Agreed. (Having said that, some Physical routers seem to do the same
> thing. i.e. have the capability to send back the traffic. I am not sure
> whether all Physical routers are capable of doing it.)
>
>
>
>> 2. It is removing the inport early on in the router ingress pipeline,
>>    which scares me and seems like it will make debugging difficult.
>>    You could narrow it down quite a bit by matching on inport, but
>>    that still leaves the behavior that concerns me for some traffic.
>>    Looking at my design for NAT in a distributed router, removing
>>    the inport would break it. I suspect there might be other
>>    future features that might act on inport, such as RPF.
>>
>>
> This is only true when the destination IP address is in the same subnet as
> the router port. For other cases, inport is available. Do you also need to
> send back traffic? I guess what I am getting at is, why do you think this
> will hurt other features which won't loop-back?
>

Looks like my patch does it for every router port in that router. That is
clearly wrong and was not my intention. If I limit it to only the port
which has that subnet, would that satisfy your concern?


>
>
> (For cases like that, a workaround would be to store inport in a register
> for later use? )
>
>
>>
>>
>> >     /* NAT in Gateway routers. */
>> >--
>> >1.9.1
>> >
>> >_______________________________________________
>> >dev mailing list
>> >dev@openvswitch.org
>> >http://openvswitch.org/mailman/listinfo/dev
>> >
>>
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> http://openvswitch.org/mailman/listinfo/dev
>>
>
>
Mickey Spiegel July 8, 2016, 5:36 a.m. UTC | #4
-----Guru Shetty <guru@ovn.org> wrote: -----

>To: Mickey Spiegel/San Jose/IBM@IBMUS
>From: Guru Shetty <guru@ovn.org>
>Date: 07/07/2016 09:34PM
>Cc: ovs dev <dev@openvswitch.org>
>Subject: Re: [ovs-dev] [PATCH 1/2] ovn-northd: Ability to loop-back
>in a router.
>
>
>
>On 7 July 2016 at 21:28, Guru Shetty <guru@ovn.org> wrote:
>
>
>On 7 July 2016 at 20:30, Mickey Spiegel <emspiege@us.ibm.com> wrote:
>>To: dev@openvswitch.org
> >From: Gurucharan Shetty
> >Sent by: "dev"
> >Date: 07/05/2016 11:15AM
> >Subject: [ovs-dev] [PATCH 1/2] ovn-northd: Ability to loop-back in a router.
> >
> >Currently, when a client looks at a load balancer VIP,
> >it notices that it is in a different subnet than itself
> >and sends the packet to its connected router port's
> >MAC address. The load balancer intercepts it.
> >
> >If the load balancer VIP translates to an endpoint IP in a
> >different subnet (than the one client has), than the
> >load balancing works fine because the router will send
> >the packet to the correct destination.
> >
> >But if one of the endpoints that VIP translated into
> >was in the same subnet as the client, the OVN router
> >fails to send the packet back via the same interface.
> 
> So the load balancer is translating the destination IP,
> but leaving the MAC address unchanged?
> Based on the MAC address, the packet is forwarded to
> the router patch port?
>Yes. This does look like a common behavior. Atleast, the default
>Kubernetes load balancers (or any iptables based load-balancers) seem
>to do that.

This does not seem clean. I still wonder whether it would make
more sense to start over on a separate logical switch for the load
balancer, leading to a different patch port into the logical router.

> --snip... 
>
>
> I am concerned about two aspects of this proposal:
> 1. It applies to all traffic to directly connected subnets, not just
>    for load balancer traffic. That is a significant change in behavior.
>Agreed. (Having said that, some Physical routers seem to do the same
>thing. i.e. have the capability to send back the traffic. I am not
>sure whether all Physical routers are capable of doing it.)

A quick search told me that one of the major router vendors allowed
that 8 or 9 years ago. Not sure if they allow it now.

Their firewalls do not allow it by default, but have a configuration knob.

>  2. It is removing the inport early on in the router ingress pipeline,
>    which scares me and seems like it will make debugging difficult.
>    You could narrow it down quite a bit by matching on inport, but
>    that still leaves the behavior that concerns me for some traffic.
>    Looking at my design for NAT in a distributed router, removing
>    the inport would break it. I suspect there might be other
>    future features that might act on inport, such as RPF.
> 
>
>This is only true when the destination IP address is in the same
>subnet as the router port. For other cases, inport is available. Do
>you also need to send back traffic? I guess what I am getting at is,
>why do you think this will hurt other features which won't loop-back?

This is not about loopback. It is about the mechanism that you chose
to achieve your goal, zeroing out the inport very early in the router
ingress pipeline. Other lookups later in the router ingress pipeline
may need to have the inport available for match conditions. For the
NAT design that I am working on, I want to match on the router
gateway address (SNAT) and inport == gateway port, together. For
RPF, it could be any router port.

>Looks like my patch does it for every router port in that router.
>That is clearly wrong and was not my intention. If I limit it to only
>the port which has that subnet, would that satisfy your concern?

No. That is what I mentioned above by "narrow it down quite a bit
by matching on inport". You would still be zeroing out the inport
in some cases, which may affect later pipeline stages that want to
match on inport.

Once you put this change in, in what cases are you still precluding
inport == outport?
Only when the dest IP matches a default, static or dynamic route
rather than a connected subnet.
Does the inport == outport check still have any significant value
once you do that?
I would argue not much. The simplest solution in that case would
be to turn off the check for router datapaths, though I would still
think it should be protected by a configuration knob of some sort.
If you turn off the check for router datapaths, the change would be
in physical.c for table 34, and would not affect the logical flows
constructed by northd.

Mickey

>(For cases like that, a workaround would be to store inport in a
>register for later use? )
> 
> 
> >     /* NAT in Gateway routers. */
> >--
> >1.9.1
> >
> >_______________________________________________
> >dev mailing list
> >dev@openvswitch.org
> >http://openvswitch.org/mailman/listinfo/dev
> >
> 
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev
> 
> 
>
Guru Shetty July 8, 2016, 8:30 a.m. UTC | #5
> On Jul 7, 2016, at 10:36 PM, Mickey Spiegel <emspiege@us.ibm.com> wrote:
> 
> -----Guru Shetty <guru@ovn.org> wrote: -----
> 
>> To: Mickey Spiegel/San Jose/IBM@IBMUS
>> From: Guru Shetty <guru@ovn.org>
>> Date: 07/07/2016 09:34PM
>> Cc: ovs dev <dev@openvswitch.org>
>> Subject: Re: [ovs-dev] [PATCH 1/2] ovn-northd: Ability to loop-back
>> in a router.
>> 
>> 
>> 
>> On 7 July 2016 at 21:28, Guru Shetty <guru@ovn.org> wrote:
>> 
>> 
>>> On 7 July 2016 at 20:30, Mickey Spiegel <emspiege@us.ibm.com> wrote:
>>> To: dev@openvswitch.org
>>> From: Gurucharan Shetty
>>> Sent by: "dev"
>>> Date: 07/05/2016 11:15AM
>>> Subject: [ovs-dev] [PATCH 1/2] ovn-northd: Ability to loop-back in a router.
>>> 
>>> Currently, when a client looks at a load balancer VIP,
>>> it notices that it is in a different subnet than itself
>>> and sends the packet to its connected router port's
>>> MAC address. The load balancer intercepts it.
>>> 
>>> If the load balancer VIP translates to an endpoint IP in a
>>> different subnet (than the one client has), than the
>>> load balancing works fine because the router will send
>>> the packet to the correct destination.
>>> 
>>> But if one of the endpoints that VIP translated into
>>> was in the same subnet as the client, the OVN router
>>> fails to send the packet back via the same interface.
>> 
>> So the load balancer is translating the destination IP,
>> but leaving the MAC address unchanged?
>> Based on the MAC address, the packet is forwarded to
>> the router patch port?
>> Yes. This does look like a common behavior. Atleast, the default
>> Kubernetes load balancers (or any iptables based load-balancers) seem
>> to do that.
> 
> This does not seem clean. I still wonder whether it would make
> more sense to start over on a separate logical switch for the load
> balancer, leading to a different patch port into the logical router.
I feel right now that  it complicates the topology for not a lot of useful benefits.

> 
>> --snip... 
>> 
>> 
>> I am concerned about two aspects of this proposal:
>> 1. It applies to all traffic to directly connected subnets, not just
>>   for load balancer traffic. That is a significant change in behavior.
>> Agreed. (Having said that, some Physical routers seem to do the same
>> thing. i.e. have the capability to send back the traffic. I am not
>> sure whether all Physical routers are capable of doing it.)
> 
> A quick search told me that one of the major router vendors allowed
> that 8 or 9 years ago. Not sure if they allow it now.
> 
> Their firewalls do not allow it by default, but have a configuration knob.
> 
>> 2. It is removing the inport early on in the router ingress pipeline,
>>   which scares me and seems like it will make debugging difficult.
>>   You could narrow it down quite a bit by matching on inport, but
>>   that still leaves the behavior that concerns me for some traffic.
>>   Looking at my design for NAT in a distributed router, removing
>>   the inport would break it. I suspect there might be other
>>   future features that might act on inport, such as RPF.
>> 
>> 
>> This is only true when the destination IP address is in the same
>> subnet as the router port. For other cases, inport is available. Do
>> you also need to send back traffic? I guess what I am getting at is,
>> why do you think this will hurt other features which won't loop-back?
> 
> This is not about loopback. It is about the mechanism that you chose
> to achieve your goal, zeroing out the inport very early in the router
> ingress pipeline. Other lookups later in the router ingress pipeline
> may need to have the inport available for match conditions. For the
> NAT design that I am working on, I want to match on the router
> gateway address (SNAT) and inport == gateway port, together. For
> RPF, it could be any router port.
> 
>> Looks like my patch does it for every router port in that router.
>> That is clearly wrong and was not my intention. If I limit it to only
>> the port which has that subnet, would that satisfy your concern?
> 
> No. That is what I mentioned above by "narrow it down quite a bit
> by matching on inport". You would still be zeroing out the inport
> in some cases, which may affect later pipeline stages that want to
> match on inport.
> 
> Once you put this change in, in what cases are you still precluding
> inport == outport?
> Only when the dest IP matches a default, static or dynamic route
> rather than a connected subnet.
> Does the inport == outport check still have any significant value
> once you do that?
> I would argue not much. The simplest solution in that case would
> be to turn off the check for router datapaths, though I would still
> think it should be protected by a configuration knob of some sort.
> If you turn off the check for router datapaths, the change would be
> in physical.c for table 34, and would not affect the logical flows
> constructed by northd.

That is one way to look at it and makes sense. Let me think over this and talk to people for more ideas.


> 
> Mickey
> 
>> (For cases like that, a workaround would be to store inport in a
>> register for later use? )
>> 
>> 
>>>    /* NAT in Gateway routers. */
>>> --
>>> 1.9.1
>>> 
>>> _______________________________________________
>>> dev mailing list
>>> dev@openvswitch.org
>>> http://openvswitch.org/mailman/listinfo/dev
>> 
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> http://openvswitch.org/mailman/listinfo/dev
> 
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev

Patch
diff mbox

diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
index 6bc83ea..08e9d4e 100644
--- a/ovn/northd/ovn-northd.8.xml
+++ b/ovn/northd/ovn-northd.8.xml
@@ -743,6 +743,16 @@  output;
         port's own IP address is used to SNAT packets passing through that
         router.
       </li>
+
+      <li>
+        Allow router to send back the packet to the same router port from
+        which it was received (for cases where the destination IP address
+        is in the same subnet as the router port).  For router ports with an
+        IP address of <var>A</var> and mask of <var>M</var>, a priority-20 flow
+        is added with a match for <code>ip4.dst ==
+        <var>A</var>/<var>M </var></code> and an action of
+        <code>inport = ""</code>.
+      </li>
     </ul>
 
     <p>
diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index f4b4435..158f10d 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -2367,6 +2367,16 @@  build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
                           "drop;");
             free(match);
         }
+
+        /* When destination IP address is in the same subnet as the
+         * router port, the packet may need to be eventually sent
+         * back the same port.  For cases like that, allow sending
+         * out the inport. */
+        match = xasprintf("ip4.dst == "IP_FMT"/"IP_FMT,
+                          IP_ARGS(op->network), IP_ARGS(op->mask));
+        ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_INPUT, 20,
+                      match, "inport = \"\"; next;");
+        free(match);
     }
 
     /* NAT in Gateway routers. */