diff mbox

[ovs-dev] native tunneling bug?

Message ID 20150901235621.GA29879@nicira.com
State RFC
Headers show

Commit Message

Ben Pfaff Sept. 1, 2015, 11:56 p.m. UTC
I think I've come across a bug in OVS native tunneling, or at any rate
an important difference between Linux kernel and OVS native tunneling.
In Linux kernel tunneling, a tunnel packet received by the kernel first
passes through the kernel IP stack.  Among other things, the IP stack
drops packets that are not destined to the current host.  It appears to
me that the native tunneling code doesn't have any similar check,
because I'm seeing it accept and packets flooded by the upstream switch
that are not destined to an IP address of the host.  This means in
effect that the user of native tunneling must set "options:local_ip",
whereas a user of Linux kernel tunneling doesn't (and probably
shouldn't).

I suspect that this behavior is unintentional; it isn't mentioned in
README-native-tunneling.md or (as far as I can tell) anywhere else.

I noticed this while testing OVN.  If you configure a few hypervisors
and send packets from only one of them, then the switch that connects
them will flood all the packets to all of the rest (since it hasn't yet
learned where they are).  The result is that for N hypervisors, remote
VIFs get N-1 copies of the packets instead of just one.  I'm appending a
patch that works around it, though I'd prefer to fix the tunneling code
rather than apply this patch.

--8<--------------------------cut here-------------------------->8--

From: Ben Pfaff <blp@nicira.com>
Date: Tue, 1 Sep 2015 16:52:29 -0700
Subject: [PATCH] ovn-controller: Attach local_ip to each tunnel.

This avoids packet duplication when native tunneling is used.
---
 ovn/controller/encaps.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

Comments

Pravin B Shelar Sept. 2, 2015, 2:14 a.m. UTC | #1
On Tue, Sep 1, 2015 at 4:56 PM, Ben Pfaff <blp@nicira.com> wrote:
> I think I've come across a bug in OVS native tunneling, or at any rate
> an important difference between Linux kernel and OVS native tunneling.
> In Linux kernel tunneling, a tunnel packet received by the kernel first
> passes through the kernel IP stack.  Among other things, the IP stack
> drops packets that are not destined to the current host.  It appears to
> me that the native tunneling code doesn't have any similar check,
> because I'm seeing it accept and packets flooded by the upstream switch
> that are not destined to an IP address of the host.  This means in
> effect that the user of native tunneling must set "options:local_ip",
> whereas a user of Linux kernel tunneling doesn't (and probably
> shouldn't).
>
Right. Its bug.

> I suspect that this behavior is unintentional; it isn't mentioned in
> README-native-tunneling.md or (as far as I can tell) anywhere else.
>
> I noticed this while testing OVN.  If you configure a few hypervisors
> and send packets from only one of them, then the switch that connects
> them will flood all the packets to all of the rest (since it hasn't yet
> learned where they are).  The result is that for N hypervisors, remote
> VIFs get N-1 copies of the packets instead of just one.  I'm appending a
> patch that works around it, though I'd prefer to fix the tunneling code
> rather than apply this patch.
>
We can fix it adding the local ip-address to tnl-port-map.
I will send a patch.
Ben Pfaff Sept. 2, 2015, 2:19 a.m. UTC | #2
On Tue, Sep 01, 2015 at 07:14:00PM -0700, Pravin Shelar wrote:
> On Tue, Sep 1, 2015 at 4:56 PM, Ben Pfaff <blp@nicira.com> wrote:
> > I think I've come across a bug in OVS native tunneling, or at any rate
> > an important difference between Linux kernel and OVS native tunneling.
> > In Linux kernel tunneling, a tunnel packet received by the kernel first
> > passes through the kernel IP stack.  Among other things, the IP stack
> > drops packets that are not destined to the current host.  It appears to
> > me that the native tunneling code doesn't have any similar check,
> > because I'm seeing it accept and packets flooded by the upstream switch
> > that are not destined to an IP address of the host.  This means in
> > effect that the user of native tunneling must set "options:local_ip",
> > whereas a user of Linux kernel tunneling doesn't (and probably
> > shouldn't).
> >
> Right. Its bug.
> 
> > I suspect that this behavior is unintentional; it isn't mentioned in
> > README-native-tunneling.md or (as far as I can tell) anywhere else.
> >
> > I noticed this while testing OVN.  If you configure a few hypervisors
> > and send packets from only one of them, then the switch that connects
> > them will flood all the packets to all of the rest (since it hasn't yet
> > learned where they are).  The result is that for N hypervisors, remote
> > VIFs get N-1 copies of the packets instead of just one.  I'm appending a
> > patch that works around it, though I'd prefer to fix the tunneling code
> > rather than apply this patch.
> >
> We can fix it adding the local ip-address to tnl-port-map.
> I will send a patch.

Thanks Pravin!
Jesse Gross Sept. 2, 2015, 1:44 p.m. UTC | #3
On Tue, Sep 1, 2015 at 7:14 PM, Pravin Shelar <pshelar@nicira.com> wrote:
> On Tue, Sep 1, 2015 at 4:56 PM, Ben Pfaff <blp@nicira.com> wrote:
>> I think I've come across a bug in OVS native tunneling, or at any rate
>> an important difference between Linux kernel and OVS native tunneling.
>> In Linux kernel tunneling, a tunnel packet received by the kernel first
>> passes through the kernel IP stack.  Among other things, the IP stack
>> drops packets that are not destined to the current host.  It appears to
>> me that the native tunneling code doesn't have any similar check,
>> because I'm seeing it accept and packets flooded by the upstream switch
>> that are not destined to an IP address of the host.  This means in
>> effect that the user of native tunneling must set "options:local_ip",
>> whereas a user of Linux kernel tunneling doesn't (and probably
>> shouldn't).
>>
> Right. Its bug.
>
>> I suspect that this behavior is unintentional; it isn't mentioned in
>> README-native-tunneling.md or (as far as I can tell) anywhere else.
>>
>> I noticed this while testing OVN.  If you configure a few hypervisors
>> and send packets from only one of them, then the switch that connects
>> them will flood all the packets to all of the rest (since it hasn't yet
>> learned where they are).  The result is that for N hypervisors, remote
>> VIFs get N-1 copies of the packets instead of just one.  I'm appending a
>> patch that works around it, though I'd prefer to fix the tunneling code
>> rather than apply this patch.
>>
> We can fix it adding the local ip-address to tnl-port-map.
> I will send a patch.

Presumably we also should use DMAC as well?
Ben Pfaff Sept. 2, 2015, 3:43 p.m. UTC | #4
On Wed, Sep 02, 2015 at 06:44:15AM -0700, Jesse Gross wrote:
> On Tue, Sep 1, 2015 at 7:14 PM, Pravin Shelar <pshelar@nicira.com> wrote:
> > On Tue, Sep 1, 2015 at 4:56 PM, Ben Pfaff <blp@nicira.com> wrote:
> >> I think I've come across a bug in OVS native tunneling, or at any rate
> >> an important difference between Linux kernel and OVS native tunneling.
> >> In Linux kernel tunneling, a tunnel packet received by the kernel first
> >> passes through the kernel IP stack.  Among other things, the IP stack
> >> drops packets that are not destined to the current host.  It appears to
> >> me that the native tunneling code doesn't have any similar check,
> >> because I'm seeing it accept and packets flooded by the upstream switch
> >> that are not destined to an IP address of the host.  This means in
> >> effect that the user of native tunneling must set "options:local_ip",
> >> whereas a user of Linux kernel tunneling doesn't (and probably
> >> shouldn't).
> >>
> > Right. Its bug.
> >
> >> I suspect that this behavior is unintentional; it isn't mentioned in
> >> README-native-tunneling.md or (as far as I can tell) anywhere else.
> >>
> >> I noticed this while testing OVN.  If you configure a few hypervisors
> >> and send packets from only one of them, then the switch that connects
> >> them will flood all the packets to all of the rest (since it hasn't yet
> >> learned where they are).  The result is that for N hypervisors, remote
> >> VIFs get N-1 copies of the packets instead of just one.  I'm appending a
> >> patch that works around it, though I'd prefer to fix the tunneling code
> >> rather than apply this patch.
> >>
> > We can fix it adding the local ip-address to tnl-port-map.
> > I will send a patch.
> 
> Presumably we also should use DMAC as well?

Good point!
Jesse Gross Sept. 2, 2015, 3:55 p.m. UTC | #5
On Wed, Sep 2, 2015 at 6:44 AM, Jesse Gross <jesse@nicira.com> wrote:
> On Tue, Sep 1, 2015 at 7:14 PM, Pravin Shelar <pshelar@nicira.com> wrote:
>> On Tue, Sep 1, 2015 at 4:56 PM, Ben Pfaff <blp@nicira.com> wrote:
>>> I think I've come across a bug in OVS native tunneling, or at any rate
>>> an important difference between Linux kernel and OVS native tunneling.
>>> In Linux kernel tunneling, a tunnel packet received by the kernel first
>>> passes through the kernel IP stack.  Among other things, the IP stack
>>> drops packets that are not destined to the current host.  It appears to
>>> me that the native tunneling code doesn't have any similar check,
>>> because I'm seeing it accept and packets flooded by the upstream switch
>>> that are not destined to an IP address of the host.  This means in
>>> effect that the user of native tunneling must set "options:local_ip",
>>> whereas a user of Linux kernel tunneling doesn't (and probably
>>> shouldn't).
>>>
>> Right. Its bug.
>>
>>> I suspect that this behavior is unintentional; it isn't mentioned in
>>> README-native-tunneling.md or (as far as I can tell) anywhere else.
>>>
>>> I noticed this while testing OVN.  If you configure a few hypervisors
>>> and send packets from only one of them, then the switch that connects
>>> them will flood all the packets to all of the rest (since it hasn't yet
>>> learned where they are).  The result is that for N hypervisors, remote
>>> VIFs get N-1 copies of the packets instead of just one.  I'm appending a
>>> patch that works around it, though I'd prefer to fix the tunneling code
>>> rather than apply this patch.
>>>
>> We can fix it adding the local ip-address to tnl-port-map.
>> I will send a patch.
>
> Presumably we also should use DMAC as well?

And I realized no VLAN tag as well (since if it is an access port, the
tag should be stripped off as already).

This is a larger point but there's a bunch of things that are missing
from a typical IP stack implementation. Some that I immediately see
are verifying the IP header checksum and checking the header length. I
guess there are probably others as well.
Pravin B Shelar Sept. 2, 2015, 6:37 p.m. UTC | #6
On Wed, Sep 2, 2015 at 8:55 AM, Jesse Gross <jesse@nicira.com> wrote:
> On Wed, Sep 2, 2015 at 6:44 AM, Jesse Gross <jesse@nicira.com> wrote:
>> On Tue, Sep 1, 2015 at 7:14 PM, Pravin Shelar <pshelar@nicira.com> wrote:
>>> On Tue, Sep 1, 2015 at 4:56 PM, Ben Pfaff <blp@nicira.com> wrote:
>>>> I think I've come across a bug in OVS native tunneling, or at any rate
>>>> an important difference between Linux kernel and OVS native tunneling.
>>>> In Linux kernel tunneling, a tunnel packet received by the kernel first
>>>> passes through the kernel IP stack.  Among other things, the IP stack
>>>> drops packets that are not destined to the current host.  It appears to
>>>> me that the native tunneling code doesn't have any similar check,
>>>> because I'm seeing it accept and packets flooded by the upstream switch
>>>> that are not destined to an IP address of the host.  This means in
>>>> effect that the user of native tunneling must set "options:local_ip",
>>>> whereas a user of Linux kernel tunneling doesn't (and probably
>>>> shouldn't).
>>>>
>>> Right. Its bug.
>>>
>>>> I suspect that this behavior is unintentional; it isn't mentioned in
>>>> README-native-tunneling.md or (as far as I can tell) anywhere else.
>>>>
>>>> I noticed this while testing OVN.  If you configure a few hypervisors
>>>> and send packets from only one of them, then the switch that connects
>>>> them will flood all the packets to all of the rest (since it hasn't yet
>>>> learned where they are).  The result is that for N hypervisors, remote
>>>> VIFs get N-1 copies of the packets instead of just one.  I'm appending a
>>>> patch that works around it, though I'd prefer to fix the tunneling code
>>>> rather than apply this patch.
>>>>
>>> We can fix it adding the local ip-address to tnl-port-map.
>>> I will send a patch.
>>
>> Presumably we also should use DMAC as well?
>
> And I realized no VLAN tag as well (since if it is an access port, the
> tag should be stripped off as already).
>
ok. I incorporate it in the patch.

> This is a larger point but there's a bunch of things that are missing
> from a typical IP stack implementation. Some that I immediately see
> are verifying the IP header checksum and checking the header length. I
> guess there are probably others as well.

I think we can do this checking in tnl-pop action.
diff mbox

Patch

diff --git a/ovn/controller/encaps.c b/ovn/controller/encaps.c
index 070b741..74b0e87 100644
--- a/ovn/controller/encaps.c
+++ b/ovn/controller/encaps.c
@@ -113,7 +113,7 @@  tunnel_create_name(struct tunnel_ctx *tc, const char *chassis_id)
 
 static void
 tunnel_add(struct tunnel_ctx *tc, const char *new_chassis_id,
-           const struct sbrec_encap *encap)
+           const struct sbrec_encap *encap, const char *encap_ip)
 {
     struct port_hash_node *hash_node;
 
@@ -167,6 +167,7 @@  tunnel_add(struct tunnel_ctx *tc, const char *new_chassis_id,
     ovsrec_interface_set_name(iface, port_name);
     ovsrec_interface_set_type(iface, encap->type);
     smap_add(&options, "remote_ip", encap->ip);
+    smap_add(&options, "local_ip", encap_ip);
     smap_add(&options, "key", "flow");
     ovsrec_interface_set_options(iface, &options);
     smap_destroy(&options);
@@ -235,6 +236,18 @@  encaps_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int,
         return;
     }
 
+    const struct ovsrec_open_vswitch *cfg = ovsrec_open_vswitch_first(ctx->ovs_idl);
+    if (!cfg) {
+        VLOG_INFO("No Open_vSwitch row defined.");
+        return;
+    }
+
+    const char *encap_ip = smap_get(&cfg->external_ids, "ovn-encap-ip");
+    if (!encap_ip) {
+        VLOG_INFO("Need to specify an encap ip");
+        return;
+    }
+
     const struct sbrec_chassis *chassis_rec;
     const struct ovsrec_bridge *br;
 
@@ -278,7 +291,7 @@  encaps_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int,
                 VLOG_INFO("No supported encaps for '%s'", chassis_rec->name);
                 continue;
             }
-            tunnel_add(&tc, chassis_rec->name, encap);
+            tunnel_add(&tc, chassis_rec->name, encap, encap_ip);
         }
     }