diff mbox

[net] vxlan: revert per-vxlan port

Message ID 20130520103017.054ae605@nehalam.linuxnetplumber.net
State Rejected, archived
Delegated to: David Miller
Headers show

Commit Message

Stephen Hemminger May 20, 2013, 5:30 p.m. UTC
\This commit 823aa873bc782f1c51b1ce8ec6da7cfcaf93836e
Author: stephen hemminger <stephen@networkplumber.org>
Date:   Sat Apr 27 11:31:57 2013 +0000

    vxlan: allow choosing destination port per vxlan

is broken revert it. The change allowed setting per port for transmit
but did not add additional listening sockets, which made any vxlan's
defined with non-default port send only.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
This patch needs to go -net only and not -net-next where the problem has
been resolved. Not sure what the process flow is to make that happen.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Stevens May 20, 2013, 6:15 p.m. UTC | #1
> From: Stephen Hemminger <stephen@networkplumber.org>
 
> \This commit 823aa873bc782f1c51b1ce8ec6da7cfcaf93836e
> Author: stephen hemminger <stephen@networkplumber.org>
> Date:   Sat Apr 27 11:31:57 2013 +0000
> 
>     vxlan: allow choosing destination port per vxlan
> 
> is broken revert it. The change allowed setting per port for transmit
> but did not add additional listening sockets, which made any vxlan's
> defined with non-default port send only.

This allows you to specify a different default port for
transmits, which is what you want to do if your own instance
of VXLAN is the odd one. I don't see any requirement for multiple
listen ports for that to be useful, since those sending to you
can have complete fdb tables even if the local instance doesn't
and relies on the default. Not to mention using an agent to
fill the fdb triggered by packets sent to the default, so the
receiver is not necessarily even a VXLAN instance. The receiver
side and transmit side ports can be completely independent of
each other, as in any other client-server system.

I think an administrator should have full flexibility to specify
the ports and destinations as s/he sees fit and if you don't think
that's a useful feature, you don't have to use it; the defaults
work fine, too.

                                                        +-DLS


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stephen Hemminger May 20, 2013, 6:30 p.m. UTC | #2
On Mon, 20 May 2013 14:15:59 -0400
David Stevens <dlstevens@us.ibm.com> wrote:

> > From: Stephen Hemminger <stephen@networkplumber.org>
>  
> > \This commit 823aa873bc782f1c51b1ce8ec6da7cfcaf93836e
> > Author: stephen hemminger <stephen@networkplumber.org>
> > Date:   Sat Apr 27 11:31:57 2013 +0000
> > 
> >     vxlan: allow choosing destination port per vxlan
> > 
> > is broken revert it. The change allowed setting per port for transmit
> > but did not add additional listening sockets, which made any vxlan's
> > defined with non-default port send only.
> 
> This allows you to specify a different default port for
> transmits, which is what you want to do if your own instance
> of VXLAN is the odd one. I don't see any requirement for multiple
> listen ports for that to be useful, since those sending to you
> can have complete fdb tables even if the local instance doesn't
> and relies on the default. Not to mention using an agent to
> fill the fdb triggered by packets sent to the default, so the
> receiver is not necessarily even a VXLAN instance. The receiver
> side and transmit side ports can be completely independent of
> each other, as in any other client-server system.

Vxlan's are a weird beast. They can be viewed as either bridge like
entities or tunnel like entities. I view them more as bridge type
devices where user configures two hosts with equivalent values and
they learn about each other. In that case the code in 3.10 is broken;
but the version with the learning in net-next works.

Your view is that VXLAN's are more like tunnels, where each host
has static entries to know about every other host. In that mode,
3.10 is useable, but the same effect can be had by defining static
neighbour entries. 

Both views are valid, but I don't want the behaviour to change from
3.10 to 3.11, since 3.10 is not released yet, it makes more sense
to go back and remove IFLA_VXLAN_PORT from 3.10 and make it work
like 3.9.


> I think an administrator should have full flexibility to specify
> the ports and destinations as s/he sees fit and if you don't think
> that's a useful feature, you don't have to use it; the defaults
> work fine, too.

They can do that with neighbour entries.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sridhar Samudrala May 20, 2013, 11:59 p.m. UTC | #3
On 5/20/2013 11:30 AM, Stephen Hemminger wrote:
> On Mon, 20 May 2013 14:15:59 -0400
> David Stevens <dlstevens@us.ibm.com> wrote:
>
>>> From: Stephen Hemminger <stephen@networkplumber.org>
>>   
>>> \This commit 823aa873bc782f1c51b1ce8ec6da7cfcaf93836e
>>> Author: stephen hemminger <stephen@networkplumber.org>
>>> Date:   Sat Apr 27 11:31:57 2013 +0000
>>>
>>>      vxlan: allow choosing destination port per vxlan
>>>
>>> is broken revert it. The change allowed setting per port for transmit
>>> but did not add additional listening sockets, which made any vxlan's
>>> defined with non-default port send only.
>> This allows you to specify a different default port for
>> transmits, which is what you want to do if your own instance
>> of VXLAN is the odd one. I don't see any requirement for multiple
>> listen ports for that to be useful, since those sending to you
>> can have complete fdb tables even if the local instance doesn't
>> and relies on the default. Not to mention using an agent to
>> fill the fdb triggered by packets sent to the default, so the
>> receiver is not necessarily even a VXLAN instance. The receiver
>> side and transmit side ports can be completely independent of
>> each other, as in any other client-server system.
> Vxlan's are a weird beast. They can be viewed as either bridge like
> entities or tunnel like entities. I view them more as bridge type
> devices where user configures two hosts with equivalent values and
> they learn about each other. In that case the code in 3.10 is broken;
> but the version with the learning in net-next works.
>
> Your view is that VXLAN's are more like tunnels, where each host
> has static entries to know about every other host. In that mode,
> 3.10 is useable, but the same effect can be had by defining static
> neighbour entries.

how can we send to a different dst port using static neighbor entries?

Thanks
Sridhar
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stephen Hemminger May 21, 2013, 12:13 a.m. UTC | #4
On Mon, 20 May 2013 16:59:44 -0700
Sridhar Samudrala <samudrala.sridhar@gmail.com> wrote:

> On 5/20/2013 11:30 AM, Stephen Hemminger wrote:
> > On Mon, 20 May 2013 14:15:59 -0400
> > David Stevens <dlstevens@us.ibm.com> wrote:
> >
> >>> From: Stephen Hemminger <stephen@networkplumber.org>
> >>   
> >>> \This commit 823aa873bc782f1c51b1ce8ec6da7cfcaf93836e
> >>> Author: stephen hemminger <stephen@networkplumber.org>
> >>> Date:   Sat Apr 27 11:31:57 2013 +0000
> >>>
> >>>      vxlan: allow choosing destination port per vxlan
> >>>
> >>> is broken revert it. The change allowed setting per port for transmit
> >>> but did not add additional listening sockets, which made any vxlan's
> >>> defined with non-default port send only.
> >> This allows you to specify a different default port for
> >> transmits, which is what you want to do if your own instance
> >> of VXLAN is the odd one. I don't see any requirement for multiple
> >> listen ports for that to be useful, since those sending to you
> >> can have complete fdb tables even if the local instance doesn't
> >> and relies on the default. Not to mention using an agent to
> >> fill the fdb triggered by packets sent to the default, so the
> >> receiver is not necessarily even a VXLAN instance. The receiver
> >> side and transmit side ports can be completely independent of
> >> each other, as in any other client-server system.
> > Vxlan's are a weird beast. They can be viewed as either bridge like
> > entities or tunnel like entities. I view them more as bridge type
> > devices where user configures two hosts with equivalent values and
> > they learn about each other. In that case the code in 3.10 is broken;
> > but the version with the learning in net-next works.
> >
> > Your view is that VXLAN's are more like tunnels, where each host
> > has static entries to know about every other host. In that mode,
> > 3.10 is useable, but the same effect can be had by defining static
> > neighbour entries.
> 
> how can we send to a different dst port using static neighbor entries?
> 
> Thanks
> Sridhar

It is part of this commit in 3.10 already.

commit 6681712d67eef14c4ce793561c3231659153a320
Author: David Stevens <dlstevens@us.ibm.com>
Date:   Fri Mar 15 04:35:51 2013 +0000

    vxlan: generalize forwarding tables
    
    This patch generalizes VXLAN forwarding table entries allowing an administrator
    to:
        1) specify multiple destinations for a given MAC
        2) specify alternate vni's in the VXLAN header
        3) specify alternate destination UDP ports
        4) use multicast MAC addresses as fdb lookup keys
        5) specify multicast destinations
        6) specify the outgoing interface for forwarded packets
    
    The combination allows configuration of more complex topologies using VXLAN
    encapsulation.
    
    Changes since v1: rebase to 3.9.0-rc2
    
    Signed-Off-By: David L Stevens <dlstevens@us.ibm.com>
    
    Signed-off-by: David S. Miller <davem@davemloft.net>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sridhar Samudrala May 21, 2013, 2:53 a.m. UTC | #5
On 5/20/2013 5:13 PM, Stephen Hemminger wrote:
> On Mon, 20 May 2013 16:59:44 -0700
> Sridhar Samudrala <samudrala.sridhar@gmail.com> wrote:
>
>> On 5/20/2013 11:30 AM, Stephen Hemminger wrote:
>>> On Mon, 20 May 2013 14:15:59 -0400
>>> David Stevens <dlstevens@us.ibm.com> wrote:
>>>
>>>>> From: Stephen Hemminger <stephen@networkplumber.org>
>>>>    
>>>>> \This commit 823aa873bc782f1c51b1ce8ec6da7cfcaf93836e
>>>>> Author: stephen hemminger <stephen@networkplumber.org>
>>>>> Date:   Sat Apr 27 11:31:57 2013 +0000
>>>>>
>>>>>       vxlan: allow choosing destination port per vxlan
>>>>>
>>>>> is broken revert it. The change allowed setting per port for transmit
>>>>> but did not add additional listening sockets, which made any vxlan's
>>>>> defined with non-default port send only.
>>>> This allows you to specify a different default port for
>>>> transmits, which is what you want to do if your own instance
>>>> of VXLAN is the odd one. I don't see any requirement for multiple
>>>> listen ports for that to be useful, since those sending to you
>>>> can have complete fdb tables even if the local instance doesn't
>>>> and relies on the default. Not to mention using an agent to
>>>> fill the fdb triggered by packets sent to the default, so the
>>>> receiver is not necessarily even a VXLAN instance. The receiver
>>>> side and transmit side ports can be completely independent of
>>>> each other, as in any other client-server system.
>>> Vxlan's are a weird beast. They can be viewed as either bridge like
>>> entities or tunnel like entities. I view them more as bridge type
>>> devices where user configures two hosts with equivalent values and
>>> they learn about each other. In that case the code in 3.10 is broken;
>>> but the version with the learning in net-next works.
>>>
>>> Your view is that VXLAN's are more like tunnels, where each host
>>> has static entries to know about every other host. In that mode,
>>> 3.10 is useable, but the same effect can be had by defining static
>>> neighbour entries.
>> how can we send to a different dst port using static neighbor entries?
>>
>> Thanks
>> Sridhar
> It is part of this commit in 3.10 already.
>
> commit 6681712d67eef14c4ce793561c3231659153a320
> Author: David Stevens <dlstevens@us.ibm.com>
> Date:   Fri Mar 15 04:35:51 2013 +0000
>
>      vxlan: generalize forwarding tables
>      
>      This patch generalizes VXLAN forwarding table entries allowing an administrator
>      to:
>          1) specify multiple destinations for a given MAC
>          2) specify alternate vni's in the VXLAN header
>          3) specify alternate destination UDP ports
>          4) use multicast MAC addresses as fdb lookup keys
>          5) specify multicast destinations
>          6) specify the outgoing interface for forwarded packets
>      
>
OK. I am aware of specifying dst port via static fdb(forwarding table) 
entries.
I was confused when you said neighbor entries.

Thanks
Sridhar

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller May 22, 2013, 10:08 p.m. UTC | #6
From: Stephen Hemminger <stephen@networkplumber.org>
Date: Mon, 20 May 2013 11:30:00 -0700

> On Mon, 20 May 2013 14:15:59 -0400
> David Stevens <dlstevens@us.ibm.com> wrote:
> 
>> I think an administrator should have full flexibility to specify
>> the ports and destinations as s/he sees fit and if you don't think
>> that's a useful feature, you don't have to use it; the defaults
>> work fine, too.
> 
> They can do that with neighbour entries.

David, please come to some kind of agreement with Stephen about what
we're going to do about this.

Thank you.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Stevens May 22, 2013, 11:18 p.m. UTC | #7
David Miller <davem@davemloft.net> wrote on 05/22/2013 06:08:30 PM:

> David, please come to some kind of agreement with Stephen about what
> we're going to do about this.

Well, I think being able to specify an alternate port for the
default fdb entry is a useful feature in its own right. It is
not equivalent to individual fdb entries with alternate ports
because those require the host to know all the destinations in
advance. It's certainly reasonable to use an alternate port for
the default entry as well, and is not "useless" without multiple
listen ports; it is as useful for the default port as it is for
individual fdb entries.

I'm not sure what the benefit of reverting any working feature
is, only to re-add it later, which I believe is what Stephen
is proposing. I think it is useful for some configurations now,
even if it only becomes useful for other configurations after
multiple listen ports.

Stephen, can you explain what the benefit of reverting this is?

I think we want it long term, but I can live without it
temporarily. I just don't see any reasoning for removing it
temporarily in the first place. It isn't broken, or breaking
anything else, after all.

                                                +-DLS

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stephen Hemminger May 23, 2013, 12:39 a.m. UTC | #8
On Wed, 22 May 2013 19:18:12 -0400
David Stevens <dlstevens@us.ibm.com> wrote:

> David Miller <davem@davemloft.net> wrote on 05/22/2013 06:08:30 PM:
> 
> > David, please come to some kind of agreement with Stephen about what
> > we're going to do about this.
> 
> Well, I think being able to specify an alternate port for the
> default fdb entry is a useful feature in its own right. It is
> not equivalent to individual fdb entries with alternate ports
> because those require the host to know all the destinations in
> advance. It's certainly reasonable to use an alternate port for
> the default entry as well, and is not "useless" without multiple
> listen ports; it is as useful for the default port as it is for
> individual fdb entries.
> 
> I'm not sure what the benefit of reverting any working feature
> is, only to re-add it later, which I believe is what Stephen
> is proposing. I think it is useful for some configurations now,
> even if it only becomes useful for other configurations after
> multiple listen ports.
> 
> Stephen, can you explain what the benefit of reverting this is?
> 
> I think we want it long term, but I can live without it
> temporarily. I just don't see any reasoning for removing it
> temporarily in the first place. It isn't broken, or breaking
> anything else, after all.
> 
>                                                 +-DLS
> 

There are two different config things here
  1. FDB - NDA_PORT
  2. VXLAN - IFLA_VXLAN_PORT

The FDB stuff is seperate and fine, Dave did a complete job on that.
I was confused, but it's fine.

The per-vxlan device stuff is the issue. I don't think it works
as is in 3.10 because there is no real way to use it to receive.
And don't want to let it out broken. By reverting that part, we avoid
raising false expectations.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Stevens May 23, 2013, 2:14 a.m. UTC | #9
Stephen Hemminger <stephen@networkplumber.org> wrote on 05/22/2013 
08:39:54 PM:

> The per-vxlan device stuff is the issue. I don't think it works
> as is in 3.10 because there is no real way to use it to receive.
> And don't want to let it out broken. By reverting that part, we avoid
> raising false expectations.
> 

Of course you can use it to receive; the remote hosts can be
listening on a different port and this is the only mechanism
to send to those if there are not fdb entries on the local machine.

So, for example, host A:
binds to 1234
has no fdb entries, but sets the default remote port to 4789

host B
binds to 4789
has no fdb entries and sets the default remote port to 1234

...or you have an agent listening on a multicast address with port 5429
and the agent's job is to dynamically populate the fdb tables of any
hosts that don't have matching entries. All the VXLAN instances bind
to, say 4789, and the agent binds to port 5429. The agent receives a
packet, and adds an entry to the fdb table on the sending host where
the destination is the actual VXLAN receiver with port 4789. But the
default port on all of them for this kind of set-up is correctly the
different 5429, not 4789.

These configurations have nothing to do with multiple listen ports,
and reverting the patch prohibits an admin from doing it at all.
In the first set up, you can link two segments on different ports
without knowing any destination MAC addresses. In the second, you
can separate traffic not matching any destination MAC from traffic
with a known destination, and use that to fill forwarding tables.

I know these configurations don't match your sense of how VXLAN
should be used, but these are not "broken", do receives perfectly fine,
and have no dependency at all on multiple listen ports. I don't know
what  "false expectations" you think are there-- I think anyone
changing the default destination port to a different value from
the binding port simply expects that transmits using the default
will go to one port and received packets will go to another. Deciding
in advance that that is never appropriate is short-sighted; that
decision belongs with the admin setting it up. Reverting the patch
removes that choice.


                                                        +-DLS

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stephen Hemminger May 23, 2013, 5:08 p.m. UTC | #10
On Wed, 22 May 2013 22:14:39 -0400
David Stevens <dlstevens@us.ibm.com> wrote:

> Stephen Hemminger <stephen@networkplumber.org> wrote on 05/22/2013 
> 08:39:54 PM:
> 
> > The per-vxlan device stuff is the issue. I don't think it works
> > as is in 3.10 because there is no real way to use it to receive.
> > And don't want to let it out broken. By reverting that part, we avoid
> > raising false expectations.
> > 
> 
> Of course you can use it to receive; the remote hosts can be
> listening on a different port and this is the only mechanism
> to send to those if there are not fdb entries on the local machine.
> 
> So, for example, host A:
> binds to 1234
> has no fdb entries, but sets the default remote port to 4789
> 
> host B
> binds to 4789
> has no fdb entries and sets the default remote port to 1234
> 
> ...or you have an agent listening on a multicast address with port 5429
> and the agent's job is to dynamically populate the fdb tables of any
> hosts that don't have matching entries. All the VXLAN instances bind
> to, say 4789, and the agent binds to port 5429. The agent receives a
> packet, and adds an entry to the fdb table on the sending host where
> the destination is the actual VXLAN receiver with port 4789. But the
> default port on all of them for this kind of set-up is correctly the
> different 5429, not 4789.
> 
> These configurations have nothing to do with multiple listen ports,
> and reverting the patch prohibits an admin from doing it at all.
> In the first set up, you can link two segments on different ports
> without knowing any destination MAC addresses. In the second, you
> can separate traffic not matching any destination MAC from traffic
> with a known destination, and use that to fill forwarding tables.
> 
> I know these configurations don't match your sense of how VXLAN
> should be used, but these are not "broken", do receives perfectly fine,
> and have no dependency at all on multiple listen ports. I don't know
> what  "false expectations" you think are there-- I think anyone
> changing the default destination port to a different value from
> the binding port simply expects that transmits using the default
> will go to one port and received packets will go to another. Deciding
> in advance that that is never appropriate is short-sighted; that
> decision belongs with the admin setting it up. Reverting the patch
> removes that choice.
> 
> 
>                                                         +-DLS
> 

Let's go back to basic use cases.
User wants to setup a vxlan between two hosts with custom port
   On both hosts, he/she does:
 # ip li add vxlan0 type vxlan id 42 group 239.1.1.1 dev eth0 dstport 4789

With net-next it works, with 3.10 it doesn't.

Sure it works if they have your agent, or manually configure N-M static FDB
entries, but that is corner case.

No matter if Dave decides to the revert the kernel part or not, I am pulling the option
from 3.10 version of iproute to avoid user confusion.




--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Stevens May 23, 2013, 6:45 p.m. UTC | #11
Stephen Hemminger <stephen@networkplumber.org> wrote on 05/23/2013 
01:08:54 PM:

> Let's go back to basic use cases.
> User wants to setup a vxlan between two hosts with custom port
>    On both hosts, he/she does:
>  # ip li add vxlan0 type vxlan id 42 group 239.1.1.1 dev eth0 dstport 
4789
> 
> With net-next it works, with 3.10 it doesn't.

This does not "work" in net-next, either. If you set the one-and-only bind
port to one port and the one and only send port to another port on both
systems, they can't talk to each other in net-next, either. If you 
separately
have it listen on multiple ports, and if potential non-linux systems 
you're
talking to can do that too, then you can configure it to work, but the 
extra
listen ports are entirely unnecessary.

You're starting with the assumption, which I disagree with, that the 
receive
and send ports need to be the same. In fact, for two systems listening on
different ports to talk to each other, they *must* be different. It 
doesn't
require multiple ports, it only requires the listen and send ports be 
*different*.

If I have two existing systems, one listening on port 8472 and one 
listening
on port 4789, and I want to connect them, how would you do that? Yes, you
*can* add individual MAC entries with different ports for all VMs 
connected
on both machines, but if you have 2 hosts and 10,000 MAC addresses, that's
a ridiculous waste. If you simply don't know all the MAC addresses on the
remote machine, you can't do it. But if you can set the destination port 
for
the default fdb entry, which is what dstport does, then you can do both.

The dstport and dst are simply the default entry for the fdb table, used 
if
there is no exact match in the fdb. They should have every capability to 
use
alternate data that the fdb table does.

> Sure it works if they have your agent, or manually configure N-M static 
FDB
> entries, but that is corner case.
> 
> No matter if Dave decides to the revert the kernel part or not, I am
> pulling the option
> from 3.10 version of iproute to avoid user confusion.

I don't know what confusion you're concerned about here. How can anyone
be confused by packets going to a different destination port when they use
an option "dstport XXXX"? (!?!?!?!?)

You can't really prohibit people from using VXLAN however they choose to.
It simply means people wanting to do more interesting things will
have to patch your code, jump through hoops to edit packets with 
netfilter,
or use alternative implementations that are more flexible. None of those
things benefits VXLAN on linux.

                                                                +-DLS

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stephen Hemminger May 23, 2013, 7:18 p.m. UTC | #12
On Thu, 23 May 2013 14:45:51 -0400
David Stevens <dlstevens@us.ibm.com> wrote:

> Stephen Hemminger <stephen@networkplumber.org> wrote on 05/23/2013 
> 01:08:54 PM:
> 
> > Let's go back to basic use cases.
> > User wants to setup a vxlan between two hosts with custom port
> >    On both hosts, he/she does:
> >  # ip li add vxlan0 type vxlan id 42 group 239.1.1.1 dev eth0 dstport 
> 4789
> > 
> > With net-next it works, with 3.10 it doesn't.
> 
> This does not "work" in net-next, either. If you set the one-and-only bind
> port to one port and the one and only send port to another port on both
> systems, they can't talk to each other in net-next, either. If you 
> separately
> have it listen on multiple ports, and if potential non-linux systems 
> you're
> talking to can do that too, then you can configure it to work, but the 
> extra
> listen ports are entirely unnecessary.

With the patch davem already included, the dstport is enough
to add additional listener.


> 
> You're starting with the assumption, which I disagree with, that the 
> receive
> and send ports need to be the same. In fact, for two systems listening on
> different ports to talk to each other, they *must* be different. It 
> doesn't
> require multiple ports, it only requires the listen and send ports be 
> *different*.


Another use case is:
 # ip link add vxlan-old type vxlan id 42 group 239.1.1.1 dev eth0
 # ip link add vxlan-new type vxlan id 42 group 239.1.1.1 dev eth0 dstport 4789

This also works with net-next and not with 3.10



> 
> If I have two existing systems, one listening on port 8472 and one 
> listening
> on port 4789, and I want to connect them, how would you do that?

You can either do:
 A. Define two vxlan's and route or bridge them
 B. Define one vxlan with a destination port and add exception FDB entry
    to talk to the outlier


 Yes, you
> *can* add individual MAC entries with different ports for all VMs 
> connected
> on both machines, but if you have 2 hosts and 10,000 MAC addresses, that's
> a ridiculous waste. If you simply don't know all the MAC addresses on the
> remote machine, you can't do it. But if you can set the destination port 
> for
> the default fdb entry, which is what dstport does, then you can do both.

User's may reasonably want to define multiple overlay VXLAN's each
on a different port, or create one VXLAN and add excptions.


> 
> The dstport and dst are simply the default entry for the fdb table, used 
> if
> there is no exact match in the fdb. They should have every capability to 
> use
> alternate data that the fdb table does.
> 
> > Sure it works if they have your agent, or manually configure N-M static 
> FDB
> > entries, but that is corner case.
> > 
> > No matter if Dave decides to the revert the kernel part or not, I am
> > pulling the option
> > from 3.10 version of iproute to avoid user confusion.
> 
> I don't know what confusion you're concerned about here. How can anyone
> be confused by packets going to a different destination port when they use
> an option "dstport XXXX"? (!?!?!?!?)

The dstport option is not in a released kernel or iproute.

> 
> You can't really prohibit people from using VXLAN however they choose to.
> It simply means people wanting to do more interesting things will
> have to patch your code, jump through hoops to edit packets with 
> netfilter,
> or use alternative implementations that are more flexible. None of those
> things benefits VXLAN on linux.
> 
>                                                                 +-DLS
> 

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Stevens May 23, 2013, 8:06 p.m. UTC | #13
Stephen Hemminger <stephen@networkplumber.org> wrote on 05/23/2013 
03:18:04 PM:

 
> With the patch davem already included, the dstport is enough
> to add additional listener.

        If you're saying that using the dstport changes the
listen port, or adds another listen port, then I think that
behaviour is wrong and should be reverted.
        An admin should be able to specify the source and destination
ports independently of each other. If dstport has a side-effect that
is unrelated to changing the destination port, that's what I'd call
"confusing."
        IMHO, "port" should change the listen port (only) and "dstport"
should change the send port (only). And yes, both of those should allow
multiple ports, and destinations. So, binding should be a list of
the form: "[IP:]port[,[IP:]port]*" and destinations should be the same
as in the fdb, allowing multiple destinations and different ports, and
different vni's. It should be simply a "default" fdb entry in all 
respects.

> Another use case is:
>  # ip link add vxlan-old type vxlan id 42 group 239.1.1.1 dev eth0
>  # ip link add vxlan-new type vxlan id 42 group 239.1.1.1 dev eth0 
> dstport 4789
> 
> This also works with net-next and not with 3.10

        Yes, I wholeheartedly approve of this feature. It simply is
not the only use case for wanting different destination ports, so is
not linked to the dstport feature, in my opinion.

> > If I have two existing systems, one listening on port 8472 and one 
> > listening
> > on port 4789, and I want to connect them, how would you do that?
> 
> You can either do:
>  A. Define two vxlan's and route or bridge them
>  B. Define one vxlan with a destination port and add exception FDB entry
>     to talk to the outlier

        So, you're requiring an extra bind port (and the specific port the
remote device is using may not be available on the local machine) and an
extra hop through a bridge, when all you really want as an admin is to say
"listen on this port, send on that port" without that also meaning you
have to know and specify the MAC address of every possible remote VM.

> User's may reasonably want to define multiple overlay VXLAN's each
> on a different port, or create one VXLAN and add excptions.

        Sure. I think multiple listen ports is a good idea, but it is
not the only reason for wanting a different destination port, and not
a reason to link destination and listen ports.
 
> The dstport option is not in a released kernel or iproute.

I don't think we're going to get consensus here. If you intend the
artificial requirement that the bind and listen ports be the same,
and that admins need to jump through hoops to get the reasonable
effect of making them different, then I simply disagree with you.

                                                                +-DLS



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sridhar Samudrala May 23, 2013, 10:35 p.m. UTC | #14
On 5/23/2013 1:06 PM, David Stevens wrote:
> Stephen Hemminger <stephen@networkplumber.org> wrote on 05/23/2013
> 03:18:04 PM:
>
>   
>> With the patch davem already included, the dstport is enough
>> to add additional listener.
>          If you're saying that using the dstport changes the
> listen port, or adds another listen port, then I think that
> behaviour is wrong and should be reverted.
I agree that using 'dstport' option to also create a socket and binding 
to that port for
receives is confusing. As the name suggests, it should only be used as a 
default dst
port for fdb entries.

>          An admin should be able to specify the source and destination
> ports independently of each other. If dstport has a side-effect that
> is unrelated to changing the destination port, that's what I'd call
> "confusing."
>          IMHO, "port" should change the listen port (only) and "dstport"
> should change the send port (only). And yes, both of those should allow
> multiple ports, and destinations. So, binding should be a list of
> the form: "[IP:]port[,[IP:]port]*" and destinations should be the same
> as in the fdb, allowing multiple destinations and different ports, and
> different vni's. It should be simply a "default" fdb entry in all
> respects.
Currently 'port' option takes 2 values that indicate the range of ports 
that can be used as
source port when sending vxlan packets.

So we don't have a good way to specify listening port when creating a 
vxlan device using
the existing options.

It may be a good idea to revert dstport in linux-3.10 and  multiple 
listening ports patch in
net-next and re-implement them with 2 different options that can take a 
list of ports/addresses
as David suggested.

Thanks
Sridhar
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller June 3, 2013, 5:40 a.m. UTC | #15
I'm deciding to keep the patch in, and not revert.

A very concise and incredibly strong argument will need to be
given to me to change my mind.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stephen Hemminger June 3, 2013, 3:15 p.m. UTC | #16
On Sun, 02 Jun 2013 22:40:37 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:

> 
> I'm deciding to keep the patch in, and not revert.
> 
> A very concise and incredibly strong argument will need to be
> given to me to change my mind.
> 
> Thanks.

Ok, I decided to drop the ability to configure this in 3.10 iproute2 because
the behaviour changes in 3.11
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

--- a/drivers/net/vxlan.c	2013-05-20 08:13:49.078440463 -0700
+++ b/drivers/net/vxlan.c	2013-05-20 09:31:32.752105130 -0700
@@ -110,7 +110,6 @@  struct vxlan_dev {
 	struct net_device *dev;
 	struct vxlan_rdst default_dst;	/* default destination */
 	__be32		  saddr;	/* source address */
-	__be16		  dst_port;
 	__u16		  port_min;	/* source port range */
 	__u16		  port_max;
 	__u8		  tos;		/* TOS override */
@@ -193,7 +192,7 @@  static int vxlan_fdb_info(struct sk_buff
 	if (send_ip && nla_put_be32(skb, NDA_DST, rdst->remote_ip))
 		goto nla_put_failure;
 
-	if (rdst->remote_port && rdst->remote_port != vxlan->dst_port &&
+	if (rdst->remote_port && rdst->remote_port != htons(vxlan_port) &&
 	    nla_put_be16(skb, NDA_PORT, rdst->remote_port))
 		goto nla_put_failure;
 	if (rdst->remote_vni != vxlan->default_dst.remote_vni &&
@@ -480,7 +479,7 @@  static int vxlan_fdb_add(struct ndmsg *n
 			return -EINVAL;
 		port = nla_get_be16(tb[NDA_PORT]);
 	} else
-		port = vxlan->dst_port;
+		port = htons(vxlan_port);
 
 	if (tb[NDA_VNI]) {
 		if (nla_len(tb[NDA_VNI]) != sizeof(u32))
@@ -591,7 +590,7 @@  static void vxlan_snoop(struct net_devic
 		err = vxlan_fdb_create(vxlan, src_mac, src_ip,
 				       NUD_REACHABLE,
 				       NLM_F_EXCL|NLM_F_CREATE,
-				       vxlan->dst_port,
+				       vxlan_port,
 				       vxlan->default_dst.remote_vni,
 				       0, NTF_SELF);
 		spin_unlock(&vxlan->hash_lock);
@@ -982,7 +981,7 @@  static netdev_tx_t vxlan_xmit_one(struct
 	__be16 df = 0;
 	__u8 tos, ttl;
 
-	dst_port = rdst->remote_port ? rdst->remote_port : vxlan->dst_port;
+	dst_port = rdst->remote_port ? rdst->remote_port : htons(vxlan_port);
 	vni = rdst->remote_vni;
 	dst = rdst->remote_ip;
 
@@ -1326,7 +1325,6 @@  static void vxlan_setup(struct net_devic
 	inet_get_local_port_range(&low, &high);
 	vxlan->port_min = low;
 	vxlan->port_max = high;
-	vxlan->dst_port = htons(vxlan_port);
 
 	vxlan->dev = dev;
 
@@ -1349,7 +1347,6 @@  static const struct nla_policy vxlan_pol
 	[IFLA_VXLAN_RSC]	= { .type = NLA_U8 },
 	[IFLA_VXLAN_L2MISS]	= { .type = NLA_U8 },
 	[IFLA_VXLAN_L3MISS]	= { .type = NLA_U8 },
-	[IFLA_VXLAN_PORT]	= { .type = NLA_U16 },
 };
 
 static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[])
@@ -1479,9 +1476,6 @@  static int vxlan_newlink(struct net *net
 		vxlan->port_max = ntohs(p->high);
 	}
 
-	if (data[IFLA_VXLAN_PORT])
-		vxlan->dst_port = nla_get_be16(data[IFLA_VXLAN_PORT]);
-
 	SET_ETHTOOL_OPS(dev, &vxlan_ethtool_ops);
 
 	err = register_netdevice(dev);
@@ -1517,7 +1511,6 @@  static size_t vxlan_get_size(const struc
 		nla_total_size(sizeof(__u32)) +	/* IFLA_VXLAN_AGEING */
 		nla_total_size(sizeof(__u32)) +	/* IFLA_VXLAN_LIMIT */
 		nla_total_size(sizeof(struct ifla_vxlan_port_range)) +
-		nla_total_size(sizeof(__be16))+ /* IFLA_VXLAN_PORT */
 		0;
 }
 
@@ -1554,8 +1547,7 @@  static int vxlan_fill_info(struct sk_buf
 	    nla_put_u8(skb, IFLA_VXLAN_L3MISS,
 			!!(vxlan->flags & VXLAN_F_L3MISS)) ||
 	    nla_put_u32(skb, IFLA_VXLAN_AGEING, vxlan->age_interval) ||
-	    nla_put_u32(skb, IFLA_VXLAN_LIMIT, vxlan->addrmax) ||
-	    nla_put_be16(skb, IFLA_VXLAN_PORT, vxlan->dst_port))
+	    nla_put_u32(skb, IFLA_VXLAN_LIMIT, vxlan->addrmax))
 		goto nla_put_failure;
 
 	if (nla_put(skb, IFLA_VXLAN_PORT_RANGE, sizeof(ports), &ports))