diff mbox

bonding: Allow tun-interfaces as slaves

Message ID 20160808214812.GJ22974@cork
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Jörn Engel Aug. 8, 2016, 9:48 p.m. UTC
Up until 00503b6f702e (part of 3.14-rc1), the bonding driver could be
used to enslave tun-interfaces.  00503b6f702e broke that behaviour,
afaics as an unintended side-effect.

For the purpose of bond-over-tun in balance-rr mode, simply ignoring the
error from dev_set_mac_address() is good enough.  I am not familiar
enough with the code to judge what new problems this patch might
introduce.

Signed-off-by: Joern Engel <joern@purestorage.com>
---
 drivers/net/bonding/bond_main.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

Comments

Ding Tianhong Aug. 9, 2016, 2:18 a.m. UTC | #1
On 2016/8/9 5:48, Jörn Engel wrote:
> Up until 00503b6f702e (part of 3.14-rc1), the bonding driver could be
> used to enslave tun-interfaces.  00503b6f702e broke that behaviour,
> afaics as an unintended side-effect.
> 

Hi Jorn:

I don't understand your problem clearly, can you explain more about how the 00503b6f702e break tun-interfaces
and we will try to fix it.

and more, dev_set_mac_address will change the salver's mac address, some nic don't support to change the mac address and
could not work as bond slave, so we need to check the return value, I don't think this patch has any effective improvement.

Thanks.
Ding

> For the purpose of bond-over-tun in balance-rr mode, simply ignoring the
> error from dev_set_mac_address() is good enough.  I am not familiar
> enough with the code to judge what new problems this patch might
> introduce.
> 
> Signed-off-by: Joern Engel <joern@purestorage.com>
> ---
>  drivers/net/bonding/bond_main.c | 7 +------
>  1 file changed, 1 insertion(+), 6 deletions(-)
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 1f276fa30ba6..bc5dba847f50 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -1489,11 +1489,7 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
>  		 */
>  		memcpy(addr.sa_data, bond_dev->dev_addr, bond_dev->addr_len);
>  		addr.sa_family = slave_dev->type;
> -		res = dev_set_mac_address(slave_dev, &addr);
> -		if (res) {
> -			netdev_dbg(bond_dev, "Error %d calling set_mac_address\n", res);
> -			goto err_restore_mtu;
> -		}
> +		dev_set_mac_address(slave_dev, &addr);
>  	}
>  
>  	/* set slave flag before open to prevent IPv6 addrconf */
> @@ -1777,7 +1773,6 @@ err_restore_mac:
>  		dev_set_mac_address(slave_dev, &addr);
>  	}
>  
> -err_restore_mtu:
>  	dev_set_mtu(slave_dev, new_slave->original_mtu);
>  
>  err_free:
>
Jörn Engel Aug. 9, 2016, 3:09 a.m. UTC | #2
Hello Tianhong!

On Tue, Aug 09, 2016 at 10:18:41AM +0800, Ding Tianhong wrote:
> 
> I don't understand your problem clearly, can you explain more about how the 00503b6f702e break tun-interfaces
> and we will try to fix it.

Here is a trivial testcase:
openvpn --mktun --dev tun0
echo +tun0 > /sys/class/net/bond0/bonding/slaves

Worked fine before your patch, no longer works after your patch.  Works
again after my patch.

> and more, dev_set_mac_address will change the salver's mac address, some nic don't support to change the mac address and
> could not work as bond slave, so we need to check the return value, I don't think this patch has any effective improvement.

Using bonding in balance-rr mode, there doesn't seem to be a need to
change the mac address.  I suppose you might care in other modes, but I
don't.

Jörn

--
Time? What's that? Time is only worth what you do with it.
-- Theo de Raadt
Zhu Yanjun Aug. 9, 2016, 5:29 a.m. UTC | #3
Can we check slave_ops->ndo_set_mac_address?

1476         if ((slave_ops->ndo_set_mac_address) &&
(!bond->params.fail_over_mac ||
1477             BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)) {
1478                 /* Set slave to master's mac address. The
application already
1479 * set the master's mac address to that of the first slave
1480 */
1481                 memcpy(addr.sa_data, bond_dev->dev_addr,
bond_dev->addr_len);
1482                 addr.sa_family = slave_dev->type;
1483                 res = dev_set_mac_address(slave_dev, &addr);
1484                 if (res) {
1485                         netdev_dbg(bond_dev, "Error %d calling
set_mac_address\n", res);
1486                         goto err_restore_mtu;
1487                 }
1488         }

On Tue, Aug 9, 2016 at 11:09 AM, Jörn Engel <joern@purestorage.com> wrote:
> Hello Tianhong!
>
> On Tue, Aug 09, 2016 at 10:18:41AM +0800, Ding Tianhong wrote:
>>
>> I don't understand your problem clearly, can you explain more about how the 00503b6f702e break tun-interfaces
>> and we will try to fix it.
>
> Here is a trivial testcase:
> openvpn --mktun --dev tun0
> echo +tun0 > /sys/class/net/bond0/bonding/slaves
>
> Worked fine before your patch, no longer works after your patch.  Works
> again after my patch.
>
>> and more, dev_set_mac_address will change the salver's mac address, some nic don't support to change the mac address and
>> could not work as bond slave, so we need to check the return value, I don't think this patch has any effective improvement.
>
> Using bonding in balance-rr mode, there doesn't seem to be a need to
> change the mac address.  I suppose you might care in other modes, but I
> don't.
>
> Jörn
>
> --
> Time? What's that? Time is only worth what you do with it.
> -- Theo de Raadt
Ding Tianhong Aug. 9, 2016, 5:52 a.m. UTC | #4
On 2016/8/9 11:09, Jörn Engel wrote:
> Hello Tianhong!
> 
> On Tue, Aug 09, 2016 at 10:18:41AM +0800, Ding Tianhong wrote:
>>
>> I don't understand your problem clearly, can you explain more about how the 00503b6f702e break tun-interfaces
>> and we will try to fix it.
> 
> Here is a trivial testcase:
> openvpn --mktun --dev tun0
> echo +tun0 > /sys/class/net/bond0/bonding/slaves
> 
> Worked fine before your patch, no longer works after your patch.  Works
> again after my patch.
> 
Hi Jorn:

I check the code and know the reason, the Tun device(or something like tun...) didn't has the ndo_set_mac_address, so will not add to bond
as a slaver device in the mode rr, I think the original logic is fine for bond enslave processing, the old kernel just avoid this problem and not
fix it, the bond need to considerate the virtual device which not support changing mac.

>> and more, dev_set_mac_address will change the salver's mac address, some nic don't support to change the mac address and
>> could not work as bond slave, so we need to check the return value, I don't think this patch has any effective improvement.
> 
> Using bonding in balance-rr mode, there doesn't seem to be a need to
> change the mac address.  I suppose you might care in other modes, but I
> don't.
> 
> Jörn
> 
> --
> Time? What's that? Time is only worth what you do with it.
> -- Theo de Raadt
> 
>
Ding Tianhong Aug. 9, 2016, 1:28 p.m. UTC | #5
On 2016/8/9 13:29, zhuyj wrote:
> Can we check slave_ops->ndo_set_mac_address?
> 
> 1476         if ((slave_ops->ndo_set_mac_address) &&
> (!bond->params.fail_over_mac ||
> 1477             BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)) {
> 1478                 /* Set slave to master's mac address. The
> application already
> 1479 * set the master's mac address to that of the first slave
> 1480 */
> 1481                 memcpy(addr.sa_data, bond_dev->dev_addr,
> bond_dev->addr_len);
> 1482                 addr.sa_family = slave_dev->type;
> 1483                 res = dev_set_mac_address(slave_dev, &addr);
> 1484                 if (res) {
> 1485                         netdev_dbg(bond_dev, "Error %d calling
> set_mac_address\n", res);
> 1486                         goto err_restore_mtu;
> 1487                 }
> 1488         }
> 

This patch is a simple solution for this problem, but I don't think it is the right solution, the bond is a virtual device base on L2,
if the slave has no mac address, it will break the design principle, so we need to think more about it.

I think if the bonding dev has to support L3 virtual device, we need to add new bond features to distinguish the dev and make the
bond xmit and transfer without the mac address.

Thanks
Ding

> On Tue, Aug 9, 2016 at 11:09 AM, Jörn Engel <joern@purestorage.com> wrote:
>> Hello Tianhong!
>>
>> On Tue, Aug 09, 2016 at 10:18:41AM +0800, Ding Tianhong wrote:
>>>
>>> I don't understand your problem clearly, can you explain more about how the 00503b6f702e break tun-interfaces
>>> and we will try to fix it.
>>
>> Here is a trivial testcase:
>> openvpn --mktun --dev tun0
>> echo +tun0 > /sys/class/net/bond0/bonding/slaves
>>
>> Worked fine before your patch, no longer works after your patch.  Works
>> again after my patch.
>>
>>> and more, dev_set_mac_address will change the salver's mac address, some nic don't support to change the mac address and
>>> could not work as bond slave, so we need to check the return value, I don't think this patch has any effective improvement.
>>
>> Using bonding in balance-rr mode, there doesn't seem to be a need to
>> change the mac address.  I suppose you might care in other modes, but I
>> don't.
>>
>> Jörn
>>
>> --
>> Time? What's that? Time is only worth what you do with it.
>> -- Theo de Raadt
> 
> .
>
Jörn Engel Aug. 9, 2016, 6:08 p.m. UTC | #6
On Tue, Aug 09, 2016 at 09:28:45PM +0800, Ding Tianhong wrote:
> 
> This patch is a simple solution for this problem, but I don't think it is the right solution, the bond is a virtual device base on L2,
> if the slave has no mac address, it will break the design principle, so we need to think more about it.

The important point is: it worked.  It solved a problem that at least
three people cared enough about to send a bug report.

Now it doesn't work anymore.  That is a regression.

Whether or not L2 has always been a design principle for bonding can be
argued as well.  But in the face of a regression, I suggest we fix the
regression.

> I think if the bonding dev has to support L3 virtual device, we need to add new bond features to distinguish the dev and make the
> bond xmit and transfer without the mac address.

Simply not checking errors when setting the mac address solves the
problem for me.  No new features needed.

If you want to retain error handling, you can make those checks
conditional on the mode.  In balance-rr or broadcast mode, ignore the
error.  I don't need and haven't tested broadcast mode, but it doesn't
seem to depend on any L2 attributes either.

Jörn

--
Do not stop an army on its way home.
-- Sun Tzu
Jay Vosburgh Aug. 9, 2016, 6:21 p.m. UTC | #7
Jörn Engel <joern@purestorage.com> wrote:

>Hello Tianhong!
>
>On Tue, Aug 09, 2016 at 10:18:41AM +0800, Ding Tianhong wrote:
>> 
>> I don't understand your problem clearly, can you explain more about how the 00503b6f702e break tun-interfaces
>> and we will try to fix it.
>
>Here is a trivial testcase:
>openvpn --mktun --dev tun0
>echo +tun0 > /sys/class/net/bond0/bonding/slaves
>
>Worked fine before your patch, no longer works after your patch.  Works
>again after my patch.

	Could you describe your use case a bit further?  Are you bonding
together multiple VPN tunnels?

	This may be a regression, but since the patch that nominally
introduced it was 2 years ago, the impact appears to be very narrow.

>> and more, dev_set_mac_address will change the salver's mac address, some nic don't support to change the mac address and
>> could not work as bond slave, so we need to check the return value, I don't think this patch has any effective improvement.
>
>Using bonding in balance-rr mode, there doesn't seem to be a need to
>change the mac address.  I suppose you might care in other modes, but I
>don't.

	The balance-rr mode (as well as the -xor mode) is designed to
interoperate with a Cisco Etherchannel-style static link aggregation,
which requires all members to have the same MAC address for proper
function.

	Now, the above notwithstanding, I don't have an issue if you
want to bond together a couple of tun devices and can make it work.
However, for the standard balance-rr case, the enslavement must fail if
the call to set the slave MAC fails, as permitting the slave into the
bond after set-MAC fails can result in a bond that silently loses
packets.

	My tentative suggestion is that we extened fail_over_mac to
cover additional modes, as a sort of "I really know what I'm doing"
flag, and allow the enslavement to succeed when it is set.  This would
require setting an additional bonding option for this situation, but
that doesn't seem to be a undue burden as this looks to be a niche use
case.

	-J

---
	-Jay Vosburgh, jay.vosburgh@canonical.com
Jörn Engel Aug. 9, 2016, 6:40 p.m. UTC | #8
On Tue, Aug 09, 2016 at 11:21:31AM -0700, Jay Vosburgh wrote:
> Jörn Engel <joern@purestorage.com> wrote:
> >On Tue, Aug 09, 2016 at 10:18:41AM +0800, Ding Tianhong wrote:
> >> 
> >> I don't understand your problem clearly, can you explain more about how the 00503b6f702e break tun-interfaces
> >> and we will try to fix it.
> >
> >Here is a trivial testcase:
> >openvpn --mktun --dev tun0
> >echo +tun0 > /sys/class/net/bond0/bonding/slaves
> >
> >Worked fine before your patch, no longer works after your patch.  Works
> >again after my patch.
> 
> 	Could you describe your use case a bit further?  Are you bonding
> together multiple VPN tunnels?

Yes.  Specificaly I use "ssh -w" to create tunnels.  Ssh is
single-threaded, so the tunnel is too slow.  Aggregate a bunch and you
get closer to link speed.

Alternative would be pfSense.  Afaics that easily beats anything Linux
can offer.  I'm just more familiar with Linux and trust ssh security
more than most alternatives.

> 	This may be a regression, but since the patch that nominally
> introduced it was 2 years ago, the impact appears to be very narrow.

Did you check the dates on the other two bug reports?  Anyone
experiencing the problem and checking google will come to the conclusion
that you don't care and not bother sending yet another bug report.  You
then come to the conclusion that users don't care.

> >> and more, dev_set_mac_address will change the salver's mac address, some nic don't support to change the mac address and
> >> could not work as bond slave, so we need to check the return value, I don't think this patch has any effective improvement.
> >
> >Using bonding in balance-rr mode, there doesn't seem to be a need to
> >change the mac address.  I suppose you might care in other modes, but I
> >don't.
> 
> 	The balance-rr mode (as well as the -xor mode) is designed to
> interoperate with a Cisco Etherchannel-style static link aggregation,
> which requires all members to have the same MAC address for proper
> function.

Linux was designed to be a terminal for dialup to a university in
Helsinki, if memory serves.  Sometimes it is a good thing to work in
ways the design never intended.

Jörn

--
A defeated army first battles and then seeks victory.
-- Sun Tzu
David Miller Aug. 9, 2016, 7:06 p.m. UTC | #9
From: Jörn Engel <joern@purestorage.com>
Date: Tue, 9 Aug 2016 11:08:30 -0700

> On Tue, Aug 09, 2016 at 09:28:45PM +0800, Ding Tianhong wrote:
>> 
>> I think if the bonding dev has to support L3 virtual device, we need to add new bond features to distinguish the dev and make the
>> bond xmit and transfer without the mac address.
> 
> Simply not checking errors when setting the mac address solves the
> problem for me.  No new features needed.

But it only works in certain modes.

So the best we can do is enforce the MAC address setting in the
modes that absolutely require it.  We cannot ignore the MAC
address setting unilaterally.
David Miller Aug. 9, 2016, 7:10 p.m. UTC | #10
From: Jörn Engel <joern@purestorage.com>
Date: Tue, 9 Aug 2016 11:40:57 -0700

> On Tue, Aug 09, 2016 at 11:21:31AM -0700, Jay Vosburgh wrote:
>> 	The balance-rr mode (as well as the -xor mode) is designed to
>> interoperate with a Cisco Etherchannel-style static link aggregation,
>> which requires all members to have the same MAC address for proper
>> function.
> 
> Linux was designed to be a terminal for dialup to a university in
> Helsinki, if memory serves.  Sometimes it is a good thing to work in
> ways the design never intended.

You're not addressing the issue Jay is trying to make you aware of
in a useful way.

You state that Jay doesn't want to help you, but your comment here
shows that you really aren't exactly participating in the most
positive manner either.
diff mbox

Patch

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 1f276fa30ba6..bc5dba847f50 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1489,11 +1489,7 @@  int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 		 */
 		memcpy(addr.sa_data, bond_dev->dev_addr, bond_dev->addr_len);
 		addr.sa_family = slave_dev->type;
-		res = dev_set_mac_address(slave_dev, &addr);
-		if (res) {
-			netdev_dbg(bond_dev, "Error %d calling set_mac_address\n", res);
-			goto err_restore_mtu;
-		}
+		dev_set_mac_address(slave_dev, &addr);
 	}
 
 	/* set slave flag before open to prevent IPv6 addrconf */
@@ -1777,7 +1773,6 @@  err_restore_mac:
 		dev_set_mac_address(slave_dev, &addr);
 	}
 
-err_restore_mtu:
 	dev_set_mtu(slave_dev, new_slave->original_mtu);
 
 err_free: