diff mbox series

bonding: fix arp_validate toggling in active-backup mode

Message ID 20190510215709.19162-1-jarod@redhat.com
State Accepted
Delegated to: David Miller
Series bonding: fix arp_validate toggling in active-backup mode

Commit Message

Jarod Wilson May 10, 2019, 9:57 p.m. UTC
There's currently a problem with toggling arp_validate on and off with an
active-backup bond. At the moment, you can start up a bond, like so:

modprobe bonding mode=1 arp_interval=100 arp_validate=0 arp_ip_target=192.168.1.1
ip link set bond0 down
echo "ens4f0" > /sys/class/net/bond0/bonding/slaves
echo "ens4f1" > /sys/class/net/bond0/bonding/slaves
ip link set bond0 up
ip addr add 192.168.1.2/24 dev bond0

Pings to 192.168.1.1 work just fine. Now turn on arp_validate:

echo 1 > /sys/class/net/bond0/bonding/arp_validate

Pings to 192.168.1.1 continue to work just fine. Now when you go to turn
arp_validate off again, the link falls flat on its face:

echo 0 > /sys/class/net/bond0/bonding/arp_validate
dmesg
...
[133191.911987] bond0: Setting arp_validate to none (0)
[133194.257793] bond0: bond_should_notify_peers: slave ens4f0
[133194.258031] bond0: link status definitely down for interface ens4f0, disabling it
[133194.259000] bond0: making interface ens4f1 the new active one
[133197.330130] bond0: link status definitely down for interface ens4f1, disabling it
[133197.331191] bond0: now running without any active interface!

The problem lies in bond_options.c, where passing in arp_validate=0
results in bond->recv_probe getting set to NULL. This flies directly in
the face of commit 3fe68df97c7f, which says we need to set recv_probe =
bond_arp_rcv, even if we're not using arp_validate. Said commit fixed
this in bond_option_arp_interval_set, but missed that we can get to that
same state in bond_option_arp_validate_set as well.
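
To illustrate why a NULL recv_probe ends in "link status definitely down",
here is a rough userspace sketch, not the driver code. Every model_* name
and the timeout value are made up for the illustration. The one assumption
carried over from the driver is that the receive hook is what refreshes the
per-slave receive timestamp that the ARP monitor later checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel structures. */
struct model_slave {
	unsigned long last_rx;	/* time of the last frame credited to this slave */
};

struct model_bond;
typedef void (*model_probe_fn)(struct model_bond *bond,
			       struct model_slave *slave, unsigned long now);

struct model_bond {
	model_probe_fn recv_probe;	/* NULL means "no receive hook installed" */
};

/* Stand-in for bond_arp_rcv: credit the slave with received traffic. */
static void model_arp_rcv(struct model_bond *bond, struct model_slave *slave,
			  unsigned long now)
{
	(void)bond;
	slave->last_rx = now;
}

/* Stand-in for the receive path: the hook only runs when it is set. */
static void model_handle_frame(struct model_bond *bond,
			       struct model_slave *slave, unsigned long now)
{
	assert(bond != NULL && slave != NULL);
	if (bond->recv_probe)
		bond->recv_probe(bond, slave, now);
}

/* Stand-in for the monitor: a slave is up if it saw traffic recently. */
static bool model_slave_is_up(const struct model_slave *slave,
			      unsigned long now, unsigned long timeout)
{
	return now - slave->last_rx <= timeout;
}

/* Deliver one frame per tick for ticks 1..ticks, then ask the monitor. */
static bool model_run(model_probe_fn probe, unsigned long ticks,
		      unsigned long timeout)
{
	struct model_slave slave = { .last_rx = 0 };
	struct model_bond bond = { .recv_probe = probe };
	unsigned long now;

	for (now = 1; now <= ticks; now++)
		model_handle_frame(&bond, &slave, now);
	return model_slave_is_up(&slave, ticks, timeout);
}
```

With the hook installed, model_run(model_arp_rcv, 10, 3) reports the slave
as up. With model_run(NULL, 10, 3) the same traffic arrives but is never
credited, so the timestamp goes stale and the slave is reported down.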

One solution would be to universally set recv_probe = bond_arp_rcv here
as well, but I don't think bond_option_arp_validate_set has any business
touching recv_probe at all, and that should be left to the arp_interval
code, so we can just make things much tidier here.

Fixes: 3fe68df97c7f ("bonding: always set recv_probe to bond_arp_rcv in arp monitor")
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
---
 drivers/net/bonding/bond_options.c | 7 -------
 1 file changed, 7 deletions(-)

Comments

Jay Vosburgh May 10, 2019, 10:53 p.m. UTC | #1
Jarod Wilson <jarod@redhat.com> wrote:

>There's currently a problem with toggling arp_validate on and off with an
>active-backup bond. At the moment, you can start up a bond, like so:
>
>modprobe bonding mode=1 arp_interval=100 arp_validate=0 arp_ip_target=192.168.1.1
>ip link set bond0 down
>echo "ens4f0" > /sys/class/net/bond0/bonding/slaves
>echo "ens4f1" > /sys/class/net/bond0/bonding/slaves
>ip link set bond0 up
>ip addr add 192.168.1.2/24 dev bond0
>
>Pings to 192.168.1.1 work just fine. Now turn on arp_validate:
>
>echo 1 > /sys/class/net/bond0/bonding/arp_validate
>
>Pings to 192.168.1.1 continue to work just fine. Now when you go to turn
>arp_validate off again, the link falls flat on its face:
>
>echo 0 > /sys/class/net/bond0/bonding/arp_validate
>dmesg
>...
>[133191.911987] bond0: Setting arp_validate to none (0)
>[133194.257793] bond0: bond_should_notify_peers: slave ens4f0
>[133194.258031] bond0: link status definitely down for interface ens4f0, disabling it
>[133194.259000] bond0: making interface ens4f1 the new active one
>[133197.330130] bond0: link status definitely down for interface ens4f1, disabling it
>[133197.331191] bond0: now running without any active interface!
>
>The problem lies in bond_options.c, where passing in arp_validate=0
>results in bond->recv_probe getting set to NULL. This flies directly in
>the face of commit 3fe68df97c7f, which says we need to set recv_probe =
>bond_arp_rcv, even if we're not using arp_validate. Said commit fixed
>this in bond_option_arp_interval_set, but missed that we can get to that
>same state in bond_option_arp_validate_set as well.
>
>One solution would be to universally set recv_probe = bond_arp_rcv here
>as well, but I don't think bond_option_arp_validate_set has any business
>touching recv_probe at all, and that should be left to the arp_interval
>code, so we can just make things much tidier here.
>
>Fixes: 3fe68df97c7f ("bonding: always set recv_probe to bond_arp_rcv in arp monitor")

	Is the above Fixes: tag correct?  3fe68df97c7f is not the source
of the erroneous logic being removed, which was introduced by

commit 29c4948293bfc426e52a921f4259eb3676961e81
Author: sfeldma@cumulusnetworks.com <sfeldma@cumulusnetworks.com>
Date:   Thu Dec 12 14:10:38 2013 -0800

    bonding: add arp_validate netlink support

	Regardless of which Fixes: is correct, the patch itself looks
fine to me:

Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>

	-J


>CC: Jay Vosburgh <j.vosburgh@gmail.com>
>CC: Veaceslav Falico <vfalico@gmail.com>
>CC: Andy Gospodarek <andy@greyhouse.net>
>CC: "David S. Miller" <davem@davemloft.net>
>CC: netdev@vger.kernel.org
>Signed-off-by: Jarod Wilson <jarod@redhat.com>
>---
> drivers/net/bonding/bond_options.c | 7 -------
> 1 file changed, 7 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
>index da1fc17295d9..b996967af8d9 100644
>--- a/drivers/net/bonding/bond_options.c
>+++ b/drivers/net/bonding/bond_options.c
>@@ -1098,13 +1098,6 @@ static int bond_option_arp_validate_set(struct bonding *bond,
> {
> 	netdev_dbg(bond->dev, "Setting arp_validate to %s (%llu)\n",
> 		   newval->string, newval->value);
>-
>-	if (bond->dev->flags & IFF_UP) {
>-		if (!newval->value)
>-			bond->recv_probe = NULL;
>-		else if (bond->params.arp_interval)
>-			bond->recv_probe = bond_arp_rcv;
>-	}
> 	bond->params.arp_validate = newval->value;
> 
> 	return 0;
>-- 
>2.20.1
>

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

Jarod Wilson May 11, 2019, 6:12 a.m. UTC | #2
On 5/10/19 6:53 PM, Jay Vosburgh wrote:
> Jarod Wilson <jarod@redhat.com> wrote:
> 
>> ...
>> Fixes: 3fe68df97c7f ("bonding: always set recv_probe to bond_arp_rcv in arp monitor")
> 
> 	Is the above Fixes: tag correct?  3fe68df97c7f is not the source
> of the erroneous logic being removed, which was introduced by
> 
> commit 29c4948293bfc426e52a921f4259eb3676961e81
> Author: sfeldma@cumulusnetworks.com <sfeldma@cumulusnetworks.com>
> Date:   Thu Dec 12 14:10:38 2013 -0800
> 
>      bonding: add arp_validate netlink support

I wasn't entirely sure that was the best choice for Fixes either; it was
sort of more "Augments the Fix in", so I'd certainly have no objection
to changing the Fixes tag to the earlier commit instead.

Jay Vosburgh May 13, 2019, 4:43 p.m. UTC | #3
Jarod Wilson <jarod@redhat.com> wrote:

>On 5/10/19 6:53 PM, Jay Vosburgh wrote:
>> Jarod Wilson <jarod@redhat.com> wrote:
>>
>>> ...
>>> Fixes: 3fe68df97c7f ("bonding: always set recv_probe to bond_arp_rcv in arp monitor")
>>
>> 	Is the above Fixes: tag correct?  3fe68df97c7f is not the source
>> of the erroneous logic being removed, which was introduced by
>>
>> commit 29c4948293bfc426e52a921f4259eb3676961e81
>> Author: sfeldma@cumulusnetworks.com <sfeldma@cumulusnetworks.com>
>> Date:   Thu Dec 12 14:10:38 2013 -0800
>>
>>      bonding: add arp_validate netlink support
>
>I wasn't entirely sure that was the best choice for Fixes either; it was
>sort of more "Augments the Fix in", so I'd certainly have no objection to
>changing the Fixes tag to the earlier commit instead.

	That would be my preference, as the 29c4948293bf commit looks to
be the change actually being fixed.

	-J

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

David Miller May 13, 2019, 4:44 p.m. UTC | #4
From: Jarod Wilson <jarod@redhat.com>
Date: Fri, 10 May 2019 17:57:09 -0400

> There's currently a problem with toggling arp_validate on and off with an
> active-backup bond. At the moment, you can start up a bond, like so:
 ...
> The problem lies in bond_options.c, where passing in arp_validate=0
> results in bond->recv_probe getting set to NULL. This flies directly in
> the face of commit 3fe68df97c7f, which says we need to set recv_probe =
> bond_arp_rcv, even if we're not using arp_validate. Said commit fixed
> this in bond_option_arp_interval_set, but missed that we can get to that
> same state in bond_option_arp_validate_set as well.
> 
> One solution would be to universally set recv_probe = bond_arp_rcv here
> as well, but I don't think bond_option_arp_validate_set has any business
> touching recv_probe at all, and that should be left to the arp_interval
> code, so we can just make things much tidier here.
> 
> Fixes: 3fe68df97c7f ("bonding: always set recv_probe to bond_arp_rcv in arp monitor")
 ...
> Signed-off-by: Jarod Wilson <jarod@redhat.com>

Applied and queued up for -stable, thanks.

David Miller May 13, 2019, 4:46 p.m. UTC | #5
From: Jay Vosburgh <jay.vosburgh@canonical.com>
Date: Mon, 13 May 2019 09:43:30 -0700

> 	That would be my preference, as the 29c4948293bf commit looks to
> be the change actually being fixed.

Sorry I pushed the original commit message out :-(

But isn't the Fixes: tag he chose the one where the logic actually
causes problems?  That's kind of my real criterion for Fixes: tags.

Jay Vosburgh May 13, 2019, 5:10 p.m. UTC | #6
David Miller <davem@davemloft.net> wrote:

>From: Jay Vosburgh <jay.vosburgh@canonical.com>
>Date: Mon, 13 May 2019 09:43:30 -0700
>
>> 	That would be my preference, as the 29c4948293bf commit looks to
>> be the change actually being fixed.
>
>Sorry I pushed the original commit message out :-(
>
>But isn't the Fixes: tag he chose the one where the logic actually
>causes problems?  That's kind of my real criterion for Fixes: tags.

	I don't think so.  It looks like the problem being fixed here
(clearing recv_probe when we shouldn't) was introduced at 29c4948293bf,
but was not the only place the same problem existed.  3fe68df97c7f fixed
the other occurrences of this problem, but missed the specific one added
by 29c4948293bf, which is now fixed by this patch.

	In any event, both of the commits in question are old enough
that it's kind of moot, as -stable will presumably get the right thing
regardless.

	-J

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

Patch

diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index da1fc17295d9..b996967af8d9 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -1098,13 +1098,6 @@ static int bond_option_arp_validate_set(struct bonding *bond,
 {
 	netdev_dbg(bond->dev, "Setting arp_validate to %s (%llu)\n",
 		   newval->string, newval->value);
-
-	if (bond->dev->flags & IFF_UP) {
-		if (!newval->value)
-			bond->recv_probe = NULL;
-		else if (bond->params.arp_interval)
-			bond->recv_probe = bond_arp_rcv;
-	}
 	bond->params.arp_validate = newval->value;
 
 	return 0;
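
For readers who want to see the effect of the removed hunk in isolation,
the following is a minimal userspace model of the setter before and after
the patch. It is only a sketch: the sim_* names are invented here, the
struct is a stand-in for the few fields involved, and the starting state
(hook already installed on a running bond with arp_interval set) is an
assumption based on the commit message saying that is the arp_interval
code's job.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the handful of bonding fields involved. */
struct sim_bond {
	bool up;			/* models bond->dev->flags & IFF_UP */
	int arp_interval;		/* models bond->params.arp_interval */
	int arp_validate;		/* models bond->params.arp_validate */
	void (*recv_probe)(void);	/* models bond->recv_probe */
};

/* Placeholder for bond_arp_rcv; only its address matters here. */
static void sim_arp_rcv(void)
{
}

/* Setter as it behaved before the patch: it also rewrites recv_probe. */
static void sim_validate_set_old(struct sim_bond *bond, int newval)
{
	assert(bond != NULL);
	if (bond->up) {
		if (!newval)
			bond->recv_probe = NULL;
		else if (bond->arp_interval)
			bond->recv_probe = sim_arp_rcv;
	}
	bond->arp_validate = newval;
}

/* Setter after the patch: it only records the new option value. */
static void sim_validate_set_new(struct sim_bond *bond, int newval)
{
	assert(bond != NULL);
	bond->arp_validate = newval;
}

/*
 * Toggle arp_validate 0 -> 1 -> 0 on a running bond and report whether
 * the receive hook is still installed afterwards.
 */
static bool sim_probe_survives_toggle(void (*set)(struct sim_bond *, int))
{
	struct sim_bond bond = {
		.up = true,
		.arp_interval = 100,
		.arp_validate = 0,
		.recv_probe = sim_arp_rcv,
	};

	set(&bond, 1);
	set(&bond, 0);
	return bond.recv_probe == sim_arp_rcv;
}
```

Calling sim_probe_survives_toggle(sim_validate_set_old) returns false,
matching the failure in the report, while the same toggle through
sim_validate_set_new leaves the hook in place.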