diff mbox series

[net,v3] failover: allow name change on IFF_UP slave interfaces

Message ID 1553644093-10917-1-git-send-email-si-wei.liu@oracle.com
State Changes Requested
Delegated to: David Miller
Headers show
Series [net,v3] failover: allow name change on IFF_UP slave interfaces | expand

Commit Message

Si-Wei Liu March 26, 2019, 11:48 p.m. UTC
When a netdev appears through hot plug then gets enslaved by a failover
master that is already up and running, the slave will be opened
right away after getting enslaved. Today there's a race that userspace
(udev) may fail to rename the slave if the kernel (net_failover)
opens the slave earlier than when the userspace rename happens.
Unlike bond or team, the primary slave of failover can't be renamed by
userspace ahead of time, since the kernel initiated auto-enslavement is
unable to, or rather, is never meant to be synchronized with the rename
request from userspace.

As the failover slave interfaces are not designed to be operated
directly by userspace apps: IP configuration, filter rules with
regard to network traffic passing and etc., should all be done on master
interface. In general, userspace apps only care about the
name of master interface, while slave names are less important as long
as admin users can see reliable names that may carry
other information describing the netdev. For e.g., they can infer that
"ens3nsby" is a standby slave of "ens3", while for a
name like "eth0" they can't tell which master it belongs to.

Historically the name of IFF_UP interface can't be changed because
there might be admin script or management software that is already
relying on such behavior and assumes that the slave name can't be
changed once UP. But failover is special: with the in-kernel
auto-enslavement mechanism, the userspace expectation for device
enumeration and bring-up order is already broken. Previously initramfs
and various userspace config tools were modified to bypass failover
slaves because of auto-enslavement and duplicate MAC address. Similarly,
in case that users care about seeing reliable slave name, the new type
of failover slaves needs to be taken care of specifically in userspace
anyway.

It's less risky to lift up the rename restriction on failover slave
which is already UP. Although it's possible this change may potentially
break userspace component (most likely configuration scripts or
management software) that assumes slave name can't be changed while
UP, it's relatively a limited and controllable set among all userspace
components, which can be fixed specifically to listen for the rename
and/or link down/up events on failover slaves. Userspace component
interacting with slaves is expected to be changed to operate on failover
master interface instead, as the failover slave is dynamic in nature
which may come and go at any point.  The goal is to make the role of
failover slaves less relevant, and userspace components should only
deal with failover master in the long run.

Fixes: 30c8bd5aa8b2 ("net: Introduce generic failover module")
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>

--
v1 -> v2:
- Drop configurable module parameter (Sridhar)

v2 -> v3:
- Drop additional IFF_SLAVE_RENAME_OK flag (Sridhar)
- Send down and up events around rename (Michael S. Tsirkin)
---
 net/core/dev.c | 37 ++++++++++++++++++++++++++++++++++---
 1 file changed, 34 insertions(+), 3 deletions(-)

Comments

Stephen Hemminger March 27, 2019, 2:13 a.m. UTC | #1
On Tue, 26 Mar 2019 19:48:13 -0400
Si-Wei Liu <si-wei.liu@oracle.com> wrote:

> When a netdev appears through hot plug then gets enslaved by a failover
> master that is already up and running, the slave will be opened
> right away after getting enslaved. Today there's a race that userspace
> (udev) may fail to rename the slave if the kernel (net_failover)
> opens the slave earlier than when the userspace rename happens.
> Unlike bond or team, the primary slave of failover can't be renamed by
> userspace ahead of time, since the kernel initiated auto-enslavement is
> unable to, or rather, is never meant to be synchronized with the rename
> request from userspace.
> 
> As the failover slave interfaces are not designed to be operated
> directly by userspace apps: IP configuration, filter rules with
> regard to network traffic passing and etc., should all be done on master
> interface. In general, userspace apps only care about the
> name of master interface, while slave names are less important as long
> as admin users can see reliable names that may carry
> other information describing the netdev. For e.g., they can infer that
> "ens3nsby" is a standby slave of "ens3", while for a
> name like "eth0" they can't tell which master it belongs to.
> 
> Historically the name of IFF_UP interface can't be changed because
> there might be admin script or management software that is already
> relying on such behavior and assumes that the slave name can't be
> changed once UP. But failover is special: with the in-kernel
> auto-enslavement mechanism, the userspace expectation for device
> enumeration and bring-up order is already broken. Previously initramfs
> and various userspace config tools were modified to bypass failover
> slaves because of auto-enslavement and duplicate MAC address. Similarly,
> in case that users care about seeing reliable slave name, the new type
> of failover slaves needs to be taken care of specifically in userspace
> anyway.
> 
> It's less risky to lift up the rename restriction on failover slave
> which is already UP. Although it's possible this change may potentially
> break userspace component (most likely configuration scripts or
> management software) that assumes slave name can't be changed while
> UP, it's relatively a limited and controllable set among all userspace
> components, which can be fixed specifically to listen for the rename
> and/or link down/up events on failover slaves. Userspace component
> interacting with slaves is expected to be changed to operate on failover
> master interface instead, as the failover slave is dynamic in nature
> which may come and go at any point.  The goal is to make the role of
> failover slaves less relevant, and userspace components should only
> deal with failover master in the long run.
> 
> Fixes: 30c8bd5aa8b2 ("net: Introduce generic failover module")
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> Reviewed-by: Liran Alon <liran.alon@oracle.com>


Why do you need to do dev_close/dev_open which will bounce
the link?
Jiri Pirko March 27, 2019, 11:11 a.m. UTC | #2
Wed, Mar 27, 2019 at 12:48:13AM CET, si-wei.liu@oracle.com wrote:
>When a netdev appears through hot plug then gets enslaved by a failover
>master that is already up and running, the slave will be opened
>right away after getting enslaved. Today there's a race that userspace
>(udev) may fail to rename the slave if the kernel (net_failover)
>opens the slave earlier than when the userspace rename happens.
>Unlike bond or team, the primary slave of failover can't be renamed by
>userspace ahead of time, since the kernel initiated auto-enslavement is
>unable to, or rather, is never meant to be synchronized with the rename
>request from userspace.
>
>As the failover slave interfaces are not designed to be operated
>directly by userspace apps: IP configuration, filter rules with
>regard to network traffic passing and etc., should all be done on master
>interface. In general, userspace apps only care about the
>name of master interface, while slave names are less important as long
>as admin users can see reliable names that may carry
>other information describing the netdev. For e.g., they can infer that
>"ens3nsby" is a standby slave of "ens3", while for a
>name like "eth0" they can't tell which master it belongs to.
>
>Historically the name of IFF_UP interface can't be changed because
>there might be admin script or management software that is already
>relying on such behavior and assumes that the slave name can't be
>changed once UP. But failover is special: with the in-kernel
>auto-enslavement mechanism, the userspace expectation for device
>enumeration and bring-up order is already broken. Previously initramfs
>and various userspace config tools were modified to bypass failover
>slaves because of auto-enslavement and duplicate MAC address. Similarly,
>in case that users care about seeing reliable slave name, the new type
>of failover slaves needs to be taken care of specifically in userspace
>anyway.
>
>It's less risky to lift up the rename restriction on failover slave
>which is already UP. Although it's possible this change may potentially
>break userspace component (most likely configuration scripts or
>management software) that assumes slave name can't be changed while
>UP, it's relatively a limited and controllable set among all userspace
>components, which can be fixed specifically to listen for the rename
>and/or link down/up events on failover slaves. Userspace component
>interacting with slaves is expected to be changed to operate on failover
>master interface instead, as the failover slave is dynamic in nature
>which may come and go at any point.  The goal is to make the role of
>failover slaves less relevant, and userspace components should only
>deal with failover master in the long run.
>
>Fixes: 30c8bd5aa8b2 ("net: Introduce generic failover module")
>Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
>Reviewed-by: Liran Alon <liran.alon@oracle.com>
>
>--
>v1 -> v2:
>- Drop configurable module parameter (Sridhar)
>
>v2 -> v3:
>- Drop additional IFF_SLAVE_RENAME_OK flag (Sridhar)
>- Send down and up events around rename (Michael S. Tsirkin)
>---
> net/core/dev.c | 37 ++++++++++++++++++++++++++++++++++---
> 1 file changed, 34 insertions(+), 3 deletions(-)
>
>diff --git a/net/core/dev.c b/net/core/dev.c
>index 722d50d..3e0cd80 100644
>--- a/net/core/dev.c
>+++ b/net/core/dev.c
>@@ -1171,6 +1171,7 @@ int dev_get_valid_name(struct net *net, struct net_device *dev,
> int dev_change_name(struct net_device *dev, const char *newname)
> {
> 	unsigned char old_assign_type;
>+	bool reopen_needed = false;
> 	char oldname[IFNAMSIZ];
> 	int err = 0;
> 	int ret;
>@@ -1180,8 +1181,24 @@ int dev_change_name(struct net_device *dev, const char *newname)
> 	BUG_ON(!dev_net(dev));
> 
> 	net = dev_net(dev);
>-	if (dev->flags & IFF_UP)
>-		return -EBUSY;
>+
>+	/* Allow failover slave to rename even when
>+	 * it is up and running.
>+	 *
>+	 * Failover slaves are special, since userspace
>+	 * might rename the slave after the interface
>+	 * has been brought up and running due to
>+	 * auto-enslavement.
>+	 *
>+	 * Failover users don't actually care about slave
>+	 * name change, as they are only expected to operate
>+	 * on master interface directly.
>+	 */
>+	if (dev->flags & IFF_UP) {
>+		if (likely(!(dev->priv_flags & IFF_FAILOVER_SLAVE)))
>+			return -EBUSY;
>+		reopen_needed = true;
>+	}
> 
> 	write_seqcount_begin(&devnet_rename_seq);
> 
>@@ -1198,6 +1215,9 @@ int dev_change_name(struct net_device *dev, const char *newname)
> 		return err;
> 	}
> 
>+	if (reopen_needed)
>+		dev_close(dev);

Ugh. Don't dev_close/dev_open on name change.


>+
> 	if (oldname[0] && !strchr(oldname, '%'))
> 		netdev_info(dev, "renamed from %s\n", oldname);
> 
>@@ -1210,7 +1230,9 @@ int dev_change_name(struct net_device *dev, const char *newname)
> 		memcpy(dev->name, oldname, IFNAMSIZ);
> 		dev->name_assign_type = old_assign_type;
> 		write_seqcount_end(&devnet_rename_seq);
>-		return ret;
>+		if (err >= 0)
>+			err = ret;
>+		goto reopen;
> 	}
> 
> 	write_seqcount_end(&devnet_rename_seq);
>@@ -1246,6 +1268,15 @@ int dev_change_name(struct net_device *dev, const char *newname)
> 		}
> 	}
> 
>+reopen:
>+	if (reopen_needed) {
>+		ret = dev_open(dev);
>+		if (ret) {
>+			pr_err("%s: reopen device failed: %d\n",
>+			       dev->name, ret);
>+		}
>+	}
>+
> 	return err;
> }
> 
>-- 
>1.8.3.1
>
Michael S. Tsirkin March 27, 2019, 1:25 p.m. UTC | #3
On Tue, Mar 26, 2019 at 07:13:42PM -0700, Stephen Hemminger wrote:
> On Tue, 26 Mar 2019 19:48:13 -0400
> Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> 
> > When a netdev appears through hot plug then gets enslaved by a failover
> > master that is already up and running, the slave will be opened
> > right away after getting enslaved. Today there's a race that userspace
> > (udev) may fail to rename the slave if the kernel (net_failover)
> > opens the slave earlier than when the userspace rename happens.
> > Unlike bond or team, the primary slave of failover can't be renamed by
> > userspace ahead of time, since the kernel initiated auto-enslavement is
> > unable to, or rather, is never meant to be synchronized with the rename
> > request from userspace.
> > 
> > As the failover slave interfaces are not designed to be operated
> > directly by userspace apps: IP configuration, filter rules with
> > regard to network traffic passing and etc., should all be done on master
> > interface. In general, userspace apps only care about the
> > name of master interface, while slave names are less important as long
> > as admin users can see reliable names that may carry
> > other information describing the netdev. For e.g., they can infer that
> > "ens3nsby" is a standby slave of "ens3", while for a
> > name like "eth0" they can't tell which master it belongs to.
> > 
> > Historically the name of IFF_UP interface can't be changed because
> > there might be admin script or management software that is already
> > relying on such behavior and assumes that the slave name can't be
> > changed once UP. But failover is special: with the in-kernel
> > auto-enslavement mechanism, the userspace expectation for device
> > enumeration and bring-up order is already broken. Previously initramfs
> > and various userspace config tools were modified to bypass failover
> > slaves because of auto-enslavement and duplicate MAC address. Similarly,
> > in case that users care about seeing reliable slave name, the new type
> > of failover slaves needs to be taken care of specifically in userspace
> > anyway.
> > 
> > It's less risky to lift up the rename restriction on failover slave
> > which is already UP. Although it's possible this change may potentially
> > break userspace component (most likely configuration scripts or
> > management software) that assumes slave name can't be changed while
> > UP, it's relatively a limited and controllable set among all userspace
> > components, which can be fixed specifically to listen for the rename
> > and/or link down/up events on failover slaves. Userspace component
> > interacting with slaves is expected to be changed to operate on failover
> > master interface instead, as the failover slave is dynamic in nature
> > which may come and go at any point.  The goal is to make the role of
> > failover slaves less relevant, and userspace components should only
> > deal with failover master in the long run.
> > 
> > Fixes: 30c8bd5aa8b2 ("net: Introduce generic failover module")
> > Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> > Reviewed-by: Liran Alon <liran.alon@oracle.com>
> 
> 
> Why do you need to do dev_close/dev_open which will bounce
> the link?

What we need is notify userspace that link went up/down.
close/open will do that but just sending notifications
would do that as well without playing with link states.
Si-Wei Liu March 27, 2019, 8:10 p.m. UTC | #4
On 3/27/2019 6:25 AM, Michael S. Tsirkin wrote:
> On Tue, Mar 26, 2019 at 07:13:42PM -0700, Stephen Hemminger wrote:
>> On Tue, 26 Mar 2019 19:48:13 -0400
>> Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>
>>> When a netdev appears through hot plug then gets enslaved by a failover
>>> master that is already up and running, the slave will be opened
>>> right away after getting enslaved. Today there's a race that userspace
>>> (udev) may fail to rename the slave if the kernel (net_failover)
>>> opens the slave earlier than when the userspace rename happens.
>>> Unlike bond or team, the primary slave of failover can't be renamed by
>>> userspace ahead of time, since the kernel initiated auto-enslavement is
>>> unable to, or rather, is never meant to be synchronized with the rename
>>> request from userspace.
>>>
>>> As the failover slave interfaces are not designed to be operated
>>> directly by userspace apps: IP configuration, filter rules with
>>> regard to network traffic passing and etc., should all be done on master
>>> interface. In general, userspace apps only care about the
>>> name of master interface, while slave names are less important as long
>>> as admin users can see reliable names that may carry
>>> other information describing the netdev. For e.g., they can infer that
>>> "ens3nsby" is a standby slave of "ens3", while for a
>>> name like "eth0" they can't tell which master it belongs to.
>>>
>>> Historically the name of IFF_UP interface can't be changed because
>>> there might be admin script or management software that is already
>>> relying on such behavior and assumes that the slave name can't be
>>> changed once UP. But failover is special: with the in-kernel
>>> auto-enslavement mechanism, the userspace expectation for device
>>> enumeration and bring-up order is already broken. Previously initramfs
>>> and various userspace config tools were modified to bypass failover
>>> slaves because of auto-enslavement and duplicate MAC address. Similarly,
>>> in case that users care about seeing reliable slave name, the new type
>>> of failover slaves needs to be taken care of specifically in userspace
>>> anyway.
>>>
>>> It's less risky to lift up the rename restriction on failover slave
>>> which is already UP. Although it's possible this change may potentially
>>> break userspace component (most likely configuration scripts or
>>> management software) that assumes slave name can't be changed while
>>> UP, it's relatively a limited and controllable set among all userspace
>>> components, which can be fixed specifically to listen for the rename
>>> and/or link down/up events on failover slaves. Userspace component
>>> interacting with slaves is expected to be changed to operate on failover
>>> master interface instead, as the failover slave is dynamic in nature
>>> which may come and go at any point.  The goal is to make the role of
>>> failover slaves less relevant, and userspace components should only
>>> deal with failover master in the long run.
>>>
>>> Fixes: 30c8bd5aa8b2 ("net: Introduce generic failover module")
>>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
>>> Reviewed-by: Liran Alon <liran.alon@oracle.com>
>>
>> Why do you need to do dev_close/dev_open which will bounce
>> the link?
> What we need is notify userspace that link went up/down.
> close/open will do that but just sending notifications
> would do that as well without playing with link states.
>
Since you were requesting to send fake link down/up events around 
rename, so as to keep existing userspace intact with this behavioral 
change, right? The thing is if you can't fake notification with just 
IFF_UP or ~IFF_UP then claim everything is done. If you look at 
rtnl_fill_ifinfo() where the notification payload is prepared, you'll 
find a lot of states and flags are correlated:

ifi_flags
IFLA_OPERSTATE
IFLA_CARRIER
IFLA_CARRIER_CHANGES

which requires below states to be toggled or taken care of in between:

operstate
__LINK_STATE_START
__LINK_STATE_NOCARRIER
carrier_changes

for e.g. user mostly treats IFF_RUNNING as the indication of link 
up/down as opposed to IFF_UP. That would require you to toggle 
__LINK_STATE_START (and  operstate as well) without doing a full 
dev_close/open. Since __LINK_STATE_START is cleared, there's no sense to 
let CARRIER_OK remain set, and then you'd need to take care of 
carrier_changes... Since you don't really shutting down the device, the 
link watchdog keeps running and may race with inconsistent carrier state 
in between. dev_close/open may have done unneeded work, but it's the 
safest option IMHO, as apparently the cost and ugly complexity to fake 
link down/up events is not something worthwhile compared to simply 
bouncing the link state.

Another point is kernel consumers of the NETDEV_CHANGENAME notifier 
might well assume the link is already taken down by dev_close() before 
the rename. I didn't check all those consumers in tree but thought it 
might be safe to keep the current convention.

Now let me turn around and ask you what's your concerns if bouncing the 
link state. While I can tweak a lightweight version of dev_close/open to 
bypass ndo_stop and ndo_start while shutting down the link watchdog on 
behalf of drivers, it's far more involved than make me think if that's 
really what you had in mind.

Another less safer option is that we just notify userspace anyway 
without sending down/up event around, as I don't see *any real 
application* cares about the link state or whatsoever when it attempts 
to detect rename. Given that the scope is limited to failover slave the 
chance of breaking userspace app would be extremely low in practice.

Thanks,
-Siwei
Michael S. Tsirkin March 27, 2019, 10:16 p.m. UTC | #5
On Wed, Mar 27, 2019 at 01:10:10PM -0700, si-wei liu wrote:
> Another less safer option is that we just notify userspace anyway without
> sending down/up event around, as I don't see *any real application* cares
> about the link state or whatsoever when it attempts to detect rename.

How do you write a race ree handler then? ATM just detecting link up is
sufficient and covers 100% of cases. Seems like a good idea to keep it
that way.
Si-Wei Liu March 27, 2019, 10:31 p.m. UTC | #6
On 3/27/2019 3:16 PM, Michael S. Tsirkin wrote:
> On Wed, Mar 27, 2019 at 01:10:10PM -0700, si-wei liu wrote:
>> Another less safer option is that we just notify userspace anyway without
>> sending down/up event around, as I don't see *any real application* cares
>> about the link state or whatsoever when it attempts to detect rename.
> How do you write a race ree handler then? ATM just detecting link up is
> sufficient and covers 100% of cases. Seems like a good idea to keep it
> that way.
>
What do you mean? Which flag or attribute do you think 100% of the 
userspace regard it as link up? And why userspace that cares about name 
change needs to double check whether link is up or not? I'm sorry, but 
are you sure this is the 100% case that every userspace app does it like 
what you're claiming?

-Siwei
Si-Wei Liu March 27, 2019, 10:48 p.m. UTC | #7
On 3/27/2019 3:31 PM, si-wei liu wrote:
>
>
> On 3/27/2019 3:16 PM, Michael S. Tsirkin wrote:
>> On Wed, Mar 27, 2019 at 01:10:10PM -0700, si-wei liu wrote:
>>> Another less safer option is that we just notify userspace anyway 
>>> without
>>> sending down/up event around, as I don't see *any real application* 
>>> cares
>>> about the link state or whatsoever when it attempts to detect rename.
>> How do you write a race ree handler then? ATM just detecting link up is
>> sufficient and covers 100% of cases. Seems like a good idea to keep it
>> that way.
>>
> What do you mean? Which flag or attribute do you think 100% of the 
> userspace regard it as link up? And why userspace that cares about 
> name change needs to double check whether link is up or not? I'm 
> sorry, but are you sure this is the 100% case that every userspace app 
> does it like what you're claiming?
>
> -Siwei
To me even the any flag checking regarding link state is unnecessary as 
it's hard for userspace apps to follow precisely the name change 
together with the link state. I'm not sure if you still persist in 
sending link down/up event. What the userspace apps that care about the 
name so far as I see chase the name through ifindex. I'm really doubtful 
if link state is that important to them. But if you really need that 
consistency I'd say bouncing the link is perhaps the only safe option.

-Siwei
Si-Wei Liu March 27, 2019, 11:44 p.m. UTC | #8
On 3/27/2019 4:11 AM, Jiri Pirko wrote:
> Wed, Mar 27, 2019 at 12:48:13AM CET, si-wei.liu@oracle.com wrote:
>> When a netdev appears through hot plug then gets enslaved by a failover
>> master that is already up and running, the slave will be opened
>> right away after getting enslaved. Today there's a race that userspace
>> (udev) may fail to rename the slave if the kernel (net_failover)
>> opens the slave earlier than when the userspace rename happens.
>> Unlike bond or team, the primary slave of failover can't be renamed by
>> userspace ahead of time, since the kernel initiated auto-enslavement is
>> unable to, or rather, is never meant to be synchronized with the rename
>> request from userspace.
>>
>> As the failover slave interfaces are not designed to be operated
>> directly by userspace apps: IP configuration, filter rules with
>> regard to network traffic passing and etc., should all be done on master
>> interface. In general, userspace apps only care about the
>> name of master interface, while slave names are less important as long
>> as admin users can see reliable names that may carry
>> other information describing the netdev. For e.g., they can infer that
>> "ens3nsby" is a standby slave of "ens3", while for a
>> name like "eth0" they can't tell which master it belongs to.
>>
>> Historically the name of IFF_UP interface can't be changed because
>> there might be admin script or management software that is already
>> relying on such behavior and assumes that the slave name can't be
>> changed once UP. But failover is special: with the in-kernel
>> auto-enslavement mechanism, the userspace expectation for device
>> enumeration and bring-up order is already broken. Previously initramfs
>> and various userspace config tools were modified to bypass failover
>> slaves because of auto-enslavement and duplicate MAC address. Similarly,
>> in case that users care about seeing reliable slave name, the new type
>> of failover slaves needs to be taken care of specifically in userspace
>> anyway.
>>
>> It's less risky to lift up the rename restriction on failover slave
>> which is already UP. Although it's possible this change may potentially
>> break userspace component (most likely configuration scripts or
>> management software) that assumes slave name can't be changed while
>> UP, it's relatively a limited and controllable set among all userspace
>> components, which can be fixed specifically to listen for the rename
>> and/or link down/up events on failover slaves. Userspace component
>> interacting with slaves is expected to be changed to operate on failover
>> master interface instead, as the failover slave is dynamic in nature
>> which may come and go at any point.  The goal is to make the role of
>> failover slaves less relevant, and userspace components should only
>> deal with failover master in the long run.
>>
>> Fixes: 30c8bd5aa8b2 ("net: Introduce generic failover module")
>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
>> Reviewed-by: Liran Alon <liran.alon@oracle.com>
>>
>> --
>> v1 -> v2:
>> - Drop configurable module parameter (Sridhar)
>>
>> v2 -> v3:
>> - Drop additional IFF_SLAVE_RENAME_OK flag (Sridhar)
>> - Send down and up events around rename (Michael S. Tsirkin)
>> ---
>> net/core/dev.c | 37 ++++++++++++++++++++++++++++++++++---
>> 1 file changed, 34 insertions(+), 3 deletions(-)
>>
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 722d50d..3e0cd80 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -1171,6 +1171,7 @@ int dev_get_valid_name(struct net *net, struct net_device *dev,
>> int dev_change_name(struct net_device *dev, const char *newname)
>> {
>> 	unsigned char old_assign_type;
>> +	bool reopen_needed = false;
>> 	char oldname[IFNAMSIZ];
>> 	int err = 0;
>> 	int ret;
>> @@ -1180,8 +1181,24 @@ int dev_change_name(struct net_device *dev, const char *newname)
>> 	BUG_ON(!dev_net(dev));
>>
>> 	net = dev_net(dev);
>> -	if (dev->flags & IFF_UP)
>> -		return -EBUSY;
>> +
>> +	/* Allow failover slave to rename even when
>> +	 * it is up and running.
>> +	 *
>> +	 * Failover slaves are special, since userspace
>> +	 * might rename the slave after the interface
>> +	 * has been brought up and running due to
>> +	 * auto-enslavement.
>> +	 *
>> +	 * Failover users don't actually care about slave
>> +	 * name change, as they are only expected to operate
>> +	 * on master interface directly.
>> +	 */
>> +	if (dev->flags & IFF_UP) {
>> +		if (likely(!(dev->priv_flags & IFF_FAILOVER_SLAVE)))
>> +			return -EBUSY;
>> +		reopen_needed = true;
>> +	}
>>
>> 	write_seqcount_begin(&devnet_rename_seq);
>>
>> @@ -1198,6 +1215,9 @@ int dev_change_name(struct net_device *dev, const char *newname)
>> 		return err;
>> 	}
>>
>> +	if (reopen_needed)
>> +		dev_close(dev);
> Ugh. Don't dev_close/dev_open on name change.
See my response to Michael and Stephen. What's your suggestion then?
>
>> +
>> 	if (oldname[0] && !strchr(oldname, '%'))
>> 		netdev_info(dev, "renamed from %s\n", oldname);
>>
>> @@ -1210,7 +1230,9 @@ int dev_change_name(struct net_device *dev, const char *newname)
>> 		memcpy(dev->name, oldname, IFNAMSIZ);
>> 		dev->name_assign_type = old_assign_type;
>> 		write_seqcount_end(&devnet_rename_seq);
>> -		return ret;
>> +		if (err >= 0)
>> +			err = ret;
>> +		goto reopen;
>> 	}
>>
>> 	write_seqcount_end(&devnet_rename_seq);
>> @@ -1246,6 +1268,15 @@ int dev_change_name(struct net_device *dev, const char *newname)
>> 		}
>> 	}
>>
>> +reopen:
>> +	if (reopen_needed) {
>> +		ret = dev_open(dev);
>> +		if (ret) {
>> +			pr_err("%s: reopen device failed: %d\n",
>> +			       dev->name, ret);
>> +		}
>> +	}
>> +
>> 	return err;
>> }
>>
>> -- 
>> 1.8.3.1
>>
Stephen Hemminger March 28, 2019, 5:14 p.m. UTC | #9
On Wed, 27 Mar 2019 16:44:19 -0700
si-wei liu <si-wei.liu@oracle.com> wrote:

> On 3/27/2019 4:11 AM, Jiri Pirko wrote:
> > Wed, Mar 27, 2019 at 12:48:13AM CET, si-wei.liu@oracle.com wrote:  
> >> When a netdev appears through hot plug then gets enslaved by a failover
> >> master that is already up and running, the slave will be opened
> >> right away after getting enslaved. Today there's a race that userspace
> >> (udev) may fail to rename the slave if the kernel (net_failover)
> >> opens the slave earlier than when the userspace rename happens.
> >> Unlike bond or team, the primary slave of failover can't be renamed by
> >> userspace ahead of time, since the kernel initiated auto-enslavement is
> >> unable to, or rather, is never meant to be synchronized with the rename
> >> request from userspace.
> >>
> >> As the failover slave interfaces are not designed to be operated
> >> directly by userspace apps: IP configuration, filter rules with
> >> regard to network traffic passing and etc., should all be done on master
> >> interface. In general, userspace apps only care about the
> >> name of master interface, while slave names are less important as long
> >> as admin users can see reliable names that may carry
> >> other information describing the netdev. For e.g., they can infer that
> >> "ens3nsby" is a standby slave of "ens3", while for a
> >> name like "eth0" they can't tell which master it belongs to.
> >>
> >> Historically the name of IFF_UP interface can't be changed because
> >> there might be admin script or management software that is already
> >> relying on such behavior and assumes that the slave name can't be
> >> changed once UP. But failover is special: with the in-kernel
> >> auto-enslavement mechanism, the userspace expectation for device
> >> enumeration and bring-up order is already broken. Previously initramfs
> >> and various userspace config tools were modified to bypass failover
> >> slaves because of auto-enslavement and duplicate MAC address. Similarly,
> >> in case that users care about seeing reliable slave name, the new type
> >> of failover slaves needs to be taken care of specifically in userspace
> >> anyway.
> >>
> >> It's less risky to lift up the rename restriction on failover slave
> >> which is already UP. Although it's possible this change may potentially
> >> break userspace component (most likely configuration scripts or
> >> management software) that assumes slave name can't be changed while
> >> UP, it's relatively a limited and controllable set among all userspace
> >> components, which can be fixed specifically to listen for the rename
> >> and/or link down/up events on failover slaves. Userspace component
> >> interacting with slaves is expected to be changed to operate on failover
> >> master interface instead, as the failover slave is dynamic in nature
> >> which may come and go at any point.  The goal is to make the role of
> >> failover slaves less relevant, and userspace components should only
> >> deal with failover master in the long run.
> >>
> >> Fixes: 30c8bd5aa8b2 ("net: Introduce generic failover module")
> >> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> >> Reviewed-by: Liran Alon <liran.alon@oracle.com>
> >>
> >> --
> >> v1 -> v2:
> >> - Drop configurable module parameter (Sridhar)
> >>
> >> v2 -> v3:
> >> - Drop additional IFF_SLAVE_RENAME_OK flag (Sridhar)
> >> - Send down and up events around rename (Michael S. Tsirkin)
> >> ---
> >> net/core/dev.c | 37 ++++++++++++++++++++++++++++++++++---
> >> 1 file changed, 34 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/net/core/dev.c b/net/core/dev.c
> >> index 722d50d..3e0cd80 100644
> >> --- a/net/core/dev.c
> >> +++ b/net/core/dev.c
> >> @@ -1171,6 +1171,7 @@ int dev_get_valid_name(struct net *net, struct net_device *dev,
> >> int dev_change_name(struct net_device *dev, const char *newname)
> >> {
> >> 	unsigned char old_assign_type;
> >> +	bool reopen_needed = false;
> >> 	char oldname[IFNAMSIZ];
> >> 	int err = 0;
> >> 	int ret;
> >> @@ -1180,8 +1181,24 @@ int dev_change_name(struct net_device *dev, const char *newname)
> >> 	BUG_ON(!dev_net(dev));
> >>
> >> 	net = dev_net(dev);
> >> -	if (dev->flags & IFF_UP)
> >> -		return -EBUSY;
> >> +
> >> +	/* Allow failover slave to rename even when
> >> +	 * it is up and running.
> >> +	 *
> >> +	 * Failover slaves are special, since userspace
> >> +	 * might rename the slave after the interface
> >> +	 * has been brought up and running due to
> >> +	 * auto-enslavement.
> >> +	 *
> >> +	 * Failover users don't actually care about slave
> >> +	 * name change, as they are only expected to operate
> >> +	 * on master interface directly.
> >> +	 */
> >> +	if (dev->flags & IFF_UP) {
> >> +		if (likely(!(dev->priv_flags & IFF_FAILOVER_SLAVE)))
> >> +			return -EBUSY;
> >> +		reopen_needed = true;
> >> +	}
> >>
> >> 	write_seqcount_begin(&devnet_rename_seq);
> >>
> >> @@ -1198,6 +1215,9 @@ int dev_change_name(struct net_device *dev, const char *newname)
> >> 		return err;
> >> 	}
> >>
> >> +	if (reopen_needed)
> >> +		dev_close(dev);  
> > Ugh. Don't dev_close/dev_open on name change.  
> See my response to Michael and Stephen. What's your suggestion then?

To a DEV_CHANGE notification instead?


My opinion is that allowing name change is not worth the doing.
Also, the kernel should never do the name change, it is up to userspace.
kernel test robot March 28, 2019, 7:49 p.m. UTC | #10
Hi Si-Wei,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net/master]

url:    https://github.com/0day-ci/linux/commits/Si-Wei-Liu/failover-allow-name-change-on-IFF_UP-slave-interfaces/20190329-020744
config: openrisc-or1ksim_defconfig (attached as .config)
compiler: or1k-linux-gcc (GCC) 6.0.0 20160327 (experimental)
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=openrisc 

All errors (new ones prefixed by >>):

   net/core/dev.c: In function 'dev_change_name':
>> net/core/dev.c:1277:9: error: too few arguments to function 'dev_open'
      ret = dev_open(dev);
            ^~~~~~~~
   In file included from net/core/dev.c:93:0:
   include/linux/netdevice.h:2636:5: note: declared here
    int dev_open(struct net_device *dev, struct netlink_ext_ack *extack);
        ^~~~~~~~

vim +/dev_open +1277 net/core/dev.c

  1166	
  1167	/**
  1168	 *	dev_change_name - change name of a device
  1169	 *	@dev: device
  1170	 *	@newname: name (or format string) must be at least IFNAMSIZ
  1171	 *
  1172	 *	Change name of a device, can pass format strings "eth%d".
  1173	 *	for wildcarding.
  1174	 */
  1175	int dev_change_name(struct net_device *dev, const char *newname)
  1176	{
  1177		unsigned char old_assign_type;
  1178		bool reopen_needed = false;
  1179		char oldname[IFNAMSIZ];
  1180		int err = 0;
  1181		int ret;
  1182		struct net *net;
  1183	
  1184		ASSERT_RTNL();
  1185		BUG_ON(!dev_net(dev));
  1186	
  1187		net = dev_net(dev);
  1188	
  1189		/* Allow failover slave to rename even when
  1190		 * it is up and running.
  1191		 *
  1192		 * Failover slaves are special, since userspace
  1193		 * might rename the slave after the interface
  1194		 * has been brought up and running due to
  1195		 * auto-enslavement.
  1196		 *
  1197		 * Failover users don't actually care about slave
  1198		 * name change, as they are only expected to operate
  1199		 * on master interface directly.
  1200		 */
  1201		if (dev->flags & IFF_UP) {
  1202			if (likely(!(dev->priv_flags & IFF_FAILOVER_SLAVE)))
  1203				return -EBUSY;
  1204			reopen_needed = true;
  1205		}
  1206	
  1207		write_seqcount_begin(&devnet_rename_seq);
  1208	
  1209		if (strncmp(newname, dev->name, IFNAMSIZ) == 0) {
  1210			write_seqcount_end(&devnet_rename_seq);
  1211			return 0;
  1212		}
  1213	
  1214		memcpy(oldname, dev->name, IFNAMSIZ);
  1215	
  1216		err = dev_get_valid_name(net, dev, newname);
  1217		if (err < 0) {
  1218			write_seqcount_end(&devnet_rename_seq);
  1219			return err;
  1220		}
  1221	
  1222		if (reopen_needed)
  1223			dev_close(dev);
  1224	
  1225		if (oldname[0] && !strchr(oldname, '%'))
  1226			netdev_info(dev, "renamed from %s\n", oldname);
  1227	
  1228		old_assign_type = dev->name_assign_type;
  1229		dev->name_assign_type = NET_NAME_RENAMED;
  1230	
  1231	rollback:
  1232		ret = device_rename(&dev->dev, dev->name);
  1233		if (ret) {
  1234			memcpy(dev->name, oldname, IFNAMSIZ);
  1235			dev->name_assign_type = old_assign_type;
  1236			write_seqcount_end(&devnet_rename_seq);
  1237			if (err >= 0)
  1238				err = ret;
  1239			goto reopen;
  1240		}
  1241	
  1242		write_seqcount_end(&devnet_rename_seq);
  1243	
  1244		netdev_adjacent_rename_links(dev, oldname);
  1245	
  1246		write_lock_bh(&dev_base_lock);
  1247		hlist_del_rcu(&dev->name_hlist);
  1248		write_unlock_bh(&dev_base_lock);
  1249	
  1250		synchronize_rcu();
  1251	
  1252		write_lock_bh(&dev_base_lock);
  1253		hlist_add_head_rcu(&dev->name_hlist, dev_name_hash(net, dev->name));
  1254		write_unlock_bh(&dev_base_lock);
  1255	
  1256		ret = call_netdevice_notifiers(NETDEV_CHANGENAME, dev);
  1257		ret = notifier_to_errno(ret);
  1258	
  1259		if (ret) {
  1260			/* err >= 0 after dev_alloc_name() or stores the first errno */
  1261			if (err >= 0) {
  1262				err = ret;
  1263				write_seqcount_begin(&devnet_rename_seq);
  1264				memcpy(dev->name, oldname, IFNAMSIZ);
  1265				memcpy(oldname, newname, IFNAMSIZ);
  1266				dev->name_assign_type = old_assign_type;
  1267				old_assign_type = NET_NAME_RENAMED;
  1268				goto rollback;
  1269			} else {
  1270				pr_err("%s: name change rollback failed: %d\n",
  1271				       dev->name, ret);
  1272			}
  1273		}
  1274	
  1275	reopen:
  1276		if (reopen_needed) {
> 1277			ret = dev_open(dev);
  1278			if (ret) {
  1279				pr_err("%s: reopen device failed: %d\n",
  1280				       dev->name, ret);
  1281			}
  1282		}
  1283	
  1284		return err;
  1285	}
  1286	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
Si-Wei Liu March 28, 2019, 8:15 p.m. UTC | #11
On 3/28/2019 10:14 AM, Stephen Hemminger wrote:
> On Wed, 27 Mar 2019 16:44:19 -0700
> si-wei liu <si-wei.liu@oracle.com> wrote:
>
>> On 3/27/2019 4:11 AM, Jiri Pirko wrote:
>>> Wed, Mar 27, 2019 at 12:48:13AM CET, si-wei.liu@oracle.com wrote:
>>>> When a netdev appears through hot plug then gets enslaved by a failover
>>>> master that is already up and running, the slave will be opened
>>>> right away after getting enslaved. Today there's a race that userspace
>>>> (udev) may fail to rename the slave if the kernel (net_failover)
>>>> opens the slave earlier than when the userspace rename happens.
>>>> Unlike bond or team, the primary slave of failover can't be renamed by
>>>> userspace ahead of time, since the kernel initiated auto-enslavement is
>>>> unable to, or rather, is never meant to be synchronized with the rename
>>>> request from userspace.
>>>>
>>>> As the failover slave interfaces are not designed to be operated
>>>> directly by userspace apps: IP configuration, filter rules with
>>>> regard to network traffic passing and etc., should all be done on master
>>>> interface. In general, userspace apps only care about the
>>>> name of master interface, while slave names are less important as long
>>>> as admin users can see reliable names that may carry
>>>> other information describing the netdev. For e.g., they can infer that
>>>> "ens3nsby" is a standby slave of "ens3", while for a
>>>> name like "eth0" they can't tell which master it belongs to.
>>>>
>>>> Historically the name of IFF_UP interface can't be changed because
>>>> there might be admin script or management software that is already
>>>> relying on such behavior and assumes that the slave name can't be
>>>> changed once UP. But failover is special: with the in-kernel
>>>> auto-enslavement mechanism, the userspace expectation for device
>>>> enumeration and bring-up order is already broken. Previously initramfs
>>>> and various userspace config tools were modified to bypass failover
>>>> slaves because of auto-enslavement and duplicate MAC address. Similarly,
>>>> in case that users care about seeing reliable slave name, the new type
>>>> of failover slaves needs to be taken care of specifically in userspace
>>>> anyway.
>>>>
>>>> It's less risky to lift up the rename restriction on failover slave
>>>> which is already UP. Although it's possible this change may potentially
>>>> break userspace component (most likely configuration scripts or
>>>> management software) that assumes slave name can't be changed while
>>>> UP, it's relatively a limited and controllable set among all userspace
>>>> components, which can be fixed specifically to listen for the rename
>>>> and/or link down/up events on failover slaves. Userspace component
>>>> interacting with slaves is expected to be changed to operate on failover
>>>> master interface instead, as the failover slave is dynamic in nature
>>>> which may come and go at any point.  The goal is to make the role of
>>>> failover slaves less relevant, and userspace components should only
>>>> deal with failover master in the long run.
>>>>
>>>> Fixes: 30c8bd5aa8b2 ("net: Introduce generic failover module")
>>>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
>>>> Reviewed-by: Liran Alon <liran.alon@oracle.com>
>>>>
>>>> --
>>>> v1 -> v2:
>>>> - Drop configurable module parameter (Sridhar)
>>>>
>>>> v2 -> v3:
>>>> - Drop additional IFF_SLAVE_RENAME_OK flag (Sridhar)
>>>> - Send down and up events around rename (Michael S. Tsirkin)
>>>> ---
>>>> net/core/dev.c | 37 ++++++++++++++++++++++++++++++++++---
>>>> 1 file changed, 34 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>>> index 722d50d..3e0cd80 100644
>>>> --- a/net/core/dev.c
>>>> +++ b/net/core/dev.c
>>>> @@ -1171,6 +1171,7 @@ int dev_get_valid_name(struct net *net, struct net_device *dev,
>>>> int dev_change_name(struct net_device *dev, const char *newname)
>>>> {
>>>> 	unsigned char old_assign_type;
>>>> +	bool reopen_needed = false;
>>>> 	char oldname[IFNAMSIZ];
>>>> 	int err = 0;
>>>> 	int ret;
>>>> @@ -1180,8 +1181,24 @@ int dev_change_name(struct net_device *dev, const char *newname)
>>>> 	BUG_ON(!dev_net(dev));
>>>>
>>>> 	net = dev_net(dev);
>>>> -	if (dev->flags & IFF_UP)
>>>> -		return -EBUSY;
>>>> +
>>>> +	/* Allow failover slave to rename even when
>>>> +	 * it is up and running.
>>>> +	 *
>>>> +	 * Failover slaves are special, since userspace
>>>> +	 * might rename the slave after the interface
>>>> +	 * has been brought up and running due to
>>>> +	 * auto-enslavement.
>>>> +	 *
>>>> +	 * Failover users don't actually care about slave
>>>> +	 * name change, as they are only expected to operate
>>>> +	 * on master interface directly.
>>>> +	 */
>>>> +	if (dev->flags & IFF_UP) {
>>>> +		if (likely(!(dev->priv_flags & IFF_FAILOVER_SLAVE)))
>>>> +			return -EBUSY;
>>>> +		reopen_needed = true;
>>>> +	}
>>>>
>>>> 	write_seqcount_begin(&devnet_rename_seq);
>>>>
>>>> @@ -1198,6 +1215,9 @@ int dev_change_name(struct net_device *dev, const char *newname)
>>>> 		return err;
>>>> 	}
>>>>
>>>> +	if (reopen_needed)
>>>> +		dev_close(dev);
>>> Ugh. Don't dev_close/dev_open on name change.
>> See my response to Michael and Stephen. What's your suggestion then?
> To a DEV_CHANGE notification instead?
Thanks. That is what I was thinking. Will post a v4 to simplify the 
notification. Not worth the effort to fine tune for cases in hypothesis.

-Siwei

>
>
> My opinion is that allowing name change is not worth the doing.
> Also, the kernel should never do the name change, it is up to userspace.
>
diff mbox series

Patch

diff --git a/net/core/dev.c b/net/core/dev.c
index 722d50d..3e0cd80 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1171,6 +1171,7 @@  int dev_get_valid_name(struct net *net, struct net_device *dev,
 int dev_change_name(struct net_device *dev, const char *newname)
 {
 	unsigned char old_assign_type;
+	bool reopen_needed = false;
 	char oldname[IFNAMSIZ];
 	int err = 0;
 	int ret;
@@ -1180,8 +1181,24 @@  int dev_change_name(struct net_device *dev, const char *newname)
 	BUG_ON(!dev_net(dev));
 
 	net = dev_net(dev);
-	if (dev->flags & IFF_UP)
-		return -EBUSY;
+
+	/* Allow failover slave to rename even when
+	 * it is up and running.
+	 *
+	 * Failover slaves are special, since userspace
+	 * might rename the slave after the interface
+	 * has been brought up and running due to
+	 * auto-enslavement.
+	 *
+	 * Failover users don't actually care about slave
+	 * name change, as they are only expected to operate
+	 * on master interface directly.
+	 */
+	if (dev->flags & IFF_UP) {
+		if (likely(!(dev->priv_flags & IFF_FAILOVER_SLAVE)))
+			return -EBUSY;
+		reopen_needed = true;
+	}
 
 	write_seqcount_begin(&devnet_rename_seq);
 
@@ -1198,6 +1215,9 @@  int dev_change_name(struct net_device *dev, const char *newname)
 		return err;
 	}
 
+	if (reopen_needed)
+		dev_close(dev);
+
 	if (oldname[0] && !strchr(oldname, '%'))
 		netdev_info(dev, "renamed from %s\n", oldname);
 
@@ -1210,7 +1230,9 @@  int dev_change_name(struct net_device *dev, const char *newname)
 		memcpy(dev->name, oldname, IFNAMSIZ);
 		dev->name_assign_type = old_assign_type;
 		write_seqcount_end(&devnet_rename_seq);
-		return ret;
+		if (err >= 0)
+			err = ret;
+		goto reopen;
 	}
 
 	write_seqcount_end(&devnet_rename_seq);
@@ -1246,6 +1268,15 @@  int dev_change_name(struct net_device *dev, const char *newname)
 		}
 	}
 
+reopen:
+	if (reopen_needed) {
+		ret = dev_open(dev);
+		if (ret) {
+			pr_err("%s: reopen device failed: %d\n",
+			       dev->name, ret);
+		}
+	}
+
 	return err;
 }