net: Fix device name resolving crash in default_device_exit()

Message ID 88c1a635-4404-253b-6144-d7f3f9779531@virtuozzo.com
State Changes Requested, archived
Delegated to: David Miller

Commit Message

Kirill Tkhai June 20, 2018, 8:57 a.m. UTC
From: Kirill Tkhai <ktkhai@virtuozzo.com>

The following script crashes the kernel because it cannot obtain
a name for a device when that name is already taken by another device:

#!/bin/bash
ifconfig eth0 down
ifconfig eth1 down
index=`cat /sys/class/net/eth1/ifindex`
ip link set eth1 name dev$index
unshare -n sleep 1h &
pid=$!
while [[ "`readlink /proc/self/ns/net`" == "`readlink /proc/$pid/ns/net`" ]]; do continue; done
ip link set dev$index netns $pid
ip link set eth0 name dev$index
kill -9 $pid

Kernel messages:

virtio_net virtio1 dev3: renamed from eth1
virtio_net virtio0 dev3: renamed from eth0
default_device_exit: failed to move dev3 to init_net: -17
------------[ cut here ]------------
kernel BUG at net/core/dev.c:8978!
invalid opcode: 0000 [#1] PREEMPT SMP
CPU: 1 PID: 276 Comm: kworker/u8:3 Not tainted 4.17.0+ #292
Workqueue: netns cleanup_net
RIP: 0010:default_device_exit+0x9c/0xb0
[stack trace snipped]

This patch allows more variability when choosing the new name
of the device and fixes the problem.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---

Since there have been no suggestions on how to fix this another way, I'm resending the patch.

 net/core/dev.c |    4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

Comments

David Ahern June 20, 2018, 5:15 p.m. UTC | #1
On 6/20/18 2:57 AM, Kirill Tkhai wrote:
> From: Kirill Tkhai <ktkhai@virtuozzo.com>
> 
> The following script crashes the kernel because it cannot obtain
> a name for a device when that name is already taken by another device:
> 
> #!/bin/bash
> ifconfig eth0 down
> ifconfig eth1 down
> index=`cat /sys/class/net/eth1/ifindex`
> ip link set eth1 name dev$index
> unshare -n sleep 1h &
> pid=$!
> while [[ "`readlink /proc/self/ns/net`" == "`readlink /proc/$pid/ns/net`" ]]; do continue; done
> ip link set dev$index netns $pid
> ip link set eth0 name dev$index
> kill -9 $pid
> 
> Kernel messages:
> 
> virtio_net virtio1 dev3: renamed from eth1
> virtio_net virtio0 dev3: renamed from eth0
> default_device_exit: failed to move dev3 to init_net: -17
> ------------[ cut here ]------------
> kernel BUG at net/core/dev.c:8978!
> invalid opcode: 0000 [#1] PREEMPT SMP
> CPU: 1 PID: 276 Comm: kworker/u8:3 Not tainted 4.17.0+ #292
> Workqueue: netns cleanup_net
> RIP: 0010:default_device_exit+0x9c/0xb0
> [stack trace snipped]
> 
> This patch allows more variability when choosing the new name
> of the device and fixes the problem.
> 
> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
> ---
> 
> Since there have been no suggestions on how to fix this another way, I'm resending the patch.

This patch does not remove the BUG, so does not really solve the
problem. I.e., it is fairly trivial to write a script (32k dev%d named
devices in init_net) that triggers it again, so your commit subject and
commit log are not correct with the references to 'fixing the problem'.

The change does provide more variability in naming and reduces the
likelihood of not being able to push a device back to init_net.
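
The exhaustion described above can be sketched in userspace. This is a minimal illustration, not the kernel code: it assumes that a "dev%d" template resolves to the lowest unused index below a fixed limit, and the limit of 32768 is an assumption taken from the "32k" figure in this thread. The function name first_free_dev_index is hypothetical.

```shell
#!/bin/sh
# Hypothetical userspace model of "dev%d" name selection; not kernel code.
# Reads interface names from stdin (one per line) and prints the lowest
# index N below LIMIT for which "devN" is unused. Returns 1 when every
# index below LIMIT is already taken.
first_free_dev_index() {
    ffdi_limit=${1:-32768}   # assumed limit, from the "32k" in the thread
    ffdi_next=0
    # Keep only names of the exact form dev<N> (no leading zeros),
    # strip the prefix, and walk the indices in ascending order.
    for ffdi_n in $(sed -n -e 's/^dev\(0\)$/\1/p' \
                           -e 's/^dev\([1-9][0-9]*\)$/\1/p' | sort -n -u); do
        if [ "$ffdi_n" -eq "$ffdi_next" ]; then
            ffdi_next=$((ffdi_next + 1))
        elif [ "$ffdi_n" -gt "$ffdi_next" ]; then
            break            # found a gap: ffdi_next is free
        fi
    done
    if [ "$ffdi_next" -lt "$ffdi_limit" ]; then
        echo "$ffdi_next"
        return 0
    fi
    return 1
}
```

For example, `printf 'dev0\ndev1\ndev3\n' | first_free_dev_index` prints 2, while `seq 0 32767 | sed 's/^/dev/' | first_free_dev_index` prints nothing and returns 1, which corresponds to the case where no fallback name is left.
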
Kirill Tkhai June 21, 2018, 10:03 a.m. UTC | #2
On 20.06.2018 20:15, David Ahern wrote:
> On 6/20/18 2:57 AM, Kirill Tkhai wrote:
>> From: Kirill Tkhai <ktkhai@virtuozzo.com>
>>
>> The following script crashes the kernel because it cannot obtain
>> a name for a device when that name is already taken by another device:
>>
>> #!/bin/bash
>> ifconfig eth0 down
>> ifconfig eth1 down
>> index=`cat /sys/class/net/eth1/ifindex`
>> ip link set eth1 name dev$index
>> unshare -n sleep 1h &
>> pid=$!
>> while [[ "`readlink /proc/self/ns/net`" == "`readlink /proc/$pid/ns/net`" ]]; do continue; done
>> ip link set dev$index netns $pid
>> ip link set eth0 name dev$index
>> kill -9 $pid
>>
>> Kernel messages:
>>
>> virtio_net virtio1 dev3: renamed from eth1
>> virtio_net virtio0 dev3: renamed from eth0
>> default_device_exit: failed to move dev3 to init_net: -17
>> ------------[ cut here ]------------
>> kernel BUG at net/core/dev.c:8978!
>> invalid opcode: 0000 [#1] PREEMPT SMP
>> CPU: 1 PID: 276 Comm: kworker/u8:3 Not tainted 4.17.0+ #292
>> Workqueue: netns cleanup_net
>> RIP: 0010:default_device_exit+0x9c/0xb0
>> [stack trace snipped]
>>
>> This patch allows more variability when choosing the new name
>> of the device and fixes the problem.
>>
>> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
>> ---
>>
>> Since there have been no suggestions on how to fix this another way, I'm resending the patch.
> 
> This patch does not remove the BUG, so does not really solve the
> problem. I.e., it is fairly trivial to write a script (32k dev%d named
> devices in init_net) that triggers it again, so your commit subject and
> commit log are not correct with the references to 'fixing the problem'.

1) I disagree, and I don't think removing the BUG() is a good idea.
This function is called from a place where it must not fail. But it can
fail, and the name conflict is not the only reason that can happen.
We can't continue with the remaining pernet_operations if a problem occurs
in default_device_exit(), and we can't remove the BUG() until this function
returns void. But we are not going to make it return void. So
we can't remove the BUG().

2) If the script is trivial, can't you just post it here to show
what type of devices you mean? Is there a real problem, or is this
theoretical?

All virtual devices I see have rtnl_link_ops, so they are simply destroyed
in default_device_exit_batch(). As for physical devices, it's difficult
to imagine a node with 32k physical devices, and anyone who tried to deploy
that many would run into problems in other places too.

> The change does provide more variability in naming and reduces the
> likelihood of not being able to push a device back to init_net.

No, it provides. With the patch, one may move a real device to a container
and allow anything to be done with the device, including changing the device
index. Then the destruction of the container does not result in a kernel
panic just because two devices have the same index.

Kirill
David Ahern June 21, 2018, 3:28 p.m. UTC | #3
On 6/21/18 4:03 AM, Kirill Tkhai wrote:
>> This patch does not remove the BUG, so does not really solve the
>> problem. I.e., it is fairly trivial to write a script (32k dev%d named
>> devices in init_net) that triggers it again, so your commit subject and
>> commit log are not correct with the references to 'fixing the problem'.
> 
> 1) I disagree, and I don't think removing the BUG() is a good idea.
> This function is called from a place where it must not fail. But it can
> fail, and the name conflict is not the only reason that can happen.
> We can't continue with the remaining pernet_operations if a problem occurs
> in default_device_exit(), and we can't remove the BUG() until this function
> returns void. But we are not going to make it return void. So
> we can't remove the BUG().

You missed my point: that the function can still fail means you are not
"fixing" the problem, only delaying it.

> 
> 2) If the script is trivial, can't you just post it here to show
> what type of devices you mean? Is there a real problem, or is this
> theoretical?

Current code:

# ip li sh dev eth2
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP
mode DEFAULT group default qlen 1000
    link/ether 02:e0:f9:46:64:80 brd ff:ff:ff:ff:ff:ff
# ip netns add fubar
# ip li set eth2 netns fubar
# ip li add eth2 type dummy
# ip li add dev4 type dummy
# ip netns del fubar
--> BUG
kernel:[78079.127748] default_device_exit: failed to move eth2 to
init_net: -17


With your patch:

# ip li sh dev eth2
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP
mode DEFAULT group default qlen 1000
    link/ether 02:e0:f9:46:64:80 brd ff:ff:ff:ff:ff:ff
# ip netns add fubar
# ip li set eth2 netns fubar
# ip li add eth2 type dummy
# for n in $(seq 0 $((32*1024))); do
  echo "li add dev${n} type dummy"
  done > ip.batch
# ip -batch ip.batch
# ip netns del fubar
--> BUG
kernel:[   25.800024] default_device_exit: failed to move eth2 to
init_net: -17


> 
> All virtual devices I see have rtnl_link_ops, so they are simply destroyed
> in default_device_exit_batch(). As for physical devices, it's difficult
> to imagine a node with 32k physical devices, and anyone who tried to deploy
> that many would run into problems in other places too.

Nothing says it has to be a physical device. It is only checking for a name.

> 
>> The change does provide more variability in naming and reduces the
>> likelihood of not being able to push a device back to init_net.
> 
> No, it provides. With the patch, one may move a real device to a container
> and allow anything to be done with the device, including changing the device
> index. Then the destruction of the container does not result in a kernel
> panic just because two devices have the same index.
> 
> Kirill
>
Kirill Tkhai June 22, 2018, 8:36 a.m. UTC | #4
On 21.06.2018 18:28, David Ahern wrote:
> On 6/21/18 4:03 AM, Kirill Tkhai wrote:
>>> This patch does not remove the BUG, so does not really solve the
>>> problem. I.e., it is fairly trivial to write a script (32k dev%d named
>>> devices in init_net) that triggers it again, so your commit subject and
>>> commit log are not correct with the references to 'fixing the problem'.
>>
>> 1) I disagree, and I don't think removing the BUG() is a good idea.
>> This function is called from a place where it must not fail. But it can
>> fail, and the name conflict is not the only reason that can happen.
>> We can't continue with the remaining pernet_operations if a problem occurs
>> in default_device_exit(), and we can't remove the BUG() until this function
>> returns void. But we are not going to make it return void. So
>> we can't remove the BUG().
> 
> You missed my point: that the function can still fail means you are not
> "fixing" the problem, only delaying it.

As long as the function returns int and can fail, we can't remove the BUG().
And this is not connected with name resolution.

>>
>> 2) If the script is trivial, can't you just post it here to show
>> what type of devices you mean? Is there a real problem, or is this
>> theoretical?
> 
> Current code:
> 
> # ip li sh dev eth2
> 4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP
> mode DEFAULT group default qlen 1000
>     link/ether 02:e0:f9:46:64:80 brd ff:ff:ff:ff:ff:ff
> # ip netns add fubar
> # ip li set eth2 netns fubar
> # ip li add eth2 type dummy
> # ip li add dev4 type dummy
> # ip netns del fubar
> --> BUG
> kernel:[78079.127748] default_device_exit: failed to move eth2 to
> init_net: -17
> 
> 
> With your patch:
> 
> # ip li sh dev eth2
> 4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP
> mode DEFAULT group default qlen 1000
>     link/ether 02:e0:f9:46:64:80 brd ff:ff:ff:ff:ff:ff
> # ip netns add fubar
> # ip li set eth2 netns fubar
> # ip li add eth2 type dummy
> # for n in $(seq 0 $((32*1024))); do
>   echo "li add dev${n} type dummy"
>   done > ip.batch
> # ip -batch ip.batch
> # ip netns del fubar
> --> BUG
> kernel:[   25.800024] default_device_exit: failed to move eth2 to
> init_net: -17

Yeah, this makes sense.

>>
>> All virtual devices I see have rtnl_link_ops, so they are simply destroyed
>> in default_device_exit_batch(). As for physical devices, it's difficult
>> to imagine a node with 32k physical devices, and anyone who tried to deploy
>> that many would run into problems in other places too.
> 
> Nothing says it has to be a physical device. It is only checking for a name.
> 
>>
>>> The change does provide more variability in naming and reduces the
>>> likelihood of not being able to push a device back to init_net.
>>
>> No, it provides. With the patch, one may move a real device to a container
>> and allow anything to be done with the device, including changing the device
>> index. Then the destruction of the container does not result in a kernel
>> panic just because two devices have the same index.
Patch

diff --git a/net/core/dev.c b/net/core/dev.c
index 6e18242a1cae..6c9b9303ded6 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -8959,7 +8959,6 @@  static void __net_exit default_device_exit(struct net *net)
 	rtnl_lock();
 	for_each_netdev_safe(net, dev, aux) {
 		int err;
-		char fb_name[IFNAMSIZ];
 
 		/* Ignore unmoveable devices (i.e. loopback) */
 		if (dev->features & NETIF_F_NETNS_LOCAL)
@@ -8970,8 +8969,7 @@  static void __net_exit default_device_exit(struct net *net)
 			continue;
 
 		/* Push remaining network devices to init_net */
-		snprintf(fb_name, IFNAMSIZ, "dev%d", dev->ifindex);
-		err = dev_change_net_namespace(dev, &init_net, fb_name);
+		err = dev_change_net_namespace(dev, &init_net, "dev%d");
 		if (err) {
 			pr_emerg("%s: failed to move %s to init_net: %d\n",
 				 __func__, dev->name, err);
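
The effect of the change can be modeled with a small shell sketch. It assumes that the device keeps its current name when that name is free in the target namespace, and that the fallback is consulted only on a conflict. The function resolve_name, its argument convention, and the 32768 limit are hypothetical and for illustration only; this is not the kernel's implementation.

```shell
#!/bin/sh
# Hypothetical model of choosing a name when a device moves to another
# namespace; not kernel code.
# Usage: resolve_name CURRENT FALLBACK   (target namespace names on stdin)
# FALLBACK is either a literal name (old behaviour, e.g. "dev4") or a
# template ending in "%d" (new behaviour, "dev%d").
# Prints the chosen name, or returns 1 when no name can be assigned
# (the thread's logs show the kernel reporting -17 in that case).
resolve_name() {
    rn_current=$1
    rn_fallback=$2
    rn_names=$(cat)
    # Keep the current name if nothing in the target namespace uses it.
    if ! printf '%s\n' "$rn_names" | grep -Fxq -e "$rn_current"; then
        printf '%s\n' "$rn_current"
        return 0
    fi
    case $rn_fallback in
    *%d)
        # Template: try prefix0, prefix1, ... up to an assumed limit.
        rn_prefix=${rn_fallback%"%d"}
        rn_n=0
        while [ "$rn_n" -lt 32768 ]; do
            if ! printf '%s\n' "$rn_names" | grep -Fxq -e "$rn_prefix$rn_n"; then
                printf '%s\n' "$rn_prefix$rn_n"
                return 0
            fi
            rn_n=$((rn_n + 1))
        done
        return 1
        ;;
    *)
        # Literal fallback: usable only if it is not taken either.
        if printf '%s\n' "$rn_names" | grep -Fxq -e "$rn_fallback"; then
            return 1
        fi
        printf '%s\n' "$rn_fallback"
        return 0
        ;;
    esac
}
```

With eth2 and dev4 already present in the target namespace (the "Current code" scenario from the comments), `printf 'eth2\ndev4\n' | resolve_name eth2 dev4` fails, while `printf 'eth2\ndev4\n' | resolve_name eth2 'dev%d'` prints dev0. The template only helps while some index below the limit is still free.
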