diff mbox

[net] bonding: fix destruction of bond with devices different from arphrd_ether

Message ID 1436989971-32761-1-git-send-email-razor@blackwall.org
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Nikolay Aleksandrov July 15, 2015, 7:52 p.m. UTC
From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

When the bonding is being unloaded and the netdevice notifier is
unregistered it executes NETDEV_UNREGISTER for each device which should
remove the bond's proc entry but if the device enslaved is not of
ARPHRD_ETHER type and is in front of the bonding, it may execute
bond_release_and_destroy() first which would release the last slave and
destroy the bond device leaving the proc entry and thus we will get the
following error (with dynamic debug on for bond_netdev_event to see the
events order):
[  908.963051] eql: event: 9
[  908.963052] eql: IFF_SLAVE
[  908.963054] eql: event: 2
[  908.963056] eql: IFF_SLAVE
[  908.963058] eql: event: 6
[  908.963059] eql: IFF_SLAVE
[  908.963110] bond0: Releasing active interface eql
[  908.976168] bond0: Destroying bond bond0
[  908.976266] bond0 (unregistering): Released all slaves
[  908.984097] ------------[ cut here ]------------
[  908.984107] WARNING: CPU: 0 PID: 1787 at fs/proc/generic.c:575
remove_proc_entry+0x112/0x160()
[  908.984110] remove_proc_entry: removing non-empty directory
'net/bonding', leaking at least 'bond0'
[  908.984111] Modules linked in: bonding(-) eql(O) 9p nfsd auth_rpcgss
oid_registry nfs_acl nfs lockd grace fscache sunrpc crct10dif_pclmul
crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev qxl drm_kms_helper
snd_hda_codec_generic aesni_intel ttm aes_x86_64 glue_helper pcspkr lrw
gf128mul ablk_helper cryptd snd_hda_intel virtio_console snd_hda_codec
psmouse serio_raw snd_hwdep snd_hda_core 9pnet_virtio 9pnet evdev joydev
drm virtio_balloon snd_pcm snd_timer snd soundcore i2c_piix4 i2c_core
pvpanic acpi_cpufreq parport_pc parport processor thermal_sys button
autofs4 ext4 crc16 mbcache jbd2 hid_generic usbhid hid sg sr_mod cdrom
ata_generic virtio_blk virtio_net floppy ata_piix e1000 libata ehci_pci
virtio_pci scsi_mod uhci_hcd ehci_hcd virtio_ring virtio usbcore
usb_common [last unloaded: bonding]

[  908.984168] CPU: 0 PID: 1787 Comm: rmmod Tainted: G        W  O
4.2.0-rc2+ #8
[  908.984170] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  908.984172]  0000000000000000 ffffffff81732d41 ffffffff81525b34
ffff8800358dfda8
[  908.984175]  ffffffff8106c521 ffff88003595af78 ffff88003595af40
ffff88003e3a4280
[  908.984178]  ffffffffa058d040 0000000000000000 ffffffff8106c59a
ffffffff8172ebd0
[  908.984181] Call Trace:
[  908.984188]  [<ffffffff81525b34>] ? dump_stack+0x40/0x50
[  908.984193]  [<ffffffff8106c521>] ? warn_slowpath_common+0x81/0xb0
[  908.984196]  [<ffffffff8106c59a>] ? warn_slowpath_fmt+0x4a/0x50
[  908.984199]  [<ffffffff81218352>] ? remove_proc_entry+0x112/0x160
[  908.984205]  [<ffffffffa05850e6>] ? bond_destroy_proc_dir+0x26/0x30
[bonding]
[  908.984208]  [<ffffffffa057540e>] ? bond_net_exit+0x8e/0xa0 [bonding]
[  908.984217]  [<ffffffff8142f407>] ? ops_exit_list.isra.4+0x37/0x70
[  908.984225]  [<ffffffff8142f52d>] ?
unregister_pernet_operations+0x8d/0xd0
[  908.984228]  [<ffffffff8142f58d>] ?
unregister_pernet_subsys+0x1d/0x30
[  908.984232]  [<ffffffffa0585269>] ? bonding_exit+0x23/0xdba [bonding]
[  908.984236]  [<ffffffff810e28ba>] ? SyS_delete_module+0x18a/0x250
[  908.984241]  [<ffffffff81086f99>] ? task_work_run+0x89/0xc0
[  908.984244]  [<ffffffff8152b732>] ?
entry_SYSCALL_64_fastpath+0x16/0x75
[  908.984247] ---[ end trace 7c006ed4abbef24b ]---

Thus remove the proc entry manually if bond_release_and_destroy() is
used. Because of the checks in bond_remove_proc_entry() it's not a
problem for a bond device to change namespaces (the bug fixed by the
Fixes commit) but since commit
f9399814927ad ("bonding: Don't allow bond devices to change network
namespaces.") that can't happen anyway.

Reported-by: Carol Soto <clsoto@linux.vnet.ibm.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Fixes: a64d49c3dd50 ("bonding: Manage /proc/net/bonding/ entries from
                      the netdev events")
---
 drivers/net/bonding/bond_main.c | 1 +
 1 file changed, 1 insertion(+)

Comments

clsoto@linux.vnet.ibm.com July 15, 2015, 9:01 p.m. UTC | #1
On 7/15/2015 2:52 PM, Nikolay Aleksandrov wrote:
> From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>
> When the bonding is being unloaded and the netdevice notifier is
> unregistered it executes NETDEV_UNREGISTER for each device which should
> remove the bond's proc entry but if the device enslaved is not of
> ARPHRD_ETHER type and is in front of the bonding, it may execute
> bond_release_and_destroy() first which would release the last slave and
> destroy the bond device leaving the proc entry and thus we will get the
> following error (with dynamic debug on for bond_netdev_event to see the
> events order):
> [  908.963051] eql: event: 9
> [  908.963052] eql: IFF_SLAVE
> [  908.963054] eql: event: 2
> [  908.963056] eql: IFF_SLAVE
> [  908.963058] eql: event: 6
> [  908.963059] eql: IFF_SLAVE
> [  908.963110] bond0: Releasing active interface eql
> [  908.976168] bond0: Destroying bond bond0
> [  908.976266] bond0 (unregistering): Released all slaves
> [  908.984097] ------------[ cut here ]------------
> [  908.984107] WARNING: CPU: 0 PID: 1787 at fs/proc/generic.c:575
> remove_proc_entry+0x112/0x160()
> [  908.984110] remove_proc_entry: removing non-empty directory
> 'net/bonding', leaking at least 'bond0'
> [  908.984111] Modules linked in: bonding(-) eql(O) 9p nfsd auth_rpcgss
> oid_registry nfs_acl nfs lockd grace fscache sunrpc crct10dif_pclmul
> crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev qxl drm_kms_helper
> snd_hda_codec_generic aesni_intel ttm aes_x86_64 glue_helper pcspkr lrw
> gf128mul ablk_helper cryptd snd_hda_intel virtio_console snd_hda_codec
> psmouse serio_raw snd_hwdep snd_hda_core 9pnet_virtio 9pnet evdev joydev
> drm virtio_balloon snd_pcm snd_timer snd soundcore i2c_piix4 i2c_core
> pvpanic acpi_cpufreq parport_pc parport processor thermal_sys button
> autofs4 ext4 crc16 mbcache jbd2 hid_generic usbhid hid sg sr_mod cdrom
> ata_generic virtio_blk virtio_net floppy ata_piix e1000 libata ehci_pci
> virtio_pci scsi_mod uhci_hcd ehci_hcd virtio_ring virtio usbcore
> usb_common [last unloaded: bonding]
>
> [  908.984168] CPU: 0 PID: 1787 Comm: rmmod Tainted: G        W  O
> 4.2.0-rc2+ #8
> [  908.984170] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [  908.984172]  0000000000000000 ffffffff81732d41 ffffffff81525b34
> ffff8800358dfda8
> [  908.984175]  ffffffff8106c521 ffff88003595af78 ffff88003595af40
> ffff88003e3a4280
> [  908.984178]  ffffffffa058d040 0000000000000000 ffffffff8106c59a
> ffffffff8172ebd0
> [  908.984181] Call Trace:
> [  908.984188]  [<ffffffff81525b34>] ? dump_stack+0x40/0x50
> [  908.984193]  [<ffffffff8106c521>] ? warn_slowpath_common+0x81/0xb0
> [  908.984196]  [<ffffffff8106c59a>] ? warn_slowpath_fmt+0x4a/0x50
> [  908.984199]  [<ffffffff81218352>] ? remove_proc_entry+0x112/0x160
> [  908.984205]  [<ffffffffa05850e6>] ? bond_destroy_proc_dir+0x26/0x30
> [bonding]
> [  908.984208]  [<ffffffffa057540e>] ? bond_net_exit+0x8e/0xa0 [bonding]
> [  908.984217]  [<ffffffff8142f407>] ? ops_exit_list.isra.4+0x37/0x70
> [  908.984225]  [<ffffffff8142f52d>] ?
> unregister_pernet_operations+0x8d/0xd0
> [  908.984228]  [<ffffffff8142f58d>] ?
> unregister_pernet_subsys+0x1d/0x30
> [  908.984232]  [<ffffffffa0585269>] ? bonding_exit+0x23/0xdba [bonding]
> [  908.984236]  [<ffffffff810e28ba>] ? SyS_delete_module+0x18a/0x250
> [  908.984241]  [<ffffffff81086f99>] ? task_work_run+0x89/0xc0
> [  908.984244]  [<ffffffff8152b732>] ?
> entry_SYSCALL_64_fastpath+0x16/0x75
> [  908.984247] ---[ end trace 7c006ed4abbef24b ]---
>
> Thus remove the proc entry manually if bond_release_and_destroy() is
> used. Because of the checks in bond_remove_proc_entry() it's not a
> problem for a bond device to change namespaces (the bug fixed by the
> Fixes commit) but since commit
> f9399814927ad ("bonding: Don't allow bond devices to change network
> namespaces.") that can't happen anyway.
>
> Reported-by: Carol Soto <clsoto@linux.vnet.ibm.com>
> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> Fixes: a64d49c3dd50 ("bonding: Manage /proc/net/bonding/ entries from
>                        the netdev events")
> ---
>   drivers/net/bonding/bond_main.c | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 317a49480475..ec1404ec4d2f 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -1916,6 +1916,7 @@ static int  bond_release_and_destroy(struct net_device *bond_dev,
>   		bond_dev->priv_flags |= IFF_DISABLE_NETPOLL;
>   		netdev_info(bond_dev, "Destroying bond %s\n",
>   			    bond_dev->name);
> +		bond_remove_proc_entry(bond);
>   		unregister_netdevice(bond_dev);
>   	}
>   	return ret;
Tested-by: Carol L Soto <clsoto@linux.vnet.ibm.com>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric W. Biederman July 15, 2015, 10:39 p.m. UTC | #2
Nikolay Aleksandrov <razor@blackwall.org> writes:

> From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>
> When the bonding is being unloaded and the netdevice notifier is
> unregistered it executes NETDEV_UNREGISTER for each device which should
> remove the bond's proc entry but if the device enslaved is not of
> ARPHRD_ETHER type and is in front of the bonding, it may execute
> bond_release_and_destroy() first which would release the last slave and
> destroy the bond device leaving the proc entry and thus we will get the
> following error (with dynamic debug on for bond_netdev_event to see the
> events order):

I see the failure below.  I am perplexed at the description.  Does the
bonding driver actually make sense on a non-ethernet device?

Would it not be better to make more sense to limit bonding to ethernet
devices so we don't get weird behavior?  I imagine there might be other
problems with bonding non-ethernet devices that no one has spotted,
or cares about.

Eric


> [  908.963051] eql: event: 9
> [  908.963052] eql: IFF_SLAVE
> [  908.963054] eql: event: 2
> [  908.963056] eql: IFF_SLAVE
> [  908.963058] eql: event: 6
> [  908.963059] eql: IFF_SLAVE
> [  908.963110] bond0: Releasing active interface eql
> [  908.976168] bond0: Destroying bond bond0
> [  908.976266] bond0 (unregistering): Released all slaves
> [  908.984097] ------------[ cut here ]------------
> [  908.984107] WARNING: CPU: 0 PID: 1787 at fs/proc/generic.c:575
> remove_proc_entry+0x112/0x160()
> [  908.984110] remove_proc_entry: removing non-empty directory
> 'net/bonding', leaking at least 'bond0'
> [  908.984111] Modules linked in: bonding(-) eql(O) 9p nfsd auth_rpcgss
> oid_registry nfs_acl nfs lockd grace fscache sunrpc crct10dif_pclmul
> crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev qxl drm_kms_helper
> snd_hda_codec_generic aesni_intel ttm aes_x86_64 glue_helper pcspkr lrw
> gf128mul ablk_helper cryptd snd_hda_intel virtio_console snd_hda_codec
> psmouse serio_raw snd_hwdep snd_hda_core 9pnet_virtio 9pnet evdev joydev
> drm virtio_balloon snd_pcm snd_timer snd soundcore i2c_piix4 i2c_core
> pvpanic acpi_cpufreq parport_pc parport processor thermal_sys button
> autofs4 ext4 crc16 mbcache jbd2 hid_generic usbhid hid sg sr_mod cdrom
> ata_generic virtio_blk virtio_net floppy ata_piix e1000 libata ehci_pci
> virtio_pci scsi_mod uhci_hcd ehci_hcd virtio_ring virtio usbcore
> usb_common [last unloaded: bonding]
>
> [  908.984168] CPU: 0 PID: 1787 Comm: rmmod Tainted: G        W  O
> 4.2.0-rc2+ #8
> [  908.984170] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [  908.984172]  0000000000000000 ffffffff81732d41 ffffffff81525b34
> ffff8800358dfda8
> [  908.984175]  ffffffff8106c521 ffff88003595af78 ffff88003595af40
> ffff88003e3a4280
> [  908.984178]  ffffffffa058d040 0000000000000000 ffffffff8106c59a
> ffffffff8172ebd0
> [  908.984181] Call Trace:
> [  908.984188]  [<ffffffff81525b34>] ? dump_stack+0x40/0x50
> [  908.984193]  [<ffffffff8106c521>] ? warn_slowpath_common+0x81/0xb0
> [  908.984196]  [<ffffffff8106c59a>] ? warn_slowpath_fmt+0x4a/0x50
> [  908.984199]  [<ffffffff81218352>] ? remove_proc_entry+0x112/0x160
> [  908.984205]  [<ffffffffa05850e6>] ? bond_destroy_proc_dir+0x26/0x30
> [bonding]
> [  908.984208]  [<ffffffffa057540e>] ? bond_net_exit+0x8e/0xa0 [bonding]
> [  908.984217]  [<ffffffff8142f407>] ? ops_exit_list.isra.4+0x37/0x70
> [  908.984225]  [<ffffffff8142f52d>] ?
> unregister_pernet_operations+0x8d/0xd0
> [  908.984228]  [<ffffffff8142f58d>] ?
> unregister_pernet_subsys+0x1d/0x30
> [  908.984232]  [<ffffffffa0585269>] ? bonding_exit+0x23/0xdba [bonding]
> [  908.984236]  [<ffffffff810e28ba>] ? SyS_delete_module+0x18a/0x250
> [  908.984241]  [<ffffffff81086f99>] ? task_work_run+0x89/0xc0
> [  908.984244]  [<ffffffff8152b732>] ?
> entry_SYSCALL_64_fastpath+0x16/0x75
> [  908.984247] ---[ end trace 7c006ed4abbef24b ]---
>
> Thus remove the proc entry manually if bond_release_and_destroy() is
> used. Because of the checks in bond_remove_proc_entry() it's not a
> problem for a bond device to change namespaces (the bug fixed by the
> Fixes commit) but since commit
> f9399814927ad ("bonding: Don't allow bond devices to change network
> namespaces.") that can't happen anyway.
>
> Reported-by: Carol Soto <clsoto@linux.vnet.ibm.com>
> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> Fixes: a64d49c3dd50 ("bonding: Manage /proc/net/bonding/ entries from
>                       the netdev events")
> ---
>  drivers/net/bonding/bond_main.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 317a49480475..ec1404ec4d2f 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -1916,6 +1916,7 @@ static int  bond_release_and_destroy(struct net_device *bond_dev,
>  		bond_dev->priv_flags |= IFF_DISABLE_NETPOLL;
>  		netdev_info(bond_dev, "Destroying bond %s\n",
>  			    bond_dev->name);
> +		bond_remove_proc_entry(bond);
>  		unregister_netdevice(bond_dev);
>  	}
>  	return ret;
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nikolay Aleksandrov July 15, 2015, 10:54 p.m. UTC | #3
On 07/16/2015 12:39 AM, Eric W. Biederman wrote:
> Nikolay Aleksandrov <razor@blackwall.org> writes:
> 
>> From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>>
>> When the bonding is being unloaded and the netdevice notifier is
>> unregistered it executes NETDEV_UNREGISTER for each device which should
>> remove the bond's proc entry but if the device enslaved is not of
>> ARPHRD_ETHER type and is in front of the bonding, it may execute
>> bond_release_and_destroy() first which would release the last slave and
>> destroy the bond device leaving the proc entry and thus we will get the
>> following error (with dynamic debug on for bond_netdev_event to see the
>> events order):
> 
> I see the failure below.  I am perplexed at the description.  Does the
> bonding driver actually make sense on a non-ethernet device?
> 
Sometimes it does, I've seen it used with infiniband devices a lot, also
there were cases of some ppp users.

> Would it not be better to make more sense to limit bonding to ethernet
> devices so we don't get weird behavior?  I imagine there might be other
> problems with bonding non-ethernet devices that no one has spotted,
> or cares about.
> 
> Eric
> 
> 
My personal opinion would be to disable non-ethernet devices, but support was
already added and has been there for a long time so we have to fix this for
the older releases, I don't mind removing non-ethernet device support for net-next
but I'm guessing there're people still using that like the case that started
this thread.

Cheers,
 Nik

>> [  908.963051] eql: event: 9
>> [  908.963052] eql: IFF_SLAVE
>> [  908.963054] eql: event: 2
>> [  908.963056] eql: IFF_SLAVE
>> [  908.963058] eql: event: 6
>> [  908.963059] eql: IFF_SLAVE
>> [  908.963110] bond0: Releasing active interface eql
>> [  908.976168] bond0: Destroying bond bond0
>> [  908.976266] bond0 (unregistering): Released all slaves
>> [  908.984097] ------------[ cut here ]------------
>> [  908.984107] WARNING: CPU: 0 PID: 1787 at fs/proc/generic.c:575
>> remove_proc_entry+0x112/0x160()
>> [  908.984110] remove_proc_entry: removing non-empty directory
>> 'net/bonding', leaking at least 'bond0'
>> [  908.984111] Modules linked in: bonding(-) eql(O) 9p nfsd auth_rpcgss
>> oid_registry nfs_acl nfs lockd grace fscache sunrpc crct10dif_pclmul
>> crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev qxl drm_kms_helper
>> snd_hda_codec_generic aesni_intel ttm aes_x86_64 glue_helper pcspkr lrw
>> gf128mul ablk_helper cryptd snd_hda_intel virtio_console snd_hda_codec
>> psmouse serio_raw snd_hwdep snd_hda_core 9pnet_virtio 9pnet evdev joydev
>> drm virtio_balloon snd_pcm snd_timer snd soundcore i2c_piix4 i2c_core
>> pvpanic acpi_cpufreq parport_pc parport processor thermal_sys button
>> autofs4 ext4 crc16 mbcache jbd2 hid_generic usbhid hid sg sr_mod cdrom
>> ata_generic virtio_blk virtio_net floppy ata_piix e1000 libata ehci_pci
>> virtio_pci scsi_mod uhci_hcd ehci_hcd virtio_ring virtio usbcore
>> usb_common [last unloaded: bonding]
>>
>> [  908.984168] CPU: 0 PID: 1787 Comm: rmmod Tainted: G        W  O
>> 4.2.0-rc2+ #8
>> [  908.984170] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
>> [  908.984172]  0000000000000000 ffffffff81732d41 ffffffff81525b34
>> ffff8800358dfda8
>> [  908.984175]  ffffffff8106c521 ffff88003595af78 ffff88003595af40
>> ffff88003e3a4280
>> [  908.984178]  ffffffffa058d040 0000000000000000 ffffffff8106c59a
>> ffffffff8172ebd0
>> [  908.984181] Call Trace:
>> [  908.984188]  [<ffffffff81525b34>] ? dump_stack+0x40/0x50
>> [  908.984193]  [<ffffffff8106c521>] ? warn_slowpath_common+0x81/0xb0
>> [  908.984196]  [<ffffffff8106c59a>] ? warn_slowpath_fmt+0x4a/0x50
>> [  908.984199]  [<ffffffff81218352>] ? remove_proc_entry+0x112/0x160
>> [  908.984205]  [<ffffffffa05850e6>] ? bond_destroy_proc_dir+0x26/0x30
>> [bonding]
>> [  908.984208]  [<ffffffffa057540e>] ? bond_net_exit+0x8e/0xa0 [bonding]
>> [  908.984217]  [<ffffffff8142f407>] ? ops_exit_list.isra.4+0x37/0x70
>> [  908.984225]  [<ffffffff8142f52d>] ?
>> unregister_pernet_operations+0x8d/0xd0
>> [  908.984228]  [<ffffffff8142f58d>] ?
>> unregister_pernet_subsys+0x1d/0x30
>> [  908.984232]  [<ffffffffa0585269>] ? bonding_exit+0x23/0xdba [bonding]
>> [  908.984236]  [<ffffffff810e28ba>] ? SyS_delete_module+0x18a/0x250
>> [  908.984241]  [<ffffffff81086f99>] ? task_work_run+0x89/0xc0
>> [  908.984244]  [<ffffffff8152b732>] ?
>> entry_SYSCALL_64_fastpath+0x16/0x75
>> [  908.984247] ---[ end trace 7c006ed4abbef24b ]---
>>
>> Thus remove the proc entry manually if bond_release_and_destroy() is
>> used. Because of the checks in bond_remove_proc_entry() it's not a
>> problem for a bond device to change namespaces (the bug fixed by the
>> Fixes commit) but since commit
>> f9399814927ad ("bonding: Don't allow bond devices to change network
>> namespaces.") that can't happen anyway.
>>
>> Reported-by: Carol Soto <clsoto@linux.vnet.ibm.com>
>> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>> Fixes: a64d49c3dd50 ("bonding: Manage /proc/net/bonding/ entries from
>>                       the netdev events")
>> ---
>>  drivers/net/bonding/bond_main.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>> index 317a49480475..ec1404ec4d2f 100644
>> --- a/drivers/net/bonding/bond_main.c
>> +++ b/drivers/net/bonding/bond_main.c
>> @@ -1916,6 +1916,7 @@ static int  bond_release_and_destroy(struct net_device *bond_dev,
>>  		bond_dev->priv_flags |= IFF_DISABLE_NETPOLL;
>>  		netdev_info(bond_dev, "Destroying bond %s\n",
>>  			    bond_dev->name);
>> +		bond_remove_proc_entry(bond);
>>  		unregister_netdevice(bond_dev);
>>  	}
>>  	return ret;

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
clsoto@linux.vnet.ibm.com July 15, 2015, 10:58 p.m. UTC | #4
On 7/15/2015 5:54 PM, Nikolay Aleksandrov wrote:
> On 07/16/2015 12:39 AM, Eric W. Biederman wrote:
>> Nikolay Aleksandrov <razor@blackwall.org> writes:
>>
>>> From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>>>
>>> When the bonding is being unloaded and the netdevice notifier is
>>> unregistered it executes NETDEV_UNREGISTER for each device which should
>>> remove the bond's proc entry but if the device enslaved is not of
>>> ARPHRD_ETHER type and is in front of the bonding, it may execute
>>> bond_release_and_destroy() first which would release the last slave and
>>> destroy the bond device leaving the proc entry and thus we will get the
>>> following error (with dynamic debug on for bond_netdev_event to see the
>>> events order):
>> I see the failure below.  I am perplexed at the description.  Does the
>> bonding driver actually make sense on a non-ethernet device?
>>
> Sometimes it does, I've seen it used with infiniband devices a lot, also
> there were cases of some ppp users.
>
>> Would it not be better to make more sense to limit bonding to ethernet
>> devices so we don't get weird behavior?  I imagine there might be other
>> problems with bonding non-ethernet devices that no one has spotted,
>> or cares about.
>>
>> Eric
>>
>>
> My personal opinion would be to disable non-ethernet devices, but support was
> already added and has been there for a long time so we have to fix this for
> the older releases, I don't mind removing non-ethernet device support for net-next
> but I'm guessing there're people still using that like the case that started
> this thread.
>
> Cheers,
>   Nik
Yes, there are Infiniband users that uses bonding.

Carol
>>> [  908.963051] eql: event: 9
>>> [  908.963052] eql: IFF_SLAVE
>>> [  908.963054] eql: event: 2
>>> [  908.963056] eql: IFF_SLAVE
>>> [  908.963058] eql: event: 6
>>> [  908.963059] eql: IFF_SLAVE
>>> [  908.963110] bond0: Releasing active interface eql
>>> [  908.976168] bond0: Destroying bond bond0
>>> [  908.976266] bond0 (unregistering): Released all slaves
>>> [  908.984097] ------------[ cut here ]------------
>>> [  908.984107] WARNING: CPU: 0 PID: 1787 at fs/proc/generic.c:575
>>> remove_proc_entry+0x112/0x160()
>>> [  908.984110] remove_proc_entry: removing non-empty directory
>>> 'net/bonding', leaking at least 'bond0'
>>> [  908.984111] Modules linked in: bonding(-) eql(O) 9p nfsd auth_rpcgss
>>> oid_registry nfs_acl nfs lockd grace fscache sunrpc crct10dif_pclmul
>>> crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev qxl drm_kms_helper
>>> snd_hda_codec_generic aesni_intel ttm aes_x86_64 glue_helper pcspkr lrw
>>> gf128mul ablk_helper cryptd snd_hda_intel virtio_console snd_hda_codec
>>> psmouse serio_raw snd_hwdep snd_hda_core 9pnet_virtio 9pnet evdev joydev
>>> drm virtio_balloon snd_pcm snd_timer snd soundcore i2c_piix4 i2c_core
>>> pvpanic acpi_cpufreq parport_pc parport processor thermal_sys button
>>> autofs4 ext4 crc16 mbcache jbd2 hid_generic usbhid hid sg sr_mod cdrom
>>> ata_generic virtio_blk virtio_net floppy ata_piix e1000 libata ehci_pci
>>> virtio_pci scsi_mod uhci_hcd ehci_hcd virtio_ring virtio usbcore
>>> usb_common [last unloaded: bonding]
>>>
>>> [  908.984168] CPU: 0 PID: 1787 Comm: rmmod Tainted: G        W  O
>>> 4.2.0-rc2+ #8
>>> [  908.984170] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
>>> [  908.984172]  0000000000000000 ffffffff81732d41 ffffffff81525b34
>>> ffff8800358dfda8
>>> [  908.984175]  ffffffff8106c521 ffff88003595af78 ffff88003595af40
>>> ffff88003e3a4280
>>> [  908.984178]  ffffffffa058d040 0000000000000000 ffffffff8106c59a
>>> ffffffff8172ebd0
>>> [  908.984181] Call Trace:
>>> [  908.984188]  [<ffffffff81525b34>] ? dump_stack+0x40/0x50
>>> [  908.984193]  [<ffffffff8106c521>] ? warn_slowpath_common+0x81/0xb0
>>> [  908.984196]  [<ffffffff8106c59a>] ? warn_slowpath_fmt+0x4a/0x50
>>> [  908.984199]  [<ffffffff81218352>] ? remove_proc_entry+0x112/0x160
>>> [  908.984205]  [<ffffffffa05850e6>] ? bond_destroy_proc_dir+0x26/0x30
>>> [bonding]
>>> [  908.984208]  [<ffffffffa057540e>] ? bond_net_exit+0x8e/0xa0 [bonding]
>>> [  908.984217]  [<ffffffff8142f407>] ? ops_exit_list.isra.4+0x37/0x70
>>> [  908.984225]  [<ffffffff8142f52d>] ?
>>> unregister_pernet_operations+0x8d/0xd0
>>> [  908.984228]  [<ffffffff8142f58d>] ?
>>> unregister_pernet_subsys+0x1d/0x30
>>> [  908.984232]  [<ffffffffa0585269>] ? bonding_exit+0x23/0xdba [bonding]
>>> [  908.984236]  [<ffffffff810e28ba>] ? SyS_delete_module+0x18a/0x250
>>> [  908.984241]  [<ffffffff81086f99>] ? task_work_run+0x89/0xc0
>>> [  908.984244]  [<ffffffff8152b732>] ?
>>> entry_SYSCALL_64_fastpath+0x16/0x75
>>> [  908.984247] ---[ end trace 7c006ed4abbef24b ]---
>>>
>>> Thus remove the proc entry manually if bond_release_and_destroy() is
>>> used. Because of the checks in bond_remove_proc_entry() it's not a
>>> problem for a bond device to change namespaces (the bug fixed by the
>>> Fixes commit) but since commit
>>> f9399814927ad ("bonding: Don't allow bond devices to change network
>>> namespaces.") that can't happen anyway.
>>>
>>> Reported-by: Carol Soto <clsoto@linux.vnet.ibm.com>
>>> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>>> Fixes: a64d49c3dd50 ("bonding: Manage /proc/net/bonding/ entries from
>>>                        the netdev events")
>>> ---
>>>   drivers/net/bonding/bond_main.c | 1 +
>>>   1 file changed, 1 insertion(+)
>>>
>>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>>> index 317a49480475..ec1404ec4d2f 100644
>>> --- a/drivers/net/bonding/bond_main.c
>>> +++ b/drivers/net/bonding/bond_main.c
>>> @@ -1916,6 +1916,7 @@ static int  bond_release_and_destroy(struct net_device *bond_dev,
>>>   		bond_dev->priv_flags |= IFF_DISABLE_NETPOLL;
>>>   		netdev_info(bond_dev, "Destroying bond %s\n",
>>>   			    bond_dev->name);
>>> +		bond_remove_proc_entry(bond);
>>>   		unregister_netdevice(bond_dev);
>>>   	}
>>>   	return ret;

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Veaceslav Falico July 16, 2015, 6:14 a.m. UTC | #5
On Thu, Jul 16, 2015 at 12:54:06AM +0200, Nikolay Aleksandrov wrote:
...snip...
>My personal opinion would be to disable non-ethernet devices, but support was
>already added and has been there for a long time so we have to fix this for
>the older releases, I don't mind removing non-ethernet device support for net-next
>but I'm guessing there're people still using that like the case that started
>this thread.

Yep, last time non-ethernet devices broke there were quite a few reports
for ppp users, and infiniband is indeed used a lot too. No options here, we
must support it...
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller July 20, 2015, 7:56 p.m. UTC | #6
From: Nikolay Aleksandrov <razor@blackwall.org>
Date: Wed, 15 Jul 2015 21:52:51 +0200

> From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> 
> When the bonding is being unloaded and the netdevice notifier is
> unregistered it executes NETDEV_UNREGISTER for each device which should
> remove the bond's proc entry but if the device enslaved is not of
> ARPHRD_ETHER type and is in front of the bonding, it may execute
> bond_release_and_destroy() first which would release the last slave and
> destroy the bond device leaving the proc entry and thus we will get the
> following error (with dynamic debug on for bond_netdev_event to see the
> events order):
 ...
> Thus remove the proc entry manually if bond_release_and_destroy() is
> used. Because of the checks in bond_remove_proc_entry() it's not a
> problem for a bond device to change namespaces (the bug fixed by the
> Fixes commit) but since commit
> f9399814927ad ("bonding: Don't allow bond devices to change network
> namespaces.") that can't happen anyway.
> 
> Reported-by: Carol Soto <clsoto@linux.vnet.ibm.com>
> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> Fixes: a64d49c3dd50 ("bonding: Manage /proc/net/bonding/ entries from
>                       the netdev events")

Applied and queued up for -stable, thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 317a49480475..ec1404ec4d2f 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1916,6 +1916,7 @@  static int  bond_release_and_destroy(struct net_device *bond_dev,
 		bond_dev->priv_flags |= IFF_DISABLE_NETPOLL;
 		netdev_info(bond_dev, "Destroying bond %s\n",
 			    bond_dev->name);
+		bond_remove_proc_entry(bond);
 		unregister_netdevice(bond_dev);
 	}
 	return ret;