[nf] ipvs: more robust refcounting when sync thread starts
diff mbox series

Message ID 2bac83f117771cc5c5a59c2bf816039b0efa8239.1570010302.git.dcaratti@redhat.com
State Changes Requested
Delegated to: Pablo Neira
Headers show
Series
  • [nf] ipvs: more robust refcounting when sync thread starts
Related show

Commit Message

Davide Caratti Oct. 2, 2019, 10:05 a.m. UTC
if the IPVS module is removed while the sync daemon is starting, there is
a small possibility for stop_sync_thread() to end successfully just after
start_sync_thread() released sync_mutex, but before increasing the module
refcount. When this race happens, the following warning is seen in dmesg:

 IPVS: stopping master sync thread 14134 ...
 ------------[ cut here ]------------
 WARNING: CPU: 0 PID: 14133 at kernel/module.c:1137 module_put.part.44+0x167/0x2a0
 Modules linked in: ip_vs(-) nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 sch_netem veth ip6table_filter ip6_tables iptable_filter binfmt_misc intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_hda_codec ext4 snd_hda_core snd_hwdep mbcache jbd2 snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd glue_helper joydev snd_timer virtio_balloon pcspkr snd soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c ata_generic pata_acpi virtio_net virtio_console virtio_blk net_failover failover qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm crc32c_intel serio_raw ata_piix virtio_pci drm virtio_ring virtio libata floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: nf_defrag_ipv6]
 CPU: 0 PID: 14133 Comm: modprobe Tainted: G        W         5.3.0-rc7.upstream+ #723
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 RIP: 0010:module_put.part.44+0x167/0x2a0
 Code: 48 83 c4 68 5b 5d 41 5c 41 5d 41 5e 41 5f c3 89 44 24 28 83 e8 01 89 c5 0f 89 57 ff ff ff 48 c7 c7 60 9c ed 86 e8 32 4c f5 ff <0f> 0b e9 6c ff ff ff 65 8b 1d 6b b2 e6 7a 89 db be 08 00 00 00 48
 RSP: 0018:ffff88804f9b7c78 EFLAGS: 00010282
 RAX: 0000000000000024 RBX: ffffffffc1609690 RCX: ffffffff85a82042
 RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff888053627bec
 RBP: 00000000ffffffff R08: ffffed100a6c6181 R09: ffffed100a6c6181
 R10: 0000000000000001 R11: ffffed100a6c6180 R12: 1ffff11009f36f90
 R13: ffffffffc1609300 R14: ffff88804c0d5500 R15: ffff88804c0d5ea0
 FS:  00007fa1c73f8740(0000) GS:ffff888053600000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007f914f4d345f CR3: 000000004855a005 CR4: 00000000001606f0
 Call Trace:
  stop_sync_thread+0x3a3/0x7c0 [ip_vs]
  ip_vs_sync_net_cleanup+0x13/0x50 [ip_vs]
  ops_exit_list.isra.5+0x94/0x140
  unregister_pernet_operations+0x29d/0x460
  unregister_pernet_device+0x26/0x60
  ip_vs_cleanup+0x11/0x38 [ip_vs]
  do_syscall_64+0xa5/0x4f0
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
 RIP: 0033:0x7fa1c68c9027
 Code: 73 01 c3 48 8b 0d 69 6e 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 39 6e 2c 00 f7 d8 64 89 01 48
 RSP: 002b:00007ffe8b021cf8 EFLAGS: 00000202 ORIG_RAX: 00000000000000b0
 RAX: ffffffffffffffda RBX: 0000000001528240 RCX: 00007fa1c68c9027
 RDX: 0000000000000000 RSI: 0000000000000800 RDI: 00000000015282a8
 RBP: 0000000000000000 R08: 00007fa1c6b91060 R09: 00007fa1c693d1a0
 R10: 00007ffe8b021760 R11: 0000000000000202 R12: 0000000000000000
 R13: 0000000000000001 R14: 00000000015282a8 R15: 0000000000000000

Fix this ensuring that start_sync_thread() takes a refcount for the ip_vs
module earlier, and eventually drop it in the error path.

Reported-by: Zhi Li <yieli@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
---
 net/netfilter/ipvs/ip_vs_sync.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

Comments

Julian Anastasov Oct. 3, 2019, 7:28 p.m. UTC | #1
Hello,

On Wed, 2 Oct 2019, Davide Caratti wrote:

> if the IPVS module is removed while the sync daemon is starting, there is
> a small possibility for stop_sync_thread() to end successfully just after
> start_sync_thread() released sync_mutex, but before increasing the module
> refcount. When this race happens, the following warning is seen in dmesg:
> 
>  IPVS: stopping master sync thread 14134 ...
>  ------------[ cut here ]------------
>  WARNING: CPU: 0 PID: 14133 at kernel/module.c:1137 module_put.part.44+0x167/0x2a0
>  Modules linked in: ip_vs(-) nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 sch_netem veth ip6table_filter ip6_tables iptable_filter binfmt_misc intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_hda_codec ext4 snd_hda_core snd_hwdep mbcache jbd2 snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd glue_helper joydev snd_timer virtio_balloon pcspkr snd soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c ata_generic pata_acpi virtio_net virtio_console virtio_blk net_failover failover qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm crc32c_intel serio_raw ata_piix virtio_pci drm virtio_ring virtio libata floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: nf_defrag_ipv6]
>  CPU: 0 PID: 14133 Comm: modprobe Tainted: G        W         5.3.0-rc7.upstream+ #723
>  Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
>  RIP: 0010:module_put.part.44+0x167/0x2a0
>  Code: 48 83 c4 68 5b 5d 41 5c 41 5d 41 5e 41 5f c3 89 44 24 28 83 e8 01 89 c5 0f 89 57 ff ff ff 48 c7 c7 60 9c ed 86 e8 32 4c f5 ff <0f> 0b e9 6c ff ff ff 65 8b 1d 6b b2 e6 7a 89 db be 08 00 00 00 48
>  RSP: 0018:ffff88804f9b7c78 EFLAGS: 00010282
>  RAX: 0000000000000024 RBX: ffffffffc1609690 RCX: ffffffff85a82042
>  RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff888053627bec
>  RBP: 00000000ffffffff R08: ffffed100a6c6181 R09: ffffed100a6c6181
>  R10: 0000000000000001 R11: ffffed100a6c6180 R12: 1ffff11009f36f90
>  R13: ffffffffc1609300 R14: ffff88804c0d5500 R15: ffff88804c0d5ea0
>  FS:  00007fa1c73f8740(0000) GS:ffff888053600000(0000) knlGS:0000000000000000
>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  CR2: 00007f914f4d345f CR3: 000000004855a005 CR4: 00000000001606f0
>  Call Trace:
>   stop_sync_thread+0x3a3/0x7c0 [ip_vs]
>   ip_vs_sync_net_cleanup+0x13/0x50 [ip_vs]
>   ops_exit_list.isra.5+0x94/0x140
>   unregister_pernet_operations+0x29d/0x460
>   unregister_pernet_device+0x26/0x60
>   ip_vs_cleanup+0x11/0x38 [ip_vs]
>   do_syscall_64+0xa5/0x4f0
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
>  RIP: 0033:0x7fa1c68c9027
>  Code: 73 01 c3 48 8b 0d 69 6e 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 39 6e 2c 00 f7 d8 64 89 01 48
>  RSP: 002b:00007ffe8b021cf8 EFLAGS: 00000202 ORIG_RAX: 00000000000000b0
>  RAX: ffffffffffffffda RBX: 0000000001528240 RCX: 00007fa1c68c9027
>  RDX: 0000000000000000 RSI: 0000000000000800 RDI: 00000000015282a8
>  RBP: 0000000000000000 R08: 00007fa1c6b91060 R09: 00007fa1c693d1a0
>  R10: 00007ffe8b021760 R11: 0000000000000202 R12: 0000000000000000
>  R13: 0000000000000001 R14: 00000000015282a8 R15: 0000000000000000
> 
> Fix this ensuring that start_sync_thread() takes a refcount for the ip_vs
> module earlier, and eventually drop it in the error path.
> 
> Reported-by: Zhi Li <yieli@redhat.com>
> Signed-off-by: Davide Caratti <dcaratti@redhat.com>
> ---
>  net/netfilter/ipvs/ip_vs_sync.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c
> index a4a78c4b06de..7ac148182b61 100644
> --- a/net/netfilter/ipvs/ip_vs_sync.c
> +++ b/net/netfilter/ipvs/ip_vs_sync.c
> @@ -1758,6 +1758,9 @@ int start_sync_thread(struct netns_ipvs *ipvs, struct ipvs_sync_daemon_cfg *c,
>  	int result = -ENOMEM;
>  	u16 mtu, min_mtu;
>  
> +	/* increase the module use count */
> +	ip_vs_use_count_inc();
> +

	May be this is not early enough. When control hooks 
are the first thing to disconnect with ip_vs_unregister_nl_ioctl()
how start_sync_thread() is called concurrently? ip_vs_sockopts is 
protected with .owner field and ip_vs_genl_family with .module field
or more correctly with cb_lock rwsem in net/netlink/genetlink.c
which avoids doit() handlers after genl_unregister_family() is called.
So, both interfaces do not allow start_sync_thread() during module
removal and it is interesting how your fix works actually.

	Which interface calls start_sync_thread (sockopt/genl)? Note that
do_ip_vs_set_ctl() already has ip_vs_use_count_inc() before the
start_sync_thread() call. Also, the WARN in module_put warns
about refcnt underflow, so there is an extra ip_vs_use_count_dec() ?

	Another possibility is the missing checks for 
ip_vs_use_count_inc() result to cause underflow on ip_vs_use_count_dec().

	Are there any steps to reproduce?

>  	IP_VS_DBG(7, "%s(): pid %d\n", __func__, task_pid_nr(current));
>  	IP_VS_DBG(7, "Each ip_vs_sync_conn entry needs %zd bytes\n",
>  		  sizeof(struct ip_vs_sync_conn_v0));
> @@ -1892,9 +1895,6 @@ int start_sync_thread(struct netns_ipvs *ipvs, struct ipvs_sync_daemon_cfg *c,
>  	mutex_unlock(&ipvs->sync_mutex);
>  	rtnl_unlock();
>  
> -	/* increase the module use count */
> -	ip_vs_use_count_inc();
> -
>  	return 0;
>  
>  out:
> @@ -1924,11 +1924,16 @@ int start_sync_thread(struct netns_ipvs *ipvs, struct ipvs_sync_daemon_cfg *c,
>  		}
>  		kfree(ti);
>  	}
> +
> +	/* decrease the module use count */
> +	ip_vs_use_count_dec();
>  	return result;
>  
>  out_early:
>  	mutex_unlock(&ipvs->sync_mutex);
>  	rtnl_unlock();
> +	/* decrease the module use count */
> +	ip_vs_use_count_dec();
>  	return result;
>  }
>  
> -- 
> 2.21.0

Regards

--
Julian Anastasov <ja@ssi.bg>
Davide Caratti Oct. 9, 2019, 3:01 p.m. UTC | #2
On Thu, 2019-10-03 at 22:28 +0300, Julian Anastasov wrote:
> 	Hello,

hello Julian,

thanks a lot for reviewing this code.

> On Wed, 2 Oct 2019, Davide Caratti wrote:
> 
> > if the IPVS module is removed while the sync daemon is starting, there is
> > a small possibility for stop_sync_thread() to end successfully just after
> > start_sync_thread() released sync_mutex, but before increasing the module
> > refcount. When this race happens, the following warning is seen in dmesg:
> > 
> >  IPVS: stopping master sync thread 14134 ...
> >  ------------[ cut here ]------------
> >  WARNING: CPU: 0 PID: 14133 at kernel/module.c:1137 module_put.part.44+0x167/0x2a0

[...]

> 	May be this is not early enough. When control hooks 
> are the first thing to disconnect with ip_vs_unregister_nl_ioctl()
> how start_sync_thread() is called concurrently?

Initially I believed that __ip_vs_dev_cleanup() was called when
start_sync_thread() was already running (but it already released the RTNL
lock). But then, after your comments:

>  ip_vs_sockopts is 
> protected with .owner field and ip_vs_genl_family with .module field
> or more correctly with cb_lock rwsem in net/netlink/genetlink.c
> which avoids doit() handlers after genl_unregister_family() is called.
> So, both interfaces do not allow start_sync_thread() during module
> removal and it is interesting how your fix works actually.

I think that probably my patch just changes the timing of events in my
test setup (see script below), making the problem more difficult to
observe.

> 	Which interface calls start_sync_thread (sockopt/genl)? Note that
> do_ip_vs_set_ctl() already has ip_vs_use_count_inc() before the
> start_sync_thread() call. Also, the WARN in module_put warns
> about refcnt underflow, so there is an extra ip_vs_use_count_dec() ?

it's genl in my case, see call traces below.

> 	Another possibility is the missing checks for 
> ip_vs_use_count_inc() result to cause underflow on ip_vs_use_count_dec().

thanks for the advice, the unbalance is probably the cause of the splat
I'm seeing. On a clean/unpatched kernel, I modified ip_vs_use_count() to
do this:

WARN_ON(!try_module_get(THIS_MODULE))

using the reproducer below, I still see the splat in module_put(), but
it's preceded by a failure in ip_vs_use_count_inc(), which accidentally is
the same one I moved in my patch).

IPVS: Registered protocols (TCP, UDP, SCTP, AH, ESP)
 IPVS: Connection hash table configured (size=4096, memory=64Kbytes)
 IPVS: ipvs loaded.
 IPVS: sync thread started: state = MASTER, mcast_ifn = veth_syncd, syncid = 0, id = 0
 IPVS: stopping master sync thread 4005 ...
 IPVS: sync thread started: state = MASTER, mcast_ifn = veth_syncd, syncid = 0, id = 0
 ------------[ cut here ]------------
 WARNING: CPU: 0 PID: 4014 at net/netfilter/ipvs/ip_vs_ctl.c:235 ip_vs_use_count_inc+0x19/0x20 [ip_vs]
 Modules linked in: ip_vs(-) nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ext4 mbcache jbd2 ghash_clmulni_intel snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_nhlt snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd glue_helper joydev pcspkr snd_timer virtio_balloon snd soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c ata_generic pata_acpi virtio_net net_failover virtio_blk failover virtio_console qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ata_piix ttm crc32c_intel serio_raw drm virtio_pci libata virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: nf_defrag_ipv6]
 CPU: 0 PID: 4014 Comm: ipvsadm Not tainted 5.4.0-rc1.upstream+ #741
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 RIP: 0010:ip_vs_use_count_inc+0x19/0x20 [ip_vs]
 Code: e8 bc 5e dc f4 4c 8b 44 24 08 e9 db fd ff ff 66 90 0f 1f 44 00 00 48 c7 c7 00 02 42 c1 e8 5f 3c 9e f4 84 c0 74 04 0f b6 c0 c3 <0f> 0b 0f b6 c0 c3 90 0f 1f 44 00 00 55 48 89 e5 41 57 49 89 f7 41
 RSP: 0018:ffff88804ba5f2c8 EFLAGS: 00010246
 RAX: 0000000000000000 RBX: ffff88804f533ca8 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: 1ffff1100974be3f RDI: ffffffffc1420200
 RBP: ffff88804ba5f500 R08: fffffbfff71129d1 R09: fffffbfff71129d1
 R10: 0000000000000001 R11: fffffbfff71129d0 R12: ffff88804f533300
 R13: 0000000000000000 R14: 0000000000000001 R15: ffff888079442d00
 FS:  00007fd42ff9c740(0000) GS:ffff888053800000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000fc91d8 CR3: 000000004fe2c002 CR4: 00000000001606f0
 Call Trace:
  start_sync_thread+0x1c17/0x23a0 [ip_vs]
  ip_vs_genl_new_daemon+0x3b0/0x550 [ip_vs]
  ip_vs_genl_set_daemon+0x295/0x360 [ip_vs]
  genl_rcv_msg+0x588/0x11e0
  netlink_rcv_skb+0x121/0x350
  genl_rcv+0x24/0x40
  netlink_unicast+0x416/0x5d0
  netlink_sendmsg+0x67f/0xb50
  sock_sendmsg+0xe2/0x110
  ___sys_sendmsg+0x640/0x940
  __sys_sendmsg+0xd2/0x170
  do_syscall_64+0xa5/0x4e0
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
 RIP: 0033:0x7fd42f48f930
 Code: c3 48 8b 05 42 75 2c 00 f7 db 64 89 18 48 83 cb ff eb dd 0f 1f 80 00 00 00 00 83 3d 6d d6 2c 00 00 75 10 b8 2e 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ae cc 00 00 48 89 04 24
 RSP: 002b:00007ffcf69ecaa8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
 RAX: ffffffffffffffda RBX: 0000000000fc2100 RCX: 00007fd42f48f930
 RDX: 0000000000000000 RSI: 00007ffcf69ecb50 RDI: 0000000000000003
 RBP: 0000000000fc2010 R08: 0000000000000001 R09: 0000000000fc2150
 R10: 00007fd42f7577b8 R11: 0000000000000246 R12: 0000000000fc3160
 R13: 00007ffcf69ecb50 R14: 0000000000000000 R15: 0000000000000000
 irq event stamp: 3288
 hardirqs last  enabled at (3287): [<ffffffffb5bb276f>] __local_bh_enable_ip+0xef/0x1b0
 hardirqs last disabled at (3288): [<ffffffffb5a0556a>] trace_hardirqs_off_thunk+0x1a/0x20
 softirqs last  enabled at (3286): [<ffffffffc13f3365>] start_sync_thread+0x1c05/0x23a0 [ip_vs]
 softirqs last disabled at (3284): [<ffffffffc13f32e1>] start_sync_thread+0x1b81/0x23a0 [ip_vs]
 ---[ end trace 7b4e216440e510e6 ]---
 IPVS: stopping master sync thread 4016 ...
 ------------[ cut here ]------------
 WARNING: CPU: 0 PID: 4013 at kernel/module.c:1146 module_put.part.44+0x15b/0x290
 Modules linked in: ip_vs(-) nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ext4 mbcache jbd2 ghash_clmulni_intel snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_nhlt snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd glue_helper joydev pcspkr snd_timer virtio_balloon snd soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c ata_generic pata_acpi virtio_net net_failover virtio_blk failover virtio_console qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ata_piix ttm crc32c_intel serio_raw drm virtio_pci libata virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: nf_defrag_ipv6]
 CPU: 0 PID: 4013 Comm: modprobe Tainted: G        W         5.4.0-rc1.upstream+ #741
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 RIP: 0010:module_put.part.44+0x15b/0x290
 Code: 04 25 28 00 00 00 0f 85 18 01 00 00 48 83 c4 68 5b 5d 41 5c 41 5d 41 5e 41 5f c3 89 44 24 28 83 e8 01 89 c5 0f 89 57 ff ff ff <0f> 0b e9 78 ff ff ff 65 8b 1d 67 83 26 4a 89 db be 08 00 00 00 48
 RSP: 0018:ffff888050607c78 EFLAGS: 00010297
 RAX: 0000000000000003 RBX: ffffffffc1420590 RCX: ffffffffb5db0ef9
 RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffffc1420590
 RBP: 00000000ffffffff R08: fffffbfff82840b3 R09: fffffbfff82840b3
 R10: 0000000000000001 R11: fffffbfff82840b2 R12: 1ffff1100a0c0f90
 R13: ffffffffc1420200 R14: ffff88804f533300 R15: ffff88804f533ca0
 FS:  00007f8ea9720740(0000) GS:ffff888053800000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007f3245abe000 CR3: 000000004c28a006 CR4: 00000000001606f0
 Call Trace:
  stop_sync_thread+0x3a3/0x7c0 [ip_vs]
  ip_vs_sync_net_cleanup+0x13/0x50 [ip_vs]
  ops_exit_list.isra.5+0x94/0x140
  unregister_pernet_operations+0x29d/0x460
  unregister_pernet_device+0x26/0x60
  ip_vs_cleanup+0x11/0x38 [ip_vs]
  __x64_sys_delete_module+0x2d5/0x400
  do_syscall_64+0xa5/0x4e0
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
 RIP: 0033:0x7f8ea8bf0db7
 Code: 73 01 c3 48 8b 0d b9 80 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 89 80 2c 00 f7 d8 64 89 01 48
 RSP: 002b:00007ffcd38d2fe8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
 RAX: ffffffffffffffda RBX: 0000000002436240 RCX: 00007f8ea8bf0db7
 RDX: 0000000000000000 RSI: 0000000000000800 RDI: 00000000024362a8
 RBP: 0000000000000000 R08: 00007f8ea8eba060 R09: 00007f8ea8c658a0
 R10: 00007ffcd38d2a60 R11: 0000000000000206 R12: 0000000000000000
 R13: 0000000000000001 R14: 00000000024362a8 R15: 0000000000000000
 irq event stamp: 4538
 hardirqs last  enabled at (4537): [<ffffffffb6193dde>] quarantine_put+0x9e/0x170
 hardirqs last disabled at (4538): [<ffffffffb5a0556a>] trace_hardirqs_off_thunk+0x1a/0x20
 softirqs last  enabled at (4522): [<ffffffffb6f8ebe9>] sk_common_release+0x169/0x2d0
 softirqs last disabled at (4520): [<ffffffffb6f8eb3e>] sk_common_release+0xbe/0x2d0
 ---[ end trace 7b4e216440e510e7 ]---
 IPVS: ipvs unloaded.

so, this is try_module_get() in ip_vs_start_sync_thread() failing to refcount ip_vs.
When the module is unloaded and stop_sync_thread() is called, the refcounts are
unbalanced and then we see the splat when the last module_put() is called.

> 	Are there any steps to reproduce?

yes, a script that does the following:

        while true; do
                if modprobe -r ip_vs; then
                        sleep 0.5
                fi
        done >/dev/null 2>&1 &

        while true; do
                ipvsadm --start-daemon master --mcast-interface veth_syncd
        done >/dev/null 2>&1 &

        while true; do
                ipvsadm --stop-daemon master
        done >/dev/null 2>&1


I'm about to send a v2 patch where I test the return value of each call to
ip_vs_use_count_inc(), and jump to the error path of caller functions
(returning -EAGAIN) on failure of try_module_get(). For
ip_vs_start_sync_thread(), the resulting code is very similar to what I
did in the first post, with the only addition of an additional 'goto
out_unlock;' in case the attempt to refcount the module failed.

thank you in advance for any further feedback!
Julian Anastasov Oct. 9, 2019, 7:28 p.m. UTC | #3
Hello,

On Wed, 9 Oct 2019, Davide Caratti wrote:

> > 	Another possibility is the missing checks for 
> > ip_vs_use_count_inc() result to cause underflow on ip_vs_use_count_dec().
> 
> thanks for the advice, the unbalance is probably the cause of the splat
> I'm seeing. On a clean/unpatched kernel, I modified ip_vs_use_count() to
> do this:
> 
> WARN_ON(!try_module_get(THIS_MODULE))

	I think, the problem should be in step 2 below:

- step 1: try_stop_module -> try_release_module_ref -> atomic_sub_return
sets module refcnt to 0. After this point the process to remove module
can not be stopped and try_module_get() will fail in atomic_inc_not_zero.

- step 2: there is very small gap this step to occur: commands are
still registered and they should abort if try_module_get() fails.
This is what we see with WARN and oopses: ip_vs_use_count_inc can not
increment refcnt and we later fail in following ip_vs_use_count_dec.
Commands that add resources still do not need _inc + _dec but
those that increment module refcnt need the _inc check.

- step 3: ip_vs_unregister_nl_ioctl unregisters command handlers.
After this step commands are not running.

- step 4: release all module resources added up to step 2.
On module removal we do not expect to see stop_sync_thread()
stopping any running threads. The module can not be removed while
sync is running.

> using the reproducer below, I still see the splat in module_put(), but
> it's preceded by a failure in ip_vs_use_count_inc(), which accidentally is
> the same one I moved in my patch).
> 
> IPVS: Registered protocols (TCP, UDP, SCTP, AH, ESP)
>  IPVS: Connection hash table configured (size=4096, memory=64Kbytes)
>  IPVS: ipvs loaded.
>  IPVS: sync thread started: state = MASTER, mcast_ifn = veth_syncd, syncid = 0, id = 0
>  IPVS: stopping master sync thread 4005 ...
>  IPVS: sync thread started: state = MASTER, mcast_ifn = veth_syncd, syncid = 0, id = 0
>  ------------[ cut here ]------------
>  WARNING: CPU: 0 PID: 4014 at net/netfilter/ipvs/ip_vs_ctl.c:235 ip_vs_use_count_inc+0x19/0x20 [ip_vs]
>  Modules linked in: ip_vs(-) nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ext4 mbcache jbd2 ghash_clmulni_intel snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_nhlt snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd glue_helper joydev pcspkr snd_timer virtio_balloon snd soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c ata_generic pata_acpi virtio_net net_failover virtio_blk failover virtio_console qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ata_piix ttm crc32c_intel serio_raw drm virtio_pci libata virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: nf_defrag_ipv6]
>  CPU: 0 PID: 4014 Comm: ipvsadm Not tainted 5.4.0-rc1.upstream+ #741
>  Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
>  RIP: 0010:ip_vs_use_count_inc+0x19/0x20 [ip_vs]
>  Code: e8 bc 5e dc f4 4c 8b 44 24 08 e9 db fd ff ff 66 90 0f 1f 44 00 00 48 c7 c7 00 02 42 c1 e8 5f 3c 9e f4 84 c0 74 04 0f b6 c0 c3 <0f> 0b 0f b6 c0 c3 90 0f 1f 44 00 00 55 48 89 e5 41 57 49 89 f7 41
>  RSP: 0018:ffff88804ba5f2c8 EFLAGS: 00010246
>  RAX: 0000000000000000 RBX: ffff88804f533ca8 RCX: 0000000000000000
>  RDX: 0000000000000000 RSI: 1ffff1100974be3f RDI: ffffffffc1420200
>  RBP: ffff88804ba5f500 R08: fffffbfff71129d1 R09: fffffbfff71129d1
>  R10: 0000000000000001 R11: fffffbfff71129d0 R12: ffff88804f533300
>  R13: 0000000000000000 R14: 0000000000000001 R15: ffff888079442d00
>  FS:  00007fd42ff9c740(0000) GS:ffff888053800000(0000) knlGS:0000000000000000
>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  CR2: 0000000000fc91d8 CR3: 000000004fe2c002 CR4: 00000000001606f0
>  Call Trace:
>   start_sync_thread+0x1c17/0x23a0 [ip_vs]
>   ip_vs_genl_new_daemon+0x3b0/0x550 [ip_vs]
>   ip_vs_genl_set_daemon+0x295/0x360 [ip_vs]
>   genl_rcv_msg+0x588/0x11e0
>   netlink_rcv_skb+0x121/0x350
>   genl_rcv+0x24/0x40
>   netlink_unicast+0x416/0x5d0
>   netlink_sendmsg+0x67f/0xb50
>   sock_sendmsg+0xe2/0x110
>   ___sys_sendmsg+0x640/0x940
>   __sys_sendmsg+0xd2/0x170
>   do_syscall_64+0xa5/0x4e0
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
>  RIP: 0033:0x7fd42f48f930
>  Code: c3 48 8b 05 42 75 2c 00 f7 db 64 89 18 48 83 cb ff eb dd 0f 1f 80 00 00 00 00 83 3d 6d d6 2c 00 00 75 10 b8 2e 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ae cc 00 00 48 89 04 24
>  RSP: 002b:00007ffcf69ecaa8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
>  RAX: ffffffffffffffda RBX: 0000000000fc2100 RCX: 00007fd42f48f930
>  RDX: 0000000000000000 RSI: 00007ffcf69ecb50 RDI: 0000000000000003
>  RBP: 0000000000fc2010 R08: 0000000000000001 R09: 0000000000fc2150
>  R10: 00007fd42f7577b8 R11: 0000000000000246 R12: 0000000000fc3160
>  R13: 00007ffcf69ecb50 R14: 0000000000000000 R15: 0000000000000000
>  irq event stamp: 3288
>  hardirqs last  enabled at (3287): [<ffffffffb5bb276f>] __local_bh_enable_ip+0xef/0x1b0
>  hardirqs last disabled at (3288): [<ffffffffb5a0556a>] trace_hardirqs_off_thunk+0x1a/0x20
>  softirqs last  enabled at (3286): [<ffffffffc13f3365>] start_sync_thread+0x1c05/0x23a0 [ip_vs]
>  softirqs last disabled at (3284): [<ffffffffc13f32e1>] start_sync_thread+0x1b81/0x23a0 [ip_vs]
>  ---[ end trace 7b4e216440e510e6 ]---
>  IPVS: stopping master sync thread 4016 ...
>  ------------[ cut here ]------------
>  WARNING: CPU: 0 PID: 4013 at kernel/module.c:1146 module_put.part.44+0x15b/0x290
>  Modules linked in: ip_vs(-) nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ext4 mbcache jbd2 ghash_clmulni_intel snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_nhlt snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd glue_helper joydev pcspkr snd_timer virtio_balloon snd soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c ata_generic pata_acpi virtio_net net_failover virtio_blk failover virtio_console qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ata_piix ttm crc32c_intel serio_raw drm virtio_pci libata virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: nf_defrag_ipv6]
>  CPU: 0 PID: 4013 Comm: modprobe Tainted: G        W         5.4.0-rc1.upstream+ #741
>  Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
>  RIP: 0010:module_put.part.44+0x15b/0x290
>  Code: 04 25 28 00 00 00 0f 85 18 01 00 00 48 83 c4 68 5b 5d 41 5c 41 5d 41 5e 41 5f c3 89 44 24 28 83 e8 01 89 c5 0f 89 57 ff ff ff <0f> 0b e9 78 ff ff ff 65 8b 1d 67 83 26 4a 89 db be 08 00 00 00 48
>  RSP: 0018:ffff888050607c78 EFLAGS: 00010297
>  RAX: 0000000000000003 RBX: ffffffffc1420590 RCX: ffffffffb5db0ef9
>  RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffffc1420590
>  RBP: 00000000ffffffff R08: fffffbfff82840b3 R09: fffffbfff82840b3
>  R10: 0000000000000001 R11: fffffbfff82840b2 R12: 1ffff1100a0c0f90
>  R13: ffffffffc1420200 R14: ffff88804f533300 R15: ffff88804f533ca0
>  FS:  00007f8ea9720740(0000) GS:ffff888053800000(0000) knlGS:0000000000000000
>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  CR2: 00007f3245abe000 CR3: 000000004c28a006 CR4: 00000000001606f0
>  Call Trace:
>   stop_sync_thread+0x3a3/0x7c0 [ip_vs]
>   ip_vs_sync_net_cleanup+0x13/0x50 [ip_vs]
>   ops_exit_list.isra.5+0x94/0x140
>   unregister_pernet_operations+0x29d/0x460
>   unregister_pernet_device+0x26/0x60
>   ip_vs_cleanup+0x11/0x38 [ip_vs]
>   __x64_sys_delete_module+0x2d5/0x400
>   do_syscall_64+0xa5/0x4e0
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
>  RIP: 0033:0x7f8ea8bf0db7
>  Code: 73 01 c3 48 8b 0d b9 80 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 89 80 2c 00 f7 d8 64 89 01 48
>  RSP: 002b:00007ffcd38d2fe8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
>  RAX: ffffffffffffffda RBX: 0000000002436240 RCX: 00007f8ea8bf0db7
>  RDX: 0000000000000000 RSI: 0000000000000800 RDI: 00000000024362a8
>  RBP: 0000000000000000 R08: 00007f8ea8eba060 R09: 00007f8ea8c658a0
>  R10: 00007ffcd38d2a60 R11: 0000000000000206 R12: 0000000000000000
>  R13: 0000000000000001 R14: 00000000024362a8 R15: 0000000000000000
>  irq event stamp: 4538
>  hardirqs last  enabled at (4537): [<ffffffffb6193dde>] quarantine_put+0x9e/0x170
>  hardirqs last disabled at (4538): [<ffffffffb5a0556a>] trace_hardirqs_off_thunk+0x1a/0x20
>  softirqs last  enabled at (4522): [<ffffffffb6f8ebe9>] sk_common_release+0x169/0x2d0
>  softirqs last disabled at (4520): [<ffffffffb6f8eb3e>] sk_common_release+0xbe/0x2d0
>  ---[ end trace 7b4e216440e510e7 ]---
>  IPVS: ipvs unloaded.
> 
> so, this is try_module_get() in ip_vs_start_sync_thread() failing to refcount ip_vs.

	Yes, refcnt is not increased.

> When the module is unloaded and stop_sync_thread() is called, the refcounts are
> unbalanced and then we see the splat when the last module_put() is called.

	module_put on refcnt 0

> > 	Are there any steps to reproduce?
> 
> yes, a script that does the following:
> 
>         while true; do
>                 if modprobe -r ip_vs; then
>                         sleep 0.5
>                 fi
>         done >/dev/null 2>&1 &
> 
>         while true; do
>                 ipvsadm --start-daemon master --mcast-interface veth_syncd
>         done >/dev/null 2>&1 &
> 
>         while true; do
>                 ipvsadm --stop-daemon master
>         done >/dev/null 2>&1
> 
> 
> I'm about to send a v2 patch where I test the return value of each call to
> ip_vs_use_count_inc(), and jump to the error path of caller functions
> (returning -EAGAIN) on failure of try_module_get(). For
> ip_vs_start_sync_thread(), the resulting code is very similar to what I
> did in the first post, with the only addition of an additional 'goto
> out_unlock;' in case the attempt to refcount the module failed.

	Yes, every place that increments refcnt would 
need early ip_vs_use_count_inc() call with result check. Then check in 
do_ip_vs_set_ctl() would not be needed, any resources we modify would be
released after the ip_vs_unregister_nl_ioctl call.

	For the other files, check is needed in register_ip_vs_pe() and 
register_ip_vs_scheduler(), register_ip_vs_app() should
_inc+check and _dec on failure.

Regards

--
Julian Anastasov <ja@ssi.bg>

Patch
diff mbox series

diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c
index a4a78c4b06de..7ac148182b61 100644
--- a/net/netfilter/ipvs/ip_vs_sync.c
+++ b/net/netfilter/ipvs/ip_vs_sync.c
@@ -1758,6 +1758,9 @@  int start_sync_thread(struct netns_ipvs *ipvs, struct ipvs_sync_daemon_cfg *c,
 	int result = -ENOMEM;
 	u16 mtu, min_mtu;
 
+	/* increase the module use count */
+	ip_vs_use_count_inc();
+
 	IP_VS_DBG(7, "%s(): pid %d\n", __func__, task_pid_nr(current));
 	IP_VS_DBG(7, "Each ip_vs_sync_conn entry needs %zd bytes\n",
 		  sizeof(struct ip_vs_sync_conn_v0));
@@ -1892,9 +1895,6 @@  int start_sync_thread(struct netns_ipvs *ipvs, struct ipvs_sync_daemon_cfg *c,
 	mutex_unlock(&ipvs->sync_mutex);
 	rtnl_unlock();
 
-	/* increase the module use count */
-	ip_vs_use_count_inc();
-
 	return 0;
 
 out:
@@ -1924,11 +1924,16 @@  int start_sync_thread(struct netns_ipvs *ipvs, struct ipvs_sync_daemon_cfg *c,
 		}
 		kfree(ti);
 	}
+
+	/* decrease the module use count */
+	ip_vs_use_count_dec();
 	return result;
 
 out_early:
 	mutex_unlock(&ipvs->sync_mutex);
 	rtnl_unlock();
+	/* decrease the module use count */
+	ip_vs_use_count_dec();
 	return result;
 }