Message ID | 20191208182417.16789-1-fw@strlen.de |
---|---|
State | Accepted, archived |
Delegated to: | Florian Westphal |
Headers | show |
Hi, On Sun, 2019-12-08 at 19:24 +0100, Florian Westphal wrote: > It includes 48 patches (down from 66), I will refrain from patch-bombing > the list with it. There is one patch as followup here because its > "almost new" in the sense that its an amalgamation of different patches > that I placed in that commit because the others seemed quite large already. > > The __tcp_poll patch is gone. Later patches, in particular the changes to > only store 'tcp_sock' rather than 'struct socket' on the conn_list, have > been brought forward and squashed into the relevant commits, would thus > be implicitly part of basic mptcp submission. I've amemded commit > messages where appropriate. Thank you Florian for the great work! I went through the patches up to v1 support, and LGTM. > There are more opportunities for squashing after the kselftest commit, > but I did not do this. Unless there is a need to do so I would prefer > to add smaller patches after basic functionality is working. Agreed! I'll try to rebase and squash the fallback cleanup patches on top of this. Thanks, Paolo
Hi Florian, Paolo, On 09/12/2019 13:33, Paolo Abeni wrote: > Hi, > > On Sun, 2019-12-08 at 19:24 +0100, Florian Westphal wrote: >> It includes 48 patches (down from 66), I will refrain from patch-bombing >> the list with it. There is one patch as followup here because its >> "almost new" in the sense that its an amalgamation of different patches >> that I placed in that commit because the others seemed quite large already. >> >> The __tcp_poll patch is gone. Later patches, in particular the changes to >> only store 'tcp_sock' rather than 'struct socket' on the conn_list, have >> been brought forward and squashed into the relevant commits, would thus >> be implicitly part of basic mptcp submission. I've amemded commit >> messages where appropriate. > > Thank you Florian for the great work! > > I went through the patches up to v1 support, and LGTM. It is also good to me! Thank you very much for the very big work! The kind of work where even you you have to sit down to do it! ... O:-D I just re-created the tree using your branch. I am also checking now that all commits are OK, with and without MPTCP. Also the usual compilation tests with i386 and without IPv6. I will do a second pass on the different commits tomorrow! Cheers, Matt
Hi, On 10/12/2019 19:24, Matthieu Baerts wrote: > Hi Florian, Paolo, > > On 09/12/2019 13:33, Paolo Abeni wrote: >> Hi, >> >> On Sun, 2019-12-08 at 19:24 +0100, Florian Westphal wrote: >>> It includes 48 patches (down from 66), I will refrain from patch-bombing >>> the list with it. There is one patch as followup here because its >>> "almost new" in the sense that its an amalgamation of different patches >>> that I placed in that commit because the others seemed quite large >>> already. >>> >>> The __tcp_poll patch is gone. Later patches, in particular the >>> changes to >>> only store 'tcp_sock' rather than 'struct socket' on the conn_list, have >>> been brought forward and squashed into the relevant commits, would thus >>> be implicitly part of basic mptcp submission. I've amemded commit >>> messages where appropriate. >> >> Thank you Florian for the great work! >> >> I went through the patches up to v1 support, and LGTM. > > It is also good to me! Thank you very much for the very big work! The > kind of work where even you you have to sit down to do it! ... O:-D > > I just re-created the tree using your branch. I am also checking now > that all commits are OK, with and without MPTCP. Also the usual > compilation tests with i386 and without IPv6. FYI, all tests are green! I didn't re-enable the automatic sync with net-next. I will do that after having applied recent Paolo's changes. Cheers, Matt
On Tue, 2019-12-10 at 22:27 +0100, Matthieu Baerts wrote: > Hi, > > On 10/12/2019 19:24, Matthieu Baerts wrote: > > Hi Florian, Paolo, > > > > On 09/12/2019 13:33, Paolo Abeni wrote: > > > Hi, > > > > > > On Sun, 2019-12-08 at 19:24 +0100, Florian Westphal wrote: > > > > It includes 48 patches (down from 66), I will refrain from patch-bombing > > > > the list with it. There is one patch as followup here because its > > > > "almost new" in the sense that its an amalgamation of different patches > > > > that I placed in that commit because the others seemed quite large > > > > already. > > > > > > > > The __tcp_poll patch is gone. Later patches, in particular the > > > > changes to > > > > only store 'tcp_sock' rather than 'struct socket' on the conn_list, have > > > > been brought forward and squashed into the relevant commits, would thus > > > > be implicitly part of basic mptcp submission. I've amemded commit > > > > messages where appropriate. > > > > > > Thank you Florian for the great work! > > > > > > I went through the patches up to v1 support, and LGTM. > > > > It is also good to me! Thank you very much for the very big work! The > > kind of work where even you you have to sit down to do it! ... O:-D > > > > I just re-created the tree using your branch. I am also checking now > > that all commits are OK, with and without MPTCP. Also the usual > > compilation tests with i386 and without IPv6. > > FYI, all tests are green! > I didn't re-enable the automatic sync with net-next. I will do that > after having applied recent Paolo's changes. The rebased and amost squashed thing is available here: https://github.com/pabeni/mptcp/tree/fwestpha-mptcp_export_rebase_19_fallback_cleanup_2 It build on each commit. Not tested yet, sent-out early for review's sake. Note that some patches after MP_JOIN required some relevant rework, see: https://github.com/pabeni/mptcp/commit/0a84b5b3547cba0bea6186402d0eed4981b6174a https://github.com/pabeni/mptcp/commit/481d3137ce456df2dc93f4b64f0204201d6fee7f @Florian, WDYT? Thanks, Paolo
On Wed, 2019-12-11 at 17:32 +0100, Paolo Abeni wrote: > On Tue, 2019-12-10 at 22:27 +0100, Matthieu Baerts wrote: > > Hi, > > > > On 10/12/2019 19:24, Matthieu Baerts wrote: > > > Hi Florian, Paolo, > > > > > > On 09/12/2019 13:33, Paolo Abeni wrote: > > > > Hi, > > > > > > > > On Sun, 2019-12-08 at 19:24 +0100, Florian Westphal wrote: > > > > > It includes 48 patches (down from 66), I will refrain from patch-bombing > > > > > the list with it. There is one patch as followup here because its > > > > > "almost new" in the sense that its an amalgamation of different patches > > > > > that I placed in that commit because the others seemed quite large > > > > > already. > > > > > > > > > > The __tcp_poll patch is gone. Later patches, in particular the > > > > > changes to > > > > > only store 'tcp_sock' rather than 'struct socket' on the conn_list, have > > > > > been brought forward and squashed into the relevant commits, would thus > > > > > be implicitly part of basic mptcp submission. I've amemded commit > > > > > messages where appropriate. > > > > > > > > Thank you Florian for the great work! > > > > > > > > I went through the patches up to v1 support, and LGTM. > > > > > > It is also good to me! Thank you very much for the very big work! The > > > kind of work where even you you have to sit down to do it! ... O:-D > > > > > > I just re-created the tree using your branch. I am also checking now > > > that all commits are OK, with and without MPTCP. Also the usual > > > compilation tests with i386 and without IPv6. > > > > FYI, all tests are green! > > I didn't re-enable the automatic sync with net-next. I will do that > > after having applied recent Paolo's changes. > > The rebased and amost squashed thing is available here: > > https://github.com/pabeni/mptcp/tree/fwestpha-mptcp_export_rebase_19_fallback_cleanup_2 > > It build on each commit. Not tested yet, sent-out early for review's > sake. Just tested successfully on top of V1 patches and on branch head. For the record, I observed once on the unsquashed patches, the following warn (with decoded stack trace): [ 1093.576154] ------------[ cut here ]------------ [ 1093.577188] refcount_t: addition on 0; use-after-free. [ 1093.577966] WARNING: CPU: 2 PID: 29790 at lib/refcount.c:25 refcount_warn_saturate (/home/pabeni/devel/linux-mptcp/lib/refcount.c:25 (discriminator 3)) [ 1093.579153] Modules linked in: sch_netem veth intel_rapl_msr rfkill intel_rapl_common kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev pcspkr virtio_balloon i2c_piix4 ip_tables xfs libcrc32c crc32c_intel serio_raw virtio_net net_failover virtio_blk ata_generic failover virtio_console ata_piix libata [ 1093.584101] CPU: 2 PID: 29790 Comm: mptcp_connect Not tainted 5.4.0.mptcp_bfaef67f2a47+ #464 [ 1093.585723] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014 [ 1093.587415] RIP: 0010:refcount_warn_saturate (/home/pabeni/devel/linux-mptcp/lib/refcount.c:25 (discriminator 3)) [ 1093.588409] Code: b9 2d 41 02 01 e8 3b 90 41 ff 0f 0b eb 9d 80 3d a8 2d 41 02 00 75 94 48 c7 c7 c0 09 e8 87 c6 05 98 2d 41 02 01 e8 1b 90 41 ff <0f> 0b e9 7a ff ff ff 80 3d 82 2d 41 02 00 0f 85 6d ff ff ff 48 c7 All code ======== 0: b9 2d 41 02 01 mov $0x102412d,%ecx 5: e8 3b 90 41 ff callq 0xffffffffff419045 a: 0f 0b ud2 c: eb 9d jmp 0xffffffffffffffab e: 80 3d a8 2d 41 02 00 cmpb $0x0,0x2412da8(%rip) # 0x2412dbd 15: 75 94 jne 0xffffffffffffffab 17: 48 c7 c7 c0 09 e8 87 mov $0xffffffff87e809c0,%rdi 1e: c6 05 98 2d 41 02 01 movb $0x1,0x2412d98(%rip) # 0x2412dbd 25: e8 1b 90 41 ff callq 0xffffffffff419045 2a:* 0f 0b ud2 <-- trapping instruction 2c: e9 7a ff ff ff jmpq 0xffffffffffffffab 31: 80 3d 82 2d 41 02 00 cmpb $0x0,0x2412d82(%rip) # 0x2412dba 38: 0f 85 6d ff ff ff jne 0xffffffffffffffab 3e: 48 rex.W 3f: c7 .byte 0xc7 Code starting with the faulting instruction =========================================== 0: 0f 0b ud2 2: e9 7a ff ff ff jmpq 0xffffffffffffff81 7: 80 3d 82 2d 41 02 00 cmpb $0x0,0x2412d82(%rip) # 0x2412d90 e: 0f 85 6d ff ff ff jne 0xffffffffffffff81 14: 48 rex.W 15: c7 .byte 0xc7 [ 1093.592029] RSP: 0018:ffff8881061f7cd0 EFLAGS: 00010282 [ 1093.593155] RAX: 0000000000000000 RBX: ffff888035b54040 RCX: 0000000000000000 [ 1093.594476] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffed1020c3ef8c [ 1093.595769] RBP: 0000000000000002 R08: 0000000000000001 R09: ffffed102187c4d1 [ 1093.597063] R10: ffffed102187c4d0 R11: ffff88810c3e2687 R12: ffff888035b53fe8 [ 1093.598354] R13: ffff888035b54040 R14: ffff888035b53fc0 R15: ffff888035b53fc0 [ 1093.599654] FS: 00007f9b8b9ad740(0000) GS:ffff88810c200000(0000) knlGS:0000000000000000 [ 1093.601104] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1093.602174] CR2: 00007fffdcdc0000 CR3: 0000000103e04001 CR4: 0000000000160ee0 [ 1093.603472] Call Trace: [ 1093.603997] tcp_close (/home/pabeni/devel/linux-mptcp/./include/linux/refcount.h:191 /home/pabeni/devel/linux-mptcp/./include/linux/refcount.h:228 /home/pabeni/devel/linux-mptcp/./include/net/sock.h:647 /home/pabeni/devel/linux-mptcp/net/ipv4/tcp.c:2434) [ 1093.605757] mptcp_close (/home/pabeni/devel/linux-mptcp/net/mptcp/protocol.c:521 /home/pabeni/devel/linux-mptcp/net/mptcp/protocol.c:579) [ 1093.606604] inet_release (/home/pabeni/devel/linux-mptcp/net/ipv4/af_inet.c:428) [ 1093.607427] __sock_release (/home/pabeni/devel/linux-mptcp/net/socket.c:593) [ 1093.608230] sock_close (/home/pabeni/devel/linux-mptcp/net/socket.c:1272) [ 1093.608792] __fput (/home/pabeni/devel/linux-mptcp/fs/file_table.c:280) [ 1093.609430] task_work_run (/home/pabeni/devel/linux-mptcp/kernel/task_work.c:115 (discriminator 1)) [ 1093.610100] exit_to_usermode_loop (/home/pabeni/devel/linux-mptcp/./include/linux/tracehook.h:188 /home/pabeni/devel/linux-mptcp/arch/x86/entry/common.c:164) [ 1093.610858] do_syscall_64 (/home/pabeni/devel/linux-mptcp/arch/x86/entry/common.c:195 /home/pabeni/devel/linux-mptcp/arch/x86/entry/common.c:278 /home/pabeni/devel/linux-mptcp/arch/x86/entry/common.c:304) [ 1093.611579] entry_SYSCALL_64_after_hwframe (/home/pabeni/devel/linux-mptcp/arch/x86/entry/entry_64.S:177) [ 1093.612617] RIP: 0033:0x7f9b8b4b77a0 [ 1093.613289] Code: 00 64 c7 00 0d 00 00 00 b8 ff ff ff ff eb 90 b8 ff ff ff ff eb 89 0f 1f 40 00 83 3d fd c7 2d 00 00 75 10 b8 03 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 9e c5 01 00 48 89 04 24 All code ======== 0: 00 64 c7 00 add %ah,0x0(%rdi,%rax,8) 4: 0d 00 00 00 b8 or $0xb8000000,%eax 9: ff (bad) a: ff (bad) b: ff (bad) c: ff (bad) d: eb 90 jmp 0xffffffffffffff9f f: b8 ff ff ff ff mov $0xffffffff,%eax 14: eb 89 jmp 0xffffffffffffff9f 16: 0f 1f 40 00 nopl 0x0(%rax) 1a: 83 3d fd c7 2d 00 00 cmpl $0x0,0x2dc7fd(%rip) # 0x2dc81e 21: 75 10 jne 0x33 23: b8 03 00 00 00 mov $0x3,%eax 28: 0f 05 syscall 2a:* 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax <-- trapping instruction 30: 73 31 jae 0x63 32: c3 retq --- But I could not reproduce it (so far) on the squashed branch. I'll suggest to merge the patches as-is and ev. look after bug after that. Cheers, Paolo
On Wed, 2019-12-11 at 17:32 +0100, Paolo Abeni wrote: > On Tue, 2019-12-10 at 22:27 +0100, Matthieu Baerts wrote: > > Hi, > > > > On 10/12/2019 19:24, Matthieu Baerts wrote: > > > Hi Florian, Paolo, > > > > > > On 09/12/2019 13:33, Paolo Abeni wrote: > > > > Hi, > > > > > > > > On Sun, 2019-12-08 at 19:24 +0100, Florian Westphal wrote: > > > > > It includes 48 patches (down from 66), I will refrain from patch-bombing > > > > > the list with it. There is one patch as followup here because its > > > > > "almost new" in the sense that its an amalgamation of different patches > > > > > that I placed in that commit because the others seemed quite large > > > > > already. > > > > > > > > > > The __tcp_poll patch is gone. Later patches, in particular the > > > > > changes to > > > > > only store 'tcp_sock' rather than 'struct socket' on the conn_list, have > > > > > been brought forward and squashed into the relevant commits, would thus > > > > > be implicitly part of basic mptcp submission. I've amemded commit > > > > > messages where appropriate. > > > > > > > > Thank you Florian for the great work! > > > > > > > > I went through the patches up to v1 support, and LGTM. > > > > > > It is also good to me! Thank you very much for the very big work! The > > > kind of work where even you you have to sit down to do it! ... O:-D > > > > > > I just re-created the tree using your branch. I am also checking now > > > that all commits are OK, with and without MPTCP. Also the usual > > > compilation tests with i386 and without IPv6. > > > > FYI, all tests are green! > > I didn't re-enable the automatic sync with net-next. I will do that > > after having applied recent Paolo's changes. > > The rebased and amost squashed thing is available here: > > https://github.com/pabeni/mptcp/tree/fwestpha-mptcp_export_rebase_19_fallback_cleanup_2 > > It build on each commit. Not tested yet, sent-out early for review's > sake. > > Note that some patches after MP_JOIN required some relevant rework, > see: > > https://github.com/pabeni/mptcp/commit/0a84b5b3547cba0bea6186402d0eed4981b6174a This one had a couple of gross mistake, just cooked a less bugged version. The new tree is here: https://github.com/pabeni/mptcp/commits/fwestpha-mptcp_export_rebase_19_fallback_cleanup_3 The only updated patch is the last one: https://github.com/pabeni/mptcp/commit/831e5c21e2751c2598b3e25516544d59d65c72d3 @Florian: I tested the whole tree vs self-tests - so MP_JOIN is not really tested. Are you ok to merge as-is and then look after eventual bug introduced by me here? Thanks, Paolo
Paolo Abeni <pabeni@redhat.com> wrote: > @Florian: I tested the whole tree vs self-tests - so MP_JOIN is not > really tested. Are you ok to merge as-is and then look after eventual > bug introduced by me here? Maybe, can you explain __mptcp_flush_join_list() usage? AFAICS subflow on join list can be moved to conn_list at 'random' point in time, even if its not fully established yet? (there are calls to __mptcp_flush_join_list in recv/sendmsg path...) Given MP_JOIN isn't in the 1st/2nd batch i would be ok with merging now though.
On Wed, 2019-12-11 at 20:22 +0100, Florian Westphal wrote: > Paolo Abeni <pabeni@redhat.com> wrote: > > @Florian: I tested the whole tree vs self-tests - so MP_JOIN is not > > really tested. Are you ok to merge as-is and then look after eventual > > bug introduced by me here? > > Maybe, can you explain __mptcp_flush_join_list() usage? mp_join subflow are appended to join_list by mptcp_finish_join(), for passive connections, at when the tcp subflow become established. We can't add directly to conn_list, or we will hit the ABBA locking schema the fallback cleanup tried to avoid. The code tries to splice join list into conn_list every just before each conn_list traversing, so that that later lookup/access have to cope only with conn_list. > AFAICS subflow on join list can be moved to conn_list at 'random' > point in time, even if its not fully established yet? passive subflow are moved into join_list when they are 'established' at the tcp level, but the MP_JOIN handshake is not completed (fourth ack still not reached). Should be ok for recvmsg() - as we check the 'data_ready' field - and close(), possibly we need some additional care for mptcp_subflow_get_send(). Cheers, Paolo
On Wed, 11 Dec 2019, Paolo Abeni wrote: > On Wed, 2019-12-11 at 17:32 +0100, Paolo Abeni wrote: >> On Tue, 2019-12-10 at 22:27 +0100, Matthieu Baerts wrote: >>> Hi, >>> >>> On 10/12/2019 19:24, Matthieu Baerts wrote: >>>> Hi Florian, Paolo, >>>> >>>> On 09/12/2019 13:33, Paolo Abeni wrote: >>>>> Hi, >>>>> >>>>> On Sun, 2019-12-08 at 19:24 +0100, Florian Westphal wrote: >>>>>> It includes 48 patches (down from 66), I will refrain from patch-bombing >>>>>> the list with it. There is one patch as followup here because its >>>>>> "almost new" in the sense that its an amalgamation of different patches >>>>>> that I placed in that commit because the others seemed quite large >>>>>> already. >>>>>> >>>>>> The __tcp_poll patch is gone. Later patches, in particular the >>>>>> changes to >>>>>> only store 'tcp_sock' rather than 'struct socket' on the conn_list, have >>>>>> been brought forward and squashed into the relevant commits, would thus >>>>>> be implicitly part of basic mptcp submission. I've amemded commit >>>>>> messages where appropriate. >>>>> >>>>> Thank you Florian for the great work! >>>>> >>>>> I went through the patches up to v1 support, and LGTM. >>>> >>>> It is also good to me! Thank you very much for the very big work! The >>>> kind of work where even you you have to sit down to do it! ... O:-D >>>> >>>> I just re-created the tree using your branch. I am also checking now >>>> that all commits are OK, with and without MPTCP. Also the usual >>>> compilation tests with i386 and without IPv6. >>> >>> FYI, all tests are green! >>> I didn't re-enable the automatic sync with net-next. I will do that >>> after having applied recent Paolo's changes. >> >> The rebased and amost squashed thing is available here: >> >> https://github.com/pabeni/mptcp/tree/fwestpha-mptcp_export_rebase_19_fallback_cleanup_2 >> >> It build on each commit. Not tested yet, sent-out early for review's >> sake. > > Just tested successfully on top of V1 patches and on branch head. > > For the record, I observed once on the unsquashed patches, the > following warn (with decoded stack trace): > > [ 1093.576154] ------------[ cut here ]------------ > [ 1093.577188] refcount_t: addition on 0; use-after-free. > [ 1093.577966] WARNING: CPU: 2 PID: 29790 at lib/refcount.c:25 refcount_warn_saturate (/home/pabeni/devel/linux-mptcp/lib/refcount.c:25 (discriminator 3)) > [ 1093.579153] Modules linked in: sch_netem veth intel_rapl_msr rfkill intel_rapl_common kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev pcspkr virtio_balloon i2c_piix4 ip_tables xfs libcrc32c crc32c_intel serio_raw virtio_net net_failover virtio_blk ata_generic failover virtio_console ata_piix libata > [ 1093.584101] CPU: 2 PID: 29790 Comm: mptcp_connect Not tainted 5.4.0.mptcp_bfaef67f2a47+ #464 > [ 1093.585723] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014 > [ 1093.587415] RIP: 0010:refcount_warn_saturate (/home/pabeni/devel/linux-mptcp/lib/refcount.c:25 (discriminator 3)) > [ 1093.588409] Code: b9 2d 41 02 01 e8 3b 90 41 ff 0f 0b eb 9d 80 3d a8 2d 41 02 00 75 94 48 c7 c7 c0 09 e8 87 c6 05 98 2d 41 02 01 e8 1b 90 41 ff <0f> 0b e9 7a ff ff ff 80 3d 82 2d 41 02 00 0f 85 6d ff ff ff 48 c7 > All code > ======== > 0: b9 2d 41 02 01 mov $0x102412d,%ecx > 5: e8 3b 90 41 ff callq 0xffffffffff419045 > a: 0f 0b ud2 > c: eb 9d jmp 0xffffffffffffffab > e: 80 3d a8 2d 41 02 00 cmpb $0x0,0x2412da8(%rip) # 0x2412dbd > 15: 75 94 jne 0xffffffffffffffab > 17: 48 c7 c7 c0 09 e8 87 mov $0xffffffff87e809c0,%rdi > 1e: c6 05 98 2d 41 02 01 movb $0x1,0x2412d98(%rip) # 0x2412dbd > 25: e8 1b 90 41 ff callq 0xffffffffff419045 > 2a:* 0f 0b ud2 <-- trapping instruction > 2c: e9 7a ff ff ff jmpq 0xffffffffffffffab > 31: 80 3d 82 2d 41 02 00 cmpb $0x0,0x2412d82(%rip) # 0x2412dba > 38: 0f 85 6d ff ff ff jne 0xffffffffffffffab > 3e: 48 rex.W > 3f: c7 .byte 0xc7 > > Code starting with the faulting instruction > =========================================== > 0: 0f 0b ud2 > 2: e9 7a ff ff ff jmpq 0xffffffffffffff81 > 7: 80 3d 82 2d 41 02 00 cmpb $0x0,0x2412d82(%rip) # 0x2412d90 > e: 0f 85 6d ff ff ff jne 0xffffffffffffff81 > 14: 48 rex.W > 15: c7 .byte 0xc7 > [ 1093.592029] RSP: 0018:ffff8881061f7cd0 EFLAGS: 00010282 > [ 1093.593155] RAX: 0000000000000000 RBX: ffff888035b54040 RCX: 0000000000000000 > [ 1093.594476] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffed1020c3ef8c > [ 1093.595769] RBP: 0000000000000002 R08: 0000000000000001 R09: ffffed102187c4d1 > [ 1093.597063] R10: ffffed102187c4d0 R11: ffff88810c3e2687 R12: ffff888035b53fe8 > [ 1093.598354] R13: ffff888035b54040 R14: ffff888035b53fc0 R15: ffff888035b53fc0 > [ 1093.599654] FS: 00007f9b8b9ad740(0000) GS:ffff88810c200000(0000) knlGS:0000000000000000 > [ 1093.601104] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 1093.602174] CR2: 00007fffdcdc0000 CR3: 0000000103e04001 CR4: 0000000000160ee0 > [ 1093.603472] Call Trace: > [ 1093.603997] tcp_close (/home/pabeni/devel/linux-mptcp/./include/linux/refcount.h:191 /home/pabeni/devel/linux-mptcp/./include/linux/refcount.h:228 /home/pabeni/devel/linux-mptcp/./include/net/sock.h:647 /home/pabeni/devel/linux-mptcp/net/ipv4/tcp.c:2434) There's an extra sock_put() (or a missing sock_hold()) somewhere before the sock_hold() in tcp_close(). Seems like the only candidate for that is an unmatched sock_put() in subflow_ulp_release() which is releasing the reference count for mptcp_subflow_context->conn. It looks like that mptcp_subflow_context->conn reference count is intended to be matched by a sock_hold() when the conn value is assigned (see mptcp_subflow_create_socket()). There are other assignments to mptcp_subflow_context->conn in subflow_syn_recv_sock() and mptcp_accept() which are not calling sock_hold() - so maybe that's the problem. If that's it, I'm not sure why this warning happens rarely instead of all the time. I'll try adding some debug statements to confirm this theory. Mat > [ 1093.605757] mptcp_close (/home/pabeni/devel/linux-mptcp/net/mptcp/protocol.c:521 /home/pabeni/devel/linux-mptcp/net/mptcp/protocol.c:579) > [ 1093.606604] inet_release (/home/pabeni/devel/linux-mptcp/net/ipv4/af_inet.c:428) > [ 1093.607427] __sock_release (/home/pabeni/devel/linux-mptcp/net/socket.c:593) > [ 1093.608230] sock_close (/home/pabeni/devel/linux-mptcp/net/socket.c:1272) > [ 1093.608792] __fput (/home/pabeni/devel/linux-mptcp/fs/file_table.c:280) > [ 1093.609430] task_work_run (/home/pabeni/devel/linux-mptcp/kernel/task_work.c:115 (discriminator 1)) > [ 1093.610100] exit_to_usermode_loop (/home/pabeni/devel/linux-mptcp/./include/linux/tracehook.h:188 /home/pabeni/devel/linux-mptcp/arch/x86/entry/common.c:164) > [ 1093.610858] do_syscall_64 (/home/pabeni/devel/linux-mptcp/arch/x86/entry/common.c:195 /home/pabeni/devel/linux-mptcp/arch/x86/entry/common.c:278 /home/pabeni/devel/linux-mptcp/arch/x86/entry/common.c:304) > [ 1093.611579] entry_SYSCALL_64_after_hwframe (/home/pabeni/devel/linux-mptcp/arch/x86/entry/entry_64.S:177) > [ 1093.612617] RIP: 0033:0x7f9b8b4b77a0 > [ 1093.613289] Code: 00 64 c7 00 0d 00 00 00 b8 ff ff ff ff eb 90 b8 ff ff ff ff eb 89 0f 1f 40 00 83 3d fd c7 2d 00 00 75 10 b8 03 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 9e c5 01 00 48 89 04 24 > All code > ======== > 0: 00 64 c7 00 add %ah,0x0(%rdi,%rax,8) > 4: 0d 00 00 00 b8 or $0xb8000000,%eax > 9: ff (bad) > a: ff (bad) > b: ff (bad) > c: ff (bad) > d: eb 90 jmp 0xffffffffffffff9f > f: b8 ff ff ff ff mov $0xffffffff,%eax > 14: eb 89 jmp 0xffffffffffffff9f > 16: 0f 1f 40 00 nopl 0x0(%rax) > 1a: 83 3d fd c7 2d 00 00 cmpl $0x0,0x2dc7fd(%rip) # 0x2dc81e > 21: 75 10 jne 0x33 > 23: b8 03 00 00 00 mov $0x3,%eax > 28: 0f 05 syscall > 2a:* 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax <-- trapping instruction > 30: 73 31 jae 0x63 > 32: c3 retq > --- > > But I could not reproduce it (so far) on the squashed branch. > > I'll suggest to merge the patches as-is and ev. look after bug after > that. > > Cheers, > > Paolo > _______________________________________________ > mptcp mailing list -- mptcp@lists.01.org > To unsubscribe send an email to mptcp-leave@lists.01.org > -- Mat Martineau Intel
On Wed, 2019-12-11 at 17:06 -0800, Mat Martineau wrote: > On Wed, 11 Dec 2019, Paolo Abeni wrote: > > > On Wed, 2019-12-11 at 17:32 +0100, Paolo Abeni wrote: > > > On Tue, 2019-12-10 at 22:27 +0100, Matthieu Baerts wrote: > > > > Hi, > > > > > > > > On 10/12/2019 19:24, Matthieu Baerts wrote: > > > > > Hi Florian, Paolo, > > > > > > > > > > On 09/12/2019 13:33, Paolo Abeni wrote: > > > > > > Hi, > > > > > > > > > > > > On Sun, 2019-12-08 at 19:24 +0100, Florian Westphal wrote: > > > > > > > It includes 48 patches (down from 66), I will refrain from patch-bombing > > > > > > > the list with it. There is one patch as followup here because its > > > > > > > "almost new" in the sense that its an amalgamation of different patches > > > > > > > that I placed in that commit because the others seemed quite large > > > > > > > already. > > > > > > > > > > > > > > The __tcp_poll patch is gone. Later patches, in particular the > > > > > > > changes to > > > > > > > only store 'tcp_sock' rather than 'struct socket' on the conn_list, have > > > > > > > been brought forward and squashed into the relevant commits, would thus > > > > > > > be implicitly part of basic mptcp submission. I've amemded commit > > > > > > > messages where appropriate. > > > > > > > > > > > > Thank you Florian for the great work! > > > > > > > > > > > > I went through the patches up to v1 support, and LGTM. > > > > > > > > > > It is also good to me! Thank you very much for the very big work! The > > > > > kind of work where even you you have to sit down to do it! ... O:-D > > > > > > > > > > I just re-created the tree using your branch. I am also checking now > > > > > that all commits are OK, with and without MPTCP. Also the usual > > > > > compilation tests with i386 and without IPv6. > > > > > > > > FYI, all tests are green! > > > > I didn't re-enable the automatic sync with net-next. I will do that > > > > after having applied recent Paolo's changes. > > > > > > The rebased and amost squashed thing is available here: > > > > > > https://github.com/pabeni/mptcp/tree/fwestpha-mptcp_export_rebase_19_fallback_cleanup_2 > > > > > > It build on each commit. Not tested yet, sent-out early for review's > > > sake. > > > > Just tested successfully on top of V1 patches and on branch head. > > > > For the record, I observed once on the unsquashed patches, the > > following warn (with decoded stack trace): > > > > [ 1093.576154] ------------[ cut here ]------------ > > [ 1093.577188] refcount_t: addition on 0; use-after-free. > > [ 1093.577966] WARNING: CPU: 2 PID: 29790 at lib/refcount.c:25 refcount_warn_saturate (/home/pabeni/devel/linux-mptcp/lib/refcount.c:25 (discriminator 3)) > > [ 1093.579153] Modules linked in: sch_netem veth intel_rapl_msr rfkill intel_rapl_common kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev pcspkr virtio_balloon i2c_piix4 ip_tables xfs libcrc32c crc32c_intel serio_raw virtio_net net_failover virtio_blk ata_generic failover virtio_console ata_piix libata > > [ 1093.584101] CPU: 2 PID: 29790 Comm: mptcp_connect Not tainted 5.4.0.mptcp_bfaef67f2a47+ #464 > > [ 1093.585723] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014 > > [ 1093.587415] RIP: 0010:refcount_warn_saturate (/home/pabeni/devel/linux-mptcp/lib/refcount.c:25 (discriminator 3)) > > [ 1093.588409] Code: b9 2d 41 02 01 e8 3b 90 41 ff 0f 0b eb 9d 80 3d a8 2d 41 02 00 75 94 48 c7 c7 c0 09 e8 87 c6 05 98 2d 41 02 01 e8 1b 90 41 ff <0f> 0b e9 7a ff ff ff 80 3d 82 2d 41 02 00 0f 85 6d ff ff ff 48 c7 > > All code > > ======== > > 0: b9 2d 41 02 01 mov $0x102412d,%ecx > > 5: e8 3b 90 41 ff callq 0xffffffffff419045 > > a: 0f 0b ud2 > > c: eb 9d jmp 0xffffffffffffffab > > e: 80 3d a8 2d 41 02 00 cmpb $0x0,0x2412da8(%rip) # 0x2412dbd > > 15: 75 94 jne 0xffffffffffffffab > > 17: 48 c7 c7 c0 09 e8 87 mov $0xffffffff87e809c0,%rdi > > 1e: c6 05 98 2d 41 02 01 movb $0x1,0x2412d98(%rip) # 0x2412dbd > > 25: e8 1b 90 41 ff callq 0xffffffffff419045 > > 2a:* 0f 0b ud2 <-- trapping instruction > > 2c: e9 7a ff ff ff jmpq 0xffffffffffffffab > > 31: 80 3d 82 2d 41 02 00 cmpb $0x0,0x2412d82(%rip) # 0x2412dba > > 38: 0f 85 6d ff ff ff jne 0xffffffffffffffab > > 3e: 48 rex.W > > 3f: c7 .byte 0xc7 > > > > Code starting with the faulting instruction > > =========================================== > > 0: 0f 0b ud2 > > 2: e9 7a ff ff ff jmpq 0xffffffffffffff81 > > 7: 80 3d 82 2d 41 02 00 cmpb $0x0,0x2412d82(%rip) # 0x2412d90 > > e: 0f 85 6d ff ff ff jne 0xffffffffffffff81 > > 14: 48 rex.W > > 15: c7 .byte 0xc7 > > [ 1093.592029] RSP: 0018:ffff8881061f7cd0 EFLAGS: 00010282 > > [ 1093.593155] RAX: 0000000000000000 RBX: ffff888035b54040 RCX: 0000000000000000 > > [ 1093.594476] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffed1020c3ef8c > > [ 1093.595769] RBP: 0000000000000002 R08: 0000000000000001 R09: ffffed102187c4d1 > > [ 1093.597063] R10: ffffed102187c4d0 R11: ffff88810c3e2687 R12: ffff888035b53fe8 > > [ 1093.598354] R13: ffff888035b54040 R14: ffff888035b53fc0 R15: ffff888035b53fc0 > > [ 1093.599654] FS: 00007f9b8b9ad740(0000) GS:ffff88810c200000(0000) knlGS:0000000000000000 > > [ 1093.601104] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [ 1093.602174] CR2: 00007fffdcdc0000 CR3: 0000000103e04001 CR4: 0000000000160ee0 > > [ 1093.603472] Call Trace: > > [ 1093.603997] tcp_close (/home/pabeni/devel/linux-mptcp/./include/linux/refcount.h:191 /home/pabeni/devel/linux-mptcp/./include/linux/refcount.h:228 /home/pabeni/devel/linux-mptcp/./include/net/sock.h:647 /home/pabeni/devel/linux-mptcp/net/ipv4/tcp.c:2434) > > There's an extra sock_put() (or a missing sock_hold()) somewhere before > the sock_hold() in tcp_close(). Seems like the only candidate for that is > an unmatched sock_put() in subflow_ulp_release() which is releasing the > reference count for mptcp_subflow_context->conn. > > It looks like that mptcp_subflow_context->conn reference count is intended > to be matched by a sock_hold() when the conn value is assigned (see > mptcp_subflow_create_socket()). There are other assignments to > mptcp_subflow_context->conn in subflow_syn_recv_sock() and mptcp_accept() > which are not calling sock_hold() - so maybe that's the problem. If that's > it, I'm not sure why this warning happens rarely instead of all the time. > > I'll try adding some debug statements to confirm this theory. [...] > [ 1093.605757] mptcp_close (/home/pabeni/devel/linux-mptcp/net/mptcp/protocol.c:521 /home/pabeni/devel/linux-mptcp/net/mptcp/protocol.c:579) Thank you for the feedback! I think the refcount mismatch happens on a subflow socket, in the relevant checked-out tree: /home/pabeni/devel/linux-mptcp/net/mptcp/protocol.c:521 is: static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk, struct mptcp_subflow_context *subflow, long timeout) { struct socket *sock = READ_ONCE(ssk->sk_socket); list_del(&subflow->node); if (sock && sock != sk->sk_socket) { /* outgoing subflow */ sock_release(sock); } else { /* incoming subflow */ sock_orphan(ssk); tcp_close(ssk, timeout); // XXXXXXX <- this line } } and /home/pabeni/devel/linux-mptcp/net/mptcp/protocol.c:579 is: static void mptcp_close(struct sock *sk, long timeout) { struct mptcp_subflow_context *subflow, *tmp; struct mptcp_sock *msk = mptcp_sk(sk); mptcp_token_destroy(msk->token); inet_sk_state_store(sk, TCP_CLOSE); lock_sock(sk); list_for_each_entry_safe(subflow, tmp, &msk->conn_list, node) { struct sock *ssk = mptcp_subflow_tcp_sock(subflow); __mptcp_close_ssk(sk, ssk, subflow, timeout); // ^^^^^^^^^^^^^^^^^^ this line XXXXXXXXXXXXXXXXX ^^^^^^^^^^^^ } Given the very low frequency of the event this, the refcount mismatch is very likely due to some race conditions. I think the following one could be a possible (even likely) root cause. In __mptcp_close_ssk() - ssk lock is not held: sock_orphan(ssk); // ssk->flags |= SOCK_DEAD // ssk is in LAST_ACK status, some other CPU // receives the last ack and call tcp_done() // which in turns call inet_csk_destroy_sock(), as ssk // has SOCK_DEAD set, ssk is freed tcp_close(ssk, timeout); // whooops :( I'll try to validate the above with the following dumb/hacky fix, if it works, we can think something more elegant meanwhile/later: sock_hold(ssk); // so that eventual inet_csk_destroy_sock() // in between orphan and close will not free // ssk sock_orphan(ssk); tcp_close(ssk, timeout); sock_put(ssk); Cheers, Paolo
On Wed, 2019-12-11 at 20:00 +0100, Paolo Abeni wrote: > On Wed, 2019-12-11 at 17:32 +0100, Paolo Abeni wrote: > > On Tue, 2019-12-10 at 22:27 +0100, Matthieu Baerts wrote: > > > > This one had a couple of gross mistake, just cooked a less bugged > version. The new tree is here: > > https://github.com/pabeni/mptcp/commits/fwestpha-mptcp_export_rebase_19_fallback_cleanup_3 > > The only updated patch is the last one: FYI, Paolo did a newer version at https://github.com/pabeni/mptcp/commits/fwestpha-mptcp_export_rebase_19_fallback_cleanup_4 that has squashed the fixes. Over that tree, I fixed most of checkpatch issues on the first 26 patches (up to kselftests), rebuilt, retested and pushed at https://github.com/dcaratti/mptcp/tree/fwestpha-mptcp_export_rebase_19_fallback_cleanup_5 Please find below the list of patches where the code actually changed: 0003-tcp-Define-IPPROTO_MPTCP.patch 0013-mptcp-Handle-MPTCP-TCP-options.patch 0017-mptcp-Add-key-generation-and-token-tree.patch 0020-mptcp-Write-MPTCP-DSS-headers-to-outgoing-data-packe.patch 0021-mptcp-Implement-MPTCP-receive-path.patch 0022-mptcp-add-subflow-write-space-signalling-and-mptcp_p.patch 0023-mptcp-recvmsg-can-drain-data-from-multiple-subflows.patch 0026-mptcp-add-basic-kselftest-for-mptcp.patch 0027-mptcp-move-from-sha1-v0-to-sha256-v1.patch 0028-mptcp-parse-and-emit-MP_CAPABLE-option-according-to-.patch 0031-mptcp-Add-ADD_ADDR-handling.patch
diff --git a/include/net/tcp.h b/include/net/tcp.h index e0525a5eb1fb..354a4ce9e5e9 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -396,7 +396,6 @@ void tcp_init_sock(struct sock *sk); void tcp_init_transfer(struct sock *sk, int bpf_op); __poll_t tcp_poll(struct file *file, struct socket *sock, struct poll_table_struct *wait); -__poll_t __tcp_poll(struct sock *sk); int tcp_getsockopt(struct sock *sk, int level, int optname, char __user *optval, int __user *optlen); int tcp_setsockopt(struct sock *sk, int level, int optname, diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 1cc7eb288029..9994bb49b8cd 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -490,12 +490,15 @@ static inline bool tcp_stream_is_readable(const struct tcp_sock *tp, * take care of normal races (between the test and the event) and we don't * go look at any of the socket buffers directly. */ -__poll_t __tcp_poll(struct sock *sk) +__poll_t tcp_poll(struct file *file, struct socket *sock, poll_table *wait) { __poll_t mask; + struct sock *sk = sock->sk; const struct tcp_sock *tp = tcp_sk(sk); int state; + sock_poll_wait(file, sock, wait); + state = inet_sk_state_load(sk); if (state == TCP_LISTEN) return inet_csk_listen_poll(sk); @@ -587,13 +590,6 @@ __poll_t __tcp_poll(struct sock *sk) return mask; } - -__poll_t tcp_poll(struct file *file, struct socket *sock, poll_table *wait) -{ - sock_poll_wait(file, sock, wait); - - return __tcp_poll(sock->sk); -} EXPORT_SYMBOL(tcp_poll); int tcp_ioctl(struct sock *sk, int cmd, unsigned long arg) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 156752e11591..57e092aca615 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -513,7 +513,6 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) ssk_check_wmem(msk, ssk); release_sock(ssk); - out: release_sock(sk); return ret; @@ -902,9 +901,9 @@ static int __mptcp_init_sock(struct sock *sk) INIT_LIST_HEAD(&msk->conn_list); INIT_LIST_HEAD(&msk->join_list); INIT_LIST_HEAD(&msk->rtx_queue); + __set_bit(MPTCP_SEND_SPACE, &msk->flags); INIT_WORK(&msk->rtx_work, mptcp_worker); - __set_bit(MPTCP_SEND_SPACE, &msk->flags); /* re-use the csk retrans timer for MPTCP-level retrans */ timer_setup(&msk->sk.icsk_retransmit_timer, mptcp_retransmit_timer, 0); @@ -1012,15 +1011,6 @@ static void mptcp_close(struct sock *sk, long timeout) sk_common_release(sk); } -static int mptcp_disconnect(struct sock *sk, int flags) -{ - lock_sock(sk); - __mptcp_clear_xmit(sk); - release_sock(sk); - mptcp_cancel_work(sk); - return tcp_disconnect(sk, flags); -} - static void mptcp_copy_inaddrs(struct sock *msk, const struct sock *ssk) { const struct ipv6_pinfo *ssk6 = inet6_sk(ssk); @@ -1044,6 +1034,15 @@ static void mptcp_copy_inaddrs(struct sock *msk, const struct sock *ssk) #endif } +static int mptcp_disconnect(struct sock *sk, int flags) +{ + lock_sock(sk); + __mptcp_clear_xmit(sk); + release_sock(sk); + mptcp_cancel_work(sk); + return tcp_disconnect(sk, flags); +} + static struct sock *mptcp_accept(struct sock *sk, int flags, int *err, bool kern) { diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index 8251408610f0..1563d52eaa8d 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -79,8 +79,8 @@ /* MPTCP socket flags */ #define MPTCP_DATA_READY BIT(0) -#define MPTCP_WORK_RTX BIT(1) -#define MPTCP_SEND_SPACE BIT(2) +#define MPTCP_SEND_SPACE BIT(1) +#define MPTCP_WORK_RTX BIT(2) #define MPTCP_WORK_EOF BIT(3) static inline __be32 mptcp_option(u8 subopt, u8 len, u8 nib, u8 field)