From patchwork Tue Oct 15 10:11:37 2019
X-Patchwork-Submitter: "Bartschies, Thomas"
X-Patchwork-Id: 1176882
X-Patchwork-Delegate: davem@davemloft.net
From: "Bartschies, Thomas"
To: 'Eric Dumazet', 'David Ahern', 'netdev@vger.kernel.org'
Subject: AW: AW: big ICMP requests get disrupted on IPSec tunnel activation
Date: Tue, 15 Oct 2019 10:11:37 +0000
X-Mailing-List: netdev@vger.kernel.org

Hello Eric,

no problem. Just done that. Here are the results. Hope it helps. I also did
a parallel ftrace debug, but there was no conclusive information within it.
I had tried that before.
[Tue Oct 15 11:51:06 2019] WARNING: CPU: 1 PID: 0 at net/sched/sch_fq.c:389 fq_enqueue+0x531/0x610 [sch_fq]
[Tue Oct 15 11:51:06 2019] Modules linked in: twofish_generic twofish_avx_x86_64 twofish_x86_64_3way twofish_x86_64 twofish_common camellia_generic camellia_aesni_avx_x86_64 camellia_x86_64 serpent_avx_x86_64 serpent_sse2_x86_64 serpent_generic blowfish_generic blowfish_x86_64 blowfish_common cast5_avx_x86_64 cast5_generic cast_common des_generic cmac xcbc rmd160 sha512_ssse3 sha512_generic af_key xt_CHECKSUM ipt_rpfilter xt_statistic xt_connlimit nf_conncount ip_set_hash_ip br_netfilter bridge stp llc xt_HL xt_hl ipt_REJECT nf_reject_ipv4 xt_REDIRECT xt_recent xt_realm xt_CT xt_NETMAP xt_MASQUERADE xt_esp ipt_ECN xt_ecn xt_comment ipt_CLUSTERIP ipt_ah xt_addrtype act_police cls_basic cls_flow cls_fw cls_u32 sch_tbf sch_prio sch_htb sch_hfsc sch_ingress sch_sfq xt_set ip_set xt_NFLOG nf_log_ipv4 nf_log_common xt_LOG nf_conntrack_sane nf_conntrack_netlink nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip nf_nat_pptp nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda nf_conntrack_tftp
[Tue Oct 15 11:51:06 2019]  nf_conntrack_sip nf_conntrack_pptp nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp ts_kmp nf_conntrack_amanda xt_TPROXY nf_tproxy_ipv6 nf_tproxy_ipv4 xt_time xt_TCPMSS xt_tcpmss xt_sctp xt_policy xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_connmark xt_CLASSIFY xt_AUDIT iptable_raw iptable_nat nf_nat iptable_mangle iptable_filter vmw_vsock_vmci_transport vsock ip6t_REJECT nf_reject_ipv6 xt_state xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c nfnetlink_log nfnetlink ip6table_filter ip6_tables sb_edac crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_rapl_perf joydev vmw_balloon vmw_vmci i2c_piix4 sch_fq tcp_illinois binfmt_misc ip_tables vmwgfx drm_kms_helper ttm drm crc32c_intel serio_raw mptspi e1000 scsi_transport_spi mptscsih mptbase ata_generic pata_acpi
[Tue Oct 15 11:51:06 2019] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.2.18-200.fc30.x86_64 #1
[Tue Oct 15 11:51:06 2019] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
[Tue Oct 15 11:51:06 2019] RIP: 0010:fq_enqueue+0x531/0x610 [sch_fq]
[Tue Oct 15 11:51:06 2019] Code: db 48 8b 3d 49 28 00 00 48 89 ee e8 39 ba 03 db 45 85 ed 75 d4 49 8b 2f e9 f1 fb ff ff 48 83 83 30 02 00 00 01 e9 fd fc ff ff <0f> 0b e9 83 fc ff ff 4c 89 ea 31 c0 e9 be fc ff ff 41 8b 46 08 39
[Tue Oct 15 11:51:06 2019] RSP: 0018:ffffa2fe806f0a58 EFLAGS: 00010206
[Tue Oct 15 11:51:06 2019] RAX: 0000000df8475800 RBX: ffff973638a32800 RCX: 0000000000000000
[Tue Oct 15 11:51:06 2019] RDX: 000003df1924d162 RSI: 0000000000000004 RDI: ffffffffc02845b5
[Tue Oct 15 11:51:06 2019] RBP: 15cdc829bb26f866 R08: 0000000000000000 R09: ffff97363afc4e00
[Tue Oct 15 11:51:06 2019] R10: 0000000000000078 R11: ffff97363808b088 R12: ffff97363821ce00
[Tue Oct 15 11:51:06 2019] R13: ffff973634620cb0 R14: 00000000000003cb R15: ffff973635965c70
[Tue Oct 15 11:51:06 2019] FS:  0000000000000000(0000) GS:ffff97363ba80000(0000) knlGS:0000000000000000
[Tue Oct 15 11:51:06 2019] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Tue Oct 15 11:51:06 2019] CR2: 00005616fa9f90b8 CR3: 0000000135084005 CR4: 00000000000606e0
[Tue Oct 15 11:51:06 2019] Call Trace:
[Tue Oct 15 11:51:06 2019]  <IRQ>
[Tue Oct 15 11:51:06 2019]  ? fq_flow_purge+0x70/0x70 [sch_fq]
[Tue Oct 15 11:51:06 2019]  ? ftrace_graph_caller+0xa0/0xa0
[Tue Oct 15 11:51:06 2019]  __dev_queue_xmit+0x45b/0x960
[Tue Oct 15 11:51:06 2019]  ? ftrace_graph_caller+0xa0/0xa0
[Tue Oct 15 11:51:06 2019]  ip_finish_output2+0x2c8/0x580
[Tue Oct 15 11:51:06 2019]  ? ftrace_graph_caller+0xa0/0xa0
[Tue Oct 15 11:51:06 2019]  ip_do_fragment+0x69d/0x7a0
[Tue Oct 15 11:51:06 2019]  ? __ip_flush_pending_frames.isra.0+0x80/0x80
[Tue Oct 15 11:51:06 2019]  ? ftrace_graph_caller+0xa0/0xa0
[Tue Oct 15 11:51:06 2019]  ip_output+0x71/0xf0
[Tue Oct 15 11:51:06 2019]  ? ip_finish_output2+0x580/0x580
[Tue Oct 15 11:51:06 2019]  ? ftrace_graph_caller+0xa0/0xa0
[Tue Oct 15 11:51:06 2019]  ip_forward+0x3c5/0x480
[Tue Oct 15 11:51:06 2019]  ? ip_defrag.cold+0x37/0x37
[Tue Oct 15 11:51:06 2019]  ? ftrace_graph_caller+0xa0/0xa0
[Tue Oct 15 11:51:06 2019]  ip_rcv+0xbc/0xd0
[Tue Oct 15 11:51:06 2019]  ? ip_rcv_finish_core.isra.0+0x390/0x390
[Tue Oct 15 11:51:06 2019]  ? ftrace_graph_caller+0xa0/0xa0
[Tue Oct 15 11:51:06 2019]  __netif_receive_skb_one_core+0x80/0x90
[Tue Oct 15 11:51:06 2019]  ? ftrace_graph_caller+0xa0/0xa0
[Tue Oct 15 11:51:06 2019]  netif_receive_skb_internal+0x41/0xb0
[Tue Oct 15 11:51:06 2019]  ? return_to_handler+0x15/0x36
[Tue Oct 15 11:51:06 2019]  ? ftrace_graph_caller+0xa0/0xa0
[Tue Oct 15 11:51:06 2019]  napi_gro_receive+0xf6/0x160
[Tue Oct 15 11:51:06 2019]  ? ftrace_graph_caller+0xa0/0xa0
[Tue Oct 15 11:51:06 2019]  e1000_clean_rx_irq+0x2ac/0x570 [e1000]
[Tue Oct 15 11:51:06 2019]  ? ftrace_graph_caller+0xa0/0xa0
[Tue Oct 15 11:51:06 2019]  e1000_clean+0x278/0x5f0 [e1000]
[Tue Oct 15 11:51:06 2019]  ? ftrace_graph_caller+0xa0/0xa0
[Tue Oct 15 11:51:06 2019]  net_rx_action+0x148/0x3b0
[Tue Oct 15 11:51:06 2019]  ? napi_complete_done+0x110/0x110
[Tue Oct 15 11:51:06 2019]  ? ftrace_graph_caller+0xa0/0xa0
[Tue Oct 15 11:51:06 2019]  __do_softirq+0xed/0x30e
[Tue Oct 15 11:51:06 2019]  ? ftrace_graph_caller+0xa0/0xa0
[Tue Oct 15 11:51:06 2019]  irq_exit+0xf1/0x100
[Tue Oct 15 11:51:06 2019]  ? ftrace_graph_caller+0xa0/0xa0
[Tue Oct 15 11:51:06 2019]  do_IRQ+0x81/0xe0
[Tue Oct 15 11:51:06 2019]  ? ftrace_graph_caller+0xa0/0xa0
[Tue Oct 15 11:51:06 2019]  common_interrupt+0xf/0xf
[Tue Oct 15 11:51:06 2019]  </IRQ>
[Tue Oct 15 11:51:06 2019] RIP: 0010:native_safe_halt+0xe/0x10
[Tue Oct 15 11:51:06 2019] Code: 90 90 90 90 90 90 90 90 90 90 90 90 e9 07 00 00 00 0f 00 2d 46 06 44 00 f4 c3 66 90 e9 07 00 00 00 0f 00 2d 36 06 44 00 fb f4 90 e8 eb 98 03 00 41 54 55 53 e8 f2 01 79 ff 65 8b 2d db 7f 64
[Tue Oct 15 11:51:06 2019] RSP: 0018:ffffa2fe806abeb0 EFLAGS: 00000282 ORIG_RAX: ffffffffffffffd3
[Tue Oct 15 11:51:06 2019] RAX: ffffffff9b9c7d60 RBX: 0000000000000001 RCX: 0000000000000000
[Tue Oct 15 11:51:06 2019] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffffff9b1140d6
[Tue Oct 15 11:51:06 2019] RBP: 0000000000000001 R08: 0000000000000000 R09: ffff97363afc4e00
[Tue Oct 15 11:51:06 2019] R10: 0000000000000c78 R11: ffff973638092c88 R12: ffff97363a84ddc0
[Tue Oct 15 11:51:06 2019] R13: 0000000000000000 R14: 0000000000000000 R15: ffff97363a84ddc0
[Tue Oct 15 11:51:06 2019]  ? __cpuidle_text_start+0x8/0x8
[Tue Oct 15 11:51:06 2019]  ? do_idle+0x1f6/0x260
[Tue Oct 15 11:51:06 2019]  default_idle+0x1a/0x140
[Tue Oct 15 11:51:06 2019]  ? ftrace_graph_caller+0xa0/0xa0
[Tue Oct 15 11:51:06 2019]  do_idle+0x1fb/0x260
[Tue Oct 15 11:51:06 2019]  ? cpu_startup_entry+0x19/0x20
[Tue Oct 15 11:51:06 2019]  ? ftrace_graph_caller+0xa0/0xa0
[Tue Oct 15 11:51:06 2019]  cpu_startup_entry+0x19/0x20
[Tue Oct 15 11:51:06 2019]  start_secondary+0x17d/0x1d0
[Tue Oct 15 11:51:06 2019]  secondary_startup_64+0xa4/0xb0
[Tue Oct 15 11:51:06 2019] ---[ end trace 34f95424acff5ddb ]---

Regards,
---
Thomas Bartschies
CVK IT Systeme

-----Original Message-----
From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
Sent: Monday, 14 October 2019 17:32
To: Bartschies, Thomas; 'David Ahern'; 'netdev@vger.kernel.org'
Subject: Re: AW: big ICMP requests get disrupted on IPSec tunnel activation

On 10/14/19 7:02 AM, Bartschies, Thomas wrote:
> Hello,
>
> it took a while to build a test system for bisecting the issue.
> Finally I've identified the patch that causes my problems.
> BTW, the fq packet network scheduler is in use.
>
> It's
> [PATCH net-next] tcp/fq: move back to CLOCK_MONOTONIC
>
> In the recent TCP/EDT patch series, I switched TCP and sch_fq clocks
> from MONOTONIC to TAI, in order to meet the choice done earlier for
> the sch_etf packet scheduler.
>
> But sure enough, this broke some setups where the TAI clock jumps
> forward (by almost 50 years...), as reported by Leonard Crestez.
>
> If we want to converge later, we'll probably need to add an skb field
> to differentiate the clock bases, or a socket option.
>
> In the meantime, a UDP application will need to use the CLOCK_MONOTONIC
> base for its SCM_TXTIME timestamps if using the fq packet scheduler.
>
> Fixes: 72b0094f9182 ("tcp: switch tcp_clock_ns() to CLOCK_TAI base")
> Fixes: 142537e41923 ("net_sched: sch_fq: switch to CLOCK_TAI")
> Fixes: fd2bca2aa789 ("tcp: switch internal pacing timer to CLOCK_TAI")
> Signed-off-by: Eric Dumazet
> Reported-by: Leonard Crestez
>
> ----
>
> After reverting it in a current 5.2.18 kernel, the problem disappears.
> There were some follow-up fixes for other issues caused by this patch.
> These fixed other similar issues, but not mine. I've already tried to
> set the tstamp to zero in xfrm4_output.c, but with no luck so far. I'm
> pretty sure that reverting the clock patch isn't the proper solution
> for upstream. So in what other way can this be fixed?

Thanks a lot Thomas for this report!

I guess you could add a debug check in fq to let us know the call graph.
Something like the following:

> ---
> [PATCH net] net: clear skb->tstamp in bridge forwarding path
>
> Matteo reported forwarding issues inside the linux bridge, if the
> enslaved interfaces use the fq qdisc.
>
> Similar to commit 8203e2d844d3 ("net: clear skb->tstamp in forwarding
> paths"), we need to clear the tstamp field in the bridge forwarding
> path.
>
> Fixes: 80b14dee2bea ("net: Add a new socket option for a future
> transmit time.")
> Fixes: fb420d5d91c1 ("tcp/fq: move back to CLOCK_MONOTONIC")
> Reported-and-tested-by: Matteo Croce
> Signed-off-by: Paolo Abeni
>
> and
>
> net: clear skb->tstamp in forwarding paths
>
> Sergey reported that forwarding was no longer working if the fq packet
> scheduler was used.
>
> This is caused by the recent switch to the EDT model, since incoming
> packets might have been timestamped by __net_timestamp().
>
> __net_timestamp() uses ktime_get_real(), while fq expects packets
> using the CLOCK_MONOTONIC base.
>
> The fix is to clear skb->tstamp in forwarding paths.
>
> Fixes: 80b14dee ("net: Add a new socket option for a future transmit
> time.")
> Fixes: fb420d5d ("tcp/fq: move back to CLOCK_MONOTONIC")
> Signed-off-by: Eric Dumazet
> Reported-by: Sergey Matyukevich
> Tested-by: Sergey Matyukevich
> Signed-off-by: David S. Miller
>
> Best regards,
> --
> Thomas Bartschies
> CVK IT Systeme
>
> -----Original Message-----
> From: Bartschies, Thomas
> Sent: Tuesday, 17 September 2019 09:28
> To: 'David Ahern'; 'netdev@vger.kernel.org'
> Subject: AW: big ICMP requests get disrupted on IPSec tunnel activation
>
> Hello,
>
> thanks for the suggestion. Running pmtu.sh with kernel versions 4.19,
> 4.20 and even 5.2.13 made no difference. All tests were successful
> every time.
>
> My external ping tests still fail with the newer kernels, though. I ran
> the script after triggering my problem, to make sure all possible side
> effects were in play.
>
> Please keep in mind that even when the ICMP requests stall, other
> connections still go through, like e.g. ssh or tracepath. I would
> expect all connection types to be affected if this were an MTU problem.
> Am I wrong?
>
> Any suggestions for more tests to isolate the cause?
>
> Best regards,
> --
> Thomas Bartschies
> CVK IT Systeme
>
> -----Original Message-----
> From: David Ahern [mailto:dsahern@gmail.com]
> Sent: Friday, 13 September 2019 19:13
> To: Bartschies, Thomas; 'netdev@vger.kernel.org'
> Subject: Re: big ICMP requests get disrupted on IPSec tunnel activation
>
> On 9/13/19 9:59 AM, Bartschies, Thomas wrote:
>> Hello together,
>>
>> since kernel 4.20 we have been observing strange behaviour when
>> sending big ICMP packets, for example with a packet size of 3000
>> bytes. The packets should be forwarded by a Linux gateway (firewall)
>> with multiple interfaces that also acts as a VPN gateway.
>>
>> Test steps:
>> 1. Disable all iptables rules.
>> 2. Enable the VPN IPsec policies.
>> 3. Start a ping with a packet size of e.g. 3000 bytes from a client
>>    in the DMZ, passing the machine and targeting another LAN machine.
>> 4. The ping works.
>> 5. Enable a VPN policy by sending pings from the gateway to a tunnel
>>    target. The system tries to create the tunnel.
>> 6. The ping from step 3 immediately stalls. No error messages; it
>>    just stops.
>> 7. Stop the ping from step 3 and start another one without the packet
>>    size parameter. It stalls as well.
>>
>> Result:
>> Connections from the client to other services on the LAN machine
>> still work. Tracepath works. Only ICMP requests no longer pass the
>> gateway. tcpdump sees them on the incoming interface, but not on the
>> outgoing LAN interface. ICMP requests to any other target IP address
>> in the LAN still work, until one uses a bigger packet size; then
>> these alternative connections stall as well.
>>
>> Flushing the policy table has no effect. Flushing the conntrack table
>> has no effect. Setting rp_filter to loose (2) has no effect. Flushing
>> the route cache has no effect.
>>
>> Only a reboot of the gateway restores normal behaviour.
>>
>> What can be the cause? Is this a networking bug?
>>
>
> Some of these most likely will fail due to other reasons, but can you
> run 'tools/testing/selftests/net/pmtu.sh' [1] on 4.19 and then 4.20
> and compare results? Hopefully it will shed some light on the problem
> and can be used to bisect to the commit that caused the regression.
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/net/pmtu.sh

diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
index 98dd87ce15108cfe1c011da44ba32f97763776c8..2aa41a39e81b94f3b7092dc51b91829f5929634d 100644
--- a/net/sched/sch_fq.c
+++ b/net/sched/sch_fq.c
@@ -380,9 +380,14 @@ static void flow_queue_add(struct fq_flow *flow, struct sk_buff *skb)
 {
 	struct rb_node **p, *parent;
 	struct sk_buff *head, *aux;
+	s64 delay;
 
 	fq_skb_cb(skb)->time_to_send = skb->tstamp ?: ktime_get_ns();
 
+	/* We should really add a TCA_FQ_MAX_HORIZON at some point :( */
+	delay = fq_skb_cb(skb)->time_to_send - ktime_get_ns();
+	WARN_ON_ONCE(delay > 60 * NSEC_PER_SEC);
+
 	head = flow->head;
 	if (!head ||
 	    fq_skb_cb(skb)->time_to_send >= fq_skb_cb(flow->tail)->time_to_send) {