diff mbox series

[ovs-dev,V3] datapath: Prevent panic

Message ID 1524243413-7787-1-git-send-email-gvrose8192@gmail.com
State Changes Requested
Headers show
Series [ovs-dev,V3] datapath: Prevent panic | expand

Commit Message

Gregory Rose April 20, 2018, 4:56 p.m. UTC
On RHEL 7.x kernels we observe a panic induced by a paging error
when the timer kicks off a job that subsequently accesses memory
that belonged to the openvswitch kernel module but was since
unloaded - thus the paging error.

The panic can be induced on any RHEL 7.x kernel with the following test:

while `true`
do
    make check-kmod TESTSUITEFLAGS="-k \!gre"
done

On the systems I've been testing on it generally takes anywhere from a
minute to 15 minutes or so to repro but never longer than that.  Similar
results have been seen by other testers.

Fix the panic by flushing conntrack and then ensuring that the timers are
deleted by incorporating the patch from suggested by Eric Garver.

Here is the trace:
[252257.801809] BUG: unable to handle kernel paging request at ffffffffc07c6298
[252257.802451] IP: [<ffffffff810996e0>] run_timer_softirq+0xe0/0x310
[252257.803055] PGD 19f5067 PUD 19f7067 PMD 2fb5fc2067 PTE 0
[252257.803559] Oops: 0002 [#1] SMP
[252257.804138] Modules linked in: geneve ip6_udp_tunnel xt_statistic xt_physdev xt_nat xt_recent xt_comment xt_mark ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat xt_addrtype ipt_REJECT nf_reject_ipv4 xt_conntrack iptable_filter ip_tables nf_conntrack_netlink br_netfilter overlay(T) sch_htb veth udp_tunnel 8021q garp mrp tun ip_set nfnetlink bridge stp llc nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iTCO_wdt iTCO_vendor_support dcdbas mxm_wmi sb_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd joydev mei_me sg mei ipmi_ssif pcspkr shpchp lpc_ich ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter wmi nfsd auth_rpcgss
[252257.808079] nfs_acl lockd grace sunrpc xfs libcrc32c sr_mod sd_mod cdrom crc_t10dif crct10dif_generic uas usb_storage mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crct10dif_pclmul crct10dif_common crc32c_intel ahci libahci ixgbe libata igb megaraid_sas mdio ptp i2c_algo_bit pps_core i2c_core dca dm_mirror dm_region_hash dm_log dm_mod [last unloaded: openvswitch]
[252257.811056] CPU: 33 PID: 0 Comm: swapper/33 Tainted: G OE ------------ T 3.10.0-693.el7.x86_64 #1
[252257.811826] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.1.5 04/11/2016
[252257.812605] task: ffff8830b7708fd0 ti: ffff8830b7718000 task.ti: ffff8830b7718000
[252257.813447] RIP: 0010:[<ffffffff810996e0>] [<ffffffff810996e0>] run_timer_softirq+0xe0/0x310
[252257.814298] RSP: 0018:ffff885fbe203e68 EFLAGS: 00010082
[252257.815122] RAX: ffff8830b66bc838 RBX: ffff8830b66bc000 RCX: ffffffffc07c6290
[252257.815933] RDX: ffff8830b66bc810 RSI: ffff885fbe203e90 RDI: ffff8830b66bc000
[252257.816733] RBP: ffff885fbe203ed0 R08: 0000e56b5701d800 R09: ffff885fbe203da0
[252257.817568] R10: 0000000000000002 R11: ffff885fbe203da8 R12: 0000000000000081
[252257.818429] R13: 0000e56b56fb2eca R14: ffffffff819eb0c8 R15: 0000000000000001
[252257.819297] FS: 0000000000000000(0000) GS:ffff885fbe200000(0000) knlGS:0000000000000000
[252257.820174] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[252257.821156] CR2: ffffffffc07c6298 CR3: 00000000019f2000 CR4: 00000000003407e0
[252257.822012] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[252257.822869] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[252257.823720] Stack:
[252257.824678] ffff8830b66bdc28 ffff8830b66bd828 ffff8830b66bd428 ffff8830b66bd028
[252257.825699] ffff885fbe20fe80 ffff885fbe203eb0 ffffffff8132bfe0 28a56c277c4fe974
[252257.826623] 0000000000000001 0000000000000001 0000e56b56fb2eca ffffffff819eb0c8
[252257.827524] Call Trace:
[252257.828410] <IRQ>
[252257.828417]
[252257.829292] [<ffffffff8132bfe0>] ? timerqueue_add+0x60/0xb0
[252257.830164] [<ffffffff81090b3f>] __do_softirq+0xef/0x280
[252257.831010] [<ffffffff816b6a5c>] call_softirq+0x1c/0x30
[252257.831849] [<ffffffff8102d3c5>] do_softirq+0x65/0xa0
[252257.832669] [<ffffffff81090ec5>] irq_exit+0x105/0x110
[252257.833501] [<ffffffff816b76c2>] smp_apic_timer_interrupt+0x42/0x50
[252257.834330] [<ffffffff816b5c1d>] apic_timer_interrupt+0x6d/0x80
[252257.835152] <EOI>
[252257.835159]
[252257.835944] [<ffffffff81527a02>] ? cpuidle_enter_state+0x52/0xc0
[252257.837231] [<ffffffff81527b48>] cpuidle_idle_call+0xd8/0x210
[252257.838224] [<ffffffff81034fee>] arch_cpu_idle+0xe/0x30
[252257.839133] [<ffffffff810e7bca>] cpu_startup_entry+0x14a/0x1c0
[252257.839933] [<ffffffff81051af6>] start_secondary+0x1b6/0x230
[252257.840684] Code: 00 00 00 44 0f b6 e0 45 85 e4 0f 84 a7 01 00 00 49 63 d4 48 83 43 10 01 48 8d 75 c0 48 c1 e2 04 48 01 da 48 8b 4a 28 48 8d 42 28 <48> 89 71 08 48 89 4d c0 48 8b 4a 30 48 89 4d c8 48 89 31 48 89
[252257.842366] RIP [<ffffffff810996e0>] run_timer_softirq+0xe0/0x310
[252257.843183] RSP <ffff885fbe203e68>
[252257.843955] CR2: ffffffffc07c6298

Fixes: dd41d330b03 ("openvswitch: Add force commit.")
Suggested-by: Eric Garver <e@erig.me>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
---

V2 - Add splat to commit message per Pravin's request
V3 - Remove msleep and replace with code from Eric Garver to delete
     conntrack timer.

squash me
---
 datapath/conntrack.c        | 6 ++++--
 tests/system-kmod-macros.at | 1 +
 utilities/ovs-lib.in        | 1 +
 3 files changed, 6 insertions(+), 2 deletions(-)

Comments

Flavio Leitner April 20, 2018, 6:10 p.m. UTC | #1
On Fri, Apr 20, 2018 at 09:56:53AM -0700, Greg Rose wrote:
> On RHEL 7.x kernels we observe a panic induced by a paging error
> when the timer kicks off a job that subsequently accesses memory
> that belonged to the openvswitch kernel module but was since
> unloaded - thus the paging error.
> 
> The panic can be induced on any RHEL 7.x kernel with the following test:
> 
> while `true`
> do
>     make check-kmod TESTSUITEFLAGS="-k \!gre"
> done
> 
> On the systems I've been testing on it generally takes anywhere from a
> minute to 15 minutes or so to repro but never longer than that.  Similar
> results have been seen by other testers.
> 
> Fix the panic by flushing conntrack and then ensuring that the timers are
> deleted by incorporating the patch from suggested by Eric Garver.
> 
> Here is the trace:
> [252257.801809] BUG: unable to handle kernel paging request at ffffffffc07c6298
> [252257.802451] IP: [<ffffffff810996e0>] run_timer_softirq+0xe0/0x310
> [252257.803055] PGD 19f5067 PUD 19f7067 PMD 2fb5fc2067 PTE 0
> [252257.803559] Oops: 0002 [#1] SMP
> [252257.804138] Modules linked in: geneve ip6_udp_tunnel xt_statistic xt_physdev xt_nat xt_recent xt_comment xt_mark ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat xt_addrtype ipt_REJECT nf_reject_ipv4 xt_conntrack iptable_filter ip_tables nf_conntrack_netlink br_netfilter overlay(T) sch_htb veth udp_tunnel 8021q garp mrp tun ip_set nfnetlink bridge stp llc nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iTCO_wdt iTCO_vendor_support dcdbas mxm_wmi sb_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd joydev mei_me sg mei ipmi_ssif pcspkr shpchp lpc_ich ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter wmi nfsd auth_rpcgss
> [252257.808079] nfs_acl lockd grace sunrpc xfs libcrc32c sr_mod sd_mod cdrom crc_t10dif crct10dif_generic uas usb_storage mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crct10dif_pclmul crct10dif_common crc32c_intel ahci libahci ixgbe libata igb megaraid_sas mdio ptp i2c_algo_bit pps_core i2c_core dca dm_mirror dm_region_hash dm_log dm_mod [last unloaded: openvswitch]
> [252257.811056] CPU: 33 PID: 0 Comm: swapper/33 Tainted: G OE ------------ T 3.10.0-693.el7.x86_64 #1
> [252257.811826] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.1.5 04/11/2016
> [252257.812605] task: ffff8830b7708fd0 ti: ffff8830b7718000 task.ti: ffff8830b7718000
> [252257.813447] RIP: 0010:[<ffffffff810996e0>] [<ffffffff810996e0>] run_timer_softirq+0xe0/0x310
> [252257.814298] RSP: 0018:ffff885fbe203e68 EFLAGS: 00010082
> [252257.815122] RAX: ffff8830b66bc838 RBX: ffff8830b66bc000 RCX: ffffffffc07c6290
> [252257.815933] RDX: ffff8830b66bc810 RSI: ffff885fbe203e90 RDI: ffff8830b66bc000
> [252257.816733] RBP: ffff885fbe203ed0 R08: 0000e56b5701d800 R09: ffff885fbe203da0
> [252257.817568] R10: 0000000000000002 R11: ffff885fbe203da8 R12: 0000000000000081
> [252257.818429] R13: 0000e56b56fb2eca R14: ffffffff819eb0c8 R15: 0000000000000001
> [252257.819297] FS: 0000000000000000(0000) GS:ffff885fbe200000(0000) knlGS:0000000000000000
> [252257.820174] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [252257.821156] CR2: ffffffffc07c6298 CR3: 00000000019f2000 CR4: 00000000003407e0
> [252257.822012] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [252257.822869] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [252257.823720] Stack:
> [252257.824678] ffff8830b66bdc28 ffff8830b66bd828 ffff8830b66bd428 ffff8830b66bd028
> [252257.825699] ffff885fbe20fe80 ffff885fbe203eb0 ffffffff8132bfe0 28a56c277c4fe974
> [252257.826623] 0000000000000001 0000000000000001 0000e56b56fb2eca ffffffff819eb0c8
> [252257.827524] Call Trace:
> [252257.828410] <IRQ>
> [252257.828417]
> [252257.829292] [<ffffffff8132bfe0>] ? timerqueue_add+0x60/0xb0
> [252257.830164] [<ffffffff81090b3f>] __do_softirq+0xef/0x280
> [252257.831010] [<ffffffff816b6a5c>] call_softirq+0x1c/0x30
> [252257.831849] [<ffffffff8102d3c5>] do_softirq+0x65/0xa0
> [252257.832669] [<ffffffff81090ec5>] irq_exit+0x105/0x110
> [252257.833501] [<ffffffff816b76c2>] smp_apic_timer_interrupt+0x42/0x50
> [252257.834330] [<ffffffff816b5c1d>] apic_timer_interrupt+0x6d/0x80
> [252257.835152] <EOI>
> [252257.835159]
> [252257.835944] [<ffffffff81527a02>] ? cpuidle_enter_state+0x52/0xc0
> [252257.837231] [<ffffffff81527b48>] cpuidle_idle_call+0xd8/0x210
> [252257.838224] [<ffffffff81034fee>] arch_cpu_idle+0xe/0x30
> [252257.839133] [<ffffffff810e7bca>] cpu_startup_entry+0x14a/0x1c0
> [252257.839933] [<ffffffff81051af6>] start_secondary+0x1b6/0x230
> [252257.840684] Code: 00 00 00 44 0f b6 e0 45 85 e4 0f 84 a7 01 00 00 49 63 d4 48 83 43 10 01 48 8d 75 c0 48 c1 e2 04 48 01 da 48 8b 4a 28 48 8d 42 28 <48> 89 71 08 48 89 4d c0 48 8b 4a 30 48 89 4d c8 48 89 31 48 89
> [252257.842366] RIP [<ffffffff810996e0>] run_timer_softirq+0xe0/0x310
> [252257.843183] RSP <ffff885fbe203e68>
> [252257.843955] CR2: ffffffffc07c6298
> 
> Fixes: dd41d330b03 ("openvswitch: Add force commit.")
> Suggested-by: Eric Garver <e@erig.me>
> Signed-off-by: Greg Rose <gvrose8192@gmail.com>
> ---
> 
> V2 - Add splat to commit message per Pravin's request
> V3 - Remove msleep and replace with code from Eric Garver to delete
>      conntrack timer.
> 
> squash me
> ---
>  datapath/conntrack.c        | 6 ++++--
>  tests/system-kmod-macros.at | 1 +
>  utilities/ovs-lib.in        | 1 +
>  3 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/datapath/conntrack.c b/datapath/conntrack.c
> index e53b8e3..4e7c42b 100644
> --- a/datapath/conntrack.c
> +++ b/datapath/conntrack.c
> @@ -576,7 +576,8 @@ ovs_ct_expect_find(struct net *net, const struct nf_conntrack_zone *zone,
>  		if (h) {
>  			struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h);
>  
> -			nf_ct_delete(ct, 0, 0);
> +			if (del_timer(&ct->timeout))
> +				nf_ct_delete(ct, 0, 0);
>  			nf_conntrack_put(&ct->ct_general);
>  		}
>  	}

Somehow you need to limit that change to the correct range of kernels,
because since 4.9 the ct->timeout changed from struct timer_list to
u32 which is jiffies and not a timer.

fbl


> @@ -727,7 +728,8 @@ static bool skb_nfct_cached(struct net *net,
>  		 * the reference.
>  		 */
>  		if (nf_ct_is_confirmed(ct))
> -			nf_ct_delete(ct, 0, 0);
> +			if (del_timer(&ct->timeout))
> +				nf_ct_delete(ct, 0, 0);
>  
>  		nf_conntrack_put(&ct->ct_general);
>  		nf_ct_set(skb, NULL, 0);
> diff --git a/tests/system-kmod-macros.at b/tests/system-kmod-macros.at
> index f23a406..2b9b691 100644
> --- a/tests/system-kmod-macros.at
> +++ b/tests/system-kmod-macros.at
> @@ -23,6 +23,7 @@ m4_define([OVS_TRAFFIC_VSWITCHD_START],
>                 on_exit 'modprobe -q -r mod'
>                ])
>     on_exit 'ovs-dpctl del-dp ovs-system'
> +   on_exit 'ovs-appctl dpctl/flush-conntrack'
>     _OVS_VSWITCHD_START([])
>     dnl Add bridges, ports, etc.
>     AT_CHECK([ovs-vsctl -- _ADD_BR([br0]) -- $1 m4_if([$2], [], [], [| uuidfilt])], [0], [$2])
> diff --git a/utilities/ovs-lib.in b/utilities/ovs-lib.in
> index 4dc3151..4c3ad0f 100644
> --- a/utilities/ovs-lib.in
> +++ b/utilities/ovs-lib.in
> @@ -616,6 +616,7 @@ force_reload_kmod () {
>      for dp in `ovs-dpctl dump-dps`; do
>          action "Removing datapath: $dp" ovs-dpctl del-dp "$dp"
>      done
> +    action "ovs-appctl dpctl/flush-conntrack"
>  
>      for vport in `awk '/^vport_/ { print $1 }' /proc/modules`; do
>          action "Removing $vport module" rmmod $vport
> -- 
> 1.8.3.1
> 
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Gregory Rose April 20, 2018, 6:15 p.m. UTC | #2
On 4/20/2018 11:10 AM, Flavio Leitner wrote:
> On Fri, Apr 20, 2018 at 09:56:53AM -0700, Greg Rose wrote:
>> On RHEL 7.x kernels we observe a panic induced by a paging error
>> when the timer kicks off a job that subsequently accesses memory
>> that belonged to the openvswitch kernel module but was since
>> unloaded - thus the paging error.
>>
>> The panic can be induced on any RHEL 7.x kernel with the following test:
>>
>> while `true`
>> do
>>      make check-kmod TESTSUITEFLAGS="-k \!gre"
>> done
>>
>> On the systems I've been testing on it generally takes anywhere from a
>> minute to 15 minutes or so to repro but never longer than that.  Similar
>> results have been seen by other testers.
>>
>> Fix the panic by flushing conntrack and then ensuring that the timers are
>> deleted by incorporating the patch from suggested by Eric Garver.
>>
>> Here is the trace:
>> [252257.801809] BUG: unable to handle kernel paging request at ffffffffc07c6298
>> [252257.802451] IP: [<ffffffff810996e0>] run_timer_softirq+0xe0/0x310
>> [252257.803055] PGD 19f5067 PUD 19f7067 PMD 2fb5fc2067 PTE 0
>> [252257.803559] Oops: 0002 [#1] SMP
>> [252257.804138] Modules linked in: geneve ip6_udp_tunnel xt_statistic xt_physdev xt_nat xt_recent xt_comment xt_mark ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat xt_addrtype ipt_REJECT nf_reject_ipv4 xt_conntrack iptable_filter ip_tables nf_conntrack_netlink br_netfilter overlay(T) sch_htb veth udp_tunnel 8021q garp mrp tun ip_set nfnetlink bridge stp llc nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iTCO_wdt iTCO_vendor_support dcdbas mxm_wmi sb_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd joydev mei_me sg mei ipmi_ssif pcspkr shpchp lpc_ich ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter wmi nfsd auth_rpcgss
>> [252257.808079] nfs_acl lockd grace sunrpc xfs libcrc32c sr_mod sd_mod cdrom crc_t10dif crct10dif_generic uas usb_storage mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crct10dif_pclmul crct10dif_common crc32c_intel ahci libahci ixgbe libata igb megaraid_sas mdio ptp i2c_algo_bit pps_core i2c_core dca dm_mirror dm_region_hash dm_log dm_mod [last unloaded: openvswitch]
>> [252257.811056] CPU: 33 PID: 0 Comm: swapper/33 Tainted: G OE ------------ T 3.10.0-693.el7.x86_64 #1
>> [252257.811826] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.1.5 04/11/2016
>> [252257.812605] task: ffff8830b7708fd0 ti: ffff8830b7718000 task.ti: ffff8830b7718000
>> [252257.813447] RIP: 0010:[<ffffffff810996e0>] [<ffffffff810996e0>] run_timer_softirq+0xe0/0x310
>> [252257.814298] RSP: 0018:ffff885fbe203e68 EFLAGS: 00010082
>> [252257.815122] RAX: ffff8830b66bc838 RBX: ffff8830b66bc000 RCX: ffffffffc07c6290
>> [252257.815933] RDX: ffff8830b66bc810 RSI: ffff885fbe203e90 RDI: ffff8830b66bc000
>> [252257.816733] RBP: ffff885fbe203ed0 R08: 0000e56b5701d800 R09: ffff885fbe203da0
>> [252257.817568] R10: 0000000000000002 R11: ffff885fbe203da8 R12: 0000000000000081
>> [252257.818429] R13: 0000e56b56fb2eca R14: ffffffff819eb0c8 R15: 0000000000000001
>> [252257.819297] FS: 0000000000000000(0000) GS:ffff885fbe200000(0000) knlGS:0000000000000000
>> [252257.820174] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [252257.821156] CR2: ffffffffc07c6298 CR3: 00000000019f2000 CR4: 00000000003407e0
>> [252257.822012] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [252257.822869] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [252257.823720] Stack:
>> [252257.824678] ffff8830b66bdc28 ffff8830b66bd828 ffff8830b66bd428 ffff8830b66bd028
>> [252257.825699] ffff885fbe20fe80 ffff885fbe203eb0 ffffffff8132bfe0 28a56c277c4fe974
>> [252257.826623] 0000000000000001 0000000000000001 0000e56b56fb2eca ffffffff819eb0c8
>> [252257.827524] Call Trace:
>> [252257.828410] <IRQ>
>> [252257.828417]
>> [252257.829292] [<ffffffff8132bfe0>] ? timerqueue_add+0x60/0xb0
>> [252257.830164] [<ffffffff81090b3f>] __do_softirq+0xef/0x280
>> [252257.831010] [<ffffffff816b6a5c>] call_softirq+0x1c/0x30
>> [252257.831849] [<ffffffff8102d3c5>] do_softirq+0x65/0xa0
>> [252257.832669] [<ffffffff81090ec5>] irq_exit+0x105/0x110
>> [252257.833501] [<ffffffff816b76c2>] smp_apic_timer_interrupt+0x42/0x50
>> [252257.834330] [<ffffffff816b5c1d>] apic_timer_interrupt+0x6d/0x80
>> [252257.835152] <EOI>
>> [252257.835159]
>> [252257.835944] [<ffffffff81527a02>] ? cpuidle_enter_state+0x52/0xc0
>> [252257.837231] [<ffffffff81527b48>] cpuidle_idle_call+0xd8/0x210
>> [252257.838224] [<ffffffff81034fee>] arch_cpu_idle+0xe/0x30
>> [252257.839133] [<ffffffff810e7bca>] cpu_startup_entry+0x14a/0x1c0
>> [252257.839933] [<ffffffff81051af6>] start_secondary+0x1b6/0x230
>> [252257.840684] Code: 00 00 00 44 0f b6 e0 45 85 e4 0f 84 a7 01 00 00 49 63 d4 48 83 43 10 01 48 8d 75 c0 48 c1 e2 04 48 01 da 48 8b 4a 28 48 8d 42 28 <48> 89 71 08 48 89 4d c0 48 8b 4a 30 48 89 4d c8 48 89 31 48 89
>> [252257.842366] RIP [<ffffffff810996e0>] run_timer_softirq+0xe0/0x310
>> [252257.843183] RSP <ffff885fbe203e68>
>> [252257.843955] CR2: ffffffffc07c6298
>>
>> Fixes: dd41d330b03 ("openvswitch: Add force commit.")
>> Suggested-by: Eric Garver <e@erig.me>
>> Signed-off-by: Greg Rose <gvrose8192@gmail.com>
>> ---
>>
>> V2 - Add splat to commit message per Pravin's request
>> V3 - Remove msleep and replace with code from Eric Garver to delete
>>       conntrack timer.
>>
>> squash me
>> ---
>>   datapath/conntrack.c        | 6 ++++--
>>   tests/system-kmod-macros.at | 1 +
>>   utilities/ovs-lib.in        | 1 +
>>   3 files changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/datapath/conntrack.c b/datapath/conntrack.c
>> index e53b8e3..4e7c42b 100644
>> --- a/datapath/conntrack.c
>> +++ b/datapath/conntrack.c
>> @@ -576,7 +576,8 @@ ovs_ct_expect_find(struct net *net, const struct nf_conntrack_zone *zone,
>>   		if (h) {
>>   			struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h);
>>   
>> -			nf_ct_delete(ct, 0, 0);
>> +			if (del_timer(&ct->timeout))
>> +				nf_ct_delete(ct, 0, 0);
>>   			nf_conntrack_put(&ct->ct_general);
>>   		}
>>   	}
> Somehow you need to limit that change to the correct range of kernels,
> because since 4.9 the ct->timeout changed from struct timer_list to
> u32 which is jiffies and not a timer.
>
> fbl

Whoops.  OK, I'll add a check for RHEL 7.

Thanks Flavio.

- Greg

>
>> @@ -727,7 +728,8 @@ static bool skb_nfct_cached(struct net *net,
>>   		 * the reference.
>>   		 */
>>   		if (nf_ct_is_confirmed(ct))
>> -			nf_ct_delete(ct, 0, 0);
>> +			if (del_timer(&ct->timeout))
>> +				nf_ct_delete(ct, 0, 0);
>>   
>>   		nf_conntrack_put(&ct->ct_general);
>>   		nf_ct_set(skb, NULL, 0);
>> diff --git a/tests/system-kmod-macros.at b/tests/system-kmod-macros.at
>> index f23a406..2b9b691 100644
>> --- a/tests/system-kmod-macros.at
>> +++ b/tests/system-kmod-macros.at
>> @@ -23,6 +23,7 @@ m4_define([OVS_TRAFFIC_VSWITCHD_START],
>>                  on_exit 'modprobe -q -r mod'
>>                 ])
>>      on_exit 'ovs-dpctl del-dp ovs-system'
>> +   on_exit 'ovs-appctl dpctl/flush-conntrack'
>>      _OVS_VSWITCHD_START([])
>>      dnl Add bridges, ports, etc.
>>      AT_CHECK([ovs-vsctl -- _ADD_BR([br0]) -- $1 m4_if([$2], [], [], [| uuidfilt])], [0], [$2])
>> diff --git a/utilities/ovs-lib.in b/utilities/ovs-lib.in
>> index 4dc3151..4c3ad0f 100644
>> --- a/utilities/ovs-lib.in
>> +++ b/utilities/ovs-lib.in
>> @@ -616,6 +616,7 @@ force_reload_kmod () {
>>       for dp in `ovs-dpctl dump-dps`; do
>>           action "Removing datapath: $dp" ovs-dpctl del-dp "$dp"
>>       done
>> +    action "ovs-appctl dpctl/flush-conntrack"
>>   
>>       for vport in `awk '/^vport_/ { print $1 }' /proc/modules`; do
>>           action "Removing $vport module" rmmod $vport
>> -- 
>> 1.8.3.1
>>
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Gregory Rose April 20, 2018, 9:08 p.m. UTC | #3
On 4/20/2018 11:15 AM, Gregory Rose wrote:
> On 4/20/2018 11:10 AM, Flavio Leitner wrote:
>> On Fri, Apr 20, 2018 at 09:56:53AM -0700, Greg Rose wrote:
>>> On RHEL 7.x kernels we observe a panic induced by a paging error
>>> when the timer kicks off a job that subsequently accesses memory
>>> that belonged to the openvswitch kernel module but was since
>>> unloaded - thus the paging error.
>>>
>>> The panic can be induced on any RHEL 7.x kernel with the following 
>>> test:
>>>
>>> while `true`
>>> do
>>>      make check-kmod TESTSUITEFLAGS="-k \!gre"
>>> done
>>>
>>> On the systems I've been testing on it generally takes anywhere from a
>>> minute to 15 minutes or so to repro but never longer than that.  
>>> Similar
>>> results have been seen by other testers.
>>>
>>> Fix the panic by flushing conntrack and then ensuring that the 
>>> timers are
>>> deleted by incorporating the patch from suggested by Eric Garver.
>>>
>>> Here is the trace:
>>> [252257.801809] BUG: unable to handle kernel paging request at 
>>> ffffffffc07c6298
>>> [252257.802451] IP: [<ffffffff810996e0>] run_timer_softirq+0xe0/0x310
>>> [252257.803055] PGD 19f5067 PUD 19f7067 PMD 2fb5fc2067 PTE 0
>>> [252257.803559] Oops: 0002 [#1] SMP
>>> [252257.804138] Modules linked in: geneve ip6_udp_tunnel 
>>> xt_statistic xt_physdev xt_nat xt_recent xt_comment xt_mark 
>>> ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat xt_addrtype 
>>> ipt_REJECT nf_reject_ipv4 xt_conntrack iptable_filter ip_tables 
>>> nf_conntrack_netlink br_netfilter overlay(T) sch_htb veth udp_tunnel 
>>> 8021q garp mrp tun ip_set nfnetlink bridge stp llc nf_conntrack_ipv6 
>>> nf_defrag_ipv6 nf_nat_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 
>>> nf_nat_ipv4 nf_nat nf_conntrack iTCO_wdt iTCO_vendor_support dcdbas 
>>> mxm_wmi sb_edac edac_core intel_powerclamp coretemp intel_rapl 
>>> iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel 
>>> aesni_intel lrw gf128mul glue_helper ablk_helper cryptd joydev 
>>> mei_me sg mei ipmi_ssif pcspkr shpchp lpc_ich ipmi_si ipmi_devintf 
>>> ipmi_msghandler acpi_power_meter wmi nfsd auth_rpcgss
>>> [252257.808079] nfs_acl lockd grace sunrpc xfs libcrc32c sr_mod 
>>> sd_mod cdrom crc_t10dif crct10dif_generic uas usb_storage mgag200 
>>> drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm 
>>> crct10dif_pclmul crct10dif_common crc32c_intel ahci libahci ixgbe 
>>> libata igb megaraid_sas mdio ptp i2c_algo_bit pps_core i2c_core dca 
>>> dm_mirror dm_region_hash dm_log dm_mod [last unloaded: openvswitch]
>>> [252257.811056] CPU: 33 PID: 0 Comm: swapper/33 Tainted: G OE 
>>> ------------ T 3.10.0-693.el7.x86_64 #1
>>> [252257.811826] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 
>>> 2.1.5 04/11/2016
>>> [252257.812605] task: ffff8830b7708fd0 ti: ffff8830b7718000 task.ti: 
>>> ffff8830b7718000
>>> [252257.813447] RIP: 0010:[<ffffffff810996e0>] [<ffffffff810996e0>] 
>>> run_timer_softirq+0xe0/0x310
>>> [252257.814298] RSP: 0018:ffff885fbe203e68 EFLAGS: 00010082
>>> [252257.815122] RAX: ffff8830b66bc838 RBX: ffff8830b66bc000 RCX: 
>>> ffffffffc07c6290
>>> [252257.815933] RDX: ffff8830b66bc810 RSI: ffff885fbe203e90 RDI: 
>>> ffff8830b66bc000
>>> [252257.816733] RBP: ffff885fbe203ed0 R08: 0000e56b5701d800 R09: 
>>> ffff885fbe203da0
>>> [252257.817568] R10: 0000000000000002 R11: ffff885fbe203da8 R12: 
>>> 0000000000000081
>>> [252257.818429] R13: 0000e56b56fb2eca R14: ffffffff819eb0c8 R15: 
>>> 0000000000000001
>>> [252257.819297] FS: 0000000000000000(0000) GS:ffff885fbe200000(0000) 
>>> knlGS:0000000000000000
>>> [252257.820174] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [252257.821156] CR2: ffffffffc07c6298 CR3: 00000000019f2000 CR4: 
>>> 00000000003407e0
>>> [252257.822012] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
>>> 0000000000000000
>>> [252257.822869] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 
>>> 0000000000000400
>>> [252257.823720] Stack:
>>> [252257.824678] ffff8830b66bdc28 ffff8830b66bd828 ffff8830b66bd428 
>>> ffff8830b66bd028
>>> [252257.825699] ffff885fbe20fe80 ffff885fbe203eb0 ffffffff8132bfe0 
>>> 28a56c277c4fe974
>>> [252257.826623] 0000000000000001 0000000000000001 0000e56b56fb2eca 
>>> ffffffff819eb0c8
>>> [252257.827524] Call Trace:
>>> [252257.828410] <IRQ>
>>> [252257.828417]
>>> [252257.829292] [<ffffffff8132bfe0>] ? timerqueue_add+0x60/0xb0
>>> [252257.830164] [<ffffffff81090b3f>] __do_softirq+0xef/0x280
>>> [252257.831010] [<ffffffff816b6a5c>] call_softirq+0x1c/0x30
>>> [252257.831849] [<ffffffff8102d3c5>] do_softirq+0x65/0xa0
>>> [252257.832669] [<ffffffff81090ec5>] irq_exit+0x105/0x110
>>> [252257.833501] [<ffffffff816b76c2>] smp_apic_timer_interrupt+0x42/0x50
>>> [252257.834330] [<ffffffff816b5c1d>] apic_timer_interrupt+0x6d/0x80
>>> [252257.835152] <EOI>
>>> [252257.835159]
>>> [252257.835944] [<ffffffff81527a02>] ? cpuidle_enter_state+0x52/0xc0
>>> [252257.837231] [<ffffffff81527b48>] cpuidle_idle_call+0xd8/0x210
>>> [252257.838224] [<ffffffff81034fee>] arch_cpu_idle+0xe/0x30
>>> [252257.839133] [<ffffffff810e7bca>] cpu_startup_entry+0x14a/0x1c0
>>> [252257.839933] [<ffffffff81051af6>] start_secondary+0x1b6/0x230
>>> [252257.840684] Code: 00 00 00 44 0f b6 e0 45 85 e4 0f 84 a7 01 00 
>>> 00 49 63 d4 48 83 43 10 01 48 8d 75 c0 48 c1 e2 04 48 01 da 48 8b 4a 
>>> 28 48 8d 42 28 <48> 89 71 08 48 89 4d c0 48 8b 4a 30 48 89 4d c8 48 
>>> 89 31 48 89
>>> [252257.842366] RIP [<ffffffff810996e0>] run_timer_softirq+0xe0/0x310
>>> [252257.843183] RSP <ffff885fbe203e68>
>>> [252257.843955] CR2: ffffffffc07c6298
>>>
>>> Fixes: dd41d330b03 ("openvswitch: Add force commit.")
>>> Suggested-by: Eric Garver <e@erig.me>
>>> Signed-off-by: Greg Rose <gvrose8192@gmail.com>
>>> ---
>>>
>>> V2 - Add splat to commit message per Pravin's request
>>> V3 - Remove msleep and replace with code from Eric Garver to delete
>>>       conntrack timer.
>>>
>>> squash me
>>> ---
>>>   datapath/conntrack.c        | 6 ++++--
>>>   tests/system-kmod-macros.at | 1 +
>>>   utilities/ovs-lib.in        | 1 +
>>>   3 files changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/datapath/conntrack.c b/datapath/conntrack.c
>>> index e53b8e3..4e7c42b 100644
>>> --- a/datapath/conntrack.c
>>> +++ b/datapath/conntrack.c
>>> @@ -576,7 +576,8 @@ ovs_ct_expect_find(struct net *net, const struct 
>>> nf_conntrack_zone *zone,
>>>           if (h) {
>>>               struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h);
>>>   -            nf_ct_delete(ct, 0, 0);
>>> +            if (del_timer(&ct->timeout))
>>> +                nf_ct_delete(ct, 0, 0);
>>>               nf_conntrack_put(&ct->ct_general);
>>>           }
>>>       }
>> Somehow you need to limit that change to the correct range of kernels,
>> because since 4.9 the ct->timeout changed from struct timer_list to
>> u32 which is jiffies and not a timer.
>>
>> fbl
>
> Whoops.  OK, I'll add a check for RHEL 7.
>
> Thanks Flavio.
>
> - Greg

Guh, still getting these:

Message from syslogd@localhost at Apr 20 16:58:32 ...
  kernel:NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! 
[ovstest:7675]

Message from syslogd@localhost at Apr 20 16:59:00 ...
  kernel:NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! 
[ovstest:7675]

Message from syslogd@localhost at Apr 20 16:59:32 ...
  kernel:NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! 
[ovstest:7675]

Message from syslogd@localhost at Apr 20 17:00:00 ...
  kernel:NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! 
[ovstest:7675]

Message from syslogd@localhost at Apr 20 17:00:28 ...
  kernel:NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! 
[ovstest:7675]

Message from syslogd@localhost at Apr 20 17:00:56 ...
  kernel:NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! 
[ovstest:7675]

Message from syslogd@localhost at Apr 20 17:01:24 ...
  kernel:NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! 
[ovstest:7675]

Message from syslogd@localhost at Apr 20 17:01:52 ...
  kernel:NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! 
[ovstest:7675]

Message from syslogd@localhost at Apr 20 17:02:32 ...
  kernel:NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! 
[ovstest:7675]

Message from syslogd@localhost at Apr 20 17:03:00 ...
  kernel:NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! 
[ovstest:7675]

Message from syslogd@localhost at Apr 20 17:03:28 ...
  kernel:NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! 
[ovstest:7675]

Message from syslogd@localhost at Apr 20 17:03:56 ...
  kernel:NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! 
[ovstest:7675]

Message from syslogd@localhost at Apr 20 17:04:24 ...
  kernel:NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! 
[ovstest:7675]

Message from syslogd@localhost at Apr 20 17:04:52 ...
  kernel:NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! 
[ovstest:7675]

Message from syslogd@localhost at Apr 20 17:05:32 ...
  kernel:NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! 
[ovstest:7675]

Message from syslogd@localhost at Apr 20 17:06:00 ...
  kernel:NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! 
[ovstest:7675]

So the suggested change to delete the timers changes the character of 
the bug but it does not fix the issue.

Back to the drawing board.

- Greg
>
>>
>>> @@ -727,7 +728,8 @@ static bool skb_nfct_cached(struct net *net,
>>>            * the reference.
>>>            */
>>>           if (nf_ct_is_confirmed(ct))
>>> -            nf_ct_delete(ct, 0, 0);
>>> +            if (del_timer(&ct->timeout))
>>> +                nf_ct_delete(ct, 0, 0);
>>>             nf_conntrack_put(&ct->ct_general);
>>>           nf_ct_set(skb, NULL, 0);
>>> diff --git a/tests/system-kmod-macros.at b/tests/system-kmod-macros.at
>>> index f23a406..2b9b691 100644
>>> --- a/tests/system-kmod-macros.at
>>> +++ b/tests/system-kmod-macros.at
>>> @@ -23,6 +23,7 @@ m4_define([OVS_TRAFFIC_VSWITCHD_START],
>>>                  on_exit 'modprobe -q -r mod'
>>>                 ])
>>>      on_exit 'ovs-dpctl del-dp ovs-system'
>>> +   on_exit 'ovs-appctl dpctl/flush-conntrack'
>>>      _OVS_VSWITCHD_START([])
>>>      dnl Add bridges, ports, etc.
>>>      AT_CHECK([ovs-vsctl -- _ADD_BR([br0]) -- $1 m4_if([$2], [], [], 
>>> [| uuidfilt])], [0], [$2])
>>> diff --git a/utilities/ovs-lib.in b/utilities/ovs-lib.in
>>> index 4dc3151..4c3ad0f 100644
>>> --- a/utilities/ovs-lib.in
>>> +++ b/utilities/ovs-lib.in
>>> @@ -616,6 +616,7 @@ force_reload_kmod () {
>>>       for dp in `ovs-dpctl dump-dps`; do
>>>           action "Removing datapath: $dp" ovs-dpctl del-dp "$dp"
>>>       done
>>> +    action "ovs-appctl dpctl/flush-conntrack"
>>>         for vport in `awk '/^vport_/ { print $1 }' /proc/modules`; do
>>>           action "Removing $vport module" rmmod $vport
>>> -- 
>>> 1.8.3.1
>>>
>>> _______________________________________________
>>> dev mailing list
>>> dev@openvswitch.org
>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
diff mbox series

Patch

diff --git a/datapath/conntrack.c b/datapath/conntrack.c
index e53b8e3..4e7c42b 100644
--- a/datapath/conntrack.c
+++ b/datapath/conntrack.c
@@ -576,7 +576,8 @@  ovs_ct_expect_find(struct net *net, const struct nf_conntrack_zone *zone,
 		if (h) {
 			struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h);
 
-			nf_ct_delete(ct, 0, 0);
+			if (del_timer(&ct->timeout))
+				nf_ct_delete(ct, 0, 0);
 			nf_conntrack_put(&ct->ct_general);
 		}
 	}
@@ -727,7 +728,8 @@  static bool skb_nfct_cached(struct net *net,
 		 * the reference.
 		 */
 		if (nf_ct_is_confirmed(ct))
-			nf_ct_delete(ct, 0, 0);
+			if (del_timer(&ct->timeout))
+				nf_ct_delete(ct, 0, 0);
 
 		nf_conntrack_put(&ct->ct_general);
 		nf_ct_set(skb, NULL, 0);
diff --git a/tests/system-kmod-macros.at b/tests/system-kmod-macros.at
index f23a406..2b9b691 100644
--- a/tests/system-kmod-macros.at
+++ b/tests/system-kmod-macros.at
@@ -23,6 +23,7 @@  m4_define([OVS_TRAFFIC_VSWITCHD_START],
                on_exit 'modprobe -q -r mod'
               ])
    on_exit 'ovs-dpctl del-dp ovs-system'
+   on_exit 'ovs-appctl dpctl/flush-conntrack'
    _OVS_VSWITCHD_START([])
    dnl Add bridges, ports, etc.
    AT_CHECK([ovs-vsctl -- _ADD_BR([br0]) -- $1 m4_if([$2], [], [], [| uuidfilt])], [0], [$2])
diff --git a/utilities/ovs-lib.in b/utilities/ovs-lib.in
index 4dc3151..4c3ad0f 100644
--- a/utilities/ovs-lib.in
+++ b/utilities/ovs-lib.in
@@ -616,6 +616,7 @@  force_reload_kmod () {
     for dp in `ovs-dpctl dump-dps`; do
         action "Removing datapath: $dp" ovs-dpctl del-dp "$dp"
     done
+    action "ovs-appctl dpctl/flush-conntrack"
 
     for vport in `awk '/^vport_/ { print $1 }' /proc/modules`; do
         action "Removing $vport module" rmmod $vport