Message ID | 1286905245.2703.3.camel@edumazet-laptop |
---|---|
State | Not Applicable, archived |
Delegated to: | David Miller |
Headers | show |
Eric Dumazet wrote: > 2.6.27 is a bit old, you might try : > > commit 7fa7cb7109d07c29ab28bb877bc7049a0150dbe5 > Author: Eric Dumazet <eric.dumazet@gmail.com> > Date: Mon Sep 27 04:18:27 2010 +0000 > Amazing -- a fix in 20 minutes -- thank you! I need to do more load testing but it looks like it may have solved the issue. Joe Buehler -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Eric Dumazet wrote: > 2.6.27 is a bit old, you might try : > > commit 7fa7cb7109d07c29ab28bb877bc7049a0150dbe5 > Author: Eric Dumazet <eric.dumazet@gmail.com> > Date: Mon Sep 27 04:18:27 2010 +0000 Alas, after more load testing I find that the panic still occurs: CPU 1 Unable to handle kernel paging request at virtual address 0000000000000000, epc == ffffffff8146728c, ra == ffffffff81467258 Oops[#1]: Cpu 1 $ 0 : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 $ 4 : ffffffffffffffff a80000008c9d78f0 a80000009acd9880 000000000a205a7c $ 8 : 0000000000000000 0000000000000020 a80000009a9c49d0 0000000000000000 $12 : ffffffff8155de00 0000000000000004 0000000000000001 0000000000000000 $16 : 0000000000000000 a80000008c9d78f0 0000000000000002 a80000009b94ed80 $20 : a80000009b94edf8 0000000000000000 0000000000000003 a80000008c9d78a0 $24 : 0000000000000000 ffffffff812df388 $28 : a80000008c9d4000 a80000008c9d7840 fffffffffffffff5 ffffffff81467258 Hi : 0000000000000000 Lo : 0000000000000000 epc : ffffffff8146728c fib_rules_lookup+0x11c/0x260 Not tainted ra : ffffffff81467258 fib_rules_lookup+0xe8/0x260 Status: 1010cce3 KX SX UX KERNEL EXL IE Cause : 00800008 BadVA : 0000000000000000 PrId : 000d0409 (Cavium Octeon) Modules linked in: x_tables ip_tables iptable_filter nf_conntrack nf_conntrack_ipv4 nf_nat iptable_nat tun xt_tcpudp xt_state ipt_REJECT ipv6 ip6_tables ip6table_filter ip6t_ipv6header ip6t_REJECT Process qscope7500 (pid: 1343, threadinfo=a80000008c9d4000, task=a80000008c068ac0, tls=000000002d51e920) Stack : ffffffffffffffff 0000000000000003 a80000008c9d78d8 a80000008c9d79d8 a80000008c9d78f0 0000000000000000 ffffffff816c39c0 ffffffffffffffff 0000000000000003 00000000000004a6 0000000000000000 ffffffff814bd2a4 0000000000000000 a80000008c9d78d8 0000000000000000 ffffffffc001aa24 a80000008c9d78d8 ffffffff81478a38 0000000000000003 0000000000000001 0000000000000000 0000000000000000 0000000000000001 000000000a205a7c 0a2059bf00000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 a80000008c9d79d0 0000000000000000 a80000008c9d79d8 a8000000994ff380 0000000000000000 0000000000000000 a80000008c9d79d0 ffffffff816c39c0 0000000000000003 00000000000004a6 ... Call Trace: [<ffffffff8146728c>] fib_rules_lookup+0x11c/0x260 [<ffffffff814bd2a4>] fib_lookup+0x2c/0x48 [<ffffffff81478a38>] __ip_route_output_key+0x918/0xf38 [<ffffffff81479090>] ip_route_output_flow+0x38/0x2e8 [<ffffffff81482c44>] ip_queue_xmit+0x38c/0x3a8 [<ffffffff81497f5c>] tcp_transmit_skb+0x3f4/0x7d0 [<ffffffff8149af2c>] __tcp_push_pending_frames+0x1fc/0x9e0 [<ffffffff8148cf50>] tcp_sendmsg+0x900/0xe00 [<ffffffff81441f7c>] sock_aio_write+0x16c/0x190 [<ffffffff811d0424>] do_sync_write+0xbc/0x130 [<ffffffff811d10a8>] vfs_write+0x150/0x158 [<ffffffff811d124c>] sys_write+0x5c/0x118 [<ffffffff8114532c>] handle_sys+0x12c/0x148 Code: 0040282d 00000000 de100000 <de020000> cc400000 1614ffce 00000000 2405fffd dfbf0058 Fatal exception: panic in 5 seconds Kernel panic - not syncing: Fatal exception Rebooting in 1 seconds.. I'll have a go at adding a spinlock. Joe Buehler -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le mercredi 20 octobre 2010 à 13:43 -0400, Joe Buehler a écrit : > Eric Dumazet wrote: > > > 2.6.27 is a bit old, you might try : > > > > commit 7fa7cb7109d07c29ab28bb877bc7049a0150dbe5 > > Author: Eric Dumazet <eric.dumazet@gmail.com> > > Date: Mon Sep 27 04:18:27 2010 +0000 > > Alas, after more load testing I find that the panic still occurs: > > CPU 1 Unable to handle kernel paging request at virtual address > 0000000000000000, epc == ffffffff8146728c, ra == ffffffff81467258 > Oops[#1]: > Cpu 1 > $ 0 : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > $ 4 : ffffffffffffffff a80000008c9d78f0 a80000009acd9880 000000000a205a7c > $ 8 : 0000000000000000 0000000000000020 a80000009a9c49d0 0000000000000000 > $12 : ffffffff8155de00 0000000000000004 0000000000000001 0000000000000000 > $16 : 0000000000000000 a80000008c9d78f0 0000000000000002 a80000009b94ed80 > $20 : a80000009b94edf8 0000000000000000 0000000000000003 a80000008c9d78a0 > $24 : 0000000000000000 ffffffff812df388 > $28 : a80000008c9d4000 a80000008c9d7840 fffffffffffffff5 ffffffff81467258 > Hi : 0000000000000000 > Lo : 0000000000000000 > epc : ffffffff8146728c fib_rules_lookup+0x11c/0x260 > Not tainted > ra : ffffffff81467258 fib_rules_lookup+0xe8/0x260 > Status: 1010cce3 KX SX UX KERNEL EXL IE > Cause : 00800008 > BadVA : 0000000000000000 > PrId : 000d0409 (Cavium Octeon) > Modules linked in: x_tables ip_tables iptable_filter nf_conntrack > nf_conntrack_ipv4 nf_nat iptable_nat tun xt_tcpudp xt_state ipt_REJECT > ipv6 ip6_tables ip6table_filter ip6t_ipv6header ip6t_REJECT > Process qscope7500 (pid: 1343, threadinfo=a80000008c9d4000, > task=a80000008c068ac0, tls=000000002d51e920) > Stack : ffffffffffffffff 0000000000000003 a80000008c9d78d8 a80000008c9d79d8 > a80000008c9d78f0 0000000000000000 ffffffff816c39c0 ffffffffffffffff > 0000000000000003 00000000000004a6 0000000000000000 ffffffff814bd2a4 > 0000000000000000 a80000008c9d78d8 0000000000000000 ffffffffc001aa24 > a80000008c9d78d8 ffffffff81478a38 0000000000000003 0000000000000001 > 0000000000000000 0000000000000000 0000000000000001 000000000a205a7c > 0a2059bf00000000 0000000000000000 0000000000000000 0000000000000000 > 0000000000000000 0000000000000000 a80000008c9d79d0 0000000000000000 > a80000008c9d79d8 a8000000994ff380 0000000000000000 0000000000000000 > a80000008c9d79d0 ffffffff816c39c0 0000000000000003 00000000000004a6 > ... > Call Trace: > [<ffffffff8146728c>] fib_rules_lookup+0x11c/0x260 > [<ffffffff814bd2a4>] fib_lookup+0x2c/0x48 > [<ffffffff81478a38>] __ip_route_output_key+0x918/0xf38 > [<ffffffff81479090>] ip_route_output_flow+0x38/0x2e8 > [<ffffffff81482c44>] ip_queue_xmit+0x38c/0x3a8 > [<ffffffff81497f5c>] tcp_transmit_skb+0x3f4/0x7d0 > [<ffffffff8149af2c>] __tcp_push_pending_frames+0x1fc/0x9e0 > [<ffffffff8148cf50>] tcp_sendmsg+0x900/0xe00 > [<ffffffff81441f7c>] sock_aio_write+0x16c/0x190 > [<ffffffff811d0424>] do_sync_write+0xbc/0x130 > [<ffffffff811d10a8>] vfs_write+0x150/0x158 > [<ffffffff811d124c>] sys_write+0x5c/0x118 > [<ffffffff8114532c>] handle_sys+0x12c/0x148 > > > Code: 0040282d 00000000 de100000 <de020000> cc400000 1614ffce > 00000000 2405fffd dfbf0058 > Fatal exception: panic in 5 seconds > Kernel panic - not syncing: Fatal exception > Rebooting in 1 seconds.. Thanks Joe Could you provide a disassembly of function fib_rules_lookup ? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Eric Dumazet wrote:
> Could you provide a disassembly of function fib_rules_lookup ?
Try looking in http://68.100.141.95:3000/linux-crash/. There should be
the source file I am using (not current release if you recall), the .o,
the disassembly, and a -S compile that makes deducing the line numbers a
little easier.
I have tried adding a recursive spinlock as a quick fix without much
success. It looks like the code in net/core/fib_rules.c results in some
rather complex code paths through the kernel involving softirq handlers.
I get lockups though I took steps to make sure local interrupts were
off and preemption disabled when taking the spinlock.
I do not know the RCU API or the network code at all but some of the
code in fib_rules.c does not look right prima facie. For example,
dump_rules does not seem to bother about RCU when traversing rules_list.
Its caller, fib_nl_dumprule does, but only for one of two calls. The
fib_nl_newrule and fib_nl_delrule functions traverse rules_list without
any obvious RCU precautions.
Joe Buehler
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Le vendredi 22 octobre 2010 à 16:30 -0400, Joe Buehler a écrit : > Eric Dumazet wrote: > > > Could you provide a disassembly of function fib_rules_lookup ? > > Try looking in http://68.100.141.95:3000/linux-crash/. There should be > the source file I am using (not current release if you recall), the .o, > the disassembly, and a -S compile that makes deducing the line numbers a > little easier. > Hmm, I'll take a look sometime in the future, thanks > I have tried adding a recursive spinlock as a quick fix without much > success. It looks like the code in net/core/fib_rules.c results in some > rather complex code paths through the kernel involving softirq handlers. > I get lockups though I took steps to make sure local interrupts were > off and preemption disabled when taking the spinlock. > > I do not know the RCU API or the network code at all but some of the > code in fib_rules.c does not look right prima facie. For example, > dump_rules does not seem to bother about RCU when traversing rules_list. > Its caller, fib_nl_dumprule does, but only for one of two calls. The > fib_nl_newrule and fib_nl_delrule functions traverse rules_list without > any obvious RCU precautions. There is no obligation RCU shall be used in a slow path algorithm. This slow path uses regular locking to prevent another process to change the things while we read them, thats it. Patrick made a change in the past so that RTNL is locked on dump operations, so no dump_rules() is in fact protected by RTNL, it doesnt need to take care of 'RCU'. Thanks -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le samedi 23 octobre 2010 à 05:44 +0200, Eric Dumazet a écrit : > Le vendredi 22 octobre 2010 à 16:30 -0400, Joe Buehler a écrit : > > Eric Dumazet wrote: > > > > > Could you provide a disassembly of function fib_rules_lookup ? > > > > Try looking in http://68.100.141.95:3000/linux-crash/. There should be > > the source file I am using (not current release if you recall), the .o, > > the disassembly, and a -S compile that makes deducing the line numbers a > > little easier. > > > > Hmm, I'll take a look sometime in the future, thanks Did that... Hmm... I am wondering if smp_rcu_assign_pointer() (or more precisely smp_wmb()) is correctly implemented on octeon platform. Try to add in fib_nl_newrule() right after the kzalloc bloc : rule = kzalloc(ops->rule_size, GFP_KERNEL); if (rule == NULL) { err = -ENOMEM; goto errout; } + rule->list.next = LIST_POISON1; + rule->list.prev = LIST_POISON2; So that we can actually see if the NULL dereference bug you hit becomes a "LIST_POISON1" dereference bug... -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le samedi 23 octobre 2010 à 17:11 +0200, Eric Dumazet a écrit : > Le samedi 23 octobre 2010 à 05:44 +0200, Eric Dumazet a écrit : > > Le vendredi 22 octobre 2010 à 16:30 -0400, Joe Buehler a écrit : > > > Eric Dumazet wrote: > > > > > > > Could you provide a disassembly of function fib_rules_lookup ? > > > > > > Try looking in http://68.100.141.95:3000/linux-crash/. There should be > > > the source file I am using (not current release if you recall), the .o, > > > the disassembly, and a -S compile that makes deducing the line numbers a > > > little easier. > > > > > > > Hmm, I'll take a look sometime in the future, thanks > > Did that... Hmm... > > I am wondering if smp_rcu_assign_pointer() (or more precisely smp_wmb()) > is correctly implemented on octeon platform. > > Try to add in fib_nl_newrule() right after the kzalloc bloc : > > rule = kzalloc(ops->rule_size, GFP_KERNEL); > if (rule == NULL) { > err = -ENOMEM; > goto errout; > } > + rule->list.next = LIST_POISON1; > + rule->list.prev = LIST_POISON2; > > > So that we can actually see if the NULL dereference bug you hit becomes > a "LIST_POISON1" dereference bug... > > Reading commit 500c2e1fdbcc2b273bd is interesting... David Daney added a nudge_writes(), actually doing a "syncw" instruction, and this seems to be the smp_wmb() this platform should be using in the first place, not a pure compiler barrier (barrier()) So Joe, you might want to change the smp_wmb() call in rcu_assign_pointer() by the nudge_writes() call, and see what happens... -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Eric Dumazet wrote: > > Did that... Hmm... > > I am wondering if smp_rcu_assign_pointer() (or more precisely smp_wmb()) > is correctly implemented on octeon platform. > > Try to add in fib_nl_newrule() right after the kzalloc bloc : > > rule = kzalloc(ops->rule_size, GFP_KERNEL); > if (rule == NULL) { > err = -ENOMEM; > goto errout; > } > + rule->list.next = LIST_POISON1; > + rule->list.prev = LIST_POISON2; > > > So that we can actually see if the NULL dereference bug you hit becomes > a "LIST_POISON1" dereference bug... > > > Thanks -- I'll try it when I'm back in the office Tuesday. It is always possible that there is some issue with the Octeon memory barrier stuff, but I would think that the system would be much more unstable than it is -- we're really beating on a dual CPU LINUX instance that has Java and C++ apps running and also doing some network I/O. My strategy at this point is logging events to memory and dumping the log to the console at the time of the panic. I might be able to figure out the sequence of events causing the crash. The load test that causes the panic is using several dozen TAP interfaces, ifconfig'd up/down every 10 seconds or so, with source-routes, DNAT and SNAT being set up and taken down also. Joe Buehler -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Eric Dumazet wrote: > > David Daney added a nudge_writes(), actually doing a "syncw" > instruction, and this seems to be the smp_wmb() this platform should be > using in the first place, not a pure compiler barrier (barrier()) > > So Joe, you might want to change the smp_wmb() call in > rcu_assign_pointer() by the nudge_writes() call, and see what happens... > > > I think Daney is Cavium's Octeon LINUX guru from the posts I've seen so he would definitely know the platform. I'm not sure I quite understand what you are saying but it sounds as though you are saying that smp_wb is not doing a syncw and that sounds *totally* broken -- snycw is what the low-level Cavium SDK uses for memory barriers all over the place. Joe Buehler -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le samedi 23 octobre 2010 à 11:40 -0400, Joe Buehler a écrit : > Eric Dumazet wrote: > > > > David Daney added a nudge_writes(), actually doing a "syncw" > > instruction, and this seems to be the smp_wmb() this platform should be > > using in the first place, not a pure compiler barrier (barrier()) > > > > So Joe, you might want to change the smp_wmb() call in > > rcu_assign_pointer() by the nudge_writes() call, and see what happens... > > > > > > > > I think Daney is Cavium's Octeon LINUX guru from the posts I've seen so > he would definitely know the platform. I'm not sure I quite understand > what you are saying but it sounds as though you are saying that smp_wb > is not doing a syncw and that sounds *totally* broken -- snycw is what > the low-level Cavium SDK uses for memory barriers all over the place. > Yes, I am saying exactly this : smp_wmb() is a barrier() only, at least on the disassembly you provided to me. It might be fine (it is the same on x86 for example) fib_rules.old.s .L234: .loc 1 338 0 beq $9,$0,.L235 if (last) { .LBB911: // last = prev->next .LBB912: .loc 12 45 0 ld $2,0($9) //next = prev->next .LBB913: .LBB914: .loc 12 22 0 sd $9,8($18) // part of list_add_rcu new->prev = prev; .loc 12 21 0 sd $2,0($18) // new->next = next; .loc 12 23 0 <<!>> sd $18,0($9) //rcu_assign_pointer(prev->next, new); .loc 12 24 0 sd $18,8($2) // next->prev = new; .L236: .LBE914: .LBE913: No syncw here at least. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le samedi 23 octobre 2010 à 11:33 -0400, Joe Buehler a écrit : > It is always possible that there is some issue with the Octeon memory > barrier stuff, but I would think that the system would be much more > unstable than it is -- we're really beating on a dual CPU LINUX instance > that has Java and C++ apps running and also doing some network I/O. > > My strategy at this point is logging events to memory and dumping the > log to the console at the time of the panic. I might be able to figure > out the sequence of events causing the crash. > > The load test that causes the panic is using several dozen TAP > interfaces, ifconfig'd up/down every 10 seconds or so, with > source-routes, DNAT and SNAT being set up and taken down also. With a normal workload, on a dual cpu machine, a missing memory barrier can stay un-noticed for quite a long time. The race window is so small that probability for the bug might be 0.0000001 % or something like that :( You could try to run a test dual threaded program to reproduce the problem in user land, faster... -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
23.10.2010, 20:36, "Eric Dumazet" <eric.dumazet@gmail.com>: > With a normal workload, on a dual cpu machine, a missing memory barrier > can stay un-noticed for quite a long time. The race window is so small > that probability for the bug might be 0.0000001 % or something like > that :( Eric, I'd like to remind you that I've faced the similar problem on simple x86. See http://kerneltrap.org/mailarchive/linux-netdev/2010/3/9/6271568 Two main differences for our case: 1) There is no userspace workload (except for bgpd), no changes in interfaces 2) We are not using multiple routing tables This panic was pretty rare in our case (not more that 2 times per month). Currently we're running fine with disabled CONFIG_IP_MULTIPLE_TABLES.
Eric Dumazet wrote: > With a normal workload, on a dual cpu machine, a missing memory barrier > can stay un-noticed for quite a long time. The race window is so small > that probability for the bug might be 0.0000001 % or something like > that :( I'm looking at the LINUX source at the moment and not liking what I see in include/asm-mips/barrier.h: #define smp_mb() __asm__ __volatile__(__WEAK_ORDERING_MB : : :"memory") #ifdef CONFIG_CPU_CAVIUM_OCTEON #define smp_rmb() barrier() #define smp_wmb() barrier() #else #define smp_rmb() __asm__ __volatile__(__WEAK_ORDERING_MB : : :"memory") #define smp_wmb() __asm__ __volatile__(__WEAK_ORDERING_MB : : :"memory") #endif The Octeon documentation explicitly says that neither loads nor stores need execute in program order, so the definitions for smp_rmb and smp_wmb appear to be wrong wrong wrong. It appears that smp_wmb should be making use of SYNCW and smp_rmb should be making use of SYNC. Should I pursue this question on the main LINUX kernel list? Joe Buehler -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le mardi 26 octobre 2010 à 09:29 -0400, Joe Buehler a écrit : > Eric Dumazet wrote: > > > With a normal workload, on a dual cpu machine, a missing memory barrier > > can stay un-noticed for quite a long time. The race window is so small > > that probability for the bug might be 0.0000001 % or something like > > that :( > > I'm looking at the LINUX source at the moment and not liking what I see > in include/asm-mips/barrier.h: > > #define smp_mb() __asm__ __volatile__(__WEAK_ORDERING_MB : : :"memory") > #ifdef CONFIG_CPU_CAVIUM_OCTEON > #define smp_rmb() barrier() > #define smp_wmb() barrier() > #else > #define smp_rmb() __asm__ __volatile__(__WEAK_ORDERING_MB : : :"memory") > #define smp_wmb() __asm__ __volatile__(__WEAK_ORDERING_MB : : :"memory") > #endif > > The Octeon documentation explicitly says that neither loads nor stores > need execute in program order, so the definitions for smp_rmb and > smp_wmb appear to be wrong wrong wrong. > > It appears that smp_wmb should be making use of SYNCW and smp_rmb should > be making use of SYNC. > > Should I pursue this question on the main LINUX kernel list? Well, it would be surprising this being wrong and crash only once in a while in fib_rules_lookup Did you tried my last patch ? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Eric Dumazet wrote: > Well, it would be surprising this being wrong and crash only once in a > while in fib_rules_lookup > Did you tried my last patch ? There was a patch to the kernel by David Daney back in January to improve performance of Octeon memory barriers. The patch changes the generic MIPS barrier code to introduce optimizations for Octeon. The LINUX version I am using is from the Octeon SDK and appears to have an early version of this patch. It's broken however -- the Jan patch has proper SYNCW instructions in smp_wmb while the SDK version does not. I have made your changes but will also fold in this change, then start some load testing. The real-time scheduler is broken in the LINUX I am using -- I get kernel crashes -- and I would be most happy if the SYNCW fix fixed that also. Joe Buehler -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le mardi 26 octobre 2010 à 09:53 -0400, Joe Buehler a écrit : > Eric Dumazet wrote: > > > Well, it would be surprising this being wrong and crash only once in a > > while in fib_rules_lookup > > > Did you tried my last patch ? > > There was a patch to the kernel by David Daney back in January to > improve performance of Octeon memory barriers. The patch changes the > generic MIPS barrier code to introduce optimizations for Octeon. The > LINUX version I am using is from the Octeon SDK and appears to have an > early version of this patch. It's broken however -- the Jan patch has > proper SYNCW instructions in smp_wmb while the SDK version does not. > > I have made your changes but will also fold in this change, then start > some load testing. > > The real-time scheduler is broken in the LINUX I am using -- I get > kernel crashes -- and I would be most happy if the SYNCW fix fixed that > also. > > Joe Buehler > Just to make sure, you do use two "syncw", not only one ? /* * We actually use two syncw instructions in a row when we need a write * memory barrier. This is because the CN3XXX series of Octeons have * errata Core-401. This can cause a single syncw to not enforce * ordering under very rare conditions. Even if it is rare, better safe * than sorry. */ #define CVMX_SYNCW asm volatile ("syncw\n\tsyncw" : : : "memory") -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le mardi 26 octobre 2010 à 09:53 -0400, Joe Buehler a écrit : > Eric Dumazet wrote: > > > Well, it would be surprising this being wrong and crash only once in a > > while in fib_rules_lookup > > > Did you tried my last patch ? > > There was a patch to the kernel by David Daney back in January to > improve performance of Octeon memory barriers. The patch changes the > generic MIPS barrier code to introduce optimizations for Octeon. The > LINUX version I am using is from the Octeon SDK and appears to have an > early version of this patch. It's broken however -- the Jan patch has > proper SYNCW instructions in smp_wmb while the SDK version does not. > > I have made your changes but will also fold in this change, then start > some load testing. > > The real-time scheduler is broken in the LINUX I am using -- I get > kernel crashes -- and I would be most happy if the SYNCW fix fixed that > also. > > Just to make sure, are you using a single syncw, or a double one ? /* * We actually use two syncw instructions in a row when we need a write * memory barrier. This is because the CN3XXX series of Octeons have * errata Core-401. This can cause a single syncw to not enforce * ordering under very rare conditions. Even if it is rare, better safe * than sorry. */ #define CVMX_SYNCW asm volatile ("syncw\n\tsyncw" : : : "memory") -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Eric Dumazet wrote:
> Just to make sure, are you using a single syncw, or a double one ?
I'm using double. I know about the errata. It doesn't apply to all
Octeon versions but in their SDK they always use two without worrying
about the chip variant.
What I actually to patch this was look at the current LINUX prerelease
and copy what it does.
We'll see later today if all these changes fix my crash, it takes a
while to encounter it.
Joe Buehler
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Eric, tests with your fixes and with the smp barrier corrections are looking positive. More testing is needed before a final verdict is in however -- the load test crashed after 8 hours due to an application bug that we're working on fixing now. Joe Buehler -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le mercredi 27 octobre 2010 à 17:01 -0400, Joe Buehler a écrit : > Eric, tests with your fixes and with the smp barrier corrections are > looking positive. More testing is needed before a final verdict is in > however -- the load test crashed after 8 hours due to an application bug > that we're working on fixing now. > Excellent ! Thanks for the report. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
We have had no further problems with crashes. The SMP memory barrier fixes seem to have been the final piece that was missing. Thanks for all the help, you guys are great. Joe Buehler -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le jeudi 02 décembre 2010 à 08:36 -0500, Joe Buehler a écrit : > We have had no further problems with crashes. The SMP memory barrier > fixes seem to have been the final piece that was missing. > > Thanks for all the help, you guys are great. > > Joe Buehler > Great, thanks for your feedback Joe ! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c index 42e84e0..d078728 100644 --- a/net/core/fib_rules.c +++ b/net/core/fib_rules.c @@ -225,9 +225,11 @@ jumped: err = ops->action(rule, fl, flags, arg); if (err != -EAGAIN) { - fib_rule_get(rule); - arg->rule = rule; - goto out; + if (likely(atomic_inc_not_zero(&rule->refcnt))) { + arg->rule = rule; + goto out; + } + break; } }