[ovs-dev,RFC] ofproto-dpif-upcall: do not pause on uninitialized pause barriers
diff mbox

Message ID 1442435537-18772-1-git-send-email-zoltan.kiss@linaro.org
State Rejected
Headers show

Commit Message

Zoltan Kiss Sept. 16, 2015, 8:32 p.m. UTC
e4e74c3a "dpif-netdev: Purge all ukeys when reconfigure pmd." introduced a new
dp_purge_cb function, which calls udpif_pause_revalidators() and that tries to
block on pause_barrier.
But if OVS was started with flow-restore-wait="true" (e.g. through ovs-ctl),
type_run() will have backer->recv_set_enable == false, and udpif_set_threads
won't initialize the barrier, which leads to a segfault like this:

#0  seq_read (seq=0x0) at lib/seq.c:121
#1  0x00000000004f33a2 in ovs_barrier_block (barrier=barrier@entry=0x9304a0) at lib/ovs-thread.c:291
#2  0x0000000000445a01 in udpif_pause_revalidators (udpif=0x930410) at ofproto/ofproto-dpif-upcall.c:526
#3  dp_purge_cb (aux=0x930410, pmd_id=4294967295) at ofproto/ofproto-dpif-upcall.c:2271
#4  0x0000000000471552 in dp_netdev_del_pmd (pmd=pmd@entry=0x961f10, dp=<optimised out>, dp=<optimised out>) at lib/dpif-netdev.c:2891
#5  0x00000000004716d9 in dp_netdev_destroy_all_pmds (dp=0x948550) at lib/dpif-netdev.c:2904
#6  dpif_netdev_pmd_set (dpif=<optimised out>, n_rxqs=1, cmask=0x946250 "2") at lib/dpif-netdev.c:2385
#7  0x000000000047873a in dpif_poll_threads_set (dpif=0x95ed10, n_rxqs=<optimised out>, cmask=<optimised out>) at lib/dpif.c:1411
#8  0x000000000043897f in type_run (type=<optimised out>) at ofproto/ofproto-dpif.c:558
#9  0x000000000042b195 in ofproto_type_run (datapath_type=<optimised out>, datapath_type@entry=0xcf23e0 "netdev") at ofproto/ofproto.c:1655
#10 0x000000000040f395 in bridge_run__ () at vswitchd/bridge.c:2875
#11 0x00000000004155b3 in bridge_reconfigure (ovs_cfg=ovs_cfg@entry=0x961db0) at vswitchd/bridge.c:700
#12 0x0000000000418439 in bridge_run () at vswitchd/bridge.c:2984
#13 0x000000000040d025 in main (argc=11, argv=0x7fffffffe9a8) at vswitchd/ovs-vswitchd.c:120

I'm not sure that checking for this initialization is the right thing here, but
at least OVS starts up.

Signed-off-by: Zoltan Kiss <zoltan.kiss@linaro.org>

Comments

Joe Stringer Sept. 16, 2015, 9:54 p.m. UTC | #1
On 16 September 2015 at 13:32, Zoltan Kiss <zoltan.kiss@linaro.org> wrote:
> e4e74c3a "dpif-netdev: Purge all ukeys when reconfigure pmd." introduced a new
> dp_purge_cb function, which calls udpif_pause_revalidators() and that tries to
> block on pause_barrier.
> But if OVS was started with flow-restore-wait="true" (e.g. through ovs-ctl),
> type_run() will have backer->recv_set_enable == false, and udpif_set_threads
> won't initialize the barrier, which leads to a segfault like this:
>
> #0  seq_read (seq=0x0) at lib/seq.c:121
> #1  0x00000000004f33a2 in ovs_barrier_block (barrier=barrier@entry=0x9304a0) at lib/ovs-thread.c:291
> #2  0x0000000000445a01 in udpif_pause_revalidators (udpif=0x930410) at ofproto/ofproto-dpif-upcall.c:526
> #3  dp_purge_cb (aux=0x930410, pmd_id=4294967295) at ofproto/ofproto-dpif-upcall.c:2271
> #4  0x0000000000471552 in dp_netdev_del_pmd (pmd=pmd@entry=0x961f10, dp=<optimised out>, dp=<optimised out>) at lib/dpif-netdev.c:2891
> #5  0x00000000004716d9 in dp_netdev_destroy_all_pmds (dp=0x948550) at lib/dpif-netdev.c:2904
> #6  dpif_netdev_pmd_set (dpif=<optimised out>, n_rxqs=1, cmask=0x946250 "2") at lib/dpif-netdev.c:2385
> #7  0x000000000047873a in dpif_poll_threads_set (dpif=0x95ed10, n_rxqs=<optimised out>, cmask=<optimised out>) at lib/dpif.c:1411
> #8  0x000000000043897f in type_run (type=<optimised out>) at ofproto/ofproto-dpif.c:558
> #9  0x000000000042b195 in ofproto_type_run (datapath_type=<optimised out>, datapath_type@entry=0xcf23e0 "netdev") at ofproto/ofproto.c:1655
> #10 0x000000000040f395 in bridge_run__ () at vswitchd/bridge.c:2875
> #11 0x00000000004155b3 in bridge_reconfigure (ovs_cfg=ovs_cfg@entry=0x961db0) at vswitchd/bridge.c:700
> #12 0x0000000000418439 in bridge_run () at vswitchd/bridge.c:2984
> #13 0x000000000040d025 in main (argc=11, argv=0x7fffffffe9a8) at vswitchd/ovs-vswitchd.c:120
>
> I'm not sure that checking for this initialization is the right thing here, but
> at least OVS starts up.
>
> Signed-off-by: Zoltan Kiss <zoltan.kiss@linaro.org>

Thanks for the report.

I think this proposal will only fix the initial time this is hit. Ie,
if the threads are reconfigured, then the barrier will be destroyed
but this doesn't automatically zero the barrier.seq so you'll end up
hitting the same issue.

How about if there was a function in ofproto-dpif.c something like
"bool ofproto_dpif_backer_enabled(backer)" which returns the status of
the recv_set_enable flag. The udpif_{pause,resume}_revalidators()
functions could use this to determine whether to execute the barrier
block. Also, I think the latch calls should be nested within the
check.

CC: Alex for a second opinion
Zoltan Kiss Sept. 17, 2015, 10:27 a.m. UTC | #2
On 16/09/15 22:54, Joe Stringer wrote:
> On 16 September 2015 at 13:32, Zoltan Kiss <zoltan.kiss@linaro.org> wrote:
>> e4e74c3a "dpif-netdev: Purge all ukeys when reconfigure pmd." introduced a new
>> dp_purge_cb function, which calls udpif_pause_revalidators() and that tries to
>> block on pause_barrier.
>> But if OVS was started with flow-restore-wait="true" (e.g. through ovs-ctl),
>> type_run() will have backer->recv_set_enable == false, and udpif_set_threads
>> won't initialize the barrier, which leads to a segfault like this:
>>
>> #0  seq_read (seq=0x0) at lib/seq.c:121
>> #1  0x00000000004f33a2 in ovs_barrier_block (barrier=barrier@entry=0x9304a0) at lib/ovs-thread.c:291
>> #2  0x0000000000445a01 in udpif_pause_revalidators (udpif=0x930410) at ofproto/ofproto-dpif-upcall.c:526
>> #3  dp_purge_cb (aux=0x930410, pmd_id=4294967295) at ofproto/ofproto-dpif-upcall.c:2271
>> #4  0x0000000000471552 in dp_netdev_del_pmd (pmd=pmd@entry=0x961f10, dp=<optimised out>, dp=<optimised out>) at lib/dpif-netdev.c:2891
>> #5  0x00000000004716d9 in dp_netdev_destroy_all_pmds (dp=0x948550) at lib/dpif-netdev.c:2904
>> #6  dpif_netdev_pmd_set (dpif=<optimised out>, n_rxqs=1, cmask=0x946250 "2") at lib/dpif-netdev.c:2385
>> #7  0x000000000047873a in dpif_poll_threads_set (dpif=0x95ed10, n_rxqs=<optimised out>, cmask=<optimised out>) at lib/dpif.c:1411
>> #8  0x000000000043897f in type_run (type=<optimised out>) at ofproto/ofproto-dpif.c:558
>> #9  0x000000000042b195 in ofproto_type_run (datapath_type=<optimised out>, datapath_type@entry=0xcf23e0 "netdev") at ofproto/ofproto.c:1655
>> #10 0x000000000040f395 in bridge_run__ () at vswitchd/bridge.c:2875
>> #11 0x00000000004155b3 in bridge_reconfigure (ovs_cfg=ovs_cfg@entry=0x961db0) at vswitchd/bridge.c:700
>> #12 0x0000000000418439 in bridge_run () at vswitchd/bridge.c:2984
>> #13 0x000000000040d025 in main (argc=11, argv=0x7fffffffe9a8) at vswitchd/ovs-vswitchd.c:120
>>
>> I'm not sure that checking for this initialization is the right thing here, but
>> at least OVS starts up.
>>
>> Signed-off-by: Zoltan Kiss <zoltan.kiss@linaro.org>
>
> Thanks for the report.
>
> I think this proposal will only fix the initial time this is hit. Ie,
> if the threads are reconfigured, then the barrier will be destroyed
> but this doesn't automatically zero the barrier.seq so you'll end up
> hitting the same issue.

Yes, I had a feeling it won't be zeroed anywhere.

>
> How about if there was a function in ofproto-dpif.c something like
> "bool ofproto_dpif_backer_enabled(backer)" which returns the status of
> the recv_set_enable flag. The udpif_{pause,resume}_revalidators()
> functions could use this to determine whether to execute the barrier
> block.

Sounds good, I'll send a patch for that.

  Also, I think the latch calls should be nested within the
> check.

Ok.


>
> CC: Alex for a second opinion
>

Patch
diff mbox

diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index 8a43bbf..4b6fac4 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -523,7 +523,8 @@  static void
 udpif_pause_revalidators(struct udpif *udpif)
 {
     latch_set(&udpif->pause_latch);
-    ovs_barrier_block(&udpif->pause_barrier);
+    if (udpif->pause_barrier.seq)
+        ovs_barrier_block(&udpif->pause_barrier);
 }
 
 /* Resumes the pausing of revalidators.  Should only be called by the
@@ -532,7 +533,8 @@  static void
 udpif_resume_revalidators(struct udpif *udpif)
 {
     latch_poll(&udpif->pause_latch);
-    ovs_barrier_block(&udpif->pause_barrier);
+    if (udpif->pause_barrier.seq)
+        ovs_barrier_block(&udpif->pause_barrier);
 }
 
 /* Tells 'udpif' how many threads it should use to handle upcalls.