Message ID | 20231018142353.2439714-1-david.marchand@redhat.com |
---|---|
State | Accepted |
Commit | bd86266ea9ab3452173b70912f1c967db16ef0dd |
Headers | show |
Series | [ovs-dev,v2] ofproto-dpif-upcall: Pause revalidators when purging. | expand |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | success | apply and check: success |
ovsrobot/github-robot-_Build_and_Test | success | github build: passed |
ovsrobot/intel-ovs-compilation | success | test: success |
On 18 Oct 2023, at 16:23, David Marchand wrote: > This issue has been observed when running traffic tests with a dpdk > enabled userspace datapath (though those tests are added in a separate > series). > However, the described issue also affects the kernel datapath which is > why this patch is sent separately. > > A main thread executing the 'revalidator/purge' command could race with > revalidator threads that can be dumping/sweeping the purged flows at the > same time. > > This race can be reproduced (with dpif debug logs) by running the > conntrack - ICMP related unit tests with the userspace datapath: > > 2023-10-09T14:11:55.242Z|00177|unixctl|DBG|received request > revalidator/purge[], id=0 > 2023-10-09T14:11:55.242Z|00044|dpif(revalidator17)|DBG|netdev@ovs-netdev: > flow_dump ufid:68ff6817-fb3b-4b30-8412-9cf175318294 <empty>, > packets:0, bytes:0, used:never > 2023-10-09T14:11:55.242Z|00178|dpif|DBG|netdev@ovs-netdev: flow_del > ufid:07046e91-30a6-4862-9048-1a76b5a88a5b > recirc_id(0),dp_hash(0),skb_priority(0),in_port(2),skb_mark(0), > ct_state(0),ct_zone(0),ct_mark(0),ct_label(0), > packet_type(ns=0,id=0), > eth(src=a6:0a:bf:e2:f3:f2,dst=62:23:0f:f6:2c:75), > eth_type(0x0800),ipv4(src=10.1.1.1,dst=10.1.1.2,proto=17,tos=0, > ttl=64,frag=no),udp(src=37380,dst=10000), packets:0, bytes:0, > used:never > ... > 2023-10-09T14:11:55.242Z|00049|dpif(revalidator17)|WARN|netdev@ovs-netdev: > failed to flow_get (No such file or directory) > ufid:07046e91-30a6-4862-9048-1a76b5a88a5b <empty>, packets:0, > bytes:0, used:never > 2023-10-09T14:11:55.242Z|00050|ofproto_dpif_upcall(revalidator17)|WARN| > Failed to acquire udpif_key corresponding to unexpected flow > (No such file or directory): > ufid:07046e91-30a6-4862-9048-1a76b5a88a5b > ... > 2023-10-09T14:11:55.242Z|00183|unixctl|DBG|replying with success, id=0: "" > > To avoid this race, a first part of the fix is to pause (if not already > paused) the revalidators while the main thread is purging the datapath > flows. > > Then a second issue is observed by running the same unit test with the > kernel datapath. Its dpif implementation dumps flows via a netlink request > (see dpif_flow_dump_create(), dpif_netlink_flow_dump_create(), > nl_dump_start(), nl_sock_send__()) in the leader revalidator thread, > before pausing revalidators: > > 2023-10-09T14:44:28.742Z|00122|unixctl|DBG|received request > revalidator/purge[], id=0 > ... > 2023-10-09T14:44:28.742Z|00125|dpif|DBG|system@ovs-system: flow_del > ufid:70102d81-30a1-44b9-aa76-3d02a9ffd2c9 recirc_id(0),dp_hash(0), > skb_priority(0),in_port(2),skb_mark(0),ct_state(0),ct_zone(0), > ct_mark(0),ct_label(0),eth(src=a6:0a:bf:e2:f3:f2, > dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),arp(sip=10.1.1.1, > tip=10.1.1.2,op=1,sha=a6:0a:bf:e2:f3:f2,tha=00:00:00:00:00:00), > packets:0, bytes:0, used:never > ... > 2023-10-09T14:44:28.742Z|00129|unixctl|DBG|replying with success, id=0: "" > ... > 2023-10-09T14:44:28.742Z|00006|dpif(revalidator21)|DBG|system@ovs-system: > flow_dump ufid:70102d81-30a1-44b9-aa76-3d02a9ffd2c9 <empty>, > packets:0, bytes:0, used:never > ... > 2023-10-09T14:44:28.742Z|00012|dpif(revalidator21)|WARN|system@ovs-system: > failed to flow_get (No such file or directory) > ufid:70102d81-30a1-44b9-aa76-3d02a9ffd2c9 <empty>, packets:0, > bytes:0, used:never > 2023-10-09T14:44:28.742Z|00013|ofproto_dpif_upcall(revalidator21)|WARN| > Failed to acquire udpif_key corresponding to unexpected flow > (No such file or directory): > ufid:70102d81-30a1-44b9-aa76-3d02a9ffd2c9 > > To avoid evaluating already deleted flows, the second part of the fix is > to ensure that dumping from the leader revalidator thread is done out of > any pause request. > > As a result of this patch, the unit test "offloads - delete ufid mapping > if device not exist - offloads enabled" does not need to waive the random > warning logs when purging dp flows. > > Fixes: 98bb4286970d ("tests: Add command to purge revalidators of flows.") > Acked-by: Eelco Chaudron <echaudro@redhat.com> > Signed-off-by: David Marchand <david.marchand@redhat.com> ACKing the additional changes, and confirming that running the offload test 100 times did not show any problems! Acked-by: Eelco Chaudron <echaudro@redhat.com>
On Thu, Oct 19, 2023 at 03:27:29PM +0200, Eelco Chaudron wrote: > > > On 18 Oct 2023, at 16:23, David Marchand wrote: > > > This issue has been observed when running traffic tests with a dpdk > > enabled userspace datapath (though those tests are added in a separate > > series). > > However, the described issue also affects the kernel datapath which is > > why this patch is sent separately. > > > > A main thread executing the 'revalidator/purge' command could race with > > revalidator threads that can be dumping/sweeping the purged flows at the > > same time. > > > > This race can be reproduced (with dpif debug logs) by running the > > conntrack - ICMP related unit tests with the userspace datapath: > > > > 2023-10-09T14:11:55.242Z|00177|unixctl|DBG|received request > > revalidator/purge[], id=0 > > 2023-10-09T14:11:55.242Z|00044|dpif(revalidator17)|DBG|netdev@ovs-netdev: > > flow_dump ufid:68ff6817-fb3b-4b30-8412-9cf175318294 <empty>, > > packets:0, bytes:0, used:never > > 2023-10-09T14:11:55.242Z|00178|dpif|DBG|netdev@ovs-netdev: flow_del > > ufid:07046e91-30a6-4862-9048-1a76b5a88a5b > > recirc_id(0),dp_hash(0),skb_priority(0),in_port(2),skb_mark(0), > > ct_state(0),ct_zone(0),ct_mark(0),ct_label(0), > > packet_type(ns=0,id=0), > > eth(src=a6:0a:bf:e2:f3:f2,dst=62:23:0f:f6:2c:75), > > eth_type(0x0800),ipv4(src=10.1.1.1,dst=10.1.1.2,proto=17,tos=0, > > ttl=64,frag=no),udp(src=37380,dst=10000), packets:0, bytes:0, > > used:never > > ... > > 2023-10-09T14:11:55.242Z|00049|dpif(revalidator17)|WARN|netdev@ovs-netdev: > > failed to flow_get (No such file or directory) > > ufid:07046e91-30a6-4862-9048-1a76b5a88a5b <empty>, packets:0, > > bytes:0, used:never > > 2023-10-09T14:11:55.242Z|00050|ofproto_dpif_upcall(revalidator17)|WARN| > > Failed to acquire udpif_key corresponding to unexpected flow > > (No such file or directory): > > ufid:07046e91-30a6-4862-9048-1a76b5a88a5b > > ... > > 2023-10-09T14:11:55.242Z|00183|unixctl|DBG|replying with success, id=0: "" > > > > To avoid this race, a first part of the fix is to pause (if not already > > paused) the revalidators while the main thread is purging the datapath > > flows. > > > > Then a second issue is observed by running the same unit test with the > > kernel datapath. Its dpif implementation dumps flows via a netlink request > > (see dpif_flow_dump_create(), dpif_netlink_flow_dump_create(), > > nl_dump_start(), nl_sock_send__()) in the leader revalidator thread, > > before pausing revalidators: > > > > 2023-10-09T14:44:28.742Z|00122|unixctl|DBG|received request > > revalidator/purge[], id=0 > > ... > > 2023-10-09T14:44:28.742Z|00125|dpif|DBG|system@ovs-system: flow_del > > ufid:70102d81-30a1-44b9-aa76-3d02a9ffd2c9 recirc_id(0),dp_hash(0), > > skb_priority(0),in_port(2),skb_mark(0),ct_state(0),ct_zone(0), > > ct_mark(0),ct_label(0),eth(src=a6:0a:bf:e2:f3:f2, > > dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),arp(sip=10.1.1.1, > > tip=10.1.1.2,op=1,sha=a6:0a:bf:e2:f3:f2,tha=00:00:00:00:00:00), > > packets:0, bytes:0, used:never > > ... > > 2023-10-09T14:44:28.742Z|00129|unixctl|DBG|replying with success, id=0: "" > > ... > > 2023-10-09T14:44:28.742Z|00006|dpif(revalidator21)|DBG|system@ovs-system: > > flow_dump ufid:70102d81-30a1-44b9-aa76-3d02a9ffd2c9 <empty>, > > packets:0, bytes:0, used:never > > ... > > 2023-10-09T14:44:28.742Z|00012|dpif(revalidator21)|WARN|system@ovs-system: > > failed to flow_get (No such file or directory) > > ufid:70102d81-30a1-44b9-aa76-3d02a9ffd2c9 <empty>, packets:0, > > bytes:0, used:never > > 2023-10-09T14:44:28.742Z|00013|ofproto_dpif_upcall(revalidator21)|WARN| > > Failed to acquire udpif_key corresponding to unexpected flow > > (No such file or directory): > > ufid:70102d81-30a1-44b9-aa76-3d02a9ffd2c9 > > > > To avoid evaluating already deleted flows, the second part of the fix is > > to ensure that dumping from the leader revalidator thread is done out of > > any pause request. > > > > As a result of this patch, the unit test "offloads - delete ufid mapping > > if device not exist - offloads enabled" does not need to waive the random > > warning logs when purging dp flows. > > > > Fixes: 98bb4286970d ("tests: Add command to purge revalidators of flows.") > > Acked-by: Eelco Chaudron <echaudro@redhat.com> > > Signed-off-by: David Marchand <david.marchand@redhat.com> > > > ACKing the additional changes, and confirming that running the offload test 100 times did not show any problems! > > Acked-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Simon Horman <horms@ovn.org>
On 10/20/23 14:41, Simon Horman wrote: > On Thu, Oct 19, 2023 at 03:27:29PM +0200, Eelco Chaudron wrote: >> >> >> On 18 Oct 2023, at 16:23, David Marchand wrote: >> >>> This issue has been observed when running traffic tests with a dpdk >>> enabled userspace datapath (though those tests are added in a separate >>> series). >>> However, the described issue also affects the kernel datapath which is >>> why this patch is sent separately. >>> >>> A main thread executing the 'revalidator/purge' command could race with >>> revalidator threads that can be dumping/sweeping the purged flows at the >>> same time. >>> >>> This race can be reproduced (with dpif debug logs) by running the >>> conntrack - ICMP related unit tests with the userspace datapath: >>> >>> 2023-10-09T14:11:55.242Z|00177|unixctl|DBG|received request >>> revalidator/purge[], id=0 >>> 2023-10-09T14:11:55.242Z|00044|dpif(revalidator17)|DBG|netdev@ovs-netdev: >>> flow_dump ufid:68ff6817-fb3b-4b30-8412-9cf175318294 <empty>, >>> packets:0, bytes:0, used:never >>> 2023-10-09T14:11:55.242Z|00178|dpif|DBG|netdev@ovs-netdev: flow_del >>> ufid:07046e91-30a6-4862-9048-1a76b5a88a5b >>> recirc_id(0),dp_hash(0),skb_priority(0),in_port(2),skb_mark(0), >>> ct_state(0),ct_zone(0),ct_mark(0),ct_label(0), >>> packet_type(ns=0,id=0), >>> eth(src=a6:0a:bf:e2:f3:f2,dst=62:23:0f:f6:2c:75), >>> eth_type(0x0800),ipv4(src=10.1.1.1,dst=10.1.1.2,proto=17,tos=0, >>> ttl=64,frag=no),udp(src=37380,dst=10000), packets:0, bytes:0, >>> used:never >>> ... >>> 2023-10-09T14:11:55.242Z|00049|dpif(revalidator17)|WARN|netdev@ovs-netdev: >>> failed to flow_get (No such file or directory) >>> ufid:07046e91-30a6-4862-9048-1a76b5a88a5b <empty>, packets:0, >>> bytes:0, used:never >>> 2023-10-09T14:11:55.242Z|00050|ofproto_dpif_upcall(revalidator17)|WARN| >>> Failed to acquire udpif_key corresponding to unexpected flow >>> (No such file or directory): >>> ufid:07046e91-30a6-4862-9048-1a76b5a88a5b >>> ... >>> 2023-10-09T14:11:55.242Z|00183|unixctl|DBG|replying with success, id=0: "" >>> >>> To avoid this race, a first part of the fix is to pause (if not already >>> paused) the revalidators while the main thread is purging the datapath >>> flows. >>> >>> Then a second issue is observed by running the same unit test with the >>> kernel datapath. Its dpif implementation dumps flows via a netlink request >>> (see dpif_flow_dump_create(), dpif_netlink_flow_dump_create(), >>> nl_dump_start(), nl_sock_send__()) in the leader revalidator thread, >>> before pausing revalidators: >>> >>> 2023-10-09T14:44:28.742Z|00122|unixctl|DBG|received request >>> revalidator/purge[], id=0 >>> ... >>> 2023-10-09T14:44:28.742Z|00125|dpif|DBG|system@ovs-system: flow_del >>> ufid:70102d81-30a1-44b9-aa76-3d02a9ffd2c9 recirc_id(0),dp_hash(0), >>> skb_priority(0),in_port(2),skb_mark(0),ct_state(0),ct_zone(0), >>> ct_mark(0),ct_label(0),eth(src=a6:0a:bf:e2:f3:f2, >>> dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),arp(sip=10.1.1.1, >>> tip=10.1.1.2,op=1,sha=a6:0a:bf:e2:f3:f2,tha=00:00:00:00:00:00), >>> packets:0, bytes:0, used:never >>> ... >>> 2023-10-09T14:44:28.742Z|00129|unixctl|DBG|replying with success, id=0: "" >>> ... >>> 2023-10-09T14:44:28.742Z|00006|dpif(revalidator21)|DBG|system@ovs-system: >>> flow_dump ufid:70102d81-30a1-44b9-aa76-3d02a9ffd2c9 <empty>, >>> packets:0, bytes:0, used:never >>> ... >>> 2023-10-09T14:44:28.742Z|00012|dpif(revalidator21)|WARN|system@ovs-system: >>> failed to flow_get (No such file or directory) >>> ufid:70102d81-30a1-44b9-aa76-3d02a9ffd2c9 <empty>, packets:0, >>> bytes:0, used:never >>> 2023-10-09T14:44:28.742Z|00013|ofproto_dpif_upcall(revalidator21)|WARN| >>> Failed to acquire udpif_key corresponding to unexpected flow >>> (No such file or directory): >>> ufid:70102d81-30a1-44b9-aa76-3d02a9ffd2c9 >>> >>> To avoid evaluating already deleted flows, the second part of the fix is >>> to ensure that dumping from the leader revalidator thread is done out of >>> any pause request. >>> >>> As a result of this patch, the unit test "offloads - delete ufid mapping >>> if device not exist - offloads enabled" does not need to waive the random >>> warning logs when purging dp flows. >>> >>> Fixes: 98bb4286970d ("tests: Add command to purge revalidators of flows.") >>> Acked-by: Eelco Chaudron <echaudro@redhat.com> >>> Signed-off-by: David Marchand <david.marchand@redhat.com> >> >> >> ACKing the additional changes, and confirming that running the offload test 100 times did not show any problems! >> >> Acked-by: Eelco Chaudron <echaudro@redhat.com> > > Acked-by: Simon Horman <horms@ovn.org> > Thanks! Applied and backported down to 2.17. Best regards, Ilya Maximets.
diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c index cde03abc6d..e49e9f7e50 100644 --- a/ofproto/ofproto-dpif-upcall.c +++ b/ofproto/ofproto-dpif-upcall.c @@ -990,7 +990,7 @@ udpif_revalidator(void *arg) udpif->reval_exit = latch_is_set(&udpif->exit_latch); start_time = time_msec(); - if (!udpif->reval_exit) { + if (!udpif->reval_exit && !udpif->pause) { bool terse_dump; terse_dump = udpif_use_ufid(udpif); @@ -1000,10 +1000,15 @@ udpif_revalidator(void *arg) } } - /* Wait for the leader to start the flow dump. */ + /* Wait for the leader to reach this point. */ ovs_barrier_block(&udpif->reval_barrier); if (udpif->pause) { revalidator_pause(revalidator); + if (!udpif->reval_exit) { + /* The main thread resumed all validators, but the leader + * dumped nothing, go to next iteration. */ + continue; + } } if (udpif->reval_exit) { @@ -3217,11 +3222,19 @@ upcall_unixctl_purge(struct unixctl_conn *conn, int argc OVS_UNUSED, struct udpif *udpif; LIST_FOR_EACH (udpif, list_node, &all_udpifs) { + bool wake_up = false; int n; + if (!latch_is_set(&udpif->pause_latch)) { + udpif_pause_revalidators(udpif); + wake_up = true; + } for (n = 0; n < udpif->n_revalidators; n++) { revalidator_purge(&udpif->revalidators[n]); } + if (wake_up) { + udpif_resume_revalidators(udpif); + } } unixctl_command_reply(conn, ""); } diff --git a/tests/system-offloads-traffic.at b/tests/system-offloads-traffic.at index 81f3dc8c1e..e9a4587653 100644 --- a/tests/system-offloads-traffic.at +++ b/tests/system-offloads-traffic.at @@ -799,8 +799,6 @@ AT_CHECK([test $(ovs-appctl dpctl/dump-flows | grep -c "eth_type(0x0800)") -eq 0 OVS_TRAFFIC_VSWITCHD_STOP(["/could not open network device ovs-p0/d /on nonexistent port/d -/failed to flow_get/d -/Failed to acquire udpif_key/d /No such device/d /failed to offload flow/d "])