diff mbox series

[ovs-dev,2/2] system-traffic: Check frozen state handling with TLV map change

Message ID 1585252702-8649-2-git-send-email-pkusunyifeng@gmail.com
State Superseded
Headers show
Series [ovs-dev,1/2] tun_metadata: Fix coredump caused by use-after-free bug | expand

Commit Message

Yifeng Sun March 26, 2020, 7:58 p.m. UTC
This patch enhances a system traffic test to prevent regression on
the tunnel metadata table (tun_table) handling with frozen state.
Without a proper fix this test can crash ovs-vswitchd due to a
use-after-free bug on tun_table.

These are the timed sequence of how this bug is triggered:

- Adds an OpenFlow rule in OVS that matches Geneve tunnel metadata that
contains a controller action.
- When the first packet matches the aforementioned OpenFlow rule,
during the miss upcall, OVS stores a pointer to the tun_table (that
decodes the Geneve tunnel metadata) in a frozen state and pushes down
a datapath flow into kernel datapath.
- Issues a add-tlv-map command to reprogram the tun_table on OVS.
OVS frees the old tun_table and create a new tun_table.
- A subsequent packet hits the kernel datapath flow again. Since
there is a controller action associated with that flow, it triggers
slow path controller upcall.
- In the slow path controller upcall, OVS derives the tun_table
from the frozen state, which points to the old tun_table that is
already being freed at this time point.
- In order to access the tunnel metadata, OVS uses the invalid
pointer that points to the old tun_table and triggers the core dump.

Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Co-authored-by: Yi-Hung Wei <yihung.wei@gmail.com>
---
 tests/system-traffic.at | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

Comments

William Tu April 6, 2020, 3:29 p.m. UTC | #1
Hi Yifeng,

Thanks for the patch, I can reproduce the issue using
$ make check-system-userspace TESTSUITEFLAGS='-k resume'
ASAN reports
==127707==ERROR: AddressSanitizer: heap-use-after-free on address
0x61f000020690 at pc 0x00000089cecf bp 0x7fff38f95690 sp
0x7fff38f95688
READ of size 4 at 0x61f000020690 thread T0
    #0 0x89cece in tun_metadata_get_fmd /root/ovs/lib/tun-metadata.c:394:52
    #1 0x66a3f9 in flow_get_metadata /root/ovs/lib/flow.c:1236:5
    #2 0x58a9ca in process_upcall
/root/ovs/ofproto/ofproto-dpif-upcall.c:1538:13
    #3 0x57a723 in upcall_cb /root/ovs/ofproto/ofproto-dpif-upcall.c:1311:13

However, applying your fix (patch 1/2) and run
$  make check-system-userspace TESTSUITEFLAGS='-k resume'
fix the crash but trigger other fail.

Can you take a look? Some comments below:

On Thu, Mar 26, 2020 at 12:58 PM Yifeng Sun <pkusunyifeng@gmail.com> wrote:
>
> This patch enhances a system traffic test to prevent regression on
> the tunnel metadata table (tun_table) handling with frozen state.
> Without a proper fix this test can crash ovs-vswitchd due to a
> use-after-free bug on tun_table.
>
> These are the timed sequence of how this bug is triggered:
>
> - Adds an OpenFlow rule in OVS that matches Geneve tunnel metadata that
> contains a controller action.
> - When the first packet matches the aforementioned OpenFlow rule,
> during the miss upcall, OVS stores a pointer to the tun_table (that
> decodes the Geneve tunnel metadata) in a frozen state and pushes down
> a datapath flow into kernel datapath.
> - Issues a add-tlv-map command to reprogram the tun_table on OVS.
> OVS frees the old tun_table and create a new tun_table.
> - A subsequent packet hits the kernel datapath flow again. Since
> there is a controller action associated with that flow, it triggers
> slow path controller upcall.
> - In the slow path controller upcall, OVS derives the tun_table
> from the frozen state, which points to the old tun_table that is
> already being freed at this time point.
> - In order to access the tunnel metadata, OVS uses the invalid
> pointer that points to the old tun_table and triggers the core dump.
>
> Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
> Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
> Co-authored-by: Yi-Hung Wei <yihung.wei@gmail.com>
> ---
>  tests/system-traffic.at | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
>
> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
> index 4a39c929c207..992de8546c41 100644
> --- a/tests/system-traffic.at
> +++ b/tests/system-traffic.at
> @@ -611,6 +611,20 @@ NS_CHECK_EXEC([at_ns0], [ping -q -c 3 10.1.1.100 | FORMAT_PING], [0], [dnl
>  3 packets transmitted, 3 received, 0% packet loss, time 0ms
>  ])
>
> +dnl Test OVS handles TLV map modifictions properly when restores frozen state.
> +NS_CHECK_EXEC([at_ns0], [ping 10.1.1.100 > ping.out &])
since ping.out is not checked, can you use ping -c <count> -q?

> +
> +sleep 2
do we need this sleep 2

> +
> +AT_CHECK([ovs-ofctl add-tlv-map br0 "{class=0xffff,type=0x88,len=4}->tun_metadata1"])
> +sleep 1
> +AT_CHECK([ovs-ofctl add-tlv-map br0 "{class=0xffff,type=0x99,len=4}->tun_metadata2"])
> +sleep 1
> +AT_CHECK([ovs-ofctl add-tlv-map br0 "{class=0xffff,type=0xaa,len=4}->tun_metadata3"])
> +sleep 1
> +
> +dnl At this point, ovs-vswitchd will either crash or everything is OK.
can you remove this comment?

>` +
>  OVS_APP_EXIT_AND_WAIT([ovs-ofctl])
>  OVS_TRAFFIC_VSWITCHD_STOP
>  AT_CLEANUP
> --
> 2.7.4
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
William Tu April 6, 2020, 3:55 p.m. UTC | #2
On Mon, Apr 6, 2020 at 8:29 AM William Tu <u9012063@gmail.com> wrote:
>
> Hi Yifeng,
>
> Thanks for the patch, I can reproduce the issue using
> $ make check-system-userspace TESTSUITEFLAGS='-k resume'
> ASAN reports
> ==127707==ERROR: AddressSanitizer: heap-use-after-free on address
> 0x61f000020690 at pc 0x00000089cecf bp 0x7fff38f95690 sp
> 0x7fff38f95688
> READ of size 4 at 0x61f000020690 thread T0
>     #0 0x89cece in tun_metadata_get_fmd /root/ovs/lib/tun-metadata.c:394:52
>     #1 0x66a3f9 in flow_get_metadata /root/ovs/lib/flow.c:1236:5
>     #2 0x58a9ca in process_upcall
> /root/ovs/ofproto/ofproto-dpif-upcall.c:1538:13
>     #3 0x57a723 in upcall_cb /root/ovs/ofproto/ofproto-dpif-upcall.c:1311:13
>
> However, applying your fix (patch 1/2) and run
> $  make check-system-userspace TESTSUITEFLAGS='-k resume'
> fix the crash but trigger other fail.
>
FYI, with your patch, the failed log shows:
+2020-04-06T15:43:08.716Z|00001|dpif(revalidator7)|WARN|netdev@ovs-netdev:
failed to put[modify] (No such file or directory)
ufid:4c0cc511-5dfd-4afe-a43b-86889dcd3600
skb_priority(0/0),tunnel(tun_id=0x0,src=172.31.1.1,dst=172.31.1.100,ttl=64/0,tp_src=62880/0,tp_dst=6081/0,flags(-df-csum+key)),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(4),packet_type(ns=0,id=0),eth(src=ee:b9:ef:c8:ab:57/00:00:00:00:00:00,dst=33:33:00:00:00:16/00:00:00:00:00:00),eth_type(0x86dd),ipv6(src=::/::,dst=ff02::16/::,label=0/0,proto=58/0,tclass=0/0,hlimit=1/0,frag=no),icmpv6(type=143/0,code=0/0),
actions:userspace(pid=0,controller(reason=7,dont_send=0,continuation=0,recirc_id=7,rule_cookie=0,controller_id=0,max_len=65535))
+2020-04-06T15:43:08.716Z|00002|dpif(revalidator7)|WARN|netdev@ovs-netdev:
failed to put[modify] (No such file or directory)
ufid:e7d78a37-7dd8-4641-a71d-1c57e4b47329
skb_priority(0/0),tunnel(tun_id=0x0,src=172.31.1.1,dst=172.31.1.100,ttl=64/0,tp_src=22243/0,tp_dst=6081/0,flags(-df-csum+key)),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(4),packet_type(ns=0,id=0),eth(src=ee:b9:ef:c8:ab:57/00:00:00:00:00:00,dst=f2:7d:a0:68:ae:4a/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=10.1.1.1/0.0.0.0,dst=10.1.1.100/0.0.0.0,proto=1,tos=0/0,ttl=64/0,frag=no),icmp(type=8/0,code=0/0),
actions:userspace(pid=0,controller(reason=1,dont_send=0,continuation=1,recirc_id=8,rule_cookie=0,controller_id=0,max_len=65535))
Yifeng Sun April 7, 2020, 8:52 p.m. UTC | #3
Thanks William for the comments. I will submit a new version.
Yifeng

On Mon, Apr 6, 2020 at 8:56 AM William Tu <u9012063@gmail.com> wrote:

> On Mon, Apr 6, 2020 at 8:29 AM William Tu <u9012063@gmail.com> wrote:
> >
> > Hi Yifeng,
> >
> > Thanks for the patch, I can reproduce the issue using
> > $ make check-system-userspace TESTSUITEFLAGS='-k resume'
> > ASAN reports
> > ==127707==ERROR: AddressSanitizer: heap-use-after-free on address
> > 0x61f000020690 at pc 0x00000089cecf bp 0x7fff38f95690 sp
> > 0x7fff38f95688
> > READ of size 4 at 0x61f000020690 thread T0
> >     #0 0x89cece in tun_metadata_get_fmd
> /root/ovs/lib/tun-metadata.c:394:52
> >     #1 0x66a3f9 in flow_get_metadata /root/ovs/lib/flow.c:1236:5
> >     #2 0x58a9ca in process_upcall
> > /root/ovs/ofproto/ofproto-dpif-upcall.c:1538:13
> >     #3 0x57a723 in upcall_cb
> /root/ovs/ofproto/ofproto-dpif-upcall.c:1311:13
> >
> > However, applying your fix (patch 1/2) and run
> > $  make check-system-userspace TESTSUITEFLAGS='-k resume'
> > fix the crash but trigger other fail.
> >
> FYI, with your patch, the failed log shows:
> +2020-04-06T15:43:08.716Z|00001|dpif(revalidator7)|WARN|netdev@ovs-netdev:
> failed to put[modify] (No such file or directory)
> ufid:4c0cc511-5dfd-4afe-a43b-86889dcd3600
>
> skb_priority(0/0),tunnel(tun_id=0x0,src=172.31.1.1,dst=172.31.1.100,ttl=64/0,tp_src=62880/0,tp_dst=6081/0,flags(-df-csum+key)),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(4),packet_type(ns=0,id=0),eth(src=ee:b9:ef:c8:ab:57/00:00:00:00:00:00,dst=33:33:00:00:00:16/00:00:00:00:00:00),eth_type(0x86dd),ipv6(src=::/::,dst=ff02::16/::,label=0/0,proto=58/0,tclass=0/0,hlimit=1/0,frag=no),icmpv6(type=143/0,code=0/0),
>
> actions:userspace(pid=0,controller(reason=7,dont_send=0,continuation=0,recirc_id=7,rule_cookie=0,controller_id=0,max_len=65535))
> +2020-04-06T15:43:08.716Z|00002|dpif(revalidator7)|WARN|netdev@ovs-netdev:
> failed to put[modify] (No such file or directory)
> ufid:e7d78a37-7dd8-4641-a71d-1c57e4b47329
>
> skb_priority(0/0),tunnel(tun_id=0x0,src=172.31.1.1,dst=172.31.1.100,ttl=64/0,tp_src=22243/0,tp_dst=6081/0,flags(-df-csum+key)),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(4),packet_type(ns=0,id=0),eth(src=ee:b9:ef:c8:ab:57/00:00:00:00:00:00,dst=f2:7d:a0:68:ae:4a/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=
> 10.1.1.1/0.0.0.0,dst=10.1.1.100/0.0.0.0,proto=1,tos=0/0,ttl=64/0,frag=no
> ),icmp(type=8/0,code=0/0),
>
> actions:userspace(pid=0,controller(reason=1,dont_send=0,continuation=1,recirc_id=8,rule_cookie=0,controller_id=0,max_len=65535))
>
diff mbox series

Patch

diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 4a39c929c207..992de8546c41 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -611,6 +611,20 @@  NS_CHECK_EXEC([at_ns0], [ping -q -c 3 10.1.1.100 | FORMAT_PING], [0], [dnl
 3 packets transmitted, 3 received, 0% packet loss, time 0ms
 ])
 
+dnl Test OVS handles TLV map modifictions properly when restores frozen state.
+NS_CHECK_EXEC([at_ns0], [ping 10.1.1.100 > ping.out &])
+
+sleep 2
+
+AT_CHECK([ovs-ofctl add-tlv-map br0 "{class=0xffff,type=0x88,len=4}->tun_metadata1"])
+sleep 1
+AT_CHECK([ovs-ofctl add-tlv-map br0 "{class=0xffff,type=0x99,len=4}->tun_metadata2"])
+sleep 1
+AT_CHECK([ovs-ofctl add-tlv-map br0 "{class=0xffff,type=0xaa,len=4}->tun_metadata3"])
+sleep 1
+
+dnl At this point, ovs-vswitchd will either crash or everything is OK.
+
 OVS_APP_EXIT_AND_WAIT([ovs-ofctl])
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP