diff mbox series

[ovs-dev] ipf.c: Fix userspace datapath crash caused by IP fragments

Message ID 3F279AFF-BF47-45FF-9E66-E104779E96A4@didiglobal.com
State Superseded
Headers show
Series [ovs-dev] ipf.c: Fix userspace datapath crash caused by IP fragments | expand

Commit Message

王亮 Liang Wang April 13, 2021, 5:25 a.m. UTC
From d5c454bc9c68a6b2b40a2301a21f7337d6086d4c Mon Sep 17 00:00:00 2001
From: Wang Liang <wangliangrt@didiglobal.com>
Date: Mon, 12 Apr 2021 18:24:21 +0800
Subject: [PATCH] Fix userspace datapath crash caused by IP fragments

The ovs userspace datapath will crash on openflow packet-out message
with IP packet largger than MTU. This patch will fix the problem.
When pre-processing the IP fragments, clone the packet if it has
'dnsteal' flag set.

Signed-off-by: Wang Liang <wangliangrt@didiglobal.com>
---
 lib/ipf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Aaron Conole April 14, 2021, 6:54 p.m. UTC | #1
王亮 Liang Wang <wangliangrt@didiglobal.com> writes:

> From d5c454bc9c68a6b2b40a2301a21f7337d6086d4c Mon Sep 17 00:00:00 2001
> From: Wang Liang <wangliangrt@didiglobal.com>
> Date: Mon, 12 Apr 2021 18:24:21 +0800
> Subject: [PATCH] Fix userspace datapath crash caused by IP fragments
>
> The ovs userspace datapath will crash on openflow packet-out message
> with IP packet largger than MTU. This patch will fix the problem.
> When pre-processing the IP fragments, clone the packet if it has
> 'dnsteal' flag set.
>
> Signed-off-by: Wang Liang <wangliangrt@didiglobal.com>
> ---

I guess the real issue is that we don't honor the dnsteal flag
correctly.  It implies that we cannot safely hold a reference to the
dp_packet because some other entity (read: the conntrack processing
pipeline) will free it.

Can you provide a test case that shows this?  I've done a little bit of
testing, but haven't been able to reproduce.

>  lib/ipf.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/lib/ipf.c b/lib/ipf.c
> index c20bcc0b3..43f81706c 100644
> --- a/lib/ipf.c
> +++ b/lib/ipf.c
> @@ -811,7 +811,7 @@ ipf_process_frag(struct ipf *ipf, struct ipf_list *ipf_list,
>               * recommend not setting the mempool number of buffers too low
>               * and also clamp the number of fragments. */
>              struct ipf_frag *frag = &ipf_list->frag_list[last_inuse_idx + 1];
> -            frag->pkt = pkt;
> +            frag->pkt = dnsteal ? dp_packet_clone(pkt) : pkt;

I think the problem we have is once we do this duplication, I think
we'll leak a reference to the packet, because we record the dnsteal
value with frag, and that will skip the dp_packet_delete() calls later
on during fragment queue release.

I think with this change we also need to remove the dnsteal flag and
simply always own the packet that comes in.  Maybe I missed something?

>              frag->start_data_byte = start_data_byte;
>              frag->end_data_byte = end_data_byte;
>              frag->dnsteal = dnsteal;
王亮 Liang Wang April 15, 2021, 6:23 a.m. UTC | #2
Hi, Conole. Thank you for pointing out the mistakes in my last patch. Yes, I did missed some something important.
The correct and complete way to fix this problem is like the following patch. In fact, we have found and fixed this
crash problem one year ago; and this patch has worked very well in our production environment for 11 months.

From 7e13a39cdbdfb484d4cb6b074f08471dc16d22b7 Mon Sep 17 00:00:00 2001
From: Wang Liang <wangliangrt@didiglobal.com>
Date: Thu, 15 Apr 2021 13:52:01 +0800
Subject: [PATCH] Fix userspace datapath crash caused by IP fragments

The ovs userspace datapath will crash on openflow packet-out message
with IP packet largger than MTU. This patch will fix the problem.
When pre-processing the IP fragments, clone the packet if it has
'dnsteal' flag set.

Signed-off-by: Wang Liang <wangliangrt@didiglobal.com>
---
 lib/ipf.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/lib/ipf.c b/lib/ipf.c
index c20bcc0b3..176140afb 100644
--- a/lib/ipf.c
+++ b/lib/ipf.c
@@ -811,7 +811,7 @@ ipf_process_frag(struct ipf *ipf, struct ipf_list *ipf_list,
              * recommend not setting the mempool number of buffers too low
              * and also clamp the number of fragments. */
             struct ipf_frag *frag = &ipf_list->frag_list[last_inuse_idx + 1];
-            frag->pkt = pkt;
+            frag->pkt = dnsteal ? dp_packet_clone(pkt) : pkt;
             frag->start_data_byte = start_data_byte;
             frag->end_data_byte = end_data_byte;
             frag->dnsteal = dnsteal;
@@ -1338,9 +1338,7 @@ ipf_destroy(struct ipf *ipf)
         while (ipf_list->last_sent_idx < ipf_list->last_inuse_idx) {
             struct dp_packet *pkt
                 = ipf_list->frag_list[ipf_list->last_sent_idx + 1].pkt;
-            if (!ipf_list->frag_list[ipf_list->last_sent_idx + 1].dnsteal) {
-                dp_packet_delete(pkt);
-            }
+            dp_packet_delete(pkt);
             atomic_count_dec(&ipf->nfrag);
             ipf_list->last_sent_idx++;
         }
Aaron Conole April 15, 2021, 12:55 p.m. UTC | #3
王亮 Liang Wang <wangliangrt@didiglobal.com> writes:

> Hi, Conole. Thank you for pointing out the mistakes in my last patch. Yes, I did missed some something important.
> The correct and complete way to fix this problem is like the following patch. In fact, we have found and fixed this
> crash problem one year ago; and this patch has worked very well in our production environment for 11 months.
>
> From 7e13a39cdbdfb484d4cb6b074f08471dc16d22b7 Mon Sep 17 00:00:00 2001
> From: Wang Liang <wangliangrt@didiglobal.com>
> Date: Thu, 15 Apr 2021 13:52:01 +0800
> Subject: [PATCH] Fix userspace datapath crash caused by IP fragments
>
> The ovs userspace datapath will crash on openflow packet-out message
> with IP packet largger than MTU. This patch will fix the problem.
> When pre-processing the IP fragments, clone the packet if it has
> 'dnsteal' flag set.
>
> Signed-off-by: Wang Liang <wangliangrt@didiglobal.com>
> ---
>  lib/ipf.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/lib/ipf.c b/lib/ipf.c
> index c20bcc0b3..176140afb 100644
> --- a/lib/ipf.c
> +++ b/lib/ipf.c
> @@ -811,7 +811,7 @@ ipf_process_frag(struct ipf *ipf, struct ipf_list *ipf_list,
>               * recommend not setting the mempool number of buffers too low
>               * and also clamp the number of fragments. */
>              struct ipf_frag *frag = &ipf_list->frag_list[last_inuse_idx + 1];
> -            frag->pkt = pkt;
> +            frag->pkt = dnsteal ? dp_packet_clone(pkt) : pkt;
>              frag->start_data_byte = start_data_byte;
>              frag->end_data_byte = end_data_byte;
>              frag->dnsteal = dnsteal;
> @@ -1338,9 +1338,7 @@ ipf_destroy(struct ipf *ipf)
>          while (ipf_list->last_sent_idx < ipf_list->last_inuse_idx) {
>              struct dp_packet *pkt
>                  = ipf_list->frag_list[ipf_list->last_sent_idx + 1].pkt;
> -            if (!ipf_list->frag_list[ipf_list->last_sent_idx + 1].dnsteal) {
> -                dp_packet_delete(pkt);
> -            }
> +            dp_packet_delete(pkt);

This looks better.

There are no more uses of (struct ipf_frag *)->dnsteal - so I think we should
remove it from the struct as well.

I know you wrote that you've been running this in production for a year,
and I think the change looks correct, but is it possible to have a test
case (maybe even a pcap that we can extract) to go along with it?

>              atomic_count_dec(&ipf->nfrag);
>              ipf_list->last_sent_idx++;
>          }
diff mbox series

Patch

diff --git a/lib/ipf.c b/lib/ipf.c
index c20bcc0b3..43f81706c 100644
--- a/lib/ipf.c
+++ b/lib/ipf.c
@@ -811,7 +811,7 @@  ipf_process_frag(struct ipf *ipf, struct ipf_list *ipf_list,
              * recommend not setting the mempool number of buffers too low
              * and also clamp the number of fragments. */
             struct ipf_frag *frag = &ipf_list->frag_list[last_inuse_idx + 1];
-            frag->pkt = pkt;
+            frag->pkt = dnsteal ? dp_packet_clone(pkt) : pkt;
             frag->start_data_byte = start_data_byte;
             frag->end_data_byte = end_data_byte;
             frag->dnsteal = dnsteal;