diff mbox series

[ovs-dev] ofproto-dpif-trace: Fix infinite recirculation tracing.

Message ID 20240222150633.2626722-1-i.maximets@ovn.org
State Accepted
Commit 6fc215de30f51e66e60a7c11083e2597850599e5
Delegated to: Ilya Maximets
Headers show
Series [ovs-dev] ofproto-dpif-trace: Fix infinite recirculation tracing. | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/intel-ovs-compilation fail test: fail

Commit Message

Ilya Maximets Feb. 22, 2024, 3:06 p.m. UTC
Trace attempts to process all the recirculations.  However, if there
is a recirculation loop, i.e. if every recirculation generates another
recirculation, this process will never stop.  It will grind until the
trace fills the system memory.

A simple reproducer:

  make sandbox
  ovs-vsctl add-br br0
  ovs-vsctl add-port br0 p1
  ovs-ofctl add-flow br0 "table=0,in_port=p1,ip,actions=ct(table=0)"
  ovs-appctl ofproto/trace br0 in_port=p1,ip

Limit the number of recirculations trace is processing with a fairly
arbitrary number - 4096 (loosely based on the resubmit limit, but
they are not actually related).

Not adding a test for this since it's only for a trace, but also
because the test may lead to OOM event in a system if the test fails,
which is not nice.

Fixes: e6bc8e749381 ("ofproto/trace: Add support for tracing conntrack recirculation")
Reported-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
---
 ofproto/ofproto-dpif-trace.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

Comments

Jaime Caamaño Ruiz Feb. 22, 2024, 3:46 p.m. UTC | #1
We have the option to tweak the ct action ct_state output with `*--ct-next*
*flags* <https://man.archlinux.org/man/ovs-vswitchd.8.en#ct-next>` trace
option.
Would it make sense to have the same capability for the ip_frag field given
that the ct action might influence its value after re-injection? Or it does
not?
Not suggesting it for this patch, but maybe a further improvement down the
line.

On Thu, Feb 22, 2024 at 4:34 PM Ilya Maximets <i.maximets@ovn.org> wrote:

> Trace attempts to process all the recirculations.  However, if there
> is a recirculation loop, i.e. if every recirculation generates another
> recirculation, this process will never stop.  It will grind until the
> trace fills the system memory.
>
> A simple reproducer:
>
>   make sandbox
>   ovs-vsctl add-br br0
>   ovs-vsctl add-port br0 p1
>   ovs-ofctl add-flow br0 "table=0,in_port=p1,ip,actions=ct(table=0)"
>   ovs-appctl ofproto/trace br0 in_port=p1,ip
>
> Limit the number of recirculations trace is processing with a fairly
> arbitrary number - 4096 (loosely based on the resubmit limit, but
> they are not actually related).
>
> Not adding a test for this since it's only for a trace, but also
> because the test may lead to OOM event in a system if the test fails,
> which is not nice.
>
> Fixes: e6bc8e749381 ("ofproto/trace: Add support for tracing conntrack
> recirculation")
> Reported-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
> ---
>  ofproto/ofproto-dpif-trace.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
>
> diff --git a/ofproto/ofproto-dpif-trace.c b/ofproto/ofproto-dpif-trace.c
> index b86e7fe07..87506aa78 100644
> --- a/ofproto/ofproto-dpif-trace.c
> +++ b/ofproto/ofproto-dpif-trace.c
> @@ -845,17 +845,35 @@ ofproto_trace(struct ofproto_dpif *ofproto, const
> struct flow *flow,
>                bool names)
>  {
>      struct ovs_list recirc_queue = OVS_LIST_INITIALIZER(&recirc_queue);
> +    int recirculations = 0;
> +
>      ofproto_trace__(ofproto, flow, packet, &recirc_queue,
>                      ofpacts, ofpacts_len, output, names);
>
>      struct oftrace_recirc_node *recirc_node;
>      LIST_FOR_EACH_POP (recirc_node, node, &recirc_queue) {
> +        if (recirculations++ > 4096) {
> +            ds_put_cstr(output, "\n\n");
> +            ds_put_char_multiple(output, '=', 79);
> +            ds_put_cstr(output, "\nTrace reached the recirculation limit."
> +                                "  Sopping the trace here.");
> +            ds_put_format(output,
> +                          "\nQueued but not processed: %"PRIuSIZE
> +                          " recirculations.",
> +                          ovs_list_size(&recirc_queue) + 1);
> +            oftrace_recirc_node_destroy(recirc_node);
> +            break;
> +        }
>          ofproto_trace_recirc_node(recirc_node, next_ct_states, output);
>          ofproto_trace__(ofproto, &recirc_node->flow, recirc_node->packet,
>                          &recirc_queue, ofpacts, ofpacts_len, output,
>                          names);
>          oftrace_recirc_node_destroy(recirc_node);
>      }
> +    /* Destroy remaining recirculation nodes, if any. */
> +    LIST_FOR_EACH_POP (recirc_node, node, &recirc_queue) {
> +        oftrace_recirc_node_destroy(recirc_node);
> +    }
>  }
>
>  void
> --
> 2.43.0
>
>
Ilya Maximets Feb. 22, 2024, 4:05 p.m. UTC | #2
On 2/22/24 16:46, Jaime Caamaño Ruiz wrote:
> We have the option to tweak the ct action ct_state output with `*--ct-next*
> /flags/ <https://man.archlinux.org/man/ovs-vswitchd.8.en#ct-next>` trace option.
> Would it make sense to have the same capability for the ip_frag field given that
> the ct action might influence its value after re-injection? Or it does not?
> Not suggesting it for this patch, but maybe a further improvement down the line.

It's a little controversial to add such a flag, since in general
OVS is not supposed to reassemble packets and it is a side effect
of a kernel conntrack implementation.  But I see how it can be
useful for the real world tracing since the Linux kernel does behave
this way and it's not going to change any time soon.

So, yeah, I think, it's reasonable to add such a trace modifier in
the future.

Best regards, Ilya Maximets.
Simon Horman Feb. 26, 2024, 1:10 p.m. UTC | #3
On Thu, Feb 22, 2024 at 04:06:32PM +0100, Ilya Maximets wrote:
> Trace attempts to process all the recirculations.  However, if there
> is a recirculation loop, i.e. if every recirculation generates another
> recirculation, this process will never stop.  It will grind until the
> trace fills the system memory.
> 
> A simple reproducer:
> 
>   make sandbox
>   ovs-vsctl add-br br0
>   ovs-vsctl add-port br0 p1
>   ovs-ofctl add-flow br0 "table=0,in_port=p1,ip,actions=ct(table=0)"
>   ovs-appctl ofproto/trace br0 in_port=p1,ip
> 
> Limit the number of recirculations trace is processing with a fairly
> arbitrary number - 4096 (loosely based on the resubmit limit, but
> they are not actually related).
> 
> Not adding a test for this since it's only for a trace, but also
> because the test may lead to OOM event in a system if the test fails,
> which is not nice.
> 
> Fixes: e6bc8e749381 ("ofproto/trace: Add support for tracing conntrack recirculation")
> Reported-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

Acked-by: Simon Horman <horms@ovn.org>

FWIIW, 4096 strikes me as an excessively generous limit.
But I have no reason to argue for a smaller value.

...
Ilya Maximets March 1, 2024, 11:01 p.m. UTC | #4
On 2/26/24 14:10, Simon Horman wrote:
> On Thu, Feb 22, 2024 at 04:06:32PM +0100, Ilya Maximets wrote:
>> Trace attempts to process all the recirculations.  However, if there
>> is a recirculation loop, i.e. if every recirculation generates another
>> recirculation, this process will never stop.  It will grind until the
>> trace fills the system memory.
>>
>> A simple reproducer:
>>
>>   make sandbox
>>   ovs-vsctl add-br br0
>>   ovs-vsctl add-port br0 p1
>>   ovs-ofctl add-flow br0 "table=0,in_port=p1,ip,actions=ct(table=0)"
>>   ovs-appctl ofproto/trace br0 in_port=p1,ip
>>
>> Limit the number of recirculations trace is processing with a fairly
>> arbitrary number - 4096 (loosely based on the resubmit limit, but
>> they are not actually related).
>>
>> Not adding a test for this since it's only for a trace, but also
>> because the test may lead to OOM event in a system if the test fails,
>> which is not nice.
>>
>> Fixes: e6bc8e749381 ("ofproto/trace: Add support for tracing conntrack recirculation")
>> Reported-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
> 
> Acked-by: Simon Horman <horms@ovn.org>
> 
> FWIIW, 4096 strikes me as an excessively generous limit.
> But I have no reason to argue for a smaller value.

I think, there is a couple of legit cases where we would want that
many recirculations, since they are not necessarily very deep, but
may look like a very wide tree instead.  And in general we only
actually need protection from infinite cases.  So, should be fine
to have 4096, I suppose.

Thanks for review!

Applied and backported down to 2.17.

Best regards, Ilya Maximets.
diff mbox series

Patch

diff --git a/ofproto/ofproto-dpif-trace.c b/ofproto/ofproto-dpif-trace.c
index b86e7fe07..87506aa78 100644
--- a/ofproto/ofproto-dpif-trace.c
+++ b/ofproto/ofproto-dpif-trace.c
@@ -845,17 +845,35 @@  ofproto_trace(struct ofproto_dpif *ofproto, const struct flow *flow,
               bool names)
 {
     struct ovs_list recirc_queue = OVS_LIST_INITIALIZER(&recirc_queue);
+    int recirculations = 0;
+
     ofproto_trace__(ofproto, flow, packet, &recirc_queue,
                     ofpacts, ofpacts_len, output, names);
 
     struct oftrace_recirc_node *recirc_node;
     LIST_FOR_EACH_POP (recirc_node, node, &recirc_queue) {
+        if (recirculations++ > 4096) {
+            ds_put_cstr(output, "\n\n");
+            ds_put_char_multiple(output, '=', 79);
+            ds_put_cstr(output, "\nTrace reached the recirculation limit."
+                                "  Sopping the trace here.");
+            ds_put_format(output,
+                          "\nQueued but not processed: %"PRIuSIZE
+                          " recirculations.",
+                          ovs_list_size(&recirc_queue) + 1);
+            oftrace_recirc_node_destroy(recirc_node);
+            break;
+        }
         ofproto_trace_recirc_node(recirc_node, next_ct_states, output);
         ofproto_trace__(ofproto, &recirc_node->flow, recirc_node->packet,
                         &recirc_queue, ofpacts, ofpacts_len, output,
                         names);
         oftrace_recirc_node_destroy(recirc_node);
     }
+    /* Destroy remaining recirculation nodes, if any. */
+    LIST_FOR_EACH_POP (recirc_node, node, &recirc_queue) {
+        oftrace_recirc_node_destroy(recirc_node);
+    }
 }
 
 void