diff mbox series

[ovs-dev] ofproto-dpif-trace: Improve NAT tracing.

Message ID 1578068851-10608-1-git-send-email-dceara@redhat.com
State Superseded
Headers show
Series [ovs-dev] ofproto-dpif-trace: Improve NAT tracing. | expand

Commit Message

Dumitru Ceara Jan. 3, 2020, 4:27 p.m. UTC
When ofproto/trace detects a recirc action it resumes execution at the
specified next table. However, if the ct action performs SNAT/DNAT,
e.g., ct(commit,nat(src=1.1.1.1:4000),table=42), the src/dst IPs and
ports in the oftrace_recirc_node->flow field are not updated. This leads
to misleading outputs from ofproto/trace as real packets would actually
first get NATed and might match different flows when recirculated.

Assume the first IP/port from the NAT src/dst action will be used by
conntrack for the translation and update the oftrace_recirc_node->flow
accordingly. This is not entirely correct as conntrack might choose a
different IP/port but the result is more realistic than before.

This fix covers new connections. However, for reply traffic that executes
actions of the form ct(nat, table=42) we still don't update the flow as
we don't have any information about conntrack state when tracing.

Signed-off-by: Dumitru Ceara <dceara@redhat.com>
---
 ofproto/ofproto-dpif-trace.c | 33 +++++++++++++++++++++++++++++++++
 ofproto/ofproto-dpif-trace.h |  1 +
 ofproto/ofproto-dpif-xlate.c |  6 ++++--
 tests/ofproto-dpif.at        | 36 ++++++++++++++++++++++++++++++++++++
 4 files changed, 74 insertions(+), 2 deletions(-)

Comments

Ben Pfaff Jan. 6, 2020, 10:11 p.m. UTC | #1
On Fri, Jan 03, 2020 at 05:27:31PM +0100, Dumitru Ceara wrote:
> When ofproto/trace detects a recirc action it resumes execution at the
> specified next table. However, if the ct action performs SNAT/DNAT,
> e.g., ct(commit,nat(src=1.1.1.1:4000),table=42), the src/dst IPs and
> ports in the oftrace_recirc_node->flow field are not updated. This leads
> to misleading outputs from ofproto/trace as real packets would actually
> first get NATed and might match different flows when recirculated.
> 
> Assume the first IP/port from the NAT src/dst action will be used by
> conntrack for the translation and update the oftrace_recirc_node->flow
> accordingly. This is not entirely correct as conntrack might choose a
> different IP/port but the result is more realistic than before.
> 
> This fix covers new connections. However, for reply traffic that executes
> actions of the form ct(nat, table=42) we still don't update the flow as
> we don't have any information about conntrack state when tracing.
> 
> Signed-off-by: Dumitru Ceara <dceara@redhat.com>

This is a great idea.

I have an idea for further improvement.  Currently this patch is quiet:
it doesn't say what's going on.  It would be better if it said that it
was replacing the source and/or destination addresses, and by what and
why.  I think that would be easiest done in ofproto_trace() itself near
the existing code that does it already for ct_state.  I think we could
just save the ofn pointer in the recirc node and move the replacement
code to ofproto_trace().
Dumitru Ceara Jan. 7, 2020, 8:15 a.m. UTC | #2
On Mon, Jan 6, 2020 at 11:11 PM Ben Pfaff <blp@ovn.org> wrote:
>
> On Fri, Jan 03, 2020 at 05:27:31PM +0100, Dumitru Ceara wrote:
> > When ofproto/trace detects a recirc action it resumes execution at the
> > specified next table. However, if the ct action performs SNAT/DNAT,
> > e.g., ct(commit,nat(src=1.1.1.1:4000),table=42), the src/dst IPs and
> > ports in the oftrace_recirc_node->flow field are not updated. This leads
> > to misleading outputs from ofproto/trace as real packets would actually
> > first get NATed and might match different flows when recirculated.
> >
> > Assume the first IP/port from the NAT src/dst action will be used by
> > conntrack for the translation and update the oftrace_recirc_node->flow
> > accordingly. This is not entirely correct as conntrack might choose a
> > different IP/port but the result is more realistic than before.
> >
> > This fix covers new connections. However, for reply traffic that executes
> > actions of the form ct(nat, table=42) we still don't update the flow as
> > we don't have any information about conntrack state when tracing.
> >
> > Signed-off-by: Dumitru Ceara <dceara@redhat.com>
>
> This is a great idea.
>
> I have an idea for further improvement.  Currently this patch is quiet:
> it doesn't say what's going on.  It would be better if it said that it
> was replacing the source and/or destination addresses, and by what and
> why.  I think that would be easiest done in ofproto_trace() itself near
> the existing code that does it already for ct_state.  I think we could
> just save the ofn pointer in the recirc node and move the replacement
> code to ofproto_trace().
>

Thanks for the review! Sounds better indeed. I'll respin the patch
with your enhancement.

As follow up work I was thinking it would be nice to be able to cover
the other types of nat recirculation too, e.g., ct(nat, table=42), and
allow the user to specify the outcome of the nat operation in a
similar way as we do for conntrack with --ct-next. However, I'm not
sure what the best way is to do that.. I'm open to suggestions.

Could we enhance ofproto/trace such that the user can specify a
conntrack database in a predefined format? For example, the output of
"conntrack -L -o xml". Or would that become confusing?

Thanks,
Dumitru
Ben Pfaff Jan. 7, 2020, 7:25 p.m. UTC | #3
On Tue, Jan 07, 2020 at 09:15:26AM +0100, Dumitru Ceara wrote:
> On Mon, Jan 6, 2020 at 11:11 PM Ben Pfaff <blp@ovn.org> wrote:
> >
> > On Fri, Jan 03, 2020 at 05:27:31PM +0100, Dumitru Ceara wrote:
> > > When ofproto/trace detects a recirc action it resumes execution at the
> > > specified next table. However, if the ct action performs SNAT/DNAT,
> > > e.g., ct(commit,nat(src=1.1.1.1:4000),table=42), the src/dst IPs and
> > > ports in the oftrace_recirc_node->flow field are not updated. This leads
> > > to misleading outputs from ofproto/trace as real packets would actually
> > > first get NATed and might match different flows when recirculated.
> > >
> > > Assume the first IP/port from the NAT src/dst action will be used by
> > > conntrack for the translation and update the oftrace_recirc_node->flow
> > > accordingly. This is not entirely correct as conntrack might choose a
> > > different IP/port but the result is more realistic than before.
> > >
> > > This fix covers new connections. However, for reply traffic that executes
> > > actions of the form ct(nat, table=42) we still don't update the flow as
> > > we don't have any information about conntrack state when tracing.
> > >
> > > Signed-off-by: Dumitru Ceara <dceara@redhat.com>
> >
> > This is a great idea.
> >
> > I have an idea for further improvement.  Currently this patch is quiet:
> > it doesn't say what's going on.  It would be better if it said that it
> > was replacing the source and/or destination addresses, and by what and
> > why.  I think that would be easiest done in ofproto_trace() itself near
> > the existing code that does it already for ct_state.  I think we could
> > just save the ofn pointer in the recirc node and move the replacement
> > code to ofproto_trace().
> >
> 
> Thanks for the review! Sounds better indeed. I'll respin the patch
> with your enhancement.
> 
> As follow up work I was thinking it would be nice to be able to cover
> the other types of nat recirculation too, e.g., ct(nat, table=42), and
> allow the user to specify the outcome of the nat operation in a
> similar way as we do for conntrack with --ct-next. However, I'm not
> sure what the best way is to do that.. I'm open to suggestions.
> 
> Could we enhance ofproto/trace such that the user can specify a
> conntrack database in a predefined format? For example, the output of
> "conntrack -L -o xml". Or would that become confusing?

I have two ideas.

First, OVS is already able to read and dump the actual connection
tracking database in use (see e.g. the dpctl/dump-conntrack command in
ovs-vswitchd(8)).  ofproto/trace could consult this.  It would want to
dump out the entry that was found (or not found) while doing it.

Second, I guess we could have an option whose argument is a set of
actions to execute, which could update the source and destination, etc.
That's very general purpose!
Dumitru Ceara Jan. 10, 2020, 9:46 a.m. UTC | #4
On Tue, Jan 7, 2020 at 8:25 PM Ben Pfaff <blp@ovn.org> wrote:
>
> On Tue, Jan 07, 2020 at 09:15:26AM +0100, Dumitru Ceara wrote:
> > On Mon, Jan 6, 2020 at 11:11 PM Ben Pfaff <blp@ovn.org> wrote:
> > >
> > > On Fri, Jan 03, 2020 at 05:27:31PM +0100, Dumitru Ceara wrote:
> > > > When ofproto/trace detects a recirc action it resumes execution at the
> > > > specified next table. However, if the ct action performs SNAT/DNAT,
> > > > e.g., ct(commit,nat(src=1.1.1.1:4000),table=42), the src/dst IPs and
> > > > ports in the oftrace_recirc_node->flow field are not updated. This leads
> > > > to misleading outputs from ofproto/trace as real packets would actually
> > > > first get NATed and might match different flows when recirculated.
> > > >
> > > > Assume the first IP/port from the NAT src/dst action will be used by
> > > > conntrack for the translation and update the oftrace_recirc_node->flow
> > > > accordingly. This is not entirely correct as conntrack might choose a
> > > > different IP/port but the result is more realistic than before.
> > > >
> > > > This fix covers new connections. However, for reply traffic that executes
> > > > actions of the form ct(nat, table=42) we still don't update the flow as
> > > > we don't have any information about conntrack state when tracing.
> > > >
> > > > Signed-off-by: Dumitru Ceara <dceara@redhat.com>
> > >
> > > This is a great idea.
> > >
> > > I have an idea for further improvement.  Currently this patch is quiet:
> > > it doesn't say what's going on.  It would be better if it said that it
> > > was replacing the source and/or destination addresses, and by what and
> > > why.  I think that would be easiest done in ofproto_trace() itself near
> > > the existing code that does it already for ct_state.  I think we could
> > > just save the ofn pointer in the recirc node and move the replacement
> > > code to ofproto_trace().
> > >
> >
> > Thanks for the review! Sounds better indeed. I'll respin the patch
> > with your enhancement.

Hi Ben,

I sent v2 including your suggestions:
https://patchwork.ozlabs.org/patch/1220907/

> >
> > As follow up work I was thinking it would be nice to be able to cover
> > the other types of nat recirculation too, e.g., ct(nat, table=42), and
> > allow the user to specify the outcome of the nat operation in a
> > similar way as we do for conntrack with --ct-next. However, I'm not
> > sure what the best way is to do that.. I'm open to suggestions.
> >
> > Could we enhance ofproto/trace such that the user can specify a
> > conntrack database in a predefined format? For example, the output of
> > "conntrack -L -o xml". Or would that become confusing?
>
> I have two ideas.
>
> First, OVS is already able to read and dump the actual connection
> tracking database in use (see e.g. the dpctl/dump-conntrack command in
> ovs-vswitchd(8)).  ofproto/trace could consult this.  It would want to
> dump out the entry that was found (or not found) while doing it.

This is nice, thanks for pointing it out. So I guess when
troubleshooting packet forwarding we could insert conntrack entries to
guide ofproto/trace.

>
> Second, I guess we could have an option whose argument is a set of
> actions to execute, which could update the source and destination, etc.
> That's very general purpose!
>

I also like this second option because it decouples tracing from the
underlying datapath conntrack implementation.

I'll give it more thought and follow up when I have something working.

Regards,
Dumitru
diff mbox series

Patch

diff --git a/ofproto/ofproto-dpif-trace.c b/ofproto/ofproto-dpif-trace.c
index 8ae8a22..b2ff903 100644
--- a/ofproto/ofproto-dpif-trace.c
+++ b/ofproto/ofproto-dpif-trace.c
@@ -86,6 +86,7 @@  oftrace_node_destroy(struct oftrace_node *node)
 bool
 oftrace_add_recirc_node(struct ovs_list *recirc_queue,
                         enum oftrace_recirc_type type, const struct flow *flow,
+                        const struct ofpact_nat *ofn,
                         const struct dp_packet *packet, uint32_t recirc_id,
                         const uint16_t zone)
 {
@@ -101,6 +102,38 @@  oftrace_add_recirc_node(struct ovs_list *recirc_queue,
     node->flow = *flow;
     node->flow.recirc_id = recirc_id;
     node->flow.ct_zone = zone;
+
+    /* If there's any snat/dnat information assume we always translate to
+     * the first IP/port to make sure we don't match on incorrect flows later
+     * on.
+     */
+    if (ofn) {
+        if (ofn->flags & NX_NAT_F_SRC) {
+            if (ofn->range_af == AF_INET) {
+                node->flow.nw_src = ofn->range.addr.ipv4.min;
+            } else if (ofn->range_af == AF_INET6) {
+                memcpy(&node->flow.ipv6_src, &ofn->range.addr.ipv6.min,
+                       sizeof node->flow.ipv6_src);
+            }
+
+            if (ofn->range_af != AF_UNSPEC && ofn->range.proto.min) {
+                node->flow.tp_src = htons(ofn->range.proto.min);
+            }
+        }
+        if (ofn->flags & NX_NAT_F_DST) {
+            if (ofn->range_af == AF_INET) {
+                node->flow.nw_dst = ofn->range.addr.ipv4.min;
+            } else if (ofn->range_af == AF_INET6) {
+                memcpy(&node->flow.ipv6_dst, &ofn->range.addr.ipv6.min,
+                       sizeof node->flow.ipv6_dst);
+            }
+
+            if (ofn->range_af != AF_UNSPEC && ofn->range.proto.min) {
+                node->flow.tp_dst = htons(ofn->range.proto.min);
+            }
+        }
+    }
+
     node->packet = packet ? dp_packet_clone(packet) : NULL;
 
     return true;
diff --git a/ofproto/ofproto-dpif-trace.h b/ofproto/ofproto-dpif-trace.h
index 63dbb50..e5db9f9 100644
--- a/ofproto/ofproto-dpif-trace.h
+++ b/ofproto/ofproto-dpif-trace.h
@@ -91,6 +91,7 @@  struct oftrace_node *oftrace_report(struct ovs_list *, enum oftrace_node_type,
                                     const char *text);
 bool oftrace_add_recirc_node(struct ovs_list *recirc_queue,
                              enum oftrace_recirc_type, const struct flow *,
+                             const struct ofpact_nat *,
                              const struct dp_packet *, uint32_t recirc_id,
                              const uint16_t zone);
 
diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 1c72693..ac73656 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -4968,7 +4968,8 @@  compose_recirculate_and_fork(struct xlate_ctx *ctx, uint8_t table,
     if (OVS_UNLIKELY(ctx->xin->trace) && recirc_id) {
         if (oftrace_add_recirc_node(ctx->xin->recirc_queue,
                                     OFT_RECIRC_CONNTRACK, &ctx->xin->flow,
-                                    ctx->xin->packet, recirc_id, zone)) {
+                                    ctx->ct_nat_action, ctx->xin->packet,
+                                    recirc_id, zone)) {
             xlate_report(ctx, OFT_DETAIL, "A clone of the packet is forked to "
                          "recirculate. The forked pipeline will be resumed at "
                          "table %u.", table);
@@ -6151,7 +6152,6 @@  compose_conntrack_action(struct xlate_ctx *ctx, struct ofpact_conntrack *ofc,
     put_ct_label(&ctx->xin->flow, ctx->odp_actions, ctx->wc);
     put_ct_helper(ctx, ctx->odp_actions, ofc);
     put_ct_nat(ctx);
-    ctx->ct_nat_action = NULL;
     nl_msg_end_nested(ctx->odp_actions, ct_offset);
 
     ctx->wc->masks.ct_mark = old_ct_mark_mask;
@@ -6162,6 +6162,8 @@  compose_conntrack_action(struct xlate_ctx *ctx, struct ofpact_conntrack *ofc,
         compose_recirculate_and_fork(ctx, ofc->recirc_table, zone);
     }
 
+    ctx->ct_nat_action = NULL;
+
     /* The ct_* fields are only available in the scope of the 'recirc_table'
      * call chain. */
     flow_clear_conntrack(&ctx->xin->flow);
diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
index 511d861..28e7f87 100644
--- a/tests/ofproto-dpif.at
+++ b/tests/ofproto-dpif.at
@@ -10598,6 +10598,42 @@  AT_CHECK([tail -1 stdout], [0],
 OVS_VSWITCHD_STOP
 AT_CLEANUP
 
+AT_SETUP([ofproto-dpif - nat - ofproto/trace])
+OVS_VSWITCHD_START
+
+add_of_ports br0 1 2 3
+
+flow="in_port=1,udp,nw_src=1.1.1.1,nw_dst=1.1.1.2,udp_src=100,udp_dst=200"
+AT_DATA([flows.txt], [dnl
+table=0,priority=100,ip,nw_src=1.1.1.1,ct_state=-trk,action=ct(commit,nat(src=10.0.0.1-10.0.0.42:1000-1042),table=0)
+table=0,priority=100,udp,ct_state=+trk,nw_src=10.0.0.1,nw_dst=1.1.1.2,tp_src=1000,tp_dst=200,action=ct(commit,nat(dst=20.0.0.1-20.0.0.42:2000-2042),table=0)
+table=0,priority=100,udp,ct_state=+trk,nw_src=10.0.0.1,nw_dst=20.0.0.1,tp_src=1000,tp_dst=2000,action=3
+table=0,priority=90,ip,ct_state=+trk,action=2
+])
+AT_CHECK([ovs-ofctl del-flows br0])
+AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
+AT_CHECK([ovs-appctl ofproto/trace br0 "$flow"], [0], [stdout])
+AT_CHECK([tail -1 stdout], [0],
+  [Datapath actions: 3
+])
+
+flow="in_port=1,udp6,ipv6_src=1::1,ipv6_dst=1::2,udp_src=100,udp_dst=200"
+AT_DATA([flows.txt], [dnl
+table=0,priority=100,ip6,ipv6_src=1::1,ct_state=-trk,action=ct(commit,nat(src=[[10::1]]-[[10::42]]:1000-1042),table=0)
+table=0,priority=100,udp6,ct_state=+trk,ipv6_src=10::1,ipv6_dst=1::2,tp_src=1000,tp_dst=200,action=ct(commit,nat(dst=[[20::1]]-[[20::42]]:2000-2042),table=0)
+table=0,priority=100,udp6,ct_state=+trk,ipv6_src=10::1,ipv6_dst=20::1,tp_src=1000,tp_dst=2000,action=3
+table=0,priority=90,ip6,ct_state=+trk,action=2
+])
+AT_CHECK([ovs-ofctl del-flows br0])
+AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
+AT_CHECK([ovs-appctl ofproto/trace br0 "$flow"], [0], [stdout])
+AT_CHECK([tail -1 stdout], [0],
+  [Datapath actions: 3
+])
+
+OVS_VSWITCHD_STOP
+AT_CLEANUP
+
 AT_SETUP([ofproto - set mtu])
 OVS_VSWITCHD_START