diff mbox

[ovs-dev,3/7] userspace: Switching of L3 packets in L2 pipeline

Message ID 20170303184059.GT29912@ovn.org
State Not Applicable
Headers show

Commit Message

Ben Pfaff March 3, 2017, 6:40 p.m. UTC
On Fri, Feb 03, 2017 at 10:40:09AM +0000, Jan Scheurich wrote:
> Ports have a new layer3 attribute if they send/receive L3 packets.
> 
> The packet_type included in structs dp_packet and flow is considered in
> ofproto-dpif. The classical L2 match fields (dl_src, dl_dst, dl_type, and
> vlan_tci, vlan_vid, vlan_pcp) now have Ethernet as pre-requisite.
> 
> A dummy ethernet header is pushed to L3 packets received from L3 ports
> before the pipeline processing starts. The ethernet header is popped
> before sending a packet to a L3 port.
> 
> For datapath ports that can receive L2 or L3 packets, the packet_type
> becomes part of the flow key for datapath flows and is handled
> appropriately in dpif-netdev.
> 
> Signed-off-by: Lorand Jakab <lojakab@cisco.com>
> Signed-off-by: Simon Horman <simon.horman@netronome.com>
> Signed-off-by: Jiri Benc <jbenc@redhat.com>
> Signed-off-by: Yi Yang <yi.y.yang@intel.com>
> Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
> Co-authored-by: Zoltan Balogh <zoltan.balogh@ericsson.com>

dp_netdev_flow_add() should use VLOG_DROP_DBG() once, and then
VLOG_DBG() twice if it OKs it, instead of VLOG_DBG_RL twice.  Otherwise
you can get the first log message but not the second one, which will be
less useful.

s/isntalled/installed/ in the comment in the same function.

This patch makes some white space only changes to dp_netdev_flow_add().
Please don't.

Please don't add comments about changes, like this one.  Readers should
not have to understand the history to understand the code:
+        if (put->flags & DPIF_FP_MODIFY) {
+            /* Removed the additional check
+             * flow_equal(&match.flow, &netdev_flow->flow) as a) the
+             * dpcls lookup is sufficient to uniquely identify a flow
+             * and b) it caused false negatives because the flow in
+             * netdev->flow may not properly be masked. */
The right place to explain changes is in the commit message.  If there
needs to be extra explanation of an algorithm in comments, that's
perfectly appropriate, but it should talk about the code as it exists,
not as it once did.

I see that match_format() prints packet_type as a hex number.  I doubt
that's the best for human consumption.  At the very least, it seems like
it's better printed as a pair of numbers, and probably it would be more
friendly to say what packet type it is.

Similarly in format_odp_key_attr().

What is the value of this change?  format_eth_masked() incorporates the
same check that it adds:

@@ -1231,12 +1242,16 @@ match_format(const struct match *match, struct ds *s, int priority)
             ds_put_format(s, "%svlan_tci=%s0x%04"PRIx16"/0x%04"PRIx16",",
                           colors.param, colors.end,
                           ntohs(f->vlan_tci), ntohs(wc->masks.vlan_tci));
         }
     }
-    format_eth_masked(s, "dl_src", f->dl_src, wc->masks.dl_src);
-    format_eth_masked(s, "dl_dst", f->dl_dst, wc->masks.dl_dst);
+    if (!eth_addr_is_zero(wc->masks.dl_src)) {
+        format_eth_masked(s, "dl_src", f->dl_src, wc->masks.dl_src);
+    }
+    if (!eth_addr_is_zero(wc->masks.dl_dst)) {
+        format_eth_masked(s, "dl_dst", f->dl_dst, wc->masks.dl_dst);
+    }

We generally avoid memset when another choice is available, e.g.:


Can we fix whatever problem this is instead of working around it?

+    /* Wildcard ethernet addresses if the original packet type was not
+     * Ethernet.
+     * XXX: This is a work-around. ofproto shouldn't unwildcard the Ethernet
+     * addresses at all. */
+    if (ctx->xin->upcall_flow->packet_type != PT_ETH) {
+        memset(&ctx->wc->masks.dl_dst, 0, ETH_ADDR_LEN);
+        memset(&ctx->wc->masks.dl_src, 0, ETH_ADDR_LEN);
+    }
+

and this?

+    /* XXX: ONLY FOR NON-PTAP BRIDGE! */
+    if (flow->packet_type != PT_ETH && in_port && in_port->is_layer3 &&
+            ctx.table_id == 0) {
+        /* Add dummy Ethernet header to non-L2 packet if it's coming from a
+         * L3 port. So all packets will be L2 packets for lookup.
+         * The dl_type has already been set from the packet_type. */
+        flow->packet_type = PT_ETH;
+        memset(&flow->dl_src, 0, ETH_ADDR_LEN);
+        memset(&flow->dl_dst, 0, ETH_ADDR_LEN);
+    }

Thanks,

Ben.

Comments

Jan Scheurich March 6, 2017, 12:37 a.m. UTC | #1
> Please don't add comments about changes, like this one.  Readers should
> not have to understand the history to understand the code:
> +        if (put->flags & DPIF_FP_MODIFY) {
> +            /* Removed the additional check
> +             * flow_equal(&match.flow, &netdev_flow->flow) as a) the
> +             * dpcls lookup is sufficient to uniquely identify a flow
> +             * and b) it caused false negatives because the flow in
> +             * netdev->flow may not properly be masked. */
> The right place to explain changes is in the commit message.  If there
> needs to be extra explanation of an algorithm in comments, that's
> perfectly appropriate, but it should talk about the code as it exists,
> not as it once did.

OK, will do.

> I see that match_format() prints packet_type as a hex number.  I doubt
> that's the best for human consumption.  At the very least, it seems like
> it's better printed as a pair of numbers, and probably it would be more
> friendly to say what packet type it is.

I agree that a pair of numbers would be more user friendly. Would you prefer packet_type=(1,0x800) or packet_type={1,0x800} or some other notation. 

For the most common Ethernet packet_type (0,0) we would print packet_type=ethernet.

Symbolic packet_type representation for namespace 1 could easily lead to confusion with the legacy match short-hands such as "ip", "ipv6", "arp", "udp", etc. I believe these will need to remain untouched, e.g. packet_type=0 && dl_type=0x800 && nw_proto=17 for "udp", or we would break a lot of legacy.

Here we should really agree on something up front. Changing the notation is very painful due to the large number of unit test cases that have to be adapted.

> 
> Similarly in format_odp_key_attr().
> 
> What is the value of this change?  format_eth_masked() incorporates the
> same check that it adds:
> 
> @@ -1231,12 +1242,16 @@ match_format(const struct match *match, struct ds *s, int priority)
>              ds_put_format(s, "%svlan_tci=%s0x%04"PRIx16"/0x%04"PRIx16",",
>                            colors.param, colors.end,
>                            ntohs(f->vlan_tci), ntohs(wc->masks.vlan_tci));
>          }
>      }
> -    format_eth_masked(s, "dl_src", f->dl_src, wc->masks.dl_src);
> -    format_eth_masked(s, "dl_dst", f->dl_dst, wc->masks.dl_dst);
> +    if (!eth_addr_is_zero(wc->masks.dl_src)) {
> +        format_eth_masked(s, "dl_src", f->dl_src, wc->masks.dl_src);
> +    }
> +    if (!eth_addr_is_zero(wc->masks.dl_dst)) {
> +        format_eth_masked(s, "dl_dst", f->dl_dst, wc->masks.dl_dst);
> +    }

Good spot. Will remove.

> 
> We generally avoid memset when another choice is available, e.g.:
> 
> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> index 5d0ed86d46df..b571d83d56a8 100644
> --- a/ofproto/ofproto-dpif-xlate.c
> +++ b/ofproto/ofproto-dpif-xlate.c
> @@ -3149,8 +3149,8 @@ compose_output_action__(struct xlate_ctx *ctx, ofp_port_t ofp_port,
>      else if (flow->packet_type != PT_ETH && !xport->is_layer3) {
>          /* L2 outport and non-ethernet packet_type -> add dummy eth header. */
>          flow->packet_type = PT_ETH;
> -        memset(&flow->dl_dst, 0,ETH_ADDR_LEN);
> -        memset(&flow->dl_src, 0,ETH_ADDR_LEN);
> +        flow->dl_dst = eth_addr_zero;
> +        flow->dl_src = eth_addr_zero;
>      }

Will fix that.

> Can we fix whatever problem this is instead of working around it?
> 
> +    /* Wildcard ethernet addresses if the original packet type was not
> +     * Ethernet.
> +     * XXX: This is a work-around. ofproto shouldn't unwildcard the Ethernet
> +     * addresses at all. */
> +    if (ctx->xin->upcall_flow->packet_type != PT_ETH) {
> +        memset(&ctx->wc->masks.dl_dst, 0, ETH_ADDR_LEN);
> +        memset(&ctx->wc->masks.dl_src, 0, ETH_ADDR_LEN);
> +    }

I had a discussion with Jarno about this (https://mail.openvswitch.org/pipermail/ovs-dev/2017-January/327510.html) and agreed that this is needed unless ofproto-dpif-xlate maintains a separate mask to keep track of the field bits that have been set independently from the bits that have been matched. That might well be worth doing to get wider, more generic megaflows, but it should be a separate patch series. I can remove the XXX: comment.

> and this?
> 
> +    /* XXX: ONLY FOR NON-PTAP BRIDGE! */
> +    if (flow->packet_type != PT_ETH && in_port && in_port->is_layer3 &&
> +            ctx.table_id == 0) {
> +        /* Add dummy Ethernet header to non-L2 packet if it's coming from a
> +         * L3 port. So all packets will be L2 packets for lookup.
> +         * The dl_type has already been set from the packet_type. */
> +        flow->packet_type = PT_ETH;
> +        memset(&flow->dl_src, 0, ETH_ADDR_LEN);
> +        memset(&flow->dl_dst, 0, ETH_ADDR_LEN);
> +    }

This is not a work-around. Again the comment was more for us to remember what needs revising when implementing PTAP bridges. We can remove it.
Ben Pfaff March 7, 2017, 9:27 p.m. UTC | #2
On Mon, Mar 06, 2017 at 12:37:11AM +0000, Jan Scheurich wrote:
> > I see that match_format() prints packet_type as a hex number.  I doubt
> > that's the best for human consumption.  At the very least, it seems like
> > it's better printed as a pair of numbers, and probably it would be more
> > friendly to say what packet type it is.
> 
> I agree that a pair of numbers would be more user friendly. Would you
> prefer packet_type=(1,0x800) or packet_type={1,0x800} or some other
> notation.
> 
> For the most common Ethernet packet_type (0,0) we would print
> packet_type=ethernet.
> 
> Symbolic packet_type representation for namespace 1 could easily lead to confusion with the legacy match short-hands such as "ip", "ipv6", "arp", "udp", etc. I believe these will need to remain untouched, e.g. packet_type=0 && dl_type=0x800 && nw_proto=17 for "udp", or we would break a lot of legacy.
> 
> Here we should really agree on something up front. Changing the
> notation is very painful due to the large number of unit test cases
> that have to be adapted.

I think that packet_type=(0,0x800) is the choice closest to the existing
formats.

I wonder whether we need to print a packet_type for Ethernet at all.  We
could decide to print nothing for a match on Ethernet, and instead print
"packet_type=*" if the packet type is wildcarded.  I suspect that this
would result in no change in the common case.

> > Can we fix whatever problem this is instead of working around it?
> > 
> > +    /* Wildcard ethernet addresses if the original packet type was not
> > +     * Ethernet.
> > +     * XXX: This is a work-around. ofproto shouldn't unwildcard the Ethernet
> > +     * addresses at all. */
> > +    if (ctx->xin->upcall_flow->packet_type != PT_ETH) {
> > +        memset(&ctx->wc->masks.dl_dst, 0, ETH_ADDR_LEN);
> > +        memset(&ctx->wc->masks.dl_src, 0, ETH_ADDR_LEN);
> > +    }
> 
> I had a discussion with Jarno about this
> (https://mail.openvswitch.org/pipermail/ovs-dev/2017-January/327510.html)
> and agreed that this is needed unless ofproto-dpif-xlate maintains a
> separate mask to keep track of the field bits that have been set
> independently from the bits that have been matched. That might well be
> worth doing to get wider, more generic megaflows, but it should be a
> separate patch series. I can remove the XXX: comment.

OK.

> > and this?
> > 
> > +    /* XXX: ONLY FOR NON-PTAP BRIDGE! */
> > +    if (flow->packet_type != PT_ETH && in_port && in_port->is_layer3 &&
> > +            ctx.table_id == 0) {
> > +        /* Add dummy Ethernet header to non-L2 packet if it's coming from a
> > +         * L3 port. So all packets will be L2 packets for lookup.
> > +         * The dl_type has already been set from the packet_type. */
> > +        flow->packet_type = PT_ETH;
> > +        memset(&flow->dl_src, 0, ETH_ADDR_LEN);
> > +        memset(&flow->dl_dst, 0, ETH_ADDR_LEN);
> > +    }
> 
> This is not a work-around. Again the comment was more for us to
> remember what needs revising when implementing PTAP bridges. We can
> remove it.

OK, thanks.
diff mbox

Patch

diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 5d0ed86d46df..b571d83d56a8 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -3149,8 +3149,8 @@  compose_output_action__(struct xlate_ctx *ctx, ofp_port_t ofp_port,
     else if (flow->packet_type != PT_ETH && !xport->is_layer3) {
         /* L2 outport and non-ethernet packet_type -> add dummy eth header. */
         flow->packet_type = PT_ETH;
-        memset(&flow->dl_dst, 0,ETH_ADDR_LEN);
-        memset(&flow->dl_src, 0,ETH_ADDR_LEN);
+        flow->dl_dst = eth_addr_zero;
+        flow->dl_src = eth_addr_zero;
     }
 
     if (xport->peer) {