diff mbox

[v9,net-next,4/7] openvswitch: add layer 3 flow/port support

Message ID 20160520052858.GA15505@vergenet.net
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Simon Horman May 20, 2016, 5:29 a.m. UTC
Hi Jiri,

On Tue, May 17, 2016 at 04:32:50PM +0200, Jiri Benc wrote:
> Looking through the patchset again, this time more deeply. Sorry for
> the delay.

No need to be sorry, good things take time.

> On Wed,  4 May 2016 16:36:30 +0900, Simon Horman wrote:
> > +struct ovs_action_push_eth {
> > +	struct ovs_key_ethernet addresses;
> > +	__be16   eth_type;
> 
> Extra spaces.

Sorry about that.

As per some earlier discussion I plan to remove the eth_type field entirely.

> 
> > +static int pop_eth(struct sk_buff *skb, struct sw_flow_key *key)
> > +{
> > +	skb_pull_rcsum(skb, ETH_HLEN);
> > +	skb_reset_mac_header(skb);
> > +	skb->mac_len -= ETH_HLEN;
> > +
> > +	invalidate_flow_key(key);
> > +	return 0;
> > +}
> 
> There's a fundamental question here: how should pop_eth behave when
> vlan tag is present?
> 
> There are two options: either vlan is considered part of the Ethernet
> header and pop_eth means implicitly resetting vlan tag, or packet can
> have vlan tag even if it's not Ethernet.
> 
> This patch seems to implement the first option; however, skb->vlan_tci
> should be reset and pop_eth should check whether the vlan tag is
> present in the frame (deaccelerated) and remove it if it is. Otherwise,
> the behavior of pop_eth would be inconsistent.
> 
> However, I'm not sure whether the second option does not make more
> sense. It may, in fact, be needed - ARPHRD_NONE tunnel port could not
> be set as an access port otherwise (unless I'm missing something).
> 
> In that case, pop_eth will need to put the vlan tag to skb->vlan_tci if
> it's in the frame itself. Also, push_vlan and pop_vlan would need to be
> modified to work with is_layer3 packets.

Good point.

The second option does seem rather tempting although I'm not sure
that it actually plays out in the access-port scenario at this time.

> > +static int push_eth(struct sk_buff *skb, struct sw_flow_key *key,
> > +		    const struct ovs_action_push_eth *ethh)
> > +{
> > +	int err;
> > +
> > +	/* De-accelerate any hardware accelerated VLAN tag added to a previous
> > +	 * Ethernet header */
> > +	err = skb_vlan_deaccel(skb);
> 
> Why? Just keep it in skb->vlan_tci.

Agreed, this seems unnecessary.

> > --- a/net/openvswitch/flow.c
> > +++ b/net/openvswitch/flow.c
> > @@ -468,28 +468,31 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
> >  
> >  	skb_reset_mac_header(skb);
> >  
> > -	/* Link layer.  We are guaranteed to have at least the 14 byte Ethernet
> > -	 * header in the linear data area.
> > -	 */
> > -	eth = eth_hdr(skb);
> > -	ether_addr_copy(key->eth.src, eth->h_source);
> > -	ether_addr_copy(key->eth.dst, eth->h_dest);
> > +	/* Link layer. */
> > +	if (key->phy.is_layer3) {
> > +		key->eth.tci = 0;
> 
> Could make sense to use skb->vlan_tci, see above.

The incremental patch below is what I have so far.
The patch to add skb_vlan_deaccel() should also be dropped.

Comments

Jiri Benc May 20, 2016, 8 a.m. UTC | #1
On Fri, 20 May 2016 14:29:01 +0900, Simon Horman wrote:
> The second option does seem rather tempting although I'm not sure
> that it actually plays out in the access-port scenario at this time.

We support gre ports to be access ports currently. With conversion to
ipgre, this needs to continue working. It's no problem for frames with
the Ethernet header but now we have a situation where a port is tagged,
thus the user expects that packets received on that port will behave
accordingly. I don't think we can make some packets honor this and some
ignore this; and we can't disallow gre to be an access port.

How do you plan to solve this? By user space always pushing an ethernet
header before push_vlan?

 Jiri
Simon Horman May 20, 2016, 8:11 a.m. UTC | #2
On Fri, May 20, 2016 at 10:00:28AM +0200, Jiri Benc wrote:
> On Fri, 20 May 2016 14:29:01 +0900, Simon Horman wrote:
> > The second option does seem rather tempting although I'm not sure
> > that it actually plays out in the access-port scenario at this time.
> 
> We support gre ports to be access ports currently. With conversion to
> ipgre, this needs to continue working. It's no problem for frames with
> the Ethernet header but now we have a situation where a port is tagged,
> thus the user expects that packets received on that port will behave
> accordingly. I don't think we can make some packets honor this and some
> ignore this; and we can't disallow gre to be an access port.
> 
> How do you plan to solve this? By user space always pushing an ethernet
> header before push_vlan?

Yes. That is my understanding of how OvS currently handles access ports but
I have a feeling that either I am mistaken or that you are referring to a
slightly different scenario.
Simon Horman May 20, 2016, 8:16 a.m. UTC | #3
On Fri, May 20, 2016 at 05:11:23PM +0900, Simon Horman wrote:
> On Fri, May 20, 2016 at 10:00:28AM +0200, Jiri Benc wrote:
> > On Fri, 20 May 2016 14:29:01 +0900, Simon Horman wrote:
> > > The second option does seem rather tempting although I'm not sure
> > > that it actually plays out in the access-port scenario at this time.
> > 
> > We support gre ports to be access ports currently. With conversion to
> > ipgre, this needs to continue working. It's no problem for frames with
> > the Ethernet header but now we have a situation where a port is tagged,
> > thus the user expects that packets received on that port will behave
> > accordingly. I don't think we can make some packets honor this and some
> > ignore this; and we can't disallow gre to be an access port.
> > 
> > How do you plan to solve this? By user space always pushing an ethernet
> > header before push_vlan?
> 
> Yes. That is my understanding of how OvS currently handles access ports but
> I have a feeling that either I am mistaken or that you are referring to a
> slightly different scenario.

Hi again.

I apologise for having sent my previous email a little too quickly.

My understanding is that currently OvS handles access ports using a
push_vlan action. And that this patch set in conjunction with its
user-space counterpart should ensure that a push_eth action occurs first.
This is the context of my remarks above.
Jiri Benc May 20, 2016, 8:39 a.m. UTC | #4
On Fri, 20 May 2016 17:16:13 +0900, Simon Horman wrote:
> My understanding is that currently OvS handles access ports using a
> push_vlan action.

When needed (i.e. when the packet goes to a non-access port), yes.

> And that this patch set in conjunction with its
> user-space counterpart should ensure that a push_eth action occurs first.
> This is the context of my remarks above.

Okay, works for me principally. If it's later found insufficient,
relaxing push_vlan and pop_vlan to work also for L3 frames is still
easily possible without breaking compatibility.

Out of curiosity (and without looking at the user space patchset), what
will the pushed Ethernet header contain? E.g., consider the following
scenario: two GRE ports, both set as access ports with the same tag,
and a mirror port mirroring everything. Now an IP packet without inner
Ethernet header is received on one of the GRE interfaces.

Will the packet be output to the second GRE port? Will it be sent out
without the inner Ethernet header? Are custom rules necessary, or will
NORMAL take care of this? What will be sent to the mirror port?

Thanks!

 Jiri
Simon Horman May 20, 2016, 9:12 a.m. UTC | #5
On Fri, May 20, 2016 at 10:39:39AM +0200, Jiri Benc wrote:
> On Fri, 20 May 2016 17:16:13 +0900, Simon Horman wrote:
> > My understanding is that currently OvS handles access ports using a
> > push_vlan action.
> 
> When needed (i.e. when the packet goes to a non-access port), yes.
> 
> > And that this patch set in conjunction with its
> > user-space counterpart should ensure that a push_eth action occurs first.
> > This is the context of my remarks above.
> 
> Okay, works for me principally. If it's later found insufficient,
> relaxing push_vlan and pop_vlan to work also for L3 frames is still
> easily possible without breaking compatibility.

Right. I'm all for allowing extension later if the need arises.

> Out of curiosity (and without looking at the user space patchset), what
> will the pushed Ethernet header contain? E.g., consider the following
> scenario: two GRE ports, both set as access ports with the same tag,
> and a mirror port mirroring everything. Now an IP packet without inner
> Ethernet header is received on one of the GRE interfaces.
> 
> Will the packet be output to the second GRE port? Will it be sent out
> without the inner Ethernet header? Are custom rules necessary, or will
> NORMAL take care of this? What will be sent to the mirror port?

Let me take a stab at answering that without running any tests.

1. push_eth adds an Ethernet header with all-zero addresses and
   the Ethernet type as determined from skb->protocol which is in
   turn determined by the tunnel header (we have discussed that
   bit before).

   In principle it is pushed when needed. And this happens automatically
   as controlled by user-space.

   It is possible to modify the Ethernet addresses using a custom rule.
   (I need to exercise that more often.)

2. For the GRE part of the scenario above it is important to know that with
   the accompanying user-space patch set OvS user-space the user-space
   representation of a vport (from now on simply vport) may be "layer3" or
   not.

   This allows OvS user-space to determine if an Ethernet header should be
   present or not on receive. And if it needs to be present or not on
   transmit. This allows it to automatically use pop_eth and push_eth to
   control the presence of an Ethernet header so its there when it needs to
   be and not when it doesn't.

   So if a GRE vport is "layer3" then no Ethernet header should be
   present on transmit, regardless of where the packet came from. And
   conversely if the GRE vport is not "layer3" then an Ethernet header
   should be present.

3. With regards to the mirroring part of your connection, I need to check
   on that and possibly its broken. But my thinking is that a mirroring
   vport would regarded in the same way as any other vport in this respect
   and the presence of the "layer3" flag would control whether an Ethernet
   header is present or not.

   It may be the case that its not possible to use a tunnel vport as a
   mirroring vport. And as all other vports currently do not support
   "layer3" then currently an Ethernet header would always be present
   on output to a mirror.
Jiri Benc May 20, 2016, 9:20 a.m. UTC | #6
On Fri, 20 May 2016 18:12:05 +0900, Simon Horman wrote:
> 1. push_eth adds an Ethernet header with all-zero addresses and
>    the Ethernet type as determined from skb->protocol which is in
>    turn determined by the tunnel header (we have discussed that
>    bit before).
> 
>    In principle it is pushed when needed. And this happens automatically
>    as controlled by user-space.
> 
>    It is possible to modify the Ethernet addresses using a custom rule.
>    (I need to exercise that more often.)
> 
> 2. For the GRE part of the scenario above it is important to know that with
>    the accompanying user-space patch set OvS user-space the user-space
>    representation of a vport (from now on simply vport) may be "layer3" or
>    not.
> 
>    This allows OvS user-space to determine if an Ethernet header should be
>    present or not on receive. And if it needs to be present or not on
>    transmit. This allows it to automatically use pop_eth and push_eth to
>    control the presence of an Ethernet header so its there when it needs to
>    be and not when it doesn't.
> 
>    So if a GRE vport is "layer3" then no Ethernet header should be
>    present on transmit, regardless of where the packet came from. And
>    conversely if the GRE vport is not "layer3" then an Ethernet header
>    should be present.

I think this works for me. Thanks a lot for answering my questions!

> 3. With regards to the mirroring part of your connection, I need to check
>    on that and possibly its broken. But my thinking is that a mirroring
>    vport would regarded in the same way as any other vport in this respect
>    and the presence of the "layer3" flag would control whether an Ethernet
>    header is present or not.
> 
>    It may be the case that its not possible to use a tunnel vport as a
>    mirroring vport. And as all other vports currently do not support
>    "layer3" then currently an Ethernet header would always be present
>    on output to a mirror.

We can just require a mirror port to be always L2 and output the L3
packets with zero Ethernet addresses. For mirroring, this should be
okay(?)

 Jiri
Simon Horman May 20, 2016, 10:14 a.m. UTC | #7
On Fri, May 20, 2016 at 11:20:04AM +0200, Jiri Benc wrote:
> On Fri, 20 May 2016 18:12:05 +0900, Simon Horman wrote:

[...]

> > 3. With regards to the mirroring part of your question, I need to check
> >    on that and possibly its broken. But my thinking is that a mirroring
> >    vport would regarded in the same way as any other vport in this respect
> >    and the presence of the "layer3" flag would control whether an Ethernet
> >    header is present or not.
> > 
> >    It may be the case that its not possible to use a tunnel vport as a
> >    mirroring vport. And as all other vports currently do not support
> >    "layer3" then currently an Ethernet header would always be present
> >    on output to a mirror.
> 
> We can just require a mirror port to be always L2 and output the L3
> packets with zero Ethernet addresses. For mirroring, this should be
> okay(?)

Yes, I expect that is a good approach.

I'll look over the code to see about making that so if its not already the
case.
diff mbox

Patch

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index c413c588a24f..6853ab008861 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2994,6 +2994,7 @@  int skb_vlan_pop(struct sk_buff *skb);
 int skb_vlan_push(struct sk_buff *skb, __be16 vlan_proto, u16 vlan_tci);
 struct sk_buff *pskb_extract(struct sk_buff *skb, int off, int to_copy,
 			     gfp_t gfp);
+int skb_vlan_accel(struct sk_buff *skb);
 
 static inline int memcpy_from_msg(void *data, struct msghdr *msg, int len)
 {
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 7a1d48983f81..a36c7491f714 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4482,12 +4482,28 @@  pull:
 	return err;
 }
 
-int skb_vlan_pop(struct sk_buff *skb)
+/* If a vlan tag is present move it to hw accel tag */
+int skb_vlan_accel(struct sk_buff *skb)
 {
 	u16 vlan_tci;
 	__be16 vlan_proto;
 	int err;
 
+	vlan_proto = skb->protocol;
+	err = __skb_vlan_pop(skb, &vlan_tci);
+	if (unlikely(err))
+		return err;
+
+	__vlan_hwaccel_put_tag(skb, vlan_proto, vlan_tci);
+	return 0;
+}
+EXPORT_SYMBOL(skb_vlan_accel);
+
+int skb_vlan_pop(struct sk_buff *skb)
+{
+	u16 vlan_tci;
+	int err;
+
 	if (likely(skb_vlan_tag_present(skb))) {
 		skb->vlan_tci = 0;
 	} else {
@@ -4500,19 +4516,13 @@  int skb_vlan_pop(struct sk_buff *skb)
 		if (err)
 			return err;
 	}
-	/* move next vlan tag to hw accel tag */
+
 	if (likely((skb->protocol != htons(ETH_P_8021Q) &&
 		    skb->protocol != htons(ETH_P_8021AD)) ||
 		   skb->len < VLAN_ETH_HLEN))
 		return 0;
 
-	vlan_proto = skb->protocol;
-	err = __skb_vlan_pop(skb, &vlan_tci);
-	if (unlikely(err))
-		return err;
-
-	__vlan_hwaccel_put_tag(skb, vlan_proto, vlan_tci);
-	return 0;
+	return skb_vlan_accel(skb);
 }
 EXPORT_SYMBOL(skb_vlan_pop);
 
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 7d9b2307d6eb..ad2331cde732 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -302,6 +302,17 @@  static int set_eth_addr(struct sk_buff *skb, struct sw_flow_key *flow_key,
 
 static int pop_eth(struct sk_buff *skb, struct sw_flow_key *key)
 {
+	/* Push outermost VLAN tag to skb metadata unless a VLAN tag
+	 * is already present there.
+	 */
+	if ((skb->protocol == htons(ETH_P_8021Q) ||
+	     skb->protocol == htons(ETH_P_8021AD)) &&
+	    !skb_vlan_tag_present(skb)) {
+		int err = skb_vlan_accel(skb);
+		if (unlikely(err))
+			return err;
+	}
+
 	skb_pull_rcsum(skb, ETH_HLEN);
 	skb_reset_mac_header(skb);
 	skb->mac_len -= ETH_HLEN;
@@ -314,13 +325,6 @@  static int push_eth(struct sk_buff *skb, struct sw_flow_key *key,
 		    const struct ovs_action_push_eth *ethh)
 {
 	struct ethhdr *hdr;
-	int err;
-
-	/* De-accelerate any hardware accelerated VLAN tag added to a previous
-	 * Ethernet header */
-	err = skb_vlan_deaccel(skb);
-	if (unlikely(err))
-		return err;
 
 	/* Add the new Ethernet header */
 	if (skb_cow_head(skb, ETH_HLEN) < 0)
diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index 4d2698596033..fdefee776d4f 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -469,8 +469,10 @@  static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
 	skb_reset_mac_header(skb);
 
 	/* Link layer. */
+	key->eth.tci = 0;
 	if (key->phy.is_layer3) {
-		key->eth.tci = 0;
+		if (skb_vlan_tag_present(skb))
+			key->eth.tci = htons(skb->vlan_tci);
 	} else {
 		eth = eth_hdr(skb);
 		ether_addr_copy(key->eth.src, eth->h_source);
@@ -481,7 +483,6 @@  static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
 		 * update skb->csum here.
 		 */
 
-		key->eth.tci = 0;
 		if (skb_vlan_tag_present(skb))
 			key->eth.tci = htons(skb->vlan_tci);
 		else if (eth->h_proto == htons(ETH_P_8021Q))