diff mbox

[v9,net-next,4/7] openvswitch: add layer 3 flow/port support

Message ID 20160511030632.GA24805@vergenet.net
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Simon Horman May 11, 2016, 3:06 a.m. UTC
Hi Jiri,

On Wed, May 11, 2016 at 10:50:09AM +0900, Simon Horman wrote:

[...]

> > > Its possible that I've overlooked something but as things stand I think
> > > things look like this:
> > > 
> > > * ovs_flow_key_extract() keys off dev->type and skb->protocol.
> > > * ovs_flow_key_extract() calls key_extract() which amongst other things
> > >   sets up the skb->mac_header and skb->mac_len of the skb.
> > > * ovs_flow_key_extract() sets skb->protocol to that of the inner packet
> > >   in the case of TEB
> > > * Actions update the above mentioned skb fields as appropriate.
> > 
> > Okay, that actually eases things somewhat.
> > 
> > > So it seems to me that it should be safe to rely on skb->protocol
> > > in the receive path. Or more specifically, in netdev_port_receive().
> > > 
> > > If mac_len is also able to be used then I think fine. But it seems to me
> > > that it needs to be set up by OvS at least for the ARPHRD_NONE case. This
> > > could be done early on, say in netdev_port_receive(). But it seems that
> > > would involve duplicating some of what is already occurring in
> > > key_extract().
> > 
> > I'd actually prefer doing this earlier, netdev_port_receive looks like
> > the right place. Just set mac_len there (or whatever) and let
> > key_extract do the rest of the work, not depending on dev->type in
> > there.
> > 
> > My point about recirculation was not actually valid, as I missed you're
> > doing this in ovs_flow_key_extract and not in key_extract. Still,
> > I think the special handling of particular interface types belongs to
> > the tx processing on those interfaces, not to the common code.
> 
> Sure, if that is your preference I think it should be simple enough to
> implement. I agree that netdev_port_receive() looks like a good place for
> this.

So far I have the following, which seems to work
changes to make dev->type ARPHRD_NONE for compat GRE vports.

Is this close to what you had in mind?

Comments

Jiri Benc May 11, 2016, 2:09 p.m. UTC | #1
On Wed, 11 May 2016 12:06:35 +0900, Simon Horman wrote:
> Is this close to what you had in mind?

Yes but see below.

> @@ -739,17 +729,17 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
>  	key->phy.skb_mark = skb->mark;
>  	ovs_ct_fill_key(skb, key);
>  	key->ovs_flow_hash = 0;
> -	key->phy.is_layer3 = is_layer3;
> +	key->phy.is_layer3 = (tun_info && skb->mac_len == 0);

Do we have to depend on tun_info? It would be nice to support all
ARPHRD_NONE interfaces, not just tunnels. The tun interface (from
the tuntap driver) comes to mind, for example.

> --- a/net/openvswitch/vport-netdev.c
> +++ b/net/openvswitch/vport-netdev.c
> @@ -60,7 +60,21 @@ static void netdev_port_receive(struct sk_buff *skb)
>  	if (vport->dev->type == ARPHRD_ETHER) {
>  		skb_push(skb, ETH_HLEN);
>  		skb_postpush_rcsum(skb, skb->data, ETH_HLEN);
> +	} else if (vport->dev->type == ARPHRD_NONE) {
> +		if (skb->protocol == htons(ETH_P_TEB)) {
> +			struct ethhdr *eth = eth_hdr(skb);
> +
> +			if (unlikely(skb->len < ETH_HLEN))
> +				goto error;
> +
> +			skb->mac_len = ETH_HLEN;
> +			if (eth->h_proto == htons(ETH_P_8021Q))
> +				skb->mac_len += VLAN_HLEN;
> +		} else {
> +			skb->mac_len = 0;
> +		}

Without putting much thought into this, could this perhaps be left for
parse_ethertype (called from key_extract) to do?

Thanks,

 Jiri
Simon Horman May 11, 2016, 10:46 p.m. UTC | #2
On Wed, May 11, 2016 at 04:09:28PM +0200, Jiri Benc wrote:
> On Wed, 11 May 2016 12:06:35 +0900, Simon Horman wrote:
> > Is this close to what you had in mind?
> 
> Yes but see below.
> 
> > @@ -739,17 +729,17 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
> >  	key->phy.skb_mark = skb->mark;
> >  	ovs_ct_fill_key(skb, key);
> >  	key->ovs_flow_hash = 0;
> > -	key->phy.is_layer3 = is_layer3;
> > +	key->phy.is_layer3 = (tun_info && skb->mac_len == 0);
> 
> Do we have to depend on tun_info? It would be nice to support all
> ARPHRD_NONE interfaces, not just tunnels. The tun interface (from
> the tuntap driver) comes to mind, for example.

Yes, I think that should work. I was just being cautious.

Do you think it is safe to detect TEB based on skb->protocol regardless
of the presence of tun_info?

> > +++ b/net/openvswitch/vport-netdev.c
> > @@ -60,7 +60,21 @@ static void netdev_port_receive(struct sk_buff *skb)
> >  	if (vport->dev->type == ARPHRD_ETHER) {
> >  		skb_push(skb, ETH_HLEN);
> >  		skb_postpush_rcsum(skb, skb->data, ETH_HLEN);
> > +	} else if (vport->dev->type == ARPHRD_NONE) {
> > +		if (skb->protocol == htons(ETH_P_TEB)) {
> > +			struct ethhdr *eth = eth_hdr(skb);
> > +
> > +			if (unlikely(skb->len < ETH_HLEN))
> > +				goto error;
> > +
> > +			skb->mac_len = ETH_HLEN;
> > +			if (eth->h_proto == htons(ETH_P_8021Q))
> > +				skb->mac_len += VLAN_HLEN;
> > +		} else {
> > +			skb->mac_len = 0;
> > +		}
> 
> Without putting much thought into this, could this perhaps be left for
> parse_ethertype (called from key_extract) to do?

I think I am confused.

I believe that key_extract() does already do all of the above (and more).

The purpose of the above change was to do this work here rather than
leaving it to parse_ethertype. This is because I was under the impression
that is what you were after. Specifically as a mechanism to avoid relying
on vport->dev->type in ovs_flow_key_extract.

If we can live with a bogus skb->mac_len value that is sufficient for
ovs_flow_key_extract.() and set correctly by key_extract() (which happens
anyway) we could do something like this:

	} else if (vport->dev->type == ARPHRD_NONE) {
		if (skb->protocol == htons(ETH_P_TEB))
			/* Ignores presence of VLAN but is sufficient for
			 * ovs_flow_key_extract() which then calls key_extract()
			 * which calculates skb->mac_len correctly. */
			skb->mac_len = ETH_HLEN; /* Ignores presence of VLAN */
		else
			skb->mac_len = 0;
	}


But perhaps I have missed the point somehow.
Jiri Benc May 17, 2016, 2:43 p.m. UTC | #3
On Thu, 12 May 2016 07:46:52 +0900, Simon Horman wrote:
> If we can live with a bogus skb->mac_len value that is sufficient for
> ovs_flow_key_extract.() and set correctly by key_extract() (which happens
> anyway) we could do something like this:
> 
> 	} else if (vport->dev->type == ARPHRD_NONE) {
> 		if (skb->protocol == htons(ETH_P_TEB))
> 			/* Ignores presence of VLAN but is sufficient for
> 			 * ovs_flow_key_extract() which then calls key_extract()
> 			 * which calculates skb->mac_len correctly. */
> 			skb->mac_len = ETH_HLEN; /* Ignores presence of VLAN */
> 		else
> 			skb->mac_len = 0;
> 	}
> 
> 
> But perhaps I have missed the point somehow.

You did not, I was more thinking aloud. But you're right, it doesn't
make much sense.

So, wouldn't it be actually more correct and in line with patch 2 to
call eth_type_trans() here? Possibly even followed by skb_vlan_untag
(not needed. But it might make things easier to have the first vlan tag
always reside inside skb->vlan_tci and treat that as an invariant in
the whole ovs kernel code. It'll need to be done in more spots than
just this one, though, and is probably matter of a separate patchset).

 Jiri
Simon Horman May 18, 2016, 2:18 a.m. UTC | #4
On Tue, May 17, 2016 at 04:43:20PM +0200, Jiri Benc wrote:
> On Thu, 12 May 2016 07:46:52 +0900, Simon Horman wrote:
> > If we can live with a bogus skb->mac_len value that is sufficient for
> > ovs_flow_key_extract.() and set correctly by key_extract() (which happens
> > anyway) we could do something like this:
> > 
> > 	} else if (vport->dev->type == ARPHRD_NONE) {
> > 		if (skb->protocol == htons(ETH_P_TEB))
> > 			/* Ignores presence of VLAN but is sufficient for
> > 			 * ovs_flow_key_extract() which then calls key_extract()
> > 			 * which calculates skb->mac_len correctly. */
> > 			skb->mac_len = ETH_HLEN; /* Ignores presence of VLAN */
> > 		else
> > 			skb->mac_len = 0;
> > 	}
> > 
> > 
> > But perhaps I have missed the point somehow.
> 
> You did not, I was more thinking aloud. But you're right, it doesn't
> make much sense.
> 
> So, wouldn't it be actually more correct and in line with patch 2 to
> call eth_type_trans() here?

Yes, that seems reasonable.

> Possibly even followed by skb_vlan_untag
> (not needed. But it might make things easier to have the first vlan tag
> always reside inside skb->vlan_tci and treat that as an invariant in
> the whole ovs kernel code. It'll need to be done in more spots than
> just this one, though, and is probably matter of a separate patchset).

That also seems reasonable. But as it seems more invasive and is not
strictly necessary I'd rather handle that update separately.
diff mbox

Patch

diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index d320c2657627..4d2698596033 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -700,8 +700,6 @@  int ovs_flow_key_update(struct sk_buff *skb, struct sw_flow_key *key)
 int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
 			 struct sk_buff *skb, struct sw_flow_key *key)
 {
-	bool is_layer3 = false;
-	bool is_teb = false;
 	int err;
 
 	/* Extract metadata from packet. */
@@ -709,14 +707,6 @@  int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
 		key->tun_proto = ip_tunnel_info_af(tun_info);
 		memcpy(&key->tun_key, &tun_info->key, sizeof(key->tun_key));
 
-		if (OVS_CB(skb)->input_vport->dev->type != ARPHRD_ETHER) {
-			if (skb->protocol == htons(ETH_P_TEB))
-				is_teb = true;
-			else
-				is_layer3 = true;
-		}
-
-
 		if (tun_info->options_len) {
 			BUILD_BUG_ON((1 << (sizeof(tun_info->options_len) *
 						   8)) - 1
@@ -739,17 +729,17 @@  int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
 	key->phy.skb_mark = skb->mark;
 	ovs_ct_fill_key(skb, key);
 	key->ovs_flow_hash = 0;
-	key->phy.is_layer3 = is_layer3;
+	key->phy.is_layer3 = (tun_info && skb->mac_len == 0);
 	key->recirc_id = 0;
 
 	err = key_extract(skb, key);
 	if (err < 0)
 		return err;
 
-	if (is_teb)
-		skb->protocol = key->eth.type;
-	else if (is_layer3)
+	if (key->phy.is_layer3)
 		key->eth.type = skb->protocol;
+	else if (tun_info && skb->protocol == htons(ETH_P_TEB))
+		skb->protocol = key->eth.type;
 
 	return err;
 }
diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
index 7d54414b35eb..ac8178ac2c81 100644
--- a/net/openvswitch/vport-netdev.c
+++ b/net/openvswitch/vport-netdev.c
@@ -60,7 +60,21 @@  static void netdev_port_receive(struct sk_buff *skb)
 	if (vport->dev->type == ARPHRD_ETHER) {
 		skb_push(skb, ETH_HLEN);
 		skb_postpush_rcsum(skb, skb->data, ETH_HLEN);
+	} else if (vport->dev->type == ARPHRD_NONE) {
+		if (skb->protocol == htons(ETH_P_TEB)) {
+			struct ethhdr *eth = eth_hdr(skb);
+
+			if (unlikely(skb->len < ETH_HLEN))
+				goto error;
+
+			skb->mac_len = ETH_HLEN;
+			if (eth->h_proto == htons(ETH_P_8021Q))
+				skb->mac_len += VLAN_HLEN;
+		} else {
+			skb->mac_len = 0;
+		}
 	}
+
 	ovs_vport_receive(vport, skb, skb_tunnel_info(skb));
 	return;
 error: