[ovs-dev] Change in openvswitch kernel module to support MPLS label depth of 3 in ingress direction.
diff mbox series

Message ID 1571581532-18581-1-git-send-email-martinvarghesenokia@gmail.com
State New
Headers show
Series
  • [ovs-dev] Change in openvswitch kernel module to support MPLS label depth of 3 in ingress direction.
Related show

Commit Message

Martin Varghese Oct. 20, 2019, 2:25 p.m. UTC
From: Martin Varghese <martin.varghese@nokia.com>

The openvswitch kernel module was supporting a MPLS label depth of 1
in the ingress direction though the userspace OVS supports a max depth
of 3 labels. This change enables openvswitch module to support a max
depth of 3 labels in the ingress.

Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
---
 datapath/actions.c      |  2 +-
 datapath/flow.c         | 22 ++++++++++++------
 datapath/flow.h         |  8 ++++---
 datapath/flow_netlink.c | 61 +++++++++++++++++++++++++++++++++++++------------
 tests/system-traffic.at | 39 +++++++++++++++++++++++++++++++
 5 files changed, 107 insertions(+), 25 deletions(-)

Comments

Ben Pfaff Oct. 20, 2019, 7:07 p.m. UTC | #1
On Sun, Oct 20, 2019 at 07:55:32PM +0530, Martin Varghese wrote:
> From: Martin Varghese <martin.varghese@nokia.com>
> 
> The openvswitch kernel module was supporting a MPLS label depth of 1
> in the ingress direction though the userspace OVS supports a max depth
> of 3 labels. This change enables openvswitch module to support a max
> depth of 3 labels in the ingress.
> 
> Signed-off-by: Martin Varghese <martin.varghese@nokia.com>

Thanks for the patch!

Usually, for kernel module changes, the workflow is to submit the change
upstream to the Linux kernel first ("upstream first").  Then, afterward,
we backport the upstream changes into the OVS repository.

I see that you have CCed this to the Linux kernel networking list
(netdev) but the patch itself is against the OVS repo.  Probably, if you
want to get reviews from netdev, you should instead post a patch against
the net-next repository.

Thanks again for working to improve Open vSwitch.
Martin Varghese Oct. 21, 2019, 4:08 a.m. UTC | #2
On Sun, Oct 20, 2019 at 12:07:06PM -0700, Ben Pfaff wrote:
> On Sun, Oct 20, 2019 at 07:55:32PM +0530, Martin Varghese wrote:
> > From: Martin Varghese <martin.varghese@nokia.com>
> > 
> > The openvswitch kernel module was supporting a MPLS label depth of 1
> > in the ingress direction though the userspace OVS supports a max depth
> > of 3 labels. This change enables openvswitch module to support a max
> > depth of 3 labels in the ingress.
> > 
> > Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
> 
> Thanks for the patch!
> 
> Usually, for kernel module changes, the workflow is to submit the change
> upstream to the Linux kernel first ("upstream first").  Then, afterward,
> we backport the upstream changes into the OVS repository.
> 
> I see that you have CCed this to the Linux kernel networking list
> (netdev) but the patch itself is against the OVS repo.  Probably, if you
> want to get reviews from netdev, you should instead post a patch against
> the net-next repository.
> 
> Thanks again for working to improve Open vSwitch.

Thankyou for your mail.
The same changes are being discussed in netdev@vger.kernel.org.
Along with this patch, V2 of kernel patch for net-next repo is also submitted.

Patch
diff mbox series

diff --git a/datapath/actions.c b/datapath/actions.c
index a44e804..fbf4457 100644
--- a/datapath/actions.c
+++ b/datapath/actions.c
@@ -276,7 +276,7 @@  static int set_mpls(struct sk_buff *skb, struct sw_flow_key *flow_key,
 	}
 
 	stack->label_stack_entry = lse;
-	flow_key->mpls.top_lse = lse;
+	flow_key->mpls.lse[0] = lse;
 	return 0;
 }
 
diff --git a/datapath/flow.c b/datapath/flow.c
index 46e2bac..436b7a4 100644
--- a/datapath/flow.c
+++ b/datapath/flow.c
@@ -659,27 +659,35 @@  static int key_extract_l3l4(struct sk_buff *skb, struct sw_flow_key *key)
 			memset(&key->ipv4, 0, sizeof(key->ipv4));
 		}
 	} else if (eth_p_mpls(key->eth.type)) {
-		size_t stack_len = MPLS_HLEN;
+		u8 label_count = 1;
 
+		memset(&key->mpls, 0, sizeof(key->mpls));
 		skb_set_inner_network_header(skb, skb->mac_len);
 		while (1) {
 			__be32 lse;
 
-			error = check_header(skb, skb->mac_len + stack_len);
+			error = check_header(skb, skb->mac_len +
+					     label_count * MPLS_HLEN);
 			if (unlikely(error))
 				return 0;
 
 			memcpy(&lse, skb_inner_network_header(skb), MPLS_HLEN);
 
-			if (stack_len == MPLS_HLEN)
-				memcpy(&key->mpls.top_lse, &lse, MPLS_HLEN);
+			if (label_count <= MPLS_LABEL_DEPTH)
+				memcpy(&key->mpls.lse[label_count - 1], &lse,
+				       MPLS_HLEN);
 
-			skb_set_inner_network_header(skb, skb->mac_len + stack_len);
+			skb_set_inner_network_header(skb, skb->mac_len +
+						     label_count * MPLS_HLEN);
 			if (lse & htonl(MPLS_LS_S_MASK))
 				break;
-
-			stack_len += MPLS_HLEN;
+			
+			label_count++;
 		}
+		if (label_count > MPLS_LABEL_DEPTH)
+			label_count = MPLS_LABEL_DEPTH;
+
+		key->mpls.num_labels_mask = GENMASK(label_count - 1, 0);
 	} else if (key->eth.type == htons(ETH_P_IPV6)) {
 		int nh_len;             /* IPv6 Header + Extensions */
 
diff --git a/datapath/flow.h b/datapath/flow.h
index 1a5df38..9b7dcaf 100644
--- a/datapath/flow.h
+++ b/datapath/flow.h
@@ -43,6 +43,7 @@  enum sw_flow_mac_proto {
 	MAC_PROTO_ETHERNET,
 };
 #define SW_FLOW_KEY_INVALID	0x80
+#define MPLS_LABEL_DEPTH       3
 
 /* Store options at the end of the array if they are less than the
  * maximum size. This allows us to get the benefits of variable length
@@ -98,9 +99,6 @@  struct sw_flow_key {
 					 */
 	union {
 		struct {
-			__be32 top_lse;	/* top label stack entry */
-		} mpls;
-		struct {
 			u8     proto;	/* IP protocol or lower 8 bits of ARP opcode. */
 			u8     tos;	    /* IP ToS. */
 			u8     ttl;	    /* IP TTL/hop limit. */
@@ -148,6 +146,10 @@  struct sw_flow_key {
 				} nd;
 			};
 		} ipv6;
+		struct {
+			u32 num_labels_mask;    /* labels present bitmap of effective length MPLS_LABEL_DEPTH */
+			__be32 lse[MPLS_LABEL_DEPTH];     /* label stack entry  */
+		} mpls;
 		struct ovs_key_nsh nsh;         /* network service header */
 	};
 	struct {
diff --git a/datapath/flow_netlink.c b/datapath/flow_netlink.c
index 0f7ab53..9c30d85 100644
--- a/datapath/flow_netlink.c
+++ b/datapath/flow_netlink.c
@@ -438,7 +438,7 @@  static const struct ovs_len_tbl ovs_key_lens[OVS_KEY_ATTR_MAX + 1] = {
 	[OVS_KEY_ATTR_DP_HASH]	 = { .len = sizeof(u32) },
 	[OVS_KEY_ATTR_TUNNEL]	 = { .len = OVS_ATTR_NESTED,
 				     .next = ovs_tunnel_key_lens, },
-	[OVS_KEY_ATTR_MPLS]	 = { .len = sizeof(struct ovs_key_mpls) },
+	[OVS_KEY_ATTR_MPLS]	 = { .len = OVS_ATTR_VARIABLE },
 	[OVS_KEY_ATTR_CT_STATE]	 = { .len = sizeof(u32) },
 	[OVS_KEY_ATTR_CT_ZONE]	 = { .len = sizeof(u16) },
 	[OVS_KEY_ATTR_CT_MARK]	 = { .len = sizeof(u32) },
@@ -1619,10 +1619,27 @@  static int ovs_key_from_nlattrs(struct net *net, struct sw_flow_match *match,
 
 	if (attrs & (1ULL << OVS_KEY_ATTR_MPLS)) {
 		const struct ovs_key_mpls *mpls_key;
+		u32 hdr_len;
+		u32 label_count, label_count_mask, i;
+		
 
 		mpls_key = nla_data(a[OVS_KEY_ATTR_MPLS]);
-		SW_FLOW_KEY_PUT(match, mpls.top_lse,
-				mpls_key->mpls_lse, is_mask);
+		hdr_len = nla_len(a[OVS_KEY_ATTR_MPLS]);
+		label_count = hdr_len / sizeof(struct ovs_key_mpls);
+		
+		if (label_count == 0 || label_count > MPLS_LABEL_DEPTH ||
+		    hdr_len % sizeof(struct ovs_key_mpls))
+			return -EINVAL;
+
+		label_count_mask =  GENMASK(label_count - 1, 0);
+		
+		for (i = 0 ; i < label_count; i++)
+			SW_FLOW_KEY_PUT(match, mpls.lse[i],
+					mpls_key[i].mpls_lse, is_mask);
+
+		SW_FLOW_KEY_PUT(match, mpls.num_labels_mask,
+				label_count_mask, is_mask);
+		
 
 		attrs &= ~(1ULL << OVS_KEY_ATTR_MPLS);
 	}
@@ -2103,13 +2120,18 @@  static int __ovs_nla_put_key(const struct sw_flow_key *swkey,
 		ether_addr_copy(arp_key->arp_sha, output->ipv4.arp.sha);
 		ether_addr_copy(arp_key->arp_tha, output->ipv4.arp.tha);
 	} else if (eth_p_mpls(swkey->eth.type)) {
+		u8 num_labels, i;
 		struct ovs_key_mpls *mpls_key;
-
-		nla = nla_reserve(skb, OVS_KEY_ATTR_MPLS, sizeof(*mpls_key));
+		
+		num_labels = hweight_long(output->mpls.num_labels_mask);		
+		nla = nla_reserve(skb, OVS_KEY_ATTR_MPLS,
+				  num_labels * sizeof(*mpls_key));
 		if (!nla)
 			goto nla_put_failure;
+
 		mpls_key = nla_data(nla);
-		mpls_key->mpls_lse = output->mpls.top_lse;
+		for (i = 0; i < num_labels; i++)
+			mpls_key[i].mpls_lse = output->mpls.lse[i];
 	}
 
 	if ((swkey->eth.type == htons(ETH_P_IP) ||
@@ -2950,6 +2972,10 @@  static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 	u8 mac_proto = ovs_key_mac_proto(key);
 	const struct nlattr *a;
 	int rem, err;
+	u32 mpls_label_count = 0;
+	
+	if (eth_p_mpls(eth_type))
+		mpls_label_count = hweight_long(key->mpls.num_labels_mask);
 
 	nla_for_each_nested(a, attr, rem) {
 		/* Expected argument lengths, (u32)-1 for variable length. */
@@ -3058,26 +3084,33 @@  static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 			     !eth_p_mpls(eth_type)))
 				return -EINVAL;
 			eth_type = mpls->mpls_ethertype;
+			mpls_label_count++;
 			break;
 		}
 
-		case OVS_ACTION_ATTR_POP_MPLS:
+		case OVS_ACTION_ATTR_POP_MPLS: {
+			__be16  proto;
 			if (vlan_tci & htons(VLAN_CFI_MASK) ||
 			    !eth_p_mpls(eth_type))
 				return -EINVAL;
 
-			/* Disallow subsequent L2.5+ set and mpls_pop actions
-			 * as there is no check here to ensure that the new
-			 * eth_type is valid and thus set actions could
-			 * write off the end of the packet or otherwise
-			 * corrupt it.
+			/* Disallow subsequent L2.5+ set actions as there is
+			 * no check here to ensure that the new eth_type is
+			 * valid and thus set actions could write off the
+			 * end of the packet or otherwise  corrupt it.
 			 *
 			 * Support for these actions is planned using packet
 			 * recirculation.
 			 */
-			eth_type = htons(0);
-			break;
+			proto = nla_get_be16(a);
+			mpls_label_count--;
 
+			if (!eth_p_mpls(proto) || !mpls_label_count)
+				eth_type = htons(0);
+			else
+				eth_type =  proto;
+			break;
+		}
 		case OVS_ACTION_ATTR_SET:
 			err = validate_set(a, key, sfa,
 					   &skip_copy, mac_proto, eth_type,
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 870a05e..cde7429 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -992,6 +992,45 @@  NS_CHECK_EXEC([at_ns1], [ping -q -c 3 -i 0.3 -w 2 10.1.1.1 | FORMAT_PING], [0],
 
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
+
+AT_SETUP([datapath - multiple mpls label pop])
+OVS_TRAFFIC_VSWITCHD_START([_ADD_BR([br1])])
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH(p1, at_ns1, br1, "10.1.1.2/24")
+
+AT_CHECK([ip link add patch0 type veth peer name patch1])
+on_exit 'ip link del patch0'
+
+AT_CHECK([ip link set dev patch0 up])
+AT_CHECK([ip link set dev patch1 up])
+AT_CHECK([ovs-vsctl add-port br0 patch0])
+AT_CHECK([ovs-vsctl add-port br1 patch1])
+
+AT_DATA([flows.txt], [dnl
+table=0,priority=100,dl_type=0x0800 actions=push_mpls:0x8847,set_mpls_label:3,push_mpls:0x8847,set_mpls_label:2,push_mpls:0x8847,set_mpls_label:1,resubmit(,3)
+table=0,priority=100,dl_type=0x8847,mpls_label=1 actions=pop_mpls:0x8847,resubmit(,1)
+table=1,priority=100,dl_type=0x8847,mpls_label=2 actions=pop_mpls:0x8847,resubmit(,2)
+table=2,priority=100,dl_type=0x8847,mpls_label=3 actions=pop_mpls:0x0800,resubmit(,3)
+table=0,priority=10 actions=resubmit(,3)
+table=3,priority=10 actions=normal
+])
+
+AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
+AT_CHECK([ovs-ofctl add-flows br1 flows.txt])
+
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+NS_CHECK_EXEC([at_ns1], [ping -q -c 3 -i 0.3 -w 2 10.1.1.1 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
 AT_SETUP([datapath - basic truncate action])
 AT_SKIP_IF([test $HAVE_NC = no])
 OVS_TRAFFIC_VSWITCHD_START()