diff mbox series

seg6: using DSCP of inner IPv4 packets

Message ID 20200804074030.1147-1-ahabdels@gmail.com
State Changes Requested
Delegated to: David Miller
Headers show
Series seg6: using DSCP of inner IPv4 packets | expand

Commit Message

Ahmed Abdelsalam Aug. 4, 2020, 7:40 a.m. UTC
This patch allows copying the DSCP from inner IPv4 header to the
outer IPv6 header, when doing SRv6 Encapsulation.

This allows forwarding packet across the SRv6 fabric based on their
original traffic class.

Signed-off-by: Ahmed Abdelsalam <ahabdels@gmail.com>
---
 net/ipv6/seg6_iptunnel.c | 36 ++++++++++++++++++------------------
 1 file changed, 18 insertions(+), 18 deletions(-)

Comments

David Miller Aug. 4, 2020, 8:08 p.m. UTC | #1
From: Ahmed Abdelsalam <ahabdels@gmail.com>
Date: Tue,  4 Aug 2020 07:40:30 +0000

> This patch allows copying the DSCP from inner IPv4 header to the
> outer IPv6 header, when doing SRv6 Encapsulation.
> 
> This allows forwarding packet across the SRv6 fabric based on their
> original traffic class.
> 
> Signed-off-by: Ahmed Abdelsalam <ahabdels@gmail.com>

Thanks for reworking your patch this way.  I need some time to
audit the new control flow since the SKB push operation is now
in a different location.

Thanks.
David Miller Aug. 6, 2020, 12:40 a.m. UTC | #2
From: Ahmed Abdelsalam <ahabdels@gmail.com>
Date: Tue,  4 Aug 2020 07:40:30 +0000

> This patch allows copying the DSCP from inner IPv4 header to the
> outer IPv6 header, when doing SRv6 Encapsulation.
> 
> This allows forwarding packet across the SRv6 fabric based on their
> original traffic class.
> 
> Signed-off-by: Ahmed Abdelsalam <ahabdels@gmail.com>

You have changed the hop limit behavior here and that neither seems
intentional nor correct.

When encapsulating ipv6 inside of ipv6 the inner hop limit should be
inherited.  You should only use the DST hop limit when encapsulating
ipv4.

And that's what the existing code did.
Ahmed Abdelsalam Aug. 6, 2020, 6:43 a.m. UTC | #3
Hi David,

SRv6 as defined in [1][2] does not mandate that the hop_limit of the 
outer IPv6 header has to be copied from the inner packet.

The only thing that is mandatory is that the hop_limit of the inner 
packet has to be decremented [3]. This complies with the specification 
defined in the Generic Packet Tunneling in IPv6 [4]. This part is 
actually missing in the kernel.

For the hop_limit of the outer IPv6 header, the other SRv6 
implementations [5][6] by default uses the default ipv6 hop_limit. But 
they allow also to use a configurable hop_limit for the outer header.

In conclusion the hop limit behavior in this patch is intentional and in 
my opnion correct.

If you agree I can send two patches to:
- decrement hop_limit of inner packet
- allow a configurable hop limit of outer IPv6 header


[1] https://tools.ietf.org/html/rfc8754
[2] 
https://tools.ietf.org/html/draft-ietf-spring-srv6-network-programming-16
[3] 
https://tools.ietf.org/html/draft-ietf-spring-srv6-network-programming-16#section-5
[4] https://tools.ietf.org/html/rfc2473#section-3.1
[5]https://github.com/FDio/vpp/blob/8bf80a3ddae7733925a757cb1710e25776eea01c/src/vnet/srv6/sr_policy_rewrite.c#L110
[6] 
https://www.cisco.com/c/en/us/td/docs/routers/asr9000/software/asr9k-r6-6/segment-routing/configuration/guide/b-segment-routing-cg-asr9000-66x/b-segment-routing-cg-asr9000-66x_chapter_011.html#id_94209


On 06/08/2020 02:40, David Miller wrote:
> From: Ahmed Abdelsalam <ahabdels@gmail.com>
> Date: Tue,  4 Aug 2020 07:40:30 +0000
> 
>> This patch allows copying the DSCP from inner IPv4 header to the
>> outer IPv6 header, when doing SRv6 Encapsulation.
>>
>> This allows forwarding packet across the SRv6 fabric based on their
>> original traffic class.
>>
>> Signed-off-by: Ahmed Abdelsalam <ahabdels@gmail.com>
> 
> You have changed the hop limit behavior here and that neither seems
> intentional nor correct.
> 
> When encapsulating ipv6 inside of ipv6 the inner hop limit should be
> inherited.  You should only use the DST hop limit when encapsulating
> ipv4.
> 
> And that's what the existing code did.
>
David Miller Aug. 8, 2020, 12:43 a.m. UTC | #4
From: Ahmed Abdelsalam <ahabdels@gmail.com>
Date: Thu, 6 Aug 2020 08:43:06 +0200

> SRv6 as defined in [1][2] does not mandate that the hop_limit of the
> outer IPv6 header has to be copied from the inner packet.

This is not an issue of seg6 RFCs, but rather generic ip6 in ip6
tunnel encapsulation.

Therefore, what the existing ip6 tunnel encap does is our guide,
and it inherits from the inner header.

And that's what the original seg6 code almost certainly used to
guide the decision making in this area.
Ahmed Abdelsalam Aug. 15, 2020, 9:47 a.m. UTC | #5
Hi David,

Sorry for the late reply. I'm on PTO with limited email access.

I will revise the patch in the next weeks and make outer IPv6 header 
inherit Hop limit from Inner packet for the IPv6 case.

Ahmed


On 08/08/2020 02:43, David Miller wrote:
> From: Ahmed Abdelsalam <ahabdels@gmail.com>
> Date: Thu, 6 Aug 2020 08:43:06 +0200
> 
>> SRv6 as defined in [1][2] does not mandate that the hop_limit of the
>> outer IPv6 header has to be copied from the inner packet.
> 
> This is not an issue of seg6 RFCs, but rather generic ip6 in ip6
> tunnel encapsulation.
> 
> Therefore, what the existing ip6 tunnel encap does is our guide,
> and it inherits from the inner header.
> 
> And that's what the original seg6 code almost certainly used to
> guide the decision making in this area.
>
diff mbox series

Patch

diff --git a/net/ipv6/seg6_iptunnel.c b/net/ipv6/seg6_iptunnel.c
index e0e9f48ab14f..12fb32ca64f7 100644
--- a/net/ipv6/seg6_iptunnel.c
+++ b/net/ipv6/seg6_iptunnel.c
@@ -87,8 +87,7 @@  static void set_tun_src(struct net *net, struct net_device *dev,
 }
 
 /* Compute flowlabel for outer IPv6 header */
-static __be32 seg6_make_flowlabel(struct net *net, struct sk_buff *skb,
-				  struct ipv6hdr *inner_hdr)
+static __be32 seg6_make_flowlabel(struct net *net, struct sk_buff *skb)
 {
 	int do_flowlabel = net->ipv6.sysctl.seg6_flowlabel;
 	__be32 flowlabel = 0;
@@ -99,7 +98,7 @@  static __be32 seg6_make_flowlabel(struct net *net, struct sk_buff *skb,
 		hash = rol32(hash, 16);
 		flowlabel = (__force __be32)hash & IPV6_FLOWLABEL_MASK;
 	} else if (!do_flowlabel && skb->protocol == htons(ETH_P_IPV6)) {
-		flowlabel = ip6_flowlabel(inner_hdr);
+		flowlabel = ip6_flowlabel(ipv6_hdr(skb));
 	}
 	return flowlabel;
 }
@@ -109,10 +108,11 @@  int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh, int proto)
 {
 	struct dst_entry *dst = skb_dst(skb);
 	struct net *net = dev_net(dst->dev);
-	struct ipv6hdr *hdr, *inner_hdr;
 	struct ipv6_sr_hdr *isrh;
 	int hdrlen, tot_len, err;
+	struct ipv6hdr *hdr;
 	__be32 flowlabel;
+	u8 tos = 0;
 
 	hdrlen = (osrh->hdrlen + 1) << 3;
 	tot_len = hdrlen + sizeof(*hdr);
@@ -121,31 +121,31 @@  int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh, int proto)
 	if (unlikely(err))
 		return err;
 
-	inner_hdr = ipv6_hdr(skb);
-	flowlabel = seg6_make_flowlabel(net, skb, inner_hdr);
-
-	skb_push(skb, tot_len);
-	skb_reset_network_header(skb);
-	skb_mac_header_rebuild(skb);
-	hdr = ipv6_hdr(skb);
-
 	/* inherit tc, flowlabel and hlim
 	 * hlim will be decremented in ip6_forward() afterwards and
 	 * decapsulation will overwrite inner hlim with outer hlim
 	 */
 
+	flowlabel = seg6_make_flowlabel(net, skb);
+
 	if (skb->protocol == htons(ETH_P_IPV6)) {
-		ip6_flow_hdr(hdr, ip6_tclass(ip6_flowinfo(inner_hdr)),
-			     flowlabel);
-		hdr->hop_limit = inner_hdr->hop_limit;
+		tos = ip6_tclass(ip6_flowinfo(ipv6_hdr(skb)));
+	} else if (skb->protocol == htons(ETH_P_IP)) {
+		tos = ip_hdr(skb)->tos;
+		memset(IP6CB(skb), 0, sizeof(*IP6CB(skb)));
 	} else {
-		ip6_flow_hdr(hdr, 0, flowlabel);
-		hdr->hop_limit = ip6_dst_hoplimit(skb_dst(skb));
-
 		memset(IP6CB(skb), 0, sizeof(*IP6CB(skb)));
 	}
 
+	skb_push(skb, tot_len);
+	skb_reset_network_header(skb);
+	skb_mac_header_rebuild(skb);
+	hdr = ipv6_hdr(skb);
+
+	ip6_flow_hdr(hdr, tos, flowlabel);
+
 	hdr->nexthdr = NEXTHDR_ROUTING;
+	hdr->hop_limit = ip6_dst_hoplimit(skb_dst(skb));
 
 	isrh = (void *)hdr + sizeof(*hdr);
 	memcpy(isrh, osrh, hdrlen);