diff mbox

[RFC,07/11] GENEVE: Add option to mangle IP IDs on inner headers when using TSO

Message ID 20160407223237.11142.33072.stgit@ahduyck-xeon-server
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Alexander Duyck April 7, 2016, 10:32 p.m. UTC
This patch adds support for a feature I am calling IP ID mangling.  It is
basically just another way of saying the IP IDs that are transmitted by the
tunnel may not match up with what would normally be expected.  Specifically
what will happen is in the case of TSO the IP IDs on the headers will be a
fixed value so a given TSO will repeat the same inner IP ID value gso_segs
number of times.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
---
 drivers/net/geneve.c         |   24 ++++++++++++++++++++++--
 include/net/udp_tunnel.h     |    8 --------
 include/uapi/linux/if_link.h |    1 +
 3 files changed, 23 insertions(+), 10 deletions(-)

Comments

Jesse Gross April 7, 2016, 11:22 p.m. UTC | #1
On Thu, Apr 7, 2016 at 7:32 PM, Alexander Duyck <aduyck@mirantis.com> wrote:
> This patch adds support for a feature I am calling IP ID mangling.  It is
> basically just another way of saying the IP IDs that are transmitted by the
> tunnel may not match up with what would normally be expected.  Specifically
> what will happen is in the case of TSO the IP IDs on the headers will be a
> fixed value so a given TSO will repeat the same inner IP ID value gso_segs
> number of times.
>
> Signed-off-by: Alexander Duyck <aduyck@mirantis.com>

If I'm understanding this correctly, enabling IP ID mangling will help
performance on ixgbe since it will allow it to do GSO partial instead
of plain GSO but it will hurt performance on i40e since it will drop
from TSO to plain GSO.

Assuming that's right, it seems like it will make it hard to chose the
right setting without knowledge of which hardware is in use. I guess
what we really want is "I care about nicely incrementing IP IDs" vs.
"I don't care as long as the DF bit is set". That second case is
really what this flag is trying to say but it seems like it is
enforcing too much in the i40e case - I don't think anyone wants to go
out of their way to make IP IDs jump around if incrementing is faster.
Alexander H Duyck April 7, 2016, 11:52 p.m. UTC | #2
On Thu, Apr 7, 2016 at 4:22 PM, Jesse Gross <jesse@kernel.org> wrote:
> On Thu, Apr 7, 2016 at 7:32 PM, Alexander Duyck <aduyck@mirantis.com> wrote:
>> This patch adds support for a feature I am calling IP ID mangling.  It is
>> basically just another way of saying the IP IDs that are transmitted by the
>> tunnel may not match up with what would normally be expected.  Specifically
>> what will happen is in the case of TSO the IP IDs on the headers will be a
>> fixed value so a given TSO will repeat the same inner IP ID value gso_segs
>> number of times.
>>
>> Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
>
> If I'm understanding this correctly, enabling IP ID mangling will help
> performance on ixgbe since it will allow it to do GSO partial instead
> of plain GSO but it will hurt performance on i40e since it will drop
> from TSO to plain GSO.

Right.  However the option is currently defaulted to off, and can be
enabled per tunnel endpoint.  So if you had an ixgbe to i40e link you
could enable it on the end with the ixgbe and you should see good
performance in both directions.

> Assuming that's right, it seems like it will make it hard to chose the
> right setting without knowledge of which hardware is in use. I guess
> what we really want is "I care about nicely incrementing IP IDs" vs.
> "I don't care as long as the DF bit is set". That second case is
> really what this flag is trying to say but it seems like it is
> enforcing too much in the i40e case - I don't think anyone wants to go
> out of their way to make IP IDs jump around if incrementing is faster.

Right.  The problem is trying to sort out all the GRO/GSO bits.  I was
probably being a bit too conservative after the last few iterations
for the GRO fixes.

Just a thought.  What if I replaced NETIF_F_TSO_FIXEDID with something
that meant we could mange the IP ID like a NETIF_F_TSO_IPID_MANGLE
(advice for better name welcome).  Instead of the feature flag meaning
we are going to transmit packets with a fixed ID it would mean we
don't care about the ID and are free to mangle it as we see fit.  The
GSO type can retain the same meaning as far as that requiring the same
ID for all, but the feature would mean we will take fixed and convert
it to incrementing, or incrementing and convert it to fixed.

- Alex
Jesse Gross April 8, 2016, 9:40 p.m. UTC | #3
On Thu, Apr 7, 2016 at 8:52 PM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> Just a thought.  What if I replaced NETIF_F_TSO_FIXEDID with something
> that meant we could mange the IP ID like a NETIF_F_TSO_IPID_MANGLE
> (advice for better name welcome).  Instead of the feature flag meaning
> we are going to transmit packets with a fixed ID it would mean we
> don't care about the ID and are free to mangle it as we see fit.  The
> GSO type can retain the same meaning as far as that requiring the same
> ID for all, but the feature would mean we will take fixed and convert
> it to incrementing, or incrementing and convert it to fixed.

I saw the new version of the code that you posted with this idea and
now that I understand it better, it seems like a reasonable choice to
me - it's nice that it is consistent with GRO and not tunnel specific.
It also makes behavior consistent across drivers in regards to
incrementing IDs in the default case, which was one of my concerns
from before.

Maybe I missed it but I didn't see any checks for the DF bit being set
when we transmit a packet with NETIF_F_TSO_MANGLEID. Even if I am
comfortable mangling my IDs in the DF case, I don't think this would
ever extend to non-DF packets. In the documentation you noted that it
is the driver's responsibility to do this check but I couldn't find it
in either ixgbe or igb. It would also be nice if the core stack could
enforce it somehow as well rather than each driver.
Alexander H Duyck April 8, 2016, 10:04 p.m. UTC | #4
On Fri, Apr 8, 2016 at 2:40 PM, Jesse Gross <jesse@kernel.org> wrote:
> On Thu, Apr 7, 2016 at 8:52 PM, Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
>> Just a thought.  What if I replaced NETIF_F_TSO_FIXEDID with something
>> that meant we could mange the IP ID like a NETIF_F_TSO_IPID_MANGLE
>> (advice for better name welcome).  Instead of the feature flag meaning
>> we are going to transmit packets with a fixed ID it would mean we
>> don't care about the ID and are free to mangle it as we see fit.  The
>> GSO type can retain the same meaning as far as that requiring the same
>> ID for all, but the feature would mean we will take fixed and convert
>> it to incrementing, or incrementing and convert it to fixed.
>
> I saw the new version of the code that you posted with this idea and
> now that I understand it better, it seems like a reasonable choice to
> me - it's nice that it is consistent with GRO and not tunnel specific.
> It also makes behavior consistent across drivers in regards to
> incrementing IDs in the default case, which was one of my concerns
> from before.
>
> Maybe I missed it but I didn't see any checks for the DF bit being set
> when we transmit a packet with NETIF_F_TSO_MANGLEID. Even if I am
> comfortable mangling my IDs in the DF case, I don't think this would
> ever extend to non-DF packets. In the documentation you noted that it
> is the driver's responsibility to do this check but I couldn't find it
> in either ixgbe or igb. It would also be nice if the core stack could
> enforce it somehow as well rather than each driver.

Yeah I had glossed over that in the igb and ixgbe patches.  A check is
only really needed for the incrementing to non-incrementing case and I
wasn't sure how common it was to have TCP with an IP header that
didn't set the DF bit.  In the case of the outer headers igb and ixgbe
will increment the IP ID always so we don't have to worry about if DF
is set of not there.  For the inner headers I had fudged it a bit and
didn't add the validation.  If needed I can see about adding that
shortly.

- Alex
Jesse Gross April 9, 2016, 3:52 p.m. UTC | #5
On Fri, Apr 8, 2016 at 7:04 PM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> On Fri, Apr 8, 2016 at 2:40 PM, Jesse Gross <jesse@kernel.org> wrote:
>> Maybe I missed it but I didn't see any checks for the DF bit being set
>> when we transmit a packet with NETIF_F_TSO_MANGLEID. Even if I am
>> comfortable mangling my IDs in the DF case, I don't think this would
>> ever extend to non-DF packets. In the documentation you noted that it
>> is the driver's responsibility to do this check but I couldn't find it
>> in either ixgbe or igb. It would also be nice if the core stack could
>> enforce it somehow as well rather than each driver.
>
> Yeah I had glossed over that in the igb and ixgbe patches.  A check is
> only really needed for the incrementing to non-incrementing case and I
> wasn't sure how common it was to have TCP with an IP header that
> didn't set the DF bit.  In the case of the outer headers igb and ixgbe
> will increment the IP ID always so we don't have to worry about if DF
> is set of not there.  For the inner headers I had fudged it a bit and
> didn't add the validation.  If needed I can see about adding that
> shortly.

TCP without the DF bit set is not the default but it is possible (it
can be enabled by setting /proc/sys/net/ipv4/ip_no_pmtu_disc). I also
did a quick check of some Internet services and at least some of them
seem to return TCP without DF, so it's not too rare.
Alexander H Duyck April 9, 2016, 5:36 p.m. UTC | #6
On Sat, Apr 9, 2016 at 8:52 AM, Jesse Gross <jesse@kernel.org> wrote:
> On Fri, Apr 8, 2016 at 7:04 PM, Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
>> On Fri, Apr 8, 2016 at 2:40 PM, Jesse Gross <jesse@kernel.org> wrote:
>>> Maybe I missed it but I didn't see any checks for the DF bit being set
>>> when we transmit a packet with NETIF_F_TSO_MANGLEID. Even if I am
>>> comfortable mangling my IDs in the DF case, I don't think this would
>>> ever extend to non-DF packets. In the documentation you noted that it
>>> is the driver's responsibility to do this check but I couldn't find it
>>> in either ixgbe or igb. It would also be nice if the core stack could
>>> enforce it somehow as well rather than each driver.
>>
>> Yeah I had glossed over that in the igb and ixgbe patches.  A check is
>> only really needed for the incrementing to non-incrementing case and I
>> wasn't sure how common it was to have TCP with an IP header that
>> didn't set the DF bit.  In the case of the outer headers igb and ixgbe
>> will increment the IP ID always so we don't have to worry about if DF
>> is set of not there.  For the inner headers I had fudged it a bit and
>> didn't add the validation.  If needed I can see about adding that
>> shortly.
>
> TCP without the DF bit set is not the default but it is possible (it
> can be enabled by setting /proc/sys/net/ipv4/ip_no_pmtu_disc). I also
> did a quick check of some Internet services and at least some of them
> seem to return TCP without DF, so it's not too rare.

For next version I plan to include a check in netif_skb_features that
will clear NETIF_F_TSO_MANGLEID if the DF bit is not set.

- Alex
Eric Dumazet April 9, 2016, 6:02 p.m. UTC | #7
On Sat, 2016-04-09 at 10:36 -0700, Alexander Duyck wrote:

> For next version I plan to include a check in netif_skb_features that
> will clear NETIF_F_TSO_MANGLEID if the DF bit is not set.

So it looks like we slowly but surely make the whole stack damn slow, to
support some extra features.

How many instructions are needed nowadays for netif_skb_features()
alone ?

More than 120 if I am not mistaken, if we count fast path in
skb_network_protocol()

Note: I have a patch to remove the gso_min_segs thing, currently unused.

2 instructions saved ! Woo-hoo !
Alexander H Duyck April 9, 2016, 6:32 p.m. UTC | #8
On Sat, Apr 9, 2016 at 11:02 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Sat, 2016-04-09 at 10:36 -0700, Alexander Duyck wrote:
>
>> For next version I plan to include a check in netif_skb_features that
>> will clear NETIF_F_TSO_MANGLEID if the DF bit is not set.
>
> So it looks like we slowly but surely make the whole stack damn slow, to
> support some extra features.
>
> How many instructions are needed nowadays for netif_skb_features()
> alone ?
>
> More than 120 if I am not mistaken, if we count fast path in
> skb_network_protocol()
>
> Note: I have a patch to remove the gso_min_segs thing, currently unused.
>
> 2 instructions saved ! Woo-hoo !

I was planning to drop the checks for TSO_MANGLEID and GSO_PARTIAL in
a if statment based on skb_is_gso() since I had both GSO_PARTIAL to
check for as well.

- Alex
diff mbox

Patch

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index bc168894bda3..6352223d80c3 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -80,6 +80,7 @@  struct geneve_dev {
 #define GENEVE_F_UDP_ZERO_CSUM_TX	BIT(0)
 #define GENEVE_F_UDP_ZERO_CSUM6_TX	BIT(1)
 #define GENEVE_F_UDP_ZERO_CSUM6_RX	BIT(2)
+#define GENEVE_F_TCP_FIXEDID		BIT(3)
 
 struct geneve_sock {
 	bool			collect_md;
@@ -702,9 +703,14 @@  static int geneve_build_skb(struct rtable *rt, struct sk_buff *skb,
 	int min_headroom;
 	int err;
 	bool udp_sum = !(flags & GENEVE_F_UDP_ZERO_CSUM_TX);
+	int type = udp_sum ? SKB_GSO_UDP_TUNNEL_CSUM : SKB_GSO_UDP_TUNNEL;
 
 	skb_scrub_packet(skb, xnet);
 
+	if ((flags & GENEVE_F_TCP_FIXEDID) && skb_is_gso(skb) &&
+	    (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4))
+		type |= SKB_GSO_TCP_FIXEDID;
+
 	min_headroom = LL_RESERVED_SPACE(rt->dst.dev) + rt->dst.header_len
 			+ GENEVE_BASE_HLEN + opt_len + sizeof(struct iphdr);
 	err = skb_cow_head(skb, min_headroom);
@@ -713,7 +719,7 @@  static int geneve_build_skb(struct rtable *rt, struct sk_buff *skb,
 		goto free_rt;
 	}
 
-	skb = udp_tunnel_handle_offloads(skb, udp_sum);
+	skb = iptunnel_handle_offloads(skb, type);
 	if (IS_ERR(skb)) {
 		err = PTR_ERR(skb);
 		goto free_rt;
@@ -739,9 +745,14 @@  static int geneve6_build_skb(struct dst_entry *dst, struct sk_buff *skb,
 	int min_headroom;
 	int err;
 	bool udp_sum = !(flags & GENEVE_F_UDP_ZERO_CSUM6_TX);
+	int type = udp_sum ? SKB_GSO_UDP_TUNNEL_CSUM : SKB_GSO_UDP_TUNNEL;
 
 	skb_scrub_packet(skb, xnet);
 
+	if ((flags & GENEVE_F_TCP_FIXEDID) && skb_is_gso(skb) &&
+	    (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4))
+		type |= SKB_GSO_TCP_FIXEDID;
+
 	min_headroom = LL_RESERVED_SPACE(dst->dev) + dst->header_len
 			+ GENEVE_BASE_HLEN + opt_len + sizeof(struct ipv6hdr);
 	err = skb_cow_head(skb, min_headroom);
@@ -750,7 +761,7 @@  static int geneve6_build_skb(struct dst_entry *dst, struct sk_buff *skb,
 		goto free_dst;
 	}
 
-	skb = udp_tunnel_handle_offloads(skb, udp_sum);
+	skb = iptunnel_handle_offloads(skb, type);
 	if (IS_ERR(skb)) {
 		err = PTR_ERR(skb);
 		goto free_dst;
@@ -1249,6 +1260,7 @@  static const struct nla_policy geneve_policy[IFLA_GENEVE_MAX + 1] = {
 	[IFLA_GENEVE_UDP_CSUM]		= { .type = NLA_U8 },
 	[IFLA_GENEVE_UDP_ZERO_CSUM6_TX]	= { .type = NLA_U8 },
 	[IFLA_GENEVE_UDP_ZERO_CSUM6_RX]	= { .type = NLA_U8 },
+	[IFLA_GENEVE_IPID_MANGLE]	= { .type = NLA_FLAG },
 };
 
 static int geneve_validate(struct nlattr *tb[], struct nlattr *data[])
@@ -1436,6 +1448,9 @@  static int geneve_newlink(struct net *net, struct net_device *dev,
 	    nla_get_u8(data[IFLA_GENEVE_UDP_ZERO_CSUM6_RX]))
 		flags |= GENEVE_F_UDP_ZERO_CSUM6_RX;
 
+	if (data[IFLA_GENEVE_IPID_MANGLE])
+		flags |= GENEVE_F_TCP_FIXEDID;
+
 	return geneve_configure(net, dev, &remote, vni, ttl, tos, label,
 				dst_port, metadata, flags);
 }
@@ -1460,6 +1475,7 @@  static size_t geneve_get_size(const struct net_device *dev)
 		nla_total_size(sizeof(__u8)) + /* IFLA_GENEVE_UDP_CSUM */
 		nla_total_size(sizeof(__u8)) + /* IFLA_GENEVE_UDP_ZERO_CSUM6_TX */
 		nla_total_size(sizeof(__u8)) + /* IFLA_GENEVE_UDP_ZERO_CSUM6_RX */
+		nla_total_size(0) +	 /* IFLA_GENEVE_IPID_MANGLE */
 		0;
 }
 
@@ -1505,6 +1521,10 @@  static int geneve_fill_info(struct sk_buff *skb, const struct net_device *dev)
 		       !!(geneve->flags & GENEVE_F_UDP_ZERO_CSUM6_RX)))
 		goto nla_put_failure;
 
+	if ((geneve->flags & GENEVE_F_TCP_FIXEDID) &&
+	    nla_put_flag(skb, IFLA_GENEVE_IPID_MANGLE))
+		goto nla_put_failure;
+
 	return 0;
 
 nla_put_failure:
diff --git a/include/net/udp_tunnel.h b/include/net/udp_tunnel.h
index b83114077cee..c44d04259665 100644
--- a/include/net/udp_tunnel.h
+++ b/include/net/udp_tunnel.h
@@ -98,14 +98,6 @@  struct metadata_dst *udp_tun_rx_dst(struct sk_buff *skb, unsigned short family,
 				    __be16 flags, __be64 tunnel_id,
 				    int md_size);
 
-static inline struct sk_buff *udp_tunnel_handle_offloads(struct sk_buff *skb,
-							 bool udp_csum)
-{
-	int type = udp_csum ? SKB_GSO_UDP_TUNNEL_CSUM : SKB_GSO_UDP_TUNNEL;
-
-	return iptunnel_handle_offloads(skb, type);
-}
-
 static inline void udp_tunnel_gro_complete(struct sk_buff *skb, int nhoff)
 {
 	struct udphdr *uh;
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index a3bc3f2a63d3..38acf49c818b 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -513,6 +513,7 @@  enum {
 	IFLA_GENEVE_UDP_ZERO_CSUM6_TX,
 	IFLA_GENEVE_UDP_ZERO_CSUM6_RX,
 	IFLA_GENEVE_LABEL,
+	IFLA_GENEVE_IPID_MANGLE,
 	__IFLA_GENEVE_MAX
 };
 #define IFLA_GENEVE_MAX	(__IFLA_GENEVE_MAX - 1)