diff mbox

[next,S13,14/15] i40e: fix ip-ip GRE encapulation

Message ID 1440798961-17326-15-git-send-email-catherine.sullivan@intel.com
State Changes Requested
Delegated to: Jeff Kirsher
Headers show

Commit Message

Catherine Sullivan Aug. 28, 2015, 9:56 p.m. UTC
From: Jesse Brandeburg <jesse.brandeburg@intel.com>

The i40e driver will soon be capable of offloading GRE traffic
but in the meantime, don't let the driver tell the OS that
it can offload types of traffic it cannot.

This fixes a bug where if you stack a GRE MAC in IP tunnel
over i40e, all TSO packets will be sent with invalid checksums
causing really bad throughput.

NOTE: I still see some retransmits that are unexplained when
testing with vxlan or ip/mac tunnels (GRE) but the throughput
is more like what is expected with multi-gigabit on a single
flow.

Reported-by: Stefan Assman <sassman@redhat.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Change-ID: Ie31843b150695b2b1889e306b8c33c1d2da29a9e
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 36 ++++++++++++++++++++++++++---
 1 file changed, 33 insertions(+), 3 deletions(-)

Comments

Jesse Gross Aug. 28, 2015, 11:03 p.m. UTC | #1
On Fri, Aug 28, 2015 at 2:56 PM, Catherine Sullivan
<catherine.sullivan@intel.com> wrote:
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
> index 469e1d5..b686450 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
[...]
> +#if IS_ENABLED(CONFIG_VXLAN)
> +       return vxlan_features_check(skb, features);
> +#else
>         return features;
> +#endif

I don't think this is right - when VXLAN is compiled in, it will
restrict offload capabilities to that of VXLAN. But when there is no
VXLAN support, all tunnel types will be allowed. Presumably, there is
some underlying hardware capability here for both situations (and
maybe even for GRE as well).
Shannon Nelson Sept. 9, 2015, 11:32 p.m. UTC | #2
> > From: Jesse Gross [mailto:jesse@nicira.com]
> > Sent: Friday, August 28, 2015 4:03 PM
> > To: Sullivan, Catherine
> > Cc: intel-wired-lan
> > Subject: Re: [Intel-wired-lan] [next PATCH S13 14/15] i40e: fix ip-ip
> GRE
> > encapulation

Hi Jesse, thanks for looking at these.  Please excuse this late response and possibly mangled headers, I just got back from sabbatical and am looking at old mail...

> >
> > On Fri, Aug 28, 2015 at 2:56 PM, Catherine Sullivan
> > <catherine.sullivan@intel.com> wrote:
> > > diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c
> > > b/drivers/net/ethernet/intel/i40e/i40e_main.c
> > > index 469e1d5..b686450 100644
> > > --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> > > +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
> > [...]
> > > +#if IS_ENABLED(CONFIG_VXLAN)
> > > +       return vxlan_features_check(skb, features); #else
> > >         return features;
> > > +#endif
> >
> > I don't think this is right - when VXLAN is compiled in, it will
> restrict offload
> > capabilities to that of VXLAN. But when there is no VXLAN support, all
> tunnel
> > types will be allowed. Presumably, there is some underlying hardware
> > capability here for both situations (and maybe even for GRE as well).

I don't understand where you this restriction.  Looking at the code in vxlan_features_check(), it looks to me that the features bits are mangled only if IPPROTO_UDP is set AND the other header parameters don't match VXLAN needs.  If there is no encapsulation, or it is something other than IPPROTO_UDP (e.g. IPPROTO_GRE), then there is no change to the features bits.

If you are referring to other tunneling protocols that might call themselves IPROTO_UDP, then I would think there is a problem in the implementation of vxlan_features_check(), not in how we call it.  In that case, vxlan_features_check() should be looking at some additional flag to assure VXLAN before verifying the other header parameters.

What am I missing?  Are you thinking that our calling code should look for some additional VXLAN flag before calling vxlan_features_check()?

Thanks,
sln
Jesse Gross Sept. 10, 2015, 12:52 a.m. UTC | #3
On Wed, Sep 9, 2015 at 4:32 PM, Nelson, Shannon
<shannon.nelson@intel.com> wrote:
>> > From: Jesse Gross [mailto:jesse@nicira.com]
>> > On Fri, Aug 28, 2015 at 2:56 PM, Catherine Sullivan
>> > <catherine.sullivan@intel.com> wrote:
>> > > diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c
>> > > b/drivers/net/ethernet/intel/i40e/i40e_main.c
>> > > index 469e1d5..b686450 100644
>> > > --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
>> > > +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
>> > [...]
>> > > +#if IS_ENABLED(CONFIG_VXLAN)
>> > > +       return vxlan_features_check(skb, features); #else
>> > >         return features;
>> > > +#endif
>> >
>> > I don't think this is right - when VXLAN is compiled in, it will
>> restrict offload
>> > capabilities to that of VXLAN. But when there is no VXLAN support, all
>> tunnel
>> > types will be allowed. Presumably, there is some underlying hardware
>> > capability here for both situations (and maybe even for GRE as well).
>
> I don't understand where you this restriction.  Looking at the code in vxlan_features_check(), it looks to me that the features bits are mangled only if IPPROTO_UDP is set AND the other header parameters don't match VXLAN needs.  If there is no encapsulation, or it is something other than IPPROTO_UDP (e.g. IPPROTO_GRE), then there is no change to the features bits.
>
> If you are referring to other tunneling protocols that might call themselves IPROTO_UDP, then I would think there is a problem in the implementation of vxlan_features_check(), not in how we call it.  In that case, vxlan_features_check() should be looking at some additional flag to assure VXLAN before verifying the other header parameters.
>
> What am I missing?  Are you thinking that our calling code should look for some additional VXLAN flag before calling vxlan_features_check()?

VXLAN is not the only UDP based encapsulation, so there are other
protocols that call themselves that. (I'm not quite sure what you mean
by this, it seems to imply that these protocols are somehow spoofing
something. They're not - they and VXLAN are all layered under UDP).
However, I don't think that's really the issue.

The point that I was trying to make is that there is an underlying
hardware capability as far as the types of protocols that are
understood, lengths of headers, etc. and the goal of this function is
to remove any offload features that can't be supported so the core
stack doesn't use them. vxlan_features_check() will do this by
chopping down the general NETIF_F_GSO_UDP_TUNNEL feature to only allow
headers that look like VXLAN. However, if VXLAN is not compiled in,
then with this patch that restriction won't take place and the driver
will actually receive packets with more offloads than it otherwise
would. These can come from other UDP protocols which can still be
enabled even if VXLAN is not. In this case, these protocols will
break.

That being said, I also happen to know that the underlying hardware is
capable of supporting more than just VXLAN. In particular, it supports
Geneve, which is also a UDP based protocol. Using
vxlan_features_check() here will remove offload support for at least
some types of Geneve packets, so this is a little heavy handed.

To make a long story short, it looks like the actual problem here is
that only UDP tunnels are supported (since that is all that is checked
in i40e_tx_enable_csum()). In that case, I think you could simply do:

if ((protocol != IPPROTO_UDP))
    return features & ~(NETIF_F_ALL_CSUM | NETIF_F_GSO_MASK);

And that would solve the above concerns as well as catch some
additional problematic cases.
Shannon Nelson Sept. 10, 2015, 4:54 a.m. UTC | #4
> From: Jesse Gross [mailto:jesse@nicira.com]
> Sent: Wednesday, September 09, 2015 5:52 PM
[...]
> 
> On Wed, Sep 9, 2015 at 4:32 PM, Nelson, Shannon
> <shannon.nelson@intel.com> wrote:
> >> > From: Jesse Gross [mailto:jesse@nicira.com]
> >> > On Fri, Aug 28, 2015 at 2:56 PM, Catherine Sullivan
> >> > <catherine.sullivan@intel.com> wrote:
> >> > > diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c
> >> > > b/drivers/net/ethernet/intel/i40e/i40e_main.c
> >> > > index 469e1d5..b686450 100644
> >> > > --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> >> > > +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
> >> > [...]
> >> > > +#if IS_ENABLED(CONFIG_VXLAN)
> >> > > +       return vxlan_features_check(skb, features); #else
> >> > >         return features;
> >> > > +#endif
> >> >
> >> > I don't think this is right - when VXLAN is compiled in, it will
> >> restrict offload
> >> > capabilities to that of VXLAN. But when there is no VXLAN support,
> all
> >> tunnel
> >> > types will be allowed. Presumably, there is some underlying
> hardware
> >> > capability here for both situations (and maybe even for GRE as
> well).
> >
> > I don't understand where you this restriction.  Looking at the code in
> vxlan_features_check(), it looks to me that the features bits are
> mangled only if IPPROTO_UDP is set AND the other header parameters don't
> match VXLAN needs.  If there is no encapsulation, or it is something
> other than IPPROTO_UDP (e.g. IPPROTO_GRE), then there is no change to
> the features bits.
> >
> > If you are referring to other tunneling protocols that might call
> themselves IPROTO_UDP, then I would think there is a problem in the
> implementation of vxlan_features_check(), not in how we call it.  In
> that case, vxlan_features_check() should be looking at some additional
> flag to assure VXLAN before verifying the other header parameters.
> >
> > What am I missing?  Are you thinking that our calling code should look
> for some additional VXLAN flag before calling vxlan_features_check()?
> 
> VXLAN is not the only UDP based encapsulation, so there are other
> protocols that call themselves that. (I'm not quite sure what you mean
> by this, it seems to imply that these protocols are somehow spoofing
> something. They're not - they and VXLAN are all layered under UDP).
> However, I don't think that's really the issue.

[...]

Okay, so we are on the same page, just different paragraphs...

> That being said, I also happen to know that the underlying hardware is
> capable of supporting more than just VXLAN. In particular, it supports
> Geneve, which is also a UDP based protocol. Using
> vxlan_features_check() here will remove offload support for at least
> some types of Geneve packets, so this is a little heavy handed.

It becomes an issue if you rely on vxlan_features_check(), as we are currently doing, to filter the IPPROTO_UDP.  Yes, there are other UDP encapsulations, we're just not implementing them quite yet... tho' I believe you've already seen the newer Geneve patches that were just posted.  This patch is from before that.

> To make a long story short, it looks like the actual problem here is
> that only UDP tunnels are supported (since that is all that is checked
> in i40e_tx_enable_csum()). In that case, I think you could simply do:
> 
> if ((protocol != IPPROTO_UDP))
>     return features & ~(NETIF_F_ALL_CSUM | NETIF_F_GSO_MASK);

The GRE encapsulation uses a different IPPROTO_xxx value, which is what we're dealing with in this particular patch, so we're checking that in our own function, then using the kernel's vxlan check the IPPROTO_UDP encapsulation.  Yes, this is heavy handed, and as we expand the tunnels to be supported (E>G> Geneve), we're going to need to discern each encapsulation and check them individually.

I'd like to suggest that we use this patch as is, and continue refining the driver's feature checks as we add support for more encapsulations.

In the future, as these encapsulations become supported by more devices and drivers, and the checks are coded and debugged, we'll want to move these checks into the kernel stack for all to use, along with the vxlan check.

sln
Jesse Gross Sept. 10, 2015, 6:01 a.m. UTC | #5
On Wed, Sep 9, 2015 at 9:54 PM, Nelson, Shannon
<shannon.nelson@intel.com> wrote:
>> From: Jesse Gross [mailto:jesse@nicira.com]
>> Sent: Wednesday, September 09, 2015 5:52 PM
> [...]
>>
>> On Wed, Sep 9, 2015 at 4:32 PM, Nelson, Shannon
>> <shannon.nelson@intel.com> wrote:
>> >> > From: Jesse Gross [mailto:jesse@nicira.com]
>> >> > On Fri, Aug 28, 2015 at 2:56 PM, Catherine Sullivan
>> >> > <catherine.sullivan@intel.com> wrote:
>> >> > > diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c
>> >> > > b/drivers/net/ethernet/intel/i40e/i40e_main.c
>> >> > > index 469e1d5..b686450 100644
>> >> > > --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
>> >> > > +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
>> >> > [...]
>> >> > > +#if IS_ENABLED(CONFIG_VXLAN)
>> >> > > +       return vxlan_features_check(skb, features); #else
>> >> > >         return features;
>> >> > > +#endif
>> >> >
>> >> > I don't think this is right - when VXLAN is compiled in, it will
>> >> restrict offload
>> >> > capabilities to that of VXLAN. But when there is no VXLAN support,
>> all
>> >> tunnel
>> >> > types will be allowed. Presumably, there is some underlying
>> hardware
>> >> > capability here for both situations (and maybe even for GRE as
>> well).
>> >
>> > I don't understand where you this restriction.  Looking at the code in
>> vxlan_features_check(), it looks to me that the features bits are
>> mangled only if IPPROTO_UDP is set AND the other header parameters don't
>> match VXLAN needs.  If there is no encapsulation, or it is something
>> other than IPPROTO_UDP (e.g. IPPROTO_GRE), then there is no change to
>> the features bits.
>> >
>> > If you are referring to other tunneling protocols that might call
>> themselves IPROTO_UDP, then I would think there is a problem in the
>> implementation of vxlan_features_check(), not in how we call it.  In
>> that case, vxlan_features_check() should be looking at some additional
>> flag to assure VXLAN before verifying the other header parameters.
>> >
>> > What am I missing?  Are you thinking that our calling code should look
>> for some additional VXLAN flag before calling vxlan_features_check()?
>>
>> VXLAN is not the only UDP based encapsulation, so there are other
>> protocols that call themselves that. (I'm not quite sure what you mean
>> by this, it seems to imply that these protocols are somehow spoofing
>> something. They're not - they and VXLAN are all layered under UDP).
>> However, I don't think that's really the issue.
>
> [...]
>
> Okay, so we are on the same page, just different paragraphs...
>
>> That being said, I also happen to know that the underlying hardware is
>> capable of supporting more than just VXLAN. In particular, it supports
>> Geneve, which is also a UDP based protocol. Using
>> vxlan_features_check() here will remove offload support for at least
>> some types of Geneve packets, so this is a little heavy handed.
>
> It becomes an issue if you rely on vxlan_features_check(), as we are currently doing, to filter the IPPROTO_UDP.  Yes, there are other UDP encapsulations, we're just not implementing them quite yet... tho' I believe you've already seen the newer Geneve patches that were just posted.  This patch is from before that.

I understand that but I believe those are only on the receive side and
I don't think there is any overlap here.

>> To make a long story short, it looks like the actual problem here is
>> that only UDP tunnels are supported (since that is all that is checked
>> in i40e_tx_enable_csum()). In that case, I think you could simply do:
>>
>> if ((protocol != IPPROTO_UDP))
>>     return features & ~(NETIF_F_ALL_CSUM | NETIF_F_GSO_MASK);
>
> The GRE encapsulation uses a different IPPROTO_xxx value, which is what we're dealing with in this particular patch, so we're checking that in our own function, then using the kernel's vxlan check the IPPROTO_UDP encapsulation.  Yes, this is heavy handed, and as we expand the tunnels to be supported (E>G> Geneve), we're going to need to discern each encapsulation and check them individually.
>
> I'd like to suggest that we use this patch as is, and continue refining the driver's feature checks as we add support for more encapsulations.

Unfortunately, I don't think that is a good idea as this patch
introduces bugs. The stack already implements these other protocols
and by not supporting them here that means there are no restrictions
on them. For example, it seems that Ethernet over GRE, IPIP, or Geneve
when VXLAN is not compiled in will all go out with checksum errors.
Even if you don't want to support offloads yet, you can't just send
the packets with errors.

Since the actual restriction in the driver is that non-UDP protocols
are not offloaded, I believe that the code that I suggested is the
only way to avoid dropping packets.

> In the future, as these encapsulations become supported by more devices and drivers, and the checks are coded and debugged, we'll want to move these checks into the kernel stack for all to use, along with the vxlan check.

To be honest, there is no way that is going to happen. The goal of the
core network stack is to be as generic as possible and the push is
going to be to have the drivers be more general rather than the core
kernel have more specific protocol knowledge.
Bowers, AndrewX Sept. 10, 2015, 4:15 p.m. UTC | #6
> -----Original Message-----
> From: Intel-wired-lan [mailto:intel-wired-lan-bounces@lists.osuosl.org] On
> Behalf Of Catherine Sullivan
> Sent: Friday, August 28, 2015 2:56 PM
> To: intel-wired-lan@lists.osuosl.org
> Subject: [Intel-wired-lan] [next PATCH S13 14/15] i40e: fix ip-ip GRE
> encapulation
> 
> From: Jesse Brandeburg <jesse.brandeburg@intel.com>
> 
> The i40e driver will soon be capable of offloading GRE traffic but in the
> meantime, don't let the driver tell the OS that it can offload types of traffic it
> cannot.
> 
> This fixes a bug where if you stack a GRE MAC in IP tunnel over i40e, all TSO
> packets will be sent with invalid checksums causing really bad throughput.
> 
> NOTE: I still see some retransmits that are unexplained when testing with
> vxlan or ip/mac tunnels (GRE) but the throughput is more like what is
> expected with multi-gigabit on a single flow.
> 
> Reported-by: Stefan Assman <sassman@redhat.com>
> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
> Change-ID: Ie31843b150695b2b1889e306b8c33c1d2da29a9e
> ---
>  drivers/net/ethernet/intel/i40e/i40e_main.c | 36
> ++++++++++++++++++++++++++---
>  1 file changed, 33 insertions(+), 3 deletions(-)

Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Code changes present in tree, patch breaks build of kernel. Builds normally with patch reverted. Error received is "drivers/net/ethernet/intel/i40e/i40e_main.c:8351:2: error: implicit declaration of function 'vxlan_features_check' [-Werror=implicit-function-declaration]"make
Catherine Sullivan Sept. 10, 2015, 4:56 p.m. UTC | #7
Jeff, please drop this patch for now. We will re-work it and send it in a future series. 

Thanks,
Catherine
Shannon Nelson Sept. 10, 2015, 9:53 p.m. UTC | #8
> From: Jesse Gross [mailto:jesse@nicira.com]
> Sent: Wednesday, September 09, 2015 11:02 PM
> 

Thanks, Jesse, for your perseverance.

> >> To make a long story short, it looks like the actual problem here is
> >> that only UDP tunnels are supported (since that is all that is
> checked
> >> in i40e_tx_enable_csum()). In that case, I think you could simply do:
> >>
> >> if ((protocol != IPPROTO_UDP))
> >>     return features & ~(NETIF_F_ALL_CSUM | NETIF_F_GSO_MASK);

Ah, now I see that the GRE support patch for i40e_tx_enable_csum() hasn't been pushed out yet.  This explains some of this discussion.  Okay - for the moment we're dropping this patch, to be reworked when the other patch gets pushed.

I think you're right about simply dropping the use of vxlan_features_check() as it is too specific for our needs.  This will be part of the rework.

sln
Jesse Gross Sept. 11, 2015, 2:12 a.m. UTC | #9
On Thu, Sep 10, 2015 at 2:53 PM, Nelson, Shannon
<shannon.nelson@intel.com> wrote:
>> From: Jesse Gross [mailto:jesse@nicira.com]
>> Sent: Wednesday, September 09, 2015 11:02 PM
>>
>
> Thanks, Jesse, for your perseverance.
>
>> >> To make a long story short, it looks like the actual problem here is
>> >> that only UDP tunnels are supported (since that is all that is
>> checked
>> >> in i40e_tx_enable_csum()). In that case, I think you could simply do:
>> >>
>> >> if ((protocol != IPPROTO_UDP))
>> >>     return features & ~(NETIF_F_ALL_CSUM | NETIF_F_GSO_MASK);
>
> Ah, now I see that the GRE support patch for i40e_tx_enable_csum() hasn't been pushed out yet.  This explains some of this discussion.  Okay - for the moment we're dropping this patch, to be reworked when the other patch gets pushed.
>
> I think you're right about simply dropping the use of vxlan_features_check() as it is too specific for our needs.  This will be part of the rework.

Thanks a lot for taking a look at it. Not trying to be a pain here - I
just want to make sure things work for all protocols/users.
diff mbox

Patch

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 469e1d5..b686450 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -30,6 +30,7 @@ 
 #if defined(CONFIG_VXLAN) || defined(CONFIG_VXLAN_MODULE)
 #include <net/vxlan.h>
 #endif
+#include <net/gre.h>
 
 const char i40e_driver_name[] = "i40e";
 static const char i40e_driver_string[] =
@@ -8372,12 +8373,41 @@  static netdev_features_t i40e_features_check(struct sk_buff *skb,
 					     struct net_device *dev,
 					     netdev_features_t features)
 {
-	if (skb->encapsulation &&
-	    (skb_inner_mac_header(skb) - skb_transport_header(skb) >
-	     I40E_MAX_TUNNEL_HDR_LEN))
+	u8 protocol = 0;
+	int hlen;
+
+	if (!skb->encapsulation)
+		return features;
+
+	/* prevent tunnel headers that are too long to offload from
+	 * being sent to the hardware
+	 */
+	if (skb_inner_mac_header(skb) - skb_transport_header(skb) >
+	    I40E_MAX_TUNNEL_HDR_LEN)
+		return features & ~(NETIF_F_ALL_CSUM | NETIF_F_GSO_MASK);
+
+	switch (vlan_get_protocol(skb)) {
+	case htons(ETH_P_IP):
+		protocol = ip_hdr(skb)->protocol;
+		break;
+	case htons(ETH_P_IPV6):
+		protocol = ipv6_hdr(skb)->nexthdr;
+		break;
+	}
+
+	hlen = skb_inner_mac_header(skb) - skb_transport_header(skb);
+	if ((protocol == IPPROTO_GRE) &&
+	    (skb->inner_protocol_type != ENCAP_TYPE_ETHER ||
+	     skb->inner_protocol != htons(ETH_P_TEB) ||
+	     hlen < sizeof(struct gre_base_hdr) ||
+	     hlen > (sizeof(struct gre_base_hdr) + 12)))
 		return features & ~(NETIF_F_ALL_CSUM | NETIF_F_GSO_MASK);
 
+#if IS_ENABLED(CONFIG_VXLAN)
+	return vxlan_features_check(skb, features);
+#else
 	return features;
+#endif
 }
 
 static const struct net_device_ops i40e_netdev_ops = {