diff mbox series

[v2] bnx2x: disable GSO where gso_size is too big for hardware

Message ID 20180111235905.10110-1-dja@axtens.net
State Changes Requested, archived
Delegated to: David Miller
Headers show
Series [v2] bnx2x: disable GSO where gso_size is too big for hardware | expand

Commit Message

Daniel Axtens Jan. 11, 2018, 11:59 p.m. UTC
If a bnx2x card is passed a GSO packet with a gso_size larger than
~9700 bytes, it will cause a firmware error that will bring the card
down:

bnx2x: [bnx2x_attn_int_deasserted3:4323(enP24p1s0f0)]MC assert!
bnx2x: [bnx2x_mc_assert:720(enP24p1s0f0)]XSTORM_ASSERT_LIST_INDEX 0x2
bnx2x: [bnx2x_mc_assert:736(enP24p1s0f0)]XSTORM_ASSERT_INDEX 0x0 = 0x00000000 0x25e43e47 0x00463e01 0x00010052
bnx2x: [bnx2x_mc_assert:750(enP24p1s0f0)]Chip Revision: everest3, FW Version: 7_13_1
... (dump of values continues) ...

Detect when gso_size + header length is greater than the maximum
packet size (9700 bytes) and disable GSO. For simplicity and speed
this is approximated by comparing gso_size against 9200 and assuming
no-one will have more than 500 bytes of headers.

This raises the obvious question - how do we end up with a packet with
a gso_size that's greater than 9700? This has been observed on an
powerpc system when Open vSwitch is forwarding a packet from an
ibmveth device.

ibmveth is a bit special. It's the driver for communication between
virtual machines (aka 'partitions'/LPARs) running under IBM's
proprietary hypervisor on ppc machines. It allows sending very large
packets (up to 64kB) between LPARs. This involves some quite
'interesting' things: for example, when talking TCP, the MSS is stored
the checksum field (see ibmveth_rx_mss_helper() in ibmveth.c).

Normally on a box like this, there would be a Virtual I/O Server
(VIOS) partition that owns the physical network card. VIOS lets the
AIX partitions know when they're talking to a real network and that
they should drop their MSS. This works fine if VIOS owns the physical
network card.

However, in this case, a Linux partition owns the card (this is known
as a NovaLink setup). The negotiation between VIOS and AIX uses a
non-standard TCP option, so Linux has never supported that.  Instead,
Linux just supports receiving large packets. It doesn't support any
form of messaging/MSS negotiation back to other LPARs.

To get some clarity about where the large MSS was coming from, I asked
Thomas Falcon, the maintainer of ibmveth, for some background:

"In most cases, large segments are an aggregation of smaller packets
by the Virtual I/O Server (VIOS) partition and then are forwarded to
the Linux LPAR / ibmveth driver. These segments can be as large as
64KB. In this case, since the customer is using Novalink, I believe
what is happening is pretty straightforward: the large segments are
created by the AIX partition and then forwarded to the Linux
partition, ... The ibmveth driver doesn't do any aggregation itself
but just ensures the proper bits are set before sending the frame up
to avoid giving the upper layers indigestion."

It is possible to stop AIX from sending these large segments, but it
requires configuration on each LPAR. While ibmveth's behaviour is
admittedly weird, we should fix this here: it shouldn't be possible
for it to cause a firmware panic on another card.

Cc: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> # ibmveth
Cc: Yuval Mintz <Yuval.Mintz@cavium.com> # bnx2x
Thanks-to: Jay Vosburgh <jay.vosburgh@canonical.com> # veth info
Signed-off-by: Daniel Axtens <dja@axtens.net>

---
v2: change to a feature check as suggested by Eric Dumazet.

---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

Comments

Marcelo Ricardo Leitner Jan. 12, 2018, 11:48 a.m. UTC | #1
On Fri, Jan 12, 2018 at 10:59:05AM +1100, Daniel Axtens wrote:
> If a bnx2x card is passed a GSO packet with a gso_size larger than
> ~9700 bytes, it will cause a firmware error that will bring the card
> down:

Why not use netif_set_gso_max_size() instead?
Some drivers are using it to avoid such larger than supported packets.
Marcelo Ricardo Leitner Jan. 12, 2018, 11:53 a.m. UTC | #2
On Fri, Jan 12, 2018 at 09:48:44AM -0200, Marcelo Ricardo Leitner wrote:
> On Fri, Jan 12, 2018 at 10:59:05AM +1100, Daniel Axtens wrote:
> > If a bnx2x card is passed a GSO packet with a gso_size larger than
> > ~9700 bytes, it will cause a firmware error that will bring the card
> > down:
> 
> Why not use netif_set_gso_max_size() instead?
> Some drivers are using it to avoid such larger than supported packets.

Scratch that, sorry. It doesn't take effect in forwarding path.

  Marcelo
Chopra, Manish Jan. 15, 2018, 2:58 p.m. UTC | #3
> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
> On Behalf Of Daniel Axtens
> Sent: Friday, January 12, 2018 5:29 AM
> To: netdev@vger.kernel.org
> Cc: Daniel Axtens <dja@axtens.net>; Thomas Falcon
> <tlfalcon@linux.vnet.ibm.com>; Mintz, Yuval <Yuval.Mintz@cavium.com>
> Subject: [PATCH v2] bnx2x: disable GSO where gso_size is too big for hardware
> 
> If a bnx2x card is passed a GSO packet with a gso_size larger than
> ~9700 bytes, it will cause a firmware error that will bring the card
> down:
> 
> bnx2x: [bnx2x_attn_int_deasserted3:4323(enP24p1s0f0)]MC assert!
> bnx2x: [bnx2x_mc_assert:720(enP24p1s0f0)]XSTORM_ASSERT_LIST_INDEX 0x2
> bnx2x: [bnx2x_mc_assert:736(enP24p1s0f0)]XSTORM_ASSERT_INDEX 0x0 =
> 0x00000000 0x25e43e47 0x00463e01 0x00010052
> bnx2x: [bnx2x_mc_assert:750(enP24p1s0f0)]Chip Revision: everest3, FW
> Version: 7_13_1 ... (dump of values continues) ...
> 
> Detect when gso_size + header length is greater than the maximum packet size
> (9700 bytes) and disable GSO. For simplicity and speed this is approximated by
> comparing gso_size against 9200 and assuming no-one will have more than 500
> bytes of headers.
> 
> This raises the obvious question - how do we end up with a packet with a
> gso_size that's greater than 9700? This has been observed on an powerpc
> system when Open vSwitch is forwarding a packet from an ibmveth device.
> 
> ibmveth is a bit special. It's the driver for communication between virtual
> machines (aka 'partitions'/LPARs) running under IBM's proprietary hypervisor on
> ppc machines. It allows sending very large packets (up to 64kB) between LPARs.
> This involves some quite 'interesting' things: for example, when talking TCP, the
> MSS is stored the checksum field (see ibmveth_rx_mss_helper() in ibmveth.c).
> 
> Normally on a box like this, there would be a Virtual I/O Server
> (VIOS) partition that owns the physical network card. VIOS lets the AIX partitions
> know when they're talking to a real network and that they should drop their
> MSS. This works fine if VIOS owns the physical network card.
> 
> However, in this case, a Linux partition owns the card (this is known as a
> NovaLink setup). The negotiation between VIOS and AIX uses a non-standard
> TCP option, so Linux has never supported that.  Instead, Linux just supports
> receiving large packets. It doesn't support any form of messaging/MSS
> negotiation back to other LPARs.
> 
> To get some clarity about where the large MSS was coming from, I asked
> Thomas Falcon, the maintainer of ibmveth, for some background:
> 
> "In most cases, large segments are an aggregation of smaller packets by the
> Virtual I/O Server (VIOS) partition and then are forwarded to the Linux LPAR /
> ibmveth driver. These segments can be as large as 64KB. In this case, since the
> customer is using Novalink, I believe what is happening is pretty straightforward:
> the large segments are created by the AIX partition and then forwarded to the
> Linux partition, ... The ibmveth driver doesn't do any aggregation itself but just
> ensures the proper bits are set before sending the frame up to avoid giving the
> upper layers indigestion."
> 
> It is possible to stop AIX from sending these large segments, but it requires
> configuration on each LPAR. While ibmveth's behaviour is admittedly weird, we
> should fix this here: it shouldn't be possible for it to cause a firmware panic on
> another card.
> 
> Cc: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> # ibmveth
> Cc: Yuval Mintz <Yuval.Mintz@cavium.com> # bnx2x
> Thanks-to: Jay Vosburgh <jay.vosburgh@canonical.com> # veth info
> Signed-off-by: Daniel Axtens <dja@axtens.net>
> 
> ---
> v2: change to a feature check as suggested by Eric Dumazet.
> 
> ---
>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
> b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
> index 7b08323e3f3d..bab909b5d7a2 100644
> --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
> +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
> @@ -12934,6 +12934,17 @@ static netdev_features_t
> bnx2x_features_check(struct sk_buff *skb,
>  					      struct net_device *dev,
>  					      netdev_features_t features)
>  {
> +	/*
> +	 * A skb with gso_size + header length > 9700 will cause a
> +	 * firmware panic. Drop GSO support.
> +	 *
> +	 * To avoid costly calculations on all packets (and because
> +	 * super-jumbo frames are rare), allow 500 bytes of headers
> +	 * and just disable GSO if gso_size is greater than 9200.
> +	 */
> +	if (unlikely(skb_is_gso(skb) && skb_shinfo(skb)->gso_size > 9200))
> +		features &= ~NETIF_F_GSO_MASK;
> +
>  	features = vlan_features_check(skb, features);
>  	return vxlan_features_check(skb, features);  }
> --
> 2.14.1

This is merely a question on kernel stack, how kernel might end up sending such GSO packets to the device
considering MTUs of underlined device ? Not sure if this should be handled at driver level or at kernel stack level ?

In general, when I run iperf with MTU 1500, I get gso_size as 1448.
When running with 9600 MTU, I get gso_size as 9548.

With this change, considering Jumbo MTU, even if device is able to handle a packet with gso_size like 9548, it will cause it to disable GSO
Considering jumbo frames are rare, doesn't sound well. Since there are customers/people which might be utilizing jumbo MTUs.

Thanks,
Manish
David Miller Jan. 15, 2018, 7:06 p.m. UTC | #4
From: Daniel Axtens <dja@axtens.net>
Date: Fri, 12 Jan 2018 10:59:05 +1100

> If a bnx2x card is passed a GSO packet with a gso_size larger than
> ~9700 bytes, it will cause a firmware error that will bring the card
> down:
> 
> bnx2x: [bnx2x_attn_int_deasserted3:4323(enP24p1s0f0)]MC assert!
> bnx2x: [bnx2x_mc_assert:720(enP24p1s0f0)]XSTORM_ASSERT_LIST_INDEX 0x2
> bnx2x: [bnx2x_mc_assert:736(enP24p1s0f0)]XSTORM_ASSERT_INDEX 0x0 = 0x00000000 0x25e43e47 0x00463e01 0x00010052
> bnx2x: [bnx2x_mc_assert:750(enP24p1s0f0)]Chip Revision: everest3, FW Version: 7_13_1
> ... (dump of values continues) ...
> 
> Detect when gso_size + header length is greater than the maximum
> packet size (9700 bytes) and disable GSO. For simplicity and speed
> this is approximated by comparing gso_size against 9200 and assuming
> no-one will have more than 500 bytes of headers.

What is the MTU size configured on the bnx2x device when these 9700
byte packets are seen?

If it's less than 9700, whatever is allowing your device (openvswitch,
ibmveth, whatever) needs to be fixed.

I don't like this at all, quite frankly.  We'll have one device now that
has this special check, probably many others can run into this situation
as well but they won't be used on these kinds of powerpc boxes and
therefore nobody is going to notice.

I'm not applying this without more information or better justification,
sorry.
Stephen Hemminger Jan. 15, 2018, 7:44 p.m. UTC | #5
On Fri, 12 Jan 2018 09:53:23 -0200
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> wrote:

> On Fri, Jan 12, 2018 at 09:48:44AM -0200, Marcelo Ricardo Leitner wrote:
> > On Fri, Jan 12, 2018 at 10:59:05AM +1100, Daniel Axtens wrote:  
> > > If a bnx2x card is passed a GSO packet with a gso_size larger than
> > > ~9700 bytes, it will cause a firmware error that will bring the card
> > > down:  
> > 
> > Why not use netif_set_gso_max_size() instead?
> > Some drivers are using it to avoid such larger than supported packets.  
> 
> Scratch that, sorry. It doesn't take effect in forwarding path.
> 
>   Marcelo

It still makes sense to tell TCP not to generate huge GSO packets.

For the forwarding path, there needs to be some software recovery
logic in net transmit path.
Daniel Axtens Jan. 16, 2018, 12:12 a.m. UTC | #6
David Miller <davem@davemloft.net> writes:

> From: Daniel Axtens <dja@axtens.net>
> Date: Fri, 12 Jan 2018 10:59:05 +1100
>
>> If a bnx2x card is passed a GSO packet with a gso_size larger than
>> ~9700 bytes, it will cause a firmware error that will bring the card
>> down:
>> 
>> bnx2x: [bnx2x_attn_int_deasserted3:4323(enP24p1s0f0)]MC assert!
>> bnx2x: [bnx2x_mc_assert:720(enP24p1s0f0)]XSTORM_ASSERT_LIST_INDEX 0x2
>> bnx2x: [bnx2x_mc_assert:736(enP24p1s0f0)]XSTORM_ASSERT_INDEX 0x0 = 0x00000000 0x25e43e47 0x00463e01 0x00010052
>> bnx2x: [bnx2x_mc_assert:750(enP24p1s0f0)]Chip Revision: everest3, FW Version: 7_13_1
>> ... (dump of values continues) ...
>> 
>> Detect when gso_size + header length is greater than the maximum
>> packet size (9700 bytes) and disable GSO. For simplicity and speed
>> this is approximated by comparing gso_size against 9200 and assuming
>> no-one will have more than 500 bytes of headers.
>
> What is the MTU size configured on the bnx2x device when these 9700
> byte packets are seen?
>
> If it's less than 9700, whatever is allowing your device (openvswitch,
> ibmveth, whatever) needs to be fixed.

Sure, I had an approach that checks the gso_size in is_skb_forwardable
and the equivalent openvswitch path - I'll send that.

Regards,
Daniel

>
> I don't like this at all, quite frankly.  We'll have one device now that
> has this special check, probably many others can run into this situation
> as well but they won't be used on these kinds of powerpc boxes and
> therefore nobody is going to notice.
>
> I'm not applying this without more information or better justification,
> sorry.
diff mbox series

Patch

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 7b08323e3f3d..bab909b5d7a2 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -12934,6 +12934,17 @@  static netdev_features_t bnx2x_features_check(struct sk_buff *skb,
 					      struct net_device *dev,
 					      netdev_features_t features)
 {
+	/*
+	 * A skb with gso_size + header length > 9700 will cause a
+	 * firmware panic. Drop GSO support.
+	 *
+	 * To avoid costly calculations on all packets (and because
+	 * super-jumbo frames are rare), allow 500 bytes of headers
+	 * and just disable GSO if gso_size is greater than 9200.
+	 */
+	if (unlikely(skb_is_gso(skb) && skb_shinfo(skb)->gso_size > 9200))
+		features &= ~NETIF_F_GSO_MASK;
+
 	features = vlan_features_check(skb, features);
 	return vxlan_features_check(skb, features);
 }