diff mbox

[1/2,for,4.13] net: qcom/emac: disable flow control autonegotiation by default

Message ID 1501623460-3575-2-git-send-email-timur@codeaurora.org
State Rejected, archived
Delegated to: David Miller
Headers show

Commit Message

Timur Tabi Aug. 1, 2017, 9:37 p.m. UTC
The EMAC has a curious qwirk when RX flow control is enabled and the
kernel hangs.  With the kernel hung, the EMAC's RX queue soon fills.
If RX flow control is enabled, the EMAC will then send a non-stop
stream of pause frames until the system is reset.  The EMAC does not
have a built-in watchdog.

In various tests, the pause frame stream sometimes overloads nearby
switches, effectively disabling the network.  Since the RX queue is
large and the host processor is more than capable of handling incoming
packets quickly, the only time the EMAC will send any pause frames is
when the kernel is hung and unrecoverable.

To avoid all these problems, we disable flow control autonegotiation
by default, and only enable receiving pause frames.

Cc: stable@vger.kernel.org # 4.11.x
Signed-off-by: Timur Tabi <timur@codeaurora.org>
---
 drivers/net/ethernet/qualcomm/emac/emac.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Comments

Florian Fainelli Aug. 1, 2017, 9:55 p.m. UTC | #1
On 08/01/2017 02:37 PM, Timur Tabi wrote:
> The EMAC has a curious qwirk when RX flow control is enabled and the
> kernel hangs.  With the kernel hung, the EMAC's RX queue soon fills.
> If RX flow control is enabled, the EMAC will then send a non-stop
> stream of pause frames until the system is reset.  The EMAC does not
> have a built-in watchdog.
> 
> In various tests, the pause frame stream sometimes overloads nearby
> switches, effectively disabling the network.  Since the RX queue is
> large and the host processor is more than capable of handling incoming
> packets quickly, the only time the EMAC will send any pause frames is
> when the kernel is hung and unrecoverable.

This is not specific to your EMAC, a lot of adapters have this problem
actually.

I wonder if it would make sense to reach for a broader solution where we
could have a networking stack panic/oops notifier which will actively
clean up the active network devices' RX queue(s) and if tx_pause was
enabled, disable it. We could have drivers announce themselves as
needing this either via NETIF_F_* feature bit or some other private flag.

> 
> To avoid all these problems, we disable flow control autonegotiation
> by default, and only enable receiving pause frames.
> 
> Cc: stable@vger.kernel.org # 4.11.x
> Signed-off-by: Timur Tabi <timur@codeaurora.org>
> ---
>  drivers/net/ethernet/qualcomm/emac/emac.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/qualcomm/emac/emac.c b/drivers/net/ethernet/qualcomm/emac/emac.c
> index 60850bfa3d32..475c0ea29235 100644
> --- a/drivers/net/ethernet/qualcomm/emac/emac.c
> +++ b/drivers/net/ethernet/qualcomm/emac/emac.c
> @@ -441,8 +441,13 @@ static void emac_init_adapter(struct emac_adapter *adpt)
>  	/* others */
>  	adpt->preamble = EMAC_PREAMBLE_DEF;
>  
> -	/* default to automatic flow control */
> -	adpt->automatic = true;
> +	/* Disable transmission of pause frames by default, to avoid the
> +	 * risk of a pause frame flood that can occur if the kernel hangs.
> +	 * We still want to be able to respond to them, however.
> +	 */
> +	adpt->automatic = false;
> +	adpt->tx_flow_control = false;
> +	adpt->rx_flow_control = true;
>  }
>  
>  /* Get the clock */
>
Timur Tabi Aug. 1, 2017, 10:02 p.m. UTC | #2
On 08/01/2017 04:55 PM, Florian Fainelli wrote:
> This is not specific to your EMAC, a lot of adapters have this problem
> actually.
> 
> I wonder if it would make sense to reach for a broader solution where we
> could have a networking stack panic/oops notifier which will actively
> clean up the active network devices' RX queue(s) and if tx_pause was
> enabled, disable it. We could have drivers announce themselves as
> needing this either via NETIF_F_* feature bit or some other private flag.

Unfortunately, the problem occurs only when Linux hangs, to the point 
where the driver's interrupt handlers are blocked.  The RX queue is 256 
entries, and the processor has 48 cores, so the EMAC is never going to 
send pause frames in any real-world situation.

The only time I've seen pause frames sent out is in the lab when I halt 
the cores with a hardware debugger, and only if I have enough network 
traffic that the EMAC picks up.
Florian Fainelli Aug. 1, 2017, 10:08 p.m. UTC | #3
On 08/01/2017 03:02 PM, Timur Tabi wrote:
> On 08/01/2017 04:55 PM, Florian Fainelli wrote:
>> This is not specific to your EMAC, a lot of adapters have this problem
>> actually.
>>
>> I wonder if it would make sense to reach for a broader solution where we
>> could have a networking stack panic/oops notifier which will actively
>> clean up the active network devices' RX queue(s) and if tx_pause was
>> enabled, disable it. We could have drivers announce themselves as
>> needing this either via NETIF_F_* feature bit or some other private flag.
> 
> Unfortunately, the problem occurs only when Linux hangs, to the point
> where the driver's interrupt handlers are blocked.  The RX queue is 256
> entries, and the processor has 48 cores, so the EMAC is never going to
> send pause frames in any real-world situation.
> 
> The only time I've seen pause frames sent out is in the lab when I halt
> the cores with a hardware debugger, and only if I have enough network
> traffic that the EMAC picks up.

The size and scale of your system makes it so but imagine e.g: a single
core ~ 1Ghz @ 1Gbits/sec system having the same problems, here you are
quite likely to see the system under panic flooding the network.

Then again your patch is fine and can be revised at any time a broader
facility is offered, I just felt like we actually have a good way with
reasonably driver-agnostic code to possibly deal with that problem.

Implementing such a solution would not be a -stable backport candidate
though....
Andrew Lunn Aug. 1, 2017, 11:15 p.m. UTC | #4
On Tue, Aug 01, 2017 at 04:37:39PM -0500, Timur Tabi wrote:
> The EMAC has a curious qwirk when RX flow control is enabled and the
> kernel hangs.  With the kernel hung, the EMAC's RX queue soon fills.
> If RX flow control is enabled, the EMAC will then send a non-stop
> stream of pause frames until the system is reset.  The EMAC does not
> have a built-in watchdog.
> 
> In various tests, the pause frame stream sometimes overloads nearby
> switches, effectively disabling the network.  Since the RX queue is
> large and the host processor is more than capable of handling incoming
> packets quickly, the only time the EMAC will send any pause frames is
> when the kernel is hung and unrecoverable.
> 
> To avoid all these problems, we disable flow control autonegotiation
> by default, and only enable receiving pause frames.
> 
> Cc: stable@vger.kernel.org # 4.11.x
> Signed-off-by: Timur Tabi <timur@codeaurora.org>
> ---
>  drivers/net/ethernet/qualcomm/emac/emac.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/qualcomm/emac/emac.c b/drivers/net/ethernet/qualcomm/emac/emac.c
> index 60850bfa3d32..475c0ea29235 100644
> --- a/drivers/net/ethernet/qualcomm/emac/emac.c
> +++ b/drivers/net/ethernet/qualcomm/emac/emac.c
> @@ -441,8 +441,13 @@ static void emac_init_adapter(struct emac_adapter *adpt)
>  	/* others */
>  	adpt->preamble = EMAC_PREAMBLE_DEF;
>  
> -	/* default to automatic flow control */
> -	adpt->automatic = true;
> +	/* Disable transmission of pause frames by default, to avoid the
> +	 * risk of a pause frame flood that can occur if the kernel hangs.
> +	 * We still want to be able to respond to them, however.
> +	 */
> +	adpt->automatic = false;
> +	adpt->tx_flow_control = false;
> +	adpt->rx_flow_control = true;

Hi Timur

Pause frames are something you can auto-negotiate at the PHY
level. Should you also be clearing some bits in the phydev, so the
peer knows pause frames are not supported?

     Andrew
Timur Tabi Aug. 2, 2017, 12:56 a.m. UTC | #5
On 8/1/17 6:15 PM, Andrew Lunn wrote:
> Pause frames are something you can auto-negotiate at the PHY
> level. Should you also be clearing some bits in the phydev, so the
> peer knows pause frames are not supported?

When pause frame autonegotiation is enabled in the driver, that only 
means that the driver looks at what the PHY has autonegotiated, and then 
configures the MAC to match that.

The driver doesn't touch the PHY at all.  It leaves all that to phylib.

Now if autonegotiation is disabled in the driver, then it just 
hard-codes those TX/RX settings in the driver.  Are you saying I should 
program the PHY at the point to disable autonegotiation on the PHY 
level?  If so, then I don't know how to do that.  I just assumed that 
the MAC never tells the PHY what to do.
Andrew Lunn Aug. 2, 2017, 2:58 a.m. UTC | #6
On Tue, Aug 01, 2017 at 07:56:31PM -0500, Timur Tabi wrote:
> On 8/1/17 6:15 PM, Andrew Lunn wrote:
> >Pause frames are something you can auto-negotiate at the PHY
> >level. Should you also be clearing some bits in the phydev, so the
> >peer knows pause frames are not supported?
> 
> When pause frame autonegotiation is enabled in the driver, that only means
> that the driver looks at what the PHY has autonegotiated, and then
> configures the MAC to match that.
> 
> The driver doesn't touch the PHY at all.  It leaves all that to phylib.
> 
> Now if autonegotiation is disabled in the driver, then it just hard-codes
> those TX/RX settings in the driver.  Are you saying I should program the PHY
> at the point to disable autonegotiation on the PHY level?  If so, then I
> don't know how to do that.  I just assumed that the MAC never tells the PHY
> what to do.

Documentation/networking/phy.txt says:

Pause frames / flow control

 The PHY does not participate directly in flow control/pause frames except by
 making sure that the SUPPORTED_Pause and SUPPORTED_AsymPause bits are set in
 MII_ADVERTISE to indicate towards the link partner that the Ethernet MAC
 controller supports such a thing. Since flow control/pause frames generation
 involves the Ethernet MAC driver, it is recommended that this driver takes care
 of properly indicating advertisement and support for such features by setting
 the SUPPORTED_Pause and SUPPORTED_AsymPause bits accordingly. This can be done
 either before or after phy_connect() and/or as a result of implementing the
 ethtool::set_pauseparam feature.

So just check if the MAC driver is setting SUPPORTED_Pause and
SUPPORTED_AsymPause.

	Andrew
Timur Tabi Aug. 2, 2017, 3:22 a.m. UTC | #7
On 8/1/17 9:58 PM, Andrew Lunn wrote:
>   The PHY does not participate directly in flow control/pause frames except by
>   making sure that the SUPPORTED_Pause and SUPPORTED_AsymPause bits are set in
>   MII_ADVERTISE to indicate towards the link partner that the Ethernet MAC
>   controller supports such a thing. Since flow control/pause frames generation
>   involves the Ethernet MAC driver, it is recommended that this driver takes care
>   of properly indicating advertisement and support for such features by setting
>   the SUPPORTED_Pause and SUPPORTED_AsymPause bits accordingly. This can be done
>   either before or after phy_connect() and/or as a result of implementing the
>   ethtool::set_pauseparam feature.
> 
> So just check if the MAC driver is setting SUPPORTED_Pause and
> SUPPORTED_AsymPause.

I think this was covered in this thread: 
http://www.spinics.net/lists/netdev/msg408978.html
David Laight Aug. 2, 2017, 1:48 p.m. UTC | #8
From: Timur Tabi
> Sent: 01 August 2017 22:38
> The EMAC has a curious qwirk when RX flow control is enabled and the
> kernel hangs.  With the kernel hung, the EMAC's RX queue soon fills.
> If RX flow control is enabled, the EMAC will then send a non-stop
> stream of pause frames until the system is reset.  The EMAC does not
> have a built-in watchdog.
> 
> In various tests, the pause frame stream sometimes overloads nearby
> switches, effectively disabling the network.
...

If the nearby switches cannot handle pause frames, then the MAC shouldn't
be sending them at all.
They

I suspect that they should only ever be sent if the phy autonegotiation
indicates that they are supported.
You might want to avoid sending them even if allowed, or advertise
non-support on hardware that could support them, but sending them
anyway is likely to cause grief.

This is similar to the problems that arise when the 'speed' is forced
instead of limiting the advertised speed to one value.

	David
Timur Tabi Aug. 2, 2017, 2:21 p.m. UTC | #9
On 8/2/17 8:48 AM, David Laight wrote:
> If the nearby switches cannot handle pause frames, then the MAC shouldn't
> be sending them at all.

There's no way for me to know whether the switches can handle the pause 
frames or not.  You would think that sending one multicast pause frame 
ever 33ms would not overload a switch, but in our lab it does.

This is why I changed the default to disable sending pause frames.

> They
> 
> I suspect that they should only ever be sent if the phy autonegotiation
> indicates that they are supported.

Or if you manually enable them.  That's why I changed to default: the 
MAC is now configured by default to accept pause frames but not send them.
David Laight Aug. 2, 2017, 2:51 p.m. UTC | #10
From: Timur Tabi
> Sent: 02 August 2017 15:22
> On 8/2/17 8:48 AM, David Laight wrote:
> > If the nearby switches cannot handle pause frames, then the MAC shouldn't
> > be sending them at all.
> 
> There's no way for me to know whether the switches can handle the pause
> frames or not.  You would think that sending one multicast pause frame
> ever 33ms would not overload a switch, but in our lab it does.
> 
> This is why I changed the default to disable sending pause frames.
...

Thinks ...
Sending pause frames just tells the adjacent switch not to send you packets
(because you'll discard them).
Since the idea is to avoid the discards, the switch will buffer the
packets it would have sent.
The buffers in the switch then fill up with packets it isn't sending you.
The switch then runs out of buffers, it has 2 choices:
1) Throw the packets away.
2) Send 'pause' frames to the sources.
If it sends 'pause' frames the entire network will very quickly lock up.
If it discards the packets they might as well have been discarded by the
receiving MAC.

Doesn't this mean that pause frames are 99.999% useless??

	David
Timur Tabi Aug. 2, 2017, 3:08 p.m. UTC | #11
On 08/02/2017 09:51 AM, David Laight wrote:
> Sending pause frames just tells the adjacent switch not to send you packets
> (because you'll discard them).
> Since the idea is to avoid the discards, the switch will buffer the
> packets it would have sent.
> The buffers in the switch then fill up with packets it isn't sending you.

I was under the impression that the switch forwards the pause frames to 
other devices, so that the transmitting NIC can stop sending the data, 
but your explanation makes a lot more sense.  If the EMAC never stops 
sending pause frames, then the switch's buffers will fill up, disabling 
all other devices.  If the switch does not have per-port buffers, then 
it makes sense when the buffer is full, it blocks all ports.

> The switch then runs out of buffers, it has 2 choices:
> 1) Throw the packets away.
> 2) Send 'pause' frames to the sources.
> If it sends 'pause' frames the entire network will very quickly lock up.
> If it discards the packets they might as well have been discarded by the
> receiving MAC.
> 
> Doesn't this mean that pause frames are 99.999% useless??

Pause frames are intended for situations where the receiving CPU is 
temporarily overwhelmed and just needs a second or two to resume 
processing incoming packets.  That makes sense on a dinky single-core 
32-bit CPU.
David Laight Aug. 2, 2017, 3:38 p.m. UTC | #12
From: Timur Tabi
> Sent: 02 August 2017 16:09
> On 08/02/2017 09:51 AM, David Laight wrote:
> > Sending pause frames just tells the adjacent switch not to send you packets
> > (because you'll discard them).
> > Since the idea is to avoid the discards, the switch will buffer the
> > packets it would have sent.
> > The buffers in the switch then fill up with packets it isn't sending you.
> 
> I was under the impression that the switch forwards the pause frames to
> other devices, so that the transmitting NIC can stop sending the data,

Most of my ethernet knowledge predates FDX :-)
It wouldn't make any sense to try to use the source address of a pause
frame to suppress traffic to a specific MAC - that would have to go way down
into IP on all the receiving systems.
Also ISTR that pause frames are very short and don't even contain MAC
addresses.

> but your explanation makes a lot more sense.  If the EMAC never stops
> sending pause frames, then the switch's buffers will fill up, disabling
> all other devices.  If the switch does not have per-port buffers, then
> it makes sense when the buffer is full, it blocks all ports.

A switch will (probably) either have buffers for each input port, or for
each output port (or maybe both).
Output port buffers are less likely to cause grief when an output port is
running at a slower speed than the input port.

But is a switch is going to send pause frames it doesn't really matter
how the buffers are arranged. The cascade will still happen.

It would take very careful heuristics in a switch to manage pause
frames properly.

Of course, once the kernel has crashed, even multicast packets will
eventually run the MAC out of buffers.
(Unless you run on private network and manage to reinstall and reboot
while the MAC is still active and then eventually die when a receive
frame overwrites what used to be a receive buffer.)

	David
David Miller Aug. 2, 2017, 5:54 p.m. UTC | #13
From: Timur Tabi <timur@codeaurora.org>
Date: Tue,  1 Aug 2017 16:37:39 -0500

> The EMAC has a curious qwirk when RX flow control is enabled and the
> kernel hangs.  With the kernel hung, the EMAC's RX queue soon fills.
> If RX flow control is enabled, the EMAC will then send a non-stop
> stream of pause frames until the system is reset.  The EMAC does not
> have a built-in watchdog.
> 
> In various tests, the pause frame stream sometimes overloads nearby
> switches, effectively disabling the network.  Since the RX queue is
> large and the host processor is more than capable of handling incoming
> packets quickly, the only time the EMAC will send any pause frames is
> when the kernel is hung and unrecoverable.
> 
> To avoid all these problems, we disable flow control autonegotiation
> by default, and only enable receiving pause frames.
> 
> Cc: stable@vger.kernel.org # 4.11.x
> Signed-off-by: Timur Tabi <timur@codeaurora.org>

I've thought about this a lot and I don't like it for many reasons.

First of all, this hung kernel scenerio is completely bogus.

The ethernet chip may not have a proper watchdog for the pause frame
generation logic, but the whole kernel sure as hell does.

And if this kind of thing matters to the user, they will have a
software or hardware watchdog driver enabled to break out of this
situation.

Turning off flow control by default has so many negative ramifications
and don't try to convince me that users will be "aware" of this and
turn it back on.

They largely won't.
Timur Tabi Aug. 2, 2017, 6:23 p.m. UTC | #14
On 08/02/2017 12:54 PM, David Miller wrote:
> And if this kind of thing matters to the user, they will have a
> software or hardware watchdog driver enabled to break out of this
> situation.

The problem is that the user is not going to expect that the EMAC can 
disable the nearby switch(es) when the kernel is hung and not rebooted 
quickly enough.  Internally, this bug/feature has caused quite a bit of 
mayhem, so the problem is real.  No cares about enabling flow control -- 
it just happens to be enabled on some systems where the switch agrees to 
it.  So random individuals can't debug the hardware because suddenly the 
EMAC has gone haywire and disabled the local network.

> Turning off flow control by default has so many negative ramifications
> and don't try to convince me that users will be "aware" of this and
> turn it back on.

What are the negative ramifications?  It's practically impossible to 
overload the chip such that it can't process the incoming packets fast 
enough.  I don't know of any real-world situation where the EMAC needs 
to transmit pause frames.
David Miller Aug. 2, 2017, 6:35 p.m. UTC | #15
From: Timur Tabi <timur@codeaurora.org>
Date: Wed, 2 Aug 2017 13:23:18 -0500

> On 08/02/2017 12:54 PM, David Miller wrote:
>> And if this kind of thing matters to the user, they will have a
>> software or hardware watchdog driver enabled to break out of this
>> situation.
> 
> The problem is that the user is not going to expect that the EMAC can
> disable the nearby switch(es) when the kernel is hung and not rebooted
> quickly enough.  Internally, this bug/feature has caused quite a bit
> of mayhem, so the problem is real.  No cares about enabling flow
> control -- it just happens to be enabled on some systems where the
> switch agrees to it.  So random individuals can't debug the hardware
> because suddenly the EMAC has gone haywire and disabled the local
> network.

Again, any serious installation will have a system watchdog enabled
which will break the pause frame bomb.
David Miller Aug. 2, 2017, 6:36 p.m. UTC | #16
From: Timur Tabi <timur@codeaurora.org>
Date: Wed, 2 Aug 2017 13:23:18 -0500

> It's practically impossible to overload the chip such that it can't
> process the incoming packets fast enough.  I don't know of any
> real-world situation where the EMAC needs to transmit pause frames.

Slow cpus or very expensive stack operations can cause this, it's not
a property of the EMAC chip at all.
Timur Tabi Aug. 2, 2017, 6:39 p.m. UTC | #17
On 08/02/2017 01:35 PM, David Miller wrote:
> Again, any serious installation will have a system watchdog enabled
> which will break the pause frame bomb.

Oh well.  I guess I'll have to carry this patch internally.

What about patch #2?
David Miller Aug. 2, 2017, 11:15 p.m. UTC | #18
From: Timur Tabi <timur@codeaurora.org>
Date: Wed, 2 Aug 2017 13:39:34 -0500

> On 08/02/2017 01:35 PM, David Miller wrote:
>> Again, any serious installation will have a system watchdog enabled
>> which will break the pause frame bomb.
> 
> Oh well.  I guess I'll have to carry this patch internally.

So you don't want to run a proper watchdog on your systems?
You want them to just hang there and wait for someone to
notice instead?

> What about patch #2?

No real objection.
Timur Tabi Aug. 3, 2017, 1 a.m. UTC | #19
On 8/2/17 6:15 PM, David Miller wrote:

> So you don't want to run a proper watchdog on your systems?
> You want them to just hang there and wait for someone to
> notice instead?

This is for internal development.  We noticed the problem first during 
debugging, when we would halt a core for more than a couple minutes. 
Then the entire lab network would go down.
diff mbox

Patch

diff --git a/drivers/net/ethernet/qualcomm/emac/emac.c b/drivers/net/ethernet/qualcomm/emac/emac.c
index 60850bfa3d32..475c0ea29235 100644
--- a/drivers/net/ethernet/qualcomm/emac/emac.c
+++ b/drivers/net/ethernet/qualcomm/emac/emac.c
@@ -441,8 +441,13 @@  static void emac_init_adapter(struct emac_adapter *adpt)
 	/* others */
 	adpt->preamble = EMAC_PREAMBLE_DEF;
 
-	/* default to automatic flow control */
-	adpt->automatic = true;
+	/* Disable transmission of pause frames by default, to avoid the
+	 * risk of a pause frame flood that can occur if the kernel hangs.
+	 * We still want to be able to respond to them, however.
+	 */
+	adpt->automatic = false;
+	adpt->tx_flow_control = false;
+	adpt->rx_flow_control = true;
 }
 
 /* Get the clock */