Message ID | 1501623460-3575-2-git-send-email-timur@codeaurora.org |
---|---|
State | Rejected, archived |
Delegated to: | David Miller |
Headers | show |
On 08/01/2017 02:37 PM, Timur Tabi wrote: > The EMAC has a curious qwirk when RX flow control is enabled and the > kernel hangs. With the kernel hung, the EMAC's RX queue soon fills. > If RX flow control is enabled, the EMAC will then send a non-stop > stream of pause frames until the system is reset. The EMAC does not > have a built-in watchdog. > > In various tests, the pause frame stream sometimes overloads nearby > switches, effectively disabling the network. Since the RX queue is > large and the host processor is more than capable of handling incoming > packets quickly, the only time the EMAC will send any pause frames is > when the kernel is hung and unrecoverable. This is not specific to your EMAC, a lot of adapters have this problem actually. I wonder if it would make sense to reach for a broader solution where we could have a networking stack panic/oops notifier which will actively clean up the active network devices' RX queue(s) and if tx_pause was enabled, disable it. We could have drivers announce themselves as needing this either via NETIF_F_* feature bit or some other private flag. > > To avoid all these problems, we disable flow control autonegotiation > by default, and only enable receiving pause frames. > > Cc: stable@vger.kernel.org # 4.11.x > Signed-off-by: Timur Tabi <timur@codeaurora.org> > --- > drivers/net/ethernet/qualcomm/emac/emac.c | 9 +++++++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/ethernet/qualcomm/emac/emac.c b/drivers/net/ethernet/qualcomm/emac/emac.c > index 60850bfa3d32..475c0ea29235 100644 > --- a/drivers/net/ethernet/qualcomm/emac/emac.c > +++ b/drivers/net/ethernet/qualcomm/emac/emac.c > @@ -441,8 +441,13 @@ static void emac_init_adapter(struct emac_adapter *adpt) > /* others */ > adpt->preamble = EMAC_PREAMBLE_DEF; > > - /* default to automatic flow control */ > - adpt->automatic = true; > + /* Disable transmission of pause frames by default, to avoid the > + * risk of a pause frame flood that can occur if the kernel hangs. > + * We still want to be able to respond to them, however. > + */ > + adpt->automatic = false; > + adpt->tx_flow_control = false; > + adpt->rx_flow_control = true; > } > > /* Get the clock */ >
On 08/01/2017 04:55 PM, Florian Fainelli wrote: > This is not specific to your EMAC, a lot of adapters have this problem > actually. > > I wonder if it would make sense to reach for a broader solution where we > could have a networking stack panic/oops notifier which will actively > clean up the active network devices' RX queue(s) and if tx_pause was > enabled, disable it. We could have drivers announce themselves as > needing this either via NETIF_F_* feature bit or some other private flag. Unfortunately, the problem occurs only when Linux hangs, to the point where the driver's interrupt handlers are blocked. The RX queue is 256 entries, and the processor has 48 cores, so the EMAC is never going to send pause frames in any real-world situation. The only time I've seen pause frames sent out is in the lab when I halt the cores with a hardware debugger, and only if I have enough network traffic that the EMAC picks up.
On 08/01/2017 03:02 PM, Timur Tabi wrote: > On 08/01/2017 04:55 PM, Florian Fainelli wrote: >> This is not specific to your EMAC, a lot of adapters have this problem >> actually. >> >> I wonder if it would make sense to reach for a broader solution where we >> could have a networking stack panic/oops notifier which will actively >> clean up the active network devices' RX queue(s) and if tx_pause was >> enabled, disable it. We could have drivers announce themselves as >> needing this either via NETIF_F_* feature bit or some other private flag. > > Unfortunately, the problem occurs only when Linux hangs, to the point > where the driver's interrupt handlers are blocked. The RX queue is 256 > entries, and the processor has 48 cores, so the EMAC is never going to > send pause frames in any real-world situation. > > The only time I've seen pause frames sent out is in the lab when I halt > the cores with a hardware debugger, and only if I have enough network > traffic that the EMAC picks up. The size and scale of your system makes it so but imagine e.g: a single core ~ 1Ghz @ 1Gbits/sec system having the same problems, here you are quite likely to see the system under panic flooding the network. Then again your patch is fine and can be revised at any time a broader facility is offered, I just felt like we actually have a good way with reasonably driver-agnostic code to possibly deal with that problem. Implementing such a solution would not be a -stable backport candidate though....
On Tue, Aug 01, 2017 at 04:37:39PM -0500, Timur Tabi wrote: > The EMAC has a curious qwirk when RX flow control is enabled and the > kernel hangs. With the kernel hung, the EMAC's RX queue soon fills. > If RX flow control is enabled, the EMAC will then send a non-stop > stream of pause frames until the system is reset. The EMAC does not > have a built-in watchdog. > > In various tests, the pause frame stream sometimes overloads nearby > switches, effectively disabling the network. Since the RX queue is > large and the host processor is more than capable of handling incoming > packets quickly, the only time the EMAC will send any pause frames is > when the kernel is hung and unrecoverable. > > To avoid all these problems, we disable flow control autonegotiation > by default, and only enable receiving pause frames. > > Cc: stable@vger.kernel.org # 4.11.x > Signed-off-by: Timur Tabi <timur@codeaurora.org> > --- > drivers/net/ethernet/qualcomm/emac/emac.c | 9 +++++++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/ethernet/qualcomm/emac/emac.c b/drivers/net/ethernet/qualcomm/emac/emac.c > index 60850bfa3d32..475c0ea29235 100644 > --- a/drivers/net/ethernet/qualcomm/emac/emac.c > +++ b/drivers/net/ethernet/qualcomm/emac/emac.c > @@ -441,8 +441,13 @@ static void emac_init_adapter(struct emac_adapter *adpt) > /* others */ > adpt->preamble = EMAC_PREAMBLE_DEF; > > - /* default to automatic flow control */ > - adpt->automatic = true; > + /* Disable transmission of pause frames by default, to avoid the > + * risk of a pause frame flood that can occur if the kernel hangs. > + * We still want to be able to respond to them, however. > + */ > + adpt->automatic = false; > + adpt->tx_flow_control = false; > + adpt->rx_flow_control = true; Hi Timur Pause frames are something you can auto-negotiate at the PHY level. Should you also be clearing some bits in the phydev, so the peer knows pause frames are not supported? Andrew
On 8/1/17 6:15 PM, Andrew Lunn wrote: > Pause frames are something you can auto-negotiate at the PHY > level. Should you also be clearing some bits in the phydev, so the > peer knows pause frames are not supported? When pause frame autonegotiation is enabled in the driver, that only means that the driver looks at what the PHY has autonegotiated, and then configures the MAC to match that. The driver doesn't touch the PHY at all. It leaves all that to phylib. Now if autonegotiation is disabled in the driver, then it just hard-codes those TX/RX settings in the driver. Are you saying I should program the PHY at the point to disable autonegotiation on the PHY level? If so, then I don't know how to do that. I just assumed that the MAC never tells the PHY what to do.
On Tue, Aug 01, 2017 at 07:56:31PM -0500, Timur Tabi wrote: > On 8/1/17 6:15 PM, Andrew Lunn wrote: > >Pause frames are something you can auto-negotiate at the PHY > >level. Should you also be clearing some bits in the phydev, so the > >peer knows pause frames are not supported? > > When pause frame autonegotiation is enabled in the driver, that only means > that the driver looks at what the PHY has autonegotiated, and then > configures the MAC to match that. > > The driver doesn't touch the PHY at all. It leaves all that to phylib. > > Now if autonegotiation is disabled in the driver, then it just hard-codes > those TX/RX settings in the driver. Are you saying I should program the PHY > at the point to disable autonegotiation on the PHY level? If so, then I > don't know how to do that. I just assumed that the MAC never tells the PHY > what to do. Documentation/networking/phy.txt says: Pause frames / flow control The PHY does not participate directly in flow control/pause frames except by making sure that the SUPPORTED_Pause and SUPPORTED_AsymPause bits are set in MII_ADVERTISE to indicate towards the link partner that the Ethernet MAC controller supports such a thing. Since flow control/pause frames generation involves the Ethernet MAC driver, it is recommended that this driver takes care of properly indicating advertisement and support for such features by setting the SUPPORTED_Pause and SUPPORTED_AsymPause bits accordingly. This can be done either before or after phy_connect() and/or as a result of implementing the ethtool::set_pauseparam feature. So just check if the MAC driver is setting SUPPORTED_Pause and SUPPORTED_AsymPause. Andrew
On 8/1/17 9:58 PM, Andrew Lunn wrote: > The PHY does not participate directly in flow control/pause frames except by > making sure that the SUPPORTED_Pause and SUPPORTED_AsymPause bits are set in > MII_ADVERTISE to indicate towards the link partner that the Ethernet MAC > controller supports such a thing. Since flow control/pause frames generation > involves the Ethernet MAC driver, it is recommended that this driver takes care > of properly indicating advertisement and support for such features by setting > the SUPPORTED_Pause and SUPPORTED_AsymPause bits accordingly. This can be done > either before or after phy_connect() and/or as a result of implementing the > ethtool::set_pauseparam feature. > > So just check if the MAC driver is setting SUPPORTED_Pause and > SUPPORTED_AsymPause. I think this was covered in this thread: http://www.spinics.net/lists/netdev/msg408978.html
From: Timur Tabi > Sent: 01 August 2017 22:38 > The EMAC has a curious qwirk when RX flow control is enabled and the > kernel hangs. With the kernel hung, the EMAC's RX queue soon fills. > If RX flow control is enabled, the EMAC will then send a non-stop > stream of pause frames until the system is reset. The EMAC does not > have a built-in watchdog. > > In various tests, the pause frame stream sometimes overloads nearby > switches, effectively disabling the network. ... If the nearby switches cannot handle pause frames, then the MAC shouldn't be sending them at all. They I suspect that they should only ever be sent if the phy autonegotiation indicates that they are supported. You might want to avoid sending them even if allowed, or advertise non-support on hardware that could support them, but sending them anyway is likely to cause grief. This is similar to the problems that arise when the 'speed' is forced instead of limiting the advertised speed to one value. David
On 8/2/17 8:48 AM, David Laight wrote: > If the nearby switches cannot handle pause frames, then the MAC shouldn't > be sending them at all. There's no way for me to know whether the switches can handle the pause frames or not. You would think that sending one multicast pause frame ever 33ms would not overload a switch, but in our lab it does. This is why I changed the default to disable sending pause frames. > They > > I suspect that they should only ever be sent if the phy autonegotiation > indicates that they are supported. Or if you manually enable them. That's why I changed to default: the MAC is now configured by default to accept pause frames but not send them.
From: Timur Tabi > Sent: 02 August 2017 15:22 > On 8/2/17 8:48 AM, David Laight wrote: > > If the nearby switches cannot handle pause frames, then the MAC shouldn't > > be sending them at all. > > There's no way for me to know whether the switches can handle the pause > frames or not. You would think that sending one multicast pause frame > ever 33ms would not overload a switch, but in our lab it does. > > This is why I changed the default to disable sending pause frames. ... Thinks ... Sending pause frames just tells the adjacent switch not to send you packets (because you'll discard them). Since the idea is to avoid the discards, the switch will buffer the packets it would have sent. The buffers in the switch then fill up with packets it isn't sending you. The switch then runs out of buffers, it has 2 choices: 1) Throw the packets away. 2) Send 'pause' frames to the sources. If it sends 'pause' frames the entire network will very quickly lock up. If it discards the packets they might as well have been discarded by the receiving MAC. Doesn't this mean that pause frames are 99.999% useless?? David
On 08/02/2017 09:51 AM, David Laight wrote: > Sending pause frames just tells the adjacent switch not to send you packets > (because you'll discard them). > Since the idea is to avoid the discards, the switch will buffer the > packets it would have sent. > The buffers in the switch then fill up with packets it isn't sending you. I was under the impression that the switch forwards the pause frames to other devices, so that the transmitting NIC can stop sending the data, but your explanation makes a lot more sense. If the EMAC never stops sending pause frames, then the switch's buffers will fill up, disabling all other devices. If the switch does not have per-port buffers, then it makes sense when the buffer is full, it blocks all ports. > The switch then runs out of buffers, it has 2 choices: > 1) Throw the packets away. > 2) Send 'pause' frames to the sources. > If it sends 'pause' frames the entire network will very quickly lock up. > If it discards the packets they might as well have been discarded by the > receiving MAC. > > Doesn't this mean that pause frames are 99.999% useless?? Pause frames are intended for situations where the receiving CPU is temporarily overwhelmed and just needs a second or two to resume processing incoming packets. That makes sense on a dinky single-core 32-bit CPU.
From: Timur Tabi > Sent: 02 August 2017 16:09 > On 08/02/2017 09:51 AM, David Laight wrote: > > Sending pause frames just tells the adjacent switch not to send you packets > > (because you'll discard them). > > Since the idea is to avoid the discards, the switch will buffer the > > packets it would have sent. > > The buffers in the switch then fill up with packets it isn't sending you. > > I was under the impression that the switch forwards the pause frames to > other devices, so that the transmitting NIC can stop sending the data, Most of my ethernet knowledge predates FDX :-) It wouldn't make any sense to try to use the source address of a pause frame to suppress traffic to a specific MAC - that would have to go way down into IP on all the receiving systems. Also ISTR that pause frames are very short and don't even contain MAC addresses. > but your explanation makes a lot more sense. If the EMAC never stops > sending pause frames, then the switch's buffers will fill up, disabling > all other devices. If the switch does not have per-port buffers, then > it makes sense when the buffer is full, it blocks all ports. A switch will (probably) either have buffers for each input port, or for each output port (or maybe both). Output port buffers are less likely to cause grief when an output port is running at a slower speed than the input port. But is a switch is going to send pause frames it doesn't really matter how the buffers are arranged. The cascade will still happen. It would take very careful heuristics in a switch to manage pause frames properly. Of course, once the kernel has crashed, even multicast packets will eventually run the MAC out of buffers. (Unless you run on private network and manage to reinstall and reboot while the MAC is still active and then eventually die when a receive frame overwrites what used to be a receive buffer.) David
From: Timur Tabi <timur@codeaurora.org> Date: Tue, 1 Aug 2017 16:37:39 -0500 > The EMAC has a curious qwirk when RX flow control is enabled and the > kernel hangs. With the kernel hung, the EMAC's RX queue soon fills. > If RX flow control is enabled, the EMAC will then send a non-stop > stream of pause frames until the system is reset. The EMAC does not > have a built-in watchdog. > > In various tests, the pause frame stream sometimes overloads nearby > switches, effectively disabling the network. Since the RX queue is > large and the host processor is more than capable of handling incoming > packets quickly, the only time the EMAC will send any pause frames is > when the kernel is hung and unrecoverable. > > To avoid all these problems, we disable flow control autonegotiation > by default, and only enable receiving pause frames. > > Cc: stable@vger.kernel.org # 4.11.x > Signed-off-by: Timur Tabi <timur@codeaurora.org> I've thought about this a lot and I don't like it for many reasons. First of all, this hung kernel scenerio is completely bogus. The ethernet chip may not have a proper watchdog for the pause frame generation logic, but the whole kernel sure as hell does. And if this kind of thing matters to the user, they will have a software or hardware watchdog driver enabled to break out of this situation. Turning off flow control by default has so many negative ramifications and don't try to convince me that users will be "aware" of this and turn it back on. They largely won't.
On 08/02/2017 12:54 PM, David Miller wrote: > And if this kind of thing matters to the user, they will have a > software or hardware watchdog driver enabled to break out of this > situation. The problem is that the user is not going to expect that the EMAC can disable the nearby switch(es) when the kernel is hung and not rebooted quickly enough. Internally, this bug/feature has caused quite a bit of mayhem, so the problem is real. No cares about enabling flow control -- it just happens to be enabled on some systems where the switch agrees to it. So random individuals can't debug the hardware because suddenly the EMAC has gone haywire and disabled the local network. > Turning off flow control by default has so many negative ramifications > and don't try to convince me that users will be "aware" of this and > turn it back on. What are the negative ramifications? It's practically impossible to overload the chip such that it can't process the incoming packets fast enough. I don't know of any real-world situation where the EMAC needs to transmit pause frames.
From: Timur Tabi <timur@codeaurora.org> Date: Wed, 2 Aug 2017 13:23:18 -0500 > On 08/02/2017 12:54 PM, David Miller wrote: >> And if this kind of thing matters to the user, they will have a >> software or hardware watchdog driver enabled to break out of this >> situation. > > The problem is that the user is not going to expect that the EMAC can > disable the nearby switch(es) when the kernel is hung and not rebooted > quickly enough. Internally, this bug/feature has caused quite a bit > of mayhem, so the problem is real. No cares about enabling flow > control -- it just happens to be enabled on some systems where the > switch agrees to it. So random individuals can't debug the hardware > because suddenly the EMAC has gone haywire and disabled the local > network. Again, any serious installation will have a system watchdog enabled which will break the pause frame bomb.
From: Timur Tabi <timur@codeaurora.org> Date: Wed, 2 Aug 2017 13:23:18 -0500 > It's practically impossible to overload the chip such that it can't > process the incoming packets fast enough. I don't know of any > real-world situation where the EMAC needs to transmit pause frames. Slow cpus or very expensive stack operations can cause this, it's not a property of the EMAC chip at all.
On 08/02/2017 01:35 PM, David Miller wrote: > Again, any serious installation will have a system watchdog enabled > which will break the pause frame bomb. Oh well. I guess I'll have to carry this patch internally. What about patch #2?
From: Timur Tabi <timur@codeaurora.org> Date: Wed, 2 Aug 2017 13:39:34 -0500 > On 08/02/2017 01:35 PM, David Miller wrote: >> Again, any serious installation will have a system watchdog enabled >> which will break the pause frame bomb. > > Oh well. I guess I'll have to carry this patch internally. So you don't want to run a proper watchdog on your systems? You want them to just hang there and wait for someone to notice instead? > What about patch #2? No real objection.
On 8/2/17 6:15 PM, David Miller wrote: > So you don't want to run a proper watchdog on your systems? > You want them to just hang there and wait for someone to > notice instead? This is for internal development. We noticed the problem first during debugging, when we would halt a core for more than a couple minutes. Then the entire lab network would go down.
diff --git a/drivers/net/ethernet/qualcomm/emac/emac.c b/drivers/net/ethernet/qualcomm/emac/emac.c index 60850bfa3d32..475c0ea29235 100644 --- a/drivers/net/ethernet/qualcomm/emac/emac.c +++ b/drivers/net/ethernet/qualcomm/emac/emac.c @@ -441,8 +441,13 @@ static void emac_init_adapter(struct emac_adapter *adpt) /* others */ adpt->preamble = EMAC_PREAMBLE_DEF; - /* default to automatic flow control */ - adpt->automatic = true; + /* Disable transmission of pause frames by default, to avoid the + * risk of a pause frame flood that can occur if the kernel hangs. + * We still want to be able to respond to them, however. + */ + adpt->automatic = false; + adpt->tx_flow_control = false; + adpt->rx_flow_control = true; } /* Get the clock */
The EMAC has a curious qwirk when RX flow control is enabled and the kernel hangs. With the kernel hung, the EMAC's RX queue soon fills. If RX flow control is enabled, the EMAC will then send a non-stop stream of pause frames until the system is reset. The EMAC does not have a built-in watchdog. In various tests, the pause frame stream sometimes overloads nearby switches, effectively disabling the network. Since the RX queue is large and the host processor is more than capable of handling incoming packets quickly, the only time the EMAC will send any pause frames is when the kernel is hung and unrecoverable. To avoid all these problems, we disable flow control autonegotiation by default, and only enable receiving pause frames. Cc: stable@vger.kernel.org # 4.11.x Signed-off-by: Timur Tabi <timur@codeaurora.org> --- drivers/net/ethernet/qualcomm/emac/emac.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)