[net] net: ethernet: fec: prevent tx starvation under high rx load

Message ID: 20200629191601.5169-1-tobias@waldekranz.com
State: Changes Requested
Delegated to: David Miller

Commit Message

Tobias Waldekranz June 29, 2020, 7:16 p.m. UTC
In the ISR, we poll the event register for the queues in need of
service and then enter polled mode. After this point, the event
register will never be read again until we exit polled mode.

In a scenario where a UDP flow is routed back out through the same
interface, i.e. "router-on-a-stick", we'll typically only see an rx
queue event initially. Once we start to process the incoming flow
we'll be locked in polled mode, but we'll never clean the tx rings since
that event is never caught.

Eventually the netdev watchdog will trip, causing all buffers to be
dropped and then the process starts over again.

By adding a poll of the active events at each NAPI call, we avoid the
starvation.

Fixes: 4d494cdc92b3 ("net: fec: change data structure to support multiqueue")
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
---
 drivers/net/ethernet/freescale/fec_main.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)
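
For context, here is a condensed sketch of the pre-patch flow the commit
message describes. This is a paraphrase, not verbatim driver code: the
point is that IEVENT is sampled only in the ISR, so events raised while
NAPI is active are never latched.

	/* Condensed sketch of the pre-patch flow (paraphrased) */

	static irqreturn_t fec_enet_interrupt(int irq, void *dev_id)
	{
		struct net_device *ndev = dev_id;
		struct fec_enet_private *fep = netdev_priv(ndev);
		uint int_events = readl(fep->hwp + FEC_IEVENT);

		/* The only point where IEVENT is sampled: latch events
		 * into the fep->work_rx / fep->work_tx bitmaps.
		 */
		fec_enet_collect_events(fep, int_events);

		if ((fep->work_tx || fep->work_rx) && fep->link) {
			napi_schedule(&fep->napi);	/* enter polled mode */
			return IRQ_HANDLED;
		}

		return IRQ_NONE;
	}

	static int fec_enet_rx_napi(struct napi_struct *napi, int budget)
	{
		struct net_device *ndev = napi->dev;
		int pkts = fec_enet_rx(ndev, budget);

		/* IEVENT is never re-read here: a tx completion raised
		 * while we stay in polled mode sets no bit in
		 * fep->work_tx, so fec_enet_tx() finds no queue to
		 * clean -- tx starvation.
		 */
		fec_enet_tx(ndev);

		if (pkts < budget)
			napi_complete_done(napi, pkts);	/* unmask the IRQ */

		return pkts;
	}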

Comments

David Miller June 29, 2020, 8:07 p.m. UTC | #1
From: Tobias Waldekranz <tobias@waldekranz.com>
Date: Mon, 29 Jun 2020 21:16:01 +0200

> In the ISR, we poll the event register for the queues in need of
> service and then enter polled mode. After this point, the event
> register will never be read again until we exit polled mode.
> 
> In a scenario where a UDP flow is routed back out through the same
> interface, i.e. "router-on-a-stick", we'll typically only see an rx
> queue event initially. Once we start to process the incoming flow
> we'll be locked in polled mode, but we'll never clean the tx rings since
> that event is never caught.
> 
> Eventually the netdev watchdog will trip, causing all buffers to be
> dropped and then the process starts over again.
> 
> By adding a poll of the active events at each NAPI call, we avoid the
> starvation.
> 
> Fixes: 4d494cdc92b3 ("net: fec: change data structure to support multiqueue")
> Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>

I don't see how this can happen since you process the TX queue
unconditionally every NAPI pass, regardless of what bits you see
set in the IEVENT register.

Or don't you?  Oh, I see, you don't:

	for_each_set_bit(queue_id, &fep->work_tx, FEC_ENET_MAX_TX_QS) {

That's the problem.  Just unconditionally process the TX work regardless
of what is in IEVENT.  That whole ->tx_work member and the code that
uses it can just be deleted.  fec_enet_collect_events() can just return
a boolean saying whether there is any RX or TX work at all.

Then your performance and latency will be even better in this situation.
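
A minimal sketch of what the rework suggested above could look like.
This is hypothetical -- the patch as posted keeps ->work_tx/->work_rx,
so the shape here is an assumption based on the suggestion, not the
applied change.

	/* Hypothetical rework: no ->work_tx/->work_rx bookkeeping;
	 * collect_events() only reports whether anything is pending.
	 */
	static bool fec_enet_collect_events(struct fec_enet_private *fep)
	{
		uint int_events = readl(fep->hwp + FEC_IEVENT);

		/* Don't clear MDIO events, we poll for those */
		int_events &= ~FEC_ENET_MII;
		writel(int_events, fep->hwp + FEC_IEVENT);

		return int_events != 0;
	}

	static int fec_enet_rx_napi(struct napi_struct *napi, int budget)
	{
		struct net_device *ndev = napi->dev;
		struct fec_enet_private *fep = netdev_priv(ndev);
		int pkts;

		/* Ack whatever is pending, then service rx and tx
		 * unconditionally; no per-queue IEVENT bits consulted.
		 */
		fec_enet_collect_events(fep);

		pkts = fec_enet_rx(ndev, budget);
		fec_enet_tx(ndev);

		if (pkts < budget)
			napi_complete_done(napi, pkts);

		return pkts;
	}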
Andy Duan June 30, 2020, 6:28 a.m. UTC | #2
From: Tobias Waldekranz <tobias@waldekranz.com> Sent: Tuesday, June 30, 2020 3:16 AM
> In the ISR, we poll the event register for the queues in need of service and
> then enter polled mode. After this point, the event register will never be read
> again until we exit polled mode.
> 
> In a scenario where a UDP flow is routed back out through the same interface,
> i.e. "router-on-a-stick", we'll typically only see an rx queue event initially.
> Once we start to process the incoming flow we'll be locked in polled mode, but
> we'll never clean the tx rings since that event is never caught.
> 
> Eventually the netdev watchdog will trip, causing all buffers to be dropped and
> then the process starts over again.
> 
> By adding a poll of the active events at each NAPI call, we avoid the
> starvation.
> 
> Fixes: 4d494cdc92b3 ("net: fec: change data structure to support multiqueue")
> Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>

Acked-by: Fugang Duan <fugang.duan@nxp.com>
> ---
>  drivers/net/ethernet/freescale/fec_main.c | 22 +++++++++++++---------
>  1 file changed, 13 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
> index 2d0d313ee7c5..f6e25c2d2c8c 100644
> --- a/drivers/net/ethernet/freescale/fec_main.c
> +++ b/drivers/net/ethernet/freescale/fec_main.c
> @@ -1617,8 +1617,17 @@ fec_enet_rx(struct net_device *ndev, int budget)
>  }
> 
>  static bool
> -fec_enet_collect_events(struct fec_enet_private *fep, uint int_events)
> +fec_enet_collect_events(struct fec_enet_private *fep)
>  {
> +       uint int_events;
> +
> +       int_events = readl(fep->hwp + FEC_IEVENT);
> +
> +       /* Don't clear MDIO events, we poll for those */
> +       int_events &= ~FEC_ENET_MII;
> +
> +       writel(int_events, fep->hwp + FEC_IEVENT);
> +
>         if (int_events == 0)
>                 return false;
> 
> @@ -1644,16 +1653,9 @@ fec_enet_interrupt(int irq, void *dev_id)
>  {
>         struct net_device *ndev = dev_id;
>         struct fec_enet_private *fep = netdev_priv(ndev);
> -       uint int_events;
>         irqreturn_t ret = IRQ_NONE;
> 
> -       int_events = readl(fep->hwp + FEC_IEVENT);
> -
> -       /* Don't clear MDIO events, we poll for those */
> -       int_events &= ~FEC_ENET_MII;
> -
> -       writel(int_events, fep->hwp + FEC_IEVENT);
> -       fec_enet_collect_events(fep, int_events);
> +       fec_enet_collect_events(fep);
> 
>         if ((fep->work_tx || fep->work_rx) && fep->link) {
>                 ret = IRQ_HANDLED;
> @@ -1674,6 +1676,8 @@ static int fec_enet_rx_napi(struct napi_struct *napi, int budget)
>         struct fec_enet_private *fep = netdev_priv(ndev);
>         int pkts;
> 
> +       fec_enet_collect_events(fep);
> +
>         pkts = fec_enet_rx(ndev, budget);
> 
>         fec_enet_tx(ndev);
> --
> 2.17.1
Tobias Waldekranz June 30, 2020, 6:39 a.m. UTC | #3
On Mon Jun 29, 2020 at 3:07 PM CEST, David Miller wrote:
> From: Tobias Waldekranz <tobias@waldekranz.com>
> Date: Mon, 29 Jun 2020 21:16:01 +0200
>
> > In the ISR, we poll the event register for the queues in need of
> > service and then enter polled mode. After this point, the event
> > register will never be read again until we exit polled mode.
> > 
> > In a scenario where a UDP flow is routed back out through the same
> > interface, i.e. "router-on-a-stick", we'll typically only see an rx
> > queue event initially. Once we start to process the incoming flow
> > we'll be locked in polled mode, but we'll never clean the tx rings since
> > that event is never caught.
> > 
> > Eventually the netdev watchdog will trip, causing all buffers to be
> > dropped and then the process starts over again.
> > 
> > By adding a poll of the active events at each NAPI call, we avoid the
> > starvation.
> > 
> > Fixes: 4d494cdc92b3 ("net: fec: change data structure to support multiqueue")
> > Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
>
> I don't see how this can happen since you process the TX queue
> unconditionally every NAPI pass, regardless of what bits you see
> set in the IEVENT register.
>
> Or don't you? Oh, I see, you don't:
>
> for_each_set_bit(queue_id, &fep->work_tx, FEC_ENET_MAX_TX_QS) {
>
> That's the problem. Just unconditionally process the TX work regardless
> of what is in IEVENT. That whole ->tx_work member and the code that
> uses it can just be deleted. fec_enet_collect_events() can just return
> a boolean saying whether there is any RX or TX work at all.

Maybe Andy could chime in here, but I think the ->tx_work construction
is load-bearing. It seems to me like that is the only thing stopping
us from trying to process non-existent queues on older versions of the
silicon, which only have a single queue.
David Miller June 30, 2020, 7:58 p.m. UTC | #4
From: "Tobias Waldekranz" <tobias@waldekranz.com>
Date: Tue, 30 Jun 2020 08:39:58 +0200

> On Mon Jun 29, 2020 at 3:07 PM CEST, David Miller wrote:
>> I don't see how this can happen since you process the TX queue
>> unconditionally every NAPI pass, regardless of what bits you see
>> set in the IEVENT register.
>>
>> Or don't you? Oh, I see, you don't:
>>
>> for_each_set_bit(queue_id, &fep->work_tx, FEC_ENET_MAX_TX_QS) {
>>
>> That's the problem. Just unconditionally process the TX work regardless
>> of what is in IEVENT. That whole ->tx_work member and the code that
>> uses it can just be deleted. fec_enet_collect_events() can just return
>> a boolean saying whether there is any RX or TX work at all.
> 
> Maybe Andy could chime in here, but I think the ->tx_work construction
> is load bearing. It seems to me like that is the only thing stopping
> us from trying to process non-existing queues on older versions of the
> silicon which only has a single queue.

Then iterate over "actually existing" queues.

My primary point still stands.
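
One possible shape of "iterate over actually existing queues" -- a
sketch under the assumption that fep->num_tx_queues holds the probed
queue count (which is how this driver tracks it) and that
fec_enet_tx_queue() cleans a single ring:

	/* Sketch: bound the tx-clean loop by the probed queue count
	 * instead of IEVENT bits, so single-queue silicon never
	 * touches rings it does not have. Walking from the highest
	 * queue down keeps the AVB queues serviced before the
	 * best-effort queue.
	 */
	static void fec_enet_tx(struct net_device *ndev)
	{
		struct fec_enet_private *fep = netdev_priv(ndev);
		int i;

		for (i = fep->num_tx_queues - 1; i >= 0; i--)
			fec_enet_tx_queue(ndev, i);
	}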
Andy Duan July 1, 2020, 3:22 a.m. UTC | #5
From: David Miller <davem@davemloft.net> Sent: Wednesday, July 1, 2020 3:58 AM
> From: "Tobias Waldekranz" <tobias@waldekranz.com>
> Date: Tue, 30 Jun 2020 08:39:58 +0200
> 
> > On Mon Jun 29, 2020 at 3:07 PM CEST, David Miller wrote:
> >> I don't see how this can happen since you process the TX queue
> >> unconditionally every NAPI pass, regardless of what bits you see set
> >> in the IEVENT register.
> >>
> >> Or don't you? Oh, I see, you don't:
> >>
> >> for_each_set_bit(queue_id, &fep->work_tx, FEC_ENET_MAX_TX_QS) {
> >>
> >> That's the problem. Just unconditionally process the TX work
> >> regardless of what is in IEVENT. That whole ->tx_work member and the
> >> code that uses it can just be deleted. fec_enet_collect_events() can
> >> just return a boolean saying whether there is any RX or TX work at all.
> >
> > Maybe Andy could chime in here, but I think the ->tx_work construction
> > is load bearing. It seems to me like that is the only thing stopping
> > us from trying to process non-existing queues on older versions of the
> > silicon which only has a single queue.
> 
> Then iterate over "actually existing" queues.
Yes, we can iterate over the real queues, but only bit 2 has the chance
to be set, so it is compatible with single-queue silicon.
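
For reference, the mapping Andy is describing looked roughly like this
in the driver at the time. This is a paraphrase from memory -- the
wrapper function name is hypothetical and the macro names may differ
slightly -- but it shows why only bit 2 can be set on single-queue
parts:

	/* Hypothetical wrapper around the old event-to-bitmap mapping:
	 * queue 0's TXF lands in bit 2 of ->work_tx, while the AVB
	 * queues 1 and 2 land in bits 0 and 1, so for_each_set_bit()
	 * walks the AVB queues first. Single-queue silicon can only
	 * raise TXF_0, hence "only bit 2".
	 */
	static void fec_enet_map_tx_events(struct fec_enet_private *fep,
					   uint int_events)
	{
		if (int_events & FEC_ENET_TXF_0)
			fep->work_tx |= (1 << 2);
		if (int_events & FEC_ENET_TXF_1)
			fep->work_tx |= (1 << 0);
		if (int_events & FEC_ENET_TXF_2)
			fep->work_tx |= (1 << 1);
	}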

Patch

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 2d0d313ee7c5..f6e25c2d2c8c 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -1617,8 +1617,17 @@  fec_enet_rx(struct net_device *ndev, int budget)
 }
 
 static bool
-fec_enet_collect_events(struct fec_enet_private *fep, uint int_events)
+fec_enet_collect_events(struct fec_enet_private *fep)
 {
+	uint int_events;
+
+	int_events = readl(fep->hwp + FEC_IEVENT);
+
+	/* Don't clear MDIO events, we poll for those */
+	int_events &= ~FEC_ENET_MII;
+
+	writel(int_events, fep->hwp + FEC_IEVENT);
+
 	if (int_events == 0)
 		return false;
 
@@ -1644,16 +1653,9 @@  fec_enet_interrupt(int irq, void *dev_id)
 {
 	struct net_device *ndev = dev_id;
 	struct fec_enet_private *fep = netdev_priv(ndev);
-	uint int_events;
 	irqreturn_t ret = IRQ_NONE;
 
-	int_events = readl(fep->hwp + FEC_IEVENT);
-
-	/* Don't clear MDIO events, we poll for those */
-	int_events &= ~FEC_ENET_MII;
-
-	writel(int_events, fep->hwp + FEC_IEVENT);
-	fec_enet_collect_events(fep, int_events);
+	fec_enet_collect_events(fep);
 
 	if ((fep->work_tx || fep->work_rx) && fep->link) {
 		ret = IRQ_HANDLED;
@@ -1674,6 +1676,8 @@  static int fec_enet_rx_napi(struct napi_struct *napi, int budget)
 	struct fec_enet_private *fep = netdev_priv(ndev);
 	int pkts;
 
+	fec_enet_collect_events(fep);
+
 	pkts = fec_enet_rx(ndev, budget);
 
 	fec_enet_tx(ndev);