ezchip: nps_enet: check if napi has been completed

Submitted by Zakharov Vlad on March 29, 2017, 10:41 a.m.

Details

Message ID 1490784106-14489-1-git-send-email-vzakhar@synopsys.com
State New
Headers show

Commit Message

Zakharov Vlad March 29, 2017, 10:41 a.m.
After a new NAPI_STATE_MISSED state was added to NAPI we can get into
this state and in such case we have to reschedule NAPI as some work is
still pending and we have to process it. napi_complete_done() function
returns false if we have to reschedule something (e.g. in case we were
in MISSED state) as current polling have not been completed yet.

nps_enet driver hasn't been verifying the return value of
napi_complete_done() and has been forcibly enabling interrupts. That is
not correct as we should not enable interrupts before we have processed
all scheduled work. As a result we were getting trapped in interrupt
hanlder chain as we had never been able to disabale ethernet
interrupts again.

So this patch makes nps_enet_poll() func verify return value of
napi_complete_done() and enable interrupts only in case all scheduled
work has been completed.

Signed-off-by: Vlad Zakharov <vzakhar@synopsys.com>
---
 drivers/net/ethernet/ezchip/nps_enet.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

Comments

David Miller March 29, 2017, 9:30 p.m.
From: Vlad Zakharov <Vladislav.Zakharov@synopsys.com>
Date: Wed, 29 Mar 2017 13:41:46 +0300

> After a new NAPI_STATE_MISSED state was added to NAPI we can get into
> this state and in such case we have to reschedule NAPI as some work is
> still pending and we have to process it. napi_complete_done() function
> returns false if we have to reschedule something (e.g. in case we were
> in MISSED state) as current polling have not been completed yet.
> 
> nps_enet driver hasn't been verifying the return value of
> napi_complete_done() and has been forcibly enabling interrupts. That is
> not correct as we should not enable interrupts before we have processed
> all scheduled work. As a result we were getting trapped in interrupt
> hanlder chain as we had never been able to disabale ethernet
> interrupts again.
> 
> So this patch makes nps_enet_poll() func verify return value of
> napi_complete_done() and enable interrupts only in case all scheduled
> work has been completed.
> 
> Signed-off-by: Vlad Zakharov <vzakhar@synopsys.com>

Applied.

Eric, if this is really required now, we have 148 broken drivers still.
Eric Dumazet March 29, 2017, 9:41 p.m.
On Wed, Mar 29, 2017 at 2:30 PM, David Miller <davem@davemloft.net> wrote:
Signed-off-by: Vlad Zakharov <vzakhar@synopsys.com>
>
> Applied.
>
> Eric, if this is really required now, we have 148 broken drivers still.

Piece of cake :/

If we get more reports like that, we might implement a logic to
prevent infinite loops.

It is not clear to me what exactly happened to this driver, since
testing napi_complete_done() was not mandatory.
Zakharov Vlad March 30, 2017, 9:16 a.m.
Hi Eric,

On Wed, 2017-03-29 at 14:41 -0700, Eric Dumazet wrote:
> On Wed, Mar 29, 2017 at 2:30 PM, David Miller <davem@davemloft.net> wrote:

> Signed-off-by: Vlad Zakharov <vzakhar@synopsys.com>

> > 

> > 

> > Applied.

> > 

> > Eric, if this is really required now, we have 148 broken drivers still.

> 

> Piece of cake :/

> 

> If we get more reports like that, we might implement a logic to

> prevent infinite loops.

> 

> It is not clear to me what exactly happened to this driver, since

> testing napi_complete_done() was not mandatory.


I am not sure what is happening with other drivers, but in case of ezchip nps_enet driver after the following commit:
39e6c8208d7b6fb9d2047850fb3327db567b564b

if we got into NAPI_STATE_MISSED state the following happened:
in nps_enet_poll func we were calling napi_complete_done() (which reset MISSED state but left SCHED state) and after
that without any checks were enabling interrupts.

Then we obviously were getting into nps_enet_irq_hanlder() if irq was pending (it is very possbile state). If we look
inside this function we will see that it disables interrupts only in case napi_sched_prep() returns true. In its turn
napi_sched_prep() returns true only in case it changes state from non-SCHED to SCHED. But in our case as SCHED had been
already set it set MISSED state and then returned false. So at that point we had already been trapped: after exiting irq
hanlder we were getting into nps_enet_irq_hanlder() again and again as we couldn't rescind pending irq signal and
disable corresponding irq. 

-- 
Best regards,
Vlad Zakharov <vzakhar@synopsys.com>
Eric Dumazet March 30, 2017, 1:25 p.m.
On Thu, Mar 30, 2017 at 2:16 AM, Vlad Zakharov
<Vladislav.Zakharov@synopsys.com> wrote:

> I am not sure what is happening with other drivers, but in case of ezchip nps_enet driver after the following commit:
> 39e6c8208d7b6fb9d2047850fb3327db567b564b
>
> if we got into NAPI_STATE_MISSED state the following happened:
> in nps_enet_poll func we were calling napi_complete_done() (which reset MISSED state but left SCHED state) and after
> that without any checks were enabling interrupts.
>
> Then we obviously were getting into nps_enet_irq_hanlder() if irq was pending (it is very possbile state). If we look
> inside this function we will see that it disables interrupts only in case napi_sched_prep() returns true. In its turn
> napi_sched_prep() returns true only in case it changes state from non-SCHED to SCHED. But in our case as SCHED had been
> already set it set MISSED state and then returned false. So at that point we had already been trapped: after exiting irq
> hanlder we were getting into nps_enet_irq_hanlder() again and again as we couldn't rescind pending irq signal and
> disable corresponding irq.

Hi Vlad

Considering the use of  napi_schedule_prep() in
nps_enet_irq_handler(), it is strange that nps_enet_poll() uses :

if (nps_enet_is_tx_pending(priv)) {
     nps_enet_reg_set(priv, NPS_ENET_REG_BUF_INT_ENABLE, 0);
     napi_reschedule(napi); // note the return value is ignored...
}

So nps_enet_poll() was enabling interrupts, then disabling them, which
seems very unusual.

I need to check all drivers using napi_schedule_prep() and/or
napi_reschedule() and make sure they also test napi_comple_done()
return value...

I count 12 drivers using napi_reschedule() without checking its return value.
They either should check its return value, or use conventional napi_schedule()

Thanks !
Zakharov Vlad April 26, 2017, 12:56 p.m.
Hi David, all,

On Wed, 2017-03-29 at 14:30 -0700, David Miller wrote:
> From: Vlad Zakharov <Vladislav.Zakharov@synopsys.com>

> Date: Wed, 29 Mar 2017 13:41:46 +0300

> 

> > 

> > After a new NAPI_STATE_MISSED state was added to NAPI we can get into

> > this state and in such case we have to reschedule NAPI as some work is

> > still pending and we have to process it. napi_complete_done() function

> > returns false if we have to reschedule something (e.g. in case we were

> > in MISSED state) as current polling have not been completed yet.

> > 

> > nps_enet driver hasn't been verifying the return value of

> > napi_complete_done() and has been forcibly enabling interrupts. That is

> > not correct as we should not enable interrupts before we have processed

> > all scheduled work. As a result we were getting trapped in interrupt

> > hanlder chain as we had never been able to disabale ethernet

> > interrupts again.

> > 

> > So this patch makes nps_enet_poll() func verify return value of

> > napi_complete_done() and enable interrupts only in case all scheduled

> > work has been completed.

> > 

> > Signed-off-by: Vlad Zakharov <vzakhar@synopsys.com>

> 

> Applied.

> 

> Eric, if this is really required now, we have 148 broken drivers still.


Could you please backport this patch to stable tree (starting from 4.10) as it is pretty important and fixes nps_enet
driver?

It became actual after Eric's commit 39e6c8208d7b (net: solve a NAPI race) which has already been backported to 4.10.

Thanks.

-- 
Best regards,
Vlad Zakharov <vzakhar@synopsys.com>

Patch hide | download patch | download mbox

diff --git a/drivers/net/ethernet/ezchip/nps_enet.c b/drivers/net/ethernet/ezchip/nps_enet.c
index 992ebe9..f819843 100644
--- a/drivers/net/ethernet/ezchip/nps_enet.c
+++ b/drivers/net/ethernet/ezchip/nps_enet.c
@@ -189,11 +189,9 @@  static int nps_enet_poll(struct napi_struct *napi, int budget)
 
 	nps_enet_tx_handler(ndev);
 	work_done = nps_enet_rx_handler(ndev);
-	if (work_done < budget) {
+	if ((work_done < budget) && napi_complete_done(napi, work_done)) {
 		u32 buf_int_enable_value = 0;
 
-		napi_complete_done(napi, work_done);
-
 		/* set tx_done and rx_rdy bits */
 		buf_int_enable_value |= NPS_ENET_ENABLE << RX_RDY_SHIFT;
 		buf_int_enable_value |= NPS_ENET_ENABLE << TX_DONE_SHIFT;