diff mbox

[resend,1/3] Revert "net: fec: fix missing napi_disable call"

Message ID CAHrpEqSi+caO5gSo1BZRGqytxhCOs-OuyjiCaG+7fnsnFBj77g@mail.gmail.com
State Superseded, archived
Delegated to: David Miller
Headers show

Commit Message

Zhi Li April 27, 2013, 10:26 a.m. UTC
2013/4/27 Robert Schwebel <r.schwebel@pengutronix.de>:
> On Fri, Apr 26, 2013 at 02:33:09PM -0400, David Miller wrote:
>> From: Robert Schwebel <r.schwebel@pengutronix.de>
>> Date: Fri, 26 Apr 2013 15:44:15 +0200
>>
>> > Seriously - it's friday, and 3.9 is expected to come out this
>> > weekend.
>>
>> Seriously, it took you how long to notice the breakage and report
>> it in sufficient detail for the author to make an attempt at a fix?
>>
>> I thnk Frank's request is reasonable given the circumstances, please
>> work closely with him on the fix.
>
> The FEC driver has worked fine in 3.8.x.
>
> Frank's patches for the 3.9 cycle...
>
> - remove locking in a way that memory is freed which is in use

The purpose of remove lock is that fix dead lock.
It is true I miss a execute path to reset BD descript.
But the possibility is very low.

You report this problem without detail reproduce steps.

> - break the driver, up to a point where the kernel oopses when the link
>   goes away

run script while [ 1 ]; do ethtool -s eth0 autoneg off;sleep 3;ethtool
-s eth0 autoneg on; sleep 4; done;

I just capture one oops during 1 hours.
I am try to fix this issue without add back lock.

I plan use below way to fix this issue.  I am running my stress test.
if you have time, you can run it at your sit.

> - mix up different changes (queue handling) and should have been split up
>   into separate patches
>
> Unfortunately, the breakage happens only on multicore (MX6Q) and if you
> change the link status; that's probably the reason why it hasn't been
> noticed earlier.
>
> We have really tried to find a "quick fix which does it right", but it
> has turned out that this isn't possible in such a short time, because it
> is more complex than just re-adding locks. We feel that the results of
> last week's activities are not good enough that they could be merged
> without further breakage.
>
> Please consider to merge the reverts. Otherwhise FEC will be broken in 3.9.
>
> Of course, we can help with a real solution, but please after 3.9.final.
>
> Thanks,
> Robert
> --
> Pengutronix e.K.                           |                             |
> Industrial Linux Solutions                 | http://www.pengutronix.de/  |
> Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
> Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Francois Romieu April 27, 2013, 12:06 p.m. UTC | #1
Frank Li <lznuaa@gmail.com> :
[...]
> diff --git a/drivers/net/ethernet/freescale/fec.c
> b/drivers/net/ethernet/freescale/fe
> index 73195f6..c945bb7 100644
> --- a/drivers/net/ethernet/freescale/fec.c
> +++ b/drivers/net/ethernet/freescale/fec.c
> @@ -403,6 +403,11 @@ fec_restart(struct net_device *ndev, int duplex)
>         const struct platform_device_id *id_entry =
>                                 platform_get_device_id(fep->pdev);
>         int i;
> +       if (netif_running(ndev)) {
> +               napi_disable(&fep->napi);

napi_disable may sleep.

fec_restart can be called with spinlock held in fec_enet_adjust_link
Zhi Li April 27, 2013, 12:43 p.m. UTC | #2
2013/4/27, Francois Romieu <romieu@fr.zoreil.com>:
> Frank Li <lznuaa@gmail.com> :
> [...]
>> diff --git a/drivers/net/ethernet/freescale/fec.c
>> b/drivers/net/ethernet/freescale/fe
>> index 73195f6..c945bb7 100644
>> --- a/drivers/net/ethernet/freescale/fec.c
>> +++ b/drivers/net/ethernet/freescale/fec.c
>> @@ -403,6 +403,11 @@ fec_restart(struct net_device *ndev, int duplex)
>>         const struct platform_device_id *id_entry =
>>                                 platform_get_device_id(fep->pdev);
>>         int i;
>> +       if (netif_running(ndev)) {
>> +               napi_disable(&fep->napi);
>
> napi_disable may sleep.
>
> fec_restart can be called with spinlock held in fec_enet_adjust_link

You are right. Spin lock in FEC enet adjust link can be removed when
remove spinlock in tx and RX.

>
> --
> Ueimor
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Francois Romieu April 27, 2013, 7:16 p.m. UTC | #3
Frank Li <lznuaa@gmail.com> :
> 2013/4/27, Francois Romieu <romieu@fr.zoreil.com>:
[...]
> > napi_disable may sleep.
> >
> > fec_restart can be called with spinlock held in fec_enet_adjust_link
> 
> You are right. Spin lock in FEC enet adjust link can be removed when
> remove spinlock in tx and RX.

fec_restart is also called from the netdev watchdog handler (fec_timeout)
and tasklet can't sleep either. You should imho schedule_work from
fec_timeout.

How is the driver supposed to avoid the napi context
fec_enet_tx:netif_queue_stopped vs fec_enet_start_xmit:netif_stop_queue
race since both can run on different cpu without any read/write ordering
enforcement between one thread netif_* and its peer {dirty/cur}_tx ?
Zhi Li April 28, 2013, 3:09 a.m. UTC | #4
2013/4/28 Francois Romieu <romieu@fr.zoreil.com>:
> Frank Li <lznuaa@gmail.com> :
>> 2013/4/27, Francois Romieu <romieu@fr.zoreil.com>:
> [...]
>> > napi_disable may sleep.
>> >
>> > fec_restart can be called with spinlock held in fec_enet_adjust_link
>>
>> You are right. Spin lock in FEC enet adjust link can be removed when
>> remove spinlock in tx and RX.
>
> fec_restart is also called from the netdev watchdog handler (fec_timeout)
> and tasklet can't sleep either. You should imho schedule_work from
> fec_timeout.

I will add delay_work to fix this issue.

>
> How is the driver supposed to avoid the napi context
> fec_enet_tx:netif_queue_stopped vs fec_enet_start_xmit:netif_stop_queue
> race since both can run on different cpu without any read/write ordering
> enforcement between one thread netif_* and its peer {dirty/cur}_tx ?

This is lockless design if only one read and one write. It likes kfifo.

Assume CPU1 run xmit.  CPU2 run napi fec_enet_tx

(1)       if (fep->cur_tx == fep->dirty_tx)
(2)                 netif_stop_queue(ndev);

if CPU2 update fep->dirty_tx before CPU1 run (1).  the condition is false.
if CPU2 update fep->dirty_tx after (1) before (2),   netif_stop_queue
will be called.  There are not problem at this time even though queue
is not full here.  queue is not empty for sure here. fec_enet_tx will
be called again when NAPI trigger by one frame finished transfer,
fec_enet_tx will wake up send queue.

Most worse case is waste one BD script.

>
> --
> Ueimor
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zhi Li April 28, 2013, 4:31 a.m. UTC | #5
2013/4/28 Frank Li <lznuaa@gmail.com>:
> 2013/4/28 Francois Romieu <romieu@fr.zoreil.com>:
>> Frank Li <lznuaa@gmail.com> :
>>> 2013/4/27, Francois Romieu <romieu@fr.zoreil.com>:
>> [...]
>>> > napi_disable may sleep.
>>> >
>>> > fec_restart can be called with spinlock held in fec_enet_adjust_link
>>>
>>> You are right. Spin lock in FEC enet adjust link can be removed when
>>> remove spinlock in tx and RX.
>>
>> fec_restart is also called from the netdev watchdog handler (fec_timeout)
>> and tasklet can't sleep either. You should imho schedule_work from
>> fec_timeout.
>
> I will add delay_work to fix this issue.
>
>>
>> How is the driver supposed to avoid the napi context
>> fec_enet_tx:netif_queue_stopped vs fec_enet_start_xmit:netif_stop_queue
>> race since both can run on different cpu without any read/write ordering
>> enforcement between one thread netif_* and its peer {dirty/cur}_tx ?
>
> This is lockless design if only one read and one write. It likes kfifo.
>
> Assume CPU1 run xmit.  CPU2 run napi fec_enet_tx
>
> (1)       if (fep->cur_tx == fep->dirty_tx)
> (2)                 netif_stop_queue(ndev);
>
> if CPU2 update fep->dirty_tx before CPU1 run (1).  the condition is false.
> if CPU2 update fep->dirty_tx after (1) before (2),   netif_stop_queue
> will be called.  There are not problem at this time even though queue
> is not full here.  queue is not empty for sure here. fec_enet_tx will
> be called again when NAPI trigger by one frame finished transfer,
> fec_enet_tx will wake up send queue.
>
> Most worse case is waste one BD script.

I just send out a patch to try to fix your problem.
Can you help test it in your sit?

>
>>
>> --
>> Ueimor
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Francois Romieu April 28, 2013, 9:39 a.m. UTC | #6
Frank Li <lznuaa@gmail.com> :
[...]
> This is lockless design if only one read and one write. It likes kfifo.
> 
> Assume CPU1 run xmit.  CPU2 run napi fec_enet_tx
> 
> (1)       if (fep->cur_tx == fep->dirty_tx)
> (2)                 netif_stop_queue(ndev);
> 
> if CPU2 update fep->dirty_tx before CPU1 run (1).  the condition is false.
> if CPU2 update fep->dirty_tx after (1) before (2),   netif_stop_queue
> will be called.  There are not problem at this time even though queue
> is not full here.  queue is not empty for sure here. fec_enet_tx will
> be called again when NAPI trigger by one frame finished transfer,
> fec_enet_tx will wake up send queue.

"before" and "after" may not work as expected between different CPU without
explicit synchronization (or barrier). It won't oops but it would be careful
to envision something like drivers/net/ethernet/broadcom/tg3.c::tg3_tx.
Francois Romieu April 28, 2013, 9:40 a.m. UTC | #7
Frank Li <lznuaa@gmail.com> :
[...]
> I just send out a patch to try to fix your problem.
> Can you help test it in your sit?

I don't have the required hardware.

But I can find bugs while the napi-is-too-hard-please-do-proper-project-mgmt
squad watches elsewhere. :o)
Zhi Li April 28, 2013, 10:03 a.m. UTC | #8
2013/4/28 Francois Romieu <romieu@fr.zoreil.com>:
> Frank Li <lznuaa@gmail.com> :
> [...]
>> I just send out a patch to try to fix your problem.
>> Can you help test it in your sit?
>
> I don't have the required hardware.
>
> But I can find bugs while the napi-is-too-hard-please-do-proper-project-mgmt
> squad watches elsewhere. :o)

Sorry, what's your means?

>
> --
> Ueimor
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zhi Li April 28, 2013, 10:05 a.m. UTC | #9
2013/4/28 Francois Romieu <romieu@fr.zoreil.com>:
> Frank Li <lznuaa@gmail.com> :
> [...]
>> This is lockless design if only one read and one write. It likes kfifo.
>>
>> Assume CPU1 run xmit.  CPU2 run napi fec_enet_tx
>>
>> (1)       if (fep->cur_tx == fep->dirty_tx)
>> (2)                 netif_stop_queue(ndev);
>>
>> if CPU2 update fep->dirty_tx before CPU1 run (1).  the condition is false.
>> if CPU2 update fep->dirty_tx after (1) before (2),   netif_stop_queue
>> will be called.  There are not problem at this time even though queue
>> is not full here.  queue is not empty for sure here. fec_enet_tx will
>> be called again when NAPI trigger by one frame finished transfer,
>> fec_enet_tx will wake up send queue.
>
> "before" and "after" may not work as expected between different CPU without
> explicit synchronization (or barrier). It won't oops but it would be careful
> to envision something like drivers/net/ethernet/broadcom/tg3.c::tg3_tx.

tg3_tx is safe method but need more lock.
But I think it is not related with this issue. I can change later.

>
> --
> Ueimor
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Francois Romieu April 28, 2013, 5:57 p.m. UTC | #10
Frank Li <lznuaa@gmail.com> :
[...]
> tg3_tx is safe method but need more lock.

It's the same design.

> But I think it is not related with this issue. I can change later.

It won't oops, sure. It will simply stop under high load or proper
timing.
Zhi Li April 28, 2013, 11:45 p.m. UTC | #11
2013/4/29, Francois Romieu <romieu@fr.zoreil.com>:
> Frank Li <lznuaa@gmail.com> :
> [...]
>> tg3_tx is safe method but need more lock.
>
> It's the same design.
>
>> But I think it is not related with this issue. I can change later.
>
> It won't oops, sure. It will simply stop under high load or proper
> timing.

It is difference problem.
It should be new patch.

Do you have test step to reproduce the problem what you said?

>
> --
> Ueimor
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lucas Stach April 29, 2013, 1:46 p.m. UTC | #12
Am Samstag, den 27.04.2013, 18:26 +0800 schrieb Frank Li:
> >
> > - remove locking in a way that memory is freed which is in use
> 
> The purpose of remove lock is that fix dead lock.

Could you please elaborate? What is the exact situation which caused a
deadlock here? I tried to see what's happening from your changelogs, but
I'm not sure I can follow here. Wasn't the problem more of the used
spinlocks not being IRQ-save and thus potentially creating a deadlock?

Regards,
Lucas
Zhi Li May 2, 2013, 1:51 a.m. UTC | #13
2013/4/29 Lucas Stach <l.stach@pengutronix.de>:
> What is the exact situation which caused a
> deadlock here? I tried to see what's happening from your changelogs, but
> I'm not sure I can follow here. Wasn't the problem more of the used
> spinlocks not being IRQ-save and thus potentially creating a deadlock?

You can view below email thread.
http://www.spinics.net/lists/netdev/msg225240.html

The main problem network API use spin_lock_bh only now. So it does not
prefer use irq to handle skb process.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/ethernet/freescale/fec.c
b/drivers/net/ethernet/freescale/fe
index 73195f6..c945bb7 100644
--- a/drivers/net/ethernet/freescale/fec.c
+++ b/drivers/net/ethernet/freescale/fec.c
@@ -403,6 +403,11 @@  fec_restart(struct net_device *ndev, int duplex)
        const struct platform_device_id *id_entry =
                                platform_get_device_id(fep->pdev);
        int i;
+       if (netif_running(ndev)) {
+               napi_disable(&fep->napi);
+               netif_stop_queue(ndev);
+       }
+
        u32 temp_mac[2];
        u32 rcntl = OPT_FRAME_SIZE | 0x04;
        u32 ecntl = 0x2; /* ETHEREN */
@@ -559,6 +564,11 @@  fec_restart(struct net_device *ndev, int duplex)

        /* Enable interrupts we wish to service */
        writel(FEC_DEFAULT_IMASK, fep->hwp + FEC_IMASK);
+
+       if (netif_running(ndev)) {
+               napi_enable(&fep->napi);
+               netif_wake_queue(ndev);
+       }
 }