diff mbox series

Revert "net: linkwatch: add check for netdevice being present to linkwatch_do_dev"

Message ID 20200901150237.15302-1-geert+renesas@glider.be
State Changes Requested
Delegated to: David Miller
Headers show
Series Revert "net: linkwatch: add check for netdevice being present to linkwatch_do_dev" | expand

Commit Message

Geert Uytterhoeven Sept. 1, 2020, 3:02 p.m. UTC
This reverts commit 124eee3f6955f7aa19b9e6ff5c9b6d37cb3d1e2c.

Inami-san reported that this commit breaks bridge support in a Xen
environment, and that reverting it fixes this.

During system resume, bridge ports are no longer enabled, as that relies
on the receipt of the NETDEV_CHANGE notification.  This notification is
not sent, as netdev_state_change() is no longer called.

Note that the condition this commit intended to fix never existed
upstream, as the patch triggering it and referenced in the commit was
never applied upstream.  Hence I can confirm s2ram on r8a73a4/ape6evm
and sh73a0/kzm9g works fine before/after this revert.

Reported-by Gaku Inami <gaku.inami.xh@renesas.com>
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
---
 net/core/link_watch.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

David Miller Sept. 3, 2020, 9:58 p.m. UTC | #1
From: Geert Uytterhoeven <geert+renesas@glider.be>
Date: Tue,  1 Sep 2020 17:02:37 +0200

> This reverts commit 124eee3f6955f7aa19b9e6ff5c9b6d37cb3d1e2c.

Heiner, please review this.

Thank you.

> Inami-san reported that this commit breaks bridge support in a Xen
> environment, and that reverting it fixes this.
> 
> During system resume, bridge ports are no longer enabled, as that relies
> on the receipt of the NETDEV_CHANGE notification.  This notification is
> not sent, as netdev_state_change() is no longer called.
> 
> Note that the condition this commit intended to fix never existed
> upstream, as the patch triggering it and referenced in the commit was
> never applied upstream.  Hence I can confirm s2ram on r8a73a4/ape6evm
> and sh73a0/kzm9g works fine before/after this revert.
> 
> Reported-by Gaku Inami <gaku.inami.xh@renesas.com>
> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
> ---
>  net/core/link_watch.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/core/link_watch.c b/net/core/link_watch.c
> index 75431ca9300fb9c4..c24574493ecf95e6 100644
> --- a/net/core/link_watch.c
> +++ b/net/core/link_watch.c
> @@ -158,7 +158,7 @@ static void linkwatch_do_dev(struct net_device *dev)
>  	clear_bit(__LINK_STATE_LINKWATCH_PENDING, &dev->state);
>  
>  	rfc2863_policy(dev);
> -	if (dev->flags & IFF_UP && netif_device_present(dev)) {
> +	if (dev->flags & IFF_UP) {
>  		if (netif_carrier_ok(dev))
>  			dev_activate(dev);
>  		else
> -- 
> 2.17.1
>
David Miller Sept. 10, 2020, 7:20 p.m. UTC | #2
From: Geert Uytterhoeven <geert+renesas@glider.be>
Date: Tue,  1 Sep 2020 17:02:37 +0200

> This reverts commit 124eee3f6955f7aa19b9e6ff5c9b6d37cb3d1e2c.
> 
> Inami-san reported that this commit breaks bridge support in a Xen
> environment, and that reverting it fixes this.
> 
> During system resume, bridge ports are no longer enabled, as that relies
> on the receipt of the NETDEV_CHANGE notification.  This notification is
> not sent, as netdev_state_change() is no longer called.
> 
> Note that the condition this commit intended to fix never existed
> upstream, as the patch triggering it and referenced in the commit was
> never applied upstream.  Hence I can confirm s2ram on r8a73a4/ape6evm
> and sh73a0/kzm9g works fine before/after this revert.
> 
> Reported-by Gaku Inami <gaku.inami.xh@renesas.com>
> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>

Maybe you cannot reproduce it, but the problem is there and it still
looks very real to me.

netdev_state_change() does two things:

1) Emit the NETDEV_CHANGE notification

2) Emit an rtmsg_ifinfo() netlink message, which in turn tries to access
   the device statistics via ->ndo_get_stats*().

It is absolutely wrong to do #2 when netif_device_present() is false.

So I cannot apply this patch as-is, sorry.
Geert Uytterhoeven Sept. 11, 2020, 6:32 a.m. UTC | #3
Hi David,

On Thu, Sep 10, 2020 at 9:20 PM David Miller <davem@davemloft.net> wrote:
> From: Geert Uytterhoeven <geert+renesas@glider.be>
> Date: Tue,  1 Sep 2020 17:02:37 +0200
>
> > This reverts commit 124eee3f6955f7aa19b9e6ff5c9b6d37cb3d1e2c.
> >
> > Inami-san reported that this commit breaks bridge support in a Xen
> > environment, and that reverting it fixes this.
> >
> > During system resume, bridge ports are no longer enabled, as that relies
> > on the receipt of the NETDEV_CHANGE notification.  This notification is
> > not sent, as netdev_state_change() is no longer called.
> >
> > Note that the condition this commit intended to fix never existed
> > upstream, as the patch triggering it and referenced in the commit was
> > never applied upstream.  Hence I can confirm s2ram on r8a73a4/ape6evm
> > and sh73a0/kzm9g works fine before/after this revert.
> >
> > Reported-by Gaku Inami <gaku.inami.xh@renesas.com>
> > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
>
> Maybe you cannot reproduce it, but the problem is there and it still
> looks very real to me.
>
> netdev_state_change() does two things:
>
> 1) Emit the NETDEV_CHANGE notification
>
> 2) Emit an rtmsg_ifinfo() netlink message, which in turn tries to access
>    the device statistics via ->ndo_get_stats*().
>
> It is absolutely wrong to do #2 when netif_device_present() is false.
>
> So I cannot apply this patch as-is, sorry.

Thanks a lot for looking into this!

But doing #1 is still safe?  That is the part that calls into the bridge
code.  So would moving the netif_device_present() check from
linkwatch_do_dev() to netdev_state_change(), to prevent doing #2, be
acceptable?

Thanks again!

Gr{oetje,eeting}s,

                        Geert
David Miller Sept. 12, 2020, 12:44 a.m. UTC | #4
From: Geert Uytterhoeven <geert@linux-m68k.org>
Date: Fri, 11 Sep 2020 08:32:55 +0200

> Hi David,
> 
> On Thu, Sep 10, 2020 at 9:20 PM David Miller <davem@davemloft.net> wrote:
>> From: Geert Uytterhoeven <geert+renesas@glider.be>
>> Date: Tue,  1 Sep 2020 17:02:37 +0200
>>
>> > This reverts commit 124eee3f6955f7aa19b9e6ff5c9b6d37cb3d1e2c.
>> >
>> > Inami-san reported that this commit breaks bridge support in a Xen
>> > environment, and that reverting it fixes this.
>> >
>> > During system resume, bridge ports are no longer enabled, as that relies
>> > on the receipt of the NETDEV_CHANGE notification.  This notification is
>> > not sent, as netdev_state_change() is no longer called.
>> >
>> > Note that the condition this commit intended to fix never existed
>> > upstream, as the patch triggering it and referenced in the commit was
>> > never applied upstream.  Hence I can confirm s2ram on r8a73a4/ape6evm
>> > and sh73a0/kzm9g works fine before/after this revert.
>> >
>> > Reported-by Gaku Inami <gaku.inami.xh@renesas.com>
>> > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
>>
>> Maybe you cannot reproduce it, but the problem is there and it still
>> looks very real to me.
>>
>> netdev_state_change() does two things:
>>
>> 1) Emit the NETDEV_CHANGE notification
>>
>> 2) Emit an rtmsg_ifinfo() netlink message, which in turn tries to access
>>    the device statistics via ->ndo_get_stats*().
>>
>> It is absolutely wrong to do #2 when netif_device_present() is false.
>>
>> So I cannot apply this patch as-is, sorry.
> 
> Thanks a lot for looking into this!
> 
> But doing #1 is still safe?  That is the part that calls into the bridge
> code.  So would moving the netif_device_present() check from
> linkwatch_do_dev() to netdev_state_change(), to prevent doing #2, be
> acceptable?

I have a better question.  Why is a software device like the bridge,
that wants to effectively exist and still receive netdev event
notifications, marking itself as not present?

That's seems like the real bug here.
Geert Uytterhoeven Sept. 12, 2020, 12:33 p.m. UTC | #5
Hi David,

On Sat, Sep 12, 2020 at 2:44 AM David Miller <davem@davemloft.net> wrote:
> From: Geert Uytterhoeven <geert@linux-m68k.org>
> Date: Fri, 11 Sep 2020 08:32:55 +0200
>
> > On Thu, Sep 10, 2020 at 9:20 PM David Miller <davem@davemloft.net> wrote:
> >> From: Geert Uytterhoeven <geert+renesas@glider.be>
> >> Date: Tue,  1 Sep 2020 17:02:37 +0200
> >>
> >> > This reverts commit 124eee3f6955f7aa19b9e6ff5c9b6d37cb3d1e2c.
> >> >
> >> > Inami-san reported that this commit breaks bridge support in a Xen
> >> > environment, and that reverting it fixes this.
> >> >
> >> > During system resume, bridge ports are no longer enabled, as that relies
> >> > on the receipt of the NETDEV_CHANGE notification.  This notification is
> >> > not sent, as netdev_state_change() is no longer called.
> >> >
> >> > Note that the condition this commit intended to fix never existed
> >> > upstream, as the patch triggering it and referenced in the commit was
> >> > never applied upstream.  Hence I can confirm s2ram on r8a73a4/ape6evm
> >> > and sh73a0/kzm9g works fine before/after this revert.
> >> >
> >> > Reported-by Gaku Inami <gaku.inami.xh@renesas.com>
> >> > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
> >>
> >> Maybe you cannot reproduce it, but the problem is there and it still
> >> looks very real to me.
> >>
> >> netdev_state_change() does two things:
> >>
> >> 1) Emit the NETDEV_CHANGE notification
> >>
> >> 2) Emit an rtmsg_ifinfo() netlink message, which in turn tries to access
> >>    the device statistics via ->ndo_get_stats*().
> >>
> >> It is absolutely wrong to do #2 when netif_device_present() is false.
> >>
> >> So I cannot apply this patch as-is, sorry.
> >
> > Thanks a lot for looking into this!
> >
> > But doing #1 is still safe?  That is the part that calls into the bridge
> > code.  So would moving the netif_device_present() check from
> > linkwatch_do_dev() to netdev_state_change(), to prevent doing #2, be
> > acceptable?
>
> I have a better question.  Why is a software device like the bridge,
> that wants to effectively exist and still receive netdev event
> notifications, marking itself as not present?
>
> That's seems like the real bug here.

"dev" is not the bridge device, but the physical Ethernet interface, which
may already be suspended during s2ram.

Gr{oetje,eeting}s,

                        Geert
David Miller Sept. 13, 2020, 1:34 a.m. UTC | #6
From: Geert Uytterhoeven <geert@linux-m68k.org>
Date: Sat, 12 Sep 2020 14:33:59 +0200

> "dev" is not the bridge device, but the physical Ethernet interface, which
> may already be suspended during s2ram.

Hmmm, ok.

Looking more deeply NETDEV_CHANGE causes br_port_carrier_check() to run which
exits early if netif_running() is false, which is going to be true if
netif_device_present() is false:

	*notified = false;
	if (!netif_running(br->dev))
		return;

The only other work the bridge notifier does is:

	if (event != NETDEV_UNREGISTER)
		br_vlan_port_event(p, event);

and:

	/* Events that may cause spanning tree to refresh */
	if (!notified && (event == NETDEV_CHANGEADDR || event == NETDEV_UP ||
			  event == NETDEV_CHANGE || event == NETDEV_DOWN))
		br_ifinfo_notify(RTM_NEWLINK, NULL, p);

So some vlan stuff, and emitting a netlink message to any available
listeners.

Should we really do all of this for a device which is not even
present?

This whole situation seems completely illogical.  The device is
useless, it logically has no link or other state that can be managed
or used, while it is not present.

So all of these bridge operations should only happen when the device
transitions back to present again.
Geert Uytterhoeven Sept. 14, 2020, 7:40 a.m. UTC | #7
Hi David,

CC bridge

On Sun, Sep 13, 2020 at 3:34 AM David Miller <davem@davemloft.net> wrote:
> From: Geert Uytterhoeven <geert@linux-m68k.org>
> Date: Sat, 12 Sep 2020 14:33:59 +0200
>
> > "dev" is not the bridge device, but the physical Ethernet interface, which
> > may already be suspended during s2ram.
>
> Hmmm, ok.
>
> Looking more deeply NETDEV_CHANGE causes br_port_carrier_check() to run which
> exits early if netif_running() is false, which is going to be true if
> netif_device_present() is false:
>
>         *notified = false;
>         if (!netif_running(br->dev))
>                 return;
>
> The only other work the bridge notifier does is:
>
>         if (event != NETDEV_UNREGISTER)
>                 br_vlan_port_event(p, event);
>
> and:
>
>         /* Events that may cause spanning tree to refresh */
>         if (!notified && (event == NETDEV_CHANGEADDR || event == NETDEV_UP ||
>                           event == NETDEV_CHANGE || event == NETDEV_DOWN))
>                 br_ifinfo_notify(RTM_NEWLINK, NULL, p);
>
> So some vlan stuff, and emitting a netlink message to any available
> listeners.
>
> Should we really do all of this for a device which is not even
> present?
>
> This whole situation seems completely illogical.  The device is
> useless, it logically has no link or other state that can be managed
> or used, while it is not present.
>
> So all of these bridge operations should only happen when the device
> transitions back to present again.

Thanks for your analysis!
I'd like to defer this to the bridge people (CC).

Gr{oetje,eeting}s,

                        Geert
Nikolay Aleksandrov Sept. 18, 2020, 12:35 p.m. UTC | #8
On Mon, 2020-09-14 at 09:40 +0200, Geert Uytterhoeven wrote:
> Hi David,
> 
> CC bridge
> 
> On Sun, Sep 13, 2020 at 3:34 AM David Miller <davem@davemloft.net> wrote:
> > From: Geert Uytterhoeven <geert@linux-m68k.org>
> > Date: Sat, 12 Sep 2020 14:33:59 +0200
> > 
> > > "dev" is not the bridge device, but the physical Ethernet interface, which
> > > may already be suspended during s2ram.
> > 
> > Hmmm, ok.
> > 
> > Looking more deeply NETDEV_CHANGE causes br_port_carrier_check() to run which
> > exits early if netif_running() is false, which is going to be true if
> > netif_device_present() is false:
> > 
> >         *notified = false;
> >         if (!netif_running(br->dev))
> >                 return;
> > 
> > The only other work the bridge notifier does is:
> > 
> >         if (event != NETDEV_UNREGISTER)
> >                 br_vlan_port_event(p, event);
> > 
> > and:
> > 
> >         /* Events that may cause spanning tree to refresh */
> >         if (!notified && (event == NETDEV_CHANGEADDR || event == NETDEV_UP ||
> >                           event == NETDEV_CHANGE || event == NETDEV_DOWN))
> >                 br_ifinfo_notify(RTM_NEWLINK, NULL, p);
> > 
> > So some vlan stuff, and emitting a netlink message to any available
> > listeners.
> > 
> > Should we really do all of this for a device which is not even
> > present?
> > 
> > This whole situation seems completely illogical.  The device is
> > useless, it logically has no link or other state that can be managed
> > or used, while it is not present.
> > 
> > So all of these bridge operations should only happen when the device
> > transitions back to present again.
> 
> Thanks for your analysis!
> I'd like to defer this to the bridge people (CC).
> 
> Gr{oetje,eeting}s,
> 
>                         Geert
> 

Hi,
Sorry for the delay. Interesting problem. :)
Thanks for the analysis, I don't see any issues with checking if the device
isn't present. It will have to go through some testing, but no obvious
objections/issues. Have you tried if it fixes your case?
I have briefly gone over drivers' use of net_device_detach(), mostly it's used
for suspends, but there are a few cases which use it on IO error or when a
device is actually detaching (VF detach). The vlan port event is for vlan
devices on top of the bridge when BROPT_VLAN_BRIDGE_BINDING is enabled and their
carrier is changed based on vlan participating ports' state.

Thanks,
 Nik
Saeed Mahameed Sept. 18, 2020, 5:58 p.m. UTC | #9
On Tue, 2020-09-01 at 17:02 +0200, Geert Uytterhoeven wrote:
> This reverts commit 124eee3f6955f7aa19b9e6ff5c9b6d37cb3d1e2c.
> 
> Inami-san reported that this commit breaks bridge support in a Xen
> environment, and that reverting it fixes this.
> 
> During system resume, bridge ports are no longer enabled, as that
> relies
> on the receipt of the NETDEV_CHANGE notification.  This notification
> is
> not sent, as netdev_state_change() is no longer called.
> 
> Note that the condition this commit intended to fix never existed
> upstream, as the patch triggering it and referenced in the commit was
> never applied upstream.  Hence I can confirm s2ram on r8a73a4/ape6evm
> and sh73a0/kzm9g works fine before/after this revert.
> 
> Reported-by Gaku Inami <gaku.inami.xh@renesas.com>
> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
> ---
>  net/core/link_watch.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/core/link_watch.c b/net/core/link_watch.c
> index 75431ca9300fb9c4..c24574493ecf95e6 100644
> --- a/net/core/link_watch.c
> +++ b/net/core/link_watch.c
> @@ -158,7 +158,7 @@ static void linkwatch_do_dev(struct net_device
> *dev)
>  	clear_bit(__LINK_STATE_LINKWATCH_PENDING, &dev->state);
>  
>  	rfc2863_policy(dev);
> -	if (dev->flags & IFF_UP && netif_device_present(dev)) {
> +	if (dev->flags & IFF_UP) {

So with your issue the devices is both IFF_UP and !present ? how so ?
I think you should look into that.

I am ok with removing the "dev present" check from here just because we
shouldn't  be expecting IFF_UP && !present .. such thing must be a bug
somewhere else.

>  		if (netif_carrier_ok(dev))
>  			dev_activate(dev);
>  		else
David Miller Sept. 18, 2020, 9:47 p.m. UTC | #10
From: Nikolay Aleksandrov <nikolay@nvidia.com>
Date: Fri, 18 Sep 2020 12:35:02 +0000

> Thanks for the analysis, I don't see any issues with checking if the device
> isn't present. It will have to go through some testing, but no obvious
> objections/issues. Have you tried if it fixes your case?
> I have briefly gone over drivers' use of net_device_detach(), mostly it's used
> for suspends, but there are a few cases which use it on IO error or when a
> device is actually detaching (VF detach). The vlan port event is for vlan
> devices on top of the bridge when BROPT_VLAN_BRIDGE_BINDING is enabled and their
> carrier is changed based on vlan participating ports' state.

There are two things to resolve:

1) Why does the bridge need to get a change event for devices which have
   not fully resumed yet?

2) What kind of link state change is happening on devices which are not
   currently fully resumed yet?

Really this whole situation is still quite mysterious to me.

If the driver (or the PHY library it is using, etc.) is emitting link
state changes before it marks itself as "present", that's the real
bug.
David Miller Sept. 18, 2020, 10:10 p.m. UTC | #11
From: Saeed Mahameed <saeed@kernel.org>
Date: Fri, 18 Sep 2020 10:58:49 -0700

> On Tue, 2020-09-01 at 17:02 +0200, Geert Uytterhoeven wrote:
>> @@ -158,7 +158,7 @@ static void linkwatch_do_dev(struct net_device
>> *dev)
>>  	clear_bit(__LINK_STATE_LINKWATCH_PENDING, &dev->state);
>>  
>>  	rfc2863_policy(dev);
>> -	if (dev->flags & IFF_UP && netif_device_present(dev)) {
>> +	if (dev->flags & IFF_UP) {
> 
> So with your issue the devices is both IFF_UP and !present ? how so ?
> I think you should look into that.
> 
> I am ok with removing the "dev present" check from here just because we
> shouldn't  be expecting IFF_UP && !present .. such thing must be a bug
> somewhere else.

Indeed, this is why I just asked in another email why a link change event
is firing for a device which hasn't fully resumed and marked itself as
"present" yet.

More and more I think this is a driver or PHY layer bug.
Heiner Kallweit Sept. 23, 2020, 11:49 a.m. UTC | #12
On 18.09.2020 19:58, Saeed Mahameed wrote:
> On Tue, 2020-09-01 at 17:02 +0200, Geert Uytterhoeven wrote:
>> This reverts commit 124eee3f6955f7aa19b9e6ff5c9b6d37cb3d1e2c.
>>
>> Inami-san reported that this commit breaks bridge support in a Xen
>> environment, and that reverting it fixes this.
>>
>> During system resume, bridge ports are no longer enabled, as that
>> relies
>> on the receipt of the NETDEV_CHANGE notification.  This notification
>> is
>> not sent, as netdev_state_change() is no longer called.
>>
>> Note that the condition this commit intended to fix never existed
>> upstream, as the patch triggering it and referenced in the commit was
>> never applied upstream.  Hence I can confirm s2ram on r8a73a4/ape6evm
>> and sh73a0/kzm9g works fine before/after this revert.
>>
>> Reported-by Gaku Inami <gaku.inami.xh@renesas.com>
>> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
>> ---
>>  net/core/link_watch.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/core/link_watch.c b/net/core/link_watch.c
>> index 75431ca9300fb9c4..c24574493ecf95e6 100644
>> --- a/net/core/link_watch.c
>> +++ b/net/core/link_watch.c
>> @@ -158,7 +158,7 @@ static void linkwatch_do_dev(struct net_device
>> *dev)
>>  	clear_bit(__LINK_STATE_LINKWATCH_PENDING, &dev->state);
>>  
>>  	rfc2863_policy(dev);
>> -	if (dev->flags & IFF_UP && netif_device_present(dev)) {
>> +	if (dev->flags & IFF_UP) {
> 
> So with your issue the devices is both IFF_UP and !present ? how so ?
> I think you should look into that.
> 
> I am ok with removing the "dev present" check from here just because we
> shouldn't  be expecting IFF_UP && !present .. such thing must be a bug
> somewhere else.
> 
>>  		if (netif_carrier_ok(dev))
>>  			dev_activate(dev);
>>  		else
> 


In __dev_close_many() we call ndo_stop() whilst IFF_UP is still set.
ndo_stop() may detach the device and bring down the PHY, resulting in an
async link change event that calls dev_get_stats(). The latter call may
have a problem if the device is detached. In a first place I'd consider
such a case a network driver bug (ndo_get_stats/64 should check for
device presence if depending on it).
The additional check in linkwatch_do_dev() was meant to protect from such
driver issues.
Saeed Mahameed Sept. 23, 2020, 6:35 p.m. UTC | #13
On Wed, 2020-09-23 at 13:49 +0200, Heiner Kallweit wrote:
> On 18.09.2020 19:58, Saeed Mahameed wrote:
> > On Tue, 2020-09-01 at 17:02 +0200, Geert Uytterhoeven wrote:
> > > This reverts commit 124eee3f6955f7aa19b9e6ff5c9b6d37cb3d1e2c.
> > > 
> > > Inami-san reported that this commit breaks bridge support in a
> > > Xen
> > > environment, and that reverting it fixes this.
> > > 
> > > During system resume, bridge ports are no longer enabled, as that
> > > relies
> > > on the receipt of the NETDEV_CHANGE notification.  This
> > > notification
> > > is
> > > not sent, as netdev_state_change() is no longer called.
> > > 
> > > Note that the condition this commit intended to fix never existed
> > > upstream, as the patch triggering it and referenced in the commit
> > > was
> > > never applied upstream.  Hence I can confirm s2ram on
> > > r8a73a4/ape6evm
> > > and sh73a0/kzm9g works fine before/after this revert.
> > > 
> > > Reported-by Gaku Inami <gaku.inami.xh@renesas.com>
> > > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
> > > ---
> > >  net/core/link_watch.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/net/core/link_watch.c b/net/core/link_watch.c
> > > index 75431ca9300fb9c4..c24574493ecf95e6 100644
> > > --- a/net/core/link_watch.c
> > > +++ b/net/core/link_watch.c
> > > @@ -158,7 +158,7 @@ static void linkwatch_do_dev(struct
> > > net_device
> > > *dev)
> > >  	clear_bit(__LINK_STATE_LINKWATCH_PENDING, &dev->state);
> > >  
> > >  	rfc2863_policy(dev);
> > > -	if (dev->flags & IFF_UP && netif_device_present(dev)) {
> > > +	if (dev->flags & IFF_UP) {
> > 
> > So with your issue the devices is both IFF_UP and !present ? how so
> > ?
> > I think you should look into that.
> > 
> > I am ok with removing the "dev present" check from here just
> > because we
> > shouldn't  be expecting IFF_UP && !present .. such thing must be a
> > bug
> > somewhere else.
> > 
> > >  		if (netif_carrier_ok(dev))
> > >  			dev_activate(dev);
> > >  		else
> 
> In __dev_close_many() we call ndo_stop() whilst IFF_UP is still set.
> ndo_stop() may detach the device and bring down the PHY, resulting in
> an

Why would a driver detach the device on ndo_stop() ?
seems like this is the bug you need to be chasing ..
which driver is doing this ? 

> async link change event that calls dev_get_stats(). The latter call
> may
> have a problem if the device is detached. In a first place I'd
> consider
> such a case a network driver bug (ndo_get_stats/64 should check for
> device presence if depending on it).

Device drivers should avoid presence check as much as possible
especially in ndo, this check must be performed by the stack.

> The additional check in linkwatch_do_dev() was meant to protect from
> such
> driver issues.
Heiner Kallweit Sept. 23, 2020, 7:58 p.m. UTC | #14
On 23.09.2020 20:35, Saeed Mahameed wrote:
> On Wed, 2020-09-23 at 13:49 +0200, Heiner Kallweit wrote:
>> On 18.09.2020 19:58, Saeed Mahameed wrote:
>>> On Tue, 2020-09-01 at 17:02 +0200, Geert Uytterhoeven wrote:
>>>> This reverts commit 124eee3f6955f7aa19b9e6ff5c9b6d37cb3d1e2c.
>>>>
>>>> Inami-san reported that this commit breaks bridge support in a
>>>> Xen
>>>> environment, and that reverting it fixes this.
>>>>
>>>> During system resume, bridge ports are no longer enabled, as that
>>>> relies
>>>> on the receipt of the NETDEV_CHANGE notification.  This
>>>> notification
>>>> is
>>>> not sent, as netdev_state_change() is no longer called.
>>>>
>>>> Note that the condition this commit intended to fix never existed
>>>> upstream, as the patch triggering it and referenced in the commit
>>>> was
>>>> never applied upstream.  Hence I can confirm s2ram on
>>>> r8a73a4/ape6evm
>>>> and sh73a0/kzm9g works fine before/after this revert.
>>>>
>>>> Reported-by Gaku Inami <gaku.inami.xh@renesas.com>
>>>> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
>>>> ---
>>>>  net/core/link_watch.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/net/core/link_watch.c b/net/core/link_watch.c
>>>> index 75431ca9300fb9c4..c24574493ecf95e6 100644
>>>> --- a/net/core/link_watch.c
>>>> +++ b/net/core/link_watch.c
>>>> @@ -158,7 +158,7 @@ static void linkwatch_do_dev(struct
>>>> net_device
>>>> *dev)
>>>>  	clear_bit(__LINK_STATE_LINKWATCH_PENDING, &dev->state);
>>>>  
>>>>  	rfc2863_policy(dev);
>>>> -	if (dev->flags & IFF_UP && netif_device_present(dev)) {
>>>> +	if (dev->flags & IFF_UP) {
>>>
>>> So with your issue the devices is both IFF_UP and !present ? how so
>>> ?
>>> I think you should look into that.
>>>
>>> I am ok with removing the "dev present" check from here just
>>> because we
>>> shouldn't  be expecting IFF_UP && !present .. such thing must be a
>>> bug
>>> somewhere else.
>>>
>>>>  		if (netif_carrier_ok(dev))
>>>>  			dev_activate(dev);
>>>>  		else
>>
>> In __dev_close_many() we call ndo_stop() whilst IFF_UP is still set.
>> ndo_stop() may detach the device and bring down the PHY, resulting in
>> an
> 
> Why would a driver detach the device on ndo_stop() ?
> seems like this is the bug you need to be chasing ..
> which driver is doing this ? 
> 
Some drivers set the device to PCI D3hot at the end of ndo_stop()
to save power (using e.g. Runtime PM). Marking the device as detached
makes clear to to the net core that the device isn't accessible any
longer.

>> async link change event that calls dev_get_stats(). The latter call
>> may
>> have a problem if the device is detached. In a first place I'd
>> consider
>> such a case a network driver bug (ndo_get_stats/64 should check for
>> device presence if depending on it).
> 
> Device drivers should avoid presence check as much as possible
> especially in ndo, this check must be performed by the stack.
> 
That's a question I also stumbled across. For the ethtool ops
dev_ethtool() checks whether device is present.
But for ndo that's not always the case, e.g. dev_get_stats()
doesn't check for device presence before calling ndo_get_stats()
or ndo_get_stats64().
To a certain extent I can understand this behavior, because drivers
may just use internal data structures in ndo ops instead of accessing
the device.

>> The additional check in linkwatch_do_dev() was meant to protect from
>> such
>> driver issues.
>
David Miller Sept. 23, 2020, 8:15 p.m. UTC | #15
From: Heiner Kallweit <hkallweit1@gmail.com>
Date: Wed, 23 Sep 2020 21:58:59 +0200

> On 23.09.2020 20:35, Saeed Mahameed wrote:
>> Why would a driver detach the device on ndo_stop() ?
>> seems like this is the bug you need to be chasing ..
>> which driver is doing this ? 
>> 
> Some drivers set the device to PCI D3hot at the end of ndo_stop()
> to save power (using e.g. Runtime PM). Marking the device as detached
> makes clear to to the net core that the device isn't accessible any
> longer.

That being the case, the problem is that IFF_UP+!present is not a
valid netdev state.

Is it simply the issue that, upon resume, IFF_UP is marked true before
the device is brought out from D3hot state and thus marked as present
again?
Heiner Kallweit Sept. 23, 2020, 8:44 p.m. UTC | #16
On 23.09.2020 22:15, David Miller wrote:
> From: Heiner Kallweit <hkallweit1@gmail.com>
> Date: Wed, 23 Sep 2020 21:58:59 +0200
> 
>> On 23.09.2020 20:35, Saeed Mahameed wrote:
>>> Why would a driver detach the device on ndo_stop() ?
>>> seems like this is the bug you need to be chasing ..
>>> which driver is doing this ? 
>>>
>> Some drivers set the device to PCI D3hot at the end of ndo_stop()
>> to save power (using e.g. Runtime PM). Marking the device as detached
>> makes clear to to the net core that the device isn't accessible any
>> longer.
> 
> That being the case, the problem is that IFF_UP+!present is not a
> valid netdev state.
> 
If this combination is invalid, then netif_device_detach() should
clear IFF_UP? At a first glance this should be sufficient to avoid
the issue I was dealing with.

> Is it simply the issue that, upon resume, IFF_UP is marked true before
> the device is brought out from D3hot state and thus marked as present
> again?
> 
I can't really comment on that. The issue I was dealing with at the
time I submitted this change was about an async linkwatch event
(caused by powering down the PHY in ndo_stop) trying to access the
device when it was powered down already.
Saeed Mahameed Sept. 23, 2020, 10:42 p.m. UTC | #17
On Wed, 2020-09-23 at 22:44 +0200, Heiner Kallweit wrote:
> On 23.09.2020 22:15, David Miller wrote:
> > From: Heiner Kallweit <hkallweit1@gmail.com>
> > Date: Wed, 23 Sep 2020 21:58:59 +0200
> > 
> > > On 23.09.2020 20:35, Saeed Mahameed wrote:
> > > > Why would a driver detach the device on ndo_stop() ?
> > > > seems like this is the bug you need to be chasing ..
> > > > which driver is doing this ? 
> > > > 
> > > Some drivers set the device to PCI D3hot at the end of ndo_stop()
> > > to save power (using e.g. Runtime PM). Marking the device as
> > > detached
> > > makes clear to to the net core that the device isn't accessible
> > > any
> > > longer.
> > 
> > That being the case, the problem is that IFF_UP+!present is not a
> > valid netdev state.
> > 
> If this combination is invalid, then netif_device_detach() should
> clear IFF_UP? At a first glance this should be sufficient to avoid
> the issue I was dealing with.
> 

Feels like a work around and would conflict with the assumption that 
netif_device_detach() should only be called when !IFF_UP

Maybe we need to clear IFF_UP before calling ops->ndo_stop(dev),
instead of after on __dev_close_many(). Assuming no driver is checking
IFF_UP state on its own ndo_stop(), other than this, the order
shouldn't really matter, since clearing the flag and calling ndo_stop()
should be considered as one atomic operation.

> > Is it simply the issue that, upon resume, IFF_UP is marked true
> > before
> > the device is brought out from D3hot state and thus marked as
> > present
> > again?
> > 
> I can't really comment on that. The issue I was dealing with at the
> time I submitted this change was about an async linkwatch event
> (caused by powering down the PHY in ndo_stop) trying to access the
> device when it was powered down already.
David Miller Sept. 24, 2020, 12:21 a.m. UTC | #18
From: Saeed Mahameed <saeed@kernel.org>
Date: Wed, 23 Sep 2020 15:42:17 -0700

> Maybe we need to clear IFF_UP before calling ops->ndo_stop(dev),
> instead of after on __dev_close_many(). Assuming no driver is checking
> IFF_UP state on its own ndo_stop(), other than this, the order
> shouldn't really matter, since clearing the flag and calling ndo_stop()
> should be considered as one atomic operation.

This is my biggest concern, that some ndo_stop, or some helper called
by ndo_stop, checks IFF_UP or similar.

There is also something else.  We have both synchronous and async code
that checks state like IFF_UP and 'present' and makes a decision based
upon that.

If an async code path tests 'present', gets true, and then the RTNL
holding synchronous code path puts the device into D3hot immediately
afterwards, the async code path will still continue and access the
chips registers and fault.

I'm saying all of this because the only way this bug makes sense
is if the ->ndo_stop() sequence that marks the device !present and
then clears IFF_UP runs with the RTNL mutex held, and the code path
that tests this state in the linkwatch bits in question do not.
David Miller Sept. 24, 2020, 12:23 a.m. UTC | #19
From: David Miller <davem@davemloft.net>
Date: Wed, 23 Sep 2020 17:21:25 -0700 (PDT)

> If an async code path tests 'present', gets true, and then the RTNL
> holding synchronous code path puts the device into D3hot immediately
> afterwards, the async code path will still continue and access the
> chips registers and fault.

Wait, is the sequence:

	->ndo_stop()
		mark device not present and put into D3hot
		triggers linkwatch event
	  ...
			 ->ndo_get_stats64()

???

Then yeah we might have to clear IFF_UP at the beginning of taking
a netdev down.
Saeed Mahameed Sept. 24, 2020, 5:49 a.m. UTC | #20
On Wed, 2020-09-23 at 17:23 -0700, David Miller wrote:
> From: David Miller <davem@davemloft.net>
> Date: Wed, 23 Sep 2020 17:21:25 -0700 (PDT)
> 
> > If an async code path tests 'present', gets true, and then the RTNL
> > holding synchronous code path puts the device into D3hot
> immediately
> > afterwards, the async code path will still continue and access the
> > chips registers and fault.
> 
> Wait, is the sequence:
> 
>         ->ndo_stop()
>                 mark device not present and put into D3hot
>                 triggers linkwatch event
>           ...
>                          ->ndo_get_stats64()
> 
> ???
> 

I assume it is, since normally device drivers do carrier_off() on
ndo_stop()

1) One problematic sequence would be 
(for drivers doing D3hot on ndo_stop())

__dev_close_many()
   ->ndo_stop()
      netif_device_detach() //Mark !present;
      ... D3hot
      carrier_off()->linkwatch_event()
            ... // !present && IFF_UP 
      
2) Another problematic scenario which i see is repeated in many
drivers:

shutdown/suspend()
    rtnl_lock()
    netif_device_detach()//Mark !present;
    stop()->carrier_off()->linkwatch_event()
    // at this point device is still IFF_UP and !present
    // due to the early detach above..  
    rtnl_unlock();
   
For scenario 1) we can fix by marking IFF_UP at the beginning, but for
2), i think we need to fix the drivers to detach only after stop :(
   
> Then yeah we might have to clear IFF_UP at the beginning of taking
> a netdev down.
Jakub Kicinski Sept. 24, 2020, 4:03 p.m. UTC | #21
On Wed, 23 Sep 2020 22:49:37 -0700 Saeed Mahameed wrote:
> 2) Another problematic scenario which i see is repeated in many
> drivers:
> 
> shutdown/suspend()
>     rtnl_lock()
>     netif_device_detach()//Mark !present;
>     stop()->carrier_off()->linkwatch_event()
>     // at this point device is still IFF_UP and !present
>     // due to the early detach above..  
>     rtnl_unlock();

Maybe we can solve this by providing drivers with a better helper for
the suspend use case?

AFAIU netif_device_detach() is used by both IO errors and drivers
willingly detaching the device during normal operation (e.g. for
suspend).

Since the suspend path can sleep if we have a separate helper perhaps
we could fire off the appropriate events synchronously, and quiescence
the stack properly?
Saeed Mahameed Sept. 24, 2020, 6:16 p.m. UTC | #22
On Thu, 2020-09-24 at 09:03 -0700, Jakub Kicinski wrote:
> On Wed, 23 Sep 2020 22:49:37 -0700 Saeed Mahameed wrote:
> > 2) Another problematic scenario which i see is repeated in many
> > drivers:
> > 
> > shutdown/suspend()
> >     rtnl_lock()
> >     netif_device_detach()//Mark !present;
> >     stop()->carrier_off()->linkwatch_event()
> >     // at this point device is still IFF_UP and !present
> >     // due to the early detach above..  
> >     rtnl_unlock();
> 
> Maybe we can solve this by providing drivers with a better helper for
> the suspend use case?
> 
> AFAIU netif_device_detach() is used by both IO errors and drivers
> willingly detaching the device during normal operation (e.g. for
> suspend).
> 
> Since the suspend path can sleep if we have a separate helper perhaps
> we could fire off the appropriate events synchronously, and
> quiescence
> the stack properly?

I was thinking something similar, a more heavy
weight netif_device_detach(), which will be used in all drivers suspend
flows.
 
1) clear IFF_UP
2) ndo_stop()
3) fire events
4) mark !present
...

5) suspend device


but went and sampled some drivers and found there are many variations
for using netif_device_detach it is not going to be a simple task.
diff mbox series

Patch

diff --git a/net/core/link_watch.c b/net/core/link_watch.c
index 75431ca9300fb9c4..c24574493ecf95e6 100644
--- a/net/core/link_watch.c
+++ b/net/core/link_watch.c
@@ -158,7 +158,7 @@  static void linkwatch_do_dev(struct net_device *dev)
 	clear_bit(__LINK_STATE_LINKWATCH_PENDING, &dev->state);
 
 	rfc2863_policy(dev);
-	if (dev->flags & IFF_UP && netif_device_present(dev)) {
+	if (dev->flags & IFF_UP) {
 		if (netif_carrier_ok(dev))
 			dev_activate(dev);
 		else