[PATCHv3,2/4] drivers/base: utilize device tree info to shutdown devices

Message ID 1530600642-25090-3-git-send-email-kernelfans@gmail.com
State Not Applicable
Series
  • drivers/base: bugfix for supplier<-consumer ordering in device_kset

Commit Message

Pingfan Liu July 3, 2018, 6:50 a.m. UTC
Commit 52cdbdd49853 ("driver core: correct device's shutdown order")
imposes supplier<-consumer order at probe time, but in some cases this
breaks the parent<-child order. E.g. in PCI, a bridge is enabled by the
PCI core and the devices behind it are probed. Then the bridge's module
is loaded, which enables an extra feature (such as hotplug) on the
bridge. This breaks the parent<-child order and causes "kexec -e" to
fail in some scenarios.

The detailed description of the scenario:
On an IBM Power9 machine, two drivers, portdrv_pci and shpchp (a module),
match PCI_CLASS_BRIDGE_PCI, but neither of them probes successfully due
to some issue. In this case, the bridge is moved after its children in
devices_kset. Then, on "kexec -e", an ATA disk behind the bridge cannot
write back its in-flight buffers, because the bridge has already been
shut down, which clears its BusMaster bit.

It is hard to impose both "parent<-child" and "supplier<-consumer"
order on devices_kset. Consider the following case:
step0: before consumer-X is probed (note: child_a is the supplier of consumer_a)
  [ consumer-X, child_a, ...., child_z] [... consumer_a, ..., consumer_z, ...] supplier-X
                                         ^^^^^^^^^^ affected range ^^^^^^^^^^
step1: on probe, consumer-X is moved after supplier-X
  [ child_a, ...., child_z] [.... consumer_a, ..., consumer_z, ...] supplier-X, consumer-X
step2: the children of consumer-X must be re-ordered to maintain the sequence
  [... consumer_a, ..., consumer_z, ....] supplier-X  [consumer-X, child_a, ...., child_z]
step3: consumer_a must be re-ordered to maintain the sequence
  [... consumer_z, ...] supplier-X [ consumer-X, child_a, consumer_a ..., child_z]

It takes two nested recursions to drain all out-of-order items from the
"affected range". To avoid such complicated code, this patch uses the
information in the device tree during shutdown, instead of the order of
devices_kset. It walks the device tree and shuts down a device's
children and consumers first, then the device itself. After this patch,
the buggy commit no longer has any effect and is left to be cleaned up.

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Bjorn Helgaas <helgaas@kernel.org>
Cc: Dave Young <dyoung@redhat.com>
Cc: linux-pci@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
---
 drivers/base/core.c    | 48 +++++++++++++++++++++++++++++++++++++++++++-----
 include/linux/device.h |  1 +
 2 files changed, 44 insertions(+), 5 deletions(-)
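
The traversal described in the commit message can be modelled in
userspace roughly as follows. This is a minimal sketch, not the patch
itself: struct model_dev, struct shutdown_log, model_shutdown() and
model_index() are hypothetical names, the fixed-size arrays stand in for
klist_children and links.consumers, and the graph is assumed to be
acyclic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_LINKS 4
#define MAX_LOG   16

/* Hypothetical userspace stand-in for struct device; not the kernel type. */
struct model_dev {
	const char *name;
	struct model_dev *children[MAX_LINKS];  /* parent <- child edges */
	struct model_dev *consumers[MAX_LINKS]; /* supplier <- consumer edges */
	bool shutdown;                          /* one direction: false -> true */
};

/* Records the order in which devices were shut down. */
struct shutdown_log {
	const char *names[MAX_LOG];
	size_t count;
};

/* Shut down dev's children and consumers first, then dev itself. */
static void model_shutdown(struct model_dev *dev, struct shutdown_log *log)
{
	size_t i;

	if (dev->shutdown)
		return; /* this subtree was already handled */

	for (i = 0; i < MAX_LINKS && dev->children[i]; i++)
		model_shutdown(dev->children[i], log);
	for (i = 0; i < MAX_LINKS && dev->consumers[i]; i++)
		model_shutdown(dev->consumers[i], log);

	assert(log->count < MAX_LOG);
	log->names[log->count++] = dev->name;
	dev->shutdown = true;
}

/* Position of a device in the shutdown order, or log->count if absent. */
static size_t model_index(const struct shutdown_log *log, const char *name)
{
	size_t i;

	for (i = 0; i < log->count; i++)
		if (strcmp(log->names[i], name) == 0)
			return i;
	return log->count;
}
```

In this model, calling model_shutdown() on a bridge that has a disk as
its child records the disk before the bridge, whatever order the
top-level loop visits them in. The set-once shutdown flag is what keeps
a device from being handled twice.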

Comments

Pingfan Liu July 3, 2018, 9:26 a.m. UTC | #1
On Tue, Jul 3, 2018 at 3:51 PM Lukas Wunner <lukas@wunner.de> wrote:
>
> On Tue, Jul 03, 2018 at 02:50:40PM +0800, Pingfan Liu wrote:
> > commit 52cdbdd49853 ("driver core: correct device's shutdown order")
> > places an assumption of supplier<-consumer order on the process of probe.
> > But it turns out to break down the parent <- child order in some scene.
> > E.g in pci, a bridge is enabled by pci core, and behind it, the devices
> > have been probed. Then comes the bridge's module, which enables extra
> > feature(such as hotplug) on this bridge. This will break the
> > parent<-children order and cause failure when "kexec -e" in some scenario.
> >
> > The detailed description of the scenario:
> > An IBM Power9 machine on which, two drivers portdrv_pci and shpchp(a mod)
> > match the PCI_CLASS_BRIDGE_PCI, but neither of them success to probe due
> > to some issue. For this case, the bridge is moved after its children in
> > devices_kset. Then, when "kexec -e", a ata-disk behind the bridge can not
> > write back buffer in flight due to the former shutdown of the bridge which
> > clears the BusMaster bit.
>
> If you revert commit cc27b735ad3a ("PCI/portdrv: Turn off PCIe services
> during shutdown"), does the issue go away?

Yes, it is gone.
Andy Shevchenko July 3, 2018, 10:58 a.m. UTC | #2
I think Pavel would be interested to see this as well (he is doing
some parallel device shutdown stuff)

On Tue, Jul 3, 2018 at 9:50 AM, Pingfan Liu <kernelfans@gmail.com> wrote:
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index a48868f..684b994 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -1446,6 +1446,7 @@ void device_initialize(struct device *dev)
>         INIT_LIST_HEAD(&dev->links.consumers);
>         INIT_LIST_HEAD(&dev->links.suppliers);
>         dev->links.status = DL_DEV_NO_DRIVER;
> +       dev->shutdown = false;
>  }
>  EXPORT_SYMBOL_GPL(device_initialize);
>
> @@ -2811,7 +2812,6 @@ static void __device_shutdown(struct device *dev)
>          * lock is to be held
>          */
>         parent = get_device(dev->parent);
> -       get_device(dev);
>         /*
>          * Make sure the device is off the kset list, in the
>          * event that dev->*->shutdown() doesn't remove it.
> @@ -2842,23 +2842,60 @@ static void __device_shutdown(struct device *dev)
>                         dev_info(dev, "shutdown\n");
>                 dev->driver->shutdown(dev);
>         }
> -
> +       dev->shutdown = true;
>         device_unlock(dev);
>         if (parent)
>                 device_unlock(parent);
>
> -       put_device(dev);
>         put_device(parent);
>         spin_lock(&devices_kset->list_lock);
>  }
>
> +/* shutdown dev's children and consumer firstly, then itself */
> +static int device_for_each_child_shutdown(struct device *dev)
> +{
> +       struct klist_iter i;
> +       struct device *child;
> +       struct device_link *link;
> +
> +       /* already shutdown, then skip this sub tree */
> +       if (dev->shutdown)
> +               return 0;
> +
> +       if (!dev->p)
> +               goto check_consumers;
> +
> +       /* there is breakage of lock in __device_shutdown(), and the redundant
> +        * ref++ on srcu protected consumer is harmless since shutdown is not
> +        * hot path.
> +        */
> +       get_device(dev);
> +
> +       klist_iter_init(&dev->p->klist_children, &i);
> +       while ((child = next_device(&i)))
> +               device_for_each_child_shutdown(child);
> +       klist_iter_exit(&i);
> +
> +check_consumers:
> +       list_for_each_entry_rcu(link, &dev->links.consumers, s_node) {
> +               if (!link->consumer->shutdown)
> +                       device_for_each_child_shutdown(link->consumer);
> +       }
> +
> +       __device_shutdown(dev);
> +       put_device(dev);
> +       return 0;
> +}
> +
>  /**
>   * device_shutdown - call ->shutdown() on each device to shutdown.
>   */
>  void device_shutdown(void)
>  {
>         struct device *dev;
> +       int idx;
>
> +       idx = device_links_read_lock();
>         spin_lock(&devices_kset->list_lock);
>         /*
>          * Walk the devices list backward, shutting down each in turn.
> @@ -2866,11 +2903,12 @@ void device_shutdown(void)
>          * devices offline, even as the system is shutting down.
>          */
>         while (!list_empty(&devices_kset->list)) {
> -               dev = list_entry(devices_kset->list.prev, struct device,
> +               dev = list_entry(devices_kset->list.next, struct device,
>                                 kobj.entry);
> -               __device_shutdown(dev);
> +               device_for_each_child_shutdown(dev);
>         }
>         spin_unlock(&devices_kset->list_lock);
> +       device_links_read_unlock(idx);
>  }
>
>  /*
> diff --git a/include/linux/device.h b/include/linux/device.h
> index 055a69d..8a0f784 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -1003,6 +1003,7 @@ struct device {
>         bool                    offline:1;
>         bool                    of_node_reused:1;
>         bool                    dma_32bit_limit:1;
> +       bool                    shutdown:1; /* one direction: false->true */
>  };
>
>  static inline struct device *kobj_to_dev(struct kobject *kobj)
> --
> 2.7.4
>
Pavel Tatashin July 3, 2018, 5:03 p.m. UTC | #3
Thank you Andy for the heads up. I might need to rebase my work
(http://lkml.kernel.org/r/20180629182541.6735-1-pasha.tatashin@oracle.com)
based on this change. But, it is possible it is going to be harder to
parallelize based on device tree. I will need to think about it.

Pavel

Pingfan Liu July 4, 2018, 3:10 a.m. UTC | #4
On Tue, Jul 3, 2018 at 5:26 PM Pingfan Liu <kernelfans@gmail.com> wrote:
>
> On Tue, Jul 3, 2018 at 3:51 PM Lukas Wunner <lukas@wunner.de> wrote:
> >
> > On Tue, Jul 03, 2018 at 02:50:40PM +0800, Pingfan Liu wrote:
> > > commit 52cdbdd49853 ("driver core: correct device's shutdown order")
> > > places an assumption of supplier<-consumer order on the process of probe.
> > > But it turns out to break down the parent <- child order in some scene.
> > > E.g in pci, a bridge is enabled by pci core, and behind it, the devices
> > > have been probed. Then comes the bridge's module, which enables extra
> > > feature(such as hotplug) on this bridge. This will break the
> > > parent<-children order and cause failure when "kexec -e" in some scenario.
> > >
> > > The detailed description of the scenario:
> > > An IBM Power9 machine on which, two drivers portdrv_pci and shpchp(a mod)
> > > match the PCI_CLASS_BRIDGE_PCI, but neither of them success to probe due
> > > to some issue. For this case, the bridge is moved after its children in
> > > devices_kset. Then, when "kexec -e", a ata-disk behind the bridge can not
> > > write back buffer in flight due to the former shutdown of the bridge which
> > > clears the BusMaster bit.
> >
> > If you revert commit cc27b735ad3a ("PCI/portdrv: Turn off PCIe services
> > during shutdown"), does the issue go away?
>
> Yes, it is gone.

I have not figured out why the issue went away, but I think the revert
just covers up some underlying fault.

I re-fetched the boot log of a mainline kernel without any patch and
filtered for PCI domain 0004:
grep "devices_kset: Moving 0004:" newlog.txt
[    2.114986] devices_kset: Moving 0004:00:00.0 to end of list   <--- PCIe port driver's probe, which failed
[    2.115192] devices_kset: Moving 0004:01:00.0 to end of list
[    2.115591] devices_kset: Moving 0004:02:02.0 to end of list
[    2.115923] devices_kset: Moving 0004:02:0a.0 to end of list
[    2.116141] devices_kset: Moving 0004:02:0b.0 to end of list
[    2.116358] devices_kset: Moving 0004:02:0c.0 to end of list
[    3.181860] devices_kset: Moving 0004:03:00.0 to end of list  <--- the ATA disk controller, which sits behind the bridge
[   10.267081] devices_kset: Moving 0004:00:00.0 to end of list  <--- shpc_probe() on this bridge, which failed too

Hence the bridge (parent) ends up after its child in devices_kset.

Thanks,
Pingfan
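
The effect of the log above on the unpatched tail-to-head walk can be
sketched with a small userspace model. Everything here is an assumption
made for illustration: struct model_kset and the kset_*() helpers are
hypothetical names, the list is a plain array of names rather than the
kernel's kset, and a device not yet on the list is simply appended.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define KSET_MAX 8

/* Hypothetical model of devices_kset: an ordered array of device names. */
struct model_kset {
	const char *names[KSET_MAX];
	size_t count;
};

static void kset_add_tail(struct model_kset *ks, const char *name)
{
	assert(ks->count < KSET_MAX);
	ks->names[ks->count++] = name;
}

/* Mimics "devices_kset: Moving <dev> to end of list". */
static void kset_move_to_end(struct model_kset *ks, const char *name)
{
	size_t i;

	for (i = 0; i < ks->count; i++) {
		if (strcmp(ks->names[i], name) != 0)
			continue;
		/* Close the gap, then put the device in the last slot. */
		memmove(&ks->names[i], &ks->names[i + 1],
			(ks->count - i - 1) * sizeof(ks->names[0]));
		ks->names[ks->count - 1] = name;
		return;
	}
	kset_add_tail(ks, name); /* simplification: unknown device is appended */
}

/* Position in shutdown order when the list is walked from tail to head. */
static size_t kset_shutdown_pos(const struct model_kset *ks, const char *name)
{
	size_t i;

	for (i = 0; i < ks->count; i++)
		if (strcmp(ks->names[ks->count - 1 - i], name) == 0)
			return i;
	return ks->count; /* not on the list */
}
```

Replaying the moves for 0004:00:00.0, 0004:03:00.0 and then 0004:00:00.0
again leaves the bridge at the tail, so kset_shutdown_pos() returns a
smaller value for the bridge than for the disk controller behind it.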
kernel test robot July 4, 2018, 5:04 p.m. UTC | #5
Hi Pingfan,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on driver-core/driver-core-testing]
[also build test WARNING on v4.18-rc3 next-20180704]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Pingfan-Liu/drivers-base-bugfix-for-supplier-consumer-ordering-in-device_kset/20180703-184317
reproduce: make htmldocs

All warnings (new ones prefixed by >>):

   WARNING: convert(1) not found, for SVG to PDF conversion install ImageMagick (https://www.imagemagick.org)
   mm/mempool.c:228: warning: Function parameter or member 'pool' not described in 'mempool_init'
   include/net/cfg80211.h:4279: warning: Function parameter or member 'wext.ibss' not described in 'wireless_dev'
   include/net/cfg80211.h:4279: warning: Function parameter or member 'wext.connect' not described in 'wireless_dev'
   include/net/cfg80211.h:4279: warning: Function parameter or member 'wext.keys' not described in 'wireless_dev'
   include/net/cfg80211.h:4279: warning: Function parameter or member 'wext.ie' not described in 'wireless_dev'
   include/net/cfg80211.h:4279: warning: Function parameter or member 'wext.ie_len' not described in 'wireless_dev'
   include/net/cfg80211.h:4279: warning: Function parameter or member 'wext.bssid' not described in 'wireless_dev'
   include/net/cfg80211.h:4279: warning: Function parameter or member 'wext.ssid' not described in 'wireless_dev'
   include/net/cfg80211.h:4279: warning: Function parameter or member 'wext.default_key' not described in 'wireless_dev'
   include/net/cfg80211.h:4279: warning: Function parameter or member 'wext.default_mgmt_key' not described in 'wireless_dev'
   include/net/cfg80211.h:4279: warning: Function parameter or member 'wext.prev_bssid_valid' not described in 'wireless_dev'
   include/net/mac80211.h:2282: warning: Function parameter or member 'radiotap_timestamp.units_pos' not described in 'ieee80211_hw'
   include/net/mac80211.h:2282: warning: Function parameter or member 'radiotap_timestamp.accuracy' not described in 'ieee80211_hw'
   include/net/mac80211.h:955: warning: Function parameter or member 'control.rates' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 'control.rts_cts_rate_idx' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 'control.use_rts' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 'control.use_cts_prot' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 'control.short_preamble' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 'control.skip_table' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 'control.jiffies' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 'control.vif' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 'control.hw_key' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 'control.flags' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 'control.enqueue_time' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 'ack' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 'ack.cookie' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 'status.rates' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 'status.ack_signal' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 'status.ampdu_ack_len' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 'status.ampdu_len' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 'status.antenna' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 'status.tx_time' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 'status.is_valid_ack_signal' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 'status.status_driver_data' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 'driver_rates' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 'pad' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 'rate_driver_data' not described in 'ieee80211_tx_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'rx_stats_avg' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'rx_stats_avg.signal' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'rx_stats_avg.chain_signal' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.filtered' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.retry_failed' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.retry_count' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.lost_packets' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.last_tdls_pkt_time' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.msdu_retries' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.msdu_failed' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.last_ack' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.last_ack_signal' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.ack_signal_filled' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.avg_ack_signal' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'tx_stats.packets' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'tx_stats.bytes' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'tx_stats.last_rate' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'tx_stats.msdu' not described in 'sta_info'
   kernel/sched/fair.c:3760: warning: Function parameter or member 'flags' not described in 'attach_entity_load_avg'
   include/linux/device.h:93: warning: bad line: this bus.
   include/linux/dma-buf.h:307: warning: Function parameter or member 'cb_excl.cb' not described in 'dma_buf'
   include/linux/dma-buf.h:307: warning: Function parameter or member 'cb_excl.poll' not described in 'dma_buf'
   include/linux/dma-buf.h:307: warning: Function parameter or member 'cb_excl.active' not described in 'dma_buf'
   include/linux/dma-buf.h:307: warning: Function parameter or member 'cb_shared.cb' not described in 'dma_buf'
   include/linux/dma-buf.h:307: warning: Function parameter or member 'cb_shared.poll' not described in 'dma_buf'
   include/linux/dma-buf.h:307: warning: Function parameter or member 'cb_shared.active' not described in 'dma_buf'
   include/linux/dma-fence-array.h:54: warning: Function parameter or member 'work' not described in 'dma_fence_array'
   include/linux/gpio/driver.h:142: warning: Function parameter or member 'request_key' not described in 'gpio_irq_chip'
   include/linux/iio/hw-consumer.h:1: warning: no structured comments found
   include/linux/device.h:94: warning: bad line: this bus.
>> include/linux/device.h:1008: warning: Function parameter or member 'shutdown' not described in 'device'
   include/linux/input/sparse-keymap.h:46: warning: Function parameter or member 'sw' not described in 'key_entry'
   include/linux/regulator/driver.h:227: warning: Function parameter or member 'resume_early' not described in 'regulator_ops'
   drivers/regulator/core.c:4465: warning: Excess function parameter 'state' description in 'regulator_suspend_late'
   arch/s390/include/asm/cio.h:245: warning: Function parameter or member 'esw.esw0' not described in 'irb'
   arch/s390/include/asm/cio.h:245: warning: Function parameter or member 'esw.esw1' not described in 'irb'
   arch/s390/include/asm/cio.h:245: warning: Function parameter or member 'esw.esw2' not described in 'irb'
   arch/s390/include/asm/cio.h:245: warning: Function parameter or member 'esw.esw3' not described in 'irb'
   arch/s390/include/asm/cio.h:245: warning: Function parameter or member 'esw.eadm' not described in 'irb'
   drivers/usb/dwc3/gadget.c:510: warning: Excess function parameter 'dwc' description in 'dwc3_gadget_start_config'
   include/drm/drm_drv.h:610: warning: Function parameter or member 'gem_prime_pin' not described in 'drm_driver'
   include/drm/drm_drv.h:610: warning: Function parameter or member 'gem_prime_unpin' not described in 'drm_driver'
   include/drm/drm_drv.h:610: warning: Function parameter or member 'gem_prime_res_obj' not described in 'drm_driver'
   include/drm/drm_drv.h:610: warning: Function parameter or member 'gem_prime_get_sg_table' not described in 'drm_driver'
   include/drm/drm_drv.h:610: warning: Function parameter or member 'gem_prime_import_sg_table' not described in 'drm_driver'
   include/drm/drm_drv.h:610: warning: Function parameter or member 'gem_prime_vmap' not described in 'drm_driver'
   include/drm/drm_drv.h:610: warning: Function parameter or member 'gem_prime_vunmap' not described in 'drm_driver'
   include/drm/drm_drv.h:610: warning: Function parameter or member 'gem_prime_mmap' not described in 'drm_driver'
   drivers/gpu/drm/i915/i915_vma.h:48: warning: cannot understand function prototype: 'struct i915_vma '
   drivers/gpu/drm/i915/i915_vma.h:1: warning: no structured comments found
   include/drm/tinydrm/tinydrm.h:34: warning: Function parameter or member 'fb_dirty' not described in 'tinydrm_device'
   drivers/gpu/drm/tinydrm/mipi-dbi.c:272: warning: Function parameter or member 'crtc_state' not described in 'mipi_dbi_enable_flush'
   drivers/gpu/drm/tinydrm/mipi-dbi.c:272: warning: Function parameter or member 'plane_state' not described in 'mipi_dbi_enable_flush'

vim +1008 include/linux/device.h

^1da177e Linus Torvalds 2005-04-16 @1008  

:::::: The code at line 1008 was first introduced by commit
:::::: 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 Linux-2.6.12-rc2

:::::: TO: Linus Torvalds <torvalds@ppc970.osdl.org>
:::::: CC: Linus Torvalds <torvalds@ppc970.osdl.org>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
Rafael J. Wysocki July 5, 2018, 10:11 a.m. UTC | #6
On Tuesday, July 3, 2018 8:50:40 AM CEST Pingfan Liu wrote:
> commit 52cdbdd49853 ("driver core: correct device's shutdown order")
> places an assumption of supplier<-consumer order on the process of probe.
> But it turns out to break down the parent <- child order in some scene.
> E.g in pci, a bridge is enabled by pci core, and behind it, the devices
> have been probed. Then comes the bridge's module, which enables extra
> feature(such as hotplug) on this bridge. This will break the
> parent<-children order and cause failure when "kexec -e" in some scenario.
> 
> The detailed description of the scenario:
> An IBM Power9 machine on which, two drivers portdrv_pci and shpchp(a mod)
> match the PCI_CLASS_BRIDGE_PCI, but neither of them success to probe due
> to some issue. For this case, the bridge is moved after its children in
> devices_kset. Then, when "kexec -e", a ata-disk behind the bridge can not
> write back buffer in flight due to the former shutdown of the bridge which
> clears the BusMaster bit.
> 
> It is a little hard to impose both "parent<-child" and "supplier<-consumer"
> order on devices_kset. Take the following scene:
> step0: before a consumer's probing, (note child_a is supplier of consumer_a)
>   [ consumer-X, child_a, ...., child_z] [... consumer_a, ..., consumer_z, ...] supplier-X
>                                          ^^^^^^^^^^ affected range ^^^^^^^^^^
> step1: when probing, moving consumer-X after supplier-X
>   [ child_a, ...., child_z] [.... consumer_a, ..., consumer_z, ...] supplier-X, consumer-X
> step2: the children of consumer-X should be re-ordered to maintain the seq
>   [... consumer_a, ..., consumer_z, ....] supplier-X  [consumer-X, child_a, ...., child_z]
> step3: the consumer_a should be re-ordered to maintain the seq
>   [... consumer_z, ...] supplier-X [ consumer-X, child_a, consumer_a ..., child_z]
> 
> It requires two nested recursion to drain out all out-of-order item in
> "affected range". To avoid such complicated code, this patch suggests
> to utilize the info in device tree, instead of using the order of
> devices_kset during shutdown. It iterates the device tree, and firstly
> shutdown a device's children and consumers. After this patch, the buggy
> commit is hollow and left to clean.
> 
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Grygorii Strashko <grygorii.strashko@ti.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Bjorn Helgaas <helgaas@kernel.org>
> Cc: Dave Young <dyoung@redhat.com>
> Cc: linux-pci@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> ---
>  drivers/base/core.c    | 48 +++++++++++++++++++++++++++++++++++++++++++-----
>  include/linux/device.h |  1 +
>  2 files changed, 44 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index a48868f..684b994 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -1446,6 +1446,7 @@ void device_initialize(struct device *dev)
>  	INIT_LIST_HEAD(&dev->links.consumers);
>  	INIT_LIST_HEAD(&dev->links.suppliers);
>  	dev->links.status = DL_DEV_NO_DRIVER;
> +	dev->shutdown = false;
>  }
>  EXPORT_SYMBOL_GPL(device_initialize);
>  
> @@ -2811,7 +2812,6 @@ static void __device_shutdown(struct device *dev)
>  	 * lock is to be held
>  	 */
>  	parent = get_device(dev->parent);
> -	get_device(dev);

Why is the get_/put_device() not needed any more?

>  	/*
>  	 * Make sure the device is off the kset list, in the
>  	 * event that dev->*->shutdown() doesn't remove it.
> @@ -2842,23 +2842,60 @@ static void __device_shutdown(struct device *dev)
>  			dev_info(dev, "shutdown\n");
>  		dev->driver->shutdown(dev);
>  	}
> -
> +	dev->shutdown = true;
>  	device_unlock(dev);
>  	if (parent)
>  		device_unlock(parent);
>  
> -	put_device(dev);
>  	put_device(parent);
>  	spin_lock(&devices_kset->list_lock);
>  }
>  
> +/* shutdown dev's children and consumer firstly, then itself */
> +static int device_for_each_child_shutdown(struct device *dev)

Confusing name.

What about device_shutdown_subordinate()?

> +{
> +	struct klist_iter i;
> +	struct device *child;
> +	struct device_link *link;
> +
> +	/* already shutdown, then skip this sub tree */
> +	if (dev->shutdown)
> +		return 0;
> +
> +	if (!dev->p)
> +		goto check_consumers;
> +
> +	/* there is breakage of lock in __device_shutdown(), and the redundant
> +	 * ref++ on srcu protected consumer is harmless since shutdown is not
> +	 * hot path.
> +	 */
> +	get_device(dev);
> +
> +	klist_iter_init(&dev->p->klist_children, &i);
> +	while ((child = next_device(&i)))
> +		device_for_each_child_shutdown(child);

Why don't you use device_for_each_child() here?

> +	klist_iter_exit(&i);
> +
> +check_consumers:
> +	list_for_each_entry_rcu(link, &dev->links.consumers, s_node) {
> +		if (!link->consumer->shutdown)
> +			device_for_each_child_shutdown(link->consumer);
> +	}
> +
> +	__device_shutdown(dev);
> +	put_device(dev);

Possible reference counter imbalance AFAICS.

> +	return 0;
> +}

Well, instead of doing this dance, we might as well walk dpm_list here as it
is in the right order.

Of course, that would require dpm_list to be available for CONFIG_PM unset,
but it may be a better approach long term.

> +
>  /**
>   * device_shutdown - call ->shutdown() on each device to shutdown.
>   */
>  void device_shutdown(void)
>  {
>  	struct device *dev;
> +	int idx;
>  
> +	idx = device_links_read_lock();
>  	spin_lock(&devices_kset->list_lock);
>  	/*
>  	 * Walk the devices list backward, shutting down each in turn.
> @@ -2866,11 +2903,12 @@ void device_shutdown(void)
>  	 * devices offline, even as the system is shutting down.
>  	 */
>  	while (!list_empty(&devices_kset->list)) {
> -		dev = list_entry(devices_kset->list.prev, struct device,
> +		dev = list_entry(devices_kset->list.next, struct device,
>  				kobj.entry);
> -		__device_shutdown(dev);
> +		device_for_each_child_shutdown(dev);
>  	}
>  	spin_unlock(&devices_kset->list_lock);
> +	device_links_read_unlock(idx);
>  }
>  
>  /*
> diff --git a/include/linux/device.h b/include/linux/device.h
> index 055a69d..8a0f784 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -1003,6 +1003,7 @@ struct device {
>  	bool			offline:1;
>  	bool			of_node_reused:1;
>  	bool			dma_32bit_limit:1;
> +	bool			shutdown:1; /* one direction: false->true */
>  };
>  
>  static inline struct device *kobj_to_dev(struct kobject *kobj)
> 

If the device_kset_move_last() in really_probe() is the only problem,
I'd rather try to fix that one in the first place.

Why is it needed?

Thanks,
Rafael
Pingfan Liu July 6, 2018, 3:02 a.m. UTC | #7
On Thu, Jul 5, 2018 at 6:13 PM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>
> On Tuesday, July 3, 2018 8:50:40 AM CEST Pingfan Liu wrote:
> > commit 52cdbdd49853 ("driver core: correct device's shutdown order")
> > places an assumption of supplier<-consumer order on the process of probe.
> > But it turns out to break down the parent <- child order in some scene.
> > E.g in pci, a bridge is enabled by pci core, and behind it, the devices
> > have been probed. Then comes the bridge's module, which enables extra
> > feature(such as hotplug) on this bridge. This will break the
> > parent<-children order and cause failure when "kexec -e" in some scenario.
> >
> > The detailed description of the scenario:
> > An IBM Power9 machine on which, two drivers portdrv_pci and shpchp(a mod)
> > match the PCI_CLASS_BRIDGE_PCI, but neither of them success to probe due
> > to some issue. For this case, the bridge is moved after its children in
> > devices_kset. Then, when "kexec -e", a ata-disk behind the bridge can not
> > write back buffer in flight due to the former shutdown of the bridge which
> > clears the BusMaster bit.
> >
> > It is a little hard to impose both "parent<-child" and "supplier<-consumer"
> > order on devices_kset. Take the following scene:
> > step0: before a consumer's probing, (note child_a is supplier of consumer_a)
> >   [ consumer-X, child_a, ...., child_z] [... consumer_a, ..., consumer_z, ...] supplier-X
> >                                          ^^^^^^^^^^ affected range ^^^^^^^^^^
> > step1: when probing, moving consumer-X after supplier-X
> >   [ child_a, ...., child_z] [.... consumer_a, ..., consumer_z, ...] supplier-X, consumer-X
> > step2: the children of consumer-X should be re-ordered to maintain the seq
> >   [... consumer_a, ..., consumer_z, ....] supplier-X  [consumer-X, child_a, ...., child_z]
> > step3: the consumer_a should be re-ordered to maintain the seq
> >   [... consumer_z, ...] supplier-X [ consumer-X, child_a, consumer_a ..., child_z]
> >
> > It requires two nested recursion to drain out all out-of-order item in
> > "affected range". To avoid such complicated code, this patch suggests
> > to utilize the info in device tree, instead of using the order of
> > devices_kset during shutdown. It iterates the device tree, and firstly
> > shutdown a device's children and consumers. After this patch, the buggy
> > commit is hollow and left to clean.
> >
> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Cc: Grygorii Strashko <grygorii.strashko@ti.com>
> > Cc: Christoph Hellwig <hch@infradead.org>
> > Cc: Bjorn Helgaas <helgaas@kernel.org>
> > Cc: Dave Young <dyoung@redhat.com>
> > Cc: linux-pci@vger.kernel.org
> > Cc: linuxppc-dev@lists.ozlabs.org
> > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > ---
> >  drivers/base/core.c    | 48 +++++++++++++++++++++++++++++++++++++++++++-----
> >  include/linux/device.h |  1 +
> >  2 files changed, 44 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > index a48868f..684b994 100644
> > --- a/drivers/base/core.c
> > +++ b/drivers/base/core.c
> > @@ -1446,6 +1446,7 @@ void device_initialize(struct device *dev)
> >       INIT_LIST_HEAD(&dev->links.consumers);
> >       INIT_LIST_HEAD(&dev->links.suppliers);
> >       dev->links.status = DL_DEV_NO_DRIVER;
> > +     dev->shutdown = false;
> >  }
> >  EXPORT_SYMBOL_GPL(device_initialize);
> >
> > @@ -2811,7 +2812,6 @@ static void __device_shutdown(struct device *dev)
> >        * lock is to be held
> >        */
> >       parent = get_device(dev->parent);
> > -     get_device(dev);
>
> Why is the get_/put_device() not needed any more?
>
They are moved up a layer, into device_for_each_child_shutdown().
Since the lock is dropped inside __device_shutdown(), the reference is
taken there to protect the ancestor.  I also think the
get_device(dev->parent) can be deleted as well.

> >       /*
> >        * Make sure the device is off the kset list, in the
> >        * event that dev->*->shutdown() doesn't remove it.
> > @@ -2842,23 +2842,60 @@ static void __device_shutdown(struct device *dev)
> >                       dev_info(dev, "shutdown\n");
> >               dev->driver->shutdown(dev);
> >       }
> > -
> > +     dev->shutdown = true;
> >       device_unlock(dev);
> >       if (parent)
> >               device_unlock(parent);
> >
> > -     put_device(dev);
> >       put_device(parent);
> >       spin_lock(&devices_kset->list_lock);
> >  }
> >
> > +/* shutdown dev's children and consumer firstly, then itself */
> > +static int device_for_each_child_shutdown(struct device *dev)
>
> Confusing name.
>
> What about device_shutdown_subordinate()?
>
Fine. My choice of words was not precise.

> > +{
> > +     struct klist_iter i;
> > +     struct device *child;
> > +     struct device_link *link;
> > +
> > +     /* already shutdown, then skip this sub tree */
> > +     if (dev->shutdown)
> > +             return 0;
> > +
> > +     if (!dev->p)
> > +             goto check_consumers;
> > +
> > +     /* there is breakage of lock in __device_shutdown(), and the redundant
> > +      * ref++ on srcu protected consumer is harmless since shutdown is not
> > +      * hot path.
> > +      */
> > +     get_device(dev);
> > +
> > +     klist_iter_init(&dev->p->klist_children, &i);
> > +     while ((child = next_device(&i)))
> > +             device_for_each_child_shutdown(child);
>
> Why don't you use device_for_each_child() here?
>
OK, I will try to use it.

> > +     klist_iter_exit(&i);
> > +
> > +check_consumers:
> > +     list_for_each_entry_rcu(link, &dev->links.consumers, s_node) {
> > +             if (!link->consumer->shutdown)
> > +                     device_for_each_child_shutdown(link->consumer);
> > +     }
> > +
> > +     __device_shutdown(dev);
> > +     put_device(dev);
>
> Possible reference counter imbalance AFAICS.
>
Yes, get_device() should come before "if (!dev->p)". Is there anything else I missed?

> > +     return 0;
> > +}
>
> Well, instead of doing this dance, we might as well walk dpm_list here as it
> is in the right order.
>
Sorry, do you mean managing the dpm_list in the same way?

> Of course, that would require dpm_list to be available for CONFIG_PM unset,
> but it may be a better approach long term.
>
> > +
> >  /**
> >   * device_shutdown - call ->shutdown() on each device to shutdown.
> >   */
> >  void device_shutdown(void)
> >  {
> >       struct device *dev;
> > +     int idx;
> >
> > +     idx = device_links_read_lock();
> >       spin_lock(&devices_kset->list_lock);
> >       /*
> >        * Walk the devices list backward, shutting down each in turn.
> > @@ -2866,11 +2903,12 @@ void device_shutdown(void)
> >        * devices offline, even as the system is shutting down.
> >        */
> >       while (!list_empty(&devices_kset->list)) {
> > -             dev = list_entry(devices_kset->list.prev, struct device,
> > +             dev = list_entry(devices_kset->list.next, struct device,
> >                               kobj.entry);
> > -             __device_shutdown(dev);
> > +             device_for_each_child_shutdown(dev);
> >       }
> >       spin_unlock(&devices_kset->list_lock);
> > +     device_links_read_unlock(idx);
> >  }
> >
> >  /*
> > diff --git a/include/linux/device.h b/include/linux/device.h
> > index 055a69d..8a0f784 100644
> > --- a/include/linux/device.h
> > +++ b/include/linux/device.h
> > @@ -1003,6 +1003,7 @@ struct device {
> >       bool                    offline:1;
> >       bool                    of_node_reused:1;
> >       bool                    dma_32bit_limit:1;
> > +     bool                    shutdown:1; /* one direction: false->true */
> >  };
> >
> >  static inline struct device *kobj_to_dev(struct kobject *kobj)
> >
>
> If the device_kset_move_last() in really_probe() is the only problem,
> I'd rather try to fix that one in the first place.
>
> Why is it needed?
>
I tried, but it turned out not to be easy to achieve. The code is at
https://patchwork.kernel.org/patch/10485195/, and the algorithm is
described in detail in this patch's commit log. In more detail, the
potential out-of-order issue in really_probe() looks like this:

0th. [ consumer-X, child_a, ...., child_z] [... consumer_a, ..., consumer_z, ...] supplier-X
     (note child_a is a supplier of consumer_a)

Addressing every potentially out-of-order item in the affected section
[... consumer_a, ..., consumer_z, ...] incurs two nested recursions:

1st. move consumer-X and its descendants after supplier-X;
2nd. move consumer_a after child_a;
3rd. the 2nd step may reproduce the situation of the 0th.

Besides the two interleaved recursions, dropping the spin lock requires
extra effort to keep the items from disappearing from the linked list
(which I did not implement in
https://patchwork.kernel.org/patch/10485195/). Hence I turned to this
cheap method.
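
For illustration only, the traversal this patch proposes (shut down a
device's children and consumers first, then the device itself, and skip
anything already marked as shut down) can be sketched in user space.
This is not the kernel code: struct toy_dev, toy_shutdown_subordinate()
and the order[] log are hypothetical names, there is no locking or
reference counting, and the sketch assumes the parent/child and
supplier/consumer edges contain no cycles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_LINKS 4
#define MAX_ORDER 16

/* Toy stand-in for struct device; every name here is hypothetical. */
struct toy_dev {
	const char *name;
	struct toy_dev *children[MAX_LINKS];	/* parent -> child edges */
	struct toy_dev *consumers[MAX_LINKS];	/* supplier -> consumer edges */
	bool shutdown;				/* one direction: false -> true */
};

/* Log of the order in which devices were shut down. */
static const char *order[MAX_ORDER];
static size_t order_len;

/* Shut down dev's children and consumers first, then dev itself. */
static void toy_shutdown_subordinate(struct toy_dev *dev)
{
	size_t i;

	/* Already shut down: skip this whole subtree. */
	if (dev->shutdown)
		return;

	for (i = 0; i < MAX_LINKS && dev->children[i]; i++)
		toy_shutdown_subordinate(dev->children[i]);

	for (i = 0; i < MAX_LINKS && dev->consumers[i]; i++)
		toy_shutdown_subordinate(dev->consumers[i]);

	/* Everything that depends on dev is down; now handle dev. */
	dev->shutdown = true;
	if (order_len < MAX_ORDER)
		order[order_len++] = dev->name;
}
```

Calling toy_shutdown_subordinate() on every device, in whatever order
the list happens to hold them, shuts each device down exactly once and
never before its children or consumers. That is why the position of an
entry in the list stops mattering under this scheme.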

Thanks,
Pingfan
Rafael J. Wysocki July 6, 2018, 9:53 a.m. UTC | #8
On Friday, July 6, 2018 5:02:15 AM CEST Pingfan Liu wrote:
> On Thu, Jul 5, 2018 at 6:13 PM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> >
> > On Tuesday, July 3, 2018 8:50:40 AM CEST Pingfan Liu wrote:
> > > commit 52cdbdd49853 ("driver core: correct device's shutdown order")
> > > places an assumption of supplier<-consumer order on the process of probe.
> > > But it turns out to break down the parent <- child order in some scene.
> > > E.g in pci, a bridge is enabled by pci core, and behind it, the devices
> > > have been probed. Then comes the bridge's module, which enables extra
> > > feature(such as hotplug) on this bridge. This will break the
> > > parent<-children order and cause failure when "kexec -e" in some scenario.
> > >
> > > The detailed description of the scenario:
> > > An IBM Power9 machine on which, two drivers portdrv_pci and shpchp(a mod)
> > > match the PCI_CLASS_BRIDGE_PCI, but neither of them success to probe due
> > > to some issue. For this case, the bridge is moved after its children in
> > > devices_kset. Then, when "kexec -e", a ata-disk behind the bridge can not
> > > write back buffer in flight due to the former shutdown of the bridge which
> > > clears the BusMaster bit.
> > >
> > > It is a little hard to impose both "parent<-child" and "supplier<-consumer"
> > > order on devices_kset. Take the following scene:
> > > step0: before a consumer's probing, (note child_a is supplier of consumer_a)
> > >   [ consumer-X, child_a, ...., child_z] [... consumer_a, ..., consumer_z, ...] supplier-X
> > >                                          ^^^^^^^^^^ affected range ^^^^^^^^^^
> > > step1: when probing, moving consumer-X after supplier-X
> > >   [ child_a, ...., child_z] [.... consumer_a, ..., consumer_z, ...] supplier-X, consumer-X
> > > step2: the children of consumer-X should be re-ordered to maintain the seq
> > >   [... consumer_a, ..., consumer_z, ....] supplier-X  [consumer-X, child_a, ...., child_z]
> > > step3: the consumer_a should be re-ordered to maintain the seq
> > >   [... consumer_z, ...] supplier-X [ consumer-X, child_a, consumer_a ..., child_z]
> > >
> > > It requires two nested recursion to drain out all out-of-order item in
> > > "affected range". To avoid such complicated code, this patch suggests
> > > to utilize the info in device tree, instead of using the order of
> > > devices_kset during shutdown. It iterates the device tree, and firstly
> > > shutdown a device's children and consumers. After this patch, the buggy
> > > commit is hollow and left to clean.
> > >
> > > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > Cc: Grygorii Strashko <grygorii.strashko@ti.com>
> > > Cc: Christoph Hellwig <hch@infradead.org>
> > > Cc: Bjorn Helgaas <helgaas@kernel.org>
> > > Cc: Dave Young <dyoung@redhat.com>
> > > Cc: linux-pci@vger.kernel.org
> > > Cc: linuxppc-dev@lists.ozlabs.org
> > > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > > ---
> > >  drivers/base/core.c    | 48 +++++++++++++++++++++++++++++++++++++++++++-----
> > >  include/linux/device.h |  1 +
> > >  2 files changed, 44 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > > index a48868f..684b994 100644
> > > --- a/drivers/base/core.c
> > > +++ b/drivers/base/core.c
> > > @@ -1446,6 +1446,7 @@ void device_initialize(struct device *dev)
> > >       INIT_LIST_HEAD(&dev->links.consumers);
> > >       INIT_LIST_HEAD(&dev->links.suppliers);
> > >       dev->links.status = DL_DEV_NO_DRIVER;
> > > +     dev->shutdown = false;
> > >  }
> > >  EXPORT_SYMBOL_GPL(device_initialize);
> > >
> > > @@ -2811,7 +2812,6 @@ static void __device_shutdown(struct device *dev)
> > >        * lock is to be held
> > >        */
> > >       parent = get_device(dev->parent);
> > > -     get_device(dev);
> >
> > Why is the get_/put_device() not needed any more?
> >
> They are moved upper layer into device_for_each_child_shutdown().
> Since there is lock breakage in __device_shutdown(), resorting to
> ref++ to protect the ancestor.  And I think the
> get_device(dev->parent) can be deleted either.

Wouldn't that break USB?

> > >       /*
> > >        * Make sure the device is off the kset list, in the
> > >        * event that dev->*->shutdown() doesn't remove it.
> > > @@ -2842,23 +2842,60 @@ static void __device_shutdown(struct device *dev)
> > >                       dev_info(dev, "shutdown\n");
> > >               dev->driver->shutdown(dev);
> > >       }
> > > -
> > > +     dev->shutdown = true;
> > >       device_unlock(dev);
> > >       if (parent)
> > >               device_unlock(parent);
> > >
> > > -     put_device(dev);
> > >       put_device(parent);
> > >       spin_lock(&devices_kset->list_lock);
> > >  }
> > >
> > > +/* shutdown dev's children and consumer firstly, then itself */
> > > +static int device_for_each_child_shutdown(struct device *dev)
> >
> > Confusing name.
> >
> > What about device_shutdown_subordinate()?
> >
> Fine. My understanding of words is not exact.
> 
> > > +{
> > > +     struct klist_iter i;
> > > +     struct device *child;
> > > +     struct device_link *link;
> > > +
> > > +     /* already shutdown, then skip this sub tree */
> > > +     if (dev->shutdown)
> > > +             return 0;
> > > +
> > > +     if (!dev->p)
> > > +             goto check_consumers;
> > > +
> > > +     /* there is breakage of lock in __device_shutdown(), and the redundant
> > > +      * ref++ on srcu protected consumer is harmless since shutdown is not
> > > +      * hot path.
> > > +      */
> > > +     get_device(dev);
> > > +
> > > +     klist_iter_init(&dev->p->klist_children, &i);
> > > +     while ((child = next_device(&i)))
> > > +             device_for_each_child_shutdown(child);
> >
> > Why don't you use device_for_each_child() here?
> >
> OK, I will try use it.

Well, hold on.

> > > +     klist_iter_exit(&i);
> > > +
> > > +check_consumers:
> > > +     list_for_each_entry_rcu(link, &dev->links.consumers, s_node) {
> > > +             if (!link->consumer->shutdown)
> > > +                     device_for_each_child_shutdown(link->consumer);
> > > +     }
> > > +
> > > +     __device_shutdown(dev);
> > > +     put_device(dev);
> >
> > Possible reference counter imbalance AFAICS.
> >
> Yes, get_device() should be ahead of "if (!dev->p)". Is anything  else I miss?

Yes, that's it.
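
A small user-space sketch of the imbalance, for readers following along.
It only mirrors the control flow of the quoted hunk: struct ref_dev,
ref_get(), ref_put() and both walk_*() functions are hypothetical
stand-ins, and has_private plays the role of "dev->p != NULL".

```c
#include <assert.h>
#include <stdbool.h>

/* Toy device with a bare counter; not the kernel's kobject refcount. */
struct ref_dev {
	int refcount;
	bool has_private;	/* stands in for dev->p != NULL */
};

static void ref_get(struct ref_dev *d) { d->refcount++; }
static void ref_put(struct ref_dev *d) { d->refcount--; }

/* Control flow as posted: the get is skipped on the early goto. */
static void walk_as_posted(struct ref_dev *d)
{
	if (!d->has_private)
		goto check_consumers;

	ref_get(d);
	/* ... children would be visited here ... */

check_consumers:
	/* ... consumers visited, device shut down ... */
	ref_put(d);	/* runs even when ref_get() was skipped */
}

/* Control flow with the get moved ahead of the check. */
static void walk_fixed(struct ref_dev *d)
{
	ref_get(d);
	if (!d->has_private)
		goto check_consumers;

	/* ... children would be visited here ... */

check_consumers:
	/* ... consumers visited, device shut down ... */
	ref_put(d);	/* always paired with the ref_get() above */
}
```

With has_private false, walk_as_posted() leaves the counter one lower
than it found it, while walk_fixed() leaves it unchanged.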

> > > +     return 0;
> > > +}
> >
> > Well, instead of doing this dance, we might as well walk dpm_list here as it
> > is in the right order.
> >
> Sorry, do you mean that using the same way to manage the dpm_list?

No, I mean to use dpm_list instead of devices_kset for shutdown.

They should be in the same order anyway if all is correct.

> > Of course, that would require dpm_list to be available for CONFIG_PM unset,
> > but it may be a better approach long term.
> >
> > > +
> > >  /**
> > >   * device_shutdown - call ->shutdown() on each device to shutdown.
> > >   */
> > >  void device_shutdown(void)
> > >  {
> > >       struct device *dev;
> > > +     int idx;
> > >
> > > +     idx = device_links_read_lock();
> > >       spin_lock(&devices_kset->list_lock);
> > >       /*
> > >        * Walk the devices list backward, shutting down each in turn.
> > > @@ -2866,11 +2903,12 @@ void device_shutdown(void)
> > >        * devices offline, even as the system is shutting down.
> > >        */
> > >       while (!list_empty(&devices_kset->list)) {
> > > -             dev = list_entry(devices_kset->list.prev, struct device,
> > > +             dev = list_entry(devices_kset->list.next, struct device,
> > >                               kobj.entry);
> > > -             __device_shutdown(dev);
> > > +             device_for_each_child_shutdown(dev);
> > >       }
> > >       spin_unlock(&devices_kset->list_lock);
> > > +     device_links_read_unlock(idx);
> > >  }
> > >
> > >  /*
> > > diff --git a/include/linux/device.h b/include/linux/device.h
> > > index 055a69d..8a0f784 100644
> > > --- a/include/linux/device.h
> > > +++ b/include/linux/device.h
> > > @@ -1003,6 +1003,7 @@ struct device {
> > >       bool                    offline:1;
> > >       bool                    of_node_reused:1;
> > >       bool                    dma_32bit_limit:1;
> > > +     bool                    shutdown:1; /* one direction: false->true */
> > >  };
> > >
> > >  static inline struct device *kobj_to_dev(struct kobject *kobj)
> > >
> >
> > If the device_kset_move_last() in really_probe() is the only problem,
> > I'd rather try to fix that one in the first place.
> >
> > Why is it needed?
> >
> I had tried, but it turns out not easy to archive. The code is
> https://patchwork.kernel.org/patch/10485195/. And I make a detailed
> description of the algorithm in this patch's commit log. To be more
> detailed, we face the potential out of order issue in really_probe()
> like : 0th. [ consumer-X, child_a, ...., child_z] [... consumer_a,
> ..., consumer_z, ...] supplier-X //(note child_a is supplier of
> consumer_a).  To address all the potential out of order item in the
> affected section [... consumer_a, ..., consumer_z, ...],  it will
> incur two nested recursions.  1st, moving  consumer-X and its
> descendants after supplier-X,  2nd, moving consumer_a after child_a,
> 3rd. the 2nd step may pose the same situation of 0th.  Besides the two
> interleaved recursion,  the breakage of spin lock requires more effort
> to protect the item from disappearing in linked-list  (which I did not
> implement in the https://patchwork.kernel.org/patch/10485195/). Hence
> I turn to this cheap method.

So I think that we simply need to drop the devices_kset_move_last() call
from really_probe() as it is plain incorrect and the use case for it is
questionable at best.

And the use case it is supposed to address should be addressed differently.
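
To make the failure mode concrete, here is a toy user-space model of
the reordering, under the assumption (taken from the commit log) that
shutdown walks the list from tail to head. The array-based list,
toy_move_last() and toy_shutdown_pos() are hypothetical helpers, not
the kernel's kset API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Move the entry at index idx to the tail; the rest keep their order. */
static void toy_move_last(const char *list[], size_t n, size_t idx)
{
	const char *moved = list[idx];

	memmove(&list[idx], &list[idx + 1],
		(n - idx - 1) * sizeof(list[0]));
	list[n - 1] = moved;
}

/*
 * Position at which name is shut down when the list is walked from
 * tail to head (0 means first). Returns n if name is not on the list.
 */
static size_t toy_shutdown_pos(const char *list[], size_t n,
			       const char *name)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (strcmp(list[n - 1 - i], name) == 0)
			return i;
	return n;
}
```

Starting from { "bridge", "disk", "nic" }, the backward walk reaches
"disk" before "bridge", which is the parent<-child order shutdown needs.
After toy_move_last(list, 3, 0) the list is { "disk", "nic", "bridge" }
and the bridge is shut down first, ahead of the disk behind it.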

Thanks,
Rafael
Pingfan Liu July 7, 2018, 4:02 a.m. UTC | #9
On Fri, Jul 6, 2018 at 5:54 PM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>
> On Friday, July 6, 2018 5:02:15 AM CEST Pingfan Liu wrote:
> > On Thu, Jul 5, 2018 at 6:13 PM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > >
> > > On Tuesday, July 3, 2018 8:50:40 AM CEST Pingfan Liu wrote:
> > > > commit 52cdbdd49853 ("driver core: correct device's shutdown order")
> > > > places an assumption of supplier<-consumer order on the process of probe.
> > > > But it turns out to break down the parent <- child order in some scene.
> > > > E.g in pci, a bridge is enabled by pci core, and behind it, the devices
> > > > have been probed. Then comes the bridge's module, which enables extra
> > > > feature(such as hotplug) on this bridge. This will break the
> > > > parent<-children order and cause failure when "kexec -e" in some scenario.
> > > >
> > > > The detailed description of the scenario:
> > > > An IBM Power9 machine on which, two drivers portdrv_pci and shpchp(a mod)
> > > > match the PCI_CLASS_BRIDGE_PCI, but neither of them success to probe due
> > > > to some issue. For this case, the bridge is moved after its children in
> > > > devices_kset. Then, when "kexec -e", a ata-disk behind the bridge can not
> > > > write back buffer in flight due to the former shutdown of the bridge which
> > > > clears the BusMaster bit.
> > > >
> > > > It is a little hard to impose both "parent<-child" and "supplier<-consumer"
> > > > order on devices_kset. Take the following scene:
> > > > step0: before a consumer's probing, (note child_a is supplier of consumer_a)
> > > >   [ consumer-X, child_a, ...., child_z] [... consumer_a, ..., consumer_z, ...] supplier-X
> > > >                                          ^^^^^^^^^^ affected range ^^^^^^^^^^
> > > > step1: when probing, moving consumer-X after supplier-X
> > > >   [ child_a, ...., child_z] [.... consumer_a, ..., consumer_z, ...] supplier-X, consumer-X
> > > > step2: the children of consumer-X should be re-ordered to maintain the seq
> > > >   [... consumer_a, ..., consumer_z, ....] supplier-X  [consumer-X, child_a, ...., child_z]
> > > > step3: the consumer_a should be re-ordered to maintain the seq
> > > >   [... consumer_z, ...] supplier-X [ consumer-X, child_a, consumer_a ..., child_z]
> > > >
> > > > It requires two nested recursion to drain out all out-of-order item in
> > > > "affected range". To avoid such complicated code, this patch suggests
> > > > to utilize the info in device tree, instead of using the order of
> > > > devices_kset during shutdown. It iterates the device tree, and firstly
> > > > shutdown a device's children and consumers. After this patch, the buggy
> > > > commit is hollow and left to clean.
> > > >
> > > > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > Cc: Grygorii Strashko <grygorii.strashko@ti.com>
> > > > Cc: Christoph Hellwig <hch@infradead.org>
> > > > Cc: Bjorn Helgaas <helgaas@kernel.org>
> > > > Cc: Dave Young <dyoung@redhat.com>
> > > > Cc: linux-pci@vger.kernel.org
> > > > Cc: linuxppc-dev@lists.ozlabs.org
> > > > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > > > ---
> > > >  drivers/base/core.c    | 48 +++++++++++++++++++++++++++++++++++++++++++-----
> > > >  include/linux/device.h |  1 +
> > > >  2 files changed, 44 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > > > index a48868f..684b994 100644
> > > > --- a/drivers/base/core.c
> > > > +++ b/drivers/base/core.c
> > > > @@ -1446,6 +1446,7 @@ void device_initialize(struct device *dev)
> > > >       INIT_LIST_HEAD(&dev->links.consumers);
> > > >       INIT_LIST_HEAD(&dev->links.suppliers);
> > > >       dev->links.status = DL_DEV_NO_DRIVER;
> > > > +     dev->shutdown = false;
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(device_initialize);
> > > >
> > > > @@ -2811,7 +2812,6 @@ static void __device_shutdown(struct device *dev)
> > > >        * lock is to be held
> > > >        */
> > > >       parent = get_device(dev->parent);
> > > > -     get_device(dev);
> > >
> > > Why is the get_/put_device() not needed any more?
> > >
> > They are moved upper layer into device_for_each_child_shutdown().
> > Since there is lock breakage in __device_shutdown(), resorting to
> > ref++ to protect the ancestor.  And I think the
> > get_device(dev->parent) can be deleted either.
>
> Wouldn't that break USB?
>
Sorry, I cannot figure that out. Is USB not modeled top-down? This
recursion handles the top-down reference issue automatically, due to
the nature of the device tree. Any hints? Thanks.

> > > >       /*
> > > >        * Make sure the device is off the kset list, in the
> > > >        * event that dev->*->shutdown() doesn't remove it.
> > > > @@ -2842,23 +2842,60 @@ static void __device_shutdown(struct device *dev)
> > > >                       dev_info(dev, "shutdown\n");
> > > >               dev->driver->shutdown(dev);
> > > >       }
> > > > -
> > > > +     dev->shutdown = true;
> > > >       device_unlock(dev);
> > > >       if (parent)
> > > >               device_unlock(parent);
> > > >
> > > > -     put_device(dev);
> > > >       put_device(parent);
> > > >       spin_lock(&devices_kset->list_lock);
> > > >  }
> > > >
> > > > +/* shutdown dev's children and consumer firstly, then itself */
> > > > +static int device_for_each_child_shutdown(struct device *dev)
> > >
> > > Confusing name.
> > >
> > > What about device_shutdown_subordinate()?
> > >
> > Fine. My choice of words was not exact.
> >
> > > > +{
> > > > +     struct klist_iter i;
> > > > +     struct device *child;
> > > > +     struct device_link *link;
> > > > +
> > > > +     /* already shutdown, then skip this sub tree */
> > > > +     if (dev->shutdown)
> > > > +             return 0;
> > > > +
> > > > +     if (!dev->p)
> > > > +             goto check_consumers;
> > > > +
> > > > +     /* there is breakage of lock in __device_shutdown(), and the redundant
> > > > +      * ref++ on srcu protected consumer is harmless since shutdown is not
> > > > +      * hot path.
> > > > +      */
> > > > +     get_device(dev);
> > > > +
> > > > +     klist_iter_init(&dev->p->klist_children, &i);
> > > > +     while ((child = next_device(&i)))
> > > > +             device_for_each_child_shutdown(child);
> > >
> > > Why don't you use device_for_each_child() here?
> > >
> > OK, I will try to use it.
>
> Well, hold on.
>
> > > > +     klist_iter_exit(&i);
> > > > +
> > > > +check_consumers:
> > > > +     list_for_each_entry_rcu(link, &dev->links.consumers, s_node) {
> > > > +             if (!link->consumer->shutdown)
> > > > +                     device_for_each_child_shutdown(link->consumer);
> > > > +     }
> > > > +
> > > > +     __device_shutdown(dev);
> > > > +     put_device(dev);
> > >
> > > Possible reference counter imbalance AFAICS.
> > >
> > Yes, get_device() should be ahead of "if (!dev->p)". Is there anything else I missed?
>
> Yes, that's it.
>
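For reference, the balanced ordering agreed on above can be modeled in
userspace roughly as follows. This is only a sketch: struct node,
node_get(), node_put() and shutdown_balanced() are hypothetical stand-ins
for struct device, get_device(), put_device() and the new function, and the
walks over children and consumers are elided.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct device. */
struct node {
	int refcount;		/* models the kobject reference count */
	bool has_private;	/* models dev->p != NULL */
	bool shutdown;
};

static void node_get(struct node *n)
{
	n->refcount++;
}

static void node_put(struct node *n)
{
	assert(n->refcount > 0);
	n->refcount--;
}

static int shutdown_balanced(struct node *n)
{
	if (n->shutdown)
		return 0;

	/* Take the reference before any path that can jump ahead. */
	node_get(n);

	if (!n->has_private)
		goto check_consumers;

	/* ... children would be walked here ... */

check_consumers:
	/* ... consumers would be walked here ... */

	n->shutdown = true;
	node_put(n);	/* pairs with node_get() on every path */
	return 0;
}
```

With the get placed before the "if (!dev->p)" test, the put at the end is
matched on both paths, so the count returns to its starting value.
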
> > > > +     return 0;
> > > > +}
> > >
> > > Well, instead of doing this dance, we might as well walk dpm_list here as it
> > > is in the right order.
> > >
> > Sorry, do you mean managing dpm_list in the same way?
>
> No, I mean to use dpm_list instead of devices_kset for shutdown.
>
> They should be in the same order anyway if all is correct.
>
Yes, dpm_list and devices_kset contain the same info. But can we
fix shutdown as the first step? It is simpler and easier to
verify on different architectures. Then we can hunt for a solution for PM.

> > > Of course, that would require dpm_list to be available for CONFIG_PM unset,
> > > but it may be a better approach long term.
> > >
> > > > +
> > > >  /**
> > > >   * device_shutdown - call ->shutdown() on each device to shutdown.
> > > >   */
> > > >  void device_shutdown(void)
> > > >  {
> > > >       struct device *dev;
> > > > +     int idx;
> > > >
> > > > +     idx = device_links_read_lock();
> > > >       spin_lock(&devices_kset->list_lock);
> > > >       /*
> > > >        * Walk the devices list backward, shutting down each in turn.
> > > > @@ -2866,11 +2903,12 @@ void device_shutdown(void)
> > > >        * devices offline, even as the system is shutting down.
> > > >        */
> > > >       while (!list_empty(&devices_kset->list)) {
> > > > -             dev = list_entry(devices_kset->list.prev, struct device,
> > > > +             dev = list_entry(devices_kset->list.next, struct device,
> > > >                               kobj.entry);
> > > > -             __device_shutdown(dev);
> > > > +             device_for_each_child_shutdown(dev);
> > > >       }
> > > >       spin_unlock(&devices_kset->list_lock);
> > > > +     device_links_read_unlock(idx);
> > > >  }
> > > >
> > > >  /*
> > > > diff --git a/include/linux/device.h b/include/linux/device.h
> > > > index 055a69d..8a0f784 100644
> > > > --- a/include/linux/device.h
> > > > +++ b/include/linux/device.h
> > > > @@ -1003,6 +1003,7 @@ struct device {
> > > >       bool                    offline:1;
> > > >       bool                    of_node_reused:1;
> > > >       bool                    dma_32bit_limit:1;
> > > > +     bool                    shutdown:1; /* one direction: false->true */
> > > >  };
> > > >
> > > >  static inline struct device *kobj_to_dev(struct kobject *kobj)
> > > >
> > >
> > > If the device_kset_move_last() in really_probe() is the only problem,
> > > I'd rather try to fix that one in the first place.
> > >
> > > Why is it needed?
> > >
> > I had tried, but it turned out not to be easy to achieve. The code is at
> > https://patchwork.kernel.org/patch/10485195/, and I gave a detailed
> > description of the algorithm in this patch's commit log. In more
> > detail, we face a potential out-of-order issue in really_probe(),
> > like this: 0th. [ consumer-X, child_a, ...., child_z] [... consumer_a,
> > ..., consumer_z, ...] supplier-X //(note child_a is a supplier of
> > consumer_a).  To address all the potentially out-of-order items in the
> > affected section [... consumer_a, ..., consumer_z, ...], it will
> > incur two nested recursions.  1st, moving consumer-X and its
> > descendants after supplier-X; 2nd, moving consumer_a after child_a;
> > 3rd, the 2nd step may create the same situation as the 0th.  Besides the two
> > interleaved recursions, dropping the spin lock requires more effort
> > to protect items from disappearing from the linked list (which I did not
> > implement in https://patchwork.kernel.org/patch/10485195/). Hence
> > I turned to this cheap method.
>
> So I think that we simply need to drop the devices_kset_move_last() call
> from really_probe() as it is plain incorrect and the use case for it is
> questionable at best.
>
See my reply in a different mail. I think there are other issues with the
current solution besides really_probe() -> devices_kset_move_last().
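
As a small illustration of the move-to-tail problem from the commit log,
here is a userspace sketch, not kernel code. The array of names stands in
for devices_kset, and move_last() and shutdown_position() are hypothetical
helpers that exist only in this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Move the entry at index idx to the tail, keeping the others in order. */
static void move_last(const char *list[], size_t n, size_t idx)
{
	const char *moved;
	size_t i;

	assert(idx < n);
	moved = list[idx];
	for (i = idx; i + 1 < n; i++)
		list[i] = list[i + 1];
	list[n - 1] = moved;
}

/* Position at which name is shut down when the list is walked backward. */
static size_t shutdown_position(const char *list[], size_t n, const char *name)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (strcmp(list[n - 1 - i], name) == 0)
			return i;
	return n;	/* not found */
}
```

Starting from { "bridge", "disk", "nic" }, a backward walk reaches "disk"
before "bridge". After move_last() on the bridge entry, the same walk
reaches "bridge" first, which is the parent-before-child shutdown that the
ata-disk case ran into.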

Thanks,
Pingfan

Patch

diff --git a/drivers/base/core.c b/drivers/base/core.c
index a48868f..684b994 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1446,6 +1446,7 @@  void device_initialize(struct device *dev)
 	INIT_LIST_HEAD(&dev->links.consumers);
 	INIT_LIST_HEAD(&dev->links.suppliers);
 	dev->links.status = DL_DEV_NO_DRIVER;
+	dev->shutdown = false;
 }
 EXPORT_SYMBOL_GPL(device_initialize);
 
@@ -2811,7 +2812,6 @@  static void __device_shutdown(struct device *dev)
 	 * lock is to be held
 	 */
 	parent = get_device(dev->parent);
-	get_device(dev);
 	/*
 	 * Make sure the device is off the kset list, in the
 	 * event that dev->*->shutdown() doesn't remove it.
@@ -2842,23 +2842,60 @@  static void __device_shutdown(struct device *dev)
 			dev_info(dev, "shutdown\n");
 		dev->driver->shutdown(dev);
 	}
-
+	dev->shutdown = true;
 	device_unlock(dev);
 	if (parent)
 		device_unlock(parent);
 
-	put_device(dev);
 	put_device(parent);
 	spin_lock(&devices_kset->list_lock);
 }
 
+/* shutdown dev's children and consumer firstly, then itself */
+static int device_for_each_child_shutdown(struct device *dev)
+{
+	struct klist_iter i;
+	struct device *child;
+	struct device_link *link;
+
+	/* already shutdown, then skip this sub tree */
+	if (dev->shutdown)
+		return 0;
+
+	if (!dev->p)
+		goto check_consumers;
+
+	/* there is breakage of lock in __device_shutdown(), and the redundant
+	 * ref++ on srcu protected consumer is harmless since shutdown is not
+	 * hot path.
+	 */
+	get_device(dev);
+
+	klist_iter_init(&dev->p->klist_children, &i);
+	while ((child = next_device(&i)))
+		device_for_each_child_shutdown(child);
+	klist_iter_exit(&i);
+
+check_consumers:
+	list_for_each_entry_rcu(link, &dev->links.consumers, s_node) {
+		if (!link->consumer->shutdown)
+			device_for_each_child_shutdown(link->consumer);
+	}
+
+	__device_shutdown(dev);
+	put_device(dev);
+	return 0;
+}
+
 /**
  * device_shutdown - call ->shutdown() on each device to shutdown.
  */
 void device_shutdown(void)
 {
 	struct device *dev;
+	int idx;
 
+	idx = device_links_read_lock();
 	spin_lock(&devices_kset->list_lock);
 	/*
 	 * Walk the devices list backward, shutting down each in turn.
@@ -2866,11 +2903,12 @@  void device_shutdown(void)
 	 * devices offline, even as the system is shutting down.
 	 */
 	while (!list_empty(&devices_kset->list)) {
-		dev = list_entry(devices_kset->list.prev, struct device,
+		dev = list_entry(devices_kset->list.next, struct device,
 				kobj.entry);
-		__device_shutdown(dev);
+		device_for_each_child_shutdown(dev);
 	}
 	spin_unlock(&devices_kset->list_lock);
+	device_links_read_unlock(idx);
 }
 
 /*
diff --git a/include/linux/device.h b/include/linux/device.h
index 055a69d..8a0f784 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -1003,6 +1003,7 @@  struct device {
 	bool			offline:1;
 	bool			of_node_reused:1;
 	bool			dma_32bit_limit:1;
+	bool			shutdown:1; /* one direction: false->true */
 };
 
 static inline struct device *kobj_to_dev(struct kobject *kobj)