diff mbox series

[SRU,F,1/1] s390/pci: fix enabling a reserved PCI function

Message ID 20200825070050.752663-2-frank.heimes@canonical.com
State New
Headers show
Series zPCI: Enabling of a reserved PCI function regression introduced by multi-function support (LP: 1891437) | expand

Commit Message

Frank Heimes Aug. 25, 2020, 7 a.m. UTC
From: Niklas Schnelle <schnelle@linux.ibm.com>

BugLink: https://bugs.launchpad.net/bugs/1891437

In usual IPL or hot plug scenarios a zPCI function transitions directly
from reserved (invisible to Linux) to configured state or is configured
by Linux itself using an SCLP, however it can also first go from
reserved to standby and then from standby to configured without
Linux initiative.
In this scenario we first get a PEC event 0x302 and then 0x301.  This may
happen for example when the device is deconfigured at another LPAR and
made available for this LPAR. It may also happen under z/VM when
a device is attached while in some inconsistent state.

However when we get the 0x301 the device is already known to zPCI
so calling zpci_create() will add it twice resulting in the below
BUG. Instead we should only enable the existing device and finally
scan it through the PCI subsystem.

list_add double add: new=00000000ed5a9008, prev=00000000ed5a9008, next=0000000083502300.
kernel BUG at lib/list_debug.c:31!
Krnl PSW : 0704c00180000000 0000000082dc2db8 (__list_add_valid+0x70/0xa8)
Call Trace:
 [<0000000082dc2db8>] __list_add_valid+0x70/0xa8
([<0000000082dc2db4>] __list_add_valid+0x6c/0xa8)
 [<00000000828ea920>] zpci_create_device+0x60/0x1b0
 [<00000000828ef04a>] zpci_event_availability+0x282/0x2f0
 [<000000008315f848>] chsc_process_crw+0x2b8/0xa18
 [<000000008316735c>] crw_collect_info+0x254/0x348
 [<00000000829226ea>] kthread+0x14a/0x168
 [<000000008319d5c0>] ret_from_fork+0x24/0x2c

Fixes: f606b3ef47c9 ("s390/pci: adapt events for zbus")
Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Tested-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
(cherry picked from commit 3047766bc6ec9c6bc9ece85b45a41ff401e8d988)
Signed-off-by: Frank Heimes <frank.heimes@canonical.com>
---
 arch/s390/pci/pci_event.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

Comments

Stefan Bader Aug. 25, 2020, 7:39 a.m. UTC | #1
On 25.08.20 09:00, frank.heimes@canonical.com wrote:
> From: Niklas Schnelle <schnelle@linux.ibm.com>
> 
> BugLink: https://bugs.launchpad.net/bugs/1891437
> 
> In usual IPL or hot plug scenarios a zPCI function transitions directly
> from reserved (invisible to Linux) to configured state or is configured
> by Linux itself using an SCLP, however it can also first go from
> reserved to standby and then from standby to configured without
> Linux initiative.
> In this scenario we first get a PEC event 0x302 and then 0x301.  This may
> happen for example when the device is deconfigured at another LPAR and
> made available for this LPAR. It may also happen under z/VM when
> a device is attached while in some inconsistent state.
> 
> However when we get the 0x301 the device is already known to zPCI
> so calling zpci_create() will add it twice resulting in the below
> BUG. Instead we should only enable the existing device and finally
> scan it through the PCI subsystem.
> 
> list_add double add: new=00000000ed5a9008, prev=00000000ed5a9008, next=0000000083502300.
> kernel BUG at lib/list_debug.c:31!
> Krnl PSW : 0704c00180000000 0000000082dc2db8 (__list_add_valid+0x70/0xa8)
> Call Trace:
>  [<0000000082dc2db8>] __list_add_valid+0x70/0xa8
> ([<0000000082dc2db4>] __list_add_valid+0x6c/0xa8)
>  [<00000000828ea920>] zpci_create_device+0x60/0x1b0
>  [<00000000828ef04a>] zpci_event_availability+0x282/0x2f0
>  [<000000008315f848>] chsc_process_crw+0x2b8/0xa18
>  [<000000008316735c>] crw_collect_info+0x254/0x348
>  [<00000000829226ea>] kthread+0x14a/0x168
>  [<000000008319d5c0>] ret_from_fork+0x24/0x2c
> 
> Fixes: f606b3ef47c9 ("s390/pci: adapt events for zbus")
> Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com>
> Tested-by: Alexander Egorenkov <egorenar@linux.ibm.com>
> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
> (cherry picked from commit 3047766bc6ec9c6bc9ece85b45a41ff401e8d988)
> Signed-off-by: Frank Heimes <frank.heimes@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
> ---

Not sure you saw me commenting on the "regression potential" section of other
submissions. Recently there has been a discussion by the SRU team about this and
that most submitters use it not the way they expect. They would like to see more
of a guess about what kinds of symptoms people might observe. So kind of a
literal "what could go wrong?". Maybe you could adjust yours accordingly.

-Stefan

>  arch/s390/pci/pci_event.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/s390/pci/pci_event.c b/arch/s390/pci/pci_event.c
> index 08e1d619398e..fdebd286f402 100644
> --- a/arch/s390/pci/pci_event.c
> +++ b/arch/s390/pci/pci_event.c
> @@ -94,7 +94,18 @@ static void __zpci_event_availability(struct zpci_ccdf_avail *ccdf)
>  		}
>  		zdev->fh = ccdf->fh;
>  		zdev->state = ZPCI_FN_STATE_CONFIGURED;
> -		zpci_create_device(zdev);
> +		ret = zpci_enable_device(zdev);
> +		if (ret)
> +			break;
> +
> +		pdev = pci_scan_single_device(zdev->zbus->bus, zdev->devfn);
> +		if (!pdev)
> +			break;
> +
> +		pci_bus_add_device(pdev);
> +		pci_lock_rescan_remove();
> +		pci_bus_add_devices(zdev->zbus->bus);
> +		pci_unlock_rescan_remove();
>  		break;
>  	case 0x0302: /* Reserved -> Standby */
>  		if (!zdev) {
>
Frank Heimes Aug. 25, 2020, 7:45 a.m. UTC | #2
Yeah, I noticed that conversation.

I thought I went a bit into that direction with:

* There is some regression risk with having code changes in the zPCI
sub-system.
* zPCI is the PCI implementation on s390x, modifications here do not affect
any other architecture.
* It could be that *PCI events do not work anymore* and *NVMe devices don't
IPL (boot)* on s390x anymore.
...

I guess you probably want me to go even deeper into 'what may happen' in
future ...

On Tue, Aug 25, 2020 at 9:39 AM Stefan Bader <stefan.bader@canonical.com>
wrote:

> On 25.08.20 09:00, frank.heimes@canonical.com wrote:
> > From: Niklas Schnelle <schnelle@linux.ibm.com>
> >
> > BugLink: https://bugs.launchpad.net/bugs/1891437
> >
> > In usual IPL or hot plug scenarios a zPCI function transitions directly
> > from reserved (invisible to Linux) to configured state or is configured
> > by Linux itself using an SCLP, however it can also first go from
> > reserved to standby and then from standby to configured without
> > Linux initiative.
> > In this scenario we first get a PEC event 0x302 and then 0x301.  This may
> > happen for example when the device is deconfigured at another LPAR and
> > made available for this LPAR. It may also happen under z/VM when
> > a device is attached while in some inconsistent state.
> >
> > However when we get the 0x301 the device is already known to zPCI
> > so calling zpci_create() will add it twice resulting in the below
> > BUG. Instead we should only enable the existing device and finally
> > scan it through the PCI subsystem.
> >
> > list_add double add: new=00000000ed5a9008, prev=00000000ed5a9008,
> next=0000000083502300.
> > kernel BUG at lib/list_debug.c:31!
> > Krnl PSW : 0704c00180000000 0000000082dc2db8 (__list_add_valid+0x70/0xa8)
> > Call Trace:
> >  [<0000000082dc2db8>] __list_add_valid+0x70/0xa8
> > ([<0000000082dc2db4>] __list_add_valid+0x6c/0xa8)
> >  [<00000000828ea920>] zpci_create_device+0x60/0x1b0
> >  [<00000000828ef04a>] zpci_event_availability+0x282/0x2f0
> >  [<000000008315f848>] chsc_process_crw+0x2b8/0xa18
> >  [<000000008316735c>] crw_collect_info+0x254/0x348
> >  [<00000000829226ea>] kthread+0x14a/0x168
> >  [<000000008319d5c0>] ret_from_fork+0x24/0x2c
> >
> > Fixes: f606b3ef47c9 ("s390/pci: adapt events for zbus")
> > Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com>
> > Tested-by: Alexander Egorenkov <egorenar@linux.ibm.com>
> > Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
> > Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
> > (cherry picked from commit 3047766bc6ec9c6bc9ece85b45a41ff401e8d988)
> > Signed-off-by: Frank Heimes <frank.heimes@canonical.com>
> Acked-by: Stefan Bader <stefan.bader@canonical.com>
> > ---
>
> Not sure you saw me commenting on the "regression potential" section of
> other
> submissions. Recently there has been a discussion by the SRU team about
> this and
> that most submitters use it not the way they expect. They would like to
> see more
> of a guess about what kinds of symptoms people might observe. So kind of a
> literal "what could go wrong?". Maybe you could adjust yours accordingly.
>
> -Stefan
>
> >  arch/s390/pci/pci_event.c | 13 ++++++++++++-
> >  1 file changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/s390/pci/pci_event.c b/arch/s390/pci/pci_event.c
> > index 08e1d619398e..fdebd286f402 100644
> > --- a/arch/s390/pci/pci_event.c
> > +++ b/arch/s390/pci/pci_event.c
> > @@ -94,7 +94,18 @@ static void __zpci_event_availability(struct
> zpci_ccdf_avail *ccdf)
> >               }
> >               zdev->fh = ccdf->fh;
> >               zdev->state = ZPCI_FN_STATE_CONFIGURED;
> > -             zpci_create_device(zdev);
> > +             ret = zpci_enable_device(zdev);
> > +             if (ret)
> > +                     break;
> > +
> > +             pdev = pci_scan_single_device(zdev->zbus->bus,
> zdev->devfn);
> > +             if (!pdev)
> > +                     break;
> > +
> > +             pci_bus_add_device(pdev);
> > +             pci_lock_rescan_remove();
> > +             pci_bus_add_devices(zdev->zbus->bus);
> > +             pci_unlock_rescan_remove();
> >               break;
> >       case 0x0302: /* Reserved -> Standby */
> >               if (!zdev) {
> >
>
>
>
Stefan Bader Aug. 25, 2020, 8:33 a.m. UTC | #3
On 25.08.20 09:45, Frank Heimes wrote:
> Yeah, I noticed that conversation.
> 
> I thought I went a bit into that direction with:
> 
> * There is some regression risk with having code changes in the zPCI sub-system.  
> * zPCI is the PCI implementation on s390x, modifications here do not affect any
> other architecture.  
> * It could be that *PCI events do not work anymore* and *NVMe devices don't IPL
> (boot)* on s390x anymore.

Yeah, it probably is that. Maybe a bit obscured by the two bullet points before.
Maybe just reordering is enough...

> ...
> 
> I guess you probably want me to go even deeper into 'what may happen' in future ...
> 
> On Tue, Aug 25, 2020 at 9:39 AM Stefan Bader <stefan.bader@canonical.com
> <mailto:stefan.bader@canonical.com>> wrote:
> 
>     On 25.08.20 09:00, frank.heimes@canonical.com
>     <mailto:frank.heimes@canonical.com> wrote:
>     > From: Niklas Schnelle <schnelle@linux.ibm.com <mailto:schnelle@linux.ibm.com>>
>     >
>     > BugLink: https://bugs.launchpad.net/bugs/1891437
>     >
>     > In usual IPL or hot plug scenarios a zPCI function transitions directly
>     > from reserved (invisible to Linux) to configured state or is configured
>     > by Linux itself using an SCLP, however it can also first go from
>     > reserved to standby and then from standby to configured without
>     > Linux initiative.
>     > In this scenario we first get a PEC event 0x302 and then 0x301.  This may
>     > happen for example when the device is deconfigured at another LPAR and
>     > made available for this LPAR. It may also happen under z/VM when
>     > a device is attached while in some inconsistent state.
>     >
>     > However when we get the 0x301 the device is already known to zPCI
>     > so calling zpci_create() will add it twice resulting in the below
>     > BUG. Instead we should only enable the existing device and finally
>     > scan it through the PCI subsystem.
>     >
>     > list_add double add: new=00000000ed5a9008, prev=00000000ed5a9008,
>     next=0000000083502300.
>     > kernel BUG at lib/list_debug.c:31!
>     > Krnl PSW : 0704c00180000000 0000000082dc2db8 (__list_add_valid+0x70/0xa8)
>     > Call Trace:
>     >  [<0000000082dc2db8>] __list_add_valid+0x70/0xa8
>     > ([<0000000082dc2db4>] __list_add_valid+0x6c/0xa8)
>     >  [<00000000828ea920>] zpci_create_device+0x60/0x1b0
>     >  [<00000000828ef04a>] zpci_event_availability+0x282/0x2f0
>     >  [<000000008315f848>] chsc_process_crw+0x2b8/0xa18
>     >  [<000000008316735c>] crw_collect_info+0x254/0x348
>     >  [<00000000829226ea>] kthread+0x14a/0x168
>     >  [<000000008319d5c0>] ret_from_fork+0x24/0x2c
>     >
>     > Fixes: f606b3ef47c9 ("s390/pci: adapt events for zbus")
>     > Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com
>     <mailto:egorenar@linux.ibm.com>>
>     > Tested-by: Alexander Egorenkov <egorenar@linux.ibm.com
>     <mailto:egorenar@linux.ibm.com>>
>     > Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com
>     <mailto:schnelle@linux.ibm.com>>
>     > Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com
>     <mailto:heiko.carstens@de.ibm.com>>
>     > (cherry picked from commit 3047766bc6ec9c6bc9ece85b45a41ff401e8d988)
>     > Signed-off-by: Frank Heimes <frank.heimes@canonical.com
>     <mailto:frank.heimes@canonical.com>>
>     Acked-by: Stefan Bader <stefan.bader@canonical.com
>     <mailto:stefan.bader@canonical.com>>
>     > ---
> 
>     Not sure you saw me commenting on the "regression potential" section of other
>     submissions. Recently there has been a discussion by the SRU team about this and
>     that most submitters use it not the way they expect. They would like to see more
>     of a guess about what kinds of symptoms people might observe. So kind of a
>     literal "what could go wrong?". Maybe you could adjust yours accordingly.
> 
>     -Stefan
> 
>     >  arch/s390/pci/pci_event.c | 13 ++++++++++++-
>     >  1 file changed, 12 insertions(+), 1 deletion(-)
>     >
>     > diff --git a/arch/s390/pci/pci_event.c b/arch/s390/pci/pci_event.c
>     > index 08e1d619398e..fdebd286f402 100644
>     > --- a/arch/s390/pci/pci_event.c
>     > +++ b/arch/s390/pci/pci_event.c
>     > @@ -94,7 +94,18 @@ static void __zpci_event_availability(struct
>     zpci_ccdf_avail *ccdf)
>     >               }
>     >               zdev->fh = ccdf->fh;
>     >               zdev->state = ZPCI_FN_STATE_CONFIGURED;
>     > -             zpci_create_device(zdev);
>     > +             ret = zpci_enable_device(zdev);
>     > +             if (ret)
>     > +                     break;
>     > +
>     > +             pdev = pci_scan_single_device(zdev->zbus->bus, zdev->devfn);
>     > +             if (!pdev)
>     > +                     break;
>     > +
>     > +             pci_bus_add_device(pdev);
>     > +             pci_lock_rescan_remove();
>     > +             pci_bus_add_devices(zdev->zbus->bus);
>     > +             pci_unlock_rescan_remove();
>     >               break;
>     >       case 0x0302: /* Reserved -> Standby */
>     >               if (!zdev) {
>     >
> 
>
Kleber Souza Aug. 27, 2020, 2:30 p.m. UTC | #4
On 25.08.20 09:00, frank.heimes@canonical.com wrote:
> From: Niklas Schnelle <schnelle@linux.ibm.com>
> 
> BugLink: https://bugs.launchpad.net/bugs/1891437
> 
> In usual IPL or hot plug scenarios a zPCI function transitions directly
> from reserved (invisible to Linux) to configured state or is configured
> by Linux itself using an SCLP, however it can also first go from
> reserved to standby and then from standby to configured without
> Linux initiative.
> In this scenario we first get a PEC event 0x302 and then 0x301.  This may
> happen for example when the device is deconfigured at another LPAR and
> made available for this LPAR. It may also happen under z/VM when
> a device is attached while in some inconsistent state.
> 
> However when we get the 0x301 the device is already known to zPCI
> so calling zpci_create() will add it twice resulting in the below
> BUG. Instead we should only enable the existing device and finally
> scan it through the PCI subsystem.
> 
> list_add double add: new=00000000ed5a9008, prev=00000000ed5a9008, next=0000000083502300.
> kernel BUG at lib/list_debug.c:31!
> Krnl PSW : 0704c00180000000 0000000082dc2db8 (__list_add_valid+0x70/0xa8)
> Call Trace:
>  [<0000000082dc2db8>] __list_add_valid+0x70/0xa8
> ([<0000000082dc2db4>] __list_add_valid+0x6c/0xa8)
>  [<00000000828ea920>] zpci_create_device+0x60/0x1b0
>  [<00000000828ef04a>] zpci_event_availability+0x282/0x2f0
>  [<000000008315f848>] chsc_process_crw+0x2b8/0xa18
>  [<000000008316735c>] crw_collect_info+0x254/0x348
>  [<00000000829226ea>] kthread+0x14a/0x168
>  [<000000008319d5c0>] ret_from_fork+0x24/0x2c
> 
> Fixes: f606b3ef47c9 ("s390/pci: adapt events for zbus")
> Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com>
> Tested-by: Alexander Egorenkov <egorenar@linux.ibm.com>
> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
> (cherry picked from commit 3047766bc6ec9c6bc9ece85b45a41ff401e8d988)
> Signed-off-by: Frank Heimes <frank.heimes@canonical.com>

Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>

> ---
>  arch/s390/pci/pci_event.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/s390/pci/pci_event.c b/arch/s390/pci/pci_event.c
> index 08e1d619398e..fdebd286f402 100644
> --- a/arch/s390/pci/pci_event.c
> +++ b/arch/s390/pci/pci_event.c
> @@ -94,7 +94,18 @@ static void __zpci_event_availability(struct zpci_ccdf_avail *ccdf)
>  		}
>  		zdev->fh = ccdf->fh;
>  		zdev->state = ZPCI_FN_STATE_CONFIGURED;
> -		zpci_create_device(zdev);
> +		ret = zpci_enable_device(zdev);
> +		if (ret)
> +			break;
> +
> +		pdev = pci_scan_single_device(zdev->zbus->bus, zdev->devfn);
> +		if (!pdev)
> +			break;
> +
> +		pci_bus_add_device(pdev);
> +		pci_lock_rescan_remove();
> +		pci_bus_add_devices(zdev->zbus->bus);
> +		pci_unlock_rescan_remove();
>  		break;
>  	case 0x0302: /* Reserved -> Standby */
>  		if (!zdev) {
>
diff mbox series

Patch

diff --git a/arch/s390/pci/pci_event.c b/arch/s390/pci/pci_event.c
index 08e1d619398e..fdebd286f402 100644
--- a/arch/s390/pci/pci_event.c
+++ b/arch/s390/pci/pci_event.c
@@ -94,7 +94,18 @@  static void __zpci_event_availability(struct zpci_ccdf_avail *ccdf)
 		}
 		zdev->fh = ccdf->fh;
 		zdev->state = ZPCI_FN_STATE_CONFIGURED;
-		zpci_create_device(zdev);
+		ret = zpci_enable_device(zdev);
+		if (ret)
+			break;
+
+		pdev = pci_scan_single_device(zdev->zbus->bus, zdev->devfn);
+		if (!pdev)
+			break;
+
+		pci_bus_add_device(pdev);
+		pci_lock_rescan_remove();
+		pci_bus_add_devices(zdev->zbus->bus);
+		pci_unlock_rescan_remove();
 		break;
 	case 0x0302: /* Reserved -> Standby */
 		if (!zdev) {