Message ID | 20200825070050.752663-2-frank.heimes@canonical.com |
---|---|
State | New |
Headers | show |
Series | zPCI: Enabling of a reserved PCI function regression introduced by multi-function support (LP: 1891437) | expand |
On 25.08.20 09:00, frank.heimes@canonical.com wrote: > From: Niklas Schnelle <schnelle@linux.ibm.com> > > BugLink: https://bugs.launchpad.net/bugs/1891437 > > In usual IPL or hot plug scenarios a zPCI function transitions directly > from reserved (invisible to Linux) to configured state or is configured > by Linux itself using an SCLP, however it can also first go from > reserved to standby and then from standby to configured without > Linux initiative. > In this scenario we first get a PEC event 0x302 and then 0x301. This may > happen for example when the device is deconfigured at another LPAR and > made available for this LPAR. It may also happen under z/VM when > a device is attached while in some inconsistent state. > > However when we get the 0x301 the device is already known to zPCI > so calling zpci_create() will add it twice resulting in the below > BUG. Instead we should only enable the existing device and finally > scan it through the PCI subsystem. > > list_add double add: new=00000000ed5a9008, prev=00000000ed5a9008, next=0000000083502300. > kernel BUG at lib/list_debug.c:31! > Krnl PSW : 0704c00180000000 0000000082dc2db8 (__list_add_valid+0x70/0xa8) > Call Trace: > [<0000000082dc2db8>] __list_add_valid+0x70/0xa8 > ([<0000000082dc2db4>] __list_add_valid+0x6c/0xa8) > [<00000000828ea920>] zpci_create_device+0x60/0x1b0 > [<00000000828ef04a>] zpci_event_availability+0x282/0x2f0 > [<000000008315f848>] chsc_process_crw+0x2b8/0xa18 > [<000000008316735c>] crw_collect_info+0x254/0x348 > [<00000000829226ea>] kthread+0x14a/0x168 > [<000000008319d5c0>] ret_from_fork+0x24/0x2c > > Fixes: f606b3ef47c9 ("s390/pci: adapt events for zbus") > Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com> > Tested-by: Alexander Egorenkov <egorenar@linux.ibm.com> > Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> > Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> > (cherry picked from commit 3047766bc6ec9c6bc9ece85b45a41ff401e8d988) > Signed-off-by: Frank Heimes <frank.heimes@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> > --- Not sure you saw me commenting on the "regression potential" section of other submissions. Recently there has been a discussion by the SRU team about this and that most submitters use it not the way they expect. They would like to see more of a guess about what kinds of symptoms people might observe. So kind of a literal "what could go wrong?". Maybe you could adjust yours accordingly. -Stefan > arch/s390/pci/pci_event.c | 13 ++++++++++++- > 1 file changed, 12 insertions(+), 1 deletion(-) > > diff --git a/arch/s390/pci/pci_event.c b/arch/s390/pci/pci_event.c > index 08e1d619398e..fdebd286f402 100644 > --- a/arch/s390/pci/pci_event.c > +++ b/arch/s390/pci/pci_event.c > @@ -94,7 +94,18 @@ static void __zpci_event_availability(struct zpci_ccdf_avail *ccdf) > } > zdev->fh = ccdf->fh; > zdev->state = ZPCI_FN_STATE_CONFIGURED; > - zpci_create_device(zdev); > + ret = zpci_enable_device(zdev); > + if (ret) > + break; > + > + pdev = pci_scan_single_device(zdev->zbus->bus, zdev->devfn); > + if (!pdev) > + break; > + > + pci_bus_add_device(pdev); > + pci_lock_rescan_remove(); > + pci_bus_add_devices(zdev->zbus->bus); > + pci_unlock_rescan_remove(); > break; > case 0x0302: /* Reserved -> Standby */ > if (!zdev) { >
Yeah, I noticed that conversation. I thought I went a bit into that direction with: * There is some regression risk with having code changes in the zPCI sub-system. * zPCI is the PCI implementation on s390x, modifications here do not affect any other architecture. * It could be that *PCI events do not work anymore* and *NVMe devices don't IPL (boot)* on s390x anymore. ... I guess you probably want me to go even deeper into 'what may happen' in future ... On Tue, Aug 25, 2020 at 9:39 AM Stefan Bader <stefan.bader@canonical.com> wrote: > On 25.08.20 09:00, frank.heimes@canonical.com wrote: > > From: Niklas Schnelle <schnelle@linux.ibm.com> > > > > BugLink: https://bugs.launchpad.net/bugs/1891437 > > > > In usual IPL or hot plug scenarios a zPCI function transitions directly > > from reserved (invisible to Linux) to configured state or is configured > > by Linux itself using an SCLP, however it can also first go from > > reserved to standby and then from standby to configured without > > Linux initiative. > > In this scenario we first get a PEC event 0x302 and then 0x301. This may > > happen for example when the device is deconfigured at another LPAR and > > made available for this LPAR. It may also happen under z/VM when > > a device is attached while in some inconsistent state. > > > > However when we get the 0x301 the device is already known to zPCI > > so calling zpci_create() will add it twice resulting in the below > > BUG. Instead we should only enable the existing device and finally > > scan it through the PCI subsystem. > > > > list_add double add: new=00000000ed5a9008, prev=00000000ed5a9008, > next=0000000083502300. > > kernel BUG at lib/list_debug.c:31! > > Krnl PSW : 0704c00180000000 0000000082dc2db8 (__list_add_valid+0x70/0xa8) > > Call Trace: > > [<0000000082dc2db8>] __list_add_valid+0x70/0xa8 > > ([<0000000082dc2db4>] __list_add_valid+0x6c/0xa8) > > [<00000000828ea920>] zpci_create_device+0x60/0x1b0 > > [<00000000828ef04a>] zpci_event_availability+0x282/0x2f0 > > [<000000008315f848>] chsc_process_crw+0x2b8/0xa18 > > [<000000008316735c>] crw_collect_info+0x254/0x348 > > [<00000000829226ea>] kthread+0x14a/0x168 > > [<000000008319d5c0>] ret_from_fork+0x24/0x2c > > > > Fixes: f606b3ef47c9 ("s390/pci: adapt events for zbus") > > Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com> > > Tested-by: Alexander Egorenkov <egorenar@linux.ibm.com> > > Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> > > Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> > > (cherry picked from commit 3047766bc6ec9c6bc9ece85b45a41ff401e8d988) > > Signed-off-by: Frank Heimes <frank.heimes@canonical.com> > Acked-by: Stefan Bader <stefan.bader@canonical.com> > > --- > > Not sure you saw me commenting on the "regression potential" section of > other > submissions. Recently there has been a discussion by the SRU team about > this and > that most submitters use it not the way they expect. They would like to > see more > of a guess about what kinds of symptoms people might observe. So kind of a > literal "what could go wrong?". Maybe you could adjust yours accordingly. > > -Stefan > > > arch/s390/pci/pci_event.c | 13 ++++++++++++- > > 1 file changed, 12 insertions(+), 1 deletion(-) > > > > diff --git a/arch/s390/pci/pci_event.c b/arch/s390/pci/pci_event.c > > index 08e1d619398e..fdebd286f402 100644 > > --- a/arch/s390/pci/pci_event.c > > +++ b/arch/s390/pci/pci_event.c > > @@ -94,7 +94,18 @@ static void __zpci_event_availability(struct > zpci_ccdf_avail *ccdf) > > } > > zdev->fh = ccdf->fh; > > zdev->state = ZPCI_FN_STATE_CONFIGURED; > > - zpci_create_device(zdev); > > + ret = zpci_enable_device(zdev); > > + if (ret) > > + break; > > + > > + pdev = pci_scan_single_device(zdev->zbus->bus, > zdev->devfn); > > + if (!pdev) > > + break; > > + > > + pci_bus_add_device(pdev); > > + pci_lock_rescan_remove(); > > + pci_bus_add_devices(zdev->zbus->bus); > > + pci_unlock_rescan_remove(); > > break; > > case 0x0302: /* Reserved -> Standby */ > > if (!zdev) { > > > > >
On 25.08.20 09:45, Frank Heimes wrote: > Yeah, I noticed that conversation. > > I thought I went a bit into that direction with: > > * There is some regression risk with having code changes in the zPCI sub-system. > * zPCI is the PCI implementation on s390x, modifications here do not affect any > other architecture. > * It could be that *PCI events do not work anymore* and *NVMe devices don't IPL > (boot)* on s390x anymore. Yeah, it probably is that. Maybe a bit obscured by the two bullet points before. Maybe just reordering is enough... > ... > > I guess you probably want me to go even deeper into 'what may happen' in future ... > > On Tue, Aug 25, 2020 at 9:39 AM Stefan Bader <stefan.bader@canonical.com > <mailto:stefan.bader@canonical.com>> wrote: > > On 25.08.20 09:00, frank.heimes@canonical.com > <mailto:frank.heimes@canonical.com> wrote: > > From: Niklas Schnelle <schnelle@linux.ibm.com <mailto:schnelle@linux.ibm.com>> > > > > BugLink: https://bugs.launchpad.net/bugs/1891437 > > > > In usual IPL or hot plug scenarios a zPCI function transitions directly > > from reserved (invisible to Linux) to configured state or is configured > > by Linux itself using an SCLP, however it can also first go from > > reserved to standby and then from standby to configured without > > Linux initiative. > > In this scenario we first get a PEC event 0x302 and then 0x301. This may > > happen for example when the device is deconfigured at another LPAR and > > made available for this LPAR. It may also happen under z/VM when > > a device is attached while in some inconsistent state. > > > > However when we get the 0x301 the device is already known to zPCI > > so calling zpci_create() will add it twice resulting in the below > > BUG. Instead we should only enable the existing device and finally > > scan it through the PCI subsystem. > > > > list_add double add: new=00000000ed5a9008, prev=00000000ed5a9008, > next=0000000083502300. > > kernel BUG at lib/list_debug.c:31! > > Krnl PSW : 0704c00180000000 0000000082dc2db8 (__list_add_valid+0x70/0xa8) > > Call Trace: > > [<0000000082dc2db8>] __list_add_valid+0x70/0xa8 > > ([<0000000082dc2db4>] __list_add_valid+0x6c/0xa8) > > [<00000000828ea920>] zpci_create_device+0x60/0x1b0 > > [<00000000828ef04a>] zpci_event_availability+0x282/0x2f0 > > [<000000008315f848>] chsc_process_crw+0x2b8/0xa18 > > [<000000008316735c>] crw_collect_info+0x254/0x348 > > [<00000000829226ea>] kthread+0x14a/0x168 > > [<000000008319d5c0>] ret_from_fork+0x24/0x2c > > > > Fixes: f606b3ef47c9 ("s390/pci: adapt events for zbus") > > Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com > <mailto:egorenar@linux.ibm.com>> > > Tested-by: Alexander Egorenkov <egorenar@linux.ibm.com > <mailto:egorenar@linux.ibm.com>> > > Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com > <mailto:schnelle@linux.ibm.com>> > > Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com > <mailto:heiko.carstens@de.ibm.com>> > > (cherry picked from commit 3047766bc6ec9c6bc9ece85b45a41ff401e8d988) > > Signed-off-by: Frank Heimes <frank.heimes@canonical.com > <mailto:frank.heimes@canonical.com>> > Acked-by: Stefan Bader <stefan.bader@canonical.com > <mailto:stefan.bader@canonical.com>> > > --- > > Not sure you saw me commenting on the "regression potential" section of other > submissions. Recently there has been a discussion by the SRU team about this and > that most submitters use it not the way they expect. They would like to see more > of a guess about what kinds of symptoms people might observe. So kind of a > literal "what could go wrong?". Maybe you could adjust yours accordingly. > > -Stefan > > > arch/s390/pci/pci_event.c | 13 ++++++++++++- > > 1 file changed, 12 insertions(+), 1 deletion(-) > > > > diff --git a/arch/s390/pci/pci_event.c b/arch/s390/pci/pci_event.c > > index 08e1d619398e..fdebd286f402 100644 > > --- a/arch/s390/pci/pci_event.c > > +++ b/arch/s390/pci/pci_event.c > > @@ -94,7 +94,18 @@ static void __zpci_event_availability(struct > zpci_ccdf_avail *ccdf) > > } > > zdev->fh = ccdf->fh; > > zdev->state = ZPCI_FN_STATE_CONFIGURED; > > - zpci_create_device(zdev); > > + ret = zpci_enable_device(zdev); > > + if (ret) > > + break; > > + > > + pdev = pci_scan_single_device(zdev->zbus->bus, zdev->devfn); > > + if (!pdev) > > + break; > > + > > + pci_bus_add_device(pdev); > > + pci_lock_rescan_remove(); > > + pci_bus_add_devices(zdev->zbus->bus); > > + pci_unlock_rescan_remove(); > > break; > > case 0x0302: /* Reserved -> Standby */ > > if (!zdev) { > > > >
On 25.08.20 09:00, frank.heimes@canonical.com wrote: > From: Niklas Schnelle <schnelle@linux.ibm.com> > > BugLink: https://bugs.launchpad.net/bugs/1891437 > > In usual IPL or hot plug scenarios a zPCI function transitions directly > from reserved (invisible to Linux) to configured state or is configured > by Linux itself using an SCLP, however it can also first go from > reserved to standby and then from standby to configured without > Linux initiative. > In this scenario we first get a PEC event 0x302 and then 0x301. This may > happen for example when the device is deconfigured at another LPAR and > made available for this LPAR. It may also happen under z/VM when > a device is attached while in some inconsistent state. > > However when we get the 0x301 the device is already known to zPCI > so calling zpci_create() will add it twice resulting in the below > BUG. Instead we should only enable the existing device and finally > scan it through the PCI subsystem. > > list_add double add: new=00000000ed5a9008, prev=00000000ed5a9008, next=0000000083502300. > kernel BUG at lib/list_debug.c:31! > Krnl PSW : 0704c00180000000 0000000082dc2db8 (__list_add_valid+0x70/0xa8) > Call Trace: > [<0000000082dc2db8>] __list_add_valid+0x70/0xa8 > ([<0000000082dc2db4>] __list_add_valid+0x6c/0xa8) > [<00000000828ea920>] zpci_create_device+0x60/0x1b0 > [<00000000828ef04a>] zpci_event_availability+0x282/0x2f0 > [<000000008315f848>] chsc_process_crw+0x2b8/0xa18 > [<000000008316735c>] crw_collect_info+0x254/0x348 > [<00000000829226ea>] kthread+0x14a/0x168 > [<000000008319d5c0>] ret_from_fork+0x24/0x2c > > Fixes: f606b3ef47c9 ("s390/pci: adapt events for zbus") > Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com> > Tested-by: Alexander Egorenkov <egorenar@linux.ibm.com> > Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> > Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> > (cherry picked from commit 3047766bc6ec9c6bc9ece85b45a41ff401e8d988) > Signed-off-by: Frank Heimes <frank.heimes@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> > --- > arch/s390/pci/pci_event.c | 13 ++++++++++++- > 1 file changed, 12 insertions(+), 1 deletion(-) > > diff --git a/arch/s390/pci/pci_event.c b/arch/s390/pci/pci_event.c > index 08e1d619398e..fdebd286f402 100644 > --- a/arch/s390/pci/pci_event.c > +++ b/arch/s390/pci/pci_event.c > @@ -94,7 +94,18 @@ static void __zpci_event_availability(struct zpci_ccdf_avail *ccdf) > } > zdev->fh = ccdf->fh; > zdev->state = ZPCI_FN_STATE_CONFIGURED; > - zpci_create_device(zdev); > + ret = zpci_enable_device(zdev); > + if (ret) > + break; > + > + pdev = pci_scan_single_device(zdev->zbus->bus, zdev->devfn); > + if (!pdev) > + break; > + > + pci_bus_add_device(pdev); > + pci_lock_rescan_remove(); > + pci_bus_add_devices(zdev->zbus->bus); > + pci_unlock_rescan_remove(); > break; > case 0x0302: /* Reserved -> Standby */ > if (!zdev) { >
diff --git a/arch/s390/pci/pci_event.c b/arch/s390/pci/pci_event.c index 08e1d619398e..fdebd286f402 100644 --- a/arch/s390/pci/pci_event.c +++ b/arch/s390/pci/pci_event.c @@ -94,7 +94,18 @@ static void __zpci_event_availability(struct zpci_ccdf_avail *ccdf) } zdev->fh = ccdf->fh; zdev->state = ZPCI_FN_STATE_CONFIGURED; - zpci_create_device(zdev); + ret = zpci_enable_device(zdev); + if (ret) + break; + + pdev = pci_scan_single_device(zdev->zbus->bus, zdev->devfn); + if (!pdev) + break; + + pci_bus_add_device(pdev); + pci_lock_rescan_remove(); + pci_bus_add_devices(zdev->zbus->bus); + pci_unlock_rescan_remove(); break; case 0x0302: /* Reserved -> Standby */ if (!zdev) {