Message ID | 20150308094920.25374.94537.stgit@localhost.localdomain
---|---
State | Changes Requested
On 03/08/2015 03:19 PM, Vasant Hegde wrote:
> This patch attempts to fix "poller lock issue" by removing
> opal_run_pollers function call from fsp_sync_msg.
>
> sample log:
> ----------
> [4130907342,3] Running pollers with lock held !
> CPU 08b0 Backtrace:
>  S: 0000000033cc3990 R: 0000000030013520 .backtrace+0x2c
>  S: 0000000033cc3a20 R: 0000000030017d9c .opal_run_pollers+0x44
>  S: 0000000033cc3aa0 R: 0000000030045f04 .fsp_sync_msg+0x40
>  S: 0000000033cc3b30 R: 0000000030047840 .fsp_fetch_data+0x138
>  S: 0000000033cc3c20 R: 0000000030021204 .hservices_lid_preload+0x148
>  S: 0000000033cc3d10 R: 0000000030058038 .ibm_fsp_init+0xe4
>  S: 0000000033cc3dc0 R: 0000000030058df8 .firenze_init+0x18
>  S: 0000000033cc3e30 R: 00000000300141a4 .main_cpu_entry+0x3e0
>  S: 0000000033cc3f00 R: 0000000030002534 boot_entry+0x18c
>
> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
>
> ---
> Ben, Stewart,
>   Presently we have two places (fsp_fetch_data & capp_lid_download) where
> we hold the lock and make an opal_run_pollers call (via fsp_sync_msg).
>
> Earlier we replaced __fsp_poll with opal_run_pollers (commit 488dc8a3).
> But I think it's fine just to call __fsp_poll here. This seems to be
> working, but needs to be reviewed before merging.
>
> I tried the other alternative approach we discussed earlier (introduce a
> queue in fsp_fetch_data). This requires quite a bit of work in the
> callback function.

Hmmm.. This patch fixes the poller lock issue, but it *doesn't* seem to be
complete. We still hit the lock issue in fsp_fetch_data if we get an OCC load
request while fetching skiroot :-(

Another possible approach:
Presently we use the same DMA mapping for all fetch data. Instead, if we use a
separate DMA mapping for OCC load requests, then we can simply queue that
request using the fsp_fetch_data_queue call. That way we will not hit the lock
issue in fsp_fetch_data. Looking into the code base, I don't see any other
place where we get such a notification that would result in a fsp_fetch_data
call.

-Vasant

> ---
>  hw/fsp/fsp.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/hw/fsp/fsp.c b/hw/fsp/fsp.c
> index e57c611..5fbc33f 100644
> --- a/hw/fsp/fsp.c
> +++ b/hw/fsp/fsp.c
> @@ -1663,7 +1663,10 @@ int fsp_sync_msg(struct fsp_msg *msg, bool autofree)
>
>  	while(fsp_msg_busy(msg)) {
>  		cpu_relax();
> -		opal_run_pollers();
> +		lock(&fsp_lock);
> +		__fsp_poll(false);
> +		unlock(&fsp_lock);
>  	}
>
>  	switch(msg->state) {
>
> _______________________________________________
> Skiboot mailing list
> Skiboot@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/skiboot
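For readers unfamiliar with the warning in the log above, the hazard is that
opal_run_pollers() may invoke callbacks that themselves need the lock the
caller is already holding. The following is a minimal, self-contained C model
of that situation — the lock, poller, and fetch functions here are simplified
stand-ins invented for illustration, not the real skiboot API:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy non-recursive lock: taking it twice from the same path would
 * deadlock in real code, so we model "held" as a simple flag. */
struct lock { bool held; };

static bool lock_taken(struct lock *l)   { return l->held; }
static void lock_take(struct lock *l)    { assert(!l->held); l->held = true; }
static void lock_release(struct lock *l) { l->held = false; }

static struct lock fsp_lock;
static int pollers_warned;

/* Stand-in for opal_run_pollers(): the real function detects that a
 * lock is held and prints "Running pollers with lock held !" plus the
 * backtrace shown in the commit message. */
static void run_pollers(void)
{
	if (lock_taken(&fsp_lock))
		pollers_warned++;
}

/* Stand-in for the fsp_fetch_data() -> fsp_sync_msg() path: the lock
 * is held across the synchronous wait loop that runs pollers. */
static int fetch_data_model(void)
{
	lock_take(&fsp_lock);
	run_pollers();		/* triggers the warning */
	lock_release(&fsp_lock);
	return pollers_warned;
}
```

The model only shows why the warning fires; the real danger is a poller that
tries to take fsp_lock, which would deadlock rather than merely warn.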
* Vasant Hegde <hegdevasant@linux.vnet.ibm.com> [2015-03-08 16:56:31]:
> On 03/08/2015 03:19 PM, Vasant Hegde wrote:
> > This patch attempts to fix "poller lock issue" by removing
> > opal_run_pollers function call from fsp_sync_msg.
> > [...]
>
> Hmmm.. This patch fixes the poller lock issue, but it *doesn't* seem to be
> complete. We still hit the lock issue in fsp_fetch_data if we get an OCC
> load request while fetching skiroot :-(
>
> Another possible approach:
> Presently we use the same DMA mapping for all fetch data. Instead, if we
> use a separate DMA mapping for OCC load requests, then we can simply queue
> that request using the fsp_fetch_data_queue call. That way we will not hit
> the lock issue in fsp_fetch_data. Looking into the code base, I don't see
> any other place where we get such a notification that would result in a
> fsp_fetch_data call.

Hi Vasant,

The OCC load request can be made to wait. I can make it work with a callback.
When the FSP sends an OCC load request, I can issue a pre-load of the OCC lid
and queue it to your functions. The callback after completion will do the
actual OCC load request processing by calling hostboot. Since we have
preloaded the OCC lid, when we get the callback from hostboot for the OCC lid,
we will already have it and not come to fsp_fetch_data_queue at all.

What I still need to understand is the CAPI ucode loading request and response
sequence: at what point will we be able to hold the request and continue after
the callback?

--Vaidy
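The preload-and-queue flow described above could be sketched roughly as
follows. Only fsp_fetch_data_queue is a real skiboot name; the request
structure, queue function, and callback here are hypothetical simplifications
invented to show the shape of the idea:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical completion callback type and request descriptor. */
typedef void (*fetch_cb)(void *data);

struct fetch_req {
	fetch_cb cb;
	void    *cb_data;
	bool     pending;
};

static struct fetch_req occ_req;
static bool occ_lid_cached;

/* Queue an asynchronous fetch: the caller returns immediately instead
 * of spinning in fsp_sync_msg() with a lock held. */
static void fetch_data_queue(struct fetch_req *req, fetch_cb cb, void *data)
{
	req->cb = cb;
	req->cb_data = data;
	req->pending = true;
}

/* Completion path, run later from normal poller context once the FSP
 * transfer finishes. */
static void fetch_complete(struct fetch_req *req)
{
	req->pending = false;
	req->cb(req->cb_data);
}

/* Callback for the OCC lid pre-load: once this runs, the later OCC
 * load request finds the lid already cached and never re-enters the
 * fetch path. */
static void occ_preload_done(void *data)
{
	(void)data;
	occ_lid_cached = true;
}
```

The point of the shape is that all the post-fetch work (here, marking the lid
cached; in the real proposal, calling into hostboot) moves into the callback,
so nothing blocks while a lock is held.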
On Sun, Mar 08, 2015 at 03:19:41PM +0530, Vasant Hegde wrote:
> This patch attempts to fix "poller lock issue" by removing
> opal_run_pollers function call from fsp_sync_msg.
>
> sample log:
> ----------
> [4130907342,3] Running pollers with lock held !
> CPU 08b0 Backtrace:
> [...]
>
> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
>
> ---
> Ben, Stewart,
>   Presently we have two places (fsp_fetch_data & capp_lid_download) where
> we hold the lock and make an opal_run_pollers call (via fsp_sync_msg).
>
> Earlier we replaced __fsp_poll with opal_run_pollers (commit 488dc8a3).
> But I think it's fine just to call __fsp_poll here. This seems to be
> working, but needs to be reviewed before merging.
>
> I tried the other alternative approach we discussed earlier (introduce a
> queue in fsp_fetch_data). This requires quite a bit of work in the
> callback function.

This explanation needs to go in the commit message.

> ---
>  hw/fsp/fsp.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/hw/fsp/fsp.c b/hw/fsp/fsp.c
> index e57c611..5fbc33f 100644
> --- a/hw/fsp/fsp.c
> +++ b/hw/fsp/fsp.c
> @@ -1663,7 +1663,10 @@ int fsp_sync_msg(struct fsp_msg *msg, bool autofree)
>
>  	while(fsp_msg_busy(msg)) {
>  		cpu_relax();
> -		opal_run_pollers();
> +		lock(&fsp_lock);
> +		__fsp_poll(false);
> +		unlock(&fsp_lock);

You could just use fsp_opal_poll() instead.

Ananth
On Mon, 2015-03-09 at 12:06 +0530, Ananth N Mavinakayanahalli wrote:
> On Sun, Mar 08, 2015 at 03:19:41PM +0530, Vasant Hegde wrote:
> > This patch attempts to fix "poller lock issue" by removing
> > opal_run_pollers function call from fsp_sync_msg.

I disagree with generically changing the call to __fsp_poll(). That means, for
example, that during such a potentially long call we will not monitor the PSI
link.

While a full queued download request mechanism is better, we could get away
with a simpler hack, provided we are certain that we will never call
fsp_fetch_data() from a poller callback. It's a bit gross, but it will provide
a quick fix while you work on a better solution, using a busy flag:

	for (;;) {
		lock(...)
		test and set busy flag
		unlock(...)
		if (flag was clear)
			break;
		cpu_relax();
	}
	.. do load
	clear flag

> > [...]
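The busy-flag loop sketched in the pseudocode above could be fleshed out as
below. This is a sketch under assumptions: the lock here is a toy stand-in for
skiboot's lock()/unlock(), and fetch_begin/fetch_end are invented names, not
existing skiboot functions:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock standing in for skiboot's lock()/unlock(). */
struct lock { bool held; };
static void lock_take(struct lock *l)    { assert(!l->held); l->held = true; }
static void lock_release(struct lock *l) { l->held = false; }
static void cpu_relax(void) { /* barrier / low-priority hint in real code */ }

static struct lock fetch_lock;
static bool fetch_busy;

/* Serialise fsp_fetch_data() callers without holding any lock across
 * the long synchronous transfer: the lock protects only the flag. */
static void fetch_begin(void)
{
	for (;;) {
		bool was_busy;

		lock_take(&fetch_lock);
		was_busy = fetch_busy;	/* test... */
		fetch_busy = true;	/* ...and set the busy flag */
		lock_release(&fetch_lock);

		if (!was_busy)
			break;		/* we own the fetch path now */
		cpu_relax();
	}
}

/* ".. do load" happens between fetch_begin() and fetch_end(), with no
 * lock held, so pollers can run freely in the meantime. */
static void fetch_end(void)
{
	lock_take(&fetch_lock);
	fetch_busy = false;
	lock_release(&fetch_lock);
}
```

The design point is that the lock is only ever held for the instant needed to
test-and-set the flag, so the "Running pollers with lock held" condition can
no longer arise during the load itself.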
diff --git a/hw/fsp/fsp.c b/hw/fsp/fsp.c
index e57c611..5fbc33f 100644
--- a/hw/fsp/fsp.c
+++ b/hw/fsp/fsp.c
@@ -1663,7 +1663,10 @@ int fsp_sync_msg(struct fsp_msg *msg, bool autofree)
 
 	while(fsp_msg_busy(msg)) {
 		cpu_relax();
-		opal_run_pollers();
+		lock(&fsp_lock);
+		__fsp_poll(false);
+		unlock(&fsp_lock);
 	}
 
 	switch(msg->state) {
This patch attempts to fix "poller lock issue" by removing
opal_run_pollers function call from fsp_sync_msg.

sample log:
----------
[4130907342,3] Running pollers with lock held !
CPU 08b0 Backtrace:
 S: 0000000033cc3990 R: 0000000030013520 .backtrace+0x2c
 S: 0000000033cc3a20 R: 0000000030017d9c .opal_run_pollers+0x44
 S: 0000000033cc3aa0 R: 0000000030045f04 .fsp_sync_msg+0x40
 S: 0000000033cc3b30 R: 0000000030047840 .fsp_fetch_data+0x138
 S: 0000000033cc3c20 R: 0000000030021204 .hservices_lid_preload+0x148
 S: 0000000033cc3d10 R: 0000000030058038 .ibm_fsp_init+0xe4
 S: 0000000033cc3dc0 R: 0000000030058df8 .firenze_init+0x18
 S: 0000000033cc3e30 R: 00000000300141a4 .main_cpu_entry+0x3e0
 S: 0000000033cc3f00 R: 0000000030002534 boot_entry+0x18c

Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>

---
Ben, Stewart,
  Presently we have two places (fsp_fetch_data & capp_lid_download) where
we hold the lock and make an opal_run_pollers call (via fsp_sync_msg).

Earlier we replaced __fsp_poll with opal_run_pollers (commit 488dc8a3).
But I think it's fine just to call __fsp_poll here. This seems to be working,
but needs to be reviewed before merging.

I tried the other alternative approach we discussed earlier (introduce a queue
in fsp_fetch_data). This requires quite a bit of work in the callback function.

---
 hw/fsp/fsp.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)