Message ID | 1470128232-22532-1-git-send-email-mukesh02@linux.vnet.ibm.com |
---|---|
State | Superseded |
Headers | show |
Lets move this elog_enabled = false; while defining elog_enabled variable. (As per Vasant's comment) This changes also need to go to all stable branches. Changes will be in V2. Regards, Mukesh On Tuesday 02 August 2016 02:27 PM, Mukesh Ojha wrote: > This issue is one of the corner case, which is related to recent change > went upstream and only observed in the petitboot prompt, where we see > only one error log instead of getting all error log in > /sys/firmware/opal/elog. > > Below is snippet of the code, where elog module in the kernel > initialised. > > { > .. > ... > rc = request_threaded_irq(irq, NULL, elog_event, =<======= > IRQF_TRIGGER_HIGH | IRQF_ONESHOT, "opal-elog", NULL); | > if (rc) { | > pr_err("%s: Can't request OPAL event irq (%d)\n", | > __func__, rc); | > return rc; | > } | > /* We are now ready to pull error logs from opal. */ | > if (opal_check_token(OPAL_ELOG_RESEND)) | > opal_resend_pending_logs(); =<======= > } > > Scenario: > While elog_enabled is true, OPAL_EVENT_ERROR_LOG_AVAIL will be set from > OPAL, whenever it has error logs that are waiting to be fetched from the > kernel. > > Race occurs between the code arrowed above, as soon as kernel registers > error log handler, it sees OPAL_EVENT_ERROR_LOG_AVAIL is set, so it > schedule the handler. Which makes 'opal_get_elog_size'(kernel) call on > the error log set the state from ELOG_STATE_FETCHED_DATA to > ELOG_STATE_FETCHED_INFO and clears OPAL_EVENT_ERROR_LOG_AVAIL. During > the same time 'opal_resend_pending_logs'(kernel) call which will set the > state machine from ELOG_STATE_FETCHED_INFO to ELOG_STATE_NONE in OPAL. > Because of that, read call from the kernel, which was to be made after > the 'opal_get_elog_size' ends up failing. But, the elog kobject was > created for the particular error log. > > Further in the resend routine in the OPAL, we make opal_commit_elog_in_host() > call that sets OPAL_EVENT_ERROR_LOG_AVAIL. So, Kernel again makes > 'opal_get_elog_size' which results in getting the error log info of the > same error log which was fetched earlier. It also changes the state > machine to ELOG_STATE_FETCHED_INFO and clears OPAL_EVENT_ERROR_LOG_AVAIL. > > > Below is the snippet from the elog_event registered handler call > { > ... > ... > > /* we may get notified twice, let's handle > * that gracefully and not create two conflicting > * entries. > */ > if (kset_find_obj(elog_kset, name)) > return IRQ_HANDLED; > ... > ... > } > > In the kernel, we search kobject for the error log whether it already > exist. So kobject is found and it returns without reading error log > data. > > So, this patch makes the flag which was true during initialisation > to false. And that solves the race. > > Signed-off-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com> > --- > hw/fsp/fsp-elog-read.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/hw/fsp/fsp-elog-read.c b/hw/fsp/fsp-elog-read.c > index 4f0bf16..c133438 100644 > --- a/hw/fsp/fsp-elog-read.c > +++ b/hw/fsp/fsp-elog-read.c > @@ -612,7 +612,7 @@ void fsp_elog_read_init(void) > if (val != 0) > return; > > - elog_enabled = true; > + elog_enabled = false; > > /* register Eror log Class D2 */ > fsp_register_client(&fsp_get_elog_notify, FSP_MCLASS_ERR_LOG);
diff --git a/hw/fsp/fsp-elog-read.c b/hw/fsp/fsp-elog-read.c index 4f0bf16..c133438 100644 --- a/hw/fsp/fsp-elog-read.c +++ b/hw/fsp/fsp-elog-read.c @@ -612,7 +612,7 @@ void fsp_elog_read_init(void) if (val != 0) return; - elog_enabled = true; + elog_enabled = false; /* register Eror log Class D2 */ fsp_register_client(&fsp_get_elog_notify, FSP_MCLASS_ERR_LOG);
This issue is one of the corner case, which is related to recent change went upstream and only observed in the petitboot prompt, where we see only one error log instead of getting all error log in /sys/firmware/opal/elog. Below is snippet of the code, where elog module in the kernel initialised. { .. ... rc = request_threaded_irq(irq, NULL, elog_event, =<======= IRQF_TRIGGER_HIGH | IRQF_ONESHOT, "opal-elog", NULL); | if (rc) { | pr_err("%s: Can't request OPAL event irq (%d)\n", | __func__, rc); | return rc; | } | /* We are now ready to pull error logs from opal. */ | if (opal_check_token(OPAL_ELOG_RESEND)) | opal_resend_pending_logs(); =<======= } Scenario: While elog_enabled is true, OPAL_EVENT_ERROR_LOG_AVAIL will be set from OPAL, whenever it has error logs that are waiting to be fetched from the kernel. Race occurs between the code arrowed above, as soon as kernel registers error log handler, it sees OPAL_EVENT_ERROR_LOG_AVAIL is set, so it schedule the handler. Which makes 'opal_get_elog_size'(kernel) call on the error log set the state from ELOG_STATE_FETCHED_DATA to ELOG_STATE_FETCHED_INFO and clears OPAL_EVENT_ERROR_LOG_AVAIL. During the same time 'opal_resend_pending_logs'(kernel) call which will set the state machine from ELOG_STATE_FETCHED_INFO to ELOG_STATE_NONE in OPAL. Because of that, read call from the kernel, which was to be made after the 'opal_get_elog_size' ends up failing. But, the elog kobject was created for the particular error log. Further in the resend routine in the OPAL, we make opal_commit_elog_in_host() call that sets OPAL_EVENT_ERROR_LOG_AVAIL. So, Kernel again makes 'opal_get_elog_size' which results in getting the error log info of the same error log which was fetched earlier. It also changes the state machine to ELOG_STATE_FETCHED_INFO and clears OPAL_EVENT_ERROR_LOG_AVAIL. Below is the snippet from the elog_event registered handler call { ... ... /* we may get notified twice, let's handle * that gracefully and not create two conflicting * entries. */ if (kset_find_obj(elog_kset, name)) return IRQ_HANDLED; ... ... } In the kernel, we search kobject for the error log whether it already exist. So kobject is found and it returns without reading error log data. So, this patch makes the flag which was true during initialisation to false. And that solves the race. Signed-off-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com> --- hw/fsp/fsp-elog-read.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)