diff mbox series

[2/2] scsi: esp: Improve consistency of RSTAT, RSEQ, and RINTR

Message ID 1543442171-24863-2-git-send-email-linux@roeck-us.net
State New
Headers show
Series None | expand

Commit Message

Guenter Roeck Nov. 28, 2018, 9:56 p.m. UTC
The guest OS reads RSTAT, RSEQ, and RINTR, and expects those registers
to reflect a consistent state. However, it is possible that the registers
can change after RSTAT was read, but before RINTR is read.

Guest OS		qemu
--------		----
Read RSTAT
			esp_command_complete()
			 RSTAT = STAT_ST
			 esp_dma_done()
			  RSTAT |= STAT_TC
			  RSEQ = 0
			  RINTR = INTR_BS

Read RSEQ
Read RINTR		RINTR = 0
			RSTAT &= ~STAT_TC
			RSEQ = SEQ_CD

The guest OS would then try to handle INTR_BS combined with an old
value of RSTAT. This sometimes resulted in lost events, spurious
interrupts, guest OS confusion, and stalled SCSI operations.
A typical guest error log (observed with various versions of Linux)
looks as follows.

scsi host1: Spurious irq, sreg=13.
...
scsi host1: Aborting command [84531f10:2a]
scsi host1: Current command [f882eea8:35]
scsi host1: Queued command [84531f10:2a]
scsi host1:  Active command [f882eea8:35]
scsi host1: Dumping command log
scsi host1: ent[15] CMD val[44] sreg[90] seqreg[00] sreg2[00] ireg[20] ss[00] event[0c]
scsi host1: ent[16] CMD val[01] sreg[90] seqreg[00] sreg2[00] ireg[20] ss[02] event[0c]
scsi host1: ent[17] CMD val[43] sreg[90] seqreg[00] sreg2[00] ireg[20] ss[02] event[0c]
scsi host1: ent[18] EVENT val[0d] sreg[92] seqreg[04] sreg2[00] ireg[18] ss[00] event[0c]
...

Adress the situation by caching RINTR and RSEQ when RSTAT is read and
using the cached values when the respective registers are read.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
I am not too happy with this solution (it looks kludgy to me), but it does
work. With this series applied, I have not seen a single spurious interrupt
after hundreds of boots, and no stalls. Prior to that, spurious interrupts
were seen with pretty much every boot, and the stall occurred on a regular
basis (roughly every other boot with qemu-system-hppa, less with others).

If anyone has a better idea, please let me know, and I'll be happy to
test it.

 hw/scsi/esp.c         | 40 ++++++++++++++++++++++++++++++++++++++++
 include/hw/scsi/esp.h |  2 ++
 2 files changed, 42 insertions(+)

Comments

Paolo Bonzini Nov. 29, 2018, 9:58 a.m. UTC | #1
On 28/11/18 22:56, Guenter Roeck wrote:
> The guest OS reads RSTAT, RSEQ, and RINTR, and expects those registers
> to reflect a consistent state. However, it is possible that the registers
> can change after RSTAT was read, but before RINTR is read.
> 
> Guest OS		qemu
> --------		----
> Read RSTAT
> 			esp_command_complete()
> 			 RSTAT = STAT_ST
> 			 esp_dma_done()
> 			  RSTAT |= STAT_TC
> 			  RSEQ = 0
> 			  RINTR = INTR_BS
> 
> Read RSEQ
> Read RINTR		RINTR = 0
> 			RSTAT &= ~STAT_TC
> 			RSEQ = SEQ_CD
> 
> The guest OS would then try to handle INTR_BS combined with an old
> value of RSTAT. This sometimes resulted in lost events, spurious
> interrupts, guest OS confusion, and stalled SCSI operations.

The question is, why was the guest running the interrupt routine before
STAT_INT was set in RSTAT?  The code in esp_raise_irq seems good:

    if (!(s->rregs[ESP_RSTAT] & STAT_INT)) {
        s->rregs[ESP_RSTAT] |= STAT_INT;
        qemu_irq_raise(s->irq);
        trace_esp_raise_irq();
    }

Paolo

> A typical guest error log (observed with various versions of Linux)
> looks as follows.
> 
> scsi host1: Spurious irq, sreg=13.
> ...
> scsi host1: Aborting command [84531f10:2a]
> scsi host1: Current command [f882eea8:35]
> scsi host1: Queued command [84531f10:2a]
> scsi host1:  Active command [f882eea8:35]
> scsi host1: Dumping command log
> scsi host1: ent[15] CMD val[44] sreg[90] seqreg[00] sreg2[00] ireg[20] ss[00] event[0c]
> scsi host1: ent[16] CMD val[01] sreg[90] seqreg[00] sreg2[00] ireg[20] ss[02] event[0c]
> scsi host1: ent[17] CMD val[43] sreg[90] seqreg[00] sreg2[00] ireg[20] ss[02] event[0c]
> scsi host1: ent[18] EVENT val[0d] sreg[92] seqreg[04] sreg2[00] ireg[18] ss[00] event[0c]
> ...
Mark Cave-Ayland Nov. 29, 2018, 11:56 a.m. UTC | #2
On 29/11/2018 09:58, Paolo Bonzini wrote:

> On 28/11/18 22:56, Guenter Roeck wrote:
>> The guest OS reads RSTAT, RSEQ, and RINTR, and expects those registers
>> to reflect a consistent state. However, it is possible that the registers
>> can change after RSTAT was read, but before RINTR is read.
>>
>> Guest OS		qemu
>> --------		----
>> Read RSTAT
>> 			esp_command_complete()
>> 			 RSTAT = STAT_ST
>> 			 esp_dma_done()
>> 			  RSTAT |= STAT_TC
>> 			  RSEQ = 0
>> 			  RINTR = INTR_BS
>>
>> Read RSEQ
>> Read RINTR		RINTR = 0
>> 			RSTAT &= ~STAT_TC
>> 			RSEQ = SEQ_CD
>>
>> The guest OS would then try to handle INTR_BS combined with an old
>> value of RSTAT. This sometimes resulted in lost events, spurious
>> interrupts, guest OS confusion, and stalled SCSI operations.
> 
> The question is, why was the guest running the interrupt routine before
> STAT_INT was set in RSTAT?  The code in esp_raise_irq seems good:
> 
>     if (!(s->rregs[ESP_RSTAT] & STAT_INT)) {
>         s->rregs[ESP_RSTAT] |= STAT_INT;
>         qemu_irq_raise(s->irq);
>         trace_esp_raise_irq();
>     }
> 
> Paolo

This patch is very interesting, as I have a long-running regression trying to boot
NextSTEP 3.3 on qemu-system-sparc which I eventually bisected down to the commit that
turned on iothread by default in QEMU.

The symptom is that ESP SCSI requests hang/timeout before the kernel is able to get
to the userspace installer: however if you launch QEMU with "taskset –cpu-list 1
qemu-system-sparc ..." then it works and you can complete the installation.

So certainly this suggests that there is a race condition still present in ESP
somewhere. I've given this patch a spin, and in a few quick tests here I was able to
consistently get further in kernel boot, but it still doesn't completely solve issue
for me :/


ATB,

Mark.
Guenter Roeck Nov. 29, 2018, 2:18 p.m. UTC | #3
On 11/29/18 1:58 AM, Paolo Bonzini wrote:
> On 28/11/18 22:56, Guenter Roeck wrote:
>> The guest OS reads RSTAT, RSEQ, and RINTR, and expects those registers
>> to reflect a consistent state. However, it is possible that the registers
>> can change after RSTAT was read, but before RINTR is read.
>>
>> Guest OS		qemu
>> --------		----
>> Read RSTAT
>> 			esp_command_complete()
>> 			 RSTAT = STAT_ST
>> 			 esp_dma_done()
>> 			  RSTAT |= STAT_TC
>> 			  RSEQ = 0
>> 			  RINTR = INTR_BS
>>
>> Read RSEQ
>> Read RINTR		RINTR = 0
>> 			RSTAT &= ~STAT_TC
>> 			RSEQ = SEQ_CD
>>
>> The guest OS would then try to handle INTR_BS combined with an old
>> value of RSTAT. This sometimes resulted in lost events, spurious
>> interrupts, guest OS confusion, and stalled SCSI operations.
> 
> The question is, why was the guest running the interrupt routine before
> STAT_INT was set in RSTAT?  The code in esp_raise_irq seems good:
> 

It doesn't. It polls for interrupts, but only enters the interrupt handler
if one was observed. It uses the DMA status (see first patch) for the polling,
presumably to not disturb RSTAT. Qemu produces two interrupts in a row,
the second while the first one is still handled.

Also, it works fine with a real controller (yes, I did buy one, and an
old SCSI drive, to test).

Guenter

>      if (!(s->rregs[ESP_RSTAT] & STAT_INT)) {
>          s->rregs[ESP_RSTAT] |= STAT_INT;
>          qemu_irq_raise(s->irq);
>          trace_esp_raise_irq();
>      }
> 
> Paolo
> 
>> A typical guest error log (observed with various versions of Linux)
>> looks as follows.
>>
>> scsi host1: Spurious irq, sreg=13.
>> ...
>> scsi host1: Aborting command [84531f10:2a]
>> scsi host1: Current command [f882eea8:35]
>> scsi host1: Queued command [84531f10:2a]
>> scsi host1:  Active command [f882eea8:35]
>> scsi host1: Dumping command log
>> scsi host1: ent[15] CMD val[44] sreg[90] seqreg[00] sreg2[00] ireg[20] ss[00] event[0c]
>> scsi host1: ent[16] CMD val[01] sreg[90] seqreg[00] sreg2[00] ireg[20] ss[02] event[0c]
>> scsi host1: ent[17] CMD val[43] sreg[90] seqreg[00] sreg2[00] ireg[20] ss[02] event[0c]
>> scsi host1: ent[18] EVENT val[0d] sreg[92] seqreg[04] sreg2[00] ireg[18] ss[00] event[0c]
>> ...
> 
>
Guenter Roeck Nov. 29, 2018, 3:42 p.m. UTC | #4
On Thu, Nov 29, 2018 at 11:56:39AM +0000, Mark Cave-Ayland wrote:
> On 29/11/2018 09:58, Paolo Bonzini wrote:
> 
> > On 28/11/18 22:56, Guenter Roeck wrote:
> >> The guest OS reads RSTAT, RSEQ, and RINTR, and expects those registers
> >> to reflect a consistent state. However, it is possible that the registers
> >> can change after RSTAT was read, but before RINTR is read.
> >>
> >> Guest OS		qemu
> >> --------		----
> >> Read RSTAT
> >> 			esp_command_complete()
> >> 			 RSTAT = STAT_ST
> >> 			 esp_dma_done()
> >> 			  RSTAT |= STAT_TC
> >> 			  RSEQ = 0
> >> 			  RINTR = INTR_BS
> >>
> >> Read RSEQ
> >> Read RINTR		RINTR = 0
> >> 			RSTAT &= ~STAT_TC
> >> 			RSEQ = SEQ_CD
> >>
> >> The guest OS would then try to handle INTR_BS combined with an old
> >> value of RSTAT. This sometimes resulted in lost events, spurious
> >> interrupts, guest OS confusion, and stalled SCSI operations.
> > 
> > The question is, why was the guest running the interrupt routine before
> > STAT_INT was set in RSTAT?  The code in esp_raise_irq seems good:
> > 
> >     if (!(s->rregs[ESP_RSTAT] & STAT_INT)) {
> >         s->rregs[ESP_RSTAT] |= STAT_INT;
> >         qemu_irq_raise(s->irq);
> >         trace_esp_raise_irq();
> >     }
> > 
> > Paolo
> 
> This patch is very interesting, as I have a long-running regression trying to boot
> NextSTEP 3.3 on qemu-system-sparc which I eventually bisected down to the commit that
> turned on iothread by default in QEMU.
> 
Makes me wonder: Is there a means to hold up processing esp_command_complete()
until a pending interrupt has been handled ? That might be a cleaner solution.
I could try to implement that in the driver, but maybe there is a means to do
that using the infrastructure.

Thanks,
Guenter

> The symptom is that ESP SCSI requests hang/timeout before the kernel is able to get
> to the userspace installer: however if you launch QEMU with "taskset –cpu-list 1
> qemu-system-sparc ..." then it works and you can complete the installation.
> 
> So certainly this suggests that there is a race condition still present in ESP
> somewhere. I've given this patch a spin, and in a few quick tests here I was able to
> consistently get further in kernel boot, but it still doesn't completely solve issue
> for me :/
> 
> 
> ATB,
> 
> Mark.
Guenter Roeck Nov. 29, 2018, 5:38 p.m. UTC | #5
Hi Mark,

On Thu, Nov 29, 2018 at 11:56:39AM +0000, Mark Cave-Ayland wrote:
> On 29/11/2018 09:58, Paolo Bonzini wrote:
> 
> > On 28/11/18 22:56, Guenter Roeck wrote:
> >> The guest OS reads RSTAT, RSEQ, and RINTR, and expects those registers
> >> to reflect a consistent state. However, it is possible that the registers
> >> can change after RSTAT was read, but before RINTR is read.
> >>
> >> Guest OS		qemu
> >> --------		----
> >> Read RSTAT
> >> 			esp_command_complete()
> >> 			 RSTAT = STAT_ST
> >> 			 esp_dma_done()
> >> 			  RSTAT |= STAT_TC
> >> 			  RSEQ = 0
> >> 			  RINTR = INTR_BS
> >>
> >> Read RSEQ
> >> Read RINTR		RINTR = 0
> >> 			RSTAT &= ~STAT_TC
> >> 			RSEQ = SEQ_CD
> >>
> >> The guest OS would then try to handle INTR_BS combined with an old
> >> value of RSTAT. This sometimes resulted in lost events, spurious
> >> interrupts, guest OS confusion, and stalled SCSI operations.
> > 
> > The question is, why was the guest running the interrupt routine before
> > STAT_INT was set in RSTAT?  The code in esp_raise_irq seems good:
> > 
> >     if (!(s->rregs[ESP_RSTAT] & STAT_INT)) {
> >         s->rregs[ESP_RSTAT] |= STAT_INT;
> >         qemu_irq_raise(s->irq);
> >         trace_esp_raise_irq();
> >     }
> > 
> > Paolo
> 
> This patch is very interesting, as I have a long-running regression trying to boot
> NextSTEP 3.3 on qemu-system-sparc which I eventually bisected down to the commit that
> turned on iothread by default in QEMU.
> 
> The symptom is that ESP SCSI requests hang/timeout before the kernel is able to get
> to the userspace installer: however if you launch QEMU with "taskset –cpu-list 1
> qemu-system-sparc ..." then it works and you can complete the installation.
> 
> So certainly this suggests that there is a race condition still present in ESP
> somewhere. I've given this patch a spin, and in a few quick tests here I was able to
> consistently get further in kernel boot, but it still doesn't completely solve issue
> for me :/
> 

Can you try the attached patch ? It is a bit cleaner than the first version,
and works for me as well.

Note that this isn't perfect. Specifically, I see differences in handling
STAT_TC. The controller specification is a bit ambiguous in that regard,
but comparing the qemu code with real controller behavior shows that the
real controller does not reset STAT_TC when reading the interrupt status
register. That doesn't seem to matter for Linux, but it may influence
other guests.

Guenter
From 751edd383f3c70d8a9195345b45aa3eb1ada7553 Mon Sep 17 00:00:00 2001
From: Guenter Roeck <linux@roeck-us.net>
Date: Thu, 29 Nov 2018 09:17:42 -0800
Subject: [PATCH] scsi: esp: Defer command completion until previous interrupts
 have been handled

The guest OS reads RSTAT, RSEQ, and RINTR, and expects those registers
to reflect a consistent state. However, it is possible that the registers
can change after RSTAT was read, but before RINTR is read, when
esp_command_complete() is called.

Guest OS		qemu
--------		----
[handle interrupt]
Read RSTAT
			esp_command_complete()
			 RSTAT = STAT_ST
			 esp_dma_done()
			  RSTAT |= STAT_TC
			  RSEQ = 0
			  RINTR = INTR_BS

Read RSEQ
Read RINTR		RINTR = 0
			RSTAT &= ~STAT_TC
			RSEQ = SEQ_CD

The guest OS would then try to handle INTR_BS combined with an old
value of RSTAT. This sometimes resulted in lost events, spurious
interrupts, guest OS confusion, and stalled SCSI operations.
A typical guest error log (observed with various versions of Linux)
looks as follows.

scsi host1: Spurious irq, sreg=13.
...
scsi host1: Aborting command [84531f10:2a]
scsi host1: Current command [f882eea8:35]
scsi host1: Queued command [84531f10:2a]
scsi host1:  Active command [f882eea8:35]
scsi host1: Dumping command log
scsi host1: ent[15] CMD val[44] sreg[90] seqreg[00] sreg2[00] ireg[20] ss[00] event[0c]
scsi host1: ent[16] CMD val[01] sreg[90] seqreg[00] sreg2[00] ireg[20] ss[02] event[0c]
scsi host1: ent[17] CMD val[43] sreg[90] seqreg[00] sreg2[00] ireg[20] ss[02] event[0c]
scsi host1: ent[18] EVENT val[0d] sreg[92] seqreg[04] sreg2[00] ireg[18] ss[00] event[0c]
...

Defer handling command completion until previous interrupts have been
handled to fix the problem.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
This looks much cleaner than the previous patch and at the same time
identifies the problem. Enabling the new trace shows that the condition
is seen on a regular basis (on average 2-3 times per boot test with
qemu-system-hppa if the test does nothing but shut down immediately).

 hw/scsi/esp.c         | 29 ++++++++++++++++++++++++-----
 hw/scsi/trace-events  |  1 +
 include/hw/scsi/esp.h |  2 ++
 3 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/hw/scsi/esp.c b/hw/scsi/esp.c
index 630d923..e3d6cc5 100644
--- a/hw/scsi/esp.c
+++ b/hw/scsi/esp.c
@@ -286,11 +286,8 @@ static void esp_do_dma(ESPState *s)
     esp_dma_done(s);
 }
 
-void esp_command_complete(SCSIRequest *req, uint32_t status,
-                                 size_t resid)
+static void _esp_command_complete(ESPState *s, uint32_t status)
 {
-    ESPState *s = req->hba_private;
-
     trace_esp_command_complete();
     if (s->ti_size != 0) {
         trace_esp_command_complete_unexpected();
@@ -311,6 +308,23 @@ void esp_command_complete(SCSIRequest *req, uint32_t status,
     }
 }
 
+void esp_command_complete(SCSIRequest *req, uint32_t status,
+                          size_t resid)
+{
+    ESPState *s = req->hba_private;
+
+    if (s->rregs[ESP_RSTAT] & STAT_INT) {
+        /* Defer handling command complete until the previous
+         * interrupt has been handled.
+         */
+        trace_esp_command_complete_deferred();
+        s->deferred_status = status;
+        s->deferred_complete = true;
+        return;
+    }
+    _esp_command_complete(s, status);
+}
+
 void esp_transfer_data(SCSIRequest *req, uint32_t len)
 {
     ESPState *s = req->hba_private;
@@ -422,7 +436,10 @@ uint64_t esp_reg_read(ESPState *s, uint32_t saddr)
         s->rregs[ESP_RSTAT] &= ~STAT_TC;
         s->rregs[ESP_RSEQ] = SEQ_CD;
         esp_lower_irq(s);
-
+        if (s->deferred_complete) {
+            _esp_command_complete(s, s->deferred_status);
+            s->deferred_complete = false;
+        }
         return old_val;
     case ESP_TCHI:
         /* Return the unique id if the value has never been written */
@@ -582,6 +599,8 @@ const VMStateDescription vmstate_esp = {
         VMSTATE_UINT32(ti_wptr, ESPState),
         VMSTATE_BUFFER(ti_buf, ESPState),
         VMSTATE_UINT32(status, ESPState),
+        VMSTATE_UINT32(deferred_status, ESPState),
+        VMSTATE_BOOL(deferred_complete, ESPState),
         VMSTATE_UINT32(dma, ESPState),
         VMSTATE_PARTIAL_BUFFER(cmdbuf, ESPState, 16),
         VMSTATE_BUFFER_START_MIDDLE_V(cmdbuf, ESPState, 16, 4),
diff --git a/hw/scsi/trace-events b/hw/scsi/trace-events
index 0fb6a99..2fe8a7c 100644
--- a/hw/scsi/trace-events
+++ b/hw/scsi/trace-events
@@ -167,6 +167,7 @@ esp_handle_satn_stop(uint32_t cmdlen) "cmdlen %d"
 esp_write_response(uint32_t status) "Transfer status (status=%d)"
 esp_do_dma(uint32_t cmdlen, uint32_t len) "command len %d + %d"
 esp_command_complete(void) "SCSI Command complete"
+esp_command_complete_deferred(void) "SCSI Command complete deferred"
 esp_command_complete_unexpected(void) "SCSI command completed unexpectedly"
 esp_command_complete_fail(void) "Command failed"
 esp_transfer_data(uint32_t dma_left, int32_t ti_size) "transfer %d/%d"
diff --git a/include/hw/scsi/esp.h b/include/hw/scsi/esp.h
index 682a0d2..adab63d 100644
--- a/include/hw/scsi/esp.h
+++ b/include/hw/scsi/esp.h
@@ -23,6 +23,8 @@ struct ESPState {
     int32_t ti_size;
     uint32_t ti_rptr, ti_wptr;
     uint32_t status;
+    uint32_t deferred_status;
+    bool deferred_complete;
     uint32_t dma;
     uint8_t ti_buf[TI_BUFSZ];
     SCSIBus bus;
Paolo Bonzini Nov. 29, 2018, 5:53 p.m. UTC | #6
On 29/11/18 18:38, Guenter Roeck wrote:
> This looks much cleaner than the previous patch and at the same time
> identifies the problem. Enabling the new trace shows that the condition
> is seen on a regular basis (on average 2-3 times per boot test with
> qemu-system-hppa if the test does nothing but shut down immediately).

Yes, this looks very nice.  I'll queue it if Mark reports that it works,
with just s/_esp_command_complete/esp_report_command_complete/.

Paolo
Mark Cave-Ayland Nov. 29, 2018, 6:07 p.m. UTC | #7
On 29/11/2018 17:38, Guenter Roeck wrote:

>> This patch is very interesting, as I have a long-running regression trying to boot
>> NextSTEP 3.3 on qemu-system-sparc which I eventually bisected down to the commit that
>> turned on iothread by default in QEMU.
>>
>> The symptom is that ESP SCSI requests hang/timeout before the kernel is able to get
>> to the userspace installer: however if you launch QEMU with "taskset –cpu-list 1
>> qemu-system-sparc ..." then it works and you can complete the installation.
>>
>> So certainly this suggests that there is a race condition still present in ESP
>> somewhere. I've given this patch a spin, and in a few quick tests here I was able to
>> consistently get further in kernel boot, but it still doesn't completely solve issue
>> for me :/
>>
> 
> Can you try the attached patch ? It is a bit cleaner than the first version,
> and works for me as well.
> 
> Note that this isn't perfect. Specifically, I see differences in handling
> STAT_TC. The controller specification is a bit ambiguous in that regard,
> but comparing the qemu code with real controller behavior shows that the
> real controller does not reset STAT_TC when reading the interrupt status
> register. That doesn't seem to matter for Linux, but it may influence
> other guests.

Hi Guenter,

Thanks for the patch. I just gave it a quick test, and unfortunately my NextSTEP ISO
still hangs in the same place on boot :(

Not sure if it helps, but attached is a simple trace backend log from "-trace 'esp*'"
from startup all the way to the point where the kernel hangs on boot whilst
enumerating the SCSI bus (it does seem to hang at random points in the bus
enumeration process).


ATB,

Mark.
Mark Cave-Ayland Nov. 29, 2018, 6:34 p.m. UTC | #8
On 29/11/2018 17:38, Guenter Roeck wrote:

> Can you try the attached patch ? It is a bit cleaner than the first version,
> and works for me as well.
> 
> Note that this isn't perfect. Specifically, I see differences in handling
> STAT_TC. The controller specification is a bit ambiguous in that regard,
> but comparing the qemu code with real controller behavior shows that the
> real controller does not reset STAT_TC when reading the interrupt status
> register. That doesn't seem to matter for Linux, but it may influence
> other guests.

I've now completed a boot test of all my SPARC32 OpenBIOS CDROM images with this
patch, and whilst it doesn't solve my NextSTEP issue, I don't see any obvious
regressions.

Note that NetBSD SPARC32 tends to spit out the occasional "!TC on data xfer" message
to the console during periods of disk access, however that is something that has
always happened and isn't something new introduced by this patch.


ATB,

Mark.
Guenter Roeck Nov. 29, 2018, 7 p.m. UTC | #9
On Thu, Nov 29, 2018 at 06:07:05PM +0000, Mark Cave-Ayland wrote:
> On 29/11/2018 17:38, Guenter Roeck wrote:
> 
> >> This patch is very interesting, as I have a long-running regression trying to boot
> >> NextSTEP 3.3 on qemu-system-sparc which I eventually bisected down to the commit that
> >> turned on iothread by default in QEMU.
> >>
> >> The symptom is that ESP SCSI requests hang/timeout before the kernel is able to get
> >> to the userspace installer: however if you launch QEMU with "taskset –cpu-list 1
> >> qemu-system-sparc ..." then it works and you can complete the installation.
> >>
> >> So certainly this suggests that there is a race condition still present in ESP
> >> somewhere. I've given this patch a spin, and in a few quick tests here I was able to
> >> consistently get further in kernel boot, but it still doesn't completely solve issue
> >> for me :/
> >>
> > 
> > Can you try the attached patch ? It is a bit cleaner than the first version,
> > and works for me as well.
> > 
> > Note that this isn't perfect. Specifically, I see differences in handling
> > STAT_TC. The controller specification is a bit ambiguous in that regard,
> > but comparing the qemu code with real controller behavior shows that the
> > real controller does not reset STAT_TC when reading the interrupt status
> > register. That doesn't seem to matter for Linux, but it may influence
> > other guests.
> 
> Hi Guenter,
> 
> Thanks for the patch. I just gave it a quick test, and unfortunately my NextSTEP ISO
> still hangs in the same place on boot :(
> 
Too bad. Is it "same place" as with the first version of the patch, or
"same place" as in upstream qemu ? That might be important, as the two patch
versions behave differently (one caches RSTAT/RINTR/RSEQ, one defers command
complete handling).

> Not sure if it helps, but attached is a simple trace backend log from "-trace 'esp*'"
> from startup all the way to the point where the kernel hangs on boot whilst
> enumerating the SCSI bus (it does seem to hang at random points in the bus
> enumeration process).
> 
This is interesting; yours seems to be a different problem. I don't see any
command_complete_deferred traces in your log. I also don't see any suspicious
activity between esp_raise_irq and esp_lower_irq.

Can you try tracing in singlethreaded mode ? Maybe that can help us finding
the difference.

Thanks,
Guenter
Guenter Roeck Nov. 29, 2018, 7:07 p.m. UTC | #10
On Thu, Nov 29, 2018 at 06:34:54PM +0000, Mark Cave-Ayland wrote:
> On 29/11/2018 17:38, Guenter Roeck wrote:
> 
> > Can you try the attached patch ? It is a bit cleaner than the first version,
> > and works for me as well.
> > 
> > Note that this isn't perfect. Specifically, I see differences in handling
> > STAT_TC. The controller specification is a bit ambiguous in that regard,
> > but comparing the qemu code with real controller behavior shows that the
> > real controller does not reset STAT_TC when reading the interrupt status
> > register. That doesn't seem to matter for Linux, but it may influence
> > other guests.
> 
> I've now completed a boot test of all my SPARC32 OpenBIOS CDROM images with this
> patch, and whilst it doesn't solve my NextSTEP issue, I don't see any obvious
> regressions.
> 
> Note that NetBSD SPARC32 tends to spit out the occasional "!TC on data xfer" message
> to the console during periods of disk access, however that is something that has
> always happened and isn't something new introduced by this patch.
> 

That may be because reading the interrupt status resets the TC bit.
As mentioned above, I think it shouldn't do that. Just a wild guess, but
it might be worth a try. Can you remove "s->rregs[ESP_RSTAT] &= ~STAT_TC;"
from the ESP_RINTR case in esp_reg_read() and see what happens ?

[That may expose situations where STAT_TC _should_ be cleared but isn't,
 so we may hit other problems when doing that.]

Thanks,
Guenter
Mark Cave-Ayland Nov. 29, 2018, 7:33 p.m. UTC | #11
On 29/11/2018 19:00, Guenter Roeck wrote:

>> Thanks for the patch. I just gave it a quick test, and unfortunately my NextSTEP ISO
>> still hangs in the same place on boot :(
>>
> Too bad. Is it "same place" as with the first version of the patch, or
> "same place" as in upstream qemu ? That might be important, as the two patch
> versions behave differently (one caches RSTAT/RINTR/RSEQ, one defers command
> complete handling).

I'd say same place as in upstream QEMU, although again the exact location does vary
so I have to keep running it several times to get a feel for whether or not it is in
improvement.

>> Not sure if it helps, but attached is a simple trace backend log from "-trace 'esp*'"
>> from startup all the way to the point where the kernel hangs on boot whilst
>> enumerating the SCSI bus (it does seem to hang at random points in the bus
>> enumeration process).
>>
> This is interesting; yours seems to be a different problem. I don't see any
> command_complete_deferred traces in your log. I also don't see any suspicious
> activity between esp_raise_irq and esp_lower_irq.
> 
> Can you try tracing in singlethreaded mode ? Maybe that can help us finding
> the difference.

Attached are the "-trace 'esp*' -trace 'scsi*'" outputs for a bad (normal) boot and
good (single threaded) boot for comparison.


ATB,

Mark.
Mark Cave-Ayland Nov. 29, 2018, 7:38 p.m. UTC | #12
On 29/11/2018 19:07, Guenter Roeck wrote:

>> I've now completed a boot test of all my SPARC32 OpenBIOS CDROM images with this
>> patch, and whilst it doesn't solve my NextSTEP issue, I don't see any obvious
>> regressions.
>>
>> Note that NetBSD SPARC32 tends to spit out the occasional "!TC on data xfer" message
>> to the console during periods of disk access, however that is something that has
>> always happened and isn't something new introduced by this patch.
>>
> 
> That may be because reading the interrupt status resets the TC bit.
> As mentioned above, I think it shouldn't do that. Just a wild guess, but
> it might be worth a try. Can you remove "s->rregs[ESP_RSTAT] &= ~STAT_TC;"
> from the ESP_RINTR case in esp_reg_read() and see what happens ?
> 
> [That may expose situations where STAT_TC _should_ be cleared but isn't,
>  so we may hit other problems when doing that.]

I've tried that, and over a number of boots it does seem to allow the boot to get
further: with this change I'd say around 95% of the time NextSTEP now gets past the
SCSI bus enumeration and hangs at the point where it tries to mount the root filesystem.

Interestingly enough even if I comment out that line I still see the "!TC on data
xfer" messages appearing on the NetBSD console...


ATB,

Mark.
Guenter Roeck Nov. 29, 2018, 9:26 p.m. UTC | #13
On Thu, Nov 29, 2018 at 07:33:15PM +0000, Mark Cave-Ayland wrote:
> On 29/11/2018 19:00, Guenter Roeck wrote:
> 
> >> Thanks for the patch. I just gave it a quick test, and unfortunately my NextSTEP ISO
> >> still hangs in the same place on boot :(
> >>
> > Too bad. Is it "same place" as with the first version of the patch, or
> > "same place" as in upstream qemu ? That might be important, as the two patch
> > versions behave differently (one caches RSTAT/RINTR/RSEQ, one defers command
> > complete handling).
> 
> I'd say same place as in upstream QEMU, although again the exact location does vary
> so I have to keep running it several times to get a feel for whether or not it is in
> improvement.
> 
> >> Not sure if it helps, but attached is a simple trace backend log from "-trace 'esp*'"
> >> from startup all the way to the point where the kernel hangs on boot whilst
> >> enumerating the SCSI bus (it does seem to hang at random points in the bus
> >> enumeration process).
> >>
> > This is interesting; yours seems to be a different problem. I don't see any
> > command_complete_deferred traces in your log. I also don't see any suspicious
> > activity between esp_raise_irq and esp_lower_irq.
> > 
> > Can you try tracing in singlethreaded mode ? Maybe that can help us finding
> > the difference.
> 
> Attached are the "-trace 'esp*' -trace 'scsi*'" outputs for a bad (normal) boot and
> good (single threaded) boot for comparison.
> 

This is really weird. The "bad" log just stops. Up to that point, the "good"
log is virtually identical (liteally up to line 83602 in both logs). Is it
possible that there is some qemu-internal race condition that has nothing
to do with esp itself, one that causes qemu to lock up ?

Thanks,
Guenter
diff mbox series

Patch

diff --git a/hw/scsi/esp.c b/hw/scsi/esp.c
index 630d923..6af74bc 100644
--- a/hw/scsi/esp.c
+++ b/hw/scsi/esp.c
@@ -415,6 +415,16 @@  uint64_t esp_reg_read(ESPState *s, uint32_t saddr)
         }
         break;
     case ESP_RINTR:
+        if (s->rintr_saved) {
+            old_val = s->rintr_saved;
+            s->rintr_saved = 0;
+            if (!(s->rregs[ESP_RINTR])) {
+                s->rregs[ESP_RSTAT] &= ~STAT_TC;
+                s->rregs[ESP_RSEQ] = SEQ_CD;
+                esp_lower_irq(s);
+            }
+            return old_val;
+        }
         /* Clear sequence step, interrupt register and all status bits
            except TC */
         old_val = s->rregs[ESP_RINTR];
@@ -429,6 +439,34 @@  uint64_t esp_reg_read(ESPState *s, uint32_t saddr)
         if (!s->tchi_written) {
             return s->chip_id;
         }
+    case ESP_RSTAT:
+        /*
+         * After receiving an interrupt, the guest OS reads
+         * RSTAT, RSEQ, and RINTR. When reading RINTR,
+         * RSTAT and RSEQ are updated. It was observed that
+         * esp_command_complete() and with it esp_dma_done()
+         * was called after the guest OS reads RSTAT, but
+         * before it was able to read RINTR. In other words,
+         * the host would read RINTR associated with the
+         * old value of RSTAT, not the new value. Since RSTAT
+         * was supposed to reflect STAT_ST in this situation,
+         * the host OS got confused, and the operation stalled.
+         * Remedy the situation by caching both ESP_RINTR and
+         * ESP_RSEQ. Return those values until ESP_RINTR is read.
+         * Only do this if an interrupt is pending to limit its
+         * impact.
+         */
+        if (s->rregs[ESP_RSTAT] & STAT_INT) {
+            s->rintr_saved = s->rregs[ESP_RINTR];
+            s->rseq_saved =  s->rregs[ESP_RSEQ];
+            s->rregs[ESP_RINTR] = 0;
+        }
+        break;
+    case ESP_RSEQ:
+        if (s->rintr_saved) {
+            return s->rseq_saved;
+        }
+        break;
     default:
         break;
     }
@@ -577,6 +615,8 @@  const VMStateDescription vmstate_esp = {
     .fields = (VMStateField[]) {
         VMSTATE_BUFFER(rregs, ESPState),
         VMSTATE_BUFFER(wregs, ESPState),
+        VMSTATE_UINT8(rintr_saved, ESPState),
+        VMSTATE_UINT8(rseq_saved, ESPState),
         VMSTATE_INT32(ti_size, ESPState),
         VMSTATE_UINT32(ti_rptr, ESPState),
         VMSTATE_UINT32(ti_wptr, ESPState),
diff --git a/include/hw/scsi/esp.h b/include/hw/scsi/esp.h
index 682a0d2..342f607 100644
--- a/include/hw/scsi/esp.h
+++ b/include/hw/scsi/esp.h
@@ -17,6 +17,8 @@  typedef struct ESPState ESPState;
 struct ESPState {
     uint8_t rregs[ESP_REGS];
     uint8_t wregs[ESP_REGS];
+    uint8_t rintr_saved;
+    uint8_t rseq_saved;
     qemu_irq irq;
     uint8_t chip_id;
     bool tchi_written;