Patchwork versatile: Push lsi initialization to the end

login
register
mail settings
Submitter Jan Kiszka
Date Oct. 5, 2012, 5:30 p.m.
Message ID <506F1934.5000002@siemens.com>
Download mbox | patch
Permalink /patch/189541/
State New
Headers show

Comments

Jan Kiszka - Oct. 5, 2012, 5:30 p.m.
This is nasty, but there is no better way given current mux logic:

As setting up the block device will trigger a qemu_bh_poll while there
are qemu_chr open events in the queue, we have to register the UARTs
and everything else that might be mux'ed first so that the right active
frontend is already registered when the bottom-half is finally
processed.

This fixes spurious monitor messages with

qemu-system-arm ... -machine versatilepb /path/to/disk -nographic

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/versatilepb.c |   11 ++++++-----
 1 files changed, 6 insertions(+), 5 deletions(-)
Peter Maydell - Oct. 5, 2012, 5:42 p.m.
On 5 October 2012 18:30, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> This is nasty, but there is no better way given current mux logic:
>
> As setting up the block device will trigger a qemu_bh_poll while there
> are qemu_chr open events in the queue, we have to register the UARTs
> and everything else that might be mux'ed first so that the right active
> frontend is already registered when the bottom-half is finally
> processed.

Yuck. I really don't like this -- why should the board model have
to care about implementation internals of lsi53c895a? It's just
plugging in a PCI card. And why only this device and not every
storage device in every board?

Nothing should actually start happening until the whole machine model
is completely set up and we've returned control to vl.c (and probably
not until we've done our first system reset). Any device which is
doing stuff early is broken and needs fixing.

-- PMM
Jan Kiszka - Oct. 5, 2012, 6:01 p.m.
On 2012-10-05 19:42, Peter Maydell wrote:
> On 5 October 2012 18:30, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>> This is nasty, but there is no better way given current mux logic:
>>
>> As setting up the block device will trigger a qemu_bh_poll while there
>> are qemu_chr open events in the queue, we have to register the UARTs
>> and everything else that might be mux'ed first so that the right active
>> frontend is already registered when the bottom-half is finally
>> processed.
> 
> Yuck. I really don't like this -- why should the board model have
> to care about implementation internals of lsi53c895a? It's just
> plugging in a PCI card. And why only this device and not every
> storage device in every board?
> 
> Nothing should actually start happening until the whole machine model
> is completely set up and we've returned control to vl.c (and probably
> not until we've done our first system reset). Any device which is
> doing stuff early is broken and needs fixing.

Here is the backtrace that bites:

(gdb) bt
#0  qemu_bh_poll () at /data/qemu/async.c:56
#1  0x00005555555de819 in qemu_aio_wait () at /data/qemu/aio.c:118
#2  0x00005555555f6635 in bdrv_rw_co (bs=0x5555564cc140, sector_num=<optimized out>, buf=<optimized out>, nb_sectors=<optimized out>, is_write=<optimized out>) at /data/qemu/block.c:1819
#3  0x00005555555f6b1b in bdrv_read_unthrottled (bs=0x5555564cc140, sector_num=<optimized out>, buf=<optimized out>, nb_sectors=<optimized out>) at /data/qemu/block.c:1841
#4  0x0000555555637a8a in guess_disk_lchs (bs=0x5555564cc140, pcylinders=0x7fffffffd434, pheads=0x7fffffffd430, psectors=0x7fffffffd42c) at /data/qemu/hw/hd-geometry.c:68
#5  0x0000555555637c10 in hd_geometry_guess (bs=0x5555564cc140, pcyls=0x5555565dbbac, pheads=0x5555565dbbb0, psecs=0x5555565dbbb4, ptrans=0x0) at /data/qemu/hw/hd-geometry.c:125
#6  0x000055555562d0da in blkconf_geometry (conf=0x5555565dbb90, ptrans=0x0, cyls_max=65535, heads_max=255, secs_max=255) at /data/qemu/hw/block-common.c:43
#7  0x0000555555648dea in scsi_initfn (dev=0x5555565dbaf0) at /data/qemu/hw/scsi-disk.c:2061
#8  0x00005555556452e6 in scsi_device_init (s=0x5555565dbaf0) at /data/qemu/hw/scsi-bus.c:42
#9  scsi_qdev_init (qdev=<optimized out>) at /data/qemu/hw/scsi-bus.c:183
#10 0x0000555555641e7f in qdev_init (dev=0x5555565dbaf0) at /data/qemu/hw/qdev.c:160
#11 0x00005555556428c3 in scsi_bus_legacy_add_drive (bus=<optimized out>, bdrv=0x5555564cc140, unit=0, removable=false, bootindex=-1) at /data/qemu/hw/scsi-bus.c:228
#12 0x00005555556429d8 in scsi_bus_legacy_handle_cmdline (bus=0x5555565d92b8) at /data/qemu/hw/scsi-bus.c:246
#13 0x00005555556c1a46 in pci_qdev_init (qdev=0x5555565d8b00) at /data/qemu/hw/pci.c:1556
#14 0x0000555555641e7f in qdev_init (dev=0x5555565d8b00) at /data/qemu/hw/qdev.c:160
#15 0x0000555555641fd1 in qdev_init_nofail (dev=0x5555565d8b00) at /data/qemu/hw/qdev.c:261
#16 0x00005555556c25b8 in pci_create_simple_multifunction (bus=<optimized out>, devfn=<optimized out>, multifunction=<optimized out>, name=<optimized out>) at /data/qemu/hw/pci.c:1615
#17 0x0000555555808f44 in versatile_init (ram_size=134217728, boot_device=<optimized out>, 
    kernel_filename=0x5555564c9420 "zImage-qemuarm.bin", kernel_cmdline=0x5555558a0558 "", initrd_filename=0x0, 
    cpu_model=<optimized out>, board_id=387) at /data/qemu/hw/arm/../versatilepb.c:325
#18 0x0000555555809100 in vpb_init (ram_size=<optimized out>, boot_device=<optimized out>, kernel_filename=<optimized out>, kernel_cmdline=<optimized out>, initrd_filename=<optimized out>, 
    cpu_model=<optimized out>) at /data/qemu/hw/arm/../versatilepb.c:351
#19 0x000055555570bcc9 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /data/qemu/vl.c:3627

There are earlier calls to qemu_bh_poll, but likely before chardev
creation.

I'm not a fan of this either, but the alternatives are way more
complicated. We either need to rewrite the chardev subsystem,
specifically how mux'ed devices are registered and how the active one is
selected. Or we need to avoid flushing "unrelated" BHs for block
devices. Not sure of those read requests can be postponed.

Jan
Peter Maydell - Oct. 6, 2012, 2:13 a.m.
On 5 October 2012 19:01, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> I'm not a fan of this either, but the alternatives are way more
> complicated. We either need to rewrite the chardev subsystem,
> specifically how mux'ed devices are registered and how the active one is
> selected. Or we need to avoid flushing "unrelated" BHs for block
> devices. Not sure of those read requests can be postponed.

Is this a regression? If it is then the obvious answer is to back
out whatever broke it...

-- PMM
Jan Kiszka - Oct. 8, 2012, 6:52 a.m.
On 2012-10-06 04:13, Peter Maydell wrote:
> On 5 October 2012 19:01, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>> I'm not a fan of this either, but the alternatives are way more
>> complicated. We either need to rewrite the chardev subsystem,
>> specifically how mux'ed devices are registered and how the active one is
>> selected. Or we need to avoid flushing "unrelated" BHs for block
>> devices. Not sure of those read requests can be postponed.
> 
> Is this a regression? If it is then the obvious answer is to back
> out whatever broke it...

I'm using this machine for the first time, so I cannot answer this from
the top of my head. However, I don't think it can be a regression.

Mux chardevs work like this: You create the backend, then you register
the frontend with them, one by one. The last one registered is the first
one active. It should also receive the open event of chardev. But as
that open even is issued via a BH and last frontend, the serial device,
arrives after the first BH flushing, things break.

Jan
Peter Maydell - Oct. 8, 2012, 4:33 p.m.
On 5 October 2012 18:30, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> This is nasty, but there is no better way given current mux logic:
>
> As setting up the block device will trigger a qemu_bh_poll while there
> are qemu_chr open events in the queue, we have to register the UARTs
> and everything else that might be mux'ed first so that the right active
> frontend is already registered when the bottom-half is finally
> processed.

So I guess this comes down to what the semantics of bottom halves are.
I can see two plausible options:

 1. bottom halves are a mechanism provided by our cpu/device
    simulation framework, and so will never be run before the
    simulation is fully initialised
  * this means devices can register BHs which set irq lines,
    send events to chr mux front ends etc etc
  * it also means that device setup mustn't trigger a bh_poll
    (so we'd need to track down the bit of the block device
    setup that's causing this)

 2. bottom halves are a generic mechanism that you can use
    not just as part of the simulation, and so BHs may run
    as soon as they're registered
  * this would let us use them for arbitrary purposes in init
  * we'd need to audit and fix all the current uses to check
    whether they're safe to run early or if they need to have
    a 'do nothing if simulation not running' check

Any opinions?

-- PMM
Paolo Bonzini - Oct. 8, 2012, 4:39 p.m.
Il 08/10/2012 18:33, Peter Maydell ha scritto:
> On 5 October 2012 18:30, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>> This is nasty, but there is no better way given current mux logic:
>>
>> As setting up the block device will trigger a qemu_bh_poll while there
>> are qemu_chr open events in the queue, we have to register the UARTs
>> and everything else that might be mux'ed first so that the right active
>> frontend is already registered when the bottom-half is finally
>> processed.
> 
> So I guess this comes down to what the semantics of bottom halves are.
> I can see two plausible options:
> 
>  1. bottom halves are a mechanism provided by our cpu/device
>     simulation framework, and so will never be run before the
>     simulation is fully initialised
>   * this means devices can register BHs which set irq lines,
>     send events to chr mux front ends etc etc
>   * it also means that device setup mustn't trigger a bh_poll
>     (so we'd need to track down the bit of the block device
>     setup that's causing this)
> 
>  2. bottom halves are a generic mechanism that you can use
>     not just as part of the simulation, and so BHs may run
>     as soon as they're registered
>   * this would let us use them for arbitrary purposes in init
>   * we'd need to audit and fix all the current uses to check
>     whether they're safe to run early or if they need to have
>     a 'do nothing if simulation not running' check

3. bottom halves are an internal concept to the block layer that has
been hijacked by device models.  The bottom half idea is that the code
would run as soon as the current code is done with this subsystem;
ideally, you would instead queue a work item in a thread-pool and the
code would block on the same fine-grained lock as the subsystem that
created the bottom half.  Work items from different subsystems would be
able to run concurrently---of course that's not too helpful while we
have a single lock for the whole iothread...

Stefan's work should be able to kill qemu_bh_new inside the block layer
(replacing it with aio_bh_new), so qemu_bh_new can be repurposed to
something that doesn't conflict with the block layer.

Paolo
Aurelien Jarno - Oct. 8, 2012, 5:07 p.m.
On Mon, Oct 08, 2012 at 08:52:30AM +0200, Jan Kiszka wrote:
> On 2012-10-06 04:13, Peter Maydell wrote:
> > On 5 October 2012 19:01, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> >> I'm not a fan of this either, but the alternatives are way more
> >> complicated. We either need to rewrite the chardev subsystem,
> >> specifically how mux'ed devices are registered and how the active one is
> >> selected. Or we need to avoid flushing "unrelated" BHs for block
> >> devices. Not sure of those read requests can be postponed.
> > 
> > Is this a regression? If it is then the obvious answer is to back
> > out whatever broke it...
> 
> I'm using this machine for the first time, so I cannot answer this from
> the top of my head. However, I don't think it can be a regression.
> 

What is the bug exactly? Outputting the monitor prompt when using
-nographic? If it is the case, please note that mips/mipsel is also
affected. IIRC it appears somewhere between 1.1 and 1.2.
Jan Kiszka - Oct. 8, 2012, 5:09 p.m.
On 2012-10-08 19:07, Aurelien Jarno wrote:
> On Mon, Oct 08, 2012 at 08:52:30AM +0200, Jan Kiszka wrote:
>> On 2012-10-06 04:13, Peter Maydell wrote:
>>> On 5 October 2012 19:01, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>>> I'm not a fan of this either, but the alternatives are way more
>>>> complicated. We either need to rewrite the chardev subsystem,
>>>> specifically how mux'ed devices are registered and how the active one is
>>>> selected. Or we need to avoid flushing "unrelated" BHs for block
>>>> devices. Not sure of those read requests can be postponed.
>>>
>>> Is this a regression? If it is then the obvious answer is to back
>>> out whatever broke it...
>>
>> I'm using this machine for the first time, so I cannot answer this from
>> the top of my head. However, I don't think it can be a regression.
>>
> 
> What is the bug exactly? Outputting the monitor prompt when using
> -nographic? If it is the case, please note that mips/mipsel is also
> affected. IIRC it appears somewhere between 1.1 and 1.2.

-nographic (-serial mon:stdio) + SCSI disk, at least that was the
combination here.

See http://thread.gmane.org/gmane.comp.emulators.qemu/174687 for a more
generic, well, workaround.

Jan

Patch

diff --git a/hw/versatilepb.c b/hw/versatilepb.c
index 7a92034..d0ad59f 100644
--- a/hw/versatilepb.c
+++ b/hw/versatilepb.c
@@ -250,11 +250,6 @@  static void versatile_init(ram_addr_t ram_size,
     if (usb_enabled) {
         pci_create_simple(pci_bus, -1, "pci-ohci");
     }
-    n = drive_get_max_bus(IF_SCSI);
-    while (n >= 0) {
-        pci_create_simple(pci_bus, -1, "lsi53c895a");
-        n--;
-    }
 
     sysbus_create_simple("pl011", 0x101f1000, pic[12]);
     sysbus_create_simple("pl011", 0x101f2000, pic[13]);
@@ -325,6 +320,12 @@  static void versatile_init(ram_addr_t ram_size,
     /* 0x101f4000 SSPI.  */
     /* 0x34000000 NOR Flash */
 
+    n = drive_get_max_bus(IF_SCSI);
+    while (n >= 0) {
+        pci_create_simple(pci_bus, -1, "lsi53c895a");
+        n--;
+    }
+
     dinfo = drive_get(IF_PFLASH, 0, 0);
     if (!pflash_cfi01_register(VERSATILE_FLASH_ADDR, NULL, "versatile.flash",
                           VERSATILE_FLASH_SIZE, dinfo ? dinfo->bdrv : NULL,