Patchwork [qemu] hw/i386/pc_sysfw: support more than one flash drive

login
register
mail settings
Submitter Laszlo Ersek
Date Nov. 21, 2013, 10:21 p.m.
Message ID <1385072461-31317-1-git-send-email-lersek@redhat.com>
Download mbox | patch
Permalink /patch/293260/
State New
Headers show

Comments

Laszlo Ersek - Nov. 21, 2013, 10:21 p.m.
This patch allows the user to usefully specify

  -drive file=img_1,if=pflash,format=raw,readonly \
  -drive file=img_2,if=pflash,format=raw

on the command line. The flash images will be mapped under 4G in their
reverse unit order -- that is, with their base addresses progressing
downwards, in increasing unit order.

(The unit number increases with command line order if not explicitly
specified.)

This accommodates the following use case: suppose that OVMF is split in
two parts, a writeable host file for non-volatile variable storage, and a
read-only part for bootstrap and decompressible executable code.

The binary code part would be read-only, centrally managed on the host
system, and passed in as unit 0. The variable store would be writeable,
VM-specific, and passed in as unit 1.

  00000000ffe00000-00000000ffe1ffff (prio 0, R-): system.flash1
  00000000ffe20000-00000000ffffffff (prio 0, R-): system.flash0

(If the guest tries to write to the flash range that is backed by the
read-only drive, bdrv_write() in pflash_update() will correctly deny the
write with -EACCES, and pflash_update() won't care, which suits us well.)

Signed-off-by: Laszlo Ersek <lersek@redhat.com>
---
 hw/i386/pc_sysfw.c | 60 ++++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 45 insertions(+), 15 deletions(-)
Eric Blake - Nov. 21, 2013, 10:26 p.m.
On 11/21/2013 03:21 PM, Laszlo Ersek wrote:
> This patch allows the user to usefully specify
> 
>   -drive file=img_1,if=pflash,format=raw,readonly \
>   -drive file=img_2,if=pflash,format=raw
> 
> on the command line. The flash images will be mapped under 4G in their
> reverse unit order -- that is, with their base addresses progressing
> downwards, in increasing unit order.
> 

> +/* This function maps flash drives from 4G downward, in order of their unit
> + * numbers. Addressing within one flash drive is of course not reversed.
> + *
> + * The drive with unit number 0 is mapped at the highest address, and it is
> + * passed to pc_isa_bios_init(). Merging severral drives for isa-bios is not

s/severral/several/
Laszlo Ersek - Nov. 21, 2013, 10:33 p.m.
On 11/21/13 23:26, Eric Blake wrote:
> On 11/21/2013 03:21 PM, Laszlo Ersek wrote:
>> This patch allows the user to usefully specify
>>
>>   -drive file=img_1,if=pflash,format=raw,readonly \
>>   -drive file=img_2,if=pflash,format=raw
>>
>> on the command line. The flash images will be mapped under 4G in their
>> reverse unit order -- that is, with their base addresses progressing
>> downwards, in increasing unit order.
>>
> 
>> +/* This function maps flash drives from 4G downward, in order of their unit
>> + * numbers. Addressing within one flash drive is of course not reversed.
>> + *
>> + * The drive with unit number 0 is mapped at the highest address, and it is
>> + * passed to pc_isa_bios_init(). Merging severral drives for isa-bios is not
> 
> s/severral/several/
> 

I may have meant "very severrrral" :)

(Will fix if patch is deemed worthy otherwise.)

Thanks!
Laszlo
Markus Armbruster - Nov. 22, 2013, 12:21 p.m.
Laszlo Ersek <lersek@redhat.com> writes:

> This patch allows the user to usefully specify
>
>   -drive file=img_1,if=pflash,format=raw,readonly \
>   -drive file=img_2,if=pflash,format=raw
>
> on the command line. The flash images will be mapped under 4G in their
> reverse unit order -- that is, with their base addresses progressing
> downwards, in increasing unit order.
>
> (The unit number increases with command line order if not explicitly
> specified.)
>
> This accommodates the following use case: suppose that OVMF is split in
> two parts, a writeable host file for non-volatile variable storage, and a
> read-only part for bootstrap and decompressible executable code.
>
> The binary code part would be read-only, centrally managed on the host
> system, and passed in as unit 0. The variable store would be writeable,
> VM-specific, and passed in as unit 1.
>
>   00000000ffe00000-00000000ffe1ffff (prio 0, R-): system.flash1
>   00000000ffe20000-00000000ffffffff (prio 0, R-): system.flash0
>
> (If the guest tries to write to the flash range that is backed by the
> read-only drive, bdrv_write() in pflash_update() will correctly deny the
> write with -EACCES, and pflash_update() won't care, which suits us well.)

Before this patch:

You can define as many if=pflash drives as you want.  Any with non-zero
index are silently ignored.

If you don't specify one with index=0, you get a ROM at the top of the
32 bit address space, contents taken from -bios (default "bios.bin").
Up to its last 128KiB are aliased at the top of the ISA address space.

If you do specify one with index=0, you get a pflash device at the top
of the 32 bit address space, with contents from the drive, and -bios is
silently ignored.  Up to its last 128KiB are copied to a ROM at the top
of the (20 bit) ISA address space.

After this patch (please correct misunderstandings):

Now the drives after the first unused index are silently ignored.

If you don't specify one with index=0, no change.

If you do, you now get N pflash devices, where N is the first unused
index.  Each pflash's contents is taken from the respective drive.  The
flashes are mapped at the top of the 32 bit address space in reverse
index order.  -bios is silently ignored, as before.  Up to the last
128KiB of the index=0 flash are copied to a ROM at the top of the ISA
address space.

Thus, no change for index=0.  For index=1..N, we now get additional
flash devices.

Correct?
Laszlo Ersek - Nov. 22, 2013, 6:30 p.m.
On 11/22/13 13:21, Markus Armbruster wrote:
> Laszlo Ersek <lersek@redhat.com> writes:
> 
>> This patch allows the user to usefully specify
>>
>>   -drive file=img_1,if=pflash,format=raw,readonly \
>>   -drive file=img_2,if=pflash,format=raw
>>
>> on the command line. The flash images will be mapped under 4G in their
>> reverse unit order -- that is, with their base addresses progressing
>> downwards, in increasing unit order.
>>
>> (The unit number increases with command line order if not explicitly
>> specified.)
>>
>> This accommodates the following use case: suppose that OVMF is split in
>> two parts, a writeable host file for non-volatile variable storage, and a
>> read-only part for bootstrap and decompressible executable code.
>>
>> The binary code part would be read-only, centrally managed on the host
>> system, and passed in as unit 0. The variable store would be writeable,
>> VM-specific, and passed in as unit 1.
>>
>>   00000000ffe00000-00000000ffe1ffff (prio 0, R-): system.flash1
>>   00000000ffe20000-00000000ffffffff (prio 0, R-): system.flash0
>>
>> (If the guest tries to write to the flash range that is backed by the
>> read-only drive, bdrv_write() in pflash_update() will correctly deny the
>> write with -EACCES, and pflash_update() won't care, which suits us well.)
> 
> Before this patch:
> 
> You can define as many if=pflash drives as you want.  Any with non-zero
> index are silently ignored.
> 
> If you don't specify one with index=0, you get a ROM at the top of the
> 32 bit address space, contents taken from -bios (default "bios.bin").
> Up to its last 128KiB are aliased at the top of the ISA address space.
> 
> If you do specify one with index=0, you get a pflash device at the top
> of the 32 bit address space, with contents from the drive, and -bios is
> silently ignored.  Up to its last 128KiB are copied to a ROM at the top
> of the (20 bit) ISA address space.
> 
> After this patch (please correct misunderstandings):
> 
> Now the drives after the first unused index are silently ignored.
> 
> If you don't specify one with index=0, no change.
> 
> If you do, you now get N pflash devices, where N is the first unused
> index.  Each pflash's contents is taken from the respective drive.  The
> flashes are mapped at the top of the 32 bit address space in reverse
> index order.  -bios is silently ignored, as before.  Up to the last
> 128KiB of the index=0 flash are copied to a ROM at the top of the ISA
> address space.
> 
> Thus, no change for index=0.  For index=1..N, we now get additional
> flash devices.
> 
> Correct?

Yes.

Thanks
Laszlo
Markus Armbruster - Nov. 25, 2013, 3:22 p.m.
Laszlo Ersek <lersek@redhat.com> writes:

> On 11/22/13 13:21, Markus Armbruster wrote:
>> Laszlo Ersek <lersek@redhat.com> writes:
>> 
>>> This patch allows the user to usefully specify
>>>
>>>   -drive file=img_1,if=pflash,format=raw,readonly \
>>>   -drive file=img_2,if=pflash,format=raw
>>>
>>> on the command line. The flash images will be mapped under 4G in their
>>> reverse unit order -- that is, with their base addresses progressing
>>> downwards, in increasing unit order.
>>>
>>> (The unit number increases with command line order if not explicitly
>>> specified.)
>>>
>>> This accommodates the following use case: suppose that OVMF is split in
>>> two parts, a writeable host file for non-volatile variable storage, and a
>>> read-only part for bootstrap and decompressible executable code.
>>>
>>> The binary code part would be read-only, centrally managed on the host
>>> system, and passed in as unit 0. The variable store would be writeable,
>>> VM-specific, and passed in as unit 1.
>>>
>>>   00000000ffe00000-00000000ffe1ffff (prio 0, R-): system.flash1
>>>   00000000ffe20000-00000000ffffffff (prio 0, R-): system.flash0
>>>
>>> (If the guest tries to write to the flash range that is backed by the
>>> read-only drive, bdrv_write() in pflash_update() will correctly deny the
>>> write with -EACCES, and pflash_update() won't care, which suits us well.)
>> 
>> Before this patch:
>> 
>> You can define as many if=pflash drives as you want.  Any with non-zero
>> index are silently ignored.
>> 
>> If you don't specify one with index=0, you get a ROM at the top of the
>> 32 bit address space, contents taken from -bios (default "bios.bin").
>> Up to its last 128KiB are aliased at the top of the ISA address space.
>> 
>> If you do specify one with index=0, you get a pflash device at the top
>> of the 32 bit address space, with contents from the drive, and -bios is
>> silently ignored.  Up to its last 128KiB are copied to a ROM at the top
>> of the (20 bit) ISA address space.
>> 
>> After this patch (please correct misunderstandings):
>> 
>> Now the drives after the first unused index are silently ignored.
>> 
>> If you don't specify one with index=0, no change.
>> 
>> If you do, you now get N pflash devices, where N is the first unused
>> index.  Each pflash's contents is taken from the respective drive.  The
>> flashes are mapped at the top of the 32 bit address space in reverse
>> index order.  -bios is silently ignored, as before.  Up to the last
>> 128KiB of the index=0 flash are copied to a ROM at the top of the ISA
>> address space.
>> 
>> Thus, no change for index=0.  For index=1..N, we now get additional
>> flash devices.
>> 
>> Correct?
>
> Yes.

Thanks.

1. Multiple pflash devices

Is there any way for a guest to see the N separate pflash devices, or do
they look exactly like a single pflash device?

2. More board code device configuration magic

Our long term goal is to configure machines by configuration files
(i.e. data) rather than code.

Your patch extends the PC board code dealing with if=pflash for multiple
drives.

Reminder: -drive if=X (where X != none) creates a backend, and leaves it
in a place where board code can find it.  Board code may elect to create
a frontend using that backend.

It's desirable that any new board code creating a frontend for -drive
if=X,... is sufficiently simple so that users can get the same result
with some -drive if=none,... -device ... incantation.  The second form
provides full control over device properties.  See section "Block
Devices" in docs/qdev-device-use.txt for examples of such mappings.

This is the case for if=ide, if=scsi, if=floppy and if=virtio (see
docs/qdev-device-use.txt).  It's not yet the case for if=pflash, because
the qdev used with it (cfi.pflash01) is a sysbus device, and these
aren't available with -device, yet.

When that gets fixed, I'd expect support for putting the flash device at
a specific address.  An equivalent if=none incantation (free of board
code magic) should be possible.

Thus, the board magic your patch adds should be of the harmless kind.
Markus Armbruster - Nov. 25, 2013, 3:32 p.m.
Laszlo Ersek <lersek@redhat.com> writes:

> This patch allows the user to usefully specify
>
>   -drive file=img_1,if=pflash,format=raw,readonly \
>   -drive file=img_2,if=pflash,format=raw
>
> on the command line. The flash images will be mapped under 4G in their
> reverse unit order -- that is, with their base addresses progressing
> downwards, in increasing unit order.
>
> (The unit number increases with command line order if not explicitly
> specified.)
>
> This accommodates the following use case: suppose that OVMF is split in
> two parts, a writeable host file for non-volatile variable storage, and a
> read-only part for bootstrap and decompressible executable code.
>
> The binary code part would be read-only, centrally managed on the host
> system, and passed in as unit 0. The variable store would be writeable,
> VM-specific, and passed in as unit 1.
>
>   00000000ffe00000-00000000ffe1ffff (prio 0, R-): system.flash1
>   00000000ffe20000-00000000ffffffff (prio 0, R-): system.flash0
>
> (If the guest tries to write to the flash range that is backed by the
> read-only drive, bdrv_write() in pflash_update() will correctly deny the
> write with -EACCES, and pflash_update() won't care, which suits us well.)
>
> Signed-off-by: Laszlo Ersek <lersek@redhat.com>
> ---
>  hw/i386/pc_sysfw.c | 60 ++++++++++++++++++++++++++++++++++++++++--------------
>  1 file changed, 45 insertions(+), 15 deletions(-)
>
> diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c
> index e917c83..1c3e3d6 100644
> --- a/hw/i386/pc_sysfw.c
> +++ b/hw/i386/pc_sysfw.c
> @@ -72,35 +72,65 @@ static void pc_isa_bios_init(MemoryRegion *rom_memory,
>      memory_region_set_readonly(isa_bios, true);
>  }
>  
> +/* This function maps flash drives from 4G downward, in order of their unit
> + * numbers. Addressing within one flash drive is of course not reversed.
> + *
> + * The drive with unit number 0 is mapped at the highest address, and it is
> + * passed to pc_isa_bios_init(). Merging severral drives for isa-bios is not
> + * supported.
> + *
> + * The caller is responsible to pass in the non-NULL @pflash_drv that
> + * corresponds to the flash drive with unit number 0.
> + */
>  static void pc_system_flash_init(MemoryRegion *rom_memory,
>                                   DriveInfo *pflash_drv)
>  {
> +    int unit = 0;
>      BlockDriverState *bdrv;
>      int64_t size;
> -    hwaddr phys_addr;
> +    hwaddr phys_addr = 0x100000000ULL;
>      int sector_bits, sector_size;
>      pflash_t *system_flash;
>      MemoryRegion *flash_mem;
> +    char name[64];
>  
> -    bdrv = pflash_drv->bdrv;
> -    size = bdrv_getlength(pflash_drv->bdrv);
>      sector_bits = 12;
>      sector_size = 1 << sector_bits;
>  
> -    if ((size % sector_size) != 0) {
> -        fprintf(stderr,
> -                "qemu: PC system firmware (pflash) must be a multiple of 0x%x\n",
> -                sector_size);
> -        exit(1);
> -    }
> +    do {
> +        bdrv = pflash_drv->bdrv;
> +        size = bdrv_getlength(bdrv);
> +        if ((size % sector_size) != 0) {
> +            fprintf(stderr,
> +                    "qemu: PC system firmware (pflash segment %d) must be a "
> +                    "multiple of 0x%x\n", unit, sector_size);
> +            exit(1);
> +        }
> +        if (size > phys_addr) {
> +            fprintf(stderr, "qemu: pflash segments must fit under 4G "
> +                    "cumulatively\n");

I suspect things go haywire long before you hit address zero.

Note that both pc_piix.c and pc_q35.c leave a hole in RAM just below 4G.
The former's hole starts at 0xe0000000, the latter's at 0xb0000000.
Should that be the limit?

> +            exit(1);
> +        }
>  
> -    phys_addr = 0x100000000ULL - size;
> -    system_flash = pflash_cfi01_register(phys_addr, NULL, "system.flash", size,
> -                                         bdrv, sector_size, size >> sector_bits,
> -                                         1, 0x0000, 0x0000, 0x0000, 0x0000, 0);
> -    flash_mem = pflash_cfi01_get_memory(system_flash);
> +        phys_addr -= size;
>  
> -    pc_isa_bios_init(rom_memory, flash_mem, size);
> +        /* pflash_cfi01_register() creates a deep copy of the name */
> +        snprintf(name, sizeof name, "system.flash%d", unit);
> +        system_flash = pflash_cfi01_register(phys_addr, NULL /* qdev */, name,
> +                                             size, bdrv, sector_size,
> +                                             size >> sector_bits,
> +                                             1      /* width */,
> +                                             0x0000 /* id0 */,
> +                                             0x0000 /* id1 */,
> +                                             0x0000 /* id2 */,
> +                                             0x0000 /* id3 */,
> +                                             0      /* be */);
> +        if (unit == 0) {
> +            flash_mem = pflash_cfi01_get_memory(system_flash);
> +            pc_isa_bios_init(rom_memory, flash_mem, size);
> +        }
> +        pflash_drv = drive_get(IF_PFLASH, 0, ++unit);
> +    } while (pflash_drv != NULL);
>  }
>  
>  static void old_pc_system_rom_init(MemoryRegion *rom_memory, bool isapc_ram_fw)

The drive with index 0 is passed as parameter pflash_drv.  The others
the function gets itself.  I find that ugly.  Have you considered
dropping the parameter?
Laszlo Ersek - Nov. 25, 2013, 7:45 p.m.
On 11/25/13 16:22, Markus Armbruster wrote:
> Laszlo Ersek <lersek@redhat.com> writes:
> 
>> On 11/22/13 13:21, Markus Armbruster wrote:
>>> Laszlo Ersek <lersek@redhat.com> writes:
>>>
>>>> This patch allows the user to usefully specify
>>>>
>>>>   -drive file=img_1,if=pflash,format=raw,readonly \
>>>>   -drive file=img_2,if=pflash,format=raw
>>>>
>>>> on the command line. The flash images will be mapped under 4G in their
>>>> reverse unit order -- that is, with their base addresses progressing
>>>> downwards, in increasing unit order.
>>>>
>>>> (The unit number increases with command line order if not explicitly
>>>> specified.)
>>>>
>>>> This accommodates the following use case: suppose that OVMF is split in
>>>> two parts, a writeable host file for non-volatile variable storage, and a
>>>> read-only part for bootstrap and decompressible executable code.
>>>>
>>>> The binary code part would be read-only, centrally managed on the host
>>>> system, and passed in as unit 0. The variable store would be writeable,
>>>> VM-specific, and passed in as unit 1.
>>>>
>>>>   00000000ffe00000-00000000ffe1ffff (prio 0, R-): system.flash1
>>>>   00000000ffe20000-00000000ffffffff (prio 0, R-): system.flash0
>>>>
>>>> (If the guest tries to write to the flash range that is backed by the
>>>> read-only drive, bdrv_write() in pflash_update() will correctly deny the
>>>> write with -EACCES, and pflash_update() won't care, which suits us well.)
>>>
>>> Before this patch:
>>>
>>> You can define as many if=pflash drives as you want.  Any with non-zero
>>> index are silently ignored.
>>>
>>> If you don't specify one with index=0, you get a ROM at the top of the
>>> 32 bit address space, contents taken from -bios (default "bios.bin").
>>> Up to its last 128KiB are aliased at the top of the ISA address space.
>>>
>>> If you do specify one with index=0, you get a pflash device at the top
>>> of the 32 bit address space, with contents from the drive, and -bios is
>>> silently ignored.  Up to its last 128KiB are copied to a ROM at the top
>>> of the (20 bit) ISA address space.
>>>
>>> After this patch (please correct misunderstandings):
>>>
>>> Now the drives after the first unused index are silently ignored.
>>>
>>> If you don't specify one with index=0, no change.
>>>
>>> If you do, you now get N pflash devices, where N is the first unused
>>> index.  Each pflash's contents is taken from the respective drive.  The
>>> flashes are mapped at the top of the 32 bit address space in reverse
>>> index order.  -bios is silently ignored, as before.  Up to the last
>>> 128KiB of the index=0 flash are copied to a ROM at the top of the ISA
>>> address space.
>>>
>>> Thus, no change for index=0.  For index=1..N, we now get additional
>>> flash devices.
>>>
>>> Correct?
>>
>> Yes.
> 
> Thanks.
> 
> 1. Multiple pflash devices
> 
> Is there any way for a guest to see the N separate pflash devices, or do
> they look exactly like a single pflash device?

The interpretation of multiple -pflash options is board specific. I
grepped the source for IF_PFLASH first, and found some boards that use
more than one flash device. Merging them in one contiguous target-phys
address range would be i386 specific.

This doesn't break compatibility (because multiple pflash devices used
not to be supported for i386 targets at all), but I agree that this
would posit their interpretation for the future, and thus it might need
deeper thought.

From looking at "hw/block/pflash_cfi01.c", it seems that the guest can
interrogate the flash device about its size (nb_blocs is stored in
cfi_table starting at 0x2D, and cfi_table can be queried by command 0x98
in pflash_read()). So, if the guest cares, it can figure out that there
are multiple devices backing the range. I think.

> 2. More board code device configuration magic
> 
> Our long term goal is to configure machines by configuration files
> (i.e. data) rather than code.
> 
> Your patch extends the PC board code dealing with if=pflash for multiple
> drives.
> 
> Reminder: -drive if=X (where X != none) creates a backend, and leaves it
> in a place where board code can find it.  Board code may elect to create
> a frontend using that backend.

Yes, I'm aware. I did think about this -- eg. just create a drive with
if=none, and add a frontend with -device something. But, the frontend
here is not a device (a "peripheral"), it's a memory region with special
mmio ops.

Pflash frontends can apparently be created with "-device cfi.pflash01,...':

cfi.pflash01.drive=drive
cfi.pflash01.num-blocks=uint32
cfi.pflash01.sector-length=uint64
cfi.pflash01.width=uint8
cfi.pflash01.big-endian=uint8
cfi.pflash01.id0=uint16
cfi.pflash01.id1=uint16
cfi.pflash01.id2=uint16
cfi.pflash01.id3=uint16
cfi.pflash01.name=string

We can even point it to the backing drive as well. But these properties
don't seem to allow for the i386 specific "processing", eg. where to map
the device in target-phys address space.

> It's desirable that any new board code creating a frontend for -drive
> if=X,... is sufficiently simple so that users can get the same result
> with some -drive if=none,... -device ... incantation.  The second form
> provides full control over device properties.  See section "Block
> Devices" in docs/qdev-device-use.txt for examples of such mappings.
> 
> This is the case for if=ide, if=scsi, if=floppy and if=virtio (see
> docs/qdev-device-use.txt).  It's not yet the case for if=pflash, because
> the qdev used with it (cfi.pflash01) is a sysbus device, and these
> aren't available with -device, yet.

Thanks, I didn't know that that was in the background.

> When that gets fixed, I'd expect support for putting the flash device at
> a specific address.  An equivalent if=none incantation (free of board
> code magic) should be possible.
> 
> Thus, the board magic your patch adds should be of the harmless kind.
> 

Thanks. I think I managed to follow your train of thought, I just wasn't
sure where you'd end up. I think, as you say, once sysbus devices can be
specified with -device, cfi.pflash01 could take an address, and if
that's omitted, the default for i386 could be "map it just below the
previous one".

Thanks for looking into this, you doubtlessly know way more about the
device model than I do.

Laszlo
Laszlo Ersek - Nov. 25, 2013, 8:17 p.m.
On 11/25/13 16:32, Markus Armbruster wrote:
> Laszlo Ersek <lersek@redhat.com> writes:
> 
>> This patch allows the user to usefully specify
>>
>>   -drive file=img_1,if=pflash,format=raw,readonly \
>>   -drive file=img_2,if=pflash,format=raw
>>
>> on the command line. The flash images will be mapped under 4G in their
>> reverse unit order -- that is, with their base addresses progressing
>> downwards, in increasing unit order.
>>
>> (The unit number increases with command line order if not explicitly
>> specified.)
>>
>> This accommodates the following use case: suppose that OVMF is split in
>> two parts, a writeable host file for non-volatile variable storage, and a
>> read-only part for bootstrap and decompressible executable code.
>>
>> The binary code part would be read-only, centrally managed on the host
>> system, and passed in as unit 0. The variable store would be writeable,
>> VM-specific, and passed in as unit 1.
>>
>>   00000000ffe00000-00000000ffe1ffff (prio 0, R-): system.flash1
>>   00000000ffe20000-00000000ffffffff (prio 0, R-): system.flash0
>>
>> (If the guest tries to write to the flash range that is backed by the
>> read-only drive, bdrv_write() in pflash_update() will correctly deny the
>> write with -EACCES, and pflash_update() won't care, which suits us well.)
>>
>> Signed-off-by: Laszlo Ersek <lersek@redhat.com>
>> ---
>>  hw/i386/pc_sysfw.c | 60 ++++++++++++++++++++++++++++++++++++++++--------------
>>  1 file changed, 45 insertions(+), 15 deletions(-)
>>
>> diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c
>> index e917c83..1c3e3d6 100644
>> --- a/hw/i386/pc_sysfw.c
>> +++ b/hw/i386/pc_sysfw.c
>> @@ -72,35 +72,65 @@ static void pc_isa_bios_init(MemoryRegion *rom_memory,
>>      memory_region_set_readonly(isa_bios, true);
>>  }
>>  
>> +/* This function maps flash drives from 4G downward, in order of their unit
>> + * numbers. Addressing within one flash drive is of course not reversed.
>> + *
>> + * The drive with unit number 0 is mapped at the highest address, and it is
>> + * passed to pc_isa_bios_init(). Merging severral drives for isa-bios is not
>> + * supported.
>> + *
>> + * The caller is responsible to pass in the non-NULL @pflash_drv that
>> + * corresponds to the flash drive with unit number 0.
>> + */
>>  static void pc_system_flash_init(MemoryRegion *rom_memory,
>>                                   DriveInfo *pflash_drv)
>>  {
>> +    int unit = 0;
>>      BlockDriverState *bdrv;
>>      int64_t size;
>> -    hwaddr phys_addr;
>> +    hwaddr phys_addr = 0x100000000ULL;
>>      int sector_bits, sector_size;
>>      pflash_t *system_flash;
>>      MemoryRegion *flash_mem;
>> +    char name[64];
>>  
>> -    bdrv = pflash_drv->bdrv;
>> -    size = bdrv_getlength(pflash_drv->bdrv);
>>      sector_bits = 12;
>>      sector_size = 1 << sector_bits;
>>  
>> -    if ((size % sector_size) != 0) {
>> -        fprintf(stderr,
>> -                "qemu: PC system firmware (pflash) must be a multiple of 0x%x\n",
>> -                sector_size);
>> -        exit(1);
>> -    }
>> +    do {
>> +        bdrv = pflash_drv->bdrv;
>> +        size = bdrv_getlength(bdrv);
>> +        if ((size % sector_size) != 0) {
>> +            fprintf(stderr,
>> +                    "qemu: PC system firmware (pflash segment %d) must be a "
>> +                    "multiple of 0x%x\n", unit, sector_size);
>> +            exit(1);
>> +        }
>> +        if (size > phys_addr) {
>> +            fprintf(stderr, "qemu: pflash segments must fit under 4G "
>> +                    "cumulatively\n");
> 
> I suspect things go haywire long before you hit address zero.
> 
> Note that both pc_piix.c and pc_q35.c leave a hole in RAM just below 4G.
> The former's hole starts at 0xe0000000, the latter's at 0xb0000000.
> Should that be the limit?

I wanted to do the bare minimal here, to catch obviously wrong backing
drives (like a 10G file). This is already more verification than what
the current code does.

I wouldn't mind more a specific check, but I don't want to suggest (with
more specific code) that it's actually *safe* to go down to the limit
that I'd put here.

For example, the IO-APIC mmio range is [0xFEE00000..0xFEE01000[, leaving
free 18MB-4KB just below 4G. (Of which the current OVMF, including
variable store, takes up 2MB.) Grep IO_APIC_DEFAULT_ADDRESS.

I just don't want to send the message "it's safe to go all the way down
there". Right now the responsibility is with the user (you can specify a
single pflash device that's 20MB in size even now), and I'd like to
stick with that.

I believe that

  pflash_cfi01_register()
    sysbus_mmio_map()
      sysbus_mmio_map_common()
        memory_region_add_subregion()
          memory_region_add_subregion_common()

could, in theory, find a conflict at runtime (see the #if 0-surrounded
collision warning in memory_region_add_subregion_common()). But the
memory API doesn't consider such collisions hard errors, and no status
code is propagated to the caller.

So, if a saner / more reliable limit can be identified, I wouldn't mind
checking against that, but right now I know of no such sane / general
enough limit.

> 
>> +            exit(1);
>> +        }
>>  
>> -    phys_addr = 0x100000000ULL - size;
>> -    system_flash = pflash_cfi01_register(phys_addr, NULL, "system.flash", size,
>> -                                         bdrv, sector_size, size >> sector_bits,
>> -                                         1, 0x0000, 0x0000, 0x0000, 0x0000, 0);
>> -    flash_mem = pflash_cfi01_get_memory(system_flash);
>> +        phys_addr -= size;
>>  
>> -    pc_isa_bios_init(rom_memory, flash_mem, size);
>> +        /* pflash_cfi01_register() creates a deep copy of the name */
>> +        snprintf(name, sizeof name, "system.flash%d", unit);
>> +        system_flash = pflash_cfi01_register(phys_addr, NULL /* qdev */, name,
>> +                                             size, bdrv, sector_size,
>> +                                             size >> sector_bits,
>> +                                             1      /* width */,
>> +                                             0x0000 /* id0 */,
>> +                                             0x0000 /* id1 */,
>> +                                             0x0000 /* id2 */,
>> +                                             0x0000 /* id3 */,
>> +                                             0      /* be */);
>> +        if (unit == 0) {
>> +            flash_mem = pflash_cfi01_get_memory(system_flash);
>> +            pc_isa_bios_init(rom_memory, flash_mem, size);
>> +        }
>> +        pflash_drv = drive_get(IF_PFLASH, 0, ++unit);
>> +    } while (pflash_drv != NULL);
>>  }
>>  
>>  static void old_pc_system_rom_init(MemoryRegion *rom_memory, bool isapc_ram_fw)
> 
> The drive with index 0 is passed as parameter pflash_drv.  The others
> the function gets itself.  I find that ugly.  Have you considered
> dropping the parameter?

I didn't like drive_get() to begin with. It always starts to scan the
drive list from the beginning, which makes the new loop in
pc_system_flash_init() O(n^2).

I was missing an interator-style interface for the drives, but I found
none, and I thought that iterating myself through them in O(n) (and
checking for the types) would break the current DriveInfo encapsulation.
So I kinda gave up on "elegance".

Ideally, what should be dropped is the "unit" local variable in
pc_system_flash_init(). The function should continue to take
"pflash_drv", which should however qualify as a pre-initialized
iterator. Then pc_system_flash_init() should traverse it until it runs out.

I can of course remove the parameter and start a "while" loop
(rather than a "do" loop) with drive_get(IF_PFLASH, 0, 0), if you
consider that an improvement.

Thanks!
Laszlo
Markus Armbruster - Nov. 26, 2013, 12:36 p.m.
Laszlo Ersek <lersek@redhat.com> writes:

> On 11/25/13 16:22, Markus Armbruster wrote:
>> Laszlo Ersek <lersek@redhat.com> writes:
>> 
>>> On 11/22/13 13:21, Markus Armbruster wrote:
>>>> Laszlo Ersek <lersek@redhat.com> writes:
>>>>
>>>>> This patch allows the user to usefully specify
>>>>>
>>>>>   -drive file=img_1,if=pflash,format=raw,readonly \
>>>>>   -drive file=img_2,if=pflash,format=raw
>>>>>
>>>>> on the command line. The flash images will be mapped under 4G in their
>>>>> reverse unit order -- that is, with their base addresses progressing
>>>>> downwards, in increasing unit order.
>>>>>
>>>>> (The unit number increases with command line order if not explicitly
>>>>> specified.)
>>>>>
>>>>> This accommodates the following use case: suppose that OVMF is split in
>>>>> two parts, a writeable host file for non-volatile variable storage, and a
>>>>> read-only part for bootstrap and decompressible executable code.
>>>>>
>>>>> The binary code part would be read-only, centrally managed on the host
>>>>> system, and passed in as unit 0. The variable store would be writeable,
>>>>> VM-specific, and passed in as unit 1.
>>>>>
>>>>>   00000000ffe00000-00000000ffe1ffff (prio 0, R-): system.flash1
>>>>>   00000000ffe20000-00000000ffffffff (prio 0, R-): system.flash0
>>>>>
>>>>> (If the guest tries to write to the flash range that is backed by the
>>>>> read-only drive, bdrv_write() in pflash_update() will correctly deny the
>>>>> write with -EACCES, and pflash_update() won't care, which suits us well.)
>>>>
>>>> Before this patch:
>>>>
>>>> You can define as many if=pflash drives as you want.  Any with non-zero
>>>> index are silently ignored.
>>>>
>>>> If you don't specify one with index=0, you get a ROM at the top of the
>>>> 32 bit address space, contents taken from -bios (default "bios.bin").
>>>> Up to its last 128KiB are aliased at the top of the ISA address space.
>>>>
>>>> If you do specify one with index=0, you get a pflash device at the top
>>>> of the 32 bit address space, with contents from the drive, and -bios is
>>>> silently ignored.  Up to its last 128KiB are copied to a ROM at the top
>>>> of the (20 bit) ISA address space.
>>>>
>>>> After this patch (please correct misunderstandings):
>>>>
>>>> Now the drives after the first unused index are silently ignored.
>>>>
>>>> If you don't specify one with index=0, no change.
>>>>
>>>> If you do, you now get N pflash devices, where N is the first unused
>>>> index.  Each pflash's contents is taken from the respective drive.  The
>>>> flashes are mapped at the top of the 32 bit address space in reverse
>>>> index order.  -bios is silently ignored, as before.  Up to the last
>>>> 128KiB of the index=0 flash are copied to a ROM at the top of the ISA
>>>> address space.
>>>>
>>>> Thus, no change for index=0.  For index=1..N, we now get additional
>>>> flash devices.
>>>>
>>>> Correct?
>>>
>>> Yes.
>> 
>> Thanks.
>> 
>> 1. Multiple pflash devices
>> 
>> Is there any way for a guest to see the N separate pflash devices, or do
>> they look exactly like a single pflash device?
>
> The interpretation of multiple -pflash options is board specific. I
> grepped the source for IF_PFLASH first, and found some boards that use
> more than one flash device. Merging them in one contiguous target-phys
> address range would be i386 specific.
>
> This doesn't break compatibility (because multiple pflash devices used
> not to be supported for i386 targets at all), but I agree that this
> would posit their interpretation for the future, and thus it might need
> deeper thought.
>
>>From looking at "hw/block/pflash_cfi01.c", it seems that the guest can
> interrogate the flash device about its size (nb_blocs is stored in
> cfi_table starting at 0x2D, and cfi_table can be queried by command 0x98
> in pflash_read()). So, if the guest cares, it can figure out that there
> are multiple devices backing the range. I think.

Your stated purpose for multiple -pflash:

    This accommodates the following use case: suppose that OVMF is split in
    two parts, a writeable host file for non-volatile variable storage, and a
    read-only part for bootstrap and decompressible executable code.

Such a split between writable part and read-only part makes sense to me.
How is it done in physical hardware?  Single device with configurable
write-protect, or two separate devices?

>> 2. More board code device configuration magic
>> 
>> Our long term goal is to configure machines by configuration files
>> (i.e. data) rather than code.
>> 
>> Your patch extends the PC board code dealing with if=pflash for multiple
>> drives.
>> 
>> Reminder: -drive if=X (where X != none) creates a backend, and leaves it
>> in a place where board code can find it.  Board code may elect to create
>> a frontend using that backend.
>
> Yes, I'm aware. I did think about this -- eg. just create a drive with
> if=none, and add a frontend with -device something. But, the frontend
> here is not a device (a "peripheral"), it's a memory region with special
> mmio ops.
>
> Pflash frontends can apparently be created with "-device cfi.pflash01,...':
>
> cfi.pflash01.drive=drive
> cfi.pflash01.num-blocks=uint32
> cfi.pflash01.sector-length=uint64
> cfi.pflash01.width=uint8
> cfi.pflash01.big-endian=uint8
> cfi.pflash01.id0=uint16
> cfi.pflash01.id1=uint16
> cfi.pflash01.id2=uint16
> cfi.pflash01.id3=uint16
> cfi.pflash01.name=string
>
> We can even point it to the backing drive as well. But these properties
> don't seem to allow for the i386 specific "processing", eg. where to map
> the device in target-phys address space.

It's a sysbus device, and these still don't work with -device.  My
pending "[PATCH v3 00/10] Clean up and fix no_user" makes them
unavailable with -device.

>> It's desirable that any new board code creating a frontend for -drive
>> if=X,... is sufficiently simple so that users can get the same result
>> with some -drive if=none,... -device ... incantation.  The second form
>> provides full control over device properties.  See section "Block
>> Devices" in docs/qdev-device-use.txt for examples of such mappings.
>> 
>> This is the case for if=ide, if=scsi, if=floppy and if=virtio (see
>> docs/qdev-device-use.txt).  It's not yet the case for if=pflash, because
>> the qdev used with it (cfi.pflash01) is a sysbus device, and these
>> aren't available with -device, yet.

Actually, s/aren't available/don't work/ without my pending series just
mentioned.

> Thanks, I didn't know that that was in the background.
>
>> When that gets fixed, I'd expect support for putting the flash device at
>> a specific address.  An equivalent if=none incantation (free of board
>> code magic) should be possible.
>> 
>> Thus, the board magic your patch adds should be of the harmless kind.
>> 
>
> Thanks. I think I managed to follow your train of thought, I just wasn't
> sure where you'd end up. I think, as you say, once sysbus devices can be
> specified with -device, cfi.pflash01 could take an address, and if
> that's omitted, the default for i386 could be "map it just below the
> previous one".

Yes, except I wouldn't expect such fancy defaults.

> Thanks for looking into this, you doubtlessly know way more about the
> device model than I do.

No problem, I just wanted to figure out whether we're creating even more
legacy -drive headaches here, so why not share my reasoning.
Markus Armbruster - Nov. 26, 2013, 12:53 p.m.
Laszlo Ersek <lersek@redhat.com> writes:

> On 11/22/13 13:21, Markus Armbruster wrote:
>> Laszlo Ersek <lersek@redhat.com> writes:
>> 
>>> This patch allows the user to usefully specify
>>>
>>>   -drive file=img_1,if=pflash,format=raw,readonly \
>>>   -drive file=img_2,if=pflash,format=raw
>>>
>>> on the command line. The flash images will be mapped under 4G in their
>>> reverse unit order -- that is, with their base addresses progressing
>>> downwards, in increasing unit order.
>>>
>>> (The unit number increases with command line order if not explicitly
>>> specified.)
>>>
>>> This accommodates the following use case: suppose that OVMF is split in
>>> two parts, a writeable host file for non-volatile variable storage, and a
>>> read-only part for bootstrap and decompressible executable code.
>>>
>>> The binary code part would be read-only, centrally managed on the host
>>> system, and passed in as unit 0. The variable store would be writeable,
>>> VM-specific, and passed in as unit 1.
>>>
>>>   00000000ffe00000-00000000ffe1ffff (prio 0, R-): system.flash1
>>>   00000000ffe20000-00000000ffffffff (prio 0, R-): system.flash0
>>>
>>> (If the guest tries to write to the flash range that is backed by the
>>> read-only drive, bdrv_write() in pflash_update() will correctly deny the
>>> write with -EACCES, and pflash_update() won't care, which suits us well.)
>> 
>> Before this patch:
>> 
>> You can define as many if=pflash drives as you want.  Any with non-zero
>> index are silently ignored.
>> 
>> If you don't specify one with index=0, you get a ROM at the top of the
>> 32 bit address space, contents taken from -bios (default "bios.bin").
>> Up to its last 128KiB are aliased at the top of the ISA address space.
>> 
>> If you do specify one with index=0, you get a pflash device at the top
>> of the 32 bit address space, with contents from the drive, and -bios is
>> silently ignored.  Up to its last 128KiB are copied to a ROM at the top
>> of the (20 bit) ISA address space.
>> 
>> After this patch (please correct misunderstandings):
>> 
>> Now the drives after the first unused index are silently ignored.
>> 
>> If you don't specify one with index=0, no change.
>> 
>> If you do, you now get N pflash devices, where N is the first unused
>> index.  Each pflash's contents is taken from the respective drive.  The
>> flashes are mapped at the top of the 32 bit address space in reverse
>> index order.  -bios is silently ignored, as before.  Up to the last
>> 128KiB of the index=0 flash are copied to a ROM at the top of the ISA
>> address space.
>> 
>> Thus, no change for index=0.  For index=1..N, we now get additional
>> flash devices.
>> 
>> Correct?
>
> Yes.

Thus, we grab *all* if=pflash drives for this purpose.

Your stated use case wants just two.

Hmm.  Are we sure we'll never want to map an if=pflash device somewhere
else?

I'm asking because such ABI things are a pain to change later on...
Markus Armbruster - Nov. 26, 2013, 1:11 p.m.
Laszlo Ersek <lersek@redhat.com> writes:

> On 11/25/13 16:32, Markus Armbruster wrote:
>> Laszlo Ersek <lersek@redhat.com> writes:
>> 
>>> This patch allows the user to usefully specify
>>>
>>>   -drive file=img_1,if=pflash,format=raw,readonly \
>>>   -drive file=img_2,if=pflash,format=raw
>>>
>>> on the command line. The flash images will be mapped under 4G in their
>>> reverse unit order -- that is, with their base addresses progressing
>>> downwards, in increasing unit order.
>>>
>>> (The unit number increases with command line order if not explicitly
>>> specified.)
>>>
>>> This accommodates the following use case: suppose that OVMF is split in
>>> two parts, a writeable host file for non-volatile variable storage, and a
>>> read-only part for bootstrap and decompressible executable code.
>>>
>>> The binary code part would be read-only, centrally managed on the host
>>> system, and passed in as unit 0. The variable store would be writeable,
>>> VM-specific, and passed in as unit 1.
>>>
>>>   00000000ffe00000-00000000ffe1ffff (prio 0, R-): system.flash1
>>>   00000000ffe20000-00000000ffffffff (prio 0, R-): system.flash0
>>>
>>> (If the guest tries to write to the flash range that is backed by the
>>> read-only drive, bdrv_write() in pflash_update() will correctly deny the
>>> write with -EACCES, and pflash_update() won't care, which suits us well.)
>>>
>>> Signed-off-by: Laszlo Ersek <lersek@redhat.com>
>>> ---
>>>  hw/i386/pc_sysfw.c | 60 ++++++++++++++++++++++++++++++++++++++++--------------
>>>  1 file changed, 45 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c
>>> index e917c83..1c3e3d6 100644
>>> --- a/hw/i386/pc_sysfw.c
>>> +++ b/hw/i386/pc_sysfw.c
>>> @@ -72,35 +72,65 @@ static void pc_isa_bios_init(MemoryRegion *rom_memory,
>>>      memory_region_set_readonly(isa_bios, true);
>>>  }
>>>  
>>> +/* This function maps flash drives from 4G downward, in order of their unit
>>> + * numbers. Addressing within one flash drive is of course not reversed.
>>> + *
>>> + * The drive with unit number 0 is mapped at the highest address, and it is
>>> + * passed to pc_isa_bios_init(). Merging severral drives for isa-bios is not
>>> + * supported.
>>> + *
>>> + * The caller is responsible to pass in the non-NULL @pflash_drv that
>>> + * corresponds to the flash drive with unit number 0.
>>> + */
>>>  static void pc_system_flash_init(MemoryRegion *rom_memory,
>>>                                   DriveInfo *pflash_drv)
>>>  {
>>> +    int unit = 0;
>>>      BlockDriverState *bdrv;
>>>      int64_t size;
>>> -    hwaddr phys_addr;
>>> +    hwaddr phys_addr = 0x100000000ULL;
>>>      int sector_bits, sector_size;
>>>      pflash_t *system_flash;
>>>      MemoryRegion *flash_mem;
>>> +    char name[64];
>>>  
>>> -    bdrv = pflash_drv->bdrv;
>>> -    size = bdrv_getlength(pflash_drv->bdrv);
>>>      sector_bits = 12;
>>>      sector_size = 1 << sector_bits;
>>>  
>>> -    if ((size % sector_size) != 0) {
>>> -        fprintf(stderr,
>>> -                "qemu: PC system firmware (pflash) must be a multiple of 0x%x\n",
>>> -                sector_size);
>>> -        exit(1);
>>> -    }
>>> +    do {
>>> +        bdrv = pflash_drv->bdrv;
>>> +        size = bdrv_getlength(bdrv);
>>> +        if ((size % sector_size) != 0) {
>>> +            fprintf(stderr,
>>> +                    "qemu: PC system firmware (pflash segment %d) must be a "
>>> +                    "multiple of 0x%x\n", unit, sector_size);
>>> +            exit(1);
>>> +        }
>>> +        if (size > phys_addr) {
>>> +            fprintf(stderr, "qemu: pflash segments must fit under 4G "
>>> +                    "cumulatively\n");

You're just following existing bad practice here, but correct use of
error_report() would give you nicer messages.  Happy to explain if
you're interested.

>> I suspect things go haywire long before you hit address zero.
>> 
>> Note that both pc_piix.c and pc_q35.c leave a hole in RAM just below 4G.
>> The former's hole starts at 0xe0000000, the latter's at 0xb0000000.
>> Should that be the limit?
>
> I wanted to do the bare minimal here, to catch obviously wrong backing
> drives (like a 10G file). This is already more verification than what
> the current code does.
>
> I wouldn't mind more a specific check, but I don't want to suggest (with
> more specific code) that it's actually *safe* to go down to the limit
> that I'd put here.
>
> For example, the IO-APIC mmio range is [0xFEE00000..0xFEE01000[, leaving
> free 18MB-4KB just below 4G. (Of which the current OVMF, including
> variable store, takes up 2MB.) Grep IO_APIC_DEFAULT_ADDRESS.
>
> I just don't want to send the message "it's safe to go all the way down
> there". Right now the responsibility is with the user (you can specify a
> single pflash device that's 20MB in size even now), and I'd like to
> stick with that.
>
> I believe that
>
>   pflash_cfi01_register()
>     sysbus_mmio_map()
>       sysbus_mmio_map_common()
>         memory_region_add_subregion()
>           memory_region_add_subregion_common()
>
> could, in theory, find a conflict at runtime (see the #if 0-surrounded
> collision warning in memory_region_add_subregion_common()). But the
> memory API doesn't consider such collisions hard errors, and no status
> code is propagated to the caller.
>
> So, if a saner / more reliable limit can be identified, I wouldn't mind
> checking against that, but right now I know of no such sane / general
> enough limit.

If the caller knows the true limit, it could pass it.

If the true limit isn't practical to find, then what about a comment
explaining the problem?

>>> +            exit(1);
>>> +        }
>>>  
>>> -    phys_addr = 0x100000000ULL - size;
>>> - system_flash = pflash_cfi01_register(phys_addr, NULL,
>>> "system.flash", size,
>>> - bdrv, sector_size, size >> sector_bits,
>>> - 1, 0x0000, 0x0000, 0x0000, 0x0000, 0);
>>> -    flash_mem = pflash_cfi01_get_memory(system_flash);
>>> +        phys_addr -= size;
>>>  
>>> -    pc_isa_bios_init(rom_memory, flash_mem, size);
>>> +        /* pflash_cfi01_register() creates a deep copy of the name */
>>> +        snprintf(name, sizeof name, "system.flash%d", unit);
>>> + system_flash = pflash_cfi01_register(phys_addr, NULL /* qdev */,
>>> name,
>>> +                                             size, bdrv, sector_size,
>>> +                                             size >> sector_bits,
>>> +                                             1      /* width */,
>>> +                                             0x0000 /* id0 */,
>>> +                                             0x0000 /* id1 */,
>>> +                                             0x0000 /* id2 */,
>>> +                                             0x0000 /* id3 */,
>>> +                                             0      /* be */);
>>> +        if (unit == 0) {
>>> +            flash_mem = pflash_cfi01_get_memory(system_flash);
>>> +            pc_isa_bios_init(rom_memory, flash_mem, size);
>>> +        }
>>> +        pflash_drv = drive_get(IF_PFLASH, 0, ++unit);
>>> +    } while (pflash_drv != NULL);
>>>  }
>>>  
>>>  static void old_pc_system_rom_init(MemoryRegion *rom_memory, bool
>>> isapc_ram_fw)
>> 
>> The drive with index 0 is passed as parameter pflash_drv.  The others
>> the function gets itself.  I find that ugly.  Have you considered
>> dropping the parameter?
>
> I didn't like drive_get() to begin with. It always starts to scan the
> drive list from the beginning, which makes the new loop in
> pc_system_flash_init() O(n^2).

Yes, it's pretty stupid, but n is small, and the code runs only during
initialization.

> I was missing an interator-style interface for the drives, but I found
> none, and I thought that iterating myself through them in O(n) (and
> checking for the types) would break the current DriveInfo encapsulation.
> So I kinda gave up on "elegance".

Legacy drives and elegance are about as attracted to each other as oil
and water.

> Ideally, what should be dropped is the "unit" local variable in
> pc_system_flash_init(). The function should continue to take
> "pflash_drv", which should however qualify as a pre-initialized
> iterator. Then pc_system_flash_init() should traverse it until it runs out.

Yes, that would work, but requires a bit of new blockdev infrastructure.

> I can of course remove the parameter and start a "while" loop
> (rather than a "do" loop) with drive_get(IF_PFLASH, 0, 0), if you
> consider that an improvement.

I'm partial to for-loops that let me see the complete loop control in
one glance:

    for (unit = 0; drv = drive_get(IF_PFLASH, 0, unit); unit++)
Laszlo Ersek - Nov. 26, 2013, 1:27 p.m.
On 11/26/13 13:53, Markus Armbruster wrote:

> Thus, we grab *all* if=pflash drives for this purpose.
> 
> Your stated use case wants just two.
> 
> Hmm.  Are we sure we'll never want to map an if=pflash device somewhere
> else?

No, I'm not sure.

Thanks
Laszlo
Laszlo Ersek - Nov. 26, 2013, 1:32 p.m.
On 11/26/13 13:36, Markus Armbruster wrote:

> Your stated purpose for multiple -pflash:
> 
>     This accommodates the following use case: suppose that OVMF is split in
>     two parts, a writeable host file for non-volatile variable storage, and a
>     read-only part for bootstrap and decompressible executable code.
> 
> Such a split between writable part and read-only part makes sense to me.
> How is it done in physical hardware?  Single device with configurable
> write-protect, or two separate devices?

(Jordan could help more.)

Likely one device that's fully writeable.

The flash driver (through which the NvVar updates go) makes sure that a
kind of journal is written and that the live variable store is not
corrupted even if power is cut during an update.

However, if something writes to the flash without going through the
driver, it can brick the board. (Trample over the bootstrap code for
example.) I think.

Laszlo
Laszlo Ersek - Nov. 26, 2013, 1:39 p.m.
On 11/26/13 14:11, Markus Armbruster wrote:
> Laszlo Ersek <lersek@redhat.com> writes:
> 
>> On 11/25/13 16:32, Markus Armbruster wrote:
>>> Laszlo Ersek <lersek@redhat.com> writes:
>>>
>>>> This patch allows the user to usefully specify
>>>>
>>>>   -drive file=img_1,if=pflash,format=raw,readonly \
>>>>   -drive file=img_2,if=pflash,format=raw
>>>>
>>>> on the command line. The flash images will be mapped under 4G in their
>>>> reverse unit order -- that is, with their base addresses progressing
>>>> downwards, in increasing unit order.
>>>>
>>>> (The unit number increases with command line order if not explicitly
>>>> specified.)
>>>>
>>>> This accommodates the following use case: suppose that OVMF is split in
>>>> two parts, a writeable host file for non-volatile variable storage, and a
>>>> read-only part for bootstrap and decompressible executable code.
>>>>
>>>> The binary code part would be read-only, centrally managed on the host
>>>> system, and passed in as unit 0. The variable store would be writeable,
>>>> VM-specific, and passed in as unit 1.
>>>>
>>>>   00000000ffe00000-00000000ffe1ffff (prio 0, R-): system.flash1
>>>>   00000000ffe20000-00000000ffffffff (prio 0, R-): system.flash0
>>>>
>>>> (If the guest tries to write to the flash range that is backed by the
>>>> read-only drive, bdrv_write() in pflash_update() will correctly deny the
>>>> write with -EACCES, and pflash_update() won't care, which suits us well.)
>>>>
>>>> Signed-off-by: Laszlo Ersek <lersek@redhat.com>
>>>> ---
>>>>  hw/i386/pc_sysfw.c | 60 ++++++++++++++++++++++++++++++++++++++++--------------
>>>>  1 file changed, 45 insertions(+), 15 deletions(-)
>>>>
>>>> diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c
>>>> index e917c83..1c3e3d6 100644
>>>> --- a/hw/i386/pc_sysfw.c
>>>> +++ b/hw/i386/pc_sysfw.c
>>>> @@ -72,35 +72,65 @@ static void pc_isa_bios_init(MemoryRegion *rom_memory,
>>>>      memory_region_set_readonly(isa_bios, true);
>>>>  }
>>>>  
>>>> +/* This function maps flash drives from 4G downward, in order of their unit
>>>> + * numbers. Addressing within one flash drive is of course not reversed.
>>>> + *
>>>> + * The drive with unit number 0 is mapped at the highest address, and it is
>>>> + * passed to pc_isa_bios_init(). Merging severral drives for isa-bios is not
>>>> + * supported.
>>>> + *
>>>> + * The caller is responsible to pass in the non-NULL @pflash_drv that
>>>> + * corresponds to the flash drive with unit number 0.
>>>> + */
>>>>  static void pc_system_flash_init(MemoryRegion *rom_memory,
>>>>                                   DriveInfo *pflash_drv)
>>>>  {
>>>> +    int unit = 0;
>>>>      BlockDriverState *bdrv;
>>>>      int64_t size;
>>>> -    hwaddr phys_addr;
>>>> +    hwaddr phys_addr = 0x100000000ULL;
>>>>      int sector_bits, sector_size;
>>>>      pflash_t *system_flash;
>>>>      MemoryRegion *flash_mem;
>>>> +    char name[64];
>>>>  
>>>> -    bdrv = pflash_drv->bdrv;
>>>> -    size = bdrv_getlength(pflash_drv->bdrv);
>>>>      sector_bits = 12;
>>>>      sector_size = 1 << sector_bits;
>>>>  
>>>> -    if ((size % sector_size) != 0) {
>>>> -        fprintf(stderr,
>>>> -                "qemu: PC system firmware (pflash) must be a multiple of 0x%x\n",
>>>> -                sector_size);
>>>> -        exit(1);
>>>> -    }
>>>> +    do {
>>>> +        bdrv = pflash_drv->bdrv;
>>>> +        size = bdrv_getlength(bdrv);
>>>> +        if ((size % sector_size) != 0) {
>>>> +            fprintf(stderr,
>>>> +                    "qemu: PC system firmware (pflash segment %d) must be a "
>>>> +                    "multiple of 0x%x\n", unit, sector_size);
>>>> +            exit(1);
>>>> +        }
>>>> +        if (size > phys_addr) {
>>>> +            fprintf(stderr, "qemu: pflash segments must fit under 4G "
>>>> +                    "cumulatively\n");
> 
> You're just following existing bad practice here, but correct use of
> error_report() would give you nicer messages.  Happy to explain if
> you're interested.

You have the Location thing in mind, eg. automatically binding the error
message to the offending command line option, right?

I've seen you use it before in another patch. Hm... It was commit
ec2df8c10a4585ba4641ae482cf2f5f13daa810e.

I will try to address the rest of your comments too when I get back to
this patch.

Thanks
Laszlo

> 
>>> I suspect things go haywire long before you hit address zero.
>>>
>>> Note that both pc_piix.c and pc_q35.c leave a hole in RAM just below 4G.
>>> The former's hole starts at 0xe0000000, the latter's at 0xb0000000.
>>> Should that be the limit?
>>
>> I wanted to do the bare minimal here, to catch obviously wrong backing
>> drives (like a 10G file). This is already more verification than what
>> the current code does.
>>
>> I wouldn't mind more a specific check, but I don't want to suggest (with
>> more specific code) that it's actually *safe* to go down to the limit
>> that I'd put here.
>>
>> For example, the IO-APIC mmio range is [0xFEE00000..0xFEE01000[, leaving
>> free 18MB-4KB just below 4G. (Of which the current OVMF, including
>> variable store, takes up 2MB.) Grep IO_APIC_DEFAULT_ADDRESS.
>>
>> I just don't want to send the message "it's safe to go all the way down
>> there". Right now the responsibility is with the user (you can specify a
>> single pflash device that's 20MB in size even now), and I'd like to
>> stick with that.
>>
>> I believe that
>>
>>   pflash_cfi01_register()
>>     sysbus_mmio_map()
>>       sysbus_mmio_map_common()
>>         memory_region_add_subregion()
>>           memory_region_add_subregion_common()
>>
>> could, in theory, find a conflict at runtime (see the #if 0-surrounded
>> collision warning in memory_region_add_subregion_common()). But the
>> memory API doesn't consider such collisions hard errors, and no status
>> code is propagated to the caller.
>>
>> So, if a saner / more reliable limit can be identified, I wouldn't mind
>> checking against that, but right now I know of no such sane / general
>> enough limit.
> 
> If the caller knows the true limit, it could pass it.
> 
> If the true limit isn't practical to find, then what about a comment
> explaining the problem?
> 
>>>> +            exit(1);
>>>> +        }
>>>>  
>>>> -    phys_addr = 0x100000000ULL - size;
>>>> - system_flash = pflash_cfi01_register(phys_addr, NULL,
>>>> "system.flash", size,
>>>> - bdrv, sector_size, size >> sector_bits,
>>>> - 1, 0x0000, 0x0000, 0x0000, 0x0000, 0);
>>>> -    flash_mem = pflash_cfi01_get_memory(system_flash);
>>>> +        phys_addr -= size;
>>>>  
>>>> -    pc_isa_bios_init(rom_memory, flash_mem, size);
>>>> +        /* pflash_cfi01_register() creates a deep copy of the name */
>>>> +        snprintf(name, sizeof name, "system.flash%d", unit);
>>>> + system_flash = pflash_cfi01_register(phys_addr, NULL /* qdev */,
>>>> name,
>>>> +                                             size, bdrv, sector_size,
>>>> +                                             size >> sector_bits,
>>>> +                                             1      /* width */,
>>>> +                                             0x0000 /* id0 */,
>>>> +                                             0x0000 /* id1 */,
>>>> +                                             0x0000 /* id2 */,
>>>> +                                             0x0000 /* id3 */,
>>>> +                                             0      /* be */);
>>>> +        if (unit == 0) {
>>>> +            flash_mem = pflash_cfi01_get_memory(system_flash);
>>>> +            pc_isa_bios_init(rom_memory, flash_mem, size);
>>>> +        }
>>>> +        pflash_drv = drive_get(IF_PFLASH, 0, ++unit);
>>>> +    } while (pflash_drv != NULL);
>>>>  }
>>>>  
>>>>  static void old_pc_system_rom_init(MemoryRegion *rom_memory, bool
>>>> isapc_ram_fw)
>>>
>>> The drive with index 0 is passed as parameter pflash_drv.  The others
>>> the function gets itself.  I find that ugly.  Have you considered
>>> dropping the parameter?
>>
>> I didn't like drive_get() to begin with. It always starts to scan the
>> drive list from the beginning, which makes the new loop in
>> pc_system_flash_init() O(n^2).
> 
> Yes, it's pretty stupid, but n is small, and the code runs only during
> initialization.
> 
>> I was missing an interator-style interface for the drives, but I found
>> none, and I thought that iterating myself through them in O(n) (and
>> checking for the types) would break the current DriveInfo encapsulation.
>> So I kinda gave up on "elegance".
> 
> Legacy drives and elegance are about as attracted to each other as oil
> and water.
> 
>> Ideally, what should be dropped is the "unit" local variable in
>> pc_system_flash_init(). The function should continue to take
>> "pflash_drv", which should however qualify as a pre-initialized
>> iterator. Then pc_system_flash_init() should traverse it until it runs out.
> 
> Yes, that would work, but requires a bit of new blockdev infrastructure.
> 
>> I can of course remove the parameter and start a "while" loop
>> (rather than a "do" loop) with drive_get(IF_PFLASH, 0, 0), if you
>> consider that an improvement.
> 
> I'm partial to for-loops that let me see the complete loop control in
> one glance:
> 
>     for (unit = 0; drv = drive_get(IF_PFLASH, 0, unit); unit++)
>
Paolo Bonzini - Nov. 26, 2013, 1:41 p.m.
Il 25/11/2013 20:45, Laszlo Ersek ha scritto:
> From looking at "hw/block/pflash_cfi01.c", it seems that the guest can
> interrogate the flash device about its size (nb_blocs is stored in
> cfi_table starting at 0x2D, and cfi_table can be queried by command 0x98
> in pflash_read()). So, if the guest cares, it can figure out that there
> are multiple devices backing the range. I think.

IIUC in the case of OVMF the guest "just knows" that there are multiple
devices backing the range.  Is that right?

But yes, I think that the guest can figure out that there are multiple
devices backing the range.  From reading the pflash code further:

* the pflash device doesn't care about the location where you write the
command (the exception is the "block erase" command)

* the pflash device only cares about the LSB you read from when you read
data from it

So you can use the last 256 bytes of the first flash (you know it ends
at 4GB) to check for the device (there's probably some suggested
protocol to do that, I don't know) and query its size.  Example with
"-qtest stdio":

writeb 0xffffff00 0x98
OK
readb 0xfffffff2d
OK 0x000000000000001f
readb 0xfffffff2e
OK 0x0000000000000000
readb 0xfffffff2f
OK 0x0000000000000010
readb 0xfffffff30
OK 0x0000000000000000
writeb 0xffffff00 0x98
OK

This means the device has 31+1 blocks each 4KB in size.  You can then
query the next device at 4GB-128KB-256, and so on.

Paolo
Laszlo Ersek - Nov. 26, 2013, 1:53 p.m.
On 11/26/13 14:41, Paolo Bonzini wrote:
> Il 25/11/2013 20:45, Laszlo Ersek ha scritto:
>> From looking at "hw/block/pflash_cfi01.c", it seems that the guest can
>> interrogate the flash device about its size (nb_blocs is stored in
>> cfi_table starting at 0x2D, and cfi_table can be queried by command 0x98
>> in pflash_read()). So, if the guest cares, it can figure out that there
>> are multiple devices backing the range. I think.
> 
> IIUC in the case of OVMF the guest "just knows" that there are multiple
> devices backing the range.  Is that right?

No, the guest (the flash driver) is unaware of that. It could know if it
wanted to, but it doesn't care. It cares about a base address and a
size, and that range is subdivided into various roles. But how many
flash devices back the range is not interesting for the driver.

Jordan wrote the driver with one flash device in mind, and when I split
the image in two parts, I took care to map them so that the base
address, the size, and those boundaries stay the same. It's sufficient
for the driver to continue working.

> But yes, I think that the guest can figure out that there are multiple
> devices backing the range.  From reading the pflash code further:
> 
> * the pflash device doesn't care about the location where you write the
> command (the exception is the "block erase" command)
> 
> * the pflash device only cares about the LSB you read from when you read
> data from it
> 
> So you can use the last 256 bytes of the first flash (you know it ends
> at 4GB) to check for the device (there's probably some suggested
> protocol to do that, I don't know) and query its size.  Example with
> "-qtest stdio":
> 
> writeb 0xffffff00 0x98
> OK
> readb 0xfffffff2d
> OK 0x000000000000001f
> readb 0xfffffff2e
> OK 0x0000000000000000
> readb 0xfffffff2f
> OK 0x0000000000000010
> readb 0xfffffff30
> OK 0x0000000000000000
> writeb 0xffffff00 0x98
> OK
> 
> This means the device has 31+1 blocks each 4KB in size.  You can then
> query the next device at 4GB-128KB-256, and so on.

Thanks for testing this in practice.
Laszlo
Paolo Bonzini - Nov. 26, 2013, 2:06 p.m.
Il 26/11/2013 14:53, Laszlo Ersek ha scritto:
>> > 
>> > IIUC in the case of OVMF the guest "just knows" that there are multiple
>> > devices backing the range.  Is that right?
> No, the guest (the flash driver) is unaware of that. It could know if it
> wanted to, but it doesn't care. It cares about a base address and a
> size, and that range is subdivided into various roles. But how many
> flash devices back the range is not interesting for the driver.
> 
> Jordan wrote the driver with one flash device in mind, and when I split
> the image in two parts, I took care to map them so that the base
> address, the size, and those boundaries stay the same. It's sufficient
> for the driver to continue working.

Ah, I see it now.  That's because the driver never needs to write an
offset into the flash device.  Offsets are communicated by writing to
different memory addresses.

Since the OVMF driver never queries the size of the device, it doesn't
care whether the flash is one or two.

Paolo
Markus Armbruster - Nov. 26, 2013, 3:35 p.m.
Laszlo Ersek <lersek@redhat.com> writes:

> On 11/26/13 14:11, Markus Armbruster wrote:
>> Laszlo Ersek <lersek@redhat.com> writes:
>> 
>>> On 11/25/13 16:32, Markus Armbruster wrote:
>>>> Laszlo Ersek <lersek@redhat.com> writes:
>>>>
>>>>> This patch allows the user to usefully specify
>>>>>
>>>>>   -drive file=img_1,if=pflash,format=raw,readonly \
>>>>>   -drive file=img_2,if=pflash,format=raw
>>>>>
>>>>> on the command line. The flash images will be mapped under 4G in their
>>>>> reverse unit order -- that is, with their base addresses progressing
>>>>> downwards, in increasing unit order.
>>>>>
>>>>> (The unit number increases with command line order if not explicitly
>>>>> specified.)
>>>>>
>>>>> This accommodates the following use case: suppose that OVMF is split in
>>>>> two parts, a writeable host file for non-volatile variable storage, and a
>>>>> read-only part for bootstrap and decompressible executable code.
>>>>>
>>>>> The binary code part would be read-only, centrally managed on the host
>>>>> system, and passed in as unit 0. The variable store would be writeable,
>>>>> VM-specific, and passed in as unit 1.
>>>>>
>>>>>   00000000ffe00000-00000000ffe1ffff (prio 0, R-): system.flash1
>>>>>   00000000ffe20000-00000000ffffffff (prio 0, R-): system.flash0
>>>>>
>>>>> (If the guest tries to write to the flash range that is backed by the
>>>>> read-only drive, bdrv_write() in pflash_update() will correctly deny the
>>>>> write with -EACCES, and pflash_update() won't care, which suits us well.)
>>>>>
>>>>> Signed-off-by: Laszlo Ersek <lersek@redhat.com>
>>>>> ---
>>>>>  hw/i386/pc_sysfw.c | 60 ++++++++++++++++++++++++++++++++++++++++--------------
>>>>>  1 file changed, 45 insertions(+), 15 deletions(-)
>>>>>
>>>>> diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c
>>>>> index e917c83..1c3e3d6 100644
>>>>> --- a/hw/i386/pc_sysfw.c
>>>>> +++ b/hw/i386/pc_sysfw.c
>>>>> @@ -72,35 +72,65 @@ static void pc_isa_bios_init(MemoryRegion *rom_memory,
>>>>>      memory_region_set_readonly(isa_bios, true);
>>>>>  }
>>>>>  
>>>>> +/* This function maps flash drives from 4G downward, in order of their unit
>>>>> + * numbers. Addressing within one flash drive is of course not reversed.
>>>>> + *
>>>>> + * The drive with unit number 0 is mapped at the highest address, and it is
>>>>> + * passed to pc_isa_bios_init(). Merging severral drives for isa-bios is not
>>>>> + * supported.
>>>>> + *
>>>>> + * The caller is responsible to pass in the non-NULL @pflash_drv that
>>>>> + * corresponds to the flash drive with unit number 0.
>>>>> + */
>>>>>  static void pc_system_flash_init(MemoryRegion *rom_memory,
>>>>>                                   DriveInfo *pflash_drv)
>>>>>  {
>>>>> +    int unit = 0;
>>>>>      BlockDriverState *bdrv;
>>>>>      int64_t size;
>>>>> -    hwaddr phys_addr;
>>>>> +    hwaddr phys_addr = 0x100000000ULL;
>>>>>      int sector_bits, sector_size;
>>>>>      pflash_t *system_flash;
>>>>>      MemoryRegion *flash_mem;
>>>>> +    char name[64];
>>>>>  
>>>>> -    bdrv = pflash_drv->bdrv;
>>>>> -    size = bdrv_getlength(pflash_drv->bdrv);
>>>>>      sector_bits = 12;
>>>>>      sector_size = 1 << sector_bits;
>>>>>  
>>>>> -    if ((size % sector_size) != 0) {
>>>>> -        fprintf(stderr,
>>>>> -                "qemu: PC system firmware (pflash) must be a multiple of 0x%x\n",
>>>>> -                sector_size);
>>>>> -        exit(1);
>>>>> -    }
>>>>> +    do {
>>>>> +        bdrv = pflash_drv->bdrv;
>>>>> +        size = bdrv_getlength(bdrv);
>>>>> +        if ((size % sector_size) != 0) {
>>>>> +            fprintf(stderr,
>>>>> +                    "qemu: PC system firmware (pflash segment %d) must be a "
>>>>> +                    "multiple of 0x%x\n", unit, sector_size);
>>>>> +            exit(1);
>>>>> +        }
>>>>> +        if (size > phys_addr) {
>>>>> +            fprintf(stderr, "qemu: pflash segments must fit under 4G "
>>>>> +                    "cumulatively\n");
>> 
>> You're just following existing bad practice here, but correct use of
>> error_report() would give you nicer messages.  Happy to explain if
>> you're interested.
>
> You have the Location thing in mind, eg. automatically binding the error
> message to the offending command line option, right?

Yes, including a location in the message is one benefit.  Getting the
right one outside the initial parse can take a bit of extra work.  I'm
happy to assist with it.

Other benefits: consistent message format, timestamping support, and
making the intent of the message more obvious to readers of the code.

> I've seen you use it before in another patch. Hm... It was commit
> ec2df8c10a4585ba4641ae482cf2f5f13daa810e.
>
> I will try to address the rest of your comments too when I get back to
> this patch.

Okay :)
Jordan Justen - Nov. 26, 2013, 5:54 p.m.
On Tue, Nov 26, 2013 at 5:32 AM, Laszlo Ersek <lersek@redhat.com> wrote:
> On 11/26/13 13:36, Markus Armbruster wrote:
>
>> Your stated purpose for multiple -pflash:
>>
>>     This accommodates the following use case: suppose that OVMF is split in
>>     two parts, a writeable host file for non-volatile variable storage, and a
>>     read-only part for bootstrap and decompressible executable code.
>>
>> Such a split between writable part and read-only part makes sense to me.
>> How is it done in physical hardware?  Single device with configurable
>> write-protect, or two separate devices?
>
> (Jordan could help more.)
>
> Likely one device that's fully writeable.

Most parts will have a dedicated read-only line.

Many devices have 'block-locking' that will make some subset of blocks
read-only until a reset.

In addition to this, many chipsets will allow flash writes to be
protected by triggering SMM when a flash write occurs.

Using multiple chips are less common due to cost, but this is not a
factor for QEMU. :)

-Jordan
Markus Armbruster - Nov. 27, 2013, 1:49 p.m.
Laszlo Ersek <lersek@redhat.com> writes:

> On 11/26/13 13:53, Markus Armbruster wrote:
>
>> Thus, we grab *all* if=pflash drives for this purpose.
>> 
>> Your stated use case wants just two.
>> 
>> Hmm.  Are we sure we'll never want to map an if=pflash device somewhere
>> else?
>
> No, I'm not sure.

Perhaps grabbing just the two we need for now would be more prudent.
Markus Armbruster - Nov. 27, 2013, 1:52 p.m.
Jordan Justen <jljusten@gmail.com> writes:

> On Tue, Nov 26, 2013 at 5:32 AM, Laszlo Ersek <lersek@redhat.com> wrote:
>> On 11/26/13 13:36, Markus Armbruster wrote:
>>
>>> Your stated purpose for multiple -pflash:
>>>
>>>     This accommodates the following use case: suppose that OVMF is split in
>>>     two parts, a writeable host file for non-volatile variable storage, and a
>>>     read-only part for bootstrap and decompressible executable code.
>>>
>>> Such a split between writable part and read-only part makes sense to me.
>>> How is it done in physical hardware?  Single device with configurable
>>> write-protect, or two separate devices?
>>
>> (Jordan could help more.)
>>
>> Likely one device that's fully writeable.
>
> Most parts will have a dedicated read-only line.
>
> Many devices have 'block-locking' that will make some subset of blocks
> read-only until a reset.
>
> In addition to this, many chipsets will allow flash writes to be
> protected by triggering SMM when a flash write occurs.
>
> Using multiple chips are less common due to cost, but this is not a
> factor for QEMU. :)

Should we stick to what real hardware does?  Single device, perhaps with
block locking.
Laszlo Ersek - Nov. 27, 2013, 2:01 p.m.
On 11/27/13 14:52, Markus Armbruster wrote:
> Jordan Justen <jljusten@gmail.com> writes:
> 
>> On Tue, Nov 26, 2013 at 5:32 AM, Laszlo Ersek <lersek@redhat.com> wrote:
>>> On 11/26/13 13:36, Markus Armbruster wrote:
>>>
>>>> Your stated purpose for multiple -pflash:
>>>>
>>>>     This accommodates the following use case: suppose that OVMF is split in
>>>>     two parts, a writeable host file for non-volatile variable storage, and a
>>>>     read-only part for bootstrap and decompressible executable code.
>>>>
>>>> Such a split between writable part and read-only part makes sense to me.
>>>> How is it done in physical hardware?  Single device with configurable
>>>> write-protect, or two separate devices?
>>>
>>> (Jordan could help more.)
>>>
>>> Likely one device that's fully writeable.
>>
>> Most parts will have a dedicated read-only line.
>>
>> Many devices have 'block-locking' that will make some subset of blocks
>> read-only until a reset.
>>
>> In addition to this, many chipsets will allow flash writes to be
>> protected by triggering SMM when a flash write occurs.
>>
>> Using multiple chips are less common due to cost, but this is not a
>> factor for QEMU. :)
> 
> Should we stick to what real hardware does?  Single device, perhaps with
> block locking.

I can't back a single flash device with two drives (= two host-side
files), which is the incentive for this change.

Thanks
Laszlo
Laszlo Ersek - Nov. 27, 2013, 2:01 p.m.
On 11/27/13 14:49, Markus Armbruster wrote:
> Laszlo Ersek <lersek@redhat.com> writes:
> 
>> On 11/26/13 13:53, Markus Armbruster wrote:
>>
>>> Thus, we grab *all* if=pflash drives for this purpose.
>>>
>>> Your stated use case wants just two.
>>>
>>> Hmm.  Are we sure we'll never want to map an if=pflash device somewhere
>>> else?
>>
>> No, I'm not sure.
> 
> Perhaps grabbing just the two we need for now would be more prudent.
> 

Agreed. Units 0 and 1.

Thanks!
Laszlo
Markus Armbruster - Nov. 27, 2013, 2:45 p.m.
Laszlo Ersek <lersek@redhat.com> writes:

> On 11/27/13 14:52, Markus Armbruster wrote:
>> Jordan Justen <jljusten@gmail.com> writes:
>> 
>>> On Tue, Nov 26, 2013 at 5:32 AM, Laszlo Ersek <lersek@redhat.com> wrote:
>>>> On 11/26/13 13:36, Markus Armbruster wrote:
>>>>
>>>>> Your stated purpose for multiple -pflash:
>>>>>
>>>>>     This accommodates the following use case: suppose that OVMF is split in
>>>>>     two parts, a writeable host file for non-volatile variable
>>>>> storage, and a
>>>>>     read-only part for bootstrap and decompressible executable code.
>>>>>
>>>>> Such a split between writable part and read-only part makes sense to me.
>>>>> How is it done in physical hardware?  Single device with configurable
>>>>> write-protect, or two separate devices?
>>>>
>>>> (Jordan could help more.)
>>>>
>>>> Likely one device that's fully writeable.
>>>
>>> Most parts will have a dedicated read-only line.
>>>
>>> Many devices have 'block-locking' that will make some subset of blocks
>>> read-only until a reset.
>>>
>>> In addition to this, many chipsets will allow flash writes to be
>>> protected by triggering SMM when a flash write occurs.
>>>
>>> Using multiple chips are less common due to cost, but this is not a
>>> factor for QEMU. :)
>> 
>> Should we stick to what real hardware does?  Single device, perhaps with
>> block locking.
>
> I can't back a single flash device with two drives (= two host-side
> files), which is the incentive for this change.

There's no fundamental reason why a single device model instance could
not be backed by two block backends, named by two drive properties.

I'm not claiming this is the best solution, just offering it for
consideration.
Laszlo Ersek - Nov. 27, 2013, 3:18 p.m.
On 11/27/13 15:45, Markus Armbruster wrote:
> Laszlo Ersek <lersek@redhat.com> writes:
> 
>> On 11/27/13 14:52, Markus Armbruster wrote:
>>> Jordan Justen <jljusten@gmail.com> writes:
>>>
>>>> On Tue, Nov 26, 2013 at 5:32 AM, Laszlo Ersek <lersek@redhat.com> wrote:
>>>>> On 11/26/13 13:36, Markus Armbruster wrote:
>>>>>
>>>>>> Your stated purpose for multiple -pflash:
>>>>>>
>>>>>>     This accommodates the following use case: suppose that OVMF is split in
>>>>>>     two parts, a writeable host file for non-volatile variable
>>>>>> storage, and a
>>>>>>     read-only part for bootstrap and decompressible executable code.
>>>>>>
>>>>>> Such a split between writable part and read-only part makes sense to me.
>>>>>> How is it done in physical hardware?  Single device with configurable
>>>>>> write-protect, or two separate devices?
>>>>>
>>>>> (Jordan could help more.)
>>>>>
>>>>> Likely one device that's fully writeable.
>>>>
>>>> Most parts will have a dedicated read-only line.
>>>>
>>>> Many devices have 'block-locking' that will make some subset of blocks
>>>> read-only until a reset.
>>>>
>>>> In addition to this, many chipsets will allow flash writes to be
>>>> protected by triggering SMM when a flash write occurs.
>>>>
>>>> Using multiple chips are less common due to cost, but this is not a
>>>> factor for QEMU. :)
>>>
>>> Should we stick to what real hardware does?  Single device, perhaps with
>>> block locking.
>>
>> I can't back a single flash device with two drives (= two host-side
>> files), which is the incentive for this change.
> 
> There's no fundamental reason why a single device model instance could
> not be backed by two block backends, named by two drive properties.
> 
> I'm not claiming this is the best solution, just offering it for
> consideration.

I'll pass :) I guess in theory we could push down the multi-drive thing
to "pflash_cfi01.c". But:
- It is used by a bunch of other boards.
- Regarding i386, the second drive could automatically become (dependent
on the first drive's size) part of the range that is mirrored in
isa-bios space. Probably not intended.
- The previous point strengthens the "pflash is used by other boards
too" concern: in case of i386, I'm at least halfway aware of the
board-specific consequences of sneaking in another drive, but I have no
clue about the other boards.
- If we decide at some point to map / paste backing drives in a loop,
then the (at that time) existent device properties of pflash, let's call
them "drive0" and "drive1", will look clumsy. We'll need a way to parse
an unknown number of backend (drive) IDs. A mess.

Thanks!
Laszlo
Markus Armbruster - Nov. 27, 2013, 5:22 p.m.
Laszlo Ersek <lersek@redhat.com> writes:

> On 11/27/13 15:45, Markus Armbruster wrote:
>> Laszlo Ersek <lersek@redhat.com> writes:
>> 
>>> On 11/27/13 14:52, Markus Armbruster wrote:
>>>> Jordan Justen <jljusten@gmail.com> writes:
>>>>
>>>>> On Tue, Nov 26, 2013 at 5:32 AM, Laszlo Ersek <lersek@redhat.com> wrote:
>>>>>> On 11/26/13 13:36, Markus Armbruster wrote:
>>>>>>
>>>>>>> Your stated purpose for multiple -pflash:
>>>>>>>
>>>>>>>     This accommodates the following use case: suppose that OVMF
>>>>>>> is split in
>>>>>>>     two parts, a writeable host file for non-volatile variable
>>>>>>> storage, and a
>>>>>>>     read-only part for bootstrap and decompressible executable code.
>>>>>>>
>>>>>>> Such a split between writable part and read-only part makes sense to me.
>>>>>>> How is it done in physical hardware?  Single device with configurable
>>>>>>> write-protect, or two separate devices?
>>>>>>
>>>>>> (Jordan could help more.)
>>>>>>
>>>>>> Likely one device that's fully writeable.
>>>>>
>>>>> Most parts will have a dedicated read-only line.
>>>>>
>>>>> Many devices have 'block-locking' that will make some subset of blocks
>>>>> read-only until a reset.
>>>>>
>>>>> In addition to this, many chipsets will allow flash writes to be
>>>>> protected by triggering SMM when a flash write occurs.
>>>>>
>>>>> Using multiple chips are less common due to cost, but this is not a
>>>>> factor for QEMU. :)
>>>>
>>>> Should we stick to what real hardware does?  Single device, perhaps with
>>>> block locking.
>>>
>>> I can't back a single flash device with two drives (= two host-side
>>> files), which is the incentive for this change.
>> 
>> There's no fundamental reason why a single device model instance could
>> not be backed by two block backends, named by two drive properties.
>> 
>> I'm not claiming this is the best solution, just offering it for
>> consideration.
>
> I'll pass :) I guess in theory we could push down the multi-drive thing
> to "pflash_cfi01.c". But:
> - It is used by a bunch of other boards.
> - Regarding i386, the second drive could automatically become (dependent
> on the first drive's size) part of the range that is mirrored in
> isa-bios space. Probably not intended.

I suspect real hardware mirrors the last 128KiB of its only flash chip
regardless of where the write-protected part begins.

> - The previous point strengthens the "pflash is used by other boards
> too" concern: in case of i386, I'm at least halfway aware of the
> board-specific consequences of sneaking in another drive, but I have no
> clue about the other boards.
> - If we decide at some point to map / paste backing drives in a loop,
> then the (at that time) existent device properties of pflash, let's call
> them "drive0" and "drive1", will look clumsy. We'll need a way to parse
> an unknown number of backend (drive) IDs. A mess.

Perhaps the proper way to back partially writable flash contents isn't
splitting it into two devices, but backing a single device with a COW.
The backing file has initial contents (say BIOS image), the delta may
have additional contents (say non-volatile configuration), and picks up
whatever the device model permits to be written to the flash.
Laszlo Ersek - Nov. 27, 2013, 5:34 p.m.
On 11/27/13 18:22, Markus Armbruster wrote:

> Perhaps the proper way to back partially writable flash contents isn't
> splitting it into two devices, but backing a single device with a COW.
> The backing file has initial contents (say BIOS image), the delta may
> have additional contents (say non-volatile configuration), and picks up
> whatever the device model permits to be written to the flash.

Kevin brought this up before. The problem is that this prevents you from
upgrading the read-only part in O(1).

(Unless of course you're willing to make the backing file raw *and* to
poke directly in it when you upgrade the bios binary part. Theoretically
you could do this, because the divide between the varstore and the
binary part at 128KB falls at a qcow2 block boundary (supposing a 64K
qcow2 block size), but in practice this is a horrible idea :))

Let's not veer off into architecting a new star destroyer. I think I
have plenty ammo from your great notes thus far that I can hack up a v2
with :)

Thanks
Laszlo
Markus Armbruster - Nov. 27, 2013, 8:35 p.m.
Laszlo Ersek <lersek@redhat.com> writes:

> On 11/27/13 18:22, Markus Armbruster wrote:
>
>> Perhaps the proper way to back partially writable flash contents isn't
>> splitting it into two devices, but backing a single device with a COW.
>> The backing file has initial contents (say BIOS image), the delta may
>> have additional contents (say non-volatile configuration), and picks up
>> whatever the device model permits to be written to the flash.
>
> Kevin brought this up before. The problem is that this prevents you from
> upgrading the read-only part in O(1).

Err, what's wrong with rebasing the COW to a new backing file?

Oh, I see you considered it:

> (Unless of course you're willing to make the backing file raw *and* to
> poke directly in it when you upgrade the bios binary part. Theoretically

No, don't update the backing file, *replace* it.

If the device model permitted a write to a cluster within the BIOS part,
the replacement won't be visible, so don't do that :)

> you could do this, because the divide between the varstore and the
> binary part at 128KB falls at a qcow2 block boundary (supposing a 64K
> qcow2 block size), but in practice this is a horrible idea :))

Yes, with COW the COW clustering apply to the division between BIOS part
and non-volatile configuration part.

> Let's not veer off into architecting a new star destroyer. I think I
> have plenty ammo from your great notes thus far that I can hack up a v2
> with :)

Hmmmm, pretty star destroyers...

Patch

diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c
index e917c83..1c3e3d6 100644
--- a/hw/i386/pc_sysfw.c
+++ b/hw/i386/pc_sysfw.c
@@ -72,35 +72,65 @@  static void pc_isa_bios_init(MemoryRegion *rom_memory,
     memory_region_set_readonly(isa_bios, true);
 }
 
+/* This function maps flash drives from 4G downward, in order of their unit
+ * numbers. Addressing within one flash drive is of course not reversed.
+ *
+ * The drive with unit number 0 is mapped at the highest address, and it is
+ * passed to pc_isa_bios_init(). Merging severral drives for isa-bios is not
+ * supported.
+ *
+ * The caller is responsible to pass in the non-NULL @pflash_drv that
+ * corresponds to the flash drive with unit number 0.
+ */
 static void pc_system_flash_init(MemoryRegion *rom_memory,
                                  DriveInfo *pflash_drv)
 {
+    int unit = 0;
     BlockDriverState *bdrv;
     int64_t size;
-    hwaddr phys_addr;
+    hwaddr phys_addr = 0x100000000ULL;
     int sector_bits, sector_size;
     pflash_t *system_flash;
     MemoryRegion *flash_mem;
+    char name[64];
 
-    bdrv = pflash_drv->bdrv;
-    size = bdrv_getlength(pflash_drv->bdrv);
     sector_bits = 12;
     sector_size = 1 << sector_bits;
 
-    if ((size % sector_size) != 0) {
-        fprintf(stderr,
-                "qemu: PC system firmware (pflash) must be a multiple of 0x%x\n",
-                sector_size);
-        exit(1);
-    }
+    do {
+        bdrv = pflash_drv->bdrv;
+        size = bdrv_getlength(bdrv);
+        if ((size % sector_size) != 0) {
+            fprintf(stderr,
+                    "qemu: PC system firmware (pflash segment %d) must be a "
+                    "multiple of 0x%x\n", unit, sector_size);
+            exit(1);
+        }
+        if (size > phys_addr) {
+            fprintf(stderr, "qemu: pflash segments must fit under 4G "
+                    "cumulatively\n");
+            exit(1);
+        }
 
-    phys_addr = 0x100000000ULL - size;
-    system_flash = pflash_cfi01_register(phys_addr, NULL, "system.flash", size,
-                                         bdrv, sector_size, size >> sector_bits,
-                                         1, 0x0000, 0x0000, 0x0000, 0x0000, 0);
-    flash_mem = pflash_cfi01_get_memory(system_flash);
+        phys_addr -= size;
 
-    pc_isa_bios_init(rom_memory, flash_mem, size);
+        /* pflash_cfi01_register() creates a deep copy of the name */
+        snprintf(name, sizeof name, "system.flash%d", unit);
+        system_flash = pflash_cfi01_register(phys_addr, NULL /* qdev */, name,
+                                             size, bdrv, sector_size,
+                                             size >> sector_bits,
+                                             1      /* width */,
+                                             0x0000 /* id0 */,
+                                             0x0000 /* id1 */,
+                                             0x0000 /* id2 */,
+                                             0x0000 /* id3 */,
+                                             0      /* be */);
+        if (unit == 0) {
+            flash_mem = pflash_cfi01_get_memory(system_flash);
+            pc_isa_bios_init(rom_memory, flash_mem, size);
+        }
+        pflash_drv = drive_get(IF_PFLASH, 0, ++unit);
+    } while (pflash_drv != NULL);
 }
 
 static void old_pc_system_rom_init(MemoryRegion *rom_memory, bool isapc_ram_fw)