Message ID | 1385072461-31317-1-git-send-email-lersek@redhat.com |
---|---|
State | New |
Headers | show |
On 11/21/2013 03:21 PM, Laszlo Ersek wrote: > This patch allows the user to usefully specify > > -drive file=img_1,if=pflash,format=raw,readonly \ > -drive file=img_2,if=pflash,format=raw > > on the command line. The flash images will be mapped under 4G in their > reverse unit order -- that is, with their base addresses progressing > downwards, in increasing unit order. > > +/* This function maps flash drives from 4G downward, in order of their unit > + * numbers. Addressing within one flash drive is of course not reversed. > + * > + * The drive with unit number 0 is mapped at the highest address, and it is > + * passed to pc_isa_bios_init(). Merging severral drives for isa-bios is not s/severral/several/
On 11/21/13 23:26, Eric Blake wrote: > On 11/21/2013 03:21 PM, Laszlo Ersek wrote: >> This patch allows the user to usefully specify >> >> -drive file=img_1,if=pflash,format=raw,readonly \ >> -drive file=img_2,if=pflash,format=raw >> >> on the command line. The flash images will be mapped under 4G in their >> reverse unit order -- that is, with their base addresses progressing >> downwards, in increasing unit order. >> > >> +/* This function maps flash drives from 4G downward, in order of their unit >> + * numbers. Addressing within one flash drive is of course not reversed. >> + * >> + * The drive with unit number 0 is mapped at the highest address, and it is >> + * passed to pc_isa_bios_init(). Merging severral drives for isa-bios is not > > s/severral/several/ > I may have meant "very severrrral" :) (Will fix if patch is deemed worthy otherwise.) Thanks! Laszlo
Laszlo Ersek <lersek@redhat.com> writes: > This patch allows the user to usefully specify > > -drive file=img_1,if=pflash,format=raw,readonly \ > -drive file=img_2,if=pflash,format=raw > > on the command line. The flash images will be mapped under 4G in their > reverse unit order -- that is, with their base addresses progressing > downwards, in increasing unit order. > > (The unit number increases with command line order if not explicitly > specified.) > > This accommodates the following use case: suppose that OVMF is split in > two parts, a writeable host file for non-volatile variable storage, and a > read-only part for bootstrap and decompressible executable code. > > The binary code part would be read-only, centrally managed on the host > system, and passed in as unit 0. The variable store would be writeable, > VM-specific, and passed in as unit 1. > > 00000000ffe00000-00000000ffe1ffff (prio 0, R-): system.flash1 > 00000000ffe20000-00000000ffffffff (prio 0, R-): system.flash0 > > (If the guest tries to write to the flash range that is backed by the > read-only drive, bdrv_write() in pflash_update() will correctly deny the > write with -EACCES, and pflash_update() won't care, which suits us well.) Before this patch: You can define as many if=pflash drives as you want. Any with non-zero index are silently ignored. If you don't specify one with index=0, you get a ROM at the top of the 32 bit address space, contents taken from -bios (default "bios.bin"). Up to its last 128KiB are aliased at the top of the ISA address space. If you do specify one with index=0, you get a pflash device at the top of the 32 bit address space, with contents from the drive, and -bios is silently ignored. Up to its last 128KiB are copied to a ROM at the top of the (20 bit) ISA address space. After this patch (please correct misunderstandings): Now the drives after the first unused index are silently ignored. If you don't specify one with index=0, no change. If you do, you now get N pflash devices, where N is the first unused index. Each pflash's contents is taken from the respective drive. The flashes are mapped at the top of the 32 bit address space in reverse index order. -bios is silently ignored, as before. Up to the last 128KiB of the index=0 flash are copied to a ROM at the top of the ISA address space. Thus, no change for index=0. For index=1..N, we now get additional flash devices. Correct?
On 11/22/13 13:21, Markus Armbruster wrote: > Laszlo Ersek <lersek@redhat.com> writes: > >> This patch allows the user to usefully specify >> >> -drive file=img_1,if=pflash,format=raw,readonly \ >> -drive file=img_2,if=pflash,format=raw >> >> on the command line. The flash images will be mapped under 4G in their >> reverse unit order -- that is, with their base addresses progressing >> downwards, in increasing unit order. >> >> (The unit number increases with command line order if not explicitly >> specified.) >> >> This accommodates the following use case: suppose that OVMF is split in >> two parts, a writeable host file for non-volatile variable storage, and a >> read-only part for bootstrap and decompressible executable code. >> >> The binary code part would be read-only, centrally managed on the host >> system, and passed in as unit 0. The variable store would be writeable, >> VM-specific, and passed in as unit 1. >> >> 00000000ffe00000-00000000ffe1ffff (prio 0, R-): system.flash1 >> 00000000ffe20000-00000000ffffffff (prio 0, R-): system.flash0 >> >> (If the guest tries to write to the flash range that is backed by the >> read-only drive, bdrv_write() in pflash_update() will correctly deny the >> write with -EACCES, and pflash_update() won't care, which suits us well.) > > Before this patch: > > You can define as many if=pflash drives as you want. Any with non-zero > index are silently ignored. > > If you don't specify one with index=0, you get a ROM at the top of the > 32 bit address space, contents taken from -bios (default "bios.bin"). > Up to its last 128KiB are aliased at the top of the ISA address space. > > If you do specify one with index=0, you get a pflash device at the top > of the 32 bit address space, with contents from the drive, and -bios is > silently ignored. Up to its last 128KiB are copied to a ROM at the top > of the (20 bit) ISA address space. > > After this patch (please correct misunderstandings): > > Now the drives after the first unused index are silently ignored. > > If you don't specify one with index=0, no change. > > If you do, you now get N pflash devices, where N is the first unused > index. Each pflash's contents is taken from the respective drive. The > flashes are mapped at the top of the 32 bit address space in reverse > index order. -bios is silently ignored, as before. Up to the last > 128KiB of the index=0 flash are copied to a ROM at the top of the ISA > address space. > > Thus, no change for index=0. For index=1..N, we now get additional > flash devices. > > Correct? Yes. Thanks Laszlo
Laszlo Ersek <lersek@redhat.com> writes: > On 11/22/13 13:21, Markus Armbruster wrote: >> Laszlo Ersek <lersek@redhat.com> writes: >> >>> This patch allows the user to usefully specify >>> >>> -drive file=img_1,if=pflash,format=raw,readonly \ >>> -drive file=img_2,if=pflash,format=raw >>> >>> on the command line. The flash images will be mapped under 4G in their >>> reverse unit order -- that is, with their base addresses progressing >>> downwards, in increasing unit order. >>> >>> (The unit number increases with command line order if not explicitly >>> specified.) >>> >>> This accommodates the following use case: suppose that OVMF is split in >>> two parts, a writeable host file for non-volatile variable storage, and a >>> read-only part for bootstrap and decompressible executable code. >>> >>> The binary code part would be read-only, centrally managed on the host >>> system, and passed in as unit 0. The variable store would be writeable, >>> VM-specific, and passed in as unit 1. >>> >>> 00000000ffe00000-00000000ffe1ffff (prio 0, R-): system.flash1 >>> 00000000ffe20000-00000000ffffffff (prio 0, R-): system.flash0 >>> >>> (If the guest tries to write to the flash range that is backed by the >>> read-only drive, bdrv_write() in pflash_update() will correctly deny the >>> write with -EACCES, and pflash_update() won't care, which suits us well.) >> >> Before this patch: >> >> You can define as many if=pflash drives as you want. Any with non-zero >> index are silently ignored. >> >> If you don't specify one with index=0, you get a ROM at the top of the >> 32 bit address space, contents taken from -bios (default "bios.bin"). >> Up to its last 128KiB are aliased at the top of the ISA address space. >> >> If you do specify one with index=0, you get a pflash device at the top >> of the 32 bit address space, with contents from the drive, and -bios is >> silently ignored. Up to its last 128KiB are copied to a ROM at the top >> of the (20 bit) ISA address space. >> >> After this patch (please correct misunderstandings): >> >> Now the drives after the first unused index are silently ignored. >> >> If you don't specify one with index=0, no change. >> >> If you do, you now get N pflash devices, where N is the first unused >> index. Each pflash's contents is taken from the respective drive. The >> flashes are mapped at the top of the 32 bit address space in reverse >> index order. -bios is silently ignored, as before. Up to the last >> 128KiB of the index=0 flash are copied to a ROM at the top of the ISA >> address space. >> >> Thus, no change for index=0. For index=1..N, we now get additional >> flash devices. >> >> Correct? > > Yes. Thanks. 1. Multiple pflash devices Is there any way for a guest to see the N separate pflash devices, or do they look exactly like a single pflash device? 2. More board code device configuration magic Our long term goal is to configure machines by configuration files (i.e. data) rather than code. Your patch extends the PC board code dealing with if=pflash for multiple drives. Reminder: -drive if=X (where X != none) creates a backend, and leaves it in a place where board code can find it. Board code may elect to create a frontend using that backend. It's desirable that any new board code creating a frontend for -drive if=X,... is sufficiently simple so that users can get the same result with some -drive if=none,... -device ... incantation. The second form provides full control over device properties. See section "Block Devices" in docs/qdev-device-use.txt for examples of such mappings. This is the case for if=ide, if=scsi, if=floppy and if=virtio (see docs/qdev-device-use.txt). It's not yet the case for if=pflash, because the qdev used with it (cfi.pflash01) is a sysbus device, and these aren't available with -device, yet. When that gets fixed, I'd expect support for putting the flash device at a specific address. An equivalent if=none incantation (free of board code magic) should be possible. Thus, the board magic your patch adds should be of the harmless kind.
Laszlo Ersek <lersek@redhat.com> writes: > This patch allows the user to usefully specify > > -drive file=img_1,if=pflash,format=raw,readonly \ > -drive file=img_2,if=pflash,format=raw > > on the command line. The flash images will be mapped under 4G in their > reverse unit order -- that is, with their base addresses progressing > downwards, in increasing unit order. > > (The unit number increases with command line order if not explicitly > specified.) > > This accommodates the following use case: suppose that OVMF is split in > two parts, a writeable host file for non-volatile variable storage, and a > read-only part for bootstrap and decompressible executable code. > > The binary code part would be read-only, centrally managed on the host > system, and passed in as unit 0. The variable store would be writeable, > VM-specific, and passed in as unit 1. > > 00000000ffe00000-00000000ffe1ffff (prio 0, R-): system.flash1 > 00000000ffe20000-00000000ffffffff (prio 0, R-): system.flash0 > > (If the guest tries to write to the flash range that is backed by the > read-only drive, bdrv_write() in pflash_update() will correctly deny the > write with -EACCES, and pflash_update() won't care, which suits us well.) > > Signed-off-by: Laszlo Ersek <lersek@redhat.com> > --- > hw/i386/pc_sysfw.c | 60 ++++++++++++++++++++++++++++++++++++++++-------------- > 1 file changed, 45 insertions(+), 15 deletions(-) > > diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c > index e917c83..1c3e3d6 100644 > --- a/hw/i386/pc_sysfw.c > +++ b/hw/i386/pc_sysfw.c > @@ -72,35 +72,65 @@ static void pc_isa_bios_init(MemoryRegion *rom_memory, > memory_region_set_readonly(isa_bios, true); > } > > +/* This function maps flash drives from 4G downward, in order of their unit > + * numbers. Addressing within one flash drive is of course not reversed. > + * > + * The drive with unit number 0 is mapped at the highest address, and it is > + * passed to pc_isa_bios_init(). Merging severral drives for isa-bios is not > + * supported. > + * > + * The caller is responsible to pass in the non-NULL @pflash_drv that > + * corresponds to the flash drive with unit number 0. > + */ > static void pc_system_flash_init(MemoryRegion *rom_memory, > DriveInfo *pflash_drv) > { > + int unit = 0; > BlockDriverState *bdrv; > int64_t size; > - hwaddr phys_addr; > + hwaddr phys_addr = 0x100000000ULL; > int sector_bits, sector_size; > pflash_t *system_flash; > MemoryRegion *flash_mem; > + char name[64]; > > - bdrv = pflash_drv->bdrv; > - size = bdrv_getlength(pflash_drv->bdrv); > sector_bits = 12; > sector_size = 1 << sector_bits; > > - if ((size % sector_size) != 0) { > - fprintf(stderr, > - "qemu: PC system firmware (pflash) must be a multiple of 0x%x\n", > - sector_size); > - exit(1); > - } > + do { > + bdrv = pflash_drv->bdrv; > + size = bdrv_getlength(bdrv); > + if ((size % sector_size) != 0) { > + fprintf(stderr, > + "qemu: PC system firmware (pflash segment %d) must be a " > + "multiple of 0x%x\n", unit, sector_size); > + exit(1); > + } > + if (size > phys_addr) { > + fprintf(stderr, "qemu: pflash segments must fit under 4G " > + "cumulatively\n"); I suspect things go haywire long before you hit address zero. Note that both pc_piix.c and pc_q35.c leave a hole in RAM just below 4G. The former's hole starts at 0xe0000000, the latter's at 0xb0000000. Should that be the limit? > + exit(1); > + } > > - phys_addr = 0x100000000ULL - size; > - system_flash = pflash_cfi01_register(phys_addr, NULL, "system.flash", size, > - bdrv, sector_size, size >> sector_bits, > - 1, 0x0000, 0x0000, 0x0000, 0x0000, 0); > - flash_mem = pflash_cfi01_get_memory(system_flash); > + phys_addr -= size; > > - pc_isa_bios_init(rom_memory, flash_mem, size); > + /* pflash_cfi01_register() creates a deep copy of the name */ > + snprintf(name, sizeof name, "system.flash%d", unit); > + system_flash = pflash_cfi01_register(phys_addr, NULL /* qdev */, name, > + size, bdrv, sector_size, > + size >> sector_bits, > + 1 /* width */, > + 0x0000 /* id0 */, > + 0x0000 /* id1 */, > + 0x0000 /* id2 */, > + 0x0000 /* id3 */, > + 0 /* be */); > + if (unit == 0) { > + flash_mem = pflash_cfi01_get_memory(system_flash); > + pc_isa_bios_init(rom_memory, flash_mem, size); > + } > + pflash_drv = drive_get(IF_PFLASH, 0, ++unit); > + } while (pflash_drv != NULL); > } > > static void old_pc_system_rom_init(MemoryRegion *rom_memory, bool isapc_ram_fw) The drive with index 0 is passed as parameter pflash_drv. The others the function gets itself. I find that ugly. Have you considered dropping the parameter?
On 11/25/13 16:22, Markus Armbruster wrote: > Laszlo Ersek <lersek@redhat.com> writes: > >> On 11/22/13 13:21, Markus Armbruster wrote: >>> Laszlo Ersek <lersek@redhat.com> writes: >>> >>>> This patch allows the user to usefully specify >>>> >>>> -drive file=img_1,if=pflash,format=raw,readonly \ >>>> -drive file=img_2,if=pflash,format=raw >>>> >>>> on the command line. The flash images will be mapped under 4G in their >>>> reverse unit order -- that is, with their base addresses progressing >>>> downwards, in increasing unit order. >>>> >>>> (The unit number increases with command line order if not explicitly >>>> specified.) >>>> >>>> This accommodates the following use case: suppose that OVMF is split in >>>> two parts, a writeable host file for non-volatile variable storage, and a >>>> read-only part for bootstrap and decompressible executable code. >>>> >>>> The binary code part would be read-only, centrally managed on the host >>>> system, and passed in as unit 0. The variable store would be writeable, >>>> VM-specific, and passed in as unit 1. >>>> >>>> 00000000ffe00000-00000000ffe1ffff (prio 0, R-): system.flash1 >>>> 00000000ffe20000-00000000ffffffff (prio 0, R-): system.flash0 >>>> >>>> (If the guest tries to write to the flash range that is backed by the >>>> read-only drive, bdrv_write() in pflash_update() will correctly deny the >>>> write with -EACCES, and pflash_update() won't care, which suits us well.) >>> >>> Before this patch: >>> >>> You can define as many if=pflash drives as you want. Any with non-zero >>> index are silently ignored. >>> >>> If you don't specify one with index=0, you get a ROM at the top of the >>> 32 bit address space, contents taken from -bios (default "bios.bin"). >>> Up to its last 128KiB are aliased at the top of the ISA address space. >>> >>> If you do specify one with index=0, you get a pflash device at the top >>> of the 32 bit address space, with contents from the drive, and -bios is >>> silently ignored. Up to its last 128KiB are copied to a ROM at the top >>> of the (20 bit) ISA address space. >>> >>> After this patch (please correct misunderstandings): >>> >>> Now the drives after the first unused index are silently ignored. >>> >>> If you don't specify one with index=0, no change. >>> >>> If you do, you now get N pflash devices, where N is the first unused >>> index. Each pflash's contents is taken from the respective drive. The >>> flashes are mapped at the top of the 32 bit address space in reverse >>> index order. -bios is silently ignored, as before. Up to the last >>> 128KiB of the index=0 flash are copied to a ROM at the top of the ISA >>> address space. >>> >>> Thus, no change for index=0. For index=1..N, we now get additional >>> flash devices. >>> >>> Correct? >> >> Yes. > > Thanks. > > 1. Multiple pflash devices > > Is there any way for a guest to see the N separate pflash devices, or do > they look exactly like a single pflash device? The interpretation of multiple -pflash options is board specific. I grepped the source for IF_PFLASH first, and found some boards that use more than one flash device. Merging them in one contiguous target-phys address range would be i386 specific. This doesn't break compatibility (because multiple pflash devices used not to be supported for i386 targets at all), but I agree that this would posit their interpretation for the future, and thus it might need deeper thought. From looking at "hw/block/pflash_cfi01.c", it seems that the guest can interrogate the flash device about its size (nb_blocs is stored in cfi_table starting at 0x2D, and cfi_table can be queried by command 0x98 in pflash_read()). So, if the guest cares, it can figure out that there are multiple devices backing the range. I think. > 2. More board code device configuration magic > > Our long term goal is to configure machines by configuration files > (i.e. data) rather than code. > > Your patch extends the PC board code dealing with if=pflash for multiple > drives. > > Reminder: -drive if=X (where X != none) creates a backend, and leaves it > in a place where board code can find it. Board code may elect to create > a frontend using that backend. Yes, I'm aware. I did think about this -- eg. just create a drive with if=none, and add a frontend with -device something. But, the frontend here is not a device (a "peripheral"), it's a memory region with special mmio ops. Pflash frontends can apparently be created with "-device cfi.pflash01,...': cfi.pflash01.drive=drive cfi.pflash01.num-blocks=uint32 cfi.pflash01.sector-length=uint64 cfi.pflash01.width=uint8 cfi.pflash01.big-endian=uint8 cfi.pflash01.id0=uint16 cfi.pflash01.id1=uint16 cfi.pflash01.id2=uint16 cfi.pflash01.id3=uint16 cfi.pflash01.name=string We can even point it to the backing drive as well. But these properties don't seem to allow for the i386 specific "processing", eg. where to map the device in target-phys address space. > It's desirable that any new board code creating a frontend for -drive > if=X,... is sufficiently simple so that users can get the same result > with some -drive if=none,... -device ... incantation. The second form > provides full control over device properties. See section "Block > Devices" in docs/qdev-device-use.txt for examples of such mappings. > > This is the case for if=ide, if=scsi, if=floppy and if=virtio (see > docs/qdev-device-use.txt). It's not yet the case for if=pflash, because > the qdev used with it (cfi.pflash01) is a sysbus device, and these > aren't available with -device, yet. Thanks, I didn't know that that was in the background. > When that gets fixed, I'd expect support for putting the flash device at > a specific address. An equivalent if=none incantation (free of board > code magic) should be possible. > > Thus, the board magic your patch adds should be of the harmless kind. > Thanks. I think I managed to follow your train of thought, I just wasn't sure where you'd end up. I think, as you say, once sysbus devices can be specified with -device, cfi.pflash01 could take an address, and if that's omitted, the default for i386 could be "map it just below the previous one". Thanks for looking into this, you doubtlessly know way more about the device model than I do. Laszlo
On 11/25/13 16:32, Markus Armbruster wrote: > Laszlo Ersek <lersek@redhat.com> writes: > >> This patch allows the user to usefully specify >> >> -drive file=img_1,if=pflash,format=raw,readonly \ >> -drive file=img_2,if=pflash,format=raw >> >> on the command line. The flash images will be mapped under 4G in their >> reverse unit order -- that is, with their base addresses progressing >> downwards, in increasing unit order. >> >> (The unit number increases with command line order if not explicitly >> specified.) >> >> This accommodates the following use case: suppose that OVMF is split in >> two parts, a writeable host file for non-volatile variable storage, and a >> read-only part for bootstrap and decompressible executable code. >> >> The binary code part would be read-only, centrally managed on the host >> system, and passed in as unit 0. The variable store would be writeable, >> VM-specific, and passed in as unit 1. >> >> 00000000ffe00000-00000000ffe1ffff (prio 0, R-): system.flash1 >> 00000000ffe20000-00000000ffffffff (prio 0, R-): system.flash0 >> >> (If the guest tries to write to the flash range that is backed by the >> read-only drive, bdrv_write() in pflash_update() will correctly deny the >> write with -EACCES, and pflash_update() won't care, which suits us well.) >> >> Signed-off-by: Laszlo Ersek <lersek@redhat.com> >> --- >> hw/i386/pc_sysfw.c | 60 ++++++++++++++++++++++++++++++++++++++++-------------- >> 1 file changed, 45 insertions(+), 15 deletions(-) >> >> diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c >> index e917c83..1c3e3d6 100644 >> --- a/hw/i386/pc_sysfw.c >> +++ b/hw/i386/pc_sysfw.c >> @@ -72,35 +72,65 @@ static void pc_isa_bios_init(MemoryRegion *rom_memory, >> memory_region_set_readonly(isa_bios, true); >> } >> >> +/* This function maps flash drives from 4G downward, in order of their unit >> + * numbers. Addressing within one flash drive is of course not reversed. >> + * >> + * The drive with unit number 0 is mapped at the highest address, and it is >> + * passed to pc_isa_bios_init(). Merging severral drives for isa-bios is not >> + * supported. >> + * >> + * The caller is responsible to pass in the non-NULL @pflash_drv that >> + * corresponds to the flash drive with unit number 0. >> + */ >> static void pc_system_flash_init(MemoryRegion *rom_memory, >> DriveInfo *pflash_drv) >> { >> + int unit = 0; >> BlockDriverState *bdrv; >> int64_t size; >> - hwaddr phys_addr; >> + hwaddr phys_addr = 0x100000000ULL; >> int sector_bits, sector_size; >> pflash_t *system_flash; >> MemoryRegion *flash_mem; >> + char name[64]; >> >> - bdrv = pflash_drv->bdrv; >> - size = bdrv_getlength(pflash_drv->bdrv); >> sector_bits = 12; >> sector_size = 1 << sector_bits; >> >> - if ((size % sector_size) != 0) { >> - fprintf(stderr, >> - "qemu: PC system firmware (pflash) must be a multiple of 0x%x\n", >> - sector_size); >> - exit(1); >> - } >> + do { >> + bdrv = pflash_drv->bdrv; >> + size = bdrv_getlength(bdrv); >> + if ((size % sector_size) != 0) { >> + fprintf(stderr, >> + "qemu: PC system firmware (pflash segment %d) must be a " >> + "multiple of 0x%x\n", unit, sector_size); >> + exit(1); >> + } >> + if (size > phys_addr) { >> + fprintf(stderr, "qemu: pflash segments must fit under 4G " >> + "cumulatively\n"); > > I suspect things go haywire long before you hit address zero. > > Note that both pc_piix.c and pc_q35.c leave a hole in RAM just below 4G. > The former's hole starts at 0xe0000000, the latter's at 0xb0000000. > Should that be the limit? I wanted to do the bare minimal here, to catch obviously wrong backing drives (like a 10G file). This is already more verification than what the current code does. I wouldn't mind more a specific check, but I don't want to suggest (with more specific code) that it's actually *safe* to go down to the limit that I'd put here. For example, the IO-APIC mmio range is [0xFEE00000..0xFEE01000[, leaving free 18MB-4KB just below 4G. (Of which the current OVMF, including variable store, takes up 2MB.) Grep IO_APIC_DEFAULT_ADDRESS. I just don't want to send the message "it's safe to go all the way down there". Right now the responsibility is with the user (you can specify a single pflash device that's 20MB in size even now), and I'd like to stick with that. I believe that pflash_cfi01_register() sysbus_mmio_map() sysbus_mmio_map_common() memory_region_add_subregion() memory_region_add_subregion_common() could, in theory, find a conflict at runtime (see the #if 0-surrounded collision warning in memory_region_add_subregion_common()). But the memory API doesn't consider such collisions hard errors, and no status code is propagated to the caller. So, if a saner / more reliable limit can be identified, I wouldn't mind checking against that, but right now I know of no such sane / general enough limit. > >> + exit(1); >> + } >> >> - phys_addr = 0x100000000ULL - size; >> - system_flash = pflash_cfi01_register(phys_addr, NULL, "system.flash", size, >> - bdrv, sector_size, size >> sector_bits, >> - 1, 0x0000, 0x0000, 0x0000, 0x0000, 0); >> - flash_mem = pflash_cfi01_get_memory(system_flash); >> + phys_addr -= size; >> >> - pc_isa_bios_init(rom_memory, flash_mem, size); >> + /* pflash_cfi01_register() creates a deep copy of the name */ >> + snprintf(name, sizeof name, "system.flash%d", unit); >> + system_flash = pflash_cfi01_register(phys_addr, NULL /* qdev */, name, >> + size, bdrv, sector_size, >> + size >> sector_bits, >> + 1 /* width */, >> + 0x0000 /* id0 */, >> + 0x0000 /* id1 */, >> + 0x0000 /* id2 */, >> + 0x0000 /* id3 */, >> + 0 /* be */); >> + if (unit == 0) { >> + flash_mem = pflash_cfi01_get_memory(system_flash); >> + pc_isa_bios_init(rom_memory, flash_mem, size); >> + } >> + pflash_drv = drive_get(IF_PFLASH, 0, ++unit); >> + } while (pflash_drv != NULL); >> } >> >> static void old_pc_system_rom_init(MemoryRegion *rom_memory, bool isapc_ram_fw) > > The drive with index 0 is passed as parameter pflash_drv. The others > the function gets itself. I find that ugly. Have you considered > dropping the parameter? I didn't like drive_get() to begin with. It always starts to scan the drive list from the beginning, which makes the new loop in pc_system_flash_init() O(n^2). I was missing an interator-style interface for the drives, but I found none, and I thought that iterating myself through them in O(n) (and checking for the types) would break the current DriveInfo encapsulation. So I kinda gave up on "elegance". Ideally, what should be dropped is the "unit" local variable in pc_system_flash_init(). The function should continue to take "pflash_drv", which should however qualify as a pre-initialized iterator. Then pc_system_flash_init() should traverse it until it runs out. I can of course remove the parameter and start a "while" loop (rather than a "do" loop) with drive_get(IF_PFLASH, 0, 0), if you consider that an improvement. Thanks! Laszlo
Laszlo Ersek <lersek@redhat.com> writes: > On 11/25/13 16:22, Markus Armbruster wrote: >> Laszlo Ersek <lersek@redhat.com> writes: >> >>> On 11/22/13 13:21, Markus Armbruster wrote: >>>> Laszlo Ersek <lersek@redhat.com> writes: >>>> >>>>> This patch allows the user to usefully specify >>>>> >>>>> -drive file=img_1,if=pflash,format=raw,readonly \ >>>>> -drive file=img_2,if=pflash,format=raw >>>>> >>>>> on the command line. The flash images will be mapped under 4G in their >>>>> reverse unit order -- that is, with their base addresses progressing >>>>> downwards, in increasing unit order. >>>>> >>>>> (The unit number increases with command line order if not explicitly >>>>> specified.) >>>>> >>>>> This accommodates the following use case: suppose that OVMF is split in >>>>> two parts, a writeable host file for non-volatile variable storage, and a >>>>> read-only part for bootstrap and decompressible executable code. >>>>> >>>>> The binary code part would be read-only, centrally managed on the host >>>>> system, and passed in as unit 0. The variable store would be writeable, >>>>> VM-specific, and passed in as unit 1. >>>>> >>>>> 00000000ffe00000-00000000ffe1ffff (prio 0, R-): system.flash1 >>>>> 00000000ffe20000-00000000ffffffff (prio 0, R-): system.flash0 >>>>> >>>>> (If the guest tries to write to the flash range that is backed by the >>>>> read-only drive, bdrv_write() in pflash_update() will correctly deny the >>>>> write with -EACCES, and pflash_update() won't care, which suits us well.) >>>> >>>> Before this patch: >>>> >>>> You can define as many if=pflash drives as you want. Any with non-zero >>>> index are silently ignored. >>>> >>>> If you don't specify one with index=0, you get a ROM at the top of the >>>> 32 bit address space, contents taken from -bios (default "bios.bin"). >>>> Up to its last 128KiB are aliased at the top of the ISA address space. >>>> >>>> If you do specify one with index=0, you get a pflash device at the top >>>> of the 32 bit address space, with contents from the drive, and -bios is >>>> silently ignored. Up to its last 128KiB are copied to a ROM at the top >>>> of the (20 bit) ISA address space. >>>> >>>> After this patch (please correct misunderstandings): >>>> >>>> Now the drives after the first unused index are silently ignored. >>>> >>>> If you don't specify one with index=0, no change. >>>> >>>> If you do, you now get N pflash devices, where N is the first unused >>>> index. Each pflash's contents is taken from the respective drive. The >>>> flashes are mapped at the top of the 32 bit address space in reverse >>>> index order. -bios is silently ignored, as before. Up to the last >>>> 128KiB of the index=0 flash are copied to a ROM at the top of the ISA >>>> address space. >>>> >>>> Thus, no change for index=0. For index=1..N, we now get additional >>>> flash devices. >>>> >>>> Correct? >>> >>> Yes. >> >> Thanks. >> >> 1. Multiple pflash devices >> >> Is there any way for a guest to see the N separate pflash devices, or do >> they look exactly like a single pflash device? > > The interpretation of multiple -pflash options is board specific. I > grepped the source for IF_PFLASH first, and found some boards that use > more than one flash device. Merging them in one contiguous target-phys > address range would be i386 specific. > > This doesn't break compatibility (because multiple pflash devices used > not to be supported for i386 targets at all), but I agree that this > would posit their interpretation for the future, and thus it might need > deeper thought. > >>From looking at "hw/block/pflash_cfi01.c", it seems that the guest can > interrogate the flash device about its size (nb_blocs is stored in > cfi_table starting at 0x2D, and cfi_table can be queried by command 0x98 > in pflash_read()). So, if the guest cares, it can figure out that there > are multiple devices backing the range. I think. Your stated purpose for multiple -pflash: This accommodates the following use case: suppose that OVMF is split in two parts, a writeable host file for non-volatile variable storage, and a read-only part for bootstrap and decompressible executable code. Such a split between writable part and read-only part makes sense to me. How is it done in physical hardware? Single device with configurable write-protect, or two separate devices? >> 2. More board code device configuration magic >> >> Our long term goal is to configure machines by configuration files >> (i.e. data) rather than code. >> >> Your patch extends the PC board code dealing with if=pflash for multiple >> drives. >> >> Reminder: -drive if=X (where X != none) creates a backend, and leaves it >> in a place where board code can find it. Board code may elect to create >> a frontend using that backend. > > Yes, I'm aware. I did think about this -- eg. just create a drive with > if=none, and add a frontend with -device something. But, the frontend > here is not a device (a "peripheral"), it's a memory region with special > mmio ops. > > Pflash frontends can apparently be created with "-device cfi.pflash01,...': > > cfi.pflash01.drive=drive > cfi.pflash01.num-blocks=uint32 > cfi.pflash01.sector-length=uint64 > cfi.pflash01.width=uint8 > cfi.pflash01.big-endian=uint8 > cfi.pflash01.id0=uint16 > cfi.pflash01.id1=uint16 > cfi.pflash01.id2=uint16 > cfi.pflash01.id3=uint16 > cfi.pflash01.name=string > > We can even point it to the backing drive as well. But these properties > don't seem to allow for the i386 specific "processing", eg. where to map > the device in target-phys address space. It's a sysbus device, and these still don't work with -device. My pending "[PATCH v3 00/10] Clean up and fix no_user" makes them unavailable with -device. >> It's desirable that any new board code creating a frontend for -drive >> if=X,... is sufficiently simple so that users can get the same result >> with some -drive if=none,... -device ... incantation. The second form >> provides full control over device properties. See section "Block >> Devices" in docs/qdev-device-use.txt for examples of such mappings. >> >> This is the case for if=ide, if=scsi, if=floppy and if=virtio (see >> docs/qdev-device-use.txt). It's not yet the case for if=pflash, because >> the qdev used with it (cfi.pflash01) is a sysbus device, and these >> aren't available with -device, yet. Actually, s/aren't available/don't work/ without my pending series just mentioned. > Thanks, I didn't know that that was in the background. > >> When that gets fixed, I'd expect support for putting the flash device at >> a specific address. An equivalent if=none incantation (free of board >> code magic) should be possible. >> >> Thus, the board magic your patch adds should be of the harmless kind. >> > > Thanks. I think I managed to follow your train of thought, I just wasn't > sure where you'd end up. I think, as you say, once sysbus devices can be > specified with -device, cfi.pflash01 could take an address, and if > that's omitted, the default for i386 could be "map it just below the > previous one". Yes, except I wouldn't expect such fancy defaults. > Thanks for looking into this, you doubtlessly know way more about the > device model than I do. No problem, I just wanted to figure out whether we're creating even more legacy -drive headaches here, so why not share my reasoning.
Laszlo Ersek <lersek@redhat.com> writes: > On 11/22/13 13:21, Markus Armbruster wrote: >> Laszlo Ersek <lersek@redhat.com> writes: >> >>> This patch allows the user to usefully specify >>> >>> -drive file=img_1,if=pflash,format=raw,readonly \ >>> -drive file=img_2,if=pflash,format=raw >>> >>> on the command line. The flash images will be mapped under 4G in their >>> reverse unit order -- that is, with their base addresses progressing >>> downwards, in increasing unit order. >>> >>> (The unit number increases with command line order if not explicitly >>> specified.) >>> >>> This accommodates the following use case: suppose that OVMF is split in >>> two parts, a writeable host file for non-volatile variable storage, and a >>> read-only part for bootstrap and decompressible executable code. >>> >>> The binary code part would be read-only, centrally managed on the host >>> system, and passed in as unit 0. The variable store would be writeable, >>> VM-specific, and passed in as unit 1. >>> >>> 00000000ffe00000-00000000ffe1ffff (prio 0, R-): system.flash1 >>> 00000000ffe20000-00000000ffffffff (prio 0, R-): system.flash0 >>> >>> (If the guest tries to write to the flash range that is backed by the >>> read-only drive, bdrv_write() in pflash_update() will correctly deny the >>> write with -EACCES, and pflash_update() won't care, which suits us well.) >> >> Before this patch: >> >> You can define as many if=pflash drives as you want. Any with non-zero >> index are silently ignored. >> >> If you don't specify one with index=0, you get a ROM at the top of the >> 32 bit address space, contents taken from -bios (default "bios.bin"). >> Up to its last 128KiB are aliased at the top of the ISA address space. >> >> If you do specify one with index=0, you get a pflash device at the top >> of the 32 bit address space, with contents from the drive, and -bios is >> silently ignored. Up to its last 128KiB are copied to a ROM at the top >> of the (20 bit) ISA address space. >> >> After this patch (please correct misunderstandings): >> >> Now the drives after the first unused index are silently ignored. >> >> If you don't specify one with index=0, no change. >> >> If you do, you now get N pflash devices, where N is the first unused >> index. Each pflash's contents is taken from the respective drive. The >> flashes are mapped at the top of the 32 bit address space in reverse >> index order. -bios is silently ignored, as before. Up to the last >> 128KiB of the index=0 flash are copied to a ROM at the top of the ISA >> address space. >> >> Thus, no change for index=0. For index=1..N, we now get additional >> flash devices. >> >> Correct? > > Yes. Thus, we grab *all* if=pflash drives for this purpose. Your stated use case wants just two. Hmm. Are we sure we'll never want to map an if=pflash device somewhere else? I'm asking because such ABI things are a pain to change later on...
Laszlo Ersek <lersek@redhat.com> writes: > On 11/25/13 16:32, Markus Armbruster wrote: >> Laszlo Ersek <lersek@redhat.com> writes: >> >>> This patch allows the user to usefully specify >>> >>> -drive file=img_1,if=pflash,format=raw,readonly \ >>> -drive file=img_2,if=pflash,format=raw >>> >>> on the command line. The flash images will be mapped under 4G in their >>> reverse unit order -- that is, with their base addresses progressing >>> downwards, in increasing unit order. >>> >>> (The unit number increases with command line order if not explicitly >>> specified.) >>> >>> This accommodates the following use case: suppose that OVMF is split in >>> two parts, a writeable host file for non-volatile variable storage, and a >>> read-only part for bootstrap and decompressible executable code. >>> >>> The binary code part would be read-only, centrally managed on the host >>> system, and passed in as unit 0. The variable store would be writeable, >>> VM-specific, and passed in as unit 1. >>> >>> 00000000ffe00000-00000000ffe1ffff (prio 0, R-): system.flash1 >>> 00000000ffe20000-00000000ffffffff (prio 0, R-): system.flash0 >>> >>> (If the guest tries to write to the flash range that is backed by the >>> read-only drive, bdrv_write() in pflash_update() will correctly deny the >>> write with -EACCES, and pflash_update() won't care, which suits us well.) >>> >>> Signed-off-by: Laszlo Ersek <lersek@redhat.com> >>> --- >>> hw/i386/pc_sysfw.c | 60 ++++++++++++++++++++++++++++++++++++++++-------------- >>> 1 file changed, 45 insertions(+), 15 deletions(-) >>> >>> diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c >>> index e917c83..1c3e3d6 100644 >>> --- a/hw/i386/pc_sysfw.c >>> +++ b/hw/i386/pc_sysfw.c >>> @@ -72,35 +72,65 @@ static void pc_isa_bios_init(MemoryRegion *rom_memory, >>> memory_region_set_readonly(isa_bios, true); >>> } >>> >>> +/* This function maps flash drives from 4G downward, in order of their unit >>> + * numbers. Addressing within one flash drive is of course not reversed. >>> + * >>> + * The drive with unit number 0 is mapped at the highest address, and it is >>> + * passed to pc_isa_bios_init(). Merging severral drives for isa-bios is not >>> + * supported. >>> + * >>> + * The caller is responsible to pass in the non-NULL @pflash_drv that >>> + * corresponds to the flash drive with unit number 0. >>> + */ >>> static void pc_system_flash_init(MemoryRegion *rom_memory, >>> DriveInfo *pflash_drv) >>> { >>> + int unit = 0; >>> BlockDriverState *bdrv; >>> int64_t size; >>> - hwaddr phys_addr; >>> + hwaddr phys_addr = 0x100000000ULL; >>> int sector_bits, sector_size; >>> pflash_t *system_flash; >>> MemoryRegion *flash_mem; >>> + char name[64]; >>> >>> - bdrv = pflash_drv->bdrv; >>> - size = bdrv_getlength(pflash_drv->bdrv); >>> sector_bits = 12; >>> sector_size = 1 << sector_bits; >>> >>> - if ((size % sector_size) != 0) { >>> - fprintf(stderr, >>> - "qemu: PC system firmware (pflash) must be a multiple of 0x%x\n", >>> - sector_size); >>> - exit(1); >>> - } >>> + do { >>> + bdrv = pflash_drv->bdrv; >>> + size = bdrv_getlength(bdrv); >>> + if ((size % sector_size) != 0) { >>> + fprintf(stderr, >>> + "qemu: PC system firmware (pflash segment %d) must be a " >>> + "multiple of 0x%x\n", unit, sector_size); >>> + exit(1); >>> + } >>> + if (size > phys_addr) { >>> + fprintf(stderr, "qemu: pflash segments must fit under 4G " >>> + "cumulatively\n"); You're just following existing bad practice here, but correct use of error_report() would give you nicer messages. Happy to explain if you're interested. >> I suspect things go haywire long before you hit address zero. >> >> Note that both pc_piix.c and pc_q35.c leave a hole in RAM just below 4G. >> The former's hole starts at 0xe0000000, the latter's at 0xb0000000. >> Should that be the limit? > > I wanted to do the bare minimal here, to catch obviously wrong backing > drives (like a 10G file). This is already more verification than what > the current code does. > > I wouldn't mind more a specific check, but I don't want to suggest (with > more specific code) that it's actually *safe* to go down to the limit > that I'd put here. > > For example, the IO-APIC mmio range is [0xFEE00000..0xFEE01000[, leaving > free 18MB-4KB just below 4G. (Of which the current OVMF, including > variable store, takes up 2MB.) Grep IO_APIC_DEFAULT_ADDRESS. > > I just don't want to send the message "it's safe to go all the way down > there". Right now the responsibility is with the user (you can specify a > single pflash device that's 20MB in size even now), and I'd like to > stick with that. > > I believe that > > pflash_cfi01_register() > sysbus_mmio_map() > sysbus_mmio_map_common() > memory_region_add_subregion() > memory_region_add_subregion_common() > > could, in theory, find a conflict at runtime (see the #if 0-surrounded > collision warning in memory_region_add_subregion_common()). But the > memory API doesn't consider such collisions hard errors, and no status > code is propagated to the caller. > > So, if a saner / more reliable limit can be identified, I wouldn't mind > checking against that, but right now I know of no such sane / general > enough limit. If the caller knows the true limit, it could pass it. If the true limit isn't practical to find, then what about a comment explaining the problem? >>> + exit(1); >>> + } >>> >>> - phys_addr = 0x100000000ULL - size; >>> - system_flash = pflash_cfi01_register(phys_addr, NULL, >>> "system.flash", size, >>> - bdrv, sector_size, size >> sector_bits, >>> - 1, 0x0000, 0x0000, 0x0000, 0x0000, 0); >>> - flash_mem = pflash_cfi01_get_memory(system_flash); >>> + phys_addr -= size; >>> >>> - pc_isa_bios_init(rom_memory, flash_mem, size); >>> + /* pflash_cfi01_register() creates a deep copy of the name */ >>> + snprintf(name, sizeof name, "system.flash%d", unit); >>> + system_flash = pflash_cfi01_register(phys_addr, NULL /* qdev */, >>> name, >>> + size, bdrv, sector_size, >>> + size >> sector_bits, >>> + 1 /* width */, >>> + 0x0000 /* id0 */, >>> + 0x0000 /* id1 */, >>> + 0x0000 /* id2 */, >>> + 0x0000 /* id3 */, >>> + 0 /* be */); >>> + if (unit == 0) { >>> + flash_mem = pflash_cfi01_get_memory(system_flash); >>> + pc_isa_bios_init(rom_memory, flash_mem, size); >>> + } >>> + pflash_drv = drive_get(IF_PFLASH, 0, ++unit); >>> + } while (pflash_drv != NULL); >>> } >>> >>> static void old_pc_system_rom_init(MemoryRegion *rom_memory, bool >>> isapc_ram_fw) >> >> The drive with index 0 is passed as parameter pflash_drv. The others >> the function gets itself. I find that ugly. Have you considered >> dropping the parameter? > > I didn't like drive_get() to begin with. It always starts to scan the > drive list from the beginning, which makes the new loop in > pc_system_flash_init() O(n^2). Yes, it's pretty stupid, but n is small, and the code runs only during initialization. > I was missing an interator-style interface for the drives, but I found > none, and I thought that iterating myself through them in O(n) (and > checking for the types) would break the current DriveInfo encapsulation. > So I kinda gave up on "elegance". Legacy drives and elegance are about as attracted to each other as oil and water. > Ideally, what should be dropped is the "unit" local variable in > pc_system_flash_init(). The function should continue to take > "pflash_drv", which should however qualify as a pre-initialized > iterator. Then pc_system_flash_init() should traverse it until it runs out. Yes, that would work, but requires a bit of new blockdev infrastructure. > I can of course remove the parameter and start a "while" loop > (rather than a "do" loop) with drive_get(IF_PFLASH, 0, 0), if you > consider that an improvement. I'm partial to for-loops that let me see the complete loop control in one glance: for (unit = 0; drv = drive_get(IF_PFLASH, 0, unit); unit++)
On 11/26/13 13:53, Markus Armbruster wrote: > Thus, we grab *all* if=pflash drives for this purpose. > > Your stated use case wants just two. > > Hmm. Are we sure we'll never want to map an if=pflash device somewhere > else? No, I'm not sure. Thanks Laszlo
On 11/26/13 13:36, Markus Armbruster wrote: > Your stated purpose for multiple -pflash: > > This accommodates the following use case: suppose that OVMF is split in > two parts, a writeable host file for non-volatile variable storage, and a > read-only part for bootstrap and decompressible executable code. > > Such a split between writable part and read-only part makes sense to me. > How is it done in physical hardware? Single device with configurable > write-protect, or two separate devices? (Jordan could help more.) Likely one device that's fully writeable. The flash driver (through which the NvVar updates go) makes sure that a kind of journal is written and that the live variable store is not corrupted even if power is cut during an update. However, if something writes to the flash without going through the driver, it can brick the board. (Trample over the bootstrap code for example.) I think. Laszlo
On 11/26/13 14:11, Markus Armbruster wrote: > Laszlo Ersek <lersek@redhat.com> writes: > >> On 11/25/13 16:32, Markus Armbruster wrote: >>> Laszlo Ersek <lersek@redhat.com> writes: >>> >>>> This patch allows the user to usefully specify >>>> >>>> -drive file=img_1,if=pflash,format=raw,readonly \ >>>> -drive file=img_2,if=pflash,format=raw >>>> >>>> on the command line. The flash images will be mapped under 4G in their >>>> reverse unit order -- that is, with their base addresses progressing >>>> downwards, in increasing unit order. >>>> >>>> (The unit number increases with command line order if not explicitly >>>> specified.) >>>> >>>> This accommodates the following use case: suppose that OVMF is split in >>>> two parts, a writeable host file for non-volatile variable storage, and a >>>> read-only part for bootstrap and decompressible executable code. >>>> >>>> The binary code part would be read-only, centrally managed on the host >>>> system, and passed in as unit 0. The variable store would be writeable, >>>> VM-specific, and passed in as unit 1. >>>> >>>> 00000000ffe00000-00000000ffe1ffff (prio 0, R-): system.flash1 >>>> 00000000ffe20000-00000000ffffffff (prio 0, R-): system.flash0 >>>> >>>> (If the guest tries to write to the flash range that is backed by the >>>> read-only drive, bdrv_write() in pflash_update() will correctly deny the >>>> write with -EACCES, and pflash_update() won't care, which suits us well.) >>>> >>>> Signed-off-by: Laszlo Ersek <lersek@redhat.com> >>>> --- >>>> hw/i386/pc_sysfw.c | 60 ++++++++++++++++++++++++++++++++++++++++-------------- >>>> 1 file changed, 45 insertions(+), 15 deletions(-) >>>> >>>> diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c >>>> index e917c83..1c3e3d6 100644 >>>> --- a/hw/i386/pc_sysfw.c >>>> +++ b/hw/i386/pc_sysfw.c >>>> @@ -72,35 +72,65 @@ static void pc_isa_bios_init(MemoryRegion *rom_memory, >>>> memory_region_set_readonly(isa_bios, true); >>>> } >>>> >>>> +/* This function maps flash drives from 4G downward, in order of their unit >>>> + * numbers. Addressing within one flash drive is of course not reversed. >>>> + * >>>> + * The drive with unit number 0 is mapped at the highest address, and it is >>>> + * passed to pc_isa_bios_init(). Merging severral drives for isa-bios is not >>>> + * supported. >>>> + * >>>> + * The caller is responsible to pass in the non-NULL @pflash_drv that >>>> + * corresponds to the flash drive with unit number 0. >>>> + */ >>>> static void pc_system_flash_init(MemoryRegion *rom_memory, >>>> DriveInfo *pflash_drv) >>>> { >>>> + int unit = 0; >>>> BlockDriverState *bdrv; >>>> int64_t size; >>>> - hwaddr phys_addr; >>>> + hwaddr phys_addr = 0x100000000ULL; >>>> int sector_bits, sector_size; >>>> pflash_t *system_flash; >>>> MemoryRegion *flash_mem; >>>> + char name[64]; >>>> >>>> - bdrv = pflash_drv->bdrv; >>>> - size = bdrv_getlength(pflash_drv->bdrv); >>>> sector_bits = 12; >>>> sector_size = 1 << sector_bits; >>>> >>>> - if ((size % sector_size) != 0) { >>>> - fprintf(stderr, >>>> - "qemu: PC system firmware (pflash) must be a multiple of 0x%x\n", >>>> - sector_size); >>>> - exit(1); >>>> - } >>>> + do { >>>> + bdrv = pflash_drv->bdrv; >>>> + size = bdrv_getlength(bdrv); >>>> + if ((size % sector_size) != 0) { >>>> + fprintf(stderr, >>>> + "qemu: PC system firmware (pflash segment %d) must be a " >>>> + "multiple of 0x%x\n", unit, sector_size); >>>> + exit(1); >>>> + } >>>> + if (size > phys_addr) { >>>> + fprintf(stderr, "qemu: pflash segments must fit under 4G " >>>> + "cumulatively\n"); > > You're just following existing bad practice here, but correct use of > error_report() would give you nicer messages. Happy to explain if > you're interested. You have the Location thing in mind, eg. automatically binding the error message to the offending command line option, right? I've seen you use it before in another patch. Hm... It was commit ec2df8c10a4585ba4641ae482cf2f5f13daa810e. I will try to address the rest of your comments too when I get back to this patch. Thanks Laszlo > >>> I suspect things go haywire long before you hit address zero. >>> >>> Note that both pc_piix.c and pc_q35.c leave a hole in RAM just below 4G. >>> The former's hole starts at 0xe0000000, the latter's at 0xb0000000. >>> Should that be the limit? >> >> I wanted to do the bare minimal here, to catch obviously wrong backing >> drives (like a 10G file). This is already more verification than what >> the current code does. >> >> I wouldn't mind more a specific check, but I don't want to suggest (with >> more specific code) that it's actually *safe* to go down to the limit >> that I'd put here. >> >> For example, the IO-APIC mmio range is [0xFEE00000..0xFEE01000[, leaving >> free 18MB-4KB just below 4G. (Of which the current OVMF, including >> variable store, takes up 2MB.) Grep IO_APIC_DEFAULT_ADDRESS. >> >> I just don't want to send the message "it's safe to go all the way down >> there". Right now the responsibility is with the user (you can specify a >> single pflash device that's 20MB in size even now), and I'd like to >> stick with that. >> >> I believe that >> >> pflash_cfi01_register() >> sysbus_mmio_map() >> sysbus_mmio_map_common() >> memory_region_add_subregion() >> memory_region_add_subregion_common() >> >> could, in theory, find a conflict at runtime (see the #if 0-surrounded >> collision warning in memory_region_add_subregion_common()). But the >> memory API doesn't consider such collisions hard errors, and no status >> code is propagated to the caller. >> >> So, if a saner / more reliable limit can be identified, I wouldn't mind >> checking against that, but right now I know of no such sane / general >> enough limit. > > If the caller knows the true limit, it could pass it. > > If the true limit isn't practical to find, then what about a comment > explaining the problem? > >>>> + exit(1); >>>> + } >>>> >>>> - phys_addr = 0x100000000ULL - size; >>>> - system_flash = pflash_cfi01_register(phys_addr, NULL, >>>> "system.flash", size, >>>> - bdrv, sector_size, size >> sector_bits, >>>> - 1, 0x0000, 0x0000, 0x0000, 0x0000, 0); >>>> - flash_mem = pflash_cfi01_get_memory(system_flash); >>>> + phys_addr -= size; >>>> >>>> - pc_isa_bios_init(rom_memory, flash_mem, size); >>>> + /* pflash_cfi01_register() creates a deep copy of the name */ >>>> + snprintf(name, sizeof name, "system.flash%d", unit); >>>> + system_flash = pflash_cfi01_register(phys_addr, NULL /* qdev */, >>>> name, >>>> + size, bdrv, sector_size, >>>> + size >> sector_bits, >>>> + 1 /* width */, >>>> + 0x0000 /* id0 */, >>>> + 0x0000 /* id1 */, >>>> + 0x0000 /* id2 */, >>>> + 0x0000 /* id3 */, >>>> + 0 /* be */); >>>> + if (unit == 0) { >>>> + flash_mem = pflash_cfi01_get_memory(system_flash); >>>> + pc_isa_bios_init(rom_memory, flash_mem, size); >>>> + } >>>> + pflash_drv = drive_get(IF_PFLASH, 0, ++unit); >>>> + } while (pflash_drv != NULL); >>>> } >>>> >>>> static void old_pc_system_rom_init(MemoryRegion *rom_memory, bool >>>> isapc_ram_fw) >>> >>> The drive with index 0 is passed as parameter pflash_drv. The others >>> the function gets itself. I find that ugly. Have you considered >>> dropping the parameter? >> >> I didn't like drive_get() to begin with. It always starts to scan the >> drive list from the beginning, which makes the new loop in >> pc_system_flash_init() O(n^2). > > Yes, it's pretty stupid, but n is small, and the code runs only during > initialization. > >> I was missing an interator-style interface for the drives, but I found >> none, and I thought that iterating myself through them in O(n) (and >> checking for the types) would break the current DriveInfo encapsulation. >> So I kinda gave up on "elegance". > > Legacy drives and elegance are about as attracted to each other as oil > and water. > >> Ideally, what should be dropped is the "unit" local variable in >> pc_system_flash_init(). The function should continue to take >> "pflash_drv", which should however qualify as a pre-initialized >> iterator. Then pc_system_flash_init() should traverse it until it runs out. > > Yes, that would work, but requires a bit of new blockdev infrastructure. > >> I can of course remove the parameter and start a "while" loop >> (rather than a "do" loop) with drive_get(IF_PFLASH, 0, 0), if you >> consider that an improvement. > > I'm partial to for-loops that let me see the complete loop control in > one glance: > > for (unit = 0; drv = drive_get(IF_PFLASH, 0, unit); unit++) >
Il 25/11/2013 20:45, Laszlo Ersek ha scritto: > From looking at "hw/block/pflash_cfi01.c", it seems that the guest can > interrogate the flash device about its size (nb_blocs is stored in > cfi_table starting at 0x2D, and cfi_table can be queried by command 0x98 > in pflash_read()). So, if the guest cares, it can figure out that there > are multiple devices backing the range. I think. IIUC in the case of OVMF the guest "just knows" that there are multiple devices backing the range. Is that right? But yes, I think that the guest can figure out that there are multiple devices backing the range. From reading the pflash code further: * the pflash device doesn't care about the location where you write the command (the exception is the "block erase" command) * the pflash device only cares about the LSB you read from when you read data from it So you can use the last 256 bytes of the first flash (you know it ends at 4GB) to check for the device (there's probably some suggested protocol to do that, I don't know) and query its size. Example with "-qtest stdio": writeb 0xffffff00 0x98 OK readb 0xfffffff2d OK 0x000000000000001f readb 0xfffffff2e OK 0x0000000000000000 readb 0xfffffff2f OK 0x0000000000000010 readb 0xfffffff30 OK 0x0000000000000000 writeb 0xffffff00 0x98 OK This means the device has 31+1 blocks each 4KB in size. You can then query the next device at 4GB-128KB-256, and so on. Paolo
On 11/26/13 14:41, Paolo Bonzini wrote: > Il 25/11/2013 20:45, Laszlo Ersek ha scritto: >> From looking at "hw/block/pflash_cfi01.c", it seems that the guest can >> interrogate the flash device about its size (nb_blocs is stored in >> cfi_table starting at 0x2D, and cfi_table can be queried by command 0x98 >> in pflash_read()). So, if the guest cares, it can figure out that there >> are multiple devices backing the range. I think. > > IIUC in the case of OVMF the guest "just knows" that there are multiple > devices backing the range. Is that right? No, the guest (the flash driver) is unaware of that. It could know if it wanted to, but it doesn't care. It cares about a base address and a size, and that range is subdivided into various roles. But how many flash devices back the range is not interesting for the driver. Jordan wrote the driver with one flash device in mind, and when I split the image in two parts, I took care to map them so that the base address, the size, and those boundaries stay the same. It's sufficient for the driver to continue working. > But yes, I think that the guest can figure out that there are multiple > devices backing the range. From reading the pflash code further: > > * the pflash device doesn't care about the location where you write the > command (the exception is the "block erase" command) > > * the pflash device only cares about the LSB you read from when you read > data from it > > So you can use the last 256 bytes of the first flash (you know it ends > at 4GB) to check for the device (there's probably some suggested > protocol to do that, I don't know) and query its size. Example with > "-qtest stdio": > > writeb 0xffffff00 0x98 > OK > readb 0xfffffff2d > OK 0x000000000000001f > readb 0xfffffff2e > OK 0x0000000000000000 > readb 0xfffffff2f > OK 0x0000000000000010 > readb 0xfffffff30 > OK 0x0000000000000000 > writeb 0xffffff00 0x98 > OK > > This means the device has 31+1 blocks each 4KB in size. You can then > query the next device at 4GB-128KB-256, and so on. Thanks for testing this in practice. Laszlo
Il 26/11/2013 14:53, Laszlo Ersek ha scritto: >> > >> > IIUC in the case of OVMF the guest "just knows" that there are multiple >> > devices backing the range. Is that right? > No, the guest (the flash driver) is unaware of that. It could know if it > wanted to, but it doesn't care. It cares about a base address and a > size, and that range is subdivided into various roles. But how many > flash devices back the range is not interesting for the driver. > > Jordan wrote the driver with one flash device in mind, and when I split > the image in two parts, I took care to map them so that the base > address, the size, and those boundaries stay the same. It's sufficient > for the driver to continue working. Ah, I see it now. That's because the driver never needs to write an offset into the flash device. Offsets are communicated by writing to different memory addresses. Since the OVMF driver never queries the size of the device, it doesn't care whether the flash is one or two. Paolo
Laszlo Ersek <lersek@redhat.com> writes: > On 11/26/13 14:11, Markus Armbruster wrote: >> Laszlo Ersek <lersek@redhat.com> writes: >> >>> On 11/25/13 16:32, Markus Armbruster wrote: >>>> Laszlo Ersek <lersek@redhat.com> writes: >>>> >>>>> This patch allows the user to usefully specify >>>>> >>>>> -drive file=img_1,if=pflash,format=raw,readonly \ >>>>> -drive file=img_2,if=pflash,format=raw >>>>> >>>>> on the command line. The flash images will be mapped under 4G in their >>>>> reverse unit order -- that is, with their base addresses progressing >>>>> downwards, in increasing unit order. >>>>> >>>>> (The unit number increases with command line order if not explicitly >>>>> specified.) >>>>> >>>>> This accommodates the following use case: suppose that OVMF is split in >>>>> two parts, a writeable host file for non-volatile variable storage, and a >>>>> read-only part for bootstrap and decompressible executable code. >>>>> >>>>> The binary code part would be read-only, centrally managed on the host >>>>> system, and passed in as unit 0. The variable store would be writeable, >>>>> VM-specific, and passed in as unit 1. >>>>> >>>>> 00000000ffe00000-00000000ffe1ffff (prio 0, R-): system.flash1 >>>>> 00000000ffe20000-00000000ffffffff (prio 0, R-): system.flash0 >>>>> >>>>> (If the guest tries to write to the flash range that is backed by the >>>>> read-only drive, bdrv_write() in pflash_update() will correctly deny the >>>>> write with -EACCES, and pflash_update() won't care, which suits us well.) >>>>> >>>>> Signed-off-by: Laszlo Ersek <lersek@redhat.com> >>>>> --- >>>>> hw/i386/pc_sysfw.c | 60 ++++++++++++++++++++++++++++++++++++++++-------------- >>>>> 1 file changed, 45 insertions(+), 15 deletions(-) >>>>> >>>>> diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c >>>>> index e917c83..1c3e3d6 100644 >>>>> --- a/hw/i386/pc_sysfw.c >>>>> +++ b/hw/i386/pc_sysfw.c >>>>> @@ -72,35 +72,65 @@ static void pc_isa_bios_init(MemoryRegion *rom_memory, >>>>> memory_region_set_readonly(isa_bios, true); >>>>> } >>>>> >>>>> +/* This function maps flash drives from 4G downward, in order of their unit >>>>> + * numbers. Addressing within one flash drive is of course not reversed. >>>>> + * >>>>> + * The drive with unit number 0 is mapped at the highest address, and it is >>>>> + * passed to pc_isa_bios_init(). Merging severral drives for isa-bios is not >>>>> + * supported. >>>>> + * >>>>> + * The caller is responsible to pass in the non-NULL @pflash_drv that >>>>> + * corresponds to the flash drive with unit number 0. >>>>> + */ >>>>> static void pc_system_flash_init(MemoryRegion *rom_memory, >>>>> DriveInfo *pflash_drv) >>>>> { >>>>> + int unit = 0; >>>>> BlockDriverState *bdrv; >>>>> int64_t size; >>>>> - hwaddr phys_addr; >>>>> + hwaddr phys_addr = 0x100000000ULL; >>>>> int sector_bits, sector_size; >>>>> pflash_t *system_flash; >>>>> MemoryRegion *flash_mem; >>>>> + char name[64]; >>>>> >>>>> - bdrv = pflash_drv->bdrv; >>>>> - size = bdrv_getlength(pflash_drv->bdrv); >>>>> sector_bits = 12; >>>>> sector_size = 1 << sector_bits; >>>>> >>>>> - if ((size % sector_size) != 0) { >>>>> - fprintf(stderr, >>>>> - "qemu: PC system firmware (pflash) must be a multiple of 0x%x\n", >>>>> - sector_size); >>>>> - exit(1); >>>>> - } >>>>> + do { >>>>> + bdrv = pflash_drv->bdrv; >>>>> + size = bdrv_getlength(bdrv); >>>>> + if ((size % sector_size) != 0) { >>>>> + fprintf(stderr, >>>>> + "qemu: PC system firmware (pflash segment %d) must be a " >>>>> + "multiple of 0x%x\n", unit, sector_size); >>>>> + exit(1); >>>>> + } >>>>> + if (size > phys_addr) { >>>>> + fprintf(stderr, "qemu: pflash segments must fit under 4G " >>>>> + "cumulatively\n"); >> >> You're just following existing bad practice here, but correct use of >> error_report() would give you nicer messages. Happy to explain if >> you're interested. > > You have the Location thing in mind, eg. automatically binding the error > message to the offending command line option, right? Yes, including a location in the message is one benefit. Getting the right one outside the initial parse can take a bit of extra work. I'm happy to assist with it. Other benefits: consistent message format, timestamping support, and making the intent of the message more obvious to readers of the code. > I've seen you use it before in another patch. Hm... It was commit > ec2df8c10a4585ba4641ae482cf2f5f13daa810e. > > I will try to address the rest of your comments too when I get back to > this patch. Okay :)
On Tue, Nov 26, 2013 at 5:32 AM, Laszlo Ersek <lersek@redhat.com> wrote: > On 11/26/13 13:36, Markus Armbruster wrote: > >> Your stated purpose for multiple -pflash: >> >> This accommodates the following use case: suppose that OVMF is split in >> two parts, a writeable host file for non-volatile variable storage, and a >> read-only part for bootstrap and decompressible executable code. >> >> Such a split between writable part and read-only part makes sense to me. >> How is it done in physical hardware? Single device with configurable >> write-protect, or two separate devices? > > (Jordan could help more.) > > Likely one device that's fully writeable. Most parts will have a dedicated read-only line. Many devices have 'block-locking' that will make some subset of blocks read-only until a reset. In addition to this, many chipsets will allow flash writes to be protected by triggering SMM when a flash write occurs. Using multiple chips are less common due to cost, but this is not a factor for QEMU. :) -Jordan
Laszlo Ersek <lersek@redhat.com> writes: > On 11/26/13 13:53, Markus Armbruster wrote: > >> Thus, we grab *all* if=pflash drives for this purpose. >> >> Your stated use case wants just two. >> >> Hmm. Are we sure we'll never want to map an if=pflash device somewhere >> else? > > No, I'm not sure. Perhaps grabbing just the two we need for now would be more prudent.
Jordan Justen <jljusten@gmail.com> writes: > On Tue, Nov 26, 2013 at 5:32 AM, Laszlo Ersek <lersek@redhat.com> wrote: >> On 11/26/13 13:36, Markus Armbruster wrote: >> >>> Your stated purpose for multiple -pflash: >>> >>> This accommodates the following use case: suppose that OVMF is split in >>> two parts, a writeable host file for non-volatile variable storage, and a >>> read-only part for bootstrap and decompressible executable code. >>> >>> Such a split between writable part and read-only part makes sense to me. >>> How is it done in physical hardware? Single device with configurable >>> write-protect, or two separate devices? >> >> (Jordan could help more.) >> >> Likely one device that's fully writeable. > > Most parts will have a dedicated read-only line. > > Many devices have 'block-locking' that will make some subset of blocks > read-only until a reset. > > In addition to this, many chipsets will allow flash writes to be > protected by triggering SMM when a flash write occurs. > > Using multiple chips are less common due to cost, but this is not a > factor for QEMU. :) Should we stick to what real hardware does? Single device, perhaps with block locking.
On 11/27/13 14:52, Markus Armbruster wrote: > Jordan Justen <jljusten@gmail.com> writes: > >> On Tue, Nov 26, 2013 at 5:32 AM, Laszlo Ersek <lersek@redhat.com> wrote: >>> On 11/26/13 13:36, Markus Armbruster wrote: >>> >>>> Your stated purpose for multiple -pflash: >>>> >>>> This accommodates the following use case: suppose that OVMF is split in >>>> two parts, a writeable host file for non-volatile variable storage, and a >>>> read-only part for bootstrap and decompressible executable code. >>>> >>>> Such a split between writable part and read-only part makes sense to me. >>>> How is it done in physical hardware? Single device with configurable >>>> write-protect, or two separate devices? >>> >>> (Jordan could help more.) >>> >>> Likely one device that's fully writeable. >> >> Most parts will have a dedicated read-only line. >> >> Many devices have 'block-locking' that will make some subset of blocks >> read-only until a reset. >> >> In addition to this, many chipsets will allow flash writes to be >> protected by triggering SMM when a flash write occurs. >> >> Using multiple chips are less common due to cost, but this is not a >> factor for QEMU. :) > > Should we stick to what real hardware does? Single device, perhaps with > block locking. I can't back a single flash device with two drives (= two host-side files), which is the incentive for this change. Thanks Laszlo
On 11/27/13 14:49, Markus Armbruster wrote: > Laszlo Ersek <lersek@redhat.com> writes: > >> On 11/26/13 13:53, Markus Armbruster wrote: >> >>> Thus, we grab *all* if=pflash drives for this purpose. >>> >>> Your stated use case wants just two. >>> >>> Hmm. Are we sure we'll never want to map an if=pflash device somewhere >>> else? >> >> No, I'm not sure. > > Perhaps grabbing just the two we need for now would be more prudent. > Agreed. Units 0 and 1. Thanks! Laszlo
Laszlo Ersek <lersek@redhat.com> writes: > On 11/27/13 14:52, Markus Armbruster wrote: >> Jordan Justen <jljusten@gmail.com> writes: >> >>> On Tue, Nov 26, 2013 at 5:32 AM, Laszlo Ersek <lersek@redhat.com> wrote: >>>> On 11/26/13 13:36, Markus Armbruster wrote: >>>> >>>>> Your stated purpose for multiple -pflash: >>>>> >>>>> This accommodates the following use case: suppose that OVMF is split in >>>>> two parts, a writeable host file for non-volatile variable >>>>> storage, and a >>>>> read-only part for bootstrap and decompressible executable code. >>>>> >>>>> Such a split between writable part and read-only part makes sense to me. >>>>> How is it done in physical hardware? Single device with configurable >>>>> write-protect, or two separate devices? >>>> >>>> (Jordan could help more.) >>>> >>>> Likely one device that's fully writeable. >>> >>> Most parts will have a dedicated read-only line. >>> >>> Many devices have 'block-locking' that will make some subset of blocks >>> read-only until a reset. >>> >>> In addition to this, many chipsets will allow flash writes to be >>> protected by triggering SMM when a flash write occurs. >>> >>> Using multiple chips are less common due to cost, but this is not a >>> factor for QEMU. :) >> >> Should we stick to what real hardware does? Single device, perhaps with >> block locking. > > I can't back a single flash device with two drives (= two host-side > files), which is the incentive for this change. There's no fundamental reason why a single device model instance could not be backed by two block backends, named by two drive properties. I'm not claiming this is the best solution, just offering it for consideration.
On 11/27/13 15:45, Markus Armbruster wrote: > Laszlo Ersek <lersek@redhat.com> writes: > >> On 11/27/13 14:52, Markus Armbruster wrote: >>> Jordan Justen <jljusten@gmail.com> writes: >>> >>>> On Tue, Nov 26, 2013 at 5:32 AM, Laszlo Ersek <lersek@redhat.com> wrote: >>>>> On 11/26/13 13:36, Markus Armbruster wrote: >>>>> >>>>>> Your stated purpose for multiple -pflash: >>>>>> >>>>>> This accommodates the following use case: suppose that OVMF is split in >>>>>> two parts, a writeable host file for non-volatile variable >>>>>> storage, and a >>>>>> read-only part for bootstrap and decompressible executable code. >>>>>> >>>>>> Such a split between writable part and read-only part makes sense to me. >>>>>> How is it done in physical hardware? Single device with configurable >>>>>> write-protect, or two separate devices? >>>>> >>>>> (Jordan could help more.) >>>>> >>>>> Likely one device that's fully writeable. >>>> >>>> Most parts will have a dedicated read-only line. >>>> >>>> Many devices have 'block-locking' that will make some subset of blocks >>>> read-only until a reset. >>>> >>>> In addition to this, many chipsets will allow flash writes to be >>>> protected by triggering SMM when a flash write occurs. >>>> >>>> Using multiple chips are less common due to cost, but this is not a >>>> factor for QEMU. :) >>> >>> Should we stick to what real hardware does? Single device, perhaps with >>> block locking. >> >> I can't back a single flash device with two drives (= two host-side >> files), which is the incentive for this change. > > There's no fundamental reason why a single device model instance could > not be backed by two block backends, named by two drive properties. > > I'm not claiming this is the best solution, just offering it for > consideration. I'll pass :) I guess in theory we could push down the multi-drive thing to "pflash_cfi01.c". But: - It is used by a bunch of other boards. - Regarding i386, the second drive could automatically become (dependent on the first drive's size) part of the range that is mirrored in isa-bios space. Probably not intended. - The previous point strengthens the "pflash is used by other boards too" concern: in case of i386, I'm at least halfway aware of the board-specific consequences of sneaking in another drive, but I have no clue about the other boards. - If we decide at some point to map / paste backing drives in a loop, then the (at that time) existent device properties of pflash, let's call them "drive0" and "drive1", will look clumsy. We'll need a way to parse an unknown number of backend (drive) IDs. A mess. Thanks! Laszlo
Laszlo Ersek <lersek@redhat.com> writes: > On 11/27/13 15:45, Markus Armbruster wrote: >> Laszlo Ersek <lersek@redhat.com> writes: >> >>> On 11/27/13 14:52, Markus Armbruster wrote: >>>> Jordan Justen <jljusten@gmail.com> writes: >>>> >>>>> On Tue, Nov 26, 2013 at 5:32 AM, Laszlo Ersek <lersek@redhat.com> wrote: >>>>>> On 11/26/13 13:36, Markus Armbruster wrote: >>>>>> >>>>>>> Your stated purpose for multiple -pflash: >>>>>>> >>>>>>> This accommodates the following use case: suppose that OVMF >>>>>>> is split in >>>>>>> two parts, a writeable host file for non-volatile variable >>>>>>> storage, and a >>>>>>> read-only part for bootstrap and decompressible executable code. >>>>>>> >>>>>>> Such a split between writable part and read-only part makes sense to me. >>>>>>> How is it done in physical hardware? Single device with configurable >>>>>>> write-protect, or two separate devices? >>>>>> >>>>>> (Jordan could help more.) >>>>>> >>>>>> Likely one device that's fully writeable. >>>>> >>>>> Most parts will have a dedicated read-only line. >>>>> >>>>> Many devices have 'block-locking' that will make some subset of blocks >>>>> read-only until a reset. >>>>> >>>>> In addition to this, many chipsets will allow flash writes to be >>>>> protected by triggering SMM when a flash write occurs. >>>>> >>>>> Using multiple chips are less common due to cost, but this is not a >>>>> factor for QEMU. :) >>>> >>>> Should we stick to what real hardware does? Single device, perhaps with >>>> block locking. >>> >>> I can't back a single flash device with two drives (= two host-side >>> files), which is the incentive for this change. >> >> There's no fundamental reason why a single device model instance could >> not be backed by two block backends, named by two drive properties. >> >> I'm not claiming this is the best solution, just offering it for >> consideration. > > I'll pass :) I guess in theory we could push down the multi-drive thing > to "pflash_cfi01.c". But: > - It is used by a bunch of other boards. > - Regarding i386, the second drive could automatically become (dependent > on the first drive's size) part of the range that is mirrored in > isa-bios space. Probably not intended. I suspect real hardware mirrors the last 128KiB of its only flash chip regardless of where the write-protected part begins. > - The previous point strengthens the "pflash is used by other boards > too" concern: in case of i386, I'm at least halfway aware of the > board-specific consequences of sneaking in another drive, but I have no > clue about the other boards. > - If we decide at some point to map / paste backing drives in a loop, > then the (at that time) existent device properties of pflash, let's call > them "drive0" and "drive1", will look clumsy. We'll need a way to parse > an unknown number of backend (drive) IDs. A mess. Perhaps the proper way to back partially writable flash contents isn't splitting it into two devices, but backing a single device with a COW. The backing file has initial contents (say BIOS image), the delta may have additional contents (say non-volatile configuration), and picks up whatever the device model permits to be written to the flash.
On 11/27/13 18:22, Markus Armbruster wrote: > Perhaps the proper way to back partially writable flash contents isn't > splitting it into two devices, but backing a single device with a COW. > The backing file has initial contents (say BIOS image), the delta may > have additional contents (say non-volatile configuration), and picks up > whatever the device model permits to be written to the flash. Kevin brought this up before. The problem is that this prevents you from upgrading the read-only part in O(1). (Unless of course you're willing to make the backing file raw *and* to poke directly in it when you upgrade the bios binary part. Theoretically you could do this, because the divide between the varstore and the binary part at 128KB falls at a qcow2 block boundary (supposing a 64K qcow2 block size), but in practice this is a horrible idea :)) Let's not veer off into architecting a new star destroyer. I think I have plenty ammo from your great notes thus far that I can hack up a v2 with :) Thanks Laszlo
Laszlo Ersek <lersek@redhat.com> writes: > On 11/27/13 18:22, Markus Armbruster wrote: > >> Perhaps the proper way to back partially writable flash contents isn't >> splitting it into two devices, but backing a single device with a COW. >> The backing file has initial contents (say BIOS image), the delta may >> have additional contents (say non-volatile configuration), and picks up >> whatever the device model permits to be written to the flash. > > Kevin brought this up before. The problem is that this prevents you from > upgrading the read-only part in O(1). Err, what's wrong with rebasing the COW to a new backing file? Oh, I see you considered it: > (Unless of course you're willing to make the backing file raw *and* to > poke directly in it when you upgrade the bios binary part. Theoretically No, don't update the backing file, *replace* it. If the device model permitted a write to a cluster within the BIOS part, the replacement won't be visible, so don't do that :) > you could do this, because the divide between the varstore and the > binary part at 128KB falls at a qcow2 block boundary (supposing a 64K > qcow2 block size), but in practice this is a horrible idea :)) Yes, with COW the COW clustering apply to the division between BIOS part and non-volatile configuration part. > Let's not veer off into architecting a new star destroyer. I think I > have plenty ammo from your great notes thus far that I can hack up a v2 > with :) Hmmmm, pretty star destroyers...
diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c index e917c83..1c3e3d6 100644 --- a/hw/i386/pc_sysfw.c +++ b/hw/i386/pc_sysfw.c @@ -72,35 +72,65 @@ static void pc_isa_bios_init(MemoryRegion *rom_memory, memory_region_set_readonly(isa_bios, true); } +/* This function maps flash drives from 4G downward, in order of their unit + * numbers. Addressing within one flash drive is of course not reversed. + * + * The drive with unit number 0 is mapped at the highest address, and it is + * passed to pc_isa_bios_init(). Merging severral drives for isa-bios is not + * supported. + * + * The caller is responsible to pass in the non-NULL @pflash_drv that + * corresponds to the flash drive with unit number 0. + */ static void pc_system_flash_init(MemoryRegion *rom_memory, DriveInfo *pflash_drv) { + int unit = 0; BlockDriverState *bdrv; int64_t size; - hwaddr phys_addr; + hwaddr phys_addr = 0x100000000ULL; int sector_bits, sector_size; pflash_t *system_flash; MemoryRegion *flash_mem; + char name[64]; - bdrv = pflash_drv->bdrv; - size = bdrv_getlength(pflash_drv->bdrv); sector_bits = 12; sector_size = 1 << sector_bits; - if ((size % sector_size) != 0) { - fprintf(stderr, - "qemu: PC system firmware (pflash) must be a multiple of 0x%x\n", - sector_size); - exit(1); - } + do { + bdrv = pflash_drv->bdrv; + size = bdrv_getlength(bdrv); + if ((size % sector_size) != 0) { + fprintf(stderr, + "qemu: PC system firmware (pflash segment %d) must be a " + "multiple of 0x%x\n", unit, sector_size); + exit(1); + } + if (size > phys_addr) { + fprintf(stderr, "qemu: pflash segments must fit under 4G " + "cumulatively\n"); + exit(1); + } - phys_addr = 0x100000000ULL - size; - system_flash = pflash_cfi01_register(phys_addr, NULL, "system.flash", size, - bdrv, sector_size, size >> sector_bits, - 1, 0x0000, 0x0000, 0x0000, 0x0000, 0); - flash_mem = pflash_cfi01_get_memory(system_flash); + phys_addr -= size; - pc_isa_bios_init(rom_memory, flash_mem, size); + /* pflash_cfi01_register() creates a deep copy of the name */ + snprintf(name, sizeof name, "system.flash%d", unit); + system_flash = pflash_cfi01_register(phys_addr, NULL /* qdev */, name, + size, bdrv, sector_size, + size >> sector_bits, + 1 /* width */, + 0x0000 /* id0 */, + 0x0000 /* id1 */, + 0x0000 /* id2 */, + 0x0000 /* id3 */, + 0 /* be */); + if (unit == 0) { + flash_mem = pflash_cfi01_get_memory(system_flash); + pc_isa_bios_init(rom_memory, flash_mem, size); + } + pflash_drv = drive_get(IF_PFLASH, 0, ++unit); + } while (pflash_drv != NULL); } static void old_pc_system_rom_init(MemoryRegion *rom_memory, bool isapc_ram_fw)
This patch allows the user to usefully specify -drive file=img_1,if=pflash,format=raw,readonly \ -drive file=img_2,if=pflash,format=raw on the command line. The flash images will be mapped under 4G in their reverse unit order -- that is, with their base addresses progressing downwards, in increasing unit order. (The unit number increases with command line order if not explicitly specified.) This accommodates the following use case: suppose that OVMF is split in two parts, a writeable host file for non-volatile variable storage, and a read-only part for bootstrap and decompressible executable code. The binary code part would be read-only, centrally managed on the host system, and passed in as unit 0. The variable store would be writeable, VM-specific, and passed in as unit 1. 00000000ffe00000-00000000ffe1ffff (prio 0, R-): system.flash1 00000000ffe20000-00000000ffffffff (prio 0, R-): system.flash0 (If the guest tries to write to the flash range that is backed by the read-only drive, bdrv_write() in pflash_update() will correctly deny the write with -EACCES, and pflash_update() won't care, which suits us well.) Signed-off-by: Laszlo Ersek <lersek@redhat.com> --- hw/i386/pc_sysfw.c | 60 ++++++++++++++++++++++++++++++++++++++++-------------- 1 file changed, 45 insertions(+), 15 deletions(-)