[v4] Add optionrom compatible with fw_cfg DMA version

Message ID 1454065944-15887-1-git-send-email-markmb@redhat.com
State New

Commit Message

Marc Marí Jan. 29, 2016, 11:12 a.m. UTC
This optionrom is based on linuxboot.S.

Added changes proposed by Gerd Hoffmann, Stefan Hajnoczi and Kevin O'Connor.

All optionroms are now compiled as 32-bit code. This also means that no
standard C header can be used, because that would require a cross-compilation
support check and a big modification to the configuration script.

Signed-off-by: Marc Marí <markmb@redhat.com>
---
 .gitignore                        |   4 +
 hw/i386/pc.c                      |   9 +-
 hw/nvram/fw_cfg.c                 |   2 +-
 include/hw/nvram/fw_cfg.h         |   1 +
 pc-bios/optionrom/Makefile        |   7 +-
 pc-bios/optionrom/linuxboot_dma.c | 288 ++++++++++++++++++++++++++++++++++++++
 6 files changed, 306 insertions(+), 5 deletions(-)
 create mode 100644 pc-bios/optionrom/linuxboot_dma.c

Comments

Stefan Hajnoczi Jan. 29, 2016, 3:25 p.m. UTC | #1
On Fri, Jan 29, 2016 at 12:12:24PM +0100, Marc Marí wrote:
> This optionrom is based on linuxboot.S.
> 
> Added changes proposed by Gerd Hoffman, Stefan Hajnoczi and Kevin O'Connor.
> 
> All optionroms are now compiled in 32 bits. This also forces to not use any
> standard C header because this would need cross-compiling support check and a
> big modification on the configuration script.
> 
> Signed-off-by: Marc Marí <markmb@redhat.com>
> ---
>  .gitignore                        |   4 +
>  hw/i386/pc.c                      |   9 +-
>  hw/nvram/fw_cfg.c                 |   2 +-
>  include/hw/nvram/fw_cfg.h         |   1 +
>  pc-bios/optionrom/Makefile        |   7 +-
>  pc-bios/optionrom/linuxboot_dma.c | 288 ++++++++++++++++++++++++++++++++++++++
>  6 files changed, 306 insertions(+), 5 deletions(-)
>  create mode 100644 pc-bios/optionrom/linuxboot_dma.c

Please include a changelog when sending new revisions of a patch.
Thanks!

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Gerd Hoffmann Feb. 2, 2016, 11:06 a.m. UTC | #2
Hi,

>  %.img: %.o
> -	$(call quiet-command,$(LD) $(LDFLAGS_NOPIE) -Ttext 0 -e _start -s -o $@ $<,"  Building $(TARGET_DIR)$@")
> +	$(call quiet-command,$(LD) $(LDFLAGS_NOPIE) -m elf_i386 -Ttext 0 -e _start -s -o $@ $<,"  Building $(TARGET_DIR)$@")

Hmm, that breaks the windows cross build:

make: Entering directory `/home/kraxel/projects/qemu/build-win32'
  Building optionrom/linuxboot_dma.img
i686-w64-mingw32-ld: unrecognised emulation mode: elf_i386
Supported emulations: i386pe
make[1]: *** [linuxboot_dma.img] Error 1

cheers,
  Gerd
Gerd Hoffmann Feb. 2, 2016, 12:04 p.m. UTC | #3
On Tue, 2016-02-02 at 12:06 +0100, Gerd Hoffmann wrote:
>   Hi,
> 
> >  %.img: %.o
> > -	$(call quiet-command,$(LD) $(LDFLAGS_NOPIE) -Ttext 0 -e _start -s -o $@ $<,"  Building $(TARGET_DIR)$@")
> > +	$(call quiet-command,$(LD) $(LDFLAGS_NOPIE) -m elf_i386 -Ttext 0 -e _start -s -o $@ $<,"  Building $(TARGET_DIR)$@")
> 
> Hmm, that breaks the windows cross build:
> 
> make: Entering directory `/home/kraxel/projects/qemu/build-win32'
>   Building optionrom/linuxboot_dma.img
> i686-w64-mingw32-ld: unrecognised emulation mode: elf_i386
> Supported emulations: i386pe
> make[1]: *** [linuxboot_dma.img] Error 1

Testing shows two more problems:

(1) initrd loading is broken, kernel complains it finds only gibberish:

[    0.934582] Unpacking initramfs...
[    1.166983] Initramfs unpacking failed: junk in compressed archive
[    1.168458] Freeing initrd memory: 32812k freed

(2) going back to non-dma boot via -M pc-$old doesn't work, apparently
    fw_cfg dma is enabled even for old machine types.

cheers,
  Gerd
Marc Marí Feb. 2, 2016, 12:58 p.m. UTC | #4
On Tue, 02 Feb 2016 12:06:27 +0100
Gerd Hoffmann <kraxel@redhat.com> wrote:
>   Hi,
> 
> >  %.img: %.o
> > -	$(call quiet-command,$(LD) $(LDFLAGS_NOPIE) -Ttext 0 -e
> > _start -s -o $@ $<,"  Building $(TARGET_DIR)$@")
> > +	$(call quiet-command,$(LD) $(LDFLAGS_NOPIE) -m elf_i386
> > -Ttext 0 -e _start -s -o $@ $<,"  Building $(TARGET_DIR)$@")
> 
> Hmm, that breaks the windows cross build:
> 
> make: Entering directory `/home/kraxel/projects/qemu/build-win32'
>   Building optionrom/linuxboot_dma.img
> i686-w64-mingw32-ld: unrecognised emulation mode: elf_i386
> Supported emulations: i386pe
> make[1]: *** [linuxboot_dma.img] Error 1

Thanks for reporting.

I don't know much about Windows cross-builds. Any idea on how to solve
the issue?

Marc
Marc Marí Feb. 2, 2016, 1:07 p.m. UTC | #5
On Tue, 02 Feb 2016 13:04:55 +0100
Gerd Hoffmann <kraxel@redhat.com> wrote:
> On Di, 2016-02-02 at 12:06 +0100, Gerd Hoffmann wrote:
> >   Hi,
> > 
> > >  %.img: %.o
> > > -	$(call quiet-command,$(LD) $(LDFLAGS_NOPIE) -Ttext 0 -e
> > > _start -s -o $@ $<,"  Building $(TARGET_DIR)$@")
> > > +	$(call quiet-command,$(LD) $(LDFLAGS_NOPIE) -m elf_i386
> > > -Ttext 0 -e _start -s -o $@ $<,"  Building $(TARGET_DIR)$@")
> > 
> > Hmm, that breaks the windows cross build:
> > 
> > make: Entering directory `/home/kraxel/projects/qemu/build-win32'
> >   Building optionrom/linuxboot_dma.img
> > i686-w64-mingw32-ld: unrecognised emulation mode: elf_i386
> > Supported emulations: i386pe
> > make[1]: *** [linuxboot_dma.img] Error 1
> 
> Testing shows two more problems:
> 
> (1) initrd loading is broken, kernel complains it finds only
> gibberish:
> 
> [    0.934582] Unpacking initramfs...
> [    1.166983] Initramfs unpacking failed: junk in compressed archive
> [    1.168458] Freeing initrd memory: 32812k freed

Will look at it. It is probably being copied to a wrong location.
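
For orientation, the placement arithmetic that load_kernel() in this patch
performs can be modeled on the host. This is only a sketch: the function
names are hypothetical, the inputs follow the INT 15h E801 convention the
patch relies on (KB between 1 MB and 16 MB, and 64 KB blocks above 16 MB),
and it does not claim to pinpoint the bug reported above.

```c
#include <stdint.h>

/*
 * Top of usable memory from E801-style values (hypothetical helper).
 * If memory above 16 MB is reported, it is counted in 64 KB blocks;
 * otherwise the KB count between 1 MB and 16 MB is used.
 */
static uint32_t e801_top_of_memory(uint16_t kb_1m_to_16m,
                                   uint16_t blocks64k_above_16m)
{
    if (blocks64k_above_16m != 0) {
        /* Add 16 MB (256 blocks of 64 KB) and convert to bytes */
        return ((uint32_t)blocks64k_above_16m + 256u) << 16;
    }
    /* Add 1 MB (1024 KB) and convert to bytes */
    return ((uint32_t)kb_1m_to_16m + 1024u) << 10;
}

/* Place the initrd just below 'top', rounded down to a 4 KB page. */
static uint32_t initrd_load_addr(uint32_t top, uint32_t initrd_size)
{
    return (top - initrd_size) & ~(uint32_t)0xfff;
}
```

Running the same arithmetic outside the ROM makes it easier to compare the
computed address with the value the kernel later reads from the setup header.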

> (2) going back to non-dma boot via -M pc-$old doesn't work,
> apparently fw_cfg dma is enabled even for old machine types.

I don't remember discussing the topic of machine types when touching
fw_cfg DMA. Which means, it probably slipped amongst the other details.
But it is now merged and in stable, so it should probably be left as it
is now.

Should this optionrom be enabled only with the latest machine type?

Thanks
Marc
Gerd Hoffmann Feb. 2, 2016, 2:23 p.m. UTC | #6
Hi,

> I don't remember discussing the topic of machine types when touching
> fw_cfg DMA. Which means, it probably slipped amongst the other details.
> But it is now merged and in stable, so it should probably be left as it
> is now.

We have to fix it, it breaks live migration.  With fw_cfg_dma turned on
we send an extra vmstate section.  qemu 2.4 (+older) will not understand
it.  So we have to turn it off for those machine types.

dma_enabled property is there already, but the logic is wrong.  It
defaults to false, but is flipped to true when the arch supports dma
(i.e. on x86 and arm), unconditionally.  Instead it should default to
true.  When set to false by the user or compat properties don't enable
fw_cfg dma (and also flip it to false when dma is not supported by the
arch).
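
The intended logic can be sketched in a few lines of standalone C. This is
a simplified model, not the actual fw_cfg code: FwCfgModel and both
functions are hypothetical names, and the real device resolves this in its
init/realize paths.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the fw_cfg device state. */
typedef struct FwCfgModel {
    bool dma_enabled;   /* device property */
} FwCfgModel;

/* The property defaults to true, as proposed above. */
static void fw_cfg_model_init(FwCfgModel *s)
{
    s->dma_enabled = true;
}

/*
 * Resolve the final setting: DMA stays on only if the property was not
 * switched off (by the user or by compat properties) and the
 * architecture supports DMA. It is never forced back to true.
 */
static bool fw_cfg_model_realize(FwCfgModel *s, bool arch_supports_dma)
{
    if (!arch_supports_dma) {
        s->dma_enabled = false;
    }
    return s->dma_enabled;
}
```

The key difference from the current behaviour is that nothing flips the
property back to true after it has been set to false.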

> Should this optionrom be enabled only with the latest machine type?

The logic to pick the correct rom is fine.

cheers,
  Gerd
Laszlo Ersek Feb. 2, 2016, 3:04 p.m. UTC | #7
On 02/02/16 15:23, Gerd Hoffmann wrote:
>   Hi,
> 
>> I don't remember discussing the topic of machine types when touching
>> fw_cfg DMA. Which means, it probably slipped amongst the other details.
>> But it is now merged and in stable, so it should probably be left as it
>> is now.
> 
> We have to fix it, it breaks live migration.  With fw_cfg_dma turned on
> we send an extra vmstate section.  qemu 2.4 (+older) will not understand
> it.  So we have to turn it off for those machine types.
> 
> dma_enabled property is there already, but the logic is wrong.  It
> defaults to false, but is flipped to true when the arch supports dma
> (i.e. on x86 and arm), unconditionally.  Instead it should default to
> true.  When set to false by the user or compat properties don't enable
> fw_cfg dma (and also flip it to false when dma is not supported by the
> arch).

I think the "dma_enabled" property is not exposed to the user. It is
internal to the fw_cfg implementation. The only reason we use a property
for this purpose is that we want to pass a parameter to realize(), and
that's only possible via properties.

The default value of "dma_enabled" in both fw_cfg_io_properties and
fw_cfg_mem_properties is irrelevant; the actual property value is always
overwritten in fw_cfg_init_io_dma() and fw_cfg_init_mem_wide(), which
all of the init paths go through.

I agree that DMA capability should be filtered with machine type.
However, that distinction should not be made using the current
"dma_enabled" properties (i.e., of "fw_cfg_io_properties" and
"fw_cfg_mem_properties"). Instead, it should be made in the
board-specific callers of fw_cfg_init_(io_dma|mem_wide). Those functions
are:

- create_fw_cfg() [hw/arm/virt.c]
- bochs_bios_init() [hw/i386/pc.c]

It looks like aarch64 virt machine types are not versioned yet, so that
leaves us with bochs_bios_init().

bochs_bios_init() is called by pc_memory_init(), and pc_memory_init() at
last has knowledge of machine type knobs -- it takes both "pcms" and
"guest_info".

"pcms" seems to handle the -machine options, but controlling this
feature from the command line is not really a goal here. Whereas machine
type compat flags arrive through "guest_info".

So I think we need the following:
- a new boolean field in PCMachineClass (not PCMachineState) called
  "fw_cfg_dma_disabled"
- In the pc_i440fx_2_4_machine_options() and
  pc_q35_2_4_machine_options() functions, this should be set to "true"
- The same field should be added to PcGuestInfo
- The pc_init1() and pc_q35_init() functions should copy the field from
  *pcmc to *guest_info
- The pc_memory_init() function should pass
  guest_info->fw_cfg_dma_disabled to bochs_bios_init()
- bochs_bios_init() should call fw_cfg_init_io() if the flag is "true".
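
The steps above can be sketched as a standalone model. The types and
function names below are hypothetical stand-ins for PCMachineClass,
PcGuestInfo and the init functions; only the flow of the flag is meant to
be accurate.

```c
#include <stdbool.h>

/* Hypothetical, trimmed-down models of the structures named above. */
typedef struct { bool fw_cfg_dma_disabled; } PCMachineClassModel;
typedef struct { bool fw_cfg_dma_disabled; } PcGuestInfoModel;

/* Which fw_cfg constructor the BIOS init step would end up calling. */
typedef enum { FW_CFG_INIT_IO, FW_CFG_INIT_IO_DMA } FwCfgInitKind;

/* 2.4-style machine options: turn DMA off for the old machine type. */
static void machine_options_2_4(PCMachineClassModel *pcmc)
{
    pcmc->fw_cfg_dma_disabled = true;
}

/* Board init copies the class-level flag into the guest info. */
static void board_init_copy(const PCMachineClassModel *pcmc,
                            PcGuestInfoModel *guest_info)
{
    guest_info->fw_cfg_dma_disabled = pcmc->fw_cfg_dma_disabled;
}

/* The BIOS init step picks the constructor based on the flag. */
static FwCfgInitKind bios_init_pick(const PcGuestInfoModel *guest_info)
{
    return guest_info->fw_cfg_dma_disabled ? FW_CFG_INIT_IO
                                           : FW_CFG_INIT_IO_DMA;
}
```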

Thanks
Laszlo

>> Should this optionrom be enabled only with the latest machine type?
> 
> The logic to pick the correct rom is fine.
> 
> cheers,
>   Gerd
> 
>
Gerd Hoffmann Feb. 3, 2016, 8:44 a.m. UTC | #8
Hi,

> I think the "dma_enabled" property is not exposed to the user.

It is: "-global fw_cfg.dma_enabled=off" works (as in: doesn't throw an
error).  Has no effect though, as it gets overridden later on.

> The default value of "dma_enabled" in both fw_cfg_io_properties and
> fw_cfg_mem_properties is irrelevant; the actual property value is always
> overwritten in fw_cfg_init_io_dma() and fw_cfg_init_mem_wide(), which
> all of the init paths go through.

And IMHO we should not do that, so setting the property actually has an
effect.

> I agree that DMA capability should be filtered with machine type.
> However, that distinction should not be made using the current
> "dma_enabled" properties (i.e., of "fw_cfg_io_properties" and
> "fw_cfg_mem_properties". Instead, it should be made in the
> board-specific callers of fw_cfg_init_(io_dma|mem_wide).

Why?

cheers,
  Gerd
Stefan Hajnoczi Feb. 3, 2016, 9:47 a.m. UTC | #9
On Tue, Feb 02, 2016 at 01:58:14PM +0100, Marc Marí wrote:
> On Tue, 02 Feb 2016 12:06:27 +0100
> Gerd Hoffmann <kraxel@redhat.com> wrote:
> >   Hi,
> > 
> > >  %.img: %.o
> > > -	$(call quiet-command,$(LD) $(LDFLAGS_NOPIE) -Ttext 0 -e
> > > _start -s -o $@ $<,"  Building $(TARGET_DIR)$@")
> > > +	$(call quiet-command,$(LD) $(LDFLAGS_NOPIE) -m elf_i386
> > > -Ttext 0 -e _start -s -o $@ $<,"  Building $(TARGET_DIR)$@")
> > 
> > Hmm, that breaks the windows cross build:
> > 
> > make: Entering directory `/home/kraxel/projects/qemu/build-win32'
> >   Building optionrom/linuxboot_dma.img
> > i686-w64-mingw32-ld: unrecognised emulation mode: elf_i386
> > Supported emulations: i386pe
> > make[1]: *** [linuxboot_dma.img] Error 1
> 
> Thanks for reporting.
> 
> I don't know much about Windows cross-builds. Any idea on how to solve
> the issue?

The Windows toolchain doesn't use ELF binaries so -m elf_i386 doesn't
work there (the emulation is called "i386pe" in i686-w64-mingw32-ld).

I wonder whether it's possible to use gcc -m32 ... -o $@ $< as a wrapper
that automatically does the right thing.  But I guess it won't work
since gcc wants a C source file and not an object file as input.

You could make the emulation ("elf_i386" vs "i386pe") conditional on the
host platform (CONFIG_WIN32=y).

I'm not sure what the most elegant solution is.

Stefan
Laszlo Ersek Feb. 3, 2016, 11:48 a.m. UTC | #10
On 02/03/16 09:44, Gerd Hoffmann wrote:
>   Hi,
> 
>> I think the "dma_enabled" property is not exposed to the user.
> 
> It is: "-global fw_cfg.dma_enabled=off" works (as in: doesn't throw an
> error).  Has no effect though, as it gets overridden later on.
> 
>> The default value of "dma_enabled" in both fw_cfg_io_properties and
>> fw_cfg_mem_properties is irrelevant; the actual property value is always
>> overwritten in fw_cfg_init_io_dma() and fw_cfg_init_mem_wide(), which
>> all of the init paths go through.
> 
> And IMHO we should not do that, so setting the property actually has an
> effect.

Fair point.

>> I agree that DMA capability should be filtered with machine type.
>> However, that distinction should not be made using the current
>> "dma_enabled" properties (i.e., of "fw_cfg_io_properties" and
>> "fw_cfg_mem_properties". Instead, it should be made in the
>> board-specific callers of fw_cfg_init_(io_dma|mem_wide).
> 
> Why?

That's how "has_reserved_memory" works as well, for example.

But, if the property is made to work, I guess PC_COMPAT_2_4 can be used
too. (Or should it be HW_COMPAT_2_4?)

Is that your point?

Thanks
Laszlo
Gerd Hoffmann Feb. 3, 2016, 12:57 p.m. UTC | #11
Hi,

> >> I agree that DMA capability should be filtered with machine type.
> >> However, that distinction should not be made using the current
> >> "dma_enabled" properties (i.e., of "fw_cfg_io_properties" and
> >> "fw_cfg_mem_properties". Instead, it should be made in the
> >> board-specific callers of fw_cfg_init_(io_dma|mem_wide).
> > 
> > Why?
> 
> That's how "has_reserved_memory" works as well, for example.

Those are machine options, not device options.  If we have a device
property there is no need to create a duplicate machine option for the
same purpose.

> But, if the property is made work, I guess PC_COMPAT_2_4 can be used
> too. (Or should it be HW_COMPAT_2_4?)

HW_COMPAT_2_4 probably, so it applies to q35 too.
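
As a rough illustration of how a compat entry would take effect, here is a
standalone model of a compat table overriding a property default. It does
not use the QEMU API: CompatProp, compat_2_4 and resolve_dma_enabled() are
hypothetical, and the driver names "fw_cfg_io" and "fw_cfg_mem" are
assumptions about how the two fw_cfg variants would be addressed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical compat entry: driver, property name, value as a string. */
typedef struct {
    const char *driver;
    const char *property;
    const char *value;
} CompatProp;

/* Assumed 2.4 compat table: switch fw_cfg DMA off for both variants. */
static const CompatProp compat_2_4[] = {
    { "fw_cfg_io",  "dma_enabled", "off" },
    { "fw_cfg_mem", "dma_enabled", "off" },
};

/* Start from the default (true) and apply any matching compat entry. */
static bool resolve_dma_enabled(const char *driver,
                                const CompatProp *table, size_t n)
{
    bool enabled = true;

    for (size_t i = 0; i < n; i++) {
        if (strcmp(table[i].driver, driver) == 0 &&
            strcmp(table[i].property, "dma_enabled") == 0) {
            enabled = strcmp(table[i].value, "on") == 0;
        }
    }
    return enabled;
}
```

Because the table is keyed by device rather than by board, the same entries
would cover both i440fx and q35 machine types.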

cheers,
  Gerd
Vladimir Sementsov-Ogievskiy Feb. 22, 2016, 11:56 a.m. UTC | #12
Hi, Marc!

Am I right that this patch is necessary to reproduce your results from
"QEMU fw_cfg DMA interface"?

> QEMU commit 2be4f242b50a8 and SeaBIOS commit 908a58c1d5ff
> QEMU startup time: .078
> BIOS startup time: .060
> Kernel setup time: .578
> Total time: .716
>
> QEMU with this patch series and SeaBIOS with this patch series
> QEMU startup time: .080
> BIOS startup time: .039
> Kernel setup time: .002
> Total time: .121

Because without this patch SeaBIOS (independently of your other
patches) calls the linuxboot optionrom, which loads the kernel through
I/O ports (see pc-bios/optionrom/linuxboot.S:
"read_fw_blob_addr32(FW_CFG_KERNEL)"), and this read is the most
time-consuming part of the "Kernel setup" phase.

Am I missing something?

On 29.01.2016 14:12, Marc Marí wrote:
> This optionrom is based on linuxboot.S.
>
> Added changes proposed by Gerd Hoffman, Stefan Hajnoczi and Kevin O'Connor.
>
> All optionroms are now compiled in 32 bits. This also forces to not use any
> standard C header because this would need cross-compiling support check and a
> big modification on the configuration script.
>
> Signed-off-by: Marc Marí <markmb@redhat.com>
> ---
>   .gitignore                        |   4 +
>   hw/i386/pc.c                      |   9 +-
>   hw/nvram/fw_cfg.c                 |   2 +-
>   include/hw/nvram/fw_cfg.h         |   1 +
>   pc-bios/optionrom/Makefile        |   7 +-
>   pc-bios/optionrom/linuxboot_dma.c | 288 ++++++++++++++++++++++++++++++++++++++
>   6 files changed, 306 insertions(+), 5 deletions(-)
>   create mode 100644 pc-bios/optionrom/linuxboot_dma.c
>
> diff --git a/.gitignore b/.gitignore
> index 88a80ff..101d1e0 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -94,6 +94,10 @@
>   /pc-bios/optionrom/linuxboot.bin
>   /pc-bios/optionrom/linuxboot.raw
>   /pc-bios/optionrom/linuxboot.img
> +/pc-bios/optionrom/linuxboot_dma.asm
> +/pc-bios/optionrom/linuxboot_dma.bin
> +/pc-bios/optionrom/linuxboot_dma.raw
> +/pc-bios/optionrom/linuxboot_dma.img
>   /pc-bios/optionrom/multiboot.asm
>   /pc-bios/optionrom/multiboot.bin
>   /pc-bios/optionrom/multiboot.raw
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 459260b..00339fa 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -1007,8 +1007,13 @@ static void load_linux(PCMachineState *pcms,
>       fw_cfg_add_i32(fw_cfg, FW_CFG_SETUP_SIZE, setup_size);
>       fw_cfg_add_bytes(fw_cfg, FW_CFG_SETUP_DATA, setup, setup_size);
>   
> -    option_rom[nb_option_roms].name = "linuxboot.bin";
> -    option_rom[nb_option_roms].bootindex = 0;
> +    if (fw_cfg_dma_enabled(fw_cfg)) {
> +        option_rom[nb_option_roms].name = "linuxboot_dma.bin";
> +        option_rom[nb_option_roms].bootindex = 0;
> +    } else {
> +        option_rom[nb_option_roms].name = "linuxboot.bin";
> +        option_rom[nb_option_roms].bootindex = 0;
> +    }
>       nb_option_roms++;
>   }
>   
> diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
> index a1d650d..d0a5753 100644
> --- a/hw/nvram/fw_cfg.c
> +++ b/hw/nvram/fw_cfg.c
> @@ -546,7 +546,7 @@ static bool is_version_1(void *opaque, int version_id)
>       return version_id == 1;
>   }
>   
> -static bool fw_cfg_dma_enabled(void *opaque)
> +bool fw_cfg_dma_enabled(void *opaque)
>   {
>       FWCfgState *s = opaque;
>   
> diff --git a/include/hw/nvram/fw_cfg.h b/include/hw/nvram/fw_cfg.h
> index 664eaf6..953e58d 100644
> --- a/include/hw/nvram/fw_cfg.h
> +++ b/include/hw/nvram/fw_cfg.h
> @@ -219,6 +219,7 @@ FWCfgState *fw_cfg_init_mem_wide(hwaddr ctl_addr,
>                                    hwaddr dma_addr, AddressSpace *dma_as);
>   
>   FWCfgState *fw_cfg_find(void);
> +bool fw_cfg_dma_enabled(void *opaque);
>   
>   #endif /* NO_QEMU_PROTOS */
>   
> diff --git a/pc-bios/optionrom/Makefile b/pc-bios/optionrom/Makefile
> index ce4852a..bdd0cc1 100644
> --- a/pc-bios/optionrom/Makefile
> +++ b/pc-bios/optionrom/Makefile
> @@ -13,15 +13,18 @@ CFLAGS := -Wall -Wstrict-prototypes -Werror -fomit-frame-pointer -fno-builtin
>   CFLAGS += -I$(SRC_PATH)
>   CFLAGS += $(call cc-option, $(CFLAGS), -fno-stack-protector)
>   CFLAGS += $(CFLAGS_NOPIE)
> +CFLAGS += -m32
>   QEMU_CFLAGS = $(CFLAGS)
>   
> -build-all: multiboot.bin linuxboot.bin kvmvapic.bin
> +ASFLAGS += -32
> +
> +build-all: multiboot.bin linuxboot.bin linuxboot_dma.bin kvmvapic.bin
>   
>   # suppress auto-removal of intermediate files
>   .SECONDARY:
>   
>   %.img: %.o
> -	$(call quiet-command,$(LD) $(LDFLAGS_NOPIE) -Ttext 0 -e _start -s -o $@ $<,"  Building $(TARGET_DIR)$@")
> +	$(call quiet-command,$(LD) $(LDFLAGS_NOPIE) -m elf_i386 -Ttext 0 -e _start -s -o $@ $<,"  Building $(TARGET_DIR)$@")
>   
>   %.raw: %.img
>   	$(call quiet-command,$(OBJCOPY) -O binary -j .text $< $@,"  Building $(TARGET_DIR)$@")
> diff --git a/pc-bios/optionrom/linuxboot_dma.c b/pc-bios/optionrom/linuxboot_dma.c
> new file mode 100644
> index 0000000..c1181cd
> --- /dev/null
> +++ b/pc-bios/optionrom/linuxboot_dma.c
> @@ -0,0 +1,288 @@
> +/*
> + * Linux Boot Option ROM for fw_cfg DMA
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
> + *
> + * Copyright (c) 2015 Red Hat Inc.
> + *   Authors: Marc Marí <markmb@redhat.com>
> + */
> +
> +asm(
> +".text\n"
> +".global _start\n"
> +"_start:\n"
> +"   .short	0xaa55\n"
> +"   .byte (_end - _start) / 512\n"
> +"   lret\n"
> +"   .org 0x18\n"
> +"   .short 0\n"
> +"   .short _pnph\n"
> +"_pnph:\n"
> +"   .ascii \"$PnP\"\n"
> +"   .byte 0x01\n"
> +"   .byte ( _pnph_len / 16 )\n"
> +"   .short 0x0000\n"
> +"   .byte 0x00\n"
> +"   .byte 0x00\n"
> +"   .long 0x00000000\n"
> +"   .short _manufacturer\n"
> +"   .short _product\n"
> +"   .long 0x00000000\n"
> +"   .short 0x0000\n"
> +"   .short 0x0000\n"
> +"   .short _bev\n"
> +"   .short 0x0000\n"
> +"   .short 0x0000\n"
> +"   .equ _pnph_len, . - _pnph\n"
> +"   .align 4, 0\n"
> +"_bev:\n"
> +".code16gcc\n"
> +/* DS = CS */
> +"   movw %cs, %ax\n"
> +"   movw %ax, %ds\n"
> +"   movl %esp, %ebp\n"
> +"   cli\n"
> +"   cld\n"
> +"   jmp load_kernel\n"
> +);
> +
> +#define NO_QEMU_PROTOS
> +#include "../../include/hw/nvram/fw_cfg.h"
> +
> +#define BOOT_ROM_PRODUCT "Linux loader DMA"
> +
> +/* QEMU_CFG_DMA_CONTROL bits */
> +#define BIOS_CFG_DMA_CTL_ERROR   0x01
> +#define BIOS_CFG_DMA_CTL_READ    0x02
> +#define BIOS_CFG_DMA_CTL_SKIP    0x04
> +#define BIOS_CFG_DMA_CTL_SELECT  0x08
> +
> +#define BIOS_CFG_DMA_ADDR_HIGH 0x514
> +#define BIOS_CFG_DMA_ADDR_LOW  0x518
> +
> +#define _stringify(S)   #S
> +#define stringify(S) _stringify(S)
> +
> +#define uint64_t unsigned long long
> +#define uint32_t unsigned int
> +#define uint16_t unsigned short
> +
> +#define barrier() asm("": : :"memory")
> +
> +typedef struct FWCfgDmaAccess {
> +    uint32_t control;
> +    uint32_t length;
> +    uint64_t address;
> +} __attribute__((packed)) FWCfgDmaAccess;
> +
> +static inline void outl(uint32_t value, uint16_t port) {
> +    asm("outl %0, %w1" : : "a"(value), "Nd"(port));
> +}
> +
> +static inline void set_es(void *addr) {
> +    uint32_t seg = (uint32_t)addr >> 4;
> +    asm("movl %0, %%es" : : "r"(seg));
> +}
> +
> +static inline uint16_t readw_es(uint16_t offset) {
> +    uint16_t val;
> +    asm("addr32 movw %%es:(%1), %0" : "=r"(val) : "r"((uint32_t)offset));
> +    barrier();
> +    return val;
> +}
> +
> +static inline uint32_t readl_es(uint16_t offset) {
> +    uint32_t val;
> +    asm("addr32 movl %%es:(%1), %0" : "=r"(val) : "r"((uint32_t)offset));
> +    barrier();
> +    return val;
> +}
> +
> +static inline void writel_es(uint16_t offset, uint32_t val) {
> +    barrier();
> +    asm("addr32 movl %0, %%es:(%1)" : : "r"(val), "r"((uint32_t)offset));
> +}
> +
> +static inline uint32_t bswap32(uint32_t x)
> +{
> +    return (((x & 0x000000ffU) << 24) |
> +            ((x & 0x0000ff00U) <<  8) |
> +            ((x & 0x00ff0000U) >>  8) |
> +            ((x & 0xff000000U) >> 24));
> +}
> +
> +static inline uint64_t bswap64(uint64_t x)
> +{
> +    return (((x & 0x00000000000000ffULL) << 56) |
> +            ((x & 0x000000000000ff00ULL) << 40) |
> +            ((x & 0x0000000000ff0000ULL) << 24) |
> +            ((x & 0x00000000ff000000ULL) <<  8) |
> +            ((x & 0x000000ff00000000ULL) >>  8) |
> +            ((x & 0x0000ff0000000000ULL) >> 24) |
> +            ((x & 0x00ff000000000000ULL) >> 40) |
> +            ((x & 0xff00000000000000ULL) >> 56));
> +}
> +
> +static inline uint64_t cpu_to_be64(uint64_t x) {
> +    return bswap64(x);
> +}
> +
> +static inline uint32_t cpu_to_be32(uint32_t x) {
> +    return bswap32(x);
> +}
> +
> +static inline uint32_t be32_to_cpu(uint32_t x) {
> +    return bswap32(x);
> +}
> +
> +static void bios_cfg_read_entry(void *buf, uint16_t entry, uint32_t len)
> +{
> +    FWCfgDmaAccess access;
> +    uint32_t control = (entry << 16) | BIOS_CFG_DMA_CTL_SELECT
> +                        | BIOS_CFG_DMA_CTL_READ;
> +
> +    access.address = cpu_to_be64((uint64_t)(uint32_t)buf);
> +    access.length = cpu_to_be32(len);
> +    access.control = cpu_to_be32(control);
> +
> +    barrier();
> +
> +    outl(cpu_to_be32((uint32_t)&access), BIOS_CFG_DMA_ADDR_LOW);
> +
> +    while(be32_to_cpu(access.control) & ~BIOS_CFG_DMA_CTL_ERROR) {
> +        barrier();
> +    }
> +}
> +
> +static uint32_t get_e801_addr(void)
> +{
> +    uint32_t eax, ebx, ecx, edx;
> +    uint32_t ret;
> +
> +    eax = 0xe801;
> +    ebx = 0;
> +    ecx = 0;
> +    edx = 0;
> +    asm("int $0x15\n"
> +        : "+a"(eax)
> +        : "b"(ebx), "c"(ecx), "d"(edx));
> +
> +    /* Output could be in AX/BX or CX/DX */
> +    if ((uint16_t)ecx || (uint16_t)edx) {
> +        if(!(uint16_t)edx) {
> +            /* Add 1 MB and convert to bytes */
> +            ret = (ecx + 1024) << 10;
> +        } else {
> +            /* Add 16 MB and convert to bytes */
> +            ret = (edx + 256) << 16;
> +        }
> +    } else {
> +        if(!(uint16_t)ebx) {
> +            /* Add 1 MB and convert to bytes */
> +            ret = (eax + 1024) << 10;
> +        } else {
> +            /* Add 16 MB and convert to bytes */
> +            ret = (ebx + 256) << 16;
> +        }
> +    }
> +
> +    return ret;
> +}
> +
> +void load_kernel(void)
> +{
> +    void *setup_addr;
> +    void *initrd_addr;
> +    void *kernel_addr;
> +    void *cmdline_addr;
> +    uint32_t setup_size;
> +    uint32_t initrd_size;
> +    uint32_t kernel_size;
> +    uint32_t cmdline_size;
> +    uint32_t initrd_end_page, max_allowed_page;
> +    uint32_t segment_addr, stack_addr;
> +
> +    bios_cfg_read_entry(&setup_addr, FW_CFG_SETUP_ADDR, 4);
> +    bios_cfg_read_entry(&setup_size, FW_CFG_SETUP_SIZE, 4);
> +    bios_cfg_read_entry(setup_addr, FW_CFG_SETUP_DATA, setup_size);
> +
> +    set_es(setup_addr);
> +
> +    if (readw_es(0x206) < 0x203) {
> +        /* Assume initrd_max 0x37ffffff */
> +        writel_es(0x22c, 0x37ffffff);
> +    }
> +
> +    bios_cfg_read_entry(&initrd_addr, FW_CFG_INITRD_ADDR, 4);
> +    bios_cfg_read_entry(&initrd_size, FW_CFG_INITRD_SIZE, 4);
> +
> +    initrd_end_page = ((uint32_t)(initrd_addr + initrd_size) & -4096);
> +    max_allowed_page = (readl_es(0x22c) & -4096);
> +
> +    if (initrd_end_page != 0 && max_allowed_page != 0 &&
> +        initrd_end_page != max_allowed_page) {
> +        /* Initrd at the end of memory. Compute better initrd address
> +         * based on e801 data
> +         */
> +        initrd_addr = (void *)((get_e801_addr() - initrd_size) & -4096);
> +        writel_es(0x218, (uint32_t)initrd_addr);
> +
> +    }
> +
> +    bios_cfg_read_entry(initrd_addr, FW_CFG_INITRD_DATA, initrd_size);
> +
> +    bios_cfg_read_entry(&kernel_addr, FW_CFG_KERNEL_ADDR, 4);
> +    bios_cfg_read_entry(&kernel_size, FW_CFG_KERNEL_SIZE, 4);
> +    bios_cfg_read_entry(kernel_addr, FW_CFG_KERNEL_DATA, kernel_size);
> +
> +    bios_cfg_read_entry(&cmdline_addr, FW_CFG_CMDLINE_ADDR, 4);
> +    bios_cfg_read_entry(&cmdline_size, FW_CFG_CMDLINE_SIZE, 4);
> +    bios_cfg_read_entry(cmdline_addr, FW_CFG_CMDLINE_DATA, cmdline_size);
> +
> +    /* Boot linux */
> +    segment_addr = ((uint32_t)setup_addr >> 4);
> +    stack_addr = (uint32_t)(cmdline_addr - setup_addr - 16);
> +
> +    /* As we are changing critical registers, we cannot leave freedom to the
> +     * compiler.
> +     */
> +    asm("movw %%ax, %%ds\n"
> +        "movw %%ax, %%es\n"
> +        "movw %%ax, %%fs\n"
> +        "movw %%ax, %%gs\n"
> +        "movw %%ax, %%ss\n"
> +        "movl %%ebx, %%esp\n"
> +        "addw $0x20, %%ax\n"
> +        "pushw %%ax\n" /* CS */
> +        "pushw $0\n" /* IP */
> +        /* Clear registers and jump to Linux */
> +        "xor %%ebx, %%ebx\n"
> +        "xor %%ecx, %%ecx\n"
> +        "xor %%edx, %%edx\n"
> +        "xor %%edi, %%edi\n"
> +        "xor %%ebp, %%ebp\n"
> +        "lretw\n"
> +        : : "a"(segment_addr), "b"(stack_addr));
> +}
> +
> +asm(
> +"_manufacturer:\n"
> +".asciz \"QEMU\"\n"
> +"_product:\n"
> +".asciz "stringify(BOOT_ROM_PRODUCT)"\n"
> +".byte 0\n"
> +".align 512, 0\n"
> +"_end:\n"
> +);
> +
Stefan Hajnoczi Feb. 23, 2016, 9:51 a.m. UTC | #13
On Mon, Feb 22, 2016 at 02:56:27PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Hi, Marc!
> 
> Am I right, that this patch is necessary to reproduce your results from
> "QEMU fw_cfg DMA interface"?
> 
> >QEMU commit 2be4f242b50a8 and SeaBIOS commit 908a58c1d5ff
> >QEMU startup time: .078
> >BIOS startup time: .060
> >Kernel setup time: .578
> >Total time: .716
> >
> >QEMU with this patch series and SeaBIOS with this patch series
> >QEMU startup time: .080
> >BIOS startup time: .039
> >Kernel setup time: .002
> >Total time: .121
> 
> Because, without this patch seabios (and independently of your other
> patches) calls linuxboot optionrom, which loads kernel through io-ports
> (It's in pc-bios/optionrom/linuxboot.S:
> "read_fw_blob_addr32(FW_CFG_KERNEL)"), and this reading is the most
> time-spending part of "Kernel setup" phase.
> 
> Am I missing something?

Yes, this patch makes -kernel/-initrd use fw_cfg DMA.
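
To make the difference concrete, the descriptor that bios_cfg_read_entry()
in the patch hands to the device can be modeled on the host as follows.
This is a sketch under assumptions: the helper names are hypothetical, the
control bit values are copied from the patch, and the descriptor is written
byte by byte so the result does not depend on host endianness.

```c
#include <stdint.h>

/* Control bits, same values as BIOS_CFG_DMA_CTL_* in the patch. */
#define DMA_CTL_ERROR   0x01u
#define DMA_CTL_READ    0x02u
#define DMA_CTL_SKIP    0x04u
#define DMA_CTL_SELECT  0x08u

/* Store a 32-bit value in big-endian byte order. */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Store a 64-bit value in big-endian byte order. */
static void put_be64(uint8_t *p, uint64_t v)
{
    put_be32(p, (uint32_t)(v >> 32));
    put_be32(p + 4, (uint32_t)v);
}

/*
 * Fill a 16-byte descriptor (control, length, address) that selects
 * 'entry' and asks for 'len' bytes to be read into 'buf_addr'.
 */
static void build_read_descriptor(uint8_t out[16], uint16_t entry,
                                  uint32_t len, uint64_t buf_addr)
{
    uint32_t control = ((uint32_t)entry << 16)
                       | DMA_CTL_SELECT | DMA_CTL_READ;

    put_be32(out, control);
    put_be32(out + 4, len);
    put_be64(out + 8, buf_addr);
}
```

A single descriptor like this replaces the per-byte port reads of the old
ROM, which is where the "Kernel setup" time goes.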

Stefan
Richard W.M. Jones March 21, 2016, 11:14 a.m. UTC | #14
My notes on this patch:

* It applies and compiles fine with current qemu on Linux/x86_64 host.
  However after building I had to do:

  cp pc-bios/optionrom/linuxboot_dma.bin pc-bios/

  I wasn't sure if that was supposed to be done automatically.

* It reduces libguestfs appliance boot time by about ½ second
  (2.00 s -> 1.53 s), so that's excellent.

* Overhead of BIOS-to-kernel drops from nearly 50% to under 20% of the
  total libguestfs appliance boot time.  SeaBIOS is still using 0.27
  seconds, of which at least 0.12 seconds is PCI bus scanning.

Anyway, all in all, a good change!

Rich.

Patch

diff --git a/.gitignore b/.gitignore
index 88a80ff..101d1e0 100644
--- a/.gitignore
+++ b/.gitignore
@@ -94,6 +94,10 @@ 
 /pc-bios/optionrom/linuxboot.bin
 /pc-bios/optionrom/linuxboot.raw
 /pc-bios/optionrom/linuxboot.img
+/pc-bios/optionrom/linuxboot_dma.asm
+/pc-bios/optionrom/linuxboot_dma.bin
+/pc-bios/optionrom/linuxboot_dma.raw
+/pc-bios/optionrom/linuxboot_dma.img
 /pc-bios/optionrom/multiboot.asm
 /pc-bios/optionrom/multiboot.bin
 /pc-bios/optionrom/multiboot.raw
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 459260b..00339fa 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1007,8 +1007,13 @@  static void load_linux(PCMachineState *pcms,
     fw_cfg_add_i32(fw_cfg, FW_CFG_SETUP_SIZE, setup_size);
     fw_cfg_add_bytes(fw_cfg, FW_CFG_SETUP_DATA, setup, setup_size);
 
-    option_rom[nb_option_roms].name = "linuxboot.bin";
-    option_rom[nb_option_roms].bootindex = 0;
+    if (fw_cfg_dma_enabled(fw_cfg)) {
+        option_rom[nb_option_roms].name = "linuxboot_dma.bin";
+        option_rom[nb_option_roms].bootindex = 0;
+    } else {
+        option_rom[nb_option_roms].name = "linuxboot.bin";
+        option_rom[nb_option_roms].bootindex = 0;
+    }
     nb_option_roms++;
 }
 
diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
index a1d650d..d0a5753 100644
--- a/hw/nvram/fw_cfg.c
+++ b/hw/nvram/fw_cfg.c
@@ -546,7 +546,7 @@  static bool is_version_1(void *opaque, int version_id)
     return version_id == 1;
 }
 
-static bool fw_cfg_dma_enabled(void *opaque)
+bool fw_cfg_dma_enabled(void *opaque)
 {
     FWCfgState *s = opaque;
 
diff --git a/include/hw/nvram/fw_cfg.h b/include/hw/nvram/fw_cfg.h
index 664eaf6..953e58d 100644
--- a/include/hw/nvram/fw_cfg.h
+++ b/include/hw/nvram/fw_cfg.h
@@ -219,6 +219,7 @@  FWCfgState *fw_cfg_init_mem_wide(hwaddr ctl_addr,
                                  hwaddr dma_addr, AddressSpace *dma_as);
 
 FWCfgState *fw_cfg_find(void);
+bool fw_cfg_dma_enabled(void *opaque);
 
 #endif /* NO_QEMU_PROTOS */
 
diff --git a/pc-bios/optionrom/Makefile b/pc-bios/optionrom/Makefile
index ce4852a..bdd0cc1 100644
--- a/pc-bios/optionrom/Makefile
+++ b/pc-bios/optionrom/Makefile
@@ -13,15 +13,18 @@  CFLAGS := -Wall -Wstrict-prototypes -Werror -fomit-frame-pointer -fno-builtin
 CFLAGS += -I$(SRC_PATH)
 CFLAGS += $(call cc-option, $(CFLAGS), -fno-stack-protector)
 CFLAGS += $(CFLAGS_NOPIE)
+CFLAGS += -m32
 QEMU_CFLAGS = $(CFLAGS)
 
-build-all: multiboot.bin linuxboot.bin kvmvapic.bin
+ASFLAGS += -32
+
+build-all: multiboot.bin linuxboot.bin linuxboot_dma.bin kvmvapic.bin
 
 # suppress auto-removal of intermediate files
 .SECONDARY:
 
 %.img: %.o
-	$(call quiet-command,$(LD) $(LDFLAGS_NOPIE) -Ttext 0 -e _start -s -o $@ $<,"  Building $(TARGET_DIR)$@")
+	$(call quiet-command,$(LD) $(LDFLAGS_NOPIE) -m elf_i386 -Ttext 0 -e _start -s -o $@ $<,"  Building $(TARGET_DIR)$@")
 
 %.raw: %.img
 	$(call quiet-command,$(OBJCOPY) -O binary -j .text $< $@,"  Building $(TARGET_DIR)$@")
diff --git a/pc-bios/optionrom/linuxboot_dma.c b/pc-bios/optionrom/linuxboot_dma.c
new file mode 100644
index 0000000..c1181cd
--- /dev/null
+++ b/pc-bios/optionrom/linuxboot_dma.c
@@ -0,0 +1,288 @@ 
+/*
+ * Linux Boot Option ROM for fw_cfg DMA
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ * Copyright (c) 2015 Red Hat Inc.
+ *   Authors: Marc Marí <markmb@redhat.com>
+ */
+
+asm(
+".text\n"
+".global _start\n"
+"_start:\n"
+"   .short	0xaa55\n"
+"   .byte (_end - _start) / 512\n"
+"   lret\n"
+"   .org 0x18\n"
+"   .short 0\n"
+"   .short _pnph\n"
+"_pnph:\n"
+"   .ascii \"$PnP\"\n"
+"   .byte 0x01\n"
+"   .byte ( _pnph_len / 16 )\n"
+"   .short 0x0000\n"
+"   .byte 0x00\n"
+"   .byte 0x00\n"
+"   .long 0x00000000\n"
+"   .short _manufacturer\n"
+"   .short _product\n"
+"   .long 0x00000000\n"
+"   .short 0x0000\n"
+"   .short 0x0000\n"
+"   .short _bev\n"
+"   .short 0x0000\n"
+"   .short 0x0000\n"
+"   .equ _pnph_len, . - _pnph\n"
+"   .align 4, 0\n"
+"_bev:\n"
+".code16gcc\n"
+/* DS = CS */
+"   movw %cs, %ax\n"
+"   movw %ax, %ds\n"
+"   movl %esp, %ebp\n"
+"   cli\n"
+"   cld\n"
+"   jmp load_kernel\n"
+);
+
+#define NO_QEMU_PROTOS
+#include "../../include/hw/nvram/fw_cfg.h"
+
+#define BOOT_ROM_PRODUCT "Linux loader DMA"
+
+/* QEMU_CFG_DMA_CONTROL bits */
+#define BIOS_CFG_DMA_CTL_ERROR   0x01
+#define BIOS_CFG_DMA_CTL_READ    0x02
+#define BIOS_CFG_DMA_CTL_SKIP    0x04
+#define BIOS_CFG_DMA_CTL_SELECT  0x08
+
+#define BIOS_CFG_DMA_ADDR_HIGH 0x514
+#define BIOS_CFG_DMA_ADDR_LOW  0x518
+
+#define _stringify(S)   #S
+#define stringify(S) _stringify(S)
+
+#define uint64_t unsigned long long
+#define uint32_t unsigned int
+#define uint16_t unsigned short
+
+#define barrier() asm("": : :"memory")
+
+typedef struct FWCfgDmaAccess {
+    uint32_t control;
+    uint32_t length;
+    uint64_t address;
+} __attribute__((packed)) FWCfgDmaAccess;
+
+static inline void outl(uint32_t value, uint16_t port) {
+    asm("outl %0, %w1" : : "a"(value), "Nd"(port));
+}
+
+static inline void set_es(void *addr) {
+    uint32_t seg = (uint32_t)addr >> 4;
+    asm("movl %0, %%es" : : "r"(seg));
+}
+
+static inline uint16_t readw_es(uint16_t offset) {
+    uint16_t val;
+    asm("addr32 movw %%es:(%1), %0" : "=r"(val) : "r"((uint32_t)offset));
+    barrier();
+    return val;
+}
+
+static inline uint32_t readl_es(uint16_t offset) {
+    uint32_t val;
+    asm("addr32 movl %%es:(%1), %0" : "=r"(val) : "r"((uint32_t)offset));
+    barrier();
+    return val;
+}
+
+static inline void writel_es(uint16_t offset, uint32_t val) {
+    barrier();
+    asm("addr32 movl %0, %%es:(%1)" : : "r"(val), "r"((uint32_t)offset));
+}
+
+static inline uint32_t bswap32(uint32_t x)
+{
+    return (((x & 0x000000ffU) << 24) |
+            ((x & 0x0000ff00U) <<  8) |
+            ((x & 0x00ff0000U) >>  8) |
+            ((x & 0xff000000U) >> 24));
+}
+
+static inline uint64_t bswap64(uint64_t x)
+{
+    return (((x & 0x00000000000000ffULL) << 56) |
+            ((x & 0x000000000000ff00ULL) << 40) |
+            ((x & 0x0000000000ff0000ULL) << 24) |
+            ((x & 0x00000000ff000000ULL) <<  8) |
+            ((x & 0x000000ff00000000ULL) >>  8) |
+            ((x & 0x0000ff0000000000ULL) >> 24) |
+            ((x & 0x00ff000000000000ULL) >> 40) |
+            ((x & 0xff00000000000000ULL) >> 56));
+}
+
+static inline uint64_t cpu_to_be64(uint64_t x) {
+    return bswap64(x);
+}
+
+static inline uint32_t cpu_to_be32(uint32_t x) {
+    return bswap32(x);
+}
+
+static inline uint32_t be32_to_cpu(uint32_t x) {
+    return bswap32(x);
+}
+
+static void bios_cfg_read_entry(void *buf, uint16_t entry, uint32_t len)
+{
+    FWCfgDmaAccess access;
+    uint32_t control = (entry << 16) | BIOS_CFG_DMA_CTL_SELECT
+                        | BIOS_CFG_DMA_CTL_READ;
+
+    access.address = cpu_to_be64((uint64_t)(uint32_t)buf);
+    access.length = cpu_to_be32(len);
+    access.control = cpu_to_be32(control);
+
+    barrier();
+
+    outl(cpu_to_be32((uint32_t)&access), BIOS_CFG_DMA_ADDR_LOW);
+
+    while (be32_to_cpu(access.control) & ~BIOS_CFG_DMA_CTL_ERROR) {
+        barrier();
+    }
+}
+
+static uint32_t get_e801_addr(void)
+{
+    uint32_t eax, ebx, ecx, edx;
+    uint32_t ret;
+
+    eax = 0xe801;
+    ebx = 0;
+    ecx = 0;
+    edx = 0;
+    asm("int $0x15\n"
+        : "+a"(eax), "+b"(ebx), "+c"(ecx), "+d"(edx)
+        : : "cc");
+
+    /* Output could be in AX/BX or CX/DX */
+    if ((uint16_t)ecx || (uint16_t)edx) {
+        if (!(uint16_t)edx) {
+            /* Add 1 MB and convert to bytes */
+            ret = (ecx + 1024) << 10;
+        } else {
+            /* Add 16 MB and convert to bytes */
+            ret = (edx + 256) << 16;
+        }
+    } else {
+        if (!(uint16_t)ebx) {
+            /* Add 1 MB and convert to bytes */
+            ret = (eax + 1024) << 10;
+        } else {
+            /* Add 16 MB and convert to bytes */
+            ret = (ebx + 256) << 16;
+        }
+    }
+
+    return ret;
+}
+
+void load_kernel(void)
+{
+    void *setup_addr;
+    void *initrd_addr;
+    void *kernel_addr;
+    void *cmdline_addr;
+    uint32_t setup_size;
+    uint32_t initrd_size;
+    uint32_t kernel_size;
+    uint32_t cmdline_size;
+    uint32_t initrd_end_page, max_allowed_page;
+    uint32_t segment_addr, stack_addr;
+
+    bios_cfg_read_entry(&setup_addr, FW_CFG_SETUP_ADDR, 4);
+    bios_cfg_read_entry(&setup_size, FW_CFG_SETUP_SIZE, 4);
+    bios_cfg_read_entry(setup_addr, FW_CFG_SETUP_DATA, setup_size);
+
+    set_es(setup_addr);
+
+    if (readw_es(0x206) < 0x203) {
+        /* Assume initrd_max 0x37ffffff */
+        writel_es(0x22c, 0x37ffffff);
+    }
+
+    bios_cfg_read_entry(&initrd_addr, FW_CFG_INITRD_ADDR, 4);
+    bios_cfg_read_entry(&initrd_size, FW_CFG_INITRD_SIZE, 4);
+
+    initrd_end_page = ((uint32_t)(initrd_addr + initrd_size) & -4096);
+    max_allowed_page = (readl_es(0x22c) & -4096);
+
+    if (initrd_end_page != 0 && max_allowed_page != 0 &&
+        initrd_end_page != max_allowed_page) {
+        /* Initrd at the end of memory. Compute better initrd address
+         * based on e801 data
+         */
+        initrd_addr = (void *)((get_e801_addr() - initrd_size) & -4096);
+        writel_es(0x218, (uint32_t)initrd_addr);
+
+    }
+
+    bios_cfg_read_entry(initrd_addr, FW_CFG_INITRD_DATA, initrd_size);
+
+    bios_cfg_read_entry(&kernel_addr, FW_CFG_KERNEL_ADDR, 4);
+    bios_cfg_read_entry(&kernel_size, FW_CFG_KERNEL_SIZE, 4);
+    bios_cfg_read_entry(kernel_addr, FW_CFG_KERNEL_DATA, kernel_size);
+
+    bios_cfg_read_entry(&cmdline_addr, FW_CFG_CMDLINE_ADDR, 4);
+    bios_cfg_read_entry(&cmdline_size, FW_CFG_CMDLINE_SIZE, 4);
+    bios_cfg_read_entry(cmdline_addr, FW_CFG_CMDLINE_DATA, cmdline_size);
+
+    /* Boot linux */
+    segment_addr = ((uint32_t)setup_addr >> 4);
+    stack_addr = (uint32_t)(cmdline_addr - setup_addr - 16);
+
+    /* We are changing segment and stack registers here, so the whole
+     * sequence is hand-written asm and nothing is left to the compiler.
+     */
+    asm("movw %%ax, %%ds\n"
+        "movw %%ax, %%es\n"
+        "movw %%ax, %%fs\n"
+        "movw %%ax, %%gs\n"
+        "movw %%ax, %%ss\n"
+        "movl %%ebx, %%esp\n"
+        "addw $0x20, %%ax\n"
+        "pushw %%ax\n" /* CS */
+        "pushw $0\n" /* IP */
+        /* Clear registers and jump to Linux */
+        "xor %%ebx, %%ebx\n"
+        "xor %%ecx, %%ecx\n"
+        "xor %%edx, %%edx\n"
+        "xor %%edi, %%edi\n"
+        "xor %%ebp, %%ebp\n"
+        "lretw\n"
+        : : "a"(segment_addr), "b"(stack_addr));
+}
+
+asm(
+"_manufacturer:\n"
+".asciz \"QEMU\"\n"
+"_product:\n"
+".asciz "stringify(BOOT_ROM_PRODUCT)"\n"
+".byte 0\n"
+".align 512, 0\n"
+"_end:\n"
+);
+
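
For reference, the e801 decoding done by get_e801_addr() can be checked on the
host with a small standalone sketch. This is not part of the patch: the names
e801_to_bytes() and e801_self_check() are hypothetical, and the sketch assumes
the usual INT 15h/E801h convention that AX/CX count KiB between 1 MiB and
16 MiB while BX/DX count 64 KiB blocks above 16 MiB.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical host-side mirror of the decoding in get_e801_addr().
 * The BIOS may report the result in CX/DX or in AX/BX; CX/DX is
 * preferred when either of them is non-zero, as in the option ROM.
 */
static uint32_t e801_to_bytes(uint16_t ax, uint16_t bx,
                              uint16_t cx, uint16_t dx)
{
    int use_cx_dx = (cx != 0) || (dx != 0);
    uint16_t low_kib = use_cx_dx ? cx : ax;     /* KiB above 1 MiB */
    uint16_t high_64k = use_cx_dx ? dx : bx;    /* 64 KiB blocks above 16 MiB */

    if (!high_64k) {
        /* Add 1 MiB (1024 KiB) and convert KiB to bytes */
        return ((uint32_t)low_kib + 1024) << 10;
    }
    /* Add 16 MiB (256 blocks of 64 KiB) and convert blocks to bytes */
    return ((uint32_t)high_64k + 256) << 16;
}

/* Two worked cases: a 128 MiB guest and an 8 MiB guest. */
void e801_self_check(void)
{
    assert(e801_to_bytes(0x3c00, 0x0700, 0x3c00, 0x0700) == 0x08000000u);
    assert(e801_to_bytes(0x1c00, 0, 0, 0) == 0x00800000u);
}
```

The returned value is the end of usable RAM in bytes; load_kernel() subtracts
initrd_size from it and rounds down to a 4 KiB page to place the initrd.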