diff mbox series

[qemu] spapr: Use address from elf parser for kernel address

Message ID 20220504065536.3534488-1-aik@ozlabs.ru
State New
Headers show
Series [qemu] spapr: Use address from elf parser for kernel address | expand

Commit Message

Alexey Kardashevskiy May 4, 2022, 6:55 a.m. UTC
tl;dr: This allows Big Endian zImage booting via -kernel + x-vof=on.

QEMU loads the kernel at 0x400000 by default which works most of
the time as Linux kernels are relocatable, 64bit and compiled with "-pie"
(position independent code). This works for a little endian zImage too.

However a big endian zImage is compiled without -pie, is 32bit, linked to
0x4000000 so current QEMU ends up loading it at
0x4400000 but keeps spapr->kernel_addr unchanged so booting fails.

This uses the kernel address returned from load_elf().
If the default kernel_addr is used, there is no change in behavior (as
translate_kernel_address() takes care of this), which is:
LE/BE vmlinux and LE zImage boot, BE zImage does not.
If the VM created with "-machine kernel-addr=0,x-vof=on", then QEMU
prints a warning and BE zImage boots.

Note #1: SLOF (x-vof=off) still cannot boot a big endian zImage as
SLOF enables MSR_SF for everything loaded by QEMU and this leads to early
crash of 32bit zImage.

Note #2: BE/LE vmlinux images set MSR_SF in early boot so these just work;
a LE zImage restores MSR_SF after every CI call and we are lucky enough
not to crash before the first CI call.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---

We could probably change SLOF to always clear MSR_SF before jumping to
the kernel but this is 1) SLOF fix 2) not quite sure if it brings
lots of value.


I really wish I had this when tested this fix:
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20220406070038.3704604-1-aik@ozlabs.ru/

---
 hw/ppc/spapr.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

Comments

Fabiano Rosas May 4, 2022, 7:16 p.m. UTC | #1
Alexey Kardashevskiy <aik@ozlabs.ru> writes:

> tl;dr: This allows Big Endian zImage booting via -kernel + x-vof=on.
>
> QEMU loads the kernel at 0x400000 by default which works most of
> the time as Linux kernels are relocatable, 64bit and compiled with "-pie"
> (position independent code). This works for a little endian zImage too.
>
> However a big endian zImage is compiled without -pie, is 32bit, linked to
> 0x4000000 so current QEMU ends up loading it at
> 0x4400000 but keeps spapr->kernel_addr unchanged so booting fails.
>
> This uses the kernel address returned from load_elf().
> If the default kernel_addr is used, there is no change in behavior (as
> translate_kernel_address() takes care of this), which is:
> LE/BE vmlinux and LE zImage boot, BE zImage does not.
> If the VM created with "-machine kernel-addr=0,x-vof=on", then QEMU
> prints a warning and BE zImage boots.

I think we can fix this without needing a different command line for BE
zImage (apart from x-vof, which is a separate matter).

If you look at translate_kernel_address, it cannot really work when the
ELF PhysAddr is != 0. We would always hit this sort of 0x4400000 issue,
so if we fix that function like this...

static uint64_t translate_kernel_address(void *opaque, uint64_t addr)
{
    SpaprMachineState *spapr = opaque;

    return addr ? addr : spapr->kernel_addr;
}

...then we could always use the ELF PhysAddr if it is different from 0
and only use the default load addr if the ELF PhysAddr is 0. If the user
gives kernel_addr on the cmdline, we honor that, even if puts the kernel
over the firmware (we have code to detect that).

> @@ -2988,6 +2990,12 @@ static void spapr_machine_init(MachineState *machine)
>              exit(1);
>          }
>  
> +        if (spapr->kernel_addr != loaded_addr) {

This could be:

        if (spapr->kernel_addr == KERNEL_LOAD_ADDR &&
	    spapr->kernel_addr != loaded_addr) {

So the precedence would be:

1- ELF PhysAddr, if != 0. After all, that is what it's for. BE zImage
   falls here;
    
2- KERNEL_LOAD_ADDR. Via translate_kernel_address, LE/BE vmlinux fall
   here;

3- kernel_addr. The user is probably hacking something, just use what
   they gave us. QEMU will yell if they load the kernel over the fw.

> +            warn_report("spapr: kernel_addr changed from 0x%lx to 0x%lx",
> +                        spapr->kernel_addr, loaded_addr);
> +            spapr->kernel_addr = loaded_addr;
> +        }
> +
>          /* load initrd */
>          if (initrd_filename) {
>              /* Try to locate the initrd in the gap between the kernel
Alexey Kardashevskiy May 5, 2022, 3:29 a.m. UTC | #2
On 5/5/22 05:16, Fabiano Rosas wrote:
> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
> 
>> tl;dr: This allows Big Endian zImage booting via -kernel + x-vof=on.
>>
>> QEMU loads the kernel at 0x400000 by default which works most of
>> the time as Linux kernels are relocatable, 64bit and compiled with "-pie"
>> (position independent code). This works for a little endian zImage too.
>>
>> However a big endian zImage is compiled without -pie, is 32bit, linked to
>> 0x4000000 so current QEMU ends up loading it at
>> 0x4400000 but keeps spapr->kernel_addr unchanged so booting fails.
>>
>> This uses the kernel address returned from load_elf().
>> If the default kernel_addr is used, there is no change in behavior (as
>> translate_kernel_address() takes care of this), which is:
>> LE/BE vmlinux and LE zImage boot, BE zImage does not.
>> If the VM created with "-machine kernel-addr=0,x-vof=on", then QEMU
>> prints a warning and BE zImage boots.
> 
> I think we can fix this without needing a different command line for BE
> zImage (apart from x-vof, which is a separate matter).
> 
> If you look at translate_kernel_address, it cannot really work when the
> ELF PhysAddr is != 0. We would always hit this sort of 0x4400000 issue,
> so if we fix that function like this...
> 
> static uint64_t translate_kernel_address(void *opaque, uint64_t addr)
> {
>      SpaprMachineState *spapr = opaque;
> 
>      return addr ? addr : spapr->kernel_addr;
> }


The qemu elf loader is supposed to handle relocations which should be 
calling this hook more than once, now I wonder why it is not doing so.


> ...then we could always use the ELF PhysAddr if it is different from 0
> and only use the default load addr if the ELF PhysAddr is 0. If the user
> gives kernel_addr on the cmdline, we honor that, even if puts the kernel
> over the firmware (we have code to detect that).


ELF address is 0 for LE zImage only, vmlinux BE/LE uses 
0xc000000000000000. And we are already chopping all these tops bits off 
in translate_kernel_address() and I do not really know why _exactly_ it 
is 0x0fffffff and not let's say 0x7fffffff.


> 
>> @@ -2988,6 +2990,12 @@ static void spapr_machine_init(MachineState *machine)
>>               exit(1);
>>           }
>>   
>> +        if (spapr->kernel_addr != loaded_addr) {
> 
> This could be:
> 
>          if (spapr->kernel_addr == KERNEL_LOAD_ADDR &&
> 	    spapr->kernel_addr != loaded_addr) {
> 
> So the precedence would be:
> 
> 1- ELF PhysAddr, if != 0. After all, that is what it's for. BE zImage
>     falls here;
>      
> 2- KERNEL_LOAD_ADDR. Via translate_kernel_address, LE/BE vmlinux fall
>     here;
> 
> 3- kernel_addr. The user is probably hacking something, just use what
>     they gave us. QEMU will yell if they load the kernel over the fw.


imho too complicated.

What if the user runs QEMU with kernel-addr=0x400000? (0x400000 is 
KERNEL_LOAD_ADDR noooow but not necessarily forever). Is it 2) or 3)?

I am basically fixing a bug when we ignore where load_elf() loaded the 
ELF and just assume it is KERNEL_LOAD_ADDR. Now the code checks if the 
ELF was loaded where we want it to be. Everything else can be done but 
on top of this.


>> +            warn_report("spapr: kernel_addr changed from 0x%lx to 0x%lx",
>> +                        spapr->kernel_addr, loaded_addr);
>> +            spapr->kernel_addr = loaded_addr;
>> +        }
>> +
>>           /* load initrd */
>>           if (initrd_filename) {
>>               /* Try to locate the initrd in the gap between the kernel
Joel Stanley May 5, 2022, 4:16 a.m. UTC | #3
On Thu, 5 May 2022 at 03:31, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>
>
>
> On 5/5/22 05:16, Fabiano Rosas wrote:
> > Alexey Kardashevskiy <aik@ozlabs.ru> writes:
> >
> >> tl;dr: This allows Big Endian zImage booting via -kernel + x-vof=on.
> >>
> >> QEMU loads the kernel at 0x400000 by default which works most of
> >> the time as Linux kernels are relocatable, 64bit and compiled with "-pie"
> >> (position independent code). This works for a little endian zImage too.
> >>
> >> However a big endian zImage is compiled without -pie, is 32bit, linked to
> >> 0x4000000 so current QEMU ends up loading it at
> >> 0x4400000 but keeps spapr->kernel_addr unchanged so booting fails.
> >>
> >> This uses the kernel address returned from load_elf().
> >> If the default kernel_addr is used, there is no change in behavior (as
> >> translate_kernel_address() takes care of this), which is:
> >> LE/BE vmlinux and LE zImage boot, BE zImage does not.
> >> If the VM created with "-machine kernel-addr=0,x-vof=on", then QEMU
> >> prints a warning and BE zImage boots.
> >
> > I think we can fix this without needing a different command line for BE
> > zImage (apart from x-vof, which is a separate matter).
> >
> > If you look at translate_kernel_address, it cannot really work when the
> > ELF PhysAddr is != 0. We would always hit this sort of 0x4400000 issue,
> > so if we fix that function like this...
> >
> > static uint64_t translate_kernel_address(void *opaque, uint64_t addr)
> > {
> >      SpaprMachineState *spapr = opaque;
> >
> >      return addr ? addr : spapr->kernel_addr;
> > }
>
>
> The qemu elf loader is supposed to handle relocations which should be
> calling this hook more than once, now I wonder why it is not doing so.
>
>
> > ...then we could always use the ELF PhysAddr if it is different from 0
> > and only use the default load addr if the ELF PhysAddr is 0. If the user
> > gives kernel_addr on the cmdline, we honor that, even if puts the kernel
> > over the firmware (we have code to detect that).
>
>
> ELF address is 0 for LE zImage only, vmlinux BE/LE uses
> 0xc000000000000000. And we are already chopping all these tops bits off
> in translate_kernel_address() and I do not really know why _exactly_ it
> is 0x0fffffff and not let's say 0x7fffffff.
>
>
> >
> >> @@ -2988,6 +2990,12 @@ static void spapr_machine_init(MachineState *machine)
> >>               exit(1);
> >>           }
> >>
> >> +        if (spapr->kernel_addr != loaded_addr) {
> >
> > This could be:
> >
> >          if (spapr->kernel_addr == KERNEL_LOAD_ADDR &&
> >           spapr->kernel_addr != loaded_addr) {
> >
> > So the precedence would be:
> >
> > 1- ELF PhysAddr, if != 0. After all, that is what it's for. BE zImage
> >     falls here;
> >
> > 2- KERNEL_LOAD_ADDR. Via translate_kernel_address, LE/BE vmlinux fall
> >     here;
> >
> > 3- kernel_addr. The user is probably hacking something, just use what
> >     they gave us. QEMU will yell if they load the kernel over the fw.
>
>
> imho too complicated.
>
> What if the user runs QEMU with kernel-addr=0x400000? (0x400000 is
> KERNEL_LOAD_ADDR noooow but not necessarily forever). Is it 2) or 3)?
>
> I am basically fixing a bug when we ignore where load_elf() loaded the
> ELF and just assume it is KERNEL_LOAD_ADDR. Now the code checks if the
> ELF was loaded where we want it to be. Everything else can be done but
> on top of this.

It would be good to fix this so we don't need to specify kernel-addr=0.

I only recently learnt the pseries machine doesn't support loading the zImage:

 https://github.com/linuxppc/issues/issues/402

So whatever the fix is, writing down what is expected to work and what
isn't would be useful.

I tested your patch and it worked with this command line:

 qemu-system-ppc64 -M pseries,kernel-addr=0,x-vof=on -nographic
-kernel arch/powerpc/boot/zImage.pseries -serial mon:stdio -nodefaults

Tested-by: Joel Stanley <joel@jms.id.au>

Cheers,

Joel

>
>
> >> +            warn_report("spapr: kernel_addr changed from 0x%lx to 0x%lx",
> >> +                        spapr->kernel_addr, loaded_addr);
> >> +            spapr->kernel_addr = loaded_addr;
> >> +        }
> >> +
> >>           /* load initrd */
> >>           if (initrd_filename) {
> >>               /* Try to locate the initrd in the gap between the kernel
>
Alexey Kardashevskiy May 5, 2022, 5:07 a.m. UTC | #4
On 5/5/22 14:16, Joel Stanley wrote:
> On Thu, 5 May 2022 at 03:31, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>
>>
>>
>> On 5/5/22 05:16, Fabiano Rosas wrote:
>>> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
>>>
>>>> tl;dr: This allows Big Endian zImage booting via -kernel + x-vof=on.
>>>>
>>>> QEMU loads the kernel at 0x400000 by default which works most of
>>>> the time as Linux kernels are relocatable, 64bit and compiled with "-pie"
>>>> (position independent code). This works for a little endian zImage too.
>>>>
>>>> However a big endian zImage is compiled without -pie, is 32bit, linked to
>>>> 0x4000000 so current QEMU ends up loading it at
>>>> 0x4400000 but keeps spapr->kernel_addr unchanged so booting fails.
>>>>
>>>> This uses the kernel address returned from load_elf().
>>>> If the default kernel_addr is used, there is no change in behavior (as
>>>> translate_kernel_address() takes care of this), which is:
>>>> LE/BE vmlinux and LE zImage boot, BE zImage does not.
>>>> If the VM created with "-machine kernel-addr=0,x-vof=on", then QEMU
>>>> prints a warning and BE zImage boots.
>>>
>>> I think we can fix this without needing a different command line for BE
>>> zImage (apart from x-vof, which is a separate matter).
>>>
>>> If you look at translate_kernel_address, it cannot really work when the
>>> ELF PhysAddr is != 0. We would always hit this sort of 0x4400000 issue,
>>> so if we fix that function like this...
>>>
>>> static uint64_t translate_kernel_address(void *opaque, uint64_t addr)
>>> {
>>>       SpaprMachineState *spapr = opaque;
>>>
>>>       return addr ? addr : spapr->kernel_addr;
>>> }
>>
>>
>> The qemu elf loader is supposed to handle relocations which should be
>> calling this hook more than once, now I wonder why it is not doing so.
>>
>>
>>> ...then we could always use the ELF PhysAddr if it is different from 0
>>> and only use the default load addr if the ELF PhysAddr is 0. If the user
>>> gives kernel_addr on the cmdline, we honor that, even if puts the kernel
>>> over the firmware (we have code to detect that).
>>
>>
>> ELF address is 0 for LE zImage only, vmlinux BE/LE uses
>> 0xc000000000000000. And we are already chopping all these tops bits off
>> in translate_kernel_address() and I do not really know why _exactly_ it
>> is 0x0fffffff and not let's say 0x7fffffff.
>>
>>
>>>
>>>> @@ -2988,6 +2990,12 @@ static void spapr_machine_init(MachineState *machine)
>>>>                exit(1);
>>>>            }
>>>>
>>>> +        if (spapr->kernel_addr != loaded_addr) {
>>>
>>> This could be:
>>>
>>>           if (spapr->kernel_addr == KERNEL_LOAD_ADDR &&
>>>            spapr->kernel_addr != loaded_addr) {
>>>
>>> So the precedence would be:
>>>
>>> 1- ELF PhysAddr, if != 0. After all, that is what it's for. BE zImage
>>>      falls here;
>>>
>>> 2- KERNEL_LOAD_ADDR. Via translate_kernel_address, LE/BE vmlinux fall
>>>      here;
>>>
>>> 3- kernel_addr. The user is probably hacking something, just use what
>>>      they gave us. QEMU will yell if they load the kernel over the fw.
>>
>>
>> imho too complicated.
>>
>> What if the user runs QEMU with kernel-addr=0x400000? (0x400000 is
>> KERNEL_LOAD_ADDR noooow but not necessarily forever). Is it 2) or 3)?
>>
>> I am basically fixing a bug when we ignore where load_elf() loaded the
>> ELF and just assume it is KERNEL_LOAD_ADDR. Now the code checks if the
>> ELF was loaded where we want it to be. Everything else can be done but
>> on top of this.
> 
> It would be good to fix this so we don't need to specify kernel-addr=0.


This means the pseries code in QEMU needs to read the ELF header and 
figure out if it is position independent and what is the base address. 
And because it is Linux which is special, just reading the ELF header is 
not enough, need more heuristics (there is some already in 
translate_kernel_address()).

LE vmlinux is 64bit EXEC type and entry=0xc000000000000000
BE vmlinux is 64bit EXEC type and entry=0xc000000000000000
LE zImage is 64bit DYN type and entry=0x0
BE zImage is 32bit EXEC type and entry=0x4000000

And the default address for these in QEMU is 0x400000. Asking 
kernel-addr=0 and vof=on looks like a small evil :)

And also worth mentioning that with this hack it should be possible to 
boot grub.elf via -kernel which might be interesting for debugging, and 
that thing is linked to 0x200000 or so, and probably also 32bit BE (I do 
not have one handy).



> I only recently learnt the pseries machine doesn't support loading the zImage:
> 
>   https://github.com/linuxppc/issues/issues/402
> 
> So whatever the fix is, writing down what is expected to work and what
> isn't would be useful.
> 
> I tested your patch and it worked with this command line:
> 
>   qemu-system-ppc64 -M pseries,kernel-addr=0,x-vof=on -nographic
> -kernel arch/powerpc/boot/zImage.pseries -serial mon:stdio -nodefaults
> 
> Tested-by: Joel Stanley <joel@jms.id.au>

Cool thanks!

> 
> Cheers,
> 
> Joel
> 
>>
>>
>>>> +            warn_report("spapr: kernel_addr changed from 0x%lx to 0x%lx",
>>>> +                        spapr->kernel_addr, loaded_addr);
>>>> +            spapr->kernel_addr = loaded_addr;
>>>> +        }
>>>> +
>>>>            /* load initrd */
>>>>            if (initrd_filename) {
>>>>                /* Try to locate the initrd in the gap between the kernel
>>
Fabiano Rosas May 5, 2022, 3:50 p.m. UTC | #5
Alexey Kardashevskiy <aik@ozlabs.ru> writes:

> On 5/5/22 05:16, Fabiano Rosas wrote:
>> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
>> 
>>> tl;dr: This allows Big Endian zImage booting via -kernel + x-vof=on.
>>>
>>> QEMU loads the kernel at 0x400000 by default which works most of
>>> the time as Linux kernels are relocatable, 64bit and compiled with "-pie"
>>> (position independent code). This works for a little endian zImage too.
>>>
>>> However a big endian zImage is compiled without -pie, is 32bit, linked to
>>> 0x4000000 so current QEMU ends up loading it at
>>> 0x4400000 but keeps spapr->kernel_addr unchanged so booting fails.
>>>
>>> This uses the kernel address returned from load_elf().
>>> If the default kernel_addr is used, there is no change in behavior (as
>>> translate_kernel_address() takes care of this), which is:
>>> LE/BE vmlinux and LE zImage boot, BE zImage does not.
>>> If the VM created with "-machine kernel-addr=0,x-vof=on", then QEMU
>>> prints a warning and BE zImage boots.
>> 
>> I think we can fix this without needing a different command line for BE
>> zImage (apart from x-vof, which is a separate matter).
>> 
>> If you look at translate_kernel_address, it cannot really work when the
>> ELF PhysAddr is != 0. We would always hit this sort of 0x4400000 issue,
>> so if we fix that function like this...
>> 
>> static uint64_t translate_kernel_address(void *opaque, uint64_t addr)
>> {
>>      SpaprMachineState *spapr = opaque;
>> 
>>      return addr ? addr : spapr->kernel_addr;
>> }
>
>
> The qemu elf loader is supposed to handle relocations which should be 
> calling this hook more than once, now I wonder why it is not doing so.

For relocations, it seems to only call translate_fn for s390x.

>> ...then we could always use the ELF PhysAddr if it is different from 0
>> and only use the default load addr if the ELF PhysAddr is 0. If the user
>> gives kernel_addr on the cmdline, we honor that, even if puts the kernel
>> over the firmware (we have code to detect that).
>
>
> ELF address is 0 for LE zImage only, vmlinux BE/LE uses 
> 0xc000000000000000. And we are already chopping all these tops bits off 
> in translate_kernel_address() and I do not really know why _exactly_ it 
> is 0x0fffffff and not let's say 0x7fffffff.

Oh, I am not talking about the ELF entry point. And that is not what
load_elf passes to translate_kernel_address either. It is the ELF
PhysAddr:

$ powerpc64le-buildroot-linux-gnu-readelf -We vmlinux | tail

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  LOAD           0x010000 0xc000000000000000 0x0000000000000000 0x28d84d4 0x2a54ea8 RWE 0x10000

So it is 0x400000 for BE zImage and 0 for vmlinux.

>> 
>>> @@ -2988,6 +2990,12 @@ static void spapr_machine_init(MachineState *machine)
>>>               exit(1);
>>>           }
>>>   
>>> +        if (spapr->kernel_addr != loaded_addr) {
>> 
>> This could be:
>> 
>>          if (spapr->kernel_addr == KERNEL_LOAD_ADDR &&
>> 	    spapr->kernel_addr != loaded_addr) {
>> 
>> So the precedence would be:
>> 
>> 1- ELF PhysAddr, if != 0. After all, that is what it's for. BE zImage
>>     falls here;
>>      
>> 2- KERNEL_LOAD_ADDR. Via translate_kernel_address, LE/BE vmlinux fall
>>     here;
>> 
>> 3- kernel_addr. The user is probably hacking something, just use what
>>     they gave us. QEMU will yell if they load the kernel over the fw.
>
>
> imho too complicated.
>
> What if the user runs QEMU with kernel-addr=0x400000? (0x400000 is 
> KERNEL_LOAD_ADDR noooow but not necessarily forever). Is it 2) or 3)?

Good point. It should always be 3. It is a bad user interface to have a
command line option and arbitrarily ignore it. Be it 0x0 or 0x400000.

> I am basically fixing a bug when we ignore where load_elf() loaded the 
> ELF and just assume it is KERNEL_LOAD_ADDR. Now the code checks if the 
> ELF was loaded where we want it to be.

That bug is already accounted for, that is why we have
translate_kernel_address:

  /* address_offset is hack for kernel images that are
     linked at the wrong physical address.  */
  if (translate_fn) {
      addr = translate_fn(translate_opaque, ph->p_paddr);

So we're actively telling load_elf to load the kernel at the wrong place
when we do:

(ph->p_addr & 0x0fffffff) + spapr->kernel_addr

It just happened to work so far because the vmlinux PhysAddr is 0.

When you set kernel-addr=0 you are simply working around this other bug
because:

(ph->p_addr & 0x0fffffff) + 0 == ph->p_addr

I just want to remove this indirection and use the ELF PhysAddr
directly.

> Everything else can be done but on top of this.

If you want to merge this I could send another patch on top of it later,
no worries. I realise that this a larger change that will need more
testing.

>>> +            warn_report("spapr: kernel_addr changed from 0x%lx to 0x%lx",
>>> +                        spapr->kernel_addr, loaded_addr);
>>> +            spapr->kernel_addr = loaded_addr;
>>> +        }
>>> +
>>>           /* load initrd */
>>>           if (initrd_filename) {
>>>               /* Try to locate the initrd in the gap between the kernel
Alexey Kardashevskiy May 6, 2022, 4:49 a.m. UTC | #6
On 06/05/2022 01:50, Fabiano Rosas wrote:
> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
> 
>> On 5/5/22 05:16, Fabiano Rosas wrote:
>>> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
>>>
>>>> tl;dr: This allows Big Endian zImage booting via -kernel + x-vof=on.
>>>>
>>>> QEMU loads the kernel at 0x400000 by default which works most of
>>>> the time as Linux kernels are relocatable, 64bit and compiled with "-pie"
>>>> (position independent code). This works for a little endian zImage too.
>>>>
>>>> However a big endian zImage is compiled without -pie, is 32bit, linked to
>>>> 0x4000000 so current QEMU ends up loading it at
>>>> 0x4400000 but keeps spapr->kernel_addr unchanged so booting fails.
>>>>
>>>> This uses the kernel address returned from load_elf().
>>>> If the default kernel_addr is used, there is no change in behavior (as
>>>> translate_kernel_address() takes care of this), which is:
>>>> LE/BE vmlinux and LE zImage boot, BE zImage does not.
>>>> If the VM created with "-machine kernel-addr=0,x-vof=on", then QEMU
>>>> prints a warning and BE zImage boots.
>>>
>>> I think we can fix this without needing a different command line for BE
>>> zImage (apart from x-vof, which is a separate matter).
>>>
>>> If you look at translate_kernel_address, it cannot really work when the
>>> ELF PhysAddr is != 0. We would always hit this sort of 0x4400000 issue,
>>> so if we fix that function like this...
>>>
>>> static uint64_t translate_kernel_address(void *opaque, uint64_t addr)
>>> {
>>>       SpaprMachineState *spapr = opaque;
>>>
>>>       return addr ? addr : spapr->kernel_addr;
>>> }
>>
>>
>> The qemu elf loader is supposed to handle relocations which should be
>> calling this hook more than once, now I wonder why it is not doing so.
> 
> For relocations, it seems to only call translate_fn for s390x.


Agrh. You made me read this :) It is quite weird code and yeah 390-only :-/


>>> ...then we could always use the ELF PhysAddr if it is different from 0
>>> and only use the default load addr if the ELF PhysAddr is 0. If the user
>>> gives kernel_addr on the cmdline, we honor that, even if puts the kernel
>>> over the firmware (we have code to detect that).
>>
>>
>> ELF address is 0 for LE zImage only, vmlinux BE/LE uses
>> 0xc000000000000000. And we are already chopping all these tops bits off
>> in translate_kernel_address() and I do not really know why _exactly_ it
>> is 0x0fffffff and not let's say 0x7fffffff.
> 
> Oh, I am not talking about the ELF entry point. And that is not what
> load_elf passes to translate_kernel_address either. It is the ELF
> PhysAddr:
> 
> $ powerpc64le-buildroot-linux-gnu-readelf -We vmlinux | tail
> 
> Program Headers:
>    Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
>    LOAD           0x010000 0xc000000000000000 0x0000000000000000 0x28d84d4 0x2a54ea8 RWE 0x10000
> 
> So it is 0x400000 for BE zImage and 0 for vmlinux.

Ah right. Me wrong.

btw potentially there can be more program sections.

[fstn1-p1 ~]$ readelf -l /home/aik/p/slof/board-qemu/llfw/stage1.elf

Elf file type is EXEC (Executable file)
Entry point 0x100
There are 2 program headers, starting at offset 64

Program Headers:
   Type           Offset             VirtAddr           PhysAddr
                  FileSiz            MemSiz              Flags  Align
   LOAD           0x0000000000000100 0x0000000000000100 0x0000000000000100
                  0x0000000000008110 0x0000000000010290  RWE    0x4000
   LOAD           0x0000000000008210 0x0000000000010400 0x0000000000010400
                  0x0000000000000010 0x0000000000000010  RW     0x8


grub might be similar.


> 
>>>
>>>> @@ -2988,6 +2990,12 @@ static void spapr_machine_init(MachineState *machine)
>>>>                exit(1);
>>>>            }
>>>>    
>>>> +        if (spapr->kernel_addr != loaded_addr) {
>>>
>>> This could be:
>>>
>>>           if (spapr->kernel_addr == KERNEL_LOAD_ADDR &&
>>> 	    spapr->kernel_addr != loaded_addr) {
>>>
>>> So the precedence would be:
>>>
>>> 1- ELF PhysAddr, if != 0. After all, that is what it's for. BE zImage
>>>      falls here;
>>>       
>>> 2- KERNEL_LOAD_ADDR. Via translate_kernel_address, LE/BE vmlinux fall
>>>      here;
>>>
>>> 3- kernel_addr. The user is probably hacking something, just use what
>>>      they gave us. QEMU will yell if they load the kernel over the fw.
>>
>>
>> imho too complicated.
>>
>> What if the user runs QEMU with kernel-addr=0x400000? (0x400000 is
>> KERNEL_LOAD_ADDR noooow but not necessarily forever). Is it 2) or 3)?
> 
> Good point. It should always be 3. It is a bad user interface to have a
> command line option and arbitrarily ignore it. Be it 0x0 or 0x400000.
> 
>> I am basically fixing a bug when we ignore where load_elf() loaded the
>> ELF and just assume it is KERNEL_LOAD_ADDR. Now the code checks if the
>> ELF was loaded where we want it to be.
> 
> That bug is already accounted for, that is why we have
> translate_kernel_address:
> 
>    /* address_offset is hack for kernel images that are
>       linked at the wrong physical address.  */
>    if (translate_fn) {
>        addr = translate_fn(translate_opaque, ph->p_paddr);
> 
> So we're actively telling load_elf to load the kernel at the wrong place
> when we do:
> 
> (ph->p_addr & 0x0fffffff) + spapr->kernel_addr
> 
> It just happened to work so far because the vmlinux PhysAddr is 0.
> 
> When you set kernel-addr=0 you are simply working around this other bug
> because:
> 
> (ph->p_addr & 0x0fffffff) + 0 == ph->p_addr
> 
> I just want to remove this indirection and use the ELF PhysAddr
> directly.


That's alright but not in translate_kernel_address(). May be I should 
rename kernel-addr to kernel-offset (which it really is)  or  hack 
load_elf() so it would take the desired location and work out the offset 
inside (and ditch the translate_fn hook) but either way we'll need 
heuristics (or the user should know) to know if the image is 
self-relocatable or not as otherwise it just won't boot.



>> Everything else can be done but on top of this.
> 
> If you want to merge this I could send another patch on top of it later,
> no worries. I realise that this a larger change that will need more
> testing.


I want to have some easy-to-explain way of booting BE zImage :)


> 
>>>> +            warn_report("spapr: kernel_addr changed from 0x%lx to 0x%lx",
>>>> +                        spapr->kernel_addr, loaded_addr);
>>>> +            spapr->kernel_addr = loaded_addr;
>>>> +        }
>>>> +
>>>>            /* load initrd */
>>>>            if (initrd_filename) {
>>>>                /* Try to locate the initrd in the gap between the kernel
Fabiano Rosas May 11, 2022, 5:47 p.m. UTC | #7
Alexey Kardashevskiy <aik@ozlabs.ru> writes:

> On 06/05/2022 01:50, Fabiano Rosas wrote:
>> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
>> 
>>> On 5/5/22 05:16, Fabiano Rosas wrote:
>>>> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
>>>>
>>>>> tl;dr: This allows Big Endian zImage booting via -kernel + x-vof=on.
>>>>>
>>>>> QEMU loads the kernel at 0x400000 by default which works most of
>>>>> the time as Linux kernels are relocatable, 64bit and compiled with "-pie"
>>>>> (position independent code). This works for a little endian zImage too.
>>>>>
>>>>> However a big endian zImage is compiled without -pie, is 32bit, linked to
>>>>> 0x4000000 so current QEMU ends up loading it at
>>>>> 0x4400000 but keeps spapr->kernel_addr unchanged so booting fails.
>>>>>
>>>>> This uses the kernel address returned from load_elf().
>>>>> If the default kernel_addr is used, there is no change in behavior (as
>>>>> translate_kernel_address() takes care of this), which is:
>>>>> LE/BE vmlinux and LE zImage boot, BE zImage does not.
>>>>> If the VM created with "-machine kernel-addr=0,x-vof=on", then QEMU
>>>>> prints a warning and BE zImage boots.
>>>>
>>>> I think we can fix this without needing a different command line for BE
>>>> zImage (apart from x-vof, which is a separate matter).
>>>>
>>>> If you look at translate_kernel_address, it cannot really work when the
>>>> ELF PhysAddr is != 0. We would always hit this sort of 0x4400000 issue,
>>>> so if we fix that function like this...
>>>>
>>>> static uint64_t translate_kernel_address(void *opaque, uint64_t addr)
>>>> {
>>>>       SpaprMachineState *spapr = opaque;
>>>>
>>>>       return addr ? addr : spapr->kernel_addr;
>>>> }
>>>
>>>
>>> The qemu elf loader is supposed to handle relocations which should be
>>> calling this hook more than once, now I wonder why it is not doing so.
>> 
>> For relocations, it seems to only call translate_fn for s390x.
>
>
> Agrh. You made me read this :) It is quite weird code and yeah 390-only :-/
>
>
>>>> ...then we could always use the ELF PhysAddr if it is different from 0
>>>> and only use the default load addr if the ELF PhysAddr is 0. If the user
>>>> gives kernel_addr on the cmdline, we honor that, even if puts the kernel
>>>> over the firmware (we have code to detect that).
>>>
>>>
>>> ELF address is 0 for LE zImage only, vmlinux BE/LE uses
>>> 0xc000000000000000. And we are already chopping all these tops bits off
>>> in translate_kernel_address() and I do not really know why _exactly_ it
>>> is 0x0fffffff and not let's say 0x7fffffff.
>> 
>> Oh, I am not talking about the ELF entry point. And that is not what
>> load_elf passes to translate_kernel_address either. It is the ELF
>> PhysAddr:
>> 
>> $ powerpc64le-buildroot-linux-gnu-readelf -We vmlinux | tail
>> 
>> Program Headers:
>>    Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
>>    LOAD           0x010000 0xc000000000000000 0x0000000000000000 0x28d84d4 0x2a54ea8 RWE 0x10000
>> 
>> So it is 0x400000 for BE zImage and 0 for vmlinux.
>
> Ah right. Me wrong.
>
> btw potentially there can be more program sections.
>
> [fstn1-p1 ~]$ readelf -l /home/aik/p/slof/board-qemu/llfw/stage1.elf
>
> Elf file type is EXEC (Executable file)
> Entry point 0x100
> There are 2 program headers, starting at offset 64
>
> Program Headers:
>    Type           Offset             VirtAddr           PhysAddr
>                   FileSiz            MemSiz              Flags  Align
>    LOAD           0x0000000000000100 0x0000000000000100 0x0000000000000100
>                   0x0000000000008110 0x0000000000010290  RWE    0x4000
>    LOAD           0x0000000000008210 0x0000000000010400 0x0000000000010400
>                   0x0000000000000010 0x0000000000000010  RW     0x8
>
>
> grub might be similar.
>
>
>> 
>>>>
>>>>> @@ -2988,6 +2990,12 @@ static void spapr_machine_init(MachineState *machine)
>>>>>                exit(1);
>>>>>            }
>>>>>    
>>>>> +        if (spapr->kernel_addr != loaded_addr) {
>>>>
>>>> This could be:
>>>>
>>>>           if (spapr->kernel_addr == KERNEL_LOAD_ADDR &&
>>>> 	    spapr->kernel_addr != loaded_addr) {
>>>>
>>>> So the precedence would be:
>>>>
>>>> 1- ELF PhysAddr, if != 0. After all, that is what it's for. BE zImage
>>>>      falls here;
>>>>       
>>>> 2- KERNEL_LOAD_ADDR. Via translate_kernel_address, LE/BE vmlinux fall
>>>>      here;
>>>>
>>>> 3- kernel_addr. The user is probably hacking something, just use what
>>>>      they gave us. QEMU will yell if they load the kernel over the fw.
>>>
>>>
>>> imho too complicated.
>>>
>>> What if the user runs QEMU with kernel-addr=0x400000? (0x400000 is
>>> KERNEL_LOAD_ADDR noooow but not necessarily forever). Is it 2) or 3)?
>> 
>> Good point. It should always be 3. It is a bad user interface to have a
>> command line option and arbitrarily ignore it. Be it 0x0 or 0x400000.
>> 
>>> I am basically fixing a bug when we ignore where load_elf() loaded the
>>> ELF and just assume it is KERNEL_LOAD_ADDR. Now the code checks if the
>>> ELF was loaded where we want it to be.
>> 
>> That bug is already accounted for, that is why we have
>> translate_kernel_address:
>> 
>>    /* address_offset is hack for kernel images that are
>>       linked at the wrong physical address.  */
>>    if (translate_fn) {
>>        addr = translate_fn(translate_opaque, ph->p_paddr);
>> 
>> So we're actively telling load_elf to load the kernel at the wrong place
>> when we do:
>> 
>> (ph->p_addr & 0x0fffffff) + spapr->kernel_addr
>> 
>> It just happened to work so far because the vmlinux PhysAddr is 0.
>> 
>> When you set kernel-addr=0 you are simply working around this other bug
>> because:
>> 
>> (ph->p_addr & 0x0fffffff) + 0 == ph->p_addr
>> 
>> I just want to remove this indirection and use the ELF PhysAddr
>> directly.
>
>
> That's alright but not in translate_kernel_address(). May be I should 
> rename kernel-addr to kernel-offset (which it really is)  or  hack 
> load_elf() so it would take the desired location and work out the offset 
> inside (and ditch the translate_fn hook) but either way we'll need 
> heuristics (or the user should know) to know if the image is 
> self-relocatable or not as otherwise it just won't boot.
>
>
>
>>> Everything else can be done but on top of this.
>> 
>> If you want to merge this I could send another patch on top of it later,
>> no worries. I realise that this a larger change that will need more
>> testing.
>
>
> I want to have some easy-to-explain way of booting BE zImage :)

Ok, I guess I can live with this kernel-addr=0 ugliness. We can deal
with translate_kernel_address another time.

Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Daniel Henrique Barboza May 12, 2022, 5:47 p.m. UTC | #8
On 5/4/22 03:55, Alexey Kardashevskiy wrote:
> tl;dr: This allows Big Endian zImage booting via -kernel + x-vof=on.
> 
> QEMU loads the kernel at 0x400000 by default which works most of
> the time as Linux kernels are relocatable, 64bit and compiled with "-pie"
> (position independent code). This works for a little endian zImage too.
> 
> However a big endian zImage is compiled without -pie, is 32bit, linked to
> 0x4000000 so current QEMU ends up loading it at
> 0x4400000 but keeps spapr->kernel_addr unchanged so booting fails.
> 
> This uses the kernel address returned from load_elf().
> If the default kernel_addr is used, there is no change in behavior (as
> translate_kernel_address() takes care of this), which is:
> LE/BE vmlinux and LE zImage boot, BE zImage does not.
> If the VM created with "-machine kernel-addr=0,x-vof=on", then QEMU
> prints a warning and BE zImage boots.
> 
> Note #1: SLOF (x-vof=off) still cannot boot a big endian zImage as
> SLOF enables MSR_SF for everything loaded by QEMU and this leads to early
> crash of 32bit zImage.
> 
> Note #2: BE/LE vmlinux images set MSR_SF in early boot so these just work;
> a LE zImage restores MSR_SF after every CI call and we are lucky enough
> not to crash before the first CI call.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---

Queued in gitlab.com/danielhb/qemu/tree/ppc-next. Thanks,


Daniel


> 
> We could probably change SLOF to always clear MSR_SF before jumping to
> the kernel but this is 1) SLOF fix 2) not quite sure if it brings
> lots of value.
> 
> 
> I really wish I had this when tested this fix:
> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20220406070038.3704604-1-aik@ozlabs.ru/
> 
> ---
>   hw/ppc/spapr.c | 12 ++++++++++--
>   1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index a4372ba1891e..89f18f6564bd 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2971,14 +2971,16 @@ static void spapr_machine_init(MachineState *machine)
>       }
>   
>       if (kernel_filename) {
> +        uint64_t loaded_addr = 0;
> +
>           spapr->kernel_size = load_elf(kernel_filename, NULL,
>                                         translate_kernel_address, spapr,
> -                                      NULL, NULL, NULL, NULL, 1,
> +                                      NULL, &loaded_addr, NULL, NULL, 1,
>                                         PPC_ELF_MACHINE, 0, 0);
>           if (spapr->kernel_size == ELF_LOAD_WRONG_ENDIAN) {
>               spapr->kernel_size = load_elf(kernel_filename, NULL,
>                                             translate_kernel_address, spapr,
> -                                          NULL, NULL, NULL, NULL, 0,
> +                                          NULL, &loaded_addr, NULL, NULL, 0,
>                                             PPC_ELF_MACHINE, 0, 0);
>               spapr->kernel_le = spapr->kernel_size > 0;
>           }
> @@ -2988,6 +2990,12 @@ static void spapr_machine_init(MachineState *machine)
>               exit(1);
>           }
>   
> +        if (spapr->kernel_addr != loaded_addr) {
> +            warn_report("spapr: kernel_addr changed from 0x%lx to 0x%lx",
> +                        spapr->kernel_addr, loaded_addr);
> +            spapr->kernel_addr = loaded_addr;
> +        }
> +
>           /* load initrd */
>           if (initrd_filename) {
>               /* Try to locate the initrd in the gap between the kernel
Daniel Henrique Barboza May 17, 2022, 6:58 p.m. UTC | #9
Alexey,

I had to amend your commit due to Gitlab CI complaining about ...

On 5/4/22 03:55, Alexey Kardashevskiy wrote:
> tl;dr: This allows Big Endian zImage booting via -kernel + x-vof=on.
> 
> QEMU loads the kernel at 0x400000 by default which works most of
> the time as Linux kernels are relocatable, 64bit and compiled with "-pie"
> (position independent code). This works for a little endian zImage too.
> 
> However a big endian zImage is compiled without -pie, is 32bit, linked to
> 0x4000000 so current QEMU ends up loading it at
> 0x4400000 but keeps spapr->kernel_addr unchanged so booting fails.
> 
> This uses the kernel address returned from load_elf().
> If the default kernel_addr is used, there is no change in behavior (as
> translate_kernel_address() takes care of this), which is:
> LE/BE vmlinux and LE zImage boot, BE zImage does not.
> If the VM created with "-machine kernel-addr=0,x-vof=on", then QEMU
> prints a warning and BE zImage boots.
> 
> Note #1: SLOF (x-vof=off) still cannot boot a big endian zImage as
> SLOF enables MSR_SF for everything loaded by QEMU and this leads to early
> crash of 32bit zImage.
> 
> Note #2: BE/LE vmlinux images set MSR_SF in early boot so these just work;
> a LE zImage restores MSR_SF after every CI call and we are lucky enough
> not to crash before the first CI call.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> 
> We could probably change SLOF to always clear MSR_SF before jumping to
> the kernel but this is 1) SLOF fix 2) not quite sure if it brings
> lots of value.
> 
> 
> I really wish I had this when tested this fix:
> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20220406070038.3704604-1-aik@ozlabs.ru/
> 
> ---
>   hw/ppc/spapr.c | 12 ++++++++++--
>   1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index a4372ba1891e..89f18f6564bd 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2971,14 +2971,16 @@ static void spapr_machine_init(MachineState *machine)
>       }
>   
>       if (kernel_filename) {
> +        uint64_t loaded_addr = 0;
> +
>           spapr->kernel_size = load_elf(kernel_filename, NULL,
>                                         translate_kernel_address, spapr,
> -                                      NULL, NULL, NULL, NULL, 1,
> +                                      NULL, &loaded_addr, NULL, NULL, 1,
>                                         PPC_ELF_MACHINE, 0, 0);
>           if (spapr->kernel_size == ELF_LOAD_WRONG_ENDIAN) {
>               spapr->kernel_size = load_elf(kernel_filename, NULL,
>                                             translate_kernel_address, spapr,
> -                                          NULL, NULL, NULL, NULL, 0,
> +                                          NULL, &loaded_addr, NULL, NULL, 0,
>                                             PPC_ELF_MACHINE, 0, 0);
>               spapr->kernel_le = spapr->kernel_size > 0;
>           }
> @@ -2988,6 +2990,12 @@ static void spapr_machine_init(MachineState *machine)
>               exit(1);
>           }
>   
> +        if (spapr->kernel_addr != loaded_addr) {
> +            warn_report("spapr: kernel_addr changed from 0x%lx to 0x%lx",
> +                        spapr->kernel_addr, loaded_addr);


... this code. This is problematic when compiling in a 32 bit environment because
the definition of long (long) unsigned differs from the usual 64 bit env we use:



../hw/ppc/spapr.c: In function 'spapr_machine_init':
../hw/ppc/spapr.c:2998:25: error: format '%lx' expects argument of type 'long unsigned int', but argument 2 has type 'uint64_t' {aka 'long long unsigned int'} [-Werror=format=]
  2998 |             warn_report("spapr: kernel_addr changed from 0x%lx to 0x%lx",
       |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  2999 |                         spapr->kernel_addr, loaded_addr);
       |                         ~~~~~~~~~~~~~~~~~~
       |                              |
       |                              uint64_t {aka long long unsigned int}
../hw/ppc/spapr.c:2998:25: error: format '%lx' expects argument of type 'long unsigned int', but argument 3 has type 'uint64_t' {aka 'long long unsigned int'} [-Werror=format=]
  2998 |             warn_report("spapr: kernel_addr changed from 0x%lx to 0x%lx",
       |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  2999 |                         spapr->kernel_addr, loaded_addr);
       |                                             ~~~~~~~~~~~
       |                                             |
       |                                             uint64_t {aka long long unsigned int}
cc1: all warnings being treated as errors


I've fixed it by doing the following:


diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 156e799ae9..8d5bdfc20f 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2995,7 +2995,7 @@ static void spapr_machine_init(MachineState *machine)
          }
  
          if (spapr->kernel_addr != loaded_addr) {
-            warn_report("spapr: kernel_addr changed from 0x%lx to 0x%lx",
+            warn_report("spapr: kernel_addr changed from 0x%"PRIx64
+                        " to 0x%"PRIx64,
                          spapr->kernel_addr, loaded_addr);
              spapr->kernel_addr = loaded_addr;
          }



If you're ok with this fixup we can keep it as is. Otherwise feel free to send
another version.


Thanks,


Daniel





> +            spapr->kernel_addr = loaded_addr;
> +        }
> +
>           /* load initrd */
>           if (initrd_filename) {
>               /* Try to locate the initrd in the gap between the kernel
Alexey Kardashevskiy May 18, 2022, 2:51 a.m. UTC | #10
On 5/18/22 04:58, Daniel Henrique Barboza wrote:
> Alexey,
> 
> I had to amend your commit due to Gitlab CI complaining about ...
> 
> On 5/4/22 03:55, Alexey Kardashevskiy wrote:
>> tl;dr: This allows Big Endian zImage booting via -kernel + x-vof=on.
>>
>> QEMU loads the kernel at 0x400000 by default which works most of
>> the time as Linux kernels are relocatable, 64bit and compiled with "-pie"
>> (position independent code). This works for a little endian zImage too.
>>
>> However a big endian zImage is compiled without -pie, is 32bit, linked to
>> 0x4000000 so current QEMU ends up loading it at
>> 0x4400000 but keeps spapr->kernel_addr unchanged so booting fails.
>>
>> This uses the kernel address returned from load_elf().
>> If the default kernel_addr is used, there is no change in behavior (as
>> translate_kernel_address() takes care of this), which is:
>> LE/BE vmlinux and LE zImage boot, BE zImage does not.
>> If the VM created with "-machine kernel-addr=0,x-vof=on", then QEMU
>> prints a warning and BE zImage boots.
>>
>> Note #1: SLOF (x-vof=off) still cannot boot a big endian zImage as
>> SLOF enables MSR_SF for everything loaded by QEMU and this leads to early
>> crash of 32bit zImage.
>>
>> Note #2: BE/LE vmlinux images set MSR_SF in early boot so these just 
>> work;
>> a LE zImage restores MSR_SF after every CI call and we are lucky enough
>> not to crash before the first CI call.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>
>> We could probably change SLOF to always clear MSR_SF before jumping to
>> the kernel but this is 1) SLOF fix 2) not quite sure if it brings
>> lots of value.
>>
>>
>> I really wish I had this when tested this fix:
>> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20220406070038.3704604-1-aik@ozlabs.ru/
>>
>> ---
>>   hw/ppc/spapr.c | 12 ++++++++++--
>>   1 file changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index a4372ba1891e..89f18f6564bd 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -2971,14 +2971,16 @@ static void spapr_machine_init(MachineState 
>> *machine)
>>       }
>>       if (kernel_filename) {
>> +        uint64_t loaded_addr = 0;
>> +
>>           spapr->kernel_size = load_elf(kernel_filename, NULL,
>>                                         translate_kernel_address, spapr,
>> -                                      NULL, NULL, NULL, NULL, 1,
>> +                                      NULL, &loaded_addr, NULL, NULL, 1,
>>                                         PPC_ELF_MACHINE, 0, 0);
>>           if (spapr->kernel_size == ELF_LOAD_WRONG_ENDIAN) {
>>               spapr->kernel_size = load_elf(kernel_filename, NULL,
>>                                             translate_kernel_address, 
>> spapr,
>> -                                          NULL, NULL, NULL, NULL, 0,
>> +                                          NULL, &loaded_addr, NULL, 
>> NULL, 0,
>>                                             PPC_ELF_MACHINE, 0, 0);
>>               spapr->kernel_le = spapr->kernel_size > 0;
>>           }
>> @@ -2988,6 +2990,12 @@ static void spapr_machine_init(MachineState 
>> *machine)
>>               exit(1);
>>           }
>> +        if (spapr->kernel_addr != loaded_addr) {
>> +            warn_report("spapr: kernel_addr changed from 0x%lx to 
>> 0x%lx",
>> +                        spapr->kernel_addr, loaded_addr);
> 
> 
> ... this code. This is problematic when compiling in a 32 bit 
> environment because
> the definition of long (long) unsigned differs from the usual 64 bit env 
> we use:
> 
> 
> 
> ../hw/ppc/spapr.c: In function 'spapr_machine_init':
> ../hw/ppc/spapr.c:2998:25: error: format '%lx' expects argument of type 
> 'long unsigned int', but argument 2 has type 'uint64_t' {aka 'long long 
> unsigned int'} [-Werror=format=]
>   2998 |             warn_report("spapr: kernel_addr changed from 0x%lx 
> to 0x%lx",
>        |                         
> ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>   2999 |                         spapr->kernel_addr, loaded_addr);
>        |                         ~~~~~~~~~~~~~~~~~~
>        |                              |
>        |                              uint64_t {aka long long unsigned int}
> ../hw/ppc/spapr.c:2998:25: error: format '%lx' expects argument of type 
> 'long unsigned int', but argument 3 has type 'uint64_t' {aka 'long long 
> unsigned int'} [-Werror=format=]
>   2998 |             warn_report("spapr: kernel_addr changed from 0x%lx 
> to 0x%lx",
>        |                         
> ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>   2999 |                         spapr->kernel_addr, loaded_addr);
>        |                                             ~~~~~~~~~~~
>        |                                             |
>        |                                             uint64_t {aka long 
> long unsigned int}
> cc1: all warnings being treated as errors
> 
> 
> I've fixed it by doing the following:
> 
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 156e799ae9..8d5bdfc20f 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2995,7 +2995,7 @@ static void spapr_machine_init(MachineState *machine)
>           }
> 
>           if (spapr->kernel_addr != loaded_addr) {
> -            warn_report("spapr: kernel_addr changed from 0x%lx to 0x%lx",
> +            warn_report("spapr: kernel_addr changed from 0x%"PRIx64
> +                        " to 0x%"PRIx64,
>                           spapr->kernel_addr, loaded_addr);
>               spapr->kernel_addr = loaded_addr;
>           }
> 
> 
> 
> If you're ok with this fixup we can keep it as is. Otherwise feel free 
> to send
> another version.


I am totally fine with this change, sorry I have not compile tested it, 
just assumed this cannot fail :-/


> 
> 
> Thanks,
> 
> 
> Daniel
> 
> 
> 
> 
> 
>> +            spapr->kernel_addr = loaded_addr;
>> +        }
>> +
>>           /* load initrd */
>>           if (initrd_filename) {
>>               /* Try to locate the initrd in the gap between the kernel
Daniel Henrique Barboza May 18, 2022, 10:07 a.m. UTC | #11
On 5/17/22 23:51, Alexey Kardashevskiy wrote:
> 
> 
> On 5/18/22 04:58, Daniel Henrique Barboza wrote:
>> Alexey,
>>
>> I had to amend your commit due to Gitlab CI complaining about ...
>>
>> On 5/4/22 03:55, Alexey Kardashevskiy wrote:
>>> tl;dr: This allows Big Endian zImage booting via -kernel + x-vof=on.
>>>
>>> QEMU loads the kernel at 0x400000 by default which works most of
>>> the time as Linux kernels are relocatable, 64bit and compiled with "-pie"
>>> (position independent code). This works for a little endian zImage too.
>>>
>>> However a big endian zImage is compiled without -pie, is 32bit, linked to
>>> 0x4000000 so current QEMU ends up loading it at
>>> 0x4400000 but keeps spapr->kernel_addr unchanged so booting fails.
>>>
>>> This uses the kernel address returned from load_elf().
>>> If the default kernel_addr is used, there is no change in behavior (as
>>> translate_kernel_address() takes care of this), which is:
>>> LE/BE vmlinux and LE zImage boot, BE zImage does not.
>>> If the VM created with "-machine kernel-addr=0,x-vof=on", then QEMU
>>> prints a warning and BE zImage boots.
>>>
>>> Note #1: SLOF (x-vof=off) still cannot boot a big endian zImage as
>>> SLOF enables MSR_SF for everything loaded by QEMU and this leads to early
>>> crash of 32bit zImage.
>>>
>>> Note #2: BE/LE vmlinux images set MSR_SF in early boot so these just work;
>>> a LE zImage restores MSR_SF after every CI call and we are lucky enough
>>> not to crash before the first CI call.
>>>
>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>> ---
>>>
>>> We could probably change SLOF to always clear MSR_SF before jumping to
>>> the kernel but this is 1) SLOF fix 2) not quite sure if it brings
>>> lots of value.
>>>
>>>
>>> I really wish I had this when tested this fix:
>>> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20220406070038.3704604-1-aik@ozlabs.ru/
>>>
>>> ---
>>>   hw/ppc/spapr.c | 12 ++++++++++--
>>>   1 file changed, 10 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>> index a4372ba1891e..89f18f6564bd 100644
>>> --- a/hw/ppc/spapr.c
>>> +++ b/hw/ppc/spapr.c
>>> @@ -2971,14 +2971,16 @@ static void spapr_machine_init(MachineState *machine)
>>>       }
>>>       if (kernel_filename) {
>>> +        uint64_t loaded_addr = 0;
>>> +
>>>           spapr->kernel_size = load_elf(kernel_filename, NULL,
>>>                                         translate_kernel_address, spapr,
>>> -                                      NULL, NULL, NULL, NULL, 1,
>>> +                                      NULL, &loaded_addr, NULL, NULL, 1,
>>>                                         PPC_ELF_MACHINE, 0, 0);
>>>           if (spapr->kernel_size == ELF_LOAD_WRONG_ENDIAN) {
>>>               spapr->kernel_size = load_elf(kernel_filename, NULL,
>>>                                             translate_kernel_address, spapr,
>>> -                                          NULL, NULL, NULL, NULL, 0,
>>> +                                          NULL, &loaded_addr, NULL, NULL, 0,
>>>                                             PPC_ELF_MACHINE, 0, 0);
>>>               spapr->kernel_le = spapr->kernel_size > 0;
>>>           }
>>> @@ -2988,6 +2990,12 @@ static void spapr_machine_init(MachineState *machine)
>>>               exit(1);
>>>           }
>>> +        if (spapr->kernel_addr != loaded_addr) {
>>> +            warn_report("spapr: kernel_addr changed from 0x%lx to 0x%lx",
>>> +                        spapr->kernel_addr, loaded_addr);
>>
>>
>> ... this code. This is problematic when compiling in a 32 bit environment because
>> the definition of long (long) unsigned differs from the usual 64 bit env we use:
>>
>>
>>
>> ../hw/ppc/spapr.c: In function 'spapr_machine_init':
>> ../hw/ppc/spapr.c:2998:25: error: format '%lx' expects argument of type 'long unsigned int', but argument 2 has type 'uint64_t' {aka 'long long unsigned int'} [-Werror=format=]
>>   2998 |             warn_report("spapr: kernel_addr changed from 0x%lx to 0x%lx",
>>        | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>   2999 |                         spapr->kernel_addr, loaded_addr);
>>        |                         ~~~~~~~~~~~~~~~~~~
>>        |                              |
>>        |                              uint64_t {aka long long unsigned int}
>> ../hw/ppc/spapr.c:2998:25: error: format '%lx' expects argument of type 'long unsigned int', but argument 3 has type 'uint64_t' {aka 'long long unsigned int'} [-Werror=format=]
>>   2998 |             warn_report("spapr: kernel_addr changed from 0x%lx to 0x%lx",
>>        | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>   2999 |                         spapr->kernel_addr, loaded_addr);
>>        |                                             ~~~~~~~~~~~
>>        |                                             |
>>        |                                             uint64_t {aka long long unsigned int}
>> cc1: all warnings being treated as errors
>>
>>
>> I've fixed it by doing the following:
>>
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 156e799ae9..8d5bdfc20f 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -2995,7 +2995,7 @@ static void spapr_machine_init(MachineState *machine)
>>           }
>>
>>           if (spapr->kernel_addr != loaded_addr) {
>> -            warn_report("spapr: kernel_addr changed from 0x%lx to 0x%lx",
>> +            warn_report("spapr: kernel_addr changed from 0x%"PRIx64
>> +                        " to 0x%"PRIx64,
>>                           spapr->kernel_addr, loaded_addr);
>>               spapr->kernel_addr = loaded_addr;
>>           }
>>
>>
>>
>> If you're ok with this fixup we can keep it as is. Otherwise feel free to send
>> another version.
> 
> 
> I am totally fine with this change, sorry I have not compile tested it, just assumed this cannot fail :-/


Nah it's fine. You would only be able to see this error if you happen
to compile it with a 32 bit environment. Not sure if this is something
you use to do here and there. I sure don't.


Daniel



> 
> 
>>
>>
>> Thanks,
>>
>>
>> Daniel
>>
>>
>>
>>
>>
>>> +            spapr->kernel_addr = loaded_addr;
>>> +        }
>>> +
>>>           /* load initrd */
>>>           if (initrd_filename) {
>>>               /* Try to locate the initrd in the gap between the kernel
>
diff mbox series

Patch

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index a4372ba1891e..89f18f6564bd 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2971,14 +2971,16 @@  static void spapr_machine_init(MachineState *machine)
     }
 
     if (kernel_filename) {
+        uint64_t loaded_addr = 0;
+
         spapr->kernel_size = load_elf(kernel_filename, NULL,
                                       translate_kernel_address, spapr,
-                                      NULL, NULL, NULL, NULL, 1,
+                                      NULL, &loaded_addr, NULL, NULL, 1,
                                       PPC_ELF_MACHINE, 0, 0);
         if (spapr->kernel_size == ELF_LOAD_WRONG_ENDIAN) {
             spapr->kernel_size = load_elf(kernel_filename, NULL,
                                           translate_kernel_address, spapr,
-                                          NULL, NULL, NULL, NULL, 0,
+                                          NULL, &loaded_addr, NULL, NULL, 0,
                                           PPC_ELF_MACHINE, 0, 0);
             spapr->kernel_le = spapr->kernel_size > 0;
         }
@@ -2988,6 +2990,12 @@  static void spapr_machine_init(MachineState *machine)
             exit(1);
         }
 
+        if (spapr->kernel_addr != loaded_addr) {
+            warn_report("spapr: kernel_addr changed from 0x%lx to 0x%lx",
+                        spapr->kernel_addr, loaded_addr);
+            spapr->kernel_addr = loaded_addr;
+        }
+
         /* load initrd */
         if (initrd_filename) {
             /* Try to locate the initrd in the gap between the kernel