[0/1] hw/arm/aspeed: Add fby35 machine type

Message ID 20220503204451.1257898-1-pdel@fb.com

Message

Peter Delevoryas May 3, 2022, 8:44 p.m. UTC
Hey everyone,

I'm submitting another Facebook (Meta Platforms) machine type: this time I'm
including an acceptance test too.

Unfortunately, this machine boots _very_ slowly. 300+ seconds. I'm not sure why
this is (so I don't know how to fix it easily) and I don't know how to override
the avocado test timeout just for a single test, so I increased the global
timeout for all "boot_linux_console.py" tests from 90s to 400s. I doubt this is
acceptable, but what other option is there? Should I add
AVOCADO_TIMEOUT_EXPECTED?

@skipUnless(os.getenv('AVOCADO_TIMEOUT_EXPECTED'), 'Test might timeout')

What is the point of this environment variable though, except to skip it in CI?
If I run the test with this environment variable defined, it doesn't disable the
timeout, it just skips it right? I want an option to run this test with a larger
timeout.
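
A minimal sketch of what the decorator quoted above does, using the standard library's `unittest.skipUnless` as a stand-in (an assumption: the avocado decorator of the same name is taken to behave the same way). The helper names `build_case` and `count_skipped` are hypothetical. The condition is evaluated once, when the class is defined, and it only decides whether the test runs; nothing in it touches a timeout.

```python
import os
import unittest

VAR = 'AVOCADO_TIMEOUT_EXPECTED'


def build_case():
    """Define the test class now, so the decorator reads the current environment."""

    class SlowBoot(unittest.TestCase):
        @unittest.skipUnless(os.getenv(VAR), 'Test might timeout')
        def test_boot(self):
            pass  # placeholder for the slow boot test

    return SlowBoot


def count_skipped(value):
    """Run the test with VAR set to value (or unset if None); return the skip count."""
    saved = os.environ.pop(VAR, None)
    try:
        if value is not None:
            os.environ[VAR] = value
        suite = unittest.defaultTestLoader.loadTestsFromTestCase(build_case())
        result = unittest.TestResult()
        suite.run(result)
        return len(result.skipped)
    finally:
        # Restore the caller's environment.
        os.environ.pop(VAR, None)
        if saved is not None:
            os.environ[VAR] = saved
```

With the variable unset the test is reported as skipped; with it set the test body runs under whatever timeout is already configured.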

Thanks,
Peter

Peter Delevoryas (1):
  hw/arm/aspeed: Add fby35 machine type

 hw/arm/aspeed.c                     | 63 +++++++++++++++++++++++++++++
 tests/avocado/boot_linux_console.py | 20 ++++++++-
 2 files changed, 82 insertions(+), 1 deletion(-)

Comments

Cédric Le Goater May 3, 2022, 9:35 p.m. UTC | #1
On 5/3/22 22:44, Peter Delevoryas wrote:
> Hey everyone,
> 
> I'm submitting another Facebook (Meta Platforms) machine type: this time I'm
> including an acceptance test too.
> 
> Unfortunately, this machine boots _very_ slowly. 300+ seconds. 

This is too much for avocado tests.

> I'm not sure why this is (so I don't know how to fix it easily)

The fuji has the same kind of problem. It takes ages to load the lzma ramdisk.
Could it be a modeling issue, or how the FW image is compiled?

Thanks,

C.


Peter Delevoryas May 3, 2022, 10:47 p.m. UTC | #2
> On May 3, 2022, at 2:35 PM, Cédric Le Goater <clg@kaod.org> wrote:
> 
> On 5/3/22 22:44, Peter Delevoryas wrote:
>> Hey everyone,
>> I'm submitting another Facebook (Meta Platforms) machine type: this time I'm
>> including an acceptance test too.
>> Unfortunately, this machine boots _very_ slowly. 300+ seconds. 
> 
> This is too much for avocado tests.

Erg, yeah I figured as much. I’ll just resubmit it without the avocado test then,
if that sounds ok to you.

> 
>> I'm not sure why this is (so I don't know how to fix it easily)
> 
> The fuji has the same kind of problem. It takes ages to load the lzma ramdisk.
> Could it be a modeling issue ? or how the FW image is compiled ?

Yeah, one reason is that Facebook OpenBMC machines have an unnecessarily
big initramfs that includes all the rootfs stuff, whereas regular OpenBMC
machines have a smaller initramfs right? I don’t entirely know what I’m talking
about though.

I think most FB machines have moved to zstd compression recently though,
but this one may have been missed: I can fix that on the image side. I’ll
also experiment more to see if it’s something wrong with the image, or possibly
a regression in QEMU. It would really be super awesome if it could boot faster,
so I’m very motivated to find a solution.

Cédric Le Goater May 4, 2022, 7:26 a.m. UTC | #3
On 5/4/22 00:47, Peter Delevoryas wrote:
> 
> 
>> On May 3, 2022, at 2:35 PM, Cédric Le Goater <clg@kaod.org> wrote:
>>
>> On 5/3/22 22:44, Peter Delevoryas wrote:
>>> Hey everyone,
>>> I'm submitting another Facebook (Meta Platforms) machine type: this time I'm
>>> including an acceptance test too.
>>> Unfortunately, this machine boots _very_ slowly. 300+ seconds.
>>
>> This is too much for avocado tests.
> 
> Erg, yeah I figured as much. I’ll just resubmit it without the avocado test then,
> if that sounds ok to you.
> 
>>
>>> I'm not sure why this is (so I don't know how to fix it easily)
>>
>> The fuji has the same kind of problem. It takes ages to load the lzma ramdisk.
>> Could it be a modeling issue ? or how the FW image is compiled ?
> 
> Yeah, one reason is that Facebook OpenBMC machines have an unnecessarily
> big initramfs that includes all the rootfs stuff, 

Indeed,

    Trying 'ramdisk@1' ramdisk subimage
      Description:  RAMDISK
      Type:         RAMDisk Image
      Compression:  lzma compressed
      Data Start:   0x2047da18
      Data Size:    21938373 Bytes = 20.9 MiB

That doesn't help for sure.

> whereas regular OpenBMC machines have a smaller initramfs right? 

Yes, about 1 MB.

> I don’t entirely know what I’m talking about though.
> 
> I think most FB machines have moved to zstd compression recently though,
> but this one may have been missed: I can fix that on the image side. I’ll
> also experiment more to see if it’s something wrong with the image, or possibly
> a regression in QEMU. It would really be super awesome if it could boot faster,
> so I’m very motivated to find a solution.

There is something else, because loading the kernel on the fuji takes
much longer than on the ast2600-evb, and it is the same size:

    Trying 'kernel@1' kernel subimage
      Description:  Linux kernel
      Type:         Kernel Image
      Compression:  uncompressed
      Data Start:   0x201000e0
      Data Size:    3659848 Bytes = 3.5 MiB
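
As a quick check of the sizes in the two U-Boot dumps in this message, a minimal sketch that converts the reported byte counts to MiB and compares them. The function name `to_mib` is hypothetical; the byte counts are copied from the dumps.

```python
def to_mib(num_bytes):
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return num_bytes / (1024 * 1024)


RAMDISK_BYTES = 21938373  # 'ramdisk@1' Data Size from the dump
KERNEL_BYTES = 3659848    # 'kernel@1' Data Size from the dump

ramdisk_mib = round(to_mib(RAMDISK_BYTES), 1)
kernel_mib = round(to_mib(KERNEL_BYTES), 1)

# How many times larger the compressed ramdisk is than the kernel image.
ratio = RAMDISK_BYTES / KERNEL_BYTES
```

The ramdisk is roughly six times the size of the kernel image, and it additionally has to be decompressed.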


Is U-Boot doing some special CPU configuration that would slow down
emulation? Maybe try profiling.

Thanks,

C.
Peter Delevoryas May 31, 2022, 6 p.m. UTC | #4
> On May 30, 2022, at 8:29 AM, Philippe Mathieu-Daudé via <qemu-arm@nongnu.org> wrote:
> 
> On 4/5/22 00:47, Peter Delevoryas wrote:
>>> On May 3, 2022, at 2:35 PM, Cédric Le Goater <clg@kaod.org> wrote:
>>> 
>>> On 5/3/22 22:44, Peter Delevoryas wrote:
>>>> Hey everyone,
>>>> I'm submitting another Facebook (Meta Platforms) machine type: this time I'm
>>>> including an acceptance test too.
>>>> Unfortunately, this machine boots _very_ slowly. 300+ seconds.
>>> 
>>> This is too much for avocado tests.
> 
> Use:
> 
>  @skipIf(os.getenv('GITLAB_CI'), 'Running on GitLab')
>  @skipUnless(os.getenv('AVOCADO_TIMEOUT_EXPECTED'),
>              'Big initramfs and run from flash')

Thanks for this suggestion!

> 
>> Erg, yeah I figured as much. I’ll just resubmit it without the avocado test then,
>> if that sounds ok to you.
> 
> No, please keep the test. While it won't run on CI, we can run it locally, which is very useful for bisecting.

Ok, I’d be happy to resubmit the test now with the @skipIf and @skipUnless decorators
(since the machine definition has been merged at this point).
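
For reference, a minimal sketch of how the two stacked decorators combine, written as a plain function rather than against avocado itself. The name `would_run` is hypothetical, and the logic assumes the decorators use ordinary truthiness on the environment values, as `os.getenv` results do.

```python
def would_run(env):
    """Return True if a test guarded by both decorators would execute.

    env is a mapping of environment variable names to string values.
    """
    # @skipIf(os.getenv('GITLAB_CI'), ...): skip whenever GITLAB_CI is set.
    if env.get('GITLAB_CI'):
        return False
    # @skipUnless(os.getenv('AVOCADO_TIMEOUT_EXPECTED'), ...): run only if set.
    return bool(env.get('AVOCADO_TIMEOUT_EXPECTED'))
```

So the test runs only on a non-GitLab machine where AVOCADO_TIMEOUT_EXPECTED is set, which matches the "run it locally" use case.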
