Message ID | 20190520231008.20140-1-mst@redhat.com |
---|---|
State | New |
Headers | show |
On Tue, 21 May 2019 at 00:10, Michael S. Tsirkin <mst@redhat.com> wrote: > > The following changes since commit 2259637b95bef3116cc262459271de08e038cc66: > > Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging (2019-05-20 17:22:05 +0100) > > are available in the Git repository at: > > git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream > > for you to fetch changes up to 0c05ec64c388aea59facbef740651afa78e04f50: > > tests: acpi: print error unable to dump ACPI table during rebuild (2019-05-20 18:40:02 -0400) > > ---------------------------------------------------------------- > pci, pc, virtio: features, fixes > > reconnect for vhost blk > tests for UEFI > misc other stuff > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com> > > ---------------------------------------------------------------- Hi -- this failed 'make check' for 32-bit Arm hosts: ERROR:/home/peter.maydell/qemu/tests/acpi-utils.c:145:acpi_find_rsdp_address_uefi: code should not be reached Aborted ERROR - too few tests run (expected 1, got 0) /home/peter.maydell/qemu/tests/Makefile.include:885: recipe for target 'check-qtest-aarch64' failed thanks -- PMM
On Tue, May 21, 2019 at 12:49:48PM +0100, Peter Maydell wrote: > On Tue, 21 May 2019 at 00:10, Michael S. Tsirkin <mst@redhat.com> wrote: > > > > The following changes since commit 2259637b95bef3116cc262459271de08e038cc66: > > > > Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging (2019-05-20 17:22:05 +0100) > > > > are available in the Git repository at: > > > > git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream > > > > for you to fetch changes up to 0c05ec64c388aea59facbef740651afa78e04f50: > > > > tests: acpi: print error unable to dump ACPI table during rebuild (2019-05-20 18:40:02 -0400) > > > > ---------------------------------------------------------------- > > pci, pc, virtio: features, fixes > > > > reconnect for vhost blk > > tests for UEFI > > misc other stuff > > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com> > > > > ---------------------------------------------------------------- > > Hi -- this failed 'make check' for 32-bit Arm hosts: > > ERROR:/home/peter.maydell/qemu/tests/acpi-utils.c:145:acpi_find_rsdp_address_uefi: > code should not be reached > Aborted > ERROR - too few tests run (expected 1, got 0) > /home/peter.maydell/qemu/tests/Makefile.include:885: recipe for target > 'check-qtest-aarch64' failed > > thanks > -- PMM Nothing jumps out ... Igor?
On Tue, May 21, 2019 at 12:49:48PM +0100, Peter Maydell wrote: > On Tue, 21 May 2019 at 00:10, Michael S. Tsirkin <mst@redhat.com> wrote: > > > > The following changes since commit 2259637b95bef3116cc262459271de08e038cc66: > > > > Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging (2019-05-20 17:22:05 +0100) > > > > are available in the Git repository at: > > > > git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream > > > > for you to fetch changes up to 0c05ec64c388aea59facbef740651afa78e04f50: > > > > tests: acpi: print error unable to dump ACPI table during rebuild (2019-05-20 18:40:02 -0400) > > > > ---------------------------------------------------------------- > > pci, pc, virtio: features, fixes > > > > reconnect for vhost blk > > tests for UEFI > > misc other stuff > > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com> > > > > ---------------------------------------------------------------- > > Hi -- this failed 'make check' for 32-bit Arm hosts: > > ERROR:/home/peter.maydell/qemu/tests/acpi-utils.c:145:acpi_find_rsdp_address_uefi: > code should not be reached > Aborted > ERROR - too few tests run (expected 1, got 0) > /home/peter.maydell/qemu/tests/Makefile.include:885: recipe for target > 'check-qtest-aarch64' failed > > thanks > -- PMM OK for now I will just drop UEFI tests from ARM. Igor will debug and re-add later on. Igor, as you do this maybe start by adding the infrastructure with the blacklist so we can formalize the current "these tables need to be updated".
On Tue, May 21, 2019 at 12:49:48PM +0100, Peter Maydell wrote: > On Tue, 21 May 2019 at 00:10, Michael S. Tsirkin <mst@redhat.com> wrote: > > > > The following changes since commit 2259637b95bef3116cc262459271de08e038cc66: > > > > Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging (2019-05-20 17:22:05 +0100) > > > > are available in the Git repository at: > > > > git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream > > > > for you to fetch changes up to 0c05ec64c388aea59facbef740651afa78e04f50: > > > > tests: acpi: print error unable to dump ACPI table during rebuild (2019-05-20 18:40:02 -0400) > > > > ---------------------------------------------------------------- > > pci, pc, virtio: features, fixes > > > > reconnect for vhost blk > > tests for UEFI > > misc other stuff > > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com> > > > > ---------------------------------------------------------------- > > Hi -- this failed 'make check' for 32-bit Arm hosts: > > ERROR:/home/peter.maydell/qemu/tests/acpi-utils.c:145:acpi_find_rsdp_address_uefi: > code should not be reached > Aborted > ERROR - too few tests run (expected 1, got 0) > /home/peter.maydell/qemu/tests/Makefile.include:885: recipe for target > 'check-qtest-aarch64' failed > > thanks > -- PMM Dropped ARM and re-pushed.
On Tue, 21 May 2019 at 14:42, Michael S. Tsirkin <mst@redhat.com> wrote: > > On Tue, May 21, 2019 at 12:49:48PM +0100, Peter Maydell wrote: > > On Tue, 21 May 2019 at 00:10, Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > The following changes since commit 2259637b95bef3116cc262459271de08e038cc66: > > > > > > Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging (2019-05-20 17:22:05 +0100) > > > > > > are available in the Git repository at: > > > > > > git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream > > > > > > for you to fetch changes up to 0c05ec64c388aea59facbef740651afa78e04f50: > > > > > > tests: acpi: print error unable to dump ACPI table during rebuild (2019-05-20 18:40:02 -0400) > > > > > > ---------------------------------------------------------------- > > > pci, pc, virtio: features, fixes > > > > > > reconnect for vhost blk > > > tests for UEFI > > > misc other stuff > > > > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com> > > > > > > ---------------------------------------------------------------- > > > > Hi -- this failed 'make check' for 32-bit Arm hosts: > > > > ERROR:/home/peter.maydell/qemu/tests/acpi-utils.c:145:acpi_find_rsdp_address_uefi: > > code should not be reached > > Aborted > > ERROR - too few tests run (expected 1, got 0) > > /home/peter.maydell/qemu/tests/Makefile.include:885: recipe for target > > 'check-qtest-aarch64' failed > > > > thanks > > -- PMM > > Dropped ARM and re-pushed. Fixed up version applied, thanks. Please update the changelog at https://wiki.qemu.org/ChangeLog/4.1 for any user-visible changes. -- PMM
On Tue, 21 May 2019 09:26:16 -0400 "Michael S. Tsirkin" <mst@redhat.com> wrote: > On Tue, May 21, 2019 at 12:49:48PM +0100, Peter Maydell wrote: > > On Tue, 21 May 2019 at 00:10, Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > The following changes since commit 2259637b95bef3116cc262459271de08e038cc66: > > > > > > Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging (2019-05-20 17:22:05 +0100) > > > > > > are available in the Git repository at: > > > > > > git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream > > > > > > for you to fetch changes up to 0c05ec64c388aea59facbef740651afa78e04f50: > > > > > > tests: acpi: print error unable to dump ACPI table during rebuild (2019-05-20 18:40:02 -0400) > > > > > > ---------------------------------------------------------------- > > > pci, pc, virtio: features, fixes > > > > > > reconnect for vhost blk > > > tests for UEFI > > > misc other stuff > > > > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com> > > > > > > ---------------------------------------------------------------- > > > > Hi -- this failed 'make check' for 32-bit Arm hosts: > > > > ERROR:/home/peter.maydell/qemu/tests/acpi-utils.c:145:acpi_find_rsdp_address_uefi: > > code should not be reached > > Aborted > > ERROR - too few tests run (expected 1, got 0) > > /home/peter.maydell/qemu/tests/Makefile.include:885: recipe for target > > 'check-qtest-aarch64' failed > > > > thanks > > -- PMM > > Nothing jumps out ... Igor? On 32-bit ARM host and it looks like UEFI crashes (CCing Laszlo) with: InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 469E52C0 ASSERT [DxeCore] /home/lacos/src/upstream/qemu/roms/edk2/MdePkg/Library/BaseLib/String.c(1090): Length < _gPcd_FixedAtBuild_PcdMaximumAsciiStringLength CLI to reproduce: qemu-system-aarch64 -display none -machine virt,accel=tcg -nodefaults -nographic -drive if=pflash,format=raw,file=pc-bios/edk2-aarch64-code.fd,readonly -drive if=pflash,format=raw,file=pc-bios/edk2-arm-vars.fd,snapshot=on -cdrom tests/data/uefi-boot-images/bios-tables-test.aarch64.iso.qcow2 -cpu cortex-a57 -serial stdio
On 05/22/19 15:06, Igor Mammedov wrote: > On Tue, 21 May 2019 09:26:16 -0400 > "Michael S. Tsirkin" <mst@redhat.com> wrote: > >> On Tue, May 21, 2019 at 12:49:48PM +0100, Peter Maydell wrote: >>> On Tue, 21 May 2019 at 00:10, Michael S. Tsirkin <mst@redhat.com> wrote: >>>> >>>> The following changes since commit 2259637b95bef3116cc262459271de08e038cc66: >>>> >>>> Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging (2019-05-20 17:22:05 +0100) >>>> >>>> are available in the Git repository at: >>>> >>>> git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream >>>> >>>> for you to fetch changes up to 0c05ec64c388aea59facbef740651afa78e04f50: >>>> >>>> tests: acpi: print error unable to dump ACPI table during rebuild (2019-05-20 18:40:02 -0400) >>>> >>>> ---------------------------------------------------------------- >>>> pci, pc, virtio: features, fixes >>>> >>>> reconnect for vhost blk >>>> tests for UEFI >>>> misc other stuff >>>> >>>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> >>>> >>>> ---------------------------------------------------------------- >>> >>> Hi -- this failed 'make check' for 32-bit Arm hosts: >>> >>> ERROR:/home/peter.maydell/qemu/tests/acpi-utils.c:145:acpi_find_rsdp_address_uefi: >>> code should not be reached >>> Aborted >>> ERROR - too few tests run (expected 1, got 0) >>> /home/peter.maydell/qemu/tests/Makefile.include:885: recipe for target >>> 'check-qtest-aarch64' failed >>> >>> thanks >>> -- PMM >> >> Nothing jumps out ... Igor? > On 32-bit ARM host and it looks like UEFI crashes (CCing Laszlo) with: > > InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 469E52C0 > ASSERT [DxeCore] /home/lacos/src/upstream/qemu/roms/edk2/MdePkg/Library/BaseLib/String.c(1090): Length < _gPcd_FixedAtBuild_PcdMaximumAsciiStringLength > > CLI to reproduce: > > qemu-system-aarch64 -display none -machine virt,accel=tcg -nodefaults -nographic -drive if=pflash,format=raw,file=pc-bios/edk2-aarch64-code.fd,readonly -drive if=pflash,format=raw,file=pc-bios/edk2-arm-vars.fd,snapshot=on -cdrom tests/data/uefi-boot-images/bios-tables-test.aarch64.iso.qcow2 -cpu cortex-a57 -serial stdio This is very interesting. I had obviously tested booting "bios-tables-test.aarch64.iso.qcow2" against "edk2-aarch64-code.fd", using TCG, on my x86_64 laptop. (And, I've run the above exact command just now, at commit a4f667b67149 -- it works 100% fine.) However, I've never been *near* a 32-bit ARM host. Therefore my suspicion is that the AARCH64 UEFI guest code tickles something in the 32-bit ARM code generator. It could be a bug in 32-bit ARM TCG, or it could be a bug in edk2 that is exposed by 32-bit ARM TCG. The direct assertion failure is mostly useless. The AsciiStrLen() function does what you'd expect it to, except it has a kind of "safety check" where it trips an assertion if the string length (under measurement) exceeds a pre-set platform constant. It helps with catching memory corruption errors. $ git show edk2-stable201903:MdePkg/Library/BaseLib/String.c | less 1090g UINTN EFIAPI AsciiStrLen ( IN CONST CHAR8 *String ) { UINTN Length; ASSERT (String != NULL); for (Length = 0; *String != '\0'; String++, Length++) { // // If PcdMaximumUnicodeStringLength is not zero, // length should not more than PcdMaximumUnicodeStringLength // if (PcdGet32 (PcdMaximumAsciiStringLength) != 0) { ASSERT (Length < PcdGet32 (PcdMaximumAsciiStringLength)); <-- HERE } } return Length; } (Never mind that the comment has a typo -- it incorrectly references "PcdMaximumUnicodeStringLength", but the PCD that's actually checked is (correctly) "PcdMaximumAsciiStringLength".) The constant is set to decimal 1,000,000 in ArmVirtQemu builds (inherited from MdePkg.dec), and that's indeed a quite unlikely length for real-word strings seen by firmware. I'll take a closer look once I have access to a 32-bit ARM host, but I'll definitely need help. Basically we should compare the original AARCH64 (dis)assembly with the QEMU-generated 32-bit ARM assembly. Hopefully I can at least get a backtrace myself. Thanks, Laszlo
On Wed, 22 May 2019 at 15:22, Laszlo Ersek <lersek@redhat.com> wrote: > This is very interesting. I had obviously tested booting > "bios-tables-test.aarch64.iso.qcow2" against "edk2-aarch64-code.fd", > using TCG, on my x86_64 laptop. (And, I've run the above exact command > just now, at commit a4f667b67149 -- it works 100% fine.) > > However, I've never been *near* a 32-bit ARM host. Therefore my > suspicion is that the AARCH64 UEFI guest code tickles something in the > 32-bit ARM code generator. It could be a bug in 32-bit ARM TCG, or it > could be a bug in edk2 that is exposed by 32-bit ARM TCG. Does it repro in a 32-bit-chroot in an aarch64 host ? I don't know if that might be easier for you to set up than getting access to real 32-bit hardware. thanks -- PMM
(+Ard) On 05/22/19 16:22, Laszlo Ersek wrote: > On 05/22/19 15:06, Igor Mammedov wrote: >> On Tue, 21 May 2019 09:26:16 -0400 >> "Michael S. Tsirkin" <mst@redhat.com> wrote: >> >>> On Tue, May 21, 2019 at 12:49:48PM +0100, Peter Maydell wrote: >>>> On Tue, 21 May 2019 at 00:10, Michael S. Tsirkin <mst@redhat.com> >>>> wrote: >>>>> >>>>> The following changes since commit >>>>> 2259637b95bef3116cc262459271de08e038cc66: >>>>> >>>>> Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' >>>>> into staging (2019-05-20 17:22:05 +0100) >>>>> >>>>> are available in the Git repository at: >>>>> >>>>> git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git >>>>> tags/for_upstream >>>>> >>>>> for you to fetch changes up to >>>>> 0c05ec64c388aea59facbef740651afa78e04f50: >>>>> >>>>> tests: acpi: print error unable to dump ACPI table during >>>>> rebuild (2019-05-20 18:40:02 -0400) >>>>> >>>>> ---------------------------------------------------------------- >>>>> pci, pc, virtio: features, fixes >>>>> >>>>> reconnect for vhost blk >>>>> tests for UEFI >>>>> misc other stuff >>>>> >>>>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> >>>>> >>>>> ---------------------------------------------------------------- >>>> >>>> Hi -- this failed 'make check' for 32-bit Arm hosts: >>>> >>>> ERROR:/home/peter.maydell/qemu/tests/acpi-utils.c:145:acpi_find_rsdp_address_uefi: >>>> code should not be reached >>>> Aborted >>>> ERROR - too few tests run (expected 1, got 0) >>>> /home/peter.maydell/qemu/tests/Makefile.include:885: recipe for >>>> /target check-qtest-aarch64' failed >>>> >>>> thanks >>>> -- PMM >>> >>> Nothing jumps out ... Igor? >> On 32-bit ARM host and it looks like UEFI crashes (CCing Laszlo) >> with: >> >> InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 469E52C0 >> ASSERT [DxeCore] /home/lacos/src/upstream/qemu/roms/edk2/MdePkg/Library/BaseLib/String.c(1090): Length < _gPcd_FixedAtBuild_PcdMaximumAsciiStringLength >> >> CLI to reproduce: >> >> qemu-system-aarch64 -display none -machine virt,accel=tcg >> -nodefaults -nographic -drive >> if=pflash,format=raw,file=pc-bios/edk2-aarch64-code.fd,readonly >> -drive if=pflash,format=raw,file=pc-bios/edk2-arm-vars.fd,snapshot=on >> -cdrom tests/data/uefi-boot-images/bios-tables-test.aarch64.iso.qcow2 >> -cpu cortex-a57 -serial stdio > > This is very interesting. I had obviously tested booting > "bios-tables-test.aarch64.iso.qcow2" against "edk2-aarch64-code.fd", > using TCG, on my x86_64 laptop. (And, I've run the above exact command > just now, at commit a4f667b67149 -- it works 100% fine.) > > However, I've never been *near* a 32-bit ARM host. Therefore my > suspicion is that the AARCH64 UEFI guest code tickles something in the > 32-bit ARM code generator. It could be a bug in 32-bit ARM TCG, or it > could be a bug in edk2 that is exposed by 32-bit ARM TCG. > > The direct assertion failure is mostly useless. The AsciiStrLen() > function does what you'd expect it to, except it has a kind of "safety > check" where it trips an assertion if the string length (under > measurement) exceeds a pre-set platform constant. It helps with > catching memory corruption errors. > > $ git show edk2-stable201903:MdePkg/Library/BaseLib/String.c | less > 1090g > > UINTN > EFIAPI > AsciiStrLen ( > IN CONST CHAR8 *String > ) > { > UINTN Length; > > ASSERT (String != NULL); > > for (Length = 0; *String != '\0'; String++, Length++) { > // > // If PcdMaximumUnicodeStringLength is not zero, > // length should not more than PcdMaximumUnicodeStringLength > // > if (PcdGet32 (PcdMaximumAsciiStringLength) != 0) { > ASSERT (Length < PcdGet32 (PcdMaximumAsciiStringLength)); <-- HERE > } > } > return Length; > } > > (Never mind that the comment has a typo -- it incorrectly references > "PcdMaximumUnicodeStringLength", but the PCD that's actually checked > is (correctly) "PcdMaximumAsciiStringLength".) > > The constant is set to decimal 1,000,000 in ArmVirtQemu builds > (inherited from MdePkg.dec), and that's indeed a quite unlikely length > for real-word strings seen by firmware. > > I'll take a closer look once I have access to a 32-bit ARM host, but > I'll definitely need help. Basically we should compare the original > AARCH64 (dis)assembly with the QEMU-generated 32-bit ARM assembly. > Hopefully I can at least get a backtrace myself. I have narrowed down the issue sufficiently that I think I can hand it over to Peter and Ard now -- because they know AARCH32 and AARCH64 assembly, and "target/arm/translate-a64.c" and "tcg/arm/*" too. The summarize the issue for Ard, the symptom is that AARCH64 ArmVirtQemu runs perfectly fine with TCG on an x86-64 system, but it crashes on an AARCH32 host system. Below is my analysis. (1) First, I determined a backtrace for the crash. For this, I flipped the ASSERT() failure disposition from CpuDeadLoop() to CpuBreakpoint(), via "PcdDebugPropertyMask". This printed a very nice (numeric) stack trace, which wasn't hard to turn into symbols with "objdump -S", using edk2's Build directory. (2) The actual crash is completely irrelevant, as it occurs on a cleanup path after the DXE Core fails to load the very first DXE driver that it attempts to load. The cleanup path should never be entered (i.e. the attempt to load the DXE driver in question should never fail). BTW the DXE driver in question is "MdeModulePkg/Universal/DevicePathDxe", but it's mostly irrelevant. (3) The function that first encounters a failure -- i.e. where the guest firmware behavior diverges, dependent on whether qemu-system-aarch64/TCG executes on an AARCH32 host, or an x86-64 host -- is PeCoffLoaderRelocateImage(), in "MdePkg/Library/BasePeCoffLib/BasePeCoff.c". It is invoked when the DXE Core loads DevicePathDxe. The following check fails in the function (on the AARCH32 host), when it attempts to process the very first relocation block in DevicePathDxe: > // > // Add check for RelocBase->SizeOfBlock field. > // > if (RelocBase->SizeOfBlock == 0) { > ImageContext->ImageError = IMAGE_ERROR_FAILED_RELOCATION; > return RETURN_LOAD_ERROR; > } I logged the address and the contents of (*RelocBase). The address is the same in both working and failing cases, the contents differ. (4) I tracked back a little bit to CoreLoadImageCommon() in "MdeModulePkg/Core/Dxe/Image/Image.c", to the spot where the image file is fetched (for later relocation). The following function call succeeds in both cases, however it returns *different data* as the DevicePathDxe.efi image file: > FHand.Source = GetFileBufferByFilePath ( > BootPolicy, > FilePath, > &FHand.SourceSize, > &AuthenticationStatus > ); Base address and size are identical, the CRC32s differ. After hexdumping the image variants (functional vs. broken with garbled relocations), and diffing the logs, an interesting pattern emerged. In every 4096 byte block, the 8-byte word at offset 4032 (0xFC0) is zeroed out in the broken variant. There are no other differences, as far as I can tell. 4096 = 64*64, and the qword in question is the start of the last 64-byte block (63*64=4032). I'm attaching the two log sections ("good.txt" (from the x86-64 host) vs "bad.txt" (from the aarch64 host)). (5) Because the DevicePathDxe.efi image originates from the FvMain firmware volume, which is embedded as an LZMA-compressed file into the FVMAIN_COMPACT firmware volume, I hooked another CRC32 calculation into LzmaUefiDecompress(), in "MdeModulePkg/Library/LzmaCustomDecompressLib/LzmaDecompress.c". The decompression is performed by the PEI Core with the help of the DXE IPL PEIM; in other words it happens in the PEI phase. The log confirmed that the firmware ran identically on both hosts (x86-64 and aarch32). Thus, the data corruption was introduced somewhere between the decompression near the end of PEI, and GetFileBufferByFilePath() in the DXE Core. (6) Here I got a bit frustrated by the many possible paths in the reading of firmware volumes, in the files "MdeModulePkg/Core/Dxe/FwVol/FwVolRead.c" and "MdeModulePkg/Core/Dxe/SectionExtraction/CoreSectionExtraction.c". However, all those paths seemed to end in CopyMem(), one way or another -- ultimately, CopyMem() would transfer the data from the decompressed firmware volume (which was fine) to the caller of GetFileBufferByFilePath() (which was not fine). (7) CopyMem() comes from the BaseMemoryLib class. "ArmVirtPkg/ArmVirt.dsc.inc" resolves it to the following lib instances: > [LibraryClasses.common] > # use the accelerated BaseMemoryLibOptDxe by default, overrides for SEC/PEI below > BaseMemoryLib|MdePkg/Library/BaseMemoryLibOptDxe/BaseMemoryLibOptDxe.inf > > [LibraryClasses.common.SEC] > BaseMemoryLib|MdePkg/Library/BaseMemoryLib/BaseMemoryLib.inf > > [LibraryClasses.common.PEI_CORE] > BaseMemoryLib|MdePkg/Library/BaseMemoryLib/BaseMemoryLib.inf > > [LibraryClasses.common.PEIM] > BaseMemoryLib|MdePkg/Library/BaseMemoryLib/BaseMemoryLib.inf The optimized aarch64 assembly code can be seen here: https://github.com/tianocore/edk2/blob/master/MdePkg/Library/BaseMemoryLibOptDxe/AArch64/CopyMem.S It has great comments, and the 64-byte chunk size mentioned in the comments made me realize that 0xFC0 equals 63 decimal * 64 decimal. (8) I applied the following (proof of concept) patch: > diff --git a/ArmVirtPkg/ArmVirt.dsc.inc b/ArmVirtPkg/ArmVirt.dsc.inc > index a5d63751a343..c643a5a76718 100644 > --- a/ArmVirtPkg/ArmVirt.dsc.inc > +++ b/ArmVirtPkg/ArmVirt.dsc.inc > @@ -67,8 +67,7 @@ [LibraryClasses.common] > # > PcdLib|MdePkg/Library/DxePcdLib/DxePcdLib.inf > > - # use the accelerated BaseMemoryLibOptDxe by default, overrides for SEC/PEI below > - BaseMemoryLib|MdePkg/Library/BaseMemoryLibOptDxe/BaseMemoryLibOptDxe.inf > + BaseMemoryLib|MdePkg/Library/BaseMemoryLib/BaseMemoryLib.inf > > # Networking Requirements > !include NetworkPkg/NetworkLibs.dsc.inc > @@ -160,7 +159,6 @@ [LibraryClasses.common] > > [LibraryClasses.common.SEC] > PcdLib|MdePkg/Library/BasePcdLibNull/BasePcdLibNull.inf > - BaseMemoryLib|MdePkg/Library/BaseMemoryLib/BaseMemoryLib.inf > > DebugAgentLib|ArmPkg/Library/DebugAgentSymbolsBaseLib/DebugAgentSymbolsBaseLib.inf > SerialPortLib|ArmVirtPkg/Library/FdtPL011SerialPortLib/EarlyFdtPL011SerialPortLib.inf > @@ -171,7 +169,6 @@ [LibraryClasses.common.SEC] > > [LibraryClasses.common.PEI_CORE] > PcdLib|MdePkg/Library/PeiPcdLib/PeiPcdLib.inf > - BaseMemoryLib|MdePkg/Library/BaseMemoryLib/BaseMemoryLib.inf > HobLib|MdePkg/Library/PeiHobLib/PeiHobLib.inf > PeiServicesLib|MdePkg/Library/PeiServicesLib/PeiServicesLib.inf > MemoryAllocationLib|MdePkg/Library/PeiMemoryAllocationLib/PeiMemoryAllocationLib.inf > @@ -186,7 +183,6 @@ [LibraryClasses.common.PEI_CORE] > > [LibraryClasses.common.PEIM] > PcdLib|MdePkg/Library/PeiPcdLib/PeiPcdLib.inf > - BaseMemoryLib|MdePkg/Library/BaseMemoryLib/BaseMemoryLib.inf > HobLib|MdePkg/Library/PeiHobLib/PeiHobLib.inf > PeiServicesLib|MdePkg/Library/PeiServicesLib/PeiServicesLib.inf > MemoryAllocationLib|MdePkg/Library/PeiMemoryAllocationLib/PeiMemoryAllocationLib.inf which replaces the assembly implementation of CopyMem() -- and of some other functions -- with C implementations (which are also optimized; see commit 01f688be90f5, "MdePkg/BaseMemoryLib: widen aligned accesses to 32 or 64 bits", 2016-09-13), in all module types. (9) With this patch, the boot finished successfully (although it took very long): > BiosTablesTest: BiosTablesTest=41200000 Rsdp10=0 Rsdp20=40370000 > BiosTablesTest: Smbios21=0 Smbios30=43EF0000 > BiosTablesTest: press any key to exit (10) Given that "translate-a64.c" is common between both x86-64 and aarch32 hosts, I think it must be "tcg/arm/*" that doesn't interoperate with the guest's "MdePkg/Library/BaseMemoryLibOptDxe/AArch64/CopyMem.S", for some reason. IOW, the aarch64 binary code is likely parsed correctly into the internal representation, but the 32-bit ARM code generated from the IR could hit some corner case. Thanks, Laszlo
On 05/23/19 02:51, Laszlo Ersek wrote: > I'm attaching the two log sections ("good.txt" (from > the x86-64 host) vs "bad.txt" (from the aarch64 host)). Typo correction: "bad.txt" comes from an aarch32 host. (That's quite the point.) Thanks & sorry Laszlo
On Thu, 23 May 2019 at 01:51, Laszlo Ersek <lersek@redhat.com> wrote: > I have narrowed down the issue sufficiently that I think I can hand it > over to Peter and Ard now -- because they know AARCH32 and AARCH64 > assembly, and "target/arm/translate-a64.c" and "tcg/arm/*" too. > > The summarize the issue for Ard, the symptom is that AARCH64 ArmVirtQemu > runs perfectly fine with TCG on an x86-64 system, but it crashes on an > AARCH32 host system. Thanks for the investigation; this is probably more one for Richard than me. -- PMM
Hi Peter, On 05/22/19 23:15, Peter Maydell wrote: > On Wed, 22 May 2019 at 15:22, Laszlo Ersek <lersek@redhat.com> wrote: >> This is very interesting. I had obviously tested booting >> "bios-tables-test.aarch64.iso.qcow2" against "edk2-aarch64-code.fd", >> using TCG, on my x86_64 laptop. (And, I've run the above exact command >> just now, at commit a4f667b67149 -- it works 100% fine.) >> >> However, I've never been *near* a 32-bit ARM host. Therefore my >> suspicion is that the AARCH64 UEFI guest code tickles something in the >> 32-bit ARM code generator. It could be a bug in 32-bit ARM TCG, or it >> could be a bug in edk2 that is exposed by 32-bit ARM TCG. > > Does it repro in a 32-bit-chroot in an aarch64 host ? > I don't know if that might be easier for you to set up than > getting access to real 32-bit hardware. If this test is still necessary (e.g. for repeated testing), I might be able to run qemu-system-aarch64/TCG in a 32-bit QEMU/KVM guest on my Mustang. I was preparing to set up this kind of nesting, when Igor gave me access to a real aarch32 host. Thanks Laszlo
On 05/23/19 10:37, Peter Maydell wrote: > On Thu, 23 May 2019 at 01:51, Laszlo Ersek <lersek@redhat.com> wrote: >> I have narrowed down the issue sufficiently that I think I can hand >> it over to Peter and Ard now -- because they know AARCH32 and AARCH64 >> assembly, and "target/arm/translate-a64.c" and "tcg/arm/*" too. >> >> The summarize the issue for Ard, the symptom is that AARCH64 >> ArmVirtQemu runs perfectly fine with TCG on an x86-64 system, but it >> crashes on an AARCH32 host system. > > Thanks for the investigation; this is probably more one for Richard > than me. I figured I'd provide some logs. (1) To recap, the aarch64 assembly source code file that seems to be mis-translated (from the aarch64 binary) lives at https://github.com/tianocore/edk2/blob/3604174718e2afc950c3cc64c64ba5165c8692bd/MdePkg/Library/BaseMemoryLibOptDxe/AArch64/CopyMem.S The relevant functions are InternalMemCopyMem() and __memcpy(). (2) I disassembled the aarch64 binary with "objdump", and uploaded the listing to: http://people.redhat.com/lersek/aarch64-to-arm-mistranslation/DxeCore.objdump.xz The InternalMemCopyMem() function starts at hex offset 2b2ec. The __memcpy() function starts at hex offset 2b180. (3) I ran the guest with "-d in_asm,op,op_opt,op_ind,out_asm" and uploaded the log file to: http://people.redhat.com/lersek/aarch64-to-arm-mistranslation/tcg.in_asm.op.op_opt.op_ind.out_asm.log.xz The TBs that correspond to (parts of) the InternalMemCopyMem() and __memcpy() functions are scattered over the file, but the offset between the "nice" disassembly from (2), and the in-RAM TBs in (3), can be determined from the fact that there is a single prfm instruction in the entire binary. The instruction's offset is 0x2b180 in (2) -- at the beginning of the __memcpy() function --, and its RAM address is 0x472d2180 in (3). Thus the difference (= the load address of DxeCore.efi) is 0x472a7000. (This is logged by the guest as well: > Loading DXE CORE at 0x000472A7000 EntryPoint=0x000472A8000 ) Thanks Laszlo
The following changes since commit 2259637b95bef3116cc262459271de08e038cc66: Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging (2019-05-20 17:22:05 +0100) are available in the Git repository at: git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream for you to fetch changes up to 0c05ec64c388aea59facbef740651afa78e04f50: tests: acpi: print error unable to dump ACPI table during rebuild (2019-05-20 18:40:02 -0400) ---------------------------------------------------------------- pci, pc, virtio: features, fixes reconnect for vhost blk tests for UEFI misc other stuff Signed-off-by: Michael S. Tsirkin <mst@redhat.com> ---------------------------------------------------------------- Dan Streetman (1): do not call vhost_net_cleanup() on running net from char user event Daniel P. Berrangé (2): hw: report invalid disable-legacy|modern usage for virtio-1-only devs Revert "globals: Allow global properties to be optional" David Gibson (2): pcie: Remove redundant test in pcie_mmcfg_data_{read,write}() pci: Simplify pci_bus_is_root() Igor Mammedov (16): q35: acpi: do not create dummy MCFG table tests: acpi: rename acpi_parse_rsdp_table() into acpi_fetch_rsdp_table() tests: acpi: make acpi_fetch_table() take size of fetched table pointer tests: acpi: make RSDT test routine handle XSDT tests: acpi: make pointer to RSDP 64bit tests: acpi: fetch X_DSDT if pointer to DSDT is 0 tests: acpi: skip FACS table if board uses hw reduced ACPI profile tests: acpi: move boot_sector_init() into x86 tests branch tests: acpi: add acpi_find_rsdp_address_uefi() helper tests: acpi: add a way to start tests with UEFI firmware tests: acpi: ignore SMBIOS tests when UEFI firmware is used tests: acpi: allow to override default accelerator tests: add expected ACPI tables for arm/virt board tests: acpi: add simple arm/virt testcase tests: acpi: refactor rebuild-expected-aml.sh to dump ACPI tables for a specified list of targets tests: acpi: print error unable to dump ACPI table during rebuild Li Feng (1): libvhost-user: fix bad vu_log_write Marc-André Lureau (1): docs: reST-ify vhost-user documentation Markus Armbruster (3): acpi/piix4: Convert debug printf()s to trace events acpi/pcihp: Convert debug printf()s to trace events acpi/pcihp: Add a few more trace points related to unplug Wei Yang (3): hw/arm/virt-acpi-build: remove unnecessary variable mcfg_start i386, acpi: remove mcfg_ prefix in AcpiMcfgInfo members hw/arm/virt-acpi-build: pass AcpiMcfgInfo to build_mcfg() Xie Yongji (7): virtio: Introduce started flag to VirtioDevice virtio: Use started flag in virtio_vmstate_change() vhost-user-blk: Use started flag in vhost_user_blk_set_status() vhost-user-blk: Only start vhost-user backend with the first kick vhost-user-blk: Add return value for vhost_user_blk_start() vhost-user-blk: Add support to reconnect backend contrib/vhost-user-blk: enable inflight I/O tracking docs/interop/vhost-user.txt | 1219 ---------------------------- hw/virtio/virtio-pci.h | 31 +- include/hw/acpi/pci.h | 33 + include/hw/pci/pci.h | 1 - include/hw/pci/pci_bus.h | 12 +- include/hw/qdev-core.h | 3 - include/hw/virtio/vhost-user-blk.h | 3 + include/hw/virtio/virtio.h | 2 + tests/acpi-utils.h | 7 +- contrib/libvhost-user/libvhost-user.c | 2 +- contrib/vhost-user-blk/vhost-user-blk.c | 3 +- hw/acpi/pcihp.c | 32 +- hw/acpi/piix4.c | 14 +- hw/arm/virt-acpi-build.c | 22 +- hw/block/vhost-user-blk.c | 175 +++- hw/core/machine.c | 23 +- hw/display/virtio-gpu-pci.c | 4 +- hw/display/virtio-vga.c | 4 +- hw/i386/acpi-build.c | 32 +- hw/pci-bridge/pci_expander_bridge.c | 6 - hw/pci/pci.c | 14 +- hw/pci/pcie_host.c | 10 - hw/virtio/virtio-crypto-pci.c | 4 +- hw/virtio/virtio-input-pci.c | 4 +- hw/virtio/virtio-pci.c | 27 +- hw/virtio/virtio.c | 54 +- net/vhost-user.c | 1 - qom/object.c | 3 - tests/acpi-utils.c | 68 +- tests/bios-tables-test.c | 146 +++- tests/vmgenid-test.c | 6 +- MAINTAINERS | 2 +- docs/interop/index.rst | 2 +- docs/interop/vhost-user.rst | 1351 +++++++++++++++++++++++++++++++ hw/acpi/trace-events | 16 + tests/Makefile.include | 1 + tests/data/acpi/rebuild-expected-aml.sh | 23 +- tests/data/acpi/virt/APIC | Bin 0 -> 168 bytes tests/data/acpi/virt/DSDT | Bin 0 -> 18476 bytes tests/data/acpi/virt/FACP | Bin 0 -> 268 bytes tests/data/acpi/virt/GTDT | Bin 0 -> 96 bytes tests/data/acpi/virt/MCFG | Bin 0 -> 60 bytes tests/data/acpi/virt/SPCR | Bin 0 -> 80 bytes 43 files changed, 1908 insertions(+), 1452 deletions(-) delete mode 100644 docs/interop/vhost-user.txt create mode 100644 include/hw/acpi/pci.h create mode 100644 docs/interop/vhost-user.rst create mode 100644 tests/data/acpi/virt/APIC create mode 100644 tests/data/acpi/virt/DSDT create mode 100644 tests/data/acpi/virt/FACP create mode 100644 tests/data/acpi/virt/GTDT create mode 100644 tests/data/acpi/virt/MCFG create mode 100644 tests/data/acpi/virt/SPCR