Message ID | 527A7EDE.3060409@redhat.com |
---|---|
State | New |
Headers | show |
On Wed, Nov 06, 2013 at 06:39:42PM +0100, Paolo Bonzini wrote: > Il 06/11/2013 17:22, Luiz Capitulino ha scritto: > > 1. Run qemu with gdb server support > > > > # qemu [...] -s -S > > > > 2. Connect gdb and try to set a breakpoint > > > > $ gdb /path/to/vmlinux > > (gdb) target remote:1234 > > (gdb) b secondary_startup_64 > > (Note that this doesn't make much sense until the kernel has been loaded > into memory. You probably want hbreak instead). > > > 3. On qemu terminal > > > > qemu-qmp: /home/lcapitulino/work/src/upstream/qmp-unstable/include/qemu/int128.h:22: int128_get64: Assertion `!a.hi' failed. > > Aborted (core dumped) > > > > According to bisect the culprit is: > > > > commit a53ae8e934cd54686875b5bcfc2f434244ee55d6 > > Author: Marcel Apfelbaum <marcel.a@redhat.com> > > Date: Mon Sep 16 11:21:16 2013 +0300 > > > > hw/pci: partially handle pci master abort > > I couldn't get quite the same reproducer, mine was: > > $ gdb > (gdb) set arch i386:x86-64 > The target architecture is assumed to be i386:x86-64 > (gdb) target remote localhost:1234 > Remote debugging using localhost:1234 > <bang> > > The problem is that gdb attempts to read a few bytes from address > 0xffffffffffffffe6 to 0xffffffffffffffff inclusive. > > The region it gets is the (newly introduced) master abort region, which > is as big as the PCI address space (see pci_bus_init). Due to a typo > that's only 2^63-1, not 2^64. But we get it anyway because > phys_page_find ignores the upper bits of the physical address. In > address_space_translate_internal then > > diff = int128_sub(section->mr->size, int128_make64(addr)); > *plen = int128_get64(int128_min(diff, int128_make64(*plen))); > > diff becomes negative, and int128_get64 booms. > > Will send as a proper patch tomorrow... can you give your Tested-by? This just makes the symproms go away. The real bug is exec ignores high address bits during page lookups. It should fail on invalid address not access a random page. I'll send a patch. > > diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c > index b3d94bd..68901c3 100644 > --- a/hw/i386/pc_piix.c > +++ b/hw/i386/pc_piix.c > @@ -114,7 +114,7 @@ static void pc_init1(QEMUMachineInitArgs *args, > > if (pci_enabled) { > pci_memory = g_new(MemoryRegion, 1); > - memory_region_init(pci_memory, NULL, "pci", INT64_MAX); > + memory_region_init(pci_memory, NULL, "pci", UINT64_MAX); > rom_memory = pci_memory; > } else { > pci_memory = NULL; > diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c > index 2e315f7..c9a03fc 100644 > --- a/hw/i386/pc_q35.c > +++ b/hw/i386/pc_q35.c > @@ -101,7 +101,7 @@ static void pc_q35_init(QEMUMachineInitArgs *args) > /* pci enabled */ > if (pci_enabled) { > pci_memory = g_new(MemoryRegion, 1); > - memory_region_init(pci_memory, NULL, "pci", INT64_MAX); > + memory_region_init(pci_memory, NULL, "pci", UINT64_MAX); > rom_memory = pci_memory; > } else { > pci_memory = NULL; > This is also ugly and will be broken on actual 64 bit systems (not x86). Generally INT64_MAX does not make sense at all.
Il 06/11/2013 18:48, Michael S. Tsirkin ha scritto: > This just makes the symproms go away. That's correct. > The real bug is exec ignores high address bits during page > lookups. It should fail on invalid address not access > a random page. > I'll send a patch. The real real bug is that all address spaces should be 2^64, which you said you consider too intrusive a patch. I don't feel confident changing phys_page_find, even if it's just 2 lines. Paolo
On Wed, 06 Nov 2013 18:39:42 +0100 Paolo Bonzini <pbonzini@redhat.com> wrote: > Il 06/11/2013 17:22, Luiz Capitulino ha scritto: > > 1. Run qemu with gdb server support > > > > # qemu [...] -s -S > > > > 2. Connect gdb and try to set a breakpoint > > > > $ gdb /path/to/vmlinux > > (gdb) target remote:1234 > > (gdb) b secondary_startup_64 > > (Note that this doesn't make much sense until the kernel has been loaded > into memory. You probably want hbreak instead). hbreak didn't work either, gdb doesn't stop at the breakpoint. I tried to test this with another random function and got a "Remote 'g' packet reply is too long" (which seems to be yet another different problem). > > 3. On qemu terminal > > > > qemu-qmp: /home/lcapitulino/work/src/upstream/qmp-unstable/include/qemu/int128.h:22: int128_get64: Assertion `!a.hi' failed. > > Aborted (core dumped) > > > > According to bisect the culprit is: > > > > commit a53ae8e934cd54686875b5bcfc2f434244ee55d6 > > Author: Marcel Apfelbaum <marcel.a@redhat.com> > > Date: Mon Sep 16 11:21:16 2013 +0300 > > > > hw/pci: partially handle pci master abort > > I couldn't get quite the same reproducer, mine was: > > $ gdb > (gdb) set arch i386:x86-64 > The target architecture is assumed to be i386:x86-64 > (gdb) target remote localhost:1234 > Remote debugging using localhost:1234 > <bang> > > The problem is that gdb attempts to read a few bytes from address > 0xffffffffffffffe6 to 0xffffffffffffffff inclusive. > > The region it gets is the (newly introduced) master abort region, which > is as big as the PCI address space (see pci_bus_init). Due to a typo > that's only 2^63-1, not 2^64. But we get it anyway because > phys_page_find ignores the upper bits of the physical address. In > address_space_translate_internal then > > diff = int128_sub(section->mr->size, int128_make64(addr)); > *plen = int128_get64(int128_min(diff, int128_make64(*plen))); > > diff becomes negative, and int128_get64 booms. > > Will send as a proper patch tomorrow... can you give your Tested-by? Tested-by: Luiz Capitulino <lcapitulino@redhat.com> > > > diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c > index b3d94bd..68901c3 100644 > --- a/hw/i386/pc_piix.c > +++ b/hw/i386/pc_piix.c > @@ -114,7 +114,7 @@ static void pc_init1(QEMUMachineInitArgs *args, > > if (pci_enabled) { > pci_memory = g_new(MemoryRegion, 1); > - memory_region_init(pci_memory, NULL, "pci", INT64_MAX); > + memory_region_init(pci_memory, NULL, "pci", UINT64_MAX); > rom_memory = pci_memory; > } else { > pci_memory = NULL; > diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c > index 2e315f7..c9a03fc 100644 > --- a/hw/i386/pc_q35.c > +++ b/hw/i386/pc_q35.c > @@ -101,7 +101,7 @@ static void pc_q35_init(QEMUMachineInitArgs *args) > /* pci enabled */ > if (pci_enabled) { > pci_memory = g_new(MemoryRegion, 1); > - memory_region_init(pci_memory, NULL, "pci", INT64_MAX); > + memory_region_init(pci_memory, NULL, "pci", UINT64_MAX); > rom_memory = pci_memory; > } else { > pci_memory = NULL; > >
On Wed, Nov 06, 2013 at 06:50:05PM +0100, Paolo Bonzini wrote: > Il 06/11/2013 18:48, Michael S. Tsirkin ha scritto: > > This just makes the symproms go away. > > That's correct. > > > The real bug is exec ignores high address bits during page > > lookups. It should fail on invalid address not access > > a random page. > > I'll send a patch. > > The real real bug is that all address spaces should be 2^64, which you > said you consider too intrusive a patch. Because this will affect performance in unpredicatable way. We can't make such changes in 1.7 IMHO: it would need much more than just a quick "works for me". > I don't feel confident > changing phys_page_find, even if it's just 2 lines. > > Paolo Well it's *obviously* broken if address is outside target address space. Take a look at the patch first, then argue.
Il 06/11/2013 19:36, Luiz Capitulino ha scritto: > On Wed, 06 Nov 2013 18:39:42 +0100 > Paolo Bonzini <pbonzini@redhat.com> wrote: > >> Il 06/11/2013 17:22, Luiz Capitulino ha scritto: >>> 1. Run qemu with gdb server support >>> >>> # qemu [...] -s -S >>> >>> 2. Connect gdb and try to set a breakpoint >>> >>> $ gdb /path/to/vmlinux >>> (gdb) target remote:1234 >>> (gdb) b secondary_startup_64 >> >> (Note that this doesn't make much sense until the kernel has been loaded >> into memory. You probably want hbreak instead). > > hbreak didn't work either, gdb doesn't stop at the breakpoint. I tried to > test this with another random function and got a "Remote 'g' packet > reply is too long" (which seems to be yet another different problem). Yeah, that's very messy and it would nice to have a fix for it, but I don't know enough about gdb to say whether it's fixable. It happens when the processor switches from 32 to 64-bit under gdb's feet. The solution is typically to do "set arch i386:x86-64" before running the guest with "c" if you know the breakpoint will happen in 64-bit mode. Paolo
Il 06/11/2013 19:39, Michael S. Tsirkin ha scritto: > Because this will affect performance in unpredicatable way. It's not really unpredictable. It can be easily unit-tested, and anyway the targets with 64-bit address spaces don't have any particular performance problem. > We can't make such changes in 1.7 IMHO: > it would need much more than just a quick "works for me". I can say the same about phys_page_find. It's been obviously broken for years and nobody ever cared, it cannot be super-urgent now to fix it. Paolo >> > I don't feel confident >> > changing phys_page_find, even if it's just 2 lines. >> > >> > Paolo > Well it's *obviously* broken if address is outside target address > space. > Take a look at the patch first, then argue.
On Wed, Nov 06, 2013 at 10:13:16PM +0100, Paolo Bonzini wrote: > Il 06/11/2013 19:39, Michael S. Tsirkin ha scritto: > > Because this will affect performance in unpredicatable way. > > It's not really unpredictable. It can be easily unit-tested, Go ahead, post, test, I'm not stopping you. > and anyway > the targets with 64-bit address spaces don't have any particular > performance problem. That's only s390x and they happen not to use IO for anything, instead they do hypercalls. So hard to tell. > > We can't make such changes in 1.7 IMHO: > > it would need much more than just a quick "works for me". > > I can say the same about phys_page_find. Why? It's *obvious* this code is not designed to work with addresses > target address space. You doubt that? What is the worry here? > It's been obviously broken for > years and nobody ever cared, it cannot be super-urgent now to fix it. > > Paolo Well we added a bunch of code and now it is failing. > >> > I don't feel confident > >> > changing phys_page_find, even if it's just 2 lines. > >> > > >> > Paolo > > Well it's *obviously* broken if address is outside target address > > space. > > Take a look at the patch first, then argue.
diff becomes negative, and int128_get64 booms. Will send as a proper patch tomorrow... can you give your Tested-by? diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c index b3d94bd..68901c3 100644 --- a/hw/i386/pc_piix.c +++ b/hw/i386/pc_piix.c @@ -114,7 +114,7 @@ static void pc_init1(QEMUMachineInitArgs *args, if (pci_enabled) { pci_memory = g_new(MemoryRegion, 1); - memory_region_init(pci_memory, NULL, "pci", INT64_MAX); + memory_region_init(pci_memory, NULL, "pci", UINT64_MAX); rom_memory = pci_memory; } else { pci_memory = NULL; diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c index 2e315f7..c9a03fc 100644 --- a/hw/i386/pc_q35.c +++ b/hw/i386/pc_q35.c @@ -101,7 +101,7 @@ static void pc_q35_init(QEMUMachineInitArgs *args) /* pci enabled */ if (pci_enabled) { pci_memory = g_new(MemoryRegion, 1); - memory_region_init(pci_memory, NULL, "pci", INT64_MAX); + memory_region_init(pci_memory, NULL, "pci", UINT64_MAX); rom_memory = pci_memory; } else { pci_memory = NULL;