Patchwork BUG: QEMU aborts when setting breakpoint in gdb (bisected)

login
register
mail settings
Submitter Paolo Bonzini
Date Nov. 6, 2013, 5:39 p.m.
Message ID <527A7EDE.3060409@redhat.com>
Download mbox | patch
Permalink /patch/288979/
State New
Headers show

Comments

Paolo Bonzini - Nov. 6, 2013, 5:39 p.m.
Il 06/11/2013 17:22, Luiz Capitulino ha scritto:
> 1. Run qemu with gdb server support
> 
>    # qemu [...] -s -S
> 
> 2. Connect gdb and try to set a breakpoint
> 
>    $ gdb /path/to/vmlinux
>    (gdb) target remote:1234
>    (gdb) b secondary_startup_64

(Note that this doesn't make much sense until the kernel has been loaded
into memory.  You probably want hbreak instead).

> 3. On qemu terminal
> 
> qemu-qmp: /home/lcapitulino/work/src/upstream/qmp-unstable/include/qemu/int128.h:22: int128_get64: Assertion `!a.hi' failed.
> Aborted (core dumped)
>    
> According to bisect the culprit is:
> 
> commit a53ae8e934cd54686875b5bcfc2f434244ee55d6
> Author: Marcel Apfelbaum <marcel.a@redhat.com>
> Date:   Mon Sep 16 11:21:16 2013 +0300
> 
>     hw/pci: partially handle pci master abort

I couldn't get quite the same reproducer, mine was:

$ gdb
(gdb) set arch  i386:x86-64
The target architecture is assumed to be i386:x86-64
(gdb) target remote localhost:1234
Remote debugging using localhost:1234
<bang>

The problem is that gdb attempts to read a few bytes from address
0xffffffffffffffe6 to 0xffffffffffffffff inclusive.

The region it gets is the (newly introduced) master abort region, which
is as big as the PCI address space (see pci_bus_init).  Due to a typo
that's only 2^63-1, not 2^64.  But we get it anyway because
phys_page_find ignores the upper bits of the physical address.  In
address_space_translate_internal then

    diff = int128_sub(section->mr->size, int128_make64(addr));
    *plen = int128_get64(int128_min(diff, int128_make64(*plen)));
Michael S. Tsirkin - Nov. 6, 2013, 5:48 p.m.
On Wed, Nov 06, 2013 at 06:39:42PM +0100, Paolo Bonzini wrote:
> Il 06/11/2013 17:22, Luiz Capitulino ha scritto:
> > 1. Run qemu with gdb server support
> > 
> >    # qemu [...] -s -S
> > 
> > 2. Connect gdb and try to set a breakpoint
> > 
> >    $ gdb /path/to/vmlinux
> >    (gdb) target remote:1234
> >    (gdb) b secondary_startup_64
> 
> (Note that this doesn't make much sense until the kernel has been loaded
> into memory.  You probably want hbreak instead).
> 
> > 3. On qemu terminal
> > 
> > qemu-qmp: /home/lcapitulino/work/src/upstream/qmp-unstable/include/qemu/int128.h:22: int128_get64: Assertion `!a.hi' failed.
> > Aborted (core dumped)
> >    
> > According to bisect the culprit is:
> > 
> > commit a53ae8e934cd54686875b5bcfc2f434244ee55d6
> > Author: Marcel Apfelbaum <marcel.a@redhat.com>
> > Date:   Mon Sep 16 11:21:16 2013 +0300
> > 
> >     hw/pci: partially handle pci master abort
> 
> I couldn't get quite the same reproducer, mine was:
> 
> $ gdb
> (gdb) set arch  i386:x86-64
> The target architecture is assumed to be i386:x86-64
> (gdb) target remote localhost:1234
> Remote debugging using localhost:1234
> <bang>
> 
> The problem is that gdb attempts to read a few bytes from address
> 0xffffffffffffffe6 to 0xffffffffffffffff inclusive.
> 
> The region it gets is the (newly introduced) master abort region, which
> is as big as the PCI address space (see pci_bus_init).  Due to a typo
> that's only 2^63-1, not 2^64.  But we get it anyway because
> phys_page_find ignores the upper bits of the physical address.  In
> address_space_translate_internal then
> 
>     diff = int128_sub(section->mr->size, int128_make64(addr));
>     *plen = int128_get64(int128_min(diff, int128_make64(*plen)));
> 
> diff becomes negative, and int128_get64 booms.
> 
> Will send as a proper patch tomorrow... can you give your Tested-by?


This just makes the symproms go away.
The real bug is exec ignores high address bits during page
lookups. It should fail on invalid address not access
a random page.
I'll send a patch.



> 
> diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
> index b3d94bd..68901c3 100644
> --- a/hw/i386/pc_piix.c
> +++ b/hw/i386/pc_piix.c
> @@ -114,7 +114,7 @@ static void pc_init1(QEMUMachineInitArgs *args,
> 
>      if (pci_enabled) {
>          pci_memory = g_new(MemoryRegion, 1);
> -        memory_region_init(pci_memory, NULL, "pci", INT64_MAX);
> +        memory_region_init(pci_memory, NULL, "pci", UINT64_MAX);
>          rom_memory = pci_memory;
>      } else {
>          pci_memory = NULL;
> diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
> index 2e315f7..c9a03fc 100644
> --- a/hw/i386/pc_q35.c
> +++ b/hw/i386/pc_q35.c
> @@ -101,7 +101,7 @@ static void pc_q35_init(QEMUMachineInitArgs *args)
>      /* pci enabled */
>      if (pci_enabled) {
>          pci_memory = g_new(MemoryRegion, 1);
> -        memory_region_init(pci_memory, NULL, "pci", INT64_MAX);
> +        memory_region_init(pci_memory, NULL, "pci", UINT64_MAX);
>          rom_memory = pci_memory;
>      } else {
>          pci_memory = NULL;
> 


This is also ugly and will be broken on actual 64 bit systems
(not x86). Generally INT64_MAX does not make sense at all.
Paolo Bonzini - Nov. 6, 2013, 5:50 p.m.
Il 06/11/2013 18:48, Michael S. Tsirkin ha scritto:
> This just makes the symproms go away.

That's correct.

> The real bug is exec ignores high address bits during page
> lookups. It should fail on invalid address not access
> a random page.
> I'll send a patch.

The real real bug is that all address spaces should be 2^64, which you
said you consider too intrusive a patch.  I don't feel confident
changing phys_page_find, even if it's just 2 lines.

Paolo
Luiz Capitulino - Nov. 6, 2013, 6:36 p.m.
On Wed, 06 Nov 2013 18:39:42 +0100
Paolo Bonzini <pbonzini@redhat.com> wrote:

> Il 06/11/2013 17:22, Luiz Capitulino ha scritto:
> > 1. Run qemu with gdb server support
> > 
> >    # qemu [...] -s -S
> > 
> > 2. Connect gdb and try to set a breakpoint
> > 
> >    $ gdb /path/to/vmlinux
> >    (gdb) target remote:1234
> >    (gdb) b secondary_startup_64
> 
> (Note that this doesn't make much sense until the kernel has been loaded
> into memory.  You probably want hbreak instead).

hbreak didn't work either, gdb doesn't stop at the breakpoint. I tried to
test this with another random function and got a "Remote 'g' packet
reply is too long" (which seems to be yet another different problem).

> > 3. On qemu terminal
> > 
> > qemu-qmp: /home/lcapitulino/work/src/upstream/qmp-unstable/include/qemu/int128.h:22: int128_get64: Assertion `!a.hi' failed.
> > Aborted (core dumped)
> >    
> > According to bisect the culprit is:
> > 
> > commit a53ae8e934cd54686875b5bcfc2f434244ee55d6
> > Author: Marcel Apfelbaum <marcel.a@redhat.com>
> > Date:   Mon Sep 16 11:21:16 2013 +0300
> > 
> >     hw/pci: partially handle pci master abort
> 
> I couldn't get quite the same reproducer, mine was:
> 
> $ gdb
> (gdb) set arch  i386:x86-64
> The target architecture is assumed to be i386:x86-64
> (gdb) target remote localhost:1234
> Remote debugging using localhost:1234
> <bang>
> 
> The problem is that gdb attempts to read a few bytes from address
> 0xffffffffffffffe6 to 0xffffffffffffffff inclusive.
> 
> The region it gets is the (newly introduced) master abort region, which
> is as big as the PCI address space (see pci_bus_init).  Due to a typo
> that's only 2^63-1, not 2^64.  But we get it anyway because
> phys_page_find ignores the upper bits of the physical address.  In
> address_space_translate_internal then
> 
>     diff = int128_sub(section->mr->size, int128_make64(addr));
>     *plen = int128_get64(int128_min(diff, int128_make64(*plen)));
> 
> diff becomes negative, and int128_get64 booms.
> 
> Will send as a proper patch tomorrow... can you give your Tested-by?

Tested-by: Luiz Capitulino <lcapitulino@redhat.com>

> 
> 
> diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
> index b3d94bd..68901c3 100644
> --- a/hw/i386/pc_piix.c
> +++ b/hw/i386/pc_piix.c
> @@ -114,7 +114,7 @@ static void pc_init1(QEMUMachineInitArgs *args,
> 
>      if (pci_enabled) {
>          pci_memory = g_new(MemoryRegion, 1);
> -        memory_region_init(pci_memory, NULL, "pci", INT64_MAX);
> +        memory_region_init(pci_memory, NULL, "pci", UINT64_MAX);
>          rom_memory = pci_memory;
>      } else {
>          pci_memory = NULL;
> diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
> index 2e315f7..c9a03fc 100644
> --- a/hw/i386/pc_q35.c
> +++ b/hw/i386/pc_q35.c
> @@ -101,7 +101,7 @@ static void pc_q35_init(QEMUMachineInitArgs *args)
>      /* pci enabled */
>      if (pci_enabled) {
>          pci_memory = g_new(MemoryRegion, 1);
> -        memory_region_init(pci_memory, NULL, "pci", INT64_MAX);
> +        memory_region_init(pci_memory, NULL, "pci", UINT64_MAX);
>          rom_memory = pci_memory;
>      } else {
>          pci_memory = NULL;
> 
>
Michael S. Tsirkin - Nov. 6, 2013, 6:39 p.m.
On Wed, Nov 06, 2013 at 06:50:05PM +0100, Paolo Bonzini wrote:
> Il 06/11/2013 18:48, Michael S. Tsirkin ha scritto:
> > This just makes the symproms go away.
> 
> That's correct.
> 
> > The real bug is exec ignores high address bits during page
> > lookups. It should fail on invalid address not access
> > a random page.
> > I'll send a patch.
> 
> The real real bug is that all address spaces should be 2^64, which you
> said you consider too intrusive a patch.

Because this will affect performance in unpredicatable way.
We can't make such changes in 1.7 IMHO:
it would need much more than just a quick "works for me".

>  I don't feel confident
> changing phys_page_find, even if it's just 2 lines.
> 
> Paolo

Well it's *obviously* broken if address is outside target address
space.
Take a look at the patch first, then argue.
Paolo Bonzini - Nov. 6, 2013, 9:11 p.m.
Il 06/11/2013 19:36, Luiz Capitulino ha scritto:
> On Wed, 06 Nov 2013 18:39:42 +0100
> Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
>> Il 06/11/2013 17:22, Luiz Capitulino ha scritto:
>>> 1. Run qemu with gdb server support
>>>
>>>    # qemu [...] -s -S
>>>
>>> 2. Connect gdb and try to set a breakpoint
>>>
>>>    $ gdb /path/to/vmlinux
>>>    (gdb) target remote:1234
>>>    (gdb) b secondary_startup_64
>>
>> (Note that this doesn't make much sense until the kernel has been loaded
>> into memory.  You probably want hbreak instead).
> 
> hbreak didn't work either, gdb doesn't stop at the breakpoint. I tried to
> test this with another random function and got a "Remote 'g' packet
> reply is too long" (which seems to be yet another different problem).

Yeah, that's very messy and it would nice to have a fix for it, but I
don't know enough about gdb to say whether it's fixable.

It happens when the processor switches from 32 to 64-bit under gdb's
feet.  The solution is typically to do "set arch  i386:x86-64" before
running the guest with "c" if you know the breakpoint will happen in
64-bit mode.

Paolo
Paolo Bonzini - Nov. 6, 2013, 9:13 p.m.
Il 06/11/2013 19:39, Michael S. Tsirkin ha scritto:
> Because this will affect performance in unpredicatable way.

It's not really unpredictable.  It can be easily unit-tested, and anyway
the targets with 64-bit address spaces don't have any particular
performance problem.

> We can't make such changes in 1.7 IMHO:
> it would need much more than just a quick "works for me".

I can say the same about phys_page_find.  It's been obviously broken for
years and nobody ever cared, it cannot be super-urgent now to fix it.

Paolo

>> >  I don't feel confident
>> > changing phys_page_find, even if it's just 2 lines.
>> > 
>> > Paolo
> Well it's *obviously* broken if address is outside target address
> space.
> Take a look at the patch first, then argue.
Michael S. Tsirkin - Nov. 6, 2013, 9:36 p.m.
On Wed, Nov 06, 2013 at 10:13:16PM +0100, Paolo Bonzini wrote:
> Il 06/11/2013 19:39, Michael S. Tsirkin ha scritto:
> > Because this will affect performance in unpredicatable way.
> 
> It's not really unpredictable.  It can be easily unit-tested,

Go ahead, post, test, I'm not stopping you.

> and anyway
> the targets with 64-bit address spaces don't have any particular
> performance problem.

That's only s390x and they happen not to use IO for anything,
instead they do hypercalls. So hard to tell.

> > We can't make such changes in 1.7 IMHO:
> > it would need much more than just a quick "works for me".
> 
> I can say the same about phys_page_find.

Why? It's *obvious* this code is not designed to work with
addresses > target address space.
You doubt that?
What is the worry here?

>  It's been obviously broken for
> years and nobody ever cared, it cannot be super-urgent now to fix it.
> 
> Paolo

Well we added a bunch of code and now it is failing.

> >> >  I don't feel confident
> >> > changing phys_page_find, even if it's just 2 lines.
> >> > 
> >> > Paolo
> > Well it's *obviously* broken if address is outside target address
> > space.
> > Take a look at the patch first, then argue.

Patch

diff becomes negative, and int128_get64 booms.

Will send as a proper patch tomorrow... can you give your Tested-by?


diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index b3d94bd..68901c3 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -114,7 +114,7 @@  static void pc_init1(QEMUMachineInitArgs *args,

     if (pci_enabled) {
         pci_memory = g_new(MemoryRegion, 1);
-        memory_region_init(pci_memory, NULL, "pci", INT64_MAX);
+        memory_region_init(pci_memory, NULL, "pci", UINT64_MAX);
         rom_memory = pci_memory;
     } else {
         pci_memory = NULL;
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 2e315f7..c9a03fc 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -101,7 +101,7 @@  static void pc_q35_init(QEMUMachineInitArgs *args)
     /* pci enabled */
     if (pci_enabled) {
         pci_memory = g_new(MemoryRegion, 1);
-        memory_region_init(pci_memory, NULL, "pci", INT64_MAX);
+        memory_region_init(pci_memory, NULL, "pci", UINT64_MAX);
         rom_memory = pci_memory;
     } else {
         pci_memory = NULL;