diff mbox

[for-2.0?] target-i386: fix gdb debugging with large memory guests

Message ID 20140317215455.2f14b61f@redhat.com
State New
Headers show

Commit Message

Luiz Capitulino March 18, 2014, 1:54 a.m. UTC
If you start a Linux guest with more than 4GB of memory and try to look at a
memory address, you will get an error from gdb:

(gdb) p node_data[0]->node_id
Cannot access memory at address 0xffff88013fffd3a0
(gdb)

I debugged this down to x86_cpu_get_phys_page_debug(), it doesn't handle the
case where the PDPTE has the PS bit set (although I didn't check where Linux
sets that bit). This commit adds the PS bit handling, which fixes the problem
for me.

Signed-off-by: Luiz capitulino <lcapitulino@redhat.com>
---

Two observations:

 1. This bug has always existed, so it's not a regression, so I'm not sure
    it's worth it to fix for 2.0

 2. I'm not familiar with every detail of x86_cpu_get_phys_page_debug(),
    so I'm not completely sure this is the right thing to do

 target-i386/helper.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Jan Kiszka March 18, 2014, 7:19 a.m. UTC | #1
On 2014-03-18 02:54, Luiz Capitulino wrote:
> If you start a Linux guest with more than 4GB of memory and try to look at a
> memory address, you will get an error from gdb:
> 
> (gdb) p node_data[0]->node_id
> Cannot access memory at address 0xffff88013fffd3a0
> (gdb)

I suppose this is x86-64, not 32-bit with PTE, right?

> 
> I debugged this down to x86_cpu_get_phys_page_debug(), it doesn't handle the
> case where the PDPTE has the PS bit set (although I didn't check where Linux
> sets that bit). This commit adds the PS bit handling, which fixes the problem
> for me.
> 
> Signed-off-by: Luiz capitulino <lcapitulino@redhat.com>
> ---
> 
> Two observations:
> 
>  1. This bug has always existed, so it's not a regression, so I'm not sure
>     it's worth it to fix for 2.0
> 
>  2. I'm not familiar with every detail of x86_cpu_get_phys_page_debug(),
>     so I'm not completely sure this is the right thing to do
> 
>  target-i386/helper.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/target-i386/helper.c b/target-i386/helper.c
> index 4f447b8..9b7803f 100644
> --- a/target-i386/helper.c
> +++ b/target-i386/helper.c
> @@ -951,6 +951,13 @@ hwaddr x86_cpu_get_phys_page_debug(CPUState *cs, vaddr addr)
>                  return -1;
>          }
>  
> +        if (pdpe & PG_PSE_MASK) {
> +            page_size = 1024 * 1024 * 1024;
> +            pte = pdpe & ~( (page_size - 1) & ~0xfff);
> +            pte &= ~(PG_NX_MASK | PG_HI_USER_MASK);
> +            goto out;
> +        }

Does this also apply if we are not in long mode?

I'll check this more carefully later.

Jan

> +
>          pde_addr = ((pdpe & ~0xfff & ~(PG_NX_MASK | PG_HI_USER_MASK)) +
>                      (((addr >> 21) & 0x1ff) << 3)) & env->a20_mask;
>          pde = ldq_phys(cs->as, pde_addr);
> @@ -993,6 +1000,7 @@ hwaddr x86_cpu_get_phys_page_debug(CPUState *cs, vaddr addr)
>          pte = pte & env->a20_mask;
>      }
>  
> +out:
>      page_offset = (addr & TARGET_PAGE_MASK) & (page_size - 1);
>      paddr = (pte & TARGET_PAGE_MASK) + page_offset;
>      return paddr;
>
Paolo Bonzini March 18, 2014, 7:36 a.m. UTC | #2
Il 18/03/2014 08:19, Jan Kiszka ha scritto:
> On 2014-03-18 02:54, Luiz Capitulino wrote:
>> If you start a Linux guest with more than 4GB of memory and try to look at a
>> memory address, you will get an error from gdb:
>>
>> (gdb) p node_data[0]->node_id
>> Cannot access memory at address 0xffff88013fffd3a0
>> (gdb)
>
> I suppose this is x86-64, not 32-bit with PTE, right?
>
>>
>> I debugged this down to x86_cpu_get_phys_page_debug(), it doesn't handle the
>> case where the PDPTE has the PS bit set (although I didn't check where Linux
>> sets that bit). This commit adds the PS bit handling, which fixes the problem
>> for me.
>>
>> Signed-off-by: Luiz capitulino <lcapitulino@redhat.com>
>> ---
>>
>> Two observations:
>>
>>  1. This bug has always existed, so it's not a regression, so I'm not sure
>>     it's worth it to fix for 2.0

Sure, why not?

>>  2. I'm not familiar with every detail of x86_cpu_get_phys_page_debug(),
>>     so I'm not completely sure this is the right thing to do
>>
>>  target-i386/helper.c | 8 ++++++++
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/target-i386/helper.c b/target-i386/helper.c
>> index 4f447b8..9b7803f 100644
>> --- a/target-i386/helper.c
>> +++ b/target-i386/helper.c
>> @@ -951,6 +951,13 @@ hwaddr x86_cpu_get_phys_page_debug(CPUState *cs, vaddr addr)
>>                  return -1;
>>          }
>>
>> +        if (pdpe & PG_PSE_MASK) {
>> +            page_size = 1024 * 1024 * 1024;
>> +            pte = pdpe & ~( (page_size - 1) & ~0xfff);
>> +            pte &= ~(PG_NX_MASK | PG_HI_USER_MASK);
>> +            goto out;
>> +        }
>
> Does this also apply if we are not in long mode?

No, it doesn't.  The only valid bits in a PAE PDPTE are P, PWT and PCD. 
  Bit 7 (PS) is reserved.

Paolo
Jan Kiszka March 18, 2014, 10:30 a.m. UTC | #3
On 2014-03-18 08:36, Paolo Bonzini wrote:
> Il 18/03/2014 08:19, Jan Kiszka ha scritto:
>> On 2014-03-18 02:54, Luiz Capitulino wrote:
>>> If you start a Linux guest with more than 4GB of memory and try to
>>> look at a
>>> memory address, you will get an error from gdb:
>>>
>>> (gdb) p node_data[0]->node_id
>>> Cannot access memory at address 0xffff88013fffd3a0
>>> (gdb)
>>
>> I suppose this is x86-64, not 32-bit with PTE, right?
>>
>>>
>>> I debugged this down to x86_cpu_get_phys_page_debug(), it doesn't
>>> handle the
>>> case where the PDPTE has the PS bit set (although I didn't check
>>> where Linux
>>> sets that bit). This commit adds the PS bit handling, which fixes the
>>> problem
>>> for me.
>>>
>>> Signed-off-by: Luiz capitulino <lcapitulino@redhat.com>
>>> ---
>>>
>>> Two observations:
>>>
>>>  1. This bug has always existed, so it's not a regression, so I'm not
>>> sure
>>>     it's worth it to fix for 2.0
> 
> Sure, why not?
> 
>>>  2. I'm not familiar with every detail of x86_cpu_get_phys_page_debug(),
>>>     so I'm not completely sure this is the right thing to do
>>>
>>>  target-i386/helper.c | 8 ++++++++
>>>  1 file changed, 8 insertions(+)
>>>
>>> diff --git a/target-i386/helper.c b/target-i386/helper.c
>>> index 4f447b8..9b7803f 100644
>>> --- a/target-i386/helper.c
>>> +++ b/target-i386/helper.c
>>> @@ -951,6 +951,13 @@ hwaddr x86_cpu_get_phys_page_debug(CPUState *cs,
>>> vaddr addr)
>>>                  return -1;
>>>          }
>>>
>>> +        if (pdpe & PG_PSE_MASK) {
>>> +            page_size = 1024 * 1024 * 1024;
>>> +            pte = pdpe & ~( (page_size - 1) & ~0xfff);
>>> +            pte &= ~(PG_NX_MASK | PG_HI_USER_MASK);
>>> +            goto out;
>>> +        }
>>
>> Does this also apply if we are not in long mode?
> 
> No, it doesn't.  The only valid bits in a PAE PDPTE are P, PWT and PCD.
>  Bit 7 (PS) is reserved.

Right, this belongs in the "if (env->hflags & HF_LMA_MASK)" block.

And the subject or description should mention that
x86_cpu_get_phys_page_debug was lacking support for 1G hugepages.

Jan
Luiz Capitulino March 18, 2014, 12:49 p.m. UTC | #4
On Tue, 18 Mar 2014 08:19:52 +0100
Jan Kiszka <jan.kiszka@siemens.com> wrote:

> On 2014-03-18 02:54, Luiz Capitulino wrote:
> > If you start a Linux guest with more than 4GB of memory and try to look at a
> > memory address, you will get an error from gdb:
> > 
> > (gdb) p node_data[0]->node_id
> > Cannot access memory at address 0xffff88013fffd3a0
> > (gdb)
> 
> I suppose this is x86-64, not 32-bit with PTE, right?

Right.

> 
> > 
> > I debugged this down to x86_cpu_get_phys_page_debug(), it doesn't handle the
> > case where the PDPTE has the PS bit set (although I didn't check where Linux
> > sets that bit). This commit adds the PS bit handling, which fixes the problem
> > for me.
> > 
> > Signed-off-by: Luiz capitulino <lcapitulino@redhat.com>
> > ---
> > 
> > Two observations:
> > 
> >  1. This bug has always existed, so it's not a regression, so I'm not sure
> >     it's worth it to fix for 2.0
> > 
> >  2. I'm not familiar with every detail of x86_cpu_get_phys_page_debug(),
> >     so I'm not completely sure this is the right thing to do
> > 
> >  target-i386/helper.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/target-i386/helper.c b/target-i386/helper.c
> > index 4f447b8..9b7803f 100644
> > --- a/target-i386/helper.c
> > +++ b/target-i386/helper.c
> > @@ -951,6 +951,13 @@ hwaddr x86_cpu_get_phys_page_debug(CPUState *cs, vaddr addr)
> >                  return -1;
> >          }
> >  
> > +        if (pdpe & PG_PSE_MASK) {
> > +            page_size = 1024 * 1024 * 1024;
> > +            pte = pdpe & ~( (page_size - 1) & ~0xfff);
> > +            pte &= ~(PG_NX_MASK | PG_HI_USER_MASK);
> > +            goto out;
> > +        }
> 
> Does this also apply if we are not in long mode?
> 
> I'll check this more carefully later.
> 
> Jan
> 
> > +
> >          pde_addr = ((pdpe & ~0xfff & ~(PG_NX_MASK | PG_HI_USER_MASK)) +
> >                      (((addr >> 21) & 0x1ff) << 3)) & env->a20_mask;
> >          pde = ldq_phys(cs->as, pde_addr);
> > @@ -993,6 +1000,7 @@ hwaddr x86_cpu_get_phys_page_debug(CPUState *cs, vaddr addr)
> >          pte = pte & env->a20_mask;
> >      }
> >  
> > +out:
> >      page_offset = (addr & TARGET_PAGE_MASK) & (page_size - 1);
> >      paddr = (pte & TARGET_PAGE_MASK) + page_offset;
> >      return paddr;
> > 
>
Luiz Capitulino March 18, 2014, 1 p.m. UTC | #5
On Tue, 18 Mar 2014 11:30:42 +0100
Jan Kiszka <jan.kiszka@siemens.com> wrote:

> On 2014-03-18 08:36, Paolo Bonzini wrote:
> > Il 18/03/2014 08:19, Jan Kiszka ha scritto:
> >> On 2014-03-18 02:54, Luiz Capitulino wrote:
> >>> If you start a Linux guest with more than 4GB of memory and try to
> >>> look at a
> >>> memory address, you will get an error from gdb:
> >>>
> >>> (gdb) p node_data[0]->node_id
> >>> Cannot access memory at address 0xffff88013fffd3a0
> >>> (gdb)
> >>
> >> I suppose this is x86-64, not 32-bit with PTE, right?
> >>
> >>>
> >>> I debugged this down to x86_cpu_get_phys_page_debug(), it doesn't
> >>> handle the
> >>> case where the PDPTE has the PS bit set (although I didn't check
> >>> where Linux
> >>> sets that bit). This commit adds the PS bit handling, which fixes the
> >>> problem
> >>> for me.
> >>>
> >>> Signed-off-by: Luiz capitulino <lcapitulino@redhat.com>
> >>> ---
> >>>
> >>> Two observations:
> >>>
> >>>  1. This bug has always existed, so it's not a regression, so I'm not
> >>> sure
> >>>     it's worth it to fix for 2.0
> > 
> > Sure, why not?
> > 
> >>>  2. I'm not familiar with every detail of x86_cpu_get_phys_page_debug(),
> >>>     so I'm not completely sure this is the right thing to do
> >>>
> >>>  target-i386/helper.c | 8 ++++++++
> >>>  1 file changed, 8 insertions(+)
> >>>
> >>> diff --git a/target-i386/helper.c b/target-i386/helper.c
> >>> index 4f447b8..9b7803f 100644
> >>> --- a/target-i386/helper.c
> >>> +++ b/target-i386/helper.c
> >>> @@ -951,6 +951,13 @@ hwaddr x86_cpu_get_phys_page_debug(CPUState *cs,
> >>> vaddr addr)
> >>>                  return -1;
> >>>          }
> >>>
> >>> +        if (pdpe & PG_PSE_MASK) {
> >>> +            page_size = 1024 * 1024 * 1024;
> >>> +            pte = pdpe & ~( (page_size - 1) & ~0xfff);
> >>> +            pte &= ~(PG_NX_MASK | PG_HI_USER_MASK);
> >>> +            goto out;
> >>> +        }
> >>
> >> Does this also apply if we are not in long mode?
> > 
> > No, it doesn't.  The only valid bits in a PAE PDPTE are P, PWT and PCD.
> >  Bit 7 (PS) is reserved.
> 
> Right, this belongs in the "if (env->hflags & HF_LMA_MASK)" block.
> 
> And the subject or description should mention that
> x86_cpu_get_phys_page_debug was lacking support for 1G hugepages.

To be honest, although the PS bit is set and that indicates a 1GB page,
I didn't know Linux does that. I thought Linux would use 4KB pages for
everything unless it's explicitly asked to use bigger pages. Also, note that
I was using gdb to debug really early kernel boot code (start_kernel()).

I'd feel more confident to have such a changelog after I find out where
exactly Linux sets that bit, but I won't have time in the next days. On the
other hand, the patch does fix the problem to me.
Jan Kiszka March 18, 2014, 2:36 p.m. UTC | #6
On 2014-03-18 14:00, Luiz Capitulino wrote:
> On Tue, 18 Mar 2014 11:30:42 +0100
> Jan Kiszka <jan.kiszka@siemens.com> wrote:
> 
>> On 2014-03-18 08:36, Paolo Bonzini wrote:
>>> Il 18/03/2014 08:19, Jan Kiszka ha scritto:
>>>> On 2014-03-18 02:54, Luiz Capitulino wrote:
>>>>> If you start a Linux guest with more than 4GB of memory and try to
>>>>> look at a
>>>>> memory address, you will get an error from gdb:
>>>>>
>>>>> (gdb) p node_data[0]->node_id
>>>>> Cannot access memory at address 0xffff88013fffd3a0
>>>>> (gdb)
>>>>
>>>> I suppose this is x86-64, not 32-bit with PTE, right?
>>>>
>>>>>
>>>>> I debugged this down to x86_cpu_get_phys_page_debug(), it doesn't
>>>>> handle the
>>>>> case where the PDPTE has the PS bit set (although I didn't check
>>>>> where Linux
>>>>> sets that bit). This commit adds the PS bit handling, which fixes the
>>>>> problem
>>>>> for me.
>>>>>
>>>>> Signed-off-by: Luiz capitulino <lcapitulino@redhat.com>
>>>>> ---
>>>>>
>>>>> Two observations:
>>>>>
>>>>>  1. This bug has always existed, so it's not a regression, so I'm not
>>>>> sure
>>>>>     it's worth it to fix for 2.0
>>>
>>> Sure, why not?
>>>
>>>>>  2. I'm not familiar with every detail of x86_cpu_get_phys_page_debug(),
>>>>>     so I'm not completely sure this is the right thing to do
>>>>>
>>>>>  target-i386/helper.c | 8 ++++++++
>>>>>  1 file changed, 8 insertions(+)
>>>>>
>>>>> diff --git a/target-i386/helper.c b/target-i386/helper.c
>>>>> index 4f447b8..9b7803f 100644
>>>>> --- a/target-i386/helper.c
>>>>> +++ b/target-i386/helper.c
>>>>> @@ -951,6 +951,13 @@ hwaddr x86_cpu_get_phys_page_debug(CPUState *cs,
>>>>> vaddr addr)
>>>>>                  return -1;
>>>>>          }
>>>>>
>>>>> +        if (pdpe & PG_PSE_MASK) {
>>>>> +            page_size = 1024 * 1024 * 1024;
>>>>> +            pte = pdpe & ~( (page_size - 1) & ~0xfff);
>>>>> +            pte &= ~(PG_NX_MASK | PG_HI_USER_MASK);
>>>>> +            goto out;
>>>>> +        }
>>>>
>>>> Does this also apply if we are not in long mode?
>>>
>>> No, it doesn't.  The only valid bits in a PAE PDPTE are P, PWT and PCD.
>>>  Bit 7 (PS) is reserved.
>>
>> Right, this belongs in the "if (env->hflags & HF_LMA_MASK)" block.
>>
>> And the subject or description should mention that
>> x86_cpu_get_phys_page_debug was lacking support for 1G hugepages.
> 
> To be honest, although the PS bit is set and that indicates a 1GB page,
> I didn't know Linux does that. I thought Linux would use 4KB pages for
> everything unless it's explicitly asked to use bigger pages. Also, note that
> I was using gdb to debug really early kernel boot code (start_kernel()).

I could imagine that Linux initially creates a giant identity mapping
page table for the startup process and only later on switches to
fine-grained tables of 4K and 2M pages. Giant pages still require
hughtlbfs, IIRC.

> 
> I'd feel more confident to have such a changelog after I find out where
> exactly Linux sets that bit, but I won't have time in the next days. On the
> other hand, the patch does fix the problem to me.

Don't worry about Linux (the code should work with any OS anyway), just
believe your reviewers. ;) Alternatively, check Intel IA32 SDM on page
table structures.

Jan
Luiz Capitulino March 18, 2014, 4:23 p.m. UTC | #7
On Tue, 18 Mar 2014 15:36:45 +0100
Jan Kiszka <jan.kiszka@siemens.com> wrote:

> >> Right, this belongs in the "if (env->hflags & HF_LMA_MASK)" block.
> >>
> >> And the subject or description should mention that
> >> x86_cpu_get_phys_page_debug was lacking support for 1G hugepages.
> > 
> > To be honest, although the PS bit is set and that indicates a 1GB page,
> > I didn't know Linux does that. I thought Linux would use 4KB pages for
> > everything unless it's explicitly asked to use bigger pages. Also, note that
> > I was using gdb to debug really early kernel boot code (start_kernel()).
> 
> I could imagine that Linux initially creates a giant identity mapping
> page table for the startup process and only later on switches to
> fine-grained tables of 4K and 2M pages. Giant pages still require
> hughtlbfs, IIRC.
> 
> > 
> > I'd feel more confident to have such a changelog after I find out where
> > exactly Linux sets that bit, but I won't have time in the next days. On the
> > other hand, the patch does fix the problem to me.
> 
> Don't worry about Linux (the code should work with any OS anyway), just
> believe your reviewers. ;) Alternatively, check Intel IA32 SDM on page
> table structures.

OK, so you want me to change the subject? Anything else for v2?
Paolo Bonzini March 18, 2014, 4:37 p.m. UTC | #8
Il 18/03/2014 17:23, Luiz Capitulino ha scritto:
> On Tue, 18 Mar 2014 15:36:45 +0100
> Jan Kiszka <jan.kiszka@siemens.com> wrote:
>
>>>> Right, this belongs in the "if (env->hflags & HF_LMA_MASK)" block.
>>>>
>>>> And the subject or description should mention that
>>>> x86_cpu_get_phys_page_debug was lacking support for 1G hugepages.
>>>
>>> To be honest, although the PS bit is set and that indicates a 1GB page,
>>> I didn't know Linux does that. I thought Linux would use 4KB pages for
>>> everything unless it's explicitly asked to use bigger pages. Also, note that
>>> I was using gdb to debug really early kernel boot code (start_kernel()).
>>
>> I could imagine that Linux initially creates a giant identity mapping
>> page table for the startup process and only later on switches to
>> fine-grained tables of 4K and 2M pages. Giant pages still require
>> hughtlbfs, IIRC.
>>
>>>
>>> I'd feel more confident to have such a changelog after I find out where
>>> exactly Linux sets that bit, but I won't have time in the next days. On the
>>> other hand, the patch does fix the problem to me.
>>
>> Don't worry about Linux (the code should work with any OS anyway), just
>> believe your reviewers. ;) Alternatively, check Intel IA32 SDM on page
>> table structures.
>
> OK, so you want me to change the subject? Anything else for v2?

You only need to move the new code into the "if (env->hflags & 
HF_LMA_MASK)", I think.  The subject is ok.

Paolo
Jan Kiszka March 18, 2014, 4:45 p.m. UTC | #9
On 2014-03-18 17:37, Paolo Bonzini wrote:
> Il 18/03/2014 17:23, Luiz Capitulino ha scritto:
>> On Tue, 18 Mar 2014 15:36:45 +0100
>> Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>
>>>>> Right, this belongs in the "if (env->hflags & HF_LMA_MASK)" block.
>>>>>
>>>>> And the subject or description should mention that
>>>>> x86_cpu_get_phys_page_debug was lacking support for 1G hugepages.
>>>>
>>>> To be honest, although the PS bit is set and that indicates a 1GB page,
>>>> I didn't know Linux does that. I thought Linux would use 4KB pages for
>>>> everything unless it's explicitly asked to use bigger pages. Also,
>>>> note that
>>>> I was using gdb to debug really early kernel boot code
>>>> (start_kernel()).
>>>
>>> I could imagine that Linux initially creates a giant identity mapping
>>> page table for the startup process and only later on switches to
>>> fine-grained tables of 4K and 2M pages. Giant pages still require
>>> hughtlbfs, IIRC.
>>>
>>>>
>>>> I'd feel more confident to have such a changelog after I find out where
>>>> exactly Linux sets that bit, but I won't have time in the next days.
>>>> On the
>>>> other hand, the patch does fix the problem to me.
>>>
>>> Don't worry about Linux (the code should work with any OS anyway), just
>>> believe your reviewers. ;) Alternatively, check Intel IA32 SDM on page
>>> table structures.
>>
>> OK, so you want me to change the subject? Anything else for v2?
> 
> You only need to move the new code into the "if (env->hflags &
> HF_LMA_MASK)", I think.  The subject is ok.

Yes. Subject is fine, a reference to GB pages in the description would
be nice-to-have.

Jan
diff mbox

Patch

diff --git a/target-i386/helper.c b/target-i386/helper.c
index 4f447b8..9b7803f 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -951,6 +951,13 @@  hwaddr x86_cpu_get_phys_page_debug(CPUState *cs, vaddr addr)
                 return -1;
         }
 
+        if (pdpe & PG_PSE_MASK) {
+            page_size = 1024 * 1024 * 1024;
+            pte = pdpe & ~( (page_size - 1) & ~0xfff);
+            pte &= ~(PG_NX_MASK | PG_HI_USER_MASK);
+            goto out;
+        }
+
         pde_addr = ((pdpe & ~0xfff & ~(PG_NX_MASK | PG_HI_USER_MASK)) +
                     (((addr >> 21) & 0x1ff) << 3)) & env->a20_mask;
         pde = ldq_phys(cs->as, pde_addr);
@@ -993,6 +1000,7 @@  hwaddr x86_cpu_get_phys_page_debug(CPUState *cs, vaddr addr)
         pte = pte & env->a20_mask;
     }
 
+out:
     page_offset = (addr & TARGET_PAGE_MASK) & (page_size - 1);
     paddr = (pte & TARGET_PAGE_MASK) + page_offset;
     return paddr;