diff mbox

ppc: Yet another fix for the huge page support detection mechanism

Message ID 1468570225-14101-1-git-send-email-thuth@redhat.com
State New
Headers show

Commit Message

Thomas Huth July 15, 2016, 8:10 a.m. UTC
Commit 86b50f2e1bef ("Disable huge page support if it is not available
for main RAM") already made sure that huge page support is not announced
to the guest if the normal RAM of non-NUMA configurations is not backed
by a huge page filesystem. However, there is one more case that can go
wrong: NUMA is enabled, but the RAM of the NUMA nodes is not configured
with huge page support (and only the memory of a DIMM is configured with
it). For example, when QEMU is started with the following command line,
the Linux guest currently crashes because it tries to use huge pages
on a memory region that does not support huge pages:

 qemu-system-ppc64 -enable-kvm ... -m 1G,slots=4,maxmem=32G -object \
   memory-backend-file,policy=default,mem-path=/hugepages,size=1G,id=mem-mem1 \
   -device pc-dimm,id=dimm-mem1,memdev=mem-mem1 -smp 2 \
   -numa node,nodeid=0 -numa node,nodeid=1

To fix this issue, we must also disable huge page support when there
is a NUMA node that is not using a memory backend with huge page
support.

Fixes: 86b50f2e1befc33407bdfeb6f45f7b0d2439a740
Signed-off-by: Thomas Huth <thuth@redhat.com>
---
 target-ppc/kvm.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

Comments

David Gibson July 15, 2016, 8:35 a.m. UTC | #1
On Fri, Jul 15, 2016 at 10:10:25AM +0200, Thomas Huth wrote:
> Commit 86b50f2e1bef ("Disable huge page support if it is not available
> for main RAM") already made sure that huge page support is not announced
> to the guest if the normal RAM of non-NUMA configurations is not backed
> by a huge page filesystem. However, there is one more case that can go
> wrong: NUMA is enabled, but the RAM of the NUMA nodes are not configured
> with huge page support (and only the memory of a DIMM is configured with
> it). When QEMU is started with the following command line for example,
> the Linux guest currently crashes because it is trying to use huge pages
> on a memory region that does not support huge pages:
> 
>  qemu-system-ppc64 -enable-kvm ... -m 1G,slots=4,maxmem=32G -object \
>    memory-backend-file,policy=default,mem-path=/hugepages,size=1G,id=mem-mem1 \
>    -device pc-dimm,id=dimm-mem1,memdev=mem-mem1 -smp 2 \
>    -numa node,nodeid=0 -numa node,nodeid=1
> 
> To fix this issue, we've got to make sure to disable huge page support,
> too, when there is a NUMA node that is not using a memory backend with
> huge page support.
> 
> Fixes: 86b50f2e1befc33407bdfeb6f45f7b0d2439a740
> Signed-off-by: Thomas Huth <thuth@redhat.com>
> ---
>  target-ppc/kvm.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> index 884d564..7a8f555 100644
> --- a/target-ppc/kvm.c
> +++ b/target-ppc/kvm.c
> @@ -389,12 +389,16 @@ static long getrampagesize(void)
>  
>      object_child_foreach(memdev_root, find_max_supported_pagesize, &hpsize);
>  
> -    if (hpsize == LONG_MAX) {
> +    if (hpsize == LONG_MAX || hpsize == getpagesize()) {
>          return getpagesize();
>      }
>  
> -    if (nb_numa_nodes == 0 && hpsize > getpagesize()) {
> -        /* No NUMA nodes and normal RAM without -mem-path ==> no huge pages! */
> +    /* If NUMA is disabled or the NUMA nodes are not backed with a
> +     * memory-backend, then there is at least one node using "normal"
> +     * RAM. And since normal RAM has not been configured with "-mem-path"
> +     * (what we've checked earlier here already), we can not use huge pages!
> +     */
> +    if (nb_numa_nodes == 0 || numa_info[0].node_memdev == NULL) {

Is that second clause sufficient, or do you need to loop through and
check the memdev of every node?

>          static bool warned;
>          if (!warned) {
>              error_report("Huge page support disabled (n/a for main memory).");
Greg Kurz July 15, 2016, 9:28 a.m. UTC | #2
On Fri, 15 Jul 2016 10:10:25 +0200
Thomas Huth <thuth@redhat.com> wrote:

> Commit 86b50f2e1bef ("Disable huge page support if it is not available
> for main RAM") already made sure that huge page support is not announced
> to the guest if the normal RAM of non-NUMA configurations is not backed
> by a huge page filesystem. However, there is one more case that can go
> wrong: NUMA is enabled, but the RAM of the NUMA nodes are not configured
> with huge page support (and only the memory of a DIMM is configured with
> it). When QEMU is started with the following command line for example,
> the Linux guest currently crashes because it is trying to use huge pages
> on a memory region that does not support huge pages:
> 
>  qemu-system-ppc64 -enable-kvm ... -m 1G,slots=4,maxmem=32G -object \
>    memory-backend-file,policy=default,mem-path=/hugepages,size=1G,id=mem-mem1 \
>    -device pc-dimm,id=dimm-mem1,memdev=mem-mem1 -smp 2 \
>    -numa node,nodeid=0 -numa node,nodeid=1
> 
> To fix this issue, we've got to make sure to disable huge page support,
> too, when there is a NUMA node that is not using a memory backend with
> huge page support.
> 
> Fixes: 86b50f2e1befc33407bdfeb6f45f7b0d2439a740

According to http://patchwork.ozlabs.org/patch/584741/ , it is best worded

"Broken in commit 86b50f2e1bef"

> Signed-off-by: Thomas Huth <thuth@redhat.com>
> ---
>  target-ppc/kvm.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> index 884d564..7a8f555 100644
> --- a/target-ppc/kvm.c
> +++ b/target-ppc/kvm.c
> @@ -389,12 +389,16 @@ static long getrampagesize(void)
>  
>      object_child_foreach(memdev_root, find_max_supported_pagesize, &hpsize);
>  
> -    if (hpsize == LONG_MAX) {
> +    if (hpsize == LONG_MAX || hpsize == getpagesize()) {
>          return getpagesize();
>      }
>  
> -    if (nb_numa_nodes == 0 && hpsize > getpagesize()) {
> -        /* No NUMA nodes and normal RAM without -mem-path ==> no huge pages! */
> +    /* If NUMA is disabled or the NUMA nodes are not backed with a
> +     * memory-backend, then there is at least one node using "normal"
> +     * RAM. And since normal RAM has not been configured with "-mem-path"
> +     * (what we've checked earlier here already), we can not use huge pages!
> +     */
> +    if (nb_numa_nodes == 0 || numa_info[0].node_memdev == NULL) {

Dumb question: why only checking numa_info[0] ?

>          static bool warned;
>          if (!warned) {
>              error_report("Huge page support disabled (n/a for main memory).");
Thomas Huth July 15, 2016, 12:28 p.m. UTC | #3
On 15.07.2016 10:35, David Gibson wrote:
> On Fri, Jul 15, 2016 at 10:10:25AM +0200, Thomas Huth wrote:
>> Commit 86b50f2e1bef ("Disable huge page support if it is not available
>> for main RAM") already made sure that huge page support is not announced
>> to the guest if the normal RAM of non-NUMA configurations is not backed
>> by a huge page filesystem. However, there is one more case that can go
>> wrong: NUMA is enabled, but the RAM of the NUMA nodes are not configured
>> with huge page support (and only the memory of a DIMM is configured with
>> it). When QEMU is started with the following command line for example,
>> the Linux guest currently crashes because it is trying to use huge pages
>> on a memory region that does not support huge pages:
>>
>>  qemu-system-ppc64 -enable-kvm ... -m 1G,slots=4,maxmem=32G -object \
>>    memory-backend-file,policy=default,mem-path=/hugepages,size=1G,id=mem-mem1 \
>>    -device pc-dimm,id=dimm-mem1,memdev=mem-mem1 -smp 2 \
>>    -numa node,nodeid=0 -numa node,nodeid=1
>>
>> To fix this issue, we've got to make sure to disable huge page support,
>> too, when there is a NUMA node that is not using a memory backend with
>> huge page support.
>>
>> Fixes: 86b50f2e1befc33407bdfeb6f45f7b0d2439a740
>> Signed-off-by: Thomas Huth <thuth@redhat.com>
>> ---
>>  target-ppc/kvm.c | 10 +++++++---
>>  1 file changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
>> index 884d564..7a8f555 100644
>> --- a/target-ppc/kvm.c
>> +++ b/target-ppc/kvm.c
>> @@ -389,12 +389,16 @@ static long getrampagesize(void)
>>  
>>      object_child_foreach(memdev_root, find_max_supported_pagesize, &hpsize);
>>  
>> -    if (hpsize == LONG_MAX) {
>> +    if (hpsize == LONG_MAX || hpsize == getpagesize()) {
>>          return getpagesize();
>>      }
>>  
>> -    if (nb_numa_nodes == 0 && hpsize > getpagesize()) {
>> -        /* No NUMA nodes and normal RAM without -mem-path ==> no huge pages! */
>> +    /* If NUMA is disabled or the NUMA nodes are not backed with a
>> +     * memory-backend, then there is at least one node using "normal"
>> +     * RAM. And since normal RAM has not been configured with "-mem-path"
>> +     * (what we've checked earlier here already), we can not use huge pages!
>> +     */
>> +    if (nb_numa_nodes == 0 || numa_info[0].node_memdev == NULL) {
> 
> Is that second clause sufficient, or do you need to loop through and
> check the memdev of every node?

Checking the first entry should be sufficient. QEMU forces you to
specify either a memory backend for all NUMA nodes (which we should have
looked at during the object_child_foreach() some lines earlier), or you
must not specify a memory backend for any NUMA node at all. You cannot
mix the settings, so checking numa_info[0] is enough.

 Thomas
Greg Kurz July 15, 2016, 3:18 p.m. UTC | #4
On Fri, 15 Jul 2016 14:28:44 +0200
Thomas Huth <thuth@redhat.com> wrote:

> On 15.07.2016 10:35, David Gibson wrote:
> > On Fri, Jul 15, 2016 at 10:10:25AM +0200, Thomas Huth wrote:  
> >> Commit 86b50f2e1bef ("Disable huge page support if it is not available
> >> for main RAM") already made sure that huge page support is not announced
> >> to the guest if the normal RAM of non-NUMA configurations is not backed
> >> by a huge page filesystem. However, there is one more case that can go
> >> wrong: NUMA is enabled, but the RAM of the NUMA nodes are not configured
> >> with huge page support (and only the memory of a DIMM is configured with
> >> it). When QEMU is started with the following command line for example,
> >> the Linux guest currently crashes because it is trying to use huge pages
> >> on a memory region that does not support huge pages:
> >>
> >>  qemu-system-ppc64 -enable-kvm ... -m 1G,slots=4,maxmem=32G -object \
> >>    memory-backend-file,policy=default,mem-path=/hugepages,size=1G,id=mem-mem1 \
> >>    -device pc-dimm,id=dimm-mem1,memdev=mem-mem1 -smp 2 \
> >>    -numa node,nodeid=0 -numa node,nodeid=1
> >>
> >> To fix this issue, we've got to make sure to disable huge page support,
> >> too, when there is a NUMA node that is not using a memory backend with
> >> huge page support.
> >>
> >> Fixes: 86b50f2e1befc33407bdfeb6f45f7b0d2439a740
> >> Signed-off-by: Thomas Huth <thuth@redhat.com>
> >> ---
> >>  target-ppc/kvm.c | 10 +++++++---
> >>  1 file changed, 7 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> >> index 884d564..7a8f555 100644
> >> --- a/target-ppc/kvm.c
> >> +++ b/target-ppc/kvm.c
> >> @@ -389,12 +389,16 @@ static long getrampagesize(void)
> >>  
> >>      object_child_foreach(memdev_root, find_max_supported_pagesize, &hpsize);
> >>  
> >> -    if (hpsize == LONG_MAX) {
> >> +    if (hpsize == LONG_MAX || hpsize == getpagesize()) {
> >>          return getpagesize();
> >>      }
> >>  
> >> -    if (nb_numa_nodes == 0 && hpsize > getpagesize()) {
> >> -        /* No NUMA nodes and normal RAM without -mem-path ==> no huge pages! */
> >> +    /* If NUMA is disabled or the NUMA nodes are not backed with a
> >> +     * memory-backend, then there is at least one node using "normal"
> >> +     * RAM. And since normal RAM has not been configured with "-mem-path"
> >> +     * (what we've checked earlier here already), we can not use huge pages!
> >> +     */
> >> +    if (nb_numa_nodes == 0 || numa_info[0].node_memdev == NULL) {  
> > 
> > Is that second clause sufficient, or do you need to loop through and
> > check the memdev of every node?  
> 
> Checking the first entry should be sufficient. QEMU forces you to
> specify either a memory backend for all NUMA nodes (which we should have
> looked at during the object_child_foreach() some lines earlier), or you
> must not specify a memory backend for any NUMA node at all. You can not
> mix the settings, so checking numa_info[0] is enough.
> 
>  Thomas
> 
> 

And what happens if we specify a hugepage memdev backend to one of the
nodes and a regular RAM memdev backend to the other ?

I actually wanted to try that but I hit an assertion, which isn't
related to this patch I think:

qemu-system-ppc64: memory.c:1934: memory_region_add_subregion_common: 
   Assertion `!subregion->container' failed.

So I tried to trick the logic you are trying to fix the other way
round:

-mem-path /dev/hugepages \
-m 1G,slots=4,maxmem=32G \
-object memory-backend-ram,policy=default,size=1G,id=mem-mem1 \
-device pc-dimm,id=dimm-mem1,memdev=mem-mem1 \
-smp 2 \
-numa node,nodeid=0 -numa node,nodeid=1

The guest fails the same way as before your patch: the hugepage size is
advertised to the guest, but the NUMA node is associated with regular RAM.

--
Greg
Thomas Huth July 15, 2016, 3:54 p.m. UTC | #5
On 15.07.2016 17:18, Greg Kurz wrote:
> On Fri, 15 Jul 2016 14:28:44 +0200
> Thomas Huth <thuth@redhat.com> wrote:
> 
>> On 15.07.2016 10:35, David Gibson wrote:
>>> On Fri, Jul 15, 2016 at 10:10:25AM +0200, Thomas Huth wrote:  
>>>> Commit 86b50f2e1bef ("Disable huge page support if it is not available
>>>> for main RAM") already made sure that huge page support is not announced
>>>> to the guest if the normal RAM of non-NUMA configurations is not backed
>>>> by a huge page filesystem. However, there is one more case that can go
>>>> wrong: NUMA is enabled, but the RAM of the NUMA nodes are not configured
>>>> with huge page support (and only the memory of a DIMM is configured with
>>>> it). When QEMU is started with the following command line for example,
>>>> the Linux guest currently crashes because it is trying to use huge pages
>>>> on a memory region that does not support huge pages:
>>>>
>>>>  qemu-system-ppc64 -enable-kvm ... -m 1G,slots=4,maxmem=32G -object \
>>>>    memory-backend-file,policy=default,mem-path=/hugepages,size=1G,id=mem-mem1 \
>>>>    -device pc-dimm,id=dimm-mem1,memdev=mem-mem1 -smp 2 \
>>>>    -numa node,nodeid=0 -numa node,nodeid=1
>>>>
>>>> To fix this issue, we've got to make sure to disable huge page support,
>>>> too, when there is a NUMA node that is not using a memory backend with
>>>> huge page support.
>>>>
>>>> Fixes: 86b50f2e1befc33407bdfeb6f45f7b0d2439a740
>>>> Signed-off-by: Thomas Huth <thuth@redhat.com>
>>>> ---
>>>>  target-ppc/kvm.c | 10 +++++++---
>>>>  1 file changed, 7 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
>>>> index 884d564..7a8f555 100644
>>>> --- a/target-ppc/kvm.c
>>>> +++ b/target-ppc/kvm.c
>>>> @@ -389,12 +389,16 @@ static long getrampagesize(void)
>>>>  
>>>>      object_child_foreach(memdev_root, find_max_supported_pagesize, &hpsize);
>>>>  
>>>> -    if (hpsize == LONG_MAX) {
>>>> +    if (hpsize == LONG_MAX || hpsize == getpagesize()) {
>>>>          return getpagesize();
>>>>      }
>>>>  
>>>> -    if (nb_numa_nodes == 0 && hpsize > getpagesize()) {
>>>> -        /* No NUMA nodes and normal RAM without -mem-path ==> no huge pages! */
>>>> +    /* If NUMA is disabled or the NUMA nodes are not backed with a
>>>> +     * memory-backend, then there is at least one node using "normal"
>>>> +     * RAM. And since normal RAM has not been configured with "-mem-path"
>>>> +     * (what we've checked earlier here already), we can not use huge pages!
>>>> +     */
>>>> +    if (nb_numa_nodes == 0 || numa_info[0].node_memdev == NULL) {  
>>>
>>> Is that second clause sufficient, or do you need to loop through and
>>> check the memdev of every node?  
>>
>> Checking the first entry should be sufficient. QEMU forces you to
>> specify either a memory backend for all NUMA nodes (which we should have
>> looked at during the object_child_foreach() some lines earlier), or you
>> must not specify a memory backend for any NUMA node at all. You can not
>> mix the settings, so checking numa_info[0] is enough.
> 
> And what happens if we specify a hugepage memdev backend to one of the
> nodes and a regular RAM memdev backend to the other ?

I think that should be handled with the object_child_foreach() logic in
that function ... unless I completely misunderstood the code ;-)

> I actually wanted to try that but I hit an assertion, which isn't
> related to this patch I think:
> 
> qemu-system-ppc64: memory.c:1934: memory_region_add_subregion_common: 
>    Assertion `!subregion->container' failed.

I just tried that, too, and I did not get that assertion:

qemu-system-ppc64 -enable-kvm ... -m 2G,slots=4,maxmem=32G \
 -object memory-backend-file,policy=default,mem-path=/mnt/kvm_hugepage,size=1G,id=mem-mem1 \
 -object memory-backend-file,policy=default,mem-path=/mnt,size=1G,id=mem-mem2 \
 -smp 2 -numa node,nodeid=0,memdev=mem-mem1 \
 -numa node,nodeid=1,memdev=mem-mem2

And the guest was starting fine, with huge pages disabled.

> So I tried to trick the logic you are trying to fix the other way
> round:
> 
> -mem-path /dev/hugepages \
> -m 1G,slots=4,maxmem=32G \
> -object memory-backend-ram,policy=default,size=1G,id=mem-mem1 \
> -device pc-dimm,id=dimm-mem1,memdev=mem-mem1 \
> -smp 2 \
> -numa node,nodeid=0 -numa node,nodeid=1
> 
> The guest fails the same way as before your patch: the hugepage size is
> advertised to the guest, but the numa node is associated to regular ram.

You're right, this is still an issue here! ... so we need yet another
fix for this case :-/

Thanks for the testing!

 Thomas
Greg Kurz July 15, 2016, 4:31 p.m. UTC | #6
On Fri, 15 Jul 2016 17:54:44 +0200
Thomas Huth <thuth@redhat.com> wrote:

> On 15.07.2016 17:18, Greg Kurz wrote:
> > On Fri, 15 Jul 2016 14:28:44 +0200
> > Thomas Huth <thuth@redhat.com> wrote:
> >   
> >> On 15.07.2016 10:35, David Gibson wrote:  
> >>> On Fri, Jul 15, 2016 at 10:10:25AM +0200, Thomas Huth wrote:    
> >>>> Commit 86b50f2e1bef ("Disable huge page support if it is not available
> >>>> for main RAM") already made sure that huge page support is not announced
> >>>> to the guest if the normal RAM of non-NUMA configurations is not backed
> >>>> by a huge page filesystem. However, there is one more case that can go
> >>>> wrong: NUMA is enabled, but the RAM of the NUMA nodes are not configured
> >>>> with huge page support (and only the memory of a DIMM is configured with
> >>>> it). When QEMU is started with the following command line for example,
> >>>> the Linux guest currently crashes because it is trying to use huge pages
> >>>> on a memory region that does not support huge pages:
> >>>>
> >>>>  qemu-system-ppc64 -enable-kvm ... -m 1G,slots=4,maxmem=32G -object \
> >>>>    memory-backend-file,policy=default,mem-path=/hugepages,size=1G,id=mem-mem1 \
> >>>>    -device pc-dimm,id=dimm-mem1,memdev=mem-mem1 -smp 2 \
> >>>>    -numa node,nodeid=0 -numa node,nodeid=1
> >>>>
> >>>> To fix this issue, we've got to make sure to disable huge page support,
> >>>> too, when there is a NUMA node that is not using a memory backend with
> >>>> huge page support.
> >>>>
> >>>> Fixes: 86b50f2e1befc33407bdfeb6f45f7b0d2439a740
> >>>> Signed-off-by: Thomas Huth <thuth@redhat.com>
> >>>> ---
> >>>>  target-ppc/kvm.c | 10 +++++++---
> >>>>  1 file changed, 7 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> >>>> index 884d564..7a8f555 100644
> >>>> --- a/target-ppc/kvm.c
> >>>> +++ b/target-ppc/kvm.c
> >>>> @@ -389,12 +389,16 @@ static long getrampagesize(void)
> >>>>  
> >>>>      object_child_foreach(memdev_root, find_max_supported_pagesize, &hpsize);
> >>>>  
> >>>> -    if (hpsize == LONG_MAX) {
> >>>> +    if (hpsize == LONG_MAX || hpsize == getpagesize()) {
> >>>>          return getpagesize();
> >>>>      }
> >>>>  
> >>>> -    if (nb_numa_nodes == 0 && hpsize > getpagesize()) {
> >>>> -        /* No NUMA nodes and normal RAM without -mem-path ==> no huge pages! */
> >>>> +    /* If NUMA is disabled or the NUMA nodes are not backed with a
> >>>> +     * memory-backend, then there is at least one node using "normal"
> >>>> +     * RAM. And since normal RAM has not been configured with "-mem-path"
> >>>> +     * (what we've checked earlier here already), we can not use huge pages!
> >>>> +     */
> >>>> +    if (nb_numa_nodes == 0 || numa_info[0].node_memdev == NULL) {    
> >>>
> >>> Is that second clause sufficient, or do you need to loop through and
> >>> check the memdev of every node?    
> >>
> >> Checking the first entry should be sufficient. QEMU forces you to
> >> specify either a memory backend for all NUMA nodes (which we should have
> >> looked at during the object_child_foreach() some lines earlier), or you
> >> must not specify a memory backend for any NUMA node at all. You can not
> >> mix the settings, so checking numa_info[0] is enough.  
> > 
> > And what happens if we specify a hugepage memdev backend to one of the
> > nodes and a regular RAM memdev backend to the other ?  
> 
> I think that should be handled with the object_child_foreach() logic in
> that function ... unless I completely misunderstood the code ;-)
> 

You're right. The loop always catches the smallest page size. :)

So this patch indeed fixes the case you describe in the changelog.

Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Greg Kurz <groug@kaod.org>

> > I actually wanted to try that but I hit an assertion, which isn't
> > related to this patch I think:
> > 
> > qemu-system-ppc64: memory.c:1934: memory_region_add_subregion_common: 
> >    Assertion `!subregion->container' failed.  
> 
> I just tried that, too, and I did not get that assertion:
> 

I tried with the master branch (commit 14c7d99333e4) + your patch... I'll
investigate that.

> qemu-system-ppc64 -enable-kvm ... -m 2G,slots=4,maxmem=32G \
>  -object memory-backend-file,policy=default,mem-path=/mnt/kvm_hugepage,size=1G,id=mem-mem1 \
>  -object memory-backend-file,policy=default,mem-path=/mnt,size=1G,id=mem-mem2 \
>  -smp 2 -numa node,nodeid=0,memdev=mem-mem1 \
>  -numa node,nodeid=1,memdev=mem-mem2
> 
> And the guest was starting fine, with huge pages disabled.
> 
> > So I tried to trick the logic you are trying to fix the other way
> > round:
> > 
> > -mem-path /dev/hugepages \
> > -m 1G,slots=4,maxmem=32G \
> > -object memory-backend-ram,policy=default,size=1G,id=mem-mem1 \
> > -device pc-dimm,id=dimm-mem1,memdev=mem-mem1 \
> > -smp 2 \
> > -numa node,nodeid=0 -numa node,nodeid=1
> > 
> > The guest fails the same way as before your patch: the hugepage size is
> > advertised to the guest, but the numa node is associated to regular ram.  
> 
> You're right, this is still an issue here! ... so we need yet another
> fix for this case :-/
> 

Maybe check the memory backend objects first if we have some,
else return gethugepagesize() if we have mem-path,
else return getpagesize() ?

> Thanks for the testing!
> 
>  Thomas
> 
> 

You're welcome.

--
Greg
David Gibson July 18, 2016, 12:52 a.m. UTC | #7
On Fri, Jul 15, 2016 at 10:10:25AM +0200, Thomas Huth wrote:
> Commit 86b50f2e1bef ("Disable huge page support if it is not available
> for main RAM") already made sure that huge page support is not announced
> to the guest if the normal RAM of non-NUMA configurations is not backed
> by a huge page filesystem. However, there is one more case that can go
> wrong: NUMA is enabled, but the RAM of the NUMA nodes are not configured
> with huge page support (and only the memory of a DIMM is configured with
> it). When QEMU is started with the following command line for example,
> the Linux guest currently crashes because it is trying to use huge pages
> on a memory region that does not support huge pages:
> 
>  qemu-system-ppc64 -enable-kvm ... -m 1G,slots=4,maxmem=32G -object \
>    memory-backend-file,policy=default,mem-path=/hugepages,size=1G,id=mem-mem1 \
>    -device pc-dimm,id=dimm-mem1,memdev=mem-mem1 -smp 2 \
>    -numa node,nodeid=0 -numa node,nodeid=1
> 
> To fix this issue, we've got to make sure to disable huge page support,
> too, when there is a NUMA node that is not using a memory backend with
> huge page support.
> 
> Fixes: 86b50f2e1befc33407bdfeb6f45f7b0d2439a740
> Signed-off-by: Thomas Huth <thuth@redhat.com>
> ---
>  target-ppc/kvm.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)

Applied to ppc-for-2.7, thanks.

> 
> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> index 884d564..7a8f555 100644
> --- a/target-ppc/kvm.c
> +++ b/target-ppc/kvm.c
> @@ -389,12 +389,16 @@ static long getrampagesize(void)
>  
>      object_child_foreach(memdev_root, find_max_supported_pagesize, &hpsize);
>  
> -    if (hpsize == LONG_MAX) {
> +    if (hpsize == LONG_MAX || hpsize == getpagesize()) {
>          return getpagesize();
>      }
>  
> -    if (nb_numa_nodes == 0 && hpsize > getpagesize()) {
> -        /* No NUMA nodes and normal RAM without -mem-path ==> no huge pages! */
> +    /* If NUMA is disabled or the NUMA nodes are not backed with a
> +     * memory-backend, then there is at least one node using "normal"
> +     * RAM. And since normal RAM has not been configured with "-mem-path"
> +     * (what we've checked earlier here already), we can not use huge pages!
> +     */
> +    if (nb_numa_nodes == 0 || numa_info[0].node_memdev == NULL) {
>          static bool warned;
>          if (!warned) {
>              error_report("Huge page support disabled (n/a for main memory).");
Greg Kurz July 18, 2016, 8:59 a.m. UTC | #8
On Mon, 18 Jul 2016 10:52:36 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Fri, Jul 15, 2016 at 10:10:25AM +0200, Thomas Huth wrote:
> > Commit 86b50f2e1bef ("Disable huge page support if it is not available
> > for main RAM") already made sure that huge page support is not announced
> > to the guest if the normal RAM of non-NUMA configurations is not backed
> > by a huge page filesystem. However, there is one more case that can go
> > wrong: NUMA is enabled, but the RAM of the NUMA nodes are not configured
> > with huge page support (and only the memory of a DIMM is configured with
> > it). When QEMU is started with the following command line for example,
> > the Linux guest currently crashes because it is trying to use huge pages
> > on a memory region that does not support huge pages:
> > 
> >  qemu-system-ppc64 -enable-kvm ... -m 1G,slots=4,maxmem=32G -object \
> >    memory-backend-file,policy=default,mem-path=/hugepages,size=1G,id=mem-mem1 \
> >    -device pc-dimm,id=dimm-mem1,memdev=mem-mem1 -smp 2 \
> >    -numa node,nodeid=0 -numa node,nodeid=1
> > 
> > To fix this issue, we've got to make sure to disable huge page support,
> > too, when there is a NUMA node that is not using a memory backend with
> > huge page support.
> > 
> > Fixes: 86b50f2e1befc33407bdfeb6f45f7b0d2439a740
> > Signed-off-by: Thomas Huth <thuth@redhat.com>
> > ---
> >  target-ppc/kvm.c | 10 +++++++---
> >  1 file changed, 7 insertions(+), 3 deletions(-)  
> 
> Applied to ppc-for-2.7, thanks.
> 

It looks like my replies to this patch were ignored... no big deal though :)

> > 
> > diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> > index 884d564..7a8f555 100644
> > --- a/target-ppc/kvm.c
> > +++ b/target-ppc/kvm.c
> > @@ -389,12 +389,16 @@ static long getrampagesize(void)
> >  
> >      object_child_foreach(memdev_root, find_max_supported_pagesize, &hpsize);
> >  
> > -    if (hpsize == LONG_MAX) {
> > +    if (hpsize == LONG_MAX || hpsize == getpagesize()) {
> >          return getpagesize();
> >      }
> >  
> > -    if (nb_numa_nodes == 0 && hpsize > getpagesize()) {
> > -        /* No NUMA nodes and normal RAM without -mem-path ==> no huge pages! */
> > +    /* If NUMA is disabled or the NUMA nodes are not backed with a
> > +     * memory-backend, then there is at least one node using "normal"
> > +     * RAM. And since normal RAM has not been configured with "-mem-path"
> > +     * (what we've checked earlier here already), we can not use huge pages!
> > +     */
> > +    if (nb_numa_nodes == 0 || numa_info[0].node_memdev == NULL) {
> >          static bool warned;
> >          if (!warned) {
> >              error_report("Huge page support disabled (n/a for main memory).");  
>
Thomas Huth July 18, 2016, 9:04 a.m. UTC | #9
On 18.07.2016 10:59, Greg Kurz wrote:
> On Mon, 18 Jul 2016 10:52:36 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
>> On Fri, Jul 15, 2016 at 10:10:25AM +0200, Thomas Huth wrote:
>>> Commit 86b50f2e1bef ("Disable huge page support if it is not available
>>> for main RAM") already made sure that huge page support is not announced
>>> to the guest if the normal RAM of non-NUMA configurations is not backed
>>> by a huge page filesystem. However, there is one more case that can go
>>> wrong: NUMA is enabled, but the RAM of the NUMA nodes are not configured
>>> with huge page support (and only the memory of a DIMM is configured with
>>> it). When QEMU is started with the following command line for example,
>>> the Linux guest currently crashes because it is trying to use huge pages
>>> on a memory region that does not support huge pages:
>>>
>>>  qemu-system-ppc64 -enable-kvm ... -m 1G,slots=4,maxmem=32G -object \
>>>    memory-backend-file,policy=default,mem-path=/hugepages,size=1G,id=mem-mem1 \
>>>    -device pc-dimm,id=dimm-mem1,memdev=mem-mem1 -smp 2 \
>>>    -numa node,nodeid=0 -numa node,nodeid=1
>>>
>>> To fix this issue, we've got to make sure to disable huge page support,
>>> too, when there is a NUMA node that is not using a memory backend with
>>> huge page support.
>>>
>>> Fixes: 86b50f2e1befc33407bdfeb6f45f7b0d2439a740
>>> Signed-off-by: Thomas Huth <thuth@redhat.com>
>>> ---
>>>  target-ppc/kvm.c | 10 +++++++---
>>>  1 file changed, 7 insertions(+), 3 deletions(-)  
>>
>> Applied to ppc-for-2.7, thanks.
>>
> 
> It looks like my replies to this patch were ignored... no big deal though :)

I'll try to come up with an additional patch that fixes the remaining
problem that you've found... Meanwhile, did you find out why you get
that assertion that I was not able to recreate? Could you maybe post the
exact command line to trigger that assertion?

 Thomas
David Gibson July 18, 2016, 9:21 a.m. UTC | #10
On Mon, Jul 18, 2016 at 10:59:44AM +0200, Greg Kurz wrote:
> On Mon, 18 Jul 2016 10:52:36 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Fri, Jul 15, 2016 at 10:10:25AM +0200, Thomas Huth wrote:
> > > Commit 86b50f2e1bef ("Disable huge page support if it is not available
> > > for main RAM") already made sure that huge page support is not announced
> > > to the guest if the normal RAM of non-NUMA configurations is not backed
> > > by a huge page filesystem. However, there is one more case that can go
> > > wrong: NUMA is enabled, but the RAM of the NUMA nodes are not configured
> > > with huge page support (and only the memory of a DIMM is configured with
> > > it). When QEMU is started with the following command line for example,
> > > the Linux guest currently crashes because it is trying to use huge pages
> > > on a memory region that does not support huge pages:
> > > 
> > >  qemu-system-ppc64 -enable-kvm ... -m 1G,slots=4,maxmem=32G -object \
> > >    memory-backend-file,policy=default,mem-path=/hugepages,size=1G,id=mem-mem1 \
> > >    -device pc-dimm,id=dimm-mem1,memdev=mem-mem1 -smp 2 \
> > >    -numa node,nodeid=0 -numa node,nodeid=1
> > > 
> > > To fix this issue, we've got to make sure to disable huge page support,
> > > too, when there is a NUMA node that is not using a memory backend with
> > > huge page support.
> > > 
> > > Fixes: 86b50f2e1befc33407bdfeb6f45f7b0d2439a740
> > > Signed-off-by: Thomas Huth <thuth@redhat.com>
> > > ---
> > >  target-ppc/kvm.c | 10 +++++++---
> > >  1 file changed, 7 insertions(+), 3 deletions(-)  
> > 
> > Applied to ppc-for-2.7, thanks.
> > 
> 
> It looks like my replies to this patch were ignored... no big deal
> though :)

I saw them.  IIUC, though, this patch still improves the situation,
even if it doesn't get every case right, so I was inclined to include
it earlier rather than later.

> 
> > > 
> > > diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> > > index 884d564..7a8f555 100644
> > > --- a/target-ppc/kvm.c
> > > +++ b/target-ppc/kvm.c
> > > @@ -389,12 +389,16 @@ static long getrampagesize(void)
> > >  
> > >      object_child_foreach(memdev_root, find_max_supported_pagesize, &hpsize);
> > >  
> > > -    if (hpsize == LONG_MAX) {
> > > +    if (hpsize == LONG_MAX || hpsize == getpagesize()) {
> > >          return getpagesize();
> > >      }
> > >  
> > > -    if (nb_numa_nodes == 0 && hpsize > getpagesize()) {
> > > -        /* No NUMA nodes and normal RAM without -mem-path ==> no huge pages! */
> > > +    /* If NUMA is disabled or the NUMA nodes are not backed with a
> > > +     * memory-backend, then there is at least one node using "normal"
> > > +     * RAM. And since normal RAM has not been configured with "-mem-path"
> > > +     * (what we've checked earlier here already), we can not use huge pages!
> > > +     */
> > > +    if (nb_numa_nodes == 0 || numa_info[0].node_memdev == NULL) {
> > >          static bool warned;
> > >          if (!warned) {
> > >              error_report("Huge page support disabled (n/a for main memory).");  
> > 
>
Thomas Huth July 18, 2016, 9:21 a.m. UTC | #11
On 15.07.2016 11:28, Greg Kurz wrote:
> On Fri, 15 Jul 2016 10:10:25 +0200
> Thomas Huth <thuth@redhat.com> wrote:
> 
>> Commit 86b50f2e1bef ("Disable huge page support if it is not available
>> for main RAM") already made sure that huge page support is not announced
>> to the guest if the normal RAM of non-NUMA configurations is not backed
>> by a huge page filesystem. However, there is one more case that can go
>> wrong: NUMA is enabled, but the RAM of the NUMA nodes are not configured
>> with huge page support (and only the memory of a DIMM is configured with
>> it). When QEMU is started with the following command line for example,
>> the Linux guest currently crashes because it is trying to use huge pages
>> on a memory region that does not support huge pages:
>>
>>  qemu-system-ppc64 -enable-kvm ... -m 1G,slots=4,maxmem=32G -object \
>>    memory-backend-file,policy=default,mem-path=/hugepages,size=1G,id=mem-mem1 \
>>    -device pc-dimm,id=dimm-mem1,memdev=mem-mem1 -smp 2 \
>>    -numa node,nodeid=0 -numa node,nodeid=1
>>
>> To fix this issue, we've got to make sure to disable huge page support,
>> too, when there is a NUMA node that is not using a memory backend with
>> huge page support.
>>
>> Fixes: 86b50f2e1befc33407bdfeb6f45f7b0d2439a740
> 
> According to http://patchwork.ozlabs.org/patch/584741/ , it is best worded
> 
> "Broken in commit 86b50f2e1bef"

Using the "Fixes:" syntax is a well-known practice with the Linux kernel
(see
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=HEAD#n187),
so I don't see why we should introduce another syntax for QEMU
here. And if we do, it should at least be documented on
http://qemu-project.org/Contribute/SubmitAPatch.

 Thomas
Greg Kurz July 18, 2016, 9:26 a.m. UTC | #12
On Mon, 18 Jul 2016 11:04:39 +0200
Thomas Huth <thuth@redhat.com> wrote:

> On 18.07.2016 10:59, Greg Kurz wrote:
> > On Mon, 18 Jul 2016 10:52:36 +1000
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >   
> >> On Fri, Jul 15, 2016 at 10:10:25AM +0200, Thomas Huth wrote:  
> >>> Commit 86b50f2e1bef ("Disable huge page support if it is not available
> >>> for main RAM") already made sure that huge page support is not announced
> >>> to the guest if the normal RAM of non-NUMA configurations is not backed
> >>> by a huge page filesystem. However, there is one more case that can go
> >>> wrong: NUMA is enabled, but the RAM of the NUMA nodes are not configured
> >>> with huge page support (and only the memory of a DIMM is configured with
> >>> it). When QEMU is started with the following command line for example,
> >>> the Linux guest currently crashes because it is trying to use huge pages
> >>> on a memory region that does not support huge pages:
> >>>
> >>>  qemu-system-ppc64 -enable-kvm ... -m 1G,slots=4,maxmem=32G -object \
> >>>    memory-backend-file,policy=default,mem-path=/hugepages,size=1G,id=mem-mem1 \
> >>>    -device pc-dimm,id=dimm-mem1,memdev=mem-mem1 -smp 2 \
> >>>    -numa node,nodeid=0 -numa node,nodeid=1
> >>>
> >>> To fix this issue, we've got to make sure to disable huge page support,
> >>> too, when there is a NUMA node that is not using a memory backend with
> >>> huge page support.
> >>>
> >>> Fixes: 86b50f2e1befc33407bdfeb6f45f7b0d2439a740
> >>> Signed-off-by: Thomas Huth <thuth@redhat.com>
> >>> ---
> >>>  target-ppc/kvm.c | 10 +++++++---
> >>>  1 file changed, 7 insertions(+), 3 deletions(-)    
> >>
> >> Applied to ppc-for-2.7, thanks.
> >>  
> > 
> > It looks like my replies to this patch were ignored... no big deal though :)  
> 
> I'll try to come up with an additional patch that fixes the remaining
> problem that you've found... Meanwhile, did you find out why you get
> that assertion that I was not able to recreate? Could you maybe post the
> exact command line to trigger that assertion?
> 

I hit the assertion when I specify pc-dimm devices on the command line:

qemu-system-ppc64 -enable-kvm ... -m 2G,slots=4,maxmem=32G \
 -object memory-backend-file,policy=default,mem-path=/mnt/kvm_hugepage,size=1G,id=mem-mem1 \
 -device pc-dimm,id=dimm-mem1,memdev=mem-mem1 \
 -object memory-backend-file,policy=default,mem-path=/mnt,size=1G,id=mem-mem2 \
 -device pc-dimm,id=dimm-mem2,memdev=mem-mem2 \
 -smp 2 -numa node,nodeid=0,memdev=mem-mem1 \
 -numa node,nodeid=1,memdev=mem-mem2

>  Thomas
> 
>
Thomas Huth July 18, 2016, 9:33 a.m. UTC | #13
On 18.07.2016 11:26, Greg Kurz wrote:
> On Mon, 18 Jul 2016 11:04:39 +0200
> Thomas Huth <thuth@redhat.com> wrote:
> 
>> On 18.07.2016 10:59, Greg Kurz wrote:
>>> On Mon, 18 Jul 2016 10:52:36 +1000
>>> David Gibson <david@gibson.dropbear.id.au> wrote:
>>>   
>>>> On Fri, Jul 15, 2016 at 10:10:25AM +0200, Thomas Huth wrote:  
>>>>> Commit 86b50f2e1bef ("Disable huge page support if it is not available
>>>>> for main RAM") already made sure that huge page support is not announced
>>>>> to the guest if the normal RAM of non-NUMA configurations is not backed
>>>>> by a huge page filesystem. However, there is one more case that can go
>>>>> wrong: NUMA is enabled, but the RAM of the NUMA nodes are not configured
>>>>> with huge page support (and only the memory of a DIMM is configured with
>>>>> it). When QEMU is started with the following command line for example,
>>>>> the Linux guest currently crashes because it is trying to use huge pages
>>>>> on a memory region that does not support huge pages:
>>>>>
>>>>>  qemu-system-ppc64 -enable-kvm ... -m 1G,slots=4,maxmem=32G -object \
>>>>>    memory-backend-file,policy=default,mem-path=/hugepages,size=1G,id=mem-mem1 \
>>>>>    -device pc-dimm,id=dimm-mem1,memdev=mem-mem1 -smp 2 \
>>>>>    -numa node,nodeid=0 -numa node,nodeid=1
>>>>>
>>>>> To fix this issue, we've got to make sure to disable huge page support,
>>>>> too, when there is a NUMA node that is not using a memory backend with
>>>>> huge page support.
>>>>>
>>>>> Fixes: 86b50f2e1befc33407bdfeb6f45f7b0d2439a740
>>>>> Signed-off-by: Thomas Huth <thuth@redhat.com>
>>>>> ---
>>>>>  target-ppc/kvm.c | 10 +++++++---
>>>>>  1 file changed, 7 insertions(+), 3 deletions(-)    
>>>>
>>>> Applied to ppc-for-2.7, thanks.
>>>>  
>>>
>>> It looks like my replies to this patch were ignored... no big deal though :)  
>>
>> I'll try to come up with an additional patch that fixes the remaining
>> problem that you've found... Meanwhile, did you find out why you get
>> that assertion that I was not able to recreate? Could you maybe post the
>> exact command line to trigger that assertion?
>>
> 
> I hit the assertion when I specify pc-dimm devices on the command line:
> 
> qemu-system-ppc64 -enable-kvm ... -m 2G,slots=4,maxmem=32G \
>  -object memory-backend-file,policy=default,mem-path=/mnt/kvm_hugepage,size=1G,id=mem-mem1 \
>  -device pc-dimm,id=dimm-mem1,memdev=mem-mem1 \
>  -object memory-backend-file,policy=default,mem-path=/mnt,size=1G,id=mem-mem2 \
>  -device pc-dimm,id=dimm-mem2,memdev=mem-mem2 \
>  -smp 2 -numa node,nodeid=0,memdev=mem-mem1 \
>  -numa node,nodeid=1,memdev=mem-mem2

FWIW, with that command line, I still don't get an assertion but a
normal error message:

qemu-system-ppc64: -device pc-dimm,id=dimm-mem1,memdev=mem-mem1: can't
use already busy memdev: mem-mem1

 Thomas
Greg Kurz July 18, 2016, 9:36 a.m. UTC | #14
On Mon, 18 Jul 2016 11:21:41 +0200
Thomas Huth <thuth@redhat.com> wrote:

> On 15.07.2016 11:28, Greg Kurz wrote:
> > On Fri, 15 Jul 2016 10:10:25 +0200
> > Thomas Huth <thuth@redhat.com> wrote:
> >   
> >> Commit 86b50f2e1bef ("Disable huge page support if it is not available
> >> for main RAM") already made sure that huge page support is not announced
> >> to the guest if the normal RAM of non-NUMA configurations is not backed
> >> by a huge page filesystem. However, there is one more case that can go
> >> wrong: NUMA is enabled, but the RAM of the NUMA nodes are not configured
> >> with huge page support (and only the memory of a DIMM is configured with
> >> it). When QEMU is started with the following command line for example,
> >> the Linux guest currently crashes because it is trying to use huge pages
> >> on a memory region that does not support huge pages:
> >>
> >>  qemu-system-ppc64 -enable-kvm ... -m 1G,slots=4,maxmem=32G -object \
> >>    memory-backend-file,policy=default,mem-path=/hugepages,size=1G,id=mem-mem1 \
> >>    -device pc-dimm,id=dimm-mem1,memdev=mem-mem1 -smp 2 \
> >>    -numa node,nodeid=0 -numa node,nodeid=1
> >>
> >> To fix this issue, we've got to make sure to disable huge page support,
> >> too, when there is a NUMA node that is not using a memory backend with
> >> huge page support.
> >>
> >> Fixes: 86b50f2e1befc33407bdfeb6f45f7b0d2439a740  
> > 
> > According to http://patchwork.ozlabs.org/patch/584741/ , it is best worded
> > 
> > "Broken in commit 86b50f2e1bef"  
> 
> Using the "Fixes:" syntax is a well-known practise with the Linux kernel
> (see
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=HEAD#n187),
> so I don't see the point why we should introduce another syntax for QEMU
> here. And if we do, it should be documented on
> http://qemu-project.org/Contribute/SubmitAPatch at least.
> 
>  Thomas
> 

I had used this syntax in the first place at the time, but Markus's and
David's comments lured me into believing there was a consensus against
it.

FWIW, I personally prefer the Linux kernel "Fixes:" syntax :)

Cheers.

--
Greg
Greg Kurz July 18, 2016, 10:01 a.m. UTC | #15
On Mon, 18 Jul 2016 19:21:08 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Mon, Jul 18, 2016 at 10:59:44AM +0200, Greg Kurz wrote:
> > On Mon, 18 Jul 2016 10:52:36 +1000
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >   
> > > On Fri, Jul 15, 2016 at 10:10:25AM +0200, Thomas Huth wrote:  
> > > > Commit 86b50f2e1bef ("Disable huge page support if it is not available
> > > > for main RAM") already made sure that huge page support is not announced
> > > > to the guest if the normal RAM of non-NUMA configurations is not backed
> > > > by a huge page filesystem. However, there is one more case that can go
> > > > wrong: NUMA is enabled, but the RAM of the NUMA nodes are not configured
> > > > with huge page support (and only the memory of a DIMM is configured with
> > > > it). When QEMU is started with the following command line for example,
> > > > the Linux guest currently crashes because it is trying to use huge pages
> > > > on a memory region that does not support huge pages:
> > > > 
> > > >  qemu-system-ppc64 -enable-kvm ... -m 1G,slots=4,maxmem=32G -object \
> > > >    memory-backend-file,policy=default,mem-path=/hugepages,size=1G,id=mem-mem1 \
> > > >    -device pc-dimm,id=dimm-mem1,memdev=mem-mem1 -smp 2 \
> > > >    -numa node,nodeid=0 -numa node,nodeid=1
> > > > 
> > > > To fix this issue, we've got to make sure to disable huge page support,
> > > > too, when there is a NUMA node that is not using a memory backend with
> > > > huge page support.
> > > > 
> > > > Fixes: 86b50f2e1befc33407bdfeb6f45f7b0d2439a740
> > > > Signed-off-by: Thomas Huth <thuth@redhat.com>
> > > > ---
> > > >  target-ppc/kvm.c | 10 +++++++---
> > > >  1 file changed, 7 insertions(+), 3 deletions(-)    
> > > 
> > > Applied to ppc-for-2.7, thanks.
> > >   
> > 
> > It looks like my replies to this patch were ignored... no big deal
> > though :)  
> 
> I saw them.  IIUC, though, this patch still improves the situation,
> even if it doesn't get every case right, so I was inclined to include
> it earlier rather than later.
> 

Of course it does, and I acked that! It was more about the 'Fixes:' syntax
and the credits, but never mind. :)

> >   
> > > > 
> > > > diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> > > > index 884d564..7a8f555 100644
> > > > --- a/target-ppc/kvm.c
> > > > +++ b/target-ppc/kvm.c
> > > > @@ -389,12 +389,16 @@ static long getrampagesize(void)
> > > >  
> > > >      object_child_foreach(memdev_root, find_max_supported_pagesize, &hpsize);
> > > >  
> > > > -    if (hpsize == LONG_MAX) {
> > > > +    if (hpsize == LONG_MAX || hpsize == getpagesize()) {
> > > >          return getpagesize();
> > > >      }
> > > >  
> > > > -    if (nb_numa_nodes == 0 && hpsize > getpagesize()) {
> > > > -        /* No NUMA nodes and normal RAM without -mem-path ==> no huge pages! */
> > > > +    /* If NUMA is disabled or the NUMA nodes are not backed with a
> > > > +     * memory-backend, then there is at least one node using "normal"
> > > > +     * RAM. And since normal RAM has not been configured with "-mem-path"
> > > > +     * (what we've checked earlier here already), we can not use huge pages!
> > > > +     */
> > > > +    if (nb_numa_nodes == 0 || numa_info[0].node_memdev == NULL) {
> > > >          static bool warned;
> > > >          if (!warned) {
> > > >              error_report("Huge page support disabled (n/a for main memory).");    
> > >   
> >   
> 
> 
>
Greg Kurz July 18, 2016, 10:44 a.m. UTC | #16
On Mon, 18 Jul 2016 11:33:16 +0200
Thomas Huth <thuth@redhat.com> wrote:

> On 18.07.2016 11:26, Greg Kurz wrote:
> > On Mon, 18 Jul 2016 11:04:39 +0200
> > Thomas Huth <thuth@redhat.com> wrote:
> >   
> >> On 18.07.2016 10:59, Greg Kurz wrote:  
> >>> On Mon, 18 Jul 2016 10:52:36 +1000
> >>> David Gibson <david@gibson.dropbear.id.au> wrote:
> >>>     
> >>>> On Fri, Jul 15, 2016 at 10:10:25AM +0200, Thomas Huth wrote:    
> >>>>> Commit 86b50f2e1bef ("Disable huge page support if it is not available
> >>>>> for main RAM") already made sure that huge page support is not announced
> >>>>> to the guest if the normal RAM of non-NUMA configurations is not backed
> >>>>> by a huge page filesystem. However, there is one more case that can go
> >>>>> wrong: NUMA is enabled, but the RAM of the NUMA nodes are not configured
> >>>>> with huge page support (and only the memory of a DIMM is configured with
> >>>>> it). When QEMU is started with the following command line for example,
> >>>>> the Linux guest currently crashes because it is trying to use huge pages
> >>>>> on a memory region that does not support huge pages:
> >>>>>
> >>>>>  qemu-system-ppc64 -enable-kvm ... -m 1G,slots=4,maxmem=32G -object \
> >>>>>    memory-backend-file,policy=default,mem-path=/hugepages,size=1G,id=mem-mem1 \
> >>>>>    -device pc-dimm,id=dimm-mem1,memdev=mem-mem1 -smp 2 \
> >>>>>    -numa node,nodeid=0 -numa node,nodeid=1
> >>>>>
> >>>>> To fix this issue, we've got to make sure to disable huge page support,
> >>>>> too, when there is a NUMA node that is not using a memory backend with
> >>>>> huge page support.
> >>>>>
> >>>>> Fixes: 86b50f2e1befc33407bdfeb6f45f7b0d2439a740
> >>>>> Signed-off-by: Thomas Huth <thuth@redhat.com>
> >>>>> ---
> >>>>>  target-ppc/kvm.c | 10 +++++++---
> >>>>>  1 file changed, 7 insertions(+), 3 deletions(-)      
> >>>>
> >>>> Applied to ppc-for-2.7, thanks.
> >>>>    
> >>>
> >>> It looks like my replies to this patch were ignored... no big deal though :)    
> >>
> >> I'll try to come up with an additional patch that fixes the remaining
> >> problem that you've found... Meanwhile, did you find out why you get
> >> that assertion that I was not able to recreate? Could you maybe post the
> >> exact command line to trigger that assertion?
> >>  
> > 
> > I hit the assertion when I specify pc-dimm devices on the command line:
> > 
> > qemu-system-ppc64 -enable-kvm ... -m 2G,slots=4,maxmem=32G \
> >  -object memory-backend-file,policy=default,mem-path=/mnt/kvm_hugepage,size=1G,id=mem-mem1 \
> >  -device pc-dimm,id=dimm-mem1,memdev=mem-mem1 \
> >  -object memory-backend-file,policy=default,mem-path=/mnt,size=1G,id=mem-mem2 \
> >  -device pc-dimm,id=dimm-mem2,memdev=mem-mem2 \
> >  -smp 2 -numa node,nodeid=0,memdev=mem-mem1 \
> >  -numa node,nodeid=1,memdev=mem-mem2  
> 
> FWIW, with that command line, I still don't get an assertion but a
> normal error message:
> 
> qemu-system-ppc64: -device pc-dimm,id=dimm-mem1,memdev=mem-mem1: can't
> use already busy memdev: mem-mem1
> 
>  Thomas
> 

I hit the assertion with this exact command line:

qemu-system-ppc64 -machine pseries,accel=kvm \
-m 1G,slots=4,maxmem=32G \
-object memory-backend-file,policy=default,mem-path=/dev/hugepages,size=1G,id=mem-mem1 \
-device pc-dimm,id=dimm-mem1,memdev=mem-mem1 \
-numa node,nodeid=0,memdev=mem-mem1 \
-S

QEMU was built against David's ppc-for-2.7 branch (commit 159d2e39a).

But I get the very same error as you with QEMU 2.6... a regression?

And BTW, I'm not sure I understand why it is wrong to specify both a pc-dimm
device and a NUMA node pointing to the same memory backend.

Cheers.

--
Greg
Thomas Huth July 18, 2016, 1:16 p.m. UTC | #17
On 18.07.2016 12:44, Greg Kurz wrote:
> On Mon, 18 Jul 2016 11:33:16 +0200
> Thomas Huth <thuth@redhat.com> wrote:
> 
>> On 18.07.2016 11:26, Greg Kurz wrote:
>>> On Mon, 18 Jul 2016 11:04:39 +0200
>>> Thomas Huth <thuth@redhat.com> wrote:
>>>   
>>>> On 18.07.2016 10:59, Greg Kurz wrote:  
>>>>> On Mon, 18 Jul 2016 10:52:36 +1000
>>>>> David Gibson <david@gibson.dropbear.id.au> wrote:
>>>>>     
>>>>>> On Fri, Jul 15, 2016 at 10:10:25AM +0200, Thomas Huth wrote:    
>>>>>>> Commit 86b50f2e1bef ("Disable huge page support if it is not available
>>>>>>> for main RAM") already made sure that huge page support is not announced
>>>>>>> to the guest if the normal RAM of non-NUMA configurations is not backed
>>>>>>> by a huge page filesystem. However, there is one more case that can go
>>>>>>> wrong: NUMA is enabled, but the RAM of the NUMA nodes are not configured
>>>>>>> with huge page support (and only the memory of a DIMM is configured with
>>>>>>> it). When QEMU is started with the following command line for example,
>>>>>>> the Linux guest currently crashes because it is trying to use huge pages
>>>>>>> on a memory region that does not support huge pages:
>>>>>>>
>>>>>>>  qemu-system-ppc64 -enable-kvm ... -m 1G,slots=4,maxmem=32G -object \
>>>>>>>    memory-backend-file,policy=default,mem-path=/hugepages,size=1G,id=mem-mem1 \
>>>>>>>    -device pc-dimm,id=dimm-mem1,memdev=mem-mem1 -smp 2 \
>>>>>>>    -numa node,nodeid=0 -numa node,nodeid=1
>>>>>>>
>>>>>>> To fix this issue, we've got to make sure to disable huge page support,
>>>>>>> too, when there is a NUMA node that is not using a memory backend with
>>>>>>> huge page support.
>>>>>>>
>>>>>>> Fixes: 86b50f2e1befc33407bdfeb6f45f7b0d2439a740
>>>>>>> Signed-off-by: Thomas Huth <thuth@redhat.com>
>>>>>>> ---
>>>>>>>  target-ppc/kvm.c | 10 +++++++---
>>>>>>>  1 file changed, 7 insertions(+), 3 deletions(-)      
>>>>>>
>>>>>> Applied to ppc-for-2.7, thanks.
>>>>>>    
>>>>>
>>>>> It looks like my replies to this patch were ignored... no big deal though :)    
>>>>
>>>> I'll try to come up with an additional patch that fixes the remaining
>>>> problem that you've found... Meanwhile, did you find out why you get
>>>> that assertion that I was not able to recreate? Could you maybe post the
>>>> exact command line to trigger that assertion?
>>>>  
>>>
>>> I hit the assertion when I specify pc-dimm devices on the command line:
>>>
>>> qemu-system-ppc64 -enable-kvm ... -m 2G,slots=4,maxmem=32G \
>>>  -object memory-backend-file,policy=default,mem-path=/mnt/kvm_hugepage,size=1G,id=mem-mem1 \
>>>  -device pc-dimm,id=dimm-mem1,memdev=mem-mem1 \
>>>  -object memory-backend-file,policy=default,mem-path=/mnt,size=1G,id=mem-mem2 \
>>>  -device pc-dimm,id=dimm-mem2,memdev=mem-mem2 \
>>>  -smp 2 -numa node,nodeid=0,memdev=mem-mem1 \
>>>  -numa node,nodeid=1,memdev=mem-mem2  
>>
>> FWIW, with that command line, I still don't get an assertion but a
>> normal error message:
>>
>> qemu-system-ppc64: -device pc-dimm,id=dimm-mem1,memdev=mem-mem1: can't
>> use already busy memdev: mem-mem1
>>
>>  Thomas
>>
> 
> I hit the assertion with this exact command line:
> 
> qemu-system-ppc64 -machine pseries,accel=kvm \
> -m 1G,slots=4,maxmem=32G \
> -object memory-backend-file,policy=default,mem-path=/dev/hugepages,size=1G,id=mem-mem1 \
> -device pc-dimm,id=dimm-mem1,memdev=mem-mem1 \
> -numa node,nodeid=0,memdev=mem-mem1 \
> -S
> 
> QEMU was built against David's ppc-for-2.7 branch (commit 159d2e39a).
> 
> But I get the very same error as you with QEMU 2.6... regression ?

Not sure why I didn't get the assertion before, but after switching back
and forth between another branch and the current master, I now get the
assertion, too:

qemu-system-ppc64: /home/thuth/devel/qemu/memory.c:1934:
memory_region_add_subregion_common: Assertion `!subregion->container'
failed.

I've bisected it to the following commit:

	2aece63c8a9d2c3a8ff41d2febc4cdeff2633331
	hostmem: detect host backend memory is being used properly

Xiao, Paolo, do you have any idea why this assert() can be triggered now?

 Thomas
Greg Kurz July 18, 2016, 1:23 p.m. UTC | #18
On Mon, 18 Jul 2016 12:44:09 +0200
Greg Kurz <groug@kaod.org> wrote:

> On Mon, 18 Jul 2016 11:33:16 +0200
> Thomas Huth <thuth@redhat.com> wrote:
> 
> > On 18.07.2016 11:26, Greg Kurz wrote:  
> > > On Mon, 18 Jul 2016 11:04:39 +0200
> > > Thomas Huth <thuth@redhat.com> wrote:
> > >     
> > >> On 18.07.2016 10:59, Greg Kurz wrote:    
> > >>> On Mon, 18 Jul 2016 10:52:36 +1000
> > >>> David Gibson <david@gibson.dropbear.id.au> wrote:
> > >>>       
> > >>>> On Fri, Jul 15, 2016 at 10:10:25AM +0200, Thomas Huth wrote:      
> > >>>>> Commit 86b50f2e1bef ("Disable huge page support if it is not available
> > >>>>> for main RAM") already made sure that huge page support is not announced
> > >>>>> to the guest if the normal RAM of non-NUMA configurations is not backed
> > >>>>> by a huge page filesystem. However, there is one more case that can go
> > >>>>> wrong: NUMA is enabled, but the RAM of the NUMA nodes are not configured
> > >>>>> with huge page support (and only the memory of a DIMM is configured with
> > >>>>> it). When QEMU is started with the following command line for example,
> > >>>>> the Linux guest currently crashes because it is trying to use huge pages
> > >>>>> on a memory region that does not support huge pages:
> > >>>>>
> > >>>>>  qemu-system-ppc64 -enable-kvm ... -m 1G,slots=4,maxmem=32G -object \
> > >>>>>    memory-backend-file,policy=default,mem-path=/hugepages,size=1G,id=mem-mem1 \
> > >>>>>    -device pc-dimm,id=dimm-mem1,memdev=mem-mem1 -smp 2 \
> > >>>>>    -numa node,nodeid=0 -numa node,nodeid=1
> > >>>>>
> > >>>>> To fix this issue, we've got to make sure to disable huge page support,
> > >>>>> too, when there is a NUMA node that is not using a memory backend with
> > >>>>> huge page support.
> > >>>>>
> > >>>>> Fixes: 86b50f2e1befc33407bdfeb6f45f7b0d2439a740
> > >>>>> Signed-off-by: Thomas Huth <thuth@redhat.com>
> > >>>>> ---
> > >>>>>  target-ppc/kvm.c | 10 +++++++---
> > >>>>>  1 file changed, 7 insertions(+), 3 deletions(-)        
> > >>>>
> > >>>> Applied to ppc-for-2.7, thanks.
> > >>>>      
> > >>>
> > >>> It looks like my replies to this patch were ignored... no big deal though :)      
> > >>
> > >> I'll try to come up with an additional patch that fixes the remaining
> > >> problem that you've found... Meanwhile, did you find out why you get
> > >> that assertion that I was not able to recreate? Could you maybe post the
> > >> exact command line to trigger that assertion?
> > >>    
> > > 
> > > I hit the assertion when I specify pc-dimm devices on the command line:
> > > 
> > > qemu-system-ppc64 -enable-kvm ... -m 2G,slots=4,maxmem=32G \
> > >  -object memory-backend-file,policy=default,mem-path=/mnt/kvm_hugepage,size=1G,id=mem-mem1 \
> > >  -device pc-dimm,id=dimm-mem1,memdev=mem-mem1 \
> > >  -object memory-backend-file,policy=default,mem-path=/mnt,size=1G,id=mem-mem2 \
> > >  -device pc-dimm,id=dimm-mem2,memdev=mem-mem2 \
> > >  -smp 2 -numa node,nodeid=0,memdev=mem-mem1 \
> > >  -numa node,nodeid=1,memdev=mem-mem2    
> > 
> > FWIW, with that command line, I still don't get an assertion but a
> > normal error message:
> > 
> > qemu-system-ppc64: -device pc-dimm,id=dimm-mem1,memdev=mem-mem1: can't
> > use already busy memdev: mem-mem1
> > 
> >  Thomas
> >   
> 
> I hit the assertion with this exact command line:
> 
> qemu-system-ppc64 -machine pseries,accel=kvm \
> -m 1G,slots=4,maxmem=32G \
> -object memory-backend-file,policy=default,mem-path=/dev/hugepages,size=1G,id=mem-mem1 \
> -device pc-dimm,id=dimm-mem1,memdev=mem-mem1 \
> -numa node,nodeid=0,memdev=mem-mem1 \
> -S
> 
> QEMU was built against David's ppc-for-2.7 branch (commit 159d2e39a).
> 
> But I get the very same error as you with QEMU 2.6... regression ?
> 

Bisect leads to:

commit 2aece63c8a9d2c3a8ff41d2febc4cdeff2633331
Author: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Date:   Wed Jul 13 12:18:06 2016 +0800

    hostmem: detect host backend memory is being used properly

... any idea ?

> And BTW, I'm not sure to understand why it is wrong to specify both pc-dimm and
> numa pointing to the same memory backend.
> 
> Cheers.
> 
> --
> Greg

Patch

diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index 884d564..7a8f555 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -389,12 +389,16 @@  static long getrampagesize(void)
 
     object_child_foreach(memdev_root, find_max_supported_pagesize, &hpsize);
 
-    if (hpsize == LONG_MAX) {
+    if (hpsize == LONG_MAX || hpsize == getpagesize()) {
         return getpagesize();
     }
 
-    if (nb_numa_nodes == 0 && hpsize > getpagesize()) {
-        /* No NUMA nodes and normal RAM without -mem-path ==> no huge pages! */
+    /* If NUMA is disabled or the NUMA nodes are not backed with a
+     * memory-backend, then there is at least one node using "normal"
+     * RAM. And since normal RAM has not been configured with "-mem-path"
+     * (what we've checked earlier here already), we can not use huge pages!
+     */
+    if (nb_numa_nodes == 0 || numa_info[0].node_memdev == NULL) {
         static bool warned;
         if (!warned) {
             error_report("Huge page support disabled (n/a for main memory).");