
hostmem: don't use mbind() if host-nodes is epmty

Message ID 20200430154606.6421-1-imammedo@redhat.com
State New

Commit Message

Igor Mammedov April 30, 2020, 3:46 p.m. UTC
Since 5.0 QEMU uses the hostmem backend for allocating main guest RAM.
The backend, however, calls mbind(), which is typically a NOP
in the case of the default policy/absent host-nodes bitmap.
However, when running in a container with a black-listed mbind()
syscall, QEMU fails to start with the error
 "cannot bind memory to host NUMA nodes: Operation not permitted"
even when the user hasn't explicitly provided host-nodes to pin to
(which is the case with the -m option).

To fix the issue, call mbind() only when the user has provided
host-nodes explicitly (i.e. the host_nodes bitmap is not empty).
That should allow running QEMU in containers with black-listed
mbind() without memory pinning. If QEMU-provided memory pinning
is required, the user still has to white-list mbind() in the
container configuration.

Reported-by: Manuel Hohmann <mhohmann@physnet.uni-hamburg.de>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
CC: berrange@redhat.com
CC: ehabkost@redhat.com
CC: pbonzini@redhat.com
CC: mhohmann@physnet.uni-hamburg.de
CC: qemu-stable@nongnu.org
---
 backends/hostmem.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
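
For readers who want to see the guard in isolation, here is a minimal sketch of the idea described in the commit message. It is not the QEMU code: fake_blocked_mbind(), highest_node_plus_one(), bind_if_requested() and the single-word node mask are hypothetical names and simplifications made up for illustration, and the stub only imitates a container that rejects mbind() with EPERM.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_NODES 128 /* assumption for this sketch only */

/* Counts how often the (fake) syscall was attempted. */
static int mbind_calls;

/* Hypothetical stand-in for mbind() inside a container whose
 * syscall filter rejects it: always fails with EPERM. */
static long fake_blocked_mbind(void *addr, unsigned long len, int mode,
                               const unsigned long *nodemask,
                               unsigned long maxnode, unsigned flags)
{
    (void)addr; (void)len; (void)mode;
    (void)nodemask; (void)maxnode; (void)flags;
    mbind_calls++;
    errno = EPERM;
    return -1;
}

/* Index of the highest set bit plus one; 0 for an empty mask.
 * A single word stands in for the real host_nodes bitmap. */
static unsigned long highest_node_plus_one(unsigned long mask)
{
    unsigned long n = 0;

    while (mask) {
        n++;
        mask >>= 1;
    }
    return n;
}

/* Attempt the binding only if the user supplied host nodes.
 * Returns 0 on success or when skipped, -errno on failure. */
static int bind_if_requested(void *ptr, size_t sz, int policy,
                             const unsigned long *host_nodes,
                             unsigned long maxnode)
{
    assert(maxnode <= MAX_NODES);

    if (maxnode &&
        fake_blocked_mbind(ptr, sz, policy, host_nodes, maxnode + 1, 0)) {
        return -errno;
    }
    return 0;
}
```

With an empty mask the stub is never reached, so a blocked syscall cannot cause a failure. With a non-empty mask the EPERM is still reported, which matches the statement that mbind() must be white-listed when pinning is requested.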

Comments

Philippe Mathieu-Daudé April 30, 2020, 4:42 p.m. UTC | #1
Typo in patch subject: "epmty" should be "empty".

Manuel Hohmann May 1, 2020, 7:28 a.m. UTC | #2
Thanks! I applied the patch, and QEMU now also works inside the Docker container for all architectures (i386, x86_64, arm, aarch64) for which I have test cases at hand.

Indeed, since the container is configured by a public cloud service, there is no way to change any security settings. Skipping mbind() unless it is explicitly requested seems to be the best way to go here.

Daniel P. Berrangé May 1, 2020, 8:57 a.m. UTC | #3
On Thu, Apr 30, 2020 at 11:46:06AM -0400, Igor Mammedov wrote:
> Since 5.0 QEMU uses the hostmem backend for allocating main guest RAM.
> The backend, however, calls mbind(), which is typically a NOP
> in the case of the default policy/absent host-nodes bitmap.
> However, when running in a container with a black-listed mbind()
> syscall, QEMU fails to start with the error
>  "cannot bind memory to host NUMA nodes: Operation not permitted"
> even when the user hasn't explicitly provided host-nodes to pin to
> (which is the case with the -m option).
> 
> To fix the issue, call mbind() only when the user has provided
> host-nodes explicitly (i.e. the host_nodes bitmap is not empty).
> That should allow running QEMU in containers with black-listed
> mbind() without memory pinning. If QEMU-provided memory pinning
> is required, the user still has to white-list mbind() in the
> container configuration.
> 
> Reported-by: Manuel Hohmann <mhohmann@physnet.uni-hamburg.de>
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
> CC: berrange@redhat.com
> CC: ehabkost@redhat.com
> CC: pbonzini@redhat.com
> CC: mhohmann@physnet.uni-hamburg.de
> CC: qemu-stable@nongnu.org
> ---
>  backends/hostmem.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/backends/hostmem.c b/backends/hostmem.c
> index 327f9eebc3..0efd7b7bd6 100644
> --- a/backends/hostmem.c
> +++ b/backends/hostmem.c
> @@ -383,8 +383,10 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
>          assert(sizeof(backend->host_nodes) >=
>                 BITS_TO_LONGS(MAX_NODES + 1) * sizeof(unsigned long));
>          assert(maxnode <= MAX_NODES);
> -        if (mbind(ptr, sz, backend->policy,
> -                  maxnode ? backend->host_nodes : NULL, maxnode + 1, flags)) {
> +
> +        if (maxnode &&
> +            mbind(ptr, sz, backend->policy, backend->host_nodes, maxnode + 1,
> +                  flags)) {
>              if (backend->policy != MPOL_DEFAULT || errno != ENOSYS) {
>                  error_setg_errno(errp, errno,
>                                   "cannot bind memory to host NUMA nodes");

Personally, I would have found this code clearer if the
check had been "if (backend->policy != MPOL_DEFAULT && ..."
as I had to read quite a few lines to understand that
'maxnode' is zero if and only if policy == MPOL_DEFAULT.

Regardless, this is functionally correct, so

   Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

Regards,
Daniel
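
To illustrate the remark above, the following sketch writes the guard in terms of the policy and checks it against the maxnode form. It assumes the invariant stated in the review (maxnode is zero if and only if the policy is MPOL_DEFAULT) is enforced before the guard runs. The SK_* constants, config_is_valid() and should_bind() are hypothetical names for this example; the numeric values mirror Linux <numaif.h> and are redefined only so the sketch builds without libnuma headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Local copies of the Linux memory policy values (assumption:
 * redefined here to avoid depending on <numaif.h>). */
enum sketch_policy {
    SK_MPOL_DEFAULT = 0,
    SK_MPOL_PREFERRED = 1,
    SK_MPOL_BIND = 2,
    SK_MPOL_INTERLEAVE = 3,
};

/* The invariant the review comment relies on: an empty node set
 * goes with the default policy, and only with it. */
static bool config_is_valid(int policy, unsigned long maxnode)
{
    return (maxnode == 0) == (policy == SK_MPOL_DEFAULT);
}

/* Policy-based form of the guard suggested in the review.
 * Only meaningful for configurations that passed validation. */
static bool should_bind(int policy, unsigned long maxnode)
{
    assert(config_is_valid(policy, maxnode));
    return policy != SK_MPOL_DEFAULT;
}
```

For every valid configuration, should_bind() returns the same answer as testing maxnode != 0, which is why either spelling of the guard is functionally equivalent under that invariant.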
Philippe Mathieu-Daudé May 4, 2020, 2:31 p.m. UTC | #4
On 5/1/20 10:57 AM, Daniel P. Berrangé wrote:
> On Thu, Apr 30, 2020 at 11:46:06AM -0400, Igor Mammedov wrote:
>> Since 5.0 QEMU uses the hostmem backend for allocating main guest RAM.
>> The backend, however, calls mbind(), which is typically a NOP
>> in the case of the default policy/absent host-nodes bitmap.
>> However, when running in a container with a black-listed mbind()
>> syscall, QEMU fails to start with the error
>>   "cannot bind memory to host NUMA nodes: Operation not permitted"
>> even when the user hasn't explicitly provided host-nodes to pin to
>> (which is the case with the -m option).
>>
>> To fix the issue, call mbind() only when the user has provided
>> host-nodes explicitly (i.e. the host_nodes bitmap is not empty).
>> That should allow running QEMU in containers with black-listed
>> mbind() without memory pinning. If QEMU-provided memory pinning
>> is required, the user still has to white-list mbind() in the
>> container configuration.
>>
>> Reported-by: Manuel Hohmann <mhohmann@physnet.uni-hamburg.de>
>> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
>> ---
>> CC: berrange@redhat.com
>> CC: ehabkost@redhat.com
>> CC: pbonzini@redhat.com
>> CC: mhohmann@physnet.uni-hamburg.de
>> CC: qemu-stable@nongnu.org
>> ---
>>   backends/hostmem.c | 6 ++++--
>>   1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/backends/hostmem.c b/backends/hostmem.c
>> index 327f9eebc3..0efd7b7bd6 100644
>> --- a/backends/hostmem.c
>> +++ b/backends/hostmem.c
>> @@ -383,8 +383,10 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
>>           assert(sizeof(backend->host_nodes) >=
>>                  BITS_TO_LONGS(MAX_NODES + 1) * sizeof(unsigned long));
>>           assert(maxnode <= MAX_NODES);
>> -        if (mbind(ptr, sz, backend->policy,
>> -                  maxnode ? backend->host_nodes : NULL, maxnode + 1, flags)) {
>> +
>> +        if (maxnode &&
>> +            mbind(ptr, sz, backend->policy, backend->host_nodes, maxnode + 1,
>> +                  flags)) {
>>               if (backend->policy != MPOL_DEFAULT || errno != ENOSYS) {
>>                   error_setg_errno(errp, errno,
>>                                    "cannot bind memory to host NUMA nodes");
> 
> Personally, I would have found this code clearer if the
> check had been "if (backend->policy != MPOL_DEFAULT && ..."
> as I had to read quite a few lines to understand that
> 'maxnode' is zero if and only if policy == MPOL_DEFAULT.
> 
> Regardless, this is functionally correct, so
> 
>     Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

I could reproduce the failure by running 'make check-qtest-hppa' on the qemu:fedora image:

   TEST    check-qtest-hppa: tests/qtest/boot-serial-test
qemu-system-hppa: cannot bind memory to host NUMA nodes: Operation not permitted
Broken pipe
tests/qtest/libqtest.c:166: kill_qemu() tried to terminate QEMU process but encountered exit status 1 (expected 0)
ERROR - too few tests run (expected 1, got 0)
make: *** [tests/Makefile.include:637: check-qtest-hppa] Error 1

Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>

> 
> Regards,
> Daniel
>
Eduardo Habkost May 4, 2020, 3:44 p.m. UTC | #5
On Thu, Apr 30, 2020 at 11:46:06AM -0400, Igor Mammedov wrote:
> Since 5.0 QEMU uses the hostmem backend for allocating main guest RAM.
> The backend, however, calls mbind(), which is typically a NOP
> in the case of the default policy/absent host-nodes bitmap.
> However, when running in a container with a black-listed mbind()
> syscall, QEMU fails to start with the error
>  "cannot bind memory to host NUMA nodes: Operation not permitted"
> even when the user hasn't explicitly provided host-nodes to pin to
> (which is the case with the -m option).
> 
> To fix the issue, call mbind() only when the user has provided
> host-nodes explicitly (i.e. the host_nodes bitmap is not empty).
> That should allow running QEMU in containers with black-listed
> mbind() without memory pinning. If QEMU-provided memory pinning
> is required, the user still has to white-list mbind() in the
> container configuration.
> 
> Reported-by: Manuel Hohmann <mhohmann@physnet.uni-hamburg.de>
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>

Queued on machine-next, thanks!
Philippe Mathieu-Daudé May 11, 2020, 4 p.m. UTC | #6
Hi Eduardo,

On 5/4/20 5:44 PM, Eduardo Habkost wrote:
> On Thu, Apr 30, 2020 at 11:46:06AM -0400, Igor Mammedov wrote:
>> Since 5.0 QEMU uses the hostmem backend for allocating main guest RAM.
>> The backend, however, calls mbind(), which is typically a NOP
>> in the case of the default policy/absent host-nodes bitmap.
>> However, when running in a container with a black-listed mbind()
>> syscall, QEMU fails to start with the error
>>   "cannot bind memory to host NUMA nodes: Operation not permitted"
>> even when the user hasn't explicitly provided host-nodes to pin to
>> (which is the case with the -m option).
>>
>> To fix the issue, call mbind() only when the user has provided
>> host-nodes explicitly (i.e. the host_nodes bitmap is not empty).
>> That should allow running QEMU in containers with black-listed
>> mbind() without memory pinning. If QEMU-provided memory pinning
>> is required, the user still has to white-list mbind() in the
>> container configuration.
>>
>> Reported-by: Manuel Hohmann <mhohmann@physnet.uni-hamburg.de>
>> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> 
> Queued on machine-next, thanks!

I've been debugging this issue again today and realized the patch was not
merged yet. If possible, can you add the "Cc: qemu-stable@nongnu.org" tag
before sending your pull request?

Thanks,

Phil.
Igor Mammedov May 11, 2020, 7:24 p.m. UTC | #7
On Mon, 11 May 2020 18:00:01 +0200
Philippe Mathieu-Daudé <philmd@redhat.com> wrote:

> Hi Eduardo,
> 
> On 5/4/20 5:44 PM, Eduardo Habkost wrote:
> > On Thu, Apr 30, 2020 at 11:46:06AM -0400, Igor Mammedov wrote:  
> >> Since 5.0 QEMU uses the hostmem backend for allocating main guest RAM.
> >> The backend, however, calls mbind(), which is typically a NOP
> >> in the case of the default policy/absent host-nodes bitmap.
> >> However, when running in a container with a black-listed mbind()
> >> syscall, QEMU fails to start with the error
> >>   "cannot bind memory to host NUMA nodes: Operation not permitted"
> >> even when the user hasn't explicitly provided host-nodes to pin to
> >> (which is the case with the -m option).
> >>
> >> To fix the issue, call mbind() only when the user has provided
> >> host-nodes explicitly (i.e. the host_nodes bitmap is not empty).
> >> That should allow running QEMU in containers with black-listed
> >> mbind() without memory pinning. If QEMU-provided memory pinning
> >> is required, the user still has to white-list mbind() in the
> >> container configuration.
> >>
> >> Reported-by: Manuel Hohmann <mhohmann@physnet.uni-hamburg.de>
> >> Signed-off-by: Igor Mammedov <imammedo@redhat.com>  
> > 
> > Queued on machine-next, thanks!  
> 
> I've been debugging this issue again today and realized the patch was not
> merged yet. If possible, can you add the "Cc: qemu-stable@nongnu.org" tag
> before sending your pull request?
It's CCed already, so my impression was that it would be picked up once it was reviewed.

> 
> Thanks,
> 
> Phil.
>
Philippe Mathieu-Daudé May 11, 2020, 8:03 p.m. UTC | #8
On 5/11/20 9:24 PM, Igor Mammedov wrote:
> On Mon, 11 May 2020 18:00:01 +0200
> Philippe Mathieu-Daudé <philmd@redhat.com> wrote:
> 
>> Hi Eduardo,
>>
>> On 5/4/20 5:44 PM, Eduardo Habkost wrote:
>>> On Thu, Apr 30, 2020 at 11:46:06AM -0400, Igor Mammedov wrote:
>>>> Since 5.0 QEMU uses the hostmem backend for allocating main guest RAM.
>>>> The backend, however, calls mbind(), which is typically a NOP
>>>> in the case of the default policy/absent host-nodes bitmap.
>>>> However, when running in a container with a black-listed mbind()
>>>> syscall, QEMU fails to start with the error
>>>>    "cannot bind memory to host NUMA nodes: Operation not permitted"
>>>> even when the user hasn't explicitly provided host-nodes to pin to
>>>> (which is the case with the -m option).
>>>>
>>>> To fix the issue, call mbind() only when the user has provided
>>>> host-nodes explicitly (i.e. the host_nodes bitmap is not empty).
>>>> That should allow running QEMU in containers with black-listed
>>>> mbind() without memory pinning. If QEMU-provided memory pinning
>>>> is required, the user still has to white-list mbind() in the
>>>> container configuration.
>>>>
>>>> Reported-by: Manuel Hohmann <mhohmann@physnet.uni-hamburg.de>
>>>> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
>>>
>>> Queued on machine-next, thanks!
>>
>> I've been debugging this issue again today and realized the patch was not
>> merged yet. If possible, can you add the "Cc: qemu-stable@nongnu.org" tag
>> before sending your pull request?
> It's CCed already, so my impression was that it would be picked up once it was reviewed.

Correct, however some distributions find it easier to grep merged commits
for the 'Cc: qemu-stable@nongnu.org' tag before qemu-stable is released.

> 
>>
>> Thanks,
>>
>> Phil.
>>
>

Patch

diff --git a/backends/hostmem.c b/backends/hostmem.c
index 327f9eebc3..0efd7b7bd6 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -383,8 +383,10 @@  host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
         assert(sizeof(backend->host_nodes) >=
                BITS_TO_LONGS(MAX_NODES + 1) * sizeof(unsigned long));
         assert(maxnode <= MAX_NODES);
-        if (mbind(ptr, sz, backend->policy,
-                  maxnode ? backend->host_nodes : NULL, maxnode + 1, flags)) {
+
+        if (maxnode &&
+            mbind(ptr, sz, backend->policy, backend->host_nodes, maxnode + 1,
+                  flags)) {
             if (backend->policy != MPOL_DEFAULT || errno != ENOSYS) {
                 error_setg_errno(errp, errno,
                                  "cannot bind memory to host NUMA nodes");