target/s390x/kvm: Fix problem when running with SELinux under z/VM

Message ID 1490347615-19222-1-git-send-email-thuth@redhat.com
State New

Commit Message

Thomas Huth March 24, 2017, 9:26 a.m. UTC
When running QEMU with KVM under z/VM, the memory for the guest
is allocated via legacy_s390_alloc() since the KVM_CAP_S390_COW
extension is not supported on z/VM. legacy_s390_alloc() then uses
mmap(... PROT_EXEC ...) for the guest memory - but this does not
work when running with SELinux enabled, mmap() fails and QEMU aborts
with the following error message:

 cannot set up guest memory 's390.ram': Permission denied

Looking at the other allocator function qemu_anon_ram_alloc(), it
seems like PROT_EXEC is normally not needed for allocating the
guest RAM, and indeed, the guest also starts successfully under
z/VM when we remove the PROT_EXEC from the legacy_s390_alloc()
function. So let's get rid of that flag here to be able to run
with SELinux under z/VM, too.

Signed-off-by: Thomas Huth <thuth@redhat.com>
---
 target/s390x/kvm.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Comments

Cornelia Huck March 24, 2017, 9:38 a.m. UTC | #1
On Fri, 24 Mar 2017 10:26:55 +0100
Thomas Huth <thuth@redhat.com> wrote:

> When running QEMU with KVM under z/VM, the memory for the guest
> is allocated via legacy_s390_alloc() since the KVM_CAP_S390_COW
> extension is not supported on z/VM. legacy_s390_alloc() then uses
> mmap(... PROT_EXEC ...) for the guest memory - but this does not
> work when running with SELinux enabled, mmap() fails and QEMU aborts
> with the following error message:
> 
>  cannot set up guest memory 's390.ram': Permission denied
> 
> Looking at the other allocator function qemu_anon_ram_alloc(), it
> seems like PROT_EXEC is normally not needed for allocating the
> guest RAM, and indeed, the guest also starts successfully under
> z/VM when we remove the PROT_EXEC from the legacy_s390_alloc()
> function. So let's get rid of that flag here to be able to run
> with SELinux under z/VM, too.

The root cause of this is lack of ESOP in the host.

> 
> Signed-off-by: Thomas Huth <thuth@redhat.com>
> ---
>  target/s390x/kvm.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c
> index ac47154..5167436 100644
> --- a/target/s390x/kvm.c
> +++ b/target/s390x/kvm.c
> @@ -678,8 +678,7 @@ static void *legacy_s390_alloc(size_t size, uint64_t *align)
>  {
>      void *mem;
> 
> -    mem = mmap((void *) 0x800000000ULL, size,
> -               PROT_EXEC|PROT_READ|PROT_WRITE,
> +    mem = mmap((void *) 0x800000000ULL, size, PROT_READ | PROT_WRITE,
>                 MAP_SHARED | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
>      return mem == MAP_FAILED ? NULL : mem;
>  }

Wouldn't it be better to adapt the SELinux rules?
Christian Borntraeger March 24, 2017, 9:39 a.m. UTC | #2
On 03/24/2017 10:26 AM, Thomas Huth wrote:
> When running QEMU with KVM under z/VM, the memory for the guest
> is allocated via legacy_s390_alloc() since the KVM_CAP_S390_COW
> extension is not supported on z/VM. legacy_s390_alloc() then uses
> mmap(... PROT_EXEC ...) for the guest memory - but this does not
> work when running with SELinux enabled, mmap() fails and QEMU aborts
> with the following error message:
> 
>  cannot set up guest memory 's390.ram': Permission denied
> 
> Looking at the other allocator function qemu_anon_ram_alloc(), it
> seems like PROT_EXEC is normally not needed for allocating the
> guest RAM, and indeed, the guest also starts successfully under
> z/VM when we remove the PROT_EXEC from the legacy_s390_alloc()
> function. So let's get rid of that flag here to be able to run
> with SELinux under z/VM, too.

Older z/VM versions do not provide the enhanced suppression on protection
facility (ESOP), which would result in guest failures as soon as the kernel
starts dirty page tracking by write-protecting the pages via the page
table. Some kernel releases back (last time I checked), PROT_EXEC was
necessary to prevent the dirty page tracking from taking place. So this
patch would break KVM in that case.

Newer z/VMs (e.g. 6.3) do provide ESOP. So the question is,
why is KVM_CAP_S390_COW not set?
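
One way to answer that question from user space is to ask the host kernel directly. The following is a minimal sketch, not QEMU's own code: it assumes a Linux host with the kernel UAPI headers installed, and the helper name kvm_has_s390_cow() is made up for illustration. It only uses the generic KVM_CHECK_EXTENSION ioctl on /dev/kvm.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/kvm.h>

/*
 * Hypothetical helper (not part of QEMU): returns 1 if the host kernel
 * reports KVM_CAP_S390_COW, 0 if it does not, and -1 if /dev/kvm cannot
 * be opened or the ioctl itself fails.
 */
static int kvm_has_s390_cow(void)
{
    int fd = open("/dev/kvm", O_RDWR);
    int ret;

    if (fd < 0) {
        return -1;
    }
    /* KVM_CHECK_EXTENSION returns 0 for unsupported capabilities. */
    ret = ioctl(fd, KVM_CHECK_EXTENSION, KVM_CAP_S390_COW);
    close(fd);
    if (ret < 0) {
        return -1;
    }
    return ret > 0 ? 1 : 0;
}
```

A result of 0 on an s390x host would be consistent with the situation described in this thread, where the guest memory ends up being allocated by legacy_s390_alloc().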


> 
> Signed-off-by: Thomas Huth <thuth@redhat.com>
> ---
>  target/s390x/kvm.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c
> index ac47154..5167436 100644
> --- a/target/s390x/kvm.c
> +++ b/target/s390x/kvm.c
> @@ -678,8 +678,7 @@ static void *legacy_s390_alloc(size_t size, uint64_t *align)
>  {
>      void *mem;
> 
> -    mem = mmap((void *) 0x800000000ULL, size,
> -               PROT_EXEC|PROT_READ|PROT_WRITE,
> +    mem = mmap((void *) 0x800000000ULL, size, PROT_READ | PROT_WRITE,
>                 MAP_SHARED | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
>      return mem == MAP_FAILED ? NULL : mem;
>  }
>
Thomas Huth March 24, 2017, 9:53 a.m. UTC | #3
On 24.03.2017 10:38, Cornelia Huck wrote:
> On Fri, 24 Mar 2017 10:26:55 +0100
> Thomas Huth <thuth@redhat.com> wrote:
[...]
>> diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c
>> index ac47154..5167436 100644
>> --- a/target/s390x/kvm.c
>> +++ b/target/s390x/kvm.c
>> @@ -678,8 +678,7 @@ static void *legacy_s390_alloc(size_t size, uint64_t *align)
>>  {
>>      void *mem;
>>
>> -    mem = mmap((void *) 0x800000000ULL, size,
>> -               PROT_EXEC|PROT_READ|PROT_WRITE,
>> +    mem = mmap((void *) 0x800000000ULL, size, PROT_READ | PROT_WRITE,
>>                 MAP_SHARED | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
>>      return mem == MAP_FAILED ? NULL : mem;
>>  }
> 
> Wouldn't it be better to adapt the SELinux rules?

I don't think that we want to change the default behavior of SELinux
here, since this is a security feature. Fortunately, there is already a
SELinux configuration variable available which can be used as a workaround:

 setsebool virt_use_execmem 1

But still, it would be nicer if things worked out of the box instead...

 Thomas
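
The mmap() failure itself can be probed outside of QEMU. The sketch below is an illustration under assumptions, not the code from target/s390x/kvm.c: the helper name probe_anon_map() is hypothetical, and unlike legacy_s390_alloc() it passes no fixed address and no MAP_FIXED, so that it cannot clobber existing mappings of the calling process.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS with strict -std= settings */
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical probe: try a shared anonymous mapping with the given
 * protection bits. Returns 0 on success, or the errno value that
 * mmap() reported on failure.
 */
static int probe_anon_map(size_t size, int prot)
{
    void *mem = mmap(NULL, size, prot, MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    if (mem == MAP_FAILED) {
        return errno;
    }
    munmap(mem, size);
    return 0;
}
```

Calling probe_anon_map(4096, PROT_READ | PROT_WRITE) is expected to succeed on any Linux system. Adding PROT_EXEC would presumably return EACCES in a process confined by an SELinux policy that denies executable anonymous memory, which matches the "Permission denied" text in the error message from the commit description.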
Thomas Huth March 24, 2017, 10 a.m. UTC | #4
On 24.03.2017 10:39, Christian Borntraeger wrote:
> On 03/24/2017 10:26 AM, Thomas Huth wrote:
>> When running QEMU with KVM under z/VM, the memory for the guest
>> is allocated via legacy_s390_alloc() since the KVM_CAP_S390_COW
>> extension is not supported on z/VM. legacy_s390_alloc() then uses
>> mmap(... PROT_EXEC ...) for the guest memory - but this does not
>> work when running with SELinux enabled, mmap() fails and QEMU aborts
>> with the following error message:
>>
>>  cannot set up guest memory 's390.ram': Permission denied
>>
>> Looking at the other allocator function qemu_anon_ram_alloc(), it
>> seems like PROT_EXEC is normally not needed for allocating the
>> guest RAM, and indeed, the guest also starts successfully under
>> z/VM when we remove the PROT_EXEC from the legacy_s390_alloc()
>> function. So let's get rid of that flag here to be able to run
>> with SELinux under z/VM, too.
> 
> Older z/VM versions do not provide the enhanced suppression on protection
> facility, which would result in guest failures as soon as the kernel
> starts dirty pages tracking by write protecting the pages via the page
> table. Some kernel release back (last time I checked) the PROT_EXEC was 
> necessary to prevent the dirty pages tracking from taking place. So this
> patch would break KVM in that case.

OK, then please ignore it.

> Newer z/VMs (e.g. 6.3) do provide ESOP. SO the question is,
> why is KVM_CAP_S390_COW not set?

I'll check whether MACHINE_HAS_ESOP is correctly set in my kernel...

 Thomas
Thomas Huth March 29, 2017, 2:21 p.m. UTC | #5
On 24.03.2017 10:39, Christian Borntraeger wrote:
> On 03/24/2017 10:26 AM, Thomas Huth wrote:
>> When running QEMU with KVM under z/VM, the memory for the guest
>> is allocated via legacy_s390_alloc() since the KVM_CAP_S390_COW
>> extension is not supported on z/VM. legacy_s390_alloc() then uses
>> mmap(... PROT_EXEC ...) for the guest memory - but this does not
>> work when running with SELinux enabled, mmap() fails and QEMU aborts
>> with the following error message:
>>
>>  cannot set up guest memory 's390.ram': Permission denied
>>
>> Looking at the other allocator function qemu_anon_ram_alloc(), it
>> seems like PROT_EXEC is normally not needed for allocating the
>> guest RAM, and indeed, the guest also starts successfully under
>> z/VM when we remove the PROT_EXEC from the legacy_s390_alloc()
>> function. So let's get rid of that flag here to be able to run
>> with SELinux under z/VM, too.
> 
> Older z/VM versions do not provide the enhanced suppression on protection
> facility, which would result in guest failures as soon as the kernel
> starts dirty pages tracking by write protecting the pages via the page
> table. Some kernel release back (last time I checked) the PROT_EXEC was 
> necessary to prevent the dirty pages tracking from taking place. So this
> patch would break KVM in that case.
> 
> Newer z/VMs (e.g. 6.3) do provide ESOP. SO the question is,
> why is KVM_CAP_S390_COW not set?

I now had another look at this, and it seems like the ESOP bit is indeed
not set in S390_lowcore.machine_flags here. According to /proc/sysinfo,
z/VM is version 6.1.0 here, so I guess that's just too old for ESOP?

 Thomas
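
The version check described above could be scripted roughly as follows. This is a sketch under assumptions: it presumes that /proc/sysinfo contains a line of the form "VM00 Control Program: z/VM    6.1.0", and both helper names are hypothetical. The 6.3 threshold is taken from the statement in this thread that ESOP was introduced with z/VM 6.3.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical parser: extract major and minor z/VM version from one
 * line of /proc/sysinfo. Returns 1 on success, 0 if the line does not
 * describe a z/VM control program.
 */
static int parse_zvm_version(const char *line, int *major, int *minor)
{
    static const char key[] = "Control Program:";
    const char *p = strstr(line, key);

    if (p == NULL) {
        return 0;
    }
    p += sizeof(key) - 1;
    /* Whitespace in the format matches any run of blanks in the input. */
    return sscanf(p, " z/VM %d.%d", major, minor) == 2;
}

/* Per this thread, ESOP is available to guests starting with z/VM 6.3. */
static int zvm_provides_esop(int major, int minor)
{
    return major > 6 || (major == 6 && minor >= 3);
}
```

A caller would read /proc/sysinfo line by line with fgets() and pass each line to parse_zvm_version() until it returns 1.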
Christian Borntraeger March 29, 2017, 2:25 p.m. UTC | #6
On 03/29/2017 04:21 PM, Thomas Huth wrote:
> On 24.03.2017 10:39, Christian Borntraeger wrote:
>> On 03/24/2017 10:26 AM, Thomas Huth wrote:
>>> When running QEMU with KVM under z/VM, the memory for the guest
>>> is allocated via legacy_s390_alloc() since the KVM_CAP_S390_COW
>>> extension is not supported on z/VM. legacy_s390_alloc() then uses
>>> mmap(... PROT_EXEC ...) for the guest memory - but this does not
>>> work when running with SELinux enabled, mmap() fails and QEMU aborts
>>> with the following error message:
>>>
>>>  cannot set up guest memory 's390.ram': Permission denied
>>>
>>> Looking at the other allocator function qemu_anon_ram_alloc(), it
>>> seems like PROT_EXEC is normally not needed for allocating the
>>> guest RAM, and indeed, the guest also starts successfully under
>>> z/VM when we remove the PROT_EXEC from the legacy_s390_alloc()
>>> function. So let's get rid of that flag here to be able to run
>>> with SELinux under z/VM, too.
>>
>> Older z/VM versions do not provide the enhanced suppression on protection
>> facility, which would result in guest failures as soon as the kernel
>> starts dirty pages tracking by write protecting the pages via the page
>> table. Some kernel release back (last time I checked) the PROT_EXEC was 
>> necessary to prevent the dirty pages tracking from taking place. So this
>> patch would break KVM in that case.
>>
>> Newer z/VMs (e.g. 6.3) do provide ESOP. SO the question is,
>> why is KVM_CAP_S390_COW not set?
> 
> I now had another look at this, and seems like the ESOP bit is indeed
> not set in S390_lowcore.machine_flags here. According to /proc/sysinfo,
> z/VM is version 6.1.0 here, so I guess that's just too old for ESOP?

Yes, this was introduced with z/VM 6.3
Thomas Huth Sept. 15, 2017, 2:36 p.m. UTC | #7
On 29.03.2017 16:25, Christian Borntraeger wrote:
> On 03/29/2017 04:21 PM, Thomas Huth wrote:
>> On 24.03.2017 10:39, Christian Borntraeger wrote:
>>> On 03/24/2017 10:26 AM, Thomas Huth wrote:
>>>> When running QEMU with KVM under z/VM, the memory for the guest
>>>> is allocated via legacy_s390_alloc() since the KVM_CAP_S390_COW
>>>> extension is not supported on z/VM. legacy_s390_alloc() then uses
>>>> mmap(... PROT_EXEC ...) for the guest memory - but this does not
>>>> work when running with SELinux enabled, mmap() fails and QEMU aborts
>>>> with the following error message:
>>>>
>>>>  cannot set up guest memory 's390.ram': Permission denied
>>>>
>>>> Looking at the other allocator function qemu_anon_ram_alloc(), it
>>>> seems like PROT_EXEC is normally not needed for allocating the
>>>> guest RAM, and indeed, the guest also starts successfully under
>>>> z/VM when we remove the PROT_EXEC from the legacy_s390_alloc()
>>>> function. So let's get rid of that flag here to be able to run
>>>> with SELinux under z/VM, too.
>>>
>>> Older z/VM versions do not provide the enhanced suppression on protection
>>> facility, which would result in guest failures as soon as the kernel
>>> starts dirty pages tracking by write protecting the pages via the page
>>> table. Some kernel release back (last time I checked) the PROT_EXEC was 
>>> necessary to prevent the dirty pages tracking from taking place. So this
>>> patch would break KVM in that case.
>>>
>>> Newer z/VMs (e.g. 6.3) do provide ESOP. SO the question is,
>>> why is KVM_CAP_S390_COW not set?
>>
>> I now had another look at this, and seems like the ESOP bit is indeed
>> not set in S390_lowcore.machine_flags here. According to /proc/sysinfo,
>> z/VM is version 6.1.0 here, so I guess that's just too old for ESOP?
> 
> Yes, this was introduced with z/VM 6.3

FWIW, the last version without ESOP, z/VM 6.2, is now end of life,
according to: http://www.vm.ibm.com/techinfo/lpmigr/vmleos.html
... so I guess we could remove the legacy_s390_alloc() function now?

 Thomas
Christian Borntraeger Sept. 18, 2017, 7:43 a.m. UTC | #8
On 09/15/2017 04:36 PM, Thomas Huth wrote:
> On 29.03.2017 16:25, Christian Borntraeger wrote:
>> On 03/29/2017 04:21 PM, Thomas Huth wrote:
>>> On 24.03.2017 10:39, Christian Borntraeger wrote:
>>>> On 03/24/2017 10:26 AM, Thomas Huth wrote:
>>>>> When running QEMU with KVM under z/VM, the memory for the guest
>>>>> is allocated via legacy_s390_alloc() since the KVM_CAP_S390_COW
>>>>> extension is not supported on z/VM. legacy_s390_alloc() then uses
>>>>> mmap(... PROT_EXEC ...) for the guest memory - but this does not
>>>>> work when running with SELinux enabled, mmap() fails and QEMU aborts
>>>>> with the following error message:
>>>>>
>>>>>  cannot set up guest memory 's390.ram': Permission denied
>>>>>
>>>>> Looking at the other allocator function qemu_anon_ram_alloc(), it
>>>>> seems like PROT_EXEC is normally not needed for allocating the
>>>>> guest RAM, and indeed, the guest also starts successfully under
>>>>> z/VM when we remove the PROT_EXEC from the legacy_s390_alloc()
>>>>> function. So let's get rid of that flag here to be able to run
>>>>> with SELinux under z/VM, too.
>>>>
>>>> Older z/VM versions do not provide the enhanced suppression on protection
>>>> facility, which would result in guest failures as soon as the kernel
>>>> starts dirty pages tracking by write protecting the pages via the page
>>>> table. Some kernel release back (last time I checked) the PROT_EXEC was 
>>>> necessary to prevent the dirty pages tracking from taking place. So this
>>>> patch would break KVM in that case.
>>>>
>>>> Newer z/VMs (e.g. 6.3) do provide ESOP. SO the question is,
>>>> why is KVM_CAP_S390_COW not set?
>>>
>>> I now had another look at this, and seems like the ESOP bit is indeed
>>> not set in S390_lowcore.machine_flags here. According to /proc/sysinfo,
>>> z/VM is version 6.1.0 here, so I guess that's just too old for ESOP?
>>
>> Yes, this was introduced with z/VM 6.3
> 
> FWIW, the last version without ESOP, z/VM 6.2, is now end of life,
> according to: http://www.vm.ibm.com/techinfo/lpmigr/vmleos.html
> ... so I guess we could remove the legacy_s390_alloc() function now?


I recently learned that you can buy extended z/VM support; I am not sure how
long this will be available. In addition, ESOP was added with z10, so
if we still care about z9 and older then this would break things on
very, very old boxes.

The pain/risk-to-break ratio seems to suggest keeping this "hack"
for a while.
David Hildenbrand Sept. 19, 2017, 12:38 p.m. UTC | #9
On 18.09.2017 09:43, Christian Borntraeger wrote:
> 
> 
> On 09/15/2017 04:36 PM, Thomas Huth wrote:
>> On 29.03.2017 16:25, Christian Borntraeger wrote:
>>> On 03/29/2017 04:21 PM, Thomas Huth wrote:
>>>> On 24.03.2017 10:39, Christian Borntraeger wrote:
>>>>> On 03/24/2017 10:26 AM, Thomas Huth wrote:
>>>>>> When running QEMU with KVM under z/VM, the memory for the guest
>>>>>> is allocated via legacy_s390_alloc() since the KVM_CAP_S390_COW
>>>>>> extension is not supported on z/VM. legacy_s390_alloc() then uses
>>>>>> mmap(... PROT_EXEC ...) for the guest memory - but this does not
>>>>>> work when running with SELinux enabled, mmap() fails and QEMU aborts
>>>>>> with the following error message:
>>>>>>
>>>>>>  cannot set up guest memory 's390.ram': Permission denied
>>>>>>
>>>>>> Looking at the other allocator function qemu_anon_ram_alloc(), it
>>>>>> seems like PROT_EXEC is normally not needed for allocating the
>>>>>> guest RAM, and indeed, the guest also starts successfully under
>>>>>> z/VM when we remove the PROT_EXEC from the legacy_s390_alloc()
>>>>>> function. So let's get rid of that flag here to be able to run
>>>>>> with SELinux under z/VM, too.
>>>>>
>>>>> Older z/VM versions do not provide the enhanced suppression on protection
>>>>> facility, which would result in guest failures as soon as the kernel
>>>>> starts dirty pages tracking by write protecting the pages via the page
>>>>> table. Some kernel release back (last time I checked) the PROT_EXEC was 
>>>>> necessary to prevent the dirty pages tracking from taking place. So this
>>>>> patch would break KVM in that case.
>>>>>
>>>>> Newer z/VMs (e.g. 6.3) do provide ESOP. SO the question is,
>>>>> why is KVM_CAP_S390_COW not set?
>>>>
>>>> I now had another look at this, and seems like the ESOP bit is indeed
>>>> not set in S390_lowcore.machine_flags here. According to /proc/sysinfo,
>>>> z/VM is version 6.1.0 here, so I guess that's just too old for ESOP?
>>>
>>> Yes, this was introduced with z/VM 6.3
>>
>> FWIW, the last version without ESOP, z/VM 6.2, is now end of life,
>> according to: http://www.vm.ibm.com/techinfo/lpmigr/vmleos.html
>> ... so I guess we could remove the legacy_s390_alloc() function now?
> 
> 
> I recently learned that you can buy some extended z/VM support not sure how
> long this will be available. In addition, ESOP was added with z10, so
> if we still care about z9 and older then this would break things on
> very very old boxes.

I wonder if that is really relevant anymore.

Existing users on such machines (I doubt there are many) can simply stick
to QEMU <= 2.10. Or do we actually expect people with such old
environments to use the latest and greatest QEMU versions?

We could add an error message and error out.

> 
> The pain/risk-to-break ratio seems to suggest to keep this "hack"
> for a while.
Thomas Huth Sept. 19, 2017, 12:48 p.m. UTC | #10
On 19.09.2017 14:38, David Hildenbrand wrote:
> On 18.09.2017 09:43, Christian Borntraeger wrote:
>>
>>
>> On 09/15/2017 04:36 PM, Thomas Huth wrote:
>>> On 29.03.2017 16:25, Christian Borntraeger wrote:
>>>> On 03/29/2017 04:21 PM, Thomas Huth wrote:
>>>>> On 24.03.2017 10:39, Christian Borntraeger wrote:
>>>>>> On 03/24/2017 10:26 AM, Thomas Huth wrote:
>>>>>>> When running QEMU with KVM under z/VM, the memory for the guest
>>>>>>> is allocated via legacy_s390_alloc() since the KVM_CAP_S390_COW
>>>>>>> extension is not supported on z/VM. legacy_s390_alloc() then uses
>>>>>>> mmap(... PROT_EXEC ...) for the guest memory - but this does not
>>>>>>> work when running with SELinux enabled, mmap() fails and QEMU aborts
>>>>>>> with the following error message:
>>>>>>>
>>>>>>>  cannot set up guest memory 's390.ram': Permission denied
>>>>>>>
>>>>>>> Looking at the other allocator function qemu_anon_ram_alloc(), it
>>>>>>> seems like PROT_EXEC is normally not needed for allocating the
>>>>>>> guest RAM, and indeed, the guest also starts successfully under
>>>>>>> z/VM when we remove the PROT_EXEC from the legacy_s390_alloc()
>>>>>>> function. So let's get rid of that flag here to be able to run
>>>>>>> with SELinux under z/VM, too.
>>>>>>
>>>>>> Older z/VM versions do not provide the enhanced suppression on protection
>>>>>> facility, which would result in guest failures as soon as the kernel
>>>>>> starts dirty pages tracking by write protecting the pages via the page
>>>>>> table. Some kernel release back (last time I checked) the PROT_EXEC was 
>>>>>> necessary to prevent the dirty pages tracking from taking place. So this
>>>>>> patch would break KVM in that case.
>>>>>>
>>>>>> Newer z/VMs (e.g. 6.3) do provide ESOP. SO the question is,
>>>>>> why is KVM_CAP_S390_COW not set?
>>>>>
>>>>> I now had another look at this, and seems like the ESOP bit is indeed
>>>>> not set in S390_lowcore.machine_flags here. According to /proc/sysinfo,
>>>>> z/VM is version 6.1.0 here, so I guess that's just too old for ESOP?
>>>>
>>>> Yes, this was introduced with z/VM 6.3
>>>
>>> FWIW, the last version without ESOP, z/VM 6.2, is now end of life,
>>> according to: http://www.vm.ibm.com/techinfo/lpmigr/vmleos.html
>>> ... so I guess we could remove the legacy_s390_alloc() function now?
>>
>>
>> I recently learned that you can buy some extended z/VM support not sure how
>> long this will be available. In addition, ESOP was added with z10, so
>> if we still care about z9 and older then this would break things on
>> very very old boxes.
> 
> I wonder if that is really relevant anymore.
> 
> Existing user on such machines (I doubt there are many) can simply stick
> to QEMU <= 2.10. Or do we actually expect people with such old
> environments to use latest and grates QEMU versions?
> 
> We could add an error message an error out.

Well, as long as the code does not cause any trouble for us, and as long
as there still might be possible users, there is also no real urge to
remove it, is there? I originally thought that all affected systems
would now be EOL, but as Christian pointed out, the z9 BC is not EOL
yet, so I'd say we should at least wait for that point in time before
removing it (I haven't found any public information about extended z/VM
support though, so no clue whether we should really take that into account).

 Thomas
David Hildenbrand Sept. 19, 2017, 1:03 p.m. UTC | #11
On 19.09.2017 14:48, Thomas Huth wrote:
> On 19.09.2017 14:38, David Hildenbrand wrote:
>> On 18.09.2017 09:43, Christian Borntraeger wrote:
>>>
>>>
>>> On 09/15/2017 04:36 PM, Thomas Huth wrote:
>>>> On 29.03.2017 16:25, Christian Borntraeger wrote:
>>>>> On 03/29/2017 04:21 PM, Thomas Huth wrote:
>>>>>> On 24.03.2017 10:39, Christian Borntraeger wrote:
>>>>>>> On 03/24/2017 10:26 AM, Thomas Huth wrote:
>>>>>>>> When running QEMU with KVM under z/VM, the memory for the guest
>>>>>>>> is allocated via legacy_s390_alloc() since the KVM_CAP_S390_COW
>>>>>>>> extension is not supported on z/VM. legacy_s390_alloc() then uses
>>>>>>>> mmap(... PROT_EXEC ...) for the guest memory - but this does not
>>>>>>>> work when running with SELinux enabled, mmap() fails and QEMU aborts
>>>>>>>> with the following error message:
>>>>>>>>
>>>>>>>>  cannot set up guest memory 's390.ram': Permission denied
>>>>>>>>
>>>>>>>> Looking at the other allocator function qemu_anon_ram_alloc(), it
>>>>>>>> seems like PROT_EXEC is normally not needed for allocating the
>>>>>>>> guest RAM, and indeed, the guest also starts successfully under
>>>>>>>> z/VM when we remove the PROT_EXEC from the legacy_s390_alloc()
>>>>>>>> function. So let's get rid of that flag here to be able to run
>>>>>>>> with SELinux under z/VM, too.
>>>>>>>
>>>>>>> Older z/VM versions do not provide the enhanced suppression on protection
>>>>>>> facility, which would result in guest failures as soon as the kernel
>>>>>>> starts dirty pages tracking by write protecting the pages via the page
>>>>>>> table. Some kernel release back (last time I checked) the PROT_EXEC was 
>>>>>>> necessary to prevent the dirty pages tracking from taking place. So this
>>>>>>> patch would break KVM in that case.
>>>>>>>
>>>>>>> Newer z/VMs (e.g. 6.3) do provide ESOP. SO the question is,
>>>>>>> why is KVM_CAP_S390_COW not set?
>>>>>>
>>>>>> I now had another look at this, and seems like the ESOP bit is indeed
>>>>>> not set in S390_lowcore.machine_flags here. According to /proc/sysinfo,
>>>>>> z/VM is version 6.1.0 here, so I guess that's just too old for ESOP?
>>>>>
>>>>> Yes, this was introduced with z/VM 6.3
>>>>
>>>> FWIW, the last version without ESOP, z/VM 6.2, is now end of life,
>>>> according to: http://www.vm.ibm.com/techinfo/lpmigr/vmleos.html
>>>> ... so I guess we could remove the legacy_s390_alloc() function now?
>>>
>>>
>>> I recently learned that you can buy some extended z/VM support not sure how
>>> long this will be available. In addition, ESOP was added with z10, so
>>> if we still care about z9 and older then this would break things on
>>> very very old boxes.
>>
>> I wonder if that is really relevant anymore.
>>
>> Existing user on such machines (I doubt there are many) can simply stick
>> to QEMU <= 2.10. Or do we actually expect people with such old
>> environments to use latest and grates QEMU versions?
>>
>> We could add an error message an error out.
> 
> Well, as long as the code does not cause any trouble for us, and as long
> as there still might be possible users, there is also no real urge to
> remove it, is there? I originally thought that all affected systems
> would now be EOL, but as Christian pointed out, the z9 BC is not EOL
> yet, so I'd say we should at least wait for that point in time before
> removing it (I haven't found any public information about extended z/VM
> support though, so no clue whether we should really take that into account).
> 
>  Thomas
> 

It's the last remaining alloc hack we have in QEMU :) That's why I am
asking the question.
Thomas Huth Sept. 19, 2017, 1:12 p.m. UTC | #12
On 19.09.2017 15:03, David Hildenbrand wrote:
> On 19.09.2017 14:48, Thomas Huth wrote:
>> On 19.09.2017 14:38, David Hildenbrand wrote:
>>> On 18.09.2017 09:43, Christian Borntraeger wrote:
>>>>
>>>>
>>>> On 09/15/2017 04:36 PM, Thomas Huth wrote:
>>>>> On 29.03.2017 16:25, Christian Borntraeger wrote:
>>>>>> On 03/29/2017 04:21 PM, Thomas Huth wrote:
>>>>>>> On 24.03.2017 10:39, Christian Borntraeger wrote:
>>>>>>>> On 03/24/2017 10:26 AM, Thomas Huth wrote:
>>>>>>>>> When running QEMU with KVM under z/VM, the memory for the guest
>>>>>>>>> is allocated via legacy_s390_alloc() since the KVM_CAP_S390_COW
>>>>>>>>> extension is not supported on z/VM. legacy_s390_alloc() then uses
>>>>>>>>> mmap(... PROT_EXEC ...) for the guest memory - but this does not
>>>>>>>>> work when running with SELinux enabled, mmap() fails and QEMU aborts
>>>>>>>>> with the following error message:
>>>>>>>>>
>>>>>>>>>  cannot set up guest memory 's390.ram': Permission denied
>>>>>>>>>
>>>>>>>>> Looking at the other allocator function qemu_anon_ram_alloc(), it
>>>>>>>>> seems like PROT_EXEC is normally not needed for allocating the
>>>>>>>>> guest RAM, and indeed, the guest also starts successfully under
>>>>>>>>> z/VM when we remove the PROT_EXEC from the legacy_s390_alloc()
>>>>>>>>> function. So let's get rid of that flag here to be able to run
>>>>>>>>> with SELinux under z/VM, too.
>>>>>>>>
>>>>>>>> Older z/VM versions do not provide the enhanced suppression on protection
>>>>>>>> facility, which would result in guest failures as soon as the kernel
>>>>>>>> starts dirty pages tracking by write protecting the pages via the page
>>>>>>>> table. Some kernel release back (last time I checked) the PROT_EXEC was 
>>>>>>>> necessary to prevent the dirty pages tracking from taking place. So this
>>>>>>>> patch would break KVM in that case.
>>>>>>>>
>>>>>>>> Newer z/VMs (e.g. 6.3) do provide ESOP. SO the question is,
>>>>>>>> why is KVM_CAP_S390_COW not set?
>>>>>>>
>>>>>>> I now had another look at this, and seems like the ESOP bit is indeed
>>>>>>> not set in S390_lowcore.machine_flags here. According to /proc/sysinfo,
>>>>>>> z/VM is version 6.1.0 here, so I guess that's just too old for ESOP?
>>>>>>
>>>>>> Yes, this was introduced with z/VM 6.3
>>>>>
>>>>> FWIW, the last version without ESOP, z/VM 6.2, is now end of life,
>>>>> according to: http://www.vm.ibm.com/techinfo/lpmigr/vmleos.html
>>>>> ... so I guess we could remove the legacy_s390_alloc() function now?
>>>>
>>>>
>>>> I recently learned that you can buy some extended z/VM support not sure how
>>>> long this will be available. In addition, ESOP was added with z10, so
>>>> if we still care about z9 and older then this would break things on
>>>> very very old boxes.
>>>
>>> I wonder if that is really relevant anymore.
>>>
>>> Existing user on such machines (I doubt there are many) can simply stick
>>> to QEMU <= 2.10. Or do we actually expect people with such old
>>> environments to use latest and grates QEMU versions?
>>>
>>> We could add an error message an error out.
>>
>> Well, as long as the code does not cause any trouble for us, and as long
>> as there still might be possible users, there is also no real urge to
>> remove it, is there? I originally thought that all affected systems
>> would now be EOL, but as Christian pointed out, the z9 BC is not EOL
>> yet, so I'd say we should at least wait for that point in time before
>> removing it (I haven't found any public information about extended z/VM
>> support though, so no clue whether we should really take that into account).
>>
>>  Thomas
>>
> 
> It's the last remaining alloc hack we have in QEMU :) That's why I am
> asking the question.

Hmm, maybe we could remove it for QEMU v3.0 ? ;-)

 Thomas
David Hildenbrand Sept. 19, 2017, 1:14 p.m. UTC | #13
>> It's the last remaining alloc hack we have in QEMU :) That's why I am
>> asking the question.
> 
> Hmm, maybe we could remove it for QEMU v3.0 ? ;-)
> 
>  Thomas
> 
> 

Chasing unicorns on rainbows? ;)
Christian Borntraeger Sept. 19, 2017, 1:15 p.m. UTC | #14
On 09/19/2017 03:03 PM, David Hildenbrand wrote:
> On 19.09.2017 14:48, Thomas Huth wrote:
>> On 19.09.2017 14:38, David Hildenbrand wrote:
>>> On 18.09.2017 09:43, Christian Borntraeger wrote:
>>>>
>>>>
>>>> On 09/15/2017 04:36 PM, Thomas Huth wrote:
>>>>> On 29.03.2017 16:25, Christian Borntraeger wrote:
>>>>>> On 03/29/2017 04:21 PM, Thomas Huth wrote:
>>>>>>> On 24.03.2017 10:39, Christian Borntraeger wrote:
>>>>>>>> On 03/24/2017 10:26 AM, Thomas Huth wrote:
>>>>>>>>> When running QEMU with KVM under z/VM, the memory for the guest
>>>>>>>>> is allocated via legacy_s390_alloc() since the KVM_CAP_S390_COW
>>>>>>>>> extension is not supported on z/VM. legacy_s390_alloc() then uses
>>>>>>>>> mmap(... PROT_EXEC ...) for the guest memory - but this does not
>>>>>>>>> work when running with SELinux enabled, mmap() fails and QEMU aborts
>>>>>>>>> with the following error message:
>>>>>>>>>
>>>>>>>>>  cannot set up guest memory 's390.ram': Permission denied
>>>>>>>>>
>>>>>>>>> Looking at the other allocator function qemu_anon_ram_alloc(), it
>>>>>>>>> seems like PROT_EXEC is normally not needed for allocating the
>>>>>>>>> guest RAM, and indeed, the guest also starts successfully under
>>>>>>>>> z/VM when we remove the PROT_EXEC from the legacy_s390_alloc()
>>>>>>>>> function. So let's get rid of that flag here to be able to run
>>>>>>>>> with SELinux under z/VM, too.
>>>>>>>>
>>>>>>>> Older z/VM versions do not provide the enhanced suppression on protection
>>>>>>>> facility, which would result in guest failures as soon as the kernel
>>>>>>>> starts dirty pages tracking by write protecting the pages via the page
>>>>>>>> table. Some kernel release back (last time I checked) the PROT_EXEC was 
>>>>>>>> necessary to prevent the dirty pages tracking from taking place. So this
>>>>>>>> patch would break KVM in that case.
>>>>>>>>
>>>>>>>> Newer z/VMs (e.g. 6.3) do provide ESOP. SO the question is,
>>>>>>>> why is KVM_CAP_S390_COW not set?
>>>>>>>
>>>>>>> I now had another look at this, and seems like the ESOP bit is indeed
>>>>>>> not set in S390_lowcore.machine_flags here. According to /proc/sysinfo,
>>>>>>> z/VM is version 6.1.0 here, so I guess that's just too old for ESOP?
>>>>>>
>>>>>> Yes, this was introduced with z/VM 6.3
>>>>>
>>>>> FWIW, the last version without ESOP, z/VM 6.2, is now end of life,
>>>>> according to: http://www.vm.ibm.com/techinfo/lpmigr/vmleos.html
>>>>> ... so I guess we could remove the legacy_s390_alloc() function now?
>>>>
>>>>
>>>> I recently learned that you can buy some extended z/VM support not sure how
>>>> long this will be available. In addition, ESOP was added with z10, so
>>>> if we still care about z9 and older then this would break things on
>>>> very very old boxes.
>>>
>>> I wonder if that is really relevant anymore.
>>>
>>> Existing user on such machines (I doubt there are many) can simply stick
>>> to QEMU <= 2.10. Or do we actually expect people with such old
>>> environments to use latest and grates QEMU versions?
>>>
>>> We could add an error message an error out.
>>
>> Well, as long as the code does not cause any trouble for us, and as long
>> as there still might be possible users, there is also no real urge to
>> remove it, is there? I originally thought that all affected systems
>> would now be EOL, but as Christian pointed out, the z9 BC is not EOL
>> yet, so I'd say we should at least wait for that point in time before
>> removing it (I haven't found any public information about extended z/VM
>> support though, so no clue whether we should really take that into account).
>>
>>  Thomas
>>
> 
> It's the last remaining alloc hack we have in QEMU :) That's why I am
> asking the question.


I think breaking potential users (e.g. Debian on a z9) for the sake of cleanliness
is a bad trade-off. We carry along a lot of old compatibility cruft in many places,
and that is a good thing. As long as it does not become a burden to maintain (and
it really, really does not), let's not touch it.
Patch

diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c
index ac47154..5167436 100644
--- a/target/s390x/kvm.c
+++ b/target/s390x/kvm.c
@@ -678,8 +678,7 @@  static void *legacy_s390_alloc(size_t size, uint64_t *align)
 {
     void *mem;
 
-    mem = mmap((void *) 0x800000000ULL, size,
-               PROT_EXEC|PROT_READ|PROT_WRITE,
+    mem = mmap((void *) 0x800000000ULL, size, PROT_READ | PROT_WRITE,
                MAP_SHARED | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
     return mem == MAP_FAILED ? NULL : mem;
 }