Patchwork [1/9] Export function for VA defined ram allocation

login
register
mail settings
Submitter Alexander Graf
Date Oct. 21, 2009, 9:24 a.m.
Message ID <1256117106-9475-2-git-send-email-agraf@suse.de>
Download mbox | patch
Permalink /patch/36510/
State New
Headers show

Comments

Alexander Graf - Oct. 21, 2009, 9:24 a.m.
S390 requires vmas for guests to be < 256 GB. So we need to directly export
mmaps "try to use this vma as start address" feature to not accidently get
over that limit.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 cpu-common.h |    1 +
 exec.c       |   15 +++++++++++++--
 2 files changed, 14 insertions(+), 2 deletions(-)
Anthony Liguori - Oct. 21, 2009, 2:29 p.m.
Alexander Graf wrote:
> S390 requires vmas for guests to be < 256 GB. So we need to directly export
> mmaps "try to use this vma as start address" feature to not accidently get
> over that limit.
>
> Signed-off-by: Alexander Graf <agraf@suse.de>
> ---
>  cpu-common.h |    1 +
>  exec.c       |   15 +++++++++++++--
>  2 files changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/cpu-common.h b/cpu-common.h
> index 6302372..ecaf9e3 100644
> --- a/cpu-common.h
> +++ b/cpu-common.h
> @@ -30,6 +30,7 @@ static inline void cpu_register_physical_memory(target_phys_addr_t start_addr,
>  
>  ram_addr_t cpu_get_physical_page_desc(target_phys_addr_t addr);
>  ram_addr_t qemu_ram_alloc(ram_addr_t);
> +ram_addr_t _qemu_ram_alloc(ram_addr_t size, void *map_at);
>   

I don't like this interface very much.  Couldn't the fact that we need a 
va starting at 0 for s390 kvm be hidden behind the existing qemu_ram_alloc?

Regards,

Anthony Liguori
Alexander Graf - Oct. 21, 2009, 4:02 p.m.
On 21.10.2009, at 16:29, Anthony Liguori wrote:

> Alexander Graf wrote:
>> S390 requires vmas for guests to be < 256 GB. So we need to  
>> directly export
>> mmaps "try to use this vma as start address" feature to not  
>> accidently get
>> over that limit.
>>
>> Signed-off-by: Alexander Graf <agraf@suse.de>
>> ---
>> cpu-common.h |    1 +
>> exec.c       |   15 +++++++++++++--
>> 2 files changed, 14 insertions(+), 2 deletions(-)
>>
>> diff --git a/cpu-common.h b/cpu-common.h
>> index 6302372..ecaf9e3 100644
>> --- a/cpu-common.h
>> +++ b/cpu-common.h
>> @@ -30,6 +30,7 @@ static inline void cpu_register_physical_memory 
>> (target_phys_addr_t start_addr,
>>  ram_addr_t cpu_get_physical_page_desc(target_phys_addr_t addr);
>> ram_addr_t qemu_ram_alloc(ram_addr_t);
>> +ram_addr_t _qemu_ram_alloc(ram_addr_t size, void *map_at);
>>
>
> I don't like this interface very much.  Couldn't the fact that we  
> need a va starting at 0 for s390 kvm be hidden behind the existing  
> qemu_ram_alloc?

We don't need a va starting at 0. We need a max va ending < 256 GB.

Carsten did mention in a previous mail that he has getting rid of that  
limitation on his todo list, but for now that's what we have. As soon  
as the kvm module is updated to support arbitrary addresses we can  
just revert this patch and be happy.

For the time being all we need to check is that nobody else uses  
_qemu_ram_alloc, so we really can revert it safely then.

Alex
Anthony Liguori - Oct. 21, 2009, 4:54 p.m.
Alexander Graf wrote:
>
> On 21.10.2009, at 16:29, Anthony Liguori wrote:
>
>> Alexander Graf wrote:
>>> S390 requires vmas for guests to be < 256 GB. So we need to directly 
>>> export
>>> mmaps "try to use this vma as start address" feature to not 
>>> accidently get
>>> over that limit.
>>>
>>> Signed-off-by: Alexander Graf <agraf@suse.de>
>>> ---
>>> cpu-common.h |    1 +
>>> exec.c       |   15 +++++++++++++--
>>> 2 files changed, 14 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/cpu-common.h b/cpu-common.h
>>> index 6302372..ecaf9e3 100644
>>> --- a/cpu-common.h
>>> +++ b/cpu-common.h
>>> @@ -30,6 +30,7 @@ static inline void 
>>> cpu_register_physical_memory(target_phys_addr_t start_addr,
>>>  ram_addr_t cpu_get_physical_page_desc(target_phys_addr_t addr);
>>> ram_addr_t qemu_ram_alloc(ram_addr_t);
>>> +ram_addr_t _qemu_ram_alloc(ram_addr_t size, void *map_at);
>>>
>>
>> I don't like this interface very much.  Couldn't the fact that we 
>> need a va starting at 0 for s390 kvm be hidden behind the existing 
>> qemu_ram_alloc?
>
> We don't need a va starting at 0. We need a max va ending < 256 GB.
>
> Carsten did mention in a previous mail that he has getting rid of that 
> limitation on his todo list, but for now that's what we have. As soon 
> as the kvm module is updated to support arbitrary addresses we can 
> just revert this patch and be happy.
>
> For the time being all we need to check is that nobody else uses 
> _qemu_ram_alloc, so we really can revert it safely then.

Couldn't we just as easily put the check in qemu_ram_alloc under a 
TARGET_S390 && kvm_enabled()?

That's far better than introducing a function who's only name difference 
is an underscore.

Regards,

Anthony Liguori
> Alex
Alexander Graf - Oct. 21, 2009, 5:03 p.m.
On 21.10.2009, at 18:54, Anthony Liguori wrote:

> Alexander Graf wrote:
>>
>> On 21.10.2009, at 16:29, Anthony Liguori wrote:
>>
>>> Alexander Graf wrote:
>>>> S390 requires vmas for guests to be < 256 GB. So we need to  
>>>> directly export
>>>> mmaps "try to use this vma as start address" feature to not  
>>>> accidently get
>>>> over that limit.
>>>>
>>>> Signed-off-by: Alexander Graf <agraf@suse.de>
>>>> ---
>>>> cpu-common.h |    1 +
>>>> exec.c       |   15 +++++++++++++--
>>>> 2 files changed, 14 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/cpu-common.h b/cpu-common.h
>>>> index 6302372..ecaf9e3 100644
>>>> --- a/cpu-common.h
>>>> +++ b/cpu-common.h
>>>> @@ -30,6 +30,7 @@ static inline void cpu_register_physical_memory 
>>>> (target_phys_addr_t start_addr,
>>>> ram_addr_t cpu_get_physical_page_desc(target_phys_addr_t addr);
>>>> ram_addr_t qemu_ram_alloc(ram_addr_t);
>>>> +ram_addr_t _qemu_ram_alloc(ram_addr_t size, void *map_at);
>>>>
>>>
>>> I don't like this interface very much.  Couldn't the fact that we  
>>> need a va starting at 0 for s390 kvm be hidden behind the existing  
>>> qemu_ram_alloc?
>>
>> We don't need a va starting at 0. We need a max va ending < 256 GB.
>>
>> Carsten did mention in a previous mail that he has getting rid of  
>> that limitation on his todo list, but for now that's what we have.  
>> As soon as the kvm module is updated to support arbitrary addresses  
>> we can just revert this patch and be happy.
>>
>> For the time being all we need to check is that nobody else uses  
>> _qemu_ram_alloc, so we really can revert it safely then.
>
> Couldn't we just as easily put the check in qemu_ram_alloc under a  
> TARGET_S390 && kvm_enabled()?
>
> That's far better than introducing a function who's only name  
> difference is an underscore.

So you would prefer a special #ifdef for s390 in generic code over a  
specifically for this purpose exported function?

Well, you're the boss. I like the special function better, but  
whatever you say.

Alex
Anthony Liguori - Oct. 21, 2009, 6:06 p.m.
Alexander Graf wrote:
> So you would prefer a special #ifdef for s390 in generic code over a 
> specifically for this purpose exported function?
>
> Well, you're the boss. I like the special function better, but 
> whatever you say.

How is someone supposed to figure out what _qemu_ram_alloc is for?  
Nothing in your patch really indicates that.

However, an ugly #ifdef immediately tells someone, oh, s390 kvm needs 
this terrible hack, so let's keep bugging those guys to eliminate the 
need for that.

Regards,

Anthony Liguori

> Alex
>
Alexander Graf - Oct. 21, 2009, 6:27 p.m.
On 21.10.2009, at 20:06, Anthony Liguori wrote:

> Alexander Graf wrote:
>> So you would prefer a special #ifdef for s390 in generic code over  
>> a specifically for this purpose exported function?
>>
>> Well, you're the boss. I like the special function better, but  
>> whatever you say.
>
> How is someone supposed to figure out what _qemu_ram_alloc is for?   
> Nothing in your patch really indicates that.
>
> However, an ugly #ifdef immediately tells someone, oh, s390 kvm  
> needs this terrible hack, so let's keep bugging those guys to  
> eliminate the need for that.

Alright :-). Any other complaints? If not I'd spin up v3.

Alex
Anthony Liguori - Oct. 21, 2009, 6:30 p.m.
Alexander Graf wrote:
>
> On 21.10.2009, at 20:06, Anthony Liguori wrote:
>
>> Alexander Graf wrote:
>>> So you would prefer a special #ifdef for s390 in generic code over a 
>>> specifically for this purpose exported function?
>>>
>>> Well, you're the boss. I like the special function better, but 
>>> whatever you say.
>>
>> How is someone supposed to figure out what _qemu_ram_alloc is for?  
>> Nothing in your patch really indicates that.
>>
>> However, an ugly #ifdef immediately tells someone, oh, s390 kvm needs 
>> this terrible hack, so let's keep bugging those guys to eliminate the 
>> need for that.
>
> Alright :-). Any other complaints? If not I'd spin up v3.

Nope.

Regards,

Anthony Liguori

> Alex
>
Paolo Bonzini - Oct. 21, 2009, 6:42 p.m.
On 10/21/2009 08:06 PM, Anthony Liguori wrote:
> Alexander Graf wrote:
>> So you would prefer a special #ifdef for s390 in generic code over a
>> specifically for this purpose exported function?
>>
>> Well, you're the boss. I like the special function better, but
>> whatever you say.
>
> How is someone supposed to figure out what _qemu_ram_alloc is for?
> Nothing in your patch really indicates that.
>
> However, an ugly #ifdef immediately tells someone, oh, s390 kvm needs
> this terrible hack, so let's keep bugging those guys to eliminate the
> need for that.

What about this:

-ram_addr_t qemu_ram_alloc(ram_addr_t size)
+ram_addr_t qemu_ram_alloc_at(ram_addr_t size, void *map_at)
  {
      RAMBlock *new_block;

      size = TARGET_PAGE_ALIGN(size);
      new_block = qemu_malloc(sizeof(*new_block));

-    new_block->host = qemu_vmalloc(size);
+    if (map_at) {
+        new_block->host = map_at;
+    } else {
+        new_block->host = qemu_vmalloc(size);
+    }
  #ifdef MADV_MERGEABLE
      madvise(new_block->host, size, MADV_MERGEABLE);
  #endif

and calling mmap where you're currently calling _qemu_ram_alloc?

Paolo
Anthony Liguori - Oct. 21, 2009, 6:48 p.m.
Paolo Bonzini wrote:
> What about this:
>
> -ram_addr_t qemu_ram_alloc(ram_addr_t size)
> +ram_addr_t qemu_ram_alloc_at(ram_addr_t size, void *map_at)
>  {
>      RAMBlock *new_block;
>
>      size = TARGET_PAGE_ALIGN(size);
>      new_block = qemu_malloc(sizeof(*new_block));
>
> -    new_block->host = qemu_vmalloc(size);
> +    if (map_at) {
> +        new_block->host = map_at;
> +    } else {
> +        new_block->host = qemu_vmalloc(size);
> +    }
>  #ifdef MADV_MERGEABLE
>      madvise(new_block->host, size, MADV_MERGEABLE);
>  #endif
>
> and calling mmap where you're currently calling _qemu_ram_alloc?

I'm not sure.  It's definitely better but I don't think we really want 
machines to decide whether their memory is allocated generally 
speaking.  That's the advantage of hiding behind qemu_ram_alloc().

Another way to look at this is that it's roughly the same problem as 
qemu-kvm's -mem-path.  -mem-path doesn't really address subregions 
though.  In order to support shared memory regions, I think it necessary 
to be able to say something like, ram region XX..YY should be tied to 
this mmap()'d region.  Either way, these operations should be 
transparent to the caller of qemu_ram_alloc().

I didn't want to suggest solving the larger problem because I didn't 
want to bog down an otherwise reasonable series.

Regards,

Anthony Liguori

> Paolo
Carsten Otte - Oct. 22, 2009, 5:39 a.m.
Anthony Liguori wrote:
> However, an ugly #ifdef immediately tells someone, oh, s390 kvm needs 
> this terrible hack, so let's keep bugging those guys to eliminate the 
> need for that.
/me moves cleaning up memslot handling a little further up the prio 
list. I fear it still won't make it to queue head in the near future, 
but I'll see what I can do.

Patch

diff --git a/cpu-common.h b/cpu-common.h
index 6302372..ecaf9e3 100644
--- a/cpu-common.h
+++ b/cpu-common.h
@@ -30,6 +30,7 @@  static inline void cpu_register_physical_memory(target_phys_addr_t start_addr,
 
 ram_addr_t cpu_get_physical_page_desc(target_phys_addr_t addr);
 ram_addr_t qemu_ram_alloc(ram_addr_t);
+ram_addr_t _qemu_ram_alloc(ram_addr_t size, void *map_at);
 void qemu_ram_free(ram_addr_t addr);
 /* This should only be used for ram local to a device.  */
 void *qemu_get_ram_ptr(ram_addr_t addr);
diff --git a/exec.c b/exec.c
index 076d26b..36c26cd 100644
--- a/exec.c
+++ b/exec.c
@@ -2404,14 +2404,20 @@  void qemu_unregister_coalesced_mmio(target_phys_addr_t addr, ram_addr_t size)
         kvm_uncoalesce_mmio_region(addr, size);
 }
 
-ram_addr_t qemu_ram_alloc(ram_addr_t size)
+ram_addr_t _qemu_ram_alloc(ram_addr_t size, void *map_at)
 {
     RAMBlock *new_block;
 
     size = TARGET_PAGE_ALIGN(size);
     new_block = qemu_malloc(sizeof(*new_block));
 
-    new_block->host = qemu_vmalloc(size);
+    if (map_at) {
+        new_block->host = mmap(map_at, size, PROT_EXEC|PROT_READ|PROT_WRITE,
+                               MAP_SHARED | MAP_ANONYMOUS, -1, 0);
+
+    } else {
+        new_block->host = qemu_vmalloc(size);
+    }
 #ifdef MADV_MERGEABLE
     madvise(new_block->host, size, MADV_MERGEABLE);
 #endif
@@ -2434,6 +2440,11 @@  ram_addr_t qemu_ram_alloc(ram_addr_t size)
     return new_block->offset;
 }
 
+ram_addr_t qemu_ram_alloc(ram_addr_t size)
+{
+    return _qemu_ram_alloc(size, NULL);
+}
+
 void qemu_ram_free(ram_addr_t addr)
 {
     /* TODO: implement this.  */