diff mbox series

[5/6] block/nvme: Align block pages queue to host page size

Message ID 20200504094641.4963-6-philmd@redhat.com
State New
Headers show
Series block/nvme: Align block pages queue to host page size | expand

Commit Message

Philippe Mathieu-Daudé May 4, 2020, 9:46 a.m. UTC
In nvme_create_queue_pair() we create a page list using
qemu_blockalign(), then map it with qemu_vfio_dma_map():

  q->prp_list_pages = qemu_blockalign0(bs, s->page_size * NVME_QUEUE_SIZE);
  r = qemu_vfio_dma_map(s->vfio, q->prp_list_pages,
                        s->page_size * NVME_QUEUE_SIZE, ...);

With:

  s->page_size = MAX(4096, 1 << (12 + ((cap >> 48) & 0xF)));

The qemu_vfio_dma_map() documentation says "The caller need
to make sure the area is aligned to page size". While we use
multiple s->page_size as alignment, it might be not sufficient
on some hosts. Use the qemu_real_host_page_size value to be
sure the host alignment is respected.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
Cc: Cédric Le Goater <clg@kaod.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Laurent Vivier <lvivier@redhat.com>
---
 block/nvme.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

David Gibson May 5, 2020, 12:34 a.m. UTC | #1
On Mon, May 04, 2020 at 11:46:40AM +0200, Philippe Mathieu-Daudé wrote:
> In nvme_create_queue_pair() we create a page list using
> qemu_blockalign(), then map it with qemu_vfio_dma_map():
> 
>   q->prp_list_pages = qemu_blockalign0(bs, s->page_size * NVME_QUEUE_SIZE);
>   r = qemu_vfio_dma_map(s->vfio, q->prp_list_pages,
>                         s->page_size * NVME_QUEUE_SIZE, ...);
> 
> With:
> 
>   s->page_size = MAX(4096, 1 << (12 + ((cap >> 48) & 0xF)));
> 
> The qemu_vfio_dma_map() documentation says "The caller need
> to make sure the area is aligned to page size". While we use
> multiple s->page_size as alignment, it might be not sufficient
> on some hosts. Use the qemu_real_host_page_size value to be
> sure the host alignment is respected.
> 
> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
> Cc: Cédric Le Goater <clg@kaod.org>
> Cc: David Gibson <david@gibson.dropbear.id.au>
> Cc: Laurent Vivier <lvivier@redhat.com>
> ---
>  block/nvme.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index 7b7c0cc5d6..bde0d28b39 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -627,7 +627,7 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
>  
>      s->page_size = MAX(4096, 1 << (12 + ((cap >> 48) & 0xF)));
>      s->doorbell_scale = (4 << (((cap >> 32) & 0xF))) / sizeof(uint32_t);
> -    bs->bl.opt_mem_alignment = s->page_size;
> +    bs->bl.opt_mem_alignment = MAX(qemu_real_host_page_size, s->page_size);
>      timeout_ms = MIN(500 * ((cap >> 24) & 0xFF), 30000);
>  
>      /* Reset device to get a clean state. */
Laurent Vivier May 5, 2020, 8 a.m. UTC | #2
On 04/05/2020 11:46, Philippe Mathieu-Daudé wrote:
> In nvme_create_queue_pair() we create a page list using
> qemu_blockalign(), then map it with qemu_vfio_dma_map():
> 
>   q->prp_list_pages = qemu_blockalign0(bs, s->page_size * NVME_QUEUE_SIZE);
>   r = qemu_vfio_dma_map(s->vfio, q->prp_list_pages,
>                         s->page_size * NVME_QUEUE_SIZE, ...);
> 
> With:
> 
>   s->page_size = MAX(4096, 1 << (12 + ((cap >> 48) & 0xF)));
> 
> The qemu_vfio_dma_map() documentation says "The caller need
> to make sure the area is aligned to page size". While we use
> multiple s->page_size as alignment, it might be not sufficient
> on some hosts. Use the qemu_real_host_page_size value to be
> sure the host alignment is respected.
> 
> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> ---
> Cc: Cédric Le Goater <clg@kaod.org>
> Cc: David Gibson <david@gibson.dropbear.id.au>
> Cc: Laurent Vivier <lvivier@redhat.com>
> ---
>  block/nvme.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index 7b7c0cc5d6..bde0d28b39 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -627,7 +627,7 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
>  
>      s->page_size = MAX(4096, 1 << (12 + ((cap >> 48) & 0xF)));
>      s->doorbell_scale = (4 << (((cap >> 32) & 0xF))) / sizeof(uint32_t);
> -    bs->bl.opt_mem_alignment = s->page_size;
> +    bs->bl.opt_mem_alignment = MAX(qemu_real_host_page_size, s->page_size);

Why don't you replace directly the "4096" in s->page_size by
qemu_real_host_page_size?

I think this was the purpose of MAX(4096, ...) to align to the host page
size.

Thanks,
Laurnt
Laurent Vivier May 5, 2020, 8:23 a.m. UTC | #3
On 05/05/2020 10:00, Laurent Vivier wrote:
> On 04/05/2020 11:46, Philippe Mathieu-Daudé wrote:
>> In nvme_create_queue_pair() we create a page list using
>> qemu_blockalign(), then map it with qemu_vfio_dma_map():
>>
>>   q->prp_list_pages = qemu_blockalign0(bs, s->page_size * NVME_QUEUE_SIZE);
>>   r = qemu_vfio_dma_map(s->vfio, q->prp_list_pages,
>>                         s->page_size * NVME_QUEUE_SIZE, ...);
>>
>> With:
>>
>>   s->page_size = MAX(4096, 1 << (12 + ((cap >> 48) & 0xF)));
>>
>> The qemu_vfio_dma_map() documentation says "The caller need
>> to make sure the area is aligned to page size". While we use
>> multiple s->page_size as alignment, it might be not sufficient
>> on some hosts. Use the qemu_real_host_page_size value to be
>> sure the host alignment is respected.
>>
>> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
>> ---
>> Cc: Cédric Le Goater <clg@kaod.org>
>> Cc: David Gibson <david@gibson.dropbear.id.au>
>> Cc: Laurent Vivier <lvivier@redhat.com>
>> ---
>>  block/nvme.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/block/nvme.c b/block/nvme.c
>> index 7b7c0cc5d6..bde0d28b39 100644
>> --- a/block/nvme.c
>> +++ b/block/nvme.c
>> @@ -627,7 +627,7 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
>>  
>>      s->page_size = MAX(4096, 1 << (12 + ((cap >> 48) & 0xF)));
>>      s->doorbell_scale = (4 << (((cap >> 32) & 0xF))) / sizeof(uint32_t);
>> -    bs->bl.opt_mem_alignment = s->page_size;
>> +    bs->bl.opt_mem_alignment = MAX(qemu_real_host_page_size, s->page_size);
> 
> Why don't you replace directly the "4096" in s->page_size by
> qemu_real_host_page_size?
> 
> I think this was the purpose of MAX(4096, ...) to align to the host page
> size.

oh, I see, page_size is used for a lot of things than can be broken, and
this is what is done in bdrv_opt_mem_align() for instance, 4096 is the
sector size not the host page size.

BTW, you need the same change as in nvme_init() in
nvme_refresh_limits(), I think.

Thanks,
Laurent
Philippe Mathieu-Daudé May 5, 2020, 9:48 a.m. UTC | #4
On 5/5/20 10:23 AM, Laurent Vivier wrote:
> On 05/05/2020 10:00, Laurent Vivier wrote:
>> On 04/05/2020 11:46, Philippe Mathieu-Daudé wrote:
>>> In nvme_create_queue_pair() we create a page list using
>>> qemu_blockalign(), then map it with qemu_vfio_dma_map():
>>>
>>>    q->prp_list_pages = qemu_blockalign0(bs, s->page_size * NVME_QUEUE_SIZE);
>>>    r = qemu_vfio_dma_map(s->vfio, q->prp_list_pages,
>>>                          s->page_size * NVME_QUEUE_SIZE, ...);
>>>
>>> With:
>>>
>>>    s->page_size = MAX(4096, 1 << (12 + ((cap >> 48) & 0xF)));
>>>
>>> The qemu_vfio_dma_map() documentation says "The caller need
>>> to make sure the area is aligned to page size". While we use
>>> multiple s->page_size as alignment, it might be not sufficient
>>> on some hosts. Use the qemu_real_host_page_size value to be
>>> sure the host alignment is respected.
>>>
>>> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
>>> ---
>>> Cc: Cédric Le Goater <clg@kaod.org>
>>> Cc: David Gibson <david@gibson.dropbear.id.au>
>>> Cc: Laurent Vivier <lvivier@redhat.com>
>>> ---
>>>   block/nvme.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/block/nvme.c b/block/nvme.c
>>> index 7b7c0cc5d6..bde0d28b39 100644
>>> --- a/block/nvme.c
>>> +++ b/block/nvme.c
>>> @@ -627,7 +627,7 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
>>>   
>>>       s->page_size = MAX(4096, 1 << (12 + ((cap >> 48) & 0xF)));
>>>       s->doorbell_scale = (4 << (((cap >> 32) & 0xF))) / sizeof(uint32_t);
>>> -    bs->bl.opt_mem_alignment = s->page_size;
>>> +    bs->bl.opt_mem_alignment = MAX(qemu_real_host_page_size, s->page_size);
>>
>> Why don't you replace directly the "4096" in s->page_size by
>> qemu_real_host_page_size?
>>
>> I think this was the purpose of MAX(4096, ...) to align to the host page
>> size.
> 
> oh, I see, page_size is used for a lot of things than can be broken, and
> this is what is done in bdrv_opt_mem_align() for instance, 4096 is the
> sector size not the host page size.
> 
> BTW, you need the same change as in nvme_init() in
> nvme_refresh_limits(), I think.

nvme_init() seems OK, but you are right, I missed 
nvme_refresh_limits()... Thanks!

> 
> Thanks,
> Laurent
> 
>
Philippe Mathieu-Daudé May 5, 2020, 3:51 p.m. UTC | #5
On 5/4/20 11:46 AM, Philippe Mathieu-Daudé wrote:
> In nvme_create_queue_pair() we create a page list using
> qemu_blockalign(), then map it with qemu_vfio_dma_map():
> 
>    q->prp_list_pages = qemu_blockalign0(bs, s->page_size * NVME_QUEUE_SIZE);
>    r = qemu_vfio_dma_map(s->vfio, q->prp_list_pages,
>                          s->page_size * NVME_QUEUE_SIZE, ...);
> 
> With:
> 
>    s->page_size = MAX(4096, 1 << (12 + ((cap >> 48) & 0xF)));
> 
> The qemu_vfio_dma_map() documentation says "The caller need
> to make sure the area is aligned to page size". While we use
> multiple s->page_size as alignment, it might be not sufficient
> on some hosts. Use the qemu_real_host_page_size value to be
> sure the host alignment is respected.
> 
> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> ---
> Cc: Cédric Le Goater <clg@kaod.org>
> Cc: David Gibson <david@gibson.dropbear.id.au>
> Cc: Laurent Vivier <lvivier@redhat.com>
> ---
>   block/nvme.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index 7b7c0cc5d6..bde0d28b39 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -627,7 +627,7 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
>   
>       s->page_size = MAX(4096, 1 << (12 + ((cap >> 48) & 0xF)));
>       s->doorbell_scale = (4 << (((cap >> 32) & 0xF))) / sizeof(uint32_t);
> -    bs->bl.opt_mem_alignment = s->page_size;
> +    bs->bl.opt_mem_alignment = MAX(qemu_real_host_page_size, s->page_size);
>       timeout_ms = MIN(500 * ((cap >> 24) & 0xFF), 30000);
>   
>       /* Reset device to get a clean state. */
> 

Please disregard this patch as it is incomplete.
diff mbox series

Patch

diff --git a/block/nvme.c b/block/nvme.c
index 7b7c0cc5d6..bde0d28b39 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -627,7 +627,7 @@  static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
 
     s->page_size = MAX(4096, 1 << (12 + ((cap >> 48) & 0xF)));
     s->doorbell_scale = (4 << (((cap >> 32) & 0xF))) / sizeof(uint32_t);
-    bs->bl.opt_mem_alignment = s->page_size;
+    bs->bl.opt_mem_alignment = MAX(qemu_real_host_page_size, s->page_size);
     timeout_ms = MIN(500 * ((cap >> 24) & 0xFF), 30000);
 
     /* Reset device to get a clean state. */