Message ID | 20200504094641.4963-6-philmd@redhat.com |
---|---|
State | New |
Headers | show |
Series | block/nvme: Align block pages queue to host page size | expand |
On Mon, May 04, 2020 at 11:46:40AM +0200, Philippe Mathieu-Daudé wrote: > In nvme_create_queue_pair() we create a page list using > qemu_blockalign(), then map it with qemu_vfio_dma_map(): > > q->prp_list_pages = qemu_blockalign0(bs, s->page_size * NVME_QUEUE_SIZE); > r = qemu_vfio_dma_map(s->vfio, q->prp_list_pages, > s->page_size * NVME_QUEUE_SIZE, ...); > > With: > > s->page_size = MAX(4096, 1 << (12 + ((cap >> 48) & 0xF))); > > The qemu_vfio_dma_map() documentation says "The caller need > to make sure the area is aligned to page size". While we use > multiple s->page_size as alignment, it might be not sufficient > on some hosts. Use the qemu_real_host_page_size value to be > sure the host alignment is respected. > > Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> > --- > Cc: Cédric Le Goater <clg@kaod.org> > Cc: David Gibson <david@gibson.dropbear.id.au> > Cc: Laurent Vivier <lvivier@redhat.com> > --- > block/nvme.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/block/nvme.c b/block/nvme.c > index 7b7c0cc5d6..bde0d28b39 100644 > --- a/block/nvme.c > +++ b/block/nvme.c > @@ -627,7 +627,7 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace, > > s->page_size = MAX(4096, 1 << (12 + ((cap >> 48) & 0xF))); > s->doorbell_scale = (4 << (((cap >> 32) & 0xF))) / sizeof(uint32_t); > - bs->bl.opt_mem_alignment = s->page_size; > + bs->bl.opt_mem_alignment = MAX(qemu_real_host_page_size, s->page_size); > timeout_ms = MIN(500 * ((cap >> 24) & 0xFF), 30000); > > /* Reset device to get a clean state. */
On 04/05/2020 11:46, Philippe Mathieu-Daudé wrote: > In nvme_create_queue_pair() we create a page list using > qemu_blockalign(), then map it with qemu_vfio_dma_map(): > > q->prp_list_pages = qemu_blockalign0(bs, s->page_size * NVME_QUEUE_SIZE); > r = qemu_vfio_dma_map(s->vfio, q->prp_list_pages, > s->page_size * NVME_QUEUE_SIZE, ...); > > With: > > s->page_size = MAX(4096, 1 << (12 + ((cap >> 48) & 0xF))); > > The qemu_vfio_dma_map() documentation says "The caller need > to make sure the area is aligned to page size". While we use > multiple s->page_size as alignment, it might be not sufficient > on some hosts. Use the qemu_real_host_page_size value to be > sure the host alignment is respected. > > Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> > --- > Cc: Cédric Le Goater <clg@kaod.org> > Cc: David Gibson <david@gibson.dropbear.id.au> > Cc: Laurent Vivier <lvivier@redhat.com> > --- > block/nvme.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/block/nvme.c b/block/nvme.c > index 7b7c0cc5d6..bde0d28b39 100644 > --- a/block/nvme.c > +++ b/block/nvme.c > @@ -627,7 +627,7 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace, > > s->page_size = MAX(4096, 1 << (12 + ((cap >> 48) & 0xF))); > s->doorbell_scale = (4 << (((cap >> 32) & 0xF))) / sizeof(uint32_t); > - bs->bl.opt_mem_alignment = s->page_size; > + bs->bl.opt_mem_alignment = MAX(qemu_real_host_page_size, s->page_size); Why don't you replace directly the "4096" in s->page_size by qemu_real_host_page_size? I think this was the purpose of MAX(4096, ...) to align to the host page size. Thanks, Laurnt
On 05/05/2020 10:00, Laurent Vivier wrote: > On 04/05/2020 11:46, Philippe Mathieu-Daudé wrote: >> In nvme_create_queue_pair() we create a page list using >> qemu_blockalign(), then map it with qemu_vfio_dma_map(): >> >> q->prp_list_pages = qemu_blockalign0(bs, s->page_size * NVME_QUEUE_SIZE); >> r = qemu_vfio_dma_map(s->vfio, q->prp_list_pages, >> s->page_size * NVME_QUEUE_SIZE, ...); >> >> With: >> >> s->page_size = MAX(4096, 1 << (12 + ((cap >> 48) & 0xF))); >> >> The qemu_vfio_dma_map() documentation says "The caller need >> to make sure the area is aligned to page size". While we use >> multiple s->page_size as alignment, it might be not sufficient >> on some hosts. Use the qemu_real_host_page_size value to be >> sure the host alignment is respected. >> >> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> >> --- >> Cc: Cédric Le Goater <clg@kaod.org> >> Cc: David Gibson <david@gibson.dropbear.id.au> >> Cc: Laurent Vivier <lvivier@redhat.com> >> --- >> block/nvme.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/block/nvme.c b/block/nvme.c >> index 7b7c0cc5d6..bde0d28b39 100644 >> --- a/block/nvme.c >> +++ b/block/nvme.c >> @@ -627,7 +627,7 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace, >> >> s->page_size = MAX(4096, 1 << (12 + ((cap >> 48) & 0xF))); >> s->doorbell_scale = (4 << (((cap >> 32) & 0xF))) / sizeof(uint32_t); >> - bs->bl.opt_mem_alignment = s->page_size; >> + bs->bl.opt_mem_alignment = MAX(qemu_real_host_page_size, s->page_size); > > Why don't you replace directly the "4096" in s->page_size by > qemu_real_host_page_size? > > I think this was the purpose of MAX(4096, ...) to align to the host page > size. oh, I see, page_size is used for a lot of things than can be broken, and this is what is done in bdrv_opt_mem_align() for instance, 4096 is the sector size not the host page size. BTW, you need the same change as in nvme_init() in nvme_refresh_limits(), I think. Thanks, Laurent
On 5/5/20 10:23 AM, Laurent Vivier wrote: > On 05/05/2020 10:00, Laurent Vivier wrote: >> On 04/05/2020 11:46, Philippe Mathieu-Daudé wrote: >>> In nvme_create_queue_pair() we create a page list using >>> qemu_blockalign(), then map it with qemu_vfio_dma_map(): >>> >>> q->prp_list_pages = qemu_blockalign0(bs, s->page_size * NVME_QUEUE_SIZE); >>> r = qemu_vfio_dma_map(s->vfio, q->prp_list_pages, >>> s->page_size * NVME_QUEUE_SIZE, ...); >>> >>> With: >>> >>> s->page_size = MAX(4096, 1 << (12 + ((cap >> 48) & 0xF))); >>> >>> The qemu_vfio_dma_map() documentation says "The caller need >>> to make sure the area is aligned to page size". While we use >>> multiple s->page_size as alignment, it might be not sufficient >>> on some hosts. Use the qemu_real_host_page_size value to be >>> sure the host alignment is respected. >>> >>> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> >>> --- >>> Cc: Cédric Le Goater <clg@kaod.org> >>> Cc: David Gibson <david@gibson.dropbear.id.au> >>> Cc: Laurent Vivier <lvivier@redhat.com> >>> --- >>> block/nvme.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/block/nvme.c b/block/nvme.c >>> index 7b7c0cc5d6..bde0d28b39 100644 >>> --- a/block/nvme.c >>> +++ b/block/nvme.c >>> @@ -627,7 +627,7 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace, >>> >>> s->page_size = MAX(4096, 1 << (12 + ((cap >> 48) & 0xF))); >>> s->doorbell_scale = (4 << (((cap >> 32) & 0xF))) / sizeof(uint32_t); >>> - bs->bl.opt_mem_alignment = s->page_size; >>> + bs->bl.opt_mem_alignment = MAX(qemu_real_host_page_size, s->page_size); >> >> Why don't you replace directly the "4096" in s->page_size by >> qemu_real_host_page_size? >> >> I think this was the purpose of MAX(4096, ...) to align to the host page >> size. > > oh, I see, page_size is used for a lot of things than can be broken, and > this is what is done in bdrv_opt_mem_align() for instance, 4096 is the > sector size not the host page size. > > BTW, you need the same change as in nvme_init() in > nvme_refresh_limits(), I think. nvme_init() seems OK, but you are right, I missed nvme_refresh_limits()... Thanks! > > Thanks, > Laurent > >
On 5/4/20 11:46 AM, Philippe Mathieu-Daudé wrote: > In nvme_create_queue_pair() we create a page list using > qemu_blockalign(), then map it with qemu_vfio_dma_map(): > > q->prp_list_pages = qemu_blockalign0(bs, s->page_size * NVME_QUEUE_SIZE); > r = qemu_vfio_dma_map(s->vfio, q->prp_list_pages, > s->page_size * NVME_QUEUE_SIZE, ...); > > With: > > s->page_size = MAX(4096, 1 << (12 + ((cap >> 48) & 0xF))); > > The qemu_vfio_dma_map() documentation says "The caller need > to make sure the area is aligned to page size". While we use > multiple s->page_size as alignment, it might be not sufficient > on some hosts. Use the qemu_real_host_page_size value to be > sure the host alignment is respected. > > Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> > --- > Cc: Cédric Le Goater <clg@kaod.org> > Cc: David Gibson <david@gibson.dropbear.id.au> > Cc: Laurent Vivier <lvivier@redhat.com> > --- > block/nvme.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/block/nvme.c b/block/nvme.c > index 7b7c0cc5d6..bde0d28b39 100644 > --- a/block/nvme.c > +++ b/block/nvme.c > @@ -627,7 +627,7 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace, > > s->page_size = MAX(4096, 1 << (12 + ((cap >> 48) & 0xF))); > s->doorbell_scale = (4 << (((cap >> 32) & 0xF))) / sizeof(uint32_t); > - bs->bl.opt_mem_alignment = s->page_size; > + bs->bl.opt_mem_alignment = MAX(qemu_real_host_page_size, s->page_size); > timeout_ms = MIN(500 * ((cap >> 24) & 0xFF), 30000); > > /* Reset device to get a clean state. */ > Please disregard this patch as it is incomplete.
diff --git a/block/nvme.c b/block/nvme.c index 7b7c0cc5d6..bde0d28b39 100644 --- a/block/nvme.c +++ b/block/nvme.c @@ -627,7 +627,7 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace, s->page_size = MAX(4096, 1 << (12 + ((cap >> 48) & 0xF))); s->doorbell_scale = (4 << (((cap >> 32) & 0xF))) / sizeof(uint32_t); - bs->bl.opt_mem_alignment = s->page_size; + bs->bl.opt_mem_alignment = MAX(qemu_real_host_page_size, s->page_size); timeout_ms = MIN(500 * ((cap >> 24) & 0xFF), 30000); /* Reset device to get a clean state. */
In nvme_create_queue_pair() we create a page list using qemu_blockalign(), then map it with qemu_vfio_dma_map(): q->prp_list_pages = qemu_blockalign0(bs, s->page_size * NVME_QUEUE_SIZE); r = qemu_vfio_dma_map(s->vfio, q->prp_list_pages, s->page_size * NVME_QUEUE_SIZE, ...); With: s->page_size = MAX(4096, 1 << (12 + ((cap >> 48) & 0xF))); The qemu_vfio_dma_map() documentation says "The caller need to make sure the area is aligned to page size". While we use multiple s->page_size as alignment, it might be not sufficient on some hosts. Use the qemu_real_host_page_size value to be sure the host alignment is respected. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> --- Cc: Cédric Le Goater <clg@kaod.org> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Laurent Vivier <lvivier@redhat.com> --- block/nvme.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)