Message ID | 20200116202414.157959-3-dgilbert@redhat.com |
---|---|
State | New |
Headers | show |
Series | Fix hyperv synic on vhost | expand |
On 16/01/20 21:24, Dr. David Alan Gilbert (git) wrote: > + if (dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER) { > + /* Round the section to it's page size */ > + /* First align the start down to a page boundary */ > + size_t mrs_page = qemu_ram_pagesize(mrs_rb); > + uint64_t alignage = mrs_host & (mrs_page - 1); > + if (alignage) { > + mrs_host -= alignage; > + mrs_size += alignage; > + mrs_gpa -= alignage; > + } > + /* Now align the size up to a page boundary */ > + alignage = mrs_size & (mrs_page - 1); > + if (alignage) { > + mrs_size += mrs_page - alignage; > + } > + trace_vhost_region_add_section_aligned(section->mr->name, mrs_gpa, mrs_size, > + mrs_host); > + } Ok, now I understand! So it seems to me that the vhost-user protocol is deficient, it should have had two different fields for the region size and the region alignment (so that mmap does not fail). But I guess that's just yet another thing to remember for vhost-user v2. I would add a comment to explain why the alignment is needed in the first place, but this fix is certainly much more pleasant. Thanks very much. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Paolo
On Fri, Jan 17, 2020 at 01:52:44PM +0100, Paolo Bonzini wrote: > On 16/01/20 21:24, Dr. David Alan Gilbert (git) wrote: > > + if (dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER) { > > + /* Round the section to it's page size */ > > + /* First align the start down to a page boundary */ > > + size_t mrs_page = qemu_ram_pagesize(mrs_rb); > > + uint64_t alignage = mrs_host & (mrs_page - 1); > > + if (alignage) { > > + mrs_host -= alignage; > > + mrs_size += alignage; > > + mrs_gpa -= alignage; > > + } > > + /* Now align the size up to a page boundary */ > > + alignage = mrs_size & (mrs_page - 1); > > + if (alignage) { > > + mrs_size += mrs_page - alignage; > > + } > > + trace_vhost_region_add_section_aligned(section->mr->name, mrs_gpa, mrs_size, > > + mrs_host); > > + } > > Ok, now I understand! > > So it seems to me that the vhost-user protocol is deficient, it should > have had two different fields for the region size and the region > alignment (so that mmap does not fail). But I guess that's just yet > another thing to remember for vhost-user v2. We don't really need v2 just to add a field. Compatibility is maintained using feature bits. Adding that is a subject for another patch. But I'm not sure I understand why does remote need to know about alignment. This patch seems to handle it locally ... > I would add a comment to explain why the alignment is needed in the > first place, but this fix is certainly much more pleasant. Thanks very > much. > > Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> > > Paolo
On 17/01/20 14:40, Michael S. Tsirkin wrote: > We don't really need v2 just to add a field. Compatibility is maintained > using feature bits. Adding that is a subject for another patch. > But I'm not sure I understand why does remote need to know about alignment. > This patch seems to handle it locally ... Because the remote vhost here will not be able to use the synic regions. If it did, it would have the same overlap problem as vhost-kernel. The alignment is needed because, even if you are mapping only [768k,1M) of a 2M hugepage, you need to mmap [0,2M). You can then discard the rest, but IIUC if you only mmap [768k,1M) then the kernel will fail the mmap. Paolo
On Fri, Jan 17, 2020 at 02:58:47PM +0100, Paolo Bonzini wrote: > On 17/01/20 14:40, Michael S. Tsirkin wrote: > > We don't really need v2 just to add a field. Compatibility is maintained > > using feature bits. Adding that is a subject for another patch. > > But I'm not sure I understand why does remote need to know about alignment. > > This patch seems to handle it locally ... > > Because the remote vhost here will not be able to use the synic regions. > If it did, it would have the same overlap problem as vhost-kernel. > > The alignment is needed because, even if you are mapping only [768k,1M) > of a 2M hugepage, you need to mmap [0,2M). You can then discard the > rest, but IIUC if you only mmap [768k,1M) then the kernel will fail the > mmap. > > Paolo So right now remote will query the fd passed to get the alignment. You are basically saying it's not enough in some cases?
On 17/01/20 15:25, Michael S. Tsirkin wrote: > On Fri, Jan 17, 2020 at 02:58:47PM +0100, Paolo Bonzini wrote: >> On 17/01/20 14:40, Michael S. Tsirkin wrote: >>> We don't really need v2 just to add a field. Compatibility is maintained >>> using feature bits. Adding that is a subject for another patch. >>> But I'm not sure I understand why does remote need to know about alignment. >>> This patch seems to handle it locally ... >> >> Because the remote vhost here will not be able to use the synic regions. >> If it did, it would have the same overlap problem as vhost-kernel. >> >> The alignment is needed because, even if you are mapping only [768k,1M) >> of a 2M hugepage, you need to mmap [0,2M). You can then discard the >> rest, but IIUC if you only mmap [768k,1M) then the kernel will fail the >> mmap. > > So right now remote will query the fd passed to get the alignment. It should, but will it? It's not in the spec and I assume QEMU is doing this alignment work because some server is not doing it. But indeed we could use a feature bit to say "don't worry I will be doing the right thing". Paolo > You are basically saying it's not enough in some cases? >
* Paolo Bonzini (pbonzini@redhat.com) wrote: > On 17/01/20 14:40, Michael S. Tsirkin wrote: > > We don't really need v2 just to add a field. Compatibility is maintained > > using feature bits. Adding that is a subject for another patch. > > But I'm not sure I understand why does remote need to know about alignment. > > This patch seems to handle it locally ... > > Because the remote vhost here will not be able to use the synic regions. > If it did, it would have the same overlap problem as vhost-kernel. > > The alignment is needed because, even if you are mapping only [768k,1M) > of a 2M hugepage, you need to mmap [0,2M). You can then discard the > rest, but IIUC if you only mmap [768k,1M) then the kernel will fail the > mmap. The vhost-user problem is worse than that. Lets ignore the synic regions for a minute; imagine a normal PC, 0-512k ish, and a 512k-1GB map; the client side can do the alignment to allow an mmap to work for each of those two mappings, you then end up with two mmap's on the client, both aligned at 0, each covering part of the same range. If the client is careful in normal use it can make sure it's OK, but you do end up with two mappings which is odd. Once you add userfault for psotcopy, having those two mappings of the same range gets even more problematic when we wait for one of the pages to become available. The other problem is that there's a limited number of slots for mappings in the vhost-user protocol; and they easily get filled up if you break up the memory ranges. Doing the aggregation on the qemu side keeps your slot usage down and also removes the problem of the double mappings. The client can't easily resolve the double mapping alignment because it can't tell if the two fd's it's been passed actually refer to the same backing file; so it can't aggregate (from memory). Dave > Paolo > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c index 774d87d98e..25fd469179 100644 --- a/hw/virtio/vhost.c +++ b/hw/virtio/vhost.c @@ -547,26 +547,28 @@ static void vhost_region_add_section(struct vhost_dev *dev, uintptr_t mrs_host = (uintptr_t)memory_region_get_ram_ptr(section->mr) + section->offset_within_region; RAMBlock *mrs_rb = section->mr->ram_block; - size_t mrs_page = qemu_ram_pagesize(mrs_rb); trace_vhost_region_add_section(section->mr->name, mrs_gpa, mrs_size, mrs_host); - /* Round the section to it's page size */ - /* First align the start down to a page boundary */ - uint64_t alignage = mrs_host & (mrs_page - 1); - if (alignage) { - mrs_host -= alignage; - mrs_size += alignage; - mrs_gpa -= alignage; - } - /* Now align the size up to a page boundary */ - alignage = mrs_size & (mrs_page - 1); - if (alignage) { - mrs_size += mrs_page - alignage; - } - trace_vhost_region_add_section_aligned(section->mr->name, mrs_gpa, mrs_size, - mrs_host); + if (dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER) { + /* Round the section to it's page size */ + /* First align the start down to a page boundary */ + size_t mrs_page = qemu_ram_pagesize(mrs_rb); + uint64_t alignage = mrs_host & (mrs_page - 1); + if (alignage) { + mrs_host -= alignage; + mrs_size += alignage; + mrs_gpa -= alignage; + } + /* Now align the size up to a page boundary */ + alignage = mrs_size & (mrs_page - 1); + if (alignage) { + mrs_size += mrs_page - alignage; + } + trace_vhost_region_add_section_aligned(section->mr->name, mrs_gpa, mrs_size, + mrs_host); + } if (dev->n_tmp_sections) { /* Since we already have at least one section, lets see if