Message ID | 20210308150600.14440-3-david@redhat.com |
---|---|
State | New |
Headers | show |
Series | RAM_NORESERVE, MAP_NORESERVE and hostmem "reserve" property | expand |
* David Hildenbrand (david@redhat.com) wrote: > We can create shared anonymous memory via > "-object memory-backend-ram,share=on,..." > which is, for example, required by PVRDMA for mremap() to work. > > Shared anonymous memory is weird, though. Instead of MADV_DONTNEED, we > have to use MADV_REMOVE. MADV_DONTNEED fails silently and does nothing. OK, I wonder how stable these rules are; is it defined anywhere that it's required? Still, Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> > Fixes: 06329ccecfa0 ("mem: add share parameter to memory-backend-ram") > Signed-off-by: David Hildenbrand <david@redhat.com> > --- > softmmu/physmem.c | 7 ++++++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/softmmu/physmem.c b/softmmu/physmem.c > index 62ea4abbdd..2ba815fec6 100644 > --- a/softmmu/physmem.c > +++ b/softmmu/physmem.c > @@ -3506,6 +3506,7 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length) > /* The logic here is messy; > * madvise DONTNEED fails for hugepages > * fallocate works on hugepages and shmem > + * shared anonymous memory requires madvise REMOVE > */ > need_madvise = (rb->page_size == qemu_host_page_size); > need_fallocate = rb->fd != -1; > @@ -3539,7 +3540,11 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length) > * fallocate'd away). > */ > #if defined(CONFIG_MADVISE) > - ret = madvise(host_startaddr, length, MADV_DONTNEED); > + if (qemu_ram_is_shared(rb) && rb->fd < 0) { > + ret = madvise(host_startaddr, length, MADV_REMOVE); > + } else { > + ret = madvise(host_startaddr, length, MADV_DONTNEED); > + } > if (ret) { > ret = -errno; > error_report("ram_block_discard_range: Failed to discard range " > -- > 2.29.2 >
On 11.03.21 17:39, Dr. David Alan Gilbert wrote: > * David Hildenbrand (david@redhat.com) wrote: >> We can create shared anonymous memory via >> "-object memory-backend-ram,share=on,..." >> which is, for example, required by PVRDMA for mremap() to work. >> >> Shared anonymous memory is weird, though. Instead of MADV_DONTNEED, we >> have to use MADV_REMOVE. MADV_DONTNEED fails silently and does nothing. > > OK, I wonder how stable these rules are; is it defined anywhere that > it's required? > I had a look at the Linux implementation: it's essentially shmem ... but we don't have an fd exposed, so we cannot use fallocate() ... :) MADV_REMOVE documents (man): "In the initial implementation, only tmpfs(5) was supported MADV_REMOVE; but since Linux 3.5, any filesystem which supports the fallocate(2) FALLOC_FL_PUNCH_HOLE mode also supports MADV_REMOVE." Thanks! > Still, > > > Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> > >> Fixes: 06329ccecfa0 ("mem: add share parameter to memory-backend-ram") >> Signed-off-by: David Hildenbrand <david@redhat.com> >> --- >> softmmu/physmem.c | 7 ++++++- >> 1 file changed, 6 insertions(+), 1 deletion(-) >> >> diff --git a/softmmu/physmem.c b/softmmu/physmem.c >> index 62ea4abbdd..2ba815fec6 100644 >> --- a/softmmu/physmem.c >> +++ b/softmmu/physmem.c >> @@ -3506,6 +3506,7 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length) >> /* The logic here is messy; >> * madvise DONTNEED fails for hugepages >> * fallocate works on hugepages and shmem >> + * shared anonymous memory requires madvise REMOVE >> */ >> need_madvise = (rb->page_size == qemu_host_page_size); >> need_fallocate = rb->fd != -1; >> @@ -3539,7 +3540,11 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length) >> * fallocate'd away). >> */ >> #if defined(CONFIG_MADVISE) >> - ret = madvise(host_startaddr, length, MADV_DONTNEED); >> + if (qemu_ram_is_shared(rb) && rb->fd < 0) { >> + ret = madvise(host_startaddr, length, MADV_REMOVE); >> + } else { >> + ret = madvise(host_startaddr, length, MADV_DONTNEED); >> + } >> if (ret) { >> ret = -errno; >> error_report("ram_block_discard_range: Failed to discard range " >> -- >> 2.29.2 >>
On Thu, Mar 11, 2021 at 05:45:46PM +0100, David Hildenbrand wrote: > On 11.03.21 17:39, Dr. David Alan Gilbert wrote: > > * David Hildenbrand (david@redhat.com) wrote: > > > We can create shared anonymous memory via > > > "-object memory-backend-ram,share=on,..." > > > which is, for example, required by PVRDMA for mremap() to work. > > > > > > Shared anonymous memory is weird, though. Instead of MADV_DONTNEED, we > > > have to use MADV_REMOVE. MADV_DONTNEED fails silently and does nothing. > > > > OK, I wonder how stable these rules are; is it defined anywhere that > > it's required? > > > > I had a look at the Linux implementation: it's essentially shmem ... but we > don't have an fd exposed, so we cannot use fallocate() ... :) > > MADV_REMOVE documents (man): > > "In the initial implementation, only tmpfs(5) was supported MADV_REMOVE; but > since Linux 3.5, any filesystem which supports the fallocate(2) > FALLOC_FL_PUNCH_HOLE mode also supports MADV_REMOVE." Hmm, I see that MADV_DONTNEED will still tear down all mappings even for anonymous shmem.. what did I miss?
On 11.03.21 18:11, Peter Xu wrote: > On Thu, Mar 11, 2021 at 05:45:46PM +0100, David Hildenbrand wrote: >> On 11.03.21 17:39, Dr. David Alan Gilbert wrote: >>> * David Hildenbrand (david@redhat.com) wrote: >>>> We can create shared anonymous memory via >>>> "-object memory-backend-ram,share=on,..." >>>> which is, for example, required by PVRDMA for mremap() to work. >>>> >>>> Shared anonymous memory is weird, though. Instead of MADV_DONTNEED, we >>>> have to use MADV_REMOVE. MADV_DONTNEED fails silently and does nothing. >>> >>> OK, I wonder how stable these rules are; is it defined anywhere that >>> it's required? >>> >> >> I had a look at the Linux implementation: it's essentially shmem ... but we >> don't have an fd exposed, so we cannot use fallocate() ... :) >> >> MADV_REMOVE documents (man): >> >> "In the initial implementation, only tmpfs(5) was supported MADV_REMOVE; but >> since Linux 3.5, any filesystem which supports the fallocate(2) >> FALLOC_FL_PUNCH_HOLE mode also supports MADV_REMOVE." > > Hmm, I see that MADV_DONTNEED will still tear down all mappings even for > anonymous shmem.. what did I miss? Where did you see that? > MADV_DONTNEED only invalidates private copies in the pagecache. It's essentially useless for any kind of shared mappings. (I am 99.9% sure that we can replace fallocate()+MADV_DONTNEED by fallocate() for fd-based shared mappings, but that's a different story)
On 11.03.21 18:15, David Hildenbrand wrote: > On 11.03.21 18:11, Peter Xu wrote: >> On Thu, Mar 11, 2021 at 05:45:46PM +0100, David Hildenbrand wrote: >>> On 11.03.21 17:39, Dr. David Alan Gilbert wrote: >>>> * David Hildenbrand (david@redhat.com) wrote: >>>>> We can create shared anonymous memory via >>>>> "-object memory-backend-ram,share=on,..." >>>>> which is, for example, required by PVRDMA for mremap() to work. >>>>> >>>>> Shared anonymous memory is weird, though. Instead of MADV_DONTNEED, we >>>>> have to use MADV_REMOVE. MADV_DONTNEED fails silently and does nothing. >>>> >>>> OK, I wonder how stable these rules are; is it defined anywhere that >>>> it's required? >>>> >>> >>> I had a look at the Linux implementation: it's essentially shmem ... but we >>> don't have an fd exposed, so we cannot use fallocate() ... :) >>> >>> MADV_REMOVE documents (man): >>> >>> "In the initial implementation, only tmpfs(5) was supported MADV_REMOVE; but >>> since Linux 3.5, any filesystem which supports the fallocate(2) >>> FALLOC_FL_PUNCH_HOLE mode also supports MADV_REMOVE." >> >> Hmm, I see that MADV_DONTNEED will still tear down all mappings even for >> anonymous shmem.. what did I miss? > > Where did you see that? > >> > > MADV_DONTNEED only invalidates private copies in the pagecache. It's > essentially useless for any kind of shared mappings. And to clarify, I think what you see is that the mapping gets torn down, but not the backend storage released/freed. Removing the backend storage (MADV_REMOVE/fallocate()) will implicitly tear down the mapping from what I can tell (and what my experiments show).
On Thu, Mar 11, 2021 at 06:15:15PM +0100, David Hildenbrand wrote: > On 11.03.21 18:11, Peter Xu wrote: > > On Thu, Mar 11, 2021 at 05:45:46PM +0100, David Hildenbrand wrote: > > > On 11.03.21 17:39, Dr. David Alan Gilbert wrote: > > > > * David Hildenbrand (david@redhat.com) wrote: > > > > > We can create shared anonymous memory via > > > > > "-object memory-backend-ram,share=on,..." > > > > > which is, for example, required by PVRDMA for mremap() to work. > > > > > > > > > > Shared anonymous memory is weird, though. Instead of MADV_DONTNEED, we > > > > > have to use MADV_REMOVE. MADV_DONTNEED fails silently and does nothing. > > > > > > > > OK, I wonder how stable these rules are; is it defined anywhere that > > > > it's required? > > > > > > > > > > I had a look at the Linux implementation: it's essentially shmem ... but we > > > don't have an fd exposed, so we cannot use fallocate() ... :) > > > > > > MADV_REMOVE documents (man): > > > > > > "In the initial implementation, only tmpfs(5) was supported MADV_REMOVE; but > > > since Linux 3.5, any filesystem which supports the fallocate(2) > > > FALLOC_FL_PUNCH_HOLE mode also supports MADV_REMOVE." > > > > Hmm, I see that MADV_DONTNEED will still tear down all mappings even for > > anonymous shmem.. what did I miss? > > Where did you see that? I see madvise_dontneed_free() calls zap_page_range(). > > > > > MADV_DONTNEED only invalidates private copies in the pagecache. It's > essentially useless for any kind of shared mappings. Since it's about zapping page tables, then I don't understand why it won't work for shmem..
On 11.03.21 18:22, Peter Xu wrote: > On Thu, Mar 11, 2021 at 06:15:15PM +0100, David Hildenbrand wrote: >> On 11.03.21 18:11, Peter Xu wrote: >>> On Thu, Mar 11, 2021 at 05:45:46PM +0100, David Hildenbrand wrote: >>>> On 11.03.21 17:39, Dr. David Alan Gilbert wrote: >>>>> * David Hildenbrand (david@redhat.com) wrote: >>>>>> We can create shared anonymous memory via >>>>>> "-object memory-backend-ram,share=on,..." >>>>>> which is, for example, required by PVRDMA for mremap() to work. >>>>>> >>>>>> Shared anonymous memory is weird, though. Instead of MADV_DONTNEED, we >>>>>> have to use MADV_REMOVE. MADV_DONTNEED fails silently and does nothing. >>>>> >>>>> OK, I wonder how stable these rules are; is it defined anywhere that >>>>> it's required? >>>>> >>>> >>>> I had a look at the Linux implementation: it's essentially shmem ... but we >>>> don't have an fd exposed, so we cannot use fallocate() ... :) >>>> >>>> MADV_REMOVE documents (man): >>>> >>>> "In the initial implementation, only tmpfs(5) was supported MADV_REMOVE; but >>>> since Linux 3.5, any filesystem which supports the fallocate(2) >>>> FALLOC_FL_PUNCH_HOLE mode also supports MADV_REMOVE." >>> >>> Hmm, I see that MADV_DONTNEED will still tear down all mappings even for >>> anonymous shmem.. what did I miss? >> >> Where did you see that? > > I see madvise_dontneed_free() calls zap_page_range(). > >> >>> >> >> MADV_DONTNEED only invalidates private copies in the pagecache. It's >> essentially useless for any kind of shared mappings. Let me rephrase because it was wrong: MADV_DONTNEED invalidates private COW pages referenced in the page tables :) > > Since it's about zapping page tables, then I don't understand why it won't work > for shmem.. It zaps the page tables but the shmem pages are still referenced (in the pagecache AFAIU). On next user space access, you would fill the page tables with the previous content. That's why MADV_DONTNEED works properly on private anonymous memory, but not on shared anonymous memory - the only valid references are in the page tables in case of private mappings (well, unless we have other references like GUP etc.). I did wonder, however, if there is benefit in doing both: MADV_REMOVE followed by MADV_DONTNEED or the other way around. Like, will the extra MADV_DONTNEED also remove page tables and not just invalidate/zap the entries. Doesn't make a difference functionality-wise, but memory-consumption-wise. I'll still have to have a look.
On Thu, Mar 11, 2021 at 06:41:29PM +0100, David Hildenbrand wrote: > It zaps the page tables but the shmem pages are still referenced (in the > pagecache AFAIU). On next user space access, you would fill the page tables > with the previous content. > > That's why MADV_DONTNEED works properly on private anonymous memory, but not > on shared anonymous memory - the only valid references are in the page > tables in case of private mappings (well, unless we have other references > like GUP etc.). For some reason I thought anonymous shared memory could do auto-recycle, but after a second thought what you said makes perfect sense. > > > I did wonder, however, if there is benefit in doing both: > > MADV_REMOVE followed by MADV_DONTNEED or the other way around. Like, will > the extra MADV_DONTNEED also remove page tables and not just invalidate/zap > the entries. Doesn't make a difference functionality-wise, but > memory-consumption-wise. > > I'll still have to have a look. I saw your other email - that'll be another topic of course. For now I believe it's not necessary, and your current patch looks valid. I just hope when qemu decides to disgard the range, we're sure the rdma mremap() region have been unmaped - iiuc that's the only use case of that. Otherwise data would corrupt.
On Mon, Mar 08, 2021 at 04:05:50PM +0100, David Hildenbrand wrote: > We can create shared anonymous memory via > "-object memory-backend-ram,share=on,..." > which is, for example, required by PVRDMA for mremap() to work. > > Shared anonymous memory is weird, though. Instead of MADV_DONTNEED, we > have to use MADV_REMOVE. MADV_DONTNEED fails silently and does nothing. > > Fixes: 06329ccecfa0 ("mem: add share parameter to memory-backend-ram") I'm thinking whether we should keep this fixes - it's valid, however it could unveil issues if those remapped ranges didn't get unmapped in time. After all "not releasing some memory existed" seems not a huge deal for stable. No strong opinion, just raise it up as a pure question. > Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>
On 11.03.21 22:37, Peter Xu wrote: > On Mon, Mar 08, 2021 at 04:05:50PM +0100, David Hildenbrand wrote: >> We can create shared anonymous memory via >> "-object memory-backend-ram,share=on,..." >> which is, for example, required by PVRDMA for mremap() to work. >> >> Shared anonymous memory is weird, though. Instead of MADV_DONTNEED, we >> have to use MADV_REMOVE. MADV_DONTNEED fails silently and does nothing. >> >> Fixes: 06329ccecfa0 ("mem: add share parameter to memory-backend-ram") > > I'm thinking whether we should keep this fixes - it's valid, however it could > unveil issues if those remapped ranges didn't get unmapped in time. After all > "not releasing some memory existed" seems not a huge deal for stable. No > strong opinion, just raise it up as a pure question. > If someone would be using it along with postcopy (which should work apart from that issue) you could be in trouble. That's why i think at least the Fixes: tag is valid. CC: stable might be debatable indeed. >> Signed-off-by: David Hildenbrand <david@redhat.com> > > Reviewed-by: Peter Xu <peterx@redhat.com> > Thanks!
diff --git a/softmmu/physmem.c b/softmmu/physmem.c index 62ea4abbdd..2ba815fec6 100644 --- a/softmmu/physmem.c +++ b/softmmu/physmem.c @@ -3506,6 +3506,7 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length) /* The logic here is messy; * madvise DONTNEED fails for hugepages * fallocate works on hugepages and shmem + * shared anonymous memory requires madvise REMOVE */ need_madvise = (rb->page_size == qemu_host_page_size); need_fallocate = rb->fd != -1; @@ -3539,7 +3540,11 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length) * fallocate'd away). */ #if defined(CONFIG_MADVISE) - ret = madvise(host_startaddr, length, MADV_DONTNEED); + if (qemu_ram_is_shared(rb) && rb->fd < 0) { + ret = madvise(host_startaddr, length, MADV_REMOVE); + } else { + ret = madvise(host_startaddr, length, MADV_DONTNEED); + } if (ret) { ret = -errno; error_report("ram_block_discard_range: Failed to discard range "
We can create shared anonymous memory via "-object memory-backend-ram,share=on,..." which is, for example, required by PVRDMA for mremap() to work. Shared anonymous memory is weird, though. Instead of MADV_DONTNEED, we have to use MADV_REMOVE. MADV_DONTNEED fails silently and does nothing. Fixes: 06329ccecfa0 ("mem: add share parameter to memory-backend-ram") Signed-off-by: David Hildenbrand <david@redhat.com> --- softmmu/physmem.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)