diff mbox

[v2,05/16] postcopy: enhance ram_block_discard_range for hugepages

Message ID 20170206173306.20603-6-dgilbert@redhat.com
State New
Headers show

Commit Message

Dr. David Alan Gilbert Feb. 6, 2017, 5:32 p.m. UTC
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Unfortunately madvise DONTNEED doesn't work on hugepagetlb
so use fallocate(FALLOC_FL_PUNCH_HOLE)
qemu_fd_getpagesize only sets the page based off a file
if the file is from hugetlbfs.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 exec.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

Comments

Juan Quintela Feb. 24, 2017, 1:20 p.m. UTC | #1
"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Unfortunately madvise DONTNEED doesn't work on hugepagetlb
> so use fallocate(FALLOC_FL_PUNCH_HOLE)
> qemu_fd_getpagesize only sets the page based off a file
> if the file is from hugetlbfs.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

But ...


> ---
>  exec.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/exec.c b/exec.c
> index e040cdf..c25f6b3 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -3324,9 +3324,20 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
>  
>          errno = ENOTSUP; /* If we are missing MADVISE etc */
>  
> +        if (rb->page_size == qemu_host_page_size) {
>  #if defined(CONFIG_MADVISE)
> -        ret = qemu_madvise(host_startaddr, length, QEMU_MADV_DONTNEED);
> +            ret = qemu_madvise(host_startaddr, length, QEMU_MADV_DONTNEED);
>  #endif
> +        } else {
> +            /* Huge page case  - unfortunately it can't do DONTNEED, but
> +             * it can do the equivalent by FALLOC_FL_PUNCH_HOLE in the
> +             * huge page file.
> +             */
> +#ifdef CONFIG_FALLOCATE_PUNCH_HOLE
> +            ret = fallocate(rb->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
> +                            start, length);


Why can't we use fallocate() when !CONFIG_MADVISE?

or even ...

         if (rb->page_size == qemu_host_page_size) {
#if defined(CONFIG_MADVISE)
            ret = qemu_madvise(host_startaddr, length, QEMU_MADV_DONTNEED);
#endif
          }

          if (ret == -1) {
              /* Huge page case  - unfortunately it can't do DONTNEED, but
               * it can do the equivalent by FALLOC_FL_PUNCH_HOLE in the
               * huge page file.
               */
#ifdef CONFIG_FALLOCATE_PUNCH_HOLE
           ret = fallocate(rb->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                            start, length);
#endif
          }
Dr. David Alan Gilbert Feb. 24, 2017, 1:44 p.m. UTC | #2
* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > Unfortunately madvise DONTNEED doesn't work on hugepagetlb
> > so use fallocate(FALLOC_FL_PUNCH_HOLE)
> > qemu_fd_getpagesize only sets the page based off a file
> > if the file is from hugetlbfs.
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> Reviewed-by: Juan Quintela <quintela@redhat.com>
> 
> But ...
> 
> 
> > ---
> >  exec.c | 13 ++++++++++++-
> >  1 file changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/exec.c b/exec.c
> > index e040cdf..c25f6b3 100644
> > --- a/exec.c
> > +++ b/exec.c
> > @@ -3324,9 +3324,20 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
> >  
> >          errno = ENOTSUP; /* If we are missing MADVISE etc */
> >  
> > +        if (rb->page_size == qemu_host_page_size) {
> >  #if defined(CONFIG_MADVISE)
> > -        ret = qemu_madvise(host_startaddr, length, QEMU_MADV_DONTNEED);
> > +            ret = qemu_madvise(host_startaddr, length, QEMU_MADV_DONTNEED);
> >  #endif
> > +        } else {
> > +            /* Huge page case  - unfortunately it can't do DONTNEED, but
> > +             * it can do the equivalent by FALLOC_FL_PUNCH_HOLE in the
> > +             * huge page file.
> > +             */
> > +#ifdef CONFIG_FALLOCATE_PUNCH_HOLE
> > +            ret = fallocate(rb->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
> > +                            start, length);
> 
> 
> Why can't we use fallocate() when !CONFIG_MADVISE?
> 
> or even ...
> 
>          if (rb->page_size == qemu_host_page_size) {
> #if defined(CONFIG_MADVISE)
>             ret = qemu_madvise(host_startaddr, length, QEMU_MADV_DONTNEED);
> #endif
>           }
> 
>           if (ret == -1) {
>               /* Huge page case  - unfortunately it can't do DONTNEED, but
>                * it can do the equivalent by FALLOC_FL_PUNCH_HOLE in the
>                * huge page file.
>                */
> #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
>            ret = fallocate(rb->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
>                             start, length);
> #endif
>           }

The fallocate only works where we have an fd, e.g. in the hugepage case;
the madvise only works where we have anonymous memory.  So if we don't have
madvise, we can't use fallocate for normal anonymous memory.

Actually, it's much more complicated than that - I've got another patch
that adds support for postcopy with memory that's backed by tmpfs with shared=true
and that also uses fallocate;  I'm trying to decide if it also needs
the madvise.

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Laurent Vivier Feb. 24, 2017, 2:20 p.m. UTC | #3
On 06/02/2017 18:32, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Unfortunately madvise DONTNEED doesn't work on hugepagetlb
> so use fallocate(FALLOC_FL_PUNCH_HOLE)
> qemu_fd_getpagesize only sets the page based off a file
> if the file is from hugetlbfs.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  exec.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/exec.c b/exec.c
> index e040cdf..c25f6b3 100644
> --- a/exec.c
> +++ b/exec.c

You should move here the "#include" from PATCH 04/16

+#ifdef CONFIG_FALLOCATE_PUNCH_HOLE
+#include <fcntl.h>
+#include <linux/falloc.h>
+#endif

> @@ -3324,9 +3324,20 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
>  
>          errno = ENOTSUP; /* If we are missing MADVISE etc */
>  
> +        if (rb->page_size == qemu_host_page_size) {
>  #if defined(CONFIG_MADVISE)
> -        ret = qemu_madvise(host_startaddr, length, QEMU_MADV_DONTNEED);
> +            ret = qemu_madvise(host_startaddr, length, QEMU_MADV_DONTNEED);
>  #endif
> +        } else {
> +            /* Huge page case  - unfortunately it can't do DONTNEED, but
> +             * it can do the equivalent by FALLOC_FL_PUNCH_HOLE in the
> +             * huge page file.
> +             */
> +#ifdef CONFIG_FALLOCATE_PUNCH_HOLE
> +            ret = fallocate(rb->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
> +                            start, length);
> +#endif
> +        }
>          if (ret) {
>              ret = -errno;
>              error_report("ram_block_discard_range: Failed to discard range "
> 

Reviewed-by: Laurent Vivier <lvivier@redhat.com>
diff mbox

Patch

diff --git a/exec.c b/exec.c
index e040cdf..c25f6b3 100644
--- a/exec.c
+++ b/exec.c
@@ -3324,9 +3324,20 @@  int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
 
         errno = ENOTSUP; /* If we are missing MADVISE etc */
 
+        if (rb->page_size == qemu_host_page_size) {
 #if defined(CONFIG_MADVISE)
-        ret = qemu_madvise(host_startaddr, length, QEMU_MADV_DONTNEED);
+            ret = qemu_madvise(host_startaddr, length, QEMU_MADV_DONTNEED);
 #endif
+        } else {
+            /* Huge page case  - unfortunately it can't do DONTNEED, but
+             * it can do the equivalent by FALLOC_FL_PUNCH_HOLE in the
+             * huge page file.
+             */
+#ifdef CONFIG_FALLOCATE_PUNCH_HOLE
+            ret = fallocate(rb->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
+                            start, length);
+#endif
+        }
         if (ret) {
             ret = -errno;
             error_report("ram_block_discard_range: Failed to discard range "