diff mbox series

[v3,3/3] migration/multifd: Warn user when zerocopy not working

Message ID 20220704202315.507145-4-leobras@redhat.com
State New
Headers show
Series Zero copy improvements (QIOChannel + multifd) | expand

Commit Message

Leonardo Bras July 4, 2022, 8:23 p.m. UTC
Some errors, like the lack of Scatter-Gather support by the network
interface(NETIF_F_SG) may cause sendmsg(...,MSG_ZEROCOPY) to fail on using
zero-copy, which causes it to fall back to the default copying mechanism.

After each full dirty-bitmap scan there should be a zero-copy flush
happening, which checks for errors each of the previous calls to
sendmsg(...,MSG_ZEROCOPY). If all of them failed to use zero-copy, then
increment dirty_sync_missed_zero_copy migration stat to let the user know
about it.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 migration/ram.h     | 2 ++
 migration/multifd.c | 2 ++
 migration/ram.c     | 5 +++++
 3 files changed, 9 insertions(+)

Comments

Daniel P. Berrangé July 5, 2022, 8:05 a.m. UTC | #1
On Mon, Jul 04, 2022 at 05:23:15PM -0300, Leonardo Bras wrote:
> Some errors, like the lack of Scatter-Gather support by the network
> interface(NETIF_F_SG) may cause sendmsg(...,MSG_ZEROCOPY) to fail on using
> zero-copy, which causes it to fall back to the default copying mechanism.
> 
> After each full dirty-bitmap scan there should be a zero-copy flush
> happening, which checks for errors each of the previous calls to
> sendmsg(...,MSG_ZEROCOPY). If all of them failed to use zero-copy, then
> increment dirty_sync_missed_zero_copy migration stat to let the user know
> about it.
> 
> Signed-off-by: Leonardo Bras <leobras@redhat.com>
> ---
>  migration/ram.h     | 2 ++
>  migration/multifd.c | 2 ++
>  migration/ram.c     | 5 +++++
>  3 files changed, 9 insertions(+)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


With regards,
Daniel
Peter Xu July 7, 2022, 5:56 p.m. UTC | #2
On Mon, Jul 04, 2022 at 05:23:15PM -0300, Leonardo Bras wrote:
> Some errors, like the lack of Scatter-Gather support by the network
> interface(NETIF_F_SG) may cause sendmsg(...,MSG_ZEROCOPY) to fail on using
> zero-copy, which causes it to fall back to the default copying mechanism.
> 
> After each full dirty-bitmap scan there should be a zero-copy flush
> happening, which checks for errors each of the previous calls to
> sendmsg(...,MSG_ZEROCOPY). If all of them failed to use zero-copy, then
> increment dirty_sync_missed_zero_copy migration stat to let the user know
> about it.
> 
> Signed-off-by: Leonardo Bras <leobras@redhat.com>
> ---
>  migration/ram.h     | 2 ++
>  migration/multifd.c | 2 ++
>  migration/ram.c     | 5 +++++
>  3 files changed, 9 insertions(+)
> 
> diff --git a/migration/ram.h b/migration/ram.h
> index ded0a3a086..d3c7eb96f5 100644
> --- a/migration/ram.h
> +++ b/migration/ram.h
> @@ -87,4 +87,6 @@ void ram_write_tracking_prepare(void);
>  int ram_write_tracking_start(void);
>  void ram_write_tracking_stop(void);
>  
> +void dirty_sync_missed_zero_copy(void);
> +
>  #endif
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 684c014c86..3909b34967 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -624,6 +624,8 @@ int multifd_send_sync_main(QEMUFile *f)
>              if (ret < 0) {
>                  error_report_err(err);
>                  return -1;
> +            } else if (ret == 1) {
> +                dirty_sync_missed_zero_copy();
>              }
>          }
>      }

I know that Juan is working on some patch to only do
multifd_send_sync_main() for each dirty sync, but that's not landed, right?

Can we name it without "dirty-sync" at all (so it'll work before/after
Juan's patch will be applied)?  Something like "zero-copy-send-fallbacks"?

The other thing is the subject may need to be touched up as right now with
the field we don't warn the user anymore on zero-copy-send fallbacks.

Thanks,

> diff --git a/migration/ram.c b/migration/ram.c
> index 01f9cc1d72..db948c4787 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -407,6 +407,11 @@ static void ram_transferred_add(uint64_t bytes)
>      ram_counters.transferred += bytes;
>  }
>  
> +void dirty_sync_missed_zero_copy(void)
> +{
> +    ram_counters.dirty_sync_missed_zero_copy++;
> +}
> +
>  /* used by the search for pages to send */
>  struct PageSearchStatus {
>      /* Current block being searched */
> -- 
> 2.36.1
>
Leonardo Bras July 7, 2022, 7:59 p.m. UTC | #3
Hello Peter,

On Thu, Jul 7, 2022 at 2:56 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Mon, Jul 04, 2022 at 05:23:15PM -0300, Leonardo Bras wrote:
> > Some errors, like the lack of Scatter-Gather support by the network
> > interface(NETIF_F_SG) may cause sendmsg(...,MSG_ZEROCOPY) to fail on using
> > zero-copy, which causes it to fall back to the default copying mechanism.
> >
> > After each full dirty-bitmap scan there should be a zero-copy flush
> > happening, which checks for errors each of the previous calls to
> > sendmsg(...,MSG_ZEROCOPY). If all of them failed to use zero-copy, then
> > increment dirty_sync_missed_zero_copy migration stat to let the user know
> > about it.
> >
> > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > ---
> >  migration/ram.h     | 2 ++
> >  migration/multifd.c | 2 ++
> >  migration/ram.c     | 5 +++++
> >  3 files changed, 9 insertions(+)
> >
> > diff --git a/migration/ram.h b/migration/ram.h
> > index ded0a3a086..d3c7eb96f5 100644
> > --- a/migration/ram.h
> > +++ b/migration/ram.h
> > @@ -87,4 +87,6 @@ void ram_write_tracking_prepare(void);
> >  int ram_write_tracking_start(void);
> >  void ram_write_tracking_stop(void);
> >
> > +void dirty_sync_missed_zero_copy(void);
> > +
> >  #endif
> > diff --git a/migration/multifd.c b/migration/multifd.c
> > index 684c014c86..3909b34967 100644
> > --- a/migration/multifd.c
> > +++ b/migration/multifd.c
> > @@ -624,6 +624,8 @@ int multifd_send_sync_main(QEMUFile *f)
> >              if (ret < 0) {
> >                  error_report_err(err);
> >                  return -1;
> > +            } else if (ret == 1) {
> > +                dirty_sync_missed_zero_copy();
> >              }
> >          }
> >      }
>
> I know that Juan is working on some patch to only do
> multifd_send_sync_main() for each dirty sync, but that's not landed, right?

That's correct, but I am hoping it should land before the release, so
the numbers will match.


>
> Can we name it without "dirty-sync" at all (so it'll work before/after
> Juan's patch will be applied)?  Something like "zero-copy-send-fallbacks"?

It initially was something like that, but on the v2 thread there was
some discussion on
the topic, and it was suggested the number would not mean much to the
user, unless
it was connected to something else.

Markus suggested the connection to @dirty-sync-count right in the
name, and Daniel suggested the above name, which sounds fine to me.

>
> The other thing is the subject may need to be touched up as right now with
> the field we don't warn the user anymore on zero-copy-send fallbacks.

Ok, Warning sounds misleading here.
What do you think about 'report' instead?

Best regards,
Leo

>
> Thanks,
>
> > diff --git a/migration/ram.c b/migration/ram.c
> > index 01f9cc1d72..db948c4787 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -407,6 +407,11 @@ static void ram_transferred_add(uint64_t bytes)
> >      ram_counters.transferred += bytes;
> >  }
> >
> > +void dirty_sync_missed_zero_copy(void)
> > +{
> > +    ram_counters.dirty_sync_missed_zero_copy++;
> > +}
> > +
> >  /* used by the search for pages to send */
> >  struct PageSearchStatus {
> >      /* Current block being searched */
> > --
> > 2.36.1
> >
>
> --
> Peter Xu
>
Peter Xu July 7, 2022, 8:06 p.m. UTC | #4
On Thu, Jul 07, 2022 at 04:59:22PM -0300, Leonardo Bras Soares Passos wrote:
> Hello Peter,
> 
> On Thu, Jul 7, 2022 at 2:56 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Mon, Jul 04, 2022 at 05:23:15PM -0300, Leonardo Bras wrote:
> > > Some errors, like the lack of Scatter-Gather support by the network
> > > interface(NETIF_F_SG) may cause sendmsg(...,MSG_ZEROCOPY) to fail on using
> > > zero-copy, which causes it to fall back to the default copying mechanism.
> > >
> > > After each full dirty-bitmap scan there should be a zero-copy flush
> > > happening, which checks for errors each of the previous calls to
> > > sendmsg(...,MSG_ZEROCOPY). If all of them failed to use zero-copy, then
> > > increment dirty_sync_missed_zero_copy migration stat to let the user know
> > > about it.
> > >
> > > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > > ---
> > >  migration/ram.h     | 2 ++
> > >  migration/multifd.c | 2 ++
> > >  migration/ram.c     | 5 +++++
> > >  3 files changed, 9 insertions(+)
> > >
> > > diff --git a/migration/ram.h b/migration/ram.h
> > > index ded0a3a086..d3c7eb96f5 100644
> > > --- a/migration/ram.h
> > > +++ b/migration/ram.h
> > > @@ -87,4 +87,6 @@ void ram_write_tracking_prepare(void);
> > >  int ram_write_tracking_start(void);
> > >  void ram_write_tracking_stop(void);
> > >
> > > +void dirty_sync_missed_zero_copy(void);
> > > +
> > >  #endif
> > > diff --git a/migration/multifd.c b/migration/multifd.c
> > > index 684c014c86..3909b34967 100644
> > > --- a/migration/multifd.c
> > > +++ b/migration/multifd.c
> > > @@ -624,6 +624,8 @@ int multifd_send_sync_main(QEMUFile *f)
> > >              if (ret < 0) {
> > >                  error_report_err(err);
> > >                  return -1;
> > > +            } else if (ret == 1) {
> > > +                dirty_sync_missed_zero_copy();
> > >              }
> > >          }
> > >      }
> >
> > I know that Juan is working on some patch to only do
> > multifd_send_sync_main() for each dirty sync, but that's not landed, right?
> 
> That's correct, but I am hoping it should land before the release, so
> the numbers will match.
> 
> 
> >
> > Can we name it without "dirty-sync" at all (so it'll work before/after
> > Juan's patch will be applied)?  Something like "zero-copy-send-fallbacks"?
> 
> It initially was something like that, but on the v2 thread there was
> some discussion on
> the topic, and it was suggested the number would not mean much to the
> user, unless
> it was connected to something else.
> 
> Markus suggested the connection to @dirty-sync-count right in the
> name, and Daniel suggested the above name, which sounds fine to me.

Ah okay.

But then I suggest we make sure it lands only after Juan's.. or it won't
really match.  Also when Juan's patch ready, we'd also double check it will
be exactly called once per iteration, or we can get confusing numbers.  I
assume Juan will take care of that then.

> 
> >
> > The other thing is the subject may need to be touched up as right now with
> > the field we don't warn the user anymore on zero-copy-send fallbacks.
> 
> Ok, Warning sounds misleading here.
> What do you think about 'report' instead?

Looks good.  Thanks,
Leonardo Bras July 7, 2022, 9:18 p.m. UTC | #5
On Thu, 2022-07-07 at 16:06 -0400, Peter Xu wrote:
> On Thu, Jul 07, 2022 at 04:59:22PM -0300, Leonardo Bras Soares Passos wrote:
> > Hello Peter,
> > 
> > On Thu, Jul 7, 2022 at 2:56 PM Peter Xu <peterx@redhat.com> wrote:
> > > 
> > > On Mon, Jul 04, 2022 at 05:23:15PM -0300, Leonardo Bras wrote:
> > > > Some errors, like the lack of Scatter-Gather support by the network
> > > > interface(NETIF_F_SG) may cause sendmsg(...,MSG_ZEROCOPY) to fail on
> > > > using
> > > > zero-copy, which causes it to fall back to the default copying
> > > > mechanism.
> > > > 
> > > > After each full dirty-bitmap scan there should be a zero-copy flush
> > > > happening, which checks for errors each of the previous calls to
> > > > sendmsg(...,MSG_ZEROCOPY). If all of them failed to use zero-copy, then
> > > > increment dirty_sync_missed_zero_copy migration stat to let the user
> > > > know
> > > > about it.
> > > > 
> > > > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > > > ---
> > > >  migration/ram.h     | 2 ++
> > > >  migration/multifd.c | 2 ++
> > > >  migration/ram.c     | 5 +++++
> > > >  3 files changed, 9 insertions(+)
> > > > 
> > > > diff --git a/migration/ram.h b/migration/ram.h
> > > > index ded0a3a086..d3c7eb96f5 100644
> > > > --- a/migration/ram.h
> > > > +++ b/migration/ram.h
> > > > @@ -87,4 +87,6 @@ void ram_write_tracking_prepare(void);
> > > >  int ram_write_tracking_start(void);
> > > >  void ram_write_tracking_stop(void);
> > > > 
> > > > +void dirty_sync_missed_zero_copy(void);
> > > > +
> > > >  #endif
> > > > diff --git a/migration/multifd.c b/migration/multifd.c
> > > > index 684c014c86..3909b34967 100644
> > > > --- a/migration/multifd.c
> > > > +++ b/migration/multifd.c
> > > > @@ -624,6 +624,8 @@ int multifd_send_sync_main(QEMUFile *f)
> > > >              if (ret < 0) {
> > > >                  error_report_err(err);
> > > >                  return -1;
> > > > +            } else if (ret == 1) {
> > > > +                dirty_sync_missed_zero_copy();
> > > >              }
> > > >          }
> > > >      }
> > > 
> > > I know that Juan is working on some patch to only do
> > > multifd_send_sync_main() for each dirty sync, but that's not landed,
> > > right?
> > 
> > That's correct, but I am hoping it should land before the release, so
> > the numbers will match.
> > 
> > 
> > > 
> > > Can we name it without "dirty-sync" at all (so it'll work before/after
> > > Juan's patch will be applied)?  Something like "zero-copy-send-fallbacks"?
> > 
> > It initially was something like that, but on the v2 thread there was
> > some discussion on
> > the topic, and it was suggested the number would not mean much to the
> > user, unless
> > it was connected to something else.
> > 
> > Markus suggested the connection to @dirty-sync-count right in the
> > name, and Daniel suggested the above name, which sounds fine to me.
> 
> Ah okay.
> 
> But then I suggest we make sure it lands only after Juan's.. or it won't
> really match.  Also when Juan's patch ready, we'd also double check it will
> be exactly called once per iteration, or we can get confusing numbers.  I
> assume Juan will take care of that then.
> 
> > 
> > > 
> > > The other thing is the subject may need to be touched up as right now with
> > > the field we don't warn the user anymore on zero-copy-send fallbacks.
> > 
> > Ok, Warning sounds misleading here.
> > What do you think about 'report' instead?
> 
> Looks good.  Thanks,

Thank you for reviewing, Peter!

Leo

>
diff mbox series

Patch

diff --git a/migration/ram.h b/migration/ram.h
index ded0a3a086..d3c7eb96f5 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -87,4 +87,6 @@  void ram_write_tracking_prepare(void);
 int ram_write_tracking_start(void);
 void ram_write_tracking_stop(void);
 
+void dirty_sync_missed_zero_copy(void);
+
 #endif
diff --git a/migration/multifd.c b/migration/multifd.c
index 684c014c86..3909b34967 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -624,6 +624,8 @@  int multifd_send_sync_main(QEMUFile *f)
             if (ret < 0) {
                 error_report_err(err);
                 return -1;
+            } else if (ret == 1) {
+                dirty_sync_missed_zero_copy();
             }
         }
     }
diff --git a/migration/ram.c b/migration/ram.c
index 01f9cc1d72..db948c4787 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -407,6 +407,11 @@  static void ram_transferred_add(uint64_t bytes)
     ram_counters.transferred += bytes;
 }
 
+void dirty_sync_missed_zero_copy(void)
+{
+    ram_counters.dirty_sync_missed_zero_copy++;
+}
+
 /* used by the search for pages to send */
 struct PageSearchStatus {
     /* Current block being searched */