Message ID | 20220704202315.507145-4-leobras@redhat.com |
---|---|
State | New |
Headers | show |
Series | Zero copy improvements (QIOChannel + multifd) | expand |
On Mon, Jul 04, 2022 at 05:23:15PM -0300, Leonardo Bras wrote: > Some errors, like the lack of Scatter-Gather support by the network > interface(NETIF_F_SG) may cause sendmsg(...,MSG_ZEROCOPY) to fail on using > zero-copy, which causes it to fall back to the default copying mechanism. > > After each full dirty-bitmap scan there should be a zero-copy flush > happening, which checks for errors each of the previous calls to > sendmsg(...,MSG_ZEROCOPY). If all of them failed to use zero-copy, then > increment dirty_sync_missed_zero_copy migration stat to let the user know > about it. > > Signed-off-by: Leonardo Bras <leobras@redhat.com> > --- > migration/ram.h | 2 ++ > migration/multifd.c | 2 ++ > migration/ram.c | 5 +++++ > 3 files changed, 9 insertions(+) Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> With regards, Daniel
On Mon, Jul 04, 2022 at 05:23:15PM -0300, Leonardo Bras wrote: > Some errors, like the lack of Scatter-Gather support by the network > interface(NETIF_F_SG) may cause sendmsg(...,MSG_ZEROCOPY) to fail on using > zero-copy, which causes it to fall back to the default copying mechanism. > > After each full dirty-bitmap scan there should be a zero-copy flush > happening, which checks for errors each of the previous calls to > sendmsg(...,MSG_ZEROCOPY). If all of them failed to use zero-copy, then > increment dirty_sync_missed_zero_copy migration stat to let the user know > about it. > > Signed-off-by: Leonardo Bras <leobras@redhat.com> > --- > migration/ram.h | 2 ++ > migration/multifd.c | 2 ++ > migration/ram.c | 5 +++++ > 3 files changed, 9 insertions(+) > > diff --git a/migration/ram.h b/migration/ram.h > index ded0a3a086..d3c7eb96f5 100644 > --- a/migration/ram.h > +++ b/migration/ram.h > @@ -87,4 +87,6 @@ void ram_write_tracking_prepare(void); > int ram_write_tracking_start(void); > void ram_write_tracking_stop(void); > > +void dirty_sync_missed_zero_copy(void); > + > #endif > diff --git a/migration/multifd.c b/migration/multifd.c > index 684c014c86..3909b34967 100644 > --- a/migration/multifd.c > +++ b/migration/multifd.c > @@ -624,6 +624,8 @@ int multifd_send_sync_main(QEMUFile *f) > if (ret < 0) { > error_report_err(err); > return -1; > + } else if (ret == 1) { > + dirty_sync_missed_zero_copy(); > } > } > } I know that Juan is working on some patch to only do multifd_send_sync_main() for each dirty sync, but that's not landed, right? Can we name it without "dirty-sync" at all (so it'll work before/after Juan's patch will be applied)? Something like "zero-copy-send-fallbacks"? The other thing is the subject may need to be touched up as right now with the field we don't warn the user anymore on zero-copy-send fallbacks. Thanks, > diff --git a/migration/ram.c b/migration/ram.c > index 01f9cc1d72..db948c4787 100644 > --- a/migration/ram.c > +++ b/migration/ram.c > @@ -407,6 +407,11 @@ static void ram_transferred_add(uint64_t bytes) > ram_counters.transferred += bytes; > } > > +void dirty_sync_missed_zero_copy(void) > +{ > + ram_counters.dirty_sync_missed_zero_copy++; > +} > + > /* used by the search for pages to send */ > struct PageSearchStatus { > /* Current block being searched */ > -- > 2.36.1 >
Hello Peter, On Thu, Jul 7, 2022 at 2:56 PM Peter Xu <peterx@redhat.com> wrote: > > On Mon, Jul 04, 2022 at 05:23:15PM -0300, Leonardo Bras wrote: > > Some errors, like the lack of Scatter-Gather support by the network > > interface(NETIF_F_SG) may cause sendmsg(...,MSG_ZEROCOPY) to fail on using > > zero-copy, which causes it to fall back to the default copying mechanism. > > > > After each full dirty-bitmap scan there should be a zero-copy flush > > happening, which checks for errors each of the previous calls to > > sendmsg(...,MSG_ZEROCOPY). If all of them failed to use zero-copy, then > > increment dirty_sync_missed_zero_copy migration stat to let the user know > > about it. > > > > Signed-off-by: Leonardo Bras <leobras@redhat.com> > > --- > > migration/ram.h | 2 ++ > > migration/multifd.c | 2 ++ > > migration/ram.c | 5 +++++ > > 3 files changed, 9 insertions(+) > > > > diff --git a/migration/ram.h b/migration/ram.h > > index ded0a3a086..d3c7eb96f5 100644 > > --- a/migration/ram.h > > +++ b/migration/ram.h > > @@ -87,4 +87,6 @@ void ram_write_tracking_prepare(void); > > int ram_write_tracking_start(void); > > void ram_write_tracking_stop(void); > > > > +void dirty_sync_missed_zero_copy(void); > > + > > #endif > > diff --git a/migration/multifd.c b/migration/multifd.c > > index 684c014c86..3909b34967 100644 > > --- a/migration/multifd.c > > +++ b/migration/multifd.c > > @@ -624,6 +624,8 @@ int multifd_send_sync_main(QEMUFile *f) > > if (ret < 0) { > > error_report_err(err); > > return -1; > > + } else if (ret == 1) { > > + dirty_sync_missed_zero_copy(); > > } > > } > > } > > I know that Juan is working on some patch to only do > multifd_send_sync_main() for each dirty sync, but that's not landed, right? That's correct, but I am hoping it should land before the release, so the numbers will match. > > Can we name it without "dirty-sync" at all (so it'll work before/after > Juan's patch will be applied)? Something like "zero-copy-send-fallbacks"? It initially was something like that, but on the v2 thread there was some discussion on the topic, and it was suggested the number would not mean much to the user, unless it was connected to something else. Markus suggested the connection to @dirty-sync-count right in the name, and Daniel suggested the above name, which sounds fine to me. > > The other thing is the subject may need to be touched up as right now with > the field we don't warn the user anymore on zero-copy-send fallbacks. Ok, Warning sounds misleading here. What do you think about 'report' instead? Best regards, Leo > > Thanks, > > > diff --git a/migration/ram.c b/migration/ram.c > > index 01f9cc1d72..db948c4787 100644 > > --- a/migration/ram.c > > +++ b/migration/ram.c > > @@ -407,6 +407,11 @@ static void ram_transferred_add(uint64_t bytes) > > ram_counters.transferred += bytes; > > } > > > > +void dirty_sync_missed_zero_copy(void) > > +{ > > + ram_counters.dirty_sync_missed_zero_copy++; > > +} > > + > > /* used by the search for pages to send */ > > struct PageSearchStatus { > > /* Current block being searched */ > > -- > > 2.36.1 > > > > -- > Peter Xu >
On Thu, Jul 07, 2022 at 04:59:22PM -0300, Leonardo Bras Soares Passos wrote: > Hello Peter, > > On Thu, Jul 7, 2022 at 2:56 PM Peter Xu <peterx@redhat.com> wrote: > > > > On Mon, Jul 04, 2022 at 05:23:15PM -0300, Leonardo Bras wrote: > > > Some errors, like the lack of Scatter-Gather support by the network > > > interface(NETIF_F_SG) may cause sendmsg(...,MSG_ZEROCOPY) to fail on using > > > zero-copy, which causes it to fall back to the default copying mechanism. > > > > > > After each full dirty-bitmap scan there should be a zero-copy flush > > > happening, which checks for errors each of the previous calls to > > > sendmsg(...,MSG_ZEROCOPY). If all of them failed to use zero-copy, then > > > increment dirty_sync_missed_zero_copy migration stat to let the user know > > > about it. > > > > > > Signed-off-by: Leonardo Bras <leobras@redhat.com> > > > --- > > > migration/ram.h | 2 ++ > > > migration/multifd.c | 2 ++ > > > migration/ram.c | 5 +++++ > > > 3 files changed, 9 insertions(+) > > > > > > diff --git a/migration/ram.h b/migration/ram.h > > > index ded0a3a086..d3c7eb96f5 100644 > > > --- a/migration/ram.h > > > +++ b/migration/ram.h > > > @@ -87,4 +87,6 @@ void ram_write_tracking_prepare(void); > > > int ram_write_tracking_start(void); > > > void ram_write_tracking_stop(void); > > > > > > +void dirty_sync_missed_zero_copy(void); > > > + > > > #endif > > > diff --git a/migration/multifd.c b/migration/multifd.c > > > index 684c014c86..3909b34967 100644 > > > --- a/migration/multifd.c > > > +++ b/migration/multifd.c > > > @@ -624,6 +624,8 @@ int multifd_send_sync_main(QEMUFile *f) > > > if (ret < 0) { > > > error_report_err(err); > > > return -1; > > > + } else if (ret == 1) { > > > + dirty_sync_missed_zero_copy(); > > > } > > > } > > > } > > > > I know that Juan is working on some patch to only do > > multifd_send_sync_main() for each dirty sync, but that's not landed, right? > > That's correct, but I am hoping it should land before the release, so > the numbers will match. > > > > > > Can we name it without "dirty-sync" at all (so it'll work before/after > > Juan's patch will be applied)? Something like "zero-copy-send-fallbacks"? > > It initially was something like that, but on the v2 thread there was > some discussion on > the topic, and it was suggested the number would not mean much to the > user, unless > it was connected to something else. > > Markus suggested the connection to @dirty-sync-count right in the > name, and Daniel suggested the above name, which sounds fine to me. Ah okay. But then I suggest we make sure it lands only after Juan's.. or it won't really match. Also when Juan's patch ready, we'd also double check it will be exactly called once per iteration, or we can get confusing numbers. I assume Juan will take care of that then. > > > > > The other thing is the subject may need to be touched up as right now with > > the field we don't warn the user anymore on zero-copy-send fallbacks. > > Ok, Warning sounds misleading here. > What do you think about 'report' instead? Looks good. Thanks,
On Thu, 2022-07-07 at 16:06 -0400, Peter Xu wrote: > On Thu, Jul 07, 2022 at 04:59:22PM -0300, Leonardo Bras Soares Passos wrote: > > Hello Peter, > > > > On Thu, Jul 7, 2022 at 2:56 PM Peter Xu <peterx@redhat.com> wrote: > > > > > > On Mon, Jul 04, 2022 at 05:23:15PM -0300, Leonardo Bras wrote: > > > > Some errors, like the lack of Scatter-Gather support by the network > > > > interface(NETIF_F_SG) may cause sendmsg(...,MSG_ZEROCOPY) to fail on > > > > using > > > > zero-copy, which causes it to fall back to the default copying > > > > mechanism. > > > > > > > > After each full dirty-bitmap scan there should be a zero-copy flush > > > > happening, which checks for errors each of the previous calls to > > > > sendmsg(...,MSG_ZEROCOPY). If all of them failed to use zero-copy, then > > > > increment dirty_sync_missed_zero_copy migration stat to let the user > > > > know > > > > about it. > > > > > > > > Signed-off-by: Leonardo Bras <leobras@redhat.com> > > > > --- > > > > migration/ram.h | 2 ++ > > > > migration/multifd.c | 2 ++ > > > > migration/ram.c | 5 +++++ > > > > 3 files changed, 9 insertions(+) > > > > > > > > diff --git a/migration/ram.h b/migration/ram.h > > > > index ded0a3a086..d3c7eb96f5 100644 > > > > --- a/migration/ram.h > > > > +++ b/migration/ram.h > > > > @@ -87,4 +87,6 @@ void ram_write_tracking_prepare(void); > > > > int ram_write_tracking_start(void); > > > > void ram_write_tracking_stop(void); > > > > > > > > +void dirty_sync_missed_zero_copy(void); > > > > + > > > > #endif > > > > diff --git a/migration/multifd.c b/migration/multifd.c > > > > index 684c014c86..3909b34967 100644 > > > > --- a/migration/multifd.c > > > > +++ b/migration/multifd.c > > > > @@ -624,6 +624,8 @@ int multifd_send_sync_main(QEMUFile *f) > > > > if (ret < 0) { > > > > error_report_err(err); > > > > return -1; > > > > + } else if (ret == 1) { > > > > + dirty_sync_missed_zero_copy(); > > > > } > > > > } > > > > } > > > > > > I know that Juan is working on some patch to only do > > > multifd_send_sync_main() for each dirty sync, but that's not landed, > > > right? > > > > That's correct, but I am hoping it should land before the release, so > > the numbers will match. > > > > > > > > > > Can we name it without "dirty-sync" at all (so it'll work before/after > > > Juan's patch will be applied)? Something like "zero-copy-send-fallbacks"? > > > > It initially was something like that, but on the v2 thread there was > > some discussion on > > the topic, and it was suggested the number would not mean much to the > > user, unless > > it was connected to something else. > > > > Markus suggested the connection to @dirty-sync-count right in the > > name, and Daniel suggested the above name, which sounds fine to me. > > Ah okay. > > But then I suggest we make sure it lands only after Juan's.. or it won't > really match. Also when Juan's patch ready, we'd also double check it will > be exactly called once per iteration, or we can get confusing numbers. I > assume Juan will take care of that then. > > > > > > > > > The other thing is the subject may need to be touched up as right now with > > > the field we don't warn the user anymore on zero-copy-send fallbacks. > > > > Ok, Warning sounds misleading here. > > What do you think about 'report' instead? > > Looks good. Thanks, Thank you for reviewing, Peter! Leo >
diff --git a/migration/ram.h b/migration/ram.h index ded0a3a086..d3c7eb96f5 100644 --- a/migration/ram.h +++ b/migration/ram.h @@ -87,4 +87,6 @@ void ram_write_tracking_prepare(void); int ram_write_tracking_start(void); void ram_write_tracking_stop(void); +void dirty_sync_missed_zero_copy(void); + #endif diff --git a/migration/multifd.c b/migration/multifd.c index 684c014c86..3909b34967 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -624,6 +624,8 @@ int multifd_send_sync_main(QEMUFile *f) if (ret < 0) { error_report_err(err); return -1; + } else if (ret == 1) { + dirty_sync_missed_zero_copy(); } } } diff --git a/migration/ram.c b/migration/ram.c index 01f9cc1d72..db948c4787 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -407,6 +407,11 @@ static void ram_transferred_add(uint64_t bytes) ram_counters.transferred += bytes; } +void dirty_sync_missed_zero_copy(void) +{ + ram_counters.dirty_sync_missed_zero_copy++; +} + /* used by the search for pages to send */ struct PageSearchStatus { /* Current block being searched */
Some errors, like the lack of Scatter-Gather support by the network interface(NETIF_F_SG) may cause sendmsg(...,MSG_ZEROCOPY) to fail on using zero-copy, which causes it to fall back to the default copying mechanism. After each full dirty-bitmap scan there should be a zero-copy flush happening, which checks for errors each of the previous calls to sendmsg(...,MSG_ZEROCOPY). If all of them failed to use zero-copy, then increment dirty_sync_missed_zero_copy migration stat to let the user know about it. Signed-off-by: Leonardo Bras <leobras@redhat.com> --- migration/ram.h | 2 ++ migration/multifd.c | 2 ++ migration/ram.c | 5 +++++ 3 files changed, 9 insertions(+)