Message ID | 20220628010908.390564-3-leobras@redhat.com |
---|---|
State | New |
Headers | show |
Series | Zero copy improvements (QIOChannel + multifd) | expand |
On Mon, Jun 27, 2022 at 10:09:09PM -0300, Leonardo Bras wrote: > Some errors, like the lack of Scatter-Gather support by the network > interface(NETIF_F_SG) may cause sendmsg(...,MSG_ZEROCOPY) to fail on using > zero-copy, which causes it to fall back to the default copying mechanism. How common is this lack of SG support ? What NICs did you have that were affected ? > After each full dirty-bitmap scan there should be a zero-copy flush > happening, which checks for errors each of the previous calls to > sendmsg(...,MSG_ZEROCOPY). If all of them failed to use zero-copy, then > warn the user about it. > > Since it happens once each full dirty-bitmap scan, even in worst case > scenario it should not print a lot of warnings, and will allow tracking > how many dirty-bitmap iterations were not able to use zero-copy send. For long running migrations which are not converging, or converging very slowly there could be 100's of passes. > Signed-off-by: Leonardo Bras <leobras@redhat.com> > --- > migration/multifd.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/migration/multifd.c b/migration/multifd.c > index 684c014c86..9c62aec84e 100644 > --- a/migration/multifd.c > +++ b/migration/multifd.c > @@ -624,6 +624,9 @@ int multifd_send_sync_main(QEMUFile *f) > if (ret < 0) { > error_report_err(err); > return -1; > + } else if (ret == 1) { > + warn_report("The network device is not able to use " > + "zero-copy-send: copying is being used"); > } > } > } > -- > 2.36.1 > With regards, Daniel
On Tue, Jun 28, 2022 at 4:53 AM Daniel P. Berrangé <berrange@redhat.com> wrote: > > On Mon, Jun 27, 2022 at 10:09:09PM -0300, Leonardo Bras wrote: > > Some errors, like the lack of Scatter-Gather support by the network > > interface(NETIF_F_SG) may cause sendmsg(...,MSG_ZEROCOPY) to fail on using > > zero-copy, which causes it to fall back to the default copying mechanism. > > How common is this lack of SG support ? What NICs did you have that > were affected ? I am not aware of any NIC without SG available for testing, nor have any idea on how common they are. But since we can detect sendmsg() falling back to copying we should warn the user if this ever happens. There is also a case in IPv6 related to fragmentation that may cause MSG_ZEROCOPY to fall back to the copying mechanism, so it's also covered. > > > After each full dirty-bitmap scan there should be a zero-copy flush > > happening, which checks for errors each of the previous calls to > > sendmsg(...,MSG_ZEROCOPY). If all of them failed to use zero-copy, then > > warn the user about it. > > > > Since it happens once each full dirty-bitmap scan, even in worst case > > scenario it should not print a lot of warnings, and will allow tracking > > how many dirty-bitmap iterations were not able to use zero-copy send. > > For long running migrations which are not converging, or converging > very slowly there could be 100's of passes. > I could change it so it only warns once, if that is too much output. Best regards, Leo > > > > Signed-off-by: Leonardo Bras <leobras@redhat.com> > > --- > > migration/multifd.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/migration/multifd.c b/migration/multifd.c > > index 684c014c86..9c62aec84e 100644 > > --- a/migration/multifd.c > > +++ b/migration/multifd.c > > @@ -624,6 +624,9 @@ int multifd_send_sync_main(QEMUFile *f) > > if (ret < 0) { > > error_report_err(err); > > return -1; > > + } else if (ret == 1) { > > + warn_report("The network device is not able to use " > > + "zero-copy-send: copying is being used"); > > } > > } > > } > > -- > > 2.36.1 > > > > With regards, > Daniel > -- > |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| > |: https://libvirt.org -o- https://fstop138.berrange.com :| > |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| >
On Tue, Jun 28, 2022 at 09:32:04AM -0300, Leonardo Bras Soares Passos wrote: > On Tue, Jun 28, 2022 at 4:53 AM Daniel P. Berrangé <berrange@redhat.com> wrote: > > > > On Mon, Jun 27, 2022 at 10:09:09PM -0300, Leonardo Bras wrote: > > > Some errors, like the lack of Scatter-Gather support by the network > > > interface(NETIF_F_SG) may cause sendmsg(...,MSG_ZEROCOPY) to fail on using > > > zero-copy, which causes it to fall back to the default copying mechanism. > > > > How common is this lack of SG support ? What NICs did you have that > > were affected ? > > I am not aware of any NIC without SG available for testing, nor have > any idea on how common they are. > But since we can detect sendmsg() falling back to copying we should > warn the user if this ever happens. > > There is also a case in IPv6 related to fragmentation that may cause > MSG_ZEROCOPY to fall back to the copying mechanism, so it's also > covered. > > > > > > After each full dirty-bitmap scan there should be a zero-copy flush > > > happening, which checks for errors each of the previous calls to > > > sendmsg(...,MSG_ZEROCOPY). If all of them failed to use zero-copy, then > > > warn the user about it. > > > > > > Since it happens once each full dirty-bitmap scan, even in worst case > > > scenario it should not print a lot of warnings, and will allow tracking > > > how many dirty-bitmap iterations were not able to use zero-copy send. > > > > For long running migrations which are not converging, or converging > > very slowly there could be 100's of passes. > > > > I could change it so it only warns once, if that is too much output. Well I'm mostly wondering what we're expecting the user todo with this information. Generally a log file containing warnings ends up turning into a bug report. If we think it is important for users and/or mgmt apps to be aware of this info, then it might be better to actually put a field in the query-migrate stats to report if zero-copy is being honoured or not, and just have a trace point in this location instead. With regards, Daniel
* Daniel P. Berrangé (berrange@redhat.com) wrote: > On Tue, Jun 28, 2022 at 09:32:04AM -0300, Leonardo Bras Soares Passos wrote: > > On Tue, Jun 28, 2022 at 4:53 AM Daniel P. Berrangé <berrange@redhat.com> wrote: > > > > > > On Mon, Jun 27, 2022 at 10:09:09PM -0300, Leonardo Bras wrote: > > > > Some errors, like the lack of Scatter-Gather support by the network > > > > interface(NETIF_F_SG) may cause sendmsg(...,MSG_ZEROCOPY) to fail on using > > > > zero-copy, which causes it to fall back to the default copying mechanism. > > > > > > How common is this lack of SG support ? What NICs did you have that > > > were affected ? > > > > I am not aware of any NIC without SG available for testing, nor have > > any idea on how common they are. > > But since we can detect sendmsg() falling back to copying we should > > warn the user if this ever happens. > > > > There is also a case in IPv6 related to fragmentation that may cause > > MSG_ZEROCOPY to fall back to the copying mechanism, so it's also > > covered. > > > > > > > > > After each full dirty-bitmap scan there should be a zero-copy flush > > > > happening, which checks for errors each of the previous calls to > > > > sendmsg(...,MSG_ZEROCOPY). If all of them failed to use zero-copy, then > > > > warn the user about it. > > > > > > > > Since it happens once each full dirty-bitmap scan, even in worst case > > > > scenario it should not print a lot of warnings, and will allow tracking > > > > how many dirty-bitmap iterations were not able to use zero-copy send. > > > > > > For long running migrations which are not converging, or converging > > > very slowly there could be 100's of passes. > > > > > > > I could change it so it only warns once, if that is too much output. > > Well I'm mostly wondering what we're expecting the user todo with this > information. Generally a log file containing warnings ends up turning > into a bug report. If we think it is important for users and/or mgmt > apps to be aware of this info, then it might be better to actually > put a field in the query-migrate stats to report if zero-copy is > being honoured or not, Yeh just a counter would work there I think. > and just have a trace point in this location > instead. Yeh. Dave > With regards, > Daniel > -- > |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| > |: https://libvirt.org -o- https://fstop138.berrange.com :| > |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| >
On Tue, Jun 28, 2022 at 10:52 AM Dr. David Alan Gilbert <dgilbert@redhat.com> wrote: > > * Daniel P. Berrangé (berrange@redhat.com) wrote: > > On Tue, Jun 28, 2022 at 09:32:04AM -0300, Leonardo Bras Soares Passos wrote: > > > On Tue, Jun 28, 2022 at 4:53 AM Daniel P. Berrangé <berrange@redhat.com> wrote: > > > > > > > > On Mon, Jun 27, 2022 at 10:09:09PM -0300, Leonardo Bras wrote: > > > > > Some errors, like the lack of Scatter-Gather support by the network > > > > > interface(NETIF_F_SG) may cause sendmsg(...,MSG_ZEROCOPY) to fail on using > > > > > zero-copy, which causes it to fall back to the default copying mechanism. > > > > > > > > How common is this lack of SG support ? What NICs did you have that > > > > were affected ? > > > > > > I am not aware of any NIC without SG available for testing, nor have > > > any idea on how common they are. > > > But since we can detect sendmsg() falling back to copying we should > > > warn the user if this ever happens. > > > > > > There is also a case in IPv6 related to fragmentation that may cause > > > MSG_ZEROCOPY to fall back to the copying mechanism, so it's also > > > covered. > > > > > > > > > > > > After each full dirty-bitmap scan there should be a zero-copy flush > > > > > happening, which checks for errors each of the previous calls to > > > > > sendmsg(...,MSG_ZEROCOPY). If all of them failed to use zero-copy, then > > > > > warn the user about it. > > > > > > > > > > Since it happens once each full dirty-bitmap scan, even in worst case > > > > > scenario it should not print a lot of warnings, and will allow tracking > > > > > how many dirty-bitmap iterations were not able to use zero-copy send. > > > > > > > > For long running migrations which are not converging, or converging > > > > very slowly there could be 100's of passes. > > > > > > > > > > I could change it so it only warns once, if that is too much output. > > > > Well I'm mostly wondering what we're expecting the user todo with this > > information. My rationale on that: - zero-copy-send is a feature that is supposed to improve send throughput by reducing cpu usage. - there is a chance the sendmsg(MSG_ZEROCOPY) fails to use zero-copy - if this happens, there will be a potential throughput decrease on sendmsg() - the user (or management app) need to know when zero-copy-send is degrading throughput, so it can be disabled - this is also important for performance testing, given it can be confusing having zero-copy-send improving throughput in some cases, and degrading in others, without any apparent reason why. > > Generally a log file containing warnings ends up turning > > into a bug report. If we think it is important for users and/or mgmt > > apps to be aware of this info, then it might be better to actually > > put a field in the query-migrate stats to report if zero-copy is > > being honoured or not, > > Yeh just a counter would work there I think. The warning idea was totally due to my inexperience on this mgmt app interface, since I had no other idea on how to deal with that. I think having it in query-migrate is a much better idea than a warning, since it should be much easier to parse and disable zero-copy-send if desired. Even in my current qemu test script, it's much better having it in query-migrate. > > > and just have a trace point in this location > > instead. > > Yeh. > Yeap, the counter idea seems great! Will it be always printed there, or only when zero-copy-send is enabled? Best regards, Leo > Dave > > > With regards, > > Daniel > > -- > > |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| > > |: https://libvirt.org -o- https://fstop138.berrange.com :| > > |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| > > > -- > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK >
* Leonardo Bras Soares Passos (leobras@redhat.com) wrote: > On Tue, Jun 28, 2022 at 10:52 AM Dr. David Alan Gilbert > <dgilbert@redhat.com> wrote: > > > > * Daniel P. Berrangé (berrange@redhat.com) wrote: > > > On Tue, Jun 28, 2022 at 09:32:04AM -0300, Leonardo Bras Soares Passos wrote: > > > > On Tue, Jun 28, 2022 at 4:53 AM Daniel P. Berrangé <berrange@redhat.com> wrote: > > > > > > > > > > On Mon, Jun 27, 2022 at 10:09:09PM -0300, Leonardo Bras wrote: > > > > > > Some errors, like the lack of Scatter-Gather support by the network > > > > > > interface(NETIF_F_SG) may cause sendmsg(...,MSG_ZEROCOPY) to fail on using > > > > > > zero-copy, which causes it to fall back to the default copying mechanism. > > > > > > > > > > How common is this lack of SG support ? What NICs did you have that > > > > > were affected ? > > > > > > > > I am not aware of any NIC without SG available for testing, nor have > > > > any idea on how common they are. > > > > But since we can detect sendmsg() falling back to copying we should > > > > warn the user if this ever happens. > > > > > > > > There is also a case in IPv6 related to fragmentation that may cause > > > > MSG_ZEROCOPY to fall back to the copying mechanism, so it's also > > > > covered. > > > > > > > > > > > > > > > After each full dirty-bitmap scan there should be a zero-copy flush > > > > > > happening, which checks for errors each of the previous calls to > > > > > > sendmsg(...,MSG_ZEROCOPY). If all of them failed to use zero-copy, then > > > > > > warn the user about it. > > > > > > > > > > > > Since it happens once each full dirty-bitmap scan, even in worst case > > > > > > scenario it should not print a lot of warnings, and will allow tracking > > > > > > how many dirty-bitmap iterations were not able to use zero-copy send. > > > > > > > > > > For long running migrations which are not converging, or converging > > > > > very slowly there could be 100's of passes. > > > > > > > > > > > > > I could change it so it only warns once, if that is too much output. > > > > > > Well I'm mostly wondering what we're expecting the user todo with this > > > information. > > > My rationale on that: > - zero-copy-send is a feature that is supposed to improve send > throughput by reducing cpu usage. > - there is a chance the sendmsg(MSG_ZEROCOPY) fails to use zero-copy > - if this happens, there will be a potential throughput decrease on sendmsg() > - the user (or management app) need to know when zero-copy-send is > degrading throughput, so it can be disabled > - this is also important for performance testing, given it can be > confusing having zero-copy-send improving throughput in some cases, > and degrading in others, without any apparent reason why. > > > > Generally a log file containing warnings ends up turning > > > into a bug report. If we think it is important for users and/or mgmt > > > apps to be aware of this info, then it might be better to actually > > > put a field in the query-migrate stats to report if zero-copy is > > > being honoured or not, > > > > Yeh just a counter would work there I think. > > The warning idea was totally due to my inexperience on this mgmt app > interface, since I had no other idea on how to deal with that. Yeh it's not too silly an idea! The way some of these warning or stats get to us can be a bit random, but sometimes can confuse things. > I think having it in query-migrate is a much better idea than a > warning, since it should be much easier to parse and disable > zero-copy-send if desired. > Even in my current qemu test script, it's much better having it in > query-migrate. > > > > > > and just have a trace point in this location > > > instead. > > > > Yeh. > > > > Yeap, the counter idea seems great! > Will it be always printed there, or only when zero-copy-send is enabled? You could make it either if it's enabled or if it's none zero. (I guess you want it to reset to 0 at the start of a new migration). Dave > > Best regards, > Leo > > > Dave > > > > > With regards, > > > Daniel > > > -- > > > |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| > > > |: https://libvirt.org -o- https://fstop138.berrange.com :| > > > |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| > > > > > -- > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK > > >
On Tue, 2022-06-28 at 17:56 +0100, Dr. David Alan Gilbert wrote: > * Leonardo Bras Soares Passos (leobras@redhat.com) wrote: > > On Tue, Jun 28, 2022 at 10:52 AM Dr. David Alan Gilbert > > <dgilbert@redhat.com> wrote: > > > > > > * Daniel P. Berrangé (berrange@redhat.com) wrote: > > > > On Tue, Jun 28, 2022 at 09:32:04AM -0300, Leonardo Bras Soares Passos > > > > wrote: > > > > > On Tue, Jun 28, 2022 at 4:53 AM Daniel P. Berrangé > > > > > <berrange@redhat.com> wrote: > > > > > > > > > > > > On Mon, Jun 27, 2022 at 10:09:09PM -0300, Leonardo Bras wrote: > > > > > > > Some errors, like the lack of Scatter-Gather support by the > > > > > > > network > > > > > > > interface(NETIF_F_SG) may cause sendmsg(...,MSG_ZEROCOPY) to fail > > > > > > > on using > > > > > > > zero-copy, which causes it to fall back to the default copying > > > > > > > mechanism. > > > > > > > > > > > > How common is this lack of SG support ? What NICs did you have that > > > > > > were affected ? > > > > > > > > > > I am not aware of any NIC without SG available for testing, nor have > > > > > any idea on how common they are. > > > > > But since we can detect sendmsg() falling back to copying we should > > > > > warn the user if this ever happens. > > > > > > > > > > There is also a case in IPv6 related to fragmentation that may cause > > > > > MSG_ZEROCOPY to fall back to the copying mechanism, so it's also > > > > > covered. > > > > > > > > > > > > > > > > > > After each full dirty-bitmap scan there should be a zero-copy > > > > > > > flush > > > > > > > happening, which checks for errors each of the previous calls to > > > > > > > sendmsg(...,MSG_ZEROCOPY). If all of them failed to use zero-copy, > > > > > > > then > > > > > > > warn the user about it. > > > > > > > > > > > > > > Since it happens once each full dirty-bitmap scan, even in worst > > > > > > > case > > > > > > > scenario it should not print a lot of warnings, and will allow > > > > > > > tracking > > > > > > > how many dirty-bitmap iterations were not able to use zero-copy > > > > > > > send. > > > > > > > > > > > > For long running migrations which are not converging, or converging > > > > > > very slowly there could be 100's of passes. > > > > > > > > > > > > > > > > I could change it so it only warns once, if that is too much output. > > > > > > > > Well I'm mostly wondering what we're expecting the user todo with this > > > > information. > > > > > > My rationale on that: > > - zero-copy-send is a feature that is supposed to improve send > > throughput by reducing cpu usage. > > - there is a chance the sendmsg(MSG_ZEROCOPY) fails to use zero-copy > > - if this happens, there will be a potential throughput decrease on > > sendmsg() > > - the user (or management app) need to know when zero-copy-send is > > degrading throughput, so it can be disabled > > - this is also important for performance testing, given it can be > > confusing having zero-copy-send improving throughput in some cases, > > and degrading in others, without any apparent reason why. > > > > > > Generally a log file containing warnings ends up turning > > > > into a bug report. If we think it is important for users and/or mgmt > > > > apps to be aware of this info, then it might be better to actually > > > > put a field in the query-migrate stats to report if zero-copy is > > > > being honoured or not, > > > > > > Yeh just a counter would work there I think. > > > > The warning idea was totally due to my inexperience on this mgmt app > > interface, since I had no other idea on how to deal with that. > > Yeh it's not too silly an idea! > The way some of these warning or stats get to us can be a bit random, > but sometimes can confuse things. > > > I think having it in query-migrate is a much better idea than a > > warning, since it should be much easier to parse and disable > > zero-copy-send if desired. > > Even in my current qemu test script, it's much better having it in > > query-migrate. > > > > > > > > > and just have a trace point in this location > > > > instead. > > > > > > Yeh. > > > > > > > Yeap, the counter idea seems great! > > Will it be always printed there, or only when zero-copy-send is enabled? > > You could make it either if it's enabled or if it's none zero. > (I guess you want it to reset to 0 at the start of a new migration). > > Dave Thanks for this feedback! I have everything already working, but I am struggling with a good property name. I am currently using zero_copy_copied (or zero-copy-copied in json), but it does not look like a good Migration stat name. Do you have any suggestion? Best regards, Leo > > > > > Best regards, > > Leo > > > > > Dave > > > > > > > With regards, > > > > Daniel > > > > -- > > > > > : https://berrange.com -o- > > > > > https://www.flickr.com/photos/dberrange :| > > > > > : https://libvirt.org -o- > > > > > https://fstop138.berrange.com :| > > > > > : https://entangle-photo.org -o- > > > > > https://www.instagram.com/dberrange :| > > > > > > > -- > > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK > > > > >
* Leonardo Brás (leobras@redhat.com) wrote: > On Tue, 2022-06-28 at 17:56 +0100, Dr. David Alan Gilbert wrote: > > * Leonardo Bras Soares Passos (leobras@redhat.com) wrote: > > > On Tue, Jun 28, 2022 at 10:52 AM Dr. David Alan Gilbert > > > <dgilbert@redhat.com> wrote: > > > > > > > > * Daniel P. Berrangé (berrange@redhat.com) wrote: > > > > > On Tue, Jun 28, 2022 at 09:32:04AM -0300, Leonardo Bras Soares Passos > > > > > wrote: > > > > > > On Tue, Jun 28, 2022 at 4:53 AM Daniel P. Berrangé > > > > > > <berrange@redhat.com> wrote: > > > > > > > > > > > > > > On Mon, Jun 27, 2022 at 10:09:09PM -0300, Leonardo Bras wrote: > > > > > > > > Some errors, like the lack of Scatter-Gather support by the > > > > > > > > network > > > > > > > > interface(NETIF_F_SG) may cause sendmsg(...,MSG_ZEROCOPY) to fail > > > > > > > > on using > > > > > > > > zero-copy, which causes it to fall back to the default copying > > > > > > > > mechanism. > > > > > > > > > > > > > > How common is this lack of SG support ? What NICs did you have that > > > > > > > were affected ? > > > > > > > > > > > > I am not aware of any NIC without SG available for testing, nor have > > > > > > any idea on how common they are. > > > > > > But since we can detect sendmsg() falling back to copying we should > > > > > > warn the user if this ever happens. > > > > > > > > > > > > There is also a case in IPv6 related to fragmentation that may cause > > > > > > MSG_ZEROCOPY to fall back to the copying mechanism, so it's also > > > > > > covered. > > > > > > > > > > > > > > > > > > > > > After each full dirty-bitmap scan there should be a zero-copy > > > > > > > > flush > > > > > > > > happening, which checks for errors each of the previous calls to > > > > > > > > sendmsg(...,MSG_ZEROCOPY). If all of them failed to use zero-copy, > > > > > > > > then > > > > > > > > warn the user about it. > > > > > > > > > > > > > > > > Since it happens once each full dirty-bitmap scan, even in worst > > > > > > > > case > > > > > > > > scenario it should not print a lot of warnings, and will allow > > > > > > > > tracking > > > > > > > > how many dirty-bitmap iterations were not able to use zero-copy > > > > > > > > send. > > > > > > > > > > > > > > For long running migrations which are not converging, or converging > > > > > > > very slowly there could be 100's of passes. > > > > > > > > > > > > > > > > > > > I could change it so it only warns once, if that is too much output. > > > > > > > > > > Well I'm mostly wondering what we're expecting the user todo with this > > > > > information. > > > > > > > > > My rationale on that: > > > - zero-copy-send is a feature that is supposed to improve send > > > throughput by reducing cpu usage. > > > - there is a chance the sendmsg(MSG_ZEROCOPY) fails to use zero-copy > > > - if this happens, there will be a potential throughput decrease on > > > sendmsg() > > > - the user (or management app) need to know when zero-copy-send is > > > degrading throughput, so it can be disabled > > > - this is also important for performance testing, given it can be > > > confusing having zero-copy-send improving throughput in some cases, > > > and degrading in others, without any apparent reason why. > > > > > > > > Generally a log file containing warnings ends up turning > > > > > into a bug report. If we think it is important for users and/or mgmt > > > > > apps to be aware of this info, then it might be better to actually > > > > > put a field in the query-migrate stats to report if zero-copy is > > > > > being honoured or not, > > > > > > > > Yeh just a counter would work there I think. > > > > > > The warning idea was totally due to my inexperience on this mgmt app > > > interface, since I had no other idea on how to deal with that. > > > > Yeh it's not too silly an idea! > > The way some of these warning or stats get to us can be a bit random, > > but sometimes can confuse things. > > > > > I think having it in query-migrate is a much better idea than a > > > warning, since it should be much easier to parse and disable > > > zero-copy-send if desired. > > > Even in my current qemu test script, it's much better having it in > > > query-migrate. > > > > > > > > > > > > and just have a trace point in this location > > > > > instead. > > > > > > > > Yeh. > > > > > > > > > > Yeap, the counter idea seems great! > > > Will it be always printed there, or only when zero-copy-send is enabled? > > > > You could make it either if it's enabled or if it's none zero. > > (I guess you want it to reset to 0 at the start of a new migration). > > > > Dave > > Thanks for this feedback! > > I have everything already working, but I am struggling with a good property > name. > > I am currently using zero_copy_copied (or zero-copy-copied in json), but it does > not look like a good Migration stat name. > > Do you have any suggestion? Shrug; I'm not going to fuss over the name too much as long as it's reasonable. 'zero-copied' might be OK. Dave > Best regards, > Leo > > > > > > > > > > Best regards, > > > Leo > > > > > > > Dave > > > > > > > > > With regards, > > > > > Daniel > > > > > -- > > > > > > : https://berrange.com -o- > > > > > > https://www.flickr.com/photos/dberrange :| > > > > > > : https://libvirt.org -o- > > > > > > https://fstop138.berrange.com :| > > > > > > : https://entangle-photo.org -o- > > > > > > https://www.instagram.com/dberrange :| > > > > > > > > > -- > > > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK > > > > > > > >
diff --git a/migration/multifd.c b/migration/multifd.c index 684c014c86..9c62aec84e 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -624,6 +624,9 @@ int multifd_send_sync_main(QEMUFile *f) if (ret < 0) { error_report_err(err); return -1; + } else if (ret == 1) { + warn_report("The network device is not able to use " + "zero-copy-send: copying is being used"); } } }
Some errors, like the lack of Scatter-Gather support by the network interface(NETIF_F_SG) may cause sendmsg(...,MSG_ZEROCOPY) to fail on using zero-copy, which causes it to fall back to the default copying mechanism. After each full dirty-bitmap scan there should be a zero-copy flush happening, which checks for errors each of the previous calls to sendmsg(...,MSG_ZEROCOPY). If all of them failed to use zero-copy, then warn the user about it. Since it happens once each full dirty-bitmap scan, even in worst case scenario it should not print a lot of warnings, and will allow tracking how many dirty-bitmap iterations were not able to use zero-copy send. Signed-off-by: Leonardo Bras <leobras@redhat.com> --- migration/multifd.c | 3 +++ 1 file changed, 3 insertions(+)