Message ID | 1551454936-205218-2-git-send-email-imammedo@redhat.com |
---|---|
State | New |
Headers | show |
Series | numa: deprecate -numa node, mem and default memory distribution | expand |
On Fri, Mar 01, 2019 at 04:42:15PM +0100, Igor Mammedov wrote: > The parameter allows to configure fake NUMA topology where guest > VM simulates NUMA topology but not actually getting a performance > benefits from it. The same or better results could be achieved > using 'memdev' parameter. In light of that any VM that uses NUMA > to get its benefits should use 'memdev' and to allow transition > initial RAM to device based model, deprecate 'mem' parameter as > its ad-hoc partitioning of initial RAM MemoryRegion can't be > translated to memdev based backend transparently to users and in > compatible manner (migration wise). > > That will also allow to clean up a bit our numa code, leaving only > 'memdev' impl. in place and several boards that use node_mem > to generate FDT/ACPI description from it. Can you confirm that the 'mem' and 'memdev' parameters to -numa are 100% live migration compatible in both directions ? Libvirt would need this to be the case in order to use the 'memdev' syntax instead. > > Signed-off-by: Igor Mammedov <imammedo@redhat.com> > --- > numa.c | 2 ++ > qemu-deprecated.texi | 14 ++++++++++++++ > 2 files changed, 16 insertions(+) > > diff --git a/numa.c b/numa.c > index 3875e1e..2205773 100644 > --- a/numa.c > +++ b/numa.c > @@ -121,6 +121,8 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node, > > if (node->has_mem) { > numa_info[nodenr].node_mem = node->mem; > + warn_report("Parameter -numa node,mem is deprecated," > + " use -numa node,memdev instead"); > } > if (node->has_memdev) { > Object *o; > diff --git a/qemu-deprecated.texi b/qemu-deprecated.texi > index 45c5795..73f99d4 100644 > --- a/qemu-deprecated.texi > +++ b/qemu-deprecated.texi > @@ -60,6 +60,20 @@ Support for invalid topologies will be removed, the user must ensure > topologies described with -smp include all possible cpus, i.e. > @math{@var{sockets} * @var{cores} * @var{threads} = @var{maxcpus}}. > > +@subsection -numa node,mem=@var{size} (since 4.0) > + > +The parameter @option{mem} of @option{-numa node} is used to assign a part of > +guest RAM to a NUMA node. But when using it, it's impossible to manage specified > +size on the host side (like bind it to a host node, setting bind policy, ...), > +so guest end-ups with the fake NUMA configuration with suboptiomal performance. > +However since 2014 there is an alternative way to assign RAM to a NUMA node > +using parameter @option{memdev}, which does the same as @option{mem} and has > +an ability to actualy manage node RAM on the host side. Use parameter > +@option{memdev} with @var{memory-backend-ram} backend as an replacement for > +parameter @option{mem} to achieve the same fake NUMA effect or a properly > +configured @var{memory-backend-file} backend to actually benefit from NUMA > +configuration. > + > @section QEMU Machine Protocol (QMP) commands > > @subsection block-dirty-bitmap-add "autoload" parameter (since 2.12.0) > -- > 2.7.4 > > -- > libvir-list mailing list > libvir-list@redhat.com > https://www.redhat.com/mailman/listinfo/libvir-list Regards, Daniel
On Fri, 1 Mar 2019 15:49:47 +0000 Daniel P. Berrangé <berrange@redhat.com> wrote: > On Fri, Mar 01, 2019 at 04:42:15PM +0100, Igor Mammedov wrote: > > The parameter allows to configure fake NUMA topology where guest > > VM simulates NUMA topology but not actually getting a performance > > benefits from it. The same or better results could be achieved > > using 'memdev' parameter. In light of that any VM that uses NUMA > > to get its benefits should use 'memdev' and to allow transition > > initial RAM to device based model, deprecate 'mem' parameter as > > its ad-hoc partitioning of initial RAM MemoryRegion can't be > > translated to memdev based backend transparently to users and in > > compatible manner (migration wise). > > > > That will also allow to clean up a bit our numa code, leaving only > > 'memdev' impl. in place and several boards that use node_mem > > to generate FDT/ACPI description from it. > > Can you confirm that the 'mem' and 'memdev' parameters to -numa > are 100% live migration compatible in both directions ? Libvirt > would need this to be the case in order to use the 'memdev' syntax > instead. Unfortunately they are not migration compatible in any direction, if it where possible to translate them to each other I'd alias 'mem' to 'memdev' without deprecation. The former sends over only one MemoryRegion to target, while the later sends over several (one per memdev). Mixed memory issue[1] first came from libvirt side RHBZ1624223, back then it was resolved on libvirt side in favor of migration compatibility vs correctness (i.e. bind policy doesn't work as expected). What worse that it was made default and affects all new machines, as I understood it. In case of -mem-path + -mem-prealloc (with 1 numa node or numa less) it's possible on QEMU side to make conversion to memdev in migration compatible way (that's what stopped Michal from memdev approach). But it's hard to do so in multi-nodes case as amount of MemoryRegions is different. Point is to consider 'mem' as mis-configuration error, as the user in the first place using broken numa configuration (i.e. fake numa configuration doesn't actually improve performance). CCed David, maybe he could offer a way to do 1:n migration and other way around. > > Signed-off-by: Igor Mammedov <imammedo@redhat.com> > > --- > > numa.c | 2 ++ > > qemu-deprecated.texi | 14 ++++++++++++++ > > 2 files changed, 16 insertions(+) > > > > diff --git a/numa.c b/numa.c > > index 3875e1e..2205773 100644 > > --- a/numa.c > > +++ b/numa.c > > @@ -121,6 +121,8 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node, > > > > if (node->has_mem) { > > numa_info[nodenr].node_mem = node->mem; > > + warn_report("Parameter -numa node,mem is deprecated," > > + " use -numa node,memdev instead"); > > } > > if (node->has_memdev) { > > Object *o; > > diff --git a/qemu-deprecated.texi b/qemu-deprecated.texi > > index 45c5795..73f99d4 100644 > > --- a/qemu-deprecated.texi > > +++ b/qemu-deprecated.texi > > @@ -60,6 +60,20 @@ Support for invalid topologies will be removed, the user must ensure > > topologies described with -smp include all possible cpus, i.e. > > @math{@var{sockets} * @var{cores} * @var{threads} = @var{maxcpus}}. > > > > +@subsection -numa node,mem=@var{size} (since 4.0) > > + > > +The parameter @option{mem} of @option{-numa node} is used to assign a part of > > +guest RAM to a NUMA node. But when using it, it's impossible to manage specified > > +size on the host side (like bind it to a host node, setting bind policy, ...), > > +so guest end-ups with the fake NUMA configuration with suboptiomal performance. > > +However since 2014 there is an alternative way to assign RAM to a NUMA node > > +using parameter @option{memdev}, which does the same as @option{mem} and has > > +an ability to actualy manage node RAM on the host side. Use parameter > > +@option{memdev} with @var{memory-backend-ram} backend as an replacement for > > +parameter @option{mem} to achieve the same fake NUMA effect or a properly > > +configured @var{memory-backend-file} backend to actually benefit from NUMA > > +configuration. > > + > > @section QEMU Machine Protocol (QMP) commands > > > > @subsection block-dirty-bitmap-add "autoload" parameter (since 2.12.0) > > -- > > 2.7.4 > > > > -- > > libvir-list mailing list > > libvir-list@redhat.com > > https://www.redhat.com/mailman/listinfo/libvir-list > > Regards, > Daniel
On Fri, Mar 01, 2019 at 06:33:28PM +0100, Igor Mammedov wrote: > On Fri, 1 Mar 2019 15:49:47 +0000 > Daniel P. Berrangé <berrange@redhat.com> wrote: > > > On Fri, Mar 01, 2019 at 04:42:15PM +0100, Igor Mammedov wrote: > > > The parameter allows to configure fake NUMA topology where guest > > > VM simulates NUMA topology but not actually getting a performance > > > benefits from it. The same or better results could be achieved > > > using 'memdev' parameter. In light of that any VM that uses NUMA > > > to get its benefits should use 'memdev' and to allow transition > > > initial RAM to device based model, deprecate 'mem' parameter as > > > its ad-hoc partitioning of initial RAM MemoryRegion can't be > > > translated to memdev based backend transparently to users and in > > > compatible manner (migration wise). > > > > > > That will also allow to clean up a bit our numa code, leaving only > > > 'memdev' impl. in place and several boards that use node_mem > > > to generate FDT/ACPI description from it. > > > > Can you confirm that the 'mem' and 'memdev' parameters to -numa > > are 100% live migration compatible in both directions ? Libvirt > > would need this to be the case in order to use the 'memdev' syntax > > instead. > Unfortunately they are not migration compatible in any direction, > if it where possible to translate them to each other I'd alias 'mem' > to 'memdev' without deprecation. The former sends over only one > MemoryRegion to target, while the later sends over several (one per > memdev). If we can't migration from one to the other, then we can not deprecate the existing 'mem' syntax. Even if libvirt were to provide a config option to let apps opt-in to the new syntax, we need to be able to support live migration of existing running VMs indefinitely. Effectively this means we need the to keep 'mem' support forever, or at least such a long time that it effectively means forever. So I think this patch has to be dropped & replaced with one that simply documents that memdev syntax is preferred. Regards, Daniel
* Igor Mammedov (imammedo@redhat.com) wrote: > On Fri, 1 Mar 2019 15:49:47 +0000 > Daniel P. Berrangé <berrange@redhat.com> wrote: > > > On Fri, Mar 01, 2019 at 04:42:15PM +0100, Igor Mammedov wrote: > > > The parameter allows to configure fake NUMA topology where guest > > > VM simulates NUMA topology but not actually getting a performance > > > benefits from it. The same or better results could be achieved > > > using 'memdev' parameter. In light of that any VM that uses NUMA > > > to get its benefits should use 'memdev' and to allow transition > > > initial RAM to device based model, deprecate 'mem' parameter as > > > its ad-hoc partitioning of initial RAM MemoryRegion can't be > > > translated to memdev based backend transparently to users and in > > > compatible manner (migration wise). > > > > > > That will also allow to clean up a bit our numa code, leaving only > > > 'memdev' impl. in place and several boards that use node_mem > > > to generate FDT/ACPI description from it. > > > > Can you confirm that the 'mem' and 'memdev' parameters to -numa > > are 100% live migration compatible in both directions ? Libvirt > > would need this to be the case in order to use the 'memdev' syntax > > instead. > Unfortunately they are not migration compatible in any direction, > if it where possible to translate them to each other I'd alias 'mem' > to 'memdev' without deprecation. The former sends over only one > MemoryRegion to target, while the later sends over several (one per > memdev). > > Mixed memory issue[1] first came from libvirt side RHBZ1624223, > back then it was resolved on libvirt side in favor of migration > compatibility vs correctness (i.e. bind policy doesn't work as expected). > What worse that it was made default and affects all new machines, > as I understood it. > > In case of -mem-path + -mem-prealloc (with 1 numa node or numa less) > it's possible on QEMU side to make conversion to memdev in migration > compatible way (that's what stopped Michal from memdev approach). > But it's hard to do so in multi-nodes case as amount of MemoryRegions > is different. > > Point is to consider 'mem' as mis-configuration error, as the user > in the first place using broken numa configuration > (i.e. fake numa configuration doesn't actually improve performance). > > CCed David, maybe he could offer a way to do 1:n migration and other > way around. I can't see a trivial way. About the easiest I can think of is if you had a way to create a memdev that was an alias to pc.ram (of a particular size and offset). Dave > > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com> > > > --- > > > numa.c | 2 ++ > > > qemu-deprecated.texi | 14 ++++++++++++++ > > > 2 files changed, 16 insertions(+) > > > > > > diff --git a/numa.c b/numa.c > > > index 3875e1e..2205773 100644 > > > --- a/numa.c > > > +++ b/numa.c > > > @@ -121,6 +121,8 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node, > > > > > > if (node->has_mem) { > > > numa_info[nodenr].node_mem = node->mem; > > > + warn_report("Parameter -numa node,mem is deprecated," > > > + " use -numa node,memdev instead"); > > > } > > > if (node->has_memdev) { > > > Object *o; > > > diff --git a/qemu-deprecated.texi b/qemu-deprecated.texi > > > index 45c5795..73f99d4 100644 > > > --- a/qemu-deprecated.texi > > > +++ b/qemu-deprecated.texi > > > @@ -60,6 +60,20 @@ Support for invalid topologies will be removed, the user must ensure > > > topologies described with -smp include all possible cpus, i.e. > > > @math{@var{sockets} * @var{cores} * @var{threads} = @var{maxcpus}}. > > > > > > +@subsection -numa node,mem=@var{size} (since 4.0) > > > + > > > +The parameter @option{mem} of @option{-numa node} is used to assign a part of > > > +guest RAM to a NUMA node. But when using it, it's impossible to manage specified > > > +size on the host side (like bind it to a host node, setting bind policy, ...), > > > +so guest end-ups with the fake NUMA configuration with suboptiomal performance. > > > +However since 2014 there is an alternative way to assign RAM to a NUMA node > > > +using parameter @option{memdev}, which does the same as @option{mem} and has > > > +an ability to actualy manage node RAM on the host side. Use parameter > > > +@option{memdev} with @var{memory-backend-ram} backend as an replacement for > > > +parameter @option{mem} to achieve the same fake NUMA effect or a properly > > > +configured @var{memory-backend-file} backend to actually benefit from NUMA > > > +configuration. > > > + > > > @section QEMU Machine Protocol (QMP) commands > > > > > > @subsection block-dirty-bitmap-add "autoload" parameter (since 2.12.0) > > > -- > > > 2.7.4 > > > > > > -- > > > libvir-list mailing list > > > libvir-list@redhat.com > > > https://www.redhat.com/mailman/listinfo/libvir-list > > > > Regards, > > Daniel > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Daniel P. Berrangé <berrange@redhat.com> writes: > On Fri, Mar 01, 2019 at 06:33:28PM +0100, Igor Mammedov wrote: >> On Fri, 1 Mar 2019 15:49:47 +0000 >> Daniel P. Berrangé <berrange@redhat.com> wrote: >> >> > On Fri, Mar 01, 2019 at 04:42:15PM +0100, Igor Mammedov wrote: >> > > The parameter allows to configure fake NUMA topology where guest >> > > VM simulates NUMA topology but not actually getting a performance >> > > benefits from it. The same or better results could be achieved >> > > using 'memdev' parameter. In light of that any VM that uses NUMA >> > > to get its benefits should use 'memdev' and to allow transition >> > > initial RAM to device based model, deprecate 'mem' parameter as >> > > its ad-hoc partitioning of initial RAM MemoryRegion can't be >> > > translated to memdev based backend transparently to users and in >> > > compatible manner (migration wise). >> > > >> > > That will also allow to clean up a bit our numa code, leaving only >> > > 'memdev' impl. in place and several boards that use node_mem >> > > to generate FDT/ACPI description from it. >> > >> > Can you confirm that the 'mem' and 'memdev' parameters to -numa >> > are 100% live migration compatible in both directions ? Libvirt >> > would need this to be the case in order to use the 'memdev' syntax >> > instead. >> Unfortunately they are not migration compatible in any direction, >> if it where possible to translate them to each other I'd alias 'mem' >> to 'memdev' without deprecation. The former sends over only one >> MemoryRegion to target, while the later sends over several (one per >> memdev). > > If we can't migration from one to the other, then we can not deprecate > the existing 'mem' syntax. Even if libvirt were to provide a config > option to let apps opt-in to the new syntax, we need to be able to > support live migration of existing running VMs indefinitely. Effectively > this means we need the to keep 'mem' support forever, or at least such > a long time that it effectively means forever. > > So I think this patch has to be dropped & replaced with one that > simply documents that memdev syntax is preferred. We have this habit of postulating absolutes like "can not deprecate" instead of engaging with the tradeoffs. We need to kick it. So let's have an actual look at the tradeoffs. We don't actually "support live migration of existing running VMs indefinitely". We support live migration to any newer version of QEMU that still supports the machine type. We support live migration to any older version of QEMU that already supports the machine type and all the devices the machine uses. Aside: "support" is really an honest best effort here. If you rely on it, use a downstream that puts in the (substantial!) QA work real support takes. Feature deprecation is not a contract to drop the feature after two releases, or even five. It's a formal notice that users of the feature should transition to its replacement in an orderly manner. If I understand Igor correctly, all users should transition away from outdated NUMA configurations at least for new VMs in an orderly manner. So, how could this formal notice be served constructively? If we reject outdated NUMA configurations starting with machine type T, we can remove the means to create those configurations along with machine type T-1. Won't happen anytime soon, will happen eventually, because in the long run, all machine types are dead (apologies to Keynes). If we deprecate outdated NUMA configurations now, we can start rejecting them with new machine types after a suitable grace period.
On 01/03/2019 18.48, Daniel P. Berrangé wrote: [...] > So I think this patch has to be dropped & replaced with one that > simply documents that memdev syntax is preferred. That's definitely not enough. I've had a couple of cases already where we documented that certain options should not be used anymore, and people simply ignored it (aka. if it ain't broken, don't do any change). Then they just started to complain when I really tried to remove the option after the deprecation period. So Igor, if you can not officially deprecate these things here yet, you should at least make sure that they can not be used with new machine types anymore. Then, after a couple of years, when we feel sure that there are only some few or no people left who still use it with the old machine types, we can start to discuss the deprecation process again, I think. Thomas
On Mon, Mar 04, 2019 at 08:13:53AM +0100, Markus Armbruster wrote: > Daniel P. Berrangé <berrange@redhat.com> writes: > > > On Fri, Mar 01, 2019 at 06:33:28PM +0100, Igor Mammedov wrote: > >> On Fri, 1 Mar 2019 15:49:47 +0000 > >> Daniel P. Berrangé <berrange@redhat.com> wrote: > >> > >> > On Fri, Mar 01, 2019 at 04:42:15PM +0100, Igor Mammedov wrote: > >> > > The parameter allows to configure fake NUMA topology where guest > >> > > VM simulates NUMA topology but not actually getting a performance > >> > > benefits from it. The same or better results could be achieved > >> > > using 'memdev' parameter. In light of that any VM that uses NUMA > >> > > to get its benefits should use 'memdev' and to allow transition > >> > > initial RAM to device based model, deprecate 'mem' parameter as > >> > > its ad-hoc partitioning of initial RAM MemoryRegion can't be > >> > > translated to memdev based backend transparently to users and in > >> > > compatible manner (migration wise). > >> > > > >> > > That will also allow to clean up a bit our numa code, leaving only > >> > > 'memdev' impl. in place and several boards that use node_mem > >> > > to generate FDT/ACPI description from it. > >> > > >> > Can you confirm that the 'mem' and 'memdev' parameters to -numa > >> > are 100% live migration compatible in both directions ? Libvirt > >> > would need this to be the case in order to use the 'memdev' syntax > >> > instead. > >> Unfortunately they are not migration compatible in any direction, > >> if it where possible to translate them to each other I'd alias 'mem' > >> to 'memdev' without deprecation. The former sends over only one > >> MemoryRegion to target, while the later sends over several (one per > >> memdev). > > > > If we can't migration from one to the other, then we can not deprecate > > the existing 'mem' syntax. Even if libvirt were to provide a config > > option to let apps opt-in to the new syntax, we need to be able to > > support live migration of existing running VMs indefinitely. Effectively > > this means we need the to keep 'mem' support forever, or at least such > > a long time that it effectively means forever. > > > > So I think this patch has to be dropped & replaced with one that > > simply documents that memdev syntax is preferred. > > We have this habit of postulating absolutes like "can not deprecate" > instead of engaging with the tradeoffs. We need to kick it. > > So let's have an actual look at the tradeoffs. > > We don't actually "support live migration of existing running VMs > indefinitely". > > We support live migration to any newer version of QEMU that still > supports the machine type. > > We support live migration to any older version of QEMU that already > supports the machine type and all the devices the machine uses. > > Aside: "support" is really an honest best effort here. If you rely on > it, use a downstream that puts in the (substantial!) QA work real > support takes. If upstream deletes the feature, then that in turn breaks the downstream unless downstream reverts the upstream change. When we have large overlap between downstream & upstream maintainer, it is not beneficial to delete the feature upstream as any effort saved upstream usually expands into larger effort downstream. > Feature deprecation is not a contract to drop the feature after two > releases, or even five. It's a formal notice that users of the feature > should transition to its replacement in an orderly manner. > > If I understand Igor correctly, all users should transition away from > outdated NUMA configurations at least for new VMs in an orderly manner. > > So, how could this formal notice be served constructively? > > If we reject outdated NUMA configurations starting with machine type T, > we can remove the means to create those configurations along with > machine type T-1. Won't happen anytime soon, will happen eventually, > because in the long run, all machine types are dead (apologies to > Keynes). > > If we deprecate outdated NUMA configurations now, we can start rejecting > them with new machine types after a suitable grace period. How is libvirt going to know what machines it can use with the feature ? We don't have any way to introspect machine type specific logic, since we run all probing with "-machine none", and QEMU can't report anything about machines without instantiating them. Regards, Daniel
Daniel P. Berrangé <berrange@redhat.com> writes: > On Mon, Mar 04, 2019 at 08:13:53AM +0100, Markus Armbruster wrote: >> Daniel P. Berrangé <berrange@redhat.com> writes: >> >> > On Fri, Mar 01, 2019 at 06:33:28PM +0100, Igor Mammedov wrote: >> >> On Fri, 1 Mar 2019 15:49:47 +0000 >> >> Daniel P. Berrangé <berrange@redhat.com> wrote: >> >> >> >> > On Fri, Mar 01, 2019 at 04:42:15PM +0100, Igor Mammedov wrote: >> >> > > The parameter allows to configure fake NUMA topology where guest >> >> > > VM simulates NUMA topology but not actually getting a performance >> >> > > benefits from it. The same or better results could be achieved >> >> > > using 'memdev' parameter. In light of that any VM that uses NUMA >> >> > > to get its benefits should use 'memdev' and to allow transition >> >> > > initial RAM to device based model, deprecate 'mem' parameter as >> >> > > its ad-hoc partitioning of initial RAM MemoryRegion can't be >> >> > > translated to memdev based backend transparently to users and in >> >> > > compatible manner (migration wise). >> >> > > >> >> > > That will also allow to clean up a bit our numa code, leaving only >> >> > > 'memdev' impl. in place and several boards that use node_mem >> >> > > to generate FDT/ACPI description from it. >> >> > >> >> > Can you confirm that the 'mem' and 'memdev' parameters to -numa >> >> > are 100% live migration compatible in both directions ? Libvirt >> >> > would need this to be the case in order to use the 'memdev' syntax >> >> > instead. >> >> Unfortunately they are not migration compatible in any direction, >> >> if it where possible to translate them to each other I'd alias 'mem' >> >> to 'memdev' without deprecation. The former sends over only one >> >> MemoryRegion to target, while the later sends over several (one per >> >> memdev). >> > >> > If we can't migration from one to the other, then we can not deprecate >> > the existing 'mem' syntax. Even if libvirt were to provide a config >> > option to let apps opt-in to the new syntax, we need to be able to >> > support live migration of existing running VMs indefinitely. Effectively >> > this means we need the to keep 'mem' support forever, or at least such >> > a long time that it effectively means forever. >> > >> > So I think this patch has to be dropped & replaced with one that >> > simply documents that memdev syntax is preferred. >> >> We have this habit of postulating absolutes like "can not deprecate" >> instead of engaging with the tradeoffs. We need to kick it. >> >> So let's have an actual look at the tradeoffs. >> >> We don't actually "support live migration of existing running VMs >> indefinitely". >> >> We support live migration to any newer version of QEMU that still >> supports the machine type. >> >> We support live migration to any older version of QEMU that already >> supports the machine type and all the devices the machine uses. >> >> Aside: "support" is really an honest best effort here. If you rely on >> it, use a downstream that puts in the (substantial!) QA work real >> support takes. > > If upstream deletes the feature, then that in turn breaks the downstream > unless downstream reverts the upstream change. When we have large overlap > between downstream & upstream maintainer, it is not beneficial to delete > the feature upstream as any effort saved upstream usually expands into > larger effort downstream. It can't "break" existing downstreams, only future ones forked off after the deletion. Such a fork cares only if it has backward compatibility requirements to satisfy that require the feature. My point is: it's not a simple absolute, it's a complex tradeoff. >> Feature deprecation is not a contract to drop the feature after two >> releases, or even five. It's a formal notice that users of the feature >> should transition to its replacement in an orderly manner. >> >> If I understand Igor correctly, all users should transition away from >> outdated NUMA configurations at least for new VMs in an orderly manner. >> >> So, how could this formal notice be served constructively? >> >> If we reject outdated NUMA configurations starting with machine type T, >> we can remove the means to create those configurations along with >> machine type T-1. Won't happen anytime soon, will happen eventually, >> because in the long run, all machine types are dead (apologies to >> Keynes). >> >> If we deprecate outdated NUMA configurations now, we can start rejecting >> them with new machine types after a suitable grace period. > > How is libvirt going to know what machines it can use with the feature ? > We don't have any way to introspect machine type specific logic, since we > run all probing with "-machine none", and QEMU can't report anything about > machines without instantiating them. Fair point. A practical way for management applications to decide which of the two interfaces they can use with which machine type may be required for deprecating one of the interfaces with new machine types.
On Mon, 04 Mar 2019 08:13:53 +0100 Markus Armbruster <armbru@redhat.com> wrote: > Daniel P. Berrangé <berrange@redhat.com> writes: > > > On Fri, Mar 01, 2019 at 06:33:28PM +0100, Igor Mammedov wrote: > >> On Fri, 1 Mar 2019 15:49:47 +0000 > >> Daniel P. Berrangé <berrange@redhat.com> wrote: > >> > >> > On Fri, Mar 01, 2019 at 04:42:15PM +0100, Igor Mammedov wrote: > >> > > The parameter allows to configure fake NUMA topology where guest > >> > > VM simulates NUMA topology but not actually getting a performance > >> > > benefits from it. The same or better results could be achieved > >> > > using 'memdev' parameter. In light of that any VM that uses NUMA > >> > > to get its benefits should use 'memdev' and to allow transition > >> > > initial RAM to device based model, deprecate 'mem' parameter as > >> > > its ad-hoc partitioning of initial RAM MemoryRegion can't be > >> > > translated to memdev based backend transparently to users and in > >> > > compatible manner (migration wise). > >> > > > >> > > That will also allow to clean up a bit our numa code, leaving only > >> > > 'memdev' impl. in place and several boards that use node_mem > >> > > to generate FDT/ACPI description from it. > >> > > >> > Can you confirm that the 'mem' and 'memdev' parameters to -numa > >> > are 100% live migration compatible in both directions ? Libvirt > >> > would need this to be the case in order to use the 'memdev' syntax > >> > instead. > >> Unfortunately they are not migration compatible in any direction, > >> if it where possible to translate them to each other I'd alias 'mem' > >> to 'memdev' without deprecation. The former sends over only one > >> MemoryRegion to target, while the later sends over several (one per > >> memdev). > > > > If we can't migration from one to the other, then we can not deprecate > > the existing 'mem' syntax. Even if libvirt were to provide a config > > option to let apps opt-in to the new syntax, we need to be able to > > support live migration of existing running VMs indefinitely. Effectively > > this means we need the to keep 'mem' support forever, or at least such > > a long time that it effectively means forever. > > > > So I think this patch has to be dropped & replaced with one that > > simply documents that memdev syntax is preferred. > > We have this habit of postulating absolutes like "can not deprecate" > instead of engaging with the tradeoffs. We need to kick it. > > So let's have an actual look at the tradeoffs. > > We don't actually "support live migration of existing running VMs > indefinitely". > > We support live migration to any newer version of QEMU that still > supports the machine type. > > We support live migration to any older version of QEMU that already > supports the machine type and all the devices the machine uses. > > Aside: "support" is really an honest best effort here. If you rely on > it, use a downstream that puts in the (substantial!) QA work real > support takes. > > Feature deprecation is not a contract to drop the feature after two > releases, or even five. It's a formal notice that users of the feature > should transition to its replacement in an orderly manner. > > If I understand Igor correctly, all users should transition away from > outdated NUMA configurations at least for new VMs in an orderly manner. Yes, we can postpone removing options until there are machines type versions that were capable to use it (unfortunate but probably unavoidable unless there is a migration trick to make transition transparent) but that should not stop us from disabling broken options on new machine types at least. This series can serve as formal notice with follow up disabling of deprecated options for new machine types. (As Thomas noted, just warnings do not work and users continue to use broken features regardless whether they are don't know about issues or aware of it [*]) Hence suggested deprecation approach and enforced rejection of legacy numa options for new machine types in 2 releases so users would stop using them eventually. *) https://www.redhat.com/archives/libvir-list/2018-November/msg00159.html > So, how could this formal notice be served constructively? > > If we reject outdated NUMA configurations starting with machine type T, > we can remove the means to create those configurations along with > machine type T-1. Won't happen anytime soon, will happen eventually, > because in the long run, all machine types are dead (apologies to > Keynes). > > If we deprecate outdated NUMA configurations now, we can start rejecting > them with new machine types after a suitable grace period. >
On Mon, Mar 04, 2019 at 01:25:07PM +0100, Igor Mammedov wrote: > On Mon, 04 Mar 2019 08:13:53 +0100 > Markus Armbruster <armbru@redhat.com> wrote: > > > Daniel P. Berrangé <berrange@redhat.com> writes: > > > > > On Fri, Mar 01, 2019 at 06:33:28PM +0100, Igor Mammedov wrote: > > >> On Fri, 1 Mar 2019 15:49:47 +0000 > > >> Daniel P. Berrangé <berrange@redhat.com> wrote: > > >> > > >> > On Fri, Mar 01, 2019 at 04:42:15PM +0100, Igor Mammedov wrote: > > >> > > The parameter allows to configure fake NUMA topology where guest > > >> > > VM simulates NUMA topology but not actually getting a performance > > >> > > benefits from it. The same or better results could be achieved > > >> > > using 'memdev' parameter. In light of that any VM that uses NUMA > > >> > > to get its benefits should use 'memdev' and to allow transition > > >> > > initial RAM to device based model, deprecate 'mem' parameter as > > >> > > its ad-hoc partitioning of initial RAM MemoryRegion can't be > > >> > > translated to memdev based backend transparently to users and in > > >> > > compatible manner (migration wise). > > >> > > > > >> > > That will also allow to clean up a bit our numa code, leaving only > > >> > > 'memdev' impl. in place and several boards that use node_mem > > >> > > to generate FDT/ACPI description from it. > > >> > > > >> > Can you confirm that the 'mem' and 'memdev' parameters to -numa > > >> > are 100% live migration compatible in both directions ? Libvirt > > >> > would need this to be the case in order to use the 'memdev' syntax > > >> > instead. > > >> Unfortunately they are not migration compatible in any direction, > > >> if it where possible to translate them to each other I'd alias 'mem' > > >> to 'memdev' without deprecation. The former sends over only one > > >> MemoryRegion to target, while the later sends over several (one per > > >> memdev). > > > > > > If we can't migration from one to the other, then we can not deprecate > > > the existing 'mem' syntax. Even if libvirt were to provide a config > > > option to let apps opt-in to the new syntax, we need to be able to > > > support live migration of existing running VMs indefinitely. Effectively > > > this means we need the to keep 'mem' support forever, or at least such > > > a long time that it effectively means forever. > > > > > > So I think this patch has to be dropped & replaced with one that > > > simply documents that memdev syntax is preferred. > > > > We have this habit of postulating absolutes like "can not deprecate" > > instead of engaging with the tradeoffs. We need to kick it. > > > > So let's have an actual look at the tradeoffs. > > > > We don't actually "support live migration of existing running VMs > > indefinitely". > > > > We support live migration to any newer version of QEMU that still > > supports the machine type. > > > > We support live migration to any older version of QEMU that already > > supports the machine type and all the devices the machine uses. > > > > Aside: "support" is really an honest best effort here. If you rely on > > it, use a downstream that puts in the (substantial!) QA work real > > support takes. > > > > Feature deprecation is not a contract to drop the feature after two > > releases, or even five. It's a formal notice that users of the feature > > should transition to its replacement in an orderly manner. > > > > If I understand Igor correctly, all users should transition away from > > outdated NUMA configurations at least for new VMs in an orderly manner. > Yes, we can postpone removing options until there are machines type > versions that were capable to use it (unfortunate but probably > unavoidable unless there is a migration trick to make transition > transparent) but that should not stop us from disabling broken > options on new machine types at least. > > This series can serve as formal notice with follow up disabling of > deprecated options for new machine types. (As Thomas noted, just warnings > do not work and users continue to use broken features regardless whether > they are don't know about issues or aware of it [*]) > > Hence suggested deprecation approach and enforced rejection of legacy > numa options for new machine types in 2 releases so users would stop > using them eventually. When we deprecate something, we need to have a way for apps to use the new alternative approach *at the same time*. So even if we only want to deprecate for new machine types, we still have to first solve the problem of how mgmt apps will introspect QEMU to learn which machine types expect the new options. Regards, Daniel
On Fri, 1 Mar 2019 18:01:52 +0000 "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote: > * Igor Mammedov (imammedo@redhat.com) wrote: > > On Fri, 1 Mar 2019 15:49:47 +0000 > > Daniel P. Berrangé <berrange@redhat.com> wrote: > > > > > On Fri, Mar 01, 2019 at 04:42:15PM +0100, Igor Mammedov wrote: > > > > The parameter allows to configure fake NUMA topology where guest > > > > VM simulates NUMA topology but not actually getting a performance > > > > benefits from it. The same or better results could be achieved > > > > using 'memdev' parameter. In light of that any VM that uses NUMA > > > > to get its benefits should use 'memdev' and to allow transition > > > > initial RAM to device based model, deprecate 'mem' parameter as > > > > its ad-hoc partitioning of initial RAM MemoryRegion can't be > > > > translated to memdev based backend transparently to users and in > > > > compatible manner (migration wise). > > > > > > > > That will also allow to clean up a bit our numa code, leaving only > > > > 'memdev' impl. in place and several boards that use node_mem > > > > to generate FDT/ACPI description from it. > > > > > > Can you confirm that the 'mem' and 'memdev' parameters to -numa > > > are 100% live migration compatible in both directions ? Libvirt > > > would need this to be the case in order to use the 'memdev' syntax > > > instead. > > Unfortunately they are not migration compatible in any direction, > > if it where possible to translate them to each other I'd alias 'mem' > > to 'memdev' without deprecation. The former sends over only one > > MemoryRegion to target, while the later sends over several (one per > > memdev). > > > > Mixed memory issue[1] first came from libvirt side RHBZ1624223, > > back then it was resolved on libvirt side in favor of migration > > compatibility vs correctness (i.e. bind policy doesn't work as expected). > > What worse that it was made default and affects all new machines, > > as I understood it. > > > > In case of -mem-path + -mem-prealloc (with 1 numa node or numa less) > > it's possible on QEMU side to make conversion to memdev in migration > > compatible way (that's what stopped Michal from memdev approach). > > But it's hard to do so in multi-nodes case as amount of MemoryRegions > > is different. > > > > Point is to consider 'mem' as mis-configuration error, as the user > > in the first place using broken numa configuration > > (i.e. fake numa configuration doesn't actually improve performance). > > > > CCed David, maybe he could offer a way to do 1:n migration and other > > way around. > > I can't see a trivial way. > About the easiest I can think of is if you had a way to create a memdev > that was an alias to pc.ram (of a particular size and offset). If I get you right that's what I was planning to do for numa-less machines that use -mem-path/prealloc options, where it's possible to replace an initial RAM MemoryRegion with a correspondingly named memdev and its backing MemoryRegion. But I don't see how it could work in case of legacy NUMA 'mem' options where initial RAM is 1 MemoryRegion (it's a fake numa after all) and how to translate that into several MemoryRegions (one per node/memdev). > Dave > > > > > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com> > > > > --- > > > > numa.c | 2 ++ > > > > qemu-deprecated.texi | 14 ++++++++++++++ > > > > 2 files changed, 16 insertions(+) > > > > > > > > diff --git a/numa.c b/numa.c > > > > index 3875e1e..2205773 100644 > > > > --- a/numa.c > > > > +++ b/numa.c > > > > @@ -121,6 +121,8 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node, > > > > > > > > if (node->has_mem) { > > > > numa_info[nodenr].node_mem = node->mem; > > > > + warn_report("Parameter -numa node,mem is deprecated," > > > > + " use -numa node,memdev instead"); > > > > } > > > > if (node->has_memdev) { > > > > Object *o; > > > > diff --git a/qemu-deprecated.texi b/qemu-deprecated.texi > > > > index 45c5795..73f99d4 100644 > > > > --- a/qemu-deprecated.texi > > > > +++ b/qemu-deprecated.texi > > > > @@ -60,6 +60,20 @@ Support for invalid topologies will be removed, the user must ensure > > > > topologies described with -smp include all possible cpus, i.e. > > > > @math{@var{sockets} * @var{cores} * @var{threads} = @var{maxcpus}}. > > > > > > > > +@subsection -numa node,mem=@var{size} (since 4.0) > > > > + > > > > +The parameter @option{mem} of @option{-numa node} is used to assign a part of > > > > +guest RAM to a NUMA node. But when using it, it's impossible to manage specified > > > > +size on the host side (like bind it to a host node, setting bind policy, ...), > > > > +so guest end-ups with the fake NUMA configuration with suboptiomal performance. > > > > +However since 2014 there is an alternative way to assign RAM to a NUMA node > > > > +using parameter @option{memdev}, which does the same as @option{mem} and has > > > > +an ability to actualy manage node RAM on the host side. Use parameter > > > > +@option{memdev} with @var{memory-backend-ram} backend as an replacement for > > > > +parameter @option{mem} to achieve the same fake NUMA effect or a properly > > > > +configured @var{memory-backend-file} backend to actually benefit from NUMA > > > > +configuration. > > > > + > > > > @section QEMU Machine Protocol (QMP) commands > > > > > > > > @subsection block-dirty-bitmap-add "autoload" parameter (since 2.12.0) > > > > -- > > > > 2.7.4 > > > > > > > > -- > > > > libvir-list mailing list > > > > libvir-list@redhat.com > > > > https://www.redhat.com/mailman/listinfo/libvir-list > > > > > > Regards, > > > Daniel > > > -- > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On Mon, 4 Mar 2019 09:11:19 +0100 Thomas Huth <thuth@redhat.com> wrote: > On 01/03/2019 18.48, Daniel P. Berrangé wrote: > [...] > > So I think this patch has to be dropped & replaced with one that > > simply documents that memdev syntax is preferred. > > That's definitely not enough. I've had a couple of cases already where > we documented that certain options should not be used anymore, and > people simply ignored it (aka. if it ain't broken, don't do any change). > Then they just started to complain when I really tried to remove the > option after the deprecation period. > So Igor, if you can not officially deprecate these things here yet, you > should at least make sure that they can not be used with new machine > types anymore. Then, after a couple of years, when we feel sure that > there are only some few or no people left who still use it with the old > machine types, we can start to discuss the deprecation process again, I > think. Is it acceptable to silently disable CLI options (even if they are broken like in this case) for new machine types? I was under impression that it should go through deprecation first. > > Thomas >
On Mon, Mar 04, 2019 at 02:55:10PM +0100, Igor Mammedov wrote: > On Mon, 4 Mar 2019 09:11:19 +0100 > Thomas Huth <thuth@redhat.com> wrote: > > > On 01/03/2019 18.48, Daniel P. Berrangé wrote: > > [...] > > > So I think this patch has to be dropped & replaced with one that > > > simply documents that memdev syntax is preferred. > > > > That's definitely not enough. I've had a couple of cases already where > > we documented that certain options should not be used anymore, and > > people simply ignored it (aka. if it ain't broken, don't do any change). > > Then they just started to complain when I really tried to remove the > > option after the deprecation period. > > > So Igor, if you can not officially deprecate these things here yet, you > > should at least make sure that they can not be used with new machine > > types anymore. Then, after a couple of years, when we feel sure that > > there are only some few or no people left who still use it with the old > > machine types, we can start to discuss the deprecation process again, I > > think. > Is it acceptable to silently disable CLI options (even if they are broken > like in this case) for new machine types? > I was under impression that it should go through deprecation first. Yes, it must go through deprecation. I was saying we can't disable the CLI options at all, until there is a way for libvirt to correctly use the new options. Regards, Daniel
On Mon, 4 Mar 2019 12:39:08 +0000 Daniel P. Berrangé <berrange@redhat.com> wrote: > On Mon, Mar 04, 2019 at 01:25:07PM +0100, Igor Mammedov wrote: > > On Mon, 04 Mar 2019 08:13:53 +0100 > > Markus Armbruster <armbru@redhat.com> wrote: > > > > > Daniel P. Berrangé <berrange@redhat.com> writes: > > > > > > > On Fri, Mar 01, 2019 at 06:33:28PM +0100, Igor Mammedov wrote: > > > >> On Fri, 1 Mar 2019 15:49:47 +0000 > > > >> Daniel P. Berrangé <berrange@redhat.com> wrote: > > > >> > > > >> > On Fri, Mar 01, 2019 at 04:42:15PM +0100, Igor Mammedov wrote: > > > >> > > The parameter allows to configure fake NUMA topology where guest > > > >> > > VM simulates NUMA topology but not actually getting a performance > > > >> > > benefits from it. The same or better results could be achieved > > > >> > > using 'memdev' parameter. In light of that any VM that uses NUMA > > > >> > > to get its benefits should use 'memdev' and to allow transition > > > >> > > initial RAM to device based model, deprecate 'mem' parameter as > > > >> > > its ad-hoc partitioning of initial RAM MemoryRegion can't be > > > >> > > translated to memdev based backend transparently to users and in > > > >> > > compatible manner (migration wise). > > > >> > > > > > >> > > That will also allow to clean up a bit our numa code, leaving only > > > >> > > 'memdev' impl. in place and several boards that use node_mem > > > >> > > to generate FDT/ACPI description from it. > > > >> > > > > >> > Can you confirm that the 'mem' and 'memdev' parameters to -numa > > > >> > are 100% live migration compatible in both directions ? Libvirt > > > >> > would need this to be the case in order to use the 'memdev' syntax > > > >> > instead. > > > >> Unfortunately they are not migration compatible in any direction, > > > >> if it where possible to translate them to each other I'd alias 'mem' > > > >> to 'memdev' without deprecation. The former sends over only one > > > >> MemoryRegion to target, while the later sends over several (one per > > > >> memdev). > > > > > > > > If we can't migration from one to the other, then we can not deprecate > > > > the existing 'mem' syntax. Even if libvirt were to provide a config > > > > option to let apps opt-in to the new syntax, we need to be able to > > > > support live migration of existing running VMs indefinitely. Effectively > > > > this means we need the to keep 'mem' support forever, or at least such > > > > a long time that it effectively means forever. > > > > > > > > So I think this patch has to be dropped & replaced with one that > > > > simply documents that memdev syntax is preferred. > > > > > > We have this habit of postulating absolutes like "can not deprecate" > > > instead of engaging with the tradeoffs. We need to kick it. > > > > > > So let's have an actual look at the tradeoffs. > > > > > > We don't actually "support live migration of existing running VMs > > > indefinitely". > > > > > > We support live migration to any newer version of QEMU that still > > > supports the machine type. > > > > > > We support live migration to any older version of QEMU that already > > > supports the machine type and all the devices the machine uses. > > > > > > Aside: "support" is really an honest best effort here. If you rely on > > > it, use a downstream that puts in the (substantial!) QA work real > > > support takes. > > > > > > Feature deprecation is not a contract to drop the feature after two > > > releases, or even five. It's a formal notice that users of the feature > > > should transition to its replacement in an orderly manner. > > > > > > If I understand Igor correctly, all users should transition away from > > > outdated NUMA configurations at least for new VMs in an orderly manner. > > Yes, we can postpone removing options until there are machines type > > versions that were capable to use it (unfortunate but probably > > unavoidable unless there is a migration trick to make transition > > transparent) but that should not stop us from disabling broken > > options on new machine types at least. > > > > This series can serve as formal notice with follow up disabling of > > deprecated options for new machine types. (As Thomas noted, just warnings > > do not work and users continue to use broken features regardless whether > > they are don't know about issues or aware of it [*]) > > > > Hence suggested deprecation approach and enforced rejection of legacy > > numa options for new machine types in 2 releases so users would stop > > using them eventually. > > When we deprecate something, we need to have a way for apps to use the > new alternative approach *at the same time*. So even if we only want to > deprecate for new machine types, we still have to first solve the problem > of how mgmt apps will introspect QEMU to learn which machine types expect > the new options. I'm not aware any mechanism to introspect machine type options (existing or something being developed). Are/were there any ideas about it that were discussed in the past? Aside from developing a new mechanism what are alternative approaches? I mean when we delete deprecated CLI option, how it's solved on libvirt side currently? For example I don't see anything introspection related when we have been removing deprecated options recently. More exact question specific to this series usecase, how libvirt decides when to use -numa node,memdev or not currently? > > Regards, > Daniel
[Thanks Igor for bringing this onto my radar. I don't follow qemu-devel that close] On 3/4/19 11:19 AM, Daniel P. Berrangé wrote: > On Mon, Mar 04, 2019 at 08:13:53AM +0100, Markus Armbruster wrote: >> Daniel P. Berrangé <berrange@redhat.com> writes: >> >>> On Fri, Mar 01, 2019 at 06:33:28PM +0100, Igor Mammedov wrote: >>>> On Fri, 1 Mar 2019 15:49:47 +0000 >>>> Daniel P. Berrangé <berrange@redhat.com> wrote: >>>> >>>>> On Fri, Mar 01, 2019 at 04:42:15PM +0100, Igor Mammedov wrote: >>>>>> The parameter allows to configure fake NUMA topology where guest >>>>>> VM simulates NUMA topology but not actually getting a performance >>>>>> benefits from it. The same or better results could be achieved >>>>>> using 'memdev' parameter. In light of that any VM that uses NUMA >>>>>> to get its benefits should use 'memdev' and to allow transition >>>>>> initial RAM to device based model, deprecate 'mem' parameter as >>>>>> its ad-hoc partitioning of initial RAM MemoryRegion can't be >>>>>> translated to memdev based backend transparently to users and in >>>>>> compatible manner (migration wise). >>>>>> >>>>>> That will also allow to clean up a bit our numa code, leaving only >>>>>> 'memdev' impl. in place and several boards that use node_mem >>>>>> to generate FDT/ACPI description from it. >>>>> >>>>> Can you confirm that the 'mem' and 'memdev' parameters to -numa >>>>> are 100% live migration compatible in both directions ? Libvirt >>>>> would need this to be the case in order to use the 'memdev' syntax >>>>> instead. >>>> Unfortunately they are not migration compatible in any direction, >>>> if it where possible to translate them to each other I'd alias 'mem' >>>> to 'memdev' without deprecation. The former sends over only one >>>> MemoryRegion to target, while the later sends over several (one per >>>> memdev). >>> >>> If we can't migration from one to the other, then we can not deprecate >>> the existing 'mem' syntax. Even if libvirt were to provide a config >>> option to let apps opt-in to the new syntax, we need to be able to >>> support live migration of existing running VMs indefinitely. Effectively >>> this means we need the to keep 'mem' support forever, or at least such >>> a long time that it effectively means forever. I'm with Daniel on this. The reason why libvirt still defaults to '-numa node,mem=' is exactly because of backward compatibility. Since a machine can't be migrated from '-numa node,mem=' to '-numa node,memdev= + -object memory-backend-*' libvirt hast to play it safe and chose a combination that is acessible the widest. If you remove this, how would you expect older machines to migrate to newer cmd line? I'm all for deprecating old stuff. In fact, I've suggested that in libvirt(!) here and there, but I'm afraid we can't just remove functionatlity unless we give users a way to migrate to the one we prefer now. And if libvirt doesn't follow qemu's warnings then it definitely should. It's a libvirt bug if it doesn't follow the best practicies (well, if can). Michal
On Mon, Mar 04, 2019 at 03:16:41PM +0100, Igor Mammedov wrote: > On Mon, 4 Mar 2019 12:39:08 +0000 > Daniel P. Berrangé <berrange@redhat.com> wrote: > > > On Mon, Mar 04, 2019 at 01:25:07PM +0100, Igor Mammedov wrote: > > > On Mon, 04 Mar 2019 08:13:53 +0100 > > > Markus Armbruster <armbru@redhat.com> wrote: > > > > > > > Daniel P. Berrangé <berrange@redhat.com> writes: > > > > > > > > > On Fri, Mar 01, 2019 at 06:33:28PM +0100, Igor Mammedov wrote: > > > > >> On Fri, 1 Mar 2019 15:49:47 +0000 > > > > >> Daniel P. Berrangé <berrange@redhat.com> wrote: > > > > >> > > > > >> > On Fri, Mar 01, 2019 at 04:42:15PM +0100, Igor Mammedov wrote: > > > > >> > > The parameter allows to configure fake NUMA topology where guest > > > > >> > > VM simulates NUMA topology but not actually getting a performance > > > > >> > > benefits from it. The same or better results could be achieved > > > > >> > > using 'memdev' parameter. In light of that any VM that uses NUMA > > > > >> > > to get its benefits should use 'memdev' and to allow transition > > > > >> > > initial RAM to device based model, deprecate 'mem' parameter as > > > > >> > > its ad-hoc partitioning of initial RAM MemoryRegion can't be > > > > >> > > translated to memdev based backend transparently to users and in > > > > >> > > compatible manner (migration wise). > > > > >> > > > > > > >> > > That will also allow to clean up a bit our numa code, leaving only > > > > >> > > 'memdev' impl. in place and several boards that use node_mem > > > > >> > > to generate FDT/ACPI description from it. > > > > >> > > > > > >> > Can you confirm that the 'mem' and 'memdev' parameters to -numa > > > > >> > are 100% live migration compatible in both directions ? Libvirt > > > > >> > would need this to be the case in order to use the 'memdev' syntax > > > > >> > instead. > > > > >> Unfortunately they are not migration compatible in any direction, > > > > >> if it where possible to translate them to each other I'd alias 'mem' > > > > >> to 'memdev' without deprecation. The former sends over only one > > > > >> MemoryRegion to target, while the later sends over several (one per > > > > >> memdev). > > > > > > > > > > If we can't migration from one to the other, then we can not deprecate > > > > > the existing 'mem' syntax. Even if libvirt were to provide a config > > > > > option to let apps opt-in to the new syntax, we need to be able to > > > > > support live migration of existing running VMs indefinitely. Effectively > > > > > this means we need the to keep 'mem' support forever, or at least such > > > > > a long time that it effectively means forever. > > > > > > > > > > So I think this patch has to be dropped & replaced with one that > > > > > simply documents that memdev syntax is preferred. > > > > > > > > We have this habit of postulating absolutes like "can not deprecate" > > > > instead of engaging with the tradeoffs. We need to kick it. > > > > > > > > So let's have an actual look at the tradeoffs. > > > > > > > > We don't actually "support live migration of existing running VMs > > > > indefinitely". > > > > > > > > We support live migration to any newer version of QEMU that still > > > > supports the machine type. > > > > > > > > We support live migration to any older version of QEMU that already > > > > supports the machine type and all the devices the machine uses. > > > > > > > > Aside: "support" is really an honest best effort here. If you rely on > > > > it, use a downstream that puts in the (substantial!) QA work real > > > > support takes. > > > > > > > > Feature deprecation is not a contract to drop the feature after two > > > > releases, or even five. It's a formal notice that users of the feature > > > > should transition to its replacement in an orderly manner. > > > > > > > > If I understand Igor correctly, all users should transition away from > > > > outdated NUMA configurations at least for new VMs in an orderly manner. > > > Yes, we can postpone removing options until there are machines type > > > versions that were capable to use it (unfortunate but probably > > > unavoidable unless there is a migration trick to make transition > > > transparent) but that should not stop us from disabling broken > > > options on new machine types at least. > > > > > > This series can serve as formal notice with follow up disabling of > > > deprecated options for new machine types. (As Thomas noted, just warnings > > > do not work and users continue to use broken features regardless whether > > > they are don't know about issues or aware of it [*]) > > > > > > Hence suggested deprecation approach and enforced rejection of legacy > > > numa options for new machine types in 2 releases so users would stop > > > using them eventually. > > > > When we deprecate something, we need to have a way for apps to use the > > new alternative approach *at the same time*. So even if we only want to > > deprecate for new machine types, we still have to first solve the problem > > of how mgmt apps will introspect QEMU to learn which machine types expect > > the new options. > I'm not aware any mechanism to introspect machine type options (existing > or something being developed). Are/were there any ideas about it that were > discussed in the past? > > Aside from developing a new mechanism what are alternative approaches? > I mean when we delete deprecated CLI option, how it's solved on libvirt > side currently? > > For example I don't see anything introspection related when we have been > removing deprecated options recently. Right, with other stuff we deprecate we've had a simpler time, as it either didn't affect migration at all, or the new replacement stuff was fully compatible with the migration data stream. IOW, libvirt could unconditionally use the new feature as soon as it saw that it exists in QEMU. We didn't have any machine type dependancy to deal with before now. > More exact question specific to this series usecase, > how libvirt decides when to use -numa node,memdev or not currently? It is pretty hard to follow the code, but IIUC we only use memdev when stting up NVDIMMs, or for guests configured to have the "shared" flag on the memory region. Regards, Daniel
On 3/4/19 3:16 PM, Igor Mammedov wrote: > On Mon, 4 Mar 2019 12:39:08 +0000 > Daniel P. Berrangé <berrange@redhat.com> wrote: > >> On Mon, Mar 04, 2019 at 01:25:07PM +0100, Igor Mammedov wrote: >>> On Mon, 04 Mar 2019 08:13:53 +0100 >>> Markus Armbruster <armbru@redhat.com> wrote: >>> >>>> Daniel P. Berrangé <berrange@redhat.com> writes: >>>> >>>>> On Fri, Mar 01, 2019 at 06:33:28PM +0100, Igor Mammedov wrote: >>>>>> On Fri, 1 Mar 2019 15:49:47 +0000 >>>>>> Daniel P. Berrangé <berrange@redhat.com> wrote: >>>>>> >>>>>>> On Fri, Mar 01, 2019 at 04:42:15PM +0100, Igor Mammedov wrote: >>>>>>>> The parameter allows to configure fake NUMA topology where guest >>>>>>>> VM simulates NUMA topology but not actually getting a performance >>>>>>>> benefits from it. The same or better results could be achieved >>>>>>>> using 'memdev' parameter. In light of that any VM that uses NUMA >>>>>>>> to get its benefits should use 'memdev' and to allow transition >>>>>>>> initial RAM to device based model, deprecate 'mem' parameter as >>>>>>>> its ad-hoc partitioning of initial RAM MemoryRegion can't be >>>>>>>> translated to memdev based backend transparently to users and in >>>>>>>> compatible manner (migration wise). >>>>>>>> >>>>>>>> That will also allow to clean up a bit our numa code, leaving only >>>>>>>> 'memdev' impl. in place and several boards that use node_mem >>>>>>>> to generate FDT/ACPI description from it. >>>>>>> >>>>>>> Can you confirm that the 'mem' and 'memdev' parameters to -numa >>>>>>> are 100% live migration compatible in both directions ? Libvirt >>>>>>> would need this to be the case in order to use the 'memdev' syntax >>>>>>> instead. >>>>>> Unfortunately they are not migration compatible in any direction, >>>>>> if it where possible to translate them to each other I'd alias 'mem' >>>>>> to 'memdev' without deprecation. The former sends over only one >>>>>> MemoryRegion to target, while the later sends over several (one per >>>>>> memdev). >>>>> >>>>> If we can't migration from one to the other, then we can not deprecate >>>>> the existing 'mem' syntax. Even if libvirt were to provide a config >>>>> option to let apps opt-in to the new syntax, we need to be able to >>>>> support live migration of existing running VMs indefinitely. Effectively >>>>> this means we need the to keep 'mem' support forever, or at least such >>>>> a long time that it effectively means forever. >>>>> >>>>> So I think this patch has to be dropped & replaced with one that >>>>> simply documents that memdev syntax is preferred. >>>> >>>> We have this habit of postulating absolutes like "can not deprecate" >>>> instead of engaging with the tradeoffs. We need to kick it. >>>> >>>> So let's have an actual look at the tradeoffs. >>>> >>>> We don't actually "support live migration of existing running VMs >>>> indefinitely". >>>> >>>> We support live migration to any newer version of QEMU that still >>>> supports the machine type. >>>> >>>> We support live migration to any older version of QEMU that already >>>> supports the machine type and all the devices the machine uses. >>>> >>>> Aside: "support" is really an honest best effort here. If you rely on >>>> it, use a downstream that puts in the (substantial!) QA work real >>>> support takes. >>>> >>>> Feature deprecation is not a contract to drop the feature after two >>>> releases, or even five. It's a formal notice that users of the feature >>>> should transition to its replacement in an orderly manner. >>>> >>>> If I understand Igor correctly, all users should transition away from >>>> outdated NUMA configurations at least for new VMs in an orderly manner. >>> Yes, we can postpone removing options until there are machines type >>> versions that were capable to use it (unfortunate but probably >>> unavoidable unless there is a migration trick to make transition >>> transparent) but that should not stop us from disabling broken >>> options on new machine types at least. >>> >>> This series can serve as formal notice with follow up disabling of >>> deprecated options for new machine types. (As Thomas noted, just warnings >>> do not work and users continue to use broken features regardless whether >>> they are don't know about issues or aware of it [*]) >>> >>> Hence suggested deprecation approach and enforced rejection of legacy >>> numa options for new machine types in 2 releases so users would stop >>> using them eventually. >> >> When we deprecate something, we need to have a way for apps to use the >> new alternative approach *at the same time*. So even if we only want to >> deprecate for new machine types, we still have to first solve the problem >> of how mgmt apps will introspect QEMU to learn which machine types expect >> the new options. > I'm not aware any mechanism to introspect machine type options (existing > or something being developed). Are/were there any ideas about it that were > discussed in the past? > > Aside from developing a new mechanism what are alternative approaches? > I mean when we delete deprecated CLI option, how it's solved on libvirt > side currently? Libvirt queries qemu capabilites via QMP. And in all places it can it preferes the latest recommended cmd line options (well, those known to a libvirt developer at the time he/she is writing the code). So as long as you remove only old stuff and libvirt refreshes itself when following best practicies we're okay. > > For example I don't see anything introspection related when we have been > removing deprecated options recently. > > More exact question specific to this series usecase, > how libvirt decides when to use -numa node,memdev or not currently? It has a mechanism to tell if '-numa node,memdev=' is needed (i.e. there is no other way to satisfy user requested configuration) and only then it uses ,memdev. For all other cases it defaults to -numa node,mem= simply to keep backwards compatibility (as I'm explaining in another e-mail I've just sent to this list). Anyway, in the libvirt code you want to be looking at: src/qemu/qemu_command.c: qemuBuildNumaArgStr src/qemu/qemu_command.c: qemuBuildMemoryCellBackendStr Michal
On Mon, 4 Mar 2019 13:59:09 +0000 Daniel P. Berrangé <berrange@redhat.com> wrote: > On Mon, Mar 04, 2019 at 02:55:10PM +0100, Igor Mammedov wrote: > > On Mon, 4 Mar 2019 09:11:19 +0100 > > Thomas Huth <thuth@redhat.com> wrote: > > > > > On 01/03/2019 18.48, Daniel P. Berrangé wrote: > > > [...] > > > > So I think this patch has to be dropped & replaced with one that > > > > simply documents that memdev syntax is preferred. > > > > > > That's definitely not enough. I've had a couple of cases already where > > > we documented that certain options should not be used anymore, and > > > people simply ignored it (aka. if it ain't broken, don't do any change). > > > Then they just started to complain when I really tried to remove the > > > option after the deprecation period. > > > > > So Igor, if you can not officially deprecate these things here yet, you > > > should at least make sure that they can not be used with new machine > > > types anymore. Then, after a couple of years, when we feel sure that > > > there are only some few or no people left who still use it with the old > > > machine types, we can start to discuss the deprecation process again, I > > > think. > > Is it acceptable to silently disable CLI options (even if they are broken > > like in this case) for new machine types? > > I was under impression that it should go through deprecation first. > > Yes, it must go through deprecation. I was saying we can't disable > the CLI options at all, until there is a way for libvirt to correctly > use the new options. I'm not adding new options (nor plan to for numa case (yet)), -numa node,memdev is around several years by now and should be used as default for creating new configs. In light of keeping 'mem' option around for old machines, Deprecation should have served for notifying users that legacy options will be disabled later on (for new machines at least if no way found for migration compatible transition for older ones). What I'm mainly aiming here is to prevent using broken legacy options for new VMs (like in RHBZ1624223 case) and deprecation is the only way we have now to notify users about CLI breaking changes. In the -mem-path/prealloc case, there will be a new machine option 'memdev' to replace them but that's migration compatible, so libvirt could use new CLI syntax to replace even old configs. (If there is a mechanism for introducing/introspecting a new options per-machine or whole QEMU, I'd happy to add a new option there on top of just deprecation notice that we have now). > Regards, > Daniel
On Mon, Mar 04, 2019 at 03:54:57PM +0100, Igor Mammedov wrote: > On Mon, 4 Mar 2019 13:59:09 +0000 > Daniel P. Berrangé <berrange@redhat.com> wrote: > > > On Mon, Mar 04, 2019 at 02:55:10PM +0100, Igor Mammedov wrote: > > > On Mon, 4 Mar 2019 09:11:19 +0100 > > > Thomas Huth <thuth@redhat.com> wrote: > > > > > > > On 01/03/2019 18.48, Daniel P. Berrangé wrote: > > > > [...] > > > > > So I think this patch has to be dropped & replaced with one that > > > > > simply documents that memdev syntax is preferred. > > > > > > > > That's definitely not enough. I've had a couple of cases already where > > > > we documented that certain options should not be used anymore, and > > > > people simply ignored it (aka. if it ain't broken, don't do any change). > > > > Then they just started to complain when I really tried to remove the > > > > option after the deprecation period. > > > > > > > So Igor, if you can not officially deprecate these things here yet, you > > > > should at least make sure that they can not be used with new machine > > > > types anymore. Then, after a couple of years, when we feel sure that > > > > there are only some few or no people left who still use it with the old > > > > machine types, we can start to discuss the deprecation process again, I > > > > think. > > > Is it acceptable to silently disable CLI options (even if they are broken > > > like in this case) for new machine types? > > > I was under impression that it should go through deprecation first. > > > > Yes, it must go through deprecation. I was saying we can't disable > > the CLI options at all, until there is a way for libvirt to correctly > > use the new options. > > I'm not adding new options (nor plan to for numa case (yet)), > -numa node,memdev is around several years by now and should be used > as default for creating new configs. > > In light of keeping 'mem' option around for old machines, > Deprecation should have served for notifying users that legacy > options will be disabled later on (for new machines at least > if no way found for migration compatible transition for older ones). > > What I'm mainly aiming here is to prevent using broken legacy options > for new VMs (like in RHBZ1624223 case) and deprecation is the only way > we have now to notify users about CLI breaking changes. The idea of doing advance warnings via deprecations is that applications have time to adapt to the new mechanism several releases before the old mechanism is removed/disabled. Since the new mechanism isn't fully usable yet, applications can't adapt to use it. So we can't start the deprecation process yet, as it would be telling apps to do a switch that isn't possible for many to actually do. In the meantime, qemu-options.hx could be updated. It documents both "mem" and "memdev" currently but doesn't tell people that "memdev" is the preferred syntax for future usage / warn against using "mem". Regards, Daniel
On Mon, 4 Mar 2019 15:24:28 +0100 Michal Privoznik <mprivozn@redhat.com> wrote: > [Thanks Igor for bringing this onto my radar. I don't follow qemu-devel > that close] > > On 3/4/19 11:19 AM, Daniel P. Berrangé wrote: > > On Mon, Mar 04, 2019 at 08:13:53AM +0100, Markus Armbruster wrote: > >> Daniel P. Berrangé <berrange@redhat.com> writes: > >> > >>> On Fri, Mar 01, 2019 at 06:33:28PM +0100, Igor Mammedov wrote: > >>>> On Fri, 1 Mar 2019 15:49:47 +0000 > >>>> Daniel P. Berrangé <berrange@redhat.com> wrote: > >>>> > >>>>> On Fri, Mar 01, 2019 at 04:42:15PM +0100, Igor Mammedov wrote: > >>>>>> The parameter allows to configure fake NUMA topology where guest > >>>>>> VM simulates NUMA topology but not actually getting a performance > >>>>>> benefits from it. The same or better results could be achieved > >>>>>> using 'memdev' parameter. In light of that any VM that uses NUMA > >>>>>> to get its benefits should use 'memdev' and to allow transition > >>>>>> initial RAM to device based model, deprecate 'mem' parameter as > >>>>>> its ad-hoc partitioning of initial RAM MemoryRegion can't be > >>>>>> translated to memdev based backend transparently to users and in > >>>>>> compatible manner (migration wise). > >>>>>> > >>>>>> That will also allow to clean up a bit our numa code, leaving only > >>>>>> 'memdev' impl. in place and several boards that use node_mem > >>>>>> to generate FDT/ACPI description from it. > >>>>> > >>>>> Can you confirm that the 'mem' and 'memdev' parameters to -numa > >>>>> are 100% live migration compatible in both directions ? Libvirt > >>>>> would need this to be the case in order to use the 'memdev' syntax > >>>>> instead. > >>>> Unfortunately they are not migration compatible in any direction, > >>>> if it where possible to translate them to each other I'd alias 'mem' > >>>> to 'memdev' without deprecation. The former sends over only one > >>>> MemoryRegion to target, while the later sends over several (one per > >>>> memdev). > >>> > >>> If we can't migration from one to the other, then we can not deprecate > >>> the existing 'mem' syntax. Even if libvirt were to provide a config > >>> option to let apps opt-in to the new syntax, we need to be able to > >>> support live migration of existing running VMs indefinitely. Effectively > >>> this means we need the to keep 'mem' support forever, or at least such > >>> a long time that it effectively means forever. > > I'm with Daniel on this. The reason why libvirt still defaults to '-numa > node,mem=' is exactly because of backward compatibility. Since a machine > can't be migrated from '-numa node,mem=' to '-numa node,memdev= + > -object memory-backend-*' libvirt hast to play it safe and chose a > combination that is acessible the widest. > > If you remove this, how would you expect older machines to migrate to > newer cmd line? > > I'm all for deprecating old stuff. In fact, I've suggested that in > libvirt(!) here and there, but I'm afraid we can't just remove > functionatlity unless we give users a way to migrate to the one we > prefer now. Agreed, it's clear now that I can't remove just 'mem' for OLD machine types (even if this safe variant is broken and doesn't actually do what it should). Libvirt should use 'memdev' for new VMs for them to actually benefit from NUMA configuration. Currently we are talking about disabling 'mem' for new machine types only (pity that I have to keep around legacy code but at least we would be able to move on to normal device modeling for initial memory on new machines). > And if libvirt doesn't follow qemu's warnings then it definitely should. > It's a libvirt bug if it doesn't follow the best practicies (well, if can). > > Michal
On Mon, 4 Mar 2019 14:24:32 +0000 Daniel P. Berrangé <berrange@redhat.com> wrote: > On Mon, Mar 04, 2019 at 03:16:41PM +0100, Igor Mammedov wrote: > > On Mon, 4 Mar 2019 12:39:08 +0000 > > Daniel P. Berrangé <berrange@redhat.com> wrote: > > > > > On Mon, Mar 04, 2019 at 01:25:07PM +0100, Igor Mammedov wrote: > > > > On Mon, 04 Mar 2019 08:13:53 +0100 > > > > Markus Armbruster <armbru@redhat.com> wrote: > > > > > > > > > Daniel P. Berrangé <berrange@redhat.com> writes: > > > > > > > > > > > On Fri, Mar 01, 2019 at 06:33:28PM +0100, Igor Mammedov wrote: > > > > > >> On Fri, 1 Mar 2019 15:49:47 +0000 > > > > > >> Daniel P. Berrangé <berrange@redhat.com> wrote: > > > > > >> > > > > > >> > On Fri, Mar 01, 2019 at 04:42:15PM +0100, Igor Mammedov wrote: > > > > > >> > > The parameter allows to configure fake NUMA topology where guest > > > > > >> > > VM simulates NUMA topology but not actually getting a performance > > > > > >> > > benefits from it. The same or better results could be achieved > > > > > >> > > using 'memdev' parameter. In light of that any VM that uses NUMA > > > > > >> > > to get its benefits should use 'memdev' and to allow transition > > > > > >> > > initial RAM to device based model, deprecate 'mem' parameter as > > > > > >> > > its ad-hoc partitioning of initial RAM MemoryRegion can't be > > > > > >> > > translated to memdev based backend transparently to users and in > > > > > >> > > compatible manner (migration wise). > > > > > >> > > > > > > > >> > > That will also allow to clean up a bit our numa code, leaving only > > > > > >> > > 'memdev' impl. in place and several boards that use node_mem > > > > > >> > > to generate FDT/ACPI description from it. > > > > > >> > > > > > > >> > Can you confirm that the 'mem' and 'memdev' parameters to -numa > > > > > >> > are 100% live migration compatible in both directions ? Libvirt > > > > > >> > would need this to be the case in order to use the 'memdev' syntax > > > > > >> > instead. > > > > > >> Unfortunately they are not migration compatible in any direction, > > > > > >> if it where possible to translate them to each other I'd alias 'mem' > > > > > >> to 'memdev' without deprecation. The former sends over only one > > > > > >> MemoryRegion to target, while the later sends over several (one per > > > > > >> memdev). > > > > > > > > > > > > If we can't migration from one to the other, then we can not deprecate > > > > > > the existing 'mem' syntax. Even if libvirt were to provide a config > > > > > > option to let apps opt-in to the new syntax, we need to be able to > > > > > > support live migration of existing running VMs indefinitely. Effectively > > > > > > this means we need the to keep 'mem' support forever, or at least such > > > > > > a long time that it effectively means forever. > > > > > > > > > > > > So I think this patch has to be dropped & replaced with one that > > > > > > simply documents that memdev syntax is preferred. > > > > > > > > > > We have this habit of postulating absolutes like "can not deprecate" > > > > > instead of engaging with the tradeoffs. We need to kick it. > > > > > > > > > > So let's have an actual look at the tradeoffs. > > > > > > > > > > We don't actually "support live migration of existing running VMs > > > > > indefinitely". > > > > > > > > > > We support live migration to any newer version of QEMU that still > > > > > supports the machine type. > > > > > > > > > > We support live migration to any older version of QEMU that already > > > > > supports the machine type and all the devices the machine uses. > > > > > > > > > > Aside: "support" is really an honest best effort here. If you rely on > > > > > it, use a downstream that puts in the (substantial!) QA work real > > > > > support takes. > > > > > > > > > > Feature deprecation is not a contract to drop the feature after two > > > > > releases, or even five. It's a formal notice that users of the feature > > > > > should transition to its replacement in an orderly manner. > > > > > > > > > > If I understand Igor correctly, all users should transition away from > > > > > outdated NUMA configurations at least for new VMs in an orderly manner. > > > > Yes, we can postpone removing options until there are machines type > > > > versions that were capable to use it (unfortunate but probably > > > > unavoidable unless there is a migration trick to make transition > > > > transparent) but that should not stop us from disabling broken > > > > options on new machine types at least. > > > > > > > > This series can serve as formal notice with follow up disabling of > > > > deprecated options for new machine types. (As Thomas noted, just warnings > > > > do not work and users continue to use broken features regardless whether > > > > they are don't know about issues or aware of it [*]) > > > > > > > > Hence suggested deprecation approach and enforced rejection of legacy > > > > numa options for new machine types in 2 releases so users would stop > > > > using them eventually. > > > > > > When we deprecate something, we need to have a way for apps to use the > > > new alternative approach *at the same time*. So even if we only want to > > > deprecate for new machine types, we still have to first solve the problem > > > of how mgmt apps will introspect QEMU to learn which machine types expect > > > the new options. > > I'm not aware any mechanism to introspect machine type options (existing > > or something being developed). Are/were there any ideas about it that were > > discussed in the past? > > > > Aside from developing a new mechanism what are alternative approaches? > > I mean when we delete deprecated CLI option, how it's solved on libvirt > > side currently? > > > > For example I don't see anything introspection related when we have been > > removing deprecated options recently. > > Right, with other stuff we deprecate we've had a simpler time, as it > either didn't affect migration at all, or the new replacement stuff > was fully compatible with the migration data stream. IOW, libvirt > could unconditionally use the new feature as soon as it saw that it > exists in QEMU. We didn't have any machine type dependancy to deal > with before now. Any suggestions what direction we should proceed? (I'm really not keen to develop a new introspection feature but if that the only way to move forward ...) > > More exact question specific to this series usecase, > > how libvirt decides when to use -numa node,memdev or not currently? > > It is pretty hard to follow the code, but IIUC we only use memdev when > stting up NVDIMMs, or for guests configured to have the "shared" flag > on the memory region. Then I'd guess that most VMs end up with default '-numa node,mem' which by design can produce only fake NUMA without ability to manage guest RAM on host side. So such VMs aren't getting performance benefits or worse run with performance regression (due to wrong sched/mm decisions as guest kernel assumes NUMA topology is valid one). > Regards, > Daniel
On Mon, Mar 04, 2019 at 12:45:14PM +0100, Markus Armbruster wrote: > Daniel P. Berrangé <berrange@redhat.com> writes: > > > On Mon, Mar 04, 2019 at 08:13:53AM +0100, Markus Armbruster wrote: > >> If we deprecate outdated NUMA configurations now, we can start rejecting > >> them with new machine types after a suitable grace period. > > > > How is libvirt going to know what machines it can use with the feature ? > > We don't have any way to introspect machine type specific logic, since we > > run all probing with "-machine none", and QEMU can't report anything about > > machines without instantiating them. > > Fair point. A practical way for management applications to decide which > of the two interfaces they can use with which machine type may be > required for deprecating one of the interfaces with new machine types. We currently have "qom-list-properties" which can report on the existance of properties registered against object types. What it can't do though is report on the default values of these properties. What's interesting though is that qmp_qom_list_properties will actually instantiate objects in order to query properties, if the type isn't an abstract type. IOW, even if you are running "$QEMU -machine none", then if at the qmp-shell you do (QEMU) qom-list-properties typename=pc-q35-2.6-machine it will have actually instantiate the pc-q35-2.6-machine machine type. Since it has instantiated the machine, the object initializer function will have run and initialized the default values for various properties. IOW, it is possible for qom-list-properties to report on default values for non-abstract types. I did a quick hack to PoC the theory: diff --git a/qapi/misc.json b/qapi/misc.json index 8b3ca4fdd3..906dfbf3b5 100644 --- a/qapi/misc.json +++ b/qapi/misc.json @@ -1368,7 +1368,8 @@ # Since: 1.2 ## { 'struct': 'ObjectPropertyInfo', - 'data': { 'name': 'str', 'type': 'str', '*description': 'str' } } + 'data': { 'name': 'str', 'type': 'str', '*description': 'str', + '*default': 'str'} } ## # @qom-list: diff --git a/qmp.c b/qmp.c index b92d62cd5f..a45669032c 100644 --- a/qmp.c +++ b/qmp.c @@ -594,6 +594,11 @@ ObjectPropertyInfoList *qmp_qom_list_properties(const char *typename, info->has_description = !!prop->description; info->description = g_strdup(prop->description); + if (obj && g_str_equal(info->type, "string")) { + info->q_default = g_strdup(object_property_get_str(obj, info->name, NULL)); + info->has_q_default = info->q_default != NULL; + } + entry = g_malloc0(sizeof(*entry)); entry->value = info; entry->next = prop_list; If we could make this hack less of a hack, then perhaps this is good enough to cope reporting machine types which forbid use of "mem" in favour of "memdev" ? They would need to have a property registered against them of course to identify the "memdev" requirement. Regards, Daniel
On Mon, 4 Mar 2019 15:28:11 +0000 Daniel P. Berrangé <berrange@redhat.com> wrote: > On Mon, Mar 04, 2019 at 12:45:14PM +0100, Markus Armbruster wrote: > > Daniel P. Berrangé <berrange@redhat.com> writes: > > > > > On Mon, Mar 04, 2019 at 08:13:53AM +0100, Markus Armbruster wrote: > > >> If we deprecate outdated NUMA configurations now, we can start rejecting > > >> them with new machine types after a suitable grace period. > > > > > > How is libvirt going to know what machines it can use with the feature ? > > > We don't have any way to introspect machine type specific logic, since we > > > run all probing with "-machine none", and QEMU can't report anything about > > > machines without instantiating them. > > > > Fair point. A practical way for management applications to decide which > > of the two interfaces they can use with which machine type may be > > required for deprecating one of the interfaces with new machine types. > > We currently have "qom-list-properties" which can report on the > existance of properties registered against object types. What it > can't do though is report on the default values of these properties. > > What's interesting though is that qmp_qom_list_properties will actually > instantiate objects in order to query properties, if the type isn't an > abstract type. > > IOW, even if you are running "$QEMU -machine none", then if at the qmp-shell > you do > > (QEMU) qom-list-properties typename=pc-q35-2.6-machine > > it will have actually instantiate the pc-q35-2.6-machine machine type. > Since it has instantiated the machine, the object initializer function > will have run and initialized the default values for various properties. > > IOW, it is possible for qom-list-properties to report on default values > for non-abstract types. > > I did a quick hack to PoC the theory: > > diff --git a/qapi/misc.json b/qapi/misc.json > index 8b3ca4fdd3..906dfbf3b5 100644 > --- a/qapi/misc.json > +++ b/qapi/misc.json > @@ -1368,7 +1368,8 @@ > # Since: 1.2 > ## > { 'struct': 'ObjectPropertyInfo', > - 'data': { 'name': 'str', 'type': 'str', '*description': 'str' } } > + 'data': { 'name': 'str', 'type': 'str', '*description': 'str', > + '*default': 'str'} } > > ## > # @qom-list: > diff --git a/qmp.c b/qmp.c > index b92d62cd5f..a45669032c 100644 > --- a/qmp.c > +++ b/qmp.c > @@ -594,6 +594,11 @@ ObjectPropertyInfoList *qmp_qom_list_properties(const char *typename, > info->has_description = !!prop->description; > info->description = g_strdup(prop->description); > > + if (obj && g_str_equal(info->type, "string")) { > + info->q_default = g_strdup(object_property_get_str(obj, info->name, NULL)); > + info->has_q_default = info->q_default != NULL; > + } > + > entry = g_malloc0(sizeof(*entry)); > entry->value = info; > entry->next = prop_list; > > > If we could make this hack less of a hack, then perhaps this is good > enough to cope reporting machine types which forbid use of "mem" in > favour of "memdev" ? They would need to have a property registered > against them of course to identify the "memdev" requirement. Thanks, I'll look into it and try to come up with patches. > Regards, > Daniel
On 3/4/19 4:19 PM, Igor Mammedov wrote: > Then I'd guess that most VMs end up with default '-numa node,mem' > which by design can produce only fake NUMA without ability to manage > guest RAM on host side. So such VMs aren't getting performance benefits > or worse run with performance regression (due to wrong sched/mm decisions > as guest kernel assumes NUMA topology is valid one). Specifying NUMA distances in libvirt XML makes it generate the modern cmd line. Michal
On 3/4/19 3:24 PM, Daniel P. Berrangé wrote: > On Mon, Mar 04, 2019 at 03:16:41PM +0100, Igor Mammedov wrote: >> On Mon, 4 Mar 2019 12:39:08 +0000 >> Daniel P. Berrangé <berrange@redhat.com> wrote: >> >>> On Mon, Mar 04, 2019 at 01:25:07PM +0100, Igor Mammedov wrote: >>>> On Mon, 04 Mar 2019 08:13:53 +0100 >>>> Markus Armbruster <armbru@redhat.com> wrote: >>>> >>>>> Daniel P. Berrangé <berrange@redhat.com> writes: >>>>> >>>>>> On Fri, Mar 01, 2019 at 06:33:28PM +0100, Igor Mammedov wrote: >>>>>>> On Fri, 1 Mar 2019 15:49:47 +0000 >>>>>>> Daniel P. Berrangé <berrange@redhat.com> wrote: >>>>>>> >>>>>>>> On Fri, Mar 01, 2019 at 04:42:15PM +0100, Igor Mammedov wrote: >>>>>>>>> The parameter allows to configure fake NUMA topology where guest >>>>>>>>> VM simulates NUMA topology but not actually getting a performance >>>>>>>>> benefits from it. The same or better results could be achieved >>>>>>>>> using 'memdev' parameter. In light of that any VM that uses NUMA >>>>>>>>> to get its benefits should use 'memdev' and to allow transition >>>>>>>>> initial RAM to device based model, deprecate 'mem' parameter as >>>>>>>>> its ad-hoc partitioning of initial RAM MemoryRegion can't be >>>>>>>>> translated to memdev based backend transparently to users and in >>>>>>>>> compatible manner (migration wise). >>>>>>>>> >>>>>>>>> That will also allow to clean up a bit our numa code, leaving only >>>>>>>>> 'memdev' impl. in place and several boards that use node_mem >>>>>>>>> to generate FDT/ACPI description from it. >>>>>>>> >>>>>>>> Can you confirm that the 'mem' and 'memdev' parameters to -numa >>>>>>>> are 100% live migration compatible in both directions ? Libvirt >>>>>>>> would need this to be the case in order to use the 'memdev' syntax >>>>>>>> instead. >>>>>>> Unfortunately they are not migration compatible in any direction, >>>>>>> if it where possible to translate them to each other I'd alias 'mem' >>>>>>> to 'memdev' without deprecation. The former sends over only one >>>>>>> MemoryRegion to target, while the later sends over several (one per >>>>>>> memdev). >>>>>> >>>>>> If we can't migration from one to the other, then we can not deprecate >>>>>> the existing 'mem' syntax. Even if libvirt were to provide a config >>>>>> option to let apps opt-in to the new syntax, we need to be able to >>>>>> support live migration of existing running VMs indefinitely. Effectively >>>>>> this means we need the to keep 'mem' support forever, or at least such >>>>>> a long time that it effectively means forever. >>>>>> >>>>>> So I think this patch has to be dropped & replaced with one that >>>>>> simply documents that memdev syntax is preferred. >>>>> >>>>> We have this habit of postulating absolutes like "can not deprecate" >>>>> instead of engaging with the tradeoffs. We need to kick it. >>>>> >>>>> So let's have an actual look at the tradeoffs. >>>>> >>>>> We don't actually "support live migration of existing running VMs >>>>> indefinitely". >>>>> >>>>> We support live migration to any newer version of QEMU that still >>>>> supports the machine type. >>>>> >>>>> We support live migration to any older version of QEMU that already >>>>> supports the machine type and all the devices the machine uses. >>>>> >>>>> Aside: "support" is really an honest best effort here. If you rely on >>>>> it, use a downstream that puts in the (substantial!) QA work real >>>>> support takes. >>>>> >>>>> Feature deprecation is not a contract to drop the feature after two >>>>> releases, or even five. It's a formal notice that users of the feature >>>>> should transition to its replacement in an orderly manner. >>>>> >>>>> If I understand Igor correctly, all users should transition away from >>>>> outdated NUMA configurations at least for new VMs in an orderly manner. >>>> Yes, we can postpone removing options until there are machines type >>>> versions that were capable to use it (unfortunate but probably >>>> unavoidable unless there is a migration trick to make transition >>>> transparent) but that should not stop us from disabling broken >>>> options on new machine types at least. >>>> >>>> This series can serve as formal notice with follow up disabling of >>>> deprecated options for new machine types. (As Thomas noted, just warnings >>>> do not work and users continue to use broken features regardless whether >>>> they are don't know about issues or aware of it [*]) >>>> >>>> Hence suggested deprecation approach and enforced rejection of legacy >>>> numa options for new machine types in 2 releases so users would stop >>>> using them eventually. >>> >>> When we deprecate something, we need to have a way for apps to use the >>> new alternative approach *at the same time*. So even if we only want to >>> deprecate for new machine types, we still have to first solve the problem >>> of how mgmt apps will introspect QEMU to learn which machine types expect >>> the new options. >> I'm not aware any mechanism to introspect machine type options (existing >> or something being developed). Are/were there any ideas about it that were >> discussed in the past? >> >> Aside from developing a new mechanism what are alternative approaches? >> I mean when we delete deprecated CLI option, how it's solved on libvirt >> side currently? >> >> For example I don't see anything introspection related when we have been >> removing deprecated options recently. > > Right, with other stuff we deprecate we've had a simpler time, as it > either didn't affect migration at all, or the new replacement stuff > was fully compatible with the migration data stream. IOW, libvirt > could unconditionally use the new feature as soon as it saw that it > exists in QEMU. We didn't have any machine type dependancy to deal > with before now. We couldn't have done that. How we would migrate from older qemu? Anyway, now that I look into this (esp. git log) I came accross: commit f309db1f4d51009bad0d32e12efc75530b66836b Author: Michal Privoznik <mprivozn@redhat.com> AuthorDate: Thu Dec 18 12:36:48 2014 +0100 Commit: Michal Privoznik <mprivozn@redhat.com> CommitDate: Fri Dec 19 07:44:44 2014 +0100 qemu: Create memory-backend-{ram,file} iff needed Or this 7832fac84741d65e851dbdbfaf474785cbfdcf3c. We did try to generated newer cmd line but then for various reasong (e.g. avoiding triggering a qemu bug) we turned it off and make libvirt default to older (now deprecated) cmd line. Frankly, I don't know how to proceed. Unless qemu is fixed to allow migration from deprecated to new cmd line (unlikely, if not impossible, right?) then I guess the only approach we can have is that: 1) whenever so called cold booting a new machine (fresh, brand new start of a new domain) libvirt would default to modern cmd line, 2) on migration, libvirt would record in the migration stream (or status XML or wherever) that modern cmd line was generated and thus it'll make the destination generate modern cmd line too. This solution still suffers a couple of problems: a) migration to older libvirt will fail as older libvirt won't recognize the flag set in 2) and therefore would default to deprecated cmd line b) migrating from one host to another won't modernize the cmd line But I guess we have to draw a line somewhere (if we are not willing to write those migration patches). Michal
On Mon, Mar 04, 2019 at 05:12:40PM +0100, Michal Privoznik wrote: > On 3/4/19 4:19 PM, Igor Mammedov wrote: > > > Then I'd guess that most VMs end up with default '-numa node,mem' > > which by design can produce only fake NUMA without ability to manage > > guest RAM on host side. So such VMs aren't getting performance benefits > > or worse run with performance regression (due to wrong sched/mm decisions > > as guest kernel assumes NUMA topology is valid one). > > Specifying NUMA distances in libvirt XML makes it generate the modern cmd > line. AFAIK, specifying any guest NUMA -> Host NUMA affinity makes it use the modern cmd line. eg I just modified a plain 8 CPU / 2 GB RAM guest with this: <numatune> <memnode cellid='0' mode='strict' nodeset='0'/> <memnode cellid='1' mode='strict' nodeset='1'/> </numatune> <cpu mode='host-model'> <numa> <cell id='0' cpus='0-3' memory='1024000' unit='KiB'/> <cell id='1' cpus='4-7' memory='1024000' unit='KiB'/> </numa> </cpu> and I can see libvirt decided to use memdev -object memory-backend-ram,id=ram-node0,size=1048576000,host-nodes=0,policy=bind -numa node,nodeid=0,cpus=0-3,memdev=ram-node0 -object memory-backend-ram,id=ram-node1,size=1048576000,host-nodes=1,policy=bind -numa node,nodeid=1,cpus=4-7,memdev=ram-node1 So unless I'm missing something, we aren't suffering from the problem described by Igor above even today. Regards, Daniel
* Michal Privoznik (mprivozn@redhat.com) wrote: > On 3/4/19 3:24 PM, Daniel P. Berrangé wrote: > > On Mon, Mar 04, 2019 at 03:16:41PM +0100, Igor Mammedov wrote: > > > On Mon, 4 Mar 2019 12:39:08 +0000 > > > Daniel P. Berrangé <berrange@redhat.com> wrote: > > > > > > > On Mon, Mar 04, 2019 at 01:25:07PM +0100, Igor Mammedov wrote: > > > > > On Mon, 04 Mar 2019 08:13:53 +0100 > > > > > Markus Armbruster <armbru@redhat.com> wrote: > > > > > > Daniel P. Berrangé <berrange@redhat.com> writes: > > > > > > > On Fri, Mar 01, 2019 at 06:33:28PM +0100, Igor Mammedov wrote: > > > > > > > > On Fri, 1 Mar 2019 15:49:47 +0000 > > > > > > > > Daniel P. Berrangé <berrange@redhat.com> wrote: > > > > > > > > > On Fri, Mar 01, 2019 at 04:42:15PM +0100, Igor Mammedov wrote: > > > > > > > > > > The parameter allows to configure fake NUMA topology where guest > > > > > > > > > > VM simulates NUMA topology but not actually getting a performance > > > > > > > > > > benefits from it. The same or better results could be achieved > > > > > > > > > > using 'memdev' parameter. In light of that any VM that uses NUMA > > > > > > > > > > to get its benefits should use 'memdev' and to allow transition > > > > > > > > > > initial RAM to device based model, deprecate 'mem' parameter as > > > > > > > > > > its ad-hoc partitioning of initial RAM MemoryRegion can't be > > > > > > > > > > translated to memdev based backend transparently to users and in > > > > > > > > > > compatible manner (migration wise). > > > > > > > > > > > > > > > > > > > > That will also allow to clean up a bit our numa code, leaving only > > > > > > > > > > 'memdev' impl. in place and several boards that use node_mem > > > > > > > > > > to generate FDT/ACPI description from it. > > > > > > > > > > > > > > > > > > Can you confirm that the 'mem' and 'memdev' parameters to -numa > > > > > > > > > are 100% live migration compatible in both directions ? Libvirt > > > > > > > > > would need this to be the case in order to use the 'memdev' syntax > > > > > > > > > instead. > > > > > > > > Unfortunately they are not migration compatible in any direction, > > > > > > > > if it where possible to translate them to each other I'd alias 'mem' > > > > > > > > to 'memdev' without deprecation. The former sends over only one > > > > > > > > MemoryRegion to target, while the later sends over several (one per > > > > > > > > memdev). > > > > > > > > > > > > > > If we can't migration from one to the other, then we can not deprecate > > > > > > > the existing 'mem' syntax. Even if libvirt were to provide a config > > > > > > > option to let apps opt-in to the new syntax, we need to be able to > > > > > > > support live migration of existing running VMs indefinitely. Effectively > > > > > > > this means we need the to keep 'mem' support forever, or at least such > > > > > > > a long time that it effectively means forever. > > > > > > > > > > > > > > So I think this patch has to be dropped & replaced with one that > > > > > > > simply documents that memdev syntax is preferred. > > > > > > > > > > > > We have this habit of postulating absolutes like "can not deprecate" > > > > > > instead of engaging with the tradeoffs. We need to kick it. > > > > > > > > > > > > So let's have an actual look at the tradeoffs. > > > > > > > > > > > > We don't actually "support live migration of existing running VMs > > > > > > indefinitely". > > > > > > > > > > > > We support live migration to any newer version of QEMU that still > > > > > > supports the machine type. > > > > > > > > > > > > We support live migration to any older version of QEMU that already > > > > > > supports the machine type and all the devices the machine uses. > > > > > > > > > > > > Aside: "support" is really an honest best effort here. If you rely on > > > > > > it, use a downstream that puts in the (substantial!) QA work real > > > > > > support takes. > > > > > > > > > > > > Feature deprecation is not a contract to drop the feature after two > > > > > > releases, or even five. It's a formal notice that users of the feature > > > > > > should transition to its replacement in an orderly manner. > > > > > > > > > > > > If I understand Igor correctly, all users should transition away from > > > > > > outdated NUMA configurations at least for new VMs in an orderly manner. > > > > > Yes, we can postpone removing options until there are machines type > > > > > versions that were capable to use it (unfortunate but probably > > > > > unavoidable unless there is a migration trick to make transition > > > > > transparent) but that should not stop us from disabling broken > > > > > options on new machine types at least. > > > > > > > > > > This series can serve as formal notice with follow up disabling of > > > > > deprecated options for new machine types. (As Thomas noted, just warnings > > > > > do not work and users continue to use broken features regardless whether > > > > > they are don't know about issues or aware of it [*]) > > > > > > > > > > Hence suggested deprecation approach and enforced rejection of legacy > > > > > numa options for new machine types in 2 releases so users would stop > > > > > using them eventually. > > > > > > > > When we deprecate something, we need to have a way for apps to use the > > > > new alternative approach *at the same time*. So even if we only want to > > > > deprecate for new machine types, we still have to first solve the problem > > > > of how mgmt apps will introspect QEMU to learn which machine types expect > > > > the new options. > > > I'm not aware any mechanism to introspect machine type options (existing > > > or something being developed). Are/were there any ideas about it that were > > > discussed in the past? > > > > > > Aside from developing a new mechanism what are alternative approaches? > > > I mean when we delete deprecated CLI option, how it's solved on libvirt > > > side currently? > > > > > > For example I don't see anything introspection related when we have been > > > removing deprecated options recently. > > > > Right, with other stuff we deprecate we've had a simpler time, as it > > either didn't affect migration at all, or the new replacement stuff > > was fully compatible with the migration data stream. IOW, libvirt > > could unconditionally use the new feature as soon as it saw that it > > exists in QEMU. We didn't have any machine type dependancy to deal > > with before now. > > We couldn't have done that. How we would migrate from older qemu? > > Anyway, now that I look into this (esp. git log) I came accross: > > commit f309db1f4d51009bad0d32e12efc75530b66836b > Author: Michal Privoznik <mprivozn@redhat.com> > AuthorDate: Thu Dec 18 12:36:48 2014 +0100 > Commit: Michal Privoznik <mprivozn@redhat.com> > CommitDate: Fri Dec 19 07:44:44 2014 +0100 > > qemu: Create memory-backend-{ram,file} iff needed > > Or this 7832fac84741d65e851dbdbfaf474785cbfdcf3c. We did try to generated > newer cmd line but then for various reasong (e.g. avoiding triggering a qemu > bug) we turned it off and make libvirt default to older (now deprecated) cmd > line. > > Frankly, I don't know how to proceed. Unless qemu is fixed to allow > migration from deprecated to new cmd line (unlikely, if not impossible, > right?) then I guess the only approach we can have is that: > > 1) whenever so called cold booting a new machine (fresh, brand new start of > a new domain) libvirt would default to modern cmd line, > > 2) on migration, libvirt would record in the migration stream (or status XML > or wherever) that modern cmd line was generated and thus it'll make the > destination generate modern cmd line too. > > This solution still suffers a couple of problems: > a) migration to older libvirt will fail as older libvirt won't recognize the > flag set in 2) and therefore would default to deprecated cmd line > b) migrating from one host to another won't modernize the cmd line > > But I guess we have to draw a line somewhere (if we are not willing to write > those migration patches). What's interesting here is that this problem isn't really machine-type related; so providing introspection on the machine type doesn't immediately help. What we're actually trying to do here is (mis)use a machine type as a proxy for knowing that both sides are new enough to handle the new command line. That's an OK thing to do, and if we did have introspection we could add a fudge flag to say it's allowed now. Dave > Michal -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On Mon, Mar 04, 2019 at 05:20:13PM +0100, Michal Privoznik wrote: > On 3/4/19 3:24 PM, Daniel P. Berrangé wrote: > > On Mon, Mar 04, 2019 at 03:16:41PM +0100, Igor Mammedov wrote: > > > On Mon, 4 Mar 2019 12:39:08 +0000 > > > Daniel P. Berrangé <berrange@redhat.com> wrote: > > > > > > > On Mon, Mar 04, 2019 at 01:25:07PM +0100, Igor Mammedov wrote: > > > > > On Mon, 04 Mar 2019 08:13:53 +0100 > > > > > Markus Armbruster <armbru@redhat.com> wrote: > > > > > > Daniel P. Berrangé <berrange@redhat.com> writes: > > > > > > > On Fri, Mar 01, 2019 at 06:33:28PM +0100, Igor Mammedov wrote: > > > > > > > > On Fri, 1 Mar 2019 15:49:47 +0000 > > > > > > > > Daniel P. Berrangé <berrange@redhat.com> wrote: > > > > > > > > > On Fri, Mar 01, 2019 at 04:42:15PM +0100, Igor Mammedov wrote: > > > > > > > > > > The parameter allows to configure fake NUMA topology where guest > > > > > > > > > > VM simulates NUMA topology but not actually getting a performance > > > > > > > > > > benefits from it. The same or better results could be achieved > > > > > > > > > > using 'memdev' parameter. In light of that any VM that uses NUMA > > > > > > > > > > to get its benefits should use 'memdev' and to allow transition > > > > > > > > > > initial RAM to device based model, deprecate 'mem' parameter as > > > > > > > > > > its ad-hoc partitioning of initial RAM MemoryRegion can't be > > > > > > > > > > translated to memdev based backend transparently to users and in > > > > > > > > > > compatible manner (migration wise). > > > > > > > > > > > > > > > > > > > > That will also allow to clean up a bit our numa code, leaving only > > > > > > > > > > 'memdev' impl. in place and several boards that use node_mem > > > > > > > > > > to generate FDT/ACPI description from it. > > > > > > > > > > > > > > > > > > Can you confirm that the 'mem' and 'memdev' parameters to -numa > > > > > > > > > are 100% live migration compatible in both directions ? Libvirt > > > > > > > > > would need this to be the case in order to use the 'memdev' syntax > > > > > > > > > instead. > > > > > > > > Unfortunately they are not migration compatible in any direction, > > > > > > > > if it where possible to translate them to each other I'd alias 'mem' > > > > > > > > to 'memdev' without deprecation. The former sends over only one > > > > > > > > MemoryRegion to target, while the later sends over several (one per > > > > > > > > memdev). > > > > > > > > > > > > > > If we can't migration from one to the other, then we can not deprecate > > > > > > > the existing 'mem' syntax. Even if libvirt were to provide a config > > > > > > > option to let apps opt-in to the new syntax, we need to be able to > > > > > > > support live migration of existing running VMs indefinitely. Effectively > > > > > > > this means we need the to keep 'mem' support forever, or at least such > > > > > > > a long time that it effectively means forever. > > > > > > > > > > > > > > So I think this patch has to be dropped & replaced with one that > > > > > > > simply documents that memdev syntax is preferred. > > > > > > > > > > > > We have this habit of postulating absolutes like "can not deprecate" > > > > > > instead of engaging with the tradeoffs. We need to kick it. > > > > > > > > > > > > So let's have an actual look at the tradeoffs. > > > > > > > > > > > > We don't actually "support live migration of existing running VMs > > > > > > indefinitely". > > > > > > > > > > > > We support live migration to any newer version of QEMU that still > > > > > > supports the machine type. > > > > > > > > > > > > We support live migration to any older version of QEMU that already > > > > > > supports the machine type and all the devices the machine uses. > > > > > > > > > > > > Aside: "support" is really an honest best effort here. If you rely on > > > > > > it, use a downstream that puts in the (substantial!) QA work real > > > > > > support takes. > > > > > > > > > > > > Feature deprecation is not a contract to drop the feature after two > > > > > > releases, or even five. It's a formal notice that users of the feature > > > > > > should transition to its replacement in an orderly manner. > > > > > > > > > > > > If I understand Igor correctly, all users should transition away from > > > > > > outdated NUMA configurations at least for new VMs in an orderly manner. > > > > > Yes, we can postpone removing options until there are machines type > > > > > versions that were capable to use it (unfortunate but probably > > > > > unavoidable unless there is a migration trick to make transition > > > > > transparent) but that should not stop us from disabling broken > > > > > options on new machine types at least. > > > > > > > > > > This series can serve as formal notice with follow up disabling of > > > > > deprecated options for new machine types. (As Thomas noted, just warnings > > > > > do not work and users continue to use broken features regardless whether > > > > > they are don't know about issues or aware of it [*]) > > > > > > > > > > Hence suggested deprecation approach and enforced rejection of legacy > > > > > numa options for new machine types in 2 releases so users would stop > > > > > using them eventually. > > > > > > > > When we deprecate something, we need to have a way for apps to use the > > > > new alternative approach *at the same time*. So even if we only want to > > > > deprecate for new machine types, we still have to first solve the problem > > > > of how mgmt apps will introspect QEMU to learn which machine types expect > > > > the new options. > > > I'm not aware any mechanism to introspect machine type options (existing > > > or something being developed). Are/were there any ideas about it that were > > > discussed in the past? > > > > > > Aside from developing a new mechanism what are alternative approaches? > > > I mean when we delete deprecated CLI option, how it's solved on libvirt > > > side currently? > > > > > > For example I don't see anything introspection related when we have been > > > removing deprecated options recently. > > > > Right, with other stuff we deprecate we've had a simpler time, as it > > either didn't affect migration at all, or the new replacement stuff > > was fully compatible with the migration data stream. IOW, libvirt > > could unconditionally use the new feature as soon as it saw that it > > exists in QEMU. We didn't have any machine type dependancy to deal > > with before now. > > We couldn't have done that. How we would migrate from older qemu? > > Anyway, now that I look into this (esp. git log) I came accross: > > commit f309db1f4d51009bad0d32e12efc75530b66836b > Author: Michal Privoznik <mprivozn@redhat.com> > AuthorDate: Thu Dec 18 12:36:48 2014 +0100 > Commit: Michal Privoznik <mprivozn@redhat.com> > CommitDate: Fri Dec 19 07:44:44 2014 +0100 > > qemu: Create memory-backend-{ram,file} iff needed > > Or this 7832fac84741d65e851dbdbfaf474785cbfdcf3c. We did try to generated > newer cmd line but then for various reasong (e.g. avoiding triggering a qemu > bug) we turned it off and make libvirt default to older (now deprecated) cmd > line. > > Frankly, I don't know how to proceed. Unless qemu is fixed to allow > migration from deprecated to new cmd line (unlikely, if not impossible, > right?) then I guess the only approach we can have is that: > > 1) whenever so called cold booting a new machine (fresh, brand new start of > a new domain) libvirt would default to modern cmd line, > > 2) on migration, libvirt would record in the migration stream (or status XML > or wherever) that modern cmd line was generated and thus it'll make the > destination generate modern cmd line too. > > This solution still suffers a couple of problems: > a) migration to older libvirt will fail as older libvirt won't recognize the > flag set in 2) and therefore would default to deprecated cmd line > b) migrating from one host to another won't modernize the cmd line > > But I guess we have to draw a line somewhere (if we are not willing to write > those migration patches). Yeah supporting backwards migration is a non-optional requirement from at least one of the mgmt apps using libvirt, so breaking the new to old case is something we always aim to avoid. These incompabilities are reminding me why we haven't tied these kind of changes to machine type versions in the past. New machine type != new libvirt, so we can't tie usage of a feature in livirt to a new machine type. I'm wondering exactly which cases libvirt will still use the "mem" option in as opposed to "memdev". If none of the cases using "mem" actually suffer from the ill-effects of "mem", then there's not a compelling reason to stop using it. It can be discouraged in QEMU documentation but otherwise left alone. Regards, Daniel
On Mon, 4 Mar 2019 15:02:18 +0000 Daniel P. Berrangé <berrange@redhat.com> wrote: > On Mon, Mar 04, 2019 at 03:54:57PM +0100, Igor Mammedov wrote: > > On Mon, 4 Mar 2019 13:59:09 +0000 > > Daniel P. Berrangé <berrange@redhat.com> wrote: > > > > > On Mon, Mar 04, 2019 at 02:55:10PM +0100, Igor Mammedov wrote: > > > > On Mon, 4 Mar 2019 09:11:19 +0100 > > > > Thomas Huth <thuth@redhat.com> wrote: > > > > > > > > > On 01/03/2019 18.48, Daniel P. Berrangé wrote: > > > > > [...] > > > > > > So I think this patch has to be dropped & replaced with one that > > > > > > simply documents that memdev syntax is preferred. > > > > > > > > > > That's definitely not enough. I've had a couple of cases already where > > > > > we documented that certain options should not be used anymore, and > > > > > people simply ignored it (aka. if it ain't broken, don't do any change). > > > > > Then they just started to complain when I really tried to remove the > > > > > option after the deprecation period. > > > > > > > > > So Igor, if you can not officially deprecate these things here yet, you > > > > > should at least make sure that they can not be used with new machine > > > > > types anymore. Then, after a couple of years, when we feel sure that > > > > > there are only some few or no people left who still use it with the old > > > > > machine types, we can start to discuss the deprecation process again, I > > > > > think. > > > > Is it acceptable to silently disable CLI options (even if they are broken > > > > like in this case) for new machine types? > > > > I was under impression that it should go through deprecation first. > > > > > > Yes, it must go through deprecation. I was saying we can't disable > > > the CLI options at all, until there is a way for libvirt to correctly > > > use the new options. > > > > I'm not adding new options (nor plan to for numa case (yet)), > > -numa node,memdev is around several years by now and should be used > > as default for creating new configs. > > > > In light of keeping 'mem' option around for old machines, > > Deprecation should have served for notifying users that legacy > > options will be disabled later on (for new machines at least > > if no way found for migration compatible transition for older ones). > > > > What I'm mainly aiming here is to prevent using broken legacy options > > for new VMs (like in RHBZ1624223 case) and deprecation is the only way > > we have now to notify users about CLI breaking changes. > > The idea of doing advance warnings via deprecations is that applications > have time to adapt to the new mechanism several releases before the old > mechanism is removed/disabled. Since the new mechanism isn't fully > usable yet, applications can't adapt to use it. So we can't start the > deprecation process yet, as it would be telling apps to do a switch > that isn't possible for many to actually do. At least a hope attempt to deprecate 'mem', served its purpose somewhat. Now it's clear that libvirt defaults to wrong legacy mode and should use 'memdev' instead. I hope that it will be addressed on libvirt side regardless of the fate of 'mem' deprecation process. Basically new VM should default to 'memdev' if it's available. > In the meantime, qemu-options.hx could be updated. It documents both > "mem" and "memdev" currently but doesn't tell people that "memdev" is > the preferred syntax for future usage / warn against using "mem". Will do so. > > Regards, > Daniel
On Mon, 4 Mar 2019 16:35:16 +0000 Daniel P. Berrangé <berrange@redhat.com> wrote: > On Mon, Mar 04, 2019 at 05:20:13PM +0100, Michal Privoznik wrote: > > On 3/4/19 3:24 PM, Daniel P. Berrangé wrote: > > > On Mon, Mar 04, 2019 at 03:16:41PM +0100, Igor Mammedov wrote: > > > > On Mon, 4 Mar 2019 12:39:08 +0000 > > > > Daniel P. Berrangé <berrange@redhat.com> wrote: > > > > > > > > > On Mon, Mar 04, 2019 at 01:25:07PM +0100, Igor Mammedov wrote: > > > > > > On Mon, 04 Mar 2019 08:13:53 +0100 > > > > > > Markus Armbruster <armbru@redhat.com> wrote: > > > > > > > Daniel P. Berrangé <berrange@redhat.com> writes: > > > > > > > > On Fri, Mar 01, 2019 at 06:33:28PM +0100, Igor Mammedov wrote: > > > > > > > > > On Fri, 1 Mar 2019 15:49:47 +0000 > > > > > > > > > Daniel P. Berrangé <berrange@redhat.com> wrote: > > > > > > > > > > On Fri, Mar 01, 2019 at 04:42:15PM +0100, Igor Mammedov wrote: > > > > > > > > > > > The parameter allows to configure fake NUMA topology where guest > > > > > > > > > > > VM simulates NUMA topology but not actually getting a performance > > > > > > > > > > > benefits from it. The same or better results could be achieved > > > > > > > > > > > using 'memdev' parameter. In light of that any VM that uses NUMA > > > > > > > > > > > to get its benefits should use 'memdev' and to allow transition > > > > > > > > > > > initial RAM to device based model, deprecate 'mem' parameter as > > > > > > > > > > > its ad-hoc partitioning of initial RAM MemoryRegion can't be > > > > > > > > > > > translated to memdev based backend transparently to users and in > > > > > > > > > > > compatible manner (migration wise). > > > > > > > > > > > > > > > > > > > > > > That will also allow to clean up a bit our numa code, leaving only > > > > > > > > > > > 'memdev' impl. in place and several boards that use node_mem > > > > > > > > > > > to generate FDT/ACPI description from it. > > > > > > > > > > > > > > > > > > > > Can you confirm that the 'mem' and 'memdev' parameters to -numa > > > > > > > > > > are 100% live migration compatible in both directions ? Libvirt > > > > > > > > > > would need this to be the case in order to use the 'memdev' syntax > > > > > > > > > > instead. > > > > > > > > > Unfortunately they are not migration compatible in any direction, > > > > > > > > > if it where possible to translate them to each other I'd alias 'mem' > > > > > > > > > to 'memdev' without deprecation. The former sends over only one > > > > > > > > > MemoryRegion to target, while the later sends over several (one per > > > > > > > > > memdev). > > > > > > > > > > > > > > > > If we can't migration from one to the other, then we can not deprecate > > > > > > > > the existing 'mem' syntax. Even if libvirt were to provide a config > > > > > > > > option to let apps opt-in to the new syntax, we need to be able to > > > > > > > > support live migration of existing running VMs indefinitely. Effectively > > > > > > > > this means we need the to keep 'mem' support forever, or at least such > > > > > > > > a long time that it effectively means forever. > > > > > > > > > > > > > > > > So I think this patch has to be dropped & replaced with one that > > > > > > > > simply documents that memdev syntax is preferred. > > > > > > > > > > > > > > We have this habit of postulating absolutes like "can not deprecate" > > > > > > > instead of engaging with the tradeoffs. We need to kick it. > > > > > > > > > > > > > > So let's have an actual look at the tradeoffs. > > > > > > > > > > > > > > We don't actually "support live migration of existing running VMs > > > > > > > indefinitely". > > > > > > > > > > > > > > We support live migration to any newer version of QEMU that still > > > > > > > supports the machine type. > > > > > > > > > > > > > > We support live migration to any older version of QEMU that already > > > > > > > supports the machine type and all the devices the machine uses. > > > > > > > > > > > > > > Aside: "support" is really an honest best effort here. If you rely on > > > > > > > it, use a downstream that puts in the (substantial!) QA work real > > > > > > > support takes. > > > > > > > > > > > > > > Feature deprecation is not a contract to drop the feature after two > > > > > > > releases, or even five. It's a formal notice that users of the feature > > > > > > > should transition to its replacement in an orderly manner. > > > > > > > > > > > > > > If I understand Igor correctly, all users should transition away from > > > > > > > outdated NUMA configurations at least for new VMs in an orderly manner. > > > > > > Yes, we can postpone removing options until there are machines type > > > > > > versions that were capable to use it (unfortunate but probably > > > > > > unavoidable unless there is a migration trick to make transition > > > > > > transparent) but that should not stop us from disabling broken > > > > > > options on new machine types at least. > > > > > > > > > > > > This series can serve as formal notice with follow up disabling of > > > > > > deprecated options for new machine types. (As Thomas noted, just warnings > > > > > > do not work and users continue to use broken features regardless whether > > > > > > they are don't know about issues or aware of it [*]) > > > > > > > > > > > > Hence suggested deprecation approach and enforced rejection of legacy > > > > > > numa options for new machine types in 2 releases so users would stop > > > > > > using them eventually. > > > > > > > > > > When we deprecate something, we need to have a way for apps to use the > > > > > new alternative approach *at the same time*. So even if we only want to > > > > > deprecate for new machine types, we still have to first solve the problem > > > > > of how mgmt apps will introspect QEMU to learn which machine types expect > > > > > the new options. > > > > I'm not aware any mechanism to introspect machine type options (existing > > > > or something being developed). Are/were there any ideas about it that were > > > > discussed in the past? > > > > > > > > Aside from developing a new mechanism what are alternative approaches? > > > > I mean when we delete deprecated CLI option, how it's solved on libvirt > > > > side currently? > > > > > > > > For example I don't see anything introspection related when we have been > > > > removing deprecated options recently. > > > > > > Right, with other stuff we deprecate we've had a simpler time, as it > > > either didn't affect migration at all, or the new replacement stuff > > > was fully compatible with the migration data stream. IOW, libvirt > > > could unconditionally use the new feature as soon as it saw that it > > > exists in QEMU. We didn't have any machine type dependancy to deal > > > with before now. > > > > We couldn't have done that. How we would migrate from older qemu? > > > > Anyway, now that I look into this (esp. git log) I came accross: > > > > commit f309db1f4d51009bad0d32e12efc75530b66836b > > Author: Michal Privoznik <mprivozn@redhat.com> > > AuthorDate: Thu Dec 18 12:36:48 2014 +0100 > > Commit: Michal Privoznik <mprivozn@redhat.com> > > CommitDate: Fri Dec 19 07:44:44 2014 +0100 > > > > qemu: Create memory-backend-{ram,file} iff needed > > > > Or this 7832fac84741d65e851dbdbfaf474785cbfdcf3c. We did try to generated > > newer cmd line but then for various reasong (e.g. avoiding triggering a qemu > > bug) we turned it off and make libvirt default to older (now deprecated) cmd > > line. > > > > Frankly, I don't know how to proceed. Unless qemu is fixed to allow > > migration from deprecated to new cmd line (unlikely, if not impossible, > > right?) then I guess the only approach we can have is that: > > > > 1) whenever so called cold booting a new machine (fresh, brand new start of > > a new domain) libvirt would default to modern cmd line, > > > > 2) on migration, libvirt would record in the migration stream (or status XML > > or wherever) that modern cmd line was generated and thus it'll make the > > destination generate modern cmd line too. > > > > This solution still suffers a couple of problems: > > a) migration to older libvirt will fail as older libvirt won't recognize the > > flag set in 2) and therefore would default to deprecated cmd line > > b) migrating from one host to another won't modernize the cmd line > > > > But I guess we have to draw a line somewhere (if we are not willing to write > > those migration patches). > > Yeah supporting backwards migration is a non-optional requirement from at > least one of the mgmt apps using libvirt, so breaking the new to old case > is something we always aim to avoid. Aiming for support of "new QEMU + new machine type" => "old QEMU + non-existing machine type" seems a bit difficult. Note old machine types will continue to work with old CLI. > These incompabilities are reminding me why we haven't tied these kind of > changes to machine type versions in the past. New machine type != new > libvirt, so we can't tie usage of a feature in livirt to a new machine > type. > > I'm wondering exactly which cases libvirt will still use the "mem" option > in as opposed to "memdev". If none of the cases using "mem" actually > suffer from the ill-effects of "mem", then there's not a compelling reason > to stop using it. It can be discouraged in QEMU documentation but otherwise > left alone. > > Regards, > Daniel
On Mon, 4 Mar 2019 17:20:13 +0100 Michal Privoznik <mprivozn@redhat.com> wrote: > On 3/4/19 3:24 PM, Daniel P. Berrangé wrote: > > On Mon, Mar 04, 2019 at 03:16:41PM +0100, Igor Mammedov wrote: > >> On Mon, 4 Mar 2019 12:39:08 +0000 > >> Daniel P. Berrangé <berrange@redhat.com> wrote: > >> > >>> On Mon, Mar 04, 2019 at 01:25:07PM +0100, Igor Mammedov wrote: > >>>> On Mon, 04 Mar 2019 08:13:53 +0100 > >>>> Markus Armbruster <armbru@redhat.com> wrote: > >>>> > >>>>> Daniel P. Berrangé <berrange@redhat.com> writes: > >>>>> > >>>>>> On Fri, Mar 01, 2019 at 06:33:28PM +0100, Igor Mammedov wrote: > >>>>>>> On Fri, 1 Mar 2019 15:49:47 +0000 > >>>>>>> Daniel P. Berrangé <berrange@redhat.com> wrote: > >>>>>>> > >>>>>>>> On Fri, Mar 01, 2019 at 04:42:15PM +0100, Igor Mammedov wrote: > >>>>>>>>> The parameter allows to configure fake NUMA topology where guest > >>>>>>>>> VM simulates NUMA topology but not actually getting a performance > >>>>>>>>> benefits from it. The same or better results could be achieved > >>>>>>>>> using 'memdev' parameter. In light of that any VM that uses NUMA > >>>>>>>>> to get its benefits should use 'memdev' and to allow transition > >>>>>>>>> initial RAM to device based model, deprecate 'mem' parameter as > >>>>>>>>> its ad-hoc partitioning of initial RAM MemoryRegion can't be > >>>>>>>>> translated to memdev based backend transparently to users and in > >>>>>>>>> compatible manner (migration wise). > >>>>>>>>> > >>>>>>>>> That will also allow to clean up a bit our numa code, leaving only > >>>>>>>>> 'memdev' impl. in place and several boards that use node_mem > >>>>>>>>> to generate FDT/ACPI description from it. > >>>>>>>> > >>>>>>>> Can you confirm that the 'mem' and 'memdev' parameters to -numa > >>>>>>>> are 100% live migration compatible in both directions ? Libvirt > >>>>>>>> would need this to be the case in order to use the 'memdev' syntax > >>>>>>>> instead. > >>>>>>> Unfortunately they are not migration compatible in any direction, > >>>>>>> if it where possible to translate them to each other I'd alias 'mem' > >>>>>>> to 'memdev' without deprecation. The former sends over only one > >>>>>>> MemoryRegion to target, while the later sends over several (one per > >>>>>>> memdev). > >>>>>> > >>>>>> If we can't migration from one to the other, then we can not deprecate > >>>>>> the existing 'mem' syntax. Even if libvirt were to provide a config > >>>>>> option to let apps opt-in to the new syntax, we need to be able to > >>>>>> support live migration of existing running VMs indefinitely. Effectively > >>>>>> this means we need the to keep 'mem' support forever, or at least such > >>>>>> a long time that it effectively means forever. > >>>>>> > >>>>>> So I think this patch has to be dropped & replaced with one that > >>>>>> simply documents that memdev syntax is preferred. > >>>>> > >>>>> We have this habit of postulating absolutes like "can not deprecate" > >>>>> instead of engaging with the tradeoffs. We need to kick it. > >>>>> > >>>>> So let's have an actual look at the tradeoffs. > >>>>> > >>>>> We don't actually "support live migration of existing running VMs > >>>>> indefinitely". > >>>>> > >>>>> We support live migration to any newer version of QEMU that still > >>>>> supports the machine type. > >>>>> > >>>>> We support live migration to any older version of QEMU that already > >>>>> supports the machine type and all the devices the machine uses. > >>>>> > >>>>> Aside: "support" is really an honest best effort here. If you rely on > >>>>> it, use a downstream that puts in the (substantial!) QA work real > >>>>> support takes. > >>>>> > >>>>> Feature deprecation is not a contract to drop the feature after two > >>>>> releases, or even five. It's a formal notice that users of the feature > >>>>> should transition to its replacement in an orderly manner. > >>>>> > >>>>> If I understand Igor correctly, all users should transition away from > >>>>> outdated NUMA configurations at least for new VMs in an orderly manner. > >>>> Yes, we can postpone removing options until there are machines type > >>>> versions that were capable to use it (unfortunate but probably > >>>> unavoidable unless there is a migration trick to make transition > >>>> transparent) but that should not stop us from disabling broken > >>>> options on new machine types at least. > >>>> > >>>> This series can serve as formal notice with follow up disabling of > >>>> deprecated options for new machine types. (As Thomas noted, just warnings > >>>> do not work and users continue to use broken features regardless whether > >>>> they are don't know about issues or aware of it [*]) > >>>> > >>>> Hence suggested deprecation approach and enforced rejection of legacy > >>>> numa options for new machine types in 2 releases so users would stop > >>>> using them eventually. > >>> > >>> When we deprecate something, we need to have a way for apps to use the > >>> new alternative approach *at the same time*. So even if we only want to > >>> deprecate for new machine types, we still have to first solve the problem > >>> of how mgmt apps will introspect QEMU to learn which machine types expect > >>> the new options. > >> I'm not aware any mechanism to introspect machine type options (existing > >> or something being developed). Are/were there any ideas about it that were > >> discussed in the past? > >> > >> Aside from developing a new mechanism what are alternative approaches? > >> I mean when we delete deprecated CLI option, how it's solved on libvirt > >> side currently? > >> > >> For example I don't see anything introspection related when we have been > >> removing deprecated options recently. > > > > Right, with other stuff we deprecate we've had a simpler time, as it > > either didn't affect migration at all, or the new replacement stuff > > was fully compatible with the migration data stream. IOW, libvirt > > could unconditionally use the new feature as soon as it saw that it > > exists in QEMU. We didn't have any machine type dependancy to deal > > with before now. > > We couldn't have done that. How we would migrate from older qemu? > > Anyway, now that I look into this (esp. git log) I came accross: > > commit f309db1f4d51009bad0d32e12efc75530b66836b > Author: Michal Privoznik <mprivozn@redhat.com> > AuthorDate: Thu Dec 18 12:36:48 2014 +0100 > Commit: Michal Privoznik <mprivozn@redhat.com> > CommitDate: Fri Dec 19 07:44:44 2014 +0100 > > qemu: Create memory-backend-{ram,file} iff needed > > Or this 7832fac84741d65e851dbdbfaf474785cbfdcf3c. We did try to > generated newer cmd line but then for various reasong (e.g. avoiding > triggering a qemu bug) we turned it off and make libvirt default to > older (now deprecated) cmd line. > > Frankly, I don't know how to proceed. Unless qemu is fixed to allow > migration from deprecated to new cmd line (unlikely, if not impossible, > right?) then I guess the only approach we can have is that: In non numa case (or 1 node case) there is migration compatible solution, I'm working on it. The only problematic case is where -mem-path + -mem-prealloc + pc-dimm are used with muti-node variant + '-numa node,mem=xx' option or plain '-numa node' with implicit RAM distribution (the same as 'mem' just done by QEMU), here I can't translate options due to different memory layout, so these VMs would still continue using old 'mem' with old machine types. > 1) whenever so called cold booting a new machine (fresh, brand new start > of a new domain) libvirt would default to modern cmd line, that's what I'd prefer and complained that this route wasn't followed. (it would even work with currently broken QEMU, since modern CLI variant works) It would wean down 'mem' userbase that configures NUMA incorrectly (i.e. non pinned mem&cpus). Users still can do that with 'memdev' by leaving out bind policy but when they ask for pinning with bind policy they will really get it since under-hood "-mem-path + -mem-prealloc" will be the same 'memdev,prealloc=on', without surprise due to quite non obvious call ordering. > 2) on migration, libvirt would record in the migration stream (or status > XML or wherever) that modern cmd line was generated and thus it'll make > the destination generate modern cmd line too. since legacy option is not removed, it's not necessary for converting legacy SOURCE to new CLI (new QEMU would still accept legacy CLI for old machine types), i.e.e business as usual. > This solution still suffers a couple of problems: > a) migration to older libvirt will fail as older libvirt won't recognize > the flag set in 2) and therefore would default to deprecated cmd line > b) migrating from one host to another won't modernize the cmd line I didn't knew that there was 'modernizing' feature in libvirt but modernizing CLI only necessary when we remove option completely. > But I guess we have to draw a line somewhere (if we are not willing to > write those migration patches). I wasn't able to make migration from 1 to N MemoryRegions and back work. So the only choice left is to start deprecation of numa 'mem' option with new machine types, so that at least there memory pinning would not be broken and it wouldn't be possible to mis-configure it. > Michal
On Wed, Mar 06, 2019 at 08:03:48PM +0100, Igor Mammedov wrote: > On Mon, 4 Mar 2019 16:35:16 +0000 > Daniel P. Berrangé <berrange@redhat.com> wrote: > > > On Mon, Mar 04, 2019 at 05:20:13PM +0100, Michal Privoznik wrote: > > > We couldn't have done that. How we would migrate from older qemu? > > > > > > Anyway, now that I look into this (esp. git log) I came accross: > > > > > > commit f309db1f4d51009bad0d32e12efc75530b66836b > > > Author: Michal Privoznik <mprivozn@redhat.com> > > > AuthorDate: Thu Dec 18 12:36:48 2014 +0100 > > > Commit: Michal Privoznik <mprivozn@redhat.com> > > > CommitDate: Fri Dec 19 07:44:44 2014 +0100 > > > > > > qemu: Create memory-backend-{ram,file} iff needed > > > > > > Or this 7832fac84741d65e851dbdbfaf474785cbfdcf3c. We did try to generated > > > newer cmd line but then for various reasong (e.g. avoiding triggering a qemu > > > bug) we turned it off and make libvirt default to older (now deprecated) cmd > > > line. > > > > > > Frankly, I don't know how to proceed. Unless qemu is fixed to allow > > > migration from deprecated to new cmd line (unlikely, if not impossible, > > > right?) then I guess the only approach we can have is that: > > > > > > 1) whenever so called cold booting a new machine (fresh, brand new start of > > > a new domain) libvirt would default to modern cmd line, > > > > > > 2) on migration, libvirt would record in the migration stream (or status XML > > > or wherever) that modern cmd line was generated and thus it'll make the > > > destination generate modern cmd line too. > > > > > > This solution still suffers a couple of problems: > > > a) migration to older libvirt will fail as older libvirt won't recognize the > > > flag set in 2) and therefore would default to deprecated cmd line > > > b) migrating from one host to another won't modernize the cmd line > > > > > > But I guess we have to draw a line somewhere (if we are not willing to write > > > those migration patches). > > > > Yeah supporting backwards migration is a non-optional requirement from at > > least one of the mgmt apps using libvirt, so breaking the new to old case > > is something we always aim to avoid. > Aiming for support of > "new QEMU + new machine type" => "old QEMU + non-existing machine type" > seems a bit difficult. That's not the scenario that's the problem. The problem is new QEMU + new machine type + new libvirt -> new QEMU + new machine type + old libvirt Previously released versions of libvirt will happily use any new machine type that QEMU introduces. So we can't make new libvirt use a different options, only for new machine types, as old libvirt supports those machine types too. Regards, Daniel
Daniel P. Berrangé <berrange@redhat.com> writes: > On Mon, Mar 04, 2019 at 12:45:14PM +0100, Markus Armbruster wrote: >> Daniel P. Berrangé <berrange@redhat.com> writes: >> >> > On Mon, Mar 04, 2019 at 08:13:53AM +0100, Markus Armbruster wrote: >> >> If we deprecate outdated NUMA configurations now, we can start rejecting >> >> them with new machine types after a suitable grace period. >> > >> > How is libvirt going to know what machines it can use with the feature ? >> > We don't have any way to introspect machine type specific logic, since we >> > run all probing with "-machine none", and QEMU can't report anything about >> > machines without instantiating them. >> >> Fair point. A practical way for management applications to decide which >> of the two interfaces they can use with which machine type may be >> required for deprecating one of the interfaces with new machine types. > > We currently have "qom-list-properties" which can report on the > existance of properties registered against object types. What it > can't do though is report on the default values of these properties. Yes. > What's interesting though is that qmp_qom_list_properties will actually > instantiate objects in order to query properties, if the type isn't an > abstract type. If it's an abstract type, qom-list-properties returns the properties created with object_class_property_add() & friends, typically by the class_init method. This is possible without instantiating the type. If it's a concrete type, qom-list-properties additionally returns the properties created with object_property_add(), typically by the instance_init() method. This requires instantiating the type. Both kinds of properties can be added or deleted at any time. For instance, setting a property value with object_property_set() or similar could create additional properties. For historical reasons, we use often use object_property_add() where object_class_property_add() would do. Sad. > IOW, even if you are running "$QEMU -machine none", then if at the qmp-shell > you do > > (QEMU) qom-list-properties typename=pc-q35-2.6-machine > > it will have actually instantiate the pc-q35-2.6-machine machine type. > Since it has instantiated the machine, the object initializer function > will have run and initialized the default values for various properties. > > IOW, it is possible for qom-list-properties to report on default values > for non-abstract types. instance_init() also initializes the properties' values. qom-list-properties could show these initial values (I hesitate calling them default values). Setting a property's value can change other properties' values by side effect. My point is: the properties qom-list-properties shows and the initial values it could show are not necessarily final. QOM is designed to be maximally flexible, and flexibility brings along its bosom-buddy complexity. If you keep that in mind, qom-list-properties can be put to good use all the same. A way to report "default values" (really: whatever the values are after object_new()) feels like a fair feature request to me, if backed by an actual use case. [...]
Daniel P. Berrangé <berrange@redhat.com> writes: > On Wed, Mar 06, 2019 at 08:03:48PM +0100, Igor Mammedov wrote: >> On Mon, 4 Mar 2019 16:35:16 +0000 >> Daniel P. Berrangé <berrange@redhat.com> wrote: >> >> > On Mon, Mar 04, 2019 at 05:20:13PM +0100, Michal Privoznik wrote: >> > > We couldn't have done that. How we would migrate from older qemu? >> > > >> > > Anyway, now that I look into this (esp. git log) I came accross: >> > > >> > > commit f309db1f4d51009bad0d32e12efc75530b66836b >> > > Author: Michal Privoznik <mprivozn@redhat.com> >> > > AuthorDate: Thu Dec 18 12:36:48 2014 +0100 >> > > Commit: Michal Privoznik <mprivozn@redhat.com> >> > > CommitDate: Fri Dec 19 07:44:44 2014 +0100 >> > > >> > > qemu: Create memory-backend-{ram,file} iff needed >> > > >> > > Or this 7832fac84741d65e851dbdbfaf474785cbfdcf3c. We did try to generated >> > > newer cmd line but then for various reasong (e.g. avoiding triggering a qemu >> > > bug) we turned it off and make libvirt default to older (now deprecated) cmd >> > > line. >> > > >> > > Frankly, I don't know how to proceed. Unless qemu is fixed to allow >> > > migration from deprecated to new cmd line (unlikely, if not impossible, >> > > right?) then I guess the only approach we can have is that: >> > > >> > > 1) whenever so called cold booting a new machine (fresh, brand new start of >> > > a new domain) libvirt would default to modern cmd line, >> > > >> > > 2) on migration, libvirt would record in the migration stream (or status XML >> > > or wherever) that modern cmd line was generated and thus it'll make the >> > > destination generate modern cmd line too. >> > > >> > > This solution still suffers a couple of problems: >> > > a) migration to older libvirt will fail as older libvirt won't recognize the >> > > flag set in 2) and therefore would default to deprecated cmd line >> > > b) migrating from one host to another won't modernize the cmd line >> > > >> > > But I guess we have to draw a line somewhere (if we are not willing to write >> > > those migration patches). >> > >> > Yeah supporting backwards migration is a non-optional requirement from at >> > least one of the mgmt apps using libvirt, so breaking the new to old case >> > is something we always aim to avoid. >> Aiming for support of >> "new QEMU + new machine type" => "old QEMU + non-existing machine type" >> seems a bit difficult. > > That's not the scenario that's the problem. The problem is > > new QEMU + new machine type + new libvirt -> new QEMU + new machine type + old libvirt > > Previously released versions of libvirt will happily use any new machine > type that QEMU introduces. So we can't make new libvirt use a different > options, only for new machine types, as old libvirt supports those machine > types too. Avoiding tight coupling between QEMU und libvirt versions makes sense, because having to upgrade stuff in lock-step is such a pain. Does not imply we must support arbitrary combinations of QEMU and libvirt versions. Unless upstream libvirt's test matrix covers all versions of libvirt against all released versions of QEMU, "previously released versions of libvirt will continue to work with new QEMU" is largely an empty promise anyway. The real promise is more like "we won't break it intentionally; good luck". Mind, I'm not criticizing that real promise. I'm criticizing cutting yourself off from large areas of the solution space so you can continue to pretend to yourself you actually deliver on the empty promise. Now, if you limited what you promise to something more realistic, ideally to something you actually test, we could talk about deprecation schedules constructively. For instance, if you promised QEMU as of time T + its latest machine type + libvirt as of time T -> QEMU as of time T + its latest machine type + libvirt as of time T - d will work for a certain value of d, then once all released versions of libvirt since T - d support a new way of doing things, flipping to that new way becomes a whole lot easier.
On Sun, 10 Mar 2019 11:16:33 +0100 Markus Armbruster <armbru@redhat.com> wrote: > Daniel P. Berrangé <berrange@redhat.com> writes: > > > On Wed, Mar 06, 2019 at 08:03:48PM +0100, Igor Mammedov wrote: > >> On Mon, 4 Mar 2019 16:35:16 +0000 > >> Daniel P. Berrangé <berrange@redhat.com> wrote: > >> > >> > On Mon, Mar 04, 2019 at 05:20:13PM +0100, Michal Privoznik wrote: > >> > > We couldn't have done that. How we would migrate from older qemu? > >> > > > >> > > Anyway, now that I look into this (esp. git log) I came accross: > >> > > > >> > > commit f309db1f4d51009bad0d32e12efc75530b66836b > >> > > Author: Michal Privoznik <mprivozn@redhat.com> > >> > > AuthorDate: Thu Dec 18 12:36:48 2014 +0100 > >> > > Commit: Michal Privoznik <mprivozn@redhat.com> > >> > > CommitDate: Fri Dec 19 07:44:44 2014 +0100 > >> > > > >> > > qemu: Create memory-backend-{ram,file} iff needed > >> > > > >> > > Or this 7832fac84741d65e851dbdbfaf474785cbfdcf3c. We did try to generated > >> > > newer cmd line but then for various reasong (e.g. avoiding triggering a qemu > >> > > bug) we turned it off and make libvirt default to older (now deprecated) cmd > >> > > line. > >> > > > >> > > Frankly, I don't know how to proceed. Unless qemu is fixed to allow > >> > > migration from deprecated to new cmd line (unlikely, if not impossible, > >> > > right?) then I guess the only approach we can have is that: > >> > > > >> > > 1) whenever so called cold booting a new machine (fresh, brand new start of > >> > > a new domain) libvirt would default to modern cmd line, > >> > > > >> > > 2) on migration, libvirt would record in the migration stream (or status XML > >> > > or wherever) that modern cmd line was generated and thus it'll make the > >> > > destination generate modern cmd line too. > >> > > > >> > > This solution still suffers a couple of problems: > >> > > a) migration to older libvirt will fail as older libvirt won't recognize the > >> > > flag set in 2) and therefore would default to deprecated cmd line > >> > > b) migrating from one host to another won't modernize the cmd line > >> > > > >> > > But I guess we have to draw a line somewhere (if we are not willing to write > >> > > those migration patches). > >> > > >> > Yeah supporting backwards migration is a non-optional requirement from at > >> > least one of the mgmt apps using libvirt, so breaking the new to old case > >> > is something we always aim to avoid. > >> Aiming for support of > >> "new QEMU + new machine type" => "old QEMU + non-existing machine type" > >> seems a bit difficult. > > > > That's not the scenario that's the problem. The problem is > > > > new QEMU + new machine type + new libvirt -> new QEMU + new machine type + old libvirt > > > > Previously released versions of libvirt will happily use any new machine > > type that QEMU introduces. So we can't make new libvirt use a different > > options, only for new machine types, as old libvirt supports those machine > > types too. > > Avoiding tight coupling between QEMU und libvirt versions makes sense, > because having to upgrade stuff in lock-step is such a pain. > > Does not imply we must support arbitrary combinations of QEMU and > libvirt versions. Isn't it typically a job of downstream to ship a bundle that works together and it is rather limited set. e.g System 1 libvirt 0 QEMU 0 (machine 0.1 (latest)) could be migrated to 2 ways to System 2 libvirt 1 QEMU 1 (machine 0.1 (still the same old machine)) while installing QEMU 1 on System 1 might work (if it doesn't break due to dependencies) and even be able to start machine 1.0, wouldn't it really fall in unsupported category? > Unless upstream libvirt's test matrix covers all versions of libvirt > against all released versions of QEMU, "previously released versions of > libvirt will continue to work with new QEMU" is largely an empty promise > anyway. The real promise is more like "we won't break it intentionally; > good luck". > > Mind, I'm not criticizing that real promise. I'm criticizing cutting > yourself off from large areas of the solution space so you can continue > to pretend to yourself you actually deliver on the empty promise. > > Now, if you limited what you promise to something more realistic, > ideally to something you actually test, we could talk about deprecation > schedules constructively. > > For instance, if you promised > > QEMU as of time T + its latest machine type + libvirt as of time T > -> QEMU as of time T + its latest machine type + libvirt as of time T - d > > will work for a certain value of d, then once all released versions of > libvirt since T - d support a new way of doing things, flipping to that > new way becomes a whole lot easier. On top of that libvirt should be willing to adapt to the new way for above combination. I'm experimenting with making 'mem' migration compatible with 'memdev' (I have a working hack currently) to get a feeling of a toll it would take on QEMU, so far outlook isn't good. It just adds more compat code and new CLI options to deal with it (it just increases amount of maintenance nightmare for indefinite period of time). And everyone would hate adding such new CLI options which as it turns out we aren't able to deprecate and remove in practice. We really need to figure out how to introduce breaking change on management (CLI) side* in QEMU and make it digestible for libvirt and others. (* at least for new machine types).
On Mon, 4 Mar 2019 14:52:30 +0100 Igor Mammedov <imammedo@redhat.com> wrote: > On Fri, 1 Mar 2019 18:01:52 +0000 > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote: > > > * Igor Mammedov (imammedo@redhat.com) wrote: > > > On Fri, 1 Mar 2019 15:49:47 +0000 > > > Daniel P. Berrangé <berrange@redhat.com> wrote: > > > > > > > On Fri, Mar 01, 2019 at 04:42:15PM +0100, Igor Mammedov wrote: > > > > > The parameter allows to configure fake NUMA topology where guest > > > > > VM simulates NUMA topology but not actually getting a performance > > > > > benefits from it. The same or better results could be achieved > > > > > using 'memdev' parameter. In light of that any VM that uses NUMA > > > > > to get its benefits should use 'memdev' and to allow transition > > > > > initial RAM to device based model, deprecate 'mem' parameter as > > > > > its ad-hoc partitioning of initial RAM MemoryRegion can't be > > > > > translated to memdev based backend transparently to users and in > > > > > compatible manner (migration wise). > > > > > > > > > > That will also allow to clean up a bit our numa code, leaving only > > > > > 'memdev' impl. in place and several boards that use node_mem > > > > > to generate FDT/ACPI description from it. > > > > > > > > Can you confirm that the 'mem' and 'memdev' parameters to -numa > > > > are 100% live migration compatible in both directions ? Libvirt > > > > would need this to be the case in order to use the 'memdev' syntax > > > > instead. > > > Unfortunately they are not migration compatible in any direction, > > > if it where possible to translate them to each other I'd alias 'mem' > > > to 'memdev' without deprecation. The former sends over only one > > > MemoryRegion to target, while the later sends over several (one per > > > memdev). > > > > > > Mixed memory issue[1] first came from libvirt side RHBZ1624223, > > > back then it was resolved on libvirt side in favor of migration > > > compatibility vs correctness (i.e. bind policy doesn't work as expected). > > > What worse that it was made default and affects all new machines, > > > as I understood it. > > > > > > In case of -mem-path + -mem-prealloc (with 1 numa node or numa less) > > > it's possible on QEMU side to make conversion to memdev in migration > > > compatible way (that's what stopped Michal from memdev approach). > > > But it's hard to do so in multi-nodes case as amount of MemoryRegions > > > is different. > > > > > > Point is to consider 'mem' as mis-configuration error, as the user > > > in the first place using broken numa configuration > > > (i.e. fake numa configuration doesn't actually improve performance). > > > > > > CCed David, maybe he could offer a way to do 1:n migration and other > > > way around. > > > > I can't see a trivial way. > > About the easiest I can think of is if you had a way to create a memdev > > that was an alias to pc.ram (of a particular size and offset). > If I get you right that's what I was planning to do for numa-less machines > that use -mem-path/prealloc options, where it's possible to replace > an initial RAM MemoryRegion with a correspondingly named memdev and its > backing MemoryRegion. > But I don't see how it could work in case of legacy NUMA 'mem' options > where initial RAM is 1 MemoryRegion (it's a fake numa after all) and how to > translate that into several MemoryRegions (one per node/memdev). Limiting it to x86 for demo purposes. What would work (if*) is to create special MemoryRegion container, i.e. 1. make memory_region_allocate_system_memory():memory_region_init() that special which already has id pc.ram and size that matches the single RAMBlock with the same id in incoming migration stream from OLD qemu ( started with -numa node,mem=x ... options) 2. register "1" with vmstate_register_ram_global()/or other API which undercover will make migration code, split the single incoming RAM block into several smaller consecutive RAMBlocks represented by memdev backends that are mapped as subregions within container 'pc.ram' 3. in case of backward migration container MemoryRegion 'pc.ram' will serve other way around stitching back memdev subregions into the single 'pc.ram' migration stream. (if*) - but above describes an ideal use-case where -numa node,mem are properly sized. In practice though QEMU doesn't have any checks on numa's 'mem' value option. So users were able to 'split' RAM in arbitrary chunks which memdev based backends might not be able to recreate due to used backing storage limitations (alignment/page size). To make it worse we don't really know what source (old QEMU) uses for backend for real as it might fallback to anonymous RAM if mem-path fails (there is no fallback in memdev case as there user gets what he/she asked for or hard error). There might be other issues on migration side of things as well, but I just don't know about it enough to see them. > > Dave > > > > > > > > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com> > > > > > --- > > > > > numa.c | 2 ++ > > > > > qemu-deprecated.texi | 14 ++++++++++++++ > > > > > 2 files changed, 16 insertions(+) > > > > > > > > > > diff --git a/numa.c b/numa.c > > > > > index 3875e1e..2205773 100644 > > > > > --- a/numa.c > > > > > +++ b/numa.c > > > > > @@ -121,6 +121,8 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node, > > > > > > > > > > if (node->has_mem) { > > > > > numa_info[nodenr].node_mem = node->mem; > > > > > + warn_report("Parameter -numa node,mem is deprecated," > > > > > + " use -numa node,memdev instead"); > > > > > } > > > > > if (node->has_memdev) { > > > > > Object *o; > > > > > diff --git a/qemu-deprecated.texi b/qemu-deprecated.texi > > > > > index 45c5795..73f99d4 100644 > > > > > --- a/qemu-deprecated.texi > > > > > +++ b/qemu-deprecated.texi > > > > > @@ -60,6 +60,20 @@ Support for invalid topologies will be removed, the user must ensure > > > > > topologies described with -smp include all possible cpus, i.e. > > > > > @math{@var{sockets} * @var{cores} * @var{threads} = @var{maxcpus}}. > > > > > > > > > > +@subsection -numa node,mem=@var{size} (since 4.0) > > > > > + > > > > > +The parameter @option{mem} of @option{-numa node} is used to assign a part of > > > > > +guest RAM to a NUMA node. But when using it, it's impossible to manage specified > > > > > +size on the host side (like bind it to a host node, setting bind policy, ...), > > > > > +so guest end-ups with the fake NUMA configuration with suboptiomal performance. > > > > > +However since 2014 there is an alternative way to assign RAM to a NUMA node > > > > > +using parameter @option{memdev}, which does the same as @option{mem} and has > > > > > +an ability to actualy manage node RAM on the host side. Use parameter > > > > > +@option{memdev} with @var{memory-backend-ram} backend as an replacement for > > > > > +parameter @option{mem} to achieve the same fake NUMA effect or a properly > > > > > +configured @var{memory-backend-file} backend to actually benefit from NUMA > > > > > +configuration. > > > > > + > > > > > @section QEMU Machine Protocol (QMP) commands > > > > > > > > > > @subsection block-dirty-bitmap-add "autoload" parameter (since 2.12.0) > > > > > -- > > > > > 2.7.4 > > > > > > > > > > -- > > > > > libvir-list mailing list > > > > > libvir-list@redhat.com > > > > > https://www.redhat.com/mailman/listinfo/libvir-list > > > > > > > > Regards, > > > > Daniel > > > > > -- > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK > >
On Sun, 10 Mar 2019 11:14:08 +0100 Markus Armbruster <armbru@redhat.com> wrote: > Daniel P. Berrangé <berrange@redhat.com> writes: > > > On Mon, Mar 04, 2019 at 12:45:14PM +0100, Markus Armbruster wrote: > >> Daniel P. Berrangé <berrange@redhat.com> writes: > >> > >> > On Mon, Mar 04, 2019 at 08:13:53AM +0100, Markus Armbruster wrote: > >> >> If we deprecate outdated NUMA configurations now, we can start rejecting > >> >> them with new machine types after a suitable grace period. > >> > > >> > How is libvirt going to know what machines it can use with the feature ? > >> > We don't have any way to introspect machine type specific logic, since we > >> > run all probing with "-machine none", and QEMU can't report anything about > >> > machines without instantiating them. > >> > >> Fair point. A practical way for management applications to decide which > >> of the two interfaces they can use with which machine type may be > >> required for deprecating one of the interfaces with new machine types. > > > > We currently have "qom-list-properties" which can report on the > > existance of properties registered against object types. What it > > can't do though is report on the default values of these properties. > > Yes. > > > What's interesting though is that qmp_qom_list_properties will actually > > instantiate objects in order to query properties, if the type isn't an > > abstract type. > > If it's an abstract type, qom-list-properties returns the properties > created with object_class_property_add() & friends, typically by the > class_init method. This is possible without instantiating the type. > > If it's a concrete type, qom-list-properties additionally returns the > properties created with object_property_add(), typically by the > instance_init() method. This requires instantiating the type. > > Both kinds of properties can be added or deleted at any time. For > instance, setting a property value with object_property_set() or similar > could create additional properties. > > For historical reasons, we use often use object_property_add() where > object_class_property_add() would do. Sad. > > > IOW, even if you are running "$QEMU -machine none", then if at the qmp-shell > > you do > > > > (QEMU) qom-list-properties typename=pc-q35-2.6-machine > > > > it will have actually instantiate the pc-q35-2.6-machine machine type. > > Since it has instantiated the machine, the object initializer function > > will have run and initialized the default values for various properties. > > > > IOW, it is possible for qom-list-properties to report on default values > > for non-abstract types. > > instance_init() also initializes the properties' values. > qom-list-properties could show these initial values (I hesitate calling > them default values). > > Setting a property's value can change other properties' values by side > effect. > > My point is: the properties qom-list-properties shows and the initial > values it could show are not necessarily final. QOM is designed to be > maximally flexible, and flexibility brings along its bosom-buddy > complexity. > > If you keep that in mind, qom-list-properties can be put to good use all > the same. > > A way to report "default values" (really: whatever the values are after > object_new()) feels like a fair feature request to me, if backed by an > actual use case. Looks like trying to migrate from 'mem' to 'memdev', just creates another train-wreck (where libvirt would have to hunt for 'right' backend configuration to make migration work and even that would be best effort attempt). If that would work reliably, I'd go for it since it allows to drop 'mem' codepath altogether but it doesn't look possible. So I'll look into adding machine level introspection and deprecating 'mem' option for new machine types. > [...] >
diff --git a/numa.c b/numa.c index 3875e1e..2205773 100644 --- a/numa.c +++ b/numa.c @@ -121,6 +121,8 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node, if (node->has_mem) { numa_info[nodenr].node_mem = node->mem; + warn_report("Parameter -numa node,mem is deprecated," + " use -numa node,memdev instead"); } if (node->has_memdev) { Object *o; diff --git a/qemu-deprecated.texi b/qemu-deprecated.texi index 45c5795..73f99d4 100644 --- a/qemu-deprecated.texi +++ b/qemu-deprecated.texi @@ -60,6 +60,20 @@ Support for invalid topologies will be removed, the user must ensure topologies described with -smp include all possible cpus, i.e. @math{@var{sockets} * @var{cores} * @var{threads} = @var{maxcpus}}. +@subsection -numa node,mem=@var{size} (since 4.0) + +The parameter @option{mem} of @option{-numa node} is used to assign a part of +guest RAM to a NUMA node. But when using it, it's impossible to manage specified +size on the host side (like bind it to a host node, setting bind policy, ...), +so guest end-ups with the fake NUMA configuration with suboptiomal performance. +However since 2014 there is an alternative way to assign RAM to a NUMA node +using parameter @option{memdev}, which does the same as @option{mem} and has +an ability to actualy manage node RAM on the host side. Use parameter +@option{memdev} with @var{memory-backend-ram} backend as an replacement for +parameter @option{mem} to achieve the same fake NUMA effect or a properly +configured @var{memory-backend-file} backend to actually benefit from NUMA +configuration. + @section QEMU Machine Protocol (QMP) commands @subsection block-dirty-bitmap-add "autoload" parameter (since 2.12.0)
The parameter allows to configure fake NUMA topology where guest VM simulates NUMA topology but not actually getting a performance benefits from it. The same or better results could be achieved using 'memdev' parameter. In light of that any VM that uses NUMA to get its benefits should use 'memdev' and to allow transition initial RAM to device based model, deprecate 'mem' parameter as its ad-hoc partitioning of initial RAM MemoryRegion can't be translated to memdev based backend transparently to users and in compatible manner (migration wise). That will also allow to clean up a bit our numa code, leaving only 'memdev' impl. in place and several boards that use node_mem to generate FDT/ACPI description from it. Signed-off-by: Igor Mammedov <imammedo@redhat.com> --- numa.c | 2 ++ qemu-deprecated.texi | 14 ++++++++++++++ 2 files changed, 16 insertions(+)