diff mbox series

[1/2] numa: deprecate 'mem' parameter of '-numa node' option

Message ID 1551454936-205218-2-git-send-email-imammedo@redhat.com
State New
Headers show
Series numa: deprecate -numa node, mem and default memory distribution | expand

Commit Message

Igor Mammedov March 1, 2019, 3:42 p.m. UTC
The parameter allows configuring a fake NUMA topology, where the guest
VM simulates a NUMA topology without actually getting any performance
benefit from it. The same or better results can be achieved using the
'memdev' parameter. In light of that, any VM that uses NUMA for its
benefits should use 'memdev'; and, to allow transitioning initial RAM
to the device-based model, deprecate the 'mem' parameter, as its
ad-hoc partitioning of the initial RAM MemoryRegion can't be
translated to a memdev-based backend transparently to users and in a
compatible manner (migration-wise).

That will also let us clean up our NUMA code a bit, leaving only the
'memdev' implementation in place, plus the several boards that use
node_mem to generate FDT/ACPI descriptions from it.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 numa.c               |  2 ++
 qemu-deprecated.texi | 14 ++++++++++++++
 2 files changed, 16 insertions(+)
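For reference, the two spellings under discussion map onto each other roughly as follows; the machine, node count, ids and sizes below are illustrative, not taken from the patch:

```
# Deprecated: ad-hoc split of initial RAM across nodes
qemu-system-x86_64 -m 4G \
    -numa node,nodeid=0,mem=2G \
    -numa node,nodeid=1,mem=2G

# Preferred: one explicit, host-manageable backend per node
qemu-system-x86_64 -m 4G \
    -object memory-backend-ram,id=ram0,size=2G \
    -object memory-backend-ram,id=ram1,size=2G \
    -numa node,nodeid=0,memdev=ram0 \
    -numa node,nodeid=1,memdev=ram1
```

With memdev, each backend can additionally be bound to host NUMA nodes (for example via memory-backend-file, or the host-nodes/policy backend properties), which is exactly what 'mem' cannot express.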

Comments

Daniel P. Berrangé March 1, 2019, 3:49 p.m. UTC | #1
On Fri, Mar 01, 2019 at 04:42:15PM +0100, Igor Mammedov wrote:
> [...]
> 
> That will also allow to clean up a bit our numa code, leaving only
> 'memdev' impl. in place and several boards that use node_mem
> to generate FDT/ACPI description from it.

Can you confirm that the 'mem' and 'memdev' parameters to -numa
are 100% live-migration compatible in both directions? Libvirt
would need this to be the case in order to use the 'memdev' syntax
instead.

> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
>  numa.c               |  2 ++
>  qemu-deprecated.texi | 14 ++++++++++++++
>  2 files changed, 16 insertions(+)
> 
> diff --git a/numa.c b/numa.c
> index 3875e1e..2205773 100644
> --- a/numa.c
> +++ b/numa.c
> @@ -121,6 +121,8 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
>  
>      if (node->has_mem) {
>          numa_info[nodenr].node_mem = node->mem;
> +        warn_report("Parameter -numa node,mem is deprecated,"
> +                    " use -numa node,memdev instead");
>      }
>      if (node->has_memdev) {
>          Object *o;
> diff --git a/qemu-deprecated.texi b/qemu-deprecated.texi
> index 45c5795..73f99d4 100644
> --- a/qemu-deprecated.texi
> +++ b/qemu-deprecated.texi
> @@ -60,6 +60,20 @@ Support for invalid topologies will be removed, the user must ensure
>  topologies described with -smp include all possible cpus, i.e.
>    @math{@var{sockets} * @var{cores} * @var{threads} = @var{maxcpus}}.
>  
> +@subsection -numa node,mem=@var{size} (since 4.0)
> +
> +The parameter @option{mem} of @option{-numa node} is used to assign a part of
> +guest RAM to a NUMA node. But when using it, it's impossible to manage the
> +specified size on the host side (e.g. bind it to a host node or set a bind
> +policy), so the guest ends up with a fake NUMA configuration with suboptimal
> +performance. However, since 2014 there has been an alternative way to assign
> +RAM to a NUMA node using the @option{memdev} parameter, which does the same as
> +@option{mem} and also allows actually managing node RAM on the host side. Use
> +@option{memdev} with the @var{memory-backend-ram} backend as a replacement for
> +@option{mem} to achieve the same fake NUMA effect, or with a properly
> +configured @var{memory-backend-file} backend to actually benefit from the NUMA
> +configuration.
> +
>  @section QEMU Machine Protocol (QMP) commands
>  
>  @subsection block-dirty-bitmap-add "autoload" parameter (since 2.12.0)
> -- 
> 2.7.4
> 

Regards,
Daniel
Igor Mammedov March 1, 2019, 5:33 p.m. UTC | #2
On Fri, 1 Mar 2019 15:49:47 +0000
Daniel P. Berrangé <berrange@redhat.com> wrote:

> On Fri, Mar 01, 2019 at 04:42:15PM +0100, Igor Mammedov wrote:
> > [...]
> 
> Can you confirm that the  'mem' and 'memdev' parameters to -numa
> are 100% live migration compatible in both directions ?  Libvirt
> would need this to be the case in order to use the 'memdev' syntax
> instead.
Unfortunately they are not migration compatible in either direction;
if it were possible to translate one into the other, I'd have aliased
'mem' to 'memdev' without deprecation. The former sends over only one
MemoryRegion to the target, while the latter sends over several (one
per memdev).

The mixed-memory issue[1] first came up on the libvirt side (RHBZ1624223);
back then it was resolved on the libvirt side in favor of migration
compatibility over correctness (i.e. the bind policy doesn't work as
expected). What's worse is that it was made the default and affects all
new machines, as I understand it.

In the case of -mem-path + -mem-prealloc (with one NUMA node, or no NUMA
at all), it's possible on the QEMU side to do the conversion to memdev in
a migration-compatible way (that's what stopped Michal from taking the
memdev approach). But it's hard to do so in the multi-node case, as the
number of MemoryRegions differs.

The point is to treat 'mem' as a mis-configuration error, as the user is
running a broken NUMA configuration in the first place (i.e. a fake NUMA
configuration doesn't actually improve performance).

CCed David; maybe he can offer a way to do a 1:n migration and the other
way around.


Daniel P. Berrangé March 1, 2019, 5:48 p.m. UTC | #3
On Fri, Mar 01, 2019 at 06:33:28PM +0100, Igor Mammedov wrote:
> On Fri, 1 Mar 2019 15:49:47 +0000
> Daniel P. Berrangé <berrange@redhat.com> wrote:
> 
> > On Fri, Mar 01, 2019 at 04:42:15PM +0100, Igor Mammedov wrote:
> > > [...]
> > 
> > Can you confirm that the  'mem' and 'memdev' parameters to -numa
> > are 100% live migration compatible in both directions ?  Libvirt
> > would need this to be the case in order to use the 'memdev' syntax
> > instead.
> Unfortunately they are not migration compatible in any direction,
> if it where possible to translate them to each other I'd alias 'mem'
> to 'memdev' without deprecation. The former sends over only one
> MemoryRegion to target, while the later sends over several (one per
> memdev).

If we can't migrate from one to the other, then we cannot deprecate
the existing 'mem' syntax. Even if libvirt were to provide a config
option to let apps opt in to the new syntax, we need to be able to
support live migration of existing running VMs indefinitely. Effectively
this means we need to keep 'mem' support forever, or at least for such
a long time that it effectively means forever.

So I think this patch has to be dropped and replaced with one that
simply documents that the memdev syntax is preferred.

Regards,
Daniel
Dr. David Alan Gilbert March 1, 2019, 6:01 p.m. UTC | #4
* Igor Mammedov (imammedo@redhat.com) wrote:
> On Fri, 1 Mar 2019 15:49:47 +0000
> Daniel P. Berrangé <berrange@redhat.com> wrote:
> 
> > On Fri, Mar 01, 2019 at 04:42:15PM +0100, Igor Mammedov wrote:
> > > [...]
> > 
> > Can you confirm that the  'mem' and 'memdev' parameters to -numa
> > are 100% live migration compatible in both directions ?  Libvirt
> > would need this to be the case in order to use the 'memdev' syntax
> > instead.
> Unfortunately they are not migration compatible in any direction,
> if it where possible to translate them to each other I'd alias 'mem'
> to 'memdev' without deprecation. The former sends over only one
> MemoryRegion to target, while the later sends over several (one per
> memdev).
> 
> Mixed memory issue[1] first came from libvirt side RHBZ1624223,
> back then it was resolved on libvirt side in favor of migration
> compatibility vs correctness (i.e. bind policy doesn't work as expected).
> What worse that it was made default and affects all new machines,
> as I understood it.
> 
> In case of -mem-path + -mem-prealloc (with 1 numa node or numa less)
> it's possible on QEMU side to make conversion to memdev in migration
> compatible way (that's what stopped Michal from memdev approach).
> But it's hard to do so in multi-nodes case as amount of MemoryRegions
> is different.
> 
> Point is to consider 'mem' as mis-configuration error, as the user
> in the first place using broken numa configuration
> (i.e. fake numa configuration doesn't actually improve performance).
> 
> CCed David, maybe he could offer a way to do 1:n migration and other
> way around.

I can't see a trivial way.
About the easiest thing I can think of is a way to create a memdev
that is an alias into pc.ram (at a particular size and offset).

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Markus Armbruster March 4, 2019, 7:13 a.m. UTC | #5
Daniel P. Berrangé <berrange@redhat.com> writes:

> On Fri, Mar 01, 2019 at 06:33:28PM +0100, Igor Mammedov wrote:
>> On Fri, 1 Mar 2019 15:49:47 +0000
>> Daniel P. Berrangé <berrange@redhat.com> wrote:
>> 
>> > On Fri, Mar 01, 2019 at 04:42:15PM +0100, Igor Mammedov wrote:
>> > > [...]
>> > 
>> > Can you confirm that the  'mem' and 'memdev' parameters to -numa
>> > are 100% live migration compatible in both directions ?  Libvirt
>> > would need this to be the case in order to use the 'memdev' syntax
>> > instead.
>> Unfortunately they are not migration compatible in any direction,
>> if it where possible to translate them to each other I'd alias 'mem'
>> to 'memdev' without deprecation. The former sends over only one
>> MemoryRegion to target, while the later sends over several (one per
>> memdev).
>
> If we can't migration from one to the other, then we can not deprecate
> the existing 'mem' syntax. Even if libvirt were to provide a config
> option to let apps opt-in to the new syntax, we need to be able to
> support live migration of existing running VMs indefinitely. Effectively
> this means we need the to keep 'mem' support forever, or at least such
> a long time that it effectively means forever.
>
> So I think this patch has to be dropped & replaced with one that
> simply documents that memdev syntax is preferred.

We have this habit of postulating absolutes like "can not deprecate"
instead of engaging with the tradeoffs.  We need to kick it.

So let's have an actual look at the tradeoffs.

We don't actually "support live migration of existing running VMs
indefinitely".

We support live migration to any newer version of QEMU that still
supports the machine type.

We support live migration to any older version of QEMU that already
supports the machine type and all the devices the machine uses.

Aside: "support" is really an honest best effort here.  If you rely on
it, use a downstream that puts in the (substantial!) QA work real
support takes.

Feature deprecation is not a contract to drop the feature after two
releases, or even five.  It's a formal notice that users of the feature
should transition to its replacement in an orderly manner.

If I understand Igor correctly, all users should transition away from
outdated NUMA configurations at least for new VMs in an orderly manner.

So, how could this formal notice be served constructively?

If we reject outdated NUMA configurations starting with machine type T,
we can remove the means to create those configurations along with
machine type T-1.  Won't happen anytime soon, will happen eventually,
because in the long run, all machine types are dead (apologies to
Keynes).

If we deprecate outdated NUMA configurations now, we can start rejecting
them with new machine types after a suitable grace period.
Thomas Huth March 4, 2019, 8:11 a.m. UTC | #6
On 01/03/2019 18.48, Daniel P. Berrangé wrote:
[...]
> So I think this patch has to be dropped & replaced with one that
> simply documents that memdev syntax is preferred.

That's definitely not enough. I've had a couple of cases already where
we documented that certain options should not be used anymore, and
people simply ignored it (aka "if it ain't broke, don't fix it").
Then they just started to complain when I really tried to remove the
option after the deprecation period.

So Igor, if you cannot officially deprecate these things here yet, you
should at least make sure that they cannot be used with new machine
types anymore. Then, after a couple of years, when we feel sure that
few or no people are left who still use it with the old machine
types, we can start discussing the deprecation process again, I
think.

 Thomas
Daniel P. Berrangé March 4, 2019, 10:19 a.m. UTC | #7
On Mon, Mar 04, 2019 at 08:13:53AM +0100, Markus Armbruster wrote:
> Daniel P. Berrangé <berrange@redhat.com> writes:
> 
> > On Fri, Mar 01, 2019 at 06:33:28PM +0100, Igor Mammedov wrote:
> >> On Fri, 1 Mar 2019 15:49:47 +0000
> >> Daniel P. Berrangé <berrange@redhat.com> wrote:
> >> 
> >> > On Fri, Mar 01, 2019 at 04:42:15PM +0100, Igor Mammedov wrote:
> >> > > [...]
> >> > 
> >> > Can you confirm that the  'mem' and 'memdev' parameters to -numa
> >> > are 100% live migration compatible in both directions ?  Libvirt
> >> > would need this to be the case in order to use the 'memdev' syntax
> >> > instead.
> >> Unfortunately they are not migration compatible in any direction,
> >> if it where possible to translate them to each other I'd alias 'mem'
> >> to 'memdev' without deprecation. The former sends over only one
> >> MemoryRegion to target, while the later sends over several (one per
> >> memdev).
> >
> > If we can't migration from one to the other, then we can not deprecate
> > the existing 'mem' syntax. Even if libvirt were to provide a config
> > option to let apps opt-in to the new syntax, we need to be able to
> > support live migration of existing running VMs indefinitely. Effectively
> > this means we need the to keep 'mem' support forever, or at least such
> > a long time that it effectively means forever.
> >
> > So I think this patch has to be dropped & replaced with one that
> > simply documents that memdev syntax is preferred.
> 
> We have this habit of postulating absolutes like "can not deprecate"
> instead of engaging with the tradeoffs.  We need to kick it.
> 
> So let's have an actual look at the tradeoffs.
> 
> We don't actually "support live migration of existing running VMs
> indefinitely".
>
> We support live migration to any newer version of QEMU that still
> supports the machine type.
> 
> We support live migration to any older version of QEMU that already
> supports the machine type and all the devices the machine uses.
> 
> Aside: "support" is really an honest best effort here.  If you rely on
> it, use a downstream that puts in the (substantial!) QA work real
> support takes.

If upstream deletes the feature, then that in turn breaks the downstream
unless the downstream reverts the upstream change. When there is large
overlap between the downstream and upstream maintainers, it is not
beneficial to delete the feature upstream, as any effort saved upstream
usually expands into a larger effort downstream.

> Feature deprecation is not a contract to drop the feature after two
> releases, or even five.  It's a formal notice that users of the feature
> should transition to its replacement in an orderly manner.
> 
> If I understand Igor correctly, all users should transition away from
> outdated NUMA configurations at least for new VMs in an orderly manner.
> 
> So, how could this formal notice be served constructively?
> 
> If we reject outdated NUMA configurations starting with machine type T,
> we can remove the means to create those configurations along with
> machine type T-1.  Won't happen anytime soon, will happen eventually,
> because in the long run, all machine types are dead (apologies to
> Keynes).
> 
> If we deprecate outdated NUMA configurations now, we can start rejecting
> them with new machine types after a suitable grace period.

How is libvirt going to know which machines it can use the feature with?
We don't have any way to introspect machine-type-specific logic, since we
run all probing with "-machine none", and QEMU can't report anything about
machines without instantiating them.

Regards,
Daniel
Markus Armbruster March 4, 2019, 11:45 a.m. UTC | #8
Daniel P. Berrangé <berrange@redhat.com> writes:

> On Mon, Mar 04, 2019 at 08:13:53AM +0100, Markus Armbruster wrote:
>> Daniel P. Berrangé <berrange@redhat.com> writes:
>> 
>> > On Fri, Mar 01, 2019 at 06:33:28PM +0100, Igor Mammedov wrote:
>> >> On Fri, 1 Mar 2019 15:49:47 +0000
>> >> Daniel P. Berrangé <berrange@redhat.com> wrote:
>> >> 
>> >> > On Fri, Mar 01, 2019 at 04:42:15PM +0100, Igor Mammedov wrote:
>> >> > > [...]
>> >> > 
>> >> > Can you confirm that the  'mem' and 'memdev' parameters to -numa
>> >> > are 100% live migration compatible in both directions ?  Libvirt
>> >> > would need this to be the case in order to use the 'memdev' syntax
>> >> > instead.
>> >> Unfortunately they are not migration compatible in any direction,
>> >> if it where possible to translate them to each other I'd alias 'mem'
>> >> to 'memdev' without deprecation. The former sends over only one
>> >> MemoryRegion to target, while the later sends over several (one per
>> >> memdev).
>> >
>> > If we can't migration from one to the other, then we can not deprecate
>> > the existing 'mem' syntax. Even if libvirt were to provide a config
>> > option to let apps opt-in to the new syntax, we need to be able to
>> > support live migration of existing running VMs indefinitely. Effectively
>> > this means we need the to keep 'mem' support forever, or at least such
>> > a long time that it effectively means forever.
>> >
>> > So I think this patch has to be dropped & replaced with one that
>> > simply documents that memdev syntax is preferred.
>> 
>> We have this habit of postulating absolutes like "can not deprecate"
>> instead of engaging with the tradeoffs.  We need to kick it.
>> 
>> So let's have an actual look at the tradeoffs.
>> 
>> We don't actually "support live migration of existing running VMs
>> indefinitely".
>>
>> We support live migration to any newer version of QEMU that still
>> supports the machine type.
>> 
>> We support live migration to any older version of QEMU that already
>> supports the machine type and all the devices the machine uses.
>> 
>> Aside: "support" is really an honest best effort here.  If you rely on
>> it, use a downstream that puts in the (substantial!) QA work real
>> support takes.
>
> If upstream deletes the feature, then that in turn breaks the downstream
> unless downstream reverts the upstream change. When we have large overlap
> between downstream & upstream maintainer, it is not beneficial to delete
> the feature upstream as any effort saved upstream usually expands into
> larger effort downstream.

It can't "break" existing downstreams, only future ones forked off after
the deletion.  Such a fork cares only if it has backward compatibility
requirements to satisfy that require the feature.  My point is: it's not
a simple absolute, it's a complex tradeoff.

>> Feature deprecation is not a contract to drop the feature after two
>> releases, or even five.  It's a formal notice that users of the feature
>> should transition to its replacement in an orderly manner.
>> 
>> If I understand Igor correctly, all users should transition away from
>> outdated NUMA configurations at least for new VMs in an orderly manner.
>> 
>> So, how could this formal notice be served constructively?
>> 
>> If we reject outdated NUMA configurations starting with machine type T,
>> we can remove the means to create those configurations along with
>> machine type T-1.  Won't happen anytime soon, will happen eventually,
>> because in the long run, all machine types are dead (apologies to
>> Keynes).
>> 
>> If we deprecate outdated NUMA configurations now, we can start rejecting
>> them with new machine types after a suitable grace period.
>
> How is libvirt going to know what machines it can use with the feature ?
> We don't have any way to introspect machine type specific logic, since we
> run all probing with "-machine none", and QEMU can't report anything about
> machines without instantiating them.

Fair point.  A practical way for management applications to decide which
of the two interfaces they can use with which machine type may be
required for deprecating one of the interfaces with new machine types.
Igor Mammedov March 4, 2019, 12:25 p.m. UTC | #9
On Mon, 04 Mar 2019 08:13:53 +0100
Markus Armbruster <armbru@redhat.com> wrote:

> Daniel P. Berrangé <berrange@redhat.com> writes:
> 
> > On Fri, Mar 01, 2019 at 06:33:28PM +0100, Igor Mammedov wrote:  
> >> On Fri, 1 Mar 2019 15:49:47 +0000
> >> Daniel P. Berrangé <berrange@redhat.com> wrote:
> >>   
> >> > On Fri, Mar 01, 2019 at 04:42:15PM +0100, Igor Mammedov wrote:  
> >> > > [...]
> >> > 
> >> > Can you confirm that the  'mem' and 'memdev' parameters to -numa
> >> > are 100% live migration compatible in both directions ?  Libvirt
> >> > would need this to be the case in order to use the 'memdev' syntax
> >> > instead.  
> >> Unfortunately they are not migration compatible in any direction,
> >> if it where possible to translate them to each other I'd alias 'mem'
> >> to 'memdev' without deprecation. The former sends over only one
> >> MemoryRegion to target, while the later sends over several (one per
> >> memdev).  
> >
> > If we can't migration from one to the other, then we can not deprecate
> > the existing 'mem' syntax. Even if libvirt were to provide a config
> > option to let apps opt-in to the new syntax, we need to be able to
> > support live migration of existing running VMs indefinitely. Effectively
> > this means we need the to keep 'mem' support forever, or at least such
> > a long time that it effectively means forever.
> >
> > So I think this patch has to be dropped & replaced with one that
> > simply documents that memdev syntax is preferred.  
> 
> We have this habit of postulating absolutes like "can not deprecate"
> instead of engaging with the tradeoffs.  We need to kick it.
> 
> So let's have an actual look at the tradeoffs.
> 
> We don't actually "support live migration of existing running VMs
> indefinitely".
> 
> We support live migration to any newer version of QEMU that still
> supports the machine type.
> 
> We support live migration to any older version of QEMU that already
> supports the machine type and all the devices the machine uses.
> 
> Aside: "support" is really an honest best effort here.  If you rely on
> it, use a downstream that puts in the (substantial!) QA work real
> support takes.
> 
> Feature deprecation is not a contract to drop the feature after two
> releases, or even five.  It's a formal notice that users of the feature
> should transition to its replacement in an orderly manner.
> 
> If I understand Igor correctly, all users should transition away from
> outdated NUMA configurations at least for new VMs in an orderly manner.
Yes, we can postpone removing the options while there are still machine type
versions capable of using them (unfortunate, but probably unavoidable
unless there is a migration trick to make the transition transparent),
but that should not stop us from at least disabling broken options
on new machine types.

This series can serve as the formal notice, with a follow-up that disables
the deprecated options for new machine types. (As Thomas noted, warnings
alone do not work: users continue to use broken features regardless of
whether they are aware of the issues [*])

Hence the suggested deprecation approach, with legacy NUMA options
rejected outright on new machine types in 2 releases, so that users
eventually stop using them.

*) https://www.redhat.com/archives/libvir-list/2018-November/msg00159.html

> So, how could this formal notice be served constructively?
> 
> If we reject outdated NUMA configurations starting with machine type T,
> we can remove the means to create those configurations along with
> machine type T-1.  Won't happen anytime soon, will happen eventually,
> because in the long run, all machine types are dead (apologies to
> Keynes).
> 
> If we deprecate outdated NUMA configurations now, we can start rejecting
> them with new machine types after a suitable grace period.
>
Daniel P. Berrangé March 4, 2019, 12:39 p.m. UTC | #10
On Mon, Mar 04, 2019 at 01:25:07PM +0100, Igor Mammedov wrote:
> On Mon, 04 Mar 2019 08:13:53 +0100
> Markus Armbruster <armbru@redhat.com> wrote:
> 
> > Daniel P. Berrangé <berrange@redhat.com> writes:
> > 
> > > On Fri, Mar 01, 2019 at 06:33:28PM +0100, Igor Mammedov wrote:  
> > >> On Fri, 1 Mar 2019 15:49:47 +0000
> > >> Daniel P. Berrangé <berrange@redhat.com> wrote:
> > >>   
> > >> > On Fri, Mar 01, 2019 at 04:42:15PM +0100, Igor Mammedov wrote:  
> > >> > > The parameter allows to configure fake NUMA topology where guest
> > >> > > VM simulates NUMA topology but not actually getting a performance
> > >> > > benefits from it. The same or better results could be achieved
> > >> > > using 'memdev' parameter. In light of that any VM that uses NUMA
> > >> > > to get its benefits should use 'memdev' and to allow transition
> > >> > > initial RAM to device based model, deprecate 'mem' parameter as
> > >> > > its ad-hoc partitioning of initial RAM MemoryRegion can't be
> > >> > > translated to memdev based backend transparently to users and in
> > >> > > compatible manner (migration wise).
> > >> > > 
> > >> > > That will also allow to clean up a bit our numa code, leaving only
> > >> > > 'memdev' impl. in place and several boards that use node_mem
> > >> > > to generate FDT/ACPI description from it.    
> > >> > 
> > >> > Can you confirm that the  'mem' and 'memdev' parameters to -numa
> > >> > are 100% live migration compatible in both directions ?  Libvirt
> > >> > would need this to be the case in order to use the 'memdev' syntax
> > >> > instead.  
> > >> Unfortunately they are not migration compatible in any direction,
> > >> if it where possible to translate them to each other I'd alias 'mem'
> > >> to 'memdev' without deprecation. The former sends over only one
> > >> MemoryRegion to target, while the later sends over several (one per
> > >> memdev).  
> > >
> > > If we can't migration from one to the other, then we can not deprecate
> > > the existing 'mem' syntax. Even if libvirt were to provide a config
> > > option to let apps opt-in to the new syntax, we need to be able to
> > > support live migration of existing running VMs indefinitely. Effectively
> > > this means we need the to keep 'mem' support forever, or at least such
> > > a long time that it effectively means forever.
> > >
> > > So I think this patch has to be dropped & replaced with one that
> > > simply documents that memdev syntax is preferred.  
> > 
> > We have this habit of postulating absolutes like "can not deprecate"
> > instead of engaging with the tradeoffs.  We need to kick it.
> > 
> > So let's have an actual look at the tradeoffs.
> > 
> > We don't actually "support live migration of existing running VMs
> > indefinitely".
> > 
> > We support live migration to any newer version of QEMU that still
> > supports the machine type.
> > 
> > We support live migration to any older version of QEMU that already
> > supports the machine type and all the devices the machine uses.
> > 
> > Aside: "support" is really an honest best effort here.  If you rely on
> > it, use a downstream that puts in the (substantial!) QA work real
> > support takes.
> > 
> > Feature deprecation is not a contract to drop the feature after two
> > releases, or even five.  It's a formal notice that users of the feature
> > should transition to its replacement in an orderly manner.
> > 
> > If I understand Igor correctly, all users should transition away from
> > outdated NUMA configurations at least for new VMs in an orderly manner.
> Yes, we can postpone removing options until there are machines type
> versions that were capable to use it (unfortunate but probably 
> unavoidable unless there is a migration trick to make transition
> transparent) but that should not stop us from disabling broken
> options on new machine types at least.
> 
> This series can serve as formal notice with follow up disabling of
> deprecated options for new machine types. (As Thomas noted, just warnings
> do not work and users continue to use broken features regardless whether
> they are don't know about issues or aware of it [*])
> 
> Hence suggested deprecation approach and enforced rejection of legacy
> numa options for new machine types in 2 releases so users would stop
> using them eventually.

When we deprecate something, we need to have a way for apps to use the
new alternative approach *at the same time*.  So even if we only want to
deprecate for new machine types, we still have to first solve the problem
of how mgmt apps will introspect QEMU to learn which machine types expect
the new options.

Regards,
Daniel
Igor Mammedov March 4, 2019, 1:52 p.m. UTC | #11
On Fri, 1 Mar 2019 18:01:52 +0000
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:

> * Igor Mammedov (imammedo@redhat.com) wrote:
> > On Fri, 1 Mar 2019 15:49:47 +0000
> > Daniel P. Berrangé <berrange@redhat.com> wrote:
> >   
> > > On Fri, Mar 01, 2019 at 04:42:15PM +0100, Igor Mammedov wrote:  
> > > > The parameter allows to configure fake NUMA topology where guest
> > > > VM simulates NUMA topology but not actually getting a performance
> > > > benefits from it. The same or better results could be achieved
> > > > using 'memdev' parameter. In light of that any VM that uses NUMA
> > > > to get its benefits should use 'memdev' and to allow transition
> > > > initial RAM to device based model, deprecate 'mem' parameter as
> > > > its ad-hoc partitioning of initial RAM MemoryRegion can't be
> > > > translated to memdev based backend transparently to users and in
> > > > compatible manner (migration wise).
> > > > 
> > > > That will also allow to clean up a bit our numa code, leaving only
> > > > 'memdev' impl. in place and several boards that use node_mem
> > > > to generate FDT/ACPI description from it.    
> > > 
> > > Can you confirm that the  'mem' and 'memdev' parameters to -numa
> > > are 100% live migration compatible in both directions ?  Libvirt
> > > would need this to be the case in order to use the 'memdev' syntax
> > > instead.  
> > Unfortunately they are not migration compatible in any direction,
> > if it where possible to translate them to each other I'd alias 'mem'
> > to 'memdev' without deprecation. The former sends over only one
> > MemoryRegion to target, while the later sends over several (one per
> > memdev).
> > 
> > Mixed memory issue[1] first came from libvirt side RHBZ1624223,
> > back then it was resolved on libvirt side in favor of migration
> > compatibility vs correctness (i.e. bind policy doesn't work as expected).
> > What worse that it was made default and affects all new machines,
> > as I understood it.
> > 
> > In case of -mem-path + -mem-prealloc (with 1 numa node or numa less)
> > it's possible on QEMU side to make conversion to memdev in migration
> > compatible way (that's what stopped Michal from memdev approach).
> > But it's hard to do so in multi-nodes case as amount of MemoryRegions
> > is different.
> > 
> > Point is to consider 'mem' as mis-configuration error, as the user
> > in the first place using broken numa configuration
> > (i.e. fake numa configuration doesn't actually improve performance).
> > 
> > CCed David, maybe he could offer a way to do 1:n migration and other
> > way around.  
> 
> I can't see a trivial way.
> About the easiest I can think of is if you had a way to create a memdev
> that was an alias to pc.ram (of a particular size and offset).
If I get you right, that's what I was planning to do for NUMA-less machines
that use the -mem-path/-mem-prealloc options, where it's possible to replace
the initial RAM MemoryRegion with a correspondingly named memdev and its
backing MemoryRegion.

But I don't see how that could work in the case of the legacy NUMA 'mem'
option, where initial RAM is a single MemoryRegion (it's fake NUMA after
all), and how to translate that into several MemoryRegions (one per node/memdev).

> Dave
> 
> >   
> > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > > > ---
> > > >  numa.c               |  2 ++
> > > >  qemu-deprecated.texi | 14 ++++++++++++++
> > > >  2 files changed, 16 insertions(+)
> > > > 
> > > > diff --git a/numa.c b/numa.c
> > > > index 3875e1e..2205773 100644
> > > > --- a/numa.c
> > > > +++ b/numa.c
> > > > @@ -121,6 +121,8 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
> > > >  
> > > >      if (node->has_mem) {
> > > >          numa_info[nodenr].node_mem = node->mem;
> > > > +        warn_report("Parameter -numa node,mem is deprecated,"
> > > > +                    " use -numa node,memdev instead");
> > > >      }
> > > >      if (node->has_memdev) {
> > > >          Object *o;
> > > > diff --git a/qemu-deprecated.texi b/qemu-deprecated.texi
> > > > index 45c5795..73f99d4 100644
> > > > --- a/qemu-deprecated.texi
> > > > +++ b/qemu-deprecated.texi
> > > > @@ -60,6 +60,20 @@ Support for invalid topologies will be removed, the user must ensure
> > > >  topologies described with -smp include all possible cpus, i.e.
> > > >    @math{@var{sockets} * @var{cores} * @var{threads} = @var{maxcpus}}.
> > > >  
> > > > +@subsection -numa node,mem=@var{size} (since 4.0)
> > > > +
> > > > +The parameter @option{mem} of @option{-numa node} is used to assign a part of
> > > > +guest RAM to a NUMA node. But when using it, it's impossible to manage the
> > > > +assigned RAM on the host side (e.g. bind it to a host node, set a bind policy, ...),
> > > > +so the guest ends up with a fake NUMA configuration with suboptimal performance.
> > > > +However, since 2014 there has been an alternative way to assign RAM to a NUMA
> > > > +node using the parameter @option{memdev}, which does the same as @option{mem}
> > > > +and adds the ability to actually manage node RAM on the host side. Use the
> > > > +parameter @option{memdev} with a @var{memory-backend-ram} backend as a
> > > > +replacement for the parameter @option{mem} to achieve the same fake NUMA
> > > > +effect, or with a properly configured @var{memory-backend-file} backend to
> > > > +actually benefit from a NUMA configuration.
> > > > +
> > > >  @section QEMU Machine Protocol (QMP) commands
> > > >  
> > > >  @subsection block-dirty-bitmap-add "autoload" parameter (since 2.12.0)
> > > > -- 
> > > > 2.7.4
> > > > 
> > > > --
> > > > libvir-list mailing list
> > > > libvir-list@redhat.com
> > > > https://www.redhat.com/mailman/listinfo/libvir-list    
> > > 
> > > Regards,
> > > Daniel  
> >   
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Igor Mammedov March 4, 2019, 1:55 p.m. UTC | #12
On Mon, 4 Mar 2019 09:11:19 +0100
Thomas Huth <thuth@redhat.com> wrote:

> On 01/03/2019 18.48, Daniel P. Berrangé wrote:
> [...]
> > So I think this patch has to be dropped & replaced with one that
> > simply documents that memdev syntax is preferred.  
> 
> That's definitely not enough. I've had a couple of cases already where
> we documented that certain options should not be used anymore, and
> people simply ignored it (aka. if it ain't broken, don't do any change).
> Then they just started to complain when I really tried to remove the
> option after the deprecation period.

> So Igor, if you can not officially deprecate these things here yet, you
> should at least make sure that they can not be used with new machine
> types anymore. Then, after a couple of years, when we feel sure that
> there are only some few or no people left who still use it with the old
> machine types, we can start to discuss the deprecation process again, I
> think.
Is it acceptable to silently disable CLI options (even if they are broken,
as in this case) for new machine types?
I was under the impression that it should go through deprecation first.

> 
>  Thomas
>
Daniel P. Berrangé March 4, 2019, 1:59 p.m. UTC | #13
On Mon, Mar 04, 2019 at 02:55:10PM +0100, Igor Mammedov wrote:
> On Mon, 4 Mar 2019 09:11:19 +0100
> Thomas Huth <thuth@redhat.com> wrote:
> 
> > On 01/03/2019 18.48, Daniel P. Berrangé wrote:
> > [...]
> > > So I think this patch has to be dropped & replaced with one that
> > > simply documents that memdev syntax is preferred.  
> > 
> > That's definitely not enough. I've had a couple of cases already where
> > we documented that certain options should not be used anymore, and
> > people simply ignored it (aka. if it ain't broken, don't do any change).
> > Then they just started to complain when I really tried to remove the
> > option after the deprecation period.
> 
> > So Igor, if you can not officially deprecate these things here yet, you
> > should at least make sure that they can not be used with new machine
> > types anymore. Then, after a couple of years, when we feel sure that
> > there are only some few or no people left who still use it with the old
> > machine types, we can start to discuss the deprecation process again, I
> > think.
> Is it acceptable to silently disable CLI options (even if they are broken
> like in this case) for new machine types?
> I was under impression that it should go through deprecation first.

Yes, it must go through deprecation. I was saying we can't disable
the CLI options at all, until there is a way for libvirt to correctly
use the new options.

Regards,
Daniel
Igor Mammedov March 4, 2019, 2:16 p.m. UTC | #14
On Mon, 4 Mar 2019 12:39:08 +0000
Daniel P. Berrangé <berrange@redhat.com> wrote:

> On Mon, Mar 04, 2019 at 01:25:07PM +0100, Igor Mammedov wrote:
> > On Mon, 04 Mar 2019 08:13:53 +0100
> > Markus Armbruster <armbru@redhat.com> wrote:
> >   
> > > Daniel P. Berrangé <berrange@redhat.com> writes:
> > >   
> > > > On Fri, Mar 01, 2019 at 06:33:28PM +0100, Igor Mammedov wrote:    
> > > >> On Fri, 1 Mar 2019 15:49:47 +0000
> > > >> Daniel P. Berrangé <berrange@redhat.com> wrote:
> > > >>     
> > > >> > On Fri, Mar 01, 2019 at 04:42:15PM +0100, Igor Mammedov wrote:    
> > > >> > > The parameter allows to configure fake NUMA topology where guest
> > > >> > > VM simulates NUMA topology but not actually getting a performance
> > > >> > > benefits from it. The same or better results could be achieved
> > > >> > > using 'memdev' parameter. In light of that any VM that uses NUMA
> > > >> > > to get its benefits should use 'memdev' and to allow transition
> > > >> > > initial RAM to device based model, deprecate 'mem' parameter as
> > > >> > > its ad-hoc partitioning of initial RAM MemoryRegion can't be
> > > >> > > translated to memdev based backend transparently to users and in
> > > >> > > compatible manner (migration wise).
> > > >> > > 
> > > >> > > That will also allow to clean up a bit our numa code, leaving only
> > > >> > > 'memdev' impl. in place and several boards that use node_mem
> > > >> > > to generate FDT/ACPI description from it.      
> > > >> > 
> > > >> > Can you confirm that the  'mem' and 'memdev' parameters to -numa
> > > >> > are 100% live migration compatible in both directions ?  Libvirt
> > > >> > would need this to be the case in order to use the 'memdev' syntax
> > > >> > instead.    
> > > >> Unfortunately they are not migration compatible in any direction,
> > > >> if it where possible to translate them to each other I'd alias 'mem'
> > > >> to 'memdev' without deprecation. The former sends over only one
> > > >> MemoryRegion to target, while the later sends over several (one per
> > > >> memdev).    
> > > >
> > > > If we can't migration from one to the other, then we can not deprecate
> > > > the existing 'mem' syntax. Even if libvirt were to provide a config
> > > > option to let apps opt-in to the new syntax, we need to be able to
> > > > support live migration of existing running VMs indefinitely. Effectively
> > > > this means we need the to keep 'mem' support forever, or at least such
> > > > a long time that it effectively means forever.
> > > >
> > > > So I think this patch has to be dropped & replaced with one that
> > > > simply documents that memdev syntax is preferred.    
> > > 
> > > We have this habit of postulating absolutes like "can not deprecate"
> > > instead of engaging with the tradeoffs.  We need to kick it.
> > > 
> > > So let's have an actual look at the tradeoffs.
> > > 
> > > We don't actually "support live migration of existing running VMs
> > > indefinitely".
> > > 
> > > We support live migration to any newer version of QEMU that still
> > > supports the machine type.
> > > 
> > > We support live migration to any older version of QEMU that already
> > > supports the machine type and all the devices the machine uses.
> > > 
> > > Aside: "support" is really an honest best effort here.  If you rely on
> > > it, use a downstream that puts in the (substantial!) QA work real
> > > support takes.
> > > 
> > > Feature deprecation is not a contract to drop the feature after two
> > > releases, or even five.  It's a formal notice that users of the feature
> > > should transition to its replacement in an orderly manner.
> > > 
> > > If I understand Igor correctly, all users should transition away from
> > > outdated NUMA configurations at least for new VMs in an orderly manner.  
> > Yes, we can postpone removing options until there are machines type
> > versions that were capable to use it (unfortunate but probably 
> > unavoidable unless there is a migration trick to make transition
> > transparent) but that should not stop us from disabling broken
> > options on new machine types at least.
> > 
> > This series can serve as formal notice with follow up disabling of
> > deprecated options for new machine types. (As Thomas noted, just warnings
> > do not work and users continue to use broken features regardless whether
> > they are don't know about issues or aware of it [*])
> > 
> > Hence suggested deprecation approach and enforced rejection of legacy
> > numa options for new machine types in 2 releases so users would stop
> > using them eventually.  
> 
> When we deprecate something, we need to have a way for apps to use the
> new alternative approach *at the same time*.  So even if we only want to
> deprecate for new machine types, we still have to first solve the problem
> of how mgmt apps will introspect QEMU to learn which machine types expect
> the new options.
I'm not aware of any mechanism to introspect machine type options (existing
or under development). Are/were there any ideas about it discussed
in the past?

Aside from developing a new mechanism, what are the alternative approaches?
I mean, when we delete a deprecated CLI option, how is that currently
handled on the libvirt side?

For example, I don't see anything introspection-related in the deprecated
options we have removed recently.

A more exact question, specific to this series' use case:
how does libvirt currently decide whether to use -numa node,memdev?
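For concreteness, the two command-line forms under discussion look roughly like this (sizes, ids, and CPU assignments are made-up examples; the memdev form needs one backend object per node):

```shell
# Legacy form being deprecated: QEMU partitions a single initial-RAM region.
qemu-system-x86_64 -m 2048 \
    -numa node,nodeid=0,cpus=0-1,mem=1024 \
    -numa node,nodeid=1,cpus=2-3,mem=1024

# Replacement form: one explicit, host-manageable memory backend per node.
qemu-system-x86_64 -m 2048 \
    -object memory-backend-ram,id=mem0,size=1024M \
    -object memory-backend-ram,id=mem1,size=1024M \
    -numa node,nodeid=0,cpus=0-1,memdev=mem0 \
    -numa node,nodeid=1,cpus=2-3,memdev=mem1
```

Only the second form lets host-side policy (binding, prealloc, hugepages via memory-backend-file) be applied per node, which is the thread's argument for preferring it.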


> 
> Regards,
> Daniel
Michal Prívozník March 4, 2019, 2:24 p.m. UTC | #15
[Thanks Igor for bringing this onto my radar. I don't follow qemu-devel
that closely.]

On 3/4/19 11:19 AM, Daniel P. Berrangé wrote:
> On Mon, Mar 04, 2019 at 08:13:53AM +0100, Markus Armbruster wrote:
>> Daniel P. Berrangé <berrange@redhat.com> writes:
>>
>>> On Fri, Mar 01, 2019 at 06:33:28PM +0100, Igor Mammedov wrote:
>>>> On Fri, 1 Mar 2019 15:49:47 +0000
>>>> Daniel P. Berrangé <berrange@redhat.com> wrote:
>>>>
>>>>> On Fri, Mar 01, 2019 at 04:42:15PM +0100, Igor Mammedov wrote:
>>>>>> The parameter allows to configure fake NUMA topology where guest
>>>>>> VM simulates NUMA topology but not actually getting a performance
>>>>>> benefits from it. The same or better results could be achieved
>>>>>> using 'memdev' parameter. In light of that any VM that uses NUMA
>>>>>> to get its benefits should use 'memdev' and to allow transition
>>>>>> initial RAM to device based model, deprecate 'mem' parameter as
>>>>>> its ad-hoc partitioning of initial RAM MemoryRegion can't be
>>>>>> translated to memdev based backend transparently to users and in
>>>>>> compatible manner (migration wise).
>>>>>>
>>>>>> That will also allow to clean up a bit our numa code, leaving only
>>>>>> 'memdev' impl. in place and several boards that use node_mem
>>>>>> to generate FDT/ACPI description from it.
>>>>>
>>>>> Can you confirm that the  'mem' and 'memdev' parameters to -numa
>>>>> are 100% live migration compatible in both directions ?  Libvirt
>>>>> would need this to be the case in order to use the 'memdev' syntax
>>>>> instead.
>>>> Unfortunately they are not migration compatible in any direction,
>>>> if it where possible to translate them to each other I'd alias 'mem'
>>>> to 'memdev' without deprecation. The former sends over only one
>>>> MemoryRegion to target, while the later sends over several (one per
>>>> memdev).
>>>
>>> If we can't migration from one to the other, then we can not deprecate
>>> the existing 'mem' syntax. Even if libvirt were to provide a config
>>> option to let apps opt-in to the new syntax, we need to be able to
>>> support live migration of existing running VMs indefinitely. Effectively
>>> this means we need the to keep 'mem' support forever, or at least such
>>> a long time that it effectively means forever.

I'm with Daniel on this. The reason why libvirt still defaults to '-numa 
node,mem=' is exactly backward compatibility. Since a machine can't be
migrated from '-numa node,mem=' to '-numa node,memdev= + -object
memory-backend-*', libvirt has to play it safe and choose the combination
that is the most widely available.

If you remove this, how would you expect older machines to migrate to the
newer command line?

I'm all for deprecating old stuff. In fact, I've suggested that in
libvirt(!) here and there, but I'm afraid we can't just remove
functionality unless we give users a way to migrate to the one we
prefer now.

And if libvirt doesn't follow QEMU's warnings then it definitely should.
It's a libvirt bug if it doesn't follow the best practices (well, where it can).

Michal
Daniel P. Berrangé March 4, 2019, 2:24 p.m. UTC | #16
On Mon, Mar 04, 2019 at 03:16:41PM +0100, Igor Mammedov wrote:
> On Mon, 4 Mar 2019 12:39:08 +0000
> Daniel P. Berrangé <berrange@redhat.com> wrote:
> 
> > On Mon, Mar 04, 2019 at 01:25:07PM +0100, Igor Mammedov wrote:
> > > On Mon, 04 Mar 2019 08:13:53 +0100
> > > Markus Armbruster <armbru@redhat.com> wrote:
> > >   
> > > > Daniel P. Berrangé <berrange@redhat.com> writes:
> > > >   
> > > > > On Fri, Mar 01, 2019 at 06:33:28PM +0100, Igor Mammedov wrote:    
> > > > >> On Fri, 1 Mar 2019 15:49:47 +0000
> > > > >> Daniel P. Berrangé <berrange@redhat.com> wrote:
> > > > >>     
> > > > >> > On Fri, Mar 01, 2019 at 04:42:15PM +0100, Igor Mammedov wrote:    
> > > > >> > > The parameter allows to configure fake NUMA topology where guest
> > > > >> > > VM simulates NUMA topology but not actually getting a performance
> > > > >> > > benefits from it. The same or better results could be achieved
> > > > >> > > using 'memdev' parameter. In light of that any VM that uses NUMA
> > > > >> > > to get its benefits should use 'memdev' and to allow transition
> > > > >> > > initial RAM to device based model, deprecate 'mem' parameter as
> > > > >> > > its ad-hoc partitioning of initial RAM MemoryRegion can't be
> > > > >> > > translated to memdev based backend transparently to users and in
> > > > >> > > compatible manner (migration wise).
> > > > >> > > 
> > > > >> > > That will also allow to clean up a bit our numa code, leaving only
> > > > >> > > 'memdev' impl. in place and several boards that use node_mem
> > > > >> > > to generate FDT/ACPI description from it.      
> > > > >> > 
> > > > >> > Can you confirm that the  'mem' and 'memdev' parameters to -numa
> > > > >> > are 100% live migration compatible in both directions ?  Libvirt
> > > > >> > would need this to be the case in order to use the 'memdev' syntax
> > > > >> > instead.    
> > > > >> Unfortunately they are not migration compatible in any direction,
> > > > >> if it where possible to translate them to each other I'd alias 'mem'
> > > > >> to 'memdev' without deprecation. The former sends over only one
> > > > >> MemoryRegion to target, while the later sends over several (one per
> > > > >> memdev).    
> > > > >
> > > > > If we can't migration from one to the other, then we can not deprecate
> > > > > the existing 'mem' syntax. Even if libvirt were to provide a config
> > > > > option to let apps opt-in to the new syntax, we need to be able to
> > > > > support live migration of existing running VMs indefinitely. Effectively
> > > > > this means we need the to keep 'mem' support forever, or at least such
> > > > > a long time that it effectively means forever.
> > > > >
> > > > > So I think this patch has to be dropped & replaced with one that
> > > > > simply documents that memdev syntax is preferred.    
> > > > 
> > > > We have this habit of postulating absolutes like "can not deprecate"
> > > > instead of engaging with the tradeoffs.  We need to kick it.
> > > > 
> > > > So let's have an actual look at the tradeoffs.
> > > > 
> > > > We don't actually "support live migration of existing running VMs
> > > > indefinitely".
> > > > 
> > > > We support live migration to any newer version of QEMU that still
> > > > supports the machine type.
> > > > 
> > > > We support live migration to any older version of QEMU that already
> > > > supports the machine type and all the devices the machine uses.
> > > > 
> > > > Aside: "support" is really an honest best effort here.  If you rely on
> > > > it, use a downstream that puts in the (substantial!) QA work real
> > > > support takes.
> > > > 
> > > > Feature deprecation is not a contract to drop the feature after two
> > > > releases, or even five.  It's a formal notice that users of the feature
> > > > should transition to its replacement in an orderly manner.
> > > > 
> > > > If I understand Igor correctly, all users should transition away from
> > > > outdated NUMA configurations at least for new VMs in an orderly manner.  
> > > Yes, we can postpone removing options until there are machines type
> > > versions that were capable to use it (unfortunate but probably 
> > > unavoidable unless there is a migration trick to make transition
> > > transparent) but that should not stop us from disabling broken
> > > options on new machine types at least.
> > > 
> > > This series can serve as formal notice with follow up disabling of
> > > deprecated options for new machine types. (As Thomas noted, just warnings
> > > do not work and users continue to use broken features regardless whether
> > > they are don't know about issues or aware of it [*])
> > > 
> > > Hence suggested deprecation approach and enforced rejection of legacy
> > > numa options for new machine types in 2 releases so users would stop
> > > using them eventually.  
> > 
> > When we deprecate something, we need to have a way for apps to use the
> > new alternative approach *at the same time*.  So even if we only want to
> > deprecate for new machine types, we still have to first solve the problem
> > of how mgmt apps will introspect QEMU to learn which machine types expect
> > the new options.
> I'm not aware any mechanism to introspect machine type options (existing
> or something being developed). Are/were there any ideas about it that were
> discussed in the past?
> 
> Aside from developing a new mechanism what are alternative approaches?
> I mean when we delete deprecated CLI option, how it's solved on libvirt
> side currently?
> 
> For example I don't see anything introspection related when we have been
> removing deprecated options recently.

Right, with other stuff we deprecate we've had a simpler time, as it
either didn't affect migration at all, or the new replacement was
fully compatible with the migration data stream. IOW, libvirt
could unconditionally use the new feature as soon as it saw that it
existed in QEMU. We didn't have any machine type dependency to deal
with before now.

> More exact question specific to this series usecase,
> how libvirt decides when to use -numa node,memdev or not currently?

It is pretty hard to follow the code, but IIUC we only use memdev when
setting up NVDIMMs, or for guests configured to have the "shared" flag
on the memory region.

Regards,
Daniel
Michal Prívozník March 4, 2019, 2:34 p.m. UTC | #17
On 3/4/19 3:16 PM, Igor Mammedov wrote:
> On Mon, 4 Mar 2019 12:39:08 +0000
> Daniel P. Berrangé <berrange@redhat.com> wrote:
> 
>> On Mon, Mar 04, 2019 at 01:25:07PM +0100, Igor Mammedov wrote:
>>> On Mon, 04 Mar 2019 08:13:53 +0100
>>> Markus Armbruster <armbru@redhat.com> wrote:
>>>    
>>>> Daniel P. Berrangé <berrange@redhat.com> writes:
>>>>    
>>>>> On Fri, Mar 01, 2019 at 06:33:28PM +0100, Igor Mammedov wrote:
>>>>>> On Fri, 1 Mar 2019 15:49:47 +0000
>>>>>> Daniel P. Berrangé <berrange@redhat.com> wrote:
>>>>>>      
>>>>>>> On Fri, Mar 01, 2019 at 04:42:15PM +0100, Igor Mammedov wrote:
>>>>>>>> The parameter allows to configure fake NUMA topology where guest
>>>>>>>> VM simulates NUMA topology but not actually getting a performance
>>>>>>>> benefits from it. The same or better results could be achieved
>>>>>>>> using 'memdev' parameter. In light of that any VM that uses NUMA
>>>>>>>> to get its benefits should use 'memdev' and to allow transition
>>>>>>>> initial RAM to device based model, deprecate 'mem' parameter as
>>>>>>>> its ad-hoc partitioning of initial RAM MemoryRegion can't be
>>>>>>>> translated to memdev based backend transparently to users and in
>>>>>>>> compatible manner (migration wise).
>>>>>>>>
>>>>>>>> That will also allow to clean up a bit our numa code, leaving only
>>>>>>>> 'memdev' impl. in place and several boards that use node_mem
>>>>>>>> to generate FDT/ACPI description from it.
>>>>>>>
>>>>>>> Can you confirm that the  'mem' and 'memdev' parameters to -numa
>>>>>>> are 100% live migration compatible in both directions ?  Libvirt
>>>>>>> would need this to be the case in order to use the 'memdev' syntax
>>>>>>> instead.
>>>>>> Unfortunately they are not migration compatible in any direction,
>>>>>> if it where possible to translate them to each other I'd alias 'mem'
>>>>>> to 'memdev' without deprecation. The former sends over only one
>>>>>> MemoryRegion to target, while the later sends over several (one per
>>>>>> memdev).
>>>>>
>>>>> If we can't migration from one to the other, then we can not deprecate
>>>>> the existing 'mem' syntax. Even if libvirt were to provide a config
>>>>> option to let apps opt-in to the new syntax, we need to be able to
>>>>> support live migration of existing running VMs indefinitely. Effectively
>>>>> this means we need the to keep 'mem' support forever, or at least such
>>>>> a long time that it effectively means forever.
>>>>>
>>>>> So I think this patch has to be dropped & replaced with one that
>>>>> simply documents that memdev syntax is preferred.
>>>>
>>>> We have this habit of postulating absolutes like "can not deprecate"
>>>> instead of engaging with the tradeoffs.  We need to kick it.
>>>>
>>>> So let's have an actual look at the tradeoffs.
>>>>
>>>> We don't actually "support live migration of existing running VMs
>>>> indefinitely".
>>>>
>>>> We support live migration to any newer version of QEMU that still
>>>> supports the machine type.
>>>>
>>>> We support live migration to any older version of QEMU that already
>>>> supports the machine type and all the devices the machine uses.
>>>>
>>>> Aside: "support" is really an honest best effort here.  If you rely on
>>>> it, use a downstream that puts in the (substantial!) QA work real
>>>> support takes.
>>>>
>>>> Feature deprecation is not a contract to drop the feature after two
>>>> releases, or even five.  It's a formal notice that users of the feature
>>>> should transition to its replacement in an orderly manner.
>>>>
>>>> If I understand Igor correctly, all users should transition away from
>>>> outdated NUMA configurations at least for new VMs in an orderly manner.
>>> Yes, we can postpone removing options until there are machines type
>>> versions that were capable to use it (unfortunate but probably
>>> unavoidable unless there is a migration trick to make transition
>>> transparent) but that should not stop us from disabling broken
>>> options on new machine types at least.
>>>
>>> This series can serve as formal notice with follow up disabling of
>>> deprecated options for new machine types. (As Thomas noted, just warnings
>>> do not work and users continue to use broken features regardless whether
>>> they are don't know about issues or aware of it [*])
>>>
>>> Hence suggested deprecation approach and enforced rejection of legacy
>>> numa options for new machine types in 2 releases so users would stop
>>> using them eventually.
>>
>> When we deprecate something, we need to have a way for apps to use the
>> new alternative approach *at the same time*.  So even if we only want to
>> deprecate for new machine types, we still have to first solve the problem
>> of how mgmt apps will introspect QEMU to learn which machine types expect
>> the new options.
> I'm not aware any mechanism to introspect machine type options (existing
> or something being developed). Are/were there any ideas about it that were
> discussed in the past?
> 
> Aside from developing a new mechanism what are alternative approaches?
> I mean when we delete deprecated CLI option, how it's solved on libvirt
> side currently?

Libvirt queries QEMU capabilities via QMP, and wherever it can it
prefers the latest recommended command line options (well, those known
to a libvirt developer at the time he/she wrote the code). So as long
as you remove only old stuff and libvirt keeps itself up to date with
best practices, we're okay.

> 
> For example I don't see anything introspection related when we have been
> removing deprecated options recently.
> 
> More exact question specific to this series usecase,
> how libvirt decides when to use -numa node,memdev or not currently?

It has a mechanism to tell whether '-numa node,memdev=' is needed (i.e.
there is no other way to satisfy the user-requested configuration) and
only then does it use ',memdev'. For all other cases it defaults to
'-numa node,mem=' simply to keep backwards compatibility (as I explain
in another e-mail I've just sent to this list).

Anyway, in the libvirt code you want to be looking at:

src/qemu/qemu_command.c: qemuBuildNumaArgStr
src/qemu/qemu_command.c: qemuBuildMemoryCellBackendStr

Michal
Igor Mammedov March 4, 2019, 2:54 p.m. UTC | #18
On Mon, 4 Mar 2019 13:59:09 +0000
Daniel P. Berrangé <berrange@redhat.com> wrote:

> On Mon, Mar 04, 2019 at 02:55:10PM +0100, Igor Mammedov wrote:
> > On Mon, 4 Mar 2019 09:11:19 +0100
> > Thomas Huth <thuth@redhat.com> wrote:
> >   
> > > On 01/03/2019 18.48, Daniel P. Berrangé wrote:
> > > [...]  
> > > > So I think this patch has to be dropped & replaced with one that
> > > > simply documents that memdev syntax is preferred.    
> > > 
> > > That's definitely not enough. I've had a couple of cases already where
> > > we documented that certain options should not be used anymore, and
> > > people simply ignored it (aka. if it ain't broken, don't do any change).
> > > Then they just started to complain when I really tried to remove the
> > > option after the deprecation period.  
> >   
> > > So Igor, if you can not officially deprecate these things here yet, you
> > > should at least make sure that they can not be used with new machine
> > > types anymore. Then, after a couple of years, when we feel sure that
> > > there are only some few or no people left who still use it with the old
> > > machine types, we can start to discuss the deprecation process again, I
> > > think.  
> > Is it acceptable to silently disable CLI options (even if they are broken
> > like in this case) for new machine types?
> > I was under impression that it should go through deprecation first.  
> 
> Yes, it must go through deprecation. I was saying we can't disable
> the CLI options at all, until there is a way for libvirt to correctly
> use the new options.

I'm not adding new options (nor do I plan to for the NUMA case (yet));
-numa node,memdev has been around for several years now and should be
used as the default for creating new configs.

Given that the 'mem' option stays around for old machines, deprecation
would serve to notify users that legacy options will be disabled later
on (for new machines at least, if no migration-compatible transition is
found for older ones).

What I'm mainly aiming at here is to prevent using broken legacy options
for new VMs (like in the RHBZ1624223 case), and deprecation is the only
way we have now to notify users about CLI-breaking changes.

In the -mem-path/prealloc case, there will be a new machine option
'memdev' to replace them, but that one is migration compatible, so
libvirt could use the new CLI syntax to replace even old configs.
(If there were a mechanism for introducing/introspecting new options
per machine type or for QEMU as a whole, I'd be happy to add a new
option there on top of the deprecation notice that we have now.)

> Regards,
> Daniel
Daniel P. Berrangé March 4, 2019, 3:02 p.m. UTC | #19
On Mon, Mar 04, 2019 at 03:54:57PM +0100, Igor Mammedov wrote:
> On Mon, 4 Mar 2019 13:59:09 +0000
> Daniel P. Berrangé <berrange@redhat.com> wrote:
> 
> > On Mon, Mar 04, 2019 at 02:55:10PM +0100, Igor Mammedov wrote:
> > > On Mon, 4 Mar 2019 09:11:19 +0100
> > > Thomas Huth <thuth@redhat.com> wrote:
> > >   
> > > > On 01/03/2019 18.48, Daniel P. Berrangé wrote:
> > > > [...]  
> > > > > So I think this patch has to be dropped & replaced with one that
> > > > > simply documents that memdev syntax is preferred.    
> > > > 
> > > > That's definitely not enough. I've had a couple of cases already where
> > > > we documented that certain options should not be used anymore, and
> > > > people simply ignored it (aka. if it ain't broken, don't do any change).
> > > > Then they just started to complain when I really tried to remove the
> > > > option after the deprecation period.  
> > >   
> > > > So Igor, if you can not officially deprecate these things here yet, you
> > > > should at least make sure that they can not be used with new machine
> > > > types anymore. Then, after a couple of years, when we feel sure that
> > > > there are only some few or no people left who still use it with the old
> > > > machine types, we can start to discuss the deprecation process again, I
> > > > think.  
> > > Is it acceptable to silently disable CLI options (even if they are broken
> > > like in this case) for new machine types?
> > > I was under impression that it should go through deprecation first.  
> > 
> > Yes, it must go through deprecation. I was saying we can't disable
> > the CLI options at all, until there is a way for libvirt to correctly
> > use the new options.
> 
> I'm not adding new options (nor plan to for numa case (yet)),
> -numa node,memdev is around several years by now and should be used
> as default for creating new configs.
> 
> In light of keeping 'mem' option around for old machines,
> Deprecation should have served for notifying users that legacy
> options will be disabled later on (for new machines at least
> if no way found for migration compatible transition for older ones).
> 
> What I'm mainly aiming here is to prevent using broken legacy options
> for new VMs (like in RHBZ1624223 case) and deprecation is the only way
> we have now to notify users about CLI breaking changes.

The idea of doing advance warnings via deprecations is that applications
have time to adapt to the new mechanism several releases before the old
mechanism is removed/disabled.  Since the new mechanism isn't fully
usable yet, applications can't adapt to use it. So we can't start the
deprecation process yet, as it would be telling apps to do a switch
that isn't possible for many to actually do.

In the meantime, qemu-options.hx could be updated. It documents both
"mem" and "memdev" currently, but doesn't tell people that "memdev" is
the preferred syntax for future usage, nor warn against using "mem".


Regards,
Daniel
Igor Mammedov March 4, 2019, 3:03 p.m. UTC | #20
On Mon, 4 Mar 2019 15:24:28 +0100
Michal Privoznik <mprivozn@redhat.com> wrote:

> [Thanks Igor for bringing this onto my radar. I don't follow qemu-devel 
> that close]
> 
> On 3/4/19 11:19 AM, Daniel P. Berrangé wrote:
> > On Mon, Mar 04, 2019 at 08:13:53AM +0100, Markus Armbruster wrote:  
> >> Daniel P. Berrangé <berrange@redhat.com> writes:
> >>  
> >>> On Fri, Mar 01, 2019 at 06:33:28PM +0100, Igor Mammedov wrote:  
> >>>> On Fri, 1 Mar 2019 15:49:47 +0000
> >>>> Daniel P. Berrangé <berrange@redhat.com> wrote:
> >>>>  
> >>>>> On Fri, Mar 01, 2019 at 04:42:15PM +0100, Igor Mammedov wrote:  
> >>>>>> The parameter allows to configure fake NUMA topology where guest
> >>>>>> VM simulates NUMA topology but not actually getting a performance
> >>>>>> benefits from it. The same or better results could be achieved
> >>>>>> using 'memdev' parameter. In light of that any VM that uses NUMA
> >>>>>> to get its benefits should use 'memdev' and to allow transition
> >>>>>> initial RAM to device based model, deprecate 'mem' parameter as
> >>>>>> its ad-hoc partitioning of initial RAM MemoryRegion can't be
> >>>>>> translated to memdev based backend transparently to users and in
> >>>>>> compatible manner (migration wise).
> >>>>>>
> >>>>>> That will also allow to clean up a bit our numa code, leaving only
> >>>>>> 'memdev' impl. in place and several boards that use node_mem
> >>>>>> to generate FDT/ACPI description from it.  
> >>>>>
> >>>>> Can you confirm that the  'mem' and 'memdev' parameters to -numa
> >>>>> are 100% live migration compatible in both directions ?  Libvirt
> >>>>> would need this to be the case in order to use the 'memdev' syntax
> >>>>> instead.  
> >>>> Unfortunately they are not migration compatible in any direction,
> >>>> if it where possible to translate them to each other I'd alias 'mem'
> >>>> to 'memdev' without deprecation. The former sends over only one
> >>>> MemoryRegion to target, while the later sends over several (one per
> >>>> memdev).  
> >>>
> >>> If we can't migration from one to the other, then we can not deprecate
> >>> the existing 'mem' syntax. Even if libvirt were to provide a config
> >>> option to let apps opt-in to the new syntax, we need to be able to
> >>> support live migration of existing running VMs indefinitely. Effectively
> >>> this means we need the to keep 'mem' support forever, or at least such
> >>> a long time that it effectively means forever.  
> 
> I'm with Daniel on this. The reason why libvirt still defaults to '-numa 
> node,mem=' is exactly because of backward compatibility. Since a machine 
> can't be migrated from '-numa node,mem=' to '-numa node,memdev= + 
> -object memory-backend-*' libvirt hast to play it safe and chose a 
> combination that is acessible the widest.
> 
> If you remove this, how would you expect older machines to migrate to 
> newer cmd line?
> 
> I'm all for deprecating old stuff. In fact, I've suggested that in 
> libvirt(!) here and there, but I'm afraid we can't just remove 
> functionatlity unless we give users a way to migrate to the one we 
> prefer now.
Agreed, it's clear now that I can't remove 'mem' for OLD machine
types (even though this "safe" variant is broken and doesn't actually
do what it should). Libvirt should use 'memdev' for new VMs so that
they actually benefit from the NUMA configuration.

Currently we are talking about disabling 'mem' for new machine types
only (a pity that I have to keep the legacy code around, but at least
we would be able to move on to normal device modelling for initial
memory on new machines).

> And if libvirt doesn't follow qemu's warnings then it definitely should. 
> It's a libvirt bug if it doesn't follow the best practicies (well, if can).
> 
> Michal
Igor Mammedov March 4, 2019, 3:19 p.m. UTC | #21
On Mon, 4 Mar 2019 14:24:32 +0000
Daniel P. Berrangé <berrange@redhat.com> wrote:

> On Mon, Mar 04, 2019 at 03:16:41PM +0100, Igor Mammedov wrote:
> > On Mon, 4 Mar 2019 12:39:08 +0000
> > Daniel P. Berrangé <berrange@redhat.com> wrote:
> >   
> > > On Mon, Mar 04, 2019 at 01:25:07PM +0100, Igor Mammedov wrote:  
> > > > On Mon, 04 Mar 2019 08:13:53 +0100
> > > > Markus Armbruster <armbru@redhat.com> wrote:
> > > >     
> > > > > Daniel P. Berrangé <berrange@redhat.com> writes:
> > > > >     
> > > > > > On Fri, Mar 01, 2019 at 06:33:28PM +0100, Igor Mammedov wrote:      
> > > > > >> On Fri, 1 Mar 2019 15:49:47 +0000
> > > > > >> Daniel P. Berrangé <berrange@redhat.com> wrote:
> > > > > >>       
> > > > > >> > On Fri, Mar 01, 2019 at 04:42:15PM +0100, Igor Mammedov wrote:      
> > > > > >> > > The parameter allows to configure fake NUMA topology where guest
> > > > > >> > > VM simulates NUMA topology but not actually getting a performance
> > > > > >> > > benefits from it. The same or better results could be achieved
> > > > > >> > > using 'memdev' parameter. In light of that any VM that uses NUMA
> > > > > >> > > to get its benefits should use 'memdev' and to allow transition
> > > > > >> > > initial RAM to device based model, deprecate 'mem' parameter as
> > > > > >> > > its ad-hoc partitioning of initial RAM MemoryRegion can't be
> > > > > >> > > translated to memdev based backend transparently to users and in
> > > > > >> > > compatible manner (migration wise).
> > > > > >> > > 
> > > > > >> > > That will also allow to clean up a bit our numa code, leaving only
> > > > > >> > > 'memdev' impl. in place and several boards that use node_mem
> > > > > >> > > to generate FDT/ACPI description from it.        
> > > > > >> > 
> > > > > >> > Can you confirm that the  'mem' and 'memdev' parameters to -numa
> > > > > >> > are 100% live migration compatible in both directions ?  Libvirt
> > > > > >> > would need this to be the case in order to use the 'memdev' syntax
> > > > > >> > instead.      
> > > > > >> Unfortunately they are not migration compatible in any direction,
> > > > > >> if it where possible to translate them to each other I'd alias 'mem'
> > > > > >> to 'memdev' without deprecation. The former sends over only one
> > > > > >> MemoryRegion to target, while the later sends over several (one per
> > > > > >> memdev).      
> > > > > >
> > > > > > If we can't migration from one to the other, then we can not deprecate
> > > > > > the existing 'mem' syntax. Even if libvirt were to provide a config
> > > > > > option to let apps opt-in to the new syntax, we need to be able to
> > > > > > support live migration of existing running VMs indefinitely. Effectively
> > > > > > this means we need the to keep 'mem' support forever, or at least such
> > > > > > a long time that it effectively means forever.
> > > > > >
> > > > > > So I think this patch has to be dropped & replaced with one that
> > > > > > simply documents that memdev syntax is preferred.      
> > > > > 
> > > > > We have this habit of postulating absolutes like "can not deprecate"
> > > > > instead of engaging with the tradeoffs.  We need to kick it.
> > > > > 
> > > > > So let's have an actual look at the tradeoffs.
> > > > > 
> > > > > We don't actually "support live migration of existing running VMs
> > > > > indefinitely".
> > > > > 
> > > > > We support live migration to any newer version of QEMU that still
> > > > > supports the machine type.
> > > > > 
> > > > > We support live migration to any older version of QEMU that already
> > > > > supports the machine type and all the devices the machine uses.
> > > > > 
> > > > > Aside: "support" is really an honest best effort here.  If you rely on
> > > > > it, use a downstream that puts in the (substantial!) QA work real
> > > > > support takes.
> > > > > 
> > > > > Feature deprecation is not a contract to drop the feature after two
> > > > > releases, or even five.  It's a formal notice that users of the feature
> > > > > should transition to its replacement in an orderly manner.
> > > > > 
> > > > > If I understand Igor correctly, all users should transition away from
> > > > > outdated NUMA configurations at least for new VMs in an orderly manner.    
> > > > Yes, we can postpone removing options until there are machines type
> > > > versions that were capable to use it (unfortunate but probably 
> > > > unavoidable unless there is a migration trick to make transition
> > > > transparent) but that should not stop us from disabling broken
> > > > options on new machine types at least.
> > > > 
> > > > This series can serve as formal notice with follow up disabling of
> > > > deprecated options for new machine types. (As Thomas noted, just warnings
> > > > do not work and users continue to use broken features regardless whether
> > > > they are don't know about issues or aware of it [*])
> > > > 
> > > > Hence suggested deprecation approach and enforced rejection of legacy
> > > > numa options for new machine types in 2 releases so users would stop
> > > > using them eventually.    
> > > 
> > > When we deprecate something, we need to have a way for apps to use the
> > > new alternative approach *at the same time*.  So even if we only want to
> > > deprecate for new machine types, we still have to first solve the problem
> > > of how mgmt apps will introspect QEMU to learn which machine types expect
> > > the new options.  
> > I'm not aware any mechanism to introspect machine type options (existing
> > or something being developed). Are/were there any ideas about it that were
> > discussed in the past?
> > 
> > Aside from developing a new mechanism what are alternative approaches?
> > I mean when we delete deprecated CLI option, how it's solved on libvirt
> > side currently?
> > 
> > For example I don't see anything introspection related when we have been
> > removing deprecated options recently.  
> 
> Right, with other stuff we deprecate we've had a simpler time, as it
> either didn't affect migration at all, or the new replacement stuff
> was fully compatible with the migration data stream. IOW, libvirt
> could unconditionally use the new feature as soon as it saw that it
> exists in QEMU. We didn't have any machine type dependancy to deal
> with before now.
Any suggestions on which direction we should proceed in?
(I'm really not keen on developing a new introspection feature, but if
that's the only way to move forward ...)

> > More exact question specific to this series usecase,
> > how libvirt decides when to use -numa node,memdev or not currently?  
> 
> It is pretty hard to follow the code, but IIUC we only use memdev when
> stting up NVDIMMs, or for guests configured to have the "shared" flag
> on the memory region.
Then I'd guess that most VMs end up with the default '-numa node,mem=',
which by design can produce only a fake NUMA topology, without the
ability to manage guest RAM on the host side. So such VMs aren't
getting the performance benefits, or worse, run with a performance
regression (due to wrong sched/mm decisions, as the guest kernel
assumes the NUMA topology is a valid one).
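
For illustration, this is the kind of host-side placement that only the
memdev form can express (a sketch with made-up sizes and host node
numbers; 'host-nodes' and 'policy' are existing memory-backend
properties):

```sh
# Bind each guest node's RAM to a specific host NUMA node.
# '-numa node,mem=' has no equivalent; it only fakes the guest topology.
qemu-system-x86_64 -m 2G \
    -object memory-backend-ram,id=ram0,size=1G,host-nodes=0,policy=bind \
    -object memory-backend-ram,id=ram1,size=1G,host-nodes=1,policy=bind \
    -numa node,nodeid=0,memdev=ram0 \
    -numa node,nodeid=1,memdev=ram1
```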
 
> Regards,
> Daniel
Daniel P. Berrangé March 4, 2019, 3:28 p.m. UTC | #22
On Mon, Mar 04, 2019 at 12:45:14PM +0100, Markus Armbruster wrote:
> Daniel P. Berrangé <berrange@redhat.com> writes:
> 
> > On Mon, Mar 04, 2019 at 08:13:53AM +0100, Markus Armbruster wrote:
> >> If we deprecate outdated NUMA configurations now, we can start rejecting
> >> them with new machine types after a suitable grace period.
> >
> > How is libvirt going to know what machines it can use with the feature ?
> > We don't have any way to introspect machine type specific logic, since we
> > run all probing with "-machine none", and QEMU can't report anything about
> > machines without instantiating them.
> 
> Fair point.  A practical way for management applications to decide which
> of the two interfaces they can use with which machine type may be
> required for deprecating one of the interfaces with new machine types.

We currently have "qom-list-properties", which can report on the
existence of properties registered against object types. What it
can't do though is report on the default values of these properties.

What's interesting though is that qmp_qom_list_properties will actually
instantiate objects in order to query properties, if the type isn't an
abstract type.

IOW, even if you are running "$QEMU -machine none", then if at the qmp-shell
you do

   (QEMU) qom-list-properties typename=pc-q35-2.6-machine

it will have actually instantiate the pc-q35-2.6-machine machine type.
Since it has instantiated the machine, the object initializer function
will have run and initialized the default values for various properties.

IOW, it is possible for qom-list-properties to report on default values
for non-abstract types.

I did a quick hack to PoC the theory:

diff --git a/qapi/misc.json b/qapi/misc.json
index 8b3ca4fdd3..906dfbf3b5 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -1368,7 +1368,8 @@
 # Since: 1.2
 ##
 { 'struct': 'ObjectPropertyInfo',
-  'data': { 'name': 'str', 'type': 'str', '*description': 'str' } }
+  'data': { 'name': 'str', 'type': 'str', '*description': 'str',
+            '*default': 'str'} }
 
 ##
 # @qom-list:
diff --git a/qmp.c b/qmp.c
index b92d62cd5f..a45669032c 100644
--- a/qmp.c
+++ b/qmp.c
@@ -594,6 +594,11 @@ ObjectPropertyInfoList *qmp_qom_list_properties(const char *typename,
         info->has_description = !!prop->description;
         info->description = g_strdup(prop->description);
 
+        if (obj && g_str_equal(info->type, "string")) {
+            info->q_default = g_strdup(object_property_get_str(obj, info->name, NULL));
+            info->has_q_default = info->q_default != NULL;
+        }
+
         entry = g_malloc0(sizeof(*entry));
         entry->value = info;
         entry->next = prop_list;


If we could make this hack less of a hack, then perhaps it is good
enough to cope with reporting machine types which forbid use of "mem"
in favour of "memdev"? They would need to have a property registered
against them of course to identify the "memdev" requirement.
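
With a patch along those lines, a client interaction might look roughly
like this (sketch only: the property name "numa-mem-supported" and its
value are hypothetical, since no such machine property exists yet, and
the PoC above only reports defaults for string-typed properties):

```console
(QEMU) qom-list-properties typename=pc-q35-4.0-machine
{ "return": [
    ...
    { "name": "numa-mem-supported",
      "type": "string",
      "default": "off" },
    ...
] }
```

Libvirt would then probe each versioned machine type it cares about and
pick the '-numa node,mem=' or ',memdev=' syntax per machine type based
on the reported default.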

Regards,
Daniel
Igor Mammedov March 4, 2019, 3:46 p.m. UTC | #23
On Mon, 4 Mar 2019 15:28:11 +0000
Daniel P. Berrangé <berrange@redhat.com> wrote:

> On Mon, Mar 04, 2019 at 12:45:14PM +0100, Markus Armbruster wrote:
> > Daniel P. Berrangé <berrange@redhat.com> writes:
> >   
> > > On Mon, Mar 04, 2019 at 08:13:53AM +0100, Markus Armbruster wrote:  
> > >> If we deprecate outdated NUMA configurations now, we can start rejecting
> > >> them with new machine types after a suitable grace period.  
> > >
> > > How is libvirt going to know what machines it can use with the feature ?
> > > We don't have any way to introspect machine type specific logic, since we
> > > run all probing with "-machine none", and QEMU can't report anything about
> > > machines without instantiating them.  
> > 
> > Fair point.  A practical way for management applications to decide which
> > of the two interfaces they can use with which machine type may be
> > required for deprecating one of the interfaces with new machine types.  
> 
> We currently have  "qom-list-properties" which can report on the
> existance of properties registered against object types. What it
> can't do though is report on the default values of these properties.
> 
> What's interesting though is that qmp_qom_list_properties will actually
> instantiate objects in order to query properties, if the type isn't an
> abstract type.
> 
> IOW, even if you are running "$QEMU -machine none", then if at the qmp-shell
> you do
> 
>    (QEMU) qom-list-properties typename=pc-q35-2.6-machine
> 
> it will have actually instantiate the pc-q35-2.6-machine machine type.
> Since it has instantiated the machine, the object initializer function
> will have run and initialized the default values for various properties.
> 
> IOW, it is possible for qom-list-properties to report on default values
> for non-abstract types.
> 
> I did a quick hack to PoC the theory:
> 
> diff --git a/qapi/misc.json b/qapi/misc.json
> index 8b3ca4fdd3..906dfbf3b5 100644
> --- a/qapi/misc.json
> +++ b/qapi/misc.json
> @@ -1368,7 +1368,8 @@
>  # Since: 1.2
>  ##
>  { 'struct': 'ObjectPropertyInfo',
> -  'data': { 'name': 'str', 'type': 'str', '*description': 'str' } }
> +  'data': { 'name': 'str', 'type': 'str', '*description': 'str',
> +            '*default': 'str'} }
>  
>  ##
>  # @qom-list:
> diff --git a/qmp.c b/qmp.c
> index b92d62cd5f..a45669032c 100644
> --- a/qmp.c
> +++ b/qmp.c
> @@ -594,6 +594,11 @@ ObjectPropertyInfoList *qmp_qom_list_properties(const char *typename,
>          info->has_description = !!prop->description;
>          info->description = g_strdup(prop->description);
>  
> +        if (obj && g_str_equal(info->type, "string")) {
> +            info->q_default = g_strdup(object_property_get_str(obj, info->name, NULL));
> +            info->has_q_default = info->q_default != NULL;
> +        }
> +
>          entry = g_malloc0(sizeof(*entry));
>          entry->value = info;
>          entry->next = prop_list;
> 
> 
> If we could make this hack less of a hack, then perhaps this is good
> enough to cope reporting machine types which forbid use of "mem" in
> favour of "memdev" ? They would need to have a property registered
> against them of course to identify the "memdev" requirement.

Thanks, I'll look into it and try to come up with patches.

> Regards,
> Daniel
Michal Prívozník March 4, 2019, 4:12 p.m. UTC | #24
On 3/4/19 4:19 PM, Igor Mammedov wrote:

> Then I'd guess that most VMs end up with default '-numa node,mem'
> which by design can produce only fake NUMA without ability to manage
> guest RAM on host side. So such VMs aren't getting performance benefits
> or worse run with performance regression (due to wrong sched/mm decisions
> as guest kernel assumes NUMA topology is valid one).

Specifying NUMA distances in the libvirt XML makes it generate the
modern command line.

Michal
Michal Prívozník March 4, 2019, 4:20 p.m. UTC | #25
On 3/4/19 3:24 PM, Daniel P. Berrangé wrote:
> On Mon, Mar 04, 2019 at 03:16:41PM +0100, Igor Mammedov wrote:
>> On Mon, 4 Mar 2019 12:39:08 +0000
>> Daniel P. Berrangé <berrange@redhat.com> wrote:
>>
>>> On Mon, Mar 04, 2019 at 01:25:07PM +0100, Igor Mammedov wrote:
>>>> On Mon, 04 Mar 2019 08:13:53 +0100
>>>> Markus Armbruster <armbru@redhat.com> wrote:
>>>>    
>>>>> Daniel P. Berrangé <berrange@redhat.com> writes:
>>>>>    
>>>>>> On Fri, Mar 01, 2019 at 06:33:28PM +0100, Igor Mammedov wrote:
>>>>>>> On Fri, 1 Mar 2019 15:49:47 +0000
>>>>>>> Daniel P. Berrangé <berrange@redhat.com> wrote:
>>>>>>>      
>>>>>>>> On Fri, Mar 01, 2019 at 04:42:15PM +0100, Igor Mammedov wrote:
>>>>>>>>> The parameter allows to configure fake NUMA topology where guest
>>>>>>>>> VM simulates NUMA topology but not actually getting a performance
>>>>>>>>> benefits from it. The same or better results could be achieved
>>>>>>>>> using 'memdev' parameter. In light of that any VM that uses NUMA
>>>>>>>>> to get its benefits should use 'memdev' and to allow transition
>>>>>>>>> initial RAM to device based model, deprecate 'mem' parameter as
>>>>>>>>> its ad-hoc partitioning of initial RAM MemoryRegion can't be
>>>>>>>>> translated to memdev based backend transparently to users and in
>>>>>>>>> compatible manner (migration wise).
>>>>>>>>>
>>>>>>>>> That will also allow to clean up a bit our numa code, leaving only
>>>>>>>>> 'memdev' impl. in place and several boards that use node_mem
>>>>>>>>> to generate FDT/ACPI description from it.
>>>>>>>>
>>>>>>>> Can you confirm that the  'mem' and 'memdev' parameters to -numa
>>>>>>>> are 100% live migration compatible in both directions ?  Libvirt
>>>>>>>> would need this to be the case in order to use the 'memdev' syntax
>>>>>>>> instead.
>>>>>>> Unfortunately they are not migration compatible in any direction,
>>>>>>> if it where possible to translate them to each other I'd alias 'mem'
>>>>>>> to 'memdev' without deprecation. The former sends over only one
>>>>>>> MemoryRegion to target, while the later sends over several (one per
>>>>>>> memdev).
>>>>>>
>>>>>> If we can't migration from one to the other, then we can not deprecate
>>>>>> the existing 'mem' syntax. Even if libvirt were to provide a config
>>>>>> option to let apps opt-in to the new syntax, we need to be able to
>>>>>> support live migration of existing running VMs indefinitely. Effectively
>>>>>> this means we need the to keep 'mem' support forever, or at least such
>>>>>> a long time that it effectively means forever.
>>>>>>
>>>>>> So I think this patch has to be dropped & replaced with one that
>>>>>> simply documents that memdev syntax is preferred.
>>>>>
>>>>> We have this habit of postulating absolutes like "can not deprecate"
>>>>> instead of engaging with the tradeoffs.  We need to kick it.
>>>>>
>>>>> So let's have an actual look at the tradeoffs.
>>>>>
>>>>> We don't actually "support live migration of existing running VMs
>>>>> indefinitely".
>>>>>
>>>>> We support live migration to any newer version of QEMU that still
>>>>> supports the machine type.
>>>>>
>>>>> We support live migration to any older version of QEMU that already
>>>>> supports the machine type and all the devices the machine uses.
>>>>>
>>>>> Aside: "support" is really an honest best effort here.  If you rely on
>>>>> it, use a downstream that puts in the (substantial!) QA work real
>>>>> support takes.
>>>>>
>>>>> Feature deprecation is not a contract to drop the feature after two
>>>>> releases, or even five.  It's a formal notice that users of the feature
>>>>> should transition to its replacement in an orderly manner.
>>>>>
>>>>> If I understand Igor correctly, all users should transition away from
>>>>> outdated NUMA configurations at least for new VMs in an orderly manner.
>>>> Yes, we can postpone removing options until there are machines type
>>>> versions that were capable to use it (unfortunate but probably
>>>> unavoidable unless there is a migration trick to make transition
>>>> transparent) but that should not stop us from disabling broken
>>>> options on new machine types at least.
>>>>
>>>> This series can serve as formal notice with follow up disabling of
>>>> deprecated options for new machine types. (As Thomas noted, just warnings
>>>> do not work and users continue to use broken features regardless whether
>>>> they are don't know about issues or aware of it [*])
>>>>
>>>> Hence suggested deprecation approach and enforced rejection of legacy
>>>> numa options for new machine types in 2 releases so users would stop
>>>> using them eventually.
>>>
>>> When we deprecate something, we need to have a way for apps to use the
>>> new alternative approach *at the same time*.  So even if we only want to
>>> deprecate for new machine types, we still have to first solve the problem
>>> of how mgmt apps will introspect QEMU to learn which machine types expect
>>> the new options.
>> I'm not aware any mechanism to introspect machine type options (existing
>> or something being developed). Are/were there any ideas about it that were
>> discussed in the past?
>>
>> Aside from developing a new mechanism what are alternative approaches?
>> I mean when we delete deprecated CLI option, how it's solved on libvirt
>> side currently?
>>
>> For example I don't see anything introspection related when we have been
>> removing deprecated options recently.
> 
> Right, with other stuff we deprecate we've had a simpler time, as it
> either didn't affect migration at all, or the new replacement stuff
> was fully compatible with the migration data stream. IOW, libvirt
> could unconditionally use the new feature as soon as it saw that it
> exists in QEMU. We didn't have any machine type dependancy to deal
> with before now.

We couldn't have done that. How would we migrate from an older qemu?

Anyway, now that I look into this (esp. git log) I came across:

commit f309db1f4d51009bad0d32e12efc75530b66836b
Author:     Michal Privoznik <mprivozn@redhat.com>
AuthorDate: Thu Dec 18 12:36:48 2014 +0100
Commit:     Michal Privoznik <mprivozn@redhat.com>
CommitDate: Fri Dec 19 07:44:44 2014 +0100

     qemu: Create memory-backend-{ram,file} iff needed

Or this 7832fac84741d65e851dbdbfaf474785cbfdcf3c. We did try to 
generate the newer cmd line, but then for various reasons (e.g. 
avoiding triggering a qemu bug) we turned it off and made libvirt 
default to the older (now deprecated) cmd line.

Frankly, I don't know how to proceed. Unless qemu is fixed to allow 
migration from the deprecated to the new cmd line (unlikely, if not 
impossible, right?), I guess the only approach we can take is this:

1) whenever cold booting a new machine (a fresh, brand new start of a 
new domain), libvirt would default to the modern cmd line,

2) on migration, libvirt would record in the migration stream (or status 
XML or wherever) that the modern cmd line was generated, thus making 
the destination generate the modern cmd line too.

This solution still suffers from a couple of problems:
a) migration to an older libvirt will fail, as the older libvirt won't 
recognize the flag set in 2) and would therefore default to the 
deprecated cmd line
b) migrating from one host to another won't modernize the cmd line

But I guess we have to draw a line somewhere (if we are not willing to 
write those migration patches).
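
The two-step approach above could be sketched like this (a rough
illustration only; build_numa_args and the "modern_numa" cookie flag
are hypothetical names, not actual libvirt code):

```python
# Hypothetical sketch of the libvirt-side decision described above.
# A cold boot always picks the modern memdev syntax; an incoming
# migration honours whatever the source recorded in its cookie.

def build_numa_args(cells, cookie=None, cold_boot=True):
    """Return -numa related argv, choosing modern vs legacy syntax.

    cells: list of dicts like {"id": 0, "cpus": "0-3", "size_kib": 1024000}
    cookie: dict carried in the migration stream (None on cold boot)
    """
    use_memdev = cold_boot or (cookie or {}).get("modern_numa", False)

    args = []
    for cell in cells:
        if use_memdev:
            # modern form: one memory backend object per NUMA node
            args += ["-object",
                     "memory-backend-ram,id=ram-node%d,size=%d"
                     % (cell["id"], cell["size_kib"] * 1024),
                     "-numa",
                     "node,nodeid=%d,cpus=%s,memdev=ram-node%d"
                     % (cell["id"], cell["cpus"], cell["id"])]
        else:
            # legacy, deprecated form; kept only so migration from an
            # existing VM stays compatible
            args += ["-numa",
                     "node,nodeid=%d,cpus=%s,mem=%dM"
                     % (cell["id"], cell["cpus"], cell["size_kib"] // 1024)]
    return args
```

This doesn't solve problems a) and b) above, it only shows where the
decision point would sit.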

Michal
Daniel P. Berrangé March 4, 2019, 4:27 p.m. UTC | #26
On Mon, Mar 04, 2019 at 05:12:40PM +0100, Michal Privoznik wrote:
> On 3/4/19 4:19 PM, Igor Mammedov wrote:
> 
> > Then I'd guess that most VMs end up with default '-numa node,mem'
> > which by design can produce only fake NUMA without ability to manage
> > guest RAM on host side. So such VMs aren't getting performance benefits
> > or worse run with performance regression (due to wrong sched/mm decisions
> > as guest kernel assumes NUMA topology is valid one).
> 
> Specifying NUMA distances in libvirt XML makes it generate the modern cmd
> line.

AFAIK, specifying any guest NUMA -> host NUMA affinity makes it use the
modern cmd line, e.g. I just modified a plain 8 CPU / 2 GB RAM guest
with this:

  <numatune>
    <memnode cellid='0' mode='strict' nodeset='0'/>
    <memnode cellid='1' mode='strict' nodeset='1'/>
  </numatune>
  <cpu mode='host-model'>
    <numa>
      <cell id='0' cpus='0-3' memory='1024000' unit='KiB'/>
      <cell id='1' cpus='4-7' memory='1024000' unit='KiB'/>
    </numa>
  </cpu>

and I can see libvirt decided to use memdev

  -object memory-backend-ram,id=ram-node0,size=1048576000,host-nodes=0,policy=bind
  -numa node,nodeid=0,cpus=0-3,memdev=ram-node0
  -object memory-backend-ram,id=ram-node1,size=1048576000,host-nodes=1,policy=bind
  -numa node,nodeid=1,cpus=4-7,memdev=ram-node1 

So unless I'm missing something, we aren't suffering from the problem
described by Igor above even today.
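
For contrast, without any host-node affinity in the XML libvirt falls
back to the deprecated syntax, which would look roughly like this
(illustrative invocation, not copied from a real run):

```shell
# Deprecated form: guest RAM stays a single MemoryRegion that is merely
# partitioned per node, so it cannot be bound to host NUMA nodes and is
# not migration compatible with the memdev form shown above.
qemu-system-x86_64 \
  -m 2000 \
  -smp 8 \
  -numa node,nodeid=0,cpus=0-3,mem=1000 \
  -numa node,nodeid=1,cpus=4-7,mem=1000
```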


Regards,
Daniel
Dr. David Alan Gilbert March 4, 2019, 4:31 p.m. UTC | #27
* Michal Privoznik (mprivozn@redhat.com) wrote:
> On 3/4/19 3:24 PM, Daniel P. Berrangé wrote:
> > [...]
> > 
> > Right, with other stuff we deprecate we've had a simpler time, as it
> > either didn't affect migration at all, or the new replacement stuff
> > was fully compatible with the migration data stream. IOW, libvirt
> > could unconditionally use the new feature as soon as it saw that it
> > exists in QEMU. We didn't have any machine type dependancy to deal
> > with before now.
> 
> We couldn't have done that. How we would migrate from older qemu?
> 
> Anyway, now that I look into this (esp. git log) I came accross:
> 
> commit f309db1f4d51009bad0d32e12efc75530b66836b
> Author:     Michal Privoznik <mprivozn@redhat.com>
> AuthorDate: Thu Dec 18 12:36:48 2014 +0100
> Commit:     Michal Privoznik <mprivozn@redhat.com>
> CommitDate: Fri Dec 19 07:44:44 2014 +0100
> 
>     qemu: Create memory-backend-{ram,file} iff needed
> 
> Or this 7832fac84741d65e851dbdbfaf474785cbfdcf3c. We did try to generated
> newer cmd line but then for various reasong (e.g. avoiding triggering a qemu
> bug) we turned it off and make libvirt default to older (now deprecated) cmd
> line.
> 
> Frankly, I don't know how to proceed. Unless qemu is fixed to allow
> migration from deprecated to new cmd line (unlikely, if not impossible,
> right?) then I guess the only approach we can have is that:
> 
> 1) whenever so called cold booting a new machine (fresh, brand new start of
> a new domain) libvirt would default to modern cmd line,
>
> 2) on migration, libvirt would record in the migration stream (or status XML
> or wherever) that modern cmd line was generated and thus it'll make the
> destination generate modern cmd line too.
> 
> This solution still suffers a couple of problems:
> a) migration to older libvirt will fail as older libvirt won't recognize the
> flag set in 2) and therefore would default to deprecated cmd line
> b) migrating from one host to another won't modernize the cmd line
> 
> But I guess we have to draw a line somewhere (if we are not willing to write
> those migration patches).

What's interesting here is that this problem isn't really machine-type
related; so providing introspection on the machine type doesn't
immediately help.  What we're actually trying to do here is (mis)use a
machine type as a proxy for knowing that both sides are new enough to
handle the new command line.

That's an OK thing to do, and if we did have introspection we could
add a fudge flag to say it's allowed now.

Dave


> Michal
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Daniel P. Berrangé March 4, 2019, 4:35 p.m. UTC | #28
On Mon, Mar 04, 2019 at 05:20:13PM +0100, Michal Privoznik wrote:
> On 3/4/19 3:24 PM, Daniel P. Berrangé wrote:
> > [...]
> > 
> > Right, with other stuff we deprecate we've had a simpler time, as it
> > either didn't affect migration at all, or the new replacement stuff
> > was fully compatible with the migration data stream. IOW, libvirt
> > could unconditionally use the new feature as soon as it saw that it
> > exists in QEMU. We didn't have any machine type dependancy to deal
> > with before now.
> 
> We couldn't have done that. How we would migrate from older qemu?
> 
> Anyway, now that I look into this (esp. git log) I came accross:
> 
> commit f309db1f4d51009bad0d32e12efc75530b66836b
> Author:     Michal Privoznik <mprivozn@redhat.com>
> AuthorDate: Thu Dec 18 12:36:48 2014 +0100
> Commit:     Michal Privoznik <mprivozn@redhat.com>
> CommitDate: Fri Dec 19 07:44:44 2014 +0100
> 
>     qemu: Create memory-backend-{ram,file} iff needed
> 
> Or this 7832fac84741d65e851dbdbfaf474785cbfdcf3c. We did try to generated
> newer cmd line but then for various reasong (e.g. avoiding triggering a qemu
> bug) we turned it off and make libvirt default to older (now deprecated) cmd
> line.
> 
> Frankly, I don't know how to proceed. Unless qemu is fixed to allow
> migration from deprecated to new cmd line (unlikely, if not impossible,
> right?) then I guess the only approach we can have is that:
> 
> 1) whenever so called cold booting a new machine (fresh, brand new start of
> a new domain) libvirt would default to modern cmd line,
> 
> 2) on migration, libvirt would record in the migration stream (or status XML
> or wherever) that modern cmd line was generated and thus it'll make the
> destination generate modern cmd line too.
> 
> This solution still suffers a couple of problems:
> a) migration to older libvirt will fail as older libvirt won't recognize the
> flag set in 2) and therefore would default to deprecated cmd line
> b) migrating from one host to another won't modernize the cmd line
> 
> But I guess we have to draw a line somewhere (if we are not willing to write
> those migration patches).

Yeah supporting backwards migration is a non-optional requirement from at
least one of the mgmt apps using libvirt, so breaking the new to old case
is something we always aim to avoid.

These incompatibilities are reminding me why we haven't tied these kinds
of changes to machine type versions in the past. New machine type != new
libvirt, so we can't tie usage of a feature in libvirt to a new machine
type.

I'm wondering exactly in which cases libvirt will still use the "mem"
option as opposed to "memdev".  If none of the cases using "mem" actually
suffer from its ill effects, then there's no compelling reason to stop
using it. It can be discouraged in QEMU documentation but otherwise
left alone.

Regards,
Daniel
Igor Mammedov March 4, 2019, 4:45 p.m. UTC | #29
On Mon, 4 Mar 2019 15:02:18 +0000
Daniel P. Berrangé <berrange@redhat.com> wrote:

> On Mon, Mar 04, 2019 at 03:54:57PM +0100, Igor Mammedov wrote:
> > On Mon, 4 Mar 2019 13:59:09 +0000
> > Daniel P. Berrangé <berrange@redhat.com> wrote:
> >   
> > > On Mon, Mar 04, 2019 at 02:55:10PM +0100, Igor Mammedov wrote:  
> > > > On Mon, 4 Mar 2019 09:11:19 +0100
> > > > Thomas Huth <thuth@redhat.com> wrote:
> > > >     
> > > > > On 01/03/2019 18.48, Daniel P. Berrangé wrote:
> > > > > [...]    
> > > > > > So I think this patch has to be dropped & replaced with one that
> > > > > > simply documents that memdev syntax is preferred.      
> > > > > 
> > > > > That's definitely not enough. I've had a couple of cases already where
> > > > > we documented that certain options should not be used anymore, and
> > > > > people simply ignored it (aka. if it ain't broken, don't do any change).
> > > > > Then they just started to complain when I really tried to remove the
> > > > > option after the deprecation period.    
> > > >     
> > > > > So Igor, if you can not officially deprecate these things here yet, you
> > > > > should at least make sure that they can not be used with new machine
> > > > > types anymore. Then, after a couple of years, when we feel sure that
> > > > > there are only some few or no people left who still use it with the old
> > > > > machine types, we can start to discuss the deprecation process again, I
> > > > > think.    
> > > > Is it acceptable to silently disable CLI options (even if they are broken
> > > > like in this case) for new machine types?
> > > > I was under impression that it should go through deprecation first.    
> > > 
> > > Yes, it must go through deprecation. I was saying we can't disable
> > > the CLI options at all, until there is a way for libvirt to correctly
> > > use the new options.  
> > 
> > I'm not adding new options (nor plan to for numa case (yet)),
> > -numa node,memdev is around several years by now and should be used
> > as default for creating new configs.
> > 
> > In light of keeping 'mem' option around for old machines,
> > Deprecation should have served for notifying users that legacy
> > options will be disabled later on (for new machines at least
> > if no way found for migration compatible transition for older ones).
> > 
> > What I'm mainly aiming here is to prevent using broken legacy options
> > for new VMs (like in RHBZ1624223 case) and deprecation is the only way
> > we have now to notify users about CLI breaking changes.  
> 
> The idea of doing advance warnings via deprecations is that applications
> have time to adapt to the new mechanism several releases before the old
> mechanism is removed/disabled.  Since the new mechanism isn't fully
> usable yet, applications can't adapt to use it. So we can't start the
> deprecation process yet, as it would be telling apps to do a switch
> that isn't possible for many to actually do.

At least the attempt to deprecate 'mem' served its purpose somewhat.
Now it's clear that libvirt defaults to the wrong legacy mode and should
use 'memdev' instead. I hope that it will be addressed on the libvirt
side regardless of the fate of the 'mem' deprecation process.
Basically, a new VM should default to 'memdev' if it's available.


> In the meantime, qemu-options.hx could be updated. It documents both
> "mem" and "memdev" currently but doesn't tell people that "memdev" is
> the preferred syntax for future usage / warn against using "mem".

Will do so.

> 
> Regards,
> Daniel
Igor Mammedov March 6, 2019, 7:03 p.m. UTC | #30
On Mon, 4 Mar 2019 16:35:16 +0000
Daniel P. Berrangé <berrange@redhat.com> wrote:

> On Mon, Mar 04, 2019 at 05:20:13PM +0100, Michal Privoznik wrote:
> > On 3/4/19 3:24 PM, Daniel P. Berrangé wrote:
> > > [...]
> > > 
> > > Right, with other stuff we deprecate we've had a simpler time, as it
> > > either didn't affect migration at all, or the new replacement stuff
> > > was fully compatible with the migration data stream. IOW, libvirt
> > > could unconditionally use the new feature as soon as it saw that it
> > > exists in QEMU. We didn't have any machine type dependancy to deal
> > > with before now.
> > 
> > We couldn't have done that. How we would migrate from older qemu?
> > 
> > Anyway, now that I look into this (esp. git log) I came accross:
> > 
> > commit f309db1f4d51009bad0d32e12efc75530b66836b
> > Author:     Michal Privoznik <mprivozn@redhat.com>
> > AuthorDate: Thu Dec 18 12:36:48 2014 +0100
> > Commit:     Michal Privoznik <mprivozn@redhat.com>
> > CommitDate: Fri Dec 19 07:44:44 2014 +0100
> > 
> >     qemu: Create memory-backend-{ram,file} iff needed
> > 
> > Or this 7832fac84741d65e851dbdbfaf474785cbfdcf3c. We did try to generated
> > newer cmd line but then for various reasong (e.g. avoiding triggering a qemu
> > bug) we turned it off and make libvirt default to older (now deprecated) cmd
> > line.
> > 
> > Frankly, I don't know how to proceed. Unless qemu is fixed to allow
> > migration from deprecated to new cmd line (unlikely, if not impossible,
> > right?) then I guess the only approach we can have is that:
> > 
> > 1) whenever so called cold booting a new machine (fresh, brand new start of
> > a new domain) libvirt would default to modern cmd line,
> > 
> > 2) on migration, libvirt would record in the migration stream (or status XML
> > or wherever) that modern cmd line was generated and thus it'll make the
> > destination generate modern cmd line too.
> > 
> > This solution still suffers a couple of problems:
> > a) migration to older libvirt will fail as older libvirt won't recognize the
> > flag set in 2) and therefore would default to deprecated cmd line
> > b) migrating from one host to another won't modernize the cmd line
> > 
> > But I guess we have to draw a line somewhere (if we are not willing to write
> > those migration patches).
> 
> Yeah supporting backwards migration is a non-optional requirement from at
> least one of the mgmt apps using libvirt, so breaking the new to old case
> is something we always aim to avoid.
Aiming for support of 
"new QEMU + new machine type" => "old QEMU + non-existing machine type"
seems a bit difficult.
Note old machine types will continue to work with old CLI.
 

> These incompabilities are reminding me why we haven't tied these kind of
> changes to machine type versions in the past. New machine type != new
> libvirt, so we can't tie usage of a feature in livirt to a new machine
> type.
> 
> I'm wondering exactly which cases libvirt will still use the "mem" option
> in  as opposed to "memdev".  If none of the cases using "mem" actually
> suffer from the ill-effects of "mem", then there's not a compelling reason
> to stop using it. It can be discouraged in QEMU documentation but otherwise
> left alone.
> 
> Regards,
> Daniel
Igor Mammedov March 6, 2019, 7:56 p.m. UTC | #31
On Mon, 4 Mar 2019 17:20:13 +0100
Michal Privoznik <mprivozn@redhat.com> wrote:

> On 3/4/19 3:24 PM, Daniel P. Berrangé wrote:
> > On Mon, Mar 04, 2019 at 03:16:41PM +0100, Igor Mammedov wrote:
> >> On Mon, 4 Mar 2019 12:39:08 +0000
> >> Daniel P. Berrangé <berrange@redhat.com> wrote:
> >>
> >>> On Mon, Mar 04, 2019 at 01:25:07PM +0100, Igor Mammedov wrote:
> >>>> On Mon, 04 Mar 2019 08:13:53 +0100
> >>>> Markus Armbruster <armbru@redhat.com> wrote:
> >>>>    
> >>>>> Daniel P. Berrangé <berrange@redhat.com> writes:
> >>>>>    
> >>>>>> On Fri, Mar 01, 2019 at 06:33:28PM +0100, Igor Mammedov wrote:
> >>>>>>> On Fri, 1 Mar 2019 15:49:47 +0000
> >>>>>>> Daniel P. Berrangé <berrange@redhat.com> wrote:
> >>>>>>>      
> >>>>>>>> On Fri, Mar 01, 2019 at 04:42:15PM +0100, Igor Mammedov wrote:
> >>>>>>>>> The parameter allows to configure fake NUMA topology where guest
> >>>>>>>>> VM simulates NUMA topology but not actually getting a performance
> >>>>>>>>> benefits from it. The same or better results could be achieved
> >>>>>>>>> using 'memdev' parameter. In light of that any VM that uses NUMA
> >>>>>>>>> to get its benefits should use 'memdev' and to allow transition
> >>>>>>>>> initial RAM to device based model, deprecate 'mem' parameter as
> >>>>>>>>> its ad-hoc partitioning of initial RAM MemoryRegion can't be
> >>>>>>>>> translated to memdev based backend transparently to users and in
> >>>>>>>>> compatible manner (migration wise).
> >>>>>>>>>
> >>>>>>>>> That will also allow to clean up a bit our numa code, leaving only
> >>>>>>>>> 'memdev' impl. in place and several boards that use node_mem
> >>>>>>>>> to generate FDT/ACPI description from it.
> >>>>>>>>
> >>>>>>>> Can you confirm that the  'mem' and 'memdev' parameters to -numa
> >>>>>>>> are 100% live migration compatible in both directions ?  Libvirt
> >>>>>>>> would need this to be the case in order to use the 'memdev' syntax
> >>>>>>>> instead.
> >>>>>>> Unfortunately they are not migration compatible in any direction,
> >>>>>>> if it where possible to translate them to each other I'd alias 'mem'
> >>>>>>> to 'memdev' without deprecation. The former sends over only one
> >>>>>>> MemoryRegion to target, while the later sends over several (one per
> >>>>>>> memdev).
> >>>>>>
> >>>>>> If we can't migration from one to the other, then we can not deprecate
> >>>>>> the existing 'mem' syntax. Even if libvirt were to provide a config
> >>>>>> option to let apps opt-in to the new syntax, we need to be able to
> >>>>>> support live migration of existing running VMs indefinitely. Effectively
> >>>>>> this means we need the to keep 'mem' support forever, or at least such
> >>>>>> a long time that it effectively means forever.
> >>>>>>
> >>>>>> So I think this patch has to be dropped & replaced with one that
> >>>>>> simply documents that memdev syntax is preferred.
> >>>>>
> >>>>> We have this habit of postulating absolutes like "can not deprecate"
> >>>>> instead of engaging with the tradeoffs.  We need to kick it.
> >>>>>
> >>>>> So let's have an actual look at the tradeoffs.
> >>>>>
> >>>>> We don't actually "support live migration of existing running VMs
> >>>>> indefinitely".
> >>>>>
> >>>>> We support live migration to any newer version of QEMU that still
> >>>>> supports the machine type.
> >>>>>
> >>>>> We support live migration to any older version of QEMU that already
> >>>>> supports the machine type and all the devices the machine uses.
> >>>>>
> >>>>> Aside: "support" is really an honest best effort here.  If you rely on
> >>>>> it, use a downstream that puts in the (substantial!) QA work real
> >>>>> support takes.
> >>>>>
> >>>>> Feature deprecation is not a contract to drop the feature after two
> >>>>> releases, or even five.  It's a formal notice that users of the feature
> >>>>> should transition to its replacement in an orderly manner.
> >>>>>
> >>>>> If I understand Igor correctly, all users should transition away from
> >>>>> outdated NUMA configurations at least for new VMs in an orderly manner.
> >>>> Yes, we can postpone removing options until there are machines type
> >>>> versions that were capable to use it (unfortunate but probably
> >>>> unavoidable unless there is a migration trick to make transition
> >>>> transparent) but that should not stop us from disabling broken
> >>>> options on new machine types at least.
> >>>>
> >>>> This series can serve as formal notice with follow up disabling of
> >>>> deprecated options for new machine types. (As Thomas noted, just warnings
> >>>> do not work and users continue to use broken features regardless whether
> >>>> they are don't know about issues or aware of it [*])
> >>>>
> >>>> Hence suggested deprecation approach and enforced rejection of legacy
> >>>> numa options for new machine types in 2 releases so users would stop
> >>>> using them eventually.
> >>>
> >>> When we deprecate something, we need to have a way for apps to use the
> >>> new alternative approach *at the same time*.  So even if we only want to
> >>> deprecate for new machine types, we still have to first solve the problem
> >>> of how mgmt apps will introspect QEMU to learn which machine types expect
> >>> the new options.
> >> I'm not aware any mechanism to introspect machine type options (existing
> >> or something being developed). Are/were there any ideas about it that were
> >> discussed in the past?
> >>
> >> Aside from developing a new mechanism what are alternative approaches?
> >> I mean when we delete deprecated CLI option, how it's solved on libvirt
> >> side currently?
> >>
> >> For example I don't see anything introspection related when we have been
> >> removing deprecated options recently.
> > 
> > Right, with other stuff we deprecate we've had a simpler time, as it
> > either didn't affect migration at all, or the new replacement stuff
> > was fully compatible with the migration data stream. IOW, libvirt
> > could unconditionally use the new feature as soon as it saw that it
> > exists in QEMU. We didn't have any machine type dependancy to deal
> > with before now.
> 
> We couldn't have done that. How we would migrate from older qemu?
> 
> Anyway, now that I look into this (esp. git log) I came accross:
> 
> commit f309db1f4d51009bad0d32e12efc75530b66836b
> Author:     Michal Privoznik <mprivozn@redhat.com>
> AuthorDate: Thu Dec 18 12:36:48 2014 +0100
> Commit:     Michal Privoznik <mprivozn@redhat.com>
> CommitDate: Fri Dec 19 07:44:44 2014 +0100
> 
>      qemu: Create memory-backend-{ram,file} iff needed
> 
> Or this 7832fac84741d65e851dbdbfaf474785cbfdcf3c. We did try to 
> generated newer cmd line but then for various reasong (e.g. avoiding 
> triggering a qemu bug) we turned it off and make libvirt default to 
> older (now deprecated) cmd line.
> 
> Frankly, I don't know how to proceed. Unless qemu is fixed to allow 
> migration from deprecated to new cmd line (unlikely, if not impossible, 
> right?) then I guess the only approach we can have is that:
In the non-NUMA case (or the 1-node case) there is a migration-compatible
solution; I'm working on it.

The only problematic case is where -mem-path + -mem-prealloc + pc-dimm are
used with the multi-node variant, i.e. the '-numa node,mem=xx' option or plain
'-numa node' with implicit RAM distribution (the same as 'mem', just done by
QEMU). Here I can't translate the options due to the different memory layout,
so these VMs would still continue using old 'mem' with old machine types.
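For context, the two command-line forms under discussion look roughly like
this (sizes and ids are purely illustrative; a sketch, not a tested invocation):

```sh
# legacy: ad-hoc partitioning of initial RAM ('mem'), no host-side binding
qemu-system-x86_64 -m 4G \
    -numa node,nodeid=0,mem=2G \
    -numa node,nodeid=1,mem=2G

# modern: one memory backend per node ('memdev'); prealloc and host-node
# binding policy can be expressed explicitly per backend
qemu-system-x86_64 -m 4G \
    -object memory-backend-ram,id=ram0,size=2G,prealloc=on \
    -object memory-backend-ram,id=ram1,size=2G,prealloc=on \
    -numa node,nodeid=0,memdev=ram0 \
    -numa node,nodeid=1,memdev=ram1
```

The migration streams of the two forms differ because the first registers one
RAMBlock while the second registers one per backend.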

> 1) whenever so called cold booting a new machine (fresh, brand new start 
> of a new domain) libvirt would default to modern cmd line,
That's what I'd prefer; I complained that this route wasn't followed.
(It would even work with the currently broken QEMU, since the modern CLI variant works.)

It would wean down the 'mem' user base that configures NUMA incorrectly (i.e. non-pinned memory & CPUs).
Users can still do that with 'memdev' by leaving out the bind policy, but
when they ask for pinning with a bind policy they will really get it, since
under the hood "-mem-path + -mem-prealloc" will become the same 'memdev,prealloc=on',
without surprises caused by the quite non-obvious call ordering.
 
> 2) on migration, libvirt would record in the migration stream (or status 
> XML or wherever) that modern cmd line was generated and thus it'll make 
> the destination generate modern cmd line too.
Since the legacy option is not removed, it's not necessary to convert
a legacy SOURCE to the new CLI (new QEMU would still accept the legacy CLI
for old machine types), i.e. business as usual.


> This solution still suffers a couple of problems:
> a) migration to older libvirt will fail as older libvirt won't recognize 
> the flag set in 2) and therefore would default to deprecated cmd line

> b) migrating from one host to another won't modernize the cmd line
I didn't know that there was a 'modernizing' feature in libvirt, but
modernizing the CLI is only necessary when we remove the option completely.


> But I guess we have to draw a line somewhere (if we are not willing to 
> write those migration patches).
I wasn't able to make migration from 1 to N MemoryRegions and back work.
So the only choice left is to start deprecating the NUMA 'mem' option with
new machine types, so that at least there memory pinning would not be broken
and it wouldn't be possible to misconfigure it.


> Michal
Daniel P. Berrangé March 7, 2019, 9:59 a.m. UTC | #32
On Wed, Mar 06, 2019 at 08:03:48PM +0100, Igor Mammedov wrote:
> On Mon, 4 Mar 2019 16:35:16 +0000
> Daniel P. Berrangé <berrange@redhat.com> wrote:
> 
> > On Mon, Mar 04, 2019 at 05:20:13PM +0100, Michal Privoznik wrote:
> > > We couldn't have done that. How we would migrate from older qemu?
> > > 
> > > Anyway, now that I look into this (esp. git log) I came accross:
> > > 
> > > commit f309db1f4d51009bad0d32e12efc75530b66836b
> > > Author:     Michal Privoznik <mprivozn@redhat.com>
> > > AuthorDate: Thu Dec 18 12:36:48 2014 +0100
> > > Commit:     Michal Privoznik <mprivozn@redhat.com>
> > > CommitDate: Fri Dec 19 07:44:44 2014 +0100
> > > 
> > >     qemu: Create memory-backend-{ram,file} iff needed
> > > 
> > > Or this 7832fac84741d65e851dbdbfaf474785cbfdcf3c. We did try to generated
> > > newer cmd line but then for various reasong (e.g. avoiding triggering a qemu
> > > bug) we turned it off and make libvirt default to older (now deprecated) cmd
> > > line.
> > > 
> > > Frankly, I don't know how to proceed. Unless qemu is fixed to allow
> > > migration from deprecated to new cmd line (unlikely, if not impossible,
> > > right?) then I guess the only approach we can have is that:
> > > 
> > > 1) whenever so called cold booting a new machine (fresh, brand new start of
> > > a new domain) libvirt would default to modern cmd line,
> > > 
> > > 2) on migration, libvirt would record in the migration stream (or status XML
> > > or wherever) that modern cmd line was generated and thus it'll make the
> > > destination generate modern cmd line too.
> > > 
> > > This solution still suffers a couple of problems:
> > > a) migration to older libvirt will fail as older libvirt won't recognize the
> > > flag set in 2) and therefore would default to deprecated cmd line
> > > b) migrating from one host to another won't modernize the cmd line
> > > 
> > > But I guess we have to draw a line somewhere (if we are not willing to write
> > > those migration patches).
> > 
> > Yeah supporting backwards migration is a non-optional requirement from at
> > least one of the mgmt apps using libvirt, so breaking the new to old case
> > is something we always aim to avoid.
> Aiming for support of 
> "new QEMU + new machine type" => "old QEMU + non-existing machine type"
> seems a bit difficult.

That's not the scenario that's the problem. The problem is

   new QEMU + new machine type + new libvirt   -> new QEMU + new machine type + old libvirt

Previously released versions of libvirt will happily use any new machine
type that QEMU introduces. So we can't make new libvirt use a different
options, only for new machine types, as old libvirt supports those machine
types too.

Regards,
Daniel
Markus Armbruster March 10, 2019, 10:14 a.m. UTC | #33
Daniel P. Berrangé <berrange@redhat.com> writes:

> On Mon, Mar 04, 2019 at 12:45:14PM +0100, Markus Armbruster wrote:
>> Daniel P. Berrangé <berrange@redhat.com> writes:
>> 
>> > On Mon, Mar 04, 2019 at 08:13:53AM +0100, Markus Armbruster wrote:
>> >> If we deprecate outdated NUMA configurations now, we can start rejecting
>> >> them with new machine types after a suitable grace period.
>> >
>> > How is libvirt going to know what machines it can use with the feature ?
>> > We don't have any way to introspect machine type specific logic, since we
>> > run all probing with "-machine none", and QEMU can't report anything about
>> > machines without instantiating them.
>> 
>> Fair point.  A practical way for management applications to decide which
>> of the two interfaces they can use with which machine type may be
>> required for deprecating one of the interfaces with new machine types.
>
> We currently have  "qom-list-properties" which can report on the
> existance of properties registered against object types. What it
> can't do though is report on the default values of these properties.

Yes.

> What's interesting though is that qmp_qom_list_properties will actually
> instantiate objects in order to query properties, if the type isn't an
> abstract type.

If it's an abstract type, qom-list-properties returns the properties
created with object_class_property_add() & friends, typically by the
class_init method.  This is possible without instantiating the type.

If it's a concrete type, qom-list-properties additionally returns the
properties created with object_property_add(), typically by the
instance_init() method.  This requires instantiating the type.

Both kinds of properties can be added or deleted at any time.  For
instance, setting a property value with object_property_set() or similar
could create additional properties.

For historical reasons, we use often use object_property_add() where
object_class_property_add() would do.  Sad.

> IOW, even if you are running "$QEMU -machine none", then if at the qmp-shell
> you do
>
>    (QEMU) qom-list-properties typename=pc-q35-2.6-machine
>
> it will have actually instantiate the pc-q35-2.6-machine machine type.
> Since it has instantiated the machine, the object initializer function
> will have run and initialized the default values for various properties.
>
> IOW, it is possible for qom-list-properties to report on default values
> for non-abstract types.

instance_init() also initializes the properties' values.
qom-list-properties could show these initial values (I hesitate calling
them default values).

Setting a property's value can change other properties' values by side
effect.

My point is: the properties qom-list-properties shows and the initial
values it could show are not necessarily final.  QOM is designed to be
maximally flexible, and flexibility brings along its bosom-buddy
complexity.

If you keep that in mind, qom-list-properties can be put to good use all
the same.

A way to report "default values" (really: whatever the values are after
object_new()) feels like a fair feature request to me, if backed by an
actual use case.
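For the record, the introspection being discussed is a plain QMP command; at
the time of this thread its reply lists property names and types, but not
their values (the typename below is just an example):

```json
{ "execute": "qom-list-properties",
  "arguments": { "typename": "pc-q35-2.6-machine" } }
```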

[...]
Markus Armbruster March 10, 2019, 10:16 a.m. UTC | #34
Daniel P. Berrangé <berrange@redhat.com> writes:

> On Wed, Mar 06, 2019 at 08:03:48PM +0100, Igor Mammedov wrote:
>> On Mon, 4 Mar 2019 16:35:16 +0000
>> Daniel P. Berrangé <berrange@redhat.com> wrote:
>> 
>> > On Mon, Mar 04, 2019 at 05:20:13PM +0100, Michal Privoznik wrote:
>> > > We couldn't have done that. How we would migrate from older qemu?
>> > > 
>> > > Anyway, now that I look into this (esp. git log) I came accross:
>> > > 
>> > > commit f309db1f4d51009bad0d32e12efc75530b66836b
>> > > Author:     Michal Privoznik <mprivozn@redhat.com>
>> > > AuthorDate: Thu Dec 18 12:36:48 2014 +0100
>> > > Commit:     Michal Privoznik <mprivozn@redhat.com>
>> > > CommitDate: Fri Dec 19 07:44:44 2014 +0100
>> > > 
>> > >     qemu: Create memory-backend-{ram,file} iff needed
>> > > 
>> > > Or this 7832fac84741d65e851dbdbfaf474785cbfdcf3c. We did try to generated
>> > > newer cmd line but then for various reasong (e.g. avoiding triggering a qemu
>> > > bug) we turned it off and make libvirt default to older (now deprecated) cmd
>> > > line.
>> > > 
>> > > Frankly, I don't know how to proceed. Unless qemu is fixed to allow
>> > > migration from deprecated to new cmd line (unlikely, if not impossible,
>> > > right?) then I guess the only approach we can have is that:
>> > > 
>> > > 1) whenever so called cold booting a new machine (fresh, brand new start of
>> > > a new domain) libvirt would default to modern cmd line,
>> > > 
>> > > 2) on migration, libvirt would record in the migration stream (or status XML
>> > > or wherever) that modern cmd line was generated and thus it'll make the
>> > > destination generate modern cmd line too.
>> > > 
>> > > This solution still suffers a couple of problems:
>> > > a) migration to older libvirt will fail as older libvirt won't recognize the
>> > > flag set in 2) and therefore would default to deprecated cmd line
>> > > b) migrating from one host to another won't modernize the cmd line
>> > > 
>> > > But I guess we have to draw a line somewhere (if we are not willing to write
>> > > those migration patches).
>> > 
>> > Yeah supporting backwards migration is a non-optional requirement from at
>> > least one of the mgmt apps using libvirt, so breaking the new to old case
>> > is something we always aim to avoid.
>> Aiming for support of 
>> "new QEMU + new machine type" => "old QEMU + non-existing machine type"
>> seems a bit difficult.
>
> That's not the scenario that's the problem. The problem is
>
>    new QEMU + new machine type + new libvirt   -> new QEMU + new machine type + old libvirt
>
> Previously released versions of libvirt will happily use any new machine
> type that QEMU introduces. So we can't make new libvirt use a different
> options, only for new machine types, as old libvirt supports those machine
> types too.

Avoiding tight coupling between QEMU und libvirt versions makes sense,
because having to upgrade stuff in lock-step is such a pain.

Does not imply we must support arbitrary combinations of QEMU and
libvirt versions.

Unless upstream libvirt's test matrix covers all versions of libvirt
against all released versions of QEMU, "previously released versions of
libvirt will continue to work with new QEMU" is largely an empty promise
anyway.  The real promise is more like "we won't break it intentionally;
good luck".

Mind, I'm not criticizing that real promise.  I'm criticizing cutting
yourself off from large areas of the solution space so you can continue
to pretend to yourself you actually deliver on the empty promise.

Now, if you limited what you promise to something more realistic,
ideally to something you actually test, we could talk about deprecation
schedules constructively.

For instance, if you promised

    QEMU as of time T + its latest machine type + libvirt as of time T
 -> QEMU as of time T + its latest machine type + libvirt as of time T - d

will work for a certain value of d, then once all released versions of
libvirt since T - d support a new way of doing things, flipping to that
new way becomes a whole lot easier.
Igor Mammedov March 14, 2019, 2:52 p.m. UTC | #35
On Sun, 10 Mar 2019 11:16:33 +0100
Markus Armbruster <armbru@redhat.com> wrote:

> Daniel P. Berrangé <berrange@redhat.com> writes:
> 
> > On Wed, Mar 06, 2019 at 08:03:48PM +0100, Igor Mammedov wrote:  
> >> On Mon, 4 Mar 2019 16:35:16 +0000
> >> Daniel P. Berrangé <berrange@redhat.com> wrote:
> >>   
> >> > On Mon, Mar 04, 2019 at 05:20:13PM +0100, Michal Privoznik wrote:  
> >> > > We couldn't have done that. How we would migrate from older qemu?
> >> > > 
> >> > > Anyway, now that I look into this (esp. git log) I came accross:
> >> > > 
> >> > > commit f309db1f4d51009bad0d32e12efc75530b66836b
> >> > > Author:     Michal Privoznik <mprivozn@redhat.com>
> >> > > AuthorDate: Thu Dec 18 12:36:48 2014 +0100
> >> > > Commit:     Michal Privoznik <mprivozn@redhat.com>
> >> > > CommitDate: Fri Dec 19 07:44:44 2014 +0100
> >> > > 
> >> > >     qemu: Create memory-backend-{ram,file} iff needed
> >> > > 
> >> > > Or this 7832fac84741d65e851dbdbfaf474785cbfdcf3c. We did try to generated
> >> > > newer cmd line but then for various reasong (e.g. avoiding triggering a qemu
> >> > > bug) we turned it off and make libvirt default to older (now deprecated) cmd
> >> > > line.
> >> > > 
> >> > > Frankly, I don't know how to proceed. Unless qemu is fixed to allow
> >> > > migration from deprecated to new cmd line (unlikely, if not impossible,
> >> > > right?) then I guess the only approach we can have is that:
> >> > > 
> >> > > 1) whenever so called cold booting a new machine (fresh, brand new start of
> >> > > a new domain) libvirt would default to modern cmd line,
> >> > > 
> >> > > 2) on migration, libvirt would record in the migration stream (or status XML
> >> > > or wherever) that modern cmd line was generated and thus it'll make the
> >> > > destination generate modern cmd line too.
> >> > > 
> >> > > This solution still suffers a couple of problems:
> >> > > a) migration to older libvirt will fail as older libvirt won't recognize the
> >> > > flag set in 2) and therefore would default to deprecated cmd line
> >> > > b) migrating from one host to another won't modernize the cmd line
> >> > > 
> >> > > But I guess we have to draw a line somewhere (if we are not willing to write
> >> > > those migration patches).  
> >> > 
> >> > Yeah supporting backwards migration is a non-optional requirement from at
> >> > least one of the mgmt apps using libvirt, so breaking the new to old case
> >> > is something we always aim to avoid.  
> >> Aiming for support of 
> >> "new QEMU + new machine type" => "old QEMU + non-existing machine type"
> >> seems a bit difficult.  
> >
> > That's not the scenario that's the problem. The problem is
> >
> >    new QEMU + new machine type + new libvirt   -> new QEMU + new machine type + old libvirt
> >
> > Previously released versions of libvirt will happily use any new machine
> > type that QEMU introduces. So we can't make new libvirt use a different
> > options, only for new machine types, as old libvirt supports those machine
> > types too.  
> 
> Avoiding tight coupling between QEMU und libvirt versions makes sense,
> because having to upgrade stuff in lock-step is such a pain.
> 
> Does not imply we must support arbitrary combinations of QEMU and
> libvirt versions.
Isn't it typically the job of a downstream to ship a bundle that works
together, and that is a rather limited set?
E.g.
  System 1: libvirt 0, QEMU 0 (machine 0.1 (latest)) could be migrated in 2 ways to
  System 2: libvirt 1, QEMU 1 (machine 0.1 (still the same old machine)).
While installing QEMU 1 on System 1 might work (if it doesn't break due to dependencies)
and might even be able to start machine 1.0, wouldn't that really fall into the unsupported category?


> Unless upstream libvirt's test matrix covers all versions of libvirt
> against all released versions of QEMU, "previously released versions of
> libvirt will continue to work with new QEMU" is largely an empty promise
> anyway.  The real promise is more like "we won't break it intentionally;
> good luck".
> 
> Mind, I'm not criticizing that real promise.  I'm criticizing cutting
> yourself off from large areas of the solution space so you can continue
> to pretend to yourself you actually deliver on the empty promise.
> 
> Now, if you limited what you promise to something more realistic,
> ideally to something you actually test, we could talk about deprecation
> schedules constructively.
> 
> For instance, if you promised
> 
>     QEMU as of time T + its latest machine type + libvirt as of time T
>  -> QEMU as of time T + its latest machine type + libvirt as of time T - d  
> 
> will work for a certain value of d, then once all released versions of
> libvirt since T - d support a new way of doing things, flipping to that
> new way becomes a whole lot easier.
On top of that, libvirt should be willing to adapt to the new way for the
above combination.

I'm experimenting with making 'mem' migration-compatible with 'memdev'
(I currently have a working hack) to get a feeling for the toll it would
take on QEMU; so far the outlook isn't good. It just adds more compat code
and new CLI options to deal with it (i.e. it just increases the amount of
maintenance nightmare for an indefinite period of time). And everyone would
hate adding such new CLI options, which, as it turns out, we aren't able to
deprecate and remove in practice.

We really need to figure out how to introduce a breaking change on the
management (CLI) side* in QEMU and make it digestible for libvirt and others.
(* at least for new machine types).
Igor Mammedov March 18, 2019, 4:44 p.m. UTC | #36
On Mon, 4 Mar 2019 14:52:30 +0100
Igor Mammedov <imammedo@redhat.com> wrote:

> On Fri, 1 Mar 2019 18:01:52 +0000
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> 
> > * Igor Mammedov (imammedo@redhat.com) wrote:  
> > > On Fri, 1 Mar 2019 15:49:47 +0000
> > > Daniel P. Berrangé <berrange@redhat.com> wrote:
> > >     
> > > > On Fri, Mar 01, 2019 at 04:42:15PM +0100, Igor Mammedov wrote:    
> > > > > The parameter allows to configure fake NUMA topology where guest
> > > > > VM simulates NUMA topology but not actually getting a performance
> > > > > benefits from it. The same or better results could be achieved
> > > > > using 'memdev' parameter. In light of that any VM that uses NUMA
> > > > > to get its benefits should use 'memdev' and to allow transition
> > > > > initial RAM to device based model, deprecate 'mem' parameter as
> > > > > its ad-hoc partitioning of initial RAM MemoryRegion can't be
> > > > > translated to memdev based backend transparently to users and in
> > > > > compatible manner (migration wise).
> > > > > 
> > > > > That will also allow to clean up a bit our numa code, leaving only
> > > > > 'memdev' impl. in place and several boards that use node_mem
> > > > > to generate FDT/ACPI description from it.      
> > > > 
> > > > Can you confirm that the  'mem' and 'memdev' parameters to -numa
> > > > are 100% live migration compatible in both directions ?  Libvirt
> > > > would need this to be the case in order to use the 'memdev' syntax
> > > > instead.    
> > > Unfortunately they are not migration compatible in any direction,
> > > if it where possible to translate them to each other I'd alias 'mem'
> > > to 'memdev' without deprecation. The former sends over only one
> > > MemoryRegion to target, while the later sends over several (one per
> > > memdev).
> > > 
> > > Mixed memory issue[1] first came from libvirt side RHBZ1624223,
> > > back then it was resolved on libvirt side in favor of migration
> > > compatibility vs correctness (i.e. bind policy doesn't work as expected).
> > > What worse that it was made default and affects all new machines,
> > > as I understood it.
> > > 
> > > In case of -mem-path + -mem-prealloc (with 1 numa node or numa less)
> > > it's possible on QEMU side to make conversion to memdev in migration
> > > compatible way (that's what stopped Michal from memdev approach).
> > > But it's hard to do so in multi-nodes case as amount of MemoryRegions
> > > is different.
> > > 
> > > Point is to consider 'mem' as mis-configuration error, as the user
> > > in the first place using broken numa configuration
> > > (i.e. fake numa configuration doesn't actually improve performance).
> > > 
> > > CCed David, maybe he could offer a way to do 1:n migration and other
> > > way around.    
> > 
> > I can't see a trivial way.
> > About the easiest I can think of is if you had a way to create a memdev
> > that was an alias to pc.ram (of a particular size and offset).  
> If I get you right that's what I was planning to do for numa-less machines
> that use -mem-path/prealloc options, where it's possible to replace
> an initial RAM MemoryRegion with a correspondingly named memdev and its
> backing MemoryRegion.

> But I don't see how it could work in the case of the legacy NUMA 'mem'
> option, where initial RAM is a single MemoryRegion (it's fake NUMA after
> all), and how to translate that into several MemoryRegions (one per node/memdev).
Limiting it to x86 for demo purposes:
What would work (if*) is to create a special MemoryRegion container, i.e.
  1. make the memory_region_init() in memory_region_allocate_system_memory()
     create that special container, which already has the id 'pc.ram' and a
     size matching the single RAMBlock with the same id in the incoming
     migration stream from the OLD QEMU (started with -numa node,mem=x ... options)
  2. register the container from "1" with vmstate_register_ram_global() (or
     another API) which, under the cover, makes the migration code split the
     single incoming RAMBlock into several smaller consecutive RAMBlocks
     represented by memdev backends that are mapped as subregions within the
     'pc.ram' container
  3. in the case of backward migration, the container MemoryRegion 'pc.ram'
     works the other way around, stitching the memdev subregions back into
     the single 'pc.ram' migration stream.
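The splitting in step "2" is just consecutive-slice arithmetic; a minimal sketch (sizes are made up, this is not QEMU code):

```shell
# Split one 4 GiB 'pc.ram' RAMBlock into two consecutive 2 GiB chunks,
# one (offset, size) slice per memdev-backed subregion.
total=$((4 << 30))
offset=0
for size in $((2 << 30)) $((2 << 30)); do
    echo "subregion offset=$offset size=$size"
    offset=$((offset + size))
done
# Sanity check: the chunks must cover the container exactly.
[ "$offset" -eq "$total" ] && echo "covers pc.ram exactly"
```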

(if*) - the above describes an ideal use case where the -numa node,mem values
are properly sized. In practice, though, QEMU doesn't perform any checks on
the value of NUMA's 'mem' option, so users were able to 'split' RAM into
arbitrary chunks that memdev-based backends might not be able to recreate
due to limitations of the backing storage (alignment/page size).
To make it worse, we don't really know what backend the source (old QEMU)
actually uses, as it might fall back to anonymous RAM if mem-path fails
(there is no fallback in the memdev case: the user gets what he/she asked
for, or a hard error).
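To illustrate the sizing problem: a hugepage-backed memdev can only recreate chunks whose size is a multiple of the backing page size, while 'mem=' accepted arbitrary values (numbers below are illustrative):

```shell
# 'mem=' accepted arbitrary sizes; a hugepage-backed memdev cannot
# recreate a chunk that isn't a multiple of the backing page size.
page=$((2 << 20))                   # 2 MiB hugepages
good=$((1536 << 20))                # 1536 MiB: multiple of 2 MiB
bad=$(( (1536 << 20) + 4096 ))      # 1536 MiB + 4 KiB: not a multiple
echo "good remainder: $((good % page))"   # 0 -> representable
echo "bad remainder:  $((bad % page))"    # 4096 -> cannot be recreated
```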

There might be other issues on the migration side of things as well,
but I just don't know enough about that area to see them.
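For reference, the replacement the deprecation points at looks roughly like this on the command line (node count and sizes are made up):

```shell
# Deprecated: ad-hoc split of initial RAM, fake NUMA only
qemu-system-x86_64 -m 4G \
    -numa node,nodeid=0,mem=2G \
    -numa node,nodeid=1,mem=2G

# Replacement: one backend per node, manageable on the host side
qemu-system-x86_64 -m 4G \
    -object memory-backend-ram,id=ram0,size=2G \
    -object memory-backend-ram,id=ram1,size=2G \
    -numa node,nodeid=0,memdev=ram0 \
    -numa node,nodeid=1,memdev=ram1
```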


> > Dave
> >   
> > >     
> > > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > > > > ---
> > > > >  numa.c               |  2 ++
> > > > >  qemu-deprecated.texi | 14 ++++++++++++++
> > > > >  2 files changed, 16 insertions(+)
> > > > > 
> > > > > diff --git a/numa.c b/numa.c
> > > > > index 3875e1e..2205773 100644
> > > > > --- a/numa.c
> > > > > +++ b/numa.c
> > > > > @@ -121,6 +121,8 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
> > > > >  
> > > > >      if (node->has_mem) {
> > > > >          numa_info[nodenr].node_mem = node->mem;
> > > > > +        warn_report("Parameter -numa node,mem is deprecated,"
> > > > > +                    " use -numa node,memdev instead");
> > > > >      }
> > > > >      if (node->has_memdev) {
> > > > >          Object *o;
> > > > > diff --git a/qemu-deprecated.texi b/qemu-deprecated.texi
> > > > > index 45c5795..73f99d4 100644
> > > > > --- a/qemu-deprecated.texi
> > > > > +++ b/qemu-deprecated.texi
> > > > > @@ -60,6 +60,20 @@ Support for invalid topologies will be removed, the user must ensure
> > > > >  topologies described with -smp include all possible cpus, i.e.
> > > > >    @math{@var{sockets} * @var{cores} * @var{threads} = @var{maxcpus}}.
> > > > >  
> > > > > +@subsection -numa node,mem=@var{size} (since 4.0)
> > > > > +
> > > > > +The parameter @option{mem} of @option{-numa node} is used to assign a part of
> > > > > +guest RAM to a NUMA node. But when using it, it's impossible to manage the
> > > > > +specified size on the host side (e.g. bind it to a host node, set a bind
> > > > > +policy, ...), so the guest ends up with a fake NUMA configuration with
> > > > > +suboptimal performance.
> > > > > +However, since 2014 there is an alternative way to assign RAM to a NUMA node
> > > > > +using the parameter @option{memdev}, which does the same as @option{mem} and
> > > > > +can actually manage node RAM on the host side. Use @option{memdev} with a
> > > > > +@var{memory-backend-ram} backend as a replacement for @option{mem} to achieve
> > > > > +the same fake NUMA effect, or with a properly configured
> > > > > +@var{memory-backend-file} backend to actually benefit from NUMA.
> > > > > +
> > > > >  @section QEMU Machine Protocol (QMP) commands
> > > > >  
> > > > >  @subsection block-dirty-bitmap-add "autoload" parameter (since 2.12.0)
> > > > > -- 
> > > > > 2.7.4
> > > > > 
> > > > 
> > > > Regards,
> > > > Daniel    
> > >     
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK  
> 
>
Igor Mammedov March 19, 2019, 2:17 p.m. UTC | #37
On Sun, 10 Mar 2019 11:14:08 +0100
Markus Armbruster <armbru@redhat.com> wrote:

> Daniel P. Berrangé <berrange@redhat.com> writes:
> 
> > On Mon, Mar 04, 2019 at 12:45:14PM +0100, Markus Armbruster wrote:  
> >> Daniel P. Berrangé <berrange@redhat.com> writes:
> >>   
> >> > On Mon, Mar 04, 2019 at 08:13:53AM +0100, Markus Armbruster wrote:  
> >> >> If we deprecate outdated NUMA configurations now, we can start rejecting
> >> >> them with new machine types after a suitable grace period.  
> >> >
> >> > How is libvirt going to know what machines it can use with the feature ?
> >> > We don't have any way to introspect machine type specific logic, since we
> >> > run all probing with "-machine none", and QEMU can't report anything about
> >> > machines without instantiating them.  
> >> 
> >> Fair point.  A practical way for management applications to decide which
> >> of the two interfaces they can use with which machine type may be
> >> required for deprecating one of the interfaces with new machine types.  
> >
> > We currently have  "qom-list-properties" which can report on the
> > existence of properties registered against object types. What it
> > can't do though is report on the default values of these properties.  
> 
> Yes.
> 
> > What's interesting though is that qmp_qom_list_properties will actually
> > instantiate objects in order to query properties, if the type isn't an
> > abstract type.  
> 
> If it's an abstract type, qom-list-properties returns the properties
> created with object_class_property_add() & friends, typically by the
> class_init method.  This is possible without instantiating the type.
> 
> If it's a concrete type, qom-list-properties additionally returns the
> properties created with object_property_add(), typically by the
> instance_init() method.  This requires instantiating the type.
> 
> Both kinds of properties can be added or deleted at any time.  For
> instance, setting a property value with object_property_set() or similar
> could create additional properties.
> 
> For historical reasons, we often use object_property_add() where
> object_class_property_add() would do.  Sad.
> 
> > IOW, even if you are running "$QEMU -machine none", then if at the qmp-shell
> > you do
> >
> >    (QEMU) qom-list-properties typename=pc-q35-2.6-machine
> >
> > it will actually have instantiated the pc-q35-2.6-machine machine type.
> > Since it has instantiated the machine, the object initializer function
> > will have run and initialized the default values for various properties.
> >
> > IOW, it is possible for qom-list-properties to report on default values
> > for non-abstract types.  
> 
> instance_init() also initializes the properties' values.
> qom-list-properties could show these initial values (I hesitate calling
> them default values).
> 
> Setting a property's value can change other properties' values by side
> effect.
> 
> My point is: the properties qom-list-properties shows and the initial
> values it could show are not necessarily final.  QOM is designed to be
> maximally flexible, and flexibility brings along its bosom-buddy
> complexity.
> 
> If you keep that in mind, qom-list-properties can be put to good use all
> the same.
> 
> A way to report "default values" (really: whatever the values are after
> object_new()) feels like a fair feature request to me, if backed by an
> actual use case.

Looks like trying to migrate from 'mem' to 'memdev' just creates another
train wreck (libvirt would have to hunt for the 'right' backend
configuration to make migration work, and even that would be a best-effort
attempt). If that worked reliably, I'd go for it, since it would allow
dropping the 'mem' code path altogether, but it doesn't look possible.
So I'll look into adding machine-level introspection and deprecating the
'mem' option for new machine types.
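For context, the machine-property probing Daniel describes above looks roughly like this in a qmp-shell session (a sketch only; the reported properties depend on the QEMU build):

```shell
# Start QEMU with a QMP socket but no machine instantiated yet
qemu-system-x86_64 -machine none -nodefaults \
    -qmp unix:/tmp/qmp.sock,server,nowait &

# qom-list-properties will instantiate the concrete machine type
# in order to report its instance properties
qmp-shell /tmp/qmp.sock
# (QEMU) qom-list-properties typename=pc-q35-2.6-machine
```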

> [...]
>
diff mbox series

Patch

diff --git a/numa.c b/numa.c
index 3875e1e..2205773 100644
--- a/numa.c
+++ b/numa.c
@@ -121,6 +121,8 @@  static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
 
     if (node->has_mem) {
         numa_info[nodenr].node_mem = node->mem;
+        warn_report("Parameter -numa node,mem is deprecated,"
+                    " use -numa node,memdev instead");
     }
     if (node->has_memdev) {
         Object *o;
diff --git a/qemu-deprecated.texi b/qemu-deprecated.texi
index 45c5795..73f99d4 100644
--- a/qemu-deprecated.texi
+++ b/qemu-deprecated.texi
@@ -60,6 +60,20 @@  Support for invalid topologies will be removed, the user must ensure
 topologies described with -smp include all possible cpus, i.e.
   @math{@var{sockets} * @var{cores} * @var{threads} = @var{maxcpus}}.
 
+@subsection -numa node,mem=@var{size} (since 4.0)
+
+The parameter @option{mem} of @option{-numa node} is used to assign a part of
+guest RAM to a NUMA node. But when using it, it's impossible to manage the
+specified size on the host side (e.g. bind it to a host node, set a bind
+policy, ...), so the guest ends up with a fake NUMA configuration with
+suboptimal performance.
+However, since 2014 there is an alternative way to assign RAM to a NUMA node
+using the parameter @option{memdev}, which does the same as @option{mem} and
+can actually manage node RAM on the host side. Use @option{memdev} with a
+@var{memory-backend-ram} backend as a replacement for @option{mem} to achieve
+the same fake NUMA effect, or with a properly configured
+@var{memory-backend-file} backend to actually benefit from NUMA.
+
 @section QEMU Machine Protocol (QMP) commands
 
 @subsection block-dirty-bitmap-add "autoload" parameter (since 2.12.0)