diff mbox series

vl.c: make sure maxcpus matches topology to prevent migration failure

Message ID 1535035877-219196-1-git-send-email-imammedo@redhat.com
State New
Headers show
Series vl.c: make sure maxcpus matches topology to prevent migration failure | expand

Commit Message

Igor Mammedov Aug. 23, 2018, 2:51 p.m. UTC
Topology (threads*cores*sockets) must match maxcpus to be valid,
otherwise we could start QEMU with invalid topology that throws
a error on migration destination side, that should not be reachable:
Source:
  -smp 8,maxcpus=64,cores=1,threads=8,sockets=1
// hotplug cpus upto maxcpus
Destination:
  -smp 64,maxcpus=64,cores=1,threads=8,sockets=1
  qemu: cpu topology: sockets (1) * cores (1) * threads (8) < smp_cpus (64)

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 vl.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Comments

Paolo Bonzini Aug. 23, 2018, 4:32 p.m. UTC | #1
On 23/08/2018 16:51, Igor Mammedov wrote:
> Topology (threads*cores*sockets) must match maxcpus to be valid,
> otherwise we could start QEMU with invalid topology that throws
> a error on migration destination side, that should not be reachable:
> Source:
>   -smp 8,maxcpus=64,cores=1,threads=8,sockets=1
> // hotplug cpus upto maxcpus
> Destination:
>   -smp 64,maxcpus=64,cores=1,threads=8,sockets=1
>   qemu: cpu topology: sockets (1) * cores (1) * threads (8) < smp_cpus (64)

The destination should have sockets=8, shouldn't it?

It seems to me that, at startup, you should have cpus = s*t*c and cpus
<= maxcpus.  Currently we check cpus <= s*t*c <= maxcpus, which doesn't
make much sense.

Paolo

> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
>  vl.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/vl.c b/vl.c
> index 16b913f..2b35e0c 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -1238,10 +1238,10 @@ static void smp_parse(QemuOpts *opts)
>              exit(1);
>          }
>  
> -        if (sockets * cores * threads > max_cpus) {
> +        if (sockets * cores * threads != max_cpus) {
>              error_report("cpu topology: "
> -                         "sockets (%u) * cores (%u) * threads (%u) > "
> -                         "maxcpus (%u)",
> +                         "sockets (%u) * cores (%u) * threads (%u) not equal to"
> +                         " maxcpus (%u)",
>                           sockets, cores, threads, max_cpus);
>              exit(1);
>          }
>
Eduardo Habkost Aug. 23, 2018, 6:03 p.m. UTC | #2
On Thu, Aug 23, 2018 at 06:32:41PM +0200, Paolo Bonzini wrote:
> On 23/08/2018 16:51, Igor Mammedov wrote:
> > Topology (threads*cores*sockets) must match maxcpus to be valid,
> > otherwise we could start QEMU with invalid topology that throws
> > a error on migration destination side, that should not be reachable:
> > Source:
> >   -smp 8,maxcpus=64,cores=1,threads=8,sockets=1
> > // hotplug cpus upto maxcpus
> > Destination:
> >   -smp 64,maxcpus=64,cores=1,threads=8,sockets=1
> >   qemu: cpu topology: sockets (1) * cores (1) * threads (8) < smp_cpus (64)
> 
> The destination should have sockets=8, shouldn't it?
> 
> It seems to me that, at startup, you should have cpus = s*t*c and cpus
> <= maxcpus.  Currently we check cpus <= s*t*c <= maxcpus, which doesn't
> make much sense.

Most of the incompleteness of input validation at smp_parse() can
be explained by our fear of breaking existing configurations and
making existing running VMs not runnable.

But now we have a deprecation policy.  If we're still afraid of
breaking peoples' existing configurations, we should at least
deprecate those configurations as soon as possible (and make QEMU
at least emit a warning).
Igor Mammedov Aug. 24, 2018, 9:13 a.m. UTC | #3
On Thu, 23 Aug 2018 18:32:41 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 23/08/2018 16:51, Igor Mammedov wrote:
> > Topology (threads*cores*sockets) must match maxcpus to be valid,
> > otherwise we could start QEMU with invalid topology that throws
> > a error on migration destination side, that should not be reachable:
> > Source:
> >   -smp 8,maxcpus=64,cores=1,threads=8,sockets=1
> > // hotplug cpus upto maxcpus
> > Destination:
> >   -smp 64,maxcpus=64,cores=1,threads=8,sockets=1
> >   qemu: cpu topology: sockets (1) * cores (1) * threads (8) < smp_cpus (64)
This destination CLI aren't exactly correct as well since
it should've been exactly the same -smp as on source + a bunch of -device cpufoo...
so we can always say go fix your CLI so it won't trigger error.
  
> The destination should have sockets=8, shouldn't it?
either that or cores=8 or cores=4,sockets=2 ...
 
> It seems to me that, at startup, you should have cpus = s*t*c and cpus
> <= maxcpus.  Currently we check cpus <= s*t*c <= maxcpus, which doesn't
> make much sense.
I think that s*t*c should describe topology of whole machine
including not yet plugged vcpus. "cpus = s*t*c" probably won't work
for partially filled package case:
       -smp 1,cores=1,threads=8,sockets=1
cores/threads should reflect full package configuration
for guest to see an expected topology.


> Paolo
> 
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> >  vl.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> > 
> > diff --git a/vl.c b/vl.c
> > index 16b913f..2b35e0c 100644
> > --- a/vl.c
> > +++ b/vl.c
> > @@ -1238,10 +1238,10 @@ static void smp_parse(QemuOpts *opts)
> >              exit(1);
> >          }
> >  
> > -        if (sockets * cores * threads > max_cpus) {
> > +        if (sockets * cores * threads != max_cpus) {
> >              error_report("cpu topology: "
> > -                         "sockets (%u) * cores (%u) * threads (%u) > "
> > -                         "maxcpus (%u)",
> > +                         "sockets (%u) * cores (%u) * threads (%u) not equal to"
> > +                         " maxcpus (%u)",
> >                           sockets, cores, threads, max_cpus);
> >              exit(1);
> >          }
> >   
>
Igor Mammedov Aug. 24, 2018, 9:15 a.m. UTC | #4
On Thu, 23 Aug 2018 15:03:07 -0300
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Thu, Aug 23, 2018 at 06:32:41PM +0200, Paolo Bonzini wrote:
> > On 23/08/2018 16:51, Igor Mammedov wrote:  
> > > Topology (threads*cores*sockets) must match maxcpus to be valid,
> > > otherwise we could start QEMU with invalid topology that throws
> > > a error on migration destination side, that should not be reachable:
> > > Source:
> > >   -smp 8,maxcpus=64,cores=1,threads=8,sockets=1
> > > // hotplug cpus upto maxcpus
> > > Destination:
> > >   -smp 64,maxcpus=64,cores=1,threads=8,sockets=1
> > >   qemu: cpu topology: sockets (1) * cores (1) * threads (8) < smp_cpus (64)  
> > 
> > The destination should have sockets=8, shouldn't it?
> > 
> > It seems to me that, at startup, you should have cpus = s*t*c and cpus
> > <= maxcpus.  Currently we check cpus <= s*t*c <= maxcpus, which doesn't
> > make much sense.  
> 
> Most of the incompleteness of input validation at smp_parse() can
> be explained by our fear of breaking existing configurations and
> making existing running VMs not runnable.
> 
> But now we have a deprecation policy.  If we're still afraid of
> breaking peoples' existing configurations, we should at least
> deprecate those configurations as soon as possible (and make QEMU
> at least emit a warning).
Sure, I can send a deprecation patch first.
Eduardo Habkost Aug. 24, 2018, 11:11 a.m. UTC | #5
On Fri, Aug 24, 2018 at 11:13:50AM +0200, Igor Mammedov wrote:
> On Thu, 23 Aug 2018 18:32:41 +0200
> Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> > On 23/08/2018 16:51, Igor Mammedov wrote:
> > > Topology (threads*cores*sockets) must match maxcpus to be valid,
> > > otherwise we could start QEMU with invalid topology that throws
> > > a error on migration destination side, that should not be reachable:
> > > Source:
> > >   -smp 8,maxcpus=64,cores=1,threads=8,sockets=1
> > > // hotplug cpus upto maxcpus
> > > Destination:
> > >   -smp 64,maxcpus=64,cores=1,threads=8,sockets=1
> > >   qemu: cpu topology: sockets (1) * cores (1) * threads (8) < smp_cpus (64)
> This destination CLI aren't exactly correct as well since
> it should've been exactly the same -smp as on source + a bunch of -device cpufoo...
> so we can always say go fix your CLI so it won't trigger error.
>   
> > The destination should have sockets=8, shouldn't it?
> either that or cores=8 or cores=4,sockets=2 ...
>  
> > It seems to me that, at startup, you should have cpus = s*t*c and cpus
> > <= maxcpus.  Currently we check cpus <= s*t*c <= maxcpus, which doesn't
> > make much sense.
> I think that s*t*c should describe topology of whole machine
> including not yet plugged vcpus. "cpus = s*t*c" probably won't work
> for partially filled package case:
>        -smp 1,cores=1,threads=8,sockets=1
> cores/threads should reflect full package configuration
> for guest to see an expected topology.

Oh, now I remember: that's the reason we don't enforce
s*t*c == smp_cpus nor s*t*c == max_cpus.

Both "-smp 4,maxcpus=8,cores=2,threads=2,sockets=1" and
     "-smp 4,maxcpus=8,cores=2,threads=2,sockets=2"
worked since maxcpus was introduced, making the semantics of
"sockets" unclear and hard to change without breaking existing
configs.
Igor Mammedov Aug. 24, 2018, 11:26 a.m. UTC | #6
On Fri, 24 Aug 2018 08:11:48 -0300
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Fri, Aug 24, 2018 at 11:13:50AM +0200, Igor Mammedov wrote:
> > On Thu, 23 Aug 2018 18:32:41 +0200
> > Paolo Bonzini <pbonzini@redhat.com> wrote:
> >   
> > > On 23/08/2018 16:51, Igor Mammedov wrote:  
> > > > Topology (threads*cores*sockets) must match maxcpus to be valid,
> > > > otherwise we could start QEMU with invalid topology that throws
> > > > a error on migration destination side, that should not be reachable:
> > > > Source:
> > > >   -smp 8,maxcpus=64,cores=1,threads=8,sockets=1
> > > > // hotplug cpus upto maxcpus
> > > > Destination:
> > > >   -smp 64,maxcpus=64,cores=1,threads=8,sockets=1
> > > >   qemu: cpu topology: sockets (1) * cores (1) * threads (8) < smp_cpus (64)  
> > This destination CLI aren't exactly correct as well since
> > it should've been exactly the same -smp as on source + a bunch of -device cpufoo...
> > so we can always say go fix your CLI so it won't trigger error.
> >     
> > > The destination should have sockets=8, shouldn't it?  
> > either that or cores=8 or cores=4,sockets=2 ...
> >    
> > > It seems to me that, at startup, you should have cpus = s*t*c and cpus
> > > <= maxcpus.  Currently we check cpus <= s*t*c <= maxcpus, which doesn't
> > > make much sense.  
> > I think that s*t*c should describe topology of whole machine
> > including not yet plugged vcpus. "cpus = s*t*c" probably won't work
> > for partially filled package case:
> >        -smp 1,cores=1,threads=8,sockets=1
> > cores/threads should reflect full package configuration
> > for guest to see an expected topology.  
> 
> Oh, now I remember: that's the reason we don't enforce
> s*t*c == smp_cpus nor s*t*c == max_cpus.
> 
> Both "-smp 4,maxcpus=8,cores=2,threads=2,sockets=1" and
>      "-smp 4,maxcpus=8,cores=2,threads=2,sockets=2"
> worked since maxcpus was introduced, making the semantics of
> "sockets" unclear and hard to change without breaking existing
> configs.
Should we go with deprication thingy then,
so we could make it clear in the future?
Eduardo Habkost Aug. 24, 2018, 1:53 p.m. UTC | #7
On Fri, Aug 24, 2018 at 01:26:54PM +0200, Igor Mammedov wrote:
> On Fri, 24 Aug 2018 08:11:48 -0300
> Eduardo Habkost <ehabkost@redhat.com> wrote:
> 
> > On Fri, Aug 24, 2018 at 11:13:50AM +0200, Igor Mammedov wrote:
> > > On Thu, 23 Aug 2018 18:32:41 +0200
> > > Paolo Bonzini <pbonzini@redhat.com> wrote:
> > >   
> > > > On 23/08/2018 16:51, Igor Mammedov wrote:  
> > > > > Topology (threads*cores*sockets) must match maxcpus to be valid,
> > > > > otherwise we could start QEMU with invalid topology that throws
> > > > > a error on migration destination side, that should not be reachable:
> > > > > Source:
> > > > >   -smp 8,maxcpus=64,cores=1,threads=8,sockets=1
> > > > > // hotplug cpus upto maxcpus
> > > > > Destination:
> > > > >   -smp 64,maxcpus=64,cores=1,threads=8,sockets=1
> > > > >   qemu: cpu topology: sockets (1) * cores (1) * threads (8) < smp_cpus (64)  
> > > This destination CLI aren't exactly correct as well since
> > > it should've been exactly the same -smp as on source + a bunch of -device cpufoo...
> > > so we can always say go fix your CLI so it won't trigger error.
> > >     
> > > > The destination should have sockets=8, shouldn't it?  
> > > either that or cores=8 or cores=4,sockets=2 ...
> > >    
> > > > It seems to me that, at startup, you should have cpus = s*t*c and cpus
> > > > <= maxcpus.  Currently we check cpus <= s*t*c <= maxcpus, which doesn't
> > > > make much sense.  
> > > I think that s*t*c should describe topology of whole machine
> > > including not yet plugged vcpus. "cpus = s*t*c" probably won't work
> > > for partially filled package case:
> > >        -smp 1,cores=1,threads=8,sockets=1
> > > cores/threads should reflect full package configuration
> > > for guest to see an expected topology.  
> > 
> > Oh, now I remember: that's the reason we don't enforce
> > s*t*c == smp_cpus nor s*t*c == max_cpus.
> > 
> > Both "-smp 4,maxcpus=8,cores=2,threads=2,sockets=1" and
> >      "-smp 4,maxcpus=8,cores=2,threads=2,sockets=2"
> > worked since maxcpus was introduced, making the semantics of
> > "sockets" unclear and hard to change without breaking existing
> > configs.
> Should we go with deprication thingy then,
> so we could make it clear in the future?

Yes, but I'm not sure which option we should adopt
(s*t*c == smp_cpus or s*t*c == max_cpus).

Does anybody know what's the semantics expected by libvirt today?
Daniel P. Berrangé Aug. 24, 2018, 2:03 p.m. UTC | #8
On Fri, Aug 24, 2018 at 10:53:32AM -0300, Eduardo Habkost wrote:
> On Fri, Aug 24, 2018 at 01:26:54PM +0200, Igor Mammedov wrote:
> > On Fri, 24 Aug 2018 08:11:48 -0300
> > Eduardo Habkost <ehabkost@redhat.com> wrote:
> > 
> > > On Fri, Aug 24, 2018 at 11:13:50AM +0200, Igor Mammedov wrote:
> > > > On Thu, 23 Aug 2018 18:32:41 +0200
> > > > Paolo Bonzini <pbonzini@redhat.com> wrote:
> > > >   
> > > > > On 23/08/2018 16:51, Igor Mammedov wrote:  
> > > > > > Topology (threads*cores*sockets) must match maxcpus to be valid,
> > > > > > otherwise we could start QEMU with invalid topology that throws
> > > > > > a error on migration destination side, that should not be reachable:
> > > > > > Source:
> > > > > >   -smp 8,maxcpus=64,cores=1,threads=8,sockets=1
> > > > > > // hotplug cpus upto maxcpus
> > > > > > Destination:
> > > > > >   -smp 64,maxcpus=64,cores=1,threads=8,sockets=1
> > > > > >   qemu: cpu topology: sockets (1) * cores (1) * threads (8) < smp_cpus (64)  
> > > > This destination CLI aren't exactly correct as well since
> > > > it should've been exactly the same -smp as on source + a bunch of -device cpufoo...
> > > > so we can always say go fix your CLI so it won't trigger error.
> > > >     
> > > > > The destination should have sockets=8, shouldn't it?  
> > > > either that or cores=8 or cores=4,sockets=2 ...
> > > >    
> > > > > It seems to me that, at startup, you should have cpus = s*t*c and cpus
> > > > > <= maxcpus.  Currently we check cpus <= s*t*c <= maxcpus, which doesn't
> > > > > make much sense.  
> > > > I think that s*t*c should describe topology of whole machine
> > > > including not yet plugged vcpus. "cpus = s*t*c" probably won't work
> > > > for partially filled package case:
> > > >        -smp 1,cores=1,threads=8,sockets=1
> > > > cores/threads should reflect full package configuration
> > > > for guest to see an expected topology.  
> > > 
> > > Oh, now I remember: that's the reason we don't enforce
> > > s*t*c == smp_cpus nor s*t*c == max_cpus.
> > > 
> > > Both "-smp 4,maxcpus=8,cores=2,threads=2,sockets=1" and
> > >      "-smp 4,maxcpus=8,cores=2,threads=2,sockets=2"
> > > worked since maxcpus was introduced, making the semantics of
> > > "sockets" unclear and hard to change without breaking existing
> > > configs.
> > Should we go with deprication thingy then,
> > so we could make it clear in the future?
> 
> Yes, but I'm not sure which option we should adopt
> (s*t*c == smp_cpus or s*t*c == max_cpus).
> 
> Does anybody know what's the semantics expected by libvirt today?

Libvirt requires s*c*t to equal the total number of possible
CPUs, *not* the currently plugged number.

ie

Valid:

  <vcpu placement='static' current='16'>32</vcpu>
  <cpu>
    <topology sockets='4' cores='4' threads='2'/>
  </cpu>

Invalid:

  <vcpu placement='static' current='32'>64</vcpu>
  <cpu>
    <topology sockets='4' cores='4' threads='2'/>
  </cpu>

Test with:

$ virsh  edit QEMUGuest1
error: unsupported configuration: CPU topology doesn't match maximum vcpu count
Failed. Try again? [y,n,i,f,?]: 

Regards,
Daniel
Igor Mammedov Aug. 24, 2018, 3:24 p.m. UTC | #9
On Fri, 24 Aug 2018 10:53:32 -0300
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Fri, Aug 24, 2018 at 01:26:54PM +0200, Igor Mammedov wrote:
> > On Fri, 24 Aug 2018 08:11:48 -0300
> > Eduardo Habkost <ehabkost@redhat.com> wrote:
> >   
> > > On Fri, Aug 24, 2018 at 11:13:50AM +0200, Igor Mammedov wrote:  
> > > > On Thu, 23 Aug 2018 18:32:41 +0200
> > > > Paolo Bonzini <pbonzini@redhat.com> wrote:
> > > >     
> > > > > On 23/08/2018 16:51, Igor Mammedov wrote:    
> > > > > > Topology (threads*cores*sockets) must match maxcpus to be valid,
> > > > > > otherwise we could start QEMU with invalid topology that throws
> > > > > > a error on migration destination side, that should not be reachable:
> > > > > > Source:
> > > > > >   -smp 8,maxcpus=64,cores=1,threads=8,sockets=1
> > > > > > // hotplug cpus upto maxcpus
> > > > > > Destination:
> > > > > >   -smp 64,maxcpus=64,cores=1,threads=8,sockets=1
> > > > > >   qemu: cpu topology: sockets (1) * cores (1) * threads (8) < smp_cpus (64)    
> > > > This destination CLI aren't exactly correct as well since
> > > > it should've been exactly the same -smp as on source + a bunch of -device cpufoo...
> > > > so we can always say go fix your CLI so it won't trigger error.
> > > >       
> > > > > The destination should have sockets=8, shouldn't it?    
> > > > either that or cores=8 or cores=4,sockets=2 ...
> > > >      
> > > > > It seems to me that, at startup, you should have cpus = s*t*c and cpus
> > > > > <= maxcpus.  Currently we check cpus <= s*t*c <= maxcpus, which doesn't
> > > > > make much sense.    
> > > > I think that s*t*c should describe topology of whole machine
> > > > including not yet plugged vcpus. "cpus = s*t*c" probably won't work
> > > > for partially filled package case:
> > > >        -smp 1,cores=1,threads=8,sockets=1
> > > > cores/threads should reflect full package configuration
> > > > for guest to see an expected topology.    
> > > 
> > > Oh, now I remember: that's the reason we don't enforce
> > > s*t*c == smp_cpus nor s*t*c == max_cpus.
> > > 
> > > Both "-smp 4,maxcpus=8,cores=2,threads=2,sockets=1" and
> > >      "-smp 4,maxcpus=8,cores=2,threads=2,sockets=2"
> > > worked since maxcpus was introduced, making the semantics of
> > > "sockets" unclear and hard to change without breaking existing
> > > configs.  
> > Should we go with deprication thingy then,
> > so we could make it clear in the future?  
> 
> Yes, but I'm not sure which option we should adopt
> (s*t*c == smp_cpus or s*t*c == max_cpus).
s*t*c == smp_cpus is wrong as one won't be able to start QEMU
with partial package and then hotplug the rest when needed.
[...]
Andrew Jones Aug. 27, 2018, 11:21 a.m. UTC | #10
On Thu, Aug 23, 2018 at 03:03:07PM -0300, Eduardo Habkost wrote:
> On Thu, Aug 23, 2018 at 06:32:41PM +0200, Paolo Bonzini wrote:
> > On 23/08/2018 16:51, Igor Mammedov wrote:
> > > Topology (threads*cores*sockets) must match maxcpus to be valid,
> > > otherwise we could start QEMU with invalid topology that throws
> > > a error on migration destination side, that should not be reachable:
> > > Source:
> > >   -smp 8,maxcpus=64,cores=1,threads=8,sockets=1
> > > // hotplug cpus upto maxcpus
> > > Destination:
> > >   -smp 64,maxcpus=64,cores=1,threads=8,sockets=1
> > >   qemu: cpu topology: sockets (1) * cores (1) * threads (8) < smp_cpus (64)
> > 
> > The destination should have sockets=8, shouldn't it?
> > 
> > It seems to me that, at startup, you should have cpus = s*t*c and cpus
> > <= maxcpus.  Currently we check cpus <= s*t*c <= maxcpus, which doesn't
> > make much sense.
> 
> Most of the incompleteness of input validation at smp_parse() can
> be explained by our fear of breaking existing configurations and
> making existing running VMs not runnable.
> 
> But now we have a deprecation policy.  If we're still afraid of
> breaking peoples' existing configurations, we should at least
> deprecate those configurations as soon as possible (and make QEMU
> at least emit a warning).
>

A million years ago (well > 2 anyway) when I was thinking about doing some
'-smp' improvements I tried to address this without breaking existing
configs. Here's how I approached it

 https://lists.gnu.org/archive/html/qemu-ppc/2016-06/msg00317.html

Also, there's another similar fix needed in smbios generation. See

 https://lists.gnu.org/archive/html/qemu-ppc/2016-06/msg00322.html

Thanks,
drew
diff mbox series

Patch

diff --git a/vl.c b/vl.c
index 16b913f..2b35e0c 100644
--- a/vl.c
+++ b/vl.c
@@ -1238,10 +1238,10 @@  static void smp_parse(QemuOpts *opts)
             exit(1);
         }
 
-        if (sockets * cores * threads > max_cpus) {
+        if (sockets * cores * threads != max_cpus) {
             error_report("cpu topology: "
-                         "sockets (%u) * cores (%u) * threads (%u) > "
-                         "maxcpus (%u)",
+                         "sockets (%u) * cores (%u) * threads (%u) not equal to"
+                         " maxcpus (%u)",
                          sockets, cores, threads, max_cpus);
             exit(1);
         }