diff mbox

[RFC,v2] PPC: smp: autodetect numbers of threads per core

Message ID 1389245648-10300-1-git-send-email-aik@ozlabs.ru
State New
Headers show

Commit Message

Alexey Kardashevskiy Jan. 9, 2014, 5:34 a.m. UTC
On POWERPC, only a whole CPU core can be assigned to a KVM. Since
POWER7/8 support several threads per core, we want all threads of a core
to go to the same KVM so every time we run QEMU with -enable-kvm,
we have to add -smp X,threads=(4|8)" (4 for POWER7 and 8 for POWER8).

The user rather wants the maximum number of threads enabled, and
there is no easy way to know the maximum number supported by the hardware.
However the accelerator (KVM) knows this number and we can use it to
set the threads number to the maximum.

This adds a "threads_max" boolean switch to the "-smp" parameter which
tells QEMU to use the smp_threads value. The existing "threads" is not
changed to support a "max" string value in order to avoid duplicating
the parsing errors handling.

This adds smp_threads tweaking into POWERPC's kvm_arch_init().

This moves smp_parse() call in vl.c later to let the accelerator alter
the smp_threads value before actual "-smp" option parsing happens.

This does not change the default value of smp_threads which remains 1.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 qemu-options.hx  |  7 +++++--
 target-ppc/kvm.c |  2 ++
 vl.c             | 34 +++++++++++++++++++++++-----------
 3 files changed, 30 insertions(+), 13 deletions(-)

Comments

Mike D. Day Jan. 9, 2014, 9 p.m. UTC | #1
Alexey Kardashevskiy <aik@ozlabs.ru> writes:

>          /* compute missing values, prefer sockets over cores over threads */
>          if (cpus == 0 || sockets == 0) {
>              sockets = sockets > 0 ? sockets : 1;
>              cores = cores > 0 ? cores : 1;
> -            threads = threads > 0 ? threads : 1;
> +            if (threads_max) {
> +                if (threads > 0) {
> +                    fprintf(stderr, "Use either threads or threads_max\n");
> +                    exit(1);

If you went ahead with the threads="max" string option you wouldn't need
to check here for mutual excusivity and the user wouldn't need to worry
about an extra command options.

> +                }
> +                threads = smp_threads > 0 ? smp_threads : 1;
> +            } else {
> +                threads = threads > 0 ? threads : 1;
> +            }
>              if (cpus == 0) {
>                  cpus = cores * threads * sockets;
>              }
Alexey Kardashevskiy Jan. 9, 2014, 10:12 p.m. UTC | #2
On 01/10/2014 08:00 AM, Mike Day wrote:
> 
> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
> 
>>          /* compute missing values, prefer sockets over cores over threads */
>>          if (cpus == 0 || sockets == 0) {
>>              sockets = sockets > 0 ? sockets : 1;
>>              cores = cores > 0 ? cores : 1;
>> -            threads = threads > 0 ? threads : 1;
>> +            if (threads_max) {
>> +                if (threads > 0) {
>> +                    fprintf(stderr, "Use either threads or threads_max\n");
>> +                    exit(1);
> 
> If you went ahead with the threads="max" string option you wouldn't need
> to check here for mutual excusivity and the user wouldn't need to worry
> about an extra command options.


Is this the only concern and the rest is fine and can go to upstream? If
so, I'll fix it and repost.


>> +                }
>> +                threads = smp_threads > 0 ? smp_threads : 1;
>> +            } else {
>> +                threads = threads > 0 ? threads : 1;
>> +            }
>>              if (cpus == 0) {
>>                  cpus = cores * threads * sockets;
>>              }
>
Scott Wood Jan. 9, 2014, 10:50 p.m. UTC | #3
On Thu, 2014-01-09 at 16:34 +1100, Alexey Kardashevskiy wrote:
> On POWERPC, only a whole CPU core can be assigned to a KVM.

s/POWERPC/POWER/

-Scott
Alexander Graf Jan. 9, 2014, 11:40 p.m. UTC | #4
> Am 09.01.2014 um 23:12 schrieb Alexey Kardashevskiy <aik@ozlabs.ru>:
> 
>> On 01/10/2014 08:00 AM, Mike Day wrote:
>> 
>> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
>> 
>>>         /* compute missing values, prefer sockets over cores over threads */
>>>         if (cpus == 0 || sockets == 0) {
>>>             sockets = sockets > 0 ? sockets : 1;
>>>             cores = cores > 0 ? cores : 1;
>>> -            threads = threads > 0 ? threads : 1;
>>> +            if (threads_max) {
>>> +                if (threads > 0) {
>>> +                    fprintf(stderr, "Use either threads or threads_max\n");
>>> +                    exit(1);
>> 
>> If you went ahead with the threads="max" string option you wouldn't need
>> to check here for mutual excusivity and the user wouldn't need to worry
>> about an extra command options.
> 
> 
> Is this the only concern and the rest is fine and can go to upstream? If
> so, I'll fix it and repost.

What if we make the max thread count a property of our cpu class? The we can add a threads=max option which will be identical between kvm and tcg.

Overall I'm not yet fully convinced this whole idea is eventually going to improve the situation though.


Alex

> 
> 
>>> +                }
>>> +                threads = smp_threads > 0 ? smp_threads : 1;
>>> +            } else {
>>> +                threads = threads > 0 ? threads : 1;
>>> +            }
>>>             if (cpus == 0) {
>>>                 cpus = cores * threads * sockets;
>>>             }
> 
> 
> -- 
> Alexey
Alexey Kardashevskiy Jan. 10, 2014, 1:39 a.m. UTC | #5
On 01/10/2014 10:40 AM, Alexander Graf wrote:
> 
> 
>> Am 09.01.2014 um 23:12 schrieb Alexey Kardashevskiy <aik@ozlabs.ru>:
>>
>>> On 01/10/2014 08:00 AM, Mike Day wrote:
>>>
>>> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
>>>
>>>>         /* compute missing values, prefer sockets over cores over threads */
>>>>         if (cpus == 0 || sockets == 0) {
>>>>             sockets = sockets > 0 ? sockets : 1;
>>>>             cores = cores > 0 ? cores : 1;
>>>> -            threads = threads > 0 ? threads : 1;
>>>> +            if (threads_max) {
>>>> +                if (threads > 0) {
>>>> +                    fprintf(stderr, "Use either threads or threads_max\n");
>>>> +                    exit(1);
>>>
>>> If you went ahead with the threads="max" string option you wouldn't need
>>> to check here for mutual excusivity and the user wouldn't need to worry
>>> about an extra command options.
>>
>>
>> Is this the only concern and the rest is fine and can go to upstream? If
>> so, I'll fix it and repost.
> 

> What if we make the max thread count a property of our cpu class? The
> we
can add a threads=max option which will be identical between kvm and tcg.


You lost me here :)
Right now the sequence is:
1. smp_parse
2. config_accelerator
3. machine_init

I proposed
1. config_accelerator - reads max threads from KVM (and initializes "host"
type)
2. smp_parse - does the parsing using smp_threads tweaked in 1)
3. machine_init - creates CPUs which may or may be not "host".

From what you said I conclude that we add this "max" property only to the
"host" CPU type and then in smp_parse() we look if "host" CPU is going to
be used and if so, then we read "max" property from it and use it. But we
cannot read property for a class, only from an instance and it is not
created yet. And we also need to tweak the number depending on "compat".

And we still need "-smp ...threads_max" or "-smp ...threads=max" property
for "-smp".

Where am I wrong here?

> Overall I'm not yet fully convinced this whole idea is eventually going
> to improve the situation though.


My goal is to have a script which would automatically set threads to the
maximum value supported by the host hardware. Reading /proc/cpuinfo and
parsing for POWER6/7/8 and setting threads to 2/4/8 is lame for my taste
because what was the point of adding cap_ppc_smt at the first place then?



> 
> 
> Alex
> 
>>
>>
>>>> +                }
>>>> +                threads = smp_threads > 0 ? smp_threads : 1;
>>>> +            } else {
>>>> +                threads = threads > 0 ? threads : 1;
>>>> +            }
>>>>             if (cpus == 0) {
>>>>                 cpus = cores * threads * sockets;
>>>>             }
>>
>>
>> -- 
>> Alexey
Mike D. Day Jan. 10, 2014, 1:03 p.m. UTC | #6
Alexey Kardashevskiy <aik@ozlabs.ru> writes:

> On 01/10/2014 10:40 AM, Alexander Graf wrote:
>> 
>
>> What if we make the max thread count a property of our cpu class? The
>> we
> can add a threads=max option which will be identical between kvm and tcg.
>
>
> You lost me here :)
> Right now the sequence is:
> 1. smp_parse
> 2. config_accelerator
> 3. machine_init
>
> I proposed
> 1. config_accelerator - reads max threads from KVM (and initializes "host"
> type)
> 2. smp_parse - does the parsing using smp_threads tweaked in 1)
> 3. machine_init - creates CPUs which may or may be not "host".

The patch as it its now is very simple and well-contained. I wonder how
much it would expand if we added a max thread count to the cpu class. It
seems like the need for a max thread count is idiomatic to powerpc. 

Mike
Alexander Graf Jan. 10, 2014, 1:28 p.m. UTC | #7
On 10.01.2014, at 14:03, Mike Day <ncmike@ncultra.org> wrote:

> 
> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
> 
>> On 01/10/2014 10:40 AM, Alexander Graf wrote:
>>> 
>> 
>>> What if we make the max thread count a property of our cpu class? The
>>> we
>> can add a threads=max option which will be identical between kvm and tcg.
>> 
>> 
>> You lost me here :)
>> Right now the sequence is:
>> 1. smp_parse
>> 2. config_accelerator
>> 3. machine_init
>> 
>> I proposed
>> 1. config_accelerator - reads max threads from KVM (and initializes "host"
>> type)
>> 2. smp_parse - does the parsing using smp_threads tweaked in 1)
>> 3. machine_init - creates CPUs which may or may be not "host".
> 
> The patch as it its now is very simple and well-contained. I wonder how
> much it would expand if we added a max thread count to the cpu class. It
> seems like the need for a max thread count is idiomatic to powerpc. 

It's only ever useful on IBM POWER. Any other PowerPC system can partition vcpus by host threads, it's only IBM POWER hardware that's as broken as it is.

But really, what problem are you trying to solve here? Do you have users that don't understand the constraints they have? You will still have this even with this patch, as if you do threads=max as default for KVM (which is a bad idea, because it diverges from TCG) -smp 5 would still allocate 2 host cores on p7 and you're not effectively using your resources.

With the patch you're also not taking compat thread count into account. A POWER7 compat system would still need to manually specify threads=4, no?

I do see that the user experience is slightly suboptimal, but by creating this special case there's a good chance you're doing more harm than good.


Alex
Alexey Kardashevskiy Jan. 10, 2014, 1:42 p.m. UTC | #8
On 01/11/2014 12:28 AM, Alexander Graf wrote:
> 
> On 10.01.2014, at 14:03, Mike Day <ncmike@ncultra.org> wrote:
> 
>> 
>> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
>> 
>>> On 01/10/2014 10:40 AM, Alexander Graf wrote:
>>>> 
>>> 
>>>> What if we make the max thread count a property of our cpu class?
>>>> The we
>>> can add a threads=max option which will be identical between kvm and
>>> tcg.
>>> 
>>> 
>>> You lost me here :) Right now the sequence is: 1. smp_parse 2.
>>> config_accelerator 3. machine_init
>>> 
>>> I proposed 1. config_accelerator - reads max threads from KVM (and
>>> initializes "host" type) 2. smp_parse - does the parsing using
>>> smp_threads tweaked in 1) 3. machine_init - creates CPUs which may
>>> or may be not "host".
>> 
>> The patch as it its now is very simple and well-contained. I wonder
>> how much it would expand if we added a max thread count to the cpu
>> class. It seems like the need for a max thread count is idiomatic to
>> powerpc.
> 
> It's only ever useful on IBM POWER. Any other PowerPC system can
> partition vcpus by host threads, it's only IBM POWER hardware that's as
> broken as it is.
> 

> But really, what problem are you trying to solve here? Do you have users
> that don't understand the constraints they have? You will still have
> this even with this patch, as if you do threads=max as default for KVM
> (which is a bad idea, because it diverges from TCG) -smp 5 would still
> allocate 2 host cores on p7 and you're not effectively using your
> resources.

I am not changing the default here. I am adding an ability to choose the
maximum.


> With the patch you're also not taking compat thread count into account.
> A POWER7 compat system would still need to manually specify threads=4,
> no?

Nope. I will need to smp_threads=min(smp_threads, 4) (and I do this in my
patches which I think I already posted but I need to repost them) so if it
is 1 by default, it will be still 1.


> I do see that the user experience is slightly suboptimal, but by
> creating this special case there's a good chance you're doing more harm
> than good.

Again. I do not change the default. I add an additional option (which is
false by default) to use the maximum of the CPU. What harm can it possibly
make?

I am definitely missing your point, sorry.


> 
> 
> Alex
>
Alexander Graf Jan. 10, 2014, 2 p.m. UTC | #9
On 10.01.2014, at 14:42, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:

> On 01/11/2014 12:28 AM, Alexander Graf wrote:
>> 
>> On 10.01.2014, at 14:03, Mike Day <ncmike@ncultra.org> wrote:
>> 
>>> 
>>> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
>>> 
>>>> On 01/10/2014 10:40 AM, Alexander Graf wrote:
>>>>> 
>>>> 
>>>>> What if we make the max thread count a property of our cpu class?
>>>>> The we
>>>> can add a threads=max option which will be identical between kvm and
>>>> tcg.
>>>> 
>>>> 
>>>> You lost me here :) Right now the sequence is: 1. smp_parse 2.
>>>> config_accelerator 3. machine_init
>>>> 
>>>> I proposed 1. config_accelerator - reads max threads from KVM (and
>>>> initializes "host" type) 2. smp_parse - does the parsing using
>>>> smp_threads tweaked in 1) 3. machine_init - creates CPUs which may
>>>> or may be not "host".
>>> 
>>> The patch as it its now is very simple and well-contained. I wonder
>>> how much it would expand if we added a max thread count to the cpu
>>> class. It seems like the need for a max thread count is idiomatic to
>>> powerpc.
>> 
>> It's only ever useful on IBM POWER. Any other PowerPC system can
>> partition vcpus by host threads, it's only IBM POWER hardware that's as
>> broken as it is.
>> 
> 
>> But really, what problem are you trying to solve here? Do you have users
>> that don't understand the constraints they have? You will still have
>> this even with this patch, as if you do threads=max as default for KVM
>> (which is a bad idea, because it diverges from TCG) -smp 5 would still
>> allocate 2 host cores on p7 and you're not effectively using your
>> resources.
> 
> I am not changing the default here. I am adding an ability to choose the
> maximum.
> 
> 
>> With the patch you're also not taking compat thread count into account.
>> A POWER7 compat system would still need to manually specify threads=4,
>> no?
> 
> Nope. I will need to smp_threads=min(smp_threads, 4) (and I do this in my
> patches which I think I already posted but I need to repost them) so if it
> is 1 by default, it will be still 1.

Bleks. So suddenly we have one piece of code overriding magic that happens in another piece of code. This sounds like in 5 years from now nobody will have a clue what's going on here anymore :).

Can't we determine the number of "default threads" at a common place, preferably derived from cpu type?

I do remember that Anthony wanted to introduce a concept for "CPU sockets" once where you just plug in a 6-core p7 into a slot and that automatically gives you 6x4 threads of the specific cpu type. That's probably overkill for this scenario though :). Be aware that you're not the first person thinking along these lines.

> 
>> I do see that the user experience is slightly suboptimal, but by
>> creating this special case there's a good chance you're doing more harm
>> than good.
> 
> Again. I do not change the default. I add an additional option (which is
> false by default) to use the maximum of the CPU. What harm can it possibly
> make?
> 
> I am definitely missing your point, sorry.

In which case would this help anyone? I'm serious, please describe exactly the reason for this patch. I'm sure you had people ask you to write it or it fixes a nuisance you have yourself.


Alex
Mike D. Day Jan. 10, 2014, 2:12 p.m. UTC | #10
Alexander Graf <agraf@suse.de> writes:

>> The patch as it its now is very simple and well-contained. I wonder how
>> much it would expand if we added a max thread count to the cpu class. It
>> seems like the need for a max thread count is idiomatic to powerpc. 
>
> It's only ever useful on IBM POWER. Any other PowerPC system can partition vcpus by host threads, it's only IBM POWER hardware that's as broken as it is.


> I do see that the user experience is slightly suboptimal, but by creating this special case there's a good chance you're doing more harm than good.

All the problems you raise about special cases are true. But, as you
point out, ibm power is uniquely broken, which would possibly justify a
special case, unless there is a better general solution. In other words,
special cases exist for unique circumstances and if I understand you
correctly this is a unique circumstance.

Alexy explained the use case in his initial posting. The user needs to
allocate all threads of a core to an instance of KVM, but has no easy
way to obtain that information (threads per core) for the Qemu threads
option. So specifying threads="max" is a more user-friendly option. In
my understanding this is strictly a usability issue.

Mike
Alexey Kardashevskiy Jan. 10, 2014, 2:13 p.m. UTC | #11
On 01/11/2014 01:00 AM, Alexander Graf wrote:
> 
> On 10.01.2014, at 14:42, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> 
>> On 01/11/2014 12:28 AM, Alexander Graf wrote:
>>> 
>>> On 10.01.2014, at 14:03, Mike Day <ncmike@ncultra.org> wrote:
>>> 
>>>> 
>>>> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
>>>> 
>>>>> On 01/10/2014 10:40 AM, Alexander Graf wrote:
>>>>>> 
>>>>> 
>>>>>> What if we make the max thread count a property of our cpu
>>>>>> class? The we
>>>>> can add a threads=max option which will be identical between kvm
>>>>> and tcg.
>>>>> 
>>>>> 
>>>>> You lost me here :) Right now the sequence is: 1. smp_parse 2. 
>>>>> config_accelerator 3. machine_init
>>>>> 
>>>>> I proposed 1. config_accelerator - reads max threads from KVM
>>>>> (and initializes "host" type) 2. smp_parse - does the parsing
>>>>> using smp_threads tweaked in 1) 3. machine_init - creates CPUs
>>>>> which may or may be not "host".
>>>> 
>>>> The patch as it its now is very simple and well-contained. I
>>>> wonder how much it would expand if we added a max thread count to
>>>> the cpu class. It seems like the need for a max thread count is
>>>> idiomatic to powerpc.
>>> 
>>> It's only ever useful on IBM POWER. Any other PowerPC system can 
>>> partition vcpus by host threads, it's only IBM POWER hardware that's
>>> as broken as it is.
>>> 
>> 
>>> But really, what problem are you trying to solve here? Do you have
>>> users that don't understand the constraints they have? You will
>>> still have this even with this patch, as if you do threads=max as
>>> default for KVM (which is a bad idea, because it diverges from TCG)
>>> -smp 5 would still allocate 2 host cores on p7 and you're not
>>> effectively using your resources.
>> 
>> I am not changing the default here. I am adding an ability to choose
>> the maximum.
>> 
>> 
>>> With the patch you're also not taking compat thread count into
>>> account. A POWER7 compat system would still need to manually specify
>>> threads=4, no?
>> 
>> Nope. I will need to smp_threads=min(smp_threads, 4) (and I do this in
>> my patches which I think I already posted but I need to repost them)
>> so if it is 1 by default, it will be still 1.
> 
> Bleks. So suddenly we have one piece of code overriding magic that
> happens in another piece of code. This sounds like in 5 years from now
> nobody will have a clue what's going on here anymore :).


git blame will know.


> Can't we determine the number of "default threads" at a common place,
> preferably derived from cpu type?


We can do anything. I asked how exactly as I really (really) do not
understand the details.


> I do remember that Anthony wanted to introduce a concept for "CPU
> sockets" once where you just plug in a 6-core p7 into a slot and that
> automatically gives you 6x4 threads of the specific cpu type. That's
> probably overkill for this scenario though :). Be aware that you're not
> the first person thinking along these lines.


Sure I am not.


>>> I do see that the user experience is slightly suboptimal, but by 
>>> creating this special case there's a good chance you're doing more
>>> harm than good.
>> 
>> Again. I do not change the default. I add an additional option (which
>> is false by default) to use the maximum of the CPU. What harm can it
>> possibly make?
>> 
>> I am definitely missing your point, sorry.
> 

> In which case would this help anyone? I'm serious, please describe
> exactly the reason for this patch. I'm sure you had people ask you to
> write it or it fixes a nuisance you have yourself.


Right now by default people use 1 (one) threads from each core. -smp 8
takes 8 CPU cores instead of 8 threads of one CPU. How is it good?

And I'll ask again - how can the user know the maximum number of threads
per core otherwise than guessing it from the CPU model? Does any user space
tool print it?
Alexander Graf Jan. 10, 2014, 2:20 p.m. UTC | #12
On 10.01.2014, at 15:13, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:

> On 01/11/2014 01:00 AM, Alexander Graf wrote:
>> 
>> On 10.01.2014, at 14:42, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>> 
>>> On 01/11/2014 12:28 AM, Alexander Graf wrote:
>>>> 
>>>> On 10.01.2014, at 14:03, Mike Day <ncmike@ncultra.org> wrote:
>>>> 
>>>>> 
>>>>> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
>>>>> 
>>>>>> On 01/10/2014 10:40 AM, Alexander Graf wrote:
>>>>>>> 
>>>>>> 
>>>>>>> What if we make the max thread count a property of our cpu
>>>>>>> class? The we
>>>>>> can add a threads=max option which will be identical between kvm
>>>>>> and tcg.
>>>>>> 
>>>>>> 
>>>>>> You lost me here :) Right now the sequence is: 1. smp_parse 2. 
>>>>>> config_accelerator 3. machine_init
>>>>>> 
>>>>>> I proposed 1. config_accelerator - reads max threads from KVM
>>>>>> (and initializes "host" type) 2. smp_parse - does the parsing
>>>>>> using smp_threads tweaked in 1) 3. machine_init - creates CPUs
>>>>>> which may or may be not "host".
>>>>> 
>>>>> The patch as it its now is very simple and well-contained. I
>>>>> wonder how much it would expand if we added a max thread count to
>>>>> the cpu class. It seems like the need for a max thread count is
>>>>> idiomatic to powerpc.
>>>> 
>>>> It's only ever useful on IBM POWER. Any other PowerPC system can 
>>>> partition vcpus by host threads, it's only IBM POWER hardware that's
>>>> as broken as it is.
>>>> 
>>> 
>>>> But really, what problem are you trying to solve here? Do you have
>>>> users that don't understand the constraints they have? You will
>>>> still have this even with this patch, as if you do threads=max as
>>>> default for KVM (which is a bad idea, because it diverges from TCG)
>>>> -smp 5 would still allocate 2 host cores on p7 and you're not
>>>> effectively using your resources.
>>> 
>>> I am not changing the default here. I am adding an ability to choose
>>> the maximum.
>>> 
>>> 
>>>> With the patch you're also not taking compat thread count into
>>>> account. A POWER7 compat system would still need to manually specify
>>>> threads=4, no?
>>> 
>>> Nope. I will need to smp_threads=min(smp_threads, 4) (and I do this in
>>> my patches which I think I already posted but I need to repost them)
>>> so if it is 1 by default, it will be still 1.
>> 
>> Bleks. So suddenly we have one piece of code overriding magic that
>> happens in another piece of code. This sounds like in 5 years from now
>> nobody will have a clue what's going on here anymore :).
> 
> 
> git blame will know.
> 
> 
>> Can't we determine the number of "default threads" at a common place,
>> preferably derived from cpu type?
> 
> 
> We can do anything. I asked how exactly as I really (really) do not
> understand the details.

If we can override smp_threads with the compat= option we should be able to do so based on the cpu class a user selects as well, no?

> 
> 
>> I do remember that Anthony wanted to introduce a concept for "CPU
>> sockets" once where you just plug in a 6-core p7 into a slot and that
>> automatically gives you 6x4 threads of the specific cpu type. That's
>> probably overkill for this scenario though :). Be aware that you're not
>> the first person thinking along these lines.
> 
> 
> Sure I am not.
> 
> 
>>>> I do see that the user experience is slightly suboptimal, but by 
>>>> creating this special case there's a good chance you're doing more
>>>> harm than good.
>>> 
>>> Again. I do not change the default. I add an additional option (which
>>> is false by default) to use the maximum of the CPU. What harm can it
>>> possibly make?
>>> 
>>> I am definitely missing your point, sorry.
>> 
> 
>> In which case would this help anyone? I'm serious, please describe
>> exactly the reason for this patch. I'm sure you had people ask you to
>> write it or it fixes a nuisance you have yourself.
> 
> 
> Right now by default people use 1 (one) threads from each core. -smp 8
> takes 8 CPU cores instead of 8 threads of one CPU. How is it good?

It's not good, it's consistent. A user gets exactly what he specified. If I am on p7, I specify threads=4 (or not, depending on how fast I need single thread performance to be).

> And I'll ask again - how can the user know the maximum number of threads
> per core otherwise than guessing it from the CPU model? Does any user space
> tool print it?

How does the maximum number help the user at all? If he is on p8 and can do SMT2 granularity over cores (which IIRC was the granularity hardware allows you to do) why would he want to use threads=8 which is the maximum he can use?


Alex
Mike D. Day Jan. 10, 2014, 2:21 p.m. UTC | #13
On Fri, Jan 10, 2014 at 9:13 AM, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> On 01/11/2014 01:00 AM, Alexander Graf wrote:

>> Can't we determine the number of "default threads" at a common place,
>> preferably derived from cpu type?
>
> We can do anything. I asked how exactly as I really (really) do not
> understand the details.
>

Are you suggesting we create a dictionary with all the cpu type
information stored in it
(stepping, cores, threads, memory channels, caches) that we need to
keep updated?
Alexander Graf Jan. 10, 2014, 2:25 p.m. UTC | #14
On 10.01.2014, at 15:21, Mike Day <ncmike@ncultra.org> wrote:

> On Fri, Jan 10, 2014 at 9:13 AM, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>> On 01/11/2014 01:00 AM, Alexander Graf wrote:
> 
>>> Can't we determine the number of "default threads" at a common place,
>>> preferably derived from cpu type?
>> 
>> We can do anything. I asked how exactly as I really (really) do not
>> understand the details.
>> 
> 
> Are you suggesting we create a dictionary with all the cpu type
> information stored in it
> (stepping, cores, threads, memory channels, caches) that we need to
> keep updated?

We can always talk in extremes :). Today we have a dictionary of core types in QEMU. If a certain core type comes with a specific number of threads, that's a property of the core, no?


Alex
Mike D. Day Jan. 10, 2014, 2:29 p.m. UTC | #15
On Fri, Jan 10, 2014 at 9:25 AM, Alexander Graf <agraf@suse.de> wrote:
>>> On 01/11/2014 01:00 AM, Alexander Graf wrote:
>>
>>>> Can't we determine the number of "default threads" at a common place,
>>>> preferably derived from cpu type?
>>>
>>> We can do anything. I asked how exactly as I really (really) do not
>>> understand the details.
>>>
>>
>> Are you suggesting we create a dictionary with all the cpu type
>> information stored in it
>> (stepping, cores, threads, memory channels, caches) that we need to
>> keep updated?
>
> We can always talk in extremes :). Today we have a dictionary of core types in QEMU. If a certain core type comes with a specific number of threads, that's a property of the core, no?

Yes, very true. I'm just considering the eventual situation where each
cpu type has several or more child classes to represent different
configurations (threads, cores). That would be a lot more complicated
than now.

Mike
Alexey Kardashevskiy Jan. 10, 2014, 2:35 p.m. UTC | #16
On 01/11/2014 01:25 AM, Alexander Graf wrote:
> 
> On 10.01.2014, at 15:21, Mike Day <ncmike@ncultra.org> wrote:
> 
>> On Fri, Jan 10, 2014 at 9:13 AM, Alexey Kardashevskiy <aik@ozlabs.ru>
>> wrote:
>>> On 01/11/2014 01:00 AM, Alexander Graf wrote:
>> 
>>>> Can't we determine the number of "default threads" at a common
>>>> place, preferably derived from cpu type?
>>> 
>>> We can do anything. I asked how exactly as I really (really) do not 
>>> understand the details.
>>> 
>> 
>> Are you suggesting we create a dictionary with all the cpu type 
>> information stored in it (stepping, cores, threads, memory channels,
>> caches) that we need to keep updated?
> 
> We can always talk in extremes :). Today we have a dictionary of core
> types in QEMU. If a certain core type comes with a specific number of
> threads, that's a property of the core, no?


No, the max smt number comes from cap_ppc_smt, not from CPU type. PR KVM
may return something different (what does it return?) and TCG definitely
will not allow anything but 1. Up to the accelerator, not a CPU class.
diff mbox

Patch

diff --git a/qemu-options.hx b/qemu-options.hx
index bcfe9ea..8d582fa 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -73,16 +73,17 @@  Select CPU model (@code{-cpu help} for list and additional feature selection)
 ETEXI
 
 DEF("smp", HAS_ARG, QEMU_OPTION_smp,
-    "-smp [cpus=]n[,maxcpus=cpus][,cores=cores][,threads=threads][,sockets=sockets]\n"
+    "-smp [cpus=]n[,maxcpus=cpus][,cores=cores][,threads=threads][,threads_max=on|off][,sockets=sockets]\n"
     "                set the number of CPUs to 'n' [default=1]\n"
     "                maxcpus= maximum number of total cpus, including\n"
     "                offline CPUs for hotplug, etc\n"
     "                cores= number of CPU cores on one socket\n"
     "                threads= number of threads on one CPU core\n"
+    "                threads_max= set number of threads on one CPU core to maximum supported by accelerator\n"
     "                sockets= number of discrete sockets in the system\n",
         QEMU_ARCH_ALL)
 STEXI
-@item -smp [cpus=]@var{n}[,cores=@var{cores}][,threads=@var{threads}][,sockets=@var{sockets}][,maxcpus=@var{maxcpus}]
+@item -smp [cpus=]@var{n}[,cores=@var{cores}][,threads=@var{threads}][,threads_max=@var{threads_max}][,sockets=@var{sockets}][,maxcpus=@var{maxcpus}]
 @findex -smp
 Simulate an SMP system with @var{n} CPUs. On the PC target, up to 255
 CPUs are supported. On Sparc32 target, Linux limits the number of usable CPUs
@@ -92,6 +93,8 @@  of @var{threads} per cores and the total number of @var{sockets} can be
 specified. Missing values will be computed. If any on the three values is
 given, the total number of CPUs @var{n} can be omitted. @var{maxcpus}
 specifies the maximum number of hotpluggable CPUs.
+If the accelerator has a knowledge about the maximum number of threads per
+core supported, @var{threads_max} will use the maximum value.
 ETEXI
 
 DEF("numa", HAS_ARG, QEMU_OPTION_numa,
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index 781b72f..fa845fa 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -109,6 +109,8 @@  int kvm_arch_init(KVMState *s)
                         "VM to stall at times!\n");
     }
 
+    smp_threads = kvmppc_smt_threads();
+
     kvm_ppc_register_host_cpu_type();
 
     return 0;
diff --git a/vl.c b/vl.c
index c268949..57589aa 100644
--- a/vl.c
+++ b/vl.c
@@ -1381,6 +1381,9 @@  static QemuOptsList qemu_smp_opts = {
             .name = "threads",
             .type = QEMU_OPT_NUMBER,
         }, {
+            .name = "threads_max",
+            .type = QEMU_OPT_BOOL,
+        }, {
             .name = "maxcpus",
             .type = QEMU_OPT_NUMBER,
         },
@@ -1396,12 +1399,21 @@  static void smp_parse(QemuOpts *opts)
         unsigned sockets = qemu_opt_get_number(opts, "sockets", 0);
         unsigned cores   = qemu_opt_get_number(opts, "cores", 0);
         unsigned threads = qemu_opt_get_number(opts, "threads", 0);
+        bool threads_max = qemu_opt_get_bool(opts, "threads_max", false);
 
         /* compute missing values, prefer sockets over cores over threads */
         if (cpus == 0 || sockets == 0) {
             sockets = sockets > 0 ? sockets : 1;
             cores = cores > 0 ? cores : 1;
-            threads = threads > 0 ? threads : 1;
+            if (threads_max) {
+                if (threads > 0) {
+                    fprintf(stderr, "Use either threads or threads_max\n");
+                    exit(1);
+                }
+                threads = smp_threads > 0 ? smp_threads : 1;
+            } else {
+                threads = threads > 0 ? threads : 1;
+            }
             if (cpus == 0) {
                 cpus = cores * threads * sockets;
             }
@@ -3895,16 +3907,6 @@  int main(int argc, char **argv, char **envp)
         data_dir[data_dir_idx++] = CONFIG_QEMU_DATADIR;
     }
 
-    smp_parse(qemu_opts_find(qemu_find_opts("smp-opts"), NULL));
-
-    machine->max_cpus = machine->max_cpus ?: 1; /* Default to UP */
-    if (smp_cpus > machine->max_cpus) {
-        fprintf(stderr, "Number of SMP cpus requested (%d), exceeds max cpus "
-                "supported by machine `%s' (%d)\n", smp_cpus,  machine->name,
-                machine->max_cpus);
-        exit(1);
-    }
-
     /*
      * Get the default machine options from the machine if it is not already
      * specified either by the configuration file or by the command line.
@@ -4058,6 +4060,16 @@  int main(int argc, char **argv, char **envp)
         qtest_init(qtest_chrdev, qtest_log);
     }
 
+    smp_parse(qemu_opts_find(qemu_find_opts("smp-opts"), NULL));
+
+    machine->max_cpus = machine->max_cpus ?: 1; /* Default to UP */
+    if (smp_cpus > machine->max_cpus) {
+        fprintf(stderr, "Number of SMP cpus requested (%d), exceeds max cpus "
+                "supported by machine `%s' (%d)\n", smp_cpus,  machine->name,
+                machine->max_cpus);
+        exit(1);
+    }
+
     machine_opts = qemu_get_machine_opts();
     kernel_filename = qemu_opt_get(machine_opts, "kernel");
     initrd_filename = qemu_opt_get(machine_opts, "initrd");