[libvirt] Modern CPU models cannot be used with libvirt

Message ID 20120310124246.GA4408@redhat.com
State New

Commit Message

Daniel P. Berrangé March 10, 2012, 12:42 p.m. UTC
On Fri, Mar 09, 2012 at 09:04:03PM +0000, Daniel P. Berrange wrote:
> On Fri, Mar 09, 2012 at 05:56:52PM -0300, Eduardo Habkost wrote:
> > Resurrecting an old thread:
> > 
> > I didn't see any clear conclusion in this thread (this is why I am
> > resurrecting it), except that many were arguing that libvirt should
> > simply copy and/or generate the CPU model definitions from Qemu. I
> > really don't think it's reasonable to expect that.
> > 
> > On Thu, Dec 15, 2011 at 03:54:15PM +0100, Jiri Denemark wrote:
> > > Hi,
> > > 
> > > Recently I realized that all modern CPU models defined in
> > > /etc/qemu/target-x86_64.conf are useless when qemu is used through libvirt.
> > > That's because we start qemu with -nodefconfig which results in qemu ignoring
> > > that file with CPU model definitions. We have a very good reason for using
> > > -nodefconfig because we need to control the ABI presented to a guest OS and we
> > > don't want any configuration file that can contain lots of things including
> > > device definitions to be read by qemu. However, we would really like the new
> > > CPU models to be understood by qemu even if used through libvirt. What would
> > > be the best way to solve this?
> > > 
> > > I suspect this could have been already discussed in the past but obviously a
> > > workable solution was either not found or just not implemented.
> > 
> > So, our problem today is basically:
> > 
> > A) libvirt uses -nodefconfig;
> > B) -nodefconfig makes Qemu not load the config file containing the CPU
> >    model definitions; and
> > C) libvirt expects the full CPU model list from Qemu to be available.
> 
> I could have sworn we had this discussion a year ago or so, and had decided
> that the default CPU models would be in something like /usr/share/qemu/cpu-x86_64.conf
> and loaded regardless of the -nodefconfig setting. /etc/qemu/target-x86_64.conf
> would be solely for end user configuration changes, not for QEMU builtin
> defaults.
> 
> But looking at the code in QEMU, it doesn't seem we ever implemented this ?

Arrrgggh. It seems this was implemented as a patch in RHEL-6 qemu RPMs but,
contrary to our normal RHEL development practice, it was not based on
a cherry-pick of an upstream patch :-(

For the sake of reference, I'm attaching the two patches from the RHEL6 source
RPM that do what I'm describing.

NB, I'm not necessarily advocating these patches for upstream. I still
maintain that libvirt should write out a config file containing the
exact CPU model description it desires and specify that with -readconfig.
The end result would be identical from QEMU's POV and it would avoid
playing games with QEMU's config loading code.
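
A minimal sketch of that approach, assuming the [cpudef] section syntax
used in target-x86_64.conf. The model name, the field values, the file
path and the qemu-kvm binary name are all illustrative assumptions, not
a tested CPU definition:

```python
def render_cpudef(name, fields):
    """Render one [cpudef] section in the style of target-x86_64.conf."""
    lines = ["[cpudef]", f'   name = "{name}"']
    for key, value in fields.items():
        lines.append(f'   {key} = "{value}"')
    return "\n".join(lines) + "\n"


def qemu_args(config_path, cpu_name):
    """Arguments a management layer could pass (binary name is an assumption)."""
    return ["qemu-kvm", "-nodefconfig", "-readconfig", config_path,
            "-cpu", cpu_name]


# Hypothetical, heavily shortened model definition; values are illustrative.
fields = {
    "level": "4",
    "vendor": "GenuineIntel",
    "family": "6",
    "model": "2",
    "stepping": "3",
    "feature_edx": "sse2 sse fxsr mmx",
    "feature_ecx": "sse3 ssse3",
    "xlevel": "0x8000000A",
}
config_text = render_cpudef("ExampleCPU", fields)
# Hypothetical per-guest path chosen by the management layer.
args = qemu_args("/var/run/libvirt/qemu/guest-cpu.conf", "ExampleCPU")

print(config_text)
print(" ".join(args))
```

The point of the sketch is only that the definition travels with the
guest: QEMU still ignores its default config files, but sees exactly the
model the management layer wrote out.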

Regards,
Daniel

Comments

Eduardo Habkost March 10, 2012, 3:58 p.m. UTC | #1
On Sat, Mar 10, 2012 at 12:42:46PM +0000, Daniel P. Berrange wrote:
> > 
> > I could have sworn we had this discussion a year ago or so, and had decided
> > that the default CPU models would be in something like /usr/share/qemu/cpu-x86_64.conf
> > and loaded regardless of the -nodefconfig setting. /etc/qemu/target-x86_64.conf
> > would be solely for end user configuration changes, not for QEMU builtin
> > defaults.
> > 
> > But looking at the code in QEMU, it doesn't seem we ever implemented this ?
> 
> Arrrgggh. It seems this was implemented as a patch in RHEL-6 qemu RPMs but,
> contrary to our normal RHEL development practice, it was not based on
> a cherry-pick of an upstream patch :-(
> 
> For sake of reference, I'm attaching the two patches from the RHEL6 source
> RPM that do what I'm describing
> 
> NB, I'm not neccessarily advocating these patches for upstream. I still
> maintain that libvirt should write out a config file containing the
> exact CPU model description it desires and specify that with -readconfig.
> The end result would be identical from QEMU's POV and it would avoid
> playing games with QEMU's config loading code.

I agree that libvirt should just write the config somewhere. The problem
here is to define: 1) what information should be mandatory on that
config data; 2) who should be responsible to test and maintain sane
defaults (and where should they be maintained).

The current cpudef definitions are simply too low-level to require them to
be written from scratch. Lots of testing has to be done to make sure we
have working combinations of CPUID bits defined, so they can be used as
defaults or templates. Not facilitating reuse of those tested
defaults/templates by libvirt is duplication of effort.

Really, if we expect libvirt to define all the CPU bits from scratch in
a config file, we could as well just expect libvirt to open /dev/kvm
itself and call all the CPUID setup ioctl()s itself. That's how
low-level some of the cpudef bits are.

(Also, there are additional low-level bits that really have to be
maintained somewhere, just to have sane defaults. Currently many CPUID
leafs are exposed to the guest without letting the user control them,
and worse: without keeping stability of guest-visible bits when
upgrading Qemu or the host kernel. And that's what machine-types are
for: to have sane defaults to be used as base.)

Let me give you a practical example: I had a bug report about improper
CPU topology information[1]. After investigating it, I have found out
that the "level" cpudef field is too low; CPU core topology information
is provided on CPUID leaf 4, and most of the Intel CPU models on Qemu
have level=2 today (I don't know why). So, Qemu is responsible for
exposing CPU topology information set using '-smp' to the guest OS, but
libvirt would have to be responsible for choosing a proper "level" value
that makes that information visible to the guest. We can _allow_ libvirt
to fiddle with these low-level bits, of course, but requiring every
management layer to build this low-level information from scratch is
just a recipe to waste developer time.
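
To make the "level" problem concrete, here is a small sketch. It assumes
that a guest OS only queries basic CPUID leaves up to the maximum leaf
number reported by leaf 0 (which is what the "level" field sets); the
function names are hypothetical:

```python
def visible_basic_leaves(level):
    """Basic CPUID leaves a guest may query when leaf 0 reports `level`."""
    return list(range(0, level + 1))


def topology_leaf_visible(level, topology_leaf=4):
    """True if the leaf carrying core topology is within the advertised range."""
    return topology_leaf <= level


print(visible_basic_leaves(2))     # [0, 1, 2]
print(topology_leaf_visible(2))    # False: leaf 4 is never consulted
print(topology_leaf_visible(4))    # True
```

So with level=2 the topology data set through '-smp' may be present in
Qemu, but a well-behaved guest never looks at it.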

(And I really hope that there's no plan to require all those low-level
bits to appear as-is on the libvirt XML definitions. Because that would
require users to read the Intel 64 and IA-32 Architectures Software
Developer's Manual, or the AMD64 Architecture Programmer's Manual and
BIOS and Kernel Developer's Guides, just to understand why something is
not working on their virtual machine.)

[1] https://bugzilla.redhat.com/show_bug.cgi?id=689665
Anthony Liguori March 10, 2012, 6:24 p.m. UTC | #2
On 03/10/2012 09:58 AM, Eduardo Habkost wrote:
> On Sat, Mar 10, 2012 at 12:42:46PM +0000, Daniel P. Berrange wrote:
>>>
>>> I could have sworn we had this discussion a year ago or so, and had decided
>>> that the default CPU models would be in something like /usr/share/qemu/cpu-x86_64.conf
>>> and loaded regardless of the -nodefconfig setting. /etc/qemu/target-x86_64.conf
>>> would be solely for end user configuration changes, not for QEMU builtin
>>> defaults.
>>>
>>> But looking at the code in QEMU, it doesn't seem we ever implemented this ?
>>
>> Arrrgggh. It seems this was implemented as a patch in RHEL-6 qemu RPMs but,
>> contrary to our normal RHEL development practice, it was not based on
>> a cherry-pick of an upstream patch :-(
>>
>> For sake of reference, I'm attaching the two patches from the RHEL6 source
>> RPM that do what I'm describing
>>
>> NB, I'm not neccessarily advocating these patches for upstream. I still
>> maintain that libvirt should write out a config file containing the
>> exact CPU model description it desires and specify that with -readconfig.
>> The end result would be identical from QEMU's POV and it would avoid
>> playing games with QEMU's config loading code.
>
> I agree that libvirt should just write the config somewhere. The problem
> here is to define: 1) what information should be mandatory on that
> config data; 2) who should be responsible to test and maintain sane
> defaults (and where should they be maintained).
>
> The current cpudef definitions are simply too low-level to require it to
> be written from scratch. Lots of testing have to be done to make sure we
> have working combinations of CPUID bits defined, so they can be used as
> defaults or templates. Not facilitating reuse of those tested
> defauls/templates by libvirt is duplication of efforts.
>
> Really, if we expect libvirt to define all the CPU bits from scratch on
> a config file, we could as well just expect libvirt to open /dev/kvm
> itself and call the all CPUID setup ioctl()s itself. That's how
> low-level some of the cpudef bits are.

Let's step back here.

Why are you writing these patches?  It's probably not because you have a desire 
to say -cpu Westmere when you run QEMU on your laptop.  I'd wager to say that no 
human has ever done that or that if they had, they did so by accident because 
they read documentation and thought they had to.

Humans probably do one of two things: 1) no cpu option or 2) -cpu host.

So then why are you introducing -cpu Westmere?  Because ovirt-engine has a 
concept of datacenters and the entire datacenter has to use a compatible CPU 
model to allow migration compatibility.  Today, the interface that ovirt-engine 
exposes is based on CPU codenames.  Presumably ovirt-engine wants to add a 
Westmere CPU group and as such has levied a requirement down the stack to QEMU.

But there's no intrinsic reason why it uses CPU model names.  VMware doesn't do 
this.  It has a concept of compatibility groups[1].

oVirt could just as well define compatibility groups like GroupA, GroupB, 
GroupC, etc. and then the -cpu option we would be discussing would be -cpu GroupA.

This is why it's a configuration option and not builtin to QEMU.  It's a user 
interface and, as such, should be defined at a higher level.

Perhaps it really should be VDSM that is providing the model info to libvirt? 
Then they can add whatever groups they want whenever they want as long as we 
have the appropriate feature bits.

P.S. I spent 30 minutes the other day helping a user who was attempting to 
figure out whether his processor was a Conroe, Penryn, etc.  Making this 
determination is fairly difficult and it makes me wonder whether having CPU code 
names is even the best interface for oVirt.

[1] http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1991

Regards,

Anthony Liguori

>
> (Also, there are additional low-level bits that really have to be
> maintained somewhere, just to have sane defaults. Currently many CPUID
> leafs are exposed to the guest without letting the user control them,
> and worse: without keeping stability of guest-visible bits when
> upgrading Qemu or the host kernel. And that's what machine-types are
> for: to have sane defaults to be used as base.)
>
> Let me give you a practical example: I had a bug report about improper
> CPU topology information[1]. After investigating it, I have found out
> that the "level" cpudef field is too low; CPU core topology information
> is provided on CPUID leaf 4, and most of the Intel CPU models on Qemu
> have level=2 today (I don't know why). So, Qemu is responsible for
> exposing CPU topology information set using '-smp' to the guest OS, but
> libvirt would have to be responsible for choosing a proper "level" value
> that makes that information visible to the guest. We can _allow_ libvirt
> to fiddle with these low-level bits, of course, but requiring every
> management layer to build this low-level information from scratch is
> just a recipe to waste developer time.
>
> (And I really hope that there's no plan to require all those low-level
> bits to appear as-is on the libvirt XML definitions. Because that would
> require users to read the Intel 64 and IA-32 Architectures Software
> Developer's Manual, or the AMD64 Architecture Programmer's Manual and
> BIOS and Kernel Developer's Guides, just to understand why something is
> not working on his Virtual Machine.)
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=689665
>
Andreas Färber March 10, 2012, 6:37 p.m. UTC | #3
Am 10.03.2012 19:24, schrieb Anthony Liguori:
> Humans probably do one of two things: 1) no cpu option or 2) -cpu host.
> 
> So then why are you introducing -cpu Westmere?
[...]
> P.S. I spent 30 minutes the other day helping a user who was attempting
> to figure out whether his processor was a Conroe, Penryn, etc.  Making
> this determination is fairly difficult and it makes me wonder whether
> having CPU code names is even the best interface for oVirt..

That's why Alex suggested -cpu best, which goes through all those
definitions whatever their name and chooses the closest one IIUC.
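
A rough sketch of what such a selection could look like. The rule used
here (take the model with the most features among those the host fully
supports) is an assumption for illustration and not necessarily what the
patch below implements; the feature sets are heavily abbreviated:

```python
def pick_best_model(host_features, models):
    """Return the name of the richest model fully supported by the host."""
    usable = {name: feats for name, feats in models.items()
              if feats <= host_features}
    if not usable:
        return None
    return max(usable, key=lambda name: len(usable[name]))


# Illustrative, incomplete feature sets for three model names.
models = {
    "Conroe": {"sse2", "sse3", "ssse3"},
    "Penryn": {"sse2", "sse3", "ssse3", "sse4.1"},
    "Westmere": {"sse2", "sse3", "ssse3", "sse4.1", "sse4.2", "aes"},
}
host = {"sse2", "sse3", "ssse3", "sse4.1", "sse4.2"}

print(pick_best_model(host, models))   # Penryn (host lacks aes for Westmere)
```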

http://patchwork.ozlabs.org/patch/134955/

Andreas
Doug Goldstein March 10, 2012, 10:39 p.m. UTC | #4
On Sat, Mar 10, 2012 at 12:24 PM, Anthony Liguori <anthony@codemonkey.ws> wrote:
> On 03/10/2012 09:58 AM, Eduardo Habkost wrote:
>>
>> On Sat, Mar 10, 2012 at 12:42:46PM +0000, Daniel P. Berrange wrote:
>>>>
>>>>
>>>> I could have sworn we had this discussion a year ago or so, and had
>>>> decided
>>>> that the default CPU models would be in something like
>>>> /usr/share/qemu/cpu-x86_64.conf
>>>> and loaded regardless of the -nodefconfig setting.
>>>> /etc/qemu/target-x86_64.conf
>>>> would be solely for end user configuration changes, not for QEMU builtin
>>>> defaults.
>>>>
>>>> But looking at the code in QEMU, it doesn't seem we ever implemented
>>>> this ?
>>>
>>>
>>> Arrrgggh. It seems this was implemented as a patch in RHEL-6 qemu RPMs
>>> but,
>>> contrary to our normal RHEL development practice, it was not based on
>>> a cherry-pick of an upstream patch :-(
>>>
>>> For sake of reference, I'm attaching the two patches from the RHEL6
>>> source
>>> RPM that do what I'm describing
>>>
>>> NB, I'm not neccessarily advocating these patches for upstream. I still
>>> maintain that libvirt should write out a config file containing the
>>> exact CPU model description it desires and specify that with -readconfig.
>>> The end result would be identical from QEMU's POV and it would avoid
>>> playing games with QEMU's config loading code.
>>
>>
>> I agree that libvirt should just write the config somewhere. The problem
>> here is to define: 1) what information should be mandatory on that
>> config data; 2) who should be responsible to test and maintain sane
>> defaults (and where should they be maintained).
>>
>> The current cpudef definitions are simply too low-level to require it to
>> be written from scratch. Lots of testing have to be done to make sure we
>> have working combinations of CPUID bits defined, so they can be used as
>> defaults or templates. Not facilitating reuse of those tested
>> defauls/templates by libvirt is duplication of efforts.
>>
>> Really, if we expect libvirt to define all the CPU bits from scratch on
>> a config file, we could as well just expect libvirt to open /dev/kvm
>> itself and call the all CPUID setup ioctl()s itself. That's how
>> low-level some of the cpudef bits are.
>
>
> Let's step back here.
>
> Why are you writing these patches?  It's probably not because you have a
> desire to say -cpu Westmere when you run QEMU on your laptop.  I'd wager to
> say that no human has ever done that or that if they had, they did so by
> accident because they read documentation and thought they had to.
>
> Humans probably do one of two things: 1) no cpu option or 2) -cpu host.
>
> So then why are you introducing -cpu Westmere?  Because ovirt-engine has a
> concept of datacenters and the entire datacenter has to use a compatible CPU
> model to allow migration compatibility.  Today, the interface that
> ovirt-engine exposes is based on CPU codenames.  Presumably ovirt-engine
> wants to add a Westmere CPU group and as such have levied a requirement down
> the stack to QEMU.
>
> But there's no intrinsic reason why it uses CPU model names.  VMware doesn't
> do this.  It has a concept of compatibility groups[1].
>
> oVirt could just as well define compatibility groups like GroupA, GroupB,
> GroupC, etc. and then the -cpu option we would be discussing would be -cpu
> GroupA.
>
> This is why it's a configuration option and not builtin to QEMU.  It's a
> user interface as as such, should be defined at a higher level.
>
> Perhaps it really should be VDSM that is providing the model info to
> libvirt? Then they can add whatever groups then want whenever they want as
> long as we have the appropriate feature bits.
>
> P.S. I spent 30 minutes the other day helping a user who was attempting to
> figure out whether his processor was a Conroe, Penryn, etc.  Making this
> determination is fairly difficult and it makes me wonder whether having CPU
> code names is even the best interface for oVirt..
>
> [1]
> http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1991
>
> Regards,
>
> Anthony Liguori

FWIW, as a user this would be a good improvement. As it stands right
now, when a cluster of machines is established as being redundant
migratable machines for each other, I must do the following for each
machine:

    virsh -c qemu://machine/system capabilities | xpath /capabilities/host/cpu > machine-cpu.xml

Once I have that data I combine the files together and use virsh
cpu-baseline, which is a handy improvement over doing it manually as in
the past, but still not optimal. This gives me a model which is mostly
meaningless and uninteresting to me, but I know all the guests must
use Penryn, for example. If oVirt, and by extension libvirt, let me know
that guest X is running on CPU-A, I would know I could migrate it to any
other machine supporting CPU-A or CPU-B (assuming B is a superset of
A).
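
The superset reasoning above can be sketched with plain feature sets.
This is a simplified model under stated assumptions: the host names and
feature lists are made up, and the real virsh cpu-baseline output also
involves a model name, which is ignored here:

```python
def baseline(host_feature_sets):
    """Features common to every host: the most a migratable guest may use."""
    return set.intersection(*host_feature_sets)


def can_migrate(guest_features, host_features):
    """A guest can run on any host offering a superset of its features."""
    return guest_features <= host_features


# Hypothetical cluster; feature lists are illustrative only.
hosts = {
    "node1": {"sse2", "sse3", "ssse3", "sse4.1"},
    "node2": {"sse2", "sse3", "ssse3", "sse4.1", "sse4.2"},
    "node3": {"sse2", "sse3", "ssse3"},
}

cpu_a = baseline(hosts.values())          # common denominator ("CPU-A")
cpu_b = hosts["node2"]                    # a superset of CPU-A ("CPU-B")

targets_a = [n for n, f in sorted(hosts.items()) if can_migrate(cpu_a, f)]
targets_b = [n for n, f in sorted(hosts.items()) if can_migrate(cpu_b, f)]
print(sorted(cpu_a))   # ['sse2', 'sse3', 'ssse3']
print(targets_a)       # ['node1', 'node2', 'node3']
print(targets_b)       # ['node2']
```

A guest started on CPU-A can go anywhere in the cluster; a guest started
on CPU-B can only go to hosts that offer at least CPU-B.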
Andrew Cathrow March 11, 2012, 12:55 a.m. UTC | #5
----- Original Message -----
> From: "Anthony Liguori" <anthony@codemonkey.ws>
> To: "Daniel P. Berrange" <berrange@redhat.com>, libvir-list@redhat.com, qemu-devel@nongnu.org, "Gleb Natapov"
> <gleb@redhat.com>, "Jiri Denemark" <jdenemar@redhat.com>, "Avi Kivity" <avi@redhat.com>, arch@ovirt.org
> Sent: Saturday, March 10, 2012 1:24:47 PM
> Subject: Re: [libvirt] [Qemu-devel] Modern CPU models cannot be used with libvirt
> 
> On 03/10/2012 09:58 AM, Eduardo Habkost wrote:
> > On Sat, Mar 10, 2012 at 12:42:46PM +0000, Daniel P. Berrange wrote:
> >>>
> >>> I could have sworn we had this discussion a year ago or so, and
> >>> had decided
> >>> that the default CPU models would be in something like
> >>> /usr/share/qemu/cpu-x86_64.conf
> >>> and loaded regardless of the -nodefconfig setting.
> >>> /etc/qemu/target-x86_64.conf
> >>> would be solely for end user configuration changes, not for QEMU
> >>> builtin
> >>> defaults.
> >>>
> >>> But looking at the code in QEMU, it doesn't seem we ever
> >>> implemented this ?
> >>
> >> Arrrgggh. It seems this was implemented as a patch in RHEL-6 qemu
> >> RPMs but,
> >> contrary to our normal RHEL development practice, it was not based
> >> on
> >> a cherry-pick of an upstream patch :-(
> >>
> >> For sake of reference, I'm attaching the two patches from the
> >> RHEL6 source
> >> RPM that do what I'm describing
> >>
> >> NB, I'm not neccessarily advocating these patches for upstream. I
> >> still
> >> maintain that libvirt should write out a config file containing
> >> the
> >> exact CPU model description it desires and specify that with
> >> -readconfig.
> >> The end result would be identical from QEMU's POV and it would
> >> avoid
> >> playing games with QEMU's config loading code.
> >
> > I agree that libvirt should just write the config somewhere. The
> > problem
> > here is to define: 1) what information should be mandatory on that
> > config data; 2) who should be responsible to test and maintain sane
> > defaults (and where should they be maintained).
> >
> > The current cpudef definitions are simply too low-level to require
> > it to
> > be written from scratch. Lots of testing have to be done to make
> > sure we
> > have working combinations of CPUID bits defined, so they can be
> > used as
> > defaults or templates. Not facilitating reuse of those tested
> > defauls/templates by libvirt is duplication of efforts.
> >
> > Really, if we expect libvirt to define all the CPU bits from
> > scratch on
> > a config file, we could as well just expect libvirt to open
> > /dev/kvm
> > itself and call the all CPUID setup ioctl()s itself. That's how
> > low-level some of the cpudef bits are.
> 
> Let's step back here.
> 
> Why are you writing these patches?  It's probably not because you
> have a desire
> to say -cpu Westmere when you run QEMU on your laptop.  I'd wager to
> say that no
> human has ever done that or that if they had, they did so by accident
> because
> they read documentation and thought they had to.
> 
> Humans probably do one of two things: 1) no cpu option or 2) -cpu
> host.
> 
> So then why are you introducing -cpu Westmere?  Because ovirt-engine
> has a
> concept of datacenters and the entire datacenter has to use a
> compatible CPU
> model to allow migration compatibility.  Today, the interface that
> ovirt-engine
> exposes is based on CPU codenames.  Presumably ovirt-engine wants to
> add a
> Westmere CPU group and as such have levied a requirement down the
> stack to QEMU.
> 
> But there's no intrinsic reason why it uses CPU model names.  VMware
> doesn't do
> this.  It has a concept of compatibility groups[1].

s/has/had

That was back in the 3.5 days and it was hit and miss; it relied on a user putting the same kind of machines in the resource groups and often caused issues.
Now they've moved up to a model very similar to what we're using:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1003212


> 
> oVirt could just as well define compatibility groups like GroupA,
> GroupB,
> GroupC, etc. and then the -cpu option we would be discussing would be
> -cpu GroupA.
> 
> This is why it's a configuration option and not builtin to QEMU.
>  It's a user
> interface as as such, should be defined at a higher level.
> 
> Perhaps it really should be VDSM that is providing the model info to
> libvirt?
> Then they can add whatever groups then want whenever they want as
> long as we
> have the appropriate feature bits.

I think the "real" (model specific) names are the best place to start.
But if a user wants to override those with their own specific types then it should be allowed.


> 
> P.S. I spent 30 minutes the other day helping a user who was
> attempting to
> figure out whether his processor was a Conroe, Penryn, etc.  Making
> this
> determination is fairly difficult and it makes me wonder whether
> having CPU code
> names is even the best interface for oVirt..

I think that was more about a bad choice in UI than a bad choice in the architecture.
It should be made clear to a user what kind of machine they have and what its capabilities are.
This bug was born out of that issue: https://bugzilla.redhat.com/show_bug.cgi?id=799708


> 
> [1]
> http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1991
> 
> Regards,
> 
> Anthony Liguori
> 
> >
> > (Also, there are additional low-level bits that really have to be
> > maintained somewhere, just to have sane defaults. Currently many
> > CPUID
> > leafs are exposed to the guest without letting the user control
> > them,
> > and worse: without keeping stability of guest-visible bits when
> > upgrading Qemu or the host kernel. And that's what machine-types
> > are
> > for: to have sane defaults to be used as base.)
> >
> > Let me give you a practical example: I had a bug report about
> > improper
> > CPU topology information[1]. After investigating it, I have found
> > out
> > that the "level" cpudef field is too low; CPU core topology
> > information
> > is provided on CPUID leaf 4, and most of the Intel CPU models on
> > Qemu
> > have level=2 today (I don't know why). So, Qemu is responsible for
> > exposing CPU topology information set using '-smp' to the guest OS,
> > but
> > libvirt would have to be responsible for choosing a proper "level"
> > value
> > that makes that information visible to the guest. We can _allow_
> > libvirt
> > to fiddle with these low-level bits, of course, but requiring every
> > management layer to build this low-level information from scratch
> > is
> > just a recipe to waste developer time.
> >
> > (And I really hope that there's no plan to require all those
> > low-level
> > bits to appear as-is on the libvirt XML definitions. Because that
> > would
> > require users to read the Intel 64 and IA-32 Architectures Software
> > Developer's Manual, or the AMD64 Architecture Programmer's Manual
> > and
> > BIOS and Kernel Developer's Guides, just to understand why
> > something is
> > not working on his Virtual Machine.)
> >
> > [1] https://bugzilla.redhat.com/show_bug.cgi?id=689665
> >
> 
Gleb Natapov March 11, 2012, 12:41 p.m. UTC | #6
On Sat, Mar 10, 2012 at 12:58:43PM -0300, Eduardo Habkost wrote:
> On Sat, Mar 10, 2012 at 12:42:46PM +0000, Daniel P. Berrange wrote:
> > > 
> > > I could have sworn we had this discussion a year ago or so, and had decided
> > > that the default CPU models would be in something like /usr/share/qemu/cpu-x86_64.conf
> > > and loaded regardless of the -nodefconfig setting. /etc/qemu/target-x86_64.conf
> > > would be solely for end user configuration changes, not for QEMU builtin
> > > defaults.
> > > 
> > > But looking at the code in QEMU, it doesn't seem we ever implemented this ?
> > 
> > Arrrgggh. It seems this was implemented as a patch in RHEL-6 qemu RPMs but,
> > contrary to our normal RHEL development practice, it was not based on
> > a cherry-pick of an upstream patch :-(
> > 
> > For sake of reference, I'm attaching the two patches from the RHEL6 source
> > RPM that do what I'm describing
> > 
> > NB, I'm not neccessarily advocating these patches for upstream. I still
> > maintain that libvirt should write out a config file containing the
> > exact CPU model description it desires and specify that with -readconfig.
> > The end result would be identical from QEMU's POV and it would avoid
> > playing games with QEMU's config loading code.
> 
> I agree that libvirt should just write the config somewhere. The problem
> here is to define: 1) what information should be mandatory on that
> config data; 2) who should be responsible to test and maintain sane
> defaults (and where should they be maintained).
> 
> The current cpudef definitions are simply too low-level to require it to
> be written from scratch. Lots of testing have to be done to make sure we
> have working combinations of CPUID bits defined, so they can be used as
> defaults or templates. Not facilitating reuse of those tested
> defauls/templates by libvirt is duplication of efforts.
> 
> Really, if we expect libvirt to define all the CPU bits from scratch on
> a config file, we could as well just expect libvirt to open /dev/kvm
> itself and call the all CPUID setup ioctl()s itself. That's how
> low-level some of the cpudef bits are.
> 
s/some/all

If libvirt assumes anything about what kvm actually supports it is
working only by sheer luck.

> (Also, there are additional low-level bits that really have to be
> maintained somewhere, just to have sane defaults. Currently many CPUID
> leafs are exposed to the guest without letting the user control them,
> and worse: without keeping stability of guest-visible bits when
> upgrading Qemu or the host kernel. And that's what machine-types are
> for: to have sane defaults to be used as base.)
> 
> Let me give you a practical example: I had a bug report about improper
> CPU topology information[1]. After investigating it, I have found out
> that the "level" cpudef field is too low; CPU core topology information
> is provided on CPUID leaf 4, and most of the Intel CPU models on Qemu
> have level=2 today (I don't know why). So, Qemu is responsible for
> exposing CPU topology information set using '-smp' to the guest OS, but
> libvirt would have to be responsible for choosing a proper "level" value
> that makes that information visible to the guest. We can _allow_ libvirt
> to fiddle with these low-level bits, of course, but requiring every
> management layer to build this low-level information from scratch is
> just a recipe to waste developer time.
And QEMU becomes even less usable from a command line. One more point to
kvm-tool I guess.

> 
> (And I really hope that there's no plan to require all those low-level
> bits to appear as-is on the libvirt XML definitions. Because that would
> require users to read the Intel 64 and IA-32 Architectures Software
> Developer's Manual, or the AMD64 Architecture Programmer's Manual and
> BIOS and Kernel Developer's Guides, just to understand why something is
> not working on his Virtual Machine.)
> 
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=689665
> 
> -- 
> Eduardo

--
			Gleb.
Gleb Natapov March 11, 2012, 1:27 p.m. UTC | #7
On Sat, Mar 10, 2012 at 12:24:47PM -0600, Anthony Liguori wrote:
> Let's step back here.
> 
> Why are you writing these patches?  It's probably not because you
> have a desire to say -cpu Westmere when you run QEMU on your laptop.
> I'd wager to say that no human has ever done that or that if they
> had, they did so by accident because they read documentation and
> thought they had to.
> 
I'd be glad if QEMU would choose -cpu Westmere for me as a default if it
detects a Westmere host CPU.

> Humans probably do one of two things: 1) no cpu option or 2) -cpu host.
> 
And both are not optimal. Actually both are bad. The first one because the
default cpu is very conservative, and the second because there is no
guarantee that the guest will continue to work after a qemu or kernel upgrade.

Let me elaborate on the latter. Suppose the host CPU has a kill_guest
feature and at the time a guest was installed it was not implemented by
kvm. Since it was not implemented by kvm it was not present in the vcpu
during installation and the guest didn't install the "workaround kill_guest"
module. Now an unsuspecting user upgrades the kernel, tries to restart
the guest, and fails. He writes an angry letter to qemu-devel and is asked to
reinstall his guest and move along.

> So then why are you introducing -cpu Westmere?  Because ovirt-engine
> has a concept of datacenters and the entire datacenter has to use a
> compatible CPU model to allow migration compatibility.  Today, the
> interface that ovirt-engine exposes is based on CPU codenames.
> Presumably ovirt-engine wants to add a Westmere CPU group and as
> such have levied a requirement down the stack to QEMU.
> 
First of all, this is not about live migration only. The guest-visible vcpu
should not change after a guest reboot (or hibernate/resume) either. And
second, this concept applies even with only your laptop and a single guest on
it. There are three inputs into a "CPU model module": 1) host cpu, 2)
qemu capabilities, 3) kvm capabilities. In the datacenter scenario all
three can change; on your laptop only the last two can change (the first one
can change too when you get a new laptop), but the net result is that the
guest-visible cpuid can change, and it shouldn't. This is the goal of
introducing -cpu Westmere: to prevent that from happening.
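
A toy model of those three inputs, reusing the made-up kill_guest
feature from above. It assumes that -cpu host exposes whatever all three
layers support, and that a named model is checked strictly (refusing to
start if something is missing); both are simplifications for the sake of
the argument:

```python
def guest_cpuid_host_passthrough(host, qemu, kvm):
    """With -cpu host the guest sees whatever all three layers support."""
    return host & qemu & kvm


def guest_cpuid_named_model(model, host, qemu, kvm):
    """With a named model the guest sees the model, or fails to start."""
    supported = host & qemu & kvm
    if not model <= supported:
        missing = ", ".join(sorted(model - supported))
        raise RuntimeError("host cannot provide: " + missing)
    return set(model)


host = {"sse2", "sse3", "kill_guest"}
qemu = {"sse2", "sse3", "kill_guest"}
kvm_old = {"sse2", "sse3"}                 # before the kernel upgrade
kvm_new = kvm_old | {"kill_guest"}         # after the kernel upgrade
model = {"sse2", "sse3"}                   # stand-in for a named model

before = guest_cpuid_host_passthrough(host, qemu, kvm_old)
after = guest_cpuid_host_passthrough(host, qemu, kvm_new)
print(before == after)   # False: the guest-visible cpuid changed

pinned_before = guest_cpuid_named_model(model, host, qemu, kvm_old)
pinned_after = guest_cpuid_named_model(model, host, qemu, kvm_new)
print(pinned_before == pinned_after)   # True: the named model is stable
```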

> But there's no intrinsic reason why it uses CPU model names.  VMware
> doesn't do this.  It has a concept of compatibility groups[1].
> 
As Andrew noted, not any more. There is no intrinsic reason, but people
are more familiar with Intel terminology than random hypervisor
terminology.

> oVirt could just as well define compatibility groups like GroupA,
> GroupB, GroupC, etc. and then the -cpu option we would be discussing
> would be -cpu GroupA.
It could, but I can't see why this is less confusing. 

> 
> This is why it's a configuration option and not builtin to QEMU.
> It's a user interface as as such, should be defined at a higher
> level.
This is not the only configuration that is built into QEMU. As it stands
now, QEMU does not even allow configuring cpuid enough to define those
compatibility groups outside of QEMU. And after the work is done to allow
enough configurability, there is not much left to do to provide
compatibility groups in QEMU itself.

> 
> Perhaps it really should be VDSM that is providing the model info to
> libvirt? Then they can add whatever groups then want whenever they
> want as long as we have the appropriate feature bits.
> 
> P.S. I spent 30 minutes the other day helping a user who was
> attempting to figure out whether his processor was a Conroe, Penryn,
> etc.  Making this determination is fairly difficult and it makes me
> wonder whether having CPU code names is even the best interface for
> oVirt..
> 
> [1] http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1991
> 
> Regards,
> 
> Anthony Liguori
> 
> >
> >(Also, there are additional low-level bits that really have to be
> >maintained somewhere, just to have sane defaults. Currently many CPUID
> >leafs are exposed to the guest without letting the user control them,
> >and worse: without keeping stability of guest-visible bits when
> >upgrading Qemu or the host kernel. And that's what machine-types are
> >for: to have sane defaults to be used as base.)
> >
> >Let me give you a practical example: I had a bug report about improper
> >CPU topology information[1]. After investigating it, I have found out
> >that the "level" cpudef field is too low; CPU core topology information
> >is provided on CPUID leaf 4, and most of the Intel CPU models on Qemu
> >have level=2 today (I don't know why). So, Qemu is responsible for
> >exposing CPU topology information set using '-smp' to the guest OS, but
> >libvirt would have to be responsible for choosing a proper "level" value
> >that makes that information visible to the guest. We can _allow_ libvirt
> >to fiddle with these low-level bits, of course, but requiring every
> >management layer to build this low-level information from scratch is
> >just a recipe to waste developer time.
> >
> >(And I really hope that there's no plan to require all those low-level
> >bits to appear as-is on the libvirt XML definitions. Because that would
> >require users to read the Intel 64 and IA-32 Architectures Software
> >Developer's Manual, or the AMD64 Architecture Programmer's Manual and
> >BIOS and Kernel Developer's Guides, just to understand why something is
> >not working on his Virtual Machine.)
> >
> >[1] https://bugzilla.redhat.com/show_bug.cgi?id=689665
> >

--
			Gleb.
Anthony Liguori March 11, 2012, 2:12 p.m. UTC | #8
On 03/11/2012 08:27 AM, Gleb Natapov wrote:
> On Sat, Mar 10, 2012 at 12:24:47PM -0600, Anthony Liguori wrote:
>> Let's step back here.
>>
>> Why are you writing these patches?  It's probably not because you
>> have a desire to say -cpu Westmere when you run QEMU on your laptop.
>> I'd wager to say that no human has ever done that or that if they
>> had, they did so by accident because they read documentation and
>> thought they had to.
>>
> I'd be glad if QEMU will chose -cpu Westmere for me if it detects
> Westmere host CPU as a default.

This is the -cpu best that Alex proposed, FWIW.

>> Humans probably do one of two things: 1) no cpu option or 2) -cpu host.
>>
> And both are not optimal. Actually both are bad. First one because
> default cpu is very conservative and the second because there is no
> guaranty that guest will continue to work after qemu or kernel upgrade.
>
> Let me elaborate about the later. Suppose host CPU has kill_guest
> feature and at the time a guest was installed it was not implemented by
> kvm. Since it was not implemented by kvm it was not present in vcpu
> during installation and the guest didn't install "workaround kill_guest"
> module. Now unsuspecting user upgrades the kernel and tries to restart
> the guest and fails. He writes angry letter to qemu-devel and is asked to
> reinstall his guest and move along.

-cpu best wouldn't solve this.  You need a read/write configuration file where 
QEMU probes the available CPU and records it to be used for the lifetime of the VM.

>> So then why are you introducing -cpu Westmere?  Because ovirt-engine
>> has a concept of datacenters and the entire datacenter has to use a
>> compatible CPU model to allow migration compatibility.  Today, the
>> interface that ovirt-engine exposes is based on CPU codenames.
>> Presumably ovirt-engine wants to add a Westmere CPU group and as
>> such have levied a requirement down the stack to QEMU.
>>
> First of all this is not about live migration only. Guest visible vcpu
> should not change after guest reboot (or hibernate/resume) too. And
> second this concept exists with only your laptop and single guest on it
> too. There are three inputs into a "CPU model module": 1) host cpu, 2)
> qemu capabilities, 3) kvm capabilities. With datacenters scenario all
> three can change, with your laptop only last two can change (first one
> can change too when you'll get new laptop) , but the net result is that
> guest visible cpuid can change and it shouldn't. This is the goal of
> introducing -cpu Westmere, to prevent it from happening.

This discussion isn't about whether QEMU should have a Westmere processor 
definition.  In fact, I think I already applied that patch.

It's a discussion about how we handle this up and down the stack.

The question is who should define and manage CPU compatibility.  Right now QEMU
does to a certain degree, libvirt discards this and does its own thing, and
VDSM/ovirt-engine assumes that we're providing something and has built a UI
around it.

What I'm proposing we consider: have VDSM manage CPU definitions in order to 
provide a specific user experience in ovirt-engine.

We would continue to have Westmere/etc in QEMU exposed as part of the user 
configuration.  But I don't think it makes a lot of sense to have to modify QEMU 
any time a new CPU comes out.

Regards,

Anthony Liguori
Anthony Liguori March 11, 2012, 2:16 p.m. UTC | #9
On 03/11/2012 07:41 AM, Gleb Natapov wrote:
> On Sat, Mar 10, 2012 at 12:58:43PM -0300, Eduardo Habkost wrote:
>> On Sat, Mar 10, 2012 at 12:42:46PM +0000, Daniel P. Berrange wrote:
>>>>
>>>> I could have sworn we had this discussion a year ago or so, and had decided
>>>> that the default CPU models would be in something like /usr/share/qemu/cpu-x86_64.conf
>>>> and loaded regardless of the -nodefconfig setting. /etc/qemu/target-x86_64.conf
>>>> would be solely for end user configuration changes, not for QEMU builtin
>>>> defaults.
>>>>
>>>> But looking at the code in QEMU, it doesn't seem we ever implemented this ?
>>>
>>> Arrrgggh. It seems this was implemented as a patch in RHEL-6 qemu RPMs but,
>>> contrary to our normal RHEL development practice, it was not based on
>>> a cherry-pick of an upstream patch :-(
>>>
>>> For sake of reference, I'm attaching the two patches from the RHEL6 source
>>> RPM that do what I'm describing
>>>
>>> NB, I'm not neccessarily advocating these patches for upstream. I still
>>> maintain that libvirt should write out a config file containing the
>>> exact CPU model description it desires and specify that with -readconfig.
>>> The end result would be identical from QEMU's POV and it would avoid
>>> playing games with QEMU's config loading code.
>>
>> I agree that libvirt should just write the config somewhere. The problem
>> here is to define: 1) what information should be mandatory on that
>> config data; 2) who should be responsible to test and maintain sane
>> defaults (and where should they be maintained).
>>
>> The current cpudef definitions are simply too low-level to require it to
>> be written from scratch. Lots of testing have to be done to make sure we
>> have working combinations of CPUID bits defined, so they can be used as
>> defaults or templates. Not facilitating reuse of those tested
>> defauls/templates by libvirt is duplication of efforts.
>>
>> Really, if we expect libvirt to define all the CPU bits from scratch on
>> a config file, we could as well just expect libvirt to open /dev/kvm
>> itself and call the all CPUID setup ioctl()s itself. That's how
>> low-level some of the cpudef bits are.
>>
> s/some/all
>
> If libvirt assumes anything about what kvm actually supports it is
> working only by sheer luck.

Well the simple answer for libvirt is don't use -nodefconfig and then it can 
reuse the CPU definitions (including any that the user adds).

Really, what's the point of having a layer of management if we're saying that 
doing policy management is too complicated for that layer?  What does that layer 
exist to provide then?

>> (Also, there are additional low-level bits that really have to be
>> maintained somewhere, just to have sane defaults. Currently many CPUID
>> leafs are exposed to the guest without letting the user control them,
>> and worse: without keeping stability of guest-visible bits when
>> upgrading Qemu or the host kernel. And that's what machine-types are
>> for: to have sane defaults to be used as base.)
>>
>> Let me give you a practical example: I had a bug report about improper
>> CPU topology information[1]. After investigating it, I have found out
>> that the "level" cpudef field is too low; CPU core topology information
>> is provided on CPUID leaf 4, and most of the Intel CPU models on Qemu
>> have level=2 today (I don't know why). So, Qemu is responsible for
>> exposing CPU topology information set using '-smp' to the guest OS, but
>> libvirt would have to be responsible for choosing a proper "level" value
>> that makes that information visible to the guest. We can _allow_ libvirt
>> to fiddle with these low-level bits, of course, but requiring every
>> management layer to build this low-level information from scratch is
>> just a recipe to waste developer time.
> And QEMU become even less usable from a command line. One more point to
> kvm-tool I guess.

I'm not sure what your point is.  We're talking about an option that humans 
don't use.  How is this a discussion about QEMU usability?

Regards,

Anthony Liguori
Gleb Natapov March 11, 2012, 2:56 p.m. UTC | #10
On Sun, Mar 11, 2012 at 09:12:58AM -0500, Anthony Liguori wrote:
> On 03/11/2012 08:27 AM, Gleb Natapov wrote:
> >On Sat, Mar 10, 2012 at 12:24:47PM -0600, Anthony Liguori wrote:
> >>Let's step back here.
> >>
> >>Why are you writing these patches?  It's probably not because you
> >>have a desire to say -cpu Westmere when you run QEMU on your laptop.
> >>I'd wager to say that no human has ever done that or that if they
> >>had, they did so by accident because they read documentation and
> >>thought they had to.
> >>
> >I'd be glad if QEMU will chose -cpu Westmere for me if it detects
> >Westmere host CPU as a default.
> 
> This is -cpu best that Alex proposed FWIW.
> 
I didn't look at the exact implementation, but I doubt it does exactly what
we need, because currently we do not have the infrastructure for that. If
qemu is upgraded with support for new cpuid bits and -cpu best passes them
to a guest on the next boot, then this is not the same. With proper
infrastructure in place, -cpu Westmere can mean different things for
different machine types.

> >>Humans probably do one of two things: 1) no cpu option or 2) -cpu host.
> >>
> >And both are not optimal. Actually both are bad. First one because
> >default cpu is very conservative and the second because there is no
> >guaranty that guest will continue to work after qemu or kernel upgrade.
> >
> >Let me elaborate about the later. Suppose host CPU has kill_guest
> >feature and at the time a guest was installed it was not implemented by
> >kvm. Since it was not implemented by kvm it was not present in vcpu
> >during installation and the guest didn't install "workaround kill_guest"
> >module. Now unsuspecting user upgrades the kernel and tries to restart
> >the guest and fails. He writes angry letter to qemu-devel and is asked to
> >reinstall his guest and move along.
> 
> -cpu best wouldn't solve this.  You need a read/write configuration
> file where QEMU probes the available CPU and records it to be used
> for the lifetime of the VM.
That's what I thought too, but this shouldn't be the case (Avi's idea).
We need two things: 1) CPU model config should be per machine type.
2) QEMU should refuse to start if it cannot create the cpu exactly as
specified by the model config.

With the two conditions above, if a user creates a VM with qemu 1.0 and cpu
model Westmere, which has no kill_guest feature, he will still be able to
run it in QEMU 1.1 (where kill_guest is added to the Westmere model) on a
new kvm that supports kill_guest by providing the -M pc-1.0 flag (the old
definition of Westmere will be used). If a user tries to create a VM with
QEMU 1.1 on a kernel that does not support kill_guest, QEMU will refuse to
start.
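
The two conditions above can be sketched roughly as follows. This is only
an illustration in Python, not QEMU code: the model table, the feature
sets, the kill_guest flag (a made-up feature from this thread) and the
names resolve_cpu and CpuModelError are all hypothetical.

```python
# Illustrative only: model names are from the thread, but the feature
# sets and the "kill_guest" flag are made up for this sketch.
CPU_MODELS = {
    ("pc-1.0", "Westmere"): frozenset({"sse4.2", "aes"}),
    ("pc-1.1", "Westmere"): frozenset({"sse4.2", "aes", "kill_guest"}),
}


class CpuModelError(Exception):
    """Raised when the requested CPU cannot be created exactly."""


def resolve_cpu(machine_type, model, host_supported):
    """Return the exact guest feature set, or refuse to start."""
    key = (machine_type, model)
    if key not in CPU_MODELS:
        raise CpuModelError(f"no {model} definition for {machine_type}")
    required = CPU_MODELS[key]
    missing = required - set(host_supported)
    if missing:
        # Condition 2: never silently drop bits from the model.
        raise CpuModelError(
            f"{model} on {machine_type} needs {sorted(missing)}"
        )
    # Only the model's own bits are exposed, never extra host bits.
    return required


old_kvm = {"sse4.2", "aes"}
new_kvm = old_kvm | {"kill_guest"}

# A guest created with pc-1.0 keeps its old Westmere after the upgrade.
print(sorted(resolve_cpu("pc-1.0", "Westmere", new_kvm)))
```

Keying the table on (machine type, model) is what lets the Westmere
definition change in a new release without changing what existing guests
see.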

> 
> >>So then why are you introducing -cpu Westmere?  Because ovirt-engine
> >>has a concept of datacenters and the entire datacenter has to use a
> >>compatible CPU model to allow migration compatibility.  Today, the
> >>interface that ovirt-engine exposes is based on CPU codenames.
> >>Presumably ovirt-engine wants to add a Westmere CPU group and as
> >>such have levied a requirement down the stack to QEMU.
> >>
> >First of all this is not about live migration only. Guest visible vcpu
> >should not change after guest reboot (or hibernate/resume) too. And
> >second this concept exists with only your laptop and single guest on it
> >too. There are three inputs into a "CPU model module": 1) host cpu, 2)
> >qemu capabilities, 3) kvm capabilities. With datacenters scenario all
> >three can change, with your laptop only last two can change (first one
> >can change too when you'll get new laptop) , but the net result is that
> >guest visible cpuid can change and it shouldn't. This is the goal of
> >introducing -cpu Westmere, to prevent it from happening.
> 
> This discussion isn't about whether QEMU should have a Westmere
> processor definition.  In fact, I think I already applied that
> patch.
> 
> It's a discussion about how we handle this up and down the stack.
> 
> The question is who should define and manage CPU compatibility.
> Right now QEMU does to a certain degree, libvirt discards this and
> does it's own thing, and VDSM/ovirt-engine assume that we're
> providing something and has built a UI around it.
If we want QEMU to be usable without a management layer then QEMU should
provide stable CPU models. Stable in the sense that a qemu, kernel or CPU
upgrade does not change what the guest sees. If libvirt wants to override
QEMU we should have a way to allow that, but then compatibility becomes
libvirt's problem. Figuring out the minimal CPU model that can be used
across a cluster of different machines should be ovirt's task.

> 
> What I'm proposing we consider: have VDSM manage CPU definitions in
> order to provide a specific user experience in ovirt-engine.
> 
> We would continue to have Westmere/etc in QEMU exposed as part of
> the user configuration.  But I don't think it makes a lot of sense
> to have to modify QEMU any time a new CPU comes out.
> 
If a new cpu does not provide any new instruction set or capability that
can be passed to a guest then there is no point in creating a CPU model for
it in QEMU. If it does, it is just a matter of updating a config file. New
CPUs are not something that pops up twice a month.

--
			Gleb.
Gleb Natapov March 11, 2012, 3:12 p.m. UTC | #11
On Sun, Mar 11, 2012 at 09:16:49AM -0500, Anthony Liguori wrote:
> >If libvirt assumes anything about what kvm actually supports it is
> >working only by sheer luck.
> 
> Well the simple answer for libvirt is don't use -nodefconfig and
> then it can reuse the CPU definitions (including any that the user
> adds).
CPU models should be usable even with -nodefconfig. A CPU model is more
like a device. By -cpu Nehalem I am saying I want a Nehalem device in my
machine.

> 
> Really, what's the point of having a layer of management if we're
> saying that doing policy management is too complicated for that
> layer?  What does that layer exist to provide then?
> 
I was always against libvirt configuring low level details of the CPU. What
it should do IMO is choose the best CPU model for the host cpu (one can argue
that fiddling with /proc/cpuinfo is not QEMU's business).

> >>(Also, there are additional low-level bits that really have to be
> >>maintained somewhere, just to have sane defaults. Currently many CPUID
> >>leafs are exposed to the guest without letting the user control them,
> >>and worse: without keeping stability of guest-visible bits when
> >>upgrading Qemu or the host kernel. And that's what machine-types are
> >>for: to have sane defaults to be used as base.)
> >>
> >>Let me give you a practical example: I had a bug report about improper
> >>CPU topology information[1]. After investigating it, I have found out
> >>that the "level" cpudef field is too low; CPU core topology information
> >>is provided on CPUID leaf 4, and most of the Intel CPU models on Qemu
> >>have level=2 today (I don't know why). So, Qemu is responsible for
> >>exposing CPU topology information set using '-smp' to the guest OS, but
> >>libvirt would have to be responsible for choosing a proper "level" value
> >>that makes that information visible to the guest. We can _allow_ libvirt
> >>to fiddle with these low-level bits, of course, but requiring every
> >>management layer to build this low-level information from scratch is
> >>just a recipe to waste developer time.
> >And QEMU become even less usable from a command line. One more point to
> >kvm-tool I guess.
> 
> I'm not sure what your point is.  We're talking about an option that
> humans don't use.  How is this a discussion about QEMU usability?
> 
If we require the use of libvirt for a user to have a stable guest
environment, then QEMU by itself is less usable. We do have machine types
in QEMU to expose a stable machine to a guest. CPU models should be part of
it.

--
			Gleb.
Anthony Liguori March 11, 2012, 3:33 p.m. UTC | #12
On 03/11/2012 09:56 AM, Gleb Natapov wrote:
> On Sun, Mar 11, 2012 at 09:12:58AM -0500, Anthony Liguori wrote:
>> -cpu best wouldn't solve this.  You need a read/write configuration
>> file where QEMU probes the available CPU and records it to be used
>> for the lifetime of the VM.
> That what I thought too, but this shouldn't be the case (Avi's idea).
> We need two things: 1) CPU model config should be per machine type.
> 2) QEMU should refuse to start if it cannot create cpu exactly as
> specified by model config.

This would either mean:

A. pc-1.1 uses -cpu best with a fixed mask for 1.1

B. pc-1.1 hardcodes Westmere or some other family

(A) would imply a different CPU if you moved the machine from one system to 
another.  I would think this would be very problematic from a user's perspective.

(B) would imply that we had to choose the least common denominator which is 
essentially what we do today with qemu64.  If you want to just switch qemu64 to 
Conroe, I don't think that's a huge difference from what we have today.

>> It's a discussion about how we handle this up and down the stack.
>>
>> The question is who should define and manage CPU compatibility.
>> Right now QEMU does to a certain degree, libvirt discards this and
>> does it's own thing, and VDSM/ovirt-engine assume that we're
>> providing something and has built a UI around it.
> If we want QEMU to be usable without management layer then QEMU should
> provide stable CPU models. Stable in a sense that qemu, kernel or CPU
> upgrade does not change what guest sees.

We do this today by exposing -cpu qemu64 by default.  If all you're advocating 
is doing -cpu Conroe by default, that's fine.

But I fail to see where this fits into the larger discussion here.  The problem 
to solve is: I want to use the largest possible subset of CPU features available 
uniformly throughout my datacenter.

QEMU and libvirt have single node views so they cannot solve this problem on 
their own.  Whether that subset is a generic Westmere-like processor that never 
existed IRL or a specific Westmere processor seems like a decision that should 
be made by the datacenter level manager with the node level view.

If I have a homogeneous environment of Xeon 7540s, I would probably like to see
a Xeon 7540 in my guest.  Doesn't it make sense to enable the management tool to
make this decision?
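
As a rough sketch of the datacenter-level decision described above (the
largest model whose features are available on every node), something like
the following Python could run in the management layer. The feature sets
are simplified and illustrative, and the function names and host names
are hypothetical; real model definitions carry many more flags plus
fields such as level, family and model.

```python
from functools import reduce

# Simplified, illustrative feature sets, ordered oldest to newest.
MODELS = [
    ("Conroe", {"sse2", "sse3", "ssse3"}),
    ("Penryn", {"sse2", "sse3", "ssse3", "sse4.1"}),
    ("Nehalem", {"sse2", "sse3", "ssse3", "sse4.1", "sse4.2", "popcnt"}),
    ("Westmere", {"sse2", "sse3", "ssse3", "sse4.1", "sse4.2", "popcnt",
                  "aes"}),
]


def common_features(hosts):
    """Features available on every host (needs at least one host)."""
    return reduce(set.intersection, (set(f) for f in hosts.values()))


def best_common_model(hosts, models=MODELS):
    """Newest model whose features every host can provide, else None."""
    available = common_features(hosts)
    usable = [name for name, feats in models if feats <= available]
    return usable[-1] if usable else None


# Hypothetical two-node datacenter: one Westmere-class host and one
# Nehalem-class host.
hosts = {
    "node1": MODELS[3][1],
    "node2": MODELS[2][1],
}
print(best_common_model(hosts))
```

A single node only sees its own entry in hosts, which is why the
intersection has to be computed by whatever component has the
datacenter-wide view.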

Regards,

Anthony Liguori
Anthony Liguori March 11, 2012, 3:41 p.m. UTC | #13
On 03/11/2012 10:12 AM, Gleb Natapov wrote:
> On Sun, Mar 11, 2012 at 09:16:49AM -0500, Anthony Liguori wrote:
>>> If libvirt assumes anything about what kvm actually supports it is
>>> working only by sheer luck.
>>
>> Well the simple answer for libvirt is don't use -nodefconfig and
>> then it can reuse the CPU definitions (including any that the user
>> adds).
> CPU models should be usable even with -nodefconfig. CPU model is more
> like device. By -cpu Nehalem I am saying I want Nehalem device in my
> machine.

Let's say we moved CPU definitions to /usr/share/qemu/cpu-models.xml.

Obviously, we'd want a command line option to be able to change that location so 
we'd introduce -cpu-models PATH.

But we want all of our command line options to be settable by the global
configuration file, so we would add a cpu-models=PATH option to the
configuration file.

But why hard code a path when we can just set the default path in the
configuration file?  So let's avoid hard coding and just put
cpu-models=/usr/share/qemu/cpu-models.xml in the default configuration file.

But now when libvirt uses -nodefconfig, those models go away.  -nodefconfig 
means start QEMU in the most minimal state possible.  You get what you pay for 
if you use it.

We'll have the same problem with machine configuration files.  At some point in 
time, -nodefconfig will make machine models disappear.

Regards,

Anthony Liguori
Gleb Natapov March 11, 2012, 4:16 p.m. UTC | #14
On Sun, Mar 11, 2012 at 10:33:15AM -0500, Anthony Liguori wrote:
> On 03/11/2012 09:56 AM, Gleb Natapov wrote:
> >On Sun, Mar 11, 2012 at 09:12:58AM -0500, Anthony Liguori wrote:
> >>-cpu best wouldn't solve this.  You need a read/write configuration
> >>file where QEMU probes the available CPU and records it to be used
> >>for the lifetime of the VM.
> >That what I thought too, but this shouldn't be the case (Avi's idea).
> >We need two things: 1) CPU model config should be per machine type.
> >2) QEMU should refuse to start if it cannot create cpu exactly as
> >specified by model config.
> 
> This would either mean:
> 
> A. pc-1.1 uses -cpu best with a fixed mask for 1.1
> 
> B. pc-1.1 hardcodes Westmere or some other family
> 
This would mean neither A nor B. Maybe it wasn't clear, but I didn't talk
about -cpu best above. I am talking about any CPU model with a fixed meaning
(not host or best, which are host cpu dependent). Let's take Nehalem for
example (just to move on from Westmere :)). Currently it has level=2. Eduardo
wants to fix it to be 11, but old guests, installed with -cpu Nehalem,
should see exactly the same CPU. How do you do it? Have different
Nehalem definitions for pc-1.0 (with level=2) and pc-1.1 (with level=11).
Let's get back to Westmere. It actually has level=11, but that only
exposes another problem. The kernel 3.3 and qemu-1.1 combo will support the
architectural PMU, which is exposed in cpuid leaf 10. We do not want
guests installed with -cpu Westmere and qemu-1.0 to see the architectural
PMU after an upgrade. How do you do it? Have different Westmere definitions
for pc-1.0 (does not report PMU) and pc-1.1 (reports PMU). What happens
if you try to run qemu-1.1 -cpu Westmere on a kernel < 3.3 (without
PMU support)? Qemu will fail to start.
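
The effect of "level" in the Nehalem example can be sketched as below.
This is an illustrative Python fragment, not QEMU code: the level values
are the ones mentioned above, while the table layout and the function
names are hypothetical. It only models which basic CPUID leaves are
reachable; whether leaf 10 actually reports a PMU still depends on the
kernel.

```python
# 'level' is the highest basic CPUID leaf the guest is told about.
# The values are the ones quoted in the message; the per-machine-type
# table itself is a hypothetical illustration.
NEHALEM_LEVEL = {"pc-1.0": 2, "pc-1.1": 11}

TOPOLOGY_LEAF = 4    # core topology information
PMU_LEAF = 10        # architectural PMU (leaf 0xA)


def leaf_visible(level, leaf):
    """A basic leaf is only advertised when it is <= level."""
    return 0 <= leaf <= level


def guest_view(machine_type):
    """Summarize which of the two leaves a Nehalem guest can reach."""
    level = NEHALEM_LEVEL[machine_type]
    return {
        "level": level,
        "topology_leaf": leaf_visible(level, TOPOLOGY_LEAF),
        "pmu_leaf": leaf_visible(level, PMU_LEAF),
    }


for machine_type in ("pc-1.0", "pc-1.1"):
    print(machine_type, guest_view(machine_type))
```

An old guest started with -M pc-1.0 keeps level=2 and sees neither leaf,
while a new pc-1.1 guest gets the corrected value.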


> (A) would imply a different CPU if you moved the machine from one
> system to another.  I would think this would be very problematic
> from a user's perspective.
> 
> (B) would imply that we had to choose the least common denominator
> which is essentially what we do today with qemu64.  If you want to
> just switch qemu64 to Conroe, I don't think that's a huge difference
> from what we have today.
> 
> >>It's a discussion about how we handle this up and down the stack.
> >>
> >>The question is who should define and manage CPU compatibility.
> >>Right now QEMU does to a certain degree, libvirt discards this and
> >>does it's own thing, and VDSM/ovirt-engine assume that we're
> >>providing something and has built a UI around it.
> >If we want QEMU to be usable without management layer then QEMU should
> >provide stable CPU models. Stable in a sense that qemu, kernel or CPU
> >upgrade does not change what guest sees.
> 
> We do this today by exposing -cpu qemu64 by default.  If all you're
> advocating is doing -cpu Conroe by default, that's fine.
I am not advocating that. I am saying we should be able to amend the qemu64
definition without breaking older guests that use it.

> 
> But I fail to see where this fits into the larger discussion here.
> The problem to solve is: I want to use the largest possible subset
> of CPU features available uniformly throughout my datacenter.
> 
> QEMU and libvirt have single node views so they cannot solve this
> problem on their own.  Whether that subset is a generic
> Westmere-like processor that never existed IRL or a specific
> Westmere processor seems like a decision that should be made by the
> datacenter level manager with the node level view.
> 
> If I have a homogeneous environments of Xeon 7540, I would probably
> like to see a Xeon 7540 in my guest.  Doesn't it make sense to
> enable the management tool to make this decision?
> 
Of course neither QEMU nor libvirt can make a cluster wide decision.
If QEMU provides sane CPU model definitions (usable even with -nodefconfig)
it would always be possible to find the model that fits best. If the
oldest CPU in the data center is Nehalem then probably -cpu Nehalem will do.
But our CPU model definitions have a lot of shortcomings and we were
talking with Eduardo about how to fix them when he brought this thread back
to life, so maybe I steered the discussion a little bit in the wrong
direction, but I do think those things are connected. If QEMU CPU model
definitions are not stable across upgrades, how can we say to management
that it is safe to use them? Instead they insist on reimplementing the
same logic in the management layer and do it badly (because of the lack of
info).

--
			Gleb.
Gleb Natapov March 11, 2012, 4:27 p.m. UTC | #15
On Sun, Mar 11, 2012 at 10:41:32AM -0500, Anthony Liguori wrote:
> On 03/11/2012 10:12 AM, Gleb Natapov wrote:
> >On Sun, Mar 11, 2012 at 09:16:49AM -0500, Anthony Liguori wrote:
> >>>If libvirt assumes anything about what kvm actually supports it is
> >>>working only by sheer luck.
> >>
> >>Well the simple answer for libvirt is don't use -nodefconfig and
> >>then it can reuse the CPU definitions (including any that the user
> >>adds).
> >CPU models should be usable even with -nodefconfig. CPU model is more
> >like device. By -cpu Nehalem I am saying I want Nehalem device in my
> >machine.
> 
> Let's say we moved CPU definitions to /usr/share/qemu/cpu-models.xml.
> 
> Obviously, we'd want a command line option to be able to change that
> location so we'd introduce -cpu-models PATH.
> 
> But we want all of our command line options to be settable by the
> global configuration file so we would have a cpu-model=PATH to the
> configuration file.
> 
> But why hard code a path when we can just set the default path in
> the configuration file so let's avoid hard coding and just put
> cpu-models=/usr/share/qemu/cpu-models.xml in the default
> configuration file.
> 
We have two places where we define cpu models: hardcoded in
target-i386/cpuid.c and in target-x86_64.conf. We moved them out to a conf
file because this way it is easier to add, update, examine and compare CPU
models. But they still should be treated as an essential part of qemu. Given
this, I do not see the step above as a logical one. CPU models are not
part of machine config.  "-cpu Nehalem,-sse,level=3,model=5" is part of
machine config.
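
To make the distinction concrete: the -cpu string above names a base
model and then lists per-machine changes on top of it. A minimal Python
sketch of splitting such a string is shown here. It is not QEMU's actual
parser, and the function name and the returned dictionary layout are
assumptions made for illustration.

```python
def parse_cpu_option(value):
    """Split 'base,+feat,-feat,key=val' into its parts."""
    base, *rest = value.split(",")
    parsed = {"base": base, "add": set(), "remove": set(), "props": {}}
    for item in rest:
        if item.startswith("+"):
            parsed["add"].add(item[1:])
        elif item.startswith("-"):
            parsed["remove"].add(item[1:])
        elif "=" in item:
            key, val = item.split("=", 1)
            parsed["props"][key] = val
        else:
            raise ValueError(f"unrecognized -cpu item: {item!r}")
    return parsed


# Only the overrides belong to the machine config; what "Nehalem"
# itself means comes from the model definitions.
print(parse_cpu_option("Nehalem,-sse,level=3,model=5"))
```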

What if we introduce a way to write devices in Lua? Should -nodefconfig
drop devices implemented as Lua scripts too?

> But now when libvirt uses -nodefconfig, those models go away.
> -nodefconfig means start QEMU in the most minimal state possible.
> You get what you pay for if you use it.
> 
> We'll have the same problem with machine configuration files.  At
> some point in time, -nodefconfig will make machine models disappear.
> 
--
			Gleb.
Eduardo Habkost March 12, 2012, 12:52 p.m. UTC | #16
On Sun, Mar 11, 2012 at 09:12:58AM -0500, Anthony Liguori wrote:
> On 03/11/2012 08:27 AM, Gleb Natapov wrote:
> >On Sat, Mar 10, 2012 at 12:24:47PM -0600, Anthony Liguori wrote:
> >>Let's step back here.
> >>
> >>Why are you writing these patches?  It's probably not because you
> >>have a desire to say -cpu Westmere when you run QEMU on your laptop.
> >>I'd wager to say that no human has ever done that or that if they
> >>had, they did so by accident because they read documentation and
> >>thought they had to.

No, it's because libvirt doesn't handle all the tiny small details
involved in specifying a CPU. All libvirt knows about is a set of CPU
flag bits; it knows nothing about 'level', 'family', and 'xlevel',
but we would like to allow it to expose a Westmere-like CPU to the
guest.

libvirt does know how to use the Westmere CPU model today, if it is not
disabled by -nodefconfig. The interface it uses for probing has
deficiencies, but it works right now.


> >>Humans probably do one of two things: 1) no cpu option or 2) -cpu host.
> >>
> >And both are not optimal. Actually both are bad. First one because
> >default cpu is very conservative and the second because there is no
> >guaranty that guest will continue to work after qemu or kernel upgrade.
> >
> >Let me elaborate about the later. Suppose host CPU has kill_guest
> >feature and at the time a guest was installed it was not implemented by
> >kvm. Since it was not implemented by kvm it was not present in vcpu
> >during installation and the guest didn't install "workaround kill_guest"
> >module. Now unsuspecting user upgrades the kernel and tries to restart
> >the guest and fails. He writes angry letter to qemu-devel and is asked to
> >reinstall his guest and move along.
> 
> -cpu best wouldn't solve this.  You need a read/write configuration
> file where QEMU probes the available CPU and records it to be used
> for the lifetime of the VM.

If the CPU records are used for probing, this is yet another reason they
are not "configuration", but "defaults/templates to be used to build the
actual configuration".

IMHO, having to generate an opaque config file based on the results of
probing is poor interface design, for humans _and_ for machines. If we
have any bug in the probing, or in the data used as base for the
probing, or in the config generation, it will be impossible to deploy a
fix for the users.

This is why machine-types exist: you have the ability to implement
probing and/or sane defaults, but at the same time you can change the
probing behavior or the set of defaults without breaking existing
machines. This way, the config file contains only what the user really
wanted to configure, not some complex and opaque result of a probing
process.

The fact that we have a _set_ of CPU definitions to choose from (or to
use as input for probing) instead of a single default "CPU" definition
that the user can change is a sign that the cpudefs are _not_ user
configuration, but just templates/defaults.


[...]
> This discussion isn't about whether QEMU should have a Westmere
> processor definition.  In fact, I think I already applied that patch.
> 
> It's a discussion about how we handle this up and down the stack.

Agreed on this point.

> 
> The question is who should define and manage CPU compatibility.
> Right now QEMU does to a certain degree, libvirt discards this and
> does it's own thing, and VDSM/ovirt-engine assume that we're
> providing something and has built a UI around it.

libvirt doesn't discard this. If it just discarded this and properly
defined its own models, I wouldn't even have (re-)started this thread.

(Well, maybe I would have started a similar thread arguing that we are
wasting time working on equivalent known-to-work CPU model definitions
on Qemu and libvirt. Today we don't waste time doing it because libvirt
currently expects -nodefconfig to not disable the existing default
models).

> 
> What I'm proposing we consider: have VDSM manage CPU definitions in
> order to provide a specific user experience in ovirt-engine.

I don't disagree completely with that. The problem is defining what's
"CPU definitions". The current cpudef semantics is simply too low level,
it impacts other features that are _already_ managed by Qemu. Let me try
to enumerate:

- Some CPUID leafs are defined based on -smp;
- Some CPUID leafs depend on kernel capabilities;
- The availability of some CPUID leafs depends on some features
  being enabled or not, but they are simply not exposed unless a proper
  'level' value is set.

We could have two approaches here: we can define some details of CPU
definitions as "not configurable" and others as "must-be configurable",
and force the management layer to agree with us about what should be
configurable or not.

Or, we could simply define that a sane set of CPU definitions is part
of a machine-type, and let management reconfigure parts of it if
desired, but not force it to configure anything if not needed.

> 
> We would continue to have Westmere/etc in QEMU exposed as part of the
> user configuration.  But I don't think it makes a lot of sense to
> have to modify QEMU any time a new CPU comes out.

Today we have to, because libvirt doesn't handle all the details of CPU
definitions. I would be happy if libvirt took to itself the
responsibility of defining all those CPUs, but that's not true today.

And even if we all agree that in the future libvirt will manage every
single detail of the CPU, I would still argue that CPU definition
defaults (especially if they are used as input for probing) should be
part of machine-type definitions, as not everybody uses libvirt.
Daniel P. Berrangé March 12, 2012, 1:04 p.m. UTC | #17
On Mon, Mar 12, 2012 at 09:52:27AM -0300, Eduardo Habkost wrote:
> On Sun, Mar 11, 2012 at 09:12:58AM -0500, Anthony Liguori wrote:
> > On 03/11/2012 08:27 AM, Gleb Natapov wrote:
> > >On Sat, Mar 10, 2012 at 12:24:47PM -0600, Anthony Liguori wrote:
> > >>Let's step back here.
> > >>
> > >>Why are you writing these patches?  It's probably not because you
> > >>have a desire to say -cpu Westmere when you run QEMU on your laptop.
> > >>I'd wager to say that no human has ever done that or that if they
> > >>had, they did so by accident because they read documentation and
> > >>thought they had to.
> 
> No, it's because libvirt doesn't handle all the tiny small details
> involved in specifying a CPU. All libvirty knows about are a set of CPU
> flag bits, but it knows nothing about 'level', 'family', and 'xlevel',
> but we would like to allow it to expose a Westmere-like CPU to the
> guest.

This is easily fixable in libvirt - so for the purposes of the ongoing
discussion, IMHO, we can assume libvirt will support level, family,
xlevel, etc.


Daniel
Eduardo Habkost March 12, 2012, 1:08 p.m. UTC | #18
On Sun, Mar 11, 2012 at 10:41:32AM -0500, Anthony Liguori wrote:
> On 03/11/2012 10:12 AM, Gleb Natapov wrote:
> >On Sun, Mar 11, 2012 at 09:16:49AM -0500, Anthony Liguori wrote:
> >>>If libvirt assumes anything about what kvm actually supports it is
> >>>working only by sheer luck.
> >>
> >>Well the simple answer for libvirt is don't use -nodefconfig and
> >>then it can reuse the CPU definitions (including any that the user
> >>adds).
> >CPU models should be usable even with -nodefconfig. CPU model is more
> >like device. By -cpu Nehalem I am saying I want Nehalem device in my
> >machine.
> 
> Let's say we moved CPU definitions to /usr/share/qemu/cpu-models.xml.
> 
> Obviously, we'd want a command line option to be able to change that
> location so we'd introduce -cpu-models PATH.
> 
> But we want all of our command line options to be settable by the
> global configuration file so we would have a cpu-model=PATH to the
> configuration file.
> 
> But why hard code a path when we can just set the default path in the
> configuration file so let's avoid hard coding and just put
> cpu-models=/usr/share/qemu/cpu-models.xml in the default
> configuration file.

We wouldn't do the above.

-nodefconfig should disable the loading of files on /etc, but it
shouldn't disable loading internal non-configurable data that we just
happened to choose to store outside the qemu binary because it makes
development easier.
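
A rough sketch of the behaviour argued for here, in Python rather than QEMU code. This illustrates the proposal, not what QEMU does today. The function name files_to_load is hypothetical, and the data-file path is just the kind of location suggested earlier in this thread.

```python
# Hypothetical file locations, for illustration only.
# Data shipped with QEMU (not meant to be edited by users):
DATA_FILES = ["/usr/share/qemu/cpu-x86_64.conf"]
# User/admin configuration under /etc:
USER_CONFIG_FILES = ["/etc/qemu/target-x86_64.conf"]


def files_to_load(nodefconfig):
    """Return the files read at startup under the proposed semantics."""
    files = list(DATA_FILES)           # always loaded: part of QEMU itself
    if not nodefconfig:
        files.extend(USER_CONFIG_FILES)  # skipped only when -nodefconfig is given
    return files


print(files_to_load(nodefconfig=False))
print(files_to_load(nodefconfig=True))
```

With this split, -nodefconfig still protects the guest ABI from whatever is in /etc, while the CPU models stay available.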

Really, the requirement of a "default configuration file" is a problem
by itself. Qemu should not require a default configuration file to work,
and it shouldn't require users to copy the default configuration file to
change options from the default.

Doing this would make it impossible to deploy fixes to users if we ever
find out that the default configuration file had a serious bug. What if
a bug in our default configuration file has a serious security
implication?

> 
> But now when libvirt uses -nodefconfig, those models go away.
> -nodefconfig means start QEMU in the most minimal state possible.
> You get what you pay for if you use it.
> 
> We'll have the same problem with machine configuration files.  At
> some point in time, -nodefconfig will make machine models disappear.

It shouldn't. Machine-types are defaults to be used as base, they are
not user-provided configuration. And the fact that we decided to store
some data outside of the Qemu binary is orthogonal to the design decisions
in the Qemu command-line and configuration interface.

As I said previously, requiring generation of opaque config files (and
"copy the default config file and change it" is included in my
definition of "generation of opaque config files") is poor design, IMO.
I bet this even has an entry in some design anti-pattern catalog
somewhere.
Gleb Natapov March 12, 2012, 1:15 p.m. UTC | #19
On Mon, Mar 12, 2012 at 01:04:19PM +0000, Daniel P. Berrange wrote:
> On Mon, Mar 12, 2012 at 09:52:27AM -0300, Eduardo Habkost wrote:
> > On Sun, Mar 11, 2012 at 09:12:58AM -0500, Anthony Liguori wrote:
> > > On 03/11/2012 08:27 AM, Gleb Natapov wrote:
> > > >On Sat, Mar 10, 2012 at 12:24:47PM -0600, Anthony Liguori wrote:
> > > >>Let's step back here.
> > > >>
> > > >>Why are you writing these patches?  It's probably not because you
> > > >>have a desire to say -cpu Westmere when you run QEMU on your laptop.
> > > >>I'd wager to say that no human has ever done that or that if they
> > > >>had, they did so by accident because they read documentation and
> > > >>thought they had to.
> > 
> > No, it's because libvirt doesn't handle all the tiny small details
> > involved in specifying a CPU. All libvirty knows about are a set of CPU
> > flag bits, but it knows nothing about 'level', 'family', and 'xlevel',
> > but we would like to allow it to expose a Westmere-like CPU to the
> > guest.
> 
> This is easily fixable in libvirt - so for the point of going discussion,
> IMHO, we can assume libvirt will support level, family, xlevel, etc.
> 
And fill in all cpuid leafs by querying /dev/kvm when needed or, if TCG
is used, replicating QEMU logic? And since QEMU should be usable without
libvirt the same logic should be implemented in QEMU anyway.

--
			Gleb.
Eduardo Habkost March 12, 2012, 1:32 p.m. UTC | #20
On Mon, Mar 12, 2012 at 03:15:32PM +0200, Gleb Natapov wrote:
> On Mon, Mar 12, 2012 at 01:04:19PM +0000, Daniel P. Berrange wrote:
> > On Mon, Mar 12, 2012 at 09:52:27AM -0300, Eduardo Habkost wrote:
> > > On Sun, Mar 11, 2012 at 09:12:58AM -0500, Anthony Liguori wrote:
> > > > On 03/11/2012 08:27 AM, Gleb Natapov wrote:
> > > > >On Sat, Mar 10, 2012 at 12:24:47PM -0600, Anthony Liguori wrote:
> > > > >>Let's step back here.
> > > > >>
> > > > >>Why are you writing these patches?  It's probably not because you
> > > > >>have a desire to say -cpu Westmere when you run QEMU on your laptop.
> > > > >>I'd wager to say that no human has ever done that or that if they
> > > > >>had, they did so by accident because they read documentation and
> > > > >>thought they had to.
> > > 
> > > No, it's because libvirt doesn't handle all the tiny small details
> > > involved in specifying a CPU. All libvirty knows about are a set of CPU
> > > flag bits, but it knows nothing about 'level', 'family', and 'xlevel',
> > > but we would like to allow it to expose a Westmere-like CPU to the
> > > guest.
> > 
> > This is easily fixable in libvirt - so for the point of going discussion,
> > IMHO, we can assume libvirt will support level, family, xlevel, etc.
> > 
> And fill in all cpuid leafs by querying /dev/kvm when needed or, if TCG
> is used, replicating QEMU logic? And since QEMU should be usable without
> libvirt the same logic should be implemented in QEMU anyway.

To implement this properly, libvirt will need a proper probing interface
to know what exactly is available and can be enabled. I plan to
implement that.

I have no problem giving libvirt the power to shoot itself in
the foot. I believe libvirt developers can handle that. I have a problem
with requiring every user (human or machine) to handle a weapon that can
shoot their foot (that means, requiring the user to write the CPU model
definition from scratch, or requiring the user to blindly copy&paste the
default config file).
Gleb Natapov March 12, 2012, 1:34 p.m. UTC | #21
On Mon, Mar 12, 2012 at 10:32:21AM -0300, Eduardo Habkost wrote:
> On Mon, Mar 12, 2012 at 03:15:32PM +0200, Gleb Natapov wrote:
> > On Mon, Mar 12, 2012 at 01:04:19PM +0000, Daniel P. Berrange wrote:
> > > On Mon, Mar 12, 2012 at 09:52:27AM -0300, Eduardo Habkost wrote:
> > > > On Sun, Mar 11, 2012 at 09:12:58AM -0500, Anthony Liguori wrote:
> > > > > On 03/11/2012 08:27 AM, Gleb Natapov wrote:
> > > > > >On Sat, Mar 10, 2012 at 12:24:47PM -0600, Anthony Liguori wrote:
> > > > > >>Let's step back here.
> > > > > >>
> > > > > >>Why are you writing these patches?  It's probably not because you
> > > > > >>have a desire to say -cpu Westmere when you run QEMU on your laptop.
> > > > > >>I'd wager to say that no human has ever done that or that if they
> > > > > >>had, they did so by accident because they read documentation and
> > > > > >>thought they had to.
> > > > 
> > > > No, it's because libvirt doesn't handle all the tiny small details
> > > > involved in specifying a CPU. All libvirty knows about are a set of CPU
> > > > flag bits, but it knows nothing about 'level', 'family', and 'xlevel',
> > > > but we would like to allow it to expose a Westmere-like CPU to the
> > > > guest.
> > > 
> > > This is easily fixable in libvirt - so for the point of going discussion,
> > > IMHO, we can assume libvirt will support level, family, xlevel, etc.
> > > 
> > And fill in all cpuid leafs by querying /dev/kvm when needed or, if TCG
> > is used, replicating QEMU logic? And since QEMU should be usable without
> > libvirt the same logic should be implemented in QEMU anyway.
> 
> To implement this properly, libvirt will need a proper probing interface
> to know what exactly is available and can be enabled. I plan to
> implement that.
> 
> I am have no problem in giving to libvirt the power to shoot itself in
> the foot. I believe libvirt developers can handle that. I have a problem
> with requiring every user (human or machine) to handle a weapon that can
> shoot their foot (that means, requiring the user to write the CPU model
> definition from scratch, or requiring the user to blindly copy&paste the
> default config file).
> 
You are a dangerous person, Eduardo!

--
			Gleb.
Daniel P. Berrangé March 12, 2012, 1:50 p.m. UTC | #22
On Mon, Mar 12, 2012 at 03:15:32PM +0200, Gleb Natapov wrote:
> On Mon, Mar 12, 2012 at 01:04:19PM +0000, Daniel P. Berrange wrote:
> > On Mon, Mar 12, 2012 at 09:52:27AM -0300, Eduardo Habkost wrote:
> > > On Sun, Mar 11, 2012 at 09:12:58AM -0500, Anthony Liguori wrote:
> > > > On 03/11/2012 08:27 AM, Gleb Natapov wrote:
> > > > >On Sat, Mar 10, 2012 at 12:24:47PM -0600, Anthony Liguori wrote:
> > > > >>Let's step back here.
> > > > >>
> > > > >>Why are you writing these patches?  It's probably not because you
> > > > >>have a desire to say -cpu Westmere when you run QEMU on your laptop.
> > > > >>I'd wager to say that no human has ever done that or that if they
> > > > >>had, they did so by accident because they read documentation and
> > > > >>thought they had to.
> > > 
> > > No, it's because libvirt doesn't handle all the tiny small details
> > > involved in specifying a CPU. All libvirty knows about are a set of CPU
> > > flag bits, but it knows nothing about 'level', 'family', and 'xlevel',
> > > but we would like to allow it to expose a Westmere-like CPU to the
> > > guest.
> > 
> > This is easily fixable in libvirt - so for the point of going discussion,
> > IMHO, we can assume libvirt will support level, family, xlevel, etc.
> > 
> And fill in all cpuid leafs by querying /dev/kvm when needed or, if TCG
> is used, replicating QEMU logic? And since QEMU should be usable without
> libvirt the same logic should be implemented in QEMU anyway.

I'm not referring to that. I'm saying that any data QEMU has in its
config file (/etc/qemu/target-x86_64.conf) should be represented
in the libvirt CPU XML. family, model, stepping, xlevel and
model_id are currently in QEMU CPU configs, but not in libvirt XML,
which is something we will fix. The other issues you mention are
completely independent of that.

Regards,
Daniel
Gleb Natapov March 12, 2012, 1:53 p.m. UTC | #23
On Mon, Mar 12, 2012 at 01:50:18PM +0000, Daniel P. Berrange wrote:
> On Mon, Mar 12, 2012 at 03:15:32PM +0200, Gleb Natapov wrote:
> > On Mon, Mar 12, 2012 at 01:04:19PM +0000, Daniel P. Berrange wrote:
> > > On Mon, Mar 12, 2012 at 09:52:27AM -0300, Eduardo Habkost wrote:
> > > > On Sun, Mar 11, 2012 at 09:12:58AM -0500, Anthony Liguori wrote:
> > > > > On 03/11/2012 08:27 AM, Gleb Natapov wrote:
> > > > > >On Sat, Mar 10, 2012 at 12:24:47PM -0600, Anthony Liguori wrote:
> > > > > >>Let's step back here.
> > > > > >>
> > > > > >>Why are you writing these patches?  It's probably not because you
> > > > > >>have a desire to say -cpu Westmere when you run QEMU on your laptop.
> > > > > >>I'd wager to say that no human has ever done that or that if they
> > > > > >>had, they did so by accident because they read documentation and
> > > > > >>thought they had to.
> > > > 
> > > > No, it's because libvirt doesn't handle all the tiny small details
> > > > involved in specifying a CPU. All libvirty knows about are a set of CPU
> > > > flag bits, but it knows nothing about 'level', 'family', and 'xlevel',
> > > > but we would like to allow it to expose a Westmere-like CPU to the
> > > > guest.
> > > 
> > > This is easily fixable in libvirt - so for the point of going discussion,
> > > IMHO, we can assume libvirt will support level, family, xlevel, etc.
> > > 
> > And fill in all cpuid leafs by querying /dev/kvm when needed or, if TCG
> > is used, replicating QEMU logic? And since QEMU should be usable without
> > libvirt the same logic should be implemented in QEMU anyway.
> 
> I'm not refering to that. I'm saying that any data QEMU has in its
> config file (/etc/qemu/target-x86_64.conf) should be represented
> in the libvirt CPU XML. family, model, stepping, xlevel and
> model_id are currently in QEMU CPU configs, but not in libvirt XML,
> which is something we will fix. The other issues you mention are
> completely independant of that.
> 
Eduardo is going to extend what can be configured in /etc/qemu/target-x86_64.conf
and make CPU model names per machine type. What QEMU has now is not
good enough. I doubt libvirt's goal is to be as bad as QEMU :)

--
			Gleb.
Daniel P. Berrangé March 12, 2012, 1:55 p.m. UTC | #24
On Mon, Mar 12, 2012 at 03:53:38PM +0200, Gleb Natapov wrote:
> On Mon, Mar 12, 2012 at 01:50:18PM +0000, Daniel P. Berrange wrote:
> > On Mon, Mar 12, 2012 at 03:15:32PM +0200, Gleb Natapov wrote:
> > > On Mon, Mar 12, 2012 at 01:04:19PM +0000, Daniel P. Berrange wrote:
> > > > On Mon, Mar 12, 2012 at 09:52:27AM -0300, Eduardo Habkost wrote:
> > > > > On Sun, Mar 11, 2012 at 09:12:58AM -0500, Anthony Liguori wrote:
> > > > > > On 03/11/2012 08:27 AM, Gleb Natapov wrote:
> > > > > > >On Sat, Mar 10, 2012 at 12:24:47PM -0600, Anthony Liguori wrote:
> > > > > > >>Let's step back here.
> > > > > > >>
> > > > > > >>Why are you writing these patches?  It's probably not because you
> > > > > > >>have a desire to say -cpu Westmere when you run QEMU on your laptop.
> > > > > > >>I'd wager to say that no human has ever done that or that if they
> > > > > > >>had, they did so by accident because they read documentation and
> > > > > > >>thought they had to.
> > > > > 
> > > > > No, it's because libvirt doesn't handle all the tiny small details
> > > > > involved in specifying a CPU. All libvirty knows about are a set of CPU
> > > > > flag bits, but it knows nothing about 'level', 'family', and 'xlevel',
> > > > > but we would like to allow it to expose a Westmere-like CPU to the
> > > > > guest.
> > > > 
> > > > This is easily fixable in libvirt - so for the point of going discussion,
> > > > IMHO, we can assume libvirt will support level, family, xlevel, etc.
> > > > 
> > > And fill in all cpuid leafs by querying /dev/kvm when needed or, if TCG
> > > is used, replicating QEMU logic? And since QEMU should be usable without
> > > libvirt the same logic should be implemented in QEMU anyway.
> > 
> > I'm not refering to that. I'm saying that any data QEMU has in its
> > config file (/etc/qemu/target-x86_64.conf) should be represented
> > in the libvirt CPU XML. family, model, stepping, xlevel and
> > model_id are currently in QEMU CPU configs, but not in libvirt XML,
> > which is something we will fix. The other issues you mention are
> > completely independant of that.
> > 
> Eduardo is going to extend what can be configured in /etc/qemu/target-x86_64.conf
> and make CPU models name per machine type. What QEMU has now is not
> good enough. I doubt libvirt goal is to be as bad as QEMU :)

Of course not - libvirt will obviously be extended to cope with this
too.


Daniel
Gleb Natapov March 12, 2012, 2:01 p.m. UTC | #25
On Mon, Mar 12, 2012 at 01:55:34PM +0000, Daniel P. Berrange wrote:
> On Mon, Mar 12, 2012 at 03:53:38PM +0200, Gleb Natapov wrote:
> > On Mon, Mar 12, 2012 at 01:50:18PM +0000, Daniel P. Berrange wrote:
> > > On Mon, Mar 12, 2012 at 03:15:32PM +0200, Gleb Natapov wrote:
> > > > On Mon, Mar 12, 2012 at 01:04:19PM +0000, Daniel P. Berrange wrote:
> > > > > On Mon, Mar 12, 2012 at 09:52:27AM -0300, Eduardo Habkost wrote:
> > > > > > On Sun, Mar 11, 2012 at 09:12:58AM -0500, Anthony Liguori wrote:
> > > > > > > On 03/11/2012 08:27 AM, Gleb Natapov wrote:
> > > > > > > >On Sat, Mar 10, 2012 at 12:24:47PM -0600, Anthony Liguori wrote:
> > > > > > > >>Let's step back here.
> > > > > > > >>
> > > > > > > >>Why are you writing these patches?  It's probably not because you
> > > > > > > >>have a desire to say -cpu Westmere when you run QEMU on your laptop.
> > > > > > > >>I'd wager to say that no human has ever done that or that if they
> > > > > > > >>had, they did so by accident because they read documentation and
> > > > > > > >>thought they had to.
> > > > > > 
> > > > > > No, it's because libvirt doesn't handle all the tiny small details
> > > > > > involved in specifying a CPU. All libvirty knows about are a set of CPU
> > > > > > flag bits, but it knows nothing about 'level', 'family', and 'xlevel',
> > > > > > but we would like to allow it to expose a Westmere-like CPU to the
> > > > > > guest.
> > > > > 
> > > > > This is easily fixable in libvirt - so for the point of going discussion,
> > > > > IMHO, we can assume libvirt will support level, family, xlevel, etc.
> > > > > 
> > > > And fill in all cpuid leafs by querying /dev/kvm when needed or, if TCG
> > > > is used, replicating QEMU logic? And since QEMU should be usable without
> > > > libvirt the same logic should be implemented in QEMU anyway.
> > > 
> > > I'm not refering to that. I'm saying that any data QEMU has in its
> > > config file (/etc/qemu/target-x86_64.conf) should be represented
> > > in the libvirt CPU XML. family, model, stepping, xlevel and
> > > model_id are currently in QEMU CPU configs, but not in libvirt XML,
> > > which is something we will fix. The other issues you mention are
> > > completely independant of that.
> > > 
> > Eduardo is going to extend what can be configured in /etc/qemu/target-x86_64.conf
> > and make CPU models name per machine type. What QEMU has now is not
> > good enough. I doubt libvirt goal is to be as bad as QEMU :)
> 
> Of course not - libvirt will obviously be extended to cope with this
> too
> 
So your goal is to follow closely what QEMU does? Fine by me, but then
QEMU design decisions in this area should not rely on libvirt (as in
"this is libvirt's job").

--
			Gleb.
Anthony Liguori March 12, 2012, 2:48 p.m. UTC | #26
On 03/11/2012 11:16 AM, Gleb Natapov wrote:
> On Sun, Mar 11, 2012 at 10:33:15AM -0500, Anthony Liguori wrote:
>> On 03/11/2012 09:56 AM, Gleb Natapov wrote:
>>> On Sun, Mar 11, 2012 at 09:12:58AM -0500, Anthony Liguori wrote:
>>>> -cpu best wouldn't solve this.  You need a read/write configuration
>>>> file where QEMU probes the available CPU and records it to be used
>>>> for the lifetime of the VM.
>>> That what I thought too, but this shouldn't be the case (Avi's idea).
>>> We need two things: 1) CPU model config should be per machine type.
>>> 2) QEMU should refuse to start if it cannot create cpu exactly as
>>> specified by model config.
>>
>> This would either mean:
>>
>> A. pc-1.1 uses -cpu best with a fixed mask for 1.1
>>
>> B. pc-1.1 hardcodes Westmere or some other family
>>
> This would mean neither A nor B. May be it wasn't clear but I didn't talk
> about -cpu best above. I am talking about any CPU model with fixed meaning
> (not host or best which are host cpu dependant). Lets take Nehalem for
> example (just to move from Westmere :)). Currently it has level=2. Eduardo
> wants to fix it to be 11, but old guests, installed with -cpu Nehalem,
> should see the same CPU exactly. How do you do it? Have different
> Nehalem definition for pc-1.0 (which level=2) and pc-1.1 (with level=11).
> Lets get back to Westmere. It actually has level=11, but that's only
> expose another problem. Kernel 3.3 and qemu-1.1 combo will support
> architectural PMU which is exposed in cpuid leaf 10. We do not want
> guests installed with -cpu Westmere and qemu-1.0 to see architectural
> PMU after upgrade. How do you do it? Have different Westmere definitions
> for pc-1.0 (does not report PMU) and pc-1.1 (reports PMU). What happens
> if you'll try to run qemu-1.1 -cpu Westmere on Kernel<  3.3 (without
> PMU support)? Qemu will fail to start.

So, you're essentially proposing that -cpu Westmere becomes a machine option and 
that we let the machines interpret it as they see fit?

So --machine pc-1.0,cpu=Westmere would result in something different than 
--machine pc-1.1,cpu=Westmere?

That's something pretty different than what we're doing today.  I think that we
would have a single CPUX86 object and that part of the pc initialization process
would be to create an appropriately configured CPUx86 object.

Regards,

Anthony Liguori
Eduardo Habkost March 12, 2012, 3:16 p.m. UTC | #27
On Mon, Mar 12, 2012 at 09:48:11AM -0500, Anthony Liguori wrote:
> On 03/11/2012 11:16 AM, Gleb Natapov wrote:
> >On Sun, Mar 11, 2012 at 10:33:15AM -0500, Anthony Liguori wrote:
> >>On 03/11/2012 09:56 AM, Gleb Natapov wrote:
> >>>On Sun, Mar 11, 2012 at 09:12:58AM -0500, Anthony Liguori wrote:
> >>>>-cpu best wouldn't solve this.  You need a read/write configuration
> >>>>file where QEMU probes the available CPU and records it to be used
> >>>>for the lifetime of the VM.
> >>>That what I thought too, but this shouldn't be the case (Avi's idea).
> >>>We need two things: 1) CPU model config should be per machine type.
> >>>2) QEMU should refuse to start if it cannot create cpu exactly as
> >>>specified by model config.
> >>
> >>This would either mean:
> >>
> >>A. pc-1.1 uses -cpu best with a fixed mask for 1.1
> >>
> >>B. pc-1.1 hardcodes Westmere or some other family
> >>
> >This would mean neither A nor B. May be it wasn't clear but I didn't talk
> >about -cpu best above. I am talking about any CPU model with fixed meaning
> >(not host or best which are host cpu dependant). Lets take Nehalem for
> >example (just to move from Westmere :)). Currently it has level=2. Eduardo
> >wants to fix it to be 11, but old guests, installed with -cpu Nehalem,
> >should see the same CPU exactly. How do you do it? Have different
> >Nehalem definition for pc-1.0 (which level=2) and pc-1.1 (with level=11).
> >Lets get back to Westmere. It actually has level=11, but that's only
> >expose another problem. Kernel 3.3 and qemu-1.1 combo will support
> >architectural PMU which is exposed in cpuid leaf 10. We do not want
> >guests installed with -cpu Westmere and qemu-1.0 to see architectural
> >PMU after upgrade. How do you do it? Have different Westmere definitions
> >for pc-1.0 (does not report PMU) and pc-1.1 (reports PMU). What happens
> >if you'll try to run qemu-1.1 -cpu Westmere on Kernel<  3.3 (without
> >PMU support)? Qemu will fail to start.
> 
> So, you're essentially proposing that -cpu Westmere becomes a machine
> option and that we let the machines interpret it as they see fit?
> 
> So --machine pc-1.0,cpu=Westmere would result in something different
> than --machine pc-1.1,cpu=Westmere?

Exactly.

> That's something pretty different than what we're doing today.  I
> think that we would have a single CPUX86 object and that part of the
> pc initialization process was to create an appropriately configured
> CPUx86 object.

Yes, that's different from what we're doing today, and it has to be
fixed.
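
As a sketch only (Python, not QEMU code), the two requirements quoted above could look like this: the CPU definition is looked up per machine type, and creation fails instead of silently dropping a feature. WESTMERE, CPUModelError and create_cpu are hypothetical names, and only the architectural-PMU difference is modelled.

```python
# Hypothetical per-machine-type Westmere definitions; only the PMU bit differs here.
WESTMERE = {
    "pc-1.0": {"level": 11, "arch_pmu": False},
    "pc-1.1": {"level": 11, "arch_pmu": True},
}


class CPUModelError(Exception):
    """Raised when the host cannot provide the CPU exactly as defined."""


def create_cpu(machine, host_has_arch_pmu):
    """Build the Westmere CPU for this machine type, or refuse to start."""
    cpu = dict(WESTMERE[machine])
    if cpu["arch_pmu"] and not host_has_arch_pmu:
        raise CPUModelError(
            f"{machine}: Westmere requires architectural PMU support on the host"
        )
    return cpu


# A guest installed on pc-1.0 sees the same CPU before and after a host upgrade.
print(create_cpu("pc-1.0", host_has_arch_pmu=False))
print(create_cpu("pc-1.0", host_has_arch_pmu=True))
```

Calling create_cpu("pc-1.1", host_has_arch_pmu=False) raises CPUModelError, which corresponds to the "Qemu will fail to start" case in the quoted text.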

(And, BTW, I'm really worried about your proposal that machine-types
would suddenly disappear when using -nodefconfig in case we decide to
move machine-type data to an external file one day. Design decisions
aside, this would break an interface that management tools already have
today.)
Andreas Färber March 12, 2012, 3:49 p.m. UTC | #28
Am 11.03.2012 17:16, schrieb Gleb Natapov:
> On Sun, Mar 11, 2012 at 10:33:15AM -0500, Anthony Liguori wrote:
>> On 03/11/2012 09:56 AM, Gleb Natapov wrote:
>>> On Sun, Mar 11, 2012 at 09:12:58AM -0500, Anthony Liguori wrote:
>>>> -cpu best wouldn't solve this.  You need a read/write configuration
>>>> file where QEMU probes the available CPU and records it to be used
>>>> for the lifetime of the VM.
>>> That what I thought too, but this shouldn't be the case (Avi's idea).
>>> We need two things: 1) CPU model config should be per machine type.
>>> 2) QEMU should refuse to start if it cannot create cpu exactly as
>>> specified by model config.
>>
>> This would either mean:
>>
>> A. pc-1.1 uses -cpu best with a fixed mask for 1.1
>>
>> B. pc-1.1 hardcodes Westmere or some other family
>>
> This would mean neither A nor B. May be it wasn't clear but I didn't talk
> about -cpu best above. I am talking about any CPU model with fixed meaning
> (not host or best which are host cpu dependant). Lets take Nehalem for
> example (just to move from Westmere :)). Currently it has level=2. Eduardo
> wants to fix it to be 11, but old guests, installed with -cpu Nehalem,
> should see the same CPU exactly. How do you do it? Have different
> Nehalem definition for pc-1.0 (which level=2) and pc-1.1 (with level=11).
> Lets get back to Westmere. It actually has level=11, but that's only
> expose another problem. Kernel 3.3 and qemu-1.1 combo will support
> architectural PMU which is exposed in cpuid leaf 10. We do not want
> guests installed with -cpu Westmere and qemu-1.0 to see architectural
> PMU after upgrade. How do you do it? Have different Westmere definitions
> for pc-1.0 (does not report PMU) and pc-1.1 (reports PMU). What happens
> if you'll try to run qemu-1.1 -cpu Westmere on Kernel < 3.3 (without
> PMU support)? Qemu will fail to start.

This sounds pretty much like what Liu Jinsong and Jan are discussing in
the TSC thread on qemu-devel. (cc'ing)

IMO interpreting an explicit -cpu parameter depending on -M would be
wrong. Changing the default CPU based on -M is fine with me. For an
explicit argument we would need Westmere-1.0, analogous to pc-1.0. Then the
user gets what the user asks for, without unexpected magic.

Note that on my qom-cpu-wip branch [1] (that I hope to have cleaned up
and sent out by tomorrow), all built-in CPUs become statically
registered QOM types. The external definitions that get passed in via
-cpudef become dynamically registered QOM types; I took care to allow
overriding existing classes with the specified -cpudef fields (but
untested). Setting family, level, etc. for -cpu is done on the X86CPU
object instance. [2]
What I don't have yet are QOM properties to set the fields from, e.g.,
machine code, but those should be fairly easy to add.

Andreas

[1] http://repo.or.cz/w/qemu/afaerber.git/shortlog/refs/heads/qom-cpu-wip

[2]
http://repo.or.cz/w/qemu/afaerber.git/commit/8a6ede101a2722b790489989f21cad38d3e41fb5
Eduardo Habkost March 12, 2012, 4:50 p.m. UTC | #29
On Mon, Mar 12, 2012 at 04:49:47PM +0100, Andreas Färber wrote:
> Am 11.03.2012 17:16, schrieb Gleb Natapov:
> > On Sun, Mar 11, 2012 at 10:33:15AM -0500, Anthony Liguori wrote:
> >> On 03/11/2012 09:56 AM, Gleb Natapov wrote:
> >>> On Sun, Mar 11, 2012 at 09:12:58AM -0500, Anthony Liguori wrote:
> >>>> -cpu best wouldn't solve this.  You need a read/write configuration
> >>>> file where QEMU probes the available CPU and records it to be used
> >>>> for the lifetime of the VM.
> >>> That what I thought too, but this shouldn't be the case (Avi's idea).
> >>> We need two things: 1) CPU model config should be per machine type.
> >>> 2) QEMU should refuse to start if it cannot create cpu exactly as
> >>> specified by model config.
> >>
> >> This would either mean:
> >>
> >> A. pc-1.1 uses -cpu best with a fixed mask for 1.1
> >>
> >> B. pc-1.1 hardcodes Westmere or some other family
> >>
> > This would mean neither A nor B. May be it wasn't clear but I didn't talk
> > about -cpu best above. I am talking about any CPU model with fixed meaning
> > (not host or best which are host cpu dependant). Lets take Nehalem for
> > example (just to move from Westmere :)). Currently it has level=2. Eduardo
> > wants to fix it to be 11, but old guests, installed with -cpu Nehalem,
> > should see the same CPU exactly. How do you do it? Have different
> > Nehalem definition for pc-1.0 (which level=2) and pc-1.1 (with level=11).
> > Lets get back to Westmere. It actually has level=11, but that's only
> > expose another problem. Kernel 3.3 and qemu-1.1 combo will support
> > architectural PMU which is exposed in cpuid leaf 10. We do not want
> > guests installed with -cpu Westmere and qemu-1.0 to see architectural
> > PMU after upgrade. How do you do it? Have different Westmere definitions
> > for pc-1.0 (does not report PMU) and pc-1.1 (reports PMU). What happens
> > if you'll try to run qemu-1.1 -cpu Westmere on Kernel < 3.3 (without
> > PMU support)? Qemu will fail to start.
> 
> This sounds pretty much like what Liu Jinsong and Jan are discussing in
> the TSC thread on qemu-devel. (cc'ing)

I'll look for that thread. Thanks!

> 
> IMO interpreting an explicit -cpu parameter depending on -M would be
> wrong. Changing the default CPU based on -M is fine with me. For an
> explicit argument we would need Westmere-1.0 analog to pc-1.0. Then the
> user gets what the user asks for, without unexpected magic.

It is not unexpected magic. It would be a documented mechanism:
"-cpu Nehalem-1.0" and "-cpu Nehalem-1.1" would have the same meaning
every time, with any machine-type, but "-cpu Nehalem" would be an alias,
whose meaning depends on the machine-type.

Otherwise we would be stuck with a broken "Nehalem" model forever, and
we don't want that.
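
A minimal sketch of the alias scheme described above, assuming two lookup
tables that do not exist in QEMU under these names (CPU_MODELS and
CPU_ALIASES are hypothetical). The level values are the ones mentioned
earlier in the thread for Nehalem; everything else is illustrative.

```python
# Hypothetical table of versioned models. A versioned name always means
# the same thing, regardless of machine type.
CPU_MODELS = {
    "Nehalem-1.0": {"level": 2},
    "Nehalem-1.1": {"level": 11},
}

# Hypothetical per-machine-type aliases: the bare name "Nehalem" maps to
# a different versioned model depending on -M.
CPU_ALIASES = {
    "pc-1.0": {"Nehalem": "Nehalem-1.0"},
    "pc-1.1": {"Nehalem": "Nehalem-1.1"},
}


def resolve_cpu_model(machine_type, cpu_name):
    """Return the versioned model name that cpu_name stands for."""
    if cpu_name in CPU_MODELS:
        # Already a versioned name: the machine type is irrelevant.
        return cpu_name
    try:
        return CPU_ALIASES[machine_type][cpu_name]
    except KeyError:
        raise ValueError(
            f"unknown CPU model {cpu_name!r} for machine {machine_type!r}"
        )
```

With these tables, "-M pc-1.0 -cpu Nehalem" would keep resolving to the old
definition, while "-cpu Nehalem-1.1" would mean the same thing everywhere.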

> Note that on my qom-cpu-wip branch [1] (that I hope to have cleaned up
> and sent out by tomorrow), all built-in CPUs become statically
> registered QOM types. The external definitions that get passed in via
> -cpudef become dynamically registered QOM types; I took care to allow
> overriding existing classes with the specified -cpudef fields (but
> untested). Setting family, level, etc. for -cpu is done on the X86CPU
> object instance. [2]
> What I don't have yet are QOM properties to set the fields from, e.g.,
> machine code, but those should be fairly easy to add.

Sounds interesting. I will have to take a look at the code to understand how it
affects what's being discussed in this thread.

> 
> Andreas
> 
> [1] http://repo.or.cz/w/qemu/afaerber.git/shortlog/refs/heads/qom-cpu-wip
> 
> [2]
> http://repo.or.cz/w/qemu/afaerber.git/commit/8a6ede101a2722b790489989f21cad38d3e41fb5
> 
> -- 
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
> GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Andreas Färber March 12, 2012, 5:41 p.m. UTC | #30
On 12.03.2012 17:50, Eduardo Habkost wrote:
> On Mon, Mar 12, 2012 at 04:49:47PM +0100, Andreas Färber wrote:
>> Am 11.03.2012 17:16, schrieb Gleb Natapov:
>>> On Sun, Mar 11, 2012 at 10:33:15AM -0500, Anthony Liguori wrote:
>>>> On 03/11/2012 09:56 AM, Gleb Natapov wrote:
>>>>> On Sun, Mar 11, 2012 at 09:12:58AM -0500, Anthony Liguori wrote:
>>>>>> -cpu best wouldn't solve this.  You need a read/write configuration
>>>>>> file where QEMU probes the available CPU and records it to be used
>>>>>> for the lifetime of the VM.
>>>>> That what I thought too, but this shouldn't be the case (Avi's idea).
>>>>> We need two things: 1) CPU model config should be per machine type.
>>>>> 2) QEMU should refuse to start if it cannot create cpu exactly as
>>>>> specified by model config.
>>>>
>>>> This would either mean:
>>>>
>>>> A. pc-1.1 uses -cpu best with a fixed mask for 1.1
>>>>
>>>> B. pc-1.1 hardcodes Westmere or some other family
>>>>
>>> This would mean neither A nor B. May be it wasn't clear but I didn't talk
>>> about -cpu best above. I am talking about any CPU model with fixed meaning
>>> (not host or best which are host cpu dependant). Lets take Nehalem for
>>> example (just to move from Westmere :)). Currently it has level=2. Eduardo
>>> wants to fix it to be 11, but old guests, installed with -cpu Nehalem,
>>> should see the same CPU exactly. How do you do it? Have different
>>> Nehalem definition for pc-1.0 (which level=2) and pc-1.1 (with level=11).
>>> Lets get back to Westmere. It actually has level=11, but that's only
>>> expose another problem. Kernel 3.3 and qemu-1.1 combo will support
>>> architectural PMU which is exposed in cpuid leaf 10. We do not want
>>> guests installed with -cpu Westmere and qemu-1.0 to see architectural
>>> PMU after upgrade. How do you do it? Have different Westmere definitions
>>> for pc-1.0 (does not report PMU) and pc-1.1 (reports PMU). What happens
>>> if you'll try to run qemu-1.1 -cpu Westmere on Kernel < 3.3 (without
>>> PMU support)? Qemu will fail to start.
[...]
>> IMO interpreting an explicit -cpu parameter depending on -M would be
>> wrong. Changing the default CPU based on -M is fine with me. For an
>> explicit argument we would need Westmere-1.0 analog to pc-1.0. Then the
>> user gets what the user asks for, without unexpected magic.
> 
> It is not unexpected magic. It would be a documented mechanism:
> "-cpu Nehalem-1.0" and "-cpu Nehalem-1.1" would have the same meaning
> every time, with any machine-type, but "-cpu Nehalem" would be an alias,
> whose meaning depends on the machine-type.
> 
> Otherwise we would be stuck with a broken "Nehalem" model forever, and
> we don't want that.

Not quite what I meant: In light of QOM we should be able to instantiate
a CPU based on its name and optional parameters IMO. No dependency on
the machine, please. An alias sure, but if the user explicitly says -cpu
Nehalem then on 1.1 it should always be an alias to Nehalem-1.1 whether
the machine is -M pc-0.15 or pc. If no -cpu was specified by the user,
then choosing a default of Nehalem-1.0 for pc-1.0 is fine. Just trying
to keep separate things separate here.

Also keep in mind linux-user. There's no concept of a machine there, but
there's a cpu_copy() function used for forking that tries to re-create
the CPU based on its model. So currently cpu_*_init(env->cpu_model_str)
needs to be able to recreate an identical CPU through the central code
path, without access to a QEMUMachine.

(I'd really like to fix this "reentrancy" but we can't just trivially
memcpy().)

Andreas
Peter Maydell March 12, 2012, 5:47 p.m. UTC | #31
On 12 March 2012 17:41, Andreas Färber <afaerber@suse.de> wrote:
> Also keep in mind linux-user. There's no concept of a machine there, but
> there's a cpu_copy() function used for forking that tries to re-create
> the CPU based on its model.

Incidentally, do you know why the linux-user code calls cpu_reset on
the newly copied CPU state but only for TARGET_I386/SPARC/PPC ? That
looks very odd to me...

-- PMM
Gleb Natapov March 12, 2012, 5:52 p.m. UTC | #32
On Mon, Mar 12, 2012 at 06:41:06PM +0100, Andreas Färber wrote:
> Am 12.03.2012 17:50, schrieb Eduardo Habkost:
> > On Mon, Mar 12, 2012 at 04:49:47PM +0100, Andreas Färber wrote:
> >> Am 11.03.2012 17:16, schrieb Gleb Natapov:
> >>> On Sun, Mar 11, 2012 at 10:33:15AM -0500, Anthony Liguori wrote:
> >>>> On 03/11/2012 09:56 AM, Gleb Natapov wrote:
> >>>>> On Sun, Mar 11, 2012 at 09:12:58AM -0500, Anthony Liguori wrote:
> >>>>>> -cpu best wouldn't solve this.  You need a read/write configuration
> >>>>>> file where QEMU probes the available CPU and records it to be used
> >>>>>> for the lifetime of the VM.
> >>>>> That what I thought too, but this shouldn't be the case (Avi's idea).
> >>>>> We need two things: 1) CPU model config should be per machine type.
> >>>>> 2) QEMU should refuse to start if it cannot create cpu exactly as
> >>>>> specified by model config.
> >>>>
> >>>> This would either mean:
> >>>>
> >>>> A. pc-1.1 uses -cpu best with a fixed mask for 1.1
> >>>>
> >>>> B. pc-1.1 hardcodes Westmere or some other family
> >>>>
> >>> This would mean neither A nor B. May be it wasn't clear but I didn't talk
> >>> about -cpu best above. I am talking about any CPU model with fixed meaning
> >>> (not host or best which are host cpu dependant). Lets take Nehalem for
> >>> example (just to move from Westmere :)). Currently it has level=2. Eduardo
> >>> wants to fix it to be 11, but old guests, installed with -cpu Nehalem,
> >>> should see the same CPU exactly. How do you do it? Have different
> >>> Nehalem definition for pc-1.0 (which level=2) and pc-1.1 (with level=11).
> >>> Lets get back to Westmere. It actually has level=11, but that's only
> >>> expose another problem. Kernel 3.3 and qemu-1.1 combo will support
> >>> architectural PMU which is exposed in cpuid leaf 10. We do not want
> >>> guests installed with -cpu Westmere and qemu-1.0 to see architectural
> >>> PMU after upgrade. How do you do it? Have different Westmere definitions
> >>> for pc-1.0 (does not report PMU) and pc-1.1 (reports PMU). What happens
> >>> if you'll try to run qemu-1.1 -cpu Westmere on Kernel < 3.3 (without
> >>> PMU support)? Qemu will fail to start.
> [...]
> >> IMO interpreting an explicit -cpu parameter depending on -M would be
> >> wrong. Changing the default CPU based on -M is fine with me. For an
> >> explicit argument we would need Westmere-1.0 analog to pc-1.0. Then the
> >> user gets what the user asks for, without unexpected magic.
> > 
> > It is not unexpected magic. It would be a documented mechanism:
> > "-cpu Nehalem-1.0" and "-cpu Nehalem-1.1" would have the same meaning
> > every time, with any machine-type, but "-cpu Nehalem" would be an alias,
> > whose meaning depends on the machine-type.
> > 
> > Otherwise we would be stuck with a broken "Nehalem" model forever, and
> > we don't want that.
> 
> Not quite what I meant: In light of QOM we should be able to instantiate
> a CPU based on its name and optional parameters IMO. No dependency on
> the machine, please. An alias sure, but if the user explicitly says -cpu
> Nehalem then on 1.1 it should always be an alias to Nehalem-1.1 whether
> the machine is -M pc-0.15 or pc. If no -cpu was specified by the user,
> then choosing a default of Nehalem-1.0 for pc-1.0 is fine. Just trying
> to keep separate things separate here.
> 
Those things are not separate. If the user gets Nehalem-1.1 with -M
pc-0.15 on qemu-1.1, they will get a broken VM. A user who passes -M
pc-0.15 should get exactly the same machine they would get by running
qemu-0.15. The guest should not be able to tell the difference. This is
the reason -M exists; anything else is a bug.

--
			Gleb.
Andreas Färber March 12, 2012, 5:53 p.m. UTC | #33
On 12.03.2012 18:47, Peter Maydell wrote:
> On 12 March 2012 17:41, Andreas Färber <afaerber@suse.de> wrote:
>> Also keep in mind linux-user. There's no concept of a machine there, but
>> there's a cpu_copy() function used for forking that tries to re-create
>> the CPU based on its model.
> 
> Incidentally, do you know why the linux-user code calls cpu_reset on
> the newly copied CPU state but only for TARGET_I386/SPARC/PPC ? That
> looks very odd to me...

Incidentally for i386 I do: cpu_reset() is intentionally not part of
cpu_init() there because afterwards the machine or something sets
whether this CPU is a "bsp" (Board Support Package? ;)) and only then
resets it.

For ppc and sparc I don't know but I'd be surprised if it's necessary
for ppc... Alex?

Andreas
Gleb Natapov March 12, 2012, 5:55 p.m. UTC | #34
On Mon, Mar 12, 2012 at 06:53:27PM +0100, Andreas Färber wrote:
> Am 12.03.2012 18:47, schrieb Peter Maydell:
> > On 12 March 2012 17:41, Andreas Färber <afaerber@suse.de> wrote:
> >> Also keep in mind linux-user. There's no concept of a machine there, but
> >> there's a cpu_copy() function used for forking that tries to re-create
> >> the CPU based on its model.
> > 
> > Incidentally, do you know why the linux-user code calls cpu_reset on
> > the newly copied CPU state but only for TARGET_I386/SPARC/PPC ? That
> > looks very odd to me...
> 
> Incidentally for i386 I do: cpu_reset() is intentionally not part of
> cpu_init() there because afterwards the machine or something sets
> whether this CPU is a "bsp" (Board Support Package? ;)) and only then
Bootstrap Processor, I guess :)

> resets it.
> 
> For ppc and sparc I don't know but I'd be surprised if it's necessary
> for ppc... Alex?
> 
> Andreas
> 

--
			Gleb.
Alexander Graf March 12, 2012, 5:59 p.m. UTC | #35
On 12.03.2012, at 18:53, Andreas Färber wrote:

> Am 12.03.2012 18:47, schrieb Peter Maydell:
>> On 12 March 2012 17:41, Andreas Färber <afaerber@suse.de> wrote:
>>> Also keep in mind linux-user. There's no concept of a machine there, but
>>> there's a cpu_copy() function used for forking that tries to re-create
>>> the CPU based on its model.
>> 
>> Incidentally, do you know why the linux-user code calls cpu_reset on
>> the newly copied CPU state but only for TARGET_I386/SPARC/PPC ? That
>> looks very odd to me...
> 
> Incidentally for i386 I do: cpu_reset() is intentionally not part of
> cpu_init() there because afterwards the machine or something sets
> whether this CPU is a "bsp" (Board Support Package? ;)) and only then
> resets it.
> 
> For ppc and sparc I don't know but I'd be surprised if it's necessary
> for ppc... Alex?

Phew - no idea. Does git blame know more there? :)


Alex
Eduardo Habkost March 12, 2012, 6:30 p.m. UTC | #36
On Mon, Mar 12, 2012 at 06:41:06PM +0100, Andreas Färber wrote:
> Am 12.03.2012 17:50, schrieb Eduardo Habkost:
> > On Mon, Mar 12, 2012 at 04:49:47PM +0100, Andreas Färber wrote:
[...]
> >> IMO interpreting an explicit -cpu parameter depending on -M would be
> >> wrong. Changing the default CPU based on -M is fine with me. For an
> >> explicit argument we would need Westmere-1.0 analog to pc-1.0. Then the
> >> user gets what the user asks for, without unexpected magic.
> > 
> > It is not unexpected magic. It would be a documented mechanism:
> > "-cpu Nehalem-1.0" and "-cpu Nehalem-1.1" would have the same meaning
> > every time, with any machine-type, but "-cpu Nehalem" would be an alias,
> > whose meaning depends on the machine-type.
> > 
> > Otherwise we would be stuck with a broken "Nehalem" model forever, and
> > we don't want that.
> 
> Not quite what I meant: In light of QOM we should be able to instantiate
> a CPU based on its name and optional parameters IMO. No dependency on
> the machine, please. An alias sure, but if the user explicitly says -cpu
> Nehalem then on 1.1 it should always be an alias to Nehalem-1.1 whether
> the machine is -M pc-0.15 or pc. If no -cpu was specified by the user,
> then choosing a default of Nehalem-1.0 for pc-1.0 is fine. Just trying
> to keep separate things separate here.

As Gleb explained, things aren't really separate:
"qemu-1.1 -M pc-1.0 -cpu Nehalem" should result in the same machine as
"qemu-1.0 -cpu Nehalem"; no difference should be visible to the guest.
We can't simply make incompatible changes.

> 
> Also keep in mind linux-user. There's no concept of a machine there, but
> there's a cpu_copy() function used for forking that tries to re-create
> the CPU based on its model. So currently cpu_*_init(env->cpu_model_str)
> needs to be able to recreate an identical CPU through the central code
> path, without access to a QEMUMachine.

So just translate the CPU alias given to "-cpu" to the true CPU model
name as soon as possible, in the command-line-handling code, so the rest
of the code always sees the true CPU model name.

After all, the need for the aliases is a command-line interface
compatibility problem, so it makes sense to handle it in the
command-line-handling code.
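
To illustrate why early translation would keep linux-user's cpu_copy()
working without a machine, here is a rough sketch. The names
parse_cpu_option, cpu_init and cpu_copy below are simplified Python
stand-ins, not QEMU's real functions, and the tables are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical tables: (machine type, alias) -> versioned model name,
# and versioned model name -> CPUID level.
ALIASES = {
    ("pc-1.0", "Nehalem"): "Nehalem-1.0",
    ("pc-1.1", "Nehalem"): "Nehalem-1.1",
}
MODELS = {"Nehalem-1.0": 2, "Nehalem-1.1": 11}


@dataclass
class Cpu:
    model_str: str  # always a versioned ("true") model name
    level: int


def parse_cpu_option(machine_type, cpu_arg):
    """Command-line handling: the only place that knows about aliases."""
    return ALIASES.get((machine_type, cpu_arg), cpu_arg)


def cpu_init(model_str):
    """Create a CPU from a versioned model name; no machine involved."""
    return Cpu(model_str, MODELS[model_str])


def cpu_copy(cpu):
    """Re-create an identical CPU from the stored model string alone."""
    return cpu_init(cpu.model_str)
```

Because only the resolved name is stored in the CPU state, the copy path
never needs to know which machine type (if any) was in use.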
Anthony Liguori March 12, 2012, 6:42 p.m. UTC | #37
On 03/12/2012 01:30 PM, Eduardo Habkost wrote:
> On Mon, Mar 12, 2012 at 06:41:06PM +0100, Andreas Färber wrote:
>> Am 12.03.2012 17:50, schrieb Eduardo Habkost:
>>> On Mon, Mar 12, 2012 at 04:49:47PM +0100, Andreas Färber wrote:
> [...]
>>>> IMO interpreting an explicit -cpu parameter depending on -M would be
>>>> wrong. Changing the default CPU based on -M is fine with me. For an
>>>> explicit argument we would need Westmere-1.0 analog to pc-1.0. Then the
>>>> user gets what the user asks for, without unexpected magic.
>>>
>>> It is not unexpected magic. It would be a documented mechanism:
>>> "-cpu Nehalem-1.0" and "-cpu Nehalem-1.1" would have the same meaning
>>> every time, with any machine-type, but "-cpu Nehalem" would be an alias,
>>> whose meaning depends on the machine-type.
>>>
>>> Otherwise we would be stuck with a broken "Nehalem" model forever, and
>>> we don't want that.
>>
>> Not quite what I meant: In light of QOM we should be able to instantiate
>> a CPU based on its name and optional parameters IMO. No dependency on
>> the machine, please. An alias sure, but if the user explicitly says -cpu
>> Nehalem then on 1.1 it should always be an alias to Nehalem-1.1 whether
>> the machine is -M pc-0.15 or pc. If no -cpu was specified by the user,
>> then choosing a default of Nehalem-1.0 for pc-1.0 is fine. Just trying
>> to keep separate things separate here.
>
> As Gleb explained, things aren't really separated:
> "qemu-1.1 -M pc-1.0 -cpu Nehalem" should result in the same machine as
> "qemu-1.0 -cpu Nehalem", no difference should be visible to the guest.
> simply make incompatible changes.

So this is easy.  CPUs need to be qdev/QOM, and the various cpuid settings
need to be done through qdev properties.

Then you can just add globals to the machine definition.  No different than
what we do with virtio-blk.
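
A rough sketch of that idea, assuming CPU settings are plain properties
and each machine type carries a table of overrides. The table names and
the "pmu" property are hypothetical; the level and PMU values follow the
Nehalem and Westmere examples given earlier in the thread.

```python
# Property defaults as the newest machine type would see them.
CPU_DEFAULTS = {
    "Nehalem": {"level": 11},
    "Westmere": {"level": 11, "pmu": True},
}

# Hypothetical per-machine-type "globals": property overrides that keep
# older machine types looking exactly as they did when first released.
MACHINE_GLOBALS = {
    "pc-1.0": {
        "Nehalem": {"level": 2},
        "Westmere": {"pmu": False},
    },
    "pc-1.1": {},
}


def create_cpu(machine_type, model):
    """Return the effective property set for model on machine_type."""
    props = dict(CPU_DEFAULTS[model])  # copy so the defaults stay intact
    props.update(MACHINE_GLOBALS[machine_type].get(model, {}))
    return props
```

The CPU definition itself stays in one place; compatibility lives in the
machine definition, as it does for other devices.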

Regards,

Anthony Liguori

>
>>
>> Also keep in mind linux-user. There's no concept of a machine there, but
>> there's a cpu_copy() function used for forking that tries to re-create
>> the CPU based on its model. So currently cpu_*_init(env->cpu_model_str)
>> needs to be able to recreate an identical CPU through the central code
>> path, without access to a QEMUMachine.
>
> So just translate the CPU alias given to "-cpu" to the true CPU model
> name as soon as possible, at the command-line-handling code, so the rest
> of the code always see the true CPU model name.
>
> After all, the need to make the aliases is a command-line interface
> compatibility problem, so it makes sense to handle this at the
> command-line-handling code.
>
Itamar Heim March 12, 2012, 6:53 p.m. UTC | #38
On 03/11/2012 05:33 PM, Anthony Liguori wrote:
> On 03/11/2012 09:56 AM, Gleb Natapov wrote:
>> On Sun, Mar 11, 2012 at 09:12:58AM -0500, Anthony Liguori wrote:
>>> -cpu best wouldn't solve this. You need a read/write configuration
>>> file where QEMU probes the available CPU and records it to be used
>>> for the lifetime of the VM.
>> That what I thought too, but this shouldn't be the case (Avi's idea).
>> We need two things: 1) CPU model config should be per machine type.
>> 2) QEMU should refuse to start if it cannot create cpu exactly as
>> specified by model config.
>
> This would either mean:
>
> A. pc-1.1 uses -cpu best with a fixed mask for 1.1
>
> B. pc-1.1 hardcodes Westmere or some other family
>
> (A) would imply a different CPU if you moved the machine from one system
> to another. I would think this would be very problematic from a user's
> perspective.
>
> (B) would imply that we had to choose the least common denominator which
> is essentially what we do today with qemu64. If you want to just switch
> qemu64 to Conroe, I don't think that's a huge difference from what we
> have today.
>
>>> It's a discussion about how we handle this up and down the stack.
>>>
>>> The question is who should define and manage CPU compatibility.
>>> Right now QEMU does to a certain degree, libvirt discards this and
>>> does it's own thing, and VDSM/ovirt-engine assume that we're
>>> providing something and has built a UI around it.
>> If we want QEMU to be usable without management layer then QEMU should
>> provide stable CPU models. Stable in a sense that qemu, kernel or CPU
>> upgrade does not change what guest sees.
>
> We do this today by exposing -cpu qemu64 by default. If all you're
> advocating is doing -cpu Conroe by default, that's fine.
>
> But I fail to see where this fits into the larger discussion here. The
> problem to solve is: I want to use the largest possible subset of CPU
> features available uniformly throughout my datacenter.
>
> QEMU and libvirt have single node views so they cannot solve this
> problem on their own. Whether that subset is a generic Westmere-like
> processor that never existed IRL or a specific Westmere processor seems
> like a decision that should be made by the datacenter level manager with
> the node level view.
>
> If I have a homogeneous environments of Xeon 7540, I would probably like
> to see a Xeon 7540 in my guest. Doesn't it make sense to enable the
> management tool to make this decision?

literally, or in capabilities?
literally would mean allowing the host cpu name to be passed through and
exposed to the guest?
if in capabilities, how would it differ from choosing the correct "cpu
family"?
it wouldn't really be identical anyway (say, the number of cores/sockets,
and no VT for the time being)

oVirt allows setting the "cpu family" per cluster. assume tomorrow it could
do so in an even more granular way.
it could also do it automatically, based on the subset of flags common to
all hosts - but would it really make sense to expose a set of capabilities
which doesn't exist in the real world? the real world, iiuc, is pretty much
aligned with the cpu families, which users understand.
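
The automatic variant mentioned above can be sketched as follows: take
the flags common to all hosts, then pick the newest family that fits
inside that set instead of exposing the raw intersection. The family
list and its flag sets are illustrative assumptions, not oVirt's or
libvirt's actual definitions.

```python
from functools import reduce

# Hypothetical families, ordered oldest to newest, with illustrative
# (incomplete) flag sets.
FAMILIES = [
    ("Conroe", {"sse2", "ssse3"}),
    ("Nehalem", {"sse2", "ssse3", "sse4_2", "popcnt"}),
    ("Westmere", {"sse2", "ssse3", "sse4_2", "popcnt", "aes"}),
]


def common_flags(hosts):
    """Flags present on every host; hosts maps host name -> flag set."""
    return reduce(set.intersection, (set(f) for f in hosts.values()))


def best_family(hosts):
    """Newest family whose flags are all available on every host."""
    flags = common_flags(hosts)
    best = None
    for name, required in FAMILIES:
        if required <= flags:
            best = name  # later entries are newer, so keep overwriting
    return best
```

Snapping to a named family means the guest sees a CPU that matches real
hardware, even when the hosts' common flags include a few extras.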
Anthony Liguori March 12, 2012, 7:01 p.m. UTC | #39
On 03/12/2012 01:53 PM, Itamar Heim wrote:
> On 03/11/2012 05:33 PM, Anthony Liguori wrote:
>> On 03/11/2012 09:56 AM, Gleb Natapov wrote:
>>> On Sun, Mar 11, 2012 at 09:12:58AM -0500, Anthony Liguori wrote:
>>>> -cpu best wouldn't solve this. You need a read/write configuration
>>>> file where QEMU probes the available CPU and records it to be used
>>>> for the lifetime of the VM.
>>> That what I thought too, but this shouldn't be the case (Avi's idea).
>>> We need two things: 1) CPU model config should be per machine type.
>>> 2) QEMU should refuse to start if it cannot create cpu exactly as
>>> specified by model config.
>>
>> This would either mean:
>>
>> A. pc-1.1 uses -cpu best with a fixed mask for 1.1
>>
>> B. pc-1.1 hardcodes Westmere or some other family
>>
>> (A) would imply a different CPU if you moved the machine from one system
>> to another. I would think this would be very problematic from a user's
>> perspective.
>>
>> (B) would imply that we had to choose the least common denominator which
>> is essentially what we do today with qemu64. If you want to just switch
>> qemu64 to Conroe, I don't think that's a huge difference from what we
>> have today.
>>
>>>> It's a discussion about how we handle this up and down the stack.
>>>>
>>>> The question is who should define and manage CPU compatibility.
>>>> Right now QEMU does to a certain degree, libvirt discards this and
>>>> does it's own thing, and VDSM/ovirt-engine assume that we're
>>>> providing something and has built a UI around it.
>>> If we want QEMU to be usable without management layer then QEMU should
>>> provide stable CPU models. Stable in a sense that qemu, kernel or CPU
>>> upgrade does not change what guest sees.
>>
>> We do this today by exposing -cpu qemu64 by default. If all you're
>> advocating is doing -cpu Conroe by default, that's fine.
>>
>> But I fail to see where this fits into the larger discussion here. The
>> problem to solve is: I want to use the largest possible subset of CPU
>> features available uniformly throughout my datacenter.
>>
>> QEMU and libvirt have single node views so they cannot solve this
>> problem on their own. Whether that subset is a generic Westmere-like
>> processor that never existed IRL or a specific Westmere processor seems
>> like a decision that should be made by the datacenter level manager with
>> the node level view.
>>
>> If I have a homogeneous environments of Xeon 7540, I would probably like
>> to see a Xeon 7540 in my guest. Doesn't it make sense to enable the
>> management tool to make this decision?
>
> literally, or in capabilities?
> literally means you want to allow passing the cpu name to be exposed to the guest?

Yes, literally.

Xen exposes the host CPUID to the guest for PV.  Both PHYP (IBM System P) and 
z/VM (IBM System Z) do the same.

What does VMware expose to guests by default?

> if in capabilities, how would it differ from choosing the correct "cpu family"?
> it wouldn't really be identical (say, number of cores/sockets and no VT for time
> being)

It's a trade-off.  From a RAS perspective, it's helpful to have information 
about the host available in the guest.

If you're already exposing a compatible family, exposing the actual processor 
seems to be worth the extra effort.

> ovirt allows to set "cpu family" per cluster. assume tomorrow it could do it an
> even more granular way.
> it could also do it automatically based on subset of flags on all hosts - but
> would it really make sense to expose a set of capabilities which doesn't exist
> in the real world (which iiuc, is pretty much aligned with the cpu families?),
> that users understand?

No, I think the lesson we've learned in QEMU (the hard way) is that exposing a 
CPU that never existed will cause something to break.  Oftentimes, that 
something is glibc or GCC, which tend to fail in rather epic fashion.

Regards,

Anthony Liguori

Itamar Heim March 12, 2012, 7:12 p.m. UTC | #40
On 03/12/2012 09:01 PM, Anthony Liguori wrote:
> On 03/12/2012 01:53 PM, Itamar Heim wrote:
>> On 03/11/2012 05:33 PM, Anthony Liguori wrote:
>>> On 03/11/2012 09:56 AM, Gleb Natapov wrote:
>>>> On Sun, Mar 11, 2012 at 09:12:58AM -0500, Anthony Liguori wrote:
>>>>> -cpu best wouldn't solve this. You need a read/write configuration
>>>>> file where QEMU probes the available CPU and records it to be used
>>>>> for the lifetime of the VM.
>>>> That what I thought too, but this shouldn't be the case (Avi's idea).
>>>> We need two things: 1) CPU model config should be per machine type.
>>>> 2) QEMU should refuse to start if it cannot create cpu exactly as
>>>> specified by model config.
>>>
>>> This would either mean:
>>>
>>> A. pc-1.1 uses -cpu best with a fixed mask for 1.1
>>>
>>> B. pc-1.1 hardcodes Westmere or some other family
>>>
>>> (A) would imply a different CPU if you moved the machine from one system
>>> to another. I would think this would be very problematic from a user's
>>> perspective.
>>>
>>> (B) would imply that we had to choose the least common denominator which
>>> is essentially what we do today with qemu64. If you want to just switch
>>> qemu64 to Conroe, I don't think that's a huge difference from what we
>>> have today.
>>>
>>>>> It's a discussion about how we handle this up and down the stack.
>>>>>
>>>>> The question is who should define and manage CPU compatibility.
>>>>> Right now QEMU does to a certain degree, libvirt discards this and
>>>>> does it's own thing, and VDSM/ovirt-engine assume that we're
>>>>> providing something and has built a UI around it.
>>>> If we want QEMU to be usable without management layer then QEMU should
>>>> provide stable CPU models. Stable in a sense that qemu, kernel or CPU
>>>> upgrade does not change what guest sees.
>>>
>>> We do this today by exposing -cpu qemu64 by default. If all you're
>>> advocating is doing -cpu Conroe by default, that's fine.
>>>
>>> But I fail to see where this fits into the larger discussion here. The
>>> problem to solve is: I want to use the largest possible subset of CPU
>>> features available uniformly throughout my datacenter.
>>>
>>> QEMU and libvirt have single node views so they cannot solve this
>>> problem on their own. Whether that subset is a generic Westmere-like
>>> processor that never existed IRL or a specific Westmere processor seems
>>> like a decision that should be made by the datacenter level manager with
>>> the node level view.
>>>
>>> If I have a homogeneous environments of Xeon 7540, I would probably like
>>> to see a Xeon 7540 in my guest. Doesn't it make sense to enable the
>>> management tool to make this decision?
>>
>> literally, or in capabilities?
>> literally means you want to allow passing the cpu name to be exposed
>> to the guest?
>
> Yes, literally.
>
> Xen exposes the host CPUID to the guest for PV. Both PHYP (IBM System P)
> and z/VM (IBM System Z) do the same.
>
> What does VMware expose to guests by default?
>
>> if in capabilities, how would it differ from choosing the correct "cpu
>> family"?
>> it wouldn't really be identical (say, number of cores/sockets and no
>> VT for time
>> being)
>
> It's a trade off. From a RAS perspective, it's helpful to have
> information about the host available in the guest.
>
> If you're already exposing a compatible family, exposing the actual
> processor seems to be worth the extra effort.

only if the entire cluster has (and will keep having?) identical cpus.
or if you don't care about live migration, I guess, which could be the
case for clouds. then again, I'm not sure a cloud provider would want to
expose the physical cpu to the tenant.

>
>> ovirt allows to set "cpu family" per cluster. assume tomorrow it could
>> do it an
>> even more granular way.
>> it could also do it automatically based on subset of flags on all
>> hosts - but
>> would it really make sense to expose a set of capabilities which
>> doesn't exist
>> in the real world (which iiuc, is pretty much aligned with the cpu
>> families?),
>> that users understand?
>
> No, I think the lesson we've learned in QEMU (the hard way) is that
> exposing a CPU that never existed will cause something to break. Often
> times, that something is glibc or GCC which tends to be rather epic in
> terms of failure.

good to hear - I think this is the important part.
so from that perspective, cpu families sound like the right abstraction
for the general use case to me.
for oVirt, we could improve by supporting smaller/dynamic subsets of
migration domains rather than the current clusters,
and it sounds like you would want to see "expose host cpu for
non-migratable guests, or for identical clusters".
Anthony Liguori March 12, 2012, 7:50 p.m. UTC | #41
On 03/12/2012 02:12 PM, Itamar Heim wrote:
> On 03/12/2012 09:01 PM, Anthony Liguori wrote:
>>
>> It's a trade off. From a RAS perspective, it's helpful to have
>> information about the host available in the guest.
>>
>> If you're already exposing a compatible family, exposing the actual
>> processor seems to be worth the extra effort.
>
> only if the entire cluster is (and will be?) identical cpu.

At least in my experience, this isn't unusual.

> or if you don't care about live migration i guess, which could be hte case for
> clouds, then again, not sure a cloud provider would want to expose the physical
> cpu to the tenant.

Depends on the type of cloud you're building, I guess.

>>> ovirt allows to set "cpu family" per cluster. assume tomorrow it could
>>> do it an
>>> even more granular way.
>>> it could also do it automatically based on subset of flags on all
>>> hosts - but
>>> would it really make sense to expose a set of capabilities which
>>> doesn't exist
>>> in the real world (which iiuc, is pretty much aligned with the cpu
>>> families?),
>>> that users understand?
>>
>> No, I think the lesson we've learned in QEMU (the hard way) is that
>> exposing a CPU that never existed will cause something to break. Often
>> times, that something is glibc or GCC which tends to be rather epic in
>> terms of failure.
>
> good to hear - I think this is the important part.
> so from that perspective, cpu families sounds the right abstraction for general
> use case to me.
> for ovirt, could improve on smaller/dynamic subsets of migration domains rather
> than current clusters
> and sounds like you would want to see "expose host cpu for non migratable
> guests, or for identical clusters".

Would it be possible to have a "best available" option in oVirt-engine that 
would assume that all processors are of the same class and fail an attempt to 
add something that's an older class?

I think that most people probably would start with "best available" and then 
after adding a node fails, revisit the decision and start lowering the minimum 
CPU family (I'm assuming that it's possible to modify the CPU family over time).

From a QEMU perspective, I think that means having per-family CPU options and 
then Alex's '-cpu best'.  But presumably it's also necessary to be able to 
figure out in virsh capabilities what '-cpu best' would be.
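
A sketch of that "best available" policy, assuming a totally ordered
list of CPU families. The Cluster class and its methods are hypothetical
and are not oVirt-engine's API; only the family names are real QEMU
model names.

```python
# Families ordered oldest to newest (assumed ordering for this sketch).
FAMILY_ORDER = ["Conroe", "Penryn", "Nehalem", "Westmere"]


def rank(family):
    return FAMILY_ORDER.index(family)


class Cluster:
    def __init__(self, family):
        self.family = family  # minimum CPU family every host must meet
        self.hosts = {}

    def add_host(self, name, host_family):
        """Refuse hosts that are older than the cluster's family."""
        if rank(host_family) < rank(self.family):
            raise ValueError(
                f"{name}: {host_family} is older than cluster family "
                f"{self.family}"
            )
        self.hosts[name] = host_family

    def lower_family(self, family):
        """Explicit admin decision to accept an older family."""
        if rank(family) > rank(self.family):
            raise ValueError("lower_family cannot raise the family")
        self.family = family
```

Adding an older node fails loudly, and the administrator then decides
whether to lower the cluster's family and retry.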

Regards,

Anthony Liguori

Itamar Heim March 12, 2012, 8 p.m. UTC | #42
On 03/12/2012 09:50 PM, Anthony Liguori wrote:
> On 03/12/2012 02:12 PM, Itamar Heim wrote:
>> On 03/12/2012 09:01 PM, Anthony Liguori wrote:
>>>
>>> It's a trade off. From a RAS perspective, it's helpful to have
>>> information about the host available in the guest.
>>>
>>> If you're already exposing a compatible family, exposing the actual
>>> processor seems to be worth the extra effort.
>>
>> only if the entire cluster is (and will be?) identical cpu.
>
> At least in my experience, this isn't unusual.
>
>> or if you don't care about live migration i guess, which could be hte
>> case for
>> clouds, then again, not sure a cloud provider would want to expose the
>> physical
>> cpu to the tenant.
>
> Depends on the type of cloud you're building, I guess.
>
>>>> ovirt allows to set "cpu family" per cluster. assume tomorrow it could
>>>> do it an
>>>> even more granular way.
>>>> it could also do it automatically based on subset of flags on all
>>>> hosts - but
>>>> would it really make sense to expose a set of capabilities which
>>>> doesn't exist
>>>> in the real world (which iiuc, is pretty much aligned with the cpu
>>>> families?),
>>>> that users understand?
>>>
>>> No, I think the lesson we've learned in QEMU (the hard way) is that
>>> exposing a CPU that never existed will cause something to break. Often
>>> times, that something is glibc or GCC which tends to be rather epic in
>>> terms of failure.
>>
>> good to hear - I think this is the important part.
>> so from that perspective, cpu families sounds the right abstraction
>> for general
>> use case to me.
>> for ovirt, could improve on smaller/dynamic subsets of migration
>> domains rather
>> than current clusters
>> and sounds like you would want to see "expose host cpu for non migratable
>> guests, or for identical clusters".
>
> Would it be possible to have a "best available" option in oVirt-engine
> that would assume that all processors are of the same class and fail an
> attempt to add something that's an older class?
>
> I think that most people probably would start with "best available" and
> then after adding a node fails, revisit the decision and start lowering
> the minimum CPU family (I'm assuming that it's possible to modify the
> CPU family over time).

iirc, the original implementation for cpu family was to start with an empty 
family, and use the best match from the first host added to the cluster.
not sure if that's still the behavior though.
worth mentioning the cpu families in ovirt have a 'sort' field to allow 
starting from best available.
and you can change the cpu family of a cluster today as well (with some 
validation that the hosts in the cluster match up)
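
A rough sketch of the "start from best available" idea, assuming each family carries a sort value and a set of required CPU flags. The table contents, the flag sets and the function name are illustrative assumptions, not oVirt's actual data model:

```python
# Hypothetical family table: (name, sort rank, required CPU flags).
# A higher sort rank means a newer, more capable family.
# The flag sets are simplified for illustration only.
FAMILIES = [
    ("Conroe", 1, {"sse2", "ssse3"}),
    ("Penryn", 2, {"sse2", "ssse3", "sse4_1"}),
    ("Nehalem", 3, {"sse2", "ssse3", "sse4_1", "sse4_2", "popcnt"}),
]


def best_family(hosts_flags):
    """Return the highest-sorted family whose flags every host provides."""
    by_rank = sorted(FAMILIES, key=lambda family: family[1], reverse=True)
    for name, _sort, required in by_rank:
        # A family is usable only if each host has all required flags.
        if all(required <= flags for flags in hosts_flags):
            return name
    return None
```

Adding an older host to the cluster would then simply lower the result of `best_family`, which is the point where the engine could either refuse the host or ask to lower the cluster's family.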

>
>  From a QEMU perspective, I think that means having per-family CPU
> options and then Alex's '-cpu best'. But presumably it's also necessary
> to be able to figure out in virsh capabilities what '-cpu best' would be.

if sticking to cpu families, updating the config with name/priority of 
the families twice a year (or by user) seems good enough to me...

>
> Regards,
>
> Anthony Liguori
>
Ayal Baron March 12, 2012, 8:19 p.m. UTC | #43
----- Original Message -----
> On 03/12/2012 02:12 PM, Itamar Heim wrote:
> > On 03/12/2012 09:01 PM, Anthony Liguori wrote:
> >>
> >> It's a trade off. From a RAS perspective, it's helpful to have
> >> information about the host available in the guest.
> >>
> >> If you're already exposing a compatible family, exposing the
> >> actual
> >> processor seems to be worth the extra effort.
> >
> > only if the entire cluster is (and will be?) identical cpu.
> 
> At least in my experience, this isn't unusual.

I can definitely see places choosing homogeneous hardware and upgrading every few years. 
Giving them max capabilities for their cluster sounds logical to me.
Esp. cloud providers.

> 
> > or if you don't care about live migration i guess, which could be
> > hte case for
> > clouds, then again, not sure a cloud provider would want to expose
> > the physical
> > cpu to the tenant.
> 
> Depends on the type of cloud you're building, I guess.
> 

Wouldn't this affect a simple startup of a VM with a different CPU (if the motherboard changed as well, that could cause reactivation issues in Windows and fun things like that)?
Even if the cloud doesn't support live migration, they don't pin VMs to a host. A user could shut a VM down and start it up again, and it might run on a different node. The ephemeral storage would be lost, but persistent image storage could still contain OS info pertinent to the CPU type.
Btw, I don't see why internally they would not support live migration, even if only for when they need to put a host in maintenance etc. Live storage migration could take care of the ephemeral storage if that's the issue (albeit taking a million years to finish).

> >>> ovirt allows to set "cpu family" per cluster. assume tomorrow it
> >>> could
> >>> do it an
> >>> even more granular way.
> >>> it could also do it automatically based on subset of flags on all
> >>> hosts - but
> >>> would it really make sense to expose a set of capabilities which
> >>> doesn't exist
> >>> in the real world (which iiuc, is pretty much aligned with the
> >>> cpu
> >>> families?),
> >>> that users understand?
> >>
> >> No, I think the lesson we've learned in QEMU (the hard way) is
> >> that
> >> exposing a CPU that never existed will cause something to break.
> >> Often
> >> times, that something is glibc or GCC which tends to be rather
> >> epic in
> >> terms of failure.
> >
> > good to hear - I think this is the important part.
> > so from that perspective, cpu families sounds the right abstraction
> > for general
> > use case to me.
> > for ovirt, could improve on smaller/dynamic subsets of migration
> > domains rather
> > than current clusters
> > and sounds like you would want to see "expose host cpu for non
> > migratable
> > guests, or for identical clusters".
> 
> Would it be possible to have a "best available" option in
> oVirt-engine that
> would assume that all processors are of the same class and fail an
> attempt to
> add something that's an older class?
> 
> I think that most people probably would start with "best available"
> and then
> after adding a node fails, revisit the decision and start lowering
> the minimum
> CPU family (I'm assuming that it's possible to modify the CPU family
> over time).

But then they'd already have VMs that were started with the better CPU, and now it'd change under their feet? Or would we start them up with the best and fail to start these VMs on the newly added hosts which have the lower cpu family/type?

> 
>  From a QEMU perspective, I think that means having per-family CPU
>  options and
> then Alex's '-cpu best'.  But presumably it's also necessary to be
> able to
> figure out in virsh capabilities what '-cpu best' would be.
> 
> Regards,
> 
> Anthony Liguori
> 
Itamar Heim March 13, 2012, 8:32 a.m. UTC | #44
On 03/12/2012 10:19 PM, Ayal Baron wrote:
>
>
> ----- Original Message -----
>> On 03/12/2012 02:12 PM, Itamar Heim wrote:
>>> On 03/12/2012 09:01 PM, Anthony Liguori wrote:
>>>>
>>>> It's a trade off. From a RAS perspective, it's helpful to have
>>>> information about the host available in the guest.
>>>>
>>>> If you're already exposing a compatible family, exposing the
>>>> actual
>>>> processor seems to be worth the extra effort.
>>>
>>> only if the entire cluster is (and will be?) identical cpu.
>>
>> At least in my experience, this isn't unusual.
>
> I can definitely see places choosing homogeneous hardware and upgrading every few years.
> Giving them max capabilities for their cluster sounds logical to me.
> Esp. cloud providers.

they would get the same performance as from the matching "cpu family".
the only difference would be whether the guest knows the name of the host cpu.

>
>>
>>> or if you don't care about live migration i guess, which could be
>>> hte case for
>>> clouds, then again, not sure a cloud provider would want to expose
>>> the physical
>>> cpu to the tenant.
>>
>> Depends on the type of cloud you're building, I guess.
>>
>
> Wouldn't this affect a simple startup of a VM with a different CPU (if motherboard changed as well cause reactivation issues in windows and fun things like that)?

that's an interesting question, I have to assume this works though, 
since we didn't see issues with changing the cpu family for guests so far.
Eduardo Habkost March 13, 2012, 2:53 p.m. UTC | #45
So, trying to summarize what was discussed in the call:

On Mon, Mar 12, 2012 at 10:08:10AM -0300, Eduardo Habkost wrote:
> > Let's say we moved CPU definitions to /usr/share/qemu/cpu-models.xml.
> > 
> > Obviously, we'd want a command line option to be able to change that
> > location so we'd introduce -cpu-models PATH.
> > 
> > But we want all of our command line options to be settable by the
> > global configuration file so we would have a cpu-model=PATH to the
> > configuration file.
> > 
> > But why hard code a path when we can just set the default path in the
> > configuration file so let's avoid hard coding and just put
> > cpu-models=/usr/share/qemu/cpu-models.xml in the default
> > configuration file.
> 
> We wouldn't do the above.
> 
> -nodefconfig should disable the loading of files on /etc, but it
> shouldn't disable loading internal non-configurable data that we just
> happened to choose to store outside the qemu binary because it makes
> development easier.

The statement above is the one not fulfilled by the compromise solution:
-nodefconfig would really disable the loading of files on /usr/share.

> 
> Really, the requirement of a "default configuration file" is a problem
> by itself. Qemu should not require a default configuration file to work,
> and it shouldn't require users to copy the default configuration file to
> change options from the default.

The statement above is only partly true. The default configuration file
would be still needed, but if defaults are stored on /usr/share, I will
be happy with it.

My main problem was with the need to _copy_ or edit a non-trivial
default config file. If the not-often-edited defaults/templates are
easily found on /usr/share to be used with -readconfig, I will be happy
with this solution, even if -nodefconfig disables the files on
/usr/share.

> 
> Doing this would make it impossible to deploy fixes to users if we evern
> find out that the default configuration file had a serious bug. What if
> a bug in our default configuration file has a serious security
> implication?

The answer to this is: if the broken templates/defaults are on
/usr/share, it would be easy to deploy the fix.

So, the compromise solution is:

- We can move some configuration data (especially defaults/templates)
  to /usr/share (machine-types and CPU models could go there). This
  way we can easily deploy fixes to the defaults, if necessary.
- To reuse Qemu models, or machine-types, and not define everything from
  scratch, libvirt will have to use something like:
  "-nodefconfig -readconfig /usr/share/qemu/cpu-models-x86.conf"
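
As a minimal sketch of the second item, this is how a management layer might assemble such a command line. The helper name is hypothetical, and the cpu-models path is the location proposed in this mail, not a file that is known to exist:

```python
def qemu_argv(binary, extra_configs):
    """Build a QEMU argv that skips default config files and then
    reads only the explicitly listed config files, in order."""
    argv = [binary, "-nodefconfig"]
    for path in extra_configs:
        # Each explicitly requested file is passed with -readconfig.
        argv += ["-readconfig", path]
    return argv


if __name__ == "__main__":
    # The path below is the one proposed above (an assumption).
    print(qemu_argv("qemu-system-x86_64",
                    ["/usr/share/qemu/cpu-models-x86.conf"]))
```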


(the item below is not something discussed on the call, just something I
want to add)

To make this work better, we can allow users (humans or machines) to
"extend" CPU models on the config file, instead of having to define
everything from scratch. So, on /etc (or on a libvirt-generated config)
we could have something like:

=============
[cpu]
base_cpudef = Nehalem
add_features = "vmx"
=============

Then, as long as /usr/share/cpu-models-x86.conf is loaded, the user will
be able to reuse the Nehalem CPU model provided by Qemu.
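
To make the intended semantics concrete, here is a small sketch of how such an "extended" definition could be resolved. The `[cpu]` / `base_cpudef` / `add_features` syntax is only the proposal above, and `BASE_MODELS` with its flag set is a made-up stand-in for whatever the shared cpu-models file would provide:

```python
import configparser

# Stand-in for the models loaded from the shared cpu-models file.
# The flag set is illustrative, not the real Nehalem definition.
BASE_MODELS = {"Nehalem": {"sse4_1", "sse4_2", "popcnt"}}


def resolve_cpu(config_text, base_models=BASE_MODELS):
    """Return the feature set of a CPU defined as base model + extras."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    section = parser["cpu"]
    base = section["base_cpudef"]
    if base not in base_models:
        # The base model file was not loaded: treat as a config error.
        raise KeyError(f"base CPU model {base!r} is not loaded")
    # Accept a quoted, whitespace-separated feature list.
    extra = section.get("add_features", fallback="").strip('"').split()
    return base_models[base] | set(extra)
```

With the example config above, the result would be the base Nehalem flags plus `vmx`; if the base model is unknown, the lookup fails instead of silently defining an empty CPU.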

> 
> > 
> > But now when libvirt uses -nodefconfig, those models go away.
> > -nodefconfig means start QEMU in the most minimal state possible.
> > You get what you pay for if you use it.
> > 
> > We'll have the same problem with machine configuration files.  At
> > some point in time, -nodefconfig will make machine models disappear.
> 
> It shouldn't. Machine-types are defaults to be used as base, they are
> not user-provided configuration. And the fact that we decided to store
> some data outside of the Qemu binary is orthogonal the design decisions
> in the Qemu command-line and configuration interface.

So, this problem is solved if the defaults are easily found on
/usr/share.

We still have the backwards compatibility problem for pc-1.0, pc-1.1,
and so on. But that can be discussed later, when we actually move
machine-types to somewhere outside .c files.

> 
> As I said previously, requiring generation of opaque config files (and
> "copy the default config file and change it" is included on my
> definition of "generation of opaque config files") is poor design, IMO.
> I bet this even has an entry in some design anti-pattern catalog
> somewhere.

This problem is also solved if the defaults are deployed on /usr/share
and just reused/included by the config files on /etc.
Ayal Baron March 14, 2012, 12:11 a.m. UTC | #46
----- Original Message -----
> On 03/12/2012 10:19 PM, Ayal Baron wrote:
> >
> >
> > ----- Original Message -----
> >> On 03/12/2012 02:12 PM, Itamar Heim wrote:
> >>> On 03/12/2012 09:01 PM, Anthony Liguori wrote:
> >>>>
> >>>> It's a trade off. From a RAS perspective, it's helpful to have
> >>>> information about the host available in the guest.
> >>>>
> >>>> If you're already exposing a compatible family, exposing the
> >>>> actual
> >>>> processor seems to be worth the extra effort.
> >>>
> >>> only if the entire cluster is (and will be?) identical cpu.
> >>
> >> At least in my experience, this isn't unusual.
> >
> > I can definitely see places choosing homogeneous hardware and
> > upgrading every few years.
> > Giving them max capabilities for their cluster sounds logical to
> > me.
> > Esp. cloud providers.
> 
> they would get same performance as from the matching "cpu family".
> only difference would be if the guest known the name of the host cpu.
> 
> >
> >>
> >>> or if you don't care about live migration i guess, which could be
> >>> hte case for
> >>> clouds, then again, not sure a cloud provider would want to
> >>> expose
> >>> the physical
> >>> cpu to the tenant.
> >>
> >> Depends on the type of cloud you're building, I guess.
> >>
> >
> > Wouldn't this affect a simple startup of a VM with a different CPU
> > (if motherboard changed as well cause reactivation issues in
> > windows and fun things like that)?
> 
> that's an interesting question, I have to assume this works though,
> since we didn't see issues with changing the cpu family for guests so
> far.
> 

assumption... :)
I'd try changing twice in a row (run VM, stop, change family, restart VM, stop, change family, restart VM).
Gleb Natapov March 22, 2012, 9:32 a.m. UTC | #47
On Tue, Mar 13, 2012 at 11:53:19AM -0300, Eduardo Habkost wrote:
> So, trying to summarize what was discussed in the call:
> 
> On Mon, Mar 12, 2012 at 10:08:10AM -0300, Eduardo Habkost wrote:
> > > Let's say we moved CPU definitions to /usr/share/qemu/cpu-models.xml.
> > > 
> > > Obviously, we'd want a command line option to be able to change that
> > > location so we'd introduce -cpu-models PATH.
> > > 
> > > But we want all of our command line options to be settable by the
> > > global configuration file so we would have a cpu-model=PATH to the
> > > configuration file.
> > > 
> > > But why hard code a path when we can just set the default path in the
> > > configuration file so let's avoid hard coding and just put
> > > cpu-models=/usr/share/qemu/cpu-models.xml in the default
> > > configuration file.
> > 
> > We wouldn't do the above.
> > 
> > -nodefconfig should disable the loading of files on /etc, but it
> > shouldn't disable loading internal non-configurable data that we just
> > happened to choose to store outside the qemu binary because it makes
> > development easier.
> 
> The statement above is the one not fulfilled by the compromise solution:
> -nodefconfig would really disable the loading of files on /usr/share.
> 
What does this mean? Will -nodefconfig disable loading of bios.bin,
option roms, keymaps?

> > 
> > Really, the requirement of a "default configuration file" is a problem
> > by itself. Qemu should not require a default configuration file to work,
> > and it shouldn't require users to copy the default configuration file to
> > change options from the default.
> 
> The statement above is only partly true. The default configuration file
> would be still needed, but if defaults are stored on /usr/share, I will
> be happy with it.
> 
> My main problem was with the need to _copy_ or edit a non-trivial
> default config file. If the not-often-edited defaults/templates are
> easily found on /usr/share to be used with -readconfig, I will be happy
> with this solution, even if -nodefconfig disable the files on
> /usr/share.
> 
> > 
> > Doing this would make it impossible to deploy fixes to users if we evern
> > find out that the default configuration file had a serious bug. What if
> > a bug in our default configuration file has a serious security
> > implication?
> 
> The answer to this is: if the broken templates/defaults are on
> /usr/share, it would be easy to deploy the fix.
> 
> So, the compromise solution is:
> 
> - We can move some configuration data (especially defaults/templates)
>   to /usr/share (machine-types and CPU models could go there). This
>   way we can easily deploy fixes to the defaults, if necessary.
> - To reuse Qemu models, or machine-types, and not define everything from
>   scratch, libvirt will have to use something like:
>   "-nodefconfig -readconfig /usr/share/qemu/cpu-models-x86.conf"
> 
cpu-models-x86.conf is not a configuration file. It is a hardware
description file. QEMU should not lose capability just because you run
it with -nodefconfig. -nodefconfig means that QEMU does not create a
machine for you, but all the parts needed to create a machine that would
have been created without -nodefconfig are still present. Not being able
to create a Nehalem CPU after specifying -nodefconfig is the same as not
being able to create virtio-net, i.e. a bug.

> 
> (the item below is not something discussed on the call, just something I
> want to add)
> 
> To make this work better, we can allow users (humans or machines) to
> "extend" CPU models on the config file, instead of having to define
> everything from scratch. So, on /etc (or on a libvirt-generated config)
> we could have something like:
> 
> =============
> [cpu]
> base_cpudef = Nehalem
> add_features = "vmx"
> =============
> 
> Then, as long as /usr/share/cpu-models-x86.conf is loaded, the user will
> be able to reuse the Nehalem CPU model provided by Qemu.
> 
And if it will not be loaded?

> > 
> > > 
> > > But now when libvirt uses -nodefconfig, those models go away.
> > > -nodefconfig means start QEMU in the most minimal state possible.
> > > You get what you pay for if you use it.
> > > 
> > > We'll have the same problem with machine configuration files.  At
> > > some point in time, -nodefconfig will make machine models disappear.
> > 
> > It shouldn't. Machine-types are defaults to be used as base, they are
> > not user-provided configuration. And the fact that we decided to store
> > some data outside of the Qemu binary is orthogonal the design decisions
> > in the Qemu command-line and configuration interface.
> 
> So, this problem is solved if the defaults are easily found on
> /usr/share.
> 
What problem is solved and why are we mixing machine configuration files
and cpu configuration files? They are different and should be treated
differently. -nodefconfig exists only because there are no machine
configuration files currently. With machine configuration files
libvirt does not need -nodefconfig because it can create its own machine
file and make QEMU use it. So specifying machine file on QEMU's command
line implies -nodefconfig. The option itself loses its meaning and can be
dropped.

> We still have the backwards compatibility problem for pc-1.0, pc-1.1,
> and so on. But that can be discussed later, when we actually move
> machine-types to somewhere outside .c files.
> 
> > 
> > As I said previously, requiring generation of opaque config files (and
> > "copy the default config file and change it" is included on my
> > definition of "generation of opaque config files") is poor design, IMO.
> > I bet this even has an entry in some design anti-pattern catalog
> > somewhere.
> 
> This problem is also solved if the defaults are deployed on /usr/share
> and just reused/included by the config files on /etc.
> 

--
			Gleb.
Eduardo Habkost March 22, 2012, 1:31 p.m. UTC | #48
On Thu, Mar 22, 2012 at 11:32:44AM +0200, Gleb Natapov wrote:
> On Tue, Mar 13, 2012 at 11:53:19AM -0300, Eduardo Habkost wrote:
> > So, trying to summarize what was discussed in the call:
> > 
> > On Mon, Mar 12, 2012 at 10:08:10AM -0300, Eduardo Habkost wrote:
> > > > Let's say we moved CPU definitions to /usr/share/qemu/cpu-models.xml.
> > > > 
> > > > Obviously, we'd want a command line option to be able to change that
> > > > location so we'd introduce -cpu-models PATH.
> > > > 
> > > > But we want all of our command line options to be settable by the
> > > > global configuration file so we would have a cpu-model=PATH to the
> > > > configuration file.
> > > > 
> > > > But why hard code a path when we can just set the default path in the
> > > > configuration file so let's avoid hard coding and just put
> > > > cpu-models=/usr/share/qemu/cpu-models.xml in the default
> > > > configuration file.
> > > 
> > > We wouldn't do the above.
> > > 
> > > -nodefconfig should disable the loading of files on /etc, but it
> > > shouldn't disable loading internal non-configurable data that we just
> > > happened to choose to store outside the qemu binary because it makes
> > > development easier.
> > 
> > The statement above is the one not fulfilled by the compromise solution:
> > -nodefconfig would really disable the loading of files on /usr/share.
> > 
> What does this mean? Will -nodefconfig disable loading of bios.bin,
> option roms, keymaps?

Correcting myself: loading of _config_ files on /usr/share. ROM images
are opaque data to be presented to the guest somehow, just like a disk
image or kernel binary. But maybe keymaps will become "configuration"
someday, I really don't know.


> > > 
> > > Doing this would make it impossible to deploy fixes to users if we evern
> > > find out that the default configuration file had a serious bug. What if
> > > a bug in our default configuration file has a serious security
> > > implication?
> > 
> > The answer to this is: if the broken templates/defaults are on
> > /usr/share, it would be easy to deploy the fix.
> > 
> > So, the compromise solution is:
> > 
> > - We can move some configuration data (especially defaults/templates)
> >   to /usr/share (machine-types and CPU models could go there). This
> >   way we can easily deploy fixes to the defaults, if necessary.
> > - To reuse Qemu models, or machine-types, and not define everything from
> >   scratch, libvirt will have to use something like:
> >   "-nodefconfig -readconfig /usr/share/qemu/cpu-models-x86.conf"
> > 
> cpu-models-x86.conf is not a configuration file. It is hardware
> description file. QEMU should not lose capability just because you run
> it with -nodefconfig. -nodefconfig means that QEMU does not create
> machine for you, but all parts needed to create a machine that would have
> been created without -nodefconfig are still present. Not been able to
> create Nehalem CPU after specifying -nodefconfig is the same as not been
> able to create virtio-net i.e the bug.


The current design direction Qemu seems to be following is different
from that: hardware description is also considered "configuration" just
like actual machine configuration. Anthony, please correct me if I am
wrong.


What you propose is to have two levels of "configuration" (or
descriptions, or whatever we call it):

1) Hardware descriptions (or templates, or models, whatever we call it),
   that are not editable by the user (and not disabled by -nodefconfig).
   This may include CPU models, hardware emulation implemented in
   another language, machine-types, and other stuff that is part of
   "what Qemu always provides".
2) Actual machine configuration file, that is configurable and editable
   by the user, and normally loaded from /etc or from the command-line.
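
The two levels listed above could be modelled roughly as follows. This is only a sketch of the proposed split; the function and argument names are hypothetical and do not correspond to existing Qemu code:

```python
def load_layers(hardware_descriptions, default_config, user_configs,
                nodefconfig=False):
    """Return the files to load, in order, under the two-layer design."""
    # Level 1: hardware descriptions are always loaded,
    # even when -nodefconfig is given.
    loaded = list(hardware_descriptions)
    # Level 2: the default machine configuration is skipped
    # when -nodefconfig is given.
    if not nodefconfig:
        loaded.append(default_config)
    # Level 2: files named explicitly by the user are always read.
    loaded.extend(user_configs)
    return loaded
```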

The only problem is: the Qemu design simply doesn't have this
distinction today (well, it _has_, the only difference is that today
item (1) is almost completely coded inside tables in .c files). So if we
want to go in that direction we have to agree this will be part of the
Qemu design.

I am not strongly inclined either way. Both approaches look good to me,
we just have to decide where we are going, because we are in this
weird position today because we never decided it explicitly: libvirt
expected one thing, and we implemented something else.

On the one hand I think the "two-layer" design gives us more freedom to
move stuff outside .c files and change implementation details, and fits
how we have been doing until today with machine types and built-in CPU
models, keymaps, etc.

On the other hand, I think not having this distinction between "machine
configuration" and "hardware description" may be a good thing.

For example: today there are two different ways of enabling a feature on
a CPU: defining a new model, and adding a flag to "-cpu". And I think
this asymmetry shouldn't be there: you just need a good system to define
a CPU, a good set of defaults/templates, and a good system to base your
configuration on those defaults/templates, no need to have two separate
"CPU definition languages". Also, we wouldn't have to code things twice
if we want to load an internal hardware description externally _and_
make some hardware details configurable one day: we just do it once:
decide how it will be configured, and store it on the configuration
file.

Maybe the same thing can be applied to machine types: machine types can
become "configuration" too, but just a configuration template to be used
as base, and can be augmented/extended on the user-provided
configuration file.


> 
> > 
> > (the item below is not something discussed on the call, just something I
> > want to add)
> > 
> > To make this work better, we can allow users (humans or machines) to
> > "extend" CPU models on the config file, instead of having to define
> > everything from scratch. So, on /etc (or on a libvirt-generated config)
> > we could have something like:
> > 
> > =============
> > [cpu]
> > base_cpudef = Nehalem
> > add_features = "vmx"
> > =============
> > 
> > Then, as long as /usr/share/cpu-models-x86.conf is loaded, the user will
> > be able to reuse the Nehalem CPU model provided by Qemu.
> > 
> And if it will not be loaded?

If it is not loaded, it is a configuration mistake. If you are reusing
something defined somewhere, it would be your responsibility to make
sure the file where the model is defined is present. In most cases you
wouldn't use -nodefconfig and it would be shipped with Qemu and you
shouldn't worry. If you used -nodefconfig, you load the CPU models file
explicitly using -readconfig.


> > > > 
> > > > But now when libvirt uses -nodefconfig, those models go away.
> > > > -nodefconfig means start QEMU in the most minimal state possible.
> > > > You get what you pay for if you use it.
> > > > 
> > > > We'll have the same problem with machine configuration files.  At
> > > > some point in time, -nodefconfig will make machine models disappear.
> > > 
> > > It shouldn't. Machine-types are defaults to be used as base, they are
> > > not user-provided configuration. And the fact that we decided to store
> > > some data outside of the Qemu binary is orthogonal the design decisions
> > > in the Qemu command-line and configuration interface.
> > 
> > So, this problem is solved if the defaults are easily found on
> > /usr/share.
> > 
> What problem is solved and why are we mixing machine configuration files
> and cpu configuration files? They are different and should be treated
> differently.

This is the root of the disagreement, it seems: they are not considered
different today. Today cpudefs are on a config file inside /etc.  One
may argue that this was a mistake in the first place, but that's the
design we have today.


> -nodefconfig exists only because there is not machine
> configuration files currently. With machine configuration files
> libvirt does not need -nodefconfig because it can create its own machine
> file and make QEMU use it. So specifying machine file on QEMU's command
> line implies -nodefconfig. The option itself loses its meaning and can be
> dropped.

I think the approach today is:

- Qemu loads defaults from default config files;
- Machine description files would be given using -readconfig and they
  would _augment_ the defaults from the default config files.

With this interface, -nodefconfig is necessary and useful. But if we
consider that "configuration" = "machine description file", and a
machine description file would never include the CPU models themselves,
that would be a different approach.
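
A small sketch of the "defaults plus augmenting files" approach described above, which also shows why the CPU models vanish under -nodefconfig unless their file is read explicitly. The dictionaries stand in for parsed config files, and all names are hypothetical:

```python
def effective_cpudefs(default_files, readconfig_files, nodefconfig=False):
    """Merge CPU definitions from config files, later files winning."""
    cpudefs = {}
    # With -nodefconfig the default config files are not consulted at all.
    sources = ([] if nodefconfig else default_files) + readconfig_files
    for parsed in sources:
        # Files read later augment or override earlier definitions.
        cpudefs.update(parsed)
    return cpudefs
```

In this model, a caller that passes `nodefconfig=True` only sees a model such as Nehalem if the file defining it is listed again among the `readconfig_files`.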
Gleb Natapov March 22, 2012, 2:30 p.m. UTC | #49
On Thu, Mar 22, 2012 at 10:31:21AM -0300, Eduardo Habkost wrote:
> On Thu, Mar 22, 2012 at 11:32:44AM +0200, Gleb Natapov wrote:
> > On Tue, Mar 13, 2012 at 11:53:19AM -0300, Eduardo Habkost wrote:
> > > So, trying to summarize what was discussed in the call:
> > > 
> > > On Mon, Mar 12, 2012 at 10:08:10AM -0300, Eduardo Habkost wrote:
> > > > > Let's say we moved CPU definitions to /usr/share/qemu/cpu-models.xml.
> > > > > 
> > > > > Obviously, we'd want a command line option to be able to change that
> > > > > location so we'd introduce -cpu-models PATH.
> > > > > 
> > > > > But we want all of our command line options to be settable by the
> > > > > global configuration file so we would have a cpu-model=PATH to the
> > > > > configuration file.
> > > > > 
> > > > > But why hard code a path when we can just set the default path in the
> > > > > configuration file so let's avoid hard coding and just put
> > > > > cpu-models=/usr/share/qemu/cpu-models.xml in the default
> > > > > configuration file.
> > > > 
> > > > We wouldn't do the above.
> > > > 
> > > > -nodefconfig should disable the loading of files on /etc, but it
> > > > shouldn't disable loading internal non-configurable data that we just
> > > > happened to choose to store outside the qemu binary because it makes
> > > > development easier.
> > > 
> > > The statement above is the one not fulfilled by the compromise solution:
> > > -nodefconfig would really disable the loading of files on /usr/share.
> > > 
> > What does this mean? Will -nodefconfig disable loading of bios.bin,
> > option roms, keymaps?
> 
> Correcting myself: loading of _config_ files on /usr/share. ROM images
> are opaque data to be presented to the guest somehow, just like a disk
> image or kernel binary. But maybe keymaps will become "configuration"
> someday, I really don't know.
> 
Where do you draw the line between "opaque data" and configuration? CPU
models are also something that is presented to a guest somehow. Do you
consider ROMs to be "opaque data" because they are binary, and CPU models
to be config just because they are in an ASCII file? What if we pre-process
CPU models into binary for QEMU to read; will they magically stop being
configuration?

> 
> > > > 
> > > > Doing this would make it impossible to deploy fixes to users if we evern
> > > > find out that the default configuration file had a serious bug. What if
> > > > a bug in our default configuration file has a serious security
> > > > implication?
> > > 
> > > The answer to this is: if the broken templates/defaults are on
> > > /usr/share, it would be easy to deploy the fix.
> > > 
> > > So, the compromise solution is:
> > > 
> > > - We can move some configuration data (especially defaults/templates)
> > >   to /usr/share (machine-types and CPU models could go there). This
> > >   way we can easily deploy fixes to the defaults, if necessary.
> > > - To reuse Qemu models, or machine-types, and not define everything from
> > >   scratch, libvirt will have to use something like:
> > >   "-nodefconfig -readconfig /usr/share/qemu/cpu-models-x86.conf"
> > > 
> > cpu-models-x86.conf is not a configuration file. It is hardware
> > description file. QEMU should not lose capability just because you run
> > it with -nodefconfig. -nodefconfig means that QEMU does not create
> > machine for you, but all parts needed to create a machine that would have
> > been created without -nodefconfig are still present. Not been able to
> > create Nehalem CPU after specifying -nodefconfig is the same as not been
> > able to create virtio-net i.e the bug.
> 
> 
> The current design direction Qemu seems to be following is different
> from that: hardware description is also considered "configuration" just
> like actual machine configuration. Anthony, please correct me if I am
> wrong.
That's a bug. Why try to rationalize it now instead of fixing it? It
was fixed in RHEL by the same person who introduced it upstream in
the first place. He just forgot to send the fix upstream. Is a bug that
has been present for a long time promoted to a feature?

> 
> 
> What you propose is to have two levels of "configuration" (or
> descriptions, or whatever we call it):
> 
> 1) Hardware descriptions (or templates, or models, whatever we call it),
>    that are not editable by the user (and not disabled by -nodefconfig).
>    This may include CPU models, hardware emulation implemented in
>    another language, machine-types, and other stuff that is part of
>    "what Qemu always provides".
> 2) Actual machine configuration file, that is configurable and editable
>    by the user, and normally loaded from /etc or from the command-line.
> 
> The only problem is: the Qemu design simply doesn't have this
> distinction today (well, it _has_, the only difference is that today
> item (1) is almost completely coded inside tables in .c files). So if we
> want to go in that direction we have to agree this will be part of the
> Qemu design.
> 
> I am not strongly inclined either way. Both approaches look good to me,
> we just have to decide where we are going, because we're in this
> weird position today because we never decided it explicitly, libvirt
> expected one thing, and we implemented something else.
> 
> On the one hand I think the "two-layer" design gives us more freedom to
> move stuff outside .c files and change implementation details, and fits
> how we have been doing until today with machine types and built-in CPU
> models, keymaps, etc.
> 
> On the other hand, I think not having this distinction between "machine
> configuration" and "hardware description" may be a good thing.
> 
> For example: today there are two different ways of enabling a feature on
> a CPU: defining a new model, and adding a flag to "-cpu". And I think
> this asymmetry shouldn't be there: you just need a good system to define
> a CPU, a good set of defaults/templates, and a good system to base your
> configuration on those defaults/templates, no need to have two separate
> "CPU definition languages". Also, we wouldn't have to code things twice
> if we want to load an internal hardware description externally _and_
> make some hardware details configurable one day: we just do it once:
> decide how it will be configured, and store it on the configuration
> file.
> 
> Maybe the same thing can be applied to machine types: machine types can
> become "configuration" too, but just a configuration template to be used
> as base, and can be augmented/extended on the user-provided
> configuration file.
> 
> 
> > 
> > > 
> > > (the item below is not something discussed on the call, just something I
> > > want to add)
> > > 
> > > To make this work better, we can allow users (humans or machines) to
> > > "extend" CPU models on the config file, instead of having to define
> > > everything from scratch. So, on /etc (or on a libvirt-generated config)
> > > we could have something like:
> > > 
> > > =============
> > > [cpu]
> > > base_cpudef = Nehalem
> > > add_features = "vmx"
> > > =============
> > > 
> > > Then, as long as /usr/share/cpu-models-x86.conf is loaded, the user will
> > > be able to reuse the Nehalem CPU model provided by Qemu.
> > > 
> > And if it will not be loaded?
> 
> If it is not loaded, it is a configuration mistake. If you are reusing
> something defined somewhere, it would be your responsibility to make
> sure the file where the model is defined is present. On most cases you
> wouldn't use -nodefconfig and it would be shipped with Qemu and you
> shouldn't worry. If you used -nodefconfig, you load the CPU models file
> explicitly using -readconfig.
> 
A regular user has no idea that we describe some CPU models in .c code
and others in a config file, that the ones described in config files
disappear if -nodefconfig is used, that if they disappear they should be
loaded back, or how exactly they should be loaded back. The only user of
-nodefconfig is libvirt, and currently it does unexpected things for its
only user.
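
To make the complaint concrete, here is a minimal Python sketch of the
behavior described above. It is not QEMU code: the function name
visible_cpu_models and both sets are hypothetical, with qemu64 standing in
for a model compiled into .c code and Nehalem for one defined in the
config file.

```python
# Illustrative only: models compiled into the binary vs. models that come
# from the default config file. The set contents are assumptions.
BUILTIN_MODELS = {"qemu64"}        # stands in for models defined in .c code
CONFIG_FILE_MODELS = {"Nehalem"}   # stands in for models from the cpudef file


def visible_cpu_models(nodefconfig, readconfig_models=()):
    """Return the CPU model names that can be selected for a guest."""
    models = set(BUILTIN_MODELS)         # never affected by -nodefconfig
    if not nodefconfig:
        models |= CONFIG_FILE_MODELS     # only loaded with the default config
    models |= set(readconfig_models)     # anything loaded back explicitly
    return models
```

With nodefconfig=True and nothing loaded back, only the built-in set
survives, which is the surprise a user has no way to anticipate.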

> 
> > > > > 
> > > > > But now when libvirt uses -nodefconfig, those models go away.
> > > > > -nodefconfig means start QEMU in the most minimal state possible.
> > > > > You get what you pay for if you use it.
> > > > > 
> > > > > We'll have the same problem with machine configuration files.  At
> > > > > some point in time, -nodefconfig will make machine models disappear.
> > > > 
> > > > It shouldn't. Machine-types are defaults to be used as base, they are
> > > > not user-provided configuration. And the fact that we decided to store
> > > > some data outside of the Qemu binary is orthogonal the design decisions
> > > > in the Qemu command-line and configuration interface.
> > > 
> > > So, this problem is solved if the defaults are easily found on
> > > /usr/share.
> > > 
> > What problem is solved and why are we mixing machine configuration files
> > and cpu configuration files? They are different and should be treated
> > differently.
> 
> This is the root of the disagreement, it seems: they are not considered
> different today. Today cpudefs are on a config file inside /etc.  One
> may argue that this was a mistake in the first place, but that's the
> design we have today.
> 
Machine configuration files do not exist today! Machine configuration is
done in .c code and -nodefconfig skips the code that does that, but it
does not skip the code that describes cpu models and resides in .c. So
as far as .c code is concerned machine creation and cpu description are
two different things, but for some reason the file with the rest of the
CPU model descriptions is not loaded. This is madness, not design.

> 
> > -nodefconfig exists only because there is not machine
> > configuration files currently. With machine configuration files
> > libvirt does not need -nodefconfig because it can create its own machine
> > file and make QEMU use it. So specifying machine file on QEMU's command
> > line implies -nodefconfig. The option itself loses its meaning and can be
> > dropped.
> 
> I think the approach today is:
There is no such thing as today's approach, since there is no machine
config file today.

> 
> - Qemu loads defaults from default config files;
> - Machine description files would be given using -readconfig and they
>   would _augment_ the defaults from the default config files.
Just make it possible to include one machine config file from another and,
again, you do not need -nodefconfig. But this is completely orthogonal to
the discussion of whether CPU models are configuration or not.

> 
> With this interface, -nodefconfig is necessary and useful. But if we
> consider that "configuration" = "machine description file", and a
> machine description file would never include the CPU models themselves,
> that would be a different approach.
> 
> -- 
> Eduardo

--
			Gleb.
Eduardo Habkost March 22, 2012, 3:50 p.m. UTC | #50
On Thu, Mar 22, 2012 at 04:30:55PM +0200, Gleb Natapov wrote:
> On Thu, Mar 22, 2012 at 10:31:21AM -0300, Eduardo Habkost wrote:
> > On Thu, Mar 22, 2012 at 11:32:44AM +0200, Gleb Natapov wrote:
> > > What does this mean? Will -nodefconfig disable loading of bios.bin,
> > > option roms, keymaps?
> > 
> > Correcting myself: loading of _config_ files on /usr/share. ROM images
> > are opaque data to be presented to the guest somehow, just like a disk
> > image or kernel binary. But maybe keymaps will become "configuration"
> > someday, I really don't know.
> > 
> Where do you draw the line between "opaque data" and configuration. CPU
> models are also something that is present to a guest somehow.

Just the fact that it's in a structured key=value format that Qemu
itself will interpret before exposing something to the guest. Yes, it's
a bit arbitrary. If we could make everything "configuration data", we
would (or that's what I think Anthony is pushing for--I hope he will
reply and clarify that).

> Are you
> consider ROMs to be "opaque data" because they are binary and CPU models
> to be config just because it is ascii file?

Not just "ascii file", but structured (and relatively small)
[section]key=value data. If BIOSes were not opaque binary code and could
be converted to a small set of config sections and key=values just like
cpudefs, one could argue that BIOSes could become configuration data,
too.


> What if we pre-process CPU
> models into binary for QEMU to read will it magically stop being
> configuration?

Doing the reverse (transforming simple [section]key=value data to a
binary format) would just be silly and wouldn't gain us anything. The
point here is that we (Qemu) seem to be moving towards a design where
most things are structured "configuration data" that fits on a
[section]key=value model, and Qemu just interprets that structured data
to build a virtual machine.

(That's why I said that perhaps keymaps could become configuration
someday. Because maybe they can be converted to a key=value model
relatively easily)
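
As a rough illustration of what "structured [section]key=value data"
means here, the following Python sketch parses that shape of file. It is
a simplified stand-in, not Qemu's actual config parser; the function name
parse_sections and the keys in the sample text are assumptions chosen
for illustration.

```python
def parse_sections(text):
    """Parse '[section]' headers followed by 'key = value' lines.

    Returns a list of (section_name, dict) pairs, so repeated sections
    (for example several CPU definitions) are all kept.
    """
    sections = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = {}
            sections.append((line[1:-1].strip(), current))
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip().strip('"')
    return sections


# Sample input; the keys are illustrative, not a complete CPU definition.
example = """
[cpudef]
name = "Nehalem"
level = "2"
"""
```

A ROM image has no comparable structure to interpret, which is the
(admittedly arbitrary) line being drawn.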


> > > > > Doing this would make it impossible to deploy fixes to users if we evern
> > > > > find out that the default configuration file had a serious bug. What if
> > > > > a bug in our default configuration file has a serious security
> > > > > implication?
> > > > 
> > > > The answer to this is: if the broken templates/defaults are on
> > > > /usr/share, it would be easy to deploy the fix.
> > > > 
> > > > So, the compromise solution is:
> > > > 
> > > > - We can move some configuration data (especially defaults/templates)
> > > >   to /usr/share (machine-types and CPU models could go there). This
> > > >   way we can easily deploy fixes to the defaults, if necessary.
> > > > - To reuse Qemu models, or machine-types, and not define everything from
> > > >   scratch, libvirt will have to use something like:
> > > >   "-nodefconfig -readconfig /usr/share/qemu/cpu-models-x86.conf"
> > > > 
> > > cpu-models-x86.conf is not a configuration file. It is hardware
> > > description file. QEMU should not lose capability just because you run
> > > it with -nodefconfig. -nodefconfig means that QEMU does not create
> > > machine for you, but all parts needed to create a machine that would have
> > > been created without -nodefconfig are still present. Not been able to
> > > create Nehalem CPU after specifying -nodefconfig is the same as not been
> > > able to create virtio-net i.e the bug.
> > 
> > 
> > The current design direction Qemu seems to be following is different
> > from that: hardware description is also considered "configuration" just
> > like actual machine configuration. Anthony, please correct me if I am
> > wrong.
> That's a bug. Why trying to rationalize it now instead of fixing it.

It's just a bug for you because you disagree with the current design.
You can call it rationalization, yes; I am just accepting Anthony's
proposal because it's equally good (to me) as what you are proposing.


> It
> was fixed in RHEL by the same person who introduced it in upstream in
> the first place. He just forgot to send the fix upstream. Does bug that
> is present for a long time is promoted to a feature?

John didn't forget it, he knew that upstream could go in a different
direction. The RHEL6 patch description has this:

"Note this is intended as an interim work-around for rhel6.0. While the
new location of the config file should remain the same, the mechanism to
direct qemu to it will likely differ going forward."


> > > > (the item below is not something discussed on the call, just something I
> > > > want to add)
> > > > 
> > > > To make this work better, we can allow users (humans or machines) to
> > > > "extend" CPU models on the config file, instead of having to define
> > > > everything from scratch. So, on /etc (or on a libvirt-generated config)
> > > > we could have something like:
> > > > 
> > > > =============
> > > > [cpu]
> > > > base_cpudef = Nehalem
> > > > add_features = "vmx"
> > > > =============
> > > > 
> > > > Then, as long as /usr/share/cpu-models-x86.conf is loaded, the user will
> > > > be able to reuse the Nehalem CPU model provided by Qemu.
> > > > 
> > > And if it will not be loaded?
> > 
> > If it is not loaded, it is a configuration mistake. If you are reusing
> > something defined somewhere, it would be your responsibility to make
> > sure the file where the model is defined is present. On most cases you
> > wouldn't use -nodefconfig and it would be shipped with Qemu and you
> > shouldn't worry. If you used -nodefconfig, you load the CPU models file
> > explicitly using -readconfig.
> > 
> Regular user has no idea that we describe some cpu models in .c code and
> others in config file and those that are described in config files are
> disappear if  -nodefconfig is used and that if they disappear they
> should be loaded back and how exactly they should be loaded back. The
> only user of -nodefconfig is libvirt and currently it does unexpected
> things for its only user.

And one could argue that this only user has wrong expectations and Qemu
does it this way by design, and we would be running in circles forever.
The only problem for us is that if we keep running in circles, Qemu will
keep the current design (or bug, if you want to call it that), until Qemu
maintainers agree to change the current behavior.

I mean: you're right to argue for it, but somehow I feel I am the
wrong person to reply to your arguments, because IMO both approaches are
valid. I really hope Anthony will reply to your points, too, or that we
discuss that on the next Qemu call. Because if you convince only me, we
would gain nothing if the patches implementing the design we agreed on
get rejected. It's really a pity that your arguments weren't raised
last week during the Qemu call, when Anthony and I discussed this and
agreed on the "compromise solution" I described.


> > > > > > 
> > > > > > But now when libvirt uses -nodefconfig, those models go away.
> > > > > > -nodefconfig means start QEMU in the most minimal state possible.
> > > > > > You get what you pay for if you use it.
> > > > > > 
> > > > > > We'll have the same problem with machine configuration files.  At
> > > > > > some point in time, -nodefconfig will make machine models disappear.
> > > > > 
> > > > > It shouldn't. Machine-types are defaults to be used as base, they are
> > > > > not user-provided configuration. And the fact that we decided to store
> > > > > some data outside of the Qemu binary is orthogonal the design decisions
> > > > > in the Qemu command-line and configuration interface.
> > > > 
> > > > So, this problem is solved if the defaults are easily found on
> > > > /usr/share.
> > > > 
> > > What problem is solved and why are we mixing machine configuration files
> > > and cpu configuration files? They are different and should be treated
> > > differently.
> > 
> > This is the root of the disagreement, it seems: they are not considered
> > different today. Today cpudefs are on a config file inside /etc.  One
> > may argue that this was a mistake in the first place, but that's the
> > design we have today.
> > 
> Machine configuration files do not exist today! Machine configuration is
> done in .c code and -nodefconfig skips the code that does that, but it
> does not skip the code that describes cpu models and resides in .c. So
> as far as .c code is concerned machine creation and cpu description are
> two different things, but for some reason the file with rest of cpu
> model descriptions is not loaded. This is madness, not design.

Let's try to agree on terminology here: what's "machine configuration"?
I understand it as the configuration of a specific virtual machine for a
specific Qemu instance (i.e. what we put on the command-line today), I
am not talking about machine-types (yet ;).

So, considering this definition, machine configuration exists today,
there is stuff you can define on a configuration file, and you can give
this configuration to -readconfig today. The config file is
unfortunately not as powerful as the command-line (some things can be
set only on the command-line today), but it exists. Anthony even
submitted an RFC series to make the config files as powerful as the
command-line, by adding a [system] section. I don't agree completely
with that specific implementation, but I agree with the direction it is
taking.


> > 
> > > -nodefconfig exists only because there is not machine
> > > configuration files currently. With machine configuration files
> > > libvirt does not need -nodefconfig because it can create its own machine
> > > file and make QEMU use it. So specifying machine file on QEMU's command
> > > line implies -nodefconfig. The option itself loses its meaning and can be
> > > dropped.
> > 
> > I think the approach today is:
> There is not such thing as todays approach since there is not machine
> config file today.

There are config files today, and they work (but they are not as
powerful as the command-line). But the interface is the one I describe
below.

> 
> > 
> > - Qemu loads defaults from default config files;
> > - Machine description files would be given using -readconfig and they
> >   would _augment_ the defaults from the default config files.
> Just make it possible to include one machine config file from another and
> you do not need  -nodefconfig again.

That would be a valid approach too, but this is just not how the config
files work today. We may change it, yes. But today the config file
system is based on getting config data from multiple places
(/etc/qemu/qemu.conf, extend it with /etc/qemu/target-x86_64.conf, and
extend the result with the command-line arguments), instead of using
includes.
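
The layering described above can be sketched in Python as follows. This
is only a model of the ordering, not Qemu code; the function name
config_sources is hypothetical, and the two default paths are the ones
named in this message.

```python
def config_sources(nodefconfig=False, readconfig=()):
    """Return the config files in the order they would be applied."""
    sources = []
    if not nodefconfig:
        # Default files, each one extending what was loaded before it.
        sources.append("/etc/qemu/qemu.conf")
        sources.append("/etc/qemu/target-x86_64.conf")
    sources.extend(readconfig)     # files given with -readconfig
    return sources                 # command-line arguments are applied last
```

Under the compromise solution, libvirt would get the CPU models back by
passing the models file explicitly, e.g.
config_sources(True, ["/usr/share/qemu/cpu-models-x86.conf"]).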

> But this is completely orthogonal to
> the discussion if cpu models are configuration or not.

You are right. How the "machine configuration" is going to be loaded is
orthogonal to the main question. The main question here is whether
cpudefs should be included in what we call "machine configuration" or
not. (agreed?)
Anthony Liguori March 22, 2012, 4:37 p.m. UTC | #51
On 03/22/2012 04:32 AM, Gleb Natapov wrote:
> On Tue, Mar 13, 2012 at 11:53:19AM -0300, Eduardo Habkost wrote:
>> So, trying to summarize what was discussed in the call:
>>
>> On Mon, Mar 12, 2012 at 10:08:10AM -0300, Eduardo Habkost wrote:
>>>> Let's say we moved CPU definitions to /usr/share/qemu/cpu-models.xml.
>>>>
>>>> Obviously, we'd want a command line option to be able to change that
>>>> location so we'd introduce -cpu-models PATH.
>>>>
>>>> But we want all of our command line options to be settable by the
>>>> global configuration file so we would have a cpu-model=PATH to the
>>>> configuration file.
>>>>
>>>> But why hard code a path when we can just set the default path in the
>>>> configuration file so let's avoid hard coding and just put
>>>> cpu-models=/usr/share/qemu/cpu-models.xml in the default
>>>> configuration file.
>>>
>>> We wouldn't do the above.
>>>
>>> -nodefconfig should disable the loading of files on /etc, but it
>>> shouldn't disable loading internal non-configurable data that we just
>>> happened to choose to store outside the qemu binary because it makes
>>> development easier.
>>
>> The statement above is the one not fulfilled by the compromise solution:
>> -nodefconfig would really disable the loading of files on /usr/share.
>>
> What does this mean? Will -nodefconfig disable loading of bios.bin,
> option roms, keymaps?
>
>>>
>>> Really, the requirement of a "default configuration file" is a problem
>>> by itself. Qemu should not require a default configuration file to work,
>>> and it shouldn't require users to copy the default configuration file to
>>> change options from the default.
>>
>> The statement above is only partly true. The default configuration file
>> would be still needed, but if defaults are stored on /usr/share, I will
>> be happy with it.
>>
>> My main problem was with the need to _copy_ or edit a non-trivial
>> default config file. If the not-often-edited defaults/templates are
>> easily found on /usr/share to be used with -readconfig, I will be happy
>> with this solution, even if -nodefconfig disable the files on
>> /usr/share.
>>
>>>
>>> Doing this would make it impossible to deploy fixes to users if we ever
>>> find out that the default configuration file had a serious bug. What if
>>> a bug in our default configuration file has a serious security
>>> implication?
>>
>> The answer to this is: if the broken templates/defaults are on
>> /usr/share, it would be easy to deploy the fix.
>>
>> So, the compromise solution is:
>>
>> - We can move some configuration data (especially defaults/templates)
>>    to /usr/share (machine-types and CPU models could go there). This
>>    way we can easily deploy fixes to the defaults, if necessary.
>> - To reuse Qemu models, or machine-types, and not define everything from
>>    scratch, libvirt will have to use something like:
>>    "-nodefconfig -readconfig /usr/share/qemu/cpu-models-x86.conf"
>>
> cpu-models-x86.conf is not a configuration file. It is hardware
> description file. QEMU should not lose capability just because you run
> it with -nodefconfig. -nodefconfig means that QEMU does not create
> machine for you, but all parts needed to create a machine that would have
> been created without -nodefconfig are still present. Not been able to
> create Nehalem CPU after specifying -nodefconfig is the same as not been
> able to create virtio-net i.e the bug.
>
>>
>> (the item below is not something discussed on the call, just something I
>> want to add)
>>
>> To make this work better, we can allow users (humans or machines) to
>> "extend" CPU models on the config file, instead of having to define
>> everything from scratch. So, on /etc (or on a libvirt-generated config)
>> we could have something like:
>>
>> =============
>> [cpu]
>> base_cpudef = Nehalem
>> add_features = "vmx"
>> =============
>>
>> Then, as long as /usr/share/cpu-models-x86.conf is loaded, the user will
>> be able to reuse the Nehalem CPU model provided by Qemu.
>>
> And if it will not be loaded?
>
>>>
>>>>
>>>> But now when libvirt uses -nodefconfig, those models go away.
>>>> -nodefconfig means start QEMU in the most minimal state possible.
>>>> You get what you pay for if you use it.
>>>>
>>>> We'll have the same problem with machine configuration files.  At
>>>> some point in time, -nodefconfig will make machine models disappear.
>>>
>>> It shouldn't. Machine-types are defaults to be used as base, they are
>>> not user-provided configuration. And the fact that we decided to store
>>> some data outside of the Qemu binary is orthogonal the design decisions
>>> in the Qemu command-line and configuration interface.
>>
>> So, this problem is solved if the defaults are easily found on
>> /usr/share.
>>
> What problem is solved and why are we mixing machine configuration files
> and cpu configuration files? They are different and should be treated
> differently. -nodefconfig exists only because there is not machine
> configuration files currently. With machine configuration files
> libvirt does not need -nodefconfig because it can create its own machine
> file and make QEMU use it. So specifying machine file on QEMU's command
> line implies -nodefconfig. The option itself loses its meaning and can be
> dropped.

No, -nodefconfig means "no default config".

As with many projects, we can have *some* configuration required.

The default configuration should have a:

[system]
readconfig=@SYSCONFDIR@/cpu-models-x86_64.cfg

Stanza by default.  If libvirt wants to reuse this, they can use -readconfig if 
they use -nodefconfig.
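
A minimal Python sketch of how such a stanza could be followed is below.
It assumes the proposed readconfig= key in [system], which is not an
existing feature; the function name expand_readconfig and the in-memory
files mapping are hypothetical, and the @SYSCONFDIR@ placeholder is left
unexpanded on purpose.

```python
def expand_readconfig(path, files, seen=None):
    """Return the files loaded when starting from `path`, in load order.

    `files` maps a path to the list of paths named by readconfig= entries
    in its [system] section (the proposed stanza, not an existing feature).
    """
    seen = [] if seen is None else seen
    if path in seen:
        return seen                    # guard against include loops
    seen.append(path)
    for included in files.get(path, []):
        expand_readconfig(included, files, seen)
    return seen


# The default config pulls in the CPU models file via the stanza.
files = {
    "/etc/qemu/qemu.conf": ["@SYSCONFDIR@/cpu-models-x86_64.cfg"],
}
```

With -nodefconfig the starting file is skipped, so the models file is
only reached if it is passed to -readconfig directly.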

Regards,

Anthony Liguori

>
>> We still have the backwards compatibility problem for pc-1.0, pc-1.1,
>> and so on. But that can be discussed later, when we actually move
>> machine-types to somewhere outside .c files.
>>
>>>
>>> As I said previously, requiring generation of opaque config files (and
>>> "copy the default config file and change it" is included on my
>>> definition of "generation of opaque config files") is poor design, IMO.
>>> I bet this even has an entry in some design anti-pattern catalog
>>> somewhere.
>>
>> This problem is also solved if the defaults are deployed on /usr/share
>> and just reused/included by the config files on /etc.
>>
>
> --
> 			Gleb.
Eduardo Habkost March 22, 2012, 5:14 p.m. UTC | #52
On Thu, Mar 22, 2012 at 11:37:39AM -0500, Anthony Liguori wrote:
> On 03/22/2012 04:32 AM, Gleb Natapov wrote:
> >On Tue, Mar 13, 2012 at 11:53:19AM -0300, Eduardo Habkost wrote:
> >>So, this problem is solved if the defaults are easily found on
> >>/usr/share.
> >>
> >What problem is solved and why are we mixing machine configuration files
> >and cpu configuration files? They are different and should be treated
> >differently. -nodefconfig exists only because there is not machine
> >configuration files currently. With machine configuration files
> >libvirt does not need -nodefconfig because it can create its own machine
> >file and make QEMU use it. So specifying machine file on QEMU's command
> >line implies -nodefconfig. The option itself loses its meaning and can be
> >dropped.
> 
> No, -nodefconfig means "no default config".
> 
> As with many projects, we can have *some* configuration required.
> 
> The default configure should have a:
> 
> [system]
> readconfig=@SYSCONFDIR@/cpu-models-x86_64.cfg

Not @SYSCONFDIR@, but @DATADIR@. CPU models belong to /usr/share because
they aren't meant to be changed by the user (I think I already explained
why: because we have to be able to deploy fixes to them).

> 
> Stanza by default.  If libvirt wants to reuse this, they can use
> -readconfig if they use -nodefconfig.

You are just repeating how you believe it should work based on the
premise that "cpudefs are configuration". We're discussing/questioning
this exact premise here, and I would really appreciate hearing why the
model Gleb is proposing is not valid.

More precisely, this part:

> >cpu-models-x86.conf is not a configuration file. It is hardware
> >description file. QEMU should not lose capability just because you run
> >it with -nodefconfig. -nodefconfig means that QEMU does not create
> >machine for you, but all parts needed to create a machine that would have
> >been created without -nodefconfig are still present. Not been able to
> >create Nehalem CPU after specifying -nodefconfig is the same as not been
> >able to create virtio-net i.e the bug.

And the related points Gleb mentioned further in this thread.
Anthony Liguori March 22, 2012, 8:01 p.m. UTC | #53
On 03/22/2012 12:14 PM, Eduardo Habkost wrote:
> On Thu, Mar 22, 2012 at 11:37:39AM -0500, Anthony Liguori wrote:
>> On 03/22/2012 04:32 AM, Gleb Natapov wrote:
>>> On Tue, Mar 13, 2012 at 11:53:19AM -0300, Eduardo Habkost wrote:
>>>> So, this problem is solved if the defaults are easily found on
>>>> /usr/share.
>>>>
>>> What problem is solved and why are we mixing machine configuration files
>>> and cpu configuration files? They are different and should be treated
>>> differently. -nodefconfig exists only because there is not machine
>>> configuration files currently. With machine configuration files
>>> libvirt does not need -nodefconfig because it can create its own machine
>>> file and make QEMU use it. So specifying machine file on QEMU's command
>>> line implies -nodefconfig. The option itself loses its meaning and can be
>>> dropped.
>>
>> No, -nodefconfig means "no default config".
>>
>> As with many projects, we can have *some* configuration required.
>>
>> The default configure should have a:
>>
>> [system]
>> readconfig=@SYSCONFDIR@/cpu-models-x86_64.cfg
>
> Not @SYSCONFDIR@, but @DATADIR@. CPU models belong to /usr/share because
> they aren't meant to be changed by the user (I think I already explained
> why: because we have to be able to deploy fixes to them).
>
>>
>> Stanza by default.  If libvirt wants to reuse this, they can use
>> -readconfig if they use -nodefconfig.
>
> You are just repeating how you believe it should work based on the
> premise that "cpudefs are configuration". We're discussing/questioning
> this exact premise, here, and I would really appreciate to hear why the
> model Gleb is proposing is not valid.
>
> More precisely, this part:
>
>>> cpu-models-x86.conf is not a configuration file. It is hardware
>>> description file. QEMU should not lose capability just because you run
>>> it with -nodefconfig. -nodefconfig means that QEMU does not create
>>> machine for you, but all parts needed to create a machine that would have
>>> been created without -nodefconfig are still present. Not been able to
>>> create Nehalem CPU after specifying -nodefconfig is the same as not been
>>> able to create virtio-net i.e the bug.
>
> And the related points Gleb mentioned further in this thread.

Because the next patch series that would follow would be a -cpu-defs-path that 
would be a one-off hack with a global variable and a -no-cpu-defs.

So let's avoid that and start by having a positive configuration mechanism that 
the user can use to change the path and exclude it.  My suggestion eliminates the 
need for two future command line options.

Regards,

Anthony Liguori

>
Gleb Natapov March 25, 2012, 9:49 a.m. UTC | #54
On Thu, Mar 22, 2012 at 03:01:17PM -0500, Anthony Liguori wrote:
> On 03/22/2012 12:14 PM, Eduardo Habkost wrote:
> >On Thu, Mar 22, 2012 at 11:37:39AM -0500, Anthony Liguori wrote:
> >>On 03/22/2012 04:32 AM, Gleb Natapov wrote:
> >>>On Tue, Mar 13, 2012 at 11:53:19AM -0300, Eduardo Habkost wrote:
> >>>>So, this problem is solved if the defaults are easily found on
> >>>>/usr/share.
> >>>>
> >>>What problem is solved and why are we mixing machine configuration files
> >>>and cpu configuration files? They are different and should be treated
> >>>differently. -nodefconfig exists only because there is not machine
> >>>configuration files currently. With machine configuration files
> >>>libvirt does not need -nodefconfig because it can create its own machine
> >>>file and make QEMU use it. So specifying machine file on QEMU's command
> >>>line implies -nodefconfig. The option itself loses its meaning and can be
> >>>dropped.
> >>
> >>No, -nodefconfig means "no default config".
> >>
> >>As with many projects, we can have *some* configuration required.
> >>
> >>The default configure should have a:
> >>
> >>[system]
> >>readconfig=@SYSCONFDIR@/cpu-models-x86_64.cfg
> >
> >Not @SYSCONFDIR@, but @DATADIR@. CPU models belong to /usr/share because
> >they aren't meant to be changed by the user (I think I already explained
> >why: because we have to be able to deploy fixes to them).
> >
> >>
> >>Stanza by default.  If libvirt wants to reuse this, they can use
> >>-readconfig if they use -nodefconfig.
> >
> >You are just repeating how you believe it should work based on the
> >premise that "cpudefs are configuration". We're discussing/questioning
> >this exact premise, here, and I would really appreciate to hear why the
> >model Gleb is proposing is not valid.
> >
> >More precisely, this part:
> >
> >>>cpu-models-x86.conf is not a configuration file. It is hardware
> >>>description file. QEMU should not lose capability just because you run
> >>>it with -nodefconfig. -nodefconfig means that QEMU does not create
> >>>machine for you, but all parts needed to create a machine that would have
> >>>been created without -nodefconfig are still present. Not being able to
> >>>create a Nehalem CPU after specifying -nodefconfig is the same as not being
> >>>able to create virtio-net, i.e. a bug.
> >
> >And the related points Gleb mentioned further in this thread.
> 
> Because the next patch series that would follow would be a
> -cpu-defs-path that would be a one-off hack with a global variable
> and a -no-cpu-defs.
> 
And it will be rejected, since cpu models are not part of the configuration
but QEMU internals stored in an outside file. We have the -L switch to tell
qemu where such things should be loaded from, and that's it.

> So let's avoid that and start by having a positive configuration
> mechanism that the user can use to change the path and exclude it.
> My suggestion eliminates the need for two future command line
> options.
> 
If cpu models are not part of the configuration, they should not be affected
by the configuration mechanism. You are just avoiding the real
question that is asked above.

--
			Gleb.
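
A minimal sketch of the "[system] readconfig" stanza quoted above, assuming a
loader that follows such includes. The load_config helper, the in-memory FILES
mapping and the resolved /usr/share/qemu path are illustrative assumptions;
the stanza itself is only a proposal in this thread, not an existing QEMU
feature.

```python
def load_config(path, files, seen=None):
    """Return a list of (section, {key: value}) pairs, following includes.

    `files` maps a path to its text so the sketch needs no real filesystem.
    A `readconfig` key inside a [system] section is treated as an include
    (hypothetical semantics, taken from the proposal in this thread).
    """
    seen = set() if seen is None else seen
    if path in seen:
        raise ValueError("include loop at " + path)
    seen.add(path)

    sections = []
    current = None
    for raw in files[path].splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = (line[1:-1], {})
            sections.append(current)
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip().strip('"')
        if current is None:
            continue  # ignore keys that appear before any section header
        if current[0] == "system" and key == "readconfig":
            sections.extend(load_config(value, files, seen))
        else:
            current[1][key] = value
    return sections


# Illustrative file contents; the paths assume @DATADIR@ is /usr/share/qemu.
FILES = {
    "/etc/qemu/qemu.cfg":
        "[system]\nreadconfig = /usr/share/qemu/cpu-models-x86_64.cfg\n",
    "/usr/share/qemu/cpu-models-x86_64.cfg":
        '[cpudef]\n   name = "Nehalem"\n',
}

loaded = load_config("/etc/qemu/qemu.cfg", FILES)
```

Under this model, a run with -nodefconfig simply never calls load_config on
the default file, so the CPU definitions vanish along with everything else
unless the caller names the /usr/share file explicitly.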
Gleb Natapov March 25, 2012, 10:19 a.m. UTC | #55
On Thu, Mar 22, 2012 at 12:50:18PM -0300, Eduardo Habkost wrote:
> On Thu, Mar 22, 2012 at 04:30:55PM +0200, Gleb Natapov wrote:
> > On Thu, Mar 22, 2012 at 10:31:21AM -0300, Eduardo Habkost wrote:
> > > On Thu, Mar 22, 2012 at 11:32:44AM +0200, Gleb Natapov wrote:
> > > > What does this mean? Will -nodefconfig disable loading of bios.bin,
> > > > option roms, keymaps?
> > > 
> > > Correcting myself: loading of _config_ files on /usr/share. ROM images
> > > are opaque data to be presented to the guest somehow, just like a disk
> > > image or kernel binary. But maybe keymaps will become "configuration"
> > > someday, I really don't know.
> > > 
> > Where do you draw the line between "opaque data" and configuration? CPU
> > models are also something that is presented to a guest somehow.
> 
> Just the fact that it's in a structured key=value format that Qemu
> itself will interpret before exposing something to the guest. Yes, it's
> a bit arbitrary. If we could make everything "configuration data", we
> would (or that's what I think Anthony is pushing for--I hope he will
> reply and clarify that).
> 
It is not a "bit arbitrary"; it is completely arbitrary.

> > Are you
> > considering ROMs to be "opaque data" because they are binary and CPU models
> > to be config just because they are in an ASCII file?
> 
> Not just "ascii file", but structured (and relatively small)
> [section]key=value data. If BIOSes were not opaque binary code and could
> be converted to a small set of config sections and key=values just like
> cpudefs, one could argue that BIOSes could become configuration data,
> too.
> 
There is no argument I can make about it since there is no logic to
refute to begin with :) Well, maybe except that once the cpu model file
supports configuring all cpuid leaves for each cpu model, it will not
 
> 
> > What if we pre-process CPU
> > models into binary for QEMU to read will it magically stop being
> > configuration?
> 
> Doing the reverse (transforming simple [section]key=value data to a
> binary format) would just be silly and wouldn't gain us anything. The
> point here is that we (Qemu) seem to be moving towards a design where
> most things are structured "configuration data" that fits on a
> [section]key=value model, and Qemu just interprets that structured data
> to build a virtual machine.
Nothing silly about it. You can move data parsing outside of QEMU and
just mmap cpu definitions in QEMU.

> 
> (That's why I said that perhaps keymaps could become configuration
> someday. Because maybe they can be converted to a key=value model
> relatively easily)
> 
Such a wholesale approach is harmful since it starts to affect design
decisions. So now, if it seems logical to move something outside the code,
one can decide against it just because it will become "configuration"
due to that design.

> 
> > > > > > Doing this would make it impossible to deploy fixes to users if we ever
> > > > > > find out that the default configuration file had a serious bug. What if
> > > > > > a bug in our default configuration file has a serious security
> > > > > > implication?
> > > > > 
> > > > > The answer to this is: if the broken templates/defaults are on
> > > > > /usr/share, it would be easy to deploy the fix.
> > > > > 
> > > > > So, the compromise solution is:
> > > > > 
> > > > > - We can move some configuration data (especially defaults/templates)
> > > > >   to /usr/share (machine-types and CPU models could go there). This
> > > > >   way we can easily deploy fixes to the defaults, if necessary.
> > > > > - To reuse Qemu models, or machine-types, and not define everything from
> > > > >   scratch, libvirt will have to use something like:
> > > > >   "-nodefconfig -readconfig /usr/share/qemu/cpu-models-x86.conf"
> > > > > 
> > > > cpu-models-x86.conf is not a configuration file. It is hardware
> > > > description file. QEMU should not lose capability just because you run
> > > > it with -nodefconfig. -nodefconfig means that QEMU does not create
> > > > machine for you, but all parts needed to create a machine that would have
> > > > been created without -nodefconfig are still present. Not being able to
> > > > create a Nehalem CPU after specifying -nodefconfig is the same as not being
> > > > able to create virtio-net, i.e. a bug.
> > > 
> > > 
> > > The current design direction Qemu seems to be following is different
> > > from that: hardware description is also considered "configuration" just
> > > like actual machine configuration. Anthony, please correct me if I am
> > > wrong.
> > That's a bug. Why try to rationalize it now instead of fixing it?
> 
> It's just a bug for you because you disagree with the current design.
> You can call it rationalization, yes; I am just accepting Anthony's
> proposal because it's equally good (to me) as what you are proposing.
> 
> 
> > It
> > was fixed in RHEL by the same person who introduced it in upstream in
> > the first place. He just forgot to send the fix upstream. Does a bug that
> > is present for a long time get promoted to a feature?
> 
> John didn't forget it, he knew that upstream could go in a different
> direction. The RHEL6 patch description has this:
> 
> "Note this is intended as an interim work-around for rhel6.0. While the
> new location of the config file should remain the same, the mechanism to
> direct qemu to it will likely differ going forward."
> 
That's even worse. It is a very sad state of affairs for upstream if he
knew that the fix would not be accepted. As a result, QEMU upstream does
not work properly with libvirt for actual users.

> > > > > (the item below is not something discussed on the call, just something I
> > > > > want to add)
> > > > > 
> > > > > To make this work better, we can allow users (humans or machines) to
> > > > > "extend" CPU models on the config file, instead of having to define
> > > > > everything from scratch. So, on /etc (or on a libvirt-generated config)
> > > > > we could have something like:
> > > > > 
> > > > > =============
> > > > > [cpu]
> > > > > base_cpudef = Nehalem
> > > > > add_features = "vmx"
> > > > > =============
> > > > > 
> > > > > Then, as long as /usr/share/cpu-models-x86.conf is loaded, the user will
> > > > > be able to reuse the Nehalem CPU model provided by Qemu.
> > > > > 
> > > > And if it will not be loaded?
> > > 
> > > If it is not loaded, it is a configuration mistake. If you are reusing
> > > something defined somewhere, it would be your responsibility to make
> > > sure the file where the model is defined is present. In most cases you
> > > wouldn't use -nodefconfig and it would be shipped with Qemu and you
> > > shouldn't worry. If you used -nodefconfig, you load the CPU models file
> > > explicitly using -readconfig.
> > > 
> > A regular user has no idea that we describe some cpu models in .c code and
> > others in a config file, that those described in config files
> > disappear if -nodefconfig is used, that if they disappear they
> > should be loaded back, and how exactly they should be loaded back. The
> > only user of -nodefconfig is libvirt and currently it does unexpected
> > things for its only user.
> 
> And one could argue that this only user has wrong expectations and Qemu
> does it this way by design, and we would be running in circles forever.
QEMU is the new GNOME3?

> The only problem for us is that if we keep running in circles, Qemu will
> keep the current design (or bug, if you want to call it), until Qemu
> maintainers agree to change the current behavior.
> 
> I mean: you're right to argue for it, but somehow I feel I am the
> wrong person to reply to your arguments, because IMO both approaches are
> valid. I really hope Anthony will reply to your points, too, or that we
> discuss that on the next Qemu call. Because if you convince only me, we
> would gain nothing if the patches implementing the design we agreed with
> get rejected. It's really a pity that your arguments weren't exposed
> last week during the Qemu call, when I and Anthony discussed this and
> agreed on the "compromise solution" I described.
> 
If I convince you, we can both convince Anthony afterwards :) You
keep rationalizing the current behaviour, and since you are the
only one who is doing it, I have no other points to reply to.

> 
> > > > > > > 
> > > > > > > But now when libvirt uses -nodefconfig, those models go away.
> > > > > > > -nodefconfig means start QEMU in the most minimal state possible.
> > > > > > > You get what you pay for if you use it.
> > > > > > > 
> > > > > > > We'll have the same problem with machine configuration files.  At
> > > > > > > some point in time, -nodefconfig will make machine models disappear.
> > > > > > 
> > > > > > It shouldn't. Machine-types are defaults to be used as base, they are
> > > > > > not user-provided configuration. And the fact that we decided to store
> > > > > > some data outside of the Qemu binary is orthogonal to the design decisions
> > > > > > in the Qemu command-line and configuration interface.
> > > > > 
> > > > > So, this problem is solved if the defaults are easily found on
> > > > > /usr/share.
> > > > > 
> > > > What problem is solved and why are we mixing machine configuration files
> > > > and cpu configuration files? They are different and should be treated
> > > > differently.
> > > 
> > > This is the root of the disagreement, it seems: they are not considered
> > > different today. Today cpudefs are on a config file inside /etc.  One
> > > may argue that this was a mistake in the first place, but that's the
> > > design we have today.
> > > 
> > Machine configuration files do not exist today! Machine configuration is
> > done in .c code and -nodefconfig skips the code that does that, but it
> > does not skip the code that describes cpu models and resides in .c. So
> > as far as .c code is concerned machine creation and cpu description are
> > two different things, but for some reason the file with rest of cpu
> > model descriptions is not loaded. This is madness, not design.
> 
> Let's try to agree on terminology here: what's "machine configuration"?
> I understand it as the configuration of a specific virtual machine for a
> specific Qemu instance (i.e. what we put on the command-line today), I
> am not talking about machine-types (yet ;).
> 
> So, considering this definition, machine configuration exists today,
> there is stuff you can define on a configuration file, and you can give
> this configuration to -readconfig today. The config file is
My /etc/qemu/ contains no machine configuration file after a fresh QEMU
install. The only thing there is the thing that shouldn't be there: cpu
model definitions. So everything I wrote above is correct, i.e. -nodefconfig
skips the _code_ that creates the machine (not the reading of a config file),
and it does not skip the cpu model definitions that are done in the .c file.
I would gladly hear a rationalization for this behaviour, and you just ignored
it in your reply.

> unfortunately not as powerful as the command-line (some things can be
> set only on the command-line today), but it exists. Anthony even
> submitted a RFC series to make the config files as powerful as the
> command-line, by adding a [system] section. I don't agree completely
> with that specific implementation, but I agree with the direction it is
> taking.
> 
> 
> > > 
> > > > -nodefconfig exists only because there is not machine
> > > > configuration files currently. With machine configuration files
> > > > libvirt does not need -nodefconfig because it can create its own machine
> > > > file and make QEMU use it. So specifying machine file on QEMU's command
> > > > line implies -nodefconfig. The option itself loses its meaning and can be
> > > > dropped.
> > > 
> > > I think the approach today is:
> > There is no such thing as today's approach since there is no machine
> > config file today.
> 
> There are config files today, and they work (but they are not as
> powerful as the command-line). But the interface is the one I describe
> below.
Again, this is not how the default machine is created, so hardly relevant to
-nodefconfig behaviour, no?

> 
> > 
> > > 
> > > - Qemu loads defaults from default config files;
> > > - Machine description files would be given using -readconfig and they
> > >   would _augment_ the defaults from the default config files.
> > Just make it possible to include one machine config file from another and
> > you do not need  -nodefconfig again.
> 
> That would be a valid approach too, but this is just not how the config
> files work today. We may change it, yes. But today the config file
> system is based on getting config data from multiple places
> (/etc/qemu/qemu.conf, extend it with /etc/qemu/target-x86_64.conf, and
> extend the result with the command-line arguments), instead of using
> includes.
Nothing wrong with that! But cpu models should not be part of
/etc/qemu/target-x86_64.conf :)

> 
> > But this is completely orthogonal to
> > the discussion if cpu models are configuration or not.
> 
> You are right. How the "machine configuration" is going to be loaded is
> orthogonal to the main question. The main question here is whether
> cpudefs should be included on what we call "machine configuration" or
> not. (agreed?)
> 
> -- 
> Eduardo

--
			Gleb.
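
The base_cpudef / add_features idea quoted earlier in this message could work
roughly as follows. This is only a sketch under stated assumptions: the
resolve_cpu helper and the SHIPPED_MODELS table are hypothetical, and the
feature names are placeholders rather than QEMU's actual Nehalem definition.

```python
# Hypothetical stand-in for the models shipped in /usr/share; the feature
# names are placeholders, not QEMU's real model data.
SHIPPED_MODELS = {
    "Nehalem": {"sse4.2", "popcnt"},
}


def resolve_cpu(section, shipped_models):
    """Expand a [cpu] section that extends a shipped model.

    Fails loudly when the base model is not loaded, which is the
    "configuration mistake" case discussed in the thread.
    """
    base = section["base_cpudef"]
    if base not in shipped_models:
        raise LookupError(f"base CPU model {base!r} is not loaded")
    # Accept either comma- or space-separated feature names.
    extra = section.get("add_features", "").replace(",", " ").split()
    return set(shipped_models[base]) | set(extra)


# The [cpu] section from the quoted example, as a plain dictionary.
user_section = {"base_cpudef": "Nehalem", "add_features": "vmx"}
features = resolve_cpu(user_section, SHIPPED_MODELS)
```

Calling resolve_cpu with an empty model table raises LookupError, which
mirrors what a user would hit after -nodefconfig without an explicit
-readconfig of the shipped file.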
Anthony Liguori March 25, 2012, 12:55 p.m. UTC | #56
On 03/25/2012 04:49 AM, Gleb Natapov wrote:
> On Thu, Mar 22, 2012 at 03:01:17PM -0500, Anthony Liguori wrote:
>> So let's avoid that and start by having a positive configuration
>> mechanism that the user can use to change the path and exclude it.
>> My suggestion eliminates the need for two future command line
>> options.
>>
> If cpu models are not part of configuration they should not be affected
> by configuration mechanism. You are just avoiding addressing the real
> question that is asked above.

I think you're just refusing to listen.

The stated direction of QEMU, for literally years now, is that we want to arrive 
at the following:

QEMU is composed of a series of objects whose relationships can be fully
described by an external configuration file.  Much of the current baked in 
concepts (like machines) would then become configuration files.

qemu -M pc

Would effectively be short hand for -readconfig /usr/share/qemu/machines/pc.cfg

There are some valid points that were raised in this thread, namely that the 
user needs to have a file that acts as strictly config (stored in /etc).  I'm 
totally happy moving the CPU configuration to /usr/share in order to address this.

I think the thread has reduced to: should /usr/share configuration files be read
by default or just treated as additional configuration files?  It seems pretty
obvious to me that they should be treated as normal configuration files.  This
gives the user fine-grained control over which files are
used, including the ability to change the location of each file.

Maybe RHEL only wants to expose supported CPUs and supported machines.  Wouldn't
it be better to not have to patch QEMU to do that?

Wouldn't it be even better if you could drop in a separate CPU configuration 
file with the supported CPU types and then change the default /etc config to use 
that instead?

Regards,

Anthony Liguori
Avi Kivity March 25, 2012, 1:08 p.m. UTC | #57
On 03/25/2012 02:55 PM, Anthony Liguori wrote:
>> If cpu models are not part of configuration they should not be affected
>> by configuration mechanism. You are just avoiding addressing the real
>> question that is asked above.
>
>
> I think you're just refusing to listen.
>
> The stated direction of QEMU, for literally years now, is that we want
> to arrive at the following:
>
> QEMU is composed of a series of objects whose relationships can be
> fully described by an external configuration file.  Much of the
> current baked in concepts (like machines) would then become
> configuration files.
>
> qemu -M pc
>
> Would effectively be short hand for -readconfig
> /usr/share/qemu/machines/pc.cfg

In that case

 qemu -cpu westmere

is shorthand for -readconfig /usr/share/qemu/cpus/westmere.cfg.

> I think the thread has reduced to: should /usr/share configuration
> files be read by default or just treated as additional configuration
> files.

If they're read as soon as they're referenced, what's the difference?
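
To make the shorthand reading concrete, here is a small sketch that rewrites
-M and -cpu into the -readconfig paths named above. The expand_shorthand
helper and the SHORTHAND_DIRS table are hypothetical, and real -cpu values can
carry extra feature flags that this sketch ignores.

```python
# Directory under the data dir for each shorthand option (paths as written
# in the messages above; the table itself is an illustrative assumption).
SHORTHAND_DIRS = {"-M": "machines", "-cpu": "cpus"}


def expand_shorthand(argv, datadir="/usr/share/qemu"):
    """Rewrite '-M NAME' / '-cpu NAME' into '-readconfig <path>.cfg'."""
    out = []
    args = iter(argv)
    for arg in args:
        subdir = SHORTHAND_DIRS.get(arg)
        if subdir is None:
            out.append(arg)  # not a shorthand option, pass through
            continue
        name = next(args, None)  # consume the option's value
        if name is None:
            raise ValueError(f"{arg} needs a value")
        out += ["-readconfig", f"{datadir}/{subdir}/{name}.cfg"]
    return out


expanded = expand_shorthand(["qemu", "-M", "pc", "-cpu", "westmere"])
```

In this reading a file is only touched when an option references it, which is
the "read as soon as they're referenced" behaviour asked about above.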
Anthony Liguori March 25, 2012, 1:09 p.m. UTC | #58
On 03/25/2012 05:19 AM, Gleb Natapov wrote:
> On Thu, Mar 22, 2012 at 12:50:18PM -0300, Eduardo Habkost wrote:
>> On Thu, Mar 22, 2012 at 04:30:55PM +0200, Gleb Natapov wrote:
>>> On Thu, Mar 22, 2012 at 10:31:21AM -0300, Eduardo Habkost wrote:
>>>> On Thu, Mar 22, 2012 at 11:32:44AM +0200, Gleb Natapov wrote:
>>>>> What does this mean? Will -nodefconfig disable loading of bios.bin,
>>>>> option roms, keymaps?
>>>>
>>>> Correcting myself: loading of _config_ files on /usr/share. ROM images
>>>> are opaque data to be presented to the guest somehow, just like a disk
>>>> image or kernel binary. But maybe keymaps will become "configuration"
>>>> someday, I really don't know.
>>>>
>>> Where do you draw the line between "opaque data" and configuration. CPU
>>> models are also something that is present to a guest somehow.
>>
>> Just the fact that it's in a structured key=value format that Qemu
>> itself will interpret before exposing something to the guest. Yes, it's
>> a bit arbitrary. If we could make everything "configuration data", we
>> would (or that's what I think Anthony is pushing for--I hope he will
>> reply and clarify that).
>>
> It is not a "bit arbitrary" it is completely arbitrary.

It's the Unix Philosophy:

"Rule of Representation: Fold knowledge into data so program logic can be stupid 
and robust."

If it can be reasonably represented as data, it should be.  If that data can be 
pushed to a flat text file, it should be.  If you can avoid making that special, 
you should.  This keeps your core logic simpler, empowers the user, and creates 
greater flexibility long term.

Your whole argument seems to boil down to: I don't like this--but you aren't 
providing any concrete problems.  It doesn't make it harder to write a 
management tool, it's completely invisible to a user, and we have total control 
over the data files if they're stored in /usr/share.

So what's your concrete concern here?  Random comments about kvm tool or Gnome 3 
are not concrete concerns.  What use-case do you think is impacted here and why 
(and please be specific)?

http://en.wikipedia.org/wiki/Unix_philosophy

Regards,

Anthony Liguori
Anthony Liguori March 25, 2012, 1:12 p.m. UTC | #59
On 03/25/2012 08:08 AM, Avi Kivity wrote:
> On 03/25/2012 02:55 PM, Anthony Liguori wrote:
>>> If cpu models are not part of configuration they should not be affected
>>> by configuration mechanism. You are just avoiding addressing the real
>>> question that is asked above.
>>
>>
>> I think you're just refusing to listen.
>>
>> The stated direction of QEMU, for literally years now, is that we want
>> to arrive at the following:
>>
>> QEMU is composed of a series of objects whose relationships can be
>> fully described by an external configuration file.  Much of the
>> current baked in concepts (like machines) would then become
>> configuration files.
>>
>> qemu -M pc
>>
>> Would effectively be short hand for -readconfig
>> /usr/share/qemu/machines/pc.cfg
>
> In that case
>
>   qemu -cpu westmere
>
> is shorthand for -readconfig /usr/share/qemu/cpus/westmere.cfg.

This is not a bad suggestion, although it would make -cpu ? a bit awkward.  Do 
you see an advantage to this over having /usr/share/qemu/target-x86_64-cpus.cfg 
that's read early on?

>> I think the thread has reduced to: should /usr/share configuration
>> files be read by default or just treated as additional configuration
>> files.
>
> If they're read as soon as they're referenced, what's the difference?

I suspect libvirt would not be happy with reading configuration files on demand.

Regards,

Anthony Liguori
Avi Kivity March 25, 2012, 1:14 p.m. UTC | #60
On 03/25/2012 03:12 PM, Anthony Liguori wrote:
>>> qemu -M pc
>>>
>>> Would effectively be short hand for -readconfig
>>> /usr/share/qemu/machines/pc.cfg
>>
>> In that case
>>
>>   qemu -cpu westmere
>>
>> is shorthand for -readconfig /usr/share/qemu/cpus/westmere.cfg.
>
>
> This is not a bad suggestion, although it would make -cpu ? a bit
> awkward.  Do you see an advantage to this over having
> /usr/share/qemu/target-x86_64-cpus.cfg that's read early on?

Nope.  As long as qemu -nodefconfig -cpu westmere works, I'm happy.

The reasoning is, loading target-x86_64-cpus.cfg does not alter the
current instance's configuration, so reading it doesn't violate
-nodefconfig.

>>> I think the thread has reduced to: should /usr/share configuration
>>> files be read by default or just treated as additional configuration
>>> files.
>>
>> If they're read as soon as they're referenced, what's the difference?
>
> I suspect libvirt would not be happy with reading configuration files
> on demand..

Why not?
Avi Kivity March 25, 2012, 1:21 p.m. UTC | #61
On 03/11/2012 04:12 PM, Anthony Liguori wrote:
>> Let me elaborate about the later. Suppose host CPU has kill_guest
>> feature and at the time a guest was installed it was not implemented by
>> kvm. Since it was not implemented by kvm it was not present in vcpu
>> during installation and the guest didn't install "workaround kill_guest"
>> module. Now unsuspecting user upgrades the kernel and tries to restart
>> the guest and fails. He writes angry letter to qemu-devel and is
>> asked to
>> reinstall his guest and move along.
>
>
> -cpu best wouldn't solve this.  You need a read/write configuration
> file where QEMU probes the available CPU and records it to be used for
> the lifetime of the VM.

This doesn't work with live migration, and makes templating harder.  The
only persistent storage we can count on are disk images.

The current approach is simple.  The management tool determines the
configuration, qemu applies it.  Unidirectional information flow.  This
also lends itself to the management tool scanning a cluster and
determining a GCD.

> This discussion isn't about whether QEMU should have a Westmere
> processor definition.  In fact, I think I already applied that patch.
>
> It's a discussion about how we handle this up and down the stack.
>
> The question is who should define and manage CPU compatibility.  Right
> now QEMU does to a certain degree, libvirt discards this and does its
> own thing, and VDSM/ovirt-engine assume that we're providing something
> and has built a UI around it.
>
> What I'm proposing we consider: have VDSM manage CPU definitions in
> order to provide a specific user experience in ovirt-engine.
>
> We would continue to have Westmere/etc in QEMU exposed as part of the
> user configuration.  But I don't think it makes a lot of sense to have
> to modify QEMU any time a new CPU comes out.

We have to.  New features often come with new MSRs which need to be live
migrated, and of course the cpu flags as well.  We may push all these to
qemu data files, but this is still qemu.  We can't let a management tool
decide that cpu feature X is safe to use on qemu version Y.
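
The "GCD" a management tool would compute over a cluster, mentioned earlier in
this message, amounts to intersecting the per-host feature sets. A minimal
sketch, assuming the tool has already collected the flags from each host; the
host names and feature lists are made up for illustration.

```python
def common_features(hosts):
    """Return the CPU features supported by every host in the cluster."""
    feature_sets = [set(features) for features in hosts.values()]
    if not feature_sets:
        return set()  # an empty cluster has no common baseline
    return set.intersection(*feature_sets)


# Illustrative inventory; real data would come from probing each host.
cluster = {
    "host-a": {"sse2", "sse4.2", "aes"},
    "host-b": {"sse2", "sse4.2"},
    "host-c": {"sse2", "sse4.2", "aes", "avx"},
}
baseline = common_features(cluster)
```

A guest restricted to this baseline can be migrated between any of the hosts,
but whether a given feature is safe to expose on a given qemu version is still
something only qemu's own definitions can answer, which is the point made
above.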
Anthony Liguori March 25, 2012, 1:22 p.m. UTC | #62
On 03/25/2012 08:14 AM, Avi Kivity wrote:
> On 03/25/2012 03:12 PM, Anthony Liguori wrote:
>>>> qemu -M pc
>>>>
>>>> Would effectively be short hand for -readconfig
>>>> /usr/share/qemu/machines/pc.cfg
>>>
>>> In that case
>>>
>>>    qemu -cpu westmere
>>>
>>> is shorthand for -readconfig /usr/share/qemu/cpus/westmere.cfg.
>>
>>
>> This is not a bad suggestion, although it would make -cpu ? a bit
>> awkward.  Do you see an advantage to this over having
>> /usr/share/qemu/target-x86_64-cpus.cfg that's read early on?
>
> Nope.  As long as qemu -nodefconfig -cpu westmere works, I'm happy.

Why?  What's wrong with:

qemu -nodefconfig -readconfig /usr/share/qemu/cpus/target-x86_64-cpus.cfg \
      -cpu westmere

And if that's not okay, would:

qemu -nodefconfig -nocpudefconfig -cpu Westmere

Not working be a problem?

> The reasoning is, loading target-x86_64-cpus.cfg does not alter the
> current instance's configuration, so reading it doesn't violate
> -nodefconfig.

I think we have a different view of what -nodefconfig does.

We have a couple options today:

-nodefconfig

Don't read the default configuration files.  By default, we read 
/etc/qemu/qemu.cfg and /etc/qemu/target-$(ARCH).cfg

-nodefaults

Don't create default devices.

-vga none

Don't create the default VGA device (not covered by -nodefaults).

With these options, you get an absolutely minimalistic
instance of QEMU.  Tools like libguestfs really want to create the simplest
guest and do the least amount of processing so the guest runs as fast as possible.

It does suck a lot that this isn't a single option.  I would much prefer 
-nodefaults to be implied by -nodefconfig.  Likewise, I would prefer that 
-nodefaults implied -vga none.

>>>> I think the thread has reduced to: should /usr/share configuration
>>>> files be read by default or just treated as additional configuration
>>>> files.
>>>
>>> If they're read as soon as they're referenced, what's the difference?
>>
>> I suspect libvirt would not be happy with reading configuration files
>> on demand..
>
> Why not?

It implies a bunch of SELinux labeling to make sVirt work.  libvirt tries very 
hard to avoid having QEMU read *any* files at all when it starts up.

Regards,

Anthony Liguori
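
The preference stated above (-nodefconfig implying -nodefaults, which in turn
implies -vga none) can be sketched as a small implication table. This is
purely hypothetical behaviour: QEMU does not chain these options, and the
expand_implied helper exists only for illustration.

```python
# Each option is a tuple of argv words; the table encodes the *wished-for*
# implications from the message above, not what QEMU actually does.
IMPLIES = {
    ("-nodefconfig",): ("-nodefaults",),
    ("-nodefaults",): ("-vga", "none"),
}


def expand_implied(options):
    """Return a flat argv list with every implied option added once."""
    result = []
    pending = list(options)
    while pending:
        opt = pending.pop(0)
        if opt in result:
            continue  # already present, either given or implied earlier
        result.append(opt)
        implied = IMPLIES.get(opt)
        if implied is not None:
            pending.append(implied)
    return [word for opt in result for word in opt]


minimal = expand_implied([("-nodefconfig",)])
```

With such chaining, a tool like libguestfs would pass a single option instead
of three to get the minimal instance described above.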
Anthony Liguori March 25, 2012, 1:26 p.m. UTC | #63
On 03/25/2012 08:21 AM, Avi Kivity wrote:
> On 03/11/2012 04:12 PM, Anthony Liguori wrote:
>> This discussion isn't about whether QEMU should have a Westmere
>> processor definition.  In fact, I think I already applied that patch.
>>
>> It's a discussion about how we handle this up and down the stack.
>>
>> The question is who should define and manage CPU compatibility.  Right
>> now QEMU does to a certain degree, libvirt discards this and does its
>> own thing, and VDSM/ovirt-engine assume that we're providing something
>> and has built a UI around it.
>>
>> What I'm proposing we consider: have VDSM manage CPU definitions in
>> order to provide a specific user experience in ovirt-engine.
>>
>> We would continue to have Westmere/etc in QEMU exposed as part of the
>> user configuration.  But I don't think it makes a lot of sense to have
>> to modify QEMU any time a new CPU comes out.
>
> We have to.  New features often come with new MSRs which need to be live
> migrated, and of course the cpu flags as well.  We may push all these to
> qemu data files, but this is still qemu.  We can't let a management tool
> decide that cpu feature X is safe to use on qemu version Y.

I think QEMU should own CPU definitions.  I think a management tool should have 
the choice of whether they are used though because they are a policy IMHO.

It's okay for QEMU to implement some degree of policy as long as a management 
tool can override it with a different policy.

Regards,

Anthony Liguori

Avi Kivity March 25, 2012, 1:34 p.m. UTC | #64
On 03/25/2012 03:22 PM, Anthony Liguori wrote:
>>>> In that case
>>>>
>>>>    qemu -cpu westmere
>>>>
>>>> is shorthand for -readconfig /usr/share/qemu/cpus/westmere.cfg.
>>>
>>>
>>> This is not a bad suggestion, although it would make -cpu ? a bit
>>> awkward.  Do you see an advantage to this over having
>>> /usr/share/qemu/target-x86_64-cpus.cfg that's read early on?
>>
>> Nope.  As long as qemu -nodefconfig -cpu westmere works, I'm happy.
>
>
> Why?  What's wrong with:
>
> qemu -nodefconfig -readconfig
> /usr/share/qemu/cpus/target-x86_64-cpus.cfg \
>      -cpu westmere
>
> And if that's not okay, would:
>
> qemu -nodefconfig -nocpudefconfig -cpu Westmere
>
> Not working be a problem?

Apart from the command line length, it confuses configuration with
definition.

target-x86_64-cpus.cfg does not configure qemu for anything, it's merely
the equivalent of

  #define westmere (x86_def_t) { ... }
  #define nehalem (x86_def_t) { ... }
  #define bulldozer (x86_def_t) { ... } // for PC

so it should be read at each invocation.  On the other hand, pc.cfg and
westmere.cfg (as used previously) are shorthand for

   machine = (QEMUMachine) { ... };
   cpu = (x86_def_t) { ... };

so they should only be read if requested explicitly (or indirectly).

>
>> The reasoning is, loading target-x86_64-cpus.cfg does not alter the
>> current instance's configuration, so reading it doesn't violate
>> -nodefconfig.
>
> I think we have a different view of what -nodefconfig does.
>
> We have a couple options today:
>
> -nodefconfig
>
> Don't read the default configuration files.  By default, we read
> /etc/qemu/qemu.cfg and /etc/qemu/target-$(ARCH).cfg
>

The latter seems meaningless to avoid reading.  It's just a set of
#defines, what do you get by not reading it?

> -nodefaults
>
> Don't create default devices.
>
> -vga none
>
> Don't create the default VGA device (not covered by -nodefaults).
>
> With these options, you get an absolutely
> minimalistic instance of QEMU.  Tools like libguestfs really want to
> create the simplest guest and do the least amount of processing so the
> guest runs as fast as possible.
>
> It does suck a lot that this isn't a single option.  I would much
> prefer -nodefaults to be implied by -nodefconfig.  Likewise, I would
> prefer that -nodefaults implied -vga none.

I don't have a qemu.cfg so can't comment on it, but in what way does
reading target-x86_64.cfg affect the current instance (that is, why is
-nodefconfig needed over -nodefaults -vga look-at-the-previous-option?)

>
>>>>> I think the thread has reduced to: should /usr/share configuration
>>>>> files be read by default or just treated as additional configuration
>>>>> files.
>>>>
>>>> If they're read as soon as they're referenced, what's the difference?
>>>
>>> I suspect libvirt would not be happy with reading configuration files
>>> on demand..
>>
>> Why not?
>>
>> Why not?
>
> It implies a bunch of SELinux labeling to make sVirt work.  libvirt
> tries very hard to avoid having QEMU read *any* files at all when it
> starts up.

The /usr/share/qemu files should be statically labelled to allow qemu to
read them, so we can push more code into data files.
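
The definition-versus-configuration distinction drawn above can be sketched as
two separate tables: one that is always registered and one that only applies
when defaults are wanted. All names and values here are placeholders chosen
for illustration, not QEMU data structures.

```python
# Definitions: always registered, like the #define lines in the message
# above. The dictionary contents are placeholders, not real model data.
CPU_DEFS = {
    "westmere": {"name": "westmere"},
    "nehalem": {"name": "nehalem"},
}

# Hypothetical default configuration that -nodefconfig would skip.
DEFAULT_CONFIG = {"vga": "std"}


def build_instance(cpu=None, nodefconfig=False):
    """Build the instance configuration for one invocation.

    Skipping the defaults never removes a definition; a definition only
    affects the instance once it is requested explicitly.
    """
    config = {} if nodefconfig else dict(DEFAULT_CONFIG)
    if cpu is not None:
        config["cpu"] = CPU_DEFS[cpu.lower()]
    return config


instance = build_instance(cpu="Westmere", nodefconfig=True)
```

Under this model "qemu -nodefconfig -cpu westmere" works, because the lookup
table survives even though no default configuration was applied.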
Anthony Liguori March 25, 2012, 2:36 p.m. UTC | #65
On 03/25/2012 08:34 AM, Avi Kivity wrote:
> On 03/25/2012 03:22 PM, Anthony Liguori wrote:
>>>>> In that case
>>>>>
>>>>>     qemu -cpu westmere
>>>>>
>>>>> is shorthand for -readconfig /usr/share/qemu/cpus/westmere.cfg.
>>>>
>>>>
>>>> This is not a bad suggestion, although it would make -cpu ? a bit
>>>> awkward.  Do you see an advantage to this over having
>>>> /usr/share/qemu/target-x86_64-cpus.cfg that's read early on?
>>>
>>> Nope.  As long as qemu -nodefconfig -cpu westmere works, I'm happy.
>>
>>
>> Why?  What's wrong with:
>>
>> qemu -nodefconfig -readconfig
>> /usr/share/qemu/cpus/target-x86_64-cpus.cfg \
>>       -cpu westmere
>>
>> And if that's not okay, would:
>>
>> qemu -nodefconfig -nocpudefconfig -cpu Westmere
>>
>> Not working be a problem?
>
> Apart from the command line length, it confuses configuration with
> definition.

There is no distinction with what we have today.  Our configuration file 
basically corresponds to command line options and as there is no distinction in 
command line options, there's no distinction in the configuration format.

> target-x86_64-cpus.cfg does not configure qemu for anything, it's merely
> the equivalent of
>
>    #define westmere (x86_def_t) { ... }
>    #define nehalem (x86_def_t) { ... }
>    #define bulldozer (x86_def_t) { ... } // for PC
>
> so it should be read at each invocation.  On the other hand, pc.cfg and
> westmere.cfg (as used previously) are shorthand for
>
>     machine = (QEMUMachine) { ... };
>     cpu = (x86_def_t) { ... };
>
> so they should only be read if requested explicitly (or indirectly).

This doesn't make a lot of sense to me.  Here's what I'm proposing:

1) QEMU would have a target-x86_64-cpu.cfg.in that is installed by default in 
/etc/qemu.  It would contain:

[system]
# Load default CPU definitions
readconfig = @DATADIR@/target-x86_64-cpus.cfg

2) target-x86_64-cpus.cfg would be installed to @DATADIR@ and would contain:

[cpudef]
   name = "Westmere"
   ...

This has the following properties:

A) QEMU has no builtin notion of CPU definitions.  It just has a "cpu factory". 
  -cpudef will create a new class called Westmere that can then be enumerated 
through qom-type-list and created via qom-create.

B) A management tool has complete control over cpu definitions without modifying 
the underlying filesystem.  -nodefconfig will prevent it from loading and the 
management tool can explicitly load the QEMU definition (via -readconfig, 
potentially using a /dev/fd/N path) or it can define its own cpu definitions.

C) This model maps to any other type of class factory.  Machines will eventually 
be expressed as a class factory.  When we implement this, we would change the 
default target-x86_64-cpu.cfg to:

[system]
# Load default CPU definitions
readconfig = @DATADIR@/target-x86_64-cpus.cfg
# Load default machines
readconfig = @DATADIR@/target-x86_64-machines.cfg

A machine definition would look like:

[machinedef]
  name = pc-0.15
  virtio-blk.class_code = 32
  ...

Loading a file based on -cpu doesn't generalize well unless we try to load a 
definition for any possible QOM type to find the class factory for it.  I don't 
think this is a good idea.
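
The "cpu factory" idea in (A) can be illustrated with a minimal sketch. It is written in Python rather than C for brevity, and it is not QEMU's actual parser or API: parse_cpudefs, CpuRegistry, and the "level" values are hypothetical and used purely for illustration. Each [cpudef] section is registered as a named type that can then be enumerated and instantiated, and nothing is built in.

```python
def parse_cpudefs(text):
    """Return a list of dicts, one per [cpudef] section (hypothetical helper)."""
    defs = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            # Only [cpudef] sections define CPU models; other sections are skipped.
            current = {} if line[1:-1].strip() == "cpudef" else None
            if current is not None:
                defs.append(current)
            continue
        if current is not None and "=" in line:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip().strip('"')
    return defs


class CpuRegistry:
    """Toy stand-in for a type registry: definitions in, instances out."""

    def __init__(self):
        self._types = {}

    def register(self, cpudef):
        self._types[cpudef["name"]] = dict(cpudef)

    def type_list(self):
        return sorted(self._types)

    def create(self, name):
        if name not in self._types:
            raise KeyError("unknown CPU model: " + name)
        return dict(self._types[name])


# Illustrative file content; the property values are made up.
SAMPLE = '''
[cpudef]
   name = "Westmere"
   level = "11"

[cpudef]
   name = "Nehalem"
   level = "4"
'''

registry = CpuRegistry()
for cpudef in parse_cpudefs(SAMPLE):
    registry.register(cpudef)
print(registry.type_list())
```

In this model, a management tool that skips the default file simply starts with an empty registry and registers whatever definitions it chooses.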

>>> The reasoning is, loading target-x86_64-cpus.cfg does not alter the
>>> current instance's configuration, so reading it doesn't violate
>>> -nodefconfig.
>>
>> I think we have a different view of what -nodefconfig does.
>>
>> We have a couple options today:
>>
>> -nodefconfig
>>
>> Don't read the default configuration files.  By default, we read
>> /etc/qemu/qemu.cfg and /etc/qemu/target-$(ARCH).cfg
>>
>
> The latter seems meaningless to avoid reading.  It's just a set of
> #defines, what do you get by not reading it?

In my target-$(ARCH).cfg, I have:

[machine]
enable-kvm = "on"

Which means I don't have to use -enable-kvm anymore.  But if you look at a tool 
like libguestfs, start up time is the most important thing so avoiding 
unnecessary I/O and processing is critical.

>> -nodefaults
>>
>> Don't create default devices.
>>
>> -vga none
>>
>> Don't create the default VGA device (not covered by -nodefaults).
>>
>> With these two options, the semantics you get an absolutely
>> minimalistic instance of QEMU.  Tools like libguestfs really want to
>> create the simplest guest and do the least amount of processing so the
>> guest runs as fast as possible.
>>
>> It does suck a lot that this isn't a single option.  I would much
>> prefer -nodefaults to be implied by -nodefconfig.  Likewise, I would
>> prefer that -nodefaults implied -vga none.
>
> I don't have a qemu.cfg so can't comment on it, but in what way does
> reading target-x86_64.cfg affect the current instance (that is, why is
> -nodefconfig needed over -nodefaults -vga look-at-the-previous-option?)

It depends on what the user configures it to do.

Regards,

Anthony Liguori
Avi Kivity March 25, 2012, 2:46 p.m. UTC | #66
On 03/25/2012 04:36 PM, Anthony Liguori wrote:
>> Apart from the command line length, it confuses configuration with
>> definition.
>
>
> There is no distinction with what we have today.  Our configuration
> file basically corresponds to command line options and as there is no
> distinction in command line options, there's no distinction in the
> configuration format.

We don't have command line options for defining, only configuring.

Again, defining = #define
Configuring = modifying current instance

>
>> target-x86_64-cpus.cfg does not configure qemu for anything, it's merely
>> the equivalent of
>>
>>    #define westmere (x86_def_t) { ... }
>>    #define nehalem (x86_def_t) { ... }
>>    #define bulldozer (x86_def_t) { ... } // for PC
>>
>> so it should be read at each invocation.  On the other hand, pc.cfg and
>> westmere.cfg (as used previously) are shorthand for
>>
>>     machine = (QEMUMachine) { ... };
>>     cpu = (x86_def_t) { ... };
>>
>> so they should only be read if requested explicitly (or indirectly).
>
> This doesn't make a lot of sense to me.  Here's what I'm proposing:
>
> 1) QEMU would have a target-x86_64-cpu.cfg.in that is installed by
> default in /etc/qemu.  It would contain:
>
> [system]
> # Load default CPU definitions
> readconfig = @DATADIR@/target-x86_64-cpus.cfg
>
> 2) target-x86_64-cpus.cfg would be installed to @DATADIR@ and would
> contain:
>
> [cpudef]
>   name = "Westmere"
>   ...
>
> This has the following properties:
>
> A) QEMU has no builtin notion of CPU definitions.  It just has a "cpu
> factory".  -cpudef will create a new class called Westmere that can
> then be enumerated through qom-type-list and created via qom-create.
>
> B) A management tool has complete control over cpu definitions without
> modifying the underlying filesystem.  -nodefconfig will prevent it
> from loading and the management tool can explicitly load the QEMU
> definition (via -readconfig, potentially using a /dev/fd/N path) or it
> can define it's own cpu definitions.

Why does -nodefconfig affect anything?

The file defines westmere as an alias for a grab bag of options. 
Whether it's loaded or not is immaterial, unless someone uses one of the
names within.

>
> C) This model maps to any other type of class factory.  Machines will
> eventually be expressed as a class factory.  When we implement this,
> we would change the default target-x86_64-cpu.cfg to:
>
> [system]
> # Load default CPU definitions
> readconfig = @DATADIR@/target-x86_64-cpus.cfg
> # Load default machines
> readconfig = @DATADIR@/target-x86_64-machines.cfg
>
> A machine definition would look like:
>
> [machinedef]
>  name = pc-0.15
>  virtio-blk.class_code = 32
>  ...
>
> Loading a file based on -cpu doesn't generalize well unless we try to
> load a definition for any possible QOM type to find the class factory
> for it.  I don't think this is a good idea.

Why not load all class factories?  Just don't instantiate any objects.

Otherwise, the meaning of -nodefconfig changes as more stuff is moved
out of .c and into .cfg.

>
>>>> The reasoning is, loading target-x86_64-cpus.cfg does not alter the
>>>> current instance's configuration, so reading it doesn't violate
>>>> -nodefconfig.
>>>
>>> I think we have a different view of what -nodefconfig does.
>>>
>>> We have a couple options today:
>>>
>>> -nodefconfig
>>>
>>> Don't read the default configuration files.  By default, we read
>>> /etc/qemu/qemu.cfg and /etc/qemu/target-$(ARCH).cfg
>>>
>>
>> The latter seems meaningless to avoid reading.  It's just a set of
>> #defines, what do you get by not reading it?
>
> In my target-$(ARCH).cfg, I have:
>
> [machine]
> enable-kvm = "on"
>
> Which means I don't have to use -enable-kvm anymore.  But if you look
> at a tool like libguestfs, start up time is the most important thing
> so avoiding unnecessary I/O and processing is critical.

So this is definitely configuration (applies to the current instance) as
opposed to target-x86_64.cfg, which doesn't.

>
>>> -nodefaults
>>>
>>> Don't create default devices.
>>>
>>> -vga none
>>>
>>> Don't create the default VGA device (not covered by -nodefaults).
>>>
>>> With these two options, the semantics you get an absolutely
>>> minimalistic instance of QEMU.  Tools like libguestfs really want to
>>> create the simplest guest and do the least amount of processing so the
>>> guest runs as fast as possible.
>>>
>>> It does suck a lot that this isn't a single option.  I would much
>>> prefer -nodefaults to be implied by -nodefconfig.  Likewise, I would
>>> prefer that -nodefaults implied -vga none.
>>
>> I don't have a qemu.cfg so can't comment on it, but in what way does
>> reading target-x86_64.cfg affect the current instance (that is, why is
>> -nodefconfig needed over -nodefaults -vga look-at-the-previous-option?)
>
> It depends on what the user configures it to do.

How?

As far as I can tell, the only difference is that -nodefconfig -cpu
westmere will error out instead of working.  But if you don't supply
-cpu westmere, the configuration is identical.
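
A minimal sketch of the point above, under stated assumptions: the model tables, property values, and the build_guest helper are hypothetical, and "qemu64" is used here only as a stand-in for a model compiled into the binary. It models definitions as a lookup table and shows that skipping the file changes the result only when a name from that file is actually requested.

```python
# Models compiled into the binary versus models that come from a definitions file.
# All property values are illustrative.
BUILTIN_MODELS = {"qemu64": {"family": 6, "model": 2}}
FILE_MODELS = {"westmere": {"family": 6, "model": 44}}


def available_models(nodefconfig):
    """Return the models visible to this invocation."""
    models = dict(BUILTIN_MODELS)
    if not nodefconfig:
        models.update(FILE_MODELS)
    return models


def build_guest(cpu=None, nodefconfig=False):
    """Return the guest-visible CPU description, or fail on an unknown name."""
    models = available_models(nodefconfig)
    name = cpu or "qemu64"
    if name not in models:
        raise ValueError("unknown CPU model: " + name)
    return {"cpu": name, **models[name]}


# Without -cpu, the guest is the same whether or not the file was read.
print(build_guest() == build_guest(nodefconfig=True))
```

Requesting "westmere" with nodefconfig=True raises ValueError in this sketch, which mirrors the "error out instead of working" case.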
Gleb Natapov March 25, 2012, 2:46 p.m. UTC | #67
On Sun, Mar 25, 2012 at 08:09:37AM -0500, Anthony Liguori wrote:
> On 03/25/2012 05:19 AM, Gleb Natapov wrote:
> >On Thu, Mar 22, 2012 at 12:50:18PM -0300, Eduardo Habkost wrote:
> >>On Thu, Mar 22, 2012 at 04:30:55PM +0200, Gleb Natapov wrote:
> >>>On Thu, Mar 22, 2012 at 10:31:21AM -0300, Eduardo Habkost wrote:
> >>>>On Thu, Mar 22, 2012 at 11:32:44AM +0200, Gleb Natapov wrote:
> >>>>>What does this mean? Will -nodefconfig disable loading of bios.bin,
> >>>>>option roms, keymaps?
> >>>>
> >>>>Correcting myself: loading of _config_ files on /usr/share. ROM images
> >>>>are opaque data to be presented to the guest somehow, just like a disk
> >>>>image or kernel binary. But maybe keymaps will become "configuration"
> >>>>someday, I really don't know.
> >>>>
> >>>Where do you draw the line between "opaque data" and configuration. CPU
> >>>models are also something that is present to a guest somehow.
> >>
> >>Just the fact that it's in a structured key=value format that Qemu
> >>itself will interpret before exposing something to the guest. Yes, it's
> >>a bit arbitrary. If we could make everything "configuration data", we
> >>would (or that's what I think Anthony is pushing for--I hope he will
> >>reply and clarify that).
> >>
> >It is not a "bit arbitrary" it is completely arbitrary.
> 
> It's the Unix Philosophy:
> 
> "Rule of Representation: Fold knowledge into data so program logic
> can be stupid and robust."
> 
> If it can be reasonably represented as data, it should be.  If that
> data can be pushed to a flat text file, it should be.  If you can
> avoid making that special, you should.  This keeps your core logic
> simpler, empowers the user, and creates greater flexibility long
> term.
> 
So you are making my point. You should be able to move data outside of
your code without it becoming a user-configurable file.

> Your whole argument seems to boil down to: I don't like this--but
> you aren't providing any concrete problems.  It doesn't make it
> harder to write a management tool, it's completely invisible to a
> user, and we have total control over the data files if they're
> stored in /usr/share.
> 
I don't like what? Judging by the above two paragraphs I am not so sure you
know. I am for moving cpu model definitions into a separate file and putting
it into /usr/share. I am against QEMU not loading it. The reason I am
against it is that the file is not part of a machine configuration
and does not stand on its own. It depends on the combination of QEMU/KVM
and machine definition. You said in this thread that CPU types should be
treated like regular devices by the machine type mechanism, i.e. machine types
should have a list of properties for each cpu model which are different
from the default.  I do agree with that, but how is it going to work if you
do not even have standard model definitions that you can rely on?

> So what's your concrete concern here?  Random comments about kvm
> tool or Gnome 3 are not concrete concerns.  What use-case do you
> think is impacted here and why (and please be specific)?
Those are comments about QEMU usability. You do not consider that
important?

> 
> http://en.wikipedia.org/wiki/Unix_philosophy
> 
Nothing there supports your design. Actually I think it contradicts at
least this:
 Rule of Clarity: Clarity is better than cleverness
You try to be clever, but in the end nobody expects CPU models to
disappear just because you asked QEMU not to create the default machine.

And you still didn't answer what your view is on the current state of
affairs, where cpu models in .c files are present while those in a separate
file disappear. So you view it as a bug and are going to make those in
.c files disappear too?

--
			Gleb.
Gleb Natapov March 25, 2012, 2:58 p.m. UTC | #68
On Sun, Mar 25, 2012 at 03:14:56PM +0200, Avi Kivity wrote:
> On 03/25/2012 03:12 PM, Anthony Liguori wrote:
> >>> qemu -M pc
> >>>
> >>> Would effectively be short hand for -readconfig
> >>> /usr/share/qemu/machines/pc.cfg
> >>
> >> In that case
> >>
> >>   qemu -cpu westmere
> >>
> >> is shorthand for -readconfig /usr/share/qemu/cpus/westmere.cfg.
> >
> >
> > This is not a bad suggestion, although it would make -cpu ? a bit
> > awkward.  Do you see an advantage to this over having
> > /usr/share/qemu/target-x86_64-cpus.cfg that's read early on?
> 
> Nope.  As long as qemu -nodefconfig -cpu westmere works, I'm happy.
> 
As long as qemu -nodefconfig -cpu westmere -M pc1.1 can use a different
westmere definition than -M pc1.0 (by amending it according to qom
properties in the pc1.1 machine description or by reading
/usr/share/qemu/cpus/westmere-pc1.1.cfg instead) I'm happy too.
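
A minimal sketch of the first variant (amending a base CPU model with per-machine properties). The table names, the "feature_x" property, and the resolve_cpu helper are hypothetical and chosen only to illustrate the lookup order. The machine names are the ones used above.

```python
# Base CPU model definitions, shared by all machine types (values illustrative).
BASE_CPU_MODELS = {
    "westmere": {"model": 44, "feature_x": False},
}

# Per-machine-type amendments: only properties that differ from the base.
MACHINE_CPU_OVERRIDES = {
    "pc1.0": {},
    "pc1.1": {"westmere": {"feature_x": True}},
}


def resolve_cpu(machine, cpu):
    """Return the CPU properties a guest sees for a machine/cpu pair."""
    props = dict(BASE_CPU_MODELS[cpu])
    overrides = MACHINE_CPU_OVERRIDES.get(machine, {}).get(cpu, {})
    props.update(overrides)
    return props


print(resolve_cpu("pc1.0", "westmere"))
print(resolve_cpu("pc1.1", "westmere"))
```

The sketch only works if BASE_CPU_MODELS is always populated, which is the dependency on standard model definitions being argued here.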

> The reasoning is, loading target-x86_64-cpus.cfg does not alter the
> current instance's configuration, so reading it doesn't violate
> -nodefconfig.
> 
> >>> files be read by default or just treated as additional configuration
> >>> files.
> >>
> >> If they're read as soon as they're referenced, what's the difference?
> > I think the thread has reduced to: should /usr/share configuration
> >
> > I suspect libvirt would not be happy with reading configuration files
> > on demand..
> 
> Why not?
> 
> -- 
> error compiling committee.c: too many arguments to function

--
			Gleb.
Anthony Liguori March 25, 2012, 2:59 p.m. UTC | #69
On 03/25/2012 09:46 AM, Avi Kivity wrote:
> On 03/25/2012 04:36 PM, Anthony Liguori wrote:
>>> Apart from the command line length, it confuses configuration with
>>> definition.
>>
>>
>> There is no distinction with what we have today.  Our configuration
>> file basically corresponds to command line options and as there is no
>> distinction in command line options, there's no distinction in the
>> configuration format.
>
> We don't have command line options for defining, only configuring.

That's an oversight.  There should be a -cpudef option.  It's a QemuOptsList.

> Again, defining = #define

I think -global fits your definition of #define...

> Configuring = modifying current instance
>
>>
>>> target-x86_64-cpus.cfg does not configure qemu for anything, it's merely
>>> the equivalent of
>>>
>>>     #define westmere (x86_def_t) { ... }
>>>     #define nehalem (x86_def_t) { ... }
>>>     #define bulldozer (x86_def_t) { ... } // for PC
>>>
>>> so it should be read at each invocation.  On the other hand, pc.cfg and
>>> westmere.cfg (as used previously) are shorthand for
>>>
>>>      machine = (QEMUMachine) { ... };
>>>      cpu = (x86_def_t) { ... };
>>>
>>> so they should only be read if requested explicitly (or indirectly).
>>
>> This doesn't make a lot of sense to me.  Here's what I'm proposing:
>>
>> 1) QEMU would have a target-x86_64-cpu.cfg.in that is installed by
>> default in /etc/qemu.  It would contain:
>>
>> [system]
>> # Load default CPU definitions
>> readconfig = @DATADIR@/target-x86_64-cpus.cfg
>>
>> 2) target-x86_64-cpus.cfg would be installed to @DATADIR@ and would
>> contain:
>>
>> [cpudef]
>>    name = "Westmere"
>>    ...
>>
>> This has the following properties:
>>
>> A) QEMU has no builtin notion of CPU definitions.  It just has a "cpu
>> factory".  -cpudef will create a new class called Westmere that can
>> then be enumerated through qom-type-list and created via qom-create.
>>
>> B) A management tool has complete control over cpu definitions without
>> modifying the underlying filesystem.  -nodefconfig will prevent it
>> from loading and the management tool can explicitly load the QEMU
>> definition (via -readconfig, potentially using a /dev/fd/N path) or it
>> can define it's own cpu definitions.
>
> Why does -nodefconfig affect anything?

Because -nodefconfig means "don't load *any* default configuration files".

> The file defines westmere as an alias for a grab bag of options.
> Whether it's loaded or not is immaterial, unless someone uses one of the
> names within.

But you would agree, a management tool should be able to control whether class 
factories get loaded, right?  So what's the mechanism to do this?

>> C) This model maps to any other type of class factory.  Machines will
>> eventually be expressed as a class factory.  When we implement this,
>> we would change the default target-x86_64-cpu.cfg to:
>>
>> [system]
>> # Load default CPU definitions
>> readconfig = @DATADIR@/target-x86_64-cpus.cfg
>> # Load default machines
>> readconfig = @DATADIR@/target-x86_64-machines.cfg
>>
>> A machine definition would look like:
>>
>> [machinedef]
>>   name = pc-0.15
>>   virtio-blk.class_code = 32
>>   ...
>>
>> Loading a file based on -cpu doesn't generalize well unless we try to
>> load a definition for any possible QOM type to find the class factory
>> for it.  I don't think this is a good idea.
>
> Why not load all class factories?  Just don't instantiate any objects.

Unless we have two different config syntaxes, I think it will lead to a lot of 
confusion.  Having some parts of a config file be parsed and others not is 
fairly strange.

> Otherwise, the meaning of -nodefconfig changes as more stuff is moved
> out of .c and into .cfg.

What's the problem with this?

>>>>> The reasoning is, loading target-x86_64-cpus.cfg does not alter the
>>>>> current instance's configuration, so reading it doesn't violate
>>>>> -nodefconfig.
>>>>
>>>> I think we have a different view of what -nodefconfig does.
>>>>
>>>> We have a couple options today:
>>>>
>>>> -nodefconfig
>>>>
>>>> Don't read the default configuration files.  By default, we read
>>>> /etc/qemu/qemu.cfg and /etc/qemu/target-$(ARCH).cfg
>>>>
>>>
>>> The latter seems meaningless to avoid reading.  It's just a set of
>>> #defines, what do you get by not reading it?
>>
>> In my target-$(ARCH).cfg, I have:
>>
>> [machine]
>> enable-kvm = "on"
>>
>> Which means I don't have to use -enable-kvm anymore.  But if you look
>> at a tool like libguestfs, start up time is the most important thing
>> so avoiding unnecessary I/O and processing is critical.
>
> So this is definitely configuration (applies to the current instance) as
> opposed to target-x86_64.cfg, which doesn't.

I'm not sure which part you're responding to..

>>
>>>> -nodefaults
>>>>
>>>> Don't create default devices.
>>>>
>>>> -vga none
>>>>
>>>> Don't create the default VGA device (not covered by -nodefaults).
>>>>
>>>> With these two options, the semantics you get an absolutely
>>>> minimalistic instance of QEMU.  Tools like libguestfs really want to
>>>> create the simplest guest and do the least amount of processing so the
>>>> guest runs as fast as possible.
>>>>
>>>> It does suck a lot that this isn't a single option.  I would much
>>>> prefer -nodefaults to be implied by -nodefconfig.  Likewise, I would
>>>> prefer that -nodefaults implied -vga none.
>>>
>>> I don't have a qemu.cfg so can't comment on it, but in what way does
>>> reading target-x86_64.cfg affect the current instance (that is, why is
>>> -nodefconfig needed over -nodefaults -vga look-at-the-previous-option?)
>>
>> It depends on what the user configures it to do.
>
> How?
>
> As far as I can tell, the only difference is that -nodefconfig -cpu
> westmere will error out instead of working.  But if you don't supply
> -cpu westmere, the configuration is identical.

What configuration?

Let me ask, what do you think the semantics of -nodefconfig should be?  I'm not 
sure I understand what you're advocating for.

Regards,

Anthony Liguori
Anthony Liguori March 25, 2012, 3:06 p.m. UTC | #70
On 03/25/2012 09:46 AM, Gleb Natapov wrote:
> On Sun, Mar 25, 2012 at 08:09:37AM -0500, Anthony Liguori wrote:
>> On 03/25/2012 05:19 AM, Gleb Natapov wrote:
>> It's the Unix Philosophy:
>>
>> "Rule of Representation: Fold knowledge into data so program logic
>> can be stupid and robust."
>>
>> If it can be reasonably represented as data, it should be.  If that
>> data can be pushed to a flat text file, it should be.  If you can
>> avoid making that special, you should.  This keeps your core logic
>> simpler, empowers the user, and creates greater flexibility long
>> term.
>>
> So you are making my point. You should be able to move data outside of
> you code without it becoming user configurable file.

You're reading words that don't exist.

>> Your whole argument seems to boil down to: I don't like this--but
>> you aren't providing any concrete problems.  It doesn't make it
>> harder to write a management tool, it's completely invisible to a
>> user, and we have total control over the data files if they're
>> stored in /usr/share.
>>
> I don't like what?

User configuration apparently.

> Jugging by above two paragraph I am not so sure you
> know. I am for moving cpu model definitions into separate file and putting
> it into /usr/share. I am against QEMU not loading it.

Why are you trying to prevent a user from being able to control what QEMU does?

> The reason I am
> against it is because the file is not part of a machine configuration
> and does not stands by it's own.

This is not a concrete argument.  It assumes that there's an agreed upon concept 
of "machine configuration" and "stands by it's own" which there obviously isn't.

What is the concrete technical or use-case argument here beyond that it doesn't 
match a concept that you have in your head of how things should be?

> It depends on combination of QEMU/KVM
> and machine definition. You said in this thread that CPU types should be
> treated like regular devices by machine type mechanism i.e machine types
> should have list of properties for each cpu model which are different
> from default.  I do agree with that but how is it going to work if you
> do not event have standard model definitions that you can rely on.

Who is "you"?  QEMU will provide a list of models in /usr/share that are loaded 
by default.  If you actively disable it by using -nodefconfig, you're on your 
own.  I would personally never use -nodefconfig.  The only user of -nodefconfig 
is a management tool that is purposefully trying to make QEMU do the 
minimalistic amount of things possible.

I'm not sympathetic to arguments that users are stupid and you have to keep 
them from doing things they shouldn't.  Defaults should Just Work and simple 
things should be simple to do.  But if a user expressly tells QEMU not to enable 
defaults, then they should know what they're doing.

>> So what's your concrete concern here?  Random comments about kvm
>> tool or Gnome 3 are not concrete concerns.  What use-case do you
>> think is impacted here and why (and please be specific)?
> That are comment about QEMU usability. You do not consider that
> important?
>
>>
>> http://en.wikipedia.org/wiki/Unix_philosophy
>>
> Nothing there supports your design. Actually I think it contradicts at
> least this:
>   Rule of Clarity: Clarity is better than cleverness
> You try to be clever, but in the end nobody expects CPU models to
> disappear just because you asked QEMU to not create default machine.

It's not clever to me, it's obvious.

> And you still didn't answer what is your view on current state of
> affairs where cpu models in .c files are present while those in separate
> file are diaper?

This is strictly a compatibility issue.  At this point in time, we could move 
the .c definitions to a configuration file as we've gone through enough releases 
with the default configuration file present.

> So you view it as a bug and is going to make those in
> .c files disappear to ?

Absolutely.

Regards,

Anthony Liguori

> --
> 			Gleb.
Anthony Liguori March 25, 2012, 3:07 p.m. UTC | #71
On 03/25/2012 09:58 AM, Gleb Natapov wrote:
> On Sun, Mar 25, 2012 at 03:14:56PM +0200, Avi Kivity wrote:
>> On 03/25/2012 03:12 PM, Anthony Liguori wrote:
>>>>> qemu -M pc
>>>>>
>>>>> Would effectively be short hand for -readconfig
>>>>> /usr/share/qemu/machines/pc.cfg
>>>>
>>>> In that case
>>>>
>>>>    qemu -cpu westmere
>>>>
>>>> is shorthand for -readconfig /usr/share/qemu/cpus/westmere.cfg.
>>>
>>>
>>> This is not a bad suggestion, although it would make -cpu ? a bit
>>> awkward.  Do you see an advantage to this over having
>>> /usr/share/qemu/target-x86_64-cpus.cfg that's read early on?
>>
>> Nope.  As long as qemu -nodefconfig -cpu westmere works, I'm happy.
>>
> As log as qemu -nodefconfig -cpu westmere -M pc1.1

-nodefconfig is going to eventually mean that -cpu westmere and -M pc-1.1 will 
not work.

This is where QEMU is going.  There is no reason that a normal user should ever 
use -nodefconfig.

Regards,

Anthony Liguori
Avi Kivity March 25, 2012, 3:16 p.m. UTC | #72
On 03/25/2012 04:59 PM, Anthony Liguori wrote:
> On 03/25/2012 09:46 AM, Avi Kivity wrote:
>> On 03/25/2012 04:36 PM, Anthony Liguori wrote:
>>>> Apart from the command line length, it confuses configuration with
>>>> definition.
>>>
>>>
>>> There is no distinction with what we have today.  Our configuration
>>> file basically corresponds to command line options and as there is no
>>> distinction in command line options, there's no distinction in the
>>> configuration format.
>>
>> We don't have command line options for defining, only configuring.
>
> That's an oversight.  There should be a -cpudef option.  It's a
> QemuOptsList.
>
>> Again, defining = #define
>
> I think -global fits your definition of #define...

Yes (apart from the corner case of modifying a default-instantiated device).

>>> B) A management tool has complete control over cpu definitions without
>>> modifying the underlying filesystem.  -nodefconfig will prevent it
>>> from loading and the management tool can explicitly load the QEMU
>>> definition (via -readconfig, potentially using a /dev/fd/N path) or it
>>> can define it's own cpu definitions.
>>
>> Why does -nodefconfig affect anything?
>
>
> Because -nodefconfig means "don't load *any* default configuration
> files".

Put the emphasis around *configuration*.

"#define westmere blah" is not configuration, otherwise the meaning of
configuration will drift over time.

-cpu blah is, of course.

>
>> The file defines westmere as an alias for a grab bag of options.
>> Whether it's loaded or not is immaterial, unless someone uses one of the
>> names within.
>
> But you would agree, a management tool should be able to control
> whether class factories get loaded, right?  

No, why?  But perhaps I don't entirely get what you mean by "class
factories".

Aren't they just implementations of

   virtual Device *new_instance(...) = 0?
  
If so, why not load them?

> So what's the mechanism to do this?
>
>>> C) This model maps to any other type of class factory.  Machines will
>>> eventually be expressed as a class factory.  When we implement this,
>>> we would change the default target-x86_64-cpu.cfg to:
>>>
>>> [system]
>>> # Load default CPU definitions
>>> readconfig = @DATADIR@/target-x86_64-cpus.cfg
>>> # Load default machines
>>> readconfig = @DATADIR@/target-x86_64-machines.cfg
>>>
>>> A machine definition would look like:
>>>
>>> [machinedef]
>>>   name = pc-0.15
>>>   virtio-blk.class_code = 32
>>>   ...
>>>
>>> Loading a file based on -cpu doesn't generalize well unless we try to
>>> load a definition for any possible QOM type to find the class factory
>>> for it.  I don't think this is a good idea.
>>
>> Why not load all class factories?  Just don't instantiate any objects.
>
> Unless we have two different config syntaxes, I think it will lead to
> a lot of confusion.  Having some parts of a config file be parsed and
> others not is fairly strange.

Parse all of them (and make sure all are class factories).

The only real configuration item is that without -nodefconfig, we create
a -M pc-1.1 system.  Everything else derives from that.

>
>> Otherwise, the meaning of -nodefconfig changes as more stuff is moved
>> out of .c and into .cfg.
>
> What's the problem with this?

The command line becomes unstable if you use -nodefconfig.

>>>
>>> In my target-$(ARCH).cfg, I have:
>>>
>>> [machine]
>>> enable-kvm = "on"
>>>
>>> Which means I don't have to use -enable-kvm anymore.  But if you look
>>> at a tool like libguestfs, start up time is the most important thing
>>> so avoiding unnecessary I/O and processing is critical.
>>
>> So this is definitely configuration (applies to the current instance) as
>> opposed to target-x86_64.cfg, which doesn't.
>  
>
> I'm not sure which part you're responding to..

I was saying that target-x86_64.cfg appears to be definitions, not
configuration, and was asking about qemu.cfg (which is configuration).

>> As far as I can tell, the only difference is that -nodefconfig -cpu
>> westmere will error out instead of working.  But if you don't supply
>> -cpu westmere, the configuration is identical.
>
> What configuration?
>
> Let me ask, what do you think the semantics of -nodefconfig should
> be?  I'm not sure I understand what you're advocating for.
>

-nodefconfig = create an empty machine, don't assume anything (=don't
read qemu.cfg) let me build it out of all those lego bricks.  Those can
be defined in code or in definition files in /usr/share, I don't care.

Maybe that's -nodevices -vga none.  But in this case I don't see the
point in -nodefconfig.  Not loading target-x86_64.cfg doesn't buy the
user anything, since it wouldn't affect the guest in any way.
Avi Kivity March 25, 2012, 3:18 p.m. UTC | #73
On 03/25/2012 05:07 PM, Anthony Liguori wrote:
>> As log as qemu -nodefconfig -cpu westmere -M pc1.1
>
>
> -nodefconfig is going to eventually mean that -cpu westmere and -M
> pc-1.1 will not work.
>
> This is where QEMU is going.  There is no reason that a normal user
> should ever use -nodefconfig.

I don't think anyone or anything can use it, since its meaning is not
well defined.  "Don't read any configuration files", where parts of qemu
are continually moved out to configuration files, means it's a moving target.

Suppose we define the southbridge via a configuration file.  Does that
mean we don't load it any more?
Anthony Liguori March 25, 2012, 3:26 p.m. UTC | #74
On 03/25/2012 10:16 AM, Avi Kivity wrote:
> On 03/25/2012 04:59 PM, Anthony Liguori wrote:
>> On 03/25/2012 09:46 AM, Avi Kivity wrote:
>>> On 03/25/2012 04:36 PM, Anthony Liguori wrote:
>>>>> Apart from the command line length, it confuses configuration with
>>>>> definition.
>>>>
>>>>
>>>> There is no distinction with what we have today.  Our configuration
>>>> file basically corresponds to command line options and as there is no
>>>> distinction in command line options, there's no distinction in the
>>>> configuration format.
>>>
>>> We don't have command line options for defining, only configuring.
>>
>> That's an oversight.  There should be a -cpudef option.  It's a
>> QemuOptsList.
>>
>>> Again, defining = #define
>>
>> I think -global fits your definition of #define...
>
> Yes (apart from the corner case of modifying a default-instantiated device).
>
>>>> B) A management tool has complete control over cpu definitions without
>>>> modifying the underlying filesystem.  -nodefconfig will prevent it
>>>> from loading and the management tool can explicitly load the QEMU
>>>> definition (via -readconfig, potentially using a /dev/fd/N path) or it
>>>> can define it's own cpu definitions.
>>>
>>> Why does -nodefconfig affect anything?
>>
>>
>> Because -nodefconfig means "don't load *any* default configuration
>> files".
>
> Put the emphasis around *configuration*.

So how about:

1) Load ['@SYSCONFDIR@/qemu/qemu.cfg', '@SYSCONFDIR@/qemu/target-@ARCH@.cfg',
          '@DATADIR@/system.cfg', '@DATADIR@/system-@ARCH@.cfg']

2) system-@ARCH@.cfg will contain:

[system]
readconfig=@DATADIR@/target-@ARCH@-cpus.cfg
readconfig=@DATADIR@/target-@ARCH@-machine.cfg

3) -nodefconfig will not load any configuration files from DATADIR or 
SYSCONFDIR.  -no-user-config will not load any configuration files from SYSCONFDIR.
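
As a rough illustration of the three points above, the following Python sketch computes which default files would be read under each flag. It models only the proposal in this mail, not QEMU's actual option handling; the function name `default_config_files` and the example directory values are hypothetical.

```python
def default_config_files(sysconfdir, datadir, arch,
                         nodefconfig=False, no_user_config=False):
    """Return the default config files that would be read, in order.

    Models only the proposal in this mail; it is not QEMU code.
    """
    user_files = [
        f"{sysconfdir}/qemu/qemu.cfg",
        f"{sysconfdir}/qemu/target-{arch}.cfg",
    ]
    system_files = [
        f"{datadir}/system.cfg",
        f"{datadir}/system-{arch}.cfg",
    ]
    if nodefconfig:
        # Nothing from SYSCONFDIR or DATADIR is read.
        return []
    if no_user_config:
        # Only the SYSCONFDIR files are skipped.
        return system_files
    return user_files + system_files


# The directory values below are assumptions chosen for illustration.
for flags in ({}, {"no_user_config": True}, {"nodefconfig": True}):
    files = default_config_files("/etc", "/usr/share/qemu", "x86_64", **flags)
    print(flags, files)
```

Under this model -no-user-config is a strict subset of -nodefconfig, which is why the two can be offered side by side.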

> "#define westmere blah" is not configuration, otherwise the meaning of
> configuration will drift over time.
>
> -cpu blah is, of course.

It's the same mechanism, but the above would create two classes of default 
configuration files and then it becomes a question of how they're used.

>>> The file defines westmere as an alias for a grab bag of options.
>>> Whether it's loaded or not is immaterial, unless someone uses one of the
>>> names within.
>>
>> But you would agree, a management tool should be able to control
>> whether class factories get loaded, right?
>
> No, why?  But perhaps I don't entirely get what you mean by "class
> factories".
>
> Aren't they just implementations of
>
>     virtual Device *new_instance(...) = 0?
>
> if so, why not load them?

No, a class factory creates a new type of class.  -cpudef will ultimately call
type_register() to create a new QOM visible type.  From a management tool's
perspective, the type is no different than a built-in type.

>>> Otherwise, the meaning of -nodefconfig changes as more stuff is moved
>>> out of .c and into .cfg.
>>
>> What's the problem with this?
>
> The command line becomes unstable if you use -nodefconfig.

-no-user-config solves this but I fully expect libvirt would continue to use 
-nodefconfig.

>>>>
>>>> In my target-$(ARCH).cfg, I have:
>>>>
>>>> [machine]
>>>> enable-kvm = "on"
>>>>
>>>> Which means I don't have to use -enable-kvm anymore.  But if you look
>>>> at a tool like libguestfs, start up time is the most important thing
>>>> so avoiding unnecessary I/O and processing is critical.
>>>
>>> So this is definitely configuration (applies to the current instance) as
>>> opposed to target-x86_64.cfg, which doesn't.
>>
>>
>> I'm not sure which part you're responding to..
>
> I was saying that target-x86_64.cfg appears to be definitions, not
> configuration, and was asking about qemu.cfg (which is configuration).
>
>>> As far as I can tell, the only difference is that -nodefconfig -cpu
>>> westmere will error out instead of working.  But if you don't supply
>>> -cpu westmere, the configuration is identical.
>>
>> What configuration?
>>
>> Let me ask, what do you think the semantics of -nodefconfig should
>> be?  I'm not sure I understand what you're advocating for.
>>
>
> -nodefconfig = create an empty machine, don't assume anything (=don't
> read qemu.cfg) let me build it out of all those lego bricks.  Those can
> be defined in code or in definition files in /usr/share, I don't care.
>
> Maybe that's -nodevices -vga none.  But in this case I don't see the
> point in -nodefconfig.  Not loading target_x86-64.cfg doesn't buy the
> user anything, since it wouldn't affect the guest in any way.

-nodefconfig doesn't mean what you think it means.  -nodefconfig doesn't say 
anything about the user visible machine.

-nodefconfig tells QEMU not to read any configuration files at start up.  This
has an undefined effect on the user visible machine that depends on the specific
version of QEMU.

Regards,

Anthony Liguori
Anthony Liguori March 25, 2012, 3:30 p.m. UTC | #75
On 03/25/2012 10:18 AM, Avi Kivity wrote:
> On 03/25/2012 05:07 PM, Anthony Liguori wrote:
>>> As log as qemu -nodefconfig -cpu westmere -M pc1.1
>>
>>
>> -nodefconfig is going to eventually mean that -cpu westmere and -M
>> pc-1.1 will not work.
>>
>> This is where QEMU is going.  There is no reason that a normal user
>> should ever use -nodefconfig.
>
> I don't think anyone or anything can use it, since it's meaning is not
> well defined.  "not read any configuration files" where parts of qemu
> are continually moved out to configuration files means its a moving target.

I think you assume that all QEMU users care about forward and backwards
compatibility on the command line above all else.

That's really not true.  The libvirt folks have stated repeatedly that command 
line backwards compatibility is not critical to them.  They are happy to require 
that a new version of QEMU requires a new version of libvirt.

I'm not saying that backwards compat isn't important--it is.  But there are 
users who are happy to live on the bleeding edge.

> Suppose we define the southbridge via a configuration file.  Does that
> mean we don't load it any more?

Yes.  If I want the leanest and meanest version of QEMU that will start in the 
smallest number of milliseconds, then being able to tell QEMU not to load 
configuration files and create a very specific machine is a Good Thing.  Why 
exclude users from being able to do this?

Regards,

Anthony Liguori

>
Avi Kivity March 25, 2012, 3:40 p.m. UTC | #76
On 03/25/2012 05:26 PM, Anthony Liguori wrote:
>> Put the emphasis around *configuration*.
>
>
> So how about:
>
> 1) Load ['@SYSCONFDIR@/qemu/qemu.cfg',
> '@SYSCONFDIR@/qemu/target-@ARCH@.cfg',
>          '@DATADIR@/system.cfg', '@DATADIR@/system-@ARCH@.cfg']
>
> 2) system-@ARCH@.cfg will contain:
>
> [system]
> readconfig=@DATADIR@/target-@ARCH@-cpus.cfg
> readconfig=@DATADIR@/target-@ARCH@-machine.cfg
>
> 3) -nodefconfig will not load any configuration files from DATADIR or
> SYSCONFDIR.  -no-user-config will not load any configuration files
> from SYSCONFDIR.

What, more options?

I don't think -nodefconfig (as defined) is usable, since there is no way
for the user to tell what it means short of reading those files.

-no-user-config is usable.  Should it also mean that qemu without
-M/-cpu/-m options will error out, since the default machine/cpu
types are default configuration?

>
>> "#define westmere blah" is not configuration, otherwise the meaning of
>> configuration will drift over time.
>>
>> -cpu blah is, of course.
>
> It's the same mechanism, but the above would create two classes of
> default configuration files and then it becomes a question of how
> they're used.

Confused.

>
>>>> The file defines westmere as an alias for a grab bag of options.
>>>> Whether it's loaded or not is immaterial, unless someone uses one
>>>> of the
>>>> names within.
>>>
>>> But you would agree, a management tool should be able to control
>>> whether class factories get loaded, right?
>>
>> No, why?  But perhaps I don't entirely get what you mean by "class
>> factories".
>>
>> Aren't they just implementations of
>>
>>     virtual Device *new_instance(...) = 0?
>>
>> if so, why not load them?
>
> No, a class factory creates a new type of class.  -cpudef will
> ultimately call type_register() to create a new QOM visible type. 
> From a management tools perspective, the type is no different than a
> built-in type.

Exactly.  The types are no different, so there's no reason to
discriminate against types that happen to live in qemu-provided data
files vs. qemu code.  They aren't instantiated, so we lose nothing by
creating the factories (just so long as the factories aren't
mass-producing objects).
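
To make the point concrete, here is a minimal Python sketch of a type registry in which registering a factory records how to build a type but creates no objects. It is not QOM; `TypeRegistry`, `register`, `type_names` and `new_instance` are hypothetical names, and the feature lists are placeholders.

```python
class TypeRegistry:
    """Toy registry mapping a type name to a factory. Not QEMU's QOM."""

    def __init__(self):
        self._factories = {}
        self.instances_created = 0

    def register(self, name, properties):
        # Registering only records how to build the type; nothing is instantiated.
        self._factories[name] = dict(properties)

    def type_names(self):
        return sorted(self._factories)

    def new_instance(self, name):
        props = self._factories[name]  # KeyError if the type was never registered
        self.instances_created += 1
        return {"type": name, **props}


registry = TypeRegistry()
# Stands in for a type compiled into the binary (placeholder properties).
registry.register("qemu64", {"features": ["sse2"]})
# Stands in for a type read from a shipped data file (placeholder properties).
registry.register("westmere", {"features": ["sse2", "aes"]})

print(registry.type_names())       # both types are listed the same way
print(registry.instances_created)  # 0: registering created no objects
cpu = registry.new_instance("westmere")
```

Once registered, nothing in the registry distinguishes the two types by where their definition came from, and the cost of having an unused one is a dictionary entry.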

>
>>>> Otherwise, the meaning of -nodefconfig changes as more stuff is moved
>>>> out of .c and into .cfg.
>>>
>>> What's the problem with this?
>>
>> The command line becomes unstable if you use -nodefconfig.
>
> -no-user-config solves this but I fully expect libvirt would continue
> to use -nodefconfig.


I don't see how libvirt can use -nodefconfig with the fluid meaning you
attach to it, or what it gains from it.

>>
>> -nodefconfig = create an empty machine, don't assume anything (=don't
>> read qemu.cfg) let me build it out of all those lego bricks.  Those can
>> be defined in code or in definition files in /usr/share, I don't care.
>>
>> Maybe that's -nodevices -vga none.  But in this case I don't see the
>> point in -nodefconfig.  Not loading target_x86-64.cfg doesn't buy the
>> user anything, since it wouldn't affect the guest in any way.
>
>
> -nodefconfig doesn't mean what you think it means.  -nodefconfig
> doesn't say anything about the user visible machine.
>
> -nodefconfig tells QEMU not to read any configuration files at start
> up.  This has an undefined affect on the user visible machine that
> depends on the specific version of QEMU.

Then it's broken.  How can anyone use something that has an undefined
effect?

If I see something like -nodefconfig, I assume it will create a bare
bones guest that will not depend on any qemu defaults and will be stable
across releases.  I don't think anyone will understand -nodefconfig to
be something version dependent without reading the qemu management tool
author's guide.
Avi Kivity March 25, 2012, 3:45 p.m. UTC | #77
On 03/25/2012 05:30 PM, Anthony Liguori wrote:
> On 03/25/2012 10:18 AM, Avi Kivity wrote:
>> On 03/25/2012 05:07 PM, Anthony Liguori wrote:
>>>> As log as qemu -nodefconfig -cpu westmere -M pc1.1
>>>
>>>
>>> -nodefconfig is going to eventually mean that -cpu westmere and -M
>>> pc-1.1 will not work.
>>>
>>> This is where QEMU is going.  There is no reason that a normal user
>>> should ever use -nodefconfig.
>>
>> I don't think anyone or anything can use it, since it's meaning is not
>> well defined.  "not read any configuration files" where parts of qemu
>> are continually moved out to configuration files means its a moving
>> target.
>
> I think you assume that all QEMU users care about forward and
> backwards compatibility on the command line about all else.
>
> That's really not true.  The libvirt folks have stated repeatedly that
> command line backwards compatibility is not critical to them.  They
> are happy to require that a new version of QEMU requires a new version
> of libvirt.

I don't think this came out of happiness, but despair.  Seriously,
keeping compatibility is one of the things we work hardest to achieve,
and we can't manage it for our command line?

>
> I'm not saying that backwards compat isn't important--it is.  But
> there are users who are happy to live on the bleeding edge.

That's fine, but I don't see how -nodefconfig helps them.  All it does
is take away the building blocks (definitions) that they can use when
setting up their configuration.

>
>> Suppose we define the southbridge via a configuration file.  Does that
>> mean we don't load it any more?
>
> Yes.  If I want the leanest and meanest version of QEMU that will
> start in the smallest number of milliseconds, then being able to tell
> QEMU not to load configuration files and create a very specific
> machine is a Good Thing.  Why exclude users from being able to do this?

So is this the point?  Reducing startup time?

I can't say I see the reason to invest so much effort in shaving a
millisecond or less from this, but if we did want to, the way would be
lazy loading of the configuration where items are parsed as they are
referenced.
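
A minimal Python sketch of that lazy approach, assuming the definitions are available as named chunks of unparsed text: a chunk is parsed only the first time its name is looked up. `LazyDefinitions` and the key/value pairs are hypothetical and do not reflect QEMU's config parser.

```python
class LazyDefinitions:
    """Parse a named definition only when it is first referenced."""

    def __init__(self, raw_sections):
        # raw_sections maps a name to its unparsed text.
        self._raw = raw_sections
        self._parsed = {}
        self.parse_count = 0

    def get(self, name):
        if name not in self._parsed:
            self._parsed[name] = self._parse(self._raw[name])
        return self._parsed[name]

    def _parse(self, text):
        self.parse_count += 1
        result = {}
        for line in text.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            result[key.strip()] = value.strip().strip('"')
        return result


# Placeholder definitions; the values are made up for illustration.
defs = LazyDefinitions({
    "westmere": 'vendor = "ExampleVendor"\nlevel = "11"',
    "nehalem": 'vendor = "ExampleVendor"\nlevel = "2"',
})
print(defs.parse_count)      # nothing parsed at startup
westmere = defs.get("westmere")
defs.get("westmere")         # second lookup is served from the cache
print(defs.parse_count)      # only the referenced definition was parsed
```

With this shape, a run that never names a model pays nothing for the definitions, and no option is needed to skip them.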
Avi Kivity March 25, 2012, 4:06 p.m. UTC | #78
On 03/25/2012 03:26 PM, Anthony Liguori wrote:
>>> We would continue to have Westmere/etc in QEMU exposed as part of the
>>> user configuration.  But I don't think it makes a lot of sense to have
>>> to modify QEMU any time a new CPU comes out.
>>
>> We have to.  New features often come with new MSRs which need to be live
>> migrated, and of course the cpu flags as well.  We may push all these to
>> qemu data files, but this is still qemu.  We can't let a management tool
>> decide that cpu feature X is safe to use on qemu version Y.
>
>
> I think QEMU should own CPU definitions.  

Agree.

> I think a management tool should have the choice of whether they are
> used though because they are a policy IMHO.
>
> It's okay for QEMU to implement some degree of policy as long as a
> management tool can override it with a different policy.

Sure.

We can have something like

  # default machine's westmere
  qemu -cpu westmere

  # pc-1.0's westmere
  qemu -M pc-1.0 -cpu westmere

  # pc-1.0's westmere, without nx
  qemu -M pc-1.0 -cpu westmere,-nx

  # specify everything in painful detail
  qemu -cpu vendor=Foo,family=17,model=19,stepping=3,maxleaf=12,+fpu,+vme,leaf10eax=0x1234567,+etc
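
The layering in these examples (a named model, then overrides, down to a fully explicit description) can be sketched as a small Python parser for a -cpu style string. This is an illustration only, not QEMU's parser; `parse_cpu_option` is a hypothetical name, and the fully explicit form is the suggestion from this mail rather than existing syntax.

```python
def parse_cpu_option(spec):
    """Split a -cpu style string into (model, properties, added, removed)."""
    model = None
    props = {}
    added, removed = [], []
    for i, token in enumerate(spec.split(",")):
        if token.startswith("+"):
            added.append(token[1:])        # feature switched on
        elif token.startswith("-"):
            removed.append(token[1:])      # feature switched off
        elif "=" in token:
            key, value = token.split("=", 1)
            props[key] = value             # explicit property
        elif i == 0:
            model = token                  # leading bare word names the model
        else:
            raise ValueError(f"unrecognized token: {token!r}")
    return model, props, added, removed


print(parse_cpu_option("westmere"))
print(parse_cpu_option("westmere,-nx"))
print(parse_cpu_option("vendor=Foo,family=17,+fpu,+vme"))
```

A named model is then just shorthand for a set of properties and features that the explicit form spells out.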
Gleb Natapov March 25, 2012, 4:34 p.m. UTC | #79
On Sun, Mar 25, 2012 at 10:06:03AM -0500, Anthony Liguori wrote:
> On 03/25/2012 09:46 AM, Gleb Natapov wrote:
> >On Sun, Mar 25, 2012 at 08:09:37AM -0500, Anthony Liguori wrote:
> >>On 03/25/2012 05:19 AM, Gleb Natapov wrote:
> >>It's the Unix Philosophy:
> >>
> >>"Rule of Representation: Fold knowledge into data so program logic
> >>can be stupid and robust."
> >>
> >>If it can be reasonably represented as data, it should be.  If that
> >>data can be pushed to a flat text file, it should be.  If you can
> >>avoid making that special, you should.  This keeps your core logic
> >>simpler, empowers the user, and creates greater flexibility long
> >>term.
> >>
> >So you are making my point. You should be able to move data outside of
> >you code without it becoming user configurable file.
> 
> You're reading words that don't exist.
> 
I am pointing you to words that do not exist because you are implying that
they somehow follow from the words that you wrote. They do not.

> >>Your whole argument seems to boil down to: I don't like this--but
> >>you aren't providing any concrete problems.  It doesn't make it
> >>harder to write a management tool, it's completely invisible to a
> >>user, and we have total control over the data files if they're
> >>stored in /usr/share.
> >>
> >I don't like what?
> 
> User configuration apparently.
I think that not everything should be user configuration, yes.

> 
> >Jugging by above two paragraph I am not so sure you
> >know. I am for moving cpu model definitions into separate file and putting
> >it into /usr/share. I am against QEMU not loading it.
> 
> Why are you trying to prevent a user from being able to control what QEMU does?
> 
Because I want QEMU to behave in a deterministic manner. It means no devices
should disappear.

> >The reason I am
> >against it is because the file is not part of a machine configuration
> >and does not stands by it's own.
> 
> This is not a concrete argument.  It assumes that there's an agreed
> upon concept of "machine configuration" and "stands by it's own"
> which there obviously isn't.
> 
> What is the concrete technical or use-case argument here beyond that
> it doesn't match a concept that you have in your head of how things
> should be?
> 
If a user tells me that he has -device virtio-net on the command line and
he uses qemu-1.0, I know exactly how this device works. Even if there is
-nodefconfig (whatever this thing does) on the command line. The same
should be true for -cpu nehalem.

> >It depends on combination of QEMU/KVM
> >and machine definition. You said in this thread that CPU types should be
> >treated like regular devices by machine type mechanism i.e machine types
> >should have list of properties for each cpu model which are different
> >from default.  I do agree with that but how is it going to work if you
> >do not event have standard model definitions that you can rely on.
> 
> Who is "you"?
"You" is you here:
http://lists.gnu.org/archive/html/qemu-devel/2012-03/msg02077.html

>                QEMU will provide a list of models in /usr/share that
> are loaded by default.  If you actively disable it by using
> -nodefconfig, you're on your own.  I would personally never use
> -nodefconfig.  The only user of -nodefconfig is a management tool
> that is purposefully trying to make QEMU do the minimalistic amount
> of things possible.
> 
> I'm not sympathetic to arguments that user's are stupid and you have
> to keep them from doing things they shouldn't.  Defaults should Just
> Work and simple things should be simple to do.  But if a user
> expressly tells QEMU not to enable defaults, then they should know
> what they're doing.
> 
I personally do not like the idea of letting libvirt control such low
levels of QEMU as cpu models, but if you want to let them shoot
themselves in the foot we can compromise by allowing a cpu model
specification file as a parameter to -cpu (-cpu /whatever/cpu-model-file).
This way libvirt will not be able to specify its own Nehalem cpu type by
mistake. But if/when we extend the cpu model definition file format to
contain all information about all cpuid leafs I doubt they will want to
do that.

> >>So what's your concrete concern here?  Random comments about kvm
> >>tool or Gnome 3 are not concrete concerns.  What use-case do you
> >>think is impacted here and why (and please be specific)?
> >That are comment about QEMU usability. You do not consider that
> >important?
> >
> >>
> >>http://en.wikipedia.org/wiki/Unix_philosophy
> >>
> >Nothing there supports your design. Actually I think it contradicts at
> >least this:
> >  Rule of Clarity: Clarity is better than cleverness
> >You try to be clever, but in the end nobody expects CPU models to
> >disappear just because you asked QEMU to not create default machine.
> 
> It's not clever to me, it's obvious.
Not for me, and reading your discussion with Avi and looking at the
subject line of the thread I tend to think you are the only one for
whom it is obvious.

> 
> >And you still didn't answer what is your view on current state of
> >affairs where cpu models in .c files are present while those in separate
> >file disappear?
> 
> This is strictly a compatibility issue.  At this point in time, we
> could move the .c definitions to a configuration file as we've gone
> through enough releases with the default configuration file present.
> 
You do not need to move them into a config file to make them disappear
with -nodefconfig. Would you accept a patch?

> >So you view it as a bug and is going to make those in
> >.c files disappear to ?
> 
> Absolutely.
> 
At least you are consistent :)

--
			Gleb.
Anthony Liguori March 25, 2012, 6:01 p.m. UTC | #80
On 03/25/2012 10:45 AM, Avi Kivity wrote:
> On 03/25/2012 05:30 PM, Anthony Liguori wrote:
>> On 03/25/2012 10:18 AM, Avi Kivity wrote:
>>> On 03/25/2012 05:07 PM, Anthony Liguori wrote:
>>>>> As log as qemu -nodefconfig -cpu westmere -M pc1.1
>>>>
>>>>
>>>> -nodefconfig is going to eventually mean that -cpu westmere and -M
>>>> pc-1.1 will not work.
>>>>
>>>> This is where QEMU is going.  There is no reason that a normal user
>>>> should ever use -nodefconfig.
>>>
>>> I don't think anyone or anything can use it, since it's meaning is not
>>> well defined.  "not read any configuration files" where parts of qemu
>>> are continually moved out to configuration files means its a moving
>>> target.
>>
>> I think you assume that all QEMU users care about forward and
>> backwards compatibility on the command line about all else.
>>
>> That's really not true.  The libvirt folks have stated repeatedly that
>> command line backwards compatibility is not critical to them.  They
>> are happy to require that a new version of QEMU requires a new version
>> of libvirt.
>
> I don't think this came out of happiness, but despair.  Seriously,
> keeping compatibility is one of the things we work hardest to achieve,
> and we can't manage it for our command line?

I hate to burst your bubble, but we struggle and rarely maintain the level of 
compatibility you're seeking to have.

I agree with you that we need to do a better job maintaining compatibility which 
is why I'm trying to clearly separate the things that we will never break from 
the things that will change over time.

-nodefconfig is a moving target.  If you want stability, don't use it.  If you 
just want to prevent the user's /etc/qemu stuff from being loaded, use 
-no-user-config.

>>
>> I'm not saying that backwards compat isn't important--it is.  But
>> there are users who are happy to live on the bleeding edge.
>
> That's fine, but I don't see how -nodefconfig helps them.  All it does
> is take away the building blocks (definitions) that they can use when
> setting up their configuration.

Yes, this is a feature.

>>> Suppose we define the southbridge via a configuration file.  Does that
>>> mean we don't load it any more?
>>
>> Yes.  If I want the leanest and meanest version of QEMU that will
>> start in the smallest number of milliseconds, then being able to tell
>> QEMU not to load configuration files and create a very specific
>> machine is a Good Thing.  Why exclude users from being able to do this?
>
> So is this the point?  Reducing startup time?

Yes, that's one reason.  But maybe a user wants to have a whole different set of 
machine types and doesn't care to have the ones we provide.  Why prevent a user 
from doing this?

Maybe they have a management tool that attempts to totally hide QEMU from the 
end user and exposes a different set of machine types.  It's certainly more 
convenient for something like the Android emulator to only have to deal with 
QEMU knowing about the 4 types of machines that it specifically supports.

Regards,

Anthony Liguori
Avi Kivity March 25, 2012, 6:09 p.m. UTC | #81
On 03/25/2012 08:01 PM, Anthony Liguori wrote:
>> I don't think this came out of happiness, but despair.  Seriously,
>> keeping compatibility is one of the things we work hardest to achieve,
>> and we can't manage it for our command line?
>
>
> I hate to burst your bubble, but we struggle and rarely maintain the
> level of compatibility you're seeking to have.
>
> I agree with you that we need to do a better job maintaining
> compatibility which is why I'm trying to clearly separate the things
> that we will never break from the things that will change over time.
>
> -nodefconfig is a moving target.  If you want stability, don't use
> it.  If you just want to prevent the user's /etc/qemu stuff from being
> loaded, use -no-user-config.

Fine, but let's clearly document it as such.

Note that just saying it doesn't load any configuration files isn't
sufficient.  We have to say that it kills Westmere and some of its
friends, but preserves others like qemu64.  Otherwise it's impossible to
use it except by trial and error.

>
>>>
>>> I'm not saying that backwards compat isn't important--it is.  But
>>> there are users who are happy to live on the bleeding edge.
>>
>> That's fine, but I don't see how -nodefconfig helps them.  All it does
>> is take away the building blocks (definitions) that they can use when
>> setting up their configuration.
>
> Yes, this is a feature.

I don't see how, but okay.

>
>>>> Suppose we define the southbridge via a configuration file.  Does that
>>>> mean we don't load it any more?
>>>
>>> Yes.  If I want the leanest and meanest version of QEMU that will
>>> start in the smallest number of milliseconds, then being able to tell
>>> QEMU not to load configuration files and create a very specific
>>> machine is a Good Thing.  Why exclude users from being able to do this?
>>
>> So is this the point?  Reducing startup time?
>
> Yes, that's one reason.  But maybe a user wants to have a whole
> different set of machine types and doesn't care to have the ones we
> provide.  Why prevent a user from doing this?

How are we preventing a user from doing it?  In what way is -nodefconfig
helping it?

> Maybe they have a management tool that attempts to totally hide QEMU
> from the end user and exposes a different set of machine types.  It's
> certainly more convenient for something like the Android emulator to
> only have to deal with QEMU knowing about the 4 types of machines that
> it specifically supports.

If it supports four types, it should always pass one of them to qemu. 
The only thing -nodefconfig adds is breakage when qemu moves something
that one of those four machines relies on to a config file.
Anthony Liguori March 25, 2012, 6:11 p.m. UTC | #82
On 03/25/2012 10:40 AM, Avi Kivity wrote:
> On 03/25/2012 05:26 PM, Anthony Liguori wrote:
>>> Put the emphasis around *configuration*.
>>
>>
>> So how about:
>>
>> 1) Load ['@SYSCONFDIR@/qemu/qemu.cfg',
>> '@SYSCONFDIR@/qemu/target-@ARCH@.cfg',
>>           '@DATADIR@/system.cfg', '@DATADIR@/system-@ARCH@.cfg']
>>
>> 2) system-@ARCH@.cfg will contain:
>>
>> [system]
>> readconfig=@DATADIR@/target-@ARCH@-cpus.cfg
>> readconfig=@DATADIR@/target-@ARCH@-machine.cfg
>>
>> 3) -nodefconfig will not load any configuration files from DATADIR or
>> SYSCONFDIR.  -no-user-config will not load any configuration files
>> from SYSCONFDIR.
>
> What, more options?

Okay, we can just drop -no-user-config and then if a management tool wants to do 
the equivalent, they can do -nodefconfig + '-readconfig 
@DATADIR@/system-@ARCH@.cfg'.  I'm equally happy with that :-)

> I don't think -nodefconfig (as defined) is usable, since there is no way
> for the user to tell what it means short of reading those files.

*if the user doesn't know specifics about this QEMU version.

You make the assumption that all users are going to throw arbitrary options at 
arbitrary QEMU versions.  That's certainly an important use-case but it's not 
the only one.

> -no-user-config is usable, I think it needs also to mean that qemu
> without -M/-cpu/-m options will error out?

You're confusing -nodefaults (or something stronger than -nodefaults) with 
-no-user-config.

Yes, the distinctions are confusing.  It's not all fixable tomorrow.  If we take
my config refactoring series, we can get 90% of the way there soon but Paolo has
a more thorough refactoring.

>>> "#define westmere blah" is not configuration, otherwise the meaning of
>>> configuration will drift over time.
>>>
>>> -cpu blah is, of course.
>>
>> It's the same mechanism, but the above would create two classes of
>> default configuration files and then it becomes a question of how
>> they're used.
>
> Confused.

We don't have a formal concept of -read-definition-config and
-read-configuration-config.

There's no easy or obvious way to create such a concept either, nor do I think
the distinction is meaningful to users.

>>>>> The file defines westmere as an alias for a grab bag of options.
>>>>> Whether it's loaded or not is immaterial, unless someone uses one
>>>>> of the
>>>>> names within.
>>>>
>>>> But you would agree, a management tool should be able to control
>>>> whether class factories get loaded, right?
>>>
>>> No, why?  But perhaps I don't entirely get what you mean by "class
>>> factories".
>>>
>>> Aren't they just implementations of
>>>
>>>      virtual Device *new_instance(...) = 0?
>>>
>>> if so, why not load them?
>>
>> No, a class factory creates a new type of class.  -cpudef will
>> ultimately call type_register() to create a new QOM visible type.
>>  From a management tools perspective, the type is no different than a
>> built-in type.
>
> Exactly.  The types are no different, so there's no reason to
> discriminate against types that happen to live in qemu-provided data
> files vs. qemu code.  They aren't instantiated, so we lose nothing by
> creating the factories (just so long as the factories aren't
> mass-producing objects).

At some point, I'd like to have type modules that are shared objects.  I'd like 
QEMU to start with almost no builtin types and allow the user to configure which 
modules get loaded.

In the long term, I'd like QEMU to be a small, robust core with the vast 
majority of code relegated to modules with the user ultimately in control of 
module loading.

Yes, I'd want some module autoloading system but there should always be a way to 
launch QEMU without loading any modules and then load a very specific set of 
modules (as defined by the user).

You can imagine this being useful for something like Common Criteria certifications.

>>>>> Otherwise, the meaning of -nodefconfig changes as more stuff is moved
>>>>> out of .c and into .cfg.
>>>>
>>>> What's the problem with this?
>>>
>>> The command line becomes unstable if you use -nodefconfig.
>>
>> -no-user-config solves this but I fully expect libvirt would continue
>> to use -nodefconfig.
>
>
> I don't see how libvirt can use -nodefconfig with the fluid meaning you
> attach to it, or what it gains from it.
>
>>>
>>> -nodefconfig = create an empty machine, don't assume anything (=don't
>>> read qemu.cfg) let me build it out of all those lego bricks.  Those can
>>> be defined in code or in definition files in /usr/share, I don't care.
>>>
>>> Maybe that's -nodevices -vga none.  But in this case I don't see the
>>> point in -nodefconfig.  Not loading target_x86-64.cfg doesn't buy the
>>> user anything, since it wouldn't affect the guest in any way.
>>
>>
>> -nodefconfig doesn't mean what you think it means.  -nodefconfig
>> doesn't say anything about the user visible machine.
>>
>> -nodefconfig tells QEMU not to read any configuration files at start
>> up.  This has an undefined affect on the user visible machine that
>> depends on the specific version of QEMU.
>
> Then it's broken.  How can anyone use something that has an undefined
> effect?

It's obviously defined for a given release, just not defined long term.

> If I see something like -nodefconfig, I assume it will create a bare
> bones guest that will not depend on any qemu defaults and will be stable
> across releases.

That's not even close to what -nodefconfig is.  That's pretty much what 
-nodefaults is but -nodefaults has also had a fluid definition historically.

Regards,

Anthony Liguori

> I don't think anyone will understand -nodefconfig to
> be something version dependent without reading the qemu management tool
> author's guide.
>
Avi Kivity March 26, 2012, 9:08 a.m. UTC | #83
On 03/25/2012 08:11 PM, Anthony Liguori wrote:
>
>> I don't think -nodefconfig (as defined) is usable, since there is no way
>> for the user to tell what it means short of reading those files.
>
> *if the user doesn't know specifics about this QEMU version.
>
> You make the assumption that all users are going to throw arbitrary
> options at arbitrary QEMU versions.  That's certainly an important
> use-case but it's not the only one.

If a Fedora user is using qemu, then their qemu version will change
every six months.  Their options are to update their scripts/management
tool in step, or not have their management tool use -nodefconfig.

The same holds for anyone using qemu from upstream, since that's
approximately the qemu release cycle.

>
>> -no-user-config is usable, I think it needs also to mean that qemu
>> without -M/-cpu/-m options will error out?
>
> You're confusing -nodefaults (or something stronger than -nodefaults)
> with -no-user-config.
>

Right.

> Yes, the distinctions are confusing.  It's not all fixable tomorrow. 
> If we take my config refactoring series, we can get 90% of the way
> there soon but Paolo has a more thorough refactoring..
>
>>>> "#define westmere blah" is not configuration, otherwise the meaning of
>>>> configuration will drift over time.
>>>>
>>>> -cpu blah is, of course.
>>>
>>> It's the same mechanism, but the above would create two classes of
>>> default configuration files and then it becomes a question of how
>>> they're used.
>>
>> Confused.
>
> We don't have a formal concept of -read-definition-config and
> -read-configuration-config
>
> There's no easy or obvious way to create such a concept either nor do
> I think the distinction is meaningful to users.

Definition files should be invisible to users.  They're part of the
implementation.  If we have a file that says

  pc-1.1 = piix + cirrus + memory(128) + ...

then it's nobody's business if it's in a text file or a .c file.

Of course it's  nice to allow users to load their own definition files,
but that's strictly a convenience.

>> Exactly.  The types are no different, so there's no reason to
>> discriminate against types that happen to live in qemu-provided data
>> files vs. qemu code.  They aren't instantiated, so we lose nothing by
>> creating the factories (just so long as the factories aren't
>> mass-producing objects).
>
>
> At some point, I'd like to have type modules that are shared objects. 
> I'd like QEMU to start with almost no builtin types and allow the user
> to configure which modules get loaded.
>
> In the long term, I'd like QEMU to be a small, robust core with the
> vast majority of code relegated to modules with the user ultimately in
> control of module loading.
>
> Yes, I'd want some module autoloading system but there should always
> be a way to launch QEMU without loading any modules and then load a
> very specific set of modules (as defined by the user).
>
> You can imagine this being useful for something like Common Criteria
> certifications.

Okay.

> It's obviously defined for a given release, just not defined long term.
>
>> If I see something like -nodefconfig, I assume it will create a bare
>> bones guest that will not depend on any qemu defaults and will be stable
>> across releases.
>
> That's not even close to what -nodefconfig is.  That's pretty much
> what -nodefaults is but -nodefaults has also had a fluid definition
> historically.

Okay.  Let's just make sure to document -nodefconfig as version specific
and -nodefaults as the stable way to create a bare bones guest (and
define exactly what that means).
Gleb Natapov March 26, 2012, 9:53 a.m. UTC | #84
On Mon, Mar 26, 2012 at 11:08:16AM +0200, Avi Kivity wrote:
> >> Exactly.  The types are no different, so there's no reason to
> >> discriminate against types that happen to live in qemu-provided data
> >> files vs. qemu code.  They aren't instantiated, so we lose nothing by
> >> creating the factories (just so long as the factories aren't
> >> mass-producing objects).
> >
> >
> > At some point, I'd like to have type modules that are shared objects. 
> > I'd like QEMU to start with almost no builtin types and allow the user
> > to configure which modules get loaded.
> >
> > In the long term, I'd like QEMU to be a small, robust core with the
> > vast majority of code relegated to modules with the user ultimately in
> > control of module loading.
> >
> > Yes, I'd want some module autoloading system but there should always
> > be a way to launch QEMU without loading any modules and then load a
> > very specific set of modules (as defined by the user).
> >
> > You can imagine this being useful for something like Common Criteria
> > certifications.
> 
> Okay.
> 
Modularised minimal QEMU may be a good thing, but how does -nodefconfig help
here? Won't you have the same effect if QEMU loads modules on demand,
only when they are actually needed (regardless of -nodefconfig)? I.e.
virtio-blk is loaded only if there is -device virtio-blk somewhere in the
configuration.

> > It's obviously defined for a given release, just not defined long term.
> >
> >> If I see something like -nodefconfig, I assume it will create a bare
> >> bones guest that will not depend on any qemu defaults and will be stable
> >> across releases.
> >
> > That's not even close to what -nodefconfig is.  That's pretty much
> > what -nodefaults is but -nodefaults has also had a fluid definition
> > historically.
> 
> Okay.  Let's just make sure to document -nodefconfig as version specific
> and -nodefaults as the stable way to create a bare bones guest (and
> define exactly what that means).
> 
What is the reason libvirt uses -nodefconfig instead of -nodefaults now?
What does the former do for them that the latter doesn't?

--
			Gleb.
Jiri Denemark March 26, 2012, 11:24 a.m. UTC | #85
On Sun, Mar 25, 2012 at 10:26:57 -0500, Anthony Liguori wrote:
> On 03/25/2012 10:16 AM, Avi Kivity wrote:
> > On 03/25/2012 04:59 PM, Anthony Liguori wrote:
> So how about:
> 
> 1) Load ['@SYSCONFDIR@/qemu/qemu.cfg', '@SYSCONFDIR@/qemu/target-@ARCH@.cfg',
>           '@DATADIR@/system.cfg', '@DATADIR@/system-@ARCH@.cfg']
> 
> 2) system-@ARCH@.cfg will contain:
> 
> [system]
> readconfig=@DATADIR@/target-@ARCH@-cpus.cfg
> readconfig=@DATADIR@/target-@ARCH@-machine.cfg
> 
> 3) -nodefconfig will not load any configuration files from DATADIR or 
> SYSCONFDIR.  -no-user-config will not load any configuration files from SYSCONFDIR.
> 
...
> > The command line becomes unstable if you use -nodefconfig.
> 
> -no-user-config solves this but I fully expect libvirt would continue to use 
> -nodefconfig.

Libvirt uses -nodefaults -nodefconfig because it wants to fully control what
the virtual machine will look like (mainly in terms of devices). In other
words, we don't want any devices to just magically appear without libvirt
knowing about them. -nodefaults gets rid of default devices that are built
directly in qemu. Since users can set any devices or command line options
(such as enable-kvm) into qemu configuration files in @SYSCONFDIR@, we need to
avoid reading those files as well. Hence we use -nodefconfig. However, we
would still like qemu to read CPU definitions, machine types, etc. once they
become externally loaded configuration (or however we decide to call it). That
said, when CPU definitions are moved into @DATADIR@, and -no-user-config is
introduced, I don't see any reason for libvirt to keep using -nodefconfig.

I actually like
-no-user-config
more than
-nodefconfig -readconfig @DATADIR@/...
since it would avoid additional magic to detect what files libvirt should
explicitly pass to -readconfig, but basically any approach that would allow us
to read files only from @DATADIR@ is much better than what we have with
-nodefconfig now.
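
For reference, the semantics proposed in the text quoted above can be sketched
as a small shell function. This is only a model of that proposal, not QEMU
code: the function name default_config_files and its mode argument are made up
for illustration, and the directories passed in stand for @SYSCONFDIR@ and
@DATADIR@.

```shell
# Model of the proposal quoted above; not actual QEMU code.
# default_config_files is a hypothetical helper: it prints the default
# config files qemu would load for a given mode, one per line.
#   $1 = mode: default | no-user-config | nodefconfig
#   $2 = sysconfdir (stand-in for @SYSCONFDIR@)
#   $3 = datadir    (stand-in for @DATADIR@)
#   $4 = arch       (stand-in for @ARCH@)
default_config_files() {
    mode=$1 sysconfdir=$2 datadir=$3 arch=$4

    case "$mode" in
    nodefconfig)
        # Nothing from SYSCONFDIR or DATADIR is read.
        return 0
        ;;
    default)
        # User-editable files are only read in the default mode.
        echo "$sysconfdir/qemu/qemu.cfg"
        echo "$sysconfdir/qemu/target-$arch.cfg"
        ;;
    no-user-config)
        # Skip the files from SYSCONFDIR, keep the ones from DATADIR.
        ;;
    *)
        echo "unknown mode: $mode" >&2
        return 1
        ;;
    esac

    # Files shipped with qemu (CPU models, machine types, ...).
    echo "$datadir/system.cfg"
    echo "$datadir/system-$arch.cfg"
}
```

In this model, "default_config_files no-user-config /etc /usr/share/qemu x86_64"
prints only the two files under /usr/share/qemu, which is the set libvirt is
interested in.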

Jirka
Avi Kivity March 26, 2012, 11:59 a.m. UTC | #86
On 03/26/2012 01:24 PM, Jiri Denemark wrote:
> ...
> > > The command line becomes unstable if you use -nodefconfig.
> > 
> > -no-user-config solves this but I fully expect libvirt would continue to use 
> > -nodefconfig.
>
> Libvirt uses -nodefaults -nodefconfig because it wants to fully control how
> the virtual machine will look like (mainly in terms of devices). In other
> words, we don't want any devices to just magically appear without libvirt
> knowing about them. -nodefaults gets rid of default devices that are built
> directly in qemu. Since users can set any devices or command line options
> (such as enable-kvm) into qemu configuration files in @SYSCONFDIR@, we need to
> avoid reading those files as well. Hence we use -nodefconfig. However, we
> would still like qemu to read CPU definitions, machine types, etc. once they
> become externally loaded configuration (or however we decide to call it). That
> said, when CPU definitions are moved into @DATADIR@, and -no-user-config is
> introduced, I don't see any reason for libvirt to keep using -nodefconfig.
>
> I actually like
> -no-user-config
> more than
> -nodefconfig -readconfig @DATADIR@/...
> since it would avoid additional magic to detect what files libvirt should
> explicitly pass to -readconfig but basically any approach that would allow us
> to do read files only from @DATADIR@ is much better than what we have with
> -nodefconfig now.

That's how I see it as well.
Gleb Natapov March 26, 2012, 12:03 p.m. UTC | #87
On Mon, Mar 26, 2012 at 01:59:05PM +0200, Avi Kivity wrote:
> On 03/26/2012 01:24 PM, Jiri Denemark wrote:
> > ...
> > > > The command line becomes unstable if you use -nodefconfig.
> > > 
> > > -no-user-config solves this but I fully expect libvirt would continue to use 
> > > -nodefconfig.
> >
> > Libvirt uses -nodefaults -nodefconfig because it wants to fully control how
> > the virtual machine will look like (mainly in terms of devices). In other
> > words, we don't want any devices to just magically appear without libvirt
> > knowing about them. -nodefaults gets rid of default devices that are built
> > directly in qemu. Since users can set any devices or command line options
> > (such as enable-kvm) into qemu configuration files in @SYSCONFDIR@, we need to
> > avoid reading those files as well. Hence we use -nodefconfig. However, we
> > would still like qemu to read CPU definitions, machine types, etc. once they
> > become externally loaded configuration (or however we decide to call it). That
> > said, when CPU definitions are moved into @DATADIR@, and -no-user-config is
> > introduced, I don't see any reason for libvirt to keep using -nodefconfig.
> >
> > I actually like
> > -no-user-config
> > more than
> > -nodefconfig -readconfig @DATADIR@/...
> > since it would avoid additional magic to detect what files libvirt should
> > explicitly pass to -readconfig but basically any approach that would allow us
> > to do read files only from @DATADIR@ is much better than what we have with
> > -nodefconfig now.
> 
> That's how I see it as well.
> 
+1

except that instead of -no-user-config we can do what most other
programs do: if a config file is specified during invocation, the default one
is not used. After implementing -no-user-config (or similar) we can drop
-nodefconfig entirely, since its only user will be gone and its semantics
are not clear.
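
Roughly like this (a sketch in shell only; pick_config_files is a made-up
name, and the extra arguments stand for whatever config files the user names
explicitly on the command line):

```shell
# Sketch of the "explicit config replaces the default one" convention.
# pick_config_files is a hypothetical helper, not a QEMU function.
#   $1     = default config file
#   $2...  = config files named explicitly on the command line, if any
pick_config_files() {
    default_cfg=$1
    shift

    if [ "$#" -eq 0 ]; then
        # No explicit config: fall back to the default one.
        echo "$default_cfg"
    else
        # Explicit config given: the default one is skipped.
        for cfg in "$@"; do
            echo "$cfg"
        done
    fi
}
```

A management tool that always passes its own file would then never see the
default one, without needing a separate option.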

--
			Gleb.
Eduardo Habkost March 26, 2012, 4 p.m. UTC | #88
On Sun, Mar 25, 2012 at 12:19:13PM +0200, Gleb Natapov wrote:
> > (That's why I said that perhaps keymaps could become configuration
> > someday. Because maybe they can be converted to a key=value model
> > relatively easily)
> > 
> Such whole sale approach is harmful since it starts to affect design
> decisions. So now if it seams logical to move something outside the code
> one can decide against it just because it will become "configuration"
> due to that design.

This is one point I agree completely with.

There's no reason to couple the decision to "move something to an
external file" with making changes to the Qemu external interfaces.

We should be able to gradually move things to be "data" without breaking
the expectations of libvirt at the same time. We must be able to make a
gradual design change, where we first move data to an external file
without making that change visible to the outside, then slowly try to
make it part of the user-visible configuration (that means making
externally visible interface changes, something that has to be made much
more carefully than just moving "internal data" around). Anthony seems
to be simply rejecting the possibility of doing this gradual change[1].

[1] I still need to read this whole thread, so sorry if I am wrong.
Eduardo Habkost March 26, 2012, 4:14 p.m. UTC | #89
On Mon, Mar 26, 2012 at 02:03:21PM +0200, Gleb Natapov wrote:
> On Mon, Mar 26, 2012 at 01:59:05PM +0200, Avi Kivity wrote:
> > On 03/26/2012 01:24 PM, Jiri Denemark wrote:
> > > ...
> > > > > The command line becomes unstable if you use -nodefconfig.
> > > > 
> > > > -no-user-config solves this but I fully expect libvirt would continue to use 
> > > > -nodefconfig.
> > >
> > > Libvirt uses -nodefaults -nodefconfig because it wants to fully control how
> > > the virtual machine will look like (mainly in terms of devices). In other
> > > words, we don't want any devices to just magically appear without libvirt
> > > knowing about them. -nodefaults gets rid of default devices that are built
> > > directly in qemu. Since users can set any devices or command line options
> > > (such as enable-kvm) into qemu configuration files in @SYSCONFDIR@, we need to
> > > avoid reading those files as well. Hence we use -nodefconfig. However, we
> > > would still like qemu to read CPU definitions, machine types, etc. once they
> > > become externally loaded configuration (or however we decide to call it). That
> > > said, when CPU definitions are moved into @DATADIR@, and -no-user-config is
> > > introduced, I don't see any reason for libvirt to keep using -nodefconfig.

ACK.

> > >
> > > I actually like
> > > -no-user-config
> > > more than
> > > -nodefconfig -readconfig @DATADIR@/...
> > > since it would avoid additional magic to detect what files libvirt should
> > > explicitly pass to -readconfig but basically any approach that would allow us
> > > to do read files only from @DATADIR@ is much better than what we have with
> > > -nodefconfig now.
> > 
> > That's how I see it as well.
> > 
> +1
> 
> except that instead of -no-user-config we can do what most other
> programs do. If config file is specified during invocation default one
> is not used. After implementing -no-user-config (or similar) we can drop
> -nodefconfig entirely since its only user will be gone it its semantics
> is not clear.

Awesome. It looks like we have a solution now? Anthony, do you agree
with that? Daniel, does it look good to you?

It looks like in the end, no one will ever use -nodefconfig because it's
optimizing for a use-case that nobody cares about. Maybe I'm wrong and
somebody somewhere will use -nodefconfig, maybe Anthony has a specific
use-case or specific tools in mind, I don't know. But personally I will
probably simply ignore the existence of -nodefconfig because it is
absolutely useless for me and for libvirt.
Eduardo Habkost March 26, 2012, 4:34 p.m. UTC | #90
On Sun, Mar 25, 2012 at 01:11:04PM -0500, Anthony Liguori wrote:
> On 03/25/2012 10:40 AM, Avi Kivity wrote:
> >On 03/25/2012 05:26 PM, Anthony Liguori wrote:
> >>>Put the emphasis around *configuration*.
> >>
> >>
> >>So how about:
> >>
> >>1) Load ['@SYSCONFDIR@/qemu/qemu.cfg',
> >>'@SYSCONFDIR@/qemu/target-@ARCH@.cfg',
> >>          '@DATADIR@/system.cfg', '@DATADIR@/system-@ARCH@.cfg']
> >>
> >>2) system-@ARCH@.cfg will contain:
> >>
> >>[system]
> >>readconfig=@DATADIR@/target-@ARCH@-cpus.cfg
> >>readconfig=@DATADIR@/target-@ARCH@-machine.cfg
> >>
> >>3) -nodefconfig will not load any configuration files from DATADIR or
> >>SYSCONFDIR.  -no-user-config will not load any configuration files
> >>from SYSCONFDIR.
> >
> >What, more options?
> 
> Okay, we can just drop -no-user-config and then if a management tool
> wants to do the equivalent, they can do -nodefconfig + '-readconfig
> @DATADIR@/system-@ARCH@.cfg'.  I'm equally happy with that :-)

I actually prefer -no-user-config, because it gives Qemu freedom to add
more stuff to the outside if needed, but without requiring more
-readconfig options (or -read-something-else options, if we create them)
to be added in the future.

For example, if one day we move machine-types to external files, libvirt
wouldn't have to be changed to add Yet Another -readconfig argument to
make the machine-types available for use. If Qemu moves some device
implementation to external modules, they won't suddenly go away for
users of -no-user-config. The list of possible changes that would break
compatibility for -nodefconfig users but not for -no-user-config users is
large.

[...]
> >I don't think -nodefconfig (as defined) is usable, since there is no way
> >for the user to tell what it means short of reading those files.
> 
> *if the user doesn't know specifics about this QEMU version.
> 
> You make the assumption that all users are going to throw arbitrary
> options at arbitrary QEMU versions.  That's certainly an important
> use-case but it's not the only one.

Well, you make the assumption that somebody will ever want to use
-nodefconfig the way you want to define it. I don't think anybody will
ever use it if we define it that way, but that's OK with me. I will
simply ignore the existence of -nodefconfig from now on.  :-)

[...]
> >>>"#define westmere blah" is not configuration, otherwise the meaning of
> >>>configuration will drift over time.
> >>>
> >>>-cpu blah is, of course.
> >>
> >>It's the same mechanism, but the above would create two classes of
> >>default configuration files and then it becomes a question of how
> >>they're used.
> >
> >Confused.
> 
> We don't have a formal concept of -read-definition-config and
> -read-configuration-config
> 
> There's no easy or obvious way to create such a concept either nor do
> I think the distinction is meaningful to users.

The distinction _is_ meaningful to libvirt, that's what started this
thread.

[...]
> >>>>>The file defines westmere as an alias for a grab bag of options.
> >>>>>Whether it's loaded or not is immaterial, unless someone uses one
> >>>>>of the
> >>>>>names within.
> >>>>
> >>>>But you would agree, a management tool should be able to control
> >>>>whether class factories get loaded, right?
> >>>
> >>>No, why?  But perhaps I don't entirely get what you mean by "class
> >>>factories".
> >>>
> >>>Aren't they just implementations of
> >>>
> >>>     virtual Device *new_instance(...) = 0?
> >>>
> >>>if so, why not load them?
> >>
> >>No, a class factory creates a new type of class.  -cpudef will
> >>ultimately call type_register() to create a new QOM visible type.
> >> From a management tools perspective, the type is no different than a
> >>built-in type.
> >
> >Exactly.  The types are no different, so there's no reason to
> >discriminate against types that happen to live in qemu-provided data
> >files vs. qemu code.  They aren't instantiated, so we lose nothing by
> >creating the factories (just so long as the factories aren't
> >mass-producing objects).
> 
> At some point, I'd like to have type modules that are shared objects.
> I'd like QEMU to start with almost no builtin types and allow the
> user to configure which modules get loaded.
> 
> In the long term, I'd like QEMU to be a small, robust core with the
> vast majority of code relegated to modules with the user ultimately
> in control of module loading.
> 
> Yes, I'd want some module autoloading system but there should always
> be a way to launch QEMU without loading any modules and then load a
> very specific set of modules (as defined by the user).

And libvirt needs a way to keep module autoloading enabled while
disabling the loading of files from /etc.

> 
> You can imagine this being useful for something like Common Criteria certifications.

No problem, except that this is not the use-case libvirt has. If you
want -nodefconfig to mean that, that's OK. But we need an option that just
disables the loading of files from /etc while still loading the "default
non-user-configurable modules that usually are available" (be it CPU
models, machine-types, external modules, whatever), and that doesn't keep
changing meaning on every minor release.

[...]
> >>>-nodefconfig = create an empty machine, don't assume anything (=don't
> >>>read qemu.cfg) let me build it out of all those lego bricks.  Those can
> >>>be defined in code or in definition files in /usr/share, I don't care.
> >>>
> >>>Maybe that's -nodevices -vga none.  But in this case I don't see the
> >>>point in -nodefconfig.  Not loading target_x86-64.cfg doesn't buy the
> >>>user anything, since it wouldn't affect the guest in any way.
> >>
> >>
> >>-nodefconfig doesn't mean what you think it means.  -nodefconfig
> >>doesn't say anything about the user visible machine.
> >>
> >>-nodefconfig tells QEMU not to read any configuration files at start
> >>up.  This has an undefined affect on the user visible machine that
> >>depends on the specific version of QEMU.
> >
> >Then it's broken.  How can anyone use something that has an undefined
> >effect?
> 
> It's obviously defined for a given release, just not defined long term.

Then it's not usable by libvirt.

[...]
> >If I see something like -nodefconfig, I assume it will create a bare
> >bones guest that will not depend on any qemu defaults and will be stable
> >across releases.
> 
> That's not even close to what -nodefconfig is.  That's pretty much
> what -nodefaults is but -nodefaults has also had a fluid definition
> historically.

Then we need an option with the meaning that libvirt needs, and a
meaning that is stable across releases. If I understood this discussion
correctly, that would be "-no-user-config".
Anthony Liguori March 26, 2012, 7 p.m. UTC | #91
On 03/25/2012 01:09 PM, Avi Kivity wrote:
>>>>> Suppose we define the southbridge via a configuration file.  Does that
>>>>> mean we don't load it any more?
>>>>
>>>> Yes.  If I want the leanest and meanest version of QEMU that will
>>>> start in the smallest number of milliseconds, then being able to tell
>>>> QEMU not to load configuration files and create a very specific
>>>> machine is a Good Thing.  Why exclude users from being able to do this?
>>>
>>> So is this the point?  Reducing startup time?
>>
>> Yes, that's one reason.  But maybe a user wants to have a whole
>> different set of machine types and doesn't care to have the ones we
>> provide.  Why prevent a user from doing this?
>
> How are we preventing a user from doing it?  In what way is -nodefconfig
> helping it?

Let me explain it in a different way, perhaps.

We launch smbd in QEMU in order to do file sharing over slirp.  One of the 
historic problems we've had is that we don't assume root privileges, yet want to 
be able to run smbd without using any of the system configuration files.

You can do this by specifying -s with the config file, and then in the config file 
you can overload the various default paths (like private dir, lock dir, etc.). 
In some cases, earlier versions of smbd didn't allow you to change private dir.

You should be able to tell a well behaved tool not to read any 
configuration/data files and explicitly tell it where/how to read them.  We 
cannot exhaustively anticipate every future use case of QEMU.

But beyond the justification for -nodefconfig, the fact is that it exists today, 
and has a specific semantic.  If we want to have a different semantic, we should 
introduce a new option (-no-user-config).

Regards,

Anthony Liguori

>> Maybe they have a management tool that attempts to totally hide QEMU
>> from the end user and exposes a different set of machine types.  It's
>> certainly more convenient for something like the Android emulator to
>> only have to deal with QEMU knowing about the 4 types of machines that
>> it specifically supports.
>
> If it supports four types, it should always pass one of them to qemu.
> The only thing -nodefconfig adds is breakage when qemu moves something
> that one of those four machines relies on to a config file.
>
Anthony Liguori March 26, 2012, 7:03 p.m. UTC | #92
On 03/26/2012 04:08 AM, Avi Kivity wrote:
>>> If I see something like -nodefconfig, I assume it will create a bare
>>> bones guest that will not depend on any qemu defaults and will be stable
>>> across releases.
>>
>> That's not even close to what -nodefconfig is.  That's pretty much
>> what -nodefaults is but -nodefaults has also had a fluid definition
>> historically.
>
> Okay.  Let's just make sure to document -nodefconfig as version specific
> and -nodefaults as the stable way to create a bare bones guest (and
> define exactly what that means).

Agreed.  But I do want to point out that -nodefaults has not been stable and 
since it doesn't universally create a bare bones guest, it's not clear that it 
will be.

I think what we want to move toward is a -no-machine option which allows a user 
to explicitly build a machine from scratch.  That is:

qemu -no-machine -device i440fx,id=host -device isa-serial,chr=chr0 ...

Regards,

Anthony Liguori
Anthony Liguori March 26, 2012, 7:04 p.m. UTC | #93
On 03/26/2012 11:14 AM, Eduardo Habkost wrote:
> On Mon, Mar 26, 2012 at 02:03:21PM +0200, Gleb Natapov wrote:
>> On Mon, Mar 26, 2012 at 01:59:05PM +0200, Avi Kivity wrote:
>>> On 03/26/2012 01:24 PM, Jiri Denemark wrote:
>>>> ...
>>>>>> The command line becomes unstable if you use -nodefconfig.
>>>>>
>>>>> -no-user-config solves this but I fully expect libvirt would continue to use
>>>>> -nodefconfig.
>>>>
>>>> Libvirt uses -nodefaults -nodefconfig because it wants to fully control how
>>>> the virtual machine will look like (mainly in terms of devices). In other
>>>> words, we don't want any devices to just magically appear without libvirt
>>>> knowing about them. -nodefaults gets rid of default devices that are built
>>>> directly in qemu. Since users can set any devices or command line options
>>>> (such as enable-kvm) into qemu configuration files in @SYSCONFDIR@, we need to
>>>> avoid reading those files as well. Hence we use -nodefconfig. However, we
>>>> would still like qemu to read CPU definitions, machine types, etc. once they
>>>> become externally loaded configuration (or however we decide to call it). That
>>>> said, when CPU definitions are moved into @DATADIR@, and -no-user-config is
>>>> introduced, I don't see any reason for libvirt to keep using -nodefconfig.
>
> ACK.
>
>>>>
>>>> I actually like
>>>> -no-user-config
>>>> more than
>>>> -nodefconfig -readconfig @DATADIR@/...
>>>> since it would avoid additional magic to detect what files libvirt should
>>>> explicitly pass to -readconfig but basically any approach that would allow us
>>>> to do read files only from @DATADIR@ is much better than what we have with
>>>> -nodefconfig now.
>>>
>>> That's how I see it as well.
>>>
>> +1
>>
>> except that instead of -no-user-config we can do what most other
>> programs do. If config file is specified during invocation default one
>> is not used. After implementing -no-user-config (or similar) we can drop
>> -nodefconfig entirely since its only user will be gone it its semantics
>> is not clear.
>
> Awesome. It looks like we have a solution now? Anthony, do you agree
> with that? Daniel, it looks good for you?

We cannot and should not drop -nodefconfig.  But yes, I agree that we should 
introduce -no-user-config and use the semantics I specified earlier in the thread.

Regards,

Anthony Liguori
Avi Kivity March 28, 2012, 9:55 a.m. UTC | #94
On 03/26/2012 09:03 PM, Anthony Liguori wrote:
>
> I think what we want to move toward is a -no-machine option which
> allows a user to explicitly build a machine from scratch.  That is:
>
> qemu -no-machine -device i440fx,id=host -device isa-serial,chr=chr0 ...
>

I'd call it -M bare-1.1, so that it can be used to override driver
properties in 1.2+.

So we'd have

  # default machine for this version
  qemu / qemu -M pc

  # an older version's pc
  qemu -M pc-1.1

  # just a chassis, bring your own screwdriver
  qemu -M bare

  # previous generation chassis, beige
  qemu -M bare-1.1

That is because -M not only specifies the components that go into the
machine, it also alters other devices you add to it.

This also helps preserve the planet's dwindling supply of command line
options.
Avi Kivity March 28, 2012, 9:59 a.m. UTC | #95
On 03/26/2012 09:00 PM, Anthony Liguori wrote:
>>> Yes, that's one reason.  But maybe a user wants to have a whole
>>> different set of machine types and doesn't care to have the ones we
>>> provide.  Why prevent a user from doing this?
>>
>> How are we preventing a user from doing it?  In what way is -nodefconfig
>> helping it?
>
>
> Let me explain it in a different way, perhaps.
>
> We launch smbd in QEMU in order to do file sharing over slirp.  One of
> the historic problems we've had is that we don't assume root
> privileges, yet want to be able to run smbd without using any of the
> system configuration files.
>
> You can do this by specify -s with the config file, and then in the
> config file you can overload the various default paths (like private
> dir, lock dir, etc.). In some cases, earlier versions of smbd didn't
> allow you to change private dir.
>
> You should be able to tell a well behaved tool not to read any
> configuration/data files and explicitly tell it where/how to read
> them.  We cannot exhaustively anticipate every future use case of QEMU.

100% agree.  But that says nothing about a text file that defines
"westmere" as a set of cpu flags, as long as we allow the user to define
"mywestmere" as a different set.  That is because target-x86_64.cfg does
not configure anything, it just defines a macro, which qemu doesn't
force you to use.
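
As a sketch of what I mean (the file name my-cpus.conf and the model name
MyNehalem are made up; the values are simply copied from the Nehalem entry in
the patch at the end of this thread, as filler):

```shell
# Write a user-supplied CPU definition file. "MyNehalem" and
# "my-cpus.conf" are made-up names; the values are copied from the
# Nehalem [cpudef] entry in the patch and carry no special meaning.
cat > my-cpus.conf <<'EOF'
[cpudef]
   name = "MyNehalem"
   level = "2"
   vendor = "GenuineIntel"
   family = "6"
   model = "2"
   stepping = "3"
   feature_edx = "sse2 sse fxsr mmx pat cmov pge sep apic cx8 mce pae msr tsc pse de fpu    mtrr clflush mca pse36"
   feature_ecx = "sse3 cx16 ssse3 sse4.1 sse4.2 x2apic popcnt"
   extfeature_edx = "fxsr mmx pat cmov pge apic cx8 mce pae msr tsc pse de fpu    lm syscall nx"
   extfeature_ecx = "lahf_lm"
   xlevel = "0x8000000A"
   model_id = "My private Nehalem variant"
EOF

# The definition is inert until a guest asks for it by name, e.g.
# (assuming the file is loaded with -readconfig):
#   qemu -readconfig my-cpus.conf -cpu MyNehalem ...
```

Loading the file changes nothing for a guest started with a different -cpu
value, which is why I don't count it as configuration.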

>
> But beyond the justification for -nodefconfig, the fact is that it
> exists today, and has a specific semantic.  If we want to have a
> different semantic, we should introduce a new option (-no-user-config).

Sure.

Patch

diff --git a/Makefile b/Makefile
index c1dadd0..d147632 100644
--- a/Makefile
+++ b/Makefile
@@ -319,11 +319,11 @@  ifdef CONFIG_POSIX
 	$(INSTALL_DATA) qemu-nbd.8 "$(DESTDIR)$(mandir)/man8"
 endif
 
-install-sysconfig:
-	$(INSTALL_DIR) "$(sysconfdir)/qemu"
-	$(INSTALL_DATA) sysconfigs/target/target-x86_64.conf "$(sysconfdir)/qemu"
+install-cpuconfig:
+	$(INSTALL_DIR) "$(DESTDIR)$(cpuconfdir)"
+	$(INSTALL_DATA) sysconfigs/target/cpu-x86_64.conf "$(DESTDIR)$(cpuconfdir)"
 
-install: all $(if $(BUILD_DOCS),install-doc) install-sysconfig
+install: all $(if $(BUILD_DOCS),install-doc) install-cpuconfig
 	$(INSTALL_DIR) "$(DESTDIR)$(bindir)"
 ifneq ($(TOOLS),)
 	$(INSTALL_PROG) $(STRIP_OPT) $(TOOLS) "$(DESTDIR)$(bindir)"
diff --git a/configure b/configure
index 9e62b0d..be78a0c 100755
--- a/configure
+++ b/configure
@@ -33,6 +33,7 @@  prefix=""
 interp_prefix="/usr/gnemul/qemu-%M"
 static="no"
 sysconfdir=""
+cpuconfdir=""
 sparc_cpu=""
 cross_prefix=""
 cc="gcc"
@@ -480,6 +481,8 @@  for opt do
   ;;
   --sysconfdir=*) sysconfdir="$optarg"
   ;;
+  --cpuconfdir=*) cpuconfdir="$optarg"
+  ;;
   --disable-sdl) sdl="no"
   ;;
   --enable-sdl) sdl="yes"
@@ -739,6 +742,7 @@  echo "  --make=MAKE              use specified make [$make]"
 echo "  --install=INSTALL        use specified install [$install]"
 echo "  --static                 enable static build [$static]"
 echo "  --sysconfdir=PATH        install config in PATH"
+echo "  --cpuconfdir=PATH        install cpu model config in PATH"
 echo "  --enable-debug-tcg       enable TCG debugging"
 echo "  --disable-debug-tcg      disable TCG debugging (default)"
 echo "  --enable-debug           enable common debug build options"
@@ -2058,6 +2062,9 @@  else
   if test -z "$sysconfdir" ; then
       sysconfdir="${prefix}/etc"
   fi
+  if test -z "$cpuconfdir" ; then
+      cpuconfdir="$sysconfdir/qemu"
+  fi
 fi
 
 if test -f kvm/kernel/configure; then
@@ -2159,6 +2166,7 @@  if test "$mingw32" = "yes" ; then
   echo "CONFIG_QEMU_CONFDIR=\"$sysconfdir\"" >> $config_host_mak
 else
   echo "CONFIG_QEMU_CONFDIR=\"${sysconfdir}/qemu\"" >> $config_host_mak
+  echo "CONFIG_QEMU_CPUCONFDIR=\"${cpuconfdir}\"" >> $config_host_mak
 fi
 
 case "$cpu" in
@@ -2426,6 +2434,7 @@  echo "bindir=\${prefix}$binsuffix" >> $config_host_mak
 echo "mandir=\${prefix}$mansuffix" >> $config_host_mak
 echo "datadir=\${prefix}$datasuffix" >> $config_host_mak
 echo "sysconfdir=$sysconfdir" >> $config_host_mak
+echo "cpuconfdir=$cpuconfdir" >> $config_host_mak
 echo "docdir=\${prefix}$docsuffix" >> $config_host_mak
 echo "MAKE=$make" >> $config_host_mak
 echo "INSTALL=$install" >> $config_host_mak
diff --git a/sysconfigs/target/cpu-x86_64.conf b/sysconfigs/target/cpu-x86_64.conf
new file mode 100644
index 0000000..ca07088
--- /dev/null
+++ b/sysconfigs/target/cpu-x86_64.conf
@@ -0,0 +1,85 @@ 
+# x86 CPU MODELS
+
+[cpudef]
+   name = "Conroe"
+   level = "2"
+   vendor = "GenuineIntel"
+   family = "6"
+   model = "2"
+   stepping = "3"
+   feature_edx = "sse2 sse fxsr mmx pat cmov pge sep apic cx8 mce pae msr tsc pse de fpu    mtrr clflush mca pse36"
+   feature_ecx = "sse3 ssse3 x2apic"
+   extfeature_edx = "fxsr mmx pat cmov pge apic cx8 mce pae msr tsc pse de fpu    lm syscall nx"
+   extfeature_ecx = "lahf_lm"
+   xlevel = "0x8000000A"
+   model_id = "Intel Celeron_4x0 (Conroe/Merom Class Core 2)"
+
+[cpudef]
+   name = "Penryn"
+   level = "2"
+   vendor = "GenuineIntel"
+   family = "6"
+   model = "2"
+   stepping = "3"
+   feature_edx = "sse2 sse fxsr mmx pat cmov pge sep apic cx8 mce pae msr tsc pse de fpu    mtrr clflush mca pse36"
+   feature_ecx = "sse3 cx16 ssse3 sse4.1 x2apic"
+   extfeature_edx = "fxsr mmx pat cmov pge apic cx8 mce pae msr tsc pse de fpu    lm syscall nx"
+   extfeature_ecx = "lahf_lm"
+   xlevel = "0x8000000A"
+   model_id = "Intel Core 2 Duo P9xxx (Penryn Class Core 2)"
+
+[cpudef]
+   name = "Nehalem"
+   level = "2"
+   vendor = "GenuineIntel"
+   family = "6"
+   model = "2"
+   stepping = "3"
+   feature_edx = "sse2 sse fxsr mmx pat cmov pge sep apic cx8 mce pae msr tsc pse de fpu    mtrr clflush mca pse36"
+   feature_ecx = "sse3 cx16 ssse3 sse4.1 sse4.2 x2apic popcnt"
+   extfeature_edx = "fxsr mmx pat cmov pge apic cx8 mce pae msr tsc pse de fpu    lm syscall nx"
+   extfeature_ecx = "lahf_lm"
+   xlevel = "0x8000000A"
+   model_id = "Intel Core i7 9xx (Nehalem Class Core i7)"
+
+[cpudef]
+   name = "Opteron_G1"
+   level = "5"
+   vendor = "AuthenticAMD"
+   family = "15"
+   model = "6"
+   stepping = "1"
+   feature_edx = "sse2 sse fxsr mmx pat cmov pge sep apic cx8 mce pae msr tsc pse de fpu    mtrr clflush mca pse36"
+   feature_ecx = "sse3"
+   extfeature_edx = "fxsr mmx pat cmov pge apic cx8 mce pae msr tsc pse de fpu    lm syscall nx"
+#   extfeature_ecx = ""
+   xlevel = "0x80000008"
+   model_id = "AMD Opteron 240 (Gen 1 Class Opteron)"
+
+[cpudef]
+   name = "Opteron_G2"
+   level = "5"
+   vendor = "AuthenticAMD"
+   family = "15"
+   model = "6"
+   stepping = "1"
+   feature_edx = "sse2 sse fxsr mmx pat cmov pge sep apic cx8 mce pae msr tsc pse de fpu    mtrr clflush mca pse36"
+   feature_ecx = "sse3 cx16"
+   extfeature_edx = "fxsr mmx pat cmov pge apic cx8 mce pae msr tsc pse de fpu    lm syscall nx rdtscp"
+   extfeature_ecx = "svm lahf_lm"
+   xlevel = "0x80000008"
+   model_id = "AMD Opteron 22xx (Gen 2 Class Opteron)"
+
+[cpudef]
+   name = "Opteron_G3"
+   level = "5"
+   vendor = "AuthenticAMD"
+   family = "15"
+   model = "6"
+   stepping = "1"
+   feature_edx = "sse2 sse fxsr mmx pat cmov pge sep apic cx8 mce pae msr tsc pse de fpu    mtrr clflush mca pse36"
+   feature_ecx = "sse3 cx16 monitor popcnt"
+   extfeature_edx = "fxsr mmx pat cmov pge apic cx8 mce pae msr tsc pse de fpu    lm syscall nx rdtscp"
+   extfeature_ecx = "svm sse4a  abm misalignsse lahf_lm"
+   xlevel = "0x80000008"
+   model_id = "AMD Opteron 23xx (Gen 3 Class Opteron)"
diff --git a/sysconfigs/target/target-x86_64.conf b/sysconfigs/target/target-x86_64.conf
deleted file mode 100644
index ca07088..0000000
--- a/sysconfigs/target/target-x86_64.conf
+++ /dev/null
@@ -1,85 +0,0 @@ 
-# x86 CPU MODELS
-
-[cpudef]
-   name = "Conroe"
-   level = "2"
-   vendor = "GenuineIntel"
-   family = "6"
-   model = "2"
-   stepping = "3"
-   feature_edx = "sse2 sse fxsr mmx pat cmov pge sep apic cx8 mce pae msr tsc pse de fpu    mtrr clflush mca pse36"
-   feature_ecx = "sse3 ssse3 x2apic"
-   extfeature_edx = "fxsr mmx pat cmov pge apic cx8 mce pae msr tsc pse de fpu    lm syscall nx"
-   extfeature_ecx = "lahf_lm"
-   xlevel = "0x8000000A"
-   model_id = "Intel Celeron_4x0 (Conroe/Merom Class Core 2)"
-
-[cpudef]
-   name = "Penryn"
-   level = "2"
-   vendor = "GenuineIntel"
-   family = "6"
-   model = "2"
-   stepping = "3"
-   feature_edx = "sse2 sse fxsr mmx pat cmov pge sep apic cx8 mce pae msr tsc pse de fpu    mtrr clflush mca pse36"
-   feature_ecx = "sse3 cx16 ssse3 sse4.1 x2apic"
-   extfeature_edx = "fxsr mmx pat cmov pge apic cx8 mce pae msr tsc pse de fpu    lm syscall nx"
-   extfeature_ecx = "lahf_lm"
-   xlevel = "0x8000000A"
-   model_id = "Intel Core 2 Duo P9xxx (Penryn Class Core 2)"
-
-[cpudef]
-   name = "Nehalem"
-   level = "2"
-   vendor = "GenuineIntel"
-   family = "6"
-   model = "2"
-   stepping = "3"
-   feature_edx = "sse2 sse fxsr mmx pat cmov pge sep apic cx8 mce pae msr tsc pse de fpu    mtrr clflush mca pse36"
-   feature_ecx = "sse3 cx16 ssse3 sse4.1 sse4.2 x2apic popcnt"
-   extfeature_edx = "fxsr mmx pat cmov pge apic cx8 mce pae msr tsc pse de fpu    lm syscall nx"
-   extfeature_ecx = "lahf_lm"
-   xlevel = "0x8000000A"
-   model_id = "Intel Core i7 9xx (Nehalem Class Core i7)"
-
-[cpudef]
-   name = "Opteron_G1"
-   level = "5"
-   vendor = "AuthenticAMD"
-   family = "15"
-   model = "6"
-   stepping = "1"
-   feature_edx = "sse2 sse fxsr mmx pat cmov pge sep apic cx8 mce pae msr tsc pse de fpu    mtrr clflush mca pse36"
-   feature_ecx = "sse3"
-   extfeature_edx = "fxsr mmx pat cmov pge apic cx8 mce pae msr tsc pse de fpu    lm syscall nx"
-#   extfeature_ecx = ""
-   xlevel = "0x80000008"
-   model_id = "AMD Opteron 240 (Gen 1 Class Opteron)"
-
-[cpudef]
-   name = "Opteron_G2"
-   level = "5"
-   vendor = "AuthenticAMD"
-   family = "15"
-   model = "6"
-   stepping = "1"
-   feature_edx = "sse2 sse fxsr mmx pat cmov pge sep apic cx8 mce pae msr tsc pse de fpu    mtrr clflush mca pse36"
-   feature_ecx = "sse3 cx16"
-   extfeature_edx = "fxsr mmx pat cmov pge apic cx8 mce pae msr tsc pse de fpu    lm syscall nx rdtscp"
-   extfeature_ecx = "svm lahf_lm"
-   xlevel = "0x80000008"
-   model_id = "AMD Opteron 22xx (Gen 2 Class Opteron)"
-
-[cpudef]
-   name = "Opteron_G3"
-   level = "5"
-   vendor = "AuthenticAMD"
-   family = "15"
-   model = "6"
-   stepping = "1"
-   feature_edx = "sse2 sse fxsr mmx pat cmov pge sep apic cx8 mce pae msr tsc pse de fpu    mtrr clflush mca pse36"
-   feature_ecx = "sse3 cx16 monitor popcnt"
-   extfeature_edx = "fxsr mmx pat cmov pge apic cx8 mce pae msr tsc pse de fpu    lm syscall nx rdtscp"
-   extfeature_ecx = "svm sse4a  abm misalignsse lahf_lm"
-   xlevel = "0x80000008"
-   model_id = "AMD Opteron 23xx (Gen 3 Class Opteron)"
diff --git a/vl.c b/vl.c
index a3b682d..eb993d6 100644
--- a/vl.c
+++ b/vl.c
@@ -5141,6 +5141,13 @@  int main(int argc, char **argv, char **envp)
         }
     }
 
+    /* Load the CPU model config. NB: defconfig files read afterwards may
+     * override these definitions. */
+    if (qemu_read_config_file(
+        CONFIG_QEMU_CPUCONFDIR "/cpu-" TARGET_ARCH ".conf",
+        defconfig_verbose) == -EINVAL) {
+        exit(1);
+    }
     if (defconfig) {
         int ret;