diff mbox series

[RFC,v2,1/1] s390x/css: unrestrict cssids

Message ID 20171128130758.67556-1-pasic@linux.vnet.ibm.com
State New
Headers show
Series [RFC,v2,1/1] s390x/css: unrestrict cssids | expand

Commit Message

Halil Pasic Nov. 28, 2017, 1:07 p.m. UTC
The default css 0xfe is currently restricted to virtual subchannel
devices. The hope when the decision was made was, that non-virtual
subchannel devices will come around when guest can exploit multiple
channel subsystems. Since the guests generally don't do, the pain
of the partitioned (cssid) namespace outweighs the gain.

Let us remove the corresponding restrictions (virtual devices
can be put only in 0xfe and non-virtual devices in any css except
the 0xfe -- while s390-squash-mcss then remaps everything to cssid 0).

With this change, however, our schema for generating a css bus ids, if
none is specified does not make much sense. Currently we start at cssid
0x0 for non-virtual devices and use the default css (without
s390-squash-mcss exclusively) for virtual devices.  That means for
non-virtual device the device would auto-magically end up non-visible for
guests (which can't use the other channel subsystems).

Thus let us also change the css bus id auto assignment algorithm,
so that we first fill the default css, and then proceed to the
next one (modulo MAX_CSSID).

The adverse effect of getting rid of the restriction on migration should
not be too severe.  Vfio-ccw devices are not live-migratable yet, and for
virtual devices using the extra freedom would only make sense with the
aforementioned guest support in place.

The auto-generated bus ids are affected by both changes. We hope to not
encounter any auto-generated bus ids in production as Libvirt is always
explicit about the bus id.  Since 8ed179c937 ("s390x/css: catch section
mismatch on load", 2017-05-18) the worst that can happen because the same
device ended up having a different bus id is a cleanly failed migration.
I find it hard to reason about the impact of changed auto-generated bus
ids on migration for command line users as I don't know which rules is
such an user supposed to follow.

Another pain-point is down- or upgrade of QEMU for command line users.
The old way and the new way of doing vfio-ccw are mutually incompatible.
Libvirt is only going to support the new way, so for libvirt users, the
possible problems at QEMU downgrade are the following. If a domain
contains virtual devices placed into a css different than 0xfe the domain
wil refuse to start with a QEMU not having this patch. Putting devices
into a css different that 0xfe however won't make much sense in the
near future (guest support). Libvirt will refuse to do vfio-ccw with
a QEMU not having this patch. This is business as usual.

Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>

---
Hi all!

Moving the property to the machine turned out very ugly (IMHO).  Libvirt
detects machine properties via query-command-line-options.  This is
however decoupled from the existence of the actual property: one needs to
touch util/qemu-config.c (see patch) so the property shows up.
Furthermore this stuff is global. I've also noticed that the infamous
s390-squash-mcss does not show up.

On the other hand to get the property displayed by -machine
s390-ccw-virtio,help one needs a setter on the property. So I have
created a fake setter which produces an error each time called.

I would strongly prefer putting back the property to the device level!

v1 -> v2:
* changed ccw bus id generation too (see commit message)
* moved the property to the machine (see cover letter)
* added a description to the property
---
 hw/s390x/3270-ccw.c        |  2 +-
 hw/s390x/css.c             | 28 ++++------------------------
 hw/s390x/s390-ccw.c        |  2 +-
 hw/s390x/s390-virtio-ccw.c | 21 +++++++++++++++++++++
 hw/s390x/virtio-ccw.c      |  2 +-
 include/hw/s390x/css.h     | 12 ++++--------
 util/qemu-config.c         |  6 ++++++
 7 files changed, 38 insertions(+), 35 deletions(-)

Comments

Dong Jia Shi Nov. 29, 2017, 8:17 a.m. UTC | #1
* Halil Pasic <pasic@linux.vnet.ibm.com> [2017-11-28 14:07:58 +0100]:

[...]
> 
> The auto-generated bus ids are affected by both changes. We hope to not
> encounter any auto-generated bus ids in production as Libvirt is always
> explicit about the bus id.  Since 8ed179c937 ("s390x/css: catch section
> mismatch on load", 2017-05-18) the worst that can happen because the same
> device ended up having a different bus id is a cleanly failed migration.
> I find it hard to reason about the impact of changed auto-generated bus
> ids on migration for command line users as I don't know which rules is
> such an user supposed to follow.
For this paragraph, Halil pointed to me a case that he is thinking of.
1. VM configuration with 3 devices:
  -device virtio (e.g. virtio-blk-ccw,id=disk0)
  -device vfio-ccw (e.g. id=vfio0)
  -device virtio (e.g. virtio-rng-ccw,id=rng0)
2. Start the vm.
3. device_del vfio0
4. migrate "exec:gzip -c > /tmp/tmp_vmstate.gz"
5. modify cmd line from step 1 by removing the vfio0 device, and adding:
   -incoming "exec:gzip -c -d /tmp/tmp_vmstate.gz"

Let me list my test results here for everybody's reference.

W/o this patch
==============

------------+---------------+-------------
            | squashing off | squashing on
------------+---------------+-------------
    auto id |        F      |     F
------------+---------------+-------------
explicit id |        F      |     S
------------+---------------+-------------

T1. squashing off + auto id
  qemu-system-s390x: vmstate: get_nullptr expected VMS_NULLPTR_MARKER
  qemu-system-s390x: Failed to load s390_css:css
  qemu-system-s390x: error while loading state for instance 0x0 of device 's390_css'
  qemu-system-s390x: load of migration failed: Invalid argument
[Fail due to css mismatch - there is no css 0 in the new vm.]

T2. squashing off + explicit given id
  qemu-system-s390x: vmstate: get_nullptr expected VMS_NULLPTR_MARKER
  qemu-system-s390x: Failed to load s390_css:css
  qemu-system-s390x: error while loading state for instance 0x0 of device 's390_css'
  qemu-system-s390x: load of migration failed: Invalid argument
[Fail due to css mismatch - there is no css 0 in the new vm.]

T3. squashing on  + auto id
  qemu-system-s390x: Unknown savevm section or instance '/00.0.0003/virtio-rng' 0
  qemu-system-s390x: load of migration failed: Invalid argument
[Fail due to busid mismatch.]

T4. squashing on  + explicit given id
  Succeed.

With this patch
===============

------------+---------------+-------------
            | squashing off | squashing on
------------+---------------+-------------
    auto id |        F      |     F
------------+---------------+-------------
explicit id |        S'     |     S
------------+---------------+-------------

T5. squashing off + auto id
  qemu-system-s390x: Unknown savevm section or instance '/fe.0.0003/virtio-rng' 0
  qemu-system-s390x: load of migration failed: Invalid argument
[Fail due to busid mismatch.]

T6. squashing off + explicit given id
  qemu-system-s390x: vmstate: get_nullptr expected VMS_NULLPTR_MARKER
  qemu-system-s390x: Failed to load s390_css:css
  qemu-system-s390x: error while loading state for instance 0x0 of device 's390_css'
  qemu-system-s390x: load of migration failed: Invalid argument
[Setting vfio-ccw.devno=non-fe.x.xxxx. (same as T1)
 Fail due to css mismatch - there is no css 0 in the new vm.]

  Succeed.
[Setting vfio-ccw.devno=fe.x.xxxx.]

T7. squashing on  + auto id
  qemu-system-s390x: Unknown savevm section or instance '/00.0.0003/virtio-rng' 0
  qemu-system-s390x: load of migration failed: Invalid argument
[Fail due to busid mismatch.]

T8. squashing on  + explicit given id
  Succeed.


Notice:
The differences of the test results between w and w/o this patch are in
the "squashing off" cases. I think these are things that we can accept.

[...]
Dong Jia Shi Nov. 29, 2017, 8:20 a.m. UTC | #2
* Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> [2017-11-29 16:17:35 +0800]:

[...]

> T6. squashing off + explicit given id
>   qemu-system-s390x: vmstate: get_nullptr expected VMS_NULLPTR_MARKER
>   qemu-system-s390x: Failed to load s390_css:css
>   qemu-system-s390x: error while loading state for instance 0x0 of device 's390_css'
>   qemu-system-s390x: load of migration failed: Invalid argument
> [Setting vfio-ccw.devno=non-fe.x.xxxx. (same as T1)
s/T1/T2/

>  Fail due to css mismatch - there is no css 0 in the new vm.]
> 
>   Succeed.
> [Setting vfio-ccw.devno=fe.x.xxxx.]
> 

[...]
Cornelia Huck Nov. 29, 2017, 11:47 a.m. UTC | #3
On Wed, 29 Nov 2017 16:17:35 +0800
Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:

> * Halil Pasic <pasic@linux.vnet.ibm.com> [2017-11-28 14:07:58 +0100]:
> 
> [...]
> > 
> > The auto-generated bus ids are affected by both changes. We hope to not
> > encounter any auto-generated bus ids in production as Libvirt is always
> > explicit about the bus id.  Since 8ed179c937 ("s390x/css: catch section
> > mismatch on load", 2017-05-18) the worst that can happen because the same
> > device ended up having a different bus id is a cleanly failed migration.
> > I find it hard to reason about the impact of changed auto-generated bus
> > ids on migration for command line users as I don't know which rules is
> > such an user supposed to follow.  
> For this paragraph, Halil pointed to me a case that he is thinking of.
> 1. VM configuration with 3 devices:
>   -device virtio (e.g. virtio-blk-ccw,id=disk0)
>   -device vfio-ccw (e.g. id=vfio0)
>   -device virtio (e.g. virtio-rng-ccw,id=rng0)
> 2. Start the vm.
> 3. device_del vfio0
> 4. migrate "exec:gzip -c > /tmp/tmp_vmstate.gz"
> 5. modify cmd line from step 1 by removing the vfio0 device, and adding:
>    -incoming "exec:gzip -c -d /tmp/tmp_vmstate.gz"
> 
> Let me list my test results here for everybody's reference.
> 
> W/o this patch
> ==============
> 
> ------------+---------------+-------------
>             | squashing off | squashing on
> ------------+---------------+-------------
>     auto id |        F      |     F
> ------------+---------------+-------------
> explicit id |        F      |     S
> ------------+---------------+-------------
> 
> T1. squashing off + auto id
>   qemu-system-s390x: vmstate: get_nullptr expected VMS_NULLPTR_MARKER
>   qemu-system-s390x: Failed to load s390_css:css
>   qemu-system-s390x: error while loading state for instance 0x0 of device 's390_css'
>   qemu-system-s390x: load of migration failed: Invalid argument
> [Fail due to css mismatch - there is no css 0 in the new vm.]
> 
> T2. squashing off + explicit given id
>   qemu-system-s390x: vmstate: get_nullptr expected VMS_NULLPTR_MARKER
>   qemu-system-s390x: Failed to load s390_css:css
>   qemu-system-s390x: error while loading state for instance 0x0 of device 's390_css'
>   qemu-system-s390x: load of migration failed: Invalid argument
> [Fail due to css mismatch - there is no css 0 in the new vm.]

Hmm... so should we even try to migrate an empty css 0? It only exists
because we have created a device that we had to detach anyway because
it was non-migrateable...

[Probably no easy way to deal with this, though.]

> 
> T3. squashing on  + auto id
>   qemu-system-s390x: Unknown savevm section or instance '/00.0.0003/virtio-rng' 0
>   qemu-system-s390x: load of migration failed: Invalid argument
> [Fail due to busid mismatch.]
> 
> T4. squashing on  + explicit given id
>   Succeed.
> 
> With this patch
> ===============
> 
> ------------+---------------+-------------
>             | squashing off | squashing on
> ------------+---------------+-------------
>     auto id |        F      |     F
> ------------+---------------+-------------
> explicit id |        S'     |     S
> ------------+---------------+-------------
> 
> T5. squashing off + auto id
>   qemu-system-s390x: Unknown savevm section or instance '/fe.0.0003/virtio-rng' 0
>   qemu-system-s390x: load of migration failed: Invalid argument
> [Fail due to busid mismatch.]
> 
> T6. squashing off + explicit given id
>   qemu-system-s390x: vmstate: get_nullptr expected VMS_NULLPTR_MARKER
>   qemu-system-s390x: Failed to load s390_css:css
>   qemu-system-s390x: error while loading state for instance 0x0 of device 's390_css'
>   qemu-system-s390x: load of migration failed: Invalid argument
> [Setting vfio-ccw.devno=non-fe.x.xxxx. (same as T1)
>  Fail due to css mismatch - there is no css 0 in the new vm.]
> 
>   Succeed.
> [Setting vfio-ccw.devno=fe.x.xxxx.]

Don't you need to attach the vfio-ccw device later anyway? You have to
detach it from the source before you migrate, and I'd expect it to be
symmetric.

> 
> T7. squashing on  + auto id
>   qemu-system-s390x: Unknown savevm section or instance '/00.0.0003/virtio-rng' 0
>   qemu-system-s390x: load of migration failed: Invalid argument
> [Fail due to busid mismatch.]
> 
> T8. squashing on  + explicit given id
>   Succeed.
> 
> 
> Notice:
> The differences of the test results between w and w/o this patch are in
> the "squashing off" cases. I think these are things that we can accept.

Yes, I think that makes sense. If you want reliable migration, you need
to be specific with your ids. I'd just don't want us to break things
explicitly.
Cornelia Huck Nov. 29, 2017, 12:37 p.m. UTC | #4
On Tue, 28 Nov 2017 14:07:58 +0100
Halil Pasic <pasic@linux.vnet.ibm.com> wrote:

> The default css 0xfe is currently restricted to virtual subchannel
> devices. The hope when the decision was made was, that non-virtual
> subchannel devices will come around when guest can exploit multiple

s/guest/guests/

> channel subsystems. Since the guests generally don't do, the pain

s/the guests generally don't do/current guests don't do that/

> of the partitioned (cssid) namespace outweighs the gain.
> 
> Let us remove the corresponding restrictions (virtual devices
> can be put only in 0xfe and non-virtual devices in any css except
> the 0xfe -- while s390-squash-mcss then remaps everything to cssid 0).
> 
> With this change, however, our schema for generating a css bus ids, if
> none is specified does not make much sense. Currently we start at cssid
> 0x0 for non-virtual devices and use the default css (without
> s390-squash-mcss exclusively) for virtual devices.  That means for
> non-virtual device the device would auto-magically end up non-visible for
> guests (which can't use the other channel subsystems).
> 
> Thus let us also change the css bus id auto assignment algorithm,
> so that we first fill the default css, and then proceed to the
> next one (modulo MAX_CSSID).

Let's reword the previous two paragraphs:

"At the same time, change our schema for generating css bus ids to put
both virtual and non-virtual devices into the default css (spilling
over into other css images, if needed) so that devices without a
specified id don't end up hidden to guests not supporting multiple
channel subsystems."

> 
> The adverse effect of getting rid of the restriction on migration should
> not be too severe.  Vfio-ccw devices are not live-migratable yet, and for
> virtual devices using the extra freedom would only make sense with the
> aforementioned guest support in place.
> 
> The auto-generated bus ids are affected by both changes. We hope to not
> encounter any auto-generated bus ids in production as Libvirt is always
> explicit about the bus id.  Since 8ed179c937 ("s390x/css: catch section
> mismatch on load", 2017-05-18) the worst that can happen because the same
> device ended up having a different bus id is a cleanly failed migration.
> I find it hard to reason about the impact of changed auto-generated bus
> ids on migration for command line users as I don't know which rules is
> such an user supposed to follow.
> 
> Another pain-point is down- or upgrade of QEMU for command line users.
> The old way and the new way of doing vfio-ccw are mutually incompatible.
> Libvirt is only going to support the new way, so for libvirt users, the
> possible problems at QEMU downgrade are the following. If a domain
> contains virtual devices placed into a css different than 0xfe the domain
> wil refuse to start with a QEMU not having this patch. Putting devices
> into a css different that 0xfe however won't make much sense in the
> near future (guest support). Libvirt will refuse to do vfio-ccw with
> a QEMU not having this patch. This is business as usual.

All of this is valuable information, but I don't think a changelog is
the right place for it. We should really have a place where we can also
save things like Dong Jia's writeup downthread. In the documentation
folder or on the QEMU wiki (or both). We can be much more verbose there
(including examples, which make this stuff way easier to understand).
I'd recommend adding a documentation file with this patch (or patch
series, as I'd also like to see a squash-mcss deprecation patch).

> 
> Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
> 
> ---
> Hi all!
> 
> Moving the property to the machine turned out very ugly (IMHO).  Libvirt
> detects machine properties via query-command-line-options.  This is
> however decoupled from the existence of the actual property: one needs to
> touch util/qemu-config.c (see patch) so the property shows up.
> Furthermore this stuff is global. I've also noticed that the infamous
> s390-squash-mcss does not show up.

s390x-softmmu/qemu-system-s390x -machine s390-ccw-virtio,help

shows it, as does qom-list on /machine.

Output of <qemu binary> --help is very haphazard anyway and you should
rely on the interfaces above.

> 
> On the other hand to get the property displayed by -machine
> s390-ccw-virtio,help one needs a setter on the property. So I have
> created a fake setter which produces an error each time called.

Yes, this is fugly. A css object would be a better place, but way too
much work for now.

> 
> I would strongly prefer putting back the property to the device level!

I continue to strongly oppose that. The device is entirely the wrong
level.

> 
> v1 -> v2:
> * changed ccw bus id generation too (see commit message)
> * moved the property to the machine (see cover letter)
> * added a description to the property
> ---
>  hw/s390x/3270-ccw.c        |  2 +-
>  hw/s390x/css.c             | 28 ++++------------------------
>  hw/s390x/s390-ccw.c        |  2 +-
>  hw/s390x/s390-virtio-ccw.c | 21 +++++++++++++++++++++
>  hw/s390x/virtio-ccw.c      |  2 +-
>  include/hw/s390x/css.h     | 12 ++++--------
>  util/qemu-config.c         |  6 ++++++
>  7 files changed, 38 insertions(+), 35 deletions(-)
> 

> diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
> index 6a57f94197..b558b2adad 100644
> --- a/hw/s390x/s390-virtio-ccw.c
> +++ b/hw/s390x/s390-virtio-ccw.c
> @@ -556,6 +556,21 @@ static inline void machine_set_squash_mcss(Object *obj, bool value,
>      ms->s390_squash_mcss = value;
>  }
>  
> +static bool prop_get_true(Object *obj, Error **errp)
> +{
> +    return true;
> +}

I'd prefer to have something with 'cssid' in the name. An always-true
property should be rather the exception.

> +
> +/*
> + * This is a fake setter so the device shows up in the output of
> + * --machine s390-ccw-virtio,help.

Having a required setter is ugly. I wonder whether there's a better way
for non-modifiable properties. I doubt it, though.

> + */
> +static inline void prop_set_bool_fail(Object *obj, bool value,
> +                                           Error **errp)
> +{
> +    error_setg(errp, "Tried to set non-settable property!");
> +}
> +
>  static inline void s390_machine_initfn(Object *obj)
>  {
>      object_property_add_bool(obj, "aes-key-wrap",
> @@ -587,6 +602,12 @@ static inline void s390_machine_initfn(Object *obj)
>              "enable/disable squashing subchannels into the default css",
>              NULL);
>      object_property_set_bool(obj, false, "s390-squash-mcss", NULL);
> +    object_property_add_bool(obj, "cssid-unrestricted",
> +                                   prop_get_true, prop_set_bool_fail, NULL);
> +    object_property_set_description(obj, "cssid-unrestricted",
> +            "can use any cssid with any css device (the restriction,"
> +            " 0xfe for virtual and 0x00-0xfd for non-virtual, is no more)",
> +            NULL);
>  }
>  
>  static const TypeInfo ccw_machine_info = {

I suggest that you also update the comment for the squash-mcss
dependent css image creation in ccw_init().

> diff --git a/util/qemu-config.c b/util/qemu-config.c
> index 99b0e46fa3..acfc452fc2 100644
> --- a/util/qemu-config.c
> +++ b/util/qemu-config.c
> @@ -233,6 +233,12 @@ static QemuOptsList machine_opts = {
>              .help = "Up to 8 chars in set of [A-Za-z0-9. ](lower case chars"
>                      " converted to upper case) to pass to machine"
>                      " loader, boot manager, and guest kernel",
> +        },{
> +            /* TODO: Consider device property. This is way too ugly. */

It is ugly, but a device property is worse.

> +            .name = "cssid-unrestricted",
> +            .type = QEMU_OPT_BOOL,
> +            .help = "can use any cssid with any css device (the restriction,"
> +            " 0xfe for virtual and 0x00-0xfd for non-virtual, is no more)",

"Always true (a css device can use any cssid, regardless whether
virtual or not)"

?

>          },
>          { /* End of list */ }
>      }
Halil Pasic Nov. 29, 2017, 4:25 p.m. UTC | #5
On 11/29/2017 01:37 PM, Cornelia Huck wrote:
> On Tue, 28 Nov 2017 14:07:58 +0100
> Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
> 
>> The default css 0xfe is currently restricted to virtual subchannel
>> devices. The hope when the decision was made was, that non-virtual
>> subchannel devices will come around when guest can exploit multiple
> 
> s/guest/guests/

OK.

> 
>> channel subsystems. Since the guests generally don't do, the pain
> 
> s/the guests generally don't do/current guests don't do that/
> 

OK.

>> of the partitioned (cssid) namespace outweighs the gain.
>>
>> Let us remove the corresponding restrictions (virtual devices
>> can be put only in 0xfe and non-virtual devices in any css except
>> the 0xfe -- while s390-squash-mcss then remaps everything to cssid 0).
>>
>> With this change, however, our schema for generating a css bus ids, if
>> none is specified does not make much sense. Currently we start at cssid
>> 0x0 for non-virtual devices and use the default css (without
>> s390-squash-mcss exclusively) for virtual devices.  That means for
>> non-virtual device the device would auto-magically end up non-visible for
>> guests (which can't use the other channel subsystems).
>>
>> Thus let us also change the css bus id auto assignment algorithm,
>> so that we first fill the default css, and then proceed to the
>> next one (modulo MAX_CSSID).
> 
> Let's reword the previous two paragraphs:
> 
> "At the same time, change our schema for generating css bus ids to put
> both virtual and non-virtual devices into the default css (spilling
> over into other css images, if needed) so that devices without a
> specified id don't end up hidden to guests not supporting multiple
> channel subsystems."
> 

What I don't like about your explanation is, that "so that devices without
a specified id don't end up hidden to guests not supporting multiple channel 
subsystems" is not necessarily true: if we spill the devices are going
to end up hidden.

>>
>> The adverse effect of getting rid of the restriction on migration should
>> not be too severe.  Vfio-ccw devices are not live-migratable yet, and for
>> virtual devices using the extra freedom would only make sense with the
>> aforementioned guest support in place.
>>
>> The auto-generated bus ids are affected by both changes. We hope to not
>> encounter any auto-generated bus ids in production as Libvirt is always
>> explicit about the bus id.  Since 8ed179c937 ("s390x/css: catch section
>> mismatch on load", 2017-05-18) the worst that can happen because the same
>> device ended up having a different bus id is a cleanly failed migration.
>> I find it hard to reason about the impact of changed auto-generated bus
>> ids on migration for command line users as I don't know which rules is
>> such an user supposed to follow.
>>
>> Another pain-point is down- or upgrade of QEMU for command line users.
>> The old way and the new way of doing vfio-ccw are mutually incompatible.
>> Libvirt is only going to support the new way, so for libvirt users, the
>> possible problems at QEMU downgrade are the following. If a domain
>> contains virtual devices placed into a css different than 0xfe the domain
>> wil refuse to start with a QEMU not having this patch. Putting devices
>> into a css different that 0xfe however won't make much sense in the
>> near future (guest support). Libvirt will refuse to do vfio-ccw with
>> a QEMU not having this patch. This is business as usual.
> 
> All of this is valuable information, but I don't think a changelog is
> the right place for it. We should really have a place where we can also
> save things like Dong Jia's writeup downthread. In the documentation
> folder or on the QEMU wiki (or both). We can be much more verbose there
> (including examples, which make this stuff way easier to understand).
> I'd recommend adding a documentation file with this patch (or patch
> series, as I'd also like to see a squash-mcss deprecation patch).
> 

I tired to be quite elaborate, because at some point of the v1
discussion, you said if we are planting landmines you want them explained
in the commit message. I'm not sure how this document file is supposed
to be called, and look like. I think this stuff is relevant for
the decision why is this patch a good one, and what are the trade-offs
we make. Referencing to a document would be also OK with me.

Regarding the deprecation patch. It's already on the list as RFC. You
have even commented on it. I intend to make a v2 once we know what are
we going to do here.

>>
>> Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
>>
>> ---
>> Hi all!
>>
>> Moving the property to the machine turned out very ugly (IMHO).  Libvirt
>> detects machine properties via query-command-line-options.  This is
>> however decoupled from the existence of the actual property: one needs to
>> touch util/qemu-config.c (see patch) so the property shows up.
>> Furthermore this stuff is global. I've also noticed that the infamous
>> s390-squash-mcss does not show up.
> 
> s390x-softmmu/qemu-system-s390x -machine s390-ccw-virtio,help
> 
> shows it, as does qom-list on /machine.

For qom-list you need an instance. Libvirt probes such stuff using
(btw probing is done with machine type none, and qom-list /machine
should not list squash if we are running machine type none).

> 
> Output of <qemu binary> --help is very haphazard anyway and you should
> rely on the interfaces above.
> 

I disagree. AFAIK management software should probe using the
query-command-line-options QMP command. Or am I missing something?

I don't speak about the output of <qemu binary> --help here.

>>
>> On the other hand to get the property displayed by -machine
>> s390-ccw-virtio,help one needs a setter on the property. So I have
>> created a fake setter which produces an error each time called.
> 
> Yes, this is fugly. A css object would be a better place, but way too
> much work for now.
> 

Actually not necessarily. We could simply put this at the virtual-css-bridge.
I don't know if Libvirt would accept that though. The problem regarding
virtual-css-bridge (and css object) was rw properties: we can't set a value
for a property of the virtual-css-bridge on the command line. That was a
problem for default-css or whatever but is completely fine for the read
only property 'cssid-unrestricted'.

>>
>> I would strongly prefer putting back the property to the device level!
> 
> I continue to strongly oppose that. The device is entirely the wrong
> level.
> 

I don't recall you ever explaining why. If it's completely wrong
I would have expected Boris and also me being for long enough around
to get it at least after the first hint.

>>
>> v1 -> v2:
>> * changed ccw bus id generation too (see commit message)
>> * moved the property to the machine (see cover letter)
>> * added a description to the property
>> ---
>>  hw/s390x/3270-ccw.c        |  2 +-
>>  hw/s390x/css.c             | 28 ++++------------------------
>>  hw/s390x/s390-ccw.c        |  2 +-
>>  hw/s390x/s390-virtio-ccw.c | 21 +++++++++++++++++++++
>>  hw/s390x/virtio-ccw.c      |  2 +-
>>  include/hw/s390x/css.h     | 12 ++++--------
>>  util/qemu-config.c         |  6 ++++++
>>  7 files changed, 38 insertions(+), 35 deletions(-)
>>
> 
>> diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
>> index 6a57f94197..b558b2adad 100644
>> --- a/hw/s390x/s390-virtio-ccw.c
>> +++ b/hw/s390x/s390-virtio-ccw.c
>> @@ -556,6 +556,21 @@ static inline void machine_set_squash_mcss(Object *obj, bool value,
>>      ms->s390_squash_mcss = value;
>>  }
>>  
>> +static bool prop_get_true(Object *obj, Error **errp)
>> +{
>> +    return true;
>> +}
> 
> I'd prefer to have something with 'cssid' in the name. An always-true
> property should be rather the exception.
> 

Can do that. Makes things less obvious in my opinion, but whatever
you like.

>> +
>> +/*
>> + * This is a fake setter so the device shows up in the output of
>> + * --machine s390-ccw-virtio,help.
> 
> Having a required setter is ugly. I wonder whether there's a better way
> for non-modifiable properties. I doubt it, though.
> 

We can just specify NULL for the setter. The only thing we loose is
that the property won't be listed by --machine s390-ccw-virtio,help.

>> + */
>> +static inline void prop_set_bool_fail(Object *obj, bool value,
>> +                                           Error **errp)
>> +{
>> +    error_setg(errp, "Tried to set non-settable property!");
>> +}
>> +
>>  static inline void s390_machine_initfn(Object *obj)
>>  {
>>      object_property_add_bool(obj, "aes-key-wrap",
>> @@ -587,6 +602,12 @@ static inline void s390_machine_initfn(Object *obj)
>>              "enable/disable squashing subchannels into the default css",
>>              NULL);
>>      object_property_set_bool(obj, false, "s390-squash-mcss", NULL);
>> +    object_property_add_bool(obj, "cssid-unrestricted",
>> +                                   prop_get_true, prop_set_bool_fail, NULL);
>> +    object_property_set_description(obj, "cssid-unrestricted",
>> +            "can use any cssid with any css device (the restriction,"
>> +            " 0xfe for virtual and 0x00-0xfd for non-virtual, is no more)",
>> +            NULL);
>>  }
>>  
>>  static const TypeInfo ccw_machine_info = {
> 
> I suggest that you also update the comment for the squash-mcss
> dependent css image creation in ccw_init().
> 
>> diff --git a/util/qemu-config.c b/util/qemu-config.c
>> index 99b0e46fa3..acfc452fc2 100644
>> --- a/util/qemu-config.c
>> +++ b/util/qemu-config.c
>> @@ -233,6 +233,12 @@ static QemuOptsList machine_opts = {
>>              .help = "Up to 8 chars in set of [A-Za-z0-9. ](lower case chars"
>>                      " converted to upper case) to pass to machine"
>>                      " loader, boot manager, and guest kernel",
>> +        },{
>> +            /* TODO: Consider device property. This is way too ugly. */
> 
> It is ugly, but a device property is worse.
> 

I fail to see how. But I respect your opinion (in particular, and maintainer
opinions in general).

AFAIU you see a machine property like this as the way to go. Please
reinforce me, and I will do a non-rfc patch addressing your concerns
with this one.

I'm open to any suggestions, but I really think we should put effort
into being constructive here. I feel we have been running in a circle
for almost a week.

>> +            .name = "cssid-unrestricted",
>> +            .type = QEMU_OPT_BOOL,
>> +            .help = "can use any cssid with any css device (the restriction,"
>> +            " 0xfe for virtual and 0x00-0xfd for non-virtual, is no more)",
> 
> "Always true (a css device can use any cssid, regardless whether
> virtual or not)"
> 
> ?

Are you proposing this as description, help or both?  I do agree that
always true is important. Is read only important too?

Also consider:
# qemu-git -device vfio-ccw,help  2>&1 
vfio-ccw.devno=str (Identifier of an I/O device in the channel subsystem, example: fe.1.23ab)

I'm not sure the user is educated what a valid cssid is.

> 
>>          },
>>          { /* End of list */ }
>>      }
>
Halil Pasic Nov. 29, 2017, 4:30 p.m. UTC | #6
On 11/29/2017 12:47 PM, Cornelia Huck wrote:
> On Wed, 29 Nov 2017 16:17:35 +0800
> Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:
> 
>> * Halil Pasic <pasic@linux.vnet.ibm.com> [2017-11-28 14:07:58 +0100]:
>>
>> [...]
>>> The auto-generated bus ids are affected by both changes. We hope to not
>>> encounter any auto-generated bus ids in production as Libvirt is always
>>> explicit about the bus id.  Since 8ed179c937 ("s390x/css: catch section
>>> mismatch on load", 2017-05-18) the worst that can happen because the same
>>> device ended up having a different bus id is a cleanly failed migration.
>>> I find it hard to reason about the impact of changed auto-generated bus
>>> ids on migration for command line users as I don't know which rules is
>>> such an user supposed to follow.  
>> For this paragraph, Halil pointed to me a case that he is thinking of.
>> 1. VM configuration with 3 devices:
>>   -device virtio (e.g. virtio-blk-ccw,id=disk0)
>>   -device vfio-ccw (e.g. id=vfio0)
>>   -device virtio (e.g. virtio-rng-ccw,id=rng0)
>> 2. Start the vm.
>> 3. device_del vfio0
>> 4. migrate "exec:gzip -c > /tmp/tmp_vmstate.gz"
>> 5. modify cmd line from step 1 by removing the vfio0 device, and adding:
>>    -incoming "exec:gzip -c -d /tmp/tmp_vmstate.gz"
>>
>> Let me list my test results here for everybody's reference.
>>
>> W/o this patch
>> ==============
>>
>> ------------+---------------+-------------
>>             | squashing off | squashing on
>> ------------+---------------+-------------
>>     auto id |        F      |     F
>> ------------+---------------+-------------
>> explicit id |        F      |     S
>> ------------+---------------+-------------
>>
>> T1. squashing off + auto id
>>   qemu-system-s390x: vmstate: get_nullptr expected VMS_NULLPTR_MARKER
>>   qemu-system-s390x: Failed to load s390_css:css
>>   qemu-system-s390x: error while loading state for instance 0x0 of device 's390_css'
>>   qemu-system-s390x: load of migration failed: Invalid argument
>> [Fail due to css mismatch - there is no css 0 in the new vm.]
>>
>> T2. squashing off + explicit given id
>>   qemu-system-s390x: vmstate: get_nullptr expected VMS_NULLPTR_MARKER
>>   qemu-system-s390x: Failed to load s390_css:css
>>   qemu-system-s390x: error while loading state for instance 0x0 of device 's390_css'
>>   qemu-system-s390x: load of migration failed: Invalid argument
>> [Fail due to css mismatch - there is no css 0 in the new vm.]
> Hmm... so should we even try to migrate an empty css 0? It only exists
> because we have created a device that we had to detach anyway because
> it was non-migrateable...
> 
> [Probably no easy way to deal with this, though.]
> 

We could make the thing go away when the last device is gone.
I see a general problem with implicitly generated shared stuff.

Obviously we can't fix the past.

@Dong Jia:

Thanks for doing the experiments and publishing your findings.

Halil
Cornelia Huck Nov. 29, 2017, 5:29 p.m. UTC | #7
On Wed, 29 Nov 2017 17:25:59 +0100
Halil Pasic <pasic@linux.vnet.ibm.com> wrote:

> On 11/29/2017 01:37 PM, Cornelia Huck wrote:
> > On Tue, 28 Nov 2017 14:07:58 +0100
> > Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
> >   
> >> The default css 0xfe is currently restricted to virtual subchannel
> >> devices. The hope when the decision was made was, that non-virtual
> >> subchannel devices will come around when guest can exploit multiple  
> > 
> > s/guest/guests/  
> 
> OK.
> 
> >   
> >> channel subsystems. Since the guests generally don't do, the pain  
> > 
> > s/the guests generally don't do/current guests don't do that/
> >   
> 
> OK.
> 
> >> of the partitioned (cssid) namespace outweighs the gain.
> >>
> >> Let us remove the corresponding restrictions (virtual devices
> >> can be put only in 0xfe and non-virtual devices in any css except
> >> the 0xfe -- while s390-squash-mcss then remaps everything to cssid 0).
> >>
> >> With this change, however, our schema for generating a css bus ids, if
> >> none is specified does not make much sense. Currently we start at cssid
> >> 0x0 for non-virtual devices and use the default css (without
> >> s390-squash-mcss exclusively) for virtual devices.  That means for
> >> non-virtual device the device would auto-magically end up non-visible for
> >> guests (which can't use the other channel subsystems).
> >>
> >> Thus let us also change the css bus id auto assignment algorithm,
> >> so that we first fill the default css, and then proceed to the
> >> next one (modulo MAX_CSSID).  
> > 
> > Let's reword the previous two paragraphs:
> > 
> > "At the same time, change our schema for generating css bus ids to put
> > both virtual and non-virtual devices into the default css (spilling
> > over into other css images, if needed) so that devices without a
> > specified id don't end up hidden to guests not supporting multiple
> > channel subsystems."
> >   
> 
> What I don't like about your explanation is, that "so that devices without
> a specified id don't end up hidden to guests not supporting multiple channel 
> subsystems" is not necessarily true: if we spill the devices are going
> to end up hidden.

Let's make this "don't always end up hidden".

> 
> >>
> >> The adverse effect of getting rid of the restriction on migration should
> >> not be too severe.  Vfio-ccw devices are not live-migratable yet, and for
> >> virtual devices using the extra freedom would only make sense with the
> >> aforementioned guest support in place.
> >>
> >> The auto-generated bus ids are affected by both changes. We hope to not
> >> encounter any auto-generated bus ids in production as Libvirt is always
> >> explicit about the bus id.  Since 8ed179c937 ("s390x/css: catch section
> >> mismatch on load", 2017-05-18) the worst that can happen because the same
> >> device ended up having a different bus id is a cleanly failed migration.
> >> I find it hard to reason about the impact of changed auto-generated bus
> >> ids on migration for command line users as I don't know which rules is
> >> such an user supposed to follow.
> >>
> >> Another pain-point is down- or upgrade of QEMU for command line users.
> >> The old way and the new way of doing vfio-ccw are mutually incompatible.
> >> Libvirt is only going to support the new way, so for libvirt users, the
> >> possible problems at QEMU downgrade are the following. If a domain
> >> contains virtual devices placed into a css different than 0xfe the domain
> >> wil refuse to start with a QEMU not having this patch. Putting devices
> >> into a css different that 0xfe however won't make much sense in the
> >> near future (guest support). Libvirt will refuse to do vfio-ccw with
> >> a QEMU not having this patch. This is business as usual.  
> > 
> > All of this is valuable information, but I don't think a changelog is
> > the right place for it. We should really have a place where we can also
> > save things like Dong Jia's writeup downthread. In the documentation
> > folder or on the QEMU wiki (or both). We can be much more verbose there
> > (including examples, which make this stuff way easier to understand).
> > I'd recommend adding a documentation file with this patch (or patch
> > series, as I'd also like to see a squash-mcss deprecation patch).
> >   
> 
> I tired to be quite elaborate, because at some point of the v1
> discussion, you said if we are planting landmines you want them explained
> in the commit message. I'm not sure how this document file is supposed
> to be called, and look like. I think this stuff is relevant for
> the decision why is this patch a good one, and what are the trade-offs
> we make. Referencing to a document would be also OK with me.

I don't think there will be landmines left, no? Or do I misread?

> 
> Regarding the deprecation patch. It's already on the list as RFC. You
> have even commented on it. I intend to make a v2 once we know what are
> we going to do here.

This needs to be a patch series with a cover letter. Discussing in
multiple places is draining.

> 
> >>
> >> Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
> >>
> >> ---
> >> Hi all!
> >>
> >> Moving the property to the machine turned out very ugly (IMHO).  Libvirt
> >> detects machine properties via query-command-line-options.  This is
> >> however decoupled from the existence of the actual property: one needs to
> >> touch util/qemu-config.c (see patch) so the property shows up.
> >> Furthermore this stuff is global. I've also noticed that the infamous
> >> s390-squash-mcss does not show up.  
> > 
> > s390x-softmmu/qemu-system-s390x -machine s390-ccw-virtio,help
> > 
> > shows it, as does qom-list on /machine.  
> 
> For qom-list you need an instance. Libvirt probes such stuff using
> (btw probing is done with machine type none, and qom-list /machine
> should not list squash if we are running machine type none).
> 
> > 
> > Output of <qemu binary> --help is very haphazard anyway and you should
> > rely on the interfaces above.
> >   
> 
> I disagree. AFAIK management software should probe using the
> query-command-line-options QMP command. Or am I missing something?
> 
> I don't speak about the output of <qemu binary> --help here.

It's the same interface. It both over- and underreports. Querying the
actual object is the only way to get this reliable. If that is not
possible today, it needs to be implemented.

> 
> >>
> >> On the other hand to get the property displayed by -machine
> >> s390-ccw-virtio,help one needs a setter on the property. So I have
> >> created a fake setter which produces an error each time called.  
> > 
> > Yes, this is fugly. A css object would be a better place, but way too
> > much work for now.
> >   
> 
> Actually not necessarily. We could simply put this at the virtual-css-bridge.
> I don't know if Libvirt would accept that though. The problem regarding
> virtual-css-bridge (and css object) was rw properties: we can't set a value
> for a property of the virtual-css-bridge on the command line. That was a
> problem for default-css or whatever but is completely fine for the read
> only property 'cssid-unrestricted'.
> 
> >>
> >> I would strongly prefer putting back the property to the device level!  
> > 
> > I continue to strongly oppose that. The device is entirely the wrong
> > level.
> >   
> 
> I don't recall you ever explaining why. If it's completely wrong
> I would have expected Boris and also me being for long enough around
> to get it at least after the first hint.

Just once again: This is a property of the whole css/machine, not of
the individual device.

No more from me today.
Dong Jia Shi Nov. 30, 2017, 3:16 a.m. UTC | #8
* Halil Pasic <pasic@linux.vnet.ibm.com> [2017-11-29 17:30:15 +0100]:

> 
> 
> On 11/29/2017 12:47 PM, Cornelia Huck wrote:
> > On Wed, 29 Nov 2017 16:17:35 +0800
> > Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:
> > 
> >> * Halil Pasic <pasic@linux.vnet.ibm.com> [2017-11-28 14:07:58 +0100]:
> >>
> >> [...]
> >>> The auto-generated bus ids are affected by both changes. We hope to not
> >>> encounter any auto-generated bus ids in production as Libvirt is always
> >>> explicit about the bus id.  Since 8ed179c937 ("s390x/css: catch section
> >>> mismatch on load", 2017-05-18) the worst that can happen because the same
> >>> device ended up having a different bus id is a cleanly failed migration.
> >>> I find it hard to reason about the impact of changed auto-generated bus
> >>> ids on migration for command line users as I don't know which rules is
> >>> such an user supposed to follow.  
> >> For this paragraph, Halil pointed to me a case that he is thinking of.
> >> 1. VM configuration with 3 devices:
> >>   -device virtio (e.g. virtio-blk-ccw,id=disk0)
> >>   -device vfio-ccw (e.g. id=vfio0)
> >>   -device virtio (e.g. virtio-rng-ccw,id=rng0)
> >> 2. Start the vm.
> >> 3. device_del vfio0
> >> 4. migrate "exec:gzip -c > /tmp/tmp_vmstate.gz"
> >> 5. modify cmd line from step 1 by removing the vfio0 device, and adding:
> >>    -incoming "exec:gzip -c -d /tmp/tmp_vmstate.gz"
> >>
> >> Let me list my test results here for everybody's reference.
> >>
> >> W/o this patch
> >> ==============
> >>
> >> ------------+---------------+-------------
> >>             | squashing off | squashing on
> >> ------------+---------------+-------------
> >>     auto id |        F      |     F
> >> ------------+---------------+-------------
> >> explicit id |        F      |     S
> >> ------------+---------------+-------------
> >>
> >> T1. squashing off + auto id
> >>   qemu-system-s390x: vmstate: get_nullptr expected VMS_NULLPTR_MARKER
> >>   qemu-system-s390x: Failed to load s390_css:css
> >>   qemu-system-s390x: error while loading state for instance 0x0 of device 's390_css'
> >>   qemu-system-s390x: load of migration failed: Invalid argument
> >> [Fail due to css mismatch - there is no css 0 in the new vm.]
> >>
> >> T2. squashing off + explicit given id
> >>   qemu-system-s390x: vmstate: get_nullptr expected VMS_NULLPTR_MARKER
> >>   qemu-system-s390x: Failed to load s390_css:css
> >>   qemu-system-s390x: error while loading state for instance 0x0 of device 's390_css'
> >>   qemu-system-s390x: load of migration failed: Invalid argument
> >> [Fail due to css mismatch - there is no css 0 in the new vm.]
> > Hmm... so should we even try to migrate an empty css 0? It only exists
> > because we have created a device that we had to detach anyway because
> > it was non-migrateable...
> > 
> > [Probably no easy way to deal with this, though.]
> > 
> 
> We could make the thing go away when the last device is gone.
Is it possible to free the empty css in a .pre_save handler somewhere?

> I see a general problem with implicitly generated shared stuff.
> 
> Obviously we can't fix the past.
Nod.

> 
> @Dong Jia:
> 
> Thanks for doing the experiments and publishing your findings.
> 
Just want to ease the review. No need mention. :)
Dong Jia Shi Nov. 30, 2017, 3:29 a.m. UTC | #9
* Cornelia Huck <cohuck@redhat.com> [2017-11-29 12:47:47 +0100]:

[...]

> > With this patch
> > ===============
> > 
> > ------------+---------------+-------------
> >             | squashing off | squashing on
> > ------------+---------------+-------------
> >     auto id |        F      |     F
> > ------------+---------------+-------------
> > explicit id |        S'     |     S
> > ------------+---------------+-------------
> > 
> > T5. squashing off + auto id
> >   qemu-system-s390x: Unknown savevm section or instance '/fe.0.0003/virtio-rng' 0
> >   qemu-system-s390x: load of migration failed: Invalid argument
> > [Fail due to busid mismatch.]
> > 
> > T6. squashing off + explicit given id
> >   qemu-system-s390x: vmstate: get_nullptr expected VMS_NULLPTR_MARKER
> >   qemu-system-s390x: Failed to load s390_css:css
> >   qemu-system-s390x: error while loading state for instance 0x0 of device 's390_css'
> >   qemu-system-s390x: load of migration failed: Invalid argument
> > [Setting vfio-ccw.devno=non-fe.x.xxxx. (same as T1)
> >  Fail due to css mismatch - there is no css 0 in the new vm.]
> > 
> >   Succeed.
> > [Setting vfio-ccw.devno=fe.x.xxxx.]
> 
> Don't you need to attach the vfio-ccw device later anyway? You have to
> detach it from the source before you migrate, and I'd expect it to be
> symmetric.
Yes. After migrate, there is no problem to device_add the vfio-ccw
device (id=vfio0,devno=fe.x.xxxx). The result (succeed for this case)
does not change.

> 
> > 
> > T7. squashing on  + auto id
> >   qemu-system-s390x: Unknown savevm section or instance '/00.0.0003/virtio-rng' 0
> >   qemu-system-s390x: load of migration failed: Invalid argument
> > [Fail due to busid mismatch.]
> > 
> > T8. squashing on  + explicit given id
> >   Succeed.
> > 
> > 
> > Notice:
> > The differences of the test results between w and w/o this patch are in
> > the "squashing off" cases. I think these are things that we can accept.
> 
> Yes, I think that makes sense. If you want reliable migration, you need
> to be specific with your ids. I'd just don't want us to break things
> explicitly.
> 
Fair enough. Got the message.
Halil Pasic Nov. 30, 2017, 12:32 p.m. UTC | #10
On 11/29/2017 06:29 PM, Cornelia Huck wrote:
[..]
>>>> With this change, however, our schema for generating a css bus ids, if
>>>> none is specified does not make much sense. Currently we start at cssid
>>>> 0x0 for non-virtual devices and use the default css (without
>>>> s390-squash-mcss exclusively) for virtual devices.  That means for
>>>> non-virtual device the device would auto-magically end up non-visible for
>>>> guests (which can't use the other channel subsystems).
>>>>
>>>> Thus let us also change the css bus id auto assignment algorithm,
>>>> so that we first fill the default css, and then proceed to the
>>>> next one (modulo MAX_CSSID).  
>>>
>>> Let's reword the previous two paragraphs:
>>>
>>> "At the same time, change our schema for generating css bus ids to put
>>> both virtual and non-virtual devices into the default css (spilling
>>> over into other css images, if needed) so that devices without a
>>> specified id don't end up hidden to guests not supporting multiple
>>> channel subsystems."
>>>   
>>
>> What I don't like about your explanation is, that "so that devices without
>> a specified id don't end up hidden to guests not supporting multiple channel 
>> subsystems" is not necessarily true: if we spill the devices are going
>> to end up hidden.
> 
> Let's make this "don't always end up hidden".

I would prefer to go with this.


At the same time, change our schema for generating css bus ids to put
both virtual and non-virtual devices into the default css (spilling
over into other css images, if needed). The intention is to deprecate
s390-squash-mcss. Whit this change devices without a specified devno won't
end up hidden to guests not supporting multiple channel subsystems, unless
this can not be avoided (default css full).

> 
>>
>>>>
>>>> The adverse effect of getting rid of the restriction on migration should
>>>> not be too severe.  Vfio-ccw devices are not live-migratable yet, and for
>>>> virtual devices using the extra freedom would only make sense with the
>>>> aforementioned guest support in place.
>>>>
>>>> The auto-generated bus ids are affected by both changes. We hope to not


[..]
>>
>> I tired to be quite elaborate, because at some point of the v1
>> discussion, you said if we are planting landmines you want them explained
>> in the commit message. I'm not sure how this document file is supposed
>> to be called, and look like. I think this stuff is relevant for
>> the decision why is this patch a good one, and what are the trade-offs
>> we make. Referencing to a document would be also OK with me.
> 
> I don't think there will be landmines left, no? Or do I misread?
>

The patch basically just got worse with changing the schema. If there
were landmines they are still there. My original point was that it's
bearable in practice, so that's why I had this short and vague ain't
too bad formulation.
 
>>
>> Regarding the deprecation patch. It's already on the list as RFC. You
>> have even commented on it. I intend to make a v2 once we know what are
>> we going to do here.
> 
> This needs to be a patch series with a cover letter. Discussing in
> multiple places is draining.
> 

Can do. No problem for me.

>>
>>>>
>>>> Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
>>>>
>>>> ---
>>>> Hi all!
>>>>
>>>> Moving the property to the machine turned out very ugly (IMHO).  Libvirt
>>>> detects machine properties via query-command-line-options.  This is
>>>> however decoupled from the existence of the actual property: one needs to
>>>> touch util/qemu-config.c (see patch) so the property shows up.
>>>> Furthermore this stuff is global. I've also noticed that the infamous
>>>> s390-squash-mcss does not show up.  
>>>
>>> s390x-softmmu/qemu-system-s390x -machine s390-ccw-virtio,help
>>>
>>> shows it, as does qom-list on /machine.  
>>
>> For qom-list you need an instance. Libvirt probes such stuff using
>> (btw probing is done with machine type none, and qom-list /machine
>> should not list squash if we are running machine type none).
>>
>>>
>>> Output of <qemu binary> --help is very haphazard anyway and you should
>>> rely on the interfaces above.
>>>   
>>
>> I disagree. AFAIK management software should probe using the
>> query-command-line-options QMP command. Or am I missing something?
>>
>> I don't speak about the output of <qemu binary> --help here.
> 
> It's the same interface. It both over- and underreports. Querying the
> actual object is the only way to get this reliable. If that is not
> possible today, it needs to be implemented.
>

Sorry I did not look into how <qemu binary> --help is implemented. For
what we are trying to accomplish here it's irrelevant. Wonder why did you
have to bring this up and make the discussion even more complicated.
 
>>
>>>>
>>>> On the other hand to get the property displayed by -machine
>>>> s390-ccw-virtio,help one needs a setter on the property. So I have
>>>> created a fake setter which produces an error each time called.  
>>>
>>> Yes, this is fugly. A css object would be a better place, but way too
>>> much work for now.
>>>   
>>
>> Actually not necessarily. We could simply put this at the virtual-css-bridge.
>> I don't know if Libvirt would accept that though. The problem regarding
>> virtual-css-bridge (and css object) was rw properties: we can't set a value
>> for a property of the virtual-css-bridge on the command line. That was a
>> problem for default-css or whatever but is completely fine for the read
>> only property 'cssid-unrestricted'.
>>
>>>>
>>>> I would strongly prefer putting back the property to the device level!  
>>>
>>> I continue to strongly oppose that. The device is entirely the wrong
>>> level.
>>>   
>>
>> I don't recall you ever explaining why. If it's completely wrong
>> I would have expected Boris and also me being for long enough around
>> to get it at least after the first hint.
> 
> Just once again: This is a property of the whole css/machine, not of
> the individual device.
> 
> No more from me today.
>

Hope there is more form you coming today. :) I would like to know if
I'm on the right track with the machine property. You did not comment
on a couple of my points, among others this question. This libvirt
may not be using the right interface for discovering machine properties,
and the right interface may note even exist in QEMU just got me more
insecure.

Regards,
Halil
Cornelia Huck Nov. 30, 2017, 1:32 p.m. UTC | #11
On Thu, 30 Nov 2017 13:32:12 +0100
Halil Pasic <pasic@linux.vnet.ibm.com> wrote:

(...)

Before I spend way too much time on this:

Is the proposed machine-property interface usable from a libvirt POV?
IOW, can we go with this now and fix the ugliness later (probably via a
generic overhaul of the interface)?
Halil Pasic Dec. 1, 2017, 2:38 p.m. UTC | #12
On 11/30/2017 02:32 PM, Cornelia Huck wrote:
> On Thu, 30 Nov 2017 13:32:12 +0100
> Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
> 
> (...)
> 
> Before I spend way too much time on this:
> 
> Is the proposed machine-property interface usable from a libvirt POV?
> IOW, can we go with this now and fix the ugliness later (probably via a
> generic overhaul of the interface)?
> 

I've talked to Boris and Christian yesterday, and we agreed putting
the read only property to the virtual-css-bridge. It's kind of the
'css object' variant all seemed quite happy with, and since the property
is read only, not user creatable is not a problem.

Thus, from our perspective, this question is regarded obsolete at
the moment (this may change, should you turn out to be against having
the property at the virtual-css-bridge).

I've just sent a series named 'unrestrict cssids related patches'.
Could you please have a look?

Regards,
Halil
diff mbox series

Patch

diff --git a/hw/s390x/3270-ccw.c b/hw/s390x/3270-ccw.c
index 081e3ef6f4..3af13ea027 100644
--- a/hw/s390x/3270-ccw.c
+++ b/hw/s390x/3270-ccw.c
@@ -104,7 +104,7 @@  static void emulated_ccw_3270_realize(DeviceState *ds, Error **errp)
     SubchDev *sch;
     Error *err = NULL;
 
-    sch = css_create_sch(cdev->devno, true, cbus->squash_mcss, errp);
+    sch = css_create_sch(cdev->devno, cbus->squash_mcss, errp);
     if (!sch) {
         return;
     }
diff --git a/hw/s390x/css.c b/hw/s390x/css.c
index f6b5c807cd..cd26f32050 100644
--- a/hw/s390x/css.c
+++ b/hw/s390x/css.c
@@ -2370,22 +2370,12 @@  const PropertyInfo css_devid_ro_propinfo = {
     .get = get_css_devid,
 };
 
-SubchDev *css_create_sch(CssDevId bus_id, bool is_virtual, bool squash_mcss,
-                         Error **errp)
+SubchDev *css_create_sch(CssDevId bus_id, bool squash_mcss, Error **errp)
 {
     uint16_t schid = 0;
     SubchDev *sch;
 
     if (bus_id.valid) {
-        if (is_virtual != (bus_id.cssid == VIRTUAL_CSSID)) {
-            error_setg(errp, "cssid %hhx not valid for %s devices",
-                       bus_id.cssid,
-                       (is_virtual ? "virtual" : "non-virtual"));
-            return NULL;
-        }
-    }
-
-    if (bus_id.valid) {
         if (squash_mcss) {
             bus_id.cssid = channel_subsys.default_cssid;
         } else if (!channel_subsys.css[bus_id.cssid]) {
@@ -2396,19 +2386,8 @@  SubchDev *css_create_sch(CssDevId bus_id, bool is_virtual, bool squash_mcss,
                                            bus_id.devid, &schid, errp)) {
             return NULL;
         }
-    } else if (squash_mcss || is_virtual) {
-        bus_id.cssid = channel_subsys.default_cssid;
-
-        if (!css_find_free_subch_and_devno(bus_id.cssid, &bus_id.ssid,
-                                           &bus_id.devid, &schid, errp)) {
-            return NULL;
-        }
     } else {
-        for (bus_id.cssid = 0; bus_id.cssid < MAX_CSSID; ++bus_id.cssid) {
-            if (bus_id.cssid == VIRTUAL_CSSID) {
-                continue;
-            }
-
+        for (bus_id.cssid = channel_subsys.default_cssid;;) {
             if (!channel_subsys.css[bus_id.cssid]) {
                 css_create_css_image(bus_id.cssid, false);
             }
@@ -2418,7 +2397,8 @@  SubchDev *css_create_sch(CssDevId bus_id, bool is_virtual, bool squash_mcss,
                                                 NULL)) {
                 break;
             }
-            if (bus_id.cssid == MAX_CSSID) {
+            bus_id.cssid = (bus_id.cssid + 1) % MAX_CSSID;
+            if (bus_id.cssid == channel_subsys.default_cssid) {
                 error_setg(errp, "Virtual channel subsystem is full!");
                 return NULL;
             }
diff --git a/hw/s390x/s390-ccw.c b/hw/s390x/s390-ccw.c
index 0ef232ec27..4a9d4d2534 100644
--- a/hw/s390x/s390-ccw.c
+++ b/hw/s390x/s390-ccw.c
@@ -77,7 +77,7 @@  static void s390_ccw_realize(S390CCWDevice *cdev, char *sysfsdev, Error **errp)
         goto out_err_propagate;
     }
 
-    sch = css_create_sch(ccw_dev->devno, false, cbus->squash_mcss, &err);
+    sch = css_create_sch(ccw_dev->devno, cbus->squash_mcss, &err);
     if (!sch) {
         goto out_mdevid_free;
     }
diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 6a57f94197..b558b2adad 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -556,6 +556,21 @@  static inline void machine_set_squash_mcss(Object *obj, bool value,
     ms->s390_squash_mcss = value;
 }
 
+static bool prop_get_true(Object *obj, Error **errp)
+{
+    return true;
+}
+
+/*
+ * This is a fake setter so the device shows up in the output of
+ * --machine s390-ccw-virtio,help.
+ */
+static inline void prop_set_bool_fail(Object *obj, bool value,
+                                           Error **errp)
+{
+    error_setg(errp, "Tried to set non-settable property!");
+}
+
 static inline void s390_machine_initfn(Object *obj)
 {
     object_property_add_bool(obj, "aes-key-wrap",
@@ -587,6 +602,12 @@  static inline void s390_machine_initfn(Object *obj)
             "enable/disable squashing subchannels into the default css",
             NULL);
     object_property_set_bool(obj, false, "s390-squash-mcss", NULL);
+    object_property_add_bool(obj, "cssid-unrestricted",
+                                   prop_get_true, prop_set_bool_fail, NULL);
+    object_property_set_description(obj, "cssid-unrestricted",
+            "can use any cssid with any css device (the restriction,"
+            " 0xfe for virtual and 0x00-0xfd for non-virtual, is no more)",
+            NULL);
 }
 
 static const TypeInfo ccw_machine_info = {
diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index 184515ce94..3dd902a664 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -701,7 +701,7 @@  static void virtio_ccw_device_realize(VirtioCcwDevice *dev, Error **errp)
     SubchDev *sch;
     Error *err = NULL;
 
-    sch = css_create_sch(ccw_dev->devno, true, cbus->squash_mcss, errp);
+    sch = css_create_sch(ccw_dev->devno, cbus->squash_mcss, errp);
     if (!sch) {
         return;
     }
diff --git a/include/hw/s390x/css.h b/include/hw/s390x/css.h
index ab6ebe66b5..53c270a216 100644
--- a/include/hw/s390x/css.h
+++ b/include/hw/s390x/css.h
@@ -272,12 +272,9 @@  extern const PropertyInfo css_devid_ro_propinfo;
  * default css image for it.
  * If @p bus_id is valid, and @p squash_mcss is false, verify that it is
  * not already in use, and find a free devno for it.
- * If @p bus_id is not valid, and if either @p squash_mcss or @p is_virtual
- * is true, find a free subchannel id and device number across all
- * subchannel sets from the default css image.
- * If @p bus_id is not valid, and if both @p squash_mcss and @p is_virtual
- * are false, find a non-full css image and find a free subchannel id and
- * device number across all subchannel sets from it.
+ * If @p bus_id is not valid find a free subchannel id and device number
+ * across all subchannel sets and all css images starting from the default
+ * css image.
  *
  * If either of the former actions succeed, allocate a subchannel structure,
  * initialise it with the bus id, subchannel id and device number, register
@@ -286,8 +283,7 @@  extern const PropertyInfo css_devid_ro_propinfo;
  * The caller becomes owner of the returned subchannel structure and
  * is responsible for unregistering and freeing it.
  */
-SubchDev *css_create_sch(CssDevId bus_id, bool is_virtual, bool squash_mcss,
-                         Error **errp);
+SubchDev *css_create_sch(CssDevId bus_id, bool squash_mcss, Error **errp);
 
 /** Turn on css migration */
 void css_register_vmstate(void);
diff --git a/util/qemu-config.c b/util/qemu-config.c
index 99b0e46fa3..acfc452fc2 100644
--- a/util/qemu-config.c
+++ b/util/qemu-config.c
@@ -233,6 +233,12 @@  static QemuOptsList machine_opts = {
             .help = "Up to 8 chars in set of [A-Za-z0-9. ](lower case chars"
                     " converted to upper case) to pass to machine"
                     " loader, boot manager, and guest kernel",
+        },{
+            /* TODO: Consider device property. This is way too ugly. */
+            .name = "cssid-unrestricted",
+            .type = QEMU_OPT_BOOL,
+            .help = "can use any cssid with any css device (the restriction,"
+            " 0xfe for virtual and 0x00-0xfd for non-virtual, is no more)",
         },
         { /* End of list */ }
     }