diff mbox series

[v3,1/5] qga: win32: fix crashes when PCI info cannot be retrived

Message ID a8cf3b35e7559bfaf20f2d71a0b4809fdd4f38e6.1536320522.git.tgolembi@redhat.com
State New
Headers show
Series qga: report serial number and disk node | expand

Commit Message

Tomáš Golembiovský Sept. 7, 2018, 11:42 a.m. UTC
The guest-get-fsinfo command collects also information about PCI
controller where the disk is attached. When this fails for some reasons
it tries to return just the partial information. However in certain
cases the pointer to the structure was not initialized and was set to
NULL. This breaks the serializer and leads to a crash of the guest agent.

Signed-off-by: Tomáš Golembiovský <tgolembi@redhat.com>
---
 qga/commands-win32.c | 27 ++++++++++++++++++++++-----
 1 file changed, 22 insertions(+), 5 deletions(-)

Comments

Michael Roth Sept. 26, 2018, 5:15 p.m. UTC | #1
Quoting Tomáš Golembiovský (2018-09-07 06:42:09)
> The guest-get-fsinfo command collects also information about PCI
> controller where the disk is attached. When this fails for some reasons
> it tries to return just the partial information. However in certain
> cases the pointer to the structure was not initialized and was set to
> NULL. This breaks the serializer and leads to a crash of the guest agent.
> 
> Signed-off-by: Tomáš Golembiovský <tgolembi@redhat.com>

For a win10 guest started with:

qemu-system-x86_64 -m 2G -smp 2,cores=2,sockets=1 -drive file=/home/mdroth/vm/win10_pro_n_snap0.qcow2,if=virtio -drive file=/home/mdroth/vm/virtio-win-0.1.102.iso,if=ide,media=cdrom -rtc base=localtime,driftfix=slew -vga std -boot d -name vm4 -netdev tap,script=/etc/qemu-ifup,vhost=on,id=vnet0 -device virtio-net-pci,mac=52:54:00:12:34:04,id=vnic0,netdev=vnet0,disable-modern=true -vnc :4 -device virtio-serial -balloon virtio -mon chardev=hmp0 -chardev socket,path=/tmp/vm4-hmp0.sock,server,nowait,id=hmp0 -mon chardev=qmp0,mode=control -chardev socket,path=/tmp/vm4-qmp0.sock,server,nowait,id=qmp0 -device virtserialport,chardev=vs0,name=vs0,id=vs_vs0 -chardev socket,path=/tmp/vm4-vs0.sock,server,nowait,id=vs0 -device virtserialport,chardev=vs1,name=vs1,id=vs_vs1 -chardev socket,path=/tmp/vm4-vs1.sock,server,nowait,id=vs1 -device virtserialport,chardev=qga,name=org.qemu.guest_agent.0,id=vs_qga -chardev socket,path=/tmp/vm4-qga.sock,server,nowait,id=qga -device isa-serial,chardev=serial0,id=serial_serial0 -chardev socket,path=/tmp/vm4-serial0.sock,server,nowait,id=serial0 -L /home/mdroth/w/build/qemu-2.11.2-build/pc-bios --enable-kvm

this yields the following:

{'execute':'guest-get-fsinfo'}
{"return": [{"name": "\\\\?\\Volume{83835b2d-0032-11e6-a84f-806e6f6e6963}\\", "total-bytes": 160755712, "mountpoint": "D:\\", "disk": [{"bus-type": "ide", "bus": 0, "unit": 0, "pci-controller": {"bus": 0, "slot": 0, "domain": 0, "function": 0}, "target": 0}], "used-bytes": 160755712, "type": "CDFS"}, {"name": "\\\\?\\Volume{2ea839c6-0000-0000-0000-80620c000000}\\", "mountpoint": "System Reserved", "disk": [{"bus-type": "scsi", "bus": 0, "unit": 0, "pci-controller": {"bus": 0, "slot": 0, "domain": 0, "function": 0}, "target": 0}], "type": "NTFS"}, {"name": "\\\\?\\Volume{2ea839c6-0000-0000-0000-501f00000000}\\", "total-bytes": 52665839616, "mountpoint": "C:\\", "disk": [{"bus-type": "scsi", "bus": 0, "unit": 0, "pci-controller": {"bus": 0, "slot": 0, "domain": 0, "function": 0}, "target": 0}], "used-bytes": 25265487872, "type": "NTFS"}, {"name": "\\\\?\\Volume{2ea839c6-0000-0000-0000-100000000000}\\", "mountpoint": "System Reserved", "disk": [{"bus-type": "scsi", "bus": 0, "unit": 0, "pci-controller": {"bus": 0, "slot": 0, "domain": 0, "function": 0}, "target": 0}], "type": "NTFS"}]}

domain/bus/slot/function=0 are valid PCI addresses so initializing to 0 is
wrong. Sameeh had a previous series that initializes to -1 that I think
is more appropriate (it hasn't gone in yet since we opted not to enable
CONFIG_QGA_NTDDSCSI for 3.0 since the PCI stuff seems be generally
broken for Windows, also because the 2nd patch needs some fixups:

  https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg07500.html

With that series (and some fixups I have on top at
https://github.com/mdroth/qemu/commits/qga-test), we get the following
output:

{'execute':'guest-get-fsinfo'}
{"return": [{"name": "\\\\?\\Volume{83835b2d-0032-11e6-a84f-806e6f6e6963}\\", "total-bytes": 160755712, "mountpoint": "D:\\", "disk": [{"bus-type": "ide", "bus": 0, "unit": 0, "pci-controller": {"bus": -1, "slot": -1, "domain": -1, "function": -1}, "target": 0}], "used-bytes": 160755712, "type": "CDFS"}, {"name": "\\\\?\\Volume{2ea839c6-0000-0000-0000-80620c000000}\\", "mountpoint": "System Reserved", "disk": [{"bus-type": "scsi", "bus": 0, "unit": 0, "pci-controller": {"bus": -1, "slot": -1, "domain": 0, "function": 3}, "target": 0}], "type": "NTFS"}, {"name": "\\\\?\\Volume{2ea839c6-0000-0000-0000-501f00000000}\\", "total-bytes": 52665839616, "mountpoint": "C:\\", "disk": [{"bus-type": "scsi", "bus": 0, "unit": 0, "pci-controller": {"bus": -1, "slot": -1, "domain": 0, "function": 2}, "target": 0}], "used-bytes": 25267560448, "type": "NTFS"}, {"name": "\\\\?\\Volume{2ea839c6-0000-0000-0000-100000000000}\\", "mountpoint": "System Reserved", "disk": [{"bus-type": "scsi", "bus": 0, "unit": 0, "pci-controller": {"bus": -1, "slot": -1, "domain": 0, "function": 1}, "target": 0}], "type": "NTFS"}]}

Here we see the non-sensical PCI topology info I mentioned previously.
There are values like '"function": 3' even though there are no
multifunction devices present. This will be exposed to users if we
enable CONFIG_QGA_NTDDSCSI with things as they stand. Currently we
just get an empty array for "disk" field of GuestFilesystemInfo
for w32, which fortunately aligns with the current QAPI schema (it's
an array since the volume can span multiple disks). I'm not about the
SCSI bus/unit/target data either. It's a real mess (maybe things
work a bit better when an actual SCSI controller is used?) and I'm
not sure why/how I didn't notice during initial testing.

I think all this needs to be addressed before we enable
CONFIG_QGA_NTDDSCSI (in terms of what that's used for in the current
code at least, i.e. enabling everything reported by
GuestFilesystemInfo.disk).

What is the oVirt use-case? It doesn't seem like you need PCI topology,
but what about SCSI topology (and if so, does that information seem
correct to you)? Or is just a list a serials/dev paths by themselves
all that's needed? Trying to see how we might decouple things from the
broken PCI topology stuff. May need to deprecate
GuestFilesystemInfo.disk in favor of something less monolithic if all
of this isn't fixable in a reasonable-enough timeframe.


> ---
>  qga/commands-win32.c | 27 ++++++++++++++++++++++-----
>  1 file changed, 22 insertions(+), 5 deletions(-)
> 
> diff --git a/qga/commands-win32.c b/qga/commands-win32.c
> index 98d9735389..9c959122d9 100644
> --- a/qga/commands-win32.c
> +++ b/qga/commands-win32.c
> @@ -633,15 +633,32 @@ static GuestDiskAddressList *build_guest_disk_info(char *guid, Error **errp)
>           * https://technet.microsoft.com/en-us/library/ee851589(v=ws.10).aspx */
>          if (DeviceIoControl(vol_h, IOCTL_SCSI_GET_ADDRESS, NULL, 0, scsi_ad,
>                              sizeof(SCSI_ADDRESS), &len, NULL)) {
> +            Error *local_err = NULL;
>              disk->unit = addr.Lun;
>              disk->target = addr.TargetId;
>              disk->bus = addr.PathId;
> -            disk->pci_controller = get_pci_info(name, errp);
> +            g_debug("unit=%lld target=%lld bus=%lld",
> +                disk->unit, disk->target, disk->bus);
> +            disk->pci_controller = get_pci_info(name, &local_err);
> +
> +            if (local_err) {
> +                g_debug("failed to get PCI controller info: %s",
> +                    error_get_pretty(local_err));
> +                error_free(local_err);
> +            } else if (disk->pci_controller != NULL) {
> +                g_debug("pci: domain=%lld bus=%lld slot=%lld function=%lld",
> +                    disk->pci_controller->domain,
> +                    disk->pci_controller->bus,
> +                    disk->pci_controller->slot,
> +                    disk->pci_controller->function);
> +            }
>          }
> -        /* We do not set error in this case, because we still have enough
> -         * information about volume. */
> -    } else {
> -         disk->pci_controller = NULL;
> +    }
> +    /* We do not set error in case pci_controller is NULL, because we still
> +     * have enough information about volume. */
> +    if (disk->pci_controller == NULL) {
> +        g_debug("no PCI controller info");
> +        disk->pci_controller = g_malloc0(sizeof(GuestPCIAddress));
>      }
> 
>      list = g_malloc0(sizeof(*list));
> -- 
> 2.18.0
>
Tomáš Golembiovský Sept. 27, 2018, 9:06 a.m. UTC | #2
Hi Michael,

thanks for looking into this. My comments are below.

Adding Sameeh...


On Wed, 26 Sep 2018 12:15:48 -0500
Michael Roth <mdroth@linux.vnet.ibm.com> wrote:

> Quoting Tomáš Golembiovský (2018-09-07 06:42:09)
> > The guest-get-fsinfo command collects also information about PCI
> > controller where the disk is attached. When this fails for some reasons
> > it tries to return just the partial information. However in certain
> > cases the pointer to the structure was not initialized and was set to
> > NULL. This breaks the serializer and leads to a crash of the guest agent.
> > 
> > Signed-off-by: Tomáš Golembiovský <tgolembi@redhat.com>  
> 
> For a win10 guest started with:
> 
> qemu-system-x86_64 -m 2G -smp 2,cores=2,sockets=1 -drive file=/home/mdroth/vm/win10_pro_n_snap0.qcow2,if=virtio -drive file=/home/mdroth/vm/virtio-win-0.1.102.iso,if=ide,media=cdrom -rtc base=localtime,driftfix=slew -vga std -boot d -name vm4 -netdev tap,script=/etc/qemu-ifup,vhost=on,id=vnet0 -device virtio-net-pci,mac=52:54:00:12:34:04,id=vnic0,netdev=vnet0,disable-modern=true -vnc :4 -device virtio-serial -balloon virtio -mon chardev=hmp0 -chardev socket,path=/tmp/vm4-hmp0.sock,server,nowait,id=hmp0 -mon chardev=qmp0,mode=control -chardev socket,path=/tmp/vm4-qmp0.sock,server,nowait,id=qmp0 -device virtserialport,chardev=vs0,name=vs0,id=vs_vs0 -chardev socket,path=/tmp/vm4-vs0.sock,server,nowait,id=vs0 -device virtserialport,chardev=vs1,name=vs1,id=vs_vs1 -chardev socket,path=/tmp/vm4-vs1.sock,server,nowait,id=vs1 -device virtserialport,chardev=qga,name=org.qemu.guest_agent.0,id=vs_qga -chardev socket,path=/tmp/vm4-qga.sock,server,nowait,id=qga -device isa-serial,chardev=serial0,id=serial_serial0 -chardev socket,path=/tmp/vm4-serial0.sock,server,nowait,id=serial0 -L /home/mdroth/w/build/qemu-2.11.2-build/pc-bios --enable-kvm
> 
> this yields the following:
> 
> {'execute':'guest-get-fsinfo'}
> {"return": [{"name": "\\\\?\\Volume{83835b2d-0032-11e6-a84f-806e6f6e6963}\\", "total-bytes": 160755712, "mountpoint": "D:\\", "disk": [{"bus-type": "ide", "bus": 0, "unit": 0, "pci-controller": {"bus": 0, "slot": 0, "domain": 0, "function": 0}, "target": 0}], "used-bytes": 160755712, "type": "CDFS"}, {"name": "\\\\?\\Volume{2ea839c6-0000-0000-0000-80620c000000}\\", "mountpoint": "System Reserved", "disk": [{"bus-type": "scsi", "bus": 0, "unit": 0, "pci-controller": {"bus": 0, "slot": 0, "domain": 0, "function": 0}, "target": 0}], "type": "NTFS"}, {"name": "\\\\?\\Volume{2ea839c6-0000-0000-0000-501f00000000}\\", "total-bytes": 52665839616, "mountpoint": "C:\\", "disk": [{"bus-type": "scsi", "bus": 0, "unit": 0, "pci-controller": {"bus": 0, "slot": 0, "domain": 0, "function": 0}, "target": 0}], "used-bytes": 25265487872, "type": "NTFS"}, {"name": "\\\\?\\Volume{2ea839c6-0000-0000-0000-100000000000}\\", "mountpoint": "System Reserved", "disk": [{"bus-type": "scsi", "bus": 0, "unit": 0, "pci-controller": {"bus": 0, "slot": 0, "domain": 0, "function": 0}, "target": 0}], "type": "NTFS"}]}
> 
> domain/bus/slot/function=0 are valid PCI addresses so initializing to 0 is
> wrong. Sameeh had a previous series that initializes to -1 that I think
> is more appropriate (it hasn't gone in yet since we opted not to enable
> CONFIG_QGA_NTDDSCSI for 3.0 since the PCI stuff seems be generally
> broken for Windows, also because the 2nd patch needs some fixups:
> 
>   https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg07500.html

I wasn't aware of that. I is trying to "fix" the same issue.

I've been also thinking about using -1, but I didn't know what is/isn't
correct PCI address.

> 
> With that series (and some fixups I have on top at
> https://github.com/mdroth/qemu/commits/qga-test), we get the following
> output:

Should I rebase on that and drop my patch?

> 
> {'execute':'guest-get-fsinfo'}
> {"return": [{"name": "\\\\?\\Volume{83835b2d-0032-11e6-a84f-806e6f6e6963}\\", "total-bytes": 160755712, "mountpoint": "D:\\", "disk": [{"bus-type": "ide", "bus": 0, "unit": 0, "pci-controller": {"bus": -1, "slot": -1, "domain": -1, "function": -1}, "target": 0}], "used-bytes": 160755712, "type": "CDFS"}, {"name": "\\\\?\\Volume{2ea839c6-0000-0000-0000-80620c000000}\\", "mountpoint": "System Reserved", "disk": [{"bus-type": "scsi", "bus": 0, "unit": 0, "pci-controller": {"bus": -1, "slot": -1, "domain": 0, "function": 3}, "target": 0}], "type": "NTFS"}, {"name": "\\\\?\\Volume{2ea839c6-0000-0000-0000-501f00000000}\\", "total-bytes": 52665839616, "mountpoint": "C:\\", "disk": [{"bus-type": "scsi", "bus": 0, "unit": 0, "pci-controller": {"bus": -1, "slot": -1, "domain": 0, "function": 2}, "target": 0}], "used-bytes": 25267560448, "type": "NTFS"}, {"name": "\\\\?\\Volume{2ea839c6-0000-0000-0000-100000000000}\\", "mountpoint": "System Reserved", "disk": [{"bus-type": "scsi", "bus": 0, "unit": 0, "pci-controller": {"bus": -1, "slot": -1, "domain": 0, "function": 1}, "target": 0}], "type": "NTFS"}]}
> 
> Here we see the non-sensical PCI topology info I mentioned previously.
> There are values like '"function": 3' even though there are no
> multifunction devices present. This will be exposed to users if we
> enable CONFIG_QGA_NTDDSCSI with things as they stand.

Sadly, I have no idea how to properly fix this code. But patch 5 of the
series IMHO brings it one step closer to proper solution. The original
code tries to fetch the addr/bus/domain/slot for volume handle (which
fails on missing properties) instead of disk handle.

The remaining issue is that when listing the disks the entries are
cryptic names like "\Device\0000001d" instead of "\PhysicalDriveX" or
"\HarddiskX". And I don't know how to map one name to the other.


> Currently we
> just get an empty array for "disk" field of GuestFilesystemInfo
> for w32, which fortunately aligns with the current QAPI schema (it's
> an array since the volume can span multiple disks).

No. You get array with one item and useless data. Unless I missed
something.


> I'm not about the
> SCSI bus/unit/target data either. It's a real mess (maybe things
> work a bit better when an actual SCSI controller is used?) and I'm
> not sure why/how I didn't notice during initial testing.

I think unit/target should work. At least with all patches from my
series (well notably the last patch). But please, do verify that if you
can.


> I think all this needs to be addressed before we enable
> CONFIG_QGA_NTDDSCSI (in terms of what that's used for in the current
> code at least, i.e. enabling everything reported by
> GuestFilesystemInfo.disk).
> 
> What is the oVirt use-case? It doesn't seem like you need PCI topology,
> but what about SCSI topology (and if so, does that information seem
> correct to you)? Or is just a list a serials/dev paths by themselves
> all that's needed? Trying to see how we might decouple things from the
> broken PCI topology stuff. May need to deprecate
> GuestFilesystemInfo.disk in favor of something less monolithic if all
> of this isn't fixable in a reasonable-enough timeframe.

We don't need the PCI info. I've been only trying to fix this to some
sort along the way. What we need is to get disk serial number (patch 4)
and "name" of the disk of some sort (patch 5). For that I use device
node (e.g. /dev/sda) on Linux and UNC for the disk on Windows (e.g.
\\?\PhysicalDrive1). But the code relies on CONFIG_QGA_NTDDSCSI being
enabled.

    Tomas

> 
> 
> > ---
> >  qga/commands-win32.c | 27 ++++++++++++++++++++++-----
> >  1 file changed, 22 insertions(+), 5 deletions(-)
> > 
> > diff --git a/qga/commands-win32.c b/qga/commands-win32.c
> > index 98d9735389..9c959122d9 100644
> > --- a/qga/commands-win32.c
> > +++ b/qga/commands-win32.c
> > @@ -633,15 +633,32 @@ static GuestDiskAddressList *build_guest_disk_info(char *guid, Error **errp)
> >           * https://technet.microsoft.com/en-us/library/ee851589(v=ws.10).aspx */
> >          if (DeviceIoControl(vol_h, IOCTL_SCSI_GET_ADDRESS, NULL, 0, scsi_ad,
> >                              sizeof(SCSI_ADDRESS), &len, NULL)) {
> > +            Error *local_err = NULL;
> >              disk->unit = addr.Lun;
> >              disk->target = addr.TargetId;
> >              disk->bus = addr.PathId;
> > -            disk->pci_controller = get_pci_info(name, errp);
> > +            g_debug("unit=%lld target=%lld bus=%lld",
> > +                disk->unit, disk->target, disk->bus);
> > +            disk->pci_controller = get_pci_info(name, &local_err);
> > +
> > +            if (local_err) {
> > +                g_debug("failed to get PCI controller info: %s",
> > +                    error_get_pretty(local_err));
> > +                error_free(local_err);
> > +            } else if (disk->pci_controller != NULL) {
> > +                g_debug("pci: domain=%lld bus=%lld slot=%lld function=%lld",
> > +                    disk->pci_controller->domain,
> > +                    disk->pci_controller->bus,
> > +                    disk->pci_controller->slot,
> > +                    disk->pci_controller->function);
> > +            }
> >          }
> > -        /* We do not set error in this case, because we still have enough
> > -         * information about volume. */
> > -    } else {
> > -         disk->pci_controller = NULL;
> > +    }
> > +    /* We do not set error in case pci_controller is NULL, because we still
> > +     * have enough information about volume. */
> > +    if (disk->pci_controller == NULL) {
> > +        g_debug("no PCI controller info");
> > +        disk->pci_controller = g_malloc0(sizeof(GuestPCIAddress));
> >      }
> > 
> >      list = g_malloc0(sizeof(*list));
> > -- 
> > 2.18.0
> >
Sameeh Jubran Sept. 27, 2018, 3:16 p.m. UTC | #3
On Thu, Sep 27, 2018 at 12:06 PM Tomáš Golembiovský <tgolembi@redhat.com>
wrote:

> Hi Michael,
>
> thanks for looking into this. My comments are below.
>
> Adding Sameeh...
>
>
> On Wed, 26 Sep 2018 12:15:48 -0500
> Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
>
> > Quoting Tomáš Golembiovský (2018-09-07 06:42:09)
> > > The guest-get-fsinfo command collects also information about PCI
> > > controller where the disk is attached. When this fails for some reasons
> > > it tries to return just the partial information. However in certain
> > > cases the pointer to the structure was not initialized and was set to
> > > NULL. This breaks the serializer and leads to a crash of the guest
> agent.
> > >
> > > Signed-off-by: Tomáš Golembiovský <tgolembi@redhat.com>
> >
> > For a win10 guest started with:
> >
> > qemu-system-x86_64 -m 2G -smp 2,cores=2,sockets=1 -drive
> file=/home/mdroth/vm/win10_pro_n_snap0.qcow2,if=virtio -drive
> file=/home/mdroth/vm/virtio-win-0.1.102.iso,if=ide,media=cdrom -rtc
> base=localtime,driftfix=slew -vga std -boot d -name vm4 -netdev
> tap,script=/etc/qemu-ifup,vhost=on,id=vnet0 -device
> virtio-net-pci,mac=52:54:00:12:34:04,id=vnic0,netdev=vnet0,disable-modern=true
> -vnc :4 -device virtio-serial -balloon virtio -mon chardev=hmp0 -chardev
> socket,path=/tmp/vm4-hmp0.sock,server,nowait,id=hmp0 -mon
> chardev=qmp0,mode=control -chardev
> socket,path=/tmp/vm4-qmp0.sock,server,nowait,id=qmp0 -device
> virtserialport,chardev=vs0,name=vs0,id=vs_vs0 -chardev
> socket,path=/tmp/vm4-vs0.sock,server,nowait,id=vs0 -device
> virtserialport,chardev=vs1,name=vs1,id=vs_vs1 -chardev
> socket,path=/tmp/vm4-vs1.sock,server,nowait,id=vs1 -device
> virtserialport,chardev=qga,name=org.qemu.guest_agent.0,id=vs_qga -chardev
> socket,path=/tmp/vm4-qga.sock,server,nowait,id=qga -device
> isa-serial,chardev=serial0,id=serial_serial0 -chardev
> socket,path=/tmp/vm4-serial0.sock,server,nowait,id=serial0 -L
> /home/mdroth/w/build/qemu-2.11.2-build/pc-bios --enable-kvm
> >
> > this yields the following:
> >
> > {'execute':'guest-get-fsinfo'}
> > {"return": [{"name":
> "\\\\?\\Volume{83835b2d-0032-11e6-a84f-806e6f6e6963}\\", "total-bytes":
> 160755712, "mountpoint": "D:\\", "disk": [{"bus-type": "ide", "bus": 0,
> "unit": 0, "pci-controller": {"bus": 0, "slot": 0, "domain": 0, "function":
> 0}, "target": 0}], "used-bytes": 160755712, "type": "CDFS"}, {"name":
> "\\\\?\\Volume{2ea839c6-0000-0000-0000-80620c000000}\\", "mountpoint":
> "System Reserved", "disk": [{"bus-type": "scsi", "bus": 0, "unit": 0,
> "pci-controller": {"bus": 0, "slot": 0, "domain": 0, "function": 0},
> "target": 0}], "type": "NTFS"}, {"name":
> "\\\\?\\Volume{2ea839c6-0000-0000-0000-501f00000000}\\", "total-bytes":
> 52665839616, "mountpoint": "C:\\", "disk": [{"bus-type": "scsi", "bus": 0,
> "unit": 0, "pci-controller": {"bus": 0, "slot": 0, "domain": 0, "function":
> 0}, "target": 0}], "used-bytes": 25265487872, "type": "NTFS"}, {"name":
> "\\\\?\\Volume{2ea839c6-0000-0000-0000-100000000000}\\", "mountpoint":
> "System Reserved", "disk": [{"bus-type": "scsi", "bus": 0, "unit": 0,
> "pci-controller": {"bus": 0, "slot": 0, "domain": 0, "function": 0},
> "target": 0}], "type": "NTFS"}]}
> >
> > domain/bus/slot/function=0 are valid PCI addresses so initializing to 0
> is
> > wrong. Sameeh had a previous series that initializes to -1 that I think
> > is more appropriate (it hasn't gone in yet since we opted not to enable
> > CONFIG_QGA_NTDDSCSI for 3.0 since the PCI stuff seems be generally
> > broken for Windows, also because the 2nd patch needs some fixups:
>
Can you please elaborate? What kind of fixups? I can see no comments of
this in the 2nd patch

> >
> >   https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg07500.html
>
> I wasn't aware of that. I is trying to "fix" the same issue.
>
> I've been also thinking about using -1, but I didn't know what is/isn't
> correct PCI address.
>
> >
> > With that series (and some fixups I have on top at
> > https://github.com/mdroth/qemu/commits/qga-test), we get the following
> > output:
>
> Should I rebase on that and drop my patch?
>
> >
> > {'execute':'guest-get-fsinfo'}
> > {"return": [{"name":
> "\\\\?\\Volume{83835b2d-0032-11e6-a84f-806e6f6e6963}\\", "total-bytes":
> 160755712, "mountpoint": "D:\\", "disk": [{"bus-type": "ide", "bus": 0,
> "unit": 0, "pci-controller": {"bus": -1, "slot": -1, "domain": -1,
> "function": -1}, "target": 0}], "used-bytes": 160755712, "type": "CDFS"},
> {"name": "\\\\?\\Volume{2ea839c6-0000-0000-0000-80620c000000}\\",
> "mountpoint": "System Reserved", "disk": [{"bus-type": "scsi", "bus": 0,
> "unit": 0, "pci-controller": {"bus": -1, "slot": -1, "domain": 0,
> "function": 3}, "target": 0}], "type": "NTFS"}, {"name":
> "\\\\?\\Volume{2ea839c6-0000-0000-0000-501f00000000}\\", "total-bytes":
> 52665839616, "mountpoint": "C:\\", "disk": [{"bus-type": "scsi", "bus": 0,
> "unit": 0, "pci-controller": {"bus": -1, "slot": -1, "domain": 0,
> "function": 2}, "target": 0}], "used-bytes": 25267560448, "type": "NTFS"},
> {"name": "\\\\?\\Volume{2ea839c6-0000-0000-0000-100000000000}\\",
> "mountpoint": "System Reserved", "disk": [{"bus-type": "scsi", "bus": 0,
> "unit": 0, "pci-controller": {"bus": -1, "slot": -1, "domain": 0,
> "function": 1}, "target": 0}], "type": "NTFS"}]}
> >
> > Here we see the non-sensical PCI topology info I mentioned previously.
> > There are values like '"function": 3' even though there are no
> > multifunction devices present. This will be exposed to users if we
> > enable CONFIG_QGA_NTDDSCSI with things as they stand.
>
> Sadly, I have no idea how to properly fix this code. But patch 5 of the
> series IMHO brings it one step closer to proper solution. The original
> code tries to fetch the addr/bus/domain/slot for volume handle (which
> fails on missing properties) instead of disk handle.


> The remaining issue is that when listing the disks the entries are
> cryptic names like "\Device\0000001d" instead of "\PhysicalDriveX" or
> "\HarddiskX". And I don't know how to map one name to the other.
>
    char *name = g_strdup(&guid[4]);
This needs some investigation...

>
> > Currently we
> > just get an empty array for "disk" field of GuestFilesystemInfo
> > for w32, which fortunately aligns with the current QAPI schema (it's
> > an array since the volume can span multiple disks).
>
> No. You get array with one item and useless data. Unless I missed
> something.
>
>
> > I'm not about the
> > SCSI bus/unit/target data either. It's a real mess (maybe things
> > work a bit better when an actual SCSI controller is used?) and I'm
> > not sure why/how I didn't notice during initial testing.
>
> I think unit/target should work. At least with all patches from my
> series (well notably the last patch). But please, do verify that if you
> can.
>
>
> > I think all this needs to be addressed before we enable
> > CONFIG_QGA_NTDDSCSI (in terms of what that's used for in the current
> > code at least, i.e. enabling everything reported by
> > GuestFilesystemInfo.disk).
> >
> > What is the oVirt use-case? It doesn't seem like you need PCI topology,
> > but what about SCSI topology (and if so, does that information seem
> > correct to you)? Or is just a list a serials/dev paths by themselves
> > all that's needed? Trying to see how we might decouple things from the
> > broken PCI topology stuff. May need to deprecate
> > GuestFilesystemInfo.disk in favor of something less monolithic if all
> > of this isn't fixable in a reasonable-enough timeframe.
>
> We don't need the PCI info. I've been only trying to fix this to some
> sort along the way. What we need is to get disk serial number (patch 4)
> and "name" of the disk of some sort (patch 5). For that I use device
> node (e.g. /dev/sda) on Linux and UNC for the disk on Windows (e.g.
> \\?\PhysicalDrive1). But the code relies on CONFIG_QGA_NTDDSCSI being
> enabled.
>
This is already enabled in downstream version of qga-win, I don't think it
should be
enabled upstream if it's not necessary. ( Actually I'm not sure why it is
disabled in the first place? I supposed because it is not stable)

>
>     Tomas
>
> >
> >
> > > ---
> > >  qga/commands-win32.c | 27 ++++++++++++++++++++++-----
> > >  1 file changed, 22 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/qga/commands-win32.c b/qga/commands-win32.c
> > > index 98d9735389..9c959122d9 100644
> > > --- a/qga/commands-win32.c
> > > +++ b/qga/commands-win32.c
> > > @@ -633,15 +633,32 @@ static GuestDiskAddressList
> *build_guest_disk_info(char *guid, Error **errp)
> > >           *
> https://technet.microsoft.com/en-us/library/ee851589(v=ws.10).aspx */
> > >          if (DeviceIoControl(vol_h, IOCTL_SCSI_GET_ADDRESS, NULL, 0,
> scsi_ad,
> > >                              sizeof(SCSI_ADDRESS), &len, NULL)) {
> > > +            Error *local_err = NULL;
> > >              disk->unit = addr.Lun;
> > >              disk->target = addr.TargetId;
> > >              disk->bus = addr.PathId;
> > > -            disk->pci_controller = get_pci_info(name, errp);
> > > +            g_debug("unit=%lld target=%lld bus=%lld",
> > > +                disk->unit, disk->target, disk->bus);
> > > +            disk->pci_controller = get_pci_info(name, &local_err);
> > > +
> > > +            if (local_err) {
> > > +                g_debug("failed to get PCI controller info: %s",
> > > +                    error_get_pretty(local_err));
> > > +                error_free(local_err);
> > > +            } else if (disk->pci_controller != NULL) {
> > > +                g_debug("pci: domain=%lld bus=%lld slot=%lld
> function=%lld",
> > > +                    disk->pci_controller->domain,
> > > +                    disk->pci_controller->bus,
> > > +                    disk->pci_controller->slot,
> > > +                    disk->pci_controller->function);
> > > +            }
> > >          }
> > > -        /* We do not set error in this case, because we still have
> enough
> > > -         * information about volume. */
> > > -    } else {
> > > -         disk->pci_controller = NULL;
> > > +    }
> > > +    /* We do not set error in case pci_controller is NULL, because we
> still
> > > +     * have enough information about volume. */
> > > +    if (disk->pci_controller == NULL) {
> > > +        g_debug("no PCI controller info");
> > > +        disk->pci_controller = g_malloc0(sizeof(GuestPCIAddress));
> > >      }
> > >
> > >      list = g_malloc0(sizeof(*list));
> > > --
> > > 2.18.0
> > >
>
>
> --
> Tomáš Golembiovský <tgolembi@redhat.com>
>
Tomáš Golembiovský Oct. 1, 2018, 12:11 p.m. UTC | #4
On Thu, 27 Sep 2018 18:16:04 +0300
Sameeh Jubran <sjubran@redhat.com> wrote:

> On Thu, Sep 27, 2018 at 12:06 PM Tomáš Golembiovský <tgolembi@redhat.com>
> wrote:
> 
> > Hi Michael,
> >
> > thanks for looking into this. My comments are below.
> >
> > Adding Sameeh...
> >
> >
> > On Wed, 26 Sep 2018 12:15:48 -0500
> > Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
> >  
> > > Quoting Tomáš Golembiovský (2018-09-07 06:42:09)  
> > > > The guest-get-fsinfo command collects also information about PCI
> > > > controller where the disk is attached. When this fails for some reasons
> > > > it tries to return just the partial information. However in certain
> > > > cases the pointer to the structure was not initialized and was set to
> > > > NULL. This breaks the serializer and leads to a crash of the guest  
> > agent.  
> > > >
> > > > Signed-off-by: Tomáš Golembiovský <tgolembi@redhat.com>  
> > >
> > > For a win10 guest started with:
> > >
> > > qemu-system-x86_64 -m 2G -smp 2,cores=2,sockets=1 -drive  
> > file=/home/mdroth/vm/win10_pro_n_snap0.qcow2,if=virtio -drive
> > file=/home/mdroth/vm/virtio-win-0.1.102.iso,if=ide,media=cdrom -rtc
> > base=localtime,driftfix=slew -vga std -boot d -name vm4 -netdev
> > tap,script=/etc/qemu-ifup,vhost=on,id=vnet0 -device
> > virtio-net-pci,mac=52:54:00:12:34:04,id=vnic0,netdev=vnet0,disable-modern=true
> > -vnc :4 -device virtio-serial -balloon virtio -mon chardev=hmp0 -chardev
> > socket,path=/tmp/vm4-hmp0.sock,server,nowait,id=hmp0 -mon
> > chardev=qmp0,mode=control -chardev
> > socket,path=/tmp/vm4-qmp0.sock,server,nowait,id=qmp0 -device
> > virtserialport,chardev=vs0,name=vs0,id=vs_vs0 -chardev
> > socket,path=/tmp/vm4-vs0.sock,server,nowait,id=vs0 -device
> > virtserialport,chardev=vs1,name=vs1,id=vs_vs1 -chardev
> > socket,path=/tmp/vm4-vs1.sock,server,nowait,id=vs1 -device
> > virtserialport,chardev=qga,name=org.qemu.guest_agent.0,id=vs_qga -chardev
> > socket,path=/tmp/vm4-qga.sock,server,nowait,id=qga -device
> > isa-serial,chardev=serial0,id=serial_serial0 -chardev
> > socket,path=/tmp/vm4-serial0.sock,server,nowait,id=serial0 -L
> > /home/mdroth/w/build/qemu-2.11.2-build/pc-bios --enable-kvm  
> > >
> > > this yields the following:
> > >
> > > {'execute':'guest-get-fsinfo'}
> > > {"return": [{"name":  
> > "\\\\?\\Volume{83835b2d-0032-11e6-a84f-806e6f6e6963}\\", "total-bytes":
> > 160755712, "mountpoint": "D:\\", "disk": [{"bus-type": "ide", "bus": 0,
> > "unit": 0, "pci-controller": {"bus": 0, "slot": 0, "domain": 0, "function":
> > 0}, "target": 0}], "used-bytes": 160755712, "type": "CDFS"}, {"name":
> > "\\\\?\\Volume{2ea839c6-0000-0000-0000-80620c000000}\\", "mountpoint":
> > "System Reserved", "disk": [{"bus-type": "scsi", "bus": 0, "unit": 0,
> > "pci-controller": {"bus": 0, "slot": 0, "domain": 0, "function": 0},
> > "target": 0}], "type": "NTFS"}, {"name":
> > "\\\\?\\Volume{2ea839c6-0000-0000-0000-501f00000000}\\", "total-bytes":
> > 52665839616, "mountpoint": "C:\\", "disk": [{"bus-type": "scsi", "bus": 0,
> > "unit": 0, "pci-controller": {"bus": 0, "slot": 0, "domain": 0, "function":
> > 0}, "target": 0}], "used-bytes": 25265487872, "type": "NTFS"}, {"name":
> > "\\\\?\\Volume{2ea839c6-0000-0000-0000-100000000000}\\", "mountpoint":
> > "System Reserved", "disk": [{"bus-type": "scsi", "bus": 0, "unit": 0,
> > "pci-controller": {"bus": 0, "slot": 0, "domain": 0, "function": 0},
> > "target": 0}], "type": "NTFS"}]}  
> > >
> > > domain/bus/slot/function=0 are valid PCI addresses so initializing to 0  
> > is  
> > > wrong. Sameeh had a previous series that initializes to -1 that I think
> > > is more appropriate (it hasn't gone in yet since we opted not to enable
> > > CONFIG_QGA_NTDDSCSI for 3.0 since the PCI stuff seems be generally
> > > broken for Windows, also because the 2nd patch needs some fixups:  
> >  
> Can you please elaborate? What kind of fixups? I can see no comments of
> this in the 2nd patch
> 

I think he refers to this fix:

https://github.com/mdroth/qemu/commit/201db36b56d7d1ba5ff720eedcb3b62b75306fde

Plus there seems to be an issue when "calculating" function number as
described below. (Maybe related to casting addr from -1 to uint and
back?)

> > >
> > >   https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg07500.html  
> >
> > I wasn't aware of that. I is trying to "fix" the same issue.
> >
> > I've been also thinking about using -1, but I didn't know what is/isn't
> > correct PCI address.
> >  
> > >
> > > With that series (and some fixups I have on top at
> > > https://github.com/mdroth/qemu/commits/qga-test), we get the following
> > > output:  
> >
> > Should I rebase on that and drop my patch?
> >  
> > >
> > > {'execute':'guest-get-fsinfo'}
> > > {"return": [{"name":  
> > "\\\\?\\Volume{83835b2d-0032-11e6-a84f-806e6f6e6963}\\", "total-bytes":
> > 160755712, "mountpoint": "D:\\", "disk": [{"bus-type": "ide", "bus": 0,
> > "unit": 0, "pci-controller": {"bus": -1, "slot": -1, "domain": -1,
> > "function": -1}, "target": 0}], "used-bytes": 160755712, "type": "CDFS"},
> > {"name": "\\\\?\\Volume{2ea839c6-0000-0000-0000-80620c000000}\\",
> > "mountpoint": "System Reserved", "disk": [{"bus-type": "scsi", "bus": 0,
> > "unit": 0, "pci-controller": {"bus": -1, "slot": -1, "domain": 0,
> > "function": 3}, "target": 0}], "type": "NTFS"}, {"name":
> > "\\\\?\\Volume{2ea839c6-0000-0000-0000-501f00000000}\\", "total-bytes":
> > 52665839616, "mountpoint": "C:\\", "disk": [{"bus-type": "scsi", "bus": 0,
> > "unit": 0, "pci-controller": {"bus": -1, "slot": -1, "domain": 0,
> > "function": 2}, "target": 0}], "used-bytes": 25267560448, "type": "NTFS"},
> > {"name": "\\\\?\\Volume{2ea839c6-0000-0000-0000-100000000000}\\",
> > "mountpoint": "System Reserved", "disk": [{"bus-type": "scsi", "bus": 0,
> > "unit": 0, "pci-controller": {"bus": -1, "slot": -1, "domain": 0,
> > "function": 1}, "target": 0}], "type": "NTFS"}]}  
> > >
> > > Here we see the non-sensical PCI topology info I mentioned previously.
> > > There are values like '"function": 3' even though there are no
> > > multifunction devices present. This will be exposed to users if we
> > > enable CONFIG_QGA_NTDDSCSI with things as they stand.  
> >
> > Sadly, I have no idea how to properly fix this code. But patch 5 of the
> > series IMHO brings it one step closer to proper solution. The original
> > code tries to fetch the addr/bus/domain/slot for volume handle (which
> > fails on missing properties) instead of disk handle.  
> 
> 
> > The remaining issue is that when listing the disks the entries are
> > cryptic names like "\Device\0000001d" instead of "\PhysicalDriveX" or
> > "\HarddiskX". And I don't know how to map one name to the other.
> >  
>     char *name = g_strdup(&guid[4]);
> This needs some investigation...
> 

This is to strip first 4 characters from UNC names.

    "\\?\abcdef" -> "\abcdef"


> >  
> > > Currently we
> > > just get an empty array for "disk" field of GuestFilesystemInfo
> > > for w32, which fortunately aligns with the current QAPI schema (it's
> > > an array since the volume can span multiple disks).  
> >
> > No. You get array with one item and useless data. Unless I missed
> > something.
> >
> >  
> > > I'm not about the
> > > SCSI bus/unit/target data either. It's a real mess (maybe things
> > > work a bit better when an actual SCSI controller is used?) and I'm
> > > not sure why/how I didn't notice during initial testing.  
> >
> > I think unit/target should work. At least with all patches from my
> > series (well notably the last patch). But please, do verify that if you
> > can.
> >
> >  
> > > I think all this needs to be addressed before we enable
> > > CONFIG_QGA_NTDDSCSI (in terms of what that's used for in the current
> > > code at least, i.e. enabling everything reported by
> > > GuestFilesystemInfo.disk).
> > >
> > > What is the oVirt use-case? It doesn't seem like you need PCI topology,
> > > but what about SCSI topology (and if so, does that information seem
> > > correct to you)? Or is just a list a serials/dev paths by themselves
> > > all that's needed? Trying to see how we might decouple things from the
> > > broken PCI topology stuff. May need to deprecate
> > > GuestFilesystemInfo.disk in favor of something less monolithic if all
> > > of this isn't fixable in a reasonable-enough timeframe.  
> >
> > We don't need the PCI info. I've been only trying to fix this to some
> > sort along the way. What we need is to get disk serial number (patch 4)
> > and "name" of the disk of some sort (patch 5). For that I use device
> > node (e.g. /dev/sda) on Linux and UNC for the disk on Windows (e.g.
> > \\?\PhysicalDrive1). But the code relies on CONFIG_QGA_NTDDSCSI being
> > enabled.
> >  
> This is already enabled in downstream version of qga-win, I don't think it
> should be
> enabled upstream if it's not necessary. ( Actually I'm not sure why it is
> disabled in the first place? I supposed because it is not stable)

So can we enable CONFIG_QGA_NTDDSCSI and disable PCI controller info by
different means?

> 
> >
> >     Tomas
> >  
> > >
> > >  
> > > > ---
> > > >  qga/commands-win32.c | 27 ++++++++++++++++++++++-----
> > > >  1 file changed, 22 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/qga/commands-win32.c b/qga/commands-win32.c
> > > > index 98d9735389..9c959122d9 100644
> > > > --- a/qga/commands-win32.c
> > > > +++ b/qga/commands-win32.c
> > > > @@ -633,15 +633,32 @@ static GuestDiskAddressList  
> > *build_guest_disk_info(char *guid, Error **errp)  
> > > >           *  
> > https://technet.microsoft.com/en-us/library/ee851589(v=ws.10).aspx */  
> > > >          if (DeviceIoControl(vol_h, IOCTL_SCSI_GET_ADDRESS, NULL, 0,  
> > scsi_ad,  
> > > >                              sizeof(SCSI_ADDRESS), &len, NULL)) {
> > > > +            Error *local_err = NULL;
> > > >              disk->unit = addr.Lun;
> > > >              disk->target = addr.TargetId;
> > > >              disk->bus = addr.PathId;
> > > > -            disk->pci_controller = get_pci_info(name, errp);
> > > > +            g_debug("unit=%lld target=%lld bus=%lld",
> > > > +                disk->unit, disk->target, disk->bus);
> > > > +            disk->pci_controller = get_pci_info(name, &local_err);
> > > > +
> > > > +            if (local_err) {
> > > > +                g_debug("failed to get PCI controller info: %s",
> > > > +                    error_get_pretty(local_err));
> > > > +                error_free(local_err);
> > > > +            } else if (disk->pci_controller != NULL) {
> > > > +                g_debug("pci: domain=%lld bus=%lld slot=%lld  
> > function=%lld",  
> > > > +                    disk->pci_controller->domain,
> > > > +                    disk->pci_controller->bus,
> > > > +                    disk->pci_controller->slot,
> > > > +                    disk->pci_controller->function);
> > > > +            }
> > > >          }
> > > > -        /* We do not set error in this case, because we still have  
> > enough  
> > > > -         * information about volume. */
> > > > -    } else {
> > > > -         disk->pci_controller = NULL;
> > > > +    }
> > > > +    /* We do not set error in case pci_controller is NULL, because we  
> > still  
> > > > +     * have enough information about volume. */
> > > > +    if (disk->pci_controller == NULL) {
> > > > +        g_debug("no PCI controller info");
> > > > +        disk->pci_controller = g_malloc0(sizeof(GuestPCIAddress));
> > > >      }
> > > >
> > > >      list = g_malloc0(sizeof(*list));
> > > > --
> > > > 2.18.0
> > > >  
> >
> >
> > --
> > Tomáš Golembiovský <tgolembi@redhat.com>
> >
diff mbox series

Patch

diff --git a/qga/commands-win32.c b/qga/commands-win32.c
index 98d9735389..9c959122d9 100644
--- a/qga/commands-win32.c
+++ b/qga/commands-win32.c
@@ -633,15 +633,32 @@  static GuestDiskAddressList *build_guest_disk_info(char *guid, Error **errp)
          * https://technet.microsoft.com/en-us/library/ee851589(v=ws.10).aspx */
         if (DeviceIoControl(vol_h, IOCTL_SCSI_GET_ADDRESS, NULL, 0, scsi_ad,
                             sizeof(SCSI_ADDRESS), &len, NULL)) {
+            Error *local_err = NULL;
             disk->unit = addr.Lun;
             disk->target = addr.TargetId;
             disk->bus = addr.PathId;
-            disk->pci_controller = get_pci_info(name, errp);
+            g_debug("unit=%lld target=%lld bus=%lld",
+                disk->unit, disk->target, disk->bus);
+            disk->pci_controller = get_pci_info(name, &local_err);
+
+            if (local_err) {
+                g_debug("failed to get PCI controller info: %s",
+                    error_get_pretty(local_err));
+                error_free(local_err);
+            } else if (disk->pci_controller != NULL) {
+                g_debug("pci: domain=%lld bus=%lld slot=%lld function=%lld",
+                    disk->pci_controller->domain,
+                    disk->pci_controller->bus,
+                    disk->pci_controller->slot,
+                    disk->pci_controller->function);
+            }
         }
-        /* We do not set error in this case, because we still have enough
-         * information about volume. */
-    } else {
-         disk->pci_controller = NULL;
+    }
+    /* We do not set error in case pci_controller is NULL, because we still
+     * have enough information about volume. */
+    if (disk->pci_controller == NULL) {
+        g_debug("no PCI controller info");
+        disk->pci_controller = g_malloc0(sizeof(GuestPCIAddress));
     }
 
     list = g_malloc0(sizeof(*list));