diff mbox

[BUGFIX,for,2.2,1/1] hw/ide/core.c: Prevent SIGSEGV during migration

Message ID 1416259239-13281-1-git-send-email-dslutz@verizon.com
State New
Headers show

Commit Message

Don Slutz Nov. 17, 2014, 9:20 p.m. UTC
The other callers to blk_set_enable_write_cache() in this file
already check for s->blk == NULL.

Signed-off-by: Don Slutz <dslutz@verizon.com>
---

I think this is a bugfix that should be back ported to stable
releases.

I also think this should be done in xen's copy of QEMU for 4.5 with
back port(s) to active stable releases.

Note: In 2.1 and earlier the routine is
bdrv_set_enable_write_cache(); variable is s->bs.

 hw/ide/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Paolo Bonzini Nov. 18, 2014, 10:05 a.m. UTC | #1
On 17/11/2014 22:20, Don Slutz wrote:
> The other callers to blk_set_enable_write_cache() in this file
> already check for s->blk == NULL.
> 
> Signed-off-by: Don Slutz <dslutz@verizon.com>
> ---
> 
> I think this is a bugfix that should be back ported to stable
> releases.
> 
> I also think this should be done in xen's copy of QEMU for 4.5 with
> back port(s) to active stable releases.
> 
> Note: In 2.1 and earlier the routine is
> bdrv_set_enable_write_cache(); variable is s->bs.
> 
>  hw/ide/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/ide/core.c b/hw/ide/core.c
> index 00e21cf..d4af5e2 100644
> --- a/hw/ide/core.c
> +++ b/hw/ide/core.c
> @@ -2401,7 +2401,7 @@ static int ide_drive_post_load(void *opaque, int version_id)
>  {
>      IDEState *s = opaque;
>  
> -    if (s->identify_set) {
> +    if (s->blk && s->identify_set) {
>          blk_set_enable_write_cache(s->blk, !!(s->identify_data[85] & (1 << 5)));
>      }
>      return 0;
> 

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Stefan Hajnoczi Nov. 18, 2014, 11:37 a.m. UTC | #2
On Mon, Nov 17, 2014 at 04:20:39PM -0500, Don Slutz wrote:
> The other callers to blk_set_enable_write_cache() in this file
> already check for s->blk == NULL.
> 
> Signed-off-by: Don Slutz <dslutz@verizon.com>
> ---
> 
> I think this is a bugfix that should be back ported to stable
> releases.
> 
> I also think this should be done in xen's copy of QEMU for 4.5 with
> back port(s) to active stable releases.
> 
> Note: In 2.1 and earlier the routine is
> bdrv_set_enable_write_cache(); variable is s->bs.
> 
>  hw/ide/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Kevin Wolf Nov. 18, 2014, 11:41 a.m. UTC | #3
Am 17.11.2014 um 22:20 hat Don Slutz geschrieben:
> The other callers to blk_set_enable_write_cache() in this file
> already check for s->blk == NULL.
> 
> Signed-off-by: Don Slutz <dslutz@verizon.com>
> ---
> 
> I think this is a bugfix that should be back ported to stable
> releases.

Please remember to include a 'Cc: qemu-stable@nongnu.org' line in your
commit messages for such patches. I've added it and applied the patch to
my block branch, thanks. (Saw it too late for my -rc2 pull request,
though, so it'll have to wait for -rc3.)

Kevin
Stefano Stabellini Nov. 18, 2014, 2:12 p.m. UTC | #4
Konrad,
I think we should have this fix in Xen 4.5. Should I go ahead and
backport it?

On Mon, 17 Nov 2014, Don Slutz wrote:
> The other callers to blk_set_enable_write_cache() in this file
> already check for s->blk == NULL.
> 
> Signed-off-by: Don Slutz <dslutz@verizon.com>
> ---
> 
> I think this is a bugfix that should be back ported to stable
> releases.
> 
> I also think this should be done in xen's copy of QEMU for 4.5 with
> back port(s) to active stable releases.
> 
> Note: In 2.1 and earlier the routine is
> bdrv_set_enable_write_cache(); variable is s->bs.
> 
>  hw/ide/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/ide/core.c b/hw/ide/core.c
> index 00e21cf..d4af5e2 100644
> --- a/hw/ide/core.c
> +++ b/hw/ide/core.c
> @@ -2401,7 +2401,7 @@ static int ide_drive_post_load(void *opaque, int version_id)
>  {
>      IDEState *s = opaque;
>  
> -    if (s->identify_set) {
> +    if (s->blk && s->identify_set) {
>          blk_set_enable_write_cache(s->blk, !!(s->identify_data[85] & (1 << 5)));
>      }
>      return 0;
> -- 
> 1.8.4
>
Peter Maydell Nov. 18, 2014, 6 p.m. UTC | #5
On 18 November 2014 11:41, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 17.11.2014 um 22:20 hat Don Slutz geschrieben:
>> The other callers to blk_set_enable_write_cache() in this file
>> already check for s->blk == NULL.
>>
>> Signed-off-by: Don Slutz <dslutz@verizon.com>
>> ---
>>
>> I think this is a bugfix that should be back ported to stable
>> releases.
>
> Please remember to include a 'Cc: qemu-stable@nongnu.org' line in your
> commit messages for such patches. I've added it and applied the patch to
> my block branch, thanks. (Saw it too late for my -rc2 pull request,
> though, so it'll have to wait for -rc3.)

Now applied directly to master so it will make -rc2.

thanks
-- PMM
Markus Armbruster Nov. 19, 2014, 12:29 p.m. UTC | #6
Don Slutz <dslutz@verizon.com> writes:

> The other callers to blk_set_enable_write_cache() in this file
> already check for s->blk == NULL.
>
> Signed-off-by: Don Slutz <dslutz@verizon.com>
> ---
>
> I think this is a bugfix that should be back ported to stable
> releases.
>
> I also think this should be done in xen's copy of QEMU for 4.5 with
> back port(s) to active stable releases.
>
> Note: In 2.1 and earlier the routine is
> bdrv_set_enable_write_cache(); variable is s->bs.

Got a reproducer?

I'm asking because I believe s->identify_set implies s->blk.
s->identify_set is initialized to zero, and gets set to non-zero exactly
on the first successful IDENTIFY DEVICE or IDENTIFY PACKET DEVICE, in
ide_identify(), ide_atapi_identify() or ide_cfata_identify(),
respectively.  Only called via cmd_identify() / cmd_identify_packet()
via ide_exec_cmd().  The latter immediately fails when !s->blk:

    s = idebus_active_if(bus);
    /* ignore commands to non existent slave */
    if (s != bus->ifs && !s->blk) {
        return;
    }

Even if I'm right, your patch is fine, because it makes this spot more
obviously correct, and consistent with the other uses of
blk_set_enable_write_cache().  The case for stable is weak, though.

>
>  hw/ide/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/ide/core.c b/hw/ide/core.c
> index 00e21cf..d4af5e2 100644
> --- a/hw/ide/core.c
> +++ b/hw/ide/core.c
> @@ -2401,7 +2401,7 @@ static int ide_drive_post_load(void *opaque, int version_id)
>  {
>      IDEState *s = opaque;
>  
> -    if (s->identify_set) {
> +    if (s->blk && s->identify_set) {
>          blk_set_enable_write_cache(s->blk, !!(s->identify_data[85] & (1 << 5)));
>      }
>      return 0;
Don Slutz Nov. 20, 2014, 6:31 p.m. UTC | #7
On 11/19/14 07:29, Markus Armbruster wrote:
> Don Slutz <dslutz@verizon.com> writes:
>
>> The other callers to blk_set_enable_write_cache() in this file
>> already check for s->blk == NULL.
>>
>> Signed-off-by: Don Slutz <dslutz@verizon.com>
>> ---
>>
>> I think this is a bugfix that should be back ported to stable
>> releases.
>>
>> I also think this should be done in xen's copy of QEMU for 4.5 with
>> back port(s) to active stable releases.
>>
>> Note: In 2.1 and earlier the routine is
>> bdrv_set_enable_write_cache(); variable is s->bs.
> Got a reproducer?

yes.  Migrating a guest from xen 4.2 or 4.3 to xen 4.4 (or 4.5-unstable) on
CentOS 6.3 with xen_emul_unplug=unnecessary and no cdrom defined.


>
> I'm asking because I believe s->identify_set implies s->blk.
> s->identify_set is initialized to zero, and gets set to non-zero exactly
> on the first successful IDENTIFY DEVICE or IDENTIFY PACKET DEVICE, in
> ide_identify(), ide_atapi_identify() or ide_cfata_identify(),
> respectively.  Only called via cmd_identify() / cmd_identify_packet()
> via ide_exec_cmd().  The latter immediately fails when !s->blk:
>
>      s = idebus_active_if(bus);
>      /* ignore commands to non existent slave */
>      if (s != bus->ifs && !s->blk) {
>          return;
>      }

I do think that you are right.  I have now spent more time on why I am
seeing this.


> Even if I'm right, your patch is fine, because it makes this spot more
> obviously correct, and consistent with the other uses of
> blk_set_enable_write_cache().  The case for stable is weak, though.
>

I had not fully tracked down what is happening before sending the bugfix.
I have now done more debugging, and have tracked it down to xen 4.4
now using "-nodefaults" with QEMU.

I needed to add output to QEMU to track this down because I have long
command lines...

(all I get for ps -ef):
root     14864     1 82 16:42 ?        00:00:09 
/usr/lib/xen/bin/qemu-system-i386 -xen-domid 20 -chardev 
socket,id=libxl-cmd,path=/var/run/xen/qmp-libxl-20,server,nowait -mon 
chardev=libxl-cmd,mode=control -name C63-min-tools -machine 
xenfv,vmware_hw=7,xen_platform_pci=240 -hostbridge 82443 -device 
agp-bridge,id=agp,msi=off,msi-x=off,addr=0x12.0 -monitor pty -monitor vc 
-boot menu=on -device 
vmware-pci-bridge,msi=on,msi-x=on,id=pciBridge1.0,addr=0x11.0 -device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge5.0,multifunction=on,addr=0x15.0 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge5.1,multifunction=on,addr=0x15.1 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge5.2,multifunction=on,addr=0x15.2 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge5.3,multifunction=on,addr=0x15.3 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge5.4,multifunction=on,addr=0x15.4 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge5.5,multifunction=on,addr=0x15.5 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge5.6,multifunction=on,addr=0x15.6 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge5.7,multifunction=on,addr=0x15.7 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge6.0,multifunction=on,addr=0x16.0 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge6.1,multifunction=on,addr=0x16.1 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge6.2,multifunction=on,addr=0x16.2 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge6.3,multifunction=on,addr=0x16.3 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge6.4,multifunction=on,addr=0x16.4 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge6.5,multifunction=on,addr=0x16.5 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge6.6,multifunction=on,addr=0x16.6 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge6.7,multifunction=on,addr=0x16.7 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge7.0,multifunction=on,addr=0x17.0 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge7.1,multifunction=on,addr=0x17.1 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge7.2,multifunction=on,addr=0x17.2 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge7.3,multifunction=on,addr=0x17.3 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge7.4,multifunction=on,addr=0x17.4 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge7.5,multifunction=on,addr=0x17.5 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge7.6,multifunction=on,addr=0x17.6 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge7.7,multifunction=on,addr=0x17.7 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge8.0,multifunction=on,addr=0x18.0 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge8.1,multifunction=on,addr=0x18.1 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge8.2,multifunction=on,addr=0x18.2 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge8.3,multifunction=on,addr=0x18.3 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge8.4,multifunction=on,addr=0x18.4 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge8.5,multifunction=on,addr=0x18.5 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge8.6,multifunction=on,addr=0x18.6 
-device 
vmware-pcie-bridge,msi=on,msi-x=on,id=pciBridge8.7,multifunction=on,addr=0x18.7 
-vnc 0.0.0.0:7,to=99 -serial pty -vga vmware -global 
vmware-svga.vgamem_mb=32 -boot order=cda -device 
vmxnet3,id=nic0,netdev=net0,mac=00:0c:29:86:44:a0,addr=0x3.0x0 -netdev 
type=tap,id=net0,ifname=vif20.0-emu,script=no,downscript=no -device 
e1000_vmw,id=nic1,netdev=net1,mac=00:0c:29:86:44:aa,addr=0x4.0x0 -netdev 
type=tap,id=net1,ifname=vif20.1-emu,script=no,downscript=no -device 
vmxnet3,id=nic2,netdev=net2,mac=00:0c:29:86:44:b4,addr=0x5.0x0 -netdev 
type=tap,id=net2,ifname=vif20.2-emu,script=no,downscript=no -device 
vmxnet3,id=nic3,netdev=net3,mac=00:0c:29:86:44:be,addr=0x6.0x0 -netdev 
type=tap,id=net3,ifname=vif20.3-emu,script=no,downscript=no -device 
vmxnet3,id=nic4,netdev=net4,mac=00:0c:29:86:44:c8,addr=0x8.0x0 -netdev 
type=tap,id=net4,ifname=vif20.4-emu,script=no,downscript=no -device 
vmxnet3,id=nic5,netdev=net5,mac=00:0c:29:86:44:d2,addr=0x9.0x0 -netdev t


Which is missing that option.

The ide that was aborting in this case is the cdrom at hdc that is added
if you do not specify "-nodefaults".

Since this is a "changed" machine config, I am no longer as sure as what
versions this needs to be in.

If I put my QEMU hat on, it does not look like a back port is needed.  
However
for xen it would be nice.

I do not know how the QEMU community feels about migration from a config
without "-nodefaults" to one with "-nodefaults" as the only difference.

    -Don Slutz

>>   hw/ide/core.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/hw/ide/core.c b/hw/ide/core.c
>> index 00e21cf..d4af5e2 100644
>> --- a/hw/ide/core.c
>> +++ b/hw/ide/core.c
>> @@ -2401,7 +2401,7 @@ static int ide_drive_post_load(void *opaque, int version_id)
>>   {
>>       IDEState *s = opaque;
>>   
>> -    if (s->identify_set) {
>> +    if (s->blk && s->identify_set) {
>>           blk_set_enable_write_cache(s->blk, !!(s->identify_data[85] & (1 << 5)));
>>       }
>>       return 0;
Markus Armbruster Nov. 21, 2014, 8:42 a.m. UTC | #8
Don Slutz <dslutz@verizon.com> writes:

> On 11/19/14 07:29, Markus Armbruster wrote:
>> Don Slutz <dslutz@verizon.com> writes:
>>
>>> The other callers to blk_set_enable_write_cache() in this file
>>> already check for s->blk == NULL.
>>>
>>> Signed-off-by: Don Slutz <dslutz@verizon.com>
>>> ---
>>>
>>> I think this is a bugfix that should be back ported to stable
>>> releases.
>>>
>>> I also think this should be done in xen's copy of QEMU for 4.5 with
>>> back port(s) to active stable releases.
>>>
>>> Note: In 2.1 and earlier the routine is
>>> bdrv_set_enable_write_cache(); variable is s->bs.
>> Got a reproducer?
>
> yes.  Migrating a guest from xen 4.2 or 4.3 to xen 4.4 (or 4.5-unstable) on
> CentOS 6.3 with xen_emul_unplug=unnecessary and no cdrom defined.
>
>
>>
>> I'm asking because I believe s->identify_set implies s->blk.
>> s->identify_set is initialized to zero, and gets set to non-zero exactly
>> on the first successful IDENTIFY DEVICE or IDENTIFY PACKET DEVICE, in
>> ide_identify(), ide_atapi_identify() or ide_cfata_identify(),
>> respectively.  Only called via cmd_identify() / cmd_identify_packet()
>> via ide_exec_cmd().  The latter immediately fails when !s->blk:
>>
>>      s = idebus_active_if(bus);
>>      /* ignore commands to non existent slave */
>>      if (s != bus->ifs && !s->blk) {
>>          return;
>>      }
>
> I do think that you are right.  I have now spent more time on why I am
> seeing this.
>
>
>> Even if I'm right, your patch is fine, because it makes this spot more
>> obviously correct, and consistent with the other uses of
>> blk_set_enable_write_cache().  The case for stable is weak, though.
>>
>
> I had not fully tracked down what is happening before sending the bugfix.
> I have now done more debugging, and have tracked it down to xen 4.4
> now using "-nodefaults" with QEMU.
>
> I needed to add output to QEMU to track this down because I have long
> command lines...
>
> (all I get for ps -ef):
[...]
>
>
> Which is missing that option.
>
> The ide that was aborting in this case is the cdrom at hdc that is added
> if you do not specify "-nodefaults".
>
> Since this is a "changed" machine config, I am no longer as sure as what
> versions this needs to be in.
>
> If I put my QEMU hat on, it does not look like a back port is needed.
> However
> for xen it would be nice.
>
> I do not know how the QEMU community feels about migration from a config
> without "-nodefaults" to one with "-nodefaults" as the only difference.

So you have a CD-ROM on the source, but not on the destination?

That can't work.  I guess it broke for you in an unusual way (target
crashes) rather than the usual way (target rejects migration data for a
device it doesn't have) due to our convoluted IDE data structures.  With
your patch applied it should break the usual way.  Does it?

Management tools should use -nodefaults.  But if it mixes default and
-nodefaults in migration, recreating the stuff it got by default but
doesn't get with -nodefaults is its own responsibility.
Dr. David Alan Gilbert Nov. 21, 2014, 10:49 a.m. UTC | #9
* Markus Armbruster (armbru@redhat.com) wrote:
> Don Slutz <dslutz@verizon.com> writes:
> 
> > On 11/19/14 07:29, Markus Armbruster wrote:
> >> Don Slutz <dslutz@verizon.com> writes:
> >>
> >>> The other callers to blk_set_enable_write_cache() in this file
> >>> already check for s->blk == NULL.
> >>>
> >>> Signed-off-by: Don Slutz <dslutz@verizon.com>
> >>> ---
> >>>
> >>> I think this is a bugfix that should be back ported to stable
> >>> releases.
> >>>
> >>> I also think this should be done in xen's copy of QEMU for 4.5 with
> >>> back port(s) to active stable releases.
> >>>
> >>> Note: In 2.1 and earlier the routine is
> >>> bdrv_set_enable_write_cache(); variable is s->bs.
> >> Got a reproducer?
> >
> > yes.  Migrating a guest from xen 4.2 or 4.3 to xen 4.4 (or 4.5-unstable) on
> > CentOS 6.3 with xen_emul_unplug=unnecessary and no cdrom defined.
> >
> >
> >>
> >> I'm asking because I believe s->identify_set implies s->blk.
> >> s->identify_set is initialized to zero, and gets set to non-zero exactly
> >> on the first successful IDENTIFY DEVICE or IDENTIFY PACKET DEVICE, in
> >> ide_identify(), ide_atapi_identify() or ide_cfata_identify(),
> >> respectively.  Only called via cmd_identify() / cmd_identify_packet()
> >> via ide_exec_cmd().  The latter immediately fails when !s->blk:
> >>
> >>      s = idebus_active_if(bus);
> >>      /* ignore commands to non existent slave */
> >>      if (s != bus->ifs && !s->blk) {
> >>          return;
> >>      }
> >
> > I do think that you are right.  I have now spent more time on why I am
> > seeing this.
> >
> >
> >> Even if I'm right, your patch is fine, because it makes this spot more
> >> obviously correct, and consistent with the other uses of
> >> blk_set_enable_write_cache().  The case for stable is weak, though.
> >>
> >
> > I had not fully tracked down what is happening before sending the bugfix.
> > I have now done more debugging, and have tracked it down to xen 4.4
> > now using "-nodefaults" with QEMU.
> >
> > I needed to add output to QEMU to track this down because I have long
> > command lines...
> >
> > (all I get for ps -ef):
> [...]
> >
> >
> > Which is missing that option.
> >
> > The ide that was aborting in this case is the cdrom at hdc that is added
> > if you do not specify "-nodefaults".
> >
> > Since this is a "changed" machine config, I am no longer as sure as what
> > versions this needs to be in.
> >
> > If I put my QEMU hat on, it does not look like a back port is needed.
> > However
> > for xen it would be nice.
> >
> > I do not know how the QEMU community feels about migration from a config
> > without "-nodefaults" to one with "-nodefaults" as the only difference.
> 
> So you have a CD-ROM on the source, but not on the destination?
> 
> That can't work.  I guess it broke for you in an unusual way (target
> crashes) rather than the usual way (target rejects migration data for a
> device it doesn't have) due to our convoluted IDE data structures.  With
> your patch applied it should break the usual way.  Does it?
> 
> Management tools should use -nodefaults.  But if it mixes default and
> -nodefaults in migration, recreating the stuff it got by default but
> doesn't get with -nodefaults is its own responsibility.

Well, mostly - we wouldn't expect a migration to work if the source/dest
didn't match exactly; but QEMU shouldn't seg.

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Don Slutz Nov. 25, 2014, 12:48 a.m. UTC | #10
On 11/21/14 05:49, Dr. David Alan Gilbert wrote:
> * Markus Armbruster (armbru@redhat.com) wrote:
>> Don Slutz<dslutz@verizon.com>  writes:
>>
>>> On 11/19/14 07:29, Markus Armbruster wrote:
>>>> Don Slutz<dslutz@verizon.com>  writes:
>>>>
>>>>> The other callers to blk_set_enable_write_cache() in this file
>>>>> already check for s->blk == NULL.
>>>>>
>>>>> Signed-off-by: Don Slutz<dslutz@verizon.com>
>>>>> ---
>>>>>
>>>>> I think this is a bugfix that should be back ported to stable
>>>>> releases.
>>>>>
>>>>> I also think this should be done in xen's copy of QEMU for 4.5 with
>>>>> back port(s) to active stable releases.
>>>>>
>>>>> Note: In 2.1 and earlier the routine is
>>>>> bdrv_set_enable_write_cache(); variable is s->bs.
>>>> Got a reproducer?
>>> yes.  Migrating a guest from xen 4.2 or 4.3 to xen 4.4 (or 4.5-unstable) on
>>> CentOS 6.3 with xen_emul_unplug=unnecessary and no cdrom defined.
>>>
>>>
>>>> I'm asking because I believe s->identify_set implies s->blk.
>>>> s->identify_set is initialized to zero, and gets set to non-zero exactly
>>>> on the first successful IDENTIFY DEVICE or IDENTIFY PACKET DEVICE, in
>>>> ide_identify(), ide_atapi_identify() or ide_cfata_identify(),
>>>> respectively.  Only called via cmd_identify() / cmd_identify_packet()
>>>> via ide_exec_cmd().  The latter immediately fails when !s->blk:
>>>>
>>>>       s = idebus_active_if(bus);
>>>>       /* ignore commands to non existent slave */
>>>>       if (s != bus->ifs && !s->blk) {
>>>>           return;
>>>>       }
>>> I do think that you are right.  I have now spent more time on why I am
>>> seeing this.
>>>
>>>
>>>> Even if I'm right, your patch is fine, because it makes this spot more
>>>> obviously correct, and consistent with the other uses of
>>>> blk_set_enable_write_cache().  The case for stable is weak, though.
>>>>
>>> I had not fully tracked down what is happening before sending the bugfix.
>>> I have now done more debugging, and have tracked it down to xen 4.4
>>> now using "-nodefaults" with QEMU.
>>>
>>> I needed to add output to QEMU to track this down because I have long
>>> command lines...
>>>
>>> (all I get for ps -ef):
>> [...]
>>> Which is missing that option.
>>>
>>> The ide that was aborting in this case is the cdrom at hdc that is added
>>> if you do not specify "-nodefaults".
>>>
>>> Since this is a "changed" machine config, I am no longer as sure as what
>>> versions this needs to be in.
>>>
>>> If I put my QEMU hat on, it does not look like a back port is needed.
>>> However
>>> for xen it would be nice.
>>>
>>> I do not know how the QEMU community feels about migration from a config
>>> without "-nodefaults" to one with "-nodefaults" as the only difference.
>> So you have a CD-ROM on the source, but not on the destination?

Yes, QEMU in the source has an empty CD-ROM, but not on the 
destination.  xen
does not know that QEMU added a CD-ROM drive in hdc by default.

>> That can't work.  I guess it broke for you in an unusual way (target
>> crashes) rather than the usual way (target rejects migration data for a
>> device it doesn't have) due to our convoluted IDE data structures.  With
>> your patch applied it should break the usual way.  Does it?

Nope. It does not break at all.  The migration "works".  The target 
accepts the hdc data.
It looks like to me that all 4 IDE state data is sent.


>> Management tools should use -nodefaults.  But if it mixes default and
>> -nodefaults in migration, recreating the stuff it got by default but
>> doesn't get with -nodefaults is its own responsibility.

Yes. Since xen did not ask for a CD-ROM, it did not expect to need to create
it.   xen was "fixed" to use -nodefaults.

> Well, mostly - we wouldn't expect a migration to work if the source/dest
> didn't match exactly; but QEMU shouldn't seg.

Well, this change prevents a seg.  So you are in favor of having this
backported?


    -Don Slutz

> Dave
>
> --
> Dr. David Alan Gilbert /dgilbert@redhat.com  / Manchester, UK
Dr. David Alan Gilbert Nov. 25, 2014, 8:59 a.m. UTC | #11
* Don Slutz (dslutz@verizon.com) wrote:
> On 11/21/14 05:49, Dr. David Alan Gilbert wrote:
> >* Markus Armbruster (armbru@redhat.com) wrote:
> >>Don Slutz<dslutz@verizon.com>  writes:
> >>
> >>>On 11/19/14 07:29, Markus Armbruster wrote:
> >>>>Don Slutz<dslutz@verizon.com>  writes:
> >>>>
> >>>>>The other callers to blk_set_enable_write_cache() in this file
> >>>>>already check for s->blk == NULL.
> >>>>>
> >>>>>Signed-off-by: Don Slutz<dslutz@verizon.com>
> >>>>>---
> >>>>>
> >>>>>I think this is a bugfix that should be back ported to stable
> >>>>>releases.
> >>>>>
> >>>>>I also think this should be done in xen's copy of QEMU for 4.5 with
> >>>>>back port(s) to active stable releases.
> >>>>>
> >>>>>Note: In 2.1 and earlier the routine is
> >>>>>bdrv_set_enable_write_cache(); variable is s->bs.
> >>>>Got a reproducer?
> >>>yes.  Migrating a guest from xen 4.2 or 4.3 to xen 4.4 (or 4.5-unstable) on
> >>>CentOS 6.3 with xen_emul_unplug=unnecessary and no cdrom defined.
> >>>
> >>>
> >>>>I'm asking because I believe s->identify_set implies s->blk.
> >>>>s->identify_set is initialized to zero, and gets set to non-zero exactly
> >>>>on the first successful IDENTIFY DEVICE or IDENTIFY PACKET DEVICE, in
> >>>>ide_identify(), ide_atapi_identify() or ide_cfata_identify(),
> >>>>respectively.  Only called via cmd_identify() / cmd_identify_packet()
> >>>>via ide_exec_cmd().  The latter immediately fails when !s->blk:
> >>>>
> >>>>      s = idebus_active_if(bus);
> >>>>      /* ignore commands to non existent slave */
> >>>>      if (s != bus->ifs && !s->blk) {
> >>>>          return;
> >>>>      }
> >>>I do think that you are right.  I have now spent more time on why I am
> >>>seeing this.
> >>>
> >>>
> >>>>Even if I'm right, your patch is fine, because it makes this spot more
> >>>>obviously correct, and consistent with the other uses of
> >>>>blk_set_enable_write_cache().  The case for stable is weak, though.
> >>>>
> >>>I had not fully tracked down what is happening before sending the bugfix.
> >>>I have now done more debugging, and have tracked it down to xen 4.4
> >>>now using "-nodefaults" with QEMU.
> >>>
> >>>I needed to add output to QEMU to track this down because I have long
> >>>command lines...
> >>>
> >>>(all I get for ps -ef):
> >>[...]
> >>>Which is missing that option.
> >>>
> >>>The ide that was aborting in this case is the cdrom at hdc that is added
> >>>if you do not specify "-nodefaults".
> >>>
> >>>Since this is a "changed" machine config, I am no longer as sure as what
> >>>versions this needs to be in.
> >>>
> >>>If I put my QEMU hat on, it does not look like a back port is needed.
> >>>However
> >>>for xen it would be nice.
> >>>
> >>>I do not know how the QEMU community feels about migration from a config
> >>>without "-nodefaults" to one with "-nodefaults" as the only difference.
> >>So you have a CD-ROM on the source, but not on the destination?
> 
> Yes, QEMU in the source has an empty CD-ROM, but not on the destination.
> xen
> does not know that QEMU added a CD-ROM drive in hdc by default.
> 
> >>That can't work.  I guess it broke for you in an unusual way (target
> >>crashes) rather than the usual way (target rejects migration data for a
> >>device it doesn't have) due to our convoluted IDE data structures.  With
> >>your patch applied it should break the usual way.  Does it?
> 
> Nope. It does not break at all.  The migration "works".  The target accepts
> the hdc data.
> It looks like to me that all 4 IDE state data is sent.
> 
> 
> >>Management tools should use -nodefaults.  But if it mixes default and
> >>-nodefaults in migration, recreating the stuff it got by default but
> >>doesn't get with -nodefaults is its own responsibility.
> 
> Yes. Since xen did not ask for a CD-ROM, it did not expect to need to create
> it.   xen was "fixed" to use -nodefaults.
> 
> >Well, mostly - we wouldn't expect a migration to work if the source/dest
> >didn't match exactly; but QEMU shouldn't seg.
> 
> Well, this change prevents a seg.  So you are in favor of having this
> backported?

Yes.

Dave

> 
> 
>    -Don Slutz
> 
> >Dave
> >
> >--
> >Dr. David Alan Gilbert /dgilbert@redhat.com  / Manchester, UK
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Markus Armbruster Nov. 25, 2014, 11:11 a.m. UTC | #12
"Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:

> * Don Slutz (dslutz@verizon.com) wrote:
>> On 11/21/14 05:49, Dr. David Alan Gilbert wrote:
>> >* Markus Armbruster (armbru@redhat.com) wrote:
[...]
>> >>Management tools should use -nodefaults.  But if it mixes default and
>> >>-nodefaults in migration, recreating the stuff it got by default but
>> >>doesn't get with -nodefaults is its own responsibility.
>> 
>> Yes. Since xen did not ask for a CD-ROM, it did not expect to need to create
>> it.   xen was "fixed" to use -nodefaults.
>> 
>> >Well, mostly - we wouldn't expect a migration to work if the source/dest
>> >didn't match exactly; but QEMU shouldn't seg.
>> 
>> Well, this change prevents a seg.  So you are in favor of having this
>> backported?
>
> Yes.

I gladly defer to Dave's judgement here.
diff mbox

Patch

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 00e21cf..d4af5e2 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -2401,7 +2401,7 @@  static int ide_drive_post_load(void *opaque, int version_id)
 {
     IDEState *s = opaque;
 
-    if (s->identify_set) {
+    if (s->blk && s->identify_set) {
         blk_set_enable_write_cache(s->blk, !!(s->identify_data[85] & (1 << 5)));
     }
     return 0;