diff mbox

[09/26] ide: Set BSY bit during FLUSH

Message ID 1371073298-14519-10-git-send-email-mdroth@linux.vnet.ibm.com
State New
Headers show

Commit Message

Michael Roth June 12, 2013, 9:41 p.m. UTC
From: Andreas Färber <afaerber@suse.de>

The implementation of the ATA FLUSH command invokes a flush at the block
layer, which may on raw files on POSIX entail a synchronous fdatasync().
This may in some cases take so long that the SLES 11 SP1 guest driver
reports I/O errors and filesystems get corrupted or remounted read-only.

Avoid this by setting BUSY_STAT, so that the guest is made aware we are
in the middle of an operation and no ATA commands are attempted to be
processed concurrently.

Addresses BNC#637297.

Suggested-by: Gonglei (Arei) <arei.gonglei@huawei.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit f68ec8379e88502b4841a110c070e9b118d3151c)

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/ide/core.c |    1 +
 1 file changed, 1 insertion(+)

Comments

Alex Williamson July 3, 2013, 8:10 p.m. UTC | #1
On Wed, 2013-06-12 at 16:41 -0500, Michael Roth wrote:
> From: Andreas Färber <afaerber@suse.de>
> 
> The implementation of the ATA FLUSH command invokes a flush at the block
> layer, which may on raw files on POSIX entail a synchronous fdatasync().
> This may in some cases take so long that the SLES 11 SP1 guest driver
> reports I/O errors and filesystems get corrupted or remounted read-only.
> 
> Avoid this by setting BUSY_STAT, so that the guest is made aware we are
> in the middle of an operation and no ATA commands are attempted to be
> processed concurrently.
> 
> Addresses BNC#637297.
> 
> Suggested-by: Gonglei (Arei) <arei.gonglei@huawei.com>
> Signed-off-by: Andreas Färber <afaerber@suse.de>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> (cherry picked from commit f68ec8379e88502b4841a110c070e9b118d3151c)
> 
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---
>  hw/ide/core.c |    1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/ide/core.c b/hw/ide/core.c
> index c7a8041..9926d92 100644
> --- a/hw/ide/core.c
> +++ b/hw/ide/core.c
> @@ -814,6 +814,7 @@ void ide_flush_cache(IDEState *s)
>          return;
>      }
>  
> +    s->status |= BUSY_STAT;
>      bdrv_acct_start(s->bs, &s->acct, 0, BDRV_ACCT_FLUSH);
>      bdrv_aio_flush(s->bs, ide_flush_cb, s);
>  }

This caused a regression for me from 1.5.0->1.5.1.  Windows 7 x64 no
longer boots on q35 IDE with this change.  Thanks,

Alex
Michael Roth July 3, 2013, 8:16 p.m. UTC | #2
On Wed, Jul 3, 2013 at 3:10 PM, Alex Williamson
<alex.williamson@redhat.com> wrote:
> On Wed, 2013-06-12 at 16:41 -0500, Michael Roth wrote:
>> From: Andreas Färber <afaerber@suse.de>
>>
>> The implementation of the ATA FLUSH command invokes a flush at the block
>> layer, which may on raw files on POSIX entail a synchronous fdatasync().
>> This may in some cases take so long that the SLES 11 SP1 guest driver
>> reports I/O errors and filesystems get corrupted or remounted read-only.
>>
>> Avoid this by setting BUSY_STAT, so that the guest is made aware we are
>> in the middle of an operation and no ATA commands are attempted to be
>> processed concurrently.
>>
>> Addresses BNC#637297.
>>
>> Suggested-by: Gonglei (Arei) <arei.gonglei@huawei.com>
>> Signed-off-by: Andreas Färber <afaerber@suse.de>
>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>> (cherry picked from commit f68ec8379e88502b4841a110c070e9b118d3151c)
>>
>> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
>> ---
>>  hw/ide/core.c |    1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/hw/ide/core.c b/hw/ide/core.c
>> index c7a8041..9926d92 100644
>> --- a/hw/ide/core.c
>> +++ b/hw/ide/core.c
>> @@ -814,6 +814,7 @@ void ide_flush_cache(IDEState *s)
>>          return;
>>      }
>>
>> +    s->status |= BUSY_STAT;
>>      bdrv_acct_start(s->bs, &s->acct, 0, BDRV_ACCT_FLUSH);
>>      bdrv_aio_flush(s->bs, ide_flush_cb, s);
>>  }
>
> This caused a regression for me from 1.5.0->1.5.1.  Windows 7 x64 no
> longer boots on q35 IDE with this change.  Thanks,

Are you seeing the issue for upstream builds as well?

>
> Alex
>
Alex Williamson July 3, 2013, 9:51 p.m. UTC | #3
On Wed, 2013-07-03 at 15:16 -0500, Michael Roth wrote:
> On Wed, Jul 3, 2013 at 3:10 PM, Alex Williamson
> <alex.williamson@redhat.com> wrote:
> > On Wed, 2013-06-12 at 16:41 -0500, Michael Roth wrote:
> >> From: Andreas Färber <afaerber@suse.de>
> >>
> >> The implementation of the ATA FLUSH command invokes a flush at the block
> >> layer, which may on raw files on POSIX entail a synchronous fdatasync().
> >> This may in some cases take so long that the SLES 11 SP1 guest driver
> >> reports I/O errors and filesystems get corrupted or remounted read-only.
> >>
> >> Avoid this by setting BUSY_STAT, so that the guest is made aware we are
> >> in the middle of an operation and no ATA commands are attempted to be
> >> processed concurrently.
> >>
> >> Addresses BNC#637297.
> >>
> >> Suggested-by: Gonglei (Arei) <arei.gonglei@huawei.com>
> >> Signed-off-by: Andreas Färber <afaerber@suse.de>
> >> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> >> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> >> (cherry picked from commit f68ec8379e88502b4841a110c070e9b118d3151c)
> >>
> >> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> >> ---
> >>  hw/ide/core.c |    1 +
> >>  1 file changed, 1 insertion(+)
> >>
> >> diff --git a/hw/ide/core.c b/hw/ide/core.c
> >> index c7a8041..9926d92 100644
> >> --- a/hw/ide/core.c
> >> +++ b/hw/ide/core.c
> >> @@ -814,6 +814,7 @@ void ide_flush_cache(IDEState *s)
> >>          return;
> >>      }
> >>
> >> +    s->status |= BUSY_STAT;
> >>      bdrv_acct_start(s->bs, &s->acct, 0, BDRV_ACCT_FLUSH);
> >>      bdrv_aio_flush(s->bs, ide_flush_cb, s);
> >>  }
> >
> > This caused a regression for me from 1.5.0->1.5.1.  Windows 7 x64 no
> > longer boots on q35 IDE with this change.  Thanks,
> 
> Are you seeing the issue for upstream builds as well?

Yes, I bisected this on upstream.  If i revert just this from 1.5.1, I
can boot again.  Upstream requires reverting this and a workaround for
9afce429.  Thanks,

Alex
Michael Roth Aug. 12, 2013, 10:43 p.m. UTC | #4
Quoting Alex Williamson (2013-07-03 16:51:56)
> On Wed, 2013-07-03 at 15:16 -0500, Michael Roth wrote:
> > On Wed, Jul 3, 2013 at 3:10 PM, Alex Williamson
> > <alex.williamson@redhat.com> wrote:
> > > On Wed, 2013-06-12 at 16:41 -0500, Michael Roth wrote:
> > >> From: Andreas Färber <afaerber@suse.de>
> > >>
> > >> The implementation of the ATA FLUSH command invokes a flush at the block
> > >> layer, which may on raw files on POSIX entail a synchronous fdatasync().
> > >> This may in some cases take so long that the SLES 11 SP1 guest driver
> > >> reports I/O errors and filesystems get corrupted or remounted read-only.
> > >>
> > >> Avoid this by setting BUSY_STAT, so that the guest is made aware we are
> > >> in the middle of an operation and no ATA commands are attempted to be
> > >> processed concurrently.
> > >>
> > >> Addresses BNC#637297.
> > >>
> > >> Suggested-by: Gonglei (Arei) <arei.gonglei@huawei.com>
> > >> Signed-off-by: Andreas Färber <afaerber@suse.de>
> > >> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > >> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > >> (cherry picked from commit f68ec8379e88502b4841a110c070e9b118d3151c)
> > >>
> > >> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > >> ---
> > >>  hw/ide/core.c |    1 +
> > >>  1 file changed, 1 insertion(+)
> > >>
> > >> diff --git a/hw/ide/core.c b/hw/ide/core.c
> > >> index c7a8041..9926d92 100644
> > >> --- a/hw/ide/core.c
> > >> +++ b/hw/ide/core.c
> > >> @@ -814,6 +814,7 @@ void ide_flush_cache(IDEState *s)
> > >>          return;
> > >>      }
> > >>
> > >> +    s->status |= BUSY_STAT;
> > >>      bdrv_acct_start(s->bs, &s->acct, 0, BDRV_ACCT_FLUSH);
> > >>      bdrv_aio_flush(s->bs, ide_flush_cb, s);
> > >>  }
> > >
> > > This caused a regression for me from 1.5.0->1.5.1.  Windows 7 x64 no
> > > longer boots on q35 IDE with this change.  Thanks,
> > 
> > Are you seeing the issue for upstream builds as well?
> 
> Yes, I bisected this on upstream.  If i revert just this from 1.5.1, I
> can boot again.  Upstream requires reverting this and a workaround for
> 9afce429.  Thanks,

Does the problem still exist upstream? I'd rather not revert without a
fix for the issue in 9afce429 since that might cause a regression for 1.5.1
users now.

Sucks either way but I think it's safer to have Win7 not boot on q35 than
re-introduce potential image corruption.

> 
> Alex
Michael Roth Aug. 13, 2013, 12:12 a.m. UTC | #5
Quoting Michael Roth (2013-08-12 17:43:00)
> Quoting Alex Williamson (2013-07-03 16:51:56)
> > On Wed, 2013-07-03 at 15:16 -0500, Michael Roth wrote:
> > > On Wed, Jul 3, 2013 at 3:10 PM, Alex Williamson
> > > <alex.williamson@redhat.com> wrote:
> > > > On Wed, 2013-06-12 at 16:41 -0500, Michael Roth wrote:
> > > >> From: Andreas Färber <afaerber@suse.de>
> > > >>
> > > >> The implementation of the ATA FLUSH command invokes a flush at the block
> > > >> layer, which may on raw files on POSIX entail a synchronous fdatasync().
> > > >> This may in some cases take so long that the SLES 11 SP1 guest driver
> > > >> reports I/O errors and filesystems get corrupted or remounted read-only.
> > > >>
> > > >> Avoid this by setting BUSY_STAT, so that the guest is made aware we are
> > > >> in the middle of an operation and no ATA commands are attempted to be
> > > >> processed concurrently.
> > > >>
> > > >> Addresses BNC#637297.
> > > >>
> > > >> Suggested-by: Gonglei (Arei) <arei.gonglei@huawei.com>
> > > >> Signed-off-by: Andreas Färber <afaerber@suse.de>
> > > >> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > > >> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > >> (cherry picked from commit f68ec8379e88502b4841a110c070e9b118d3151c)
> > > >>
> > > >> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > > >> ---
> > > >>  hw/ide/core.c |    1 +
> > > >>  1 file changed, 1 insertion(+)
> > > >>
> > > >> diff --git a/hw/ide/core.c b/hw/ide/core.c
> > > >> index c7a8041..9926d92 100644
> > > >> --- a/hw/ide/core.c
> > > >> +++ b/hw/ide/core.c
> > > >> @@ -814,6 +814,7 @@ void ide_flush_cache(IDEState *s)
> > > >>          return;
> > > >>      }
> > > >>
> > > >> +    s->status |= BUSY_STAT;
> > > >>      bdrv_acct_start(s->bs, &s->acct, 0, BDRV_ACCT_FLUSH);
> > > >>      bdrv_aio_flush(s->bs, ide_flush_cb, s);
> > > >>  }
> > > >
> > > > This caused a regression for me from 1.5.0->1.5.1.  Windows 7 x64 no
> > > > longer boots on q35 IDE with this change.  Thanks,
> > > 
> > > Are you seeing the issue for upstream builds as well?
> > 
> > Yes, I bisected this on upstream.  If i revert just this from 1.5.1, I
> > can boot again.  Upstream requires reverting this and a workaround for
> > 9afce429.  Thanks,
> 
> Does the problem still exist upstream? I'd rather not revert without a
> fix for the issue in 9afce429 since that might cause a regression for 1.5.1
> users now.

Ok, looks like we have a fix for this that was already sent to stable:

  a62eaa26c1d6d48fbdc3ac1d32bd1314f5fdc8c9

Could've sworn I searched master for a reference to f68ec837...

Will pull it in for 1.5.3

> 
> Sucks either way but I think it's safer to have Win7 not boot on q35 than
> re-introduce potential image corruption.
> 
> > 
> > Alex
diff mbox

Patch

diff --git a/hw/ide/core.c b/hw/ide/core.c
index c7a8041..9926d92 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -814,6 +814,7 @@  void ide_flush_cache(IDEState *s)
         return;
     }
 
+    s->status |= BUSY_STAT;
     bdrv_acct_start(s->bs, &s->acct, 0, BDRV_ACCT_FLUSH);
     bdrv_aio_flush(s->bs, ide_flush_cb, s);
 }