Message ID | 1369729127-24499-1-git-send-email-afaerber@suse.de |
---|---|
State | New |
On 28.05.2013 at 10:18, Andreas Färber wrote:
> The implementation of the ATA FLUSH command invokes a flush at the block
> layer, which may on raw files on POSIX entail a synchronous fdatasync().
> This may in some cases take so long that the SLES 11 SP1 guest driver
> reports I/O errors and filesystems get corrupted or remounted read-only.
>
> Avoid this by setting BUSY_STAT, so that the guest is made aware we are
> in the middle of an operation and no ATA commands are attempted to be
> processed concurrently.
>
> Addresses BNC#637297.
>
> Suggested-by: Gonglei (Arei) <arei.gonglei@huawei.com>
> Signed-off-by: Andreas Färber <afaerber@suse.de>
> ---
>  hw/ide/core.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/hw/ide/core.c b/hw/ide/core.c
> index c7a8041..bf1ff18 100644
> --- a/hw/ide/core.c
> +++ b/hw/ide/core.c
> @@ -795,6 +795,8 @@ static void ide_flush_cb(void *opaque, int ret)
>  {
>      IDEState *s = opaque;
>
> +    s->status &= ~BUSY_STAT;
> +

This part is unnecessary, the status is already reset.

>      if (ret < 0) {
>          /* XXX: What sector number to set here? */
>          if (ide_handle_rw_error(s, -ret, BM_STATUS_RETRY_FLUSH)) {
> @@ -814,6 +816,7 @@ void ide_flush_cache(IDEState *s)
>          return;
>      }
>
> +    s->status |= BUSY_STAT;
>      bdrv_acct_start(s->bs, &s->acct, 0, BDRV_ACCT_FLUSH);
>      bdrv_aio_flush(s->bs, ide_flush_cb, s);
>  }

This should fix the bug, however in a one-off way. I was planning to
fix it by setting BSY for all commands and having an explicit command
completion everywhere. This part is a mess currently in IDE.

The other part why I haven't sent a fix yet is that I don't have a test
case for it. I guess I need to extend blkdebug first before this can be
reliably tested by qtest.

Kevin
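The effect of the patch on the status register can be illustrated with a small standalone model. This is only a sketch, not QEMU code: `MockIDEState` and the `mock_*` functions are hypothetical stand-ins, the bit values are the standard ATA status bits, and the completion path is assumed to overwrite the whole register with `READY_STAT | SEEK_STAT`, which is one reading of the remark that the status is already reset.

```c
#include <assert.h>
#include <stdint.h>

/* Standard ATA status register bits. */
#define BUSY_STAT  0x80
#define READY_STAT 0x40
#define SEEK_STAT  0x10

/* Hypothetical stand-in for IDEState; only the status register is modelled. */
typedef struct MockIDEState {
    uint8_t status;
    int flush_in_flight;
} MockIDEState;

/* Models the start of a flush with the patch applied: BSY is raised
 * before the asynchronous flush is handed to the block layer. */
static void mock_flush_start(MockIDEState *s)
{
    s->status |= BUSY_STAT;
    s->flush_in_flight = 1;
}

/* Models successful completion (assumption: the callback assigns the
 * whole register, so BSY disappears without an explicit "&= ~BUSY_STAT"). */
static void mock_flush_complete(MockIDEState *s)
{
    assert(s->flush_in_flight);
    s->flush_in_flight = 0;
    s->status = READY_STAT | SEEK_STAT;
}

/* A well-behaved guest driver polls BSY before issuing the next command. */
static int mock_guest_may_issue_command(const MockIDEState *s)
{
    return (s->status & BUSY_STAT) == 0;
}
```

In this model a guest that polls `mock_guest_may_issue_command()` is held off between `mock_flush_start()` and `mock_flush_complete()`, which is the behaviour the commit message asks for.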
On 28.05.2013 10:27, Kevin Wolf wrote:
> On 28.05.2013 at 10:18, Andreas Färber wrote:
>> The implementation of the ATA FLUSH command invokes a flush at the block
>> layer, which may on raw files on POSIX entail a synchronous fdatasync().
>> This may in some cases take so long that the SLES 11 SP1 guest driver
>> reports I/O errors and filesystems get corrupted or remounted read-only.
>>
>> Avoid this by setting BUSY_STAT, so that the guest is made aware we are
>> in the middle of an operation and no ATA commands are attempted to be
>> processed concurrently.
>>
>> Addresses BNC#637297.
>>
>> Suggested-by: Gonglei (Arei) <arei.gonglei@huawei.com>
>> Signed-off-by: Andreas Färber <afaerber@suse.de>
>> ---
>>  hw/ide/core.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/hw/ide/core.c b/hw/ide/core.c
>> index c7a8041..bf1ff18 100644
>> --- a/hw/ide/core.c
>> +++ b/hw/ide/core.c
>> @@ -795,6 +795,8 @@ static void ide_flush_cb(void *opaque, int ret)
>>  {
>>      IDEState *s = opaque;
>>
>> +    s->status &= ~BUSY_STAT;
>> +
>
> This part is unnecessary, the status is already reset.

Only in the ret >= 0 case though AFAICS?

>>      if (ret < 0) {
>>          /* XXX: What sector number to set here? */
>>          if (ide_handle_rw_error(s, -ret, BM_STATUS_RETRY_FLUSH)) {
>> @@ -814,6 +816,7 @@ void ide_flush_cache(IDEState *s)
>>          return;
>>      }
>>
>> +    s->status |= BUSY_STAT;
>>      bdrv_acct_start(s->bs, &s->acct, 0, BDRV_ACCT_FLUSH);
>>      bdrv_aio_flush(s->bs, ide_flush_cb, s);
>>  }
>
> This should fix the bug, however in a one-off way. I was planning to
> fix it by setting BSY for all commands and having an explicit command
> completion everywhere. This part is a mess currently in IDE.

That's a valid idea, but I had backporting to 0.15 in mind. ;)
And doh, I forgot qemu-stable.

> The other part why I haven't sent a fix yet is that I don't have a test
> case for it.

Temporarily add a sleep(31) in qemu_fdatasync()?

I was lazy in testing with -snapshot to not corrupt my disk image, which
would not trigger the same issue since qcow2-backed AFAIU.

> I guess I need to extend blkdebug first before this can be
> reliably tested by qtest.

It can't, since it's not a pure device emulation issue but depends on
the relative timing of filesystem operations and subsequent commands.

Andreas
On 28.05.2013 at 10:46, Andreas Färber wrote:
> On 28.05.2013 10:27, Kevin Wolf wrote:
> > On 28.05.2013 at 10:18, Andreas Färber wrote:
> >> The implementation of the ATA FLUSH command invokes a flush at the block
> >> layer, which may on raw files on POSIX entail a synchronous fdatasync().
> >> This may in some cases take so long that the SLES 11 SP1 guest driver
> >> reports I/O errors and filesystems get corrupted or remounted read-only.
> >>
> >> Avoid this by setting BUSY_STAT, so that the guest is made aware we are
> >> in the middle of an operation and no ATA commands are attempted to be
> >> processed concurrently.
> >>
> >> Addresses BNC#637297.
> >>
> >> Suggested-by: Gonglei (Arei) <arei.gonglei@huawei.com>
> >> Signed-off-by: Andreas Färber <afaerber@suse.de>
> >> ---
> >>  hw/ide/core.c | 3 +++
> >>  1 file changed, 3 insertions(+)
> >>
> >> diff --git a/hw/ide/core.c b/hw/ide/core.c
> >> index c7a8041..bf1ff18 100644
> >> --- a/hw/ide/core.c
> >> +++ b/hw/ide/core.c
> >> @@ -795,6 +795,8 @@ static void ide_flush_cb(void *opaque, int ret)
> >>  {
> >>      IDEState *s = opaque;
> >>
> >> +    s->status &= ~BUSY_STAT;
> >> +
> >
> > This part is unnecessary, the status is already reset.
>
> Only in the ret >= 0 case though AFAICS?

ide_handle_rw_error() takes care of resetting the status as well, except
when the VM is stopped. But then it will be immediately set again when
the VM is continued and the request is restarted.

So the semantic difference is just whether BSY would be set or not when
you somehow inspect the state while the VM is stopped after an I/O
error.

> >>      if (ret < 0) {
> >>          /* XXX: What sector number to set here? */
> >>          if (ide_handle_rw_error(s, -ret, BM_STATUS_RETRY_FLUSH)) {
> >> @@ -814,6 +816,7 @@ void ide_flush_cache(IDEState *s)
> >>          return;
> >>      }
> >>
> >> +    s->status |= BUSY_STAT;
> >>      bdrv_acct_start(s->bs, &s->acct, 0, BDRV_ACCT_FLUSH);
> >>      bdrv_aio_flush(s->bs, ide_flush_cb, s);
> >>  }
> >
> > This should fix the bug, however in a one-off way. I was planning to
> > fix it by setting BSY for all commands and having an explicit command
> > completion everywhere. This part is a mess currently in IDE.
>
> That's a valid idea, but I had backporting to 0.15 in mind. ;)
> And doh, I forgot qemu-stable.

Fair enough, we can merge something like this first and do the real
thing on top. Though nobody will be interested in the real thing any
more, as usual... :-/

> > The other part why I haven't sent a fix yet is that I don't have a test
> > case for it.
>
> Temporarily add a sleep(31) in qemu_fdatasync()?
>
> I was lazy in testing with -snapshot to not corrupt my disk image, which
> would not trigger the same issue since qcow2-backed AFAIU.
>
> > I guess I need to extend blkdebug first before this can be
> > reliably tested by qtest.
>
> It can't, since it's not a pure device emulation issue but depends on
> the relative timing of filesystem operations and subsequent commands.

That's why you need to take influence on the timing. It's no excuse for
merging without a test case. If we only ever tested devices that have no
relation to the outside world, our testing would be pretty useless and
always stay as bad as it is today in many areas.

Kevin
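The error-path argument above can be made concrete with a toy model. Everything here is an assumption for illustration: `mock_handle_rw_error()` is a hypothetical simplification that either leaves the register alone (VM stopped, request retried later) or overwrites it with `READY_STAT | ERR_STAT` (error reported to the guest). It is not the real `ide_handle_rw_error()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Standard ATA status register bits. */
#define BUSY_STAT  0x80
#define READY_STAT 0x40
#define ERR_STAT   0x01

/* Hypothetical stand-in for IDEState. */
typedef struct MockIDEState {
    uint8_t status;
} MockIDEState;

/* Simplified model of the error handler. Returns true when the VM is
 * stopped and the request will be restarted on resume; in that case the
 * status register is left untouched. */
static bool mock_handle_rw_error(MockIDEState *s, bool stop_vm)
{
    if (stop_vm) {
        return true;
    }
    s->status = READY_STAT | ERR_STAT; /* error reported, BSY gone */
    return false;
}

/* Status seen after a failed flush, with or without the explicit
 * "status &= ~BUSY_STAT" at the top of the completion callback. */
static uint8_t mock_status_after_failed_flush(bool stop_vm,
                                              bool clear_in_callback)
{
    MockIDEState s = { READY_STAT };

    s.status |= BUSY_STAT;               /* flush issued */
    if (clear_in_callback) {
        s.status &= (uint8_t)~BUSY_STAT; /* the disputed hunk */
    }
    mock_handle_rw_error(&s, stop_vm);
    return s.status;
}
```

Under these assumptions the two variants differ only when the VM is stopped: without the explicit clear, BSY stays visible to anyone inspecting the stopped state; with it, BSY is cleared until the request is restarted.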
On 28/05/2013 11:18, Kevin Wolf wrote:
>>> The other part why I haven't sent a fix yet is that I don't have a test
>>> case for it.
>>
>> Temporarily add a sleep(31) in qemu_fdatasync()?
>>
>> I was lazy in testing with -snapshot to not corrupt my disk image, which
>> would not trigger the same issue since qcow2-backed AFAIU.
>>
>>> I guess I need to extend blkdebug first before this can be
>>> reliably tested by qtest.
>>
>> It can't, since it's not a pure device emulation issue but depends on
>> the relative timing of filesystem operations and subsequent commands.
>
> That's why you need to take influence on the timing. It's no excuse for
> merging without a test case. If we only ever tested devices that have no
> relation to the outside world, our testing would be pretty useless and
> always stay as bad as it is today in many areas.

I don't think the qtest would be timing dependent. The Linux testcase
is timing dependent, but for the qtest all you need to check is "is BUSY
set during a flush?". This can be done with blkdebug suspend/resume,
except that there is no way to call bdrv_debug_resume from QEMU.

Paolo
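The shape of such a timing-independent check can be sketched in plain C. This is a mock, not a qtest and not the blkdebug API: `MockBackend`, `mock_aio_flush()` and `mock_resume()` are hypothetical names that only imitate a backend able to hold a flush request until it is explicitly resumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Standard ATA status register bits. */
#define BUSY_STAT  0x80
#define READY_STAT 0x40
#define SEEK_STAT  0x10

typedef void MockFlushCallback(void *opaque, int ret);

/* Hypothetical backend that can park one flush request. */
typedef struct MockBackend {
    bool suspend_flush;
    MockFlushCallback *pending_cb;
    void *pending_opaque;
} MockBackend;

/* Hypothetical stand-in for IDEState. */
typedef struct MockIDEState {
    uint8_t status;
    MockBackend *backend;
} MockIDEState;

/* Completes immediately unless the backend is told to suspend flushes. */
static void mock_aio_flush(MockBackend *b, MockFlushCallback *cb, void *opaque)
{
    if (b->suspend_flush) {
        b->pending_cb = cb;
        b->pending_opaque = opaque;
        return;
    }
    cb(opaque, 0);
}

/* Releases a parked flush, if any, by invoking its completion callback. */
static void mock_resume(MockBackend *b)
{
    MockFlushCallback *cb = b->pending_cb;

    b->suspend_flush = false;
    b->pending_cb = NULL;
    if (cb) {
        cb(b->pending_opaque, 0);
    }
}

/* Completion callback: assumed to reset the whole status register. */
static void mock_flush_cb(void *opaque, int ret)
{
    MockIDEState *s = opaque;

    (void)ret;
    s->status = READY_STAT | SEEK_STAT;
}

/* Flush with the patch applied: raise BSY, then submit. */
static void mock_flush_cache(MockIDEState *s)
{
    s->status |= BUSY_STAT;
    mock_aio_flush(s->backend, mock_flush_cb, s);
}

/* The check itself: BSY must be set while the flush is parked and
 * clear again once it has been resumed. No sleeping is involved. */
static bool mock_busy_during_flush(void)
{
    MockBackend b = { true, NULL, NULL };
    MockIDEState s = { READY_STAT | SEEK_STAT, &b };
    bool busy_while_suspended;

    mock_flush_cache(&s);
    busy_while_suspended = (s.status & BUSY_STAT) != 0;
    mock_resume(&b);
    return busy_while_suspended && (s.status & BUSY_STAT) == 0;
}
```

A test would then reduce to `assert(mock_busy_during_flush());`. Because the request is held and released explicitly, the outcome does not depend on how long a real fdatasync() takes.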
On 28.05.2013 at 11:24, Paolo Bonzini wrote:
> On 28/05/2013 11:18, Kevin Wolf wrote:
> >>> The other part why I haven't sent a fix yet is that I don't have a test
> >>> case for it.
> >>
> >> Temporarily add a sleep(31) in qemu_fdatasync()?
> >>
> >> I was lazy in testing with -snapshot to not corrupt my disk image, which
> >> would not trigger the same issue since qcow2-backed AFAIU.
> >>
> >>> I guess I need to extend blkdebug first before this can be
> >>> reliably tested by qtest.
> >>
> >> It can't, since it's not a pure device emulation issue but depends on
> >> the relative timing of filesystem operations and subsequent commands.
> >
> > That's why you need to take influence on the timing. It's no excuse for
> > merging without a test case. If we only ever tested devices that have no
> > relation to the outside world, our testing would be pretty useless and
> > always stay as bad as it is today in many areas.
>
> I don't think the qtest would be timing dependent. The Linux testcase
> is timing dependent, but for the qtest all you need to check is "is BUSY
> set during a flush?". This can be done with blkdebug suspend/resume,
> except that there is no way to call bdrv_debug_resume from QEMU.

That's exactly what I was talking about, suspending a request is taking
influence on its timing. I'm looking into this right now. (And it's not
just resume, bdrv_debug_suspend can't be called from QEMU either)

In fact, I'm checking whether we can have a monitor command to issue
qemu-io commands, which will be more generally useful for test cases. We
just need to make it obvious that it doesn't become an ABI. Maybe prefix
it with "__org.qemu.debug-" or something like that.

Kevin
On 28/05/2013 11:36, Kevin Wolf wrote:
> On 28.05.2013 at 11:24, Paolo Bonzini wrote:
>> On 28/05/2013 11:18, Kevin Wolf wrote:
>>>>> The other part why I haven't sent a fix yet is that I don't have a test
>>>>> case for it.
>>>>
>>>> Temporarily add a sleep(31) in qemu_fdatasync()?
>>>>
>>>> I was lazy in testing with -snapshot to not corrupt my disk image, which
>>>> would not trigger the same issue since qcow2-backed AFAIU.
>>>>
>>>>> I guess I need to extend blkdebug first before this can be
>>>>> reliably tested by qtest.
>>>>
>>>> It can't, since it's not a pure device emulation issue but depends on
>>>> the relative timing of filesystem operations and subsequent commands.
>>>
>>> That's why you need to take influence on the timing. It's no excuse for
>>> merging without a test case. If we only ever tested devices that have no
>>> relation to the outside world, our testing would be pretty useless and
>>> always stay as bad as it is today in many areas.
>>
>> I don't think the qtest would be timing dependent. The Linux testcase
>> is timing dependent, but for the qtest all you need to check is "is BUSY
>> set during a flush?". This can be done with blkdebug suspend/resume,
>> except that there is no way to call bdrv_debug_resume from QEMU.
>
> That's exactly what I was talking about, suspending a request is taking
> influence on its timing. I'm looking into this right now. (And it's not
> just resume, bdrv_debug_suspend can't be called from QEMU either)

It can be called from the rules file though, can't it?

> In fact, I'm checking whether we can have a monitor command to issue
> qemu-io commands, which will be more generally useful for test cases. We
> just need to make it obvious that it doesn't become an ABI. Maybe prefix
> it with "__org.qemu.debug-" or something like that.

Makes sense. I'm not sure why you'd want to read or write from
testcases, but bdrv_drain(_all) can also be useful from testcases.

Paolo
On 28.05.2013 at 11:48, Paolo Bonzini wrote:
> On 28/05/2013 11:36, Kevin Wolf wrote:
> > On 28.05.2013 at 11:24, Paolo Bonzini wrote:
> >> On 28/05/2013 11:18, Kevin Wolf wrote:
> >>>>> The other part why I haven't sent a fix yet is that I don't have a test
> >>>>> case for it.
> >>>>
> >>>> Temporarily add a sleep(31) in qemu_fdatasync()?
> >>>>
> >>>> I was lazy in testing with -snapshot to not corrupt my disk image, which
> >>>> would not trigger the same issue since qcow2-backed AFAIU.
> >>>>
> >>>>> I guess I need to extend blkdebug first before this can be
> >>>>> reliably tested by qtest.
> >>>>
> >>>> It can't, since it's not a pure device emulation issue but depends on
> >>>> the relative timing of filesystem operations and subsequent commands.
> >>>
> >>> That's why you need to take influence on the timing. It's no excuse for
> >>> merging without a test case. If we only ever tested devices that have no
> >>> relation to the outside world, our testing would be pretty useless and
> >>> always stay as bad as it is today in many areas.
> >>
> >> I don't think the qtest would be timing dependent. The Linux testcase
> >> is timing dependent, but for the qtest all you need to check is "is BUSY
> >> set during a flush?". This can be done with blkdebug suspend/resume,
> >> except that there is no way to call bdrv_debug_resume from QEMU.
> >
> > That's exactly what I was talking about, suspending a request is taking
> > influence on its timing. I'm looking into this right now. (And it's not
> > just resume, bdrv_debug_suspend can't be called from QEMU either)
>
> It can be called from the rules file though, can't it?

No, you can only define ACTION_INJECT_ERROR and ACTION_SET_STATE from
the config file, but not ACTION_SUSPEND. Maybe we should add it, but it
would still require a manual resume. So far all test cases suspend
requests with explicit qemu-io commands.

> > In fact, I'm checking whether we can have a monitor command to issue
> > qemu-io commands, which will be more generally useful for test cases. We
> > just need to make it obvious that it doesn't become an ABI. Maybe prefix
> > it with "__org.qemu.debug-" or something like that.
>
> Makes sense. I'm not sure why you'd want to read or write from
> testcases, but bdrv_drain(_all) can also be useful from testcases.

I imagine writing could be very useful for block job test cases.

Kevin
diff --git a/hw/ide/core.c b/hw/ide/core.c
index c7a8041..bf1ff18 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -795,6 +795,8 @@ static void ide_flush_cb(void *opaque, int ret)
 {
     IDEState *s = opaque;
 
+    s->status &= ~BUSY_STAT;
+
     if (ret < 0) {
         /* XXX: What sector number to set here? */
         if (ide_handle_rw_error(s, -ret, BM_STATUS_RETRY_FLUSH)) {
@@ -814,6 +816,7 @@ void ide_flush_cache(IDEState *s)
         return;
     }
 
+    s->status |= BUSY_STAT;
     bdrv_acct_start(s->bs, &s->acct, 0, BDRV_ACCT_FLUSH);
     bdrv_aio_flush(s->bs, ide_flush_cb, s);
 }
The implementation of the ATA FLUSH command invokes a flush at the block
layer, which may on raw files on POSIX entail a synchronous fdatasync().
This may in some cases take so long that the SLES 11 SP1 guest driver
reports I/O errors and filesystems get corrupted or remounted read-only.

Avoid this by setting BUSY_STAT, so that the guest is made aware we are
in the middle of an operation and no ATA commands are attempted to be
processed concurrently.

Addresses BNC#637297.

Suggested-by: Gonglei (Arei) <arei.gonglei@huawei.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
---
 hw/ide/core.c | 3 +++
 1 file changed, 3 insertions(+)