Message ID: 1274203124-14318-1-git-send-email-avi@redhat.com
State: New
On Tue, May 18, 2010 at 6:18 PM, Avi Kivity <avi@redhat.com> wrote:
> The block multiwrite code pretends to be able to merge overlapping requests,
> but doesn't do so in fact. This leads to I/O errors (for example on mkfs
> of a large virtio disk).

Are overlapping write requests correct guest behavior? I thought the ordering semantics require a flush between overlapping writes to ensure A is written before B.

What cache= mode are you running?

Stefan
I just caught up on mails and saw you had already mentioned that overlapping writes from the guest look fishy in the ">1Tb block issue" thread. Cache mode might still be interesting because it affects how guest virtio-blk chooses its queue ordering mode.

Stefan
18.05.2010 23:37, Stefan Hajnoczi wrote:
> I just caught up on mails and saw you had already mentioned that
> overlapping writes from the guest look fishy in the ">1Tb block
> issue". Cache mode might still be interesting because it affects how
> guest virtio-blk chooses queue ordering mode.

What I can say for sure is that the issue mentioned (>1Tb block) occurs regardless of the cache mode. I tried all three, with exactly the same results (well, not entirely the same, since the problem depends on timing too, and timing differs with the cache mode), and performed all further tests with cache=writeback since it's the mode that lets the guest finish mkfs'ing the 1.5Tb "disk" in a reasonable time (otherwise it takes hours). But actually I don't quite see where that dependency is: the guest does not know how its data is cached on the host...

/mjt
On Tue, May 18, 2010 at 8:41 PM, Michael Tokarev <mjt@tls.msk.ru> wrote:
> But actually I don't quite see where that dependency is: guest
> does not know how its data is cached on host...

I was thinking of this bit in linux-2.6:drivers/block/virtio_blk.c:

    /* If barriers are supported, tell block layer that queue is ordered */
    if (virtio_has_feature(vdev, VIRTIO_BLK_F_FLUSH))
        blk_queue_ordered(q, QUEUE_ORDERED_DRAIN_FLUSH,
                          virtblk_prepare_flush);
    else if (virtio_has_feature(vdev, VIRTIO_BLK_F_BARRIER))
        blk_queue_ordered(q, QUEUE_ORDERED_TAG, NULL);

I was wondering whether we have something wrong, causing the guest to perform overlapping write requests without draining the queue or flushing.

Stefan
On 05/18/2010 10:22 PM, Stefan Hajnoczi wrote:
> On Tue, May 18, 2010 at 6:18 PM, Avi Kivity <avi@redhat.com> wrote:
>> The block multiwrite code pretends to be able to merge overlapping requests,
>> but doesn't do so in fact. This leads to I/O errors (for example on mkfs
>> of a large virtio disk).
>
> Are overlapping write requests correct guest behavior? I thought the
> ordering semantics require a flush between overlapping writes to
> ensure A is written before B.
>
> What cache= mode are you running?

writeback.
On Wed, May 19, 2010 at 9:09 AM, Avi Kivity <avi@redhat.com> wrote:
> On 05/18/2010 10:22 PM, Stefan Hajnoczi wrote:
>> What cache= mode are you running?
>
> writeback.

In the cache=writeback case the virtio-blk guest driver does:

    blk_queue_ordered(q, QUEUE_ORDERED_DRAIN_FLUSH, ...)

Stefan
On 05/19/2010 12:01 PM, Stefan Hajnoczi wrote:
> On Wed, May 19, 2010 at 9:09 AM, Avi Kivity <avi@redhat.com> wrote:
>> On 05/18/2010 10:22 PM, Stefan Hajnoczi wrote:
>>> What cache= mode are you running?
>>
>> writeback.
>
> In the cache=writeback case the virtio-blk guest driver does:
>
>     blk_queue_ordered(q, QUEUE_ORDERED_DRAIN_FLUSH, ...)

I don't follow. What's the implication?

btw, I really dislike how the cache attribute (which I see as a pure host choice) is exposed to the guest. It means we can't change the caching mode on the fly (for example after live migration), and that changing the caching mode across a restart may expose a previously hidden guest bug.
On Wed, May 19, 2010 at 10:06 AM, Avi Kivity <avi@redhat.com> wrote:
>> In the cache=writeback case the virtio-blk guest driver does:
>>
>>     blk_queue_ordered(q, QUEUE_ORDERED_DRAIN_FLUSH, ...)
>
> I don't follow. What's the implication?

I was wondering whether the queue is incorrectly set to a mode where overlapping write requests aren't being ordered. Anyway, Christoph says overlapping write requests can be issued, so my theory is broken.

> btw I really dislike how the cache attribute (which I see as a pure host
> choice) is exposed to the guest. It means we can't change caching mode on
> the fly (for example after live migration), or that changing caching mode
> during a restart may expose a previously hidden guest bug.

Christoph has mentioned wanting to make the write cache switchable from inside the guest.

Stefan
On Wed, May 19, 2010 at 10:23:44AM +0100, Stefan Hajnoczi wrote:
> On Wed, May 19, 2010 at 10:06 AM, Avi Kivity <avi@redhat.com> wrote:
>>> In the cache=writeback case the virtio-blk guest driver does:
>>>
>>>     blk_queue_ordered(q, QUEUE_ORDERED_DRAIN_FLUSH, ...)
>>
>> I don't follow. What's the implication?
>
> I was wondering whether the queue is incorrectly set to a mode where
> overlapping write requests aren't being ordered. Anyway, Christoph
> says overlapping write requests can be issued so my theory is broken.

They can happen, but won't during usual pagecache-based writeback, so this should not happen for the pagecache-based mke2fs workload. It could happen for an mkfs that uses O_DIRECT, and we need to handle it correctly either way.

And btw, our barrier handling for devices using multiwrite (aka virtio) is utterly busted right now; a patch will follow ASAP.
diff --git a/block.c b/block.c
index 48305b7..0e44e26 100644
--- a/block.c
+++ b/block.c
@@ -1956,8 +1956,8 @@ static int multiwrite_merge(BlockDriverState *bs, BlockRequest *reqs,
         int64_t oldreq_last = reqs[outidx].sector + reqs[outidx].nb_sectors;
 
         // This handles the cases that are valid for all block drivers, namely
-        // exactly sequential writes and overlapping writes.
-        if (reqs[i].sector <= oldreq_last) {
+        // exactly sequential writes
+        if (reqs[i].sector == oldreq_last) {
             merge = 1;
         }
The block multiwrite code pretends to be able to merge overlapping requests, but doesn't do so in fact. This leads to I/O errors (for example on mkfs of a large virtio disk).

Signed-off-by: Avi Kivity <avi@redhat.com>
---
 block.c | 4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)