Message ID: 1334595957-12552-1-git-send-email-stefano.stabellini@eu.citrix.com
State: New
On Mon, Apr 16, 2012 at 06:05:57PM +0100, Stefano Stabellini wrote:
> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

> ---
>  hw/xen_disk.c |    8 ++++----
>  1 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/hw/xen_disk.c b/hw/xen_disk.c
> index 015d2af..3e4a47b 100644
> --- a/hw/xen_disk.c
> +++ b/hw/xen_disk.c
> @@ -183,12 +183,12 @@ static int ioreq_parse(struct ioreq *ioreq)
>      case BLKIF_OP_READ:
>          ioreq->prot = PROT_WRITE; /* to memory */
>          break;
> -    case BLKIF_OP_WRITE_BARRIER:
> +    case BLKIF_OP_FLUSH_DISKCACHE:
>          if (!ioreq->req.nr_segments) {
>              ioreq->presync = 1;
>              return 0;
>          }
> -        ioreq->presync = ioreq->postsync = 1;
> +        ioreq->postsync = 1;
>          /* fall through */
>      case BLKIF_OP_WRITE:
>          ioreq->prot = PROT_READ; /* from memory */
> @@ -354,7 +354,7 @@ static int ioreq_runio_qemu_aio(struct ioreq *ioreq)
>                         qemu_aio_complete, ioreq);
>          break;
>      case BLKIF_OP_WRITE:
> -    case BLKIF_OP_WRITE_BARRIER:
> +    case BLKIF_OP_FLUSH_DISKCACHE:
>          if (!ioreq->req.nr_segments) {
>              break;
>          }
> @@ -627,7 +627,7 @@ static int blk_init(struct XenDevice *xendev)
>                    blkdev->file_size, blkdev->file_size >> 20);
>
>      /* fill info */
> -    xenstore_write_be_int(&blkdev->xendev, "feature-barrier", 1);
> +    xenstore_write_be_int(&blkdev->xendev, "feature-flush-cache", 1);
>      xenstore_write_be_int(&blkdev->xendev, "info", info);
>      xenstore_write_be_int(&blkdev->xendev, "sector-size", blkdev->file_blk);
>      xenstore_write_be_int(&blkdev->xendev, "sectors",
> --
> 1.7.2.5
> -    case BLKIF_OP_WRITE_BARRIER:
> +    case BLKIF_OP_FLUSH_DISKCACHE:
>          if (!ioreq->req.nr_segments) {
>              ioreq->presync = 1;
>              return 0;
>          }
> -        ioreq->presync = ioreq->postsync = 1;
> +        ioreq->postsync = 1;
>          /* fall through */

It might be worth documenting the semantics of BLKIF_OP_FLUSH_DISKCACHE
in a comment here. I haven't found any spec for the xen_disk protocol,
but from looking at the Linux frontend it seems like the semantics of
REQ_FLUSH and REQ_FUA in the Linux block driver are overloaded into
BLKIF_OP_FLUSH_DISKCACHE, which is fairly confusing given that REQ_FLUSH
already overloads functionality. Even worse, REQ_FLUSH with a payload
implies a preflush, while REQ_FUA implies a postflush, and it seems like
Xen has no way to distinguish the two, making things like log writes very
inefficient.

Independent of that, the implementation should really use a state machine
around bdrv_aio_flush instead of doing guest-synchronous bdrv_flush calls.
On Wed, 2012-04-25 at 09:45 +0100, Christoph Hellwig wrote:
> > -    case BLKIF_OP_WRITE_BARRIER:
> > +    case BLKIF_OP_FLUSH_DISKCACHE:
> >          if (!ioreq->req.nr_segments) {
> >              ioreq->presync = 1;
> >              return 0;
> >          }
> > -        ioreq->presync = ioreq->postsync = 1;
> > +        ioreq->postsync = 1;
> >          /* fall through */
>
> It might be worth documenting the semantics of BLKIF_OP_FLUSH_DISKCACHE
> in a comment here. I haven't found any spec for the xen_disk protocol,

The blkif spec was recently much improved, you can find it at
http://xenbits.xen.org/docs/unstable/hypercall/include,public,io,blkif.h.html

TBH I'm not sure it actually answers your questions wrt
BLKIF_OP_FLUSH_DISKCACHE, if not please let us know and we can see about
improving it.

Ian
On Wed, Apr 25, 2012 at 10:02:45AM +0100, Ian Campbell wrote:
> The blkif spec was recently much improved, you can find it at
> http://xenbits.xen.org/docs/unstable/hypercall/include,public,io,blkif.h.html
>
> TBH I'm not sure it actually answers your questions wrt
> BLKIF_OP_FLUSH_DISKCACHE, if not please let us know and we can see about
> improving it.

That description in there is overly simple, and does not match any of the
implementations known to me on either end. Talking about those: the
mainline Linux blkback backend also implements different semantics from
what mainline Linux blkfront seems to expect, as well as different from
qemu. Looking at these three alone I can't see how Xen ever managed to
get data to disk reliably if using the paravirt interface.

> Ian

---end quoted text---
On Wed, 2012-04-25 at 12:21 +0100, Stefano Stabellini wrote:
> On Wed, 25 Apr 2012, Christoph Hellwig wrote:
> > On Wed, Apr 25, 2012 at 10:02:45AM +0100, Ian Campbell wrote:
> > > The blkif spec was recently much improved, you can find it at
> > > http://xenbits.xen.org/docs/unstable/hypercall/include,public,io,blkif.h.html
> > >
> > > TBH I'm not sure it actually answers your questions wrt
> > > BLKIF_OP_FLUSH_DISKCACHE, if not please let us know and we can see about
> > > improving it.
> >
> > That description in there is overly simple, and does not match any of the
> > implementations known to me on either end.
>
> That is true, in fact I couldn't figure out what I had to implement just
> reading the comment. So I went through the blkback code and tried to
> understand what I had to do, but I got it wrong.
>
> Reading the code again it seems to me that BLKIF_OP_FLUSH_DISKCACHE
> is supposed to have the same semantics as REQ_FLUSH, that implies a
> preflush if nr_segments > 0, not a postflush like I did.
>
> Konrad, can you please confirm this?

... and then provide a patch to blkif.h.

Thanks,
Ian.
On Wed, 25 Apr 2012, Christoph Hellwig wrote:
> On Wed, Apr 25, 2012 at 10:02:45AM +0100, Ian Campbell wrote:
> > The blkif spec was recently much improved, you can find it at
> > http://xenbits.xen.org/docs/unstable/hypercall/include,public,io,blkif.h.html
> >
> > TBH I'm not sure it actually answers your questions wrt
> > BLKIF_OP_FLUSH_DISKCACHE, if not please let us know and we can see about
> > improving it.
>
> That description in there is overly simple, and does not match any of the
> implementations known to me on either end.

That is true, in fact I couldn't figure out what I had to implement just
reading the comment. So I went through the blkback code and tried to
understand what I had to do, but I got it wrong.

Reading the code again it seems to me that BLKIF_OP_FLUSH_DISKCACHE
is supposed to have the same semantics as REQ_FLUSH, that implies a
preflush if nr_segments > 0, not a postflush like I did.

Konrad, can you please confirm this?
On Wed, Apr 25, 2012 at 12:21:53PM +0100, Stefano Stabellini wrote:
> That is true, in fact I couldn't figure out what I had to implement just
> reading the comment. So I went through the blkback code and tried to
> understand what I had to do, but I got it wrong.
>
> Reading the code again it seems to me that BLKIF_OP_FLUSH_DISKCACHE
> is supposed to have the same semantics as REQ_FLUSH, that implies a
> preflush if nr_segments > 0, not a postflush like I did.

It's worse - blkfront translates both a REQ_FLUSH or a REQ_FUA
into BLKIF_OP_FLUSH_DISKCACHE.

REQ_FLUSH either is a pre flush or a pure flush without a data transfer,
and REQ_FUA is a post flush. So to get the proper semantics you'll have
to do both, _and_ sequence it so that no operation starts before the
previous one finished.
On Wed, Apr 25, 2012 at 01:23:35PM +0200, Christoph Hellwig wrote:
> On Wed, Apr 25, 2012 at 12:21:53PM +0100, Stefano Stabellini wrote:
> > That is true, in fact I couldn't figure out what I had to implement just
> > reading the comment. So I went through the blkback code and tried to
> > understand what I had to do, but I got it wrong.
> >
> > Reading the code again it seems to me that BLKIF_OP_FLUSH_DISKCACHE
> > is supposed to have the same semantics as REQ_FLUSH, that implies a
> > preflush if nr_segments > 0, not a postflush like I did.
>
> It's worse - blkfront translates both a REQ_FLUSH or a REQ_FUA
> into BLKIF_OP_FLUSH_DISKCACHE.

I think that is what remained of the BARRIER request.

> REQ_FLUSH either is a pre flush or a pure flush without a data transfer,
> and REQ_FUA is a post flush. So to get the proper semantics you'll have
> to do both, _and_ sequence it so that no operation starts before the
> previous one finished.

If I were to emulate the SCSI SYNC command which one would it be?

I think REQ_FLUSH? In which case I would think that blkfront needs to
get rid of the REQ_FUA part?
On Thu, 26 Apr 2012, Konrad Rzeszutek Wilk wrote:
> On Wed, Apr 25, 2012 at 01:23:35PM +0200, Christoph Hellwig wrote:
> > On Wed, Apr 25, 2012 at 12:21:53PM +0100, Stefano Stabellini wrote:
> > > That is true, in fact I couldn't figure out what I had to implement just
> > > reading the comment. So I went through the blkback code and tried to
> > > understand what I had to do, but I got it wrong.
> > >
> > > Reading the code again it seems to me that BLKIF_OP_FLUSH_DISKCACHE
> > > is supposed to have the same semantics as REQ_FLUSH, that implies a
> > > preflush if nr_segments > 0, not a postflush like I did.
> >
> > It's worse - blkfront translates both a REQ_FLUSH or a REQ_FUA
> > into BLKIF_OP_FLUSH_DISKCACHE.
>
> I think that is what remained of the BARRIER request.
>
> > REQ_FLUSH either is a pre flush or a pure flush without a data transfer,
> > and REQ_FUA is a post flush. So to get the proper semantics you'll have
> > to do both, _and_ sequence it so that no operation starts before the
> > previous one finished.
>
> If I were to emulate the SCSI SYNC command which one would it be?
>
> I think REQ_FLUSH? In which I would think that the blkfront needs to
> get rid of the REQ_FUA part?

ping?
On Wed, May 09, 2012 at 01:42:41PM +0100, Stefano Stabellini wrote:
> On Thu, 26 Apr 2012, Konrad Rzeszutek Wilk wrote:
> > On Wed, Apr 25, 2012 at 01:23:35PM +0200, Christoph Hellwig wrote:
> > > On Wed, Apr 25, 2012 at 12:21:53PM +0100, Stefano Stabellini wrote:
> > > > That is true, in fact I couldn't figure out what I had to implement just
> > > > reading the comment. So I went through the blkback code and tried to
> > > > understand what I had to do, but I got it wrong.
> > > >
> > > > Reading the code again it seems to me that BLKIF_OP_FLUSH_DISKCACHE
> > > > is supposed to have the same semantics as REQ_FLUSH, that implies a
> > > > preflush if nr_segments > 0, not a postflush like I did.
> > >
> > > It's worse - blkfront translates both a REQ_FLUSH or a REQ_FUA
> > > into BLKIF_OP_FLUSH_DISKCACHE.
> >
> > I think that is what remained of the BARRIER request.
> >
> > > REQ_FLUSH either is a pre flush or a pure flush without a data transfer,
> > > and REQ_FUA is a post flush. So to get the proper semantics you'll have
> > > to do both, _and_ sequence it so that no operation starts before the
> > > previous one finished.
> >
> > If I were to emulate the SCSI SYNC command which one would it be?
> >
> > I think REQ_FLUSH? In which I would think that the blkfront needs to
> > get rid of the REQ_FUA part?
>
> ping?

And just shy of 7 months later I answer :-)

I think you are right. Getting rid of REQ_FUA looks like the right way.
Oh, and blkfront already does that!

1290         err = xenbus_gather(XBT_NIL, info->xbdev->otherend,
1291                             "feature-flush-cache", "%d", &flush,
1292                             NULL);
1293
1294         if (!err && flush) {
1295                 info->feature_flush = REQ_FLUSH;
1296                 info->flush_op = BLKIF_OP_FLUSH_DISKCACHE;
1297         }
1298

So what am I missing?

> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
On Wed, 19 Dec 2012, Konrad Rzeszutek Wilk wrote:
> On Wed, May 09, 2012 at 01:42:41PM +0100, Stefano Stabellini wrote:
> > On Thu, 26 Apr 2012, Konrad Rzeszutek Wilk wrote:
> > > On Wed, Apr 25, 2012 at 01:23:35PM +0200, Christoph Hellwig wrote:
> > > > On Wed, Apr 25, 2012 at 12:21:53PM +0100, Stefano Stabellini wrote:
> > > > > That is true, in fact I couldn't figure out what I had to implement just
> > > > > reading the comment. So I went through the blkback code and tried to
> > > > > understand what I had to do, but I got it wrong.
> > > > >
> > > > > Reading the code again it seems to me that BLKIF_OP_FLUSH_DISKCACHE
> > > > > is supposed to have the same semantics as REQ_FLUSH, that implies a
> > > > > preflush if nr_segments > 0, not a postflush like I did.
> > > >
> > > > It's worse - blkfront translates both a REQ_FLUSH or a REQ_FUA
> > > > into BLKIF_OP_FLUSH_DISKCACHE.
> > >
> > > I think that is what remained of the BARRIER request.
> > >
> > > > REQ_FLUSH either is a pre flush or a pure flush without a data transfer,
> > > > and REQ_FUA is a post flush. So to get the proper semantics you'll have
> > > > to do both, _and_ sequence it so that no operation starts before the
> > > > previous one finished.
> > >
> > > If I were to emulate the SCSI SYNC command which one would it be?
> > >
> > > I think REQ_FLUSH? In which I would think that the blkfront needs to
> > > get rid of the REQ_FUA part?
> >
> > ping?
>
> And just shy of 7 months later I answer :-)
>
> I think you are right. Getting rid of REQ_FUA looks like the
> right way. Oh, and blkfront already does that!
>
> 1290         err = xenbus_gather(XBT_NIL, info->xbdev->otherend,
> 1291                             "feature-flush-cache", "%d", &flush,
> 1292                             NULL);
> 1293
> 1294         if (!err && flush) {
> 1295                 info->feature_flush = REQ_FLUSH;
> 1296                 info->flush_op = BLKIF_OP_FLUSH_DISKCACHE;
> 1297         }
> 1298
>
> So what I am missing?

Nothing, thanks.
I have updated and resent the patch, fixing the implementation of BLKIF_OP_FLUSH_DISKCACHE.
diff --git a/hw/xen_disk.c b/hw/xen_disk.c
index 015d2af..3e4a47b 100644
--- a/hw/xen_disk.c
+++ b/hw/xen_disk.c
@@ -183,12 +183,12 @@ static int ioreq_parse(struct ioreq *ioreq)
     case BLKIF_OP_READ:
         ioreq->prot = PROT_WRITE; /* to memory */
         break;
-    case BLKIF_OP_WRITE_BARRIER:
+    case BLKIF_OP_FLUSH_DISKCACHE:
         if (!ioreq->req.nr_segments) {
             ioreq->presync = 1;
             return 0;
         }
-        ioreq->presync = ioreq->postsync = 1;
+        ioreq->postsync = 1;
         /* fall through */
     case BLKIF_OP_WRITE:
         ioreq->prot = PROT_READ; /* from memory */
@@ -354,7 +354,7 @@ static int ioreq_runio_qemu_aio(struct ioreq *ioreq)
                        qemu_aio_complete, ioreq);
         break;
     case BLKIF_OP_WRITE:
-    case BLKIF_OP_WRITE_BARRIER:
+    case BLKIF_OP_FLUSH_DISKCACHE:
         if (!ioreq->req.nr_segments) {
             break;
         }
@@ -627,7 +627,7 @@ static int blk_init(struct XenDevice *xendev)
                   blkdev->file_size, blkdev->file_size >> 20);

     /* fill info */
-    xenstore_write_be_int(&blkdev->xendev, "feature-barrier", 1);
+    xenstore_write_be_int(&blkdev->xendev, "feature-flush-cache", 1);
     xenstore_write_be_int(&blkdev->xendev, "info", info);
     xenstore_write_be_int(&blkdev->xendev, "sector-size", blkdev->file_blk);
     xenstore_write_be_int(&blkdev->xendev, "sectors",
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
---
 hw/xen_disk.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)