Message ID | 1429864436-17711-2-git-send-email-famz@redhat.com |
---|---|
State | New |
Headers | show |
On 24/04/2015 10:33, Fam Zheng wrote: > SBC-4 says: > > If the number of logical blocks specified to be unmapped or written > exceeds the value indicated in the MAXIMUM WRITE SAME LENGTH field > in the Block Limits VPD page (see 6.6.4), then the device server > shall terminate the command with CHECK CONDITION status with the > sense key set to ILLEGAL REQUEST and the additional sense code set > to INVALID FIELD IN CDB. > > Check the request size to match the spec. > > Signed-off-by: Fam Zheng <famz@redhat.com> > --- > hw/scsi/scsi-disk.c | 7 ++++++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c > index 54d71f4..b748982 100644 > --- a/hw/scsi/scsi-disk.c > +++ b/hw/scsi/scsi-disk.c > @@ -1707,6 +1707,11 @@ static void scsi_disk_emulate_write_same(SCSIDiskReq *r, uint8_t *inbuf) > return; > } > > + if (nb_sectors * (s->qdev.blocksize / 512) * 512 > SCSI_WRITE_SAME_MAX) { > + scsi_check_condition(r, SENSE_CODE(INVALID_FIELD)); > + return; > + } > + > if (buffer_is_zero(inbuf, s->qdev.blocksize)) { > int flags = (req->cmd.buf[1] & 0x8) ? BDRV_REQ_MAY_UNMAP : 0; > > @@ -1726,7 +1731,7 @@ static void scsi_disk_emulate_write_same(SCSIDiskReq *r, uint8_t *inbuf) > data->r = r; > data->sector = r->req.cmd.lba * (s->qdev.blocksize / 512); > data->nb_sectors = nb_sectors * (s->qdev.blocksize / 512); > - data->iov.iov_len = MIN(data->nb_sectors * 512, SCSI_WRITE_SAME_MAX); > + data->iov.iov_len = data->nb_sectors * 512; > data->iov.iov_base = buf = blk_blockalign(s->qdev.conf.blk, > data->iov.iov_len); > qemu_iovec_init_external(&data->qiov, &data->iov, 1); > SCSI_WRITE_SAME_MAX is not exported as the MAXIMUM WRITE SAME LENGTH in the VPD. In fact, we don't export anything and prefer to get very large requests, because then we can choose ourselves how to split them and serialize them. By contrast, if you have a low limit, some guests including Linux will send a lot of concurrent WRITE SAME requests which will have a huge latency. In addition, SCSI_WRITE_SAME_MAX is 512 *kilo*bytes. That's really little :) as some disks have a multi-*mega*byte unmap granularity. So what is SCSI_WRITE_SAME_MAX? Simply, when we're asked to do a WRITE SAME with non-zero payload, we build a 512 KiB buffer and fill it repeatedly with the pattern. Then the I/O is split in multiple writes, each covering 512 KiB except possibly the last. Note that non-zero WRITE SAME is not a fast path. So this patch is not correct. However, it shouldn't be a problem for the rest of the series. Paolo
On Fri, 04/24 10:50, Paolo Bonzini wrote: > > > On 24/04/2015 10:33, Fam Zheng wrote: > > SBC-4 says: > > > > If the number of logical blocks specified to be unmapped or written > > exceeds the value indicated in the MAXIMUM WRITE SAME LENGTH field > > in the Block Limits VPD page (see 6.6.4), then the device server > > shall terminate the command with CHECK CONDITION status with the > > sense key set to ILLEGAL REQUEST and the additional sense code set > > to INVALID FIELD IN CDB. > > > > Check the request size to match the spec. > > > > Signed-off-by: Fam Zheng <famz@redhat.com> > > --- > > hw/scsi/scsi-disk.c | 7 ++++++- > > 1 file changed, 6 insertions(+), 1 deletion(-) > > > > diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c > > index 54d71f4..b748982 100644 > > --- a/hw/scsi/scsi-disk.c > > +++ b/hw/scsi/scsi-disk.c > > @@ -1707,6 +1707,11 @@ static void scsi_disk_emulate_write_same(SCSIDiskReq *r, uint8_t *inbuf) > > return; > > } > > > > + if (nb_sectors * (s->qdev.blocksize / 512) * 512 > SCSI_WRITE_SAME_MAX) { > > + scsi_check_condition(r, SENSE_CODE(INVALID_FIELD)); > > + return; > > + } > > + > > if (buffer_is_zero(inbuf, s->qdev.blocksize)) { > > int flags = (req->cmd.buf[1] & 0x8) ? BDRV_REQ_MAY_UNMAP : 0; > > > > @@ -1726,7 +1731,7 @@ static void scsi_disk_emulate_write_same(SCSIDiskReq *r, uint8_t *inbuf) > > data->r = r; > > data->sector = r->req.cmd.lba * (s->qdev.blocksize / 512); > > data->nb_sectors = nb_sectors * (s->qdev.blocksize / 512); > > - data->iov.iov_len = MIN(data->nb_sectors * 512, SCSI_WRITE_SAME_MAX); > > + data->iov.iov_len = data->nb_sectors * 512; > > data->iov.iov_base = buf = blk_blockalign(s->qdev.conf.blk, > > data->iov.iov_len); > > qemu_iovec_init_external(&data->qiov, &data->iov, 1); > > > > SCSI_WRITE_SAME_MAX is not exported as the MAXIMUM WRITE SAME LENGTH in > the VPD. In fact, we don't export anything and prefer to get very large > requests, because then we can choose ourselves how to split them and > serialize them. By contrast, if you have a low limit, some guests > including Linux will send a lot of concurrent WRITE SAME requests which > will have a huge latency. > > In addition, SCSI_WRITE_SAME_MAX is 512 *kilo*bytes. That's really > little :) as some disks have a multi-*mega*byte unmap granularity. So > what is SCSI_WRITE_SAME_MAX? > > Simply, when we're asked to do a WRITE SAME with non-zero payload, we > build a 512 KiB buffer and fill it repeatedly with the pattern. Then > the I/O is split in multiple writes, each covering 512 KiB except > possibly the last. Note that non-zero WRITE SAME is not a fast path. > > So this patch is not correct. OK, I get it. Thanks for explanation. > However, it shouldn't be a problem for > the rest of the series. It is. The request has to be splitted to aligned part and unaligned part because the latter requires a buffer, as we don't like unbounded allocation. Fam
On 24/04/2015 11:02, Fam Zheng wrote: > >> > However, it shouldn't be a problem for >> > the rest of the series. > It is. The request has to be splitted to aligned part and unaligned part > because the latter requires a buffer, as we don't like unbounded allocation. Yes, I found that out after reading patch 2. :( Paolo
diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c index 54d71f4..b748982 100644 --- a/hw/scsi/scsi-disk.c +++ b/hw/scsi/scsi-disk.c @@ -1707,6 +1707,11 @@ static void scsi_disk_emulate_write_same(SCSIDiskReq *r, uint8_t *inbuf) return; } + if (nb_sectors * (s->qdev.blocksize / 512) * 512 > SCSI_WRITE_SAME_MAX) { + scsi_check_condition(r, SENSE_CODE(INVALID_FIELD)); + return; + } + if (buffer_is_zero(inbuf, s->qdev.blocksize)) { int flags = (req->cmd.buf[1] & 0x8) ? BDRV_REQ_MAY_UNMAP : 0; @@ -1726,7 +1731,7 @@ static void scsi_disk_emulate_write_same(SCSIDiskReq *r, uint8_t *inbuf) data->r = r; data->sector = r->req.cmd.lba * (s->qdev.blocksize / 512); data->nb_sectors = nb_sectors * (s->qdev.blocksize / 512); - data->iov.iov_len = MIN(data->nb_sectors * 512, SCSI_WRITE_SAME_MAX); + data->iov.iov_len = data->nb_sectors * 512; data->iov.iov_base = buf = blk_blockalign(s->qdev.conf.blk, data->iov.iov_len); qemu_iovec_init_external(&data->qiov, &data->iov, 1);
SBC-4 says: If the number of logical blocks specified to be unmapped or written exceeds the value indicated in the MAXIMUM WRITE SAME LENGTH field in the Block Limits VPD page (see 6.6.4), then the device server shall terminate the command with CHECK CONDITION status with the sense key set to ILLEGAL REQUEST and the additional sense code set to INVALID FIELD IN CDB. Check the request size to match the spec. Signed-off-by: Fam Zheng <famz@redhat.com> --- hw/scsi/scsi-disk.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)