diff mbox

[1/3] scsi-disk: Limit zero write request to SCSI_WRITE_SAME_MAX

Message ID 1429864436-17711-2-git-send-email-famz@redhat.com
State New
Headers show

Commit Message

Fam Zheng April 24, 2015, 8:33 a.m. UTC
SBC-4 says:

    If the number of logical blocks specified to be unmapped or written
    exceeds the value indicated in the MAXIMUM WRITE SAME LENGTH field
    in the Block Limits VPD page (see 6.6.4), then the device server
    shall terminate the command with CHECK CONDITION status with the
    sense key set to ILLEGAL REQUEST and the additional sense code set
    to INVALID FIELD IN CDB.

Check the request size to match the spec.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 hw/scsi/scsi-disk.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Comments

Paolo Bonzini April 24, 2015, 8:50 a.m. UTC | #1
On 24/04/2015 10:33, Fam Zheng wrote:
> SBC-4 says:
> 
>     If the number of logical blocks specified to be unmapped or written
>     exceeds the value indicated in the MAXIMUM WRITE SAME LENGTH field
>     in the Block Limits VPD page (see 6.6.4), then the device server
>     shall terminate the command with CHECK CONDITION status with the
>     sense key set to ILLEGAL REQUEST and the additional sense code set
>     to INVALID FIELD IN CDB.
> 
> Check the request size to match the spec.
> 
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>  hw/scsi/scsi-disk.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
> index 54d71f4..b748982 100644
> --- a/hw/scsi/scsi-disk.c
> +++ b/hw/scsi/scsi-disk.c
> @@ -1707,6 +1707,11 @@ static void scsi_disk_emulate_write_same(SCSIDiskReq *r, uint8_t *inbuf)
>          return;
>      }
>  
> +    if (nb_sectors * (s->qdev.blocksize / 512) * 512 > SCSI_WRITE_SAME_MAX) {
> +        scsi_check_condition(r, SENSE_CODE(INVALID_FIELD));
> +        return;
> +    }
> +
>      if (buffer_is_zero(inbuf, s->qdev.blocksize)) {
>          int flags = (req->cmd.buf[1] & 0x8) ? BDRV_REQ_MAY_UNMAP : 0;
>  
> @@ -1726,7 +1731,7 @@ static void scsi_disk_emulate_write_same(SCSIDiskReq *r, uint8_t *inbuf)
>      data->r = r;
>      data->sector = r->req.cmd.lba * (s->qdev.blocksize / 512);
>      data->nb_sectors = nb_sectors * (s->qdev.blocksize / 512);
> -    data->iov.iov_len = MIN(data->nb_sectors * 512, SCSI_WRITE_SAME_MAX);
> +    data->iov.iov_len = data->nb_sectors * 512;
>      data->iov.iov_base = buf = blk_blockalign(s->qdev.conf.blk,
>                                                data->iov.iov_len);
>      qemu_iovec_init_external(&data->qiov, &data->iov, 1);
> 

SCSI_WRITE_SAME_MAX is not exported as the MAXIMUM WRITE SAME LENGTH in
the VPD.  In fact, we don't export anything and prefer to get very large
requests, because then we can choose ourselves how to split them and
serialize them.  By contrast, if you have a low limit, some guests
including Linux will send a lot of concurrent WRITE SAME requests which
will have a huge latency.

In addition, SCSI_WRITE_SAME_MAX is 512 *kilo*bytes.  That's really
little :) as some disks have a multi-*mega*byte unmap granularity.  So
what is SCSI_WRITE_SAME_MAX?

Simply, when we're asked to do a WRITE SAME with non-zero payload, we
build a 512 KiB buffer and fill it repeatedly with the pattern.  Then
the I/O is split in multiple writes, each covering 512 KiB except
possibly the last.  Note that non-zero WRITE SAME is not a fast path.

So this patch is not correct.  However, it shouldn't be a problem for
the rest of the series.

Paolo
Fam Zheng April 24, 2015, 9:02 a.m. UTC | #2
On Fri, 04/24 10:50, Paolo Bonzini wrote:
> 
> 
> On 24/04/2015 10:33, Fam Zheng wrote:
> > SBC-4 says:
> > 
> >     If the number of logical blocks specified to be unmapped or written
> >     exceeds the value indicated in the MAXIMUM WRITE SAME LENGTH field
> >     in the Block Limits VPD page (see 6.6.4), then the device server
> >     shall terminate the command with CHECK CONDITION status with the
> >     sense key set to ILLEGAL REQUEST and the additional sense code set
> >     to INVALID FIELD IN CDB.
> > 
> > Check the request size to match the spec.
> > 
> > Signed-off-by: Fam Zheng <famz@redhat.com>
> > ---
> >  hw/scsi/scsi-disk.c | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
> > index 54d71f4..b748982 100644
> > --- a/hw/scsi/scsi-disk.c
> > +++ b/hw/scsi/scsi-disk.c
> > @@ -1707,6 +1707,11 @@ static void scsi_disk_emulate_write_same(SCSIDiskReq *r, uint8_t *inbuf)
> >          return;
> >      }
> >  
> > +    if (nb_sectors * (s->qdev.blocksize / 512) * 512 > SCSI_WRITE_SAME_MAX) {
> > +        scsi_check_condition(r, SENSE_CODE(INVALID_FIELD));
> > +        return;
> > +    }
> > +
> >      if (buffer_is_zero(inbuf, s->qdev.blocksize)) {
> >          int flags = (req->cmd.buf[1] & 0x8) ? BDRV_REQ_MAY_UNMAP : 0;
> >  
> > @@ -1726,7 +1731,7 @@ static void scsi_disk_emulate_write_same(SCSIDiskReq *r, uint8_t *inbuf)
> >      data->r = r;
> >      data->sector = r->req.cmd.lba * (s->qdev.blocksize / 512);
> >      data->nb_sectors = nb_sectors * (s->qdev.blocksize / 512);
> > -    data->iov.iov_len = MIN(data->nb_sectors * 512, SCSI_WRITE_SAME_MAX);
> > +    data->iov.iov_len = data->nb_sectors * 512;
> >      data->iov.iov_base = buf = blk_blockalign(s->qdev.conf.blk,
> >                                                data->iov.iov_len);
> >      qemu_iovec_init_external(&data->qiov, &data->iov, 1);
> > 
> 
> SCSI_WRITE_SAME_MAX is not exported as the MAXIMUM WRITE SAME LENGTH in
> the VPD.  In fact, we don't export anything and prefer to get very large
> requests, because then we can choose ourselves how to split them and
> serialize them.  By contrast, if you have a low limit, some guests
> including Linux will send a lot of concurrent WRITE SAME requests which
> will have a huge latency.
> 
> In addition, SCSI_WRITE_SAME_MAX is 512 *kilo*bytes.  That's really
> little :) as some disks have a multi-*mega*byte unmap granularity.  So
> what is SCSI_WRITE_SAME_MAX?
> 
> Simply, when we're asked to do a WRITE SAME with non-zero payload, we
> build a 512 KiB buffer and fill it repeatedly with the pattern.  Then
> the I/O is split in multiple writes, each covering 512 KiB except
> possibly the last.  Note that non-zero WRITE SAME is not a fast path.
> 
> So this patch is not correct.

OK, I get it. Thanks for explanation.

> However, it shouldn't be a problem for
> the rest of the series.

It is. The request has to be splitted to aligned part and unaligned part
because the latter requires a buffer, as we don't like unbounded allocation.

Fam
Paolo Bonzini April 24, 2015, 9:03 a.m. UTC | #3
On 24/04/2015 11:02, Fam Zheng wrote:
> 
>> > However, it shouldn't be a problem for
>> > the rest of the series.
> It is. The request has to be splitted to aligned part and unaligned part
> because the latter requires a buffer, as we don't like unbounded allocation.

Yes, I found that out after reading patch 2. :(

Paolo
diff mbox

Patch

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 54d71f4..b748982 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -1707,6 +1707,11 @@  static void scsi_disk_emulate_write_same(SCSIDiskReq *r, uint8_t *inbuf)
         return;
     }
 
+    if (nb_sectors * (s->qdev.blocksize / 512) * 512 > SCSI_WRITE_SAME_MAX) {
+        scsi_check_condition(r, SENSE_CODE(INVALID_FIELD));
+        return;
+    }
+
     if (buffer_is_zero(inbuf, s->qdev.blocksize)) {
         int flags = (req->cmd.buf[1] & 0x8) ? BDRV_REQ_MAY_UNMAP : 0;
 
@@ -1726,7 +1731,7 @@  static void scsi_disk_emulate_write_same(SCSIDiskReq *r, uint8_t *inbuf)
     data->r = r;
     data->sector = r->req.cmd.lba * (s->qdev.blocksize / 512);
     data->nb_sectors = nb_sectors * (s->qdev.blocksize / 512);
-    data->iov.iov_len = MIN(data->nb_sectors * 512, SCSI_WRITE_SAME_MAX);
+    data->iov.iov_len = data->nb_sectors * 512;
     data->iov.iov_base = buf = blk_blockalign(s->qdev.conf.blk,
                                               data->iov.iov_len);
     qemu_iovec_init_external(&data->qiov, &data->iov, 1);