Patchwork [7/8] iscsi: assert that sectors are aligned to LUN blocksize

login
register
mail settings
Submitter Peter Lieven
Date June 22, 2013, 8:58 p.m.
Message ID <1371934712-11714-8-git-send-email-pl@kamp.de>
Download mbox | patch
Permalink /patch/253433/
State New
Headers show

Comments

Peter Lieven - June 22, 2013, 8:58 p.m.
if the blocksize of an iSCSI LUN is bigger than the BDRV_SECTOR_SIZE
it is possible that sector_num or nb_sectors are not correctly
alligned.

for now assert that there is no misalignment to avoid data
corruption.

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 block/iscsi.c |    7 +++++++
 1 file changed, 7 insertions(+)
Paolo Bonzini - June 24, 2013, 2:30 p.m.
Il 22/06/2013 22:58, Peter Lieven ha scritto:
> if the blocksize of an iSCSI LUN is bigger than the BDRV_SECTOR_SIZE
> it is possible that sector_num or nb_sectors are not correctly
> alligned.
> 
> for now assert that there is no misalignment to avoid data
> corruption.

You should just fail to open the LUN instead.

Paolo

> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
>  block/iscsi.c |    7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/block/iscsi.c b/block/iscsi.c
> index 683692a..2c410b1 100644
> --- a/block/iscsi.c
> +++ b/block/iscsi.c
> @@ -249,6 +249,7 @@ iscsi_aio_write16_cb(struct iscsi_context *iscsi, int status,
>  
>  static int64_t sector_qemu2lun(int64_t sector, IscsiLun *iscsilun)
>  {
> +    assert((sector * BDRV_SECTOR_SIZE) % iscsilun->block_size == 0);
>      return sector * BDRV_SECTOR_SIZE / iscsilun->block_size;
>  }
>  
> @@ -269,6 +270,8 @@ iscsi_aio_writev_acb(IscsiAIOCB *acb)
>      acb->status     = -EINPROGRESS;
>      acb->buf        = NULL;
>  
> +    assert((acb->nb_sectors * BDRV_SECTOR_SIZE) % acb->iscsilun->block_size == 0);
> +
>      /* this will allow us to get rid of 'buf' completely */
>      size = acb->nb_sectors * BDRV_SECTOR_SIZE;
>  
> @@ -618,6 +621,8 @@ static int iscsi_aio_discard_acb(IscsiAIOCB *acb) {
>      acb->buf        = NULL;
>  
>      list[0].lba = sector_qemu2lun(acb->sector_num, acb->iscsilun);
> +
> +    assert((acb->nb_sectors * BDRV_SECTOR_SIZE) % acb->iscsilun->block_size == 0);
>      list[0].num = acb->nb_sectors * BDRV_SECTOR_SIZE / acb->iscsilun->block_size;
>  
>      acb->task = iscsi_unmap_task(iscsi, acb->iscsilun->lun,
> @@ -947,6 +952,8 @@ coroutine_fn iscsi_co_write_zeroes(BlockDriverState *bs, int64_t sector_num,
>      iTask.complete = 0;
>      iTask.bs = bs;
>  
> +    assert((nb_sectors * BDRV_SECTOR_SIZE) % iscsilun->block_size == 0);
> +
>      list[0].lba = sector_qemu2lun(sector_num, iscsilun);
>      list[0].num = nb_sectors * BDRV_SECTOR_SIZE / iscsilun->block_size;
>  
>
Peter Lieven - June 24, 2013, 4:10 p.m.
Am 24.06.2013 16:30, schrieb Paolo Bonzini:
> Il 22/06/2013 22:58, Peter Lieven ha scritto:
>> if the blocksize of an iSCSI LUN is bigger than the BDRV_SECTOR_SIZE
>> it is possible that sector_num or nb_sectors are not correctly
>> alligned.
>>
>> for now assert that there is no misalignment to avoid data
>> corruption.
> You should just fail to open the LUN instead.
Ronnie added support for reading fragments of blocksize if the
offset is aligned. I don't know what the idea was, maybe for qemu
reading the boot sector. If the OS does only read multiple of
blocksize at aligned offsets everything should work. I now that
our storages support 4K blocksize. I can check if it is usable.
I could also just fail the operations instead of asserting.

Peter
>
> Paolo
>
>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> ---
>>  block/iscsi.c |    7 +++++++
>>  1 file changed, 7 insertions(+)
>>
>> diff --git a/block/iscsi.c b/block/iscsi.c
>> index 683692a..2c410b1 100644
>> --- a/block/iscsi.c
>> +++ b/block/iscsi.c
>> @@ -249,6 +249,7 @@ iscsi_aio_write16_cb(struct iscsi_context *iscsi, int status,
>>  
>>  static int64_t sector_qemu2lun(int64_t sector, IscsiLun *iscsilun)
>>  {
>> +    assert((sector * BDRV_SECTOR_SIZE) % iscsilun->block_size == 0);
>>      return sector * BDRV_SECTOR_SIZE / iscsilun->block_size;
>>  }
>>  
>> @@ -269,6 +270,8 @@ iscsi_aio_writev_acb(IscsiAIOCB *acb)
>>      acb->status     = -EINPROGRESS;
>>      acb->buf        = NULL;
>>  
>> +    assert((acb->nb_sectors * BDRV_SECTOR_SIZE) % acb->iscsilun->block_size == 0);
>> +
>>      /* this will allow us to get rid of 'buf' completely */
>>      size = acb->nb_sectors * BDRV_SECTOR_SIZE;
>>  
>> @@ -618,6 +621,8 @@ static int iscsi_aio_discard_acb(IscsiAIOCB *acb) {
>>      acb->buf        = NULL;
>>  
>>      list[0].lba = sector_qemu2lun(acb->sector_num, acb->iscsilun);
>> +
>> +    assert((acb->nb_sectors * BDRV_SECTOR_SIZE) % acb->iscsilun->block_size == 0);
>>      list[0].num = acb->nb_sectors * BDRV_SECTOR_SIZE / acb->iscsilun->block_size;
>>  
>>      acb->task = iscsi_unmap_task(iscsi, acb->iscsilun->lun,
>> @@ -947,6 +952,8 @@ coroutine_fn iscsi_co_write_zeroes(BlockDriverState *bs, int64_t sector_num,
>>      iTask.complete = 0;
>>      iTask.bs = bs;
>>  
>> +    assert((nb_sectors * BDRV_SECTOR_SIZE) % iscsilun->block_size == 0);
>> +
>>      list[0].lba = sector_qemu2lun(sector_num, iscsilun);
>>      list[0].num = nb_sectors * BDRV_SECTOR_SIZE / iscsilun->block_size;
>>  
>>
Paolo Bonzini - June 24, 2013, 4:13 p.m.
Il 24/06/2013 18:10, Peter Lieven ha scritto:
> Am 24.06.2013 16:30, schrieb Paolo Bonzini:
>> Il 22/06/2013 22:58, Peter Lieven ha scritto:
>>> if the blocksize of an iSCSI LUN is bigger than the BDRV_SECTOR_SIZE
>>> it is possible that sector_num or nb_sectors are not correctly
>>> alligned.
>>>
>>> for now assert that there is no misalignment to avoid data
>>> corruption.
>> You should just fail to open the LUN instead.
> Ronnie added support for reading fragments of blocksize if the
> offset is aligned. I don't know what the idea was, maybe for qemu
> reading the boot sector. If the OS does only read multiple of
> blocksize at aligned offsets everything should work. I now that
> our storages support 4K blocksize. I can check if it is usable.
> I could also just fail the operations instead of asserting.

So far, 4K blocksize is usable if you also specify the same block size
for the guest device.  I have posted once the patches to do
read-modify-write, but I never really pursued inclusion of those.

In any case, the right place to fix this is the block layer;
driver-specific hacks are... hacks. :)

Paolo
Peter Lieven - June 24, 2013, 4:24 p.m.
Am 24.06.2013 18:13, schrieb Paolo Bonzini:
> Il 24/06/2013 18:10, Peter Lieven ha scritto:
>> Am 24.06.2013 16:30, schrieb Paolo Bonzini:
>>> Il 22/06/2013 22:58, Peter Lieven ha scritto:
>>>> if the blocksize of an iSCSI LUN is bigger than the BDRV_SECTOR_SIZE
>>>> it is possible that sector_num or nb_sectors are not correctly
>>>> alligned.
>>>>
>>>> for now assert that there is no misalignment to avoid data
>>>> corruption.
>>> You should just fail to open the LUN instead.
>> Ronnie added support for reading fragments of blocksize if the
>> offset is aligned. I don't know what the idea was, maybe for qemu
>> reading the boot sector. If the OS does only read multiple of
>> blocksize at aligned offsets everything should work. I now that
>> our storages support 4K blocksize. I can check if it is usable.
>> I could also just fail the operations instead of asserting.
> So far, 4K blocksize is usable if you also specify the same block size
> for the guest device.  I have posted once the patches to do
> read-modify-write, but I never really pursued inclusion of those.
>
> In any case, the right place to fix this is the block layer;
> driver-specific hacks are... hacks. :)
Where do I find the sector size if not in BDRV_SECTOR_SIZE?
Is there a dynamic field or is the answer qemu only support 512 Byte
sector size at the moment?

So you would go for fail to open a device if the LUN blocksize is
not equal to BDRV_SECTOR_SIZE?

Peter
Paolo Bonzini - June 24, 2013, 4:27 p.m.
Il 24/06/2013 18:24, Peter Lieven ha scritto:
>> > So far, 4K blocksize is usable if you also specify the same block size
>> > for the guest device.  I have posted once the patches to do
>> > read-modify-write, but I never really pursued inclusion of those.
>> >
>> > In any case, the right place to fix this is the block layer;
>> > driver-specific hacks are... hacks. :)
> Where do I find the sector size if not in BDRV_SECTOR_SIZE?
> Is there a dynamic field or is the answer qemu only support 512 Byte
> sector size at the moment?

bdrv_read/write and friends are always in 512-byte units.

There is bs->buffer_alignment for the guest sector size, but it is not
set at bdrv_open, only later.  So I think QEMU support for bigger
sectors is too sparse to be useful.

> So you would go for fail to open a device if the LUN blocksize is
> not equal to BDRV_SECTOR_SIZE?

Either that, or fails reads/writes that are not aligned.

Paolo
Peter Lieven - June 24, 2013, 4:36 p.m.
Am 24.06.2013 18:27, schrieb Paolo Bonzini:
> Il 24/06/2013 18:24, Peter Lieven ha scritto:
>>>> So far, 4K blocksize is usable if you also specify the same block size
>>>> for the guest device.  I have posted once the patches to do
>>>> read-modify-write, but I never really pursued inclusion of those.
>>>>
>>>> In any case, the right place to fix this is the block layer;
>>>> driver-specific hacks are... hacks. :)
>> Where do I find the sector size if not in BDRV_SECTOR_SIZE?
>> Is there a dynamic field or is the answer qemu only support 512 Byte
>> sector size at the moment?
> bdrv_read/write and friends are always in 512-byte units.
>
> There is bs->buffer_alignment for the guest sector size, but it is not
> set at bdrv_open, only later.  So I think QEMU support for bigger
> sectors is too sparse to be useful.
>
>> So you would go for fail to open a device if the LUN blocksize is
>> not equal to BDRV_SECTOR_SIZE?
> Either that, or fails reads/writes that are not aligned.
It would be the easiest. Maybe Ronnie can point out what the hack
with the acb->buffer_offset in aio_read was meant for. All other
functions have the requirement to be properly aligned regarding
to the xfersize. I am also not sure if the e.g. reading of 512 byte
at sector 0 from a 4K block device does still work with the
iovector patches we added earlier.

Peter
>
> Paolo

Patch

diff --git a/block/iscsi.c b/block/iscsi.c
index 683692a..2c410b1 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -249,6 +249,7 @@  iscsi_aio_write16_cb(struct iscsi_context *iscsi, int status,
 
 static int64_t sector_qemu2lun(int64_t sector, IscsiLun *iscsilun)
 {
+    assert((sector * BDRV_SECTOR_SIZE) % iscsilun->block_size == 0);
     return sector * BDRV_SECTOR_SIZE / iscsilun->block_size;
 }
 
@@ -269,6 +270,8 @@  iscsi_aio_writev_acb(IscsiAIOCB *acb)
     acb->status     = -EINPROGRESS;
     acb->buf        = NULL;
 
+    assert((acb->nb_sectors * BDRV_SECTOR_SIZE) % acb->iscsilun->block_size == 0);
+
     /* this will allow us to get rid of 'buf' completely */
     size = acb->nb_sectors * BDRV_SECTOR_SIZE;
 
@@ -618,6 +621,8 @@  static int iscsi_aio_discard_acb(IscsiAIOCB *acb) {
     acb->buf        = NULL;
 
     list[0].lba = sector_qemu2lun(acb->sector_num, acb->iscsilun);
+
+    assert((acb->nb_sectors * BDRV_SECTOR_SIZE) % acb->iscsilun->block_size == 0);
     list[0].num = acb->nb_sectors * BDRV_SECTOR_SIZE / acb->iscsilun->block_size;
 
     acb->task = iscsi_unmap_task(iscsi, acb->iscsilun->lun,
@@ -947,6 +952,8 @@  coroutine_fn iscsi_co_write_zeroes(BlockDriverState *bs, int64_t sector_num,
     iTask.complete = 0;
     iTask.bs = bs;
 
+    assert((nb_sectors * BDRV_SECTOR_SIZE) % iscsilun->block_size == 0);
+
     list[0].lba = sector_qemu2lun(sector_num, iscsilun);
     list[0].num = nb_sectors * BDRV_SECTOR_SIZE / iscsilun->block_size;