diff mbox

overflow of int ret: use ssize_t for ret

Message ID 1353575275-1343-1-git-send-email-s.priebe@profihost.ag
State New
Headers show

Commit Message

Stefan Priebe - Profihost AG Nov. 22, 2012, 9:07 a.m. UTC
When acb->cmd is WRITE or DISCARD block/rbd stores rcb->size into acb->ret

Look here:
   if (acb->cmd == RBD_AIO_WRITE ||
        acb->cmd == RBD_AIO_DISCARD) {
        if (r < 0) {
            acb->ret = r;
            acb->error = 1;
        } else if (!acb->error) {
            acb->ret = rcb->size;
        }

right now acb->ret is just an int and we might get an overflow if size is too big.
For discards rcb->size holds the size of the discard - this might be some TB if you
discard a whole device.

The steps to reproduce are:
mkfs.xfs -f a whole device bigger than int in bytes. mkfs.xfs sends a discard. Important is that you use scsi-hd and set discard_granularity=512. Otherwise rbd disabled discard support.
---
 block/rbd.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Andreas Färber Nov. 22, 2012, 4:40 p.m. UTC | #1
Am 22.11.2012 10:07, schrieb Stefan Priebe:
> When acb->cmd is WRITE or DISCARD block/rbd stores rcb->size into acb->ret
> 
> Look here:
>    if (acb->cmd == RBD_AIO_WRITE ||
>         acb->cmd == RBD_AIO_DISCARD) {
>         if (r < 0) {
>             acb->ret = r;
>             acb->error = 1;
>         } else if (!acb->error) {
>             acb->ret = rcb->size;
>         }
> 
> right now acb->ret is just an int and we might get an overflow if size is too big.
> For discards rcb->size holds the size of the discard - this might be some TB if you
> discard a whole device.
> 
> The steps to reproduce are:
> mkfs.xfs -f a whole device bigger than int in bytes. mkfs.xfs sends a discard. Important is that you use scsi-hd and set discard_granularity=512. Otherwise rbd disabled discard support.

Whatever type you decide to use, please add an identifying topic such as
"block/rbd:" in the subject (int ret is very generic!), and this patch
is missing a Signed-off-by.

Regards,
Andreas
Stefan Priebe - Profihost AG Nov. 22, 2012, 7:09 p.m. UTC | #2
Hi Andreas,

thanks for your comment. Do i have to resend this patch?

--
Greets,
Stefan

Am 22.11.2012 17:40, schrieb Andreas Färber:
> Am 22.11.2012 10:07, schrieb Stefan Priebe:
>> When acb->cmd is WRITE or DISCARD block/rbd stores rcb->size into acb->ret
>>
>> Look here:
>>     if (acb->cmd == RBD_AIO_WRITE ||
>>          acb->cmd == RBD_AIO_DISCARD) {
>>          if (r<  0) {
>>              acb->ret = r;
>>              acb->error = 1;
>>          } else if (!acb->error) {
>>              acb->ret = rcb->size;
>>          }
>>
>> right now acb->ret is just an int and we might get an overflow if size is too big.
>> For discards rcb->size holds the size of the discard - this might be some TB if you
>> discard a whole device.
>>
>> The steps to reproduce are:
>> mkfs.xfs -f a whole device bigger than int in bytes. mkfs.xfs sends a discard. Important is that you use scsi-hd and set discard_granularity=512. Otherwise rbd disabled discard support.
>
> Whatever type you decide to use, please add an identifying topic such as
> "block/rbd:" in the subject (int ret is very generic!), and this patch
> is missing a Signed-off-by.
>
> Regards,
> Andreas
>
>
Stefan Weil Nov. 22, 2012, 7:37 p.m. UTC | #3
Am 22.11.2012 20:09, schrieb Stefan Priebe - Profihost AG:
> Hi Andreas,
>
> thanks for your comment. Do i have to resend this patch?
>
> -- 
> Greets,
> Stefan
>


Hi Stefan,

I'm afraid yes, you'll have to resend the patch.

Signed-off-by is a must, see http://wiki.qemu.org/Contribute/SubmitAPatch

When you resend the patch, you can fix the minor issues (subject)as well.

Regards

StefanW.
Stefan Priebe - Profihost AG Nov. 22, 2012, 8:49 p.m. UTC | #4
Signed-off-by: Stefan Priebe <s.priebe@profihost.ag>

Am 22.11.2012 10:07, schrieb Stefan Priebe:
> When acb->cmd is WRITE or DISCARD block/rbd stores rcb->size into acb->ret
>
> Look here:
>     if (acb->cmd == RBD_AIO_WRITE ||
>          acb->cmd == RBD_AIO_DISCARD) {
>          if (r<  0) {
>              acb->ret = r;
>              acb->error = 1;
>          } else if (!acb->error) {
>              acb->ret = rcb->size;
>          }
>
> right now acb->ret is just an int and we might get an overflow if size is too big.
> For discards rcb->size holds the size of the discard - this might be some TB if you
> discard a whole device.
>
> The steps to reproduce are:
> mkfs.xfs -f a whole device bigger than int in bytes. mkfs.xfs sends a discard. Important is that you use scsi-hd and set discard_granularity=512. Otherwise rbd disabled discard support.
> ---
>   block/rbd.c |    4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/block/rbd.c b/block/rbd.c
> index 5a0f79f..0384c6c 100644
> --- a/block/rbd.c
> +++ b/block/rbd.c
> @@ -69,7 +69,7 @@ typedef enum {
>   typedef struct RBDAIOCB {
>       BlockDriverAIOCB common;
>       QEMUBH *bh;
> -    int ret;
> +    ssize_t ret;
>       QEMUIOVector *qiov;
>       char *bounce;
>       RBDAIOCmd cmd;
> @@ -86,7 +86,7 @@ typedef struct RADOSCB {
>       int done;
>       int64_t size;
>       char *buf;
> -    int ret;
> +    ssize_t ret;
>   } RADOSCB;
>
>   #define RBD_FD_READ 0
Stefan Hajnoczi Nov. 23, 2012, 2:11 p.m. UTC | #5
On Thu, Nov 22, 2012 at 10:07 AM, Stefan Priebe <s.priebe@profihost.ag> wrote:
> diff --git a/block/rbd.c b/block/rbd.c
> index 5a0f79f..0384c6c 100644
> --- a/block/rbd.c
> +++ b/block/rbd.c
> @@ -69,7 +69,7 @@ typedef enum {
>  typedef struct RBDAIOCB {
>      BlockDriverAIOCB common;
>      QEMUBH *bh;
> -    int ret;
> +    ssize_t ret;
>      QEMUIOVector *qiov;
>      char *bounce;
>      RBDAIOCmd cmd;
> @@ -86,7 +86,7 @@ typedef struct RADOSCB {
>      int done;
>      int64_t size;
>      char *buf;
> -    int ret;
> +    ssize_t ret;
>  } RADOSCB;
>
>  #define RBD_FD_READ 0

I preferred your previous patch:

ssize_t on 32-bit hosts has sizeof(ssize_t) == 4.  In
qemu_rbd_complete_aio() we may assign acb->ret = rcb->size.  Here the
size field is int64_t, so ssize_t ret would truncate the value.

But BlockDriverCompetionFunc only takes an int argument so we're back
to square one.

The types are busted and changing the type of ret won't fix that :(.

Stefan
Peter Maydell Nov. 23, 2012, 2:15 p.m. UTC | #6
On 23 November 2012 14:11, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Thu, Nov 22, 2012 at 10:07 AM, Stefan Priebe <s.priebe@profihost.ag> wrote:
>> diff --git a/block/rbd.c b/block/rbd.c
>> index 5a0f79f..0384c6c 100644
>> --- a/block/rbd.c
>> +++ b/block/rbd.c
>> @@ -69,7 +69,7 @@ typedef enum {
>>  typedef struct RBDAIOCB {
>>      BlockDriverAIOCB common;
>>      QEMUBH *bh;
>> -    int ret;
>> +    ssize_t ret;
>>      QEMUIOVector *qiov;
>>      char *bounce;
>>      RBDAIOCmd cmd;
>> @@ -86,7 +86,7 @@ typedef struct RADOSCB {
>>      int done;
>>      int64_t size;
>>      char *buf;
>> -    int ret;
>> +    ssize_t ret;
>>  } RADOSCB;
>>
>>  #define RBD_FD_READ 0
>
> I preferred your previous patch:
>
> ssize_t on 32-bit hosts has sizeof(ssize_t) == 4.  In
> qemu_rbd_complete_aio() we may assign acb->ret = rcb->size.  Here the
> size field is int64_t, so ssize_t ret would truncate the value.

The rcb size field should be a size_t: it is used for calling
rbd_aio_write and rbd_aio_read so if we've overflowed 32 bits
then we've already got a problem there.

-- PMM
Stefan Priebe - Profihost AG Nov. 23, 2012, 2:38 p.m. UTC | #7
Hi,

i'm not a ceph or inktank guy. I can't made any decision on what to 
change. At least right now you'll see failing I/O's in your guest, when 
you discard whole disks.

I could fix this for me with int64 and with ssize_t. So if i should 
resend another patch i need a concrete advise how to patch.

Thanks!

Greets,
Stefan

Am 23.11.2012 15:15, schrieb Peter Maydell:
> On 23 November 2012 14:11, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>> On Thu, Nov 22, 2012 at 10:07 AM, Stefan Priebe <s.priebe@profihost.ag> wrote:
>>> diff --git a/block/rbd.c b/block/rbd.c
>>> index 5a0f79f..0384c6c 100644
>>> --- a/block/rbd.c
>>> +++ b/block/rbd.c
>>> @@ -69,7 +69,7 @@ typedef enum {
>>>   typedef struct RBDAIOCB {
>>>       BlockDriverAIOCB common;
>>>       QEMUBH *bh;
>>> -    int ret;
>>> +    ssize_t ret;
>>>       QEMUIOVector *qiov;
>>>       char *bounce;
>>>       RBDAIOCmd cmd;
>>> @@ -86,7 +86,7 @@ typedef struct RADOSCB {
>>>       int done;
>>>       int64_t size;
>>>       char *buf;
>>> -    int ret;
>>> +    ssize_t ret;
>>>   } RADOSCB;
>>>
>>>   #define RBD_FD_READ 0
>>
>> I preferred your previous patch:
>>
>> ssize_t on 32-bit hosts has sizeof(ssize_t) == 4.  In
>> qemu_rbd_complete_aio() we may assign acb->ret = rcb->size.  Here the
>> size field is int64_t, so ssize_t ret would truncate the value.
>
> The rcb size field should be a size_t: it is used for calling
> rbd_aio_write and rbd_aio_read so if we've overflowed 32 bits
> then we've already got a problem there.
>
> -- PMM
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
Stefan Hajnoczi Nov. 23, 2012, 3:56 p.m. UTC | #8
On Fri, Nov 23, 2012 at 3:15 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 23 November 2012 14:11, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>> On Thu, Nov 22, 2012 at 10:07 AM, Stefan Priebe <s.priebe@profihost.ag> wrote:
>>> diff --git a/block/rbd.c b/block/rbd.c
>>> index 5a0f79f..0384c6c 100644
>>> --- a/block/rbd.c
>>> +++ b/block/rbd.c
>>> @@ -69,7 +69,7 @@ typedef enum {
>>>  typedef struct RBDAIOCB {
>>>      BlockDriverAIOCB common;
>>>      QEMUBH *bh;
>>> -    int ret;
>>> +    ssize_t ret;
>>>      QEMUIOVector *qiov;
>>>      char *bounce;
>>>      RBDAIOCmd cmd;
>>> @@ -86,7 +86,7 @@ typedef struct RADOSCB {
>>>      int done;
>>>      int64_t size;
>>>      char *buf;
>>> -    int ret;
>>> +    ssize_t ret;
>>>  } RADOSCB;
>>>
>>>  #define RBD_FD_READ 0
>>
>> I preferred your previous patch:
>>
>> ssize_t on 32-bit hosts has sizeof(ssize_t) == 4.  In
>> qemu_rbd_complete_aio() we may assign acb->ret = rcb->size.  Here the
>> size field is int64_t, so ssize_t ret would truncate the value.
>
> The rcb size field should be a size_t: it is used for calling
> rbd_aio_write and rbd_aio_read so if we've overflowed 32 bits
> then we've already got a problem there.

You are right that size_t is fine for read/write.

But size_t is not fine for discard:
1. librbd takes uint64_t for discard.
2. Completing a discard request uses size.
3. BlockDriverCompletionFunc only passes int ret value <--- broken for
large discard

Stefan Priebe has been discarding large regions (e.g. >4 GB must be
possible on a 32-bit host).

The previous int64_t patch didn't clean up types completely but it
made things better.  AFAICT we need to be able to discard >4 GB so
ssize_t/size_t doesn't cut it on 32-bit hosts.

Stefan
diff mbox

Patch

diff --git a/block/rbd.c b/block/rbd.c
index 5a0f79f..0384c6c 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -69,7 +69,7 @@  typedef enum {
 typedef struct RBDAIOCB {
     BlockDriverAIOCB common;
     QEMUBH *bh;
-    int ret;
+    ssize_t ret;
     QEMUIOVector *qiov;
     char *bounce;
     RBDAIOCmd cmd;
@@ -86,7 +86,7 @@  typedef struct RADOSCB {
     int done;
     int64_t size;
     char *buf;
-    int ret;
+    ssize_t ret;
 } RADOSCB;
 
 #define RBD_FD_READ 0