
use int64_t for return values from rbd instead of int

Message ID 1353415495-30561-1-git-send-email-s.priebe@profihost.ag
State New

Commit Message

Stefan Priebe - Profihost AG Nov. 20, 2012, 12:44 p.m. UTC
rbd / rados quite often return the length of writes
or discarded blocks. These values might be bigger than an int can hold.

Signed-off-by: Stefan Priebe <s.priebe@profihost.ag>
---
 block/rbd.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Stefan Hajnoczi Nov. 20, 2012, 4:29 p.m. UTC | #1
On Tue, Nov 20, 2012 at 01:44:55PM +0100, Stefan Priebe wrote:
> rbd / rados tends to return pretty often length of writes
> or discarded blocks. These values might be bigger than int.
> 
> Signed-off-by: Stefan Priebe <s.priebe@profihost.ag>
> ---
>  block/rbd.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Looks good but I want to check whether this fixes a bug you've hit?
Please indicate details of the bug and how to reproduce it in the commit
message.

Stefan
Stefan Priebe - Profihost AG Nov. 20, 2012, 7:16 p.m. UTC | #2
Hi Stefan,

On 20.11.2012 17:29, Stefan Hajnoczi wrote:
> On Tue, Nov 20, 2012 at 01:44:55PM +0100, Stefan Priebe wrote:
>> rbd / rados tends to return pretty often length of writes
>> or discarded blocks. These values might be bigger than int.
>>
>> Signed-off-by: Stefan Priebe <s.priebe@profihost.ag>
>> ---
>>   block/rbd.c |    4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> Looks good but I want to check whether this fixes an bug you've hit?
> Please indicate details of the bug and how to reproduce it in the commit
> message.

You get various I/O errors in the client, because negative return values
indicate I/O errors. When librbd returns a large positive value,
block/rbd tries to store it in acb->ret, which is an int. The value then
wraps around and becomes negative. After that, block/rbd treats it as an
I/O error and reports this to the guest.

Stefan
Stefan Hajnoczi Nov. 21, 2012, 6:41 a.m. UTC | #3
On Tue, Nov 20, 2012 at 8:16 PM, Stefan Priebe <s.priebe@profihost.ag> wrote:
> Hi Stefan,
>
> On 20.11.2012 17:29, Stefan Hajnoczi wrote:
>
>> On Tue, Nov 20, 2012 at 01:44:55PM +0100, Stefan Priebe wrote:
>>>
>>> rbd / rados tends to return pretty often length of writes
>>> or discarded blocks. These values might be bigger than int.
>>>
>>> Signed-off-by: Stefan Priebe <s.priebe@profihost.ag>
>>> ---
>>>   block/rbd.c |    4 ++--
>>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>>
>> Looks good but I want to check whether this fixes an bug you've hit?
>> Please indicate details of the bug and how to reproduce it in the commit
>> message.
>
>
> you get various I/O errors in client. As negative return values indicate I/O
> errors. When now a big positive value is returned by librbd block/rbd tries
> to store this one in acb->ret which is an int. Then it wraps around and is
> negative. After that block/rbd thinks this is an I/O error and report this
> to the guest.

It's still not clear whether this is a bug that you can reproduce.
After all, the ret value would have to be >2^31 which is a 2+ GB
request!

I'm asking if this is a critical bug fix that needs to go into QEMU
1.3-rc1 because of a real-world issue?

Stefan
Stefan Priebe - Profihost AG Nov. 21, 2012, 7:47 a.m. UTC | #4
On 21.11.2012 07:41, Stefan Hajnoczi wrote:
> On Tue, Nov 20, 2012 at 8:16 PM, Stefan Priebe <s.priebe@profihost.ag> wrote:
>> Hi Stefan,
>>
>> On 20.11.2012 17:29, Stefan Hajnoczi wrote:
>>
>>> On Tue, Nov 20, 2012 at 01:44:55PM +0100, Stefan Priebe wrote:
>>>>
>>>> rbd / rados tends to return pretty often length of writes
>>>> or discarded blocks. These values might be bigger than int.
>>>>
>>>> Signed-off-by: Stefan Priebe <s.priebe@profihost.ag>
>>>> ---
>>>>    block/rbd.c |    4 ++--
>>>>    1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>>
>>> Looks good but I want to check whether this fixes an bug you've hit?
>>> Please indicate details of the bug and how to reproduce it in the commit
>>> message.
>>
>>
>> you get various I/O errors in client. As negative return values indicate I/O
>> errors. When now a big positive value is returned by librbd block/rbd tries
>> to store this one in acb->ret which is an int. Then it wraps around and is
>> negative. After that block/rbd thinks this is an I/O error and report this
>> to the guest.
>
> It's still not clear whether this is a bug that you can reproduce.
> After all, the ret value would have to be >2^31 which is a 2+ GB
> request!
Yes, and that is exactly what happens.

Look here:
    if (acb->cmd == RBD_AIO_WRITE ||
         acb->cmd == RBD_AIO_DISCARD) {
         if (r < 0) {
             acb->ret = r;
             acb->error = 1;
         } else if (!acb->error) {
             acb->ret = rcb->size;
         }

It sets acb->ret to rcb->size. But if you DISCARD a whole device, the
size of the DISCARD might be 500 GB, or nowadays even several TB.

Greets,
Stefan
Stefan Hajnoczi Nov. 21, 2012, 8:26 a.m. UTC | #5
On Wed, Nov 21, 2012 at 08:47:16AM +0100, Stefan Priebe - Profihost AG wrote:
> On 21.11.2012 07:41, Stefan Hajnoczi wrote:
> >On Tue, Nov 20, 2012 at 8:16 PM, Stefan Priebe <s.priebe@profihost.ag> wrote:
> >>Hi Stefan,
> >>
> >>On 20.11.2012 17:29, Stefan Hajnoczi wrote:
> >>
> >>>On Tue, Nov 20, 2012 at 01:44:55PM +0100, Stefan Priebe wrote:
> >>>>
> >>>>rbd / rados tends to return pretty often length of writes
> >>>>or discarded blocks. These values might be bigger than int.
> >>>>
> >>>>Signed-off-by: Stefan Priebe <s.priebe@profihost.ag>
> >>>>---
> >>>>   block/rbd.c |    4 ++--
> >>>>   1 file changed, 2 insertions(+), 2 deletions(-)
> >>>
> >>>
> >>>Looks good but I want to check whether this fixes an bug you've hit?
> >>>Please indicate details of the bug and how to reproduce it in the commit
> >>>message.
> >>
> >>
> >>you get various I/O errors in client. As negative return values indicate I/O
> >>errors. When now a big positive value is returned by librbd block/rbd tries
> >>to store this one in acb->ret which is an int. Then it wraps around and is
> >>negative. After that block/rbd thinks this is an I/O error and report this
> >>to the guest.
> >
> >It's still not clear whether this is a bug that you can reproduce.
> >After all, the ret value would have to be >2^31 which is a 2+ GB
> >request!
> Yes and that is the fact.
> 
> Look here:
>    if (acb->cmd == RBD_AIO_WRITE ||
>         acb->cmd == RBD_AIO_DISCARD) {
>         if (r < 0) {
>             acb->ret = r;
>             acb->error = 1;
>         } else if (!acb->error) {
>             acb->ret = rcb->size;
>         }
> 
> It sets acb->ret to rcb->size. But the size from a DISCARD if you
> DISCARD a whole device might be 500GB or today even some TB.

We're going in circles here.  I know the types are wrong in the code and
your patch fixes it, that's why I said it looks good in my first reply.

QEMU is currently in hard freeze and only critical patches should go in.
Providing steps to reproduce the bug helps me decide that this patch
should still be merged for QEMU 1.3-rc1.

Anyway, the patch is straightforward, I have applied it to my block tree
and it will be in QEMU 1.3-rc1:
https://github.com/stefanha/qemu/commits/block

Stefan
Stefan Priebe - Profihost AG Nov. 21, 2012, 8:33 a.m. UTC | #6
On 21.11.2012 09:26, Stefan Hajnoczi wrote:
> On Wed, Nov 21, 2012 at 08:47:16AM +0100, Stefan Priebe - Profihost AG wrote:
>> On 21.11.2012 07:41, Stefan Hajnoczi wrote:
> We're going in circles here.  I know the types are wrong in the code and
> your patch fixes it, that's why I said it looks good in my first reply.

Sorry, I am not so familiar with processes like these.


> QEMU is currently in hard freeze and only critical patches should go in.
> Providing steps to reproduce the bug helps me decide that this patch
> should still be merged for QEMU 1.3-rc1.
>
> Anyway, the patch is straightforward, I have applied it to my block tree
> and it will be in QEMU 1.3-rc1:
> https://github.com/stefanha/qemu/commits/block

Thanks!

The steps to reproduce are:
Run mkfs.xfs -f on a whole device whose size in bytes is bigger than an
int can hold. mkfs.xfs sends a discard. It is important that you use
scsi-hd and set discard_granularity=512. Otherwise rbd disables discard
support.

Could you have a look at my other rbd fix too? It fixes a race between
task cancellation and writes. The same race was fixed in iscsi this summer.

Greets,
Stefan
Stefan Hajnoczi Nov. 21, 2012, 8:43 a.m. UTC | #7
On Wed, Nov 21, 2012 at 09:33:08AM +0100, Stefan Priebe - Profihost AG wrote:
> On 21.11.2012 09:26, Stefan Hajnoczi wrote:
> >On Wed, Nov 21, 2012 at 08:47:16AM +0100, Stefan Priebe - Profihost AG wrote:
> >>On 21.11.2012 07:41, Stefan Hajnoczi wrote:
> >QEMU is currently in hard freeze and only critical patches should go in.
> >Providing steps to reproduce the bug helps me decide that this patch
> >should still be merged for QEMU 1.3-rc1.
> >
> >Anyway, the patch is straightforward, I have applied it to my block tree
> >and it will be in QEMU 1.3-rc1:
> >https://github.com/stefanha/qemu/commits/block
> 
> Thanks!
> 
> The steps to reproduce are:
> mkfs.xfs -f a whole device bigger than int in bytes. mkfs.xfs sends
> a discard. Important is that you use scsi-hd and set
> discard_granularity=512. Otherwise rbd disabled discard support.

Excellent, thanks!  I will add it to the commit description.

> Might you have a look at my other rbd fix too? It fixes a race
> between task cancellation and writes. The same race was fixed in
> iscsi this summer.

Yes.

Stefan
Stefan Weil Nov. 21, 2012, 5:03 p.m. UTC | #8
On 20.11.2012 13:44, Stefan Priebe wrote:
> rbd / rados tends to return pretty often length of writes
> or discarded blocks. These values might be bigger than int.
>
> Signed-off-by: Stefan Priebe <s.priebe@profihost.ag>
> ---
>   block/rbd.c |    4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/block/rbd.c b/block/rbd.c
> index f57d0c6..6bf9c2e 100644
> --- a/block/rbd.c
> +++ b/block/rbd.c
> @@ -69,7 +69,7 @@ typedef enum {
>   typedef struct RBDAIOCB {
>       BlockDriverAIOCB common;
>       QEMUBH *bh;
> -    int ret;
> +    int64_t ret;
>       QEMUIOVector *qiov;
>       char *bounce;
>       RBDAIOCmd cmd;
> @@ -87,7 +87,7 @@ typedef struct RADOSCB {
>       int done;
>       int64_t size;
>       char *buf;
> -    int ret;
> +    int64_t ret;
>   } RADOSCB;
>   
>   #define RBD_FD_READ 0


Why do you use int64_t instead of off_t?
If the value is related to file sizes, off_t would be a good choice.

Stefan W.
Stefan Priebe - Profihost AG Nov. 21, 2012, 8:53 p.m. UTC | #9
I am not sure about off_t. What are its minimum and maximum values?

Stefan

On 21.11.2012 at 18:03, Stefan Weil <sw@weilnetz.de> wrote:

> On 20.11.2012 13:44, Stefan Priebe wrote:
>> rbd / rados tends to return pretty often length of writes
>> or discarded blocks. These values might be bigger than int.
>> 
>> Signed-off-by: Stefan Priebe <s.priebe@profihost.ag>
>> ---
>>  block/rbd.c |    4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>> 
>> diff --git a/block/rbd.c b/block/rbd.c
>> index f57d0c6..6bf9c2e 100644
>> --- a/block/rbd.c
>> +++ b/block/rbd.c
>> @@ -69,7 +69,7 @@ typedef enum {
>>  typedef struct RBDAIOCB {
>>      BlockDriverAIOCB common;
>>      QEMUBH *bh;
>> -    int ret;
>> +    int64_t ret;
>>      QEMUIOVector *qiov;
>>      char *bounce;
>>      RBDAIOCmd cmd;
>> @@ -87,7 +87,7 @@ typedef struct RADOSCB {
>>      int done;
>>      int64_t size;
>>      char *buf;
>> -    int ret;
>> +    int64_t ret;
>>  } RADOSCB;
>>    #define RBD_FD_READ 0
> 
> 
> Why do you use int64_t instead of off_t?
> If the value is related to file sizes, off_t would be a good choice.
> 
> Stefan W.
Peter Maydell Nov. 21, 2012, 10:32 p.m. UTC | #10
On 21 November 2012 17:03, Stefan Weil <sw@weilnetz.de> wrote:
> Why do you use int64_t instead of off_t?
> If the value is related to file sizes, off_t would be a good choice.

Looking at the librbd API (which is what the size and ret
values come from), it uses size_t and ssize_t for these.
So I think probably ssize_t is the right type for ret
(and size) in our structs here.

-- PMM
Stefan Priebe - Profihost AG Nov. 22, 2012, 8:23 a.m. UTC | #11
On 21.11.2012 23:32, Peter Maydell wrote:
> On 21 November 2012 17:03, Stefan Weil <sw@weilnetz.de> wrote:
>> Why do you use int64_t instead of off_t?
>> If the value is related to file sizes, off_t would be a good choice.
>
> Looking at the librbd API (which is what the size and ret
> values come from), it uses size_t and ssize_t for these.
> So I think probably ssize_t is the right type for ret
> (and size) in our structs here.

This sounds reasonable, but does ssize_t support negative values? We need
them for error values.

Greets,
Stefan
Peter Maydell Nov. 22, 2012, 8:40 a.m. UTC | #12
On 22 November 2012 08:23, Stefan Priebe - Profihost AG
<s.priebe@profihost.ag> wrote:
> On 21.11.2012 23:32, Peter Maydell wrote:
>> Looking at the librbd API (which is what the size and ret
>> values come from), it uses size_t and ssize_t for these.
>> So I think probably ssize_t is the right type for ret
>> (and size) in our structs here.
>
>
> This sounds reasonable but does ssize_t support negative values? For error
> values.

Yes, the first 's' in ssize_t means 'signed' and is the
difference between it and size_t.

-- PMM
Stefan Priebe - Profihost AG Nov. 22, 2012, 9:08 a.m. UTC | #13
Hello,

I have sent a new patch using ssize_t (subject: [PATCH] overflow of int ret:
use ssize_t for ret).

Stefan

On 22.11.2012 09:40, Peter Maydell wrote:
> On 22 November 2012 08:23, Stefan Priebe - Profihost AG
> <s.priebe@profihost.ag> wrote:
>> On 21.11.2012 23:32, Peter Maydell wrote:
>>> Looking at the librbd API (which is what the size and ret
>>> values come from), it uses size_t and ssize_t for these.
>>> So I think probably ssize_t is the right type for ret
>>> (and size) in our structs here.
>>
>>
>> This sounds reasonable but does ssize_t support negative values? For error
>> values.
>
> Yes, the first 's' in ssize_t means 'signed' and is the
> difference between it and size_t.
>
> -- PMM
>

Patch

diff --git a/block/rbd.c b/block/rbd.c
index f57d0c6..6bf9c2e 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -69,7 +69,7 @@  typedef enum {
 typedef struct RBDAIOCB {
     BlockDriverAIOCB common;
     QEMUBH *bh;
-    int ret;
+    int64_t ret;
     QEMUIOVector *qiov;
     char *bounce;
     RBDAIOCmd cmd;
@@ -87,7 +87,7 @@  typedef struct RADOSCB {
     int done;
     int64_t size;
     char *buf;
-    int ret;
+    int64_t ret;
 } RADOSCB;
 
 #define RBD_FD_READ 0