diff mbox

block/mirror: limit qiov to IOV_MAX elements

Message ID 1435761950-26714-1-git-send-email-stefanha@redhat.com
State New
Headers show

Commit Message

Stefan Hajnoczi July 1, 2015, 2:45 p.m. UTC
If mirror has more free buffers than IOV_MAX, preadv(2)/pwritev(2)
EINVAL failures may be encountered.

It is possible to trigger this by setting granularity to a low value
like 8192.

This patch stops appending chunks once IOV_MAX is reached.

The spurious EINVAL failure can be reproduced with a qcow2 image file
and the following QMP invocation:

  qmp.command('drive-mirror', device='virtio0', target='/tmp/r7.s1',
              granularity=8192, sync='full', mode='absolute-paths',
              format='raw')

While the guest is running dd if=/dev/zero of=/var/tmp/foo oflag=direct
bs=4k.

Cc: Jeff Cody <jcody@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/mirror.c | 4 ++++
 trace-events   | 1 +
 2 files changed, 5 insertions(+)

Comments

Paolo Bonzini July 1, 2015, 2:47 p.m. UTC | #1
On 01/07/2015 16:45, Stefan Hajnoczi wrote:
> If mirror has more free buffers than IOV_MAX, preadv(2)/pwritev(2)
> EINVAL failures may be encountered.
> 
> It is possible to trigger this by setting granularity to a low value
> like 8192.
> 
> This patch stops appending chunks once IOV_MAX is reached.
> 
> The spurious EINVAL failure can be reproduced with a qcow2 image file
> and the following QMP invocation:
> 
>   qmp.command('drive-mirror', device='virtio0', target='/tmp/r7.s1',
>               granularity=8192, sync='full', mode='absolute-paths',
>               format='raw')
> 
> While the guest is running dd if=/dev/zero of=/var/tmp/foo oflag=direct
> bs=4k.
> 
> Cc: Jeff Cody <jcody@redhat.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  block/mirror.c | 4 ++++
>  trace-events   | 1 +
>  2 files changed, 5 insertions(+)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index 048e452..985ad00 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -241,6 +241,10 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
>              trace_mirror_break_buf_busy(s, nb_chunks, s->in_flight);
>              break;
>          }
> +        if (IOV_MAX < nb_chunks + added_chunks) {

No Yoda conditions... apart from that,

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

> +            trace_mirror_break_iov_max(s, nb_chunks, added_chunks);
> +            break;
> +        }
>  
>          /* We have enough free space to copy these sectors.  */
>          bitmap_set(s->in_flight_bitmap, next_chunk, added_chunks);
> diff --git a/trace-events b/trace-events
> index 52b7efa..943cd0c 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -94,6 +94,7 @@ mirror_yield(void *s, int64_t cnt, int buf_free_count, int in_flight) "s %p dirt
>  mirror_yield_in_flight(void *s, int64_t sector_num, int in_flight) "s %p sector_num %"PRId64" in_flight %d"
>  mirror_yield_buf_busy(void *s, int nb_chunks, int in_flight) "s %p requested chunks %d in_flight %d"
>  mirror_break_buf_busy(void *s, int nb_chunks, int in_flight) "s %p requested chunks %d in_flight %d"
> +mirror_break_iov_max(void *s, int nb_chunks, int added_chunks) "s %p requested chunks %d added_chunks %d"
>  
>  # block/backup.c
>  backup_do_cow_enter(void *job, int64_t start, int64_t sector_num, int nb_sectors) "job %p start %"PRId64" sector_num %"PRId64" nb_sectors %d"
>
Stefan Hajnoczi July 1, 2015, 2:59 p.m. UTC | #2
On Wed, Jul 1, 2015 at 3:47 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 01/07/2015 16:45, Stefan Hajnoczi wrote:
>> If mirror has more free buffers than IOV_MAX, preadv(2)/pwritev(2)
>> EINVAL failures may be encountered.
>>
>> It is possible to trigger this by setting granularity to a low value
>> like 8192.
>>
>> This patch stops appending chunks once IOV_MAX is reached.
>>
>> The spurious EINVAL failure can be reproduced with a qcow2 image file
>> and the following QMP invocation:
>>
>>   qmp.command('drive-mirror', device='virtio0', target='/tmp/r7.s1',
>>               granularity=8192, sync='full', mode='absolute-paths',
>>               format='raw')
>>
>> While the guest is running dd if=/dev/zero of=/var/tmp/foo oflag=direct
>> bs=4k.
>>
>> Cc: Jeff Cody <jcody@redhat.com>
>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>> ---
>>  block/mirror.c | 4 ++++
>>  trace-events   | 1 +
>>  2 files changed, 5 insertions(+)
>>
>> diff --git a/block/mirror.c b/block/mirror.c
>> index 048e452..985ad00 100644
>> --- a/block/mirror.c
>> +++ b/block/mirror.c
>> @@ -241,6 +241,10 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
>>              trace_mirror_break_buf_busy(s, nb_chunks, s->in_flight);
>>              break;
>>          }
>> +        if (IOV_MAX < nb_chunks + added_chunks) {
>
> No Yoda conditions... apart from that,
>
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

I found it annoying to write it backwards too, but it's for consistency:

  if (s->buf_free_count < nb_chunks + added_chunks) {
      trace_mirror_break_buf_busy(s, nb_chunks, s->in_flight);
      break;
  }
  if (IOV_MAX < nb_chunks + added_chunks) {
      trace_mirror_break_iov_max(s, nb_chunks, added_chunks);
      break;
  }

It's the same type of check as s->buf_free_count (which isn't modified
by this loop either so it's a yoda conditional).
Paolo Bonzini July 1, 2015, 3:03 p.m. UTC | #3
On 01/07/2015 16:59, Stefan Hajnoczi wrote:
> I found it annoying to write it backwards too, but it's for consistency:
> 
>   if (s->buf_free_count < nb_chunks + added_chunks) {
>       trace_mirror_break_buf_busy(s, nb_chunks, s->in_flight);
>       break;
>   }
>   if (IOV_MAX < nb_chunks + added_chunks) {
>       trace_mirror_break_iov_max(s, nb_chunks, added_chunks);
>       break;
>   }
> 
> It's the same type of check as s->buf_free_count (which isn't modified
> by this loop either so it's a yoda conditional).

Hmm, right.  The problem goes back to:

        while (nb_chunks == 0 && s->buf_free_count < added_chunks) {
            trace_mirror_yield_buf_busy(s, nb_chunks, s->in_flight);
            qemu_coroutine_yield();
        }

where s->buf_free_count _is_ modified by the loop.  The if below:

        if (s->buf_free_count < nb_chunks + added_chunks) {
            trace_mirror_break_buf_busy(s, nb_chunks, s->in_flight);
            break;
        }

is written as a < check for consistency, and the one you add exacerbates
the problem.  If you want you can change the < to > in the "while" loop
as well; otherwise the patch is okay as is.

Paolo
Stefan Hajnoczi July 1, 2015, 4:05 p.m. UTC | #4
On Wed, Jul 1, 2015 at 4:03 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 01/07/2015 16:59, Stefan Hajnoczi wrote:
>> I found it annoying to write it backwards too, but it's for consistency:
>>
>>   if (s->buf_free_count < nb_chunks + added_chunks) {
>>       trace_mirror_break_buf_busy(s, nb_chunks, s->in_flight);
>>       break;
>>   }
>>   if (IOV_MAX < nb_chunks + added_chunks) {
>>       trace_mirror_break_iov_max(s, nb_chunks, added_chunks);
>>       break;
>>   }
>>
>> It's the same type of check as s->buf_free_count (which isn't modified
>> by this loop either so it's a yoda conditional).
>
> Hmm, right.  The problem goes back to:
>
>         while (nb_chunks == 0 && s->buf_free_count < added_chunks) {
>             trace_mirror_yield_buf_busy(s, nb_chunks, s->in_flight);
>             qemu_coroutine_yield();
>         }
>
> where s->buf_free_count _is_ modified by the loop.  The if below:
>
>         if (s->buf_free_count < nb_chunks + added_chunks) {
>             trace_mirror_break_buf_busy(s, nb_chunks, s->in_flight);
>             break;
>         }
>
> is written as a < check for consistency, and the one you add exacerbates
> the problem.  If you want you can change the < to > in the "while" loop
> as well; otherwise the patch is okay as is.

Let's leave it.
Kevin Wolf July 8, 2015, 11:14 a.m. UTC | #5
Am 01.07.2015 um 16:45 hat Stefan Hajnoczi geschrieben:
> If mirror has more free buffers than IOV_MAX, preadv(2)/pwritev(2)
> EINVAL failures may be encountered.
> 
> It is possible to trigger this by setting granularity to a low value
> like 8192.
> 
> This patch stops appending chunks once IOV_MAX is reached.
> 
> The spurious EINVAL failure can be reproduced with a qcow2 image file
> and the following QMP invocation:
> 
>   qmp.command('drive-mirror', device='virtio0', target='/tmp/r7.s1',
>               granularity=8192, sync='full', mode='absolute-paths',
>               format='raw')
> 
> While the guest is running dd if=/dev/zero of=/var/tmp/foo oflag=direct
> bs=4k.
> 
> Cc: Jeff Cody <jcody@redhat.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

This looks like a backend-specific problem with raw-posix. Wouldn't it
make more sense to either let raw-posix split requests it can't handle
or introduce a bs->iov_max instead of spreading knowledge about specific
backends into all callers of the block layer?

I'm not objecting to taking this patch for now to fix the bug in 2.4,
but I don't think it's the proper solution.

Kevin

>  block/mirror.c | 4 ++++
>  trace-events   | 1 +
>  2 files changed, 5 insertions(+)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index 048e452..985ad00 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -241,6 +241,10 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
>              trace_mirror_break_buf_busy(s, nb_chunks, s->in_flight);
>              break;
>          }
> +        if (IOV_MAX < nb_chunks + added_chunks) {
> +            trace_mirror_break_iov_max(s, nb_chunks, added_chunks);
> +            break;
> +        }
>  
>          /* We have enough free space to copy these sectors.  */
>          bitmap_set(s->in_flight_bitmap, next_chunk, added_chunks);
> diff --git a/trace-events b/trace-events
> index 52b7efa..943cd0c 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -94,6 +94,7 @@ mirror_yield(void *s, int64_t cnt, int buf_free_count, int in_flight) "s %p dirt
>  mirror_yield_in_flight(void *s, int64_t sector_num, int in_flight) "s %p sector_num %"PRId64" in_flight %d"
>  mirror_yield_buf_busy(void *s, int nb_chunks, int in_flight) "s %p requested chunks %d in_flight %d"
>  mirror_break_buf_busy(void *s, int nb_chunks, int in_flight) "s %p requested chunks %d in_flight %d"
> +mirror_break_iov_max(void *s, int nb_chunks, int added_chunks) "s %p requested chunks %d added_chunks %d"
>  
>  # block/backup.c
>  backup_do_cow_enter(void *job, int64_t start, int64_t sector_num, int nb_sectors) "job %p start %"PRId64" sector_num %"PRId64" nb_sectors %d"
> -- 
> 2.4.3
> 
>
Stefan Hajnoczi July 8, 2015, 12:34 p.m. UTC | #6
On Wed, Jul 8, 2015 at 12:14 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 01.07.2015 um 16:45 hat Stefan Hajnoczi geschrieben:
>> If mirror has more free buffers than IOV_MAX, preadv(2)/pwritev(2)
>> EINVAL failures may be encountered.
>>
>> It is possible to trigger this by setting granularity to a low value
>> like 8192.
>>
>> This patch stops appending chunks once IOV_MAX is reached.
>>
>> The spurious EINVAL failure can be reproduced with a qcow2 image file
>> and the following QMP invocation:
>>
>>   qmp.command('drive-mirror', device='virtio0', target='/tmp/r7.s1',
>>               granularity=8192, sync='full', mode='absolute-paths',
>>               format='raw')
>>
>> While the guest is running dd if=/dev/zero of=/var/tmp/foo oflag=direct
>> bs=4k.
>>
>> Cc: Jeff Cody <jcody@redhat.com>
>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>
> This looks like a backend-specific problem with raw-posix. Wouldn't it
> make more sense to either let raw-posix split requests it can't handle
> or introduce a bs->iov_max instead of spreading knowledge about specific
> backends into all callers of the block layer?
>
> I'm not objecting to taking this patch for now to fix the bug in 2.4,
> but I don't think it's the proper solution.

I will send a patch for QEMU 2.5 that introduces an iov max field in
BlockLimits.  It will convert the request merging code which currently
hardcodes IOV_MAX too.

In practice there are probably only two values that get used: IOV_MAX
and INT_MAX, but we already have BlockLimits and it doesn't hurt to
add this.

Stefan
Stefan Hajnoczi Aug. 4, 2015, 4:13 p.m. UTC | #7
Sorry Jeff, my "Cc:" line in the commit description wasn't picked up.
Please merge through your block/jobs tree for QEMU 2.5.

Thanks,
Stefan

On Wed, Jul 1, 2015 at 3:45 PM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> If mirror has more free buffers than IOV_MAX, preadv(2)/pwritev(2)
> EINVAL failures may be encountered.
>
> It is possible to trigger this by setting granularity to a low value
> like 8192.
>
> This patch stops appending chunks once IOV_MAX is reached.
>
> The spurious EINVAL failure can be reproduced with a qcow2 image file
> and the following QMP invocation:
>
>   qmp.command('drive-mirror', device='virtio0', target='/tmp/r7.s1',
>               granularity=8192, sync='full', mode='absolute-paths',
>               format='raw')
>
> While the guest is running dd if=/dev/zero of=/var/tmp/foo oflag=direct
> bs=4k.
>
> Cc: Jeff Cody <jcody@redhat.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  block/mirror.c | 4 ++++
>  trace-events   | 1 +
>  2 files changed, 5 insertions(+)
>
> diff --git a/block/mirror.c b/block/mirror.c
> index 048e452..985ad00 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -241,6 +241,10 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
>              trace_mirror_break_buf_busy(s, nb_chunks, s->in_flight);
>              break;
>          }
> +        if (IOV_MAX < nb_chunks + added_chunks) {
> +            trace_mirror_break_iov_max(s, nb_chunks, added_chunks);
> +            break;
> +        }
>
>          /* We have enough free space to copy these sectors.  */
>          bitmap_set(s->in_flight_bitmap, next_chunk, added_chunks);
> diff --git a/trace-events b/trace-events
> index 52b7efa..943cd0c 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -94,6 +94,7 @@ mirror_yield(void *s, int64_t cnt, int buf_free_count, int in_flight) "s %p dirt
>  mirror_yield_in_flight(void *s, int64_t sector_num, int in_flight) "s %p sector_num %"PRId64" in_flight %d"
>  mirror_yield_buf_busy(void *s, int nb_chunks, int in_flight) "s %p requested chunks %d in_flight %d"
>  mirror_break_buf_busy(void *s, int nb_chunks, int in_flight) "s %p requested chunks %d in_flight %d"
> +mirror_break_iov_max(void *s, int nb_chunks, int added_chunks) "s %p requested chunks %d added_chunks %d"
>
>  # block/backup.c
>  backup_do_cow_enter(void *job, int64_t start, int64_t sector_num, int nb_sectors) "job %p start %"PRId64" sector_num %"PRId64" nb_sectors %d"
> --
> 2.4.3
>
>
Jeff Cody Aug. 6, 2015, 8:50 a.m. UTC | #8
On Wed, Jul 01, 2015 at 03:45:50PM +0100, Stefan Hajnoczi wrote:
> If mirror has more free buffers than IOV_MAX, preadv(2)/pwritev(2)
> EINVAL failures may be encountered.
> 
> It is possible to trigger this by setting granularity to a low value
> like 8192.
> 
> This patch stops appending chunks once IOV_MAX is reached.
> 
> The spurious EINVAL failure can be reproduced with a qcow2 image file
> and the following QMP invocation:
> 
>   qmp.command('drive-mirror', device='virtio0', target='/tmp/r7.s1',
>               granularity=8192, sync='full', mode='absolute-paths',
>               format='raw')
> 
> While the guest is running dd if=/dev/zero of=/var/tmp/foo oflag=direct
> bs=4k.
> 

Thanks,

Applied to my 'block-next' branch:

https://github.com/codyprime/qemu-kvm-jtc/tree/block-next

-Jeff
diff mbox

Patch

diff --git a/block/mirror.c b/block/mirror.c
index 048e452..985ad00 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -241,6 +241,10 @@  static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
             trace_mirror_break_buf_busy(s, nb_chunks, s->in_flight);
             break;
         }
+        if (IOV_MAX < nb_chunks + added_chunks) {
+            trace_mirror_break_iov_max(s, nb_chunks, added_chunks);
+            break;
+        }
 
         /* We have enough free space to copy these sectors.  */
         bitmap_set(s->in_flight_bitmap, next_chunk, added_chunks);
diff --git a/trace-events b/trace-events
index 52b7efa..943cd0c 100644
--- a/trace-events
+++ b/trace-events
@@ -94,6 +94,7 @@  mirror_yield(void *s, int64_t cnt, int buf_free_count, int in_flight) "s %p dirt
 mirror_yield_in_flight(void *s, int64_t sector_num, int in_flight) "s %p sector_num %"PRId64" in_flight %d"
 mirror_yield_buf_busy(void *s, int nb_chunks, int in_flight) "s %p requested chunks %d in_flight %d"
 mirror_break_buf_busy(void *s, int nb_chunks, int in_flight) "s %p requested chunks %d in_flight %d"
+mirror_break_iov_max(void *s, int nb_chunks, int added_chunks) "s %p requested chunks %d added_chunks %d"
 
 # block/backup.c
 backup_do_cow_enter(void *job, int64_t start, int64_t sector_num, int nb_sectors) "job %p start %"PRId64" sector_num %"PRId64" nb_sectors %d"