[1/1] block: fix inability to start VM with native AIO

Message ID 1450767586-28794-1-git-send-email-den@openvz.org
State New

Commit Message

Denis V. Lunev Dec. 22, 2015, 6:59 a.m. UTC
error: Failed to start domain rhel7
error: internal error: process exited while connecting to monitor:
    2015-12-22T06:55:18.812637Z qemu-system-x86_64:
    -drive file=/var/lib/libvirt/images/rhel7.qcow2,if=none,
        id=drive-scsi0-0-0-0,format=qcow2,cache=none,aio=native:
    aio=native was specified, but it requires cache.direct=on,
    which was not specified.

The cache=none option was specified, as seen above, yet the VM is unable
to start. This patch properly passes BDRV_O_NOCACHE to the underlying layer.

The problem was exposed by
    commit d657c0c289e944fc22289f5c318f48da87d79dcb
    Author: Kevin Wolf <kwolf@redhat.com>
    Date:   Tue Dec 15 11:35:36 2015 +0100

        raw-posix: Make aio=native option binding

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
---
 block.c | 1 +
 1 file changed, 1 insertion(+)
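
The failure described in the commit message can be modelled in a few lines
of C. This is only a sketch under stated assumptions: the DEMO_* flag values,
struct demo_opts and all demo_* functions are hypothetical stand-ins and are
not QEMU's definitions. It shows two copies of the open flags, where only the
copy stored on the node is refreshed from the options unless the patched
behaviour is selected, so the driver check sees aio=native without the
no-cache bit.

```c
#include <stdbool.h>

/* Hypothetical flag bits for illustration; NOT the real QEMU values. */
enum {
    DEMO_O_NOCACHE    = 1 << 0, /* stands in for BDRV_O_NOCACHE */
    DEMO_O_NATIVE_AIO = 1 << 1  /* stands in for an aio=native request */
};

/* Hypothetical stand-in for the parsed -drive options. */
struct demo_opts {
    bool cache_direct; /* true when cache=none (cache.direct=on) was given */
};

/* Simplified analogue of update_flags_from_options(). */
static void demo_update_flags(int *flags, const struct demo_opts *opts)
{
    *flags &= ~DEMO_O_NOCACHE;
    if (opts->cache_direct) {
        *flags |= DEMO_O_NOCACHE;
    }
}

/* Driver-side check modelled on the error text: native AIO needs no-cache.
 * Returns 0 on success, -1 if the combination is rejected. */
static int demo_driver_open(int flags)
{
    if ((flags & DEMO_O_NATIVE_AIO) && !(flags & DEMO_O_NOCACHE)) {
        return -1;
    }
    return 0;
}

/* patched == false models the old behaviour: only the node's stored copy
 * is updated, while the local copy handed to the driver stays stale. */
static int demo_open_common(const struct demo_opts *opts, int open_flags,
                            bool patched)
{
    int node_open_flags = open_flags; /* analogue of bs->open_flags */

    if (patched) {
        demo_update_flags(&open_flags, opts);
    }
    demo_update_flags(&node_open_flags, opts);
    (void)node_open_flags; /* the driver never looks at this copy */

    return demo_driver_open(open_flags);
}
```

With cache_direct set and DEMO_O_NATIVE_AIO requested, calling
demo_open_common() with patched == false is rejected by the driver check,
while patched == true succeeds.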

Comments

Stefan Hajnoczi Dec. 23, 2015, 8:07 a.m. UTC | #1
On Tue, Dec 22, 2015 at 09:59:46AM +0300, Denis V. Lunev wrote:
> error: Failed to start domain rhel7
> error: internal error: process exited while connecting to monitor:
>     2015-12-22T06:55:18.812637Z qemu-system-x86_64:
>     -drive file=/var/lib/libvirt/images/rhel7.qcow2,if=none,
>         id=drive-scsi0-0-0-0,format=qcow2,cache=none,aio=native:
>     aio=native was specified, but it requires cache.direct=on,
>     which was not specified.
> 
> cache=none option was specified as seen above while the VM is unable to
> start. The patch properly passed BDRV_O_NOCACHE to underlying layer.
> 
> The problem is revealed with
>     commit d657c0c289e944fc22289f5c318f48da87d79dcb
>     Author: Kevin Wolf <kwolf@redhat.com>
>     Date:   Tue Dec 15 11:35:36 2015 +0100
> 
>         raw-posix: Make aio=native option binding
> 
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Kevin Wolf <kwolf@redhat.com>
> ---
>  block.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/block.c b/block.c
> index 411edbf..fe0fbbc 100644
> --- a/block.c
> +++ b/block.c
> @@ -990,6 +990,7 @@ static int bdrv_open_common(BlockDriverState *bs, BdrvChild *file,
>      bs->opaque = g_malloc0(drv->instance_size);
>  
>      /* Apply cache mode options */
> +    update_flags_from_options(&open_flags, opts);
>      update_flags_from_options(&bs->open_flags, opts);
>      bdrv_set_enable_write_cache(bs, bs->open_flags & BDRV_O_CACHE_WB);

I tried to review this patch and failed to understand block.c flags
handling.  Perhaps there is a larger problem here because I see:

1. .bdrv_open() and friends take an int flags argument in addition to
   the already open bs node, which has bs->open_flags.  Some of the code
   that manipulates the flags argument in block.c updates bs->open_flags
   to keep them in sync.  Some code does not (e.g. BDRV_O_NO_BACKING).
   Is this a bug?

   Should int flags be removed in favor of just bs->open_flags?

2. The bdrv_open_common() open_flags local variable contains different
   values from bs->open_flags (i.e. BDRV_O_PROTOCOL).  I'm not sure
   whether this is intentional or a bug.

   Your patch syncs them but I wonder if it would be cleaner to remove
   the open_flags local (avoiding similar problems in the future)?

3. block/qcow2.c stashes .bdrv_open(flags) in s->flags.  I'm not sure
   if bs->open_flags can be used instead of s->flags.  That would be
   simpler.

Maybe someone can explain how this is all supposed to work.

Stefan
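
The alternative raised in points 1 and 2, keeping a single copy of the flags
on the node instead of a separate local, might look roughly like the sketch
below. Everything here is an assumption made for illustration: struct
sketch_node, the SKETCH_* values and the sketch_* functions are hypothetical
and do not correspond to QEMU's API. The sketch also ignores the case
mentioned above where the local is meant to differ from the stored flags
(e.g. BDRV_O_PROTOCOL).

```c
#include <stdbool.h>

/* Hypothetical flag bits for illustration; NOT the real QEMU values. */
enum {
    SKETCH_O_NOCACHE    = 1 << 0,
    SKETCH_O_NATIVE_AIO = 1 << 1
};

/* Hypothetical node type holding the only copy of the open flags. */
struct sketch_node {
    int open_flags;
};

/* Apply the cache mode option directly to the node's flags. */
static void sketch_apply_cache_options(struct sketch_node *node,
                                       bool cache_direct)
{
    if (cache_direct) {
        node->open_flags |= SKETCH_O_NOCACHE;
    } else {
        node->open_flags &= ~SKETCH_O_NOCACHE;
    }
}

/* The driver callback takes no separate flags argument; it reads the
 * node's flags, so there is no second copy that could go stale.
 * Returns 0 on success, -1 if native AIO is requested without no-cache. */
static int sketch_driver_open(const struct sketch_node *node)
{
    int flags = node->open_flags;

    if ((flags & SKETCH_O_NATIVE_AIO) && !(flags & SKETCH_O_NOCACHE)) {
        return -1;
    }
    return 0;
}

/* Open path: update the node once, then call the driver. */
static int sketch_open_common(struct sketch_node *node, bool cache_direct)
{
    sketch_apply_cache_options(node, cache_direct);
    return sketch_driver_open(node);
}
```

Because the driver reads the same field that the option handling writes, a
missed synchronisation call of the kind this patch adds cannot occur in this
simplified model.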
Stefan Hajnoczi Dec. 23, 2015, 8:08 a.m. UTC | #2
On Tue, Dec 22, 2015 at 09:59:46AM +0300, Denis V. Lunev wrote:
> error: Failed to start domain rhel7
> error: internal error: process exited while connecting to monitor:
>     2015-12-22T06:55:18.812637Z qemu-system-x86_64:
>     -drive file=/var/lib/libvirt/images/rhel7.qcow2,if=none,
>         id=drive-scsi0-0-0-0,format=qcow2,cache=none,aio=native:
>     aio=native was specified, but it requires cache.direct=on,
>     which was not specified.
> 
> cache=none option was specified as seen above while the VM is unable to
> start. The patch properly passed BDRV_O_NOCACHE to underlying layer.
> 
> The problem is revealed with
>     commit d657c0c289e944fc22289f5c318f48da87d79dcb
>     Author: Kevin Wolf <kwolf@redhat.com>
>     Date:   Tue Dec 15 11:35:36 2015 +0100
> 
>         raw-posix: Make aio=native option binding
> 
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Kevin Wolf <kwolf@redhat.com>
> ---
>  block.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/block.c b/block.c
> index 411edbf..fe0fbbc 100644
> --- a/block.c
> +++ b/block.c
> @@ -990,6 +990,7 @@ static int bdrv_open_common(BlockDriverState *bs, BdrvChild *file,
>      bs->opaque = g_malloc0(drv->instance_size);
>  
>      /* Apply cache mode options */
> +    update_flags_from_options(&open_flags, opts);
>      update_flags_from_options(&bs->open_flags, opts);
>      bdrv_set_enable_write_cache(bs, bs->open_flags & BDRV_O_CACHE_WB);

By the way, my questions do not prevent your patch from being merged.

I just wanted to raise them for Kevin because I'm concerned there are
more bugs in this area.
Denis V. Lunev Jan. 8, 2016, 12:25 p.m. UTC | #3
On 12/22/2015 09:59 AM, Denis V. Lunev wrote:
> error: Failed to start domain rhel7
> error: internal error: process exited while connecting to monitor:
>      2015-12-22T06:55:18.812637Z qemu-system-x86_64:
>      -drive file=/var/lib/libvirt/images/rhel7.qcow2,if=none,
>          id=drive-scsi0-0-0-0,format=qcow2,cache=none,aio=native:
>      aio=native was specified, but it requires cache.direct=on,
>      which was not specified.
>
> cache=none option was specified as seen above while the VM is unable to
> start. The patch properly passed BDRV_O_NOCACHE to underlying layer.
>
> The problem is revealed with
>      commit d657c0c289e944fc22289f5c318f48da87d79dcb
>      Author: Kevin Wolf <kwolf@redhat.com>
>      Date:   Tue Dec 15 11:35:36 2015 +0100
>
>          raw-posix: Make aio=native option binding
>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Kevin Wolf <kwolf@redhat.com>
> ---
>   block.c | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/block.c b/block.c
> index 411edbf..fe0fbbc 100644
> --- a/block.c
> +++ b/block.c
> @@ -990,6 +990,7 @@ static int bdrv_open_common(BlockDriverState *bs, BdrvChild *file,
>       bs->opaque = g_malloc0(drv->instance_size);
>   
>       /* Apply cache mode options */
> +    update_flags_from_options(&open_flags, opts);
>       update_flags_from_options(&bs->open_flags, opts);
>       bdrv_set_enable_write_cache(bs, bs->open_flags & BDRV_O_CACHE_WB);
>   
ping
Christian Borntraeger Jan. 11, 2016, 1:46 p.m. UTC | #4
On 12/22/2015 07:59 AM, Denis V. Lunev wrote:
> error: Failed to start domain rhel7
> error: internal error: process exited while connecting to monitor:
>     2015-12-22T06:55:18.812637Z qemu-system-x86_64:
>     -drive file=/var/lib/libvirt/images/rhel7.qcow2,if=none,
>         id=drive-scsi0-0-0-0,format=qcow2,cache=none,aio=native:
>     aio=native was specified, but it requires cache.direct=on,
>     which was not specified.
> 
> cache=none option was specified as seen above while the VM is unable to
> start. The patch properly passed BDRV_O_NOCACHE to underlying layer.
> 
> The problem is revealed with
>     commit d657c0c289e944fc22289f5c318f48da87d79dcb
>     Author: Kevin Wolf <kwolf@redhat.com>
>     Date:   Tue Dec 15 11:35:36 2015 +0100
> 
>         raw-posix: Make aio=native option binding
> 
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Kevin Wolf <kwolf@redhat.com>

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>

Without this patch, all libvirt XML definitions with
cache='none' io='native'
are broken. We should apply this patch (or some other fix)
soon.

> ---
>  block.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/block.c b/block.c
> index 411edbf..fe0fbbc 100644
> --- a/block.c
> +++ b/block.c
> @@ -990,6 +990,7 @@ static int bdrv_open_common(BlockDriverState *bs, BdrvChild *file,
>      bs->opaque = g_malloc0(drv->instance_size);
> 
>      /* Apply cache mode options */
> +    update_flags_from_options(&open_flags, opts);
>      update_flags_from_options(&bs->open_flags, opts);
>      bdrv_set_enable_write_cache(bs, bs->open_flags & BDRV_O_CACHE_WB);
>
Denis V. Lunev Jan. 11, 2016, 2:26 p.m. UTC | #5
On 01/11/2016 04:46 PM, Christian Borntraeger wrote:
> On 12/22/2015 07:59 AM, Denis V. Lunev wrote:
>> error: Failed to start domain rhel7
>> error: internal error: process exited while connecting to monitor:
>>      2015-12-22T06:55:18.812637Z qemu-system-x86_64:
>>      -drive file=/var/lib/libvirt/images/rhel7.qcow2,if=none,
>>          id=drive-scsi0-0-0-0,format=qcow2,cache=none,aio=native:
>>      aio=native was specified, but it requires cache.direct=on,
>>      which was not specified.
>>
>> cache=none option was specified as seen above while the VM is unable to
>> start. The patch properly passed BDRV_O_NOCACHE to underlying layer.
>>
>> The problem is revealed with
>>      commit d657c0c289e944fc22289f5c318f48da87d79dcb
>>      Author: Kevin Wolf <kwolf@redhat.com>
>>      Date:   Tue Dec 15 11:35:36 2015 +0100
>>
>>          raw-posix: Make aio=native option binding
>>
>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>> CC: Kevin Wolf <kwolf@redhat.com>
> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
>
> Without this patch all libvirt xmls with
> cache='none' io='native'
> are broken. We should apply this patch (or something else)
> soon.
>
exactly!
Stefan Hajnoczi Feb. 8, 2016, 2:49 p.m. UTC | #6
On Mon, Jan 11, 2016 at 05:26:19PM +0300, Denis V. Lunev wrote:
> On 01/11/2016 04:46 PM, Christian Borntraeger wrote:
> >On 12/22/2015 07:59 AM, Denis V. Lunev wrote:
> >>error: Failed to start domain rhel7
> >>error: internal error: process exited while connecting to monitor:
> >>     2015-12-22T06:55:18.812637Z qemu-system-x86_64:
> >>     -drive file=/var/lib/libvirt/images/rhel7.qcow2,if=none,
> >>         id=drive-scsi0-0-0-0,format=qcow2,cache=none,aio=native:
> >>     aio=native was specified, but it requires cache.direct=on,
> >>     which was not specified.
> >>
> >>cache=none option was specified as seen above while the VM is unable to
> >>start. The patch properly passed BDRV_O_NOCACHE to underlying layer.
> >>
> >>The problem is revealed with
> >>     commit d657c0c289e944fc22289f5c318f48da87d79dcb
> >>     Author: Kevin Wolf <kwolf@redhat.com>
> >>     Date:   Tue Dec 15 11:35:36 2015 +0100
> >>
> >>         raw-posix: Make aio=native option binding
> >>
> >>Signed-off-by: Denis V. Lunev <den@openvz.org>
> >>CC: Kevin Wolf <kwolf@redhat.com>
> >Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
> >
> >Without this patch all libvirt xmls with
> >cache='none' io='native'
> >are broken. We should apply this patch (or something else)
> >soon.
> >
> exactly!

For the record, the bug has been fixed.  I went back and bisected:

Kevin Wolf fixed the issue in "block: Fix .bdrv_open flags"
(82dc8b411040fa8a7418a012ff39b8d06f68e639).

Stefan

Patch

diff --git a/block.c b/block.c
index 411edbf..fe0fbbc 100644
--- a/block.c
+++ b/block.c
@@ -990,6 +990,7 @@  static int bdrv_open_common(BlockDriverState *bs, BdrvChild *file,
     bs->opaque = g_malloc0(drv->instance_size);
 
     /* Apply cache mode options */
+    update_flags_from_options(&open_flags, opts);
     update_flags_from_options(&bs->open_flags, opts);
     bdrv_set_enable_write_cache(bs, bs->open_flags & BDRV_O_CACHE_WB);