[v2,12/17] mirror: Fix potential use-after-free in active commit

Message ID 20180913125217.23173-13-kwolf@redhat.com
State New
Series Fix some jobs/drain/aio_poll related hangs

Commit Message

Kevin Wolf Sept. 13, 2018, 12:52 p.m. UTC
When starting an active commit job, other callbacks can run before
mirror_start_job() calls bdrv_ref() where needed and cause the nodes to
go away. Add another pair of bdrv_ref/unref() around it to protect
against this case.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/mirror.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

Comments

Max Reitz Sept. 13, 2018, 8:55 p.m. UTC | #1
On 13.09.18 14:52, Kevin Wolf wrote:
> When starting an active commit job, other callbacks can run before
> mirror_start_job() calls bdrv_ref() where needed and cause the nodes to
> go away. Add another pair of bdrv_ref/unref() around it to protect
> against this case.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/mirror.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)

Reviewed-by: Max Reitz <mreitz@redhat.com>

But...  How?

Like...  You mirror to some target (in an iothread), then you give that
target a backing file, then you cancel the mirror and immediately commit
the target?

Max
Max Reitz Sept. 13, 2018, 9:43 p.m. UTC | #2
On 13.09.18 22:55, Max Reitz wrote:
> On 13.09.18 14:52, Kevin Wolf wrote:
>> When starting an active commit job, other callbacks can run before
>> mirror_start_job() calls bdrv_ref() where needed and cause the nodes to
>> go away. Add another pair of bdrv_ref/unref() around it to protect
>> against this case.
>>
>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>> ---
>>  block/mirror.c | 11 +++++++++++
>>  1 file changed, 11 insertions(+)
> 
> Reviewed-by: Max Reitz <mreitz@redhat.com>
> 
> But...  How?
> 
> Like...  You mirror to some target (in an iothread), then you give that
> target a backing file, then you cancel the mirror and immediately commit
> the target?

The only way I got this to work was to allow commit to accept a non-root
BDS as @device.  I can't imagine a case where @device can go away but
isn't currently in use by something that would make it a non-root BDS.
(Because the only way someone can make it go away is that this someone
is using it right now.)

But if commit accepts non-root BDSs as @device, I get a segfault even
after this commit...

Max
Kevin Wolf Sept. 14, 2018, 4:25 p.m. UTC | #3
Am 13.09.2018 um 22:55 hat Max Reitz geschrieben:
> On 13.09.18 14:52, Kevin Wolf wrote:
> > When starting an active commit job, other callbacks can run before
> > mirror_start_job() calls bdrv_ref() where needed and cause the nodes to
> > go away. Add another pair of bdrv_ref/unref() around it to protect
> > against this case.
> > 
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> >  block/mirror.c | 11 +++++++++++
> >  1 file changed, 11 insertions(+)
> 
> Reviewed-by: Max Reitz <mreitz@redhat.com>
> 
> But...  How?

Tried to reproduce the other bug Paolo was concerned about (good there
is something like 'git reflog'!) and dug up this one instead.

So the scenario seems to be test_stream_commit_1 in qemu-iotests 030.

The backtrace when the BDS is deleted is the following:

(rr) bt
#0  0x00007faeaf6145ec in __memset_sse2_unaligned_erms () at /lib64/libc.so.6
#1  0x00007faeaf60414e in _int_free () at /lib64/libc.so.6
#2  0x00007faeaf6093be in free () at /lib64/libc.so.6
#3  0x00007faecaea6b4e in g_free () at /lib64/libglib-2.0.so.0
#4  0x000055c0c94f9d6c in bdrv_delete (bs=0x55c0cbe69240) at block.c:3590
#5  0x000055c0c94fbe82 in bdrv_unref (bs=0x55c0cbe69240) at block.c:4638
#6  0x000055c0c94f6761 in bdrv_root_unref_child (child=0x55c0cbe57af0) at block.c:2188
#7  0x000055c0c94fe239 in block_job_remove_all_bdrv (job=0x55c0ccf9e220) at blockjob.c:200
#8  0x000055c0c94fdf1f in block_job_free (job=0x55c0ccf9e220) at blockjob.c:94
#9  0x000055c0c94ffe0d in job_unref (job=0x55c0ccf9e220) at job.c:368
#10 0x000055c0c95007cd in job_do_dismiss (job=0x55c0ccf9e220) at job.c:641
#11 0x000055c0c95008d9 in job_conclude (job=0x55c0ccf9e220) at job.c:667
#12 0x000055c0c9500b66 in job_finalize_single (job=0x55c0ccf9e220) at job.c:735
#13 0x000055c0c94ff4fa in job_txn_apply (txn=0x55c0cbe19eb0, fn=0x55c0c9500a62 <job_finalize_single>, lock=true) at job.c:151
#14 0x000055c0c9500e92 in job_do_finalize (job=0x55c0ccf9e220) at job.c:822
#15 0x000055c0c9501040 in job_completed_txn_success (job=0x55c0ccf9e220) at job.c:872
#16 0x000055c0c950115c in job_completed (job=0x55c0ccf9e220, ret=0, error=0x0) at job.c:892
#17 0x000055c0c92b572c in stream_complete (job=0x55c0ccf9e220, opaque=0x55c0cc050bc0) at block/stream.c:96
#18 0x000055c0c950142b in job_defer_to_main_loop_bh (opaque=0x55c0cbe38a50) at job.c:981
#19 0x000055c0c96171fc in aio_bh_call (bh=0x55c0cbe19f20) at util/async.c:90
#20 0x000055c0c9617294 in aio_bh_poll (ctx=0x55c0cbd7b8d0) at util/async.c:118
#21 0x000055c0c961c150 in aio_poll (ctx=0x55c0cbd7b8d0, blocking=true) at util/aio-posix.c:690
#22 0x000055c0c957246c in bdrv_flush (bs=0x55c0cbe4b250) at block/io.c:2693
#23 0x000055c0c94f8e46 in bdrv_reopen_prepare (reopen_state=0x55c0cbef4d68, queue=0x55c0cbeab9f0, errp=0x7ffd65617050) at block.c:3206
#24 0x000055c0c94f883a in bdrv_reopen_multiple (ctx=0x55c0cbd7b8d0, bs_queue=0x55c0cbeab9f0, errp=0x7ffd656170c8) at block.c:3032
#25 0x000055c0c94f8a00 in bdrv_reopen (bs=0x55c0cbe2c250, bdrv_flags=8194, errp=0x7ffd65617220) at block.c:3075
#26 0x000055c0c956a23f in commit_active_start (job_id=0x0, bs=0x55c0cbd905b0, base=0x55c0cbe2c250, creation_flags=0, speed=0, on_error=BLOCKDEV_ON_ERROR_REPORT, filter_node_name=0x0, cb=0x0, opaque=0x0, auto_complete=false, errp=0x7ffd65617220) at block/mirror.c:1687
#27 0x000055c0c927437c in qmp_block_commit (has_job_id=false, job_id=0x0, device=0x55c0cbef4d20 "drive0", has_base_node=false, base_node=0x0, has_base=true, base=0x55c0cbe38a00 "/home/kwolf/source/qemu/tests/qemu-iotests/scratch/img-3.img", has_top_node=false, top_node=0x0, has_top=false, top=0x0, has_backing_file=false, backing_file=0x0, has_speed=false, speed=0, has_filter_node_name=false, filter_node_name=0x0, errp=0x7ffd656172d8) at blockdev.c:3325
#28 0x000055c0c928bb08 in qmp_marshal_block_commit (args=<optimized out>, ret=<optimized out>, errp=0x7ffd656173b8) at qapi/qapi-commands-block-core.c:409
#29 0x000055c0c960beef in do_qmp_dispatch (errp=0x7ffd656173b0, allow_oob=false, request=<optimized out>, cmds=0x55c0c9f56f80 <qmp_commands>) at qapi/qmp-dispatch.c:129
#30 0x000055c0c960beef in qmp_dispatch (cmds=0x55c0c9f56f80 <qmp_commands>, request=<optimized out>, allow_oob=<optimized out>) at qapi/qmp-dispatch.c:171
#31 0x000055c0c9111623 in monitor_qmp_dispatch (mon=0x55c0cbdbb930, req=0x55c0ccf90e00, id=0x0) at /home/kwolf/source/qemu/monitor.c:4168
#32 0x000055c0c911191f in monitor_qmp_bh_dispatcher (data=0x0) at /home/kwolf/source/qemu/monitor.c:4237
#33 0x000055c0c96171fc in aio_bh_call (bh=0x55c0cbccc840) at util/async.c:90
#34 0x000055c0c9617294 in aio_bh_poll (ctx=0x55c0cbccc550) at util/async.c:118
#35 0x000055c0c961b723 in aio_dispatch (ctx=0x55c0cbccc550) at util/aio-posix.c:436
#36 0x000055c0c961762f in aio_ctx_dispatch (source=0x55c0cbccc550, callback=0x0, user_data=0x0) at util/async.c:261
#37 0x00007faecaea1257 in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#38 0x000055c0c961a2aa in glib_pollfds_poll () at util/main-loop.c:215
#39 0x000055c0c961a2aa in os_host_main_loop_wait (timeout=0) at util/main-loop.c:238
#40 0x000055c0c961a2aa in main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:497
#41 0x000055c0c92805e1 in main_loop () at vl.c:1866
#42 0x000055c0c9287eca in main (argc=18, argv=0x7ffd65617958, envp=0x7ffd656179f0) at vl.c:4647
(rr) p bs.node_name 
$17 = "node1", '\000' <repeats 26 times>

Obviously, this exact backtrace shouldn't even be possible like this
because job_defer_to_main_loop_bh() shouldn't be called inside
bdrv_flush(), but when draining the subtree in bdrv_reopen(). This is
only fixed in "blockjob: Lie better in child_job_drained_poll()".

It probably doesn't make a difference, though, where exactly during
bdrv_reopen() the node is deleted. We'll crash anyway if we don't hold
the extra reference.

Kevin
Max Reitz Sept. 16, 2018, 10:05 p.m. UTC | #4
On 14.09.18 18:25, Kevin Wolf wrote:
> Am 13.09.2018 um 22:55 hat Max Reitz geschrieben:
>> On 13.09.18 14:52, Kevin Wolf wrote:
>>> When starting an active commit job, other callbacks can run before
>>> mirror_start_job() calls bdrv_ref() where needed and cause the nodes to
>>> go away. Add another pair of bdrv_ref/unref() around it to protect
>>> against this case.
>>>
>>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>>> ---
>>>  block/mirror.c | 11 +++++++++++
>>>  1 file changed, 11 insertions(+)
>>
>> Reviewed-by: Max Reitz <mreitz@redhat.com>
>>
>> But...  How?
> 
> Tried to reproduce the other bug Paolo was concerned about (good there
> is something like 'git reflog'!) and dug up this one instead.
> 
> So the scenario seems to be test_stream_commit_1 in qemu-iotests 030.
> 
> The backtrace when the BDS is deleted is the following:
> 
> (rr) bt
> #0  0x00007faeaf6145ec in __memset_sse2_unaligned_erms () at /lib64/libc.so.6
> #1  0x00007faeaf60414e in _int_free () at /lib64/libc.so.6
> #2  0x00007faeaf6093be in free () at /lib64/libc.so.6
> #3  0x00007faecaea6b4e in g_free () at /lib64/libglib-2.0.so.0
> #4  0x000055c0c94f9d6c in bdrv_delete (bs=0x55c0cbe69240) at block.c:3590
> #5  0x000055c0c94fbe82 in bdrv_unref (bs=0x55c0cbe69240) at block.c:4638
> #6  0x000055c0c94f6761 in bdrv_root_unref_child (child=0x55c0cbe57af0) at block.c:2188
> #7  0x000055c0c94fe239 in block_job_remove_all_bdrv (job=0x55c0ccf9e220) at blockjob.c:200
> #8  0x000055c0c94fdf1f in block_job_free (job=0x55c0ccf9e220) at blockjob.c:94
> #9  0x000055c0c94ffe0d in job_unref (job=0x55c0ccf9e220) at job.c:368
> #10 0x000055c0c95007cd in job_do_dismiss (job=0x55c0ccf9e220) at job.c:641
> #11 0x000055c0c95008d9 in job_conclude (job=0x55c0ccf9e220) at job.c:667
> #12 0x000055c0c9500b66 in job_finalize_single (job=0x55c0ccf9e220) at job.c:735
> #13 0x000055c0c94ff4fa in job_txn_apply (txn=0x55c0cbe19eb0, fn=0x55c0c9500a62 <job_finalize_single>, lock=true) at job.c:151
> #14 0x000055c0c9500e92 in job_do_finalize (job=0x55c0ccf9e220) at job.c:822
> #15 0x000055c0c9501040 in job_completed_txn_success (job=0x55c0ccf9e220) at job.c:872
> #16 0x000055c0c950115c in job_completed (job=0x55c0ccf9e220, ret=0, error=0x0) at job.c:892
> #17 0x000055c0c92b572c in stream_complete (job=0x55c0ccf9e220, opaque=0x55c0cc050bc0) at block/stream.c:96

But isn't this just deletion of node1, i.e. the intermediate node of the
stream job?

The commit job happens above that (node3 and upwards), so all its BDSs
shouldn't be affected.  So even with the bdrv_ref()s introduced in this
patch, I'd still expect node1 to be deleted exactly here.

Max

> #18 0x000055c0c950142b in job_defer_to_main_loop_bh (opaque=0x55c0cbe38a50) at job.c:981
> #19 0x000055c0c96171fc in aio_bh_call (bh=0x55c0cbe19f20) at util/async.c:90
> #20 0x000055c0c9617294 in aio_bh_poll (ctx=0x55c0cbd7b8d0) at util/async.c:118
> #21 0x000055c0c961c150 in aio_poll (ctx=0x55c0cbd7b8d0, blocking=true) at util/aio-posix.c:690
> #22 0x000055c0c957246c in bdrv_flush (bs=0x55c0cbe4b250) at block/io.c:2693
> #23 0x000055c0c94f8e46 in bdrv_reopen_prepare (reopen_state=0x55c0cbef4d68, queue=0x55c0cbeab9f0, errp=0x7ffd65617050) at block.c:3206
> #24 0x000055c0c94f883a in bdrv_reopen_multiple (ctx=0x55c0cbd7b8d0, bs_queue=0x55c0cbeab9f0, errp=0x7ffd656170c8) at block.c:3032
> #25 0x000055c0c94f8a00 in bdrv_reopen (bs=0x55c0cbe2c250, bdrv_flags=8194, errp=0x7ffd65617220) at block.c:3075
> #26 0x000055c0c956a23f in commit_active_start (job_id=0x0, bs=0x55c0cbd905b0, base=0x55c0cbe2c250, creation_flags=0, speed=0, on_error=BLOCKDEV_ON_ERROR_REPORT, filter_node_name=0x0, cb=0x0, opaque=0x0, auto_complete=false, errp=0x7ffd65617220) at block/mirror.c:1687
> #27 0x000055c0c927437c in qmp_block_commit (has_job_id=false, job_id=0x0, device=0x55c0cbef4d20 "drive0", has_base_node=false, base_node=0x0, has_base=true, base=0x55c0cbe38a00 "/home/kwolf/source/qemu/tests/qemu-iotests/scratch/img-3.img", has_top_node=false, top_node=0x0, has_top=false, top=0x0, has_backing_file=false, backing_file=0x0, has_speed=false, speed=0, has_filter_node_name=false, filter_node_name=0x0, errp=0x7ffd656172d8) at blockdev.c:3325
> #28 0x000055c0c928bb08 in qmp_marshal_block_commit (args=<optimized out>, ret=<optimized out>, errp=0x7ffd656173b8) at qapi/qapi-commands-block-core.c:409
> #29 0x000055c0c960beef in do_qmp_dispatch (errp=0x7ffd656173b0, allow_oob=false, request=<optimized out>, cmds=0x55c0c9f56f80 <qmp_commands>) at qapi/qmp-dispatch.c:129
> #30 0x000055c0c960beef in qmp_dispatch (cmds=0x55c0c9f56f80 <qmp_commands>, request=<optimized out>, allow_oob=<optimized out>) at qapi/qmp-dispatch.c:171
> #31 0x000055c0c9111623 in monitor_qmp_dispatch (mon=0x55c0cbdbb930, req=0x55c0ccf90e00, id=0x0) at /home/kwolf/source/qemu/monitor.c:4168
> #32 0x000055c0c911191f in monitor_qmp_bh_dispatcher (data=0x0) at /home/kwolf/source/qemu/monitor.c:4237
> #33 0x000055c0c96171fc in aio_bh_call (bh=0x55c0cbccc840) at util/async.c:90
> #34 0x000055c0c9617294 in aio_bh_poll (ctx=0x55c0cbccc550) at util/async.c:118
> #35 0x000055c0c961b723 in aio_dispatch (ctx=0x55c0cbccc550) at util/aio-posix.c:436
> #36 0x000055c0c961762f in aio_ctx_dispatch (source=0x55c0cbccc550, callback=0x0, user_data=0x0) at util/async.c:261
> #37 0x00007faecaea1257 in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
> #38 0x000055c0c961a2aa in glib_pollfds_poll () at util/main-loop.c:215
> #39 0x000055c0c961a2aa in os_host_main_loop_wait (timeout=0) at util/main-loop.c:238
> #40 0x000055c0c961a2aa in main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:497
> #41 0x000055c0c92805e1 in main_loop () at vl.c:1866
> #42 0x000055c0c9287eca in main (argc=18, argv=0x7ffd65617958, envp=0x7ffd656179f0) at vl.c:4647
> (rr) p bs.node_name 
> $17 = "node1", '\000' <repeats 26 times>
> 
> Obviously, this exact backtrace shouldn't even be possible like this
> because job_defer_to_main_loop_bh() shouldn't be called inside
> bdrv_flush(), but when draining the subtree in bdrv_reopen(). This is
> only fixed in "blockjob: Lie better in child_job_drained_poll()".
> 
> It probably doesn't make a difference, though, where exactly during
> bdrv_reopen() the node is deleted. We'll crash anyway if we don't hold
> the extra reference.
> 
> Kevin
>
Kevin Wolf Sept. 17, 2018, 11:37 a.m. UTC | #5
Am 17.09.2018 um 00:05 hat Max Reitz geschrieben:
> On 14.09.18 18:25, Kevin Wolf wrote:
> > Am 13.09.2018 um 22:55 hat Max Reitz geschrieben:
> >> On 13.09.18 14:52, Kevin Wolf wrote:
> >>> When starting an active commit job, other callbacks can run before
> >>> mirror_start_job() calls bdrv_ref() where needed and cause the nodes to
> >>> go away. Add another pair of bdrv_ref/unref() around it to protect
> >>> against this case.
> >>>
> >>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> >>> ---
> >>>  block/mirror.c | 11 +++++++++++
> >>>  1 file changed, 11 insertions(+)
> >>
> >> Reviewed-by: Max Reitz <mreitz@redhat.com>
> >>
> >> But...  How?
> > 
> > Tried to reproduce the other bug Paolo was concerned about (good there
> > is something like 'git reflog'!) and dug up this one instead.
> > 
> > So the scenario seems to be test_stream_commit_1 in qemu-iotests 030.
> > 
> > The backtrace when the BDS is deleted is the following:
> > 
> > (rr) bt
> > #0  0x00007faeaf6145ec in __memset_sse2_unaligned_erms () at /lib64/libc.so.6
> > #1  0x00007faeaf60414e in _int_free () at /lib64/libc.so.6
> > #2  0x00007faeaf6093be in free () at /lib64/libc.so.6
> > #3  0x00007faecaea6b4e in g_free () at /lib64/libglib-2.0.so.0
> > #4  0x000055c0c94f9d6c in bdrv_delete (bs=0x55c0cbe69240) at block.c:3590
> > #5  0x000055c0c94fbe82 in bdrv_unref (bs=0x55c0cbe69240) at block.c:4638
> > #6  0x000055c0c94f6761 in bdrv_root_unref_child (child=0x55c0cbe57af0) at block.c:2188
> > #7  0x000055c0c94fe239 in block_job_remove_all_bdrv (job=0x55c0ccf9e220) at blockjob.c:200
> > #8  0x000055c0c94fdf1f in block_job_free (job=0x55c0ccf9e220) at blockjob.c:94
> > #9  0x000055c0c94ffe0d in job_unref (job=0x55c0ccf9e220) at job.c:368
> > #10 0x000055c0c95007cd in job_do_dismiss (job=0x55c0ccf9e220) at job.c:641
> > #11 0x000055c0c95008d9 in job_conclude (job=0x55c0ccf9e220) at job.c:667
> > #12 0x000055c0c9500b66 in job_finalize_single (job=0x55c0ccf9e220) at job.c:735
> > #13 0x000055c0c94ff4fa in job_txn_apply (txn=0x55c0cbe19eb0, fn=0x55c0c9500a62 <job_finalize_single>, lock=true) at job.c:151
> > #14 0x000055c0c9500e92 in job_do_finalize (job=0x55c0ccf9e220) at job.c:822
> > #15 0x000055c0c9501040 in job_completed_txn_success (job=0x55c0ccf9e220) at job.c:872
> > #16 0x000055c0c950115c in job_completed (job=0x55c0ccf9e220, ret=0, error=0x0) at job.c:892
> > #17 0x000055c0c92b572c in stream_complete (job=0x55c0ccf9e220, opaque=0x55c0cc050bc0) at block/stream.c:96
> 
> But isn't this just deletion of node1, i.e. the intermediate node of the
> stream job?
> 
> The commit job happens above that (node3 and upwards), so all its BDSs
> shouldn't be affected.  So even with the bdrv_ref()s introduced in this
> patch, I'd still expect node1 to be deleted exactly here.

Hm, I suppose that's true.

So it crashed because the drain in bdrv_reopen() didn't actually drain
the streaming job (which includes removing node1), so node1 ended up in
the ReopenQueue and when it disappeared in the middle of reopen, we got
a use after free.

Then it would be a bug between job draining and bdrv_reopen(), and the
active commit job would only be a victim of the bug without actually
doing anything wrong, so we can drop this patch.

Does this make sense?

Kevin
Max Reitz Sept. 18, 2018, 2:11 p.m. UTC | #6
On 17.09.18 13:37, Kevin Wolf wrote:
> Am 17.09.2018 um 00:05 hat Max Reitz geschrieben:
>> On 14.09.18 18:25, Kevin Wolf wrote:
>>> Am 13.09.2018 um 22:55 hat Max Reitz geschrieben:
>>>> On 13.09.18 14:52, Kevin Wolf wrote:
>>>>> When starting an active commit job, other callbacks can run before
>>>>> mirror_start_job() calls bdrv_ref() where needed and cause the nodes to
>>>>> go away. Add another pair of bdrv_ref/unref() around it to protect
>>>>> against this case.
>>>>>
>>>>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>>>>> ---
>>>>>  block/mirror.c | 11 +++++++++++
>>>>>  1 file changed, 11 insertions(+)
>>>>
>>>> Reviewed-by: Max Reitz <mreitz@redhat.com>
>>>>
>>>> But...  How?
>>>
>>> Tried to reproduce the other bug Paolo was concerned about (good there
>>> is something like 'git reflog'!) and dug up this one instead.
>>>
>>> So the scenario seems to be test_stream_commit_1 in qemu-iotests 030.
>>>
>>> The backtrace when the BDS is deleted is the following:
>>>
>>> (rr) bt
>>> #0  0x00007faeaf6145ec in __memset_sse2_unaligned_erms () at /lib64/libc.so.6
>>> #1  0x00007faeaf60414e in _int_free () at /lib64/libc.so.6
>>> #2  0x00007faeaf6093be in free () at /lib64/libc.so.6
>>> #3  0x00007faecaea6b4e in g_free () at /lib64/libglib-2.0.so.0
>>> #4  0x000055c0c94f9d6c in bdrv_delete (bs=0x55c0cbe69240) at block.c:3590
>>> #5  0x000055c0c94fbe82 in bdrv_unref (bs=0x55c0cbe69240) at block.c:4638
>>> #6  0x000055c0c94f6761 in bdrv_root_unref_child (child=0x55c0cbe57af0) at block.c:2188
>>> #7  0x000055c0c94fe239 in block_job_remove_all_bdrv (job=0x55c0ccf9e220) at blockjob.c:200
>>> #8  0x000055c0c94fdf1f in block_job_free (job=0x55c0ccf9e220) at blockjob.c:94
>>> #9  0x000055c0c94ffe0d in job_unref (job=0x55c0ccf9e220) at job.c:368
>>> #10 0x000055c0c95007cd in job_do_dismiss (job=0x55c0ccf9e220) at job.c:641
>>> #11 0x000055c0c95008d9 in job_conclude (job=0x55c0ccf9e220) at job.c:667
>>> #12 0x000055c0c9500b66 in job_finalize_single (job=0x55c0ccf9e220) at job.c:735
>>> #13 0x000055c0c94ff4fa in job_txn_apply (txn=0x55c0cbe19eb0, fn=0x55c0c9500a62 <job_finalize_single>, lock=true) at job.c:151
>>> #14 0x000055c0c9500e92 in job_do_finalize (job=0x55c0ccf9e220) at job.c:822
>>> #15 0x000055c0c9501040 in job_completed_txn_success (job=0x55c0ccf9e220) at job.c:872
>>> #16 0x000055c0c950115c in job_completed (job=0x55c0ccf9e220, ret=0, error=0x0) at job.c:892
>>> #17 0x000055c0c92b572c in stream_complete (job=0x55c0ccf9e220, opaque=0x55c0cc050bc0) at block/stream.c:96
>>
>> But isn't this just deletion of node1, i.e. the intermediate node of the
>> stream job?
>>
>> The commit job happens above that (node3 and upwards), so all its BDSs
>> shouldn't be affected.  So even with the bdrv_ref()s introduced in this
>> patch, I'd still expect node1 to be deleted exactly here.
> 
> Hm, I suppose that's true.
> 
> So it crashed because the drain in bdrv_reopen() didn't actually drain
> the streaming job (which includes removing node1), so node1 ended up in
> the ReopenQueue and when it disappeared in the middle of reopen, we got
> a use after free.
> 
> Then it would be a bug between job draining and bdrv_reopen(), and the
> active commit job would only be a victim of the bug without actually
> doing anything wrong, so we can drop this patch.
> 
> Does this make sense?

The explanation makes sense, yes.

About dropping the patch...  We would have to be certain that you can
only run a block job if none of the nodes involved can be deleted at any
point.  Once the job has started, this is the job's responsibility,
because it has referenced everything it needs (through
block_job_add_bdrv()).  So the critical point is while the job is
starting, before it has invoked that function.

The user cannot do any graph manipulation then because the monitor is
blocked by starting the job.  So the only issue would be other
operations that can make nodes go away without user intervention.
Naturally the way to prevent conflicts would be with blockers, and
currently our old job blockers should be sufficient to prevent them.
But in theory we don't like them, so is this what we need the graph
manipulation blocker for? :-)

Max
Kevin Wolf Sept. 18, 2018, 3:04 p.m. UTC | #7
Am 18.09.2018 um 16:11 hat Max Reitz geschrieben:
> On 17.09.18 13:37, Kevin Wolf wrote:
> > Am 17.09.2018 um 00:05 hat Max Reitz geschrieben:
> >> On 14.09.18 18:25, Kevin Wolf wrote:
> >>> Am 13.09.2018 um 22:55 hat Max Reitz geschrieben:
> >>>> On 13.09.18 14:52, Kevin Wolf wrote:
> >>>>> When starting an active commit job, other callbacks can run before
> >>>>> mirror_start_job() calls bdrv_ref() where needed and cause the nodes to
> >>>>> go away. Add another pair of bdrv_ref/unref() around it to protect
> >>>>> against this case.
> >>>>>
> >>>>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> >>>>> ---
> >>>>>  block/mirror.c | 11 +++++++++++
> >>>>>  1 file changed, 11 insertions(+)
> >>>>
> >>>> Reviewed-by: Max Reitz <mreitz@redhat.com>
> >>>>
> >>>> But...  How?
> >>>
> >>> Tried to reproduce the other bug Paolo was concerned about (good there
> >>> is something like 'git reflog'!) and dug up this one instead.
> >>>
> >>> So the scenario seems to be test_stream_commit_1 in qemu-iotests 030.
> >>>
> >>> The backtrace when the BDS is deleted is the following:
> >>>
> >>> (rr) bt
> >>> #0  0x00007faeaf6145ec in __memset_sse2_unaligned_erms () at /lib64/libc.so.6
> >>> #1  0x00007faeaf60414e in _int_free () at /lib64/libc.so.6
> >>> #2  0x00007faeaf6093be in free () at /lib64/libc.so.6
> >>> #3  0x00007faecaea6b4e in g_free () at /lib64/libglib-2.0.so.0
> >>> #4  0x000055c0c94f9d6c in bdrv_delete (bs=0x55c0cbe69240) at block.c:3590
> >>> #5  0x000055c0c94fbe82 in bdrv_unref (bs=0x55c0cbe69240) at block.c:4638
> >>> #6  0x000055c0c94f6761 in bdrv_root_unref_child (child=0x55c0cbe57af0) at block.c:2188
> >>> #7  0x000055c0c94fe239 in block_job_remove_all_bdrv (job=0x55c0ccf9e220) at blockjob.c:200
> >>> #8  0x000055c0c94fdf1f in block_job_free (job=0x55c0ccf9e220) at blockjob.c:94
> >>> #9  0x000055c0c94ffe0d in job_unref (job=0x55c0ccf9e220) at job.c:368
> >>> #10 0x000055c0c95007cd in job_do_dismiss (job=0x55c0ccf9e220) at job.c:641
> >>> #11 0x000055c0c95008d9 in job_conclude (job=0x55c0ccf9e220) at job.c:667
> >>> #12 0x000055c0c9500b66 in job_finalize_single (job=0x55c0ccf9e220) at job.c:735
> >>> #13 0x000055c0c94ff4fa in job_txn_apply (txn=0x55c0cbe19eb0, fn=0x55c0c9500a62 <job_finalize_single>, lock=true) at job.c:151
> >>> #14 0x000055c0c9500e92 in job_do_finalize (job=0x55c0ccf9e220) at job.c:822
> >>> #15 0x000055c0c9501040 in job_completed_txn_success (job=0x55c0ccf9e220) at job.c:872
> >>> #16 0x000055c0c950115c in job_completed (job=0x55c0ccf9e220, ret=0, error=0x0) at job.c:892
> >>> #17 0x000055c0c92b572c in stream_complete (job=0x55c0ccf9e220, opaque=0x55c0cc050bc0) at block/stream.c:96
> >>
> >> But isn't this just deletion of node1, i.e. the intermediate node of the
> >> stream job?
> >>
> >> The commit job happens above that (node3 and upwards), so all its BDSs
> >> shouldn't be affected.  So even with the bdrv_ref()s introduced in this
> >> patch, I'd still expect node1 to be deleted exactly here.
> > 
> > Hm, I suppose that's true.
> > 
> > So it crashed because the drain in bdrv_reopen() didn't actually drain
> > the streaming job (which includes removing node1), so node1 ended up in
> > the ReopenQueue and when it disappeared in the middle of reopen, we got
> > a use after free.
> > 
> > Then it would be a bug between job draining and bdrv_reopen(), and the
> > active commit job would only be a victim of the bug without actually
> > doing anything wrong, so we can drop this patch.
> > 
> > Does this make sense?
> 
> The explanation makes sense, yes.
> 
> About dropping the patch...  We would have to be certain that you can
> only run a block job if none of the nodes involved can be deleted at any
> point.  Once the job has started, this is the job's responsibility,
> because it has referenced everything it needs (through
> block_job_add_bdrv()).  So the critical point is while the job is
> starting, before it has invoked that function.
> 
> The user cannot do any graph manipulation then because the monitor is
> blocked by starting the job.  So the only issue would be other
> operations that can make nodes go away without user intervention.
> Naturally the way to prevent conflicts would be with blockers, and
> currently our old job blockers should be sufficient to prevent them.
> But in theory we don't like them, so is this what we need the graph
> manipulation blocker for? :-)

I don't think so. Permissions are for users that are potentially
attached for a long time, not for protecting operations during which we
hold the BQL anyway.

Our problem here isn't that the graph is changing, but that nodes are
going away. The solution is to take references when we need them instead
of relying on someone else holding a reference for us when we don't know
who that someone is. Specifically, QMP commands might need to hold an
additional reference while working with a node.
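
A minimal sketch of that pattern, assuming only the existing bdrv_ref(),
bdrv_unref() and bdrv_lookup_bs() APIs; do_work_that_may_poll() is a
hypothetical stand-in for any call that can run callbacks:

    void qmp_example_command(const char *node_name, Error **errp)
    {
        BlockDriverState *bs = bdrv_lookup_bs(node_name, node_name, errp);
        if (!bs) {
            return;
        }

        /* Pin the node so that a callback dropping the last external
         * reference cannot delete it while we still use it. */
        bdrv_ref(bs);
        do_work_that_may_poll(bs, errp);
        bdrv_unref(bs);
    }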

An actual blocker for graph modification might make sense in the case of
a commit job, for example: If you break the backing chain while a commit
is running, the job becomes semantically invalid.

Kevin
Max Reitz Sept. 18, 2018, 11:48 p.m. UTC | #8
On 18.09.18 17:04, Kevin Wolf wrote:
> Am 18.09.2018 um 16:11 hat Max Reitz geschrieben:
>> On 17.09.18 13:37, Kevin Wolf wrote:
>>> Am 17.09.2018 um 00:05 hat Max Reitz geschrieben:
>>>> On 14.09.18 18:25, Kevin Wolf wrote:
>>>>> Am 13.09.2018 um 22:55 hat Max Reitz geschrieben:
>>>>>> On 13.09.18 14:52, Kevin Wolf wrote:
>>>>>>> When starting an active commit job, other callbacks can run before
>>>>>>> mirror_start_job() calls bdrv_ref() where needed and cause the nodes to
>>>>>>> go away. Add another pair of bdrv_ref/unref() around it to protect
>>>>>>> against this case.
>>>>>>>
>>>>>>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>>>>>>> ---
>>>>>>>  block/mirror.c | 11 +++++++++++
>>>>>>>  1 file changed, 11 insertions(+)
>>>>>>
>>>>>> Reviewed-by: Max Reitz <mreitz@redhat.com>
>>>>>>
>>>>>> But...  How?
>>>>>
>>>>> Tried to reproduce the other bug Paolo was concerned about (good there
>>>>> is something like 'git reflog'!) and dug up this one instead.
>>>>>
>>>>> So the scenario seems to be test_stream_commit_1 in qemu-iotests 030.
>>>>>
>>>>> The backtrace when the BDS is deleted is the following:
>>>>>
>>>>> (rr) bt
>>>>> #0  0x00007faeaf6145ec in __memset_sse2_unaligned_erms () at /lib64/libc.so.6
>>>>> #1  0x00007faeaf60414e in _int_free () at /lib64/libc.so.6
>>>>> #2  0x00007faeaf6093be in free () at /lib64/libc.so.6
>>>>> #3  0x00007faecaea6b4e in g_free () at /lib64/libglib-2.0.so.0
>>>>> #4  0x000055c0c94f9d6c in bdrv_delete (bs=0x55c0cbe69240) at block.c:3590
>>>>> #5  0x000055c0c94fbe82 in bdrv_unref (bs=0x55c0cbe69240) at block.c:4638
>>>>> #6  0x000055c0c94f6761 in bdrv_root_unref_child (child=0x55c0cbe57af0) at block.c:2188
>>>>> #7  0x000055c0c94fe239 in block_job_remove_all_bdrv (job=0x55c0ccf9e220) at blockjob.c:200
>>>>> #8  0x000055c0c94fdf1f in block_job_free (job=0x55c0ccf9e220) at blockjob.c:94
>>>>> #9  0x000055c0c94ffe0d in job_unref (job=0x55c0ccf9e220) at job.c:368
>>>>> #10 0x000055c0c95007cd in job_do_dismiss (job=0x55c0ccf9e220) at job.c:641
>>>>> #11 0x000055c0c95008d9 in job_conclude (job=0x55c0ccf9e220) at job.c:667
>>>>> #12 0x000055c0c9500b66 in job_finalize_single (job=0x55c0ccf9e220) at job.c:735
>>>>> #13 0x000055c0c94ff4fa in job_txn_apply (txn=0x55c0cbe19eb0, fn=0x55c0c9500a62 <job_finalize_single>, lock=true) at job.c:151
>>>>> #14 0x000055c0c9500e92 in job_do_finalize (job=0x55c0ccf9e220) at job.c:822
>>>>> #15 0x000055c0c9501040 in job_completed_txn_success (job=0x55c0ccf9e220) at job.c:872
>>>>> #16 0x000055c0c950115c in job_completed (job=0x55c0ccf9e220, ret=0, error=0x0) at job.c:892
>>>>> #17 0x000055c0c92b572c in stream_complete (job=0x55c0ccf9e220, opaque=0x55c0cc050bc0) at block/stream.c:96
>>>>
>>>> But isn't this just deletion of node1, i.e. the intermediate node of the
>>>> stream job?
>>>>
>>>> The commit job happens above that (node3 and upwards), so all its BDSs
>>>> shouldn't be affected.  So even with the bdrv_ref()s introduced in this
>>>> patch, I'd still expect node1 to be deleted exactly here.
>>>
>>> Hm, I suppose that's true.
>>>
>>> So it crashed because the drain in bdrv_reopen() didn't actually drain
>>> the streaming job (which includes removing node1), so node1 ended up in
>>> the ReopenQueue and when it disappeared in the middle of reopen, we got
>>> a use after free.
>>>
>>> Then it would be a bug between job draining and bdrv_reopen(), and the
>>> active commit job would only be a victim of the bug without actually
>>> doing anything wrong, so we can drop this patch.
>>>
>>> Does this make sense?
>>
>> The explanation makes sense, yes.
>>
>> About dropping the patch...  We would have to be certain that you can
>> only run a block job if none of the nodes involved can be deleted at any
>> point.  Once the job has started, this is the job's responsibility,
>> because it has referenced everything it needs (through
>> block_job_add_bdrv()).  So the critical point is while the job is
>> starting, before it has invoked that function.
>>
>> The user cannot do any graph manipulation then because the monitor is
>> blocked by starting the job.  So the only issue would be other
>> operations that can make nodes go away without user intervention.
>> Naturally the way to prevent conflicts would be with blockers, and
>> currently our old job blockers should be sufficient to prevent them.
>> But in theory we don't like them, so is this what we need the graph
>> manipulation blocker for? :-)
> 
> I don't think so. Permissions are for users that are potentially
> attached for a long time, not for protecting operations during which we
> hold the BQL anyway.
> 
> Our problem here isn't that the graph is changing, but that nodes are
> going away.

Hm, yeah.  Which they usually wouldn't unless the graph is changing, but
that's not a necessity.

Though the better explanation is: The node can only go away when it has
only one parent.  So even if that parent took the graph manipulation
blocker, there'd be nobody to care.

> The solution is to take references when we need them instead
> of relying on someone else holding a reference for us when we don't know
> who that someone is.

Yeah.  Which means that this patch isn't completely wrong; we should
just do it for every job.  And maybe take the references not in
block/mirror.c, but immediately in blockdev.c.
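
Roughly like this, as a sketch only; start_example_job() is a
hypothetical placeholder for commit_active_start() and the other job
entry points that blockdev.c calls:

    void qmp_start_example_job(BlockDriverState *bs,
                               BlockDriverState *base, Error **errp)
    {
        /* Pin both nodes across job creation; anything that polls
         * in between could otherwise delete them. */
        bdrv_ref(bs);
        bdrv_ref(base);

        start_example_job(bs, base, errp);

        /* The job now holds its own references (taken via
         * block_job_add_bdrv()), so the temporary ones can go. */
        bdrv_unref(bs);
        bdrv_unref(base);
    }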

> Specifically, QMP commands might need to hold an
> additional reference while working with a node.
> 
> An actual blocker for graph modification might make sense in the case of
> a commit job, for example: If you break the backing chain while a commit
> is running, the job becomes semantically invalid.

Well, at least doing the graph changes after the job is done will be
impossible, yes.

(The copy operation itself should still be OK as long as the visible
content of the top node doesn't change -- and at least in theory we only
allow changing the backing file to something that won't violate this
condition.  Otherwise, you'd probably have to un-share consistent_read.)

Max
Kevin Wolf Sept. 20, 2018, 11:52 a.m. UTC | #9
Am 19.09.2018 um 01:48 hat Max Reitz geschrieben:
> On 18.09.18 17:04, Kevin Wolf wrote:
> > Am 18.09.2018 um 16:11 hat Max Reitz geschrieben:
> >> The user cannot do any graph manipulation then because the monitor is
> >> blocked by starting the job.  So the only issue would be other
> >> operations that can make nodes go away without user intervention.
> >> Naturally the way to prevent conflicts would be with blockers, and
> >> currently our old job blockers should be sufficient to prevent them.
> >> But in theory we don't like them, so is this what we need the graph
> >> manipulation blocker for? :-)
> > 
> > I don't think so. Permissions are for users that are potentially
> > attached for a long time, not for protecting operations during which we
> > hold the BQL anyway.
> > 
> > Our problem here isn't that the graph is changing, but that nodes are
> > going away.
> 
> Hm, yeah.  Which they usually wouldn't unless the graph is changing,
> but that's not a necessity.

My point was more that you can modify the graph without nodes going
away (if they are still referenced somewhere else).

> Though the better explanation is: The node can only go away when it has
> only one parent.  So even if that parent took the graph manipulation
> blocker, there'd be nobody to care.
> 
> > The solution is to take references when we need them instead
> > of relying on someone else holding a reference for us when we don't know
> > who that someone is.
> 
> Yeah.  Which means that this patch isn't completely wrong; we should
> just do it for every job.  And maybe take the references not in
> block/mirror.c, but immediately in blockdev.c.

Yes. I'll drop this patch anyway, as it's unrelated and makes the fix in
the wrong place. As you say, blockdev.c would be the right place for a
fix, and it would be good to have a test case that actually reproduces
the problem.

> > Specifically, QMP commands might need to hold an
> > additional reference while working with a node.
> > 
> > An actual blocker for graph modification might make sense in the case of
> > a commit job, for example: If you break the backing chain while a commit
> > is running, the job becomes semantically invalid.
> 
> Well, at least doing the graph changes after the job is done will be
> impossible, yes.

Actually, it depends. Modifying the graph is okay as long as certain
conditions are met. In the case of commit, that is primarily that the
base node is still in the backing chain. If you want to add a throttling
filter in the backing chain while commit is running, that's not really
a problem (other than the filter going away when commit completes...)

The "I promise this is the same" case is another option, though rather
unlikely when the base image isn't read-only, but written to by the
commit job. Maybe if you open a raw image through two different links to
the same image or something...

> (The copy operation itself should still be OK as long as the visible
> content of the top node doesn't change -- and at least in theory we only
> allow changing the backing file to something that won't violate this
> condition.  Otherwise, you'd probably have to un-share consistent_read.)

If the old and the new backing file node aren't actually referring to the
same storage, the copy operation would seemingly succeed, but not make
any sense because you wouldn't know which part of the image was committed
to which backing file.

Kevin

Patch

diff --git a/block/mirror.c b/block/mirror.c
index 56d9ef7474..c8657991cf 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1697,7 +1697,14 @@  void commit_active_start(const char *job_id, BlockDriverState *bs,
 
     orig_base_flags = bdrv_get_flags(base);
 
+    /* bdrv_reopen() drains, which might make the BDSes go away before a
+     * reference is taken in mirror_start_job(). */
+    bdrv_ref(bs);
+    bdrv_ref(base);
+
     if (bdrv_reopen(base, bs->open_flags, errp)) {
+        bdrv_unref(bs);
+        bdrv_unref(base);
         return;
     }
 
@@ -1707,6 +1714,10 @@  void commit_active_start(const char *job_id, BlockDriverState *bs,
                      &commit_active_job_driver, false, base, auto_complete,
                      filter_node_name, false, MIRROR_COPY_MODE_BACKGROUND,
                      &local_err);
+
+    bdrv_unref(bs);
+    bdrv_unref(base);
+
     if (local_err) {
         error_propagate(errp, local_err);
         goto error_restore_flags;