[1.1] scsi: Add assertion for use-after-free errors

Message ID 1336066583-10503-1-git-send-email-sw@weilnetz.de
State Accepted

Commit Message

Stefan Weil May 3, 2012, 5:36 p.m. UTC
The QEMU emulation which is currently used with Raspberry Pi images
(qemu-system-arm -M versatilepb ...) accesses memory which was freed.

Valgrind output (extract):

==17857== Invalid write of size 4
==17857==    at 0x24EB06: scsi_req_unref (scsi-bus.c:1273)
==17857==    by 0x24FFAE: scsi_read_complete (scsi-disk.c:277)
==17857==    by 0x152ACC: bdrv_co_em_bh (block.c:3363)
==17857==    by 0x13D49C: qemu_bh_poll (async.c:71)
==17857==    by 0x211A8C: main_loop_wait (main-loop.c:503)
==17857==    by 0x207954: main_loop (vl.c:1555)
==17857==    by 0x20E9C9: main (vl.c:3653)
==17857==  Address 0x1c54383c is 12 bytes inside a block of size 260 free'd
==17857==    at 0x4824B3A: free (vg_replace_malloc.c:366)
==17857==    by 0x20ADFA: free_and_trace (vl.c:2250)
==17857==    by 0x4899FC5: g_free (in /lib/libglib-2.0.so.0.2400.1)
==17857==    by 0x24EB3B: scsi_req_unref (scsi-bus.c:1277)
==17857==    by 0x24F003: scsi_req_complete (scsi-bus.c:1383)
==17857==    by 0x25022A: scsi_read_data (scsi-disk.c:334)
==17857==    by 0x24EB9F: scsi_req_continue (scsi-bus.c:1289)
==17857==    by 0x1C7787: lsi_do_dma (lsi53c895a.c:575)
==17857==    by 0x1C8CDA: lsi_execute_script (lsi53c895a.c:1147)
==17857==    by 0x1C74EA: lsi_resume_script (lsi53c895a.c:510)
==17857==    by 0x1C7ECD: lsi_transfer_data (lsi53c895a.c:746)
==17857==    by 0x24EC90: scsi_req_data (scsi-bus.c:1307)

(There are some more similar messages.)

This patch adds an assertion which also detects those errors:

Calling scsi_req_unref is not allowed when the previous call
of that function has decremented refcount to 0, because in this
case req was freed.
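
To illustrate what the assertion checks, here is a minimal, self-contained
sketch. DemoReq and the demo_req_* helpers are hypothetical stand-ins, not
QEMU's SCSIRequest API, and the release step only sets a flag instead of
calling free() so that the behaviour stays well defined.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for SCSIRequest; only the refcount matters here. */
typedef struct DemoReq {
    int refcount;
    bool released;   /* set on the last unref instead of really freeing */
} DemoReq;

/* A new request starts with one reference, owned by its creator. */
static void demo_req_init(DemoReq *req)
{
    req->refcount = 1;
    req->released = false;
}

/* Take an additional reference. */
static DemoReq *demo_req_ref(DemoReq *req)
{
    req->refcount++;
    return req;
}

/* Drop a reference; the last one releases the request. */
static void demo_req_unref(DemoReq *req)
{
    /* Same check as in the patch: unref is only valid while at least
     * one reference is still held. */
    assert(req->refcount > 0);
    if (--req->refcount == 0) {
        req->released = true;   /* real code would free req here */
    }
}
```

With these helpers, one more demo_req_unref() after the count has reached 0
aborts in the assertion. In the real code the request has already been freed
at that point, so the check itself reads freed memory and can only be
expected to fire while that memory still holds the old count; a tool such as
Valgrind remains the more reliable detector.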

Signed-off-by: Stefan Weil <sw@weilnetz.de>
---

There are chances that this patch breaks some test scenarios,
but that is intentional: we should not pretend that there are
no errors when there are some.

The Raspberry Pi emulation with QEMU is currently used by
a lot of people.

Please apply this patch for the tests of QEMU 1.1.

Of course we should also fix the problem which triggers the
assertion. I still don't know whether it is caused by
lsi53c895a.c or by the scsi code.

Thanks,

Stefan Weil


 hw/scsi-bus.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

Comments

Stefan Weil May 3, 2012, 8:58 p.m. UTC | #1
On 03.05.2012 19:36, Stefan Weil wrote:
> The QEMU emulation which is currently used with Raspberry PI images
> (qemu-system-arm -M versatilepb ...) accesses memory which was freed.
>
> Valgrind output (extract):
>
> ==17857== Invalid write of size 4
> ==17857== at 0x24EB06: scsi_req_unref (scsi-bus.c:1273)
> ==17857== by 0x24FFAE: scsi_read_complete (scsi-disk.c:277)
> ==17857== by 0x152ACC: bdrv_co_em_bh (block.c:3363)
> ==17857== by 0x13D49C: qemu_bh_poll (async.c:71)
> ==17857== by 0x211A8C: main_loop_wait (main-loop.c:503)
> ==17857== by 0x207954: main_loop (vl.c:1555)
> ==17857== by 0x20E9C9: main (vl.c:3653)
> ==17857== Address 0x1c54383c is 12 bytes inside a block of size 260 free'd
> ==17857== at 0x4824B3A: free (vg_replace_malloc.c:366)
> ==17857== by 0x20ADFA: free_and_trace (vl.c:2250)
> ==17857== by 0x4899FC5: g_free (in /lib/libglib-2.0.so.0.2400.1)
> ==17857== by 0x24EB3B: scsi_req_unref (scsi-bus.c:1277)
> ==17857== by 0x24F003: scsi_req_complete (scsi-bus.c:1383)
> ==17857== by 0x25022A: scsi_read_data (scsi-disk.c:334)
> ==17857== by 0x24EB9F: scsi_req_continue (scsi-bus.c:1289)
> ==17857== by 0x1C7787: lsi_do_dma (lsi53c895a.c:575)
> ==17857== by 0x1C8CDA: lsi_execute_script (lsi53c895a.c:1147)
> ==17857== by 0x1C74EA: lsi_resume_script (lsi53c895a.c:510)
> ==17857== by 0x1C7ECD: lsi_transfer_data (lsi53c895a.c:746)
> ==17857== by 0x24EC90: scsi_req_data (scsi-bus.c:1307)


Hi Paolo,

this is the result of a bisect to narrow down the source of the problem:

ac6684264642f1aea7cba5c0c3907409b1f7f904 is the first bad commit
commit ac6684264642f1aea7cba5c0c3907409b1f7f904
Author: Paolo Bonzini <pbonzini@redhat.com>
Date:   Thu Apr 19 11:55:28 2012 +0200

     scsi: support FUA on reads

     To force unit access on reads, flush the cache *before* doing the read.

     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Regards,

Stefan


>
> (There are some more similar messages.)
>
> This patch adds an assertion which also detects those errors:
>
> Calling scsi_req_unref is not allowed when the previous call
> of that function has decremented refcount to 0, because in this
> case req was freed.
>
> Signed-off-by: Stefan Weil <sw@weilnetz.de>
> ---
>
> There are chances that this patch breaks some test scenarios,
> but that is intentional: we should not pretend that there are
> no errors when there are some.
>
> The Raspberry PI emulation with QEMU is currently used by
> a lot of people.
>
> Please apply this patch for the tests of QEMU 1.1.
>
> Of course we should also fix the problem which triggers the
> assertion. I still don't know whether it is caused by
> lsi53c895a.c or by the scsi code.

It is the scsi code; see the git bisect result.

Paolo Bonzini May 4, 2012, 6:52 a.m. UTC | #2
On 03/05/2012 22:58, Stefan Weil wrote:
> On 03.05.2012 19:36, Stefan Weil wrote:
>> The QEMU emulation which is currently used with Raspberry PI images
>> (qemu-system-arm -M versatilepb ...) accesses memory which was freed.
>>
>> Valgrind output (extract):
>>
>> ==17857== Invalid write of size 4
>> ==17857== at 0x24EB06: scsi_req_unref (scsi-bus.c:1273)
>> ==17857== by 0x24FFAE: scsi_read_complete (scsi-disk.c:277)
>> ==17857== by 0x152ACC: bdrv_co_em_bh (block.c:3363)
>> ==17857== by 0x13D49C: qemu_bh_poll (async.c:71)
>> ==17857== by 0x211A8C: main_loop_wait (main-loop.c:503)
>> ==17857== by 0x207954: main_loop (vl.c:1555)
>> ==17857== by 0x20E9C9: main (vl.c:3653)
>> ==17857== Address 0x1c54383c is 12 bytes inside a block of size 260 free'd
>> ==17857== at 0x4824B3A: free (vg_replace_malloc.c:366)
>> ==17857== by 0x20ADFA: free_and_trace (vl.c:2250)
>> ==17857== by 0x4899FC5: g_free (in /lib/libglib-2.0.so.0.2400.1)
>> ==17857== by 0x24EB3B: scsi_req_unref (scsi-bus.c:1277)
>> ==17857== by 0x24F003: scsi_req_complete (scsi-bus.c:1383)
>> ==17857== by 0x25022A: scsi_read_data (scsi-disk.c:334)
>> ==17857== by 0x24EB9F: scsi_req_continue (scsi-bus.c:1289)
>> ==17857== by 0x1C7787: lsi_do_dma (lsi53c895a.c:575)
>> ==17857== by 0x1C8CDA: lsi_execute_script (lsi53c895a.c:1147)
>> ==17857== by 0x1C74EA: lsi_resume_script (lsi53c895a.c:510)
>> ==17857== by 0x1C7ECD: lsi_transfer_data (lsi53c895a.c:746)
>> ==17857== by 0x24EC90: scsi_req_data (scsi-bus.c:1307)

Yes, this was reported by David Gibson too.  It is interesting that
virtio-scsi doesn't show it; the sglist support probably hides it.
I have queued the fix and will send the pull request in a matter of
minutes.  The patch is a good addition, so I queued it too.  Thanks.

Paolo

Patch

diff --git a/hw/scsi-bus.c b/hw/scsi-bus.c
index dbdb99c..62779c7 100644
--- a/hw/scsi-bus.c
+++ b/hw/scsi-bus.c
@@ -1270,6 +1270,7 @@  SCSIRequest *scsi_req_ref(SCSIRequest *req)
 
 void scsi_req_unref(SCSIRequest *req)
 {
+    assert(req->refcount > 0);
     if (--req->refcount == 0) {
         if (req->ops->free_req) {
             req->ops->free_req(req);