Patchwork Make default invocation of block drivers safer (v3)

Submitter: Anthony Liguori
Date: July 15, 2010, 12:50 p.m.
Message ID: <1279198257-23681-1-git-send-email-aliguori@us.ibm.com>
Permalink: /patch/58980/
State: New

Comments

Anthony Liguori - July 15, 2010, 12:50 p.m.
CVE-2008-2004 described a vulnerability in QEMU whereby a malicious guest user
could trick the block probing code into accessing arbitrary files on the host.
To mitigate this, we added an explicit format parameter to -drive which disables
block probing.

Fast forward to today, and the vast majority of users do not use this parameter.
libvirt does not use this by default nor does virt-manager.

Most users want block probing so we should try to make it safer.

This patch adds some logic to the raw device which attempts to detect a write
operation to the beginning of a raw device.  If the first 4 bytes happen to
match the signature of an image format that we support and that can have a
backing file, it scrubs the signature to all zeros.  If a user specifies an
explicit format parameter, this behavior is disabled.

I contend that while a legitimate guest could write such a signature to the
header, we would behave incorrectly anyway upon the next invocation of QEMU.
This simply changes the incorrect behavior to not involve a security
vulnerability.

I've tested this pretty extensively both in the positive and negative case.  I'm
not 100% confident in the block layer's ability to deal with zero sized writes
particularly with respect to the aio functions so some additional eyes would be
appreciated.

Even in the case of a single sector write, we have to make sure to invoke the
completion from a bottom half so just removing the zero sized write is not an
option.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
---
v2 -> v3
 - add an assert to ensure the first iovec element is at least 512 bytes
v1 -> v2
 - be more paranoid about empty iovecs
---
 block.c     |    4 ++
 block/raw.c |  130 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 block_int.h |    1 +
 3 files changed, 135 insertions(+), 0 deletions(-)
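
The scrubbing rule described in the commit message can be sketched outside of
QEMU as follows.  This is only an illustration under stated assumptions: the
four signatures are copied from the patch, but looks_like_image_header() and
scrub_sector0() are hypothetical names, and the "probed" argument stands in
for the bs->probed flag that the patch adds.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512

/* Magic numbers copied from the signature table in the patch. */
static const uint8_t signatures[][4] = {
    { 'Q', 'F', 'I', 0xfb }, /* qcow/qcow2 */
    { 'C', 'O', 'W', 'D' },  /* VMDK3 */
    { 'V', 'M', 'D', 'K' },  /* VMDK4 */
    { 'O', 'O', 'O', 'M' },  /* UML COW */
};

/* Hypothetical helper: return 1 if the first four bytes of buf match one of
   the signatures above, 0 otherwise. */
int looks_like_image_header(const uint8_t *buf)
{
    size_t i;

    for (i = 0; i < sizeof(signatures) / sizeof(signatures[0]); i++) {
        if (memcmp(buf, signatures[i], 4) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Hypothetical helper: copy one sector from in to out.  If the format was
   probed, the write targets sector 0, and the data starts with a known
   signature, zero the first four bytes of the copy.  Returns 1 if the
   signature was scrubbed, 0 if the sector was copied unchanged. */
int scrub_sector0(int probed, int64_t sector_num,
                  const uint8_t *in, uint8_t *out)
{
    assert(in != NULL && out != NULL);

    memcpy(out, in, SECTOR_SIZE);
    if (probed && sector_num == 0 && looks_like_image_header(in)) {
        memset(out, 0, 4);
        return 1;
    }
    return 0;
}
```

Note that the sketch takes a contiguous 512-byte buffer as input, which is the
same assumption the v3 assert encodes for the first iovec element.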
Kevin Wolf - July 15, 2010, 1:01 p.m.
Acked-by: Kevin Wolf <kwolf@redhat.com>
Anthony PERARD - July 27, 2010, 5:01 p.m.
Anthony Liguori wrote:
>  static BlockDriverAIOCB *raw_aio_writev(BlockDriverState *bs,
>      int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
>      BlockDriverCompletionFunc *cb, void *opaque)
>  {
> +    const uint8_t *first_buf;
> +    int first_buf_index = 0, i;
> +
> +    /* This is probably being paranoid, but handle cases of zero size
> +       vectors. */
> +    for (i = 0; i < qiov->niov; i++) {
> +        if (qiov->iov[i].iov_len) {
> +            assert(qiov->iov[i].iov_len >= 512);
> +            first_buf_index = i;
> +            break;
> +        }
> +    }

Hi,

I tried to install Windows XP SP2 with qemu fd2f659, and the assertion
failed when Windows began to format the disk.

The command line and the error message:
$ i386-softmmu/qemu -hda vm.img -cdrom winxpsp2.iso -boot dc
qemu: qemu/block/raw.c:130: raw_aio_writev: Assertion `qiov->iov[i].iov_len >= 512' failed.

And here, a little more information about the iov:
(gdb) p *qiov
$2 = {iov = 0x9106010, niov = 2, nalloc = 2, size = 512}
(gdb) p qiov->iov[0]
$3 = {iov_base = 0xaff3ce90, iov_len = 368}
(gdb) p qiov->iov[1]
$4 = {iov_base = 0xaff3f000, iov_len = 144}

Without the assert, the install works fine.

Regards,
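
The gdb output above shows one 512-byte sector split as 368 + 144 bytes, so
code that reads 512 bytes through the first iov_base would run past that
element.  One way to avoid depending on the element size, sketched here
under assumptions, is to gather the leading bytes of the scatter-gather list
into a bounce buffer first.  iov_gather_prefix() is a hypothetical name and
is not part of the patch or of QEMU; plain POSIX struct iovec stands in for
QEMUIOVector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical helper: copy up to len bytes from the start of a
   scatter-gather list into dst, skipping zero-length elements.
   Returns the number of bytes actually copied. */
size_t iov_gather_prefix(const struct iovec *iov, int niov,
                         uint8_t *dst, size_t len)
{
    size_t done = 0;
    int i;

    assert(iov != NULL && dst != NULL);

    for (i = 0; i < niov && done < len; i++) {
        size_t n = iov[i].iov_len;

        if (n == 0) {
            continue; /* empty element, nothing to copy */
        }
        if (n > len - done) {
            n = len - done; /* only take what is still needed */
        }
        memcpy(dst + done, iov[i].iov_base, n);
        done += n;
    }
    return done;
}
```

In the reported request sector_num is 63 (see the backtrace later in the
thread), so the signature check itself would not apply to it; only the
unconditional assert is hit.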
Anthony Liguori - July 27, 2010, 5:16 p.m.
On 07/27/2010 12:01 PM, Anthony PERARD wrote:
> And here, a little more information about the iov:
> (gdb) p *qiov
> $2 = {iov = 0x9106010, niov = 2, nalloc = 2, size = 512}
> (gdb) p qiov->iov[0]
> $3 = {iov_base = 0xaff3ce90, iov_len = 368}
> (gdb) p qiov->iov[1]
> $4 = {iov_base = 0xaff3f000, iov_len = 144}

How can a single sector request be split between two iovs in QEMU?  Are 
you carrying any patches in the version of QEMU that you're testing?  Is 
this qemu-dm?

To be clear, this is a discontiguous request. I'm looking at the code 
now in core.c and I don't see how an IDE disk can generate a request 
that looks like this.

Can you provide a full stack trace?

Regards,

Anthony Liguori
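
One possible reading of the two iov entries, assuming 4 KiB pages (the page
size is an assumption, not something the report states): the first fragment
ends exactly on a page boundary and the second begins at the start of a
different page, which would be consistent with a sector-sized buffer that
straddles two pages that are not adjacent in the mapping.  The arithmetic
can be checked with the small sketch below; the helper names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define ASSUMED_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/* Return 1 if the fragment [base, base + len) ends exactly on a page
   boundary, 0 otherwise. */
int fragment_ends_on_page(uint32_t base, uint32_t len)
{
    return ((base + len) % ASSUMED_PAGE_SIZE) == 0;
}

/* Return 1 if base is page aligned, 0 otherwise. */
int fragment_starts_on_page(uint32_t base)
{
    return (base % ASSUMED_PAGE_SIZE) == 0;
}
```

With the values from the report, 0xaff3ce90 + 368 = 0xaff3d000, while the
second fragment starts at 0xaff3f000, and 368 + 144 = 512.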


Anthony PERARD - July 27, 2010, 5:43 p.m.
Anthony Liguori wrote:
> How can a single sector request be split between two iovs in QEMU?  Are 
> you carrying any patches in the version of QEMU that you're testing?  Is 
> this qemu-dm?

No, I am not carrying any patches for this test, and it is not qemu-dm.

> To be clear, this is a discontiguous request. I'm looking at the core 
> now in core.c and I don't see how an IDE disk can generate a request 
> that looks like this.
> 
> Can you provide a full stack trace?

#0  0xb77dd424 in __kernel_vsyscall ()
#1  0xb7418640 in raise () from /lib/i686/cmov/libc.so.6
#2  0xb741a018 in abort () from /lib/i686/cmov/libc.so.6
#3  0xb74115be in __assert_fail () from /lib/i686/cmov/libc.so.6
#4  0x08074d30 in raw_aio_writev (bs=0xa5bcec0, sector_num=63, qiov=0xa67cf14, nb_sectors=1, cb=0x81ae8c0 <dma_bdrv_cb>,
     opaque=0xa67cee0) at /tmp/qemu-merge/block/raw.c:130
#5  0x0806d024 in bdrv_aio_writev (bs=0xa5bcec0, sector_num=63, qiov=0xa67cf14, nb_sectors=1, cb=0x81ae8c0 <dma_bdrv_cb>,
     opaque=0xa67cee0) at /tmp/qemu-merge/block.c:2004
#6  0x081aea78 in dma_bdrv_cb (opaque=0xa67cee0, ret=0) at /tmp/qemu-merge/dma-helpers.c:120
#7  0x081aebc9 in dma_bdrv_io (bs=0xa5bcec0, sg=0xa61bd48, sector_num=63, cb=0x81a9380 <ide_write_dma_cb>, opaque=0xa61c684,
     is_write=1) at /tmp/qemu-merge/dma-helpers.c:163
#8  0x081a9484 in ide_write_dma_cb (opaque=0xa61c684, ret=0) at /tmp/qemu-merge/hw/ide/core.c:748
#9  0x081a9eba in bmdma_cmd_writeb (opaque=0xa61c684, addr=49152, val=1) at /tmp/qemu-merge/hw/ide/pci.c:51
#10 0x080a6b7b in cpu_outb (addr=6, val=<value optimized out>) at /tmp/qemu-merge/ioport.c:80
#11 0xb5c95609 in ?? ()
#12 0x0000c000 in ?? ()
#13 0x00000001 in ?? ()
#14 0xff0a0000 in ?? ()
#15 0xbfa41448 in ?? ()
#16 0x00000000 in ?? ()

Anthony Liguori - July 27, 2010, 6:25 p.m.
Thanks.  I see the problem.  Working on a patch now.

Regards,

Anthony Liguori

Kevin Wolf - Sept. 3, 2010, 8:42 a.m.
On 27.07.2010 20:25, Anthony Liguori wrote:
> Thanks.  I see the problem.  Working on a patch now.

Anthony, what happened with this one? I can't see a patch applied for
this and I just saw a similar report on the fedora-virt mailing list (no
backtrace yet, but the same assertion triggering).

Kevin

Patch

diff --git a/block.c b/block.c
index 65cf4dc..f837876 100644
--- a/block.c
+++ b/block.c
@@ -511,6 +511,7 @@  int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
               BlockDriver *drv)
 {
     int ret;
+    int probed = 0;
 
     if (flags & BDRV_O_SNAPSHOT) {
         BlockDriverState *bs1;
@@ -571,6 +572,7 @@  int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
     /* Find the right image format driver */
     if (!drv) {
         drv = find_image_format(filename);
+        probed = 1;
     }
 
     if (!drv) {
@@ -584,6 +586,8 @@  int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
         goto unlink_and_fail;
     }
 
+    bs->probed = probed;
+
     /* If there is a backing file, use it */
     if ((flags & BDRV_O_NO_BACKING) == 0 && bs->backing_file[0] != '\0') {
         char backing_filename[PATH_MAX];
diff --git a/block/raw.c b/block/raw.c
index 4406b8c..1414e77 100644
--- a/block/raw.c
+++ b/block/raw.c
@@ -9,15 +9,82 @@  static int raw_open(BlockDriverState *bs, int flags)
     return 0;
 }
 
+/* check for the user attempting to write something that looks like a
+   block format header to the beginning of the image and fail out.
+*/
+static int check_for_block_signature(BlockDriverState *bs, const uint8_t *buf)
+{
+    static const uint8_t signatures[][4] = {
+        { 'Q', 'F', 'I', 0xfb }, /* qcow/qcow2 */
+        { 'C', 'O', 'W', 'D' }, /* VMDK3 */
+        { 'V', 'M', 'D', 'K' }, /* VMDK4 */
+        { 'O', 'O', 'O', 'M' }, /* UML COW */
+        {}
+    };
+    int i;
+
+    for (i = 0; signatures[i][0] != 0; i++) {
+        if (memcmp(buf, signatures[i], 4) == 0) {
+            return 1;
+        }
+    }
+
+    return 0;
+}
+
+static int check_write_unsafe(BlockDriverState *bs, int64_t sector_num,
+                              const uint8_t *buf, int nb_sectors)
+{
+    /* assume that if the user specifies the format explicitly, then assume
+       that they will continue to do so and provide no safety net */
+    if (!bs->probed) {
+        return 0;
+    }
+
+    if (sector_num == 0 && nb_sectors > 0) {
+        return check_for_block_signature(bs, buf);
+    }
+
+    return 0;
+}
+
 static int raw_read(BlockDriverState *bs, int64_t sector_num,
                     uint8_t *buf, int nb_sectors)
 {
     return bdrv_read(bs->file, sector_num, buf, nb_sectors);
 }
 
+static int raw_write_scrubbed_bootsect(BlockDriverState *bs,
+                                       const uint8_t *buf)
+{
+    uint8_t bootsect[512];
+
+    /* scrub the dangerous signature */
+    memcpy(bootsect, buf, 512);
+    memset(bootsect, 0, 4);
+
+    return bdrv_write(bs->file, 0, bootsect, 1);
+}
+
 static int raw_write(BlockDriverState *bs, int64_t sector_num,
                      const uint8_t *buf, int nb_sectors)
 {
+    if (check_write_unsafe(bs, sector_num, buf, nb_sectors)) {
+        int ret;
+
+        ret = raw_write_scrubbed_bootsect(bs, buf);
+        if (ret < 0) {
+            return ret;
+        }
+
+        ret = bdrv_write(bs->file, 1, buf + 512, nb_sectors - 1);
+        if (ret < 0) {
+            return ret;
+        }
+
+        return ret + 512;
+    }
+
     return bdrv_write(bs->file, sector_num, buf, nb_sectors);
 }
 
@@ -28,10 +95,73 @@  static BlockDriverAIOCB *raw_aio_readv(BlockDriverState *bs,
     return bdrv_aio_readv(bs->file, sector_num, qiov, nb_sectors, cb, opaque);
 }
 
+typedef struct RawScrubberBounce
+{
+    BlockDriverCompletionFunc *cb;
+    void *opaque;
+    QEMUIOVector qiov;
+} RawScrubberBounce;
+
+static void raw_aio_writev_scrubbed(void *opaque, int ret)
+{
+    RawScrubberBounce *b = opaque;
+
+    if (ret < 0) {
+        b->cb(b->opaque, ret);
+    } else {
+        b->cb(b->opaque, ret + 512);
+    }
+
+    qemu_iovec_destroy(&b->qiov);
+    qemu_free(b);
+}
+
 static BlockDriverAIOCB *raw_aio_writev(BlockDriverState *bs,
     int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
     BlockDriverCompletionFunc *cb, void *opaque)
 {
+    const uint8_t *first_buf;
+    int first_buf_index = 0, i;
+
+    /* This is probably being paranoid, but handle cases of zero size
+       vectors. */
+    for (i = 0; i < qiov->niov; i++) {
+        if (qiov->iov[i].iov_len) {
+            assert(qiov->iov[i].iov_len >= 512);
+            first_buf_index = i;
+            break;
+        }
+    }
+
+    first_buf = qiov->iov[first_buf_index].iov_base;
+
+    if (check_write_unsafe(bs, sector_num, first_buf, nb_sectors)) {
+        RawScrubberBounce *b;
+        int ret;
+
+        /* write the first sector using sync I/O */
+        ret = raw_write_scrubbed_bootsect(bs, first_buf);
+        if (ret < 0) {
+            return NULL;
+        }
+
+        /* adjust request to be everything but first sector */
+
+        b = qemu_malloc(sizeof(*b));
+        b->cb = cb;
+        b->opaque = opaque;
+
+        qemu_iovec_init(&b->qiov, qiov->nalloc);
+        qemu_iovec_concat(&b->qiov, qiov, qiov->size);
+
+        b->qiov.size -= 512;
+        b->qiov.iov[first_buf_index].iov_base += 512;
+        b->qiov.iov[first_buf_index].iov_len -= 512;
+
+        return bdrv_aio_writev(bs->file, sector_num + 1, &b->qiov,
+                               nb_sectors - 1, raw_aio_writev_scrubbed, b);
+    }
+
     return bdrv_aio_writev(bs->file, sector_num, qiov, nb_sectors, cb, opaque);
 }
 
diff --git a/block_int.h b/block_int.h
index 877e1e5..96ff4cf 100644
--- a/block_int.h
+++ b/block_int.h
@@ -144,6 +144,7 @@  struct BlockDriverState {
     int encrypted; /* if true, the media is encrypted */
     int valid_key; /* if true, a valid encryption key has been set */
     int sg;        /* if true, the device is a /dev/sg* */
+    int probed;    /* if true, format was probed automatically */
     /* event callback when inserting/removing */
     void (*change_cb)(void *opaque);
     void *change_opaque;