diff mbox

[v2,0/7] dma-helpers, scsi-block: use SG_IO for all I/O on scsi-block

Message ID 57435BB8.3000600@ilande.co.uk
State New
Headers show

Commit Message

Mark Cave-Ayland May 23, 2016, 7:36 p.m. UTC
On 23/05/16 13:54, Paolo Bonzini wrote:

> scsi-block uses the block layer for reads and writes in order to avoid
> allocating bounce buffers as big as the transferred data.  We know how
> to split a large transfer to multiple reads and writes, and thus we can
> use scsi-disk.c's existing code to do I/O in multiple chunks (for non-s/g
> SCSI hosts) or through the DMA helpers (for s/g SCSI hosts).
> 
> Unfortunately, this has the side effect of eating the SCSI status except
> in the very few cases where we can convert an errno code back to a SCSI
> status.  It puts a big wrench in persistent reservations support in the
> guest, for example.
> 
> Luckily, splitting a large transfer into multiple SBC commands is just as
> easy, and this is what the last patch does.  It takes the original CDB,
> patches in a modified starting sector and sector count, and executes the
> SCSI command through blk_aio_ioctl.  It is also easy to pass a QEMUIOVector
> to SG_IO, so that s/g SCSI hosts keep the performance.
> 
> This rebases the patches on top of Eric's changes for byte-based
> BlockBackend access and fixes a few bugs I knew about in the RFC.
> 
> Patches 1, 5 and 6 are new.
> 
> Paolo
> 
> Paolo Bonzini (7):
>   dma-helpers: change interface to byte-based
>   dma-helpers: change BlockBackend to opaque value in DMAIOFunc
>   scsi-disk: introduce a common base class
>   scsi-disk: introduce dma_readv and dma_writev
>   scsi-disk: add need_fua_emulation to SCSIDiskClass
>   scsi-disk: introduce scsi_disk_req_check_error
>   scsi-block: always use SG_IO
> 
>  dma-helpers.c        |  54 +++++--
>  hw/block/nvme.c      |   6 +-
>  hw/ide/ahci.c        |   6 +-
>  hw/ide/core.c        |  20 ++-
>  hw/ide/internal.h    |   6 +-
>  hw/ide/macio.c       |   2 +-
>  hw/scsi/scsi-disk.c  | 409 +++++++++++++++++++++++++++++++++++++--------------
>  include/sysemu/dma.h |  20 +--
>  trace-events         |   2 +-
>  9 files changed, 371 insertions(+), 154 deletions(-)

Hi Paolo,

I thought I'd give this patchset a spin with a view to seeing whether I
could switch the macio device back to the now byte-based dma-helpers,
but came up with a couple of compile errors. Attached is the minor diff
I applied in order to get a successful compile (apologies for not being
inline, however I couldn't get my mail client to stop wrapping incorrectly).


ATB,

Mark.

Comments

Mark Cave-Ayland May 24, 2016, 10:59 p.m. UTC | #1
On 23/05/16 20:36, Mark Cave-Ayland wrote:

> On 23/05/16 13:54, Paolo Bonzini wrote:
> 
>> scsi-block uses the block layer for reads and writes in order to avoid
>> allocating bounce buffers as big as the transferred data.  We know how
>> to split a large transfer to multiple reads and writes, and thus we can
>> use scsi-disk.c's existing code to do I/O in multiple chunks (for non-s/g
>> SCSI hosts) or through the DMA helpers (for s/g SCSI hosts).
>>
>> Unfortunately, this has the side effect of eating the SCSI status except
>> in the very few cases where we can convert an errno code back to a SCSI
>> status.  It puts a big wrench in persistent reservations support in the
>> guest, for example.
>>
>> Luckily, splitting a large transfer into multiple SBC commands is just as
>> easy, and this is what the last patch does.  It takes the original CDB,
>> patches in a modified starting sector and sector count, and executes the
>> SCSI command through blk_aio_ioctl.  It is also easy to pass a QEMUIOVector
>> to SG_IO, so that s/g SCSI hosts keep the performance.
>>
>> This rebases the patches on top of Eric's changes for byte-based
>> BlockBackend access and fixes a few bugs I knew about in the RFC.
>>
>> Patches 1, 5 and 6 are new.
>>
>> Paolo
>>
>> Paolo Bonzini (7):
>>   dma-helpers: change interface to byte-based
>>   dma-helpers: change BlockBackend to opaque value in DMAIOFunc
>>   scsi-disk: introduce a common base class
>>   scsi-disk: introduce dma_readv and dma_writev
>>   scsi-disk: add need_fua_emulation to SCSIDiskClass
>>   scsi-disk: introduce scsi_disk_req_check_error
>>   scsi-block: always use SG_IO
>>
>>  dma-helpers.c        |  54 +++++--
>>  hw/block/nvme.c      |   6 +-
>>  hw/ide/ahci.c        |   6 +-
>>  hw/ide/core.c        |  20 ++-
>>  hw/ide/internal.h    |   6 +-
>>  hw/ide/macio.c       |   2 +-
>>  hw/scsi/scsi-disk.c  | 409 +++++++++++++++++++++++++++++++++++++--------------
>>  include/sysemu/dma.h |  20 +--
>>  trace-events         |   2 +-
>>  9 files changed, 371 insertions(+), 154 deletions(-)
> 
> Hi Paolo,
> 
> I thought I'd give this patchset a spin with a view to seeing whether I
> could switch the macio device back to the now byte-based dma-helpers,
> but came up with a couple of compile errors. Attached is the minor diff
> I applied in order to get a successful compile (apologies for not being
> inline, however I couldn't get my mail client to stop wrapping incorrectly).

After some head-scratching, I've finally managed to install and boot
Darwin PPC using the new byte-based DMA helpers facilitated by this
patch in the macio IDE driver. Just switching over to the new helpers
provided by this patch seemingly allowed the installation to proceed,
but the resulting image was corrupt and refused to boot.

I eventually traced the corruption down to this section of code in
dma_blk_cb() which was incorrectly truncating the unaligned iovecs:

if (dbs->iov.size & ~BDRV_SECTOR_MASK) {
    qemu_iovec_discard_back(&dbs->iov, dbs->iov.size &
        ~BDRV_SECTOR_MASK);
}

This was introduced in
http://lists.gnu.org/archive/html/qemu-devel/2014-07/msg02236.html to
handle non-sector aligned SG lists, although given that this is a
restriction of one particular implementation (PRDT) I'm not sure whether
a plain revert is the correct thing to do or whether an alternative
solution needs to be found.


ATB,

Mark.
Paolo Bonzini May 25, 2016, 8:45 a.m. UTC | #2
On 25/05/2016 00:59, Mark Cave-Ayland wrote:
> I eventually traced the corruption down to this section of code in
> dma_blk_cb() which was incorrectly truncating the unaligned iovecs:
> 
> if (dbs->iov.size & ~BDRV_SECTOR_MASK) {
>     qemu_iovec_discard_back(&dbs->iov, dbs->iov.size &
>         ~BDRV_SECTOR_MASK);
> }
> 
> This was introduced in
> http://lists.gnu.org/archive/html/qemu-devel/2014-07/msg02236.html to
> handle non-sector aligned SG lists, although given that this is a
> restriction of one particular implementation (PRDT) I'm not sure whether
> a plain revert is the correct thing to do or whether an alternative
> solution needs to be found.

Yeah, I have a plan for this bit.  It's related to this code I'm adding
in patch 7:

+    /* This is not supported yet.  It can only happen if the guest does
+     * reads and writes that are not aligned to one logical sectors
+     * _and_ cover multiple MemoryRegions.
+     */
+    assert(offset % s->qdev.blocksize == 0);
+    assert(iov->size % s->qdev.blocksize == 0);

The idea behind the "if" is that the I/O code cannot deal with a number
of bytes that is not a multiple of the logical sector size.  These
assertions could be removed by generalizing the "if" to an arbitrary
mask, in this case s->qdev.blocksize - 1.

There are two things that are wrong however in the logic.  First, the
"if" must be moved before the "if (dbs->iov.size & ~BDRV_SECTOR_MASK)",
because the call to qemu_iovec_discard_back can result in a zero-byte
QEMUIOVector.  Second, the sg_cur_* variables must be rewound too.

Thanks,

Paolo
Mark Cave-Ayland May 25, 2016, 9:13 a.m. UTC | #3
On 25/05/16 09:45, Paolo Bonzini wrote:

> On 25/05/2016 00:59, Mark Cave-Ayland wrote:
>> I eventually traced the corruption down to this section of code in
>> dma_blk_cb() which was incorrectly truncating the unaligned iovecs:
>>
>> if (dbs->iov.size & ~BDRV_SECTOR_MASK) {
>>     qemu_iovec_discard_back(&dbs->iov, dbs->iov.size &
>>         ~BDRV_SECTOR_MASK);
>> }
>>
>> This was introduced in
>> http://lists.gnu.org/archive/html/qemu-devel/2014-07/msg02236.html to
>> handle non-sector aligned SG lists, although given that this is a
>> restriction of one particular implementation (PRDT) I'm not sure whether
>> a plain revert is the correct thing to do or whether an alternative
>> solution needs to be found.
> 
> Yeah, I have a plan for this bit.  It's related to this code I'm adding
> in patch 7:
> 
> +    /* This is not supported yet.  It can only happen if the guest does
> +     * reads and writes that are not aligned to one logical sectors
> +     * _and_ cover multiple MemoryRegions.
> +     */
> +    assert(offset % s->qdev.blocksize == 0);
> +    assert(iov->size % s->qdev.blocksize == 0);
> 
> The idea behind the "if" is that the I/O code cannot deal with a number
> of bytes that is not a multiple of the logical sector size.  These
> assertions could be removed by generalizing the "if" to an arbitrary
> mask, in this case s->qdev.blocksize - 1.
> 
> There are two things that are wrong however in the logic.  First, the
> "if" must be moved before the "if (dbs->iov.size & ~BDRV_SECTOR_MASK)",
> because the call to qemu_iovec_discard_back can result in a zero-byte
> QEMUIOVector.  Second, the sg_cur_* variables must be rewound too.

Great - as long as you're aware of this I'll wait for the next version
of the patchset to come out, and then at least I can re-run the macio
installation test to ensure that there are no further corruption issues
with unaligned DMA accesses.

I'd like to add my thanks to everyone who has worked hard on allowing
byte-based accesses in the I/O layer since on my local test case it
makes a significant improvement to the macio emulation. With my basic
macio conversion applied on top of this patchset, the installation time
for a complete Darwin PPC install drops from 25 mins to about 9 mins on
my laptop which is a clear 2.5x speed increase over the existing code.


ATB,

Mark.
Paolo Bonzini May 25, 2016, 10:11 a.m. UTC | #4
On 25/05/2016 11:13, Mark Cave-Ayland wrote:
> Great - as long as you're aware of this I'll wait for the next version
> of the patchset to come out, and then at least I can re-run the macio
> installation test to ensure that there are no further corruption issues
> with unaligned DMA accesses.

If I get an ack for patches 1 and 2, I will just put these patches in a
pull request (obviously with your fixes---I had forgotten to redo the
git format-patch step).

Thanks,

Paolo
diff mbox

Patch

diff --git a/dma-helpers.c b/dma-helpers.c
index 92c5d55..b521d84 100644
--- a/dma-helpers.c
+++ b/dma-helpers.c
@@ -192,7 +192,7 @@  static const AIOCBInfo dma_aiocb_info = {
 };
 
 BlockAIOCB *dma_blk_io(AioContext *ctx,
-    QEMUSGList *sg, uint64_t opaque,
+    QEMUSGList *sg, uint64_t offset,
     DMAIOFunc *io_func, void *io_func_opaque,
     BlockCompletionFunc *cb,
     void *opaque, DMADirection dir)
diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 02faa3a..f7a23fc 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -540,7 +540,6 @@  static void scsi_write_data(SCSIRequest *req)
     if (r->req.sg) {
         dma_acct_start(s->qdev.conf.blk, &r->acct, r->req.sg, BLOCK_ACCT_WRITE);
         r->req.resid -= r->req.sg->size;
-                                     scsi_dma_complete, r);
         r->req.aiocb = dma_blk_io(blk_get_aio_context(s->qdev.conf.blk),
                                   r->req.sg, r->sector << BDRV_SECTOR_BITS,
                                   sdc->dma_writev, r, scsi_dma_complete, r,