Message ID | 20100727173050.GK16655@random.random |
---|---|
State | New |
On 07/27/2010 12:30 PM, Andrea Arcangeli wrote:
> Subject: avoid canceling ide dma
>
> From: Andrea Arcangeli <aarcange@redhat.com>
>
> The reason for not actually canceling the I/O is because with
> virtualization and lots of VM running, a guest fs may mistake a
> overload of the host, as an IDE timeout. So rather than canceling the
> I/O, it's safer to wait I/O completion and simulate that the I/O has
> completed just before the io cancellation was requested by the
> guest. This way if ntfs or an app writes data without checking for
> -EIO retval, and it thinks the write has succeeded, it's less likely
> to run into troubles. Similar issues for reads.
>
> Furthermore because the DMA operation is splitted into many synchronous
> aio_read/write if there's more than one entry in the SG table, without this
> patch the DMA would be cancelled in the middle, something we've no idea if it
> happens on real hardware too or not. Overall this seems a great risk for zero
> gain.
>
> This approach is sure safer than previous code given we can't pretend all guest
> fs code out there to check for errors and reply the DMA if it was completed
> partially, given a timeout would never materialize on a real harddisk unless
> there are defective blocks (and defective blocks are practically only an issue
> for reads never for writes in any recent hardware as writing to blocks is the
> way to fix them) or the harddisk breaks as a whole.
>
> Signed-off-by: Izik Eidus <ieidus@redhat.com>
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> ---
>
> diff --git a/hw/ide/pci.c b/hw/ide/pci.c
> index 780fc5f..9f6d42a 100644
> --- a/hw/ide/pci.c
> +++ b/hw/ide/pci.c
> @@ -40,8 +40,25 @@ void bmdma_cmd_writeb(void *opaque, uint32_t addr, uint32_t val)
>      printf("%s: 0x%08x\n", __func__, val);
>  #endif
>      if (!(val & BM_CMD_START)) {
> -        /* XXX: do it better */
> -        ide_dma_cancel(bm);
> +        /*
> +         * We can't cancel Scatter Gather DMA in the middle of the
> +         * operation or a partial (not full) DMA transfer would reach
> +         * the storage so we wait for completion instead (we beahve
> +         * like if the DMA was completed by the time the guest trying
> +         * to cancel dma with bmdma_cmd_writeb with BM_CMD_START not
> +         * set).
> +         *
> +         * In the future we'll be able to safely cancel the I/O if the
> +         * whole DMA operation will be submitted to disk with a single
> +         * aio operation with preadv/pwritev.
> +         */
> +        if (bm->aiocb) {
> +            qemu_aio_flush();
> +            if (bm->aiocb)
> +                printf("ide_dma_cancel: aiocb still pending");
> +            if (bm->status & BM_STATUS_DMAING)
> +                printf("ide_dma_cancel: BM_STATUS_DMAING still pending");

printf()s?

Regards,

Anthony Liguori

> +        }
>          bm->cmd = val & 0x09;
>      } else {
>          if (!(bm->status & BM_STATUS_DMAING)) {
On Tue, 27 Jul 2010, Andrea Arcangeli wrote:
> Subject: avoid canceling ide dma
>
> From: Andrea Arcangeli <aarcange@redhat.com>
>
> The reason for not actually canceling the I/O is because with
> virtualization and lots of VM running, a guest fs may mistake a
> overload of the host, as an IDE timeout. So rather than canceling the
> I/O, it's safer to wait I/O completion and simulate that the I/O has
> completed just before the io cancellation was requested by the
> guest. This way if ntfs or an app writes data without checking for
> -EIO retval, and it thinks the write has succeeded, it's less likely
> to run into troubles. Similar issues for reads.
>
> Furthermore because the DMA operation is splitted into many synchronous
> aio_read/write if there's more than one entry in the SG table, without this
> patch the DMA would be cancelled in the middle, something we've no idea if it
> happens on real hardware too or not. Overall this seems a great risk for zero
> gain.
>
> This approach is sure safer than previous code given we can't pretend all guest
> fs code out there to check for errors and reply the DMA if it was completed
> partially, given a timeout would never materialize on a real harddisk unless
> there are defective blocks (and defective blocks are practically only an issue
> for reads never for writes in any recent hardware as writing to blocks is the
> way to fix them) or the harddisk breaks as a whole.
>
> Signed-off-by: Izik Eidus <ieidus@redhat.com>
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> ---
>
> diff --git a/hw/ide/pci.c b/hw/ide/pci.c
> index 780fc5f..9f6d42a 100644
> --- a/hw/ide/pci.c
> +++ b/hw/ide/pci.c
> @@ -40,8 +40,25 @@ void bmdma_cmd_writeb(void *opaque, uint32_t addr, uint32_t val)
>      printf("%s: 0x%08x\n", __func__, val);
>  #endif
>      if (!(val & BM_CMD_START)) {
> -        /* XXX: do it better */
> -        ide_dma_cancel(bm);
> +        /*
> +         * We can't cancel Scatter Gather DMA in the middle of the
> +         * operation or a partial (not full) DMA transfer would reach
> +         * the storage so we wait for completion instead (we beahve
> +         * like if the DMA was completed by the time the guest trying
> +         * to cancel dma with bmdma_cmd_writeb with BM_CMD_START not
> +         * set).
> +         *
> +         * In the future we'll be able to safely cancel the I/O if the
> +         * whole DMA operation will be submitted to disk with a single
> +         * aio operation with preadv/pwritev.
> +         */
> +        if (bm->aiocb) {
> +            qemu_aio_flush();
> +            if (bm->aiocb)
> +                printf("ide_dma_cancel: aiocb still pending");
> +            if (bm->status & BM_STATUS_DMAING)
> +                printf("ide_dma_cancel: BM_STATUS_DMAING still pending");
> +        }

Indentation is off.

>          bm->cmd = val & 0x09;
>      } else {
>          if (!(bm->status & BM_STATUS_DMAING)) {
On Tue, Jul 27, 2010 at 12:44:27PM -0500, Anthony Liguori wrote:
> printf()s?
I see plenty of printf in that file, do you want them only under
#ifdef DEBUG_IDE?
On 07/27/2010 01:15 PM, Andrea Arcangeli wrote:
> On Tue, Jul 27, 2010 at 12:44:27PM -0500, Anthony Liguori wrote:
>> printf()s?
>
> I see plenty of printf in that file, do you want them only under
> #ifdef DEBUG_IDE?

Yes.

Regards,

Anthony Liguori
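For readers following the printf discussion, here is a minimal sketch of what gating diagnostics behind DEBUG_IDE can look like. It is an illustration only, not code from hw/ide/pci.c: the IDE_DPRINTF macro, report_pending() and debug_demo() are hypothetical names, and the two integer flags merely stand in for the bm->aiocb and BM_STATUS_DMAING checks in the patch.

```c
#include <assert.h>
#include <stdio.h>

/* Uncomment to compile the diagnostics in (assumption: a compile-time
 * switch, as the thread suggests with "#ifdef DEBUG_IDE"). */
/* #define DEBUG_IDE */

#ifdef DEBUG_IDE
#define IDE_DPRINTF(...) fprintf(stderr, __VA_ARGS__)
#else
#define IDE_DPRINTF(...) do { } while (0)
#endif

/* Hypothetical helper: counts the conditions that are still pending
 * after the flush and reports each one only in debug builds. */
static int report_pending(int aiocb_pending, int dmaing)
{
    int pending = 0;

    if (aiocb_pending) {
        IDE_DPRINTF("ide_dma_cancel: aiocb still pending\n");
        pending++;
    }
    if (dmaing) {
        IDE_DPRINTF("ide_dma_cancel: BM_STATUS_DMAING still pending\n");
        pending++;
    }
    return pending;
}

/* Small usage demo: the return value is the same with or without
 * DEBUG_IDE; only the stderr output differs. */
static void debug_demo(void)
{
    assert(report_pending(0, 0) == 0);
    assert(report_pending(1, 1) == 2);
}
```

With DEBUG_IDE undefined the macro expands to an empty statement, so a normal build prints nothing, which is the behaviour the review asks for.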
On 07/27/2010 01:35 PM, Andrea Arcangeli wrote:
> On Tue, Jul 27, 2010 at 01:24:12PM -0500, Anthony Liguori wrote:
>> On 07/27/2010 01:15 PM, Andrea Arcangeli wrote:
>>> On Tue, Jul 27, 2010 at 12:44:27PM -0500, Anthony Liguori wrote:
>>>> printf()s?
>>>
>>> I see plenty of printf in that file, do you want them only under
>>> #ifdef DEBUG_IDE?
>>
>> Yes.
>
> Indented with 4 spaces too, but there are tabs, hope that's ok
> otherwise I need to undo my editor settings optimized for kernel
> (develock has quite an opinion on the tab/space issue ;).

No tabs, see CODING_STYLE. Thanks.

Regards,

Anthony Liguori

> =====
> Subject: avoid canceling ide dma
>
> From: Andrea Arcangeli <aarcange@redhat.com>
>
> The reason for not actually canceling the I/O is because with
> virtualization and lots of VM running, a guest fs may mistake a
> overload of the host, as an IDE timeout. So rather than canceling the
> I/O, it's safer to wait I/O completion and simulate that the I/O has
> completed just before the io cancellation was requested by the
> guest. This way if ntfs or an app writes data without checking for
> -EIO retval, and it thinks the write has succeeded, it's less likely
> to run into troubles. Similar issues for reads.
>
> Furthermore because the DMA operation is splitted into many synchronous
> aio_read/write if there's more than one entry in the SG table, without this
> patch the DMA would be cancelled in the middle, something we've no idea if it
> happens on real hardware too or not. Overall this seems a great risk for zero
> gain.
>
> This approach is sure safer than previous code given we can't pretend all guest
> fs code out there to check for errors and reply the DMA if it was completed
> partially, given a timeout would never materialize on a real harddisk unless
> there are defective blocks (and defective blocks are practically only an issue
> for reads never for writes in any recent hardware as writing to blocks is the
> way to fix them) or the harddisk breaks as a whole.
>
> Signed-off-by: Izik Eidus <ieidus@redhat.com>
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> ---
>
> diff --git a/hw/ide/pci.c b/hw/ide/pci.c
> index 4331d77..a019e0d 100644
> --- a/hw/ide/pci.c
> +++ b/hw/ide/pci.c
> @@ -40,8 +40,27 @@ void bmdma_cmd_writeb(void *opaque, uint32_t addr, uint32_t val)
>      printf("%s: 0x%08x\n", __func__, val);
>  #endif
>      if (!(val & BM_CMD_START)) {
> -        /* XXX: do it better */
> -        ide_dma_cancel(bm);
> +        /*
> +         * We can't cancel Scatter Gather DMA in the middle of the
> +         * operation or a partial (not full) DMA transfer would reach
> +         * the storage so we wait for completion instead (we beahve
> +         * like if the DMA was completed by the time the guest trying
> +         * to cancel dma with bmdma_cmd_writeb with BM_CMD_START not
> +         * set).
> +         *
> +         * In the future we'll be able to safely cancel the I/O if the
> +         * whole DMA operation will be submitted to disk with a single
> +         * aio operation with preadv/pwritev.
> +         */
> +        if (bm->aiocb) {
> +            qemu_aio_flush();
> +#ifdef DEBUG_IDE
> +            if (bm->aiocb)
> +                printf("ide_dma_cancel: aiocb still pending");
> +            if (bm->status & BM_STATUS_DMAING)
> +                printf("ide_dma_cancel: BM_STATUS_DMAING still pending");
> +#endif
> +        }
>          bm->cmd = val & 0x09;
>      } else {
>          if (!(bm->status & BM_STATUS_DMAING)) {
diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index 780fc5f..9f6d42a 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -40,8 +40,25 @@ void bmdma_cmd_writeb(void *opaque, uint32_t addr, uint32_t val)
     printf("%s: 0x%08x\n", __func__, val);
 #endif
     if (!(val & BM_CMD_START)) {
-        /* XXX: do it better */
-        ide_dma_cancel(bm);
+        /*
+         * We can't cancel Scatter Gather DMA in the middle of the
+         * operation or a partial (not full) DMA transfer would reach
+         * the storage so we wait for completion instead (we beahve
+         * like if the DMA was completed by the time the guest trying
+         * to cancel dma with bmdma_cmd_writeb with BM_CMD_START not
+         * set).
+         *
+         * In the future we'll be able to safely cancel the I/O if the
+         * whole DMA operation will be submitted to disk with a single
+         * aio operation with preadv/pwritev.
+         */
+        if (bm->aiocb) {
+            qemu_aio_flush();
+            if (bm->aiocb)
+                printf("ide_dma_cancel: aiocb still pending");
+            if (bm->status & BM_STATUS_DMAING)
+                printf("ide_dma_cancel: BM_STATUS_DMAING still pending");
+        }
         bm->cmd = val & 0x09;
     } else {
         if (!(bm->status & BM_STATUS_DMAING)) {
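
The difference between cancelling and draining a transfer that is split across several scatter-gather entries can be shown with a toy model. This is a sketch under stated assumptions, not QEMU code: struct toy_dma and every toy_* function are hypothetical, the "disk" is an in-memory buffer, and each step stands in for one synchronous aio_read/write of a single SG entry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model: 4 SG entries of 8 bytes each. */
enum { SG_ENTRIES = 4, SEG_BYTES = 8 };

struct toy_dma {
    size_t next;        /* index of the next SG entry to write */
    int in_flight;      /* nonzero while the transfer is active */
    unsigned char disk[SG_ENTRIES * SEG_BYTES];
};

static void toy_dma_start(struct toy_dma *d)
{
    memset(d, 0, sizeof *d);
    d->in_flight = 1;
}

/* Write one SG entry; returns nonzero while more entries remain. */
static int toy_dma_step(struct toy_dma *d)
{
    if (!d->in_flight)
        return 0;
    memset(d->disk + d->next * SEG_BYTES, 0xAB, SEG_BYTES);
    d->next++;
    if (d->next == SG_ENTRIES)
        d->in_flight = 0;
    return d->in_flight;
}

/* Old behaviour in the model: stop wherever the transfer happens to be. */
static void toy_dma_cancel(struct toy_dma *d)
{
    d->in_flight = 0;
}

/* Behaviour the patch aims for: wait until every entry has completed. */
static void toy_dma_drain(struct toy_dma *d)
{
    while (toy_dma_step(d)) {
    }
}

static size_t toy_bytes_written(const struct toy_dma *d)
{
    return d->next * SEG_BYTES;
}

/* Usage demo: cancel after the first entry versus drain after it. */
static void toy_demo(void)
{
    struct toy_dma a, b;

    toy_dma_start(&a);
    toy_dma_step(&a);
    toy_dma_cancel(&a);
    assert(toy_bytes_written(&a) == SEG_BYTES);   /* partial write */

    toy_dma_start(&b);
    toy_dma_step(&b);
    toy_dma_drain(&b);
    assert(toy_bytes_written(&b) == SG_ENTRIES * SEG_BYTES);
}
```

In the model, cancelling after the first entry leaves a quarter of the data on the "disk", while draining leaves the transfer complete, which is the state the guest is then told about.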