Message ID | 20100331061731.GA11480@gondor.apana.org.au |
---|---|
State | Superseded |
Delegated to: | David Miller |
Tejun, please review Herbert's two patches, thank you! -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello, Herbert.

On 03/31/2010 03:17 PM, Herbert Xu wrote:
> commit 8f6205cd572fece673da0255d74843680f67f879
> Author: Tejun Heo <tj@kernel.org>
> Date: Fri May 8 11:53:59 2009 +0900
>
>     ide: dequeue in-flight request
>
> The problem is that the function ide_dma_timeout_retry does not
> requeue the current request, causing one request to be lost for
> each DMA timeout.

Hmmm....

> diff --git a/drivers/ide/ide-dma.c b/drivers/ide/ide-dma.c
> index ee58c88..62a257f 100644
> --- a/drivers/ide/ide-dma.c
> +++ b/drivers/ide/ide-dma.c
> @@ -492,6 +492,7 @@ ide_startstop_t ide_dma_timeout_retry(ide_drive_t *drive, int error)
>  	if (rq) {
>  		hwif->rq = NULL;
>  		rq->errors = 0;
> +		ide_requeue_request(drive, rq);
>  	}
>  	return ret;
>  }

Hmmm... ide_dma_timeout_retry() is called from ide_timer_expiry() if
!hwif->polling. The former returns ide_stopped if the current request
processing should be stopped, in which case ide_timer_expiry() sets
plug_device to 1 which makes it call ide_requeue_and_plug() at the end
of the function. Does the above change make the request to be
requeued twice?

Thanks.
On 04/01/2010 11:34 AM, David Miller wrote:
>
> Tejun, please review Herbert's two patches, thank you!

Oops, I missed that it was caused by my patch and was wondering why I
was cc'd. Reviewing now.

Thanks.
On Thu, Apr 01, 2010 at 12:04:44PM +0900, Tejun Heo wrote:
>
> > diff --git a/drivers/ide/ide-dma.c b/drivers/ide/ide-dma.c
> > index ee58c88..62a257f 100644
> > --- a/drivers/ide/ide-dma.c
> > +++ b/drivers/ide/ide-dma.c
> > @@ -492,6 +492,7 @@ ide_startstop_t ide_dma_timeout_retry(ide_drive_t *drive, int error)
> >  	if (rq) {
> >  		hwif->rq = NULL;
> >  		rq->errors = 0;
> > +		ide_requeue_request(drive, rq);
> >  	}
> >  	return ret;
> >  }
>
> Hmmm... ide_dma_timeout_retry() is called from ide_timer_expiry() if
> !hwif->polling. The former returns ide_stopped if the current request
> processing should be stopped, in which case ide_timer_expiry() sets
> plug_device to 1 which makes it call ide_requeue_and_plug() at the end
> of the function. Does the above change make the request to be
> requeued twice?

No, we clear hwif->rq in ide_dma_timeout_retry so ide_timer_expiry
will have nothing to requeue.

Besides, we want to requeue here regardless of whether we return
ide_stopped. For example, we may return ide_started in case of
a pending reset, but as the original request hasn't been completed
it must still be requeued.

Cheers,
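The "not requeued twice" argument can be checked with a small user-space model. This is only a sketch under stated assumptions, not kernel code: every toy_* name and the TOY_* enum are invented for illustration, the requeue is reduced to a counter, and locking, plugging and the rq->errors reset are left out. It models ide_dma_timeout_retry() clearing hwif->rq before returning, and the caller picking up whatever is still in hwif->rq when the return value is "stopped".

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel types; none of this is real IDE code. */
enum toy_startstop { TOY_STOPPED, TOY_STARTED };

struct toy_request { int requeue_count; };
struct toy_hwif { struct toy_request *rq; };

/* Stand-in for a requeue helper: only counts how often rq is requeued. */
static void toy_requeue(struct toy_request *rq)
{
	assert(rq != NULL);
	rq->requeue_count++;
}

/* Models the timeout handler: clears hwif->rq, requeues only with the fix. */
static enum toy_startstop toy_dma_timeout_retry(struct toy_hwif *hwif,
						enum toy_startstop ret,
						int with_fix)
{
	struct toy_request *rq = hwif->rq;

	if (rq) {
		hwif->rq = NULL;
		if (with_fix)
			toy_requeue(rq);
	}
	return ret;
}

/* Models the caller: on "stopped" it requeues whatever hwif->rq still holds. */
static void toy_timer_expiry(struct toy_hwif *hwif, enum toy_startstop ret,
			     int with_fix)
{
	struct toy_request *rq_in_flight = NULL;

	if (toy_dma_timeout_retry(hwif, ret, with_fix) == TOY_STOPPED) {
		rq_in_flight = hwif->rq;	/* already NULL on this path */
		hwif->rq = NULL;
	}
	if (rq_in_flight)
		toy_requeue(rq_in_flight);
}

/* Runs one simulated DMA timeout and returns how often rq was requeued. */
static int toy_run(enum toy_startstop ret, int with_fix)
{
	struct toy_request rq = { 0 };
	struct toy_hwif hwif = { &rq };

	toy_timer_expiry(&hwif, ret, with_fix);
	return rq.requeue_count;
}
```

In this model, toy_run() returns 0 without the fix (the request is lost) and 1 with it, for both TOY_STOPPED and TOY_STARTED, because the caller never sees a non-NULL hwif->rq after the handler has run.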
Hello,

On 04/01/2010 01:32 PM, Herbert Xu wrote:
>> Does the above change make the request to be requeued twice?
>
> No, we clear hwif->rq in ide_dma_timeout_retry so ide_timer_expiry
> will have nothing to requeue.

OIC. It's also cleared in ide_timer_expiry() too. Asymmetry among
different failure paths worries me. e.g. looking at the code, I can't
find how ide_error() would requeue the request either. It looks like
each hwif->rq = NULL in failure path should be investigated and the
affected ones should be replaced with a function which requeues and
clears hwif->rq. Hmmm.... am I misunderstanding something?

> Besides, we want to requeue here regardless of whether we return
> ide_stopped. For example, we may return ide_started in case of
> a pending reset, but as the original request hasn't been completed
> it must still be requeued.

Yeap, right. ide_started/stopped doesn't have much bearing on the
current request. It indicates the current controller/driver state.

Thanks.
On Thu, Apr 01, 2010 at 01:56:53PM +0900, Tejun Heo wrote:
>
> OIC. It's also cleared in ide_timer_expiry() too. Asymmetry among
> different failure paths worries me. e.g. looking at the code, I can't
> find how ide_error() would requeue the request either. It looks like
> each hwif->rq = NULL in failure path should be investigated and the
> affected ones should be replaced with a function which requeues and
> clears hwif->rq. Hmmm.... am I misunderstanding something?

I had a look at the rest of them and they seemed to be fine.

So are you OK for this patch to go in?

Thanks,
Hello,

On 04/01/2010 02:54 PM, Herbert Xu wrote:
> On Thu, Apr 01, 2010 at 01:56:53PM +0900, Tejun Heo wrote:
>>
>> OIC. It's also cleared in ide_timer_expiry() too. Asymmetry among
>> different failure paths worries me. e.g. looking at the code, I can't
>> find how ide_error() would requeue the request either. It looks like
>> each hwif->rq = NULL in failure path should be investigated and the
>> affected ones should be replaced with a function which requeues and
>> clears hwif->rq. Hmmm.... am I misunderstanding something?
>
> I had a look at the rest of them and they seemed to be fine.

In ide_timer_expiry() if drive->waiting_for_dma is false, ide_error()
is called, which in turn calls __ide_error() for fs requests.
ide_ata_error() will be called if the device is a disk. If the
request hasn't reached the retry limit and reset is not necessary,
ide_ata_error() will return ide_stopped without requeueing the
request. ide_timer_expiry() will clear hwif->rq without requeueing
the request and the request will be lost. No?

> So are you OK for this patch to go in?

Yeah yeah, I think those patches are okay by themselves and am just
trying to find out whether anything similar is missing, in which case
the requeue might fit better somewhere higher in the call chain.

Thanks.
On Thu, Apr 01, 2010 at 03:25:50PM +0900, Tejun Heo wrote:
>
> In ide_timer_expiry() if drive->waiting_for_dma is false, ide_error()
> is called, which in turn calls __ide_error() for fs requests.
> ide_ata_error() will be called if the device is a disk. If the
> request hasn't reached the retry limit and reset is not necessary,
> ide_ata_error() will return ide_stopped without requeueing the
> request. ide_timer_expiry() will clear hwif->rq without requeueing
> the request and the request will be lost. No?

It shouldn't be lost in that case because the rq_in_flight
thing that you added will catch it and requeue it.

Cheers,
Hello,

On 04/01/2010 03:32 PM, Herbert Xu wrote:
>> ide_timer_expiry() will clear hwif->rq without requeueing
>> the request and the request will be lost. No?
>
> It shouldn't be lost in that case because the rq_in_flight
> thing that you added will catch it and requeue it.

I feel pretty stupid now. Thanks for enlightening me on how the code
I added works. :-)

It was the asymmetry between the two paths that bothered me and made
me think there should be something else wrong. So, the problem is
ide_dma_timeout_retry(), which is used only by ide_timer_expiry(),
clearing hwif->rq, right? Then, wouldn't not clearing hwif->rq in
ide_dma_timeout_retry() be a better solution?

Thanks.
On Thu, Apr 01, 2010 at 03:37:47PM +0900, Tejun Heo wrote:
>
> It was the asymmetry between the two paths that bothered me and made
> me think there should be something else wrong. So, the problem is
> ide_dma_timeout_retry(), which is used only by ide_timer_expiry(),
> clearing hwif->rq, right? Then, wouldn't not clearing hwif->rq in
> ide_dma_timeout_retry() be a better solution?

I don't think that works. We want to requeue regardless of whether
we return ide_stopped. If you don't clear hwif->rq and rely on the
parent to do it then it'll only requeue when we return ide_stopped.

Cheers,
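The objection to leaving hwif->rq set can be illustrated the same way. The sketch below is a hypothetical user-space model, not kernel code: the alt_* names and ALT_* enum are made up, and the requeue is again just a counter. It models a timeout handler that does not touch hwif->rq and a caller that requeues the in-flight request only when the handler reports "stopped".

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel types; invented for this illustration. */
enum alt_startstop { ALT_STOPPED, ALT_STARTED };

struct alt_request { int requeue_count; };
struct alt_hwif { struct alt_request *rq; };

/* Hypothetical variant of the timeout handler that leaves hwif->rq alone. */
static enum alt_startstop alt_dma_timeout_retry(struct alt_hwif *hwif,
						enum alt_startstop ret)
{
	(void)hwif;	/* deliberately not cleared and not requeued here */
	return ret;
}

/*
 * Models the caller relying on its own rq_in_flight handling.
 * Returns how many times the request ended up being requeued.
 */
static int alt_timer_expiry(enum alt_startstop ret)
{
	struct alt_request rq = { 0 };
	struct alt_hwif hwif = { &rq };
	struct alt_request *rq_in_flight = NULL;

	if (alt_dma_timeout_retry(&hwif, ret) == ALT_STOPPED) {
		rq_in_flight = hwif.rq;
		hwif.rq = NULL;
	}
	if (rq_in_flight) {
		assert(rq_in_flight == &rq);
		rq_in_flight->requeue_count++;
	}
	return rq.requeue_count;
}
```

Here alt_timer_expiry(ALT_STOPPED) yields one requeue, while alt_timer_expiry(ALT_STARTED) yields none, which is the case described above where the handler returns ide_started (for example with a reset pending) and the uncompleted request would not be requeued by the parent.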
Hello,

On 04/01/2010 04:54 PM, Herbert Xu wrote:
> On Thu, Apr 01, 2010 at 03:37:47PM +0900, Tejun Heo wrote:
>>
>> It was the asymmetry between the two paths that bothered me and made
>> me think there should be something else wrong. So, the problem is
>> ide_dma_timeout_retry(), which is used only by ide_timer_expiry(),
>> clearing hwif->rq, right? Then, wouldn't not clearing hwif->rq in
>> ide_dma_timeout_retry() be a better solution?
>
> I don't think that works. We want to requeue regardless of whether
> we return ide_stopped. If you don't clear hwif->rq and rely on the
> parent to do it then it'll only requeue when we return ide_stopped.

Yeap, which applies the same to the other failure path too. I think
back then I repeated the same mistake I made in this thread - ie.
thinking ide_stopped indicates the state of the request. It seems the
error path needs an audit and a more comprehensive fix unless I'm
mistaken yet again, which definitely is a possibility. :-)

If you're interested in fixing the request requeueing in the error
path properly, please go ahead. If not, I'll give this a shot in a
few days.

David, in the meantime, although I'm not quite sure the fix is
comprehensive yet, the patches definitely fix some of the issues. So,
I have no objection to applying them.

Thanks for your patience.
From: Tejun Heo <tj@kernel.org>
Date: Thu, 01 Apr 2010 17:26:37 +0900

> David, in the meantime, although I'm not quite sure the fix is
> comprehensive yet, the patches definitely fix some of the issues. So,
> I have no objection to applying them.

Ok, thanks for reviewing.
diff --git a/drivers/ide/ide-dma.c b/drivers/ide/ide-dma.c
index ee58c88..62a257f 100644
--- a/drivers/ide/ide-dma.c
+++ b/drivers/ide/ide-dma.c
@@ -492,6 +492,7 @@ ide_startstop_t ide_dma_timeout_retry(ide_drive_t *drive, int error)
 	if (rq) {
 		hwif->rq = NULL;
 		rq->errors = 0;
+		ide_requeue_request(drive, rq);
 	}
 	return ret;
 }
diff --git a/drivers/ide/ide-io.c b/drivers/ide/ide-io.c
index db96138..0a5f346 100644
--- a/drivers/ide/ide-io.c
+++ b/drivers/ide/ide-io.c
@@ -566,6 +566,16 @@ plug_device_2:
 		blk_plug_device(q);
 }
 
+void ide_requeue_request(ide_drive_t *drive, struct request *rq)
+{
+	struct request_queue *q = drive->queue;
+	unsigned long flags;
+
+	spin_lock_irqsave(q->queue_lock, flags);
+	blk_requeue_request(q, rq);
+	spin_unlock_irqrestore(q->queue_lock, flags);
+}
+
 static void ide_requeue_and_plug(ide_drive_t *drive, struct request *rq)
 {
 	struct request_queue *q = drive->queue;
diff --git a/include/linux/ide.h b/include/linux/ide.h
index 97e6ab4..c369f27 100644
--- a/include/linux/ide.h
+++ b/include/linux/ide.h
@@ -1169,6 +1169,7 @@ extern void ide_stall_queue(ide_drive_t *drive, unsigned long timeout);
 
 extern void ide_timer_expiry(unsigned long);
 extern irqreturn_t ide_intr(int irq, void *dev_id);
 extern void do_ide_request(struct request_queue *);
+extern void ide_requeue_request(ide_drive_t *drive, struct request *rq);
 
 void ide_init_disk(struct gendisk *, ide_drive_t *);
Hi:

ide: Requeue request after DMA timeout

I noticed that my KVM virtual machines were experiencing IDE
issues resulting in processes stuck on waiting for buffers to
complete.

The root cause is of course race conditions in the ancient qemu
backend that I'm using. However, the fact that the guest isn't
recovering is a bug.

I've tracked it down to the change made last year to dequeue
requests at the start rather than at the end in the IDE layer.

commit 8f6205cd572fece673da0255d74843680f67f879
Author: Tejun Heo <tj@kernel.org>
Date: Fri May 8 11:53:59 2009 +0900

    ide: dequeue in-flight request

The problem is that the function ide_dma_timeout_retry does not
requeue the current request, causing one request to be lost for
each DMA timeout.

This patch fixes this by requeueing the request.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Cheers,