Patchwork [v2.6.38-rc5,2/2] block: blk-flush shouldn't call directly into q->request_fn() __blk_run_queue()

Submitter Tejun Heo
Date Feb. 17, 2011, 11:16 a.m.
Message ID <20110217111619.GR19830@htj.dyndns.org>
Permalink /patch/83441/
State Not Applicable
Delegated to: David Miller

Comments

Tejun Heo - Feb. 17, 2011, 11:16 a.m.
blk-flush decomposes a flush into a sequence of multiple requests.  On
completion of a request, the next one is queued; however, the block layer
must not implicitly call into q->request_fn() directly from the completion
path.  This makes the queue behave unexpectedly when seen from the
drivers and violates the assumption that q->request_fn() is called
with process context + queue_lock.

This patch makes the following two changes to blk-flush to make sure
q->request_fn() is not called directly from the request completion path.

- blk_flush_complete_seq_end_io() now asks __blk_run_queue() to always
  use kblockd instead of calling directly into q->request_fn().

- queue_next_fseq() uses ELEVATOR_INSERT_REQUEUE instead of
  ELEVATOR_INSERT_FRONT so that elv_insert() doesn't try to unplug the
  request queue directly.
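
The difference between the two ways of kicking the queue can be sketched
in user space. This is only a toy model under stated assumptions: every
name below (toy_queue, toy_run_queue, toy_worker and so on) is
hypothetical and is not a kernel API, and the work_scheduled flag merely
stands in for a work item handed to kblockd. It shows that a direct kick
runs the request function while the completion handler is still active,
whereas a deferred kick leaves the dispatch to a later worker pass.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct toy_queue {
	int pending;          /* requests waiting on the queue */
	int dispatched;       /* requests handed to the "driver" */
	int unsafe_calls;     /* request_fn runs made from completion context */
	bool in_completion;   /* true while a completion handler is running */
	bool work_scheduled;  /* stands in for a queued kblockd work item */
};

/* Stand-in for q->request_fn(): drains everything that is pending. */
static void toy_request_fn(struct toy_queue *q)
{
	if (q->in_completion)
		q->unsafe_calls++;
	while (q->pending > 0) {
		q->pending--;
		q->dispatched++;
	}
}

/* Stand-in for __blk_run_queue(q, force_kblockd). */
static void toy_run_queue(struct toy_queue *q, bool force_deferred)
{
	if (force_deferred)
		q->work_scheduled = true;   /* let the worker call request_fn */
	else
		toy_request_fn(q);          /* call straight into the driver */
}

/* Completion of one flush step: queue the next step, kick if needed. */
static void toy_complete_step(struct toy_queue *q, bool force_deferred)
{
	bool was_empty = (q->pending == 0);

	q->in_completion = true;
	q->pending++;                       /* next request of the sequence */
	if (was_empty)
		toy_run_queue(q, force_deferred);
	q->in_completion = false;
}

/* Stand-in for the kblockd worker, which runs outside completion context. */
static void toy_worker(struct toy_queue *q)
{
	if (q->work_scheduled) {
		q->work_scheduled = false;
		toy_request_fn(q);
	}
}
```

Calling toy_complete_step() with force_deferred set to false counts one
unsafe call, while passing true leaves the request pending until
toy_worker() runs.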

Reported by Jan in the following threads.

 http://thread.gmane.org/gmane.linux.ide/48778
 http://thread.gmane.org/gmane.linux.ide/48786

stable: applicable to v2.6.37.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jan Beulich <JBeulich@novell.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: stable@kernel.org
---
Jens, this is applicable to v2.6.37 and 38.  The new implementation
for 39 would need similar fix but I couldn't find where the tree is.
Which branch is it?

Thanks.

 block/blk-flush.c |    8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mike Snitzer - Feb. 17, 2011, 3:01 p.m.
Hi Tejun,

On Thu, Feb 17, 2011 at 6:16 AM, Tejun Heo <tj@kernel.org> wrote:

> Jens, this is applicable to v2.6.37 and 38.  The new implementation
> for 39 would need similar fix but I couldn't find where the tree is.
> Which branch is it?

for-2.6.39/core
Tejun Heo - Feb. 18, 2011, 9:49 a.m.
On Thu, Feb 17, 2011 at 10:01:51AM -0500, Mike Snitzer wrote:
> Hi Tejun,
> 
> On Thu, Feb 17, 2011 at 6:16 AM, Tejun Heo <tj@kernel.org> wrote:
> 
> > Jens, this is applicable to v2.6.37 and 38.  The new implementation
> > for 39 would need similar fix but I couldn't find where the tree is.
> > Which branch is it?
> 
> for-2.6.39/core

Right, thanks.  Jens, after you apply the two fixes for 2.6.38, I can
create a merge branch for for-2.6.39/core which you can pull.  Would
that work for you?

Thanks.
Jens Axboe - March 2, 2011, 1:46 p.m.
On 2011-02-18 04:49, Tejun Heo wrote:
> On Thu, Feb 17, 2011 at 10:01:51AM -0500, Mike Snitzer wrote:
>> Hi Tejun,
>>
>> On Thu, Feb 17, 2011 at 6:16 AM, Tejun Heo <tj@kernel.org> wrote:
>>
>>> Jens, this is applicable to v2.6.37 and 38.  The new implementation
>>> for 39 would need similar fix but I couldn't find where the tree is.
>>> Which branch is it?
>>
>> for-2.6.39/core
> 
> Right, thanks.  Jens, after you apply the two fixes for 2.6.38, I can
> create a merge branch for for-2.6.39/core which you can pull.  Would
> that work for you?

Thanks, that would be great. I'm applying them now.
Tejun Heo - March 4, 2011, 6:25 p.m.
Hello, Jens.

On Wed, Mar 02, 2011 at 08:46:46AM -0500, Jens Axboe wrote:
> > Right, thanks.  Jens, after you apply the two fixes for 2.6.38, I can
> > create a merge branch for for-2.6.39/core which you can pull.  Would
> > that work for you?
> 
> Thanks, that would be great. I'm applying them now.

Okay, please pull from the following branch to receive the merge
between linux-2.6-block:for-linus and :for-2.6.39/core.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git block-for-2.6.39-core

HEAD is e83a46bbb1d4c03defd733a64b727632a40059ad but git.korg seems a
bit slow to sync, so if you don't see the commit there, please pull
from master.korg.

 ssh://master.kernel.org/pub/scm/linux/kernel/git/tj/misc.git block-for-2.6.39-core

Thanks.
Jens Axboe - March 7, 2011, 8:41 a.m.
On 2011-03-04 19:25, Tejun Heo wrote:
> Hello, Jens.
> 
> On Wed, Mar 02, 2011 at 08:46:46AM -0500, Jens Axboe wrote:
>>> Right, thanks.  Jens, after you apply the two fixes for 2.6.38, I can
>>> create a merge branch for for-2.6.39/core which you can pull.  Would
>>> that work for you?
>>
>> Thanks, that would be great. I'm applying them now.
> 
> Okay, please pull from the following branch to receive the merge
> between linux-2.6-block:for-linus and :for-2.6.39/core.
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git block-for-2.6.39-core
> 
> HEAD is e83a46bbb1d4c03defd733a64b727632a40059ad but git.korg seems a
> bit slow to sync, so if you don't see the commit there, please pull
> from master.korg.
> 
>  ssh://master.kernel.org/pub/scm/linux/kernel/git/tj/misc.git block-for-2.6.39-core
> 
> Thanks.

Pulled, thanks Tejun.
Jeff Moyer - March 7, 2011, 7:33 p.m.
Tejun Heo <tj@kernel.org> writes:

> Hello, Jens.
>
> On Wed, Mar 02, 2011 at 08:46:46AM -0500, Jens Axboe wrote:
>> > Right, thanks.  Jens, after you apply the two fixes for 2.6.38, I can
>> > create a merge branch for for-2.6.39/core which you can pull.  Would
>> > that work for you?
>> 
>> Thanks, that would be great. I'm applying them now.
>
> Okay, please pull from the following branch to receive the merge
> between linux-2.6-block:for-linus and :for-2.6.39/core.
>
>  git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git block-for-2.6.39-core
>
> HEAD is e83a46bbb1d4c03defd733a64b727632a40059ad but git.korg seems a
> bit slow to sync, so if you don't see the commit there, please pull
> from master.korg.
>
>  ssh://master.kernel.org/pub/scm/linux/kernel/git/tj/misc.git block-for-2.6.39-core
>
> Thanks.

I know I'm coming to the party late (and maybe wrong), but I've got some
questions here.

Tejun, you introduced a commit to the ide driver that made it block in
its request function.  As far as I know, that's not allowed.  For scsi,
at least, it has always allowed calling back into the request function
from the completion handler, and I think this is actually the common case
(not some corner case).

So, why doesn't the ide driver see calls back into its request function
from the completion handler?  It's clear that it calls blk_end_request
from ide_end_rq, which can definitely call __blk_run_queue.  In other
words, why is it that the flush requests are triggering this problem
while normal I/O isn't?

I think the real issue may just be that the ide driver is blocking in
its request function.  What have I missed?

Thanks!
Jeff
Jens Axboe - March 7, 2011, 7:36 p.m.
On 2011-03-07 20:33, Jeff Moyer wrote:
> 
> I know I'm coming to the party late (and maybe wrong), but I've got some
> questions here.
> 
> Tejun, you introduced a commit to the ide driver that made it block in
> its request function.  As far as I know, that's not allowed.  For scsi,
> at least, it has always allowed calling back into the request function
> from the completion handler, and I think this is actually the common case
> (not some corner case).
> 
> So, why doesn't the ide driver see calls back into its request function
> from the completion handler?  It's clear that it calls blk_end_request
> from ide_end_rq, which can definitely call __blk_run_queue.  In other
> words, why is it that the flush requests are triggering this problem
> while normal I/O isn't?
> 
> I think the real issue may just be that the ide driver is blocking in
> its request function.  What have I missed?

So the only case where the request_fn is called and you cannot block, is
if you call it from your completion function. Any other invocation
should be from process context. As long as you remember to drop the
queue lock and re-enable interrupts, it should work. It's not great
style and I would not recommend it for a performance environment, but it
should work.
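
The rule described above can be written down as a small user-space
model. This is a sketch under assumptions, not driver code: the toy_ctx
structure and both functions are hypothetical, and the from_completion
flag exists only in the model. A real request function has no such flag
to consult, which is why it matters where the block layer calls it from.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the calling context; not kernel data structures. */
struct toy_ctx {
	bool queue_lock_held;  /* request_fn is entered with the queue lock held */
	bool irqs_enabled;     /* ... and with interrupts disabled */
	bool from_completion;  /* known to the model only, not to a real driver */
	int slept;             /* how many times blocking work was done */
};

/* Sleeping is only legitimate with the lock dropped and interrupts on. */
static bool toy_can_sleep(const struct toy_ctx *c)
{
	return !c->queue_lock_held && c->irqs_enabled && !c->from_completion;
}

/* Returns true if the request was handled with blocking work. */
static bool toy_blocking_request_fn(struct toy_ctx *c)
{
	/* Entry contract assumed by the model. */
	assert(c->queue_lock_held && !c->irqs_enabled);

	if (c->from_completion)
		return false;          /* must not block in this context */

	c->queue_lock_held = false;    /* drop the queue lock */
	c->irqs_enabled = true;        /* re-enable interrupts */

	assert(toy_can_sleep(c));
	c->slept++;                    /* stands in for the blocking operation */

	c->irqs_enabled = false;       /* restore the entry state on the way out */
	c->queue_lock_held = true;
	return true;
}
```

In the model, a call from process context drops the lock, blocks once
and restores the entry state, while a call flagged as coming from
completion returns without blocking.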
Jeff Moyer - March 7, 2011, 7:39 p.m.
Jens Axboe <axboe@kernel.dk> writes:

> On 2011-03-07 20:33, Jeff Moyer wrote:
>> 
>> I know I'm coming to the party late (and maybe wrong), but I've got some
>> questions here.
>> 
>> Tejun, you introduced a commit to the ide driver that made it block in
>> its request function.  As far as I know, that's not allowed.  For scsi,
>> at least, it has always allowed calling back into the request function
>> from the completion handler, and I think this is actually the common case
>> (not some corner case).
>> 
>> So, why doesn't the ide driver see calls back into its request function
>> from the completion handler?  It's clear that it calls blk_end_request
>> from ide_end_rq, which can definitely call __blk_run_queue.  In other
>> words, why is it that the flush requests are triggering this problem
>> while normal I/O isn't?
>> 
>> I think the real issue may just be that the ide driver is blocking in
>> its request function.  What have I missed?
>
> So the only case where the request_fn is called and you cannot block, is
> if you call it from your completion function. Any other invocation
> should be from process context. As long as you remember to drop the
> queue lock and re-enable interrupts, it should work. It's not great
> style and I would not recommend it for a performance environment, but it
> should work.

So are you agreeing with me or disagreeing?  ;-)  It sounds to me like
you're saying that the ide driver should be able to cope with being
called from softirq context.

Cheers,
Jeff
Jens Axboe - March 7, 2011, 7:41 p.m.
On 2011-03-07 20:39, Jeff Moyer wrote:
> Jens Axboe <axboe@kernel.dk> writes:
> 
>> On 2011-03-07 20:33, Jeff Moyer wrote:
>>>
>>> I know I'm coming to the party late (and maybe wrong), but I've got some
>>> questions here.
>>>
>>> Tejun, you introduced a commit to the ide driver that made it block in
>>> its request function.  As far as I know, that's not allowed.  For scsi,
>>> at least, it has always allowed calling back into the request function
>>> from the completion handler, and I think this is actually the common case
>>> (not some corner case).
>>>
>>> So, why doesn't the ide driver see calls back into its request function
>>> from the completion handler?  It's clear that it calls blk_end_request
>>> from ide_end_rq, which can definitely call __blk_run_queue.  In other
>>> words, why is it that the flush requests are triggering this problem
>>> while normal I/O isn't?
>>>
>>> I think the real issue may just be that the ide driver is blocking in
>>> its request function.  What have I missed?
>>
>> So the only case where the request_fn is called and you cannot block, is
>> if you call it from your completion function. Any other invocation
>> should be from process context. As long as you remember to drop the
>> queue lock and re-enable interrupts, it should work. It's not great
>> style and I would not recommend it for a performance environment, but it
>> should work.
> 
> So are you agreeing with me or disagreeing?  ;-)  It sounds to me like
> you're saying that the ide driver should be able to cope with being
> called from softirq context.

I'm just stating how it should work :-)

But yes, it sounds like IDE is violating this rule and that's why it was
broken. Even with that, having explicit control of the queue running
does make sense.
Jeff Moyer - March 7, 2011, 7:50 p.m.
Jens Axboe <axboe@kernel.dk> writes:

> On 2011-03-07 20:39, Jeff Moyer wrote:
>>>> I think the real issue may just be that the ide driver is blocking in
>>>> its request function.  What have I missed?
>>>
>>> So the only case where the request_fn is called and you cannot block, is
>>> if you call it from your completion function. Any other invocation
>>> should be from process context. As long as you remember to drop the
>>> queue lock and re-enable interrupts, it should work. It's not great
>>> style and I would not recommend it for a performance environment, but it
>>> should work.
>> 
>> So are you agreeing with me or disagreeing?  ;-)  It sounds to me like
>> you're saying that the ide driver should be able to cope with being
>> called from softirq context.
>
> I'm just stating how it should work :-)
>
> But yes, it sounds like IDE is violating this rule and that's why it was
> broken. Even with that, having explicit control of the queue running
> does make sense.

Well, I wonder if it makes sense *in this case*.  With all of the work
going into optimizing the flushing, is deferring work to kblockd really
the best idea?  Tejun, do you know if it has any measurable impact?

Cheers,
Jeff
Tejun Heo - March 8, 2011, 7:54 a.m.
Hello,

On Mon, Mar 07, 2011 at 02:50:28PM -0500, Jeff Moyer wrote:
> > But yes, it sounds like IDE is violating this rule and that's why it was
> > broken. Even with that, having explicit control of the queue running
> > does make sense.

Yeah, IDE is the only one which actually depends on it.  At the time I
thought more drivers would use the sleeping context but that didn't
happen (yet).  That said, I think it's generally a good idea to
guarantee context on the issue path (the request_lock requirement
makes it quite ugly to use but that's a separate issue).  It makes
things much easier - e.g. we can use mempool and friends for forward
progress guarantee instead of doing explicit retrying which also can
hide deadlocks quite effectively.

> Well, I wonder if it makes sense *in this case*.  With all of the work
> going into optimizing the flushing, is deferring work to kblockd really
> the best idea?  Tejun, do you know if it has any measurable impact?

I don't think it matters.  First of all, the overhead itself isn't
that big to begin with.  More importantly, the flush optimization is
not about squeezing out the last cpu cycles out of the existing path.
It's about consolidating similar operations and thus increasing
scalability when a storm of fsync's are issued in parallel likely with
a lot of other IOs.  The kicking path won't be used at all on busy
queues (queue won't be empty).
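
The point about busy queues can be illustrated with a rough user-space
count of how often the kick path would be taken. This is a sketch with
hypothetical names (toy_stats, toy_queue_next_step, toy_simulate), and
it assumes a fixed background queue depth at every step, which a real
workload would not have.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counters; not kernel data structures. */
struct toy_stats {
	int queued;  /* flush steps put on the queue */
	int kicks;   /* times the queue had to be kicked */
};

/* Queue the next flush step; kick only if the queue was empty before. */
static void toy_queue_next_step(struct toy_stats *s, int *depth)
{
	bool was_empty = (*depth == 0);

	(*depth)++;
	s->queued++;
	if (was_empty)
		s->kicks++;   /* the deferred kick would be issued here */
}

/*
 * Run 'steps' flush-step completions against a queue that already holds
 * 'background_depth' other requests each time (an assumed, fixed load).
 */
static struct toy_stats toy_simulate(int steps, int background_depth)
{
	struct toy_stats s = { 0, 0 };

	for (int i = 0; i < steps; i++) {
		int depth = background_depth;

		toy_queue_next_step(&s, &depth);
	}
	return s;
}
```

With an idle queue (background depth 0) every step takes the kick path.
With any non-zero background depth none do, so in this model the
deferral cost is only paid when the queue is idle.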

So, I don't think there's anything to worry about here.

Thanks.

Patch

Index: work/block/blk-flush.c
===================================================================
--- work.orig/block/blk-flush.c
+++ work/block/blk-flush.c
@@ -66,10 +66,12 @@  static void blk_flush_complete_seq_end_i
 
 	/*
 	 * Moving a request silently to empty queue_head may stall the
-	 * queue.  Kick the queue in those cases.
+	 * queue.  Kick the queue in those cases.  This function is called
+	 * from request completion path and calling directly into
+	 * request_fn may confuse the driver.  Always use kblockd.
 	 */
 	if (was_empty && next_rq)
-		__blk_run_queue(q, false);
+		__blk_run_queue(q, true);
 }
 
 static void pre_flush_end_io(struct request *rq, int error)
@@ -130,7 +132,7 @@  static struct request *queue_next_fseq(s
 		BUG();
 	}
 
-	elv_insert(q, rq, ELEVATOR_INSERT_FRONT);
+	elv_insert(q, rq, ELEVATOR_INSERT_REQUEUE);
 	return rq;
 }