diff mbox

[block:for-3.3/core] cfq: merged request shouldn't jump to a different cfqq

Message ID 20120106021707.GA6276@google.com
State Not Applicable
Delegated to: David Miller
Headers show

Commit Message

Tejun Heo Jan. 6, 2012, 2:17 a.m. UTC
When two requests are merged, if the absorbed request is older than
the absorbing one, cfq_merged_requests() tries to reposition it in the
cfqq->fifo list by list_move()'ing the absorbing request to the
absorbed one before removing it.

This works if both requests are on the same cfqq but nothing
guarantees that and the code ends up moving the merged request to a
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Tejun Heo Jan. 6, 2012, 2:36 a.m. UTC | #1
Hello, again.

On Thu, Jan 05, 2012 at 06:17:07PM -0800, Tejun Heo wrote:
> When two requests are merged, if the absorbed request is older than
> the absorbing one, cfq_merged_requests() tries to reposition it in the
> cfqq->fifo list by list_move()'ing the absorbing request to the
> absorbed one before removing it.
> 
> This works if both requests are on the same cfqq but nothing
> guarantees that and the code ends up moving the merged request to a
> different cfqq's fifo list without adjusting the rest.  This leads to
> the following failures.
> 
> * A request may be on the fifo list of a cfqq without holding
>   reference to it and the cfqq can be freed before requst is finished.
>   Among other things, this triggers list debug warning and slab debug
>   use-after-free warning.
> 
> * As a request can be on the wrong fifo queue, it may be issued and
>   completed before its cfqq is scheduled.  If the cfqq didn't have
>   other requests on it, it would be empty by the time it's dispatched
>   triggering BUG_ON() in cfq_dispatch_request().
> 
> Fix it by making cfq_merged_requests() scan the absorbing request's
> fifo list for the correct slot and move there instead.

Hmmm... while the patch would fix the problem.  It isn't entirely
correct.  The root cause is,

1. q->last_merge and rqhash used to be used only for merging bios into
   requests and that queries elevator whether the merge should be
   allowed.  cfq disallows merging if they belong to different cfqqs.

2. request-request merging didn't use to use q->last_merge or rqhash to
   find request candidates.  It used elv_former/latter_request() and
   cfq never returned request from a different cfqq.

3. Plug merging started using q->last_merge and rqhash and now
   elevator can't prevent cross cfqq merges.

So, yeah, the right fix would be using elv_former/latter_request()
instead.  Maybe we should strip out rqhash altogether and change
elevator handle everything?  I don't know.  I'll prepare a different
fix patch soon.

Thanks.
Shaohua Li Jan. 6, 2012, 2:47 a.m. UTC | #2
On Thu, 2012-01-05 at 18:17 -0800, Tejun Heo wrote:
> When two requests are merged, if the absorbed request is older than
> the absorbing one, cfq_merged_requests() tries to reposition it in the
> cfqq->fifo list by list_move()'ing the absorbing request to the
> absorbed one before removing it.
> 
> This works if both requests are on the same cfqq but nothing
> guarantees that and the code ends up moving the merged request to a
> different cfqq's fifo list without adjusting the rest.  This leads to
> the following failures.
> 
> * A request may be on the fifo list of a cfqq without holding
>   reference to it and the cfqq can be freed before requst is finished.
>   Among other things, this triggers list debug warning and slab debug
>   use-after-free warning.
> 
> * As a request can be on the wrong fifo queue, it may be issued and
>   completed before its cfqq is scheduled.  If the cfqq didn't have
>   other requests on it, it would be empty by the time it's dispatched
>   triggering BUG_ON() in cfq_dispatch_request().
> 
> Fix it by making cfq_merged_requests() scan the absorbing request's
> fifo list for the correct slot and move there instead.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reported-by: Hugh Dickins <hughd@google.com>
> Cc: stable@vger.kernel.org
> ---
> It survived my testing long enough and I'm relatively confident this
> should fix the crash but I might have gotten the scanning wrong, so
> please pay extra attention there.
> 
> I suspect we just didn't have enough backward request-request merges
> before the recent plug merge updates to trigger this bug.
The patch itself looks good to me, but I'm wondering if we really need
do reposition of the fifo list for merged request. it's rare case and
not worthy such complexity to me.

Thanks,
Shaohua

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tejun Heo Jan. 6, 2012, 3:04 a.m. UTC | #3
Hello,

On Fri, Jan 06, 2012 at 11:14:15AM +0800, Shaohua Li wrote:
> > So, yeah, the right fix would be using elv_former/latter_request()
> > instead.  Maybe we should strip out rqhash altogether and change
> > elevator handle everything?  I don't know.  I'll prepare a different
> > fix patch soon.
>
> So not allow merge from two cfq queues strictly? This will impact
> performance. I don't know how important the strict isolation is. we even
> allow two cfq queues merge to improve performance.

That's how cfq has behaved before this recent plug merge breakage and
IIRC why the cooperating queue thing is there.  If you want to change
the behavior, that should be an explicit separate patch.

Thanks.
Shaohua Li Jan. 6, 2012, 3:14 a.m. UTC | #4
On Thu, 2012-01-05 at 18:36 -0800, Tejun Heo wrote:
> Hello, again.
> 
> On Thu, Jan 05, 2012 at 06:17:07PM -0800, Tejun Heo wrote:
> > When two requests are merged, if the absorbed request is older than
> > the absorbing one, cfq_merged_requests() tries to reposition it in the
> > cfqq->fifo list by list_move()'ing the absorbing request to the
> > absorbed one before removing it.
> > 
> > This works if both requests are on the same cfqq but nothing
> > guarantees that and the code ends up moving the merged request to a
> > different cfqq's fifo list without adjusting the rest.  This leads to
> > the following failures.
> > 
> > * A request may be on the fifo list of a cfqq without holding
> >   reference to it and the cfqq can be freed before requst is finished.
> >   Among other things, this triggers list debug warning and slab debug
> >   use-after-free warning.
> > 
> > * As a request can be on the wrong fifo queue, it may be issued and
> >   completed before its cfqq is scheduled.  If the cfqq didn't have
> >   other requests on it, it would be empty by the time it's dispatched
> >   triggering BUG_ON() in cfq_dispatch_request().
> > 
> > Fix it by making cfq_merged_requests() scan the absorbing request's
> > fifo list for the correct slot and move there instead.
> 
> Hmmm... while the patch would fix the problem.  It isn't entirely
> correct.  The root cause is,
> 
> 1. q->last_merge and rqhash used to be used only for merging bios into
>    requests and that queries elevator whether the merge should be
>    allowed.  cfq disallows merging if they belong to different cfqqs.
> 
> 2. request-request merging didn't use to use q->last_merge or rqhash to
>    find request candidates.  It used elv_former/latter_request() and
>    cfq never returned request from a different cfqq.
> 
> 3. Plug merging started using q->last_merge and rqhash and now
>    elevator can't prevent cross cfqq merges.
> 
> So, yeah, the right fix would be using elv_former/latter_request()
> instead.  Maybe we should strip out rqhash altogether and change
> elevator handle everything?  I don't know.  I'll prepare a different
> fix patch soon.
So not allow merge from two cfq queues strictly? This will impact
performance. I don't know how important the strict isolation is. we even
allow two cfq queues merge to improve performance.

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tejun Heo Jan. 6, 2012, 3:22 a.m. UTC | #5
On Thu, Jan 5, 2012 at 7:34 PM, Shaohua Li <shaohua.li@intel.com> wrote:
>> That's how cfq has behaved before this recent plug merge breakage and
>> IIRC why the cooperating queue thing is there.  If you want to change
>> the behavior, that should be an explicit separate patch.
> My point is both cooperating merge and the plug merge of different cfq
> are merge, no reason we allow one but disallow the other. plug merge
> isn't a breakage to me.

Isolation is pretty big deal for cfq and cross cfqq merging happening
without cfq noticing it isn't gonna be helpful to the cause.  Why
don't we merge bio's across different cfqq's then?
Tejun Heo Jan. 6, 2012, 3:30 a.m. UTC | #6
Ummmm... I've been looking at the code and currently I think the best
option is yank out plug merging for this merge window.

Bypassing rqhash for request merging doesn't seem like a good idea.
rqhash itself is, at this point, redundant and the limitation that it
can only find requests by the ending sectors lead to weird designs.
The single ->last_merge was okay but the recursive back merging is
just ugly.  What it should be doing is trying back merge and then
front merge once for each request insertion as the usual merge path
does.

We can't do attempt_back/front_merge() for INSERT_MERGE at this point
because elv_latter/former_request() only works for requests which are
already on elevator and, unfortunately, putting a request onto cfq may
kick the queue directly and the request might already be gone by the
time we try to merge it.

For this merge window, I think we better just disable INSERT_MERGE.

A mid-term solution could be changing elevator interface such that
elevator_add_req_fn() doesn't kick the queue directly but notify
elevator core that the queue needs kicking via return value, so that
merging can happen before kicking the queue.  Note that there's a
caveat here.  Merging might make kicking unnecessary or wrong.

Better solution would be changing elevator merge logic so that it has
"give me request closest to this offset" interface and then use
prev/next from there to find out merge candidates for both bio-rq and
rq-rq merges and just kill rqhash.

Thanks.
Shaohua Li Jan. 6, 2012, 3:34 a.m. UTC | #7
On Thu, 2012-01-05 at 19:04 -0800, Tejun Heo wrote:
> Hello,
> 
> On Fri, Jan 06, 2012 at 11:14:15AM +0800, Shaohua Li wrote:
> > > So, yeah, the right fix would be using elv_former/latter_request()
> > > instead.  Maybe we should strip out rqhash altogether and change
> > > elevator handle everything?  I don't know.  I'll prepare a different
> > > fix patch soon.
> >
> > So not allow merge from two cfq queues strictly? This will impact
> > performance. I don't know how important the strict isolation is. we even
> > allow two cfq queues merge to improve performance.
> 
> That's how cfq has behaved before this recent plug merge breakage and
> IIRC why the cooperating queue thing is there.  If you want to change
> the behavior, that should be an explicit separate patch.
My point is both cooperating merge and the plug merge of different cfq
are merge, no reason we allow one but disallow the other. plug merge
isn't a breakage to me.

Thanks,
Shaohua

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Shaohua Li Jan. 6, 2012, 4:15 a.m. UTC | #8
On Thu, 2012-01-05 at 19:22 -0800, Tejun Heo wrote:
> On Thu, Jan 5, 2012 at 7:34 PM, Shaohua Li <shaohua.li@intel.com> wrote:
> >> That's how cfq has behaved before this recent plug merge breakage and
> >> IIRC why the cooperating queue thing is there.  If you want to change
> >> the behavior, that should be an explicit separate patch.
> > My point is both cooperating merge and the plug merge of different cfq
> > are merge, no reason we allow one but disallow the other. plug merge
> > isn't a breakage to me.
> 
> Isolation is pretty big deal for cfq and cross cfqq merging happening
> without cfq noticing it isn't gonna be helpful to the cause.  Why
> don't we merge bio's across different cfqq's then?
don't know. I don't think a tweak for merge impacts isolation so much.
For rotate disk, request size hasn't impact to request cost, so this
doesn't impact isolation. Even for ssd, big size request is more
efficient to dispatch. And we already have breakage of fairness for SSD,
such as no idle.

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tejun Heo Jan. 6, 2012, 4:40 a.m. UTC | #9
Hello,

On Thu, Jan 5, 2012 at 8:15 PM, Shaohua Li <shaohua.li@intel.com> wrote:
> don't know. I don't think a tweak for merge impacts isolation so much.
> For rotate disk, request size hasn't impact to request cost, so this
> doesn't impact isolation. Even for ssd, big size request is more
> efficient to dispatch. And we already have breakage of fairness for SSD,
> such as no idle.

I'm not saying they shouldn't be merged but the decision should be
elevator's. Block core shouldn't decide it for the elevator. So,
whether cross cfqq merge is a good idea or not is mostly irrelevant in
this thread.

Thanks.
diff mbox

Patch

different cfqq's fifo list without adjusting the rest.  This leads to
the following failures.

* A request may be on the fifo list of a cfqq without holding
  reference to it and the cfqq can be freed before requst is finished.
  Among other things, this triggers list debug warning and slab debug
  use-after-free warning.

* As a request can be on the wrong fifo queue, it may be issued and
  completed before its cfqq is scheduled.  If the cfqq didn't have
  other requests on it, it would be empty by the time it's dispatched
  triggering BUG_ON() in cfq_dispatch_request().

Fix it by making cfq_merged_requests() scan the absorbing request's
fifo list for the correct slot and move there instead.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org
---
It survived my testing long enough and I'm relatively confident this
should fix the crash but I might have gotten the scanning wrong, so
please pay extra attention there.

I suspect we just didn't have enough backward request-request merges
before the recent plug merge updates to trigger this bug.

Thanks.

 block/cfq-iosched.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 163263d..a235cf3 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1663,8 +1663,15 @@  cfq_merged_requests(struct request_queue *q, struct request *rq,
 	 */
 	if (!list_empty(&rq->queuelist) && !list_empty(&next->queuelist) &&
 	    time_before(rq_fifo_time(next), rq_fifo_time(rq))) {
-		list_move(&rq->queuelist, &next->queuelist);
+		struct request *pos = rq;
+
 		rq_set_fifo_time(rq, rq_fifo_time(next));
+
+		list_for_each_entry_continue_reverse(pos, &cfqq->fifo, queuelist)
+			if (time_before_eq(rq_fifo_time(pos), rq_fifo_time(rq)))
+				break;
+		if (rq != pos)
+			list_move(&rq->queuelist, &pos->queuelist);
 	}
 
 	if (cfqq->next_rq == next)