Patchwork [block:for-3.3/core] block: disable ELEVATOR_INSERT_SORT_MERGE

login
register
mail settings
Submitter Tejun Heo
Date Jan. 6, 2012, 3:52 a.m.
Message ID <20120106035247.GF6276@google.com>
Download mbox | patch
Permalink /patch/134596/
State Not Applicable
Delegated to: David Miller
Headers show

Comments

Tejun Heo - Jan. 6, 2012, 3:52 a.m.
5e84ea3a9c "block: attempt to merge with existing requests on plug
flush" added support for merging requests on plug flush and 274193224c
"block: recursive merge requests" added recursive merging.

Because these mergings happen before the request is inserted on the
elevator, the usual elv_latter/former_request() can't be used to
locate merge candidates.  It instead used bio merging mechanism -
last_merge hint and rqhash; unfortunately, this means that the
elevator doesn't have a say in which are allowed to merge and which
aren't.

For cfq, this resulted in merges across different cfqq's which led to
crashes as requests jump between different cfqq's unexpectedly.

Proper solution would be improving merge mechanism such that we can
always query elevator to find out merge candidates and remove rqhash;
however, the merge window is already upon us.  Disable
INSERT_SORT_MERGE for now.

For detailed discussion of the bug:

 http://thread.gmane.org/gmane.linux.kernel.next/20064/focus=20159

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org
---
 block/blk-core.c |    5 ++++-
 block/elevator.c |    5 +++++
 2 files changed, 9 insertions(+), 1 deletion(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Shaohua Li - Jan. 6, 2012, 4:19 a.m.
On Thu, 2012-01-05 at 19:52 -0800, Tejun Heo wrote:
> 5e84ea3a9c "block: attempt to merge with existing requests on plug
> flush" added support for merging requests on plug flush and 274193224c
> "block: recursive merge requests" added recursive merging.
> 
> Because these mergings happen before the request is inserted on the
> elevator, the usual elv_latter/former_request() can't be used to
> locate merge candidates.  It instead used bio merging mechanism -
> last_merge hint and rqhash; unfortunately, this means that the
> elevator doesn't have a say in which are allowed to merge and which
> aren't.
> 
> For cfq, this resulted in merges across different cfqq's which led to
> crashes as requests jump between different cfqq's unexpectedly.
> 
> Proper solution would be improving merge mechanism such that we can
> always query elevator to find out merge candidates and remove rqhash;
> however, the merge window is already upon us.  Disable
> INSERT_SORT_MERGE for now.
> 
> For detailed discussion of the bug:
> 
>  http://thread.gmane.org/gmane.linux.kernel.next/20064/focus=20159
> 
this is overkill. when plug is added, we found huge performance
regression, that's why we add INSERT_SORT_MERGE.

Thanks,
Shaohua

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tejun Heo - Jan. 6, 2012, 4:38 a.m.
Hello,

On Thu, Jan 5, 2012 at 8:19 PM, Shaohua Li <shaohua.li@intel.com> wrote:
>> For detailed discussion of the bug:
>>
>>  http://thread.gmane.org/gmane.linux.kernel.next/20064/focus=20159
>>
> this is overkill. when plug is added, we found huge performance
> regression, that's why we add INSERT_SORT_MERGE.

With what workload? I suspect most of the improvements is from merging
across different cfqqs, no? The whole recursive thing can't be very
useful if cross-cfqq merging isn't allowed. Maybe there are specific
cases where last_merge hint merging can be specially effective. I
don't know. Regardless, this is an apparent bug and the block tree
will be pushed mainline pretty soon. If necessary, fix it better
later. Doing complex things inside merge window usually isn't a good
idea.

Thanks.
Shaohua Li - Jan. 6, 2012, 8:15 a.m.
On Thu, 2012-01-05 at 20:38 -0800, Tejun Heo wrote:
> Hello,
> 
> On Thu, Jan 5, 2012 at 8:19 PM, Shaohua Li <shaohua.li@intel.com> wrote:
> >> For detailed discussion of the bug:
> >>
> >>  http://thread.gmane.org/gmane.linux.kernel.next/20064/focus=20159
> >>
> > this is overkill. when plug is added, we found huge performance
> > regression, that's why we add INSERT_SORT_MERGE.
> 
> With what workload? I suspect most of the improvements is from merging
> across different cfqqs, no? The whole recursive thing can't be very
> useful if cross-cfqq merging isn't allowed. Maybe there are specific
> cases where last_merge hint merging can be specially effective. I
> don't know. Regardless, this is an apparent bug and the block tree
> will be pushed mainline pretty soon. If necessary, fix it better
> later. Doing complex things inside merge window usually isn't a good
> idea.
it's not related to the recursive merge. I forgot the detail of the
workload why we add INSERT_SORT_MERGE, it's added several months ago
anyway.
Obviously we shouldn't do complex things inside the merge window. I just
didn't agree to disable INSERT_SORT_MERGE, which will bring performance
regression for sure. Can we just change CFQ like your first path?

Thanks,
Shaohua

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tejun Heo - Jan. 6, 2012, 3:34 p.m.
Hello, Shaohua.

On Fri, Jan 06, 2012 at 04:15:57PM +0800, Shaohua Li wrote:
> it's not related to the recursive merge. I forgot the detail of the
> workload why we add INSERT_SORT_MERGE, it's added several months ago
> anyway.
> Obviously we shouldn't do complex things inside the merge window. I just
> didn't agree to disable INSERT_SORT_MERGE, which will bring performance
> regression for sure. Can we just change CFQ like your first path?

But that is exactly what was broken and papering it over is exactly
the thing we shouldn't do, for now or for ever.  Think about it.  Your
commit f1f8cc94651 "block, cfq: fix empty queue crash caused by
request merge" fixed one symptom caused by cross cfqq merging
happening behind cfq's back without properly root-causing it.  I'm not
saying it was your fault but in the end all it did was obscuring the
root cause of the problem further, making the next more subtle failure
much more difficult to diagnose.

So, if you think cross-cfqq merging is a good idea, please go ahead
and do it properly.  It shouldn't happen as a side effect of two bugs
folded together.

As for quicker fix than revamping merge infrastructure, adding another
callback to explicitly ask elevator whether two requests chosen by
block core can be merged.  I didn't go that route because I wasn't
sure about the benefit of doing so - I still can't see how that would
make a lot of difference without cross cfqq merging, and wanted to
stay on the safer side given the intricacy of the problem.

So, if you wanna do that and have workload which is benefited enough
to justify the extra step, let's do it after the merge window, merge
as fixes and send it back via -stable.  But we shouldn't be doing that
inside merge window when all other trees are getting pushed upstream
and we don't have much time for testing ourselves or in linux-next.

Thanks.

Patch

diff --git a/block/blk-core.c b/block/blk-core.c
index 8fbdac7..7db6afa 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2859,11 +2859,14 @@  void blk_flush_plug_list(struct blk_plug *plug, bool from_schedule)
 
 		/*
 		 * rq is already accounted, so use raw insert
+		 *
+		 * FIXME: We want INSERT_SORT_MERGE for non-FLUSH/FUA
+		 * requests but it's currently broken.
 		 */
 		if (rq->cmd_flags & (REQ_FLUSH | REQ_FUA))
 			__elv_add_request(q, rq, ELEVATOR_INSERT_FLUSH);
 		else
-			__elv_add_request(q, rq, ELEVATOR_INSERT_SORT_MERGE);
+			__elv_add_request(q, rq, ELEVATOR_INSERT_SORT);
 
 		depth++;
 	}
diff --git a/block/elevator.c b/block/elevator.c
index 99838f4..c32f5bc 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -644,6 +644,11 @@  void __elv_add_request(struct request_queue *q, struct request *rq, int where)
 
 	rq->q = q;
 
+	/*
+	 * FIXME: INSERT_SORT_MERGE is broken and blk_flush_plug_list(),
+	 * the only user, is updated to use INSERT_SORT for now.
+	 */
+
 	if (rq->cmd_flags & REQ_SOFTBARRIER) {
 		/* barriers are scheduling boundary, update end_sector */
 		if (rq->cmd_type == REQ_TYPE_FS ||