
[block/for-3.3/core] block: an exiting task should be allowed to create io_context

Message ID 20120105012445.GP31746@google.com
State Not Applicable
Delegated to: David Miller

Commit Message

Tejun Heo Jan. 5, 2012, 1:24 a.m. UTC
Hello, again.

On Tue, Jan 03, 2012 at 02:35:05PM -0800, Tejun Heo wrote:
> > Hmmm... I disabled cfqq merge logic (commented out
> > cfq_close_cooperator() and the following cfq_setup_merge() calls) in
> > cfq_select_queue() and neither is triggering for quite a while now.
> > Maybe cfqq refcnt is getting borked over cfqq merging / splitting?  It
> > would also explain the low frequency of the issue.  I'll try to
> > further isolate it but it would be awesome if someone more familiar
> > with the logic can go over that part.
> 
> Scrap that.  It triggered and yeah cfq_get_next_queue() is retrieving
> empty cfqq from the service tree.
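The symptom described here, where the service tree hands out a queue that has no requests left on it, amounts to a broken invariant: anything still linked on the service tree should have work pending. Below is a minimal userspace sketch of that invariant only. `struct toy_queue`, `toy_tree_pick()`, `toy_queue_complete_one()` and the flat array standing in for CFQ's rbtree-based service tree are hypothetical simplifications, not CFQ's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cfqq: an ordering key, a count of queued
 * requests, and whether it is currently linked on the service tree. */
struct toy_queue {
	unsigned long key;	/* service-tree ordering key (lower = sooner) */
	int nr_pending;		/* requests still queued on this queue */
	bool on_tree;		/* linked on the service tree? */
};

/* Pick the on-tree queue with the lowest key, like taking the leftmost
 * node of an rbtree. Returns NULL if no queue is on the tree. */
static struct toy_queue *toy_tree_pick(struct toy_queue *qs, size_t n)
{
	struct toy_queue *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (!qs[i].on_tree)
			continue;
		if (!best || qs[i].key < best->key)
			best = &qs[i];
	}
	return best;
}

/* Complete one request; a queue that runs dry must also leave the tree. */
static void toy_queue_complete_one(struct toy_queue *q)
{
	assert(q->nr_pending > 0);
	if (--q->nr_pending == 0)
		q->on_tree = false;
}

/* The invariant whose violation was observed: every queue that is still
 * on the service tree has at least one pending request. */
static bool toy_tree_consistent(const struct toy_queue *qs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (qs[i].on_tree && qs[i].nr_pending == 0)
			return false;
	return true;
}
```

In this model the bug being chased corresponds to some path that leaves an emptied queue linked on the tree, so a later pick returns it.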

Update: I've been running with ELEVATOR_INSERT_SORT_MERGE disabled and
none of the issues has triggered for about three hours now.  It usually
triggers in under an hour.

Hugh, can you please verify whether the patch appended at the end
makes the problem go away?

Thanks.

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Hugh Dickins Jan. 5, 2012, 6:36 p.m. UTC | #1
On Wed, 4 Jan 2012, Tejun Heo wrote:
> On Tue, Jan 03, 2012 at 02:35:05PM -0800, Tejun Heo wrote:
> > > Hmmm... I disabled cfqq merge logic (commented out
> > > cfq_close_cooperator() and the following cfq_setup_merge() calls) in
> > > cfq_select_queue() and neither is triggering for quite a while now.
> > > Maybe cfqq refcnt is getting borked over cfqq merging / splitting?  It
> > > would also explain the low frequency of the issue.  I'll try to
> > > further isolate it but it would be awesome if someone more familiar
> > > with the logic can go over that part.
> > 
> > Scrap that.  It triggered and yeah cfq_get_next_queue() is retrieving
> > empty cfqq from the service tree.
> 
> Update: I've been running with ELEVATOR_INSERT_SORT_MERGE disabled and
> none of the issues has triggered for about three hours now.  It usually
> triggers in under an hour.
> 
> Hugh, can you please verify whether the patch appended at the end
> makes the problem go away?

I can vouch that it makes the list_debug.c:53 warnings go away: they
used to appear within half an hour, but I've run on two machines for 10
hours so far without seeing any.
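For context, list_debug.c is the kernel's CONFIG_DEBUG_LIST support: before linking or unlinking a list node it checks that the neighbouring pointers are consistent and warns on a mismatch. The thread does not say which check sits at line 53, so the following is only a simplified userspace sketch of the deletion-side check. `struct toy_list`, `toy_list_del()` and the message text are hypothetical and loosely modelled on the kernel's, not copied from it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Minimal circular doubly linked list node, shaped like the kernel's
 * struct list_head. */
struct toy_list {
	struct toy_list *next, *prev;
};

static void toy_list_init(struct toy_list *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert entry right after head. */
static void toy_list_add(struct toy_list *entry, struct toy_list *head)
{
	assert(entry != head);
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Unlink entry, but only after checking that both neighbours still point
 * back at it. On a mismatch, warn and leave the list untouched rather than
 * spreading the corruption. Returns true if the entry was removed. */
static bool toy_list_del(struct toy_list *entry)
{
	struct toy_list *prev = entry->prev, *next = entry->next;

	if (prev->next != entry) {
		fprintf(stderr, "list_del corruption: prev->next should be %p, but was %p\n",
			(void *)entry, (void *)prev->next);
		return false;
	}
	if (next->prev != entry) {
		fprintf(stderr, "list_del corruption: next->prev should be %p, but was %p\n",
			(void *)entry, (void *)next->prev);
		return false;
	}
	next->prev = prev;
	prev->next = next;
	toy_list_init(entry);	/* leave the removed node self-linked */
	return true;
}
```

A warning from a check like this means some node was already unlinked, or relinked elsewhere, while another list still pointed at it, which is consistent with a request or queue being handled twice.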

And I've not yet seen a cfq-iosched.c:2585 with that patch in; but
those were more elusive - we'll have to run until tomorrow night
before concluding that it makes those go away too.

Or if you understand the connection between these, and would prefer
to assume that it fixes them both, and want me to try another patch
instead (I assume the one I'm running with is a workaround to narrow
down the issue, rather than an intended fix?), let me know and I'll
switch over.

Hugh

> 
> Thanks.
> 
> diff --git a/block/elevator.c b/block/elevator.c
> index 99838f4..2e172d2 100644
> --- a/block/elevator.c
> +++ b/block/elevator.c
> @@ -644,6 +644,9 @@ void __elv_add_request(struct request_queue *q, struct request *rq, int where)
>  
>  	rq->q = q;
>  
> +	if (where == ELEVATOR_INSERT_SORT_MERGE)
> +		where = ELEVATOR_INSERT_SORT;
> +
>  	if (rq->cmd_flags & REQ_SOFTBARRIER) {
>  		/* barriers are scheduling boundary, update end_sector */
>  		if (rq->cmd_type == REQ_TYPE_FS ||
> 
Tejun Heo Jan. 5, 2012, 6:38 p.m. UTC | #2
Hello, Hugh.

On Thu, Jan 05, 2012 at 10:36:21AM -0800, Hugh Dickins wrote:
> I can vouch that it makes the list_debug.c:53 warnings go away: they
> used to come in half an hour, but I've run on two machines for 10
> hours so far without seeing any.

Yeah, my overnight test is pretty conclusive too.  I can usually hit
the bug in under 30 minutes without the patch.

> And I've not yet seen a cfq-iosched.c:2585 with that patch in; but
> those were more elusive - we'll have to run until tomorrow night
> before concluding that it makes those go away too.
> 
> Or if you understand the connection between these, and would prefer
> to assume that it fixes them both, and want me to try another patch
> instead (I assume the one I'm running with is a workaround to narrow
> down the issue, rather than an intended fix?), let me know and I'll
> switch over.

Yeap, I can see the connections now and think I know where the bug is
too.  I'll hopefully be able to post a fix soon.

Thanks.

Patch

diff --git a/block/elevator.c b/block/elevator.c
index 99838f4..2e172d2 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -644,6 +644,9 @@ void __elv_add_request(struct request_queue *q, struct request *rq, int where)
 
 	rq->q = q;
 
+	if (where == ELEVATOR_INSERT_SORT_MERGE)
+		where = ELEVATOR_INSERT_SORT;
+
 	if (rq->cmd_flags & REQ_SOFTBARRIER) {
 		/* barriers are scheduling boundary, update end_sector */
 		if (rq->cmd_type == REQ_TYPE_FS ||
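The patch turns off insert-time merging by rewriting ELEVATOR_INSERT_SORT_MERGE into plain ELEVATOR_INSERT_SORT before __elv_add_request() acts on `where`. Assuming, as in block-layer code of this period, that SORT_MERGE means "first try to merge the new request into one already queued, otherwise do a sorted insert", the effect can be modelled in userspace as below. The enum values, `struct toy_req`, `struct toy_elv_queue`, `toy_add_request()` and the back-merge-only rule are hypothetical simplifications, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical insert modes; the real constants live in
 * include/linux/elevator.h and their values may differ. */
enum toy_where { TOY_INSERT_SORT, TOY_INSERT_SORT_MERGE };

struct toy_req {
	unsigned long sector;		/* first sector */
	unsigned long nr_sectors;	/* length in sectors */
};

#define TOY_QUEUE_MAX 16

struct toy_elv_queue {
	struct toy_req reqs[TOY_QUEUE_MAX];	/* kept sorted by start sector */
	size_t nr;
	bool disable_insert_merge;		/* models the workaround patch */
};

/* Try to back-merge rq into a queued request that ends where rq starts. */
static bool toy_attempt_insert_merge(struct toy_elv_queue *q, const struct toy_req *rq)
{
	for (size_t i = 0; i < q->nr; i++) {
		struct toy_req *cur = &q->reqs[i];

		if (cur->sector + cur->nr_sectors == rq->sector) {
			cur->nr_sectors += rq->nr_sectors;
			return true;
		}
	}
	return false;
}

static void toy_add_request(struct toy_elv_queue *q, const struct toy_req *rq,
			    enum toy_where where)
{
	/* The workaround: never take the insert-merge path. */
	if (q->disable_insert_merge && where == TOY_INSERT_SORT_MERGE)
		where = TOY_INSERT_SORT;

	if (where == TOY_INSERT_SORT_MERGE && toy_attempt_insert_merge(q, rq))
		return;		/* merged into an existing request; nothing new queued */

	/* Sorted insert by start sector. */
	assert(q->nr < TOY_QUEUE_MAX);
	size_t i = q->nr;
	while (i > 0 && q->reqs[i - 1].sector > rq->sector) {
		q->reqs[i] = q->reqs[i - 1];
		i--;
	}
	q->reqs[i] = *rq;
	q->nr++;
}
```

With the remap in place, two adjacent requests stay separate in the queue instead of being merged at insert time. That hides whatever goes wrong on the merge path without explaining it, which fits Hugh's reading of the patch as a diagnostic workaround rather than the intended fix.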