diff mbox

[Bugme-new,Bug,39372] New: Problems with HFSC Scheduler

Message ID 1311948052.2843.19.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Eric Dumazet July 29, 2011, 2 p.m. UTC
Le vendredi 29 juillet 2011 à 15:27 +0200, Eric Dumazet a écrit :
> Le vendredi 29 juillet 2011 à 14:29 +0200, Michal Soltys a écrit :
> > On 11-07-15 00:14, Andrew Morton wrote:
> > > 
> > > (switched to email.  Please respond via emailed reply-to-all, not via
> > > the bugzilla web interface).
> > > 
> > > 
> > > Here: WARN_ON(next_time == 0);
> > > 
> > 
> > From the other thread on netfilter-devel:
> > 
> > > On 11-07-22 11:58, Michal Pokrywka wrote: After bisecting 2.6.39.1 it
> > > turned out that the bug is caused independently by two patches:
> > > 
> > > commit b262a5da755cc6ed0cb4fba230cd9bf4037e1096 sch_sfq: fix peek()
> > > implementation
> > > 
> > > and
> > > 
> > > commit 9df49f2bfe862573911a080c75a6d81113c5c81d sch_sfq: avoid giving
> > > spurious NET_XMIT_CN signals
> > > 
> > > Reverting these patches makes HFSC work again.
> > > 
> > 
> > This one (upstream 8efa885406359af300d46910642b50ca82c0fe47) seems to be
> > the culprit (does reverting only that one cures the problem ?)
> > 
> > It allows SFQ to return success on enqueuing, when the packet really
> > replaced some other packet in some other flow. This confuses outer qdisc
> > (in this particular case HFSC) which thinks new packet was actually
> > added each time such situation happes.
> > 
> 
> Technically speaking, _this_ packet was successfuly enqueued.
> 
> Returning NET_XMIT_CN or NET_XMIT_SUCCESS should not trigger a bug in
> caller.
> 
> > This in turn causes additional dequeues and ends with attempt
> > to schedule non-existent packets, and triggers the warning.
> > 
> 
> Then its probably a bug in HFSC : It doesnt understand SFQ lost a
> packet.
> 
> I'll take a look, thanks for the report.
> 
> 

Oh well, it seems one qdisc_tree_decrease_qlen(sch, 1) is missing

Maybe following patch would help...




--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Patrick McHardy July 29, 2011, 2:11 p.m. UTC | #1
On 29.07.2011 16:00, Eric Dumazet wrote:
> Le vendredi 29 juillet 2011 à 15:27 +0200, Eric Dumazet a écrit :
>> Le vendredi 29 juillet 2011 à 14:29 +0200, Michal Soltys a écrit :
>>> On 11-07-15 00:14, Andrew Morton wrote:
>>>>
>>>> (switched to email.  Please respond via emailed reply-to-all, not via
>>>> the bugzilla web interface).
>>>>
>>>>
>>>> Here: WARN_ON(next_time == 0);
>>>>
>>>
>>> From the other thread on netfilter-devel:
>>>
>>>> On 11-07-22 11:58, Michal Pokrywka wrote: After bisecting 2.6.39.1 it
>>>> turned out that the bug is caused independently by two patches:
>>>>
>>>> commit b262a5da755cc6ed0cb4fba230cd9bf4037e1096 sch_sfq: fix peek()
>>>> implementation
>>>>
>>>> and
>>>>
>>>> commit 9df49f2bfe862573911a080c75a6d81113c5c81d sch_sfq: avoid giving
>>>> spurious NET_XMIT_CN signals
>>>>
>>>> Reverting these patches makes HFSC work again.
>>>>
>>>
>>> This one (upstream 8efa885406359af300d46910642b50ca82c0fe47) seems to be
>>> the culprit (does reverting only that one cures the problem ?)
>>>
>>> It allows SFQ to return success on enqueuing, when the packet really
>>> replaced some other packet in some other flow. This confuses outer qdisc
>>> (in this particular case HFSC) which thinks new packet was actually
>>> added each time such situation happes.
>>>
>>
>> Technically speaking, _this_ packet was successfuly enqueued.
>>
>> Returning NET_XMIT_CN or NET_XMIT_SUCCESS should not trigger a bug in
>> caller.
>>
>>> This in turn causes additional dequeues and ends with attempt
>>> to schedule non-existent packets, and triggers the warning.
>>>
>>
>> Then its probably a bug in HFSC : It doesnt understand SFQ lost a
>> packet.
>>
>> I'll take a look, thanks for the report.
>>
>>
> 
> Oh well, it seems one qdisc_tree_decrease_qlen(sch, 1) is missing
> 
> Maybe following patch would help...
> 
> 
> diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
> index 4536ee6..2a2d287 100644
> --- a/net/sched/sch_sfq.c
> +++ b/net/sched/sch_sfq.c
> @@ -410,7 +410,12 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
>  	/* Return Congestion Notification only if we dropped a packet
>  	 * from this flow.
>  	 */
> -	return (qlen != slot->qlen) ? NET_XMIT_CN : NET_XMIT_SUCCESS;
> +	if (qlen != slot->qlen)
> +		return NET_XMIT_CN;
> +
> +	/* as we dropped a packet, better let upper stack know this */
> +	qdisc_tree_decrease_qlen(sch, 1);
> +	return NET_XMIT_SUCCESS;
>  }
>  

Yeah, that seems to be the correct fix, thanks for looking into this.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 4536ee6..2a2d287 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -410,7 +410,12 @@  sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	/* Return Congestion Notification only if we dropped a packet
 	 * from this flow.
 	 */
-	return (qlen != slot->qlen) ? NET_XMIT_CN : NET_XMIT_SUCCESS;
+	if (qlen != slot->qlen)
+		return NET_XMIT_CN;
+
+	/* as we dropped a packet, better let upper stack know this */
+	qdisc_tree_decrease_qlen(sch, 1);
+	return NET_XMIT_SUCCESS;
 }
 
 static struct sk_buff *