Patchwork [BUGFIX] pkt_sched: fix little service anomalies and possible crashes of qfq+

login
register
mail settings
Submitter Paolo Valente
Date Feb. 26, 2013, 5:02 p.m.
Message ID <1361898166-18295-1-git-send-email-paolo.valente@unimore.it>
Download mbox | patch
Permalink /patch/223358/
State Changes Requested
Delegated to: David Miller
Headers show

Comments

Paolo Valente - Feb. 26, 2013, 5:02 p.m.
This is a new version of the patch, with comments reformatted as
requested, and reviewed by Fabio (one of the authors of QFQ). This
version also includes some little, non-functional code improvements
suggested by Fabio.

The patch contains five fixes that remove some little service
anomalies, plus an improvement that slightly reduces the execution
time of qfq+.

As for possible crashes, in a stress test in which a few hundreds of
classes randomly switched from backlogged to non-backlogged, and the
parameters of a few tens of randomly chosen classes were changed
repeatedly and in parallel, the problems solved by these fixes
happened to combine and cause corruption of the data structures
supporting the group bucket lists.

The portions of the code interested by each fix are small and do not
overlap with each other, so I decided to provide just one patch
(I hope that this was the right choice).

1. Fix the update of eligible-group sets

Between two invocations of make_eligible, the system virtual time may
happen to grow enough that, in its binary representation, a bit with
higher order than 31 flips. This happens especially with
TSO/GSO. Before this fix, the mask used in make_eligible was computed
as (1UL<<index_of_last_flipped_bit)-1, whose value is well defined on
a 64-bit architecture, because index_of_flipped_bit <= 63, but is in
general undefined on a 32-bit architecture if index_of_flipped_bit > 31.
The fix just replaces 1UL with 1ULL.

2. Properly cap timestamps in charge_actual_service

After decreasing the number of classes, and hence the maximum budget
budgetmax for an aggregate in service, it may happen that the service
it receives before a new aggregate is chosen for service becomes
larger than budgetmax. This may cause the aggregate to be assigned an
incorrect, too high virtual finish time. The cap introduced by this
fix solves this problem.

3. Do not allow virtual time to jump if an aggregate is in service

By definition of (the algorithm of) qfq+, the system virtual time must
be pushed up only if there is no eligible aggregate. But to decide
whether this condition holds correctly, qfq+ must also check whether
no aggregate is in service (and hence eligible).

4. Prevent the budget of an aggregate from becoming negative in
qfq_dequeue

If lmax is lowered, through qfq_change_class, for a class owning
pending packets with larger size than the new value of lmax, then even
the budget of a just selected-aggregate may happen to be lower than
the length of the next packet to serve. This fix prevents the budget
from becoming negative after the packet is dequeued.

5. Start serving a just-activated aggregate immediately if the
scheduler is empty

This avoids qfq+ to be occasionally non-work-conserving.

6. Remove useless invocation of qfq_update_eligible from qfq_deactivate_agg

There is no need to invoke qfq_update_eligible in qfq_deactivate_agg
to keep qfq+ work-conserving.

Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Reviewed-by: Fabio Checconi <fchecconi@gmail.com>
---
 net/sched/sch_qfq.c |   66 +++++++++++++++++++++++++++++++++++----------------
 1 file changed, 45 insertions(+), 21 deletions(-)
David Miller - Feb. 26, 2013, 10:37 p.m.
From: Paolo valente <paolo.valente@unimore.it>
Date: Tue, 26 Feb 2013 18:02:46 +0100

> The portions of the code interested by each fix are small and do not
> overlap with each other, so I decided to provide just one patch
> (I hope that this was the right choice).

Please split this up into 6 patches, each with an appropriately
verbose analysis and explanation of each bug being fixed, thanks.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paolo Valente - March 5, 2013, 6:04 p.m.
Il 26/02/2013 23:37, David Miller ha scritto:
> From: Paolo valente <paolo.valente@unimore.it>
> Date: Tue, 26 Feb 2013 18:02:46 +0100
> 
>> The portions of the code interested by each fix are small and do not
>> overlap with each other, so I decided to provide just one patch
>> (I hope that this was the right choice).
> 
> Please split this up into 6 patches, each with an appropriately
> verbose analysis and explanation of each bug being fixed, thanks.
> 
> 

Split, and inserted a detailed description of both the problem and the fix
in each patch.

Paolo valente (6):
  pkt_sched: properly cap timestamps in charge_actual_service
  pkt_sched: fix the update of eligible-group sets
  pkt_sched: serve activated aggregates immediately if the scheduler is
    empty
  pkt_sched: prevent budget from wrapping around after a dequeue
  pkt_sched: do not allow virtual time to jump if an aggregate is in
    service
  pkt_sched: remove a useless invocation of qfq_update_eligible

 net/sched/sch_qfq.c |   66 +++++++++++++++++++++++++++++++++++----------------
 1 file changed, 45 insertions(+), 21 deletions(-)
David Miller - March 6, 2013, 4:50 a.m.
From: Paolo valente <paolo.valente@unimore.it>
Date: Tue,  5 Mar 2013 19:04:56 +0100

> Split, and inserted a detailed description of both the problem and the fix
> in each patch.

Series applied, thanks.

Although two topics for possibly resolving later:

1) That 1ULL bit mask fix is quite expensive on 32-bit, it would
   probably be cheaper to test for that case using a helper function
   that nops out on 64-bit.  Although this is not so important.

2) That static inline forward declaration is ugly, better to remove
   the inline tag (let the compiler handle it) or move the function
   above all the call sites.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paolo Valente - July 10, 2013, 1:46 p.m.
Hi,
this patchset tries to address the two issues raised by Dave in
http://marc.info/?l=linux-kernel&m=136254542019113&w=2

The first of these issues has been also raised by
David Laight:
http://marc.info/?l=linux-kernel&m=136256446624557&w=2

> Although two topics for possibly resolving later:
> 
> 1) That 1ULL bit mask fix is quite expensive on 32-bit, it would
>     probably be cheaper to test for that case using a helper function
>     that nops out on 64-bit.  Although this is not so important.
> 
I have tried to find a simple solution, based on the properties of the mask.
It is in the first patch.

> 2) That static inline forward declaration is ugly, better to remove
>     the inline tag (let the compiler handle it) or move the function
>     above all the call sites.
> 
Done by the second patch.

Paolo Valente (2):
  pkt_sched: sch_qfq: improve efficiency of make_eligible
  pkt_sched: sch_qfq: remove forward declaration of qfq_update_agg_ts

 net/sched/sch_qfq.c | 127 ++++++++++++++++++++++++++--------------------------
 1 file changed, 63 insertions(+), 64 deletions(-)
David Miller - July 11, 2013, 8:01 p.m.
From: Paolo Valente <paolo.valente@unimore.it>
Date: Wed, 10 Jul 2013 15:46:07 +0200

> Hi,
> this patchset tries to address the two issues raised by Dave in
> http://marc.info/?l=linux-kernel&m=136254542019113&w=2
> 
> The first of these issues has been also raised by
> David Laight:
> http://marc.info/?l=linux-kernel&m=136256446624557&w=2
> 
>> Although two topics for possibly resolving later:
>> 
>> 1) That 1ULL bit mask fix is quite expensive on 32-bit, it would
>>     probably be cheaper to test for that case using a helper function
>>     that nops out on 64-bit.  Although this is not so important.
>> 
> I have tried to find a simple solution, based on the properties of the mask.
> It is in the first patch.
> 
>> 2) That static inline forward declaration is ugly, better to remove
>>     the inline tag (let the compiler handle it) or move the function
>>     above all the call sites.
>> 
> Done by the second patch.

Looks good, thanks for doing this, both applied.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index 6ed3765..7cfa1f8 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -299,6 +299,10 @@  static void qfq_update_agg(struct qfq_sched *q, struct qfq_aggregate *agg,
 	    new_num_classes == q->max_agg_classes - 1) /* agg no more full */
 		hlist_add_head(&agg->nonfull_next, &q->nonfull_aggs);
 
+	/* The next assignment may let
+	 * agg->initial_budget > agg->budgetmax
+	 * hold, we will take it into account in charge_actual_service().
+	 */
 	agg->budgetmax = new_num_classes * agg->lmax;
 	new_agg_weight = agg->class_weight * new_num_classes;
 	agg->inv_w = ONE_FP/new_agg_weight;
@@ -819,7 +823,7 @@  static void qfq_make_eligible(struct qfq_sched *q)
 	unsigned long old_vslot = q->oldV >> q->min_slot_shift;
 
 	if (vslot != old_vslot) {
-		unsigned long mask = (1UL << fls(vslot ^ old_vslot)) - 1;
+		unsigned long mask = (1ULL << fls(vslot ^ old_vslot)) - 1;
 		qfq_move_groups(q, mask, IR, ER);
 		qfq_move_groups(q, mask, IB, EB);
 	}
@@ -990,12 +994,23 @@  static inline struct sk_buff *qfq_peek_skb(struct qfq_aggregate *agg,
 /* Update F according to the actual service received by the aggregate. */
 static inline void charge_actual_service(struct qfq_aggregate *agg)
 {
-	/* compute the service received by the aggregate */
-	u32 service_received = agg->initial_budget - agg->budget;
+	/* Compute the service received by the aggregate, taking into
+	 * account that, after decreasing the number of classes in
+	 * agg, it may happen that
+	 * agg->initial_budget - agg->budget > agg->bugdetmax
+	 */
+	u32 service_received = min(agg->budgetmax,
+				   agg->initial_budget - agg->budget);
 
 	agg->F = agg->S + (u64)service_received * agg->inv_w;
 }
 
+static inline void qfq_update_agg_ts(struct qfq_sched *q,
+				     struct qfq_aggregate *agg,
+				     enum update_reason reason);
+
+static void qfq_schedule_agg(struct qfq_sched *q, struct qfq_aggregate *agg);
+
 static struct sk_buff *qfq_dequeue(struct Qdisc *sch)
 {
 	struct qfq_sched *q = qdisc_priv(sch);
@@ -1023,7 +1038,7 @@  static struct sk_buff *qfq_dequeue(struct Qdisc *sch)
 		in_serv_agg->initial_budget = in_serv_agg->budget =
 			in_serv_agg->budgetmax;
 
-		if (!list_empty(&in_serv_agg->active))
+		if (!list_empty(&in_serv_agg->active)) {
 			/*
 			 * Still active: reschedule for
 			 * service. Possible optimization: if no other
@@ -1034,8 +1049,9 @@  static struct sk_buff *qfq_dequeue(struct Qdisc *sch)
 			 * handle it, we would need to maintain an
 			 * extra num_active_aggs field.
 			*/
-			qfq_activate_agg(q, in_serv_agg, requeue);
-		else if (sch->q.qlen == 0) { /* no aggregate to serve */
+			qfq_update_agg_ts(q, in_serv_agg, requeue);
+			qfq_schedule_agg(q, in_serv_agg);
+		} else if (sch->q.qlen == 0) { /* no aggregate to serve */
 			q->in_serv_agg = NULL;
 			return NULL;
 		}
@@ -1054,7 +1070,15 @@  static struct sk_buff *qfq_dequeue(struct Qdisc *sch)
 	qdisc_bstats_update(sch, skb);
 
 	agg_dequeue(in_serv_agg, cl, len);
-	in_serv_agg->budget -= len;
+	/* If lmax is lowered, through qfq_change_class, for a class
+	 * owning pending packets with larger size than the new value
+	 * of lmax, then the following condition may hold.
+	 */
+	if (unlikely(in_serv_agg->budget < len))
+		in_serv_agg->budget = 0;
+	else
+		in_serv_agg->budget -= len;
+
 	q->V += (u64)len * IWSUM;
 	pr_debug("qfq dequeue: len %u F %lld now %lld\n",
 		 len, (unsigned long long) in_serv_agg->F,
@@ -1219,17 +1243,11 @@  static int qfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	cl->deficit = agg->lmax;
 	list_add_tail(&cl->alist, &agg->active);
 
-	if (list_first_entry(&agg->active, struct qfq_class, alist) != cl)
-		return err; /* aggregate was not empty, nothing else to do */
+	if (list_first_entry(&agg->active, struct qfq_class, alist) != cl ||
+	    q->in_serv_agg == agg)
+		return err; /* non-empty or in service, nothing else to do */
 
-	/* recharge budget */
-	agg->initial_budget = agg->budget = agg->budgetmax;
-
-	qfq_update_agg_ts(q, agg, enqueue);
-	if (q->in_serv_agg == NULL)
-		q->in_serv_agg = agg;
-	else if (agg != q->in_serv_agg)
-		qfq_schedule_agg(q, agg);
+	qfq_activate_agg(q, agg, enqueue);
 
 	return err;
 }
@@ -1263,7 +1281,8 @@  static void qfq_schedule_agg(struct qfq_sched *q, struct qfq_aggregate *agg)
 		/* group was surely ineligible, remove */
 		__clear_bit(grp->index, &q->bitmaps[IR]);
 		__clear_bit(grp->index, &q->bitmaps[IB]);
-	} else if (!q->bitmaps[ER] && qfq_gt(roundedS, q->V))
+	} else if (!q->bitmaps[ER] && qfq_gt(roundedS, q->V) &&
+		   q->in_serv_agg == NULL)
 		q->V = roundedS;
 
 	grp->S = roundedS;
@@ -1286,8 +1305,15 @@  skip_update:
 static void qfq_activate_agg(struct qfq_sched *q, struct qfq_aggregate *agg,
 			     enum update_reason reason)
 {
+	agg->initial_budget = agg->budget = agg->budgetmax; /* recharge budg. */
+
 	qfq_update_agg_ts(q, agg, reason);
-	qfq_schedule_agg(q, agg);
+	if (q->in_serv_agg == NULL) { /* no aggr. in service or scheduled */
+		q->in_serv_agg = agg; /* start serving this aggregate */
+		 /* update V: to be in service, agg must be eligible */
+		q->oldV = q->V = agg->S;
+	} else if (agg != q->in_serv_agg)
+		qfq_schedule_agg(q, agg);
 }
 
 static void qfq_slot_remove(struct qfq_sched *q, struct qfq_group *grp,
@@ -1359,8 +1385,6 @@  static void qfq_deactivate_agg(struct qfq_sched *q, struct qfq_aggregate *agg)
 			__set_bit(grp->index, &q->bitmaps[s]);
 		}
 	}
-
-	qfq_update_eligible(q);
 }
 
 static void qfq_qlen_notify(struct Qdisc *sch, unsigned long arg)