
[net-next,v2,1/2] pkt_sched: fq: avoid artificial bursts for clocked flows

Message ID 1422992932-9893-1-git-send-email-kennetkl@ifi.uio.no
State Changes Requested, archived
Delegated to: David Miller

Commit Message

Kenneth Klette Jonassen Feb. 3, 2015, 7:48 p.m. UTC
Current pacing behavior always throttles flows for a time equal to one full
quantum, starting when a flow exhausts its credit. This is only optimal for
bursty traffic that consumes the entire credit in one sitting.

For flows with small packets, this throttling behavior can cause packets
to artificially queue and transmit in bursts, even when their sending rate
is well below their pacing rate. There is a refill mechanism in fq_enqueue()
that counteracts this in some cases, but it only works when flows space
their packets further apart than the flow refill delay.

Keep track of the time a flow's credit was last filled, and use this to
approximate a full credit refill when one quantum of time passes. This is
a more fine-grained approach than the refill heuristic in fq_enqueue(),
which is removed by the next patch in this series.

Since calls to ktime_get_ns() are expensive, time_credit_filled is only set
accurately on dequeue. For new flows, set time_credit_filled to zero and
anticipate that the first dequeue refills one quantum without throttling.
This approach requires that initial_quantum >= quantum.
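
[Editor's note: the intended arithmetic can be sketched as a small userspace
model. This is illustrative only: the field names mirror the patch, but the
rbtree bookkeeping, jiffies-based aging and qdisc plumbing of the real
sch_fq are omitted.]

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative userspace model of the pacing arithmetic in this patch.
 * Field names mirror struct fq_flow; everything else is elided. */
struct flow {
	int32_t  credit;              /* bytes the flow may still send */
	uint64_t time_credit_filled;  /* ns time credit was last refilled */
	uint64_t time_next_packet;    /* ns time the flow may transmit again */
};

static uint64_t max_u64(uint64_t a, uint64_t b)
{
	return a > b ? a : b;
}

/* On credit exhaustion, add one quantum and record the refill time, so
 * the pacing horizon is anchored to the refill rather than to "now". */
static void refill_credit(struct flow *f, uint32_t quantum, uint64_t now)
{
	f->credit += quantum;
	f->time_credit_filled = max_u64(now, f->time_next_packet);
}

/* Charge one packet against the credit and pace relative to the refill
 * time: a flow sending small packets below its pacing rate sees
 * time_next_packet stay in the past, so it is never throttled. */
static void charge_packet(struct flow *f, uint32_t len_bytes,
			  uint64_t rate_Bps, uint64_t now)
{
	uint64_t len_ns = (uint64_t)len_bytes * 1000000000ULL / rate_Bps;

	f->credit -= (int32_t)len_bytes;
	f->time_next_packet = max_u64(now, f->time_credit_filled + len_ns);
}
```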

Increases memory footprint from 104 to 112 bytes per flow.

V2: avoids ktime_get_ns() on enqueue, as suggested by Eric Dumazet.

Signed-off-by: Kenneth Klette Jonassen <kennetkl@ifi.uio.no>
---
 net/sched/sch_fq.c | 27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)

Comments

Eric Dumazet Feb. 3, 2015, 10:06 p.m. UTC | #1
On Tue, 2015-02-03 at 20:48 +0100, Kenneth Klette Jonassen wrote:
>  
> -	if (tb[TCA_FQ_INITIAL_QUANTUM])
> -		q->initial_quantum = nla_get_u32(tb[TCA_FQ_INITIAL_QUANTUM]);
> +	if (tb[TCA_FQ_INITIAL_QUANTUM]) {
> +		u32 initial_quantum = nla_get_u32(tb[TCA_FQ_INITIAL_QUANTUM]);
> +
> +		if (initial_quantum > q->quantum)
> +			q->initial_quantum = initial_quantum - q->quantum;
> +		else
> +			q->initial_quantum = 0;
> +	}

Please do not mix patches. This part has already been Acked by me.

Quite frankly I do not like this patch.

There is no rush, and you touch some critical infrastructure for us.

Always CC me on any fq change, do not assume I catch all netdev
messages.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kenneth Klette Jonassen Feb. 4, 2015, 5:56 p.m. UTC | #2
>> -	if (tb[TCA_FQ_INITIAL_QUANTUM])
>> -		q->initial_quantum = nla_get_u32(tb[TCA_FQ_INITIAL_QUANTUM]);
>> +	if (tb[TCA_FQ_INITIAL_QUANTUM]) {
>> +		u32 initial_quantum = nla_get_u32(tb[TCA_FQ_INITIAL_QUANTUM]);
>> +
>> +		if (initial_quantum > q->quantum)
>> +			q->initial_quantum = initial_quantum - q->quantum;
>> +		else
>> +			q->initial_quantum = 0;
>> +	}
>
> Please do not mix patches. This part has already been Acked by me.

The hang fix touches another part of fq. This part is specific to this patch.

> Quite frankly I do not like this patch.

If it is a code issue, please refactor it or point out some specifics.

Thanks for cc’ing me on your related patch (“do not pace pure ack packets”). Note that your patch does not address the core issue that this patch attempts to fix: fq throttling is just broken for small, clocked flows. This not only concerns TCP/SCTP/ARQ acks, but also thin-streams like VoIP and gaming.
Eric Dumazet Feb. 4, 2015, 6:47 p.m. UTC | #3
On Wed, 2015-02-04 at 17:56 +0000, Kenneth Klette Jonassen wrote:

> 
> If it is a code issue, please refactor it or point out some specifics.
> 
> Thanks for cc’ing me on your related patch (“do not pace pure ack
> packets”). Note that your patch does not address the core issue that
> this patch attempts to fix: fq throttling is just broken for small,
> clocked flows. This not only concerns TCP/SCTP/ARQ acks, but also
> thin-streams like VoIP and gaming.


I already told you this patch was not wanted.

We want a reasonably efficient packet scheduler, allowing TCP pacing,
at the scale of millions of flows.

You can reduce quantum to whatever you need.
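
[Editor's note: for readers following along, the quantum can be lowered from
userspace with tc. A sketch only: "eth0" and the 1514-byte value are
placeholders, not recommendations.]

```shell
# Example only: lower fq's quantum to roughly one Ethernet frame on eth0.
# Requires root; "eth0" is a placeholder device name.
tc qdisc replace dev eth0 root fq quantum 1514
# Inspect the resulting configuration and per-qdisc statistics:
tc -s qdisc show dev eth0
```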

We have an internal patch doing something really different,
without adding 8 bytes of overhead to the flow structure.

We will upstream it when fully tested.

commit 799735e5d08b3a7fc19176eee316ca647c9bad32
Author: Eric Dumazet <edumazet@google.com>
Date:   Mon Aug 11 14:35:36 2014 -0700

    net-sched-fq: special case low rate flows
    
    Effort: net-sched-fq
    Change-Id: Ibee17453702cccf729154f6b4db7aeb028fc555a




Patch

diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
index 313794b..43d5b74 100644
--- a/net/sched/sch_fq.c
+++ b/net/sched/sch_fq.c
@@ -71,6 +71,7 @@  struct fq_flow {
 
 	struct rb_node  rate_node;	/* anchor in q->delayed tree */
 	u64		time_next_packet;
+	u64		time_credit_filled;
 };
 
 struct fq_flow_head {
@@ -250,6 +251,7 @@  static struct fq_flow *fq_classify(struct sk_buff *skb, struct fq_sched_data *q)
 			if (unlikely(skb->sk &&
 				     f->socket_hash != sk->sk_hash)) {
 				f->credit = q->initial_quantum;
+				f->time_credit_filled = 0ULL;
 				f->socket_hash = sk->sk_hash;
 				f->time_next_packet = 0ULL;
 			}
@@ -271,6 +273,7 @@  static struct fq_flow *fq_classify(struct sk_buff *skb, struct fq_sched_data *q)
 	if (skb->sk)
 		f->socket_hash = sk->sk_hash;
 	f->credit = q->initial_quantum;
+	f->time_credit_filled = 0ULL;
 
 	rb_link_node(&f->fq_node, parent, p);
 	rb_insert_color(&f->fq_node, root);
@@ -374,8 +377,10 @@  static int fq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	qdisc_qstats_backlog_inc(sch, skb);
 	if (fq_flow_is_detached(f)) {
 		fq_flow_add_tail(&q->new_flows, f);
-		if (time_after(jiffies, f->age + q->flow_refill_delay))
-			f->credit = max_t(u32, f->credit, q->quantum);
+		if (time_after(jiffies, f->age + q->flow_refill_delay)) {
+			f->credit = max_t(int, f->credit - q->quantum, 0);
+			f->time_credit_filled = 0ULL;
+		}
 		q->inactive_flows--;
 	}
 
@@ -440,6 +445,7 @@  begin:
 
 	if (f->credit <= 0) {
 		f->credit += q->quantum;
+		f->time_credit_filled = max(now, f->time_next_packet);
 		head->first = f->next;
 		fq_flow_add_tail(&q->old_flows, f);
 		goto begin;
@@ -489,7 +495,7 @@  begin:
 			q->stat_pkts_too_long++;
 		}
 
-		f->time_next_packet = now + len;
+		f->time_next_packet = max(now, f->time_credit_filled + len);
 	}
 out:
 	qdisc_bstats_update(sch, skb);
@@ -679,8 +685,14 @@  static int fq_change(struct Qdisc *sch, struct nlattr *opt)
 			err = -EINVAL;
 	}
 
-	if (tb[TCA_FQ_INITIAL_QUANTUM])
-		q->initial_quantum = nla_get_u32(tb[TCA_FQ_INITIAL_QUANTUM]);
+	if (tb[TCA_FQ_INITIAL_QUANTUM]) {
+		u32 initial_quantum = nla_get_u32(tb[TCA_FQ_INITIAL_QUANTUM]);
+
+		if (initial_quantum > q->quantum)
+			q->initial_quantum = initial_quantum - q->quantum;
+		else
+			q->initial_quantum = 0;
+	}
 
 	if (tb[TCA_FQ_FLOW_DEFAULT_RATE])
 		pr_warn_ratelimited("sch_fq: defrate %u ignored.\n",
@@ -740,7 +752,7 @@  static int fq_init(struct Qdisc *sch, struct nlattr *opt)
 	sch->limit		= 10000;
 	q->flow_plimit		= 100;
 	q->quantum		= 2 * psched_mtu(qdisc_dev(sch));
-	q->initial_quantum	= 10 * psched_mtu(qdisc_dev(sch));
+	q->initial_quantum	= 10 * psched_mtu(qdisc_dev(sch)) - q->quantum;
 	q->flow_refill_delay	= msecs_to_jiffies(40);
 	q->flow_max_rate	= ~0U;
 	q->rate_enable		= 1;
@@ -773,7 +785,8 @@  static int fq_dump(struct Qdisc *sch, struct sk_buff *skb)
 	if (nla_put_u32(skb, TCA_FQ_PLIMIT, sch->limit) ||
 	    nla_put_u32(skb, TCA_FQ_FLOW_PLIMIT, q->flow_plimit) ||
 	    nla_put_u32(skb, TCA_FQ_QUANTUM, q->quantum) ||
-	    nla_put_u32(skb, TCA_FQ_INITIAL_QUANTUM, q->initial_quantum) ||
+	    nla_put_u32(skb, TCA_FQ_INITIAL_QUANTUM,
+		        q->initial_quantum + q->quantum) ||
 	    nla_put_u32(skb, TCA_FQ_RATE_ENABLE, q->rate_enable) ||
 	    nla_put_u32(skb, TCA_FQ_FLOW_MAX_RATE, q->flow_max_rate) ||
 	    nla_put_u32(skb, TCA_FQ_FLOW_REFILL_DELAY,