
[net-next] fq_codel: add memory limitation per queue

Message ID 1462550112.13075.47.camel@edumazet-glaptop3.roam.corp.google.com
State Accepted, archived
Delegated to: David Miller

Commit Message

Eric Dumazet May 6, 2016, 3:55 p.m. UTC
From: Eric Dumazet <edumazet@google.com>

On small embedded routers, one wants to control the maximal amount of
memory used by fq_codel, instead of controlling the number of packets or
bytes, since GRO/TSO make those limits impractical.

Assuming skb->truesize is accurate, we have to keep track of the
sum of skb->truesize for the skbs in the queue.

This patch adds a new TCA_FQ_CODEL_MEMORY_LIMIT attribute.

I chose a default value of 32 MBytes, which looks reasonable even
for heavy-duty usage. (Existing fq_codel users should not be hurt
when they upgrade their kernels.)

Two fields are added to tc_fq_codel_qd_stats to report:
 - Current memory usage
 - Number of drops caused by memory limits

# tc qd replace dev eth1 root est 1sec 4sec fq_codel memory_limit 4M
..
# tc -s -d qd sh dev eth1
qdisc fq_codel 8008: root refcnt 257 limit 10240p flows 1024
 quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn 
 Sent 2083566791363 bytes 1376214889 pkt (dropped 4994406, overlimits 0
requeues 21705223) 
 rate 9841Mbit 812549pps backlog 3906120b 376p requeues 21705223 
  maxpacket 68130 drop_overlimit 4994406 new_flow_count 28855414
  ecn_mark 0 memory_used 4190048 drop_overmemory 4994406
  new_flows_len 1 old_flows_len 177


Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Dave Täht <dave.taht@gmail.com>
Cc: Sebastian Möller <moeller0@gmx.de>
---
 include/uapi/linux/pkt_sched.h |    3 +++
 net/sched/sch_fq_codel.c       |   27 ++++++++++++++++++++++++---
 2 files changed, 27 insertions(+), 3 deletions(-)

Comments

David Miller May 9, 2016, 3:49 a.m. UTC | #1
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 06 May 2016 08:55:12 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> On small embedded routers, one wants to control the maximal amount of
> memory used by fq_codel, instead of controlling the number of packets or
> bytes, since GRO/TSO make those limits impractical.
> 
> Assuming skb->truesize is accurate, we have to keep track of the
> sum of skb->truesize for the skbs in the queue.
> 
> This patch adds a new TCA_FQ_CODEL_MEMORY_LIMIT attribute.
> 
> I chose a default value of 32 MBytes, which looks reasonable even
> for heavy-duty usage. (Existing fq_codel users should not be hurt
> when they upgrade their kernels.)
> 
> Two fields are added to tc_fq_codel_qd_stats to report:
>  - Current memory usage
>  - Number of drops caused by memory limits
> 
> # tc qd replace dev eth1 root est 1sec 4sec fq_codel memory_limit 4M
> ..
> # tc -s -d qd sh dev eth1
> qdisc fq_codel 8008: root refcnt 257 limit 10240p flows 1024
>  quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn 
>  Sent 2083566791363 bytes 1376214889 pkt (dropped 4994406, overlimits 0
> requeues 21705223) 
>  rate 9841Mbit 812549pps backlog 3906120b 376p requeues 21705223 
>   maxpacket 68130 drop_overlimit 4994406 new_flow_count 28855414
>   ecn_mark 0 memory_used 4190048 drop_overmemory 4994406
>   new_flows_len 1 old_flows_len 177
> 
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, thanks Eric.
Cong Wang May 9, 2016, 4:14 a.m. UTC | #2
On Fri, May 6, 2016 at 8:55 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> @@ -193,6 +199,7 @@ static int fq_codel_enqueue(struct sk_buff *skb, struct Qdisc *sch)
>         unsigned int idx, prev_backlog, prev_qlen;
>         struct fq_codel_flow *flow;
>         int uninitialized_var(ret);
> +       bool memory_limited;
>
>         idx = fq_codel_classify(skb, sch, &ret);
>         if (idx == 0) {
> @@ -215,7 +222,9 @@ static int fq_codel_enqueue(struct sk_buff *skb, struct Qdisc *sch)
>                 flow->deficit = q->quantum;
>                 flow->dropped = 0;
>         }
> -       if (++sch->q.qlen <= sch->limit)
> +       q->memory_usage += skb->truesize;
> +       memory_limited = q->memory_usage > q->memory_limit;
> +       if (++sch->q.qlen <= sch->limit && !memory_limited)
>                 return NET_XMIT_SUCCESS;
>
>         prev_backlog = sch->qstats.backlog;
> @@ -229,7 +238,8 @@ static int fq_codel_enqueue(struct sk_buff *skb, struct Qdisc *sch)
>         ret = fq_codel_drop(sch, q->drop_batch_size);
>
>         q->drop_overlimit += prev_qlen - sch->q.qlen;
> -
> +       if (memory_limited)
> +               q->drop_overmemory += prev_qlen - sch->q.qlen;

So when the packet is dropped due to memory over limit, should
we return failure in this case? Or am I missing something?
Eric Dumazet May 9, 2016, 4:31 a.m. UTC | #3
On Sun, 2016-05-08 at 21:14 -0700, Cong Wang wrote:

> So when the packet is dropped due to memory over limit, should
> we return failure in this case? Or am I missing something?

Same behavior as before.

If we dropped some packets of this flow, we return NET_XMIT_CN
Cong Wang May 9, 2016, 5:07 a.m. UTC | #4
On Sun, May 8, 2016 at 9:31 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Sun, 2016-05-08 at 21:14 -0700, Cong Wang wrote:
>
>> So when the packet is dropped due to memory over limit, should
>> we return failure in this case? Or am I missing something?
>
> Same behavior as before.
>
> If we dropped some packets of this flow, we return NET_XMIT_CN

I think for the limited memory case, the upper layer is supposed
to stop sending more packets when hitting the limit.
Eric Dumazet May 9, 2016, 2:26 p.m. UTC | #5
On Sun, 2016-05-08 at 22:07 -0700, Cong Wang wrote:
> On Sun, May 8, 2016 at 9:31 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Sun, 2016-05-08 at 21:14 -0700, Cong Wang wrote:
> >
> >> So when the packet is dropped due to memory over limit, should
> >> we return failure in this case? Or am I missing something?
> >
> > Same behavior as before.
> >
> > If we dropped some packets of this flow, we return NET_XMIT_CN
> 
> I think for the limited memory case, the upper layer is supposed
> to stop sending more packets when hitting the limit.

They do. NET_XMIT_CN, for example, aborts IP fragmentation.

TCP flows will also instantly react.
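
A minimal userspace sketch of that reaction path. The NET_XMIT_* values
mirror include/linux/netdevice.h and the helper mirrors the
net_xmit_eval() macro from include/net/sock.h; everything else here is
illustrative, not part of the patch:

#include <stdio.h>

#define NET_XMIT_SUCCESS 0x00
#define NET_XMIT_DROP    0x01
#define NET_XMIT_CN      0x02	/* congestion notification */

/* mirrors the net_xmit_eval() macro: most callers fold CN into
 * "success", but TCP checks the raw return value first and enters
 * CWR, so the congested flow slows down immediately.
 */
static int net_xmit_eval(int e)
{
	return e == NET_XMIT_CN ? 0 : e;
}

int main(void)
{
	int codes[] = { NET_XMIT_SUCCESS, NET_XMIT_CN, NET_XMIT_DROP };

	for (int i = 0; i < 3; i++)
		printf("qdisc returned %#x -> caller sees %#x\n",
		       codes[i], net_xmit_eval(codes[i]));
	return 0;
}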
Cong Wang May 10, 2016, 4:34 a.m. UTC | #6
On Mon, May 9, 2016 at 7:26 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Sun, 2016-05-08 at 22:07 -0700, Cong Wang wrote:
>> On Sun, May 8, 2016 at 9:31 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > On Sun, 2016-05-08 at 21:14 -0700, Cong Wang wrote:
>> >
>> >> So when the packet is dropped due to memory over limit, should
>> >> we return failure in this case? Or am I missing something?
>> >
>> > Same behavior as before.
>> >
>> > If we dropped some packets of this flow, we return NET_XMIT_CN
>>
>> I think for the limited memory case, the upper layer is supposed
>> to stop sending more packets when hitting the limit.
>
> They do. NET_XMIT_CN, for example, aborts IP fragmentation.
>
> TCP flows will also instantly react.

But not for the NET_XMIT_SUCCESS case:

        return ret == idx ? NET_XMIT_CN : NET_XMIT_SUCCESS;
Eric Dumazet May 10, 2016, 4:45 a.m. UTC | #7
On Mon, 2016-05-09 at 21:34 -0700, Cong Wang wrote:
> On Mon, May 9, 2016 at 7:26 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Sun, 2016-05-08 at 22:07 -0700, Cong Wang wrote:
> >> On Sun, May 8, 2016 at 9:31 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >> > On Sun, 2016-05-08 at 21:14 -0700, Cong Wang wrote:
> >> >
> >> >> So when the packet is dropped due to memory over limit, should
> >> >> we return failure in this case? Or am I missing something?
> >> >
> >> > Same behavior as before.
> >> >
> >> > If we dropped some packets of this flow, we return NET_XMIT_CN
> >>
> >> I think for the limited memory case, the upper layer is supposed
> >> to stop sending more packets when hitting the limit.
> >
> > They do. NET_XMIT_CN, for example, aborts IP fragmentation.
> >
> > TCP flows will also instantly react.
> 
> But not for the NET_XMIT_SUCCESS case:
> 
>         return ret == idx ? NET_XMIT_CN : NET_XMIT_SUCCESS;


I believe you missed the whole point of FQ (SFQ, FQ_CODEL, FQ, ...)

If we dropped a packet of another flow because this other flow is an
elephant, why should we notify the mouse that we shot an elephant?

We return NET_XMIT_SUCCESS because we properly queued this packet for
this flow. This is absolutely right.

If you do not like fq, just use pfifo, and yes you'll kill the mice.
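
A self-contained sketch of the return-code logic in question. It assumes,
as in fq_codel_enqueue(), that the drop routine returns the index of the
flow it trimmed; the flow numbers and helper names here are made up:

#include <stdio.h>

#define NET_XMIT_SUCCESS 0x00
#define NET_XMIT_CN      0x02

/* stand-in for fq_codel_drop(): pretend flow 7 is the elephant whose
 * backlog always gets trimmed when a limit is hit
 */
static unsigned int drop_fattest_flow(void)
{
	return 7;
}

/* idx is the flow the new packet was just queued on */
static int enqueue_over_limit(unsigned int idx)
{
	unsigned int ret = drop_fattest_flow();

	/* only the sender whose own flow lost packets is notified */
	return ret == idx ? NET_XMIT_CN : NET_XMIT_SUCCESS;
}

int main(void)
{
	printf("elephant (flow 7) enqueues: %#x (NET_XMIT_CN)\n",
	       enqueue_over_limit(7));
	printf("mouse (flow 3) enqueues:    %#x (NET_XMIT_SUCCESS)\n",
	       enqueue_over_limit(3));
	return 0;
}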
Cong Wang May 10, 2016, 4:57 a.m. UTC | #8
On Mon, May 9, 2016 at 9:45 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Mon, 2016-05-09 at 21:34 -0700, Cong Wang wrote:
>> On Mon, May 9, 2016 at 7:26 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > On Sun, 2016-05-08 at 22:07 -0700, Cong Wang wrote:
>> >> On Sun, May 8, 2016 at 9:31 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> >> > On Sun, 2016-05-08 at 21:14 -0700, Cong Wang wrote:
>> >> >
>> >> >> So when the packet is dropped due to memory over limit, should
>> >> >> we return failure in this case? Or am I missing something?
>> >> >
>> >> > Same behavior as before.
>> >> >
>> >> > If we dropped some packets of this flow, we return NET_XMIT_CN
>> >>
>> >> I think for the limited memory case, the upper layer is supposed
>> >> to stop sending more packets when hitting the limit.
>> >
>> > They do. NET_XMIT_CN, for example, aborts IP fragmentation.
>> >
>> > TCP flows will also instantly react.
>>
>> But not for the NET_XMIT_SUCCESS case:
>>
>>         return ret == idx ? NET_XMIT_CN : NET_XMIT_SUCCESS;
>
>
> I believe you missed the whole point of FQ (SFQ, FQ_CODEL, FQ, ...)
>
> If we dropped a packet of another flow because this other flow is an
> elephant, why should we notify the mouse that we shot an elephant?
>
> We return NET_XMIT_SUCCESS because we properly queued this packet for
> this flow. This is absolutely right.
>

Sure, but we are talking about the memory-constrained case, aren't we?

If the whole system is suffering from memory pressure, shouldn't the
whole qdisc be halted?
Eric Dumazet May 10, 2016, 5:10 a.m. UTC | #9
On Mon, 2016-05-09 at 21:57 -0700, Cong Wang wrote:

> Sure, but we are talking about the memory-constrained case, aren't we?
> 
> If the whole system is suffering from memory pressure, shouldn't the
> whole qdisc be halted?

Please read the patch again.

I added a memory control exactly to control memory usage in the first
place. If the admin allows this qdisc to consume 4 MBytes, then we can
queue up to 4 MBytes on it.

If we evict packets from some _other_ flow because whatever limit was
hit (be it number of packets or memory usage), we do not report to the
innocent guy that some packets were dropped.

The innocent guy's packet _is_ queued and _should_ be sent eventually.

Of course, if we could predict the future and knew that 456 usec later
the packet would be lost anyway, we would notify the innocent guy right
away.

But this is left for future improvement.
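
For readers skimming past the diff, here is a compact userspace model of
the enqueue-side accounting the patch adds. It is a sketch only: the
struct and helper are invented for illustration, and the real
fq_codel_drop() trims the fattest flow in batches rather than refunding
the incoming packet:

#include <stdbool.h>
#include <stdio.h>

struct fq_codel_model {
	unsigned int memory_usage;	/* sum of skb->truesize, in bytes */
	unsigned int memory_limit;	/* TCA_FQ_CODEL_MEMORY_LIMIT */
	unsigned int drop_overmemory;	/* drops caused by the memory limit */
};

static void enqueue(struct fq_codel_model *q, unsigned int truesize)
{
	bool memory_limited;

	/* charge the packet first, then test, as the patch does */
	q->memory_usage += truesize;
	memory_limited = q->memory_usage > q->memory_limit;
	if (!memory_limited)
		return;		/* fast path: under the memory limit */

	/* crude stand-in for fq_codel_drop(): shed enough bytes to get
	 * back under the limit and account the drop
	 */
	q->memory_usage -= truesize;
	q->drop_overmemory++;
}

int main(void)
{
	struct fq_codel_model q = { .memory_limit = 4 << 20 };	/* 4 MBytes */

	for (int i = 0; i < 100; i++)
		enqueue(&q, 64 * 1024);	/* large GRO/TSO skbs */

	printf("memory_usage=%u drop_overmemory=%u\n",
	       q.memory_usage, q.drop_overmemory);
	return 0;
}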

Patch

diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index a11afecd4482..2382eed50278 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -719,6 +719,7 @@  enum {
 	TCA_FQ_CODEL_QUANTUM,
 	TCA_FQ_CODEL_CE_THRESHOLD,
 	TCA_FQ_CODEL_DROP_BATCH_SIZE,
+	TCA_FQ_CODEL_MEMORY_LIMIT,
 	__TCA_FQ_CODEL_MAX
 };
 
@@ -743,6 +744,8 @@  struct tc_fq_codel_qd_stats {
 	__u32	new_flows_len;	/* count of flows in new list */
 	__u32	old_flows_len;	/* count of flows in old list */
 	__u32	ce_mark;	/* packets above ce_threshold */
+	__u32	memory_usage;	/* in bytes */
+	__u32	drop_overmemory;
 };
 
 struct tc_fq_codel_cl_stats {
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index e7b42b0d5145..bb8bd9314629 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -60,8 +60,11 @@  struct fq_codel_sched_data {
 	u32		perturbation;	/* hash perturbation */
 	u32		quantum;	/* psched_mtu(qdisc_dev(sch)); */
 	u32		drop_batch_size;
+	u32		memory_limit;
 	struct codel_params cparams;
 	struct codel_stats cstats;
+	u32		memory_usage;
+	u32		drop_overmemory;
 	u32		drop_overlimit;
 	u32		new_flow_count;
 
@@ -143,6 +146,7 @@  static unsigned int fq_codel_drop(struct Qdisc *sch, unsigned int max_packets)
 	unsigned int maxbacklog = 0, idx = 0, i, len;
 	struct fq_codel_flow *flow;
 	unsigned int threshold;
+	unsigned int mem = 0;
 
 	/* Queue is full! Find the fat flow and drop packet(s) from it.
 	 * This might sound expensive, but with 1024 flows, we scan
@@ -167,11 +171,13 @@  static unsigned int fq_codel_drop(struct Qdisc *sch, unsigned int max_packets)
 	do {
 		skb = dequeue_head(flow);
 		len += qdisc_pkt_len(skb);
+		mem += skb->truesize;
 		kfree_skb(skb);
 	} while (++i < max_packets && len < threshold);
 
 	flow->dropped += i;
 	q->backlogs[idx] -= len;
+	q->memory_usage -= mem;
 	sch->qstats.drops += i;
 	sch->qstats.backlog -= len;
 	sch->q.qlen -= i;
@@ -193,6 +199,7 @@  static int fq_codel_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	unsigned int idx, prev_backlog, prev_qlen;
 	struct fq_codel_flow *flow;
 	int uninitialized_var(ret);
+	bool memory_limited;
 
 	idx = fq_codel_classify(skb, sch, &ret);
 	if (idx == 0) {
@@ -215,7 +222,9 @@  static int fq_codel_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		flow->deficit = q->quantum;
 		flow->dropped = 0;
 	}
-	if (++sch->q.qlen <= sch->limit)
+	q->memory_usage += skb->truesize;
+	memory_limited = q->memory_usage > q->memory_limit;
+	if (++sch->q.qlen <= sch->limit && !memory_limited)
 		return NET_XMIT_SUCCESS;
 
 	prev_backlog = sch->qstats.backlog;
@@ -229,7 +238,8 @@  static int fq_codel_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	ret = fq_codel_drop(sch, q->drop_batch_size);
 
 	q->drop_overlimit += prev_qlen - sch->q.qlen;
-
+	if (memory_limited)
+		q->drop_overmemory += prev_qlen - sch->q.qlen;
 	/* As we dropped packet(s), better let upper stack know this */
 	qdisc_tree_reduce_backlog(sch, prev_qlen - sch->q.qlen,
 				  prev_backlog - sch->qstats.backlog);
@@ -308,6 +318,7 @@  begin:
 			list_del_init(&flow->flowchain);
 		goto begin;
 	}
+	q->memory_usage -= skb->truesize;
 	qdisc_bstats_update(sch, skb);
 	flow->deficit -= qdisc_pkt_len(skb);
 	/* We cant call qdisc_tree_reduce_backlog() if our qlen is 0,
@@ -355,6 +366,7 @@  static const struct nla_policy fq_codel_policy[TCA_FQ_CODEL_MAX + 1] = {
 	[TCA_FQ_CODEL_QUANTUM]	= { .type = NLA_U32 },
 	[TCA_FQ_CODEL_CE_THRESHOLD] = { .type = NLA_U32 },
 	[TCA_FQ_CODEL_DROP_BATCH_SIZE] = { .type = NLA_U32 },
+	[TCA_FQ_CODEL_MEMORY_LIMIT] = { .type = NLA_U32 },
 };
 
 static int fq_codel_change(struct Qdisc *sch, struct nlattr *opt)
@@ -409,7 +421,11 @@  static int fq_codel_change(struct Qdisc *sch, struct nlattr *opt)
 	if (tb[TCA_FQ_CODEL_DROP_BATCH_SIZE])
 		q->drop_batch_size = min(1U, nla_get_u32(tb[TCA_FQ_CODEL_DROP_BATCH_SIZE]));
 
-	while (sch->q.qlen > sch->limit) {
+	if (tb[TCA_FQ_CODEL_MEMORY_LIMIT])
+		q->memory_limit = min(1U << 31, nla_get_u32(tb[TCA_FQ_CODEL_MEMORY_LIMIT]));
+
+	while (sch->q.qlen > sch->limit ||
+	       q->memory_usage > q->memory_limit) {
 		struct sk_buff *skb = fq_codel_dequeue(sch);
 
 		q->cstats.drop_len += qdisc_pkt_len(skb);
@@ -454,6 +470,7 @@  static int fq_codel_init(struct Qdisc *sch, struct nlattr *opt)
 
 	sch->limit = 10*1024;
 	q->flows_cnt = 1024;
+	q->memory_limit = 32 << 20; /* 32 MBytes */
 	q->drop_batch_size = 64;
 	q->quantum = psched_mtu(qdisc_dev(sch));
 	q->perturbation = prandom_u32();
@@ -515,6 +532,8 @@  static int fq_codel_dump(struct Qdisc *sch, struct sk_buff *skb)
 			q->quantum) ||
 	    nla_put_u32(skb, TCA_FQ_CODEL_DROP_BATCH_SIZE,
 			q->drop_batch_size) ||
+	    nla_put_u32(skb, TCA_FQ_CODEL_MEMORY_LIMIT,
+			q->memory_limit) ||
 	    nla_put_u32(skb, TCA_FQ_CODEL_FLOWS,
 			q->flows_cnt))
 		goto nla_put_failure;
@@ -543,6 +562,8 @@  static int fq_codel_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
 	st.qdisc_stats.ecn_mark = q->cstats.ecn_mark;
 	st.qdisc_stats.new_flow_count = q->new_flow_count;
 	st.qdisc_stats.ce_mark = q->cstats.ce_mark;
+	st.qdisc_stats.memory_usage  = q->memory_usage;
+	st.qdisc_stats.drop_overmemory = q->drop_overmemory;
 
 	list_for_each(pos, &q->new_flows)
 		st.qdisc_stats.new_flows_len++;