
[net-next,v5,10/11] tbf: take into account gso skbs

Message ID 20130218095837.GA1566@minipsycho.orion
State RFC, archived

Commit Message

Jiri Pirko Feb. 18, 2013, 9:58 a.m. UTC
Sun, Feb 17, 2013 at 06:54:23PM CET, eric.dumazet@gmail.com wrote:
>On Sun, 2013-02-17 at 17:18 +0100, Jiri Pirko wrote:
>
>> I'm going through this issue back and forth and, on second thought,
>> I think this patch might not be so wrong after all.
>> 
>> "Accumulating" time in ptoks would effectively cause the skb to be sent
>> only in case time for whole skb is available (accumulated).
>> 
>> The re-segmenting will only cause the skb fragments sent in each time frame.
>> 
>> I can't see how the bigger bursts you are referring to can happen.
>> 
>> Or am I missing something?
>
>Token Bucket Filter doesn't allow accumulating tokens above a given
>threshold. That's the whole point of the algo.
>
>After a one-hour idle time, you don't want to allow your device to send
>a burst exceeding the constraint.

You are right, therefore I said "not so wrong". Let me illustrate my
thoughts. Here is a patch:

Subject: [patch net-next RFC] tbf: take into account gso skbs

Ignore the max_size check for gso skbs. This check caused bigger packets
to be incorrectly dropped. Remove this limitation for gso skbs.

Also, for the peak rate, accumulate time for big gso skbs.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 net/sched/sch_tbf.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)
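
To illustrate what the accumulation does, here is a minimal user-space
model of the dequeue arithmetic. This is not kernel code; the constants
and the loop are made up, and l2t_ns() is a simplified stand-in for
psched_l2t_ns():

#include <stdio.h>

/* All constants are hypothetical, chosen only to illustrate the idea. */
#define PEAK_Bps  125000000LL   /* peak rate: 125 MB/s (~1 Gbit/s) */
#define GSO_LEN   65000LL       /* one big GSO skb, in bytes */
#define TICK_NS   100000LL      /* 100 us between dequeue attempts */

/* Transmission time of `bytes` at the peak rate, in nanoseconds. */
static long long l2t_ns(long long bytes)
{
	return bytes * 1000000000LL / PEAK_Bps;
}

int main(void)
{
	long long ptoks = 0;
	long long need = l2t_ns(GSO_LEN);	/* 520 us for the whole skb */
	int attempt;

	for (attempt = 1; ; attempt++) {
		ptoks += TICK_NS;	/* time elapsed since the last attempt */
		/* Plain TBF would cap ptoks at l2t_ns(mtu) here (12 us for a
		 * 1500-byte mtu), so `need` could never be reached and the
		 * skb would stall. The patch skips the cap while the same
		 * big GSO skb keeps waiting, letting ptoks accumulate
		 * toward the full cost. */
		if (ptoks >= need) {
			printf("whole GSO skb released on attempt %d\n",
			       attempt);
			break;
		}
	}
	return 0;
}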

Comments

Eric Dumazet Feb. 19, 2013, 4:15 p.m. UTC | #1
On Mon, 2013-02-18 at 10:58 +0100, Jiri Pirko wrote:
> Sun, Feb 17, 2013 at 06:54:23PM CET, eric.dumazet@gmail.com wrote:
> >On Sun, 2013-02-17 at 17:18 +0100, Jiri Pirko wrote:
> >
> >> I'm going through this issue back and forth and, on second thought,
> >> I think this patch might not be so wrong after all.
> >> 
> >> "Accumulating" time in ptoks would effectively cause the skb to be sent
> >> only in case time for whole skb is available (accumulated).
> >> 
> >> The re-segmenting will only cause the skb fragments sent in each time frame.
> >> 
> >> I can't see how the bigger bursts you are referring to can happen.
> >> 
> >> Or am I missing something?
> >
> >Token Bucket Filter doesn't allow accumulating tokens above a given
> >threshold. That's the whole point of the algo.
> >
> >After a one-hour idle time, you don't want to allow your device to send
> >a burst exceeding the constraint.
> 
> You are right, therefore I said "not so wrong". Let me illustrate my
> thoughts. Here is a patch:
> 
> Subject: [patch net-next RFC] tbf: take into account gso skbs
> 
> Ignore the max_size check for gso skbs. This check caused bigger packets
> to be incorrectly dropped. Remove this limitation for gso skbs.
> 
> Also, for the peak rate, accumulate time for big gso skbs.
> 
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> ---

I am sorry, we cannot do this accumulation.

If we are allowed to send 1k per second, we are not allowed to send 10k
after 10 seconds of idle.

Either we are able to split the GSO packet and respect the TBF
constraints, or we must drop it.
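
To put (hypothetical) numbers on it, the clamp in tbf_dequeue() is what
enforces this; a tiny user-space check:

#include <stdio.h>

int main(void)
{
	/* Made-up setup: rate 1 kB/s, buffer sized for a 1 kB burst. */
	long long rate_Bps  = 1000;		/* bytes per second */
	long long buffer_ns = 1000000000LL;	/* 1 kB worth of time */
	long long idle_ns   = 10000000000LL;	/* 10 s of idle */

	/* Mirrors sch_tbf: toks = min_t(s64, now - q->t_c, q->buffer);
	 * idle time beyond the buffer is simply forfeited. */
	long long toks = idle_ns < buffer_ns ? idle_ns : buffer_ns;

	printf("sendable after idle: %lld bytes\n",
	       toks * rate_Bps / 1000000000LL);	/* 1000, not 10000 */
	return 0;
}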



Jiri Pirko Feb. 19, 2013, 4:46 p.m. UTC | #2
Tue, Feb 19, 2013 at 05:15:02PM CET, eric.dumazet@gmail.com wrote:
>On Mon, 2013-02-18 at 10:58 +0100, Jiri Pirko wrote:
>> Sun, Feb 17, 2013 at 06:54:23PM CET, eric.dumazet@gmail.com wrote:
>> >On Sun, 2013-02-17 at 17:18 +0100, Jiri Pirko wrote:
>> >
>> >> I'm going through this issue back and forth and, on second thought,
>> >> I think this patch might not be so wrong after all.
>> >> 
>> >> "Accumulating" time in ptoks would effectively cause the skb to be sent
>> >> only in case time for whole skb is available (accumulated).
>> >> 
>> >> The re-segmenting will only cause the skb fragments sent in each time frame.
>> >> 
>> >> I can't see how the bigger bursts you are referring to can happen.
>> >> 
>> >> Or am I missing something?
>> >
>> >Token Bucket Filter doesn't allow accumulating tokens above a given
>> >threshold. That's the whole point of the algo.
>> >
>> >After a one-hour idle time, you don't want to allow your device to send
>> >a burst exceeding the constraint.
>> 
>> You are right, therefore I said "not so wrong". Let me illustrate my
>> thoughts. Here is a patch:
>> 
>> Subject: [patch net-next RFC] tbf: take into account gso skbs
>> 
>> Ignore the max_size check for gso skbs. This check caused bigger packets
>> to be incorrectly dropped. Remove this limitation for gso skbs.
>> 
>> Also, for the peak rate, accumulate time for big gso skbs.
>> 
>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>> ---
>
>I am sorry, we cannot do this accumulation.
>
>If we are allowed to send 1k per second, we are not allowed to send 10k
>after 10 seconds of idle.
>
>Either we are able to split the GSO packet and respect the TBF
>constraints, or we must drop it.


That's a shame. It would be easy this way, and also applicable to act_police :/

About the gso_segment, do you see any cons to doing that on the enqueue
path rather than on dequeue?
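
Something like this untested sketch is what I have in mind (the helper
name and the error handling are mine; only skb_gso_segment() and the
other helpers are existing API):

static int tbf_segment(struct sk_buff *skb, struct Qdisc *sch)
{
	struct tbf_sched_data *q = qdisc_priv(sch);
	netdev_features_t features = netif_skb_features(skb);
	struct sk_buff *segs, *nskb;
	int ret, nb = 0;

	/* Split the GSO skb into MTU-sized segments so that dequeue
	 * only ever sees packets that fit max_size. */
	segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK);
	if (IS_ERR_OR_NULL(segs))
		return qdisc_reshape_fail(skb, sch);

	while (segs) {
		nskb = segs->next;
		segs->next = NULL;
		ret = qdisc_enqueue(segs, q->qdisc);
		if (ret != NET_XMIT_SUCCESS) {
			if (net_xmit_drop_count(ret))
				sch->qstats.drops++;
		} else {
			nb++;
		}
		segs = nskb;
	}
	sch->q.qlen += nb;
	if (nb > 0) {
		consume_skb(skb);	/* the original skb is now redundant */
		return NET_XMIT_SUCCESS;
	}
	return NET_XMIT_DROP;
}

tbf_enqueue() would then call this instead of qdisc_reshape_fail() when
an oversized skb is gso.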

Thanks.
Eric Dumazet Feb. 19, 2013, 5:01 p.m. UTC | #3
On Tue, 2013-02-19 at 17:46 +0100, Jiri Pirko wrote:

> About the gso_segment, do you see any cons to doing that on the enqueue
> path rather than on dequeue?
> 

It would be fine, and could be done in the core stack instead of the qdisc.

netif_skb_features(), for example, has the following (incomplete) check:

if (skb_shinfo(skb)->gso_segs > skb->dev->gso_max_segs)
    features &= ~NETIF_F_GSO_MASK;

We do have a dev->gso_max_size, but it's currently used in the TCP stack
to size the skbs built in tcp_sendmsg().

In a forwarding workload, it seems we don't use/check gso_max_size.
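
A (hypothetical, untested) completion of the check would also fall back
to software GSO when the aggregate skb exceeds the device limit, which
would cover the forwarding case:

if (skb_shinfo(skb)->gso_segs > skb->dev->gso_max_segs ||
    skb->len > skb->dev->gso_max_size)
	features &= ~NETIF_F_GSO_MASK;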



Jiri Pirko March 8, 2013, 3:23 p.m. UTC | #4
Tue, Feb 19, 2013 at 06:01:27PM CET, eric.dumazet@gmail.com wrote:
>On Tue, 2013-02-19 at 17:46 +0100, Jiri Pirko wrote:
>
>> About the gso_segment, do you see any cons to doing that on the enqueue
>> path rather than on dequeue?
>> 
>
>It would be fine, and could be done in the core stack instead of the qdisc.
>

So you mean, for example, in the tcp code? The maximum possible size would
be propagated from the configured qdiscs up to the tcp code?

I'm not sure how exactly to do that.

>netif_skb_features(), for example, has the following (incomplete) check:
>
>if (skb_shinfo(skb)->gso_segs > skb->dev->gso_max_segs)
>    features &= ~NETIF_F_GSO_MASK;

Why is this incomplete?

>
>We do have a dev->gso_max_size, but it's currently used in the TCP stack
>to size the skbs built in tcp_sendmsg().

Where exactly in tcp_sendmsg() is this? I found that dev->gso_max_size is
copied to sk_gso_max_size in tcp_v4_connect->sk_setup_caps.
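
For reference, the relevant part of sk_setup_caps() currently reads
roughly like this (abridged; unrelated capability fixups elided):

void sk_setup_caps(struct sock *sk, struct dst_entry *dst)
{
	sk_dst_set(sk, dst);
	sk->sk_route_caps = dst->dev->features;
	/* ... capability fixups elided ... */
	if (sk_can_gso(sk)) {
		if (dst->header_len) {
			sk->sk_route_caps &= ~NETIF_F_GSO_MASK;
		} else {
			sk->sk_route_caps |= NETIF_F_SG | NETIF_F_HW_CSUM;
			sk->sk_gso_max_size = dst->dev->gso_max_size;
			sk->sk_gso_max_segs = dst->dev->gso_max_segs;
		}
	}
}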

>
>In a forwarding workload, it seems we don't use/check gso_max_size.

Yep, that would require doing the segmentation in enqueue anyway. Maybe
I can implement segmentation in the enqueue path first and provide the tcp
optimization after that. What do you think?


Thanks!

Jiri
Jiri Pirko March 22, 2013, 10:02 a.m. UTC | #5
Fri, Mar 08, 2013 at 04:23:38PM CET, jiri@resnulli.us wrote:
>Tue, Feb 19, 2013 at 06:01:27PM CET, eric.dumazet@gmail.com wrote:
>>On Tue, 2013-02-19 at 17:46 +0100, Jiri Pirko wrote:
>>
>>> About the gso_segment, do you see any cons to doing that on the enqueue
>>> path rather than on dequeue?
>>> 
>>
>>It would be fine, and could be done in the core stack instead of the qdisc.
>>
>
>So you mean, for example, in the tcp code? The maximum possible size would
>be propagated from the configured qdiscs up to the tcp code?
>
>I'm not sure how exactly to do that.
>
>>netif_skb_features(), for example, has the following (incomplete) check:
>>
>>if (skb_shinfo(skb)->gso_segs > skb->dev->gso_max_segs)
>>    features &= ~NETIF_F_GSO_MASK;
>
>Why is this incomplete?
>
>>
>>We do have a dev->gso_max_size, but it's currently used in the TCP stack
>>to size the skbs built in tcp_sendmsg().
>
>Where exactly in tcp_sendmsg() is this? I found that dev->gso_max_size is
>copied to sk_gso_max_size in tcp_v4_connect->sk_setup_caps.
>
>>
>>In a forwarding workload, it seems we don't use/check gso_max_size.
>
>Yep, that would require doing the segmentation in enqueue anyway. Maybe
>I can implement segmentation in the enqueue path first and provide the tcp
>optimization after that. What do you think?


Reminding myself of this...

Thanks.

Jiri

Patch

diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index c8388f3..bd36977 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -114,6 +114,8 @@  struct tbf_sched_data {
 	s64	t_c;			/* Time check-point */
 	struct Qdisc	*qdisc;		/* Inner qdisc, default - bfifo queue */
 	struct qdisc_watchdog watchdog;	/* Watchdog timer */
+	bool	last_dequeued;		/* Flag to indicate that a skb was
+					   returned by last dequeue */
 };
 
 static int tbf_enqueue(struct sk_buff *skb, struct Qdisc *sch)
@@ -121,7 +123,7 @@  static int tbf_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	struct tbf_sched_data *q = qdisc_priv(sch);
 	int ret;
 
-	if (qdisc_pkt_len(skb) > q->max_size)
+	if (qdisc_pkt_len(skb) > q->max_size && !skb_is_gso(skb))
 		return qdisc_reshape_fail(skb, sch);
 
 	ret = qdisc_enqueue(skb, q->qdisc);
@@ -164,10 +166,18 @@  static struct sk_buff *tbf_dequeue(struct Qdisc *sch)
 		toks = min_t(s64, now - q->t_c, q->buffer);
 
 		if (q->peak_present) {
+			s64 skb_ptoks = (s64) psched_l2t_ns(&q->peak, len);
+			bool big_gso = skb_is_gso(skb) && skb_ptoks > q->mtu;
+
 			ptoks = toks + q->ptokens;
-			if (ptoks > q->mtu)
+			/* In case we hit a big GSO packet, don't cap to MTU
+			 * when the skb is seen for the second or later time;
+			 * rather accumulate time over calls to send it whole.
+			 */
+			if (ptoks > q->mtu &&
+			    (!big_gso || q->last_dequeued))
 				ptoks = q->mtu;
-			ptoks -= (s64) psched_l2t_ns(&q->peak, len);
+			ptoks -= skb_ptoks;
 		}
 		toks += q->tokens;
 		if (toks > q->buffer)
@@ -177,7 +187,7 @@  static struct sk_buff *tbf_dequeue(struct Qdisc *sch)
 		if ((toks|ptoks) >= 0) {
 			skb = qdisc_dequeue_peeked(q->qdisc);
 			if (unlikely(!skb))
-				return NULL;
+				goto null_out;
 
 			q->t_c = now;
 			q->tokens = toks;
@@ -185,6 +195,7 @@  static struct sk_buff *tbf_dequeue(struct Qdisc *sch)
 			sch->q.qlen--;
 			qdisc_unthrottled(sch);
 			qdisc_bstats_update(sch, skb);
+			q->last_dequeued = true;
 			return skb;
 		}
 
@@ -204,6 +215,8 @@  static struct sk_buff *tbf_dequeue(struct Qdisc *sch)
 
 		sch->qstats.overlimits++;
 	}
+null_out:
+	q->last_dequeued = false;
 	return NULL;
 }
 
@@ -212,6 +225,7 @@  static void tbf_reset(struct Qdisc *sch)
 	struct tbf_sched_data *q = qdisc_priv(sch);
 
 	qdisc_reset(q->qdisc);
+	q->last_dequeued = false;
 	sch->q.qlen = 0;
 	q->t_c = ktime_to_ns(ktime_get());
 	q->tokens = q->buffer;
@@ -290,6 +304,7 @@  static int tbf_change(struct Qdisc *sch, struct nlattr *opt)
 		qdisc_tree_decrease_qlen(q->qdisc, q->qdisc->q.qlen);
 		qdisc_destroy(q->qdisc);
 		q->qdisc = child;
+		q->last_dequeued = false;
 	}
 	q->limit = qopt->limit;
 	q->mtu = PSCHED_TICKS2NS(qopt->mtu);
@@ -392,6 +407,7 @@  static int tbf_graft(struct Qdisc *sch, unsigned long arg, struct Qdisc *new,
 	q->qdisc = new;
 	qdisc_tree_decrease_qlen(*old, (*old)->q.qlen);
 	qdisc_reset(*old);
+	q->last_dequeued = false;
 	sch_tree_unlock(sch);
 
 	return 0;