diff mbox

[-next] net: core: set qdisc pkt len before tc_classify

Message ID 1431437838-5478-1-git-send-email-fw@strlen.de
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Florian Westphal May 12, 2015, 1:37 p.m. UTC
commit d2788d34885d4ce5ba ("net: sched: further simplify handle_ing")
removed the call to qdisc_enqueue_root().

However, after this removal we no longer set qdisc pkt length.
This breaks traffic policing on ingress.

Daniel re-ran his test with this patch applied and reports that
while there is a small performance hit the total gain vs. before
d2788d34885d4ce5ba is still consistently higher than 1Mpps.

Fixes: d2788d34885d ("net: sched: further simplify handle_ing")
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/core/dev.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Eric Dumazet May 12, 2015, 2:22 p.m. UTC | #1
On Tue, 2015-05-12 at 15:37 +0200, Florian Westphal wrote:

> 
> Fixes: d2788d34885d ("net: sched: further simplify handle_ing")
> Acked-by: Daniel Borkmann <daniel@iogearbox.net>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
>  net/core/dev.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index e5f77c4..871181a 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3540,8 +3540,9 @@ static inline struct sk_buff *handle_ing(struct sk_buff *skb,
>  		*pt_prev = NULL;
>  	}
>  
> -	qdisc_bstats_update_cpu(cl->q, skb);
> +	qdisc_skb_cb(skb)->pkt_len = skb->len;


A qdisc might have a stab (cf qdisc_calculate_pkt_len() )

>  	skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_INGRESS);
> +	qdisc_bstats_update_cpu(cl->q, skb);
>  
>  	switch (tc_classify(skb, cl, &cl_res)) {
>  	case TC_ACT_OK:


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Florian Westphal May 12, 2015, 3:16 p.m. UTC | #2
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > Fixes: d2788d34885d ("net: sched: further simplify handle_ing")
> > Acked-by: Daniel Borkmann <daniel@iogearbox.net>
> > Signed-off-by: Florian Westphal <fw@strlen.de>
> > ---
> >  net/core/dev.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index e5f77c4..871181a 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -3540,8 +3540,9 @@ static inline struct sk_buff *handle_ing(struct sk_buff *skb,
> >  		*pt_prev = NULL;
> >  	}
> >  
> > -	qdisc_bstats_update_cpu(cl->q, skb);
> > +	qdisc_skb_cb(skb)->pkt_len = skb->len;
> 
> 
> A qdisc might have a stab (cf qdisc_calculate_pkt_len() )

Thanks for pointing this out Eric.

I was under impression that stab was only useful for egress but in
fact tc did support ingress stab too.

Daniel will submit a v2.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexei Starovoitov May 12, 2015, 5:03 p.m. UTC | #3
On Tue, May 12, 2015 at 05:16:46PM +0200, Florian Westphal wrote:
> > 
> > A qdisc might have a stab (cf qdisc_calculate_pkt_len() )
> 
> Thanks for pointing this out Eric.

Indeed. Thanks Eric!

> I was under impression that stab was only useful for egress but in
> fact tc did support ingress stab too.

That was my impression as well. Though it was allowed to add
qdisc_size_table to ingress, it's useless. Nothing takes advantage
of recomputed qdisc_pkt_len. It can only mess with stats, which
seems to be already broken:
- egress qdiscs do qdisc_bstats_update() only at dequeue, whereas
  ingress double counts dropped packets
- qdisc_bstats_update() does:
  bstats->bytes += qdisc_pkt_len(skb);
  bstats->packets += skb_is_gso(skb) ? skb_shinfo(skb)->gso_segs : 1;
  but nothing on ingress does qdisc_pkt_len_init().
  So when we see gso packet on ingress the stats are very wrong.

I think we should fix ingress stats as a whole.
Option 1:
 do qdisc_pkt_len_init() + optional call into size_table
 to update stats only after classifers returned TC_ACT_OK.
 Cons: extra overhead per packet only to update ingress qdisc stats.
Option 2:
 use byte and packet counts from underlying netdev, when
 reporting stats via ingress qdisc.
 Pros: No arithmetic in the fast path.

I think option 2 is much preferred, since it's faster and
equally accurate.

Jamal, what's your take?

Thanks!

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jamal Hadi Salim May 13, 2015, 11:02 a.m. UTC | #4
On 05/12/15 13:03, Alexei Starovoitov wrote:
> On Tue, May 12, 2015 at 05:16:46PM +0200, Florian Westphal wrote:
>>>
>>> A qdisc might have a stab (cf qdisc_calculate_pkt_len() )
>>
>> Thanks for pointing this out Eric.
>
> Indeed. Thanks Eric!
>
>> I was under impression that stab was only useful for egress but in
>> fact tc did support ingress stab too.
>
> That was my impression as well. Though it was allowed to add
> qdisc_size_table to ingress, it's useless. Nothing takes advantage
> of recomputed qdisc_pkt_len. It can only mess with stats, which
> seems to be already broken:
> - egress qdiscs do qdisc_bstats_update() only at dequeue, whereas
>    ingress double counts dropped packets
> - qdisc_bstats_update() does:
>    bstats->bytes += qdisc_pkt_len(skb);
>    bstats->packets += skb_is_gso(skb) ? skb_shinfo(skb)->gso_segs : 1;
>    but nothing on ingress does qdisc_pkt_len_init().
>    So when we see gso packet on ingress the stats are very wrong.
>
> I think we should fix ingress stats as a whole.
> Option 1:
>   do qdisc_pkt_len_init() + optional call into size_table
>   to update stats only after classifers returned TC_ACT_OK.
>   Cons: extra overhead per packet only to update ingress qdisc stats.
> Option 2:
>   use byte and packet counts from underlying netdev, when
>   reporting stats via ingress qdisc.
>   Pros: No arithmetic in the fast path.
>
> I think option 2 is much preferred, since it's faster and
> equally accurate.
>
> Jamal, what's your take?
>

I dont think we need the stab on the ingress but we do need to account
for gso. So option #1 with qdisc_pkt_len_init() alone is the only thing
needed. i.e Florian's change becomes:

-	qdisc_bstats_update_cpu(cl->q, skb);
+	qdisc_pkt_len_init(skb)
  	skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_INGRESS);
+	qdisc_bstats_update_cpu(cl->q, skb);

Alexei, why do you say this option will have overhead?

cheers,
jamal
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jamal Hadi Salim May 13, 2015, 11:07 a.m. UTC | #5
On 05/13/15 07:02, Jamal Hadi Salim wrote:
>
>
> I dont think we need the stab on the ingress but we do need to account
> for gso. So option #1 with qdisc_pkt_len_init() alone is the only thing
> needed. i.e Florian's change becomes:
>
> -    qdisc_bstats_update_cpu(cl->q, skb);
> +    qdisc_pkt_len_init(skb)
>       skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_INGRESS);
> +    qdisc_bstats_update_cpu(cl->q, skb);
>
> Alexei, why do you say this option will have overhead?
>

and the comment in qdisc_pkt_len_init() needs fixing too.
It says:
         /* To get more precise estimation of bytes sent on wire,
          * we add to pkt_len the headers size of all segments
          */


It implies transmit direction only - modern nics do set this
on receive.  Something like:

         /* To get more precise estimation of bytes rx or to be sent
           on the wire, we add to pkt_len the headers size of
           all segments
         */

cheers,
jamal
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet May 13, 2015, 1:10 p.m. UTC | #6
On Wed, 2015-05-13 at 07:02 -0400, Jamal Hadi Salim wrote:

> Alexei, why do you say this option will have overhead?

Well, they want to be able to give nice numbers and ultimately have
a nice bogomips value ;)

About stab on ingress, I never used it, but maybe Jesper or someone else
did.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/core/dev.c b/net/core/dev.c
index e5f77c4..871181a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3540,8 +3540,9 @@  static inline struct sk_buff *handle_ing(struct sk_buff *skb,
 		*pt_prev = NULL;
 	}
 
-	qdisc_bstats_update_cpu(cl->q, skb);
+	qdisc_skb_cb(skb)->pkt_len = skb->len;
 	skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_INGRESS);
+	qdisc_bstats_update_cpu(cl->q, skb);
 
 	switch (tc_classify(skb, cl, &cl_res)) {
 	case TC_ACT_OK: