Message ID | 1431437838-5478-1-git-send-email-fw@strlen.de |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
On Tue, 2015-05-12 at 15:37 +0200, Florian Westphal wrote: > > Fixes: d2788d34885d ("net: sched: further simplify handle_ing") > Acked-by: Daniel Borkmann <daniel@iogearbox.net> > Signed-off-by: Florian Westphal <fw@strlen.de> > --- > net/core/dev.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/net/core/dev.c b/net/core/dev.c > index e5f77c4..871181a 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -3540,8 +3540,9 @@ static inline struct sk_buff *handle_ing(struct sk_buff *skb, > *pt_prev = NULL; > } > > - qdisc_bstats_update_cpu(cl->q, skb); > + qdisc_skb_cb(skb)->pkt_len = skb->len; A qdisc might have a stab (cf qdisc_calculate_pkt_len() ) > skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_INGRESS); > + qdisc_bstats_update_cpu(cl->q, skb); > > switch (tc_classify(skb, cl, &cl_res)) { > case TC_ACT_OK: -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Eric Dumazet <eric.dumazet@gmail.com> wrote: > > Fixes: d2788d34885d ("net: sched: further simplify handle_ing") > > Acked-by: Daniel Borkmann <daniel@iogearbox.net> > > Signed-off-by: Florian Westphal <fw@strlen.de> > > --- > > net/core/dev.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/net/core/dev.c b/net/core/dev.c > > index e5f77c4..871181a 100644 > > --- a/net/core/dev.c > > +++ b/net/core/dev.c > > @@ -3540,8 +3540,9 @@ static inline struct sk_buff *handle_ing(struct sk_buff *skb, > > *pt_prev = NULL; > > } > > > > - qdisc_bstats_update_cpu(cl->q, skb); > > + qdisc_skb_cb(skb)->pkt_len = skb->len; > > > A qdisc might have a stab (cf qdisc_calculate_pkt_len() ) Thanks for pointing this out Eric. I was under impression that stab was only useful for egress but in fact tc did support ingress stab too. Daniel will submit a v2. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, May 12, 2015 at 05:16:46PM +0200, Florian Westphal wrote: > > > > A qdisc might have a stab (cf qdisc_calculate_pkt_len() ) > > Thanks for pointing this out Eric. Indeed. Thanks Eric! > I was under impression that stab was only useful for egress but in > fact tc did support ingress stab too. That was my impression as well. Though it was allowed to add qdisc_size_table to ingress, it's useless. Nothing takes advantage of recomputed qdisc_pkt_len. It can only mess with stats, which seems to be already broken: - egress qdiscs do qdisc_bstats_update() only at dequeue, whereas ingress double counts dropped packets - qdisc_bstats_update() does: bstats->bytes += qdisc_pkt_len(skb); bstats->packets += skb_is_gso(skb) ? skb_shinfo(skb)->gso_segs : 1; but nothing on ingress does qdisc_pkt_len_init(). So when we see gso packet on ingress the stats are very wrong. I think we should fix ingress stats as a whole. Option 1: do qdisc_pkt_len_init() + optional call into size_table to update stats only after classifers returned TC_ACT_OK. Cons: extra overhead per packet only to update ingress qdisc stats. Option 2: use byte and packet counts from underlying netdev, when reporting stats via ingress qdisc. Pros: No arithmetic in the fast path. I think option 2 is much preferred, since it's faster and equally accurate. Jamal, what's your take? Thanks! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 05/12/15 13:03, Alexei Starovoitov wrote: > On Tue, May 12, 2015 at 05:16:46PM +0200, Florian Westphal wrote: >>> >>> A qdisc might have a stab (cf qdisc_calculate_pkt_len() ) >> >> Thanks for pointing this out Eric. > > Indeed. Thanks Eric! > >> I was under impression that stab was only useful for egress but in >> fact tc did support ingress stab too. > > That was my impression as well. Though it was allowed to add > qdisc_size_table to ingress, it's useless. Nothing takes advantage > of recomputed qdisc_pkt_len. It can only mess with stats, which > seems to be already broken: > - egress qdiscs do qdisc_bstats_update() only at dequeue, whereas > ingress double counts dropped packets > - qdisc_bstats_update() does: > bstats->bytes += qdisc_pkt_len(skb); > bstats->packets += skb_is_gso(skb) ? skb_shinfo(skb)->gso_segs : 1; > but nothing on ingress does qdisc_pkt_len_init(). > So when we see gso packet on ingress the stats are very wrong. > > I think we should fix ingress stats as a whole. > Option 1: > do qdisc_pkt_len_init() + optional call into size_table > to update stats only after classifers returned TC_ACT_OK. > Cons: extra overhead per packet only to update ingress qdisc stats. > Option 2: > use byte and packet counts from underlying netdev, when > reporting stats via ingress qdisc. > Pros: No arithmetic in the fast path. > > I think option 2 is much preferred, since it's faster and > equally accurate. > > Jamal, what's your take? > I dont think we need the stab on the ingress but we do need to account for gso. So option #1 with qdisc_pkt_len_init() alone is the only thing needed. i.e Florian's change becomes: - qdisc_bstats_update_cpu(cl->q, skb); + qdisc_pkt_len_init(skb) skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_INGRESS); + qdisc_bstats_update_cpu(cl->q, skb); Alexei, why do you say this option will have overhead? cheers, jamal -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 05/13/15 07:02, Jamal Hadi Salim wrote: > > > I dont think we need the stab on the ingress but we do need to account > for gso. So option #1 with qdisc_pkt_len_init() alone is the only thing > needed. i.e Florian's change becomes: > > - qdisc_bstats_update_cpu(cl->q, skb); > + qdisc_pkt_len_init(skb) > skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_INGRESS); > + qdisc_bstats_update_cpu(cl->q, skb); > > Alexei, why do you say this option will have overhead? > and the comment in qdisc_pkt_len_init() needs fixing too. It says: /* To get more precise estimation of bytes sent on wire, * we add to pkt_len the headers size of all segments */ It implies transmit direction only - modern nics do set this on receive. Something like: /* To get more precise estimation of bytes rx or to be sent on the wire, we add to pkt_len the headers size of all segments */ cheers, jamal -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2015-05-13 at 07:02 -0400, Jamal Hadi Salim wrote:
> Alexei, why do you say this option will have overhead?
Well, they want to be able to give nice numbers and ultimately have
a nice bogomips value ;)
About stab on ingress, I never used it, but maybe Jesper or someone else
did.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/core/dev.c b/net/core/dev.c index e5f77c4..871181a 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -3540,8 +3540,9 @@ static inline struct sk_buff *handle_ing(struct sk_buff *skb, *pt_prev = NULL; } - qdisc_bstats_update_cpu(cl->q, skb); + qdisc_skb_cb(skb)->pkt_len = skb->len; skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_INGRESS); + qdisc_bstats_update_cpu(cl->q, skb); switch (tc_classify(skb, cl, &cl_res)) { case TC_ACT_OK: