Message ID | 1431550209-3498-1-git-send-email-daniel@iogearbox.net |
---|---|
State | Superseded, archived |
Delegated to: | David Miller |
Headers | show |
On 5/13/15 1:50 PM, Daniel Borkmann wrote: > From: Florian Westphal <fw@strlen.de> > > Commit d2788d34885d4ce5ba ("net: sched: further simplify handle_ing") > removed the call to qdisc_enqueue_root(). However, after this removal > we no longer set qdisc pkt length, which this patch fixes. We make use > of qdisc_pkt_len_init() from 1def9238d4aa ("net_sched: more precise > pkt_len computation"), which was suggested as the current approach so > far does not take into account when GRO builds up GSO packets. > > Fixes: d2788d34885d ("net: sched: further simplify handle_ing") > Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com> > Signed-off-by: Florian Westphal <fw@strlen.de> > Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> > --- Looks good as quick fix for now. Long term I still want to try to reuse stats from underlying netdev ;) For TX we don't have this option and have to do qdisc_pkt_len_init(), but for RX we should be able to do something smarter. Acked-by: Alexei Starovoitov <ast@plumgrid.com> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2015-05-13 at 22:50 +0200, Daniel Borkmann wrote: > From: Florian Westphal <fw@strlen.de> > > Commit d2788d34885d4ce5ba ("net: sched: further simplify handle_ing") > removed the call to qdisc_enqueue_root(). However, after this removal > we no longer set qdisc pkt length, which this patch fixes. We make use > of qdisc_pkt_len_init() from 1def9238d4aa ("net_sched: more precise > pkt_len computation"), which was suggested as the current approach so > far does not take into account when GRO builds up GSO packets. > > Fixes: d2788d34885d ("net: sched: further simplify handle_ing") > Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com> > Signed-off-by: Florian Westphal <fw@strlen.de> > Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> > --- > v1->v2: > - use qdisc_pkt_len_init() as suggested, thanks! > > net/core/dev.c | 8 +++++--- > 1 file changed, 5 insertions(+), 3 deletions(-) > > diff --git a/net/core/dev.c b/net/core/dev.c > index 15e51a9..6e19521 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -2787,8 +2787,8 @@ static void qdisc_pkt_len_init(struct sk_buff *skb) > > qdisc_skb_cb(skb)->pkt_len = skb->len; > > - /* To get more precise estimation of bytes sent on wire, > - * we add to pkt_len the headers size of all segments > + /* To get more precise estimation of bytes sent/received on > + * wire, we add to pkt_len the headers size of all segments. > */ > if (shinfo->gso_size) { > unsigned int hdr_len; > @@ -3541,9 +3541,11 @@ static inline struct sk_buff *handle_ing(struct sk_buff *skb, > *pt_prev = NULL; > } > > - qdisc_bstats_update_cpu(cl->q, skb); > + qdisc_pkt_len_init(skb); > skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_INGRESS); > > + qdisc_bstats_update_cpu(cl->q, skb); > + > switch (tc_classify(skb, cl, &cl_res)) { > case TC_ACT_OK: > case TC_ACT_RECLASSIFY: Is qdisc_pkt_len_init() still inlined, now it has 2 callers ? People using policers on ingress are very often disabling GRO for various historical reasons. Since handle_ing() is very uncommon, I would prefer not slowing down tx just so that GRO ~5% error in ingress is fixed. I noticed you did not CC me, don't be afraid to include me next time one of my commit is referred in a changelog ;) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Eric Dumazet <eric.dumazet@gmail.com> wrote: > > Commit d2788d34885d4ce5ba ("net: sched: further simplify handle_ing") > > removed the call to qdisc_enqueue_root(). However, after this removal > > we no longer set qdisc pkt length, which this patch fixes. We make use > > of qdisc_pkt_len_init() from 1def9238d4aa ("net_sched: more precise > > pkt_len computation"), which was suggested as the current approach so > > far does not take into account when GRO builds up GSO packets. > > > > Fixes: d2788d34885d ("net: sched: further simplify handle_ing") > > Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com> > > Signed-off-by: Florian Westphal <fw@strlen.de> > > Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> > > --- > > v1->v2: > > - use qdisc_pkt_len_init() as suggested, thanks! > > > > @@ -3541,9 +3541,11 @@ static inline struct sk_buff *handle_ing(struct sk_buff *skb, > > *pt_prev = NULL; > > } > > > > - qdisc_bstats_update_cpu(cl->q, skb); > > + qdisc_pkt_len_init(skb); [..] > > Is qdisc_pkt_len_init() still inlined, now it has 2 callers ? gcc 4.8.4, OPTIMIZE_FOR_SIZE=n: nope, callq. > People using policers on ingress are very often disabling GRO for > various historical reasons. right, ingress policing is rather poor with GRO (or you need to set mtu 64000 on the tc police cmd line). > Since handle_ing() is very uncommon, I would prefer not slowing down tx > just so that GRO ~5% error in ingress is fixed. Hmm, so whats you suggestion? Just use qdisc_skb_cb(skb)->pkt_len = skb->len; like in v1 of the patch set? resurrect qdisc_enqueue_root? set always_inline (i'd prefer not, its not very small)? Any other idea? Thanks! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 05/13/2015 11:20 PM, Eric Dumazet wrote: ... > Is qdisc_pkt_len_init() still inlined, now it has 2 callers ? > > People using policers on ingress are very often disabling GRO for > various historical reasons. Yes, tested policer and for getting it working on my side, GRO needs to be off. > Since handle_ing() is very uncommon, I would prefer not slowing down tx > just so that GRO ~5% error in ingress is fixed. I see a callq dc0 <qdisc_pkt_len_init> now. Ok, in that case Florian's original version is fine: http://patchwork.ozlabs.org/patch/471346/ > I noticed you did not CC me, don't be afraid to include me next time one > of my commit is referred in a changelog ;) Hmm, I did Cc you. ;) To: davem@davemloft.net Cc: fw@strlen.de, ast@plumgrid.com, jhs@mojatatu.com, eric.dumazet@gmail.com, netdev@vger.kernel.org, daniel@iogearbox.net Thanks, Daniel -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2015-05-13 at 23:33 +0200, Florian Westphal wrote: > Eric Dumazet <eric.dumazet@gmail.com> wrote: > > > Commit d2788d34885d4ce5ba ("net: sched: further simplify handle_ing") > > > removed the call to qdisc_enqueue_root(). However, after this removal > > > we no longer set qdisc pkt length, which this patch fixes. We make use > > > of qdisc_pkt_len_init() from 1def9238d4aa ("net_sched: more precise > > > pkt_len computation"), which was suggested as the current approach so > > > far does not take into account when GRO builds up GSO packets. > > > > > > Fixes: d2788d34885d ("net: sched: further simplify handle_ing") > > > Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com> > > > Signed-off-by: Florian Westphal <fw@strlen.de> > > > Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> > > > --- > > > v1->v2: > > > - use qdisc_pkt_len_init() as suggested, thanks! > > > > > > @@ -3541,9 +3541,11 @@ static inline struct sk_buff *handle_ing(struct sk_buff *skb, > > > *pt_prev = NULL; > > > } > > > > > > - qdisc_bstats_update_cpu(cl->q, skb); > > > + qdisc_pkt_len_init(skb); > [..] > > > > Is qdisc_pkt_len_init() still inlined, now it has 2 callers ? > > gcc 4.8.4, OPTIMIZE_FOR_SIZE=n: nope, callq. > > > People using policers on ingress are very often disabling GRO for > > various historical reasons. > > right, ingress policing is rather poor with GRO (or you need to set mtu 64000 > on the tc police cmd line). > > > Since handle_ing() is very uncommon, I would prefer not slowing down tx > > just so that GRO ~5% error in ingress is fixed. > > Hmm, so whats you suggestion? Just use > > qdisc_skb_cb(skb)->pkt_len = skb->len; > > like in v1 of the patch set? > resurrect qdisc_enqueue_root? > set always_inline (i'd prefer not, its not very small)? > > Any other idea? One possibility would be to set it in GRO engine. But then we would have to set qdisc_skb_cb(skb)->pkt_len = skb->len for non GRO packets. Also note the DODGY thing might be not needed for ingress, and I am not sure we can currently deliver UDP aggregated packets. Hmm.... -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2015-05-13 at 14:20 -0700, Eric Dumazet wrote: > > I noticed you did not CC me, don't be afraid to include me next time one > of my commit is referred in a changelog ;) Sorry, I was wrong on this ;) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 5/13/15 2:38 PM, Eric Dumazet wrote: > > One possibility would be to set it in GRO engine. > > But then we would have to set qdisc_skb_cb(skb)->pkt_len = skb->len for > non GRO packets. > > Also note the DODGY thing might be not needed for ingress, > and I am not sure we can currently deliver UDP aggregated packets. yeah, current qdisc_pkt_len_init() is certainly TX oriented. I'm not sure we can use tcp_hdrlen(skb) on RX which is called as part of qdisc_pkt_len_init(). I think it's probably better to take Florian's V1 with: qdisc_skb_cb(skb)->pkt_len = skb->len; to fix immediate breakage of act police and then proceed further? Like doing qdisc_skb_cb(skb)->pkt_len = skb->len inside GRO engine is probably not possible, since cb is used by napi_gro_cb ? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2015-05-13 at 15:27 -0700, Alexei Starovoitov wrote: > On 5/13/15 2:38 PM, Eric Dumazet wrote: > > > > One possibility would be to set it in GRO engine. > > > > But then we would have to set qdisc_skb_cb(skb)->pkt_len = skb->len for > > non GRO packets. > > > > Also note the DODGY thing might be not needed for ingress, > > and I am not sure we can currently deliver UDP aggregated packets. > > yeah, current qdisc_pkt_len_init() is certainly TX oriented. > I'm not sure we can use tcp_hdrlen(skb) on RX which is called as > part of qdisc_pkt_len_init(). We can, because GRO leave transport header set at the right place. > I think it's probably better to take Florian's V1 with: > qdisc_skb_cb(skb)->pkt_len = skb->len; > to fix immediate breakage of act police and then proceed further? Yes. > Like doing qdisc_skb_cb(skb)->pkt_len = skb->len inside GRO engine > is probably not possible, since cb is used by napi_gro_cb ? Everything is possible ;) At the time tcp_gro_complete() is called, we certainly can overwrite NAPI_GRO_CB(skb)->frag0, with good documentation and BUILD_BUG_ON() safet nets. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/core/dev.c b/net/core/dev.c index 15e51a9..6e19521 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -2787,8 +2787,8 @@ static void qdisc_pkt_len_init(struct sk_buff *skb) qdisc_skb_cb(skb)->pkt_len = skb->len; - /* To get more precise estimation of bytes sent on wire, - * we add to pkt_len the headers size of all segments + /* To get more precise estimation of bytes sent/received on + * wire, we add to pkt_len the headers size of all segments. */ if (shinfo->gso_size) { unsigned int hdr_len; @@ -3541,9 +3541,11 @@ static inline struct sk_buff *handle_ing(struct sk_buff *skb, *pt_prev = NULL; } - qdisc_bstats_update_cpu(cl->q, skb); + qdisc_pkt_len_init(skb); skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_INGRESS); + qdisc_bstats_update_cpu(cl->q, skb); + switch (tc_classify(skb, cl, &cl_res)) { case TC_ACT_OK: case TC_ACT_RECLASSIFY: