diff mbox

[net-next,v2] net: core: set and refine qdisc pkt len before tc_classify

Message ID 1431550209-3498-1-git-send-email-daniel@iogearbox.net
State Superseded, archived
Delegated to: David Miller
Headers show

Commit Message

Daniel Borkmann May 13, 2015, 8:50 p.m. UTC
From: Florian Westphal <fw@strlen.de>

Commit d2788d34885d4ce5ba ("net: sched: further simplify handle_ing")
removed the call to qdisc_enqueue_root(). However, after this removal
we no longer set qdisc pkt length, which this patch fixes. We make use
of qdisc_pkt_len_init() from 1def9238d4aa ("net_sched: more precise
pkt_len computation"), which was suggested as the current approach so
far does not take into account when GRO builds up GSO packets.

Fixes: d2788d34885d ("net: sched: further simplify handle_ing")
Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 v1->v2:
  - use qdisc_pkt_len_init() as suggested, thanks!

 net/core/dev.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

Comments

Alexei Starovoitov May 13, 2015, 9:07 p.m. UTC | #1
On 5/13/15 1:50 PM, Daniel Borkmann wrote:
> From: Florian Westphal <fw@strlen.de>
>
> Commit d2788d34885d4ce5ba ("net: sched: further simplify handle_ing")
> removed the call to qdisc_enqueue_root(). However, after this removal
> we no longer set qdisc pkt length, which this patch fixes. We make use
> of qdisc_pkt_len_init() from 1def9238d4aa ("net_sched: more precise
> pkt_len computation"), which was suggested as the current approach so
> far does not take into account when GRO builds up GSO packets.
>
> Fixes: d2788d34885d ("net: sched: further simplify handle_ing")
> Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> ---

Looks good as quick fix for now. Long term I still want to try to reuse
stats from underlying netdev ;)
For TX we don't have this option and have to do qdisc_pkt_len_init(),
but for RX we should be able to do something smarter.

Acked-by: Alexei Starovoitov <ast@plumgrid.com>


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet May 13, 2015, 9:20 p.m. UTC | #2
On Wed, 2015-05-13 at 22:50 +0200, Daniel Borkmann wrote:
> From: Florian Westphal <fw@strlen.de>
> 
> Commit d2788d34885d4ce5ba ("net: sched: further simplify handle_ing")
> removed the call to qdisc_enqueue_root(). However, after this removal
> we no longer set qdisc pkt length, which this patch fixes. We make use
> of qdisc_pkt_len_init() from 1def9238d4aa ("net_sched: more precise
> pkt_len computation"), which was suggested as the current approach so
> far does not take into account when GRO builds up GSO packets.
> 
> Fixes: d2788d34885d ("net: sched: further simplify handle_ing")
> Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> ---
>  v1->v2:
>   - use qdisc_pkt_len_init() as suggested, thanks!
> 
>  net/core/dev.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 15e51a9..6e19521 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2787,8 +2787,8 @@ static void qdisc_pkt_len_init(struct sk_buff *skb)
>  
>  	qdisc_skb_cb(skb)->pkt_len = skb->len;
>  
> -	/* To get more precise estimation of bytes sent on wire,
> -	 * we add to pkt_len the headers size of all segments
> +	/* To get more precise estimation of bytes sent/received on
> +	 * wire, we add to pkt_len the headers size of all segments.
>  	 */
>  	if (shinfo->gso_size)  {
>  		unsigned int hdr_len;
> @@ -3541,9 +3541,11 @@ static inline struct sk_buff *handle_ing(struct sk_buff *skb,
>  		*pt_prev = NULL;
>  	}
>  
> -	qdisc_bstats_update_cpu(cl->q, skb);
> +	qdisc_pkt_len_init(skb);
>  	skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_INGRESS);
>  
> +	qdisc_bstats_update_cpu(cl->q, skb);
> +
>  	switch (tc_classify(skb, cl, &cl_res)) {
>  	case TC_ACT_OK:
>  	case TC_ACT_RECLASSIFY:

Is qdisc_pkt_len_init() still inlined, now it has 2 callers ?

People using policers on ingress are very often disabling GRO for
various historical reasons.

Since handle_ing() is very uncommon, I would prefer not slowing down tx
just so that GRO ~5% error in ingress is fixed.

I noticed you did not CC me, don't be afraid to include me next time one
of my commit is referred in a changelog ;)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Florian Westphal May 13, 2015, 9:33 p.m. UTC | #3
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > Commit d2788d34885d4ce5ba ("net: sched: further simplify handle_ing")
> > removed the call to qdisc_enqueue_root(). However, after this removal
> > we no longer set qdisc pkt length, which this patch fixes. We make use
> > of qdisc_pkt_len_init() from 1def9238d4aa ("net_sched: more precise
> > pkt_len computation"), which was suggested as the current approach so
> > far does not take into account when GRO builds up GSO packets.
> > 
> > Fixes: d2788d34885d ("net: sched: further simplify handle_ing")
> > Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com>
> > Signed-off-by: Florian Westphal <fw@strlen.de>
> > Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> > ---
> >  v1->v2:
> >   - use qdisc_pkt_len_init() as suggested, thanks!
> >
> > @@ -3541,9 +3541,11 @@ static inline struct sk_buff *handle_ing(struct sk_buff *skb,
> >  		*pt_prev = NULL;
> >  	}
> >  
> > -	qdisc_bstats_update_cpu(cl->q, skb);
> > +	qdisc_pkt_len_init(skb);
[..]
> 
> Is qdisc_pkt_len_init() still inlined, now it has 2 callers ?

gcc 4.8.4, OPTIMIZE_FOR_SIZE=n: nope, callq.

> People using policers on ingress are very often disabling GRO for
> various historical reasons.

right, ingress policing is rather poor with GRO (or you need to set mtu 64000
on the tc police cmd line).

> Since handle_ing() is very uncommon, I would prefer not slowing down tx
> just so that GRO ~5% error in ingress is fixed.

Hmm, so whats you suggestion?  Just use

qdisc_skb_cb(skb)->pkt_len = skb->len;

like in v1 of the patch set?
resurrect qdisc_enqueue_root?
set always_inline (i'd prefer not, its not very small)?

Any other idea?

Thanks!
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Borkmann May 13, 2015, 9:36 p.m. UTC | #4
On 05/13/2015 11:20 PM, Eric Dumazet wrote:
...
> Is qdisc_pkt_len_init() still inlined, now it has 2 callers ?
>
> People using policers on ingress are very often disabling GRO for
> various historical reasons.

Yes, tested policer and for getting it working on my side, GRO needs
to be off.

> Since handle_ing() is very uncommon, I would prefer not slowing down tx
> just so that GRO ~5% error in ingress is fixed.

I see a callq dc0 <qdisc_pkt_len_init> now. Ok, in that case Florian's
original version is fine:

   http://patchwork.ozlabs.org/patch/471346/

> I noticed you did not CC me, don't be afraid to include me next time one
> of my commit is referred in a changelog ;)

Hmm, I did Cc you. ;)

To:	davem@davemloft.net
Cc:	fw@strlen.de, ast@plumgrid.com, jhs@mojatatu.com,
	eric.dumazet@gmail.com, netdev@vger.kernel.org,
	daniel@iogearbox.net

Thanks,
Daniel
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet May 13, 2015, 9:38 p.m. UTC | #5
On Wed, 2015-05-13 at 23:33 +0200, Florian Westphal wrote:
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > > Commit d2788d34885d4ce5ba ("net: sched: further simplify handle_ing")
> > > removed the call to qdisc_enqueue_root(). However, after this removal
> > > we no longer set qdisc pkt length, which this patch fixes. We make use
> > > of qdisc_pkt_len_init() from 1def9238d4aa ("net_sched: more precise
> > > pkt_len computation"), which was suggested as the current approach so
> > > far does not take into account when GRO builds up GSO packets.
> > > 
> > > Fixes: d2788d34885d ("net: sched: further simplify handle_ing")
> > > Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com>
> > > Signed-off-by: Florian Westphal <fw@strlen.de>
> > > Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> > > ---
> > >  v1->v2:
> > >   - use qdisc_pkt_len_init() as suggested, thanks!
> > >
> > > @@ -3541,9 +3541,11 @@ static inline struct sk_buff *handle_ing(struct sk_buff *skb,
> > >  		*pt_prev = NULL;
> > >  	}
> > >  
> > > -	qdisc_bstats_update_cpu(cl->q, skb);
> > > +	qdisc_pkt_len_init(skb);
> [..]
> > 
> > Is qdisc_pkt_len_init() still inlined, now it has 2 callers ?
> 
> gcc 4.8.4, OPTIMIZE_FOR_SIZE=n: nope, callq.
> 
> > People using policers on ingress are very often disabling GRO for
> > various historical reasons.
> 
> right, ingress policing is rather poor with GRO (or you need to set mtu 64000
> on the tc police cmd line).
> 
> > Since handle_ing() is very uncommon, I would prefer not slowing down tx
> > just so that GRO ~5% error in ingress is fixed.
> 
> Hmm, so whats you suggestion?  Just use
> 
> qdisc_skb_cb(skb)->pkt_len = skb->len;
> 
> like in v1 of the patch set?
> resurrect qdisc_enqueue_root?
> set always_inline (i'd prefer not, its not very small)?
> 
> Any other idea?

One possibility would be to set it in GRO engine.

But then we would have to set qdisc_skb_cb(skb)->pkt_len = skb->len for
non GRO packets.

Also note the DODGY thing might be not needed for ingress,
and I am not sure we can currently deliver UDP aggregated packets.

Hmm....


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet May 13, 2015, 9:42 p.m. UTC | #6
On Wed, 2015-05-13 at 14:20 -0700, Eric Dumazet wrote:

> 
> I noticed you did not CC me, don't be afraid to include me next time one
> of my commit is referred in a changelog ;)

Sorry, I was wrong on this ;)




--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexei Starovoitov May 13, 2015, 10:27 p.m. UTC | #7
On 5/13/15 2:38 PM, Eric Dumazet wrote:
>
> One possibility would be to set it in GRO engine.
>
> But then we would have to set qdisc_skb_cb(skb)->pkt_len = skb->len for
> non GRO packets.
>
> Also note the DODGY thing might be not needed for ingress,
> and I am not sure we can currently deliver UDP aggregated packets.

yeah, current qdisc_pkt_len_init() is certainly TX oriented.
I'm not sure we can use tcp_hdrlen(skb) on RX which is called as
part of qdisc_pkt_len_init().
I think it's probably better to take Florian's V1 with:
qdisc_skb_cb(skb)->pkt_len = skb->len;
to fix immediate breakage of act police and then proceed further?
Like doing qdisc_skb_cb(skb)->pkt_len = skb->len inside GRO engine
is probably not possible, since cb is used by napi_gro_cb ?

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet May 13, 2015, 10:43 p.m. UTC | #8
On Wed, 2015-05-13 at 15:27 -0700, Alexei Starovoitov wrote:
> On 5/13/15 2:38 PM, Eric Dumazet wrote:
> >
> > One possibility would be to set it in GRO engine.
> >
> > But then we would have to set qdisc_skb_cb(skb)->pkt_len = skb->len for
> > non GRO packets.
> >
> > Also note the DODGY thing might be not needed for ingress,
> > and I am not sure we can currently deliver UDP aggregated packets.
> 
> yeah, current qdisc_pkt_len_init() is certainly TX oriented.
> I'm not sure we can use tcp_hdrlen(skb) on RX which is called as
> part of qdisc_pkt_len_init().

We can, because GRO leave transport header set at the right place.

> I think it's probably better to take Florian's V1 with:
> qdisc_skb_cb(skb)->pkt_len = skb->len;
> to fix immediate breakage of act police and then proceed further?

Yes.

> Like doing qdisc_skb_cb(skb)->pkt_len = skb->len inside GRO engine
> is probably not possible, since cb is used by napi_gro_cb ?

Everything is possible ;)

At the time tcp_gro_complete() is called, we certainly can overwrite 
NAPI_GRO_CB(skb)->frag0, with good documentation and BUILD_BUG_ON()
safet nets.




--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/core/dev.c b/net/core/dev.c
index 15e51a9..6e19521 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2787,8 +2787,8 @@  static void qdisc_pkt_len_init(struct sk_buff *skb)
 
 	qdisc_skb_cb(skb)->pkt_len = skb->len;
 
-	/* To get more precise estimation of bytes sent on wire,
-	 * we add to pkt_len the headers size of all segments
+	/* To get more precise estimation of bytes sent/received on
+	 * wire, we add to pkt_len the headers size of all segments.
 	 */
 	if (shinfo->gso_size)  {
 		unsigned int hdr_len;
@@ -3541,9 +3541,11 @@  static inline struct sk_buff *handle_ing(struct sk_buff *skb,
 		*pt_prev = NULL;
 	}
 
-	qdisc_bstats_update_cpu(cl->q, skb);
+	qdisc_pkt_len_init(skb);
 	skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_INGRESS);
 
+	qdisc_bstats_update_cpu(cl->q, skb);
+
 	switch (tc_classify(skb, cl, &cl_res)) {
 	case TC_ACT_OK:
 	case TC_ACT_RECLASSIFY: