diff mbox series

[net-next,v3,4/5] net/tc: introduce TC_ACT_REINJECT.

Message ID 3c20787be0fd5d64728ffed46ae0a7dff10d7e05.1532437050.git.pabeni@redhat.com
State Changes Requested, archived
Delegated to: David Miller
Headers show
Series TC: refactor act_mirred packets re-injection | expand

Commit Message

Paolo Abeni July 24, 2018, 8:06 p.m. UTC
This is similar TC_ACT_REDIRECT, but with a slightly different
semantic:
- on ingress the mirred skbs are passed to the target device
network stack without any additional check not scrubbing.
- the rcu-protected stats provided via the tcf_result struct
  are updated on error conditions.

This new tcfa_action value is not exposed to the user-space
and can be used only internally by clsact.

v1 -> v2: do not touch TC_ACT_REDIRECT code path, introduce
 a new action type instead

v2 -> v3:
 - rename the new action value TC_ACT_REINJECT, update the
   helper accordingly
 - take care of uncloned reinjected packets in XDP generic
   hook

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/net/pkt_cls.h     |  3 +++
 include/net/sch_generic.h | 19 +++++++++++++++++++
 net/core/dev.c            |  6 +++++-
 3 files changed, 27 insertions(+), 1 deletion(-)

Comments

Cong Wang July 24, 2018, 8:38 p.m. UTC | #1
On Tue, Jul 24, 2018 at 1:07 PM Paolo Abeni <pabeni@redhat.com> wrote:
> +static inline void skb_tc_reinject(struct sk_buff *skb, struct tcf_result *res)
> +{
> +       struct gnet_stats_queue *stats = res->qstats;
> +       int ret;
> +
> +       if (res->ingress)
> +               ret = netif_receive_skb(skb);
> +       else
> +               ret = dev_queue_xmit(skb);
> +       if (ret && stats)
> +               qstats_overlimit_inc(res->qstats);

Why increasing overlimit? Overlimit is typically increased
by traffic shapers to indicate there is no bandwidth to send
out the packet.

I fail to understand why overlimit is increased in your case
here. I guess you want to increase 'drops' instead.
Cong Wang July 24, 2018, 8:50 p.m. UTC | #2
On Tue, Jul 24, 2018 at 1:38 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> On Tue, Jul 24, 2018 at 1:07 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > +static inline void skb_tc_reinject(struct sk_buff *skb, struct tcf_result *res)
> > +{
> > +       struct gnet_stats_queue *stats = res->qstats;
> > +       int ret;
> > +
> > +       if (res->ingress)
> > +               ret = netif_receive_skb(skb);
> > +       else
> > +               ret = dev_queue_xmit(skb);
> > +       if (ret && stats)
> > +               qstats_overlimit_inc(res->qstats);
>
> Why increasing overlimit? Overlimit is typically increased
> by traffic shapers to indicate there is no bandwidth to send
> out the packet.
>
> I fail to understand why overlimit is increased in your case
> here. I guess you want to increase 'drops' instead.

Hmm, actually the current mirred code increases overlimit too.
But I still don't think it makes sense.
Paolo Abeni July 25, 2018, 8:29 a.m. UTC | #3
On Tue, 2018-07-24 at 13:50 -0700, Cong Wang wrote:
> On Tue, Jul 24, 2018 at 1:38 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > 
> > On Tue, Jul 24, 2018 at 1:07 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > +static inline void skb_tc_reinject(struct sk_buff *skb, struct tcf_result *res)
> > > +{
> > > +       struct gnet_stats_queue *stats = res->qstats;
> > > +       int ret;
> > > +
> > > +       if (res->ingress)
> > > +               ret = netif_receive_skb(skb);
> > > +       else
> > > +               ret = dev_queue_xmit(skb);
> > > +       if (ret && stats)
> > > +               qstats_overlimit_inc(res->qstats);
> > 
> > Why increasing overlimit? Overlimit is typically increased
> > by traffic shapers to indicate there is no bandwidth to send
> > out the packet.
> > 
> > I fail to understand why overlimit is increased in your case
> > here. I guess you want to increase 'drops' instead.
> 
> Hmm, actually the current mirred code increases overlimit too.
> But I still don't think it makes sense.

Yep, I chose to increment 'overlimits' to preserve the current mirred
semantic. 

AFAICS, that was first introduced with:

commit 8919bc13e8d92c5b082c5c0321567383a071f5bc
Author: Jamal Hadi Salim <jhs@mojatatu.com>
Date:   Mon Aug 15 05:25:40 2011 +0000

    net_sched: fix port mirror/redirect stats reporting

Likely increasing 'drops' would be "better", but I'm unsure we can
change this established behavior without affecting some user.

Anyway, I'm fine either way. Please advice.

Cheers,

Paolo
Jamal Hadi Salim July 25, 2018, 12:16 p.m. UTC | #4
+Cc Shmulik

Paolo - please also run the tdc tests (and add anymore if you
feel they dont do coverage to your changes)

On 24/07/18 04:06 PM, Paolo Abeni wrote:
> This is similar TC_ACT_REDIRECT, but with a slightly different
> semantic:
> - on ingress the mirred skbs are passed to the target device
> network stack without any additional check not scrubbing.
> - the rcu-protected stats provided via the tcf_result struct
>    are updated on error conditions.
> 
> This new tcfa_action value is not exposed to the user-space
> and can be used only internally by clsact.
> 
> v1 -> v2: do not touch TC_ACT_REDIRECT code path, introduce
>   a new action type instead
> 
> v2 -> v3:
>   - rename the new action value TC_ACT_REINJECT, update the
>     helper accordingly
>   - take care of uncloned reinjected packets in XDP generic
>     hook
> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>   include/net/pkt_cls.h     |  3 +++
>   include/net/sch_generic.h | 19 +++++++++++++++++++
>   net/core/dev.c            |  6 +++++-
>   3 files changed, 27 insertions(+), 1 deletion(-)
> 
> diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
> index 2081e4219f81..36ccfe2a303a 100644
> --- a/include/net/pkt_cls.h
> +++ b/include/net/pkt_cls.h
> @@ -7,6 +7,9 @@
>   #include <net/sch_generic.h>
>   #include <net/act_api.h>
>   
> +/* TC action not accessible from user space */
> +#define TC_ACT_REINJECT		(TC_ACT_VALUE_MAX + 1)

Lets say in the future we add a new opcode.
Will old kernel, new iproute2 (new value) work?
Maybe use a negative number below -1.
I am honestly still unclear about introducing this new
code.
Could you not use the result code to carry sufficient
info to indicate the intent?

cheers,
jamal

> +
>   /* Basic packet classifier frontend definitions. */
>   
>   struct tcf_walker {
> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> index 056dc1083aa3..95e81a70f549 100644
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -235,6 +235,12 @@ struct tcf_result {
>   			u32		classid;
>   		};
>   		const struct tcf_proto *goto_tp;
> +
> +		/* used by the TC_ACT_REINJECT action */
> +		struct {
> +			bool		ingress;
> +			struct gnet_stats_queue *qstats;
> +		};
>   	};
>   };
>   
> @@ -1091,4 +1097,17 @@ void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,
>   void mini_qdisc_pair_init(struct mini_Qdisc_pair *miniqp, struct Qdisc *qdisc,
>   			  struct mini_Qdisc __rcu **p_miniq);
>   
> +static inline void skb_tc_reinject(struct sk_buff *skb, struct tcf_result *res)
> +{
> +	struct gnet_stats_queue *stats = res->qstats;
> +	int ret;
> +
> +	if (res->ingress)
> +		ret = netif_receive_skb(skb);
> +	else
> +		ret = dev_queue_xmit(skb);
> +	if (ret && stats)
> +		qstats_overlimit_inc(res->qstats);
> +}
> +
>   #endif
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 14a748ee8cc9..826ec74fe1d9 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4252,7 +4252,7 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
>   	/* Reinjected packets coming from act_mirred or similar should
>   	 * not get XDP generic processing.
>   	 */
> -	if (skb_cloned(skb))
> +	if (skb_cloned(skb) || skb->tc_redirected)
>   		return XDP_PASS;
>   
>   	/* XDP packets must be linear and must have sufficient headroom
> @@ -4602,6 +4602,10 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
>   		__skb_push(skb, skb->mac_len);
>   		skb_do_redirect(skb);
>   		return NULL;
> +	case TC_ACT_REINJECT:
> +		/* this does not scrub the packet, and updates stats on error */
> +		skb_tc_reinject(skb, &cl_res);
> +		return NULL;
>   	default:
>   		break;
>   	}
>
Jamal Hadi Salim July 25, 2018, 12:27 p.m. UTC | #5
On 25/07/18 04:29 AM, Paolo Abeni wrote:
> On Tue, 2018-07-24 at 13:50 -0700, Cong Wang wrote:

[..]
>>> I fail to understand why overlimit is increased in your case
>>> here. I guess you want to increase 'drops' instead.
>>
>> Hmm, actually the current mirred code increases overlimit too.
>> But I still don't think it makes sense.
> 
> Yep, I chose to increment 'overlimits' to preserve the current mirred
> semantic.
> 
> AFAICS, that was first introduced with:
> 
> commit 8919bc13e8d92c5b082c5c0321567383a071f5bc
> Author: Jamal Hadi Salim <jhs@mojatatu.com>
> Date:   Mon Aug 15 05:25:40 2011 +0000
> 
>      net_sched: fix port mirror/redirect stats reporting
> 
> Likely increasing 'drops' would be "better", but I'm unsure we can
> change this established behavior without affecting some user.
> 

Those changes were there from the beginning (above patch did
not introduce them).
IIRC, the reason was to distinguish between policy intended
drops and drops because of errors.

cheers,
jamal
Jiri Pirko July 25, 2018, 12:57 p.m. UTC | #6
Tue, Jul 24, 2018 at 10:06:42PM CEST, pabeni@redhat.com wrote:
>This is similar TC_ACT_REDIRECT, but with a slightly different
>semantic:
>- on ingress the mirred skbs are passed to the target device
>network stack without any additional check not scrubbing.
>- the rcu-protected stats provided via the tcf_result struct
>  are updated on error conditions.
>
>This new tcfa_action value is not exposed to the user-space
>and can be used only internally by clsact.
>
>v1 -> v2: do not touch TC_ACT_REDIRECT code path, introduce
> a new action type instead
>
>v2 -> v3:
> - rename the new action value TC_ACT_REINJECT, update the
>   helper accordingly
> - take care of uncloned reinjected packets in XDP generic
>   hook
>
>Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>---
> include/net/pkt_cls.h     |  3 +++
> include/net/sch_generic.h | 19 +++++++++++++++++++
> net/core/dev.c            |  6 +++++-
> 3 files changed, 27 insertions(+), 1 deletion(-)
>
>diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
>index 2081e4219f81..36ccfe2a303a 100644
>--- a/include/net/pkt_cls.h
>+++ b/include/net/pkt_cls.h
>@@ -7,6 +7,9 @@
> #include <net/sch_generic.h>
> #include <net/act_api.h>
> 
>+/* TC action not accessible from user space */
>+#define TC_ACT_REINJECT		(TC_ACT_VALUE_MAX + 1)
>+
> /* Basic packet classifier frontend definitions. */
> 
> struct tcf_walker {
>diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
>index 056dc1083aa3..95e81a70f549 100644
>--- a/include/net/sch_generic.h
>+++ b/include/net/sch_generic.h
>@@ -235,6 +235,12 @@ struct tcf_result {
> 			u32		classid;
> 		};
> 		const struct tcf_proto *goto_tp;
>+
>+		/* used by the TC_ACT_REINJECT action */
>+		struct {
>+			bool		ingress;
>+			struct gnet_stats_queue *qstats;
>+		};
> 	};
> };
> 
>@@ -1091,4 +1097,17 @@ void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,
> void mini_qdisc_pair_init(struct mini_Qdisc_pair *miniqp, struct Qdisc *qdisc,
> 			  struct mini_Qdisc __rcu **p_miniq);
> 
>+static inline void skb_tc_reinject(struct sk_buff *skb, struct tcf_result *res)
>+{
>+	struct gnet_stats_queue *stats = res->qstats;
>+	int ret;
>+
>+	if (res->ingress)
>+		ret = netif_receive_skb(skb);
>+	else
>+		ret = dev_queue_xmit(skb);

Hmm. "reinject" by the name tells me that the packet should be injected
again. By "inject", I understand beginning of the rx path. However, this
does xmit as well :/ It is a bit misleading. Maybe "reinsert" would
sound better?



>+	if (ret && stats)
>+		qstats_overlimit_inc(res->qstats);
>+}
>+
> #endif
>diff --git a/net/core/dev.c b/net/core/dev.c
>index 14a748ee8cc9..826ec74fe1d9 100644
>--- a/net/core/dev.c
>+++ b/net/core/dev.c
>@@ -4252,7 +4252,7 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
> 	/* Reinjected packets coming from act_mirred or similar should
> 	 * not get XDP generic processing.
> 	 */
>-	if (skb_cloned(skb))
>+	if (skb_cloned(skb) || skb->tc_redirected)
> 		return XDP_PASS;
> 
> 	/* XDP packets must be linear and must have sufficient headroom
>@@ -4602,6 +4602,10 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
> 		__skb_push(skb, skb->mac_len);
> 		skb_do_redirect(skb);
> 		return NULL;
>+	case TC_ACT_REINJECT:
>+		/* this does not scrub the packet, and updates stats on error */
>+		skb_tc_reinject(skb, &cl_res);
>+		return NULL;
> 	default:
> 		break;
> 	}
>-- 
>2.17.1
>
Jiri Pirko July 25, 2018, 12:59 p.m. UTC | #7
Wed, Jul 25, 2018 at 02:16:12PM CEST, jhs@mojatatu.com wrote:

[...]

>> diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
>> index 2081e4219f81..36ccfe2a303a 100644
>> --- a/include/net/pkt_cls.h
>> +++ b/include/net/pkt_cls.h
>> @@ -7,6 +7,9 @@
>>   #include <net/sch_generic.h>
>>   #include <net/act_api.h>
>> +/* TC action not accessible from user space */
>> +#define TC_ACT_REINJECT		(TC_ACT_VALUE_MAX + 1)
>
>Lets say in the future we add a new opcode.
>Will old kernel, new iproute2 (new value) work?

This is safe. See patch #2.
Paolo Abeni July 25, 2018, 1:55 p.m. UTC | #8
Hi,

On Wed, 2018-07-25 at 08:16 -0400, Jamal Hadi Salim wrote:
> +Cc Shmulik
> 
> Paolo - please also run the tdc tests (and add anymore if you
> feel they dont do coverage to your changes)

I run successfully tdc tests on a patched before posting. I plan to
rerun them before posting the v4.

> On 24/07/18 04:06 PM, Paolo Abeni wrote:
> > This is similar TC_ACT_REDIRECT, but with a slightly different
> > semantic:
> > - on ingress the mirred skbs are passed to the target device
> > network stack without any additional check not scrubbing.
> > - the rcu-protected stats provided via the tcf_result struct
> >    are updated on error conditions.
> > 
> > This new tcfa_action value is not exposed to the user-space
> > and can be used only internally by clsact.
> > 
> > v1 -> v2: do not touch TC_ACT_REDIRECT code path, introduce
> >   a new action type instead
> > 
> > v2 -> v3:
> >   - rename the new action value TC_ACT_REINJECT, update the
> >     helper accordingly
> >   - take care of uncloned reinjected packets in XDP generic
> >     hook
> > 
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> >   include/net/pkt_cls.h     |  3 +++
> >   include/net/sch_generic.h | 19 +++++++++++++++++++
> >   net/core/dev.c            |  6 +++++-
> >   3 files changed, 27 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
> > index 2081e4219f81..36ccfe2a303a 100644
> > --- a/include/net/pkt_cls.h
> > +++ b/include/net/pkt_cls.h
> > @@ -7,6 +7,9 @@
> >   #include <net/sch_generic.h>
> >   #include <net/act_api.h>
> >   
> > +/* TC action not accessible from user space */
> > +#define TC_ACT_REINJECT		(TC_ACT_VALUE_MAX + 1)
> 
> Lets say in the future we add a new opcode.
> Will old kernel, new iproute2 (new value) work?

It will works as it currently does in similar situation: opcode unknown
to the kernel are treated as TC_ACT_UNSPEC even now. 

Cheers,

Paolo
Paolo Abeni July 25, 2018, 2:24 p.m. UTC | #9
On Wed, 2018-07-25 at 08:27 -0400, Jamal Hadi Salim wrote:
> On 25/07/18 04:29 AM, Paolo Abeni wrote:
> > On Tue, 2018-07-24 at 13:50 -0700, Cong Wang wrote:
> 
> [..]
> > > > I fail to understand why overlimit is increased in your case
> > > > here. I guess you want to increase 'drops' instead.
> > > 
> > > Hmm, actually the current mirred code increases overlimit too.
> > > But I still don't think it makes sense.
> > 
> > Yep, I chose to increment 'overlimits' to preserve the current mirred
> > semantic.
> > 
> > AFAICS, that was first introduced with:
> > 
> > commit 8919bc13e8d92c5b082c5c0321567383a071f5bc
> > Author: Jamal Hadi Salim <jhs@mojatatu.com>
> > Date:   Mon Aug 15 05:25:40 2011 +0000
> > 
> >      net_sched: fix port mirror/redirect stats reporting
> > 
> > Likely increasing 'drops' would be "better", but I'm unsure we can
> > change this established behavior without affecting some user.
> > 
> 
> Those changes were there from the beginning (above patch did
> not introduce them).
> IIRC, the reason was to distinguish between policy intended
> drops and drops because of errors.

Double-checking to avoid misinterepration on my side: you are ok with
keeping the 'overlimits' increment, right?

Thanks,

Paolo
Jamal Hadi Salim July 25, 2018, 3:26 p.m. UTC | #10
On 25/07/18 10:24 AM, Paolo Abeni wrote:
> On Wed, 2018-07-25 at 08:27 -0400, Jamal Hadi Salim wrote:

>> Those changes were there from the beginning (above patch did
>> not introduce them).
>> IIRC, the reason was to distinguish between policy intended
>> drops and drops because of errors.
> 
> Double-checking to avoid misinterepration on my side: you are ok with
> keeping the 'overlimits' increment, right?

Yes.

cheers,
jamal
Cong Wang July 25, 2018, 4:48 p.m. UTC | #11
On Wed, Jul 25, 2018 at 5:27 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> Those changes were there from the beginning (above patch did
> not introduce them).
> IIRC, the reason was to distinguish between policy intended
> drops and drops because of errors.

There must be a limit for "overlimit" to make sense. There is
no limit in mirred action's context, probably there is only
such a limit in act_police. So, all rest should not touch overlimit.
Marcelo Ricardo Leitner July 25, 2018, 5:09 p.m. UTC | #12
On Wed, Jul 25, 2018 at 09:48:16AM -0700, Cong Wang wrote:
> On Wed, Jul 25, 2018 at 5:27 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> >
> > Those changes were there from the beginning (above patch did
> > not introduce them).
> > IIRC, the reason was to distinguish between policy intended
> > drops and drops because of errors.
> 
> There must be a limit for "overlimit" to make sense. There is
> no limit in mirred action's context, probably there is only
> such a limit in act_police. So, all rest should not touch overlimit.

+1
Jamal Hadi Salim July 26, 2018, 12:52 p.m. UTC | #13
On 25/07/18 01:09 PM, Marcelo Ricardo Leitner wrote:
> On Wed, Jul 25, 2018 at 09:48:16AM -0700, Cong Wang wrote:
>> On Wed, Jul 25, 2018 at 5:27 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>>>
>>> Those changes were there from the beginning (above patch did
>>> not introduce them).
>>> IIRC, the reason was to distinguish between policy intended
>>> drops and drops because of errors.
>>
>> There must be a limit for "overlimit" to make sense. There is
>> no limit in mirred action's context, probably there is only
>> such a limit in act_police. So, all rest should not touch overlimit.
> 
> +1
> 

I agree we should at least record drop count(unrelated patch though).
we should keep overlimit (for no other reason other than this
has been around for at least 15 years).

On why "overlimit"? It is just a name for a counter that is useless
for most actions (but was still being transfered to user space).
It is the closest counter to counting "this failed because of
runtime errors" as opposed to "user asked us to drop this".

Probably a good alternative is to make a very small stats v3 structure
(we have migrated stats structures before) and extend for
each action/classifier/qdisc to add its extra counters using XSTATS.

Note:
If you are _mirroring_ packets - incrementing the drop counter is
misleading because the packet is not dropped by the system.
i.e the qdisc will not record it as dropped; it should for
redirect policy.  It is useful to be able to tell
them apart when you are collecting  analytics just for actions.
(if youve worked on a massive amount of machines you'll appreciate
being able to debug by looking at counters that reduce ambiguity).

cheers,
jamal
Cong Wang July 26, 2018, 11:25 p.m. UTC | #14
On Thu, Jul 26, 2018 at 5:52 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> On 25/07/18 01:09 PM, Marcelo Ricardo Leitner wrote:
> > On Wed, Jul 25, 2018 at 09:48:16AM -0700, Cong Wang wrote:
> >> On Wed, Jul 25, 2018 at 5:27 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> >>>
> >>> Those changes were there from the beginning (above patch did
> >>> not introduce them).
> >>> IIRC, the reason was to distinguish between policy intended
> >>> drops and drops because of errors.
> >>
> >> There must be a limit for "overlimit" to make sense. There is
> >> no limit in mirred action's context, probably there is only
> >> such a limit in act_police. So, all rest should not touch overlimit.
> >
> > +1
> >
>
> I agree we should at least record drop count(unrelated patch though).
> we should keep overlimit (for no other reason other than this
> has been around for at least 15 years).
>
> On why "overlimit"? It is just a name for a counter that is useless
> for most actions (but was still being transfered to user space).
> It is the closest counter to counting "this failed because of
> runtime errors" as opposed to "user asked us to drop this".
>
> Probably a good alternative is to make a very small stats v3 structure
> (we have migrated stats structures before) and extend for
> each action/classifier/qdisc to add its extra counters using XSTATS.

Agreed.

>
> Note:
> If you are _mirroring_ packets - incrementing the drop counter is
> misleading because the packet is not dropped by the system.
> i.e the qdisc will not record it as dropped; it should for
> redirect policy.  It is useful to be able to tell
> them apart when you are collecting  analytics just for actions.

Sounds like we just need another counter rather than re-using overlimit
or drops.


> (if youve worked on a massive amount of machines you'll appreciate
> being able to debug by looking at counters that reduce ambiguity).
>

Yes, this is how I found out the overlimit of htb qdisc is inaccurate
or misleading, instead the one in htb class is accurate, see:

commit 3c75f6ee139d464351f8ab77a042dd3635769d98
Author: Eric Dumazet <edumazet@google.com>
Date:   Mon Sep 18 12:36:22 2017 -0700

    net_sched: sch_htb: add per class overlimits counter
diff mbox series

Patch

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 2081e4219f81..36ccfe2a303a 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -7,6 +7,9 @@ 
 #include <net/sch_generic.h>
 #include <net/act_api.h>
 
+/* TC action not accessible from user space */
+#define TC_ACT_REINJECT		(TC_ACT_VALUE_MAX + 1)
+
 /* Basic packet classifier frontend definitions. */
 
 struct tcf_walker {
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 056dc1083aa3..95e81a70f549 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -235,6 +235,12 @@  struct tcf_result {
 			u32		classid;
 		};
 		const struct tcf_proto *goto_tp;
+
+		/* used by the TC_ACT_REINJECT action */
+		struct {
+			bool		ingress;
+			struct gnet_stats_queue *qstats;
+		};
 	};
 };
 
@@ -1091,4 +1097,17 @@  void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,
 void mini_qdisc_pair_init(struct mini_Qdisc_pair *miniqp, struct Qdisc *qdisc,
 			  struct mini_Qdisc __rcu **p_miniq);
 
+static inline void skb_tc_reinject(struct sk_buff *skb, struct tcf_result *res)
+{
+	struct gnet_stats_queue *stats = res->qstats;
+	int ret;
+
+	if (res->ingress)
+		ret = netif_receive_skb(skb);
+	else
+		ret = dev_queue_xmit(skb);
+	if (ret && stats)
+		qstats_overlimit_inc(res->qstats);
+}
+
 #endif
diff --git a/net/core/dev.c b/net/core/dev.c
index 14a748ee8cc9..826ec74fe1d9 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4252,7 +4252,7 @@  static u32 netif_receive_generic_xdp(struct sk_buff *skb,
 	/* Reinjected packets coming from act_mirred or similar should
 	 * not get XDP generic processing.
 	 */
-	if (skb_cloned(skb))
+	if (skb_cloned(skb) || skb->tc_redirected)
 		return XDP_PASS;
 
 	/* XDP packets must be linear and must have sufficient headroom
@@ -4602,6 +4602,10 @@  sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
 		__skb_push(skb, skb->mac_len);
 		skb_do_redirect(skb);
 		return NULL;
+	case TC_ACT_REINJECT:
+		/* this does not scrub the packet, and updates stats on error */
+		skb_tc_reinject(skb, &cl_res);
+		return NULL;
 	default:
 		break;
 	}