diff mbox

[-next,2/3] net: sched: remove AT INGRESS/EGRESS

Message ID 1431679850-31896-3-git-send-email-fw@strlen.de
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Florian Westphal May 15, 2015, 8:50 a.m. UTC
act_mirred needs to know when it is invoked from ingress sched so it
knows that it has to push the l2 header back (which is already in place
if its called via the 'normal' egress path).

This is currently done via SET_TC_AT(), and explicit AT_INGRESS/EGRESS.
But some of this information is redundant.

The skb can only be in one of four states:

- FROM_INGRESS: skb was redirected via act_mirred and should be
                reinjected into ingress path
- FROM_EGRESS:  skb was redirected via act_mirred and needs to
                be transmitted via the device that the mirred action is
		attached to.

These (existing) two states are set by act_mirred, IFB driver is consumer.

The third state is 'zero', which means the skb has not been handled by
any part of the tc machinery (or has no "special properties" tc
needs to be aware of).

This adds the 4th skb_tc_state: TC_AT_INGRESS.

This is set when calling tc_classify in the ingress path.  The mirred
action uses this to decide the state of the cloned skb that it operates on:

original      clone
TC_AT_INGRESS TC_FROM_INGRESS
0             TC_FROM_EGRESS

We also remove the need to re-set tc_verd AT state in dev_queue_xmit.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/linux/skbuff.h       |  2 +-
 include/net/pkt_sched.h      | 12 ++++++++++++
 include/uapi/linux/pkt_cls.h |  4 ++--
 net/core/dev.c               |  6 ++----
 net/sched/act_mirred.c       |  8 +++-----
 5 files changed, 20 insertions(+), 12 deletions(-)

Comments

Alexei Starovoitov May 15, 2015, 4:23 p.m. UTC | #1
On Fri, May 15, 2015 at 10:50:49AM +0200, Florian Westphal wrote:
>  
>  enum skb_tc_state {
> +	TC_NO_STATE = 0, /* must be 0 */
> +
>  	/* set by act_mirred to tell IFB that skb needs to be ... */
>  	TC_FROM_INGRESS = 1, /* ... re-injected to local stack */
>  	TC_FROM_EGRESS = 2,  /* ... transmitted to device */
> +
> +	/* used by act_mirred to learn its called during skb rx processing
> +	 * and has to push back the (already pulled) l2 header.
> +	 */
> +	TC_AT_INGRESS = 3,
>  };
>  
...
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 0e7afef..802b9b9 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3071,9 +3071,6 @@ static int __dev_queue_xmit(struct sk_buff *skb, void *accel_priv)
>  	txq = netdev_pick_tx(dev, skb, accel_priv);
>  	q = rcu_dereference_bh(txq->qdisc);
>  
> -#ifdef CONFIG_NET_CLS_ACT
> -	skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_EGRESS);
> -#endif

I don't think it's a good change.
Squeezing 4 bits into 2 by losing AT_STACK condition, imo, is wrong.
Before ifb could differentiate AT_STACK and AT_EGRESS, but when
they're aliased we lose this information.
I think we should keep AT_STACK, AT_INGRESS, AT_EGRESS.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Florian Westphal May 15, 2015, 5:21 p.m. UTC | #2
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 0e7afef..802b9b9 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -3071,9 +3071,6 @@ static int __dev_queue_xmit(struct sk_buff *skb, void *accel_priv)
> >  	txq = netdev_pick_tx(dev, skb, accel_priv);
> >  	q = rcu_dereference_bh(txq->qdisc);
> >  
> > -#ifdef CONFIG_NET_CLS_ACT
> > -	skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_EGRESS);
> > -#endif
> 
> I don't think it's a good change.
> Squeezing 4 bits into 2 by losing AT_STACK condition, imo, is wrong.

Its right, IMO.

> Before ifb could differentiate AT_STACK and AT_EGRESS, but when
> they're aliased we lose this information.

IFB refuses to work with skbs that did not come in via mirred.
So, from ifb point of view it makes no difference, G_TC_FROM+AT_STACK
causes skb to be dropped and IFB doesn't care about G_TC_AT() at all.

So the change is compatible: zero state == "mirred did not see this skb"
-> drop

> I think we shmuld keep AT_STACK, AT_INGRESS, AT_EGRESS.

We have dozens of combinations that either cannot happen
from concept point of view (skb is AT_EGRESS or its coming in,
it can't be both logically) or because no code path uses/sets it.

For G_TC_AT(), we only have two cases: egress (since
everything is attached to qdisc / coming in via dev_queue_xmit thats
the "normal case", and the special ingress one, which is what the new
TC_AT_INGRESS state is for.  The ingress case is also "special" because
it can only turn back to 0 (no mirred action attached to ingress) or
change to TC_FROM_INGRESS (when mirred grabs it).

The existing G_TC_AT is is irrelevant for IFB since ifb is reachable only
called via ndo_start_xmit so its always at egress.

skb_tc_state enum just tells us what do to with the skb --
back to ingress or transmit via device.

AT_STACK cannot even happen for the G_TC_AT case from looking at the
code since dev_queue_xmit forces AT_EGRESS & rx sets AT_INGRESS.

And for FROM its only relevant in ifb to detect skb that wasn't seen by
mirred.  And we keep this functionality via 0 skb_tc_state.

I did a test patch with Jamals suggestion (continue using macros, just
close holes), but I see no gain: we get 5 bits that are used for
automaton that only has/cares about 4 possible (distinct) skb states.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexei Starovoitov May 15, 2015, 8:09 p.m. UTC | #3
On Fri, May 15, 2015 at 07:21:15PM +0200, Florian Westphal wrote:
> So, from ifb point of view it makes no difference, G_TC_FROM+AT_STACK
> causes skb to be dropped and IFB doesn't care about G_TC_AT() at all.

yes. your change is technically correct. It's not causing ifb regression,
but it removes information in a way that will be very hard to add it later.

> AT_STACK cannot even happen for the G_TC_AT case from looking at the
> code since dev_queue_xmit forces AT_EGRESS & rx sets AT_INGRESS.

yes, if we only consider ingress and egress hooks.
I want to use this stack/ingress/egress indication with socket filters.
If we make stack==egress, I would need to refactor this code all over again.
It's not broken today. You're doing this aliasing only two squeeze a bit.
That's why I'm saying keep the stack/ingress/egress flag as-is. It's useful.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Florian Westphal May 15, 2015, 10:22 p.m. UTC | #4
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> On Fri, May 15, 2015 at 07:21:15PM +0200, Florian Westphal wrote:
> > So, from ifb point of view it makes no difference, G_TC_FROM+AT_STACK
> > causes skb to be dropped and IFB doesn't care about G_TC_AT() at all.
> 
> yes. your change is technically correct. It's not causing ifb regression,

Thanks.

> but it removes information in a way that will be very hard to add it later.

Are you sure?  Would you mind elaborating a bit?

> > AT_STACK cannot even happen for the G_TC_AT case from looking at the
> > code since dev_queue_xmit forces AT_EGRESS & rx sets AT_INGRESS.
> 
> yes, if we only consider ingress and egress hooks.
> I want to use this stack/ingress/egress indication with socket filters.

Hmm... I'm sorry, I fail to understand where problem is.

> If we make stack==egress, I would need to refactor this code all over again.

If you mean "skb was not forwarded", you could just check for
skb->skb_iif = 0?

If not, what info do you need, and why could we not extend proposed enum
if absolutely required?

> It's not broken today. You're doing this aliasing only two squeeze a bit.

Yep, but its was also to minimize the state machinery down to whats
required.

Sorry Alexei, I'm just trying to find out what exactly is needed,
perhaps if you can clarify/explain I might be able to re-spin this in a
way that will meet your requirements.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jamal Hadi Salim May 15, 2015, 10:43 p.m. UTC | #5
On 05/15/15 16:09, Alexei Starovoitov wrote:
> On Fri, May 15, 2015 at 07:21:15PM +0200, Florian Westphal wrote:
>> So, from ifb point of view it makes no difference, G_TC_FROM+AT_STACK
>> causes skb to be dropped and IFB doesn't care about G_TC_AT() at all.
>
> yes. your change is technically correct. It's not causing ifb regression,
> but it removes information in a way that will be very hard to add it later.
>
>> AT_STACK cannot even happen for the G_TC_AT case from looking at the
>> code since dev_queue_xmit forces AT_EGRESS & rx sets AT_INGRESS.
>
> yes, if we only consider ingress and egress hooks.
> I want to use this stack/ingress/egress indication with socket filters.
> If we make stack==egress, I would need to refactor this code all over again.
> It's not broken today. You're doing this aliasing only two squeeze a bit.
> That's why I'm saying keep the stack/ingress/egress flag as-is. It's useful.
>

My point as well. Using ifb or mirred as examples is fine
but they are not the only potential consumers/producers. Using examples
as such is out of place when it is an architectural issue.
So i would rather this be left alone.

cheers,
jamal
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 4a1367e..906dc35 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -487,7 +487,7 @@  static inline u32 skb_mstamp_us_delta(const struct skb_mstamp *t1,
  *	@hash: the packet hash
  *	@queue_mapping: Queue mapping for multiqueue devices
  *	@xmit_more: More SKBs are pending for this queue
- *	@skb_tc_state: was mirrored (act_mirred)
+ *	@skb_tc_state: was mirrored (act_mirred) or is handled via sch_ingress
  *	@ndisc_nodetype: router type (from link layer)
  *	@ooo_okay: allow the mapping of a socket to a queue to be changed
  *	@l4_hash: indicate hash is a canonical 4-tuple hash over transport
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index a4c509d..63328cf 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -134,10 +134,22 @@  static inline unsigned int psched_mtu(const struct net_device *dev)
 	return dev->mtu + dev->hard_header_len;
 }
 
+/* traffic control processing state of the skb.
+ *
+ * This is mainly used by IFB driver, the 'mirred' action, and
+ * the ingress scheduler (sch_ingress).
+ */
 enum skb_tc_state {
+	TC_NO_STATE = 0, /* must be 0 */
+
 	/* set by act_mirred to tell IFB that skb needs to be ... */
 	TC_FROM_INGRESS = 1, /* ... re-injected to local stack */
 	TC_FROM_EGRESS = 2,  /* ... transmitted to device */
+
+	/* used by act_mirred to learn its called during skb rx processing
+	 * and has to push back the (already pulled) l2 header.
+	 */
+	TC_AT_INGRESS = 3,
 };
 
 #endif
diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 3308e89..271c788 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -56,10 +56,10 @@  bits 9,10,11: redirect counter -  redirect TTL. Loop avoidance
 #define G_TC_FROM(x)       _TC_GETVALUE(x,S_TC_FROM,M_TC_FROM)
 #define V_TC_FROM(x)       _TC_MAKEVALUE(x,S_TC_FROM)
 #define SET_TC_FROM(v,n)   ((V_TC_FROM(n)) | (v & ~M_TC_FROM))
-#endif
 #define AT_STACK	0x0
 #define AT_INGRESS	0x1
 #define AT_EGRESS	0x2
+#endif
 
 #define TC_NCLS          _TC_MAKEMASK1(8)
 #define SET_TC_NCLS(v)   ( TC_NCLS | (v & ~TC_NCLS))
@@ -71,13 +71,13 @@  bits 9,10,11: redirect counter -  redirect TTL. Loop avoidance
 #define G_TC_RTTL(x)       _TC_GETVALUE(x,S_TC_RTTL,M_TC_RTTL)
 #define V_TC_RTTL(x)       _TC_MAKEVALUE(x,S_TC_RTTL)
 #define SET_TC_RTTL(v,n)   ((V_TC_RTTL(n)) | (v & ~M_TC_RTTL))
-#endif
 
 #define S_TC_AT          _TC_MAKE32(12)
 #define M_TC_AT          _TC_MAKEMASK(2,S_TC_AT)
 #define G_TC_AT(x)       _TC_GETVALUE(x,S_TC_AT,M_TC_AT)
 #define V_TC_AT(x)       _TC_MAKEVALUE(x,S_TC_AT)
 #define SET_TC_AT(v,n)   ((V_TC_AT(n)) | (v & ~M_TC_AT))
+#endif
 
 /* Action attributes */
 enum {
diff --git a/net/core/dev.c b/net/core/dev.c
index 0e7afef..802b9b9 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3071,9 +3071,6 @@  static int __dev_queue_xmit(struct sk_buff *skb, void *accel_priv)
 	txq = netdev_pick_tx(dev, skb, accel_priv);
 	q = rcu_dereference_bh(txq->qdisc);
 
-#ifdef CONFIG_NET_CLS_ACT
-	skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_EGRESS);
-#endif
 	trace_net_dev_queue(skb);
 	if (q->enqueue) {
 		rc = __dev_xmit_skb(skb, q, dev, txq);
@@ -3648,7 +3645,7 @@  static inline struct sk_buff *handle_ing(struct sk_buff *skb,
 	}
 
 	qdisc_skb_cb(skb)->pkt_len = skb->len;
-	skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_INGRESS);
+	skb->skb_tc_state = TC_AT_INGRESS;
 	qdisc_bstats_update_cpu(cl->q, skb);
 
 	switch (tc_classify(skb, cl, &cl_res)) {
@@ -3665,6 +3662,7 @@  static inline struct sk_buff *handle_ing(struct sk_buff *skb,
 	default:
 		break;
 	}
+	skb->skb_tc_state = 0;
 
 	return skb;
 }
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 34d4320..b8d70de 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -131,7 +131,6 @@  static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
 	struct tcf_mirred *m = a->priv;
 	struct net_device *dev;
 	struct sk_buff *skb2;
-	u32 at;
 	int retval, err = 1;
 
 	spin_lock(&m->tcf_lock);
@@ -150,21 +149,20 @@  static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
 		goto out;
 	}
 
-	at = G_TC_AT(skb->tc_verd);
 	skb2 = skb_clone(skb, GFP_ATOMIC);
 	if (skb2 == NULL)
 		goto out;
 
-	if (!(at & AT_EGRESS)) {
+	if (skb->skb_tc_state == TC_AT_INGRESS) {
 		if (m->tcfm_ok_push)
 			skb_push(skb2, skb->mac_len);
 	}
 
 	/* mirror is always swallowed */
 	if (m->tcfm_eaction != TCA_EGRESS_MIRROR) {
-		if (at & AT_INGRESS)
+		if (skb->skb_tc_state == TC_AT_INGRESS)
 			skb2->skb_tc_state = TC_FROM_INGRESS;
-		else if (at & AT_EGRESS)
+		else
 			skb2->skb_tc_state = TC_FROM_EGRESS;
 	}
 	skb2->skb_iif = skb->dev->ifindex;