diff mbox series

[net-next,4/4] act_mirred: use ACT_REDIRECT when possible

Message ID 95034fe41fa132b2216686eeb41deaefd997e85b.1531473946.git.pabeni@redhat.com
State Changes Requested, archived
Delegated to: David Miller
Headers show
Series TC: refactor TC_ACT_REDIRECT action | expand

Commit Message

Paolo Abeni July 13, 2018, 9:55 a.m. UTC
When mirred is invoked from the ingress path, and it wants to redirect
the processed packet, it can now use the ACT_REDIRECT action,
filling the tcf_result accordingly.

This avoids a skb_clone() in the TC S/W data path giving a ~10%
improvement in forwarding performances. Overall TC S/W performances
are now comparable to the kernel openswitch datapath.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/sched/act_mirred.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

Comments

Cong Wang July 16, 2018, 11:39 p.m. UTC | #1
On Fri, Jul 13, 2018 at 2:55 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> When mirred is invoked from the ingress path, and it wants to redirect
> the processed packet, it can now use the ACT_REDIRECT action,
> filling the tcf_result accordingly.
>
> This avoids a skb_clone() in the TC S/W data path giving a ~10%
> improvement in forwarding performances. Overall TC S/W performances
> are now comparable to the kernel openswitch datapath.

Avoiding skb_clone() for redirection is cool, but why need to use
skb_do_redirect() here?

There is a subtle difference here:

skb_do_redirect() calls __bpf_rx_skb() which calls
dev_forward_skb().

while the current mirred action doesn't scrub packets when
redirecting to ingress (from egress). Although I forget if it is
intentionally.

Also, skb->skb_iif is unset in skb_do_redirect() when
redirecting to ingress, I recall we have to set it correctly
for input routing. Probably yet another reason why we
can't scrub it, unless my memory goes wrong. :)

Thanks!
Eyal Birger July 17, 2018, 7:01 a.m. UTC | #2
Hi,

On Mon, 16 Jul 2018 16:39:55 -0700
Cong Wang <xiyou.wangcong@gmail.com> wrote:

> On Fri, Jul 13, 2018 at 2:55 AM Paolo Abeni <pabeni@redhat.com> wrote:
> >
> > When mirred is invoked from the ingress path, and it wants to
> > redirect the processed packet, it can now use the ACT_REDIRECT
> > action, filling the tcf_result accordingly.
> >
> > This avoids a skb_clone() in the TC S/W data path giving a ~10%
> > improvement in forwarding performances. Overall TC S/W performances
> > are now comparable to the kernel openswitch datapath.  
> 
> Avoiding skb_clone() for redirection is cool, but why need to use
> skb_do_redirect() here?
> 
> There is a subtle difference here:
> 
> skb_do_redirect() calls __bpf_rx_skb() which calls
> dev_forward_skb().
> 
> while the current mirred action doesn't scrub packets when
> redirecting to ingress (from egress). Although I forget if it is
> intentionally.
> 
> Also, skb->skb_iif is unset in skb_do_redirect() when
> redirecting to ingress, I recall we have to set it correctly
> for input routing. Probably yet another reason why we
> can't scrub it, unless my memory goes wrong. :)

Also dev_forward_skb() enforces MTU checks on the packet which are not
done at the moment. I am aware of deployments where this would break
things.

Eyal.
Paolo Abeni July 17, 2018, 9:15 a.m. UTC | #3
Hi,

On Mon, 2018-07-16 at 16:39 -0700, Cong Wang wrote:
> On Fri, Jul 13, 2018 at 2:55 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > 
> > When mirred is invoked from the ingress path, and it wants to redirect
> > the processed packet, it can now use the ACT_REDIRECT action,
> > filling the tcf_result accordingly.
> > 
> > This avoids a skb_clone() in the TC S/W data path giving a ~10%
> > improvement in forwarding performances. Overall TC S/W performances
> > are now comparable to the kernel openswitch datapath.

Thank you for the feedback.

> Avoiding skb_clone() for redirection is cool, but why need to use
> skb_do_redirect() here?

Well, it's not needed. I tried to reduce code duplication, and I tried
to avoid adding another TC_ACT_* value.

> There is a subtle difference here:
> 
> skb_do_redirect() calls __bpf_rx_skb() which calls
> dev_forward_skb().
>
> while the current mirred action doesn't scrub packets when
> redirecting to ingress (from egress). Although I forget if it is
> intentionally.

Understood.

A possible option out of this issues would be adding another action
value - TC_ACT_MIRRED ? - and handle it in sch_handle_egress()[1] with
the appropriate semantic. That should address also Daniel and Eyal
concerns.

Would you consider the above acceptable?

Thanks,

Paolo
Dave Taht July 17, 2018, 9:38 a.m. UTC | #4
On Tue, Jul 17, 2018 at 2:16 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> Hi,
>
> On Mon, 2018-07-16 at 16:39 -0700, Cong Wang wrote:
> > On Fri, Jul 13, 2018 at 2:55 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > >
> > > When mirred is invoked from the ingress path, and it wants to redirect
> > > the processed packet, it can now use the ACT_REDIRECT action,
> > > filling the tcf_result accordingly.
> > >
> > > This avoids a skb_clone() in the TC S/W data path giving a ~10%
> > > improvement in forwarding performances. Overall TC S/W performances
> > > are now comparable to the kernel openswitch datapath.
>
> Thank you for the feedback.
>
> > Avoiding skb_clone() for redirection is cool, but why need to use
> > skb_do_redirect() here?
>
> Well, it's not needed. I tried to reduce code duplication, and I tried
> to avoid adding another TC_ACT_* value.
>
> > There is a subtle difference here:
> >
> > skb_do_redirect() calls __bpf_rx_skb() which calls
> > dev_forward_skb().
> >
> > while the current mirred action doesn't scrub packets when
> > redirecting to ingress (from egress). Although I forget if it is
> > intentionally.
>
> Understood.
>
> A possible option out of this issues would be adding another action
> value - TC_ACT_MIRRED ? - and handle it in sch_handle_egress()[1] with
> the appropriate semantic. That should address also Daniel and Eyal
> concerns.
>
> Would you consider the above acceptable?

Our dream has been to be able to specify a 1 liner:

tc qdisc add dev eth0 ingress cake bandwidth 200mbit besteffort wash

and be done with it, eliminating ifb, mirred, etc, entirely.

(I am otherwise delighted to see anything appear that makes inbound
shaping cheaper along the lines of this patchset)
>
> Thanks,
>
> Paolo
>
>
Cong Wang July 17, 2018, 5:24 p.m. UTC | #5
On Tue, Jul 17, 2018 at 2:16 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> Hi,
>
> On Mon, 2018-07-16 at 16:39 -0700, Cong Wang wrote:
> > On Fri, Jul 13, 2018 at 2:55 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > >
> > > When mirred is invoked from the ingress path, and it wants to redirect
> > > the processed packet, it can now use the ACT_REDIRECT action,
> > > filling the tcf_result accordingly.
> > >
> > > This avoids a skb_clone() in the TC S/W data path giving a ~10%
> > > improvement in forwarding performances. Overall TC S/W performances
> > > are now comparable to the kernel openswitch datapath.
>
> Thank you for the feedback.
>
> > Avoiding skb_clone() for redirection is cool, but why need to use
> > skb_do_redirect() here?
>
> Well, it's not needed. I tried to reduce code duplication, and I tried
> to avoid adding another TC_ACT_* value.


If you consider dev_forward_skb(), it is not a duplication.


>
> > There is a subtle difference here:
> >
> > skb_do_redirect() calls __bpf_rx_skb() which calls
> > dev_forward_skb().
> >
> > while the current mirred action doesn't scrub packets when
> > redirecting to ingress (from egress). Although I forget if it is
> > intentionally.
>
> Understood.
>
> A possible option out of this issues would be adding another action
> value - TC_ACT_MIRRED ? - and handle it in sch_handle_egress()[1] with
> the appropriate semantic. That should address also Daniel and Eyal
> concerns.
>
> Would you consider the above acceptable?

If you goal is to get rid of skb_clone(), why not just do the following?

        if (tcf_mirred_is_act_redirect(m_eaction)) {
                skb2 = skb;
        } else {
                skb2 = skb_clone(skb, GFP_ATOMIC);
                if (!skb2)
                        goto out;
        }

For redirect, we return TC_ACT_SHOT, so upper layer should not
touch the skb after that.

What am I missing here?
Paolo Abeni July 18, 2018, 10:05 a.m. UTC | #6
Hi,

On Tue, 2018-07-17 at 10:24 -0700, Cong Wang wrote:
> If you goal is to get rid of skb_clone(), why not just do the following?
> 
>         if (tcf_mirred_is_act_redirect(m_eaction)) {
>                 skb2 = skb;
>         } else {
>                 skb2 = skb_clone(skb, GFP_ATOMIC);
>                 if (!skb2)
>                         goto out;
>         }
> 
> For redirect, we return TC_ACT_SHOT, so upper layer should not
> touch the skb after that.
> 
> What am I missing here?

With ACT_SHOT caller/upper layer will free the skb, too. We will have
an use after free (from either the upper layer and the xmit device).
Similar issues with STOLEN, TRAP, etc.

In the past, Changli Gao attempted to avoid the clone incrementing the
skb usage count:

commit 210d6de78c5d7c785fc532556cea340e517955e1
Author: Changli Gao <xiaosuo@gmail.com>
Date:   Thu Jun 24 16:25:12 2010 +0000

    act_mirred: don't clone skb when skb isn't shared

but some/many device drivers expect an skb usage count of 1, and that
caused ooops and was revered.

I think the only other option (beyond re-using ACT_MIRROR) is adding
another action value, and let the upper layer re-inject the packet
while handling such action (similar to what ACT_MIRROR currently does,
but preserving the current mirred semantic).

Cheers,

Paolo
kernel test robot July 19, 2018, 5:16 p.m. UTC | #7
Hi Paolo,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Paolo-Abeni/TC-refactor-TC_ACT_REDIRECT-action/20180716-011055
reproduce:
        # apt-get install sparse
        make ARCH=x86_64 allmodconfig
        make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

>> net/sched/act_mirred.c:195:17: sparse: dubious: x | !y
   net/sched/act_mirred.c:260:23: sparse: expression using sizeof(void)
   net/sched/act_mirred.c:260:23: sparse: expression using sizeof(void)

vim +195 net/sched/act_mirred.c

   169	
   170	static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
   171			      struct tcf_result *res)
   172	{
   173		struct tcf_mirred *m = to_mirred(a);
   174		bool m_mac_header_xmit;
   175		struct net_device *dev;
   176		struct sk_buff *skb2;
   177		int retval, err = 0;
   178		bool want_ingress;
   179		int m_eaction;
   180		int mac_len;
   181	
   182		tcf_lastuse_update(&m->tcf_tm);
   183		bstats_cpu_update(this_cpu_ptr(m->common.cpu_bstats), skb);
   184	
   185		m_mac_header_xmit = READ_ONCE(m->tcfm_mac_header_xmit);
   186		m_eaction = READ_ONCE(m->tcfm_eaction);
   187		retval = READ_ONCE(m->tcf_action);
   188		dev = rcu_dereference_bh(m->tcfm_dev);
   189		want_ingress = tcf_mirred_act_wants_ingress(m_eaction);
   190		if (skb_at_tc_ingress(skb) && tcf_mirred_is_act_redirect(m_eaction)) {
   191			skb->tc_redirected = 1;
   192			skb->tc_from_ingress = 1;
   193	
   194			/* the core redirect code will check dev and its status */
 > 195			TCF_RESULT_SET_REDIRECT(res, dev, want_ingress);
   196			res->qstats = this_cpu_ptr(m->common.cpu_qstats);
   197			return TC_ACT_REDIRECT;
   198		}
   199	
   200		if (unlikely(!dev)) {
   201			pr_notice_once("tc mirred: target device is gone\n");
   202			goto out;
   203		}
   204	
   205		if (unlikely(!(dev->flags & IFF_UP))) {
   206			net_notice_ratelimited("tc mirred to Houston: device %s is down\n",
   207					       dev->name);
   208			goto out;
   209		}
   210	
   211		skb2 = skb_clone(skb, GFP_ATOMIC);
   212		if (!skb2)
   213			goto out;
   214	
   215		/* If action's target direction differs than filter's direction,
   216		 * and devices expect a mac header on xmit, then mac push/pull is
   217		 * needed.
   218		 */
   219		if (skb_at_tc_ingress(skb) != want_ingress && m_mac_header_xmit) {
   220			if (!skb_at_tc_ingress(skb)) {
   221				/* caught at egress, act ingress: pull mac */
   222				mac_len = skb_network_header(skb) - skb_mac_header(skb);
   223				skb_pull_rcsum(skb2, mac_len);
   224			} else {
   225				/* caught at ingress, act egress: push mac */
   226				skb_push_rcsum(skb2, skb->mac_len);
   227			}
   228		}
   229	
   230		/* mirror is always swallowed */
   231		if (tcf_mirred_is_act_redirect(m_eaction)) {
   232			skb2->tc_redirected = 1;
   233			skb2->tc_from_ingress = skb2->tc_at_ingress;
   234		}
   235	
   236		skb2->skb_iif = skb->dev->ifindex;
   237		skb2->dev = dev;
   238		if (!tcf_mirred_act_wants_ingress(m_eaction))
   239			err = dev_queue_xmit(skb2);
   240		else
   241			err = netif_receive_skb(skb2);
   242	
   243		if (err) {
   244	out:
   245			qstats_overlimit_inc(this_cpu_ptr(m->common.cpu_qstats));
   246			if (tcf_mirred_is_act_redirect(m_eaction))
   247				retval = TC_ACT_SHOT;
   248		}
   249	
   250		return retval;
   251	}
   252	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
Cong Wang July 19, 2018, 5:56 p.m. UTC | #8
On Wed, Jul 18, 2018 at 3:05 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> Hi,
>
> On Tue, 2018-07-17 at 10:24 -0700, Cong Wang wrote:
> > If you goal is to get rid of skb_clone(), why not just do the following?
> >
> >         if (tcf_mirred_is_act_redirect(m_eaction)) {
> >                 skb2 = skb;
> >         } else {
> >                 skb2 = skb_clone(skb, GFP_ATOMIC);
> >                 if (!skb2)
> >                         goto out;
> >         }
> >
> > For redirect, we return TC_ACT_SHOT, so upper layer should not
> > touch the skb after that.
> >
> > What am I missing here?
>
> With ACT_SHOT caller/upper layer will free the skb, too. We will have
> an use after free (from either the upper layer and the xmit device).
> Similar issues with STOLEN, TRAP, etc.
>
> In the past, Changli Gao attempted to avoid the clone incrementing the
> skb usage count:
>
> commit 210d6de78c5d7c785fc532556cea340e517955e1
> Author: Changli Gao <xiaosuo@gmail.com>
> Date:   Thu Jun 24 16:25:12 2010 +0000
>
>     act_mirred: don't clone skb when skb isn't shared
>
> but some/many device drivers expect an skb usage count of 1, and that
> caused ooops and was revered.

Interesting, I wasn't aware of the above commit and its revert.

First, I didn't use skb_get() above.

Second, I think the caller of dev_queue_xmit() should not
touch the skb after it, the skb is either freed by dev_queue_xmit()
or successfully transmitted, in either case, the ownership belongs
to dev_queue_xmit(). So, I think we should skip the qdisc_drop()
for this case.

Not sure about netif_receive_skb() case, given veth calls in its
xmit too, I speculate the rule is probably same.

Not sure about other ACT_SHOT case than act_mirred...

>
> I think the only other option (beyond re-using ACT_MIRROR) is adding
> another action value, and let the upper layer re-inject the packet
> while handling such action (similar to what ACT_MIRROR currently does,
> but preserving the current mirred semantic).

Maybe if you mean to avoid breaking ACT_SHOT.

Thanks.
Paolo Abeni July 20, 2018, 10:16 a.m. UTC | #9
On Thu, 2018-07-19 at 10:56 -0700, Cong Wang wrote:
> On Wed, Jul 18, 2018 at 3:05 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > 
> > Hi,
> > 
> > On Tue, 2018-07-17 at 10:24 -0700, Cong Wang wrote:
> > > If you goal is to get rid of skb_clone(), why not just do the following?
> > > 
> > >         if (tcf_mirred_is_act_redirect(m_eaction)) {
> > >                 skb2 = skb;
> > >         } else {
> > >                 skb2 = skb_clone(skb, GFP_ATOMIC);
> > >                 if (!skb2)
> > >                         goto out;
> > >         }
> > > 
> > > For redirect, we return TC_ACT_SHOT, so upper layer should not
> > > touch the skb after that.
> > > 
> > > What am I missing here?
> > 
> > With ACT_SHOT caller/upper layer will free the skb, too. We will have
> > an use after free (from either the upper layer and the xmit device).
> > Similar issues with STOLEN, TRAP, etc.
> > 
> > In the past, Changli Gao attempted to avoid the clone incrementing the
> > skb usage count:
> > 
> > commit 210d6de78c5d7c785fc532556cea340e517955e1
> > Author: Changli Gao <xiaosuo@gmail.com>
> > Date:   Thu Jun 24 16:25:12 2010 +0000
> > 
> >     act_mirred: don't clone skb when skb isn't shared
> > 
> > but some/many device drivers expect an skb usage count of 1, and that
> > caused ooops and was revered.
> 
> Interesting, I wasn't aware of the above commit and its revert.
> 
> First, I didn't use skb_get() above.
> 
> Second, I think the caller of dev_queue_xmit() should not
> touch the skb after it, the skb is either freed by dev_queue_xmit()
> or successfully transmitted, in either case, the ownership belongs
> to dev_queue_xmit(). So, I think we should skip the qdisc_drop()
> for this case.
> 
> Not sure about netif_receive_skb() case, given veth calls in its
> xmit too, I speculate the rule is probably same.
> 
> Not sure about other ACT_SHOT case than act_mirred...

I think any tc filter can be configured from user space to return
ACT_SHOT, so changing the ACT_SHOT handling would be quite invasive and
error prone, as all the tc filters (to free the skb on ACT_SHOT) and tc
schedulers (to avoid touching the skb) must be modified.

If there are no strong objection vs a new action value, I would opt for
such option.

Thanks,

Paolo
diff mbox series

Patch

diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index eeb335f03102..b19317426117 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -175,6 +175,7 @@  static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
 	struct net_device *dev;
 	struct sk_buff *skb2;
 	int retval, err = 0;
+	bool want_ingress;
 	int m_eaction;
 	int mac_len;
 
@@ -185,6 +186,17 @@  static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
 	m_eaction = READ_ONCE(m->tcfm_eaction);
 	retval = READ_ONCE(m->tcf_action);
 	dev = rcu_dereference_bh(m->tcfm_dev);
+	want_ingress = tcf_mirred_act_wants_ingress(m_eaction);
+	if (skb_at_tc_ingress(skb) && tcf_mirred_is_act_redirect(m_eaction)) {
+		skb->tc_redirected = 1;
+		skb->tc_from_ingress = 1;
+
+		/* the core redirect code will check dev and its status */
+		TCF_RESULT_SET_REDIRECT(res, dev, want_ingress);
+		res->qstats = this_cpu_ptr(m->common.cpu_qstats);
+		return TC_ACT_REDIRECT;
+	}
+
 	if (unlikely(!dev)) {
 		pr_notice_once("tc mirred: target device is gone\n");
 		goto out;
@@ -204,8 +216,7 @@  static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
 	 * and devices expect a mac header on xmit, then mac push/pull is
 	 * needed.
 	 */
-	if (skb_at_tc_ingress(skb) != tcf_mirred_act_wants_ingress(m_eaction) &&
-	    m_mac_header_xmit) {
+	if (skb_at_tc_ingress(skb) != want_ingress && m_mac_header_xmit) {
 		if (!skb_at_tc_ingress(skb)) {
 			/* caught at egress, act ingress: pull mac */
 			mac_len = skb_network_header(skb) - skb_mac_header(skb);