Message ID | 95034fe41fa132b2216686eeb41deaefd997e85b.1531473946.git.pabeni@redhat.com |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
Series | TC: refactor TC_ACT_REDIRECT action | expand |
On Fri, Jul 13, 2018 at 2:55 AM Paolo Abeni <pabeni@redhat.com> wrote: > > When mirred is invoked from the ingress path, and it wants to redirect > the processed packet, it can now use the ACT_REDIRECT action, > filling the tcf_result accordingly. > > This avoids a skb_clone() in the TC S/W data path giving a ~10% > improvement in forwarding performances. Overall TC S/W performances > are now comparable to the kernel openswitch datapath. Avoiding skb_clone() for redirection is cool, but why need to use skb_do_redirect() here? There is a subtle difference here: skb_do_redirect() calls __bpf_rx_skb() which calls dev_forward_skb(). while the current mirred action doesn't scrub packets when redirecting to ingress (from egress). Although I forget if it is intentionally. Also, skb->skb_iif is unset in skb_do_redirect() when redirecting to ingress, I recall we have to set it correctly for input routing. Probably yet another reason why we can't scrub it, unless my memory goes wrong. :) Thanks!
Hi, On Mon, 16 Jul 2018 16:39:55 -0700 Cong Wang <xiyou.wangcong@gmail.com> wrote: > On Fri, Jul 13, 2018 at 2:55 AM Paolo Abeni <pabeni@redhat.com> wrote: > > > > When mirred is invoked from the ingress path, and it wants to > > redirect the processed packet, it can now use the ACT_REDIRECT > > action, filling the tcf_result accordingly. > > > > This avoids a skb_clone() in the TC S/W data path giving a ~10% > > improvement in forwarding performances. Overall TC S/W performances > > are now comparable to the kernel openswitch datapath. > > Avoiding skb_clone() for redirection is cool, but why need to use > skb_do_redirect() here? > > There is a subtle difference here: > > skb_do_redirect() calls __bpf_rx_skb() which calls > dev_forward_skb(). > > while the current mirred action doesn't scrub packets when > redirecting to ingress (from egress). Although I forget if it is > intentionally. > > Also, skb->skb_iif is unset in skb_do_redirect() when > redirecting to ingress, I recall we have to set it correctly > for input routing. Probably yet another reason why we > can't scrub it, unless my memory goes wrong. :) Also dev_forward_skb() enforces MTU checks on the packet which are not done at the moment. I am aware of deployments where this would break things. Eyal.
Hi, On Mon, 2018-07-16 at 16:39 -0700, Cong Wang wrote: > On Fri, Jul 13, 2018 at 2:55 AM Paolo Abeni <pabeni@redhat.com> wrote: > > > > When mirred is invoked from the ingress path, and it wants to redirect > > the processed packet, it can now use the ACT_REDIRECT action, > > filling the tcf_result accordingly. > > > > This avoids a skb_clone() in the TC S/W data path giving a ~10% > > improvement in forwarding performances. Overall TC S/W performances > > are now comparable to the kernel openswitch datapath. Thank you for the feedback. > Avoiding skb_clone() for redirection is cool, but why need to use > skb_do_redirect() here? Well, it's not needed. I tried to reduce code duplication, and I tried to avoid adding another TC_ACT_* value. > There is a subtle difference here: > > skb_do_redirect() calls __bpf_rx_skb() which calls > dev_forward_skb(). > > while the current mirred action doesn't scrub packets when > redirecting to ingress (from egress). Although I forget if it is > intentionally. Understood. A possible option out of this issues would be adding another action value - TC_ACT_MIRRED ? - and handle it in sch_handle_egress()[1] with the appropriate semantic. That should address also Daniel and Eyal concerns. Would you consider the above acceptable? Thanks, Paolo
On Tue, Jul 17, 2018 at 2:16 AM Paolo Abeni <pabeni@redhat.com> wrote: > > Hi, > > On Mon, 2018-07-16 at 16:39 -0700, Cong Wang wrote: > > On Fri, Jul 13, 2018 at 2:55 AM Paolo Abeni <pabeni@redhat.com> wrote: > > > > > > When mirred is invoked from the ingress path, and it wants to redirect > > > the processed packet, it can now use the ACT_REDIRECT action, > > > filling the tcf_result accordingly. > > > > > > This avoids a skb_clone() in the TC S/W data path giving a ~10% > > > improvement in forwarding performances. Overall TC S/W performances > > > are now comparable to the kernel openswitch datapath. > > Thank you for the feedback. > > > Avoiding skb_clone() for redirection is cool, but why need to use > > skb_do_redirect() here? > > Well, it's not needed. I tried to reduce code duplication, and I tried > to avoid adding another TC_ACT_* value. > > > There is a subtle difference here: > > > > skb_do_redirect() calls __bpf_rx_skb() which calls > > dev_forward_skb(). > > > > while the current mirred action doesn't scrub packets when > > redirecting to ingress (from egress). Although I forget if it is > > intentionally. > > Understood. > > A possible option out of this issues would be adding another action > value - TC_ACT_MIRRED ? - and handle it in sch_handle_egress()[1] with > the appropriate semantic. That should address also Daniel and Eyal > concerns. > > Would you consider the above acceptable? Our dream has been to be able to specify a 1 liner: tc qdisc add dev eth0 ingress cake bandwidth 200mbit besteffort wash and be done with it, eliminating ifb, mirred, etc, entirely. (I am otherwise delighted to see anything appear that makes inbound shaping cheaper along the lines of this patchset) > > Thanks, > > Paolo > >
On Tue, Jul 17, 2018 at 2:16 AM Paolo Abeni <pabeni@redhat.com> wrote: > > Hi, > > On Mon, 2018-07-16 at 16:39 -0700, Cong Wang wrote: > > On Fri, Jul 13, 2018 at 2:55 AM Paolo Abeni <pabeni@redhat.com> wrote: > > > > > > When mirred is invoked from the ingress path, and it wants to redirect > > > the processed packet, it can now use the ACT_REDIRECT action, > > > filling the tcf_result accordingly. > > > > > > This avoids a skb_clone() in the TC S/W data path giving a ~10% > > > improvement in forwarding performances. Overall TC S/W performances > > > are now comparable to the kernel openswitch datapath. > > Thank you for the feedback. > > > Avoiding skb_clone() for redirection is cool, but why need to use > > skb_do_redirect() here? > > Well, it's not needed. I tried to reduce code duplication, and I tried > to avoid adding another TC_ACT_* value. If you consider dev_forward_skb(), it is not a duplication. > > > There is a subtle difference here: > > > > skb_do_redirect() calls __bpf_rx_skb() which calls > > dev_forward_skb(). > > > > while the current mirred action doesn't scrub packets when > > redirecting to ingress (from egress). Although I forget if it is > > intentionally. > > Understood. > > A possible option out of this issues would be adding another action > value - TC_ACT_MIRRED ? - and handle it in sch_handle_egress()[1] with > the appropriate semantic. That should address also Daniel and Eyal > concerns. > > Would you consider the above acceptable? If you goal is to get rid of skb_clone(), why not just do the following? if (tcf_mirred_is_act_redirect(m_eaction)) { skb2 = skb; } else { skb2 = skb_clone(skb, GFP_ATOMIC); if (!skb2) goto out; } For redirect, we return TC_ACT_SHOT, so upper layer should not touch the skb after that. What am I missing here?
Hi, On Tue, 2018-07-17 at 10:24 -0700, Cong Wang wrote: > If you goal is to get rid of skb_clone(), why not just do the following? > > if (tcf_mirred_is_act_redirect(m_eaction)) { > skb2 = skb; > } else { > skb2 = skb_clone(skb, GFP_ATOMIC); > if (!skb2) > goto out; > } > > For redirect, we return TC_ACT_SHOT, so upper layer should not > touch the skb after that. > > What am I missing here? With ACT_SHOT caller/upper layer will free the skb, too. We will have an use after free (from either the upper layer and the xmit device). Similar issues with STOLEN, TRAP, etc. In the past, Changli Gao attempted to avoid the clone incrementing the skb usage count: commit 210d6de78c5d7c785fc532556cea340e517955e1 Author: Changli Gao <xiaosuo@gmail.com> Date: Thu Jun 24 16:25:12 2010 +0000 act_mirred: don't clone skb when skb isn't shared but some/many device drivers expect an skb usage count of 1, and that caused ooops and was revered. I think the only other option (beyond re-using ACT_MIRROR) is adding another action value, and let the upper layer re-inject the packet while handling such action (similar to what ACT_MIRROR currently does, but preserving the current mirred semantic). Cheers, Paolo
Hi Paolo, Thank you for the patch! Perhaps something to improve: [auto build test WARNING on net-next/master] url: https://github.com/0day-ci/linux/commits/Paolo-Abeni/TC-refactor-TC_ACT_REDIRECT-action/20180716-011055 reproduce: # apt-get install sparse make ARCH=x86_64 allmodconfig make C=1 CF=-D__CHECK_ENDIAN__ sparse warnings: (new ones prefixed by >>) >> net/sched/act_mirred.c:195:17: sparse: dubious: x | !y net/sched/act_mirred.c:260:23: sparse: expression using sizeof(void) net/sched/act_mirred.c:260:23: sparse: expression using sizeof(void) vim +195 net/sched/act_mirred.c 169 170 static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a, 171 struct tcf_result *res) 172 { 173 struct tcf_mirred *m = to_mirred(a); 174 bool m_mac_header_xmit; 175 struct net_device *dev; 176 struct sk_buff *skb2; 177 int retval, err = 0; 178 bool want_ingress; 179 int m_eaction; 180 int mac_len; 181 182 tcf_lastuse_update(&m->tcf_tm); 183 bstats_cpu_update(this_cpu_ptr(m->common.cpu_bstats), skb); 184 185 m_mac_header_xmit = READ_ONCE(m->tcfm_mac_header_xmit); 186 m_eaction = READ_ONCE(m->tcfm_eaction); 187 retval = READ_ONCE(m->tcf_action); 188 dev = rcu_dereference_bh(m->tcfm_dev); 189 want_ingress = tcf_mirred_act_wants_ingress(m_eaction); 190 if (skb_at_tc_ingress(skb) && tcf_mirred_is_act_redirect(m_eaction)) { 191 skb->tc_redirected = 1; 192 skb->tc_from_ingress = 1; 193 194 /* the core redirect code will check dev and its status */ > 195 TCF_RESULT_SET_REDIRECT(res, dev, want_ingress); 196 res->qstats = this_cpu_ptr(m->common.cpu_qstats); 197 return TC_ACT_REDIRECT; 198 } 199 200 if (unlikely(!dev)) { 201 pr_notice_once("tc mirred: target device is gone\n"); 202 goto out; 203 } 204 205 if (unlikely(!(dev->flags & IFF_UP))) { 206 net_notice_ratelimited("tc mirred to Houston: device %s is down\n", 207 dev->name); 208 goto out; 209 } 210 211 skb2 = skb_clone(skb, GFP_ATOMIC); 212 if (!skb2) 213 goto out; 214 215 /* If action's target direction differs than filter's direction, 216 * and devices expect a mac header on xmit, then mac push/pull is 217 * needed. 218 */ 219 if (skb_at_tc_ingress(skb) != want_ingress && m_mac_header_xmit) { 220 if (!skb_at_tc_ingress(skb)) { 221 /* caught at egress, act ingress: pull mac */ 222 mac_len = skb_network_header(skb) - skb_mac_header(skb); 223 skb_pull_rcsum(skb2, mac_len); 224 } else { 225 /* caught at ingress, act egress: push mac */ 226 skb_push_rcsum(skb2, skb->mac_len); 227 } 228 } 229 230 /* mirror is always swallowed */ 231 if (tcf_mirred_is_act_redirect(m_eaction)) { 232 skb2->tc_redirected = 1; 233 skb2->tc_from_ingress = skb2->tc_at_ingress; 234 } 235 236 skb2->skb_iif = skb->dev->ifindex; 237 skb2->dev = dev; 238 if (!tcf_mirred_act_wants_ingress(m_eaction)) 239 err = dev_queue_xmit(skb2); 240 else 241 err = netif_receive_skb(skb2); 242 243 if (err) { 244 out: 245 qstats_overlimit_inc(this_cpu_ptr(m->common.cpu_qstats)); 246 if (tcf_mirred_is_act_redirect(m_eaction)) 247 retval = TC_ACT_SHOT; 248 } 249 250 return retval; 251 } 252 --- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation
On Wed, Jul 18, 2018 at 3:05 AM Paolo Abeni <pabeni@redhat.com> wrote: > > Hi, > > On Tue, 2018-07-17 at 10:24 -0700, Cong Wang wrote: > > If you goal is to get rid of skb_clone(), why not just do the following? > > > > if (tcf_mirred_is_act_redirect(m_eaction)) { > > skb2 = skb; > > } else { > > skb2 = skb_clone(skb, GFP_ATOMIC); > > if (!skb2) > > goto out; > > } > > > > For redirect, we return TC_ACT_SHOT, so upper layer should not > > touch the skb after that. > > > > What am I missing here? > > With ACT_SHOT caller/upper layer will free the skb, too. We will have > an use after free (from either the upper layer and the xmit device). > Similar issues with STOLEN, TRAP, etc. > > In the past, Changli Gao attempted to avoid the clone incrementing the > skb usage count: > > commit 210d6de78c5d7c785fc532556cea340e517955e1 > Author: Changli Gao <xiaosuo@gmail.com> > Date: Thu Jun 24 16:25:12 2010 +0000 > > act_mirred: don't clone skb when skb isn't shared > > but some/many device drivers expect an skb usage count of 1, and that > caused ooops and was revered. Interesting, I wasn't aware of the above commit and its revert. First, I didn't use skb_get() above. Second, I think the caller of dev_queue_xmit() should not touch the skb after it, the skb is either freed by dev_queue_xmit() or successfully transmitted, in either case, the ownership belongs to dev_queue_xmit(). So, I think we should skip the qdisc_drop() for this case. Not sure about netif_receive_skb() case, given veth calls in its xmit too, I speculate the rule is probably same. Not sure about other ACT_SHOT case than act_mirred... > > I think the only other option (beyond re-using ACT_MIRROR) is adding > another action value, and let the upper layer re-inject the packet > while handling such action (similar to what ACT_MIRROR currently does, > but preserving the current mirred semantic). Maybe if you mean to avoid breaking ACT_SHOT. Thanks.
On Thu, 2018-07-19 at 10:56 -0700, Cong Wang wrote: > On Wed, Jul 18, 2018 at 3:05 AM Paolo Abeni <pabeni@redhat.com> wrote: > > > > Hi, > > > > On Tue, 2018-07-17 at 10:24 -0700, Cong Wang wrote: > > > If you goal is to get rid of skb_clone(), why not just do the following? > > > > > > if (tcf_mirred_is_act_redirect(m_eaction)) { > > > skb2 = skb; > > > } else { > > > skb2 = skb_clone(skb, GFP_ATOMIC); > > > if (!skb2) > > > goto out; > > > } > > > > > > For redirect, we return TC_ACT_SHOT, so upper layer should not > > > touch the skb after that. > > > > > > What am I missing here? > > > > With ACT_SHOT caller/upper layer will free the skb, too. We will have > > an use after free (from either the upper layer and the xmit device). > > Similar issues with STOLEN, TRAP, etc. > > > > In the past, Changli Gao attempted to avoid the clone incrementing the > > skb usage count: > > > > commit 210d6de78c5d7c785fc532556cea340e517955e1 > > Author: Changli Gao <xiaosuo@gmail.com> > > Date: Thu Jun 24 16:25:12 2010 +0000 > > > > act_mirred: don't clone skb when skb isn't shared > > > > but some/many device drivers expect an skb usage count of 1, and that > > caused ooops and was revered. > > Interesting, I wasn't aware of the above commit and its revert. > > First, I didn't use skb_get() above. > > Second, I think the caller of dev_queue_xmit() should not > touch the skb after it, the skb is either freed by dev_queue_xmit() > or successfully transmitted, in either case, the ownership belongs > to dev_queue_xmit(). So, I think we should skip the qdisc_drop() > for this case. > > Not sure about netif_receive_skb() case, given veth calls in its > xmit too, I speculate the rule is probably same. > > Not sure about other ACT_SHOT case than act_mirred... I think any tc filter can be configured from user space to return ACT_SHOT, so changing the ACT_SHOT handling would be quite invasive and error prone, as all the tc filters (to free the skb on ACT_SHOT) and tc schedulers (to avoid touching the skb) must be modified. If there are no strong objection vs a new action value, I would opt for such option. Thanks, Paolo
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c index eeb335f03102..b19317426117 100644 --- a/net/sched/act_mirred.c +++ b/net/sched/act_mirred.c @@ -175,6 +175,7 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a, struct net_device *dev; struct sk_buff *skb2; int retval, err = 0; + bool want_ingress; int m_eaction; int mac_len; @@ -185,6 +186,17 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a, m_eaction = READ_ONCE(m->tcfm_eaction); retval = READ_ONCE(m->tcf_action); dev = rcu_dereference_bh(m->tcfm_dev); + want_ingress = tcf_mirred_act_wants_ingress(m_eaction); + if (skb_at_tc_ingress(skb) && tcf_mirred_is_act_redirect(m_eaction)) { + skb->tc_redirected = 1; + skb->tc_from_ingress = 1; + + /* the core redirect code will check dev and its status */ + TCF_RESULT_SET_REDIRECT(res, dev, want_ingress); + res->qstats = this_cpu_ptr(m->common.cpu_qstats); + return TC_ACT_REDIRECT; + } + if (unlikely(!dev)) { pr_notice_once("tc mirred: target device is gone\n"); goto out; @@ -204,8 +216,7 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a, * and devices expect a mac header on xmit, then mac push/pull is * needed. */ - if (skb_at_tc_ingress(skb) != tcf_mirred_act_wants_ingress(m_eaction) && - m_mac_header_xmit) { + if (skb_at_tc_ingress(skb) != want_ingress && m_mac_header_xmit) { if (!skb_at_tc_ingress(skb)) { /* caught at egress, act ingress: pull mac */ mac_len = skb_network_header(skb) - skb_mac_header(skb);
When mirred is invoked from the ingress path, and it wants to redirect the processed packet, it can now use the ACT_REDIRECT action, filling the tcf_result accordingly. This avoids a skb_clone() in the TC S/W data path giving a ~10% improvement in forwarding performances. Overall TC S/W performances are now comparable to the kernel openswitch datapath. Signed-off-by: Paolo Abeni <pabeni@redhat.com> --- net/sched/act_mirred.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-)