diff mbox

act_mirred: don't clone skb when skb isn't shared

Message ID 1275658990-15838-1-git-send-email-xiaosuo@gmail.com
State Superseded, archived
Delegated to: David Miller
Headers show

Commit Message

Changli Gao June 4, 2010, 1:43 p.m. UTC
don't clone skb when skb isn't shared

When the tcf_action is TC_ACT_STOLEN, and the skb isn't shared, we don't need
to clone a new skb. As the skb will be freed after this function returns, we
can use it freely once we get a reference to it.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
 net/sched/act_mirred.c |   17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

jamal June 5, 2010, 12:53 p.m. UTC | #1
On Fri, 2010-06-04 at 21:43 +0800, Changli Gao wrote:
> don't clone skb when skb isn't shared
> 
> When the tcf_action is TC_ACT_STOLEN, and the skb isn't shared, we don't need
> to clone a new skb. As the skb will be freed after this function returns, we
> can use it freely once we get a reference to it.

It looks like a good optimization - but i am not a big fan of one-offs
[because usability goes down and I am forced to explain it longer in the
rules (refer to: Documentation/networking/tc-actions-env-rules.txt)]

How about you update skb_act_clone to take take the action code as well
and do the check the if stolen/queued it does a skb_get otherwise it
calls skb_clone?

cheers,
jamal

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Changli Gao June 5, 2010, 1:07 p.m. UTC | #2
On Sat, Jun 5, 2010 at 8:53 PM, jamal <hadi@cyberus.ca> wrote:
> On Fri, 2010-06-04 at 21:43 +0800, Changli Gao wrote:
>> don't clone skb when skb isn't shared
>>
>> When the tcf_action is TC_ACT_STOLEN, and the skb isn't shared, we don't need
>> to clone a new skb. As the skb will be freed after this function returns, we
>> can use it freely once we get a reference to it.
>
> It looks like a good optimization - but i am not a big fan of one-offs
> [because usability goes down and I am forced to explain it longer in the
> rules (refer to: Documentation/networking/tc-actions-env-rules.txt)]

Thanks. BTW: act_nat.c doesn't obey the following rule, and you plan
to remove TC_MUNGED and TC_OK2MUNGE?

2) If you munge any packet thou shalt call pskb_expand_head in the case
someone else is referencing the skb. After that you "own" the skb.
You must also tell us if it is ok to munge the packet (TC_OK2MUNGE),
this way any action downstream can stomp on the packet.

>
> How about you update skb_act_clone to take take the action code as well
> and do the check the if stolen/queued it does a skb_get otherwise it
> calls skb_clone?
>

Good idea. Thanks.
jamal June 5, 2010, 1:24 p.m. UTC | #3
On Sat, 2010-06-05 at 21:07 +0800, Changli Gao wrote:

> Thanks. BTW: act_nat.c doesn't obey the following rule, and you plan
> to remove TC_MUNGED and TC_OK2MUNGE?

> 2) If you munge any packet thou shalt call pskb_expand_head in the case
> someone else is referencing the skb. After that you "own" the skb.
> You must also tell us if it is ok to munge the packet (TC_OK2MUNGE),
> this way any action downstream can stomp on the packet.

That rule still applies but it is upto the discretion of the action.
i.e if the act_nat thinks it is ok for others down the street to trample
on the packet, it should tell us so. Maybe i should change the wording
to use the word "may" in that 3rd sentence.
[I will kill (low prio) TC_OK2MUNGE but not TC_MUNGED.]

cheers,
jamal

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Changli Gao June 5, 2010, 1:33 p.m. UTC | #4
On Sat, Jun 5, 2010 at 9:24 PM, jamal <hadi@cyberus.ca> wrote:
> On Sat, 2010-06-05 at 21:07 +0800, Changli Gao wrote:
>
>> Thanks. BTW: act_nat.c doesn't obey the following rule, and you plan
>> to remove TC_MUNGED and TC_OK2MUNGE?
>
>> 2) If you munge any packet thou shalt call pskb_expand_head in the case
>> someone else is referencing the skb. After that you "own" the skb.
>> You must also tell us if it is ok to munge the packet (TC_OK2MUNGE),
>> this way any action downstream can stomp on the packet.
>
> That rule still applies but it is upto the discretion of the action.
> i.e if the act_nat thinks it is ok for others down the street to trample
> on the packet, it should tell us so. Maybe i should change the wording
> to use the word "may" in that 3rd sentence.
> [I will kill (low prio) TC_OK2MUNGE but not TC_MUNGED.]
>

If you kill TC_OK2MUNGE, you should kill TC_MUNGED too, as it will
become useless.

localhost linux # grep MUNGE net/sched/ -R
net/sched/act_pedit.c:  if (!(skb->tc_verd & TC_OK2MUNGE)) {
net/sched/act_pedit.c:                  skb->tc_verd =
SET_TC_MUNGED(skb->tc_verd);
net/sched/act_api.c:                    if (TC_MUNGED & skb->tc_verd) {
net/sched/act_api.c:                            skb->tc_verd =
SET_TC_OK2MUNGE(skb->tc_verd);
net/sched/act_api.c:                            skb->tc_verd =
CLR_TC_MUNGED(skb->tc_verd);

int tcf_action_exec(struct sk_buff *skb, struct tc_action *act,
                    struct tcf_result *res)
{
        struct tc_action *a;
        int ret = -1;

        if (skb->tc_verd & TC_NCLS) {
                skb->tc_verd = CLR_TC_NCLS(skb->tc_verd);
                ret = TC_ACT_OK;
                goto exec_done;
        }
        while ((a = act) != NULL) {
repeat:
                if (a->ops && a->ops->act) {
                        ret = a->ops->act(skb, a, res);
                        if (TC_MUNGED & skb->tc_verd) {
                                /* copied already, allow trampling */
                                skb->tc_verd = SET_TC_OK2MUNGE(skb->tc_verd);
                                skb->tc_verd = CLR_TC_MUNGED(skb->tc_verd);
                        }
                        if (ret == TC_ACT_REPEAT)
                                goto repeat;    /* we need a ttl - JHS */
                        if (ret != TC_ACT_PIPE)
                                goto exec_done;
                }
                act = a->next;
        }
exec_done:
        return ret;
}

The bit OK2MUNGE relies on MUNGED only.
jamal June 5, 2010, 1:59 p.m. UTC | #5
On Sat, 2010-06-05 at 21:33 +0800, Changli Gao wrote:

> 
> If you kill TC_OK2MUNGE, you should kill TC_MUNGED too, as it will
> become useless.

Those two have different semantics and use different verdict bits:
one is saying "this packet has been munged" another is "I give you ok to
munge this packet".
Thanks for the suggestion - I will think about it some more before
deleting anything.

cheers,
jamal

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index c0b6863..79d4318 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -169,13 +169,20 @@  static int tcf_mirred(struct sk_buff *skb, struct tc_action *a,
 		goto out;
 	}
 
-	skb2 = skb_act_clone(skb, GFP_ATOMIC);
-	if (skb2 == NULL)
-		goto out;
+	at = G_TC_AT(skb->tc_verd);
+	if (m->tcf_action == TC_ACT_STOLEN && !skb_shared(skb)) {
+		skb2 = skb_get(skb);
+		skb2->tc_verd = SET_TC_VERD(skb2->tc_verd, 0);
+		skb2->tc_verd = CLR_TC_OK2MUNGE(skb2->tc_verd);
+		skb2->tc_verd = CLR_TC_MUNGED(skb2->tc_verd);
+	} else {
+		skb2 = skb_act_clone(skb, GFP_ATOMIC);
+		if (skb2 == NULL)
+			goto out;
+	}
 
 	m->tcf_bstats.bytes += qdisc_pkt_len(skb2);
 	m->tcf_bstats.packets++;
-	at = G_TC_AT(skb->tc_verd);
 	if (!(at & AT_EGRESS)) {
 		if (m->tcfm_ok_push)
 			skb_push(skb2, skb2->dev->hard_header_len);
@@ -185,8 +192,8 @@  static int tcf_mirred(struct sk_buff *skb, struct tc_action *a,
 	if (m->tcfm_eaction != TCA_EGRESS_MIRROR)
 		skb2->tc_verd = SET_TC_FROM(skb2->tc_verd, at);
 
-	skb2->dev = dev;
 	skb2->skb_iif = skb->dev->ifindex;
+	skb2->dev = dev;
 	dev_queue_xmit(skb2);
 	err = 0;