diff mbox

pkt_sched: act_xt support new Xtables interface

Message ID 50D1AB7E.5060000@mojatatu.com
State Not Applicable
Headers show

Commit Message

Jamal Hadi Salim Dec. 19, 2012, 11:56 a.m. UTC
To be applied pending more testing.

Attached. Sorry, I thought I had sent this out over the weekend.
I have done basic testing with a single mark and sending pings to
update stats which can then displayed for the mark.

Hasan/Yury, if you test this please use the latest iproute2 with only 
the first patch I posted (originally from Hasan). Hasan please use that
patch not your version - if theres anything wrong we can find out sooner
before the patch becomes final.

cheers,
jamal
commit 82330cc874429c63bd0e476e413a79ebab3da350
Author: Jamal Hadi Salim <hadi@mojatatu.com>
Date:   Wed Dec 19 06:23:28 2012 -0500

    Fix iptables/xtables ABI changes. We will eventually replace
    act_ipt with act_xt since only very few targets still support the
    old xtables interface
    
    Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>

Comments

Jamal Hadi Salim Dec. 19, 2012, 11 p.m. UTC | #1
On 12-12-19 10:51 AM, Hasan Chowdhury wrote:
> Hi Jamal,
> I will test it once I get some opportunity , but think I like to know
> even before any  testing
>
> 1. What will be the new procedure to compile iproute2 after the patch
> apply (any new library or any configuration that needs to be adjusted )

git pull Stephens latest tree.
Apply patch 1 and compile.
When you are done compiling an m_xt.so will sit in the tc directory;
unfortunate that we are putting out this shared libs. Backup your distro
version and copy this over to that location. On ubuntu 12.04:
sudo cp tc/m_xt.so /usr/lib/tc/m_xt.so
will do it.

> 2.  tc filter add dev eth0 protocol ip parent 1: prio 3 u32 match ip src
> 192.168.0.0/16 <http://192.168.0.0/16>  flowid 1:1  action xt  -j MARK
> --set-mark 3
>
>
> is this still a valid command ?
>

Indeed it is.

cheers,
jamal

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yury Stankevich Dec. 20, 2012, 8:54 a.m. UTC | #2
19.12.2012 15:56, Jamal Hadi Salim пишет:
> Hasan/Yury, if you test this please use the latest iproute2 with only
> the first patch I posted (originally from Hasan). Hasan please use that
> patch not your version - if theres anything wrong we can find out sooner
> before the patch becomes final.

Hello,
3.7.1 kernel with 3.7.0 iproute,
patch-xt, xt-p1 + linkage fix was applyed
command successfully performed, but actually doesn't work.

command:
tc filter add dev $dev parent ffff: protocol ip u32 match u32 0 0 \
            action xt -j CONNMARK --restore-mark \
            action mirred egress redirect dev ifb0
then i use filter:

tc filter add dev ifb0 protocol ip parent 1: prio 2 handle 0xa fw flowid
1:102

iptables line:
iptable -t mangle -A POSTROUTING -p tcp --dport 80 -m connmark --mark 0
-m connbytes --connbytes 204800: --connbytes-dir both --connbytes-mode
bytes -j CONNMARK --set-mark 0xa

once i run a test to download 300K file,
from iptables counters i can see that rule in POSTROUTING is triggered,
but from `tc -s qdisc show dev ifb0` i see that no packets was sent to
1:102 flow.

btw,
tc -p -s filter show dev ifb0 parent 1:
do not show stats `(rule hit 416 success 0)` for this (filter protocol
ip pref 2 fw handle 0xa classid 1:102) rule.
Jamal Hadi Salim Dec. 20, 2012, 12:35 p.m. UTC | #3
Could be your setup. I didnt do a lot of testing but
from my notes (running different kernel at the moment):

#try to point to everything (no iptables setup)
tc filter add dev eth0 parent ffff: protocol ip u32 match u32 0 0 flowid 
23:23 action xt -j CONNMARK --restore-mark
#let it run for a 1 sec then display with
tc -s filter show dev eth0 parent ffff:

----
filter protocol ip pref 49152 u32
filter protocol ip pref 49152 u32 fh 800: ht divisor 1
filter protocol ip pref 49152 u32 fh 800::800 order 2048 key ht 800 bkt 
0 flowid 23:23
   match 00000000/00000000 at 0
	action order 1: tablename: mangle  hook: NF_IP_PRE_ROUTING
	target  CONNMARK restore
	index 1 ref 1 bind 1 installed 3 sec used 1 sec
	Action statistics:
	Sent 280 bytes 4 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0
----

cheers,
jamal

On 12-12-20 03:54 AM, Yury Stankevich wrote:
> 19.12.2012 15:56, Jamal Hadi Salim пишет:
>> Hasan/Yury, if you test this please use the latest iproute2 with only
>> the first patch I posted (originally from Hasan). Hasan please use that
>> patch not your version - if theres anything wrong we can find out sooner
>> before the patch becomes final.
>
> Hello,
> 3.7.1 kernel with 3.7.0 iproute,
> patch-xt, xt-p1 + linkage fix was applyed
> command successfully performed, but actually doesn't work.
>
> command:
> tc filter add dev $dev parent ffff: protocol ip u32 match u32 0 0 \
>              action xt -j CONNMARK --restore-mark \
>              action mirred egress redirect dev ifb0
> then i use filter:
>
> tc filter add dev ifb0 protocol ip parent 1: prio 2 handle 0xa fw flowid
> 1:102
>
> iptables line:
> iptable -t mangle -A POSTROUTING -p tcp --dport 80 -m connmark --mark 0
> -m connbytes --connbytes 204800: --connbytes-dir both --connbytes-mode
> bytes -j CONNMARK --set-mark 0xa
>
> once i run a test to download 300K file,
> from iptables counters i can see that rule in POSTROUTING is triggered,
> but from `tc -s qdisc show dev ifb0` i see that no packets was sent to
> 1:102 flow.
>
> btw,
> tc -p -s filter show dev ifb0 parent 1:
> do not show stats `(rule hit 416 success 0)` for this (filter protocol
> ip pref 2 fw handle 0xa classid 1:102) rule.
>
>
>

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yury Stankevich Dec. 20, 2012, 2:59 p.m. UTC | #4
interesting,

#tc -s filter show dev usb0 parent ffff:
filter protocol ip pref 49152 u32
filter protocol ip pref 49152 u32 fh 800: ht divisor 1
filter protocol ip pref 49152 u32 fh 800::800 order 2048 key ht 800 bkt
0 terminal flowid ???  (rule hit 707 success 707)
  match 00000000/00000000 at 0 (success 707 )
	action order 1: tablename: mangle  hook: NF_IP_PRE_ROUTING
	target  CONNMARK restore
	index 5 ref 1 bind 1 installed 394 sec used 11 sec
	Action statistics:
	Sent 783783 bytes 707 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0

	action order 2: mirred (Egress Redirect to device ifb0) stolen
 	index 5 ref 1 bind 1 installed 394 sec used 11 sec
 	Action statistics:
	Sent 783783 bytes 707 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0

so, looks like packets was sent to CONNMARK target.

but...
i make a iptables rule to log packets with 0xa mark:

Chain PREROUTING (policy ACCEPT 1308 packets, 848K bytes)
 pkts bytes target     prot opt in     out     source
destination
    0     0 NFLOG      all  --  *      *       0.0.0.0/0
0.0.0.0/0            mark match 0xa nflog-group 1

Chain POSTROUTING (policy ACCEPT 1240 packets, 550K bytes)
 pkts bytes target     prot opt in     out     source
destination
    1    40 CONNMARK   tcp  --  *      *       0.0.0.0/0
0.0.0.0/0            tcp dpt:80 connmark match  0x0 connbytes 204800
connbytes mode bytes connbytes direction both CONNMARK set 0xa

idea is:
i run downloading, rule from POSTROUTING must fire if i download more
than ~200K,
tc filter call to CONNMARK restore, must restore mark (0xa) for packets
belong to this connection.
so i expect, that PREROUTING rule must notice the restored mark, but it
doesn't.
maybe i miss something ?


20.12.2012 16:35, Jamal Hadi Salim пишет:
> 
> Could be your setup. I didnt do a lot of testing but
> from my notes (running different kernel at the moment):
> 
> #try to point to everything (no iptables setup)
> tc filter add dev eth0 parent ffff: protocol ip u32 match u32 0 0 flowid
> 23:23 action xt -j CONNMARK --restore-mark
> #let it run for a 1 sec then display with
> tc -s filter show dev eth0 parent ffff:
> 
> ----
> filter protocol ip pref 49152 u32
> filter protocol ip pref 49152 u32 fh 800: ht divisor 1
> filter protocol ip pref 49152 u32 fh 800::800 order 2048 key ht 800 bkt
> 0 flowid 23:23
>   match 00000000/00000000 at 0
>     action order 1: tablename: mangle  hook: NF_IP_PRE_ROUTING
>     target  CONNMARK restore
>     index 1 ref 1 bind 1 installed 3 sec used 1 sec
>     Action statistics:
>     Sent 280 bytes 4 pkt (dropped 0, overlimits 0 requeues 0)
>     backlog 0b 0p requeues 0
> ----
> 
> cheers,
> jamal
> 
> On 12-12-20 03:54 AM, Yury Stankevich wrote:
>> 19.12.2012 15:56, Jamal Hadi Salim пишет:
>>> Hasan/Yury, if you test this please use the latest iproute2 with only
>>> the first patch I posted (originally from Hasan). Hasan please use that
>>> patch not your version - if theres anything wrong we can find out sooner
>>> before the patch becomes final.
>>
>> Hello,
>> 3.7.1 kernel with 3.7.0 iproute,
>> patch-xt, xt-p1 + linkage fix was applyed
>> command successfully performed, but actually doesn't work.
>>
>> command:
>> tc filter add dev $dev parent ffff: protocol ip u32 match u32 0 0 \
>>              action xt -j CONNMARK --restore-mark \
>>              action mirred egress redirect dev ifb0
>> then i use filter:
>>
>> tc filter add dev ifb0 protocol ip parent 1: prio 2 handle 0xa fw flowid
>> 1:102
>>
>> iptables line:
>> iptable -t mangle -A POSTROUTING -p tcp --dport 80 -m connmark --mark 0
>> -m connbytes --connbytes 204800: --connbytes-dir both --connbytes-mode
>> bytes -j CONNMARK --set-mark 0xa
>>
>> once i run a test to download 300K file,
>> from iptables counters i can see that rule in POSTROUTING is triggered,
>> but from `tc -s qdisc show dev ifb0` i see that no packets was sent to
>> 1:102 flow.
>>
>> btw,
>> tc -p -s filter show dev ifb0 parent 1:
>> do not show stats `(rule hit 416 success 0)` for this (filter protocol
>> ip pref 2 fw handle 0xa classid 1:102) rule.
>>
>>
>>
>
Jamal Hadi Salim Dec. 21, 2012, 1:03 p.m. UTC | #5
On 12-12-20 09:59 AM, Yury Stankevich wrote:
> interesting,
>
> #tc -s filter show dev usb0 parent ffff:


Given you are adding this on ingress - the settings you have will
happen before pre-routing hook.
If you did things at egress - the setting will take effect after
post-routing. So take a closer look at those details they look
like your source of issues..

cheers,
jamal
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yury Stankevich Dec. 21, 2012, 1:13 p.m. UTC | #6
21.12.2012 17:03, Jamal Hadi Salim пишет:
> On 12-12-20 09:59 AM, Yury Stankevich wrote:
>> interesting,
>>
>> #tc -s filter show dev usb0 parent ffff:
> 
> 
> Given you are adding this on ingress - the settings you have will
> happen before pre-routing hook.
> If you did things at egress - the setting will take effect after
> post-routing. So take a closer look at those details they look
> like your source of issues..

sure,
i use it ingress,
so, i need to use tc xt action
to get mark on the packet, before filter on ifb will run.

prerouting rule, in turn, used to test if mark was actually restored.

in practice:
1. prerouting rule - is not fired. so, no packets with mark was seen.
2. filter on ifb - do not pass traffic to flow configured.
looks like `CONNMARK --restore` is not really called.
Jamal Hadi Salim Dec. 21, 2012, 1:50 p.m. UTC | #7
On 12-12-21 08:13 AM, Yury Stankevich wrote:

> sure,
> i use it ingress,
> so, i need to use tc xt action
> to get mark on the packet, before filter on ifb will run.

Ok. So does ifb see it?

> prerouting rule, in turn, used to test if mark was actually restored.

No experience with connmark, but - in order to restore something has
to store it, correct?

> in practice:
> 1. prerouting rule - is not fired. so, no packets with mark was seen.
> 2. filter on ifb - do not pass traffic to flow configured.
> looks like `CONNMARK --restore` is not really called.
>

My suspicion is that it is not set to begin with...

cheers,
jamal

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yury Stankevich Dec. 21, 2012, 2:14 p.m. UTC | #8
well.
let me describe whole picture i want to achieve

1. use htb/sfq on ingress.

i got a traffic, and use few u32 filters to direct it to 3 flows,
priority, interactive, and bulk.
as http normally pass to interactive flow, i want to move long donwloads
to the bulk one.

how i trying to find these downloads:

iptables -t mangle -A POSTROUTING -p tcp --dport 80 -m connmark --mark 0
-m connbytes --connbytes 204800: --connbytes-dir both --connbytes-mode
bytes -j CONNMARK --set-mark 0xa
so, http connection where more than 200K downloaded, must got a
connection mark.

since ingress traffic hits qos before netfilter,
i use xt action, to copy connection mark, to a packet.
(action xt -j CONNMARK --restore-mark )
from this moment, i expect packet must have a restored mark

after this, i can use high priority tc filter .. handle 0xa fw flowid 1:102
to direct packet with mark 0xa to 1:102 flow (bulk).

now about a problem.

1. i run http download, once i get 200K - i can see that rule in
POSTROUTING is triggered and connection mark is installed (iptables -L
-n -v mangle -- can show number of packets matched by rule)

2. i see to tc stats for my flows, and i see, that packets still going
to interactive flow, not bulk as i expect.

3. from tc -s filter show dev usb0 parent ffff:
filter protocol ip pref 49152 u32
filter protocol ip pref 49152 u32 fh 800: ht divisor 1
filter protocol ip pref 49152 u32 fh 800::800 order 2048 key ht 800 bkt
0 terminal flowid ???  (rule hit 707 success 707)
  match 00000000/00000000 at 0 (success 707 )
	action order 1: tablename: mangle  hook: NF_IP_PRE_ROUTING
	target  CONNMARK restore
	index 5 ref 1 bind 1 installed 394 sec used 11 sec
	Action statistics:
	Sent 783783 bytes 707 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0

	action order 2: mirred (Egress Redirect to device ifb0) stolen
 	index 5 ref 1 bind 1 installed 394 sec used 11 sec
 	Action statistics:
	Sent 783783 bytes 707 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0

i can see that packets must reach xt action.

4. lets try to check packets mark with iptables,
if mark restored by xt action - i must be able to match it in prerouting
rule.
iptables -t mangle -A PREROUTING -m mark --mark 0xa -j NFLOG --nflog-group 1

but this rule not macthesd - so, no mark is restored by xt action.

maybe im completely wrong here, and such mode can't work for some reason ?

21.12.2012 17:50, Jamal Hadi Salim пишет:
> On 12-12-21 08:13 AM, Yury Stankevich wrote:
> 
>> sure,
>> i use it ingress,
>> so, i need to use tc xt action
>> to get mark on the packet, before filter on ifb will run.
> 
> Ok. So does ifb see it?
> 
>> prerouting rule, in turn, used to test if mark was actually restored.
> 
> No experience with connmark, but - in order to restore something has
> to store it, correct?
> 
>> in practice:
>> 1. prerouting rule - is not fired. so, no packets with mark was seen.
>> 2. filter on ifb - do not pass traffic to flow configured.
>> looks like `CONNMARK --restore` is not really called.
>>
> 
> My suspicion is that it is not set to begin with...
> 
> cheers,
> jamal
>
Jan Engelhardt Dec. 21, 2012, 2:35 p.m. UTC | #9
On Friday 2012-12-21 14:50, Jamal Hadi Salim wrote:
> On 12-12-21 08:13 AM, Yury Stankevich wrote:

>> i use it ingress,
>> so, i need to use tc xt action
>> to get mark on the packet, before filter on ifb will run.
>> prerouting rule, in turn, used to test if mark was actually restored.
>
> No experience with connmark, but - in order to restore something has
> to store it, correct?

The bigger problem here, if I see __netif_receive_skb right, is that
when ingress rules run, skb->nfct is still unset, thereby the
CONNMARK action is a no-op.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Dec. 21, 2012, 3:45 p.m. UTC | #10
On Fri, 2012-12-21 at 15:35 +0100, Jan Engelhardt wrote:

> 
> The bigger problem here, if I see __netif_receive_skb right, is that
> when ingress rules run, skb->nfct is still unset, thereby the
> CONNMARK action is a no-op.

Right, ingress is performed before IP/netfilter stack.

This reminds me this might be the reason we have
skb_reset_transport_header(skb);
in __netif_receive_skb(), while its not very logical.

(Yes, sorry for being off topic, but I am referring to 
http://www.spinics.net/lists/netdev/msg214662.html )



--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jamal Hadi Salim Dec. 22, 2012, 1:19 p.m. UTC | #11
On 12-12-21 09:14 AM, Yury Stankevich wrote:
>
> well.
> let me describe whole picture i want to achieve
>

I think i got what you are trying to do Yury. Clever.
 From the description Jan provided in his response, I dont
think this used to work at all. Are you saying it worked before?

Having said that, what you are doing sounds so useful
that we need to make it work ;-> But it appears like
we need a brand new action for it, something like
GetMarkFromConntrack. Jan, I am assuming (on ingress only)
we need to call "something" to give us the nfct then
grab the skb->mark from nfct. On egress,
I am assuming the skb->mark is already set if connmark
is to be used... Am i correct?
If yes, then this action will only be useful at ingress.

cheers,
jamal
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jamal Hadi Salim Dec. 22, 2012, 1:42 p.m. UTC | #12
On 12-12-21 10:45 AM, Eric Dumazet wrote:
> On Fri, 2012-12-21 at 15:35 +0100, Jan Engelhardt wrote:
>

> This reminds me this might be the reason we have
> skb_reset_transport_header(skb);
> in __netif_receive_skb(), while its not very logical.
>

You seem to have nailed the egress part finally. That has
been a constant battle. At one point the standard answer
was "turn off TSO" ;->

> (Yes, sorry for being off topic, but I am referring to
> http://www.spinics.net/lists/netdev/msg214662.html )


I think the skb_reset_transport_header() when Acme made
a major overhaul to replace direct pointer access.
For this reason i think your second option seems preferable.

cheers,
jamal

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Engelhardt Dec. 22, 2012, 1:43 p.m. UTC | #13
On Saturday 2012-12-22 14:19, Jamal Hadi Salim wrote:
>
> Having said that, what you are doing sounds so useful
> that we need to make it work ;-> But it appears like
> we need a brand new action for it, something like
> GetMarkFromConntrack. Jan, I am assuming (on ingress only)
> we need to call "something" to give us the nfct then
> grab the skb->mark from nfct.

Looking up CT before ingress would mean the entire "raw"
table needs to be moved before ingress. But with classic
ip_tables, calling a table requires a lot of setup
(basically ip_rcv).

> On egress,
> I am assuming the skb->mark is already set if connmark
> is to be used... Am i correct?

All new skbs (i.e. those that did not loop due to IPsec, for example)
received through __netif_receive_skb should start out with
skb->mark=0, which is why CONNMARK --restore-mark is needed
to copy skb->mark=ct->mark.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jamal Hadi Salim Dec. 22, 2012, 1:56 p.m. UTC | #14
On 12-12-22 08:43 AM, Jan Engelhardt wrote:

>
> Looking up CT before ingress would mean the entire "raw"
> table needs to be moved before ingress. But with classic
> ip_tables, calling a table requires a lot of setup
> (basically ip_rcv).

Scanning the code:
Would it not work if i only passed it IP packets (the tc
classifier can check) and then for v4 i do something like
ipv4_conntrack_in() with pre-routing as the hook to update
the skb?

> All new skbs (i.e. those that did not loop due to IPsec, for example)
> received through __netif_receive_skb should start out with
> skb->mark=0, which is why CONNMARK --restore-mark is needed
> to copy skb->mark=ct->mark.

I  may be overthinking this: are you saying connmark should do the
copying to skb->mark instead of some action? Earlier you said
conmark depends on presence of skb->nfct.

cheers,
jamal
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yury Stankevich Dec. 22, 2012, 1:58 p.m. UTC | #15
22.12.2012 17:19, Jamal Hadi Salim пишет:
> From the description Jan provided in his response, I dont
> think this used to work at all. Are you saying it worked before?

no.
i'm trying if this can work, alas. it can't.

> Having said that, what you are doing sounds so useful
> that we need to make it work ;-> But it appears like
> we need a brand new action for it, something like
> GetMarkFromConntrack. 

maybe ifb device can be made more friendly to iptables ?
for a sample, run some (or all?) nefilter hooks before qdisc, like on a
normal interface ?
Florian Westphal Dec. 22, 2012, 2:04 p.m. UTC | #16
Yury Stankevich <urykhy@gmail.com> wrote:
> 22.12.2012 17:19, Jamal Hadi Salim пишет:
> > From the description Jan provided in his response, I dont
> > think this used to work at all. Are you saying it worked before?
> 
> no.
> i'm trying if this can work, alas. it can't.

Yury, what are you trying to accomplish?
Is there a particular reason why you want to use ingress shaping
instead of pure policing?

I ask, because you could try to use hashlimit match to do
rate policing via netfilter.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jamal Hadi Salim Dec. 22, 2012, 2:09 p.m. UTC | #17
On 12-12-22 08:58 AM, Yury Stankevich wrote:

> i'm trying if this can work, alas. it can't.

Now i want it to work ;-> So dont give up yet.


cheers,
jamal

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jamal Hadi Salim Dec. 24, 2012, 11:34 a.m. UTC | #18
Some good news Yury.
I am told Felix Fietkau <nbd@openwrt.org> (on CC) actually
already solved this issue and it is a feature in openwrt. I
cant find the code.

Felix - Yury is trying to retrieve skb->mark fields from
netfilter connmark. My understanding is you have written
such an action. Can you please point us to it - and any
reason you havent submitted this for inclusion in kernel
proper?

cheers,
jamal

On 12-12-22 09:09 AM, Jamal Hadi Salim wrote:
> On 12-12-22 08:58 AM, Yury Stankevich wrote:
>
>> i'm trying if this can work, alas. it can't.
>
> Now i want it to work ;-> So dont give up yet.
>
>
> cheers,
> jamal
>

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index 235e01a..1693973 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -578,12 +578,25 @@  config NET_ACT_MIRRED
 config NET_ACT_IPT
         tristate "IPtables targets"
         depends on NET_CLS_ACT && NETFILTER && IP_NF_IPTABLES
+	select NET_ACT_XT
         ---help---
 	  Say Y here to be able to invoke iptables targets after successful
-	  classification.
+	  classification. Better yet choose NET_ACT_XT since this version
+	  will eventually be obsoleted.
 
 	  To compile this code as a module, choose M here: the
 	  module will be called act_ipt.
+config NET_ACT_XT
+        tristate "New IPtables targets"
+        depends on NET_CLS_ACT && NETFILTER && IP_NF_IPTABLES
+        ---help---
+	  Say Y here to be able to invoke iptables targets after successful
+	  classification using the new xtables mechanism. This mechanism
+	  will eventually replace NET_ACT_IPT
+
+	  To compile this code as a module, choose M here: the
+	  module will be called act_xt.
+
 
 config NET_ACT_NAT
         tristate "Stateless NAT"
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 978cbf0..10a1136 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -11,6 +11,7 @@  obj-$(CONFIG_NET_ACT_POLICE)	+= act_police.o
 obj-$(CONFIG_NET_ACT_GACT)	+= act_gact.o
 obj-$(CONFIG_NET_ACT_MIRRED)	+= act_mirred.o
 obj-$(CONFIG_NET_ACT_IPT)	+= act_ipt.o
+obj-$(CONFIG_NET_ACT_XT)	+= act_xt.o
 obj-$(CONFIG_NET_ACT_NAT)	+= act_nat.o
 obj-$(CONFIG_NET_ACT_PEDIT)	+= act_pedit.o
 obj-$(CONFIG_NET_ACT_SIMP)	+= act_simple.o
diff --git a/net/sched/act_xt.c b/net/sched/act_xt.c
new file mode 100644
index 0000000..589cfe6
--- /dev/null
+++ b/net/sched/act_xt.c
@@ -0,0 +1,324 @@ 
+/*
+ * net/sched/act_xt.c     iptables target interface
+ *
+ *TODO: Add other tables. For now we only support the ipv4 table targets
+ *
+ *		This program is free software; you can redistribute it and/or
+ *		modify it under the terms of the GNU General Public License
+ *		as published by the Free Software Foundation; either version
+ *		2 of the License, or (at your option) any later version.
+ *
+ * Copyright:	Jamal Hadi Salim (2002-12)
+ */
+
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <linux/errno.h>
+#include <linux/skbuff.h>
+#include <linux/rtnetlink.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/slab.h>
+#include <net/netlink.h>
+#include <net/pkt_sched.h>
+#include <linux/tc_act/tc_ipt.h>
+#include <net/tc_act/tc_ipt.h>
+
+#include <linux/netfilter_ipv4/ip_tables.h>
+
+#define IPT_TAB_MASK     15
+static struct tcf_common *tcf_ipt_ht[IPT_TAB_MASK + 1];
+static u32 ipt_idx_gen;
+static DEFINE_RWLOCK(ipt_lock);
+
+static struct tcf_hashinfo ipt_hash_info = {
+	.htab = tcf_ipt_ht,
+	.hmask = IPT_TAB_MASK,
+	.lock = &ipt_lock,
+};
+
+static int ipt_init_target(struct xt_entry_target *t, char *table,
+			   unsigned int hook)
+{
+	struct xt_tgchk_param par;
+	struct xt_target *target;
+	int ret = 0;
+
+	target = xt_request_find_target(AF_INET, t->u.user.name,
+					t->u.user.revision);
+	if (IS_ERR(target))
+		return PTR_ERR(target);
+
+	t->u.kernel.target = target;
+	par.table = table;
+	par.entryinfo = NULL;
+	par.target = target;
+	par.targinfo = t->data;
+	par.hook_mask = hook;
+	par.family = NFPROTO_IPV4;
+
+	ret = xt_check_target(&par, t->u.target_size - sizeof(*t), 0, false);
+	if (ret < 0) {
+		module_put(t->u.kernel.target->me);
+		return ret;
+	}
+	return 0;
+}
+
+static void ipt_destroy_target(struct xt_entry_target *t)
+{
+	struct xt_tgdtor_param par = {
+		.target = t->u.kernel.target,
+		.targinfo = t->data,
+	};
+	if (par.target->destroy != NULL)
+		par.target->destroy(&par);
+	module_put(par.target->me);
+}
+
+static int tcf_ipt_release(struct tcf_ipt *ipt, int bind)
+{
+	int ret = 0;
+	if (ipt) {
+		if (bind)
+			ipt->tcf_bindcnt--;
+		ipt->tcf_refcnt--;
+		if (ipt->tcf_bindcnt <= 0 && ipt->tcf_refcnt <= 0) {
+			ipt_destroy_target(ipt->tcfi_t);
+			kfree(ipt->tcfi_tname);
+			kfree(ipt->tcfi_t);
+			tcf_hash_destroy(&ipt->common, &ipt_hash_info);
+			ret = ACT_P_DELETED;
+		}
+	}
+	return ret;
+}
+
+static const struct nla_policy ipt_policy[TCA_IPT_MAX + 1] = {
+	[TCA_IPT_TABLE] = {.type = NLA_STRING,.len = IFNAMSIZ},
+	[TCA_IPT_HOOK] = {.type = NLA_U32},
+	[TCA_IPT_INDEX] = {.type = NLA_U32},
+	[TCA_IPT_TARG] = {.len = sizeof(struct xt_entry_target)},
+};
+
+static int tcf_ipt_init(struct nlattr *nla, struct nlattr *est,
+			struct tc_action *a, int ovr, int bind)
+{
+	struct nlattr *tb[TCA_IPT_MAX + 1];
+	struct tcf_ipt *ipt;
+	struct tcf_common *pc;
+	struct xt_entry_target *td, *t;
+	char *tname;
+	int ret = 0, err;
+	u32 hook = 0;
+	u32 index = 0;
+
+	if (nla == NULL)
+		return -EINVAL;
+
+	err = nla_parse_nested(tb, TCA_IPT_MAX, nla, ipt_policy);
+	if (err < 0)
+		return err;
+
+	if (tb[TCA_IPT_HOOK] == NULL)
+		return -EINVAL;
+	if (tb[TCA_IPT_TARG] == NULL)
+		return -EINVAL;
+
+	td = (struct xt_entry_target *)nla_data(tb[TCA_IPT_TARG]);
+	if (nla_len(tb[TCA_IPT_TARG]) < td->u.target_size)
+		return -EINVAL;
+
+	if (tb[TCA_IPT_INDEX] != NULL)
+		index = nla_get_u32(tb[TCA_IPT_INDEX]);
+
+	pc = tcf_hash_check(index, a, bind, &ipt_hash_info);
+	if (!pc) {
+		pc = tcf_hash_create(index, est, a, sizeof(*ipt), bind,
+				     &ipt_idx_gen, &ipt_hash_info);
+		if (IS_ERR(pc))
+			return PTR_ERR(pc);
+		ret = ACT_P_CREATED;
+	} else {
+		if (!ovr) {
+			tcf_ipt_release(to_ipt(pc), bind);
+			return -EEXIST;
+		}
+	}
+	ipt = to_ipt(pc);
+
+	hook = nla_get_u32(tb[TCA_IPT_HOOK]);
+
+	err = -ENOMEM;
+	tname = kmalloc(IFNAMSIZ, GFP_KERNEL);
+	if (unlikely(!tname))
+		goto err1;
+	if (tb[TCA_IPT_TABLE] == NULL ||
+	    nla_strlcpy(tname, tb[TCA_IPT_TABLE], IFNAMSIZ) >= IFNAMSIZ)
+		strcpy(tname, "mangle");
+
+	t = kmemdup(td, td->u.target_size, GFP_KERNEL);
+	if (unlikely(!t))
+		goto err2;
+
+	err = ipt_init_target(t, tname, hook);
+	if (err < 0)
+		goto err3;
+
+	spin_lock_bh(&ipt->tcf_lock);
+	if (ret != ACT_P_CREATED) {
+		ipt_destroy_target(ipt->tcfi_t);
+		kfree(ipt->tcfi_tname);
+		kfree(ipt->tcfi_t);
+	}
+	ipt->tcfi_tname = tname;
+	ipt->tcfi_t = t;
+	ipt->tcfi_hook = hook;
+	spin_unlock_bh(&ipt->tcf_lock);
+	if (ret == ACT_P_CREATED)
+		tcf_hash_insert(pc, &ipt_hash_info);
+	return ret;
+
+err3:
+	kfree(t);
+err2:
+	kfree(tname);
+err1:
+	if (ret == ACT_P_CREATED) {
+		if (est)
+			gen_kill_estimator(&pc->tcfc_bstats,
+					   &pc->tcfc_rate_est);
+		kfree_rcu(pc, tcfc_rcu);
+	}
+	return err;
+}
+
+static int tcf_ipt_cleanup(struct tc_action *a, int bind)
+{
+	struct tcf_ipt *ipt = a->priv;
+	return tcf_ipt_release(ipt, bind);
+}
+
+static int tcf_ipt(struct sk_buff *skb, const struct tc_action *a,
+		   struct tcf_result *res)
+{
+	int ret = 0, result = 0;
+	struct tcf_ipt *ipt = a->priv;
+	struct xt_action_param par;
+
+	if (skb_cloned(skb)) {
+		if (pskb_expand_head(skb, 0, 0, GFP_ATOMIC))
+			return TC_ACT_UNSPEC;
+	}
+
+	spin_lock(&ipt->tcf_lock);
+
+	ipt->tcf_tm.lastuse = jiffies;
+	bstats_update(&ipt->tcf_bstats, skb);
+
+	/* yes, we have to worry about both in and out dev
+	 * worry later - danger - this API seems to have changed
+	 * from earlier kernels
+	 */
+	par.in = skb->dev;
+	par.out = NULL;
+	par.hooknum = ipt->tcfi_hook;
+	par.target = ipt->tcfi_t->u.kernel.target;
+	par.targinfo = ipt->tcfi_t->data;
+	ret = par.target->target(skb, &par);
+
+	switch (ret) {
+	case NF_ACCEPT:
+		result = TC_ACT_OK;
+		break;
+	case NF_DROP:
+		result = TC_ACT_SHOT;
+		ipt->tcf_qstats.drops++;
+		break;
+	case XT_CONTINUE:
+		result = TC_ACT_PIPE;
+		break;
+	default:
+		net_notice_ratelimited
+		    ("tc filter: Bogus netfilter code %d assume ACCEPT\n", ret);
+		result = TC_POLICE_OK;
+		break;
+	}
+	spin_unlock(&ipt->tcf_lock);
+	return result;
+
+}
+
+static int tcf_ipt_dump(struct sk_buff *skb, struct tc_action *a, int bind,
+			int ref)
+{
+	unsigned char *b = skb_tail_pointer(skb);
+	struct tcf_ipt *ipt = a->priv;
+	struct xt_entry_target *t;
+	struct tcf_t tm;
+	struct tc_cnt c;
+
+	/* for simple targets kernel size == user size
+	 * user name = target name
+	 * for foolproof you need to not assume this
+	 */
+
+	t = kmemdup(ipt->tcfi_t, ipt->tcfi_t->u.user.target_size, GFP_ATOMIC);
+	if (unlikely(!t))
+		goto nla_put_failure;
+
+	c.bindcnt = ipt->tcf_bindcnt - bind;
+	c.refcnt = ipt->tcf_refcnt - ref;
+	strcpy(t->u.user.name, ipt->tcfi_t->u.kernel.target->name);
+
+	if (nla_put(skb, TCA_IPT_TARG, ipt->tcfi_t->u.user.target_size, t) ||
+	    nla_put_u32(skb, TCA_IPT_INDEX, ipt->tcf_index) ||
+	    nla_put_u32(skb, TCA_IPT_HOOK, ipt->tcfi_hook) ||
+	    nla_put(skb, TCA_IPT_CNT, sizeof(struct tc_cnt), &c) ||
+	    nla_put_string(skb, TCA_IPT_TABLE, ipt->tcfi_tname))
+		goto nla_put_failure;
+	tm.install = jiffies_to_clock_t(jiffies - ipt->tcf_tm.install);
+	tm.lastuse = jiffies_to_clock_t(jiffies - ipt->tcf_tm.lastuse);
+	tm.expires = jiffies_to_clock_t(ipt->tcf_tm.expires);
+	if (nla_put(skb, TCA_IPT_TM, sizeof(tm), &tm))
+		goto nla_put_failure;
+	kfree(t);
+	return skb->len;
+
+nla_put_failure:
+	nlmsg_trim(skb, b);
+	kfree(t);
+	return -1;
+}
+
+static struct tc_action_ops act_ipt_ops = {
+	.kind = "xt",
+	.hinfo = &ipt_hash_info,
+	.type = TCA_ACT_IPT,
+	.capab = TCA_CAP_NONE,
+	.owner = THIS_MODULE,
+	.act = tcf_ipt,
+	.dump = tcf_ipt_dump,
+	.cleanup = tcf_ipt_cleanup,
+	.lookup = tcf_hash_search,
+	.init = tcf_ipt_init,
+	.walk = tcf_generic_walker
+};
+
+MODULE_AUTHOR("Jamal Hadi Salim(2002-12)");
+MODULE_DESCRIPTION("New Iptables target actions");
+MODULE_LICENSE("GPL");
+
+static int __init ipt_init_module(void)
+{
+	return tcf_register_action(&act_ipt_ops);
+}
+
+static void __exit ipt_cleanup_module(void)
+{
+	tcf_unregister_action(&act_ipt_ops);
+}
+
+module_init(ipt_init_module);
+module_exit(ipt_cleanup_module);