diff mbox

net: Fix RPF to work with policy routing

Message ID 1255867954.4815.25.camel@dogo.mojatatu.com
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

jamal Oct. 18, 2009, 12:12 p.m. UTC
policy routing never worked with mark.

I tested this with ping and the skbedit patch i posted a few days back.

cheers,
jamal
commit f7c6fd2465d8e6f4f89c5d1262da10b4a6d499d0
Author: Jamal Hadi Salim <hadi@cyberus.ca>
Date:   Sun Oct 18 08:04:51 2009 -0400

    [PATCH] net: Fix RPF to work with policy routing
    Policy routing is not looked up by mark on reverse path filtering.
    This fixes it.
    
    Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>

Comments

jamal Oct. 18, 2009, 12:13 p.m. UTC | #1
On Sun, 2009-10-18 at 08:12 -0400, jamal wrote:
> policy routing never worked with mark.

I meant policy routing, mark and RPF never worked together ;->

cheers,
jamal

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Oct. 23, 2009, 4:49 a.m. UTC | #2
From: jamal <hadi@cyberus.ca>
Date: Sun, 18 Oct 2009 08:13:39 -0400

> On Sun, 2009-10-18 at 08:12 -0400, jamal wrote:
>> policy routing never worked with mark.
> 
> I meant policy routing, mark and RPF never worked together ;->

Is this actually valid?

Such a change has a built-in assumption, I think, that
marks are symmetric.

Just because we ended up with mark X on input doesn't mean
that the reverse path route exists with mark X too.

In fact I can't even see a valid way to specify a mark for
the RPF lookup.

Maybe you can convince me otherwise :-)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Maciej Żenczykowski Oct. 23, 2009, 6:30 a.m. UTC | #3
In most of the use cases I've seen marking packets has been far from
symmetric (of course: not that this is representative of anything).
Although, to be fair, most of the time this is because packets in only
direction (usually trasmit) were even being marked in the first place.

However, I do agree that .mark should be explicitly initialized (or a
comment about it being left =0 inserted in the same location).

On Thu, Oct 22, 2009 at 21:49, David Miller <davem@davemloft.net> wrote:
> From: jamal <hadi@cyberus.ca>
> Date: Sun, 18 Oct 2009 08:13:39 -0400
>
>> On Sun, 2009-10-18 at 08:12 -0400, jamal wrote:
>>> policy routing never worked with mark.
>>
>> I meant policy routing, mark and RPF never worked together ;->
>
> Is this actually valid?
>
> Such a change has a built-in assumption, I think, that
> marks are symmetric.
>
> Just because we ended up with mark X on input doesn't mean
> that the reverse path route exists with mark X too.
>
> In fact I can't even see a valid way to specify a mark for
> the RPF lookup.
>
> Maybe you can convince me otherwise :-)
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
jamal Oct. 23, 2009, 10:51 a.m. UTC | #4
On Thu, 2009-10-22 at 21:49 -0700, David Miller wrote:

> Such a change has a built-in assumption, I think, that
> marks are symmetric.

Only if the admin said they are symetric (by jumping
a few hoops).  In otherwise, I may intentionaly want
them to be symetric and use policy routing. I cant today.

> Just because we ended up with mark X on input doesn't mean
> that the reverse path route exists with mark X too.
>
> In fact I can't even see a valid way to specify a mark for
> the RPF lookup.

with the ipt or skbedit actions or via netfilter i could
set marks which could be as trivial as "set mark X if packet 
came in via eth0 or eth1 and mark Y if they came in via gre0"

> Maybe you can convince me otherwise :-)

Ok, let me try ;-> 

First let me say that it is _non-trivial_ for a packet to be 
policy-routing-RPF-dropped even with this patch. Youd have to do at
least 3 things to achieve that goal:
a) mark the packet on ingress 
b) have a  blackhole route in the policy routing table as the fall
through match and 
c) turn on rpf.

If someone goes that far to install policies, their intent could be
judged to mean they are serious about using RPF with policy routing.
If any of the above 3 conditions are not true then things continue to
work as is today.
IOW, if i set all those 3 conditions above then my expectation is the
system should not let in a packet if the policy routing table says no.
My intent is not to use the main table or some default route in the main
table; i really meant to use that policy routing table.

cheers,
jamal

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ben Greear Oct. 23, 2009, 3:34 p.m. UTC | #5
jamal wrote:
> with the ipt or skbedit actions or via netfilter i could
> set marks which could be as trivial as "set mark X if packet 
> came in via eth0 or eth1 and mark Y if they came in via gre0"
>   
I implemented something similar while allowing for virtual router like
applications.  I had to add a mark very early in the pkt rx logic in dev.c,
and had to add a 'skb_default_mark' member to the netdevice because
the route lookup is done before the normal iptables logic ran.  Without
this, if a flow already existed for pkts coming in eth1, if the packet came
back in eth2, it would use eth1's flow.

I'll dig out the patch if anyone is interested...

Ben
jamal Oct. 23, 2009, 10:40 p.m. UTC | #6
On Fri, 2009-10-23 at 08:34 -0700, Ben Greear wrote:

> I implemented something similar while allowing for virtual router like
> applications.  I had to add a mark very early in the pkt rx logic in dev.c,
> and had to add a 'skb_default_mark' member to the netdevice because
> the route lookup is done before the normal iptables logic ran.  

You dont need to add a new construct to netdev.
Here's how youd tag all packets coming on eth0 with mark 7:
----
tc filter add dev eth0 parent ffff: protocol ip \
pref 10 u32 match u32 0 0 flowid 1:17 \
action skbedit mark 7
---

Of course you could also be very flow specific, example:

----
tc filter add dev eth0 parent ffff: protocol ip \
pref 9 u32 match ip src 64.233.169.99/32 flowid 1:5 \
action skbedit mark 5
---

Or even use iptable marker
---
tc filter add dev eth2 parent 1:0 protocol ip \
prio 5 u32 match ip dst 10.0.0.90/32 flowid 1:12 \
action ipt -j mark --set-mark 2
----

You could even slice bread with this stuff. Example
you could use certain policy routing tables only
if a flow misbehaved (works well with routing not
local destined packets), example

---
tc filter add dev eth0 parent ffff: protocol ip prio 1 u32 \
match ip src 10.0.0.90/32 flowid 1:10 \
action ipt -j mark --set-mark 1 \
action police rate 100kbit burst 90k pipe \
action ipt -j mark --set-mark 2 \
action police rate 50kbit burst 50k pipe \
action ipt -j mark --set-mark 3 \
action police rate 50kbit burst 50k drop
----

As a warning ipt could be shaky in some distros because
of the morphing iptables interface.

> Without
> this, if a flow already existed for pkts coming in eth1, if the packet came
> back in eth2, it would use eth1's flow.

True. Of course you can avoid that with the patch i posted 
meeting the conditions i described with RPF.

> I'll dig out the patch if anyone is interested...

If you can do overlapping IP addresses, it would be interesting
to see. Otherwise all is achievable with smart policy routing.

cheers,
jamal

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Oct. 30, 2009, 5:51 a.m. UTC | #7
From: jamal <hadi@cyberus.ca>
Date: Sun, 18 Oct 2009 08:12:33 -0400

> commit f7c6fd2465d8e6f4f89c5d1262da10b4a6d499d0
> Author: Jamal Hadi Salim <hadi@cyberus.ca>
> Date:   Sun Oct 18 08:04:51 2009 -0400
> 
>     [PATCH] net: Fix RPF to work with policy routing
>     Policy routing is not looked up by mark on reverse path filtering.
>     This fixes it.
>     
>     Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>

Applied to net-2.6, thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index ef91fe9..4d22fab 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -210,7 +210,8 @@  extern struct fib_table *fib_get_table(struct net *net, u32 id);
 extern const struct nla_policy rtm_ipv4_policy[];
 extern void		ip_fib_init(void);
 extern int fib_validate_source(__be32 src, __be32 dst, u8 tos, int oif,
-			       struct net_device *dev, __be32 *spec_dst, u32 *itag);
+			       struct net_device *dev, __be32 *spec_dst,
+			       u32 *itag, u32 mark);
 extern void fib_select_default(struct net *net, const struct flowi *flp,
 			       struct fib_result *res);
 
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index e2f9505..aa00398 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -229,14 +229,17 @@  unsigned int inet_dev_addr_type(struct net *net, const struct net_device *dev,
  */
 
 int fib_validate_source(__be32 src, __be32 dst, u8 tos, int oif,
-			struct net_device *dev, __be32 *spec_dst, u32 *itag)
+			struct net_device *dev, __be32 *spec_dst,
+			u32 *itag, u32 mark)
 {
 	struct in_device *in_dev;
 	struct flowi fl = { .nl_u = { .ip4_u =
 				      { .daddr = src,
 					.saddr = dst,
 					.tos = tos } },
+			    .mark = mark,
 			    .iif = oif };
+
 	struct fib_result res;
 	int no_addr, rpf;
 	int ret;
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 278f46f..9744fc5 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1852,7 +1852,7 @@  static int ip_route_input_mc(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 			goto e_inval;
 		spec_dst = inet_select_addr(dev, 0, RT_SCOPE_LINK);
 	} else if (fib_validate_source(saddr, 0, tos, 0,
-					dev, &spec_dst, &itag) < 0)
+					dev, &spec_dst, &itag, 0) < 0)
 		goto e_inval;
 
 	rth = dst_alloc(&ipv4_dst_ops);
@@ -1965,7 +1965,7 @@  static int __mkroute_input(struct sk_buff *skb,
 
 
 	err = fib_validate_source(saddr, daddr, tos, FIB_RES_OIF(*res),
-				  in_dev->dev, &spec_dst, &itag);
+				  in_dev->dev, &spec_dst, &itag, skb->mark);
 	if (err < 0) {
 		ip_handle_martian_source(in_dev->dev, in_dev, skb, daddr,
 					 saddr);
@@ -2139,7 +2139,7 @@  static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 		int result;
 		result = fib_validate_source(saddr, daddr, tos,
 					     net->loopback_dev->ifindex,
-					     dev, &spec_dst, &itag);
+					     dev, &spec_dst, &itag, skb->mark);
 		if (result < 0)
 			goto martian_source;
 		if (result)
@@ -2168,7 +2168,7 @@  brd_input:
 		spec_dst = inet_select_addr(dev, 0, RT_SCOPE_LINK);
 	else {
 		err = fib_validate_source(saddr, 0, tos, 0, dev, &spec_dst,
-					  &itag);
+					  &itag, skb->mark);
 		if (err < 0)
 			goto martian_source;
 		if (err)