diff mbox

Add sk_mark route lookup support for IPv4 listening sockets, and for IPv4 multicast forwarding

Message ID 200910071559.56526.atis@mikrotik.com
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Atis Elsts Oct. 7, 2009, 12:59 p.m. UTC
On Wednesday 07 October 2009 13:19:57 David Miller wrote:
> From: Atis Elsts <atis@mikrotik.com>
> Date: Mon, 5 Oct 2009 16:46:34 +0300
> 
> > @@ -1238,6 +1238,7 @@ static void ipmr_queue_xmit(struct sk_buff *skb, struct mfc_cache *c, int vifi)
> >  
> >  	if (vif->flags&VIFF_TUNNEL) {
> >  		struct flowi fl = { .oif = vif->link,
> > +				    .mark = skb->mark,
> >  				    .nl_u = { .ip4_u =
> >  					      { .daddr = vif->remote,
> >  						.saddr = vif->local,
> 
> I'm not so sure if this is right.
> 
> I understand what you're trying to do, inherit the socket's
> mark when it goes over a multicast tunnel.
> 
> But I'm not so sure that's what we want to do, semantically.
> 
> Could you split out these skb->mark cases into a seperate
> patch?  The parts that only use sk->mark are fine and I
> would like to apply a patch from you which just does that
> while we discuss the skb->mark case.
> 

Here is the sk_mark part.
     
As for the ipmr.c code, I agree with your comment. Using mark from skb probably is wrong in case of tunnel interface (i.e. in the "if (vif->flags&VIFF_TUNNEL)" part of the patch), my mistake. I still think that the "else" part is correct, though, because using mark from skb there mirrors behaviour for unicast forwarding routing lookup in ip_route_input_slow(). The same applies to IPv6 code in ip6mr_forward2().


Add support for route lookup using sk_mark on IPv4 listening sockets.
Signed-off-by: Atis Elsts <atis@mikrotik.com>
---
 net/ipv4/inet_connection_sock.c |    1 +
 net/ipv4/syncookies.c           |    3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Miller Oct. 7, 2009, 8:56 p.m. UTC | #1
From: Atis Elsts <atis@mikrotik.com>
Date: Wed, 7 Oct 2009 15:59:56 +0300

> Here is the sk_mark part.

Applied, thanks.

> As for the ipmr.c code, I agree with your comment. Using mark from
> skb probably is wrong in case of tunnel interface (i.e. in the "if
> (vif->flags&VIFF_TUNNEL)" part of the patch), my mistake. I still
> think that the "else" part is correct, though, because using mark
> from skb there mirrors behaviour for unicast forwarding routing
> lookup in ip_route_input_slow(). The same applies to IPv6 code in
> ip6mr_forward2().

Ok submit just the else part and we'll have a look at it.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Maciej Żenczykowski Oct. 8, 2009, 12:03 a.m. UTC | #2
Should wrapping a packet into a tunnel clear the mark?
Should perhaps the mark be a parameter of the tunnel (like ttl or qos,
can be set or can be inherited)?

On Wed, Oct 7, 2009 at 13:56, David Miller <davem@davemloft.net> wrote:
> From: Atis Elsts <atis@mikrotik.com>
> Date: Wed, 7 Oct 2009 15:59:56 +0300
>
>> Here is the sk_mark part.
>
> Applied, thanks.
>
>> As for the ipmr.c code, I agree with your comment. Using mark from
>> skb probably is wrong in case of tunnel interface (i.e. in the "if
>> (vif->flags&VIFF_TUNNEL)" part of the patch), my mistake. I still
>> think that the "else" part is correct, though, because using mark
>> from skb there mirrors behaviour for unicast forwarding routing
>> lookup in ip_route_input_slow(). The same applies to IPv6 code in
>> ip6mr_forward2().
>
> Ok submit just the else part and we'll have a look at it.
>
> Thanks.
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
steve@chygwyn.com Oct. 14, 2009, 7:23 a.m. UTC | #3
Hi,

On Wed, Oct 14, 2009 at 12:51:56AM -0700, Maciej Żenczykowski wrote:
> I'm thinking that the mark should be a tunnel parameter with values of
> inherit or a constant.
>
Why not do this on a per-route basis (i.e. lets suppose we add a
"setmark" parameter to each route) and this would allow changing
a mark when a packet matches the route. This not only solves the
tunnel case, but would be generically useful as well.

Since we have to look up routes anyway, it shouldn't add any
real overhead to the routing process and we can benefit from
all the existing infrastructure (route cache, etc).

Steve.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Maciej Żenczykowski Oct. 14, 2009, 7:51 a.m. UTC | #4
I'm thinking that the mark should be a tunnel parameter with values of
inherit or a constant.

The reasoning being that this allows you to mark all packets leaving
on a tunnel with a specific value and you can then use that mark to
use a different routing table just for that tunnel.

The default routing table could point at the tunnel, and if the tunnel
is marked than the mark on those packets could make them not use the
default routing table, and instead use a routing table which is closer
to physical network layout.

I believe nowadays you usually solve this by having an explicit route
for the other endpoint of the tunnel in the main routing table...
which means you can't actually tunnel the normal traffic directed at
the other endpoint of the tunnel.
This would fix that.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Oct. 14, 2009, 9:15 a.m. UTC | #5
From: steve@chygwyn.com

Date: Wed, 14 Oct 2009 08:23:19 +0100

> On Wed, Oct 14, 2009 at 12:51:56AM -0700, Maciej Żenczykowski wrote:

>> I'm thinking that the mark should be a tunnel parameter with values of

>> inherit or a constant.

>>

> Why not do this on a per-route basis (i.e. lets suppose we add a

> "setmark" parameter to each route) and this would allow changing

> a mark when a packet matches the route. This not only solves the

> tunnel case, but would be generically useful as well.

> 

> Since we have to look up routes anyway, it shouldn't add any

> real overhead to the routing process and we can benefit from

> all the existing infrastructure (route cache, etc).


This idea, I like :-)
steve@chygwyn.com Oct. 14, 2009, 9:27 a.m. UTC | #6
Hi,

On Wed, Oct 14, 2009 at 02:50:47AM -0700, Maciej Żenczykowski wrote:
> Problem is the primary purpose of the mark is to enable matching on
> the mark in the routing tables.
> 
> See 'ip rule  ... fwmark X ...'
> 
> ie. that fails due to circular dependency.
> 
>
I don't agree. There are two route lookups with a tunnel, the
internal one and the tunnel one. Here is an example of what I'm
thinking:

1. Look up a route which points at a remote ip addres via a tunnel device.
   The "setmark" on this route sets the skb mark
2. Look up a route on the tunnel itself (i.e. the tunnel endpoint not
   the socket endpoint) using the mark from the initial lookup. This
   route can depend on the previous lookup (if there are multiple
   routes for multiple marks) and also set the mark to use.

The default would be to inherit the mark over a route lookup, in
case that no "setmark" had been specified for that route. In
other words, it would be the same as it is now.

The mark is supposed to be a generic thing, not just for routing
lookups, it can be used for classification, etc as well. I would
expect to see such a thing used for maybe specifying a VLAN or
a reference to an MPLS label stack, or something similar too,

Steve.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Maciej Żenczykowski Oct. 14, 2009, 9:50 a.m. UTC | #7
Problem is the primary purpose of the mark is to enable matching on
the mark in the routing tables.

See 'ip rule  ... fwmark X ...'

ie. that fails due to circular dependency.


On Wed, Oct 14, 2009 at 02:15, David Miller <davem@davemloft.net> wrote:
> From: steve@chygwyn.com
> Date: Wed, 14 Oct 2009 08:23:19 +0100
>
>> On Wed, Oct 14, 2009 at 12:51:56AM -0700, Maciej Żenczykowski wrote:
>>> I'm thinking that the mark should be a tunnel parameter with values of
>>> inherit or a constant.
>>>
>> Why not do this on a per-route basis (i.e. lets suppose we add a
>> "setmark" parameter to each route) and this would allow changing
>> a mark when a packet matches the route. This not only solves the
>> tunnel case, but would be generically useful as well.
>>
>> Since we have to look up routes anyway, it shouldn't add any
>> real overhead to the routing process and we can benefit from
>> all the existing infrastructure (route cache, etc).
>
> This idea, I like :-)
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
steve@chygwyn.com Oct. 14, 2009, 10:16 a.m. UTC | #8
Hi,

On Wed, Oct 14, 2009 at 02:04:37PM +0300, Atis Elsts wrote:
> On Wednesday 14 October 2009 12:27:43 steve@chygwyn.com wrote:
> >
> > The mark is supposed to be a generic thing, not just for routing
> > lookups, it can be used for classification, etc as well. 
> 
> In general, sounds like a good idea, but IMHO exactly this could be a problem.
> skb->mark is already used for a lot of things. What if I am setting the mark 
> by a firewall rule in prerouting chain, and matching it by a postrouting 
> rule? If routing lookup was changing the mark, then my setup would break.
>
Yes, thats exactly why I said that it should default to the current
behaviour.
 
> Perhaps one more field could be added dst_entry? The field would be filled 
> from route's table (if "setmark" for that route was specified). The use of 
> that field would be similar to tclassid (e.g matching in firewall), except 
> that it would also be used in routing lookups, if set.
>
Yes, I think that we are both thinking along the same lines. There
must obviously be a default "don't touch" setting,

Steve.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Atis Elsts Oct. 14, 2009, 11:04 a.m. UTC | #9
On Wednesday 14 October 2009 12:27:43 steve@chygwyn.com wrote:
>
> The mark is supposed to be a generic thing, not just for routing
> lookups, it can be used for classification, etc as well. 

In general, sounds like a good idea, but IMHO exactly this could be a problem.
skb->mark is already used for a lot of things. What if I am setting the mark 
by a firewall rule in prerouting chain, and matching it by a postrouting 
rule? If routing lookup was changing the mark, then my setup would break.

Perhaps one more field could be added dst_entry? The field would be filled 
from route's table (if "setmark" for that route was specified). The use of 
that field would be similar to tclassid (e.g matching in firewall), except 
that it would also be used in routing lookups, if set.

> I would 
> expect to see such a thing used for maybe specifying a VLAN or
> a reference to an MPLS label stack, or something similar too,
>
> Steve.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Maciej Żenczykowski Oct. 14, 2009, 6:33 p.m. UTC | #10
> I don't agree. There are two route lookups with a tunnel, the
> internal one and the tunnel one. Here is an example of what I'm
> thinking:
>
> 1. Look up a route which points at a remote ip addres via a tunnel device.
>   The "setmark" on this route sets the skb mark

imho, this is much better done by having the mark setting performed
explicitly by the tunnel device itself.
That's also were we set ttl and qos (or inherit) on the outgoing packet).

> 2. Look up a route on the tunnel itself (i.e. the tunnel endpoint not
>   the socket endpoint) using the mark from the initial lookup. This
>   route can depend on the previous lookup (if there are multiple
>   routes for multiple marks) and also set the mark to use.

we would get the mark set by the tunneling module here.

> The default would be to inherit the mark over a route lookup, in
> case that no "setmark" had been specified for that route. In
> other words, it would be the same as it is now.

I'm not saying your solution wouldn't work, but I think it's less
clean.  I don't think marking should be inherited (in the general
case) in case of packet wrapping (whether via gre, ipip, sit, or other
methods).

> The mark is supposed to be a generic thing, not just for routing
> lookups, it can be used for classification, etc as well. I would
> expect to see such a thing used for maybe specifying a VLAN or
> a reference to an MPLS label stack, or something similar too,

Right, the mark can currently (as far as I know) be set in one of two
ways - either from the mangle table (and it can also be matched on in
netfilter) or by using setsockopt(SO_MARK).

Imagine a situation where you have a machine with routing already
configured (pretty complex setup, tunnels, firewalls, etc) and you
want to run a user space application that verifies (health-checks)
some remote host (or something).  As part of the health check you want
to verify a particular route to the destination.  This requires
per-socket routing, which can (almost) be achieved by having proper
routing (on fwmark) setup and using setsockopt(SO_MARK) on the health
check socket in order to force specific routing.  These health checks
may then of course be feedback into the routing system (ie. if they
fail the routing rules get modified).  Note, that in particular we may
want to be healthchecking routes that aren't even available in the
default routing table (because they've currently been removed from the
default table, because previous health checks failed).

Maciej
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
steve@chygwyn.com Oct. 19, 2009, 8:20 a.m. UTC | #11
Hi,

On Wed, Oct 14, 2009 at 11:33:39AM -0700, Maciej Żenczykowski wrote:
> > I don't agree. There are two route lookups with a tunnel, the
> > internal one and the tunnel one. Here is an example of what I'm
> > thinking:
> >
> > 1. Look up a route which points at a remote ip addres via a tunnel device.
> >   The "setmark" on this route sets the skb mark
> 
> imho, this is much better done by having the mark setting performed
> explicitly by the tunnel device itself.
> That's also were we set ttl and qos (or inherit) on the outgoing packet).
>
Yes, I think (having looked at the code a bit more in the mean time)
that there is an argument for doing that. Although I still think that
setting the mark via the routing table would be a useful feature too.

 
> > 2. Look up a route on the tunnel itself (i.e. the tunnel endpoint not
> >   the socket endpoint) using the mark from the initial lookup. This
> >   route can depend on the previous lookup (if there are multiple
> >   routes for multiple marks) and also set the mark to use.
> 
> we would get the mark set by the tunneling module here.
> 
> > The default would be to inherit the mark over a route lookup, in
> > case that no "setmark" had been specified for that route. In
> > other words, it would be the same as it is now.
> 
> I'm not saying your solution wouldn't work, but I think it's less
> clean.  I don't think marking should be inherited (in the general
> case) in case of packet wrapping (whether via gre, ipip, sit, or other
> methods).
>
I guess we could say that inheriting the mark would not be the default
if the packet has gone through a device (whether virtual or physical)
then. That still seems ok to me since its basically what happens currently
I think.
 
> > The mark is supposed to be a generic thing, not just for routing
> > lookups, it can be used for classification, etc as well. I would
> > expect to see such a thing used for maybe specifying a VLAN or
> > a reference to an MPLS label stack, or something similar too,
> 
> Right, the mark can currently (as far as I know) be set in one of two
> ways - either from the mangle table (and it can also be matched on in
> netfilter) or by using setsockopt(SO_MARK).
> 
> Imagine a situation where you have a machine with routing already
> configured (pretty complex setup, tunnels, firewalls, etc) and you
> want to run a user space application that verifies (health-checks)
> some remote host (or something).  As part of the health check you want
> to verify a particular route to the destination.  This requires
> per-socket routing, which can (almost) be achieved by having proper
> routing (on fwmark) setup and using setsockopt(SO_MARK) on the health
> check socket in order to force specific routing.  These health checks
> may then of course be feedback into the routing system (ie. if they
> fail the routing rules get modified).  Note, that in particular we may
> want to be healthchecking routes that aren't even available in the
> default routing table (because they've currently been removed from the
> default table, because previous health checks failed).
> 
> Maciej

Yes, thats a good use case. I think there are a lot of other potential
use cases too though. A while back when I was looking into MPLS I wondered
about using the mark to index into a set of outgoing label stacks. That
was the original reason that I thought setting the mark via the routing
table would be useful. I've not really had the time to continue my
MPLS investigations recently though :(

Another potential use case would be to segregate traffic into different
routing domains (and thus being able to change the mark when moving from
one routing domain to another might be useful).

Steve.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Atis Elsts Oct. 19, 2009, 11:38 a.m. UTC | #12
On Monday 19 October 2009 11:20:33 steve@chygwyn.com wrote:
>
> Another potential use case would be to segregate traffic into different
> routing domains (and thus being able to change the mark when moving from
> one routing domain to another might be useful).

I agree. Actually, one of our  users recenlty requested adding matcher in 
firewall that would match outgoing the packets by the routing table that was 
used to route them. (For now we found a workaround using tclassid, but that 
requires manual configuration.) So yes, it's an useful feature even excluding 
the tunnel cases.

I don't like the idea of using skb->mark for storing that information though, 
because I think these multiple uses of the same field would be too confusing 
for users, even if the default behavior remained the same as now. Also, 
consider the case when someone watch to match packets in post routing chain 
*both* by the mark that was set in prerouting chain, and routing table used 
to route the packet?

There already is free space (padding fieds) in struct dst_entry, so why not 
use this space to store the routing table? Speed is also not an issue, 
because the field only needs to be filled in slowpath routing lookup, and 
will be used only
1) if iptables are explicitly configured to match by it;
2) (?) in tunnel routing lookups. (no idea which is the best option here)

I see that struct rt6_info already has field
    struct fib6_table		*rt6i_table
so this matcher already can be made for IPv6 firewall. But IPv4 still is more 
imporant at the moment :)

Atis
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 4351ca2..9139e8f 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -358,6 +358,7 @@  struct dst_entry *inet_csk_route_req(struct sock *sk,
 	const struct inet_request_sock *ireq = inet_rsk(req);
 	struct ip_options *opt = inet_rsk(req)->opt;
 	struct flowi fl = { .oif = sk->sk_bound_dev_if,
+			    .mark = sk->sk_mark,
 			    .nl_u = { .ip4_u =
 				      { .daddr = ((opt && opt->srr) ?
 						  opt->faddr :
diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index a6e0e07..5ec678a 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -333,7 +333,8 @@  struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb,
 	 * no easy way to do this.
 	 */
 	{
-		struct flowi fl = { .nl_u = { .ip4_u =
+		struct flowi fl = { .mark = sk->sk_mark,
+				    .nl_u = { .ip4_u =
 					      { .daddr = ((opt && opt->srr) ?
 							  opt->faddr :
 							  ireq->rmt_addr),