diff mbox

[net] ipv4: fib: check forwarding before checking send_redirects

Message ID 1396985482-30886-1-git-send-email-xiyou.wangcong@gmail.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Cong Wang April 8, 2014, 7:31 p.m. UTC
From: Cong Wang <cwang@twopensource.com>

We have seen in a weird case we had to disable send_redirects in order
to pass rp filter check even though we don't set forwarding at all.
This looks wrong, at least according to ip-sysctl.txt send_redirects should
only make sense when we enable forwarding.

Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>

---
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Miller April 8, 2014, 8:43 p.m. UTC | #1
From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Tue,  8 Apr 2014 12:31:22 -0700

> From: Cong Wang <cwang@twopensource.com>
> 
> We have seen in a weird case we had to disable send_redirects in order
> to pass rp filter check even though we don't set forwarding at all.
> This looks wrong, at least according to ip-sysctl.txt send_redirects should
> only make sense when we enable forwarding.
> 
> Cc: Eric Biederman <ebiederm@xmission.com>
> Cc: Julian Anastasov <ja@ssi.bg>
> Cc: David S. Miller <davem@davemloft.net>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> Signed-off-by: Cong Wang <cwang@twopensource.com>

I'm not so sure about this.

This test here is just an optimization, which bypasses the long path
processing of FIB source address validation if certain strict
conditions are met.

__fib_validate_source() should do the right thing if it is executed,
it is just the slow path, and you should determine why it is rejecting
your traffic instead.

Your change is a valid optimization perhaps, but not a bug fix.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Cong Wang April 8, 2014, 9:17 p.m. UTC | #2
On Tue, Apr 8, 2014 at 1:43 PM, David Miller <davem@davemloft.net> wrote:
> From: Cong Wang <xiyou.wangcong@gmail.com>
> Date: Tue,  8 Apr 2014 12:31:22 -0700
>
>> From: Cong Wang <cwang@twopensource.com>
>>
>> We have seen in a weird case we had to disable send_redirects in order
>> to pass rp filter check even though we don't set forwarding at all.
>> This looks wrong, at least according to ip-sysctl.txt send_redirects should
>> only make sense when we enable forwarding.
>>
>> Cc: Eric Biederman <ebiederm@xmission.com>
>> Cc: Julian Anastasov <ja@ssi.bg>
>> Cc: David S. Miller <davem@davemloft.net>
>> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
>> Signed-off-by: Cong Wang <cwang@twopensource.com>
>
> I'm not so sure about this.
>
> This test here is just an optimization, which bypasses the long path
> processing of FIB source address validation if certain strict
> conditions are met.
>
> __fib_validate_source() should do the right thing if it is executed,
> it is just the slow path, and you should determine why it is rejecting
> your traffic instead.
>
> Your change is a valid optimization perhaps, but not a bug fix.

In our case, we have the following setting:

1) rp_filter = 0 (set manually)
2) forwarding = 0 (default)
3) send_redirects = 1 (default)

We are not supposed even to execute __fib_validate_source() in such case,
are we? :)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller April 8, 2014, 9:23 p.m. UTC | #3
From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Tue, 8 Apr 2014 14:17:08 -0700

> We are not supposed even to execute __fib_validate_source() in such
> case, are we? :)

Yes, we used to, every time we'd make a routing cache entry, we'd
execute this code.

Check fib_validate_source() from before the routing cache removal if
you don't believe me.

That check you are editing was added by myself as an optimization
that was necessary after the routing cache removal since this
function performs two FIB table lookups every time it is called.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Cong Wang April 8, 2014, 9:50 p.m. UTC | #4
On Tue, Apr 8, 2014 at 2:23 PM, David Miller <davem@davemloft.net> wrote:
> From: Cong Wang <xiyou.wangcong@gmail.com>
> Date: Tue, 8 Apr 2014 14:17:08 -0700
>
>> We are not supposed even to execute __fib_validate_source() in such
>> case, are we? :)
>
> Yes, we used to, every time we'd make a routing cache entry, we'd
> execute this code.
>
> Check fib_validate_source() from before the routing cache removal if
> you don't believe me.
>
> That check you are editing was added by myself as an optimization
> that was necessary after the routing cache removal since this
> function performs two FIB table lookups every time it is called.

I must miss something here. Looking at the code:

        int r = secpath_exists(skb) ? 0 : IN_DEV_RPFILTER(idev);

        if (!r && !fib_num_tclassid_users(dev_net(dev)) &&
            (dev->ifindex != oif || !IN_DEV_TX_REDIRECTS(idev))) {
                *itag = 0;
                return 0;
        }

r == 0 is true in our case since we disable rp_filter manually.
And fib_num_tclassid_users() == 0 by default too.

Therefore, the only thing takes effect is if dev->ifindex == oif.
Given the fact that disabling send_redirects makes the problem
disappear, dev->ifindex != oif must be false, otherwise it doesn't
even need to check TX_REDIRECTS.

The rx interface is loopback, so skb->dev == lo, and the oif we pass
here must be LOOPBACK_IFINDEX, therefore the following code
in ip_route_input_slow():

        if (res.type == RTN_LOCAL) {
                err = fib_validate_source(skb, saddr, daddr, tos,
                                          LOOPBACK_IFINDEX,
                                          dev, in_dev, &itag);
                if (err < 0)
                        goto martian_source_keep_err;
                goto local_input;
        }

which looks good to me too, since the packet has src addr == dst addr.

Which part am I missing?

Thanks!
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Julian Anastasov April 8, 2014, 10:03 p.m. UTC | #5
Hello,

On Tue, 8 Apr 2014, Cong Wang wrote:

> From: Cong Wang <cwang@twopensource.com>
> 
> We have seen in a weird case we had to disable send_redirects in order
> to pass rp filter check even though we don't set forwarding at all.
> This looks wrong, at least according to ip-sysctl.txt send_redirects should
> only make sense when we enable forwarding.
> 
> Cc: Eric Biederman <ebiederm@xmission.com>
> Cc: Julian Anastasov <ja@ssi.bg>
> Cc: David S. Miller <davem@davemloft.net>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> Signed-off-by: Cong Wang <cwang@twopensource.com>
> 
> ---
> diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
> index 1a629f8..3f9e324 100644
> --- a/net/ipv4/fib_frontend.c
> +++ b/net/ipv4/fib_frontend.c
> @@ -321,7 +321,8 @@ int fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
>  	int r = secpath_exists(skb) ? 0 : IN_DEV_RPFILTER(idev);
>  
>  	if (!r && !fib_num_tclassid_users(dev_net(dev)) &&
> -	    (dev->ifindex != oif || !IN_DEV_TX_REDIRECTS(idev))) {
> +	    (dev->ifindex != oif || !IN_DEV_FORWARD(idev) ||
> +	     !IN_DEV_TX_REDIRECTS(idev))) {

	Very strange. Above IN_DEV_FORWARD matters only
when dev->ifindex == oif. And this can happen only for the
ip_route_input_slow -> ip_mkroute_input -> __mkroute_input
case. We are not supposed to reach fib_validate_source because
ip_mkroute_input is after the IN_DEV_FORWARD check in
ip_route_input_slow. This was for forwarding.

	If the case is not forwarding but local delivery,
the dev->ifindex != oif condition should be always true
because we provide 0 or LOOPBACK_IFINDEX for oif.

	So, the question is how we reach fib_validate_source
with dev->ifindex == oif and IN_DEV_FORWARD(idev) already
checked to be != 0 ?

>  		*itag = 0;
>  		return 0;
>  	}

Regards

--
Julian Anastasov <ja@ssi.bg>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller April 8, 2014, 10:04 p.m. UTC | #6
From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Tue, 8 Apr 2014 14:50:09 -0700

> Which part am I missing?

Please stop looking at this conditional, I'm begging you.

Instead, concentrate on why __fib_validate_source() is rejecting
your packets.

The code must function correctly even if __fib_validate_source()
is invoked unconditionally, all the time, for all non-IPSEC
packets.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Julian Anastasov April 8, 2014, 10:15 p.m. UTC | #7
Hello,

On Tue, 8 Apr 2014, Cong Wang wrote:

> I must miss something here. Looking at the code:
> 
>         int r = secpath_exists(skb) ? 0 : IN_DEV_RPFILTER(idev);
> 
>         if (!r && !fib_num_tclassid_users(dev_net(dev)) &&
>             (dev->ifindex != oif || !IN_DEV_TX_REDIRECTS(idev))) {
>                 *itag = 0;
>                 return 0;
>         }
> 
> r == 0 is true in our case since we disable rp_filter manually.
> And fib_num_tclassid_users() == 0 by default too.
> 
> Therefore, the only thing takes effect is if dev->ifindex == oif.
> Given the fact that disabling send_redirects makes the problem
> disappear, dev->ifindex != oif must be false, otherwise it doesn't
> even need to check TX_REDIRECTS.
> 
> The rx interface is loopback, so skb->dev == lo, and the oif we pass
> here must be LOOPBACK_IFINDEX, therefore the following code
> in ip_route_input_slow():

	Yes, the code will fail if input route is
tried on loopback device. My first thought was that
you forgot to 'ip route flush cache' after clearing
rp_filter.

> which looks good to me too, since the packet has src addr == dst addr.
> 
> Which part am I missing?

	What kind of packet is that and why the skb lost its
output route and now it is trying for input route?

Regards

--
Julian Anastasov <ja@ssi.bg>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Cong Wang April 8, 2014, 10:55 p.m. UTC | #8
Thanks for your reply, David and Julian!

On Tue, Apr 8, 2014 at 3:15 PM, Julian Anastasov <ja@ssi.bg> wrote:
>         Yes, the code will fail if input route is
> tried on loopback device. My first thought was that
> you forgot to 'ip route flush cache' after clearing
> rp_filter.
>

route cache is removed after 3.6, so I don't need to do it?

>> which looks good to me too, since the packet has src addr == dst addr.
>>
>> Which part am I missing?
>
>         What kind of packet is that and why the skb lost its
> output route and now it is trying for input route?
>

Yeah, I just noticed that for normal loopback packets its ->dst
are forced to be kept so that it will not need to lookup route
again on rx. However, in our case, we redirect the packets
from veth0 to lo when src_addr == dst_addr, its ->dst is dropped
after coming out of the network namespace, therefore goes into
the slow path to validate the reverse path and lookup the route.

I think David's point is still valid in this case, __fib_validate_source()
should not reject the packets. So hmm... the only reason I can find is
the following code:

        if (fib_lookup(net, &fl4, &res) == 0) {
                if (res.type == RTN_UNICAST)
                        ret = FIB_RES_NH(res).nh_scope >= RT_SCOPE_HOST;
        }

where its nh_scope == RT_SCOPE_HOST?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Julian Anastasov April 9, 2014, 8:03 a.m. UTC | #9
Hello,

On Tue, 8 Apr 2014, Cong Wang wrote:

> >         Yes, the code will fail if input route is
> > tried on loopback device. My first thought was that
> > you forgot to 'ip route flush cache' after clearing
> > rp_filter.
> >
> 
> route cache is removed after 3.6, so I don't need to do it?

	route cache is now FIB cache :)

	My 'ip' tool still does

open("/proc/sys/net/ipv4/route/flush", O_WRONLY) = 4
write(4, "-1", 2) = 2
close(4)          = 0

	Which invalidates the FIB cache:
ipv4_sysctl_rtcache_flush -> rt_cache_flush, fnhe_genid_bump

	The change is noticed via rt_cache_valid and
rt_is_expired.

> >         What kind of packet is that and why the skb lost its
> > output route and now it is trying for input route?
> >
> 
> Yeah, I just noticed that for normal loopback packets its ->dst
> are forced to be kept so that it will not need to lookup route
> again on rx. However, in our case, we redirect the packets
> from veth0 to lo when src_addr == dst_addr, its ->dst is dropped
> after coming out of the network namespace, therefore goes into
> the slow path to validate the reverse path and lookup the route.

	I'm still not sure that it is valid to process
input route on loopback device. There is also a case
of broadcasts/multicasts that should preserve their local output
route while being looped in ip_mc_output(). In this
case device is not loopback. I need more time to check
the code again...

> I think David's point is still valid in this case, __fib_validate_source()
> should not reject the packets. So hmm... the only reason I can find is
> the following code:
> 
>         if (fib_lookup(net, &fl4, &res) == 0) {
>                 if (res.type == RTN_UNICAST)
>                         ret = FIB_RES_NH(res).nh_scope >= RT_SCOPE_HOST;
>         }

	I think fib_validate_source is called with
oif=LOOPBACK_IFINDEX and dev/idev=lo for skb without route,
that is the problem. In this case fib_validate_source
thinks this is forwarding from lo to lo. So, there is
a reason your patch fixes the problem, alternative
is to add oif == LOOPBACK_IFINDEX check:

	dev->ifindex != oif || !IN_DEV_TX_REDIRECTS(idev) ||
	oif == LOOPBACK_IFINDEX

> where its nh_scope == RT_SCOPE_HOST?

	The idea is to avoid calling __fib_validate_source,
right? No need to look into __fib_validate_source. Also,
I'm not sure how legal is to receive packets on lo without
skb->dst...

Regards

--
Julian Anastasov <ja@ssi.bg>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Cong Wang April 11, 2014, 7:11 p.m. UTC | #10
On Wed, Apr 9, 2014 at 1:03 AM, Julian Anastasov <ja@ssi.bg> wrote:
>
>         Hello,
>
> On Tue, 8 Apr 2014, Cong Wang wrote:
>
>> >         Yes, the code will fail if input route is
>> > tried on loopback device. My first thought was that
>> > you forgot to 'ip route flush cache' after clearing
>> > rp_filter.
>> >
>>
>> route cache is removed after 3.6, so I don't need to do it?
>
>         route cache is now FIB cache :)
>
>         My 'ip' tool still does
>
> open("/proc/sys/net/ipv4/route/flush", O_WRONLY) = 4
> write(4, "-1", 2) = 2
> close(4)          = 0
>
>         Which invalidates the FIB cache:
> ipv4_sysctl_rtcache_flush -> rt_cache_flush, fnhe_genid_bump
>
>         The change is noticed via rt_cache_valid and
> rt_is_expired.

OK, I will add this.

>
>> >         What kind of packet is that and why the skb lost its
>> > output route and now it is trying for input route?
>> >
>>
>> Yeah, I just noticed that for normal loopback packets its ->dst
>> are forced to be kept so that it will not need to lookup route
>> again on rx. However, in our case, we redirect the packets
>> from veth0 to lo when src_addr == dst_addr, its ->dst is dropped
>> after coming out of the network namespace, therefore goes into
>> the slow path to validate the reverse path and lookup the route.
>
>         I'm still not sure that it is valid to process
> input route on loopback device. There is also a case
> of broadcasts/multicasts that should preserve their local output
> route while being looped in ip_mc_output(). In this
> case device is not loopback. I need more time to check
> the code again...

OK, please do, you are the expert on routing. :)

>
>> I think David's point is still valid in this case, __fib_validate_source()
>> should not reject the packets. So hmm... the only reason I can find is
>> the following code:
>>
>>         if (fib_lookup(net, &fl4, &res) == 0) {
>>                 if (res.type == RTN_UNICAST)
>>                         ret = FIB_RES_NH(res).nh_scope >= RT_SCOPE_HOST;
>>         }
>
>         I think fib_validate_source is called with
> oif=LOOPBACK_IFINDEX and dev/idev=lo for skb without route,
> that is the problem. In this case fib_validate_source
> thinks this is forwarding from lo to lo. So, there is
> a reason your patch fixes the problem, alternative
> is to add oif == LOOPBACK_IFINDEX check:
>
>         dev->ifindex != oif || !IN_DEV_TX_REDIRECTS(idev) ||
>         oif == LOOPBACK_IFINDEX

Yeah, actually this is my initial fix, but I thought making an
exception here for loopback does not look good therefore dropped
this idea.

>
>> where its nh_scope == RT_SCOPE_HOST?
>
>         The idea is to avoid calling __fib_validate_source,
> right? No need to look into __fib_validate_source. Also,
> I'm not sure how legal is to receive packets on lo without
> skb->dst...
>

That's my point, but David seems not agree. :)

Thanks!
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rex Ge April 12, 2014, 5:32 a.m. UTC | #11
sorry for send this the third time, I believe I fix the Content-Type this time.


I agree with David.
If your packet fail RP filter, that means something else went seriously wrong. This is not the fix.
some information, like what is the "src addr" you refer to, what does your routing table look like, would help

Rex

On Apr 12, 2014, at 3:11 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:

> On Wed, Apr 9, 2014 at 1:03 AM, Julian Anastasov <ja@ssi.bg> wrote:
>> 
>>        Hello,
>> 
>> On Tue, 8 Apr 2014, Cong Wang wrote:
>> 
>>>>        Yes, the code will fail if input route is
>>>> tried on loopback device. My first thought was that
>>>> you forgot to 'ip route flush cache' after clearing
>>>> rp_filter.
>>>> 
>>> 
>>> route cache is removed after 3.6, so I don't need to do it?
>> 
>>        route cache is now FIB cache :)
>> 
>>        My 'ip' tool still does
>> 
>> open("/proc/sys/net/ipv4/route/flush", O_WRONLY) = 4
>> write(4, "-1", 2) = 2
>> close(4)          = 0
>> 
>>        Which invalidates the FIB cache:
>> ipv4_sysctl_rtcache_flush -> rt_cache_flush, fnhe_genid_bump
>> 
>>        The change is noticed via rt_cache_valid and
>> rt_is_expired.
> 
> OK, I will add this.
> 
>> 
>>>>        What kind of packet is that and why the skb lost its
>>>> output route and now it is trying for input route?
>>>> 
>>> 
>>> Yeah, I just noticed that for normal loopback packets its ->dst
>>> are forced to be kept so that it will not need to lookup route
>>> again on rx. However, in our case, we redirect the packets
>>> from veth0 to lo when src_addr == dst_addr, its ->dst is dropped
>>> after coming out of the network namespace, therefore goes into
>>> the slow path to validate the reverse path and lookup the route.
>> 
>>        I'm still not sure that it is valid to process
>> input route on loopback device. There is also a case
>> of broadcasts/multicasts that should preserve their local output
>> route while being looped in ip_mc_output(). In this
>> case device is not loopback. I need more time to check
>> the code again...
> 
> OK, please do, you are the expert on routing. :)
> 
>> 
>>> I think David's point is still valid in this case, __fib_validate_source()
>>> should not reject the packets. So hmm... the only reason I can find is
>>> the following code:
>>> 
>>>        if (fib_lookup(net, &fl4, &res) == 0) {
>>>                if (res.type == RTN_UNICAST)
>>>                        ret = FIB_RES_NH(res).nh_scope >= RT_SCOPE_HOST;
>>>        }
>> 
>>        I think fib_validate_source is called with
>> oif=LOOPBACK_IFINDEX and dev/idev=lo for skb without route,
>> that is the problem. In this case fib_validate_source
>> thinks this is forwarding from lo to lo. So, there is
>> a reason your patch fixes the problem, alternative
>> is to add oif == LOOPBACK_IFINDEX check:
>> 
>>        dev->ifindex != oif || !IN_DEV_TX_REDIRECTS(idev) ||
>>        oif == LOOPBACK_IFINDEX
> 
> Yeah, actually this is my initial fix, but I thought making an
> exception here for loopback does not look good therefore dropped
> this idea.
> 
>> 
>>> where its nh_scope == RT_SCOPE_HOST?
>> 
>>        The idea is to avoid calling __fib_validate_source,
>> right? No need to look into __fib_validate_source. Also,
>> I'm not sure how legal is to receive packets on lo without
>> skb->dst...
>> 
> 
> That's my point, but David seems not agree. :)
> 
> Thanks!
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rex Ge April 12, 2014, 9:52 a.m. UTC | #12
On Apr 9, 2014, at 6:55 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:

> Thanks for your reply, David and Julian!
> 
> On Tue, Apr 8, 2014 at 3:15 PM, Julian Anastasov <ja@ssi.bg> wrote:
>>        Yes, the code will fail if input route is
>> tried on loopback device. My first thought was that
>> you forgot to 'ip route flush cache' after clearing
>> rp_filter.
>> 
> 
> route cache is removed after 3.6, so I don't need to do it?
> 
>>> which looks good to me too, since the packet has src addr == dst addr.
>>> 
>>> Which part am I missing?
>> 
>>        What kind of packet is that and why the skb lost its
>> output route and now it is trying for input route?
>> 
> 
> Yeah, I just noticed that for normal loopback packets its ->dst
> are forced to be kept so that it will not need to lookup route
> again on rx. However, in our case, we redirect the packets
> from veth0 to lo when src_addr == dst_addr, its ->dst is dropped
> after coming out of the network namespace, therefore goes into
> the slow path to validate the reverse path and lookup the route.

If what you say is true, this is exactly where the problem is caused.
  "redirect the packet from veth0 to lo when src_addr == dst_addr”
  “its ->dst is dropped after coming out of the network namespace”
these two does not suppose to happen at the same time.

loopback packets are suppose to be redirected to lo device
of the same network namespace, and ->dst is kept.
if ->dst is dropped, set rp_filter to 2 and enable accept_local,
it should be able to pass “__fib_validate_source” check.

If you do these two at the same time, it is your responsibility to trick
routing into believing “dst_addr” is configured on lo device 
of target namespace and enable accept_local on lo,
or rebuild a “dst_entry” manually.

Throw away the assumption that ->dst is attached with
loopback packet is big for routing, some serious checking is required.


> 
> I think David's point is still valid in this case, __fib_validate_source()
> should not reject the packets. So hmm... the only reason I can find is
> the following code:
> 
>        if (fib_lookup(net, &fl4, &res) == 0) {
>                if (res.type == RTN_UNICAST)
>                        ret = FIB_RES_NH(res).nh_scope >= RT_SCOPE_HOST;
>        }
> 
> where its nh_scope == RT_SCOPE_HOST?
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Julian Anastasov April 13, 2014, 2:57 p.m. UTC | #13
Hello,

On Fri, 11 Apr 2014, Cong Wang wrote:

> On Wed, Apr 9, 2014 at 1:03 AM, Julian Anastasov <ja@ssi.bg> wrote:
> >
> >         I'm still not sure that it is valid to process
> > input route on loopback device. There is also a case
> > of broadcasts/multicasts that should preserve their local output
> > route while being looped in ip_mc_output(). In this
> > case device is not loopback. I need more time to check
> > the code again...

	It looks like LOOPBACK_IFINDEX is used only for
flowi4_iif. Users can distinguish input from output route
lookups by 'ip rule ... iif lo' matching. This trick works
because the lo interface was never used for input routes.
This is documented in iproute2/doc/ip-cref.tex, look for
'iif NAME' for the ip rules.

	As result, input routes are not expected on loopback.
Not sure if you can configure your setup in different way.
We still can solve it by fixing another problem, read below.

	Considering how flowi_iif is matched in fib_rule_match(),
it is wrong to provide 0 for the flowi4_iif field in
__fib_validate_source(). By this way, users can not match
this reverse flow with 'ip rule ... iif lo'. Of course,
such rules are rarely used.

	There are other places that need to provide
LOOPBACK_IFINDEX instead of 0:

- include/net/flow.h flowi4_init_output()
- net/ipv4/ipmr.c reg_vif_xmit(): skb->skb_iif ? : LOOPBACK_IFINDEX;
- net/ipv4/netfilter/ipt_rpfilter.c rpfilter_mt()

	Simply, flowi4_iif must not contain 0, it does not
look logical to ignore all ip rules with specified iif.

	So, we can implement such changes:

- all places that provide LOOPBACK_IFINDEX to fib_validate_source()
should be changed to provide 0 for oif. By this way the
'dev->ifindex != oif' check will work for your setup.

- then __fib_validate_source() will use:

	-	fl4.flowi4_iif = oif;
	+	fl4.flowi4_iif = oif ? : LOOPBACK_IFINDEX;

	What we achieve is:

- correct 'ip rule ... iif lo' matching for rp_filter checks

- fib_validate_source() will not call __fib_validate_source()
for loopback interface (when rp_filter is 0).

	By this way we do not add extra check in fib_validate_source(),
it goes in __fib_validate_source. The special semantic for
'lo' is coded at the right place, just for flowi4_iif.

Regards

--
Julian Anastasov <ja@ssi.bg>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Cong Wang April 14, 2014, 10:59 p.m. UTC | #14
On Sun, Apr 13, 2014 at 7:57 AM, Julian Anastasov <ja@ssi.bg> wrote:
>
>         It looks like LOOPBACK_IFINDEX is used only for
> flowi4_iif. Users can distinguish input from output route
> lookups by 'ip rule ... iif lo' matching. This trick works
> because the lo interface was never used for input routes.
> This is documented in iproute2/doc/ip-cref.tex, look for
> 'iif NAME' for the ip rules.
>
>         As result, input routes are not expected on loopback.
> Not sure if you can configure your setup in different way.
> We still can solve it by fixing another problem, read below.
>
>         Considering how flowi_iif is matched in fib_rule_match(),
> it is wrong to provide 0 for the flowi4_iif field in
> __fib_validate_source(). By this way, users can not match
> this reverse flow with 'ip rule ... iif lo'. Of course,
> such rules are rarely used.
>
>         There are other places that need to provide
> LOOPBACK_IFINDEX instead of 0:
>
> - include/net/flow.h flowi4_init_output()
> - net/ipv4/ipmr.c reg_vif_xmit(): skb->skb_iif ? : LOOPBACK_IFINDEX;
> - net/ipv4/netfilter/ipt_rpfilter.c rpfilter_mt()
>
>         Simply, flowi4_iif must not contain 0, it does not
> look logical to ignore all ip rules with specified iif.
>
>         So, we can implement such changes:
>
> - all places that provide LOOPBACK_IFINDEX to fib_validate_source()
> should be changed to provide 0 for oif. By this way the
> 'dev->ifindex != oif' check will work for your setup.
>
> - then __fib_validate_source() will use:
>
>         -       fl4.flowi4_iif = oif;
>         +       fl4.flowi4_iif = oif ? : LOOPBACK_IFINDEX;
>
>         What we achieve is:
>
> - correct 'ip rule ... iif lo' matching for rp_filter checks
>
> - fib_validate_source() will not call __fib_validate_source()
> for loopback interface (when rp_filter is 0).
>
>         By this way we do not add extra check in fib_validate_source(),
> it goes in __fib_validate_source. The special semantic for
> 'lo' is coded at the right place, just for flowi4_iif.

Thanks for the details! This makes a lot sense to me. :)

I just cooked 3 patches based on your suggestion, I will give them
some quick tests before sending them out.

Will send them soon with you Cc'ed.

Regards.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 1a629f8..3f9e324 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -321,7 +321,8 @@  int fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
 	int r = secpath_exists(skb) ? 0 : IN_DEV_RPFILTER(idev);
 
 	if (!r && !fib_num_tclassid_users(dev_net(dev)) &&
-	    (dev->ifindex != oif || !IN_DEV_TX_REDIRECTS(idev))) {
+	    (dev->ifindex != oif || !IN_DEV_FORWARD(idev) ||
+	     !IN_DEV_TX_REDIRECTS(idev))) {
 		*itag = 0;
 		return 0;
 	}