netfilter: Update ip_route_me_harder to consider L3 domain

Message ID 1478715880-18952-1-git-send-email-dsa@cumulusnetworks.com
State Accepted
Delegated to: Pablo Neira

Commit Message

David Ahern Nov. 9, 2016, 6:24 p.m. UTC
ip_route_me_harder does not consider the L3 domain and sends lookups
to the wrong table. For example, consider the following output rule:

iptables -I OUTPUT -p tcp --dport 12345 -j REJECT --reject-with tcp-reset

Using perf to analyze lookups via the fib_table_lookup tracepoint shows:

vrf-test  1187 [001] 46887.295927: fib:fib_table_lookup: table 255 oif 0 iif 0 src 0.0.0.0 dst 10.100.1.254 tos 0 scope 0 flags 0
        ffffffff8143922c perf_trace_fib_table_lookup ([kernel.kallsyms])
        ffffffff81493aac fib_table_lookup ([kernel.kallsyms])
        ffffffff8148dda3 __inet_dev_addr_type ([kernel.kallsyms])
        ffffffff8148ddf6 inet_addr_type ([kernel.kallsyms])
        ffffffff8149e344 ip_route_me_harder ([kernel.kallsyms])

and

vrf-test  1187 [001] 46887.295933: fib:fib_table_lookup: table 255 oif 0 iif 1 src 10.100.1.254 dst 10.100.1.2 tos 0 scope 0 flags
        ffffffff8143922c perf_trace_fib_table_lookup ([kernel.kallsyms])
        ffffffff81493aac fib_table_lookup ([kernel.kallsyms])
        ffffffff814998ff fib4_rule_action ([kernel.kallsyms])
        ffffffff81437f35 fib_rules_lookup ([kernel.kallsyms])
        ffffffff81499758 __fib_lookup ([kernel.kallsyms])
        ffffffff8144f010 fib_lookup.constprop.34 ([kernel.kallsyms])
        ffffffff8144f759 __ip_route_output_key_hash ([kernel.kallsyms])
        ffffffff8144fc6a ip_route_output_flow ([kernel.kallsyms])
        ffffffff8149e39b ip_route_me_harder ([kernel.kallsyms])

In both cases the lookups are directed to table 255 rather than the
table associated with the device via the L3 domain. Update both
lookups to pull the L3 domain from the dst currently attached to the
skb.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
 net/ipv4/netfilter.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Pablo Neira Ayuso Nov. 14, 2016, 10:59 p.m. UTC | #1
On Wed, Nov 09, 2016 at 10:24:40AM -0800, David Ahern wrote:
> ip_route_me_harder is not considering the L3 domain and sending lookups
> to the wrong table. For example consider the following output rule:
>
> iptables -I OUTPUT -p tcp --dport 12345 -j REJECT --reject-with tcp-reset
> 
> using perf to analyze lookups via the fib_table_lookup tracepoint shows:
> 
> vrf-test  1187 [001] 46887.295927: fib:fib_table_lookup: table 255 oif 0 iif 0 src 0.0.0.0 dst 10.100.1.254 tos 0 scope 0 flags 0
>         ffffffff8143922c perf_trace_fib_table_lookup ([kernel.kallsyms])
>         ffffffff81493aac fib_table_lookup ([kernel.kallsyms])
>         ffffffff8148dda3 __inet_dev_addr_type ([kernel.kallsyms])
>         ffffffff8148ddf6 inet_addr_type ([kernel.kallsyms])
>         ffffffff8149e344 ip_route_me_harder ([kernel.kallsyms])
> 
> and
> 
> vrf-test  1187 [001] 46887.295933: fib:fib_table_lookup: table 255 oif 0 iif 1 src 10.100.1.254 dst 10.100.1.2 tos 0 scope 0 flags
>         ffffffff8143922c perf_trace_fib_table_lookup ([kernel.kallsyms])
>         ffffffff81493aac fib_table_lookup ([kernel.kallsyms])
>         ffffffff814998ff fib4_rule_action ([kernel.kallsyms])
>         ffffffff81437f35 fib_rules_lookup ([kernel.kallsyms])
>         ffffffff81499758 __fib_lookup ([kernel.kallsyms])
>         ffffffff8144f010 fib_lookup.constprop.34 ([kernel.kallsyms])
>         ffffffff8144f759 __ip_route_output_key_hash ([kernel.kallsyms])
>         ffffffff8144fc6a ip_route_output_flow ([kernel.kallsyms])
>         ffffffff8149e39b ip_route_me_harder ([kernel.kallsyms])
> 
> In both cases the lookups are directed to table 255 rather than the
> table associated with the device via the L3 domain. Update both
> lookups to pull the L3 domain from the dst currently attached to the
> skb.

Does ip6_route_me_harder need an update too?
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Ahern Nov. 14, 2016, 11:04 p.m. UTC | #2
On 11/14/16 3:59 PM, Pablo Neira Ayuso wrote:
> Does ip6_route_me_harder need an update too?

I have not hit a use case yet. Rather than blindly going through and adding l3mdev hooks, I would like to tie the changes to known use cases.
Pablo Neira Ayuso Nov. 14, 2016, 11:48 p.m. UTC | #3
Hi David,

On Mon, Nov 14, 2016 at 04:04:26PM -0700, David Ahern wrote:
> On 11/14/16 3:59 PM, Pablo Neira Ayuso wrote:
> > Does ip6_route_me_harder need an update too?
> 
> I have not hit a use case yet. Rather than blindly going through and
> adding l3mdev hooks I would like to tie the changes to known use
> cases.

Hm, your follow up patch updates nf_send_reset6() but not
nf_send_reset(). Sorry but it strikes me as inconsistent that some
spots are updated and some others are not.
Pablo Neira Ayuso Nov. 14, 2016, 11:49 p.m. UTC | #4
On Tue, Nov 15, 2016 at 12:48:17AM +0100, Pablo Neira Ayuso wrote:
> Hi David,
> 
> On Mon, Nov 14, 2016 at 04:04:26PM -0700, David Ahern wrote:
> > On 11/14/16 3:59 PM, Pablo Neira Ayuso wrote:
> > > Does ip6_route_me_harder need an update too?
> > 
> > I have not hit a use case yet. Rather than blindly going through and
> > adding l3mdev hooks I would like to tie the changes to known use
> > cases.
> 
> Hm, your follow up patch updates nf_send_reset6() but not
> nf_send_reset(). Sorry but it strikes me as inconsistent that some
> spots are updated and some others are not.

What use cases do you have in mind, btw? I can help test other
scenarios and fix other spots too if it makes sense to do it in one
go.
David Ahern Nov. 14, 2016, 11:53 p.m. UTC | #5
On 11/14/16 4:49 PM, Pablo Neira Ayuso wrote:
> On Tue, Nov 15, 2016 at 12:48:17AM +0100, Pablo Neira Ayuso wrote:
>> Hi David,
>>
>> On Mon, Nov 14, 2016 at 04:04:26PM -0700, David Ahern wrote:
>>> On 11/14/16 3:59 PM, Pablo Neira Ayuso wrote:
>>>> Does ip6_route_me_harder need an update too?
>>>
>>> I have not hit a use case yet. Rather than blindly going through and
>>> adding l3mdev hooks I would like to tie the changes to known use
>>> cases.
>>
>> Hm, your follow up patch updates nf_send_reset6() but not
>> nf_send_reset(). Sorry but it strikes me as inconsistent that some
>> spots are updated and some others are not.
> 
> What use cases do you have in mind, btw? I can help test other
> scenarios and fix other spots too if it makes sense to do it in one
> go.
> 

As mentioned in the commit message, both this patch and the IPv6 one get the REJECT target working for tcp-reset:

iptables -I OUTPUT -p tcp --dport 12345 -j REJECT --reject-with tcp-reset

ip6tables -A OUTPUT -p tcp --dport 12345 -j REJECT --reject-with tcp-reset


Pablo Neira Ayuso Nov. 24, 2016, 11:45 a.m. UTC | #6
On Wed, Nov 09, 2016 at 10:24:40AM -0800, David Ahern wrote:
> ip_route_me_harder is not considering the L3 domain and sending lookups
> to the wrong table. For example consider the following output rule:
> 
> iptables -I OUTPUT -p tcp --dport 12345 -j REJECT --reject-with tcp-reset
> 
> using perf to analyze lookups via the fib_table_lookup tracepoint shows:
> 
> vrf-test  1187 [001] 46887.295927: fib:fib_table_lookup: table 255 oif 0 iif 0 src 0.0.0.0 dst 10.100.1.254 tos 0 scope 0 flags 0
>         ffffffff8143922c perf_trace_fib_table_lookup ([kernel.kallsyms])
>         ffffffff81493aac fib_table_lookup ([kernel.kallsyms])
>         ffffffff8148dda3 __inet_dev_addr_type ([kernel.kallsyms])
>         ffffffff8148ddf6 inet_addr_type ([kernel.kallsyms])
>         ffffffff8149e344 ip_route_me_harder ([kernel.kallsyms])
> 
> and
> 
> vrf-test  1187 [001] 46887.295933: fib:fib_table_lookup: table 255 oif 0 iif 1 src 10.100.1.254 dst 10.100.1.2 tos 0 scope 0 flags
>         ffffffff8143922c perf_trace_fib_table_lookup ([kernel.kallsyms])
>         ffffffff81493aac fib_table_lookup ([kernel.kallsyms])
>         ffffffff814998ff fib4_rule_action ([kernel.kallsyms])
>         ffffffff81437f35 fib_rules_lookup ([kernel.kallsyms])
>         ffffffff81499758 __fib_lookup ([kernel.kallsyms])
>         ffffffff8144f010 fib_lookup.constprop.34 ([kernel.kallsyms])
>         ffffffff8144f759 __ip_route_output_key_hash ([kernel.kallsyms])
>         ffffffff8144fc6a ip_route_output_flow ([kernel.kallsyms])
>         ffffffff8149e39b ip_route_me_harder ([kernel.kallsyms])
> 
> In both cases the lookups are directed to table 255 rather than the
> table associated with the device via the L3 domain. Update both
> lookups to pull the L3 domain from the dst currently attached to the
> skb.

Applied.
Patch

diff --git a/net/ipv4/netfilter.c b/net/ipv4/netfilter.c
index c3776ff6749f..b3cc1335adbc 100644
--- a/net/ipv4/netfilter.c
+++ b/net/ipv4/netfilter.c
@@ -24,10 +24,11 @@  int ip_route_me_harder(struct net *net, struct sk_buff *skb, unsigned int addr_t
 	struct flowi4 fl4 = {};
 	__be32 saddr = iph->saddr;
 	__u8 flags = skb->sk ? inet_sk_flowi_flags(skb->sk) : 0;
+	struct net_device *dev = skb_dst(skb)->dev;
 	unsigned int hh_len;
 
 	if (addr_type == RTN_UNSPEC)
-		addr_type = inet_addr_type(net, saddr);
+		addr_type = inet_addr_type_dev_table(net, dev, saddr);
 	if (addr_type == RTN_LOCAL || addr_type == RTN_UNICAST)
 		flags |= FLOWI_FLAG_ANYSRC;
 	else
@@ -40,6 +41,8 @@  int ip_route_me_harder(struct net *net, struct sk_buff *skb, unsigned int addr_t
 	fl4.saddr = saddr;
 	fl4.flowi4_tos = RT_TOS(iph->tos);
 	fl4.flowi4_oif = skb->sk ? skb->sk->sk_bound_dev_if : 0;
+	if (!fl4.flowi4_oif)
+		fl4.flowi4_oif = l3mdev_master_ifindex(dev);
 	fl4.flowi4_mark = skb->mark;
 	fl4.flowi4_flags = flags;
 	rt = ip_route_output_key(net, &fl4);
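
For readers who want to reason about the second hunk outside the kernel, here is a rough userspace sketch of the output-interface selection it introduces. It is not kernel code: struct model_dev, struct model_sock, model_l3mdev_master_ifindex() and select_oif() are hypothetical names made up for illustration, and the model of l3mdev_master_ifindex() (return the device's own ifindex if it is an L3 master, the master's ifindex if it is enslaved to one, otherwise 0) is an assumption about that helper's behaviour rather than something stated in this thread.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct net_device; only the fields the sketch needs. */
struct model_dev {
	int ifindex;        /* index of this device */
	int is_l3_master;   /* non-zero if the device is itself an L3 master (e.g. a VRF) */
	int master_ifindex; /* ifindex of the L3 master it is enslaved to, 0 if none */
};

/* Hypothetical stand-in for struct sock; models sk->sk_bound_dev_if only. */
struct model_sock {
	int bound_dev_if;   /* 0 means the socket is not bound to a device */
};

/* Assumed behaviour of l3mdev_master_ifindex(): 0 when no L3 domain applies. */
int model_l3mdev_master_ifindex(const struct model_dev *dev)
{
	if (dev->is_l3_master)
		return dev->ifindex;
	return dev->master_ifindex;
}

/*
 * Mirrors the patched assignment of fl4.flowi4_oif: a socket binding wins,
 * otherwise fall back to the L3 master of the dst device so the route
 * lookup is steered to the table of that L3 domain.
 */
int select_oif(const struct model_sock *sk, const struct model_dev *dev)
{
	int oif = sk ? sk->bound_dev_if : 0;

	if (!oif)
		oif = model_l3mdev_master_ifindex(dev);
	return oif;
}
```

With no socket and a device outside any L3 domain the result stays 0, which matches the pre-patch behaviour; only packets whose dst device belongs to an L3 domain see a different oif.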