diff mbox

[net-next,2/3] netfilter: ip6_tables: use reasm skb for matching

Message ID 1383649333-6321-3-git-send-email-jiri@resnulli.us
State Superseded
Headers show

Commit Message

Jiri Pirko Nov. 5, 2013, 11:02 a.m. UTC
Currently, when ipv6 fragment goes through the netfilter, match
functions are called on them directly. This might cause match function
to fail. So benefit from the fact that nf_defrag_ipv6 constructs
reassembled skb for us and use this reassembled skb for matching.

This patch fixes for example following situation:
On HOSTA do:
ip6tables -I INPUT -p icmpv6 -j DROP
ip6tables -I INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT

and on HOSTB you do:
ping6 HOSTA -s2000    (MTU is 1500)

Incoming echo requests will be filtered out on HOSTA. This issue does
not occur with smaller packets than MTU (where fragmentation does not happen).

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 net/ipv6/netfilter/ip6_tables.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Comments

Marcelo Leitner Nov. 5, 2013, 11:50 a.m. UTC | #1
Em 05-11-2013 09:02, Jiri Pirko escreveu:
> Currently, when ipv6 fragment goes through the netfilter, match
> functions are called on them directly. This might cause match function
> to fail. So benefit from the fact that nf_defrag_ipv6 constructs
> reassembled skb for us and use this reassembled skb for matching.
>
> This patch fixes for example following situation:
> On HOSTA do:
> ip6tables -I INPUT -p icmpv6 -j DROP
> ip6tables -I INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT
>
> and on HOSTB you do:
> ping6 HOSTA -s2000    (MTU is 1500)
>
> Incoming echo requests will be filtered out on HOSTA. This issue does
> not occur with smaller packets than MTU (where fragmentation does not happen).
>
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>

> ---
>   net/ipv6/netfilter/ip6_tables.c | 8 ++++++--
>   1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
> index 710238f..ec9cb1a 100644
> --- a/net/ipv6/netfilter/ip6_tables.c
> +++ b/net/ipv6/netfilter/ip6_tables.c
> @@ -328,6 +328,10 @@ ip6t_do_table(struct sk_buff *skb,
>   	const struct xt_table_info *private;
>   	struct xt_action_param acpar;
>   	unsigned int addend;
> +	struct sk_buff *reasm = skb_nfct_reasm(skb);
> +
> +	if (!reasm)
> +		reasm = skb;
>
>   	/* Initialization */
>   	indev = in ? in->name : nulldevname;
> @@ -368,7 +372,7 @@ ip6t_do_table(struct sk_buff *skb,
>
>   		IP_NF_ASSERT(e);
>   		acpar.thoff = 0;
> -		if (!ip6_packet_match(skb, indev, outdev, &e->ipv6,
> +		if (!ip6_packet_match(reasm, indev, outdev, &e->ipv6,
>   		    &acpar.thoff, &acpar.fragoff, &acpar.hotdrop)) {
>    no_match:
>   			e = ip6t_next_entry(e);
> @@ -378,7 +382,7 @@ ip6t_do_table(struct sk_buff *skb,
>   		xt_ematch_foreach(ematch, e) {
>   			acpar.match     = ematch->u.kernel.match;
>   			acpar.matchinfo = ematch->data;
> -			if (!acpar.match->match(skb, &acpar))
> +			if (!acpar.match->match(reasm, &acpar))
>   				goto no_match;
>   		}
>
>

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Florian Westphal Nov. 5, 2013, 1:32 p.m. UTC | #2
Jiri Pirko <jiri@resnulli.us> wrote:
> This patch fixes for example following situation:
> On HOSTA do:
> ip6tables -I INPUT -p icmpv6 -j DROP
> ip6tables -I INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT

untested:

-A INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT
-A INPUT -p icmpv6 -m conntrack --ctstatus CONFIRMED -j ACCEPT
-A INPUT -p icmpv6 -j DROP

> and on HOSTB you do:
> ping6 HOSTA -s2000    (MTU is 1500)
> 
> Incoming echo requests will be filtered out on HOSTA. This issue does
> not occur with smaller packets than MTU (where fragmentation does not happen).

Patrick, any reason not to kill the special-casing (ct has assigned helper or
unconfirmed conntrack) in __ipv6_conntrack_in() ?

This should make ipv6 frag behaviour consistent; right now its rather
confusing from ruleset point of view, especially the first packet
of a connection is always seen as reassembled.

So with Jiris rules

-A INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT
-A INPUT -p icmpv6 -j DROP

ping6 -s $bignum works for the first packet but not for subsequent ones
which is quite irritating.

This change would obviously have userspace visibility (e.g. -m frag
won't work anymore when conntrack is on), but so far I couldn't come
up with a scenario where a legitimate ruleset could break.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy Nov. 5, 2013, 1:41 p.m. UTC | #3
On Tue, Nov 05, 2013 at 02:32:05PM +0100, Florian Westphal wrote:
> Jiri Pirko <jiri@resnulli.us> wrote:
> > This patch fixes for example following situation:
> > On HOSTA do:
> > ip6tables -I INPUT -p icmpv6 -j DROP
> > ip6tables -I INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT
> 
> untested:
> 
> -A INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT
> -A INPUT -p icmpv6 -m conntrack --ctstatus CONFIRMED -j ACCEPT
> -A INPUT -p icmpv6 -j DROP
> 
> > and on HOSTB you do:
> > ping6 HOSTA -s2000    (MTU is 1500)
> > 
> > Incoming echo requests will be filtered out on HOSTA. This issue does
> > not occur with smaller packets than MTU (where fragmentation does not happen).
> 
> Patrick, any reason not to kill the special-casing (ct has assigned helper or
> unconfirmed conntrack) in __ipv6_conntrack_in() ?
> 
> This should make ipv6 frag behaviour consistent; right now its rather
> confusing from ruleset point of view, especially the first packet
> of a connection is always seen as reassembled.
> 
> So with Jiris rules
> 
> -A INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT
> -A INPUT -p icmpv6 -j DROP
> 
> ping6 -s $bignum works for the first packet but not for subsequent ones
> which is quite irritating.

Well, the reason was to avoid unnecessary work doing refragmentation
unless really required. I know its rather complicated, but IPv6 has
always required treating fragments manually or using conntrack state.

I'm not objecting to changing this, but the patches as they are are
not the way to go. First, moving nfct_frag to struct sk_buff seems
like a real waste of space for this quite rare case. Also, we can't
just use the reassembled packet in ip6tables, when modifying it we
will still output the unchanged fragments. An last of all, we'll be
executing the rules on the reassembled packet multiple times, one
for each fragment.

So if someone wants to change this, simply *only* pass the reassembled
packet through the netfilter hooks and drop the fragments, as in IPv4.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiri Pirko Nov. 5, 2013, 3:01 p.m. UTC | #4
Tue, Nov 05, 2013 at 02:41:19PM CET, kaber@trash.net wrote:
>On Tue, Nov 05, 2013 at 02:32:05PM +0100, Florian Westphal wrote:
>> Jiri Pirko <jiri@resnulli.us> wrote:
>> > This patch fixes for example following situation:
>> > On HOSTA do:
>> > ip6tables -I INPUT -p icmpv6 -j DROP
>> > ip6tables -I INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT
>> 
>> untested:
>> 
>> -A INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT
>> -A INPUT -p icmpv6 -m conntrack --ctstatus CONFIRMED -j ACCEPT
>> -A INPUT -p icmpv6 -j DROP
>> 
>> > and on HOSTB you do:
>> > ping6 HOSTA -s2000    (MTU is 1500)
>> > 
>> > Incoming echo requests will be filtered out on HOSTA. This issue does
>> > not occur with smaller packets than MTU (where fragmentation does not happen).
>> 
>> Patrick, any reason not to kill the special-casing (ct has assigned helper or
>> unconfirmed conntrack) in __ipv6_conntrack_in() ?
>> 
>> This should make ipv6 frag behaviour consistent; right now its rather
>> confusing from ruleset point of view, especially the first packet
>> of a connection is always seen as reassembled.
>> 
>> So with Jiris rules
>> 
>> -A INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT
>> -A INPUT -p icmpv6 -j DROP
>> 
>> ping6 -s $bignum works for the first packet but not for subsequent ones
>> which is quite irritating.
>
>Well, the reason was to avoid unnecessary work doing refragmentation
>unless really required. I know its rather complicated, but IPv6 has
>always required treating fragments manually or using conntrack state.
>
>I'm not objecting to changing this, but the patches as they are are
>not the way to go. First, moving nfct_frag to struct sk_buff seems

I'm a bit lost. What "nfct_frag" are you reffering to here?

>like a real waste of space for this quite rare case. Also, we can't
>just use the reassembled packet in ip6tables, when modifying it we
>will still output the unchanged fragments. An last of all, we'll be
>executing the rules on the reassembled packet multiple times, one
>for each fragment.

Reassembled skb would be only used for matching where no changes takes
place.

End even though, the matching is now done for each fragment skb anyway. The
change is only to do it on different skb. I see no erformance or any
other problem in that.

>
>So if someone wants to change this, simply *only* pass the reassembled
>packet through the netfilter hooks and drop the fragments, as in IPv4.

This is unfortunatelly not possible because in forwarding use case, the
fragments have to be send out as they come in.

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Florian Westphal Nov. 5, 2013, 3:39 p.m. UTC | #5
Jiri Pirko <jiri@resnulli.us> wrote:
> Tue, Nov 05, 2013 at 02:41:19PM CET, kaber@trash.net wrote:
> >executing the rules on the reassembled packet multiple times, one
> >for each fragment.
> 
[..]
> End even though, the matching is now done for each fragment skb anyway. The
> change is only to do it on different skb. I see no erformance or any
> other problem in that.

One problem that comes to mind is that nfacct or quota match will
now account num_of_fragments * length_of_reassemled_skb bytes.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy Nov. 5, 2013, 6:16 p.m. UTC | #6
On Tue, Nov 05, 2013 at 04:01:15PM +0100, Jiri Pirko wrote:
> Tue, Nov 05, 2013 at 02:41:19PM CET, kaber@trash.net wrote:
> >On Tue, Nov 05, 2013 at 02:32:05PM +0100, Florian Westphal wrote:
> >> Jiri Pirko <jiri@resnulli.us> wrote:
> >> > This patch fixes for example following situation:
> >> > On HOSTA do:
> >> > ip6tables -I INPUT -p icmpv6 -j DROP
> >> > ip6tables -I INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT
> >> 
> >> untested:
> >> 
> >> -A INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT
> >> -A INPUT -p icmpv6 -m conntrack --ctstatus CONFIRMED -j ACCEPT
> >> -A INPUT -p icmpv6 -j DROP
> >> 
> >> > and on HOSTB you do:
> >> > ping6 HOSTA -s2000    (MTU is 1500)
> >> > 
> >> > Incoming echo requests will be filtered out on HOSTA. This issue does
> >> > not occur with smaller packets than MTU (where fragmentation does not happen).
> >> 
> >> Patrick, any reason not to kill the special-casing (ct has assigned helper or
> >> unconfirmed conntrack) in __ipv6_conntrack_in() ?
> >> 
> >> This should make ipv6 frag behaviour consistent; right now its rather
> >> confusing from ruleset point of view, especially the first packet
> >> of a connection is always seen as reassembled.
> >> 
> >> So with Jiris rules
> >> 
> >> -A INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT
> >> -A INPUT -p icmpv6 -j DROP
> >> 
> >> ping6 -s $bignum works for the first packet but not for subsequent ones
> >> which is quite irritating.
> >
> >Well, the reason was to avoid unnecessary work doing refragmentation
> >unless really required. I know its rather complicated, but IPv6 has
> >always required treating fragments manually or using conntrack state.
> >
> >I'm not objecting to changing this, but the patches as they are are
> >not the way to go. First, moving nfct_frag to struct sk_buff seems
> 
> I'm a bit lost. What "nfct_frag" are you reffering to here?

I meant nfct_reasm of course.

> >like a real waste of space for this quite rare case. Also, we can't
> >just use the reassembled packet in ip6tables, when modifying it we
> >will still output the unchanged fragments. An last of all, we'll be
> >executing the rules on the reassembled packet multiple times, one
> >for each fragment.
> 
> Reassembled skb would be only used for matching where no changes takes
> place.

That still doesn't work, our matches are not purely passive.

> End even though, the matching is now done for each fragment skb anyway. The
> change is only to do it on different skb. I see no erformance or any
> other problem in that.

Accounting, quota, statistic, limit, ... come to mind. Basically any
match that keeps state.

> >So if someone wants to change this, simply *only* pass the reassembled
> >packet through the netfilter hooks and drop the fragments, as in IPv4.
> 
> This is unfortunatelly not possible because in forwarding use case, the
> fragments have to be send out as they come in.

No, the IPv6 NAT patches fixed that, we still do proper refragmentation
and we still respect the original fragment sizes, thus are not responsible
for potentially exceeding the PMTU on the following path.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy Nov. 5, 2013, 6:19 p.m. UTC | #7
On Tue, Nov 05, 2013 at 04:39:21PM +0100, Florian Westphal wrote:
> Jiri Pirko <jiri@resnulli.us> wrote:
> > Tue, Nov 05, 2013 at 02:41:19PM CET, kaber@trash.net wrote:
> > >executing the rules on the reassembled packet multiple times, one
> > >for each fragment.
> > 
> [..]
> > End even though, the matching is now done for each fragment skb anyway. The
> > change is only to do it on different skb. I see no erformance or any
> > other problem in that.
> 
> One problem that comes to mind is that nfacct or quota match will
> now account num_of_fragments * length_of_reassemled_skb bytes.

indeed. The easiest way to fix all this (and, btw, also the
pskb_expand_head() oops which is currently reported by multiple people)
is to get rid of all the fragmentation handling and simply use the
reassembled skb.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiri Pirko Nov. 5, 2013, 6:21 p.m. UTC | #8
Tue, Nov 05, 2013 at 07:19:24PM CET, kaber@trash.net wrote:
>On Tue, Nov 05, 2013 at 04:39:21PM +0100, Florian Westphal wrote:
>> Jiri Pirko <jiri@resnulli.us> wrote:
>> > Tue, Nov 05, 2013 at 02:41:19PM CET, kaber@trash.net wrote:
>> > >executing the rules on the reassembled packet multiple times, one
>> > >for each fragment.
>> > 
>> [..]
>> > End even though, the matching is now done for each fragment skb anyway. The
>> > change is only to do it on different skb. I see no erformance or any
>> > other problem in that.
>> 
>> One problem that comes to mind is that nfacct or quota match will
>> now account num_of_fragments * length_of_reassemled_skb bytes.
>
>indeed. The easiest way to fix all this (and, btw, also the
>pskb_expand_head() oops which is currently reported by multiple people)
>is to get rid of all the fragmentation handling and simply use the
>reassembled skb.

Okay. That will resolve the skb->sk rewrite problem as well. I will
prepare a patch. 
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiri Pirko Nov. 5, 2013, 8:55 p.m. UTC | #9
Tue, Nov 05, 2013 at 07:16:33PM CET, kaber@trash.net wrote:
>On Tue, Nov 05, 2013 at 04:01:15PM +0100, Jiri Pirko wrote:
>> Tue, Nov 05, 2013 at 02:41:19PM CET, kaber@trash.net wrote:
>> >On Tue, Nov 05, 2013 at 02:32:05PM +0100, Florian Westphal wrote:
>> >> Jiri Pirko <jiri@resnulli.us> wrote:
>> >> > This patch fixes for example following situation:
>> >> > On HOSTA do:
>> >> > ip6tables -I INPUT -p icmpv6 -j DROP
>> >> > ip6tables -I INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT
>> >> 
>> >> untested:
>> >> 
>> >> -A INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT
>> >> -A INPUT -p icmpv6 -m conntrack --ctstatus CONFIRMED -j ACCEPT
>> >> -A INPUT -p icmpv6 -j DROP
>> >> 
>> >> > and on HOSTB you do:
>> >> > ping6 HOSTA -s2000    (MTU is 1500)
>> >> > 
>> >> > Incoming echo requests will be filtered out on HOSTA. This issue does
>> >> > not occur with smaller packets than MTU (where fragmentation does not happen).
>> >> 
>> >> Patrick, any reason not to kill the special-casing (ct has assigned helper or
>> >> unconfirmed conntrack) in __ipv6_conntrack_in() ?
>> >> 
>> >> This should make ipv6 frag behaviour consistent; right now its rather
>> >> confusing from ruleset point of view, especially the first packet
>> >> of a connection is always seen as reassembled.
>> >> 
>> >> So with Jiris rules
>> >> 
>> >> -A INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT
>> >> -A INPUT -p icmpv6 -j DROP
>> >> 
>> >> ping6 -s $bignum works for the first packet but not for subsequent ones
>> >> which is quite irritating.
>> >
>> >Well, the reason was to avoid unnecessary work doing refragmentation
>> >unless really required. I know its rather complicated, but IPv6 has
>> >always required treating fragments manually or using conntrack state.
>> >
>> >I'm not objecting to changing this, but the patches as they are are
>> >not the way to go. First, moving nfct_frag to struct sk_buff seems
>> 
>> I'm a bit lost. What "nfct_frag" are you reffering to here?
>
>I meant nfct_reasm of course.

The patch is not moving this to struct sk_buff. It is already there.

>
>> >like a real waste of space for this quite rare case. Also, we can't
>> >just use the reassembled packet in ip6tables, when modifying it we
>> >will still output the unchanged fragments. An last of all, we'll be
>> >executing the rules on the reassembled packet multiple times, one
>> >for each fragment.
>> 
>> Reassembled skb would be only used for matching where no changes takes
>> place.
>
>That still doesn't work, our matches are not purely passive.
>
>> End even though, the matching is now done for each fragment skb anyway. The
>> change is only to do it on different skb. I see no erformance or any
>> other problem in that.
>
>Accounting, quota, statistic, limit, ... come to mind. Basically any
>match that keeps state.

Ok. Makes sense.

>
>> >So if someone wants to change this, simply *only* pass the reassembled
>> >packet through the netfilter hooks and drop the fragments, as in IPv4.
>> 
>> This is unfortunatelly not possible because in forwarding use case, the
>> fragments have to be send out as they come in.
>
>No, the IPv6 NAT patches fixed that, we still do proper refragmentation
>and we still respect the original fragment sizes, thus are not responsible
>for potentially exceeding the PMTU on the following path.

Ok. So the plan is to remove net/ipv6/netfilter/nf_conntrack_reasm.c
code entirely and use net/ipv6/reassembly.c code directly from
nf_defrag_ipv6. This would result in very similar code currently ipv4
has. 


--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy Nov. 5, 2013, 10:02 p.m. UTC | #10
On Tue, Nov 05, 2013 at 09:55:20PM +0100, Jiri Pirko wrote:
> Tue, Nov 05, 2013 at 07:16:33PM CET, kaber@trash.net wrote:
>
> >> I'm a bit lost. What "nfct_frag" are you reffering to here?
> >
> >I meant nfct_reasm of course.
> 
> The patch is not moving this to struct sk_buff. It is already there.

Right again, sorry, I was replying to Florian's mail without the
quoted patch, so I seem to have mixed up things in between :)

> >> >So if someone wants to change this, simply *only* pass the reassembled
> >> >packet through the netfilter hooks and drop the fragments, as in IPv4.
> >> 
> >> This is unfortunatelly not possible because in forwarding use case, the
> >> fragments have to be send out as they come in.
> >
> >No, the IPv6 NAT patches fixed that, we still do proper refragmentation
> >and we still respect the original fragment sizes, thus are not responsible
> >for potentially exceeding the PMTU on the following path.
> 
> Ok. So the plan is to remove net/ipv6/netfilter/nf_conntrack_reasm.c
> code entirely and use net/ipv6/reassembly.c code directly from
> nf_defrag_ipv6. This would result in very similar code currently ipv4
> has. 

If its possible to use net/ipv6/reassembly.c directly, even better,
so far I was thinking of just getting rid of the fragment replay,
but you're most likely right that this is the proper way to do it.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiri Pirko Nov. 6, 2013, 2:18 p.m. UTC | #11
>> >So if someone wants to change this, simply *only* pass the reassembled
>> >packet through the netfilter hooks and drop the fragments, as in IPv4.
>> 
>> This is unfortunatelly not possible because in forwarding use case, the
>> fragments have to be send out as they come in.
>
>No, the IPv6 NAT patches fixed that, we still do proper refragmentation
>and we still respect the original fragment sizes, thus are not responsible
>for potentially exceeding the PMTU on the following path.

Can you please point where this is done. Where the original fragment
sizes are stored and in which code are they restored? Thanks.

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Florian Westphal Nov. 6, 2013, 2:33 p.m. UTC | #12
Jiri Pirko <jiri@resnulli.us> wrote:
> >> >So if someone wants to change this, simply *only* pass the reassembled
> >> >packet through the netfilter hooks and drop the fragments, as in IPv4.
> >> 
> >> This is unfortunatelly not possible because in forwarding use case, the
> >> fragments have to be send out as they come in.
> >
> >No, the IPv6 NAT patches fixed that, we still do proper refragmentation
> >and we still respect the original fragment sizes, thus are not responsible
> >for potentially exceeding the PMTU on the following path.
> 
> Can you please point where this is done. Where the original fragment
> sizes are stored and in which code are they restored? Thanks.

Patrick is probably talking about

commit 4cdd34084d539c758d00c5dc7bf95db2e4f2bc70
(netfilter: nf_conntrack_ipv6: improve fragmentation handling)
which introduces 'frag_max_size' in inet6_skb_parm struct.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiri Pirko Nov. 6, 2013, 2:44 p.m. UTC | #13
Wed, Nov 06, 2013 at 03:33:49PM CET, fw@strlen.de wrote:
>Jiri Pirko <jiri@resnulli.us> wrote:
>> >> >So if someone wants to change this, simply *only* pass the reassembled
>> >> >packet through the netfilter hooks and drop the fragments, as in IPv4.
>> >> 
>> >> This is unfortunatelly not possible because in forwarding use case, the
>> >> fragments have to be send out as they come in.
>> >
>> >No, the IPv6 NAT patches fixed that, we still do proper refragmentation
>> >and we still respect the original fragment sizes, thus are not responsible
>> >for potentially exceeding the PMTU on the following path.
>> 
>> Can you please point where this is done. Where the original fragment
>> sizes are stored and in which code are they restored? Thanks.
>
>Patrick is probably talking about
>
>commit 4cdd34084d539c758d00c5dc7bf95db2e4f2bc70
>(netfilter: nf_conntrack_ipv6: improve fragmentation handling)
>which introduces 'frag_max_size' in inet6_skb_parm struct.

Thanks for the pointer. Interestingly though, according to my testing,
if reassembled packet would fit into outdev mtu, it is not fragmented
to the original frag size and it is send as single big packet. That is
I believe not correct.

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy Nov. 6, 2013, 2:51 p.m. UTC | #14
On Wed, Nov 06, 2013 at 03:44:53PM +0100, Jiri Pirko wrote:
> Wed, Nov 06, 2013 at 03:33:49PM CET, fw@strlen.de wrote:
> >Jiri Pirko <jiri@resnulli.us> wrote:
> >> >> >So if someone wants to change this, simply *only* pass the reassembled
> >> >> >packet through the netfilter hooks and drop the fragments, as in IPv4.
> >> >> 
> >> >> This is unfortunatelly not possible because in forwarding use case, the
> >> >> fragments have to be send out as they come in.
> >> >
> >> >No, the IPv6 NAT patches fixed that, we still do proper refragmentation
> >> >and we still respect the original fragment sizes, thus are not responsible
> >> >for potentially exceeding the PMTU on the following path.
> >> 
> >> Can you please point where this is done. Where the original fragment
> >> sizes are stored and in which code are they restored? Thanks.
> >
> >Patrick is probably talking about
> >
> >commit 4cdd34084d539c758d00c5dc7bf95db2e4f2bc70
> >(netfilter: nf_conntrack_ipv6: improve fragmentation handling)
> >which introduces 'frag_max_size' in inet6_skb_parm struct.

Indeed.

> Thanks for the pointer. Interestingly though, according to my testing,
> if reassembled packet would fit into outdev mtu, it is not fragmented
> to the original frag size and it is send as single big packet. That is
> I believe not correct.

Hmm right, that wasn't the intention. We currently only use frag_max_size
to send PKT_TOOBIG messages if the size exceeds the MTU, but not to trigger
fragmentation itself.

We probably need to handle this in ip6_finish_output(). Would you mind
fixing this up as well in your patches?
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiri Pirko Nov. 6, 2013, 3:29 p.m. UTC | #15
Wed, Nov 06, 2013 at 03:51:04PM CET, kaber@trash.net wrote:
>On Wed, Nov 06, 2013 at 03:44:53PM +0100, Jiri Pirko wrote:
>> Wed, Nov 06, 2013 at 03:33:49PM CET, fw@strlen.de wrote:
>> >Jiri Pirko <jiri@resnulli.us> wrote:
>> >> >> >So if someone wants to change this, simply *only* pass the reassembled
>> >> >> >packet through the netfilter hooks and drop the fragments, as in IPv4.
>> >> >> 
>> >> >> This is unfortunatelly not possible because in forwarding use case, the
>> >> >> fragments have to be send out as they come in.
>> >> >
>> >> >No, the IPv6 NAT patches fixed that, we still do proper refragmentation
>> >> >and we still respect the original fragment sizes, thus are not responsible
>> >> >for potentially exceeding the PMTU on the following path.
>> >> 
>> >> Can you please point where this is done. Where the original fragment
>> >> sizes are stored and in which code are they restored? Thanks.
>> >
>> >Patrick is probably talking about
>> >
>> >commit 4cdd34084d539c758d00c5dc7bf95db2e4f2bc70
>> >(netfilter: nf_conntrack_ipv6: improve fragmentation handling)
>> >which introduces 'frag_max_size' in inet6_skb_parm struct.
>
>Indeed.
>
>> Thanks for the pointer. Interestingly though, according to my testing,
>> if reassembled packet would fit into outdev mtu, it is not fragmented
>> to the original frag size and it is send as single big packet. That is
>> I believe not correct.
>
>Hmm right, that wasn't the intention. We currently only use frag_max_size
>to send PKT_TOOBIG messages if the size exceeds the MTU, but not to trigger
>fragmentation itself.
>
>We probably need to handle this in ip6_finish_output(). Would you mind
>fixing this up as well in your patches?

I'm working on it atm.


--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Nov. 6, 2013, 4:12 p.m. UTC | #16
On Wed, 2013-11-06 at 16:29 +0100, Jiri Pirko wrote:

> I'm working on it atm.

Note that we have a similar problem in general with GRO and forwading.

We do no check against output mtu, if the packet is GSO.

skb_is_gso() being true bypasses the safety checks.


--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index 710238f..ec9cb1a 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -328,6 +328,10 @@  ip6t_do_table(struct sk_buff *skb,
 	const struct xt_table_info *private;
 	struct xt_action_param acpar;
 	unsigned int addend;
+	struct sk_buff *reasm = skb_nfct_reasm(skb);
+
+	if (!reasm)
+		reasm = skb;
 
 	/* Initialization */
 	indev = in ? in->name : nulldevname;
@@ -368,7 +372,7 @@  ip6t_do_table(struct sk_buff *skb,
 
 		IP_NF_ASSERT(e);
 		acpar.thoff = 0;
-		if (!ip6_packet_match(skb, indev, outdev, &e->ipv6,
+		if (!ip6_packet_match(reasm, indev, outdev, &e->ipv6,
 		    &acpar.thoff, &acpar.fragoff, &acpar.hotdrop)) {
  no_match:
 			e = ip6t_next_entry(e);
@@ -378,7 +382,7 @@  ip6t_do_table(struct sk_buff *skb,
 		xt_ematch_foreach(ematch, e) {
 			acpar.match     = ematch->u.kernel.match;
 			acpar.matchinfo = ematch->data;
-			if (!acpar.match->match(skb, &acpar))
+			if (!acpar.match->match(reasm, &acpar))
 				goto no_match;
 		}