Message ID | 1383649333-6321-3-git-send-email-jiri@resnulli.us |
---|---|
State | Superseded |
Headers | show |
Em 05-11-2013 09:02, Jiri Pirko escreveu: > Currently, when ipv6 fragment goes through the netfilter, match > functions are called on them directly. This might cause match function > to fail. So benefit from the fact that nf_defrag_ipv6 constructs > reassembled skb for us and use this reassembled skb for matching. > > This patch fixes for example following situation: > On HOSTA do: > ip6tables -I INPUT -p icmpv6 -j DROP > ip6tables -I INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT > > and on HOSTB you do: > ping6 HOSTA -s2000 (MTU is 1500) > > Incoming echo requests will be filtered out on HOSTA. This issue does > not occur with smaller packets than MTU (where fragmentation does not happen). > > Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> > --- > net/ipv6/netfilter/ip6_tables.c | 8 ++++++-- > 1 file changed, 6 insertions(+), 2 deletions(-) > > diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c > index 710238f..ec9cb1a 100644 > --- a/net/ipv6/netfilter/ip6_tables.c > +++ b/net/ipv6/netfilter/ip6_tables.c > @@ -328,6 +328,10 @@ ip6t_do_table(struct sk_buff *skb, > const struct xt_table_info *private; > struct xt_action_param acpar; > unsigned int addend; > + struct sk_buff *reasm = skb_nfct_reasm(skb); > + > + if (!reasm) > + reasm = skb; > > /* Initialization */ > indev = in ? in->name : nulldevname; > @@ -368,7 +372,7 @@ ip6t_do_table(struct sk_buff *skb, > > IP_NF_ASSERT(e); > acpar.thoff = 0; > - if (!ip6_packet_match(skb, indev, outdev, &e->ipv6, > + if (!ip6_packet_match(reasm, indev, outdev, &e->ipv6, > &acpar.thoff, &acpar.fragoff, &acpar.hotdrop)) { > no_match: > e = ip6t_next_entry(e); > @@ -378,7 +382,7 @@ ip6t_do_table(struct sk_buff *skb, > xt_ematch_foreach(ematch, e) { > acpar.match = ematch->u.kernel.match; > acpar.matchinfo = ematch->data; > - if (!acpar.match->match(skb, &acpar)) > + if (!acpar.match->match(reasm, &acpar)) > goto no_match; > } > > -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Jiri Pirko <jiri@resnulli.us> wrote: > This patch fixes for example following situation: > On HOSTA do: > ip6tables -I INPUT -p icmpv6 -j DROP > ip6tables -I INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT untested: -A INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT -A INPUT -p icmpv6 -m conntrack --ctstatus CONFIRMED -j ACCEPT -A INPUT -p icmpv6 -j DROP > and on HOSTB you do: > ping6 HOSTA -s2000 (MTU is 1500) > > Incoming echo requests will be filtered out on HOSTA. This issue does > not occur with smaller packets than MTU (where fragmentation does not happen). Patrick, any reason not to kill the special-casing (ct has assigned helper or unconfirmed conntrack) in __ipv6_conntrack_in() ? This should make ipv6 frag behaviour consistent; right now its rather confusing from ruleset point of view, especially the first packet of a connection is always seen as reassembled. So with Jiris rules -A INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT -A INPUT -p icmpv6 -j DROP ping6 -s $bignum works for the first packet but not for subsequent ones which is quite irritating. This change would obviously have userspace visibility (e.g. -m frag won't work anymore when conntrack is on), but so far I couldn't come up with a scenario where a legitimate ruleset could break. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Nov 05, 2013 at 02:32:05PM +0100, Florian Westphal wrote: > Jiri Pirko <jiri@resnulli.us> wrote: > > This patch fixes for example following situation: > > On HOSTA do: > > ip6tables -I INPUT -p icmpv6 -j DROP > > ip6tables -I INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT > > untested: > > -A INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT > -A INPUT -p icmpv6 -m conntrack --ctstatus CONFIRMED -j ACCEPT > -A INPUT -p icmpv6 -j DROP > > > and on HOSTB you do: > > ping6 HOSTA -s2000 (MTU is 1500) > > > > Incoming echo requests will be filtered out on HOSTA. This issue does > > not occur with smaller packets than MTU (where fragmentation does not happen). > > Patrick, any reason not to kill the special-casing (ct has assigned helper or > unconfirmed conntrack) in __ipv6_conntrack_in() ? > > This should make ipv6 frag behaviour consistent; right now its rather > confusing from ruleset point of view, especially the first packet > of a connection is always seen as reassembled. > > So with Jiris rules > > -A INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT > -A INPUT -p icmpv6 -j DROP > > ping6 -s $bignum works for the first packet but not for subsequent ones > which is quite irritating. Well, the reason was to avoid unnecessary work doing refragmentation unless really required. I know its rather complicated, but IPv6 has always required treating fragments manually or using conntrack state. I'm not objecting to changing this, but the patches as they are are not the way to go. First, moving nfct_frag to struct sk_buff seems like a real waste of space for this quite rare case. Also, we can't just use the reassembled packet in ip6tables, when modifying it we will still output the unchanged fragments. An last of all, we'll be executing the rules on the reassembled packet multiple times, one for each fragment. So if someone wants to change this, simply *only* pass the reassembled packet through the netfilter hooks and drop the fragments, as in IPv4. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Tue, Nov 05, 2013 at 02:41:19PM CET, kaber@trash.net wrote: >On Tue, Nov 05, 2013 at 02:32:05PM +0100, Florian Westphal wrote: >> Jiri Pirko <jiri@resnulli.us> wrote: >> > This patch fixes for example following situation: >> > On HOSTA do: >> > ip6tables -I INPUT -p icmpv6 -j DROP >> > ip6tables -I INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT >> >> untested: >> >> -A INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT >> -A INPUT -p icmpv6 -m conntrack --ctstatus CONFIRMED -j ACCEPT >> -A INPUT -p icmpv6 -j DROP >> >> > and on HOSTB you do: >> > ping6 HOSTA -s2000 (MTU is 1500) >> > >> > Incoming echo requests will be filtered out on HOSTA. This issue does >> > not occur with smaller packets than MTU (where fragmentation does not happen). >> >> Patrick, any reason not to kill the special-casing (ct has assigned helper or >> unconfirmed conntrack) in __ipv6_conntrack_in() ? >> >> This should make ipv6 frag behaviour consistent; right now its rather >> confusing from ruleset point of view, especially the first packet >> of a connection is always seen as reassembled. >> >> So with Jiris rules >> >> -A INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT >> -A INPUT -p icmpv6 -j DROP >> >> ping6 -s $bignum works for the first packet but not for subsequent ones >> which is quite irritating. > >Well, the reason was to avoid unnecessary work doing refragmentation >unless really required. I know its rather complicated, but IPv6 has >always required treating fragments manually or using conntrack state. > >I'm not objecting to changing this, but the patches as they are are >not the way to go. First, moving nfct_frag to struct sk_buff seems I'm a bit lost. What "nfct_frag" are you reffering to here? >like a real waste of space for this quite rare case. Also, we can't >just use the reassembled packet in ip6tables, when modifying it we >will still output the unchanged fragments. An last of all, we'll be >executing the rules on the reassembled packet multiple times, one >for each fragment. Reassembled skb would be only used for matching where no changes takes place. End even though, the matching is now done for each fragment skb anyway. The change is only to do it on different skb. I see no erformance or any other problem in that. > >So if someone wants to change this, simply *only* pass the reassembled >packet through the netfilter hooks and drop the fragments, as in IPv4. This is unfortunatelly not possible because in forwarding use case, the fragments have to be send out as they come in. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Jiri Pirko <jiri@resnulli.us> wrote: > Tue, Nov 05, 2013 at 02:41:19PM CET, kaber@trash.net wrote: > >executing the rules on the reassembled packet multiple times, one > >for each fragment. > [..] > End even though, the matching is now done for each fragment skb anyway. The > change is only to do it on different skb. I see no erformance or any > other problem in that. One problem that comes to mind is that nfacct or quota match will now account num_of_fragments * length_of_reassemled_skb bytes. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Nov 05, 2013 at 04:01:15PM +0100, Jiri Pirko wrote: > Tue, Nov 05, 2013 at 02:41:19PM CET, kaber@trash.net wrote: > >On Tue, Nov 05, 2013 at 02:32:05PM +0100, Florian Westphal wrote: > >> Jiri Pirko <jiri@resnulli.us> wrote: > >> > This patch fixes for example following situation: > >> > On HOSTA do: > >> > ip6tables -I INPUT -p icmpv6 -j DROP > >> > ip6tables -I INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT > >> > >> untested: > >> > >> -A INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT > >> -A INPUT -p icmpv6 -m conntrack --ctstatus CONFIRMED -j ACCEPT > >> -A INPUT -p icmpv6 -j DROP > >> > >> > and on HOSTB you do: > >> > ping6 HOSTA -s2000 (MTU is 1500) > >> > > >> > Incoming echo requests will be filtered out on HOSTA. This issue does > >> > not occur with smaller packets than MTU (where fragmentation does not happen). > >> > >> Patrick, any reason not to kill the special-casing (ct has assigned helper or > >> unconfirmed conntrack) in __ipv6_conntrack_in() ? > >> > >> This should make ipv6 frag behaviour consistent; right now its rather > >> confusing from ruleset point of view, especially the first packet > >> of a connection is always seen as reassembled. > >> > >> So with Jiris rules > >> > >> -A INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT > >> -A INPUT -p icmpv6 -j DROP > >> > >> ping6 -s $bignum works for the first packet but not for subsequent ones > >> which is quite irritating. > > > >Well, the reason was to avoid unnecessary work doing refragmentation > >unless really required. I know its rather complicated, but IPv6 has > >always required treating fragments manually or using conntrack state. > > > >I'm not objecting to changing this, but the patches as they are are > >not the way to go. First, moving nfct_frag to struct sk_buff seems > > I'm a bit lost. What "nfct_frag" are you reffering to here? I meant nfct_reasm of course. > >like a real waste of space for this quite rare case. Also, we can't > >just use the reassembled packet in ip6tables, when modifying it we > >will still output the unchanged fragments. An last of all, we'll be > >executing the rules on the reassembled packet multiple times, one > >for each fragment. > > Reassembled skb would be only used for matching where no changes takes > place. That still doesn't work, our matches are not purely passive. > End even though, the matching is now done for each fragment skb anyway. The > change is only to do it on different skb. I see no erformance or any > other problem in that. Accounting, quota, statistic, limit, ... come to mind. Basically any match that keeps state. > >So if someone wants to change this, simply *only* pass the reassembled > >packet through the netfilter hooks and drop the fragments, as in IPv4. > > This is unfortunatelly not possible because in forwarding use case, the > fragments have to be send out as they come in. No, the IPv6 NAT patches fixed that, we still do proper refragmentation and we still respect the original fragment sizes, thus are not responsible for potentially exceeding the PMTU on the following path. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Nov 05, 2013 at 04:39:21PM +0100, Florian Westphal wrote: > Jiri Pirko <jiri@resnulli.us> wrote: > > Tue, Nov 05, 2013 at 02:41:19PM CET, kaber@trash.net wrote: > > >executing the rules on the reassembled packet multiple times, one > > >for each fragment. > > > [..] > > End even though, the matching is now done for each fragment skb anyway. The > > change is only to do it on different skb. I see no erformance or any > > other problem in that. > > One problem that comes to mind is that nfacct or quota match will > now account num_of_fragments * length_of_reassemled_skb bytes. indeed. The easiest way to fix all this (and, btw, also the pskb_expand_head() oops which is currently reported by multiple people) is to get rid of all the fragmentation handling and simply use the reassembled skb. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Tue, Nov 05, 2013 at 07:19:24PM CET, kaber@trash.net wrote: >On Tue, Nov 05, 2013 at 04:39:21PM +0100, Florian Westphal wrote: >> Jiri Pirko <jiri@resnulli.us> wrote: >> > Tue, Nov 05, 2013 at 02:41:19PM CET, kaber@trash.net wrote: >> > >executing the rules on the reassembled packet multiple times, one >> > >for each fragment. >> > >> [..] >> > End even though, the matching is now done for each fragment skb anyway. The >> > change is only to do it on different skb. I see no erformance or any >> > other problem in that. >> >> One problem that comes to mind is that nfacct or quota match will >> now account num_of_fragments * length_of_reassemled_skb bytes. > >indeed. The easiest way to fix all this (and, btw, also the >pskb_expand_head() oops which is currently reported by multiple people) >is to get rid of all the fragmentation handling and simply use the >reassembled skb. Okay. That will resolve the skb->sk rewrite problem as well. I will prepare a patch. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Tue, Nov 05, 2013 at 07:16:33PM CET, kaber@trash.net wrote: >On Tue, Nov 05, 2013 at 04:01:15PM +0100, Jiri Pirko wrote: >> Tue, Nov 05, 2013 at 02:41:19PM CET, kaber@trash.net wrote: >> >On Tue, Nov 05, 2013 at 02:32:05PM +0100, Florian Westphal wrote: >> >> Jiri Pirko <jiri@resnulli.us> wrote: >> >> > This patch fixes for example following situation: >> >> > On HOSTA do: >> >> > ip6tables -I INPUT -p icmpv6 -j DROP >> >> > ip6tables -I INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT >> >> >> >> untested: >> >> >> >> -A INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT >> >> -A INPUT -p icmpv6 -m conntrack --ctstatus CONFIRMED -j ACCEPT >> >> -A INPUT -p icmpv6 -j DROP >> >> >> >> > and on HOSTB you do: >> >> > ping6 HOSTA -s2000 (MTU is 1500) >> >> > >> >> > Incoming echo requests will be filtered out on HOSTA. This issue does >> >> > not occur with smaller packets than MTU (where fragmentation does not happen). >> >> >> >> Patrick, any reason not to kill the special-casing (ct has assigned helper or >> >> unconfirmed conntrack) in __ipv6_conntrack_in() ? >> >> >> >> This should make ipv6 frag behaviour consistent; right now its rather >> >> confusing from ruleset point of view, especially the first packet >> >> of a connection is always seen as reassembled. >> >> >> >> So with Jiris rules >> >> >> >> -A INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT >> >> -A INPUT -p icmpv6 -j DROP >> >> >> >> ping6 -s $bignum works for the first packet but not for subsequent ones >> >> which is quite irritating. >> > >> >Well, the reason was to avoid unnecessary work doing refragmentation >> >unless really required. I know its rather complicated, but IPv6 has >> >always required treating fragments manually or using conntrack state. >> > >> >I'm not objecting to changing this, but the patches as they are are >> >not the way to go. First, moving nfct_frag to struct sk_buff seems >> >> I'm a bit lost. What "nfct_frag" are you reffering to here? > >I meant nfct_reasm of course. The patch is not moving this to struct sk_buff. It is already there. > >> >like a real waste of space for this quite rare case. Also, we can't >> >just use the reassembled packet in ip6tables, when modifying it we >> >will still output the unchanged fragments. An last of all, we'll be >> >executing the rules on the reassembled packet multiple times, one >> >for each fragment. >> >> Reassembled skb would be only used for matching where no changes takes >> place. > >That still doesn't work, our matches are not purely passive. > >> End even though, the matching is now done for each fragment skb anyway. The >> change is only to do it on different skb. I see no erformance or any >> other problem in that. > >Accounting, quota, statistic, limit, ... come to mind. Basically any >match that keeps state. Ok. Makes sense. > >> >So if someone wants to change this, simply *only* pass the reassembled >> >packet through the netfilter hooks and drop the fragments, as in IPv4. >> >> This is unfortunatelly not possible because in forwarding use case, the >> fragments have to be send out as they come in. > >No, the IPv6 NAT patches fixed that, we still do proper refragmentation >and we still respect the original fragment sizes, thus are not responsible >for potentially exceeding the PMTU on the following path. Ok. So the plan is to remove net/ipv6/netfilter/nf_conntrack_reasm.c code entirely and use net/ipv6/reassembly.c code directly from nf_defrag_ipv6. This would result in very similar code currently ipv4 has. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Nov 05, 2013 at 09:55:20PM +0100, Jiri Pirko wrote: > Tue, Nov 05, 2013 at 07:16:33PM CET, kaber@trash.net wrote: > > >> I'm a bit lost. What "nfct_frag" are you reffering to here? > > > >I meant nfct_reasm of course. > > The patch is not moving this to struct sk_buff. It is already there. Right again, sorry, I was replying to Florian's mail without the quoted patch, so I seem to have mixed up things in between :) > >> >So if someone wants to change this, simply *only* pass the reassembled > >> >packet through the netfilter hooks and drop the fragments, as in IPv4. > >> > >> This is unfortunatelly not possible because in forwarding use case, the > >> fragments have to be send out as they come in. > > > >No, the IPv6 NAT patches fixed that, we still do proper refragmentation > >and we still respect the original fragment sizes, thus are not responsible > >for potentially exceeding the PMTU on the following path. > > Ok. So the plan is to remove net/ipv6/netfilter/nf_conntrack_reasm.c > code entirely and use net/ipv6/reassembly.c code directly from > nf_defrag_ipv6. This would result in very similar code currently ipv4 > has. If its possible to use net/ipv6/reassembly.c directly, even better, so far I was thinking of just getting rid of the fragment replay, but you're most likely right that this is the proper way to do it. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
>> >So if someone wants to change this, simply *only* pass the reassembled >> >packet through the netfilter hooks and drop the fragments, as in IPv4. >> >> This is unfortunatelly not possible because in forwarding use case, the >> fragments have to be send out as they come in. > >No, the IPv6 NAT patches fixed that, we still do proper refragmentation >and we still respect the original fragment sizes, thus are not responsible >for potentially exceeding the PMTU on the following path. Can you please point where this is done. Where the original fragment sizes are stored and in which code are they restored? Thanks. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Jiri Pirko <jiri@resnulli.us> wrote: > >> >So if someone wants to change this, simply *only* pass the reassembled > >> >packet through the netfilter hooks and drop the fragments, as in IPv4. > >> > >> This is unfortunatelly not possible because in forwarding use case, the > >> fragments have to be send out as they come in. > > > >No, the IPv6 NAT patches fixed that, we still do proper refragmentation > >and we still respect the original fragment sizes, thus are not responsible > >for potentially exceeding the PMTU on the following path. > > Can you please point where this is done. Where the original fragment > sizes are stored and in which code are they restored? Thanks. Patrick is probably talking about commit 4cdd34084d539c758d00c5dc7bf95db2e4f2bc70 (netfilter: nf_conntrack_ipv6: improve fragmentation handling) which introduces 'frag_max_size' in inet6_skb_parm struct. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Wed, Nov 06, 2013 at 03:33:49PM CET, fw@strlen.de wrote: >Jiri Pirko <jiri@resnulli.us> wrote: >> >> >So if someone wants to change this, simply *only* pass the reassembled >> >> >packet through the netfilter hooks and drop the fragments, as in IPv4. >> >> >> >> This is unfortunatelly not possible because in forwarding use case, the >> >> fragments have to be send out as they come in. >> > >> >No, the IPv6 NAT patches fixed that, we still do proper refragmentation >> >and we still respect the original fragment sizes, thus are not responsible >> >for potentially exceeding the PMTU on the following path. >> >> Can you please point where this is done. Where the original fragment >> sizes are stored and in which code are they restored? Thanks. > >Patrick is probably talking about > >commit 4cdd34084d539c758d00c5dc7bf95db2e4f2bc70 >(netfilter: nf_conntrack_ipv6: improve fragmentation handling) >which introduces 'frag_max_size' in inet6_skb_parm struct. Thanks for the pointer. Interestingly though, according to my testing, if reassembled packet would fit into outdev mtu, it is not fragmented to the original frag size and it is send as single big packet. That is I believe not correct. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Nov 06, 2013 at 03:44:53PM +0100, Jiri Pirko wrote: > Wed, Nov 06, 2013 at 03:33:49PM CET, fw@strlen.de wrote: > >Jiri Pirko <jiri@resnulli.us> wrote: > >> >> >So if someone wants to change this, simply *only* pass the reassembled > >> >> >packet through the netfilter hooks and drop the fragments, as in IPv4. > >> >> > >> >> This is unfortunatelly not possible because in forwarding use case, the > >> >> fragments have to be send out as they come in. > >> > > >> >No, the IPv6 NAT patches fixed that, we still do proper refragmentation > >> >and we still respect the original fragment sizes, thus are not responsible > >> >for potentially exceeding the PMTU on the following path. > >> > >> Can you please point where this is done. Where the original fragment > >> sizes are stored and in which code are they restored? Thanks. > > > >Patrick is probably talking about > > > >commit 4cdd34084d539c758d00c5dc7bf95db2e4f2bc70 > >(netfilter: nf_conntrack_ipv6: improve fragmentation handling) > >which introduces 'frag_max_size' in inet6_skb_parm struct. Indeed. > Thanks for the pointer. Interestingly though, according to my testing, > if reassembled packet would fit into outdev mtu, it is not fragmented > to the original frag size and it is send as single big packet. That is > I believe not correct. Hmm right, that wasn't the intention. We currently only use frag_max_size to send PKT_TOOBIG messages if the size exceeds the MTU, but not to trigger fragmentation itself. We probably need to handle this in ip6_finish_output(). Would you mind fixing this up as well in your patches? -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Wed, Nov 06, 2013 at 03:51:04PM CET, kaber@trash.net wrote: >On Wed, Nov 06, 2013 at 03:44:53PM +0100, Jiri Pirko wrote: >> Wed, Nov 06, 2013 at 03:33:49PM CET, fw@strlen.de wrote: >> >Jiri Pirko <jiri@resnulli.us> wrote: >> >> >> >So if someone wants to change this, simply *only* pass the reassembled >> >> >> >packet through the netfilter hooks and drop the fragments, as in IPv4. >> >> >> >> >> >> This is unfortunatelly not possible because in forwarding use case, the >> >> >> fragments have to be send out as they come in. >> >> > >> >> >No, the IPv6 NAT patches fixed that, we still do proper refragmentation >> >> >and we still respect the original fragment sizes, thus are not responsible >> >> >for potentially exceeding the PMTU on the following path. >> >> >> >> Can you please point where this is done. Where the original fragment >> >> sizes are stored and in which code are they restored? Thanks. >> > >> >Patrick is probably talking about >> > >> >commit 4cdd34084d539c758d00c5dc7bf95db2e4f2bc70 >> >(netfilter: nf_conntrack_ipv6: improve fragmentation handling) >> >which introduces 'frag_max_size' in inet6_skb_parm struct. > >Indeed. > >> Thanks for the pointer. Interestingly though, according to my testing, >> if reassembled packet would fit into outdev mtu, it is not fragmented >> to the original frag size and it is send as single big packet. That is >> I believe not correct. > >Hmm right, that wasn't the intention. We currently only use frag_max_size >to send PKT_TOOBIG messages if the size exceeds the MTU, but not to trigger >fragmentation itself. > >We probably need to handle this in ip6_finish_output(). Would you mind >fixing this up as well in your patches? I'm working on it atm. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2013-11-06 at 16:29 +0100, Jiri Pirko wrote:
> I'm working on it atm.
Note that we have a similar problem in general with GRO and forwading.
We do no check against output mtu, if the packet is GSO.
skb_is_gso() being true bypasses the safety checks.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c index 710238f..ec9cb1a 100644 --- a/net/ipv6/netfilter/ip6_tables.c +++ b/net/ipv6/netfilter/ip6_tables.c @@ -328,6 +328,10 @@ ip6t_do_table(struct sk_buff *skb, const struct xt_table_info *private; struct xt_action_param acpar; unsigned int addend; + struct sk_buff *reasm = skb_nfct_reasm(skb); + + if (!reasm) + reasm = skb; /* Initialization */ indev = in ? in->name : nulldevname; @@ -368,7 +372,7 @@ ip6t_do_table(struct sk_buff *skb, IP_NF_ASSERT(e); acpar.thoff = 0; - if (!ip6_packet_match(skb, indev, outdev, &e->ipv6, + if (!ip6_packet_match(reasm, indev, outdev, &e->ipv6, &acpar.thoff, &acpar.fragoff, &acpar.hotdrop)) { no_match: e = ip6t_next_entry(e); @@ -378,7 +382,7 @@ ip6t_do_table(struct sk_buff *skb, xt_ematch_foreach(ematch, e) { acpar.match = ematch->u.kernel.match; acpar.matchinfo = ematch->data; - if (!acpar.match->match(skb, &acpar)) + if (!acpar.match->match(reasm, &acpar)) goto no_match; }
Currently, when ipv6 fragment goes through the netfilter, match functions are called on them directly. This might cause match function to fail. So benefit from the fact that nf_defrag_ipv6 constructs reassembled skb for us and use this reassembled skb for matching. This patch fixes for example following situation: On HOSTA do: ip6tables -I INPUT -p icmpv6 -j DROP ip6tables -I INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT and on HOSTB you do: ping6 HOSTA -s2000 (MTU is 1500) Incoming echo requests will be filtered out on HOSTA. This issue does not occur with smaller packets than MTU (where fragmentation does not happen). Signed-off-by: Jiri Pirko <jiri@resnulli.us> --- net/ipv6/netfilter/ip6_tables.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)