[nf] netfilter: nat: remove incorrect debug assert

Message ID 20170208221429.3555-1-fw@strlen.de
State Changes Requested
Delegated to: Pablo Neira

Commit Message

Florian Westphal Feb. 8, 2017, 10:14 p.m. UTC
The comment is incorrect, this function does see fragments when
IP_NODEFRAG is used.  Remove the wrong assertion.

As conntrack doesn't track fragments skb->nfct will be null
and no nat is performed.

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/ipv4/netfilter/nf_nat_l3proto_ipv4.c | 5 -----
 1 file changed, 5 deletions(-)

Comments

Pablo Neira Ayuso Feb. 21, 2017, 2:09 p.m. UTC | #1
On Wed, Feb 08, 2017 at 11:14:29PM +0100, Florian Westphal wrote:
> The comment is incorrect, this function does see fragments when
> IP_NODEFRAG is used.  Remove the wrong assertion.
> 
> As conntrack doesn't track fragments skb->nfct will be null
> and no nat is performed.

With IP_NODEFRAG, ipv4_conntrack_defrag() will just accept the packet.

So the first fragment will get into nf_conntrack_in(), and I think, if
enough information is there in place, it will get a ct object. Follow-up
fragments with offset != 0, which don't contain headers, will
definitely not get a ct object.

Shouldn't we handle this case by attaching a template conntrack?
Currently this IP_NODEFRAG case is going through as invalid traffic.

My impression is that we're handling this case in a sloppy way, am I
missing anything?

> Reported-by: Andrey Konovalov <andreyknvl@google.com>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
>  net/ipv4/netfilter/nf_nat_l3proto_ipv4.c | 5 -----
>  1 file changed, 5 deletions(-)
> 
> diff --git a/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c b/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
> index f8aad03d674b..6f5e8d01b876 100644
> --- a/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
> +++ b/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
> @@ -255,11 +255,6 @@ nf_nat_ipv4_fn(void *priv, struct sk_buff *skb,
>  	/* maniptype == SRC for postrouting. */
>  	enum nf_nat_manip_type maniptype = HOOK2MANIP(state->hook);
>  
> -	/* We never see fragments: conntrack defrags on pre-routing
> -	 * and local-out, and nf_nat_out protects post-routing.
> -	 */
> -	NF_CT_ASSERT(!ip_is_fragment(ip_hdr(skb)));
> -
>  	ct = nf_ct_get(skb, &ctinfo);
>  	/* Can't track?  It's not due to stress, or conntrack would
>  	 * have dropped it.  Hence it's the user's responsibilty to
> -- 
> 2.10.2
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Florian Westphal Feb. 21, 2017, 2:40 p.m. UTC | #2
Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Wed, Feb 08, 2017 at 11:14:29PM +0100, Florian Westphal wrote:
> > The comment is incorrect, this function does see fragments when
> > IP_NODEFRAG is used.  Remove the wrong assertion.
> > 
> > As conntrack doesn't track fragments skb->nfct will be null
> > and no nat is performed.
> 
> With IP_NODEFRAG, ipv4_conntrack_defrag() will just accept the packet.
> 
> So the first fragment will get into nf_conntrack_in(), and I think, if
> enough information is there in place, it will get a ct object.

ipv4_get_l4proto():
       if (iph->frag_off & htons(IP_OFFSET))
              return -NF_ACCEPT;

so yes, you are right, first packet will be tracked in this case.

> up fragments with offset != 0, which don't contain headers, will
> definitely not get a ct object.
> 
> Shouldn't we handle this case by attaching a template conntrack?
> Currently this IP_NODEFRAG case is going through as invalid traffic.
> 
> My impression is that we're handling this case in a sloppy way, am I
> missing anything?

What would you do instead?

We currently have a suboptimal handling of such cases, but I don't see
how we can change it without (possibly) breaking existing setups.
I also don't see how alternative handling is 'better'.

Tagging it as UNTRACKED seems wrong because it's used for cases where
we could track but decide against it, e.g. due to -j NOTRACK or explicit
tracker whitelist (icmpv6 neigh for instance).

Documentation says (iptables-extensions):

  INVALID The packet is associated with no known connection.
  UNTRACKED  The packet is not tracked at all, which happens if  you
  explicitly untrack it by using -j CT --notrack in the raw table.

(XXX: needs a sentence wrt. icmpv6...)

So current behaviour at least appears consistent with documentation.

> > diff --git a/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c b/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
> > index f8aad03d674b..6f5e8d01b876 100644
> > --- a/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
> > +++ b/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
> > @@ -255,11 +255,6 @@ nf_nat_ipv4_fn(void *priv, struct sk_buff *skb,
> >  	/* maniptype == SRC for postrouting. */
> >  	enum nf_nat_manip_type maniptype = HOOK2MANIP(state->hook);
> >  
> > -	/* We never see fragments: conntrack defrags on pre-routing
> > -	 * and local-out, and nf_nat_out protects post-routing.
> > -	 */
> > -	NF_CT_ASSERT(!ip_is_fragment(ip_hdr(skb)));
> > -

We could make this an explicit test+return, but that seems weird too:
we would track the first fragment but would not nat.

However, changing test to if (iph->frag_off) return -NF_ACCEPT seems
wrong too because we have enough info to track. OTOH, this only happens
with HDRINCL+raw socket so perhaps we shouldn't care about this and
just change ipv4 l3 tracker to ignore all packets w. iph->frag_off set.
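
One caveat on the "iph->frag_off set" wording, sketched below in userspace C:
frag_off also carries the DF flag, so a literal `if (iph->frag_off)` test
would match ordinary unfragmented packets that set DF. The test above is
presumably shorthand for a masked check. The mask values are the ones in the
kernel's include/net/ip.h; the helper names are made up for this illustration
and are not kernel API.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* frag_off layout in host byte order (values as in include/net/ip.h). */
#define IP_DF     0x4000 /* "don't fragment" flag */
#define IP_MF     0x2000 /* "more fragments" flag */
#define IP_OFFSET 0x1FFF /* 13-bit fragment offset */

/* Hypothetical helper: literal reading of "if (iph->frag_off)". */
static bool raw_frag_off_set(uint16_t frag_off_be)
{
	return frag_off_be != 0;
}

/* Masked test, same logic as the kernel's ip_is_fragment(). */
static bool is_fragment(uint16_t frag_off_be)
{
	return (frag_off_be & htons(IP_MF | IP_OFFSET)) != 0;
}

/* Illustration only: compare both tests on a DF packet and a fragment. */
void frag_off_demo(void)
{
	uint16_t df_only = htons(IP_DF);    /* unfragmented, DF set */
	uint16_t first_frag = htons(IP_MF); /* first fragment, offset 0 */

	/* The unmasked test misclassifies the DF packet as a fragment. */
	assert(raw_frag_off_set(df_only));
	assert(!is_fragment(df_only));

	/* Both tests agree on a real fragment. */
	assert(raw_frag_off_set(first_frag));
	assert(is_fragment(first_frag));
}
```

If the tracker is changed along these lines, masking with IP_MF | IP_OFFSET
keeps DF-only packets tracked as before.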
Pablo Neira Ayuso Feb. 21, 2017, 3:54 p.m. UTC | #3
On Tue, Feb 21, 2017 at 03:40:19PM +0100, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > On Wed, Feb 08, 2017 at 11:14:29PM +0100, Florian Westphal wrote:
> > > The comment is incorrect, this function does see fragments when
> > > IP_NODEFRAG is used.  Remove the wrong assertion.
> > > 
> > > As conntrack doesn't track fragments skb->nfct will be null
> > > and no nat is performed.
> > 
> > With IP_NODEFRAG, ipv4_conntrack_defrag() will just accept the packet.
> > 
> > So the first fragment will get into nf_conntrack_in(), and I think, if
> > enough information is there in place, it will get a ct object.
> 
> ipv4_get_l4proto():
>        if (iph->frag_off & htons(IP_OFFSET))
>               return -NF_ACCEPT;
> 
> so yes, you are right, first packet will be tracked in this case.

With NAT in place, this also means the first packet of a flow gets
mangled, while follow-up fragments don't.

Probably change that spot above to use ip_is_fragment()?
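
A minimal userspace sketch of the difference between the two checks, assuming
the mask values from the kernel's include/net/ip.h. The offset-only test
mirrors the ipv4_get_l4proto() snippet quoted above, and is_fragment() mirrors
the logic of ip_is_fragment(). The function names here are illustrative, not
the kernel's.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* frag_off layout in host byte order (values as in include/net/ip.h). */
#define IP_MF     0x2000 /* "more fragments" flag */
#define IP_OFFSET 0x1FFF /* 13-bit fragment offset, in 8-byte units */

/* Offset-only test: true only for fragments after the first one. */
static bool has_nonzero_offset(uint16_t frag_off_be)
{
	return (frag_off_be & htons(IP_OFFSET)) != 0;
}

/* Same logic as ip_is_fragment(): MF set or offset != 0. */
static bool is_fragment(uint16_t frag_off_be)
{
	return (frag_off_be & htons(IP_MF | IP_OFFSET)) != 0;
}

/* Illustration only: walk through the fragments of one datagram. */
void fragment_check_demo(void)
{
	uint16_t first = htons(IP_MF);        /* offset 0, more to come */
	uint16_t middle = htons(IP_MF | 185); /* offset 185 * 8 bytes */
	uint16_t last = htons(370);           /* MF clear, offset != 0 */
	uint16_t whole = 0;                   /* unfragmented packet */

	/* The first fragment slips past the offset-only test... */
	assert(!has_nonzero_offset(first));
	/* ...but is caught by the combined MF | OFFSET test. */
	assert(is_fragment(first));

	/* Both tests agree on every other case. */
	assert(has_nonzero_offset(middle) && is_fragment(middle));
	assert(has_nonzero_offset(last) && is_fragment(last));
	assert(!has_nonzero_offset(whole) && !is_fragment(whole));
}
```

With the combined test, the first fragment would be skipped by the tracker
just like the follow-up fragments, so all fragments of a flow would be
treated the same way.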

> > up fragments with offset != 0, which don't contain headers, will
> > definitely not get a ct object.
> > 
> > Shouldn't we handle this case by attaching a template conntrack?
> > Currently this IP_NODEFRAG case is going through as invalid traffic.
> > 
> > My impression is that we're handling this case in a sloppy way, am I
> > missing anything?
> 
> What would you do instead?
> 
> We currently have a suboptimal handling of such cases, but I don't see
> how we can change it without (possibly) breaking existing setups.

AFAIK, only locally generated IP_NODEFRAG packets would be affected.
I wonder how this option is used (network testing?). I cannot come up
with any reasonable stateful ruleset that may work with this. With a
stateful ruleset in place, the first packet will go through and
follow-up fragments would be INVALID. There are tons of rulesets out
there simply logging and dropping invalid ones.

Look, the first packet creates an entry in SYN_SENT state, that just
expires later on.

> I also don't see how alternative handling is 'better'.

We could just handle all the packets in a flow in the same way, so
they all go through INVALID.
Pablo Neira Ayuso March 3, 2017, 11:55 a.m. UTC | #4
On Tue, Feb 21, 2017 at 03:40:19PM +0100, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
[...]
> > > diff --git a/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c b/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
> > > index f8aad03d674b..6f5e8d01b876 100644
> > > --- a/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
> > > +++ b/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
> > > @@ -255,11 +255,6 @@ nf_nat_ipv4_fn(void *priv, struct sk_buff *skb,
> > >  	/* maniptype == SRC for postrouting. */
> > >  	enum nf_nat_manip_type maniptype = HOOK2MANIP(state->hook);
> > >  
> > > -	/* We never see fragments: conntrack defrags on pre-routing
> > > -	 * and local-out, and nf_nat_out protects post-routing.
> > > -	 */
> > > -	NF_CT_ASSERT(!ip_is_fragment(ip_hdr(skb)));
> > > -
> 
> We could make this an explicit test+return, but that seems weird too:
> we would track the first fragment but would not nat.

Right, that test+return just for this is weird.

> However, changing test to if (iph->frag_off) return -NF_ACCEPT seems
> wrong too because we have enough info to track. OTOH, this only happens
> with HDRINCL+raw socket so perhaps we shouldn't care about this and
> just change ipv4 l3 tracker to ignore all packets w. iph->frag_off set.

Florian, unless you raise your hand, I'm going to take this patch so we
at least fix splats here. I still have the impression that this
setsockopt() option and its interaction with netfilter is broken at
many levels.
Florian Westphal March 3, 2017, 12:44 p.m. UTC | #5
Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > However, changing test to if (iph->frag_off) return -NF_ACCEPT seems
> > wrong too because we have enough info to track. OTOH, this only happens
> > with HDRINCL+raw socket so perhaps we shouldn't care about this and
> > just change ipv4 l3 tracker to ignore all packets w. iph->frag_off set.
> 
> Florian, unless you raise your hand, I'm going to take this patch so we
> at least fix splats here. I still have the impression that this
> setsockopt() option and its interaction with netfilter is broken at
> many levels.

Hmmm, I think we should disable tracking of all fragmented packets,
or at least disable NAT of all fragmented packets.

If we NAT 1st packet only then frag reasm won't complete anyway.
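
To illustrate why reassembly cannot complete: a receiver groups IPv4
fragments by source address, destination address, protocol and
identification (RFC 791). The sketch below is a userspace model, not kernel
code. The struct, the helper names and the addresses are made up for the
example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fields a receiver uses to decide which fragments belong together. */
struct frag_key {
	uint32_t saddr;
	uint32_t daddr;
	uint16_t id;
	uint8_t protocol;
};

/* True if both fragments would land in the same reassembly queue. */
static bool same_datagram(const struct frag_key *a, const struct frag_key *b)
{
	return a->saddr == b->saddr && a->daddr == b->daddr &&
	       a->id == b->id && a->protocol == b->protocol;
}

/* Hypothetical stand-in for source NAT: rewrite only the source address. */
static struct frag_key snat(struct frag_key k, uint32_t new_saddr)
{
	k.saddr = new_saddr;
	return k;
}

/* Illustration only: NAT the first fragment and leave the second alone. */
void partial_nat_demo(void)
{
	/* 192.168.0.2 -> 198.51.100.1, TCP (6), id 0x1234 */
	struct frag_key first = { 0xC0A80002, 0xC6336401, 0x1234, 6 };
	struct frag_key second = first;
	struct frag_key first_natted;

	/* Before NAT both fragments belong to the same datagram. */
	assert(same_datagram(&first, &second));

	/* Only the tracked first fragment is rewritten to 203.0.113.1. */
	first_natted = snat(first, 0xCB007101);

	/* The receiver now sees two unrelated, incomplete datagrams. */
	assert(!same_datagram(&first_natted, &second));
}
```

Skipping NAT for every fragment, or translating all of them consistently,
keeps the key identical across the fragments of one datagram.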
Pablo Neira Ayuso March 3, 2017, 12:47 p.m. UTC | #6
On Fri, Mar 03, 2017 at 01:44:04PM +0100, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > > However, changing test to if (iph->frag_off) return -NF_ACCEPT seems
> > > wrong too because we have enough info to track. OTOH, this only happens
> > > with HDRINCL+raw socket so perhaps we shouldn't care about this and
> > > just change ipv4 l3 tracker to ignore all packets w. iph->frag_off set.
> > 
> > Florian, unless you raise your hand, I'm going to take this patch so we
> > at least fix splats here. I still have the impression that this
> > setsockopt() option and its interaction with netfilter is broken at
> > many levels.
> 
> Hmmm, I think we should disable tracking of all fragmented packets,
> or at least disable NAT of all fragmented packets.
> 
> If we NAT 1st packet only then frag reasm won't complete anyway.

Right. Would you give this a shot? If you do, I'd appreciate it.
Thanks.
Jozsef Kadlecsik March 3, 2017, 7:40 p.m. UTC | #7
On Fri, 3 Mar 2017, Florian Westphal wrote:

> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > > However, changing test to if (iph->frag_off) return -NF_ACCEPT seems
> > > wrong too because we have enough info to track. OTOH, this only happens
> > > with HDRINCL+raw socket so perhaps we shouldn't care about this and
> > > just change ipv4 l3 tracker to ignore all packets w. iph->frag_off set.
> > 
> > Florian, unless you raise your hand, I'm going to take this patch so we
> > at least fix splats here. I still have the impression that this
> > setsockopt() option and its interaction with netfilter is broken at
> > many levels.
> 
> Hmmm, I think we should disable tracking of all fragmented packets,
> or at least disable NAT of all fragmented packets.

I think that is the safest solution, i.e. disable tracking and NAT for all 
fragmented packets.

> If we NAT 1st packet only then frag reasm won't complete anyway.

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
          H-1525 Budapest 114, POB. 49, Hungary
Patch

diff --git a/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c b/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
index f8aad03d674b..6f5e8d01b876 100644
--- a/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
+++ b/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
@@ -255,11 +255,6 @@  nf_nat_ipv4_fn(void *priv, struct sk_buff *skb,
 	/* maniptype == SRC for postrouting. */
 	enum nf_nat_manip_type maniptype = HOOK2MANIP(state->hook);
 
-	/* We never see fragments: conntrack defrags on pre-routing
-	 * and local-out, and nf_nat_out protects post-routing.
-	 */
-	NF_CT_ASSERT(!ip_is_fragment(ip_hdr(skb)));
-
 	ct = nf_ct_get(skb, &ctinfo);
 	/* Can't track?  It's not due to stress, or conntrack would
 	 * have dropped it.  Hence it's the user's responsibilty to