diff mbox

[net-next] net: bail on sock_wfree, sock_rfree when we have a TCP_TIMEWAIT sk

Message ID CALCkm3eCnC7fwtStGZj4xZDJeNegLT3vBHKNABvsEq7_H7aQfg@mail.gmail.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Alex Gartrell July 3, 2015, 9:54 p.m. UTC
Hey

On Fri, Jul 3, 2015 at 1:32 AM, Julian Anastasov <ja@ssi.bg> wrote:

> To summarize:
>         - we should call skb_orphan as soon as possible after
>         deciding if packets goes to local or remote real server
>         but only for skb->sk set by early_demux, not for packets
>         sent by TCP

Yeah, agree

>         - if packets go to local server IPVS should not touch
>         skb->dst, skb->sk, etc (NF_ACCEPT case)

Yeah, the thing is that early demux could totally match for a socket
that existed before we created the service, and in that instance it
might make the most sense to retain the connection and simply
NF_ACCEPT.  The problem with that approach though is that is that the
behavior changes if early_demux is not enabled.  I believe that we
should just do the consistent thing and always drop the early_demux
result if bound for non-local, as you've said.

The interesting thing though is that, for the purposes of routing,
enabling early_demux does change the behavior.  I suspect that's a
bug, but it's far enough away from actual use cases that it's probably
fine (who is out there tearing down addresses and setting up routes in
their place?)

>         - for skb->sk set by early_demux, skb_orphan should happen before
>         skb_set_owner_w in ip_vs_prepare_tunneled_skb because
>         skb_set_owner_w will try to increase sk_wmem_alloc which is
>         wrong for early_demux phase

Yeah that's my thinking as well.

>         - reaching skb_set_owner_w code for skb->sk set by eraly_demux
>         looks wrong to me, it can happen on:
>                 - redirect (DNAT), if somehow we have socket too
>                 - IPVS redirect: if we forward both to local and remote
>                 real servers
>                 - not likely for forward, nobody forwards traffic
>                 destined to local IP to remote host


What do you think of the following:

commit f04c42f8041cc4ccc4cb2a30c1058136dd497a83
Author: Alex Gartrell <agartrell@fb.com>
Date:   Wed Jul 1 13:24:46 2015 -0700

    ipvs: orphan_skb in case of forwarding

    It is possible that we bind against a local socket in early_demux when we
    are actually going to want to forward it.  In this case, the socket serves
    no purpose and only serves to confuse things (particularly functions which
    implicitly expect sk_fullsock to be true, like ip_local_out).
    Additionally, skb_set_owner_w is totally broken for non full-socks.

    Signed-off-by: Alex Gartrell <agartrell@fb.com>

                                         struct ip_vs_conn *cp, int local)
@@ -539,6 +552,7 @@ static inline int ip_vs_nat_send_or_cont(int pf,
struct sk_buff *skb,
        else
                ip_vs_update_conntrack(skb, cp, 1);
        if (!local) {
+               ip_vs_drop_early_demux_sk(skb);
                skb_forward_csum(skb);
                NF_HOOK(pf, NF_INET_LOCAL_OUT, NULL, skb,
                        NULL, skb_dst(skb)->dev, dst_output_sk);
@@ -557,6 +571,7 @@ static inline int ip_vs_send_or_cont(int pf,
struct sk_buff *skb,
        if (likely(!(cp->flags & IP_VS_CONN_F_NFCT)))
                ip_vs_notrack(skb);
        if (!local) {
+               ip_vs_drop_early_demux_sk(skb);
                skb_forward_csum(skb);
                NF_HOOK(pf, NF_INET_LOCAL_OUT, NULL, skb,
                        NULL, skb_dst(skb)->dev, dst_output_sk);
@@ -845,6 +860,8 @@ ip_vs_prepare_tunneled_skb(struct sk_buff *skb, int skb_af,
        struct ipv6hdr *old_ipv6h = NULL;
 #endif

+       ip_vs_drop_early_demux_sk(skb);
+
        if (skb_headroom(skb) < max_headroom || skb_cloned(skb)) {
                new_skb = skb_realloc_headroom(skb, max_headroom);
                if (!new_skb)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Julian Anastasov July 4, 2015, 8:34 a.m. UTC | #1
Hello,

On Fri, 3 Jul 2015, Alex Gartrell wrote:

> >         - if packets go to local server IPVS should not touch
> >         skb->dst, skb->sk, etc (NF_ACCEPT case)
> 
> Yeah, the thing is that early demux could totally match for a socket
> that existed before we created the service, and in that instance it
> might make the most sense to retain the connection and simply
> NF_ACCEPT.  The problem with that approach though is that is that the
> behavior changes if early_demux is not enabled.  I believe that we
> should just do the consistent thing and always drop the early_demux
> result if bound for non-local, as you've said.

	We must not forget that a local server listening
on 0.0.0.0:VPORT or VIP:VPORT can be reached if a real
server with some local IP is used as RIP. So, early demux
will really work for this case when local stack is one
of the real servers.

> The interesting thing though is that, for the purposes of routing,
> enabling early_demux does change the behavior.  I suspect that's a
> bug, but it's far enough away from actual use cases that it's probably
> fine (who is out there tearing down addresses and setting up routes in
> their place?)

	Looks like routing by definition can not divert skbs with
early-demux socket because input routing is not called.
Netfilter's DNAT may change daddr/dport before early-demux
and in this case socket should not be found (eg. if we
DNAT to other host). So, there is problem mostly for IPVS,
I don't remember for other cases. May be CLUSTERIP too,
I'm not sure. There is the problem that at LOCAL_IN
SNAT is valid operation, not sure how it affects
early-demux.

> What do you think of the following:
> 
> commit f04c42f8041cc4ccc4cb2a30c1058136dd497a83
> Author: Alex Gartrell <agartrell@fb.com>
> Date:   Wed Jul 1 13:24:46 2015 -0700
> 
>     ipvs: orphan_skb in case of forwarding

	skb_orphan or orphan skb

>     It is possible that we bind against a local socket in early_demux when we
>     are actually going to want to forward it.  In this case, the socket serves
>     no purpose and only serves to confuse things (particularly functions which
>     implicitly expect sk_fullsock to be true, like ip_local_out).
>     Additionally, skb_set_owner_w is totally broken for non full-socks.
> 
>     Signed-off-by: Alex Gartrell <agartrell@fb.com>
> 
> diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
> index bf66a86..3efe719 100644
> --- a/net/netfilter/ipvs/ip_vs_xmit.c
> +++ b/net/netfilter/ipvs/ip_vs_xmit.c
> @@ -527,6 +527,19 @@ static inline int
> ip_vs_tunnel_xmit_prepare(struct sk_buff *skb,
>         return ret;
>  }
> 
> +/* In the event of a remote destination, it's possible that we would have
> + * matches against an old socket (particularly a TIME-WAIT socket). This
> + * causes havoc down the line (ip_local_out et. al. expect regular sockets
> + * and invalid memory accesses will happen) so simply drop the association
> + * in this case
> +*/
> +static inline void ip_vs_drop_early_demux_sk(struct sk_buff *skb) {

	Move '{' on next line and below comment should be closed
on next line. But I guess you will run later
scripts/checkpatch.pl --strict /tmp/file.patch

> +       /* If dev is set, the packet came from the LOCAL_IN callback and
> +        * not from a local TCP socket */
> +       if (skb->dev)
> +               skb_orphan(skb);
> +}
> +
>  /* return NF_STOLEN (sent) or NF_ACCEPT if local=1 (not sent) */
>  static inline int ip_vs_nat_send_or_cont(int pf, struct sk_buff *skb,
>                                          struct ip_vs_conn *cp, int local)
> @@ -539,6 +552,7 @@ static inline int ip_vs_nat_send_or_cont(int pf,
> struct sk_buff *skb,
>         else
>                 ip_vs_update_conntrack(skb, cp, 1);
>         if (!local) {
> +               ip_vs_drop_early_demux_sk(skb);
>                 skb_forward_csum(skb);
>                 NF_HOOK(pf, NF_INET_LOCAL_OUT, NULL, skb,
>                         NULL, skb_dst(skb)->dev, dst_output_sk);

	For the local=true case in ip_vs_nat_send_or_cont may be
we should call skb_orphan when cp->dport != cp->vport or
cp->daddr != cp->vaddr. This is a case where we DNAT to
local real server but on different addr/port. If early
demux finds socket, it is some socket shadowed after adding
the virtual service. So, may be we have to add such checks
near the NF_ACCEPT code.

	Can this work?

	else {
		/* Drop early-demux socket on DNAT */
		if (cp->vport != cp->dport ||
		    !ip_vs_addr_equal(cp->af, cp->vaddr, &cp->caddr))
			ip_vs_drop_early_demux_sk(skb);
		ret = NF_ACCEPT;
	}

	Otherwise, the other changes look good to me.

Regards

--
Julian Anastasov <ja@ssi.bg>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet July 4, 2015, 8:40 a.m. UTC | #2
>         Looks like routing by definition can not divert skbs with
> early-demux socket because input routing is not called.

Only if found socket has a valid sk->sk_rx_dst

Early demux :

1) if TCP lookup found a matching socket, we do the attachment
   skb->sk = sk;
   skb->destructor = sock_edemux

2) If sk->sk_rx_dst is set and still valid, IP routing will use this cached dst.

So it looks very possible that some packets could match a socket but
fail the 2) phase.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index bf66a86..3efe719 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -527,6 +527,19 @@  static inline int
ip_vs_tunnel_xmit_prepare(struct sk_buff *skb,
        return ret;
 }

+/* In the event of a remote destination, it's possible that we would have
+ * matches against an old socket (particularly a TIME-WAIT socket). This
+ * causes havoc down the line (ip_local_out et. al. expect regular sockets
+ * and invalid memory accesses will happen) so simply drop the association
+ * in this case
+*/
+static inline void ip_vs_drop_early_demux_sk(struct sk_buff *skb) {
+       /* If dev is set, the packet came from the LOCAL_IN callback and
+        * not from a local TCP socket */
+       if (skb->dev)
+               skb_orphan(skb);
+}
+
 /* return NF_STOLEN (sent) or NF_ACCEPT if local=1 (not sent) */
 static inline int ip_vs_nat_send_or_cont(int pf, struct sk_buff *skb,