Patchwork Unix Socket buffer attribution

login
register
mail settings
Submitter WANG Cong
Date Jan. 23, 2013, 11:42 a.m.
Message ID <kdoibl$978$1@ger.gmane.org>
Download mbox | patch
Permalink /patch/214896/
State Changes Requested
Delegated to: David Miller
Headers show

Comments

WANG Cong - Jan. 23, 2013, 11:42 a.m.
On Tue, 22 Jan 2013 at 02:01 GMT, Yannick Koehler <yannick@koehler.name> wrote:
>
>   I believe that the problem is that once we move the skb into the
> client's receive queue we need to decrease the sk_wmem_alloc variable
> of the server socket since that skb is no more tied to the server.
> The code should then account for this memory as part of the
> sk_rmem_alloc variable on the client's socket.  The function
> "skb_set_owner_r(skb,owner)" would seem to be the function to do that,
> so it would seem to me.

Something like below??

-------->


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet - Jan. 23, 2013, 2:26 p.m.
On Wed, 2013-01-23 at 11:42 +0000, Cong Wang wrote:
> On Tue, 22 Jan 2013 at 02:01 GMT, Yannick Koehler <yannick@koehler.name> wrote:
> >
> >   I believe that the problem is that once we move the skb into the
> > client's receive queue we need to decrease the sk_wmem_alloc variable
> > of the server socket since that skb is no more tied to the server.
> > The code should then account for this memory as part of the
> > sk_rmem_alloc variable on the client's socket.  The function
> > "skb_set_owner_r(skb,owner)" would seem to be the function to do that,
> > so it would seem to me.
> 
> Something like below??
> 
> -------->
> 
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index 0c61236..e273072 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -1205,6 +1205,7 @@ restart:
>  
>  	unix_state_unlock(sk);
>  
> +	skb_set_owner_r(skb, other);
>  	/* take ten and and send info to listening sock */
>  	spin_lock(&other->sk_receive_queue.lock);
>  	__skb_queue_tail(&other->sk_receive_queue, skb);
> @@ -1578,6 +1579,7 @@ restart:
>  
>  	if (sock_flag(other, SOCK_RCVTSTAMP))
>  		__net_timestamp(skb);
> +	skb_set_owner_r(skb, other);
>  	maybe_add_creds(skb, sock, other);
>  	skb_queue_tail(&other->sk_receive_queue, skb);
>  	if (max_level > unix_sk(other)->recursion_level)
> @@ -1693,6 +1695,7 @@ static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
>  		    (other->sk_shutdown & RCV_SHUTDOWN))
>  			goto pipe_err_free;
>  
> +		skb_set_owner_r(skb, other);
>  		maybe_add_creds(skb, sock, other);
>  		skb_queue_tail(&other->sk_receive_queue, skb);
>  		if (max_level > unix_sk(other)->recursion_level)
> 

So what prevents a malicious program to DOS the machine ?

Current behavior is on purpose. Limited, predictable, but less holes.

Some applications might depend on the current flow control : Limiting
the working set also permits to keep cpu caches hot.

If you want to change it, better do a full analysis, because hackers
will do it.

Its probably doable, but with a "man unix" change.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yannick Koehler - Jan. 23, 2013, 4:36 p.m.
> So what prevents a malicious program to DOS the machine ?

The recv queue (checked with recvq_full()) and receiving's socket
rcvbuf (check added in my patch).

Actually the current situation can easily lead to a DOS situation.  I
simply have to write one application that connect to a unix socket
domain and have it send me data for which I never call recvfrom() and
voilĂ , all other consumer of this unix socket application will no more
be able to communicate with this application once it maxed out it's
sndbuf, default is 128k I believe.

I will submit my patch in my next email.

--
Yannick Koehler
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet - Jan. 23, 2013, 4:56 p.m.
On Wed, 2013-01-23 at 11:36 -0500, Yannick Koehler wrote:
> > So what prevents a malicious program to DOS the machine ?
> 
> The recv queue (checked with recvq_full()) and receiving's socket
> rcvbuf (check added in my patch).
> 

Nope. The limit is given in number of messages, and its the socket
backlog. Many machines setup a somaxconn = 1024 limit in order
to reasonably listen for TCP connections.

> Actually the current situation can easily lead to a DOS situation.  I
> simply have to write one application that connect to a unix socket
> domain and have it send me data for which I never call recvfrom() and
> voilĂ , all other consumer of this unix socket application will no more
> be able to communicate with this application once it maxed out it's
> sndbuf, default is 128k I believe.

A single message can consume ~128k.

If we allow 1024 messages being sent, we consume 128 Mbytes per evil
socket.

Enough to kill many linux based devices.

You'll have to add proper limits (SO_RCVBUF), accounting the truesize of
all accumulated messages.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet - Jan. 23, 2013, 5:13 p.m.
On Wed, 2013-01-23 at 08:56 -0800, Eric Dumazet wrote:

> You'll have to add proper limits (SO_RCVBUF), accounting the truesize of
> all accumulated messages.

And if you claim being able to remove DOS attacks, you'll also have to
add global limits, at a very minimum.

(a la /proc/sys/net/ipv4/tcp_mem or /proc/sys/net/ipv4/udp_mem)

Its not an easy problem, unfortunately.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yannick Koehler - Jan. 23, 2013, 5:36 p.m.
Hi Eric,

  I am not sure to follow you.  I am not changing how sockets works.
I am actually making the af_unix socket works like others, by using
the sndbuf/rcvbuf limits.  The code I added was took from netlink.c
and sock.c (sock_queue_err_skb).  And actually, I am simply "adding" a
limit check, not removing.  The only thing this may do as a negative
side effect is allow more buffer at the same time in the system, but
the global number of buffer remains checked, as it was, if it was,
since I am not changing how buffer gets allocated, just accounted.
Please check my patch.

2013/1/23 Eric Dumazet <eric.dumazet@gmail.com>:
> On Wed, 2013-01-23 at 08:56 -0800, Eric Dumazet wrote:
>
>> You'll have to add proper limits (SO_RCVBUF), accounting the truesize of
>> all accumulated messages.
>
> And if you claim being able to remove DOS attacks, you'll also have to
> add global limits, at a very minimum.
>
> (a la /proc/sys/net/ipv4/tcp_mem or /proc/sys/net/ipv4/udp_mem)
>
> Its not an easy problem, unfortunately.
>
>
>

Patch

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 0c61236..e273072 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1205,6 +1205,7 @@  restart:
 
 	unix_state_unlock(sk);
 
+	skb_set_owner_r(skb, other);
 	/* take ten and and send info to listening sock */
 	spin_lock(&other->sk_receive_queue.lock);
 	__skb_queue_tail(&other->sk_receive_queue, skb);
@@ -1578,6 +1579,7 @@  restart:
 
 	if (sock_flag(other, SOCK_RCVTSTAMP))
 		__net_timestamp(skb);
+	skb_set_owner_r(skb, other);
 	maybe_add_creds(skb, sock, other);
 	skb_queue_tail(&other->sk_receive_queue, skb);
 	if (max_level > unix_sk(other)->recursion_level)
@@ -1693,6 +1695,7 @@  static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
 		    (other->sk_shutdown & RCV_SHUTDOWN))
 			goto pipe_err_free;
 
+		skb_set_owner_r(skb, other);
 		maybe_add_creds(skb, sock, other);
 		skb_queue_tail(&other->sk_receive_queue, skb);
 		if (max_level > unix_sk(other)->recursion_level)