diff mbox

RFC: sk leak in sock_graft?

Message ID 20170624130827.GD6901@oracle.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Sowmini Varadhan June 24, 2017, 1:08 p.m. UTC
We're seeing a memleak when we run an infinite loop that 
loads/unloads rds-tcp, and runs some traffic between each 
load/unload.

Analysis shows that this is happening for the following reason:

inet_accept -> sock_graft does
	parent->sk = sk
but if the parent->sk was previously pointing at some other
struct sock "old_sk" (happens in the case of rds_tcp_accept_one()
which has historically called sock_create_kern() to set up
the new_sock), we need to sock_put(old_sk), else we'd leak it.

In general, sock_graft() is cutting loose the parent->sk,
so it looks like it needs to release its refcnt on it?

Patch below takes care of the leak in our case, but I could use
some input on other locking considerations, and if this is ok
with other modules that use sock_graft()

-----------------------patch below---------------------------------

Comments

David Miller June 27, 2017, 7:38 p.m. UTC | #1
From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Date: Sat, 24 Jun 2017 09:08:27 -0400

> We're seeing a memleak when we run an infinite loop that 
> loads/unloads rds-tcp, and runs some traffic between each 
> load/unload.
> 
> Analysis shows that this is happening for the following reason:
> 
> inet_accept -> sock_graft does
> 	parent->sk = sk
> but if the parent->sk was previously pointing at some other
> struct sock "old_sk" (happens in the case of rds_tcp_accept_one()
> which has historically called sock_create_kern() to set up
> the new_sock), we need to sock_put(old_sk), else we'd leak it.
> 
> In general, sock_graft() is cutting loose the parent->sk,
> so it looks like it needs to release its refcnt on it?
> 
> Patch below takes care of the leak in our case, but I could use
> some input on other locking considerations, and if this is ok
> with other modules that use sock_graft()

It could simply be the case that rds-tcp is the first setup that
created that situation where there is a parent->sk already.

In all of the normal accept*() code paths, a plain struct socket
is allocated and nothing sets newsock->sk to anything before that
sock_graft() call.

Why does rds-tcp need to call sock_graft() without those invariants
met?

Thanks.
Sowmini Varadhan June 27, 2017, 7:59 p.m. UTC | #2
On (06/27/17 15:38), David Miller wrote:
> 
> It could simply be the case that rds-tcp is the first setup that
> created that situation where there is a parent->sk already.

Possibly, I noticed that other callers call sock_create_lite()
and I dont know the history here - this seems to have been
the case from day-1 of rds-tcp. (and I dread changing 
rds_tcp_accept_kern() to do this, because then every module unload
would need to go and check if sock->sk is non-null first, before
cleaning it up

> Why does rds-tcp need to call sock_graft() without those invariants
> met?

It would certainly help to declare "dont use sock_creeate_kern()
if you are going to accept on this socket"- I dont see that being 
mandated anywhere.

It would also help to have a BUG_ON(parent->sk) or at least a
WARN_ON(parent->sk) in sock_graft, before unilaterally assigning 
it to the new sk. 

--Sowmini
Sowmini Varadhan June 27, 2017, 8:45 p.m. UTC | #3
On (06/27/17 15:59), Sowmini Varadhan wrote:
> > Why does rds-tcp need to call sock_graft() without those invariants
> > met?
> 
> It would certainly help to declare "dont use sock_creeate_kern()
> if you are going to accept on this socket"- I dont see that being 
> mandated anywhere.

I can look into getting rds_tcp_accept_one also calling sock_create_lite
like every other caller, (though I may not get to this for another week,
due to travel), but the code in sock_graft() doesnt look right either. 

At the very least, there needs to be a WARN_ON(parent->sk) there,
to provide a gentle dope-slap for the next slob that stumbles on this
type of leak.

--Sowmini
David Miller June 29, 2017, 4:46 p.m. UTC | #4
From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Date: Tue, 27 Jun 2017 16:45:29 -0400

> On (06/27/17 15:59), Sowmini Varadhan wrote:
>> > Why does rds-tcp need to call sock_graft() without those invariants
>> > met?
>> 
>> It would certainly help to declare "dont use sock_creeate_kern()
>> if you are going to accept on this socket"- I dont see that being 
>> mandated anywhere.
> 
> I can look into getting rds_tcp_accept_one also calling sock_create_lite
> like every other caller, (though I may not get to this for another week,
> due to travel), but the code in sock_graft() doesnt look right either. 
> 
> At the very least, there needs to be a WARN_ON(parent->sk) there,
> to provide a gentle dope-slap for the next slob that stumbles on this
> type of leak.

In the accept case it is, if anything, very wasteful.

This is because you allocate a sock and then immediately free it up.

I would say mimick the logic of sys_accept4() and just use
sock_alloc() to allocate the struct socket without a struct sock.
Or, as you say, use sock_create_lite() or a similar helper.
diff mbox

Patch

diff --git a/include/net/sock.h b/include/net/sock.h
index 5374c0d..014ad56 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1686,12 +1686,19 @@  static inline void sock_orphan(struct sock *sk)
 
 static inline void sock_graft(struct sock *sk, struct socket *parent)
 {
+       struct sock *old_sk;
+
        write_lock_bh(&sk->sk_callback_lock);
        sk->sk_wq = parent->wq;
+       old_sk = parent->sk;
        parent->sk = sk;
        sk_set_socket(sk, parent);
        security_sock_graft(sk, parent);
        write_unlock_bh(&sk->sk_callback_lock);
+       if (old_sk) {
+               sock_orphan(old_sk);
+               sock_put(old_sk);
+       }
 }