[net-next] rds: avoid lock hierarchy violation between m_rs_lock and rs_recv_lock

Message ID 1533761833-106379-1-git-send-email-sowmini.varadhan@oracle.com
State Deferred
Delegated to: David Miller
Headers show
Series
  • [net-next] rds: avoid lock hierarchy violation between m_rs_lock and rs_recv_lock
Related show

Commit Message

Sowmini Varadhan Aug. 8, 2018, 8:57 p.m.
The following deadlock, reported by syzbot, can occur if CPU0 is in
rds_send_remove_from_sock() while CPU1 is in rds_clear_recv_queue()

       CPU0                    CPU1
       ----                    ----
  lock(&(&rm->m_rs_lock)->rlock);
                               lock(&rs->rs_recv_lock);
                               lock(&(&rm->m_rs_lock)->rlock);
  lock(&rs->rs_recv_lock);

The deadlock should be avoided by moving the messages from the
rs_recv_queue into a tmp_list in rds_clear_recv_queue() under
the rs_recv_lock, and then dropping the refcnt on the messages
in the tmp_list (potentially resulting in rds_message_purge())
after dropping the rs_recv_lock.

The same lock hierarchy violation also exists in rds_still_queued()
and should be avoided in a similar manner

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Reported-by: syzbot+52140d69ac6dc6b927a9@syzkaller.appspotmail.com
---
 net/rds/recv.c |   11 +++++++++--
 1 files changed, 9 insertions(+), 2 deletions(-)

Comments

Santosh Shilimkar Aug. 8, 2018, 9:51 p.m. | #1
On 8/8/2018 1:57 PM, Sowmini Varadhan wrote:
> The following deadlock, reported by syzbot, can occur if CPU0 is in
> rds_send_remove_from_sock() while CPU1 is in rds_clear_recv_queue()
> 
>         CPU0                    CPU1
>         ----                    ----
>    lock(&(&rm->m_rs_lock)->rlock);
>                                 lock(&rs->rs_recv_lock);
>                                 lock(&(&rm->m_rs_lock)->rlock);
>    lock(&rs->rs_recv_lock);
> 
> The deadlock should be avoided by moving the messages from the
> rs_recv_queue into a tmp_list in rds_clear_recv_queue() under
> the rs_recv_lock, and then dropping the refcnt on the messages
> in the tmp_list (potentially resulting in rds_message_purge())
> after dropping the rs_recv_lock.
> 
> The same lock hierarchy violation also exists in rds_still_queued()
> and should be avoided in a similar manner
> 
> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
> Reported-by: syzbot+52140d69ac6dc6b927a9@syzkaller.appspotmail.com
> ---
This bug doesn't make sense since two different transports are using
same socket (Loop and rds_tcp) and running together.
For same transport, such race can't happen with MSG_ON_SOCK flag.
CPU1-> rds_loop_inc_free
CPU0 -> rds_tcp_cork ...

I need to understand this test better.

Regards,
Santosh
Sowmini Varadhan Aug. 8, 2018, 10:18 p.m. | #2
On (08/08/18 14:51), Santosh Shilimkar wrote:
> This bug doesn't make sense since two different transports are using
> same socket (Loop and rds_tcp) and running together.
> For same transport, such race can't happen with MSG_ON_SOCK flag.
> CPU1-> rds_loop_inc_free
> CPU0 -> rds_tcp_cork ...
> 

The test is just reporting a lock hierarchy violation

As far as I can tell, this wasn't an actual deadlock that happened
because as you point out, either a socket has the rds_tcp transport
or the rds_loop transport, so this particular pair of stack traces
would not happen with the code as it exists today.

but there is a valid lock hierachy violation here, and
imho it's a good idea to get that cleaned up. 

It also avoids needlessly  holding down the rs_recv_lock
when doing an rds_inc_put.

--Sowmini
Santosh Shilimkar Aug. 8, 2018, 10:37 p.m. | #3
On 8/8/2018 3:18 PM, Sowmini Varadhan wrote:
> On (08/08/18 14:51), Santosh Shilimkar wrote:
>> This bug doesn't make sense since two different transports are using
>> same socket (Loop and rds_tcp) and running together.
>> For same transport, such race can't happen with MSG_ON_SOCK flag.
>> CPU1-> rds_loop_inc_free
>> CPU0 -> rds_tcp_cork ...
>>
> 
> The test is just reporting a lock hierarchy violation
>
> As far as I can tell, this wasn't an actual deadlock that happened
> because as you point out, either a socket has the rds_tcp transport
> or the rds_loop transport, so this particular pair of stack traces
> would not happen with the code as it exists today.
>
Exactly.

> but there is a valid lock hierachy violation here, and
> imho it's a good idea to get that cleaned up.
> 
The lock hierarchy violation is protected for the same transport.
I don't see this violation possible for legitimate use and hence
the comment. If we start supporting two different transport on
same socket then we have many more cases to fix and as such lock
violation will be just one of those.

Loop transport seems to keep throwing surprises. Need to
confirm but looks like it can co-exist with another transport
on same socket if those traces to be believed. If its the case,
then definitely that need to be plugged.

Regards,
Santosh
David Miller Aug. 11, 2018, 6:22 p.m. | #4
From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Date: Wed,  8 Aug 2018 13:57:13 -0700

> The following deadlock, reported by syzbot, can occur if CPU0 is in
> rds_send_remove_from_sock() while CPU1 is in rds_clear_recv_queue()
> 
>        CPU0                    CPU1
>        ----                    ----
>   lock(&(&rm->m_rs_lock)->rlock);
>                                lock(&rs->rs_recv_lock);
>                                lock(&(&rm->m_rs_lock)->rlock);
>   lock(&rs->rs_recv_lock);
> 
> The deadlock should be avoided by moving the messages from the
> rs_recv_queue into a tmp_list in rds_clear_recv_queue() under
> the rs_recv_lock, and then dropping the refcnt on the messages
> in the tmp_list (potentially resulting in rds_message_purge())
> after dropping the rs_recv_lock.
> 
> The same lock hierarchy violation also exists in rds_still_queued()
> and should be avoided in a similar manner
> 
> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
> Reported-by: syzbot+52140d69ac6dc6b927a9@syzkaller.appspotmail.com

I'm putting this in deferred state for now.

Sowmini, once you and Santosh agree on what exactly to do, please
resubmit.

Thank you.

Patch

diff --git a/net/rds/recv.c b/net/rds/recv.c
index 504cd6b..1cf7072 100644
--- a/net/rds/recv.c
+++ b/net/rds/recv.c
@@ -429,6 +429,7 @@  static int rds_still_queued(struct rds_sock *rs, struct rds_incoming *inc,
 	struct sock *sk = rds_rs_to_sk(rs);
 	int ret = 0;
 	unsigned long flags;
+	bool drop_ref = false;
 
 	write_lock_irqsave(&rs->rs_recv_lock, flags);
 	if (!list_empty(&inc->i_item)) {
@@ -439,11 +440,13 @@  static int rds_still_queued(struct rds_sock *rs, struct rds_incoming *inc,
 					      -be32_to_cpu(inc->i_hdr.h_len),
 					      inc->i_hdr.h_dport);
 			list_del_init(&inc->i_item);
-			rds_inc_put(inc);
+			drop_ref = true;
 		}
 	}
 	write_unlock_irqrestore(&rs->rs_recv_lock, flags);
 
+	if (drop_ref)
+		rds_inc_put(inc);
 	rdsdebug("inc %p rs %p still %d dropped %d\n", inc, rs, ret, drop);
 	return ret;
 }
@@ -751,16 +754,20 @@  void rds_clear_recv_queue(struct rds_sock *rs)
 	struct sock *sk = rds_rs_to_sk(rs);
 	struct rds_incoming *inc, *tmp;
 	unsigned long flags;
+	LIST_HEAD(tmp_list);
 
 	write_lock_irqsave(&rs->rs_recv_lock, flags);
 	list_for_each_entry_safe(inc, tmp, &rs->rs_recv_queue, i_item) {
 		rds_recv_rcvbuf_delta(rs, sk, inc->i_conn->c_lcong,
 				      -be32_to_cpu(inc->i_hdr.h_len),
 				      inc->i_hdr.h_dport);
+		list_move_tail(&inc->i_item, &tmp_list);
+	}
+	write_unlock_irqrestore(&rs->rs_recv_lock, flags);
+	list_for_each_entry_safe(inc, tmp, &tmp_list, i_item) {
 		list_del_init(&inc->i_item);
 		rds_inc_put(inc);
 	}
-	write_unlock_irqrestore(&rs->rs_recv_lock, flags);
 }
 
 /*