
BUG at net/sctp/socket.c:7425

Message ID: 20170129104002.GJ3781@localhost.localdomain
State: RFC, archived
Delegated to: David Miller

Commit Message

Marcelo Ricardo Leitner Jan. 29, 2017, 10:40 a.m. UTC
On Sun, Jan 29, 2017 at 03:35:31AM +0300, Alexander Popov wrote:
> Hello,
> 
> I'm running the syzkaller fuzzer for v4.10-rc4 (0aa0313f9d576affd7747cc3f179feb097d28990)
> and have such a crash in sctp code:
> 
...
> 
> Unfortunately, I didn't manage to get a C program reproducing the crash (looks like race).
> However, I stably hit it on my setup - so I can help fixing the issue.
> 
> The crash happens here:
> 	/* Let another process have a go.  Since we are going
> 	 * to sleep anyway.
> 	 */
> 	release_sock(sk);
> 	current_timeo = schedule_timeout(current_timeo);
> >	BUG_ON(sk != asoc->base.sk);
> 	lock_sock(sk);
> 
> I've added some debugging output and see, that the original value of asoc->base.sk is
> changed to the address of another struct sock, which appeared in sctp_endpoint_init()
> shortly before the crash.

You need some threading for this to happen.  asoc->base.sk will change
if you peeloff the association.
It seems you had one thread waiting for some sndbuf to be available on a
sendmsg() call and another thread did a peeloff on the association that
the first thread was using.
Yeah I think this will reproduce it.
And in this case, it's probably better if we just return -EPIPE as the
association doesn't exist in that socket anymore instead of the BUG_ON.

  Marcelo
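
To make the interleaving described above concrete, here is a minimal user-space sketch using pthreads. It is not kernel code: `fake_sock`, `fake_asoc`, `wait_for_sndbuf()`, `peeloff()` and `run_peeloff_race()` are hypothetical stand-ins invented for illustration. A mutex plays the role of the socket lock and a condition variable plays the role of the sleep in schedule_timeout(). Unlike the kernel path, the check here runs after the lock has been retaken, because pthread_cond_wait() reacquires the mutex before returning.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-ins, not the kernel's struct sock / sctp_association. */
struct fake_sock {
	pthread_mutex_t lock;	/* plays the role of the socket lock */
	pthread_cond_t wake;	/* plays the role of the sndbuf wait queue */
};

struct fake_asoc {
	struct fake_sock *base_sk;	/* like asoc->base.sk */
	int sndbuf_free;		/* stays 0 here, so the sender must sleep */
};

static struct fake_sock old_sk = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER };
static struct fake_sock new_sk = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER };
static struct fake_asoc asoc = { &old_sk, 0 };
static pthread_cond_t sender_asleep_cv = PTHREAD_COND_INITIALIZER;
static int sender_asleep;	/* protected by old_sk.lock */

/* Sender side; the caller holds sk->lock, as sendmsg() would. */
static int wait_for_sndbuf(struct fake_asoc *a, struct fake_sock *sk)
{
	while (!a->sndbuf_free) {
		sender_asleep = 1;
		pthread_cond_signal(&sender_asleep_cv);
		/* Drops sk->lock while asleep and retakes it on wakeup. */
		pthread_cond_wait(&sk->wake, &sk->lock);
		/* The association moved to another socket: fail, do not crash. */
		if (sk != a->base_sk)
			return -EPIPE;
	}
	return 0;
}

static void *sender_thread(void *arg)
{
	struct fake_sock *sk = arg;
	int err;

	pthread_mutex_lock(&sk->lock);
	err = wait_for_sndbuf(&asoc, sk);
	pthread_mutex_unlock(&sk->lock);
	return (void *)(intptr_t)err;
}

/* Peeloff side: wait until the sender sleeps, then migrate the association. */
static void peeloff(struct fake_asoc *a, struct fake_sock *from, struct fake_sock *to)
{
	pthread_mutex_lock(&from->lock);
	while (!sender_asleep)
		pthread_cond_wait(&sender_asleep_cv, &from->lock);
	a->base_sk = to;
	pthread_cond_broadcast(&from->wake);
	pthread_mutex_unlock(&from->lock);
}

/* Runs one sender and one peeloff; returns what the sender saw. */
int run_peeloff_race(void)
{
	pthread_t tid;
	void *ret;
	int rc;

	/* Reset shared state so the function can be called more than once. */
	asoc.base_sk = &old_sk;
	sender_asleep = 0;

	rc = pthread_create(&tid, NULL, sender_thread, &old_sk);
	assert(rc == 0);
	peeloff(&asoc, &old_sk, &new_sk);
	rc = pthread_join(tid, &ret);
	assert(rc == 0);
	return (int)(intptr_t)ret;
}
```

Built with -pthread and called from a main(), run_peeloff_race() is expected to return -EPIPE: the sender wakes up, sees that the association no longer belongs to its socket, and bails out with an error where the BUG_ON would have fired.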

---8<---

Comments

Alexander Popov Jan. 30, 2017, 11:19 a.m. UTC | #1
On 29.01.2017 13:40, Marcelo Ricardo Leitner wrote:
> On Sun, Jan 29, 2017 at 03:35:31AM +0300, Alexander Popov wrote:
>> Hello,
>>
>> I'm running the syzkaller fuzzer for v4.10-rc4 (0aa0313f9d576affd7747cc3f179feb097d28990)
>> and have such a crash in sctp code:
>>
> ...
>>
>> Unfortunately, I didn't manage to get a C program reproducing the crash (looks like race).
>> However, I stably hit it on my setup - so I can help fixing the issue.
>>
>> The crash happens here:
>> 	/* Let another process have a go.  Since we are going
>> 	 * to sleep anyway.
>> 	 */
>> 	release_sock(sk);
>> 	current_timeo = schedule_timeout(current_timeo);
>>> 	BUG_ON(sk != asoc->base.sk);
>> 	lock_sock(sk);
>>
>> I've added some debugging output and see, that the original value of asoc->base.sk is
>> changed to the address of another struct sock, which appeared in sctp_endpoint_init()
>> shortly before the crash.
> 
> You need some threading for this to happen.  asoc->base.sk will change
> if you peeloff the association.
> It seems you had one thread waiting for some sndbuf to be available on a
> sendmsg() call and another thread did a peeloff on the association that
> the first thread was using.
> Yeah I think this will reproduce it.
> And in this case, it's probably better if we just return -EPIPE as the
> association doesn't exist in that socket anymore instead of the BUG_ON.

Thanks for your reply and patch, Marcelo.
I've checked your explanation and agree with it. The situation looks like this:
...
[   55.719561] sctp_endpoint_init: sk ffff88006718c8c0
[   55.721158] sctp_association_init: asoc ffff880059e96818, base.sk = ffff88006718c8c0
...
[   56.144070] sctp_wait_for_sndbuf: asoc:ffff880059e96818, timeo:9223372036854775807, msg_len:24
[   56.148650] sctp_endpoint_init: sk ffff880068bca480
[   56.149216] sctp_sock_migrate: asoc ffff880059e96818 from oldsk ffff88006718c8c0 to newsk ffff880068bca480
[   56.150442] sctp_assoc_migrate: asoc ffff880059e96818 to newsk ffff880068bca480
[   56.168827] crash!!! asoc ffff880059e96818: sk ffff88006718c8c0 != base.sk ffff880068bca480
[   56.169801] ------------[ cut here ]------------
[   56.170151] kernel BUG at net/sctp/socket.c:7433!
...


> ---8<---
> 
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index 26a514269b92..e9870aead88b 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -6838,7 +6838,8 @@ static int sctp_wait_for_sndbuf(struct sctp_association *asoc, long *timeo_p,
>  		 */
>  		sctp_release_sock(sk);
>  		current_timeo = schedule_timeout(current_timeo);
> -		BUG_ON(sk != asoc->base.sk);
> +		if (sk != asoc->base.sk)
> +			goto do_error;
>  		sctp_lock_sock(sk);
>  
>  		*timeo_p = current_timeo;

Tested your fix.
Acked-by: Alexander Popov <alex.popov@linux.com>

Patch

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 26a514269b92..e9870aead88b 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -6838,7 +6838,8 @@  static int sctp_wait_for_sndbuf(struct sctp_association *asoc, long *timeo_p,
 		 */
 		sctp_release_sock(sk);
 		current_timeo = schedule_timeout(current_timeo);
-		BUG_ON(sk != asoc->base.sk);
+		if (sk != asoc->base.sk)
+			goto do_error;
 		sctp_lock_sock(sk);
 
 		*timeo_p = current_timeo;