diff mbox

[net-next] sctp: sctp_close: fix release of bindings for deferred call_rcu's

Message ID 13ea81dcf399f935bd8e048ce225411d8f29a792.1359650254.git.dborkman@redhat.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Daniel Borkmann Jan. 31, 2013, 4:51 p.m. UTC
It seems due to RCU usage, i.e. within SCTP's address binding list,
a, say, ``behavioral change'' was introduced which does actually
not conform to the RFC anymore. In particular consider the following
(fictional) scenario to demonstrate this:

  do:
    Two SOCK_SEQPACKET-style sockets are openend (S1, S2)
    S1 is bound to 127.0.0.1, port 1024 [server]
    S2 is bound to 127.0.0.1, port 1025 [client]
    listen(2) is invoked on S1
    From S2 we call one sendmsg(2) with msg.msg_name and
       msg.msg_namelen parameters set to the server's
       address
    S1, S2 are closed
    goto do

The first pass of this loop passes sucessful, while the second round
fails during binding of S1 (address still in use). What is happening?
In the first round, the initial handshake is being done, and, at the
time close(2) is called on S1, a nongraceful shutdown is performed via
ABORT since in S1's receive queue an unprocessed packet is present,
thus stating an error condition. This can be considered as a correct
behavior.

During close also all bound addresses are freed, thus nothing *must*
be active anymore.

  After checking the Verification Tag, the receiving endpoint shall
  remove the association from its record, and shall report the
  termination to its upper layer. (RFC2960, 9.1 Abort of an Association)

Also, no half-open states are supported, thus after an ungraceful
shutdown, we leave nothing behind. However, this seems not to be
happening though. In a real-world scenario, this is exactly where
it breaks the lksctp-tools functional test suite, *for instance*:

  ./test_sockopt
  test_sockopt.c  1 PASS : getsockopt(SCTP_STATUS) on a socket with no assoc
  test_sockopt.c  2 PASS : getsockopt(SCTP_STATUS)
  test_sockopt.c  3 PASS : getsockopt(SCTP_STATUS) with invalid associd
  test_sockopt.c  4 PASS : getsockopt(SCTP_STATUS) with NULL associd
  test_sockopt.c  5 BROK : bind: Address already in use

With this patch, the example above (which simulates a similar scenario
as in the implementation of this test case) and therefore also this test
runs successfully through.

If one wants to fix this issue, an RCU barrier needs to be introduced
within the sctp_close handler. One could argue that this is quite costly,
which is true, but on the other hand, if an application calls close on
its socket, it likely might be out of its critical path anyway.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
---
 net/sctp/socket.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Vladislav Yasevich Jan. 31, 2013, 7:49 p.m. UTC | #1
On 01/31/2013 11:51 AM, Daniel Borkmann wrote:
> It seems due to RCU usage, i.e. within SCTP's address binding list,
> a, say, ``behavioral change'' was introduced which does actually
> not conform to the RFC anymore. In particular consider the following
> (fictional) scenario to demonstrate this:
>
>    do:
>      Two SOCK_SEQPACKET-style sockets are openend (S1, S2)
>      S1 is bound to 127.0.0.1, port 1024 [server]
>      S2 is bound to 127.0.0.1, port 1025 [client]
>      listen(2) is invoked on S1
>      From S2 we call one sendmsg(2) with msg.msg_name and
>         msg.msg_namelen parameters set to the server's
>         address
>      S1, S2 are closed
>      goto do
>
> The first pass of this loop passes sucessful, while the second round
> fails during binding of S1 (address still in use). What is happening?
> In the first round, the initial handshake is being done, and, at the
> time close(2) is called on S1, a nongraceful shutdown is performed via
> ABORT since in S1's receive queue an unprocessed packet is present,
> thus stating an error condition. This can be considered as a correct
> behavior.
>
> During close also all bound addresses are freed, thus nothing *must*
> be active anymore.
>
>    After checking the Verification Tag, the receiving endpoint shall
>    remove the association from its record, and shall report the
>    termination to its upper layer. (RFC2960, 9.1 Abort of an Association)
>
> Also, no half-open states are supported, thus after an ungraceful
> shutdown, we leave nothing behind. However, this seems not to be
> happening though. In a real-world scenario, this is exactly where
> it breaks the lksctp-tools functional test suite, *for instance*:
>
>    ./test_sockopt
>    test_sockopt.c  1 PASS : getsockopt(SCTP_STATUS) on a socket with no assoc
>    test_sockopt.c  2 PASS : getsockopt(SCTP_STATUS)
>    test_sockopt.c  3 PASS : getsockopt(SCTP_STATUS) with invalid associd
>    test_sockopt.c  4 PASS : getsockopt(SCTP_STATUS) with NULL associd
>    test_sockopt.c  5 BROK : bind: Address already in use
>
> With this patch, the example above (which simulates a similar scenario
> as in the implementation of this test case) and therefore also this test
> runs successfully through.
>
> If one wants to fix this issue, an RCU barrier needs to be introduced
> within the sctp_close handler. One could argue that this is quite costly,
> which is true, but on the other hand, if an application calls close on
> its socket, it likely might be out of its critical path anyway.

Hi Daniel

The fact that we delay freeing bind_addr list due to rcu shouldn't 
change the endpoint destruction path.

The reason you'd get a EADDRINSUE would be that the 
sctp_endpoint_destroy() hasn't been triggered.  That means that 
something is still referencing the endpoint.  However, there doesn't
appear to be anything holding a reference to the association or the
endpoint from the bind address list.  So the fact that the entries might 
not have been kfreed yet shouldn't impact the binding of new sockets.

What is most likely happening instead is that we now have rcu delayed
transport destruction and in that path, we delay dropping the 
association refcount until after the rcu grace period.  That in turn 
causes delayed endpoint refcount drop, which in turn causes delayed 
removed of the socket from the port list.  This is the cause of the issue.

The right solution would be to see if we can drop the refcounts at 
delete instead of at destroy.  That should remove the delay.

-vlad

>
> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
> ---
>   net/sctp/socket.c | 5 +++++
>   1 file changed, 5 insertions(+)
>
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index 9e65758..a9c18b4 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -1536,6 +1536,11 @@ SCTP_STATIC void sctp_close(struct sock *sk, long timeout)
>
>   	sock_put(sk);
>
> +	/* There can still be (non-)TCP-style associations be waiting
> +	 * to be processed via deferred RCU.
> +	 */
> +	rcu_barrier();
> +
>   	SCTP_DBG_OBJCNT_DEC(sock);
>   }
>
>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Borkmann Feb. 1, 2013, 8:04 a.m. UTC | #2
On 01/31/2013 08:49 PM, Vlad Yasevich wrote:
> On 01/31/2013 11:51 AM, Daniel Borkmann wrote:
>> It seems due to RCU usage, i.e. within SCTP's address binding list,
>> a, say, ``behavioral change'' was introduced which does actually
>> not conform to the RFC anymore. In particular consider the following
>> (fictional) scenario to demonstrate this:
>>
>>    do:
>>      Two SOCK_SEQPACKET-style sockets are openend (S1, S2)
>>      S1 is bound to 127.0.0.1, port 1024 [server]
>>      S2 is bound to 127.0.0.1, port 1025 [client]
>>      listen(2) is invoked on S1
>>      From S2 we call one sendmsg(2) with msg.msg_name and
>>         msg.msg_namelen parameters set to the server's
>>         address
>>      S1, S2 are closed
>>      goto do
>>
>> The first pass of this loop passes sucessful, while the second round
>> fails during binding of S1 (address still in use). What is happening?
>> In the first round, the initial handshake is being done, and, at the
>> time close(2) is called on S1, a nongraceful shutdown is performed via
>> ABORT since in S1's receive queue an unprocessed packet is present,
>> thus stating an error condition. This can be considered as a correct
>> behavior.
>>
>> During close also all bound addresses are freed, thus nothing *must*
>> be active anymore.
>>
>>    After checking the Verification Tag, the receiving endpoint shall
>>    remove the association from its record, and shall report the
>>    termination to its upper layer. (RFC2960, 9.1 Abort of an Association)
>>
>> Also, no half-open states are supported, thus after an ungraceful
>> shutdown, we leave nothing behind. However, this seems not to be
>> happening though. In a real-world scenario, this is exactly where
>> it breaks the lksctp-tools functional test suite, *for instance*:
>>
>>    ./test_sockopt
>>    test_sockopt.c  1 PASS : getsockopt(SCTP_STATUS) on a socket with no assoc
>>    test_sockopt.c  2 PASS : getsockopt(SCTP_STATUS)
>>    test_sockopt.c  3 PASS : getsockopt(SCTP_STATUS) with invalid associd
>>    test_sockopt.c  4 PASS : getsockopt(SCTP_STATUS) with NULL associd
>>    test_sockopt.c  5 BROK : bind: Address already in use
>>
>> With this patch, the example above (which simulates a similar scenario
>> as in the implementation of this test case) and therefore also this test
>> runs successfully through.
>>
>> If one wants to fix this issue, an RCU barrier needs to be introduced
>> within the sctp_close handler. One could argue that this is quite costly,
>> which is true, but on the other hand, if an application calls close on
>> its socket, it likely might be out of its critical path anyway.
>
> The fact that we delay freeing bind_addr list due to rcu shouldn't change the endpoint destruction path.
>
> The reason you'd get a EADDRINSUE would be that the sctp_endpoint_destroy() hasn't been triggered.  That means that something is still referencing the endpoint.  However, there doesn't
> appear to be anything holding a reference to the association or the
> endpoint from the bind address list.  So the fact that the entries might not have been kfreed yet shouldn't impact the binding of new sockets.
>
> What is most likely happening instead is that we now have rcu delayed
> transport destruction and in that path, we delay dropping the association refcount until after the rcu grace period.  That in turn causes delayed endpoint refcount drop, which in turn causes delayed removed of the socket from the port list.  This is the cause of the issue.
>
> The right solution would be to see if we can drop the refcounts at delete instead of at destroy.  That should remove the delay.

Thanks for your feedback Vlad!

I'll do a rework of this patch and send a version 2.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 9e65758..a9c18b4 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1536,6 +1536,11 @@  SCTP_STATIC void sctp_close(struct sock *sk, long timeout)
 
 	sock_put(sk);
 
+	/* There can still be (non-)TCP-style associations be waiting
+	 * to be processed via deferred RCU.
+	 */
+	rcu_barrier();
+
 	SCTP_DBG_OBJCNT_DEC(sock);
 }