[net,v2] net: sctp: test if association is dead in sctp_wake_up_waiters

Message ID de3e98e644dde7dcaea9a952de38eb32a2959eeb.1396999714.git.dborkman@redhat.com
State Superseded, archived
Delegated to: David Miller

Commit Message

Daniel Borkmann April 8, 2014, 11:32 p.m. UTC
In function sctp_wake_up_waiters(), we need to involve a test
if the association is declared dead. If so, we don't have any
reference to a possible sibling association anymore and need
to invoke sctp_write_space() instead and normally walk the
socket's associations and notify them of new wmem space. The
reason for special casing is that, otherwise, we could run
into the following issue:

sctp_association_free()
`-> list_del(&asoc->asocs)         <-- poisons list pointer
    asoc->base.dead = true
    sctp_outq_free(&asoc->outqueue)
    `-> __sctp_outq_teardown()
     `-> sctp_chunk_free()
      `-> consume_skb()
       `-> sctp_wfree()
        `-> sctp_wake_up_waiters() <-- dereferences poisoned pointers
                                       if asoc->ep->sndbuf_policy=0

Therefore, only walk the list in an 'optimized' way if we find
that the current association is still active. We could also use
list_del_init() in addition when we call sctp_association_free(),
but as Vlad suggests, we want to trap such bugs and thus leave
it poisoned as is. Stress-testing seems fine now.

Fixes: cd253f9f357d ("net: sctp: wake up all assocs if sndbuf policy is per socket")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Vlad Yasevich <vyasevic@redhat.com>
---
 v1->v2:
  - leave list_del()

 net/sctp/socket.c | 6 ++++++
 1 file changed, 6 insertions(+)
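
The difference between list_del() and list_del_init() discussed above can be
sketched in user space. This is only an illustrative model, not the kernel
implementation: the demo_* helpers and the DEMO_POISON* values are
hypothetical stand-ins for the kernel's list helpers and its
LIST_POISON1/LIST_POISON2 constants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* User-space stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical poison markers; never dereferenced in this sketch. */
#define DEMO_POISON1 ((struct list_head *)(uintptr_t)0x100)
#define DEMO_POISON2 ((struct list_head *)(uintptr_t)0x200)

static void demo_list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void demo_list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void demo_unlink(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* Models list_del(): unlink, then poison the removed node's pointers. */
static void demo_list_del(struct list_head *n)
{
	demo_unlink(n);
	n->next = DEMO_POISON1;
	n->prev = DEMO_POISON2;
}

/* Models list_del_init(): unlink, then make the node an empty list. */
static void demo_list_del_init(struct list_head *n)
{
	demo_unlink(n);
	demo_list_init(n);
}

static bool demo_is_poisoned(const struct list_head *n)
{
	return n->next == DEMO_POISON1 || n->prev == DEMO_POISON2;
}

/* After demo_list_del(), a walk starting at the removed node would
 * follow a poisoned pointer; the rest of the list stays intact.
 */
static bool demo_del_poisons(void)
{
	struct list_head head, a, b;

	demo_list_init(&head);
	demo_list_add_tail(&a, &head);
	demo_list_add_tail(&b, &head);
	assert(head.next == &a);

	demo_list_del(&a);
	return demo_is_poisoned(&a) && head.next == &b && b.prev == &head;
}

/* After demo_list_del_init(), the removed node looks like a valid empty
 * list, so a stale walk from it would silently find nothing.
 */
static bool demo_del_init_hides_removal(void)
{
	struct list_head head, a, b;

	demo_list_init(&head);
	demo_list_add_tail(&a, &head);
	demo_list_add_tail(&b, &head);

	demo_list_del_init(&a);
	return !demo_is_poisoned(&a) && a.next == &a && head.next == &b;
}
```

In this model the poisoned node makes a stale list walk fail loudly, which
is the trapping behaviour the commit message wants to keep.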

Comments

Daniel Borkmann April 9, 2014, 8:09 a.m. UTC | #1
On 04/09/2014 01:10 AM, Vlad Yasevich wrote:
 > On 04/08/2014 06:23 PM, Daniel Borkmann wrote:
 >> In function sctp_wake_up_waiters() we need to involve a test
 >> if the association is declared dead. If so, we don't have any
 >> reference to a possible sibling association anymore and need
 >> to invoke sctp_write_space() instead and normally walk the
 >> socket's associations and notify them of new wmem space. The
 >> reason for special casing is that, otherwise, we could run
 >> into the following issue:
 >>
 >> sctp_association_free()
 >> `-> list_del(&asoc->asocs)         <-- poisons list pointer
 >>      asoc->base.dead = true
 >>      sctp_outq_free(&asoc->outqueue)
 >>      `-> __sctp_outq_teardown()
 >>       `-> sctp_chunk_free()
 >>        `-> consume_skb()
 >>         `-> sctp_wfree()
 >>          `-> sctp_wake_up_waiters() <-- dereferences poisoned pointers
 >>                                         if asoc->ep->sndbuf_policy=0
 >>
 >> Therefore, only walk the list in an 'optimized' way if we find
 >> that the current association is still active. It's also more
 >> clean in that context to just use list_del_init() when we call
 >> sctp_association_free(). Stress-testing seems fine now.
 >
 > One of the reasons that we don't use list_del_init() here is that
 > we want to be able to trap on uninitialized/corrupt list manipulation,
 > just like you did.  If it wasn't there, the bug would have been hidden.
 >
 > Please keep it there.  The rest of the patch is fine.

The test ran overnight and I've seen no issues.

But I'd still question the usage of asoc->base.dead though, I think
this approach of testing for asoc->base.dead is a bit racy (perhaps
general usage of it, imho) - at least here there's a tiny window where
we poison pointers before we actually declare the association dead.

Also, I think even if we would have deleted ourselves from the list
after declaring the association dead, a different CPU accessing this
association via sctp_wfree() might already have gotten past the
asoc->base.dead test while we declare it dead in the meantime.

Imho, this still needs to be resolved differently. I'll look further ...
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Borkmann April 9, 2014, 10:32 a.m. UTC | #2
On 04/09/2014 10:09 AM, Daniel Borkmann wrote:
> On 04/09/2014 01:10 AM, Vlad Yasevich wrote:
>  > On 04/08/2014 06:23 PM, Daniel Borkmann wrote:
>  >> In function sctp_wake_up_waiters() we need to involve a test
>  >> if the association is declared dead. If so, we don't have any
>  >> reference to a possible sibling association anymore and need
>  >> to invoke sctp_write_space() instead and normally walk the
>  >> socket's associations and notify them of new wmem space. The
>  >> reason for special casing is that, otherwise, we could run
>  >> into the following issue:
>  >>
>  >> sctp_association_free()
>  >> `-> list_del(&asoc->asocs)         <-- poisons list pointer
>  >>      asoc->base.dead = true
>  >>      sctp_outq_free(&asoc->outqueue)
>  >>      `-> __sctp_outq_teardown()
>  >>       `-> sctp_chunk_free()
>  >>        `-> consume_skb()
>  >>         `-> sctp_wfree()
>  >>          `-> sctp_wake_up_waiters() <-- dereferences poisoned pointers
>  >>                                         if asoc->ep->sndbuf_policy=0
>  >>
>  >> Therefore, only walk the list in an 'optimized' way if we find
>  >> that the current association is still active. It's also more
>  >> clean in that context to just use list_del_init() when we call
>  >> sctp_association_free(). Stress-testing seems fine now.
>  >
>  > One of the reasons that we don't use list_del_init() here is that
>  > we want to be able to trap on uninitialized/corrupt list manipulation,
>  > just like you did.  If it wasn't there, the bug would have been hidden.
>  >
>  > Please keep it there.  The rest of the patch is fine.
>
> Test run over night and I've seen no issues.
>
> But I'd still question the usage of asoc->base.dead though, I think
> this approach of testing for asoc->base.dead is a bit racy (perhaps
> general usage of it, imho) - at least here there's a tiny window where
> we poison pointers before we actually declare the association dead.
>
> Also, I think even if we would have deleted ourselves from the list
> after declaring the association dead, a different CPU accessing this
> association via sctp_wfree() might already have gotten past the
> asoc->base.dead test while we declare it dead in the meantime.

Ok, I think we can scratch that thought ... what happens is that parallel
calls to sctp_sendmsg() are protected under lock_sock()/release_sock()
pair as already stated in the code and within that lock, we are setting
sctp_set_owner_w() for each chunk. When we call sctp_primitive_SEND(),
still under lock, we might eventually end up in sctp_packet_transmit(),
if I follow the path correctly, and orphan the skb in sctp_packet_set_owner_w()
[ which basically would mean, we actually uncharge the accounted memory by
orphaning _before_ we call dev_queue_xmit() since commit 4c3a5bdae293
("sctp: Don't charge for data in sndbuf again when transmitting packet")
but that's perhaps a different story ] and set a new destructor. The
only thing where in that context an association can be freed up by
sctp_association_free() is if sctp_primitive_SEND() returns with error.
So even in that case, we're still protected under lock_sock()/release_sock()
when we flush the outq, so testing asoc->base.dead should be okay then,
quite unintuitive though. Thus, patch seems fine, if wished, I could
still document that in the commit message? Vlad, are we on the same page? ;)
Neil Horman April 9, 2014, 10:59 a.m. UTC | #3
On Wed, Apr 09, 2014 at 12:32:48PM +0200, Daniel Borkmann wrote:
> On 04/09/2014 10:09 AM, Daniel Borkmann wrote:
> >On 04/09/2014 01:10 AM, Vlad Yasevich wrote:
> > > On 04/08/2014 06:23 PM, Daniel Borkmann wrote:
> > >> In function sctp_wake_up_waiters() we need to involve a test
> > >> if the association is declared dead. If so, we don't have any
> > >> reference to a possible sibling association anymore and need
> > >> to invoke sctp_write_space() instead and normally walk the
> > >> socket's associations and notify them of new wmem space. The
> > >> reason for special casing is that, otherwise, we could run
> > >> into the following issue:
> > >>
> > >> sctp_association_free()
> > >> `-> list_del(&asoc->asocs)         <-- poisons list pointer
> > >>      asoc->base.dead = true
> > >>      sctp_outq_free(&asoc->outqueue)
> > >>      `-> __sctp_outq_teardown()
> > >>       `-> sctp_chunk_free()
> > >>        `-> consume_skb()
> > >>         `-> sctp_wfree()
> > >>          `-> sctp_wake_up_waiters() <-- dereferences poisoned pointers
> > >>                                         if asoc->ep->sndbuf_policy=0
> > >>
> > >> Therefore, only walk the list in an 'optimized' way if we find
> > >> that the current association is still active. It's also more
> > >> clean in that context to just use list_del_init() when we call
> > >> sctp_association_free(). Stress-testing seems fine now.
> > >
> > > One of the reasons that we don't use list_del_init() here is that
> > > we want to be able to trap on uninitialized/corrupt list manipulation,
> > > just like you did.  If it wasn't there, the bug would have been hidden.
> > >
> > > Please keep it there.  The rest of the patch is fine.
> >
> >Test run over night and I've seen no issues.
> >
> >But I'd still question the usage of asoc->base.dead though, I think
> >this approach of testing for asoc->base.dead is a bit racy (perhaps
> >general usage of it, imho) - at least here there's a tiny window where
> > >we poison pointers before we actually declare the association dead.
> >
> >Also, I think even if we would have deleted ourselves from the list
> >after declaring the association dead, a different CPU accessing this
> >association via sctp_wfree() might already have gotten past the
> >asoc->base.dead test while we declare it dead in the meantime.
> 
> Ok, I think we can scratch that thought ... what happens is that parallel
> calls to sctp_sendmsg() are protected under lock_sock()/release_sock()
> pair as already stated in the code and within that lock, we are setting
> sctp_set_owner_w() for each chunk. When we call sctp_primitive_SEND(),
> still under lock, we might eventually end up in sctp_packet_transmit(),
> if I follow the path correctly, and orphan the skb in sctp_packet_set_owner_w()
> [ which basically would mean, we actually uncharge the accounted memory by
> orphaning _before_ we call dev_queue_xmit() since commit 4c3a5bdae293
> ("sctp: Don't charge for data in sndbuf again when transmitting packet")
> but that's perhaps a different story ] and set a new destructor. The
> only thing where in that context an association can be freed up by
> sctp_association_free() is if sctp_primitive_SEND() returns with error.
> So even in that case, we're still protected under lock_sock()/release_sock()
> when we flush the outq, so testing asoc->base.dead should be okay then,
> quite unintuitive though. Thus, patch seems fine, if wished, I could
> still document that in the commit message? Vlad, are we on the same page? ;)

Acked-by: Neil Horman <nhorman@tuxdriver.com>

Both to the patch and to the documentation, as it's not at all clear which lock
protects the reading and writing of the dead variable.
Neil

Vladislav Yasevich April 9, 2014, 12:52 p.m. UTC | #4
On 04/09/2014 04:09 AM, Daniel Borkmann wrote:
> On 04/09/2014 01:10 AM, Vlad Yasevich wrote:
>> On 04/08/2014 06:23 PM, Daniel Borkmann wrote:
>>> In function sctp_wake_up_waiters() we need to involve a test
>>> if the association is declared dead. If so, we don't have any
>>> reference to a possible sibling association anymore and need
>>> to invoke sctp_write_space() instead and normally walk the
>>> socket's associations and notify them of new wmem space. The
>>> reason for special casing is that, otherwise, we could run
>>> into the following issue:
>>>
>>> sctp_association_free()
>>> `-> list_del(&asoc->asocs)         <-- poisons list pointer
>>>      asoc->base.dead = true
>>>      sctp_outq_free(&asoc->outqueue)
>>>      `-> __sctp_outq_teardown()
>>>       `-> sctp_chunk_free()
>>>        `-> consume_skb()
>>>         `-> sctp_wfree()
>>>          `-> sctp_wake_up_waiters() <-- dereferences poisoned pointers
>>>                                         if asoc->ep->sndbuf_policy=0
>>>
>>> Therefore, only walk the list in an 'optimized' way if we find
>>> that the current association is still active. It's also more
>>> clean in that context to just use list_del_init() when we call
>>> sctp_association_free(). Stress-testing seems fine now.
>>
>> One of the reasons that we don't use list_del_init() here is that
>> we want to be able to trap on uninitialized/corrupt list manipulation,
>> just like you did.  If it wasn't there, the bug would have been hidden.
>>
>> Please keep it there.  The rest of the patch is fine.
> 
> Test run over night and I've seen no issues.
> 
> But I'd still question the usage of asoc->base.dead though, I think
> this approach of testing for asoc->base.dead is a bit racy (perhaps
> general usage of it, imho) - at least here there's a tiny window where
> we poison pointers before we actually declare the association dead.
> 
> Also, I think even if we would have deleted ourselves from the list
> after declaring the association dead, a different CPU accessing this
> association via sctp_wfree() might already have gotten past the
> asoc->base.dead test while we declare it dead in the meantime.

sctp_wfree() is the destructor for the chunk.  Chunks are freed directly
by the association while under the socket lock.  So, a different CPU can't
be running sctp_wfree() while another CPU is destroying the association, as
both actions happen under the same socket lock.

The times we check asoc->base.dead is when we've cached an association
pointer for later processing.  In between cache and processing, the
association may have been freed and is simply still around due to
reference counts.  We check asoc->base.dead under a lock, so it
should always be safe to check and not race against sctp_association_free().

-vlad
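
The pattern described above (cache an association pointer, then check the
dead flag under the lock before using it) might be modelled in user space
roughly as follows. This is a sketch under assumptions: a pthread mutex
stands in for the socket lock, and all demo_* names are hypothetical, not
kernel APIs.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: one lock per socket, associations hang off it. */
struct demo_sock {
	pthread_mutex_t lock;	/* models lock_sock()/release_sock() */
};

struct demo_asoc {
	struct demo_sock *sk;
	bool dead;		/* models asoc->base.dead */
	int refcnt;		/* keeps the memory around after "free" */
};

/* Models the dead marking done in sctp_association_free().
 * The caller must hold sk->lock.
 */
static void demo_asoc_free(struct demo_asoc *asoc)
{
	asoc->dead = true;
}

/* Deferred work on a cached pointer: only touch the association if it
 * has not been declared dead, and check that under the same lock.
 */
static bool demo_process_cached(struct demo_asoc *asoc)
{
	bool processed = false;

	pthread_mutex_lock(&asoc->sk->lock);
	if (!asoc->dead)
		processed = true;
	pthread_mutex_unlock(&asoc->sk->lock);

	return processed;
}

/* Returns true if the cached pointer is used before the free and
 * skipped afterwards, while the reference count keeps it valid.
 */
static bool demo_cached_asoc_skipped_after_free(void)
{
	struct demo_sock sk;
	struct demo_asoc asoc = { .sk = &sk, .dead = false, .refcnt = 2 };
	bool before, after;

	pthread_mutex_init(&sk.lock, NULL);

	before = demo_process_cached(&asoc);

	pthread_mutex_lock(&sk.lock);
	demo_asoc_free(&asoc);
	asoc.refcnt--;		/* one holder gone, memory still valid */
	pthread_mutex_unlock(&sk.lock);

	after = demo_process_cached(&asoc);

	pthread_mutex_destroy(&sk.lock);
	return before && !after;
}
```

Because both the writer and the reader of the flag take the same lock in
this model, the reader sees either the state before the free or the state
after it, never a half-torn-down association.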

> 
> Imho, this still needs to be resolved differently. I'll look further ...
Vladislav Yasevich April 9, 2014, 12:55 p.m. UTC | #5
On 04/08/2014 07:32 PM, Daniel Borkmann wrote:
> In function sctp_wake_up_waiters(), we need to involve a test
> if the association is declared dead. If so, we don't have any
> reference to a possible sibling association anymore and need
> to invoke sctp_write_space() instead and normally walk the
> socket's associations and notify them of new wmem space. The
> reason for special casing is that, otherwise, we could run
> into the following issue:
> 
> sctp_association_free()
> `-> list_del(&asoc->asocs)         <-- poisons list pointer
>     asoc->base.dead = true
>     sctp_outq_free(&asoc->outqueue)
>     `-> __sctp_outq_teardown()
>      `-> sctp_chunk_free()
>       `-> consume_skb()
>        `-> sctp_wfree()
>         `-> sctp_wake_up_waiters() <-- dereferences poisoned pointers
>                                        if asoc->ep->sndbuf_policy=0
> 
> Therefore, only walk the list in an 'optimized' way if we find
> that the current association is still active. We could also use
> list_del_init() in addition when we call sctp_association_free(),
> but as Vlad suggests, we want to trap such bugs and thus leave
> it poisoned as is. Stress-testing seems fine now.
> 
> Fixes: cd253f9f357d ("net: sctp: wake up all assocs if sndbuf policy is per socket")
> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
> Cc: Vlad Yasevich <vyasevic@redhat.com>

Acked-by: Vlad Yasevich <vyasevic@redhat.com>

-vlad
> ---
>  v1->v2:
>   - leave list_del()
> 
>  net/sctp/socket.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index 5f83a6a..270d5bd 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -6604,6 +6604,12 @@ static void sctp_wake_up_waiters(struct sock *sk,
>  	if (asoc->ep->sndbuf_policy)
>  		return __sctp_write_space(asoc);
>  
> +	/* If association goes down and is just flushing its
> +	 * outq, then just normally notify others.
> +	 */
> +	if (asoc->base.dead)
> +		return sctp_write_space(sk);
> +
>  	/* Accounting for the sndbuf space is per socket, so we
>  	 * need to wake up others, try to be fair and in case of
>  	 * other associations, let them have a go first instead
> 

Vladislav Yasevich April 9, 2014, 12:56 p.m. UTC | #6
On 04/09/2014 06:32 AM, Daniel Borkmann wrote:
> On 04/09/2014 10:09 AM, Daniel Borkmann wrote:
>> On 04/09/2014 01:10 AM, Vlad Yasevich wrote:
>>  > On 04/08/2014 06:23 PM, Daniel Borkmann wrote:
>>  >> In function sctp_wake_up_waiters() we need to involve a test
>>  >> if the association is declared dead. If so, we don't have any
>>  >> reference to a possible sibling association anymore and need
>>  >> to invoke sctp_write_space() instead and normally walk the
>>  >> socket's associations and notify them of new wmem space. The
>>  >> reason for special casing is that, otherwise, we could run
>>  >> into the following issue:
>>  >>
>>  >> sctp_association_free()
>>  >> `-> list_del(&asoc->asocs)         <-- poisons list pointer
>>  >>      asoc->base.dead = true
>>  >>      sctp_outq_free(&asoc->outqueue)
>>  >>      `-> __sctp_outq_teardown()
>>  >>       `-> sctp_chunk_free()
>>  >>        `-> consume_skb()
>>  >>         `-> sctp_wfree()
>>  >>          `-> sctp_wake_up_waiters() <-- dereferences poisoned pointers
>>  >>                                         if asoc->ep->sndbuf_policy=0
>>  >>
>>  >> Therefore, only walk the list in an 'optimized' way if we find
>>  >> that the current association is still active. It's also more
>>  >> clean in that context to just use list_del_init() when we call
>>  >> sctp_association_free(). Stress-testing seems fine now.
>>  >
>>  > One of the reasons that we don't use list_del_init() here is that
>>  > we want to be able to trap on uninitialized/corrupt list manipulation,
>>  > just like you did.  If it wasn't there, the bug would have been hidden.
>>  >
>>  > Please keep it there.  The rest of the patch is fine.
>>
>> Test run over night and I've seen no issues.
>>
>> But I'd still question the usage of asoc->base.dead though, I think
>> this approach of testing for asoc->base.dead is a bit racy (perhaps
>> general usage of it, imho) - at least here there's a tiny window where
>> we poison pointers before we actually declare the association dead.
>>
>> Also, I think even if we would have deleted ourselves from the list
>> after declaring the association dead, a different CPU accessing this
>> association via sctp_wfree() might already have gotten past the
>> asoc->base.dead test while we declare it dead in the meantime.
> 
> Ok, I think we can scratch that thought ... what happens is that parallel
> calls to sctp_sendmsg() are protected under lock_sock()/release_sock()
> pair as already stated in the code and within that lock, we are setting
> sctp_set_owner_w() for each chunk. When we call sctp_primitive_SEND(),
> still under lock, we might eventually end up in sctp_packet_transmit(),
> if I follow the path correctly, and orphan the skb in sctp_packet_set_owner_w()
> [ which basically would mean, we actually uncharge the accounted memory by
> orphaning _before_ we call dev_queue_xmit() since commit 4c3a5bdae293
> ("sctp: Don't charge for data in sndbuf again when transmitting packet")
> but that's perhaps a different story ] and set a new destructor. The
> only thing where in that context an association can be freed up by
> sctp_association_free() is if sctp_primitive_SEND() returns with error.
> So even in that case, we're still protected under lock_sock()/release_sock()
> when we flush the outq, so testing asoc->base.dead should be okay then,
> quite unintuitive though. Thus, patch seems fine, if wished, I could
> still document that in the commit message? Vlad, are we on the same page? ;)

Yes, the socket lock protects the reading of and writing to any association
variables.

-vlad

Daniel Borkmann April 9, 2014, 1:34 p.m. UTC | #7
On 04/09/2014 12:59 PM, Neil Horman wrote:
...
> Both to the patch and to the documentation, as it's not at all clear which lock
> protects the reading and writing of the dead variable.

Agreed, I'll send a v3 with an improved commit message to
summarize everything for future reference. Thanks everyone.

Patch

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 5f83a6a..270d5bd 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -6604,6 +6604,12 @@  static void sctp_wake_up_waiters(struct sock *sk,
 	if (asoc->ep->sndbuf_policy)
 		return __sctp_write_space(asoc);
 
+	/* If association goes down and is just flushing its
+	 * outq, then just normally notify others.
+	 */
+	if (asoc->base.dead)
+		return sctp_write_space(sk);
+
 	/* Accounting for the sndbuf space is per socket, so we
 	 * need to wake up others, try to be fair and in case of
 	 * other associations, let them have a go first instead
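
The order of the checks in the patched function can be illustrated with a
small user-space model. This is only a sketch: the demo_* types and the
enum are hypothetical, and the real function performs wake-ups rather than
returning a label.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the three branches of the patched function. */
enum demo_path {
	DEMO_PER_ASSOC,		/* __sctp_write_space(asoc) */
	DEMO_DEAD_FALLBACK,	/* sctp_write_space(sk) */
	DEMO_FAIR_WALK		/* walk the list starting at asoc->asocs */
};

struct demo_ep {
	int sndbuf_policy;
};

struct demo_asoc {
	struct demo_ep *ep;
	bool dead;		/* models asoc->base.dead */
};

/* Mirrors the decision order in the patch: the per-association policy
 * is tested first, then the dead flag, and only a live association
 * reaches the list walk.
 */
static enum demo_path demo_wake_path(const struct demo_asoc *asoc)
{
	if (asoc->ep->sndbuf_policy)
		return DEMO_PER_ASSOC;

	if (asoc->dead)
		return DEMO_DEAD_FALLBACK;

	return DEMO_FAIR_WALK;
}

/* Convenience wrapper that builds the model objects for one case. */
static enum demo_path demo_path_for(int sndbuf_policy, bool dead)
{
	struct demo_ep ep = { .sndbuf_policy = sndbuf_policy };
	struct demo_asoc asoc = { .ep = &ep, .dead = dead };

	return demo_wake_path(&asoc);
}
```

In this model a dead association with sndbuf_policy=0 never reaches the
walk that starts from its own, already poisoned, list node.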