diff mbox

[PATCHv2] sctp: Enforce retransmission limit during shutdown

Message ID 20110704135019.GA801@canuck.infradead.org
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Thomas Graf July 4, 2011, 1:50 p.m. UTC
When initiating a graceful shutdown while having data chunks
on the retransmission queue with a peer which is in zero
window mode the shutdown is never completed because the
retransmission error count is reset periodically by the
following two rules:

 - Do not timeout association while doing zero window probe.
 - Reset overall error count when a heartbeat request has
   been acknowledged.

The graceful shutdown will wait for all outstanding TSN to
be acknowledged before sending the SHUTDOWN request. This
never happens due to the peer's zero window not acknowledging
the continuously retransmitted data chunks. Although the
error counter is incremented for each failed retransmission,
the receiving of the SACK announcing the zero window clears
the error count again immediately. Also heartbeat requests
continue to be sent periodically. The peer acknowledges these
requests causing the error counter to be reset as well.

This patch changes behaviour to only reset the overall error
counter for the above rules while not in shutdown. After
reaching the maximum number of retransmission attempts, the
T5 shutdown guard timer is scheduled to give the receiver
some additional time to recover. The timer is stopped as soon
as the receiver acknowledges any data.

The issue can be easily reproduced by establishing a sctp
association over the loopback device, constantly queueing
data at the sender while not reading any at the receiver.
Wait for the window to reach zero, then initiate a shutdown
by killing both processes simultaneously. The association
will never be freed and the chunks on the retransmission
queue will be retransmitted indefinitely.

Signed-off-by: Thomas Graf <tgraf@infradead.org>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Miller July 6, 2011, 7:24 a.m. UTC | #1
Vlad, SCTP folks, please review this patch.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Neil Horman July 6, 2011, 12:15 p.m. UTC | #2
On Mon, Jul 04, 2011 at 09:50:19AM -0400, Thomas Graf wrote:
> When initiating a graceful shutdown while having data chunks
> on the retransmission queue with a peer which is in zero
> window mode the shutdown is never completed because the
> retransmission error count is reset periodically by the
> following two rules:
> 
>  - Do not timeout association while doing zero window probe.
>  - Reset overall error count when a heartbeat request has
>    been acknowledged.
> 
> The graceful shutdown will wait for all outstanding TSN to
> be acknowledged before sending the SHUTDOWN request. This
> never happens due to the peer's zero window not acknowledging
> the continuously retransmitted data chunks. Although the
> error counter is incremented for each failed retransmission,
> the receiving of the SACK announcing the zero window clears
> the error count again immediately. Also heartbeat requests
> continue to be sent periodically. The peer acknowledges these
> requests causing the error counter to be reset as well.
> 
> This patch changes behaviour to only reset the overall error
> counter for the above rules while not in shutdown. After
> reaching the maximum number of retransmission attempts, the
> T5 shutdown guard timer is scheduled to give the receiver
> some additional time to recover. The timer is stopped as soon
> as the receiver acknowledges any data.
> 
> The issue can be easily reproduced by establishing a sctp
> association over the loopback device, constantly queueing
> data at the sender while not reading any at the receiver.
> Wait for the window to reach zero, then initiate a shutdown
> by killing both processes simultaneously. The association
> will never be freed and the chunks on the retransmission
> queue will be retransmitted indefinitely.
> 
> Signed-off-by: Thomas Graf <tgraf@infradead.org>
<snip>
> --- a/net/sctp/sm_statefuns.c
> +++ b/net/sctp/sm_statefuns.c
> @@ -5154,7 +5154,7 @@ sctp_disposition_t sctp_sf_do_9_2_start_shutdown(
>  	 * The sender of the SHUTDOWN MAY also start an overall guard timer
>  	 * 'T5-shutdown-guard' to bound the overall time for shutdown sequence.
>  	 */
> -	sctp_add_cmd_sf(commands, SCTP_CMD_TIMER_START,
> +	sctp_add_cmd_sf(commands, SCTP_CMD_TIMER_RESTART,
>  			SCTP_TO(SCTP_EVENT_TIMEOUT_T5_SHUTDOWN_GUARD));
>  
How come you're modifying this chunk to use TIMER_RESTART rather than
TIMER_START? start shutdown is where the t5 timer is actually started, isn't it?


The rest, I think looks ok to me.
Neil
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Thomas Graf July 6, 2011, 1:16 p.m. UTC | #3
On Wed, 2011-07-06 at 08:15 -0400, Neil Horman wrote: 
> On Mon, Jul 04, 2011 at 09:50:19AM -0400, Thomas Graf wrote:

> > --- a/net/sctp/sm_statefuns.c
> > +++ b/net/sctp/sm_statefuns.c
> > @@ -5154,7 +5154,7 @@ sctp_disposition_t sctp_sf_do_9_2_start_shutdown(
> >  	 * The sender of the SHUTDOWN MAY also start an overall guard timer
> >  	 * 'T5-shutdown-guard' to bound the overall time for shutdown sequence.
> >  	 */
> > -	sctp_add_cmd_sf(commands, SCTP_CMD_TIMER_START,
> > +	sctp_add_cmd_sf(commands, SCTP_CMD_TIMER_RESTART,
> >  			SCTP_TO(SCTP_EVENT_TIMEOUT_T5_SHUTDOWN_GUARD));
> >  
> How come you're modifying this chunk to use TIMER_RESTART rather than
> TIMER_START? start shutdown is where the t5 timer is actually started, isn't it?

Since we also start the timer in SHUTDOWN_PENDING now if we hit
the retransmission limit the timer may be running already and
needs to be restarted (at least in theory).

In reality the timer should be stopped though, we can only go
from SHUTDOWN_PENDING into SHUTDOWN by actually SACKing bytes which
will delete the timer. This may change though and I did not want
this to bite us later on.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vlad Yasevich July 6, 2011, 1:42 p.m. UTC | #4
Hi Tomas

Some minor nits and one substantial issue.  See below.

On a related note, were you going to re-submit the receiver patch as well?

On 07/04/2011 09:50 AM, Thomas Graf wrote:
> 
> diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
> index 1c88c89..0ae911f 100644
> --- a/net/sctp/outqueue.c
> +++ b/net/sctp/outqueue.c
> @@ -1582,6 +1582,9 @@ static void sctp_check_transmitted(struct sctp_outq *q,
>  #endif /* SCTP_DEBUG */
>  	if (transport) {
>  		if (bytes_acked) {
> +			struct sctp_association *asoc = transport->asoc;
> +			struct timer_list *t;
> +
>  			/* We may have counted DATA that was migrated
>  			 * to this transport due to DEL-IP operation.
>  			 * Subtract those bytes, since the were never
> @@ -1600,6 +1603,17 @@ static void sctp_check_transmitted(struct sctp_outq *q,
>  			transport->error_count = 0;
>  			transport->asoc->overall_error_count = 0;
>  
> +			/*
> +			 * While in SHUTDOWN PENDING, we may have started
> +			 * the T5 shutdown guard timer after reaching the
> +			 * retransmission limit. Stop that timer as soon
> +			 * as the receiver acknowledged any data.
> +			 */
> +			t = &asoc->timers[SCTP_EVENT_TIMEOUT_T5_SHUTDOWN_GUARD];
> +			if (asoc->state == SCTP_STATE_SHUTDOWN_PENDING &&
> +			    timer_pending(t) && del_timer(t))
> +				sctp_association_put(asoc);
> +

I believe 'state' and 'timers' are in different cache lines, so might be able to optimize it
a little by checking the state prior to referencing timers array.

>  			/* Mark the destination transport address as
>  			 * active if it is not so marked.
>  			 */
> @@ -1629,10 +1643,15 @@ static void sctp_check_transmitted(struct sctp_outq *q,
>  			 * A sender is doing zero window probing when the
>  			 * receiver's advertised window is zero, and there is
>  			 * only one data chunk in flight to the receiver.
> +			 *
> +			 * Allow the association to timeout if SHUTDOWN is
> +			 * pending in case the receiver stays in zero window
> +			 * mode forever.
>  			 */
>  			if (!q->asoc->peer.rwnd &&
>  			    !list_empty(&tlist) &&
> -			    (sack_ctsn+2 == q->asoc->next_tsn)) {
> +			    (sack_ctsn+2 == q->asoc->next_tsn) &&
> +			    !(q->asoc->state >= SCTP_STATE_SHUTDOWN_PENDING)) {

Would a test for (q->asoc->state != SCTP_STATE_SHUTDOWN_PENDING) be clearer?  We only
care about the PENDING state here.

>  				SCTP_DEBUG_PRINTK("%s: SACK received for zero "
>  						  "window probe: %u\n",
>  						  __func__, sack_ctsn);
> diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
> index 534c2e5..fa92f4d6 100644
> --- a/net/sctp/sm_sideeffect.c
> +++ b/net/sctp/sm_sideeffect.c
> @@ -670,10 +670,21 @@ static void sctp_cmd_transport_on(sctp_cmd_seq_t *cmds,
>  	/* 8.3 Upon the receipt of the HEARTBEAT ACK, the sender of the
>  	 * HEARTBEAT should clear the error counter of the destination
>  	 * transport address to which the HEARTBEAT was sent.
> -	 * The association's overall error count is also cleared.
>  	 */
>  	t->error_count = 0;
> -	t->asoc->overall_error_count = 0;
> +
> +	/*
> +	 * Although RFC2960 and RFC4460 specify that the overall error
> +	 * count must be cleared when a HEARTBEAT ACK is received this
> +	 * behaviour may prevent the maximum retransmission count from
> +	 * being reached while in SHUTDOWN. If the peer keeps its window
> +	 * closed not acknowledging any outstanding TSN we may rely on
> +	 * reaching the max_retrans limit via the T3-rtx timer to close
> +	 * the association which will never happen if the error count is
> +	 * reset every heartbeat interval.
> +	 */
> +	if (!(t->asoc->state >= SCTP_STATE_SHUTDOWN_PENDING))
> +		t->asoc->overall_error_count = 0;

Same here.  We only care about the PENDING state. Also, please fix the comment to reflect
the code.

>  
>  	/* Clear the hb_sent flag to signal that we had a good
>  	 * acknowledgement.
> diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
> index a297283..e6a0c35 100644
> --- a/net/sctp/sm_statefuns.c
> +++ b/net/sctp/sm_statefuns.c
> @@ -5154,7 +5154,7 @@ sctp_disposition_t sctp_sf_do_9_2_start_shutdown(
>  	 * The sender of the SHUTDOWN MAY also start an overall guard timer
>  	 * 'T5-shutdown-guard' to bound the overall time for shutdown sequence.
>  	 */
> -	sctp_add_cmd_sf(commands, SCTP_CMD_TIMER_START,
> +	sctp_add_cmd_sf(commands, SCTP_CMD_TIMER_RESTART,
>  			SCTP_TO(SCTP_EVENT_TIMEOUT_T5_SHUTDOWN_GUARD));
>  
>  	if (asoc->autoclose)
> @@ -5299,14 +5299,28 @@ sctp_disposition_t sctp_sf_do_6_3_3_rtx(const struct sctp_endpoint *ep,
>  	SCTP_INC_STATS(SCTP_MIB_T3_RTX_EXPIREDS);
>  
>  	if (asoc->overall_error_count >= asoc->max_retrans) {
> -		sctp_add_cmd_sf(commands, SCTP_CMD_SET_SK_ERR,
> -				SCTP_ERROR(ETIMEDOUT));
> -		/* CMD_ASSOC_FAILED calls CMD_DELETE_TCB. */
> -		sctp_add_cmd_sf(commands, SCTP_CMD_ASSOC_FAILED,
> -				SCTP_PERR(SCTP_ERROR_NO_ERROR));
> -		SCTP_INC_STATS(SCTP_MIB_ABORTEDS);
> -		SCTP_DEC_STATS(SCTP_MIB_CURRESTAB);
> -		return SCTP_DISPOSITION_DELETE_TCB;
> +		if (asoc->state == SCTP_STATE_SHUTDOWN_PENDING) {
> +			/*
> +			 * We are here likely because the receiver had its rwnd
> +			 * closed for a while and we have not been able to
> +			 * transmit the locally queued data within the maximum
> +			 * retransmission attempts limit.  Start the T5
> +			 * shutdown guard timer to give the receiver one last
> +			 * chance and some additional time to recover before
> +			 * aborting.
> +			 */
> +			sctp_add_cmd_sf(commands, SCTP_CMD_TIMER_RESTART,
> +				SCTP_TO(SCTP_EVENT_TIMEOUT_T5_SHUTDOWN_GUARD));

This is bug.  You don't want to restart the timer every time you hit a T3-timeout.  Remember, since you fall
through here, you do another retransmission and schedule another timeout.  So next time the timeout happens,
you'll restart the SHUTDOWN_GUARD, which is not what you want.

We want to start it once if it isn't pending, and leave it running without restart if it is already pending.

-vlad

> +		} else {
> +			sctp_add_cmd_sf(commands, SCTP_CMD_SET_SK_ERR,
> +					SCTP_ERROR(ETIMEDOUT));
> +			/* CMD_ASSOC_FAILED calls CMD_DELETE_TCB. */
> +			sctp_add_cmd_sf(commands, SCTP_CMD_ASSOC_FAILED,
> +					SCTP_PERR(SCTP_ERROR_NO_ERROR));
> +			SCTP_INC_STATS(SCTP_MIB_ABORTEDS);
> +			SCTP_DEC_STATS(SCTP_MIB_CURRESTAB);
> +			return SCTP_DISPOSITION_DELETE_TCB;
> +		}
>  	}
>  
>  	/* E1) For the destination address for which the timer
> diff --git a/net/sctp/sm_statetable.c b/net/sctp/sm_statetable.c
> index 0338dc6..7c211a7 100644
> --- a/net/sctp/sm_statetable.c
> +++ b/net/sctp/sm_statetable.c
> @@ -827,7 +827,7 @@ static const sctp_sm_table_entry_t other_event_table[SCTP_NUM_OTHER_TYPES][SCTP_
>  	/* SCTP_STATE_ESTABLISHED */ \
>  	TYPE_SCTP_FUNC(sctp_sf_timer_ignore), \
>  	/* SCTP_STATE_SHUTDOWN_PENDING */ \
> -	TYPE_SCTP_FUNC(sctp_sf_timer_ignore), \
> +	TYPE_SCTP_FUNC(sctp_sf_t5_timer_expire), \
>  	/* SCTP_STATE_SHUTDOWN_SENT */ \
>  	TYPE_SCTP_FUNC(sctp_sf_t5_timer_expire), \
>  	/* SCTP_STATE_SHUTDOWN_RECEIVED */ \
> 

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Thomas Graf July 6, 2011, 2:18 p.m. UTC | #5
On Wed, Jul 06, 2011 at 09:42:42AM -0400, Vladislav Yasevich wrote:
> On a related note, were you going to re-submit the receiver patch as well?

Yes

> On 07/04/2011 09:50 AM, Thomas Graf wrote:
> > +			 * retransmission limit. Stop that timer as soon
> > +			 * as the receiver acknowledged any data.
> > +			 */
> > +			t = &asoc->timers[SCTP_EVENT_TIMEOUT_T5_SHUTDOWN_GUARD];
> > +			if (asoc->state == SCTP_STATE_SHUTDOWN_PENDING &&
> > +			    timer_pending(t) && del_timer(t))
> > +				sctp_association_put(asoc);
> > +
> 
> I believe 'state' and 'timers' are in different cache lines, so might be able to optimize it
> a little by checking the state prior to referencing timers array.

gcc should do that but I'm fine with changing it.

> > +			 *
> > +			 * Allow the association to timeout if SHUTDOWN is
> > +			 * pending in case the receiver stays in zero window
> > +			 * mode forever.
> >  			 */
> >  			if (!q->asoc->peer.rwnd &&
> >  			    !list_empty(&tlist) &&
> > -			    (sack_ctsn+2 == q->asoc->next_tsn)) {
> > +			    (sack_ctsn+2 == q->asoc->next_tsn) &&
> > +			    !(q->asoc->state >= SCTP_STATE_SHUTDOWN_PENDING)) {
> 
> Would a test for (q->asoc->state != SCTP_STATE_SHUTDOWN_PENDING) be clearer?  We only
> care about the PENDING state here.

I think SHUTDOWN_RECEIVED should also be included. We continue to transmit and
process SACKs after receiving a SHUTDOWN.

> > +	 * Although RFC2960 and RFC4460 specify that the overall error
> > +	 * count must be cleared when a HEARTBEAT ACK is received this
> > +	 * behaviour may prevent the maximum retransmission count from
> > +	 * being reached while in SHUTDOWN. If the peer keeps its window
> > +	 * closed not acknowledging any outstanding TSN we may rely on
> > +	 * reaching the max_retrans limit via the T3-rtx timer to close
> > +	 * the association which will never happen if the error count is
> > +	 * reset every heartbeat interval.
> > +	 */
> > +	if (!(t->asoc->state >= SCTP_STATE_SHUTDOWN_PENDING))
> > +		t->asoc->overall_error_count = 0;
> 
> Same here.  We only care about the PENDING state. Also, please fix the comment to reflect
> the code.

Agreed.

> > +		if (asoc->state == SCTP_STATE_SHUTDOWN_PENDING) {
> > +			/*
> > +			 * We are here likely because the receiver had its rwnd
> > +			 * closed for a while and we have not been able to
> > +			 * transmit the locally queued data within the maximum
> > +			 * retransmission attempts limit.  Start the T5
> > +			 * shutdown guard timer to give the receiver one last
> > +			 * chance and some additional time to recover before
> > +			 * aborting.
> > +			 */
> > +			sctp_add_cmd_sf(commands, SCTP_CMD_TIMER_RESTART,
> > +				SCTP_TO(SCTP_EVENT_TIMEOUT_T5_SHUTDOWN_GUARD));
> 
> This is bug.  You don't want to restart the timer every time you hit a T3-timeout.  Remember, since you fall
> through here, you do another retransmission and schedule another timeout.  So next time the timeout happens,
> you'll restart the SHUTDOWN_GUARD, which is not what you want.
> 
> We want to start it once if it isn't pending, and leave it running without restart if it is already pending.

Doh, absolutely. The timer_pending() check got lost between testing and submission.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Neil Horman July 6, 2011, 2:19 p.m. UTC | #6
On Wed, Jul 06, 2011 at 03:16:24PM +0200, Thomas Graf wrote:
> On Wed, 2011-07-06 at 08:15 -0400, Neil Horman wrote: 
> > On Mon, Jul 04, 2011 at 09:50:19AM -0400, Thomas Graf wrote:
> 
> > > --- a/net/sctp/sm_statefuns.c
> > > +++ b/net/sctp/sm_statefuns.c
> > > @@ -5154,7 +5154,7 @@ sctp_disposition_t sctp_sf_do_9_2_start_shutdown(
> > >  	 * The sender of the SHUTDOWN MAY also start an overall guard timer
> > >  	 * 'T5-shutdown-guard' to bound the overall time for shutdown sequence.
> > >  	 */
> > > -	sctp_add_cmd_sf(commands, SCTP_CMD_TIMER_START,
> > > +	sctp_add_cmd_sf(commands, SCTP_CMD_TIMER_RESTART,
> > >  			SCTP_TO(SCTP_EVENT_TIMEOUT_T5_SHUTDOWN_GUARD));
> > >  
> > How come you're modifying this chunk to use TIMER_RESTART rather than
> > TIMER_START? start shutdown is where the t5 timer is actually started, isn't it?
> 
> Since we also start the timer in SHUTDOWN_PENDING now if we hit
> the retransmission limit the timer may be running already and
> needs to be restarted (at least in theory).
> 
> In reality the timer should be stopped though, we can only go
> from SHUTDOWN_PENDING into SHUTDOWN by actually SACKing bytes which
> will delete the timer. This may change though and I did not want
> this to bite us later on.
> 
> 
> 
Ok, makes sense
Acked-by: Neil Horman <nhorman@tuxdriver.com>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vlad Yasevich July 6, 2011, 2:31 p.m. UTC | #7
On 07/06/2011 10:18 AM, Thomas Graf wrote:
> On Wed, Jul 06, 2011 at 09:42:42AM -0400, Vladislav Yasevich wrote:
>> On a related note, were you going to re-submit the receiver patch as well?
> 
> Yes
> 
>> On 07/04/2011 09:50 AM, Thomas Graf wrote:
>>> +			 * retransmission limit. Stop that timer as soon
>>> +			 * as the receiver acknowledged any data.
>>> +			 */
>>> +			t = &asoc->timers[SCTP_EVENT_TIMEOUT_T5_SHUTDOWN_GUARD];
>>> +			if (asoc->state == SCTP_STATE_SHUTDOWN_PENDING &&
>>> +			    timer_pending(t) && del_timer(t))
>>> +				sctp_association_put(asoc);
>>> +
>>
>> I believe 'state' and 'timers' are in different cache lines, so might be able to optimize it
>> a little by checking the state prior to referencing timers array.
> 
> gcc should do that but I'm fine with changing it.
> 
>>> +			 *
>>> +			 * Allow the association to timeout if SHUTDOWN is
>>> +			 * pending in case the receiver stays in zero window
>>> +			 * mode forever.
>>>  			 */
>>>  			if (!q->asoc->peer.rwnd &&
>>>  			    !list_empty(&tlist) &&
>>> -			    (sack_ctsn+2 == q->asoc->next_tsn)) {
>>> +			    (sack_ctsn+2 == q->asoc->next_tsn) &&
>>> +			    !(q->asoc->state >= SCTP_STATE_SHUTDOWN_PENDING)) {
>>
>> Would a test for (q->asoc->state != SCTP_STATE_SHUTDOWN_PENDING) be clearer?  We only
>> care about the PENDING state here.
> 
> I think SHUTDOWN_RECEIVED should also be included. We continue to transmit and
> process SACKs after receiving a SHUTDOWN.

I am not sure about SHUTDOWN_RECEIVED.  If we received shutdown, then we are not in
a 0 window situation.  Additionally, the sender of the SHUTDOWN started the GUARD timer
and will abort after it expires.  So there is no special handling on our part.

-vlad

> 
>>> +	 * Although RFC2960 and RFC4460 specify that the overall error
>>> +	 * count must be cleared when a HEARTBEAT ACK is received this
>>> +	 * behaviour may prevent the maximum retransmission count from
>>> +	 * being reached while in SHUTDOWN. If the peer keeps its window
>>> +	 * closed not acknowledging any outstanding TSN we may rely on
>>> +	 * reaching the max_retrans limit via the T3-rtx timer to close
>>> +	 * the association which will never happen if the error count is
>>> +	 * reset every heartbeat interval.
>>> +	 */
>>> +	if (!(t->asoc->state >= SCTP_STATE_SHUTDOWN_PENDING))
>>> +		t->asoc->overall_error_count = 0;
>>
>> Same here.  We only care about the PENDING state. Also, please fix the comment to reflect
>> the code.
> 
> Agreed.
> 
>>> +		if (asoc->state == SCTP_STATE_SHUTDOWN_PENDING) {
>>> +			/*
>>> +			 * We are here likely because the receiver had its rwnd
>>> +			 * closed for a while and we have not been able to
>>> +			 * transmit the locally queued data within the maximum
>>> +			 * retransmission attempts limit.  Start the T5
>>> +			 * shutdown guard timer to give the receiver one last
>>> +			 * chance and some additional time to recover before
>>> +			 * aborting.
>>> +			 */
>>> +			sctp_add_cmd_sf(commands, SCTP_CMD_TIMER_RESTART,
>>> +				SCTP_TO(SCTP_EVENT_TIMEOUT_T5_SHUTDOWN_GUARD));
>>
>> This is bug.  You don't want to restart the timer every time you hit a T3-timeout.  Remember, since you fall
>> through here, you do another retransmission and schedule another timeout.  So next time the timeout happens,
>> you'll restart the SHUTDOWN_GUARD, which is not what you want.
>>
>> We want to start it once if it isn't pending, and leave it running without restart if it is already pending.
> 
> Doh, absolutely. The timer_pending() check got lost between testing and submission.
> 

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Thomas Graf July 6, 2011, 3:49 p.m. UTC | #8
On Wed, Jul 06, 2011 at 10:31:56AM -0400, Vladislav Yasevich wrote:
> >>> +			 *
> >>> +			 * Allow the association to timeout if SHUTDOWN is
> >>> +			 * pending in case the receiver stays in zero window
> >>> +			 * mode forever.
> >>>  			 */
> >>>  			if (!q->asoc->peer.rwnd &&
> >>>  			    !list_empty(&tlist) &&
> >>> -			    (sack_ctsn+2 == q->asoc->next_tsn)) {
> >>> +			    (sack_ctsn+2 == q->asoc->next_tsn) &&
> >>> +			    !(q->asoc->state >= SCTP_STATE_SHUTDOWN_PENDING)) {
> >>
> >> Would a test for (q->asoc->state != SCTP_STATE_SHUTDOWN_PENDING) be clearer?  We only
> >> care about the PENDING state here.
> > 
> > I think SHUTDOWN_RECEIVED should also be included. We continue to transmit and
> > process SACKs after receiving a SHUTDOWN.
> 
> I am not sure about SHUTDOWN_RECEIVED.  If we received shutdown, then we are not in
> a 0 window situation.  Additionally, the sender of the SHUTDOWN started the GUARD timer
> and will abort after it expires.  So there is no special handling on our part.

Why can't we be in a 0 window situation? A well behaving sctp peer may not,
but we're on the Internet, everyone behaves at their worst :-)

Seriously, this would make for a simple dos. Establish a stream, don't ack any
data to make sure there is something on the retransmission queue of the peer.
Immediately shutdown the stream and ack any retransmission attempt with
a_rwnd=0 to keep the association around forever.

Starting the T5 SHUTDOWN GUARD timer is specified as MAY and not MUST so even in
a well behaving world we could not really rely on it.

Alternatively the peer could just be buggy as well.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vlad Yasevich July 6, 2011, 4:23 p.m. UTC | #9
On 07/06/2011 11:49 AM, Thomas Graf wrote:
> On Wed, Jul 06, 2011 at 10:31:56AM -0400, Vladislav Yasevich wrote:
>>>>> +			 *
>>>>> +			 * Allow the association to timeout if SHUTDOWN is
>>>>> +			 * pending in case the receiver stays in zero window
>>>>> +			 * mode forever.
>>>>>  			 */
>>>>>  			if (!q->asoc->peer.rwnd &&
>>>>>  			    !list_empty(&tlist) &&
>>>>> -			    (sack_ctsn+2 == q->asoc->next_tsn)) {
>>>>> +			    (sack_ctsn+2 == q->asoc->next_tsn) &&
>>>>> +			    !(q->asoc->state >= SCTP_STATE_SHUTDOWN_PENDING)) {
>>>>
>>>> Would a test for (q->asoc->state != SCTP_STATE_SHUTDOWN_PENDING) be clearer?  We only
>>>> care about the PENDING state here.
>>>
>>> I think SHUTDOWN_RECEIVED should also be included. We continue to transmit and
>>> process SACKs after receiving a SHUTDOWN.
>>
>> I am not sure about SHUTDOWN_RECEIVED.  If we received shutdown, then we are not in
>> a 0 window situation.  Additionally, the sender of the SHUTDOWN started the GUARD timer
>> and will abort after it expires.  So there is no special handling on our part.
> 
> Why can't we be in a 0 window situation? A well behaving sctp peer may not,
> but we're on the Internet, everyone behaves at their worst :-)
> 
> Seriously, this would make for a simple dos. Establish a stream, don't ack any
> data to make sure there is something on the retransmission queue of the peer.
> Immediately shutdown the stream and ack any retransmission attempt with
> a_rwnd=0 to keep the association around forever.
> 
> Starting the T5 SHUTDOWN GUARD timer is specified as MAY and not MUST so even in
> a well behaving world we could not really rely on it.
> 
> Alternatively the peer could just be buggy as well.
> 

You are right.  Without a receiver patch, a linux receiver would stay in 0-window condition
while sending a SHUTDOWN with a_rwnd of 0.

How about instead of checking for "Not greater then or equals", we instead simply test for
"less then"?

-vlad
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Thomas Graf July 6, 2011, 9:58 p.m. UTC | #10
On Wed, Jul 06, 2011 at 12:23:50PM -0400, Vladislav Yasevich wrote:
> You are right.  Without a receiver patch, a linux receiver would stay in 0-window condition
> while sending a SHUTDOWN with a_rwnd of 0.
> 
> How about instead of checking for "Not greater then or equals", we instead simply test for
> "less then"?

Agreed

Will repost the patch with your suggestions included and look into the
receiver patch as well.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index 1c88c89..0ae911f 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -1582,6 +1582,9 @@  static void sctp_check_transmitted(struct sctp_outq *q,
 #endif /* SCTP_DEBUG */
 	if (transport) {
 		if (bytes_acked) {
+			struct sctp_association *asoc = transport->asoc;
+			struct timer_list *t;
+
 			/* We may have counted DATA that was migrated
 			 * to this transport due to DEL-IP operation.
 			 * Subtract those bytes, since the were never
@@ -1600,6 +1603,17 @@  static void sctp_check_transmitted(struct sctp_outq *q,
 			transport->error_count = 0;
 			transport->asoc->overall_error_count = 0;
 
+			/*
+			 * While in SHUTDOWN PENDING, we may have started
+			 * the T5 shutdown guard timer after reaching the
+			 * retransmission limit. Stop that timer as soon
+			 * as the receiver acknowledged any data.
+			 */
+			t = &asoc->timers[SCTP_EVENT_TIMEOUT_T5_SHUTDOWN_GUARD];
+			if (asoc->state == SCTP_STATE_SHUTDOWN_PENDING &&
+			    timer_pending(t) && del_timer(t))
+				sctp_association_put(asoc);
+
 			/* Mark the destination transport address as
 			 * active if it is not so marked.
 			 */
@@ -1629,10 +1643,15 @@  static void sctp_check_transmitted(struct sctp_outq *q,
 			 * A sender is doing zero window probing when the
 			 * receiver's advertised window is zero, and there is
 			 * only one data chunk in flight to the receiver.
+			 *
+			 * Allow the association to timeout if SHUTDOWN is
+			 * pending in case the receiver stays in zero window
+			 * mode forever.
 			 */
 			if (!q->asoc->peer.rwnd &&
 			    !list_empty(&tlist) &&
-			    (sack_ctsn+2 == q->asoc->next_tsn)) {
+			    (sack_ctsn+2 == q->asoc->next_tsn) &&
+			    !(q->asoc->state >= SCTP_STATE_SHUTDOWN_PENDING)) {
 				SCTP_DEBUG_PRINTK("%s: SACK received for zero "
 						  "window probe: %u\n",
 						  __func__, sack_ctsn);
diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
index 534c2e5..fa92f4d6 100644
--- a/net/sctp/sm_sideeffect.c
+++ b/net/sctp/sm_sideeffect.c
@@ -670,10 +670,21 @@  static void sctp_cmd_transport_on(sctp_cmd_seq_t *cmds,
 	/* 8.3 Upon the receipt of the HEARTBEAT ACK, the sender of the
 	 * HEARTBEAT should clear the error counter of the destination
 	 * transport address to which the HEARTBEAT was sent.
-	 * The association's overall error count is also cleared.
 	 */
 	t->error_count = 0;
-	t->asoc->overall_error_count = 0;
+
+	/*
+	 * Although RFC2960 and RFC4460 specify that the overall error
+	 * count must be cleared when a HEARTBEAT ACK is received this
+	 * behaviour may prevent the maximum retransmission count from
+	 * being reached while in SHUTDOWN. If the peer keeps its window
+	 * closed not acknowledging any outstanding TSN we may rely on
+	 * reaching the max_retrans limit via the T3-rtx timer to close
+	 * the association which will never happen if the error count is
+	 * reset every heartbeat interval.
+	 */
+	if (!(t->asoc->state >= SCTP_STATE_SHUTDOWN_PENDING))
+		t->asoc->overall_error_count = 0;
 
 	/* Clear the hb_sent flag to signal that we had a good
 	 * acknowledgement.
diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
index a297283..e6a0c35 100644
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -5154,7 +5154,7 @@  sctp_disposition_t sctp_sf_do_9_2_start_shutdown(
 	 * The sender of the SHUTDOWN MAY also start an overall guard timer
 	 * 'T5-shutdown-guard' to bound the overall time for shutdown sequence.
 	 */
-	sctp_add_cmd_sf(commands, SCTP_CMD_TIMER_START,
+	sctp_add_cmd_sf(commands, SCTP_CMD_TIMER_RESTART,
 			SCTP_TO(SCTP_EVENT_TIMEOUT_T5_SHUTDOWN_GUARD));
 
 	if (asoc->autoclose)
@@ -5299,14 +5299,28 @@  sctp_disposition_t sctp_sf_do_6_3_3_rtx(const struct sctp_endpoint *ep,
 	SCTP_INC_STATS(SCTP_MIB_T3_RTX_EXPIREDS);
 
 	if (asoc->overall_error_count >= asoc->max_retrans) {
-		sctp_add_cmd_sf(commands, SCTP_CMD_SET_SK_ERR,
-				SCTP_ERROR(ETIMEDOUT));
-		/* CMD_ASSOC_FAILED calls CMD_DELETE_TCB. */
-		sctp_add_cmd_sf(commands, SCTP_CMD_ASSOC_FAILED,
-				SCTP_PERR(SCTP_ERROR_NO_ERROR));
-		SCTP_INC_STATS(SCTP_MIB_ABORTEDS);
-		SCTP_DEC_STATS(SCTP_MIB_CURRESTAB);
-		return SCTP_DISPOSITION_DELETE_TCB;
+		if (asoc->state == SCTP_STATE_SHUTDOWN_PENDING) {
+			/*
+			 * We are here likely because the receiver had its rwnd
+			 * closed for a while and we have not been able to
+			 * transmit the locally queued data within the maximum
+			 * retransmission attempts limit.  Start the T5
+			 * shutdown guard timer to give the receiver one last
+			 * chance and some additional time to recover before
+			 * aborting.
+			 */
+			sctp_add_cmd_sf(commands, SCTP_CMD_TIMER_RESTART,
+				SCTP_TO(SCTP_EVENT_TIMEOUT_T5_SHUTDOWN_GUARD));
+		} else {
+			sctp_add_cmd_sf(commands, SCTP_CMD_SET_SK_ERR,
+					SCTP_ERROR(ETIMEDOUT));
+			/* CMD_ASSOC_FAILED calls CMD_DELETE_TCB. */
+			sctp_add_cmd_sf(commands, SCTP_CMD_ASSOC_FAILED,
+					SCTP_PERR(SCTP_ERROR_NO_ERROR));
+			SCTP_INC_STATS(SCTP_MIB_ABORTEDS);
+			SCTP_DEC_STATS(SCTP_MIB_CURRESTAB);
+			return SCTP_DISPOSITION_DELETE_TCB;
+		}
 	}
 
 	/* E1) For the destination address for which the timer
diff --git a/net/sctp/sm_statetable.c b/net/sctp/sm_statetable.c
index 0338dc6..7c211a7 100644
--- a/net/sctp/sm_statetable.c
+++ b/net/sctp/sm_statetable.c
@@ -827,7 +827,7 @@  static const sctp_sm_table_entry_t other_event_table[SCTP_NUM_OTHER_TYPES][SCTP_
 	/* SCTP_STATE_ESTABLISHED */ \
 	TYPE_SCTP_FUNC(sctp_sf_timer_ignore), \
 	/* SCTP_STATE_SHUTDOWN_PENDING */ \
-	TYPE_SCTP_FUNC(sctp_sf_timer_ignore), \
+	TYPE_SCTP_FUNC(sctp_sf_t5_timer_expire), \
 	/* SCTP_STATE_SHUTDOWN_SENT */ \
 	TYPE_SCTP_FUNC(sctp_sf_t5_timer_expire), \
 	/* SCTP_STATE_SHUTDOWN_RECEIVED */ \