sctp: Enforce maximum retransmissions during shutdown

Message ID	20110629135704.GB10085@canuck.infradead.org
State	Superseded, archived
Delegated to:	David Miller
Headers	show Return-Path: <netdev-owner@vger.kernel.org> Date: Wed, 29 Jun 2011 09:57:04 -0400 From: Thomas Graf <tgraf@infradead.org> To: netdev@vger.kernel.org Cc: davem@davemloft.net, Wei Yongjun <yjwei@cn.fujitsu.com>, Vlad Yasevich <vladislav.yasevich@hp.com>, Sridhar Samudrala <sri@us.ibm.com>, linux-sctp@vger.kernel.org Subject: [PATCH] sctp: Enforce maximum retransmissions during shutdown Message-ID: <20110629135704.GB10085@canuck.infradead.org> Mail-Followup-To: netdev@vger.kernel.org, davem@davemloft.net, Wei Yongjun <yjwei@cn.fujitsu.com>, Vlad Yasevich <vladislav.yasevich@hp.com>, Sridhar Samudrala <sri@us.ibm.com>, linux-sctp@vger.kernel.org MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.20 (2009-08-17) Sender: netdev-owner@vger.kernel.org Precedence: bulk

Thomas Graf June 29, 2011, 1:57 p.m. UTC

When initiating a graceful shutdown while having data chunks
on the retransmission queue with a peer which is in zero
window mode the shutdown is never completed because the
retransmission error count is reset periodically by the
following two rules:

 - Do not timeout association while doing zero window probe.
 - Reset overall error count when a heartbeat request has
   been acknowledged.

The graceful shutdown will wait for all outstanding TSN to be
acknowledged before sending the SHUTDOWN request. This never
happens due to the peer's zero window not acknowledging the
continuously retransmitted data chunks. Although the error
counter is incremented for each failed retransmission done
via the T3-rtx timer, the receiving of the SACK sent in return
to the retransmission, announcing the zero window, clears the
error count again immediately.

Also heartbeat requests continue to be sent periodically. The
peer acknowledges these requests causing the error counter to
be reset as well.

This patch changes behaviour to only reset the overall error
counter for the above rules while not in shutdown. This means
that if already queued data can't be transmitted in max_retrans
attempts we ABORT because a graceful shutdown is obviously not
possible.

The issue can be easily reproduced by establishing a sctp
association over the loopback device, constantly queueing data
at the sender while not reading any at the receiver.  Wait for
the window to reach zero, then initiate a shutdown by killing
both processes simultaneously. The association will never be
freed and the chunks on the retransmission queue will be
retransmitted indefinitely.

Signed-off-by: Thomas Graf <tgraf@infradead.org>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Vlad Yasevich June 29, 2011, 2:20 p.m. UTC | #1

On 06/29/2011 09:57 AM, Thomas Graf wrote:
> When initiating a graceful shutdown while having data chunks
> on the retransmission queue with a peer which is in zero
> window mode the shutdown is never completed because the
> retransmission error count is reset periodically by the
> following two rules:
> 
>  - Do not timeout association while doing zero window probe.
>  - Reset overall error count when a heartbeat request has
>    been acknowledged.
> 
> The graceful shutdown will wait for all outstanding TSN to be
> acknowledged before sending the SHUTDOWN request. This never
> happens due to the peer's zero window not acknowledging the
> continuously retransmitted data chunks. Although the error
> counter is incremented for each failed retransmission done
> via the T3-rtx timer, the receiving of the SACK sent in return
> to the retransmission, announcing the zero window, clears the
> error count again immediately.
> 
> Also heartbeat requests continue to be sent periodically. The
> peer acknowledges these requests causing the error counter to
> be reset as well.
> 
> This patch changes behaviour to only reset the overall error
> counter for the above rules while not in shutdown. This means
> that if already queued data can't be transmitted in max_retrans
> attempts we ABORT because a graceful shutdown is obviously not
> possible.
> 
> The issue can be easily reproduced by establishing a sctp
> association over the loopback device, constantly queueing data
> at the sender while not reading any at the receiver.  Wait for
> the window to reach zero, then initiate a shutdown by killing
> both processes simultaneously. The association will never be
> freed and the chunks on the retransmission queue will be
> retransmitted indefinitely.
> 

Hi Tomas

I think in this particular case, the receiver has to terminate, not the sender.
Look at how tcp_close() handles this.

As long as receiver is available, the sender should continue to try
sending data.

Once the receiver is no longer available (killed), if it had queued data, it should
trigger an ABORT since it lost data.  That ABORT will stop the sender as well.

-vlad

> Signed-off-by: Thomas Graf <tgraf@infradead.org>
> 
> diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
> index 1c88c89..14a5295 100644
> --- a/net/sctp/outqueue.c
> +++ b/net/sctp/outqueue.c
> @@ -1629,10 +1629,15 @@ static void sctp_check_transmitted(struct sctp_outq *q,
>  			 * A sender is doing zero window probing when the
>  			 * receiver's advertised window is zero, and there is
>  			 * only one data chunk in flight to the receiver.
> +			 *
> +			 * Allow the association to timeout if SHUTDOWN is
> +			 * pending. We have no interest in keeping the
> +			 * association around forever.
>  			 */
>  			if (!q->asoc->peer.rwnd &&
>  			    !list_empty(&tlist) &&
> -			    (sack_ctsn+2 == q->asoc->next_tsn)) {
> +			    (sack_ctsn+2 == q->asoc->next_tsn) &&
> +			    !(q->asoc->state >= SCTP_STATE_SHUTDOWN_PENDING)) {
>  				SCTP_DEBUG_PRINTK("%s: SACK received for zero "
>  						  "window probe: %u\n",
>  						  __func__, sack_ctsn);
> diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
> index 534c2e5..fa92f4d6 100644
> --- a/net/sctp/sm_sideeffect.c
> +++ b/net/sctp/sm_sideeffect.c
> @@ -670,10 +670,21 @@ static void sctp_cmd_transport_on(sctp_cmd_seq_t *cmds,
>  	/* 8.3 Upon the receipt of the HEARTBEAT ACK, the sender of the
>  	 * HEARTBEAT should clear the error counter of the destination
>  	 * transport address to which the HEARTBEAT was sent.
> -	 * The association's overall error count is also cleared.
>  	 */
>  	t->error_count = 0;
> -	t->asoc->overall_error_count = 0;
> +
> +	/*
> +	 * Although RFC2960 and RFC4460 specify that the overall error
> +	 * count must be cleared when a HEARTBEAT ACK is received this
> +	 * behaviour may prevent the maximum retransmission count from
> +	 * being reached while in SHUTDOWN. If the peer keeps its window
> +	 * closed not acknowledging any outstanding TSN we may rely on
> +	 * reaching the max_retrans limit via the T3-rtx timer to close
> +	 * the association which will never happen if the error count is
> +	 * reset every heartbeat interval.
> +	 */
> +	if (!(t->asoc->state >= SCTP_STATE_SHUTDOWN_PENDING))
> +		t->asoc->overall_error_count = 0;
>  
>  	/* Clear the hb_sent flag to signal that we had a good
>  	 * acknowledgement.
> 

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Thomas Graf June 29, 2011, 2:36 p.m. UTC | #2

On Wed, Jun 29, 2011 at 10:20:01AM -0400, Vladislav Yasevich wrote:
> I think in this particular case, the receiver has to terminate, not the sender.
> Look at how tcp_close() handles this.
> 
> As long as receiver is available, the sender should continue to try
> sending data.

The receiver does not know that the sender wishes to shutdown the
association. No shutdown request has been sent yet.

I don't think we should be relying on the behaviour of the sender for
the receiver to be able to ever free its ressources. We will be
retransmitting data and keeping an association alive _forever_ for no
purpose.

If there is no reliable way of _ever_ doing a graceful shutdown then
the only alternative is to just ABORT in the first place.

The difference in TCP is that we can close the connection half-way,
something we can't do in sctp.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Vlad Yasevich June 29, 2011, 2:58 p.m. UTC | #3

On 06/29/2011 10:36 AM, Thomas Graf wrote:
> On Wed, Jun 29, 2011 at 10:20:01AM -0400, Vladislav Yasevich wrote:
>> I think in this particular case, the receiver has to terminate, not the sender.
>> Look at how tcp_close() handles this.
>>
>> As long as receiver is available, the sender should continue to try
>> sending data.
> 
> The receiver does not know that the sender wishes to shutdown the
> association. No shutdown request has been sent yet.
> 
> I don't think we should be relying on the behaviour of the sender for
> the receiver to be able to ever free its ressources. We will be
> retransmitting data and keeping an association alive _forever_ for no
> purpose.
> 
> If there is no reliable way of _ever_ doing a graceful shutdown then
> the only alternative is to just ABORT in the first place.
> 
> The difference in TCP is that we can close the connection half-way,
> something we can't do in sctp.
> 

But what you are proposing violates the protocol.  Zero-window probes do
not count against max retransmissions, even when you are in shutdown pending
state.

You'll come out of this one of 2 ways:
  1) receiver wakes up and processes data.  This will allow for graceful termination.
  2) receiver dies.  Since receive window is full, we have data queued, and this will
     trigger an ABORT to be sent to the sender.

What you patch is doing is taking a perfectly valid scenario and putting a time limit
on it in violation of the spec.

-vlad
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Thomas Graf June 29, 2011, 3:48 p.m. UTC | #4

On Wed, Jun 29, 2011 at 10:58:41AM -0400, Vladislav Yasevich wrote:
> But what you are proposing violates the protocol.  Zero-window probes do
> not count against max retransmissions, even when you are in shutdown pending
> state.
> 
> You'll come out of this one of 2 ways:
>   1) receiver wakes up and processes data.  This will allow for graceful termination.
>   2) receiver dies.  Since receive window is full, we have data queued, and this will
>      trigger an ABORT to be sent to the sender.

If by die you mean kill the process then this is exactly what I do to trigger
the issue. I simulatenously kill both processes. Not sure what you mean by
trigger an ABORT but I don't see that happen. I also don't see the rwnd reopen
after the socket is closed on the receiver side but that's a separate issue.

> What you patch is doing is taking a perfectly valid scenario and putting a time limit
> on it in violation of the spec.

We also violate the spec by not doing so. The spec says that the number of
SHUTDOWN retransmissions has to be limited by Max.Retrans which we also
can't enforce because of the above.

The scenario is closed sockets on both sides, endpoints on both sides gone
already and retransmissions + heartbeat requests forever.

Any alternative suggestion how to fix this?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Vlad Yasevich June 29, 2011, 4:14 p.m. UTC | #5

On 06/29/2011 11:48 AM, Thomas Graf wrote:
> On Wed, Jun 29, 2011 at 10:58:41AM -0400, Vladislav Yasevich wrote:
>> But what you are proposing violates the protocol.  Zero-window probes do
>> not count against max retransmissions, even when you are in shutdown pending
>> state.
>>
>> You'll come out of this one of 2 ways:
>>   1) receiver wakes up and processes data.  This will allow for graceful termination.
>>   2) receiver dies.  Since receive window is full, we have data queued, and this will
>>      trigger an ABORT to be sent to the sender.
> 
> If by die you mean kill the process then this is exactly what I do to trigger
> the issue. I simulatenously kill both processes. Not sure what you mean by
> trigger an ABORT but I don't see that happen. I also don't see the rwnd reopen
> after the socket is closed on the receiver side but that's a separate issue.

Right.  The lack of ABORT from the receive of data is a bug.  I was trying to point out
that instead of modified the sender of data to send the ABORT, you modify the receiver
to send the ABORT when it is being closed while having data queued.

> 
>> What you patch is doing is taking a perfectly valid scenario and putting a time limit
>> on it in violation of the spec.
> 
> We also violate the spec by not doing so. The spec says that the number of
> SHUTDOWN retransmissions has to be limited by Max.Retrans which we also
> can't enforce because of the above.

But we don't even get to sending the SHUTDOWN, so from the wire protocol, we
do not violated it.  We have bad behavior in that when both sender and receiver
are dead, the association is hung.

> 
> The scenario is closed sockets on both sides, endpoints on both sides gone
> already and retransmissions + heartbeat requests forever.
> 
> Any alternative suggestion how to fix this?
> 

The receiver, on sctp_close() checks to see if it has queued data and if it does, the
ABORT is triggered.  This is the same behavior as tcp_close().

-vlad
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Thomas Graf June 30, 2011, 8:49 a.m. UTC | #6

On Wed, Jun 29, 2011 at 12:14:41PM -0400, Vladislav Yasevich wrote:
> Right.  The lack of ABORT from the receive of data is a bug.  I was trying to point out
> that instead of modified the sender of data to send the ABORT, you modify the receiver
> to send the ABORT when it is being closed while having data queued.

Agreed. This makes a good procedure if there is data is on
sk_receive_queue and gets us in line with TCP although I don't see this
in the spec at all :-)

> But we don't even get to sending the SHUTDOWN, so from the wire protocol, we
> do not violated it.  We have bad behavior in that when both sender and receiver
> are dead, the association is hung.

So how do we get out if ...

1) there is nothing queued on sk_receive_queue but the window still
   remains 0 forver?

2) the receiver is an older Linux without the above fix or another stack
   that does not ABORT?

I agree that using ABORT on the receiver is the ideal way whenver
possible but we still need to fix this if the receiver does not do so.

What sideeffects are you worried about resulting from my proposal?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Vlad Yasevich June 30, 2011, 2:08 p.m. UTC | #7

On 06/30/2011 04:49 AM, Thomas Graf wrote:
> On Wed, Jun 29, 2011 at 12:14:41PM -0400, Vladislav Yasevich wrote:
>> Right.  The lack of ABORT from the receive of data is a bug.  I was trying to point out
>> that instead of modified the sender of data to send the ABORT, you modify the receiver
>> to send the ABORT when it is being closed while having data queued.
> 
> Agreed. This makes a good procedure if there is data is on
> sk_receive_queue and gets us in line with TCP although I don't see this
> in the spec at all :-)
> 
>> But we don't even get to sending the SHUTDOWN, so from the wire protocol, we
>> do not violated it.  We have bad behavior in that when both sender and receiver
>> are dead, the association is hung.
> 
> So how do we get out if ...
> 
> 1) there is nothing queued on sk_receive_queue but the window still
>    remains 0 forver?

sk_receive_queue isn't the only queue you have to check.  You'll need to check
the reassembly and ordering queues, as partial or out of order things might stuck
there.  That's would be an extremely rare condition since if we ever get here, the
first thing we do is reneg on those TSN and open the window to get the missing chunk
in and push complete packet up to sk_receive_queue.

>    
> 2) the receiver is an older Linux without the above fix or another stack
>    that does not ABORT?

crap....

How about this.  If we in SHUTDOWN_PENDING state, let the errors accumulate upto
max_retrans.  After that, start SHUTDOWN_GUARD timer to let the association live a
bit longer just on the off-chance the receive comes back.  When SHUTDOWN_GUARD
expires it will abort the association.

When we are in this state, SACK processing will have to reset SHUTDOWN_GUARD when
the SACK is actually acknowledging something.

> 
> I agree that using ABORT on the receiver is the ideal way whenver
> possible but we still need to fix this if the receiver does not do so.
> 
> What sideeffects are you worried about resulting from my proposal?
> 

There is a potential that the sender may abort prematurely.  The issue is that
the sender has no way of knowing if the remote process somehow terminated and
will never consume data, or if it is just extremely busy with something else and
will come back.  Since this is a reliable protocol, we given the receive the benefit
of the doubt and try our hardest to get the data across.

My suggestion above is still a bit of a hack that one could argue still violates the
protocol, but the time period tries to remove as much doubt from the sender as possible
the the receiver is really out-to-lunch.

-vlad
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Thomas Graf June 30, 2011, 4:17 p.m. UTC | #8

On Thu, Jun 30, 2011 at 10:08:40AM -0400, Vladislav Yasevich wrote:
> How about this.  If we in SHUTDOWN_PENDING state, let the errors accumulate upto
> max_retrans.  After that, start SHUTDOWN_GUARD timer to let the association live a
> bit longer just on the off-chance the receive comes back.  When SHUTDOWN_GUARD
> expires it will abort the association.
> 
> When we are in this state, SACK processing will have to reset SHUTDOWN_GUARD when
> the SACK is actually acknowledging something.

Good idea. I'll update my patch.

> > 
> > What sideeffects are you worried about resulting from my proposal?
> > 
> 
> There is a potential that the sender may abort prematurely.  The issue is that
> the sender has no way of knowing if the remote process somehow terminated and
> will never consume data, or if it is just extremely busy with something else and
> will come back.  Since this is a reliable protocol, we given the receive the benefit
> of the doubt and try our hardest to get the data across.

Understood although we are talking 10 * RTO here without an actual SACK.

> My suggestion above is still a bit of a hack that one could argue still violates the
> protocol, but the time period tries to remove as much doubt from the sender as possible
> the the receiver is really out-to-lunch.

Assuming that by 'shutdown sequence' the spec is only referring to the
SHUTDOWN / SHUTDOWN ACK exchange it would still violate the protocol
but I don't see how to avoid having association hang around forever without
violating the spec. This really looks like a hole in the spec to me.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

sctp: Enforce maximum retransmissions during shutdown

Commit Message

Comments

Patch