
[net-next,2/2] tipc: reduce sensitive to retransmit failures

Message ID 20191106062610.12039-2-hoang.h.le@dektech.com.au
State Accepted
Delegated to: David Miller
Series [net-next,1/2] tipc: update cluster capabilities if node deleted

Commit Message

Hoang Huu Le Nov. 6, 2019, 6:26 a.m. UTC
In a large cluster (e.g. >200 nodes), the sequence
gap -> retransmit packet -> ack takes a long time when STATE_MSGs are
dropped or delayed under heavy traffic. As a result, the 1.5 sec tolerance
criterion makes links fail too easily, typically around the 2nd or 3rd
failed retransmission attempt.

Instead of re-introducing the criterion of 99 failed retransmissions to fix
the issue, we increase the failure detection timer to ten times the
tolerance value.

Fixes: 77cf8edbc0e7 ("tipc: simplify stale link failure criteria")
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
---
 net/tipc/link.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Jon Maloy Nov. 6, 2019, 9:07 p.m. UTC | #1
Acked-by: Jon

> -----Original Message-----
> From: Hoang Le <hoang.h.le@dektech.com.au>
> Sent: 6-Nov-19 01:26
> To: Jon Maloy <jon.maloy@ericsson.com>; maloy@donjonn.com; netdev@vger.kernel.org; tipc-discussion@lists.sourceforge.net
> Subject: [net-next 2/2] tipc: reduce sensitive to retransmit failures
> 
> In a large cluster (e.g. >200 nodes), the sequence
> gap -> retransmit packet -> ack takes a long time when STATE_MSGs are
> dropped or delayed under heavy traffic. As a result, the 1.5 sec tolerance
> criterion makes links fail too easily, typically around the 2nd or 3rd
> failed retransmission attempt.
> 
> Instead of re-introducing the criterion of 99 failed retransmissions to fix
> the issue, we increase the failure detection timer to ten times the
> tolerance value.
> 
> Fixes: 77cf8edbc0e7 ("tipc: simplify stale link failure criteria")
> Acked-by: Jon Maloy <jon.maloy@ericsson.com>
> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
> ---
>  net/tipc/link.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/tipc/link.c b/net/tipc/link.c
> index 038861bad72b..2aed7a958a8c 100644
> --- a/net/tipc/link.c
> +++ b/net/tipc/link.c
> @@ -1087,7 +1087,7 @@ static bool link_retransmit_failure(struct tipc_link *l, struct tipc_link *r,
>  		return false;
> 
>  	if (!time_after(jiffies, TIPC_SKB_CB(skb)->retr_stamp +
> -			msecs_to_jiffies(r->tolerance)))
> +			msecs_to_jiffies(r->tolerance * 10)))
>  		return false;
> 
>  	hdr = buf_msg(skb);
> --
> 2.20.1
David Miller Nov. 7, 2019, 1:38 a.m. UTC | #2
From: Hoang Le <hoang.h.le@dektech.com.au>
Date: Wed,  6 Nov 2019 13:26:10 +0700

> In a large cluster (e.g. >200 nodes), the sequence
> gap -> retransmit packet -> ack takes a long time when STATE_MSGs are
> dropped or delayed under heavy traffic. As a result, the 1.5 sec tolerance
> criterion makes links fail too easily, typically around the 2nd or 3rd
> failed retransmission attempt.
> 
> Instead of re-introducing the criterion of 99 failed retransmissions to fix
> the issue, we increase the failure detection timer to ten times the
> tolerance value.
> 
> Fixes: 77cf8edbc0e7 ("tipc: simplify stale link failure criteria")
> Acked-by: Jon Maloy <jon.maloy@ericsson.com>
> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>

Applied.

Patch

diff --git a/net/tipc/link.c b/net/tipc/link.c
index 038861bad72b..2aed7a958a8c 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -1087,7 +1087,7 @@  static bool link_retransmit_failure(struct tipc_link *l, struct tipc_link *r,
 		return false;
 
 	if (!time_after(jiffies, TIPC_SKB_CB(skb)->retr_stamp +
-			msecs_to_jiffies(r->tolerance)))
+			msecs_to_jiffies(r->tolerance * 10)))
 		return false;
 
 	hdr = buf_msg(skb);
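
For illustration only, the effect of the changed condition can be sketched in user space. This is not the kernel code: `ticks_t`, `SKETCH_HZ` and the `sketch_*` helpers are hypothetical stand-ins for jiffies, HZ, time_after() and msecs_to_jiffies(), and a tick rate of 1000 per second is assumed so that one tick equals one millisecond. The `factor` parameter models the multiplier applied to the tolerance (1 before this patch, 10 after it).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel facilities; not the kernel definitions. */
#define SKETCH_HZ 1000u		/* assumed tick rate: 1 tick == 1 ms */

typedef uint32_t ticks_t;	/* models a wrapping jiffies-like counter */

/* Wraparound-safe "a is later than b", the same idea as time_after(). */
static bool sketch_time_after(ticks_t a, ticks_t b)
{
	return (int32_t)(b - a) < 0;
}

/* Convert milliseconds to ticks at the assumed tick rate. */
static ticks_t sketch_msecs_to_ticks(uint32_t ms)
{
	return (ticks_t)(((uint64_t)ms * SKETCH_HZ) / 1000u);
}

/*
 * Returns true once `now` is later than retr_stamp plus `factor` times
 * the tolerance. factor == 1 models the check before the patch,
 * factor == 10 the check after it.
 */
static bool sketch_retransmit_expired(ticks_t now, ticks_t retr_stamp,
				      uint32_t tolerance_ms, uint32_t factor)
{
	return sketch_time_after(now, retr_stamp +
				 sketch_msecs_to_ticks(tolerance_ms * factor));
}
```

With a tolerance of 1500 ms and retr_stamp at tick 0, sketch_retransmit_expired() with factor 1 turns true after tick 1500, while with factor 10 it stays false until tick 15000 has passed. The signed subtraction keeps the comparison correct when the tick counter wraps around.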