diff mbox series

[ovs-dev,2/2] pinctrl: reset success and failures n_count regardless of svc state

Message ID 20231107143905.1288261-2-ekovalev.off@gmail.com
State Accepted
Delegated to: Dumitru Ceara
Headers show
Series [ovs-dev,1/2] pinctrl: send RST instead of RST_ACK bit for lb hc | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/github-robot-_ovn-kubernetes success github build: passed

Commit Message

Evgenii Kovalev Nov. 7, 2023, 2:39 p.m. UTC
This patch resets both the n_success and n_failures counters regardless of
the svc state, and does so only when the counter reaches its configured
threshold (n_count >= configured_count).

The health-check behavior with a Windows server as a load-balancer backend
shows an issue with the counters, e.g.:

ovn-controller --> win_srv | SYN
ovn-controller <-- win_srv | SYN_ACK <- increase n_success
ovn-controller --> win_srv | RST_ACK
ovn-controller <-- win_srv | SYN_ACK (TCP Retransmission) <- increase n_success
ovn-controller --> win_srv | RST_ACK
ovn-controller <-- win_srv | SYN_ACK (TCP Retransmission) <- increase n_success
                             n_success == success_count => status = online; n_success = 0
ovn-controller --> win_srv | RST_ACK
ovn-controller <-- win_srv | RST (win_srv didn't receive the ACK and sent RST)
                             increase n_failures (count = 1)
After wait_time:
ovn-controller --> win_srv | SYN
ovn-controller <-- win_srv | SYN_ACK <- increase n_success
ovn-controller --> win_srv | RST_ACK
ovn-controller <-- win_srv | SYN_ACK (TCP Retransmission) <- increase n_success
ovn-controller --> win_srv | RST_ACK
ovn-controller <-- win_srv | SYN_ACK (TCP Retransmission) <- increase n_success
                             n_success == success_count => status = online; n_success = 0
ovn-controller --> win_srv | RST_ACK
ovn-controller <-- win_srv | RST (win_srv didn't receive the ACK and sent RST)
                             increase n_failures (count = 2)
After wait_time:
ovn-controller --> win_srv | SYN
ovn-controller <-- win_srv | SYN_ACK <- increase n_success
ovn-controller --> win_srv | RST_ACK
ovn-controller <-- win_srv | SYN_ACK (TCP Retransmission) <- increase n_success
ovn-controller --> win_srv | RST_ACK
ovn-controller <-- win_srv | SYN_ACK (TCP Retransmission) <- increase n_success
                             n_success == success_count => status = online; n_success = 0
ovn-controller --> win_srv | RST_ACK
ovn-controller <-- win_srv | RST (win_srv didn't receive the ACK and sent RST)
                             increase n_failures (count = 3)
                             n_failures == failure_count => status = offline; n_failures = 0

So, the main point is that the svc monitor resets a counter only for its
current state: n_success is reset when n_success >= success_count, but if a
backend of the lb sends an RST to ovn-controller for some reason, n_failures
just keeps increasing and is only reset once the state goes offline, after
n_failures is greater than or equal to failure_count. The same applies to
SYN_ACK and n_success in the offline state.

Signed-off-by: Evgenii Kovalev <ekovalev.off@gmail.com>
---
 controller/pinctrl.c | 3 +++
 1 file changed, 3 insertions(+)
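
The counter interaction described above can be illustrated with a minimal,
self-contained sketch. This is not the actual OVN code: the struct, the
thresholds, and the handle_probe() helper below are hypothetical
simplifications of the svc_monitors_run() logic, with a flag selecting the
pre-fix or post-fix behavior.

```c
/* Simplified model of the svc monitor counter logic (hypothetical,
 * not the OVN implementation). */
#include <assert.h>
#include <stdbool.h>

enum svc_state { SVC_ONLINE, SVC_OFFLINE };

struct svc_mon {
    enum svc_state status;
    int n_success, n_failures;
    int success_count, failure_count;
};

/* Account one probe result; 'reset_both' selects the fixed behavior
 * of clearing the opposite counter on every state transition. */
static void
handle_probe(struct svc_mon *m, bool success, bool reset_both)
{
    if (success) {
        m->n_success++;
        if (m->n_success >= m->success_count) {
            m->status = SVC_ONLINE;
            m->n_success = 0;
            if (reset_both) {
                m->n_failures = 0;  /* The fix: drop stale failures. */
            }
        }
    } else {
        m->n_failures++;
        if (m->n_failures >= m->failure_count) {
            m->status = SVC_OFFLINE;
            m->n_failures = 0;
            if (reset_both) {
                m->n_success = 0;   /* The fix: drop stale successes. */
            }
        }
    }
}
```

Replaying the trace above (three successful probes followed by one RST, per
cycle) shows that without the extra resets n_failures survives each
transition to online, so three mostly-healthy cycles are enough to flip the
backend offline; with the fix it stays online.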

Comments

Mark Michelson Nov. 8, 2023, 5:55 p.m. UTC | #1
Thank you for the detailed explanation. It makes perfect sense to me.

Acked-by: Mark Michelson <mmichels@redhat.com>

Dumitru Ceara Nov. 20, 2023, 7:47 p.m. UTC | #2
On 11/8/23 18:55, Mark Michelson wrote:
> Thank you for the detailed explanation. It makes perfect sense to me.
> 
> Acked-by: Mark Michelson <mmichels@redhat.com>
> 

Thanks, applied to main and backported to all stable branches down to 22.03.

diff mbox series

Patch

diff --git a/controller/pinctrl.c b/controller/pinctrl.c
index f98f6d70c..158e0ee75 100644
--- a/controller/pinctrl.c
+++ b/controller/pinctrl.c
@@ -7761,7 +7761,9 @@  svc_monitors_run(struct rconn *swconn,
             if (svc_mon->n_success >= svc_mon->success_count) {
                 svc_mon->status = SVC_MON_ST_ONLINE;
                 svc_mon->n_success = 0;
+                svc_mon->n_failures = 0;
             }
+
             if (current_time >= svc_mon->next_send_time) {
                 svc_monitor_send_health_check(swconn, svc_mon);
                 next_run_time = svc_mon->wait_time;
@@ -7773,6 +7775,7 @@  svc_monitors_run(struct rconn *swconn,
         case SVC_MON_S_OFFLINE:
             if (svc_mon->n_failures >= svc_mon->failure_count) {
                 svc_mon->status = SVC_MON_ST_OFFLINE;
+                svc_mon->n_success = 0;
                 svc_mon->n_failures = 0;
             }