Message ID | 20231107143905.1288261-2-ekovalev.off@gmail.com |
---|---|
State | Accepted |
Delegated to: | Dumitru Ceara |
Headers | show |
Series | [ovs-dev,1/2] pinctrl: send RST instead of RST_ACK bit for lb hc | expand |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | success | apply and check: success |
ovsrobot/github-robot-_Build_and_Test | success | github build: passed |
ovsrobot/github-robot-_ovn-kubernetes | success | github build: passed |
Thank you for the detailed explanation. It makes perfect sense to me. Acked-by: Mark Michelson <mmichels@redhat.com> On 11/7/23 09:39, Evgenii Kovalev wrote: > This patch adds reset n_count regardless of svc state and does only > if n_count >= configured_count. > > The behavior hc with windows server as backend in load balancers show some > issue with counters, e.g.: > > ovn-controller --> win_srv | SYN > ovn-controller <-- win_srv | SYN_ACK <- increase n_success > ovn-controller --> win_srv | RST_ACK > ovn-controller <-- win_srv | SYN_ACK (TCP Retransmission) <- increase n_success > ovn-controller --> win_srv | RST_ACK > ovn-controller <-- win_srv | SYN_ACK (TCP Retransmission) <- increase n_success > n_success == success_count => status = online; n_success = 0 > ovn-controller --> win_srv | RST_ACK > ovn-controller <-- win_srv | RST (win_srv don't recive ACK and sent RST) > increase n_failures (count = 1) > After wait_time: > ovn-controller --> win_srv | SYN > ovn-controller <-- win_srv | SYN_ACK <- increase n_success > ovn-controller --> win_srv | RST_ACK > ovn-controller <-- win_srv | SYN_ACK (TCP Retransmission) <- increase n_success > ovn-controller --> win_srv | RST_ACK > ovn-controller <-- win_srv | SYN_ACK (TCP Retransmission) <- increase n_success > n_success == success_count => status = online; n_success = 0 > ovn-controller --> win_srv | RST_ACK > ovn-controller <-- win_srv | RST (win_srv don't recive ACK and sent RST) > increase n_failures (count = 2) > After wait_time: > ovn-controller --> win_srv | SYN > ovn-controller <-- win_srv | SYN_ACK <- increase n_success > ovn-controller --> win_srv | RST_ACK > ovn-controller <-- win_srv | SYN_ACK (TCP Retransmission) <- increase n_success > ovn-controller --> win_srv | RST_ACK > ovn-controller <-- win_srv | SYN_ACK (TCP Retransmission) <- increase n_success > n_success == success_count => status = online; n_success = 0 > ovn-controller --> win_srv | RST_ACK > ovn-controller <-- win_srv | RST (win_srv don't recive ACK and sent RST) > increase n_failures (count = 3) > n_failures == failure_count => status = offline; n_failures = 0 > > So, the main point is svc reset the counters only for current state if > n_success >= success_count, but if a backend of lb for some reason > sent RST to ovn-controller it's just increase count and will reset only then > state is offline and greater or equal than failure_count. The same for SYN_ACK. > > Signed-off-by: Evgenii Kovalev <ekovalev.off@gmail.com> > --- > controller/pinctrl.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/controller/pinctrl.c b/controller/pinctrl.c > index f98f6d70c..158e0ee75 100644 > --- a/controller/pinctrl.c > +++ b/controller/pinctrl.c > @@ -7761,7 +7761,9 @@ svc_monitors_run(struct rconn *swconn, > if (svc_mon->n_success >= svc_mon->success_count) { > svc_mon->status = SVC_MON_ST_ONLINE; > svc_mon->n_success = 0; > + svc_mon->n_failures = 0; > } > + > if (current_time >= svc_mon->next_send_time) { > svc_monitor_send_health_check(swconn, svc_mon); > next_run_time = svc_mon->wait_time; > @@ -7773,6 +7775,7 @@ svc_monitors_run(struct rconn *swconn, > case SVC_MON_S_OFFLINE: > if (svc_mon->n_failures >= svc_mon->failure_count) { > svc_mon->status = SVC_MON_ST_OFFLINE; > + svc_mon->n_success = 0; > svc_mon->n_failures = 0; > } > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev >
On 11/8/23 18:55, Mark Michelson wrote: > Thank you for the detailed explanation. It makes perfect sense to me. > > Acked-by: Mark Michelson <mmichels@redhat.com> > Thanks, applied to main and backported to all stable branches down to 22.03. > On 11/7/23 09:39, Evgenii Kovalev wrote: >> This patch adds reset n_count regardless of svc state and does only >> if n_count >= configured_count. >> >> The behavior hc with windows server as backend in load balancers show >> some >> issue with counters, e.g.: >> >> ovn-controller --> win_srv | SYN >> ovn-controller <-- win_srv | SYN_ACK <- increase n_success >> ovn-controller --> win_srv | RST_ACK >> ovn-controller <-- win_srv | SYN_ACK (TCP Retransmission) <- increase >> n_success >> ovn-controller --> win_srv | RST_ACK >> ovn-controller <-- win_srv | SYN_ACK (TCP Retransmission) <- increase >> n_success >> n_success == success_count => status = >> online; n_success = 0 >> ovn-controller --> win_srv | RST_ACK >> ovn-controller <-- win_srv | RST (win_srv don't recive ACK and sent RST) >> increase n_failures (count = 1) >> After wait_time: >> ovn-controller --> win_srv | SYN >> ovn-controller <-- win_srv | SYN_ACK <- increase n_success >> ovn-controller --> win_srv | RST_ACK >> ovn-controller <-- win_srv | SYN_ACK (TCP Retransmission) <- increase >> n_success >> ovn-controller --> win_srv | RST_ACK >> ovn-controller <-- win_srv | SYN_ACK (TCP Retransmission) <- increase >> n_success >> n_success == success_count => status = >> online; n_success = 0 >> ovn-controller --> win_srv | RST_ACK >> ovn-controller <-- win_srv | RST (win_srv don't recive ACK and sent RST) >> increase n_failures (count = 2) >> After wait_time: >> ovn-controller --> win_srv | SYN >> ovn-controller <-- win_srv | SYN_ACK <- increase n_success >> ovn-controller --> win_srv | RST_ACK >> ovn-controller <-- win_srv | SYN_ACK (TCP Retransmission) <- increase >> n_success >> ovn-controller --> win_srv | RST_ACK >> ovn-controller <-- win_srv | SYN_ACK (TCP Retransmission) <- increase >> n_success >> n_success == success_count => status = >> online; n_success = 0 >> ovn-controller --> win_srv | RST_ACK >> ovn-controller <-- win_srv | RST (win_srv don't recive ACK and sent RST) >> increase n_failures (count = 3) >> n_failures == failure_count => status = >> offline; n_failures = 0 >> >> So, the main point is svc reset the counters only for current state if >> n_success >= success_count, but if a backend of lb for some reason >> sent RST to ovn-controller it's just increase count and will reset >> only then >> state is offline and greater or equal than failure_count. The same for >> SYN_ACK. >> >> Signed-off-by: Evgenii Kovalev <ekovalev.off@gmail.com> >> --- >> controller/pinctrl.c | 3 +++ >> 1 file changed, 3 insertions(+) >> >> diff --git a/controller/pinctrl.c b/controller/pinctrl.c >> index f98f6d70c..158e0ee75 100644 >> --- a/controller/pinctrl.c >> +++ b/controller/pinctrl.c >> @@ -7761,7 +7761,9 @@ svc_monitors_run(struct rconn *swconn, >> if (svc_mon->n_success >= svc_mon->success_count) { >> svc_mon->status = SVC_MON_ST_ONLINE; >> svc_mon->n_success = 0; >> + svc_mon->n_failures = 0; >> } >> + >> if (current_time >= svc_mon->next_send_time) { >> svc_monitor_send_health_check(swconn, svc_mon); >> next_run_time = svc_mon->wait_time; >> @@ -7773,6 +7775,7 @@ svc_monitors_run(struct rconn *swconn, >> case SVC_MON_S_OFFLINE: >> if (svc_mon->n_failures >= svc_mon->failure_count) { >> svc_mon->status = SVC_MON_ST_OFFLINE; >> + svc_mon->n_success = 0; >> svc_mon->n_failures = 0; >> } >> _______________________________________________ >> dev mailing list >> dev@openvswitch.org >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >> > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev >
diff --git a/controller/pinctrl.c b/controller/pinctrl.c index f98f6d70c..158e0ee75 100644 --- a/controller/pinctrl.c +++ b/controller/pinctrl.c @@ -7761,7 +7761,9 @@ svc_monitors_run(struct rconn *swconn, if (svc_mon->n_success >= svc_mon->success_count) { svc_mon->status = SVC_MON_ST_ONLINE; svc_mon->n_success = 0; + svc_mon->n_failures = 0; } + if (current_time >= svc_mon->next_send_time) { svc_monitor_send_health_check(swconn, svc_mon); next_run_time = svc_mon->wait_time; @@ -7773,6 +7775,7 @@ svc_monitors_run(struct rconn *swconn, case SVC_MON_S_OFFLINE: if (svc_mon->n_failures >= svc_mon->failure_count) { svc_mon->status = SVC_MON_ST_OFFLINE; + svc_mon->n_success = 0; svc_mon->n_failures = 0; }
This patch adds reset n_count regardless of svc state and does only if n_count >= configured_count. The behavior hc with windows server as backend in load balancers show some issue with counters, e.g.: ovn-controller --> win_srv | SYN ovn-controller <-- win_srv | SYN_ACK <- increase n_success ovn-controller --> win_srv | RST_ACK ovn-controller <-- win_srv | SYN_ACK (TCP Retransmission) <- increase n_success ovn-controller --> win_srv | RST_ACK ovn-controller <-- win_srv | SYN_ACK (TCP Retransmission) <- increase n_success n_success == success_count => status = online; n_success = 0 ovn-controller --> win_srv | RST_ACK ovn-controller <-- win_srv | RST (win_srv don't recive ACK and sent RST) increase n_failures (count = 1) After wait_time: ovn-controller --> win_srv | SYN ovn-controller <-- win_srv | SYN_ACK <- increase n_success ovn-controller --> win_srv | RST_ACK ovn-controller <-- win_srv | SYN_ACK (TCP Retransmission) <- increase n_success ovn-controller --> win_srv | RST_ACK ovn-controller <-- win_srv | SYN_ACK (TCP Retransmission) <- increase n_success n_success == success_count => status = online; n_success = 0 ovn-controller --> win_srv | RST_ACK ovn-controller <-- win_srv | RST (win_srv don't recive ACK and sent RST) increase n_failures (count = 2) After wait_time: ovn-controller --> win_srv | SYN ovn-controller <-- win_srv | SYN_ACK <- increase n_success ovn-controller --> win_srv | RST_ACK ovn-controller <-- win_srv | SYN_ACK (TCP Retransmission) <- increase n_success ovn-controller --> win_srv | RST_ACK ovn-controller <-- win_srv | SYN_ACK (TCP Retransmission) <- increase n_success n_success == success_count => status = online; n_success = 0 ovn-controller --> win_srv | RST_ACK ovn-controller <-- win_srv | RST (win_srv don't recive ACK and sent RST) increase n_failures (count = 3) n_failures == failure_count => status = offline; n_failures = 0 So, the main point is svc reset the counters only for current state if n_success >= success_count, but if a backend of lb for some reason sent RST to ovn-controller it's just increase count and will reset only then state is offline and greater or equal than failure_count. The same for SYN_ACK. Signed-off-by: Evgenii Kovalev <ekovalev.off@gmail.com> --- controller/pinctrl.c | 3 +++ 1 file changed, 3 insertions(+)