Message ID | 20230608140411.148041-1-dceara@redhat.com
---|---
State | RFC
Series | [ovs-dev,RFC] ci: Remove '--recheck' in CI.

Context | Check | Description
---|---|---
ovsrobot/apply-robot | success | apply and check: success
ovsrobot/github-robot-_ovn-kubernetes | success | github build: passed
ovsrobot/github-robot-_Build_and_Test | fail | github build: failed
In theory, this idea is great. By removing the recheck, our CI will potentially turn red more often, resulting in us investigating our tests to resolve what makes them flaky. Eventually, we will resolve all of the flakes and CI will be green. From that point on, any time the CI turns red, it will immediately get our attention and we will fix whatever needs fixing.

I have two main fears here:

1) If we don't fix the existing flaky tests quickly enough, it will be considered "normal" for CI to be red, meaning we are less likely to notice genuine failures when they happen. Worse, developers familiar with the flaky reputation of tests will just issue their own rechecks until the CI turns green.

2) Let's say we track down and fix all current flaky tests. Developers will soon write new tests, and those tests may end up being flaky. Then our CI will start turning red again due to test failures seemingly unrelated to the code being committed. The result is the same as (1).

By leaving the recheck intact, we are less likely to notice and fix flaky tests. That's also bad.

I think that if we are going to remove the recheck now, then we have to be fully dedicated to eradicating all existing flaky tests. If we as a project can be dedicated to fixing flaky tests *immediately* when they are found, then we could move forward with this change now. I just don't trust us to actually do it.

I think a more plausible idea would be to identify all known flaky tests, fix those, and then remove the recheck. This way, we only have to deal with new flaky tests that get written in the future. I think we can deal with this situation more easily than suddenly going from green CI to red CI because of existing flaky tests.

Everything below this sentence is wishful thinking and shouldn't get in the way of approving/rejecting this change.

I also think that we need to identify the common causes of flaky tests, and take measures to prevent them from happening in the future. There are a few ways I can think of to accomplish this:

1) Ensure people performing code reviews are aware of these patterns and point them out during code review.

2) Automatically prevent known flaky patterns from existing in our code. For instance, we could introduce a build-time check that ensures that nobody attempts to sleep during a test to "let things settle."

3) Provide functions/macros to use in the testsuite that avoid potentially flaky behavior. As a hypothetical example, let's say that a common cause of flakes is not using "--wait=hv" or "--wait=sb" after a series of ovn-nbctl commands. We could provide a macro like:

    OVN_NBCTL([hv], [
      lr-add ro0
      ls-add sw0
    ])

(That's probably bad syntax, but I think you get the idea.)

This would expand to something like:

    check ovn-nbctl lr-add ro0
    check ovn-nbctl ls-add sw0
    check ovn-nbctl --wait=hv sync

If we combine this with the idea from (2), we could prevent all bare uses of `ovn-nbctl` in tests, thus preventing this flaky behavior from existing.

4) Identify commonly used patterns in tests and provide macros that essentially do the work for us. Some examples might be:

a) Common logical network topologies could be set up for you by calling a single macro.
b) Checking logical flows could have a macro that dumps the information you care about, sorts the output, anonymizes the table numbers, and eliminates odd spacing in the output.
c) Ensuring TCP/UDP/IP connectivity between two endpoints could potentially be done with a macro or two, assuming that the traffic you are sending is not overly specific.

The idea here is that by having commonly used macros that are bulletproof, we are less likely to introduce flakes by hand-coding the same scenarios. It also would likely reduce the code size of the testsuite, which is a nice bonus.

5) Enhanced debuggability of tests would be great. There are too many times that I've looked at testsuite.log on a failing test and had no idea where to start looking for the failure. There are other times where it's clear what failed, but trying to figure out why is more difficult. Unfortunately, there isn't likely to be a magic cure-all for this problem.

On 6/8/23 10:04, Dumitru Ceara wrote:
> If we want to catch new failures faster we have a better chance if CI
> doesn't auto-retry (once).
>
> Signed-off-by: Dumitru Ceara <dceara@redhat.com>
> ---
> NOTE: Sending this as RFC to start the discussion with the community.
> ---
>  .ci/linux-build.sh | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/.ci/linux-build.sh b/.ci/linux-build.sh
> index 907a0dc6c9..64f7a96d91 100755
> --- a/.ci/linux-build.sh
> +++ b/.ci/linux-build.sh
> @@ -66,7 +66,7 @@ function execute_tests()
>
>      export DISTCHECK_CONFIGURE_FLAGS="$OPTS"
>      if ! make distcheck CFLAGS="${COMMON_CFLAGS} ${OVN_CFLAGS}" $JOBS \
> -        TESTSUITEFLAGS="$JOBS $TEST_RANGE" RECHECK=yes
> +        TESTSUITEFLAGS="$JOBS $TEST_RANGE"
>      then
>          # testsuite.log is necessary for debugging.
>          cat */_build/sub/tests/testsuite.log
> @@ -81,7 +81,7 @@ function execute_system_tests()
>
>      configure_ovn $OPTS
>      make $JOBS || { cat config.log; exit 1; }
> -    if ! sudo make $JOBS $type TESTSUITEFLAGS="$TEST_RANGE" RECHECK=yes; then
> +    if ! sudo make $JOBS $type TESTSUITEFLAGS="$TEST_RANGE"; then
>          # $log_file is necessary for debugging.
>          cat tests/$log_file
>          exit 1
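[Editor's note: to make point (3) above concrete, here is a minimal shell sketch of a helper with the behavior Mark describes. The name ovn_nbctl_batch and its interface are made up for illustration; the real testsuite helpers are Autotest/m4 macros under tests/, so an actual implementation would more likely be an m4 macro wrapping the existing `check` helper.]

    # Hypothetical helper in the spirit of the OVN_NBCTL idea above.
    # Each command goes through ovn-nbctl, and the batch always ends
    # with an explicit sync so a test cannot "forget" to wait.
    ovn_nbctl_batch() {
        local wait_target=$1; shift    # "hv" or "sb"
        local cmd
        for cmd in "$@"; do
            ovn-nbctl $cmd || return 1
        done
        ovn-nbctl --wait="$wait_target" sync
    }

    # Usage, mirroring the expansion sketched in the mail:
    #   ovn_nbctl_batch hv "lr-add ro0" "ls-add sw0"

The design point is simply that the final --wait sync is appended unconditionally, so the flaky "forgot to wait" pattern cannot be written by hand.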
Hi Mark, Thanks for your feedback on this! On 6/8/23 21:58, Mark Michelson wrote: > In theory, this idea is great. By removing the recheck, our CI will > potentially turn red more often, resulting in us investigating our tests > to resolve what makes them flaky. Eventually, we will resolve all of the > flakes and CI will be green. From that point on, any time the CI turns > red, it will immediately get our attention and we will fix whatever > needs fixing. > > I have two main fears here: > > 1) If we don't fix the existing flaky tests quickly enough, it will be > considered "normal" for CI to be red, meaning we are less likely to > notice genuine failures when they happen. Worse, developers familiar > with the flaky reputation of tests will just issue their own rechecks > until the CI turns green. > > 2) Let's say we track down and fix all current flaky tests. Developers > will soon write new tests, and those tests may end up being flaky. Then > our CI will start turning red again due to test failures seemingly > unrelated to code being committed. The result is the same as (1). > > By leaving the recheck intact, we are less likely to notice and fix > flaky tests. That's also bad. > > I think that if we are going to remove the recheck now, then we have to > be fully dedicated to eradicating all existing flaky tests. If we as a > project can be dedicated to fixing flaky tests *immediately* when they > are found, then we could move forward with this change now. I just don't > trust us to actually do it. > > I think a more plausible idea would be to identify all known flaky > tests, fix those, and then remove the recheck. This way, we only have to > deal with new flaky tests that get written in the future. I think we can > deal with this situation more easily than suddenly going from green CI > to red CI because of existing flaky tests. > What about the following compromise instead? Identify all known flaky tests, fix the ones we can fix without too much effort, tag the ones that we couldn't easily fix with a new keyword (e.g., "unstable" or "flaky" or whatever), and only run recheck for those. In support of this I did a test last night. I force pushed the commit that removes the --recheck multiple times in my fork to trigger CI. The GitHub action ran 23 times which means the unit and system tests were ran 3 x 23 = 69 times. I collected the results and grouped them by test and counted how many times they failed (disclaimer: it's not completely fair because some tests run with different variations more times than others but it should be a good enough first approximation). 
I got:

     40 ovn.at:30812 propagate Port_Binding.up to NB and OVS -- ovn-northd
     34 ovn-northd.at:9487 LSP incremental processing -- ovn-northd
      8 ovn.at:14436 options:multiple requested-chassis for logical port: unclaimed behavior -- ovn-northd
      7 ovn.at:16010 tug-of-war between two chassis for the same port -- ovn-northd
      6 ovn.at:29369 nb_cfg timestamp -- ovn-northd
      5 ovn-northd.at:4507 northd ssl file change -- ovn-northd
      4 ovn-performance.at:227 ovn-controller incremental processing
      3 ovn.at:34900 Check default openflow flows -- ovn-northd
      3 ovn.at:34466 recomputes -- ovn-northd
      3 ovn.at:14284 options:multiple requested-chassis for logical port -- ovn-northd
      2 ovn.at:7909 policy-based routing IPv6: 1 HVs, 3 LSs, 1 lport/LS, 1 LR -- ovn-northd
      1 system-ovn.at:6525 egress qos -- ovn-northd
      1 system-ovn.at:11151 ovn mirroring -- ovn-northd
      1 ovn-northd.at:8809 Address set incremental processing -- ovn-northd
      1 ovn-controller.at:702 ovn-controller - ovn action metering -- ovn-northd
      1 ovn.at:29467 ARP replies for SNAT external ips -- ovn-northd
      1 ovn.at:13171 1 LR with HA distributed router gateway port -- ovn-northd

The first two are probably real bugs and I reported them separately yesterday [0] but the other 15 didn't fail that often. I would even go one step further and ignore the ones that only failed once (it's acceptable IMO every now and then a maintainer has to re-trigger CI).

That would leave us with 9 tests that failed during this exercise and need either to be fixed or to be tagged as "unstable":

      8 ovn.at:14436 options:multiple requested-chassis for logical port: unclaimed behavior -- ovn-northd
      7 ovn.at:16010 tug-of-war between two chassis for the same port -- ovn-northd
      6 ovn.at:29369 nb_cfg timestamp -- ovn-northd
      5 ovn-northd.at:4507 northd ssl file change -- ovn-northd
      4 ovn-performance.at:227 ovn-controller incremental processing
      3 ovn.at:34900 Check default openflow flows -- ovn-northd
      3 ovn.at:34466 recomputes -- ovn-northd
      3 ovn.at:14284 options:multiple requested-chassis for logical port -- ovn-northd
      2 ovn.at:7909 policy-based routing IPv6: 1 HVs, 3 LSs, 1 lport/LS, 1 LR -- ovn-northd

Would you consider this compromise as a good alternative?

[0] https://mail.openvswitch.org/pipermail/ovs-dev/2023-June/405418.html

>
> Everything below this sentence is wishful thinking and shouldn't get in
> the way of approving/rejecting this change.
>
>
> I also think that we need to identify the common causes of flaky tests,
> and take measures to prevent them from happening in the future. There
> are a few ways I can think of to accomplish this:
>
> 1) Ensure people performing code reviews are aware of these patterns and
> point them out during code review.
>
> 2) Automatically prevent known flaky patterns from existing in our code.
> For instance, we could introduce a build-time check that ensures that
> nobody attempts to sleep during a test to "let things settle."
>
> 3) Provide functions/macros to use in the testsuite that avoid
> potentially flaky behavior.
We could provide a macro like: > > OVN_NBCTL([hv], [ > lr-add ro0 > ls-add sw0 > ]) > > (That's probably bad syntax, but I think you get the idea) > > This would expand to something like: > check ovn-nbctl lr-add ro0 > check ovn-nbctl ls-add sw0 > check ovn-nbctl --wait=hv sync > > If we combine this with the idea from (2), we could prevent all bare > uses of `ovn-nbctl` in tests, thus preventing this flaky behavior from > existing. > > 4) Identify commonly used patterns in tests and provide macros that > essentially do the work for us. Some examples might be: > > a) Common logical network topologies could be set up for you by calling > a single macro. > b) Checking logical flows could have a macro that dumps the information > you care about, sorts the output, anonymizes the table numbers, and > eliminates odd spacing in the output. > c) Ensuring TCP/UDP/IP connectivity between two endpoints could > potentially be done with a macro or two, assuming that the traffic you > are sending is not overly specific. > > The idea here is that by having commonly used macros that are > bulletproof, we are less likely to introduce flakes by hand-coding the > same scenarios. It also would likely reduce the code size of the > testsuite, which is a nice bonus. > > 5) Enhanced debugability of tests would be great. There are too many > times that I've looked at testsuite.log on a failing test and had no > idea where to start looking for the failure. There are other times where > it's clear what failed, but trying to figure out why is more difficult. > Unfortunately, there isn't likely to be a magic cure-all for this problem. > I think all 5 points above are things that are nice to have regardless of --recheck happening or not. Thanks, Dumitru > On 6/8/23 10:04, Dumitru Ceara wrote: >> If we want to catch new failures faster we have a better chance if CI >> doesn't auto-retry (once). >> >> Signed-off-by: Dumitru Ceara <dceara@redhat.com> >> --- >> NOTE: Sending this as RFC to start the discussion with the community. >> --- >> .ci/linux-build.sh | 4 ++-- >> 1 file changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/.ci/linux-build.sh b/.ci/linux-build.sh >> index 907a0dc6c9..64f7a96d91 100755 >> --- a/.ci/linux-build.sh >> +++ b/.ci/linux-build.sh >> @@ -66,7 +66,7 @@ function execute_tests() >> export DISTCHECK_CONFIGURE_FLAGS="$OPTS" >> if ! make distcheck CFLAGS="${COMMON_CFLAGS} ${OVN_CFLAGS}" $JOBS \ >> - TESTSUITEFLAGS="$JOBS $TEST_RANGE" RECHECK=yes >> + TESTSUITEFLAGS="$JOBS $TEST_RANGE" >> then >> # testsuite.log is necessary for debugging. >> cat */_build/sub/tests/testsuite.log >> @@ -81,7 +81,7 @@ function execute_system_tests() >> configure_ovn $OPTS >> make $JOBS || { cat config.log; exit 1; } >> - if ! sudo make $JOBS $type TESTSUITEFLAGS="$TEST_RANGE" >> RECHECK=yes; then >> + if ! sudo make $JOBS $type TESTSUITEFLAGS="$TEST_RANGE"; then >> # $log_file is necessary for debugging. >> cat tests/$log_file >> exit 1 >
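[Editor's note: the grouping Dumitru describes (collect the logs from every run, group by test, count failures) can be approximated with a short pipeline. This is only a sketch: it assumes the per-run testsuite.log files were downloaded into a local ci-logs/ directory and that failed tests appear on lines containing "FAILED"; the exact pattern may need adjusting to the real log format.]

    # Count how often each test failed across all collected CI runs.
    grep -h 'FAILED' ci-logs/run-*/testsuite.log \
        | sort | uniq -c | sort -rn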
fre. 9. jun. 2023, 12:25 skrev Dumitru Ceara <dceara@redhat.com>: > Hi Mark, > > Thanks for your feedback on this! > > On 6/8/23 21:58, Mark Michelson wrote: > > In theory, this idea is great. By removing the recheck, our CI will > > potentially turn red more often, resulting in us investigating our tests > > to resolve what makes them flaky. Eventually, we will resolve all of the > > flakes and CI will be green. From that point on, any time the CI turns > > red, it will immediately get our attention and we will fix whatever > > needs fixing. > > > > I have two main fears here: > > > > 1) If we don't fix the existing flaky tests quickly enough, it will be > > considered "normal" for CI to be red, meaning we are less likely to > > notice genuine failures when they happen. Worse, developers familiar > > with the flaky reputation of tests will just issue their own rechecks > > until the CI turns green. > > > > 2) Let's say we track down and fix all current flaky tests. Developers > > will soon write new tests, and those tests may end up being flaky. Then > > our CI will start turning red again due to test failures seemingly > > unrelated to code being committed. The result is the same as (1). > > > > By leaving the recheck intact, we are less likely to notice and fix > > flaky tests. That's also bad. > > > > I think that if we are going to remove the recheck now, then we have to > > be fully dedicated to eradicating all existing flaky tests. If we as a > > project can be dedicated to fixing flaky tests *immediately* when they > > are found, then we could move forward with this change now. I just don't > > trust us to actually do it. > > > > I think a more plausible idea would be to identify all known flaky > > tests, fix those, and then remove the recheck. This way, we only have to > > deal with new flaky tests that get written in the future. I think we can > > deal with this situation more easily than suddenly going from green CI > > to red CI because of existing flaky tests. > > > > What about the following compromise instead? > > Identify all known flaky tests, fix the ones we can fix without too much > effort, tag the ones that we couldn't easily fix with a new keyword > (e.g., "unstable" or "flaky" or whatever), and only run recheck for > those. > > In support of this I did a test last night. I force pushed the commit > that removes the --recheck multiple times in my fork to trigger CI. > The GitHub action ran 23 times which means the unit and system tests > were ran 3 x 23 = 69 times. I collected the results and grouped them > by test and counted how many times they failed (disclaimer: it's not > completely fair because some tests run with different variations more > times than others but it should be a good enough first approximation). 
> I got: > > 40 ovn.at:30812 propagate Port_Binding.up to NB and OVS -- > ovn-northd > 34 ovn-northd.at:9487 LSP incremental processing -- ovn-northd > 8 ovn.at:14436 options:multiple requested-chassis for > logical port: unclaimed behavior -- ovn-northd > 7 ovn.at:16010 tug-of-war between two chassis for the same > port -- ovn-northd > 6 ovn.at:29369 nb_cfg timestamp -- ovn-northd > 5 ovn-northd.at:4507 northd ssl file change -- ovn-northd > 4 ovn-performance.at:227 ovn-controller incremental processing > 3 ovn.at:34900 Check default openflow flows -- ovn-northd > 3 ovn.at:34466 recomputes -- ovn-northd > 3 ovn.at:14284 options:multiple requested-chassis for > logical port -- ovn-northd > 2 ovn.at:7909 policy-based routing IPv6: 1 HVs, 3 LSs, 1 > lport/LS, 1 LR -- ovn-northd > 1 system-ovn.at:6525 egress qos -- ovn-northd > 1 system-ovn.at:11151 ovn mirroring -- ovn-northd > 1 ovn-northd.at:8809 Address set incremental processing -- > ovn-northd > 1 ovn-controller.at:702 ovn-controller - ovn action metering -- > ovn-northd > 1 ovn.at:29467 ARP replies for SNAT external ips -- > ovn-northd > 1 ovn.at:13171 1 LR with HA distributed router gateway port > -- ovn-northd > > The first two are probably real bugs and I reported them separately > yesterday [0] but the other 15 didn't fail that often. I would even > go one step further and ignore the ones that only failed once (it's > acceptable IMO every now and then a maintainer has to re-trigger CI). > > That would leave us with 9 tests that failed during this exercise and > need either to be fixed or to be tagged as "unstable": > 8 ovn.at:14436 options:multiple requested-chassis for > logical port: unclaimed behavior -- ovn-northd > 7 ovn.at:16010 tug-of-war between two chassis for the same > port -- ovn-northd > 6 ovn.at:29369 nb_cfg timestamp -- ovn-northd > 5 ovn-northd.at:4507 northd ssl file change -- ovn-northd > 4 ovn-performance.at:227 ovn-controller incremental processing > 3 ovn.at:34900 Check default openflow flows -- ovn-northd > 3 ovn.at:34466 recomputes -- ovn-northd > 3 ovn.at:14284 options:multiple requested-chassis for > logical port -- ovn-northd > 2 ovn.at:7909 policy-based routing IPv6: 1 HVs, 3 LSs, 1 > lport/LS, 1 LR -- ovn-northd > Just a data point, the above list is not too far from what Debian/Ubuntu currently list as flaky tests in the packaging [1]. Would you consider this compromise as a good alternative? > The existence of flaky test is an annoyance outside of the context of the upstream CI, so +1 on any effort to stop new ones from creeping in. 1: https://salsa.debian.org/openstack-team/third-party/ovn/-/tree/debian/sid/debian -- Frode Nordahl [0] https://mail.openvswitch.org/pipermail/ovs-dev/2023-June/405418.html > > > > > Everything below this sentence is wishful thinking and shouldn't get in > > the way of approving/rejecting this change. > > > > > > I also think that we need to identify the common causes of flaky tests, > > and take measures to prevent them from happening in the future. There > > are a few ways I can think of to accomplish this: > > > > 1) Ensure people performing code reviews are aware of these patterns and > > point them out during code review. > > > > 2) Automatically prevent known flaky patterns from existing in our code. > > For instance, we could introduce a build-time check that ensures that > > nobody attempts to sleep during a test to "let things settle." > > > > 3) Provide functions/macros to use in the testsuite that avoid > > potentially flaky behavior. 
As a hypothetical example, let's say that a > > common cause of flakes is not using "--wait=hv" or "--wait=sb" after a > > series of ovn-nbctl commands. We could provide a macro like: > > > > OVN_NBCTL([hv], [ > > lr-add ro0 > > ls-add sw0 > > ]) > > > > (That's probably bad syntax, but I think you get the idea) > > > > This would expand to something like: > > check ovn-nbctl lr-add ro0 > > check ovn-nbctl ls-add sw0 > > check ovn-nbctl --wait=hv sync > > > > If we combine this with the idea from (2), we could prevent all bare > > uses of `ovn-nbctl` in tests, thus preventing this flaky behavior from > > existing. > > > > 4) Identify commonly used patterns in tests and provide macros that > > essentially do the work for us. Some examples might be: > > > > a) Common logical network topologies could be set up for you by calling > > a single macro. > > b) Checking logical flows could have a macro that dumps the information > > you care about, sorts the output, anonymizes the table numbers, and > > eliminates odd spacing in the output. > > c) Ensuring TCP/UDP/IP connectivity between two endpoints could > > potentially be done with a macro or two, assuming that the traffic you > > are sending is not overly specific. > > > > The idea here is that by having commonly used macros that are > > bulletproof, we are less likely to introduce flakes by hand-coding the > > same scenarios. It also would likely reduce the code size of the > > testsuite, which is a nice bonus. > > > > 5) Enhanced debugability of tests would be great. There are too many > > times that I've looked at testsuite.log on a failing test and had no > > idea where to start looking for the failure. There are other times where > > it's clear what failed, but trying to figure out why is more difficult. > > Unfortunately, there isn't likely to be a magic cure-all for this > problem. > > > > I think all 5 points above are things that are nice to have regardless > of --recheck happening or not. > > Thanks, > Dumitru > > > On 6/8/23 10:04, Dumitru Ceara wrote: > >> If we want to catch new failures faster we have a better chance if CI > >> doesn't auto-retry (once). > >> > >> Signed-off-by: Dumitru Ceara <dceara@redhat.com> > >> --- > >> NOTE: Sending this as RFC to start the discussion with the community. > >> --- > >> .ci/linux-build.sh | 4 ++-- > >> 1 file changed, 2 insertions(+), 2 deletions(-) > >> > >> diff --git a/.ci/linux-build.sh b/.ci/linux-build.sh > >> index 907a0dc6c9..64f7a96d91 100755 > >> --- a/.ci/linux-build.sh > >> +++ b/.ci/linux-build.sh > >> @@ -66,7 +66,7 @@ function execute_tests() > >> export DISTCHECK_CONFIGURE_FLAGS="$OPTS" > >> if ! make distcheck CFLAGS="${COMMON_CFLAGS} ${OVN_CFLAGS}" $JOBS > \ > >> - TESTSUITEFLAGS="$JOBS $TEST_RANGE" RECHECK=yes > >> + TESTSUITEFLAGS="$JOBS $TEST_RANGE" > >> then > >> # testsuite.log is necessary for debugging. > >> cat */_build/sub/tests/testsuite.log > >> @@ -81,7 +81,7 @@ function execute_system_tests() > >> configure_ovn $OPTS > >> make $JOBS || { cat config.log; exit 1; } > >> - if ! sudo make $JOBS $type TESTSUITEFLAGS="$TEST_RANGE" > >> RECHECK=yes; then > >> + if ! sudo make $JOBS $type TESTSUITEFLAGS="$TEST_RANGE"; then > >> # $log_file is necessary for debugging. > >> cat tests/$log_file > >> exit 1 > > > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev >
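[Editor's note: one concrete shape for the "automatically prevent known flaky patterns" idea (Mark's point 2, the kind of guard that helps stop new flakes from creeping in) is a lint step in CI. The snippet below is only a sketch; no such make target or script exists today, and the pattern is deliberately naive.]

    # Hypothetical CI guard: reject bare "sleep N" calls in the Autotest
    # test definitions, which are a classic source of timing-dependent
    # flakes.  tests/*.at is where those definitions live.
    if grep -nE '\bsleep [0-9]+' tests/*.at; then
        echo "ERROR: bare 'sleep' found in tests; wait on a condition instead." >&2
        exit 1
    fi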
On 6/9/23 06:25, Dumitru Ceara wrote: > Hi Mark, > > Thanks for your feedback on this! > > On 6/8/23 21:58, Mark Michelson wrote: >> In theory, this idea is great. By removing the recheck, our CI will >> potentially turn red more often, resulting in us investigating our tests >> to resolve what makes them flaky. Eventually, we will resolve all of the >> flakes and CI will be green. From that point on, any time the CI turns >> red, it will immediately get our attention and we will fix whatever >> needs fixing. >> >> I have two main fears here: >> >> 1) If we don't fix the existing flaky tests quickly enough, it will be >> considered "normal" for CI to be red, meaning we are less likely to >> notice genuine failures when they happen. Worse, developers familiar >> with the flaky reputation of tests will just issue their own rechecks >> until the CI turns green. >> >> 2) Let's say we track down and fix all current flaky tests. Developers >> will soon write new tests, and those tests may end up being flaky. Then >> our CI will start turning red again due to test failures seemingly >> unrelated to code being committed. The result is the same as (1). >> >> By leaving the recheck intact, we are less likely to notice and fix >> flaky tests. That's also bad. >> >> I think that if we are going to remove the recheck now, then we have to >> be fully dedicated to eradicating all existing flaky tests. If we as a >> project can be dedicated to fixing flaky tests *immediately* when they >> are found, then we could move forward with this change now. I just don't >> trust us to actually do it. >> >> I think a more plausible idea would be to identify all known flaky >> tests, fix those, and then remove the recheck. This way, we only have to >> deal with new flaky tests that get written in the future. I think we can >> deal with this situation more easily than suddenly going from green CI >> to red CI because of existing flaky tests. >> > > What about the following compromise instead? > > Identify all known flaky tests, fix the ones we can fix without too much > effort, tag the ones that we couldn't easily fix with a new keyword > (e.g., "unstable" or "flaky" or whatever), and only run recheck for > those. > > In support of this I did a test last night. I force pushed the commit > that removes the --recheck multiple times in my fork to trigger CI. > The GitHub action ran 23 times which means the unit and system tests > were ran 3 x 23 = 69 times. I collected the results and grouped them > by test and counted how many times they failed (disclaimer: it's not > completely fair because some tests run with different variations more > times than others but it should be a good enough first approximation). 
> I got: > > 40 ovn.at:30812 propagate Port_Binding.up to NB and OVS -- ovn-northd > 34 ovn-northd.at:9487 LSP incremental processing -- ovn-northd > 8 ovn.at:14436 options:multiple requested-chassis for logical port: unclaimed behavior -- ovn-northd > 7 ovn.at:16010 tug-of-war between two chassis for the same port -- ovn-northd > 6 ovn.at:29369 nb_cfg timestamp -- ovn-northd > 5 ovn-northd.at:4507 northd ssl file change -- ovn-northd > 4 ovn-performance.at:227 ovn-controller incremental processing > 3 ovn.at:34900 Check default openflow flows -- ovn-northd > 3 ovn.at:34466 recomputes -- ovn-northd > 3 ovn.at:14284 options:multiple requested-chassis for logical port -- ovn-northd > 2 ovn.at:7909 policy-based routing IPv6: 1 HVs, 3 LSs, 1 lport/LS, 1 LR -- ovn-northd > 1 system-ovn.at:6525 egress qos -- ovn-northd > 1 system-ovn.at:11151 ovn mirroring -- ovn-northd > 1 ovn-northd.at:8809 Address set incremental processing -- ovn-northd > 1 ovn-controller.at:702 ovn-controller - ovn action metering -- ovn-northd > 1 ovn.at:29467 ARP replies for SNAT external ips -- ovn-northd > 1 ovn.at:13171 1 LR with HA distributed router gateway port -- ovn-northd > > The first two are probably real bugs and I reported them separately > yesterday [0] but the other 15 didn't fail that often. I would even > go one step further and ignore the ones that only failed once (it's > acceptable IMO every now and then a maintainer has to re-trigger CI). > > That would leave us with 9 tests that failed during this exercise and > need either to be fixed or to be tagged as "unstable": > 8 ovn.at:14436 options:multiple requested-chassis for logical port: unclaimed behavior -- ovn-northd > 7 ovn.at:16010 tug-of-war between two chassis for the same port -- ovn-northd > 6 ovn.at:29369 nb_cfg timestamp -- ovn-northd > 5 ovn-northd.at:4507 northd ssl file change -- ovn-northd > 4 ovn-performance.at:227 ovn-controller incremental processing > 3 ovn.at:34900 Check default openflow flows -- ovn-northd > 3 ovn.at:34466 recomputes -- ovn-northd > 3 ovn.at:14284 options:multiple requested-chassis for logical port -- ovn-northd > 2 ovn.at:7909 policy-based routing IPv6: 1 HVs, 3 LSs, 1 lport/LS, 1 LR -- ovn-northd > > Would you consider this compromise as a good alternative? I think this can be a good alternative. I think if we go with this, then we need to be vigilant and be willing to adjust if it turns out that we end up having more CI failures than expected. The other thing we have to avoid is haphazardly labeling tests as "unstable" instead of actually trying to fix them. And finally, if tests are marked as "unstable", we have to answer the question of whether they should even remain in the testsuite. If they're always being skipped and we never intend to fix the tests, then they probably should be removed and replaced with something else. > > [0] https://mail.openvswitch.org/pipermail/ovs-dev/2023-June/405418.html > >> >> Everything below this sentence is wishful thinking and shouldn't get in >> the way of approving/rejecting this change. >> >> >> I also think that we need to identify the common causes of flaky tests, >> and take measures to prevent them from happening in the future. There >> are a few ways I can think of to accomplish this: >> >> 1) Ensure people performing code reviews are aware of these patterns and >> point them out during code review. >> >> 2) Automatically prevent known flaky patterns from existing in our code. 
>> For instance, we could introduce a build-time check that ensures that >> nobody attempts to sleep during a test to "let things settle." >> >> 3) Provide functions/macros to use in the testsuite that avoid >> potentially flaky behavior. As a hypothetical example, let's say that a >> common cause of flakes is not using "--wait=hv" or "--wait=sb" after a >> series of ovn-nbctl commands. We could provide a macro like: >> >> OVN_NBCTL([hv], [ >> lr-add ro0 >> ls-add sw0 >> ]) >> >> (That's probably bad syntax, but I think you get the idea) >> >> This would expand to something like: >> check ovn-nbctl lr-add ro0 >> check ovn-nbctl ls-add sw0 >> check ovn-nbctl --wait=hv sync >> >> If we combine this with the idea from (2), we could prevent all bare >> uses of `ovn-nbctl` in tests, thus preventing this flaky behavior from >> existing. >> >> 4) Identify commonly used patterns in tests and provide macros that >> essentially do the work for us. Some examples might be: >> >> a) Common logical network topologies could be set up for you by calling >> a single macro. >> b) Checking logical flows could have a macro that dumps the information >> you care about, sorts the output, anonymizes the table numbers, and >> eliminates odd spacing in the output. >> c) Ensuring TCP/UDP/IP connectivity between two endpoints could >> potentially be done with a macro or two, assuming that the traffic you >> are sending is not overly specific. >> >> The idea here is that by having commonly used macros that are >> bulletproof, we are less likely to introduce flakes by hand-coding the >> same scenarios. It also would likely reduce the code size of the >> testsuite, which is a nice bonus. >> >> 5) Enhanced debugability of tests would be great. There are too many >> times that I've looked at testsuite.log on a failing test and had no >> idea where to start looking for the failure. There are other times where >> it's clear what failed, but trying to figure out why is more difficult. >> Unfortunately, there isn't likely to be a magic cure-all for this problem. >> > > I think all 5 points above are things that are nice to have regardless > of --recheck happening or not. > > Thanks, > Dumitru > >> On 6/8/23 10:04, Dumitru Ceara wrote: >>> If we want to catch new failures faster we have a better chance if CI >>> doesn't auto-retry (once). >>> >>> Signed-off-by: Dumitru Ceara <dceara@redhat.com> >>> --- >>> NOTE: Sending this as RFC to start the discussion with the community. >>> --- >>> .ci/linux-build.sh | 4 ++-- >>> 1 file changed, 2 insertions(+), 2 deletions(-) >>> >>> diff --git a/.ci/linux-build.sh b/.ci/linux-build.sh >>> index 907a0dc6c9..64f7a96d91 100755 >>> --- a/.ci/linux-build.sh >>> +++ b/.ci/linux-build.sh >>> @@ -66,7 +66,7 @@ function execute_tests() >>> export DISTCHECK_CONFIGURE_FLAGS="$OPTS" >>> if ! make distcheck CFLAGS="${COMMON_CFLAGS} ${OVN_CFLAGS}" $JOBS \ >>> - TESTSUITEFLAGS="$JOBS $TEST_RANGE" RECHECK=yes >>> + TESTSUITEFLAGS="$JOBS $TEST_RANGE" >>> then >>> # testsuite.log is necessary for debugging. >>> cat */_build/sub/tests/testsuite.log >>> @@ -81,7 +81,7 @@ function execute_system_tests() >>> configure_ovn $OPTS >>> make $JOBS || { cat config.log; exit 1; } >>> - if ! sudo make $JOBS $type TESTSUITEFLAGS="$TEST_RANGE" >>> RECHECK=yes; then >>> + if ! sudo make $JOBS $type TESTSUITEFLAGS="$TEST_RANGE"; then >>> # $log_file is necessary for debugging. >>> cat tests/$log_file >>> exit 1 >> >
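[Editor's note: for illustration, the compromise discussed above could end up looking roughly like this in .ci/linux-build.sh, reusing the script's existing $JOBS, $type, $TEST_RANGE and $log_file variables. The "unstable" keyword and the use of Autotest's "-k" keyword selection with "!" negation are assumptions here; the patch Dumitru posts later in the thread is the authoritative implementation.]

    # Sketch only: the "unstable" keyword does not exist yet.
    # Tests not tagged "unstable" must pass on the first attempt.
    if ! sudo make $JOBS $type TESTSUITEFLAGS="$TEST_RANGE -k !unstable"; then
        cat tests/$log_file
        exit 1
    fi

    # Tests tagged "unstable" keep the single automatic retry.
    if ! sudo make $JOBS $type TESTSUITEFLAGS="$TEST_RANGE -k unstable" RECHECK=yes; then
        cat tests/$log_file
        exit 1
    fi

The effect is that a failure in an untagged test turns CI red immediately, while tests explicitly tagged as unstable keep the one-shot recheck.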
On Fri, Jun 9, 2023 at 11:53 AM Mark Michelson <mmichels@redhat.com> wrote: > > On 6/9/23 06:25, Dumitru Ceara wrote: > > Hi Mark, > > > > Thanks for your feedback on this! > > > > On 6/8/23 21:58, Mark Michelson wrote: > >> In theory, this idea is great. By removing the recheck, our CI will > >> potentially turn red more often, resulting in us investigating our tests > >> to resolve what makes them flaky. Eventually, we will resolve all of the > >> flakes and CI will be green. From that point on, any time the CI turns > >> red, it will immediately get our attention and we will fix whatever > >> needs fixing. > >> > >> I have two main fears here: > >> > >> 1) If we don't fix the existing flaky tests quickly enough, it will be > >> considered "normal" for CI to be red, meaning we are less likely to > >> notice genuine failures when they happen. Worse, developers familiar > >> with the flaky reputation of tests will just issue their own rechecks > >> until the CI turns green. > >> > >> 2) Let's say we track down and fix all current flaky tests. Developers > >> will soon write new tests, and those tests may end up being flaky. Then > >> our CI will start turning red again due to test failures seemingly > >> unrelated to code being committed. The result is the same as (1). > >> > >> By leaving the recheck intact, we are less likely to notice and fix > >> flaky tests. That's also bad. > >> > >> I think that if we are going to remove the recheck now, then we have to > >> be fully dedicated to eradicating all existing flaky tests. If we as a > >> project can be dedicated to fixing flaky tests *immediately* when they > >> are found, then we could move forward with this change now. I just don't > >> trust us to actually do it. > >> > >> I think a more plausible idea would be to identify all known flaky > >> tests, fix those, and then remove the recheck. This way, we only have to > >> deal with new flaky tests that get written in the future. I think we can > >> deal with this situation more easily than suddenly going from green CI > >> to red CI because of existing flaky tests. > >> > > > > What about the following compromise instead? > > > > Identify all known flaky tests, fix the ones we can fix without too much > > effort, tag the ones that we couldn't easily fix with a new keyword > > (e.g., "unstable" or "flaky" or whatever), and only run recheck for > > those. > > > > In support of this I did a test last night. I force pushed the commit > > that removes the --recheck multiple times in my fork to trigger CI. > > The GitHub action ran 23 times which means the unit and system tests > > were ran 3 x 23 = 69 times. I collected the results and grouped them > > by test and counted how many times they failed (disclaimer: it's not > > completely fair because some tests run with different variations more > > times than others but it should be a good enough first approximation). 
> > I got: > > > > 40 ovn.at:30812 propagate Port_Binding.up to NB and OVS -- ovn-northd > > 34 ovn-northd.at:9487 LSP incremental processing -- ovn-northd > > 8 ovn.at:14436 options:multiple requested-chassis for logical port: unclaimed behavior -- ovn-northd > > 7 ovn.at:16010 tug-of-war between two chassis for the same port -- ovn-northd > > 6 ovn.at:29369 nb_cfg timestamp -- ovn-northd > > 5 ovn-northd.at:4507 northd ssl file change -- ovn-northd > > 4 ovn-performance.at:227 ovn-controller incremental processing > > 3 ovn.at:34900 Check default openflow flows -- ovn-northd > > 3 ovn.at:34466 recomputes -- ovn-northd > > 3 ovn.at:14284 options:multiple requested-chassis for logical port -- ovn-northd > > 2 ovn.at:7909 policy-based routing IPv6: 1 HVs, 3 LSs, 1 lport/LS, 1 LR -- ovn-northd > > 1 system-ovn.at:6525 egress qos -- ovn-northd > > 1 system-ovn.at:11151 ovn mirroring -- ovn-northd > > 1 ovn-northd.at:8809 Address set incremental processing -- ovn-northd > > 1 ovn-controller.at:702 ovn-controller - ovn action metering -- ovn-northd > > 1 ovn.at:29467 ARP replies for SNAT external ips -- ovn-northd > > 1 ovn.at:13171 1 LR with HA distributed router gateway port -- ovn-northd > > > > The first two are probably real bugs and I reported them separately The first is a bug, and here is the fix: https://patchwork.ozlabs.org/project/ovn/list/?series=359066 The second may be categorized as unstable test, and here is an attempt to make it stable (not sure if it can be 100% stable because performance counter related checks are always tricky): https://patchwork.ozlabs.org/project/ovn/patch/20230609194337.3538411-1-hzhou@ovn.org/ > > yesterday [0] but the other 15 didn't fail that often. I would even > > go one step further and ignore the ones that only failed once (it's > > acceptable IMO every now and then a maintainer has to re-trigger CI). > > > > That would leave us with 9 tests that failed during this exercise and > > need either to be fixed or to be tagged as "unstable": > > 8 ovn.at:14436 options:multiple requested-chassis for logical port: unclaimed behavior -- ovn-northd > > 7 ovn.at:16010 tug-of-war between two chassis for the same port -- ovn-northd > > 6 ovn.at:29369 nb_cfg timestamp -- ovn-northd > > 5 ovn-northd.at:4507 northd ssl file change -- ovn-northd > > 4 ovn-performance.at:227 ovn-controller incremental processing > > 3 ovn.at:34900 Check default openflow flows -- ovn-northd > > 3 ovn.at:34466 recomputes -- ovn-northd > > 3 ovn.at:14284 options:multiple requested-chassis for logical port -- ovn-northd > > 2 ovn.at:7909 policy-based routing IPv6: 1 HVs, 3 LSs, 1 lport/LS, 1 LR -- ovn-northd > > > > Would you consider this compromise as a good alternative? > +1 for this good alternative. > I think this can be a good alternative. I think if we go with this, then > we need to be vigilant and be willing to adjust if it turns out that we > end up having more CI failures than expected. > > The other thing we have to avoid is haphazardly labeling tests as > "unstable" instead of actually trying to fix them. > > And finally, if tests are marked as "unstable", we have to answer the > question of whether they should even remain in the testsuite. If they're > always being skipped and we never intend to fix the tests, then they > probably should be removed and replaced with something else. 
> I'd like to interpret "unstable" as "should pass in most cases but may occasionally fail, and >90% (or even higher) chance a retry will pass if it fails in the first attempt", rather than "very likely to fail". I think at least today we should not have test cases that fail more than this in CI. Of course if there are any new "very likely to fail" test cases we should treat it as CI broken and fix them immediately (if unfortunately wasn't caught before merging for whatever reason, which shouldn't happen in most cases). Thanks, Han > > > > [0] https://mail.openvswitch.org/pipermail/ovs-dev/2023-June/405418.html > > > >> > >> Everything below this sentence is wishful thinking and shouldn't get in > >> the way of approving/rejecting this change. > >> > >> > >> I also think that we need to identify the common causes of flaky tests, > >> and take measures to prevent them from happening in the future. There > >> are a few ways I can think of to accomplish this: > >> > >> 1) Ensure people performing code reviews are aware of these patterns and > >> point them out during code review. > >> > >> 2) Automatically prevent known flaky patterns from existing in our code. > >> For instance, we could introduce a build-time check that ensures that > >> nobody attempts to sleep during a test to "let things settle." > >> > >> 3) Provide functions/macros to use in the testsuite that avoid > >> potentially flaky behavior. As a hypothetical example, let's say that a > >> common cause of flakes is not using "--wait=hv" or "--wait=sb" after a > >> series of ovn-nbctl commands. We could provide a macro like: > >> > >> OVN_NBCTL([hv], [ > >> lr-add ro0 > >> ls-add sw0 > >> ]) > >> > >> (That's probably bad syntax, but I think you get the idea) > >> > >> This would expand to something like: > >> check ovn-nbctl lr-add ro0 > >> check ovn-nbctl ls-add sw0 > >> check ovn-nbctl --wait=hv sync > >> > >> If we combine this with the idea from (2), we could prevent all bare > >> uses of `ovn-nbctl` in tests, thus preventing this flaky behavior from > >> existing. > >> > >> 4) Identify commonly used patterns in tests and provide macros that > >> essentially do the work for us. Some examples might be: > >> > >> a) Common logical network topologies could be set up for you by calling > >> a single macro. > >> b) Checking logical flows could have a macro that dumps the information > >> you care about, sorts the output, anonymizes the table numbers, and > >> eliminates odd spacing in the output. > >> c) Ensuring TCP/UDP/IP connectivity between two endpoints could > >> potentially be done with a macro or two, assuming that the traffic you > >> are sending is not overly specific. > >> > >> The idea here is that by having commonly used macros that are > >> bulletproof, we are less likely to introduce flakes by hand-coding the > >> same scenarios. It also would likely reduce the code size of the > >> testsuite, which is a nice bonus. > >> > >> 5) Enhanced debugability of tests would be great. There are too many > >> times that I've looked at testsuite.log on a failing test and had no > >> idea where to start looking for the failure. There are other times where > >> it's clear what failed, but trying to figure out why is more difficult. > >> Unfortunately, there isn't likely to be a magic cure-all for this problem. > >> > > > > I think all 5 points above are things that are nice to have regardless > > of --recheck happening or not. 
> > > > Thanks, > > Dumitru > > > >> On 6/8/23 10:04, Dumitru Ceara wrote: > >>> If we want to catch new failures faster we have a better chance if CI > >>> doesn't auto-retry (once). > >>> > >>> Signed-off-by: Dumitru Ceara <dceara@redhat.com> > >>> --- > >>> NOTE: Sending this as RFC to start the discussion with the community. > >>> --- > >>> .ci/linux-build.sh | 4 ++-- > >>> 1 file changed, 2 insertions(+), 2 deletions(-) > >>> > >>> diff --git a/.ci/linux-build.sh b/.ci/linux-build.sh > >>> index 907a0dc6c9..64f7a96d91 100755 > >>> --- a/.ci/linux-build.sh > >>> +++ b/.ci/linux-build.sh > >>> @@ -66,7 +66,7 @@ function execute_tests() > >>> export DISTCHECK_CONFIGURE_FLAGS="$OPTS" > >>> if ! make distcheck CFLAGS="${COMMON_CFLAGS} ${OVN_CFLAGS}" $JOBS \ > >>> - TESTSUITEFLAGS="$JOBS $TEST_RANGE" RECHECK=yes > >>> + TESTSUITEFLAGS="$JOBS $TEST_RANGE" > >>> then > >>> # testsuite.log is necessary for debugging. > >>> cat */_build/sub/tests/testsuite.log > >>> @@ -81,7 +81,7 @@ function execute_system_tests() > >>> configure_ovn $OPTS > >>> make $JOBS || { cat config.log; exit 1; } > >>> - if ! sudo make $JOBS $type TESTSUITEFLAGS="$TEST_RANGE" > >>> RECHECK=yes; then > >>> + if ! sudo make $JOBS $type TESTSUITEFLAGS="$TEST_RANGE"; then > >>> # $log_file is necessary for debugging. > >>> cat tests/$log_file > >>> exit 1 > >> > > > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
On 6/9/23 22:21, Han Zhou wrote: > On Fri, Jun 9, 2023 at 11:53 AM Mark Michelson <mmichels@redhat.com> wrote: >> >> On 6/9/23 06:25, Dumitru Ceara wrote: >>> Hi Mark, >>> >>> Thanks for your feedback on this! >>> >>> On 6/8/23 21:58, Mark Michelson wrote: >>>> In theory, this idea is great. By removing the recheck, our CI will >>>> potentially turn red more often, resulting in us investigating our > tests >>>> to resolve what makes them flaky. Eventually, we will resolve all of > the >>>> flakes and CI will be green. From that point on, any time the CI turns >>>> red, it will immediately get our attention and we will fix whatever >>>> needs fixing. >>>> >>>> I have two main fears here: >>>> >>>> 1) If we don't fix the existing flaky tests quickly enough, it will be >>>> considered "normal" for CI to be red, meaning we are less likely to >>>> notice genuine failures when they happen. Worse, developers familiar >>>> with the flaky reputation of tests will just issue their own rechecks >>>> until the CI turns green. >>>> >>>> 2) Let's say we track down and fix all current flaky tests. Developers >>>> will soon write new tests, and those tests may end up being flaky. Then >>>> our CI will start turning red again due to test failures seemingly >>>> unrelated to code being committed. The result is the same as (1). >>>> >>>> By leaving the recheck intact, we are less likely to notice and fix >>>> flaky tests. That's also bad. >>>> >>>> I think that if we are going to remove the recheck now, then we have to >>>> be fully dedicated to eradicating all existing flaky tests. If we as a >>>> project can be dedicated to fixing flaky tests *immediately* when they >>>> are found, then we could move forward with this change now. I just > don't >>>> trust us to actually do it. >>>> >>>> I think a more plausible idea would be to identify all known flaky >>>> tests, fix those, and then remove the recheck. This way, we only have > to >>>> deal with new flaky tests that get written in the future. I think we > can >>>> deal with this situation more easily than suddenly going from green CI >>>> to red CI because of existing flaky tests. >>>> >>> >>> What about the following compromise instead? >>> >>> Identify all known flaky tests, fix the ones we can fix without too much >>> effort, tag the ones that we couldn't easily fix with a new keyword >>> (e.g., "unstable" or "flaky" or whatever), and only run recheck for >>> those. >>> >>> In support of this I did a test last night. I force pushed the commit >>> that removes the --recheck multiple times in my fork to trigger CI. >>> The GitHub action ran 23 times which means the unit and system tests >>> were ran 3 x 23 = 69 times. I collected the results and grouped them >>> by test and counted how many times they failed (disclaimer: it's not >>> completely fair because some tests run with different variations more >>> times than others but it should be a good enough first approximation). 
>>> I got: >>> >>> 40 ovn.at:30812 propagate Port_Binding.up to NB and OVS -- > ovn-northd >>> 34 ovn-northd.at:9487 LSP incremental processing -- ovn-northd >>> 8 ovn.at:14436 options:multiple requested-chassis for > logical port: unclaimed behavior -- ovn-northd >>> 7 ovn.at:16010 tug-of-war between two chassis for the > same port -- ovn-northd >>> 6 ovn.at:29369 nb_cfg timestamp -- ovn-northd >>> 5 ovn-northd.at:4507 northd ssl file change -- ovn-northd >>> 4 ovn-performance.at:227 ovn-controller incremental processing >>> 3 ovn.at:34900 Check default openflow flows -- ovn-northd >>> 3 ovn.at:34466 recomputes -- ovn-northd >>> 3 ovn.at:14284 options:multiple requested-chassis for > logical port -- ovn-northd >>> 2 ovn.at:7909 policy-based routing IPv6: 1 HVs, 3 LSs, 1 > lport/LS, 1 LR -- ovn-northd >>> 1 system-ovn.at:6525 egress qos -- ovn-northd >>> 1 system-ovn.at:11151 ovn mirroring -- ovn-northd >>> 1 ovn-northd.at:8809 Address set incremental processing -- > ovn-northd >>> 1 ovn-controller.at:702 ovn-controller - ovn action metering -- > ovn-northd >>> 1 ovn.at:29467 ARP replies for SNAT external ips -- > ovn-northd >>> 1 ovn.at:13171 1 LR with HA distributed router gateway > port -- ovn-northd >>> >>> The first two are probably real bugs and I reported them separately > > The first is a bug, and here is the fix: > https://patchwork.ozlabs.org/project/ovn/list/?series=359066 > > The second may be categorized as unstable test, and here is an attempt to > make it stable (not sure if it can be 100% stable because performance > counter related checks are always tricky): > https://patchwork.ozlabs.org/project/ovn/patch/20230609194337.3538411-1-hzhou@ovn.org/ > >>> yesterday [0] but the other 15 didn't fail that often. I would even >>> go one step further and ignore the ones that only failed once (it's >>> acceptable IMO every now and then a maintainer has to re-trigger CI). >>> >>> That would leave us with 9 tests that failed during this exercise and >>> need either to be fixed or to be tagged as "unstable": >>> 8 ovn.at:14436 options:multiple requested-chassis for > logical port: unclaimed behavior -- ovn-northd >>> 7 ovn.at:16010 tug-of-war between two chassis for the > same port -- ovn-northd >>> 6 ovn.at:29369 nb_cfg timestamp -- ovn-northd >>> 5 ovn-northd.at:4507 northd ssl file change -- ovn-northd >>> 4 ovn-performance.at:227 ovn-controller incremental processing >>> 3 ovn.at:34900 Check default openflow flows -- ovn-northd >>> 3 ovn.at:34466 recomputes -- ovn-northd >>> 3 ovn.at:14284 options:multiple requested-chassis for > logical port -- ovn-northd >>> 2 ovn.at:7909 policy-based routing IPv6: 1 HVs, 3 LSs, 1 > lport/LS, 1 LR -- ovn-northd >>> >>> Would you consider this compromise as a good alternative? >> > > +1 for this good alternative. > >> I think this can be a good alternative. I think if we go with this, then >> we need to be vigilant and be willing to adjust if it turns out that we >> end up having more CI failures than expected. >> >> The other thing we have to avoid is haphazardly labeling tests as >> "unstable" instead of actually trying to fix them. >> >> And finally, if tests are marked as "unstable", we have to answer the >> question of whether they should even remain in the testsuite. If they're >> always being skipped and we never intend to fix the tests, then they >> probably should be removed and replaced with something else. 
>>
> I'd like to interpret "unstable" as "should pass in most cases but may
> occasionally fail, and >90% (or even higher) chance a retry will pass if it
> fails in the first attempt", rather than "very likely to fail".
> I think at least today we should not have test cases that fail more than
> this in CI.
> Of course if there are any new "very likely to fail" test cases we should
> treat it as CI broken and fix them immediately (if unfortunately wasn't
> caught before merging for whatever reason, which shouldn't happen in most
> cases).
>

I posted a patch to implement the alternative discussed above:
https://patchwork.ozlabs.org/project/ovn/patch/20230705170813.1199533-1-dceara@redhat.com/

Regards,
Dumitru
diff --git a/.ci/linux-build.sh b/.ci/linux-build.sh
index 907a0dc6c9..64f7a96d91 100755
--- a/.ci/linux-build.sh
+++ b/.ci/linux-build.sh
@@ -66,7 +66,7 @@ function execute_tests()
 
     export DISTCHECK_CONFIGURE_FLAGS="$OPTS"
     if ! make distcheck CFLAGS="${COMMON_CFLAGS} ${OVN_CFLAGS}" $JOBS \
-        TESTSUITEFLAGS="$JOBS $TEST_RANGE" RECHECK=yes
+        TESTSUITEFLAGS="$JOBS $TEST_RANGE"
     then
         # testsuite.log is necessary for debugging.
         cat */_build/sub/tests/testsuite.log
@@ -81,7 +81,7 @@ function execute_system_tests()
 
     configure_ovn $OPTS
     make $JOBS || { cat config.log; exit 1; }
-    if ! sudo make $JOBS $type TESTSUITEFLAGS="$TEST_RANGE" RECHECK=yes; then
+    if ! sudo make $JOBS $type TESTSUITEFLAGS="$TEST_RANGE"; then
         # $log_file is necessary for debugging.
         cat tests/$log_file
         exit 1
If we want to catch new failures faster we have a better chance if CI
doesn't auto-retry (once).

Signed-off-by: Dumitru Ceara <dceara@redhat.com>
---
NOTE: Sending this as RFC to start the discussion with the community.
---
 .ci/linux-build.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)