Message ID | 20240301113744.3684763-1-i.maximets@ovn.org |
---|---|
State | Accepted |
Commit | 99c86c6c46632e6adfd699611136ea6b26d1e291 |
Delegated to: | Ilya Maximets |
Headers | show |
Series | [ovs-dev] github: Temporarily disable SNAT with exhaustion system test. | expand |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | success | apply and check: success |
ovsrobot/github-robot-_Build_and_Test | success | github build: passed |
ovsrobot/intel-ovs-compilation | success | test: success |
On 3/1/24 12:37, Ilya Maximets wrote: > With a new runner update, GitHub Actions had a kernel update. > And it seems like something changed between kernels 6.2 and 6.5 > so this test now fails very frequently. > > I can reproduce the same issue on RHEL 9, and I can't reproduce > it on Ubuntu 23.04 (kernel 6.2). > > The test is creating a NAT with a single address+port pair in > an attempt to simulate an address space exhaustion. It is > expected that a first connection with wget leaves a conntrack > entry in a TIME_WAIT state and the second wget should fail > as long as this entry remains, because the only available > address+port pair is already taken. > > However, for some reason, very frequently (not always!) the > second connection replaces the first conntrack entry with a > new one and connection succeeds. There is still only one > connection in the conntrack at any single moment in time, so > there is seemingly no issue with the NAT, but the behavior > is unexpected and the test fails. > > Disable the test in CI until we figure out how to fix the > kernel (if it is a kernel bug) or the test. Marcelo pointed out the following change that went into 6.5 kernel: https://lore.kernel.org/netdev/20230626064749.75525-7-pablo@netfilter.org/ It looks like this test is not going to work anymore as new connections are now allowed to evict closing ones on collisions. We need to re-think the test, i.e. we have to keep the first connection open while we're trying to make the second one, so it is not evicted. > > Signed-off-by: Ilya Maximets <i.maximets@ovn.org> > --- > tests/system-traffic.at | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/tests/system-traffic.at b/tests/system-traffic.at > index 98e494abf..07d09b912 100644 > --- a/tests/system-traffic.at > +++ b/tests/system-traffic.at > @@ -6388,6 +6388,7 @@ OVS_TRAFFIC_VSWITCHD_STOP > AT_CLEANUP > > AT_SETUP([conntrack - SNAT with port range with exhaustion]) > +OVS_CHECK_GITHUB_ACTION() > CHECK_CONNTRACK() > CHECK_CONNTRACK_NAT() > OVS_TRAFFIC_VSWITCHD_START()
On Fri, Mar 01, 2024 at 12:37:36PM +0100, Ilya Maximets wrote: > With a new runner update, GitHub Actions had a kernel update. > And it seems like something changed between kernels 6.2 and 6.5 > so this test now fails very frequently. > > I can reproduce the same issue on RHEL 9, and I can't reproduce > it on Ubuntu 23.04 (kernel 6.2). > > The test is creating a NAT with a single address+port pair in > an attempt to simulate an address space exhaustion. It is > expected that a first connection with wget leaves a conntrack > entry in a TIME_WAIT state and the second wget should fail > as long as this entry remains, because the only available > address+port pair is already taken. > > However, for some reason, very frequently (not always!) the > second connection replaces the first conntrack entry with a > new one and connection succeeds. There is still only one > connection in the conntrack at any single moment in time, so > there is seemingly no issue with the NAT, but the behavior > is unexpected and the test fails. > > Disable the test in CI until we figure out how to fix the > kernel (if it is a kernel bug) or the test. > > Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Simon Horman <horms@ovn.org>
Ilya Maximets <i.maximets@ovn.org> writes: > With a new runner update, GitHub Actions had a kernel update. > And it seems like something changed between kernels 6.2 and 6.5 > so this test now fails very frequently. > > I can reproduce the same issue on RHEL 9, and I can't reproduce > it on Ubuntu 23.04 (kernel 6.2). > > The test is creating a NAT with a single address+port pair in > an attempt to simulate an address space exhaustion. It is > expected that a first connection with wget leaves a conntrack > entry in a TIME_WAIT state and the second wget should fail > as long as this entry remains, because the only available > address+port pair is already taken. > > However, for some reason, very frequently (not always!) the > second connection replaces the first conntrack entry with a > new one and connection succeeds. There is still only one > connection in the conntrack at any single moment in time, so > there is seemingly no issue with the NAT, but the behavior > is unexpected and the test fails. > > Disable the test in CI until we figure out how to fix the > kernel (if it is a kernel bug) or the test. > > Signed-off-by: Ilya Maximets <i.maximets@ovn.org> > --- Acked-by: Paolo Valerio <pvalerio@redhat.com>
On 1 Mar 2024, at 13:42, Ilya Maximets wrote: > On 3/1/24 12:37, Ilya Maximets wrote: >> With a new runner update, GitHub Actions had a kernel update. >> And it seems like something changed between kernels 6.2 and 6.5 >> so this test now fails very frequently. >> >> I can reproduce the same issue on RHEL 9, and I can't reproduce >> it on Ubuntu 23.04 (kernel 6.2). >> >> The test is creating a NAT with a single address+port pair in >> an attempt to simulate an address space exhaustion. It is >> expected that a first connection with wget leaves a conntrack >> entry in a TIME_WAIT state and the second wget should fail >> as long as this entry remains, because the only available >> address+port pair is already taken. >> >> However, for some reason, very frequently (not always!) the >> second connection replaces the first conntrack entry with a >> new one and connection succeeds. There is still only one >> connection in the conntrack at any single moment in time, so >> there is seemingly no issue with the NAT, but the behavior >> is unexpected and the test fails. >> >> Disable the test in CI until we figure out how to fix the >> kernel (if it is a kernel bug) or the test. > > Marcelo pointed out the following change that went into 6.5 kernel: > https://lore.kernel.org/netdev/20230626064749.75525-7-pablo@netfilter.org/ > > It looks like this test is not going to work anymore as new connections > are now allowed to evict closing ones on collisions. > > We need to re-think the test, i.e. we have to keep the first connection > open while we're trying to make the second one, so it is not evicted. Paolo, can you take a look at fixing the test case? Do you want an FDP issue for this? >> >> Signed-off-by: Ilya Maximets <i.maximets@ovn.org> >> --- The change looks good for now. Acked-by: Eelco Chaudron <echaudro@redhat.com> >> tests/system-traffic.at | 1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/tests/system-traffic.at b/tests/system-traffic.at >> index 98e494abf..07d09b912 100644 >> --- a/tests/system-traffic.at >> +++ b/tests/system-traffic.at >> @@ -6388,6 +6388,7 @@ OVS_TRAFFIC_VSWITCHD_STOP >> AT_CLEANUP >> >> AT_SETUP([conntrack - SNAT with port range with exhaustion]) >> +OVS_CHECK_GITHUB_ACTION() >> CHECK_CONNTRACK() >> CHECK_CONNTRACK_NAT() >> OVS_TRAFFIC_VSWITCHD_START()
On 3/1/24 16:18, Paolo Valerio wrote: > Ilya Maximets <i.maximets@ovn.org> writes: > >> With a new runner update, GitHub Actions had a kernel update. >> And it seems like something changed between kernels 6.2 and 6.5 >> so this test now fails very frequently. >> >> I can reproduce the same issue on RHEL 9, and I can't reproduce >> it on Ubuntu 23.04 (kernel 6.2). >> >> The test is creating a NAT with a single address+port pair in >> an attempt to simulate an address space exhaustion. It is >> expected that a first connection with wget leaves a conntrack >> entry in a TIME_WAIT state and the second wget should fail >> as long as this entry remains, because the only available >> address+port pair is already taken. >> >> However, for some reason, very frequently (not always!) the >> second connection replaces the first conntrack entry with a >> new one and connection succeeds. There is still only one >> connection in the conntrack at any single moment in time, so >> there is seemingly no issue with the NAT, but the behavior >> is unexpected and the test fails. >> >> Disable the test in CI until we figure out how to fix the >> kernel (if it is a kernel bug) or the test. >> >> Signed-off-by: Ilya Maximets <i.maximets@ovn.org> >> --- > > Acked-by: Paolo Valerio <pvalerio@redhat.com> > Thanks, Simon, Paolo and Eelco! I updated the commit message with the reference to the kernel change. With that, applied and backported to 3.3 to stop further CI failures. Best regards, Ilya Maximets.
diff --git a/tests/system-traffic.at b/tests/system-traffic.at index 98e494abf..07d09b912 100644 --- a/tests/system-traffic.at +++ b/tests/system-traffic.at @@ -6388,6 +6388,7 @@ OVS_TRAFFIC_VSWITCHD_STOP AT_CLEANUP AT_SETUP([conntrack - SNAT with port range with exhaustion]) +OVS_CHECK_GITHUB_ACTION() CHECK_CONNTRACK() CHECK_CONNTRACK_NAT() OVS_TRAFFIC_VSWITCHD_START()
With a new runner update, GitHub Actions had a kernel update. And it seems like something changed between kernels 6.2 and 6.5 so this test now fails very frequently. I can reproduce the same issue on RHEL 9, and I can't reproduce it on Ubuntu 23.04 (kernel 6.2). The test is creating a NAT with a single address+port pair in an attempt to simulate an address space exhaustion. It is expected that a first connection with wget leaves a conntrack entry in a TIME_WAIT state and the second wget should fail as long as this entry remains, because the only available address+port pair is already taken. However, for some reason, very frequently (not always!) the second connection replaces the first conntrack entry with a new one and connection succeeds. There is still only one connection in the conntrack at any single moment in time, so there is seemingly no issue with the NAT, but the behavior is unexpected and the test fails. Disable the test in CI until we figure out how to fix the kernel (if it is a kernel bug) or the test. Signed-off-by: Ilya Maximets <i.maximets@ovn.org> --- tests/system-traffic.at | 1 + 1 file changed, 1 insertion(+)