Message ID | 20211201111200.424375-1-snemec@redhat.com |
---|---|
State | Accepted |
Delegated to: | Pablo Neira |
Headers | show |
Series | [nft] tests: shell: better parameters for the interval stack overflow test | expand |
Štěpán Němec <snemec@redhat.com> wrote: > Wider testing has shown that 128 kB stack is too low (e.g. for systems > with 64 kB page size), leading to false failures in some environments. We could try to get rid of large on-stack allocations and always malloc instead.
On Wed, 01 Dec 2021 12:26:14 +0100 Florian Westphal wrote: > Štěpán Němec <snemec@redhat.com> wrote: >> Wider testing has shown that 128 kB stack is too low (e.g. for systems >> with 64 kB page size), leading to false failures in some environments. > > We could try to get rid of large on-stack allocations and always malloc > instead. [I think this might rather be a topic for discussion with other developers and maintainers, but given that I was alone in the To: header, here's my 0.02 CZK, to not leave this hanging:] My patch only addresses the regression test for one case where stack allocation lead to runaway stack. Looking for and fixing other such code paths (if any) does sound like a good idea to me. If you meant an effort to decrease nftables stack usage in general, I think I have neither the experience nor the expertise to provide an informed opinion on that. On a superficial look, nft stack size doesn't seem small, but not outrageous, either (the below is with nftables-0.9.8-9.el9.x86_64 (RHEL), but I get about the same numbers on Arch with the 1.0.1 release; both nftables and iptables rule sets were empty, but for nftables stack usage that doesn't seem to make any difference; for iptables-save it does: 51248 B with a nonempty rule set): # memusage nft list ruleset 2>&1 | grep -o 'stack peak: .*' stack peak: 98400 Same thing with iptables-save: 38352 bash -c exit: 5456 python -Sc 'exit()': 23360 python -Sm os: 70448 vim -c q: 105872 Thanks, Štěpán
On Wed, Dec 01, 2021 at 12:12:00PM +0100, Štěpán Němec wrote: > Wider testing has shown that 128 kB stack is too low (e.g. for systems > with 64 kB page size), leading to false failures in some environments. > > Based on results from a matrix of RHEL 8 and RHEL 9 systems across > x86_64, aarch64, ppc64le and s390x architectures as well as some > anecdotal testing of other Linux distros on x86_64 machines, 400 kB > seems safe: the normal nft stack (which should stay constant during > this test) on all tested systems doesn't exceed 200 kB (stays around > 100 kB on typical systems with 4 kB page size), while always growing > beyond 500 kB in the failing case (nftables before baecd1cf2685) with > the increased set size. > > Fixes: d8ccad2a2b73 ("tests: cover baecd1cf2685 ("segtree: Fix segfault when restoring a huge interval set")") > Signed-off-by: Štěpán Němec <snemec@redhat.com> Applied, thanks!
diff --git a/tests/shell/testcases/sets/0068interval_stack_overflow_0 b/tests/shell/testcases/sets/0068interval_stack_overflow_0 index 6620572449c3..2cbc98680264 100755 --- a/tests/shell/testcases/sets/0068interval_stack_overflow_0 +++ b/tests/shell/testcases/sets/0068interval_stack_overflow_0 @@ -9,7 +9,7 @@ trap 'rm -f "$ruleset_file"' EXIT { echo 'define big_set = {' for ((i = 1; i < 255; i++)); do - for ((j = 1; j < 80; j++)); do + for ((j = 1; j < 255; j++)); do echo "10.0.$i.$j," done done @@ -26,4 +26,4 @@ table inet test68_table { } EOF -( ulimit -s 128 && $NFT -f "$ruleset_file" ) +( ulimit -s 400 && $NFT -f "$ruleset_file" )
Wider testing has shown that 128 kB stack is too low (e.g. for systems with 64 kB page size), leading to false failures in some environments. Based on results from a matrix of RHEL 8 and RHEL 9 systems across x86_64, aarch64, ppc64le and s390x architectures as well as some anecdotal testing of other Linux distros on x86_64 machines, 400 kB seems safe: the normal nft stack (which should stay constant during this test) on all tested systems doesn't exceed 200 kB (stays around 100 kB on typical systems with 4 kB page size), while always growing beyond 500 kB in the failing case (nftables before baecd1cf2685) with the increased set size. Fixes: d8ccad2a2b73 ("tests: cover baecd1cf2685 ("segtree: Fix segfault when restoring a huge interval set")") Signed-off-by: Štěpán Němec <snemec@redhat.com> --- I haven't been able to find an answer to the question of how much stack size can vary across different systems (particularly those nftables is likely to run on), so more testing might be useful, especially on systems not listed above. In an attempt to avoid depending on a particular stack size and instead fail the test in case the stack continues to grow, I also successfully tested the following (across the same range of systems as the above), but don't think the possible gain is worth the clunkiness. At least with the current version there is only one assumption (the stack limit) that might be wrong. --8<---------------cut here---------------start------------->8--- #!/bin/bash ruleset_file=$(mktemp) || exit 1 trap 'rm -f "$ruleset_file"' EXIT { echo 'define big_set = {' for ((i = 1; i < 255; i++)); do for ((j = 1; j < 255; j++)); do echo "10.0.$i.$j," done done echo '10.1.0.0/24 }' } >"$ruleset_file" || exit 1 cat >>"$ruleset_file" <<\EOF || exit 1 table inet test68_table { set test68_set { type ipv4_addr flags interval elements = { $big_set } } } EOF report() { printf 'Initial stack: %dkB\nCurrent stack: %dkB\n' \ "$initial" "$current" exit "$1" } get_stack() { # Going by 'Size:' rather than 'Rss:'; the latter seemed # too precise (e.g., it sometimes also catched the # initial bump from a few kB to the usual stack size). awk ' found && /^Size:/ { print $2; exit } /\[stack\]/ { found = 1 } ' /proc/"$nft_pid"/smaps } watch_stack() { local interval initial current interval=$1 # discard two initial samples (even with Size: instead of Rss:, it # did happen once (in more than 100 runs) that the initial sample # was 0kB) get_stack; get_stack initial=$(get_stack) || { echo This should never happen; exit 1; } while true; do if stack=$(get_stack); then current=$stack printf '%d\n' "$stack" # failure: stack size more than doubled # (should be ~constant) ((current - initial > initial)) && report 1 else # success?: /proc/$nft_pid/smaps gone means that # $nft_pid exited wait "$nft_pid" report $? fi sleep "$interval" done } $NFT -f "$ruleset_file" & nft_pid=$! trap 'rm -f "$ruleset_file"; kill "$nft_pid" && wait "$nft_pid"' EXIT watch_stack 0.01 --8<---------------cut here---------------end--------------->8--- tests/shell/testcases/sets/0068interval_stack_overflow_0 | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) base-commit: 247eb3c7a102ce184ca203978e74351d01cee79d