Message ID | 20230222163159.3446687-1-adhemerval.zanella@linaro.org |
---|---|
State | New |
Headers | show |
Series | [v2] string: Fix OOB read on generic strncmp | expand |
The 02/22/2023 13:31, Adhemerval Zanella wrote: > For unaligned case, reading ahead can only be done if parting reads > matches the aligned input. > > Also extend the stratcliff tests to check such cases. > > Checked on x86_64-linux-gnu, i686-linux-gnu, powerpc64-linux-gnu, > and powerpc-linux-gnu by removing the arch-specific assembly > implementation and disabling multi-arch (it covers both LE and BE > for 64 and 32 bits). thanks this looks good. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> > --- > string/stratcliff.c | 17 ++++++++++++++++- > string/strncmp.c | 13 ++++++++++++- > 2 files changed, 28 insertions(+), 2 deletions(-) > > diff --git a/string/stratcliff.c b/string/stratcliff.c > index 74d64cc03d..88ac787088 100644 > --- a/string/stratcliff.c > +++ b/string/stratcliff.c > @@ -401,12 +401,27 @@ do_test (void) > result = 1; > } > > - if (STRNCMP (dest + nchars - outer, adr + middle, outer) <= 0) > + /* Also check for size larger than the string. */ > + if (STRNCMP (adr + middle, dest + nchars - outer, outer + 99) >= 0) > { > printf ("%s 2 flunked for outer = %zu, middle = %zu, full\n", > + STRINGIFY (STRNCMP), outer + 99, middle); > + result = 1; > + } > + > + if (STRNCMP (dest + nchars - outer, adr + middle, outer) <= 0) > + { > + printf ("%s 3 flunked for outer = %zu, middle = %zu, full\n", > STRINGIFY (STRNCMP), outer, middle); > result = 1; > } > + > + if (STRNCMP (dest + nchars - outer, adr + middle, outer + 99) <= 0) > + { > + printf ("%s 4 flunked for outer = %zu, middle = %zu, full\n", > + STRINGIFY (STRNCMP), outer + 99, middle); > + result = 1; > + } > } > > /* strncpy/wcsncpy tests */ > diff --git a/string/strncmp.c b/string/strncmp.c > index 4c8bf36bb9..751bf53d55 100644 > --- a/string/strncmp.c > +++ b/string/strncmp.c > @@ -73,7 +73,11 @@ strncmp_unaligned_loop (const op_t *x1, const op_t *x2, op_t w1, uintptr_t ofs, > uintptr_t sh_2 = sizeof(op_t) * CHAR_BIT - sh_1; > > op_t w2 = MERGE (w2a, sh_1, (op_t)-1, sh_2); > - if (!has_zero (w2) && n > (sizeof (op_t) - ofs)) > + > + /* Reading ahead is wrong if w1 and w2 already differs. */ > + op_t w1a = MERGE (w1, 0, (op_t)-1, sh_2); > + > + if (!has_zero (w2) && w2 == w1a && n >= (sizeof (op_t) - ofs)) > { > op_t w2b; > > @@ -90,6 +94,13 @@ strncmp_unaligned_loop (const op_t *x1, const op_t *x2, op_t w1, uintptr_t ofs, > if (has_zero (w2b) || n <= (sizeof (op_t) - ofs)) > break; > w1 = *x1++; > + > + /* Reading ahead is wrong if w1 and w2 already differs. */ > + w2 = MERGE (w2b, sh_1, (op_t)-1, sh_2); > + w1a = MERGE (w1, 0, (op_t)-1, sh_2); > + if (w2 != w1a) > + return final_cmp (w1a, w2, n); > + > w2a = w2b; > } > > -- > 2.34.1 >
On 22/02/23 14:21, Szabolcs Nagy wrote: > The 02/22/2023 13:31, Adhemerval Zanella wrote: >> For unaligned case, reading ahead can only be done if parting reads >> matches the aligned input. >> >> Also extend the stratcliff tests to check such cases. >> >> Checked on x86_64-linux-gnu, i686-linux-gnu, powerpc64-linux-gnu, >> and powerpc-linux-gnu by removing the arch-specific assembly >> implementation and disabling multi-arch (it covers both LE and BE >> for 64 and 32 bits). > > thanks this looks good. > > Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> > So before I push the fix along with the testcase, I checked all strncmp optimization and found out that some implementations also do not handle this correctly as expected: sysdeps/x86_64/multiarch/strncmp-sse2.S FAIL sysdeps/x86_64/multiarch/strncmp-sse4_2.S FAIL sysdeps/x86_64/multiarch/strncmp-avx2.S OK sysdeps/x86_64/multiarch/strncmp-evex.S ? sysdeps/x86_64/multiarch/strncmp-avx2-rtm.S ? sysdeps/ia64/strncmp.S ? sysdeps/sparc/sparc32/sparcv9/strncmp.S OK sysdeps/sparc/sparc64/strncmp.S OK sysdeps/aarch64/strncmp.S OK sysdeps/powerpc/powerpc32/power7/strncmp.S FAIL sysdeps/powerpc/powerpc32/405/strncmp.S ? sysdeps/powerpc/powerpc32/strncmp.S FAIL sysdeps/powerpc/powerpc32/power4/strncmp.S FAIL sysdeps/powerpc/powerpc64/power7/strncmp.S FAIL sysdeps/powerpc/powerpc64/power8/strncmp.S OK sysdeps/powerpc/powerpc64/strncmp.S FAIL sysdeps/powerpc/powerpc64/le/power9/strncmp.S OK sysdeps/alpha/strncmp.S FAIL sysdeps/i386/i686/multiarch/strncmp-sse4.S OK sysdeps/i386/i686/multiarch/strncmp-ssse3.S FAIL sysdeps/s390/strncmp-vx.S OK (the ? are implementations that I can really test, even qemu static thrown illegal instruction). Noah has brought to my attention that he tried to add similar tests, but they were rejected by strncmp string must be null-terminated [1]. The working drafts for C standard I have access (n1256.pdf for C99 and n3047.pdf for c2x) do not say possibly null-terminated array (as some stackoverflow answer state [2]) they refer only as array. So I tend to follow Florian understanding that strncmp inputs should be NULL terminated. So should we really consider this a OOB read on generic strncmp? [1] https://sourceware.org/pipermail/libc-alpha/2022-January/135130.html [2] https://stackoverflow.com/questions/41418766/is-it-legal-to-pass-a-non-null-terminated-string-to-strncmp-in-c
The 02/23/2023 15:15, Adhemerval Zanella Netto wrote: > Noah has brought to my attention that he tried to add similar tests, > but they were rejected by strncmp string must be null-terminated [1]. > > The working drafts for C standard I have access (n1256.pdf for C99 and > n3047.pdf for c2x) do not say possibly null-terminated array (as some > stackoverflow answer state [2]) they refer only as array. So I tend > to follow Florian understanding that strncmp inputs should be NULL > terminated. c11 draft is n1570.pdf https://open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf i dont understand what extension Florian is talking about. (i think that was about strcmp not strncmp) c11 and c23 are clear that strncmp args may *not* be null-terminated so i think we should be careful not to overread. glibc itself has test code that relies on this: crypt/badsalttest > > So should we really consider this a OOB read on generic strncmp? > > [1] https://sourceware.org/pipermail/libc-alpha/2022-January/135130.html > [2] https://stackoverflow.com/questions/41418766/is-it-legal-to-pass-a-non-null-terminated-string-to-strncmp-in-c
* Szabolcs Nagy: > The 02/23/2023 15:15, Adhemerval Zanella Netto wrote: >> Noah has brought to my attention that he tried to add similar tests, >> but they were rejected by strncmp string must be null-terminated [1]. >> >> The working drafts for C standard I have access (n1256.pdf for C99 and >> n3047.pdf for c2x) do not say possibly null-terminated array (as some >> stackoverflow answer state [2]) they refer only as array. So I tend >> to follow Florian understanding that strncmp inputs should be NULL >> terminated. > > c11 draft is n1570.pdf > https://open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf > > i dont understand what extension Florian is talking about. > (i think that was about strcmp not strncmp) > > c11 and c23 are clear that strncmp args may *not* be null-terminated > so i think we should be careful not to overread. It's common to use strncmp as a starts-with predicate, like this: strncmp (s, "prefix", 5) This requires that reading stops at the first null byte. C11 wording makes this kind of usage inadvisable because it treats the inputs as arrays, and this means that mean that implementations could read past the null byte. But that doesn't match current programming practice. The strnlen function has the same problem if you want to use it to limit readhead, e.g. in sscanf to fix bug 17577. (POSIX also speaks of an array argument.) In stdio-common/Xprintf_buffer_puts_1.c, we already use it to avoid a similar performance glitch. It's not the first such uses, there is already a similar call (with similar rationale) in string/strcasestr.c, and for wcsnlen in stdio-common/vfprintf-process-arg.c. I think we should support all these as extensions. The alternative would be to add new functions that stop reading after the first null byte (particularly for the strnlen optimization). Unfortunately, the last time I brought this up, the manual was updated to double down on the array interpretation. I think this was the result: commit 2cc4b9cce51643ec299e97450ccde4deeecfb083 Author: Paul Eggert <eggert@cs.ucla.edu> Date: Fri Dec 4 08:27:14 2015 -0800 Consistency about byte vs character in string.texi I think that's just not the right direction for glibc. Thanks, Florian
diff --git a/string/stratcliff.c b/string/stratcliff.c index 74d64cc03d..88ac787088 100644 --- a/string/stratcliff.c +++ b/string/stratcliff.c @@ -401,12 +401,27 @@ do_test (void) result = 1; } - if (STRNCMP (dest + nchars - outer, adr + middle, outer) <= 0) + /* Also check for size larger than the string. */ + if (STRNCMP (adr + middle, dest + nchars - outer, outer + 99) >= 0) { printf ("%s 2 flunked for outer = %zu, middle = %zu, full\n", + STRINGIFY (STRNCMP), outer + 99, middle); + result = 1; + } + + if (STRNCMP (dest + nchars - outer, adr + middle, outer) <= 0) + { + printf ("%s 3 flunked for outer = %zu, middle = %zu, full\n", STRINGIFY (STRNCMP), outer, middle); result = 1; } + + if (STRNCMP (dest + nchars - outer, adr + middle, outer + 99) <= 0) + { + printf ("%s 4 flunked for outer = %zu, middle = %zu, full\n", + STRINGIFY (STRNCMP), outer + 99, middle); + result = 1; + } } /* strncpy/wcsncpy tests */ diff --git a/string/strncmp.c b/string/strncmp.c index 4c8bf36bb9..751bf53d55 100644 --- a/string/strncmp.c +++ b/string/strncmp.c @@ -73,7 +73,11 @@ strncmp_unaligned_loop (const op_t *x1, const op_t *x2, op_t w1, uintptr_t ofs, uintptr_t sh_2 = sizeof(op_t) * CHAR_BIT - sh_1; op_t w2 = MERGE (w2a, sh_1, (op_t)-1, sh_2); - if (!has_zero (w2) && n > (sizeof (op_t) - ofs)) + + /* Reading ahead is wrong if w1 and w2 already differs. */ + op_t w1a = MERGE (w1, 0, (op_t)-1, sh_2); + + if (!has_zero (w2) && w2 == w1a && n >= (sizeof (op_t) - ofs)) { op_t w2b; @@ -90,6 +94,13 @@ strncmp_unaligned_loop (const op_t *x1, const op_t *x2, op_t w1, uintptr_t ofs, if (has_zero (w2b) || n <= (sizeof (op_t) - ofs)) break; w1 = *x1++; + + /* Reading ahead is wrong if w1 and w2 already differs. */ + w2 = MERGE (w2b, sh_1, (op_t)-1, sh_2); + w1a = MERGE (w1, 0, (op_t)-1, sh_2); + if (w2 != w1a) + return final_cmp (w1a, w2, n); + w2a = w2b; }