Message ID | 15a32181-8060-4135-cb72-9e79831697d5@gmail.com |
---|---|
State | New |
Headers | show |
Series | remove attribute access from regexec | expand |
On 8/13/21 11:26 AM, Martin Sebor via Libc-alpha wrote: > - __attr_access ((__write_only__, 4, 3)); > + __attr_access ((__read_write__, 4, 3)); Wouldn't it be simpler to remove the __attr_access instead? What utility does __attr_access ((__read_write__, 4, 3)) have? FWIW, Glibc currently does not use such an attribute anywhere.
On 8/13/21 2:11 PM, Paul Eggert wrote: > On 8/13/21 11:26 AM, Martin Sebor via Libc-alpha wrote: >> - __attr_access ((__write_only__, 4, 3)); >> + __attr_access ((__read_write__, 4, 3)); > > Wouldn't it be simpler to remove the __attr_access instead? > > What utility does __attr_access ((__read_write__, 4, 3)) have? FWIW, > Glibc currently does not use such an attribute anywhere. The attribute serves two purposes: it specifies 1) how the function might access the array (to determine whether it should be initialized by the caller) and 2) the minimum number of elements the caller must provide and the maximum number of elements the function definition might access. GCC checks to make sure these constraints are satisfied, both at call sites and in the definitions of these functions. The read_write mode means that a function may both read from and write to the array, and expects it to be initialized. But since regexec might just write to the array and not read from it, depending on flags, the read_write mode wouldn't be correct either. There is no attribute access mode that would capture this but in C99 (though sadly not in C++) and thanks to the nmatch argument preceding pmatch, the VLA argument syntax does make the checking possible. It's like attribute access with an unspecified mode. (It might be worth extending the attribute to express this mode so it can be used when the bound doesn't precede the VLA argument and in C++.) Attached is a revised patch with this approach. Martin PS POSIX says regexec() ignores pnmatch when REG_NOSUB is set, so strictly speaking, warning for such calls to it in that case is also a false positive. I think that's an acceptable trade-off for the buffer overflow detection (passing a zero nmatch in that case is a trivial way to avoid the warning).
On 8/13/21 2:30 PM, Martin Sebor wrote: > Attached is a revised patch with this approach. The revised patch is to include/regex.h but the original patch was to posix/regex.h. Is that intentional? We need to check whether __STDC_VERSION__ is defined. Also, no need for parens around arg of 'defined'. Something like this perhaps: #if (defined __STDC_VERSION__ && 199901L <= __STDC_VERSION__ \ && !defined __STDC_NO_VLA__) Also, the duplication of the declarations make the headers harder to read and encourage typos (I noticed one typo: "_Restrict_arr" without the trailing "_"). Instead, I suggest something like this: #if (defined __STDC_VERSION__ && 199901L <= __STDC_VERSION__ \ && !defined __STDC_NO_VLA__) # define _REGEX_VLA(arg) arg #else # define _REGEX_VLA(arg) #endif That way, we can simply change "regmatch_t __pmatch[_Restrict_arr_]" to "regmatch_t __pmatch[_Restrict_arr_ _REGEX_VLA (__nmatch)]" without having to duplicate the entire function declaration. > PS POSIX says regexec() ignores pnmatch when REG_NOSUB is set, so > strictly speaking, warning for such calls to it in that case is > also a false positive. Ouch, this casts doubt on the entire exercise. It's not simply about warnings: it's about the code being generated for the matcher. For example, for: int f (_Bool flag, unsigned long n, int a[n]) { return n == 0 ? 0 : flag ? a[n - 1] : a[0]; } a compiler is allowed to generate code that loads a[n - 1] even when FLAG is false. Similarly, if we add this VLA business to regexec, the generated machine code could dereference pmatch unconditionally even if our source code makes the dereferencing conditional on REG_NOSUB, and the resulting behavior would fail to conform to POSIX.
On 8/13/21 4:34 PM, Paul Eggert wrote: > On 8/13/21 2:30 PM, Martin Sebor wrote: >> Attached is a revised patch with this approach. > > The revised patch is to include/regex.h but the original patch was to > posix/regex.h. Is that intentional? Yes, they need to be consistent, otherwise GCC issues -Wvla-parameter. (That's to help detect inadvertent array/VLA mismatches as well as mismatches in the VLA parameter bounds.) > > We need to check whether __STDC_VERSION__ is defined. Also, no need for > parens around arg of 'defined'. Something like this perhaps: > > #if (defined __STDC_VERSION__ && 199901L <= __STDC_VERSION__ \ > && !defined __STDC_NO_VLA__) > > Also, the duplication of the declarations make the headers harder to > read and encourage typos (I noticed one typo: "_Restrict_arr" without > the trailing "_"). Instead, I suggest something like this: > > #if (defined __STDC_VERSION__ && 199901L <= __STDC_VERSION__ \ > && !defined __STDC_NO_VLA__) > # define _REGEX_VLA(arg) arg > #else > # define _REGEX_VLA(arg) > #endif > > That way, we can simply change "regmatch_t __pmatch[_Restrict_arr_]" to > "regmatch_t __pmatch[_Restrict_arr_ _REGEX_VLA (__nmatch)]" without > having to duplicate the entire function declaration. Sounds good. I've defined the macro in cdefs.h and mamed it _VLA_ARG to make it usable in other contexts. Please see the attached revision. > >> PS POSIX says regexec() ignores pnmatch when REG_NOSUB is set, so >> strictly speaking, warning for such calls to it in that case is >> also a false positive. > > Ouch, this casts doubt on the entire exercise. It's not simply about > warnings: it's about the code being generated for the matcher. For > example, for: > > int > f (_Bool flag, unsigned long n, int a[n]) > { > return n == 0 ? 0 : flag ? a[n - 1] : a[0]; > } > > a compiler is allowed to generate code that loads a[n - 1] even when > FLAG is false. Similarly, if we add this VLA business to regexec, the > generated machine code could dereference pmatch unconditionally even if > our source code makes the dereferencing conditional on REG_NOSUB, and > the resulting behavior would fail to conform to POSIX. The VLA bound by itself doesn't affect codegen. I suspect you're thinking of a[static n]? With just a[n], without static, there is no requirement that a point to an array with n elements. It simply declares an ordinary pointer, same as [] or *. Martin
On 8/14/21 1:08 PM, Martin Sebor wrote: > The VLA bound by itself doesn't affect codegen. I suspect you're > thinking of a[static n]? With just a[n], without static, there > is no requirement that a point to an array with n elements. It > simply declares an ordinary pointer, same as [] or *. Thanks for clarifying. I tried using a patch like that on coreutils, but it caused the build to fail like this: In file included from lib/exclude.c:35: ./lib/regex.h:661:7: error: ISO C90 forbids variable length array '__pmatch' [-Werror=vla] 661 | regmatch_t __pmatch[_Restrict_arr_ _VLA_ARG (__nmatch)], | ^~~~~~~~~~ cc1: all warnings being treated as errors make[2]: *** [Makefile:10648: lib/exclude.o] Error 1 This is because coreutils is compiled with -Wvla -Werror, to catch inadvertent attempts to use VLAs in local variables (this helps avoid stack-overflow problems). It'd be unfortunate if we had to give that useful diagnostic up simply due to this declaration, as there's no stack-overflow problem here. If you can think of a way around this issue, here are some other things I ran into while trying this idea out on Coreutils. * Other cdefs.h macros (__NTH, __REDIRECT, etc.) start with two underscores, so shouldn't this new macro too? * Come to think of it, the name _VLA_ARG could be improved, as its argument is not actually a VLA; it's the number of elements in a VLA-like array. Also, its formal-parameter "arg" is confusingly-named, as it's an arbitrary integer expression and need not be a function parameter name. How about naming the macro __ARG_NELTS instead? * regex.h cannot use __ARG_NELTS directly, for the same reason it can't use __restrict_arr directly: regex.h is shared with Gnulib and can't assume that a glibc-like sys/cdefs.h is present. I suppose regex.h would need something like this: #ifndef _ARG_NELTS_ # ifdef __ARG_NELTS # define _ARG_NELTS_(arg) __ARG_NELTS (arg) # elif (defined __STDC_VERSION__ && 199901L <= __STDC_VERSION__ \ && !defined __STDC_NO_VLA__) # define _ARG_NELTS_(n) n # else # define _ARG_NELTS_(n) # endif #endif and then use _ARG_NELTS_ later. * The cdefs.h comment has a stray 'n', its wording could be improved (I misread "variable bound" as a variable that's bound to something), there's a stray empty line, and it's nicer to put the comment in front of all the lines that define the macro. Perhaps something like this: /* Specify the number of elements of a function's array parameter, as in 'int f (int n, int a[__ARG_NELTS (n)]);'. */ #if (defined __STDC_VERSION__ && 199901L <= __STDC_VERSION__ \ && !defined __STDC_NO_VLA__) # define __ARG_NELTS(n) n #else # define __ARG_NELTS(n) #endif Though again, it's not clear to me that this idea will fly at all, due to the -Wvla issue. Maybe GCC's -Wvla should be fixed to not report an error in this case? It's actually not a VLA if you ask me (the C standard is unclear).
diff --git a/posix/regex.h b/posix/regex.h index 14fb1d8364..ba8351f873 100644 --- a/posix/regex.h +++ b/posix/regex.h @@ -656,7 +656,7 @@ extern int regexec (const regex_t *_Restrict_ __preg, const char *_Restrict_ __String, size_t __nmatch, regmatch_t __pmatch[_Restrict_arr_], int __eflags) - __attr_access ((__write_only__, 4, 3)); + __attr_access ((__read_write__, 4, 3)); extern size_t regerror (int __errcode, const regex_t *_Restrict_ __preg, char *_Restrict_ __errbuf, size_t __errbuf_size)