Message ID | 20201130084053.90470-1-siddhesh@sourceware.org |
---|---|
State | New |
Headers | show |
Series | [RFC] __builtin_dynamic_object_size with -D_FORTIFY_SOURCE=3 | expand |
* Siddhesh Poyarekar: > -D_FORTIFY_SOURCE=3 > ------------------- > > I restricted usage of __builtin_dynamic_object_size to a new > _FORTIFY_SOURCE level because it has one fundamental difference from > the earlier level, which is that it can pass a variable to the _chk > function instead of constant. As a result, 1 and 2 are fast with most > overhead being optimized out while 3 may have additional overhead. > > For example at the moment the simple case of memcpy, memmove, > etc. where they're implemented with compiler builtins like > __builtin___memcpy_chk, clang is able to generate compact code that > passes the expression to __memcpy_chk or just generates a memcpy when > possible. In the non-builtin cases though, where the size evaluates > to an expression, one may see patterns like: > > cmpq $-1, %rbx > je .LBB0_2 > callq fortified_chk > jmp .LBB0_3 > .LBB0_2: # %if.else > callq fortified > > since the compiler isn't smart enough yet to reduce that condition. > This can be fixed (I'm working on that right now) in the simple case > of a direct comparison and thus work for _FORTIFY_SOURCE=3, but it may > be harder to evaluate for __builtin_dynamic_object_size in general > where the comparison happens indirectly. This is also why > __builtin_dynamic_object_size isn't exactly a drop-in replacement for > __builtin_object_size in all cases. I think this should be fixed in the compiler, and the level 3 should be dropped. A compiler bug is not a good reason to change the external interface, especially if it is just a performance bug. The 2 vs 3 choice isn't something that's useful to developers. Thanks, Florian
On Mon, Nov 30, 2020 at 10:11:51AM +0100, Florian Weimer wrote: > > For example at the moment the simple case of memcpy, memmove, > > etc. where they're implemented with compiler builtins like > > __builtin___memcpy_chk, clang is able to generate compact code that > > passes the expression to __memcpy_chk or just generates a memcpy when > > possible. In the non-builtin cases though, where the size evaluates > > to an expression, one may see patterns like: > > > > cmpq $-1, %rbx > > je .LBB0_2 > > callq fortified_chk > > jmp .LBB0_3 > > .LBB0_2: # %if.else > > callq fortified > > > > since the compiler isn't smart enough yet to reduce that condition. > > This can be fixed (I'm working on that right now) in the simple case > > of a direct comparison and thus work for _FORTIFY_SOURCE=3, but it may > > be harder to evaluate for __builtin_dynamic_object_size in general > > where the comparison happens indirectly. This is also why > > __builtin_dynamic_object_size isn't exactly a drop-in replacement for > > __builtin_object_size in all cases. > > I think this should be fixed in the compiler, and the level 3 should be > dropped. A compiler bug is not a good reason to change the external > interface, especially if it is just a performance bug. The 2 vs 3 > choice isn't something that's useful to developers. What bug do you mean? The fact that __builtin_object_size is required to be a constant has been a fundamental requirement of the whole _FORTIFY_SOURCE design. Without that, it adds completely unbounded runtime overhead, turning the program ultimately into yet another bounded pointers implementation. In the worst case, every pointer arithmetic will need to be accomodated by tracking of its runtime length, using maximum/minimum, saturated arithmetics etc. Consider just for a start e.g. #include <stdlib.h> void foo (void *, void *, void *); size_t bar (size_t x, size_t y, size_t z, size_t w, size_t v) { char *p = malloc (x + 15); char *q = calloc (y + 12, z + 31); char *r = w ? p : q; if (v > 23) r += v; foo (r, p, q); size_t ret = __builtin_dynamic_object_size (r, 0); free (p); free (q); return ret; } LLVM seems to handle this as adding size_t len1 = x + 15; size_t len2 = (y + 12) * (z + 31); /* Note, no overflow checking here, though as calloc would return NULL perhaps it is not needed. */ size_t len3 = w ? len1 : len2; size_t len4 = v > 23 ? (len3 - 23 /* saturating */) : len3; Now consider the involved pointer to be conditionally incremented in some loop... __builtin_object_size is only exact if 0 mode returns the same value as 2 mode (or 1 vs. 3), otherwise it is an upper bound, __builtin_dynamic_object_size is bounded pointer implementation (for selected pointers). I think users should have a choice whether they want up to 2x slowdown in their code or not. Jakub
On 11/30/20 2:41 PM, Florian Weimer via Libc-alpha wrote: > * Siddhesh Poyarekar: > >> -D_FORTIFY_SOURCE=3 >> ------------------- >> >> I restricted usage of __builtin_dynamic_object_size to a new >> _FORTIFY_SOURCE level because it has one fundamental difference from >> the earlier level, which is that it can pass a variable to the _chk >> function instead of constant. As a result, 1 and 2 are fast with most >> overhead being optimized out while 3 may have additional overhead. >> >> For example at the moment the simple case of memcpy, memmove, >> etc. where they're implemented with compiler builtins like >> __builtin___memcpy_chk, clang is able to generate compact code that >> passes the expression to __memcpy_chk or just generates a memcpy when >> possible. In the non-builtin cases though, where the size evaluates >> to an expression, one may see patterns like: >> >> cmpq $-1, %rbx >> je .LBB0_2 >> callq fortified_chk >> jmp .LBB0_3 >> .LBB0_2: # %if.else >> callq fortified >> >> since the compiler isn't smart enough yet to reduce that condition. >> This can be fixed (I'm working on that right now) in the simple case >> of a direct comparison and thus work for _FORTIFY_SOURCE=3, but it may >> be harder to evaluate for __builtin_dynamic_object_size in general >> where the comparison happens indirectly. This is also why >> __builtin_dynamic_object_size isn't exactly a drop-in replacement for >> __builtin_object_size in all cases. > > I think this should be fixed in the compiler, and the level 3 should be > dropped. A compiler bug is not a good reason to change the external > interface, especially if it is just a performance bug. The 2 vs 3 > choice isn't something that's useful to developers. Besides the bug, do you think a performance tradeoff should result in us having this 2 vs 3 differentiation? I didn't make the 2 vs 3 proposal specifically to work around this bug although though it does make it easier for us to add support into glibc without blocking on llvm fixing it. In general, dynamic object size checks may end up having an additional performance tradeoff (even __bdos without this bug will have additional instructions emitted, perhaps even spills, making them a wee bit slower) and it may be desirable to have a separate fortification level to allow developers to choose. Siddhesh
* Siddhesh Poyarekar: > Besides the bug, do you think a performance tradeoff should result in > us having this 2 vs 3 differentiation? I didn't make the 2 vs 3 > proposal specifically to work around this bug although though it does > make it easier for us to add support into glibc without blocking on > llvm fixing it. In general, dynamic object size checks may end up > having an additional performance tradeoff (even __bdos without this > bug will have additional instructions emitted, perhaps even spills, > making them a wee bit slower) and it may be desirable to have a > separate fortification level to allow developers to choose. I really dislike more developer choices. I think we should experiment with this with a fixed Clang, and see what the generated code looks like in practice. Thanks, Florian
* Jakub Jelinek: > On Mon, Nov 30, 2020 at 10:11:51AM +0100, Florian Weimer wrote: >> > For example at the moment the simple case of memcpy, memmove, >> > etc. where they're implemented with compiler builtins like >> > __builtin___memcpy_chk, clang is able to generate compact code that >> > passes the expression to __memcpy_chk or just generates a memcpy when >> > possible. In the non-builtin cases though, where the size evaluates >> > to an expression, one may see patterns like: >> > >> > cmpq $-1, %rbx >> > je .LBB0_2 >> > callq fortified_chk >> > jmp .LBB0_3 >> > .LBB0_2: # %if.else >> > callq fortified >> > >> > since the compiler isn't smart enough yet to reduce that condition. >> > This can be fixed (I'm working on that right now) in the simple case >> > of a direct comparison and thus work for _FORTIFY_SOURCE=3, but it may >> > be harder to evaluate for __builtin_dynamic_object_size in general >> > where the comparison happens indirectly. This is also why >> > __builtin_dynamic_object_size isn't exactly a drop-in replacement for >> > __builtin_object_size in all cases. >> >> I think this should be fixed in the compiler, and the level 3 should be >> dropped. A compiler bug is not a good reason to change the external >> interface, especially if it is just a performance bug. The 2 vs 3 >> choice isn't something that's useful to developers. > > What bug do you mean? The pointless run-time check to choose between the _chk and non-_chk variants. > The fact that __builtin_object_size is required to be a constant has > been a fundamental requirement of the whole _FORTIFY_SOURCE design. > Without that, it adds completely unbounded runtime overhead, turning > the program ultimately into yet another bounded pointers > implementation. In the worst case, every pointer arithmetic will need > to be accomodated by tracking of its runtime length, using > maximum/minimum, saturated arithmetics etc. Let's see if there is an actual performance impact for Clang. I think it's wrong to view this in isolation. Of course there will be regressions for certain code sequences. But glibc will use this in conjunction with string functions, which already trigger heavy optimization from GCC with lots of code duplication. The Clang implementation, once fixed, will give us a way to experiment with this. Maybe some trade-offs are possible: Track just the end of the allocated buffer, at the cost of some accuracy. Then intermediate pointer arithmetic does not have to be instrumented. Thanks, Florian
On Mon, Nov 30, 2020 at 11:10:56AM +0100, Florian Weimer wrote: > I think it's wrong to view this in isolation. Of course there will be > regressions for certain code sequences. But glibc will use this in > conjunction with string functions, which already trigger heavy > optimization from GCC with lots of code duplication. > > The Clang implementation, once fixed, will give us a way to experiment > with this. > > Maybe some trade-offs are possible: Track just the end of the allocated > buffer, at the cost of some accuracy. Then intermediate pointer > arithmetic does not have to be instrumented. There is no "the allocated buffer", a pointer may point to many different ones and which ones depends on the control flow etc. and for each of the allocations there could be different pointer arithmetics involved, in both directions. As can be seen e.g. on following example, LLVM really does implement it as bounded pointers, the sizes are then precise, but the runtime cost is high. There is no limit it to simple most common cases. typedef __SIZE_TYPE__ size_t; void *malloc (size_t); void free (void *); int foo (int p[128], int q) { char *r = malloc (q); for (int i = 0; i < 64; i++) { if (p[i]) r += p[64 + i]; } size_t ret = __builtin_dynamic_object_size (r, 0); free (r); return ret; } Actually, trying: typedef __SIZE_TYPE__ size_t; void *malloc (size_t); void free (void *); int bar (int p[192], int q, char *s) { char *r = malloc (q); for (int i = 0; i < 64; i++) { if (p[i]) r += p[64 + i]; if (p[128 + i]) r = s; } size_t ret = __builtin_dynamic_object_size (r, 0); free (r); return ret; } in that case LLVM gives up, so maybe it does the tracking only if all the possible values have some known object size. Even with the above foo test where r can have (expensively computed) precise value, if one e.g. add char *s argument and uses s ? s : r in __builtin_dynamic_object_size, it just returns -1. Jakub
On Mon, Nov 30, 2020 at 11:06:47AM +0100, Florian Weimer wrote: > * Siddhesh Poyarekar: > > > Besides the bug, do you think a performance tradeoff should result in > > us having this 2 vs 3 differentiation? I didn't make the 2 vs 3 > > proposal specifically to work around this bug although though it does > > make it easier for us to add support into glibc without blocking on > > llvm fixing it. In general, dynamic object size checks may end up > > having an additional performance tradeoff (even __bdos without this > > bug will have additional instructions emitted, perhaps even spills, > > making them a wee bit slower) and it may be desirable to have a > > separate fortification level to allow developers to choose. > > I really dislike more developer choices. I think we should experiment > with this with a fixed Clang, and see what the generated code looks like > in practice. I think it is very important to give developers a choice. Because, otherwise if the dynamic bounded pointers tracking is too costly for them, they will disable all fortification, even the cheap ones, which will in the end be worse for security. Jakub
diff --git a/include/features.h b/include/features.h index f3e62d3362..86409dd457 100644 --- a/include/features.h +++ b/include/features.h @@ -397,10 +397,10 @@ # warning _FORTIFY_SOURCE requires compiling with optimization (-O) # elif !__GNUC_PREREQ (4, 1) # warning _FORTIFY_SOURCE requires GCC 4.1 or later -# elif _FORTIFY_SOURCE > 1 -# define __USE_FORTIFY_LEVEL 2 +# elif _FORTIFY_SOURCE > 2 +# define __USE_FORTIFY_LEVEL 3 # else -# define __USE_FORTIFY_LEVEL 1 +# define __USE_FORTIFY_LEVEL _FORTIFY_SOURCE # endif #endif #ifndef __USE_FORTIFY_LEVEL diff --git a/misc/sys/cdefs.h b/misc/sys/cdefs.h index e94d09d7dd..563b238ed5 100644 --- a/misc/sys/cdefs.h +++ b/misc/sys/cdefs.h @@ -127,6 +127,15 @@ #define __bos(ptr) __builtin_object_size (ptr, __USE_FORTIFY_LEVEL > 1) #define __bos0(ptr) __builtin_object_size (ptr, 0) +/* Use __builtin_dynamic_object_size if available. */ +#if __USE_FORTIFY_LEVEL == 3 && __glibc_clang_prereq (9, 0) +# define __objsize0(__o) __builtin_dynamic_object_size (__o, 0) +# define __objsize(__o) __builtin_dynamic_object_size (__o, 1) +#else +# define __objsize0(__o) __bos0 (__o) +# define __objsize(__o) __bos (__o) +#endif + #if __GNUC_PREREQ (4,3) # define __warnattr(msg) __attribute__((__warning__ (msg))) # define __errordecl(name, msg) \ diff --git a/string/bits/string_fortified.h b/string/bits/string_fortified.h index 4c1aeb45f1..e8e14506f9 100644 --- a/string/bits/string_fortified.h +++ b/string/bits/string_fortified.h @@ -26,13 +26,13 @@ __fortify_function void * __NTH (memcpy (void *__restrict __dest, const void *__restrict __src, size_t __len)) { - return __builtin___memcpy_chk (__dest, __src, __len, __bos0 (__dest)); + return __builtin___memcpy_chk (__dest, __src, __len, __objsize0 (__dest)); } __fortify_function void * __NTH (memmove (void *__dest, const void *__src, size_t __len)) { - return __builtin___memmove_chk (__dest, __src, __len, __bos0 (__dest)); + return __builtin___memmove_chk (__dest, __src, __len, __objsize0 (__dest)); } #ifdef __USE_GNU @@ -40,7 +40,7 @@ __fortify_function void * __NTH (mempcpy (void *__restrict __dest, const void *__restrict __src, size_t __len)) { - return __builtin___mempcpy_chk (__dest, __src, __len, __bos0 (__dest)); + return __builtin___mempcpy_chk (__dest, __src, __len, __objsize0 (__dest)); } #endif @@ -53,7 +53,7 @@ __NTH (mempcpy (void *__restrict __dest, const void *__restrict __src, __fortify_function void * __NTH (memset (void *__dest, int __ch, size_t __len)) { - return __builtin___memset_chk (__dest, __ch, __len, __bos0 (__dest)); + return __builtin___memset_chk (__dest, __ch, __len, __objsize0 (__dest)); } #ifdef __USE_MISC @@ -65,21 +65,21 @@ void __explicit_bzero_chk (void *__dest, size_t __len, size_t __destlen) __fortify_function void __NTH (explicit_bzero (void *__dest, size_t __len)) { - __explicit_bzero_chk (__dest, __len, __bos0 (__dest)); + __explicit_bzero_chk (__dest, __len, __objsize0 (__dest)); } #endif __fortify_function char * __NTH (strcpy (char *__restrict __dest, const char *__restrict __src)) { - return __builtin___strcpy_chk (__dest, __src, __bos (__dest)); + return __builtin___strcpy_chk (__dest, __src, __objsize (__dest)); } #ifdef __USE_GNU __fortify_function char * __NTH (stpcpy (char *__restrict __dest, const char *__restrict __src)) { - return __builtin___stpcpy_chk (__dest, __src, __bos (__dest)); + return __builtin___stpcpy_chk (__dest, __src, __objsize (__dest)); } #endif @@ -88,14 +88,14 @@ __fortify_function char * __NTH (strncpy (char *__restrict __dest, const char *__restrict __src, size_t __len)) { - return __builtin___strncpy_chk (__dest, __src, __len, __bos (__dest)); + return __builtin___strncpy_chk (__dest, __src, __len, __objsize (__dest)); } #if __GNUC_PREREQ (4, 7) || __glibc_clang_prereq (2, 6) __fortify_function char * __NTH (stpncpy (char *__dest, const char *__src, size_t __n)) { - return __builtin___stpncpy_chk (__dest, __src, __n, __bos (__dest)); + return __builtin___stpncpy_chk (__dest, __src, __n, __objsize (__dest)); } #else extern char *__stpncpy_chk (char *__dest, const char *__src, size_t __n, @@ -107,7 +107,7 @@ extern char *__REDIRECT_NTH (__stpncpy_alias, (char *__dest, const char *__src, __fortify_function char * __NTH (stpncpy (char *__dest, const char *__src, size_t __n)) { - if (__bos (__dest) != (size_t) -1 + if (__objsize (__dest) != (size_t) -1 && (!__builtin_constant_p (__n) || __n > __bos (__dest))) return __stpncpy_chk (__dest, __src, __n, __bos (__dest)); return __stpncpy_alias (__dest, __src, __n); @@ -118,7 +118,7 @@ __NTH (stpncpy (char *__dest, const char *__src, size_t __n)) __fortify_function char * __NTH (strcat (char *__restrict __dest, const char *__restrict __src)) { - return __builtin___strcat_chk (__dest, __src, __bos (__dest)); + return __builtin___strcat_chk (__dest, __src, __objsize (__dest)); } @@ -126,7 +126,7 @@ __fortify_function char * __NTH (strncat (char *__restrict __dest, const char *__restrict __src, size_t __len)) { - return __builtin___strncat_chk (__dest, __src, __len, __bos (__dest)); + return __builtin___strncat_chk (__dest, __src, __len, __objsize (__dest)); } #endif /* bits/string_fortified.h */ diff --git a/string/bits/strings_fortified.h b/string/bits/strings_fortified.h index d4091f4f69..122199e036 100644 --- a/string/bits/strings_fortified.h +++ b/string/bits/strings_fortified.h @@ -22,13 +22,13 @@ __fortify_function void __NTH (bcopy (const void *__src, void *__dest, size_t __len)) { - (void) __builtin___memmove_chk (__dest, __src, __len, __bos0 (__dest)); + (void) __builtin___memmove_chk (__dest, __src, __len, __objsize0 (__dest)); } __fortify_function void __NTH (bzero (void *__dest, size_t __len)) { - (void) __builtin___memset_chk (__dest, '\0', __len, __bos0 (__dest)); + (void) __builtin___memset_chk (__dest, '\0', __len, __objsize0 (__dest)); } #endif