Submitted by Sergey Senozhatsky on March 16, 2017, 3:49 p.m.

Message ID: 20170316154908.GA575@tigerII.localdomain
State: New

On Fri, 17 Mar 2017, Sergey Senozhatsky wrote:
> c) I don't think I see what the "target milestone" is even supposed
> to mean. Sorry! glibc version? min glibc version that requires a
> backport of this fix (if there are -stable/LTS glibc releases)? etc.
> etc. etc.

See: <https://sourceware.org/glibc/wiki/Bugzilla%20Procedures>. The
person who commits a fix for a bug needs to set the target milestone of
the bug along with marking it fixed (so if 2.26 is the first release
with the fix, it needs to be set to 2.26).

On Fri, 17 Mar 2017, Sergey Senozhatsky wrote:
> I'm not sure I see Ulrich's "You do not even understand how binary
> searching works, do you? The sum can never exceed nmemb and nmemb
> obviously fits into an size_t" point. it's a bug.

As I understand it: it's a bug for an array with more than SIZE_MAX / 2
elements. Now, there is certainly no need for the comparison function
actually to look at the contents of the elements; it's completely valid
for it to do something magic that depends on the position of the element
in the array rather than (just) its contents. And 1-byte elements are
certainly valid. But the pointers do need to be valid - although when
you enter the POSIX context there is a strong argument that as long as
the addresses are all within the process's address space, it's OK for
them to be mapped with PROT_NONE permissions (as long as the comparison
function doesn't actually access the referenced objects).

Allocating objects occupying half or more of the address space, even
with PROT_NONE permissions, is questionable because pointer subtraction
cannot work reliably in them (so there is an argument for malloc, mmap
etc. to disallow such allocations). But it's believed, from past
discussions, that some users of 32-bit systems (possibly 32-bit
processes under a 64-bit kernel) do in fact rely on being able to make a
single allocation of 2 GB or more, and our existing practice for string
functions is to consider it a bug if a string function doesn't work
correctly for >= 2 GB strings on 32-bit systems. So on the same basis,
this bsearch issue should be considered a bug.
If the fix reduces performance, there's a case for making it
conditional:

  if (__builtin_constant_p (__size) && __size >= 2)
    /* old code */
  else
    /* new code */

given that the issue can only apply with elements of size 1, and the
common case probably *is* that the size is constant
(__builtin_constant_p works in inline functions) and at least 2, so that
would avoid affecting the code generated in the common case at all.

On 17 Mar 2017 00:49, Sergey Senozhatsky wrote:
> On (03/16/17 14:02), Joseph Myers wrote:
> > If this fixes a user-visible bug then the ChangeLog entry needs to include
> > [BZ #N] referencing the bug filed in Bugzilla (and once the fix is in the
> > bug needs to be resolved as FIXED with appropriate target milestone set).
> > Is this bug 2753? If not, a new bug would need to be filed for it.
>
> a) um... I assume glibc Bugzilla is located at https://sourceware.org/bugzilla/
> and 2753, thus, is https://sourceware.org/bugzilla/show_bug.cgi?id=2753
> if so, then, yes looks like I'm not the first one to point that out.
> I'm not sure I see Ulrich's "You do not even understand how binary
> searching works, do you? The sum can never exceed nmemb and nmemb
> obviously fits into an size_t" point. it's a bug.

if you see a comment by Ulrich that looks to be unrelated/off, then you
can probably safely assume it is. if the bug is open still, then we
usually consider it *now* to be something worth fixing.

sorry for the hassle.
-mike

On (03/16/17 20:33), Joseph Myers wrote:
> On Fri, 17 Mar 2017, Sergey Senozhatsky wrote:
> > I'm not sure I see Ulrich's "You do not even understand how binary
> > searching works, do you? The sum can never exceed nmemb and nmemb
> > obviously fits into an size_t" point. it's a bug.

so it was 1am yesterday when I took a look at that BZ and I didn't read
it carefully (guilty). I kinda see what they meant there.

[..]
> given that the issue can only apply with elements of size 1, and the
> common case probably *is* that the size is constant (__builtin_constant_p
> works in inline functions) and at least 2, so that would avoid affecting
> the code generated in the common case at all.

the __builtin_constant_p(size) suggestion sounds reasonable, since the
new middle-element calculation does come at a price.

---

so I modified the test I posted earlier: moved the array to .bss and
made the array size reasonable (so there is no overflow). each ./a.out
executes bsearch() 10000000 times.

  # perf stat ./old_bsearch.out >> OLD 2>&1
  # perf stat ./new_bsearch.out >> NEW 2>&1
  # paste OLD NEW

the results, condensed from the side-by-side `paste` output (four runs
per case, ~3.13 GHz; the full perf stat counters are trimmed to the
headline numbers - context switches, cpu migrations, page faults, and
branch counts were essentially identical between the two builds):

== CHAR array ==

  run | unpatched elapsed | patched elapsed | unpatched insns (IPC)  | patched insns (IPC)
  ----+-------------------+-----------------+------------------------+---------------------
    1 | 0.491320512 s     | 0.572367076 s   | 3,806,104,997 (2.47)   | 4,331,940,534 (2.42)
    2 | 0.491822793 s     | 0.576731036 s   | 3,809,678,005 (2.46)   | 4,350,733,034 (2.42)
    3 | 0.491065497 s     | 0.573038448 s   | 3,817,891,053 (2.48)   | 4,333,473,067 (2.43)
    4 | 0.494938286 s     | 0.571451079 s   | 3,808,529,575 (2.47)   | 4,360,155,445 (2.44)

== INT array ==

  run | unpatched elapsed | patched elapsed | unpatched insns (IPC)  | patched insns (IPC)
  ----+-------------------+-----------------+------------------------+---------------------
    1 | 0.445114068 s     | 0.516332787 s   | 3,371,853,412 (2.41)   | 3,854,946,185 (2.38)
    2 | 0.448073517 s     | 0.513532781 s   | 3,399,240,172 (2.43)   | 3,876,976,133 (2.41)
    3 | 0.448035542 s     | 0.518214692 s   | 3,376,115,599 (2.40)   | 3,868,878,104 (2.39)
    4 | 0.451002497 s     | 0.519224084 s   | 3,380,728,555 (2.41)   | 3,865,667,812 (2.38)

	-ss

On Thu, Mar 16, 2017 at 04:54:40PM -0400, Mike Frysinger wrote:
> On 17 Mar 2017 00:49, Sergey Senozhatsky wrote:
> > On (03/16/17 14:02), Joseph Myers wrote:
> > > If this fixes a user-visible bug then the ChangeLog entry needs to include
> > > [BZ #N] referencing the bug filed in Bugzilla (and once the fix is in the
> > > bug needs to be resolved as FIXED with appropriate target milestone set).
> > > Is this bug 2753? If not, a new bug would need to be filed for it.
> >
> > a) um... I assume glibc Bugzilla is located at https://sourceware.org/bugzilla/
> > and 2753, thus, is https://sourceware.org/bugzilla/show_bug.cgi?id=2753
> > if so, then, yes looks like I'm not the first one to point that out.
> > I'm not sure I see Ulrich's "You do not even understand how binary
> > searching works, do you? The sum can never exceed nmemb and nmemb
> > obviously fits into an size_t" point. it's a bug.
>
> if you see a comment by Ulrich that looks to be unrelated/off, then you
> can probably safely assume it is. if the bug is open still, then we
> usually consider it *now* to be something worth fixing.

No, this is wrong fix as I probably write before. It isn't visible bug
unless programmer is idiot who tries to sort gigabyte long array of
single bytes and use binary search on it. Just keeping 256 byte table of
positions would be more effective than sorting.

For any bigger size there isn't overflow as difference between that
pointers is bounded.

This fix takes extra instruction which would harm performance for
basically everybody for nonexistent problem. If you want to fix this
special-case one byte branch to make gcc optimize it away for larger
sizes.

On Thu, Mar 16, 2017 at 11:32 PM, Sergey Senozhatsky
<sergey.senozhatsky.work@gmail.com> wrote:
> so I modified the test I posted earlier. moved array to .bss, made array
> size reasonable (so there is no overflow). each ./a.out executes bsearch()
> 10000000 times.

Thanks for this. It might also be useful to show the codegen difference,
which (on x86-64) is

-bsearch_stock:
+bsearch_patched:
 .LFB10:
 	.cfi_startproc
 	pushq	%r15
@@ -43,9 +43,11 @@
 	cmpq	%r13, %r14
 	jnb	.L6
 .L9:
-	leaq	(%r14,%r13), %rbx
+	movq	%r13, %rbx
 	movq	(%rsp), %rdi
+	subq	%r14, %rbx
 	shrq	%rbx
+	addq	%r14, %rbx
 	movq	%rbx, %r15
 	imulq	%r12, %r15
 	addq	8(%rsp), %r15

This probably hurts by extending the critical dependency chain, as well
as by issuing more instructions. I would like to know what the codegen
difference looks like on more RISCy architectures (maybe MIPS and
AArch64?) -- I have to wonder whether a CPU that didn't have complex LEA
would wind up generating essentially the same code either way.

zw

On Tue, Mar 28, 2017 at 6:19 AM, Ondřej Bílka <neleai@seznam.cz> wrote:
> On Thu, Mar 16, 2017 at 04:54:40PM -0400, Mike Frysinger wrote:
>> if you see a comment by Ulrich that looks to be unrelated/off, then you
>> can probably safely assume it is. if the bug is open still, then we
>> usually consider it *now* to be something worth fixing.
>
> No, this is wrong fix as I probably write before. It isn't visible bug
> unless programmer is idiot who tries to sort gigabyte long array of
> single bytes and use binary search on it. Just keeping 256 byte table
> of positions would be more effective than sorting.

There's another possibility: a program might be tricked into calling
bsearch with the wrong arguments for the array it actually wants to
search. If I understand the problem correctly, under the right
conditions bsearch could return a pointer outside the bounds of the true
(quite small) array. Because of that, I think this qualifies as a
must-fix bug and as a potential security vulnerability.

> For any bigger size there isn't overflow as difference between that
> pointers is bounded.

It's unfortunate that there is no good way of making that apparent to
the compiler. The relationship among base, nmemb, size, and the address
space as a whole is just too complicated. Maybe if we were actually
doing pointer arithmetic on a nonce type - but that has its own
problems.

> This fix takes extra instruction which would harm performance for
> basically everybody for nonexistent problem.

As a meta concern: it is possible to disagree with people without being
this hostile. You could have said "I don't think this is worth doing,
because it makes all uses of bsearch slower to handle a corner case that
will essentially never occur in practice" - same message, avoids
elevating your personal opinion to the status of a fact, and avoids
blowing off someone who cared enough to write a patch. Please strive to
do better in the future.

zw

diff --git a/ChangeLog b/ChangeLog
index e0acd7d0c4..7142794922 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,8 @@
+2017-03-17  Sergey Senozhatsky  <sergey.senozhatsky@gmail.com>
+
+	[BZ #2753]
+	* bits/stdlib-bsearch.h: Fix integer overflow.
+
 2017-03-10  Stefan Liebler  <stli@linux.vnet.ibm.com>
 
 	* math/auto-libm-test-out-catan: Regenerated.
diff --git a/bits/stdlib-bsearch.h b/bits/stdlib-bsearch.h
index eb145381fd..5fd8a8b607 100644
--- a/bits/stdlib-bsearch.h
+++ b/bits/stdlib-bsearch.h
@@ -28,7 +28,7 @@ bsearch (const void *__key, const void *__base, size_t __nmemb, size_t __size,
   __u = __nmemb;
   while (__l < __u)
     {
-      __idx = (__l + __u) / 2;
+      __idx = __l + (__u - __l) / 2;
       __p = (void *) (((const char *) __base) + (__idx * __size));
       __comparison = (*__compar) (__key, __p);
       if (__comparison < 0)