Message ID | 20180201205757.51911-1-hjl.tools@gmail.com |
---|---|
Headers | show |
Series | nptl: Update struct pthread_unwind_buf | expand |
On 02/01/2018 12:57 PM, H.J. Lu wrote: > struct pthread_unwind_buf is updated to save and restore shadow stack > register with backward binary compatibility. > > H.J. Lu (2): > Revert "Revert Intel CET changes to __jmp_buf_tag (Bug 22743)" > nptl: Update struct pthread_unwind_buf [BZ #22743] > High Level: To enable Intel CET we need to make sure that all places that change execution context abruptly also adjust the shadow stack such that the subsequent return matches what is presently on the stack. The current setjmp/longjmp are just such functions that change execution context abruptly, and to use longjmp to return to a setjmp location we must also restore the shadow stack on the return jump. The cancellation push/pop sequences also use a jump buffer, and at a high-level your solution is to expand those buffers with enough space (an ABI change) to store the shadow stack pointer. The ABI transition is managed using the CET note markup, and safety is ensured in that way as a all-or-nothing ABI change. Note that it depends heavily CET markup rolling out smoothly with glibc 2.28. If any distro enables CET markup without glibc 2.28 then it might be possible to have objects that use the old-ABI jump buffers in a CET enabled application and crash. Similarly if users build and use their own runtime. Why not version __pthread_register_cancel instead? The versioned symbol would seem to me a more robust indicator of the larger jump buffer than the requirement on coordinating markup + glibc 2.28 in all distros and for all users? I would like to see an argument made for CET markup against versioning __pthread_register_cleanup. Design: Commit f33632ccd1dec3217583fcfdd965afb62954203c finds space in the existing sigset_t by reducing the number of possible signals down to 513, the kernel is only using 64, and the remaining space is used to store the 64-bit shadow stack pointer. However, this isn't the only place that jump buffers like those used in setjmp/longjmp are used. Internally in pthread_cleanup_push and pthread_clenaup_pop the cancellation is handled by a special structure that is almost identical to the jump buffers used by setjmp/longjmp. However, because they are not identical, and were designed to avoid needing to save/restore the process signal mask during cancellation (as an optimization), these buffers also do not have enough space to store the shadow stack pointer. Therefore there needs to be an expansion in the size of the jump buffer used for cancellation to accomodate the shadow stack pointer. Your patches present the most logical next step here, which is to make the cancellation jump buffer the same size as the normal jump buffer used for setjmp/longjmp. Changing the cancellation jump buffer is an ABI change though since the jump buffer is allocated in the application stack via the the pthread_cleanup_push macro. Some way must be found to determine which size of jump buffer the caller is using to know if the newer larger version can be used. It is not sufficient to version pthread_create to make this distinction because an application may have been compiled with a new glibc, get the new version of pthread_create, and then call a DSO which used the old pthread_cleanup_push, and so expecte the old-ABI jump buffer. It is not sufficient to use priv.pad[3] because on ABIs with 32-bit pointers like x32 there is insufficient space to store the 64-bit shadow stack pointer. It *is* sufficient to version __pthread_register_cleanup in theory because on a caller-by-caller basis this would indicate if the caller had been compiled with the old or new-ABI jump buffer and read accordingly. Your patches take another approach, which is to use the binutils CET markup to indicate an all-or-nothing ABI transition to the new larger sized jump buffer. If CET is enabled, then the application would have been compiled with a CET enabled gcc, CET enabled glibc, and CET enabled binutils, and the result is a CET enabled application which is then assumed to have the newer larger sized cancellation jump buffer. Your solution avoids adding another symbol version and uses the existing CET markup as the deciding factor. Why might we reject versioning __pthread_register_cleanup? It would seem that since glibc internally controls the pthread_unwind_buf it will never be passed by pointer to another TU to be used in ways we don't control (the problem s390x faced when trying to increase the size of jmpbuf for setjmp/longjmp) e.g. loaded by a newer version of __pthread_register_cleanup that expects a different sized object. The internals and their use as expected from pthread_cleanup_push/pop are all in the same translation unit. Again, I would like to see an argument made for CET markup against versioning __pthread_register_cleanup. Details: The name NEED_SAVED_MASK_IN_CANCEL_JMP_BUF while true is slightly misleading IMO. While it is true that you need the saved mask there, the actual logical goal is to make the structure match in layout with the non-cancel jmp_buf. I would rename this to NEED_SETJMP_LAYOUT or something like that to indicate the intent is to make the structure match the layout used by setjmp jump buffers. Both UNWIND_BUF_PRIV and THREAD_COPY_ADDITONAL_INFO are exmaples of typo-prone macro APIs. We should work hard to define defaults and have unconditional users of these macros such that -Wundef can warn us if we make mistakes e.g. https://sourceware.org/glibc/wiki/Wundef Notes: - This current patch set passes the ABI tests I ran last time so it is good to note that your patches do not appear to introduce any other ABI regressions.
* Carlos O'Donell: > I would like to see an argument made for CET markup against versioning > __pthread_register_cleanup. I don't think __pthread_register_cleanup. It's __sigsetjmp. We would have to version __sigsetjmp. Changing the name would be ideal, but this will be difficult because of its returns-twice nature. I also don't have a problem with requiring -fexceptions for cancellation handlers with CET support. For additional safety, we could change sigsetjmp to write the shadow stack pointer before the kernel signal mask, so that it will still be in bounds for legacy cancellation handler allocation. __pthread_register_cleanup will overwrite it, but I don't think we ever need to restore it, so that shouldn't be a problem.
On Thu, Feb 8, 2018 at 1:25 AM, Carlos O'Donell <carlos@redhat.com> wrote: > On 02/01/2018 12:57 PM, H.J. Lu wrote: >> struct pthread_unwind_buf is updated to save and restore shadow stack >> register with backward binary compatibility. >> >> H.J. Lu (2): >> Revert "Revert Intel CET changes to __jmp_buf_tag (Bug 22743)" >> nptl: Update struct pthread_unwind_buf [BZ #22743] >> > > High Level: > > To enable Intel CET we need to make sure that all places that change > execution context abruptly also adjust the shadow stack such that the > subsequent return matches what is presently on the stack. The current > setjmp/longjmp are just such functions that change execution context > abruptly, and to use longjmp to return to a setjmp location we must > also restore the shadow stack on the return jump. > > The cancellation push/pop sequences also use a jump buffer, and at a > high-level your solution is to expand those buffers with enough space > (an ABI change) to store the shadow stack pointer. The ABI transition > is managed using the CET note markup, and safety is ensured in that way > as a all-or-nothing ABI change. > > Note that it depends heavily CET markup rolling out smoothly with glibc > 2.28. If any distro enables CET markup without glibc 2.28 then it might > be possible to have objects that use the old-ABI jump buffers in a CET > enabled application and crash. Similarly if users build and use their > own runtime. > > Why not version __pthread_register_cancel instead? The versioned symbol > would seem to me a more robust indicator of the larger jump buffer than > the requirement on coordinating markup + glibc 2.28 in all distros and > for all users? > > I would like to see an argument made for CET markup against versioning > __pthread_register_cleanup. > > Design: > > Commit f33632ccd1dec3217583fcfdd965afb62954203c finds space in the > existing sigset_t by reducing the number of possible signals down to > 513, the kernel is only using 64, and the remaining space is used to > store the 64-bit shadow stack pointer. > > However, this isn't the only place that jump buffers like those used > in setjmp/longjmp are used. Internally in pthread_cleanup_push and > pthread_clenaup_pop the cancellation is handled by a special > structure that is almost identical to the jump buffers used by > setjmp/longjmp. However, because they are not identical, and were > designed to avoid needing to save/restore the process signal mask > during cancellation (as an optimization), these buffers also do > not have enough space to store the shadow stack pointer. > > Therefore there needs to be an expansion in the size of the jump > buffer used for cancellation to accomodate the shadow stack pointer. > > Your patches present the most logical next step here, which is to > make the cancellation jump buffer the same size as the normal jump > buffer used for setjmp/longjmp. > > Changing the cancellation jump buffer is an ABI change though since > the jump buffer is allocated in the application stack via the the > pthread_cleanup_push macro. Some way must be found to determine which > size of jump buffer the caller is using to know if the newer larger > version can be used. > > It is not sufficient to version pthread_create to make this distinction > because an application may have been compiled with a new glibc, get > the new version of pthread_create, and then call a DSO which used the > old pthread_cleanup_push, and so expecte the old-ABI jump buffer. > > It is not sufficient to use priv.pad[3] because on ABIs with 32-bit > pointers like x32 there is insufficient space to store the 64-bit > shadow stack pointer. > > It *is* sufficient to version __pthread_register_cleanup in theory > because on a caller-by-caller basis this would indicate if the caller > had been compiled with the old or new-ABI jump buffer and read > accordingly. > > Your patches take another approach, which is to use the binutils CET > markup to indicate an all-or-nothing ABI transition to the new larger > sized jump buffer. If CET is enabled, then the application would have > been compiled with a CET enabled gcc, CET enabled glibc, and CET > enabled binutils, and the result is a CET enabled application which > is then assumed to have the newer larger sized cancellation jump buffer. > Your solution avoids adding another symbol version and uses the existing > CET markup as the deciding factor. > > Why might we reject versioning __pthread_register_cleanup? It would seem > that since glibc internally controls the pthread_unwind_buf it will never > be passed by pointer to another TU to be used in ways we don't control > (the problem s390x faced when trying to increase the size of jmpbuf > for setjmp/longjmp) e.g. loaded by a newer version of __pthread_register_cleanup > that expects a different sized object. The internals and their use > as expected from pthread_cleanup_push/pop are all in the same translation > unit. > > Again, I would like to see an argument made for CET markup against versioning > __pthread_register_cleanup. Symbol versioning works well when only opaque data pointers are used. A pointer of struct pthread_unwind_buf is passed to libpthread from user programs. Within libpthread, we need to know the exact layout of struct pthread_unwind_buf. It is next to impossible to use symbol versioning here. > Details: > > The name NEED_SAVED_MASK_IN_CANCEL_JMP_BUF while true is slightly > misleading IMO. While it is true that you need the saved mask there, the > actual logical goal is to make the structure match in layout with the > non-cancel jmp_buf. I would rename this to NEED_SETJMP_LAYOUT or something Will do. > like that to indicate the intent is to make the structure match the layout > used by setjmp jump buffers. > > Both UNWIND_BUF_PRIV and THREAD_COPY_ADDITONAL_INFO are exmaples of typo-prone > macro APIs. We should work hard to define defaults and have unconditional > users of these macros such that -Wundef can warn us if we make mistakes > e.g. https://sourceware.org/glibc/wiki/Wundef Will do. > Notes: > - This current patch set passes the ABI tests I ran last time so it is good > to note that your patches do not appear to introduce any other ABI regressions. > Thanks.
On 02/08/2018 03:55 AM, Florian Weimer wrote: > * Carlos O'Donell: > >> I would like to see an argument made for CET markup against versioning >> __pthread_register_cleanup. > > I don't think __pthread_register_cleanup. It's __sigsetjmp. We would > have to version __sigsetjmp. Changing the name would be ideal, but > this will be difficult because of its returns-twice nature. I'm looking for a solution that yields a clearly diagnosable failure when users mix a compiler which emits CET markup and a glibc which has the smaller sized cancellation jump buffer. If we version __sigsetjmp, we need to decide the semantics of the old and new version. A naive old version would not save the shadow stack pointer to the reclaimed space after the newer smaller internal __sigset_t, and a newer version would save the shadow stack pointer. This leads to sigsegv when a user mixes as above by accident, and they will have no warning about the problem. The CET markup will be present in all objects they have, but when run with a newer glibc it will crash by corrupting the stack when it writes the shadow stack pointer. My suggestion with __pthread_register_cancel et.al. (and yes we would have to version all the interfaces) was to basically data version the structure, but my original idea revolved around the idea of shifting __pad[3] up to the start of the structure and use it for versioning. However, your idea to version __sigsetjmp has reminded me that __saved_mask need only be non-zero to work properly and we could store a version cookie there, which could allow us to do the following: * Version __sigsetjmp, and store a version cookie in __saved_mask indicating the version of the structure. * Have the old __sigsetjmp disable CET since it clearly indicates that we have the wrong sized structure regardless of CET flags. * Adjust the unwinder to look at __saved_mask version cooke to decide which sized structure it is dealing with. Is this feasible? It would give high quality diagnostics and potentially allow us to extend the cancellation jump buffer with more data in the future. > I also don't have a problem with requiring -fexceptions for > cancellation handlers with CET support. For additional safety, we > could change sigsetjmp to write the shadow stack pointer before the > kernel signal mask, so that it will still be in bounds for legacy > cancellation handler allocation. __pthread_register_cleanup will > overwrite it, but I don't think we ever need to restore it, so that > shouldn't be a problem. In the truncated sigjmp_buf used by pthread cancellation we have only void * which is not large enough on x32, where you need a 64-bit shadow stack pointer. It would work on every other machine though. HJ has mentioned this problem already IIRC. Why would __pthread_register_cancel overwrite it?
On 02/08/2018 05:27 AM, H.J. Lu wrote: > On Thu, Feb 8, 2018 at 1:25 AM, Carlos O'Donell <carlos@redhat.com> wrote: >> Again, I would like to see an argument made for CET markup against versioning >> __pthread_register_cleanup. > > Symbol versioning works well when only opaque data pointers are used. > A pointer of struct pthread_unwind_buf is passed to libpthread from user > programs. Within libpthread, we need to know the exact layout of struct > pthread_unwind_buf. It is next to impossible to use symbol versioning here. My worry is about incorrectly marked up objects which use the wrong unwind buffer size resulting in difficult to diagnose crashes. The exact issue would be users using a gcc and binutils which marks up binaries with CET notes, but glibc headers which are not CET enabled. This will happen when we ship Developer Toolset with a gcc and binutils that has CET support, but the system glibc is still < 2.28. Then users will take these binaries and try to run them on systems with glibc >= 2.28 because we allow this backwards compatibility, and this will segfault because we are turning on CET, but mixing different sized unwind buffers. Please tell me if you think my worries are unfounded, or of sufficiently low risk that they will not be a problem, or that we can somehow say that what was done was unsupported e.g. saying that Developer Toolset used with Intel CET flags to build anything on any system with glibc < 2.28 generates unsupported binaries. My suggestion in the other email is that we briefly investigate what it would take version the data in the unwind buffer such that we *can* handle different sized objects properly, and enable us to potentially grow the unwind data we have in the future. To reiterate from my other email: * Version __sigsetjmp, and store a version cookie in __saved_mask indicating the version of the structure. * Have the old __sigsetjmp disable CET since it clearly indicates that we have the wrong sized structure regardless of CET flags. * Adjust the unwinder to look at __saved_mask version cooke to decide which sized structure it is dealing with. The third bullet point is effectively what you are *already* doing with the CET flags, but using the flags in a coarse way to decide which of 2 sizes of unwind buffers we are using. This has the problems that I outline above which is that it is error prone. While the symbol version mechanism will always tell us exactly which sized object was used when the TU was compiled.
* Carlos O'Donell: > In the truncated sigjmp_buf used by pthread cancellation we have only > void * which is not large enough on x32, where you need a 64-bit > shadow stack pointer. It would work on every other machine though. > HJ has mentioned this problem already IIRC. > > Why would __pthread_register_cancel overwrite it? __pthread_unwind_buf_t is defined as: typedef struct { struct { __jmp_buf __cancel_jmp_buf; int __mask_was_saved; } __cancel_jmp_buf[1]; void *__pad[4]; } __pthread_unwind_buf_t __attribute__ ((__aligned__)); pthread_cleanup_push does this: # define pthread_cleanup_push(routine, arg) \ do { \ __pthread_unwind_buf_t __cancel_buf; \ void (*__cancel_routine) (void *) = (routine); \ void *__cancel_arg = (arg); \ int __not_first_call = __sigsetjmp ((struct __jmp_buf_tag *) (void *) \ __cancel_buf.__cancel_jmp_buf, 0); \ if (__glibc_unlikely (__not_first_call)) \ { \ __cancel_routine (__cancel_arg); \ __pthread_unwind_next (&__cancel_buf); \ /* NOTREACHED */ \ } \ \ __pthread_register_cancel (&__cancel_buf); \ do { __pthread_register_cancel overwrites __cancel_buf.__pad. If __sigsetjmp were to write to that memory area, it would not matter, as long as we skip restoring the shadow stack pointer during unwinding (which we do not need to do because we never return along the regular execution path recorded on the shadow stack). In short, the only thing need to ensure is that the over-write from __sigsetjmp stays within __cancel_buf. Then we are good, without changing the stack layout for cancellation. My proposal is still rather hackish, but so is the existing code (the truncated jump buffer), and HJ's approach of storing the shadow stack pointer in the signal save area of the non-truncated jump buffer. But I think we can make it work.
On Fri, Feb 9, 2018 at 2:48 AM, Florian Weimer <fw@deneb.enyo.de> wrote: > * Carlos O'Donell: > >> In the truncated sigjmp_buf used by pthread cancellation we have only >> void * which is not large enough on x32, where you need a 64-bit >> shadow stack pointer. It would work on every other machine though. >> HJ has mentioned this problem already IIRC. >> >> Why would __pthread_register_cancel overwrite it? > > __pthread_unwind_buf_t is defined as: > > typedef struct > { > struct > { > __jmp_buf __cancel_jmp_buf; > int __mask_was_saved; > } __cancel_jmp_buf[1]; > void *__pad[4]; > } __pthread_unwind_buf_t __attribute__ ((__aligned__)); > > pthread_cleanup_push does this: > > # define pthread_cleanup_push(routine, arg) \ > do { \ > __pthread_unwind_buf_t __cancel_buf; \ > void (*__cancel_routine) (void *) = (routine); \ > void *__cancel_arg = (arg); \ > int __not_first_call = __sigsetjmp ((struct __jmp_buf_tag *) (void *) \ > __cancel_buf.__cancel_jmp_buf, 0); \ > if (__glibc_unlikely (__not_first_call)) \ > { \ > __cancel_routine (__cancel_arg); \ > __pthread_unwind_next (&__cancel_buf); \ > /* NOTREACHED */ \ > } \ > \ > __pthread_register_cancel (&__cancel_buf); \ > do { > > __pthread_register_cancel overwrites __cancel_buf.__pad. If > __sigsetjmp were to write to that memory area, it would not matter, as > long as we skip restoring the shadow stack pointer during unwinding > (which we do not need to do because we never return along the regular > execution path recorded on the shadow stack). That is true. > In short, the only thing need to ensure is that the over-write from > __sigsetjmp stays within __cancel_buf. Then we are good, without > changing the stack layout for cancellation. That is correct. > My proposal is still rather hackish, but so is the existing code (the A pointer to a buffer in user program is passed to libpthread. There is a jmp buf in the buffer followed by other fields. Since the size of jmp buf is increased in glibc 2.28, we need to know the offset of other fields. Otherwise libpthread may write beyond the buffer in user program. I don't see how symbol versioning can help us here since the INTERNAL libpthread functions don't know the layout of __pthread_unwind_buf_t of USER programs. > truncated jump buffer), and HJ's approach of storing the shadow stack > pointer in the signal save area of the non-truncated jump buffer. But > I think we can make it work.
* H. J. Lu: >> My proposal is still rather hackish, but so is the existing code (the > > A pointer to a buffer in user program is passed to libpthread. > There is a jmp buf in the buffer followed by other fields. Since > the size of jmp buf is increased in glibc 2.28, we need to know the > offset of other fields. Otherwise libpthread may write beyond the > buffer in user program. I don't see how symbol versioning can help > us here since the INTERNAL libpthread functions don't know the > layout of __pthread_unwind_buf_t of USER programs. I suggest *not* to increase the size of the jump buffer. CET markup will not be correct for static libraries compiled against 2.27 or earlier with a CET-enabled toolchain, so this is the only completely safe approach.
On Fri, Feb 9, 2018 at 4:11 AM, Florian Weimer <fw@deneb.enyo.de> wrote: > * H. J. Lu: > >>> My proposal is still rather hackish, but so is the existing code (the >> >> A pointer to a buffer in user program is passed to libpthread. >> There is a jmp buf in the buffer followed by other fields. Since >> the size of jmp buf is increased in glibc 2.28, we need to know the >> offset of other fields. Otherwise libpthread may write beyond the >> buffer in user program. I don't see how symbol versioning can help >> us here since the INTERNAL libpthread functions don't know the >> layout of __pthread_unwind_buf_t of USER programs. > > I suggest *not* to increase the size of the jump buffer. Where do we save shadow stack pointer? > CET markup will not be correct for static libraries compiled against > 2.27 or earlier with a CET-enabled toolchain, so this is the only > completely safe approach. I don't believe so. As long as there is a single input .o file which isn't marked with CET, the output static binary won't be marked with CET. If what you said is true, please file a glibc bug report.
On Fri, Feb 9, 2018 at 4:34 AM, H.J. Lu <hjl.tools@gmail.com> wrote: > On Fri, Feb 9, 2018 at 4:11 AM, Florian Weimer <fw@deneb.enyo.de> wrote: >> * H. J. Lu: >> >>>> My proposal is still rather hackish, but so is the existing code (the >>> >>> A pointer to a buffer in user program is passed to libpthread. >>> There is a jmp buf in the buffer followed by other fields. Since >>> the size of jmp buf is increased in glibc 2.28, we need to know the >>> offset of other fields. Otherwise libpthread may write beyond the >>> buffer in user program. I don't see how symbol versioning can help >>> us here since the INTERNAL libpthread functions don't know the >>> layout of __pthread_unwind_buf_t of USER programs. >> >> I suggest *not* to increase the size of the jump buffer. > > Where do we save shadow stack pointer? > >> CET markup will not be correct for static libraries compiled against >> 2.27 or earlier with a CET-enabled toolchain, so this is the only >> completely safe approach. > > I don't believe so. As long as there is a single input .o file which > isn't marked > with CET, the output static binary won't be marked with CET. If what > you said is true, please file a glibc bug report. > > I built glibc master with gcc-8.0.1 -mcet -fcf-protection. Some object files do get CET marker as expected. But static executable isn't: [hjl@gnu-skx-1 build-x86_64-linux]$ readelf -n elf/sln.o Displaying notes found in: .note.gnu.property Owner Data size Description GNU 0x00000010 NT_GNU_PROPERTY_TYPE_0 Properties: x86 feature: IBT, SHSTK [hjl@gnu-skx-1 build-x86_64-linux]$ readelf -n elf/sln Displaying notes found in: .note.ABI-tag Owner Data size Description GNU 0x00000010 NT_GNU_ABI_TAG (ABI version tag) OS: Linux, ABI: 3.2.0 Displaying notes found in: .note.gnu.build-id Owner Data size Description GNU 0x00000014 NT_GNU_BUILD_ID (unique build ID bitstring) Build ID: ab9587f0ef16086b740923f72bc1a024b6185b06 [hjl@gnu-skx-1 build-x86_64-linux]$ My CET ecosystem design prevents what you worried from happening.
On 02/09/2018 03:13 PM, H.J. Lu wrote: > I built glibc master with gcc-8.0.1 -mcet -fcf-protection. Some object > files do get CET marker as expected. But static executable isn't: I said static libraries. If I compile this code (based on the example from the manual page) on Fedora rawhide: #include <pthread.h> #include <stdio.h> static int done = 0; static int cleanup_pop_arg = 0; static int cnt = 0; static void cleanup_handler(void *arg) { printf("Called clean-up handler\n"); cnt = 0; } void * thread_start(void *arg) { time_t start, curr; printf("New thread started\n"); pthread_cleanup_push(cleanup_handler, NULL); curr = start = time(NULL); while (!done) { pthread_testcancel(); /* A cancellation point */ if (curr < time(NULL)) { curr = time(NULL); printf("cnt = %d\n", cnt); /* A cancellation point */ cnt++; } } pthread_cleanup_pop(cleanup_pop_arg); return NULL; } the small jump buffer is used: 0000000000000030 <thread_start>: 30: f3 0f 1e fa endbr64 34: 53 push %rbx 35: bf 00 00 00 00 mov $0x0,%edi 36: R_X86_64_32 .rodata.str1.1+0x18 3a: 48 83 ec 70 sub $0x70,%rsp 3e: e8 00 00 00 00 callq 43 <thread_start+0x13> 3f: R_X86_64_PC32 puts-0x4 43: 31 f6 xor %esi,%esi 45: 48 89 e7 mov %rsp,%rdi 48: e8 00 00 00 00 callq 4d <thread_start+0x1d> 49: R_X86_64_PC32 __sigsetjmp-0x4 4d: f3 0f 1e fa endbr64 51: 85 c0 test %eax,%eax 53: 75 51 jne a6 <thread_start+0x76> 55: 48 89 e7 mov %rsp,%rdi 58: e8 00 00 00 00 callq 5d <thread_start+0x2d> 59: R_X86_64_PC32 __pthread_register_cancel-0x4 And it looks to me that readelf says the object file is compatible with CET: Displaying notes found in: .note.gnu.property Owner Data size Description GNU 0x00000010 NT_GNU_PROPERTY_TYPE_0 Properties: x86 feature: IBT, SHSTK Thanks, Florian
On Fri, Feb 9, 2018 at 6:33 AM, Florian Weimer <fweimer@redhat.com> wrote: > On 02/09/2018 03:13 PM, H.J. Lu wrote: >> >> I built glibc master with gcc-8.0.1 -mcet -fcf-protection. Some object >> files do get CET marker as expected. But static executable isn't: > > > I said static libraries. If I compile this code (based on the example from > the manual page) on Fedora rawhide: > Glibc never provides binary compatibility with static libraries. My suggestions are 1. Recompile static libraries after CET is enabled in glibc. Or 2. Don't compile static libraries with CET. BTW, we don't have space to save shadow stack register with existing cancel buf.
On 02/09/2018 04:34 AM, H.J. Lu wrote: > On Fri, Feb 9, 2018 at 4:11 AM, Florian Weimer <fw@deneb.enyo.de> wrote: >> * H. J. Lu: >> >>>> My proposal is still rather hackish, but so is the existing code (the >>> >>> A pointer to a buffer in user program is passed to libpthread. >>> There is a jmp buf in the buffer followed by other fields. Since >>> the size of jmp buf is increased in glibc 2.28, we need to know the >>> offset of other fields. Otherwise libpthread may write beyond the >>> buffer in user program. I don't see how symbol versioning can help >>> us here since the INTERNAL libpthread functions don't know the >>> layout of __pthread_unwind_buf_t of USER programs. >> >> I suggest *not* to increase the size of the jump buffer. > > Where do we save shadow stack pointer? typedef struct { struct { __jmp_buf __cancel_jmp_buf; int __mask_was_saved; } __cancel_jmp_buf[1]; void *__pad[4]; ^^^^^^^^^^^^^^^ Save the shadow stack pointer here. } __pthread_unwind_buf_t __attribute__ ((__aligned__)); Save the shadow stack pointer to __pad[4] by making the internal sigset_t smaller and moving it down. The key aspect of Florian's recommendation is a realization that a pthread_cleanup_pop can only restore you to the *same* function e.g. the earlier pthread_cleanup_push, and therefore does not need to change the shadow stack pointer. All subsequent unwinding will proceed from one jump buffer to the next, and through the unwinder, until it reaches pthread_create. When will we ever return via a normal return instruction and need to verify via the shadow stack?
On 02/09/2018 02:48 AM, Florian Weimer wrote: > In short, the only thing need to ensure is that the over-write from > __sigsetjmp stays within __cancel_buf. Then we are good, without > changing the stack layout for cancellation. > > My proposal is still rather hackish, but so is the existing code (the > truncated jump buffer), and HJ's approach of storing the shadow stack > pointer in the signal save area of the non-truncated jump buffer. But > I think we can make it work. I disagree completely. I think this is an an elegant solution to an ugly problem. We need to see if HJ has any other requirements which may conflict with this suggestion.
On Fri, Feb 23, 2018 at 9:46 PM, Carlos O'Donell <carlos@redhat.com> wrote: > On 02/09/2018 04:34 AM, H.J. Lu wrote: >> On Fri, Feb 9, 2018 at 4:11 AM, Florian Weimer <fw@deneb.enyo.de> wrote: >>> * H. J. Lu: >>> >>>>> My proposal is still rather hackish, but so is the existing code (the >>>> >>>> A pointer to a buffer in user program is passed to libpthread. >>>> There is a jmp buf in the buffer followed by other fields. Since >>>> the size of jmp buf is increased in glibc 2.28, we need to know the >>>> offset of other fields. Otherwise libpthread may write beyond the >>>> buffer in user program. I don't see how symbol versioning can help >>>> us here since the INTERNAL libpthread functions don't know the >>>> layout of __pthread_unwind_buf_t of USER programs. >>> >>> I suggest *not* to increase the size of the jump buffer. >> >> Where do we save shadow stack pointer? > > typedef struct > { > struct > { > __jmp_buf __cancel_jmp_buf; > int __mask_was_saved; > } __cancel_jmp_buf[1]; > > > void *__pad[4]; > ^^^^^^^^^^^^^^^ Save the shadow stack pointer here. > > > } __pthread_unwind_buf_t __attribute__ ((__aligned__)); > > Save the shadow stack pointer to __pad[4] by making the > internal sigset_t smaller and moving it down. > > The key aspect of Florian's recommendation is a realization > that a pthread_cleanup_pop can only restore you to the *same* > function e.g. the earlier pthread_cleanup_push, and therefore > does not need to change the shadow stack pointer. PLEASE take a closer look: Yes, there are void *__pad[4]; But the name is misleading. It isn't real padding. This is an opaque array: /* Private data in the cleanup buffer. */ union pthread_unwind_buf_data { /* This is the placeholder of the public version. */ void *pad[4]; struct { /* Pointer to the previous cleanup buffer. */ struct pthread_unwind_buf *prev; /* Backward compatibility: state of the old-style cleanup handler at the time of the previous new-style cleanup handler installment. */ struct _pthread_cleanup_buffer *cleanup; /* Cancellation type before the push call. */ int canceltype; } data; }; Only the last element in __pad[4] is unused. There is --- Note: There is an unused pointer space in pthread_unwind_buf_data. But it isn't suitable for saving and restoring shadow stack register since x32 is a 64-bit process with 32-bit software pointer and kernel may place x32 shadow stack above 4GB. We need to save and restore 64-bit shadow stack register for x32. --- in my commit log to explain why it isn't suitable for shadow stack.
* H. J. Lu: > PLEASE take a closer look: > > Yes, there are > > void *__pad[4]; > > But the name is misleading. It isn't real padding. This is > an opaque array: > > /* Private data in the cleanup buffer. */ > union pthread_unwind_buf_data > { > /* This is the placeholder of the public version. */ > void *pad[4]; > > struct > { > /* Pointer to the previous cleanup buffer. */ > struct pthread_unwind_buf *prev; > > /* Backward compatibility: state of the old-style cleanup > handler at the time of the previous new-style cleanup handler > installment. */ > struct _pthread_cleanup_buffer *cleanup; > > /* Cancellation type before the push call. */ > int canceltype; > } data; > }; > > Only the last element in __pad[4] is unused. There is The entire __pad array is unused until the handler is registered, which happens *after* the call to __sigsetjmp, in the __pthread_register_cancel function. This means that __sigsetjmp may clobber it.
On Sat, Feb 24, 2018 at 7:46 AM, Florian Weimer <fw@deneb.enyo.de> wrote: > * H. J. Lu: > >> PLEASE take a closer look: >> >> Yes, there are >> >> void *__pad[4]; >> >> But the name is misleading. It isn't real padding. This is >> an opaque array: >> >> /* Private data in the cleanup buffer. */ >> union pthread_unwind_buf_data >> { >> /* This is the placeholder of the public version. */ >> void *pad[4]; >> >> struct >> { >> /* Pointer to the previous cleanup buffer. */ >> struct pthread_unwind_buf *prev; >> >> /* Backward compatibility: state of the old-style cleanup >> handler at the time of the previous new-style cleanup handler >> installment. */ >> struct _pthread_cleanup_buffer *cleanup; >> >> /* Cancellation type before the push call. */ >> int canceltype; >> } data; >> }; >> >> Only the last element in __pad[4] is unused. There is > > The entire __pad array is unused until the handler is registered, > which happens *after* the call to __sigsetjmp, in the > __pthread_register_cancel function. This means that __sigsetjmp may > clobber it. Please check out hjl/setjmp/pad branch and check it on x86-64. 1. It uses pad array in struct pthread_unwind_buf to save and restore shadow stack register if size of struct pthread_unwind_buf is no less than offset of shadow stack pointer + shadow stack pointer size. 2. It stores (int64_t) -1 as shadow stack register in x86-64 setjmp and read it back in x86-64 longjmp to verify that it is unchanged. I got FAIL: nptl/tst-basic3 FAIL: nptl/tst-cancel-self FAIL: nptl/tst-cancel-self-cancelstate FAIL: nptl/tst-cancel-self-canceltype FAIL: nptl/tst-cancel-self-testcancel FAIL: nptl/tst-cancel1 FAIL: nptl/tst-cancel10 FAIL: nptl/tst-cancel11 FAIL: nptl/tst-cancel12 FAIL: nptl/tst-cancel13 FAIL: nptl/tst-cancel14 FAIL: nptl/tst-cancel15 FAIL: nptl/tst-cancel16 FAIL: nptl/tst-cancel17 FAIL: nptl/tst-cancel18 FAIL: nptl/tst-cancel20 FAIL: nptl/tst-cancel21 FAIL: nptl/tst-cancel21-static FAIL: nptl/tst-cancel24 FAIL: nptl/tst-cancel24-static FAIL: nptl/tst-cancel25 FAIL: nptl/tst-cancel4 FAIL: nptl/tst-cancel4_1 FAIL: nptl/tst-cancel4_2 FAIL: nptl/tst-cancel5 FAIL: nptl/tst-cancel7 FAIL: nptl/tst-cancel9 FAIL: nptl/tst-cancelx13 FAIL: nptl/tst-cancelx15 FAIL: nptl/tst-cancelx21 FAIL: nptl/tst-cancelx7 FAIL: nptl/tst-cleanup0 FAIL: nptl/tst-cleanup0-cmp FAIL: nptl/tst-cleanup1 FAIL: nptl/tst-cleanup3 FAIL: nptl/tst-cleanup4 FAIL: nptl/tst-cleanupx0 FAIL: nptl/tst-cleanupx4 FAIL: nptl/tst-cond-except FAIL: nptl/tst-cond22 FAIL: nptl/tst-cond25 FAIL: nptl/tst-cond7 FAIL: nptl/tst-cond8 FAIL: nptl/tst-cond8-static FAIL: nptl/tst-execstack FAIL: nptl/tst-exit2 FAIL: nptl/tst-exit3 FAIL: nptl/tst-join1 FAIL: nptl/tst-join5 FAIL: nptl/tst-mutex8 FAIL: nptl/tst-mutex8-static FAIL: nptl/tst-mutexpi8 FAIL: nptl/tst-mutexpi8-static FAIL: nptl/tst-once3 FAIL: nptl/tst-once4 FAIL: nptl/tst-oncex3 FAIL: nptl/tst-oncex4 FAIL: nptl/tst-sem11 FAIL: nptl/tst-sem11-static FAIL: nptl/tst-sem12 FAIL: nptl/tst-sem12-static FAIL: nptl/tst-tsd5 FAIL: nss/tst-cancel-getpwuid_r FAIL: rt/tst-mqueue8 FAIL: nptl/tst-setuid2 For nptl/tst-tsd5, it went like this: 1. __libc_start_main calls __sigsetjmp: /* Memory for the cancellation buffer. */ struct pthread_unwind_buf unwind_buf; int not_first_call; not_first_call = setjmp ((struct __jmp_buf_tag *) unwind_buf.cancel_jmp_buf); if (__glibc_likely (! not_first_call)) __sigsetjmp stores -1 as shadow stack pointer. 2. After calling __sigsetjmp, __libc_start_main does /* Store old info. */ unwind_buf.priv.data.prev = THREAD_GETMEM (self, cleanup_jmp_buf); unwind_buf.priv.data.cleanup = THREAD_GETMEM (self, cleanup); which overrides shadow stack pointer. What have I done wrong?
* H. J. Lu:
> What have I done wrong?
My understanding (shared by Carlos as far as I know) is that you do
not need to restore the shadow stack pointer when unwinding for
cancellation or thread exit.
On Sun, Feb 25, 2018 at 1:26 AM, Florian Weimer <fw@deneb.enyo.de> wrote: > * H. J. Lu: > >> What have I done wrong? > > My understanding (shared by Carlos as far as I know) is that you do > not need to restore the shadow stack pointer when unwinding for > cancellation or thread exit. It may be true for thread exit. Are you saying that a function will never return after cancellation point?
* H. J. Lu: > On Sun, Feb 25, 2018 at 1:26 AM, Florian Weimer <fw@deneb.enyo.de> wrote: >> * H. J. Lu: >> >>> What have I done wrong? >> >> My understanding (shared by Carlos as far as I know) is that you do >> not need to restore the shadow stack pointer when unwinding for >> cancellation or thread exit. > > It may be true for thread exit. Are you saying that a function will never > return after cancellation point? That's certainly the intent. Cancellation is supposed to be unstoppable.
On Sun, Feb 25, 2018 at 3:59 AM, Florian Weimer <fw@deneb.enyo.de> wrote: > * H. J. Lu: > >> On Sun, Feb 25, 2018 at 1:26 AM, Florian Weimer <fw@deneb.enyo.de> wrote: >>> * H. J. Lu: >>> >>>> What have I done wrong? >>> >>> My understanding (shared by Carlos as far as I know) is that you do >>> not need to restore the shadow stack pointer when unwinding for >>> cancellation or thread exit. >> >> It may be true for thread exit. Are you saying that a function will never >> return after cancellation point? > > That's certainly the intent. Cancellation is supposed to be > unstoppable. The question is what happens after cancellation. Can a function return after cancellation?
On Sun, Feb 25, 2018 at 4:53 AM, H.J. Lu <hjl.tools@gmail.com> wrote: > On Sun, Feb 25, 2018 at 3:59 AM, Florian Weimer <fw@deneb.enyo.de> wrote: >> * H. J. Lu: >> >>> On Sun, Feb 25, 2018 at 1:26 AM, Florian Weimer <fw@deneb.enyo.de> wrote: >>>> * H. J. Lu: >>>> >>>>> What have I done wrong? >>>> >>>> My understanding (shared by Carlos as far as I know) is that you do >>>> not need to restore the shadow stack pointer when unwinding for >>>> cancellation or thread exit. >>> >>> It may be true for thread exit. Are you saying that a function will never >>> return after cancellation point? >> >> That's certainly the intent. Cancellation is supposed to be >> unstoppable. > > The question is what happens after cancellation. Can a function return > after cancellation? > Let me rephrase. Can a function, which contains cancellation points, return to its caller after cancellation happened?
* H. J. Lu: > Let me rephrase. Can a function, which contains cancellation points, > return to its caller after cancellation happened? I don't think it can. It's undefined to use longjmp to jump out of a cancellation handler. On the C++ side, we make sure that the unwinding always proceeds, even across catch (...) blocks which swallow the exception.
On Sun, Feb 25, 2018 at 4:58 AM, Florian Weimer <fw@deneb.enyo.de> wrote: > * H. J. Lu: > >> Let me rephrase. Can a function, which contains cancellation points, >> return to its caller after cancellation happened? > > I don't think it can. It's undefined to use longjmp to jump out of a > cancellation handler. On the C++ side, we make sure that the > unwinding always proceeds, even across catch (...) blocks which > swallow the exception. libpthread cancellation implementation passes cancel_jmp_buf to libgcc unwinder, which saves and restores shadow stack register for C++ exception. How can libgcc unwinder tell when to save and restore shadow stack register?
* H. J. Lu: > libpthread cancellation implementation passes cancel_jmp_buf to > libgcc unwinder, Oh. Where does it do that? If you mean _Unwind_ForcedUnwind, I think that's just an opaque closure argument for the callback.
On Sun, Feb 25, 2018 at 5:31 AM, Florian Weimer <fw@deneb.enyo.de> wrote: > * H. J. Lu: > >> libpthread cancellation implementation passes cancel_jmp_buf to >> libgcc unwinder, > > Oh. Where does it do that? If you mean _Unwind_ForcedUnwind, I think > that's just an opaque closure argument for the callback. Yes. Libgcc unwinder needs to deal with it.
On Sun, Feb 25, 2018 at 5:36 AM, H.J. Lu <hjl.tools@gmail.com> wrote: > On Sun, Feb 25, 2018 at 5:31 AM, Florian Weimer <fw@deneb.enyo.de> wrote: >> * H. J. Lu: >> >>> libpthread cancellation implementation passes cancel_jmp_buf to >>> libgcc unwinder, >> >> Oh. Where does it do that? If you mean _Unwind_ForcedUnwind, I think >> that's just an opaque closure argument for the callback. > > Yes. Libgcc unwinder needs to deal with it. > Because of libgcc unwinder, provide another set of setjmp/longjmp without saving and restoring shadow stack register for thread cancellation won't solve our problem.
* H. J. Lu: > On Sun, Feb 25, 2018 at 5:31 AM, Florian Weimer <fw@deneb.enyo.de> wrote: >> * H. J. Lu: >> >>> libpthread cancellation implementation passes cancel_jmp_buf to >>> libgcc unwinder, >> >> Oh. Where does it do that? If you mean _Unwind_ForcedUnwind, I think >> that's just an opaque closure argument for the callback. > > Yes. Libgcc unwinder needs to deal with it. Please point me to the code. Thanks.
On Sun, Feb 25, 2018 at 5:49 AM, Florian Weimer <fw@deneb.enyo.de> wrote: > * H. J. Lu: > >> On Sun, Feb 25, 2018 at 5:31 AM, Florian Weimer <fw@deneb.enyo.de> wrote: >>> * H. J. Lu: >>> >>>> libpthread cancellation implementation passes cancel_jmp_buf to >>>> libgcc unwinder, >>> >>> Oh. Where does it do that? If you mean _Unwind_ForcedUnwind, I think >>> that's just an opaque closure argument for the callback. >> >> Yes. Libgcc unwinder needs to deal with it. > > Please point me to the code. Thanks. sysdeps/nptl/unwind-forcedunwind.c has _Unwind_Reason_Code _Unwind_ForcedUnwind (struct _Unwind_Exception *exc, _Unwind_Stop_Fn stop, void *stop_argument) { if (__glibc_unlikely (libgcc_s_handle == NULL)) pthread_cancel_init (); else atomic_read_barrier (); _Unwind_Reason_Code (*forcedunwind) (struct _Unwind_Exception *, _Unwind_Stop_Fn, void *) = libgcc_s_forcedunwind; PTR_DEMANGLE (forcedunwind); return forcedunwind (exc, stop, stop_argument); }
* H. J. Lu: > On Sun, Feb 25, 2018 at 5:49 AM, Florian Weimer <fw@deneb.enyo.de> wrote: >> * H. J. Lu: >> >>> On Sun, Feb 25, 2018 at 5:31 AM, Florian Weimer <fw@deneb.enyo.de> wrote: >>>> * H. J. Lu: >>>> >>>>> libpthread cancellation implementation passes cancel_jmp_buf to >>>>> libgcc unwinder, >>>> >>>> Oh. Where does it do that? If you mean _Unwind_ForcedUnwind, I think >>>> that's just an opaque closure argument for the callback. >>> >>> Yes. Libgcc unwinder needs to deal with it. >> >> Please point me to the code. Thanks. > > sysdeps/nptl/unwind-forcedunwind.c has > > _Unwind_Reason_Code > _Unwind_ForcedUnwind (struct _Unwind_Exception *exc, _Unwind_Stop_Fn stop, > void *stop_argument) > { > if (__glibc_unlikely (libgcc_s_handle == NULL)) > pthread_cancel_init (); > else > atomic_read_barrier (); > > _Unwind_Reason_Code (*forcedunwind) > (struct _Unwind_Exception *, _Unwind_Stop_Fn, void *) > = libgcc_s_forcedunwind; > PTR_DEMANGLE (forcedunwind); > return forcedunwind (exc, stop, stop_argument); > } Thanks. I think stop_argument ends up in the private_2 member inside unwind.inc, which is only passed back to the callback (the stop function pointer) in _Unwind_ForcedUnwind_Phase2, and not interpreted by libgcc itself. So this shouldn't be a problem.
On Sun, Feb 25, 2018 at 6:13 AM, Florian Weimer <fw@deneb.enyo.de> wrote: > * H. J. Lu: > >> On Sun, Feb 25, 2018 at 5:49 AM, Florian Weimer <fw@deneb.enyo.de> wrote: >>> * H. J. Lu: >>> >>>> On Sun, Feb 25, 2018 at 5:31 AM, Florian Weimer <fw@deneb.enyo.de> wrote: >>>>> * H. J. Lu: >>>>> >>>>>> libpthread cancellation implementation passes cancel_jmp_buf to >>>>>> libgcc unwinder, >>>>> >>>>> Oh. Where does it do that? If you mean _Unwind_ForcedUnwind, I think >>>>> that's just an opaque closure argument for the callback. >>>> >>>> Yes. Libgcc unwinder needs to deal with it. >>> >>> Please point me to the code. Thanks. >> >> sysdeps/nptl/unwind-forcedunwind.c has >> >> _Unwind_Reason_Code >> _Unwind_ForcedUnwind (struct _Unwind_Exception *exc, _Unwind_Stop_Fn stop, >> void *stop_argument) >> { >> if (__glibc_unlikely (libgcc_s_handle == NULL)) >> pthread_cancel_init (); >> else >> atomic_read_barrier (); >> >> _Unwind_Reason_Code (*forcedunwind) >> (struct _Unwind_Exception *, _Unwind_Stop_Fn, void *) >> = libgcc_s_forcedunwind; >> PTR_DEMANGLE (forcedunwind); >> return forcedunwind (exc, stop, stop_argument); >> } > > Thanks. I think stop_argument ends up in the private_2 member inside > unwind.inc, which is only passed back to the callback (the stop > function pointer) in _Unwind_ForcedUnwind_Phase2, and not interpreted > by libgcc itself. So this shouldn't be a problem. Please take a look at hjl/setjmp/cancel branch: https://github.com/hjl-tools/glibc/tree/hjl/setjmp/cancel Functions, like LIBC_START_MAIN, START_THREAD_DEFN as well as these with thread cancellation, call setjmp, but never return to their callers after longjmp returns. This patch adds <bits/setjmp-cancel.h> and <setjmp-cancelP.h> to provide a version of setjmp family functions, __setjmp_cancel and __sigsetjmp_cancel, which are used to implement thread cancellation. The default __setjmp_cancel and __sigsetjmp_cancel are defined as setjmp and __sigsetjmp, respectively, which are the same as before. On x86, a different version is added to avoid saving and restoring shadow stack register. __libc_longjmp, which is a private interface for cancellation implementation in libpthread, is changed to call __longjmp_cancel instead of __longjmp. This leaves cancel_jmp_buf unchanged. But is it worth it? 1. I have to add __setjmp_cancel and __sigsetjmp_cancel which won't save and restore shadow stack register. 2. I still need yet to add the new version of __sigsetjmp for older binaries. 3, Older .o files compiled against glibc 2.27 are still incompatible with glibc 2.28.
On 02/25/2018 07:55 PM, H.J. Lu wrote: > On Sun, Feb 25, 2018 at 6:13 AM, Florian Weimer <fw@deneb.enyo.de> wrote: >> * H. J. Lu: >> >>> On Sun, Feb 25, 2018 at 5:49 AM, Florian Weimer <fw@deneb.enyo.de> wrote: >>>> * H. J. Lu: >>>> >>>>> On Sun, Feb 25, 2018 at 5:31 AM, Florian Weimer <fw@deneb.enyo.de> wrote: >>>>>> * H. J. Lu: >>>>>> >>>>>>> libpthread cancellation implementation passes cancel_jmp_buf to >>>>>>> libgcc unwinder, >>>>>> >>>>>> Oh. Where does it do that? If you mean _Unwind_ForcedUnwind, I think >>>>>> that's just an opaque closure argument for the callback. >>>>> >>>>> Yes. Libgcc unwinder needs to deal with it. >>>> >>>> Please point me to the code. Thanks. >>> >>> sysdeps/nptl/unwind-forcedunwind.c has >>> >>> _Unwind_Reason_Code >>> _Unwind_ForcedUnwind (struct _Unwind_Exception *exc, _Unwind_Stop_Fn stop, >>> void *stop_argument) >>> { >>> if (__glibc_unlikely (libgcc_s_handle == NULL)) >>> pthread_cancel_init (); >>> else >>> atomic_read_barrier (); >>> >>> _Unwind_Reason_Code (*forcedunwind) >>> (struct _Unwind_Exception *, _Unwind_Stop_Fn, void *) >>> = libgcc_s_forcedunwind; >>> PTR_DEMANGLE (forcedunwind); >>> return forcedunwind (exc, stop, stop_argument); >>> } >> >> Thanks. I think stop_argument ends up in the private_2 member inside >> unwind.inc, which is only passed back to the callback (the stop >> function pointer) in _Unwind_ForcedUnwind_Phase2, and not interpreted >> by libgcc itself. So this shouldn't be a problem. > > Please take a look at hjl/setjmp/cancel branch: > > https://github.com/hjl-tools/glibc/tree/hjl/setjmp/cancel OK. > Functions, like LIBC_START_MAIN, START_THREAD_DEFN as well as these > with thread cancellation, call setjmp, but never return to their > callers after longjmp returns. This patch adds <bits/setjmp-cancel.h> > and <setjmp-cancelP.h> to provide a version of setjmp family functions, > __setjmp_cancel and __sigsetjmp_cancel, which are used to implement > thread cancellation. The default __setjmp_cancel and __sigsetjmp_cancel > are defined as setjmp and __sigsetjmp, respectively, which are the same > as before. > On x86, a different version is added to avoid saving and restoring > shadow stack register. __libc_longjmp, which is a private interface > for cancellation implementation in libpthread, is changed to call > __longjmp_cancel instead of __longjmp. The consequence of this solution is that any function calls after the first longjmp will continue to extend the shadow stack because there is no restoration? I think this is OK, you effectively need to have enough shadow stack to run through all of the unwind functions as if they were nested frames on the shadow stack. Do we see that being a problem? > This leaves cancel_jmp_buf unchanged. But is it worth it? Absolutely it is worth it. This new design looks much simpler to support. The technical trade is as follows: * Three new versioned symbols for __sigsetjmp, __sigsetjmp_cancel, and __setjmp_cancel. * You can avoid versioning by placing the shadow stack pointer such that it is written into the pad[4] of the cancel_jmp_buf, and thus it would always have space to be written. However, it is cleaner to have cancel versions of these functions that do less. vs. * An ABI change that requires coordination across glibc, gcc, and binutils to ensure there is a "flag day" where the ABI can be correctly selected at runtime. The latter is not easy to execute for downstream, and the versioned symbols provide a good barrier in terms of compatibility and errors reported by the dynamic loader. > 1. I have to add __setjmp_cancel and __sigsetjmp_cancel which won't > save and restore shadow stack register. OK. > 2. I still need yet to add the new version of __sigsetjmp for older binaries. OK. > 3, Older .o files compiled against glibc 2.27 are still incompatible with > glibc 2.28. This is normal, and has never been assured.
On Wed, Feb 28, 2018 at 3:23 PM, Carlos O'Donell <carlos@redhat.com> wrote: > On 02/25/2018 07:55 PM, H.J. Lu wrote: >> On Sun, Feb 25, 2018 at 6:13 AM, Florian Weimer <fw@deneb.enyo.de> wrote: >>> * H. J. Lu: >>> >>>> On Sun, Feb 25, 2018 at 5:49 AM, Florian Weimer <fw@deneb.enyo.de> wrote: >>>>> * H. J. Lu: >>>>> >>>>>> On Sun, Feb 25, 2018 at 5:31 AM, Florian Weimer <fw@deneb.enyo.de> wrote: >>>>>>> * H. J. Lu: >>>>>>> >>>>>>>> libpthread cancellation implementation passes cancel_jmp_buf to >>>>>>>> libgcc unwinder, >>>>>>> >>>>>>> Oh. Where does it do that? If you mean _Unwind_ForcedUnwind, I think >>>>>>> that's just an opaque closure argument for the callback. >>>>>> >>>>>> Yes. Libgcc unwinder needs to deal with it. >>>>> >>>>> Please point me to the code. Thanks. >>>> >>>> sysdeps/nptl/unwind-forcedunwind.c has >>>> >>>> _Unwind_Reason_Code >>>> _Unwind_ForcedUnwind (struct _Unwind_Exception *exc, _Unwind_Stop_Fn stop, >>>> void *stop_argument) >>>> { >>>> if (__glibc_unlikely (libgcc_s_handle == NULL)) >>>> pthread_cancel_init (); >>>> else >>>> atomic_read_barrier (); >>>> >>>> _Unwind_Reason_Code (*forcedunwind) >>>> (struct _Unwind_Exception *, _Unwind_Stop_Fn, void *) >>>> = libgcc_s_forcedunwind; >>>> PTR_DEMANGLE (forcedunwind); >>>> return forcedunwind (exc, stop, stop_argument); >>>> } >>> >>> Thanks. I think stop_argument ends up in the private_2 member inside >>> unwind.inc, which is only passed back to the callback (the stop >>> function pointer) in _Unwind_ForcedUnwind_Phase2, and not interpreted >>> by libgcc itself. So this shouldn't be a problem. >> >> Please take a look at hjl/setjmp/cancel branch: >> >> https://github.com/hjl-tools/glibc/tree/hjl/setjmp/cancel > > OK. > >> Functions, like LIBC_START_MAIN, START_THREAD_DEFN as well as these >> with thread cancellation, call setjmp, but never return to their >> callers after longjmp returns. This patch adds <bits/setjmp-cancel.h> >> and <setjmp-cancelP.h> to provide a version of setjmp family functions, >> __setjmp_cancel and __sigsetjmp_cancel, which are used to implement >> thread cancellation. The default __setjmp_cancel and __sigsetjmp_cancel >> are defined as setjmp and __sigsetjmp, respectively, which are the same >> as before. > >> On x86, a different version is added to avoid saving and restoring >> shadow stack register. __libc_longjmp, which is a private interface >> for cancellation implementation in libpthread, is changed to call >> __longjmp_cancel instead of __longjmp. > > The consequence of this solution is that any function calls after the > first longjmp will continue to extend the shadow stack because there is > no restoration? I think this is OK, you effectively need to have enough > shadow stack to run through all of the unwind functions as if they were > nested frames on the shadow stack. > > Do we see that being a problem? > >> This leaves cancel_jmp_buf unchanged. But is it worth it? > > Absolutely it is worth it. This new design looks much simpler to support. > > The technical trade is as follows: > > * Three new versioned symbols for __sigsetjmp, __sigsetjmp_cancel, > and __setjmp_cancel. > > * You can avoid versioning by placing the shadow stack pointer such > that it is written into the pad[4] of the cancel_jmp_buf, and thus > it would always have space to be written. However, it is cleaner > to have cancel versions of these functions that do less. > > vs. > > * An ABI change that requires coordination across glibc, gcc, and > binutils to ensure there is a "flag day" where the ABI can be correctly > selected at runtime. > > The latter is not easy to execute for downstream, and the versioned > symbols provide a good barrier in terms of compatibility and errors > reported by the dynamic loader. > >> 1. I have to add __setjmp_cancel and __sigsetjmp_cancel which won't >> save and restore shadow stack register. I have been testing this. I ran into one issue. GCC knows that setjmp will return via longjmp and inserts ENDBR after it. But it doesn't know __setjmp_cancel and __sigsetjmp_cancel. We can either add them to GCC or add NOTRACK prefix to the corresponding longjmps. > OK. > >> 2. I still need yet to add the new version of __sigsetjmp for older binaries. > > OK. > >> 3, Older .o files compiled against glibc 2.27 are still incompatible with >> glibc 2.28. > > This is normal, and has never been assured. > > -- > Cheers, > Carlos.
On 03/07/2018 03:56 AM, H.J. Lu wrote: >>> 1. I have to add __setjmp_cancel and __sigsetjmp_cancel which won't >>> save and restore shadow stack register. > > I have been testing this. I ran into one issue. GCC knows that setjmp will > return via longjmp and inserts ENDBR after it. But it doesn't know > __setjmp_cancel and __sigsetjmp_cancel. We can either add them to GCC > or add NOTRACK prefix to the corresponding longjmps. I would rather GCC did not know about these implementation details. I have no objection to the NOTRACK prefix in the corresponding longjmps. What would be a downside to this choice?
On Wed, Mar 7, 2018 at 9:34 AM, Carlos O'Donell <carlos@redhat.com> wrote: > On 03/07/2018 03:56 AM, H.J. Lu wrote: >>>> 1. I have to add __setjmp_cancel and __sigsetjmp_cancel which won't >>>> save and restore shadow stack register. >> >> I have been testing this. I ran into one issue. GCC knows that setjmp will >> return via longjmp and inserts ENDBR after it. But it doesn't know >> __setjmp_cancel and __sigsetjmp_cancel. We can either add them to GCC >> or add NOTRACK prefix to the corresponding longjmps. > > I would rather GCC did not know about these implementation details. > > I have no objection to the NOTRACK prefix in the corresponding longjmps. > > What would be a downside to this choice? > NOTRACK prefix is typically generated by compiler for switch table. Compiler knows each indirect jump target is valid and pointer load for indirect jump is generated by compiler in read-only section. This is pretty safe since there is very little chance for malicious codes to temper the pointer value. However, in case of longjmp, the indirect jump target is in jmpbuf. There is a possilibty for malicious codes to change the indirect jump target such that longjmp wil jump to the wrong place. Use NOTRACK prefix here defeats the purpose of indirect branch tracking in CET.
On Wed, Mar 7, 2018 at 11:47 AM, H.J. Lu <hjl.tools@gmail.com> wrote: > On Wed, Mar 7, 2018 at 9:34 AM, Carlos O'Donell <carlos@redhat.com> wrote: >> On 03/07/2018 03:56 AM, H.J. Lu wrote: >>>>> 1. I have to add __setjmp_cancel and __sigsetjmp_cancel which won't >>>>> save and restore shadow stack register. >>> >>> I have been testing this. I ran into one issue. GCC knows that setjmp will >>> return via longjmp and inserts ENDBR after it. But it doesn't know >>> __setjmp_cancel and __sigsetjmp_cancel. We can either add them to GCC >>> or add NOTRACK prefix to the corresponding longjmps. >> >> I would rather GCC did not know about these implementation details. >> >> I have no objection to the NOTRACK prefix in the corresponding longjmps. >> >> What would be a downside to this choice? >> > > NOTRACK prefix is typically generated by compiler for switch table. Compiler > knows each indirect jump target is valid and pointer load for indirect jump is > generated by compiler in read-only section. This is pretty safe since there is > very little chance for malicious codes to temper the pointer value. However, > in case of longjmp, the indirect jump target is in jmpbuf. There is > a possilibty > for malicious codes to change the indirect jump target such that longjmp wil > jump to the wrong place. Use NOTRACK prefix here defeats the purpose of > indirect branch tracking in CET. > Also GCC needs to know that __setjmp_cancel and __sigsetjmp_cancel may return twice, similar to setjmp.
On Wed, Mar 7, 2018 at 12:14 PM, H.J. Lu <hjl.tools@gmail.com> wrote: > On Wed, Mar 7, 2018 at 11:47 AM, H.J. Lu <hjl.tools@gmail.com> wrote: >> On Wed, Mar 7, 2018 at 9:34 AM, Carlos O'Donell <carlos@redhat.com> wrote: >>> On 03/07/2018 03:56 AM, H.J. Lu wrote: >>>>>> 1. I have to add __setjmp_cancel and __sigsetjmp_cancel which won't >>>>>> save and restore shadow stack register. >>>> >>>> I have been testing this. I ran into one issue. GCC knows that setjmp will >>>> return via longjmp and inserts ENDBR after it. But it doesn't know >>>> __setjmp_cancel and __sigsetjmp_cancel. We can either add them to GCC >>>> or add NOTRACK prefix to the corresponding longjmps. >>> >>> I would rather GCC did not know about these implementation details. >>> >>> I have no objection to the NOTRACK prefix in the corresponding longjmps. >>> >>> What would be a downside to this choice? >>> >> >> NOTRACK prefix is typically generated by compiler for switch table. Compiler >> knows each indirect jump target is valid and pointer load for indirect jump is >> generated by compiler in read-only section. This is pretty safe since there is >> very little chance for malicious codes to temper the pointer value. However, >> in case of longjmp, the indirect jump target is in jmpbuf. There is >> a possilibty >> for malicious codes to change the indirect jump target such that longjmp wil >> jump to the wrong place. Use NOTRACK prefix here defeats the purpose of >> indirect branch tracking in CET. >> > > Also GCC needs to know that __setjmp_cancel and __sigsetjmp_cancel may > return twice, similar to setjmp. > Here is the GCC patch: diff --git a/gcc/calls.c b/gcc/calls.c index 19c95b8455b..d1f436dfa91 100644 --- a/gcc/calls.c +++ b/gcc/calls.c @@ -604,7 +604,7 @@ special_function_p (const_tree fndecl, int flags) name_decl = DECL_NAME (cgraph_node::get (fndecl)->orig_decl); if (fndecl && name_decl - && IDENTIFIER_LENGTH (name_decl) <= 11 + && IDENTIFIER_LENGTH (name_decl) <= 18 /* Exclude functions not at the file scope, or not `extern', since they are not the magic functions we would otherwise think they are. @@ -637,8 +637,8 @@ special_function_p (const_tree fndecl, int flags) } /* ECF_RETURNS_TWICE is safe even for -ffreestanding. */ - if (! strcmp (tname, "setjmp") - || ! strcmp (tname, "sigsetjmp") + if (! strncmp (tname, "setjmp", 6) + || ! strncmp (tname, "sigsetjmp", 9) || ! strcmp (name, "savectx") || ! strcmp (name, "vfork") || ! strcmp (name, "getcontext"))
> -----Original Message----- > From: H.J. Lu [mailto:hjl.tools@gmail.com] > Sent: Wednesday, March 7, 2018 11:07 PM > To: Carlos O'Donell <carlos@redhat.com>; Tsimbalist, Igor V > <igor.v.tsimbalist@intel.com> > Cc: Florian Weimer <fw@deneb.enyo.de>; GNU C Library <libc- > alpha@sourceware.org> > Subject: Re: [PATCH 0/2] nptl: Update struct pthread_unwind_buf > > On Wed, Mar 7, 2018 at 12:14 PM, H.J. Lu <hjl.tools@gmail.com> wrote: > > On Wed, Mar 7, 2018 at 11:47 AM, H.J. Lu <hjl.tools@gmail.com> wrote: > >> On Wed, Mar 7, 2018 at 9:34 AM, Carlos O'Donell <carlos@redhat.com> > wrote: > >>> On 03/07/2018 03:56 AM, H.J. Lu wrote: > >>>>>> 1. I have to add __setjmp_cancel and __sigsetjmp_cancel which > won't > >>>>>> save and restore shadow stack register. > >>>> > >>>> I have been testing this. I ran into one issue. GCC knows that setjmp > will > >>>> return via longjmp and inserts ENDBR after it. But it doesn't know > >>>> __setjmp_cancel and __sigsetjmp_cancel. We can either add them to > GCC > >>>> or add NOTRACK prefix to the corresponding longjmps. > >>> > >>> I would rather GCC did not know about these implementation details. > >>> > >>> I have no objection to the NOTRACK prefix in the corresponding > longjmps. > >>> > >>> What would be a downside to this choice? > >>> > >> > >> NOTRACK prefix is typically generated by compiler for switch table. > Compiler > >> knows each indirect jump target is valid and pointer load for indirect > jump is > >> generated by compiler in read-only section. This is pretty safe since there > is > >> very little chance for malicious codes to temper the pointer value. > However, > >> in case of longjmp, the indirect jump target is in jmpbuf. There is > >> a possilibty > >> for malicious codes to change the indirect jump target such that longjmp > wil > >> jump to the wrong place. Use NOTRACK prefix here defeats the purpose > of > >> indirect branch tracking in CET. > >> > > > > Also GCC needs to know that __setjmp_cancel and __sigsetjmp_cancel may > > return twice, similar to setjmp. > > > > Here is the GCC patch: > > > diff --git a/gcc/calls.c b/gcc/calls.c > index 19c95b8455b..d1f436dfa91 100644 > --- a/gcc/calls.c > +++ b/gcc/calls.c > @@ -604,7 +604,7 @@ special_function_p (const_tree fndecl, int flags) > name_decl = DECL_NAME (cgraph_node::get (fndecl)->orig_decl); > > if (fndecl && name_decl > - && IDENTIFIER_LENGTH (name_decl) <= 11 > + && IDENTIFIER_LENGTH (name_decl) <= 18 > /* Exclude functions not at the file scope, or not `extern', > since they are not the magic functions we would otherwise > think they are. > @@ -637,8 +637,8 @@ special_function_p (const_tree fndecl, int flags) > } > > /* ECF_RETURNS_TWICE is safe even for -ffreestanding. */ > - if (! strcmp (tname, "setjmp") > - || ! strcmp (tname, "sigsetjmp") > + if (! strncmp (tname, "setjmp", 6) > + || ! strncmp (tname, "sigsetjmp", 9) > || ! strcmp (name, "savectx") > || ! strcmp (name, "vfork") > || ! strcmp (name, "getcontext")) Should it be compared with the whole function name (__setjmp_cancel and __sigsetjmp_cancel) as something like setjmp_my_func will be also detected? Also there was an error in detection of __getcontext, which is 12 bytes, but it will be fixed by this patch. Igor > -- > H.J.
On Thu, Mar 8, 2018 at 4:24 AM, Tsimbalist, Igor V <igor.v.tsimbalist@intel.com> wrote: >> -----Original Message----- >> From: H.J. Lu [mailto:hjl.tools@gmail.com] >> Sent: Wednesday, March 7, 2018 11:07 PM >> To: Carlos O'Donell <carlos@redhat.com>; Tsimbalist, Igor V >> <igor.v.tsimbalist@intel.com> >> Cc: Florian Weimer <fw@deneb.enyo.de>; GNU C Library <libc- >> alpha@sourceware.org> >> Subject: Re: [PATCH 0/2] nptl: Update struct pthread_unwind_buf >> >> On Wed, Mar 7, 2018 at 12:14 PM, H.J. Lu <hjl.tools@gmail.com> wrote: >> > On Wed, Mar 7, 2018 at 11:47 AM, H.J. Lu <hjl.tools@gmail.com> wrote: >> >> On Wed, Mar 7, 2018 at 9:34 AM, Carlos O'Donell <carlos@redhat.com> >> wrote: >> >>> On 03/07/2018 03:56 AM, H.J. Lu wrote: >> >>>>>> 1. I have to add __setjmp_cancel and __sigsetjmp_cancel which >> won't >> >>>>>> save and restore shadow stack register. >> >>>> >> >>>> I have been testing this. I ran into one issue. GCC knows that setjmp >> will >> >>>> return via longjmp and inserts ENDBR after it. But it doesn't know >> >>>> __setjmp_cancel and __sigsetjmp_cancel. We can either add them to >> GCC >> >>>> or add NOTRACK prefix to the corresponding longjmps. >> >>> >> >>> I would rather GCC did not know about these implementation details. >> >>> >> >>> I have no objection to the NOTRACK prefix in the corresponding >> longjmps. >> >>> >> >>> What would be a downside to this choice? >> >>> >> >> >> >> NOTRACK prefix is typically generated by compiler for switch table. >> Compiler >> >> knows each indirect jump target is valid and pointer load for indirect >> jump is >> >> generated by compiler in read-only section. This is pretty safe since there >> is >> >> very little chance for malicious codes to temper the pointer value. >> However, >> >> in case of longjmp, the indirect jump target is in jmpbuf. There is >> >> a possilibty >> >> for malicious codes to change the indirect jump target such that longjmp >> wil >> >> jump to the wrong place. Use NOTRACK prefix here defeats the purpose >> of >> >> indirect branch tracking in CET. >> >> >> > >> > Also GCC needs to know that __setjmp_cancel and __sigsetjmp_cancel may >> > return twice, similar to setjmp. >> > >> >> Here is the GCC patch: >> >> >> diff --git a/gcc/calls.c b/gcc/calls.c >> index 19c95b8455b..d1f436dfa91 100644 >> --- a/gcc/calls.c >> +++ b/gcc/calls.c >> @@ -604,7 +604,7 @@ special_function_p (const_tree fndecl, int flags) >> name_decl = DECL_NAME (cgraph_node::get (fndecl)->orig_decl); >> >> if (fndecl && name_decl >> - && IDENTIFIER_LENGTH (name_decl) <= 11 >> + && IDENTIFIER_LENGTH (name_decl) <= 18 >> /* Exclude functions not at the file scope, or not `extern', >> since they are not the magic functions we would otherwise >> think they are. >> @@ -637,8 +637,8 @@ special_function_p (const_tree fndecl, int flags) >> } >> >> /* ECF_RETURNS_TWICE is safe even for -ffreestanding. */ >> - if (! strcmp (tname, "setjmp") >> - || ! strcmp (tname, "sigsetjmp") >> + if (! strncmp (tname, "setjmp", 6) >> + || ! strncmp (tname, "sigsetjmp", 9) >> || ! strcmp (name, "savectx") >> || ! strcmp (name, "vfork") >> || ! strcmp (name, "getcontext")) > > Should it be compared with the whole function name (__setjmp_cancel and > __sigsetjmp_cancel) as something like setjmp_my_func will be also detected? True. This is the patch I have tested: https://github.com/hjl-tools/gcc/commit/e98087865405f051e93d5f35588789ef9686db4a > Also there was an error in detection of __getcontext, which is 12 bytes, but it > will be fixed by this patch.
On 03/08/2018 04:48 AM, H.J. Lu wrote: > True. This is the patch I have tested: > > https://github.com/hjl-tools/gcc/commit/e98087865405f051e93d5f35588789ef9686db4a I assume NOTRACK prefix is what Intel calls 'NO_TRACK_EN' in the CET documentation since binutils uses the 0x3E prefix for it and that matches the Intel CET docs. Please correct me if I'm wrong. In which case I agree, using NOTRACK is going to prevent a useful use of CET against writes to the cancellation jump buffer. This patch looks good to me, but is not a *correctness* issue, it is simply that we want to extend coverage to the private cancellation setjmp/longjmp buffers. Is there any way we can do this with source markup instead of via a fragile list in the compiler? Presumably users may want to markup their own code like this also if they have custom implementations of functions that behave like setjmp/longjmp?
On Thu, Mar 8, 2018 at 4:47 PM, Carlos O'Donell <carlos@redhat.com> wrote: > On 03/08/2018 04:48 AM, H.J. Lu wrote: >> True. This is the patch I have tested: >> >> https://github.com/hjl-tools/gcc/commit/e98087865405f051e93d5f35588789ef9686db4a > > I assume NOTRACK prefix is what Intel calls 'NO_TRACK_EN' in the CET > documentation since binutils uses the 0x3E prefix for it and that matches > the Intel CET docs. Please correct me if I'm wrong. That is correct. > In which case I agree, using NOTRACK is going to prevent a useful use > of CET against writes to the cancellation jump buffer. True. > This patch looks good to me, but is not a *correctness* issue, it is > simply that we want to extend coverage to the private cancellation > setjmp/longjmp buffers. > > Is there any way we can do this with source markup instead of via > a fragile list in the compiler? > > Presumably users may want to markup their own code like this also > if they have custom implementations of functions that behave like > setjmp/longjmp? > Yes, we can use __attribute__((__returns_twice__)). I updated hjl/setjmp/cancel branch to do that. No GCC changes are needed.
On 03/08/2018 10:23 PM, H.J. Lu wrote: > On Thu, Mar 8, 2018 at 4:47 PM, Carlos O'Donell <carlos@redhat.com> wrote: >> On 03/08/2018 04:48 AM, H.J. Lu wrote: >>> True. This is the patch I have tested: >>> >>> https://github.com/hjl-tools/gcc/commit/e98087865405f051e93d5f35588789ef9686db4a >> >> I assume NOTRACK prefix is what Intel calls 'NO_TRACK_EN' in the CET >> documentation since binutils uses the 0x3E prefix for it and that matches >> the Intel CET docs. Please correct me if I'm wrong. > > That is correct. > >> In which case I agree, using NOTRACK is going to prevent a useful use >> of CET against writes to the cancellation jump buffer. > > True. > >> This patch looks good to me, but is not a *correctness* issue, it is >> simply that we want to extend coverage to the private cancellation >> setjmp/longjmp buffers. >> >> Is there any way we can do this with source markup instead of via >> a fragile list in the compiler? >> >> Presumably users may want to markup their own code like this also >> if they have custom implementations of functions that behave like >> setjmp/longjmp? >> > > Yes, we can use __attribute__((__returns_twice__)). I updated > hjl/setjmp/cancel branch to do that. No GCC changes are needed. Is the next step for me to do another round of review on this branch? Cheers, Carlos.