pthread_cleanup_push macro generates warning when -Wclobbered is set

Message ID 3255e05c-196f-be61-799d-ad64828a721a@mentorg.com
State New
Headers show
Series
  • pthread_cleanup_push macro generates warning when -Wclobbered is set
Related show

Commit Message

Paul Carroll Nov. 14, 2017, 6:05 p.m.
I originally submitted this patch back in May as part of Bugzilla 
#21513.  That issue was never pursued to completion.

There is an open issue filed against the Gcc Middle-End - Bugzilla #61118.
In that issue, it is noted that the use of the pthread_cleanup_push 
macro from pthread.h will generate 2 warnings when -Wclobbered is set.
The warnings are for the '__cancel_routine' and '__cancel_arg' variables 
that are created as part of the macro definition.
The warning occurs because of the presence of a sigsetjmp() call after 
those 2 variables are defined.

For our customer, who compiles with -Werror, the presence of these 
warnings is unacceptable.

In the absence of a GCC fix, a solution is to modify the macro 
definitions in pthread.h so as to mark those variables as 'volatile'.
The changes would be made to both pthread_cleanup_push and 
pthread_cleanup_push_defer_np macro definitions.  The changes would make 
the macros look something like this:

# define pthread_cleanup_push(routine, arg) \
   do {                                                               \
     __pthread_unwind_buf_t __cancel_buf;                             \
     void (* volatile __cancel_routine) (void *) = (routine);         \
     void * volatile __cancel_arg = (arg);                            \
     :

Since those variables are now reloaded whenever they are used and thus 
cannot not be clobbered by the setjmp, the compilation is quiet.

Attempts were made to use '#pragma GCC diagnostic' to turn off the 
warning specifically for this macro expansion, but that had no effect.

And, as has been noted in the GCC Bugzilla issue, this issue only occurs 
when optimization is used and when files are preprocessed (i.e., not .i 
files).


The analysis behind this patch was noted by one of our engineers:

(a) The warning is a false positive. The _cancel_routine and cancel_arg 
variables will not in fact be modified between the _sigsetjmp call and 
the corresponding longjmp (if any) with the same jmp_buf.

(b) The compiler has no way of knowing that it is a false positive. The 
calls to the macros are within a loop and the compiler cannot tell that 
a longjmp does not occur that uses the jmp_buf from a previous iteration.

(c) As these warnings occur quite late and are based on when RTL pseudos 
for particular variables are live, they are very sensitive to details of 
source code and code generation.

(d) Occurring that late also means the macro expansion contexts that are 
available earlier in compilation are not available here, only the 
expansion-point location. This probably explains why diagnostic pragmas 
inside the cleanup macros do not serve to disable the warning.

(e) Going via preprocessed source simplifies the location information to 
the form that can be represented in .i files. By forcing every expanded 
token to be either a system header token or not, as marked in the .i 
file, it may well perturb details of when warnings occur for code coming 
partly from expansions of macros in system headers. The warning state in 
normal compilation matters more than the warning state when information 
has been discarded by going through a .i file.

(f) On that basis, although it is a workaround for a compiler limitation 
regarding diagnostic pragmas, adding volatile in pthread.h seems a 
reasonable approach to avoiding the warning.

Comments

Florian Weimer Nov. 14, 2017, 7 p.m. | #1
On 11/14/2017 07:05 PM, Paul Carroll wrote:
> (f) On that basis, although it is a workaround for a compiler limitation 
> regarding diagnostic pragmas, adding volatile in pthread.h seems a 
> reasonable approach to avoiding the warning.

The problem is that this forces the function pointer onto the stack. 
 From tst-clobbered:


   a6:   48 c7 44 24 20 00 00    movq   $0x0,0x20(%rsp)
   ad:   00 00
                         ab: R_X86_64_32S        cleanup_fn

  168:   48 8b 44 24 20          mov    0x20(%rsp),%rax
  16d:   48 8b 7c 24 28          mov    0x28(%rsp),%rdi
  172:   ff d0                   callq  *%rax

Or tst-cancel1:

  1f4:   48 c7 04 24 00 00 00    movq   $0x0,(%rsp)
  1fb:   00
                         1f8: R_X86_64_32S       .text

  2d8:   48 8b 04 24             mov    (%rsp),%rax
  2dc:   48 8b 7c 24 08          mov    0x8(%rsp),%rdi
  2e1:   ff d0                   callq  *%rax
  2e3:   48 8d 7c 24 10          lea    0x10(%rsp),%rdi
  2e8:   e8 00 00 00 00          callq  2ed <tf+0x12d>

At least tst-cancel1 had a direct call before:

  2a0:   bf 2a 00 00 00          mov    $0x2a,%edi
  2a5:   e8 56 fd ff ff          callq  0 <cleanup>

This is problematic from a security point of view: We really do not want 
unencrypted function pointers on the stack.  (The setjmp return address 
is at least mangled.)

Your test case already used an indirect call before the change with GCC 
7.  I think we should try to fix this in GCC.  GCC 4.8 used to generate 
a direct call here, so this is a minor regression in the area of 
security hardening.

In the meantime, can you compile with -fexceptions?

Thanks,
Florian

PS: For those reading along, this is the GCC PR:

   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61118
Joseph Myers Nov. 14, 2017, 9:47 p.m. | #2
On Tue, 14 Nov 2017, Florian Weimer wrote:

> Your test case already used an indirect call before the change with GCC 7.  I
> think we should try to fix this in GCC.  GCC 4.8 used to generate a direct
> call here, so this is a minor regression in the area of security hardening.

How do you suggest the compiler could tell that longjmp is only ever 
called from the same iteration of the outer loop as setjmp?
Florian Weimer Nov. 14, 2017, 9:55 p.m. | #3
On 11/14/2017 10:47 PM, Joseph Myers wrote:
> On Tue, 14 Nov 2017, Florian Weimer wrote:
> 
>> Your test case already used an indirect call before the change with GCC 7.  I
>> think we should try to fix this in GCC.  GCC 4.8 used to generate a direct
>> call here, so this is a minor regression in the area of security hardening.
> 
> How do you suggest the compiler could tell that longjmp is only ever
> called from the same iteration of the outer loop as setjmp?

I'm not sure if I understand your question correctly.

The jump buffer does not even live as long as one iteration of the loop, 
so it necessarily has to be the same iteration.

Thanks,
Florian
Joseph Myers Nov. 14, 2017, 10:06 p.m. | #4
On Tue, 14 Nov 2017, Florian Weimer wrote:

> On 11/14/2017 10:47 PM, Joseph Myers wrote:
> > On Tue, 14 Nov 2017, Florian Weimer wrote:
> > 
> > > Your test case already used an indirect call before the change with GCC 7.
> > > I
> > > think we should try to fix this in GCC.  GCC 4.8 used to generate a direct
> > > call here, so this is a minor regression in the area of security
> > > hardening.
> > 
> > How do you suggest the compiler could tell that longjmp is only ever
> > called from the same iteration of the outer loop as setjmp?
> 
> I'm not sure if I understand your question correctly.
> 
> The jump buffer does not even live as long as one iteration of the loop, so it
> necessarily has to be the same iteration.

The "returns twice" information in GCC does not link the possible second 
return to the lifetime of a particular object.

Would that be extended in some way that introduces such a linkage?  
Special knowledge about __sigsetjmp, or an extension to the returns_twice 
attribute (cf. bug 20382 noting how glibc relies on such special knowledge 
at present and doesn't have such attributes in its headers)?
Florian Weimer Nov. 15, 2017, 8:14 a.m. | #5
On 11/14/2017 11:06 PM, Joseph Myers wrote:
> On Tue, 14 Nov 2017, Florian Weimer wrote:
> 
>> On 11/14/2017 10:47 PM, Joseph Myers wrote:
>>> On Tue, 14 Nov 2017, Florian Weimer wrote:
>>>
>>>> Your test case already used an indirect call before the change with GCC 7.
>>>> I
>>>> think we should try to fix this in GCC.  GCC 4.8 used to generate a direct
>>>> call here, so this is a minor regression in the area of security
>>>> hardening.
>>>
>>> How do you suggest the compiler could tell that longjmp is only ever
>>> called from the same iteration of the outer loop as setjmp?
>>
>> I'm not sure if I understand your question correctly.
>>
>> The jump buffer does not even live as long as one iteration of the loop, so it
>> necessarily has to be the same iteration.
> 
> The "returns twice" information in GCC does not link the possible second
> return to the lifetime of a particular object.

But the control flow in pthread_cleanup_push is such that for both 
returns, local objects of limited scope are referenced immediately, so 
the compiler could infer that.

We could however take the position that C11 requires that once you call 
setjmp, the lifetime of all objects begins and ends and the beginning 
and end of the closest enclosing scope which has an object of variably 
modified type.  This would preclude unrolling loops and putting declared 
objects at different addresses, for example.  With this interpretation, 
the compiler cannot infer that the loop iteration is the same.  (GCC 
could perform the this transformation conservatively for any frame which 
has a call to a returns-twice function; no new attribute is needed.)

However, I still don't see how this matters here.  __cancel_routine and 
__cancel_arg are not addressable and not written to after initialization 
(and that's also true for the future argument in Paul's reproducer). 
This means that even with the POSIX interpretation if setjmp (where 
object lifetimes do not float outwards, and the address of the jump 
buffer matters), there is no wiggle room for the implementation to 
change the value of these variables.

Thanks,
Florian
Joseph Myers Nov. 15, 2017, 1:44 p.m. | #6
On Wed, 15 Nov 2017, Florian Weimer wrote:

> > The "returns twice" information in GCC does not link the possible second
> > return to the lifetime of a particular object.
> 
> But the control flow in pthread_cleanup_push is such that for both returns,
> local objects of limited scope are referenced immediately, so the compiler
> could infer that.

As I see it, the point of the warning is to say that a second return might 
access a variable that has (through being in another loop iteration, in 
this case) changed.  If we reason that such an access would be invalid if 
in another loop iteration, and so we can't be in another loop iteration, 
when would the warning ever occur?

As far as the compiler can see, the __sigsetjmp call makes the jmp_buf 
contents escape, and at any subsequent point (before the function returns 
or a scope with a variably modified type is left) the function might 
return again - and if this is when the containing scope has reentered, the 
warning is meant to warn, not deduce that in fact that case does not 
occur.  Some more precise way of describing the possible times of a second 
return would be needed to make it valid not to warn.

> However, I still don't see how this matters here.  __cancel_routine and
> __cancel_arg are not addressable and not written to after initialization (and
> that's also true for the future argument in Paul's reproducer). This means
> that even with the POSIX interpretation if setjmp (where object lifetimes do
> not float outwards, and the address of the jump buffer matters), there is no
> wiggle room for the implementation to change the value of these variables.

The end of the containing scope causes their values to be uninitialized.
Florian Weimer Nov. 15, 2017, 2:08 p.m. | #7
On 11/15/2017 02:44 PM, Joseph Myers wrote:
> On Wed, 15 Nov 2017, Florian Weimer wrote:
> 
>>> The "returns twice" information in GCC does not link the possible second
>>> return to the lifetime of a particular object.
>>
>> But the control flow in pthread_cleanup_push is such that for both returns,
>> local objects of limited scope are referenced immediately, so the compiler
>> could infer that.
> 
> As I see it, the point of the warning is to say that a second return might
> access a variable that has (through being in another loop iteration, in
> this case) changed.  If we reason that such an access would be invalid if
> in another loop iteration, and so we can't be in another loop iteration,
> when would the warning ever occur?

I think the warning is completely bogus in this case, and for any 
properly matched use of pthread_cleanup_push and pthread_cleanup_pop, as 
far as the __cancel_routine and __cancel_arg variables are concerned.

The warning is valid in in other cases.  I think here it is valid 
because the lifetime of the variable a has not ended upon the second 
return from setjmp:

#include <setjmp.h>

extern jmp_buf buf;
extern void f1 (void);
extern void f2 (void);

int
f3 (void)
{
   int a = 1;
   if (setjmp (buf))
     return a;
   f1 ();
   ++a;
   f2 ();
   return a;
}

But as far as I can see, this is always a portability warning because 
GCC will put the variable on the stack to make sure that its value is in 
fact determinate.

> As far as the compiler can see, the __sigsetjmp call makes the jmp_buf
> contents escape, and at any subsequent point (before the function returns
> or a scope with a variably modified type is left) the function might
> return again - and if this is when the containing scope has reentered, the
> warning is meant to warn, not deduce that in fact that case does not
> occur.  Some more precise way of describing the possible times of a second
> return would be needed to make it valid not to warn.

Still not convinced about this, and what's worse, I don't understand why 
you come to a different conclusion.  This shouldn't be a matter of 
opinion. 8-/

>> However, I still don't see how this matters here.  __cancel_routine and
>> __cancel_arg are not addressable and not written to after initialization (and
>> that's also true for the future argument in Paul's reproducer). This means
>> that even with the POSIX interpretation if setjmp (where object lifetimes do
>> not float outwards, and the address of the jump buffer matters), there is no
>> wiggle room for the implementation to change the value of these variables.
> 
> The end of the containing scope causes their values to be uninitialized.

They cease to exist completely once the block is exited.  Any further 
access should result in undefined behavior.

Thanks,
Florian
Joseph Myers Nov. 15, 2017, 3:28 p.m. | #8
On Wed, 15 Nov 2017, Florian Weimer wrote:

> > As far as the compiler can see, the __sigsetjmp call makes the jmp_buf
> > contents escape, and at any subsequent point (before the function returns
> > or a scope with a variably modified type is left) the function might
> > return again - and if this is when the containing scope has reentered, the
> > warning is meant to warn, not deduce that in fact that case does not
> > occur.  Some more precise way of describing the possible times of a second
> > return would be needed to make it valid not to warn.
> 
> Still not convinced about this, and what's worse, I don't understand why you
> come to a different conclusion.  This shouldn't be a matter of opinion. 8-/

Given that the compiler knows nothing about the semantics of the functions 
in question, beyond the returns-twice property of __sigsetjmp, the warning 
seems justified to me, in that the second return of __sigsetjmp might 
result in undefined behavior from variables being accessed after the 
containing block has exited, which is what the warning is about.  This is 
a matter of heuristics for what possible problematic returns should be 
considered by the warning.

Patch

diff -urN a/ChangeLog b/ChangeLog
--- a/ChangeLog	2017-08-02 07:57:16.000000000 -0500
+++ b/ChangeLog	2017-11-14 10:44:33.367788400 -0500
@@ -1,3 +1,13 @@ 
+2017-11-14  Paul Carroll  <pcarroll@codesourcery.com>
+
+	[BZ #21513]
+	* sysdeps/nptl/pthread.h (pthread_cleanup_push): Added 'volatile'
+	to __cancel_routine and __cancel_arg definitions.
+	* sysdeps/unix/sysv/linux/hppa/pthread.h (pthread_cleanup_push): Added
+	'volatile' to __cancel_routine and __cancel_arg definitions.
+	* nptl/Makefile (tests): Add tst-clobbered.c.
+	* nptl/tst-clobbered.c: New file.
+
 2017-08-02  Siddhesh Poyarekar  <siddhesh@sourceware.org>
 
 	* version.h (RELEASE): Set to "stable"
diff -urN a/nptl/Makefile b/nptl/Makefile
--- a/nptl/Makefile	2017-08-02 07:57:16.000000000 -0500
+++ b/nptl/Makefile	2017-11-14 10:44:12.211757900 -0500
@@ -302,7 +302,8 @@ 
 			    c89 gnu89 c99 gnu99 c11 gnu11) \
 	tst-bad-schedattr \
 	tst-thread_local1 tst-mutex-errorcheck tst-robust10 \
-	tst-robust-fork tst-create-detached tst-memstream
+	tst-robust-fork tst-create-detached tst-memstream \
+	tst-clobbered
 
 tests-internal := tst-typesizes \
 		  tst-rwlock19 tst-rwlock20 \
diff -urN a/nptl/tst-clobbered.c b/nptl/tst-clobbered.c
--- a/nptl/tst-clobbered.c	1969-12-31 19:00:00.000000000 -0500
+++ b/nptl/tst-clobbered.c	2017-11-14 10:45:50.376896700 -0500
@@ -0,0 +1,39 @@ 
+#include <pthread.h>
+#pragma GCC diagnostic error "-Wclobbered"
+
+#include <stdlib.h>
+
+void cleanup_fn(void *mutex)
+{
+}
+
+typedef struct {
+  size_t progress;
+  size_t total;
+  pthread_mutex_t mutex;
+  pthread_cond_t cond;
+  double min_wait;
+} dmnsn_future;
+
+void
+dmnsn_future_wait(dmnsn_future *future, double progress)
+{
+  pthread_mutex_lock(&future->mutex);
+  while ((double)future->progress/future->total < progress) {
+    /* Warning goes away without this block */
+    if (progress < future->min_wait) {
+      future->min_wait = progress;
+    }
+
+    pthread_cleanup_push(cleanup_fn, &future->mutex);
+    pthread_cond_wait(&future->cond, &future->mutex);
+    pthread_cleanup_pop(0);
+  }
+  pthread_mutex_unlock(&future->mutex);
+}
+
+int
+main(int argc, char *argv[])
+{
+  exit(0);
+}
diff -urN a/sysdeps/nptl/pthread.h b/sysdeps/nptl/pthread.h
--- a/sysdeps/nptl/pthread.h	2017-08-02 07:57:16.000000000 -0500
+++ b/sysdeps/nptl/pthread.h	2017-11-14 10:48:12.943097200 -0500
@@ -665,8 +665,8 @@ 
 # define pthread_cleanup_push(routine, arg) \
   do {									      \
     __pthread_unwind_buf_t __cancel_buf;				      \
-    void (*__cancel_routine) (void *) = (routine);			      \
-    void *__cancel_arg = (arg);						      \
+    void (* volatile __cancel_routine) (void *) = (routine);		      \
+    void * volatile __cancel_arg = (arg);				      \
     int __not_first_call = __sigsetjmp ((struct __jmp_buf_tag *) (void *)     \
 					__cancel_buf.__cancel_jmp_buf, 0);    \
     if (__glibc_unlikely (__not_first_call))				      \
@@ -700,8 +700,8 @@ 
 #  define pthread_cleanup_push_defer_np(routine, arg) \
   do {									      \
     __pthread_unwind_buf_t __cancel_buf;				      \
-    void (*__cancel_routine) (void *) = (routine);			      \
-    void *__cancel_arg = (arg);						      \
+    void (* volatile __cancel_routine) (void *) = (routine);		      \
+    void * volatile __cancel_arg = (arg);				      \
     int __not_first_call = __sigsetjmp ((struct __jmp_buf_tag *) (void *)     \
 					__cancel_buf.__cancel_jmp_buf, 0);    \
     if (__glibc_unlikely (__not_first_call))				      \
diff -urN a/sysdeps/unix/sysv/linux/hppa/pthread.h b/sysdeps/unix/sysv/linux/hppa/pthread.h
--- a/sysdeps/unix/sysv/linux/hppa/pthread.h	2017-08-02 07:57:16.000000000 -0500
+++ b/sysdeps/unix/sysv/linux/hppa/pthread.h	2017-11-14 10:50:03.305252400 -0500
@@ -641,8 +641,8 @@ 
 # define pthread_cleanup_push(routine, arg) \
   do {									      \
     __pthread_unwind_buf_t __cancel_buf;				      \
-    void (*__cancel_routine) (void *) = (routine);			      \
-    void *__cancel_arg = (arg);						      \
+    void (* volatile __cancel_routine) (void *) = (routine);		      \
+    void * volatile __cancel_arg = (arg);				      \
     int __not_first_call = __sigsetjmp ((struct __jmp_buf_tag *) (void *)     \
 					__cancel_buf.__cancel_jmp_buf, 0);    \
     if (__glibc_unlikely (__not_first_call))				      \
@@ -676,8 +676,8 @@ 
 #  define pthread_cleanup_push_defer_np(routine, arg) \
   do {									      \
     __pthread_unwind_buf_t __cancel_buf;				      \
-    void (*__cancel_routine) (void *) = (routine);			      \
-    void *__cancel_arg = (arg);						      \
+    void (* volatile __cancel_routine) (void *) = (routine);		      \
+    void * volatile __cancel_arg = (arg);				      \
     int __not_first_call = __sigsetjmp ((struct __jmp_buf_tag *) (void *)     \
 					__cancel_buf.__cancel_jmp_buf, 0);    \
     if (__glibc_unlikely (__not_first_call))				      \