Patchwork Fix 32-bit __atomic_*_16 problems (hopefully)

Submitter Joseph S. Myers
Date Nov. 8, 2013, 10:30 p.m.
Message ID <Pine.LNX.4.64.1311082227500.14552@digraph.polyomino.org.uk>
Permalink /patch/289933/
State New

Comments

Joseph S. Myers - Nov. 8, 2013, 10:30 p.m.
I hope this patch will fix the issues people have seen with atomics
tests failing on 32-bit architectures with missing __atomic_*_16 (or
at least replace them by different problems).  I'm running tests on
x86_64-unknown-linux-gnu; perhaps someone seeing the 32-bit problems
could test it there?

2013-11-08  Joseph Myers  <joseph@codesourcery.com>

	* c-common.c (atomic_size_supported_p): New function.
	(resolve_overloaded_atomic_exchange)
	(resolve_overloaded_atomic_compare_exchange)
	(resolve_overloaded_atomic_load, resolve_overloaded_atomic_store):
	Use it instead of comparing size with a local list of sizes.
Dominique Dhumieres - Nov. 9, 2013, 2:59 p.m.
> I hope this patch will fix the issues people have seen with atomics
> tests failing on 32-bit architectures with missing __atomic_*_16 (or
> at least replace them by different problems).  I'm running tests on
> x86_64-unknown-linux-gnu; perhaps someone seeing the 32-bit problems
> could test it there?

I have applied the patch on top of revision 204561 on x86_64-apple-darwin13.
The tests c11-atomic-exec-1.c to c11-atomic-exec-4.c pass after it.
However the test c11-atomic-exec-5.c is timed out for 10 sets of options
with -m32 and 8 sets with -m64 (-O2 and -O3 -fomit-frame-pointer pass).
This has been tested with

make -k check-gcc RUNTESTFLAGS="atomic.exp --target_board=unix'{-m32,-m64}'"

I then reverted the patch; the tests failed with -m32, and
c11-atomic-exec-5.c timed out with -O0 only.
This failure disappeared when testing with

make -k -j8 check-c RUNTESTFLAGS="--target_board=unix'{-m32,-m64}'"

without the patch. With the patch applied, only
gcc.dg/atomic/c11-atomic-exec-5.c -O2 -flto timed out.

When run manually (compiled with
gcc49 -std=c11 -pedantic-errors -pthread -D_POSIX_C_SOURCE=200809L /opt/gcc/work/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-5.c -L/opt/gcc/build_w/x86_64-apple-darwin13.0.0/./libatomic/.libs -latomic
or
gcc49 -std=c11 -pedantic-errors -pthread -D_POSIX_C_SOURCE=200809L /opt/gcc/work/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-5.c -L/opt/gcc/build_w/x86_64-apple-darwin13.0.0/i386/./libatomic/.libs -latomic -m32)
the test takes minutes.

The configuration is:

Using built-in specs.
COLLECT_GCC=gcc49
COLLECT_LTO_WRAPPER=/opt/gcc/gcc4.9w/libexec/gcc/x86_64-apple-darwin13.0.0/4.9.0/lto-wrapper
Target: x86_64-apple-darwin13.0.0
Configured with: ../work/configure --prefix=/opt/gcc/gcc4.9w --enable-languages=c,c++,fortran,objc,obj-c++,ada,java,lto --with-gmp=/opt/mp --with-system-zlib --with-isl=/opt/mp --enable-lto --enable-plugin --with-arch=corei7 --with-cpu=corei7
Thread model: posix
gcc version 4.9.0 20131108 (experimental) [trunk revision 204561p12] (GCC) 

Dominique
Dominique Dhumieres - Nov. 9, 2013, 3:47 p.m.
Typical output of a very long run:

[Book15] f90/bug% gcc49 -std=c11 -pedantic-errors -pthread -D_POSIX_C_SOURCE=200809L /opt/gcc/work/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-5.c -L/opt/gcc/build_w/x86_64-apple-darwin13.0.0/i386/./libatomic/.libs -latomic -m32
[Book15] f90/bug% time a.out
float_add_invalid (a) 7528 pass, 0 fail; (b) 2472 pass, 0 fail
float_add_invalid_prev (a) 3102 pass, 0 fail; (b) 6898 pass, 0 fail
float_add_overflow (a) 4862 pass, 0 fail; (b) 5138 pass, 0 fail
float_add_overflow_prev (a) 5145 pass, 0 fail; (b) 4855 pass, 0 fail
float_add_overflow_double (a) 2364 pass, 0 fail; (b) 7636 pass, 0 fail
float_add_overflow_long_double (a) 1272 pass, 0 fail; (b) 8728 pass, 0 fail
float_add_inexact (a) 4923 pass, 0 fail; (b) 5077 pass, 0 fail
float_add_inexact_int (a) 4368 pass, 0 fail; (b) 5632 pass, 0 fail
float_preinc_inexact (a) 4243 pass, 0 fail; (b) 5757 pass, 0 fail
float_postinc_inexact (a) 4180 pass, 0 fail; (b) 5820 pass, 0 fail
complex_float_add_overflow (a) 6033 pass, 0 fail; (b) 3967 pass, 0 fail
float_sub_invalid (a) 135 pass, 0 fail; (b) 9865 pass, 0 fail
float_sub_overflow (a) 1752 pass, 0 fail; (b) 8248 pass, 0 fail
float_sub_inexact (a) 1838 pass, 0 fail; (b) 8162 pass, 0 fail
float_sub_inexact_int (a) 3106 pass, 0 fail; (b) 6894 pass, 0 fail
float_predec_inexact (a) 5278 pass, 0 fail; (b) 4722 pass, 0 fail
float_postdec_inexact (a) 2567 pass, 0 fail; (b) 7433 pass, 0 fail
complex_float_sub_overflow (a) 5024 pass, 0 fail; (b) 4976 pass, 0 fail
float_mul_invalid (a) 492 pass, 0 fail; (b) 9508 pass, 0 fail
float_mul_overflow (a) 6745 pass, 0 fail; (b) 3255 pass, 0 fail
float_mul_underflow (a) 7017 pass, 0 fail; (b) 2983 pass, 0 fail
float_mul_inexact (a) 9986 pass, 0 fail; (b) 14 pass, 0 fail
float_mul_inexact_int (a) 5761 pass, 0 fail; (b) 4239 pass, 0 fail
complex_float_mul_overflow (a) 4633 pass, 0 fail; (b) 5367 pass, 0 fail
float_div_invalid_divbyzero (a) 296 pass, 0 fail; (b) 9704 pass, 0 fail
float_div_overflow (a) 7839 pass, 0 fail; (b) 2161 pass, 0 fail
float_div_underflow (a) 2370 pass, 0 fail; (b) 7630 pass, 0 fail
float_div_inexact (a) 2306 pass, 0 fail; (b) 7694 pass, 0 fail
float_div_inexact_int (a) 4119 pass, 0 fail; (b) 5881 pass, 0 fail
int_div_float_inexact (a) 2754 pass, 0 fail; (b) 7246 pass, 0 fail
complex_float_div_overflow (a) 7465 pass, 0 fail; (b) 2535 pass, 0 fail
double_add_invalid (a) 4871 pass, 0 fail; (b) 5129 pass, 0 fail
double_add_overflow (a) 5579 pass, 0 fail; (b) 4421 pass, 0 fail
double_add_overflow_long_double (a) 7823 pass, 0 fail; (b) 2177 pass, 0 fail
double_add_inexact (a) 6207 pass, 0 fail; (b) 3793 pass, 0 fail
double_add_inexact_int (a) 6279 pass, 0 fail; (b) 3721 pass, 0 fail
double_preinc_inexact (a) 4770 pass, 0 fail; (b) 5230 pass, 0 fail
double_postinc_inexact (a) 4422 pass, 0 fail; (b) 5578 pass, 0 fail
complex_double_add_overflow (a) 4939 pass, 0 fail; (b) 5061 pass, 0 fail
double_sub_invalid (a) 591 pass, 0 fail; (b) 9409 pass, 0 fail
double_sub_overflow (a) 7111 pass, 0 fail; (b) 2889 pass, 0 fail
double_sub_inexact (a) 3981 pass, 0 fail; (b) 6019 pass, 0 fail
double_sub_inexact_int (a) 4738 pass, 0 fail; (b) 5262 pass, 0 fail
double_predec_inexact (a) 4998 pass, 0 fail; (b) 5002 pass, 0 fail
double_postdec_inexact (a) 5141 pass, 0 fail; (b) 4859 pass, 0 fail
complex_double_sub_overflow (a) 4980 pass, 0 fail; (b) 5020 pass, 0 fail
double_mul_invalid (a) 2560 pass, 0 fail; (b) 7440 pass, 0 fail
double_mul_overflow (a) 9950 pass, 0 fail; (b) 50 pass, 0 fail
double_mul_overflow_float (a) 9978 pass, 0 fail; (b) 22 pass, 0 fail
double_mul_underflow (a) 6028 pass, 0 fail; (b) 3972 pass, 0 fail
double_mul_inexact (a) 4822 pass, 0 fail; (b) 5178 pass, 0 fail
double_mul_inexact_int (a) 4468 pass, 0 fail; (b) 5532 pass, 0 fail
complex_double_mul_overflow (a) 4912 pass, 0 fail; (b) 5088 pass, 0 fail
double_div_invalid_divbyzero (a) 127 pass, 0 fail; (b) 9873 pass, 0 fail
double_div_overflow (a) 9982 pass, 0 fail; (b) 18 pass, 0 fail
double_div_underflow (a) 8294 pass, 0 fail; (b) 1706 pass, 0 fail
double_div_inexact (a) 6491 pass, 0 fail; (b) 3509 pass, 0 fail
double_div_inexact_int (a) 6350 pass, 0 fail; (b) 3650 pass, 0 fail
int_div_double_inexact (a) 1115 pass, 0 fail; (b) 8885 pass, 0 fail
complex_double_div_overflow (a) 4934 pass, 0 fail; (b) 5066 pass, 0 fail
long_double_add_invalid (a) 4911 pass, 0 fail; (b) 5089 pass, 0 fail
long_double_add_overflow (a) 4962 pass, 0 fail; (b) 5038 pass, 0 fail
long_double_add_inexact (a) 5014 pass, 0 fail; (b) 4986 pass, 0 fail
long_double_add_inexact_int (a) 4970 pass, 0 fail; (b) 5030 pass, 0 fail
long_double_preinc_inexact (a) 4981 pass, 0 fail; (b) 5019 pass, 0 fail
long_double_postinc_inexact (a) 4992 pass, 0 fail; (b) 5008 pass, 0 fail
complex_long_double_add_overflow (a) 5049 pass, 0 fail; (b) 4951 pass, 0 fail
long_double_sub_invalid (a) 4956 pass, 0 fail; (b) 5044 pass, 0 fail
long_double_sub_overflow (a) 4776 pass, 0 fail; (b) 5224 pass, 0 fail
long_double_sub_inexact (a) 4957 pass, 0 fail; (b) 5043 pass, 0 fail
long_double_sub_inexact_int (a) 4991 pass, 0 fail; (b) 5009 pass, 0 fail
long_double_predec_inexact (a) 5019 pass, 0 fail; (b) 4981 pass, 0 fail
long_double_postdec_inexact (a) 4973 pass, 0 fail; (b) 5027 pass, 0 fail
complex_long_double_sub_overflow (a) 4997 pass, 0 fail; (b) 5003 pass, 0 fail
long_double_mul_invalid (a) 4970 pass, 0 fail; (b) 5030 pass, 0 fail
long_double_mul_overflow (a) 4965 pass, 0 fail; (b) 5035 pass, 0 fail
long_double_mul_overflow_float (a) 4964 pass, 0 fail; (b) 5036 pass, 0 fail
long_double_mul_overflow_double (a) 4962 pass, 0 fail; (b) 5038 pass, 0 fail
long_double_mul_underflow (a) 4963 pass, 0 fail; (b) 5037 pass, 0 fail
long_double_mul_inexact (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
long_double_mul_inexact_int (a) 4987 pass, 0 fail; (b) 5013 pass, 0 fail
complex_long_double_mul_overflow (a) 5014 pass, 0 fail; (b) 4986 pass, 0 fail
long_double_div_invalid_divbyzero (a) 4967 pass, 0 fail; (b) 5033 pass, 0 fail
long_double_div_overflow (a) 5000 pass, 0 fail; (b) 5000 pass, 0 fail
long_double_div_underflow (a) 4905 pass, 0 fail; (b) 5095 pass, 0 fail
long_double_div_inexact (a) 5020 pass, 0 fail; (b) 4980 pass, 0 fail
long_double_div_inexact_int (a) 4976 pass, 0 fail; (b) 5024 pass, 0 fail
int_div_long_double_inexact (a) 845 pass, 0 fail; (b) 9155 pass, 0 fail
complex_long_double_div_overflow (a) 5035 pass, 0 fail; (b) 4965 pass, 0 fail
653.695u 3015.500s 52:04.42 117.4%	0+0k 0+0io 0pf+0w

Dominique
Joseph S. Myers - Nov. 9, 2013, 4:58 p.m.
On Sat, 9 Nov 2013, Dominique Dhumieres wrote:

> > I hope this patch will fix the issues people have seen with atomics
> > tests failing on 32-bit architectures with missing __atomic_*_16 (or
> > at least replace them by different problems).  I'm running tests on
> > x86_64-unknown-linux-gnu; perhaps someone seeing the 32-bit problems
> > could test it there?
> 
> I have applied the patch on top of revision 204561 on x86_64-apple-darwin13.
> The tests c11-atomic-exec-1.c to c11-atomic-exec-4.c pass after it.

I've committed the patch, as it seems to make progress on allowing people 
to test this feature for 32-bit systems and so add their 
TARGET_ATOMIC_ASSIGN_EXPAND_FENV implementations and fix any other 
architecture-specific issues with atomic operations that the C11 atomics 
support shows up.

> However the test c11-atomic-exec-5.c is timed out for 10 sets of options
> with -m32 and 8 sets with -m64 (-O2 and -O3 -fomit-frame-pointer pass).
> This has been tested with

I think this suggests that the locking primitives used in libatomic are 
much slower on Darwin than on Linux - the total run of atomic.exp on 
x86_64-unknown-linux-gnu takes maybe a couple of minutes for me (and a 
large part of that is compiling the tests, some of which are pretty large 
after preprocessing, rather than executing them).

It may be necessary to make the iteration count depend on the target - 
there are various tests with such a target dependency already in the 
testsuite - but it would also be worth investigating why the tests are so 
slow on Darwin and whether e.g. some faster primitives are available that 
could be used in libatomic to make atomic operations faster (given that 
the slowness is a general deficiency for users of atomic operations on 
Darwin).  10000 was intended to be a reasonable balance with enough 
iterations that races are likely to be detected, while not being too slow 
on slow systems.  Everything in c11-atomic-exec-5.c should just work with 
a smaller iteration count, but it would be less effective at detecting 
races.  (Making the iteration count in c11-atomic-exec-4.c smaller would 
be more complicated because of the expected results that depend on that 
count.)
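
To illustrate the idea of a target-dependent iteration count, here is a minimal sketch. It is not taken from c11-atomic-exec-5.c: the macro names, the reduced value of 1000 and the helper functions are all assumptions made for illustration, and the testsuite may well prefer to select the count through its own target mechanisms rather than a preprocessor test.

```c
#include <assert.h>
#include <stddef.h>

/* 10000 is the count discussed in the thread; the reduced value is an
   arbitrary placeholder, not a measured or proposed figure.  */
#define ITER_COUNT_DEFAULT 10000
#define ITER_COUNT_SLOW_TARGET 1000

/* __APPLE__ is predefined by compilers targeting Darwin.  */
#if defined (__APPLE__)
# define ITER_COUNT ITER_COUNT_SLOW_TARGET
#else
# define ITER_COUNT ITER_COUNT_DEFAULT
#endif

/* Hypothetical stand-in for one race-detection trial: returns nonzero
   on pass.  */
int
trial_always_passes (void)
{
  return 1;
}

/* Run TRIAL ITER_COUNT times and return how many trials passed.  */
int
count_passes (int (*trial) (void))
{
  int passes = 0;

  assert (trial != NULL);
  for (int i = 0; i < ITER_COUNT; i++)
    if (trial ())
      passes++;
  return passes;
}
```

A smaller count shortens the run roughly in proportion, at the cost noted above: fewer iterations give races fewer chances to show up.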

Patch

Index: c-family/c-common.c
===================================================================
--- c-family/c-common.c	(revision 204602)
+++ c-family/c-common.c	(working copy)
@@ -10403,6 +10403,27 @@  add_atomic_size_parameter (unsigned n, location_t
 }
 
 
+/* Return whether atomic operations for naturally aligned N-byte
+   arguments are supported, whether inline or through libatomic.  */
+static bool
+atomic_size_supported_p (int n)
+{
+  switch (n)
+    {
+    case 1:
+    case 2:
+    case 4:
+    case 8:
+      return true;
+
+    case 16:
+      return targetm.scalar_mode_supported_p (TImode);
+
+    default:
+      return false;
+    }
+}
+
 /* This will process an __atomic_exchange function call, determine whether it
    needs to be mapped to the _N variation, or turned into a library call.
    LOC is the location of the builtin call.
@@ -10428,7 +10449,7 @@  resolve_overloaded_atomic_exchange (location_t loc
     }
 
   /* If not a lock-free size, change to the library generic format.  */
-  if (n != 1 && n != 2 && n != 4 && n != 8 && n != 16)
+  if (!atomic_size_supported_p (n))
     {
       *new_return = add_atomic_size_parameter (n, loc, function, params);
       return true;
@@ -10493,7 +10514,7 @@  resolve_overloaded_atomic_compare_exchange (locati
     }
 
   /* If not a lock-free size, change to the library generic format.  */
-  if (n != 1 && n != 2 && n != 4 && n != 8 && n != 16)
+  if (!atomic_size_supported_p (n))
     {
       /* The library generic format does not have the weak parameter, so 
 	 remove it from the param list.  Since a parameter has been removed,
@@ -10569,7 +10590,7 @@  resolve_overloaded_atomic_load (location_t loc, tr
     }
 
   /* If not a lock-free size, change to the library generic format.  */
-  if (n != 1 && n != 2 && n != 4 && n != 8 && n != 16)
+  if (!atomic_size_supported_p (n))
     {
       *new_return = add_atomic_size_parameter (n, loc, function, params);
       return true;
@@ -10629,7 +10650,7 @@  resolve_overloaded_atomic_store (location_t loc, t
     }
 
   /* If not a lock-free size, change to the library generic format.  */
-  if (n != 1 && n != 2 && n != 4 && n != 8 && n != 16)
+  if (!atomic_size_supported_p (n))
     {
       *new_return = add_atomic_size_parameter (n, loc, function, params);
       return true;
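
The effect of the change can be modelled outside the compiler with a small standalone sketch. This is not GCC code: the global flag stands in for targetm.scalar_mode_supported_p (TImode), and the model_* names and the call_kind enum are hypothetical, introduced only to show which sizes keep the sized _N form and which fall back to the library generic format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for targetm.scalar_mode_supported_p (TImode); assumed false
   on a typical 32-bit target, true where 16-byte scalars exist.  */
bool model_target_has_timode = false;

/* Mirrors the switch in the patch: 1, 2, 4 and 8 are always accepted,
   16 only when the target supports a 16-byte scalar mode.  */
bool
model_atomic_size_supported_p (size_t n)
{
  switch (n)
    {
    case 1:
    case 2:
    case 4:
    case 8:
      return true;

    case 16:
      return model_target_has_timode;

    default:
      return false;
    }
}

/* Hypothetical labels for the two outcomes of overload resolution.  */
enum call_kind { CALL_SIZED, CALL_GENERIC };

/* Pick the sized _N variant when the size is supported, otherwise the
   generic library call that takes the size as a parameter.  */
enum call_kind
model_resolve (size_t n)
{
  assert (n > 0);
  return model_atomic_size_supported_p (n) ? CALL_SIZED : CALL_GENERIC;
}
```

With the flag false, a 16-byte operation resolves to CALL_GENERIC, so no reference to a sized 16-byte routine is produced; with the old hard-coded list it would always have resolved to the sized form.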