Fix x86_64 va_arg (ap, __int128) handling (PR target/92904)

Message ID 20191212235039.GN10088@tucnak

Commit Message

Jakub Jelinek Dec. 12, 2019, 11:50 p.m. UTC
Hi!

As the following testcase shows, for va_arg (ap, __int128) or va_arg (ap, X),
where X is a 16-byte __attribute__((aligned (16))) struct containing just
integral fields, we sometimes emit an incorrect read.  While on the stack (i.e.
in the overflow area) both of these are 16-byte aligned, when an __int128 or a
16-byte aligned struct such as { long a, b; } is passed in registers,
there is no requirement that it go into an even-numbered register pair or
anything similar (correctly so, that is what the ABI says), but in the end
this means that when we spill those GPRs into the gpr save area, the __int128
or aligned struct might be only 8-byte aligned there rather than 16-byte
aligned.
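
To see why, here is a rough sketch of the psABI va_list layout (illustration
only, not part of the patch; field names follow the ABI):

  typedef struct {
    unsigned int gp_offset;    /* offset into reg_save_area of the next GPR
                                  argument; 48 when all six slots are used */
    unsigned int fp_offset;    /* likewise for XMM arguments (16-byte slots) */
    void *overflow_arg_area;   /* arguments passed on the stack */
    void *reg_save_area;       /* %rdi, %rsi, %rdx, %rcx, %r8, %r9 spilled as
                                  six consecutive 8-byte slots */
  } va_list_sketch[1];

An __int128 passed in e.g. %rsi:%rdx is spilled at reg_save_area + 8, which is
only 8-byte aligned even when reg_save_area itself is 16-byte aligned.
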
In the testcase, for e.g. f1 that isn't a problem: we just load the two
halves separately as 64-bit loads.  But in f4 or f6, as it is a copy from
memory to memory, we actually optimize it into an aligned SSE load into an
SSE register and an aligned SSE store from there back to memory (the store
is fine, its destination really is 16-byte aligned).
The current va_arg expansion just computes, in two different branches, the
address from which the __int128 etc. should be loaded; the load itself is
then done in the joiner bb.
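
In pseudo-C the expansion for va_arg (ap, __int128) looks roughly like this
(a simplified sketch, not the actual gimple):

  if (ap->gp_offset <= 48 - 16)                  /* both halves fit in GPRs */
    {
      addr = ap->reg_save_area + ap->gp_offset;  /* only 8-byte aligned */
      ap->gp_offset += 16;
    }
  else                                           /* read from the stack */
    {
      /* overflow_arg_area is first rounded up to a 16-byte boundary here.  */
      addr = ap->overflow_arg_area;              /* 16-byte aligned */
      ap->overflow_arg_area += 16;
    }
  r = *(__int128 *) addr;                        /* one load in the joiner bb */
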
So, we could either move the loads from the joiner bb into its two
predecessors, letting the one reading from the gpr save area use an 8-byte
aligned load and the one reading from the overflow area a 16-byte aligned
one, but I'd say that would result in an unnecessarily longer sequence; or
we can, as the patch does, just lower the alignment of the MEM_REF in the
joiner bb, which in the cases where we used vmovdqa etc. will just use
vmovdqu instead, will not grow the code size, and on contemporary CPUs
shouldn't really be slower when the address actually is aligned.  In most
cases the __int128 is used directly anyway, and then it will likely be
loaded into two GPRs.

The patch does this only for the needed_intregs case with !need_temp.  When
need_temp, this kind of problem doesn't exist: the value is copied into the
aligned temporary either using memcpy or just 64-bit loads + stores.  And
for needed_sseregs the slots in the fp save area are already 16-byte
aligned, while 32-byte aligned structures would have 32-byte size and thus
would be passed in memory for ... args, no matter whether they contain
__m256 or __m512 or something else.
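
As a sanity check, struct V from the testcase is exactly such a case; a short
illustration (not part of the patch):

  struct __attribute__((aligned (32))) V { double a, b, c, d; };
  /* sizeof is padded to a multiple of the alignment, hence > 16 bytes, so
     for ... arguments the value is read from the properly aligned overflow
     area, never from the fp save area.  */
  _Static_assert (sizeof (struct V) % 32 == 0 && sizeof (struct V) > 16,
                  "aligned (32) implies size > 16");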

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk and,
after a while, release branches?

2019-12-12  Jakub Jelinek  <jakub@redhat.com>

	PR target/92904
	* config/i386/i386.c (ix86_gimplify_va_arg): If need_intregs and
	not need_temp, decrease alignment of the read because the GPR save
	area only guarantees 8-byte alignment.

	* gcc.c-torture/execute/pr92904.c: New test.


	Jakub

Comments

Jan Hubicka Dec. 12, 2019, 11:55 p.m. UTC | #1
> Hi!
> 
> As the following testcase shows, for va_arg (ap, __int128) or va_arg (ap, X),
> where X is a 16-byte __attribute__((aligned (16))) struct containing just
> integral fields, we sometimes emit an incorrect read.  While on the stack (i.e.
> in the overflow area) both of these are 16-byte aligned, when an __int128 or a
> 16-byte aligned struct such as { long a, b; } is passed in registers,
> there is no requirement that it go into an even-numbered register pair or
> anything similar (correctly so, that is what the ABI says), but in the end
> this means that when we spill those GPRs into the gpr save area, the __int128
> or aligned struct might be only 8-byte aligned there rather than 16-byte
> aligned.
> In the testcase, for e.g. f1 that isn't a problem: we just load the two
> halves separately as 64-bit loads.  But in f4 or f6, as it is a copy from
> memory to memory, we actually optimize it into an aligned SSE load into an
> SSE register and an aligned SSE store from there back to memory (the store
> is fine, its destination really is 16-byte aligned).
> The current va_arg expansion just computes, in two different branches, the
> address from which the __int128 etc. should be loaded; the load itself is
> then done in the joiner bb.
> So, we could either move the loads from the joiner bb into its two
> predecessors, letting the one reading from the gpr save area use an 8-byte
> aligned load and the one reading from the overflow area a 16-byte aligned
> one, but I'd say that would result in an unnecessarily longer sequence; or
> we can, as the patch does, just lower the alignment of the MEM_REF in the
> joiner bb, which in the cases where we used vmovdqa etc. will just use
> vmovdqu instead, will not grow the code size, and on contemporary CPUs
> shouldn't really be slower when the address actually is aligned.  In most
> cases the __int128 is used directly anyway, and then it will likely be
> loaded into two GPRs.

Agreed, movdqu is usually about as cheap as movdqa ;)
> 
> The patch does this only for the needed_intregs case with !need_temp.  When
> need_temp, this kind of problem doesn't exist: the value is copied into the
> aligned temporary either using memcpy or just 64-bit loads + stores.  And
> for needed_sseregs the slots in the fp save area are already 16-byte
> aligned, while 32-byte aligned structures would have 32-byte size and thus
> would be passed in memory for ... args, no matter whether they contain
> __m256 or __m512 or something else.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk and,
> after a while, release branches?
> 
> 2019-12-12  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR target/92904
> 	* config/i386/i386.c (ix86_gimplify_va_arg): If need_intregs and
> 	not need_temp, decrease alignment of the read because the GPR save
> 	area only guarantees 8-byte alignment.
> 
> 	* gcc.c-torture/execute/pr92904.c: New test.
OK,
thanks!
honza
> [patch and testcase snipped; see the Patch section below]

Patch

--- gcc/config/i386/i386.c.jj	2019-12-10 10:01:08.384171578 +0100
+++ gcc/config/i386/i386.c	2019-12-12 13:57:38.584461843 +0100
@@ -4277,6 +4277,7 @@ ix86_gimplify_va_arg (tree valist, tree
   tree ptrtype;
   machine_mode nat_mode;
   unsigned int arg_boundary;
+  unsigned int type_align;
 
   /* Only 64bit target needs something special.  */
   if (is_va_list_char_pointer (TREE_TYPE (valist)))
@@ -4334,6 +4335,7 @@ ix86_gimplify_va_arg (tree valist, tree
   /* Pull the value out of the saved registers.  */
 
   addr = create_tmp_var (ptr_type_node, "addr");
+  type_align = TYPE_ALIGN (type);
 
   if (container)
     {
@@ -4504,6 +4506,9 @@ ix86_gimplify_va_arg (tree valist, tree
 	  t = build2 (PLUS_EXPR, TREE_TYPE (gpr), gpr,
 		      build_int_cst (TREE_TYPE (gpr), needed_intregs * 8));
 	  gimplify_assign (gpr, t, pre_p);
+	  /* The GPR save area guarantees only 8-byte alignment.  */
+	  if (!need_temp)
+	    type_align = MIN (type_align, 64);
 	}
 
       if (needed_sseregs)
@@ -4548,6 +4556,7 @@ ix86_gimplify_va_arg (tree valist, tree
   if (container)
     gimple_seq_add_stmt (pre_p, gimple_build_label (lab_over));
 
+  type = build_aligned_type (type, type_align);
   ptrtype = build_pointer_type_for_mode (type, ptr_mode, true);
   addr = fold_convert (ptrtype, addr);
 
--- gcc/testsuite/gcc.c-torture/execute/pr92904.c.jj	2019-12-12 15:04:59.203302591 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr92904.c	2019-12-12 15:08:05.190449143 +0100
@@ -0,0 +1,395 @@
+/* PR target/92904 */
+
+#include <stdarg.h>
+
+struct S { long long a, b; };
+struct __attribute__((aligned (16))) T { long long a, b; };
+struct U { double a, b, c, d; };
+struct __attribute__((aligned (32))) V { double a, b, c, d; };
+struct W { double a; long long b; };
+struct __attribute__((aligned (16))) X { double a; long long b; };
+#if __SIZEOF_INT128__ == 2 * __SIZEOF_LONG_LONG__
+__int128 b;
+#endif
+struct S c;
+struct T d;
+struct U e;
+struct V f;
+struct W g;
+struct X h;
+
+#if __SIZEOF_INT128__ == 2 * __SIZEOF_LONG_LONG__
+__attribute__((noipa)) __int128
+f1 (int x, ...)
+{
+  __int128 r;
+  va_list ap;
+  va_start (ap, x);
+  while (x--)
+    va_arg (ap, int);
+  r = va_arg (ap, __int128);
+  va_end (ap);
+  return r;
+}
+#endif
+
+__attribute__((noipa)) struct S
+f2 (int x, ...)
+{
+  struct S r;
+  va_list ap;
+  va_start (ap, x);
+  while (x--)
+    va_arg (ap, int);
+  r = va_arg (ap, struct S);
+  va_end (ap);
+  return r;
+}
+
+__attribute__((noipa)) struct T
+f3 (int x, ...)
+{
+  struct T r;
+  va_list ap;
+  va_start (ap, x);
+  while (x--)
+    va_arg (ap, int);
+  r = va_arg (ap, struct T);
+  va_end (ap);
+  return r;
+}
+
+#if __SIZEOF_INT128__ == 2 * __SIZEOF_LONG_LONG__
+__attribute__((noipa)) void
+f4 (int x, ...)
+{
+  va_list ap;
+  va_start (ap, x);
+  while (x--)
+    va_arg (ap, int);
+  b = va_arg (ap, __int128);
+  va_end (ap);
+}
+#endif
+
+__attribute__((noipa)) void
+f5 (int x, ...)
+{
+  va_list ap;
+  va_start (ap, x);
+  while (x--)
+    va_arg (ap, int);
+  c = va_arg (ap, struct S);
+  va_end (ap);
+}
+
+__attribute__((noipa)) void
+f6 (int x, ...)
+{
+  va_list ap;
+  va_start (ap, x);
+  while (x--)
+    va_arg (ap, int);
+  d = va_arg (ap, struct T);
+  va_end (ap);
+}
+
+__attribute__((noipa)) struct U
+f7 (int x, ...)
+{
+  struct U r;
+  va_list ap;
+  va_start (ap, x);
+  while (x--)
+    va_arg (ap, double);
+  r = va_arg (ap, struct U);
+  va_end (ap);
+  return r;
+}
+
+__attribute__((noipa)) struct V
+f8 (int x, ...)
+{
+  struct V r;
+  va_list ap;
+  va_start (ap, x);
+  while (x--)
+    va_arg (ap, double);
+  r = va_arg (ap, struct V);
+  va_end (ap);
+  return r;
+}
+
+__attribute__((noipa)) void
+f9 (int x, ...)
+{
+  va_list ap;
+  va_start (ap, x);
+  while (x--)
+    va_arg (ap, double);
+  e = va_arg (ap, struct U);
+  va_end (ap);
+}
+
+__attribute__((noipa)) void
+f10 (int x, ...)
+{
+  va_list ap;
+  va_start (ap, x);
+  while (x--)
+    va_arg (ap, double);
+  f = va_arg (ap, struct V);
+  va_end (ap);
+}
+
+__attribute__((noipa)) struct W
+f11 (int x, ...)
+{
+  struct W r;
+  va_list ap;
+  va_start (ap, x);
+  while (x--)
+    {
+      va_arg (ap, int);
+      va_arg (ap, double);
+    }
+  r = va_arg (ap, struct W);
+  va_end (ap);
+  return r;
+}
+
+__attribute__((noipa)) struct X
+f12 (int x, ...)
+{
+  struct X r;
+  va_list ap;
+  va_start (ap, x);
+  while (x--)
+    {
+      va_arg (ap, int);
+      va_arg (ap, double);
+    }
+  r = va_arg (ap, struct X);
+  va_end (ap);
+  return r;
+}
+
+__attribute__((noipa)) void
+f13 (int x, ...)
+{
+  va_list ap;
+  va_start (ap, x);
+  while (x--)
+    {
+      va_arg (ap, int);
+      va_arg (ap, double);
+    }
+  g = va_arg (ap, struct W);
+  va_end (ap);
+}
+
+__attribute__((noipa)) void
+f14 (int x, ...)
+{
+  va_list ap;
+  va_start (ap, x);
+  while (x--)
+    {
+      va_arg (ap, int);
+      va_arg (ap, double);
+    }
+  h = va_arg (ap, struct X);
+  va_end (ap);
+}
+
+int
+main ()
+{
+  union Y {
+#if __SIZEOF_INT128__ == 2 * __SIZEOF_LONG_LONG__
+    __int128 b;
+#endif
+    struct S c;
+    struct T d;
+    struct U e;
+    struct V f;
+    struct W g;
+    struct X h;
+  } u, v;
+  u.c.a = 0x5555555555555555ULL;
+  u.c.b = 0xaaaaaaaaaaaaaaaaULL;
+#define C(x) \
+  do {								\
+    if (u.c.a != x.c.a || u.c.b != x.c.b) __builtin_abort ();	\
+    u.c.a++;							\
+    u.c.b--;							\
+  } while (0)
+#if __SIZEOF_INT128__ == 2 * __SIZEOF_LONG_LONG__
+  v.b = f1 (0, u.b); C (v);
+  v.b = f1 (1, 0, u.b); C (v);
+  v.b = f1 (2, 0, 0, u.b); C (v);
+  v.b = f1 (3, 0, 0, 0, u.b); C (v);
+  v.b = f1 (4, 0, 0, 0, 0, u.b); C (v);
+  v.b = f1 (5, 0, 0, 0, 0, 0, u.b); C (v);
+  v.b = f1 (6, 0, 0, 0, 0, 0, 0, u.b); C (v);
+  v.b = f1 (7, 0, 0, 0, 0, 0, 0, 0, u.b); C (v);
+  v.b = f1 (8, 0, 0, 0, 0, 0, 0, 0, 0, u.b); C (v);
+  v.b = f1 (9, 0, 0, 0, 0, 0, 0, 0, 0, 0, u.b); C (v);
+#endif
+  v.c = f2 (0, u.c); C (v);
+  v.c = f2 (1, 0, u.c); C (v);
+  v.c = f2 (2, 0, 0, u.c); C (v);
+  v.c = f2 (3, 0, 0, 0, u.c); C (v);
+  v.c = f2 (4, 0, 0, 0, 0, u.c); C (v);
+  v.c = f2 (5, 0, 0, 0, 0, 0, u.c); C (v);
+  v.c = f2 (6, 0, 0, 0, 0, 0, 0, u.c); C (v);
+  v.c = f2 (7, 0, 0, 0, 0, 0, 0, 0, u.c); C (v);
+  v.c = f2 (8, 0, 0, 0, 0, 0, 0, 0, 0, u.c); C (v);
+  v.c = f2 (9, 0, 0, 0, 0, 0, 0, 0, 0, 0, u.c); C (v);
+  v.d = f3 (0, u.d); C (v);
+  v.d = f3 (1, 0, u.d); C (v);
+  v.d = f3 (2, 0, 0, u.d); C (v);
+  v.d = f3 (3, 0, 0, 0, u.d); C (v);
+  v.d = f3 (4, 0, 0, 0, 0, u.d); C (v);
+  v.d = f3 (5, 0, 0, 0, 0, 0, u.d); C (v);
+  v.d = f3 (6, 0, 0, 0, 0, 0, 0, u.d); C (v);
+  v.d = f3 (7, 0, 0, 0, 0, 0, 0, 0, u.d); C (v);
+  v.d = f3 (8, 0, 0, 0, 0, 0, 0, 0, 0, u.d); C (v);
+  v.d = f3 (9, 0, 0, 0, 0, 0, 0, 0, 0, 0, u.d); C (v);
+#if __SIZEOF_INT128__ == 2 * __SIZEOF_LONG_LONG__
+  f4 (0, u.b); v.b = b; C (v);
+  f4 (1, 0, u.b); v.b = b; C (v);
+  f4 (2, 0, 0, u.b); v.b = b; C (v);
+  f4 (3, 0, 0, 0, u.b); v.b = b; C (v);
+  f4 (4, 0, 0, 0, 0, u.b); v.b = b; C (v);
+  f4 (5, 0, 0, 0, 0, 0, u.b); v.b = b; C (v);
+  f4 (6, 0, 0, 0, 0, 0, 0, u.b); v.b = b; C (v);
+  f4 (7, 0, 0, 0, 0, 0, 0, 0, u.b); v.b = b; C (v);
+  f4 (8, 0, 0, 0, 0, 0, 0, 0, 0, u.b); v.b = b; C (v);
+  f4 (9, 0, 0, 0, 0, 0, 0, 0, 0, 0, u.b); v.b = b; C (v);
+#endif
+  f5 (0, u.c); v.c = c; C (v);
+  f5 (1, 0, u.c); v.c = c; C (v);
+  f5 (2, 0, 0, u.c); v.c = c; C (v);
+  f5 (3, 0, 0, 0, u.c); v.c = c; C (v);
+  f5 (4, 0, 0, 0, 0, u.c); v.c = c; C (v);
+  f5 (5, 0, 0, 0, 0, 0, u.c); v.c = c; C (v);
+  f5 (6, 0, 0, 0, 0, 0, 0, u.c); v.c = c; C (v);
+  f5 (7, 0, 0, 0, 0, 0, 0, 0, u.c); v.c = c; C (v);
+  f5 (8, 0, 0, 0, 0, 0, 0, 0, 0, u.c); v.c = c; C (v);
+  f5 (9, 0, 0, 0, 0, 0, 0, 0, 0, 0, u.c); v.c = c; C (v);
+  f6 (0, u.d); v.d = d; C (v);
+  f6 (1, 0, u.d); v.d = d; C (v);
+  f6 (2, 0, 0, u.d); v.d = d; C (v);
+  f6 (3, 0, 0, 0, u.d); v.d = d; C (v);
+  f6 (4, 0, 0, 0, 0, u.d); v.d = d; C (v);
+  f6 (5, 0, 0, 0, 0, 0, u.d); v.d = d; C (v);
+  f6 (6, 0, 0, 0, 0, 0, 0, u.d); v.d = d; C (v);
+  f6 (7, 0, 0, 0, 0, 0, 0, 0, u.d); v.d = d; C (v);
+  f6 (8, 0, 0, 0, 0, 0, 0, 0, 0, u.d); v.d = d; C (v);
+  f6 (9, 0, 0, 0, 0, 0, 0, 0, 0, 0, u.d); v.d = d; C (v);
+  u.e.a = 1.25;
+  u.e.b = 2.75;
+  u.e.c = -3.5;
+  u.e.d = -2.0;
+#undef C
+#define C(x) \
+  do {								\
+    if (u.e.a != x.e.a || u.e.b != x.e.b			\
+	|| u.e.c != x.e.c || u.e.d != x.e.d) __builtin_abort ();\
+    u.e.a++;							\
+    u.e.b--;							\
+    u.e.c++;							\
+    u.e.d--;							\
+  } while (0)
+  v.e = f7 (0, u.e); C (v);
+  v.e = f7 (1, 0.0, u.e); C (v);
+  v.e = f7 (2, 0.0, 0.0, u.e); C (v);
+  v.e = f7 (3, 0.0, 0.0, 0.0, u.e); C (v);
+  v.e = f7 (4, 0.0, 0.0, 0.0, 0.0, u.e); C (v);
+  v.e = f7 (5, 0.0, 0.0, 0.0, 0.0, 0.0, u.e); C (v);
+  v.e = f7 (6, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, u.e); C (v);
+  v.e = f7 (7, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, u.e); C (v);
+  v.e = f7 (8, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, u.e); C (v);
+  v.e = f7 (9, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, u.e); C (v);
+  v.f = f8 (0, u.f); C (v);
+  v.f = f8 (1, 0.0, u.f); C (v);
+  v.f = f8 (2, 0.0, 0.0, u.f); C (v);
+  v.f = f8 (3, 0.0, 0.0, 0.0, u.f); C (v);
+  v.f = f8 (4, 0.0, 0.0, 0.0, 0.0, u.f); C (v);
+  v.f = f8 (5, 0.0, 0.0, 0.0, 0.0, 0.0, u.f); C (v);
+  v.f = f8 (6, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, u.f); C (v);
+  v.f = f8 (7, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, u.f); C (v);
+  v.f = f8 (8, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, u.f); C (v);
+  v.f = f8 (9, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, u.f); C (v);
+  f9 (0, u.e); v.e = e; C (v);
+  f9 (1, 0.0, u.e); v.e = e; C (v);
+  f9 (2, 0.0, 0.0, u.e); v.e = e; C (v);
+  f9 (3, 0.0, 0.0, 0.0, u.e); v.e = e; C (v);
+  f9 (4, 0.0, 0.0, 0.0, 0.0, u.e); v.e = e; C (v);
+  f9 (5, 0.0, 0.0, 0.0, 0.0, 0.0, u.e); v.e = e; C (v);
+  f9 (6, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, u.e); v.e = e; C (v);
+  f9 (7, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, u.e); v.e = e; C (v);
+  f9 (8, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, u.e); v.e = e; C (v);
+  f9 (9, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, u.e); v.e = e; C (v);
+  f10 (0, u.f); v.f = f; C (v);
+  f10 (1, 0.0, u.f); v.f = f; C (v);
+  f10 (2, 0.0, 0.0, u.f); v.f = f; C (v);
+  f10 (3, 0.0, 0.0, 0.0, u.f); v.f = f; C (v);
+  f10 (4, 0.0, 0.0, 0.0, 0.0, u.f); v.f = f; C (v);
+  f10 (5, 0.0, 0.0, 0.0, 0.0, 0.0, u.f); v.f = f; C (v);
+  f10 (6, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, u.f); v.f = f; C (v);
+  f10 (7, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, u.f); v.f = f; C (v);
+  f10 (8, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, u.f); v.f = f; C (v);
+  f10 (9, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, u.f); v.f = f; C (v);
+  u.g.a = 9.5;
+  u.g.b = 0x5555555555555555ULL;
+#undef C
+#define C(x) \
+  do {								\
+    if (u.e.a != x.e.a || u.e.b != x.e.b) __builtin_abort ();	\
+    u.e.a++;							\
+    u.e.b--;							\
+  } while (0)
+  v.g = f11 (0, u.g); C (v);
+  v.g = f11 (1, 0, 0.0, u.g); C (v);
+  v.g = f11 (2, 0, 0.0, 0, 0.0, u.g); C (v);
+  v.g = f11 (3, 0, 0.0, 0, 0.0, 0, 0.0, u.g); C (v);
+  v.g = f11 (4, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, u.g); C (v);
+  v.g = f11 (5, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, u.g); C (v);
+  v.g = f11 (6, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, u.g); C (v);
+  v.g = f11 (7, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, u.g); C (v);
+  v.g = f11 (8, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, u.g); C (v);
+  v.g = f11 (9, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, u.g); C (v);
+  v.h = f12 (0, u.h); C (v);
+  v.h = f12 (1, 0, 0.0, u.h); C (v);
+  v.h = f12 (2, 0, 0.0, 0, 0.0, u.h); C (v);
+  v.h = f12 (3, 0, 0.0, 0, 0.0, 0, 0.0, u.h); C (v);
+  v.h = f12 (4, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, u.h); C (v);
+  v.h = f12 (5, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, u.h); C (v);
+  v.h = f12 (6, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, u.h); C (v);
+  v.h = f12 (7, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, u.h); C (v);
+  v.h = f12 (8, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, u.h); C (v);
+  v.h = f12 (9, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, u.h); C (v);
+  f13 (0, u.g); v.g = g; C (v);
+  f13 (1, 0, 0.0, u.g); v.g = g; C (v);
+  f13 (2, 0, 0.0, 0, 0.0, u.g); v.g = g; C (v);
+  f13 (3, 0, 0.0, 0, 0.0, 0, 0.0, u.g); v.g = g; C (v);
+  f13 (4, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, u.g); v.g = g; C (v);
+  f13 (5, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, u.g); v.g = g; C (v);
+  f13 (6, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, u.g); v.g = g; C (v);
+  f13 (7, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, u.g); v.g = g; C (v);
+  f13 (8, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, u.g); v.g = g; C (v);
+  f13 (9, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, u.g); v.g = g; C (v);
+  f14 (0, u.h); v.h = h; C (v);
+  f14 (1, 0, 0.0, u.h); v.h = h; C (v);
+  f14 (2, 0, 0.0, 0, 0.0, u.h); v.h = h; C (v);
+  f14 (3, 0, 0.0, 0, 0.0, 0, 0.0, u.h); v.h = h; C (v);
+  f14 (4, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, u.h); v.h = h; C (v);
+  f14 (5, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, u.h); v.h = h; C (v);
+  f14 (6, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, u.h); v.h = h; C (v);
+  f14 (7, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, u.h); v.h = h; C (v);
+  f14 (8, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, u.h); v.h = h; C (v);
+  f14 (9, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0, u.h); v.h = h; C (v);
+  return 0;
+}