diff mbox

Fix PR67781: wrong code generation for partial load on big endian targets

Message ID 5842907.YBOs7q9Hb0@hardin.shanghai.arm.com
State New
Headers show

Commit Message

Thomas Preudhomme Jan. 5, 2016, 5:53 a.m. UTC
Hi,

bswap optimization pass generate wrong code on big endian targets when the 
result of a bit operation it analyzed is a partial load of the range of memory 
accessed by the original expression (when one or more bytes at lowest address 
were lost in the computation). This is due to the way cmpxchg and cmpnop are 
adjusted in find_bswap_or_nop before being compared to the result of the 
symbolic expression. Part of the adjustment is endian independent: it's to 
ignore the bytes that were not accessed by the original gimple expression. 
However, when the result has less byte than that original expression, some 
more byte need to be ignored and this is endian dependent.

The current code only support loss of bytes at the highest addresses because 
there is no code to adjust the address of the load. However, for little and 
big endian targets the bytes at highest address translate into different byte 
significance in the result. This patch first separate cmpxchg and cmpnop 
adjustement into 2 steps and then deal with endianness correctly for the 
second step.

ChangeLog entries are as follow:


*** gcc/ChangeLog ***

2015-12-16  Thomas Preud'homme  <thomas.preudhomme@arm.com>

        PR tree-optimization/67781
        * tree-ssa-math-opts.c (find_bswap_or_nop): Zero out bytes in cmpxchg
        and cmpnop in two steps: first the ones not accessed in original
        gimple expression in a endian independent way and then the ones not
        accessed in the final result in an endian-specific way.


*** gcc/testsuite/ChangeLog ***

2015-12-16  Thomas Preud'homme  <thomas.preudhomme@arm.com>

        PR tree-optimization/67781
        * gcc.c-torture/execute/pr67781.c: New file.


    according to the size of the symbolic number before using it.  */
@@ -2464,24 +2466,38 @@ find_bswap_or_nop (gimple *stmt, struct symbolic_number 
*n, bool *bswap)
 
   /* Find real size of result (highest non-zero byte).  */
   if (n->base_addr)
-    {
-      int rsize;
-      uint64_t tmpn;
-
-      for (tmpn = n->n, rsize = 0; tmpn; tmpn >>= BITS_PER_MARKER, rsize++);
-      n->range = rsize;
-    }
+    for (tmpn = n->n, rsize = 0; tmpn; tmpn >>= BITS_PER_MARKER, rsize++);
+  else
+    rsize = n->range;
 
-  /* Zero out the extra bits of N and CMP*.  */
+  /* Zero out the bits corresponding to untouched bytes in original gimple
+     expression.  */
   if (n->range < (int) sizeof (int64_t))
     {
-      uint64_t mask;
-
       mask = ((uint64_t) 1 << (n->range * BITS_PER_MARKER)) - 1;
       cmpxchg >>= (64 / BITS_PER_MARKER - n->range) * BITS_PER_MARKER;
       cmpnop &= mask;
     }
 
+  /* Zero out the bits corresponding to unused bytes in the result of the
+     gimple expression.  */
+  if (rsize < n->range)
+    {
+      if (BYTES_BIG_ENDIAN)
+	{
+	  mask = ((uint64_t) 1 << (rsize * BITS_PER_MARKER)) - 1;
+	  cmpxchg &= mask;
+	  cmpnop >>= (n->range - rsize) * BITS_PER_MARKER;
+	}
+      else
+	{
+	  mask = ((uint64_t) 1 << (rsize * BITS_PER_MARKER)) - 1;
+	  cmpxchg >>= (n->range - rsize) * BITS_PER_MARKER;
+	  cmpnop &= mask;
+	}
+      n->range = rsize;
+    }
+
   /* A complete byte swap should make the symbolic number to start with
      the largest digit in the highest order byte. Unchanged symbolic
      number indicates a read with same endianness as target architecture.  */



Regression testsuite was run on a bootstrapped native x86_64-linux-gnu GCC and 
on an arm-none-eabi GCC cross-compiler without any regression. I'm waiting for 
a slot on gcc110 to do a big endian bootstrap but at least the testcase works 
on mips-linux. I'll send an update once bootstrap is complete.

Is this ok for trunk and 5 branch in a week time if no regression is reported?

Best regards,

Thomas

Comments

Thomas Preudhomme Jan. 5, 2016, 7:31 a.m. UTC | #1
On Tuesday, January 05, 2016 01:53:37 PM you wrote:
> 
> Regression testsuite was run on a bootstrapped native x86_64-linux-gnu GCC
> and on an arm-none-eabi GCC cross-compiler without any regression. I'm
> waiting for a slot on gcc110 to do a big endian bootstrap but at least the
> testcase works on mips-linux. I'll send an update once bootstrap is
> complete.

Bootstrap went fine on gcc110 with the following language enabled: 
c,c++,objc,obj-c++,java,fortran,ada,go,lto.

Best regards,

Thomas
Richard Biener Jan. 8, 2016, 9:05 a.m. UTC | #2
On Tue, 5 Jan 2016, Thomas Preud'homme wrote:

> Hi,
> 
> bswap optimization pass generate wrong code on big endian targets when the 
> result of a bit operation it analyzed is a partial load of the range of memory 
> accessed by the original expression (when one or more bytes at lowest address 
> were lost in the computation). This is due to the way cmpxchg and cmpnop are 
> adjusted in find_bswap_or_nop before being compared to the result of the 
> symbolic expression. Part of the adjustment is endian independent: it's to 
> ignore the bytes that were not accessed by the original gimple expression. 
> However, when the result has less byte than that original expression, some 
> more byte need to be ignored and this is endian dependent.
> 
> The current code only support loss of bytes at the highest addresses because 
> there is no code to adjust the address of the load. However, for little and 
> big endian targets the bytes at highest address translate into different byte 
> significance in the result. This patch first separate cmpxchg and cmpnop 
> adjustement into 2 steps and then deal with endianness correctly for the 
> second step.
> 
> ChangeLog entries are as follow:
> 
> 
> *** gcc/ChangeLog ***
> 
> 2015-12-16  Thomas Preud'homme  <thomas.preudhomme@arm.com>
> 
>         PR tree-optimization/67781
>         * tree-ssa-math-opts.c (find_bswap_or_nop): Zero out bytes in cmpxchg
>         and cmpnop in two steps: first the ones not accessed in original
>         gimple expression in a endian independent way and then the ones not
>         accessed in the final result in an endian-specific way.
> 
> 
> *** gcc/testsuite/ChangeLog ***
> 
> 2015-12-16  Thomas Preud'homme  <thomas.preudhomme@arm.com>
> 
>         PR tree-optimization/67781
>         * gcc.c-torture/execute/pr67781.c: New file.
> 
> 
> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr67781.c 
> b/gcc/testsuite/gcc.c-torture/execute/pr67781.c
> new file mode 100644
> index 0000000..bf50aa2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/pr67781.c
> @@ -0,0 +1,34 @@
> +#ifdef __UINT32_TYPE__
> +typedef __UINT32_TYPE__ uint32_t;
> +#else
> +typedef unsigned uint32_t;
> +#endif
> +
> +#ifdef __UINT8_TYPE__
> +typedef __UINT8_TYPE__ uint8_t;
> +#else
> +typedef unsigned char uint8_t;
> +#endif
> +
> +struct
> +{
> +  uint32_t a;
> +  uint8_t b;
> +} s = { 0x123456, 0x78 };
> +
> +int pr67781()
> +{
> +  uint32_t c = (s.a << 8) | s.b;
> +  return c;
> +}
> +
> +int
> +main ()
> +{
> +  if (sizeof (uint32_t) * __CHAR_BIT__ != 32)
> +    return 0;
> +
> +  if (pr67781 () != 0x12345678)
> +    __builtin_abort ();
> +  return 0;
> +}
> diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
> index b00f046..e5a185f 100644
> --- a/gcc/tree-ssa-math-opts.c
> +++ b/gcc/tree-ssa-math-opts.c
> @@ -2441,6 +2441,8 @@ find_bswap_or_nop_1 (gimple *stmt, struct symbolic_number 
> *n, int limit)
>  static gimple *
>  find_bswap_or_nop (gimple *stmt, struct symbolic_number *n, bool *bswap)
>  {
> +  unsigned rsize;
> +  uint64_t tmpn, mask;
>  /* The number which the find_bswap_or_nop_1 result should match in order
>     to have a full byte swap.  The number is shifted to the right
>     according to the size of the symbolic number before using it.  */
> @@ -2464,24 +2466,38 @@ find_bswap_or_nop (gimple *stmt, struct symbolic_number 
> *n, bool *bswap)
>  
>    /* Find real size of result (highest non-zero byte).  */
>    if (n->base_addr)
> -    {
> -      int rsize;
> -      uint64_t tmpn;
> -
> -      for (tmpn = n->n, rsize = 0; tmpn; tmpn >>= BITS_PER_MARKER, rsize++);
> -      n->range = rsize;
> -    }
> +    for (tmpn = n->n, rsize = 0; tmpn; tmpn >>= BITS_PER_MARKER, rsize++);
> +  else
> +    rsize = n->range;
>  
> -  /* Zero out the extra bits of N and CMP*.  */
> +  /* Zero out the bits corresponding to untouched bytes in original gimple
> +     expression.  */
>    if (n->range < (int) sizeof (int64_t))
>      {
> -      uint64_t mask;
> -
>        mask = ((uint64_t) 1 << (n->range * BITS_PER_MARKER)) - 1;
>        cmpxchg >>= (64 / BITS_PER_MARKER - n->range) * BITS_PER_MARKER;
>        cmpnop &= mask;
>      }
>  
> +  /* Zero out the bits corresponding to unused bytes in the result of the
> +     gimple expression.  */
> +  if (rsize < n->range)
> +    {
> +      if (BYTES_BIG_ENDIAN)
> +	{
> +	  mask = ((uint64_t) 1 << (rsize * BITS_PER_MARKER)) - 1;
> +	  cmpxchg &= mask;
> +	  cmpnop >>= (n->range - rsize) * BITS_PER_MARKER;
> +	}
> +      else
> +	{
> +	  mask = ((uint64_t) 1 << (rsize * BITS_PER_MARKER)) - 1;
> +	  cmpxchg >>= (n->range - rsize) * BITS_PER_MARKER;
> +	  cmpnop &= mask;
> +	}
> +      n->range = rsize;
> +    }
> +
>    /* A complete byte swap should make the symbolic number to start with
>       the largest digit in the highest order byte. Unchanged symbolic
>       number indicates a read with same endianness as target architecture.  */
> 
> 
> 
> Regression testsuite was run on a bootstrapped native x86_64-linux-gnu GCC and 
> on an arm-none-eabi GCC cross-compiler without any regression. I'm waiting for 
> a slot on gcc110 to do a big endian bootstrap but at least the testcase works 
> on mips-linux. I'll send an update once bootstrap is complete.
> 
> Is this ok for trunk and 5 branch in a week time if no regression is reported?

Yes.

Thanks,
Richar.

> Best regards,
> 
> Thomas
> 
>
Thomas Preudhomme Jan. 21, 2016, 7:29 a.m. UTC | #3
On Friday, January 08, 2016 10:05:25 AM Richard Biener wrote:
> On Tue, 5 Jan 2016, Thomas Preud'homme wrote:
> > Hi,
> > 
> > bswap optimization pass generate wrong code on big endian targets when the
> > result of a bit operation it analyzed is a partial load of the range of
> > memory accessed by the original expression (when one or more bytes at
> > lowest address were lost in the computation). This is due to the way
> > cmpxchg and cmpnop are adjusted in find_bswap_or_nop before being
> > compared to the result of the symbolic expression. Part of the adjustment
> > is endian independent: it's to ignore the bytes that were not accessed by
> > the original gimple expression. However, when the result has less byte
> > than that original expression, some more byte need to be ignored and this
> > is endian dependent.
> > 
> > The current code only support loss of bytes at the highest addresses
> > because there is no code to adjust the address of the load. However, for
> > little and big endian targets the bytes at highest address translate into
> > different byte significance in the result. This patch first separate
> > cmpxchg and cmpnop adjustement into 2 steps and then deal with endianness
> > correctly for the second step.
> > 
> > ChangeLog entries are as follow:
> > 
> > 
> > *** gcc/ChangeLog ***
> > 
> > 2015-12-16  Thomas Preud'homme  <thomas.preudhomme@arm.com>
> > 
> >         PR tree-optimization/67781
> >         * tree-ssa-math-opts.c (find_bswap_or_nop): Zero out bytes in
> >         cmpxchg
> >         and cmpnop in two steps: first the ones not accessed in original
> >         gimple expression in a endian independent way and then the ones
> >         not
> >         accessed in the final result in an endian-specific way.
> > 
> > *** gcc/testsuite/ChangeLog ***
> > 
> > 2015-12-16  Thomas Preud'homme  <thomas.preudhomme@arm.com>
> > 
> >         PR tree-optimization/67781
> >         * gcc.c-torture/execute/pr67781.c: New file.
> > 
> > diff --git a/gcc/testsuite/gcc.c-torture/execute/pr67781.c
> > b/gcc/testsuite/gcc.c-torture/execute/pr67781.c
> > new file mode 100644
> > index 0000000..bf50aa2
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.c-torture/execute/pr67781.c
> > @@ -0,0 +1,34 @@
> > +#ifdef __UINT32_TYPE__
> > +typedef __UINT32_TYPE__ uint32_t;
> > +#else
> > +typedef unsigned uint32_t;
> > +#endif
> > +
> > +#ifdef __UINT8_TYPE__
> > +typedef __UINT8_TYPE__ uint8_t;
> > +#else
> > +typedef unsigned char uint8_t;
> > +#endif
> > +
> > +struct
> > +{
> > +  uint32_t a;
> > +  uint8_t b;
> > +} s = { 0x123456, 0x78 };
> > +
> > +int pr67781()
> > +{
> > +  uint32_t c = (s.a << 8) | s.b;
> > +  return c;
> > +}
> > +
> > +int
> > +main ()
> > +{
> > +  if (sizeof (uint32_t) * __CHAR_BIT__ != 32)
> > +    return 0;
> > +
> > +  if (pr67781 () != 0x12345678)
> > +    __builtin_abort ();
> > +  return 0;
> > +}
> > diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
> > index b00f046..e5a185f 100644
> > --- a/gcc/tree-ssa-math-opts.c
> > +++ b/gcc/tree-ssa-math-opts.c
> > @@ -2441,6 +2441,8 @@ find_bswap_or_nop_1 (gimple *stmt, struct
> > symbolic_number *n, int limit)
> > 
> >  static gimple *
> >  find_bswap_or_nop (gimple *stmt, struct symbolic_number *n, bool *bswap)
> >  {
> > 
> > +  unsigned rsize;
> > +  uint64_t tmpn, mask;
> > 
> >  /* The number which the find_bswap_or_nop_1 result should match in order
> >  
> >     to have a full byte swap.  The number is shifted to the right
> >     according to the size of the symbolic number before using it.  */
> > 
> > @@ -2464,24 +2466,38 @@ find_bswap_or_nop (gimple *stmt, struct
> > symbolic_number *n, bool *bswap)
> > 
> >    /* Find real size of result (highest non-zero byte).  */
> >    if (n->base_addr)
> > 
> > -    {
> > -      int rsize;
> > -      uint64_t tmpn;
> > -
> > -      for (tmpn = n->n, rsize = 0; tmpn; tmpn >>= BITS_PER_MARKER,
> > rsize++); -      n->range = rsize;
> > -    }
> > +    for (tmpn = n->n, rsize = 0; tmpn; tmpn >>= BITS_PER_MARKER,
> > rsize++);
> > +  else
> > +    rsize = n->range;
> > 
> > -  /* Zero out the extra bits of N and CMP*.  */
> > +  /* Zero out the bits corresponding to untouched bytes in original
> > gimple
> > +     expression.  */
> > 
> >    if (n->range < (int) sizeof (int64_t))
> >    
> >      {
> > 
> > -      uint64_t mask;
> > -
> > 
> >        mask = ((uint64_t) 1 << (n->range * BITS_PER_MARKER)) - 1;
> >        cmpxchg >>= (64 / BITS_PER_MARKER - n->range) * BITS_PER_MARKER;
> >        cmpnop &= mask;
> >      
> >      }
> > 
> > +  /* Zero out the bits corresponding to unused bytes in the result of the
> > +     gimple expression.  */
> > +  if (rsize < n->range)
> > +    {
> > +      if (BYTES_BIG_ENDIAN)
> > +	{
> > +	  mask = ((uint64_t) 1 << (rsize * BITS_PER_MARKER)) - 1;
> > +	  cmpxchg &= mask;
> > +	  cmpnop >>= (n->range - rsize) * BITS_PER_MARKER;
> > +	}
> > +      else
> > +	{
> > +	  mask = ((uint64_t) 1 << (rsize * BITS_PER_MARKER)) - 1;
> > +	  cmpxchg >>= (n->range - rsize) * BITS_PER_MARKER;
> > +	  cmpnop &= mask;
> > +	}
> > +      n->range = rsize;
> > +    }
> > +
> > 
> >    /* A complete byte swap should make the symbolic number to start with
> >    
> >       the largest digit in the highest order byte. Unchanged symbolic
> >       number indicates a read with same endianness as target architecture.
> >        */
> > 
> > Regression testsuite was run on a bootstrapped native x86_64-linux-gnu GCC
> > and on an arm-none-eabi GCC cross-compiler without any regression. I'm
> > waiting for a slot on gcc110 to do a big endian bootstrap but at least
> > the testcase works on mips-linux. I'll send an update once bootstrap is
> > complete.
> > 
> > Is this ok for trunk and 5 branch in a week time if no regression is
> > reported?
> Yes.

What about a backport to GCC 5? The patch applies cleanly and shows no 
regression when running the testsuite with an arm-none-eabi GCC cross-compiler 
and a natively bootstrapped x86_64-linux-gnu GCC compiler.

It also pass the test added in the patch with a nativly bootstrapped  
powerpc64-unknown-linux-gnu GCC compiler (on gcc110 in compile farm).

Best regards,

Thomas
Richard Biener Jan. 21, 2016, 8:21 a.m. UTC | #4
On Thu, 21 Jan 2016, Thomas Preud'homme wrote:

> On Friday, January 08, 2016 10:05:25 AM Richard Biener wrote:
> > On Tue, 5 Jan 2016, Thomas Preud'homme wrote:
> > > Hi,
> > > 
> > > bswap optimization pass generate wrong code on big endian targets when the
> > > result of a bit operation it analyzed is a partial load of the range of
> > > memory accessed by the original expression (when one or more bytes at
> > > lowest address were lost in the computation). This is due to the way
> > > cmpxchg and cmpnop are adjusted in find_bswap_or_nop before being
> > > compared to the result of the symbolic expression. Part of the adjustment
> > > is endian independent: it's to ignore the bytes that were not accessed by
> > > the original gimple expression. However, when the result has less byte
> > > than that original expression, some more byte need to be ignored and this
> > > is endian dependent.
> > > 
> > > The current code only support loss of bytes at the highest addresses
> > > because there is no code to adjust the address of the load. However, for
> > > little and big endian targets the bytes at highest address translate into
> > > different byte significance in the result. This patch first separate
> > > cmpxchg and cmpnop adjustement into 2 steps and then deal with endianness
> > > correctly for the second step.
> > > 
> > > ChangeLog entries are as follow:
> > > 
> > > 
> > > *** gcc/ChangeLog ***
> > > 
> > > 2015-12-16  Thomas Preud'homme  <thomas.preudhomme@arm.com>
> > > 
> > >         PR tree-optimization/67781
> > >         * tree-ssa-math-opts.c (find_bswap_or_nop): Zero out bytes in
> > >         cmpxchg
> > >         and cmpnop in two steps: first the ones not accessed in original
> > >         gimple expression in a endian independent way and then the ones
> > >         not
> > >         accessed in the final result in an endian-specific way.
> > > 
> > > *** gcc/testsuite/ChangeLog ***
> > > 
> > > 2015-12-16  Thomas Preud'homme  <thomas.preudhomme@arm.com>
> > > 
> > >         PR tree-optimization/67781
> > >         * gcc.c-torture/execute/pr67781.c: New file.
> > > 
> > > diff --git a/gcc/testsuite/gcc.c-torture/execute/pr67781.c
> > > b/gcc/testsuite/gcc.c-torture/execute/pr67781.c
> > > new file mode 100644
> > > index 0000000..bf50aa2
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.c-torture/execute/pr67781.c
> > > @@ -0,0 +1,34 @@
> > > +#ifdef __UINT32_TYPE__
> > > +typedef __UINT32_TYPE__ uint32_t;
> > > +#else
> > > +typedef unsigned uint32_t;
> > > +#endif
> > > +
> > > +#ifdef __UINT8_TYPE__
> > > +typedef __UINT8_TYPE__ uint8_t;
> > > +#else
> > > +typedef unsigned char uint8_t;
> > > +#endif
> > > +
> > > +struct
> > > +{
> > > +  uint32_t a;
> > > +  uint8_t b;
> > > +} s = { 0x123456, 0x78 };
> > > +
> > > +int pr67781()
> > > +{
> > > +  uint32_t c = (s.a << 8) | s.b;
> > > +  return c;
> > > +}
> > > +
> > > +int
> > > +main ()
> > > +{
> > > +  if (sizeof (uint32_t) * __CHAR_BIT__ != 32)
> > > +    return 0;
> > > +
> > > +  if (pr67781 () != 0x12345678)
> > > +    __builtin_abort ();
> > > +  return 0;
> > > +}
> > > diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
> > > index b00f046..e5a185f 100644
> > > --- a/gcc/tree-ssa-math-opts.c
> > > +++ b/gcc/tree-ssa-math-opts.c
> > > @@ -2441,6 +2441,8 @@ find_bswap_or_nop_1 (gimple *stmt, struct
> > > symbolic_number *n, int limit)
> > > 
> > >  static gimple *
> > >  find_bswap_or_nop (gimple *stmt, struct symbolic_number *n, bool *bswap)
> > >  {
> > > 
> > > +  unsigned rsize;
> > > +  uint64_t tmpn, mask;
> > > 
> > >  /* The number which the find_bswap_or_nop_1 result should match in order
> > >  
> > >     to have a full byte swap.  The number is shifted to the right
> > >     according to the size of the symbolic number before using it.  */
> > > 
> > > @@ -2464,24 +2466,38 @@ find_bswap_or_nop (gimple *stmt, struct
> > > symbolic_number *n, bool *bswap)
> > > 
> > >    /* Find real size of result (highest non-zero byte).  */
> > >    if (n->base_addr)
> > > 
> > > -    {
> > > -      int rsize;
> > > -      uint64_t tmpn;
> > > -
> > > -      for (tmpn = n->n, rsize = 0; tmpn; tmpn >>= BITS_PER_MARKER,
> > > rsize++); -      n->range = rsize;
> > > -    }
> > > +    for (tmpn = n->n, rsize = 0; tmpn; tmpn >>= BITS_PER_MARKER,
> > > rsize++);
> > > +  else
> > > +    rsize = n->range;
> > > 
> > > -  /* Zero out the extra bits of N and CMP*.  */
> > > +  /* Zero out the bits corresponding to untouched bytes in original
> > > gimple
> > > +     expression.  */
> > > 
> > >    if (n->range < (int) sizeof (int64_t))
> > >    
> > >      {
> > > 
> > > -      uint64_t mask;
> > > -
> > > 
> > >        mask = ((uint64_t) 1 << (n->range * BITS_PER_MARKER)) - 1;
> > >        cmpxchg >>= (64 / BITS_PER_MARKER - n->range) * BITS_PER_MARKER;
> > >        cmpnop &= mask;
> > >      
> > >      }
> > > 
> > > +  /* Zero out the bits corresponding to unused bytes in the result of the
> > > +     gimple expression.  */
> > > +  if (rsize < n->range)
> > > +    {
> > > +      if (BYTES_BIG_ENDIAN)
> > > +	{
> > > +	  mask = ((uint64_t) 1 << (rsize * BITS_PER_MARKER)) - 1;
> > > +	  cmpxchg &= mask;
> > > +	  cmpnop >>= (n->range - rsize) * BITS_PER_MARKER;
> > > +	}
> > > +      else
> > > +	{
> > > +	  mask = ((uint64_t) 1 << (rsize * BITS_PER_MARKER)) - 1;
> > > +	  cmpxchg >>= (n->range - rsize) * BITS_PER_MARKER;
> > > +	  cmpnop &= mask;
> > > +	}
> > > +      n->range = rsize;
> > > +    }
> > > +
> > > 
> > >    /* A complete byte swap should make the symbolic number to start with
> > >    
> > >       the largest digit in the highest order byte. Unchanged symbolic
> > >       number indicates a read with same endianness as target architecture.
> > >        */
> > > 
> > > Regression testsuite was run on a bootstrapped native x86_64-linux-gnu GCC
> > > and on an arm-none-eabi GCC cross-compiler without any regression. I'm
> > > waiting for a slot on gcc110 to do a big endian bootstrap but at least
> > > the testcase works on mips-linux. I'll send an update once bootstrap is
> > > complete.
> > > 
> > > Is this ok for trunk and 5 branch in a week time if no regression is
> > > reported?
> > Yes.
> 
> What about a backport to GCC 5? The patch applies cleanly and shows no 
> regression when running the testsuite with an arm-none-eabi GCC cross-compiler 
> and a natively bootstrapped x86_64-linux-gnu GCC compiler.
> 
> It also pass the test added in the patch with a nativly bootstrapped  
> powerpc64-unknown-linux-gnu GCC compiler (on gcc110 in compile farm).

The "Yes" above approved a backport.

Richard.
Thomas Preudhomme Jan. 21, 2016, 8:22 a.m. UTC | #5
On Thursday, January 21, 2016 09:21:52 AM Richard Biener wrote:
> On Thu, 21 Jan 2016, Thomas Preud'homme wrote:
> > On Friday, January 08, 2016 10:05:25 AM Richard Biener wrote:
> > > On Tue, 5 Jan 2016, Thomas Preud'homme wrote:
> > > > Hi,
> > > > 
> > > > bswap optimization pass generate wrong code on big endian targets when
> > > > the
> > > > result of a bit operation it analyzed is a partial load of the range
> > > > of
> > > > memory accessed by the original expression (when one or more bytes at
> > > > lowest address were lost in the computation). This is due to the way
> > > > cmpxchg and cmpnop are adjusted in find_bswap_or_nop before being
> > > > compared to the result of the symbolic expression. Part of the
> > > > adjustment
> > > > is endian independent: it's to ignore the bytes that were not accessed
> > > > by
> > > > the original gimple expression. However, when the result has less byte
> > > > than that original expression, some more byte need to be ignored and
> > > > this
> > > > is endian dependent.
> > > > 
> > > > The current code only support loss of bytes at the highest addresses
> > > > because there is no code to adjust the address of the load. However,
> > > > for
> > > > little and big endian targets the bytes at highest address translate
> > > > into
> > > > different byte significance in the result. This patch first separate
> > > > cmpxchg and cmpnop adjustement into 2 steps and then deal with
> > > > endianness
> > > > correctly for the second step.
> > > > 
> > > > ChangeLog entries are as follow:
> > > > 
> > > > 
> > > > *** gcc/ChangeLog ***
> > > > 
> > > > 2015-12-16  Thomas Preud'homme  <thomas.preudhomme@arm.com>
> > > > 
> > > >         PR tree-optimization/67781
> > > >         * tree-ssa-math-opts.c (find_bswap_or_nop): Zero out bytes in
> > > >         cmpxchg
> > > >         and cmpnop in two steps: first the ones not accessed in
> > > >         original
> > > >         gimple expression in a endian independent way and then the
> > > >         ones
> > > >         not
> > > >         accessed in the final result in an endian-specific way.
> > > > 
> > > > *** gcc/testsuite/ChangeLog ***
> > > > 
> > > > 2015-12-16  Thomas Preud'homme  <thomas.preudhomme@arm.com>
> > > > 
> > > >         PR tree-optimization/67781
> > > >         * gcc.c-torture/execute/pr67781.c: New file.
> > > > 
> > > > diff --git a/gcc/testsuite/gcc.c-torture/execute/pr67781.c
> > > > b/gcc/testsuite/gcc.c-torture/execute/pr67781.c
> > > > new file mode 100644
> > > > index 0000000..bf50aa2
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.c-torture/execute/pr67781.c
> > > > @@ -0,0 +1,34 @@
> > > > +#ifdef __UINT32_TYPE__
> > > > +typedef __UINT32_TYPE__ uint32_t;
> > > > +#else
> > > > +typedef unsigned uint32_t;
> > > > +#endif
> > > > +
> > > > +#ifdef __UINT8_TYPE__
> > > > +typedef __UINT8_TYPE__ uint8_t;
> > > > +#else
> > > > +typedef unsigned char uint8_t;
> > > > +#endif
> > > > +
> > > > +struct
> > > > +{
> > > > +  uint32_t a;
> > > > +  uint8_t b;
> > > > +} s = { 0x123456, 0x78 };
> > > > +
> > > > +int pr67781()
> > > > +{
> > > > +  uint32_t c = (s.a << 8) | s.b;
> > > > +  return c;
> > > > +}
> > > > +
> > > > +int
> > > > +main ()
> > > > +{
> > > > +  if (sizeof (uint32_t) * __CHAR_BIT__ != 32)
> > > > +    return 0;
> > > > +
> > > > +  if (pr67781 () != 0x12345678)
> > > > +    __builtin_abort ();
> > > > +  return 0;
> > > > +}
> > > > diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
> > > > index b00f046..e5a185f 100644
> > > > --- a/gcc/tree-ssa-math-opts.c
> > > > +++ b/gcc/tree-ssa-math-opts.c
> > > > @@ -2441,6 +2441,8 @@ find_bswap_or_nop_1 (gimple *stmt, struct
> > > > symbolic_number *n, int limit)
> > > > 
> > > >  static gimple *
> > > >  find_bswap_or_nop (gimple *stmt, struct symbolic_number *n, bool
> > > >  *bswap)
> > > >  {
> > > > 
> > > > +  unsigned rsize;
> > > > +  uint64_t tmpn, mask;
> > > > 
> > > >  /* The number which the find_bswap_or_nop_1 result should match in
> > > >  order
> > > >  
> > > >     to have a full byte swap.  The number is shifted to the right
> > > >     according to the size of the symbolic number before using it.  */
> > > > 
> > > > @@ -2464,24 +2466,38 @@ find_bswap_or_nop (gimple *stmt, struct
> > > > symbolic_number *n, bool *bswap)
> > > > 
> > > >    /* Find real size of result (highest non-zero byte).  */
> > > >    if (n->base_addr)
> > > > 
> > > > -    {
> > > > -      int rsize;
> > > > -      uint64_t tmpn;
> > > > -
> > > > -      for (tmpn = n->n, rsize = 0; tmpn; tmpn >>= BITS_PER_MARKER,
> > > > rsize++); -      n->range = rsize;
> > > > -    }
> > > > +    for (tmpn = n->n, rsize = 0; tmpn; tmpn >>= BITS_PER_MARKER,
> > > > rsize++);
> > > > +  else
> > > > +    rsize = n->range;
> > > > 
> > > > -  /* Zero out the extra bits of N and CMP*.  */
> > > > +  /* Zero out the bits corresponding to untouched bytes in original
> > > > gimple
> > > > +     expression.  */
> > > > 
> > > >    if (n->range < (int) sizeof (int64_t))
> > > >    
> > > >      {
> > > > 
> > > > -      uint64_t mask;
> > > > -
> > > > 
> > > >        mask = ((uint64_t) 1 << (n->range * BITS_PER_MARKER)) - 1;
> > > >        cmpxchg >>= (64 / BITS_PER_MARKER - n->range) *
> > > >        BITS_PER_MARKER;
> > > >        cmpnop &= mask;
> > > >      
> > > >      }
> > > > 
> > > > +  /* Zero out the bits corresponding to unused bytes in the result of
> > > > the
> > > > +     gimple expression.  */
> > > > +  if (rsize < n->range)
> > > > +    {
> > > > +      if (BYTES_BIG_ENDIAN)
> > > > +	{
> > > > +	  mask = ((uint64_t) 1 << (rsize * BITS_PER_MARKER)) - 1;
> > > > +	  cmpxchg &= mask;
> > > > +	  cmpnop >>= (n->range - rsize) * BITS_PER_MARKER;
> > > > +	}
> > > > +      else
> > > > +	{
> > > > +	  mask = ((uint64_t) 1 << (rsize * BITS_PER_MARKER)) - 1;
> > > > +	  cmpxchg >>= (n->range - rsize) * BITS_PER_MARKER;
> > > > +	  cmpnop &= mask;
> > > > +	}
> > > > +      n->range = rsize;
> > > > +    }
> > > > +
> > > > 
> > > >    /* A complete byte swap should make the symbolic number to start
> > > >    with
> > > >    
> > > >       the largest digit in the highest order byte. Unchanged symbolic
> > > >       number indicates a read with same endianness as target
> > > >       architecture.
> > > >       
> > > >        */
> > > > 
> > > > Regression testsuite was run on a bootstrapped native x86_64-linux-gnu
> > > > GCC
> > > > and on an arm-none-eabi GCC cross-compiler without any regression. I'm
> > > > waiting for a slot on gcc110 to do a big endian bootstrap but at least
> > > > the testcase works on mips-linux. I'll send an update once bootstrap
> > > > is
> > > > complete.
> > > > 
> > > > Is this ok for trunk and 5 branch in a week time if no regression is
> > > > reported?
> > > 
> > > Yes.
> > 
> > What about a backport to GCC 5? The patch applies cleanly and shows no
> > regression when running the testsuite with an arm-none-eabi GCC
> > cross-compiler and a natively bootstrapped x86_64-linux-gnu GCC compiler.
> > 
> > It also pass the test added in the patch with a nativly bootstrapped
> > powerpc64-unknown-linux-gnu GCC compiler (on gcc110 in compile farm).
> 
> The "Yes" above approved a backport.

Doh indeed. My apologize.

Best regards,

Thomas
diff mbox

Patch

diff --git a/gcc/testsuite/gcc.c-torture/execute/pr67781.c 
b/gcc/testsuite/gcc.c-torture/execute/pr67781.c
new file mode 100644
index 0000000..bf50aa2
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr67781.c
@@ -0,0 +1,34 @@ 
+#ifdef __UINT32_TYPE__
+typedef __UINT32_TYPE__ uint32_t;
+#else
+typedef unsigned uint32_t;
+#endif
+
+#ifdef __UINT8_TYPE__
+typedef __UINT8_TYPE__ uint8_t;
+#else
+typedef unsigned char uint8_t;
+#endif
+
+struct
+{
+  uint32_t a;
+  uint8_t b;
+} s = { 0x123456, 0x78 };
+
+int pr67781()
+{
+  uint32_t c = (s.a << 8) | s.b;
+  return c;
+}
+
+int
+main ()
+{
+  if (sizeof (uint32_t) * __CHAR_BIT__ != 32)
+    return 0;
+
+  if (pr67781 () != 0x12345678)
+    __builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index b00f046..e5a185f 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2441,6 +2441,8 @@  find_bswap_or_nop_1 (gimple *stmt, struct symbolic_number 
*n, int limit)
 static gimple *
 find_bswap_or_nop (gimple *stmt, struct symbolic_number *n, bool *bswap)
 {
+  unsigned rsize;
+  uint64_t tmpn, mask;
 /* The number which the find_bswap_or_nop_1 result should match in order
    to have a full byte swap.  The number is shifted to the right