wide-int: i386: Fix ICEs on TImode signed overflow add/sub patterns [PR93376]
diff mbox series

Message ID 20200123081643.GV10088@tucnak
State New
Headers show
Series
  • wide-int: i386: Fix ICEs on TImode signed overflow add/sub patterns [PR93376]
Related show

Commit Message

Jakub Jelinek Jan. 23, 2020, 8:16 a.m. UTC
Hi!

The following testcase ICEs, because during try_combine of i3:
(insn 18 17 19 2 (parallel [
            (set (reg:CCO 17 flags)
                (eq:CCO (plus:OI (sign_extend:OI (reg:TI 96))
                        (const_int 1 [0x1]))
                    (sign_extend:OI (plus:TI (reg:TI 96)
                            (const_int 1 [0x1])))))
            (set (reg:TI 98)
                (plus:TI (reg:TI 96)
                    (const_int 1 [0x1])))
        ]) "pr93376.c":8:10 223 {*addvti4_doubleword_1}
     (expr_list:REG_UNUSED (reg:TI 98)
        (expr_list:REG_DEAD (reg:TI 96)
            (nil))))
and i2:
(insn 17 37 18 2 (set (reg:TI 96)
        (const_wide_int 0x7fffffffffffffffffffffffffffffff)) "pr93376.c":8:10 65 {*movti_internal}
     (nil))
the eq in there gets simplified into:
(eq:CCO (const_wide_int 0x080000000000000000000000000000000)
    (const_wide_int 0x80000000000000000000000000000000))
and simplify-rtx.c tries to simplify it by simplifying MINUS
of the two operands.
Now, i386 defines MAX_BITSIZE_MODE_ANY_INT to 128, because OImode
and XImode are used mainly as a placeholder for the vector modes;
these new signed overflow patterns are an exception to that,
but what they really need is just TImode precision + 1 (maybe 2 worst case)
bits at any time.

wide-int.h defines WIDE_INT_MAX_ELTS in a way that it contains one more
HWI above number of HWIs to cover WIDE_INT_MAX_ELTS, so on i386 that is
3 HWIs, meaning that TImode precision + 1/2 bits is still representable in
there.  Unfortunately, the way wi::sub_large is implemented, it needs
not just those 3 HWIs, but one HWI above the maximum of the lengths of
both operands, which means it buffer overflows, overwrites the following
precision in wide_int_storage and ICEs later on.  The need for 4 HWIs is
only temporary, because canonize immediately after it canonicalizes it
back to 3 HWIs only.

Attached are two patches, the first one is admittedly a hack, but could be
considered an optimization too, simply before overwriting the last HWI
({add,sub}_large seem to be the only one that have such len++) perform
a cheap check if canonize won't optimize it away immediately and in such
case don't store it.  Yes, we are playing with fire because full OImode
is 256 bits and we can't store that many, but we in the end only need 129
bits.

The other patch is something suggested by Richard S., avoid using OImode
for this and instead use a partial int mode that is smaller.  This is still
playing with fire because even the partial int mode is larger than
MAX_BITSIZE_MODE_ANY_INT, but at least its precision fits into those 3
WIDE_INT_MAX_ELTS wide-int.h uses.  The disadvantage of that approach is
that it is more costly at compile time, various RA data structures contains
number_of_modes^2 sized arrays, and RA initialization will want to compute
at runtime various properties of each of the modes.

I've tried to think about other ways how to represent signed overflow check
in RTL, but didn't find any that would actually properly describe it and not
be way too complicated for the patterns.

So, my preference is the first patch, but Richard S. doesn't like that much.

Both patches have been bootstrapped/regtested on x86_64-linux and
i686-linux, ok for trunk (which one)?

	Jakub
2020-01-23  Jakub Jelinek  <jakub@redhat.com>

	PR target/93376
	* wide-int.cc (wi::add_large, wi::sub_large): Don't extend len by
	writing SIGN_MASK of the highest element that canonize would remove
	again.

	* gcc.dg/pr93376.c: New test.
2020-01-23  Jakub Jelinek  <jakub@redhat.com>

	PR target/93376
	* config/i386/i386-modes.def (POImode): New mode.
	* config/i386/i386.md (DPWI): New mode attribute.
	(addv<mode>4, subv<mode>4): Use <DPWI> instead of <DWI>.
	(QWI): Rename to...
	(QPWI): ... this.  Use POI instead of OI for TImode.
	(*addv<dwi>4_doubleword, *addv<dwi>4_doubleword_1,
	*subv<dwi>4_doubleword, *subv<dwi>4_doubleword_1): Use <QPWI>
	instead of <QWI>.

	* gcc.dg/pr93376.c: New test.

--- gcc/config/i386/i386-modes.def.jj	2020-01-12 11:54:36.323414766 +0100
+++ gcc/config/i386/i386-modes.def	2020-01-22 14:10:44.781367495 +0100
@@ -107,6 +107,13 @@ INT_MODE (XI, 64);
 PARTIAL_INT_MODE (HI, 16, P2QI);
 PARTIAL_INT_MODE (SI, 32, P2HI);
 
+/* Mode used for signed overflow checking of TImode.  As
+   MAX_BITSIZE_MODE_ANY_INT is only 128, wide-int.h reserves only that
+   plus HOST_BITS_PER_WIDE_INT bits in wide_int etc., so OImode is too
+   large.  For the overflow checking we actually need just 1 or 2 bits
+   beyond TImode precision.  */
+PARTIAL_INT_MODE (OI, 192, POI);
+
 /* Keep the OI and XI modes from confusing the compiler into thinking
    that these modes could actually be used for computation.  They are
    only holders for vectors during data movement.  */
--- gcc/config/i386/i386.md.jj	2020-01-22 14:03:52.597593584 +0100
+++ gcc/config/i386/i386.md	2020-01-22 14:31:22.661691613 +0100
@@ -6054,15 +6054,18 @@ (define_insn "*addqi_ext_2"
   [(set_attr "type" "alu")
    (set_attr "mode" "QI")])
 
+;; Like DWI, but use POImode instead of OImode.
+(define_mode_attr DPWI [(QI "HI") (HI "SI") (SI "DI") (DI "TI") (TI "POI")])
+
 ;; Add with jump on overflow.
 (define_expand "addv<mode>4"
   [(parallel [(set (reg:CCO FLAGS_REG)
 		   (eq:CCO
-		     (plus:<DWI>
-		       (sign_extend:<DWI>
+		     (plus:<DPWI>
+		       (sign_extend:<DPWI>
 			 (match_operand:SWIDWI 1 "nonimmediate_operand"))
 		       (match_dup 4))
-			 (sign_extend:<DWI>
+			 (sign_extend:<DPWI>
 			   (plus:SWIDWI (match_dup 1)
 			     (match_operand:SWIDWI 2
 			       "<general_hilo_operand>")))))
@@ -6078,7 +6081,7 @@ (define_expand "addv<mode>4"
   if (CONST_SCALAR_INT_P (operands[2]))
     operands[4] = operands[2];
   else
-    operands[4] = gen_rtx_SIGN_EXTEND (<DWI>mode, operands[2]);
+    operands[4] = gen_rtx_SIGN_EXTEND (<DPWI>mode, operands[2]);
 })
 
 (define_insn "*addv<mode>4"
@@ -6123,17 +6126,17 @@ (define_insn "addv<mode>4_1"
 	      (const_string "<MODE_SIZE>")))])
 
 ;; Quad word integer modes as mode attribute.
-(define_mode_attr QWI [(SI "TI") (DI "OI")])
+(define_mode_attr QPWI [(SI "TI") (DI "POI")])
 
 (define_insn_and_split "*addv<dwi>4_doubleword"
   [(set (reg:CCO FLAGS_REG)
 	(eq:CCO
-	  (plus:<QWI>
-	    (sign_extend:<QWI>
+	  (plus:<QPWI>
+	    (sign_extend:<QPWI>
 	      (match_operand:<DWI> 1 "nonimmediate_operand" "%0,0"))
-	    (sign_extend:<QWI>
+	    (sign_extend:<QPWI>
 	      (match_operand:<DWI> 2 "x86_64_hilo_general_operand" "r<di>,o")))
-	  (sign_extend:<QWI>
+	  (sign_extend:<QPWI>
 	    (plus:<DWI> (match_dup 1) (match_dup 2)))))
    (set (match_operand:<DWI> 0 "nonimmediate_operand" "=ro,r")
 	(plus:<DWI> (match_dup 1) (match_dup 2)))]
@@ -6172,11 +6175,11 @@ (define_insn_and_split "*addv<dwi>4_doub
 (define_insn_and_split "*addv<dwi>4_doubleword_1"
   [(set (reg:CCO FLAGS_REG)
 	(eq:CCO
-	  (plus:<QWI>
-	    (sign_extend:<QWI>
+	  (plus:<QPWI>
+	    (sign_extend:<QPWI>
 	      (match_operand:<DWI> 1 "nonimmediate_operand" "%0"))
-	    (match_operand:<QWI> 3 "const_scalar_int_operand" ""))
-	  (sign_extend:<QWI>
+	    (match_operand:<QPWI> 3 "const_scalar_int_operand" ""))
+	  (sign_extend:<QPWI>
 	    (plus:<DWI>
 	      (match_dup 1)
 	      (match_operand:<DWI> 2 "x86_64_hilo_general_operand" "<di>")))))
@@ -6570,11 +6573,11 @@ (define_insn "*subsi_2_zext"
 (define_expand "subv<mode>4"
   [(parallel [(set (reg:CCO FLAGS_REG)
 		   (eq:CCO
-		     (minus:<DWI>
-		       (sign_extend:<DWI>
+		     (minus:<DPWI>
+		       (sign_extend:<DPWI>
 			 (match_operand:SWIDWI 1 "nonimmediate_operand"))
 		       (match_dup 4))
-		     (sign_extend:<DWI>
+		     (sign_extend:<DPWI>
 		       (minus:SWIDWI (match_dup 1)
 				     (match_operand:SWIDWI 2
 						"<general_hilo_operand>")))))
@@ -6590,7 +6593,7 @@ (define_expand "subv<mode>4"
   if (CONST_SCALAR_INT_P (operands[2]))
     operands[4] = operands[2];
   else
-    operands[4] = gen_rtx_SIGN_EXTEND (<DWI>mode, operands[2]);
+    operands[4] = gen_rtx_SIGN_EXTEND (<DPWI>mode, operands[2]);
 })
 
 (define_insn "*subv<mode>4"
@@ -6637,12 +6640,12 @@ (define_insn "subv<mode>4_1"
 (define_insn_and_split "*subv<dwi>4_doubleword"
   [(set (reg:CCO FLAGS_REG)
 	(eq:CCO
-	  (minus:<QWI>
-	    (sign_extend:<QWI>
+	  (minus:<QPWI>
+	    (sign_extend:<QPWI>
 	      (match_operand:<DWI> 1 "nonimmediate_operand" "0,0"))
-	    (sign_extend:<QWI>
+	    (sign_extend:<QPWI>
 	      (match_operand:<DWI> 2 "x86_64_hilo_general_operand" "r<di>,o")))
-	  (sign_extend:<QWI>
+	  (sign_extend:<QPWI>
 	    (minus:<DWI> (match_dup 1) (match_dup 2)))))
    (set (match_operand:<DWI> 0 "nonimmediate_operand" "=ro,r")
 	(minus:<DWI> (match_dup 1) (match_dup 2)))]
@@ -6679,11 +6682,11 @@ (define_insn_and_split "*subv<dwi>4_doub
 (define_insn_and_split "*subv<dwi>4_doubleword_1"
   [(set (reg:CCO FLAGS_REG)
 	(eq:CCO
-	  (minus:<QWI>
-	    (sign_extend:<QWI>
+	  (minus:<QPWI>
+	    (sign_extend:<QPWI>
 	      (match_operand:<DWI> 1 "nonimmediate_operand" "0"))
-	    (match_operand:<QWI> 3 "const_scalar_int_operand" ""))
-	  (sign_extend:<QWI>
+	    (match_operand:<QPWI> 3 "const_scalar_int_operand" ""))
+	  (sign_extend:<QPWI>
 	    (minus:<DWI>
 	      (match_dup 1)
 	      (match_operand:<DWI> 2 "x86_64_hilo_general_operand" "<di>")))))
--- gcc/testsuite/gcc.dg/pr93376.c.jj	2020-01-22 14:04:57.955604949 +0100
+++ gcc/testsuite/gcc.dg/pr93376.c	2020-01-22 14:04:57.955604949 +0100
@@ -0,0 +1,20 @@
+/* PR target/93376 */
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-Og -finline-functions-called-once" } */
+
+unsigned a, b, c;
+
+int
+bar (int x)
+{
+  short s = __builtin_sub_overflow (~x, 0, &b);
+  a = __builtin_ffsll (~x);
+  return __builtin_add_overflow_p (-(unsigned __int128) a, s,
+				   (unsigned __int128) 0);
+}
+
+void
+foo (void)
+{
+  c = bar (0);
+}

Comments

Richard Sandiford Jan. 23, 2020, 9:14 a.m. UTC | #1
Jakub Jelinek <jakub@redhat.com> writes:
> Hi!
>
> The following testcase ICEs, because during try_combine of i3:
> (insn 18 17 19 2 (parallel [
>             (set (reg:CCO 17 flags)
>                 (eq:CCO (plus:OI (sign_extend:OI (reg:TI 96))
>                         (const_int 1 [0x1]))
>                     (sign_extend:OI (plus:TI (reg:TI 96)
>                             (const_int 1 [0x1])))))
>             (set (reg:TI 98)
>                 (plus:TI (reg:TI 96)
>                     (const_int 1 [0x1])))
>         ]) "pr93376.c":8:10 223 {*addvti4_doubleword_1}
>      (expr_list:REG_UNUSED (reg:TI 98)
>         (expr_list:REG_DEAD (reg:TI 96)
>             (nil))))
> and i2:
> (insn 17 37 18 2 (set (reg:TI 96)
>         (const_wide_int 0x7fffffffffffffffffffffffffffffff)) "pr93376.c":8:10 65 {*movti_internal}
>      (nil))
> the eq in there gets simplified into:
> (eq:CCO (const_wide_int 0x080000000000000000000000000000000)
>     (const_wide_int 0x80000000000000000000000000000000))
> and simplify-rtx.c tries to simplify it by simplifying MINUS
> of the two operands.
> Now, i386 defines MAX_BITSIZE_MODE_ANY_INT to 128, because OImode
> and XImode are used mainly as a placeholder for the vector modes;
> these new signed overflow patterns are an exception to that,
> but what they really need is just TImode precision + 1 (maybe 2 worst case)
> bits at any time.
>
> wide-int.h defines WIDE_INT_MAX_ELTS in a way that it contains one more
> HWI above number of HWIs to cover WIDE_INT_MAX_ELTS, so on i386 that is
> 3 HWIs, meaning that TImode precision + 1/2 bits is still representable in
> there.  Unfortunately, the way wi::sub_large is implemented, it needs
> not just those 3 HWIs, but one HWI above the maximum of the lengths of
> both operands, which means it buffer overflows, overwrites the following
> precision in wide_int_storage and ICEs later on.  The need for 4 HWIs is
> only temporary, because canonize immediately after it canonicalizes it
> back to 3 HWIs only.
>
> Attached are two patches, the first one is admittedly a hack, but could be
> considered an optimization too, simply before overwriting the last HWI
> ({add,sub}_large seem to be the only one that have such len++) perform
> a cheap check if canonize won't optimize it away immediately and in such
> case don't store it.  Yes, we are playing with fire because full OImode
> is 256 bits and we can't store that many, but we in the end only need 129
> bits.
>
> The other patch is something suggested by Richard S., avoid using OImode
> for this and instead use a partial int mode that is smaller.  This is still
> playing with fire because even the partial int mode is larger than
> MAX_BITSIZE_MODE_ANY_INT, but at least its precision fits into those 3
> WIDE_INT_MAX_ELTS wide-int.h uses.

I think we should increase MAX_BITSIZE_MODE_ANY_INT to the precision
of POImode at the same time, and make that precision less than 192
(191 maybe?).  That way everything is "correct" without increasing
the size of wide_int.

> The disadvantage of that approach is that it is more costly at compile
> time, various RA data structures contains number_of_modes^2 sized
> arrays, and RA initialization will want to compute at runtime various
> properties of each of the modes.

True, but if that's a real source of slow compilation in practice then
we should do something about it independently of this, since many mode
combinations make no sense.

If I've counted correctly, x86 already has 109 modes, so we're talking
about less than a 1% increase in O(modes) operations and less than a 1.9%
increase in O(modes^2) operations.

Thanks,
Richard

> I've tried to think about other ways how to represent signed overflow check
> in RTL, but didn't find any that would actually properly describe it and not
> be way too complicated for the patterns.
>
> So, my preference is the first patch, but Richard S. doesn't like that much.
>
> Both patches have been bootstrapped/regtested on x86_64-linux and
> i686-linux, ok for trunk (which one)?
>
> 	Jakub
Jakub Jelinek Jan. 23, 2020, 9:32 a.m. UTC | #2
On Thu, Jan 23, 2020 at 09:14:42AM +0000, Richard Sandiford wrote:
> > The other patch is something suggested by Richard S., avoid using OImode
> > for this and instead use a partial int mode that is smaller.  This is still
> > playing with fire because even the partial int mode is larger than
> > MAX_BITSIZE_MODE_ANY_INT, but at least its precision fits into those 3
> > WIDE_INT_MAX_ELTS wide-int.h uses.
> 
> I think we should increase MAX_BITSIZE_MODE_ANY_INT to the precision
> of POImode at the same time, and make that precision less than 192
> (191 maybe?).  That way everything is "correct" without increasing
> the size of wide_int.

The 192 was chosen just to be nice and round, I guess it could be just 130
or say 160 (128 + 32).

Uros, would you be ok with bumping MAX_BITSIZE_MODE_ANY_INT to 160
and changing the POImode patch to use 160 bits instead of 192
if it passes testing?

I guess it would still affect the sizes of the wide-int.cc arrays:
  unsigned HOST_HALF_WIDE_INT
    u[4 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_HALF_WIDE_INT];
  unsigned HOST_HALF_WIDE_INT
    v[4 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_HALF_WIDE_INT];
  /* The '2' in 'R' is because we are internally doing a full
     multiply.  */
  unsigned HOST_HALF_WIDE_INT
    r[2 * 4 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_HALF_WIDE_INT];
so if we didn't want that, we would need to use 132; but those
are weird anyway, because they round down rather than up.

	Jakub
Uros Bizjak Jan. 23, 2020, 9:38 a.m. UTC | #3
On Thu, Jan 23, 2020 at 10:33 AM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Thu, Jan 23, 2020 at 09:14:42AM +0000, Richard Sandiford wrote:
> > > The other patch is something suggested by Richard S., avoid using OImode
> > > for this and instead use a partial int mode that is smaller.  This is still
> > > playing with fire because even the partial int mode is larger than
> > > MAX_BITSIZE_MODE_ANY_INT, but at least its precision fits into those 3
> > > WIDE_INT_MAX_ELTS wide-int.h uses.
> >
> > I think we should increase MAX_BITSIZE_MODE_ANY_INT to the precision
> > of POImode at the same time, and make that precision less than 192
> > (191 maybe?).  That way everything is "correct" without increasing
> > the size of wide_int.
>
> The 192 was chosen just to be nice and round, I guess it could be just 130
> or say 160 (128 + 32).
>
> Uros, would you be ok with bumping MAX_BITSIZE_MODE_ANY_INT to 160
> and changing the POImode patch to use 160 bits instead of 192
> if it passes testing?

We can try 160 and hope that nothing breaks due to "weird" number.

Uros.

> I guess it would still affect the sizes of the wide-int.cc arrays:
>   unsigned HOST_HALF_WIDE_INT
>     u[4 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_HALF_WIDE_INT];
>   unsigned HOST_HALF_WIDE_INT
>     v[4 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_HALF_WIDE_INT];
>   /* The '2' in 'R' is because we are internally doing a full
>      multiply.  */
>   unsigned HOST_HALF_WIDE_INT
>     r[2 * 4 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_HALF_WIDE_INT];
> so if we didn't want that, we would need to use 132; but those
> are weird anyway, because they round down rather than up.
>
>         Jakub
>
Jakub Jelinek Jan. 23, 2020, 1:17 p.m. UTC | #4
On Thu, Jan 23, 2020 at 10:38:31AM +0100, Uros Bizjak wrote:
> On Thu, Jan 23, 2020 at 10:33 AM Jakub Jelinek <jakub@redhat.com> wrote:
> >
> > On Thu, Jan 23, 2020 at 09:14:42AM +0000, Richard Sandiford wrote:
> > > > The other patch is something suggested by Richard S., avoid using OImode
> > > > for this and instead use a partial int mode that is smaller.  This is still
> > > > playing with fire because even the partial int mode is larger than
> > > > MAX_BITSIZE_MODE_ANY_INT, but at least its precision fits into those 3
> > > > WIDE_INT_MAX_ELTS wide-int.h uses.
> > >
> > > I think we should increase MAX_BITSIZE_MODE_ANY_INT to the precision
> > > of POImode at the same time, and make that precision less than 192
> > > (191 maybe?).  That way everything is "correct" without increasing
> > > the size of wide_int.
> >
> > The 192 was chosen just to be nice and round, I guess it could be just 130
> > or say 160 (128 + 32).
> >
> > Uros, would you be ok with bumping MAX_BITSIZE_MODE_ANY_INT to 160
> > and changing the POImode patch to use 160 bits instead of 192
> > if it passes testing?
> 
> We can try 160 and hope that nothing breaks due to "weird" number.

Following passed bootstrap/regtest on x86_64-linux and i686-linux.

Ok for trunk then?

2020-01-23  Jakub Jelinek  <jakub@redhat.com>

	PR target/93376
	* config/i386/i386-modes.def (POImode): New mode.
	(MAX_BITSIZE_MODE_ANY_INT): Change from 128 to 160.
	* config/i386/i386.md (DPWI): New mode attribute.
	(addv<mode>4, subv<mode>4): Use <DPWI> instead of <DWI>.
	(QWI): Rename to...
	(QPWI): ... this.  Use POI instead of OI for TImode.
	(*addv<dwi>4_doubleword, *addv<dwi>4_doubleword_1,
	*subv<dwi>4_doubleword, *subv<dwi>4_doubleword_1): Use <QPWI>
	instead of <QWI>.

	* gcc.dg/pr93376.c: New test.

--- gcc/config/i386/i386-modes.def.jj	2020-01-22 16:10:29.812837184 +0100
+++ gcc/config/i386/i386-modes.def	2020-01-23 11:43:37.931345071 +0100
@@ -107,10 +107,19 @@ INT_MODE (XI, 64);
 PARTIAL_INT_MODE (HI, 16, P2QI);
 PARTIAL_INT_MODE (SI, 32, P2HI);
 
+/* Mode used for signed overflow checking of TImode.  As
+   MAX_BITSIZE_MODE_ANY_INT is only 160, wide-int.h reserves only that
+   rounded up to multiple of HOST_BITS_PER_WIDE_INT bits in wide_int etc.,
+   so OImode is too large.  For the overflow checking we actually need
+   just 1 or 2 bits beyond TImode precision.  Use 160 bits to have
+   a multiple of 32.  */
+PARTIAL_INT_MODE (OI, 160, POI);
+
 /* Keep the OI and XI modes from confusing the compiler into thinking
    that these modes could actually be used for computation.  They are
-   only holders for vectors during data movement.  */
-#define MAX_BITSIZE_MODE_ANY_INT (128)
+   only holders for vectors during data movement.  Include POImode precision
+   though.  */
+#define MAX_BITSIZE_MODE_ANY_INT (160)
 
 /* The symbol Pmode stands for one of the above machine modes (usually SImode).
    The tm.h file specifies which one.  It is not a distinct mode.  */
--- gcc/config/i386/i386.md.jj	2020-01-22 16:10:29.815837139 +0100
+++ gcc/config/i386/i386.md	2020-01-23 11:40:54.923822116 +0100
@@ -6054,15 +6054,18 @@ (define_insn "*addqi_ext_2"
   [(set_attr "type" "alu")
    (set_attr "mode" "QI")])
 
+;; Like DWI, but use POImode instead of OImode.
+(define_mode_attr DPWI [(QI "HI") (HI "SI") (SI "DI") (DI "TI") (TI "POI")])
+
 ;; Add with jump on overflow.
 (define_expand "addv<mode>4"
   [(parallel [(set (reg:CCO FLAGS_REG)
 		   (eq:CCO
-		     (plus:<DWI>
-		       (sign_extend:<DWI>
+		     (plus:<DPWI>
+		       (sign_extend:<DPWI>
 			 (match_operand:SWIDWI 1 "nonimmediate_operand"))
 		       (match_dup 4))
-			 (sign_extend:<DWI>
+			 (sign_extend:<DPWI>
 			   (plus:SWIDWI (match_dup 1)
 			     (match_operand:SWIDWI 2
 			       "<general_hilo_operand>")))))
@@ -6078,7 +6081,7 @@ (define_expand "addv<mode>4"
   if (CONST_SCALAR_INT_P (operands[2]))
     operands[4] = operands[2];
   else
-    operands[4] = gen_rtx_SIGN_EXTEND (<DWI>mode, operands[2]);
+    operands[4] = gen_rtx_SIGN_EXTEND (<DPWI>mode, operands[2]);
 })
 
 (define_insn "*addv<mode>4"
@@ -6123,17 +6126,17 @@ (define_insn "addv<mode>4_1"
 	      (const_string "<MODE_SIZE>")))])
 
 ;; Quad word integer modes as mode attribute.
-(define_mode_attr QWI [(SI "TI") (DI "OI")])
+(define_mode_attr QPWI [(SI "TI") (DI "POI")])
 
 (define_insn_and_split "*addv<dwi>4_doubleword"
   [(set (reg:CCO FLAGS_REG)
 	(eq:CCO
-	  (plus:<QWI>
-	    (sign_extend:<QWI>
+	  (plus:<QPWI>
+	    (sign_extend:<QPWI>
 	      (match_operand:<DWI> 1 "nonimmediate_operand" "%0,0"))
-	    (sign_extend:<QWI>
+	    (sign_extend:<QPWI>
 	      (match_operand:<DWI> 2 "x86_64_hilo_general_operand" "r<di>,o")))
-	  (sign_extend:<QWI>
+	  (sign_extend:<QPWI>
 	    (plus:<DWI> (match_dup 1) (match_dup 2)))))
    (set (match_operand:<DWI> 0 "nonimmediate_operand" "=ro,r")
 	(plus:<DWI> (match_dup 1) (match_dup 2)))]
@@ -6172,11 +6175,11 @@ (define_insn_and_split "*addv<dwi>4_doub
 (define_insn_and_split "*addv<dwi>4_doubleword_1"
   [(set (reg:CCO FLAGS_REG)
 	(eq:CCO
-	  (plus:<QWI>
-	    (sign_extend:<QWI>
+	  (plus:<QPWI>
+	    (sign_extend:<QPWI>
 	      (match_operand:<DWI> 1 "nonimmediate_operand" "%0"))
-	    (match_operand:<QWI> 3 "const_scalar_int_operand" ""))
-	  (sign_extend:<QWI>
+	    (match_operand:<QPWI> 3 "const_scalar_int_operand" ""))
+	  (sign_extend:<QPWI>
 	    (plus:<DWI>
 	      (match_dup 1)
 	      (match_operand:<DWI> 2 "x86_64_hilo_general_operand" "<di>")))))
@@ -6570,11 +6573,11 @@ (define_insn "*subsi_2_zext"
 (define_expand "subv<mode>4"
   [(parallel [(set (reg:CCO FLAGS_REG)
 		   (eq:CCO
-		     (minus:<DWI>
-		       (sign_extend:<DWI>
+		     (minus:<DPWI>
+		       (sign_extend:<DPWI>
 			 (match_operand:SWIDWI 1 "nonimmediate_operand"))
 		       (match_dup 4))
-		     (sign_extend:<DWI>
+		     (sign_extend:<DPWI>
 		       (minus:SWIDWI (match_dup 1)
 				     (match_operand:SWIDWI 2
 						"<general_hilo_operand>")))))
@@ -6590,7 +6593,7 @@ (define_expand "subv<mode>4"
   if (CONST_SCALAR_INT_P (operands[2]))
     operands[4] = operands[2];
   else
-    operands[4] = gen_rtx_SIGN_EXTEND (<DWI>mode, operands[2]);
+    operands[4] = gen_rtx_SIGN_EXTEND (<DPWI>mode, operands[2]);
 })
 
 (define_insn "*subv<mode>4"
@@ -6637,12 +6640,12 @@ (define_insn "subv<mode>4_1"
 (define_insn_and_split "*subv<dwi>4_doubleword"
   [(set (reg:CCO FLAGS_REG)
 	(eq:CCO
-	  (minus:<QWI>
-	    (sign_extend:<QWI>
+	  (minus:<QPWI>
+	    (sign_extend:<QPWI>
 	      (match_operand:<DWI> 1 "nonimmediate_operand" "0,0"))
-	    (sign_extend:<QWI>
+	    (sign_extend:<QPWI>
 	      (match_operand:<DWI> 2 "x86_64_hilo_general_operand" "r<di>,o")))
-	  (sign_extend:<QWI>
+	  (sign_extend:<QPWI>
 	    (minus:<DWI> (match_dup 1) (match_dup 2)))))
    (set (match_operand:<DWI> 0 "nonimmediate_operand" "=ro,r")
 	(minus:<DWI> (match_dup 1) (match_dup 2)))]
@@ -6679,11 +6682,11 @@ (define_insn_and_split "*subv<dwi>4_doub
 (define_insn_and_split "*subv<dwi>4_doubleword_1"
   [(set (reg:CCO FLAGS_REG)
 	(eq:CCO
-	  (minus:<QWI>
-	    (sign_extend:<QWI>
+	  (minus:<QPWI>
+	    (sign_extend:<QPWI>
 	      (match_operand:<DWI> 1 "nonimmediate_operand" "0"))
-	    (match_operand:<QWI> 3 "const_scalar_int_operand" ""))
-	  (sign_extend:<QWI>
+	    (match_operand:<QPWI> 3 "const_scalar_int_operand" ""))
+	  (sign_extend:<QPWI>
 	    (minus:<DWI>
 	      (match_dup 1)
 	      (match_operand:<DWI> 2 "x86_64_hilo_general_operand" "<di>")))))
--- gcc/testsuite/gcc.dg/pr93376.c.jj	2020-01-23 11:40:54.923822116 +0100
+++ gcc/testsuite/gcc.dg/pr93376.c	2020-01-23 11:40:54.923822116 +0100
@@ -0,0 +1,20 @@
+/* PR target/93376 */
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-Og -finline-functions-called-once" } */
+
+unsigned a, b, c;
+
+int
+bar (int x)
+{
+  short s = __builtin_sub_overflow (~x, 0, &b);
+  a = __builtin_ffsll (~x);
+  return __builtin_add_overflow_p (-(unsigned __int128) a, s,
+				   (unsigned __int128) 0);
+}
+
+void
+foo (void)
+{
+  c = bar (0);
+}


	Jakub
Uros Bizjak Jan. 23, 2020, 3:02 p.m. UTC | #5
On Thu, Jan 23, 2020 at 2:17 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Thu, Jan 23, 2020 at 10:38:31AM +0100, Uros Bizjak wrote:
> > On Thu, Jan 23, 2020 at 10:33 AM Jakub Jelinek <jakub@redhat.com> wrote:
> > >
> > > On Thu, Jan 23, 2020 at 09:14:42AM +0000, Richard Sandiford wrote:
> > > > > The other patch is something suggested by Richard S., avoid using OImode
> > > > > for this and instead use a partial int mode that is smaller.  This is still
> > > > > playing with fire because even the partial int mode is larger than
> > > > > MAX_BITSIZE_MODE_ANY_INT, but at least its precision fits into those 3
> > > > > WIDE_INT_MAX_ELTS wide-int.h uses.
> > > >
> > > > I think we should increase MAX_BITSIZE_MODE_ANY_INT to the precision
> > > > of POImode at the same time, and make that precision less than 192
> > > > (191 maybe?).  That way everything is "correct" without increasing
> > > > the size of wide_int.
> > >
> > > The 192 was chosen just to be nice and round, I guess it could be just 130
> > > or say 160 (128 + 32).
> > >
> > > Uros, would you be ok with bumping MAX_BITSIZE_MODE_ANY_INT to 160
> > > and changing the POImode patch to use 160 bits instead of 192
> > > if it passes testing?
> >
> > We can try 160 and hope that nothing breaks due to "weird" number.
>
> Following passed bootstrap/regtest on x86_64-linux and i686-linux.
>
> Ok for trunk then?
>
> 2020-01-23  Jakub Jelinek  <jakub@redhat.com>
>
>         PR target/93376
>         * config/i386/i386-modes.def (POImode): New mode.
>         (MAX_BITSIZE_MODE_ANY_INT): Change from 128 to 160.
>         * config/i386/i386.md (DPWI): New mode attribute.
>         (addv<mode>4, subv<mode>4): Use <DPWI> instead of <DWI>.
>         (QWI): Rename to...
>         (QPWI): ... this.  Use POI instead of OI for TImode.
>         (*addv<dwi>4_doubleword, *addv<dwi>4_doubleword_1,
>         *subv<dwi>4_doubleword, *subv<dwi>4_doubleword_1): Use <QPWI>
>         instead of <QWI>.
>
>         * gcc.dg/pr93376.c: New test.

LGTM.

Thanks,
Uros.

> --- gcc/config/i386/i386-modes.def.jj   2020-01-22 16:10:29.812837184 +0100
> +++ gcc/config/i386/i386-modes.def      2020-01-23 11:43:37.931345071 +0100
> @@ -107,10 +107,19 @@ INT_MODE (XI, 64);
>  PARTIAL_INT_MODE (HI, 16, P2QI);
>  PARTIAL_INT_MODE (SI, 32, P2HI);
>
> +/* Mode used for signed overflow checking of TImode.  As
> +   MAX_BITSIZE_MODE_ANY_INT is only 160, wide-int.h reserves only that
> +   rounded up to multiple of HOST_BITS_PER_WIDE_INT bits in wide_int etc.,
> +   so OImode is too large.  For the overflow checking we actually need
> +   just 1 or 2 bits beyond TImode precision.  Use 160 bits to have
> +   a multiple of 32.  */
> +PARTIAL_INT_MODE (OI, 160, POI);
> +
>  /* Keep the OI and XI modes from confusing the compiler into thinking
>     that these modes could actually be used for computation.  They are
> -   only holders for vectors during data movement.  */
> -#define MAX_BITSIZE_MODE_ANY_INT (128)
> +   only holders for vectors during data movement.  Include POImode precision
> +   though.  */
> +#define MAX_BITSIZE_MODE_ANY_INT (160)
>
>  /* The symbol Pmode stands for one of the above machine modes (usually SImode).
>     The tm.h file specifies which one.  It is not a distinct mode.  */
> --- gcc/config/i386/i386.md.jj  2020-01-22 16:10:29.815837139 +0100
> +++ gcc/config/i386/i386.md     2020-01-23 11:40:54.923822116 +0100
> @@ -6054,15 +6054,18 @@ (define_insn "*addqi_ext_2"
>    [(set_attr "type" "alu")
>     (set_attr "mode" "QI")])
>
> +;; Like DWI, but use POImode instead of OImode.
> +(define_mode_attr DPWI [(QI "HI") (HI "SI") (SI "DI") (DI "TI") (TI "POI")])
> +
>  ;; Add with jump on overflow.
>  (define_expand "addv<mode>4"
>    [(parallel [(set (reg:CCO FLAGS_REG)
>                    (eq:CCO
> -                    (plus:<DWI>
> -                      (sign_extend:<DWI>
> +                    (plus:<DPWI>
> +                      (sign_extend:<DPWI>
>                          (match_operand:SWIDWI 1 "nonimmediate_operand"))
>                        (match_dup 4))
> -                        (sign_extend:<DWI>
> +                        (sign_extend:<DPWI>
>                            (plus:SWIDWI (match_dup 1)
>                              (match_operand:SWIDWI 2
>                                "<general_hilo_operand>")))))
> @@ -6078,7 +6081,7 @@ (define_expand "addv<mode>4"
>    if (CONST_SCALAR_INT_P (operands[2]))
>      operands[4] = operands[2];
>    else
> -    operands[4] = gen_rtx_SIGN_EXTEND (<DWI>mode, operands[2]);
> +    operands[4] = gen_rtx_SIGN_EXTEND (<DPWI>mode, operands[2]);
>  })
>
>  (define_insn "*addv<mode>4"
> @@ -6123,17 +6126,17 @@ (define_insn "addv<mode>4_1"
>               (const_string "<MODE_SIZE>")))])
>
>  ;; Quad word integer modes as mode attribute.
> -(define_mode_attr QWI [(SI "TI") (DI "OI")])
> +(define_mode_attr QPWI [(SI "TI") (DI "POI")])
>
>  (define_insn_and_split "*addv<dwi>4_doubleword"
>    [(set (reg:CCO FLAGS_REG)
>         (eq:CCO
> -         (plus:<QWI>
> -           (sign_extend:<QWI>
> +         (plus:<QPWI>
> +           (sign_extend:<QPWI>
>               (match_operand:<DWI> 1 "nonimmediate_operand" "%0,0"))
> -           (sign_extend:<QWI>
> +           (sign_extend:<QPWI>
>               (match_operand:<DWI> 2 "x86_64_hilo_general_operand" "r<di>,o")))
> -         (sign_extend:<QWI>
> +         (sign_extend:<QPWI>
>             (plus:<DWI> (match_dup 1) (match_dup 2)))))
>     (set (match_operand:<DWI> 0 "nonimmediate_operand" "=ro,r")
>         (plus:<DWI> (match_dup 1) (match_dup 2)))]
> @@ -6172,11 +6175,11 @@ (define_insn_and_split "*addv<dwi>4_doub
>  (define_insn_and_split "*addv<dwi>4_doubleword_1"
>    [(set (reg:CCO FLAGS_REG)
>         (eq:CCO
> -         (plus:<QWI>
> -           (sign_extend:<QWI>
> +         (plus:<QPWI>
> +           (sign_extend:<QPWI>
>               (match_operand:<DWI> 1 "nonimmediate_operand" "%0"))
> -           (match_operand:<QWI> 3 "const_scalar_int_operand" ""))
> -         (sign_extend:<QWI>
> +           (match_operand:<QPWI> 3 "const_scalar_int_operand" ""))
> +         (sign_extend:<QPWI>
>             (plus:<DWI>
>               (match_dup 1)
>               (match_operand:<DWI> 2 "x86_64_hilo_general_operand" "<di>")))))
> @@ -6570,11 +6573,11 @@ (define_insn "*subsi_2_zext"
>  (define_expand "subv<mode>4"
>    [(parallel [(set (reg:CCO FLAGS_REG)
>                    (eq:CCO
> -                    (minus:<DWI>
> -                      (sign_extend:<DWI>
> +                    (minus:<DPWI>
> +                      (sign_extend:<DPWI>
>                          (match_operand:SWIDWI 1 "nonimmediate_operand"))
>                        (match_dup 4))
> -                    (sign_extend:<DWI>
> +                    (sign_extend:<DPWI>
>                        (minus:SWIDWI (match_dup 1)
>                                      (match_operand:SWIDWI 2
>                                                 "<general_hilo_operand>")))))
> @@ -6590,7 +6593,7 @@ (define_expand "subv<mode>4"
>    if (CONST_SCALAR_INT_P (operands[2]))
>      operands[4] = operands[2];
>    else
> -    operands[4] = gen_rtx_SIGN_EXTEND (<DWI>mode, operands[2]);
> +    operands[4] = gen_rtx_SIGN_EXTEND (<DPWI>mode, operands[2]);
>  })
>
>  (define_insn "*subv<mode>4"
> @@ -6637,12 +6640,12 @@ (define_insn "subv<mode>4_1"
>  (define_insn_and_split "*subv<dwi>4_doubleword"
>    [(set (reg:CCO FLAGS_REG)
>         (eq:CCO
> -         (minus:<QWI>
> -           (sign_extend:<QWI>
> +         (minus:<QPWI>
> +           (sign_extend:<QPWI>
>               (match_operand:<DWI> 1 "nonimmediate_operand" "0,0"))
> -           (sign_extend:<QWI>
> +           (sign_extend:<QPWI>
>               (match_operand:<DWI> 2 "x86_64_hilo_general_operand" "r<di>,o")))
> -         (sign_extend:<QWI>
> +         (sign_extend:<QPWI>
>             (minus:<DWI> (match_dup 1) (match_dup 2)))))
>     (set (match_operand:<DWI> 0 "nonimmediate_operand" "=ro,r")
>         (minus:<DWI> (match_dup 1) (match_dup 2)))]
> @@ -6679,11 +6682,11 @@ (define_insn_and_split "*subv<dwi>4_doub
>  (define_insn_and_split "*subv<dwi>4_doubleword_1"
>    [(set (reg:CCO FLAGS_REG)
>         (eq:CCO
> -         (minus:<QWI>
> -           (sign_extend:<QWI>
> +         (minus:<QPWI>
> +           (sign_extend:<QPWI>
>               (match_operand:<DWI> 1 "nonimmediate_operand" "0"))
> -           (match_operand:<QWI> 3 "const_scalar_int_operand" ""))
> -         (sign_extend:<QWI>
> +           (match_operand:<QPWI> 3 "const_scalar_int_operand" ""))
> +         (sign_extend:<QPWI>
>             (minus:<DWI>
>               (match_dup 1)
>               (match_operand:<DWI> 2 "x86_64_hilo_general_operand" "<di>")))))
> --- gcc/testsuite/gcc.dg/pr93376.c.jj   2020-01-23 11:40:54.923822116 +0100
> +++ gcc/testsuite/gcc.dg/pr93376.c      2020-01-23 11:40:54.923822116 +0100
> @@ -0,0 +1,20 @@
> +/* PR target/93376 */
> +/* { dg-do compile { target int128 } } */
> +/* { dg-options "-Og -finline-functions-called-once" } */
> +
> +unsigned a, b, c;
> +
> +int
> +bar (int x)
> +{
> +  short s = __builtin_sub_overflow (~x, 0, &b);
> +  a = __builtin_ffsll (~x);
> +  return __builtin_add_overflow_p (-(unsigned __int128) a, s,
> +                                  (unsigned __int128) 0);
> +}
> +
> +void
> +foo (void)
> +{
> +  c = bar (0);
> +}
>
>
>         Jakub
>

Patch
diff mbox series

--- gcc/wide-int.cc.jj	2020-01-12 11:54:38.535381394 +0100
+++ gcc/wide-int.cc	2020-01-22 12:05:06.739486117 +0100
@@ -1155,8 +1155,14 @@  wi::add_large (HOST_WIDE_INT *val, const
 
   if (len * HOST_BITS_PER_WIDE_INT < prec)
     {
-      val[len] = mask0 + mask1 + carry;
-      len++;
+      HOST_WIDE_INT val_top = mask0 + mask1 + carry;
+      /* Don't extend len unnecessarily when canonize would shrink it
+	 again immediately.  */
+      if (SIGN_MASK (val[len - 1]) != val_top)
+	{
+	  val[len] = val_top;
+	  len++;
+	}
       if (overflow)
 	*overflow
 	  = (sgn == UNSIGNED && carry) ? wi::OVF_OVERFLOW : wi::OVF_NONE;
@@ -1566,8 +1572,14 @@  wi::sub_large (HOST_WIDE_INT *val, const
 
   if (len * HOST_BITS_PER_WIDE_INT < prec)
     {
-      val[len] = mask0 - mask1 - borrow;
-      len++;
+      HOST_WIDE_INT val_top = mask0 - mask1 - borrow;
+      /* Don't extend len unnecessarily when canonize would shrink it
+	 again immediately.  */
+      if (SIGN_MASK (val[len - 1]) != val_top)
+	{
+	  val[len] = val_top;
+	  len++;
+	}
       if (overflow)
 	*overflow = (sgn == UNSIGNED && borrow) ? OVF_UNDERFLOW : OVF_NONE;
     }
--- gcc/testsuite/gcc.dg/pr93376.c.jj	2020-01-22 12:16:40.231937796 +0100
+++ gcc/testsuite/gcc.dg/pr93376.c	2020-01-22 12:16:23.953190057 +0100
@@ -0,0 +1,20 @@ 
+/* PR target/93376 */
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-Og -finline-functions-called-once" } */
+
+unsigned a, b, c;
+
+int
+bar (int x)
+{
+  short s = __builtin_sub_overflow (~x, 0, &b);
+  a = __builtin_ffsll (~x);
+  return __builtin_add_overflow_p (-(unsigned __int128) a, s,
+				   (unsigned __int128) 0);
+}
+
+void
+foo (void)
+{
+  c = bar (0);
+}