
Add COMPLEX_VECTOR_INT modes

Message ID 77479c6f-5ce4-12fa-f429-c49ffbff3542@codesourcery.com

Commit Message

Andrew Stubbs May 26, 2023, 2:34 p.m. UTC
Hi all,

I want to implement a vector DIVMOD libfunc for amdgcn, but I can't just 
do it because the GCC middle-end models DIVMOD's return value as 
"complex int" type, and there are no vector equivalents of that type.

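For concreteness, the case I care about is a vectorized loop that computes 
both the quotient and the remainder of the same operands; the middle-end 
combines the two operations into a single DIVMOD call.  An illustrative 
example:

  void
  f (int *restrict q, int *restrict r,
     int *restrict a, int *restrict b, int n)
  {
    for (int i = 0; i < n; i++)
      {
	q[i] = a[i] / b[i];
	r[i] = a[i] % b[i];
      }
  }
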
Therefore, this patch adds minimal support for "complex vector int" 
modes.  I have not attempted to provide any means to use these modes 
from C, so they're really only useful for DIVMOD.  The actual libfunc 
implementation will pack the data into wider vector modes manually.

A knock-on effect of this is that I needed to increase the range of 
"mode_unit_size" (several of the vector modes supported by amdgcn exceed 
the previous 255-byte limit).

Since this change would add a large number of new, unused modes to many 
architectures, I have elected to *not* enable them, by default, in 
machmode.def (where the other complex modes are created).  The new modes 
are therefore inactive on all architectures but amdgcn, for now.
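
For reference, the target-side enabling then amounts to a one-line change 
in gcn-modes.def, along these lines (a sketch only; see the ChangeLog 
below for the actual files touched):

  /* Create complex modes for all of the vector-int modes.  */
  COMPLEX_MODES (VECTOR_INT);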

OK for mainline?  (I've not done a full test yet, but I will.)

Thanks

Andrew
Add COMPLEX_VECTOR_INT modes for amdgcn

This enables only minimal support for complex types containing integer
vectors with the intention of allowing vectorized divmod libfunc operations
(these return a pair of integers modelled as a complex number).

There's no way to declare variables of this mode in the front-end, and no
attempt to support it everywhere that complex modes can exist; the only
use-case, at present, is the implicit use by divmod calls generated by
the middle-end.

In order to prevent unexpected problems with other architectures these
modes are only enabled for amdgcn.

gcc/ChangeLog:

	* config/gcn/gcn-modes.def: Initialize COMPLEX_VECTOR_INT modes.
	* genmodes.cc (complex_class): Support MODE_COMPLEX_VECTOR_INT.
	(complete_mode): Likewise.
	(emit_mode_unit_size): Upgrade mode_unit_size type to short.
	(emit_mode_adjustments): Support MODE_COMPLEX_VECTOR_INT.
	* machmode.def: Mention MODE_COMPLEX_VECTOR_INT.
	* machmode.h (mode_to_unit_size): Upgrade type to short.
	* mode-classes.def: Add MODE_COMPLEX_VECTOR_INT.
	* stor-layout.cc (int_mode_for_mode): Support MODE_COMPLEX_VECTOR_INT.
	* tree.cc (build_complex_type): Allow VECTOR_INTEGER_TYPE_P.

Comments

Richard Biener May 30, 2023, 6:26 a.m. UTC | #1
On Fri, May 26, 2023 at 4:35 PM Andrew Stubbs <ams@codesourcery.com> wrote:
>
> Hi all,
>
> I want to implement a vector DIVMOD libfunc for amdgcn, but I can't just
> do it because the GCC middle-end models DIVMOD's return value as
> "complex int" type, and there are no vector equivalents of that type.
>
> Therefore, this patch adds minimal support for "complex vector int"
> modes.  I have not attempted to provide any means to use these modes
> from C, so they're really only useful for DIVMOD.  The actual libfunc
> implementation will pack the data into wider vector modes manually.
>
> A knock-on effect of this is that I needed to increase the range of
> "mode_unit_size" (several of the vector modes supported by amdgcn exceed
> the previous 255-byte limit).
>
> Since this change would add a large number of new, unused modes to many
> architectures, I have elected to *not* enable them, by default, in
> machmode.def (where the other complex modes are created).  The new modes
> are therefore inactive on all architectures but amdgcn, for now.
>
> OK for mainline?  (I've not done a full test yet, but I will.)

I think it makes more sense to map vector CSImode to vector SImode with
double the number of lanes.  In fact, since divmod is a libgcc function
I wonder where your vector variant would reside and how GCC decides to
emit calls to it?  That is, there's no way to OMP simd declare this function?

Richard.

> Thanks
>
> Andrew
Richard Sandiford May 31, 2023, 1:14 p.m. UTC | #2
Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> On Fri, May 26, 2023 at 4:35 PM Andrew Stubbs <ams@codesourcery.com> wrote:
>>
>> Hi all,
>>
>> I want to implement a vector DIVMOD libfunc for amdgcn, but I can't just
>> do it because the GCC middle-end models DIVMOD's return value as
>> "complex int" type, and there are no vector equivalents of that type.
>>
>> Therefore, this patch adds minimal support for "complex vector int"
>> modes.  I have not attempted to provide any means to use these modes
>> from C, so they're really only useful for DIVMOD.  The actual libfunc
>> implementation will pack the data into wider vector modes manually.
>>
>> A knock-on effect of this is that I needed to increase the range of
>> "mode_unit_size" (several of the vector modes supported by amdgcn exceed
>> the previous 255-byte limit).
>>
>> Since this change would add a large number of new, unused modes to many
>> architectures, I have elected to *not* enable them, by default, in
>> machmode.def (where the other complex modes are created).  The new modes
>> are therefore inactive on all architectures but amdgcn, for now.
>>
>> OK for mainline?  (I've not done a full test yet, but I will.)
>
> I think it makes more sense to map vector CSImode to vector SImode with
> the double number of lanes.

Agreed FWIW.  This is effectively what AArch64 now does for x2, x3 and
x4 tuple types (where x2 is often used for complex values).

Thanks,
Richard
Andrew Stubbs June 5, 2023, 1:49 p.m. UTC | #3
On 30/05/2023 07:26, Richard Biener wrote:
> On Fri, May 26, 2023 at 4:35 PM Andrew Stubbs <ams@codesourcery.com> wrote:
>>
>> Hi all,
>>
>> I want to implement a vector DIVMOD libfunc for amdgcn, but I can't just
>> do it because the GCC middle-end models DIVMOD's return value as
>> "complex int" type, and there are no vector equivalents of that type.
>>
>> Therefore, this patch adds minimal support for "complex vector int"
>> modes.  I have not attempted to provide any means to use these modes
>> from C, so they're really only useful for DIVMOD.  The actual libfunc
>> implementation will pack the data into wider vector modes manually.
>>
>> A knock-on effect of this is that I needed to increase the range of
>> "mode_unit_size" (several of the vector modes supported by amdgcn exceed
>> the previous 255-byte limit).
>>
>> Since this change would add a large number of new, unused modes to many
>> architectures, I have elected to *not* enable them, by default, in
>> machmode.def (where the other complex modes are created).  The new modes
>> are therefore inactive on all architectures but amdgcn, for now.
>>
>> OK for mainline?  (I've not done a full test yet, but I will.)
> 
> I think it makes more sense to map vector CSImode to vector SImode with
> the double number of lanes.  In fact since divmod is a libgcc function
> I wonder where your vector variant would reside and how GCC decides to
> emit calls to it?  That is, there's no way to OMP simd declare this function?

The divmod implementation lives in libgcc. It's not too difficult to 
write using vector extensions and some asm tricks. I did try an OMP simd 
declare implementation, but it didn't vectorize well, and that's a yak 
I don't wish to shave right now.
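
For reference, the sort of declaration in question would look something 
like this (an illustrative sketch only, with made-up names; this is not 
the libgcc code):

  #pragma omp declare simd
  int
  divmod_simd (int x, int y, int *rem)
  {
    *rem = x % y;
    return x / y;
  }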

In any case, the OMP simd declare will not help us here, directly, 
because the DIVMOD transformation happens too late in the pass pipeline, 
long after ifcvt and vect. My implementation (not yet posted) uses a 
libfunc and the TARGET_EXPAND_DIVMOD_LIBFUNC hook in the standard way. 
It just needs the complex vector modes to exist.

Using vectors twice the length is also problematic. If I create a new 
V128SImode that spans two 64-lane vector registers then that will 
probably have the desired effect ("real" quotient in v8, "imaginary" 
remainder in v9). But if I use V64SImode to represent two V32SImode 
vectors then that's a one-register mode, and I'd have to use a 
permutation (a memory operation) to extract lanes 32-63 into lanes 0-31. 
And if we ever want to implement instructions that operate on these 
modes (as opposed to the odd/even add/sub complex patterns we have now) 
then the masking will be all broken and we'd need to constantly 
disassemble the double-length vectors to operate on them.

The implementation I proposed is essentially a struct containing two 
vectors placed in consecutive registers. This is the natural 
representation for the architecture.

Anyway, you don't like this patch and I see that AArch64 is picking 
apart BLKmode to see if there's complex inside, so maybe I can make 
something like that work here? AArch64 doesn't seem to use 
TARGET_EXPAND_DIVMOD_LIBFUNC though, and I'm pretty sure the problem I 
was trying to solve was in the way the expand pass handles the BLKmode 
complex, outside the control of the backend hook (I'm still paging this 
stuff back in, post-vacation).

Thanks

Andrew
Richard Biener June 6, 2023, 6:40 a.m. UTC | #4
On Mon, Jun 5, 2023 at 3:49 PM Andrew Stubbs <ams@codesourcery.com> wrote:
>
> On 30/05/2023 07:26, Richard Biener wrote:
> > On Fri, May 26, 2023 at 4:35 PM Andrew Stubbs <ams@codesourcery.com> wrote:
> >>
> >> Hi all,
> >>
> >> I want to implement a vector DIVMOD libfunc for amdgcn, but I can't just
> >> do it because the GCC middle-end models DIVMOD's return value as
> >> "complex int" type, and there are no vector equivalents of that type.
> >>
> >> Therefore, this patch adds minimal support for "complex vector int"
> >> modes.  I have not attempted to provide any means to use these modes
> >> from C, so they're really only useful for DIVMOD.  The actual libfunc
> >> implementation will pack the data into wider vector modes manually.
> >>
> >> A knock-on effect of this is that I needed to increase the range of
> >> "mode_unit_size" (several of the vector modes supported by amdgcn exceed
> >> the previous 255-byte limit).
> >>
> >> Since this change would add a large number of new, unused modes to many
> >> architectures, I have elected to *not* enable them, by default, in
> >> machmode.def (where the other complex modes are created).  The new modes
> >> are therefore inactive on all architectures but amdgcn, for now.
> >>
> >> OK for mainline?  (I've not done a full test yet, but I will.)
> >
> > I think it makes more sense to map vector CSImode to vector SImode with
> > the double number of lanes.  In fact since divmod is a libgcc function
> > I wonder where your vector variant would reside and how GCC decides to
> > emit calls to it?  That is, there's no way to OMP simd declare this function?
>
> The divmod implementation lives in libgcc. It's not too difficult to
> write using vector extensions and some asm tricks. I did try an OMP simd
> declare implementation, but it didn't vectorize well, and that's a yack
> I don't wish to shave right now.
>
> In any case, the OMP simd declare will not help us here, directly,
> because the DIVMOD transformation happens too late in the pass pipeline,
> long after ifcvt and vect. My implementation (not yet posted), uses a
> libfunc and the TARGET_EXPAND_DIVMOD_LIBFUNC hook in the standard way.
> It just needs the complex vector modes to exist.
>
> Using vectors twice the length is problematic also. If I create a new
> V128SImode that spans across two 64-lane vector registers then that will
> probably have the desired effect ("real" quotient in v8, "imaginary"
> remainder in v9), but if I use V64SImode to represent two V32SImode
> vectors then that's a one-register mode, and I'll have to use a
> permutation (a memory operation) to extract lanes 32-63 into lanes 0-31,
> and if we ever want to implement instructions that operate on these
> modes (as opposed to the odd/even add/sub complex patterns we have now)
> then the masking will be all broken and we'd need to constantly
> disassemble the double length vectors to operate on them.

I'm a bit confused as I don't see the difference between V64SCImode and
V128SImode since both contain 128 SImode values.  And I would expect
the imag/real parts to be _always_ interleaved, irrespective of whether
the result fits one or two vector registers.

> The implementation I proposed is essentially a struct containing two
> vectors placed in consecutive registers. This is the natural
> representation for the architecture.

I don't think you did that?  Or at least I don't see how vectors of
complex modes would match that.  It would be a complex of a vector
mode instead, no?

I do see that internal functions with more than one output would be
desirable, and I think I proposed ASMs with a "coded text" (aka
something like a pattern ID or an optab identifier) as the best fit
on GIMPLE, but TARGET_EXPAND_DIVMOD_LIBFUNC should be a good fit
for this particular case as well, no?

Can you share what you needed to change to get your complex vector int
code actually working?  What does the divmod pattern matching create
for the return type?  The pass has

  /* Disable the transform if either is a constant, since division-by-constant
     may have specialized expansion.  */
  if (CONSTANT_CLASS_P (op1))
    return false;

  if (CONSTANT_CLASS_P (op2))
    {
      if (integer_pow2p (op2))
        return false;

      if (TYPE_PRECISION (type) <= HOST_BITS_PER_WIDE_INT
          && TYPE_PRECISION (type) <= BITS_PER_WORD)
        return false;

at least the TYPE_PRECISION query is bogus when type is a vector type
and the IFN building does

 /* Part 3: Create libcall to internal fn DIVMOD:
     divmod_tmp = DIVMOD (op1, op2).  */

  gcall *call_stmt = gimple_build_call_internal (IFN_DIVMOD, 2, op1, op2);
  tree res = make_temp_ssa_name (build_complex_type (TREE_TYPE (op1)),
                                 call_stmt, "divmod_tmp");

so that builds a complex type with a vector component, not a vector
with complex components.
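
To spell out the two shapes (illustrative GNU C only, since neither has a 
first-class C spelling here):

  typedef int v64si __attribute__ ((vector_size (64 * sizeof (int))));

  /* Complex of vector: what build_complex_type (V64SI) produces, i.e.
     a real vector and an imaginary vector back to back.  */
  struct complex_of_vector { v64si re, im; };

  /* A vector of complex would instead be 64 complex ints with the
     real/imag parts interleaved lane-wise; it has no C equivalent.  */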

Richard.


> Anyway, you don't like this patch and I see that AArch64 is picking
> apart BLKmode to see if there's complex inside, so maybe I can make
> something like that work here? AArch64 doesn't seem to use
> TARGET_EXPAND_DIVMOD_LIBFUNC though, and I'm pretty sure the problem I
> was trying to solve was in the way the expand pass handles the BLKmode
> complex, outside the control of the backend hook (I'm still paging this
> stuff back in, post vacation).
>
> Thanks
>
> Andrew
Andrew Stubbs June 6, 2023, 9:37 a.m. UTC | #5
On 06/06/2023 07:40, Richard Biener wrote:
> On Mon, Jun 5, 2023 at 3:49 PM Andrew Stubbs <ams@codesourcery.com> wrote:
>>
>> On 30/05/2023 07:26, Richard Biener wrote:
>>> On Fri, May 26, 2023 at 4:35 PM Andrew Stubbs <ams@codesourcery.com> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I want to implement a vector DIVMOD libfunc for amdgcn, but I can't just
>>>> do it because the GCC middle-end models DIVMOD's return value as
>>>> "complex int" type, and there are no vector equivalents of that type.
>>>>
>>>> Therefore, this patch adds minimal support for "complex vector int"
>>>> modes.  I have not attempted to provide any means to use these modes
>>>> from C, so they're really only useful for DIVMOD.  The actual libfunc
>>>> implementation will pack the data into wider vector modes manually.
>>>>
>>>> A knock-on effect of this is that I needed to increase the range of
>>>> "mode_unit_size" (several of the vector modes supported by amdgcn exceed
>>>> the previous 255-byte limit).
>>>>
>>>> Since this change would add a large number of new, unused modes to many
>>>> architectures, I have elected to *not* enable them, by default, in
>>>> machmode.def (where the other complex modes are created).  The new modes
>>>> are therefore inactive on all architectures but amdgcn, for now.
>>>>
>>>> OK for mainline?  (I've not done a full test yet, but I will.)
>>>
>>> I think it makes more sense to map vector CSImode to vector SImode with
>>> the double number of lanes.  In fact since divmod is a libgcc function
>>> I wonder where your vector variant would reside and how GCC decides to
>>> emit calls to it?  That is, there's no way to OMP simd declare this function?
>>
>> The divmod implementation lives in libgcc. It's not too difficult to
>> write using vector extensions and some asm tricks. I did try an OMP simd
>> declare implementation, but it didn't vectorize well, and that's a yack
>> I don't wish to shave right now.
>>
>> In any case, the OMP simd declare will not help us here, directly,
>> because the DIVMOD transformation happens too late in the pass pipeline,
>> long after ifcvt and vect. My implementation (not yet posted), uses a
>> libfunc and the TARGET_EXPAND_DIVMOD_LIBFUNC hook in the standard way.
>> It just needs the complex vector modes to exist.
>>
>> Using vectors twice the length is problematic also. If I create a new
>> V128SImode that spans across two 64-lane vector registers then that will
>> probably have the desired effect ("real" quotient in v8, "imaginary"
>> remainder in v9), but if I use V64SImode to represent two V32SImode
>> vectors then that's a one-register mode, and I'll have to use a
>> permutation (a memory operation) to extract lanes 32-63 into lanes 0-31,
>> and if we ever want to implement instructions that operate on these
>> modes (as opposed to the odd/even add/sub complex patterns we have now)
>> then the masking will be all broken and we'd need to constantly
>> disassemble the double length vectors to operate on them.
> 
> I'm a bit confused as I don't see the difference between V64SCImode and
> V128SImode since both contain 128 SImode values.  And I would expect
> the imag/real parts to be _always_ interleaved, irrespective of whether
> the result fits one or two vector registers.

The patch doesn't create "V64SCImode"; it creates "CV64SImode". 
GET_MODE_UNIT_SIZE returns the size of V64SImode, so it is a tuple with 
a vector in each field.

For GCN specifically, I think this would make sense (hypothetically); it 
would give each notional SIMT "thread" a real and an imaginary part in 
consecutive registers, meaning it can add all the reals in one 
instruction, subtract all the imaginaries in another instruction, 
and then multiply them together (or whatever) in a very natural way. 
Meanwhile, the interleaved approach requires masking half the lanes to 
do the add, masking the other lanes to do the subtract, doing a 
permutation to operate on them together, and then doing it all again 
for the other half of the values in the other vector. GCN also has 
load/store instructions that can interleave two, three, or four 
registers when storing them (this is what V64DImode and V64TImode do 
already), so there's no extra pain there.

Anyway, I didn't actually want to do any of that at this time; I just 
want DIVMOD to work, but the middle-end models it as having a complex int 
return value. If the complex modes do not exist then the middle-end does 
not fail, but the code expands to nonsense RTL because the types are all 
VOIDmode.

>> The implementation I proposed is essentially a struct containing two
>> vectors placed in consecutive registers. This is the natural
>> representation for the architecture.
> 
> I don't think you did that?  Or at least I don't see how vectors of
> complex modes would match that.  It would be a complex of a vector
> mode instead, no?
> 
> I do see that internal functions with more than one output would be
> desirable and I think I proposed ASMs with a "coded text" aka
> something like a pattern ID or an optab identifier would be the best
> fit on GIMPLE but TARGET_EXPAND_DIVMOD_LIBFUNC for this
> particular case should be a good fit as well, no?
> 
> Can you share what you needed to change to get your complex vector int
> code actually working?  What does the divmod pattern matching create
> for the return type?  The pass has
> 
>    /* Disable the transform if either is a constant, since division-by-constant
>       may have specialized expansion.  */
>    if (CONSTANT_CLASS_P (op1))
>      return false;
> 
>    if (CONSTANT_CLASS_P (op2))
>      {
>        if (integer_pow2p (op2))
>          return false;
> 
>        if (TYPE_PRECISION (type) <= HOST_BITS_PER_WIDE_INT
>            && TYPE_PRECISION (type) <= BITS_PER_WORD)
>          return false;
> 
> at least the TYPE_PRECISION query is bogus when type is a vector type
> and the IFN building does
> 
>   /* Part 3: Create libcall to internal fn DIVMOD:
>       divmod_tmp = DIVMOD (op1, op2).  */
> 
>    gcall *call_stmt = gimple_build_call_internal (IFN_DIVMOD, 2, op1, op2);
>    tree res = make_temp_ssa_name (build_complex_type (TREE_TYPE (op1)),
>                                   call_stmt, "divmod_tmp");
> 
> so that builds a complex type with a vector component, not a vector
> with complex components.

In my libgcc __divmodv64si4 routine I return a V64DImode value with the 
two values as the high and low parts. GCN supports DImode values as 
pairs of SImode registers, and V64DImode is a pair of V64SImode 
registers such that each vector lane works exactly like the scalar case 
(indeed, scalars are basically "V1DImode" for many operators, since the 
scalar registers do not have a full set of instructions). This means 
that the quotients are returned in v8 and the remainders are returned in 
v9, and because all the compiler sees is the wider mode it doesn't treat 
them as an aggregate.

Then in my TARGET_EXPAND_DIVMOD_LIBFUNC implementation I have this 
(simplified slightly to illustrate only the SImode case):

   emit_library_call_value (libfunc, gen_rtx_REG (widemode, RETURN_VALUE_REG),
                            LCT_NORMAL, widemode, op0, mode, op1, mode);

   *quot = gen_rtx_REG (mode, RETURN_VALUE_REG);
   *rem = gen_rtx_REG (mode, RETURN_VALUE_REG + 1);

I think this hook is necessary because this libfunc can have a custom 
ABI? Maybe this isn't how a CV64SImode would actually get returned? 
Anyway, this is how this libfunc works.
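
As a scalar analogy of that packing convention (an illustrative sketch 
only; the real routines are in libgcc and must avoid calling themselves):

  /* Quotient goes in the low word (the first return register),
     remainder in the high word (the next register).  */
  unsigned long long
  __udivmodsi4 (unsigned int x, unsigned int y)
  {
    unsigned int quot = x / y;
    unsigned int rem = x - quot * y;
    return ((unsigned long long) rem << 32) | quot;
  }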

When CV64SImode exists the above works; otherwise I get an ICE because 
it tries to store a VOIDmode register using my scatter_store_v64si 
instruction and that isn't valid.

>> Anyway, you don't like this patch and I see that AArch64 is picking
>> apart BLKmode to see if there's complex inside, so maybe I can make
>> something like that work here? AArch64 doesn't seem to use
>> TARGET_EXPAND_DIVMOD_LIBFUNC though, and I'm pretty sure the problem I
>> was trying to solve was in the way the expand pass handles the BLKmode
>> complex, outside the control of the backend hook (I'm still paging this
>> stuff back in, post vacation).

I have attempted to get expand_DIVMOD to do the right thing without the 
new complex-of-vectors but there's no way to fix it with the existing 
hooks. I have experimented with simply fixing up the VOIDmode values in 
the backend, but the dataflow remains broken (the register it tries to 
store is not connected to the "quot" and "rem" values provided by my 
hook). I'll need to change the way DIVMOD works, somehow, but it's 
relying on the complex int representation to pass the return values 
onward so if I can't provide that then I'll need to code some kind of 
non-obvious alternative.

Any suggestions? My unfinished patches are attached (I think I only need 
to tidy a few bits and write a changelog).

Thanks

Andrew
amdgcn: vector div, mod, and divmod


diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index edc2abcad26..0ce5b0c461d 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -2930,8 +2930,9 @@ gcn_return_in_memory (const_tree type, const_tree ARG_UNUSED (fntype))
   if (mode == BLKmode)
     return true;
 
-  if ((!VECTOR_TYPE_P (type) && size > 2 * UNITS_PER_WORD)
-      || size > 2 * UNITS_PER_WORD * 64)
+  if (!VECTOR_TYPE_P (type)
+      ? size > 6 * UNITS_PER_WORD
+      : size > 6 * UNITS_PER_WORD * 64)
     return true;
 
   return false;
@@ -3782,6 +3783,47 @@ gcn_trampoline_init (rtx m_tramp, tree fndecl, rtx chain_value)
 					      TRAMPOLINE_SIZE)));
 }
 
+/* Implement TARGET_EXPAND_DIVMOD_LIBFUNC.
+
+   There are divmod libfuncs for all modes except TImode.  They return the
+   two values packed into a larger integer/vector.  */
+
+void
+gcn_expand_divmod_libfunc (rtx libfunc, machine_mode mode, rtx op0, rtx op1,
+			   rtx *quot, rtx *rem)
+{
+  machine_mode innermode = (VECTOR_MODE_P (mode)
+			    ? GET_MODE_INNER (mode) : mode);
+  machine_mode wideinnermode = VOIDmode;
+  machine_mode widemode = VOIDmode;
+
+  switch (innermode)
+    {
+    case E_QImode:
+    case E_HImode:
+    case E_SImode:
+      wideinnermode = DImode;
+      break;
+    case E_DImode:
+      wideinnermode = TImode;
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
+  if (VECTOR_MODE_P (mode))
+    widemode = VnMODE (GET_MODE_NUNITS (mode), wideinnermode);
+  else
+    widemode = wideinnermode;
+
+  emit_library_call_value (libfunc, gen_rtx_REG (widemode, RETURN_VALUE_REG),
+			   LCT_NORMAL, widemode, op0, mode, op1, mode);
+
+  *quot = gen_rtx_REG (mode, RETURN_VALUE_REG);
+  *rem = gen_rtx_REG (mode,
+		      RETURN_VALUE_REG + (wideinnermode == TImode ? 2 : 1));
+}
+
 /* }}}  */
 /* {{{ Miscellaneous.  */
 
@@ -4220,6 +4262,161 @@ gcn_init_libfuncs (void)
   set_optab_libfunc (popcount_optab, TImode, "__popcountti2");
   set_optab_libfunc (parity_optab, TImode, "__parityti2");
   set_optab_libfunc (bswap_optab, TImode, "__bswapti2");
+
+  set_optab_libfunc (sdivmod_optab, HImode, "__divmodhi4");
+  set_optab_libfunc (udivmod_optab, HImode, "__udivmodhi4");
+  set_optab_libfunc (sdivmod_optab, SImode, "__divmodsi4");
+  set_optab_libfunc (udivmod_optab, SImode, "__udivmodsi4");
+  set_optab_libfunc (sdivmod_optab, DImode, "__divmoddi4");
+  set_optab_libfunc (udivmod_optab, DImode, "__udivmoddi4");
+
+  set_optab_libfunc (sdiv_optab, V2QImode, "__divv2qi3");
+  set_optab_libfunc (udiv_optab, V2QImode, "__udivv2qi3");
+  set_optab_libfunc (smod_optab, V2QImode, "__modv2qi3");
+  set_optab_libfunc (umod_optab, V2QImode, "__umodv2qi3");
+  set_optab_libfunc (sdivmod_optab, V2QImode, "__divmodv2qi4");
+  set_optab_libfunc (udivmod_optab, V2QImode, "__udivmodv2qi4");
+  set_optab_libfunc (sdiv_optab, V4QImode, "__divv4qi3");
+  set_optab_libfunc (udiv_optab, V4QImode, "__udivv4qi3");
+  set_optab_libfunc (smod_optab, V4QImode, "__modv4qi3");
+  set_optab_libfunc (umod_optab, V4QImode, "__umodv4qi3");
+  set_optab_libfunc (sdivmod_optab, V4QImode, "__divmodv4qi4");
+  set_optab_libfunc (udivmod_optab, V4QImode, "__udivmodv4qi4");
+  set_optab_libfunc (sdiv_optab, V8QImode, "__divv8qi3");
+  set_optab_libfunc (udiv_optab, V8QImode, "__udivv8qi3");
+  set_optab_libfunc (smod_optab, V8QImode, "__modv8qi3");
+  set_optab_libfunc (umod_optab, V8QImode, "__umodv8qi3");
+  set_optab_libfunc (sdivmod_optab, V8QImode, "__divmodv8qi4");
+  set_optab_libfunc (udivmod_optab, V8QImode, "__udivmodv8qi4");
+  set_optab_libfunc (sdiv_optab, V16QImode, "__divv16qi3");
+  set_optab_libfunc (udiv_optab, V16QImode, "__udivv16qi3");
+  set_optab_libfunc (smod_optab, V16QImode, "__modv16qi3");
+  set_optab_libfunc (umod_optab, V16QImode, "__umodv16qi3");
+  set_optab_libfunc (sdivmod_optab, V16QImode, "__divmodv16qi4");
+  set_optab_libfunc (udivmod_optab, V16QImode, "__udivmodv16qi4");
+  set_optab_libfunc (sdiv_optab, V32QImode, "__divv32qi3");
+  set_optab_libfunc (udiv_optab, V32QImode, "__udivv32qi3");
+  set_optab_libfunc (smod_optab, V32QImode, "__modv32qi3");
+  set_optab_libfunc (umod_optab, V32QImode, "__umodv32qi3");
+  set_optab_libfunc (sdivmod_optab, V32QImode, "__divmodv32qi4");
+  set_optab_libfunc (udivmod_optab, V32QImode, "__udivmodv32qi4");
+  set_optab_libfunc (sdiv_optab, V64QImode, "__divv64qi3");
+  set_optab_libfunc (udiv_optab, V64QImode, "__udivv64qi3");
+  set_optab_libfunc (smod_optab, V64QImode, "__modv64qi3");
+  set_optab_libfunc (umod_optab, V64QImode, "__umodv64qi3");
+  set_optab_libfunc (sdivmod_optab, V64QImode, "__divmodv64qi4");
+  set_optab_libfunc (udivmod_optab, V64QImode, "__udivmodv64qi4");
+
+  set_optab_libfunc (sdiv_optab, V2HImode, "__divv2hi3");
+  set_optab_libfunc (udiv_optab, V2HImode, "__udivv2hi3");
+  set_optab_libfunc (smod_optab, V2HImode, "__modv2hi3");
+  set_optab_libfunc (umod_optab, V2HImode, "__umodv2hi3");
+  set_optab_libfunc (sdivmod_optab, V2HImode, "__divmodv2hi4");
+  set_optab_libfunc (udivmod_optab, V2HImode, "__udivmodv2hi4");
+  set_optab_libfunc (sdiv_optab, V4HImode, "__divv4hi3");
+  set_optab_libfunc (udiv_optab, V4HImode, "__udivv4hi3");
+  set_optab_libfunc (smod_optab, V4HImode, "__modv4hi3");
+  set_optab_libfunc (umod_optab, V4HImode, "__umodv4hi3");
+  set_optab_libfunc (sdivmod_optab, V4HImode, "__divmodv4hi4");
+  set_optab_libfunc (udivmod_optab, V4HImode, "__udivmodv4hi4");
+  set_optab_libfunc (sdiv_optab, V8HImode, "__divv8hi3");
+  set_optab_libfunc (udiv_optab, V8HImode, "__udivv8hi3");
+  set_optab_libfunc (smod_optab, V8HImode, "__modv8hi3");
+  set_optab_libfunc (umod_optab, V8HImode, "__umodv8hi3");
+  set_optab_libfunc (sdivmod_optab, V8HImode, "__divmodv8hi4");
+  set_optab_libfunc (udivmod_optab, V8HImode, "__udivmodv8hi4");
+  set_optab_libfunc (sdiv_optab, V16HImode, "__divv16hi3");
+  set_optab_libfunc (udiv_optab, V16HImode, "__udivv16hi3");
+  set_optab_libfunc (smod_optab, V16HImode, "__modv16hi3");
+  set_optab_libfunc (umod_optab, V16HImode, "__umodv16hi3");
+  set_optab_libfunc (sdivmod_optab, V16HImode, "__divmodv16hi4");
+  set_optab_libfunc (udivmod_optab, V16HImode, "__udivmodv16hi4");
+  set_optab_libfunc (sdiv_optab, V32HImode, "__divv32hi3");
+  set_optab_libfunc (udiv_optab, V32HImode, "__udivv32hi3");
+  set_optab_libfunc (smod_optab, V32HImode, "__modv32hi3");
+  set_optab_libfunc (umod_optab, V32HImode, "__umodv32hi3");
+  set_optab_libfunc (sdivmod_optab, V32HImode, "__divmodv32hi4");
+  set_optab_libfunc (udivmod_optab, V32HImode, "__udivmodv32hi4");
+  set_optab_libfunc (sdiv_optab, V64HImode, "__divv64hi3");
+  set_optab_libfunc (udiv_optab, V64HImode, "__udivv64hi3");
+  set_optab_libfunc (smod_optab, V64HImode, "__modv64hi3");
+  set_optab_libfunc (umod_optab, V64HImode, "__umodv64hi3");
+  set_optab_libfunc (sdivmod_optab, V64HImode, "__divmodv64hi4");
+  set_optab_libfunc (udivmod_optab, V64HImode, "__udivmodv64hi4");
+
+  set_optab_libfunc (sdiv_optab, V2SImode, "__divv2si3");
+  set_optab_libfunc (udiv_optab, V2SImode, "__udivv2si3");
+  set_optab_libfunc (smod_optab, V2SImode, "__modv2si3");
+  set_optab_libfunc (umod_optab, V2SImode, "__umodv2si3");
+  set_optab_libfunc (sdivmod_optab, V2SImode, "__divmodv2si4");
+  set_optab_libfunc (udivmod_optab, V2SImode, "__udivmodv2si4");
+  set_optab_libfunc (sdiv_optab, V4SImode, "__divv4si3");
+  set_optab_libfunc (udiv_optab, V4SImode, "__udivv4si3");
+  set_optab_libfunc (smod_optab, V4SImode, "__modv4si3");
+  set_optab_libfunc (umod_optab, V4SImode, "__umodv4si3");
+  set_optab_libfunc (sdivmod_optab, V4SImode, "__divmodv4si4");
+  set_optab_libfunc (udivmod_optab, V4SImode, "__udivmodv4si4");
+  set_optab_libfunc (sdiv_optab, V8SImode, "__divv8si3");
+  set_optab_libfunc (udiv_optab, V8SImode, "__udivv8si3");
+  set_optab_libfunc (smod_optab, V8SImode, "__modv8si3");
+  set_optab_libfunc (umod_optab, V8SImode, "__umodv8si3");
+  set_optab_libfunc (sdivmod_optab, V8SImode, "__divmodv8si4");
+  set_optab_libfunc (udivmod_optab, V8SImode, "__udivmodv8si4");
+  set_optab_libfunc (sdiv_optab, V16SImode, "__divv16si3");
+  set_optab_libfunc (udiv_optab, V16SImode, "__udivv16si3");
+  set_optab_libfunc (smod_optab, V16SImode, "__modv16si3");
+  set_optab_libfunc (umod_optab, V16SImode, "__umodv16si3");
+  set_optab_libfunc (sdivmod_optab, V16SImode, "__divmodv16si4");
+  set_optab_libfunc (udivmod_optab, V16SImode, "__udivmodv16si4");
+  set_optab_libfunc (sdiv_optab, V32SImode, "__divv32si3");
+  set_optab_libfunc (udiv_optab, V32SImode, "__udivv32si3");
+  set_optab_libfunc (smod_optab, V32SImode, "__modv32si3");
+  set_optab_libfunc (umod_optab, V32SImode, "__umodv32si3");
+  set_optab_libfunc (sdivmod_optab, V32SImode, "__divmodv32si4");
+  set_optab_libfunc (udivmod_optab, V32SImode, "__udivmodv32si4");
+  set_optab_libfunc (sdiv_optab, V64SImode, "__divv64si3");
+  set_optab_libfunc (udiv_optab, V64SImode, "__udivv64si3");
+  set_optab_libfunc (smod_optab, V64SImode, "__modv64si3");
+  set_optab_libfunc (umod_optab, V64SImode, "__umodv64si3");
+  set_optab_libfunc (sdivmod_optab, V64SImode, "__divmodv64si4");
+  set_optab_libfunc (udivmod_optab, V64SImode, "__udivmodv64si4");
+
+  set_optab_libfunc (sdiv_optab, V2DImode, "__divv2di3");
+  set_optab_libfunc (udiv_optab, V2DImode, "__udivv2di3");
+  set_optab_libfunc (smod_optab, V2DImode, "__modv2di3");
+  set_optab_libfunc (umod_optab, V2DImode, "__umodv2di3");
+  set_optab_libfunc (sdivmod_optab, V2DImode, "__divmodv2di4");
+  set_optab_libfunc (udivmod_optab, V2DImode, "__udivmodv2di4");
+  set_optab_libfunc (sdiv_optab, V4DImode, "__divv4di3");
+  set_optab_libfunc (udiv_optab, V4DImode, "__udivv4di3");
+  set_optab_libfunc (smod_optab, V4DImode, "__modv4di3");
+  set_optab_libfunc (umod_optab, V4DImode, "__umodv4di3");
+  set_optab_libfunc (sdivmod_optab, V4DImode, "__divmodv4di4");
+  set_optab_libfunc (udivmod_optab, V4DImode, "__udivmodv4di4");
+  set_optab_libfunc (sdiv_optab, V8DImode, "__divv8di3");
+  set_optab_libfunc (udiv_optab, V8DImode, "__udivv8di3");
+  set_optab_libfunc (smod_optab, V8DImode, "__modv8di3");
+  set_optab_libfunc (umod_optab, V8DImode, "__umodv8di3");
+  set_optab_libfunc (sdivmod_optab, V8DImode, "__divmodv8di4");
+  set_optab_libfunc (udivmod_optab, V8DImode, "__udivmodv8di4");
+  set_optab_libfunc (sdiv_optab, V16DImode, "__divv16di3");
+  set_optab_libfunc (udiv_optab, V16DImode, "__udivv16di3");
+  set_optab_libfunc (smod_optab, V16DImode, "__modv16di3");
+  set_optab_libfunc (umod_optab, V16DImode, "__umodv16di3");
+  set_optab_libfunc (sdivmod_optab, V16DImode, "__divmodv16di4");
+  set_optab_libfunc (udivmod_optab, V16DImode, "__udivmodv16di4");
+  set_optab_libfunc (sdiv_optab, V32DImode, "__divv32di3");
+  set_optab_libfunc (udiv_optab, V32DImode, "__udivv32di3");
+  set_optab_libfunc (smod_optab, V32DImode, "__modv32di3");
+  set_optab_libfunc (umod_optab, V32DImode, "__umodv32di3");
+  set_optab_libfunc (sdivmod_optab, V32DImode, "__divmodv32di4");
+  set_optab_libfunc (udivmod_optab, V32DImode, "__udivmodv32di4");
+  set_optab_libfunc (sdiv_optab, V64DImode, "__divv64di3");
+  set_optab_libfunc (udiv_optab, V64DImode, "__udivv64di3");
+  set_optab_libfunc (smod_optab, V64DImode, "__modv64di3");
+  set_optab_libfunc (umod_optab, V64DImode, "__umodv64di3");
+  set_optab_libfunc (sdivmod_optab, V64DImode, "__divmodv64di4");
+  set_optab_libfunc (udivmod_optab, V64DImode, "__udivmodv64di4");
 }
 
 /* Expand the CMP_SWAP GCN builtins.  We have our own versions that do
@@ -7492,6 +7689,8 @@ gcn_dwarf_register_span (rtx rtl)
 #define TARGET_EMUTLS_VAR_INIT gcn_emutls_var_init
 #undef  TARGET_EXPAND_BUILTIN
 #define TARGET_EXPAND_BUILTIN gcn_expand_builtin
+#undef  TARGET_EXPAND_DIVMOD_LIBFUNC
+#define TARGET_EXPAND_DIVMOD_LIBFUNC gcn_expand_divmod_libfunc
 #undef  TARGET_FRAME_POINTER_REQUIRED
 #define TARGET_FRAME_POINTER_REQUIRED gcn_frame_pointer_rqd
 #undef  TARGET_FUNCTION_ARG
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-2.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-2.c
index d8fe51c5a6c..1c54679c022 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-2.c
@@ -1,5 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-O2 -funroll-loops --param max-unroll-times=8 -fpredictive-commoning -fdump-tree-pcom-details -fno-tree-pre" } */
+/* { dg-additional-options "-fno-tree-vectorize" { target amdgcn-*-* } } */
 
 void abort (void);
 
diff --git a/gcc/testsuite/gcc.dg/unroll-8.c b/gcc/testsuite/gcc.dg/unroll-8.c
index dfcfe2eebf0..c4f6ac91581 100644
--- a/gcc/testsuite/gcc.dg/unroll-8.c
+++ b/gcc/testsuite/gcc.dg/unroll-8.c
@@ -1,5 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fdump-rtl-loop2_unroll -funroll-loops" } */
+/* { dg-additional-options "-fno-tree-vectorize" { target amdgcn-*-* } } */
+
 struct a {int a[7];};
 int t(struct a *a, int n)
 {
diff --git a/gcc/testsuite/gcc.dg/vect/slp-26.c b/gcc/testsuite/gcc.dg/vect/slp-26.c
index 01f4e4ee32e..f8b49ff603c 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-26.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-26.c
@@ -46,7 +46,7 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! mips_msa } } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { mips_msa } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! mips_msa } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { mips_msa } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { mips_msa || amdgcn-*-* } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { mips_msa || amdgcn-*-* } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! { mips_msa || amdgcn-*-* } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { mips_msa || amdgcn-*-* } } } } */
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-3-16.c b/gcc/testsuite/gcc.target/gcn/simd-math-3-16.c
new file mode 100644
index 00000000000..b0f88b4b3e3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-3-16.c
@@ -0,0 +1,13 @@
+#define STYPE v16si
+#define UTYPE v16usi
+#define N 16
+#include "simd-math-3.c"
+
+/* { dg-final { scan-assembler-times {__divmodv16si4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivmodv16si4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__divv16si3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivv16si3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__modv16si3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__umodv16si3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__divsi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivsi3@rel32@lo} 1 } } */
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-3-2.c b/gcc/testsuite/gcc.target/gcn/simd-math-3-2.c
new file mode 100644
index 00000000000..ba4a5da5827
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-3-2.c
@@ -0,0 +1,13 @@
+#define STYPE v2si
+#define UTYPE v2usi
+#define N 2
+#include "simd-math-3.c"
+
+/* { dg-final { scan-assembler-times {__divmodv2si4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivmodv2si4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__divv2si3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivv2si3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__modv2si3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__umodv2si3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__divsi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivsi3@rel32@lo} 1 } } */
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-3-32.c b/gcc/testsuite/gcc.target/gcn/simd-math-3-32.c
new file mode 100644
index 00000000000..6329581509b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-3-32.c
@@ -0,0 +1,13 @@
+#define STYPE v32si
+#define UTYPE v32usi
+#define N 32
+#include "simd-math-3.c"
+
+/* { dg-final { scan-assembler-times {__divmodv32si4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivmodv32si4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__divv32si3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivv32si3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__modv32si3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__umodv32si3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__divsi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivsi3@rel32@lo} 1 } } */
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-3-4.c b/gcc/testsuite/gcc.target/gcn/simd-math-3-4.c
new file mode 100644
index 00000000000..32686b66eda
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-3-4.c
@@ -0,0 +1,13 @@
+#define STYPE v4si
+#define UTYPE v4usi
+#define N 4
+#include "simd-math-3.c"
+
+/* { dg-final { scan-assembler-times {__divmodv4si4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivmodv4si4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__divv4si3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivv4si3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__modv4si3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__umodv4si3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__divsi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivsi3@rel32@lo} 1 } } */
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-3-8.c b/gcc/testsuite/gcc.target/gcn/simd-math-3-8.c
new file mode 100644
index 00000000000..defa2eb01dc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-3-8.c
@@ -0,0 +1,13 @@
+#define STYPE v8si
+#define UTYPE v8usi
+#define N 8
+#include "simd-math-3.c"
+
+/* { dg-final { scan-assembler-times {__divmodv8si4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivmodv8si4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__divv8si3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivv8si3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__modv8si3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__umodv8si3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__divsi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivsi3@rel32@lo} 1 } } */
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-3-char-16.c b/gcc/testsuite/gcc.target/gcn/simd-math-3-char-16.c
new file mode 100644
index 00000000000..e1c8fde74b0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-3-char-16.c
@@ -0,0 +1,11 @@
+#define STYPE v16qi
+#define UTYPE v16uqi
+#define N 16
+#include "simd-math-3.c"
+
+/* { dg-final { scan-assembler-times {__divmodv16qi4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivmodv16qi4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__divv16qi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivv16qi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__modv16qi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__umodv16qi3@rel32@lo} 1 } } */
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-3-char-2.c b/gcc/testsuite/gcc.target/gcn/simd-math-3-char-2.c
new file mode 100644
index 00000000000..d6db61dc3ad
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-3-char-2.c
@@ -0,0 +1,11 @@
+#define STYPE v2qi
+#define UTYPE v2uqi
+#define N 2
+#include "simd-math-3.c"
+
+/* { dg-final { scan-assembler-times {__divmodv2qi4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivmodv2qi4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__divv2qi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivv2qi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__modv2qi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__umodv2qi3@rel32@lo} 1 } } */
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-3-char-32.c b/gcc/testsuite/gcc.target/gcn/simd-math-3-char-32.c
new file mode 100644
index 00000000000..8885abd2092
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-3-char-32.c
@@ -0,0 +1,11 @@
+#define STYPE v32qi
+#define UTYPE v32uqi
+#define N 32
+#include "simd-math-3.c"
+
+/* { dg-final { scan-assembler-times {__divmodv32qi4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivmodv32qi4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__divv32qi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivv32qi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__modv32qi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__umodv32qi3@rel32@lo} 1 } } */
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-3-char-4.c b/gcc/testsuite/gcc.target/gcn/simd-math-3-char-4.c
new file mode 100644
index 00000000000..46bac50c1ee
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-3-char-4.c
@@ -0,0 +1,11 @@
+#define STYPE v4qi
+#define UTYPE v4uqi
+#define N 4
+#include "simd-math-3.c"
+
+/* { dg-final { scan-assembler-times {__divmodv4qi4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivmodv4qi4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__divv4qi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivv4qi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__modv4qi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__umodv4qi3@rel32@lo} 1 } } */
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-3-char-8.c b/gcc/testsuite/gcc.target/gcn/simd-math-3-char-8.c
new file mode 100644
index 00000000000..7cb3b4c2d28
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-3-char-8.c
@@ -0,0 +1,11 @@
+#define STYPE v8qi
+#define UTYPE v8uqi
+#define N 8
+#include "simd-math-3.c"
+
+/* { dg-final { scan-assembler-times {__divmodv8qi4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivmodv8qi4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__divv8qi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivv8qi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__modv8qi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__umodv8qi3@rel32@lo} 1 } } */
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-3-char-run-16.c b/gcc/testsuite/gcc.target/gcn/simd-math-3-char-run-16.c
new file mode 100644
index 00000000000..159b9802b07
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-3-char-run-16.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "simd-math-3-char-16.c"
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-3-char-run-2.c b/gcc/testsuite/gcc.target/gcn/simd-math-3-char-run-2.c
new file mode 100644
index 00000000000..6e730d07a03
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-3-char-run-2.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "simd-math-3-char-2.c"
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-3-char-run-32.c b/gcc/testsuite/gcc.target/gcn/simd-math-3-char-run-32.c
new file mode 100644
index 00000000000..8e4932bf5b9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-3-char-run-32.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "simd-math-3-char-32.c"
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-3-char-run-4.c b/gcc/testsuite/gcc.target/gcn/simd-math-3-char-run-4.c
new file mode 100644
index 00000000000..d07a4318390
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-3-char-run-4.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "simd-math-3-char-4.c"
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-3-char-run-8.c b/gcc/testsuite/gcc.target/gcn/simd-math-3-char-run-8.c
new file mode 100644
index 00000000000..64f789a4528
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-3-char-run-8.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "simd-math-3-char-8.c"
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-3-char-run.c b/gcc/testsuite/gcc.target/gcn/simd-math-3-char-run.c
new file mode 100644
index 00000000000..4e2d4c0a19a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-3-char-run.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "simd-math-3-char.c"
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-3-char.c b/gcc/testsuite/gcc.target/gcn/simd-math-3-char.c
new file mode 100644
index 00000000000..c338adb84e2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-3-char.c
@@ -0,0 +1,10 @@
+#define STYPE v64qi
+#define UTYPE v64uqi
+#include "simd-math-3.c"
+
+/* { dg-final { scan-assembler-times {__divmodv64qi4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivmodv64qi4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__divv64qi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivv64qi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__modv64qi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__umodv64qi3@rel32@lo} 1 } } */
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-3-long-16.c b/gcc/testsuite/gcc.target/gcn/simd-math-3-long-16.c
new file mode 100644
index 00000000000..748f4acbf83
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-3-long-16.c
@@ -0,0 +1,11 @@
+#define STYPE v16di
+#define UTYPE v16udi
+#define N 16
+#include "simd-math-3.c"
+
+/* { dg-final { scan-assembler-times {__divmodv16di4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivmodv16di4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__divv16di3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivv16di3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__modv16di3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__umodv16di3@rel32@lo} 1 } } */
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-3-long-2.c b/gcc/testsuite/gcc.target/gcn/simd-math-3-long-2.c
new file mode 100644
index 00000000000..3ae63978ed2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-3-long-2.c
@@ -0,0 +1,11 @@
+#define STYPE v2di
+#define UTYPE v2udi
+#define N 2
+#include "simd-math-3.c"
+
+/* { dg-final { scan-assembler-times {__divmodv2di4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivmodv2di4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__divv2di3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivv2di3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__modv2di3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__umodv2di3@rel32@lo} 1 } } */
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-3-long-32.c b/gcc/testsuite/gcc.target/gcn/simd-math-3-long-32.c
new file mode 100644
index 00000000000..0732de4e54d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-3-long-32.c
@@ -0,0 +1,11 @@
+#define STYPE v32di
+#define UTYPE v32udi
+#define N 32
+#include "simd-math-3.c"
+
+/* { dg-final { scan-assembler-times {__divmodv32di4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivmodv32di4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__divv32di3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivv32di3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__modv32di3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__umodv32di3@rel32@lo} 1 } } */
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-3-long-4.c b/gcc/testsuite/gcc.target/gcn/simd-math-3-long-4.c
new file mode 100644
index 00000000000..e0b4f2693c2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-3-long-4.c
@@ -0,0 +1,11 @@
+#define STYPE v4di
+#define UTYPE v4udi
+#define N 4
+#include "simd-math-3.c"
+
+/* { dg-final { scan-assembler-times {__divmodv4di4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivmodv4di4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__divv4di3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivv4di3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__modv4di3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__umodv4di3@rel32@lo} 1 } } */
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-3-long-8.c b/gcc/testsuite/gcc.target/gcn/simd-math-3-long-8.c
new file mode 100644
index 00000000000..10841b81c3b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-3-long-8.c
@@ -0,0 +1,11 @@
+#define STYPE v8di
+#define UTYPE v8udi
+#define N 8
+#include "simd-math-3.c"
+
+/* { dg-final { scan-assembler-times {__divmodv8di4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivmodv8di4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__divv8di3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivv8di3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__modv8di3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__umodv8di3@rel32@lo} 1 } } */
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-3-long-run-16.c b/gcc/testsuite/gcc.target/gcn/simd-math-3-long-run-16.c
new file mode 100644
index 00000000000..7ce9c92a7a0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-3-long-run-16.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "simd-math-3-long-16.c"
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-3-long-run-2.c b/gcc/testsuite/gcc.target/gcn/simd-math-3-long-run-2.c
new file mode 100644
index 00000000000..20996a56f5c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-3-long-run-2.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "simd-math-3-long-2.c"
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-3-long-run-32.c b/gcc/testsuite/gcc.target/gcn/simd-math-3-long-run-32.c
new file mode 100644
index 00000000000..1ca25ac9b51
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-3-long-run-32.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "simd-math-3-long-32.c"
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-3-long-run-4.c b/gcc/testsuite/gcc.target/gcn/simd-math-3-long-run-4.c
new file mode 100644
index 00000000000..b31769ad0bc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-3-long-run-4.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "simd-math-3-long-4.c"
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-3-long-run-8.c b/gcc/testsuite/gcc.target/gcn/simd-math-3-long-run-8.c
new file mode 100644
index 00000000000..930256a0fc7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-3-long-run-8.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "simd-math-3-long-8.c"
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-3-long-run.c b/gcc/testsuite/gcc.target/gcn/simd-math-3-long-run.c
new file mode 100644
index 00000000000..363e42573d2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-3-long-run.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "simd-math-3-long.c"
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-3-long.c b/gcc/testsuite/gcc.target/gcn/simd-math-3-long.c
new file mode 100644
index 00000000000..9d263241536
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-3-long.c
@@ -0,0 +1,10 @@
+#define STYPE v64di
+#define UTYPE v64udi
+#include "simd-math-3.c"
+
+/* { dg-final { scan-assembler-times {__divmodv64di4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivmodv64di4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__divv64di3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivv64di3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__modv64di3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__umodv64di3@rel32@lo} 1 } } */
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-3-run-16.c b/gcc/testsuite/gcc.target/gcn/simd-math-3-run-16.c
new file mode 100644
index 00000000000..ae8cdbffa1d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-3-run-16.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "simd-math-3-16.c"
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-3-run-2.c b/gcc/testsuite/gcc.target/gcn/simd-math-3-run-2.c
new file mode 100644
index 00000000000..7d80382f23b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-3-run-2.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "simd-math-3-2.c"
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-3-run-32.c b/gcc/testsuite/gcc.target/gcn/simd-math-3-run-32.c
new file mode 100644
index 00000000000..127fd36f0f4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-3-run-32.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "simd-math-3-32.c"
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-3-run-4.c b/gcc/testsuite/gcc.target/gcn/simd-math-3-run-4.c
new file mode 100644
index 00000000000..e1d5b5de5c1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-3-run-4.c
@@ -0,0 +1,3 @@
+/* { dg-do run } */
+#include "simd-math-3-4.c"
+
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-3-run-8.c b/gcc/testsuite/gcc.target/gcn/simd-math-3-run-8.c
new file mode 100644
index 00000000000..ec98b60ae2e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-3-run-8.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "simd-math-3-8.c"
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-3-run.c b/gcc/testsuite/gcc.target/gcn/simd-math-3-run.c
new file mode 100644
index 00000000000..aca508cbc25
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-3-run.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "simd-math-3.c"
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-3-short-16.c b/gcc/testsuite/gcc.target/gcn/simd-math-3-short-16.c
new file mode 100644
index 00000000000..019e6954306
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-3-short-16.c
@@ -0,0 +1,11 @@
+#define STYPE v16hi
+#define UTYPE v16uhi
+#define N 16
+#include "simd-math-3.c"
+
+/* { dg-final { scan-assembler-times {__divmodv16hi4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivmodv16hi4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__divv16hi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivv16hi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__modv16hi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__umodv16hi3@rel32@lo} 1 } } */
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-3-short-2.c b/gcc/testsuite/gcc.target/gcn/simd-math-3-short-2.c
new file mode 100644
index 00000000000..2b867c2dbe3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-3-short-2.c
@@ -0,0 +1,11 @@
+#define STYPE v2hi
+#define UTYPE v2uhi
+#define N 2
+#include "simd-math-3.c"
+
+/* { dg-final { scan-assembler-times {__divmodv2hi4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivmodv2hi4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__divv2hi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivv2hi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__modv2hi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__umodv2hi3@rel32@lo} 1 } } */
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-3-short-32.c b/gcc/testsuite/gcc.target/gcn/simd-math-3-short-32.c
new file mode 100644
index 00000000000..2a6fc2c9fbf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-3-short-32.c
@@ -0,0 +1,11 @@
+#define STYPE v32hi
+#define UTYPE v32uhi
+#define N 32
+#include "simd-math-3.c"
+
+/* { dg-final { scan-assembler-times {__divmodv32hi4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivmodv32hi4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__divv32hi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivv32hi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__modv32hi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__umodv32hi3@rel32@lo} 1 } } */
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-3-short-4.c b/gcc/testsuite/gcc.target/gcn/simd-math-3-short-4.c
new file mode 100644
index 00000000000..61ef8b55f21
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-3-short-4.c
@@ -0,0 +1,11 @@
+#define STYPE v4hi
+#define UTYPE v4uhi
+#define N 4
+#include "simd-math-3.c"
+
+/* { dg-final { scan-assembler-times {__divmodv4hi4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivmodv4hi4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__divv4hi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivv4hi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__modv4hi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__umodv4hi3@rel32@lo} 1 } } */
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-3-short-8.c b/gcc/testsuite/gcc.target/gcn/simd-math-3-short-8.c
new file mode 100644
index 00000000000..e716d4a1f00
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-3-short-8.c
@@ -0,0 +1,11 @@
+#define STYPE v8hi
+#define UTYPE v8uhi
+#define N 8
+#include "simd-math-3.c"
+
+/* { dg-final { scan-assembler-times {__divmodv8hi4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivmodv8hi4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__divv8hi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivv8hi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__modv8hi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__umodv8hi3@rel32@lo} 1 } } */
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-3-short-run-16.c b/gcc/testsuite/gcc.target/gcn/simd-math-3-short-run-16.c
new file mode 100644
index 00000000000..8ca866ce21f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-3-short-run-16.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "simd-math-3-short-16.c"
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-3-short-run-2.c b/gcc/testsuite/gcc.target/gcn/simd-math-3-short-run-2.c
new file mode 100644
index 00000000000..6c6d8b68f28
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-3-short-run-2.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "simd-math-3-short-2.c"
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-3-short-run-32.c b/gcc/testsuite/gcc.target/gcn/simd-math-3-short-run-32.c
new file mode 100644
index 00000000000..8c30ebc5528
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-3-short-run-32.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "simd-math-3-short-32.c"
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-3-short-run-4.c b/gcc/testsuite/gcc.target/gcn/simd-math-3-short-run-4.c
new file mode 100644
index 00000000000..e70697e6e42
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-3-short-run-4.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "simd-math-3-short-4.c"
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-3-short-run-8.c b/gcc/testsuite/gcc.target/gcn/simd-math-3-short-run-8.c
new file mode 100644
index 00000000000..9cb9a6fe297
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-3-short-run-8.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "simd-math-3-short-8.c"
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-3-short-run.c b/gcc/testsuite/gcc.target/gcn/simd-math-3-short-run.c
new file mode 100644
index 00000000000..08f72671e96
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-3-short-run.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "simd-math-3-short.c"
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-3-short.c b/gcc/testsuite/gcc.target/gcn/simd-math-3-short.c
new file mode 100644
index 00000000000..7d4723ca43e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-3-short.c
@@ -0,0 +1,10 @@
+#define STYPE v64hi
+#define UTYPE v64uhi
+#include "simd-math-3.c"
+
+/* { dg-final { scan-assembler-times {__divmodv64hi4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivmodv64hi4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__divv64hi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivv64hi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__modv64hi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__umodv64hi3@rel32@lo} 1 } } */
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-3.c b/gcc/testsuite/gcc.target/gcn/simd-math-3.c
new file mode 100644
index 00000000000..90a2e6f5488
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-3.c
@@ -0,0 +1,186 @@
+/* Test that signed and unsigned division and modulus use the correct
+   vector routines and give the correct results.  */
+
+/* Setting it this way ensures the run tests use the same flag as the
+   compile tests.  */
+#pragma GCC optimize("O2")
+
+typedef signed char v2qi __attribute__ ((vector_size (2)));
+typedef signed char v4qi __attribute__ ((vector_size (4)));
+typedef signed char v8qi __attribute__ ((vector_size (8)));
+typedef signed char v16qi __attribute__ ((vector_size (16)));
+typedef signed char v32qi __attribute__ ((vector_size (32)));
+typedef signed char v64qi __attribute__ ((vector_size (64)));
+
+typedef unsigned char v2uqi __attribute__ ((vector_size (2)));
+typedef unsigned char v4uqi __attribute__ ((vector_size (4)));
+typedef unsigned char v8uqi __attribute__ ((vector_size (8)));
+typedef unsigned char v16uqi __attribute__ ((vector_size (16)));
+typedef unsigned char v32uqi __attribute__ ((vector_size (32)));
+typedef unsigned char v64uqi __attribute__ ((vector_size (64)));
+
+typedef short v2hi __attribute__ ((vector_size (4)));
+typedef short v4hi __attribute__ ((vector_size (8)));
+typedef short v8hi __attribute__ ((vector_size (16)));
+typedef short v16hi __attribute__ ((vector_size (32)));
+typedef short v32hi __attribute__ ((vector_size (64)));
+typedef short v64hi __attribute__ ((vector_size (128)));
+
+typedef unsigned short v2uhi __attribute__ ((vector_size (4)));
+typedef unsigned short v4uhi __attribute__ ((vector_size (8)));
+typedef unsigned short v8uhi __attribute__ ((vector_size (16)));
+typedef unsigned short v16uhi __attribute__ ((vector_size (32)));
+typedef unsigned short v32uhi __attribute__ ((vector_size (64)));
+typedef unsigned short v64uhi __attribute__ ((vector_size (128)));
+
+typedef int v2si __attribute__ ((vector_size (8)));
+typedef int v4si __attribute__ ((vector_size (16)));
+typedef int v8si __attribute__ ((vector_size (32)));
+typedef int v16si __attribute__ ((vector_size (64)));
+typedef int v32si __attribute__ ((vector_size (128)));
+typedef int v64si __attribute__ ((vector_size (256)));
+
+typedef unsigned int v2usi __attribute__ ((vector_size (8)));
+typedef unsigned int v4usi __attribute__ ((vector_size (16)));
+typedef unsigned int v8usi __attribute__ ((vector_size (32)));
+typedef unsigned int v16usi __attribute__ ((vector_size (64)));
+typedef unsigned int v32usi __attribute__ ((vector_size (128)));
+typedef unsigned int v64usi __attribute__ ((vector_size (256)));
+
+typedef long v2di __attribute__ ((vector_size (16)));
+typedef long v4di __attribute__ ((vector_size (32)));
+typedef long v8di __attribute__ ((vector_size (64)));
+typedef long v16di __attribute__ ((vector_size (128)));
+typedef long v32di __attribute__ ((vector_size (256)));
+typedef long v64di __attribute__ ((vector_size (512)));
+
+typedef unsigned long v2udi __attribute__ ((vector_size (16)));
+typedef unsigned long v4udi __attribute__ ((vector_size (32)));
+typedef unsigned long v8udi __attribute__ ((vector_size (64)));
+typedef unsigned long v16udi __attribute__ ((vector_size (128)));
+typedef unsigned long v32udi __attribute__ ((vector_size (256)));
+typedef unsigned long v64udi __attribute__ ((vector_size (512)));
+
+#ifndef STYPE
+#define STYPE v64si
+#define UTYPE v64usi
+#endif
+#ifndef N
+#define N 64
+#endif
+
+STYPE a;
+STYPE b;
+UTYPE ua;
+UTYPE ub;
+
+int main()
+{
+  int i;
+  STYPE squot, srem;
+  UTYPE usquot, usrem;
+  STYPE vquot, vrem;
+  UTYPE uvquot, uvrem;
+  STYPE vquot2, vrem2;
+  UTYPE uvquot2, uvrem2;
+  STYPE refquot, refrem;
+  UTYPE urefquot, urefrem;
+
+  for (i = 0; i < N; i++)
+    {
+      a[i] = i * (i >> 2) + (i >> 1);
+      ua[i] = a[i];
+      b[i] = i;
+      ub[i] = i;
+    }
+
+  for (i = 0; i < N; i++)
+    {
+      /* Calculate reference values using regular scalar div and mod.  */
+      refquot[i] = a[i] / b[i];
+      __asm__ ("" ::: "memory");
+      refrem[i] = a[i] % b[i];
+      urefquot[i] = ua[i] / ub[i];
+      __asm__ ("" ::: "memory");
+      urefrem[i] = ua[i] % ub[i];
+    }
+
+  __asm__ ("" ::: "memory");
+  /* Scalar with divmod.  */
+  for (i = 0; i < N; i++)
+    {
+      squot[i] = a[i] / b[i];
+      srem[i] = a[i] % b[i];
+      usquot[i] = ua[i] / ub[i];
+      usrem[i] = ua[i] % ub[i];
+    }
+
+  __asm__ ("" ::: "memory");
+  /* Vectorized with divmod.  */
+  vquot = a / b;
+  vrem = a % b;
+  uvquot = ua / ub;
+  uvrem = ua % ub;
+
+  __asm__ ("" ::: "memory");
+  /* Vectorized with separate div and mod.  */
+  vquot2 = a / b;
+  __asm__ ("" ::: "memory");
+  vrem2 = a % b;
+  uvquot2 = ua / ub;
+  __asm__ ("" ::: "memory");
+  uvrem2 = ua % ub;
+
+#ifdef DEBUG
+#define DUMP(VAR) \
+  __builtin_printf ("%8s: ", #VAR); \
+  for (i = 0; i < N; i++) \
+    __builtin_printf ("%d ", (int)VAR[i]); \
+  __builtin_printf ("\n");
+  DUMP (refquot)
+  DUMP (squot)
+  DUMP (vquot)
+  DUMP (vquot2)
+  __builtin_printf ("\n");
+  DUMP (urefquot)
+  DUMP (usquot)
+  DUMP (uvquot)
+  DUMP (uvquot2)
+  __builtin_printf ("\n");
+  DUMP (refrem)
+  DUMP (srem)
+  DUMP (vrem)
+  DUMP (vrem2)
+  __builtin_printf ("\n");
+  DUMP (urefrem)
+  DUMP (usrem)
+  DUMP (uvrem)
+  DUMP (uvrem2)
+  __builtin_printf ("\n");
+#endif
+
+  for (i = 0; i < N; i++)
+    if (squot[i] != refquot[i]
+	|| vquot[i] != refquot[i]
+	|| vquot2[i] != refquot[i]
+	|| usquot[i] != urefquot[i]
+	|| uvquot[i] != urefquot[i]
+	|| uvquot2[i] != urefquot[i]
+	|| srem[i] != refrem[i]
+	|| vrem[i] != refrem[i]
+	|| vrem2[i] != refrem[i]
+	|| usrem[i] != urefrem[i]
+	|| uvrem[i] != urefrem[i]
+	|| uvrem2[i] != urefrem[i])
+      __builtin_abort ();
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times {__divmodv64si4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivmodv64si4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__divv64si3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivv64si3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__modv64si3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__umodv64si3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__divsi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivsi3@rel32@lo} 1 } } */
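
(A note on the empty asm statements above: because a, b, ua and ub are
globals, an asm with a "memory" clobber forces the operands to be
re-read afterwards, so GCC cannot combine the '/' and '%' on either
side of the barrier into a single divmod call.  A minimal sketch of
the idiom, outside the test:

  long x, y, q, r;   /* globals, so the barrier forces a reload */

  void div_then_mod (void)
  {
    q = x / y;
    __asm__ ("" ::: "memory");  /* x, y may have changed: no fusion */
    r = x % y;
  }

This is what lets one file exercise both the divmod libfuncs and the
standalone div and mod libfuncs.)
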
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-4-char-run.c b/gcc/testsuite/gcc.target/gcn/simd-math-4-char-run.c
new file mode 100644
index 00000000000..b328a3eeec5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-4-char-run.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "simd-math-4-char.c"
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-4-char.c b/gcc/testsuite/gcc.target/gcn/simd-math-4-char.c
new file mode 100644
index 00000000000..099b8e28865
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-4-char.c
@@ -0,0 +1,9 @@
+#define TYPE v64qi
+#include "simd-math-4.c"
+
+/* { dg-final { scan-assembler-times {__divmodv64qi4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivmodv64qi4@rel32@lo} 0 } } */
+/* { dg-final { scan-assembler-times {__divv64qi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivv64qi3@rel32@lo} 0 } } */
+/* { dg-final { scan-assembler-times {__modv64qi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__umodv64qi3@rel32@lo} 0 } } */
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-4-long-run.c b/gcc/testsuite/gcc.target/gcn/simd-math-4-long-run.c
new file mode 100644
index 00000000000..34cbc467709
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-4-long-run.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "simd-math-4-long.c"
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-4-long.c b/gcc/testsuite/gcc.target/gcn/simd-math-4-long.c
new file mode 100644
index 00000000000..fff9f5771e4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-4-long.c
@@ -0,0 +1,9 @@
+#define TYPE v64di
+#include "simd-math-4.c"
+
+/* { dg-final { scan-assembler-times {__divmodv64di4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivmodv64di4@rel32@lo} 0 } } */
+/* { dg-final { scan-assembler-times {__divv64di3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivv64di3@rel32@lo} 0 } } */
+/* { dg-final { scan-assembler-times {__modv64di3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__umodv64di3@rel32@lo} 0 } } */
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-4-run.c b/gcc/testsuite/gcc.target/gcn/simd-math-4-run.c
new file mode 100644
index 00000000000..3b98c0e7738
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-4-run.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "simd-math-4.c"
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-4-short-run.c b/gcc/testsuite/gcc.target/gcn/simd-math-4-short-run.c
new file mode 100644
index 00000000000..4cbeb97432e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-4-short-run.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "simd-math-4-short.c"
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-4-short.c b/gcc/testsuite/gcc.target/gcn/simd-math-4-short.c
new file mode 100644
index 00000000000..3af4ad1be7e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-4-short.c
@@ -0,0 +1,9 @@
+#define TYPE v64hi
+#include "simd-math-4.c"
+
+/* { dg-final { scan-assembler-times {__divmodv64hi4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivmodv64hi4@rel32@lo} 0 } } */
+/* { dg-final { scan-assembler-times {__divv64hi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivv64hi3@rel32@lo} 0 } } */
+/* { dg-final { scan-assembler-times {__modv64hi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__umodv64hi3@rel32@lo} 0 } } */
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-4.c b/gcc/testsuite/gcc.target/gcn/simd-math-4.c
new file mode 100644
index 00000000000..39fa6c56f96
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-4.c
@@ -0,0 +1,99 @@
+/* Test that signed division and modulus give the correct results for
+   all sign combinations of the operands.  */
+
+/* Setting it this way ensures the run tests use the same flag as the
+   compile tests.  */
+#pragma GCC optimize("O2")
+
+typedef char v64qi __attribute__ ((vector_size (64)));
+typedef short v64hi __attribute__ ((vector_size (128)));
+typedef int v64si __attribute__ ((vector_size (256)));
+typedef long v64di __attribute__ ((vector_size (512)));
+
+#ifndef TYPE
+#define TYPE v64si
+#endif
+#define N 64
+
+TYPE a;
+TYPE b;
+
+int main()
+{
+  int i;
+  TYPE squot, srem;
+  TYPE usquot, usrem;
+  TYPE vquot, vrem;
+  TYPE vquot2, vrem2;
+  TYPE refquot, refrem;
+
+  for (i = 0; i < 64; i++)
+    {
+      a[i] = i * (i >> 2) * (i&1 ? -1 : 1);
+      b[i] = i * (i&2 ? -1 : 1);
+    }
+
+  for (i = 0; i < N; i++)
+    {
+      /* Calculate reference values using regular scalar div and mod.  */
+      refquot[i] = a[i] / b[i];
+      __asm__ ("" ::: "memory");
+      refrem[i] = a[i] % b[i];
+    }
+
+  __asm__ ("" ::: "memory");
+  /* Scalar with divmod.  */
+  for (i = 0; i < N; i++)
+    {
+      squot[i] = a[i] / b[i];
+      srem[i] = a[i] % b[i];
+    }
+
+  __asm__ ("" ::: "memory");
+  /* Vectorized with divmod.  */
+  vquot = a / b;
+  vrem = a % b;
+
+  __asm__ ("" ::: "memory");
+  /* Vectorized with separate div and mod.  */
+  vquot2 = a / b;
+  __asm__ ("" ::: "memory");
+  vrem2 = a % b;
+
+#ifdef DEBUG
+#define DUMP(VAR) \
+  __builtin_printf ("%8s: ", #VAR); \
+  for (i = 0; i < N; i++) \
+    __builtin_printf ("%d ", (int)VAR[i]); \
+  __builtin_printf ("\n");
+  DUMP (refquot)
+  DUMP (squot)
+  DUMP (vquot)
+  DUMP (vquot2)
+  __builtin_printf ("\n");
+  DUMP (refrem)
+  DUMP (srem)
+  DUMP (vrem)
+  DUMP (vrem2)
+  __builtin_printf ("\n");
+#endif
+
+  for (i = 0; i < N; i++)
+    if (squot[i] != refquot[i]
+	|| vquot[i] != refquot[i]
+	|| vquot2[i] != refquot[i]
+	|| srem[i] != refrem[i]
+	|| vrem[i] != refrem[i]
+	|| vrem2[i] != refrem[i])
+      __builtin_abort ();
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times {__divmodv64si4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivmodv64si4@rel32@lo} 0 } } */
+/* { dg-final { scan-assembler-times {__divv64si3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivv64si3@rel32@lo} 0 } } */
+/* { dg-final { scan-assembler-times {__modv64si3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__umodv64si3@rel32@lo} 0 } } */
+/* { dg-final { scan-assembler-times {__divsi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivsi3@rel32@lo} 0 } } */
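
The sign combinations exercised above follow C's truncated-division
rules: the quotient rounds toward zero and the remainder takes the
sign of the dividend, so (a/b)*b + a%b == a always holds.  A
standalone sanity check of those identities (illustration only, not
part of the testsuite):

  #include <assert.h>
  int main (void)
  {
    assert ( 7 /  2 ==  3 &&  7 %  2 ==  1);
    assert (-7 /  2 == -3 && -7 %  2 == -1);
    assert ( 7 / -2 == -3 &&  7 % -2 ==  1);
    assert (-7 / -2 ==  3 && -7 % -2 == -1);
    return 0;
  }
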
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-5-16.c b/gcc/testsuite/gcc.target/gcn/simd-math-5-16.c
new file mode 100644
index 00000000000..241589bfcb5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-5-16.c
@@ -0,0 +1,8 @@
+#define N 16
+#include "simd-math-5.c"
+
+/* { dg-final { scan-assembler-times {__divmodv16si4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__divv16si3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivv16si3@rel32@lo} 0 } } */
+/* { dg-final { scan-assembler-times {__modv16si3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__umodv16si3@rel32@lo} 0 } } */
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-5-32.c b/gcc/testsuite/gcc.target/gcn/simd-math-5-32.c
new file mode 100644
index 00000000000..803e2a966ae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-5-32.c
@@ -0,0 +1,8 @@
+#define N 32
+#include "simd-math-5.c"
+
+/* { dg-final { scan-assembler-times {__divmodv32si4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__divv32si3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivv32si3@rel32@lo} 0 } } */
+/* { dg-final { scan-assembler-times {__modv32si3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__umodv32si3@rel32@lo} 0 } } */
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-5-4.c b/gcc/testsuite/gcc.target/gcn/simd-math-5-4.c
new file mode 100644
index 00000000000..08df45154a5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-5-4.c
@@ -0,0 +1,8 @@
+#define N 4
+#include "simd-math-5.c"
+
+/* { dg-final { scan-assembler-times {__divmodv4si4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__divv4si3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivv4si3@rel32@lo} 0 } } */
+/* { dg-final { scan-assembler-times {__modv4si3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__umodv4si3@rel32@lo} 0 } } */
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-5-8.c b/gcc/testsuite/gcc.target/gcn/simd-math-5-8.c
new file mode 100644
index 00000000000..47afb35a0a9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-5-8.c
@@ -0,0 +1,8 @@
+#define N 8
+#include "simd-math-5.c"
+
+/* { dg-final { scan-assembler-times {__divmodv8si4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__divv8si3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivv8si3@rel32@lo} 0 } } */
+/* { dg-final { scan-assembler-times {__modv8si3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__umodv8si3@rel32@lo} 0 } } */
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-5-char-16.c b/gcc/testsuite/gcc.target/gcn/simd-math-5-char-16.c
new file mode 100644
index 00000000000..0ebf640dc8c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-5-char-16.c
@@ -0,0 +1,11 @@
+#define TYPE char
+#define N 16
+#include "simd-math-5.c"
+
+/* C integer promotion means that div uses HImode and divmod doesn't match.  */
+/* { dg-final { scan-assembler-times {__divmod16.i4@rel32@lo} 1 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {__divv16hi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__divv16qi3@rel32@lo} 0 } } */
+/* { dg-final { scan-assembler-times {__udivv16qi3@rel32@lo} 0 } } */
+/* { dg-final { scan-assembler-times {__modv16qi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__umodv16qi3@rel32@lo} 0 } } */
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-5-char-32.c b/gcc/testsuite/gcc.target/gcn/simd-math-5-char-32.c
new file mode 100644
index 00000000000..0905f31048c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-5-char-32.c
@@ -0,0 +1,11 @@
+#define TYPE char
+#define N 32
+#include "simd-math-5.c"
+
+/* C integer promotion means that div uses HImode and divmod doesn't match.  */
+/* { dg-final { scan-assembler-times {__divmod32.i4@rel32@lo} 1 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {__divv32hi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__divv32qi3@rel32@lo} 0 } } */
+/* { dg-final { scan-assembler-times {__udivv32qi3@rel32@lo} 0 } } */
+/* { dg-final { scan-assembler-times {__modv32qi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__umodv32qi3@rel32@lo} 0 } } */
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-5-char-4.c b/gcc/testsuite/gcc.target/gcn/simd-math-5-char-4.c
new file mode 100644
index 00000000000..772fe37fe81
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-5-char-4.c
@@ -0,0 +1,11 @@
+#define TYPE char
+#define N 4
+#include "simd-math-5.c"
+
+/* C integer promotion means that div uses HImode and divmod doesn't match.  */
+/* { dg-final { scan-assembler-times {__divmod4.i4@rel32@lo} 1 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {__divv4hi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__divv4qi3@rel32@lo} 0 } } */
+/* { dg-final { scan-assembler-times {__udivv4qi3@rel32@lo} 0 } } */
+/* { dg-final { scan-assembler-times {__modv4qi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__umodv4qi3@rel32@lo} 0 } } */
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-5-char-8.c b/gcc/testsuite/gcc.target/gcn/simd-math-5-char-8.c
new file mode 100644
index 00000000000..539ce9a7f91
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-5-char-8.c
@@ -0,0 +1,11 @@
+#define TYPE char
+#define N 8
+#include "simd-math-5.c"
+
+/* C integer promotion means that div uses HImode and divmod doesn't match.  */
+/* { dg-final { scan-assembler-times {__divmod8.i4@rel32@lo} 1 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {__divv8hi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__divv8qi3@rel32@lo} 0 } } */
+/* { dg-final { scan-assembler-times {__udivv8qi3@rel32@lo} 0 } } */
+/* { dg-final { scan-assembler-times {__modv8qi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__umodv8qi3@rel32@lo} 0 } } */
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-5-char-run-16.c b/gcc/testsuite/gcc.target/gcn/simd-math-5-char-run-16.c
new file mode 100644
index 00000000000..0f1af0858ca
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-5-char-run-16.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "simd-math-5-char-16.c"
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-5-char-run-32.c b/gcc/testsuite/gcc.target/gcn/simd-math-5-char-run-32.c
new file mode 100644
index 00000000000..a2794c84a83
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-5-char-run-32.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "simd-math-5-char-32.c"
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-5-char-run-4.c b/gcc/testsuite/gcc.target/gcn/simd-math-5-char-run-4.c
new file mode 100644
index 00000000000..a8e418770ff
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-5-char-run-4.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "simd-math-5-char-4.c"
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-5-char-run-8.c b/gcc/testsuite/gcc.target/gcn/simd-math-5-char-run-8.c
new file mode 100644
index 00000000000..7a6a95922cc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-5-char-run-8.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "simd-math-5-char-8.c"
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-5-char-run.c b/gcc/testsuite/gcc.target/gcn/simd-math-5-char-run.c
new file mode 100644
index 00000000000..d3ca775f6a9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-5-char-run.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "simd-math-5-char.c"
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-5-char.c b/gcc/testsuite/gcc.target/gcn/simd-math-5-char.c
new file mode 100644
index 00000000000..2321c8390c6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-5-char.c
@@ -0,0 +1,10 @@
+#define TYPE char
+#include "simd-math-5.c"
+
+/* C integer promotion means that div uses HImode and divmod doesn't match.  */
+/* { dg-final { scan-assembler-times {__divmodv64si4@rel32@lo} 1 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {__divv64hi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__divv64qi3@rel32@lo} 0 } } */
+/* { dg-final { scan-assembler-times {__udivv64qi3@rel32@lo} 0 } } */
+/* { dg-final { scan-assembler-times {__modv64qi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__umodv64qi3@rel32@lo} 0 } } */
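
To expand on the promotion comment: C promotes both operands of '/'
to int before dividing, so the division is never a QImode operation
at the language level; the vectorizer then narrows it to the smallest
mode that can represent every result, which for char operands is
HImode (e.g. -128 / -1 == 128 does not fit in a signed char).
Roughly:

  char q (char a, char b)
  {
    return a / b;   /* evaluated as (int) a / (int) b, then truncated */
  }
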
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-5-long-16.c b/gcc/testsuite/gcc.target/gcn/simd-math-5-long-16.c
new file mode 100644
index 00000000000..21a447c84e6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-5-long-16.c
@@ -0,0 +1,9 @@
+#define TYPE long
+#define N 16
+#include "simd-math-5.c"
+
+/* { dg-final { scan-assembler-times {__divmodv16di4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__divv16di3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivv16di3@rel32@lo} 0 } } */
+/* { dg-final { scan-assembler-times {__modv16di3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__umodv16di3@rel32@lo} 0 } } */
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-5-long-32.c b/gcc/testsuite/gcc.target/gcn/simd-math-5-long-32.c
new file mode 100644
index 00000000000..624b6097953
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-5-long-32.c
@@ -0,0 +1,9 @@
+#define TYPE long
+#define N 32
+#include "simd-math-5.c"
+
+/* { dg-final { scan-assembler-times {__divmodv32di4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__divv32di3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivv32di3@rel32@lo} 0 } } */
+/* { dg-final { scan-assembler-times {__modv32di3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__umodv32di3@rel32@lo} 0 } } */
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-5-long-4.c b/gcc/testsuite/gcc.target/gcn/simd-math-5-long-4.c
new file mode 100644
index 00000000000..10cf71e6770
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-5-long-4.c
@@ -0,0 +1,9 @@
+#define TYPE long
+#define N 4
+#include "simd-math-5.c"
+
+/* { dg-final { scan-assembler-times {__divmodv4di4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__divv4di3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivv4di3@rel32@lo} 0 } } */
+/* { dg-final { scan-assembler-times {__modv4di3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__umodv4di3@rel32@lo} 0 } } */
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-5-long-8.c b/gcc/testsuite/gcc.target/gcn/simd-math-5-long-8.c
new file mode 100644
index 00000000000..3b264f6cb3e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-5-long-8.c
@@ -0,0 +1,9 @@
+#define TYPE long
+#define N 8
+#include "simd-math-5.c"
+
+/* { dg-final { scan-assembler-times {__divmodv8di4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__divv8di3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivv8di3@rel32@lo} 0 } } */
+/* { dg-final { scan-assembler-times {__modv8di3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__umodv8di3@rel32@lo} 0 } } */
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-5-long-run-16.c b/gcc/testsuite/gcc.target/gcn/simd-math-5-long-run-16.c
new file mode 100644
index 00000000000..20919255404
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-5-long-run-16.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "simd-math-5-long-16.c"
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-5-long-run-32.c b/gcc/testsuite/gcc.target/gcn/simd-math-5-long-run-32.c
new file mode 100644
index 00000000000..c7ff7ca52b9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-5-long-run-32.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "simd-math-5-long-32.c"
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-5-long-run-4.c b/gcc/testsuite/gcc.target/gcn/simd-math-5-long-run-4.c
new file mode 100644
index 00000000000..c6cf3344ec4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-5-long-run-4.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "simd-math-5-long-4.c"
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-5-long-run-8.c b/gcc/testsuite/gcc.target/gcn/simd-math-5-long-run-8.c
new file mode 100644
index 00000000000..85fdf6ffe1b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-5-long-run-8.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "simd-math-5-long-8.c"
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-5-long-run.c b/gcc/testsuite/gcc.target/gcn/simd-math-5-long-run.c
new file mode 100644
index 00000000000..b948fa08c7e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-5-long-run.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "simd-math-5-long.c"
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-5-long.c b/gcc/testsuite/gcc.target/gcn/simd-math-5-long.c
new file mode 100644
index 00000000000..df15b74ccb8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-5-long.c
@@ -0,0 +1,8 @@
+#define TYPE long
+#include "simd-math-5.c"
+
+/* { dg-final { scan-assembler-times {__divmodv64di4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__divv64di3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivv64di3@rel32@lo} 0 } } */
+/* { dg-final { scan-assembler-times {__modv64di3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__umodv64di3@rel32@lo} 0 } } */
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-5-run-16.c b/gcc/testsuite/gcc.target/gcn/simd-math-5-run-16.c
new file mode 100644
index 00000000000..20919255404
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-5-run-16.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "simd-math-5-long-16.c"
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-5-run-32.c b/gcc/testsuite/gcc.target/gcn/simd-math-5-run-32.c
new file mode 100644
index 00000000000..c7ff7ca52b9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-5-run-32.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "simd-math-5-long-32.c"
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-5-run-4.c b/gcc/testsuite/gcc.target/gcn/simd-math-5-run-4.c
new file mode 100644
index 00000000000..c6cf3344ec4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-5-run-4.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "simd-math-5-long-4.c"
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-5-run-8.c b/gcc/testsuite/gcc.target/gcn/simd-math-5-run-8.c
new file mode 100644
index 00000000000..85fdf6ffe1b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-5-run-8.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "simd-math-5-long-8.c"
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-5-run.c b/gcc/testsuite/gcc.target/gcn/simd-math-5-run.c
new file mode 100644
index 00000000000..de6504c737c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-5-run.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "simd-math-5.c"
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-5-short-16.c b/gcc/testsuite/gcc.target/gcn/simd-math-5-short-16.c
new file mode 100644
index 00000000000..5d5953bc604
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-5-short-16.c
@@ -0,0 +1,11 @@
+#define TYPE short
+#define N 16
+#include "simd-math-5.c"
+
+/* C integer promotion means that div uses SImode and divmod doesn't match.  */
+/* { dg-final { scan-assembler-times {__divmod16.i4@rel32@lo} 1 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {__divv16si3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__divv16hi3@rel32@lo} 0 } } */
+/* { dg-final { scan-assembler-times {__udivv16hi3@rel32@lo} 0 } } */
+/* { dg-final { scan-assembler-times {__modv16hi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__umodv16hi3@rel32@lo} 0 } } */
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-5-short-32.c b/gcc/testsuite/gcc.target/gcn/simd-math-5-short-32.c
new file mode 100644
index 00000000000..bf8a3addfc4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-5-short-32.c
@@ -0,0 +1,11 @@
+#define TYPE short
+#define N 32
+#include "simd-math-5.c"
+
+/* C integer promotion means that div uses SImode and divmod doesn't match.  */
+/* { dg-final { scan-assembler-times {__divmod32.i4@rel32@lo} 1 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {__divv32si3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__divv32hi3@rel32@lo} 0 } } */
+/* { dg-final { scan-assembler-times {__udivv32hi3@rel32@lo} 0 } } */
+/* { dg-final { scan-assembler-times {__modv32hi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__umodv32hi3@rel32@lo} 0 } } */
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-5-short-4.c b/gcc/testsuite/gcc.target/gcn/simd-math-5-short-4.c
new file mode 100644
index 00000000000..a2cb46c8646
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-5-short-4.c
@@ -0,0 +1,11 @@
+#define TYPE short
+#define N 4
+#include "simd-math-5.c"
+
+/* C integer promotion means that div uses SImode and divmod doesn't match.  */
+/* { dg-final { scan-assembler-times {__divmod4.i4@rel32@lo} 1 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {__divv4si3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__divv4hi3@rel32@lo} 0 } } */
+/* { dg-final { scan-assembler-times {__udivv4hi3@rel32@lo} 0 } } */
+/* { dg-final { scan-assembler-times {__modv4hi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__umodv4hi3@rel32@lo} 0 } } */
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-5-short-8.c b/gcc/testsuite/gcc.target/gcn/simd-math-5-short-8.c
new file mode 100644
index 00000000000..fa343e5262e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-5-short-8.c
@@ -0,0 +1,11 @@
+#define TYPE short
+#define N 8
+#include "simd-math-5.c"
+
+/* C integer promotion means that div uses SImode and divmod doesn't match.  */
+/* { dg-final { scan-assembler-times {__divmod8.i4@rel32@lo} 1 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {__divv8si3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__divv8hi3@rel32@lo} 0 } } */
+/* { dg-final { scan-assembler-times {__udivv8hi3@rel32@lo} 0 } } */
+/* { dg-final { scan-assembler-times {__modv8hi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__umodv8hi3@rel32@lo} 0 } } */
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-5-short-run-16.c b/gcc/testsuite/gcc.target/gcn/simd-math-5-short-run-16.c
new file mode 100644
index 00000000000..3fc946e2e09
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-5-short-run-16.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "simd-math-5-short-16.c"
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-5-short-run-32.c b/gcc/testsuite/gcc.target/gcn/simd-math-5-short-run-32.c
new file mode 100644
index 00000000000..34b1d75fa8e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-5-short-run-32.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "simd-math-5-short-32.c"
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-5-short-run-4.c b/gcc/testsuite/gcc.target/gcn/simd-math-5-short-run-4.c
new file mode 100644
index 00000000000..09385c78bc4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-5-short-run-4.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "simd-math-5-short-4.c"
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-5-short-run-8.c b/gcc/testsuite/gcc.target/gcn/simd-math-5-short-run-8.c
new file mode 100644
index 00000000000..1de4d2631b3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-5-short-run-8.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "simd-math-5-short-8.c"
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-5-short-run.c b/gcc/testsuite/gcc.target/gcn/simd-math-5-short-run.c
new file mode 100644
index 00000000000..2e0c490fdd9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-5-short-run.c
@@ -0,0 +1,2 @@
+/* { dg-do run } */
+#include "simd-math-5-short.c"
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-5-short.c b/gcc/testsuite/gcc.target/gcn/simd-math-5-short.c
new file mode 100644
index 00000000000..84cdc9b5fdd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-5-short.c
@@ -0,0 +1,10 @@
+#define TYPE short
+#include "simd-math-5.c"
+
+/* C integer promotion means that div uses SImode and divmod doesn't match.  */
+/* { dg-final { scan-assembler-times {__divmodv64si4@rel32@lo} 1 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {__divv64si3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__divv64hi3@rel32@lo} 0 } } */
+/* { dg-final { scan-assembler-times {__udivv64hi3@rel32@lo} 0 } } */
+/* { dg-final { scan-assembler-times {__modv64hi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__umodv64hi3@rel32@lo} 0 } } */
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-5.c b/gcc/testsuite/gcc.target/gcn/simd-math-5.c
new file mode 100644
index 00000000000..26e97070bf9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-5.c
@@ -0,0 +1,88 @@
+/* Test that the auto-vectorizer uses the libgcc vectorized division and
+   modulus functions.  */
+
+/* Setting it this way ensures the run tests use the same flag as the
+   compile tests.  */
+#pragma GCC optimize("O2")
+
+#ifndef TYPE
+#define TYPE int
+#endif
+#ifndef N
+#define N 64
+#endif
+
+TYPE a[N];
+TYPE b[N];
+
+int main()
+{
+  int i;
+  TYPE quot[N], rem[N];
+  TYPE quot2[N], rem2[N];
+  TYPE refquot[N], refrem[N];
+
+  for (i = 0; i < N; i++)
+    {
+      a[i] = i * (i >> 2) + (i >> 1);
+      b[i] = i;
+    }
+  __asm__ ("" ::: "memory");
+
+  /* Vector divmod.  */
+  for (i = 0; i < N; i++)
+    {
+      quot[i] = (TYPE)a[i] / (TYPE)b[i];
+      rem[i] = (TYPE)a[i] % (TYPE)b[i];
+    }
+  __asm__ ("" ::: "memory");
+
+  /* Vector div.  */
+  for (i = 0; i < N; i++)
+    quot2[i] = (TYPE)a[i] / (TYPE)b[i];
+  __asm__ ("" ::: "memory");
+
+  /* Vector mod.  */
+  for (i = 0; i < N; i++)
+    rem2[i] = (TYPE)a[i] % (TYPE)b[i];
+
+  /* Calculate reference values with no vectorization.  */
+  for (i = 0; i < N; i++)
+    {
+      refquot[i] = (TYPE)a[i] / (TYPE)b[i];
+      __asm__ ("" ::: "memory");
+      refrem[i] = (TYPE)a[i] % (TYPE)b[i];
+    }
+
+#ifdef DEBUG
+#define DUMP(VAR) \
+  __builtin_printf ("%8s: ", #VAR); \
+  for (i = 0; i < N; i++) \
+    __builtin_printf ("%d ", (int)VAR[i]); \
+  __builtin_printf ("\n");
+  DUMP (refquot)
+  DUMP (quot)
+  DUMP (quot2)
+  __builtin_printf ("\n");
+  DUMP (refrem)
+  DUMP (rem)
+  DUMP (rem2)
+#endif
+
+  for (i = 0; i < N; i++)
+    if (quot[i] != refquot[i]
+	|| quot2[i] != refquot[i]
+	|| rem[i] != refrem[i]
+	|| rem2[i] != refrem[i])
+      __builtin_abort ();
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times {__divmodv64si4@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivmodv64si4@rel32@lo} 0 } } */
+/* { dg-final { scan-assembler-times {__divv64si3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivv64si3@rel32@lo} 0 } } */
+/* { dg-final { scan-assembler-times {__modv64si3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__umodv64si3@rel32@lo} 0 } } */
+/* { dg-final { scan-assembler-times {__divsi3@rel32@lo} 1 } } */
+/* { dg-final { scan-assembler-times {__udivsi3@rel32@lo} 0 } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 95cbb1afa16..2eb3738d921 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -8432,8 +8432,9 @@ proc check_effective_target_vect_long_mult { } {
 
 proc check_effective_target_vect_int_mod { } {
     return [check_cached_effective_target_indexed vect_int_mod {
-      expr { [istarget powerpc*-*-*]
-	     && [check_effective_target_has_arch_pwr10] }}]
+      expr { ([istarget powerpc*-*-*]
+	      && [check_effective_target_has_arch_pwr10])
+             || [istarget amdgcn-*-*] }}]
 }
 
 # Return 1 if the target supports vector even/odd elements extraction, 0 otherwise.
@@ -11566,7 +11567,8 @@ proc check_effective_target_divmod { } {
     #TODO: Add checks for all targets that have either hardware divmod insn
     # or define libfunc for divmod.
     if { [istarget arm*-*-*]
-	 || [istarget i?86-*-*] || [istarget x86_64-*-*] } {
+	 || [istarget i?86-*-*] || [istarget x86_64-*-*]
+         || [istarget amdgcn-*-*] } {
 	return 1
     }
     return 0
diff --git a/gcc/tree-vect-generic.cc b/gcc/tree-vect-generic.cc
index 445da53292e..1c6fa501a5d 100644
--- a/gcc/tree-vect-generic.cc
+++ b/gcc/tree-vect-generic.cc
@@ -44,6 +44,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-fold.h"
 #include "gimple-match.h"
 #include "recog.h"		/* FIXME: for insn_data */
+#include "optabs-libfuncs.h"
 
 
 /* Build a ternary operation and gimplify it.  Emit code before GSI.
@@ -1743,7 +1744,8 @@ get_compute_type (enum tree_code code, optab op, tree type)
       machine_mode compute_mode = TYPE_MODE (compute_type);
       if (VECTOR_MODE_P (compute_mode))
 	{
-	  if (op && optab_handler (op, compute_mode) != CODE_FOR_nothing)
+	  if (op && (optab_handler (op, compute_mode) != CODE_FOR_nothing
+		     || optab_libfunc (op, compute_mode)))
 	    return compute_type;
 	  if (code == MULT_HIGHPART_EXPR
 	      && can_mult_highpart_p (compute_mode,
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 272839a658c..5f490a8d280 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -55,6 +55,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-fold.h"
 #include "regs.h"
 #include "attribs.h"
+#include "optabs-libfuncs.h"
 
 /* For lang_hooks.types.type_for_mode.  */
 #include "langhooks.h"
@@ -6377,8 +6378,8 @@ vectorizable_operation (vec_info *vinfo,
                              "no optab.\n");
 	  return false;
 	}
-      target_support_p = (optab_handler (optab, vec_mode)
-			  != CODE_FOR_nothing);
+      target_support_p = (optab_handler (optab, vec_mode) != CODE_FOR_nothing
+			  || optab_libfunc (optab, vec_mode));
     }
 
   bool using_emulated_vectors_p = vect_emulated_vector_p (vectype);
diff --git a/libgcc/config/gcn/amdgcn_veclib.h b/libgcc/config/gcn/amdgcn_veclib.h
new file mode 100644
index 00000000000..3bbd71ed077
--- /dev/null
+++ b/libgcc/config/gcn/amdgcn_veclib.h
@@ -0,0 +1,314 @@
+/*
+ * Copyright 2023 Siemens
+ *
+ * The authors hereby grant permission to use, copy, modify, distribute,
+ * and license this software and its documentation for any purpose, provided
+ * that existing copyright notices are retained in all copies and that this
+ * notice is included verbatim in any distributions.  No written agreement,
+ * license, or royalty fee is required for any of the authorized uses.
+ * Modifications to this software may be copyrighted by their authors
+ * and need not follow the licensing terms described here, provided that
+ * the new terms are clearly indicated on the first page of each file where
+ * they apply.
+ */
+
+/* Macro library that assists in converting scalar math functions to
+   vectorized SIMD equivalents on AMD GCN.  */
+
+typedef union {
+  v2sf t_v2sf;
+  v4sf t_v4sf;
+  v8sf t_v8sf;
+  v16sf t_v16sf;
+  v32sf t_v32sf;
+  v64sf t_v64sf;
+
+  v2df t_v2df;
+  v4df t_v4df;
+  v8df t_v8df;
+  v16df t_v16df;
+  v32df t_v32df;
+  v64df t_v64df;
+
+  v64qi t_v64qi;
+  v64hi t_v64hi;
+
+  v2si t_v2si;
+  v4si t_v4si;
+  v8si t_v8si;
+  v16si t_v16si;
+  v32si t_v32si;
+  v64si t_v64si;
+
+  v64usi t_v64usi;
+
+  v2di t_v2di;
+  v4di t_v4di;
+  v8di t_v8di;
+  v16di t_v16di;
+  v32di t_v32di;
+  v64di t_v64di;
+} vector_union;
+
+/* Cast between vectors with a different number of elements, or type.  */
+
+#define VGPR_CAST(to_t, from) \
+({ \
+  to_t __res; \
+  __asm__ ("" : "=v"(__res) : "0"(from)); \
+  __res; \
+})
+
+#define PACK_SI_PAIR(low, high) \
+({ \
+  v64udi __res; \
+  asm ("v_mov_b32\t%L0, %1\n\t" \
+       "v_mov_b32\t%H0, %2" \
+       : "=&v"(__res) : "v0"(low), "v"(high), "e"(-1L)); \
+  __res; \
+ })
+
+#define UNPACK_SI_LOW(to_t, pair) VGPR_CAST(to_t, pair)
+#define UNPACK_SI_HIGH(to_t, pair) \
+({ \
+  to_t __res; \
+  asm ("v_mov_b32\t%0, %H1" : "=v"(__res) : "v"(pair), "e"(-1L)); \
+  __res; \
+ })
+
+#define PACK_DI_PAIR(low, high) \
+({ \
+  v64uti __res; \
+  asm ("v_mov_b32\t%L0, %L1\n\t" \
+       "v_mov_b32\t%H0, %H1\n\t" \
+       "v_mov_b32\t%J0, %L2\n\t" \
+       "v_mov_b32\t%K0, %H2" \
+       : "=&v"(__res) : "v0"(low), "v"(high), "e"(-1L)); \
+  __res; \
+ })
+
+#define UNPACK_DI_LOW(to_t, pair) VGPR_CAST(to_t, pair)
+#define UNPACK_DI_HIGH(to_t, pair) \
+({ \
+  to_t __res; \
+  asm ("v_mov_b32\t%L0, %J1\n\t" \
+       "v_mov_b32\t%H0, %K1" : "=v"(__res) : "v"(pair), "e"(-1L)); \
+  __res; \
+ })
+
+#define NO_COND __mask
+
+/* Note - __mask is _not_ accounted for in VECTOR_MERGE!  */
+#define VECTOR_MERGE(vec1, vec2, cond) \
+({ \
+  _Static_assert (__builtin_types_compatible_p (typeof (vec1), typeof (vec2))); \
+  union { \
+    typeof (vec1) val; \
+    v64qi t_v64qi; \
+    v64hi t_v64hi; \
+    v64si t_v64si; \
+    v64di t_v64di; \
+  } __vec1, __vec2, __res; \
+  __vec1.val = (vec1); \
+  __vec2.val = (vec2); \
+  __builtin_choose_expr ( \
+        sizeof (vec1) == sizeof (v64si), \
+        ({ \
+          v64si __bitmask = __builtin_convertvector ((cond), v64si); \
+          __res.t_v64si = (__vec1.t_v64si & __bitmask) \
+                          | (__vec2.t_v64si & ~__bitmask); \
+        }), \
+	__builtin_choose_expr ( \
+	  sizeof (vec1) == sizeof (v64hi), \
+	  ({ \
+	    v64hi __bitmask = __builtin_convertvector ((cond), v64hi); \
+	    __res.t_v64hi = (__vec1.t_v64hi & __bitmask) \
+			    | (__vec2.t_v64hi & ~__bitmask); \
+	   }), \
+	   __builtin_choose_expr ( \
+	     sizeof (vec1) == sizeof (v64qi), \
+	     ({ \
+	     v64qi __bitmask = __builtin_convertvector ((cond), v64qi); \
+	     __res.t_v64qi = (__vec1.t_v64qi & __bitmask) \
+			      | (__vec2.t_v64qi & ~__bitmask); \
+	     }), \
+	     ({ \
+	      v64di __bitmask = __builtin_convertvector ((cond), v64di); \
+	      __res.t_v64di = (__vec1.t_v64di & __bitmask) \
+			      | (__vec2.t_v64di & ~__bitmask); \
+	      })))); \
+  __res.val; \
+})
+
+#define VECTOR_COND_MOVE(var, val, cond) \
+do { \
+  _Static_assert (__builtin_types_compatible_p (typeof (var), typeof (val))); \
+  __auto_type __cond = __builtin_convertvector ((cond), typeof (__mask)); \
+  var = VECTOR_MERGE ((val), var, __cond & __mask); \
+} while (0)
+
+#define VECTOR_IF(cond, cond_var) \
+{ \
+  __auto_type cond_var = (cond); \
+  __auto_type __inv_cond __attribute__((unused)) = ~cond_var; \
+  if (!ALL_ZEROES_P (cond_var)) \
+  {
+
+#define VECTOR_ELSEIF(cond, cond_var) \
+  } \
+  cond_var = __inv_cond & (cond); \
+  __inv_cond &= ~(cond); \
+  if (!ALL_ZEROES_P (cond_var)) \
+  {
+
+#define VECTOR_ELSE(cond_var) \
+  } \
+  cond_var = __inv_cond; \
+  if (!ALL_ZEROES_P (cond_var)) \
+  {
+
+#define VECTOR_IF2(cond, cond_var, prev_cond_var) \
+{ \
+  __auto_type cond_var = (cond) & __builtin_convertvector (prev_cond_var, typeof (cond)); \
+  __auto_type __inv_cond __attribute__((unused)) = ~cond_var; \
+  if (!ALL_ZEROES_P (cond_var)) \
+  {
+
+#define VECTOR_ELSEIF2(cond, cond_var, prev_cond_var) \
+  } \
+  cond_var = (cond) & __inv_cond & __builtin_convertvector (prev_cond_var, typeof (cond)); \
+  __inv_cond &= ~(cond); \
+  if (!ALL_ZEROES_P (cond_var)) \
+  {
+
+#define VECTOR_ELSE2(cond_var, prev_cond_var) \
+  } \
+  cond_var = __inv_cond & __builtin_convertvector (prev_cond_var, typeof (__inv_cond)); \
+  if (!ALL_ZEROES_P (cond_var)) \
+  {
+
+
+#define VECTOR_ENDIF \
+  } \
+}
+
+#define VECTOR_INIT_AUX(x, type) \
+({ \
+  typeof (x) __e = (x); \
+  type __tmp = { \
+    __e, __e, __e, __e, __e, __e, __e, __e, \
+    __e, __e, __e, __e, __e, __e, __e, __e, \
+    __e, __e, __e, __e, __e, __e, __e, __e, \
+    __e, __e, __e, __e, __e, __e, __e, __e, \
+    __e, __e, __e, __e, __e, __e, __e, __e, \
+    __e, __e, __e, __e, __e, __e, __e, __e, \
+    __e, __e, __e, __e, __e, __e, __e, __e, \
+    __e, __e, __e, __e, __e, __e, __e, __e }; \
+  __tmp; \
+})
+
+#define VECTOR_INIT(x) \
+  (_Generic ((x), int: VECTOR_INIT_AUX ((x), v64si), \
+                  unsigned: VECTOR_INIT_AUX ((x), v64usi), \
+                  char: VECTOR_INIT_AUX ((x), v64qi), \
+                  unsigned char: VECTOR_INIT_AUX ((x), v64uqi), \
+                  short: VECTOR_INIT_AUX ((x), v64hi), \
+                  unsigned short: VECTOR_INIT_AUX ((x), v64uhi), \
+                  long: VECTOR_INIT_AUX ((x), v64di), \
+                  unsigned long: VECTOR_INIT_AUX ((x), v64udi), \
+                  float: VECTOR_INIT_AUX ((x), v64sf), \
+                  double: VECTOR_INIT_AUX ((x), v64df)))
+
+
+#if defined (__GCN3__) || defined (__GCN5__) \
+    || defined (__CDNA1__) || defined (__CDNA2__)
+#define CDNA3_PLUS 0
+#else
+#define CDNA3_PLUS 1
+#endif
+
+#define VECTOR_INIT_MASK(COUNT) \
+({ \
+  MASKMODE __mask; \
+  int count = (COUNT); \
+  if (count == 64) \
+    { \
+      if (sizeof (MASKMODE) < 512 || CDNA3_PLUS) \
+	asm ("v_mov%B0\t%0, -1" : "=v"(__mask) : "e"(-1L)); \
+      else \
+	asm ("v_mov_b32\t%L0, -1\n\t" \
+	     "v_mov_b32\t%H0, -1" : "=v"(__mask) : "e"(-1L)); \
+    } \
+  else \
+    { \
+      long bitmask = (count == 64 ? -1 : (1L<<count)-1); \
+      if (sizeof (MASKMODE) < 512 || CDNA3_PLUS) \
+        { \
+	  asm ("v_mov%B0\t%0, 0" : "=v"(__mask) : "e"(-1L)); \
+	  asm ("v_mov%B0\t%0, -1" : "+v"(__mask) : "e"(bitmask)); \
+	} \
+      else \
+        { \
+	  asm ("v_mov_b32\t%L0, 0\n\t" \
+	       "v_mov_b32\t%H0, 0" : "=v"(__mask) : "e"(-1L)); \
+	  asm ("v_mov_b32\t%L0, -1\n\t" \
+	       "v_mov_b32\t%H0, -1" : "+v"(__mask) : "e"(bitmask)); \
+	} \
+    } \
+  __mask; \
+})
+
+#define ALL_ZEROES_P(x) (COND_TO_BITMASK(x) == 0)
+
+#define COND_TO_BITMASK(x) \
+({ \
+  long __tmp = 0; \
+  __auto_type __x = __builtin_convertvector((x), typeof (__mask)) & __mask; \
+  __builtin_choose_expr (sizeof (__mask) != 512, \
+                         ({ asm ("v_cmp_ne_u32_e64 %0, %1, 0" \
+                                 : "=Sg" (__tmp) \
+                                 : "v" (__x)); }), \
+                         ({ asm ("v_cmp_ne_u64_e64 %0, %1, 0" \
+                                 : "=Sg" (__tmp) \
+                                 : "v" (__x)); })); \
+  __tmp; \
+})
+
+#define VECTOR_WHILE(cond, cond_var, prev_cond_var) \
+{ \
+  __auto_type cond_var = prev_cond_var; \
+  for (;;) { \
+    cond_var &= (cond); \
+    if (ALL_ZEROES_P (cond_var)) \
+      break;
+
+#define VECTOR_ENDWHILE \
+  } \
+}
+
+#define DEF_VARIANT(FUN, SUFFIX, OTYPE, TYPE, COUNT) \
+v##COUNT##OTYPE \
+FUN##v##COUNT##SUFFIX (v##COUNT##TYPE __arg1, v##COUNT##TYPE __arg2) \
+{ \
+  __auto_type __upsized_arg1 = VGPR_CAST (v64##TYPE, __arg1); \
+  __auto_type __upsized_arg2 = VGPR_CAST (v64##TYPE, __arg2); \
+  __auto_type __mask = VECTOR_INIT_MASK (COUNT); \
+  __auto_type __result = FUN##v64##SUFFIX##_aux (__upsized_arg1, __upsized_arg2, __mask); \
+  return VGPR_CAST (v##COUNT##OTYPE, __result); \
+}
+
+#define DEF_VARIANTS(FUN, SUFFIX, TYPE) \
+  DEF_VARIANT (FUN, SUFFIX, TYPE, TYPE, 2) \
+  DEF_VARIANT (FUN, SUFFIX, TYPE, TYPE, 4) \
+  DEF_VARIANT (FUN, SUFFIX, TYPE, TYPE, 8) \
+  DEF_VARIANT (FUN, SUFFIX, TYPE, TYPE, 16) \
+  DEF_VARIANT (FUN, SUFFIX, TYPE, TYPE, 32) \
+  DEF_VARIANT (FUN, SUFFIX, TYPE, TYPE, 64)
+
+#define DEF_VARIANTS_B(FUN, SUFFIX, OTYPE, TYPE) \
+  DEF_VARIANT (FUN, SUFFIX, OTYPE, TYPE, 2) \
+  DEF_VARIANT (FUN, SUFFIX, OTYPE, TYPE, 4) \
+  DEF_VARIANT (FUN, SUFFIX, OTYPE, TYPE, 8) \
+  DEF_VARIANT (FUN, SUFFIX, OTYPE, TYPE, 16) \
+  DEF_VARIANT (FUN, SUFFIX, OTYPE, TYPE, 32) \
+  DEF_VARIANT (FUN, SUFFIX, OTYPE, TYPE, 64)
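
For context, a hypothetical instantiation of these macros (the names
below are illustrative, not taken from this patch): given a 64-lane
worker __divv64si3_aux taking two v64si arguments plus a lane mask,
DEF_VARIANTS emits the six entry points __divv2si3 through
__divv64si3, each widening its arguments with VGPR_CAST, building a
mask with VECTOR_INIT_MASK, and calling the worker on the live lanes:

  #define MASKMODE v64si          /* mask type used by VECTOR_INIT_MASK */
  DEF_VARIANTS (__div, si3, si)   /* defines __divv2si3 ... __divv64si3 */
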
diff --git a/libgcc/config/gcn/lib2-divmod-di.c b/libgcc/config/gcn/lib2-divmod-di.c
index a9023770c27..d0385f3b28c 100644
--- a/libgcc/config/gcn/lib2-divmod-di.c
+++ b/libgcc/config/gcn/lib2-divmod-di.c
@@ -22,14 +22,101 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 
 #include "lib2-gcn.h"
 
-/* We really want DImode here: override LIBGCC2_UNITS_PER_WORD.  */
-#define LIBGCC2_UNITS_PER_WORD 4
-#define TARGET_HAS_NO_HW_DIVIDE
+/* 64-bit DI divide and modulo as used in gcn.  */
 
-#define L_divmoddi4
-#define L_divdi3
-#define L_moddi3
-#define L_udivdi3
-#define L_umoddi3
+union pack {
+  UTItype ti;
+  struct {DItype quot, rem;} pair;
+};
+union upack {
+  UTItype ti;
+  struct {UDItype quot, rem;} pair;
+};
+
+UTItype
+__udivmoddi4 (UDItype num, UDItype den)
+{
+  UDItype bit = 1;
+  union upack res = {0};
+
+  while (den < num && bit && !(den & (1L<<63)))
+    {
+      den <<=1;
+      bit <<=1;
+    }
+  while (bit)
+    {
+      if (num >= den)
+	{
+	  num -= den;
+	  res.pair.quot |= bit;
+	}
+      bit >>=1;
+      den >>=1;
+    }
+  res.pair.rem = num;
+  return res.ti;
+}
+
+UTItype
+__divmoddi4 (DItype a, DItype b)
+{
+  word_type nega = 0, negb = 0;
+  union pack res;
+
+  if (a < 0)
+    {
+      a = -a;
+      nega = 1;
+    }
+
+  if (b < 0)
+    {
+      b = -b;
+      negb = 1;
+    }
+
+  res.ti = __udivmoddi4 (a, b);
+
+  if (nega)
+    res.pair.rem = -res.pair.rem;
+  if (nega ^ negb)
+    res.pair.quot = -res.pair.quot;
+
+  return res.ti;
+}
+
+
+DItype
+__divdi3 (DItype a, DItype b)
+{
+  union pack u;
+  u.ti = __divmoddi4 (a, b);
+  return u.pair.quot;
+}
+
+DItype
+__moddi3 (DItype a, DItype b)
+{
+  union pack u;
+  u.ti = __divmoddi4 (a, b);
+  return u.pair.rem;
+}
+
+
+UDItype
+__udivdi3 (UDItype a, UDItype b)
+{
+  union upack u;
+  u.ti = __udivmoddi4 (a, b);
+  return u.pair.quot;
+}
+
+UDItype
+__umoddi3 (UDItype a, UDItype b)
+{
+  union upack u;
+  u.ti = __udivmoddi4 (a, b);
+  return u.pair.rem;
+}
 
-#include "libgcc2.c"
diff --git a/libgcc/config/gcn/lib2-divmod-hi.c b/libgcc/config/gcn/lib2-divmod-hi.c
index f4584aabcd9..d6f4bd37a72 100644
--- a/libgcc/config/gcn/lib2-divmod-hi.c
+++ b/libgcc/config/gcn/lib2-divmod-hi.c
@@ -24,11 +24,20 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 
 /* 16-bit HI divide and modulo as used in gcn.  */
 
-static UHItype
-udivmodhi4 (UHItype num, UHItype den, word_type modwanted)
+union pack {
+  UDItype di;
+  struct {HItype quot, rem;} pair;
+};
+union upack {
+  UDItype di;
+  struct {UHItype quot, rem;} pair;
+};
+
+UDItype
+__udivmodhi4 (UHItype num, UHItype den)
 {
   UHItype bit = 1;
-  UHItype res = 0;
+  union upack res = {0};
 
   while (den < num && bit && !(den & (1L<<15)))
     {
@@ -40,78 +49,75 @@ udivmodhi4 (UHItype num, UHItype den, word_type modwanted)
       if (num >= den)
 	{
 	  num -= den;
-	  res |= bit;
+	  res.pair.quot |= bit;
 	}
       bit >>=1;
       den >>=1;
     }
-  if (modwanted)
-    return num;
-  return res;
+  res.pair.rem = num;
+  return res.di;
 }
 
-
-HItype
-__divhi3 (HItype a, HItype b)
+UDItype
+__divmodhi4 (HItype a, HItype b)
 {
-  word_type neg = 0;
-  HItype res;
+  word_type nega = 0, negb = 0;
+  union pack res;
 
   if (a < 0)
     {
       a = -a;
-      neg = !neg;
+      nega = 1;
     }
 
   if (b < 0)
     {
       b = -b;
-      neg = !neg;
+      negb = 1;
     }
 
-  res = udivmodhi4 (a, b, 0);
+  res.di = __udivmodhi4 (a, b);
 
-  if (neg)
-    res = -res;
+  if (nega)
+    res.pair.rem = -res.pair.rem;
+  if (nega ^ negb)
+    res.pair.quot = -res.pair.quot;
 
-  return res;
+  return res.di;
 }
 
 
 HItype
-__modhi3 (HItype a, HItype b)
+__divhi3 (HItype a, HItype b)
 {
-  word_type neg = 0;
-  HItype res;
-
-  if (a < 0)
-    {
-      a = -a;
-      neg = 1;
-    }
-
-  if (b < 0)
-    b = -b;
-
-  res = udivmodhi4 (a, b, 1);
-
-  if (neg)
-    res = -res;
+  union pack u;
+  u.di = __divmodhi4 (a, b);
+  return u.pair.quot;
+}
 
-  return res;
+HItype
+__modhi3 (HItype a, HItype b)
+{
+  union pack u;
+  u.di = __divmodhi4 (a, b);
+  return u.pair.rem;
 }
 
 
 UHItype
 __udivhi3 (UHItype a, UHItype b)
 {
-  return udivmodhi4 (a, b, 0);
+  union pack u;
+  u.di = __udivmodhi4 (a, b);
+  return u.pair.quot;
 }
 
 
 UHItype
 __umodhi3 (UHItype a, UHItype b)
 {
-  return udivmodhi4 (a, b, 1);
+  union pack u;
+  u.di = __udivmodhi4 (a, b);
+  return u.pair.rem;
 }
 
diff --git a/libgcc/config/gcn/lib2-divmod.c b/libgcc/config/gcn/lib2-divmod.c
index c350f7858f1..d701d1a4f58 100644
--- a/libgcc/config/gcn/lib2-divmod.c
+++ b/libgcc/config/gcn/lib2-divmod.c
@@ -24,11 +24,20 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 
 /* 32-bit SI divide and modulo as used in gcn.  */
 
-static USItype
-udivmodsi4 (USItype num, USItype den, word_type modwanted)
+union pack {
+  UDItype di;
+  struct {SItype quot, rem;} pair;
+};
+union upack {
+  UDItype di;
+  struct {USItype quot, rem;} pair;
+};
+
+UDItype
+__udivmodsi4 (USItype num, USItype den)
 {
   USItype bit = 1;
-  USItype res = 0;
+  union upack res = {0};
 
   while (den < num && bit && !(den & (1L<<31)))
     {
@@ -40,78 +49,75 @@ udivmodsi4 (USItype num, USItype den, word_type modwanted)
       if (num >= den)
 	{
 	  num -= den;
-	  res |= bit;
+	  res.pair.quot |= bit;
 	}
       bit >>=1;
       den >>=1;
     }
-  if (modwanted)
-    return num;
-  return res;
+  res.pair.rem = num;
+  return res.di;
 }
 
-
-SItype
-__divsi3 (SItype a, SItype b)
+UDItype
+__divmodsi4 (SItype a, SItype b)
 {
-  word_type neg = 0;
-  SItype res;
+  word_type nega = 0, negb = 0;
+  union pack res;
 
   if (a < 0)
     {
       a = -a;
-      neg = !neg;
+      nega = 1;
     }
 
   if (b < 0)
     {
       b = -b;
-      neg = !neg;
+      negb = 1;
     }
 
-  res = udivmodsi4 (a, b, 0);
+  res.di = __udivmodsi4 (a, b);
 
-  if (neg)
-    res = -res;
+  if (nega)
+    res.pair.rem = -res.pair.rem;
+  if (nega ^ negb)
+    res.pair.quot = -res.pair.quot;
 
-  return res;
+  return res.di;
 }
 
 
 SItype
-__modsi3 (SItype a, SItype b)
+__divsi3 (SItype a, SItype b)
 {
-  word_type neg = 0;
-  SItype res;
-
-  if (a < 0)
-    {
-      a = -a;
-      neg = 1;
-    }
-
-  if (b < 0)
-    b = -b;
-
-  res = udivmodsi4 (a, b, 1);
-
-  if (neg)
-    res = -res;
+  union pack u;
+  u.di = __divmodsi4 (a, b);
+  return u.pair.quot;
+}
 
-  return res;
+SItype
+__modsi3 (SItype a, SItype b)
+{
+  union pack u;
+  u.di = __divmodsi4 (a, b);
+  return u.pair.rem;
 }
 
 
 USItype
 __udivsi3 (USItype a, USItype b)
 {
-  return udivmodsi4 (a, b, 0);
+  union pack u;
+  u.di = __udivmodsi4 (a, b);
+  return u.pair.quot;
 }
 
 
 USItype
 __umodsi3 (USItype a, USItype b)
 {
-  return udivmodsi4 (a, b, 1);
+  union pack u;
+  u.di = __udivmodsi4 (a, b);
+  return u.pair.rem;
 }
 
diff --git a/libgcc/config/gcn/lib2-gcn.h b/libgcc/config/gcn/lib2-gcn.h
index 645245b2128..b004d039df3 100644
--- a/libgcc/config/gcn/lib2-gcn.h
+++ b/libgcc/config/gcn/lib2-gcn.h
@@ -39,19 +39,135 @@ typedef int TItype __attribute__ ((mode (TI)));
 typedef unsigned int UTItype __attribute__ ((mode (TI)));
 typedef int word_type __attribute__ ((mode (__word__)));
 
+typedef float v2sf __attribute__ ((vector_size (8)));
+typedef float v4sf __attribute__ ((vector_size (16)));
+typedef float v8sf __attribute__ ((vector_size (32)));
+typedef float v16sf __attribute__ ((vector_size (64)));
+typedef float v32sf __attribute__ ((vector_size (128)));
+typedef float v64sf __attribute__ ((vector_size (256)));
+
+typedef double v2df __attribute__ ((vector_size (16)));
+typedef double v4df __attribute__ ((vector_size (32)));
+typedef double v8df __attribute__ ((vector_size (64)));
+typedef double v16df __attribute__ ((vector_size (128)));
+typedef double v32df __attribute__ ((vector_size (256)));
+typedef double v64df __attribute__ ((vector_size (512)));
+
+typedef signed char v2qi __attribute__ ((vector_size (2)));
+typedef signed char v4qi __attribute__ ((vector_size (4)));
+typedef signed char v8qi __attribute__ ((vector_size (8)));
+typedef signed char v16qi __attribute__ ((vector_size (16)));
+typedef signed char v32qi __attribute__ ((vector_size (32)));
+typedef signed char v64qi __attribute__ ((vector_size (64)));
+
+typedef unsigned char v2uqi __attribute__ ((vector_size (2)));
+typedef unsigned char v4uqi __attribute__ ((vector_size (4)));
+typedef unsigned char v8uqi __attribute__ ((vector_size (8)));
+typedef unsigned char v16uqi __attribute__ ((vector_size (16)));
+typedef unsigned char v32uqi __attribute__ ((vector_size (32)));
+typedef unsigned char v64uqi __attribute__ ((vector_size (64)));
+
+typedef short v2hi __attribute__ ((vector_size (4)));
+typedef short v4hi __attribute__ ((vector_size (8)));
+typedef short v8hi __attribute__ ((vector_size (16)));
+typedef short v16hi __attribute__ ((vector_size (32)));
+typedef short v32hi __attribute__ ((vector_size (64)));
+typedef short v64hi __attribute__ ((vector_size (128)));
+
+typedef unsigned short v2uhi __attribute__ ((vector_size (4)));
+typedef unsigned short v4uhi __attribute__ ((vector_size (8)));
+typedef unsigned short v8uhi __attribute__ ((vector_size (16)));
+typedef unsigned short v16uhi __attribute__ ((vector_size (32)));
+typedef unsigned short v32uhi __attribute__ ((vector_size (64)));
+typedef unsigned short v64uhi __attribute__ ((vector_size (128)));
+
+typedef int v2si __attribute__ ((vector_size (8)));
+typedef int v4si __attribute__ ((vector_size (16)));
+typedef int v8si __attribute__ ((vector_size (32)));
+typedef int v16si __attribute__ ((vector_size (64)));
+typedef int v32si __attribute__ ((vector_size (128)));
+typedef int v64si __attribute__ ((vector_size (256)));
+
+typedef unsigned int v2usi __attribute__ ((vector_size (8)));
+typedef unsigned int v4usi __attribute__ ((vector_size (16)));
+typedef unsigned int v8usi __attribute__ ((vector_size (32)));
+typedef unsigned int v16usi __attribute__ ((vector_size (64)));
+typedef unsigned int v32usi __attribute__ ((vector_size (128)));
+typedef unsigned int v64usi __attribute__ ((vector_size (256)));
+
+typedef long v2di __attribute__ ((vector_size (16)));
+typedef long v4di __attribute__ ((vector_size (32)));
+typedef long v8di __attribute__ ((vector_size (64)));
+typedef long v16di __attribute__ ((vector_size (128)));
+typedef long v32di __attribute__ ((vector_size (256)));
+typedef long v64di __attribute__ ((vector_size (512)));
+
+typedef unsigned long v2udi __attribute__ ((vector_size (16)));
+typedef unsigned long v4udi __attribute__ ((vector_size (32)));
+typedef unsigned long v8udi __attribute__ ((vector_size (64)));
+typedef unsigned long v16udi __attribute__ ((vector_size (128)));
+typedef unsigned long v32udi __attribute__ ((vector_size (256)));
+typedef unsigned long v64udi __attribute__ ((vector_size (512)));
+
+typedef UTItype v2uti __attribute__ ((vector_size (32)));
+typedef UTItype v4uti __attribute__ ((vector_size (64)));
+typedef UTItype v8uti __attribute__ ((vector_size (128)));
+typedef UTItype v16uti __attribute__ ((vector_size (256)));
+typedef UTItype v32uti __attribute__ ((vector_size (512)));
+typedef UTItype v64uti __attribute__ ((vector_size (1024)));
+
 /* Exported functions.  */
 extern DItype __divdi3 (DItype, DItype);
 extern DItype __moddi3 (DItype, DItype);
+extern UTItype __divmoddi4 (DItype, DItype);
 extern UDItype __udivdi3 (UDItype, UDItype);
 extern UDItype __umoddi3 (UDItype, UDItype);
+extern UTItype __udivmoddi4 (UDItype, UDItype);
 extern SItype __divsi3 (SItype, SItype);
 extern SItype __modsi3 (SItype, SItype);
+extern UDItype __divmodsi4 (SItype, SItype);
 extern USItype __udivsi3 (USItype, USItype);
 extern USItype __umodsi3 (USItype, USItype);
+extern UDItype __udivmodsi4 (USItype, USItype);
 extern HItype __divhi3 (HItype, HItype);
 extern HItype __modhi3 (HItype, HItype);
+extern UDItype __divmodhi4 (HItype, HItype);
 extern UHItype __udivhi3 (UHItype, UHItype);
 extern UHItype __umodhi3 (UHItype, UHItype);
+extern UDItype __udivmodhi4 (UHItype, UHItype);
 extern SItype __mulsi3 (SItype, SItype);
 
+#define VECTOR_PROTOTYPES(SIZE) \
+  extern v##SIZE##qi  __divv##SIZE##qi3     (v##SIZE##qi,  v##SIZE##qi);  \
+  extern v##SIZE##qi  __modv##SIZE##qi3     (v##SIZE##qi,  v##SIZE##qi);  \
+  extern v##SIZE##udi __divmodv##SIZE##qi4  (v##SIZE##qi,  v##SIZE##qi);  \
+  extern v##SIZE##uqi __udivv##SIZE##qi3    (v##SIZE##uqi, v##SIZE##uqi); \
+  extern v##SIZE##uqi __umodv##SIZE##qi3    (v##SIZE##uqi, v##SIZE##uqi); \
+  extern v##SIZE##udi __udivmodv##SIZE##qi4 (v##SIZE##uqi, v##SIZE##uqi);  \
+  extern v##SIZE##hi  __divv##SIZE##hi3     (v##SIZE##hi,  v##SIZE##hi);  \
+  extern v##SIZE##hi  __modv##SIZE##hi3     (v##SIZE##hi,  v##SIZE##hi);  \
+  extern v##SIZE##udi __divmodv##SIZE##hi4  (v##SIZE##hi,  v##SIZE##hi);  \
+  extern v##SIZE##uhi __udivv##SIZE##hi3    (v##SIZE##uhi, v##SIZE##uhi); \
+  extern v##SIZE##uhi __umodv##SIZE##hi3    (v##SIZE##uhi, v##SIZE##uhi); \
+  extern v##SIZE##udi __udivmodv##SIZE##hi4 (v##SIZE##uhi, v##SIZE##uhi); \
+  extern v##SIZE##si  __divv##SIZE##si3     (v##SIZE##si,  v##SIZE##si);  \
+  extern v##SIZE##si  __modv##SIZE##si3     (v##SIZE##si,  v##SIZE##si);  \
+  extern v##SIZE##udi __divmodv##SIZE##si4  (v##SIZE##si,  v##SIZE##si);  \
+  extern v##SIZE##usi __udivv##SIZE##si3    (v##SIZE##usi, v##SIZE##usi); \
+  extern v##SIZE##usi __umodv##SIZE##si3    (v##SIZE##usi, v##SIZE##usi); \
+  extern v##SIZE##udi __udivmodv##SIZE##si4 (v##SIZE##usi, v##SIZE##usi); \
+  extern v##SIZE##di  __divv##SIZE##di3     (v##SIZE##di,  v##SIZE##di);  \
+  extern v##SIZE##di  __modv##SIZE##di3     (v##SIZE##di,  v##SIZE##di);  \
+  extern v##SIZE##uti __divmodv##SIZE##di4  (v##SIZE##di,  v##SIZE##di);  \
+  extern v##SIZE##udi __udivv##SIZE##di3    (v##SIZE##udi, v##SIZE##udi); \
+  extern v##SIZE##udi __umodv##SIZE##di3    (v##SIZE##udi, v##SIZE##udi); \
+  extern v##SIZE##uti __udivmodv##SIZE##di4 (v##SIZE##udi, v##SIZE##udi);
+VECTOR_PROTOTYPES (2)
+VECTOR_PROTOTYPES (4)
+VECTOR_PROTOTYPES (8)
+VECTOR_PROTOTYPES (16)
+VECTOR_PROTOTYPES (32)
+VECTOR_PROTOTYPES (64)
+#undef VECTOR_PROTOTYPES
+
 #endif /* LIB2_GCN_H */
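
(For illustration only, these are the first three declarations that
VECTOR_PROTOTYPES (2) expands to, hand-expanded here:)

  extern v2qi  __divv2qi3    (v2qi, v2qi);
  extern v2qi  __modv2qi3    (v2qi, v2qi);
  extern v2udi __divmodv2qi4 (v2qi, v2qi);
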
diff --git a/libgcc/config/gcn/lib2-vec_divmod-di.c b/libgcc/config/gcn/lib2-vec_divmod-di.c
new file mode 100644
index 00000000000..8f4a035f198
--- /dev/null
+++ b/libgcc/config/gcn/lib2-vec_divmod-di.c
@@ -0,0 +1,118 @@
+/* Copyright (C) 2012-2023 Free Software Foundation, Inc.
+   Contributed by Altera and Mentor Graphics, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+#include "lib2-gcn.h"
+
+/* 64-bit V64DI divide and modulo as used in gcn.
+   This is a simple conversion from lib2-divmod.c.  */
+
+#define MASKMODE v64di
+#include "amdgcn_veclib.h"
+
+static v64uti
+__udivmodv64di4_aux (v64udi num, v64udi den, v64di __mask)
+{
+  v64udi bit = VECTOR_INIT (1UL);
+  v64udi res = VECTOR_INIT (0UL);
+
+  VECTOR_WHILE ((den < num) & (bit != 0) & ((den & (1UL<<63)) == 0),
+		cond, NO_COND)
+    VECTOR_COND_MOVE (den, den << 1, cond);
+    VECTOR_COND_MOVE (bit, bit << 1, cond);
+  VECTOR_ENDWHILE
+  VECTOR_WHILE (bit != 0, loopcond, NO_COND)
+    VECTOR_IF2 (num >= den, ifcond, loopcond)
+      VECTOR_COND_MOVE (num, num - den, ifcond);
+      VECTOR_COND_MOVE (res, res | bit, ifcond);
+    VECTOR_ENDIF
+    VECTOR_COND_MOVE (bit, bit >> 1, loopcond);
+    VECTOR_COND_MOVE (den, den >> 1, loopcond);
+  VECTOR_ENDWHILE
+
+  return PACK_DI_PAIR (res, num);
+}
+
+static v64uti
+__divmodv64di4_aux (v64di a, v64di b, v64di __mask)
+{
+  v64di nega = VECTOR_INIT (0L);
+  v64di negb = VECTOR_INIT (0L);
+
+  VECTOR_IF (a < 0, cond)
+    VECTOR_COND_MOVE (a, -a, cond);
+    nega = cond;
+  VECTOR_ENDIF
+
+  VECTOR_IF (b < 0, cond)
+    VECTOR_COND_MOVE (b, -b, cond);
+    negb = cond;
+  VECTOR_ENDIF
+
+  v64udi ua = __builtin_convertvector (a, v64udi);
+  v64udi ub = __builtin_convertvector (b, v64udi);
+  v64uti pair = __udivmodv64di4_aux (ua, ub, __mask);
+
+  v64di quot = UNPACK_DI_LOW (v64di, pair);
+  v64di rem = UNPACK_DI_HIGH (v64di, pair);
+  VECTOR_COND_MOVE (quot, -quot, nega ^ negb);
+  VECTOR_COND_MOVE (rem, -rem, nega);
+  pair = PACK_DI_PAIR (quot, rem);
+
+  return pair;
+}
+
+
+static inline v64di
+__divv64di3_aux (v64di a, v64di b, v64di __mask)
+{
+  v64uti pair = __divmodv64di4_aux (a, b, __mask);
+  return UNPACK_DI_LOW (v64di, pair);
+}
+
+static inline v64di
+__modv64di3_aux (v64di a, v64di b, v64di __mask)
+{
+  v64uti pair = __divmodv64di4_aux (a, b, __mask);
+  return UNPACK_DI_HIGH (v64di, pair);
+}
+
+
+static inline v64udi
+__udivv64di3_aux (v64udi a, v64udi b, v64di __mask)
+{
+  v64uti pair = __udivmodv64di4_aux (a, b, __mask);
+  return UNPACK_DI_LOW (v64udi, pair);
+}
+
+static inline v64udi
+__umodv64di3_aux (v64udi a, v64udi b, v64di __mask)
+{
+  v64uti pair = __udivmodv64di4_aux (a, b, __mask);
+  return UNPACK_DI_HIGH (v64udi, pair);
+}
+
+DEF_VARIANTS (__div, di3, di)
+DEF_VARIANTS (__mod, di3, di)
+DEF_VARIANTS_B (__divmod, di4, uti, di)
+DEF_VARIANTS (__udiv, di3, udi)
+DEF_VARIANTS (__umod, di3, udi)
+DEF_VARIANTS_B (__udivmod, di4, uti, udi)
diff --git a/libgcc/config/gcn/lib2-vec_divmod-hi.c b/libgcc/config/gcn/lib2-vec_divmod-hi.c
new file mode 100644
index 00000000000..175ddf84bb2
--- /dev/null
+++ b/libgcc/config/gcn/lib2-vec_divmod-hi.c
@@ -0,0 +1,118 @@
+/* Copyright (C) 2012-2023 Free Software Foundation, Inc.
+   Contributed by Altera and Mentor Graphics, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+#include "lib2-gcn.h"
+
+/* 16-bit V64HI divide and modulo as used in gcn.
+   This is a simple conversion from lib2-divmod.c.  */
+
+#define MASKMODE v64hi
+#include "amdgcn_veclib.h"
+
+static v64udi
+__udivmodv64hi4_aux (v64uhi num, v64uhi den, v64hi __mask)
+{
+  v64uhi bit = VECTOR_INIT ((unsigned short)1U);
+  v64uhi res = VECTOR_INIT ((unsigned short)0U);
+
+  VECTOR_WHILE ((den < num) & (bit != 0) & ((den & (1L<<15)) == 0),
+		cond, NO_COND)
+    VECTOR_COND_MOVE (den, den << 1, cond);
+    VECTOR_COND_MOVE (bit, bit << 1, cond);
+  VECTOR_ENDWHILE
+  VECTOR_WHILE (bit != 0, loopcond, NO_COND)
+    VECTOR_IF2 (num >= den, ifcond, loopcond)
+      VECTOR_COND_MOVE (num, num - den, ifcond);
+      VECTOR_COND_MOVE (res, res | bit, ifcond);
+    VECTOR_ENDIF
+    VECTOR_COND_MOVE (bit, bit >> 1, loopcond);
+    VECTOR_COND_MOVE (den, den >> 1, loopcond);
+  VECTOR_ENDWHILE
+
+  return PACK_SI_PAIR (res, num);
+}
+
+static v64udi
+__divmodv64hi4_aux (v64hi a, v64hi b, v64hi __mask)
+{
+  v64hi nega = VECTOR_INIT ((short)0);
+  v64hi negb = VECTOR_INIT ((short)0);
+
+  VECTOR_IF (a < 0, cond)
+    VECTOR_COND_MOVE (a, -a, cond);
+    nega = cond;
+  VECTOR_ENDIF
+
+  VECTOR_IF (b < 0, cond)
+    VECTOR_COND_MOVE (b, -b, cond);
+    negb = cond;
+  VECTOR_ENDIF
+
+  v64uhi ua = __builtin_convertvector (a, v64uhi);
+  v64uhi ub = __builtin_convertvector (b, v64uhi);
+  v64udi pair = __udivmodv64hi4_aux (ua, ub, __mask);
+
+  v64hi quot = UNPACK_SI_LOW (v64hi, pair);
+  v64hi rem = UNPACK_SI_HIGH (v64hi, pair);
+  VECTOR_COND_MOVE (quot, -quot, nega ^ negb);
+  VECTOR_COND_MOVE (rem, -rem, nega);
+  pair = PACK_SI_PAIR (quot, rem);
+
+  return pair;
+}
+
+
+static inline v64hi
+__divv64hi3_aux (v64hi a, v64hi b, v64hi __mask)
+{
+  v64udi pair = __divmodv64hi4_aux (a, b, __mask);
+  return UNPACK_SI_LOW (v64hi, pair);
+}
+
+static inline v64hi
+__modv64hi3_aux (v64hi a, v64hi b, v64hi __mask)
+{
+  v64udi pair = __divmodv64hi4_aux (a, b, __mask);
+  return UNPACK_SI_HIGH (v64hi, pair);
+}
+
+
+static inline v64uhi
+__udivv64hi3_aux (v64uhi a, v64uhi b, v64hi __mask)
+{
+  v64udi pair = __udivmodv64hi4_aux (a, b, __mask);
+  return UNPACK_SI_LOW (v64uhi, pair);
+}
+
+static inline v64uhi
+__umodv64hi3_aux (v64uhi a, v64uhi b, v64hi __mask)
+{
+  v64udi pair = __udivmodv64hi4_aux (a, b, __mask);
+  return UNPACK_SI_HIGH (v64uhi, pair);
+}
+
+DEF_VARIANTS (__div, hi3, hi)
+DEF_VARIANTS (__mod, hi3, hi)
+DEF_VARIANTS_B (__divmod, hi4, udi, hi)
+DEF_VARIANTS (__udiv, hi3, uhi)
+DEF_VARIANTS (__umod, hi3, uhi)
+DEF_VARIANTS_B (__udivmod, hi4, udi, uhi)
diff --git a/libgcc/config/gcn/lib2-vec_divmod-qi.c b/libgcc/config/gcn/lib2-vec_divmod-qi.c
new file mode 100644
index 00000000000..ff6b5c2e7d8
--- /dev/null
+++ b/libgcc/config/gcn/lib2-vec_divmod-qi.c
@@ -0,0 +1,118 @@
+/* Copyright (C) 2012-2023 Free Software Foundation, Inc.
+   Contributed by Altera and Mentor Graphics, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+#include "lib2-gcn.h"
+
+/* 8-bit V64QI divide and modulo as used in gcn.
+   This is a simple conversion from lib2-divmod.c.  */
+
+#define MASKMODE v64qi
+#include "amdgcn_veclib.h"
+
+static v64udi
+__udivmodv64qi4_aux (v64uqi num, v64uqi den, v64qi __mask)
+{
+  v64uqi bit = VECTOR_INIT ((unsigned char)1U);
+  v64uqi res = VECTOR_INIT ((unsigned char)0U);
+
+  VECTOR_WHILE ((den < num) & (bit != 0) & ((den & (1<<7)) == 0),
+		cond, NO_COND)
+    VECTOR_COND_MOVE (den, den << 1, cond);
+    VECTOR_COND_MOVE (bit, bit << 1, cond);
+  VECTOR_ENDWHILE
+  VECTOR_WHILE (bit != 0, loopcond, NO_COND)
+    VECTOR_IF2 (num >= den, ifcond, loopcond)
+      VECTOR_COND_MOVE (num, num - den, ifcond);
+      VECTOR_COND_MOVE (res, res | bit, ifcond);
+    VECTOR_ENDIF
+    VECTOR_COND_MOVE (bit, bit >> 1, loopcond);
+    VECTOR_COND_MOVE (den, den >> 1, loopcond);
+  VECTOR_ENDWHILE
+
+  return PACK_SI_PAIR (res, num);
+}
+
+static v64udi
+__divmodv64qi4_aux (v64qi a, v64qi b, v64qi __mask)
+{
+  v64qi nega = VECTOR_INIT ((char)0);
+  v64qi negb = VECTOR_INIT ((char)0);
+
+  VECTOR_IF (a < 0, cond)
+    VECTOR_COND_MOVE (a, -a, cond);
+    nega = cond;
+  VECTOR_ENDIF
+
+  VECTOR_IF (b < 0, cond)
+    VECTOR_COND_MOVE (b, -b, cond);
+    negb = cond;
+  VECTOR_ENDIF
+
+  v64uqi ua = __builtin_convertvector (a, v64uqi);
+  v64uqi ub = __builtin_convertvector (b, v64uqi);
+  v64udi pair = __udivmodv64qi4_aux (ua, ub, __mask);
+
+  v64qi quot = UNPACK_SI_LOW (v64qi, pair);
+  v64qi rem = UNPACK_SI_HIGH (v64qi, pair);
+  VECTOR_COND_MOVE (quot, -quot, nega ^ negb);
+  VECTOR_COND_MOVE (rem, -rem, nega);
+  pair = PACK_SI_PAIR (quot, rem);
+
+  return pair;
+}
+
+
+static inline v64qi
+__divv64qi3_aux (v64qi a, v64qi b, v64qi __mask)
+{
+  v64udi pair = __divmodv64qi4_aux (a, b, __mask);
+  return UNPACK_SI_LOW (v64qi, pair);
+}
+
+static inline v64qi
+__modv64qi3_aux (v64qi a, v64qi b, v64qi __mask)
+{
+  v64udi pair = __divmodv64qi4_aux (a, b, __mask);
+  return UNPACK_SI_HIGH (v64qi, pair);
+}
+
+
+static inline v64uqi
+__udivv64qi3_aux (v64uqi a, v64uqi b, v64qi __mask)
+{
+  v64udi pair = __udivmodv64qi4_aux (a, b, __mask);
+  return UNPACK_SI_LOW (v64uqi, pair);
+}
+
+static inline v64uqi
+__umodv64qi3_aux (v64uqi a, v64uqi b, v64qi __mask)
+{
+  v64udi pair = __udivmodv64qi4_aux (a, b, __mask);
+  return UNPACK_SI_HIGH (v64uqi, pair);
+}
+
+DEF_VARIANTS (__div, qi3, qi)
+DEF_VARIANTS (__mod, qi3, qi)
+DEF_VARIANTS_B (__divmod, qi4, udi, qi)
+DEF_VARIANTS (__udiv, qi3, uqi)
+DEF_VARIANTS (__umod, qi3, uqi)
+DEF_VARIANTS_B (__udivmod, qi4, udi, uqi)
diff --git a/libgcc/config/gcn/lib2-vec_divmod.c b/libgcc/config/gcn/lib2-vec_divmod.c
new file mode 100644
index 00000000000..e1667668e68
--- /dev/null
+++ b/libgcc/config/gcn/lib2-vec_divmod.c
@@ -0,0 +1,118 @@
+/* Copyright (C) 2012-2023 Free Software Foundation, Inc.
+   Contributed by Altera and Mentor Graphics, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+#include "lib2-gcn.h"
+
+/* 32-bit V64SI divide and modulo as used in gcn.
+   This is a simple conversion from lib2-divmod.c.  */
+
+#define MASKMODE v64si
+#include "amdgcn_veclib.h"
+
+static v64udi
+__udivmodv64si4_aux (v64usi num, v64usi den, v64si __mask)
+{
+  v64usi bit = VECTOR_INIT (1U);
+  v64usi res = VECTOR_INIT (0U);
+
+  VECTOR_WHILE ((den < num) & (bit != 0) & ((den & (1L<<31)) == 0),
+		cond, NO_COND)
+    VECTOR_COND_MOVE (den, den << 1, cond);
+    VECTOR_COND_MOVE (bit, bit << 1, cond);
+  VECTOR_ENDWHILE
+  VECTOR_WHILE (bit != 0, loopcond, NO_COND)
+    VECTOR_IF2 (num >= den, ifcond, loopcond)
+      VECTOR_COND_MOVE (num, num - den, ifcond);
+      VECTOR_COND_MOVE (res, res | bit, ifcond);
+    VECTOR_ENDIF
+    VECTOR_COND_MOVE (bit, bit >> 1, loopcond);
+    VECTOR_COND_MOVE (den, den >> 1, loopcond);
+  VECTOR_ENDWHILE
+
+  return PACK_SI_PAIR (res, num);
+}
+
+static v64udi
+__divmodv64si4_aux (v64si a, v64si b, v64si __mask)
+{
+  v64si nega = VECTOR_INIT (0);
+  v64si negb = VECTOR_INIT (0);
+
+  VECTOR_IF (a < 0, cond)
+    VECTOR_COND_MOVE (a, -a, cond);
+    nega = cond;
+  VECTOR_ENDIF
+
+  VECTOR_IF (b < 0, cond)
+    VECTOR_COND_MOVE (b, -b, cond);
+    negb = cond;
+  VECTOR_ENDIF
+
+  v64usi ua = __builtin_convertvector (a, v64usi);
+  v64usi ub = __builtin_convertvector (b, v64usi);
+  v64udi pair = __udivmodv64si4_aux (ua, ub, __mask);
+
+  v64si quot = UNPACK_SI_LOW (v64si, pair);
+  v64si rem = UNPACK_SI_HIGH (v64si, pair);
+  VECTOR_COND_MOVE (quot, -quot, nega ^ negb);
+  VECTOR_COND_MOVE (rem, -rem, nega);
+  pair = PACK_SI_PAIR (quot, rem);
+
+  return pair;
+}
+
+
+static inline v64si
+__divv64si3_aux (v64si a, v64si b, v64si __mask)
+{
+  v64udi pair = __divmodv64si4_aux (a, b, __mask);
+  return UNPACK_SI_LOW (v64si, pair);
+}
+
+static inline v64si
+__modv64si3_aux (v64si a, v64si b, v64si __mask)
+{
+  v64udi pair = __divmodv64si4_aux (a, b, __mask);
+  return UNPACK_SI_HIGH (v64si, pair);
+}
+
+
+static inline v64usi
+__udivv64si3_aux (v64usi a, v64usi b, v64si __mask)
+{
+  v64udi pair = __udivmodv64si4_aux (a, b, __mask);
+  return UNPACK_SI_LOW (v64usi, pair);
+}
+
+static inline v64usi
+__umodv64si3_aux (v64usi a, v64usi b, v64si __mask)
+{
+  v64udi pair = __udivmodv64si4_aux (a, b, __mask);
+  return UNPACK_SI_HIGH (v64usi, pair);
+}
+
+DEF_VARIANTS (__div, si3, si)
+DEF_VARIANTS (__mod, si3, si)
+DEF_VARIANTS_B (__divmod, si4, udi, si)
+DEF_VARIANTS (__udiv, si3, usi)
+DEF_VARIANTS (__umod, si3, usi)
+DEF_VARIANTS_B (__udivmod, si4, udi, usi)
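
(For readers unfamiliar with the veclib macros: per lane, the masked
sign handling in __divmodv64si4_aux above is equivalent to this scalar
sketch, illustration only.)

  /* Scalar per-lane equivalent of the VECTOR_IF/VECTOR_COND_MOVE
     sign fixup: the quotient is negated when the operand signs
     differ, and the remainder takes the dividend's sign.  */
  static void
  sign_fixup (int *a, int *b, int *negquot, int *negrem)
  {
    int nega = 0, negb = 0;
    if (*a < 0) { *a = -*a; nega = 1; }
    if (*b < 0) { *b = -*b; negb = 1; }
    *negquot = nega ^ negb;
    *negrem = nega;
  }
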
diff --git a/libgcc/config/gcn/t-amdgcn b/libgcc/config/gcn/t-amdgcn
index 38bde54a096..43a496a191c 100644
--- a/libgcc/config/gcn/t-amdgcn
+++ b/libgcc/config/gcn/t-amdgcn
@@ -2,6 +2,10 @@ LIB2ADD += $(srcdir)/config/gcn/atomic.c \
 	   $(srcdir)/config/gcn/lib2-divmod.c \
 	   $(srcdir)/config/gcn/lib2-divmod-hi.c \
 	   $(srcdir)/config/gcn/lib2-divmod-di.c \
+	   $(srcdir)/config/gcn/lib2-vec_divmod.c \
+	   $(srcdir)/config/gcn/lib2-vec_divmod-qi.c \
+	   $(srcdir)/config/gcn/lib2-vec_divmod-hi.c \
+	   $(srcdir)/config/gcn/lib2-vec_divmod-di.c \
 	   $(srcdir)/config/gcn/lib2-bswapti2.c \
 	   $(srcdir)/config/gcn/unwind-gcn.c
amdgcn: minimal V64TImode vector support

Just enough support for TImode vectors to exist and be loaded, stored,
and moved, but without any real arithmetic instructions available.

This is primarily for the use of divmodv64di4, which uses TImode to
return a pair of DImode values.
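
(As a mental model only, mirroring the scalar "union pack" trick in
lib2-divmod-di.c: each TImode vector lane packs the two DImode results
side by side.)

  /* Hypothetical per-lane view of the divmodv64di4 return value:
     quotient in the low 64 bits, remainder in the high 64 bits.  */
  typedef          int DItype  __attribute__ ((mode (DI)));
  typedef unsigned int UTItype __attribute__ ((mode (TI)));

  union ti_pair { UTItype ti; struct { DItype quot, rem; } pair; };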

gcc/ChangeLog:

	* config/gcn/gcn-protos.h (vgpr_4reg_mode_p): New function.
	* config/gcn/gcn-valu.md (V_4REG, V_4REG_ALT): New iterators.
	(V_MOV, V_MOV_ALT): Likewise.
	(scalar_mode, SCALAR_MODE): Add TImode.
	(vnsi, VnSI, vndi, VnDI): Likewise.
	(vec_merge, vec_merge_with_clobber, vec_merge_with_vcc): Use V_MOV.
	(mov<mode>, mov<mode>_unspec): Use V_MOV.
	(*mov<mode>_4reg): New insn.
	(mov<mode>_exec): New 4reg variant.
	(mov<mode>_sgprbase): Likewise.
	(reload_in<mode>, reload_out<mode>): Use V_MOV.
	(vec_set<mode>): Likewise.
	(vec_duplicate<mode><exec>): New 4reg variant.
	(vec_extract<mode><scalar_mode>): Likewise.
	(vec_extract<V_ALL:mode><V_ALL_ALT:mode>): Rename to ...
	(vec_extract<V_MOV:mode><V_MOV_ALT:mode>): ... this, and use V_MOV.
	(vec_extract<V_4REG:mode><V_4REG_ALT:mode>_nop): New 4reg variant.
	(fold_extract_last_<mode>): Use V_MOV.
	(vec_init<V_ALL:mode><V_ALL_ALT:mode>): Rename to ...
	(vec_init<V_MOV:mode><V_MOV_ALT:mode>): ... this, and use V_MOV.
	(gather_load<mode><vnsi>, gather<mode>_expr<exec>,
	gather<mode>_insn_1offset<exec>, gather<mode>_insn_1offset_ds<exec>,
	gather<mode>_insn_2offsets<exec>): Use V_MOV.
	(scatter_store<mode><vnsi>, scatter<mode>_expr<exec_scatter>,
	scatter<mode>_insn_1offset<exec_scatter>,
	scatter<mode>_insn_1offset_ds<exec_scatter>,
	scatter<mode>_insn_2offsets<exec_scatter>): Likewise.
	(maskload<mode>di, maskstore<mode>di, mask_gather_load<mode><vnsi>,
	mask_scatter_store<mode><vnsi>): Likewise.
	* config/gcn/gcn.cc (gcn_class_max_nregs): Use vgpr_4reg_mode_p.
	(gcn_hard_regno_mode_ok): Likewise.
	(GEN_VNM): Add TImode support.
	(USE_TI): New macro.  Separate TImode operations from non-TImode ones.
	(gcn_vector_mode_supported_p): Add V64TImode, V32TImode, V16TImode,
	V8TImode, and V2TImode.
	(print_operand): Add 'J' and 'K' print codes.

diff --git a/gcc/config/gcn/gcn-protos.h b/gcc/config/gcn/gcn-protos.h
index 287ce17d422..3befb2b7caa 100644
--- a/gcc/config/gcn/gcn-protos.h
+++ b/gcc/config/gcn/gcn-protos.h
@@ -136,6 +136,17 @@ vgpr_2reg_mode_p (machine_mode mode)
   return (mode == DImode || mode == DFmode);
 }
 
+/* Return true if MODE is valid for four VGPR registers.  */
+
+inline bool
+vgpr_4reg_mode_p (machine_mode mode)
+{
+  if (VECTOR_MODE_P (mode))
+    mode = GET_MODE_INNER (mode);
+
+  return (mode == TImode);
+}
+
 /* Return true if MODE can be handled directly by VGPR operations.  */
 
 inline bool
diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index 44c48468dd6..aa6c8fe27c2 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -96,6 +96,10 @@ (define_mode_iterator V_2REG_ALT
 		       V32DI V32DF
 		       V64DI V64DF])
 
+; Vector modes for four vector registers
+(define_mode_iterator V_4REG [V2TI V4TI V8TI V16TI V32TI V64TI])
+(define_mode_iterator V_4REG_ALT [V2TI V4TI V8TI V16TI V32TI V64TI])
+
 ; Vector modes with native support
 (define_mode_iterator V_noQI
 		      [V2HI V2HF V2SI V2SF V2DI V2DF
@@ -136,7 +140,7 @@ (define_mode_iterator SV_SFDF
 		       V32SF V32DF
 		       V64SF V64DF])
 
-; All of above
+; All modes in which we want to do more than just moves.
 (define_mode_iterator V_ALL
 		      [V2QI V2HI V2HF V2SI V2SF V2DI V2DF
 		       V4QI V4HI V4HF V4SI V4SF V4DI V4DF
@@ -175,97 +179,113 @@ (define_mode_iterator SV_FP
 		       V32HF V32SF V32DF
 		       V64HF V64SF V64DF])
 
+; All modes that need moves, including those without many insns.
+(define_mode_iterator V_MOV
+		      [V2QI V2HI V2HF V2SI V2SF V2DI V2DF V2TI
+		       V4QI V4HI V4HF V4SI V4SF V4DI V4DF V4TI
+		       V8QI V8HI V8HF V8SI V8SF V8DI V8DF V8TI
+		       V16QI V16HI V16HF V16SI V16SF V16DI V16DF V16TI
+		       V32QI V32HI V32HF V32SI V32SF V32DI V32DF V32TI
+		       V64QI V64HI V64HF V64SI V64SF V64DI V64DF V64TI])
+(define_mode_iterator V_MOV_ALT
+		      [V2QI V2HI V2HF V2SI V2SF V2DI V2DF V2TI
+		       V4QI V4HI V4HF V4SI V4SF V4DI V4DF V4TI
+		       V8QI V8HI V8HF V8SI V8SF V8DI V8DF V8TI
+		       V16QI V16HI V16HF V16SI V16SF V16DI V16DF V16TI
+		       V32QI V32HI V32HF V32SI V32SF V32DI V32DF V32TI
+		       V64QI V64HI V64HF V64SI V64SF V64DI V64DF V64TI])
+
 (define_mode_attr scalar_mode
-  [(QI "qi") (HI "hi") (SI "si")
+  [(QI "qi") (HI "hi") (SI "si") (TI "ti")
    (HF "hf") (SF "sf") (DI "di") (DF "df")
-   (V2QI "qi") (V2HI "hi") (V2SI "si")
+   (V2QI "qi") (V2HI "hi") (V2SI "si") (V2TI "ti")
    (V2HF "hf") (V2SF "sf") (V2DI "di") (V2DF "df")
-   (V4QI "qi") (V4HI "hi") (V4SI "si")
+   (V4QI "qi") (V4HI "hi") (V4SI "si") (V4TI "ti")
    (V4HF "hf") (V4SF "sf") (V4DI "di") (V4DF "df")
-   (V8QI "qi") (V8HI "hi") (V8SI "si")
+   (V8QI "qi") (V8HI "hi") (V8SI "si") (V8TI "ti")
    (V8HF "hf") (V8SF "sf") (V8DI "di") (V8DF "df")
-   (V16QI "qi") (V16HI "hi") (V16SI "si")
+   (V16QI "qi") (V16HI "hi") (V16SI "si") (V16TI "ti")
    (V16HF "hf") (V16SF "sf") (V16DI "di") (V16DF "df")
-   (V32QI "qi") (V32HI "hi") (V32SI "si")
+   (V32QI "qi") (V32HI "hi") (V32SI "si") (V32TI "ti")
    (V32HF "hf") (V32SF "sf") (V32DI "di") (V32DF "df")
-   (V64QI "qi") (V64HI "hi") (V64SI "si")
+   (V64QI "qi") (V64HI "hi") (V64SI "si") (V64TI "ti")
    (V64HF "hf") (V64SF "sf") (V64DI "di") (V64DF "df")])
 
 (define_mode_attr SCALAR_MODE
-  [(QI "QI") (HI "HI") (SI "SI")
+  [(QI "QI") (HI "HI") (SI "SI") (TI "TI")
    (HF "HF") (SF "SF") (DI "DI") (DF "DF")
-   (V2QI "QI") (V2HI "HI") (V2SI "SI")
+   (V2QI "QI") (V2HI "HI") (V2SI "SI") (V2TI "TI")
    (V2HF "HF") (V2SF "SF") (V2DI "DI") (V2DF "DF")
-   (V4QI "QI") (V4HI "HI") (V4SI "SI")
+   (V4QI "QI") (V4HI "HI") (V4SI "SI") (V4TI "TI")
    (V4HF "HF") (V4SF "SF") (V4DI "DI") (V4DF "DF")
-   (V8QI "QI") (V8HI "HI") (V8SI "SI")
+   (V8QI "QI") (V8HI "HI") (V8SI "SI") (V8TI "TI")
    (V8HF "HF") (V8SF "SF") (V8DI "DI") (V8DF "DF")
-   (V16QI "QI") (V16HI "HI") (V16SI "SI")
+   (V16QI "QI") (V16HI "HI") (V16SI "SI") (V16TI "TI")
    (V16HF "HF") (V16SF "SF") (V16DI "DI") (V16DF "DF")
-   (V32QI "QI") (V32HI "HI") (V32SI "SI")
+   (V32QI "QI") (V32HI "HI") (V32SI "SI") (V32TI "TI")
    (V32HF "HF") (V32SF "SF") (V32DI "DI") (V32DF "DF")
-   (V64QI "QI") (V64HI "HI") (V64SI "SI")
+   (V64QI "QI") (V64HI "HI") (V64SI "SI") (V64TI "TI")
    (V64HF "HF") (V64SF "SF") (V64DI "DI") (V64DF "DF")])
 
 (define_mode_attr vnsi
-  [(QI "si") (HI "si") (SI "si")
+  [(QI "si") (HI "si") (SI "si") (TI "si")
    (HF "si") (SF "si") (DI "si") (DF "si")
    (V2QI "v2si") (V2HI "v2si") (V2HF "v2si") (V2SI "v2si")
-   (V2SF "v2si") (V2DI "v2si") (V2DF "v2si")
+   (V2SF "v2si") (V2DI "v2si") (V2DF "v2si") (V2TI "v2si")
    (V4QI "v4si") (V4HI "v4si") (V4HF "v4si") (V4SI "v4si")
-   (V4SF "v4si") (V4DI "v4si") (V4DF "v4si")
+   (V4SF "v4si") (V4DI "v4si") (V4DF "v4si") (V4TI "v4si")
    (V8QI "v8si") (V8HI "v8si") (V8HF "v8si") (V8SI "v8si")
-   (V8SF "v8si") (V8DI "v8si") (V8DF "v8si")
+   (V8SF "v8si") (V8DI "v8si") (V8DF "v8si") (V8TI "v8si")
    (V16QI "v16si") (V16HI "v16si") (V16HF "v16si") (V16SI "v16si")
-   (V16SF "v16si") (V16DI "v16si") (V16DF "v16si")
+   (V16SF "v16si") (V16DI "v16si") (V16DF "v16si") (V16TI "v16si")
    (V32QI "v32si") (V32HI "v32si") (V32HF "v32si") (V32SI "v32si")
-   (V32SF "v32si") (V32DI "v32si") (V32DF "v32si")
+   (V32SF "v32si") (V32DI "v32si") (V32DF "v32si") (V32TI "v32si")
    (V64QI "v64si") (V64HI "v64si") (V64HF "v64si") (V64SI "v64si")
-   (V64SF "v64si") (V64DI "v64si") (V64DF "v64si")])
+   (V64SF "v64si") (V64DI "v64si") (V64DF "v64si") (V64TI "v64si")])
 
 (define_mode_attr VnSI
-  [(QI "SI") (HI "SI") (SI "SI")
+  [(QI "SI") (HI "SI") (SI "SI") (TI "SI")
    (HF "SI") (SF "SI") (DI "SI") (DF "SI")
    (V2QI "V2SI") (V2HI "V2SI") (V2HF "V2SI") (V2SI "V2SI")
-   (V2SF "V2SI") (V2DI "V2SI") (V2DF "V2SI")
+   (V2SF "V2SI") (V2DI "V2SI") (V2DF "V2SI") (V2TI "V2SI")
    (V4QI "V4SI") (V4HI "V4SI") (V4HF "V4SI") (V4SI "V4SI")
-   (V4SF "V4SI") (V4DI "V4SI") (V4DF "V4SI")
+   (V4SF "V4SI") (V4DI "V4SI") (V4DF "V4SI") (V4TI "V4SI")
    (V8QI "V8SI") (V8HI "V8SI") (V8HF "V8SI") (V8SI "V8SI")
-   (V8SF "V8SI") (V8DI "V8SI") (V8DF "V8SI")
+   (V8SF "V8SI") (V8DI "V8SI") (V8DF "V8SI") (V8TI "V8SI")
    (V16QI "V16SI") (V16HI "V16SI") (V16HF "V16SI") (V16SI "V16SI")
-   (V16SF "V16SI") (V16DI "V16SI") (V16DF "V16SI")
+   (V16SF "V16SI") (V16DI "V16SI") (V16DF "V16SI") (V16TI "V16SI")
    (V32QI "V32SI") (V32HI "V32SI") (V32HF "V32SI") (V32SI "V32SI")
-   (V32SF "V32SI") (V32DI "V32SI") (V32DF "V32SI")
+   (V32SF "V32SI") (V32DI "V32SI") (V32DF "V32SI") (V32TI "V32SI")
    (V64QI "V64SI") (V64HI "V64SI") (V64HF "V64SI") (V64SI "V64SI")
-   (V64SF "V64SI") (V64DI "V64SI") (V64DF "V64SI")])
+   (V64SF "V64SI") (V64DI "V64SI") (V64DF "V64SI") (V64TI "V64SI")])
 
 (define_mode_attr vndi
   [(V2QI "v2di") (V2HI "v2di") (V2HF "v2di") (V2SI "v2di")
-   (V2SF "v2di") (V2DI "v2di") (V2DF "v2di")
+   (V2SF "v2di") (V2DI "v2di") (V2DF "v2di") (V2TI "v2di")
    (V4QI "v4di") (V4HI "v4di") (V4HF "v4di") (V4SI "v4di")
-   (V4SF "v4di") (V4DI "v4di") (V4DF "v4di")
+   (V4SF "v4di") (V4DI "v4di") (V4DF "v4di") (V4TI "v4di")
    (V8QI "v8di") (V8HI "v8di") (V8HF "v8di") (V8SI "v8di")
-   (V8SF "v8di") (V8DI "v8di") (V8DF "v8di")
+   (V8SF "v8di") (V8DI "v8di") (V8DF "v8di") (V8TI "v8di")
    (V16QI "v16di") (V16HI "v16di") (V16HF "v16di") (V16SI "v16di")
-   (V16SF "v16di") (V16DI "v16di") (V16DF "v16di")
+   (V16SF "v16di") (V16DI "v16di") (V16DF "v16di") (V16TI "v16di")
    (V32QI "v32di") (V32HI "v32di") (V32HF "v32di") (V32SI "v32di")
-   (V32SF "v32di") (V32DI "v32di") (V32DF "v32di")
+   (V32SF "v32di") (V32DI "v32di") (V32DF "v32di") (V32TI "v32di")
    (V64QI "v64di") (V64HI "v64di") (V64HF "v64di") (V64SI "v64di")
-   (V64SF "v64di") (V64DI "v64di") (V64DF "v64di")])
+   (V64SF "v64di") (V64DI "v64di") (V64DF "v64di") (V64TI "v64di")])
 
 (define_mode_attr VnDI
   [(V2QI "V2DI") (V2HI "V2DI") (V2HF "V2DI") (V2SI "V2DI")
-   (V2SF "V2DI") (V2DI "V2DI") (V2DF "V2DI")
+   (V2SF "V2DI") (V2DI "V2DI") (V2DF "V2DI") (V2TI "V2DI")
    (V4QI "V4DI") (V4HI "V4DI") (V4HF "V4DI") (V4SI "V4DI")
-   (V4SF "V4DI") (V4DI "V4DI") (V4DF "V4DI")
+   (V4SF "V4DI") (V4DI "V4DI") (V4DF "V4DI") (V4TI "V4DI")
    (V8QI "V8DI") (V8HI "V8DI") (V8HF "V8DI") (V8SI "V8DI")
-   (V8SF "V8DI") (V8DI "V8DI") (V8DF "V8DI")
+   (V8SF "V8DI") (V8DI "V8DI") (V8DF "V8DI") (V8TI "V8DI")
    (V16QI "V16DI") (V16HI "V16DI") (V16HF "V16DI") (V16SI "V16DI")
-   (V16SF "V16DI") (V16DI "V16DI") (V16DF "V16DI")
+   (V16SF "V16DI") (V16DI "V16DI") (V16DF "V16DI") (V16TI "V16DI")
    (V32QI "V32DI") (V32HI "V32DI") (V32HF "V32DI") (V32SI "V32DI")
-   (V32SF "V32DI") (V32DI "V32DI") (V32DF "V32DI")
+   (V32SF "V32DI") (V32DI "V32DI") (V32DF "V32DI") (V32TI "V32DI")
    (V64QI "V64DI") (V64HI "V64DI") (V64HF "V64DI") (V64SI "V64DI")
-   (V64SF "V64DI") (V64DI "V64DI") (V64DF "V64DI")])
+   (V64SF "V64DI") (V64DI "V64DI") (V64DF "V64DI") (V64TI "V64DI")])
 
 (define_mode_attr sdwa
   [(V2QI "BYTE_0") (V2HI "WORD_0") (V2SI "DWORD")
@@ -288,38 +308,38 @@ (define_subst_attr "exec_scatter" "scatter_store"
 		   "" "_exec")
 
 (define_subst "vec_merge"
-  [(set (match_operand:V_ALL 0)
-	(match_operand:V_ALL 1))]
+  [(set (match_operand:V_MOV 0)
+	(match_operand:V_MOV 1))]
   ""
   [(set (match_dup 0)
-	(vec_merge:V_ALL
+	(vec_merge:V_MOV
 	  (match_dup 1)
-	  (match_operand:V_ALL 3 "gcn_register_or_unspec_operand" "U0")
+	  (match_operand:V_MOV 3 "gcn_register_or_unspec_operand" "U0")
 	  (match_operand:DI 4 "gcn_exec_reg_operand" "e")))])
 
 (define_subst "vec_merge_with_clobber"
-  [(set (match_operand:V_ALL 0)
-	(match_operand:V_ALL 1))
+  [(set (match_operand:V_MOV 0)
+	(match_operand:V_MOV 1))
    (clobber (match_operand 2))]
   ""
   [(set (match_dup 0)
-	(vec_merge:V_ALL
+	(vec_merge:V_MOV
 	  (match_dup 1)
-	  (match_operand:V_ALL 3 "gcn_register_or_unspec_operand" "U0")
+	  (match_operand:V_MOV 3 "gcn_register_or_unspec_operand" "U0")
 	  (match_operand:DI 4 "gcn_exec_reg_operand" "e")))
    (clobber (match_dup 2))])
 
 (define_subst "vec_merge_with_vcc"
-  [(set (match_operand:V_ALL 0)
-	(match_operand:V_ALL 1))
+  [(set (match_operand:V_MOV 0)
+	(match_operand:V_MOV 1))
    (set (match_operand:DI 2)
 	(match_operand:DI 3))]
   ""
   [(parallel
      [(set (match_dup 0)
-	   (vec_merge:V_ALL
+	   (vec_merge:V_MOV
 	     (match_dup 1)
-	     (match_operand:V_ALL 4 "gcn_register_or_unspec_operand" "U0")
+	     (match_operand:V_MOV 4 "gcn_register_or_unspec_operand" "U0")
 	     (match_operand:DI 5 "gcn_exec_reg_operand" "e")))
       (set (match_dup 2)
 	   (and:DI (match_dup 3)
@@ -351,8 +371,8 @@ (define_subst "scatter_store"
 ; gather/scatter, maskload/store, etc.
 
 (define_expand "mov<mode>"
-  [(set (match_operand:V_ALL 0 "nonimmediate_operand")
-	(match_operand:V_ALL 1 "general_operand"))]
+  [(set (match_operand:V_MOV 0 "nonimmediate_operand")
+	(match_operand:V_MOV 1 "general_operand"))]
   ""
   {
     /* Bitwise reinterpret casts via SUBREG don't work with GCN vector
@@ -421,8 +441,8 @@ (define_expand "mov<mode>"
 ; A pseudo instruction that helps LRA use the "U0" constraint.
 
 (define_insn "mov<mode>_unspec"
-  [(set (match_operand:V_ALL 0 "nonimmediate_operand" "=v")
-	(match_operand:V_ALL 1 "gcn_unspec_operand"   " U"))]
+  [(set (match_operand:V_MOV 0 "nonimmediate_operand" "=v")
+	(match_operand:V_MOV 1 "gcn_unspec_operand"   " U"))]
   ""
   ""
   [(set_attr "type" "unknown")
@@ -527,6 +547,69 @@ (define_insn "mov<mode>_exec"
   [(set_attr "type" "vmult,vmult,vmult,*,*")
    (set_attr "length" "16,16,16,16,16")])
 
+(define_insn "*mov<mode>_4reg"
+  [(set (match_operand:V_4REG 0 "nonimmediate_operand" "=v")
+	(match_operand:V_4REG 1 "general_operand"      "vDB"))]
+  ""
+  {
+    return "v_mov_b32\t%L0, %L1\;"
+           "v_mov_b32\t%H0, %H1\;"
+           "v_mov_b32\t%J0, %J1\;"
+           "v_mov_b32\t%K0, %K1\;";
+  }
+  [(set_attr "type" "vmult")
+   (set_attr "length" "16")])
+
+(define_insn "mov<mode>_exec"
+  [(set (match_operand:V_4REG 0 "nonimmediate_operand" "= v,   v,   v, v, m")
+	(vec_merge:V_4REG
+	  (match_operand:V_4REG 1 "general_operand"    "vDB,  v0,  v0, m, v")
+	  (match_operand:V_4REG 2 "gcn_alu_or_unspec_operand"
+						       " U0,vDA0,vDA0,U0,U0")
+	  (match_operand:DI 3 "register_operand"       "  e,  cV,  Sv, e, e")))
+   (clobber (match_scratch:<VnDI> 4		       "= X,   X,   X,&v,&v"))]
+  "!MEM_P (operands[0]) || REG_P (operands[1])"
+  {
+    if (!REG_P (operands[1]) || REGNO (operands[0]) <= REGNO (operands[1]))
+      switch (which_alternative)
+	{
+	case 0:
+	  return "v_mov_b32\t%L0, %L1\;v_mov_b32\t%H0, %H1\;"
+                 "v_mov_b32\t%J0, %J1\;v_mov_b32\t%K0, %K1";
+	case 1:
+	  return "v_cndmask_b32\t%L0, %L2, %L1, vcc\;"
+		 "v_cndmask_b32\t%H0, %H2, %H1, vcc\;"
+		 "v_cndmask_b32\t%J0, %J2, %J1, vcc\;"
+		 "v_cndmask_b32\t%K0, %K2, %K1, vcc";
+	case 2:
+	  return "v_cndmask_b32\t%L0, %L2, %L1, %3\;"
+		 "v_cndmask_b32\t%H0, %H2, %H1, %3\;"
+		 "v_cndmask_b32\t%J0, %J2, %J1, %3\;"
+		 "v_cndmask_b32\t%K0, %K2, %K1, %3";
+	}
+    else
+      switch (which_alternative)
+	{
+	case 0:
+	  return "v_mov_b32\t%H0, %H1\;v_mov_b32\t%L0, %L1\;"
+                 "v_mov_b32\t%J0, %J1\;v_mov_b32\t%K0, %K1";
+	case 1:
+	  return "v_cndmask_b32\t%H0, %H2, %H1, vcc\;"
+		 "v_cndmask_b32\t%L0, %L2, %L1, vcc\;"
+		 "v_cndmask_b32\t%J0, %J2, %J1, vcc\;"
+		 "v_cndmask_b32\t%K0, %K2, %K1, vcc";
+	case 2:
+	  return "v_cndmask_b32\t%H0, %H2, %H1, %3\;"
+		 "v_cndmask_b32\t%L0, %L2, %L1, %3\;"
+		 "v_cndmask_b32\t%J0, %J2, %J1, %3\;"
+		 "v_cndmask_b32\t%K0, %K2, %K1, %3";
+	}
+
+    return "#";
+  }
+  [(set_attr "type" "vmult,vmult,vmult,*,*")
+   (set_attr "length" "32")])
+
 ; This variant does not accept an unspec, but does permit MEM
 ; read/modify/write which is necessary for maskstore.
 
@@ -592,12 +675,25 @@ (define_insn "mov<mode>_sgprbase"
   [(set_attr "type" "vmult,*,*")
    (set_attr "length" "8,12,12")])
 
+(define_insn "mov<mode>_sgprbase"
+  [(set (match_operand:V_4REG 0 "nonimmediate_operand" "= v, v, m")
+	(unspec:V_4REG
+	  [(match_operand:V_4REG 1 "general_operand"   "vDB, m, v")]
+	  UNSPEC_SGPRBASE))
+   (clobber (match_operand:<VnDI> 2 "register_operand"  "=&v,&v,&v"))]
+  "lra_in_progress || reload_completed"
+  "v_mov_b32\t%L0, %L1\;v_mov_b32\t%H0, %H1\;v_mov_b32\t%J0, %J1\;v_mov_b32\t%K0, %K1
+   #
+   #"
+  [(set_attr "type" "vmult,*,*")
+   (set_attr "length" "8,12,12")])
+
 ; reload_in was once a standard name, but here it's only referenced by
 ; gcn_secondary_reload.  It allows a reload with a scratch register.
 
 (define_expand "reload_in<mode>"
-  [(set (match_operand:V_ALL 0 "register_operand"     "= v")
-	(match_operand:V_ALL 1 "memory_operand"	      "  m"))
+  [(set (match_operand:V_MOV 0 "register_operand"     "= v")
+	(match_operand:V_MOV 1 "memory_operand"	      "  m"))
    (clobber (match_operand:<VnDI> 2 "register_operand" "=&v"))]
   ""
   {
@@ -608,8 +704,8 @@ (define_expand "reload_in<mode>"
 ; reload_out is similar to reload_in, above.
 
 (define_expand "reload_out<mode>"
-  [(set (match_operand:V_ALL 0 "memory_operand"	      "= m")
-	(match_operand:V_ALL 1 "register_operand"     "  v"))
+  [(set (match_operand:V_MOV 0 "memory_operand"	      "= m")
+	(match_operand:V_MOV 1 "register_operand"     "  v"))
    (clobber (match_operand:<VnDI> 2 "register_operand" "=&v"))]
   ""
   {
@@ -620,9 +716,9 @@ (define_expand "reload_out<mode>"
 ; Expand scalar addresses into gather/scatter patterns
 
 (define_split
-  [(set (match_operand:V_ALL 0 "memory_operand")
-	(unspec:V_ALL
-	  [(match_operand:V_ALL 1 "general_operand")]
+  [(set (match_operand:V_MOV 0 "memory_operand")
+	(unspec:V_MOV
+	  [(match_operand:V_MOV 1 "general_operand")]
 	  UNSPEC_SGPRBASE))
    (clobber (match_scratch:<VnDI> 2))]
   ""
@@ -638,10 +734,10 @@ (define_split
   })
 
 (define_split
-  [(set (match_operand:V_ALL 0 "memory_operand")
-	(vec_merge:V_ALL
-	  (match_operand:V_ALL 1 "general_operand")
-	  (match_operand:V_ALL 2 "")
+  [(set (match_operand:V_MOV 0 "memory_operand")
+	(vec_merge:V_MOV
+	  (match_operand:V_MOV 1 "general_operand")
+	  (match_operand:V_MOV 2 "")
 	  (match_operand:DI 3 "gcn_exec_reg_operand")))
    (clobber (match_scratch:<VnDI> 4))]
   ""
@@ -659,14 +755,14 @@ (define_split
   })
 
 (define_split
-  [(set (match_operand:V_ALL 0 "nonimmediate_operand")
-	(unspec:V_ALL
-	  [(match_operand:V_ALL 1 "memory_operand")]
+  [(set (match_operand:V_MOV 0 "nonimmediate_operand")
+	(unspec:V_MOV
+	  [(match_operand:V_MOV 1 "memory_operand")]
 	  UNSPEC_SGPRBASE))
    (clobber (match_scratch:<VnDI> 2))]
   ""
   [(set (match_dup 0)
-	(unspec:V_ALL [(match_dup 5) (match_dup 6) (match_dup 7)
+	(unspec:V_MOV [(match_dup 5) (match_dup 6) (match_dup 7)
 		       (mem:BLK (scratch))]
 		      UNSPEC_GATHER))]
   {
@@ -678,16 +774,16 @@ (define_split
   })
 
 (define_split
-  [(set (match_operand:V_ALL 0 "nonimmediate_operand")
-	(vec_merge:V_ALL
-	  (match_operand:V_ALL 1 "memory_operand")
-	  (match_operand:V_ALL 2 "")
+  [(set (match_operand:V_MOV 0 "nonimmediate_operand")
+	(vec_merge:V_MOV
+	  (match_operand:V_MOV 1 "memory_operand")
+	  (match_operand:V_MOV 2 "")
 	  (match_operand:DI 3 "gcn_exec_reg_operand")))
    (clobber (match_scratch:<VnDI> 4))]
   ""
   [(set (match_dup 0)
-	(vec_merge:V_ALL
-	  (unspec:V_ALL [(match_dup 5) (match_dup 6) (match_dup 7)
+	(vec_merge:V_MOV
+	  (unspec:V_MOV [(match_dup 5) (match_dup 6) (match_dup 7)
 			 (mem:BLK (scratch))]
 			 UNSPEC_GATHER)
 	  (match_dup 2)
@@ -744,9 +840,9 @@ (define_insn "*vec_set<mode>"
    (set_attr "laneselect" "yes")])
 
 (define_expand "vec_set<mode>"
-  [(set (match_operand:V_ALL 0 "register_operand")
-	(vec_merge:V_ALL
-	  (vec_duplicate:V_ALL
+  [(set (match_operand:V_MOV 0 "register_operand")
+	(vec_merge:V_MOV
+	  (vec_duplicate:V_MOV
 	    (match_operand:<SCALAR_MODE> 1 "register_operand"))
 	  (match_dup 0)
 	  (ashift (const_int 1) (match_operand:SI 2 "gcn_alu_operand"))))]
@@ -804,6 +900,15 @@ (define_insn "vec_duplicate<mode><exec>"
   [(set_attr "type" "vop3a")
    (set_attr "length" "16")])
 
+(define_insn "vec_duplicate<mode><exec>"
+  [(set (match_operand:V_4REG 0 "register_operand"	   "=  v")
+	(vec_duplicate:V_4REG
+	  (match_operand:<SCALAR_MODE> 1 "gcn_alu_operand" "SvDB")))]
+  ""
+  "v_mov_b32\t%L0, %L1\;v_mov_b32\t%H0, %H1\;v_mov_b32\t%J0, %J1\;v_mov_b32\t%K0, %K1"
+  [(set_attr "type" "mult")
+   (set_attr "length" "32")])
+
 (define_insn "vec_extract<mode><scalar_mode>"
   [(set (match_operand:<SCALAR_MODE> 0 "register_operand"  "=Sg")
 	(vec_select:<SCALAR_MODE>
@@ -828,6 +933,18 @@ (define_insn "vec_extract<mode><scalar_mode>"
    (set_attr "exec" "none")
    (set_attr "laneselect" "yes")])
 
+(define_insn "vec_extract<mode><scalar_mode>"
+  [(set (match_operand:<SCALAR_MODE> 0 "register_operand"  "=&Sg")
+	(vec_select:<SCALAR_MODE>
+	  (match_operand:V_4REG 1 "register_operand"	   "   v")
+	  (parallel [(match_operand:SI 2 "gcn_alu_operand" " SvB")])))]
+  ""
+  "v_readlane_b32 %L0, %L1, %2\;v_readlane_b32 %H0, %H1, %2\;v_readlane_b32 %J0, %J1, %2\;v_readlane_b32 %K0, %K1, %2"
+  [(set_attr "type" "vmult")
+   (set_attr "length" "32")
+   (set_attr "exec" "none")
+   (set_attr "laneselect" "yes")])
+
 (define_insn "vec_extract<V_1REG:mode><V_1REG_ALT:mode>_nop"
   [(set (match_operand:V_1REG_ALT 0 "register_operand" "=v,v")
 	(vec_select:V_1REG_ALT
@@ -854,39 +971,52 @@ (define_insn "vec_extract<V_2REG:mode><V_2REG_ALT:mode>_nop"
   [(set_attr "type" "vmult")
    (set_attr "length" "0,8")])
   
-(define_expand "vec_extract<V_ALL:mode><V_ALL_ALT:mode>"
-  [(match_operand:V_ALL_ALT 0 "register_operand")
-   (match_operand:V_ALL 1 "register_operand")
+(define_insn "vec_extract<V_4REG:mode><V_4REG_ALT:mode>_nop"
+  [(set (match_operand:V_4REG_ALT 0 "register_operand" "=v,v")
+	(vec_select:V_4REG_ALT
+	  (match_operand:V_4REG 1 "register_operand"   " 0,v")
+	  (match_operand 2 "ascending_zero_int_parallel" "")))]
+  "MODE_VF (<V_4REG_ALT:MODE>mode) < MODE_VF (<V_4REG:MODE>mode)
+   && <V_4REG_ALT:SCALAR_MODE>mode == <V_4REG:SCALAR_MODE>mode"
+  "@
+  ; in-place extract %0
+  v_mov_b32\t%L0, %L1\;v_mov_b32\t%H0, %H1\;v_mov_b32\t%J0, %J1\;v_mov_b32\t%K0, %K1"
+  [(set_attr "type" "vmult")
+   (set_attr "length" "0,16")])
+  
+(define_expand "vec_extract<V_MOV:mode><V_MOV_ALT:mode>"
+  [(match_operand:V_MOV_ALT 0 "register_operand")
+   (match_operand:V_MOV 1 "register_operand")
    (match_operand 2 "immediate_operand")]
-  "MODE_VF (<V_ALL_ALT:MODE>mode) < MODE_VF (<V_ALL:MODE>mode)
-   && <V_ALL_ALT:SCALAR_MODE>mode == <V_ALL:SCALAR_MODE>mode"
+  "MODE_VF (<V_MOV_ALT:MODE>mode) < MODE_VF (<V_MOV:MODE>mode)
+   && <V_MOV_ALT:SCALAR_MODE>mode == <V_MOV:SCALAR_MODE>mode"
   {
-    int numlanes = GET_MODE_NUNITS (<V_ALL_ALT:MODE>mode);
+    int numlanes = GET_MODE_NUNITS (<V_MOV_ALT:MODE>mode);
     int firstlane = INTVAL (operands[2]) * numlanes;
     rtx tmp;
 
     if (firstlane == 0)
       {
-	rtx parallel = gen_rtx_PARALLEL (<V_ALL:MODE>mode,
+	rtx parallel = gen_rtx_PARALLEL (<V_MOV:MODE>mode,
 					  rtvec_alloc (numlanes));
 	for (int i = 0; i < numlanes; i++)
 	  XVECEXP (parallel, 0, i) = GEN_INT (i);
-	emit_insn (gen_vec_extract<V_ALL:mode><V_ALL_ALT:mode>_nop
+	emit_insn (gen_vec_extract<V_MOV:mode><V_MOV_ALT:mode>_nop
 		   (operands[0], operands[1], parallel));
       } else {
         /* FIXME: optimize this by using DPP where available.  */
 
-        rtx permutation = gen_reg_rtx (<V_ALL:VnSI>mode);
-	emit_insn (gen_vec_series<V_ALL:vnsi> (permutation,
+        rtx permutation = gen_reg_rtx (<V_MOV:VnSI>mode);
+	emit_insn (gen_vec_series<V_MOV:vnsi> (permutation,
 					       GEN_INT (firstlane*4),
 					       GEN_INT (4)));
 
-	tmp = gen_reg_rtx (<V_ALL:MODE>mode);
-	emit_insn (gen_ds_bpermute<V_ALL:mode> (tmp, permutation, operands[1],
-						get_exec (<V_ALL:MODE>mode)));
+	tmp = gen_reg_rtx (<V_MOV:MODE>mode);
+	emit_insn (gen_ds_bpermute<V_MOV:mode> (tmp, permutation, operands[1],
+						get_exec (<V_MOV:MODE>mode)));
 
 	emit_move_insn (operands[0],
-			gen_rtx_SUBREG (<V_ALL_ALT:MODE>mode, tmp, 0));
+			gen_rtx_SUBREG (<V_MOV_ALT:MODE>mode, tmp, 0));
       }
     DONE;
   })
@@ -894,7 +1024,7 @@ (define_expand "vec_extract<V_ALL:mode><V_ALL_ALT:mode>"
 (define_expand "extract_last_<mode>"
   [(match_operand:<SCALAR_MODE> 0 "register_operand")
    (match_operand:DI 1 "gcn_alu_operand")
-   (match_operand:V_ALL 2 "register_operand")]
+   (match_operand:V_MOV 2 "register_operand")]
   "can_create_pseudo_p ()"
   {
     rtx dst = operands[0];
@@ -912,7 +1042,7 @@ (define_expand "fold_extract_last_<mode>"
   [(match_operand:<SCALAR_MODE> 0 "register_operand")
    (match_operand:<SCALAR_MODE> 1 "gcn_alu_operand")
    (match_operand:DI 2 "gcn_alu_operand")
-   (match_operand:V_ALL 3 "register_operand")]
+   (match_operand:V_MOV 3 "register_operand")]
   "can_create_pseudo_p ()"
   {
     rtx dst = operands[0];
@@ -934,7 +1064,7 @@ (define_expand "fold_extract_last_<mode>"
   })
 
 (define_expand "vec_init<mode><scalar_mode>"
-  [(match_operand:V_ALL 0 "register_operand")
+  [(match_operand:V_MOV 0 "register_operand")
    (match_operand 1)]
   ""
   {
@@ -942,11 +1072,11 @@ (define_expand "vec_init<mode><scalar_mode>"
     DONE;
   })
 
-(define_expand "vec_init<V_ALL:mode><V_ALL_ALT:mode>"
-  [(match_operand:V_ALL 0 "register_operand")
-   (match_operand:V_ALL_ALT 1)]
-  "<V_ALL:SCALAR_MODE>mode == <V_ALL_ALT:SCALAR_MODE>mode
-   && MODE_VF (<V_ALL_ALT:MODE>mode) < MODE_VF (<V_ALL:MODE>mode)"
+(define_expand "vec_init<V_MOV:mode><V_MOV_ALT:mode>"
+  [(match_operand:V_MOV 0 "register_operand")
+   (match_operand:V_MOV_ALT 1)]
+  "<V_MOV:SCALAR_MODE>mode == <V_MOV_ALT:SCALAR_MODE>mode
+   && MODE_VF (<V_MOV_ALT:MODE>mode) < MODE_VF (<V_MOV:MODE>mode)"
   {
     gcn_expand_vector_init (operands[0], operands[1]);
     DONE;
@@ -988,7 +1118,7 @@ (define_expand "vec_init<V_ALL:mode><V_ALL_ALT:mode>"
 ;; TODO: implement combined gather and zero_extend, but only for -msram-ecc=on
 
 (define_expand "gather_load<mode><vnsi>"
-  [(match_operand:V_ALL 0 "register_operand")
+  [(match_operand:V_MOV 0 "register_operand")
    (match_operand:DI 1 "register_operand")
    (match_operand:<VnSI> 2 "register_operand")
    (match_operand 3 "immediate_operand")
@@ -1011,8 +1141,8 @@ (define_expand "gather_load<mode><vnsi>"
 
 ; Allow any address expression
 (define_expand "gather<mode>_expr<exec>"
-  [(set (match_operand:V_ALL 0 "register_operand")
-	(unspec:V_ALL
+  [(set (match_operand:V_MOV 0 "register_operand")
+	(unspec:V_MOV
 	  [(match_operand 1 "")
 	   (match_operand 2 "immediate_operand")
 	   (match_operand 3 "immediate_operand")
@@ -1022,8 +1152,8 @@ (define_expand "gather<mode>_expr<exec>"
     {})
 
 (define_insn "gather<mode>_insn_1offset<exec>"
-  [(set (match_operand:V_ALL 0 "register_operand"		   "=v")
-	(unspec:V_ALL
+  [(set (match_operand:V_MOV 0 "register_operand"		   "=v")
+	(unspec:V_MOV
 	  [(plus:<VnDI> (match_operand:<VnDI> 1 "register_operand" " v")
 			(vec_duplicate:<VnDI>
 			  (match_operand 2 "immediate_operand"	   " n")))
@@ -1061,8 +1191,8 @@ (define_insn "gather<mode>_insn_1offset<exec>"
    (set_attr "length" "12")])
 
 (define_insn "gather<mode>_insn_1offset_ds<exec>"
-  [(set (match_operand:V_ALL 0 "register_operand"		   "=v")
-	(unspec:V_ALL
+  [(set (match_operand:V_MOV 0 "register_operand"		   "=v")
+	(unspec:V_MOV
 	  [(plus:<VnSI> (match_operand:<VnSI> 1 "register_operand" " v")
 			(vec_duplicate:<VnSI>
 			  (match_operand 2 "immediate_operand"	   " n")))
@@ -1083,8 +1213,8 @@ (define_insn "gather<mode>_insn_1offset_ds<exec>"
    (set_attr "length" "12")])
 
 (define_insn "gather<mode>_insn_2offsets<exec>"
-  [(set (match_operand:V_ALL 0 "register_operand"			"=v")
-	(unspec:V_ALL
+  [(set (match_operand:V_MOV 0 "register_operand"			"=v")
+	(unspec:V_MOV
 	  [(plus:<VnDI>
 	     (plus:<VnDI>
 	       (vec_duplicate:<VnDI>
@@ -1119,7 +1249,7 @@ (define_expand "scatter_store<mode><vnsi>"
    (match_operand:<VnSI> 1 "register_operand")
    (match_operand 2 "immediate_operand")
    (match_operand:SI 3 "gcn_alu_operand")
-   (match_operand:V_ALL 4 "register_operand")]
+   (match_operand:V_MOV 4 "register_operand")]
   ""
   {
     rtx addr = gcn_expand_scaled_offsets (DEFAULT_ADDR_SPACE, operands[0],
@@ -1141,7 +1271,7 @@ (define_expand "scatter<mode>_expr<exec_scatter>"
   [(set (mem:BLK (scratch))
 	(unspec:BLK
 	  [(match_operand:<VnDI> 0 "")
-	   (match_operand:V_ALL 1 "register_operand")
+	   (match_operand:V_MOV 1 "register_operand")
 	   (match_operand 2 "immediate_operand")
 	   (match_operand 3 "immediate_operand")]
 	  UNSPEC_SCATTER))]
@@ -1154,7 +1284,7 @@ (define_insn "scatter<mode>_insn_1offset<exec_scatter>"
 	  [(plus:<VnDI> (match_operand:<VnDI> 0 "register_operand" "v")
 			(vec_duplicate:<VnDI>
 			  (match_operand 1 "immediate_operand"	   "n")))
-	   (match_operand:V_ALL 2 "register_operand"		   "v")
+	   (match_operand:V_MOV 2 "register_operand"		   "v")
 	   (match_operand 3 "immediate_operand"			   "n")
 	   (match_operand 4 "immediate_operand"			   "n")]
 	  UNSPEC_SCATTER))]
@@ -1192,7 +1322,7 @@ (define_insn "scatter<mode>_insn_1offset_ds<exec_scatter>"
 	  [(plus:<VnSI> (match_operand:<VnSI> 0 "register_operand" "v")
 			(vec_duplicate:<VnSI>
 			  (match_operand 1 "immediate_operand"	   "n")))
-	   (match_operand:V_ALL 2 "register_operand"		   "v")
+	   (match_operand:V_MOV 2 "register_operand"		   "v")
 	   (match_operand 3 "immediate_operand"			   "n")
 	   (match_operand 4 "immediate_operand"			   "n")]
 	  UNSPEC_SCATTER))]
@@ -1218,7 +1348,7 @@ (define_insn "scatter<mode>_insn_2offsets<exec_scatter>"
 	       (sign_extend:<VnDI>
 		 (match_operand:<VnSI> 1 "register_operand"		" v")))
 	     (vec_duplicate:<VnDI> (match_operand 2 "immediate_operand" " n")))
-	   (match_operand:V_ALL 3 "register_operand"			" v")
+	   (match_operand:V_MOV 3 "register_operand"			" v")
 	   (match_operand 4 "immediate_operand"				" n")
 	   (match_operand 5 "immediate_operand"				" n")]
 	  UNSPEC_SCATTER))]
@@ -3797,8 +3927,8 @@ (define_expand "while_ultsidi"
   })
 
 (define_expand "maskload<mode>di"
-  [(match_operand:V_ALL 0 "register_operand")
-   (match_operand:V_ALL 1 "memory_operand")
+  [(match_operand:V_MOV 0 "register_operand")
+   (match_operand:V_MOV 1 "memory_operand")
    (match_operand 2 "")]
   ""
   {
@@ -3817,8 +3947,8 @@ (define_expand "maskload<mode>di"
   })
 
 (define_expand "maskstore<mode>di"
-  [(match_operand:V_ALL 0 "memory_operand")
-   (match_operand:V_ALL 1 "register_operand")
+  [(match_operand:V_MOV 0 "memory_operand")
+   (match_operand:V_MOV 1 "register_operand")
    (match_operand 2 "")]
   ""
   {
@@ -3832,7 +3962,7 @@ (define_expand "maskstore<mode>di"
   })
 
 (define_expand "mask_gather_load<mode><vnsi>"
-  [(match_operand:V_ALL 0 "register_operand")
+  [(match_operand:V_MOV 0 "register_operand")
    (match_operand:DI 1 "register_operand")
    (match_operand:<VnSI> 2 "register_operand")
    (match_operand 3 "immediate_operand")
@@ -3867,7 +3997,7 @@ (define_expand "mask_scatter_store<mode><vnsi>"
    (match_operand:<VnSI> 1 "register_operand")
    (match_operand 2 "immediate_operand")
    (match_operand:SI 3 "gcn_alu_operand")
-   (match_operand:V_ALL 4 "register_operand")
+   (match_operand:V_MOV 4 "register_operand")
    (match_operand:DI 5 "")]
   ""
   {
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 5608d85a1a0..edc2abcad26 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -487,7 +487,7 @@ gcn_class_max_nregs (reg_class_t rclass, machine_mode mode)
       if (vgpr_2reg_mode_p (mode))
 	return 2;
       /* TImode is used by DImode compare_and_swap.  */
-      if (mode == TImode)
+      if (vgpr_4reg_mode_p (mode))
 	return 4;
     }
   else if (rclass == VCC_CONDITIONAL_REG && mode == BImode)
@@ -590,9 +590,9 @@ gcn_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
        Therefore, we restrict ourselved to aligned registers.  */
     return (vgpr_1reg_mode_p (mode)
 	    || (!((regno - FIRST_VGPR_REG) & 1) && vgpr_2reg_mode_p (mode))
-	    /* TImode is used by DImode compare_and_swap.  */
-	    || (mode == TImode
-		&& !((regno - FIRST_VGPR_REG) & 3)));
+	    /* TImode is used by DImode compare_and_swap,
+	       and by DIVMOD V64DImode libfuncs.  */
+	    || (!((regno - FIRST_VGPR_REG) & 3) && vgpr_4reg_mode_p (mode)));
   return false;
 }
 
@@ -1324,6 +1324,7 @@ GEN_VN (PREFIX, si##SUFFIX, A(PARAMS), A(ARGS)) \
 GEN_VN (PREFIX, sf##SUFFIX, A(PARAMS), A(ARGS)) \
 GEN_VN (PREFIX, di##SUFFIX, A(PARAMS), A(ARGS)) \
 GEN_VN (PREFIX, df##SUFFIX, A(PARAMS), A(ARGS)) \
+USE_TI (GEN_VN (PREFIX, ti##SUFFIX, A(PARAMS), A(ARGS))) \
 static rtx \
 gen_##PREFIX##vNm##SUFFIX (PARAMS, rtx merge_src=NULL, rtx exec=NULL) \
 { \
@@ -1338,6 +1339,8 @@ gen_##PREFIX##vNm##SUFFIX (PARAMS, rtx merge_src=NULL, rtx exec=NULL) \
     case E_SFmode: return gen_##PREFIX##vNsf##SUFFIX (ARGS, merge_src, exec); \
     case E_DImode: return gen_##PREFIX##vNdi##SUFFIX (ARGS, merge_src, exec); \
     case E_DFmode: return gen_##PREFIX##vNdf##SUFFIX (ARGS, merge_src, exec); \
+    case E_TImode: \
+	USE_TI (return gen_##PREFIX##vNti##SUFFIX (ARGS, merge_src, exec);) \
     default: \
       break; \
     } \
@@ -1346,6 +1349,14 @@ gen_##PREFIX##vNm##SUFFIX (PARAMS, rtx merge_src=NULL, rtx exec=NULL) \
   return NULL_RTX; \
 }
 
+/* These have TImode support.  */
+#define USE_TI(ARGS) ARGS
+GEN_VNM (mov,, A(rtx dest, rtx src), A(dest, src))
+GEN_VNM (vec_duplicate,, A(rtx dest, rtx src), A(dest, src))
+
+/* These do not have TImode support.  */
+#undef USE_TI
+#define USE_TI(ARGS)
 GEN_VNM (add,3, A(rtx dest, rtx src1, rtx src2), A(dest, src1, src2))
 GEN_VN (add,si3_dup, A(rtx dest, rtx src1, rtx src2), A(dest, src1, src2))
 GEN_VN (add,si3_vcc_dup, A(rtx dest, rtx src1, rtx src2, rtx vcc),
@@ -1364,12 +1375,11 @@ GEN_VNM_NOEXEC (ds_bpermute,, A(rtx dest, rtx addr, rtx src, rtx exec),
 		A(dest, addr, src, exec))
 GEN_VNM (gather,_expr, A(rtx dest, rtx addr, rtx as, rtx vol),
 	 A(dest, addr, as, vol))
-GEN_VNM (mov,, A(rtx dest, rtx src), A(dest, src))
 GEN_VN (mul,si3_dup, A(rtx dest, rtx src1, rtx src2), A(dest, src1, src2))
 GEN_VN (sub,si3, A(rtx dest, rtx src1, rtx src2), A(dest, src1, src2))
-GEN_VNM (vec_duplicate,, A(rtx dest, rtx src), A(dest, src))
 GEN_VN_NOEXEC (vec_series,si, A(rtx dest, rtx x, rtx c), A(dest, x, c))
 
+#undef USE_TI
 #undef GEN_VNM
 #undef GEN_VN
 #undef GET_VN_FN
@@ -4893,7 +4903,13 @@ gcn_vector_mode_supported_p (machine_mode mode)
 	  || mode == V4SFmode || mode == V4DFmode
 	  || mode == V2QImode || mode == V2HImode
 	  || mode == V2SImode || mode == V2DImode
-	  || mode == V2SFmode || mode == V2DFmode);
+	  || mode == V2SFmode || mode == V2DFmode
+	  /* TImode vectors are allowed to exist for divmod, but there
+	     are almost no instructions defined for them, and the
+	     autovectorizer does not use them.  */
+	  || mode == V64TImode || mode == V32TImode
+	  || mode == V16TImode || mode == V8TImode
+	  || mode == V4TImode || mode == V2TImode);
 }
 
 /* Implement TARGET_VECTORIZE_PREFERRED_SIMD_MODE.
@@ -6721,6 +6737,10 @@ print_operand_address (FILE *file, rtx mem)
    O - print offset:n for data share operations.
    ^ - print "_co" suffix for GCN5 mnemonics
    g - print "glc", if appropriate for given MEM
+   L - print low-part of a multi-reg value
+   H - print second part of a multi-reg value (high-part of 2-reg value)
+   J - print third part of a multi-reg value
+   K - print fourth part of a multi-reg value
  */
 
 void
@@ -7260,6 +7280,12 @@ print_operand (FILE *file, rtx x, int code)
     case 'H':
       print_operand (file, gcn_operand_part (GET_MODE (x), x, 1), 0);
       return;
+    case 'J':
+      print_operand (file, gcn_operand_part (GET_MODE (x), x, 2), 0);
+      return;
+    case 'K':
+      print_operand (file, gcn_operand_part (GET_MODE (x), x, 3), 0);
+      return;
     case 'R':
       /* Print a scalar register number as an integer.  Temporary hack.  */
       gcc_assert (REG_P (x));
Richard Sandiford June 7, 2023, 7:42 p.m. UTC | #6
Andrew Stubbs <ams@codesourcery.com> writes:
> On 30/05/2023 07:26, Richard Biener wrote:
>> On Fri, May 26, 2023 at 4:35 PM Andrew Stubbs <ams@codesourcery.com> wrote:
>>>
>>> Hi all,
>>>
>>> I want to implement a vector DIVMOD libfunc for amdgcn, but I can't just
>>> do it because the GCC middle-end models DIVMOD's return value as
>>> "complex int" type, and there are no vector equivalents of that type.
>>>
>>> Therefore, this patch adds minimal support for "complex vector int"
>>> modes.  I have not attempted to provide any means to use these modes
>>> from C, so they're really only useful for DIVMOD.  The actual libfunc
>>> implementation will pack the data into wider vector modes manually.
>>>
>>> A knock-on effect of this is that I needed to increase the range of
>>> "mode_unit_size" (several of the vector modes supported by amdgcn exceed
>>> the previous 255-byte limit).
>>>
>>> Since this change would add a large number of new, unused modes to many
>>> architectures, I have elected to *not* enable them, by default, in
>>> machmode.def (where the other complex modes are created).  The new modes
>>> are therefore inactive on all architectures but amdgcn, for now.
>>>
>>> OK for mainline?  (I've not done a full test yet, but I will.)
>> 
>> I think it makes more sense to map vector CSImode to vector SImode with
>> the double number of lanes.  In fact since divmod is a libgcc function
>> I wonder where your vector variant would reside and how GCC decides to
>> emit calls to it?  That is, there's no way to OMP simd declare this function?
>
> The divmod implementation lives in libgcc. It's not too difficult to 
> write using vector extensions and some asm tricks. I did try an OMP simd 
> declare implementation, but it didn't vectorize well, and that's a yak
> I don't wish to shave right now.
>
> In any case, the OMP simd declare will not help us here, directly, 
> because the DIVMOD transformation happens too late in the pass pipeline, 
> long after ifcvt and vect. My implementation (not yet posted) uses a 
> libfunc and the TARGET_EXPAND_DIVMOD_LIBFUNC hook in the standard way. 
> It just needs the complex vector modes to exist.
>
> Using vectors twice the length is also problematic. If I create a new
> V128SImode that spans two 64-lane vector registers, that will probably
> have the desired effect ("real" quotient in v8, "imaginary" remainder
> in v9). But if I use V64SImode to represent two V32SImode vectors,
> that's a one-register mode, and I'd have to use a permutation (a
> memory operation) to extract lanes 32-63 into lanes 0-31. And if we
> ever want to implement instructions that operate on these modes (as
> opposed to the odd/even add/sub complex patterns we have now), the
> masking would be all broken and we'd need to constantly disassemble
> the double-length vectors to operate on them.

I don't know if this helps (probably not), but we have a similar
situation on AArch64: a 64-bit mode like V8QI can be doubled to a
128-bit vector or to a pair of 64-bit vectors.  We used V16QI for
the former and "V2x8QI" for the latter.  V2x8QI is forced to come
after V16QI in the mode list, and so it is only ever used through
explicit choice.  But both modes are functionally vectors of 16 QIs.
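For a sense of where V2x8QI shows up in practice, here is a short sketch;
it assumes the usual mapping of the Advanced SIMD tuple types onto these
modes, which is not stated in this thread:

  #include <arm_neon.h>

  /* Sketch: assuming int8x8x2_t is the type backed by V2x8QI, a
     de-interleaving NEON load fills both V8QI halves at once.  */
  int8x8x2_t
  load_even_odd (const int8_t *p)
  {
    return vld2_s8 (p);  /* lanes 0,2,4,... in val[0]; 1,3,5,... in val[1] */
  }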

Thanks,
Richard
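To make the "vector extensions" approach quoted above concrete, here is
a minimal lane-wise sketch of a restoring-division divmod. This is not
the actual (unposted) libgcc routine; the type and function names are
illustrative, and the real code presumably also needs the asm tricks to
pack the result into the wider return mode.

  /* Minimal sketch of a lane-wise unsigned divmod using GCC's generic
     vector extensions; 64 SImode lanes = 256 bytes, matching GCN's
     V64SImode.  All names here are illustrative.  */
  typedef unsigned int v64usi __attribute__ ((vector_size (256)));
  typedef int v64si __attribute__ ((vector_size (256)));

  struct divmod_v64usi { v64usi quot, rem; };

  static struct divmod_v64usi
  udivmod_v64usi (v64usi a, v64usi b)
  {
    v64usi quot = { 0 }, rem = { 0 };
    for (int bit = 31; bit >= 0; bit--)
      {
        /* Shift the next dividend bit into each partial remainder.  */
        rem = (rem << 1) | ((a >> bit) & 1);
        /* Lane-wise compare; true lanes are -1, false lanes are 0.  */
        v64si ge = rem >= b;
        /* Restoring step: subtract B and set the quotient bit only in
           lanes where the partial remainder reached B.  */
        rem -= b & (v64usi) ge;
        quot |= ((v64usi) ge & 1) << bit;
      }
    return (struct divmod_v64usi) { quot, rem };
  }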
Andrew Stubbs June 9, 2023, 8:42 a.m. UTC | #7
On 07/06/2023 20:42, Richard Sandiford wrote:
> I don't know if this helps (probably not), but we have a similar
> situation on AArch64: a 64-bit mode like V8QI can be doubled to a
> 128-bit vector or to a pair of 64-bit vectors.  We used V16QI for
> the former and "V2x8QI" for the latter.  V2x8QI is forced to come
> after V16QI in the mode list, and so it is only ever used through
> explicit choice.  But both modes are functionally vectors of 16 QIs.

OK, that's interesting, but how do you map "complex int" vectors to that 
mode? I tried to figure it out, but there's no DIVMOD support so I 
couldn't just do a straight comparison.

Thanks

Andrew
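For reference, the transformation being discussed is the
tree-ssa-math-opts pass that fuses a matching division/remainder pair
into a single DIVMOD internal call. A loop like the following sketch is
the kind of code that, once vectorized for amdgcn, ends up wanting a
complex-vector return value:

  /* After vectorization the operands below are V64SImode on amdgcn, so
     the fused DIVMOD call needs a "complex V64SImode" return.  */
  void
  divmod_loop (int *restrict quot, int *restrict rem,
               int *restrict a, int *restrict b, int n)
  {
    for (int i = 0; i < n; i++)
      {
        quot[i] = a[i] / b[i];
        rem[i] = a[i] % b[i];
      }
  }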
Richard Sandiford June 9, 2023, 9:02 a.m. UTC | #8
Andrew Stubbs <ams@codesourcery.com> writes:
> On 07/06/2023 20:42, Richard Sandiford wrote:
>> I don't know if this helps (probably not), but we have a similar
>> situation on AArch64: a 64-bit mode like V8QI can be doubled to a
>> 128-bit vector or to a pair of 64-bit vectors.  We used V16QI for
>> the former and "V2x8QI" for the latter.  V2x8QI is forced to come
>> after V16QI in the mode list, and so it is only ever used through
>> explicit choice.  But both modes are functionally vectors of 16 QIs.
>
> OK, that's interesting, but how do you map "complex int" vectors to that 
> mode? I tried to figure it out, but there's no DIVMOD support so I 
> couldn't just do a straight comparison.

Yeah, we don't do that currently.  Instead we make TARGET_ARRAY_MODE
return V2x8QI for an array of 2 V8QIs (which is OK, since V2x8QI has
64-bit rather than 128-bit alignment).  So we should use it for a
complex-y type like:

  struct { res_type res[2]; };

In principle we should be able to do the same for:

  struct { res_type a, b; };

but that isn't supported yet.  I think it would need a new target hook
along the lines of TARGET_ARRAY_MODE, but for structs rather than arrays.

The advantage of this from AArch64's PoV is that it extends to 3x and 4x
tuples as well, whereas complex is obviously for pairs only.

I don't know if it would be acceptable to use that kind of struct wrapper
for the divmod code though (for the vector case only).

Thanks,
Richard
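To illustrate the shape of the hook being described, a sketch under
stated assumptions: find_vector_tuple_mode is a hypothetical helper,
since the real aarch64_array_mode relies on target-specific tables.

  /* Sketch only.  Maps an array of NELEMS vectors onto a dedicated
     tuple mode, declining (so the default layout applies) in all
     other cases.  */
  static opt_machine_mode
  example_array_mode (machine_mode mode, unsigned HOST_WIDE_INT nelems)
  {
    machine_mode tuple;
    if (VECTOR_MODE_P (mode)
        && IN_RANGE (nelems, 2, 4)
        && find_vector_tuple_mode (mode, nelems).exists (&tuple))
      return tuple;
    return opt_machine_mode ();
  }

  #undef TARGET_ARRAY_MODE
  #define TARGET_ARRAY_MODE example_array_mode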
Andrew Stubbs June 9, 2023, 9:45 a.m. UTC | #9
On 09/06/2023 10:02, Richard Sandiford wrote:
> Andrew Stubbs <ams@codesourcery.com> writes:
>> On 07/06/2023 20:42, Richard Sandiford wrote:
>>> I don't know if this helps (probably not), but we have a similar
>>> situation on AArch64: a 64-bit mode like V8QI can be doubled to a
>>> 128-bit vector or to a pair of 64-bit vectors.  We used V16QI for
>>> the former and "V2x8QI" for the latter.  V2x8QI is forced to come
>>> after V16QI in the mode list, and so it is only ever used through
>>> explicit choice.  But both modes are functionally vectors of 16 QIs.
>>
>> OK, that's interesting, but how do you map "complex int" vectors to that
>> mode? I tried to figure it out, but there's no DIVMOD support so I
>> couldn't just do a straight comparison.
> 
> Yeah, we don't do that currently.  Instead we make TARGET_ARRAY_MODE
> return V2x8QI for an array of 2 V8QIs (which is OK, since V2x8QI has
> 64-bit rather than 128-bit alignment).  So we should use it for a
> complex-y type like:
> 
>    struct { res_type res[2]; };
> 
> In principle we should be able to do the same for:
> 
>    struct { res_type a, b; };
> 
> but that isn't supported yet.  I think it would need a new target hook
> along the lines of TARGET_ARRAY_MODE, but for structs rather than arrays.
> 
> The advantage of this from AArch64's PoV is that it extends to 3x and 4x
> tuples as well, whereas complex is obviously for pairs only.
> 
> I don't know if it would be acceptable to use that kind of struct wrapper
> for the divmod code though (for the vector case only).

Looking again, I don't think this will help because GCN does not have an 
instruction that loads vectors that are back-to-back, hence there's 
little benefit in adding the tuple mode.

However, GCN does have instructions that effectively load 2, 3, or 4 
vectors that are *interleaved*, which would be the likely case for 
complex numbers (or pixel colour data!).

Please advise on how to move forward with this patch: if the new
complex modes are not acceptable, then I think I need to reimplement
DIVMOD (maybe the scalars can remain as-is), but it's not clear to me
what that would look like.

Andrew
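A minimal example of that interleaved pattern (illustrative only):
complex-int data stored as re/im pairs makes every access part of a
stride-2 group, the natural fit for the 2-vector interleaving loads
described above.

  /* Each iteration touches a {real, imag} pair, so a vector load wants
     even lanes in one register and odd lanes in another: the 2-vector
     interleaved load pattern.  */
  void
  mul_by_i (int *restrict c, int n)
  {
    for (int i = 0; i < n; i++)
      {
        int re = c[2 * i];
        int im = c[2 * i + 1];
        c[2 * i] = -im;        /* (re + im*i) * i == -im + re*i */
        c[2 * i + 1] = re;
      }
  }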
Richard Biener June 9, 2023, 11:40 a.m. UTC | #10
On Fri, Jun 9, 2023 at 11:45 AM Andrew Stubbs <ams@codesourcery.com> wrote:
>
> On 09/06/2023 10:02, Richard Sandiford wrote:
> > Andrew Stubbs <ams@codesourcery.com> writes:
> >> On 07/06/2023 20:42, Richard Sandiford wrote:
> >>> I don't know if this helps (probably not), but we have a similar
> >>> situation on AArch64: a 64-bit mode like V8QI can be doubled to a
> >>> 128-bit vector or to a pair of 64-bit vectors.  We used V16QI for
> >>> the former and "V2x8QI" for the latter.  V2x8QI is forced to come
> >>> after V16QI in the mode list, and so it is only ever used through
> >>> explicit choice.  But both modes are functionally vectors of 16 QIs.
> >>
> >> OK, that's interesting, but how do you map "complex int" vectors to that
> >> mode? I tried to figure it out, but there's no DIVMOD support so I
> >> couldn't just do a straight comparison.
> >
> > Yeah, we don't do that currently.  Instead we make TARGET_ARRAY_MODE
> > return V2x8QI for an array of 2 V8QIs (which is OK, since V2x8QI has
> > 64-bit rather than 128-bit alignment).  So we should use it for a
> > complex-y type like:
> >
> >    struct { res_type res[2]; };
> >
> > In principle we should be able to do the same for:
> >
> >    struct { res_type a, b; };
> >
> > but that isn't supported yet.  I think it would need a new target hook
> > along the lines of TARGET_ARRAY_MODE, but for structs rather than arrays.

And the same should work for complex types, no?  In fact we could document
that TARGET_ARRAY_MODE also is used for _Complex?  Note the hook
is used for type layout and thus innocent array types (in aggregates) can end up
with a vector mode now.  Hopefully that's without bad effects (on the ABI).

That said, the hook _could_ be used just for divmod expansion without
actually creating a complex (or array) type of vectors.

> > The advantage of this from AArch64's PoV is that it extends to 3x and 4x
> > tuples as well, whereas complex is obviously for pairs only.
> >
> > I don't know if it would be acceptable to use that kind of struct wrapper
> > for the divmod code though (for the vector case only).
>
> Looking again, I don't think this will help because GCN does not have an
> instruction that loads vectors that are back-to-back, hence there's
> little benefit in adding the tuple mode.
>
> However, GCN does have instructions that effectively load 2, 3, or 4
> vectors that are *interleaved*, which would be the likely case for
> complex numbers (or pixel colour data!)

that's load_lanes, and I think it's not related here, but it probably
also needs the xN modes.

> I need to figure out how to move forward with this patch, please; if the
> new complex modes are not acceptable then I think I need to reimplement
> DIVMOD (maybe the scalars can remain as-is), but it's not clear to me
> what that would look like.
>
> Andrew
Richard Sandiford June 9, 2023, 11:57 a.m. UTC | #11
Richard Biener <richard.guenther@gmail.com> writes:
> On Fri, Jun 9, 2023 at 11:45 AM Andrew Stubbs <ams@codesourcery.com> wrote:
>>
>> On 09/06/2023 10:02, Richard Sandiford wrote:
>> > Andrew Stubbs <ams@codesourcery.com> writes:
>> >> On 07/06/2023 20:42, Richard Sandiford wrote:
>> >>> I don't know if this helps (probably not), but we have a similar
>> >>> situation on AArch64: a 64-bit mode like V8QI can be doubled to a
>> >>> 128-bit vector or to a pair of 64-bit vectors.  We used V16QI for
>> >>> the former and "V2x8QI" for the latter.  V2x8QI is forced to come
>> >>> after V16QI in the mode list, and so it is only ever used through
>> >>> explicit choice.  But both modes are functionally vectors of 16 QIs.
>> >>
>> >> OK, that's interesting, but how do you map "complex int" vectors to that
>> >> mode? I tried to figure it out, but there's no DIVMOD support so I
>> >> couldn't just do a straight comparison.
>> >
>> > Yeah, we don't do that currently.  Instead we make TARGET_ARRAY_MODE
>> > return V2x8QI for an array of 2 V8QIs (which is OK, since V2x8QI has
>> > 64-bit rather than 128-bit alignment).  So we should use it for a
>> > complex-y type like:
>> >
>> >    struct { res_type res[2]; };
>> >
>> > In principle we should be able to do the same for:
>> >
>> >    struct { res_type a, b; };
>> >
>> > but that isn't supported yet.  I think it would need a new target hook
>> > along the lines of TARGET_ARRAY_MODE, but for structs rather than arrays.
>
> And the same should work for complex types, no?  In fact we could document
> that TARGET_ARRAY_MODE also is used for _Complex?  Note the hook
> is used for type layout and thus innocent array types (in aggregates) can end up
> with a vector mode now.

Yeah, that was deliberate.  Given that we have modes for pairs of vectors,
it seemed better to use them even without an explicit opt-in.

> Hopefully that's without bad effects (on the ABI).

Well, I won't make any guarantees :)  But we did check, and it seemed
to be handled correctly.  Most of the AArch64 ABI code is agnostic to
aggregate modes.

> That said, the hook _could_ be used just for divmod expansion without
> actually creating a complex (or array) type of vectors.
>
>> > The advantage of this from AArch64's PoV is that it extends to 3x and 4x
>> > tuples as well, whereas complex is obviously for pairs only.
>> >
>> > I don't know if it would be acceptable to use that kind of struct wrapper
>> > for the divmod code though (for the vector case only).
>>
>> Looking again, I don't think this will help because GCN does not have an
>> instruction that loads vectors that are back-to-back, hence there's
>> little benefit in adding the tuple mode.
>>
>> However, GCN does have instructions that effectively load 2, 3, or 4
>> vectors that are *interleaved*, which would be the likely case for
>> complex numbers (or pixel colour data!)
>
> that's load_lanes, and I think it's not related here, but it probably
> also needs the xN modes.

Yeah, we need the modes for that too.

I don't think the modes imply that the registers can be loaded and stored
in-order using a single instruction.  That isn't possible for big-endian
AArch64, for example.  It also isn't possible for the equivalent SVE types.
But the modes are still useful in those cases, because of their use in
interleaved loads and stores (and for the ABI).

Thanks,
Richard
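Tying the discussion back to the patch below: with the new modes, a
TARGET_EXPAND_DIVMOD_LIBFUNC implementation can follow the shape of
arm_expand_divmod_libfunc. The following is only a sketch; the real
amdgcn hook is not posted in this thread, and it assumes the libfunc
returns the pair in the corresponding complex vector mode.

  /* Sketch modeled on arm_expand_divmod_libfunc; details are
     assumptions.  GET_MODE_COMPLEX_MODE maps, e.g., V64SImode to the
     new CV64SImode, which holds the quotient/remainder pair.  */
  static void
  gcn_expand_divmod_libfunc (rtx libfunc, machine_mode mode,
                             rtx op0, rtx op1, rtx *quot, rtx *rem)
  {
    machine_mode pair_mode = GET_MODE_COMPLEX_MODE (mode);
    rtx libval = emit_library_call_value (libfunc, NULL_RTX, LCT_CONST,
                                          pair_mode, op0, mode, op1, mode);
    *quot = simplify_gen_subreg (mode, libval, pair_mode, 0);
    *rem = simplify_gen_subreg (mode, libval, pair_mode,
                                GET_MODE_SIZE (mode));
  }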

Patch

diff --git a/gcc/config/gcn/gcn-modes.def b/gcc/config/gcn/gcn-modes.def
index 1357bec825d..486168fbeb3 100644
--- a/gcc/config/gcn/gcn-modes.def
+++ b/gcc/config/gcn/gcn-modes.def
@@ -121,3 +121,6 @@  ADJUST_ALIGNMENT (V2TI, 16);
 ADJUST_ALIGNMENT (V2HF, 2);
 ADJUST_ALIGNMENT (V2SF, 4);
 ADJUST_ALIGNMENT (V2DF, 8);
+
+/* These are used for vectorized divmod.  */
+COMPLEX_MODES (VECTOR_INT);
diff --git a/gcc/genmodes.cc b/gcc/genmodes.cc
index 715787b8f48..d472ee5a9a3 100644
--- a/gcc/genmodes.cc
+++ b/gcc/genmodes.cc
@@ -125,6 +125,7 @@  complex_class (enum mode_class c)
     case MODE_INT: return MODE_COMPLEX_INT;
     case MODE_PARTIAL_INT: return MODE_COMPLEX_INT;
     case MODE_FLOAT: return MODE_COMPLEX_FLOAT;
+    case MODE_VECTOR_INT: return MODE_COMPLEX_VECTOR_INT;
     default:
       error ("no complex class for class %s", mode_class_names[c]);
       return MODE_RANDOM;
@@ -382,6 +383,7 @@  complete_mode (struct mode_data *m)
 
     case MODE_COMPLEX_INT:
     case MODE_COMPLEX_FLOAT:
+    case MODE_COMPLEX_VECTOR_INT:
       /* Complex modes should have a component indicated, but no more.  */
       validate_mode (m, UNSET, UNSET, SET, UNSET, UNSET);
       m->ncomponents = 2;
@@ -1173,10 +1175,10 @@  inline __attribute__((__always_inline__))\n\
 #else\n\
 extern __inline__ __attribute__((__always_inline__, __gnu_inline__))\n\
 #endif\n\
-unsigned char\n\
+unsigned short\n\
 mode_unit_size_inline (machine_mode mode)\n\
 {\n\
-  extern CONST_MODE_UNIT_SIZE unsigned char mode_unit_size[NUM_MACHINE_MODES];\
+  extern CONST_MODE_UNIT_SIZE unsigned short mode_unit_size[NUM_MACHINE_MODES];\
 \n\
   gcc_assert (mode >= 0 && mode < NUM_MACHINE_MODES);\n\
   switch (mode)\n\
@@ -1683,7 +1685,7 @@  emit_mode_unit_size (void)
   int c;
   struct mode_data *m;
 
-  print_maybe_const_decl ("%sunsigned char", "mode_unit_size",
+  print_maybe_const_decl ("%sunsigned short", "mode_unit_size",
 			  "NUM_MACHINE_MODES", adj_bytesize);
 
   for_all_modes (c, m)
@@ -1873,6 +1875,7 @@  emit_mode_adjustments (void)
 	    {
 	    case MODE_COMPLEX_INT:
 	    case MODE_COMPLEX_FLOAT:
+            case MODE_COMPLEX_VECTOR_INT:
 	      printf ("  mode_size[E_%smode] = 2*s;\n", m->name);
 	      printf ("  mode_unit_size[E_%smode] = s;\n", m->name);
 	      printf ("  mode_base_align[E_%smode] = s & (~s + 1);\n",
@@ -1920,6 +1923,7 @@  emit_mode_adjustments (void)
 	    {
 	    case MODE_COMPLEX_INT:
 	    case MODE_COMPLEX_FLOAT:
+	    case MODE_COMPLEX_VECTOR_INT:
 	      printf ("  mode_base_align[E_%smode] = s;\n", m->name);
 	      break;
 
diff --git a/gcc/machmode.def b/gcc/machmode.def
index 62e2ba10d45..5bb5ae14fc3 100644
--- a/gcc/machmode.def
+++ b/gcc/machmode.def
@@ -267,6 +267,7 @@  UACCUM_MODE (UTA, 16, 64, 64); /* 64.64 */
 COMPLEX_MODES (INT);
 COMPLEX_MODES (PARTIAL_INT);
 COMPLEX_MODES (FLOAT);
+/* COMPLEX_MODES (VECTOR_INT);  Let's enable these per architecture.  */
 
 /* Decimal floating point modes.  */
 DECIMAL_FLOAT_MODE (SD, 4, decimal_single_format);
diff --git a/gcc/machmode.h b/gcc/machmode.h
index f1865c1ef42..5f4e13f0643 100644
--- a/gcc/machmode.h
+++ b/gcc/machmode.h
@@ -26,7 +26,7 @@  extern CONST_MODE_SIZE poly_uint16_pod mode_size[NUM_MACHINE_MODES];
 extern CONST_MODE_PRECISION poly_uint16_pod mode_precision[NUM_MACHINE_MODES];
 extern const unsigned char mode_inner[NUM_MACHINE_MODES];
 extern CONST_MODE_NUNITS poly_uint16_pod mode_nunits[NUM_MACHINE_MODES];
-extern CONST_MODE_UNIT_SIZE unsigned char mode_unit_size[NUM_MACHINE_MODES];
+extern CONST_MODE_UNIT_SIZE unsigned short mode_unit_size[NUM_MACHINE_MODES];
 extern const unsigned short mode_unit_precision[NUM_MACHINE_MODES];
 extern const unsigned char mode_next[NUM_MACHINE_MODES];
 extern const unsigned char mode_wider[NUM_MACHINE_MODES];
@@ -586,7 +586,7 @@  mode_to_inner (machine_mode mode)
 
 /* Return the base GET_MODE_UNIT_SIZE value for MODE.  */
 
-ALWAYS_INLINE unsigned char
+ALWAYS_INLINE unsigned short
 mode_to_unit_size (machine_mode mode)
 {
 #if GCC_VERSION >= 4001
diff --git a/gcc/mode-classes.def b/gcc/mode-classes.def
index de42d7ee6fb..dc6097081b3 100644
--- a/gcc/mode-classes.def
+++ b/gcc/mode-classes.def
@@ -37,4 +37,5 @@  along with GCC; see the file COPYING3.  If not see
   DEF_MODE_CLASS (MODE_VECTOR_ACCUM),	/* SIMD vectors */		   \
   DEF_MODE_CLASS (MODE_VECTOR_UACCUM),	/* SIMD vectors */		   \
   DEF_MODE_CLASS (MODE_VECTOR_FLOAT),                                      \
+  DEF_MODE_CLASS (MODE_COMPLEX_VECTOR_INT),				   \
   DEF_MODE_CLASS (MODE_OPAQUE)          /* opaque modes */
diff --git a/gcc/stor-layout.cc b/gcc/stor-layout.cc
index 023de8c37db..0699d44e516 100644
--- a/gcc/stor-layout.cc
+++ b/gcc/stor-layout.cc
@@ -391,6 +391,7 @@  int_mode_for_mode (machine_mode mode)
     case MODE_VECTOR_ACCUM:
     case MODE_VECTOR_UFRACT:
     case MODE_VECTOR_UACCUM:
+    case MODE_COMPLEX_VECTOR_INT:
       return int_mode_for_size (GET_MODE_BITSIZE (mode), 0);
 
     case MODE_OPAQUE:
diff --git a/gcc/tree.cc b/gcc/tree.cc
index ead4248b8e5..e1c87bbb0bb 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -7665,7 +7665,8 @@  build_complex_type (tree component_type, bool named)
 {
   gcc_assert (INTEGRAL_TYPE_P (component_type)
 	      || SCALAR_FLOAT_TYPE_P (component_type)
-	      || FIXED_POINT_TYPE_P (component_type));
+	      || FIXED_POINT_TYPE_P (component_type)
+	      || VECTOR_INTEGER_TYPE_P (component_type));
 
   /* Make a node of the sort we want.  */
   tree probe = make_node (COMPLEX_TYPE);