
[v4,01/10] Initial TI PRU GCC port

Message ID 20180906111217.24365-2-dimitar@dinux.eu
State New
Series New backend for the TI PRU processor

Commit Message

Dimitar Dimitrov Sept. 6, 2018, 11:12 a.m. UTC
ChangeLog:

2018-08-29  Dimitar Dimitrov  <dimitar@dinux.eu>

	* configure: Regenerate.
	* configure.ac: Add PRU target.

gcc/ChangeLog:

2018-08-29  Dimitar Dimitrov  <dimitar@dinux.eu>

	* common/config/pru/pru-common.c: New file.
	* config.gcc: Add PRU target.
	* config/pru/alu-zext.md: New file.
	* config/pru/constraints.md: New file.
	* config/pru/predicates.md: New file.
	* config/pru/pru-opts.h: New file.
	* config/pru/pru-passes.c: New file.
	* config/pru/pru-pragma.c: New file.
	* config/pru/pru-protos.h: New file.
	* config/pru/pru.c: New file.
	* config/pru/pru.h: New file.
	* config/pru/pru.md: New file.
	* config/pru/pru.opt: New file.
	* config/pru/t-pru: New file.
	* doc/extend.texi: Document PRU pragmas.
	* doc/invoke.texi: Document PRU-specific options.
	* doc/md.texi: Document PRU asm constraints.

Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
---
 configure.ac                       |    7 +
 gcc/common/config/pru/pru-common.c |   36 +
 gcc/config.gcc                     |    9 +
 gcc/config/pru/alu-zext.md         |  178 +++
 gcc/config/pru/constraints.md      |   88 ++
 gcc/config/pru/predicates.md       |  224 +++
 gcc/config/pru/pru-opts.h          |   31 +
 gcc/config/pru/pru-passes.c        |  234 +++
 gcc/config/pru/pru-pragma.c        |   90 ++
 gcc/config/pru/pru-protos.h        |   72 +
 gcc/config/pru/pru.c               | 3008 ++++++++++++++++++++++++++++++++++++
 gcc/config/pru/pru.h               |  551 +++++++
 gcc/config/pru/pru.md              |  946 ++++++++++++
 gcc/config/pru/pru.opt             |   53 +
 gcc/config/pru/t-pru               |   31 +
 gcc/doc/extend.texi                |   21 +
 gcc/doc/invoke.texi                |   55 +
 gcc/doc/md.texi                    |   22 +
 18 files changed, 5656 insertions(+)
 create mode 100644 gcc/common/config/pru/pru-common.c
 create mode 100644 gcc/config/pru/alu-zext.md
 create mode 100644 gcc/config/pru/constraints.md
 create mode 100644 gcc/config/pru/predicates.md
 create mode 100644 gcc/config/pru/pru-opts.h
 create mode 100644 gcc/config/pru/pru-passes.c
 create mode 100644 gcc/config/pru/pru-pragma.c
 create mode 100644 gcc/config/pru/pru-protos.h
 create mode 100644 gcc/config/pru/pru.c
 create mode 100644 gcc/config/pru/pru.h
 create mode 100644 gcc/config/pru/pru.md
 create mode 100644 gcc/config/pru/pru.opt
 create mode 100644 gcc/config/pru/t-pru

Comments

Richard Sandiford Sept. 13, 2018, 12:02 p.m. UTC | #1
Dimitar Dimitrov <dimitar@dinux.eu> writes:
> +; Specialized IOR/AND patterns for matching setbit/clearbit instructions.
> +;
> +; TODO - allow clrbit and setbit to support (1 << REG) constructs
> +
> +(define_insn "clearbit_<EQD:mode><EQS0:mode>_<bitalu_zext>"
> +  [(set (match_operand:EQD 0 "register_operand"	"=r")

Nit: stray tab instead of space.

> +	(and:EQD
> +	  (zero_extend:EQD
> +	    (match_operand:EQS0 1 "register_operand" "r"))
> +	  (match_operand:EQD 2 "single_zero_operand" "n")))]
> +  ""
> +  "clr\\t%0, %1, %V2"
> +  [(set_attr "type" "alu")
> +   (set_attr "length" "4")])

Very minor suggestion, doesn't need to hold up acceptance, but: some
patterns (like this one) have an explicit length of 4, but others have
an implicit length of 4 thanks to the default value in the define_attr.
It would be more consistent to do one or the other everywhere.

Using EQD and EQS0 like this has the unfortunate effect of creating
patterns like:

   ... (zero_extend:SI (match_operand:SI ...)) ...

(which you rely on for the define_substs) and:

   ... (zero_extend:HI (match_operand:SI ...)) ...

even though these are both invalid rtl.  That said, the rtl passes
should never generate this kind of rtl (i.e. they shouldn't rely on
targets to reject it), so that's probably fine in practice.  It just
adds a bit of bloat.  I also don't have a good suggestion how to fix it
without more infrastructure.  Adding:

  GET_MODE_SIZE (<EQD:MODE>mode) >= GET_MODE_SIZE (<EQS0:MODE>mode)

should cause gencondmd to remove the cases in which the zero_extend
result is narrower than the input, but doing that everywhere would
be ugly, and would still leave the case in which the zero_extend
result is the same size as the input (which you can't remove without
breaking the define_substs).

So after all that, I think this is fine as-is.

> +(define_insn "sub_impl<EQD:mode><EQS0:mode><EQS1:mode>_<alu3_zext><alu3_zext_op1><alu3_zext_op2>"
> +  [(set (match_operand:EQD 0 "register_operand" "=r,r,r")
> +	(minus:EQD
> +	 (zero_extend:EQD
> +	  (match_operand:EQS0 1 "reg_or_ubyte_operand" "r,r,I"))
> +	 (zero_extend:EQD
> +	  (match_operand:EQS1 2 "reg_or_ubyte_operand" "r,I,r"))))]
> +  ""
> +  "@
> +   sub\\t%0, %1, %2
> +   sub\\t%0, %1, %2
> +   rsb\\t%0, %2, %1"
> +  [(set_attr "type" "alu")
> +   (set_attr "length" "4")])

By convention, subtraction patterns shouldn't accept constants for
operand 2.  Constants should instead be subtracted using an addition
of the negative.
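A minimal sketch of how that convention is usually honored at expand
time (untested, and the details are illustrative: it assumes the port
provides gen_addsi3 and routes constants through nonmemory_operand):

```lisp
;; Sketch: intercept constant subtrahends in the expander and emit an
;; addition of the negated constant, so only register-register
;; subtractions ever reach the sub define_insn.
(define_expand "subsi3"
  [(set (match_operand:SI 0 "register_operand")
	(minus:SI (match_operand:SI 1 "register_operand")
		  (match_operand:SI 2 "nonmemory_operand")))]
  ""
{
  if (CONST_INT_P (operands[2]))
    {
      emit_insn (gen_addsi3 (operands[0], operands[1],
			     gen_int_mode (-INTVAL (operands[2]), SImode)));
      DONE;
    }
})
```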

> +(define_constraint "I"
> +  "An unsigned 8-bit constant."
> +  (and (match_code "const_int")
> +       (match_test "UBYTE_INT (ival)")))

As it stands this will reject QImode constants with the top bit set,
since const_ints are always stored in sign-extended form.  E.g. QImode
128 is stored as (const_int -128) rather than (const_int 128).

Unfortunately this is difficult to fix in a clean way, since
const_ints don't store their mode (a long-standing wart) and unlike
predicates, constraints aren't given a mode from context.  The best
way I can think of to cope with it is:

a) have a separate constraint for -128...127
b) add a define_mode_attr that maps QI to the new constraint and
   HI and SI to I
c) use <EQS1:...> etc. instead of I in the match_operands

Similar comment for "J" and HImode, although you already have the
"N" as the corresponding signed constraint and so don't need a new one.

> +(define_predicate "const_1_operand"
> +  (and (match_code "const_int")
> +       (match_test "IN_RANGE (INTVAL (op), 1, 1)")))

INTVAL (op) == 1 seems more obvious.

> +(define_predicate "const_ubyte_operand"
> +  (and (match_code "const_int")
> +       (match_test "IN_RANGE (INTVAL (op), 0, 0xff)")))
> +
> +(define_predicate "const_uhword_operand"
> +  (and (match_code "const_int")
> +       (match_test "IN_RANGE (INTVAL (op), 0, 0xffff)")))

Here you can use "INTVAL (op) & GET_MODE_MASK (mode)" to avoid the
problem above, as long as you always pass a mode to these predicates.

> +(define_predicate "ctable_addr_operand"
> +  (and (match_code "const_int")
> +       (match_test ("(pru_get_ctable_base_index (INTVAL (op)) >= 0)"))))
> +
> +(define_predicate "ctable_base_operand"
> +  (and (match_code "const_int")
> +       (match_test ("(pru_get_ctable_exact_base_index (INTVAL (op)) >= 0)"))))

Redundant brackets around the match_test strings.

(I want to rip out that syntax at some point, since allowing brackets
around strings just makes the files harder for scripting tools to parse.)

> +;; Return true if OP is a text segment reference.
> +;; This is needed for program memory address expressions.  Borrowed from AVR.
> +(define_predicate "text_segment_operand"
> +  (match_code "code_label,label_ref,symbol_ref,plus,minus,const")
> +{
> +  switch (GET_CODE (op))
> +    {
> +    case CODE_LABEL:
> +      return true;
> +    case LABEL_REF :
> +      return true;
> +    case SYMBOL_REF :
> +      return SYMBOL_REF_FUNCTION_P (op);
> +    case PLUS :
> +    case MINUS :
> +      /* Assume canonical format of symbol + constant.
> +	 Fall through.  */
> +    case CONST :
> +      return text_segment_operand (XEXP (op, 0), VOIDmode);
> +    default :
> +      return false;
> +    }
> +})

This probably comes from AVR, but: no spaces before ":".

Bit surprised that we can get a CODE_LABEL rather than a LABEL_REF here.
Do you know if that triggers in practice, and if so where?

An IMO neater and slightly more robust way of writing the body is:

  poly_int64 offset;
  rtx base = strip_offset (op, &offset);
  switch (GET_CODE (base))
    {
    case LABEL_REF:
      ...as above...
    case SYMBOL_REF:
      ...as above...
    default:
      return false;
    }

with "plus" and "minus" not in the match_code list (since they should
always appear in consts if they really are text references).
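Filling in the "...as above..." placeholders, the suggested predicate
would read roughly as below (untested, like the sketch above; whether
code_label stays in the match_code list depends on the CODE_LABEL
question above):

```lisp
(define_predicate "text_segment_operand"
  (match_code "label_ref,symbol_ref,const")
{
  poly_int64 offset;
  rtx base = strip_offset (op, &offset);
  switch (GET_CODE (base))
    {
    case LABEL_REF:
      return true;
    case SYMBOL_REF:
      return SYMBOL_REF_FUNCTION_P (base);
    default:
      return false;
    }
})
```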

> +;; Return true if OP is a load multiple operation.  It is known to be a
> +;; PARALLEL and the first section will be tested.
> +
> +(define_special_predicate "load_multiple_operation"
> +  (match_code "parallel")
> +{
> +  machine_mode elt_mode;
> +  int count = XVECLEN (op, 0);
> +  unsigned int dest_regno;
> +  rtx src_addr;
> +  int i, off;
> +
> +  /* Perform a quick check so we don't blow up below.  */
> +  if (GET_CODE (XVECEXP (op, 0, 0)) != SET
> +      || GET_CODE (SET_DEST (XVECEXP (op, 0, 0))) != REG
> +      || GET_CODE (SET_SRC (XVECEXP (op, 0, 0))) != MEM)
> +    return false;
> +
> +  dest_regno = REGNO (SET_DEST (XVECEXP (op, 0, 0)));
> +  src_addr = XEXP (SET_SRC (XVECEXP (op, 0, 0)), 0);
> +  elt_mode = GET_MODE (SET_DEST (XVECEXP (op, 0, 0)));
> +
> +  /* Check, is base, or base + displacement.  */
> +
> +  if (GET_CODE (src_addr) == REG)
> +    off = 0;
> +  else if (GET_CODE (src_addr) == PLUS
> +	   && GET_CODE (XEXP (src_addr, 0)) == REG
> +	   && GET_CODE (XEXP (src_addr, 1)) == CONST_INT)
> +    {
> +      off = INTVAL (XEXP (src_addr, 1));
> +      src_addr = XEXP (src_addr, 0);
> +    }
> +  else
> +    return false;
> +
> +  for (i = 1; i < count; i++)
> +    {
> +      rtx elt = XVECEXP (op, 0, i);
> +
> +      if (GET_CODE (elt) != SET
> +	  || GET_CODE (SET_DEST (elt)) != REG
> +	  || GET_MODE (SET_DEST (elt)) != elt_mode
> +	  || REGNO (SET_DEST (elt)) != dest_regno + i
> +	  || GET_CODE (SET_SRC (elt)) != MEM
> +	  || GET_MODE (SET_SRC (elt)) != elt_mode
> +	  || GET_CODE (XEXP (SET_SRC (elt), 0)) != PLUS
> +	  || ! rtx_equal_p (XEXP (XEXP (SET_SRC (elt), 0), 0), src_addr)
> +	  || GET_CODE (XEXP (XEXP (SET_SRC (elt), 0), 1)) != CONST_INT
> +	  || INTVAL (XEXP (XEXP (SET_SRC (elt), 0), 1))
> +	     != off + i * GET_MODE_SIZE (elt_mode))
> +	return false;
> +    }
> +
> +  return true;
> +})

This may not matter in practice, but the downside of testing PLUS and
CONST_INT explicitly is that the predicate won't match something like:

  (set (reg R1) (mem (plus (reg RB) (const_int -4))))
  (set (reg R2) (mem (reg RB)))
  (set (reg R3) (mem (plus (reg RB) (const_int 4))))
  ...

Using strip_offset on the addresses would avoid this.

> +;; Return true if OP is a store multiple operation.  It is known to be a
> +;; PARALLEL and the first section will be tested.
> +
> +(define_special_predicate "store_multiple_operation"
> +  (match_code "parallel")
> +{
> +  machine_mode elt_mode;
> +  int count = XVECLEN (op, 0);
> +  unsigned int src_regno;
> +  rtx dest_addr;
> +  int i, off;
> +
> +  /* Perform a quick check so we don't blow up below.  */
> +  if (GET_CODE (XVECEXP (op, 0, 0)) != SET
> +      || GET_CODE (SET_DEST (XVECEXP (op, 0, 0))) != MEM
> +      || GET_CODE (SET_SRC (XVECEXP (op, 0, 0))) != REG)
> +    return false;
> +
> +  src_regno = REGNO (SET_SRC (XVECEXP (op, 0, 0)));
> +  dest_addr = XEXP (SET_DEST (XVECEXP (op, 0, 0)), 0);
> +  elt_mode = GET_MODE (SET_SRC (XVECEXP (op, 0, 0)));
> +
> +  /* Check, is base, or base + displacement.  */
> +
> +  if (GET_CODE (dest_addr) == REG)
> +    off = 0;
> +  else if (GET_CODE (dest_addr) == PLUS
> +	   && GET_CODE (XEXP (dest_addr, 0)) == REG
> +	   && GET_CODE (XEXP (dest_addr, 1)) == CONST_INT)
> +    {
> +      off = INTVAL (XEXP (dest_addr, 1));
> +      dest_addr = XEXP (dest_addr, 0);
> +    }
> +  else
> +    return false;
> +
> +  for (i = 1; i < count; i++)
> +    {
> +      rtx elt = XVECEXP (op, 0, i);
> +
> +      if (GET_CODE (elt) != SET
> +	  || GET_CODE (SET_SRC (elt)) != REG
> +	  || GET_MODE (SET_SRC (elt)) != elt_mode
> +	  || REGNO (SET_SRC (elt)) != src_regno + i
> +	  || GET_CODE (SET_DEST (elt)) != MEM
> +	  || GET_MODE (SET_DEST (elt)) != elt_mode
> +	  || GET_CODE (XEXP (SET_DEST (elt), 0)) != PLUS
> +	  || ! rtx_equal_p (XEXP (XEXP (SET_DEST (elt), 0), 0), dest_addr)
> +	  || GET_CODE (XEXP (XEXP (SET_DEST (elt), 0), 1)) != CONST_INT
> +	  || INTVAL (XEXP (XEXP (SET_DEST (elt), 0), 1))
> +	     != off + i * GET_MODE_SIZE (elt_mode))
> +	return false;
> +    }
> +  return true;
> +})

Same comment here.

> +/* Callback for walk_gimple_seq that checks TP tree for TI ABI compliance.  */
> +static tree
> +check_op_callback (tree *tp, int *walk_subtrees, void *data)
> +{
> +  struct walk_stmt_info *wi = (struct walk_stmt_info *) data;
> +
> +  if (RECORD_OR_UNION_TYPE_P (*tp) || TREE_CODE (*tp) == ENUMERAL_TYPE)
> +    {
> +      /* Forward declarations have NULL tree type.  Skip them.  */
> +      if (TREE_TYPE (*tp) == NULL)
> +	return NULL;
> +    }
> +
> +  /* TODO - why C++ leaves INTEGER_TYPE forward declarations around?  */
> +  if (TREE_TYPE (*tp) == NULL)
> +    return NULL;
> +
> +  const tree type = TREE_TYPE (*tp);
> +
> +  /* Direct function calls are allowed, obviously.  */
> +  if (TREE_CODE (*tp) == ADDR_EXPR && TREE_CODE (type) == POINTER_TYPE)
> +    {
> +      const tree ptype = TREE_TYPE (type);
> +      if (TREE_CODE (ptype) == FUNCTION_TYPE)
> +	return NULL;
> +    }

This seems like a bit of a dangerous heuristic.  Couldn't it also cause
us to skip things like:

   (void *) func

?  (OK, that's dubious C, but we do support it.)

Maybe it would be better to check something like:

  gcall *call = dyn_cast <gcall *> (gsi_stmt (wi->gsi));
  if (call
      && tp == gimple_call_fn_ptr (call)
      && gimple_call_fndecl (call))
    return NULL;

instead (completely untested).

> +/* Implements the "pragma CTABLE_ENTRY" pragma.  This pragma takes a
> +   CTABLE index and an address, and instructs the compiler that
> +   LBCO/SBCO can be used on that base address.
> +
> +   WARNING: Only immediate constant addresses are currently supported.  */
> +static void
> +pru_pragma_ctable_entry (cpp_reader * reader ATTRIBUTE_UNUSED)
> +{
> +  tree ctable_index, base_addr;
> +  enum cpp_ttype type;
> +
> +  type = pragma_lex (&ctable_index);
> +  if (type == CPP_NUMBER)
> +    {
> +      type = pragma_lex (&base_addr);
> +      if (type == CPP_NUMBER)
> +	{
> +	  unsigned int i = tree_to_uhwi (ctable_index);
> +	  unsigned HOST_WIDE_INT base = tree_to_uhwi (base_addr);

You should test tree_fits_uhwi_p as well as type == CPP_NUMBER,
otherwise this could ICE for large values.  Also, "i" should be
"unsigned HOST_WIDE_INT", so that silly indices like 0x100000000
don't get silently truncated to 0.

> +extern const char * pru_output_sign_extend (rtx *);
> +extern const char * pru_output_signed_cbranch (rtx *, bool);
> +extern const char * pru_output_signed_cbranch_ubyteop2 (rtx *, bool);
> +extern const char * pru_output_signed_cbranch_zeroop2 (rtx *, bool);

Nit: should be no space between "*" and the function name.

> +#ifdef TREE_CODE
> +bool pru_return_in_memory (const_tree type, const_tree fntype ATTRIBUTE_UNUSED);
> +#endif /* TREE_CODE */

Nit: ATTRIBUTE_UNUSED doesn't have any meaning for prototypes,
and could easily be forgotten if the implementation does end up looking
at fntype in future.

> +/* Return the bytes needed to compute the frame pointer from the current
> +   stack pointer.  */
> +static int
> +pru_compute_frame_layout (void)
> +{
> +  int regno;
> +  HARD_REG_SET *save_mask;
> +  int total_size;
> +  int var_size;
> +  int out_args_size;
> +  int save_reg_size;
> +
> +  if (cfun->machine->initialized)
> +    return cfun->machine->total_size;
[...]
> +  cfun->machine->initialized = reload_completed;

Rather than this, please convert to the new TARGET_COMPUTE_FRAME_LAYOUT hook,
which is called only when necessary, and so doesn't need to protect against
repeated evaluations.

> +  /* Attach a note indicating what happened.  */
> +  if (reg_note_rtx == NULL_RTX)
> +    reg_note_rtx = copy_rtx (op0_adjust);
> +  if (kind != REG_NOTE_MAX)
> +    add_reg_note (insn, kind, reg_note_rtx);

Nit: would be better to copy only if kind != REG_NOTE_MAX.

> +  /* Ok, save this bunch.  */
> +  addr = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
> +		       gen_int_mode (*sp_offset, Pmode));

Better to use plus_constant (which is shorter and also copes with a
zero offset).

> +/* Emit function prologue.  */
> +void
> +pru_expand_prologue (void)
> +{
> +  int regno_start;
> +  int total_frame_size;
> +  int sp_offset;      /* Offset from base_reg to final stack value.  */
> +  int save_regs_base; /* Offset from base_reg to register save area.  */
> +  int save_offset;    /* Temporary offset to currently saved register group.  */
> +  rtx insn;
> +
> +  total_frame_size = pru_compute_frame_layout ();
> +
> +  if (flag_stack_usage_info)
> +    current_function_static_stack_size = total_frame_size;
> +
> +  /* Decrement the stack pointer.  */
> +  if (!UBYTE_INT (total_frame_size))
> +    {
> +      /* We need an intermediary point, this will point at the spill block.  */
> +      insn = pru_add_to_sp (cfun->machine->save_regs_offset
> +			     - total_frame_size,
> +			     REG_NOTE_MAX);
> +      save_regs_base = 0;
> +      sp_offset = -cfun->machine->save_regs_offset;
> +    }
> +  else if (total_frame_size)
> +    {
> +      insn = emit_insn (gen_sub2_insn (stack_pointer_rtx,
> +				       gen_int_mode (total_frame_size,
> +						     Pmode)));

As above, constant amounts should be subtracted using an add2_insn with
the negative amount: insns that subtract a const_int aren't canonical.

> +  regno_start = 0;
> +  save_offset = save_regs_base;
> +  do {
> +      regno_start = xbbo_next_reg_cluster (regno_start, &save_offset, true);
> +  } while (regno_start >= 0);

Nit: GNU formatting would be:

  do
    regno_start = xbbo_next_reg_cluster (regno_start, &save_offset, true);
  while (regno_start >= 0);

Same for the copy in the epilogue.

> +  if (frame_pointer_needed)
> +    {
> +      int fp_save_offset = save_regs_base + cfun->machine->fp_save_offset;
> +      pru_add3_frame_adjust (hard_frame_pointer_rtx,
> +				    stack_pointer_rtx,
> +				    fp_save_offset,
> +				    REG_NOTE_MAX,
> +				    NULL_RTX);

Indentation looks off.

> +    }
> +
> +  if (sp_offset)
> +      pru_add_to_sp (sp_offset, REG_FRAME_RELATED_EXPR);

Same here (and for the epilogue code).

> +  else if (!UBYTE_INT (total_frame_size))
> +    {
> +      pru_add_to_sp (cfun->machine->save_regs_offset,
> +			    REG_CFA_ADJUST_CFA);

Same here (sorry!).

> +/* Implement TARGET_MODES_TIEABLE_P.  */
> +
> +static bool
> +pru_modes_tieable_p (machine_mode mode1, machine_mode mode2)
> +{
> +  return (mode1 == mode2
> +	  || (GET_MODE_SIZE (mode1) <= 4 && GET_MODE_SIZE (mode2) <= 4));
> +}

I think this should have a comment explaining why this implementation
is needed.

> +/* Implement TARGET_HARD_REGNO_MODE_OK.  */
> +
> +static bool
> +pru_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
> +{
> +  switch (GET_MODE_SIZE (mode))
> +    {
> +    case 1: return true;
> +    case 2: return (regno % 4) <= 2;
> +    case 4: return (regno % 4) == 0;
> +    default: return (regno % 4) == 0; /* Not sure why VOIDmode is passed.  */
> +    }
> +}

The comment made me think that the default case was only there for
VOIDmode, but it looks from the rest of the port like you do support
modes bigger than 4 units.  Might be worth writing in a way that makes
that more obvious.

But yeah, passing VOIDmode is a bug.

> +/* Implement `TARGET_HARD_REGNO_SCRATCH_OK'.
> +   Returns true if REGNO is safe to be allocated as a scratch
> +   register (for a define_peephole2) in the current function.  */
> +
> +static bool
> +pru_hard_regno_scratch_ok (unsigned int regno)
> +{
> +  /* Don't allow hard registers that might be part of the frame pointer.
> +     Some places in the compiler just test for [HARD_]FRAME_POINTER_REGNUM
> +     and don't handle a frame pointer that spans more than one register.  */
> +
> +  if ((!reload_completed || frame_pointer_needed)
> +      && ((regno >= HARD_FRAME_POINTER_REGNUM
> +	   && regno <= HARD_FRAME_POINTER_REGNUM + 3)
> +	  || (regno >= FRAME_POINTER_REGNUM
> +	      && regno <= FRAME_POINTER_REGNUM + 3)))
> +    {
> +      return false;
> +    }
> +
> +  return true;
> +}
> +
> +
> +/* Worker function for `HARD_REGNO_RENAME_OK'.
> +   Return nonzero if register OLD_REG can be renamed to register NEW_REG.  */
> +
> +int
> +pru_hard_regno_rename_ok (unsigned int old_reg,
> +			  unsigned int new_reg)
> +{
> +  /* Don't allow hard registers that might be part of the frame pointer.
> +     Some places in the compiler just test for [HARD_]FRAME_POINTER_REGNUM
> +     and don't care for a frame pointer that spans more than one register.  */
> +  if ((!reload_completed || frame_pointer_needed)
> +      && ((old_reg >= HARD_FRAME_POINTER_REGNUM
> +	   && old_reg <= HARD_FRAME_POINTER_REGNUM + 3)
> +	  || (old_reg >= FRAME_POINTER_REGNUM
> +	      && old_reg <= FRAME_POINTER_REGNUM + 3)
> +	  || (new_reg >= HARD_FRAME_POINTER_REGNUM
> +	      && new_reg <= HARD_FRAME_POINTER_REGNUM + 3)
> +	  || (new_reg >= FRAME_POINTER_REGNUM
> +	      && new_reg <= FRAME_POINTER_REGNUM + 3)))
> +    {
> +      return 0;
> +    }
> +
> +  return 1;
> +}

I realise you've borrowed this approach from AVR, but it would really be
good to fix the target-independent code instead of hacking around it
in target code.

That said, given that this is what AVR already does, it would be unfair
to insist on it before accepting the port.

GNU style is to have no braces around single statements.

> +/* Compute a (partial) cost for rtx X.  Return true if the complete
> +   cost has been computed, and false if subexpressions should be
> +   scanned.  In either case, *TOTAL contains the cost result.  */
> +static bool
> +pru_rtx_costs (rtx x, machine_mode mode ATTRIBUTE_UNUSED,
> +	       int outer_code, int opno ATTRIBUTE_UNUSED,
> +	       int *total, bool speed ATTRIBUTE_UNUSED)
> +{
> +  const int code = GET_CODE (x);
> +
> +  switch (code)
> +    {
> +      case CONST_INT:

Nit: case statements should be indented by the same amount as the switch "{".

> +    case SET:
> +	{
> +	  /* A SET doesn't have a mode, so let's look at the SET_DEST to get
> +	     the mode for the factor.  */
> +	  mode = GET_MODE (SET_DEST (x));
> +
> +	  if (GET_MODE_SIZE (mode) <= GET_MODE_SIZE (SImode)
> +	      && (GET_CODE (SET_SRC (x)) == ZERO_EXTEND
> +		  || outer_code == ZERO_EXTEND))

The outer_code == ZERO_EXTEND seems redundant: SETs can never be
embedded in ZERO_EXTENDs.

> +	    {
> +	      *total = 0;
> +	    }
> +	  else
> +	    {
> +	      /* SI move has the same cost as a QI move.  */
> +	      int factor = GET_MODE_SIZE (mode) / GET_MODE_SIZE (SImode);
> +	      if (factor == 0)
> +		factor = 1;
> +	      *total = factor * COSTS_N_INSNS (1);
> +	    }

Could you explain this a bit more?  It looks like a pure QImode operation
gets a cost of 1 insn but an SImode operation zero-extended from QImode
gets a cost of 0.

> +      case PLUS:
> +	{
> +	  rtx op0 = XEXP (x, 0);
> +	  rtx op1 = XEXP (x, 1);
> +	  if (outer_code == MEM
> +	      && ((REG_P (op0) && reg_or_ubyte_operand (op1, VOIDmode))
> +		  || (REG_P (op1) && reg_or_ubyte_operand (op0, VOIDmode))

This second check shouldn't be necessary: (plus (const_int ...) (reg ...))
is invalid.

> +		  || (ctable_addr_operand (op0, VOIDmode) && op1 == NULL_RTX)
> +		  || (ctable_addr_operand (op1, VOIDmode) && op0 == NULL_RTX)

(plus ...) should never have null operands.

> +		  || (ctable_base_operand (op0, VOIDmode) && REG_P (op1))
> +		  || (ctable_base_operand (op1, VOIDmode) && REG_P (op0))))
> +	    {
> +	      /* CTABLE or REG base addressing - PLUS comes for free.  */
> +	      *total = COSTS_N_INSNS (0);
> +	      return true;
> +	    }
> +	  else
> +	    {
> +	      *total = COSTS_N_INSNS (1);
> +	      return false;
> +	    }
> +	}

[...]

> +/* Implement TARGET_PREFERRED_RELOAD_CLASS.  */
> +static reg_class_t
> +pru_preferred_reload_class (rtx x ATTRIBUTE_UNUSED, reg_class_t regclass)
> +{
> +  return regclass == NO_REGS ? GENERAL_REGS : regclass;
> +}

This looks odd: PREFERRED_RELOAD_CLASS should return a subset of the
original class rather than add new registers.  Can you remember why
it was needed?

> +  /* Floating-point comparisons.  */
> +  eqsf_libfunc = init_one_libfunc ("__pruabi_eqf");
> +  nesf_libfunc = init_one_libfunc ("__pruabi_neqf");
> +  lesf_libfunc = init_one_libfunc ("__pruabi_lef");
> +  ltsf_libfunc = init_one_libfunc ("__pruabi_ltf");
> +  gesf_libfunc = init_one_libfunc ("__pruabi_gef");
> +  gtsf_libfunc = init_one_libfunc ("__pruabi_gtf");
> +  eqdf_libfunc = init_one_libfunc ("__pruabi_eqd");
> +  nedf_libfunc = init_one_libfunc ("__pruabi_neqd");
> +  ledf_libfunc = init_one_libfunc ("__pruabi_led");
> +  ltdf_libfunc = init_one_libfunc ("__pruabi_ltd");
> +  gedf_libfunc = init_one_libfunc ("__pruabi_ged");
> +  gtdf_libfunc = init_one_libfunc ("__pruabi_gtd");
> +
> +  set_optab_libfunc (eq_optab, SFmode, NULL);
> +  set_optab_libfunc (ne_optab, SFmode, "__pruabi_neqf");
> +  set_optab_libfunc (gt_optab, SFmode, NULL);
> +  set_optab_libfunc (ge_optab, SFmode, NULL);
> +  set_optab_libfunc (lt_optab, SFmode, NULL);
> +  set_optab_libfunc (le_optab, SFmode, NULL);
> +  set_optab_libfunc (unord_optab, SFmode, "__pruabi_unordf");
> +  set_optab_libfunc (eq_optab, DFmode, NULL);
> +  set_optab_libfunc (ne_optab, DFmode, "__pruabi_neqd");
> +  set_optab_libfunc (gt_optab, DFmode, NULL);
> +  set_optab_libfunc (ge_optab, DFmode, NULL);
> +  set_optab_libfunc (lt_optab, DFmode, NULL);
> +  set_optab_libfunc (le_optab, DFmode, NULL);

So you need to pass NULL for the ordered comparisons because the ABI
routines return a different value from the one GCC expects?  Might be
worth a comment if so.

> +/* Emit comparison instruction if necessary, returning the expression
> +   that holds the compare result in the proper mode.  Return the comparison
> +   that should be used in the jump insn.  */
> +
> +rtx
> +pru_expand_fp_compare (rtx comparison, machine_mode mode)
> +{
> +  enum rtx_code code = GET_CODE (comparison);
> +  rtx op0 = XEXP (comparison, 0);
> +  rtx op1 = XEXP (comparison, 1);
> +  rtx cmp;
> +  enum rtx_code jump_code = code;
> +  machine_mode op_mode = GET_MODE (op0);
> +  rtx_insn *insns;
> +  rtx libfunc;
> +
> +  gcc_assert (op_mode == DFmode || op_mode == SFmode);
> +
> +  if (code == UNGE)
> +    {
> +      code = LT;
> +      jump_code = EQ;
> +    }
> +  else if (code == UNLE)
> +    {
> +      code = GT;
> +      jump_code = EQ;
> +    }

This is only safe if the LT and GT libcalls don't raise exceptions for
unordered inputs.  I'm guessing they don't, but it's probably worth a
comment if so.

> +/* Return the sign bit position for given OP's mode.  */
> +static int
> +sign_bit_position (const rtx op)
> +{
> +  const int sz = GET_MODE_SIZE (GET_MODE (op));
> +
> +  return  sz * 8 - 1;
> +}

Nit: excess space after "return".

> +  gcc_assert (bufi > 0);
> +  gcc_assert ((unsigned int)bufi < sizeof (buf));

Nit: missing space after "(unsigned int)".  Other instances later.

> +  /* Determine the comparison operators for positive and negative operands.  */
> +  if (code == LT)
> +      cmp_opstr = "qblt";
> +  else if (code == LE)
> +      cmp_opstr = "qble";
> +  else
> +      gcc_unreachable ();

Nit: indented too far.  Same for gtge.

> +/* Output asm code for a signed-compare conditional branch.
> +
> +   If IS_NEAR is true, then QBBx instructions may be used for reaching
> +   the destination label.  Otherwise JMP is used, at the expense of
> +   increased code size.  */
> +const char *
> +pru_output_signed_cbranch (rtx *operands, bool is_near)
> +{
> +  enum rtx_code code = GET_CODE (operands[0]);
> +
> +  if (code == LT || code == LE)
> +    return pru_output_ltle_signed_cbranch (operands, is_near);
> +  else if (code == GT || code == GE)
> +    return pru_output_gtge_signed_cbranch (operands, is_near);
> +  else
> +      gcc_unreachable ();

gcc_unreachable indented too far.

> +/*
> +   Optimized version of pru_output_signed_cbranch for constant second
> +   operand.  */
> +
> +const char *
> +pru_output_signed_cbranch_ubyteop2 (rtx *operands, bool is_near)

Excess newline after "/*".  Same for later routines.

> +/* Return true if register REGNO is a valid base register.
> +   STRICT_P is true if REG_OK_STRICT is in effect.  */
> +
> +bool
> +pru_regno_ok_for_base_p (int regno, bool strict_p)
> +{
> +  if (!HARD_REGISTER_NUM_P (regno))
> +    {
> +      if (!strict_p)
> +	return true;
> +
> +      if (!reg_renumber)
> +	return false;
> +
> +      regno = reg_renumber[regno];

reg_renumber is a reload-only thing and shouldn't be needed/used on
LRA targets.

> +/* Recognize a CTABLE base address.  Return CTABLE entry index, or -1 if
> + * base was not found in the pragma-filled pru_ctable.  */
> +int
> +pru_get_ctable_exact_base_index (unsigned HOST_WIDE_INT caddr)

Nit: no leading "*" on second line.

> +/* Return true if the address expression formed by BASE + OFFSET is
> +   valid.  */
> +static bool
> +pru_valid_addr_expr_p (machine_mode mode, rtx base, rtx offset, bool strict_p)
> +{
> +  if (!strict_p && base != NULL_RTX && GET_CODE (base) == SUBREG)
> +    base = SUBREG_REG (base);
> +  if (!strict_p && offset != NULL_RTX && GET_CODE (offset) == SUBREG)
> +    offset = SUBREG_REG (offset);

As above, this shouldn't need to cope with null base and offset, since
PLUS can never have null operands.

> +  if (REG_P (base)
> +      && pru_regno_ok_for_base_p (REGNO (base), strict_p)
> +      && (offset == NULL_RTX
> +	  || (CONST_INT_P (offset)
> +	      && pru_valid_const_ubyte_offset (mode, INTVAL (offset)))
> +	  || (REG_P (offset)
> +	      && pru_regno_ok_for_index_p (REGNO (offset), strict_p))))
> +    {
> +      /*     base register + register offset
> +       * OR  base register + UBYTE constant offset.  */
> +      return true;
> +    }
> +  else if (REG_P (base)
> +	   && pru_regno_ok_for_index_p (REGNO (base), strict_p)
> +	   && (offset != NULL_RTX && ctable_base_operand (offset, VOIDmode)))
> +    {
> +      /*     base CTABLE constant base + register offset
> +       * Note: GCC always puts the register as a first operand of PLUS.  */
> +      return true;
> +    }
> +  else if (CONST_INT_P (base)
> +	   && offset == NULL_RTX
> +	   && (ctable_addr_operand (base, VOIDmode)))
> +    {
> +      /*     base CTABLE constant base + UBYTE constant offset.  */
> +      return true;

This address should simply be a (const_int ...), with no containing (plus ...).

> +  switch (GET_CODE (op))
> +    {
> +    case REG:
> +      if (letter == 0)
> +	{
> +	  fprintf (file, "%s", pru_asm_regname (op));
> +	  return;
> +	}
> +      else if (letter == 'b')
> +	{
> +	  gcc_assert (REGNO (op) <= LAST_NONIO_GP_REG);
> +	  fprintf (file, "r%d.b%d", REGNO (op) / 4, REGNO (op) % 4);
> +	  return;
> +	}
> +      else if (letter == 'F')
> +	{
> +	  gcc_assert (REGNO (op) <= LAST_NONIO_GP_REG);
> +	  gcc_assert (REGNO (op) % 4 == 0);
> +	  fprintf (file, "r%d", REGNO (op) / 4);
> +	  return;

This routine needs to use output_operand_lossage instead of an assert
for anything that could conceivably come from inline asms.  E.g. the
assert for REGNO (op) % 4 == 0 would cause an ICE if the user
(wrongly) used %F with a byte register.

This is true even for undocumented prefix letters, since we don't give
a specific error for using undocumented features and could still reach
this point for them.

> +    case CONST_DOUBLE:
> +      if (letter == 0)
> +	{
> +	  long val;
> +
> +	  if (GET_MODE (op) != SFmode)
> +	    fatal_insn ("internal compiler error.  Unknown mode:", op);

Same here.  DOUBLE_TYPE_SIZE is 64, so an inline asm like:

  asm ("# %0" :: "F" (1.0));

could get this internal compiler error.

> +    case SUBREG:
> +    case MEM:
> +      if (letter == 0)
> +	{
> +	  output_address (VOIDmode, op);
> +	  return;
> +	}

Does the SUBREG case ever hit?  They shouldn't appear as operands
at this late stage.

> +/* Implement TARGET_PRINT_OPERAND_ADDRESS.  */
> +static void
> +pru_print_operand_address (FILE *file, machine_mode mode, rtx op)
> +{
> +  if (GET_CODE (op) != REG && CONSTANT_ADDRESS_P (op)
> +      && text_segment_operand (op, VOIDmode))
> +    {
> +      fprintf (stderr, "Unexpectred text address?\n");
> +      debug_rtx (op);
> +      gcc_unreachable ();
> +    }

CONSTANT_ADDRESS_P makes the test for REG redundant.  This too could
trigger for inline asms and should be reported as an error (if that's
the right thing to do) rather than an ICE.

> +    case PLUS:
> +      {
> +	int base;
> +	rtx op0 = XEXP (op, 0);
> +	rtx op1 = XEXP (op, 1);
> +
> +	if (REG_P (op0) && CONST_INT_P (op1)
> +	    && pru_get_ctable_exact_base_index (INTVAL (op1)) >= 0)
> +	  {
> +	    base = pru_get_ctable_exact_base_index (INTVAL (op1));
> +	    fprintf (file, "%d, %s", base, pru_asm_regname (op0));
> +	    return;
> +	  }
> +	else if (REG_P (op1) && CONST_INT_P (op0)
> +		 && pru_get_ctable_exact_base_index (INTVAL (op0)) >= 0)
> +	  {
> +	    base = pru_get_ctable_exact_base_index (INTVAL (op0));
> +	    fprintf (file, "%d, %s", base, pru_asm_regname (op1));
> +	    return;
> +	  }

As above, it's a bug if this last case ever occurs.

> +	else if (REG_P (op0) && CONSTANT_P (op1))
> +	  {
> +	    fprintf (file, "%s, ", pru_asm_regname (op0));
> +	    output_addr_const (file, op1);
> +	    return;
> +	  }
> +	else if (REG_P (op1) && CONSTANT_P (op0))
> +	  {
> +	    fprintf (file, "%s, ", pru_asm_regname (op1));
> +	    output_addr_const (file, op0);
> +	    return;
> +	  }

Same here.

> +	else if (REG_P (op1) && REG_P (op0))
> +	  {
> +	    fprintf (file, "%s, %s", pru_asm_regname (op0),
> +				     pru_asm_regname (op1));
> +	    return;
> +	  }
> +      }
> +      break;
> +
> +    case REG:
> +      fprintf (file, "%s, 0", pru_asm_regname (op));
> +      return;
> +
> +    case MEM:
> +      {
> +	rtx base = XEXP (op, 0);
> +	pru_print_operand_address (file, mode, base);
> +	return;
> +      }
> +    default:
> +      break;
> +    }
> +
> +  fprintf (stderr, "Missing way to print address\n");
> +  debug_rtx (op);
> +  gcc_unreachable ();

Same comment about ICEing on user inline asms.

> +/* Implement TARGET_ASM_FILETARGET_ASM_FILE_START_START.  */

Typo.

> +/* Helper function to get the starting storage HW register for an argument,
> +   or -1 if it must be passed on stack.  The cum_v state is not changed.  */
> +static int
> +pru_function_arg_regi (cumulative_args_t cum_v,
> +		       machine_mode mode, const_tree type,
> +		       bool named)
> +{
> +  CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v);
> +  size_t argsize = pru_function_arg_size (mode, type);
> +  size_t i, bi;
> +  int regi = -1;
> +
> +  if (!pru_arg_in_reg_bysize (argsize))
> +    return -1;
> +
> +  if (!named)
> +    return -1;
> +
> +  /* Find the first available slot that fits.  Yes, that's the PRU ABI.  */
> +  for (i = 0; regi < 0 && i < ARRAY_SIZE (cum->regs_used); i++)
> +    {
> +      if (mode == BLKmode)
> +	{
> +	  /* Structs are passed, beginning at a full register.  */
> +	  if ((i % 4) != 0)
> +	    continue;
> +	}

The comment doesn't seem to match the code: a struct like:

  struct s { short x; };

will have HImode rather than BLKmode.

It's usually better to base ABI stuff on the type where possible,
since that corresponds directly to the source language, while modes
are more of an internal GCC thing.

> +  for (; count < 2; count++)
> +      emit_insn_before (gen_nop (), last_insn);

Nit: indented too far.

> +/* Code for converting doloop_begins and doloop_ends into valid
> +   PRU instructions.  Idea and code snippets borrowed from mep port.
> +
> +   A doloop_begin is just a placeholder:
> +
> +	$count = unspec ($count)
> +
> +   where $count is initially the number of iterations.
> +   doloop_end has the form:
> +
> +	if (--$count == 0) goto label
> +
> +   The counter variable is private to the doloop insns, nothing else
> +   relies on its value.
> +
> +   There are three cases, in decreasing order of preference:
> +
> +      1.  A loop has exactly one doloop_begin and one doloop_end.
> +	 The doloop_end branches to the first instruction after
> +	 the doloop_begin.
> +
> +	 In this case we can replace the doloop_begin with a LOOP
> +	 instruction and remove the doloop_end.  I.e.:
> +
> +		$count1 = unspec ($count1)
> +	    label:
> +		...
> +		if (--$count2 != 0) goto label
> +
> +	  becomes:
> +
> +		LOOP end_label,$count1
> +	    label:
> +		...
> +	    end_label:
> +		# end loop
> +
> +      2.  As for (1), except there are several doloop_ends.  One of them
> +	 (call it X) falls through to a label L.  All the others fall
> +	 through to branches to L.
> +
> +	 In this case, we remove X and replace the other doloop_ends
> +	 with branches to the LOOP label.  For example:
> +
> +		$count1 = unspec ($count1)
> +	    label:
> +		...
> +		if (--$count1 != 0) goto label
> +	    end_label:
> +		...
> +		if (--$count2 != 0) goto label
> +		goto end_label
> +
> +	 becomes:
> +
> +		LOOP end_label,$count1
> +	    label:
> +		...
> +	    end_label:
> +		# end repeat
> +		...
> +		goto end_label
> +
> +      3.  The fallback case.  Replace doloop_begins with:
> +
> +		$count = $count
> +
> +	 Replace doloop_ends with the equivalent of:
> +
> +		$count = $count - 1
> +		if ($count != 0) goto loop_label
> +
> +	 */

Ooh, I remember this :-)  Nice to see the code live on in some form.

> +  pru_builtins[PRU_BUILTIN_DELAY_CYCLES]
> +    = add_builtin_function ( "__delay_cycles", void_ftype_longlong,
> +			    PRU_BUILTIN_DELAY_CYCLES, BUILT_IN_MD, NULL,
> +			    NULL_TREE);

Nit: stray space before "__delay_cycles".

> +/* Emit a sequence of one or more delay_cycles_X insns, in order to generate
> +   code that delays exactly ARG cycles.  */
> +
> +static rtx
> +pru_expand_delay_cycles (rtx arg)
> +{
> +  HOST_WIDE_INT c, n;
> +
> +  if (GET_CODE (arg) != CONST_INT)
> +    {
> +      error ("%<__delay_cycles%> only takes constant arguments");
> +      return NULL_RTX;
> +    }
> +
> +  c = INTVAL (arg);
> +
> +  if (HOST_BITS_PER_WIDE_INT > 32)

This is always true now.

> +/* TI ABI implementation is not feature enough (e.g. function pointers are
> +   not supported), so we cannot list it as a multilib variant.  To prevent
> +   misuse from users, do not link any of the standard libraries.  */

Maybe s/feature enough/feature-complete enough/ (already correct in the
t-pru comment).

> +/* Basic characteristics of PRU registers:
> +
> +   Regno  Name
> +   0      r0		  Caller Saved.  Also used as a static chain register.
> +   1      r1		  Caller Saved.  Also used as a temporary by function
> +			  profiler and function prologue/epilogue.
> +   2      r2       sp	  Stack Pointer
> +   3*     r3.w0    ra	  Return Address (16-bit)
> +   4      r4       fp	  Frame Pointer
> +   5-13   r5-r13	  Callee Saved Registers
> +   14-29  r14-r29	  Register Arguments.  Caller Saved Registers.
> +   14-15  r14-r15	  Return Location
> +   30     r30		  Special I/O register.  Not used by compiler.
> +   31     r31		  Special I/O register.  Not used by compiler.
> +
> +   32     loop_cntr	  Internal register used as a counter by LOOP insns
> +
> +   33     pc		  Not an actual register
> +
> +   34     fake_fp	  Fake Frame Pointer (always eliminated)
> +   35     fake_ap	  Fake Argument Pointer (always eliminated)
> +   36			  First Pseudo Register
> +
> +   The definitions for all the hard register numbers are located in pru.md.
> +*/

Might be worth clarifying this to say that there are four GCC registers
per PRU register, so e.g. the first pseudo register is actually 144 rather
than 36.

> +/* Tests for various kinds of constants used in the PRU port.  */
> +#define SHIFT_INT(X) ((X) >= 0 && (X) <= 31)
> +
> +#define UHWORD_INT(X) (IN_RANGE ((X), 0, 0xffff))
> +#define SHWORD_INT(X) (IN_RANGE ((X), -32768, 32767))
> +#define UBYTE_INT(X) (IN_RANGE ((X), 0, 0xff))

Would be good to use IN_RANGE for SHIFT_INT too, since it avoids the
double evaluation of the macro argument.

> +/* Say that the epilogue uses the return address register.  Note that
> +   in the case of sibcalls, the values "used by the epilogue" are
> +   considered live at the start of the called function.  */
> +#define EPILOGUE_USES(REGNO) (epilogue_completed &&	      \
> +			      (((REGNO) == RA_REGNO)	      \
> +			       || (REGNO) == (RA_REGNO + 1)))

The "&&" should be on the next line.

> +#define __pru_name_R(X)  X".b0", X".b1", X".b2", X".b3"

Would be better not to use a leading "__", since that's in the
implementation namespace.  How about just PRU_NAME_R?

> +#define __pru_overlap_R(X)	      \
> +  { "r" #X	, X * 4	    ,  4 },   \
> +  { "r" #X ".w0", X * 4 + 0 ,  2 },   \
> +  { "r" #X ".w1", X * 4 + 1 ,  2 },   \
> +  { "r" #X ".w2", X * 4 + 2 ,  2 }

Same sort of thing here.

> +; Not recommended.  Please use %0 instead!
> +(define_mode_attr regwidth [(QI ".b0") (HI ".w0") (SI "")])

Looks like you've successfully done this transition :-)  I couldn't
see any other references to "regwidth", so might as well just delete it.

> +; I cannot think of any reason for the core to pass a 64-bit symbolic
> +; constants.  Hence simplify the rule and handle only numeric constants.

Not sure what you mean by "the core" here: the core compiler?
If so, then yeah, by default the only modes that a symbolic address can
have are Pmode and ptr_mode, which are both SImode here.  So I don't
think this is a simplification so much as expected behaviour.

> +; load_multiple pattern(s).
> +;
> +; ??? Due to reload problems with replacing registers inside match_parallel
> +; we currently support load_multiple/store_multiple only after reload.
> +;
> +; Idea taken from the s390 port.
> +
> +(define_expand "load_multiple"
> +  [(match_par_dup 3 [(set (match_operand 0 "" "")
> +			  (match_operand 1 "" ""))
> +		     (use (match_operand 2 "" ""))])]
> +  "reload_completed"
> +{
> +  machine_mode mode;
> +  int regno;
> +  int count;
> +  rtx from;
> +  int i, off;
> +
> +  /* Support only loading a constant number of fixed-point registers from
> +     memory.  */
> +  if (GET_CODE (operands[2]) != CONST_INT
> +      || GET_CODE (operands[1]) != MEM
> +      || GET_CODE (operands[0]) != REG)
> +    FAIL;
> +
> +  count = INTVAL (operands[2]);
> +  regno = REGNO (operands[0]);
> +  mode = GET_MODE (operands[0]);
> +  if (mode != QImode)
> +    FAIL;
> +
> +  operands[3] = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (count));
> +  if (!can_create_pseudo_p ())

This is always true if reload_completed, so I think the "else" arm
is dead code.

> +    {
> +      if (GET_CODE (XEXP (operands[1], 0)) == REG)
> +	{
> +	  from = XEXP (operands[1], 0);
> +	  off = 0;
> +	}
> +      else if (GET_CODE (XEXP (operands[1], 0)) == PLUS
> +	       && GET_CODE (XEXP (XEXP (operands[1], 0), 0)) == REG
> +	       && GET_CODE (XEXP (XEXP (operands[1], 0), 1)) == CONST_INT)
> +	{
> +	  from = XEXP (XEXP (operands[1], 0), 0);
> +	  off = INTVAL (XEXP (XEXP (operands[1], 0), 1));
> +	}
> +      else
> +	FAIL;

Could be simplified using strip_offset.

Same comments for store_multiple.

> +;; Arithmetic Operations
> +
> +(define_expand "add<mode>3"
> +  [(set (match_operand:QISI 0 "register_operand"	    "=r,r,r")
> +	(plus:QISI (match_operand:QISI 1 "register_operand" "%r,r,r")
> +		 (match_operand:QISI 2 "nonmemory_operand"  "r,I,M")))]
> +  ""
> +  ""
> +  [(set_attr "type" "alu")])

define_expands shouldn't have constraints or attributes (or at least,
they have no effect).  So this should just be:

(define_expand "add<mode>3"
  [(set (match_operand:QISI 0 "register_operand")
	(plus:QISI (match_operand:QISI 1 "register_operand")
		   (match_operand:QISI 2 "nonmemory_operand")))])

Same for the rest of the file.

> +(define_insn "subdi3"
> +  [(set (match_operand:DI 0 "register_operand"		    "=&r,&r,&r")
> +	(minus:DI (match_operand:DI 1 "register_operand"    "r,r,I")
> +		 (match_operand:DI 2 "reg_or_ubyte_operand" "r,I,r")))]
> +  ""
> +  "@
> +   sub\\t%F0, %F1, %F2\;suc\\t%N0, %N1, %N2
> +   sub\\t%F0, %F1, %2\;suc\\t%N0, %N1, 0
> +   rsb\\t%F0, %F2, %1\;rsc\\t%N0, %N2, 0"
> +  [(set_attr "type" "alu,alu,alu")
> +   (set_attr "length" "8,8,8")])

As above, this shouldn't handle operand 2 being constant; that should
always go through adddi3 instead.

> +; LRA cannot cope with clobbered op2, hence the scratch register.
> +(define_insn "ashr<mode>3"
> +  [(set (match_operand:QISI 0 "register_operand"	    "=&r,r")
> +	  (ashiftrt:QISI
> +	    (match_operand:QISI 1 "register_operand"	    "0,r")
> +	    (match_operand:QISI 2 "reg_or_const_1_operand"  "r,P")))
> +   (clobber (match_scratch:QISI 3			    "=r,X"))]
> +  ""
> +  "@
> +   mov %3, %2\;ASHRLP%=:\;qbeq ASHREND%=, %3, 0\; sub %3, %3, 1\; lsr\\t%0, %0, 1\; qbbc ASHRLP%=, %0, (%S0 * 8) - 2\; set %0, %0, (%S0 * 8) - 1\; jmp ASHRLP%=\;ASHREND%=:
> +   lsr\\t%0, %1, 1\;qbbc LSIGN%=, %0, (%S0 * 8) - 2\;set %0, %0, (%S0 * 8) - 1\;LSIGN%=:"
> +  [(set_attr "type" "complex,alu")
> +   (set_attr "length" "28,4")])

What do you mean by LRA not coping?  What did you try originally and
what went wrong?

> +(define_insn "<code>di3"
> +  [(set (match_operand:DI 0 "register_operand"		"=&r,&r")
> +	  (LOGICAL_BITOP:DI
> +	    (match_operand:DI 1 "register_operand"	"%r,r")
> +	    (match_operand:DI 2 "reg_or_ubyte_operand"	"r,I")))]
> +  ""
> +  "@
> +   <logical_bitop_asm>\\t%F0, %F1, %F2\;<logical_bitop_asm>\\t%N0, %N1, %N2
> +   <logical_bitop_asm>\\t%F0, %F1, %2\;<logical_bitop_asm>\\t%N0, %N1, 0"
> +  [(set_attr "type" "alu,alu")
> +   (set_attr "length" "8,8")])
> +
> +
> +(define_insn "one_cmpldi2"
> +  [(set (match_operand:DI 0 "register_operand"		"=r")
> +	(not:DI (match_operand:DI 1 "register_operand"	"r")))]
> +  ""
> +{
> +  /* careful with overlapping source and destination regs.  */
> +  gcc_assert (GP_REG_P (REGNO (operands[0])));
> +  gcc_assert (GP_REG_P (REGNO (operands[1])));
> +  if (REGNO (operands[0]) == (REGNO (operands[1]) + 4))
> +    return "not\\t%N0, %N1\;not\\t%F0, %F1";
> +  else
> +    return "not\\t%F0, %F1\;not\\t%N0, %N1";
> +}
> +  [(set_attr "type" "alu")
> +   (set_attr "length" "8")])

Do you see any code improvements from defining these patterns?
These days I'd expect you'd get better code by letting the target-independent
code split them up into SImode operations.

> +;; Multiply instruction.  Idea for fixing registers comes from the AVR backend.
> +
> +(define_expand "mulsi3"
> +  [(set (match_operand:SI 0 "register_operand" "")
> +	(mult:SI (match_operand:SI 1 "register_operand" "")
> +		 (match_operand:SI 2 "register_operand" "")))]
> +  ""
> +{
> +  emit_insn (gen_mulsi3_fixinp (operands[0], operands[1], operands[2]));
> +  DONE;
> +})
> +
> +
> +(define_expand "mulsi3_fixinp"
> +  [(set (reg:SI 112) (match_operand:SI 1 "register_operand" ""))
> +   (set (reg:SI 116) (match_operand:SI 2 "register_operand" ""))
> +   (set (reg:SI 104) (mult:SI (reg:SI 112) (reg:SI 116)))
> +   (set (match_operand:SI 0 "register_operand" "") (reg:SI 104))]
> +  ""
> +{
> +})

This seems slightly dangerous since there's nothing to guarantee that
those registers aren't already live at the point of expansion.
The more usual way (and IMO correct way) would be for:

> +(define_insn "*mulsi3_prumac"
> +  [(set (reg:SI 104) (mult:SI (reg:SI 112) (reg:SI 116)))]
> +  ""
> +  "nop\;xin\\t0, r26, 4"
> +  [(set_attr "type" "alu")
> +   (set_attr "length" "8")])

to have register classes and constraints for these three registers,
like e.g. the x86 port does for %cx etc.

It would be good if you could try that.  On the other hand, if AVR
already does this and it worked in practice then I guess it's not
something that should hold up the port.

> +(define_expand "doloop_end"
> +  [(use (match_operand 0 "nonimmediate_operand" ""))
> +   (use (label_ref (match_operand 1 "" "")))]
> +  "TARGET_OPT_LOOP"
> +  "if (GET_CODE (operands[0]) == REG && GET_MODE (operands[0]) == QImode)
> +     FAIL;
> +   pru_emit_doloop (operands, 1);
> +   DONE;
> +  ")

Would be more consistent to use { ... } instead of "..." to quote the code.

> +@item ctable_entry @var{index} @var{constant_address}
> +@cindex pragma, ctable_entry
> +Specifies that the given PRU CTABLE entry at @var{index} has a value
> +@var{constant_address}.  This enables GCC to emit LBCO/SBCO instructions
> +when the load/store address is known and can be addressed with some CTABLE
> +entry.  Example:

Maybe "...that the PRU CTABLE entry given by @var{index} has the value...".
Think "For example:" is used more than "Example:".

> @@ -23444,6 +23449,56 @@ the offset with a symbol reference to a canary in the TLS block.
>  @end table
>  
>  
> +@node PRU Options
> +@subsection PRU Options
> +@cindex PRU Options
> +
> +These command-line options are defined for PRU target:
> +
> +@table @gcctabopt
> +@item -minrt
> +@opindex minrt
> +Enable the use of a minimum runtime environment---no static
> +initializers or constructors.  Results in significant code size
> +reduction of final ELF binaries.

Up to you, but this read to me as "-m"+"inrt".  If this option is already
in use then obviously keep it, but if it's new then it might be worth
calling it -mminrt instead, for consistency with -mmcu.

Maybe s/---/, with no support for/?

Maybe "Using this option can significantly reduce the size of the
final ELF binary."

It might also be worth emphasising that this option only has an effect
when linking, and that the compiler doesn't (AFAICT) enforce that static
initialisers and constructors aren't used when -minrt is passed.

> +@item -mmcu=@var{mcu}
> +@opindex mmcu
> +Specify the PRU MCU variant to use.  Check Newlib for exact list of options.

s/for exact/for the exact/.
Maybe s/list of options/list of supported MCUs/?

> +@item -mno-relax
> +@opindex mno-relax
> +Pass on (or do not pass on) the @option{-mrelax} command-line option
> +to the linker.

This "Pass on (or do not pass on)" construct is only used when GCC
supports two contrasting options.  It looks like the port really
does only support -mno-relax.  Also, the linker option is
--relax rather than -mrelax.  So maybe something like:

Make GCC pass the @option{--no-relax} command-line option to the linker
instead of the @option{--relax} option.

The mno-relax entry in pru.opt should have RejectNegative, to prevent
"mno-no-relax" (unless something already prevents that these days).

> +@item -mabi=@var{variant}
> +@opindex mabi
> +Specify the ABI variant to output code for.  Permissible values are @samp{gnu}
> +for GCC, and @samp{ti} for fully conformant TI ABI.  These are the differences:
>
> +@table @samp
> +@item Function Pointer Size
> +TI ABI specifies that function (code) pointers are 16-bit, whereas GCC
> +supports only 32-bit data and code pointers.
>
> +@item Optional Return Value Pointer
> +Function return values larger than 64 bits are passed by using a hidden
> +pointer as the first argument of the function.  TI ABI, though, mandates that
> +the pointer can be NULL in case the caller is not using the returned value.
> +GCC always passes and expects a valid return value pointer.
> +
> +@end table
> +
> +The current @samp{mabi=ti} implementation simply raises a compile error
> +when any of the above code constructs is detected.

It might be better to reword this so that there's a clearer distinction
between "gnu" the GNU ABI for PRU on the one hand and what GCC the
compiler generates on the other.  Maybe after "Specify the ABI variant
to output code for.":

@option{-mabi=ti} selects the unmodified TI ABI while @option{-mabi=gnu}
selects a GNU variant that copes more naturally with certain GCC
assumptions.

and then refer to the "GNU ABI" instead of "GCC" when describing the
differences.

Also, this doesn't mention that -mabi=ti implies -mno-relax.
Would be good to mention that and explain why.

> +@item M
> +An integer constant in the range [-255;0].

Think [-255, 0] is more common.

> +
> +@item T
> +A text segment (program memory) constant label.
> +
> +@item Z
> +Integer constant zero.

Just checking, but did you deliberately leave out N?  That's fine if so,
but the docstring in constraints.md didn't have an "@internal" marker to
indicate it was internal-only.

Despite the long reply (sorry), this looks in really good shape to me FWIW.
Thanks for the submission.

Richard
Dimitar Dimitrov Sept. 22, 2018, 9:08 a.m. UTC | #2
On Thursday, 9/13/2018 13:02:21 EEST Richard Sandiford wrote:
> Dimitar Dimitrov <dimitar@dinux.eu> writes:
> > +(define_insn
> > "sub_impl<EQD:mode><EQS0:mode><EQS1:mode>_<alu3_zext><alu3_zext_op1><alu3
> > _zext_op2>" +  [(set (match_operand:EQD 0 "register_operand" "=r,r,r")
> > +	(minus:EQD
> > +	 (zero_extend:EQD
> > +	  (match_operand:EQS0 1 "reg_or_ubyte_operand" "r,r,I"))
> > +	 (zero_extend:EQD
> > +	  (match_operand:EQS1 2 "reg_or_ubyte_operand" "r,I,r"))))]
> > +  ""
> > +  "@
> > +   sub\\t%0, %1, %2
> > +   sub\\t%0, %1, %2
> > +   rsb\\t%0, %2, %1"
> > +  [(set_attr "type" "alu")
> > +   (set_attr "length" "4")])
> 
> By convention, subtraction patterns shouldn't accept constants for
> operand 2.  Constants should instead be subtracted using an addition
> of the negative.
Understood. I will remove the second alternative. But I will leave the third one 
since it enables an important optimization:

   unsigned test(unsigned a)
   {
        return 10-a;
   }

RTL:
(insn 6 3 7 2 (set (reg:SI 152)
        (minus:SI (const_int 10 [0xa])
            (reg/v:SI 151 [ a ]))) "test.c":430 -1
     (nil))

Assembly:
    test:
        rsb     r14, r14, 10
        ret

> 
> > +(define_constraint "I"
> > +  "An unsigned 8-bit constant."
> > +  (and (match_code "const_int")
> > +       (match_test "UBYTE_INT (ival)")))
> 
> As it stands this will reject QImode constants with the top bit set,
> since const_ints are always stored in sign-extended form.  E.g. QImode
> 128 is stored as (const_int -128) rather than (const_int 128).
> 
> Unfortunately this is difficult to fix in a clean way, since
> const_ints don't store their mode (a long-standing wart) and unlike
> predicates, constraints aren't given a mode from context.  The best
> way I can think of coping with it is:
> 
> a) have a separate constraint for -128...127
> b) add a define_mode_attr that maps QI to the new constraint and
>    HI and SI to I
> c) use <EQS1:...> etc. instead of I in the match_operands
> 
> Similar comment for "J" and HImode, although you already have the
> "N" as the corresponding signed constraint and so don't need a new one.
Thank you. This strategy worked for QImode. I will include the changes in my 
next patch.

Since PRU ALU operations do not have 16-bit constant operands, there is no 
need to add a define_mode_attr for HImode. The "mov" pattern already handles the 
"N" [-32768, 32767] constraint.

> 
> 
> > +;; Return true if OP is a text segment reference.
> > +;; This is needed for program memory address expressions.  Borrowed from
> > AVR. +(define_predicate "text_segment_operand"
> > +  (match_code "code_label,label_ref,symbol_ref,plus,minus,const")
> > +{
> > +  switch (GET_CODE (op))
> > +    {
> > +    case CODE_LABEL:
> > +      return true;
> > +    case LABEL_REF :
> > +      return true;
> > +    case SYMBOL_REF :
> > +      return SYMBOL_REF_FUNCTION_P (op);
> > +    case PLUS :
> > +    case MINUS :
> > +      /* Assume canonical format of symbol + constant.
> > +	 Fall through.  */
> > +    case CONST :
> > +      return text_segment_operand (XEXP (op, 0), VOIDmode);
> > +    default :
> > +      return false;
> > +    }
> > +})
> 
> This probably comes from AVR, but: no spaces before ":".
> 
> Bit surprised that we can get a CODE_LABEL rather than a LABEL_REF here.
> Do you know if that triggers in practice, and if so where?
Indeed, the CODE_LABEL case is never reached. I'll leave gcc_unreachable here.

> An IMO neater and slightly more robust way of writing the body is:
>   poly_int64 offset:
>   rtx base = strip_offset (op, &offset);
>   switch (GET_CODE (base))
>   
>     {
>     
>     case LABEL_REF:
>       ...as above...
>     
>     case SYMBOL_REF:
>       ...as above...
>     
>     default:
>       return false;
>     
>     }
> 
> with "plus" and "minus" not in the match_code list (since they should
> always appear in consts if they really are text references).

The "plus" and "minus" are needed when handling code labels as values. Take 
for example the following construct:

   int x = &&lab1 - &&lab0;
lab0:
  ...
lab1:


My TARGET_ASM_INTEGER callback uses the text_segment_operand predicate. In the 
above case it is passed the following RTL expression:
(minus:SI (label_ref/v:SI 20)
    (label_ref/v:SI 27))

I need to detect text labels so that I annotate them with %pmem:
        .4byte	%pmem(.L4-(.L2))
Instead of the incorrect:
        .4byte   .L4-(.L2)


> > +;; Return true if OP is a load multiple operation.  It is known to be a
> > +;; PARALLEL and the first section will be tested.
> > +
> > +(define_special_predicate "load_multiple_operation"
> > +  (match_code "parallel")
> > +{
> > +  machine_mode elt_mode;
> > +  int count = XVECLEN (op, 0);
> > +  unsigned int dest_regno;
> > +  rtx src_addr;
> > +  int i, off;
> > +
> > +  /* Perform a quick check so we don't blow up below.  */
> > +  if (GET_CODE (XVECEXP (op, 0, 0)) != SET
> > +      || GET_CODE (SET_DEST (XVECEXP (op, 0, 0))) != REG
> > +      || GET_CODE (SET_SRC (XVECEXP (op, 0, 0))) != MEM)
> > +    return false;
> > +
> > +  dest_regno = REGNO (SET_DEST (XVECEXP (op, 0, 0)));
> > +  src_addr = XEXP (SET_SRC (XVECEXP (op, 0, 0)), 0);
> > +  elt_mode = GET_MODE (SET_DEST (XVECEXP (op, 0, 0)));
> > +
> > +  /* Check, is base, or base + displacement.  */
> > +
> > +  if (GET_CODE (src_addr) == REG)
> > +    off = 0;
> > +  else if (GET_CODE (src_addr) == PLUS
> > +	   && GET_CODE (XEXP (src_addr, 0)) == REG
> > +	   && GET_CODE (XEXP (src_addr, 1)) == CONST_INT)
> > +    {
> > +      off = INTVAL (XEXP (src_addr, 1));
> > +      src_addr = XEXP (src_addr, 0);
> > +    }
> > +  else
> > +    return false;
> > +
> > +  for (i = 1; i < count; i++)
> > +    {
> > +      rtx elt = XVECEXP (op, 0, i);
> > +
> > +      if (GET_CODE (elt) != SET
> > +	  || GET_CODE (SET_DEST (elt)) != REG
> > +	  || GET_MODE (SET_DEST (elt)) != elt_mode
> > +	  || REGNO (SET_DEST (elt)) != dest_regno + i
> > +	  || GET_CODE (SET_SRC (elt)) != MEM
> > +	  || GET_MODE (SET_SRC (elt)) != elt_mode
> > +	  || GET_CODE (XEXP (SET_SRC (elt), 0)) != PLUS
> > +	  || ! rtx_equal_p (XEXP (XEXP (SET_SRC (elt), 0), 0), src_addr)
> > +	  || GET_CODE (XEXP (XEXP (SET_SRC (elt), 0), 1)) != CONST_INT
> > +	  || INTVAL (XEXP (XEXP (SET_SRC (elt), 0), 1))
> > +	     != off + i * GET_MODE_SIZE (elt_mode))
> > +	return false;
> > +    }
> > +
> > +  return true;
> > +})
> 
> This may not matter in practice, but the downside of testing PLUS and
> CONST_INT explicitly is that the predicate won't match something like:
> 
>   (set (reg R1) (mem (plus (reg RB) (const_int -4))))
>   (set (reg R2) (mem (reg RB)))
>   (set (reg R3) (mem (plus (reg RB) (const_int 4))))
>   ...
> 
> Using strip_offset on the addresses would avoid this.
It does not matter for PRU since all offsets are unsigned integers. But I will 
switch to strip_offset to make the code cleaner.

> > +/* Callback for walk_gimple_seq that checks TP tree for TI ABI
> > compliance.  */ +static tree
> > +check_op_callback (tree *tp, int *walk_subtrees, void *data)
> > +{
> > +  struct walk_stmt_info *wi = (struct walk_stmt_info *) data;
> > +
> > +  if (RECORD_OR_UNION_TYPE_P (*tp) || TREE_CODE (*tp) == ENUMERAL_TYPE)
> > +    {
> > +      /* Forward declarations have NULL tree type.  Skip them.  */
> > +      if (TREE_TYPE (*tp) == NULL)
> > +	return NULL;
> > +    }
> > +
> > +  /* TODO - why C++ leaves INTEGER_TYPE forward declarations around?  */
> > +  if (TREE_TYPE (*tp) == NULL)
> > +    return NULL;
> > +
> > +  const tree type = TREE_TYPE (*tp);
> > +
> > +  /* Direct function calls are allowed, obviously.  */
> > +  if (TREE_CODE (*tp) == ADDR_EXPR && TREE_CODE (type) == POINTER_TYPE)
> > +    {
> > +      const tree ptype = TREE_TYPE (type);
> > +      if (TREE_CODE (ptype) == FUNCTION_TYPE)
> > +	return NULL;
> > +    }
> 
> This seems like a bit of a dangerous heuristic.  Couldn't it also cause
> us to skip things like:
> 
>    (void *) func
> 
> ?  (OK, that's dubious C, but we do support it.)
The cast yields a "data pointer", which is 32 bits in both ABI variants. 
Hence it is safe to skip "(void *) func".

The TI ABI's 16 bit function pointers become a problem when they change the 
layout of structures and function argument registers.


> > +/* Implement TARGET_MODES_TIEABLE_P.  */
> > +
> > +static bool
> > +pru_modes_tieable_p (machine_mode mode1, machine_mode mode2)
> > +{
> > +  return (mode1 == mode2
> > +	  || (GET_MODE_SIZE (mode1) <= 4 && GET_MODE_SIZE (mode2) <= 4));
> > +}
> 
> I think this should have a comment explaining why this implementation
> is needed.
Sorry, this is stale. I will remove the callback altogether and use the 
default "true" implementation. PRU registers are fairly orthogonal.

A few years ago I got confused and considered this a requirement for correct 
64 bit operations.

> 
> > +	    {
> > +	      *total = 0;
> > +	    }
> > +	  else
> > +	    {
> > +	      /* SI move has the same cost as a QI move.  */
> > +	      int factor = GET_MODE_SIZE (mode) / GET_MODE_SIZE (SImode);
> > +	      if (factor == 0) 
> > +		factor = 1;
> > +	      *total = factor * COSTS_N_INSNS (1);
> > +	    }
> 
> Could you explain this a bit more?  It looks like a pure QImode operation
> gets a cost of 1 insn but an SImode operation zero-extended from QImode
> gets a cost of 0.
I unintentionally bumped the QI cost while trying to make zero-extends cheap. 
I will fix it by using the following simple logic:

    case SET:
	  mode = GET_MODE (SET_DEST (x));
	  /* SI move has the same cost as a QI move.  Moves larger than
	     64 bits are costly.  */
	  int factor = CEIL (GET_MODE_SIZE (mode), GET_MODE_SIZE (SImode));
	  *total = factor * COSTS_N_INSNS (1);


> 
> [...]
> 
> > +/* Implement TARGET_PREFERRED_RELOAD_CLASS.  */
> > +static reg_class_t
> > +pru_preferred_reload_class (rtx x ATTRIBUTE_UNUSED, reg_class_t regclass)
> > +{
> > +  return regclass == NO_REGS ? GENERAL_REGS : regclass;
> > +}
> 
> This looks odd: PREFERRED_RELOAD_CLASS should return a subset of the
> original class rather than add new registers.  Can you remember why
> it was needed?
I'm not sure what target code is supposed to return when NO_REGS is passed 
here. I saw how other ports handle NO_REGS (c-sky, m32c, nios2, rl78). So I 
followed suit.

Here is a backtrace of LRA when NO_REGS is passed:
0xf629ae pru_preferred_reload_class
        ../../gcc/gcc/config/pru/pru.c:788
0xa3d6e8 process_alt_operands
        ../../gcc/gcc/lra-constraints.c:2600
0xa3ef7d curr_insn_transform
        ../../gcc/gcc/lra-constraints.c:3889
0xa4301e lra_constraints(bool)
        ../../gcc/gcc/lra-constraints.c:4906
0xa2726c lra(_IO_FILE*)
        ../../gcc/gcc/lra.c:2446
0x9c97a9 do_reload
        ../../gcc/gcc/ira.c:5469
0x9c97a9 execute
        ../../gcc/gcc/ira.c:5653

> 
> > +  /* Floating-point comparisons.  */
> > +  eqsf_libfunc = init_one_libfunc ("__pruabi_eqf");
> > +  nesf_libfunc = init_one_libfunc ("__pruabi_neqf");
> > +  lesf_libfunc = init_one_libfunc ("__pruabi_lef");
> > +  ltsf_libfunc = init_one_libfunc ("__pruabi_ltf");
> > +  gesf_libfunc = init_one_libfunc ("__pruabi_gef");
> > +  gtsf_libfunc = init_one_libfunc ("__pruabi_gtf");
> > +  eqdf_libfunc = init_one_libfunc ("__pruabi_eqd");
> > +  nedf_libfunc = init_one_libfunc ("__pruabi_neqd");
> > +  ledf_libfunc = init_one_libfunc ("__pruabi_led");
> > +  ltdf_libfunc = init_one_libfunc ("__pruabi_ltd");
> > +  gedf_libfunc = init_one_libfunc ("__pruabi_ged");
> > +  gtdf_libfunc = init_one_libfunc ("__pruabi_gtd");
> > +
> > +  set_optab_libfunc (eq_optab, SFmode, NULL);
> > +  set_optab_libfunc (ne_optab, SFmode, "__pruabi_neqf");
> > +  set_optab_libfunc (gt_optab, SFmode, NULL);
> > +  set_optab_libfunc (ge_optab, SFmode, NULL);
> > +  set_optab_libfunc (lt_optab, SFmode, NULL);
> > +  set_optab_libfunc (le_optab, SFmode, NULL);
> > +  set_optab_libfunc (unord_optab, SFmode, "__pruabi_unordf");
> > +  set_optab_libfunc (eq_optab, DFmode, NULL);
> > +  set_optab_libfunc (ne_optab, DFmode, "__pruabi_neqd");
> > +  set_optab_libfunc (gt_optab, DFmode, NULL);
> > +  set_optab_libfunc (ge_optab, DFmode, NULL);
> > +  set_optab_libfunc (lt_optab, DFmode, NULL);
> > +  set_optab_libfunc (le_optab, DFmode, NULL);
> 
> So you need to pass NULL for the ordered comparisons because the ABI
> routines return a different value from the one GCC expects?  Might be
> worth a comment if so.
Yes, I will comment.

> 
> > +/* Emit comparison instruction if necessary, returning the expression
> > +   that holds the compare result in the proper mode.  Return the
> > comparison +   that should be used in the jump insn.  */
> > +
> > +rtx
> > +pru_expand_fp_compare (rtx comparison, machine_mode mode)
> > +{
> > +  enum rtx_code code = GET_CODE (comparison);
> > +  rtx op0 = XEXP (comparison, 0);
> > +  rtx op1 = XEXP (comparison, 1);
> > +  rtx cmp;
> > +  enum rtx_code jump_code = code;
> > +  machine_mode op_mode = GET_MODE (op0);
> > +  rtx_insn *insns;
> > +  rtx libfunc;
> > +
> > +  gcc_assert (op_mode == DFmode || op_mode == SFmode);
> > +
> > +  if (code == UNGE)
> > +    {
> > +      code = LT;
> > +      jump_code = EQ;
> > +    }
> > +  else if (code == UNLE)
> > +    {
> > +      code = GT;
> > +      jump_code = EQ;
> > +    }
> 
> This is only safe if the LT and GT libcalls don't raise exceptions for
> unordered inputs.  I'm guessing they don't, but it's probably worth a
> comment if so.
Yes, I will comment.

> 
> > +/* Return true if register REGNO is a valid base register.
> > +   STRICT_P is true if REG_OK_STRICT is in effect.  */
> > +
> > +bool
> > +pru_regno_ok_for_base_p (int regno, bool strict_p)
> > +{
> > +  if (!HARD_REGISTER_NUM_P (regno))
> > +    {
> > +      if (!strict_p)
> > +	return true;
> > +
> > +      if (!reg_renumber)
> > +	return false;
> > +
> > +      regno = reg_renumber[regno];
> 
> reg_renumber is a reload-only thing and shouldn't be needed/used on
> LRA targets.
I will remove the snippet. It's a remnant from the PRU's reload days.

> 
> > +  if (REG_P (base)
> > +      && pru_regno_ok_for_base_p (REGNO (base), strict_p)
> > +      && (offset == NULL_RTX
> > +	  || (CONST_INT_P (offset)
> > +	      && pru_valid_const_ubyte_offset (mode, INTVAL (offset)))
> > +	  || (REG_P (offset)
> > +	      && pru_regno_ok_for_index_p (REGNO (offset), strict_p))))
> > +    {
> > +      /*     base register + register offset
> > +       * OR  base register + UBYTE constant offset.  */
> > +      return true;
> > +    }
> > +  else if (REG_P (base)
> > +	   && pru_regno_ok_for_index_p (REGNO (base), strict_p)
> > +	   && (offset != NULL_RTX && ctable_base_operand (offset, VOIDmode)))
> > +    {
> > +      /*     base CTABLE constant base + register offset
> > +       * Note: GCC always puts the register as a first operand of PLUS. 
> > */ +      return true;
> > +    }
> > +  else if (CONST_INT_P (base)
> > +	   && offset == NULL_RTX
> > +	   && (ctable_addr_operand (base, VOIDmode)))
> > +    {
> > +      /*     base CTABLE constant base + UBYTE constant offset.  */
> > +      return true;
> 
> This address should simply be a (const_int ...), with no containing (plus
> ...).
Yes, my bad. Will remove this snippet.

> > +    case SUBREG:
> > +    case MEM:
> > +      if (letter == 0)
> > +	{
> > +	  output_address (VOIDmode, op);
> > +	  return;
> > +	}
> 
> Does the SUBREG case ever hit?  They shouldn't appear as operands
> at this late stage.
No, it does not hit. I will put gcc_unreachable().


> > +/* Helper function to get the starting storage HW register for an
> > argument, +   or -1 if it must be passed on stack.  The cum_v state is
> > not changed.  */ +static int
> > +pru_function_arg_regi (cumulative_args_t cum_v,
> > +		       machine_mode mode, const_tree type,
> > +		       bool named)
> > +{
> > +  CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v);
> > +  size_t argsize = pru_function_arg_size (mode, type);
> > +  size_t i, bi;
> > +  int regi = -1;
> > +
> > +  if (!pru_arg_in_reg_bysize (argsize))
> > +    return -1;
> > +
> > +  if (!named)
> > +    return -1;
> > +
> > +  /* Find the first available slot that fits.  Yes, that's the PRU ABI. 
> > */ +  for (i = 0; regi < 0 && i < ARRAY_SIZE (cum->regs_used); i++)
> > +    {
> > +      if (mode == BLKmode)
> > +	{
> > +	  /* Structs are passed, beginning at a full register.  */
> > +	  if ((i % 4) != 0)
> > +	    continue;
> > +	}
> 
> The comment doesn't seem to match the code: a struct like:
> 
>   struct s { short x; };
> 
> will have HImode rather than BLKmode.
I will update the comment to clarify it is only for large structs. Code should 
be ok.

> 
> It's usually better to base ABI stuff on the type where possible,
> since that corresponds directly to the source language, while modes
> are more of an internal GCC thing.
I will explore this option.  But the ABI is already defined in terms of data
sizes, which I think fit neatly with GCC's modes.

> 
> > +(define_insn "subdi3"
> > +  [(set (match_operand:DI 0 "register_operand"		    "=&r,&r,&r")
> > +	(minus:DI (match_operand:DI 1 "register_operand"    "r,r,I")
> > +		 (match_operand:DI 2 "reg_or_ubyte_operand" "r,I,r")))]
> > +  ""
> > +  "@
> > +   sub\\t%F0, %F1, %F2\;suc\\t%N0, %N1, %N2
> > +   sub\\t%F0, %F1, %2\;suc\\t%N0, %N1, 0
> > +   rsb\\t%F0, %F2, %1\;rsc\\t%N0, %N2, 0"
> > +  [(set_attr "type" "alu,alu,alu")
> > +   (set_attr "length" "8,8,8")])
> 
> As above, this shouldn't handle operand 2 being constant; that should
> always go through adddi3 instead.
I will remove alternative 2, but will leave 3.

> 
> > +; LRA cannot cope with clobbered op2, hence the scratch register.
> > +(define_insn "ashr<mode>3"
> > +  [(set (match_operand:QISI 0 "register_operand"	    "=&r,r")
> > +	  (ashiftrt:QISI
> > +	    (match_operand:QISI 1 "register_operand"	    "0,r")
> > +	    (match_operand:QISI 2 "reg_or_const_1_operand"  "r,P")))
> > +   (clobber (match_scratch:QISI 3			    "=r,X"))]
> > +  ""
> > +  "@
> > +   mov %3, %2\;ASHRLP%=:\;qbeq ASHREND%=, %3, 0\; sub %3, %3, 1\;
> > lsr\\t%0, %0, 1\; qbbc ASHRLP%=, %0, (%S0 * 8) - 2\; set %0, %0, (%S0 *
> > 8) - 1\; jmp ASHRLP%=\;ASHREND%=: +   lsr\\t%0, %1, 1\;qbbc LSIGN%=, %0,
> > (%S0 * 8) - 2\;set %0, %0, (%S0 * 8) - 1\;LSIGN%=:" +  [(set_attr "type"
> > "complex,alu")
> > +   (set_attr "length" "28,4")])
> 
> What do you mean by LRA not coping?  What did you try originally and
> what went wrong?
Better assembly could be generated if the shift count register was used for a 
loop counter. Pseudo code:

   while (op2--)
     op0 >>= 1;
 
I originally tried to clobber operand 2:
 (define_insn "ashr<mode>3"
   [(set (match_operand:QISI 0 "register_operand" "=&r,r")
         (ashiftrt:QISI
           (match_operand:QISI 1 "register_operand" "0,r")
           (match_operand:QISI 2 "reg_or_const_1_operand" "+r,P"))
          )]

But with the above pattern operand 2 was not clobbered. Its value was deemed 
untouched (i.e. live across the pattern).  So I came up with clobbering a 
separate register to fix this, at the expense of slightly bigger generated 
code.
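The looped alternative can be modelled in C roughly as follows (a sketch for the SImode case; the function name is made up, and the scratch copy of the count plays the role of operand 3 in the pattern):

```c
#include <stdint.h>

/* C model of the looped ashr<mode>3 alternative: copy the shift count
   into a scratch (mov %3, %2), then shift right one bit per iteration
   (lsr), re-setting the sign bit whenever the bit below it is set after
   the shift (qbbc/set), as the asm template does for (%S0 * 8)-bit
   modes.  */
uint32_t
ashr_loop (uint32_t op0, uint32_t count)
{
  uint32_t scratch = count;           /* mov %3, %2 */
  while (scratch != 0)                /* qbeq ASHREND%=, %3, 0 */
    {
      scratch--;                      /* sub %3, %3, 1 */
      op0 >>= 1;                      /* lsr %0, %0, 1 */
      if ((op0 >> 30) & 1)            /* qbbc ASHRLP%=, %0, 30 */
        op0 |= UINT32_C (1) << 31;    /* set %0, %0, 31 */
    }
  return op0;
}
```

This matches C's arithmetic right shift of a signed 32-bit value while only ever decrementing the scratch, not operand 2 itself.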

> 
> > +(define_insn "<code>di3"
> > +  [(set (match_operand:DI 0 "register_operand"		"=&r,&r")
> > +	  (LOGICAL_BITOP:DI
> > +	    (match_operand:DI 1 "register_operand"	"%r,r")
> > +	    (match_operand:DI 2 "reg_or_ubyte_operand"	"r,I")))]
> > +  ""
> > +  "@
> > +   <logical_bitop_asm>\\t%F0, %F1, %F2\;<logical_bitop_asm>\\t%N0, %N1,
> > %N2 +   <logical_bitop_asm>\\t%F0, %F1, %2\;<logical_bitop_asm>\\t%N0,
> > %N1, 0" +  [(set_attr "type" "alu,alu")
> > +   (set_attr "length" "8,8")])
> > +
> > +
> > +(define_insn "one_cmpldi2"
> > +  [(set (match_operand:DI 0 "register_operand"		"=r")
> > +	(not:DI (match_operand:DI 1 "register_operand"	"r")))]
> > +  ""
> > +{
> > +  /* careful with overlapping source and destination regs.  */
> > +  gcc_assert (GP_REG_P (REGNO (operands[0])));
> > +  gcc_assert (GP_REG_P (REGNO (operands[1])));
> > +  if (REGNO (operands[0]) == (REGNO (operands[1]) + 4))
> > +    return "not\\t%N0, %N1\;not\\t%F0, %F1";
> > +  else
> > +    return "not\\t%F0, %F1\;not\\t%N0, %N1";
> > +}
> > +  [(set_attr "type" "alu")
> > +   (set_attr "length" "8")])
> 
> Do you see any code improvements from defining these patterns?
> These days I'd expect you'd get better code by letting the
> target-independent code split them up into SImode operations.
Yes, these patterns improve code size and speed.

Looking at expand_binop(), DI logical ops are split into WORD mode, which in 
PRU's peculiar case is QI. So without those patterns, a 64 bit IOR generates 8 
QI instructions:

        or      r0.b0, r14.b0, r16.b0
        or      r0.b1, r14.b1, r16.b1
        or      r0.b2, r14.b2, r16.b2
        or      r0.b3, r14.b3, r16.b3
        or      r1.b0, r15.b0, r17.b0
        or      r1.b1, r15.b1, r17.b1
        or      r1.b2, r15.b2, r17.b2
        or      r1.b3, r15.b3, r17.b3

Whereas with patterns defined, it is only 2 SImode instructions:
        or      r0, r14, r16
        or      r1, r15, r17
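For reference, the sort of source that exercises this path is simply (an assumed test case, not taken from the patch):

```c
#include <stdint.h>

/* A 64-bit IOR of the kind that hits the <code>di3 patterns: without
   them, expand_binop splits it into word-mode (QImode on PRU)
   operations.  */
uint64_t
ior64 (uint64_t a, uint64_t b)
{
  return a | b;
}
```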

> 
> > +;; Multiply instruction.  Idea for fixing registers comes from the AVR
> > backend. +
> > +(define_expand "mulsi3"
> > +  [(set (match_operand:SI 0 "register_operand" "")
> > +	(mult:SI (match_operand:SI 1 "register_operand" "")
> > +		 (match_operand:SI 2 "register_operand" "")))]
> > +  ""
> > +{
> > +  emit_insn (gen_mulsi3_fixinp (operands[0], operands[1], operands[2]));
> > +  DONE;
> > +})
> > +
> > +
> > +(define_expand "mulsi3_fixinp"
> > +  [(set (reg:SI 112) (match_operand:SI 1 "register_operand" ""))
> > +   (set (reg:SI 116) (match_operand:SI 2 "register_operand" ""))
> > +   (set (reg:SI 104) (mult:SI (reg:SI 112) (reg:SI 116)))
> > +   (set (match_operand:SI 0 "register_operand" "") (reg:SI 104))]
> > +  ""
> > +{
> > +})
> 
> This seems slightly dangerous since there's nothing to guarantee that
> those registers aren't already live at the point of expansion.
> 
> The more usual way (and IMO correct way) would be for:
> > +(define_insn "*mulsi3_prumac"
> > +  [(set (reg:SI 104) (mult:SI (reg:SI 112) (reg:SI 116)))]
> > +  ""
> > +  "nop\;xin\\t0, r26, 4"
> > +  [(set_attr "type" "alu")
> > +   (set_attr "length" "8")])
> 
> to have register classes and constraints for these three registers,
> like e.g. the x86 port does for %cx etc.
> 
> It would be good if you could try that.  On the other hand, if AVR
> already does this and it worked in practice then I guess it's not
> something that should hold up the port.
This suggestion worked. It was a matter of correct predicates.

Here is what I will put in the next patch version:

(define_predicate "pru_muldst_operand"
  (and (match_code "reg")
       (ior (match_test "REGNO_REG_CLASS (REGNO (op)) == MULDST_REGS")
	    (match_test "REGNO (op) >= FIRST_PSEUDO_REGISTER"))))
...
(define_register_constraint "Rmd0" "MULDST_REGS"
  "@internal
  The multiply destination register.")
...

(define_insn "mulsi3"
  [(set (match_operand:SI 0 "pru_muldst_operand"	   "=Rmd0")
	(mult:SI (match_operand:SI 1 "pru_mulsrc0_operand" "Rms0")
		 (match_operand:SI 2 "pru_mulsrc1_operand" "Rms1")))]

> 
> > @@ -23444,6 +23449,56 @@ the offset with a symbol reference to a canary in
> > the TLS block.> 
> >  @end table
> > 
> > +@node PRU Options
> > +@subsection PRU Options
> > +@cindex PRU Options
> > +
> > +These command-line options are defined for PRU target:
> > +
> > +@table @gcctabopt
> > +@item -minrt
> > +@opindex minrt
> > +Enable the use of a minimum runtime environment---no static
> > +initializers or constructors.  Results in significant code size
> > +reduction of final ELF binaries.
> 
> Up to you, but this read to me as "-m"+"inrt".  If this option is already
> in use then obviously keep it, but if it's new then it might be worth
> calling it -mminrt instead, for consistency with -mmcu.
I assumed this is a standard naming since MSP430 already uses it. I've seen 
the option being utilized by pru-gcc users, so let's keep it that way.

> > +@item T
> > +A text segment (program memory) constant label.
> > +
> > +@item Z
> > +Integer constant zero.
> 
> Just checking, but did you deliberately leave out N?  That's fine if so,
> but the docstring in constraints.md didn't have an "@internal" marker to
> indicate it was internal-only.
I did not know about "@internal".  I will annotate appropriately.

> 
> Despite the long reply (sorry), this looks in really good shape to me FWIW.
> Thanks for the submission.
I take this sentence as an indication that the PRU port is welcome for 
inclusion into GCC. I'll work on fixing the comments and will send a new patch 
version.

Thank you for the in-depth review and valuable suggestions. 

Regards,
Dimitar
Richard Sandiford Sept. 24, 2018, 10:38 a.m. UTC | #3
Dimitar Dimitrov <dimitar@dinux.eu> writes:
> On Thursday, 9/13/2018 13:02:21 EEST Richard Sandiford wrote:
>> Dimitar Dimitrov <dimitar@dinux.eu> writes:
>> > +(define_insn
>> > "sub_impl<EQD:mode><EQS0:mode><EQS1:mode>_<alu3_zext><alu3_zext_op1><alu3
>> > _zext_op2>" +  [(set (match_operand:EQD 0 "register_operand" "=r,r,r")
>> > +	(minus:EQD
>> > +	 (zero_extend:EQD
>> > +	  (match_operand:EQS0 1 "reg_or_ubyte_operand" "r,r,I"))
>> > +	 (zero_extend:EQD
>> > +	  (match_operand:EQS1 2 "reg_or_ubyte_operand" "r,I,r"))))]
>> > +  ""
>> > +  "@
>> > +   sub\\t%0, %1, %2
>> > +   sub\\t%0, %1, %2
>> > +   rsb\\t%0, %2, %1"
>> > +  [(set_attr "type" "alu")
>> > +   (set_attr "length" "4")])
>> 
>> By convention, subtraction patterns shouldn't accept constants for
>> operand 2.  Constants should instead be subtracted using an addition
>> of the negative.
> Understood. I will remove second alternative. But I will leave the third one 
> since it enables an important optimization:
>
>    unsigned test(unsigned a)
>    {
>         return 10-a;
>    }
>
> RTL:
> (insn 6 3 7 2 (set (reg:SI 152)
>         (minus:SI (const_int 10 [0xa])
>             (reg/v:SI 151 [ a ]))) "test.c":430 -1
>      (nil))
>
> Assembly:
>     test:
>         rsb     r14, r14, 10
>         ret

Thanks.  And yeah, the final alternative is fine.

>> > +;; Return true if OP is a text segment reference.
>> > +;; This is needed for program memory address expressions.  Borrowed from
>> > AVR. +(define_predicate "text_segment_operand"
>> > +  (match_code "code_label,label_ref,symbol_ref,plus,minus,const")
>> > +{
>> > +  switch (GET_CODE (op))
>> > +    {
>> > +    case CODE_LABEL:
>> > +      return true;
>> > +    case LABEL_REF :
>> > +      return true;
>> > +    case SYMBOL_REF :
>> > +      return SYMBOL_REF_FUNCTION_P (op);
>> > +    case PLUS :
>> > +    case MINUS :
>> > +      /* Assume canonical format of symbol + constant.
>> > +	 Fall through.  */
>> > +    case CONST :
>> > +      return text_segment_operand (XEXP (op, 0), VOIDmode);
>> > +    default :
>> > +      return false;
>> > +    }
>> > +})
>> 
>> This probably comes from AVR, but: no spaces before ":".
>> 
>> Bit surprised that we can get a CODE_LABEL rather than a LABEL_REF here.
>> Do you know if that triggers in practice, and if so where?
> Indeed, CODE_LABEL case is never reached. I'll leave gcc_unreachable here.
>
>> An IMO neater and slightly more robust way of writing the body is:
>>   poly_int64 offset:
>>   rtx base = strip_offset (op, &offset);
>>   switch (GET_CODE (base))
>>   
>>     {
>>     
>>     case LABEL_REF:
>>       ...as above...
>>     
>>     case SYMBOL_REF:
>>       ...as above...
>>     
>>     default:
>>       return false;
>>     
>>     }
>> 
>> with "plus" and "minus" not in the match_code list (since they should
>> always appear in consts if they really are text references).
>
> The "plus" and "minus" are needed when handling code labels as values. Take 
> for example the following construct:
>
>    int x = &&lab1 - &&lab0;
> lab1:
>   ...
> lab2:
>
>
> My TARGET_ASM_INTEGER callback uses the text_segment_operand predicate. In the 
> above case it is passed the following RTL expression:
> (minus:SI (label_ref/v:SI 20)
>     (label_ref/v:SI 27))
>
> I need to detect text labels so that I annotate them with %pmem:
>         .4byte	%pmem(.L4-(.L2))
> Instead of the incorrect:
>         .4byte   .L3-(.L2)

OK, thanks for the explanation.  I think the target-independent code should
really be passing (const (minus ...)) rather than a plain (minus ...) here,
but that's going to be difficult to change at this stage.  And like you say,
the split_offset suggestion wouldn't have handled this anyway.

So yeah, please keep what you have now.

>> > +/* Callback for walk_gimple_seq that checks TP tree for TI ABI
>> > compliance.  */ +static tree
>> > +check_op_callback (tree *tp, int *walk_subtrees, void *data)
>> > +{
>> > +  struct walk_stmt_info *wi = (struct walk_stmt_info *) data;
>> > +
>> > +  if (RECORD_OR_UNION_TYPE_P (*tp) || TREE_CODE (*tp) == ENUMERAL_TYPE)
>> > +    {
>> > +      /* Forward declarations have NULL tree type.  Skip them.  */
>> > +      if (TREE_TYPE (*tp) == NULL)
>> > +	return NULL;
>> > +    }
>> > +
>> > +  /* TODO - why C++ leaves INTEGER_TYPE forward declarations around?  */
>> > +  if (TREE_TYPE (*tp) == NULL)
>> > +    return NULL;
>> > +
>> > +  const tree type = TREE_TYPE (*tp);
>> > +
>> > +  /* Direct function calls are allowed, obviously.  */
>> > +  if (TREE_CODE (*tp) == ADDR_EXPR && TREE_CODE (type) == POINTER_TYPE)
>> > +    {
>> > +      const tree ptype = TREE_TYPE (type);
>> > +      if (TREE_CODE (ptype) == FUNCTION_TYPE)
>> > +	return NULL;
>> > +    }
>> 
>> This seems like a bit of a dangerous heuristic.  Couldn't it also cause
>> us to skip things like:
>> 
>>    (void *) func
>> 
>> ?  (OK, that's dubious C, but we do support it.)
> The cast yields a "data pointer", which is 32 bits for both types of ABI. 
> Hence it is safe to skip "(void *) func".
>
> The TI ABI's 16 bit function pointers become a problem when they change the 
> layout of structures and function argument registers.

OK.  The reason this stood out is that the code doesn't obviously match
the comment.  If the code is just trying to skip direct function calls,
I think the gcall sequence I mentioned would be more obvious, and would
match the existing comment.  If anything that takes the address of a
function is OK then it might be worth changing the comment to include that.

>> > +	    {
>> > +	      *total = 0;
>> > +	    }
>> > +	  else
>> > +	    {
>> > +	      /* SI move has the same cost as a QI move.  */
>> > +	      int factor = GET_MODE_SIZE (mode) / GET_MODE_SIZE (SImode);
>> > +	      if (factor == 0) 
>> > +		factor = 1;
>> > +	      *total = factor * COSTS_N_INSNS (1);
>> > +	    }
>> 
>> Could you explain this a bit more?  It looks like a pure QImode operation
>> gets a cost of 1 insn but an SImode operation zero-extended from QImode
>> gets a cost of 0.
> I unintentionally bumped the QI cost while trying to make zero-extends cheap.
> I will fix it by using the following simple logic:
>
>     case SET:
> 	  mode = GET_MODE (SET_DEST (x));
> 	  /* SI move has the same cost as a QI move.  Moves larger than
> 	     64 bits are costly.  */
> 	  int factor = CEIL (GET_MODE_SIZE (mode), GET_MODE_SIZE (SImode));
> 	  *total = factor * COSTS_N_INSNS (1);

Looks good.

>> > +/* Implement TARGET_PREFERRED_RELOAD_CLASS.  */
>> > +static reg_class_t
>> > +pru_preferred_reload_class (rtx x ATTRIBUTE_UNUSED, reg_class_t regclass)
>> > +{
>> > +  return regclass == NO_REGS ? GENERAL_REGS : regclass;
>> > +}
>> 
>> This looks odd: PREFERRED_RELOAD_CLASS should return a subset of the
>> original class rather than add new registers.  Can you remember why
>> it was needed?
> I'm not sure what target code is supposed to return when NO_REGS is passed 
> here. I saw how other ports handle NO_REGS (c-sky, m32c, nios2, rl78). So I 
> followed suit.
>
> Here is a backtrace of LRA when NO_REGS is passed:
> 0xf629ae pru_preferred_reload_class
>         ../../gcc/gcc/config/pru/pru.c:788
> 0xa3d6e8 process_alt_operands
>         ../../gcc/gcc/lra-constraints.c:2600
> 0xa3ef7d curr_insn_transform
>         ../../gcc/gcc/lra-constraints.c:3889
> 0xa4301e lra_constraints(bool)
>         ../../gcc/gcc/lra-constraints.c:4906
> 0xa2726c lra(_IO_FILE*)
>         ../../gcc/gcc/lra.c:2446
> 0x9c97a9 do_reload
>         ../../gcc/gcc/ira.c:5469
> 0x9c97a9 execute
>         ../../gcc/gcc/ira.c:5653

I think it should just pass NO_REGS through unmodified (which means
spill to memory).  In some ways it would be good if it didn't have to
handle this case, but again that's going to be work to fix.

The RA will pass ALL_REGS if it can handle any register, and wants to
know what the target would prefer.  But for any input, the hook needs
to stick within the registers it was given.

>> > +/* Helper function to get the starting storage HW register for an
>> > argument, +   or -1 if it must be passed on stack.  The cum_v state is
>> > not changed.  */ +static int
>> > +pru_function_arg_regi (cumulative_args_t cum_v,
>> > +		       machine_mode mode, const_tree type,
>> > +		       bool named)
>> > +{
>> > +  CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v);
>> > +  size_t argsize = pru_function_arg_size (mode, type);
>> > +  size_t i, bi;
>> > +  int regi = -1;
>> > +
>> > +  if (!pru_arg_in_reg_bysize (argsize))
>> > +    return -1;
>> > +
>> > +  if (!named)
>> > +    return -1;
>> > +
>> > +  /* Find the first available slot that fits.  Yes, that's the PRU ABI. 
>> > */ +  for (i = 0; regi < 0 && i < ARRAY_SIZE (cum->regs_used); i++)
>> > +    {
>> > +      if (mode == BLKmode)
>> > +	{
>> > +	  /* Structs are passed, beginning at a full register.  */
>> > +	  if ((i % 4) != 0)
>> > +	    continue;
>> > +	}
>> 
>> The comment doesn't seem to match the code: a struct like:
>> 
>>   struct s { short x; };
>> 
>> will have HImode rather than BLKmode.
> I will update the comment to clarify it is only for large structs. Code should 
> be ok.
>
>> It's usually better to base ABI stuff on the type where possible,
>> since that corresponds directly to the source language, while modes
>> are more of an internal GCC thing.
> I will explore this option.  But the ABI is already defined in terms of data
> sizes, which I think fit neatly with GCC's modes.

Thanks.  And this certainly shouldn't hold up acceptance, it was just a
suggestion.

>> > +; LRA cannot cope with clobbered op2, hence the scratch register.
>> > +(define_insn "ashr<mode>3"
>> > +  [(set (match_operand:QISI 0 "register_operand"	    "=&r,r")
>> > +	  (ashiftrt:QISI
>> > +	    (match_operand:QISI 1 "register_operand"	    "0,r")
>> > +	    (match_operand:QISI 2 "reg_or_const_1_operand"  "r,P")))
>> > +   (clobber (match_scratch:QISI 3			    "=r,X"))]
>> > +  ""
>> > +  "@
>> > +   mov %3, %2\;ASHRLP%=:\;qbeq ASHREND%=, %3, 0\; sub %3, %3, 1\;
>> > lsr\\t%0, %0, 1\; qbbc ASHRLP%=, %0, (%S0 * 8) - 2\; set %0, %0, (%S0 *
>> > 8) - 1\; jmp ASHRLP%=\;ASHREND%=: +   lsr\\t%0, %1, 1\;qbbc LSIGN%=, %0,
>> > (%S0 * 8) - 2\;set %0, %0, (%S0 * 8) - 1\;LSIGN%=:" +  [(set_attr "type"
>> > "complex,alu")
>> > +   (set_attr "length" "28,4")])
>> 
>> What do you mean by LRA not coping?  What did you try originally and
>> what went wrong?
> Better assembly could be generated if the shift count register was used for a 
> loop counter. Pseudo code:
>
>    while (op2--)
>      op0 >>= 1;
>  
> I originally tried to clobber operand 2:
>  (define_insn "ashr<mode>3"
>    [(set (match_operand:QISI 0 "register_operand" "=&r,r")
>          (ashiftrt:QISI
>            (match_operand:QISI 1 "register_operand" "0,r")
>            (match_operand:QISI 2 "reg_or_const_1_operand" "+r,P"))
>           )]
>
> But with the above pattern operand 2 was not clobbered. Its value was deemed 
> untouched (i.e. live across the pattern).  So I came up with clobbering a 
> separate register to fix this, at the expense of slightly bigger generated 
> code.

Ah, yeah, "+" has to be on an output operand (the destination of a set
or clobber).

In this kind of situation you might be better off expanding the loop
as individual RTL instructions, which is what several targets do for
things like memmove.  But if you'd prefer to keep a single pattern,
you could use:

  [(set (match_operand:QISI 0 "register_operand" "=r")
	(ashiftrt:QISI
	  (match_operand:QISI 2 "register_operand" "0")
	  (match_operand:QISI 3 "register_operand" "1")))
   (set (match_operand:QISI 1 "register_operand" "=r")
	(const_int -1))]

(assuming the optimised loop does leave operand 2 containing all-1s).

The shift by 1 would then need to be a separate pattern, with the
define_expand choosing between them.

>> > +(define_insn "<code>di3"
>> > +  [(set (match_operand:DI 0 "register_operand"		"=&r,&r")
>> > +	  (LOGICAL_BITOP:DI
>> > +	    (match_operand:DI 1 "register_operand"	"%r,r")
>> > +	    (match_operand:DI 2 "reg_or_ubyte_operand"	"r,I")))]
>> > +  ""
>> > +  "@
>> > +   <logical_bitop_asm>\\t%F0, %F1, %F2\;<logical_bitop_asm>\\t%N0, %N1,
>> > %N2 +   <logical_bitop_asm>\\t%F0, %F1, %2\;<logical_bitop_asm>\\t%N0,
>> > %N1, 0" +  [(set_attr "type" "alu,alu")
>> > +   (set_attr "length" "8,8")])
>> > +
>> > +
>> > +(define_insn "one_cmpldi2"
>> > +  [(set (match_operand:DI 0 "register_operand"		"=r")
>> > +	(not:DI (match_operand:DI 1 "register_operand"	"r")))]
>> > +  ""
>> > +{
>> > +  /* careful with overlapping source and destination regs.  */
>> > +  gcc_assert (GP_REG_P (REGNO (operands[0])));
>> > +  gcc_assert (GP_REG_P (REGNO (operands[1])));
>> > +  if (REGNO (operands[0]) == (REGNO (operands[1]) + 4))
>> > +    return "not\\t%N0, %N1\;not\\t%F0, %F1";
>> > +  else
>> > +    return "not\\t%F0, %F1\;not\\t%N0, %N1";
>> > +}
>> > +  [(set_attr "type" "alu")
>> > +   (set_attr "length" "8")])
>> 
>> Do you see any code improvements from defining these patterns?
>> These days I'd expect you'd get better code by letting the
>> target-independent code split them up into SImode operations.
> Yes, these patterns improve code size and speed.
>
> Looking at expand_binop(), DI logical ops are split into WORD mode, which in 
> PRU's peculiar case is QI. So without those patterns, a 64 bit IOR generates 8 
> QI instructions:
>
>         or      r0.b0, r14.b0, r16.b0
>         or      r0.b1, r14.b1, r16.b1
>         or      r0.b2, r14.b2, r16.b2
>         or      r0.b3, r14.b3, r16.b3
>         or      r1.b0, r15.b0, r17.b0
>         or      r1.b1, r15.b1, r17.b1
>         or      r1.b2, r15.b2, r17.b2
>         or      r1.b3, r15.b3, r17.b3
>
> Whereas with patterns defined, it is only 2 SImode instructions:
>         or      r0, r14, r16
>         or      r1, r15, r17

Ah, yeah, I'd forgotten about words being QImode.  Ideally the doubleword
handling would be extended to any mode that is twice the width of a "cheap"
mode, but again that'd be a fair amount of work.

So yeah, please keep this as-is.

>> > +;; Multiply instruction.  Idea for fixing registers comes from the AVR
>> > backend. +
>> > +(define_expand "mulsi3"
>> > +  [(set (match_operand:SI 0 "register_operand" "")
>> > +	(mult:SI (match_operand:SI 1 "register_operand" "")
>> > +		 (match_operand:SI 2 "register_operand" "")))]
>> > +  ""
>> > +{
>> > +  emit_insn (gen_mulsi3_fixinp (operands[0], operands[1], operands[2]));
>> > +  DONE;
>> > +})
>> > +
>> > +
>> > +(define_expand "mulsi3_fixinp"
>> > +  [(set (reg:SI 112) (match_operand:SI 1 "register_operand" ""))
>> > +   (set (reg:SI 116) (match_operand:SI 2 "register_operand" ""))
>> > +   (set (reg:SI 104) (mult:SI (reg:SI 112) (reg:SI 116)))
>> > +   (set (match_operand:SI 0 "register_operand" "") (reg:SI 104))]
>> > +  ""
>> > +{
>> > +})
>> 
>> This seems slightly dangerous since there's nothing to guarantee that
>> those registers aren't already live at the point of expansion.
>> 
>> The more usual way (and IMO correct way) would be for:
>> > +(define_insn "*mulsi3_prumac"
>> > +  [(set (reg:SI 104) (mult:SI (reg:SI 112) (reg:SI 116)))]
>> > +  ""
>> > +  "nop\;xin\\t0, r26, 4"
>> > +  [(set_attr "type" "alu")
>> > +   (set_attr "length" "8")])
>> 
>> to have register classes and constraints for these three registers,
>> like e.g. the x86 port does for %cx etc.
>> 
>> It would be good if you could try that.  On the other hand, if AVR
>> already does this and it worked in practice then I guess it's not
>> something that should hold up the port.
> This suggestion worked. It was a matter of correct predicates.
>
> Here is what I will put in the next patch version:
>
> (define_predicate "pru_muldst_operand"
>   (and (match_code "reg")
>        (ior (match_test "REGNO_REG_CLASS (REGNO (op)) == MULDST_REGS")
> 	    (match_test "REGNO (op) >= FIRST_PSEUDO_REGISTER"))))
> ...
> (define_register_constraint "Rmd0" "MULDST_REGS"
>   "@internal
>   The multiply destination register.")
> ...
>
> (define_insn "mulsi3"
>   [(set (match_operand:SI 0 "pru_muldst_operand"	   "=Rmd0")
> 	(mult:SI (match_operand:SI 1 "pru_mulsrc0_operand" "Rms0")
> 		 (match_operand:SI 2 "pru_mulsrc1_operand" "Rms1")))]

Looks good.  Thanks for doing this.

You could also add a "%" constraint to operand 1 to allow the input
registers to be swapped, just on the off chance that that's useful.
But I suppose there's a risk that providing that alternative will
confuse the RA cost model a bit.  Just a suggestion.

>> > @@ -23444,6 +23449,56 @@ the offset with a symbol reference to a canary in
>> > the TLS block.> 
>> >  @end table
>> > 
>> > +@node PRU Options
>> > +@subsection PRU Options
>> > +@cindex PRU Options
>> > +
>> > +These command-line options are defined for PRU target:
>> > +
>> > +@table @gcctabopt
>> > +@item -minrt
>> > +@opindex minrt
>> > +Enable the use of a minimum runtime environment---no static
>> > +initializers or constructors.  Results in significant code size
>> > +reduction of final ELF binaries.
>> 
>> Up to you, but this read to me as "-m"+"inrt".  If this option is already
>> in use then obviously keep it, but if it's new then it might be worth
>> calling it -mminrt instead, for consistency with -mmcu.
> I assumed this is a standard naming since MSP430 already uses it. I've seen 
> the option being utilized by pru-gcc users, so let's keep it that way.

OK.

>> Despite the long reply (sorry), this looks in really good shape to me FWIW.
>> Thanks for the submission.
> I take this sentence as an indication that the PRU port is welcome for 
> inclusion into GCC. I'll work on fixing the comments and will send a new patch 
> version.

Well, process-wise, it's up to the steering committee to decide whether
to accept the port.  But one of the main requirements is of course
reviewing the patches.

Which of the other patches in the series still need review?  I think Jeff
approved some of them earlier.

Thanks,
Richard
Jeff Law Sept. 24, 2018, 5:40 p.m. UTC | #4
On 9/24/18 4:38 AM, Richard Sandiford wrote:
> Dimitar Dimitrov <dimitar@dinux.eu> writes:
>> On Thursday, 9/13/2018 13:02:21 EEST Richard Sandiford wrote:
>>> Dimitar Dimitrov <dimitar@dinux.eu> writes:
>>>> +(define_insn
>>>> "sub_impl<EQD:mode><EQS0:mode><EQS1:mode>_<alu3_zext><alu3_zext_op1><alu3
>>>> _zext_op2>" +  [(set (match_operand:EQD 0 "register_operand" "=r,r,r")
>>>> +	(minus:EQD
>>>> +	 (zero_extend:EQD
>>>> +	  (match_operand:EQS0 1 "reg_or_ubyte_operand" "r,r,I"))
>>>> +	 (zero_extend:EQD
>>>> +	  (match_operand:EQS1 2 "reg_or_ubyte_operand" "r,I,r"))))]
>>>> +  ""
>>>> +  "@
>>>> +   sub\\t%0, %1, %2
>>>> +   sub\\t%0, %1, %2
>>>> +   rsb\\t%0, %2, %1"
>>>> +  [(set_attr "type" "alu")
>>>> +   (set_attr "length" "4")])
>>>
>>> By convention, subtraction patterns shouldn't accept constants for
>>> operand 2.  Constants should instead be subtracted using an addition
>>> of the negative.
>> Understood. I will remove the second alternative. But I will leave the third one 
>> since it enables an important optimization:
>>
>>    unsigned test(unsigned a)
>>    {
>>         return 10-a;
>>    }
>>
>> RTL:
>> (insn 6 3 7 2 (set (reg:SI 152)
>>         (minus:SI (const_int 10 [0xa])
>>             (reg/v:SI 151 [ a ]))) "test.c":430 -1
>>      (nil))
>>
>> Assembly:
>>     test:
>>         rsb     r14, r14, 10
>>         ret
> 
> Thanks.  And yeah, the final alternative is fine.
> 
>>>> +;; Return true if OP is a text segment reference.
>>>> +;; This is needed for program memory address expressions.  Borrowed from
>>>> AVR. +(define_predicate "text_segment_operand"
>>>> +  (match_code "code_label,label_ref,symbol_ref,plus,minus,const")
>>>> +{
>>>> +  switch (GET_CODE (op))
>>>> +    {
>>>> +    case CODE_LABEL:
>>>> +      return true;
>>>> +    case LABEL_REF :
>>>> +      return true;
>>>> +    case SYMBOL_REF :
>>>> +      return SYMBOL_REF_FUNCTION_P (op);
>>>> +    case PLUS :
>>>> +    case MINUS :
>>>> +      /* Assume canonical format of symbol + constant.
>>>> +	 Fall through.  */
>>>> +    case CONST :
>>>> +      return text_segment_operand (XEXP (op, 0), VOIDmode);
>>>> +    default :
>>>> +      return false;
>>>> +    }
>>>> +})
>>>
>>> This probably comes from AVR, but: no spaces before ":".
>>>
>>> Bit surprised that we can get a CODE_LABEL rather than a LABEL_REF here.
>>> Do you know if that triggers in practice, and if so where?
>> Indeed, the CODE_LABEL case is never reached. I'll put gcc_unreachable here.
>>
>>> An IMO neater and slightly more robust way of writing the body is:
>>>   poly_int64 offset;
>>>   rtx base = strip_offset (op, &offset);
>>>   switch (GET_CODE (base))
>>>     {
>>>     case LABEL_REF:
>>>       ...as above...
>>>     case SYMBOL_REF:
>>>       ...as above...
>>>     default:
>>>       return false;
>>>     }
>>>
>>> with "plus" and "minus" not in the match_code list (since they should
>>> always appear in consts if they really are text references).
>>
>> The "plus" and "minus" are needed when handling code labels as values. Take 
>> for example the following construct:
>>
>>    int x = &&lab1 - &&lab0;
>> lab1:
>>   ...
>> lab2:
>>
>>
>> My TARGET_ASM_INTEGER callback uses the text_segment_operand predicate. In the 
>> above case it is passed the following RTL expression:
>> (minus:SI (label_ref/v:SI 20)
>>     (label_ref/v:SI 27))
>>
>> I need to detect text labels so that I annotate them with %pmem:
>>         .4byte	%pmem(.L4-(.L2))
>> Instead of the incorrect:
>>         .4byte   .L3-(.L2)
> 
> OK, thanks for the explanation.  I think the target-independent code should
> really be passing (const (minus ...)) rather than a plain (minus ...) here,
> but that's going to be difficult to change at this stage.  And like you say,
> the strip_offset suggestion wouldn't have handled this anyway.
> 
> So yeah, please keep what you have now.
> 
>>>> +/* Callback for walk_gimple_seq that checks TP tree for TI ABI
>>>> compliance.  */ +static tree
>>>> +check_op_callback (tree *tp, int *walk_subtrees, void *data)
>>>> +{
>>>> +  struct walk_stmt_info *wi = (struct walk_stmt_info *) data;
>>>> +
>>>> +  if (RECORD_OR_UNION_TYPE_P (*tp) || TREE_CODE (*tp) == ENUMERAL_TYPE)
>>>> +    {
>>>> +      /* Forward declarations have NULL tree type.  Skip them.  */
>>>> +      if (TREE_TYPE (*tp) == NULL)
>>>> +	return NULL;
>>>> +    }
>>>> +
>>>> +  /* TODO - why does C++ leave INTEGER_TYPE forward declarations around?  */
>>>> +  if (TREE_TYPE (*tp) == NULL)
>>>> +    return NULL;
>>>> +
>>>> +  const tree type = TREE_TYPE (*tp);
>>>> +
>>>> +  /* Direct function calls are allowed, obviously.  */
>>>> +  if (TREE_CODE (*tp) == ADDR_EXPR && TREE_CODE (type) == POINTER_TYPE)
>>>> +    {
>>>> +      const tree ptype = TREE_TYPE (type);
>>>> +      if (TREE_CODE (ptype) == FUNCTION_TYPE)
>>>> +	return NULL;
>>>> +    }
>>>
>>> This seems like a bit of a dangerous heuristic.  Couldn't it also cause
>>> us to skip things like:
>>>
>>>    (void *) func
>>>
>>> ?  (OK, that's dubious C, but we do support it.)
>> The cast yields a "data pointer", which is 32 bits for both types of ABI. 
>> Hence it is safe to skip "(void *) func".
>>
>> The TI ABI's 16 bit function pointers become a problem when they change the 
>> layout of structures and function argument registers.
> 
> OK.  The reason this stood out is that the code doesn't obviously match
> the comment.  If the code is just trying to skip direct function calls,
> I think the gcall sequence I mentioned would be more obvious, and would
> match the existing comment.  If anything that takes the address of a
> function is OK then it might be worth changing the comment to include that.
> 
>>>> +	    {
>>>> +	      *total = 0;
>>>> +	    }
>>>> +	  else
>>>> +	    {
>>>> +	      /* SI move has the same cost as a QI move.  */
>>>> +	      int factor = GET_MODE_SIZE (mode) / GET_MODE_SIZE (SImode);
>>>> +	      if (factor == 0) 
>>>> +		factor = 1;
>>>> +	      *total = factor * COSTS_N_INSNS (1);
>>>> +	    }
>>>
>>> Could you explain this a bit more?  It looks like a pure QImode operation
>>> gets a cost of 1 insn but an SImode operation zero-extended from QImode
>>> gets a cost of 0.
>> I have unintentionally bumped QI cost while trying to make zero-extends cheap. 
>> I will fix by using the following simple logic:
>>
>>     case SET:
>> 	  mode = GET_MODE (SET_DEST (x));
>> 	  /* SI move has the same cost as a QI move.  Moves larger than
>> 	     64 bits are costly.  */
>> 	  int factor = CEIL (GET_MODE_SIZE (mode), GET_MODE_SIZE (SImode));
>> 	  *total = factor * COSTS_N_INSNS (1);
> 
> Looks good.
> 
>>>> +/* Implement TARGET_PREFERRED_RELOAD_CLASS.  */
>>>> +static reg_class_t
>>>> +pru_preferred_reload_class (rtx x ATTRIBUTE_UNUSED, reg_class_t regclass)
>>>> +{
>>>> +  return regclass == NO_REGS ? GENERAL_REGS : regclass;
>>>> +}
>>>
>>> This looks odd: PREFERRED_RELOAD_CLASS should return a subset of the
>>> original class rather than add new registers.  Can you remember why
>>> it was needed?
>> I'm not sure what target code is supposed to return when NO_REGS is passed 
>> here. I saw how other ports handle NO_REGS (c-sky, m32c, nios2, rl78). So I 
>> followed suit.
>>
>> Here is a backtrace of LRA when NO_REGS is passed:
>> 0xf629ae pru_preferred_reload_class
>>         ../../gcc/gcc/config/pru/pru.c:788
>> 0xa3d6e8 process_alt_operands
>>         ../../gcc/gcc/lra-constraints.c:2600
>> 0xa3ef7d curr_insn_transform
>>         ../../gcc/gcc/lra-constraints.c:3889
>> 0xa4301e lra_constraints(bool)
>>         ../../gcc/gcc/lra-constraints.c:4906
>> 0xa2726c lra(_IO_FILE*)
>>         ../../gcc/gcc/lra.c:2446
>> 0x9c97a9 do_reload
>>         ../../gcc/gcc/ira.c:5469
>> 0x9c97a9 execute
>>         ../../gcc/gcc/ira.c:5653
> 
> I think it should just pass NO_REGS through unmodified (which means
> spill to memory).  In some ways it would be good if it didn't have to
> handle this case, but again that's going to be work to fix.
> 
> The RA will pass ALL_REGS if it can handle any register, and wants to
> know what the target would prefer.  But for any input, the hook needs
> to stick within the registers it was given.
> 
>>>> +/* Helper function to get the starting storage HW register for an
>>>> argument, +   or -1 if it must be passed on stack.  The cum_v state is
>>>> not changed.  */ +static int
>>>> +pru_function_arg_regi (cumulative_args_t cum_v,
>>>> +		       machine_mode mode, const_tree type,
>>>> +		       bool named)
>>>> +{
>>>> +  CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v);
>>>> +  size_t argsize = pru_function_arg_size (mode, type);
>>>> +  size_t i, bi;
>>>> +  int regi = -1;
>>>> +
>>>> +  if (!pru_arg_in_reg_bysize (argsize))
>>>> +    return -1;
>>>> +
>>>> +  if (!named)
>>>> +    return -1;
>>>> +
>>>> +  /* Find the first available slot that fits.  Yes, that's the PRU ABI. 
>>>> */ +  for (i = 0; regi < 0 && i < ARRAY_SIZE (cum->regs_used); i++)
>>>> +    {
>>>> +      if (mode == BLKmode)
>>>> +	{
>>>> +	  /* Structs are passed, beginning at a full register.  */
>>>> +	  if ((i % 4) != 0)
>>>> +	    continue;
>>>> +	}
>>>
>>> The comment doesn't seem to match the code: a struct like:
>>>
>>>   struct s { short x; };
>>>
>>> will have HImode rather than BLKmode.
>> I will update the comment to clarify it is only for large structs. Code should 
>> be ok.
>>
>>> It's usually better to base ABI stuff on the type where possible,
>>> since that corresponds directly to the source language, while modes
>>> are more of an internal GCC thing.
>> I will explore this option. But the ABI is already defined in terms of data 
>> sizes, which I think neatly fits with GCC's modes.
> 
> Thanks.  And this certainly shouldn't hold up acceptance, it was just a
> suggestion.
> 
>>>> +; LRA cannot cope with clobbered op2, hence the scratch register.
>>>> +(define_insn "ashr<mode>3"
>>>> +  [(set (match_operand:QISI 0 "register_operand"	    "=&r,r")
>>>> +	  (ashiftrt:QISI
>>>> +	    (match_operand:QISI 1 "register_operand"	    "0,r")
>>>> +	    (match_operand:QISI 2 "reg_or_const_1_operand"  "r,P")))
>>>> +   (clobber (match_scratch:QISI 3			    "=r,X"))]
>>>> +  ""
>>>> +  "@
>>>> +   mov %3, %2\;ASHRLP%=:\;qbeq ASHREND%=, %3, 0\; sub %3, %3, 1\;
>>>> lsr\\t%0, %0, 1\; qbbc ASHRLP%=, %0, (%S0 * 8) - 2\; set %0, %0, (%S0 *
>>>> 8) - 1\; jmp ASHRLP%=\;ASHREND%=: +   lsr\\t%0, %1, 1\;qbbc LSIGN%=, %0,
>>>> (%S0 * 8) - 2\;set %0, %0, (%S0 * 8) - 1\;LSIGN%=:" +  [(set_attr "type"
>>>> "complex,alu")
>>>> +   (set_attr "length" "28,4")])
>>>
>>> What do you mean by LRA not coping?  What did you try originally and
>>> what went wrong?
>> Better assembly could be generated if the shift count register was used for a 
>> loop counter. Pseudo code:
>>
>>    while (op2--)
>>      op0 >>= 1;
>>  
>> I originally tried to clobber operand 2:
>>  (define_insn "ashr<mode>3"
>>    [(set (match_operand:QISI 0 "register_operand" "=&r,r")
>>          (ashiftrt:QISI
>>            (match_operand:QISI 1 "register_operand" "0,r")
>>            (match_operand:QISI 2 "reg_or_const_1_operand" "+r,P"))
>>           )]
>>
>> But with the above pattern operand 2 was not clobbered. Its value was deemed 
>> untouched (i.e. live across the pattern). So I came up with clobbering a 
>> separate register to fix this, at the expense of slightly bigger generated 
>> code.
> 
> Ah, yeah, "+" has to be on an output operand (the destination of a set
> or clobber).
> 
> In this kind of situation you might be better off expanding the loop
> as individual RTL instructions, which is what several targets do for
> things like memmove.  But if you'd prefer to keep a single pattern,
> you could use:
> 
>   [(set (match_operand:QISI 0 "register_operand" "=r")
> 	(ashiftrt:QISI
> 	  (match_operand:QISI 2 "register_operand" "0")
> 	  (match_operand:QISI 3 "register_operand" "1")))
>    (set (match_operand:QISI 1 "register_operand" "=r")
> 	(const_int -1))]
> 
> (assuming the optimised loop does leave operand 2 containing all-1s).
> 
> The shift by 1 would then need to be a separate pattern, with the
> define_expand choosing between them.
> 
>>>> +(define_insn "<code>di3"
>>>> +  [(set (match_operand:DI 0 "register_operand"		"=&r,&r")
>>>> +	  (LOGICAL_BITOP:DI
>>>> +	    (match_operand:DI 1 "register_operand"	"%r,r")
>>>> +	    (match_operand:DI 2 "reg_or_ubyte_operand"	"r,I")))]
>>>> +  ""
>>>> +  "@
>>>> +   <logical_bitop_asm>\\t%F0, %F1, %F2\;<logical_bitop_asm>\\t%N0, %N1,
>>>> %N2 +   <logical_bitop_asm>\\t%F0, %F1, %2\;<logical_bitop_asm>\\t%N0,
>>>> %N1, 0" +  [(set_attr "type" "alu,alu")
>>>> +   (set_attr "length" "8,8")])
>>>> +
>>>> +
>>>> +(define_insn "one_cmpldi2"
>>>> +  [(set (match_operand:DI 0 "register_operand"		"=r")
>>>> +	(not:DI (match_operand:DI 1 "register_operand"	"r")))]
>>>> +  ""
>>>> +{
>>>> +  /* Careful with overlapping source and destination regs.  */
>>>> +  gcc_assert (GP_REG_P (REGNO (operands[0])));
>>>> +  gcc_assert (GP_REG_P (REGNO (operands[1])));
>>>> +  if (REGNO (operands[0]) == (REGNO (operands[1]) + 4))
>>>> +    return "not\\t%N0, %N1\;not\\t%F0, %F1";
>>>> +  else
>>>> +    return "not\\t%F0, %F1\;not\\t%N0, %N1";
>>>> +}
>>>> +  [(set_attr "type" "alu")
>>>> +   (set_attr "length" "8")])
>>>
>>> Do you see any code improvements from defining these patterns?
>>> These days I'd expect you'd get better code by letting the
>>> target-independent code split them up into SImode operations.
>> Yes, these patterns improve code size and speed.
>>
>> Looking at expand_binop(), DI logical ops are split into WORD mode, which in 
>> PRU's peculiar case is QI. So without those patterns, a 64 bit IOR generates 8 
>> QI instructions:
>>
>>         or      r0.b0, r14.b0, r16.b0
>>         or      r0.b1, r14.b1, r16.b1
>>         or      r0.b2, r14.b2, r16.b2
>>         or      r0.b3, r14.b3, r16.b3
>>         or      r1.b0, r15.b0, r17.b0
>>         or      r1.b1, r15.b1, r17.b1
>>         or      r1.b2, r15.b2, r17.b2
>>         or      r1.b3, r15.b3, r17.b3
>>
>> Whereas with patterns defined, it is only 2 SImode instructions:
>>         or      r0, r14, r16
>>         or      r1, r15, r17
> 
> Ah, yeah, I'd forgotten about words being QImode.  Ideally the doubleword
> handling would be extended to any mode that is twice the width of a "cheap"
> mode, but again that'd be a fair amount of work.
> 
> So yeah, please keep this as-is.
> 
>>>> +;; Multiply instruction.  Idea for fixing registers comes from the AVR
>>>> backend. +
>>>> +(define_expand "mulsi3"
>>>> +  [(set (match_operand:SI 0 "register_operand" "")
>>>> +	(mult:SI (match_operand:SI 1 "register_operand" "")
>>>> +		 (match_operand:SI 2 "register_operand" "")))]
>>>> +  ""
>>>> +{
>>>> +  emit_insn (gen_mulsi3_fixinp (operands[0], operands[1], operands[2]));
>>>> +  DONE;
>>>> +})
>>>> +
>>>> +
>>>> +(define_expand "mulsi3_fixinp"
>>>> +  [(set (reg:SI 112) (match_operand:SI 1 "register_operand" ""))
>>>> +   (set (reg:SI 116) (match_operand:SI 2 "register_operand" ""))
>>>> +   (set (reg:SI 104) (mult:SI (reg:SI 112) (reg:SI 116)))
>>>> +   (set (match_operand:SI 0 "register_operand" "") (reg:SI 104))]
>>>> +  ""
>>>> +{
>>>> +})
>>>
>>> This seems slightly dangerous since there's nothing to guarantee that
>>> those registers aren't already live at the point of expansion.
>>>
>>> The more usual way (and IMO correct way) would be for:
>>>> +(define_insn "*mulsi3_prumac"
>>>> +  [(set (reg:SI 104) (mult:SI (reg:SI 112) (reg:SI 116)))]
>>>> +  ""
>>>> +  "nop\;xin\\t0, r26, 4"
>>>> +  [(set_attr "type" "alu")
>>>> +   (set_attr "length" "8")])
>>>
>>> to have register classes and constraints for these three registers,
>>> like e.g. the x86 port does for %cx etc.
>>>
>>> It would be good if you could try that.  On the other hand, if AVR
>>> already does this and it worked in practice then I guess it's not
>>> something that should hold up the port.
>> This suggestion worked. It was a matter of using the correct predicates.
>>
>> Here is what I will put in the next patch version:
>>
>> (define_predicate "pru_muldst_operand"
>>   (and (match_code "reg")
>>        (ior (match_test "REGNO_REG_CLASS (REGNO (op)) == MULDST_REGS")
>> 	    (match_test "REGNO (op) >= FIRST_PSEUDO_REGISTER"))))
>> ...
>> (define_register_constraint "Rmd0" "MULDST_REGS"
>>   "@internal
>>   The multiply destination register.")
>> ...
>>
>> (define_insn "mulsi3"
>>   [(set (match_operand:SI 0 "pru_muldst_operand"	   "=Rmd0")
>> 	(mult:SI (match_operand:SI 1 "pru_mulsrc0_operand" "Rms0")
>> 		 (match_operand:SI 2 "pru_mulsrc1_operand" "Rms1")))]
> 
> Looks good.  Thanks for doing this.
> 
> You could also add a "%" constraint to operand 1 to allow the input
> registers to be swapped, just on the off chance that that's useful.
> But I suppose there's a risk that providing that alternative will
> confuse the RA cost model a bit.  Just a suggestion.
> 
>>>> @@ -23444,6 +23449,56 @@ the offset with a symbol reference to a canary in
>>>> the TLS block.> 
>>>>  @end table
>>>>
>>>> +@node PRU Options
>>>> +@subsection PRU Options
>>>> +@cindex PRU Options
>>>> +
>>>> +These command-line options are defined for the PRU target:
>>>> +
>>>> +@table @gcctabopt
>>>> +@item -minrt
>>>> +@opindex minrt
>>>> +Enable the use of a minimum runtime environment---no static
>>>> +initializers or constructors.  Results in significant code size
>>>> +reduction of final ELF binaries.
>>>
>>> Up to you, but this read to me as "-m"+"inrt".  If this option is already
>>> in use then obviously keep it, but if it's new then it might be worth
>>> calling it -mminrt instead, for consistency with -mmcu.
>> I assumed this is a standard naming since MSP430 already uses it. I've seen 
>> the option being utilized by pru-gcc users, so let's keep it that way.
> 
> OK.
> 
>>> Despite the long reply (sorry), this looks in really good shape to me FWIW.
>>> Thanks for the submission.
>> I take this sentence as an indication that the PRU port is welcome for 
>> inclusion into GCC. I'll work on fixing the comments and will send a new patch 
>> version.
> 
> Well, process-wise, it's up to the steering committee to decide whether
> to accept the port.  But one of the main requirements is of course
> reviewing the patches.
> 
> Which of the other patches in the series still need review?  I think Jeff
> approved some of them earlier.
I think the only thing outstanding was the .md/.c/.h files for the port
itself.  The rest was very straightforward.
jeff
Dimitar Dimitrov Sept. 25, 2018, 3:19 a.m. UTC | #5
On Monday, 9/24/2018 11:38:23 EEST Richard Sandiford wrote:
> Dimitar Dimitrov <dimitar@dinux.eu> writes:
> > On Thursday, 9/13/2018 13:02:21 EEST Richard Sandiford wrote:
> >> Dimitar Dimitrov <dimitar@dinux.eu> writes:
> >> > +/* Callback for walk_gimple_seq that checks TP tree for TI ABI
> >> > compliance.  */ +static tree
> >> > +check_op_callback (tree *tp, int *walk_subtrees, void *data)
> >> > +{
> >> > +  struct walk_stmt_info *wi = (struct walk_stmt_info *) data;
> >> > +
> >> > +  if (RECORD_OR_UNION_TYPE_P (*tp) || TREE_CODE (*tp) ==
> >> > ENUMERAL_TYPE)
> >> > +    {
> >> > +      /* Forward declarations have NULL tree type.  Skip them.  */
> >> > +      if (TREE_TYPE (*tp) == NULL)
> >> > +	return NULL;
> >> > +    }
> >> > +
> >> > +  /* TODO - why does C++ leave INTEGER_TYPE forward declarations
> >> > around?  */
> >> > +  if (TREE_TYPE (*tp) == NULL)
> >> > +    return NULL;
> >> > +
> >> > +  const tree type = TREE_TYPE (*tp);
> >> > +
> >> > +  /* Direct function calls are allowed, obviously.  */
> >> > +  if (TREE_CODE (*tp) == ADDR_EXPR && TREE_CODE (type) ==
> >> > POINTER_TYPE)
> >> > +    {
> >> > +      const tree ptype = TREE_TYPE (type);
> >> > +      if (TREE_CODE (ptype) == FUNCTION_TYPE)
> >> > +	return NULL;
> >> > +    }
> >> 
> >> This seems like a bit of a dangerous heuristic.  Couldn't it also cause
> >> us to skip things like:
> >>    (void *) func
> >> 
> >> ?  (OK, that's dubious C, but we do support it.)
> > 
> > The cast yields a "data pointer", which is 32 bits for both types of ABI.
> > Hence it is safe to skip "(void *) func".
> > 
> > The TI ABI's 16 bit function pointers become a problem when they change
> > the layout of structures and function argument registers.
> 
> OK.  The reason this stood out is that the code doesn't obviously match
> the comment.  If the code is just trying to skip direct function calls,
> I think the gcall sequence I mentioned would be more obvious, and would
> match the existing comment.  If anything that takes the address of a
> function is OK then it might be worth changing the comment to include that.
I will use your suggested gcall snippet since it is safe and obvious. The 
comment matches my original intent.


> >> > +/* Implement TARGET_PREFERRED_RELOAD_CLASS.  */
> >> > +static reg_class_t
> >> > +pru_preferred_reload_class (rtx x ATTRIBUTE_UNUSED, reg_class_t
> >> > regclass)
> >> > +{
> >> > +  return regclass == NO_REGS ? GENERAL_REGS : regclass;
> >> > +}
> >> 
> >> This looks odd: PREFERRED_RELOAD_CLASS should return a subset of the
> >> original class rather than add new registers.  Can you remember why
> >> it was needed?
> > 
> > I'm not sure what target code is supposed to return when NO_REGS is passed
> > here. I saw how other ports handle NO_REGS (c-sky, m32c, nios2, rl78).
> > So I followed suit.
> > 
> > Here is a backtrace of LRA when NO_REGS is passed:
> > 0xf629ae pru_preferred_reload_class
> >         ../../gcc/gcc/config/pru/pru.c:788
> > 0xa3d6e8 process_alt_operands
> >         ../../gcc/gcc/lra-constraints.c:2600
> > 0xa3ef7d curr_insn_transform
> >         ../../gcc/gcc/lra-constraints.c:3889
> > 0xa4301e lra_constraints(bool)
> >         ../../gcc/gcc/lra-constraints.c:4906
> > 0xa2726c lra(_IO_FILE*)
> >         ../../gcc/gcc/lra.c:2446
> > 0x9c97a9 do_reload
> >         ../../gcc/gcc/ira.c:5469
> > 0x9c97a9 execute
> >         ../../gcc/gcc/ira.c:5653
> 
> I think it should just pass NO_REGS through unmodified (which means
> spill to memory).  In some ways it would be good if it didn't have to
> handle this case, but again that's going to be work to fix.
> 
> The RA will pass ALL_REGS if it can handle any register, and wants to
> know what the target would prefer.  But for any input, the hook needs
> to stick within the registers it was given.

Thanks for the clarification. I will remove the PRU hook and will rely on the 
default implementation (i.e. return the given rclass).

> >> > +; LRA cannot cope with clobbered op2, hence the scratch register.
> >> > +(define_insn "ashr<mode>3"
> >> > +  [(set (match_operand:QISI 0 "register_operand"	    "=&r,r")
> >> > +	  (ashiftrt:QISI
> >> > +	    (match_operand:QISI 1 "register_operand"	    "0,r")
> >> > +	    (match_operand:QISI 2 "reg_or_const_1_operand"  "r,P")))
> >> > +   (clobber (match_scratch:QISI 3			    "=r,X"))]
> >> > +  ""
> >> > +  "@
> >> > +   mov %3, %2\;ASHRLP%=:\;qbeq ASHREND%=, %3, 0\; sub %3, %3, 1\;
> >> > lsr\\t%0, %0, 1\; qbbc ASHRLP%=, %0, (%S0 * 8) - 2\; set %0, %0, (%S0 *
> >> > 8) - 1\; jmp ASHRLP%=\;ASHREND%=: +   lsr\\t%0, %1, 1\;qbbc LSIGN%=,
> >> > %0,
> >> > (%S0 * 8) - 2\;set %0, %0, (%S0 * 8) - 1\;LSIGN%=:" +  [(set_attr
> >> > "type"
> >> > "complex,alu")
> >> > +   (set_attr "length" "28,4")])
> >> 
> >> What do you mean by LRA not coping?  What did you try originally and
> >> what went wrong?
> > 
> > Better assembly could be generated if the shift count register was used
> > for a loop counter. Pseudo code:
> >
> >    while (op2--)
> >      op0 >>= 1;
> >
> > I originally tried to clobber operand 2:
> >  (define_insn "ashr<mode>3"
> >    [(set (match_operand:QISI 0 "register_operand" "=&r,r")
> >          (ashiftrt:QISI
> >            (match_operand:QISI 1 "register_operand" "0,r")
> >            (match_operand:QISI 2 "reg_or_const_1_operand" "+r,P"))
> >           )]
> > 
> > But with the above pattern operand 2 was not clobbered. Its value was
> > deemed untouched (i.e. live across the pattern). So I came up with 
> > clobbering a separate register to fix this, at the expense of slightly
> > bigger generated code.
> 
> Ah, yeah, "+" has to be on an output operand (the destination of a set
> or clobber).
> 
> In this kind of situation you might be better off expanding the loop
> as individual RTL instructions, which is what several targets do for
> things like memmove.  But if you'd prefer to keep a single pattern,
> you could use:
> 
>   [(set (match_operand:QISI 0 "register_operand" "=r")
> 	(ashiftrt:QISI
> 	  (match_operand:QISI 2 "register_operand" "0")
> 	  (match_operand:QISI 3 "register_operand" "1")))
>    (set (match_operand:QISI 1 "register_operand" "=r")
> 	(const_int -1))]
> 
> (assuming the optimised loop does leave operand 2 containing all-1s).
> 
> The shift by 1 would then need to be a separate pattern, with the
> define_expand choosing between them.
> 

Expanding the loop works like a charm. Thanks for the tip. I will include the 
rework in the next patch revision.


> >> Despite the long reply (sorry), this looks in really good shape to me
> >> FWIW.
> >> Thanks for the submission.
> > 
> > I take this sentence as an indication that the PRU port is welcome for
> > inclusion into GCC. I'll work on fixing the comments and will send a new
> > patch version.
> 
> Well, process-wise, it's up to the steering committee to decide whether
> to accept the port.  But one of the main requirements is of course
> reviewing the patches.
> 
> Which of the other patches in the series still need review?  I think Jeff
> approved some of them earlier.
The rest of the patches are approved:
https://gcc.gnu.org/ml/gcc-patches/2018-06/msg01448.html
https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01152.html

Thanks,
Dimitar
Patch

diff --git a/configure.ac b/configure.ac
index a0b0917dd55..11c9f72fc64 100644
--- a/configure.ac
+++ b/configure.ac
@@ -640,6 +640,10 @@  case "${target}" in
   powerpc-*-aix* | rs6000-*-aix*)
     noconfigdirs="$noconfigdirs target-libssp"
     ;;
+  pru-*-*)
+    # No hosted I/O support.
+    noconfigdirs="$noconfigdirs target-libssp"
+    ;;
   rl78-*-*)
     # libssp uses a misaligned load to trigger a fault, but the RL78
     # doesn't fault for those - instead, it gives a build-time error
@@ -824,6 +828,9 @@  case "${target}" in
   powerpc*-*-*)
     libgloss_dir=rs6000
     ;;
+  pru-*-*)
+    libgloss_dir=pru
+    ;;
   sparc*-*-*)
     libgloss_dir=sparc
     ;;
diff --git a/gcc/common/config/pru/pru-common.c b/gcc/common/config/pru/pru-common.c
new file mode 100644
index 00000000000..e87d70ce9ca
--- /dev/null
+++ b/gcc/common/config/pru/pru-common.c
@@ -0,0 +1,36 @@ 
+/* Common hooks for TI PRU
+   Copyright (C) 2014-2018 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "diagnostic-core.h"
+#include "tm.h"
+#include "common/common-target.h"
+#include "common/common-target-def.h"
+#include "opts.h"
+#include "flags.h"
+
+#undef TARGET_DEFAULT_TARGET_FLAGS
+#define TARGET_DEFAULT_TARGET_FLAGS		(MASK_OPT_LOOP)
+
+#undef TARGET_EXCEPT_UNWIND_INFO
+#define TARGET_EXCEPT_UNWIND_INFO sjlj_except_unwind_info
+
+struct gcc_targetm_common targetm_common = TARGETM_COMMON_INITIALIZER;
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 6ad2ba4d152..66b31f320e1 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -493,6 +493,9 @@  powerpc*-*-*)
 	esac
 	extra_options="${extra_options} g.opt fused-madd.opt rs6000/rs6000-tables.opt"
 	;;
+pru-*-*)
+	cpu_type=pru
+	;;
 riscv*)
 	cpu_type=riscv
 	extra_objs="riscv-builtins.o riscv-c.o"
@@ -2699,6 +2702,12 @@  powerpcle-*-eabi*)
 	extra_options="${extra_options} rs6000/sysv4.opt"
 	use_gcc_stdint=wrap
 	;;
+pru*-*-*)
+	tm_file="elfos.h newlib-stdint.h ${tm_file}"
+	tmake_file="${tmake_file} pru/t-pru"
+	extra_objs="pru-pragma.o pru-passes.o"
+	use_gcc_stdint=wrap
+	;;
 rs6000-ibm-aix6.* | powerpc-ibm-aix6.*)
 	tm_file="${tm_file} rs6000/aix.h rs6000/aix61.h rs6000/xcoff.h rs6000/aix-stdint.h"
 	tmake_file="rs6000/t-aix52 t-slibgcc"
diff --git a/gcc/config/pru/alu-zext.md b/gcc/config/pru/alu-zext.md
new file mode 100644
index 00000000000..2112b08d3f4
--- /dev/null
+++ b/gcc/config/pru/alu-zext.md
@@ -0,0 +1,178 @@ 
+;; ALU operations with zero extensions
+;;
+;; Copyright (C) 2015-2018 Free Software Foundation, Inc.
+;; Contributed by Dimitar Dimitrov <dimitar@dinux.eu>
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+(define_subst_attr "alu2_zext"     "alu2_zext_subst"     "_z" "_noz")
+
+(define_subst_attr "alu3_zext_op1" "alu3_zext_op1_subst" "_z1" "_noz1")
+(define_subst_attr "alu3_zext_op2" "alu3_zext_op2_subst" "_z2" "_noz2")
+(define_subst_attr "alu3_zext"     "alu3_zext_subst"     "_z" "_noz")
+
+(define_subst_attr "bitalu_zext"   "bitalu_zext_subst"   "_z" "_noz")
+
+(define_code_iterator ALUOP3 [plus minus and ior xor umin umax ashift lshiftrt])
+(define_code_iterator ALUOP2 [neg not])
+
+;; Arithmetic Operations
+
+(define_insn "add_impl<EQD:mode><EQS0:mode><EQS1:mode>_<alu3_zext><alu3_zext_op1><alu3_zext_op2>"
+  [(set (match_operand:EQD 0 "register_operand" "=r,r,r")
+	(plus:EQD
+	 (zero_extend:EQD
+	  (match_operand:EQS0 1 "register_operand" "%r,r,r"))
+	 (zero_extend:EQD
+	  (match_operand:EQS1 2 "nonmemory_operand" "r,I,M"))))]
+  ""
+  "@
+   add\\t%0, %1, %2
+   add\\t%0, %1, %2
+   sub\\t%0, %1, %n2"
+  [(set_attr "type" "alu")
+   (set_attr "length" "4")])
+
+(define_insn "sub_impl<EQD:mode><EQS0:mode><EQS1:mode>_<alu3_zext><alu3_zext_op1><alu3_zext_op2>"
+  [(set (match_operand:EQD 0 "register_operand" "=r,r,r")
+	(minus:EQD
+	 (zero_extend:EQD
+	  (match_operand:EQS0 1 "reg_or_ubyte_operand" "r,r,I"))
+	 (zero_extend:EQD
+	  (match_operand:EQS1 2 "reg_or_ubyte_operand" "r,I,r"))))]
+  ""
+  "@
+   sub\\t%0, %1, %2
+   sub\\t%0, %1, %2
+   rsb\\t%0, %2, %1"
+  [(set_attr "type" "alu")
+   (set_attr "length" "4")])
+
+
+(define_insn "neg_impl<EQD:mode><EQS0:mode>_<alu2_zext>"
+  [(set (match_operand:EQD 0 "register_operand" "=r")
+	(neg:EQD
+	  (zero_extend:EQD (match_operand:EQS0 1 "register_operand" "r"))))]
+  ""
+  "rsb\\t%0, %1, 0"
+  [(set_attr "type" "alu")
+   (set_attr "length" "4")])
+
+
+(define_insn "one_impl<EQD:mode><EQS0:mode>_<alu2_zext>"
+  [(set (match_operand:EQD 0 "register_operand" "=r")
+	(not:EQD
+	  (zero_extend:EQD (match_operand:EQS0 1 "register_operand" "r"))))]
+  ""
+  "not\\t%0, %1"
+  [(set_attr "type" "alu")
+   (set_attr "length" "4")])
+
+; Specialized IOR/AND patterns for matching setbit/clearbit instructions.
+;
+; TODO - allow clrbit and setbit to support (1 << REG) constructs
+
+(define_insn "clearbit_<EQD:mode><EQS0:mode>_<bitalu_zext>"
+  [(set (match_operand:EQD 0 "register_operand"	"=r")
+	(and:EQD
+	  (zero_extend:EQD
+	    (match_operand:EQS0 1 "register_operand" "r"))
+	  (match_operand:EQD 2 "single_zero_operand" "n")))]
+  ""
+  "clr\\t%0, %1, %V2"
+  [(set_attr "type" "alu")
+   (set_attr "length" "4")])
+
+(define_insn "setbit_<EQD:mode><EQS0:mode>_<bitalu_zext>"
+  [(set (match_operand:EQD 0 "register_operand" "=r")
+	(ior:EQD
+	  (zero_extend:EQD
+	    (match_operand:EQS0 1 "register_operand" "r"))
+	  (match_operand:EQD 2 "single_one_operand" "n")))]
+  ""
+  "set\\t%0, %1, %T2"
+  [(set_attr "type" "alu")
+   (set_attr "length" "4")])
+
+; Regular ALU ops
+(define_insn "<code>_impl<EQD:mode><EQS0:mode><EQS1:mode>_<alu3_zext><alu3_zext_op1><alu3_zext_op2>"
+  [(set (match_operand:EQD 0 "register_operand" "=r")
+	(LOGICAL:EQD
+	  (zero_extend:EQD
+	    (match_operand:EQS0 1 "register_operand"     "%r"))
+	  (zero_extend:EQD
+	    (match_operand:EQS1 2 "reg_or_ubyte_operand"  "rI"))))]
+  ""
+  "<logical_asm>\\t%0, %1, %2"
+  [(set_attr "type" "alu")
+   (set_attr "length" "4")])
+
+; Shift ALU ops
+(define_insn "<shift_op>_impl<EQD:mode><EQS0:mode><EQS1:mode>_<alu3_zext><alu3_zext_op1><alu3_zext_op2>"
+  [(set (match_operand:EQD 0 "register_operand" "=r")
+	(SHIFT:EQD
+	 (zero_extend:EQD (match_operand:EQS0 1 "register_operand" "r"))
+	 (zero_extend:EQD (match_operand:EQS1 2 "shift_operand"    "rL"))))]
+  ""
+  "<shift_asm>\\t%0, %1, %2"
+  [(set_attr "type" "alu")
+   (set_attr "length" "4")])
+
+;; Substitutions
+
+(define_subst "alu2_zext_subst"
+  [(set (match_operand:EQD 0 "" "")
+	(ALUOP2:EQD (zero_extend:EQD (match_operand:EQD 1 "" ""))))]
+  ""
+  [(set (match_dup 0)
+	(ALUOP2:EQD (match_dup 1)))])
+
+(define_subst "bitalu_zext_subst"
+  [(set (match_operand:EQD 0 "" "")
+	(ALUOP3:EQD (zero_extend:EQD (match_operand:EQD 1 "" ""))
+		    (match_operand:EQD 2 "" "")))]
+  ""
+  [(set (match_dup 0)
+	(ALUOP3:EQD (match_dup 1)
+		    (match_dup 2)))])
+
+(define_subst "alu3_zext_subst"
+  [(set (match_operand:EQD 0 "" "")
+	(ALUOP3:EQD (zero_extend:EQD (match_operand:EQD 1 "" ""))
+		    (zero_extend:EQD (match_operand:EQD 2 "" ""))))]
+  ""
+  [(set (match_dup 0)
+	(ALUOP3:EQD (match_dup 1)
+		    (match_dup 2)))])
+
+(define_subst "alu3_zext_op1_subst"
+  [(set (match_operand:EQD 0 "" "")
+	(ALUOP3:EQD (zero_extend:EQD (match_operand:EQD 1 "" ""))
+		    (zero_extend:EQD (match_operand:EQS1 2 "" ""))))]
+  ""
+  [(set (match_dup 0)
+	(ALUOP3:EQD (match_dup 1)
+		    (zero_extend:EQD (match_dup 2))))])
+
+(define_subst "alu3_zext_op2_subst"
+  [(set (match_operand:EQD 0 "" "")
+	(ALUOP3:EQD (zero_extend:EQD (match_operand:EQS0 1 "" ""))
+		    (zero_extend:EQD (match_operand:EQD 2 "" ""))))]
+  ""
+  [(set (match_dup 0)
+	(ALUOP3:EQD (zero_extend:EQD (match_dup 1))
+		    (match_dup 2)))])
diff --git a/gcc/config/pru/constraints.md b/gcc/config/pru/constraints.md
new file mode 100644
index 00000000000..326c8e9907c
--- /dev/null
+++ b/gcc/config/pru/constraints.md
@@ -0,0 +1,88 @@ 
+;; Constraint definitions for TI PRU.
+;; Copyright (C) 2014-2018 Free Software Foundation, Inc.
+;; Contributed by Dimitar Dimitrov <dimitar@dinux.eu>
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; We use the following constraint letters for constants:
+;;
+;;  I: 0 to 255.
+;;  J: 0 to 65535.
+;;  L: 0 to 31 (for shift counts).
+;;  M: -255 to 0 (for converting ADD to SUB with suitable UBYTE OP2).
+;;  N: -32768 to 32767 (16-bit signed integer).
+;;  P: 1.
+;;  T: Text segment label.  Needed to know when to select %pmem relocation.
+;;  Z: Constant integer zero.
+;;
+;; We use the following built-in register classes:
+;;
+;;  r: general purpose register (r0..r31)
+;;  m: memory operand
+;;
+;; The following constraints are intended for internal use only:
+;;  j: jump address register suitable for sibling calls
+;;  l: the internal counter register used by LOOP instructions
+
+;; Register constraints.
+
+(define_register_constraint "j" "SIB_REGS"
+  "A register suitable for an indirect sibcall.")
+
+(define_register_constraint "l" "LOOPCNTR_REGS"
+  "The internal counter register used by the LOOP instruction.")
+
+;; Integer constraints.
+
+(define_constraint "I"
+  "An unsigned 8-bit constant."
+  (and (match_code "const_int")
+       (match_test "UBYTE_INT (ival)")))
+
+(define_constraint "J"
+  "An unsigned 16-bit constant."
+  (and (match_code "const_int")
+       (match_test "UHWORD_INT (ival)")))
+
+(define_constraint "L"
+  "An unsigned 5-bit constant (for shift counts)."
+  (and (match_code "const_int")
+       (match_test "ival >= 0 && ival <= 31")))
+
+(define_constraint "M"
+  "A constant in the range [-255;0]."
+  (and (match_code "const_int")
+       (match_test "UBYTE_INT (-ival)")))
+
+(define_constraint "N"
+  "A constant in the range [-32768;32767]."
+  (and (match_code "const_int")
+       (match_test "SHWORD_INT (ival)")))
+
+(define_constraint "P"
+  "A constant 1."
+  (and (match_code "const_int")
+       (match_test "ival == 1")))
+
+(define_constraint "T"
+  "A text segment (program memory) constant label."
+  (match_test "text_segment_operand (op, VOIDmode)"))
+
+(define_constraint "Z"
+  "An integer constant zero."
+  (and (match_code "const_int")
+       (match_test "ival == 0")))
diff --git a/gcc/config/pru/predicates.md b/gcc/config/pru/predicates.md
new file mode 100644
index 00000000000..78fdb8a25b7
--- /dev/null
+++ b/gcc/config/pru/predicates.md
@@ -0,0 +1,224 @@ 
+;; Predicate definitions for TI PRU.
+;; Copyright (C) 2014-2018 Free Software Foundation, Inc.
+;; Contributed by Dimitar Dimitrov <dimitar@dinux.eu>
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+(define_predicate "const_1_operand"
+  (and (match_code "const_int")
+       (match_test "IN_RANGE (INTVAL (op), 1, 1)")))
+
+(define_predicate "const_ubyte_operand"
+  (and (match_code "const_int")
+       (match_test "IN_RANGE (INTVAL (op), 0, 0xff)")))
+
+(define_predicate "const_uhword_operand"
+  (and (match_code "const_int")
+       (match_test "IN_RANGE (INTVAL (op), 0, 0xffff)")))
+
+; TRUE for comparisons we support.
+(define_predicate "pru_cmp_operator"
+  (match_code "eq,ne,leu,ltu,geu,gtu"))
+
+; TRUE for signed comparisons that need special handling for PRU.
+(define_predicate "pru_signed_cmp_operator"
+  (match_code "ge,gt,le,lt"))
+
+;; FP Comparisons handled by pru_expand_pru_compare.
+(define_predicate "pru_fp_comparison_operator"
+  (match_code "eq,lt,gt,le,ge"))
+
+;; Return true if OP is a constant that contains only one 1 in its
+;; binary representation.
+(define_predicate "single_one_operand"
+  (and (match_code "const_int")
+       (match_test "exact_log2 (INTVAL (op) & GET_MODE_MASK (mode)) >= 0")))
+
+;; Return true if OP is a constant that contains only one 0 in its
+;; binary representation.
+(define_predicate "single_zero_operand"
+  (and (match_code "const_int")
+       (match_test "exact_log2 (~INTVAL (op) & GET_MODE_MASK (mode)) >= 0")))
+
+
+(define_predicate "reg_or_ubyte_operand"
+  (ior (match_operand 0 "const_ubyte_operand")
+       (match_operand 0 "register_operand")))
+
+(define_predicate "reg_or_const_1_operand"
+  (ior (match_operand 0 "const_1_operand")
+       (match_operand 0 "register_operand")))
+
+(define_predicate "const_shift_operand"
+  (and (match_code "const_int")
+       (match_test "SHIFT_INT (INTVAL (op))")))
+
+(define_predicate "shift_operand"
+  (ior (match_operand 0 "const_shift_operand")
+       (match_operand 0 "register_operand")))
+
+(define_predicate "ctable_addr_operand"
+  (and (match_code "const_int")
+       (match_test "pru_get_ctable_base_index (INTVAL (op)) >= 0")))
+
+(define_predicate "ctable_base_operand"
+  (and (match_code "const_int")
+       (match_test "pru_get_ctable_exact_base_index (INTVAL (op)) >= 0")))
+
+;; Ideally we should enforce a restriction that all text labels fit in
+;; 16 bits, as required by the PRU ISA.  But for the time being we rely
+;; on binutils to catch text segment overflows.
+(define_predicate "call_operand"
+  (ior (match_operand 0 "immediate_operand")
+       (match_operand 0 "register_operand")))
+
+;; Return true if OP is a text segment reference.
+;; This is needed for program memory address expressions.  Borrowed from AVR.
+(define_predicate "text_segment_operand"
+  (match_code "code_label,label_ref,symbol_ref,plus,minus,const")
+{
+  switch (GET_CODE (op))
+    {
+    case CODE_LABEL:
+      return true;
+    case LABEL_REF:
+      return true;
+    case SYMBOL_REF:
+      return SYMBOL_REF_FUNCTION_P (op);
+    case PLUS:
+    case MINUS:
+      /* Assume canonical format of symbol + constant.
+	 Fall through.  */
+    case CONST:
+      return text_segment_operand (XEXP (op, 0), VOIDmode);
+    default:
+      return false;
+    }
+})
+
+;; Return true if OP is a load multiple operation.  It is known to be a
+;; PARALLEL and the first section will be tested.
+
+(define_special_predicate "load_multiple_operation"
+  (match_code "parallel")
+{
+  machine_mode elt_mode;
+  int count = XVECLEN (op, 0);
+  unsigned int dest_regno;
+  rtx src_addr;
+  int i, off;
+
+  /* Perform a quick check so we don't blow up below.  */
+  if (GET_CODE (XVECEXP (op, 0, 0)) != SET
+      || GET_CODE (SET_DEST (XVECEXP (op, 0, 0))) != REG
+      || GET_CODE (SET_SRC (XVECEXP (op, 0, 0))) != MEM)
+    return false;
+
+  dest_regno = REGNO (SET_DEST (XVECEXP (op, 0, 0)));
+  src_addr = XEXP (SET_SRC (XVECEXP (op, 0, 0)), 0);
+  elt_mode = GET_MODE (SET_DEST (XVECEXP (op, 0, 0)));
+
+  /* Check whether the address is base, or base + displacement.  */
+
+  if (GET_CODE (src_addr) == REG)
+    off = 0;
+  else if (GET_CODE (src_addr) == PLUS
+	   && GET_CODE (XEXP (src_addr, 0)) == REG
+	   && GET_CODE (XEXP (src_addr, 1)) == CONST_INT)
+    {
+      off = INTVAL (XEXP (src_addr, 1));
+      src_addr = XEXP (src_addr, 0);
+    }
+  else
+    return false;
+
+  for (i = 1; i < count; i++)
+    {
+      rtx elt = XVECEXP (op, 0, i);
+
+      if (GET_CODE (elt) != SET
+	  || GET_CODE (SET_DEST (elt)) != REG
+	  || GET_MODE (SET_DEST (elt)) != elt_mode
+	  || REGNO (SET_DEST (elt)) != dest_regno + i
+	  || GET_CODE (SET_SRC (elt)) != MEM
+	  || GET_MODE (SET_SRC (elt)) != elt_mode
+	  || GET_CODE (XEXP (SET_SRC (elt), 0)) != PLUS
+	  || ! rtx_equal_p (XEXP (XEXP (SET_SRC (elt), 0), 0), src_addr)
+	  || GET_CODE (XEXP (XEXP (SET_SRC (elt), 0), 1)) != CONST_INT
+	  || INTVAL (XEXP (XEXP (SET_SRC (elt), 0), 1))
+	     != off + i * GET_MODE_SIZE (elt_mode))
+	return false;
+    }
+
+  return true;
+})
+
+;; Return true if OP is a store multiple operation.  It is known to be a
+;; PARALLEL and the first section will be tested.
+
+(define_special_predicate "store_multiple_operation"
+  (match_code "parallel")
+{
+  machine_mode elt_mode;
+  int count = XVECLEN (op, 0);
+  unsigned int src_regno;
+  rtx dest_addr;
+  int i, off;
+
+  /* Perform a quick check so we don't blow up below.  */
+  if (GET_CODE (XVECEXP (op, 0, 0)) != SET
+      || GET_CODE (SET_DEST (XVECEXP (op, 0, 0))) != MEM
+      || GET_CODE (SET_SRC (XVECEXP (op, 0, 0))) != REG)
+    return false;
+
+  src_regno = REGNO (SET_SRC (XVECEXP (op, 0, 0)));
+  dest_addr = XEXP (SET_DEST (XVECEXP (op, 0, 0)), 0);
+  elt_mode = GET_MODE (SET_SRC (XVECEXP (op, 0, 0)));
+
+  /* Check whether the address is base, or base + displacement.  */
+
+  if (GET_CODE (dest_addr) == REG)
+    off = 0;
+  else if (GET_CODE (dest_addr) == PLUS
+	   && GET_CODE (XEXP (dest_addr, 0)) == REG
+	   && GET_CODE (XEXP (dest_addr, 1)) == CONST_INT)
+    {
+      off = INTVAL (XEXP (dest_addr, 1));
+      dest_addr = XEXP (dest_addr, 0);
+    }
+  else
+    return false;
+
+  for (i = 1; i < count; i++)
+    {
+      rtx elt = XVECEXP (op, 0, i);
+
+      if (GET_CODE (elt) != SET
+	  || GET_CODE (SET_SRC (elt)) != REG
+	  || GET_MODE (SET_SRC (elt)) != elt_mode
+	  || REGNO (SET_SRC (elt)) != src_regno + i
+	  || GET_CODE (SET_DEST (elt)) != MEM
+	  || GET_MODE (SET_DEST (elt)) != elt_mode
+	  || GET_CODE (XEXP (SET_DEST (elt), 0)) != PLUS
+	  || ! rtx_equal_p (XEXP (XEXP (SET_DEST (elt), 0), 0), dest_addr)
+	  || GET_CODE (XEXP (XEXP (SET_DEST (elt), 0), 1)) != CONST_INT
+	  || INTVAL (XEXP (XEXP (SET_DEST (elt), 0), 1))
+	     != off + i * GET_MODE_SIZE (elt_mode))
+	return false;
+    }
+  return true;
+})
diff --git a/gcc/config/pru/pru-opts.h b/gcc/config/pru/pru-opts.h
new file mode 100644
index 00000000000..1c1514cb2a3
--- /dev/null
+++ b/gcc/config/pru/pru-opts.h
@@ -0,0 +1,31 @@ 
+/* Copyright (C) 2017-2018 Free Software Foundation, Inc.
+   Contributed by Dimitar Dimitrov <dimitar@dinux.eu>
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Definitions for option handling for PRU.  */
+
+#ifndef GCC_PRU_OPTS_H
+#define GCC_PRU_OPTS_H
+
+/* ABI variant for code generation.  */
+enum pru_abi {
+  PRU_ABI_GNU,
+  PRU_ABI_TI
+};
+
+#endif
diff --git a/gcc/config/pru/pru-passes.c b/gcc/config/pru/pru-passes.c
new file mode 100644
index 00000000000..c95dc877722
--- /dev/null
+++ b/gcc/config/pru/pru-passes.c
@@ -0,0 +1,234 @@ 
+/* PRU target specific passes
+   Copyright (C) 2017-2018 Free Software Foundation, Inc.
+   Dimitar Dimitrov <dimitar@dinux.eu>
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#define IN_TARGET_CODE 1
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "context.h"
+#include "tm.h"
+#include "alias.h"
+#include "symtab.h"
+#include "tree.h"
+#include "diagnostic-core.h"
+#include "function.h"
+#include "gimple.h"
+#include "gimple-iterator.h"
+#include "gimple-walk.h"
+#include "gimple-expr.h"
+#include "tree-pass.h"
+#include "gimple-pretty-print.h"
+
+#include "pru-protos.h"
+
+namespace {
+
+/* Scan the tree to ensure that the code compiled by GCC
+   conforms to the TI ABI specification.  If GCC cannot
+   output conforming code, raise an error.  */
+const pass_data pass_data_tiabi_check =
+{
+  GIMPLE_PASS, /* type */
+  "*tiabi_check", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  PROP_gimple_any, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+/* Implementation class for the TI ABI compliance-check pass.  */
+class pass_tiabi_check : public gimple_opt_pass
+{
+public:
+  pass_tiabi_check (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_tiabi_check, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual unsigned int execute (function *);
+
+  virtual bool gate (function *fun ATTRIBUTE_UNUSED)
+  {
+    return pru_current_abi == PRU_ABI_TI;
+  }
+
+}; // class pass_tiabi_check
+
+/* Return true if TYPE is a function pointer type, or a structure,
+   union or array having a function pointer type as one of its members.
+   Otherwise return false.  */
+static bool
+chkp_type_has_function_pointer (const_tree type)
+{
+  bool res = false;
+
+  if (POINTER_TYPE_P (type) && FUNC_OR_METHOD_TYPE_P (TREE_TYPE (type)))
+    res = true;
+  else if (RECORD_OR_UNION_TYPE_P (type))
+    {
+      tree field;
+
+      for (field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))
+	if (TREE_CODE (field) == FIELD_DECL)
+	  res = res || chkp_type_has_function_pointer (TREE_TYPE (field));
+    }
+  else if (TREE_CODE (type) == ARRAY_TYPE)
+    res = chkp_type_has_function_pointer (TREE_TYPE (type));
+
+  return res;
+}
+
+/* Check the function type FNTYPE for TI ABI compatibility.  */
+static void
+chk_function_decl (const_tree fntype, location_t call_location)
+{
+  /* GCC does not check if the RETURN VALUE pointer is NULL,
+     so do not allow GCC functions with large return values.  */
+  if (!VOID_TYPE_P (TREE_TYPE (fntype))
+      && pru_return_in_memory (TREE_TYPE (fntype), fntype))
+    error_at (call_location,
+	      "large return values not supported with %<-mabi=ti%> option");
+
+  /* Check this function's arguments.  */
+  for (tree p = TYPE_ARG_TYPES (fntype); p; p = TREE_CHAIN (p))
+    {
+      tree arg_type = TREE_VALUE (p);
+      if (chkp_type_has_function_pointer (arg_type))
+	{
+	  error_at (call_location,
+		    "function pointers not supported with %<-mabi=ti%> option");
+	}
+    }
+}
+
+/* Callback for walk_gimple_seq that checks TP tree for TI ABI compliance.  */
+static tree
+check_op_callback (tree *tp, int *walk_subtrees, void *data)
+{
+  struct walk_stmt_info *wi = (struct walk_stmt_info *) data;
+
+  if (RECORD_OR_UNION_TYPE_P (*tp) || TREE_CODE (*tp) == ENUMERAL_TYPE)
+    {
+      /* Forward declarations have NULL tree type.  Skip them.  */
+      if (TREE_TYPE (*tp) == NULL)
+	return NULL;
+    }
+
+  /* TODO - why does C++ leave INTEGER_TYPE forward declarations around?  */
+  if (TREE_TYPE (*tp) == NULL)
+    return NULL;
+
+  const tree type = TREE_TYPE (*tp);
+
+  /* Direct function calls are allowed, obviously.  */
+  if (TREE_CODE (*tp) == ADDR_EXPR && TREE_CODE (type) == POINTER_TYPE)
+    {
+      const tree ptype = TREE_TYPE (type);
+      if (TREE_CODE (ptype) == FUNCTION_TYPE)
+	return NULL;
+    }
+
+  switch (TREE_CODE (type))
+    {
+    case FUNCTION_TYPE:
+    case METHOD_TYPE:
+	{
+	  /* Note: Do not enforce a small return value.  It is safe to
+	     call any TI ABI function from GCC, since GCC will
+	     never pass NULL.  */
+
+	  /* Check arguments for function pointers.  */
+	  for (tree p = TYPE_ARG_TYPES (type); p; p = TREE_CHAIN (p))
+	    {
+	      tree arg_type = TREE_VALUE (p);
+	      if (chkp_type_has_function_pointer (arg_type))
+		{
+		  error_at (gimple_location (wi->stmt), "function pointers "
+			    "not supported with %<-mabi=ti%> option");
+		}
+	    }
+	  break;
+	}
+    case RECORD_TYPE:
+    case UNION_TYPE:
+    case QUAL_UNION_TYPE:
+    case POINTER_TYPE:
+	{
+	  if (chkp_type_has_function_pointer (type))
+	    {
+	      error_at (gimple_location (wi->stmt),
+			"function pointers not supported with "
+			"%<-mabi=ti%> option");
+	      *walk_subtrees = false;
+	    }
+	  break;
+	}
+    default:
+	  break;
+    }
+  return NULL;
+}
+
+/* Pass implementation.  */
+unsigned
+pass_tiabi_check::execute (function *fun)
+{
+  struct walk_stmt_info wi;
+  const_tree fntype = TREE_TYPE (fun->decl);
+
+  gimple_seq body = gimple_body (current_function_decl);
+
+  memset (&wi, 0, sizeof (wi));
+  wi.info = NULL;
+  wi.want_locations = true;
+
+  /* Check the function body.  */
+  walk_gimple_seq (body, NULL, check_op_callback, &wi);
+
+  /* Check the function declaration.  */
+  chk_function_decl (fntype, fun->function_start_locus);
+
+  return 0;
+}
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_tiabi_check (gcc::context *ctxt)
+{
+  return new pass_tiabi_check (ctxt);
+}
+
+/* Register the TI ABI check pass, as early as possible.  */
+void
+pru_register_abicheck_pass (void)
+{
+  opt_pass *tiabi_check = make_pass_tiabi_check (g);
+  struct register_pass_info tiabi_check_info
+    = { tiabi_check, "*warn_unused_result",
+	1, PASS_POS_INSERT_AFTER
+      };
+  register_pass (&tiabi_check_info);
+}
diff --git a/gcc/config/pru/pru-pragma.c b/gcc/config/pru/pru-pragma.c
new file mode 100644
index 00000000000..11cb5bd625c
--- /dev/null
+++ b/gcc/config/pru/pru-pragma.c
@@ -0,0 +1,90 @@ 
+/* PRU target specific pragmas
+   Copyright (C) 2015-2018 Free Software Foundation, Inc.
+   Contributed by Dimitar Dimitrov <dimitar@dinux.eu>
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#define IN_TARGET_CODE 1
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "alias.h"
+#include "symtab.h"
+#include "tree.h"
+#include "c-family/c-pragma.h"
+#include "c-family/c-common.h"
+#include "diagnostic-core.h"
+#include "cpplib.h"
+#include "pru-protos.h"
+
+
+/* Implements the "CTABLE_ENTRY" pragma.  This pragma takes a
+   CTABLE index and an address, and instructs the compiler that
+   LBCO/SBCO can be used on that base address.
+
+   WARNING: Only immediate constant addresses are currently supported.  */
+static void
+pru_pragma_ctable_entry (cpp_reader * reader ATTRIBUTE_UNUSED)
+{
+  tree ctable_index, base_addr;
+  enum cpp_ttype type;
+
+  type = pragma_lex (&ctable_index);
+  if (type == CPP_NUMBER)
+    {
+      type = pragma_lex (&base_addr);
+      if (type == CPP_NUMBER)
+	{
+	  unsigned int i = tree_to_uhwi (ctable_index);
+	  unsigned HOST_WIDE_INT base = tree_to_uhwi (base_addr);
+
+	  type = pragma_lex (&base_addr);
+	  if (type != CPP_EOF)
+	    error ("junk at end of %<#pragma CTABLE_ENTRY%>");
+	  else if (i >= ARRAY_SIZE (pru_ctable))
+	    error ("%<CTABLE_ENTRY%> index %d is not valid", i);
+	  else if (pru_ctable[i].valid && pru_ctable[i].base != base)
+	    error ("redefinition of %<CTABLE_ENTRY %d%>", i);
+	  else
+	    {
+	      if (base & 0xff)
+		warning (0, "%<CTABLE_ENTRY%> base address is not "
+			    "a multiple of 256");
+	      pru_ctable[i].base = base;
+	      pru_ctable[i].valid = true;
+	    }
+	  return;
+	}
+    }
+  error ("malformed %<#pragma CTABLE_ENTRY%> variable address");
+}
+
+/* Implements REGISTER_TARGET_PRAGMAS.  */
+void
+pru_register_pragmas (void)
+{
+  c_register_pragma (NULL, "ctable_entry", pru_pragma_ctable_entry);
+  c_register_pragma (NULL, "CTABLE_ENTRY", pru_pragma_ctable_entry);
+}
diff --git a/gcc/config/pru/pru-protos.h b/gcc/config/pru/pru-protos.h
new file mode 100644
index 00000000000..73ab76e076c
--- /dev/null
+++ b/gcc/config/pru/pru-protos.h
@@ -0,0 +1,72 @@ 
+/* Subroutine declarations for TI PRU target support.
+   Copyright (C) 2014-2018 Free Software Foundation, Inc.
+   Contributed by Dimitar Dimitrov <dimitar@dinux.eu>
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_PRU_PROTOS_H
+#define GCC_PRU_PROTOS_H
+
+struct pru_ctable_entry {
+  bool valid;
+  unsigned HOST_WIDE_INT base;
+};
+
+extern struct pru_ctable_entry pru_ctable[32];
+
+extern int pru_initial_elimination_offset (int, int);
+extern int pru_can_use_return_insn (void);
+extern void pru_expand_prologue (void);
+extern void pru_expand_epilogue (bool);
+extern void pru_function_profiler (FILE *, int);
+
+void pru_register_pragmas (void);
+
+#ifdef RTX_CODE
+extern rtx pru_get_return_address (int);
+extern int pru_hard_regno_rename_ok (unsigned int, unsigned int);
+
+extern const char * pru_output_sign_extend (rtx *);
+extern const char * pru_output_signed_cbranch (rtx *, bool);
+extern const char * pru_output_signed_cbranch_ubyteop2 (rtx *, bool);
+extern const char * pru_output_signed_cbranch_zeroop2 (rtx *, bool);
+
+extern rtx pru_expand_fp_compare (rtx comparison, machine_mode mode);
+
+extern void pru_emit_doloop (rtx *, int);
+
+extern bool pru_regno_ok_for_base_p (int, bool);
+
+static inline bool
+pru_regno_ok_for_index_p (int regno, bool strict_p)
+{
+  /* Selection logic is the same - PRU instructions are quite orthogonal.  */
+  return pru_regno_ok_for_base_p (regno, strict_p);
+}
+
+extern int pru_get_ctable_exact_base_index (unsigned HOST_WIDE_INT caddr);
+extern int pru_get_ctable_base_index (unsigned HOST_WIDE_INT caddr);
+extern int pru_get_ctable_base_offset (unsigned HOST_WIDE_INT caddr);
+
+extern void pru_register_abicheck_pass (void);
+#endif /* RTX_CODE */
+
+#ifdef TREE_CODE
+bool pru_return_in_memory (const_tree type, const_tree fntype ATTRIBUTE_UNUSED);
+#endif /* TREE_CODE */
+
+#endif /* GCC_PRU_PROTOS_H */
diff --git a/gcc/config/pru/pru.c b/gcc/config/pru/pru.c
new file mode 100644
index 00000000000..f1ce8bfeddd
--- /dev/null
+++ b/gcc/config/pru/pru.c
@@ -0,0 +1,3008 @@ 
+/* Target machine subroutines for TI PRU.
+   Copyright (C) 2014-2018 Free Software Foundation, Inc.
+   Dimitar Dimitrov <dimitar@dinux.eu>
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#define IN_TARGET_CODE 1
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "target.h"
+#include "rtl.h"
+#include "tree.h"
+#include "stringpool.h"
+#include "attribs.h"
+#include "df.h"
+#include "memmodel.h"
+#include "tm_p.h"
+#include "optabs.h"
+#include "regs.h"
+#include "emit-rtl.h"
+#include "recog.h"
+#include "diagnostic-core.h"
+#include "output.h"
+#include "insn-attr.h"
+#include "flags.h"
+#include "explow.h"
+#include "calls.h"
+#include "varasm.h"
+#include "expr.h"
+#include "toplev.h"
+#include "langhooks.h"
+#include "cfgrtl.h"
+#include "stor-layout.h"
+#include "dumpfile.h"
+#include "builtins.h"
+#include "pru-protos.h"
+
+/* This file should be included last.  */
+#include "target-def.h"
+
+#define INIT_ARRAY_ENTRY_BYTES	2
+
+/* Global PRU CTABLE entries, filled in by pragmas, and used for fast
+   addressing via LBCO/SBCO instructions.  */
+struct pru_ctable_entry pru_ctable[32];
+
+/* Forward function declarations.  */
+static bool prologue_saved_reg_p (int);
+static void pru_reorg_loop (rtx_insn *);
+
+struct GTY (()) machine_function
+{
+  /* Current frame information, to be filled in by pru_compute_frame_layout
+     with register save masks, and offsets for the current function.  */
+
+  /* Mask of registers to save.  */
+  HARD_REG_SET save_mask;
+  /* Number of bytes that the entire frame takes up.  */
+  int total_size;
+  /* Number of bytes that variables take up.  */
+  int var_size;
+  /* Number of bytes that outgoing arguments take up.  */
+  int args_size;
+  /* Number of bytes needed to store registers in frame.  */
+  int save_reg_size;
+  /* Offset from new stack pointer to store registers.  */
+  int save_regs_offset;
+  /* Offset from save_regs_offset to store frame pointer register.  */
+  int fp_save_offset;
+  /* != 0 if frame layout already calculated.  */
+  int initialized;
+  /* Number of doloop tags used so far.  */
+  int doloop_tags;
+  /* True if the last tag was allocated to a doloop_end.  */
+  bool doloop_tag_from_end;
+};
+
+/* Stack layout and calling conventions.  */
+
+#define PRU_STACK_ALIGN(LOC)  ROUND_UP ((LOC), STACK_BOUNDARY / BITS_PER_UNIT)
+
+/* Compute and cache the current function's frame layout.  Return the
+   total frame size in bytes.  */
+static int
+pru_compute_frame_layout (void)
+{
+  int regno;
+  HARD_REG_SET *save_mask;
+  int total_size;
+  int var_size;
+  int out_args_size;
+  int save_reg_size;
+
+  if (cfun->machine->initialized)
+    return cfun->machine->total_size;
+
+  save_mask = &cfun->machine->save_mask;
+  CLEAR_HARD_REG_SET (*save_mask);
+
+  var_size = PRU_STACK_ALIGN ((HOST_WIDE_INT) get_frame_size ());
+  out_args_size = PRU_STACK_ALIGN ((HOST_WIDE_INT) crtl->outgoing_args_size);
+  total_size = var_size + out_args_size;
+
+  /* Calculate space needed for gp registers.  */
+  save_reg_size = 0;
+  for (regno = 0; regno <= LAST_GP_REG; regno++)
+    if (prologue_saved_reg_p (regno))
+      {
+	SET_HARD_REG_BIT (*save_mask, regno);
+	save_reg_size += 1;
+      }
+
+  cfun->machine->fp_save_offset = 0;
+  if (TEST_HARD_REG_BIT (*save_mask, HARD_FRAME_POINTER_REGNUM))
+    {
+      int fp_save_offset = 0;
+      for (regno = 0; regno < HARD_FRAME_POINTER_REGNUM; regno++)
+	if (TEST_HARD_REG_BIT (*save_mask, regno))
+	  fp_save_offset += 1;
+
+      cfun->machine->fp_save_offset = fp_save_offset;
+    }
+
+  save_reg_size = PRU_STACK_ALIGN (save_reg_size);
+  total_size += save_reg_size;
+  total_size += PRU_STACK_ALIGN (crtl->args.pretend_args_size);
+
+  /* Save other computed information.  */
+  cfun->machine->total_size = total_size;
+  cfun->machine->var_size = var_size;
+  cfun->machine->args_size = out_args_size;
+  cfun->machine->save_reg_size = save_reg_size;
+  cfun->machine->initialized = reload_completed;
+  cfun->machine->save_regs_offset = out_args_size + var_size;
+
+  return total_size;
+}
+
+/* Emit efficient RTL equivalent of ADD3 with the given const_int for
+   frame-related registers.
+     op0	  - Destination register.
+     op1	  - First addendum operand (a register).
+     addendum     - Second addendum operand (a constant).
+     kind	  - Note kind.  REG_NOTE_MAX if no note should be added.
+     reg_note_rtx - Reg note RTX.  NULL if it should be computed
+		    automatically.  */
+static rtx
+pru_add3_frame_adjust (rtx op0, rtx op1, int addendum,
+		       const enum reg_note kind, rtx reg_note_rtx)
+{
+  rtx insn;
+
+  rtx op0_adjust = gen_rtx_SET (op0, plus_constant (Pmode, op1, addendum));
+
+  if (UBYTE_INT (addendum) || UBYTE_INT (-addendum))
+    insn = emit_insn (op0_adjust);
+  else
+    {
+      /* Help the compiler to cope with an arbitrary integer constant.
+	 Reload has finished, so we cannot expect the compiler to
+	 auto-allocate a temporary register.  But we know that the
+	 call-saved registers are not live yet, so we use one of them.  */
+      rtx tmpreg = gen_rtx_REG (Pmode, PROLOGUE_TEMP_REGNO);
+      if (addendum < 0)
+	{
+	  emit_insn (gen_rtx_SET (tmpreg, gen_int_mode (-addendum, Pmode)));
+	  insn = emit_insn (gen_sub3_insn (op0, op1, tmpreg));
+	}
+      else
+	{
+	  emit_insn (gen_rtx_SET (tmpreg, gen_int_mode (addendum, Pmode)));
+	  insn = emit_insn (gen_add3_insn (op0, op1, tmpreg));
+	}
+    }
+
+  /* Attach a note indicating what happened.  */
+  if (reg_note_rtx == NULL_RTX)
+    reg_note_rtx = copy_rtx (op0_adjust);
+  if (kind != REG_NOTE_MAX)
+    add_reg_note (insn, kind, reg_note_rtx);
+
+  RTX_FRAME_RELATED_P (insn) = 1;
+
+  return insn;
+}
+
+/* Add a const_int to the stack pointer register.  */
+static rtx
+pru_add_to_sp (int addendum, const enum reg_note kind)
+{
+  return pru_add3_frame_adjust (stack_pointer_rtx, stack_pointer_rtx,
+				addendum, kind, NULL_RTX);
+}
+
+/* Helper function used during prologue/epilogue.  Emits a single LBBO/SBBO
+   instruction for load/store of the next group of consecutive registers.  */
+static int
+xbbo_next_reg_cluster (int regno_start, int *sp_offset, bool do_store)
+{
+  int regno, nregs, i;
+  rtx addr;
+  rtx_insn *insn;
+
+  nregs = 0;
+
+  /* Skip the empty slots.  */
+  for (; regno_start <= LAST_GP_REG;)
+    if (TEST_HARD_REG_BIT (cfun->machine->save_mask, regno_start))
+      break;
+    else
+      regno_start++;
+
+  /* Find the largest group of consecutive registers to load/store.  */
+  for (regno = regno_start; regno <= LAST_GP_REG;)
+    if (TEST_HARD_REG_BIT (cfun->machine->save_mask, regno))
+      {
+	regno++;
+	nregs++;
+      }
+    else
+      break;
+
+  if (!nregs)
+    return -1;
+
+  gcc_assert (UBYTE_INT (*sp_offset));
+
+  /* OK, transfer this register group.  */
+  addr = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
+		       gen_int_mode (*sp_offset, Pmode));
+
+  if (do_store)
+    insn = targetm.gen_store_multiple (gen_frame_mem (BLKmode, addr),
+				       gen_rtx_REG (QImode, regno_start),
+				       GEN_INT (nregs));
+  else
+    insn = targetm.gen_load_multiple (gen_rtx_REG (QImode, regno_start),
+				      gen_frame_mem (BLKmode, addr),
+				      GEN_INT (nregs));
+
+  gcc_assert (reload_completed);
+  gcc_assert (insn);
+  emit_insn (insn);
+
+  /* Tag as frame-related.  */
+  RTX_FRAME_RELATED_P (insn) = 1;
+
+  if (!do_store)
+    {
+      /* Tag epilogue unwind notes.  */
+      for (i = regno_start; i < (regno_start + nregs); i++)
+	add_reg_note (insn, REG_CFA_RESTORE, gen_rtx_REG (QImode, i));
+    }
+
+  /* Increment and save offset in anticipation of the next register group.  */
+  *sp_offset += nregs * UNITS_PER_WORD;
+
+  return regno_start + nregs;
+}
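The cluster scan above can be modeled without any GCC infrastructure. In this sketch the `HARD_REG_SET` save mask is replaced by a plain 32-bit mask and the 32-register limit stands in for `LAST_GP_REG`; both are assumptions for the example only.

```c
#include <assert.h>

/* Simplified model of xbbo_next_reg_cluster's scan: find the next run of
   consecutive saved registers at or after START, report its first register
   and length, and return the register just past the run, or -1 when the
   mask is exhausted.  */
static int
next_cluster (unsigned mask, int start, int *first, int *nregs)
{
  while (start < 32 && !(mask & (1u << start)))
    start++;			/* Skip the empty slots.  */
  if (start == 32)
    return -1;
  *first = start;
  *nregs = 0;
  while (start < 32 && (mask & (1u << start)))
    {
      start++;			/* Extend the consecutive run.  */
      ++*nregs;
    }
  return start;
}
```

Iterating `next_cluster` until it returns -1 visits every run exactly once, mirroring the prologue/epilogue loops that call `xbbo_next_reg_cluster` until it signals exhaustion.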
+
+/* Emit function prologue.  */
+void
+pru_expand_prologue (void)
+{
+  int regno_start;
+  int total_frame_size;
+  int sp_offset;      /* Offset from base_reg to final stack value.  */
+  int save_regs_base; /* Offset from base_reg to register save area.  */
+  int save_offset;    /* Temporary offset to currently saved register group.  */
+  rtx insn;
+
+  total_frame_size = pru_compute_frame_layout ();
+
+  if (flag_stack_usage_info)
+    current_function_static_stack_size = total_frame_size;
+
+  /* Decrement the stack pointer.  */
+  if (!UBYTE_INT (total_frame_size))
+    {
+      /* We need an intermediate adjustment: first make the stack
+	 pointer point at the register spill block.  */
+      insn = pru_add_to_sp (cfun->machine->save_regs_offset
+			     - total_frame_size,
+			     REG_NOTE_MAX);
+      save_regs_base = 0;
+      sp_offset = -cfun->machine->save_regs_offset;
+    }
+  else if (total_frame_size)
+    {
+      insn = emit_insn (gen_sub2_insn (stack_pointer_rtx,
+				       gen_int_mode (total_frame_size,
+						     Pmode)));
+      RTX_FRAME_RELATED_P (insn) = 1;
+      save_regs_base = cfun->machine->save_regs_offset;
+      sp_offset = 0;
+    }
+  else
+    save_regs_base = sp_offset = 0;
+
+  regno_start = 0;
+  save_offset = save_regs_base;
+  do
+    regno_start = xbbo_next_reg_cluster (regno_start, &save_offset, true);
+  while (regno_start >= 0);
+
+  if (frame_pointer_needed)
+    {
+      int fp_save_offset = save_regs_base + cfun->machine->fp_save_offset;
+      pru_add3_frame_adjust (hard_frame_pointer_rtx,
+			     stack_pointer_rtx,
+			     fp_save_offset,
+			     REG_NOTE_MAX,
+			     NULL_RTX);
+    }
+
+  if (sp_offset)
+    pru_add_to_sp (sp_offset, REG_FRAME_RELATED_EXPR);
+
+  /* If we are profiling, make sure no instructions are scheduled before
+     the call to mcount.  */
+  if (crtl->profile)
+    emit_insn (gen_blockage ());
+}
+
+/* Emit function epilogue.  */
+void
+pru_expand_epilogue (bool sibcall_p)
+{
+  rtx cfa_adj;
+  int total_frame_size;
+  int sp_adjust, save_offset;
+  int regno_start;
+
+  if (!sibcall_p && pru_can_use_return_insn ())
+    {
+      emit_jump_insn (gen_return ());
+      return;
+    }
+
+  emit_insn (gen_blockage ());
+
+  total_frame_size = pru_compute_frame_layout ();
+  if (frame_pointer_needed)
+    {
+      /* Recover the stack pointer.  */
+      cfa_adj = plus_constant (Pmode, stack_pointer_rtx,
+			       (total_frame_size
+				- cfun->machine->save_regs_offset));
+      pru_add3_frame_adjust (stack_pointer_rtx,
+			     hard_frame_pointer_rtx,
+			     -cfun->machine->fp_save_offset,
+			     REG_CFA_DEF_CFA,
+			     cfa_adj);
+
+      save_offset = 0;
+      sp_adjust = total_frame_size - cfun->machine->save_regs_offset;
+    }
+  else if (!UBYTE_INT (total_frame_size))
+    {
+      pru_add_to_sp (cfun->machine->save_regs_offset,
+		     REG_CFA_ADJUST_CFA);
+      save_offset = 0;
+      sp_adjust = total_frame_size - cfun->machine->save_regs_offset;
+    }
+  else
+    {
+      save_offset = cfun->machine->save_regs_offset;
+      sp_adjust = total_frame_size;
+    }
+
+  regno_start = 0;
+  do
+    regno_start = xbbo_next_reg_cluster (regno_start, &save_offset, false);
+  while (regno_start >= 0);
+
+  /* Emit a blockage insn here to keep these insns from being moved to
+     an earlier spot in the epilogue.
+
+     This is necessary as we must not cut the stack back before all the
+     restores are finished.  */
+  emit_insn (gen_blockage ());
+
+  if (sp_adjust)
+    pru_add_to_sp (sp_adjust, REG_CFA_ADJUST_CFA);
+
+  if (!sibcall_p)
+    emit_jump_insn (gen_simple_return ());
+}
+
+/* Implement RETURN_ADDR_RTX.  Note, we do not support moving
+   back to a previous frame.  */
+rtx
+pru_get_return_address (int count)
+{
+  if (count != 0)
+    return NULL_RTX;
+
+  /* Return r3.w2.  */
+  return get_hard_reg_initial_val (HImode, RA_REGNO);
+}
+
+/* Implement FUNCTION_PROFILER macro.  */
+void
+pru_function_profiler (FILE *file, int labelno ATTRIBUTE_UNUSED)
+{
+  fprintf (file, "\tmov\tr1, ra\n");
+  fprintf (file, "\tcall\t_mcount\n");
+  fprintf (file, "\tmov\tra, r1\n");
+}
+
+/* Dump stack layout.  */
+static void
+pru_dump_frame_layout (FILE *file)
+{
+  fprintf (file, "\t%s Current Frame Info\n", ASM_COMMENT_START);
+  fprintf (file, "\t%s total_size = %d\n", ASM_COMMENT_START,
+	   cfun->machine->total_size);
+  fprintf (file, "\t%s var_size = %d\n", ASM_COMMENT_START,
+	   cfun->machine->var_size);
+  fprintf (file, "\t%s args_size = %d\n", ASM_COMMENT_START,
+	   cfun->machine->args_size);
+  fprintf (file, "\t%s save_reg_size = %d\n", ASM_COMMENT_START,
+	   cfun->machine->save_reg_size);
+  fprintf (file, "\t%s initialized = %d\n", ASM_COMMENT_START,
+	   cfun->machine->initialized);
+  fprintf (file, "\t%s save_regs_offset = %d\n", ASM_COMMENT_START,
+	   cfun->machine->save_regs_offset);
+  fprintf (file, "\t%s is_leaf = %d\n", ASM_COMMENT_START,
+	   crtl->is_leaf);
+  fprintf (file, "\t%s frame_pointer_needed = %d\n", ASM_COMMENT_START,
+	   frame_pointer_needed);
+  fprintf (file, "\t%s pretend_args_size = %d\n", ASM_COMMENT_START,
+	   crtl->args.pretend_args_size);
+}
+
+/* Return true if REGNO should be saved in the prologue.  */
+static bool
+prologue_saved_reg_p (int regno)
+{
+  gcc_assert (GP_REG_P (regno));
+
+  if (df_regs_ever_live_p (regno) && !call_used_regs[regno])
+    return true;
+
+  /* 32-bit FP.  */
+  if (frame_pointer_needed
+      && regno >= HARD_FRAME_POINTER_REGNUM
+      && regno < HARD_FRAME_POINTER_REGNUM + GET_MODE_SIZE (Pmode))
+    return true;
+
+  /* 16-bit RA.  */
+  if (regno == RA_REGNO && df_regs_ever_live_p (RA_REGNO))
+    return true;
+  if (regno == RA_REGNO + 1 && df_regs_ever_live_p (RA_REGNO + 1))
+    return true;
+
+  return false;
+}
+
+/* Implement TARGET_CAN_ELIMINATE.  */
+static bool
+pru_can_eliminate (const int from ATTRIBUTE_UNUSED, const int to)
+{
+  if (to == STACK_POINTER_REGNUM)
+    return !frame_pointer_needed;
+  return true;
+}
+
+/* Implement INITIAL_ELIMINATION_OFFSET macro.  */
+int
+pru_initial_elimination_offset (int from, int to)
+{
+  int offset;
+
+  pru_compute_frame_layout ();
+
+  /* Set OFFSET to the offset from the stack pointer.  */
+  switch (from)
+    {
+    case FRAME_POINTER_REGNUM:
+      offset = cfun->machine->args_size;
+      break;
+
+    case ARG_POINTER_REGNUM:
+      offset = cfun->machine->total_size;
+      offset -= crtl->args.pretend_args_size;
+      break;
+
+    default:
+      gcc_unreachable ();
+    }
+
+  /* If we are asked for the frame pointer offset, then adjust OFFSET
+     by the offset from the frame pointer to the stack pointer.  */
+  if (to == HARD_FRAME_POINTER_REGNUM)
+    offset -= (cfun->machine->save_regs_offset
+	       + cfun->machine->fp_save_offset);
+
+  return offset;
+}
+
+/* Return nonzero if this function is known to have a null epilogue.
+   This allows the optimizer to omit jumps to jumps if no stack
+   frame was created.  */
+int
+pru_can_use_return_insn (void)
+{
+  if (!reload_completed || crtl->profile)
+    return 0;
+
+  return pru_compute_frame_layout () == 0;
+}
+
+/* Implement TARGET_MODES_TIEABLE_P.  */
+
+static bool
+pru_modes_tieable_p (machine_mode mode1, machine_mode mode2)
+{
+  return (mode1 == mode2
+	  || (GET_MODE_SIZE (mode1) <= 4 && GET_MODE_SIZE (mode2) <= 4));
+}
+
+/* Implement TARGET_HARD_REGNO_MODE_OK.  */
+
+static bool
+pru_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
+{
+  switch (GET_MODE_SIZE (mode))
+    {
+    case 1: return true;
+    case 2: return (regno % 4) <= 2;
+    case 4: return (regno % 4) == 0;
+    default: return (regno % 4) == 0; /* Not sure why VOIDmode is passed.  */
+    }
+}
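The placement rule in `pru_hard_regno_mode_ok` can be restated stand-alone. The function and argument names below are illustrative, not part of the patch: PRU hard registers are byte-granular, so a value of a given size must not straddle a 32-bit register boundary.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative restatement of the subword placement rule:
   REGNO is a byte-granular hard register number, SIZE a mode size.  */
static bool
regno_ok_for_size (unsigned int regno, unsigned int size)
{
  switch (size)
    {
    case 1: return true;		/* Any byte slot works.  */
    case 2: return (regno % 4) <= 2;	/* Halfword must stay in one reg.  */
    default: return (regno % 4) == 0;	/* Words start on a register.  */
    }
}
```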
+
+/* Implement `TARGET_HARD_REGNO_SCRATCH_OK'.
+   Returns true if REGNO is safe to be allocated as a scratch
+   register (for a define_peephole2) in the current function.  */
+
+static bool
+pru_hard_regno_scratch_ok (unsigned int regno)
+{
+  /* Don't allow hard registers that might be part of the frame pointer.
+     Some places in the compiler just test for [HARD_]FRAME_POINTER_REGNUM
+     and don't handle a frame pointer that spans more than one register.  */
+
+  if ((!reload_completed || frame_pointer_needed)
+      && ((regno >= HARD_FRAME_POINTER_REGNUM
+	   && regno <= HARD_FRAME_POINTER_REGNUM + 3)
+	  || (regno >= FRAME_POINTER_REGNUM
+	      && regno <= FRAME_POINTER_REGNUM + 3)))
+    {
+      return false;
+    }
+
+  return true;
+}
+
+
+/* Worker function for `HARD_REGNO_RENAME_OK'.
+   Return nonzero if register OLD_REG can be renamed to register NEW_REG.  */
+
+int
+pru_hard_regno_rename_ok (unsigned int old_reg,
+			  unsigned int new_reg)
+{
+  /* Don't allow hard registers that might be part of the frame pointer.
+     Some places in the compiler just test for [HARD_]FRAME_POINTER_REGNUM
+     and don't handle a frame pointer that spans more than one register.  */
+  if ((!reload_completed || frame_pointer_needed)
+      && ((old_reg >= HARD_FRAME_POINTER_REGNUM
+	   && old_reg <= HARD_FRAME_POINTER_REGNUM + 3)
+	  || (old_reg >= FRAME_POINTER_REGNUM
+	      && old_reg <= FRAME_POINTER_REGNUM + 3)
+	  || (new_reg >= HARD_FRAME_POINTER_REGNUM
+	      && new_reg <= HARD_FRAME_POINTER_REGNUM + 3)
+	  || (new_reg >= FRAME_POINTER_REGNUM
+	      && new_reg <= FRAME_POINTER_REGNUM + 3)))
+    {
+      return 0;
+    }
+
+  return 1;
+}
+
+/* Allocate a chunk of memory for per-function machine-dependent data.  */
+static struct machine_function *
+pru_init_machine_status (void)
+{
+  return ggc_cleared_alloc<machine_function> ();
+}
+
+/* Implement TARGET_OPTION_OVERRIDE.  */
+static void
+pru_option_override (void)
+{
+#ifdef SUBTARGET_OVERRIDE_OPTIONS
+  SUBTARGET_OVERRIDE_OPTIONS;
+#endif
+
+  /* Check for unsupported options.  */
+  if (flag_pic == 1)
+    warning (OPT_fpic, "%<-fpic%> is not supported");
+  if (flag_pic == 2)
+    warning (OPT_fPIC, "%<-fPIC%> is not supported");
+  if (flag_pie == 1)
+    warning (OPT_fpie, "%<-fpie%> is not supported");
+  if (flag_pie == 2)
+    warning (OPT_fPIE, "%<-fPIE%> is not supported");
+
+  /* QBxx conditional branching cannot cope with block reordering.  */
+  if (flag_reorder_blocks_and_partition)
+    {
+      inform (input_location, "%<-freorder-blocks-and-partition%> "
+			      "not supported on this architecture");
+      flag_reorder_blocks_and_partition = 0;
+      flag_reorder_blocks = 1;
+    }
+
+  /* Function to allocate machine-dependent function status.  */
+  init_machine_status = &pru_init_machine_status;
+
+  /* Save the initial options in case the user does function specific
+     options.  */
+  target_option_default_node = target_option_current_node
+    = build_target_option_node (&global_options);
+
+  /* Due to difficulties in implementing the TI ABI with GCC,
+     at least check and error-out if GCC cannot compile a
+     compliant output.  */
+  pru_register_abicheck_pass ();
+}
+
+/* Compute a (partial) cost for rtx X.  Return true if the complete
+   cost has been computed, and false if subexpressions should be
+   scanned.  In either case, *TOTAL contains the cost result.  */
+static bool
+pru_rtx_costs (rtx x, machine_mode mode ATTRIBUTE_UNUSED,
+	       int outer_code, int opno ATTRIBUTE_UNUSED,
+	       int *total, bool speed ATTRIBUTE_UNUSED)
+{
+  const int code = GET_CODE (x);
+
+  switch (code)
+    {
+      case CONST_INT:
+	if (UBYTE_INT (INTVAL (x)))
+	  {
+	    *total = COSTS_N_INSNS (0);
+	    return true;
+	  }
+	else if (outer_code == MEM && ctable_addr_operand (x, VOIDmode))
+	  {
+	    *total = COSTS_N_INSNS (0);
+	    return true;
+	  }
+	else if (UHWORD_INT (INTVAL (x)))
+	  {
+	    *total = COSTS_N_INSNS (1);
+	    return true;
+	  }
+	else
+	  {
+	    *total = COSTS_N_INSNS (2);
+	    return true;
+	  }
+
+      case LABEL_REF:
+      case SYMBOL_REF:
+      case CONST:
+      case CONST_DOUBLE:
+	{
+	  *total = COSTS_N_INSNS (1);
+	  return true;
+	}
+      case SET:
+	{
+	  /* A SET doesn't have a mode, so let's look at the SET_DEST to get
+	     the mode for the factor.  */
+	  mode = GET_MODE (SET_DEST (x));
+
+	  if (GET_MODE_SIZE (mode) <= GET_MODE_SIZE (SImode)
+	      && (GET_CODE (SET_SRC (x)) == ZERO_EXTEND
+		  || outer_code == ZERO_EXTEND))
+	    {
+	      *total = 0;
+	    }
+	  else
+	    {
+	      /* SI move has the same cost as a QI move.  */
+	      int factor = GET_MODE_SIZE (mode) / GET_MODE_SIZE (SImode);
+	      if (factor == 0)
+		factor = 1;
+	      *total = factor * COSTS_N_INSNS (1);
+	    }
+
+	  return false;
+	}
+
+      case MULT:
+	{
+	  *total = COSTS_N_INSNS (8);
+	  return false;
+	}
+      case PLUS:
+	{
+	  rtx op0 = XEXP (x, 0);
+	  rtx op1 = XEXP (x, 1);
+	  if (outer_code == MEM
+	      && ((REG_P (op0) && reg_or_ubyte_operand (op1, VOIDmode))
+		  || (REG_P (op1) && reg_or_ubyte_operand (op0, VOIDmode))
+		  || (ctable_addr_operand (op0, VOIDmode) && op1 == NULL_RTX)
+		  || (ctable_addr_operand (op1, VOIDmode) && op0 == NULL_RTX)
+		  || (ctable_base_operand (op0, VOIDmode) && REG_P (op1))
+		  || (ctable_base_operand (op1, VOIDmode) && REG_P (op0))))
+	    {
+	      /* CTABLE or REG base addressing - PLUS comes for free.  */
+	      *total = COSTS_N_INSNS (0);
+	      return true;
+	    }
+	  else
+	    {
+	      *total = COSTS_N_INSNS (1);
+	      return false;
+	    }
+	}
+      case SIGN_EXTEND:
+	{
+	  *total = COSTS_N_INSNS (3);
+	  return false;
+	}
+      case ASHIFTRT:
+	{
+	  rtx op1 = XEXP (x, 1);
+	  if (const_1_operand (op1, VOIDmode))
+	    *total = COSTS_N_INSNS (2);
+	  else
+	    *total = COSTS_N_INSNS (6);
+	  return false;
+	}
+      case ZERO_EXTRACT:
+	{
+	  rtx op2 = XEXP (x, 2);
+	  if ((outer_code == EQ || outer_code == NE)
+	      && CONST_INT_P (op2)
+	      && INTVAL (op2) == 1)
+	    {
+	      /* Branch if bit is set/clear is a single instruction.  */
+	      *total = COSTS_N_INSNS (0);
+	      return true;
+	    }
+	  else
+	    {
+	      *total = COSTS_N_INSNS (2);
+	      return false;
+	    }
+	}
+
+      case ZERO_EXTEND:
+	{
+	  *total = COSTS_N_INSNS (0);
+	  return false;
+	}
+
+      default:
+	{
+	  /* Do not factor mode size in the cost.  */
+	  *total = COSTS_N_INSNS (1);
+	  return false;
+	}
+    }
+}
+
+/* Implement TARGET_PREFERRED_RELOAD_CLASS.  */
+static reg_class_t
+pru_preferred_reload_class (rtx x ATTRIBUTE_UNUSED, reg_class_t regclass)
+{
+  return regclass == NO_REGS ? GENERAL_REGS : regclass;
+}
+
+static GTY(()) rtx eqdf_libfunc;
+static GTY(()) rtx nedf_libfunc;
+static GTY(()) rtx ledf_libfunc;
+static GTY(()) rtx ltdf_libfunc;
+static GTY(()) rtx gedf_libfunc;
+static GTY(()) rtx gtdf_libfunc;
+static GTY(()) rtx eqsf_libfunc;
+static GTY(()) rtx nesf_libfunc;
+static GTY(()) rtx lesf_libfunc;
+static GTY(()) rtx ltsf_libfunc;
+static GTY(()) rtx gesf_libfunc;
+static GTY(()) rtx gtsf_libfunc;
+
+/* Implement the TARGET_INIT_LIBFUNCS macro.  We use this to rename library
+   functions to match the PRU ABI.  */
+
+static void
+pru_init_libfuncs (void)
+{
+  /* Double-precision floating-point arithmetic.  */
+  set_optab_libfunc (add_optab, DFmode, "__pruabi_addd");
+  set_optab_libfunc (sdiv_optab, DFmode, "__pruabi_divd");
+  set_optab_libfunc (smul_optab, DFmode, "__pruabi_mpyd");
+  set_optab_libfunc (neg_optab, DFmode, "__pruabi_negd");
+  set_optab_libfunc (sub_optab, DFmode, "__pruabi_subd");
+
+  /* Single-precision floating-point arithmetic.  */
+  set_optab_libfunc (add_optab, SFmode, "__pruabi_addf");
+  set_optab_libfunc (sdiv_optab, SFmode, "__pruabi_divf");
+  set_optab_libfunc (smul_optab, SFmode, "__pruabi_mpyf");
+  set_optab_libfunc (neg_optab, SFmode, "__pruabi_negf");
+  set_optab_libfunc (sub_optab, SFmode, "__pruabi_subf");
+
+  /* Floating-point comparisons.  */
+  eqsf_libfunc = init_one_libfunc ("__pruabi_eqf");
+  nesf_libfunc = init_one_libfunc ("__pruabi_neqf");
+  lesf_libfunc = init_one_libfunc ("__pruabi_lef");
+  ltsf_libfunc = init_one_libfunc ("__pruabi_ltf");
+  gesf_libfunc = init_one_libfunc ("__pruabi_gef");
+  gtsf_libfunc = init_one_libfunc ("__pruabi_gtf");
+  eqdf_libfunc = init_one_libfunc ("__pruabi_eqd");
+  nedf_libfunc = init_one_libfunc ("__pruabi_neqd");
+  ledf_libfunc = init_one_libfunc ("__pruabi_led");
+  ltdf_libfunc = init_one_libfunc ("__pruabi_ltd");
+  gedf_libfunc = init_one_libfunc ("__pruabi_ged");
+  gtdf_libfunc = init_one_libfunc ("__pruabi_gtd");
+
+  set_optab_libfunc (eq_optab, SFmode, NULL);
+  set_optab_libfunc (ne_optab, SFmode, "__pruabi_neqf");
+  set_optab_libfunc (gt_optab, SFmode, NULL);
+  set_optab_libfunc (ge_optab, SFmode, NULL);
+  set_optab_libfunc (lt_optab, SFmode, NULL);
+  set_optab_libfunc (le_optab, SFmode, NULL);
+  set_optab_libfunc (unord_optab, SFmode, "__pruabi_unordf");
+  set_optab_libfunc (eq_optab, DFmode, NULL);
+  set_optab_libfunc (ne_optab, DFmode, "__pruabi_neqd");
+  set_optab_libfunc (gt_optab, DFmode, NULL);
+  set_optab_libfunc (ge_optab, DFmode, NULL);
+  set_optab_libfunc (lt_optab, DFmode, NULL);
+  set_optab_libfunc (le_optab, DFmode, NULL);
+  set_optab_libfunc (unord_optab, DFmode, "__pruabi_unordd");
+
+  /* Floating-point to integer conversions.  */
+  set_conv_libfunc (sfix_optab, SImode, DFmode, "__pruabi_fixdi");
+  set_conv_libfunc (ufix_optab, SImode, DFmode, "__pruabi_fixdu");
+  set_conv_libfunc (sfix_optab, DImode, DFmode, "__pruabi_fixdlli");
+  set_conv_libfunc (ufix_optab, DImode, DFmode, "__pruabi_fixdull");
+  set_conv_libfunc (sfix_optab, SImode, SFmode, "__pruabi_fixfi");
+  set_conv_libfunc (ufix_optab, SImode, SFmode, "__pruabi_fixfu");
+  set_conv_libfunc (sfix_optab, DImode, SFmode, "__pruabi_fixflli");
+  set_conv_libfunc (ufix_optab, DImode, SFmode, "__pruabi_fixfull");
+
+  /* Conversions between floating types.  */
+  set_conv_libfunc (trunc_optab, SFmode, DFmode, "__pruabi_cvtdf");
+  set_conv_libfunc (sext_optab, DFmode, SFmode, "__pruabi_cvtfd");
+
+  /* Integer to floating-point conversions.  */
+  set_conv_libfunc (sfloat_optab, DFmode, SImode, "__pruabi_fltid");
+  set_conv_libfunc (ufloat_optab, DFmode, SImode, "__pruabi_fltud");
+  set_conv_libfunc (sfloat_optab, DFmode, DImode, "__pruabi_fltllid");
+  set_conv_libfunc (ufloat_optab, DFmode, DImode, "__pruabi_fltulld");
+  set_conv_libfunc (sfloat_optab, SFmode, SImode, "__pruabi_fltif");
+  set_conv_libfunc (ufloat_optab, SFmode, SImode, "__pruabi_fltuf");
+  set_conv_libfunc (sfloat_optab, SFmode, DImode, "__pruabi_fltllif");
+  set_conv_libfunc (ufloat_optab, SFmode, DImode, "__pruabi_fltullf");
+
+  /* Long long.  */
+  set_optab_libfunc (ashr_optab, DImode, "__pruabi_asrll");
+  set_optab_libfunc (smul_optab, DImode, "__pruabi_mpyll");
+  set_optab_libfunc (ashl_optab, DImode, "__pruabi_lslll");
+  set_optab_libfunc (lshr_optab, DImode, "__pruabi_lsrll");
+
+  set_optab_libfunc (sdiv_optab, SImode, "__pruabi_divi");
+  set_optab_libfunc (udiv_optab, SImode, "__pruabi_divu");
+  set_optab_libfunc (smod_optab, SImode, "__pruabi_remi");
+  set_optab_libfunc (umod_optab, SImode, "__pruabi_remu");
+  set_optab_libfunc (sdivmod_optab, SImode, "__pruabi_divremi");
+  set_optab_libfunc (udivmod_optab, SImode, "__pruabi_divremu");
+  set_optab_libfunc (sdiv_optab, DImode, "__pruabi_divlli");
+  set_optab_libfunc (udiv_optab, DImode, "__pruabi_divull");
+  set_optab_libfunc (smod_optab, DImode, "__pruabi_remlli");
+  set_optab_libfunc (umod_optab, DImode, "__pruabi_remull");
+  set_optab_libfunc (udivmod_optab, DImode, "__pruabi_divremull");
+}
+
+
+/* Emit a comparison libcall if necessary, leaving its SImode result in
+   a pseudo.  Return the condition, in MODE, that should be tested in
+   the jump insn.  */
+
+rtx
+pru_expand_fp_compare (rtx comparison, machine_mode mode)
+{
+  enum rtx_code code = GET_CODE (comparison);
+  rtx op0 = XEXP (comparison, 0);
+  rtx op1 = XEXP (comparison, 1);
+  rtx cmp;
+  enum rtx_code jump_code = code;
+  machine_mode op_mode = GET_MODE (op0);
+  rtx_insn *insns;
+  rtx libfunc;
+
+  gcc_assert (op_mode == DFmode || op_mode == SFmode);
+
+  if (code == UNGE)
+    {
+      code = LT;
+      jump_code = EQ;
+    }
+  else if (code == UNLE)
+    {
+      code = GT;
+      jump_code = EQ;
+    }
+  else
+    jump_code = NE;
+
+  switch (code)
+    {
+    case EQ:
+      libfunc = op_mode == DFmode ? eqdf_libfunc : eqsf_libfunc;
+      break;
+    case NE:
+      libfunc = op_mode == DFmode ? nedf_libfunc : nesf_libfunc;
+      break;
+    case GT:
+      libfunc = op_mode == DFmode ? gtdf_libfunc : gtsf_libfunc;
+      break;
+    case GE:
+      libfunc = op_mode == DFmode ? gedf_libfunc : gesf_libfunc;
+      break;
+    case LT:
+      libfunc = op_mode == DFmode ? ltdf_libfunc : ltsf_libfunc;
+      break;
+    case LE:
+      libfunc = op_mode == DFmode ? ledf_libfunc : lesf_libfunc;
+      break;
+    default:
+      gcc_unreachable ();
+    }
+  start_sequence ();
+
+  cmp = emit_library_call_value (libfunc, 0, LCT_CONST, SImode,
+				 op0, op_mode, op1, op_mode);
+  insns = get_insns ();
+  end_sequence ();
+
+  emit_libcall_block (insns, cmp, cmp,
+		      gen_rtx_fmt_ee (code, SImode, op0, op1));
+
+  return gen_rtx_fmt_ee (jump_code, mode, cmp, const0_rtx);
+}
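The UNGE/UNLE rewrite above relies on a small identity worth spelling out. There is no ABI helper for "unordered or greater-equal", but the only unordered case is NaN, and `(a < b)` is false whenever an operand is NaN; so UNGE is exactly the negation of LT, which is why the expander computes LT and jumps on a zero result (`jump_code == EQ`). A minimal sketch:

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* UNGE (unordered or greater-equal) expressed through LT only.  */
static bool
unge (double a, double b)
{
  /* (a < b) is false for NaN operands, so the negation covers both
     the ordered >= case and the unordered case.  */
  return !(a < b);
}
```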
+
+/* Return the sign bit position for given OP's mode.  */
+static int
+sign_bit_position (const rtx op)
+{
+  const int sz = GET_MODE_SIZE (GET_MODE (op));
+
+  return sz * 8 - 1;
+}
+
+/* Output asm code for sign_extend operation.  */
+const char *
+pru_output_sign_extend (rtx *operands)
+{
+  static char buf[512];
+  int bufi;
+  const int dst_sz = GET_MODE_SIZE (GET_MODE (operands[0]));
+  const int src_sz = GET_MODE_SIZE (GET_MODE (operands[1]));
+  char ext_start;
+
+  switch (src_sz)
+    {
+    case 1: ext_start = 'y'; break;
+    case 2: ext_start = 'z'; break;
+    default: gcc_unreachable ();
+    }
+
+  gcc_assert (dst_sz > src_sz);
+
+  /* Note that src and dst can be different parts of the same
+     register, e.g. "r7, r7.w1".  */
+  bufi = snprintf (buf, sizeof (buf),
+	  "mov\t%%0, %%1\n\t"		      /* Copy AND make positive.  */
+	  "qbbc\t.+8, %%0, %d\n\t"	      /* Check sign bit.  */
+	  "fill\t%%%c0, %d",		      /* Make negative.  */
+	  sign_bit_position (operands[1]),
+	  ext_start,
+	  dst_sz - src_sz);
+
+  gcc_assert (bufi > 0);
+  gcc_assert ((unsigned int)bufi < sizeof (buf));
+
+  return buf;
+}
+
+/* Branches and compares.  */
+
+/* The PRU ALU does not support signed comparison operations, so we must
+   emulate them.  By first checking the sign bits and then handling every
+   possible combination of operand signs, we can simulate a signed
+   comparison in just five instructions.  See the table below.
+
+.-------------------.---------------------------------------------------.
+| Operand sign bit  | Mapping the signed comparison to an unsigned one  |
+|---------+---------+------------+------------+------------+------------|
+| OP1.b31 | OP2.b31 | OP1 < OP2  | OP1 <= OP2 | OP1 > OP2  | OP1 >= OP2 |
+|---------+---------+------------+------------+------------+------------|
+| 0       | 0       | OP1 < OP2  | OP1 <= OP2 | OP1 > OP2  | OP1 >= OP2 |
+|---------+---------+------------+------------+------------+------------|
+| 0       | 1       | false      | false      | true       | true       |
+|---------+---------+------------+------------+------------+------------|
+| 1       | 0       | true       | true       | false      | false      |
+|---------+---------+------------+------------+------------+------------|
+| 1       | 1       | OP1 < OP2  | OP1 <= OP2 | OP1 > OP2  | OP1 >= OP2 |
+`---------'---------'------------'------------'------------+------------'
+
+
+Given the table above, here is an example for a concrete op:
+  LT:
+		    qbbc OP1_POS, OP1, 31
+  OP1_NEG:	    qbbc BRANCH_TAKEN_LABEL, OP2, 31
+  OP1_NEG_OP2_NEG:  qblt BRANCH_TAKEN_LABEL, OP2, OP1
+		    ; jmp OUT -> can be eliminated because we'll take the
+		    ; following branch.  OP2.b31 is guaranteed to be 1
+		    ; by the time we get here.
+  OP1_POS:	    qbbs OUT, OP2, 31
+  OP1_POS_OP2_POS:  qblt BRANCH_TAKEN_LABEL, OP2, OP1
+#if FAR_JUMP
+		    jmp OUT
+BRANCH_TAKEN_LABEL: jmp REAL_BRANCH_TAKEN_LABEL
+#endif
+  OUT:
+
+*/
+
+/* Output asm code for a signed-compare LT/LE conditional branch.  */
+static const char *
+pru_output_ltle_signed_cbranch (rtx *operands, bool is_near)
+{
+  static char buf[1024];
+  enum rtx_code code = GET_CODE (operands[0]);
+  rtx op1;
+  rtx op2;
+  const char *cmp_opstr;
+  int bufi = 0;
+
+  op1 = operands[1];
+  op2 = operands[2];
+
+  gcc_assert (GET_CODE (op1) == REG && GET_CODE (op2) == REG);
+
+  /* Determine the comparison operators for positive and negative operands.  */
+  if (code == LT)
+    cmp_opstr = "qblt";
+  else if (code == LE)
+    cmp_opstr = "qble";
+  else
+    gcc_unreachable ();
+
+  if (is_near)
+    {
+      bufi = snprintf (buf, sizeof (buf),
+		       "qbbc\t.+12, %%1, %d\n\t"
+		       "qbbc\t%%l3, %%2, %d\n\t"  /* OP1_NEG.  */
+		       "%s\t%%l3, %%2, %%1\n\t"   /* OP1_NEG_OP2_NEG.  */
+		       "qbbs\t.+8, %%2, %d\n\t"   /* OP1_POS.  */
+		       "%s\t%%l3, %%2, %%1",	  /* OP1_POS_OP2_POS.  */
+		       sign_bit_position (op1),
+		       sign_bit_position (op2),
+		       cmp_opstr,
+		       sign_bit_position (op2),
+		       cmp_opstr);
+    }
+  else
+    {
+      bufi = snprintf (buf, sizeof (buf),
+		       "qbbc\t.+12, %%1, %d\n\t"
+		       "qbbc\t.+20, %%2, %d\n\t"  /* OP1_NEG.  */
+		       "%s\t.+16, %%2, %%1\n\t"   /* OP1_NEG_OP2_NEG.  */
+		       "qbbs\t.+16, %%2, %d\n\t"  /* OP1_POS.  */
+		       "%s\t.+8, %%2, %%1\n\t"    /* OP1_POS_OP2_POS.  */
+		       "jmp\t.+8\n\t"		  /* jmp OUT.  */
+		       "jmp\t%%%%label(%%l3)",	  /* BRANCH_TAKEN_LABEL.  */
+		       sign_bit_position (op1),
+		       sign_bit_position (op2),
+		       cmp_opstr,
+		       sign_bit_position (op2),
+		       cmp_opstr);
+    }
+
+  gcc_assert (bufi > 0);
+  gcc_assert ((unsigned int)bufi < sizeof (buf));
+
+  return buf;
+}
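The mapping table in the big comment above reduces to a compact rule that the emitted qbbc/qblt sequence implements. This C model is a sketch of the semantics, not of the emitted instruction order:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Signed "less than" built from sign-bit tests plus one unsigned
   compare, per the operand-sign table: mixed signs decide immediately,
   same signs fall through to an unsigned comparison.  */
static bool
signed_lt (uint32_t op1, uint32_t op2)
{
  bool op1_neg = (op1 >> 31) != 0;
  bool op2_neg = (op2 >> 31) != 0;

  if (op1_neg != op2_neg)
    return op1_neg;	/* Mixed signs: the negative operand is smaller.  */
  /* Same sign: unsigned order agrees with signed order, even for two
     negative operands (both have bit 31 set).  */
  return op1 < op2;
}
```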
+
+/* Output asm code for a signed-compare GT/GE conditional branch.  */
+static const char *
+pru_output_gtge_signed_cbranch (rtx *operands, bool is_near)
+{
+  static char buf[1024];
+  enum rtx_code code = GET_CODE (operands[0]);
+  rtx op1;
+  rtx op2;
+  const char *cmp_opstr;
+  int bufi = 0;
+
+  op1 = operands[1];
+  op2 = operands[2];
+
+  gcc_assert (GET_CODE (op1) == REG && GET_CODE (op2) == REG);
+
+  /* Determine the comparison operators for positive and negative operands.  */
+  if (code == GT)
+    cmp_opstr = "qbgt";
+  else if (code == GE)
+    cmp_opstr = "qbge";
+  else
+    gcc_unreachable ();
+
+  if (is_near)
+    {
+      bufi = snprintf (buf, sizeof (buf),
+		       "qbbs\t.+12, %%1, %d\n\t"
+		       "qbbs\t%%l3, %%2, %d\n\t"  /* OP1_POS.  */
+		       "%s\t%%l3, %%2, %%1\n\t"   /* OP1_POS_OP2_POS.  */
+		       "qbbc\t.+8, %%2, %d\n\t"   /* OP1_NEG.  */
+		       "%s\t%%l3, %%2, %%1",      /* OP1_NEG_OP2_NEG.  */
+		       sign_bit_position (op1),
+		       sign_bit_position (op2),
+		       cmp_opstr,
+		       sign_bit_position (op2),
+		       cmp_opstr);
+    }
+  else
+    {
+      bufi = snprintf (buf, sizeof (buf),
+		       "qbbs\t.+12, %%1, %d\n\t"
+		       "qbbs\t.+20, %%2, %d\n\t"  /* OP1_POS.  */
+		       "%s\t.+16, %%2, %%1\n\t"   /* OP1_POS_OP2_POS.  */
+		       "qbbc\t.+16, %%2, %d\n\t"  /* OP1_NEG.  */
+		       "%s\t.+8, %%2, %%1\n\t"    /* OP1_NEG_OP2_NEG.  */
+		       "jmp\t.+8\n\t"		  /* jmp OUT.  */
+		       "jmp\t%%%%label(%%l3)",	  /* BRANCH_TAKEN_LABEL.  */
+		       sign_bit_position (op1),
+		       sign_bit_position (op2),
+		       cmp_opstr,
+		       sign_bit_position (op2),
+		       cmp_opstr);
+    }
+
+  gcc_assert (bufi > 0);
+  gcc_assert ((unsigned int)bufi < sizeof (buf));
+
+  return buf;
+}
+
+/* Output asm code for a signed-compare conditional branch.
+
+   If IS_NEAR is true, then QBBx instructions may be used for reaching
+   the destination label.  Otherwise JMP is used, at the expense of
+   increased code size.  */
+const char *
+pru_output_signed_cbranch (rtx *operands, bool is_near)
+{
+  enum rtx_code code = GET_CODE (operands[0]);
+
+  if (code == LT || code == LE)
+    return pru_output_ltle_signed_cbranch (operands, is_near);
+  else if (code == GT || code == GE)
+    return pru_output_gtge_signed_cbranch (operands, is_near);
+  else
+    gcc_unreachable ();
+}
+
+/* Optimized version of pru_output_signed_cbranch for a constant second
+   operand.  */
+
+const char *
+pru_output_signed_cbranch_ubyteop2 (rtx *operands, bool is_near)
+{
+  static char buf[1024];
+  enum rtx_code code = GET_CODE (operands[0]);
+  int regop_sign_bit_pos = sign_bit_position (operands[1]);
+  const char *cmp_opstr;
+  const char *rcmp_opstr;
+
+  /* We must swap the operands, because PRU requires OP1 to be the
+     immediate operand.  */
+  code = swap_condition (code);
+
+  /* Determine the normal and the reversed comparison operators for the
+     case when both operands are positive.  This lets us use unsigned
+     comparisons exclusively.
+
+     NOTE: We cannot use the R print modifier because we convert signed
+     comparison operators to unsigned ones.  */
+  switch (code)
+    {
+    case LT: cmp_opstr = "qblt"; rcmp_opstr = "qbge"; break;
+    case LE: cmp_opstr = "qble"; rcmp_opstr = "qbgt"; break;
+    case GT: cmp_opstr = "qbgt"; rcmp_opstr = "qble"; break;
+    case GE: cmp_opstr = "qbge"; rcmp_opstr = "qblt"; break;
+    default: gcc_unreachable ();
+    }
+
+  /* OP2 is a constant unsigned byte - utilize this info to generate
+     optimized code.  We can "remove half" of the op table above because
+     we know that OP2.b31 = 0 (remember that 0 <= OP2 <= 255).  */
+  if (code == LT || code == LE)
+    {
+      if (is_near)
+	snprintf (buf, sizeof (buf),
+		  "qbbs\t.+8, %%1, %d\n\t"
+		  "%s\t%%l3, %%1, %%2",
+		  regop_sign_bit_pos,
+		  cmp_opstr);
+      else
+	snprintf (buf, sizeof (buf),
+		  "qbbs\t.+12, %%1, %d\n\t"
+		  "%s\t.+8, %%1, %%2\n\t"
+		  "jmp\t%%%%label(%%l3)",
+		  regop_sign_bit_pos,
+		  rcmp_opstr);
+    }
+  else if (code == GT || code == GE)
+    {
+      if (is_near)
+	snprintf (buf, sizeof (buf),
+		  "qbbs\t%%l3, %%1, %d\n\t"
+		  "%s\t%%l3, %%1, %%2",
+		  regop_sign_bit_pos,
+		  cmp_opstr);
+      else
+	snprintf (buf, sizeof (buf),
+		  "qbbs\t.+8, %%1, %d\n\t"
+		  "%s\t.+8, %%1, %%2\n\t"
+		  "jmp\t%%%%label(%%l3)",
+		  regop_sign_bit_pos,
+		  rcmp_opstr);
+    }
+  else
+    gcc_unreachable ();
+
+  return buf;
+}
+
+/* Optimized version of pru_output_signed_cbranch_ubyteop2 for a constant
+   zero second operand.  */
+
+const char *
+pru_output_signed_cbranch_zeroop2 (rtx *operands, bool is_near)
+{
+  static char buf[1024];
+  enum rtx_code code = GET_CODE (operands[0]);
+  int regop_sign_bit_pos = sign_bit_position (operands[1]);
+
+  /* OP2 is a constant zero - utilize this info to simply check the
+     OP1 sign bit when comparing for LT or GE.  */
+  if (code == LT)
+    {
+      if (is_near)
+	snprintf (buf, sizeof (buf),
+		  "qbbs\t%%l3, %%1, %d",
+		  regop_sign_bit_pos);
+      else
+	snprintf (buf, sizeof (buf),
+		  "qbbc\t.+8, %%1, %d\n\t"
+		  "jmp\t%%%%label(%%l3)",
+		  regop_sign_bit_pos);
+    }
+  else if (code == GE)
+    {
+      if (is_near)
+	snprintf (buf, sizeof (buf),
+		  "qbbc\t%%l3, %%1, %d",
+		  regop_sign_bit_pos);
+      else
+	snprintf (buf, sizeof (buf),
+		  "qbbs\t.+8, %%1, %d\n\t"
+		  "jmp\t%%%%label(%%l3)",
+		  regop_sign_bit_pos);
+    }
+  else
+    gcc_unreachable ();
+
+  return buf;
+}
+
+/* Addressing Modes.  */
+
+/* Return true if register REGNO is a valid base register.
+   STRICT_P is true if REG_OK_STRICT is in effect.  */
+
+bool
+pru_regno_ok_for_base_p (int regno, bool strict_p)
+{
+  if (!HARD_REGISTER_NUM_P (regno))
+    {
+      if (!strict_p)
+	return true;
+
+      if (!reg_renumber)
+	return false;
+
+      regno = reg_renumber[regno];
+    }
+
+  /* The fake registers will be eliminated to either the stack or
+     hard frame pointer, both of which are usually valid base registers.
+     Reload deals with the cases where the eliminated form isn't valid.  */
+  return (GP_REG_P (regno)
+	  || regno == FRAME_POINTER_REGNUM
+	  || regno == ARG_POINTER_REGNUM);
+}
+
+/* Return true if given xbbo constant OFFSET is valid.  */
+static bool
+pru_valid_const_ubyte_offset (machine_mode mode, HOST_WIDE_INT offset)
+{
+  bool valid = UBYTE_INT (offset);
+
+  /* Reload can split multi word accesses, so make sure we can address
+     the second word in a DI.  */
+  if (valid && GET_MODE_SIZE (mode) > GET_MODE_SIZE (SImode))
+    valid = UBYTE_INT (offset + GET_MODE_SIZE (mode) - 1);
+
+  return valid;
+}
+
+/* Recognize a CTABLE base address.  Return the CTABLE entry index, or -1
+   if the base address was not found in the pragma-filled pru_ctable.  */
+int
+pru_get_ctable_exact_base_index (unsigned HOST_WIDE_INT caddr)
+{
+  unsigned int i;
+
+  for (i = 0; i < ARRAY_SIZE (pru_ctable); i++)
+    {
+      if (pru_ctable[i].valid && pru_ctable[i].base == caddr)
+	return i;
+    }
+  return -1;
+}
+
+
+/* Check if the given address can be addressed via CTABLE_BASE + UBYTE_OFFS,
+   and return the base CTABLE index if possible.  */
+int
+pru_get_ctable_base_index (unsigned HOST_WIDE_INT caddr)
+{
+  unsigned int i;
+
+  for (i = 0; i < ARRAY_SIZE (pru_ctable); i++)
+    {
+      if (pru_ctable[i].valid && IN_RANGE (caddr,
+					   pru_ctable[i].base,
+					   pru_ctable[i].base + 0xff))
+	return i;
+    }
+  return -1;
+}
+
+
+/* Return the offset from some CTABLE base for this address.  */
+int
+pru_get_ctable_base_offset (unsigned HOST_WIDE_INT caddr)
+{
+  int i;
+
+  i = pru_get_ctable_base_index (caddr);
+  gcc_assert (i >= 0);
+
+  return caddr - pru_ctable[i].base;
+}
+
+/* Return true if the address expression formed by BASE + OFFSET is
+   valid.  */
+static bool
+pru_valid_addr_expr_p (machine_mode mode, rtx base, rtx offset, bool strict_p)
+{
+  if (!strict_p && base != NULL_RTX && GET_CODE (base) == SUBREG)
+    base = SUBREG_REG (base);
+  if (!strict_p && offset != NULL_RTX && GET_CODE (offset) == SUBREG)
+    offset = SUBREG_REG (offset);
+
+  if (REG_P (base)
+      && pru_regno_ok_for_base_p (REGNO (base), strict_p)
+      && (offset == NULL_RTX
+	  || (CONST_INT_P (offset)
+	      && pru_valid_const_ubyte_offset (mode, INTVAL (offset)))
+	  || (REG_P (offset)
+	      && pru_regno_ok_for_index_p (REGNO (offset), strict_p))))
+    {
+      /* Base register + register offset,
+	 OR base register + UBYTE constant offset.  */
+      return true;
+    }
+  else if (REG_P (base)
+	   && pru_regno_ok_for_index_p (REGNO (base), strict_p)
+	   && (offset != NULL_RTX && ctable_base_operand (offset, VOIDmode)))
+    {
+      /* CTABLE constant base + register offset.
+	 Note: GCC always puts the register as the first operand of PLUS.  */
+      return true;
+    }
+  else if (CONST_INT_P (base)
+	   && offset == NULL_RTX
+	   && (ctable_addr_operand (base, VOIDmode)))
+    {
+      /* CTABLE constant base + UBYTE constant offset.  */
+      return true;
+    }
+  else
+    {
+      return false;
+    }
+}
+
+/* Implement TARGET_LEGITIMATE_ADDRESS_P.  */
+static bool
+pru_legitimate_address_p (machine_mode mode,
+			    rtx operand, bool strict_p)
+{
+  switch (GET_CODE (operand))
+    {
+      /* Direct.  */
+    case SYMBOL_REF:
+    case LABEL_REF:
+    case CONST:
+    case CONST_DOUBLE:
+      return false;
+
+    case CONST_INT:
+      return ctable_addr_operand (operand, VOIDmode);
+
+      /* Register indirect.  */
+    case REG:
+      return pru_regno_ok_for_base_p (REGNO (operand), strict_p);
+
+      /* Register indirect with displacement.  */
+    case PLUS:
+	{
+	  rtx op0 = XEXP (operand, 0);
+	  rtx op1 = XEXP (operand, 1);
+
+	  return (pru_valid_addr_expr_p (mode, op0, op1, strict_p)
+		  || pru_valid_addr_expr_p (mode, op1, op0, strict_p));
+	}
+
+    default:
+      break;
+    }
+  return false;
+}
+
+/* Output assembly language related definitions.  */
+
+/* Implement TARGET_ASM_CONSTRUCTOR.  */
+static void
+pru_elf_asm_constructor (rtx symbol, int priority)
+{
+  char buf[23];
+  section *s;
+
+  if (priority == DEFAULT_INIT_PRIORITY)
+    snprintf (buf, sizeof (buf), ".init_array");
+  else
+    {
+      /* Although PRIORITY is known to be in the range [0, 65535], so that
+	 18 bytes would be enough, the compiler might not know that.  To
+	 avoid a -Wformat-truncation false positive, use a larger buffer
+	 size.  */
+    }
+  s = get_section (buf, SECTION_WRITE | SECTION_NOTYPE, NULL);
+  switch_to_section (s);
+  assemble_aligned_integer (INIT_ARRAY_ENTRY_BYTES, symbol);
+}
+
+/* Implement TARGET_ASM_DESTRUCTOR.  */
+static void
+pru_elf_asm_destructor (rtx symbol, int priority)
+{
+  char buf[23];
+  section *s;
+
+  if (priority == DEFAULT_INIT_PRIORITY)
+    snprintf (buf, sizeof (buf), ".fini_array");
+  else
+    {
+      /* Although PRIORITY is known to be in the range [0, 65535], so that
+	 18 bytes would be enough, the compiler might not know that.  To
+	 avoid a -Wformat-truncation false positive, use a larger buffer
+	 size.  */
+    }
+  s = get_section (buf, SECTION_WRITE | SECTION_NOTYPE, NULL);
+  switch_to_section (s);
+  assemble_aligned_integer (INIT_ARRAY_ENTRY_BYTES, symbol);
+}
+
+/* Map an rtx_code to the corresponding unsigned PRU branch op suffix.
+   Callers must handle signed comparisons themselves.  */
+static const char *
+pru_comparison_str (enum rtx_code cond)
+{
+  switch (cond)
+    {
+    case NE:  return "ne";
+    case EQ:  return "eq";
+    case GEU: return "ge";
+    case GTU: return "gt";
+    case LEU: return "le";
+    case LTU: return "lt";
+    default: gcc_unreachable ();
+    }
+}
+
+/* Access some RTX as its integer-mode equivalent.  If X is a CONST_FIXED,
+   we can get the bit representation of X by "casting" it to CONST_INT.  */
+
+static rtx
+pru_to_int_mode (rtx x)
+{
+  machine_mode mode = GET_MODE (x);
+
+  return VOIDmode == mode
+    ? x
+    : simplify_gen_subreg (int_mode_for_mode (mode).require (), x, mode, 0);
+}
+
+/* Translate the machine-description notion of 8-bit consecutive
+   registers into the PRU assembler syntax of REGWORD[.SUBREG].  */
+static const char *
+pru_asm_regname (rtx op)
+{
+  static char canon_reg_names[3][LAST_GP_REG][8];
+  int speci, regi;
+
+  gcc_assert (REG_P (op));
+
+  if (!canon_reg_names[0][0][0])
+    {
+      for (regi = 0; regi < LAST_GP_REG; regi++)
+	for (speci = 0; speci < 3; speci++)
+	  {
+	    const int sz = (speci == 0) ? 1 : ((speci == 1) ? 2 : 4);
+	    if ((regi + sz) > (32 * 4))
+	      continue;	/* Invalid entry.  */
+
+	    /* Construct the lookup table.  */
+	    const char *suffix = "";
+
+	    switch ((sz << 8) | (regi % 4))
+	      {
+	      case (1 << 8) | 0: suffix = ".b0"; break;
+	      case (1 << 8) | 1: suffix = ".b1"; break;
+	      case (1 << 8) | 2: suffix = ".b2"; break;
+	      case (1 << 8) | 3: suffix = ".b3"; break;
+	      case (2 << 8) | 0: suffix = ".w0"; break;
+	      case (2 << 8) | 1: suffix = ".w1"; break;
+	      case (2 << 8) | 2: suffix = ".w2"; break;
+	      case (4 << 8) | 0: suffix = ""; break;
+	      default:
+		/* Invalid entry.  */
+		continue;
+	      }
+	    sprintf (&canon_reg_names[speci][regi][0],
+		     "r%d%s", regi / 4, suffix);
+	  }
+    }
+
+  switch (GET_MODE_SIZE (GET_MODE (op)))
+    {
+    case 1: speci = 0; break;
+    case 2: speci = 1; break;
+    case 4: speci = 2; break;
+    case 8: speci = 2; break; /* Existing GCC test cases are not using %F.  */
+    default: gcc_unreachable ();
+    }
+  regi = REGNO (op);
+  gcc_assert (regi < LAST_GP_REG);
+  gcc_assert (canon_reg_names[speci][regi][0]);
+
+  return &canon_reg_names[speci][regi][0];
+}
+
+/* Print the operand OP to file stream FILE, modified by LETTER.
+   LETTER can be one of:
+
+     b: print the register byte start (used by LBBO/SBBO).
+     B: print 'c' or 'b' for a CTABLE or REG base in a memory address.
+     F: print the full 32-bit register.
+     H: print the higher 16 bits of a const_int operand.
+     L: print the lower 16 bits of a const_int operand.
+     N: print the next 32-bit register (upper 32 bits of a 64-bit REG couple).
+     P: print the swapped condition.
+     Q: print the swapped and reversed condition.
+     R: print the reversed condition.
+     S: print the operand mode size (but do not print the operand itself).
+     T: print exact_log2 () for const_int operands.
+     V: print exact_log2 () of the negated const_int operand.
+     w: print the lower 32 bits of a const_int operand.
+     W: print the upper 32 bits of a const_int operand.
+     y: print the next 8-bit register (regardless of op size).
+     z: print the second next 8-bit register (regardless of op size).  */
+static void
+pru_print_operand (FILE *file, rtx op, int letter)
+{
+  switch (letter)
+    {
+    case 'S':
+      fprintf (file, "%d", GET_MODE_SIZE (GET_MODE (op)));
+      return;
+
+    default:
+      break;
+    }
+
+  if (comparison_operator (op, VOIDmode))
+    {
+      enum rtx_code cond = GET_CODE (op);
+      gcc_assert (!pru_signed_cmp_operator (op, VOIDmode));
+
+      switch (letter)
+	{
+	case 0:
+	  fprintf (file, "%s", pru_comparison_str (cond));
+	  return;
+	case 'P':
+	  fprintf (file, "%s", pru_comparison_str (swap_condition (cond)));
+	  return;
+	case 'Q':
+	  cond = swap_condition (cond);
+	  /* Fall through to reverse.  */
+	case 'R':
+	  fprintf (file, "%s", pru_comparison_str (reverse_condition (cond)));
+	  return;
+	}
+    }
+
+  switch (GET_CODE (op))
+    {
+    case REG:
+      if (letter == 0)
+	{
+	  fprintf (file, "%s", pru_asm_regname (op));
+	  return;
+	}
+      else if (letter == 'b')
+	{
+	  gcc_assert (REGNO (op) <= LAST_NONIO_GP_REG);
+	  fprintf (file, "r%d.b%d", REGNO (op) / 4, REGNO (op) % 4);
+	  return;
+	}
+      else if (letter == 'F')
+	{
+	  gcc_assert (REGNO (op) <= LAST_NONIO_GP_REG);
+	  gcc_assert (REGNO (op) % 4 == 0);
+	  fprintf (file, "r%d", REGNO (op) / 4);
+	  return;
+	}
+      else if (letter == 'N')
+	{
+	  gcc_assert (REGNO (op) <= LAST_NONIO_GP_REG);
+	  gcc_assert (REGNO (op) % 4 == 0);
+	  fprintf (file, "r%d", REGNO (op) / 4 + 1);
+	  return;
+	}
+      else if (letter == 'y')
+	{
+	  gcc_assert (REGNO (op) <= LAST_NONIO_GP_REG - 1);
+	  fprintf (file, "%s", reg_names[REGNO (op) + 1]);
+	  return;
+	}
+      else if (letter == 'z')
+	{
+	  gcc_assert (REGNO (op) <= LAST_NONIO_GP_REG - 2);
+	  fprintf (file, "%s", reg_names[REGNO (op) + 2]);
+	  return;
+	}
+      break;
+
+    case CONST_INT:
+      if (letter == 'H')
+	{
+	  HOST_WIDE_INT val = INTVAL (op);
+	  val = (val >> 16) & 0xFFFF;
+	  output_addr_const (file, gen_int_mode (val, SImode));
+	  return;
+	}
+      else if (letter == 'L')
+	{
+	  HOST_WIDE_INT val = INTVAL (op);
+	  val &= 0xFFFF;
+	  output_addr_const (file, gen_int_mode (val, SImode));
+	  return;
+	}
+      else if (letter == 'T')
+	{
+	  /* The predicate should have already validated the 1-high-bit
+	     requirement.  Use CTZ here to deal with constant's sign
+	     extension.  */
+	  HOST_WIDE_INT val = wi::ctz (INTVAL (op));
+	  gcc_assert (val >= 0 && val <= 31);
+	  output_addr_const (file, gen_int_mode (val, SImode));
+	  return;
+	}
+      else if (letter == 'V')
+	{
+	  HOST_WIDE_INT val = wi::ctz (~INTVAL (op));
+	  gcc_assert (val >= 0 && val <= 31);
+	  output_addr_const (file, gen_int_mode (val, SImode));
+	  return;
+	}
+      else if (letter == 'w')
+	{
+	  HOST_WIDE_INT val = INTVAL (op) & 0xffffffff;
+	  output_addr_const (file, gen_int_mode (val, SImode));
+	  return;
+	}
+      else if (letter == 'W')
+	{
+	  HOST_WIDE_INT val = (INTVAL (op) >> 32) & 0xffffffff;
+	  output_addr_const (file, gen_int_mode (val, SImode));
+	  return;
+	}
+      /* Else, fall through.  */
+
+    case CONST:
+    case LABEL_REF:
+    case SYMBOL_REF:
+      if (letter == 0)
+	{
+	  output_addr_const (file, op);
+	  return;
+	}
+      break;
+
+    case CONST_FIXED:
+	{
+	  HOST_WIDE_INT ival = INTVAL (pru_to_int_mode (op));
+	  if (letter != 0)
+	    output_operand_lossage ("Unsupported code '%c' for fixed-point",
+				    letter);
+	  fprintf (file, HOST_WIDE_INT_PRINT_DEC, ival);
+	  return;
+	}
+      break;
+
+    case CONST_DOUBLE:
+      if (letter == 0)
+	{
+	  long val;
+
+	  if (GET_MODE (op) != SFmode)
+	    fatal_insn ("internal compiler error.  Unknown mode:", op);
+	  REAL_VALUE_TO_TARGET_SINGLE (*CONST_DOUBLE_REAL_VALUE (op), val);
+	  fprintf (file, "0x%lx", val);
+	  return;
+	}
+      else if (letter == 'w' || letter == 'W')
+	{
+	  long t[2];
+	  REAL_VALUE_TO_TARGET_DOUBLE (*CONST_DOUBLE_REAL_VALUE (op), t);
+	  fprintf (file, "0x%lx", t[letter == 'w' ? 0 : 1]);
+	  return;
+	}
+      else
+	{
+	  gcc_unreachable ();
+	}
+      break;
+
+    case SUBREG:
+    case MEM:
+      if (letter == 0)
+	{
+	  output_address (VOIDmode, op);
+	  return;
+	}
+      else if (letter == 'B')
+	{
+	  rtx base = XEXP (op, 0);
+	  if (GET_CODE (base) == PLUS)
+	    {
+	      rtx op0 = XEXP (base, 0);
+	      rtx op1 = XEXP (base, 1);
+
+	      /* PLUS cannot have two constant operands, so one
+		 of them must be a REG, hence we must check for an
+		 exact base address.  */
+	      if (ctable_base_operand (op0, VOIDmode)
+		  || ctable_base_operand (op1, VOIDmode))
+		{
+		  fprintf (file, "c");
+		  return;
+		}
+	      else if (REG_P (op0) || REG_P (op1))
+		{
+		  fprintf (file, "b");
+		  return;
+		}
+	      else
+		gcc_unreachable ();
+	    }
+	  else if (REG_P (base))
+	    {
+	      fprintf (file, "b");
+	      return;
+	    }
+	  else if (ctable_addr_operand (base, VOIDmode))
+	    {
+	      fprintf (file, "c");
+	      return;
+	    }
+	  else
+	    gcc_unreachable ();
+	}
+      break;
+
+    case CODE_LABEL:
+      if (letter == 0)
+	{
+	  output_addr_const (file, op);
+	  return;
+	}
+      break;
+
+    default:
+      break;
+    }
+
+  output_operand_lossage ("Unsupported operand %s for code '%c'",
+			  GET_RTX_NAME (GET_CODE (op)), letter);
+  gcc_unreachable ();
+}
+
+/* Implement TARGET_PRINT_OPERAND_ADDRESS.  */
+static void
+pru_print_operand_address (FILE *file, machine_mode mode, rtx op)
+{
+  if (GET_CODE (op) != REG && CONSTANT_ADDRESS_P (op)
+      && text_segment_operand (op, VOIDmode))
+    {
+      fprintf (stderr, "Unexpected text address?\n");
+      debug_rtx (op);
+      gcc_unreachable ();
+    }
+
+  switch (GET_CODE (op))
+    {
+    case CONST:
+    case LABEL_REF:
+    case CONST_DOUBLE:
+    case SYMBOL_REF:
+      break;
+
+    case CONST_INT:
+      {
+	unsigned HOST_WIDE_INT caddr = INTVAL (op);
+	int base = pru_get_ctable_base_index (caddr);
+	int offs = pru_get_ctable_base_offset (caddr);
+	gcc_assert (base >= 0);
+	fprintf (file, "%d, %d", base, offs);
+	return;
+      }
+      break;
+
+    case PLUS:
+      {
+	int base;
+	rtx op0 = XEXP (op, 0);
+	rtx op1 = XEXP (op, 1);
+
+	if (REG_P (op0) && CONST_INT_P (op1)
+	    && pru_get_ctable_exact_base_index (INTVAL (op1)) >= 0)
+	  {
+	    base = pru_get_ctable_exact_base_index (INTVAL (op1));
+	    fprintf (file, "%d, %s", base, pru_asm_regname (op0));
+	    return;
+	  }
+	else if (REG_P (op1) && CONST_INT_P (op0)
+		 && pru_get_ctable_exact_base_index (INTVAL (op0)) >= 0)
+	  {
+	    base = pru_get_ctable_exact_base_index (INTVAL (op0));
+	    fprintf (file, "%d, %s", base, pru_asm_regname (op1));
+	    return;
+	  }
+	else if (REG_P (op0) && CONSTANT_P (op1))
+	  {
+	    fprintf (file, "%s, ", pru_asm_regname (op0));
+	    output_addr_const (file, op1);
+	    return;
+	  }
+	else if (REG_P (op1) && CONSTANT_P (op0))
+	  {
+	    fprintf (file, "%s, ", pru_asm_regname (op1));
+	    output_addr_const (file, op0);
+	    return;
+	  }
+	else if (REG_P (op1) && REG_P (op0))
+	  {
+	    fprintf (file, "%s, %s", pru_asm_regname (op0),
+				     pru_asm_regname (op1));
+	    return;
+	  }
+      }
+      break;
+
+    case REG:
+      fprintf (file, "%s, 0", pru_asm_regname (op));
+      return;
+
+    case MEM:
+      {
+	rtx base = XEXP (op, 0);
+	pru_print_operand_address (file, mode, base);
+	return;
+      }
+    default:
+      break;
+    }
+
+  fprintf (stderr, "Missing way to print address\n");
+  debug_rtx (op);
+  gcc_unreachable ();
+}
+
+/* Implement TARGET_ASM_FUNCTION_PROLOGUE.  */
+static void
+pru_asm_function_prologue (FILE *file)
+{
+  if (flag_verbose_asm || flag_debug_asm)
+    {
+      pru_compute_frame_layout ();
+      pru_dump_frame_layout (file);
+    }
+}
+
+/* Implement `TARGET_ASM_INTEGER'.
+   Target hook for assembling integer objects.  PRU version needs
+   special handling for references to pmem.  Code copied from AVR.  */
+
+static bool
+pru_assemble_integer (rtx x, unsigned int size, int aligned_p)
+{
+  if (size == POINTER_SIZE / BITS_PER_UNIT
+      && aligned_p
+      && text_segment_operand (x, VOIDmode))
+    {
+      fputs ("\t.4byte\t%pmem(", asm_out_file);
+      output_addr_const (asm_out_file, x);
+      fputs (")\n", asm_out_file);
+
+      return true;
+    }
+  else if (size == INIT_ARRAY_ENTRY_BYTES
+	   && aligned_p
+	   && text_segment_operand (x, VOIDmode))
+    {
+      fputs ("\t.2byte\t%pmem(", asm_out_file);
+      output_addr_const (asm_out_file, x);
+      fputs (")\n", asm_out_file);
+
+      return true;
+    }
+  else
+    {
+      return default_assemble_integer (x, size, aligned_p);
+    }
+}
+
+/* Implement TARGET_ASM_FILE_START.  */
+
+static void
+pru_file_start (void)
+{
+  default_file_start ();
+
+  /* The compiler will take care of placing %label, so there is no
+     need to confuse users with this warning.  */
+  fprintf (asm_out_file, "\t.set no_warn_regname_label\n");
+}
+
+/* Function argument related.  */
+
+/* Return the number of bytes needed for storing an argument with
+   the given MODE and TYPE.  */
+static int
+pru_function_arg_size (machine_mode mode, const_tree type)
+{
+  HOST_WIDE_INT param_size;
+
+  if (mode == BLKmode)
+    param_size = int_size_in_bytes (type);
+  else
+    param_size = GET_MODE_SIZE (mode);
+
+  /* Convert to words (round up).  */
+  param_size = (UNITS_PER_WORD - 1 + param_size) / UNITS_PER_WORD;
+  gcc_assert (param_size >= 0);
+
+  return param_size;
+}
+
+/* Return true if an argument with the given size must be
+   passed/returned in a register.
+
+   Reference:
+   https://e2e.ti.com/support/development_tools/compiler/f/343/p/650176/2393029
+
+   Arguments other than 8/16/24/32/64 bits are passed on the stack.  */
+static bool
+pru_arg_in_reg_bysize (size_t sz)
+{
+  return sz == 1 || sz == 2 || sz == 3 || sz == 4 || sz == 8;
+}
+
+/* Helper function to get the starting storage HW register for an argument,
+   or -1 if it must be passed on stack.  The cum_v state is not changed.  */
+static int
+pru_function_arg_regi (cumulative_args_t cum_v,
+		       machine_mode mode, const_tree type,
+		       bool named)
+{
+  CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v);
+  size_t argsize = pru_function_arg_size (mode, type);
+  size_t i, bi;
+  int regi = -1;
+
+  if (!pru_arg_in_reg_bysize (argsize))
+    return -1;
+
+  if (!named)
+    return -1;
+
+  /* Find the first available slot that fits.  Yes, that's the PRU ABI.  */
+  for (i = 0; regi < 0 && i < ARRAY_SIZE (cum->regs_used); i++)
+    {
+      if (mode == BLKmode)
+	{
+	  /* Structs are passed beginning at a full register.  */
+	  if ((i % 4) != 0)
+	    continue;
+	}
+      else
+	{
+	  /* Scalar arguments.  */
+
+	  /* Ensure SI and DI arguments are stored in full registers only.  */
+	  if ((argsize >= 4) && (i % 4) != 0)
+	    continue;
+
+	  /* rX.w0/w1/w2 are OK.  But avoid spreading the second byte
+	     into a different full register.  */
+	  if (argsize == 2 && (i % 4) == 3)
+	    continue;
+	}
+
+      for (bi = 0;
+	   bi < argsize && (bi + i) < ARRAY_SIZE (cum->regs_used);
+	   bi++)
+	{
+	  if (cum->regs_used[bi + i])
+	    break;
+	}
+      if (bi == argsize)
+	regi = FIRST_ARG_REGNO + i;
+    }
+
+  return regi;
+}
+
+/* Mark CUM_V that a function argument will occupy HW register slot starting
+   at REGI.  The number of consecutive 8-bit HW registers marked as occupied
+   depends on the MODE and TYPE of the argument.  */
+static void
+pru_function_arg_regi_mark_slot (int regi,
+				 cumulative_args_t cum_v,
+				 machine_mode mode, const_tree type,
+				 bool named)
+{
+  CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v);
+  HOST_WIDE_INT param_size = pru_function_arg_size (mode, type);
+
+  gcc_assert (named);
+
+  /* Mark all byte sub-registers occupied by argument as used.  */
+  while (param_size--)
+    {
+      gcc_assert (regi >= FIRST_ARG_REGNO && regi <= LAST_ARG_REGNO);
+      gcc_assert (!cum->regs_used[regi - FIRST_ARG_REGNO]);
+      cum->regs_used[regi - FIRST_ARG_REGNO] = true;
+      regi++;
+    }
+}
+
+/* Define where to put the arguments to a function.  Value is zero to
+   push the argument on the stack, or a hard register in which to
+   store the argument.
+
+   MODE is the argument's machine mode.
+   TYPE is the data type of the argument (as a tree).
+   This is null for libcalls where that information may
+   not be available.
+   CUM is a variable of type CUMULATIVE_ARGS which gives info about
+   the preceding args and about the function being called.
+   NAMED is nonzero if this argument is a named parameter
+   (otherwise it is an extra parameter matching an ellipsis).  */
+
+static rtx
+pru_function_arg (cumulative_args_t cum_v, machine_mode mode,
+		    const_tree type,
+		    bool named)
+{
+  rtx return_rtx = NULL_RTX;
+  int regi = pru_function_arg_regi (cum_v, mode, type, named);
+
+  if (regi >= 0)
+    return_rtx = gen_rtx_REG (mode, regi);
+
+  return return_rtx;
+}
+
+/* Return the number of bytes, at the beginning of the argument, that must be
+   put in registers.  0 if the argument is entirely in registers or entirely
+   in memory.  */
+
+static int
+pru_arg_partial_bytes (cumulative_args_t cum_v ATTRIBUTE_UNUSED,
+		       machine_mode mode ATTRIBUTE_UNUSED,
+		       tree type ATTRIBUTE_UNUSED,
+		       bool named ATTRIBUTE_UNUSED)
+{
+  return 0;
+}
+
+/* Update the data in CUM to advance over an argument of mode MODE
+   and data type TYPE; TYPE is null for libcalls where that information
+   may not be available.  */
+
+static void
+pru_function_arg_advance (cumulative_args_t cum_v, machine_mode mode,
+			    const_tree type,
+			    bool named)
+{
+  int regi = pru_function_arg_regi (cum_v, mode, type, named);
+
+  if (regi >= 0)
+    pru_function_arg_regi_mark_slot (regi, cum_v, mode, type, named);
+}
+
+/* Implement TARGET_FUNCTION_VALUE.  */
+static rtx
+pru_function_value (const_tree ret_type, const_tree fn ATTRIBUTE_UNUSED,
+		      bool outgoing ATTRIBUTE_UNUSED)
+{
+  return gen_rtx_REG (TYPE_MODE (ret_type), FIRST_RETVAL_REGNO);
+}
+
+/* Implement TARGET_LIBCALL_VALUE.  */
+static rtx
+pru_libcall_value (machine_mode mode, const_rtx fun ATTRIBUTE_UNUSED)
+{
+  return gen_rtx_REG (mode, FIRST_RETVAL_REGNO);
+}
+
+/* Implement TARGET_FUNCTION_VALUE_REGNO_P.  */
+static bool
+pru_function_value_regno_p (const unsigned int regno)
+{
+  return regno == FIRST_RETVAL_REGNO;
+}
+
+/* Implement TARGET_RETURN_IN_MEMORY.  */
+bool
+pru_return_in_memory (const_tree type, const_tree fntype ATTRIBUTE_UNUSED)
+{
+  bool in_memory = (!pru_arg_in_reg_bysize (int_size_in_bytes (type))
+		    || int_size_in_bytes (type) == -1);
+
+  return in_memory;
+}
+
+/* Implement TARGET_CAN_USE_DOLOOP_P.  */
+
+static bool
+pru_can_use_doloop_p (const widest_int &, const widest_int &iterations_max,
+		      unsigned int loop_depth, bool)
+{
+  /* Considering limitations in the hardware, only use doloop
+     for innermost loops which must be entered from the top.  */
+  if (loop_depth > 1)
+    return false;
+  /* The PRU internal loop counter is 16 bits wide.  Remember that
+     iterations_max holds the maximum number of loop latch executions,
+     while the PRU LOOP instruction needs the count of loop body
+     executions.  */
+  if (iterations_max == 0 || wi::geu_p (iterations_max, 0xffff))
+    return false;
+
+  return true;
+}
+
+/* Return NULL if INSN is valid within a low-overhead loop.
+   Otherwise return why doloop cannot be applied.  */
+
+static const char *
+pru_invalid_within_doloop (const rtx_insn *insn)
+{
+  if (CALL_P (insn))
+    return "Function call in the loop.";
+
+  if (JUMP_P (insn) && INSN_CODE (insn) == CODE_FOR_return)
+    return "Return instruction in the loop.";
+
+  if (NONDEBUG_INSN_P (insn)
+      && INSN_CODE (insn) < 0
+      && (GET_CODE (PATTERN (insn)) == ASM_INPUT
+	  || asm_noperands (PATTERN (insn)) >= 0))
+    return "Loop contains asm statement.";
+
+  return NULL;
+}
+
+
+/* Figure out where to put LABEL, which is the label for a repeat loop.
+   The loop ends just before LAST_INSN.  If SHARED, insns other than the
+   "repeat" might use LABEL to jump to the loop's continuation point.
+
+   Return the last instruction in the adjusted loop.  */
+
+static rtx_insn *
+pru_insert_loop_label_last (rtx_insn *last_insn, rtx_code_label *label,
+			    bool shared)
+{
+  rtx_insn *next, *prev;
+  int count = 0, code, icode;
+
+  if (dump_file)
+    fprintf (dump_file, "considering end of repeat loop at insn %d\n",
+	     INSN_UID (last_insn));
+
+  /* Set PREV to the last insn in the loop.  */
+  prev = PREV_INSN (last_insn);
+
+  /* Set NEXT to the next insn after the loop label.  */
+  next = last_insn;
+  if (!shared)
+    while (prev != 0)
+      {
+	code = GET_CODE (prev);
+	if (code == CALL_INSN || code == CODE_LABEL || code == BARRIER)
+	  break;
+
+	if (INSN_P (prev))
+	  {
+	    if (GET_CODE (PATTERN (prev)) == SEQUENCE)
+	      prev = as_a <rtx_insn *> (XVECEXP (PATTERN (prev), 0, 1));
+
+	    /* Other insns that should not be in the last two opcodes.  */
+	    icode = recog_memoized (prev);
+	    if (icode < 0
+		|| icode == CODE_FOR_pruloophi
+		|| icode == CODE_FOR_pruloopsi)
+	      break;
+
+	    count++;
+	    next = prev;
+	    if (dump_file)
+	      print_rtl_single (dump_file, next);
+	    if (count == 2)
+	      break;
+	  }
+	prev = PREV_INSN (prev);
+      }
+
+  /* Insert the nops.  */
+  if (dump_file && count < 2)
+    fprintf (dump_file, "Adding %d nop%s inside loop\n\n",
+	     2 - count, count == 1 ? "" : "s");
+
+  for (; count < 2; count++)
+    emit_insn_before (gen_nop (), last_insn);
+
+  /* Insert the label.  */
+  emit_label_before (label, last_insn);
+
+  return last_insn;
+}
+
+/* If IS_END is false, expand a canonical doloop_begin RTL into the
+   PRU-specific doloop_begin_internal.  Otherwise expand doloop_end to
+   doloop_end_internal.  */
+void
+pru_emit_doloop (rtx *operands, int is_end)
+{
+  rtx tag;
+
+  if (cfun->machine->doloop_tags == 0
+      || cfun->machine->doloop_tag_from_end == is_end)
+    {
+      cfun->machine->doloop_tags++;
+      cfun->machine->doloop_tag_from_end = is_end;
+    }
+
+  tag = GEN_INT (cfun->machine->doloop_tags - 1);
+  machine_mode opmode = GET_MODE (operands[0]);
+  if (is_end)
+    {
+      if (opmode == HImode)
+	emit_jump_insn (gen_doloop_end_internalhi (operands[0],
+						   operands[1], tag));
+      else if (opmode == SImode)
+	emit_jump_insn (gen_doloop_end_internalsi (operands[0],
+						   operands[1], tag));
+      else
+	gcc_unreachable ();
+    }
+  else
+    {
+      if (opmode == HImode)
+	emit_insn (gen_doloop_begin_internalhi (operands[0], operands[0], tag));
+      else if (opmode == SImode)
+	emit_insn (gen_doloop_begin_internalsi (operands[0], operands[0], tag));
+      else
+	gcc_unreachable ();
+    }
+}
+
+
+/* Code for converting doloop_begins and doloop_ends into valid
+   PRU instructions.  Idea and code snippets borrowed from mep port.
+
+   A doloop_begin is just a placeholder:
+
+	$count = unspec ($count)
+
+   where $count is initially the number of iterations.
+   doloop_end has the form:
+
+	if (--$count == 0) goto label
+
+   The counter variable is private to the doloop insns; nothing else
+   relies on its value.
+
+   There are three cases, in decreasing order of preference:
+
+      1.  A loop has exactly one doloop_begin and one doloop_end.
+	 The doloop_end branches to the first instruction after
+	 the doloop_begin.
+
+	 In this case we can replace the doloop_begin with a LOOP
+	 instruction and remove the doloop_end.  I.e.:
+
+		$count1 = unspec ($count1)
+	    label:
+		...
+		if (--$count2 != 0) goto label
+
+	  becomes:
+
+		LOOP end_label,$count1
+	    label:
+		...
+	    end_label:
+		# end loop
+
+      2.  As for (1), except there are several doloop_ends.  One of them
+	 (call it X) falls through to a label L.  All the others fall
+	 through to branches to L.
+
+	 In this case, we remove X and replace the other doloop_ends
+	 with branches to the LOOP label.  For example:
+
+		$count1 = unspec ($count1)
+	    label:
+		...
+		if (--$count1 != 0) goto label
+	    end_label:
+		...
+		if (--$count2 != 0) goto label
+		goto end_label
+
+	 becomes:
+
+		LOOP end_label,$count1
+	    label:
+		...
+	    end_label:
+		# end loop
+		...
+		goto end_label
+
+      3.  The fallback case.  Replace doloop_begins with:
+
+		$count = $count
+
+	 Replace doloop_ends with the equivalent of:
+
+		$count = $count - 1
+		if ($count != 0) goto loop_label
+
+	 */
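The fallback conversion in case (3) can be modelled in plain C. This is a hypothetical illustration of the decrement-and-branch semantics (the function name is invented, not part of the port):

```c
/* Model of the case (3) fallback: doloop_begin degenerates into a
   plain counter copy, and each doloop_end becomes an explicit
   decrement-and-branch.  Returns how many times the loop body runs
   for a given initial counter value (assumed non-zero).  */
static unsigned int
fallback_iterations (unsigned int count)
{
  unsigned int body_runs = 0;

loop_label:
  body_runs++;              /* ...the loop body...  */
  count = count - 1;        /* $count = $count - 1 */
  if (count != 0)           /* if ($count != 0) goto loop_label */
    goto loop_label;

  return body_runs;
}
```

The counter is private to the doloop insns, so the model only needs to confirm that an initial value of N yields exactly N body executions.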
+
+/* A structure describing one doloop_begin.  */
+struct pru_doloop_begin {
+  /* The next doloop_begin with the same tag.  */
+  struct pru_doloop_begin *next;
+
+  /* The instruction itself.  */
+  rtx_insn *insn;
+
+  /* The initial counter value.  */
+  rtx loop_count;
+
+  /* The counter register.  */
+  rtx counter;
+};
+
+/* A structure describing a doloop_end.  */
+struct pru_doloop_end {
+  /* The next doloop_end with the same loop tag.  */
+  struct pru_doloop_end *next;
+
+  /* The instruction itself.  */
+  rtx_insn *insn;
+
+  /* The first instruction after INSN when the branch isn't taken.  */
+  rtx_insn *fallthrough;
+
+  /* The location of the counter value.  Since doloop_end_internal is a
+     jump instruction, it has to allow the counter to be stored anywhere
+     (any non-fixed register).  */
+  rtx counter;
+
+  /* The target label (the place where the insn branches when the counter
+     isn't zero).  */
+  rtx label;
+
+  /* A scratch register.  Only available when COUNTER isn't stored
+     in a general register.  */
+  rtx scratch;
+};
+
+
+/* One do-while loop.  */
+struct pru_doloop {
+  /* All the doloop_begins for this loop (in no particular order).  */
+  struct pru_doloop_begin *begin;
+
+  /* All the doloop_ends.  When there is more than one, arrange things
+     so that the first one is the most likely to be X in case (2) above.  */
+  struct pru_doloop_end *end;
+};
+
+
+/* Return true if loop LOOP can be converted into PRU LOOP-instruction
+   form (that is, if it matches cases (1) or (2) above).  */
+
+static bool
+pru_repeat_loop_p (struct pru_doloop *loop)
+{
+  struct pru_doloop_end *end;
+  rtx_insn *fallthrough;
+
+  /* There must be exactly one doloop_begin and at least one doloop_end.  */
+  if (loop->begin == 0 || loop->end == 0 || loop->begin->next != 0)
+    return false;
+
+  /* The first doloop_end (X) must branch back to the insn after
+     the doloop_begin.  */
+  if (prev_real_insn (as_a<rtx_insn *> (loop->end->label)) != loop->begin->insn)
+    return false;
+
+  /* Check that the first doloop_end (X) can actually reach the
+     doloop_begin with a U8_PCREL relocation for the LOOP instruction.  */
+  if (get_attr_length (loop->end->insn) != 4)
+    return false;
+
+  /* All the other doloop_ends must branch to the same place as X.
+     When the branch isn't taken, they must jump to the instruction
+     after X.  */
+  fallthrough = loop->end->fallthrough;
+  for (end = loop->end->next; end != 0; end = end->next)
+    if (end->label != loop->end->label
+	|| !simplejump_p (end->fallthrough)
+	|| fallthrough
+	   != next_real_insn (JUMP_LABEL_AS_INSN (end->fallthrough)))
+      return false;
+
+  return true;
+}
+
+
+/* The main repeat reorg function.  See comment above for details.  */
+
+static void
+pru_reorg_loop (rtx_insn *insns)
+{
+  rtx_insn *insn;
+  struct pru_doloop *loops, *loop;
+  struct pru_doloop_begin *begin;
+  struct pru_doloop_end *end;
+  size_t tmpsz;
+
+  /* Quick exit if we haven't created any loops.  */
+  if (cfun->machine->doloop_tags == 0)
+    return;
+
+  /* Create an array of pru_doloop structures.  */
+  tmpsz = sizeof (loops[0]) * cfun->machine->doloop_tags;
+  loops = (struct pru_doloop *) alloca (tmpsz);
+  memset (loops, 0, sizeof (loops[0]) * cfun->machine->doloop_tags);
+
+  /* Search the function for do-while insns and group them by loop tag.  */
+  for (insn = insns; insn; insn = NEXT_INSN (insn))
+    if (INSN_P (insn))
+      switch (recog_memoized (insn))
+	{
+	case CODE_FOR_doloop_begin_internalhi:
+	case CODE_FOR_doloop_begin_internalsi:
+	  insn_extract (insn);
+	  loop = &loops[INTVAL (recog_data.operand[2])];
+
+	  tmpsz = sizeof (struct pru_doloop_begin);
+	  begin = (struct pru_doloop_begin *) alloca (tmpsz);
+	  begin->next = loop->begin;
+	  begin->insn = insn;
+	  begin->loop_count = recog_data.operand[1];
+	  begin->counter = recog_data.operand[0];
+
+	  loop->begin = begin;
+	  break;
+
+	case CODE_FOR_doloop_end_internalhi:
+	case CODE_FOR_doloop_end_internalsi:
+	  insn_extract (insn);
+	  loop = &loops[INTVAL (recog_data.operand[2])];
+
+	  tmpsz = sizeof (struct pru_doloop_end);
+	  end = (struct pru_doloop_end *) alloca (tmpsz);
+	  end->insn = insn;
+	  end->fallthrough = next_real_insn (insn);
+	  end->counter = recog_data.operand[0];
+	  end->label = recog_data.operand[1];
+	  end->scratch = recog_data.operand[3];
+
+	  /* If this insn falls through to an unconditional jump,
+	     give it a lower priority than the others.  */
+	  if (loop->end != 0 && simplejump_p (end->fallthrough))
+	    {
+	      end->next = loop->end->next;
+	      loop->end->next = end;
+	    }
+	  else
+	    {
+	      end->next = loop->end;
+	      loop->end = end;
+	    }
+	  break;
+	}
+
+  /* Convert the insns for each loop in turn.  */
+  for (loop = loops; loop < loops + cfun->machine->doloop_tags; loop++)
+    if (pru_repeat_loop_p (loop))
+      {
+	/* Case (1) or (2).  */
+	rtx_code_label *repeat_label;
+	rtx label_ref;
+
+	/* Create a new label for the repeat insn.  */
+	repeat_label = gen_label_rtx ();
+
+	/* Replace the doloop_begin with a LOOP instruction.  We get
+	   rid of the iteration register because the LOOP instruction
+	   uses the PRU core's internal loop counter register.  */
+	label_ref = gen_rtx_LABEL_REF (VOIDmode, repeat_label);
+	machine_mode loop_mode = GET_MODE (loop->begin->loop_count);
+	if (loop_mode == HImode)
+	  emit_insn_before (gen_pruloophi (loop->begin->loop_count, label_ref),
+			    loop->begin->insn);
+	else if (loop_mode == SImode)
+	  {
+	    rtx loop_rtx = gen_pruloopsi (loop->begin->loop_count, label_ref);
+	    emit_insn_before (loop_rtx, loop->begin->insn);
+	  }
+	else if (loop_mode == VOIDmode)
+	  {
+	    gcc_assert (CONST_INT_P (loop->begin->loop_count));
+	    gcc_assert (UBYTE_INT (INTVAL (loop->begin->loop_count)));
+	    rtx loop_rtx = gen_pruloopsi (loop->begin->loop_count, label_ref);
+	    emit_insn_before (loop_rtx, loop->begin->insn);
+	  }
+	else
+	  gcc_unreachable ();
+	delete_insn (loop->begin->insn);
+
+	/* Insert the repeat label before the first doloop_end.
+	   Fill the gap with nops if the LOOP insn is less than 2
+	   instructions away from loop->end.  */
+	pru_insert_loop_label_last (loop->end->insn, repeat_label,
+				    loop->end->next != 0);
+
+	/* Emit a pruloop_end (to improve the readability of the output).  */
+	emit_insn_before (gen_pruloop_end (), loop->end->insn);
+
+	/* HACK: TODO: This is usually not needed, but is required for
+	   a few rare cases where a JUMP that breaks the loop
+	   references the LOOP_END address.  In other words, since
+	   we're missing a real "loop_end" instruction, a loop "break"
+	   may accidentally reference the loop end itself, thereby
+	   continuing the loop.  */
+	for (insn = NEXT_INSN (loop->end->insn);
+	     insn != next_real_insn (loop->end->insn);
+	     insn = NEXT_INSN (insn))
+	  {
+	    if (LABEL_P (insn) && LABEL_NUSES (insn) > 0)
+	      emit_insn_before (gen_nop_loop_guard (), loop->end->insn);
+	  }
+
+	/* Delete the first doloop_end.  */
+	delete_insn (loop->end->insn);
+
+	/* Replace the others with branches to REPEAT_LABEL.  */
+	for (end = loop->end->next; end != 0; end = end->next)
+	  {
+	    rtx_insn *newjmp;
+	    newjmp = emit_jump_insn_before (gen_jump (repeat_label), end->insn);
+	    JUMP_LABEL (newjmp) = repeat_label;
+	    delete_insn (end->insn);
+	    delete_insn (end->fallthrough);
+	  }
+      }
+    else
+      {
+	/* Case (3).  First replace all the doloop_begins with setting
+	   the HW register used for loop counter.  */
+	for (begin = loop->begin; begin != 0; begin = begin->next)
+	  {
+	    insn = gen_move_insn (copy_rtx (begin->counter),
+				  copy_rtx (begin->loop_count));
+	    emit_insn_before (insn, begin->insn);
+	    delete_insn (begin->insn);
+	  }
+
+	/* Replace all the doloop_ends with decrement-and-branch sequences.  */
+	for (end = loop->end; end != 0; end = end->next)
+	  {
+	    rtx reg;
+
+	    start_sequence ();
+
+	    /* Load the counter value into a general register.  */
+	    reg = end->counter;
+	    if (!REG_P (reg) || REGNO (reg) > LAST_NONIO_GP_REG)
+	      {
+		reg = end->scratch;
+		emit_move_insn (copy_rtx (reg), copy_rtx (end->counter));
+	      }
+
+	    /* Decrement the counter.  */
+	    emit_insn (gen_add3_insn (copy_rtx (reg), copy_rtx (reg),
+				      constm1_rtx));
+
+	    /* Copy it back to its original location.  */
+	    if (reg != end->counter)
+	      emit_move_insn (copy_rtx (end->counter), copy_rtx (reg));
+
+	    /* Jump back to the start label.  */
+	    insn = emit_jump_insn (gen_cbranchsi4 (gen_rtx_NE (VOIDmode, reg,
+							       const0_rtx),
+						   reg,
+						   const0_rtx,
+						   end->label));
+
+	    JUMP_LABEL (insn) = end->label;
+	    LABEL_NUSES (end->label)++;
+
+	    /* Emit the whole sequence before the doloop_end.  */
+	    insn = get_insns ();
+	    end_sequence ();
+	    emit_insn_before (insn, end->insn);
+
+	    /* Delete the doloop_end.  */
+	    delete_insn (end->insn);
+	  }
+      }
+}
+
+/* Implement TARGET_MACHINE_DEPENDENT_REORG.  */
+static void
+pru_reorg (void)
+{
+  rtx_insn *insns = get_insns ();
+
+  compute_bb_for_insn ();
+  df_analyze ();
+
+  /* Need correct insn lengths for allowing LOOP instruction
+     emitting due to U8_PCREL limitations.  */
+  shorten_branches (get_insns ());
+
+  /* The generic reorg_loops () is not suitable for PRU because it
+     does not handle doloop_begin/end tying.  Furthermore, we need our
+     doloop_begin emitted before reload.  It is difficult to coalesce
+     UBYTE constant initial loop values into the LOOP insn during the
+     machine reorg phase.  */
+  pru_reorg_loop (insns);
+
+  df_finish_pass (false);
+}
+
+/* Enumerate all PRU-specific builtins.  */
+enum pru_builtin
+{
+  PRU_BUILTIN_DELAY_CYCLES,
+  PRU_BUILTIN_max
+};
+
+static GTY(()) tree pru_builtins [(int) PRU_BUILTIN_max];
+
+/* Implement TARGET_INIT_BUILTINS.  */
+
+static void
+pru_init_builtins (void)
+{
+  tree void_ftype_longlong
+    = build_function_type_list (void_type_node,
+				long_long_integer_type_node,
+				NULL);
+
+  pru_builtins[PRU_BUILTIN_DELAY_CYCLES]
+    = add_builtin_function ("__delay_cycles", void_ftype_longlong,
+			    PRU_BUILTIN_DELAY_CYCLES, BUILT_IN_MD, NULL,
+			    NULL_TREE);
+}
+
+/* Implement TARGET_BUILTIN_DECL.  */
+
+static tree
+pru_builtin_decl (unsigned code, bool initialize_p ATTRIBUTE_UNUSED)
+{
+  switch (code)
+    {
+    case PRU_BUILTIN_DELAY_CYCLES:
+      return pru_builtins[code];
+    default:
+      return error_mark_node;
+    }
+}
+
+/* Emit a sequence of one or more delay_cycles_X insns, in order to generate
+   code that delays exactly ARG cycles.  */
+
+static rtx
+pru_expand_delay_cycles (rtx arg)
+{
+  HOST_WIDE_INT c, n;
+
+  if (GET_CODE (arg) != CONST_INT)
+    {
+      error ("%<__delay_cycles%> only takes constant arguments");
+      return NULL_RTX;
+    }
+
+  c = INTVAL (arg);
+
+  if (HOST_BITS_PER_WIDE_INT > 32)
+    {
+      if (c < 0)
+	{
+	  error ("%<__delay_cycles%> only takes non-negative cycle counts");
+	  return NULL_RTX;
+	}
+    }
+
+  emit_insn (gen_delay_cycles_start (arg));
+
+  /* A 32-bit loop of x iterations takes 2 + 2x cycles.  */
+  if (c > 2 * 0xffff + 1)
+    {
+      n = (c - 2) / 2;
+      c -= (n * 2) + 2;
+      if ((unsigned long long) n > 0xffffffffULL)
+	{
+	  error ("%<__delay_cycles%> is limited to 32-bit loop counts");
+	  return NULL_RTX;
+	}
+      emit_insn (gen_delay_cycles_2x_plus2_si (GEN_INT (n)));
+    }
+
+  /* A 16-bit loop of x iterations takes 1 + 2x cycles.  */
+  if (c > 2)
+    {
+      n = (c - 1) / 2;
+      c -= (n * 2) + 1;
+
+      emit_insn (gen_delay_cycles_2x_plus1_hi (GEN_INT (n)));
+    }
+
+  while (c > 0)
+    {
+      emit_insn (gen_delay_cycles_1 ());
+      c -= 1;
+    }
+
+  emit_insn (gen_delay_cycles_end (arg));
+
+  return NULL_RTX;
+}
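The decomposition performed by `pru_expand_delay_cycles` can be checked with arithmetic alone. The sketch below (a hypothetical model, not part of the port; it mirrors only the cycle bookkeeping, not the RTL emission) splits a requested cycle count into the 32-bit loop, the 16-bit loop and single-cycle insns, and sums the pieces back up:

```c
/* Arithmetic-only model of pru_expand_delay_cycles: split a requested
   cycle count C into a 32-bit loop (2 + 2*n cycles), a 16-bit loop
   (1 + 2*n cycles) and single-cycle insns, then return the sum of the
   emitted pieces.  A correct decomposition returns exactly C.  */
static long long
model_delay_cycles (long long c)
{
  long long total = 0;
  long long n;

  if (c > 2 * 0xffffLL + 1)     /* Too large for the 16-bit loop.  */
    {
      n = (c - 2) / 2;
      c -= n * 2 + 2;
      total += n * 2 + 2;
    }

  if (c > 2)                    /* Covered by the 16-bit loop.  */
    {
      n = (c - 1) / 2;
      c -= n * 2 + 1;
      total += n * 2 + 1;
    }

  total += c;                   /* One single-cycle insn per leftover.  */
  return total;
}
```

Both odd and even counts decompose exactly, which is why the expander never needs more than one insn of each kind plus at most two single-cycle insns.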
+
+
+/* Implement TARGET_EXPAND_BUILTIN.  Expand an expression EXP that calls
+   a built-in function, with result going to TARGET if that's convenient
+   (and in mode MODE if that's convenient).
+   SUBTARGET may be used as the target for computing one of EXP's operands.
+   IGNORE is nonzero if the value is to be ignored.  */
+
+static rtx
+pru_expand_builtin (tree exp, rtx target ATTRIBUTE_UNUSED,
+		    rtx subtarget ATTRIBUTE_UNUSED,
+		    machine_mode mode ATTRIBUTE_UNUSED,
+		    int ignore ATTRIBUTE_UNUSED)
+{
+  tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
+  unsigned int fcode = DECL_FUNCTION_CODE (fndecl);
+  rtx arg1 = expand_normal (CALL_EXPR_ARG (exp, 0));
+
+  if (fcode == PRU_BUILTIN_DELAY_CYCLES)
+    return pru_expand_delay_cycles (arg1);
+
+  internal_error ("bad builtin code");
+
+  return NULL_RTX;
+}
+
+/* Remember the last target of pru_set_current_function.  */
+static GTY(()) tree pru_previous_fndecl;
+
+/* Establish appropriate back-end context for processing the function
+   FNDECL.  The argument might be NULL to indicate processing at top
+   level, outside of any function scope.  */
+static void
+pru_set_current_function (tree fndecl)
+{
+  tree old_tree = (pru_previous_fndecl
+		   ? DECL_FUNCTION_SPECIFIC_TARGET (pru_previous_fndecl)
+		   : NULL_TREE);
+
+  tree new_tree = (fndecl
+		   ? DECL_FUNCTION_SPECIFIC_TARGET (fndecl)
+		   : NULL_TREE);
+
+  if (fndecl && fndecl != pru_previous_fndecl)
+    {
+      pru_previous_fndecl = fndecl;
+      if (old_tree == new_tree)
+	;
+
+      else if (new_tree)
+	{
+	  cl_target_option_restore (&global_options,
+				    TREE_TARGET_OPTION (new_tree));
+	  target_reinit ();
+	}
+
+      else if (old_tree)
+	{
+	  struct cl_target_option *def
+	    = TREE_TARGET_OPTION (target_option_current_node);
+
+	  cl_target_option_restore (&global_options, def);
+	  target_reinit ();
+	}
+    }
+}
+
+/* Implement TARGET_UNWIND_WORD_MODE.
+
+   Since PRU is really a 32-bit CPU, the default word_mode is not suitable.  */
+static scalar_int_mode
+pru_unwind_word_mode (void)
+{
+  return SImode;
+}
+
+
+/* Initialize the GCC target structure.  */
+#undef TARGET_ASM_FUNCTION_PROLOGUE
+#define TARGET_ASM_FUNCTION_PROLOGUE pru_asm_function_prologue
+#undef TARGET_ASM_INTEGER
+#define TARGET_ASM_INTEGER pru_assemble_integer
+
+#undef TARGET_ASM_FILE_START
+#define TARGET_ASM_FILE_START pru_file_start
+
+#undef TARGET_INIT_BUILTINS
+#define TARGET_INIT_BUILTINS pru_init_builtins
+#undef TARGET_EXPAND_BUILTIN
+#define TARGET_EXPAND_BUILTIN pru_expand_builtin
+#undef TARGET_BUILTIN_DECL
+#define TARGET_BUILTIN_DECL pru_builtin_decl
+
+#undef TARGET_FUNCTION_OK_FOR_SIBCALL
+#define TARGET_FUNCTION_OK_FOR_SIBCALL hook_bool_tree_tree_true
+
+#undef TARGET_CAN_ELIMINATE
+#define TARGET_CAN_ELIMINATE pru_can_eliminate
+
+#undef TARGET_MODES_TIEABLE_P
+#define TARGET_MODES_TIEABLE_P pru_modes_tieable_p
+
+#undef TARGET_HARD_REGNO_MODE_OK
+#define TARGET_HARD_REGNO_MODE_OK pru_hard_regno_mode_ok
+
+#undef  TARGET_HARD_REGNO_SCRATCH_OK
+#define TARGET_HARD_REGNO_SCRATCH_OK pru_hard_regno_scratch_ok
+
+#undef TARGET_FUNCTION_ARG
+#define TARGET_FUNCTION_ARG pru_function_arg
+
+#undef TARGET_FUNCTION_ARG_ADVANCE
+#define TARGET_FUNCTION_ARG_ADVANCE pru_function_arg_advance
+
+#undef TARGET_ARG_PARTIAL_BYTES
+#define TARGET_ARG_PARTIAL_BYTES pru_arg_partial_bytes
+
+#undef TARGET_FUNCTION_VALUE
+#define TARGET_FUNCTION_VALUE pru_function_value
+
+#undef TARGET_LIBCALL_VALUE
+#define TARGET_LIBCALL_VALUE pru_libcall_value
+
+#undef TARGET_FUNCTION_VALUE_REGNO_P
+#define TARGET_FUNCTION_VALUE_REGNO_P pru_function_value_regno_p
+
+#undef TARGET_RETURN_IN_MEMORY
+#define TARGET_RETURN_IN_MEMORY pru_return_in_memory
+
+#undef TARGET_MUST_PASS_IN_STACK
+#define TARGET_MUST_PASS_IN_STACK must_pass_in_stack_var_size
+
+#undef TARGET_LEGITIMATE_ADDRESS_P
+#define TARGET_LEGITIMATE_ADDRESS_P pru_legitimate_address_p
+
+#undef TARGET_PREFERRED_RELOAD_CLASS
+#define TARGET_PREFERRED_RELOAD_CLASS pru_preferred_reload_class
+
+#undef TARGET_INIT_LIBFUNCS
+#define TARGET_INIT_LIBFUNCS pru_init_libfuncs
+#undef TARGET_LIBFUNC_GNU_PREFIX
+#define TARGET_LIBFUNC_GNU_PREFIX true
+
+#undef TARGET_RTX_COSTS
+#define TARGET_RTX_COSTS pru_rtx_costs
+
+#undef TARGET_PRINT_OPERAND
+#define TARGET_PRINT_OPERAND pru_print_operand
+
+#undef TARGET_PRINT_OPERAND_ADDRESS
+#define TARGET_PRINT_OPERAND_ADDRESS pru_print_operand_address
+
+#undef TARGET_OPTION_OVERRIDE
+#define TARGET_OPTION_OVERRIDE pru_option_override
+
+#undef TARGET_SET_CURRENT_FUNCTION
+#define TARGET_SET_CURRENT_FUNCTION pru_set_current_function
+
+#undef  TARGET_MACHINE_DEPENDENT_REORG
+#define TARGET_MACHINE_DEPENDENT_REORG  pru_reorg
+
+#undef  TARGET_CAN_USE_DOLOOP_P
+#define TARGET_CAN_USE_DOLOOP_P		pru_can_use_doloop_p
+
+#undef TARGET_INVALID_WITHIN_DOLOOP
+#define TARGET_INVALID_WITHIN_DOLOOP  pru_invalid_within_doloop
+
+#undef  TARGET_UNWIND_WORD_MODE
+#define TARGET_UNWIND_WORD_MODE pru_unwind_word_mode
+
+#undef TARGET_HAVE_SPECULATION_SAFE_VALUE
+#define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed
+
+struct gcc_target targetm = TARGET_INITIALIZER;
+
+#include "gt-pru.h"
diff --git a/gcc/config/pru/pru.h b/gcc/config/pru/pru.h
new file mode 100644
index 00000000000..de1b9100209
--- /dev/null
+++ b/gcc/config/pru/pru.h
@@ -0,0 +1,551 @@ 
+/* Definitions of target machine for TI PRU.
+   Copyright (C) 2014-2018 Free Software Foundation, Inc.
+   Contributed by Dimitar Dimitrov <dimitar@dinux.eu>
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_PRU_H
+#define GCC_PRU_H
+
+#include "config/pru/pru-opts.h"
+
+/* Define built-in preprocessor macros.  */
+#define TARGET_CPU_CPP_BUILTINS()		    \
+  do						    \
+    {						    \
+      builtin_define_std ("__PRU__");		    \
+      builtin_define_std ("__pru__");		    \
+      builtin_define_std ("__PRU_V3__");	    \
+      builtin_define_std ("__LITTLE_ENDIAN__");	    \
+      builtin_define_std ("__little_endian__");	    \
+      /* Trampolines are disabled for now.  */	    \
+      builtin_define_std ("NO_TRAMPOLINES");	    \
+    }						    \
+  while (0)
+
+/* The TI ABI implementation is not feature-complete (e.g. function
+   pointers are not supported), so we cannot list it as a multilib
+   variant.  To prevent misuse by users, do not link any of the
+   standard libraries.  */
+#define DRIVER_SELF_SPECS			      \
+  "%{mabi=ti:-nodefaultlibs} "			      \
+  "%{mmcu=*:-specs=device-specs/%*%s %<mmcu=*} "
+
+#undef CPP_SPEC
+#define CPP_SPEC					\
+  "%(cpp_device) "					\
+  "%{mabi=ti:-D__PRU_EABI_TI__; :-D__PRU_EABI_GNU__}"
+
+/* Do not relax when in TI ABI mode, since TI tools do not always
+   emit PRU_S10_PCREL relocations.  */
+#undef  LINK_SPEC
+#define LINK_SPEC					    \
+  "%(link_device) "					    \
+  "%{mabi=ti:--no-relax;:%{mno-relax:--no-relax;:--relax}} "   \
+  "%{shared:%eshared is not supported} "
+
+/* CRT0 is carefully maintained to be compatible with both GNU and TI ABIs.  */
+#undef  STARTFILE_SPEC
+#define STARTFILE_SPEC							\
+  "%{!pg:%{minrt:crt0-minrt.o%s}%{!minrt:crt0.o%s}} %{!mabi=ti:-lgcc} "
+
+#undef  ENDFILE_SPEC
+#define ENDFILE_SPEC "%{!mabi=ti:-lgloss} "
+
+/* TI ABI mandates that ELF symbols do not start with any prefix.  */
+#undef USER_LABEL_PREFIX
+#define USER_LABEL_PREFIX ""
+
+#undef LOCAL_LABEL_PREFIX
+#define LOCAL_LABEL_PREFIX ".L"
+
+/* Storage layout.  */
+
+#define DEFAULT_SIGNED_CHAR 0
+#define BITS_BIG_ENDIAN 0
+#define BYTES_BIG_ENDIAN 0
+#define WORDS_BIG_ENDIAN 0
+
+/* PRU is represented in GCC as an 8-bit CPU with fast 16-bit and
+   32-bit arithmetic.  */
+#define BITS_PER_WORD 8
+
+#ifdef IN_LIBGCC2
+/* This is to get correct SI and DI modes in libgcc2.c (32 and 64 bits).  */
+#define UNITS_PER_WORD 4
+#else
+/* Width of a word, in units (bytes).  */
+#define UNITS_PER_WORD 1
+#endif
+
+#define POINTER_SIZE 32
+#define BIGGEST_ALIGNMENT 8
+#define STRICT_ALIGNMENT 0
+#define FUNCTION_BOUNDARY 8	/* Func pointers are word-addressed.  */
+#define PARM_BOUNDARY 8
+#define STACK_BOUNDARY 8
+#define MAX_FIXED_MODE_SIZE 64
+
+#define POINTERS_EXTEND_UNSIGNED 1
+
+/* Layout of source language data types.  */
+
+#define INT_TYPE_SIZE 32
+#define SHORT_TYPE_SIZE 16
+#define LONG_TYPE_SIZE 32
+#define LONG_LONG_TYPE_SIZE 64
+#define FLOAT_TYPE_SIZE 32
+#define DOUBLE_TYPE_SIZE 64
+#define LONG_DOUBLE_TYPE_SIZE DOUBLE_TYPE_SIZE
+
+#undef SIZE_TYPE
+#define SIZE_TYPE "unsigned int"
+
+#undef PTRDIFF_TYPE
+#define PTRDIFF_TYPE "int"
+
+
+/* Basic characteristics of PRU registers:
+
+   Regno  Name
+   0      r0		  Caller Saved.  Also used as a static chain register.
+   1      r1		  Caller Saved.  Also used as a temporary by function
+			  profiler and function prologue/epilogue.
+   2      r2       sp	  Stack Pointer
+   3*     r3.w0    ra	  Return Address (16-bit)
+   4      r4       fp	  Frame Pointer
+   5-13   r5-r13	  Callee Saved Registers
+   14-29  r14-r29	  Register Arguments.  Caller Saved Registers.
+   14-15  r14-r15	  Return Location
+   30     r30		  Special I/O register.  Not used by compiler.
+   31     r31		  Special I/O register.  Not used by compiler.
+
+   32     loop_cntr	  Internal register used as a counter by LOOP insns
+
+   33     pc		  Not an actual register
+
+   34     fake_fp	  Fake Frame Pointer (always eliminated)
+   35     fake_ap	  Fake Argument Pointer (always eliminated)
+   36			  First Pseudo Register
+
+   The definitions for all the hard register numbers are located in pru.md.
+*/
+
+#define FIXED_REGISTERS				\
+  {						\
+/*   0 */  0,0,0,0, 0,0,0,0, 1,1,1,1, 1,1,1,1,	\
+/*   4 */  0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,	\
+/*   8 */  0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,	\
+/*  12 */  0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,	\
+/*  16 */  0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,	\
+/*  20 */  0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,	\
+/*  24 */  0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,	\
+/*  28 */  0,0,0,0, 0,0,0,0, 1,1,1,1, 1,1,1,1,	\
+/*  32 */  1,1,1,1, 1,1,1,1, 1,1,1,1, 1,1,1,1	\
+  }
+
+/* Call used == caller saved + fixed regs + args + ret vals.  */
+#define CALL_USED_REGISTERS			\
+  {						\
+/*   0 */  1,1,1,1, 1,1,1,1, 1,1,1,1, 1,1,1,1,	\
+/*   4 */  0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,	\
+/*   8 */  0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,	\
+/*  12 */  0,0,0,0, 0,0,0,0, 1,1,1,1, 1,1,1,1,	\
+/*  16 */  1,1,1,1, 1,1,1,1, 1,1,1,1, 1,1,1,1,	\
+/*  20 */  1,1,1,1, 1,1,1,1, 1,1,1,1, 1,1,1,1,	\
+/*  24 */  1,1,1,1, 1,1,1,1, 1,1,1,1, 1,1,1,1,	\
+/*  28 */  1,1,1,1, 1,1,1,1, 1,1,1,1, 1,1,1,1,	\
+/*  32 */  1,1,1,1, 1,1,1,1, 1,1,1,1, 1,1,1,1	\
+  }
+
+#define __pru_RSEQ(X)  (X) * 4 + 0, (X) * 4 + 1, (X) * 4 + 2, (X) * 4 + 3
+#define REG_ALLOC_ORDER							    \
+  {									    \
+    /* Call-clobbered, yet not used for parameters.  */			    \
+    __pru_RSEQ (0),  __pru_RSEQ ( 1),					    \
+									    \
+    __pru_RSEQ (14), __pru_RSEQ (15), __pru_RSEQ (16), __pru_RSEQ (17),	    \
+    __pru_RSEQ (18), __pru_RSEQ (19), __pru_RSEQ (20), __pru_RSEQ (21),	    \
+    __pru_RSEQ (22), __pru_RSEQ (23), __pru_RSEQ (24), __pru_RSEQ (25),	    \
+    __pru_RSEQ (26), __pru_RSEQ (27), __pru_RSEQ (28), __pru_RSEQ (29),	    \
+									    \
+    __pru_RSEQ ( 5), __pru_RSEQ ( 6), __pru_RSEQ ( 7), __pru_RSEQ ( 8),	    \
+    __pru_RSEQ ( 9), __pru_RSEQ (10), __pru_RSEQ (11), __pru_RSEQ (12),	    \
+    __pru_RSEQ (13),							    \
+									    \
+    __pru_RSEQ ( 4),							    \
+    __pru_RSEQ ( 2), __pru_RSEQ ( 3),					    \
+									    \
+    /* I/O and virtual registers.  */					    \
+    __pru_RSEQ (30), __pru_RSEQ (31), __pru_RSEQ (32), __pru_RSEQ (33),	    \
+    __pru_RSEQ (34), __pru_RSEQ (35)					    \
+  }
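Because `BITS_PER_WORD` is 8, every 32-bit CPU register occupies four consecutive byte-wide hard registers, and `__pru_RSEQ` expands each CPU register number into that four-entry run. The mapping can be sketched as a plain function (the name is illustrative, not part of the port):

```c
/* Sketch of the hard-register numbering used throughout this port:
   GCC models PRU as an 8-bit machine, so 32-bit CPU register rN maps
   to hard registers N*4 .. N*4+3, one per byte, exactly as the
   __pru_RSEQ macro enumerates them.  */
static int
pru_byte_hard_regno (int cpu_reg, int byte)
{
  return cpu_reg * 4 + byte;
}
```

For example, the stack pointer r2 starts at hard register 8, and r31.b3 is hard register 127, the last I/O byte register before the internal loop counter.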
+
+/* Register Classes.  */
+
+enum reg_class
+{
+  NO_REGS,
+  SIB_REGS,
+  LOOPCNTR_REGS,
+  GP_REGS,
+  ALL_REGS,
+  LIM_REG_CLASSES
+};
+
+#define N_REG_CLASSES (int) LIM_REG_CLASSES
+
+#define REG_CLASS_NAMES   \
+  {  "NO_REGS",		  \
+     "SIB_REGS",	  \
+     "LOOPCNTR_REGS",	  \
+     "GP_REGS",		  \
+     "ALL_REGS" }
+
+#define GENERAL_REGS ALL_REGS
+
+#define REG_CLASS_CONTENTS					\
+  {								\
+    /* NO_REGS	      */ { 0, 0, 0, 0, 0},			\
+    /* SIB_REGS	      */ { 0xf, 0xff000000, ~0, 0xffffff, 0},	\
+    /* LOOPCNTR_REGS  */ { 0, 0, 0, 0, 0xf},			\
+    /* GP_REGS	      */ { ~0, ~0, ~0, ~0, 0},			\
+    /* ALL_REGS	      */ { ~0, ~0, ~0, ~0, ~0}			\
+  }
+
+
+#define GP_REG_P(REGNO) ((unsigned)(REGNO) <= LAST_GP_REG)
+#define REGNO_REG_CLASS(REGNO)						    \
+	((REGNO) >= FIRST_ARG_REGNO && (REGNO) <= LAST_ARG_REGNO ? SIB_REGS \
+	 : (REGNO) == STATIC_CHAIN_REGNUM ? SIB_REGS			    \
+	 : (REGNO) == LOOPCNTR_REG ? LOOPCNTR_REGS			    \
+	 : (REGNO) <= LAST_NONIO_GP_REG ? GP_REGS			    \
+	 : ALL_REGS)
+
+#define CLASS_MAX_NREGS(CLASS, MODE) \
+  ((GET_MODE_SIZE (MODE) + UNITS_PER_WORD - 1) / UNITS_PER_WORD)
+
+/* Arbitrarily set to a non-argument register.  Not defined by TI ABI.  */
+#define STATIC_CHAIN_REGNUM      0	/* r0 */
+
+/* Tests for various kinds of constants used in the PRU port.  */
+#define SHIFT_INT(X) ((X) >= 0 && (X) <= 31)
+
+#define UHWORD_INT(X) (IN_RANGE ((X), 0, 0xffff))
+#define SHWORD_INT(X) (IN_RANGE ((X), -32768, 32767))
+#define UBYTE_INT(X) (IN_RANGE ((X), 0, 0xff))
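The constant-range predicates above can be spot-checked when re-expressed as plain functions (names are illustrative, not part of the port):

```c
/* The PRU constant-range tests, rewritten as functions: unsigned
   byte, unsigned halfword, signed halfword, and valid shift count.  */
static int ubyte_int_p (long x)  { return x >= 0 && x <= 0xff; }
static int uhword_int_p (long x) { return x >= 0 && x <= 0xffff; }
static int shword_int_p (long x) { return x >= -32768 && x <= 32767; }
static int shift_int_p (long x)  { return x >= 0 && x <= 31; }
```

These ranges correspond to the immediate fields of PRU ALU and shift instructions, which is why the constraints in constraints.md reuse them.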
+
+/* Say that the epilogue uses the return address register.  Note that
+   in the case of sibcalls, the values "used by the epilogue" are
+   considered live at the start of the called function.  */
+#define EPILOGUE_USES(REGNO) (epilogue_completed &&	      \
+			      (((REGNO) == RA_REGNO)	      \
+			       || (REGNO) == (RA_REGNO + 1)))
+
+/* EXIT_IGNORE_STACK should be nonzero if, when returning from a function,
+   the stack pointer does not matter.  The value is tested only in
+   functions that have frame pointers.
+   No definition is equivalent to always zero.  */
+
+#define EXIT_IGNORE_STACK 1
+
+/* Trampolines are not supported, but define the macro to keep the
+   build working.  */
+#define TRAMPOLINE_SIZE 4
+
+/* Stack layout.  */
+#define STACK_GROWS_DOWNWARD  1
+#undef FRAME_GROWS_DOWNWARD
+#define FIRST_PARM_OFFSET(FUNDECL) 0
+
+/* Before the prologue, RA lives in r3.w2.  */
+#define INCOMING_RETURN_ADDR_RTX	gen_rtx_REG (HImode, RA_REGNO)
+
+#define RETURN_ADDR_RTX(C,F) pru_get_return_address (C)
+
+#define DWARF_FRAME_RETURN_COLUMN RA_REGNO
+
+/* The CFA includes the pretend args.  */
+#define ARG_POINTER_CFA_OFFSET(FNDECL) \
+  (gcc_assert ((FNDECL) == current_function_decl), \
+   FIRST_PARM_OFFSET (FNDECL) + crtl->args.pretend_args_size)
+
+/* Frame/arg pointer elimination settings.  */
+#define ELIMINABLE_REGS							\
+{{ ARG_POINTER_REGNUM,   STACK_POINTER_REGNUM},				\
+ { ARG_POINTER_REGNUM,   HARD_FRAME_POINTER_REGNUM},			\
+ { FRAME_POINTER_REGNUM, STACK_POINTER_REGNUM},				\
+ { FRAME_POINTER_REGNUM, HARD_FRAME_POINTER_REGNUM}}
+
+#define INITIAL_ELIMINATION_OFFSET(FROM, TO, OFFSET) \
+  (OFFSET) = pru_initial_elimination_offset ((FROM), (TO))
+
+#define HARD_REGNO_RENAME_OK(OLD_REG, NEW_REG) \
+  pru_hard_regno_rename_ok (OLD_REG, NEW_REG)
+
+/* Calling convention definitions.  */
+#if !defined(IN_LIBGCC2)
+
+#define NUM_ARG_REGS (LAST_ARG_REGNO - FIRST_ARG_REGNO + 1)
+
+typedef struct pru_args
+{
+  bool regs_used[NUM_ARG_REGS];
+} CUMULATIVE_ARGS;
+
+#define INIT_CUMULATIVE_ARGS(CUM, FNTYPE, LIBNAME, FNDECL, N_NAMED_ARGS)  \
+  do {									  \
+      memset ((CUM).regs_used, 0, sizeof ((CUM).regs_used));		  \
+  } while (0)
+
+#define FUNCTION_ARG_REGNO_P(REGNO) \
+  ((REGNO) >= FIRST_ARG_REGNO && (REGNO) <= LAST_ARG_REGNO)
+
+/* Passing function arguments on stack.  */
+#define PUSH_ARGS 0
+#define ACCUMULATE_OUTGOING_ARGS 1
+
+/* We define TARGET_RETURN_IN_MEMORY, so set to zero.  */
+#define DEFAULT_PCC_STRUCT_RETURN 0
+
+/* Profiling.  */
+#define PROFILE_BEFORE_PROLOGUE
+#define NO_PROFILE_COUNTERS 1
+#define FUNCTION_PROFILER(FILE, LABELNO) \
+  pru_function_profiler ((FILE), (LABELNO))
+
+#endif	/* IN_LIBGCC2 */
+
+/* Addressing modes.  */
+
+#define CONSTANT_ADDRESS_P(X) \
+  (CONSTANT_P (X) && memory_address_p (SImode, X))
+
+#define MAX_REGS_PER_ADDRESS 2
+#define BASE_REG_CLASS ALL_REGS
+#define INDEX_REG_CLASS ALL_REGS
+
+#define REGNO_OK_FOR_BASE_P(REGNO) pru_regno_ok_for_base_p ((REGNO), true)
+#define REGNO_OK_FOR_INDEX_P(REGNO) pru_regno_ok_for_index_p ((REGNO), true)
+
+/* Limited by the insns in pru-ldst-multiple.md.  */
+#define MOVE_MAX 8
+#define SLOW_BYTE_ACCESS 1
+
+/* It is as good to call a constant function address as to call an address
+   kept in a register.  */
+#define NO_FUNCTION_CSE 1
+
+/* Define output assembler language.  */
+
+#define ASM_APP_ON "#APP\n"
+#define ASM_APP_OFF "#NO_APP\n"
+
+#define ASM_COMMENT_START "# "
+
+#define GLOBAL_ASM_OP "\t.global\t"
+
+#define __pru_name_R(X)  X".b0", X".b1", X".b2", X".b3"
+#define REGISTER_NAMES		  \
+  {				  \
+    __pru_name_R ("r0"),	  \
+    __pru_name_R ("r1"),	  \
+    __pru_name_R ("r2"),	  \
+    __pru_name_R ("r3"),	  \
+    __pru_name_R ("r4"),	  \
+    __pru_name_R ("r5"),	  \
+    __pru_name_R ("r6"),	  \
+    __pru_name_R ("r7"),	  \
+    __pru_name_R ("r8"),	  \
+    __pru_name_R ("r9"),	  \
+    __pru_name_R ("r10"),	  \
+    __pru_name_R ("r11"),	  \
+    __pru_name_R ("r12"),	  \
+    __pru_name_R ("r13"),	  \
+    __pru_name_R ("r14"),	  \
+    __pru_name_R ("r15"),	  \
+    __pru_name_R ("r16"),	  \
+    __pru_name_R ("r17"),	  \
+    __pru_name_R ("r18"),	  \
+    __pru_name_R ("r19"),	  \
+    __pru_name_R ("r20"),	  \
+    __pru_name_R ("r21"),	  \
+    __pru_name_R ("r22"),	  \
+    __pru_name_R ("r23"),	  \
+    __pru_name_R ("r24"),	  \
+    __pru_name_R ("r25"),	  \
+    __pru_name_R ("r26"),	  \
+    __pru_name_R ("r27"),	  \
+    __pru_name_R ("r28"),	  \
+    __pru_name_R ("r29"),	  \
+    __pru_name_R ("r30"),	  \
+    __pru_name_R ("r31"),	  \
+    __pru_name_R ("loopcntr_reg"), \
+    __pru_name_R ("pc"),	  \
+    __pru_name_R ("fake_fp"),	  \
+    __pru_name_R ("fake_ap"),	  \
+}
+
+#define __pru_overlap_R(X)	      \
+  { "r" #X	, X * 4	    ,  4 },   \
+  { "r" #X ".w0", X * 4 + 0 ,  2 },   \
+  { "r" #X ".w1", X * 4 + 1 ,  2 },   \
+  { "r" #X ".w2", X * 4 + 2 ,  2 }
+
+#define OVERLAPPING_REGISTER_NAMES  \
+  {				    \
+    /* Aliases.  */		    \
+    { "sp", 2 * 4, 4 },		    \
+    { "ra", 3 * 4, 2 },		    \
+    { "fp", 4 * 4, 4 },		    \
+    __pru_overlap_R (0),	    \
+    __pru_overlap_R (1),	    \
+    __pru_overlap_R (2),	    \
+    __pru_overlap_R (3),	    \
+    __pru_overlap_R (4),	    \
+    __pru_overlap_R (5),	    \
+    __pru_overlap_R (6),	    \
+    __pru_overlap_R (7),	    \
+    __pru_overlap_R (8),	    \
+    __pru_overlap_R (9),	    \
+    __pru_overlap_R (10),	    \
+    __pru_overlap_R (11),	    \
+    __pru_overlap_R (12),	    \
+    __pru_overlap_R (13),	    \
+    __pru_overlap_R (14),	    \
+    __pru_overlap_R (15),	    \
+    __pru_overlap_R (16),	    \
+    __pru_overlap_R (17),	    \
+    __pru_overlap_R (18),	    \
+    __pru_overlap_R (19),	    \
+    __pru_overlap_R (20),	    \
+    __pru_overlap_R (21),	    \
+    __pru_overlap_R (22),	    \
+    __pru_overlap_R (23),	    \
+    __pru_overlap_R (24),	    \
+    __pru_overlap_R (25),	    \
+    __pru_overlap_R (26),	    \
+    __pru_overlap_R (27),	    \
+    __pru_overlap_R (28),	    \
+    __pru_overlap_R (29),	    \
+    __pru_overlap_R (30),	    \
+    __pru_overlap_R (31),	    \
+}
+
+#define ASM_OUTPUT_ADDR_VEC_ELT(FILE, VALUE)				    \
+  do									    \
+    {									    \
+      fputs (integer_asm_op (POINTER_SIZE / BITS_PER_UNIT, TRUE), FILE);    \
+      fprintf (FILE, "%%pmem(.L%u)\n", (unsigned) (VALUE));		    \
+    }									    \
+  while (0)
+
+#define ASM_OUTPUT_ADDR_DIFF_ELT(STREAM, BODY, VALUE, REL)		    \
+  do									    \
+    {									    \
+      fputs (integer_asm_op (POINTER_SIZE / BITS_PER_UNIT, TRUE), STREAM);  \
+      fprintf (STREAM, "%%pmem(.L%u-.L%u)\n", (unsigned) (VALUE),	    \
+	       (unsigned) (REL));					    \
+    }									    \
+  while (0)
+
+/* Section directives.  */
+
+/* Output before read-only data.  */
+#define TEXT_SECTION_ASM_OP "\t.section\t.text"
+
+/* Output before writable data.  */
+#define DATA_SECTION_ASM_OP "\t.section\t.data"
+
+/* Output before uninitialized data.  */
+#define BSS_SECTION_ASM_OP "\t.section\t.bss"
+
+#define CTORS_SECTION_ASM_OP "\t.section\t.init_array,\"aw\",%init_array"
+#define DTORS_SECTION_ASM_OP "\t.section\t.fini_array,\"aw\",%fini_array"
+
+#undef INIT_SECTION_ASM_OP
+#undef FINI_SECTION_ASM_OP
+#define INIT_ARRAY_SECTION_ASM_OP CTORS_SECTION_ASM_OP
+#define FINI_ARRAY_SECTION_ASM_OP DTORS_SECTION_ASM_OP
+
+/* Since we use .init_array/.fini_array we don't need the markers at
+   the start and end of the ctors/dtors arrays.  */
+#define CTOR_LIST_BEGIN asm (CTORS_SECTION_ASM_OP)
+#define CTOR_LIST_END		/* empty */
+#define DTOR_LIST_BEGIN asm (DTORS_SECTION_ASM_OP)
+#define DTOR_LIST_END		/* empty */
+
+#undef TARGET_ASM_CONSTRUCTOR
+#define TARGET_ASM_CONSTRUCTOR pru_elf_asm_constructor
+
+#undef TARGET_ASM_DESTRUCTOR
+#define TARGET_ASM_DESTRUCTOR pru_elf_asm_destructor
+
+#define ASM_OUTPUT_ALIGN(FILE, LOG)		      \
+  do {						      \
+    fprintf ((FILE), "%s%d\n", ALIGN_ASM_OP, (LOG));  \
+  } while (0)
+
+#undef  ASM_OUTPUT_ALIGNED_COMMON
+#define ASM_OUTPUT_ALIGNED_COMMON(FILE, NAME, SIZE, ALIGN)		\
+do									\
+  {									\
+    fprintf ((FILE), "%s", COMMON_ASM_OP);				\
+    assemble_name ((FILE), (NAME));					\
+    fprintf ((FILE), "," HOST_WIDE_INT_PRINT_UNSIGNED ",%u\n", (SIZE),	\
+	     (ALIGN) / BITS_PER_UNIT);					\
+  }									\
+while (0)
+
+
+/* This says how to output assembler code to declare an
+   uninitialized internal linkage data object.  Under SVR4,
+   the linker seems to want the alignment of data objects
+   to depend on their types.  We do exactly that here.  */
+
+#undef  ASM_OUTPUT_ALIGNED_LOCAL
+#define ASM_OUTPUT_ALIGNED_LOCAL(FILE, NAME, SIZE, ALIGN)		\
+do {									\
+  switch_to_section (bss_section);					\
+  ASM_OUTPUT_TYPE_DIRECTIVE (FILE, NAME, "object");			\
+  if (!flag_inhibit_size_directive)					\
+    ASM_OUTPUT_SIZE_DIRECTIVE (FILE, NAME, SIZE);			\
+  ASM_OUTPUT_ALIGN ((FILE), exact_log2 ((ALIGN) / BITS_PER_UNIT));      \
+  ASM_OUTPUT_LABEL (FILE, NAME);					\
+  ASM_OUTPUT_SKIP ((FILE), (SIZE) ? (SIZE) : 1);			\
+} while (0)
+
+/* Misc parameters.  */
+
+#define STORE_FLAG_VALUE 1
+#define Pmode SImode
+#define FUNCTION_MODE Pmode
+
+#define CASE_VECTOR_MODE Pmode
+
+/* Jumps are cheap on PRU.  */
+#define LOGICAL_OP_NON_SHORT_CIRCUIT		0
+
+/* Unfortunately the LBBO instruction does not zero-extend data.  */
+#undef LOAD_EXTEND_OP
+
+#undef WORD_REGISTER_OPERATIONS
+
+#define HAS_LONG_UNCOND_BRANCH			1
+#define HAS_LONG_COND_BRANCH			1
+
+#define REGISTER_TARGET_PRAGMAS() pru_register_pragmas ()
+
+#endif /* GCC_PRU_H */
diff --git a/gcc/config/pru/pru.md b/gcc/config/pru/pru.md
new file mode 100644
index 00000000000..de6639e1648
--- /dev/null
+++ b/gcc/config/pru/pru.md
@@ -0,0 +1,946 @@ 
+;; Machine Description for TI PRU.
+;; Copyright (C) 2014-2018 Free Software Foundation, Inc.
+;; Contributed by Dimitar Dimitrov <dimitar@dinux.eu>
+;; Based on the NIOS2 GCC port.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; Register numbers.
+(define_constants
+  [
+   (FIRST_ARG_REGNO	    56)	; Argument registers.
+   (LAST_ARG_REGNO	    119)	;
+   (FIRST_RETVAL_REGNO	    56)	; Return value registers.
+   (LAST_RETVAL_REGNO	    60)	;
+   (PROLOGUE_TEMP_REGNO	    4)	; Temporary register to use in prologue.
+
+   (RA_REGNO		    14)	; Return address register r3.w2.
+   (FP_REGNO		    16)	; Frame pointer register.
+   (LAST_NONIO_GP_REG	    119)	; Last non-I/O general purpose register.
+   (LOOPCNTR_REG	    128)	; Internal LOOP counter register.
+   (LAST_GP_REG		    132)	; Last general purpose register.
+
+   ;; Target register definitions.
+   (STACK_POINTER_REGNUM	8)
+   (HARD_FRAME_POINTER_REGNUM	FP_REGNO)
+   (PC_REGNUM			132)
+   (FRAME_POINTER_REGNUM	136)
+   (ARG_POINTER_REGNUM		140)
+   (FIRST_PSEUDO_REGISTER	144)
+  ]
+)
+
+;; Enumeration of UNSPECs.
+
+(define_c_enum "unspecv" [
+  UNSPECV_DELAY_CYCLES_START
+  UNSPECV_DELAY_CYCLES_END
+  UNSPECV_DELAY_CYCLES_2X_HI
+  UNSPECV_DELAY_CYCLES_2X_SI
+  UNSPECV_DELAY_CYCLES_1
+
+  UNSPECV_LOOP_BEGIN
+  UNSPECV_LOOP_END
+
+  UNSPECV_BLOCKAGE
+])
+
+; Length of an instruction (in bytes).
+(define_attr "length" "" (const_int 4))
+(define_attr "type"
+  "unknown,complex,control,alu,cond_alu,st,ld,shift"
+  (const_string "complex"))
+
+(define_asm_attributes
+ [(set_attr "length" "4")
+  (set_attr "type" "complex")])
+
+; There is no pipeline, so our scheduling description is simple.
+(define_automaton "pru")
+(define_cpu_unit "cpu" "pru")
+
+(define_insn_reservation "everything" 1 (match_test "true") "cpu")
+
+(include "predicates.md")
+(include "constraints.md")
+
+;; All supported direct move-modes
+(define_mode_iterator MOV8_16_32 [QI QQ UQQ
+				  HI HQ UHQ HA UHA
+				  SI SQ USQ SA USA SF SD])
+
+(define_mode_iterator MOV8_16 [QI QQ UQQ
+			       HI HQ UHQ HA UHA])
+(define_mode_iterator MOV32 [SI SQ USQ SA USA SF SD])
+(define_mode_iterator MOV64 [DI DF DD DQ UDQ])
+(define_mode_iterator QISI [QI HI SI])
+(define_mode_iterator HISI [HI SI])
+(define_mode_iterator SFDF [SF DF])
+
+;; EQS0/1 for extension source 0/1 and EQD for extension destination patterns.
+(define_mode_iterator EQS0 [QI HI SI])
+(define_mode_iterator EQS1 [QI HI SI])
+(define_mode_iterator EQD [QI HI SI])
+
+; Not recommended.  Please use %0 instead!
+(define_mode_attr regwidth [(QI ".b0") (HI ".w0") (SI "")])
+
+;; Move instructions
+
+(define_expand "mov<mode>"
+  [(set (match_operand:MOV8_16_32 0 "nonimmediate_operand" "")
+	(match_operand:MOV8_16_32 1 "general_operand"      ""))]
+  ""
+{
+  /* It helps to split constant loading and memory access
+     early, so that the LDI/LDI32 instructions can be hoisted
+     outside a loop body.  */
+  if (MEM_P (operands[0]))
+    operands[1] = force_reg (<MODE>mode, operands[1]);
+})
+
+;; Keep a single pattern for 32 bit MOV operations.  LRA requires that the
+;; movXX patterns be unified for any given mode.
+;;
+;; Note: Assume that a Program Memory (T constraint) address fits in 16 bits!
+(define_insn "prumov<mode>"
+  [(set (match_operand:MOV32 0 "nonimmediate_operand" "=m,r,r,r,r,r")
+	(match_operand:MOV32 1 "general_operand"      "r,m,r,T,J,iF"))]
+  ""
+  "@
+    sb%B0o\\t%b1, %0, %S0
+    lb%B1o\\t%b0, %1, %S1
+    mov\\t%0, %1
+    ldi\\t%0, %%pmem(%1)
+    ldi\\t%0, %1
+    ldi32\\t%0, %1"
+  [(set_attr "type" "st,ld,alu,alu,alu,alu")
+   (set_attr "length" "4,4,4,4,4,8")])
+
+
+;; Separate pattern for 8-bit and 16-bit moves, since the LDI32 pseudo
+;; instruction cannot handle byte- and word-sized registers.
+(define_insn "prumov<mode>"
+  [(set (match_operand:MOV8_16 0 "nonimmediate_operand" "=m,r,r,r,r,r")
+	(match_operand:MOV8_16 1 "general_operand"      "r,m,r,T,J,N"))]
+  ""
+  "@
+    sb%B0o\\t%b1, %0, %S0
+    lb%B1o\\t%b0, %1, %S1
+    mov\\t%0, %1
+    ldi\\t%0, %%pmem(%1)
+    ldi\\t%0, %1
+    ldi\\t%0, (%1) & 0xffff"
+  [(set_attr "type" "st,ld,alu,alu,alu,alu")
+   (set_attr "length" "4,4,4,4,4,4")])
+
+
+; I cannot think of any reason for the core to pass 64-bit symbolic
+; constants.  Hence simplify the rule and handle only numeric constants.
+;
+; Note: Unlike the arithmetic patterns, here we cannot use the "&" output
+; modifier.
+; GCC expects to be able to move registers around "no matter what".
+; Forcing DI reg alignment (akin to microblaze's HARD_REGNO_MODE_OK)
+; does not seem efficient, and will violate TI ABI.
+(define_insn "mov<mode>"
+  [(set (match_operand:MOV64 0 "nonimmediate_operand" "=m,r,r,r,r,r")
+	(match_operand:MOV64 1 "general_operand"      "r,m,r,T,J,nF"))]
+  ""
+{
+  switch (which_alternative)
+    {
+    case 0:
+      return "sb%B0o\\t%b1, %0, %S0";
+    case 1:
+      return "lb%B1o\\t%b0, %1, %S1";
+    case 2:
+      /* Careful with overlapping source and destination regs.  */
+      gcc_assert (GP_REG_P (REGNO (operands[0])));
+      gcc_assert (GP_REG_P (REGNO (operands[1])));
+      if (REGNO (operands[0]) == (REGNO (operands[1]) + 4))
+	return "mov\\t%N0, %N1\;mov\\t%F0, %F1";
+      else
+	return "mov\\t%F0, %F1\;mov\\t%N0, %N1";
+    case 3:
+      return "ldi\\t%F0, %%pmem(%1)\;ldi\\t%N0, 0";
+    case 4:
+      return "ldi\\t%F0, %1\;ldi\\t%N0, 0";
+    case 5:
+      return "ldi32\\t%F0, %w1\;ldi32\\t%N0, %W1";
+    default:
+      gcc_unreachable ();
+  }
+}
+  [(set_attr "type" "st,ld,alu,alu,alu,alu")
+   (set_attr "length" "4,4,8,8,8,16")])
+
+;
+; load_multiple pattern(s).
+;
+; ??? Due to reload problems with replacing registers inside match_parallel
+; we currently support load_multiple/store_multiple only after reload.
+;
+; Idea taken from the s390 port.
+
+(define_expand "load_multiple"
+  [(match_par_dup 3 [(set (match_operand 0 "" "")
+			  (match_operand 1 "" ""))
+		     (use (match_operand 2 "" ""))])]
+  "reload_completed"
+{
+  machine_mode mode;
+  int regno;
+  int count;
+  rtx from;
+  int i, off;
+
+  /* Support only loading a constant number of fixed-point registers from
+     memory.  */
+  if (GET_CODE (operands[2]) != CONST_INT
+      || GET_CODE (operands[1]) != MEM
+      || GET_CODE (operands[0]) != REG)
+    FAIL;
+
+  count = INTVAL (operands[2]);
+  regno = REGNO (operands[0]);
+  mode = GET_MODE (operands[0]);
+  if (mode != QImode)
+    FAIL;
+
+  operands[3] = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (count));
+  if (!can_create_pseudo_p ())
+    {
+      if (GET_CODE (XEXP (operands[1], 0)) == REG)
+	{
+	  from = XEXP (operands[1], 0);
+	  off = 0;
+	}
+      else if (GET_CODE (XEXP (operands[1], 0)) == PLUS
+	       && GET_CODE (XEXP (XEXP (operands[1], 0), 0)) == REG
+	       && GET_CODE (XEXP (XEXP (operands[1], 0), 1)) == CONST_INT)
+	{
+	  from = XEXP (XEXP (operands[1], 0), 0);
+	  off = INTVAL (XEXP (XEXP (operands[1], 0), 1));
+	}
+      else
+	FAIL;
+    }
+  else
+    {
+      from = force_reg (Pmode, XEXP (operands[1], 0));
+      off = 0;
+    }
+
+  for (i = 0; i < count; i++)
+    XVECEXP (operands[3], 0, i)
+      = gen_rtx_SET (gen_rtx_REG (mode, regno + i),
+		     change_address (operands[1], mode,
+		       plus_constant (Pmode, from,
+				      off + i * GET_MODE_SIZE (mode))));
+})
+
+(define_insn "*pru_load_multiple"
+  [(match_parallel 0 "load_multiple_operation"
+		   [(set (match_operand:QI 1 "register_operand" "=r")
+			 (match_operand:QI 2 "memory_operand"   "m"))])]
+  "reload_completed"
+{
+  int nregs = XVECLEN (operands[0], 0);
+  operands[0] = GEN_INT (nregs);
+  return "lb%B2o\\t%b1, %2, %0";
+}
+  [(set_attr "type" "ld")
+   (set_attr "length" "4")])
+
+;
+; store_multiple pattern(s).
+;
+
+(define_expand "store_multiple"
+  [(match_par_dup 3 [(set (match_operand 0 "" "")
+			  (match_operand 1 "" ""))
+		     (use (match_operand 2 "" ""))])]
+  "reload_completed"
+{
+  machine_mode mode;
+  int regno;
+  int count;
+  rtx to;
+  int i, off;
+
+  /* Support only storing a constant number of fixed-point registers to
+     memory.  */
+  if (GET_CODE (operands[2]) != CONST_INT
+      || GET_CODE (operands[0]) != MEM
+      || GET_CODE (operands[1]) != REG)
+    FAIL;
+
+  count = INTVAL (operands[2]);
+  regno = REGNO (operands[1]);
+  mode = GET_MODE (operands[1]);
+  if (mode != QImode)
+    FAIL;
+
+  operands[3] = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (count));
+
+  if (!can_create_pseudo_p ())
+    {
+      if (GET_CODE (XEXP (operands[0], 0)) == REG)
+	{
+	  to = XEXP (operands[0], 0);
+	  off = 0;
+	}
+      else if (GET_CODE (XEXP (operands[0], 0)) == PLUS
+	       && GET_CODE (XEXP (XEXP (operands[0], 0), 0)) == REG
+	       && GET_CODE (XEXP (XEXP (operands[0], 0), 1)) == CONST_INT)
+	{
+	  to = XEXP (XEXP (operands[0], 0), 0);
+	  off = INTVAL (XEXP (XEXP (operands[0], 0), 1));
+	}
+      else
+	FAIL;
+    }
+  else
+    {
+      to = force_reg (Pmode, XEXP (operands[0], 0));
+      off = 0;
+    }
+
+  for (i = 0; i < count; i++)
+    XVECEXP (operands[3], 0, i)
+      = gen_rtx_SET (change_address (operands[0], mode,
+		       plus_constant (Pmode, to,
+				      off + i * GET_MODE_SIZE (mode))),
+		     gen_rtx_REG (mode, regno + i));
+})
+
+(define_insn "*pru_store_multiple"
+  [(match_parallel 0 "store_multiple_operation"
+		   [(set (match_operand:QI 1 "memory_operand"   "=m")
+			 (match_operand:QI 2 "register_operand" "r"))])]
+  "reload_completed"
+{
+  int nregs = XVECLEN (operands[0], 0);
+  operands[0] = GEN_INT (nregs);
+  return "sb%B1o\\t%b2, %1, %0";
+}
+  [(set_attr "type" "st")
+   (set_attr "length" "4")])
+
+;; Zero extension patterns
+;;
+;; Unfortunately we cannot use lbbo to load AND zero-extend a value.
+;; The burst length parameter of the LBBO instruction designates not only
+;; the number of memory data bytes fetched, but also the number of register
+;; byte fields written.
+(define_expand "zero_extend<EQS0:mode><EQD:mode>2"
+  [(set (match_operand:EQD 0 "register_operand" "=r")
+	(zero_extend:EQD (match_operand:EQS0 1 "register_operand" "r")))]
+  ""
+  ""
+  [(set_attr "type"     "alu")])
+
+(define_insn "*zero_extend<EQS0:mode><EQD:mode>2"
+  [(set (match_operand:EQD 0 "register_operand" "=r")
+	(zero_extend:EQD (match_operand:EQS0 1 "register_operand" "r")))]
+  ""
+  "mov\\t%0, %1"
+  [(set_attr "type"     "alu")])
+
+;; Sign extension patterns.  We have to emulate them due to lack of
+;; signed operations in PRU's ALU.
+
+(define_insn "extend<EQS0:mode><EQD:mode>2"
+  [(set (match_operand:EQD 0 "register_operand"			  "=r")
+	(sign_extend:EQD (match_operand:EQS0 1 "register_operand"  "r")))]
+  ""
+{
+  return pru_output_sign_extend (operands);
+}
+  [(set_attr "type" "complex")
+   (set_attr "length" "12")])
+
+;; Bit extraction
+;; We define it solely to allow combine to choose SImode
+;; for word mode when trying to match our cbranch_qbbx_* insn.
+;;
+;; Check how combine.c:make_extraction() uses
+;; get_best_reg_extraction_insn() to select the op size.
+(define_insn "extzv<mode>"
+  [(set (match_operand:QISI 0 "register_operand"	"=r")
+	  (zero_extract:QISI
+	   (match_operand:QISI 1 "register_operand"	"r")
+	   (match_operand:QISI 2 "const_int_operand"	"i")
+	   (match_operand:QISI 3 "const_int_operand"	"i")))]
+  ""
+  "lsl\\t%0, %1, (%S0 * 8 - %2 - %3)\;lsr\\t%0, %0, (%S0 * 8 - %2)"
+  [(set_attr "type" "complex")
+   (set_attr "length" "8")])
+
+
+
+;; Arithmetic Operations
+
+(define_expand "add<mode>3"
+  [(set (match_operand:QISI 0 "register_operand"	    "=r,r,r")
+	(plus:QISI (match_operand:QISI 1 "register_operand" "%r,r,r")
+		 (match_operand:QISI 2 "nonmemory_operand"  "r,I,M")))]
+  ""
+  ""
+  [(set_attr "type" "alu")])
+
+(define_insn "adddi3"
+  [(set (match_operand:DI 0 "register_operand"		    "=&r,&r,&r")
+	(plus:DI (match_operand:DI 1 "register_operand"	    "%r,r,r")
+		 (match_operand:DI 2 "reg_or_ubyte_operand" "r,I,M")))]
+  ""
+  "@
+   add\\t%F0, %F1, %F2\;adc\\t%N0, %N1, %N2
+   add\\t%F0, %F1, %2\;adc\\t%N0, %N1, 0
+   sub\\t%F0, %F1, %n2\;suc\\t%N0, %N1, 0"
+  [(set_attr "type" "alu,alu,alu")
+   (set_attr "length" "8,8,8")])
+
+(define_expand "sub<mode>3"
+  [(set (match_operand:QISI 0 "register_operand"		  "=r,r,r")
+	(minus:QISI (match_operand:QISI 1 "reg_or_ubyte_operand"  "r,r,I")
+		  (match_operand:QISI 2 "reg_or_ubyte_operand"	  "r,I,r")))]
+  ""
+  ""
+  [(set_attr "type" "alu")])
+
+(define_insn "subdi3"
+  [(set (match_operand:DI 0 "register_operand"		    "=&r,&r,&r")
+	(minus:DI (match_operand:DI 1 "register_operand"    "r,r,I")
+		 (match_operand:DI 2 "reg_or_ubyte_operand" "r,I,r")))]
+  ""
+  "@
+   sub\\t%F0, %F1, %F2\;suc\\t%N0, %N1, %N2
+   sub\\t%F0, %F1, %2\;suc\\t%N0, %N1, 0
+   rsb\\t%F0, %F2, %1\;rsc\\t%N0, %N2, 0"
+  [(set_attr "type" "alu,alu,alu")
+   (set_attr "length" "8,8,8")])
+
+;;  Negate and ones complement
+
+(define_expand "neg<mode>2"
+  [(set (match_operand:QISI 0 "register_operand"		"=r")
+	(neg:QISI (match_operand:QISI 1 "register_operand"	"r")))]
+  ""
+  ""
+  [(set_attr "type" "alu")])
+
+(define_expand "one_cmpl<mode>2"
+  [(set (match_operand:QISI 0 "register_operand"		"=r")
+	(not:QISI (match_operand:QISI 1 "register_operand"	"r")))]
+  ""
+  ""
+  [(set_attr "type" "alu")])
+
+;;  Integer logical operations
+;;
+;; TODO - add optimized cases that exploit the fact that we can get away
+;; with a single machine op for special constants, e.g. UBYTE << (0/8/16/24)
+
+(define_code_iterator LOGICAL [and ior xor umin umax])
+(define_code_attr logical_asm [(and "and") (ior "or") (xor "xor") (umin "min") (umax "max")])
+
+(define_code_iterator LOGICAL_BITOP [and ior xor])
+(define_code_attr logical_bitop_asm [(and "and") (ior "or") (xor "xor")])
+
+(define_expand "<code><mode>3"
+  [(set (match_operand:QISI 0 "register_operand"		    "=r")
+	(LOGICAL:QISI (match_operand:QISI 1 "register_operand"	    "%r")
+		      (match_operand:QISI 2 "reg_or_ubyte_operand"  "rI")))]
+  ""
+  ""
+  [(set_attr "type" "alu")])
+
+
+;;  Shift instructions
+
+(define_code_iterator SHIFT  [ashift lshiftrt])
+(define_code_attr shift_op   [(ashift "ashl") (lshiftrt "lshr")])
+(define_code_attr shift_asm  [(ashift "lsl") (lshiftrt "lsr")])
+
+(define_expand "<shift_op><mode>3"
+  [(set (match_operand:QISI 0 "register_operand"	      "=r")
+	(SHIFT:QISI (match_operand:QISI 1 "register_operand"  "r")
+		    (match_operand:QISI 2 "shift_operand"     "rL")))]
+  ""
+  ""
+  [(set_attr "type" "shift")])
+
+; LRA cannot cope with clobbered op2, hence the scratch register.
+(define_insn "ashr<mode>3"
+  [(set (match_operand:QISI 0 "register_operand"	    "=&r,r")
+	  (ashiftrt:QISI
+	    (match_operand:QISI 1 "register_operand"	    "0,r")
+	    (match_operand:QISI 2 "reg_or_const_1_operand"  "r,P")))
+   (clobber (match_scratch:QISI 3			    "=r,X"))]
+  ""
+  "@
+   mov %3, %2\;ASHRLP%=:\;qbeq ASHREND%=, %3, 0\; sub %3, %3, 1\; lsr\\t%0, %0, 1\; qbbc ASHRLP%=, %0, (%S0 * 8) - 2\; set %0, %0, (%S0 * 8) - 1\; jmp ASHRLP%=\;ASHREND%=:
+   lsr\\t%0, %1, 1\;qbbc LSIGN%=, %0, (%S0 * 8) - 2\;set %0, %0, (%S0 * 8) - 1\;LSIGN%=:"
+  [(set_attr "type" "complex,alu")
+   (set_attr "length" "28,4")])
+
+
+;; Include ALU patterns with zero-extension of operands.  That's where
+;; the real insns are defined.
+
+(include "alu-zext.md")
+
+(define_insn "<code>di3"
+  [(set (match_operand:DI 0 "register_operand"		"=&r,&r")
+	  (LOGICAL_BITOP:DI
+	    (match_operand:DI 1 "register_operand"	"%r,r")
+	    (match_operand:DI 2 "reg_or_ubyte_operand"	"r,I")))]
+  ""
+  "@
+   <logical_bitop_asm>\\t%F0, %F1, %F2\;<logical_bitop_asm>\\t%N0, %N1, %N2
+   <logical_bitop_asm>\\t%F0, %F1, %2\;<logical_bitop_asm>\\t%N0, %N1, 0"
+  [(set_attr "type" "alu,alu")
+   (set_attr "length" "8,8")])
+
+
+(define_insn "one_cmpldi2"
+  [(set (match_operand:DI 0 "register_operand"		"=r")
+	(not:DI (match_operand:DI 1 "register_operand"	"r")))]
+  ""
+{
+  /* Careful with overlapping source and destination regs.  */
+  gcc_assert (GP_REG_P (REGNO (operands[0])));
+  gcc_assert (GP_REG_P (REGNO (operands[1])));
+  if (REGNO (operands[0]) == (REGNO (operands[1]) + 4))
+    return "not\\t%N0, %N1\;not\\t%F0, %F1";
+  else
+    return "not\\t%F0, %F1\;not\\t%N0, %N1";
+}
+  [(set_attr "type" "alu")
+   (set_attr "length" "8")])
+
+;; Multiply instruction.  Idea for fixing registers comes from the AVR backend.
+
+(define_expand "mulsi3"
+  [(set (match_operand:SI 0 "register_operand" "")
+	(mult:SI (match_operand:SI 1 "register_operand" "")
+		 (match_operand:SI 2 "register_operand" "")))]
+  ""
+{
+  emit_insn (gen_mulsi3_fixinp (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+
+(define_expand "mulsi3_fixinp"
+  [(set (reg:SI 112) (match_operand:SI 1 "register_operand" ""))
+   (set (reg:SI 116) (match_operand:SI 2 "register_operand" ""))
+   (set (reg:SI 104) (mult:SI (reg:SI 112) (reg:SI 116)))
+   (set (match_operand:SI 0 "register_operand" "") (reg:SI 104))]
+  ""
+{
+})
+
+(define_insn "*mulsi3_prumac"
+  [(set (reg:SI 104) (mult:SI (reg:SI 112) (reg:SI 116)))]
+  ""
+  "nop\;xin\\t0, r26, 4"
+  [(set_attr "type" "alu")
+   (set_attr "length" "8")])
+
+;; Prologue, Epilogue and Return
+
+(define_expand "prologue"
+  [(const_int 1)]
+  ""
+{
+  pru_expand_prologue ();
+  DONE;
+})
+
+(define_expand "epilogue"
+  [(return)]
+  ""
+{
+  pru_expand_epilogue (false);
+  DONE;
+})
+
+(define_expand "sibcall_epilogue"
+  [(return)]
+  ""
+{
+  pru_expand_epilogue (true);
+  DONE;
+})
+
+(define_insn "return"
+  [(simple_return)]
+  "pru_can_use_return_insn ()"
+  "ret")
+
+(define_insn "simple_return"
+  [(simple_return)]
+  ""
+  "ret")
+
+;; Block any insns from being moved before this point, since the
+;; profiling call to mcount can use various registers that aren't
+;; saved or used to pass arguments.
+
+(define_insn "blockage"
+  [(unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE)]
+  ""
+  ""
+  [(set_attr "type" "unknown")
+   (set_attr "length" "0")])
+
+;;  Jumps and calls
+
+(define_insn "indirect_jump"
+  [(set (pc) (match_operand:SI 0 "register_operand" "r"))]
+  ""
+  "jmp\\t%0"
+  [(set_attr "type" "control")])
+
+(define_insn "jump"
+  [(set (pc)
+	(label_ref (match_operand 0 "" "")))]
+  ""
+  "jmp\\t%%label(%l0)"
+  [(set_attr "type" "control")
+   (set_attr "length" "4")])
+
+
+(define_expand "call"
+  [(parallel [(call (match_operand 0 "" "")
+		    (match_operand 1 "" ""))
+	      (clobber (reg:HI RA_REGNO))])]
+  ""
+  "")
+
+(define_expand "call_value"
+  [(parallel [(set (match_operand 0 "" "")
+		   (call (match_operand 1 "" "")
+			 (match_operand 2 "" "")))
+	      (clobber (reg:HI RA_REGNO))])]
+  ""
+  "")
+
+(define_insn "*call"
+  [(call (mem:SI (match_operand:SI 0 "call_operand" "i,r"))
+	 (match_operand 1 "" ""))
+   (clobber (reg:HI RA_REGNO))]
+  ""
+  "@
+    call\\t%%label(%0)
+    call\\t%0"
+  [(set_attr "type" "control")])
+
+(define_insn "*call_value"
+  [(set (match_operand 0 "" "")
+	(call (mem:SI (match_operand:SI 1 "call_operand" "i,r"))
+	      (match_operand 2 "" "")))
+   (clobber (reg:HI RA_REGNO))]
+  ""
+  "@
+    call\\t%%label(%1)
+    call\\t%1"
+  [(set_attr "type" "control")])
+
+(define_expand "sibcall"
+  [(parallel [(call (match_operand 0 "" "")
+		    (match_operand 1 "" ""))
+	      (return)])]
+  ""
+  "")
+
+(define_expand "sibcall_value"
+  [(parallel [(set (match_operand 0 "" "")
+		   (call (match_operand 1 "" "")
+			 (match_operand 2 "" "")))
+	      (return)])]
+  ""
+  "")
+
+(define_insn "*sibcall"
+ [(call (mem:SI (match_operand:SI 0 "call_operand" "i,j"))
+	(match_operand 1 "" ""))
+  (return)]
+  "SIBLING_CALL_P (insn)"
+  "@
+    jmp\\t%%label(%0)
+    jmp\\t%0"
+  [(set_attr "type" "control")])
+
+(define_insn "*sibcall_value"
+ [(set (match_operand 0 "register_operand" "")
+       (call (mem:SI (match_operand:SI 1 "call_operand" "i,j"))
+	     (match_operand 2 "" "")))
+  (return)]
+  "SIBLING_CALL_P (insn)"
+  "@
+    jmp\\t%%label(%1)
+    jmp\\t%1"
+  [(set_attr "type" "control")])
+
+(define_insn "*tablejump"
+  [(set (pc)
+	(match_operand:SI 0 "register_operand" "r"))
+   (use (label_ref (match_operand 1 "" "")))]
+  ""
+  "jmp\\t%0"
+  [(set_attr "type" "control")])
+
+;; cbranch pattern.
+;;
+;; NOTE: The short branch check has no typo! We must be conservative and take
+;; into account the worst case of having a signed comparison with a
+;; "far taken branch" label, which amounts to 7 instructions.
+
+(define_insn "cbranch<mode>4"
+  [(set (pc)
+     (if_then_else
+       (match_operator 0 "ordered_comparison_operator"
+	 [(match_operand:QISI 1 "register_operand" "r,r,r")
+	  (match_operand:QISI 2 "reg_or_ubyte_operand" "r,Z,I")])
+       (label_ref (match_operand 3 "" ""))
+       (pc)))]
+  ""
+{
+  const int length = (get_attr_length (insn));
+  const bool is_near = (length == 20 || length == 4);
+
+  if (pru_signed_cmp_operator (operands[0], VOIDmode))
+    {
+      enum rtx_code code = GET_CODE (operands[0]);
+
+      if (which_alternative == 0)
+	return pru_output_signed_cbranch (operands, is_near);
+      else if (which_alternative == 1 && (code == LT || code == GE))
+	return pru_output_signed_cbranch_zeroop2 (operands, is_near);
+      else
+	return pru_output_signed_cbranch_ubyteop2 (operands, is_near);
+    }
+  else
+    {
+      /* PRU demands OP1 to be immediate, so swap operands.  */
+      if (is_near)
+	return "qb%P0\t%l3, %1, %2";
+      else
+	return "qb%Q0\t.+8, %1, %2\;jmp\t%%label(%l3)";
+    }
+}
+  [(set_attr "type" "control")
+   (set (attr "length")
+	(if_then_else
+	    (and (ge (minus (match_dup 3) (pc)) (const_int -2020))
+		 (le (minus (match_dup 3) (pc)) (const_int 2016)))
+	    (if_then_else
+		(match_test "pru_signed_cmp_operator (operands[0], VOIDmode)")
+		    (const_int 20)
+		    (const_int 4))
+	    (if_then_else
+		(match_test "pru_signed_cmp_operator (operands[0], VOIDmode)")
+		    (const_int 28)
+		    (const_int 8))))])
+
+
+(define_expand "cbranch<mode>4"
+  [(set (pc)
+	(if_then_else (match_operator 0 "pru_fp_comparison_operator"
+		       [(match_operand:SFDF 1 "register_operand" "")
+			(match_operand:SFDF 2 "register_operand" "")])
+		      (label_ref (match_operand 3 "" ""))
+		      (pc)))]
+  ""
+{
+  rtx t = pru_expand_fp_compare (operands[0], VOIDmode);
+  operands[0] = t;
+  operands[1] = XEXP (t, 0);
+  operands[2] = XEXP (t, 1);
+})
+
+;
+; Bit test branch
+
+(define_code_iterator BIT_TEST  [eq ne])
+(define_code_attr qbbx_op   [(eq "qbbc") (ne "qbbs")])
+(define_code_attr qbbx_negop   [(eq "qbbs") (ne "qbbc")])
+
+(define_insn "cbranch_qbbx_<BIT_TEST:code><EQS0:mode><EQS1:mode><EQD:mode>4"
+ [(set (pc)
+   (if_then_else
+    (BIT_TEST (zero_extract:EQD
+	 (match_operand:EQS0 0 "register_operand" "r")
+	 (const_int 1)
+	 (match_operand:EQS1 1 "reg_or_ubyte_operand" "rI"))
+     (const_int 0))
+    (label_ref (match_operand 2 "" ""))
+    (pc)))]
+  ""
+{
+  const int length = (get_attr_length (insn));
+  const bool is_near = (length == 4);
+  if (is_near)
+    return "<BIT_TEST:qbbx_op>\\t%l2, %0, %1";
+  else
+    return "<BIT_TEST:qbbx_negop>\\t.+8, %0, %1\;jmp\\t%%label(%l2)";
+}
+  [(set_attr "type" "control")
+   (set (attr "length")
+      (if_then_else
+	  (and (ge (minus (match_dup 2) (pc)) (const_int -2048))
+	       (le (minus (match_dup 2) (pc)) (const_int 2044)))
+	  (const_int 4)
+	  (const_int 8)))])
+
+;; ::::::::::::::::::::
+;; ::
+;; :: Low Overhead Looping - idea "borrowed" from MEP
+;; ::
+;; ::::::::::::::::::::
+
+;; This insn is volatile because we'd like it to stay in its original
+;; position, just before the loop header.  If it stays there, we might
+;; be able to convert it into a "loop" insn.
+(define_insn "doloop_begin_internal<mode>"
+  [(set (match_operand:HISI 0 "register_operand" "=r")
+	(unspec_volatile:HISI
+	 [(match_operand:HISI 1 "reg_or_ubyte_operand" "rI")
+	  (match_operand 2 "const_int_operand" "")] UNSPECV_LOOP_BEGIN))]
+  ""
+  { gcc_unreachable (); }
+  [(set_attr "length" "4")])
+
+(define_expand "doloop_begin"
+  [(use (match_operand 0 "register_operand" ""))
+   (use (match_operand 1 "" ""))]
+  "TARGET_OPT_LOOP"
+  "pru_emit_doloop (operands, 0);
+   DONE;
+  ")
+
+; Note: "JUMP_INSNs and CALL_INSNs are not allowed to have any output
+; reloads;". Hence this insn must be prepared for a counter that is
+; not a register.
+(define_insn "doloop_end_internal<mode>"
+  [(set (pc)
+	(if_then_else (ne (match_operand:HISI 0 "nonimmediate_operand" "+r,*m")
+			  (const_int 1))
+		      (label_ref (match_operand 1 "" ""))
+		      (pc)))
+   (set (match_dup 0)
+	(plus:HISI (match_dup 0)
+		 (const_int -1)))
+   (unspec [(match_operand 2 "const_int_operand" "")] UNSPECV_LOOP_END)
+   (clobber (match_scratch:HISI 3 "=X,&r"))]
+  ""
+  { gcc_unreachable (); }
+  ;; Worst case length:
+  ;;
+  ;;      sub <op3>, 1		4
+  ;;      qbeq .+8, <op3>, 0    4
+  ;;      jmp <op1>		4
+  [(set (attr "length")
+      (if_then_else
+	  (and (ge (minus (pc) (match_dup 1)) (const_int 0))
+	       (le (minus (pc) (match_dup 1)) (const_int 1020)))
+	  (const_int 4)
+	  (const_int 12)))])
+
+(define_expand "doloop_end"
+  [(use (match_operand 0 "nonimmediate_operand" ""))
+   (use (label_ref (match_operand 1 "" "")))]
+  "TARGET_OPT_LOOP"
+  "if (GET_CODE (operands[0]) == REG && GET_MODE (operands[0]) == QImode)
+     FAIL;
+   pru_emit_doloop (operands, 1);
+   DONE;
+  ")
+
+(define_insn "pruloop<mode>"
+  [(set (reg:HISI LOOPCNTR_REG)
+	(unspec:HISI [(match_operand:HISI 0 "reg_or_ubyte_operand" "rI")
+		    (label_ref (match_operand 1 "" ""))]
+		   UNSPECV_LOOP_BEGIN))]
+  ""
+  "loop\\t%l1, %0"
+  [(set_attr "length" "4")])
+
+(define_insn "pruloop_end"
+  [(unspec [(const_int 0)] UNSPECV_LOOP_END)]
+  ""
+  "# loop end"
+  [(set_attr "length" "0")])
+
+
+;;  Misc patterns
+
+(define_insn "delay_cycles_start"
+  [(unspec_volatile [(match_operand 0 "immediate_operand" "i")]
+		    UNSPECV_DELAY_CYCLES_START)]
+  ""
+  "/* Begin %0 cycle delay.  */"
+)
+
+(define_insn "delay_cycles_end"
+  [(unspec_volatile [(match_operand 0 "immediate_operand" "i")]
+		    UNSPECV_DELAY_CYCLES_END)]
+  ""
+  "/* End %0 cycle delay.  */"
+)
+
+
+(define_insn "delay_cycles_2x_plus1_hi"
+  [(unspec_volatile [(match_operand 0 "const_uhword_operand" "J")]
+		    UNSPECV_DELAY_CYCLES_2X_HI)
+   (clobber (match_scratch:SI 1 "=&r"))]
+  ""
+  "ldi\\t%1, %0\;sub\\t%1, %1, 1\;qbne\\t.-4, %1, 0"
+  [(set_attr "length" "12")])
+
+
+; Do not use LDI32 here because we do not want
+; to accidentally lose one instruction cycle.
+(define_insn "delay_cycles_2x_plus2_si"
+  [(unspec_volatile [(match_operand:SI 0 "const_int_operand" "n")]
+		    UNSPECV_DELAY_CYCLES_2X_SI)
+   (clobber (match_scratch:SI 1 "=&r"))]
+  ""
+  "ldi\\t%1.w0, %L0\;ldi\\t%1.w2, %H0\;sub\\t%1, %1, 1\;qbne\\t.-4, %1, 0"
+  [(set_attr "length" "16")])
+
+(define_insn "delay_cycles_1"
+  [(unspec_volatile [(const_int 0) ] UNSPECV_DELAY_CYCLES_1)]
+  ""
+  "nop\\t# delay_cycles_1"
+)
+
+
+(define_insn "nop"
+  [(const_int 0)]
+  ""
+  "nop"
+  [(set_attr "type" "alu")])
+
+(define_insn "nop_loop_guard"
+  [(const_int 0)]
+  ""
+  "nop\\t# Loop end guard"
+  [(set_attr "type" "alu")])
diff --git a/gcc/config/pru/pru.opt b/gcc/config/pru/pru.opt
new file mode 100644
index 00000000000..e17d92fe1d3
--- /dev/null
+++ b/gcc/config/pru/pru.opt
@@ -0,0 +1,53 @@ 
+; Options for the TI PRU port of the compiler.
+; Copyright (C) 2018 Free Software Foundation, Inc.
+; Contributed by Dimitar Dimitrov <dimitar@dinux.eu>
+;
+; This file is part of GCC.
+;
+; GCC is free software; you can redistribute it and/or modify
+; it under the terms of the GNU General Public License as published by
+; the Free Software Foundation; either version 3, or (at your option)
+; any later version.
+;
+; GCC is distributed in the hope that it will be useful,
+; but WITHOUT ANY WARRANTY; without even the implied warranty of
+; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+; GNU General Public License for more details.
+;
+; You should have received a copy of the GNU General Public License
+; along with GCC; see the file COPYING3.  If not see
+; <http://www.gnu.org/licenses/>.
+
+HeaderInclude
+config/pru/pru-opts.h
+
+minrt
+Target Report Mask(MINRT) RejectNegative
+Use a minimum runtime (no static initializers or ctors) for memory-constrained
+devices.
+
+mmcu=
+Target RejectNegative Joined
+-mmcu=MCU	Select the target System-On-Chip variant that embeds this PRU.
+
+mno-relax
+Target Report
+Prevent relaxation of LDI32 relocations.
+
+mloop
+Target Mask(OPT_LOOP)
+Allow (or do not allow) GCC to use the LOOP instruction.
+
+mabi=
+Target RejectNegative Report Joined Enum(pru_abi_t) Var(pru_current_abi) Init(PRU_ABI_GNU) Save
+Select target ABI variant.
+
+Enum
+Name(pru_abi_t) Type(enum pru_abi)
+Known ABI variants (for use with the -mabi= option):
+
+EnumValue
+Enum(pru_abi_t) String(gnu) Value(PRU_ABI_GNU)
+
+EnumValue
+Enum(pru_abi_t) String(ti) Value(PRU_ABI_TI)
diff --git a/gcc/config/pru/t-pru b/gcc/config/pru/t-pru
new file mode 100644
index 00000000000..ab29e443882
--- /dev/null
+++ b/gcc/config/pru/t-pru
@@ -0,0 +1,31 @@ 
+# Makefile fragment for building GCC for the TI PRU target.
+# Copyright (C) 2012-2018 Free Software Foundation, Inc.
+# Contributed by Dimitar Dimitrov <dimitar@dinux.eu>
+# Based on the t-nios2 fragment.
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify it
+# under the terms of the GNU General Public License as published
+# by the Free Software Foundation; either version 3, or (at your
+# option) any later version.
+#
+# GCC is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See
+# the GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public
+# License along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+# Unfortunately -mabi=ti is not feature-complete enough to build newlib.
+# Hence we cannot present -mabi=gnu/ti as a multilib option.
+
+pru-pragma.o: $(srcdir)/config/pru/pru-pragma.c $(RTL_H) $(TREE_H) \
+		$(CONFIG_H) $(TM_H) $(srcdir)/config/pru/pru-protos.h
+	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) $<
+
+pru-passes.o: $(srcdir)/config/pru/pru-passes.c $(RTL_H) $(TREE_H) \
+		$(CONFIG_H) $(TM_H) $(srcdir)/config/pru/pru-protos.h
+	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) $<
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 4b606c007d8..c8a39c3fa1a 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -21781,6 +21781,7 @@  for further explanation.
 * ARM Pragmas::
 * M32C Pragmas::
 * MeP Pragmas::
+* PRU Pragmas::
 * RS/6000 and PowerPC Pragmas::
 * S/390 Pragmas::
 * Darwin Pragmas::
@@ -21932,6 +21933,26 @@  extern int foo ();
 
 @end table
 
+@node PRU Pragmas
+@subsection PRU Pragmas
+
+@table @code
+
+@item ctable_entry @var{index} @var{constant_address}
+@cindex pragma, ctable_entry
+Specifies that the given PRU CTABLE entry at @var{index} has the value
+@var{constant_address}.  This enables GCC to emit LBCO/SBCO instructions
+when the load/store address is constant and can be addressed through some
+CTABLE entry.  Example:
+
+@smallexample
+/* Will compile to "lbco Rx, 2, 0x10, 4".  */
+#pragma ctable_entry 2 0x4802a000
+*(unsigned int *)0x4802a010 = val;
+@end smallexample
+
+@end table
+
 @node RS/6000 and PowerPC Pragmas
 @subsection RS/6000 and PowerPC Pragmas
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 8ac29fd48e1..a1fe70a53a5 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1055,6 +1055,10 @@  See RS/6000 and PowerPC Options.
 -mstack-protector-guard=@var{guard} -mstack-protector-guard-reg=@var{reg} @gol
 -mstack-protector-guard-offset=@var{offset}}
 
+@emph{PRU Options}
+@gccoptlist{-mmcu=@var{mcu}  -minrt  -mno-relax  -mloop @gol
+-mabi=@var{variant}}
+
 @emph{RISC-V Options}
 @gccoptlist{-mbranch-cost=@var{N-instruction} @gol
 -mplt  -mno-plt @gol
@@ -14765,6 +14769,7 @@  platform.
 * picoChip Options::
 * PowerPC Options::
 * PowerPC SPE Options::
+* PRU Options::
 * RISC-V Options::
 * RL78 Options::
 * RS/6000 and PowerPC Options::
@@ -23444,6 +23449,56 @@  the offset with a symbol reference to a canary in the TLS block.
 @end table
 
 
+@node PRU Options
+@subsection PRU Options
+@cindex PRU Options
+
+These command-line options are defined for the PRU target:
+
+@table @gcctabopt
+@item -minrt
+@opindex minrt
+Enable the use of a minimum runtime environment---no static
+initializers or constructors.  This results in a significant code size
+reduction of the final ELF binaries.
+
+@item -mmcu=@var{mcu}
+@opindex mmcu
+Specify the PRU MCU variant to use.  Check Newlib for the exact list of
+supported options.
+
+@item -mno-relax
+@opindex mno-relax
+Pass on (or do not pass on) the @option{-mrelax} command-line option
+to the linker.
+
+@item -mloop
+@opindex mloop
+Allow (or do not allow) GCC to use the LOOP instruction.
+
+@item -mabi=@var{variant}
+@opindex mabi
+Specify the ABI variant for which to output code.  Permissible values are
+@samp{gnu} for the GCC-specific ABI, and @samp{ti} for the fully conformant
+TI ABI.  These are the differences:
+
+@table @samp
+@item Function Pointer Size
+The TI ABI specifies that function (code) pointers are 16 bits wide, whereas
+GCC supports only 32-bit data and code pointers.
+
+@item Optional Return Value Pointer
+Function return values larger than 64 bits are passed by using a hidden
+pointer as the first argument of the function.  The TI ABI, however,
+mandates that the pointer may be NULL when the caller does not use the
+returned value.  GCC always passes and expects a valid return value pointer.
+
+@end table
+
+The current @option{-mabi=ti} implementation simply raises a compile-time
+error when any of the above code constructs is detected.
+
+@end table
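[Editor's note: a minimal C sketch, not part of the patch, illustrating the ABI difference described above. The struct and function names are made up for the example. Its return value occupies 96 bits, so it is returned through a hidden pointer argument; per the documentation above, the current `-mabi=ti` implementation rejects such a construct at compile time, while the default `-mabi=gnu` accepts it.]

```c
/* Hypothetical example: a return value larger than 64 bits.
   Compiled with -mabi=ti, the PRU backend reports a compile-time error
   for this construct; with the default -mabi=gnu it compiles normally
   and the caller passes a hidden return-value pointer.  */
struct wide
{
  unsigned int a;
  unsigned int b;
  unsigned int c;	/* 96 bits total: larger than 64 bits.  */
};

struct wide
make_wide (unsigned int x)
{
  struct wide w = { x, x + 1u, x + 2u };
  return w;		/* Returned via the hidden pointer argument.  */
}
```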
+
+
 @node RISC-V Options
 @subsection RISC-V Options
 @cindex RISC-V Options
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 4801d68a207..adcbf578cfc 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -3353,6 +3353,28 @@  Vector constant that is all zeros.
 
 @end table
 
+@item PRU---@file{config/pru/constraints.md}
+@table @code
+@item I
+An unsigned 8-bit integer constant.
+
+@item J
+An unsigned 16-bit integer constant.
+
+@item L
+An unsigned 5-bit integer constant (for shift counts).
+
+@item M
+An integer constant in the range @minus{}255 to 0.
+
+@item T
+A text segment (program memory) constant label.
+
+@item Z
+Integer constant zero.
+
+@end table
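[Editor's note: a hedged sketch, not part of the patch, of how the constraint letters above could be used in GCC extended inline assembly. The function name is invented, the PRU `add` mnemonic is assumed from the port's instruction set, and the snippet builds only with a PRU cross-compiler, so no host-runnable test is given.]

```c
/* Hypothetical PRU-only sketch: the "I" constraint accepts an unsigned
   8-bit integer constant, so 200 is a valid immediate operand here,
   while e.g. 300 would be rejected by the constraint.  */
unsigned int
add200 (unsigned int x)
{
  unsigned int r;
  __asm__ ("add\t%0, %1, %2" : "=r" (r) : "r" (x), "I" (200));
  return r;
}
```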
+
 @item RL78---@file{config/rl78/constraints.md}
 @table @code