
[ARC] Add single/double IEEE precission FPU support.

Message ID 1454335022-21760-1-git-send-email-claziss@synopsys.com
State New

Commit Message

Claudiu Zissulescu Feb. 1, 2016, 1:57 p.m. UTC
In this patch, we add support for the new FPU instructions available with
ARC V2 processors.  The new FPU instructions cover both single and
double precision IEEE formats.  While single precision is available
for both ARC EM and ARC HS processors, double precision is only
available for ARC HS.  ARC EM will make use of the double precision assist
instructions, which are in fact FPX double instructions.  The double
precision floating point instructions make use of odd-even
register pairs to hold 64-bit data, exactly as the ldd/std
instructions do.

In addition to the changes required to support the FPU instructions in
GCC, we force all 64-bit data to use odd-even register pairs (HS
only), as we observed better usage of ldd/std and fewer generated
move instructions.  A byproduct of this optimization is a new ABI, which
places 64-bit arguments into odd-even register pairs.  This behavior
can be selected using the -mabi option.

Feedback is welcome,
Claudiu

gcc/
2016-02-01  Claudiu Zissulescu  <claziss@synopsys.com>

	* config/arc/arc-modes.def (CC_FPU, CC_FPUE, CC_FPU_UNEQ): New
	modes.
	* config/arc/arc-opts.h (FPU_SP, FPU_SF, FPU_SC, FPU_SD, FPU_DP)
	(FPU_DF, FPU_DC, FPU_DD, FXP_DP): Define.
	(arc_abi_type): New enum.
	* config/arc/arc-protos.h (arc_init_cumulative_args): Declare.
	* config/arc/arc.c (TARGET_STRICT_ARGUMENT_NAMING): Define.
	(arc_init): Check FPU options.
	(get_arc_condition_code): Handle new CC_FPU* modes.
	(arc_select_cc_mode): Likewise.
	(arc_conditional_register_usage): Allow 64 bit datum into even-odd
	register pair only. Allow access for ARCv2 accumulator.
	(gen_compare_reg): Whenever we have FPU support use FPU compare
	instructions.
	(arc_setup_incoming_varargs): Handle even-odd register pair (ARC
	HS only).
	(arc_strict_argument_naming): New function.
	(arc_init_cumulative_args): Likewise.
	(arc_hard_regno_nregs): Likewise.
	(arc_function_args_impl): Likewise.
	(arc_arg_partial_bytes): Use arc_function_args_impl function.
	(arc_function_arg): Likewise.
	(arc_function_arg_advance): Likewise.
	(arc_reorg): Don't generate brcc insns when FPU compare
	instructions are involved.
	* config/arc/arc.h (TARGET_DPFP): Add TARGET_FP_DPAX condition.
	(TARGET_OPTFPE): Add condition when ARC EM can use optimized
	floating point emulation.
	(ACC_REG_FIRST, ACC_REG_LAST, ACCL_REGNO, ACCH_REGNO): Define.
	(CUMULATIVE_ARGS): New structure.
	(INIT_CUMULATIVE_ARGS): Use arc_init_cumulative_args.
	(REVERSE_CONDITION): Add new CC_FPU* modes.
	(TARGET_HARD_FLOAT, TARGET_FP_SINGLE, TARGET_FP_DOUBLE)
	(TARGET_FP_SFUZED, TARGET_FP_DFUZED, TARGET_FP_SCONV)
	(TARGET_FP_DCONV, TARGET_FP_SSQRT, TARGET_FP_DSQRT)
	(TARGET_FP_DPAX): Define.
	* config/arc/arc.md (ARCV2_ACC): New constant.
	(type): New fpu type attribute.
	(SDF): Conditional iterator.
	(cstore<mode>, cbranch<mode>): Change expand condition.
	(addsf3, subsf3, mulsf3, adddf3, subdf3, muldf3): New expands,
	handles FPU/FPX cases as well.
	* config/arc/arc.opt (mfpu, mabi): New options.
	* config/arc/fpx.md (addsf3_fpx, subsf3_fpx, mulsf3_fpx):
	Renamed.
	(adddf3, muldf3, subdf3): Removed.
	* config/arc/predicates.md (proper_comparison_operator): Recognize
	CC_FPU* modes.
	* config/arc/fpu.md: New file.
        * doc/invoke.texi (ARC Options): Document mabi and mfpu options.
---
 gcc/config/arc/arc-modes.def |   5 +
 gcc/config/arc/arc-opts.h    |  27 ++
 gcc/config/arc/arc-protos.h  |   1 +
 gcc/config/arc/arc.c         | 398 ++++++++++++++++++++++++-----
 gcc/config/arc/arc.h         |  70 ++++--
 gcc/config/arc/arc.md        | 150 ++++++++++-
 gcc/config/arc/arc.opt       |  56 +++++
 gcc/config/arc/fpu.md        | 580 +++++++++++++++++++++++++++++++++++++++++++
 gcc/config/arc/fpx.md        |  64 +----
 gcc/config/arc/predicates.md |  10 +
 gcc/doc/invoke.texi          |  91 ++++++-
 11 files changed, 1310 insertions(+), 142 deletions(-)
 create mode 100644 gcc/config/arc/fpu.md

Comments

Joern Wolfgang Rennecke Feb. 2, 2016, 10:52 p.m. UTC | #1
On 01/02/16 13:57, Claudiu Zissulescu wrote:
> In this patch, we add support for the new FPU instructions available with
> ARC V2 processors.  The new FPU instructions covers both single and
> double precision IEEE formats. While the single precision is available
> for both ARC EM and ARC HS processors, the double precision is only
> available for ARC HS. ARC EM will make use of the double precision assist
> instructions which are in fact FPX double instructions.  The double
> floating point precision instructions are making use of the odd-even
> register pairs to hold 64-bit datums, exactly like in the case of ldd/std
> instructions.
>
> Additional to the mods required by FPU instructions to be supported by
> GCC, we forced all the 64 bit datum to use odd-even register pairs (HS
> only), as we observed a better usage of the ldd/std, and less generated
> move instructions.  A byproduct of this optimization, is a new ABI, which
> places the 64-bit arguments into odd-even register pairs.  This behavior
> can be selected using -mabi option.
>
> Feedback is welcomed,
>
   VECTOR_MODES (INT, 16);       /* V16QI V8HI V4SI V2DI */
+
+/* FPU conditon flags. */

Typo

+       error ("FPU double precission options are available for ARC HS only.");

There should be no period at the end of the error message string.

+      if (TARGET_HS && (arc_fpu_build & FPX_DP))
+       error ("FPU double precission assist "

Typo.  And Ditto.


+      case EQ:
+      case NE:
+      case UNORDERED:
+      case UNLT:
+      case UNLE:
+      case UNGT:
+      case UNGE:
+       return CC_FPUmode;
+
+      case LT:
+      case LE:
+      case GT:
+      case GE:
+      case ORDERED:
+       return CC_FPUEmode;

cse and other code transformations are likely to do better if you use
just one mode for these.  It is also very odd to have comparisons and their
inverse use different modes.  Have you done any benchmarking for this?

@@ -1282,6 +1363,16 @@ arc_conditional_register_usage (void)
         arc_hard_regno_mode_ok[60] = 1 << (int) S_MODE;
      }

+  /* ARCHS has 64-bit data-path which makes use of the even-odd paired
+     registers.  */
+  if (TARGET_HS)
+    {
+      for (regno = 1; regno < 32; regno +=2)
+       {
+         arc_hard_regno_mode_ok[regno] = S_MODES;
+       }
+    }
+

Does TARGET_HS with -mabi=default allow for passing DFmode / DImode
arguments in odd registers?  I fear you might run into reload trouble
when trying to access the values.

+arc_hard_regno_nregs (int regno,
...
+  if ((regno < FIRST_PSEUDO_REGISTER)
+      && (HARD_REGNO_MODE_OK (regno, mode)
+         || (mode == BLKmode)))
+    return words;
+  return 0;

This prima facie contradicts HARD_REGNO_NREGS, which considers the
larger sizes of simd vector and dma config registers.
I see that there is no actual conflict as the vector registers are not
used for argument passing, but the comment in the function only states
what the function does - not quite correctly, as detailed before - and
not what it is for.

So, either the mxp support has to be removed before this patch goes in,
or arc_hard_regno_nregs has to handle simd registers properly, or the
comment at the top should state the limited applicability of this
function, and there should be an assert to check that the register
number passed is suitable - e.g.:
gcc_assert (regno < ARC_FIRST_SIMD_VR_REG)

+/* Given an CUMULATIVE_ARGS, this function returns an RTX if the

Typo: C is not a vowel.

+  if (!named && TARGET_HS)
+    {
+      /* For unamed args don't try fill up the reg-holes.  */
+      reg_idx = cum->last_reg;
+      /* Only interested in the number of regs.  */

You should make up your mind what the priorities for stdarg are.
Traditionally, lots of gcc ports have supported broken code that lacks
declarations of variadic functions, and furthermore have placed
emphasis on simplicity of varargs/stdarg callee code, at the expense
of normal code.  Often for compatibility with a pre-existing
compiler, sometimes by just copying from existing ports without
stopping to consider the ramifications.
If you make argument passing different for stdarg declared functions,
the broken code that lacks declarations won't work any more.
Ignoring registers for argument passing is not helping the callers
code density.  So the only objective that might be furthered here
is stdarg callee simplicity.  But if you really want that, and ignore
compatibility with broken code, the logical thing to do is not to
pass any unnamed arguments in registers.

If stdarg caller's code size is considered important, and stdarg
callees mostly irrelevant (as mostly associated with I/O, and
linked in just once per function), this aligns well with supporting
broken code: it shouldn't matter if the argument is anonymous or
not, it's the same effort for the caller to pass it.

One further thing to consider when forging new ABIs is that
partial argument passing is there solely for the convenience of
stdarg callees, and/or the programmer who wrote that part of
the target port.  Code size and speed of common code is generally
better when partial argument passing is eliminated.  There is
nothing (apart from backwards compatibility issues for
established ABIs) that prevents a port for a processor with a
single class of argument passing registers to make the stdarg
handling code in the callee (TARGET_GIMPLIFY_VA_ARG_EXPR) keep
track of when the register save zone is exhausted.  It's just
shifting some burden from non-variadic code and variadic caller
side code to the callee side of variadic code.

+      /* We cannot partialy pass some modes for HS (i.e. DImode,
Typo
+        non-compatible mode).  As the DI regs needs to be in even-odd

If you can't always do partial argument passing, then there is no
point in doing it at all.  Well, at least for any given mode; but
really, if you need to keep track of how many saved registers you
have left in order to process va_arg invocations for some modes,
it's easier to just ditch the partial argument passing altogether.
The only way to salvage a simplistic va_arg implementation is to
force alignment of stack-passed arguments to match register-passed
arguments.

+        register pair, when comming to partial passing of an

Typo.

+        argument, the code bellow will not find a suitable register

Typo.

+  if (advance && named && (arc_abi != ARC_ABI_PACK))
+    {
+       /* MAX out any other free register if a named arguments goes on

Typo: make up your mind about singular/plural.  You could use 'any' instead
of 'a' to remain ambiguous, but the noun has to agree with the verb.

+             if (!link_insn
+                 /* Avoid FPU instructions.  */
+                 || (GET_MODE (SET_DEST
+                               (PATTERN (link_insn))) == CC_FPUmode)
+                 || (GET_MODE (SET_DEST
+                               (PATTERN (link_insn))) == CC_FPU_UNEQmode)
+                 || (GET_MODE (SET_DEST
+                               (PATTERN (link_insn))) == CC_FPUEmode))

It's pointless to search for the CC setter and then bail out this late.
The mode is also accessible in the CC user, so after we have computed
pc_target, we can check the condition code register in the comparison
XEXP (pc_target, 1) for its mode.

+        emit_insn(gen_adddf3_insn(operands[0], operands[1], operands[2],tmp,const0_rtx));

Four missing spaces.

+     emit_insn(gen_adddf3_insn(operands[0], operands[1], operands[2],const1_rtx,const1_rtx));

Ditto.

+        int const_index = ((GET_CODE (operands[1]) == CONST_DOUBLE) ? 1: 2);

Missing space.

+(define_expand "subdf3"
..
+   if (TARGET_DPFP)
..
+        if (TARGET_EM && GET_CODE (operands[1]) == CONST_DOUBLE)
+           emit_insn(gen_subdf3_insn(operands[0], operands[2], operands[1],tmp,const0_rtx));

Four missing spaces.  Besides, subtraction is not commutative.

+        else
+           emit_insn(gen_subdf3_insn(operands[0], operands[1], operands[2],tmp,const0_rtx));

Four missing spaces.

+     emit_insn(gen_subdf3_insn(operands[0], operands[1], operands[2],const1_rtx,const1_rtx));

Ditto.

still in "subdf3":
+  else if (TARGET_FP_DOUBLE)

So this implies that both (TARGET_DPFP) and (TARGET_FP_DOUBLE) might be
true at the same time.  In that case, do we really want to prefer the
(TARGET_DPFP) expansion?

+        emit_insn(gen_muldf3_insn(operands[0], operands[1], operands[2],tmp,const0_rtx));
Four missing spaces.

+      emit_insn(gen_muldf3_insn(operands[0], operands[1], operands[2],const1_rtx,const1_rtx));
Ditto.


+mabi=
+Target RejectNegative Joined Var(arc_abi) Enum(arc_abi_type) Init(ARC_ABI_DEFAULT)
+Specify the desired ABI used for ARC HS processors. Variants can be default or mwabi.

According to the rest of the patch, this should be 'optimized' instead of
'mwabi'.  Although it could make sense to use option names that are more
likely to stand the test of time if you change the default later on, or if
other changes ask for yet another ABI change that is optimized in a
different way.
I also note that there is a variable value ARC_ABI_PACK with no matching
string, but some code to handle it.

May I also suggest that it makes sense to tie the default ABI to hardware
variants that are relevant for multilib purposes.
E.g., if you have a multilib with full hardware floating point support, it
makes sense to build that for the double-word register optimized ABI,
and make that ABI the default for that hardware configuration.


+(define_insn "*addsf3_fpu"
+  [(set (match_operand:SF 0 "register_operand" "=r,r,r,r,r")
+       (plus:SF (match_operand:SF 1 "nonmemory_operand" "0,r,0,r,F")
+                (match_operand:SF 2 "nonmemory_operand" "r,r,F,F,r")))]

Addition is commutative (even floating point addition - even though it's 
not associative).
We are missing out on post-reload conditionalization here.

+;; Multiplication
+(define_insn "*mulsf3_fpu"
+  [(set (match_operand:SF 0 "register_operand" "=r,r,r,r,r")
+       (mult:SF (match_operand:SF 1 "nonmemory_operand" "0,r,0,r,F")
+                (match_operand:SF 2 "nonmemory_operand" "r,r,F,F,r")))]

Ditto for multiplication.
Also applies to the multiply operands of fmasf4_fpu, fnmasf4_fpu, 
fmadf4_fpu, fnmadf4_fpu.

+(define_insn "*cmpsf_trap_fpu"

That name makes as little sense to me as having two separate modes
CC_FPU and CC_FPUE for positive / negated usage and having two comparison
patterns per precision that do the same but pretend to be dissimilar.

+(define_insn "*cmpsf_fpu_uneq"
+  [(set (reg:CC_FPU_UNEQ CC_REG)
+       (compare:CC_FPU_UNEQ
+        (match_operand:SF 0 "register_operand"  "r,r")
+        (match_operand:SF 1 "nonmemory_operand" "r,F")))]
+  "TARGET_FP_SINGLE"
+  "fscmp %0, %1\\n\\tmov.v.f 0,0\\t;set Z flag"
+  [(set_attr "length" "8,16")

That should be "8,12" .

Likewise for double precision.


+/* Single precissionf floating point support with fused operation. */
+#define TARGET_FP_SFUZED  ((arc_fpu_build & FPU_SF) != 0)
+/* Double precissionf floating point support with fused operation. */

2*2 typo : precissionf
And is there a particular reason for 'FUZED' (a self-destruct mechanism?),
or is that another typo / malapropism?
Also, the agglomeration of S/D with FU{S,Z}ED is confusing.  Could you
spare another underscore?  If you are too skint for horizontal space, even
camel case would be better than suggesting SF used / DF used.

I couldn't help but think of this: https://xkcd.com/739/

There are likely a number of costs that should be tweaked in arc_rtx_costs.
Claudiu Zissulescu Feb. 3, 2016, 3:02 p.m. UTC | #2
First, I will split this patch in two.  The first part will deal with the FPU instructions.  The second patch will try to address a new ABI optimized for odd-even registers, as the comments on mabi=optimized are numerous and I need to carefully prepare an answer.
The remainder of this email will focus on the FPU patch.

> +      case EQ:
> +      case NE:
> +      case UNORDERED:
> +      case UNLT:
> +      case UNLE:
> +      case UNGT:
> +      case UNGE:
> +       return CC_FPUmode;
> +
> +      case LT:
> +      case LE:
> +      case GT:
> +      case GE:
> +      case ORDERED:
> +       return CC_FPUEmode;
> 
> cse and other code transformations are likely to do better if you use
> just one mode for these.  It is also very odd to have comparisons and their
> inverse use different modes.  Have you done any benchmarking for this?

Right, the ORDERED case should be in CC_FPUmode. An inspiration point for the CC_FPU/CC_FPUE modes is the arm port. The reason for having the two CC_FPU and CC_FPUE modes is to emit signaling FPU compare instructions.  We can use a single CC_FPU mode here instead of two, but we may lose functionality.
Regarding benchmarks, I do not have an established benchmark for this; however, as far as I could see, the code generated for the FPU looks clean.
Please let me know whether it is acceptable to go with CC_FPU/CC_FPUE and fix ORDERED further on, or whether I should use a single mode.

> +  /* ARCHS has 64-bit data-path which makes use of the even-odd paired
> +     registers.  */
> +  if (TARGET_HS)
> +    {
> +      for (regno = 1; regno < 32; regno +=2)
> +       {
> +         arc_hard_regno_mode_ok[regno] = S_MODES;
> +       }
> +    }
> +
> 
> Does TARGET_HS with -mabi=default allow for passing DFmode / DImode
> arguments
> in odd registers?  I fear you might run into reload trouble when trying to
> access the values.

Although I haven't bumped into this issue until now, I cannot say it will not happen. Hence, I would create a new register class to hold the odd-even registers, so the above code would no longer be needed. What do you say?

> still in "subdf3":
> +  else if (TARGET_FP_DOUBLE)
> 
> So this implies that both (TARGET_DPFP) and (TARGET_FP_DOUBLE) might
> be
> true at
> the same time.  In that case, so we really want to prefer the
> (TARGET_DPFP) expansion?

The TARGET_DPFP (FPX instructions) and TARGET_FP_DOUBLE (FPU) options are mutually exclusive. There should be a check in the arc_init() function for this case.

> +(define_insn "*cmpsf_trap_fpu"
> 
> That name makes as little sense to me as having two separate modes
> CC_FPU and CC_FPUE
> for positive / negated usage and having two comparison patterns pre
> precision that
> do the same but pretend to be dissimilar.
> 

The F{S/D}CMPF instruction is similar to the F{S/D}CMP instruction in cases when either of the instruction operands is a signaling NaN. The FxCMPF instruction updates the invalid flag (FPU_STATUS.IV) when either of the operands is a quiet or signaling NaN, whereas the FxCMP instruction updates the invalid flag (FPU_STATUS.IV) only when either of the operands is a quiet NaN. We need to use FxCMPF only if we keep CC_FPU and CC_FPUE; otherwise, we shall use only the FxCMP instruction.

> Also, the agglomeration of S/D with FU{S,Z}ED is confusing.  Could you
> spare another underscore? 

Is this better?

#define TARGET_FP_SP_BASE   ((arc_fpu_build & FPU_SP) != 0)
#define TARGET_FP_DP_BASE   ((arc_fpu_build & FPU_DP) != 0)
#define TARGET_FP_SP_FUSED  ((arc_fpu_build & FPU_SF) != 0)
#define TARGET_FP_DP_FUSED  ((arc_fpu_build & FPU_DF) != 0)
#define TARGET_FP_SP_CONV   ((arc_fpu_build & FPU_SC) != 0)
#define TARGET_FP_DP_CONV   ((arc_fpu_build & FPU_DC) != 0)
#define TARGET_FP_SP_SQRT   ((arc_fpu_build & FPU_SD) != 0)
#define TARGET_FP_DP_SQRT   ((arc_fpu_build & FPU_DD) != 0)
#define TARGET_FP_DP_AX     ((arc_fpu_build & FPX_DP) != 0)

Thanks,
Claudiu
Joern Wolfgang Rennecke Feb. 3, 2016, 6:41 p.m. UTC | #3
On 03/02/16 15:02, Claudiu Zissulescu wrote:
> First, I will split this patch in two. The first part will deal with the FPU instructions. The second patch, will try to address a new abi optimized for odd-even registers as the comments for the mabi=optimized are numerous and I need to carefully prepare for an answer.
> The remaining of this email will focus on FPU patch.
>
>> +      case EQ:
>> +      case NE:
>> +      case UNORDERED:
>> +      case UNLT:
>> +      case UNLE:
>> +      case UNGT:
>> +      case UNGE:
>> +       return CC_FPUmode;
>> +
>> +      case LT:
>> +      case LE:
>> +      case GT:
>> +      case GE:
>> +      case ORDERED:
>> +       return CC_FPUEmode;
>>
>> cse and other code transformations are likely to do better if you use
>> just one mode for these.  It is also very odd to have comparisons and their
>> inverse use different modes.  Have you done any benchmarking for this?
> Right, the ORDERED should be in CC_FPUmode. An inspiration point for CC_FPU/CC_FPUE mode is the arm port.

I can't see how this code in the arm port can actually work correctly.
When, for instance, combine simplifies a comparison, the comparison code
can change, and it will use SELECT_CC_MODE to find a new mode for the
comparison.  Thus, whether a comparison traps on qNaNs will depend on
the whims of combine.
Also, the trapping comparisons don't show the side effect of trapping
on qNaNs, which means they can be speculated.

To make the trapping comparisons safe, they should display the side 
effect in the rtl, and only
be used when requested by options, type attributes, pragmas etc.
They could almost be safe to use by default for -ffinite-math-only,
except that when the frontend knows how to tell qNaNs and sNaNs apart,
and speculates a comparison after some integer/fp mixed computation
where it can infer that no sNaN will occur, you could still get an
unexpected signal.
>   The reason why having the two CC_FPU and CC_FPUE modes is to emit signaling FPU compare instructions.
I don't know if your compare instructions are signalling for quiet NaNs
(I hope they're not), but the mode of the comparison result shouldn't be
used to distinguish that - it's not safe, see above.
The purpose of the mode of the result is to distinguish different
interpretations for the bit patterns inside the comparison result flags.
>    We can use a single CC_FPU mode here instead of two, but we may lose functionality.
Can you define what that functionality actually is, and show some simple
test code to demonstrate how it works with your port extension?
> Regarding benchmarks, I do not have an establish benchmark for this, however, as far as I could see the code generated for FPU looks clean.
> Please let me know if it is acceptable to go with CC_FPU/CC_FPUE, and ORDERED fix further on.
No, there should be a discernible and achievable purpose for comparison
modes, which you have not demonstrated so far for the CC_FPU/CC_FPUE
dichotomy.
>   Or, to have a single mode.
Yes.
>
>> +  /* ARCHS has 64-bit data-path which makes use of the even-odd paired
>> +     registers.  */
>> +  if (TARGET_HS)
>> +    {
>> +      for (regno = 1; regno < 32; regno +=2)
>> +       {
>> +         arc_hard_regno_mode_ok[regno] = S_MODES;
>> +       }
>> +    }
>> +
>>
>> Does TARGET_HS with -mabi=default allow for passing DFmode / DImode
>> arguments
>> in odd registers?  I fear you might run into reload trouble when trying to
>> access the values.
> Although, I haven't bump into this issue until now, I do not say it may not happen. Hence, I would create a new register class to hold the odd-even registers. Hence the above code will not be needed. What do u say?
That would have been possible a couple of years ago, but these days all
the constituent registers of a multi-reg hard register have to be in a
constraint's register class for the constraint to match.
You could fudge this by using two classes for two-reg allocations, one
with 0,1, 4,5, 8,9 ... the other with 2,3, 6,7, 10,11 ..., but then you
need another pair for four-register allocations, and maybe you want to
add various union and intersection classes, and the register allocators
are rather rubbish when it comes to balancing allocations between classes
of similar size and utility, so you should really try to avoid this.
A way to avoid this is not to give the option of using the old ABI while
enforcing alignment in registers.
Or you could use a different mode for the argument passing when it ends
up unaligned; I suppose BLKmode should work, using a vector to designate
the constituent registers of the function argument.
> +(define_insn "*cmpsf_trap_fpu"
>
> That name makes as little sense to me as having two separate modes
> CC_FPU and CC_FPUE
> for positive / negated usage and having two comparison patterns pre
> precision that
> do the same but pretend to be dissimilar.
>
> The F{S/D}CMPF instruction is similar to the F{S/D}CMP instruction
Oops, I missed the 'f' suffix.  So the "*trap_fpu" patterns really are
different...
>   in cases when either of the instruction operands is a signaling NaN. The FxCMPF instruction updates the invalid flag (FPU_STATUS.IV) when either of the operands is a quiet or signaling NaN, whereas, the FxCMP instruction updates the invalid flag (FPU_STATUS.IV) only when either of the operands is a quiet NaN. We need to use the FxCMPF only if we keep the CC_FPU an CC_FPUE otherwise, we shall use only FxCMP instruction.
... and have different side effects not mentioned in the rtl.
When the invalid flag is set, does that mean the CPU gets an FPU exception?
So quiet NaNs always signal, and signalling NaNs only when used as 
operands of FxCMPF?

You might need to resort to software floating point comparisons then unless
-ffinite-math-only is in effect.
> Is this better?
>
> #define TARGET_FP_SP_BASE   ((arc_fpu_build & FPU_SP) != 0)
> #define TARGET_FP_DP_BASE   ((arc_fpu_build & FPU_DP) != 0)
> #define TARGET_FP_SP_FUSED  ((arc_fpu_build & FPU_SF) != 0)
> #define TARGET_FP_DP_FUSED  ((arc_fpu_build & FPU_DF) != 0)
> #define TARGET_FP_SP_CONV   ((arc_fpu_build & FPU_SC) != 0)
> #define TARGET_FP_DP_CONV   ((arc_fpu_build & FPU_DC) != 0)
> #define TARGET_FP_SP_SQRT   ((arc_fpu_build & FPU_SD) != 0)
> #define TARGET_FP_DP_SQRT   ((arc_fpu_build & FPU_DD) != 0)
> #define TARGET_FP_DP_AX     ((arc_fpu_build & FPX_DP) != 0)
Yes.
Joern Wolfgang Rennecke Feb. 5, 2016, 2:16 p.m. UTC | #4
P.S.: if code that is missing prototypes for stdarg functions is of no
concern, there is another ABI alternative that might give good code
density for architectures like ARC that have pre-decrement addressing
modes and allow immediates to be pushed:

You could put all unnamed arguments on the stack (thus simplifying
varargs processing), and leave all registers not used for argument
passing call-saved.  Thus, the callers wouldn't have to worry about
saving these registers or reloading their values from the stack.

For gcc, this would require making the call fusage really work - probably
involving a hook to tell the middle-end that the port really wants that -
or a kludge to make the affected call insns not look like call insns,
similar to the sfuncs.
Claudiu Zissulescu Feb. 5, 2016, 3:54 p.m. UTC | #5
> P.S.: if code that is missing prototypes for stdarg functions is of no concern,
> there is another ABI alternative that might give good code density for
> architectures like ARC that have pre-decrement addressing modes and allow
> immediates to be pushed:
> 
> You could put all unnamed arguments on the stack (thus simplifying varargs
> processing), and leave all registers not used for argument passing call-saved.
> Thus, the callers wouldn't have to worry about saving these registers or
> reloading their values from the stack.
> 
> For gcc, this would require making the call fusage really work - probably
> involving a hook to tell the middle-end that the port really wants that - or a
> kludge to make affected call insn not look like call insns, similar to the sfuncs.

Unfortunately, we need to stay compatible with the previous ABI for the time being.
I am now investigating passing DI-like modes in non-even-odd registers. The biggest challenge is how to pass such a mode partially without introducing odd/even register classes.
Claudiu Zissulescu Feb. 9, 2016, 3:34 p.m. UTC | #6
Please find attached a reworked patch. It doesn't contain the ABI modifications, as I notified you earlier in an email.  Also, you may have extra comments regarding these original observations:

>+  /* ARCHS has 64-bit data-path which makes use of the even-odd paired
>+     registers.  */
>+  if (TARGET_HS)
>+    {
>+      for (regno = 1; regno < 32; regno +=2)
>+       {
>+         arc_hard_regno_mode_ok[regno] = S_MODES;
>+       }
>+    }
>+
>
>Does TARGET_HS with -mabi=default allow for passing DFmode / DImode 
>arguments
>in odd registers?  I fear you might run into reload trouble when trying to
>access the values.

The current ABI passes DI-like modes in any register pair. This should not be an issue, as movdi_insn and movdf_insn should handle those exceptional cases. As for partial passing of arguments, move_block_from_reg() should take care of exceptional cases like DImode.

>+             if (!link_insn
>+                 /* Avoid FPU instructions.  */
>+                 || (GET_MODE (SET_DEST
>+                               (PATTERN (link_insn))) == CC_FPUmode)
>+                 || (GET_MODE (SET_DEST
>+                               (PATTERN (link_insn))) == CC_FPU_UNEQmode)
>+                 || (GET_MODE (SET_DEST
>+                               (PATTERN (link_insn))) == CC_FPUEmode))
>
>It's pointless to search for the CC setter and then bail out this late.
>The mode is also accessible in the CC user, so after we have computed
>pc_target, we can check the condition code register in the comparison
>XEXP (pc_target, 1) for its mode.

In most cases, checking only the CC user may be sufficient. However, there are cases (I found only one) where the CC user has a different mode than the CC setter.  This happens when running the gcc.dg/pr56424.c test. Here, the CC_FPU mode cstore is simplified by the following steps, losing the CC_FPU mode:

In the expand:
   18: cc:CC_FPU=cmp(r159:DF,r162:DF)
   19: r163:SI=cc:CC_FPU<0
   20: r161:QI=r163:SI#0
   21: r153:SI=zero_extend(r161:QI)
   22: cc:CC_ZN=cmp(r153:SI,0)
   23: pc={(cc:CC_ZN!=0)?L28:pc}

Then after combine we get this:
   18: cc:CC_FPU=cmp(r2:DF,r4:DF)
      REG_DEAD r4:DF
      REG_DEAD r2:DF
   23: pc={(cc:CC_ZN<0)?L28:pc}
      REG_DEAD cc:CC_ZN
      REG_BR_PROB 6102

Ok to apply?
Claudiu
Joern Wolfgang Rennecke Feb. 10, 2016, 6:39 a.m. UTC | #7
On 09/02/16 15:34, Claudiu Zissulescu wrote:
> Most of the cases checking only the CC user may be sufficient. However, there are cases (only one which I found), where the CC user has a different mode than of the CC setter.  This is happening when running gcc.dg/pr56424.c test. Here, the C_FPU mode cstore is simplified by the following steps losing the CC_FPU mode:
>
> In the expand:
>     18: cc:CC_FPU=cmp(r159:DF,r162:DF)
>     19: r163:SI=cc:CC_FPU<0
>     20: r161:QI=r163:SI#0
>     21: r153:SI=zero_extend(r161:QI)
>     22: cc:CC_ZN=cmp(r153:SI,0)
>     23: pc={(cc:CC_ZN!=0)?L28:pc}
>
> Then after combine we get this:
>     18: cc:CC_FPU=cmp(r2:DF,r4:DF)
>        REG_DEAD r4:DF
>        REG_DEAD r2:DF
>     23: pc={(cc:CC_ZN<0)?L28:pc}
>        REG_DEAD cc:CC_ZN
>        REG_BR_PROB 6102

That sounds like a bug.  Have you looked more closely at what's going on?
Claudiu Zissulescu Feb. 10, 2016, 9:43 a.m. UTC | #8
> > In the expand:
> >     18: cc:CC_FPU=cmp(r159:DF,r162:DF)
> >     19: r163:SI=cc:CC_FPU<0
> >     20: r161:QI=r163:SI#0
> >     21: r153:SI=zero_extend(r161:QI)
> >     22: cc:CC_ZN=cmp(r153:SI,0)
> >     23: pc={(cc:CC_ZN!=0)?L28:pc}
> >
> > Then after combine we get this:
> >     18: cc:CC_FPU=cmp(r2:DF,r4:DF)
> >        REG_DEAD r4:DF
> >        REG_DEAD r2:DF
> >     23: pc={(cc:CC_ZN<0)?L28:pc}
> >        REG_DEAD cc:CC_ZN
> >        REG_BR_PROB 6102
> 
> That sound like a bug.  Have you looked more closely what's going on?

The fwprop1 pass collapses insn 20 into insn 21; no surprise so far. Then the combiner first merges insns 19 and 21 into insn 21 (this seems sane), followed by combining the resulting insn 21 into insn 22. Finally, insn 22 changes the condition of the jump (insn 23).
The last steps are a bit too aggressive, but I can see the logic in them. In practice, insn 22 tells the combiner how to change a CC_FPU mode into a CC_ZN mode, resulting in the modification of insn 21 into insn 23. However, I cannot understand why the combiner chooses CC_ZN instead of CC_FPU.
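The pattern being simplified can be sketched in plain C (a hypothetical reduction in the spirit of gcc.dg/pr56424.c; the function and variable names are illustrative, not from the original test):

```c
#include <assert.h>

/* A double-precision compare whose boolean result is first stored
   (the cstore, insns 18-21 above) and then re-tested by a
   conditional branch (insns 22-23).  */
static int crossed;

static void check (double a, double b)
{
  int lt = a < b;   /* expands to cc:CC_FPU=cmp(...) plus a cstore */
  if (lt)           /* re-tests the stored result in CC_ZN mode    */
    crossed = 1;
}
```

After combine, the intermediate cstore insns are collapsed and the branch tests the FPU compare directly, which is where the mode mismatch described above shows up.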
Claudiu Zissulescu Feb. 10, 2016, 12:43 p.m. UTC | #9
> That sounds like a bug.  Have you looked more closely at what's going on?

Right, I found it.  I forgot to set the C_MODE class for the CC_FPU* modes in arc_mode_class[].  I will prepare a new patch with the proper handling.

Thanks!
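For context, the fix amounts to tagging the new CC modes with the condition-code class in the backend's mode-class table. A minimal sketch of the idea, assuming a simplified table (the enum names and indices below mimic the shape of arc_mode_class[] but are illustrative, not GCC's actual internals):

```c
#include <assert.h>

/* Each machine mode gets a class bitmask; condition-code modes must
   carry the condition class bit, otherwise passes such as combine may
   rewrite a compare into a different CC mode.  */
enum { C_MODE_BIT = 1 << 0 };
enum mode_index { CC_IDX, CC_FPU_IDX, CC_FPUE_IDX, CC_FPU_UNEQ_IDX, NUM_MODES };

static const int mode_class[NUM_MODES] = {
  [CC_IDX]          = C_MODE_BIT,
  /* The bug: the three entries below defaulted to 0 before the fix.  */
  [CC_FPU_IDX]      = C_MODE_BIT,
  [CC_FPUE_IDX]     = C_MODE_BIT,
  [CC_FPU_UNEQ_IDX] = C_MODE_BIT,
};

/* A compare may stay in MODE only if the mode's class is known.  */
static int cc_mode_ok (enum mode_index mode)
{
  return (mode_class[mode] & C_MODE_BIT) != 0;
}
```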
Claudiu Zissulescu Feb. 10, 2016, 1:31 p.m. UTC | #10
Please find attached the amended patch for FPU instructions.

Ok to apply?
Joern Wolfgang Rennecke Feb. 12, 2016, 11:41 p.m. UTC | #11
On 10/02/16 13:31, Claudiu Zissulescu wrote:
> Please find attached the amended patch for FPU instructions.
>
> Ok to apply?
+(define_insn "*cmpdf_fpu"

I'm wondering - could you compare with +zero using a literal (adding an 
alternative)?
(No need to hold up the main patch, but you can consider it for a 
follow-up patch)

(define_insn "*cmpsf_fpu_uneq"
+  [(set (reg:CC_FPU_UNEQ CC_REG)
+       (compare:CC_FPU_UNEQ
+        (match_operand:DF 0 "even_register_operand"  "r")

Typo: probably should be *cmpdf_fpu_uneq

+    case CC_FPUmode:
+      return !((code == LTGT) || (code == UNEQ));
Strictly speaking, this shouldn't accept unsigned comparisons,
although I can't think of a scenario where these would be presented
in this mode,
and the failure mode would just be an abort in get_arc_condition_code.

Otherwise, this is OK.
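The restriction Joern suggests can be sketched as a validity predicate over comparison codes (a simplification of the check in the patch; the enum is illustrative and not GCC's real rtx codes):

```c
#include <assert.h>
#include <stdbool.h>

/* Comparison codes, reduced to the ones discussed.  */
enum cmp { EQ, NE, GT, GE, LT, LE, LTGT, UNEQ, GTU, GEU, LTU, LEU };

/* CC_FPU can represent every floating-point comparison except LTGT
   and UNEQ (which need CC_FPU_UNEQ); the stricter version also
   rejects the unsigned codes, which should never reach a
   floating-point compare.  */
static bool cc_fpu_code_ok (enum cmp code)
{
  switch (code)
    {
    case LTGT:
    case UNEQ:                      /* handled by CC_FPU_UNEQ mode  */
    case GTU: case GEU:
    case LTU: case LEU:             /* unsigned: never valid on FP  */
      return false;
    default:
      return true;
    }
}
```

As noted in the review, an unsigned code reaching CC_FPUmode would only manifest as an abort in get_arc_condition_code, so the stricter check is defensive rather than required.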
Claudiu Zissulescu Feb. 16, 2016, 2:13 p.m. UTC | #12
Thanks, Joern,

Committed: r233451

> -----Original Message-----
> From: Joern Wolfgang Rennecke [mailto:gnu@amylaar.uk]
> Sent: Saturday, February 13, 2016 12:42 AM
> To: Claudiu Zissulescu; gcc-patches@gcc.gnu.org
> Cc: Francois.Bedard@synopsys.com; jeremy.bennett@embecosm.com
> Subject: Re: [PATCH] [ARC] Add single/double IEEE precission FPU support.
> 
> 
> 
> On 10/02/16 13:31, Claudiu Zissulescu wrote:
> > Please find attached the amended patch for FPU instructions.
> >
> > Ok to apply?
> +(define_insn "*cmpdf_fpu"
> 
> I'm wondering - could you compare with +zero using a literal (adding an
> alternative)?
> (No need to hold up the main patch, but you can consider it for a follow-up
> patch)
> 
> (define_insn "*cmpsf_fpu_uneq"
> +  [(set (reg:CC_FPU_UNEQ CC_REG)
> +       (compare:CC_FPU_UNEQ
> +        (match_operand:DF 0 "even_register_operand"  "r")
> 
> Typo: probably should be *cmpdf_fpu_uneq
> 
> +    case CC_FPUmode:
> +      return !((code == LTGT) || (code == UNEQ));
> Strictly speaking, this shouldn't accept unsigned comparisons, although I can't
> think of a scenario where these would be presented in this mode, and the
> failure mode would just be an abort in get_arc_condition_code.
> 
> Otherwise, this is OK.

Patch

diff --git a/gcc/config/arc/arc-modes.def b/gcc/config/arc/arc-modes.def
index b64a596..1f4bf95 100644
--- a/gcc/config/arc/arc-modes.def
+++ b/gcc/config/arc/arc-modes.def
@@ -35,3 +35,8 @@  CC_MODE (CC_FPX);
 VECTOR_MODES (INT, 4);        /*            V4QI V2HI */
 VECTOR_MODES (INT, 8);        /*       V8QI V4HI V2SI */
 VECTOR_MODES (INT, 16);       /* V16QI V8HI V4SI V2DI */
+
+/* FPU condition flags.  */
+CC_MODE (CC_FPU);
+CC_MODE (CC_FPUE);
+CC_MODE (CC_FPU_UNEQ);
diff --git a/gcc/config/arc/arc-opts.h b/gcc/config/arc/arc-opts.h
index 0f12885..be628e5 100644
--- a/gcc/config/arc/arc-opts.h
+++ b/gcc/config/arc/arc-opts.h
@@ -27,3 +27,30 @@  enum processor_type
   PROCESSOR_ARCEM,
   PROCESSOR_ARCHS
 };
+
+/* Single precision floating point.  */
+#define FPU_SP    0x0001
+/* Single precision fused floating point operations.  */
+#define FPU_SF    0x0002
+/* Single precision floating point format conversion operations.  */
+#define FPU_SC    0x0004
+/* Single precision floating point sqrt and div operations.  */
+#define FPU_SD    0x0008
+/* Double precision floating point.  */
+#define FPU_DP    0x0010
+/* Double precision fused floating point operations.  */
+#define FPU_DF    0x0020
+/* Double precision floating point format conversion operations.  */
+#define FPU_DC    0x0040
+/* Double precision floating point sqrt and div operations.  */
+#define FPU_DD    0x0080
+/* Double precision floating point assist operations.  */
+#define FPX_DP    0x0100
+
+enum arc_abi_type
+{
+  ARC_ABI_DEFAULT,
+  ARC_ABI_PACK,
+  ARC_ABI_OPTIMIZED
+};
+
diff --git a/gcc/config/arc/arc-protos.h b/gcc/config/arc/arc-protos.h
index f487291..ef00b47 100644
--- a/gcc/config/arc/arc-protos.h
+++ b/gcc/config/arc/arc-protos.h
@@ -123,3 +123,4 @@  extern int regno_clobbered_p (unsigned int, rtx_insn *, machine_mode, int);
 extern int arc_return_slot_offset (void);
 extern bool arc_legitimize_reload_address (rtx *, machine_mode, int, int);
 extern void arc_secondary_reload_conv (rtx, rtx, rtx, bool);
+extern void arc_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree, int);
diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
index b9799a0..3565c54 100644
--- a/gcc/config/arc/arc.c
+++ b/gcc/config/arc/arc.c
@@ -324,6 +324,10 @@  static void arc_finalize_pic (void);
 
 #undef TARGET_RETURN_IN_MEMORY
 #define TARGET_RETURN_IN_MEMORY arc_return_in_memory
+
+#undef TARGET_STRICT_ARGUMENT_NAMING
+#define TARGET_STRICT_ARGUMENT_NAMING arc_strict_argument_naming
+
 #undef TARGET_PASS_BY_REFERENCE
 #define TARGET_PASS_BY_REFERENCE arc_pass_by_reference
 
@@ -719,9 +723,16 @@  arc_init (void)
 
   /* FPX-3. No FPX extensions on pre-ARC600 cores.  */
   if ((TARGET_DPFP || TARGET_SPFP)
-      && !TARGET_ARCOMPACT_FAMILY)
+      && (!TARGET_ARCOMPACT_FAMILY && !TARGET_EM))
     error ("FPX extensions not available on pre-ARC600 cores");
 
+  /* FPX-4. No FPX extensions mixed with FPU extensions for ARC HS
+     cpus.  */
+  if ((TARGET_DPFP || TARGET_SPFP)
+      && TARGET_HARD_FLOAT
+      && TARGET_HS)
+    error ("No FPX/FPU mixing allowed");
+
   /* Only selected multiplier configurations are available for HS.  */
   if (TARGET_HS && ((arc_mpy_option > 2 && arc_mpy_option < 7)
 		    || (arc_mpy_option == 1)))
@@ -743,6 +754,19 @@  arc_init (void)
   if (TARGET_LL64 && !TARGET_HS)
     error ("-mll64 is only supported for ARC HS cores");
 
+  /* FPU support only for V2.  */
+  if (TARGET_HARD_FLOAT)
+    {
+      if (TARGET_EM
+	  && (arc_fpu_build & ~(FPU_SP | FPU_SF | FPU_SC | FPU_SD | FPX_DP)))
+	error ("FPU double precision options are available for ARC HS only.");
+      if (TARGET_HS && (arc_fpu_build & FPX_DP))
+	error ("FPU double precision assist "
+	       "options are not available for ARC HS.");
+      if (!TARGET_HS && !TARGET_EM)
+	error ("FPU options are available for ARCv2 architecture only");
+    }
+
   arc_init_reg_tables ();
 
   /* Initialize array for PRINT_OPERAND_PUNCT_VALID_P.  */
@@ -926,6 +950,34 @@  get_arc_condition_code (rtx comparison)
 	case UNEQ      : return ARC_CC_LS;
 	default : gcc_unreachable ();
 	}
+    case CC_FPUmode:
+    case CC_FPUEmode:
+      switch (GET_CODE (comparison))
+	{
+	case EQ        : return ARC_CC_EQ;
+	case NE        : return ARC_CC_NE;
+	case GT        : return ARC_CC_GT;
+	case GE        : return ARC_CC_GE;
+	case LT        : return ARC_CC_C;
+	case LE        : return ARC_CC_LS;
+	case UNORDERED : return ARC_CC_V;
+	case ORDERED   : return ARC_CC_NV;
+	case UNGT      : return ARC_CC_HI;
+	case UNGE      : return ARC_CC_HS;
+	case UNLT      : return ARC_CC_LT;
+	case UNLE      : return ARC_CC_LE;
+	  /* UNEQ and LTGT do not have representation.  */
+	case LTGT      : /* Fall through.  */
+	case UNEQ      : /* Fall through.  */
+	default : gcc_unreachable ();
+	}
+    case CC_FPU_UNEQmode:
+      switch (GET_CODE (comparison))
+	{
+	case LTGT : return ARC_CC_NE;
+	case UNEQ : return ARC_CC_EQ;
+	default : gcc_unreachable ();
+	}
     default : gcc_unreachable ();
     }
   /*NOTREACHED*/
@@ -1009,19 +1061,48 @@  arc_select_cc_mode (enum rtx_code op, rtx x, rtx y)
 	return CC_FP_GEmode;
       default: gcc_unreachable ();
       }
-  else if (GET_MODE_CLASS (mode) == MODE_FLOAT && TARGET_OPTFPE)
+  else if (TARGET_HARD_FLOAT
+	   && ((mode == SFmode && TARGET_FP_SINGLE)
+	       || (mode == DFmode && TARGET_FP_DOUBLE)))
     switch (op)
       {
-      case EQ: case NE: return CC_Zmode;
-      case LT: case UNGE:
-      case GT: case UNLE: return CC_FP_GTmode;
-      case LE: case UNGT:
-      case GE: case UNLT: return CC_FP_GEmode;
-      case UNEQ: case LTGT: return CC_FP_UNEQmode;
-      case ORDERED: case UNORDERED: return CC_FP_ORDmode;
-      default: gcc_unreachable ();
-      }
+      case EQ:
+      case NE:
+      case UNORDERED:
+      case UNLT:
+      case UNLE:
+      case UNGT:
+      case UNGE:
+	return CC_FPUmode;
+
+      case LT:
+      case LE:
+      case GT:
+      case GE:
+      case ORDERED:
+	return CC_FPUEmode;
+
+      case LTGT:
+      case UNEQ:
+	return CC_FPU_UNEQmode;
 
+      default:
+	gcc_unreachable ();
+      }
+  else if (GET_MODE_CLASS (mode) == MODE_FLOAT && TARGET_OPTFPE)
+    {
+      switch (op)
+	{
+	case EQ: case NE: return CC_Zmode;
+	case LT: case UNGE:
+	case GT: case UNLE: return CC_FP_GTmode;
+	case LE: case UNGT:
+	case GE: case UNLT: return CC_FP_GEmode;
+	case UNEQ: case LTGT: return CC_FP_UNEQmode;
+	case ORDERED: case UNORDERED: return CC_FP_ORDmode;
+	default: gcc_unreachable ();
+	}
+    }
   return CCmode;
 }
 
@@ -1282,6 +1363,16 @@  arc_conditional_register_usage (void)
 	arc_hard_regno_mode_ok[60] = 1 << (int) S_MODE;
     }
 
+  /* ARCHS has 64-bit data-path which makes use of the even-odd paired
+     registers.  */
+  if (TARGET_HS)
+    {
+      for (regno = 1; regno < 32; regno +=2)
+	{
+	  arc_hard_regno_mode_ok[regno] = S_MODES;
+	}
+    }
+
   for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
     {
       if (i < 29)
@@ -1376,6 +1467,19 @@  arc_conditional_register_usage (void)
 
   /* pc : r63 */
   arc_regno_reg_class[PROGRAM_COUNTER_REGNO] = GENERAL_REGS;
+
+  /* ARCv2 accumulator.  */
+  if (TARGET_V2
+      && (TARGET_FP_DFUZED || TARGET_FP_SFUZED))
+  {
+    arc_regno_reg_class[ACCL_REGNO] = WRITABLE_CORE_REGS;
+    arc_regno_reg_class[ACCH_REGNO] = WRITABLE_CORE_REGS;
+    SET_HARD_REG_BIT (reg_class_contents[WRITABLE_CORE_REGS], ACCL_REGNO);
+    SET_HARD_REG_BIT (reg_class_contents[WRITABLE_CORE_REGS], ACCH_REGNO);
+    SET_HARD_REG_BIT (reg_class_contents[CHEAP_CORE_REGS], ACCL_REGNO);
+    SET_HARD_REG_BIT (reg_class_contents[CHEAP_CORE_REGS], ACCH_REGNO);
+    arc_hard_regno_mode_ok[ACC_REG_FIRST] = D_MODES;
+  }
 }
 
 /* Handle an "interrupt" attribute; arguments as in
@@ -1545,6 +1649,10 @@  gen_compare_reg (rtx comparison, machine_mode omode)
 						 gen_rtx_REG (CC_FPXmode, 61),
 						 const0_rtx)));
     }
+  else if (TARGET_HARD_FLOAT
+	   && ((cmode == SFmode && TARGET_FP_SINGLE)
+	       || (cmode == DFmode && TARGET_FP_DOUBLE)))
+    emit_insn (gen_rtx_SET (cc_reg, gen_rtx_COMPARE (mode, x, y)));
   else if (GET_MODE_CLASS (cmode) == MODE_FLOAT && TARGET_OPTFPE)
     {
       rtx op0 = gen_rtx_REG (cmode, 0);
@@ -1638,10 +1746,11 @@  arc_setup_incoming_varargs (cumulative_args_t args_so_far,
   /* We must treat `__builtin_va_alist' as an anonymous arg.  */
 
   next_cum = *get_cumulative_args (args_so_far);
-  arc_function_arg_advance (pack_cumulative_args (&next_cum), mode, type, 1);
-  first_anon_arg = next_cum;
+  arc_function_arg_advance (pack_cumulative_args (&next_cum),
+			    mode, type, true);
+  first_anon_arg = TARGET_HS ? next_cum.last_reg : next_cum.arg_num;
 
-  if (first_anon_arg < MAX_ARC_PARM_REGS)
+  if (FUNCTION_ARG_REGNO_P (first_anon_arg))
     {
       /* First anonymous (unnamed) argument is in a reg.  */
 
@@ -1662,6 +1771,15 @@  arc_setup_incoming_varargs (cumulative_args_t args_so_far,
     }
 }
 
+/* Strict argument naming is required.  */
+
+static bool
+arc_strict_argument_naming (cumulative_args_t cum ATTRIBUTE_UNUSED)
+{
+    return true;
+}
+
+
 /* Cost functions.  */
 
 /* Provide the costs of an addressing mode that contains ADDR.
@@ -4834,29 +4952,208 @@  emit_pic_move (rtx *operands, machine_mode)
 /* Since arc parm regs are contiguous.  */
 #define ARC_NEXT_ARG_REG(REGNO) ( (REGNO) + 1 )
 
-/* Implement TARGET_ARG_PARTIAL_BYTES.  */
+/* Initialize a variable CUM of type CUMULATIVE_ARGS for a call to a
+   function whose data type is FNTYPE.  For a library call FNTYPE is
+   zero.
+ */
 
+void
+arc_init_cumulative_args (CUMULATIVE_ARGS *cum,
+			  tree fntype ATTRIBUTE_UNUSED,
+			  rtx libname ATTRIBUTE_UNUSED,
+			  tree fndecl ATTRIBUTE_UNUSED,
+			  int n_named_args ATTRIBUTE_UNUSED)
+{
+  int i;
+
+  cum->arg_num = 0;
+  cum->last_reg = 0;
+  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
+    {
+      cum->avail[i] = true;
+    }
+}
+
+/* Returns the number of consecutive hard registers needed to hold
+   a value of MODE starting at REGNO.
+ */
 static int
-arc_arg_partial_bytes (cumulative_args_t cum_v, machine_mode mode,
-		       tree type, bool named ATTRIBUTE_UNUSED)
+arc_hard_regno_nregs (int regno,
+		      enum machine_mode mode,
+		      const_tree type)
 {
-  CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v);
   int bytes = (mode == BLKmode
 	       ? int_size_in_bytes (type) : (int) GET_MODE_SIZE (mode));
   int words = (bytes + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
-  int arg_num = *cum;
-  int ret;
 
-  arg_num = ROUND_ADVANCE_CUM (arg_num, mode, type);
-  ret = GPR_REST_ARG_REGS (arg_num);
+  if ((regno < FIRST_PSEUDO_REGISTER)
+      && (HARD_REGNO_MODE_OK (regno, mode)
+	  || (mode == BLKmode)))
+    return words;
+  return 0;
+}
 
-  /* ICEd at function.c:2361, and ret is copied to data->partial */
-    ret = (ret >= words ? 0 : ret * UNITS_PER_WORD);
 
-  return ret;
+/* Given a CUMULATIVE_ARGS pointer, this function returns an RTX if
+   the argument is passed in registers.  */
+
+static rtx
+arc_function_args_impl (CUMULATIVE_ARGS *cum,
+			enum machine_mode mode,
+			const_tree type,
+			bool named,
+			bool advance)
+{
+  int lb, reg_idx  = 0;
+  int reg_location = 0;
+  int nregs;
+  bool found = false;
+
+  if (mode == VOIDmode)
+    return const0_rtx;
+
+  if (!named && TARGET_HS)
+    {
+      /* For unnamed args, don't try to fill up the reg-holes.  */
+      reg_idx = cum->last_reg;
+      /* Only interested in the number of regs.  */
+      nregs = arc_hard_regno_nregs (0, mode, type);
+      if (((nregs == 2) || (nregs == 4))
+	  /* Only DI-like modes are interesting for us.  */
+	  && (mode != BLKmode)
+	  /* Only for odd registers.  */
+	  && (reg_idx & 1)
+	   /* Allow passing partial arguments.  */
+	  && FUNCTION_ARG_REGNO_P (reg_idx))
+	{
+	  rtx reg[4];
+	  rtvec vec;
+
+	  if (nregs == 2)
+	    {
+	      reg[0] = gen_rtx_REG (SImode, reg_idx);
+	      reg[1] = gen_rtx_REG (SImode, reg_idx + 1);
+	      vec = gen_rtvec (2,
+			       gen_rtx_EXPR_LIST (VOIDmode,
+						  reg[0], const0_rtx),
+			       gen_rtx_EXPR_LIST (VOIDmode,
+						  reg[1], GEN_INT (4)));
+	    }
+	  else
+	    {
+	      reg[0] = gen_rtx_REG (SImode, reg_idx);
+	      reg[1] = gen_rtx_REG (SImode, reg_idx + 1);
+	      reg[2] = gen_rtx_REG (SImode, reg_idx + 2);
+	      reg[3] = gen_rtx_REG (SImode, reg_idx + 3);
+	      vec = gen_rtvec (4,
+			       gen_rtx_EXPR_LIST (VOIDmode,
+						  reg[0], const0_rtx),
+			       gen_rtx_EXPR_LIST (VOIDmode,
+						  reg[1], GEN_INT (4)),
+			       gen_rtx_EXPR_LIST (VOIDmode,
+						  reg[2], GEN_INT (8)),
+			       gen_rtx_EXPR_LIST (VOIDmode,
+						  reg[3], GEN_INT (12)));
+	    }
+	  if (advance)
+	    {
+	      cum->arg_num += nregs;
+	      for (int i = 0; i < nregs; i++)
+		cum->avail[reg_idx + i] = false;
+	      cum->last_reg += nregs;
+	    }
+	  return gen_rtx_PARALLEL (mode, vec);
+	}
+    }
+
+  while (FUNCTION_ARG_REGNO_P (reg_idx))
+    {
+      nregs = arc_hard_regno_nregs ((arc_abi == ARC_ABI_DEFAULT) ? 0 : reg_idx,
+				    mode, type);
+      /* We cannot partially pass some modes for HS (i.e. DImode, a
+	 non-compatible mode).  As the DI registers need to be in an
+	 even-odd register pair, when it comes to partial passing of an
+	 argument, the code below will not find a suitable register
+	 pair.  This works only for even-odd register pairs
+	 (2-regs).  */
+      if (nregs)
+	{
+	  found = true;
+	  reg_location = reg_idx;
+	  while ((reg_idx < (reg_location + nregs))
+		 && FUNCTION_ARG_REGNO_P (reg_idx)
+		 && found)
+	    {
+	      found &= cum->avail[reg_idx];
+	      reg_idx++;
+	    }
+	  if (found)
+	    break;
+	}
+      else
+	{
+	  reg_idx++;
+	}
+    }
+
+  if (found)
+    {
+      if (advance)
+	{
+	  /* Update CUMULATIVE_ARGS if we advance.  */
+	  lb = (arc_abi == ARC_ABI_PACK) ? reg_location : 0;
+	  for (reg_idx = lb; (reg_idx < (reg_location + nregs))
+		 && FUNCTION_ARG_REGNO_P (reg_idx); reg_idx++)
+	    {
+	      cum->avail[reg_idx] = false;
+	    }
+	  cum->last_reg = reg_idx;
+	  cum->arg_num += nregs;
+	}
+      return gen_rtx_REG (mode, reg_location);
+    }
+
+  if (advance && named && (arc_abi != ARC_ABI_PACK))
+    {
+       /* Max out any other free register if a named argument goes on
+	  the stack.  This avoids any usage of the remaining regs for
+	  further argument passing.  */
+      cum->last_reg = MAX_ARC_PARM_REGS;
+      for (int i = 0 ; i < MAX_ARC_PARM_REGS; i++)
+	cum->avail[i] = false;
+    }
+
+  return NULL_RTX;
 }
 
+/* Implement TARGET_ARG_PARTIAL_BYTES.  */
+
+static int
+arc_arg_partial_bytes (cumulative_args_t cum_v, machine_mode mode,
+		       tree type, bool named ATTRIBUTE_UNUSED)
+{
+  CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v);
+  int ret = 0;
+  rtx retx;
+  int nregs = 0;
+
+  retx = arc_function_args_impl (cum, mode, type, named, false);
+  if (REG_P (retx)
+      || (GET_CODE (retx) == PARALLEL))
+    {
+      int regno;
+
+      if (REG_P (retx))
+	regno = REGNO (retx);
+      else
+	regno = REGNO (XEXP (XVECEXP (retx, 0, 0), 0));
 
+      nregs = arc_hard_regno_nregs (0, mode, type);
+      ret = (((regno + nregs) <= MAX_ARC_PARM_REGS) ? 0 :
+	     (MAX_ARC_PARM_REGS - regno) * UNITS_PER_WORD);
+    }
+  return ret;
+}
 
 /* This function is used to control a function argument is passed in a
    register, and which register.
@@ -4895,31 +5192,15 @@  arc_arg_partial_bytes (cumulative_args_t cum_v, machine_mode mode,
    and the rest are pushed.  */
 
 static rtx
-arc_function_arg (cumulative_args_t cum_v, machine_mode mode,
-		  const_tree type ATTRIBUTE_UNUSED, bool named ATTRIBUTE_UNUSED)
+arc_function_arg (cumulative_args_t cum_v,
+		  machine_mode mode,
+		  const_tree type,
+		  bool named)
 {
   CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v);
-  int arg_num = *cum;
   rtx ret;
-  const char *debstr ATTRIBUTE_UNUSED;
 
-  arg_num = ROUND_ADVANCE_CUM (arg_num, mode, type);
-  /* Return a marker for use in the call instruction.  */
-  if (mode == VOIDmode)
-    {
-      ret = const0_rtx;
-      debstr = "<0>";
-    }
-  else if (GPR_REST_ARG_REGS (arg_num) > 0)
-    {
-      ret = gen_rtx_REG (mode, arg_num);
-      debstr = reg_names [arg_num];
-    }
-  else
-    {
-      ret = NULL_RTX;
-      debstr = "memory";
-    }
+  ret = arc_function_args_impl (cum, mode, type, named, false);
   return ret;
 }
 
@@ -4942,20 +5223,14 @@  arc_function_arg (cumulative_args_t cum_v, machine_mode mode,
    course function_arg_partial_nregs will come into play.  */
 
 static void
-arc_function_arg_advance (cumulative_args_t cum_v, machine_mode mode,
-			  const_tree type, bool named ATTRIBUTE_UNUSED)
+arc_function_arg_advance (cumulative_args_t cum_v,
+			  machine_mode mode,
+			  const_tree type,
+			  bool named)
 {
   CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v);
-  int bytes = (mode == BLKmode
-	       ? int_size_in_bytes (type) : (int) GET_MODE_SIZE (mode));
-  int words = (bytes + UNITS_PER_WORD  - 1) / UNITS_PER_WORD;
-  int i;
-
-  if (words)
-    *cum = ROUND_ADVANCE_CUM (*cum, mode, type);
-  for (i = 0; i < words; i++)
-    *cum = ARC_NEXT_ARG_REG (*cum);
 
+  arc_function_args_impl (cum, mode, type, named, true);
 }
 
 /* Define how to find the value returned by a function.
@@ -6420,7 +6695,14 @@  arc_reorg (void)
 		      break;
 		    }
 		}
-	      if (! link_insn)
+	      if (!link_insn
+		  /* Avoid FPU instructions.  */
+		  || (GET_MODE (SET_DEST
+				(PATTERN (link_insn))) == CC_FPUmode)
+		  || (GET_MODE (SET_DEST
+				(PATTERN (link_insn))) == CC_FPU_UNEQmode)
+		  || (GET_MODE (SET_DEST
+				(PATTERN (link_insn))) == CC_FPUEmode))
 		continue;
 	      else
 		/* Check if this is a data dependency.  */
diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h
index 27665b0..0098562 100644
--- a/gcc/config/arc/arc.h
+++ b/gcc/config/arc/arc.h
@@ -28,6 +28,8 @@  along with GCC; see the file COPYING3.  If not see
 #ifndef GCC_ARC_H
 #define GCC_ARC_H
 
+#include <stdbool.h>
+
 /* Things to do:
 
    - incscc, decscc?
@@ -255,7 +257,8 @@  along with GCC; see the file COPYING3.  If not see
 #define TARGET_MIXED_CODE (TARGET_MIXED_CODE_SET)
 
 #define TARGET_SPFP (TARGET_SPFP_FAST_SET || TARGET_SPFP_COMPACT_SET)
-#define TARGET_DPFP (TARGET_DPFP_FAST_SET || TARGET_DPFP_COMPACT_SET)
+#define TARGET_DPFP (TARGET_DPFP_FAST_SET || TARGET_DPFP_COMPACT_SET	\
+		     || TARGET_FP_DPAX)
 
 #define SUBTARGET_SWITCHES
 
@@ -266,11 +269,12 @@  along with GCC; see the file COPYING3.  If not see
    default for A7, and only for pre A7 cores when -mnorm is given.  */
 #define TARGET_NORM (TARGET_ARC700 || TARGET_NORM_SET || TARGET_HS)
 /* Indicate if an optimized floating point emulation library is available.  */
-#define TARGET_OPTFPE \
- (TARGET_ARC700 \
-  /* We need a barrel shifter and NORM.  */ \
-  || (TARGET_ARC600 && TARGET_NORM_SET) \
-  || TARGET_HS)
+#define TARGET_OPTFPE				\
+   (TARGET_ARC700				\
+    /* We need a barrel shifter and NORM.  */	\
+    || (TARGET_ARC600 && TARGET_NORM_SET)	\
+    || TARGET_HS				\
+    || (TARGET_EM && TARGET_NORM_SET && TARGET_BARREL_SHIFTER))
 
 /* Non-zero means the cpu supports swap instruction.  This flag is set by
    default for A7, and only for pre A7 cores when -mswap is given.  */
@@ -713,6 +717,12 @@  enum reg_class
 #define ARC_FIRST_SIMD_DMA_CONFIG_OUT_REG  136
 #define ARC_LAST_SIMD_DMA_CONFIG_REG       143
 
+/* ARCv2 double-register accumulator.  */
+#define ACC_REG_FIRST 58
+#define ACC_REG_LAST  59
+#define ACCL_REGNO    (TARGET_BIG_ENDIAN ? ACC_REG_FIRST + 1 : ACC_REG_FIRST)
+#define ACCH_REGNO    (TARGET_BIG_ENDIAN ? ACC_REG_FIRST : ACC_REG_FIRST + 1)
+
 /* The same information, inverted:
    Return the class number of the smallest class containing
    reg number REGNO.  This could be a conditional expression
@@ -858,13 +868,22 @@  arc_return_addr_rtx(COUNT,FRAME)
    hold all necessary information about the function itself
    and about the args processed so far, enough to enable macros
    such as FUNCTION_ARG to determine where the next arg should go.  */
-#define CUMULATIVE_ARGS int
+typedef struct arc_args
+{
+  /* Registers that are still available for parameter passing.  */
+  bool avail[FIRST_PSEUDO_REGISTER];
+  /* HS-ABI: last set register.  It is needed for passing the first
+     anonymous arguments in regs.  */
+  unsigned int last_reg;
+  /* Backwards compatibility: total number of used registers.  */
+  int arg_num;
+} CUMULATIVE_ARGS;
 
 /* Initialize a variable CUM of type CUMULATIVE_ARGS
    for a call to a function whose data type is FNTYPE.
    For a library call, FNTYPE is 0.  */
 #define INIT_CUMULATIVE_ARGS(CUM,FNTYPE,LIBNAME,INDIRECT,N_NAMED_ARGS) \
-((CUM) = 0)
+  arc_init_cumulative_args (&CUM, FNTYPE, LIBNAME, INDIRECT, N_NAMED_ARGS)
 
 /* The number of registers used for parameter passing.  Local to this file.  */
 #define MAX_ARC_PARM_REGS 8
@@ -1656,12 +1675,13 @@  extern enum arc_function_type arc_compute_function_type (struct function *);
    && GET_CODE (PATTERN (X)) != CLOBBER \
    && get_attr_is_##NAME (X) == IS_##NAME##_YES) \
 
-#define REVERSE_CONDITION(CODE,MODE) \
-	(((MODE) == CC_FP_GTmode || (MODE) == CC_FP_GEmode \
-	  || (MODE) == CC_FP_UNEQmode || (MODE) == CC_FP_ORDmode \
-	  || (MODE) == CC_FPXmode) \
-	 ? reverse_condition_maybe_unordered ((CODE)) \
-	 : reverse_condition ((CODE)))
+#define REVERSE_CONDITION(CODE,MODE)				 \
+  (((MODE) == CC_FP_GTmode || (MODE) == CC_FP_GEmode		 \
+    || (MODE) == CC_FP_UNEQmode || (MODE) == CC_FP_ORDmode	 \
+    || (MODE) == CC_FPXmode || (MODE) == CC_FPU_UNEQmode	 \
+    || (MODE) == CC_FPUmode || (MODE) == CC_FPUEmode)		 \
+   ? reverse_condition_maybe_unordered ((CODE))			 \
+   : reverse_condition ((CODE)))
 
 #define ADJUST_INSN_LENGTH(X, LENGTH) \
   ((LENGTH) \
@@ -1724,4 +1744,26 @@  enum
    been written to by an ordinary instruction.  */
 #define TARGET_LP_WR_INTERLOCK (!TARGET_ARC600_FAMILY)
 
+/* FPU defines.  */
+/* Any FPU support.  */
+#define TARGET_HARD_FLOAT (arc_fpu_build != 0)
+/* Single precision floating point support.  */
+#define TARGET_FP_SINGLE  ((arc_fpu_build & FPU_SP) != 0)
+/* Double precision floating point support.  */
+#define TARGET_FP_DOUBLE  ((arc_fpu_build & FPU_DP) != 0)
+/* Single precision floating point support with fused operations.  */
+#define TARGET_FP_SFUZED  ((arc_fpu_build & FPU_SF) != 0)
+/* Double precision floating point support with fused operations.  */
+#define TARGET_FP_DFUZED  ((arc_fpu_build & FPU_DF) != 0)
+/* Single precision floating point conversion instruction support.  */
+#define TARGET_FP_SCONV   ((arc_fpu_build & FPU_SC) != 0)
+/* Double precision floating point conversion instruction support.  */
+#define TARGET_FP_DCONV   ((arc_fpu_build & FPU_DC) != 0)
+/* Single precision floating point SQRT/DIV instruction support.  */
+#define TARGET_FP_SSQRT   ((arc_fpu_build & FPU_SD) != 0)
+/* Double precision floating point SQRT/DIV instruction support.  */
+#define TARGET_FP_DSQRT   ((arc_fpu_build & FPU_DD) != 0)
+/* Double precision floating point assist instruction support.  */
+#define TARGET_FP_DPAX    ((arc_fpu_build & FPX_DP) != 0)
+
 #endif /* GCC_ARC_H */
diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
index 602cf0b..027862a 100644
--- a/gcc/config/arc/arc.md
+++ b/gcc/config/arc/arc.md
@@ -175,6 +175,7 @@ 
    (ILINK2_REGNUM 30)
    (RETURN_ADDR_REGNUM 31)
    (MUL64_OUT_REG 58)
+   (ARCV2_ACC 58)
 
    (LP_COUNT 60)
    (CC_REG 61)
@@ -201,7 +202,8 @@ 
    simd_varith_with_acc, simd_vlogic, simd_vlogic_with_acc,
    simd_vcompare, simd_vpermute, simd_vpack, simd_vpack_with_acc,
    simd_valign, simd_valign_with_acc, simd_vcontrol,
-   simd_vspecial_3cycle, simd_vspecial_4cycle, simd_dma, mul16_em, div_rem"
+   simd_vspecial_3cycle, simd_vspecial_4cycle, simd_dma, mul16_em, div_rem,
+   fpu"
   (cond [(eq_attr "is_sfunc" "yes")
 	 (cond [(match_test "!TARGET_LONG_CALLS_SET && (!TARGET_MEDIUM_CALLS || GET_CODE (PATTERN (insn)) != COND_EXEC)") (const_string "call")
 		(match_test "flag_pic") (const_string "sfunc")]
@@ -3364,7 +3366,8 @@ 
 
 })
 
-(define_mode_iterator SDF [SF DF])
+(define_mode_iterator SDF [(SF "TARGET_FP_SINGLE || TARGET_OPTFPE")
+			   (DF "TARGET_OPTFPE")])
 
 (define_expand "cstore<mode>4"
   [(set (reg:CC CC_REG)
@@ -3374,7 +3377,7 @@ 
 	(match_operator:SI 1 "comparison_operator" [(reg CC_REG)
 						    (const_int 0)]))]
 
-  "TARGET_OPTFPE"
+  "TARGET_FP_SINGLE || TARGET_OPTFPE"
 {
   gcc_assert (XEXP (operands[1], 0) == operands[2]);
   gcc_assert (XEXP (operands[1], 1) == operands[3]);
@@ -5167,12 +5170,12 @@ 
 		    (match_operand:SDF 2 "register_operand" "")))
    (set (pc)
 	(if_then_else
-	      (match_operator 0 "comparison_operator" [(reg CC_REG)
-						       (const_int 0)])
-	      (label_ref (match_operand 3 "" ""))
-	      (pc)))]
+	 (match_operator 0 "comparison_operator" [(reg CC_REG)
+						      (const_int 0)])
+	 (label_ref (match_operand 3 "" ""))
+	 (pc)))]
 
-  "TARGET_OPTFPE"
+  "TARGET_FP_SINGLE || TARGET_OPTFPE"
 {
   gcc_assert (XEXP (operands[0], 0) == operands[1]);
   gcc_assert (XEXP (operands[0], 1) == operands[2]);
@@ -5624,9 +5627,140 @@ 
   [(set_attr "length" "4")
    (set_attr "type" "misc")])
 
+
+;; FPU/FPX expands
+
+;;add
+(define_expand "addsf3"
+  [(set (match_operand:SF 0 "register_operand"           "")
+	(plus:SF (match_operand:SF 1 "nonmemory_operand" "")
+		 (match_operand:SF 2 "nonmemory_operand" "")))]
+  "TARGET_FP_SINGLE || TARGET_SPFP"
+  "")
+
+;;sub
+(define_expand "subsf3"
+  [(set (match_operand:SF 0 "register_operand"            "")
+	(minus:SF (match_operand:SF 1 "nonmemory_operand" "")
+		  (match_operand:SF 2 "nonmemory_operand" "")))]
+  "TARGET_FP_SINGLE || TARGET_SPFP"
+  "")
+
+;;mul
+(define_expand "mulsf3"
+  [(set (match_operand:SF 0 "register_operand"           "")
+	(mult:SF (match_operand:SF 1 "nonmemory_operand" "")
+		 (match_operand:SF 2 "nonmemory_operand" "")))]
+  "TARGET_FP_SINGLE || TARGET_SPFP"
+  "")
+
+;;add
+(define_expand "adddf3"
+  [(set (match_operand:DF 0 "double_register_operand"           "")
+	(plus:DF (match_operand:DF 1 "double_register_operand"  "")
+		 (match_operand:DF 2 "nonmemory_operand" "")))
+     ]
+ "TARGET_FP_DOUBLE || TARGET_DPFP"
+ "
+  if (TARGET_DPFP)
+   {
+    if (GET_CODE (operands[2]) == CONST_DOUBLE)
+     {
+        rtx high, low, tmp;
+        split_double (operands[2], &low, &high);
+        tmp = force_reg (SImode, high);
+        emit_insn(gen_adddf3_insn(operands[0], operands[1], operands[2],tmp,const0_rtx));
+     }
+    else
+     emit_insn(gen_adddf3_insn(operands[0], operands[1], operands[2],const1_rtx,const1_rtx));
+   DONE;
+  }
+ else if (TARGET_FP_DOUBLE)
+  {
+   if (!even_register_operand (operands[2], DFmode))
+      operands[2] = force_reg (DFmode, operands[2]);
+
+   if (!even_register_operand (operands[1], DFmode))
+      operands[1] = force_reg (DFmode, operands[1]);
+  }
+ else
+  gcc_unreachable ();
+ ")
+
+;;sub
+(define_expand "subdf3"
+  [(set (match_operand:DF 0 "double_register_operand"            "")
+	(minus:DF (match_operand:DF 1 "nonmemory_operand" "")
+		  (match_operand:DF 2 "nonmemory_operand" "")))]
+  "TARGET_FP_DOUBLE || TARGET_DPFP"
+  "
+   if (TARGET_DPFP)
+    {
+     if ((GET_CODE (operands[1]) == CONST_DOUBLE) || GET_CODE (operands[2]) == CONST_DOUBLE)
+      {
+        rtx high, low, tmp;
+        int const_index = ((GET_CODE (operands[1]) == CONST_DOUBLE) ? 1: 2);
+        split_double (operands[const_index], &low, &high);
+        tmp = force_reg (SImode, high);
+        if (TARGET_EM && GET_CODE (operands[1]) == CONST_DOUBLE)
+           emit_insn(gen_subdf3_insn(operands[0], operands[2], operands[1],tmp,const0_rtx));
+        else
+           emit_insn(gen_subdf3_insn(operands[0], operands[1], operands[2],tmp,const0_rtx));
+      }
+    else
+     emit_insn(gen_subdf3_insn(operands[0], operands[1], operands[2],const1_rtx,const1_rtx));
+    DONE;
+   }
+  else if (TARGET_FP_DOUBLE)
+   {
+    if (!even_register_operand (operands[2], DFmode))
+       operands[2] = force_reg (DFmode, operands[2]);
+
+    if (!even_register_operand (operands[1], DFmode))
+       operands[1] = force_reg (DFmode, operands[1]);
+   }
+  else
+   gcc_unreachable ();
+  ")
+
+;;mul
+(define_expand "muldf3"
+  [(set (match_operand:DF 0 "double_register_operand"           "")
+	(mult:DF (match_operand:DF 1 "double_register_operand"  "")
+		 (match_operand:DF 2 "nonmemory_operand" "")))]
+  "TARGET_FP_DOUBLE || TARGET_DPFP"
+  "
+   if (TARGET_DPFP)
+    {
+     if (GET_CODE (operands[2]) == CONST_DOUBLE)
+      {
+        rtx high, low, tmp;
+        split_double (operands[2], &low, &high);
+        tmp = force_reg (SImode, high);
+        emit_insn(gen_muldf3_insn(operands[0], operands[1], operands[2],tmp,const0_rtx));
+      }
+     else
+      emit_insn(gen_muldf3_insn(operands[0], operands[1], operands[2],const1_rtx,const1_rtx));
+    DONE;
+   }
+  else if (TARGET_FP_DOUBLE)
+   {
+    if (!even_register_operand (operands[2], DFmode))
+       operands[2] = force_reg (DFmode, operands[2]);
+
+    if (!even_register_operand (operands[1], DFmode))
+       operands[1] = force_reg (DFmode, operands[1]);
+   }
+  else
+   gcc_unreachable ();
+ ")
+
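The expanders above call `split_double` to break a `CONST_DOUBLE` into
two 32-bit words so the high half can be forced into a register.  A
minimal stand-alone C sketch of that word split, assuming IEEE-754
doubles (`split_double_words` is a hypothetical stand-in for GCC's
`split_double`, which operates on rtx objects):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for GCC's split_double: break the 64-bit
   image of an IEEE double into its low and high 32-bit words.  */
void split_double_words (double d, uint32_t *low, uint32_t *high)
{
  uint64_t bits;
  memcpy (&bits, &d, sizeof bits);   /* type-pun without UB */
  *low  = (uint32_t) bits;           /* least significant word */
  *high = (uint32_t) (bits >> 32);   /* sign/exponent word      */
}
```

In the expander only the high word needs a register (`force_reg
(SImode, high)`); the low word travels as a long immediate.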
 ;; include the arc-FPX instructions
 (include "fpx.md")
 
+;; include the arc-FPU instructions
+(include "fpu.md")
+
 (include "simdext.md")
 
 ;; include atomic extensions
diff --git a/gcc/config/arc/arc.opt b/gcc/config/arc/arc.opt
index 00b98d5..080acd6 100644
--- a/gcc/config/arc/arc.opt
+++ b/gcc/config/arc/arc.opt
@@ -413,3 +413,59 @@  Enable atomic instructions.
 mll64
 Target Report Mask(LL64)
 Enable double load/store instructions for ARC HS.
+
+mfpu=
+Target RejectNegative Joined Enum(arc_fpu) Var(arc_fpu_build) Init(0)
+Specify the name of the target floating point configuration.
+
+Enum
+Name(arc_fpu) Type(int)
+
+EnumValue
+Enum(arc_fpu) String(fpus) Value(FPU_SP | FPU_SC)
+
+EnumValue
+Enum(arc_fpu) String(fpud) Value(FPU_SP | FPU_SC | FPU_DP | FPU_DC)
+
+EnumValue
+Enum(arc_fpu) String(fpuda) Value(FPU_SP | FPU_SC | FPX_DP)
+
+EnumValue
+Enum(arc_fpu) String(fpuda_div) Value(FPU_SP | FPU_SC | FPU_SD | FPX_DP)
+
+EnumValue
+Enum(arc_fpu) String(fpuda_fma) Value(FPU_SP | FPU_SC | FPU_SF | FPX_DP)
+
+EnumValue
+Enum(arc_fpu) String(fpuda_all) Value(FPU_SP | FPU_SC | FPU_SF | FPU_SD | FPX_DP)
+
+EnumValue
+Enum(arc_fpu) String(fpus_div) Value(FPU_SP | FPU_SC | FPU_SD)
+
+EnumValue
+Enum(arc_fpu) String(fpud_div) Value(FPU_SP | FPU_SC | FPU_SD | FPU_DP | FPU_DC | FPU_DD)
+
+EnumValue
+Enum(arc_fpu) String(fpus_fma) Value(FPU_SP | FPU_SC | FPU_SF)
+
+EnumValue
+Enum(arc_fpu) String(fpud_fma) Value(FPU_SP | FPU_SC | FPU_SF | FPU_DP | FPU_DC | FPU_DF)
+
+EnumValue
+Enum(arc_fpu) String(fpus_all) Value(FPU_SP | FPU_SC | FPU_SF | FPU_SD)
+
+EnumValue
+Enum(arc_fpu) String(fpud_all) Value(FPU_SP | FPU_SC | FPU_SF | FPU_SD | FPU_DP | FPU_DC | FPU_DF | FPU_DD)
+
+mabi=
+Target RejectNegative Joined Var(arc_abi) Enum(arc_abi_type) Init(ARC_ABI_DEFAULT)
+Specify the desired ABI used for ARC HS processors. Variants can be default or mwabi.
+
+Enum
+Name(arc_abi_type) Type(enum arc_abi_type)
+
+EnumValue
+Enum(arc_abi_type) String(default) Value(ARC_ABI_DEFAULT)
+
+EnumValue
+Enum(arc_abi_type) String(optimized) Value(ARC_ABI_OPTIMIZED)
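Each `-mfpu=` value above is the OR of individual capability bits.  A
C sketch of the composition, with illustrative bit assignments — the
real `FPU_*`/`FPX_DP` constants are defined in arc-opts.h and may use
different values:

```c
/* Illustrative bit assignments; the real values live in arc-opts.h.  */
enum
{
  FPU_SP = 1 << 0,   /* single-precision base ops        */
  FPU_SF = 1 << 1,   /* single-precision fused mul-add   */
  FPU_SC = 1 << 2,   /* single-precision conversions     */
  FPU_SD = 1 << 3,   /* single-precision div/sqrt        */
  FPU_DP = 1 << 4,   /* double-precision base ops        */
  FPU_DF = 1 << 5,   /* double-precision fused mul-add   */
  FPU_DC = 1 << 6,   /* double-precision conversions     */
  FPU_DD = 1 << 7,   /* double-precision div/sqrt        */
  FPX_DP = 1 << 8    /* FPX double-precision assist      */
};

/* -mfpu=fpus and -mfpu=fpud_all as masks, mirroring arc.opt.  */
const int fpus = FPU_SP | FPU_SC;
const int fpud_all = FPU_SP | FPU_SC | FPU_SF | FPU_SD
                     | FPU_DP | FPU_DC | FPU_DF | FPU_DD;
```

This is why every `fpud*` variant implies its `fpus*` counterpart:
the double-precision masks are strict supersets of the single-precision
ones.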
diff --git a/gcc/config/arc/fpu.md b/gcc/config/arc/fpu.md
new file mode 100644
index 0000000..0f156c5
--- /dev/null
+++ b/gcc/config/arc/fpu.md
@@ -0,0 +1,580 @@ 
+;; ::::::::::::::::::::
+;; ::
+;; :: 32-bit floating point arithmetic
+;; ::
+;; ::::::::::::::::::::
+
+;; Addition
+(define_insn "*addsf3_fpu"
+  [(set (match_operand:SF 0 "register_operand"          "=r,r,r,r,r")
+	(plus:SF (match_operand:SF 1 "nonmemory_operand" "0,r,0,r,F")
+		 (match_operand:SF 2 "nonmemory_operand" "r,r,F,F,r")))]
+  "TARGET_FP_SINGLE"
+  "fsadd%? %0,%1,%2"
+  [(set_attr "length" "4,4,8,8,8")
+   (set_attr "iscompact" "false")
+   (set_attr "type" "fpu")
+   (set_attr "predicable" "yes,no,yes,no,no")
+   (set_attr "cond" "canuse,nocond,canuse_limm,nocond,nocond")
+   ])
+
+;; Subtraction
+(define_insn "*subsf3_fpu"
+  [(set (match_operand:SF 0 "register_operand"           "=r,r,r,r,r")
+	(minus:SF (match_operand:SF 1 "nonmemory_operand" "0,r,0,r,F")
+		  (match_operand:SF 2 "nonmemory_operand" "r,r,F,F,r")))]
+  "TARGET_FP_SINGLE"
+  "fssub%? %0,%1,%2"
+  [(set_attr "length" "4,4,8,8,8")
+   (set_attr "iscompact" "false")
+   (set_attr "type" "fpu")
+   (set_attr "predicable" "yes,no,yes,no,no")
+   (set_attr "cond" "canuse,nocond,canuse_limm,nocond,nocond")
+   ])
+
+;; Multiplication
+(define_insn "*mulsf3_fpu"
+  [(set (match_operand:SF 0 "register_operand"          "=r,r,r,r,r")
+	(mult:SF (match_operand:SF 1 "nonmemory_operand" "0,r,0,r,F")
+		 (match_operand:SF 2 "nonmemory_operand" "r,r,F,F,r")))]
+  "TARGET_FP_SINGLE"
+  "fsmul%? %0,%1,%2"
+  [(set_attr "length" "4,4,8,8,8")
+   (set_attr "iscompact" "false")
+   (set_attr "type" "fpu")
+   (set_attr "predicable" "yes,no,yes,no,no")
+   (set_attr "cond" "canuse,nocond,canuse_limm,nocond,nocond")
+   ])
+
+;; Multiplication with addition/subtraction
+(define_expand "fmasf4"
+  [(set (match_operand:SF 0 "register_operand" "")
+	(fma:SF (match_operand:SF 1 "nonmemory_operand" "")
+		(match_operand:SF 2 "nonmemory_operand" "")
+		(match_operand:SF 3 "nonmemory_operand" "")))]
+  "TARGET_FP_SFUZED"
+  "{
+   rtx tmp;
+   tmp = gen_rtx_REG (SFmode, ACCL_REGNO);
+   emit_move_insn (tmp, operands[3]);
+   operands[3] = tmp;
+   }")
+
+(define_expand "fnmasf4"
+  [(set (match_operand:SF 0 "register_operand" "")
+	(fma:SF (neg:SF (match_operand:SF 1 "nonmemory_operand" ""))
+		(match_operand:SF 2 "nonmemory_operand"         "")
+		(match_operand:SF 3 "nonmemory_operand"         "")))]
+  "TARGET_FP_SFUZED"
+  "{
+   rtx tmp;
+   tmp = gen_rtx_REG (SFmode, ACCL_REGNO);
+   emit_move_insn (tmp, operands[3]);
+   operands[3] = tmp;
+}")
+
+(define_insn "fmasf4_fpu"
+  [(set (match_operand:SF 0 "register_operand"         "=r,r,r,r,r")
+	(fma:SF (match_operand:SF 1 "nonmemory_operand" "0,r,0,r,F")
+		(match_operand:SF 2 "nonmemory_operand" "r,r,F,F,r")
+		(match_operand:SF 3 "mlo_operand" "")))]
+  "TARGET_FP_SFUZED"
+  "fsmadd%? %0,%1,%2"
+  [(set_attr "length" "4,4,8,8,8")
+   (set_attr "predicable" "yes,no,yes,no,no")
+   (set_attr "cond" "canuse,nocond,canuse_limm,nocond,nocond")
+   (set_attr "iscompact" "false")
+   (set_attr "type" "fpu")])
+
+(define_insn "fnmasf4_fpu"
+  [(set (match_operand:SF 0 "register_operand"                 "=r,r,r,r,r")
+	(fma:SF (neg:SF (match_operand:SF 1 "nonmemory_operand" "0,r,0,r,F"))
+		(match_operand:SF 2 "nonmemory_operand"         "r,r,F,F,r")
+		(match_operand:SF 3 "mlo_operand" "")))]
+  "TARGET_FP_SFUZED"
+  "fsmsub%? %0,%1,%2"
+  [(set_attr "length" "4,4,8,8,8")
+   (set_attr "predicable" "yes,no,yes,no,no")
+   (set_attr "cond" "canuse,nocond,canuse_limm,nocond,nocond")
+   (set_attr "iscompact" "false")
+   (set_attr "type" "fpu")])
+
+(define_expand "fmadf4"
+  [(match_operand:DF 0 "even_register_operand" "")
+   (match_operand:DF 1 "even_register_operand" "")
+   (match_operand:DF 2 "even_register_operand" "")
+   (match_operand:DF 3 "even_register_operand" "")]
+  "TARGET_FP_DFUZED"
+  "{
+   emit_insn (gen_fmadf4_split (operands[0], operands[1], operands[2], operands[3]));
+   DONE;
+   }")
+
+(define_insn_and_split "fmadf4_split"
+  [(set (match_operand:DF 0 "even_register_operand"        "")
+	(fma:DF (match_operand:DF 1 "even_register_operand" "")
+		(match_operand:DF 2 "even_register_operand" "")
+		(match_operand:DF 3 "even_register_operand" "")))
+   (clobber (reg:DF ARCV2_ACC))]
+  "TARGET_FP_DFUZED"
+  "#"
+  "TARGET_FP_DFUZED"
+  [(const_int 0)]
+  "{
+   rtx acc_reg = gen_rtx_REG (DFmode, ACC_REG_FIRST);
+   emit_move_insn (acc_reg, operands[3]);
+   emit_insn (gen_fmadf4_fpu (operands[0], operands[1], operands[2]));
+   DONE;
+  }"
+)
+
+(define_expand "fnmadf4"
+  [(match_operand:DF 0 "even_register_operand" "")
+   (match_operand:DF 1 "even_register_operand" "")
+   (match_operand:DF 2 "even_register_operand" "")
+   (match_operand:DF 3 "even_register_operand" "")]
+  "TARGET_FP_DFUZED"
+  "{
+   emit_insn (gen_fnmadf4_split (operands[0], operands[1], operands[2], operands[3]));
+   DONE;
+   }")
+
+(define_insn_and_split "fnmadf4_split"
+  [(set (match_operand:DF 0 "even_register_operand"                 "")
+	(fma:DF (neg:DF (match_operand:DF 1 "even_register_operand" ""))
+		(match_operand:DF 2 "even_register_operand"         "")
+		(match_operand:DF 3 "even_register_operand"         "")))
+   (clobber (reg:DF ARCV2_ACC))]
+  "TARGET_FP_DFUZED"
+  "#"
+  "TARGET_FP_DFUZED"
+  [(const_int 0)]
+  "{
+   rtx acc_reg = gen_rtx_REG (DFmode, ACC_REG_FIRST);
+   emit_move_insn (acc_reg, operands[3]);
+   emit_insn (gen_fnmadf4_fpu (operands[0], operands[1], operands[2]));
+   DONE;
+  }")
+
+(define_insn "fmadf4_fpu"
+  [(set (match_operand:DF 0 "even_register_operand"        "=r,r")
+	(fma:DF (match_operand:DF 1 "even_register_operand" "0,r")
+		(match_operand:DF 2 "even_register_operand" "r,r")
+		(reg:DF ARCV2_ACC)))]
+  "TARGET_FP_DFUZED"
+  "fdmadd%? %0,%1,%2"
+  [(set_attr "length" "4,4")
+   (set_attr "predicable" "yes,no")
+   (set_attr "cond" "canuse,nocond")
+   (set_attr "iscompact" "false")
+   (set_attr "type" "fpu")])
+
+(define_insn "fnmadf4_fpu"
+  [(set (match_operand:DF 0 "even_register_operand"                "=r,r")
+	(fma:DF (neg:DF (match_operand:DF 1 "even_register_operand" "0,r"))
+		(match_operand:DF 2 "even_register_operand"         "r,r")
+		(reg:DF ARCV2_ACC)))]
+  "TARGET_FP_DFUZED"
+  "fdmsub%? %0,%1,%2"
+  [(set_attr "length" "4,4")
+   (set_attr "predicable" "yes,no")
+   (set_attr "cond" "canuse,nocond")
+   (set_attr "iscompact" "false")
+   (set_attr "type" "fpu")])
+
+;; Division
+(define_insn "divsf3"
+  [(set (match_operand:SF 0 "register_operand"         "=r,r,r,r,r")
+	(div:SF (match_operand:SF 1 "nonmemory_operand" "0,r,0,r,F")
+		(match_operand:SF 2 "nonmemory_operand" "r,r,F,F,r")))]
+  "TARGET_FP_SSQRT"
+  "fsdiv%? %0,%1,%2"
+  [(set_attr "length" "4,4,8,8,8")
+   (set_attr "iscompact" "false")
+   (set_attr "type" "fpu")
+   (set_attr "predicable" "yes,no,yes,no,no")
+   (set_attr "cond" "canuse,nocond,canuse_limm,nocond,nocond")
+   ])
+
+;; Negation
+;; see pattern in arc.md
+
+;; Absolute value
+;; see pattern in arc.md
+
+;; Square root
+(define_insn "sqrtsf2"
+  [(set (match_operand:SF 0 "register_operand"           "=r,r")
+	(sqrt:SF (match_operand:SF 1 "nonmemory_operand"  "r,F")))]
+  "TARGET_FP_SSQRT"
+  "fssqrt %0,%1"
+  [(set_attr "length" "4,8")
+   (set_attr "type" "fpu")])
+
+;; Comparison
+(define_insn "*cmpsf_fpu"
+  [(set (reg:CC_FPU CC_REG)
+	(compare:CC_FPU (match_operand:SF 0 "register_operand"  "r,r")
+			(match_operand:SF 1 "nonmemory_operand" "r,F")))]
+  "TARGET_FP_SINGLE"
+  "fscmp%? %0, %1"
+  [(set_attr "length" "4,8")
+   (set_attr "iscompact" "false")
+   (set_attr "cond" "set")
+   (set_attr "type" "fpu")
+   (set_attr "predicable" "yes,yes")])
+
+(define_insn "*cmpsf_trap_fpu"
+  [(set (reg:CC_FPUE CC_REG)
+	(compare:CC_FPUE (match_operand:SF 0 "register_operand"  "r,r")
+			 (match_operand:SF 1 "nonmemory_operand" "r,F")))]
+  "TARGET_FP_SINGLE"
+  "fscmpf%? %0, %1"
+  [(set_attr "length" "4,8")
+   (set_attr "iscompact" "false")
+   (set_attr "cond" "set")
+   (set_attr "type" "fpu")
+   (set_attr "predicable" "yes,yes")])
+
+(define_insn "*cmpsf_fpu_uneq"
+  [(set (reg:CC_FPU_UNEQ CC_REG)
+	(compare:CC_FPU_UNEQ
+	 (match_operand:SF 0 "register_operand"  "r,r")
+	 (match_operand:SF 1 "nonmemory_operand" "r,F")))]
+  "TARGET_FP_SINGLE"
+  "fscmp %0, %1\\n\\tmov.v.f 0,0\\t;set Z flag"
+  [(set_attr "length" "8,16")
+   (set_attr "iscompact" "false")
+   (set_attr "cond" "set")
+   (set_attr "type" "fpu")])
+
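The `fscmp` + `mov.v.f` sequence above exists because UNEQ and LTGT
cannot be read directly from the flags a single compare sets.  In C
terms (assuming IEEE semantics; these helper names are purely
illustrative) the two predicates are:

```c
#include <math.h>
#include <stdbool.h>

/* UNEQ is true when the operands are unordered (a NaN is present)
   or compare equal; LTGT is its ordered complement.  */
bool uneq (float a, float b)
{
  return isunordered (a, b) || a == b;
}

bool ltgt (float a, float b)
{
  return !isunordered (a, b) && a != b;
}
```

The `mov.v.f 0,0` fixup folds the overflow (unordered) flag into Z so
a single branch condition can test the combined predicate.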
+;; ::::::::::::::::::::
+;; ::
+;; :: 64-bit floating point arithmetic
+;; ::
+;; ::::::::::::::::::::
+
+;; Addition
+(define_insn "*adddf3_fpu"
+  [(set (match_operand:DF 0 "even_register_operand"          "=r,r")
+	(plus:DF (match_operand:DF 1 "even_register_operand"  "0,r")
+		 (match_operand:DF 2 "even_register_operand"  "r,r")))]
+  "TARGET_FP_DOUBLE"
+  "fdadd%? %0,%1,%2"
+  [(set_attr "length" "4,4")
+   (set_attr "iscompact" "false")
+   (set_attr "type" "fpu")
+   (set_attr "predicable" "yes,no")
+   (set_attr "cond" "canuse,nocond")
+   ])
+
+
+;; Subtraction
+(define_insn "*subdf3_fpu"
+  [(set (match_operand:DF 0 "even_register_operand"           "=r,r")
+	(minus:DF (match_operand:DF 1 "even_register_operand"  "0,r")
+		  (match_operand:DF 2 "even_register_operand"  "r,r")))]
+  "TARGET_FP_DOUBLE"
+  "fdsub%? %0,%1,%2"
+  [(set_attr "length" "4,4")
+   (set_attr "iscompact" "false")
+   (set_attr "type" "fpu")
+   (set_attr "predicable" "yes,no")
+   (set_attr "cond" "canuse,nocond")
+   ])
+
+;; Multiplication
+(define_insn "*muldf3_fpu"
+  [(set (match_operand:DF 0 "even_register_operand"          "=r,r")
+	(mult:DF (match_operand:DF 1 "even_register_operand"  "0,r")
+		 (match_operand:DF 2 "even_register_operand"  "r,r")))]
+  "TARGET_FP_DOUBLE"
+  "fdmul%? %0,%1,%2"
+  [(set_attr "length" "4,4")
+   (set_attr "iscompact" "false")
+   (set_attr "type" "fpu")
+   (set_attr "predicable" "yes,no")
+   (set_attr "cond" "canuse,nocond")
+   ])
+
+;; Division
+(define_insn "divdf3"
+  [(set (match_operand:DF 0 "even_register_operand"         "=r,r")
+	(div:DF (match_operand:DF 1 "even_register_operand"  "0,r")
+		(match_operand:DF 2 "even_register_operand"  "r,r")))]
+  "TARGET_FP_DSQRT"
+  "fddiv%? %0,%1,%2"
+  [(set_attr "length" "4,4")
+   (set_attr "iscompact" "false")
+   (set_attr "type" "fpu")
+   (set_attr "predicable" "yes,no")
+   (set_attr "cond" "canuse,nocond")
+   ])
+
+;; Square root
+(define_insn "sqrtdf2"
+  [(set (match_operand:DF 0 "even_register_operand"          "=r")
+	(sqrt:DF (match_operand:DF 1 "even_register_operand"  "r")))]
+  "TARGET_FP_DSQRT"
+  "fdsqrt %0,%1"
+  [(set_attr "length" "4")
+   (set_attr "type" "fpu")])
+
+;; Comparison
+(define_insn "*cmpdf_fpu"
+  [(set (reg:CC_FPU CC_REG)
+	(compare:CC_FPU (match_operand:DF 0 "even_register_operand"  "r")
+			(match_operand:DF 1 "even_register_operand"  "r")))]
+  "TARGET_FP_DOUBLE"
+  "fdcmp%? %0, %1"
+  [(set_attr "length" "4")
+   (set_attr "iscompact" "false")
+   (set_attr "cond" "set")
+   (set_attr "type" "fpu")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*cmpdf_trap_fpu"
+  [(set (reg:CC_FPUE CC_REG)
+	(compare:CC_FPUE (match_operand:DF 0 "even_register_operand"  "r")
+			 (match_operand:DF 1 "even_register_operand"  "r")))]
+  "TARGET_FP_DOUBLE"
+  "fdcmpf%? %0, %1"
+  [(set_attr "length" "4")
+   (set_attr "iscompact" "false")
+   (set_attr "cond" "set")
+   (set_attr "type" "fpu")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*cmpdf_fpu_uneq"
+  [(set (reg:CC_FPU_UNEQ CC_REG)
+	(compare:CC_FPU_UNEQ
+	 (match_operand:DF 0 "even_register_operand"  "r")
+	 (match_operand:DF 1 "even_register_operand"  "r")))]
+  "TARGET_FP_DOUBLE"
+  "fdcmp %0, %1\\n\\tmov.v.f 0,0\\t;set Z flag"
+  [(set_attr "length" "8")
+   (set_attr "iscompact" "false")
+   (set_attr "cond" "set")
+   (set_attr "type" "fpu")])
+
+;; ::::::::::::::::::::
+;; ::
+;; :: Conversion routines
+;; ::
+;; ::::::::::::::::::::
+
+;; SF->DF
+(define_insn "extendsfdf2"
+  [(set (match_operand:DF 0 "even_register_operand"             "=r,r")
+	(float_extend:DF (match_operand:SF 1 "register_operand"  "0,r")))]
+  "TARGET_FP_DCONV"
+  "fcvt32_64%? %0,%1,0x04\\t;fs2d %0,%1"
+  [(set_attr "length" "4,4")
+   (set_attr "iscompact" "false")
+   (set_attr "type" "fpu")
+   (set_attr "predicable" "yes,no")]
+)
+
+;; SI->DF
+(define_insn "floatsidf2"
+  [(set (match_operand:DF 0 "even_register_operand"      "=r,r")
+	(float:DF (match_operand:SI 1 "register_operand"  "0,r")))]
+  "TARGET_FP_DCONV"
+  "fcvt32_64%? %0,%1,0x02\\t;fint2d %0,%1"
+  [(set_attr "length" "4,4")
+   (set_attr "iscompact" "false")
+   (set_attr "type" "fpu")
+   (set_attr "predicable" "yes,no")]
+)
+
+;; uSI->DF
+(define_insn "floatunssidf2"
+  [(set (match_operand:DF 0 "even_register_operand"               "=r,r")
+	(unsigned_float:DF (match_operand:SI 1 "register_operand"  "0,r")))]
+  "TARGET_FP_DCONV"
+  "fcvt32_64%? %0,%1,0x00\\t;fuint2d %0,%1"
+  [(set_attr "length" "4,4")
+   (set_attr "iscompact" "false")
+   (set_attr "type" "fpu")
+   (set_attr "predicable" "yes,no")]
+)
+
+;; SF->uDI (using rounding towards zero)
+(define_insn "fixuns_truncsfdi2"
+  [(set (match_operand:DI 0 "even_register_operand"                    "=r,r")
+	(unsigned_fix:DI (fix:SF (match_operand:SF 1 "register_operand" "0,r"))))]
+  "TARGET_FP_DCONV"
+  "fcvt32_64%? %0,%1,0x09\\t;fs2ul_rz %0,%1"
+  [(set_attr "length" "4,4")
+   (set_attr "iscompact" "false")
+   (set_attr "type" "fpu")
+   (set_attr "predicable" "yes,no")]
+)
+
+;; SF->DI (using rounding towards zero)
+(define_insn "fix_truncsfdi2"
+  [(set (match_operand:DI 0 "even_register_operand"           "=r,r")
+	(fix:DI (fix:SF (match_operand:SF 1 "register_operand" "0,r"))))]
+  "TARGET_FP_DCONV"
+  "fcvt32_64%? %0,%1,0x0B\\t;fs2l_rz %0,%1"
+  [(set_attr "length" "4,4")
+   (set_attr "iscompact" "false")
+   (set_attr "type" "fpu")
+   (set_attr "predicable" "yes,no")]
+)
+
+;; SI->SF
+(define_insn "floatsisf2"
+  [(set (match_operand:SF 0 "register_operand"           "=r,r")
+	(float:SF (match_operand:SI 1 "register_operand"  "0,r")))]
+  "TARGET_FP_SCONV"
+  "fcvt32%? %0,%1,0x02\\t;fint2s %0,%1"
+  [(set_attr "length" "4,4")
+   (set_attr "iscompact" "false")
+   (set_attr "type" "fpu")
+   (set_attr "predicable" "yes,no")]
+)
+
+;; uSI->SF
+(define_insn "floatunssisf2"
+  [(set (match_operand:SF 0 "register_operand"                    "=r,r")
+	(unsigned_float:SF (match_operand:SI 1 "register_operand"  "0,r")))]
+  "TARGET_FP_SCONV"
+  "fcvt32%? %0,%1,0x00\\t;fuint2s %0,%1"
+  [(set_attr "length" "4,4")
+   (set_attr "iscompact" "false")
+   (set_attr "type" "fpu")
+   (set_attr "predicable" "yes,no")]
+)
+
+;; SF->uSI (using rounding towards zero)
+(define_insn "fixuns_truncsfsi2"
+  [(set (match_operand:SI 0 "register_operand"                         "=r,r")
+	(unsigned_fix:SI (fix:SF (match_operand:SF 1 "register_operand" "0,r"))))]
+  "TARGET_FP_SCONV"
+  "fcvt32%? %0,%1,0x09\\t;fs2uint_rz %0,%1"
+  [(set_attr "length" "4,4")
+   (set_attr "iscompact" "false")
+   (set_attr "type" "fpu")
+   (set_attr "predicable" "yes,no")]
+)
+
+;; SF->SI (using rounding towards zero)
+(define_insn "fix_truncsfsi2"
+  [(set (match_operand:SI 0 "register_operand"                "=r,r")
+	(fix:SI (fix:SF (match_operand:SF 1 "register_operand" "0,r"))))]
+  "TARGET_FP_SCONV"
+  "fcvt32%? %0,%1,0x0B\\t;fs2int_rz %0,%1"
+  [(set_attr "length" "4,4")
+   (set_attr "iscompact" "false")
+   (set_attr "type" "fpu")
+   (set_attr "predicable" "yes,no")]
+)
+
+;; DI->DF
+(define_insn "floatdidf2"
+  [(set (match_operand:DF 0 "even_register_operand"          "=r,r")
+	(float:DF (match_operand:DI 1 "even_register_operand" "0,r")))]
+  "TARGET_FP_DCONV"
+  "fcvt64%? %0,%1,0x02\\t;fl2d %0,%1"
+  [(set_attr "length" "4,4")
+   (set_attr "iscompact" "false")
+   (set_attr "type" "fpu")
+   (set_attr "predicable" "yes,no")]
+)
+
+;; uDI->DF
+(define_insn "floatunsdidf2"
+  [(set (match_operand:DF 0 "even_register_operand"                   "=r,r")
+	(unsigned_float:DF (match_operand:DI 1 "even_register_operand" "0,r")))]
+  "TARGET_FP_DCONV"
+  "fcvt64%? %0,%1,0x00\\t;ful2d %0,%1"
+  [(set_attr "length" "4,4")
+   (set_attr "iscompact" "false")
+   (set_attr "type" "fpu")
+   (set_attr "predicable" "yes,no")]
+)
+
+;; DF->uDI (using rounding towards zero)
+(define_insn "fixuns_truncdfdi2"
+  [(set (match_operand:DI 0 "even_register_operand"                         "=r,r")
+	(unsigned_fix:DI (fix:DF (match_operand:DF 1 "even_register_operand" "0,r"))))]
+  "TARGET_FP_DCONV"
+  "fcvt64%? %0,%1,0x09\\t;fd2ul_rz %0,%1"
+  [(set_attr "length" "4,4")
+   (set_attr "iscompact" "false")
+   (set_attr "type" "fpu")
+   (set_attr "predicable" "yes,no")]
+)
+
+;; DF->DI (using rounding towards zero)
+(define_insn "fix_truncdfdi2"
+  [(set (match_operand:DI 0 "even_register_operand"                "=r,r")
+	(fix:DI (fix:DF (match_operand:DF 1 "even_register_operand" "0,r"))))]
+  "TARGET_FP_DCONV"
+  "fcvt64%? %0,%1,0x0B\\t;fd2l_rz %0,%1"
+  [(set_attr "length" "4,4")
+   (set_attr "iscompact" "false")
+   (set_attr "type" "fpu")
+   (set_attr "predicable" "yes,no")]
+)
+
+;; DF->SF
+(define_insn "truncdfsf2"
+  [(set (match_operand:SF 0 "register_operand"                        "=r,r")
+	(float_truncate:SF (match_operand:DF 1 "even_register_operand" "0,r")))]
+  "TARGET_FP_DCONV"
+  "fcvt64_32%? %0,%1,0x04\\t;fd2s %0,%1"
+  [(set_attr "length" "4,4")
+   (set_attr "iscompact" "false")
+   (set_attr "type" "fpu")
+   (set_attr "predicable" "yes,no")]
+)
+
+;; DI->SF
+(define_insn "floatdisf2"
+  [(set (match_operand:SF 0 "register_operand"               "=r,r")
+	(float:SF (match_operand:DI 1 "even_register_operand" "0,r")))]
+  "TARGET_FP_DCONV"
+  "fcvt64_32%? %0,%1,0x02\\t;fl2s %0,%1"
+  [(set_attr "length" "4,4")
+   (set_attr "iscompact" "false")
+   (set_attr "type" "fpu")
+   (set_attr "predicable" "yes,no")]
+)
+
+;; uDI->SF
+(define_insn "floatunsdisf2"
+  [(set (match_operand:SF 0 "register_operand"                        "=r,r")
+	(unsigned_float:SF (match_operand:DI 1 "even_register_operand" "0,r")))]
+  "TARGET_FP_DCONV"
+  "fcvt64_32%? %0,%1,0x00\\t;ful2s %0,%1"
+  [(set_attr "length" "4,4")
+   (set_attr "iscompact" "false")
+   (set_attr "type" "fpu")
+   (set_attr "predicable" "yes,no")]
+)
+
+;; DF->uSI (using rounding towards zero)
+(define_insn "fixuns_truncdfsi2"
+  [(set (match_operand:SI 0 "register_operand"                              "=r,r")
+	(unsigned_fix:SI (fix:DF (match_operand:DF 1 "even_register_operand" "0,r"))))]
+  "TARGET_FP_DCONV"
+  "fcvt64_32%? %0,%1,0x09\\t;fd2uint_rz %0,%1"
+  [(set_attr "length" "4,4")
+   (set_attr "iscompact" "false")
+   (set_attr "type" "fpu")
+   (set_attr "predicable" "yes,no")]
+)
+
+;; DF->SI (using rounding towards zero)
+(define_insn "fix_truncdfsi2"
+  [(set (match_operand:SI 0 "register_operand"                     "=r,r")
+	(fix:SI (fix:DF (match_operand:DF 1 "even_register_operand" "0,r"))))]
+  "TARGET_FP_DCONV"
+  "fcvt64_32%? %0,%1,0x0B\\t;fd2int_rz %0,%1"
+  [(set_attr "length" "4,4")
+   (set_attr "iscompact" "false")
+   (set_attr "type" "fpu")
+   (set_attr "predicable" "yes,no")]
+)
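All the `fix_trunc*` patterns above select the `_rz` (round-toward-zero)
conversion encodings, which match C's float-to-integer conversion
semantics exactly.  A sketch — the helper names echo the assembly
mnemonics in the comments above and are hypothetical:

```c
/* C's float->integer conversion truncates toward zero, matching the
   fcvt*,0x09/0x0B (_rz) encodings used by the fix_trunc* patterns.  */
int fs2int_rz (float x)
{
  return (int) x;
}

long long fd2l_rz (double x)
{
  return (long long) x;
}
```

Truncation discards the fraction regardless of sign, so negative
inputs round up toward zero rather than down.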
diff --git a/gcc/config/arc/fpx.md b/gcc/config/arc/fpx.md
index a4ecc4a..b790600 100644
--- a/gcc/config/arc/fpx.md
+++ b/gcc/config/arc/fpx.md
@@ -50,7 +50,7 @@ 
 
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 
-(define_insn "addsf3"
+(define_insn "*addsf3_fpx"
   [(set (match_operand:SF 0 "register_operand"          "=r,r,r,r,r ")
 	(plus:SF (match_operand:SF 1 "nonmemory_operand" "0,r,GCal,r,0")
 		 (match_operand:SF 2 "nonmemory_operand" "I,rL,r,GCal,LrCal")))]
@@ -65,7 +65,7 @@ 
   [(set_attr "type" "spfp")
   (set_attr "length" "4,4,8,8,8")])
 
-(define_insn "subsf3"
+(define_insn "*subsf3_fpx"
   [(set (match_operand:SF 0 "register_operand"          "=r,r,r,r,r ")
 	(minus:SF (match_operand:SF 1 "nonmemory_operand" "r,0,GCal,r,0")
 		 (match_operand:SF 2 "nonmemory_operand" "rL,I,r,GCal,LrCal")))]
@@ -80,7 +80,7 @@ 
   [(set_attr "type" "spfp")
   (set_attr "length" "4,4,8,8,8")])
 
-(define_insn "mulsf3"
+(define_insn "*mulsf3_fpx"
   [(set (match_operand:SF 0 "register_operand"          "=r,r,r,r,r ")
 	(mult:SF (match_operand:SF 1 "nonmemory_operand" "r,0,GCal,r,0")
 		 (match_operand:SF 2 "nonmemory_operand" "rL,I,r,GCal,LrCal")))]
@@ -226,25 +226,6 @@ 
 ;; daddh{0}{1} 0, {reg_pair}2.hi, {reg_pair}2.lo
 ;; OR
 ;; daddh{0}{1} 0, reg3, limm2.lo
-(define_expand "adddf3"
-  [(set (match_operand:DF 0 "arc_double_register_operand"          "")
-	(plus:DF (match_operand:DF 1 "arc_double_register_operand" "")
-		 (match_operand:DF 2 "nonmemory_operand" "")))
-     ]
- "TARGET_DPFP"
- " if (GET_CODE (operands[2]) == CONST_DOUBLE)
-     {
-        rtx high, low, tmp;
-        split_double (operands[2], &low, &high);
-        tmp = force_reg (SImode, high);
-        emit_insn(gen_adddf3_insn(operands[0], operands[1], operands[2],tmp,const0_rtx));
-     }
-   else
-     emit_insn(gen_adddf3_insn(operands[0], operands[1], operands[2],const1_rtx,const1_rtx));
-     DONE;
- "
-)
-
 ;; daddh{0}{1} 0, {reg_pair}2.hi, {reg_pair}2.lo  /* operand 4 = 1*/
 ;; OR
 ;; daddh{0}{1} 0, reg3, limm2.lo /* operand 4 = 0 */
@@ -270,25 +251,6 @@ 
 ;; dmulh{0}{1} 0, {reg_pair}2.hi, {reg_pair}2.lo
 ;; OR
 ;; dmulh{0}{1} 0, reg3, limm2.lo
-(define_expand "muldf3"
-  [(set (match_operand:DF 0 "arc_double_register_operand"          "")
-	(mult:DF (match_operand:DF 1 "arc_double_register_operand" "")
-		 (match_operand:DF 2 "nonmemory_operand" "")))]
-"TARGET_DPFP"
-"  if (GET_CODE (operands[2]) == CONST_DOUBLE)
-     {
-        rtx high, low, tmp;
-        split_double (operands[2], &low, &high);
-        tmp = force_reg (SImode, high);
-        emit_insn(gen_muldf3_insn(operands[0], operands[1], operands[2],tmp,const0_rtx));
-     }
-   else
-     emit_insn(gen_muldf3_insn(operands[0], operands[1], operands[2],const1_rtx,const1_rtx));
-
-  DONE;
- ")
-
-
 ;; dmulh{0}{1} 0, {reg_pair}2.hi, {reg_pair}2.lo /* operand 4 = 1*/
 ;; OR
 ;; dmulh{0}{1} 0, reg3, limm2.lo /* operand 4 = 0*/
@@ -317,26 +279,6 @@ 
 ;; drsubh{0}{2} 0, {reg_pair}1.hi, {reg_pair}1.lo
 ;; OR
 ;; drsubh{0}{2} 0, reg3, limm1.lo
-(define_expand "subdf3"
-  [(set (match_operand:DF 0 "arc_double_register_operand"          "")
-		    (minus:DF (match_operand:DF 1 "nonmemory_operand" "")
-				  (match_operand:DF 2 "nonmemory_operand" "")))]
-"TARGET_DPFP"
-"   if (GET_CODE (operands[1]) == CONST_DOUBLE || GET_CODE (operands[2]) == CONST_DOUBLE)
-     {
-        rtx high, low, tmp;
-        int const_index = ((GET_CODE (operands[1]) == CONST_DOUBLE) ? 1: 2);
-        split_double (operands[const_index], &low, &high);
-        tmp = force_reg (SImode, high);
-        emit_insn(gen_subdf3_insn(operands[0], operands[1], operands[2],tmp,const0_rtx));
-     }
-   else
-     emit_insn(gen_subdf3_insn(operands[0], operands[1], operands[2],const1_rtx,const1_rtx));
-
-   DONE;
-  "
-)
-
 ;; dsubh{0}{1} 0, {reg_pair}2.hi, {reg_pair}2.lo /* operand 4 = 1 */
 ;; OR
 ;; dsubh{0}{1} 0, reg3, limm2.lo /* operand 4 = 0*/
diff --git a/gcc/config/arc/predicates.md b/gcc/config/arc/predicates.md
index d384d70..e7d58a5 100644
--- a/gcc/config/arc/predicates.md
+++ b/gcc/config/arc/predicates.md
@@ -504,6 +504,12 @@ 
       return (code == EQ || code == NE || code == UNEQ || code == LTGT
 	      || code == ORDERED || code == UNORDERED);
 
+    case CC_FPUmode:
+    case CC_FPUEmode:
+      return !((code == LTGT) || (code == UNEQ));
+    case CC_FPU_UNEQmode:
+      return ((code == LTGT) || (code == UNEQ));
+
     case CCmode:
     case SImode: /* Used for BRcc.  */
       return 1;
@@ -797,3 +803,7 @@ 
    return (REG_P (op) && ((REGNO (op) >= FIRST_PSEUDO_REGISTER)
 			  || ((REGNO (op) & 1) == 0)));
   })
+
+(define_predicate "double_register_operand"
+  (ior (match_test "even_register_operand (op, mode)")
+       (match_test "arc_double_register_operand (op, mode)")))
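The `double_register_operand` predicate above accepts either an
even-numbered core register pair or an FPX double register.  For hard
registers the even-pair half reduces to a parity test on the register
number; a sketch, assuming hard register numbers only (pseudo
registers are accepted unconditionally, as in the predicate body):

```c
#include <stdbool.h>

/* Sketch of the even-pair test: a 64-bit datum held in r<n>:r<n+1>
   is valid only when n is even (r0r1, r2r3, ...), mirroring the
   alignment required by the ldd/std instructions.  */
bool even_register_pair_p (int regno)
{
  return (regno & 1) == 0;
}
```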
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index ba0b4b2..250adfb 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -599,7 +599,7 @@  Objective-C and Objective-C++ Dialects}.
 -mmixed-code -mq-class -mRcq -mRcw -msize-level=@var{level} @gol
 -mtune=@var{cpu} -mmultcost=@var{num} @gol
 -munalign-prob-threshold=@var{probability} -mmpy-option=@var{multo} @gol
--mdiv-rem -mcode-density -mll64}
+-mdiv-rem -mcode-density -mll64 -mfpu=@var{fpu} -mabi=@var{name}}
 
 @emph{ARM Options}
 @gccoptlist{-mapcs-frame  -mno-apcs-frame @gol
@@ -13263,6 +13263,13 @@  Enable code density instructions for ARC EM, default on for ARC HS.
 @opindex mll64
 Enable double load/store operations for ARC HS cores.
 
+@item -mabi=@var{name}
+@opindex mabi
+Generate code for the specified ABI@.  Permissible values are
+@samp{default} and @samp{optimized}.  The @samp{optimized} value
+places double register arguments into even-odd register pairs.  This
+option only makes sense for ARC HS cores.
+
 @item -mmpy-option=@var{multo}
 @opindex mmpy-option
 Compile ARCv2 code with a multiplier design option.  @samp{wlh1} is
@@ -13311,6 +13318,88 @@  MPYU, MPYM, MPYMU, and MPY_S.
 
 This option is only available for ARCv2 cores@.
 
+@item -mfpu=@var{fpu}
+@opindex mfpu
+Enables support for specific floating-point hardware extensions for
+ARCv2 cores.  Supported values for @var{fpu} are:
+
+@table @samp
+
+@item fpus
+@opindex fpus
+Enables support for single precision floating point hardware
+extensions@.
+
+@item fpud
+@opindex fpud
+Enables support for double precision floating point hardware
+extensions.  The single precision floating point extension is also
+enabled.  Not available for ARC EM@.
+
+@item fpuda
+@opindex fpuda
+Enables support for double precision floating point hardware
+extensions using double precision assist instructions.  The single
+precision floating point extension is also enabled.  This option is
+only available for ARC EM@.
+
+@item fpuda_div
+@opindex fpuda_div
+Enables support for double precision floating point hardware
+extensions using double precision assist instructions, and single
+precision square-root and divide hardware extensions.  The single
+precision floating point extension is also enabled.  This option is
+only available for ARC EM@.
+
+@item fpuda_fma
+@opindex fpuda_fma
+Enables support for double precision floating point hardware
+extensions using double precision assist instructions, and single
+precision fused multiply and add hardware extension.  The single
+precision floating point extension is also enabled.  This option is
+only available for ARC EM@.
+
+@item fpuda_all
+@opindex fpuda_all
+Enables support for double precision floating point hardware
+extensions using double precision assist instructions, and all single
+precision hardware extensions.  The single precision floating point
+extension is also enabled.  This option is only available for ARC EM@.
+
+@item fpus_div
+@opindex fpus_div
+Enables support for single precision floating point, and single
+precision square-root and divide hardware extensions@.
+
+@item fpud_div
+@opindex fpud_div
+Enables support for double precision floating point, and double
+precision square-root and divide hardware extensions.  This option
+includes option @samp{fpus_div}.  Not available for ARC EM@.
+
+@item fpus_fma
+@opindex fpus_fma
+Enables support for single precision floating point, and single
+precision fused multiply and add hardware extensions@.
+
+@item fpud_fma
+@opindex fpud_fma
+Enables support for double precision floating point, and double
+precision fused multiply and add hardware extensions.  This option
+includes option @samp{fpus_fma}.  Not available for ARC EM@.
+
+@item fpus_all
+@opindex fpus_all
+Enables support for all single precision floating point hardware
+extensions@.
+
+@item fpud_all
+@opindex fpud_all
+Enables support for all single and double precision floating point
+hardware extensions.  Not available for ARC EM@.
+
+@end table
+
 @end table
 
 The following options are passed through to the assembler, and also