Patchwork PATCH: Simplify AVX cast and extract lower 128bit patterns

Submitter H.J. Lu
Date June 22, 2010, 5:58 p.m.
Message ID <20100622175837.GA3734@intel.com>
Permalink /patch/56557/
State New

Comments

H.J. Lu - June 22, 2010, 5:58 p.m.
Hi,

AVX cast from 256bit to 128bit and extract lower 128bit from 256 bit
are the same operation. This patch replaces AVX cast with lower 128bit
extraction. It also uses define_insn_and_split for lower 128bit extractions.
Tested on Linux/x86-64.  OK for trunk?

Thanks.


H.J.
---
2010-06-22  H.J. Lu  <hongjiu.lu@intel.com>

	* config/i386/i386.c (bdesc_args): Replace CODE_FOR_avx_si_si256,
	CODE_FOR_avx_ps_ps256 and CODE_FOR_avx_pd_pd256 with
	CODE_FOR_vec_extract_lo_v8si, CODE_FOR_vec_extract_lo_v8sf
	and CODE_FOR_vec_extract_lo_v4df.

	* config/i386/sse.md (vec_extract_lo_<mode>:AVX256MODE4P):
	Changed to define_insn_and_split.
	(vec_extract_lo_<mode>:AVX256MODE8P): Likewise.
	(vec_extract_lo_v16hi): Likewise.
	(vec_extract_lo_v32qi): Likewise.
	(avx_<avxmodesuffixp><avxmodesuffix>_<avxmodesuffixp>): Likewise.
	(avx_<avxmodesuffixp>_<avxmodesuffixp><avxmodesuffix>): Removed.
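
To make the "same operation" claim above concrete, here is a minimal portable sketch in plain C. It does not use AVX intrinsics or any GCC internals; the type and function names (v8sf_model, v4sf_model, cast_256_to_128, extract_lo, same_low_half) are hypothetical stand-ins chosen for illustration. It only shows that reinterpreting the low 16 bytes of a 256-bit value and selecting elements 0..3 produce identical results.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical portable models of a 256-bit and a 128-bit float vector. */
typedef struct { float e[8]; } v8sf_model;
typedef struct { float e[4]; } v4sf_model;

/* "Cast": reinterpret the low 16 bytes of the wide vector. */
static v4sf_model cast_256_to_128(v8sf_model v)
{
    v4sf_model r;
    memcpy(&r, &v, sizeof r);
    return r;
}

/* "Extract low": select elements 0..3, mirroring the
   (parallel [(const_int 0) ... (const_int 3)]) selector in the pattern. */
static v4sf_model extract_lo(v8sf_model v)
{
    v4sf_model r;
    for (int i = 0; i < 4; i++)
        r.e[i] = v.e[i];
    return r;
}

/* Returns 1 when both views of the low half are byte-identical. */
static int same_low_half(v8sf_model v)
{
    v4sf_model a = cast_256_to_128(v);
    v4sf_model b = extract_lo(v);
    return memcmp(&a, &b, sizeof a) == 0;
}

/* Small self-check that can be called from any test driver. */
static void check_model(void)
{
    v8sf_model v = { { 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f } };
    assert(same_low_half(v));
}
```

Because the two operations agree on every input, a single pattern can serve both the cast builtins and the low-half extraction, which is what the bdesc_args change relies on.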
Uros Bizjak - June 22, 2010, 6:27 p.m.
On Tue, 2010-06-22 at 10:58 -0700, H.J. Lu wrote:
> Hi,
> 
> AVX cast from 256bit to 128bit and extract lower 128bit from 256 bit
> are the same operation. This patch replaces AVX cast with lower 128bit
> extraction. It also uses define_insn_and_split for lower 128bit extractions.
> Tested on Linux/x86-64.  OK for trunk?
> 
> Thanks.
> 
> 
> H.J.
> ---
> 2010-06-22  H.J. Lu  <hongjiu.lu@intel.com>
> 
> 	* config/i386/i386.c (bdesc_args): Replace CODE_FOR_avx_si_si256,
> 	CODE_FOR_avx_ps_ps256 and CODE_FOR_avx_pd_pd256 with
> 	CODE_FOR_vec_extract_lo_v8si, CODE_FOR_vec_extract_lo_v8sf
> 	and CODE_FOR_vec_extract_lo_v4df.
> 
> 	* config/i386/sse.md (vec_extract_lo_<mode>:AVX256MODE4P):

If we want to specify pattern names with this approach, then please use
"vec_extract_lo_<AVX256MODE4P:mode>", as this is the correct mode iterator
syntax.

> 	Changed to define_insn_and_split.
> 	(vec_extract_lo_<mode>:AVX256MODE8P): Likewise.
> 	(vec_extract_lo_v16hi): Likewise.
> 	(vec_extract_lo_v32qi): Likewise.
> 	(avx_<avxmodesuffixp><avxmodesuffix>_<avxmodesuffixp>): Likewise.
> 	(avx_<avxmodesuffixp>_<avxmodesuffixp><avxmodesuffix>): Removed.
> 

> -(define_insn "vec_extract_lo_<mode>"
> +(define_insn_and_split "vec_extract_lo_<mode>"
>    [(set (match_operand:<avxhalfvecmode> 0 "nonimmediate_operand" "=x,m")
>  	(vec_select:<avxhalfvecmode>
> -	  (match_operand:AVX256MODE4P 1 "register_operand" "x,x")
> +	  (match_operand:AVX256MODE4P 1 "nonimmediate_operand" "xm,x")
>  	  (parallel [(const_int 0) (const_int 1)])))]
>    "TARGET_AVX"
> -  "vextractf128\t{$0x0, %1, %0|%0, %1, 0x0}"
> -  [(set_attr "type" "sselog")
> -   (set_attr "prefix_extra" "1")
> -   (set_attr "length_immediate" "1")
> -   (set_attr "memory" "none,store")
> -   (set_attr "prefix" "vex")
> -   (set_attr "mode" "V8SF")])
> +  "#"
> +  "&& reload_completed"
> +  [(const_int 0)]
> +{
> +  rtx op1 = operands[1];
> +  if (REG_P (op1))
> +    op1 = gen_rtx_REG (<avxhalfvecmode>mode, REGNO (op1));
> +  else
> +    op1 = gen_lowpart (<avxhalfvecmode>mode, op1);
> +  emit_move_insn (operands[0], op1);
> +  DONE;
> +})

Hm, can't gen_lowpart handle register conversion directly?

Uros.
H.J. Lu - June 22, 2010, 6:33 p.m.
On Tue, Jun 22, 2010 at 11:27 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Tue, 2010-06-22 at 10:58 -0700, H.J. Lu wrote:
>> Hi,
>>
>> AVX cast from 256bit to 128bit and extract lower 128bit from 256 bit
>> are the same operation. This patch replaces AVX cast with lower 128bit
>> extraction. It also uses define_insn_and_split for lower 128bit extractions.
>> Tested on Linux/x86-64.  OK for trunk?
>>
>> Thanks.
>>
>>
>> H.J.
>> ---
>> 2010-06-22  H.J. Lu  <hongjiu.lu@intel.com>
>>
>>       * config/i386/i386.c (bdesc_args): Replace CODE_FOR_avx_si_si256,
>>       CODE_FOR_avx_ps_ps256 and CODE_FOR_avx_pd_pd256 with
>>       CODE_FOR_vec_extract_lo_v8si, CODE_FOR_vec_extract_lo_v8sf
>>       and CODE_FOR_vec_extract_lo_v4df.
>>
>>       * config/i386/sse.md (vec_extract_lo_<mode>:AVX256MODE4P):
>
> If we want to specify pattern names with this approach, then please use
> "vec_extract_lo_<AVX256MODE4P:mode>", as this is the correct mode iterator
> syntax.

I will make the change.

>>       Changed to define_insn_and_split.
>>       (vec_extract_lo_<mode>:AVX256MODE8P): Likewise.
>>       (vec_extract_lo_v16hi): Likewise.
>>       (vec_extract_lo_v32qi): Likewise.
>>       (avx_<avxmodesuffixp><avxmodesuffix>_<avxmodesuffixp>): Likewise.
>>       (avx_<avxmodesuffixp>_<avxmodesuffixp><avxmodesuffix>): Removed.
>>
>
>> -(define_insn "vec_extract_lo_<mode>"
>> +(define_insn_and_split "vec_extract_lo_<mode>"
>>    [(set (match_operand:<avxhalfvecmode> 0 "nonimmediate_operand" "=x,m")
>>       (vec_select:<avxhalfvecmode>
>> -       (match_operand:AVX256MODE4P 1 "register_operand" "x,x")
>> +       (match_operand:AVX256MODE4P 1 "nonimmediate_operand" "xm,x")
>>         (parallel [(const_int 0) (const_int 1)])))]
>>    "TARGET_AVX"
>> -  "vextractf128\t{$0x0, %1, %0|%0, %1, 0x0}"
>> -  [(set_attr "type" "sselog")
>> -   (set_attr "prefix_extra" "1")
>> -   (set_attr "length_immediate" "1")
>> -   (set_attr "memory" "none,store")
>> -   (set_attr "prefix" "vex")
>> -   (set_attr "mode" "V8SF")])
>> +  "#"
>> +  "&& reload_completed"
>> +  [(const_int 0)]
>> +{
>> +  rtx op1 = operands[1];
>> +  if (REG_P (op1))
>> +    op1 = gen_rtx_REG (<avxhalfvecmode>mode, REGNO (op1));
>> +  else
>> +    op1 = gen_lowpart (<avxhalfvecmode>mode, op1);
>> +  emit_move_insn (operands[0], op1);
>> +  DONE;
>> +})
>
> Hm, can't gen_lowpart handle register conversion directly?
>

That is how it is done in other places in sse.md.
gen_lowpart may generate SUBREG, which we don't
want on registers.
Uros Bizjak - June 22, 2010, 6:47 p.m.
On Tue, 2010-06-22 at 11:33 -0700, H.J. Lu wrote:

> >> -(define_insn "vec_extract_lo_<mode>"
> >> +(define_insn_and_split "vec_extract_lo_<mode>"
> >>    [(set (match_operand:<avxhalfvecmode> 0 "nonimmediate_operand" "=x,m")
> >>       (vec_select:<avxhalfvecmode>
> >> -       (match_operand:AVX256MODE4P 1 "register_operand" "x,x")
> >> +       (match_operand:AVX256MODE4P 1 "nonimmediate_operand" "xm,x")
> >>         (parallel [(const_int 0) (const_int 1)])))]
> >>    "TARGET_AVX"
> >> -  "vextractf128\t{$0x0, %1, %0|%0, %1, 0x0}"
> >> -  [(set_attr "type" "sselog")
> >> -   (set_attr "prefix_extra" "1")
> >> -   (set_attr "length_immediate" "1")
> >> -   (set_attr "memory" "none,store")
> >> -   (set_attr "prefix" "vex")
> >> -   (set_attr "mode" "V8SF")])
> >> +  "#"
> >> +  "&& reload_completed"
> >> +  [(const_int 0)]
> >> +{
> >> +  rtx op1 = operands[1];
> >> +  if (REG_P (op1))
> >> +    op1 = gen_rtx_REG (<avxhalfvecmode>mode, REGNO (op1));
> >> +  else
> >> +    op1 = gen_lowpart (<avxhalfvecmode>mode, op1);
> >> +  emit_move_insn (operands[0], op1);
> >> +  DONE;
> >> +})
> >
> > Hm, can't gen_lowpart handle register conversion directly?
> >
> 
> That is how it is done in other places in sse.md.
> gen_lowpart may generate SUBREG, which we don't
> want on registers.

This is a post-reload splitter, so IIRC, gen_lowpart will just change the
mode of the hard register.  But, I'm not totally sure...

Uros.
H.J. Lu - June 22, 2010, 7:12 p.m.
On Tue, Jun 22, 2010 at 11:47 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Tue, 2010-06-22 at 11:33 -0700, H.J. Lu wrote:
>
>> >> -(define_insn "vec_extract_lo_<mode>"
>> >> +(define_insn_and_split "vec_extract_lo_<mode>"
>> >>    [(set (match_operand:<avxhalfvecmode> 0 "nonimmediate_operand" "=x,m")
>> >>       (vec_select:<avxhalfvecmode>
>> >> -       (match_operand:AVX256MODE4P 1 "register_operand" "x,x")
>> >> +       (match_operand:AVX256MODE4P 1 "nonimmediate_operand" "xm,x")
>> >>         (parallel [(const_int 0) (const_int 1)])))]
>> >>    "TARGET_AVX"
>> >> -  "vextractf128\t{$0x0, %1, %0|%0, %1, 0x0}"
>> >> -  [(set_attr "type" "sselog")
>> >> -   (set_attr "prefix_extra" "1")
>> >> -   (set_attr "length_immediate" "1")
>> >> -   (set_attr "memory" "none,store")
>> >> -   (set_attr "prefix" "vex")
>> >> -   (set_attr "mode" "V8SF")])
>> >> +  "#"
>> >> +  "&& reload_completed"
>> >> +  [(const_int 0)]
>> >> +{
>> >> +  rtx op1 = operands[1];
>> >> +  if (REG_P (op1))
>> >> +    op1 = gen_rtx_REG (<avxhalfvecmode>mode, REGNO (op1));
>> >> +  else
>> >> +    op1 = gen_lowpart (<avxhalfvecmode>mode, op1);
>> >> +  emit_move_insn (operands[0], op1);
>> >> +  DONE;
>> >> +})
>> >
>> > Hm, can't gen_lowpart handle register conversion directly?
>> >
>>
>> That is how it is done in other places in sse.md.
>> gen_lowpart may generate SUBREG, which we don't
>> want on registers.
>
> This is a post-reload splitter, so IIRC, gen_lowpart will just change the
> mode of the hard register.  But, I'm not totally sure...
>

It doesn't work. I got

Starting program:
/export/build/gnu/gcc/build-x86_64-linux/prev-gcc/cc1 -fpreprocessed
x.i -quiet -dumpbase x.i -mavx -mtune=generic -march=x86-64 -auxbase x
-O2 -version -o x.s
GNU C (GCC) version 4.6.0 20100622 (experimental) (x86_64-unknown-linux-gnu)
	compiled by GNU C version 4.4.4 20100503 (Red Hat 4.4.4-2), GMP
version 4.3.1, MPFR version 2.4.2-p3, MPC version 0.8.1
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
GNU C (GCC) version 4.6.0 20100622 (experimental) (x86_64-unknown-linux-gnu)
	compiled by GNU C version 4.4.4 20100503 (Red Hat 4.4.4-2), GMP
version 4.3.1, MPFR version 2.4.2-p3, MPC version 0.8.1
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
Compiler executable checksum: e1890fe66351dde4fa9a333ace671390

Breakpoint 1, fancy_abort (
    file=0x3054618 "/export/gnu/import/git/gcc/gcc/emit-rtl.c", line=890,
    function=0x30555d9 "gen_reg_rtx")
    at /export/gnu/import/git/gcc/gcc/diagnostic.c:879
879	  internal_error ("in %s, at %s:%d", function, trim_filename (file), line);
(gdb) bt
#0  fancy_abort (file=0x3054618 "/export/gnu/import/git/gcc/gcc/emit-rtl.c",
    line=890, function=0x30555d9 "gen_reg_rtx")
    at /export/gnu/import/git/gcc/gcc/diagnostic.c:879
#1  0x0000000000ab22b0 in gen_reg_rtx (mode=V4DFmode)
    at /export/gnu/import/git/gcc/gcc/emit-rtl.c:890
#2  0x0000000000afd9c5 in copy_to_reg (x=0x7ffff179df40)
    at /export/gnu/import/git/gcc/gcc/explow.c:601
#3  0x000000000135345f in gen_lowpart_general (mode=V2DFmode, x=0x7ffff179df40)
    at /export/gnu/import/git/gcc/gcc/rtlhooks.c:50
#4  0x00000000026d8ba3 in gen_split_3050 (curr_insn=0x7ffff1646318,
    operands=0x43dce80)
    at /export/gnu/import/git/gcc/gcc/config/i386/sse.md:4180
#5  0x000000000282a0d1 in split_5 (x0=0x7ffff164f048, insn=0x7ffff1646318)
    at /export/gnu/import/git/gcc/gcc/config/i386/sse.md:4179
#6  0x000000000283089e in split_insns (x0=0x7ffff164f048, insn=0x7ffff1646318)
    at /export/gnu/import/git/gcc/gcc/config/i386/sse.md:4610
#7  0x0000000000ab9e9d in try_split (pat=0x7ffff164f048, trial=0x7ffff1646318,
    last=1) at /export/gnu/import/git/gcc/gcc/emit-rtl.c:3431
#8  0x000000000122faf3 in split_insn (insn=0x7ffff1646318)
    at /export/gnu/import/git/gcc/gcc/recog.c:2747
#9  0x000000000122fef8 in split_all_insns ()
    at /export/gnu/import/git/gcc/gcc/recog.c:2836
#10 0x0000000001231821 in rest_of_handle_split_after_reload ()
    at /export/gnu/import/git/gcc/gcc/recog.c:3562
#11 0x00000000011348e3 in execute_one_pass (pass=0x43664a0)
    at /export/gnu/import/git/gcc/gcc/passes.c:1576
#12 0x0000000001134acc in execute_pass_list (pass=0x43664a0)
    at /export/gnu/import/git/gcc/gcc/passes.c:1631
#13 0x0000000001134aed in execute_pass_list (pass=0x4366060)
    at /export/gnu/import/git/gcc/gcc/passes.c:1632
#14 0x0000000001134aed in execute_pass_list (pass=0x4366000)
    at /export/gnu/import/git/gcc/gcc/passes.c:1632
#15 0x0000000001810013 in tree_rest_of_compilation (fndecl=0x7ffff15b3600)
    at /export/gnu/import/git/gcc/gcc/tree-optimize.c:420
#16 0x000000000236b006 in cgraph_expand_function (node=0x7ffff15b6158)
    at /export/gnu/import/git/gcc/gcc/cgraphunit.c:1632
#17 0x000000000236b2ca in cgraph_expand_all_functions ()
    at /export/gnu/import/git/gcc/gcc/cgraphunit.c:1711
#18 0x000000000236b8f2 in cgraph_optimize ()
    at /export/gnu/import/git/gcc/gcc/cgraphunit.c:1967
#19 0x00000000023692e3 in cgraph_finalize_compilation_unit ()
    at /export/gnu/import/git/gcc/gcc/cgraphunit.c:1171
#20 0x00000000004e6bea in c_write_global_declarations ()
    at /export/gnu/import/git/gcc/gcc/c-decl.c:9698
#21 0x00000000014fe1b5 in compile_file ()
    at /export/gnu/import/git/gcc/gcc/toplev.c:997
#22 0x00000000015003f7 in do_compile ()
    at /export/gnu/import/git/gcc/gcc/toplev.c:2342
#23 0x00000000015004c5 in toplev_main (argc=15, argv=0x7fffffffe318)
    at /export/gnu/import/git/gcc/gcc/toplev.c:2383
#24 0x00000000006ec6de in main (argc=15, argv=0x7fffffffe318)
    at /export/gnu/import/git/gcc/gcc/main.c:35
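
One reading of the backtrace above is that gen_lowpart_general fell back to copy_to_reg, which asked gen_reg_rtx for a new pseudo register, and that request aborts once the post-reload split pass is running. The toy C model below sketches that reasoning only. Everything in it (toy_mode, toy_reg, toy_reload_completed, toy_new_pseudo, toy_lowpart_via_copy, toy_retype_hard_reg, and the register number 21) is hypothetical and is not GCC's real API or data layout.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins only; none of these are GCC's real types or functions. */
enum toy_mode { TOY_V4DF, TOY_V2DF };

struct toy_reg {
    int regno;            /* register number, -1 when creation failed */
    enum toy_mode mode;   /* machine mode the register is viewed in */
    bool valid;           /* false models the internal abort */
};

/* A post-reload splitter runs with this set. */
static bool toy_reload_completed = true;

/* Creating a fresh pseudo is only possible before register allocation. */
static struct toy_reg toy_new_pseudo(enum toy_mode mode)
{
    struct toy_reg r = { -1, mode, false };
    if (!toy_reload_completed) {
        r.regno = 1000;   /* arbitrary pseudo number for the model */
        r.valid = true;
    }
    return r;
}

/* Fallback path: copy the wide value into a new pseudo, then narrow it. */
static struct toy_reg toy_lowpart_via_copy(struct toy_reg x, enum toy_mode mode)
{
    struct toy_reg tmp = toy_new_pseudo(x.mode);
    if (tmp.valid)
        tmp.mode = mode;
    return tmp;
}

/* Path the patch takes: same hard register number, narrower mode. */
static struct toy_reg toy_retype_hard_reg(struct toy_reg x, enum toy_mode mode)
{
    struct toy_reg r = { x.regno, mode, true };
    return r;
}
```

In this model, with toy_reload_completed set, toy_lowpart_via_copy fails while toy_retype_hard_reg succeeds and keeps the same register number, which mirrors why the splitter builds the narrower REG directly with gen_rtx_REG for register operands and reserves gen_lowpart for memory operands.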
Uros Bizjak - June 23, 2010, 5:48 a.m.
On Tue, Jun 22, 2010 at 9:12 PM, H.J. Lu <hjl.tools@gmail.com> wrote:

>>> >> -(define_insn "vec_extract_lo_<mode>"
>>> >> +(define_insn_and_split "vec_extract_lo_<mode>"
>>> >>    [(set (match_operand:<avxhalfvecmode> 0 "nonimmediate_operand" "=x,m")
>>> >>       (vec_select:<avxhalfvecmode>
>>> >> -       (match_operand:AVX256MODE4P 1 "register_operand" "x,x")
>>> >> +       (match_operand:AVX256MODE4P 1 "nonimmediate_operand" "xm,x")
>>> >>         (parallel [(const_int 0) (const_int 1)])))]
>>> >>    "TARGET_AVX"
>>> >> -  "vextractf128\t{$0x0, %1, %0|%0, %1, 0x0}"
>>> >> -  [(set_attr "type" "sselog")
>>> >> -   (set_attr "prefix_extra" "1")
>>> >> -   (set_attr "length_immediate" "1")
>>> >> -   (set_attr "memory" "none,store")
>>> >> -   (set_attr "prefix" "vex")
>>> >> -   (set_attr "mode" "V8SF")])
>>> >> +  "#"
>>> >> +  "&& reload_completed"
>>> >> +  [(const_int 0)]
>>> >> +{
>>> >> +  rtx op1 = operands[1];
>>> >> +  if (REG_P (op1))
>>> >> +    op1 = gen_rtx_REG (<avxhalfvecmode>mode, REGNO (op1));
>>> >> +  else
>>> >> +    op1 = gen_lowpart (<avxhalfvecmode>mode, op1);
>>> >> +  emit_move_insn (operands[0], op1);
>>> >> +  DONE;
>>> >> +})
>>> >
>>> > Hm, can't gen_lowpart handle register conversion directly?
>>> >
>>>
>>> That is how it is done in other places in sse.md.
>>> gen_lowpart may generate SUBREG, which we don't
>>> want on registers.
>>
>> This is a post-reload splitter, so IIRC, gen_lowpart will just change the
>> mode of the hard register.  But, I'm not totally sure...
>>
>
> It doesn't work. I got

Hm, OK then.

Thanks,
Uros.

Patch

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 657e55a..268be3b 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -22427,9 +22427,9 @@  static const struct builtin_description bdesc_args[] =
   { OPTION_MASK_ISA_AVX, CODE_FOR_avx_si256_si, "__builtin_ia32_si256_si", IX86_BUILTIN_SI256_SI, UNKNOWN, (int) V8SI_FTYPE_V4SI },
   { OPTION_MASK_ISA_AVX, CODE_FOR_avx_ps256_ps, "__builtin_ia32_ps256_ps", IX86_BUILTIN_PS256_PS, UNKNOWN, (int) V8SF_FTYPE_V4SF },
   { OPTION_MASK_ISA_AVX, CODE_FOR_avx_pd256_pd, "__builtin_ia32_pd256_pd", IX86_BUILTIN_PD256_PD, UNKNOWN, (int) V4DF_FTYPE_V2DF },
-  { OPTION_MASK_ISA_AVX, CODE_FOR_avx_si_si256, "__builtin_ia32_si_si256", IX86_BUILTIN_SI_SI256, UNKNOWN, (int) V4SI_FTYPE_V8SI },
-  { OPTION_MASK_ISA_AVX, CODE_FOR_avx_ps_ps256, "__builtin_ia32_ps_ps256", IX86_BUILTIN_PS_PS256, UNKNOWN, (int) V4SF_FTYPE_V8SF },
-  { OPTION_MASK_ISA_AVX, CODE_FOR_avx_pd_pd256, "__builtin_ia32_pd_pd256", IX86_BUILTIN_PD_PD256, UNKNOWN, (int) V2DF_FTYPE_V4DF },
+  { OPTION_MASK_ISA_AVX, CODE_FOR_vec_extract_lo_v8si, "__builtin_ia32_si_si256", IX86_BUILTIN_SI_SI256, UNKNOWN, (int) V4SI_FTYPE_V8SI },
+  { OPTION_MASK_ISA_AVX, CODE_FOR_vec_extract_lo_v8sf, "__builtin_ia32_ps_ps256", IX86_BUILTIN_PS_PS256, UNKNOWN, (int) V4SF_FTYPE_V8SF },
+  { OPTION_MASK_ISA_AVX, CODE_FOR_vec_extract_lo_v4df, "__builtin_ia32_pd_pd256", IX86_BUILTIN_PD_PD256, UNKNOWN, (int) V2DF_FTYPE_V4DF },
 
   { OPTION_MASK_ISA_AVX, CODE_FOR_avx_vtestpd, "__builtin_ia32_vtestzpd", IX86_BUILTIN_VTESTZPD, EQ, (int) INT_FTYPE_V2DF_V2DF_PTEST },
   { OPTION_MASK_ISA_AVX, CODE_FOR_avx_vtestpd, "__builtin_ia32_vtestcpd", IX86_BUILTIN_VTESTCPD, LTU, (int) INT_FTYPE_V2DF_V2DF_PTEST },
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 7625906..ed22675 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -4178,19 +4178,24 @@ 
   DONE;
 })
 
-(define_insn "vec_extract_lo_<mode>"
+(define_insn_and_split "vec_extract_lo_<mode>"
   [(set (match_operand:<avxhalfvecmode> 0 "nonimmediate_operand" "=x,m")
 	(vec_select:<avxhalfvecmode>
-	  (match_operand:AVX256MODE4P 1 "register_operand" "x,x")
+	  (match_operand:AVX256MODE4P 1 "nonimmediate_operand" "xm,x")
 	  (parallel [(const_int 0) (const_int 1)])))]
   "TARGET_AVX"
-  "vextractf128\t{$0x0, %1, %0|%0, %1, 0x0}"
-  [(set_attr "type" "sselog")
-   (set_attr "prefix_extra" "1")
-   (set_attr "length_immediate" "1")
-   (set_attr "memory" "none,store")
-   (set_attr "prefix" "vex")
-   (set_attr "mode" "V8SF")])
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  rtx op1 = operands[1];
+  if (REG_P (op1))
+    op1 = gen_rtx_REG (<avxhalfvecmode>mode, REGNO (op1));
+  else
+    op1 = gen_lowpart (<avxhalfvecmode>mode, op1);
+  emit_move_insn (operands[0], op1);
+  DONE;
+})
 
 (define_insn "vec_extract_hi_<mode>"
   [(set (match_operand:<avxhalfvecmode> 0 "nonimmediate_operand" "=x,m")
@@ -4206,20 +4211,25 @@ 
    (set_attr "prefix" "vex")
    (set_attr "mode" "V8SF")])
 
-(define_insn "vec_extract_lo_<mode>"
+(define_insn_and_split "vec_extract_lo_<mode>"
   [(set (match_operand:<avxhalfvecmode> 0 "nonimmediate_operand" "=x,m")
 	(vec_select:<avxhalfvecmode>
-	  (match_operand:AVX256MODE8P 1 "register_operand" "x,x")
+	  (match_operand:AVX256MODE8P 1 "nonimmediate_operand" "xm,x")
 	  (parallel [(const_int 0) (const_int 1)
 		     (const_int 2) (const_int 3)])))]
   "TARGET_AVX"
-  "vextractf128\t{$0x0, %1, %0|%0, %1, 0x0}"
-  [(set_attr "type" "sselog")
-   (set_attr "prefix_extra" "1")
-   (set_attr "length_immediate" "1")
-   (set_attr "memory" "none,store")
-   (set_attr "prefix" "vex")
-   (set_attr "mode" "V8SF")])
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  rtx op1 = operands[1];
+  if (REG_P (op1))
+    op1 = gen_rtx_REG (<avxhalfvecmode>mode, REGNO (op1));
+  else
+    op1 = gen_lowpart (<avxhalfvecmode>mode, op1);
+  emit_move_insn (operands[0], op1);
+  DONE;
+})
 
 (define_insn "vec_extract_hi_<mode>"
   [(set (match_operand:<avxhalfvecmode> 0 "nonimmediate_operand" "=x,m")
@@ -4236,22 +4246,27 @@ 
    (set_attr "prefix" "vex")
    (set_attr "mode" "V8SF")])
 
-(define_insn "vec_extract_lo_v16hi"
+(define_insn_and_split "vec_extract_lo_v16hi"
   [(set (match_operand:V8HI 0 "nonimmediate_operand" "=x,m")
 	(vec_select:V8HI
-	  (match_operand:V16HI 1 "register_operand" "x,x")
+	  (match_operand:V16HI 1 "nonimmediate_operand" "xm,x")
 	  (parallel [(const_int 0) (const_int 1)
 		     (const_int 2) (const_int 3)
 		     (const_int 4) (const_int 5)
 		     (const_int 6) (const_int 7)])))]
   "TARGET_AVX"
-  "vextractf128\t{$0x0, %1, %0|%0, %1, 0x0}"
-  [(set_attr "type" "sselog")
-   (set_attr "prefix_extra" "1")
-   (set_attr "length_immediate" "1")
-   (set_attr "memory" "none,store")
-   (set_attr "prefix" "vex")
-   (set_attr "mode" "V8SF")])
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  rtx op1 = operands[1];
+  if (REG_P (op1))
+    op1 = gen_rtx_REG (V8HImode, REGNO (op1));
+  else
+    op1 = gen_lowpart (V8HImode, op1);
+  emit_move_insn (operands[0], op1);
+  DONE;
+})
 
 (define_insn "vec_extract_hi_v16hi"
   [(set (match_operand:V8HI 0 "nonimmediate_operand" "=x,m")
@@ -4270,10 +4285,10 @@ 
    (set_attr "prefix" "vex")
    (set_attr "mode" "V8SF")])
 
-(define_insn "vec_extract_lo_v32qi"
+(define_insn_and_split "vec_extract_lo_v32qi"
   [(set (match_operand:V16QI 0 "nonimmediate_operand" "=x,m")
 	(vec_select:V16QI
-	  (match_operand:V32QI 1 "register_operand" "x,x")
+	  (match_operand:V32QI 1 "nonimmediate_operand" "xm,x")
 	  (parallel [(const_int 0) (const_int 1)
 		     (const_int 2) (const_int 3)
 		     (const_int 4) (const_int 5)
@@ -4283,13 +4298,18 @@ 
 		     (const_int 12) (const_int 13)
 		     (const_int 14) (const_int 15)])))]
   "TARGET_AVX"
-  "vextractf128\t{$0x0, %1, %0|%0, %1, 0x0}"
-  [(set_attr "type" "sselog")
-   (set_attr "prefix_extra" "1")
-   (set_attr "length_immediate" "1")
-   (set_attr "memory" "none,store")
-   (set_attr "prefix" "vex")
-   (set_attr "mode" "V8SF")])
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  rtx op1 = operands[1];
+  if (REG_P (op1))
+    op1 = gen_rtx_REG (V16QImode, REGNO (op1));
+  else
+    op1 = gen_lowpart (V16QImode, op1);
+  emit_move_insn (operands[0], op1);
+  DONE;
+})
 
 (define_insn "vec_extract_hi_v32qi"
   [(set (match_operand:V16QI 0 "nonimmediate_operand" "=x,m")
@@ -12252,77 +12272,24 @@ 
    (set_attr "prefix" "vex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx_<avxmodesuffixp><avxmodesuffix>_<avxmodesuffixp>"
-  [(set (match_operand:AVX256MODE2P 0 "register_operand" "=x,x")
+(define_insn_and_split "avx_<avxmodesuffixp><avxmodesuffix>_<avxmodesuffixp>"
+  [(set (match_operand:AVX256MODE2P 0 "nonimmediate_operand" "=x,m")
 	(unspec:AVX256MODE2P
-	  [(match_operand:<avxhalfvecmode> 1 "nonimmediate_operand" "0,xm")]
-	  UNSPEC_CAST))]
-  "TARGET_AVX"
-{
-  switch (which_alternative)
-    {
-    case 0:
-      return "";
-    case 1:
-      switch (get_attr_mode (insn))
-        {
-	case MODE_V8SF:
-	  return "vmovaps\t{%1, %x0|%x0, %1}";
-	case MODE_V4DF:
-	  return "vmovapd\t{%1, %x0|%x0, %1}";
-	case MODE_OI:
-	  return "vmovdqa\t{%1, %x0|%x0, %1}";
-	default:
-	  break;
-	}
-    default:
-      break;
-    }
-  gcc_unreachable ();
-}
-  [(set_attr "type" "ssemov")
-   (set_attr "prefix" "vex")
-   (set_attr "mode" "<avxvecmode>")
-   (set (attr "length")
-    (if_then_else (eq_attr "alternative" "0")
-       (const_string "0")
-       (const_string "*")))])
-
-(define_insn "avx_<avxmodesuffixp>_<avxmodesuffixp><avxmodesuffix>"
-  [(set (match_operand:<avxhalfvecmode> 0 "register_operand" "=x,x")
-	(unspec:<avxhalfvecmode>
-	  [(match_operand:AVX256MODE2P 1 "nonimmediate_operand" "0,xm")]
+	  [(match_operand:<avxhalfvecmode> 1 "nonimmediate_operand" "xm,x")]
 	  UNSPEC_CAST))]
   "TARGET_AVX"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
 {
-  switch (which_alternative)
-    {
-    case 0:
-      return "";
-    case 1:
-      switch (get_attr_mode (insn))
-        {
-	case MODE_V8SF:
-	  return "vmovaps\t{%x1, %0|%0, %x1}";
-	case MODE_V4DF:
-	  return "vmovapd\t{%x1, %0|%0, %x1}";
-	case MODE_OI:
-	  return "vmovdqa\t{%x1, %0|%0, %x1}";
-	default:
-	  break;
-	}
-    default:
-      break;
-    }
-  gcc_unreachable ();
-}
-  [(set_attr "type" "ssemov")
-   (set_attr "prefix" "vex")
-   (set_attr "mode" "<avxvecmode>")
-   (set (attr "length")
-    (if_then_else (eq_attr "alternative" "0")
-       (const_string "0")
-       (const_string "*")))])
+  rtx op1 = operands[1];
+  if (REG_P (op1))
+    op1 = gen_rtx_REG (<MODE>mode, REGNO (op1));
+  else
+    op1 = gen_lowpart (<MODE>mode, op1);
+  emit_move_insn (operands[0], op1);
+  DONE;
+})
 
 (define_expand "vec_init<mode>"
   [(match_operand:AVX256MODE 0 "register_operand" "")