Message ID | 20160425125145.GA5326@intel.com |
---|---|
State | New |
Headers | show |
On Mon, Apr 25, 2016 at 2:51 PM, H.J. Lu <hongjiu.lu@intel.com> wrote: > Tested on Linux/x86-64. OK for trunk? > + /* FIXME: Since the CSE pass may change dominance info, which isn't > + expected by the fwprop pass, call free_dominance_info to > + invalidate dominance info. Otherwise, the fwprop pass may crash > + when dominance info is changed. */ > + if (TARGET_64BIT) > + free_dominance_info (CDI_DOMINATORS); > + Please resolve the above problem first, target-dependent sources are not the place to apply band-aids for middle-end problems. The thread with the proposed fix died in [1]. [1] https://gcc.gnu.org/ml/gcc/2016-03/msg00143.html Also, I find _32 and _64 suffixes confusing, maybe better would be to use timode_ and dimode_ prefixes everywhere? Uros.
On Mon, Apr 25, 2016 at 7:18 AM, Uros Bizjak <ubizjak@gmail.com> wrote: > On Mon, Apr 25, 2016 at 2:51 PM, H.J. Lu <hongjiu.lu@intel.com> wrote: >> Tested on Linux/x86-64. OK for trunk? > >> + /* FIXME: Since the CSE pass may change dominance info, which isn't >> + expected by the fwprop pass, call free_dominance_info to >> + invalidate dominance info. Otherwise, the fwprop pass may crash >> + when dominance info is changed. */ >> + if (TARGET_64BIT) >> + free_dominance_info (CDI_DOMINATORS); >> + > > Please resolve the above problem first, target-dependent sources are > not the place to apply band-aids for middle-end problems. The thread > with the proposed fix died in [1]. > > [1] https://gcc.gnu.org/ml/gcc/2016-03/msg00143.html free_dominance_info (CDI_DOMINATORS) has been called in other places to avoid this middle-end issue. I don't know when the middle-end will be fixed. I don't think this target optimization should be penalized by the middle-end issue. > Also, I find _32 and _64 suffixes confusing, maybe better would be to > use timode_ and dimode_ prefixes everywhere? > I will make the change.
On Mon, Apr 25, 2016 at 4:47 PM, H.J. Lu <hjl.tools@gmail.com> wrote: > On Mon, Apr 25, 2016 at 7:18 AM, Uros Bizjak <ubizjak@gmail.com> wrote: >> On Mon, Apr 25, 2016 at 2:51 PM, H.J. Lu <hongjiu.lu@intel.com> wrote: >>> Tested on Linux/x86-64. OK for trunk? >> >>> + /* FIXME: Since the CSE pass may change dominance info, which isn't >>> + expected by the fwprop pass, call free_dominance_info to >>> + invalidate dominance info. Otherwise, the fwprop pass may crash >>> + when dominance info is changed. */ >>> + if (TARGET_64BIT) >>> + free_dominance_info (CDI_DOMINATORS); >>> + >> >> Please resolve the above problem first, target-dependent sources are >> not the place to apply band-aids for middle-end problems. The thread >> with the proposed fix died in [1]. >> >> [1] https://gcc.gnu.org/ml/gcc/2016-03/msg00143.html > > free_dominance_info (CDI_DOMINATORS) has been called in other > places to avoid this middle-end issue. I don't know when the middle-end > will be fixed. I don't think this target optimization should be penalized by > the middle-end issue. Let's ask Richard if he is OK with the workaround... One more thing: @@ -3551,6 +3874,7 @@ convert_scalars_to_vector () basic_block bb; bitmap candidates; int converted_insns = 0; + rtx zero, minus_one; bitmap_obstack_initialize (NULL); candidates = BITMAP_ALLOC (NULL); @@ -3585,22 +3909,40 @@ convert_scalars_to_vector () if (dump_file) fprintf (dump_file, "There are no candidates for optimization.\n"); + if (TARGET_64BIT) + { + zero = gen_reg_rtx (V1TImode); + minus_one = gen_reg_rtx (V1TImode); + } + else + { + zero = NULL_RTX; + minus_one = NULL_RTX; + } + Do we *really* need to crate registers here? They are only used as a temporary in scalar_chain_64::convert_insn, and I think that any CSE worth its name should find out when the same immediate is loaded to different temporaries. BTW: I'd really like if Ilya can review the functionality of the patch, he is an expert in this conversion stuff. Uros.
On Mon, Apr 25, 2016 at 8:10 AM, Uros Bizjak <ubizjak@gmail.com> wrote: > On Mon, Apr 25, 2016 at 4:47 PM, H.J. Lu <hjl.tools@gmail.com> wrote: >> On Mon, Apr 25, 2016 at 7:18 AM, Uros Bizjak <ubizjak@gmail.com> wrote: >>> On Mon, Apr 25, 2016 at 2:51 PM, H.J. Lu <hongjiu.lu@intel.com> wrote: >>>> Tested on Linux/x86-64. OK for trunk? >>> >>>> + /* FIXME: Since the CSE pass may change dominance info, which isn't >>>> + expected by the fwprop pass, call free_dominance_info to >>>> + invalidate dominance info. Otherwise, the fwprop pass may crash >>>> + when dominance info is changed. */ >>>> + if (TARGET_64BIT) >>>> + free_dominance_info (CDI_DOMINATORS); >>>> + >>> >>> Please resolve the above problem first, target-dependent sources are >>> not the place to apply band-aids for middle-end problems. The thread >>> with the proposed fix died in [1]. >>> >>> [1] https://gcc.gnu.org/ml/gcc/2016-03/msg00143.html >> >> free_dominance_info (CDI_DOMINATORS) has been called in other >> places to avoid this middle-end issue. I don't know when the middle-end >> will be fixed. I don't think this target optimization should be penalized by >> the middle-end issue. > > Let's ask Richard if he is OK with the workaround... > > One more thing: > > @@ -3551,6 +3874,7 @@ convert_scalars_to_vector () > basic_block bb; > bitmap candidates; > int converted_insns = 0; > + rtx zero, minus_one; > > bitmap_obstack_initialize (NULL); > candidates = BITMAP_ALLOC (NULL); > @@ -3585,22 +3909,40 @@ convert_scalars_to_vector () > if (dump_file) > fprintf (dump_file, "There are no candidates for optimization.\n"); > > + if (TARGET_64BIT) > + { > + zero = gen_reg_rtx (V1TImode); > + minus_one = gen_reg_rtx (V1TImode); > + } > + else > + { > + zero = NULL_RTX; > + minus_one = NULL_RTX; > + } > + > > Do we *really* need to crate registers here? They are only used as a > temporary in scalar_chain_64::convert_insn, and I think that any CSE > worth its name should find out when the same immediate is loaded to > different temporaries. I will double check. Last time when I tried, CSE didn't work on this for a reason. For [hjl@gnu-6 pr67400]$ cat z.i extern void bar (void); void * foo (void) { return &bar; } [hjl@gnu-6 pr67400]$ gcc -S -fPIC -O2 z.i [hjl@gnu-6 pr67400]$ CSE sees: (insn 5 2 6 2 (set (reg:DI 89) (mem/u/c:DI (const:DI (unspec:DI [ (symbol_ref:DI ("bar") [flags 0x41] <function_decl 0x7f24362151b0 bar>) ] UNSPEC_GOTPCREL)) [1 S8 A8])) z.i:6 89 {*movdi_internal} (nil)) (insn 6 5 10 2 (set (reg:DI 87 [ <retval> ]) (reg:DI 89)) z.i:6 89 {*movdi_internal} (expr_list:REG_EQUAL (symbol_ref:DI ("bar") [flags 0x41] <function_decl 0x7f24362151b0 bar>) (nil))) CSE will not change it to (insn 6 5 10 2 (set (reg:DI 89 [ <retval> ]) (reg:DI 89)) z.i:6 89 {*movdi_internal} (expr_list:REG_EQUAL (symbol_ref:DI ("bar") [flags 0x41] <function_decl 0x7f24362151b0 bar>) (nil))) > BTW: I'd really like if Ilya can review the functionality of the > patch, he is an expert in this conversion stuff. > > Uros. Ilya, can you take a look? Thanks.
On Mon, Apr 25, 2016 at 8:27 AM, H.J. Lu <hjl.tools@gmail.com> wrote: > On Mon, Apr 25, 2016 at 8:10 AM, Uros Bizjak <ubizjak@gmail.com> wrote: >> On Mon, Apr 25, 2016 at 4:47 PM, H.J. Lu <hjl.tools@gmail.com> wrote: >>> On Mon, Apr 25, 2016 at 7:18 AM, Uros Bizjak <ubizjak@gmail.com> wrote: >>>> On Mon, Apr 25, 2016 at 2:51 PM, H.J. Lu <hongjiu.lu@intel.com> wrote: >>>>> Tested on Linux/x86-64. OK for trunk? >>>> >>>>> + /* FIXME: Since the CSE pass may change dominance info, which isn't >>>>> + expected by the fwprop pass, call free_dominance_info to >>>>> + invalidate dominance info. Otherwise, the fwprop pass may crash >>>>> + when dominance info is changed. */ >>>>> + if (TARGET_64BIT) >>>>> + free_dominance_info (CDI_DOMINATORS); >>>>> + >>>> >>>> Please resolve the above problem first, target-dependent sources are >>>> not the place to apply band-aids for middle-end problems. The thread >>>> with the proposed fix died in [1]. >>>> >>>> [1] https://gcc.gnu.org/ml/gcc/2016-03/msg00143.html >>> >>> free_dominance_info (CDI_DOMINATORS) has been called in other >>> places to avoid this middle-end issue. I don't know when the middle-end >>> will be fixed. I don't think this target optimization should be penalized by >>> the middle-end issue. >> >> Let's ask Richard if he is OK with the workaround... >> >> One more thing: >> >> @@ -3551,6 +3874,7 @@ convert_scalars_to_vector () >> basic_block bb; >> bitmap candidates; >> int converted_insns = 0; >> + rtx zero, minus_one; >> >> bitmap_obstack_initialize (NULL); >> candidates = BITMAP_ALLOC (NULL); >> @@ -3585,22 +3909,40 @@ convert_scalars_to_vector () >> if (dump_file) >> fprintf (dump_file, "There are no candidates for optimization.\n"); >> >> + if (TARGET_64BIT) >> + { >> + zero = gen_reg_rtx (V1TImode); >> + minus_one = gen_reg_rtx (V1TImode); >> + } >> + else >> + { >> + zero = NULL_RTX; >> + minus_one = NULL_RTX; >> + } >> + >> >> Do we *really* need to crate registers here? They are only used as a >> temporary in scalar_chain_64::convert_insn, and I think that any CSE >> worth its name should find out when the same immediate is loaded to >> different temporaries. > > I will double check. Last time when I tried, CSE didn't work on this for a > reason. For > > [hjl@gnu-6 pr67400]$ cat z.i > extern void bar (void); > > void * > foo (void) > { > return &bar; > } > [hjl@gnu-6 pr67400]$ gcc -S -fPIC -O2 z.i > [hjl@gnu-6 pr67400]$ > > CSE sees: > > (insn 5 2 6 2 (set (reg:DI 89) > (mem/u/c:DI (const:DI (unspec:DI [ > (symbol_ref:DI ("bar") [flags 0x41] > <function_decl 0x7f24362151b0 bar>) > ] UNSPEC_GOTPCREL)) [1 S8 A8])) z.i:6 89 {*movdi_internal} > (nil)) > (insn 6 5 10 2 (set (reg:DI 87 [ <retval> ]) > (reg:DI 89)) z.i:6 89 {*movdi_internal} > (expr_list:REG_EQUAL (symbol_ref:DI ("bar") [flags 0x41] > <function_decl 0x7f24362151b0 bar>) > (nil))) > > CSE will not change it to > > (insn 6 5 10 2 (set (reg:DI 89 [ <retval> ]) > (reg:DI 89)) z.i:6 89 {*movdi_internal} > (expr_list:REG_EQUAL (symbol_ref:DI ("bar") [flags 0x41] > <function_decl 0x7f24362151b0 bar>) > (nil))) I meant CSE won't change it to insn 5 2 9 2 (set (reg:DI 87 [ <retval> ]) (symbol_ref:DI ("bar") [flags 0x41] <function_decl 0x7f7a734971b0 bar>)) z.i:6 89 {*movdi_internal} (nil)) >> BTW: I'd really like if Ilya can review the functionality of the >> patch, he is an expert in this conversion stuff. >> >> Uros. > > Ilya, can you take a look? > > Thanks. > > -- > H.J.
2016-04-25 18:27 GMT+03:00 H.J. Lu <hjl.tools@gmail.com>: > > Ilya, can you take a look? > > Thanks. > > -- > H.J. Hi, Algorithmic part of the patch looks OK to me except the following piece of code. +/* Check REF's chain to add new insns into a queue + and find registers requiring conversion. */ Comment is wrong because you don't have any conversions required for your candidates. + +void +scalar_chain_64::analyze_register_chain (bitmap candidates, df_ref ref) +{ + df_link *chain; + + gcc_assert (bitmap_bit_p (insns, DF_REF_INSN_UID (ref)) + || bitmap_bit_p (candidates, DF_REF_INSN_UID (ref))); + add_to_queue (DF_REF_INSN_UID (ref)); + + for (chain = DF_REF_CHAIN (ref); chain; chain = chain->next) + { + unsigned uid = DF_REF_INSN_UID (chain->ref); + + if (!NONDEBUG_INSN_P (DF_REF_INSN (chain->ref))) + continue; + + if (!DF_REF_REG_MEM_P (chain->ref)) + continue; I believe here you wrongly jump to the next ref intead of actually adding it to a queue. You may just use gcc_assert (!DF_REF_REG_MEM_P (chain->ref)); because you should'n have a candidate used in address operand. + + if (bitmap_bit_p (insns, uid)) + continue; + + if (bitmap_bit_p (candidates, uid)) + add_to_queue (uid); Probably gcc_assert (bitmap_bit_p (candidates, uid)) since no uses and defs out of candidates list are allowed? + } +} Thanks, Ilya
On Mon, 25 Apr 2016, Uros Bizjak wrote: > On Mon, Apr 25, 2016 at 4:47 PM, H.J. Lu <hjl.tools@gmail.com> wrote: > > On Mon, Apr 25, 2016 at 7:18 AM, Uros Bizjak <ubizjak@gmail.com> wrote: > >> On Mon, Apr 25, 2016 at 2:51 PM, H.J. Lu <hongjiu.lu@intel.com> wrote: > >>> Tested on Linux/x86-64. OK for trunk? > >> > >>> + /* FIXME: Since the CSE pass may change dominance info, which isn't > >>> + expected by the fwprop pass, call free_dominance_info to > >>> + invalidate dominance info. Otherwise, the fwprop pass may crash > >>> + when dominance info is changed. */ > >>> + if (TARGET_64BIT) > >>> + free_dominance_info (CDI_DOMINATORS); > >>> + > >> > >> Please resolve the above problem first, target-dependent sources are > >> not the place to apply band-aids for middle-end problems. The thread > >> with the proposed fix died in [1]. > >> > >> [1] https://gcc.gnu.org/ml/gcc/2016-03/msg00143.html > > > > free_dominance_info (CDI_DOMINATORS) has been called in other > > places to avoid this middle-end issue. I don't know when the middle-end > > will be fixed. I don't think this target optimization should be penalized by > > the middle-end issue. > > Let's ask Richard if he is OK with the workaround... Well, it's ultimately your call (it's a workaround in the target). Of course I'd like to see the underlying issue fixed and the workarounds in "other places" be removed. Richard. > One more thing: > > @@ -3551,6 +3874,7 @@ convert_scalars_to_vector () > basic_block bb; > bitmap candidates; > int converted_insns = 0; > + rtx zero, minus_one; > > bitmap_obstack_initialize (NULL); > candidates = BITMAP_ALLOC (NULL); > @@ -3585,22 +3909,40 @@ convert_scalars_to_vector () > if (dump_file) > fprintf (dump_file, "There are no candidates for optimization.\n"); > > + if (TARGET_64BIT) > + { > + zero = gen_reg_rtx (V1TImode); > + minus_one = gen_reg_rtx (V1TImode); > + } > + else > + { > + zero = NULL_RTX; > + minus_one = NULL_RTX; > + } > + > > Do we *really* need to crate registers here? They are only used as a > temporary in scalar_chain_64::convert_insn, and I think that any CSE > worth its name should find out when the same immediate is loaded to > different temporaries. > > BTW: I'd really like if Ilya can review the functionality of the > patch, he is an expert in this conversion stuff. > > Uros. > >
On Tue, Apr 26, 2016 at 11:17 AM, Richard Biener <rguenther@suse.de> wrote: > On Mon, 25 Apr 2016, Uros Bizjak wrote: > >> On Mon, Apr 25, 2016 at 4:47 PM, H.J. Lu <hjl.tools@gmail.com> wrote: >> > On Mon, Apr 25, 2016 at 7:18 AM, Uros Bizjak <ubizjak@gmail.com> wrote: >> >> On Mon, Apr 25, 2016 at 2:51 PM, H.J. Lu <hongjiu.lu@intel.com> wrote: >> >>> Tested on Linux/x86-64. OK for trunk? >> >> >> >>> + /* FIXME: Since the CSE pass may change dominance info, which isn't >> >>> + expected by the fwprop pass, call free_dominance_info to >> >>> + invalidate dominance info. Otherwise, the fwprop pass may crash >> >>> + when dominance info is changed. */ >> >>> + if (TARGET_64BIT) >> >>> + free_dominance_info (CDI_DOMINATORS); >> >>> + >> >> >> >> Please resolve the above problem first, target-dependent sources are >> >> not the place to apply band-aids for middle-end problems. The thread >> >> with the proposed fix died in [1]. >> >> >> >> [1] https://gcc.gnu.org/ml/gcc/2016-03/msg00143.html >> > >> > free_dominance_info (CDI_DOMINATORS) has been called in other >> > places to avoid this middle-end issue. I don't know when the middle-end >> > will be fixed. I don't think this target optimization should be penalized by >> > the middle-end issue. >> >> Let's ask Richard if he is OK with the workaround... > > Well, it's ultimately your call (it's a workaround in the target). Oh well, ... > Of course I'd like to see the underlying issue fixed and the > workarounds in "other places" be removed. ... then at least a reference to a relevant PR should be added to a FIXME comment. Uros.
On Tue, 26 Apr 2016, Uros Bizjak wrote: > On Tue, Apr 26, 2016 at 11:17 AM, Richard Biener <rguenther@suse.de> wrote: > > On Mon, 25 Apr 2016, Uros Bizjak wrote: > > > >> On Mon, Apr 25, 2016 at 4:47 PM, H.J. Lu <hjl.tools@gmail.com> wrote: > >> > On Mon, Apr 25, 2016 at 7:18 AM, Uros Bizjak <ubizjak@gmail.com> wrote: > >> >> On Mon, Apr 25, 2016 at 2:51 PM, H.J. Lu <hongjiu.lu@intel.com> wrote: > >> >>> Tested on Linux/x86-64. OK for trunk? > >> >> > >> >>> + /* FIXME: Since the CSE pass may change dominance info, which isn't > >> >>> + expected by the fwprop pass, call free_dominance_info to > >> >>> + invalidate dominance info. Otherwise, the fwprop pass may crash > >> >>> + when dominance info is changed. */ > >> >>> + if (TARGET_64BIT) > >> >>> + free_dominance_info (CDI_DOMINATORS); > >> >>> + > >> >> > >> >> Please resolve the above problem first, target-dependent sources are > >> >> not the place to apply band-aids for middle-end problems. The thread > >> >> with the proposed fix died in [1]. > >> >> > >> >> [1] https://gcc.gnu.org/ml/gcc/2016-03/msg00143.html > >> > > >> > free_dominance_info (CDI_DOMINATORS) has been called in other > >> > places to avoid this middle-end issue. I don't know when the middle-end > >> > will be fixed. I don't think this target optimization should be penalized by > >> > the middle-end issue. > >> > >> Let's ask Richard if he is OK with the workaround... > > > > Well, it's ultimately your call (it's a workaround in the target). > > Oh well, ... > > > Of course I'd like to see the underlying issue fixed and the > > workarounds in "other places" be removed. > > ... then at least a reference to a relevant PR should be added to a > FIXME comment. HJ, can you please open a bug with 1) a testcase, 2) a patch to revert the workaround so it shows the ICE and 3) a pointer to the ml thread with your preliminary analysis? The advantage is that with the patch in we have a reproducer at least. Thanks, Richard.
On Mon, Apr 25, 2016 at 9:13 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: > 2016-04-25 18:27 GMT+03:00 H.J. Lu <hjl.tools@gmail.com>: >> >> Ilya, can you take a look? >> >> Thanks. >> >> -- >> H.J. > > Hi, > > Algorithmic part of the patch looks OK to me except the following piece of code. > > +/* Check REF's chain to add new insns into a queue > + and find registers requiring conversion. */ > > Comment is wrong because you don't have any conversions required for > your candidates. I will fix it. > + > +void > +scalar_chain_64::analyze_register_chain (bitmap candidates, df_ref ref) > +{ > + df_link *chain; > + > + gcc_assert (bitmap_bit_p (insns, DF_REF_INSN_UID (ref)) > + || bitmap_bit_p (candidates, DF_REF_INSN_UID (ref))); > + add_to_queue (DF_REF_INSN_UID (ref)); > + > + for (chain = DF_REF_CHAIN (ref); chain; chain = chain->next) > + { > + unsigned uid = DF_REF_INSN_UID (chain->ref); > + > + if (!NONDEBUG_INSN_P (DF_REF_INSN (chain->ref))) > + continue; > + > + if (!DF_REF_REG_MEM_P (chain->ref)) > + continue; > > I believe here you wrongly jump to the next ref intead of actually adding it > to a queue. You may just use > > gcc_assert (!DF_REF_REG_MEM_P (chain->ref)); > > because you should'n have a candidate used in address operand. I will update. > + > + if (bitmap_bit_p (insns, uid)) > + continue; > + > + if (bitmap_bit_p (candidates, uid)) > + add_to_queue (uid); > > Probably gcc_assert (bitmap_bit_p (candidates, uid)) since no uses and defs > out of candidates list are allowed? That would be wrong since there are while (!bitmap_empty_p (queue)) { insn_uid = bitmap_first_set_bit (queue); bitmap_clear_bit (queue, insn_uid); bitmap_clear_bit (candidates, insn_uid); add_insn (candidates, insn_uid); } An instruction is a candidate and the bit is cleared when analyze_register_chain is called.
2016-04-26 17:07 GMT+03:00 H.J. Lu <hjl.tools@gmail.com>: > On Mon, Apr 25, 2016 at 9:13 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: >> 2016-04-25 18:27 GMT+03:00 H.J. Lu <hjl.tools@gmail.com>: >>> >>> Ilya, can you take a look? >>> >>> Thanks. >>> >>> -- >>> H.J. >> >> Hi, >> >> Algorithmic part of the patch looks OK to me except the following piece of code. >> >> +/* Check REF's chain to add new insns into a queue >> + and find registers requiring conversion. */ >> >> Comment is wrong because you don't have any conversions required for >> your candidates. > > I will fix it. > >> + >> +void >> +scalar_chain_64::analyze_register_chain (bitmap candidates, df_ref ref) >> +{ >> + df_link *chain; >> + >> + gcc_assert (bitmap_bit_p (insns, DF_REF_INSN_UID (ref)) >> + || bitmap_bit_p (candidates, DF_REF_INSN_UID (ref))); >> + add_to_queue (DF_REF_INSN_UID (ref)); >> + >> + for (chain = DF_REF_CHAIN (ref); chain; chain = chain->next) >> + { >> + unsigned uid = DF_REF_INSN_UID (chain->ref); >> + >> + if (!NONDEBUG_INSN_P (DF_REF_INSN (chain->ref))) >> + continue; >> + >> + if (!DF_REF_REG_MEM_P (chain->ref)) >> + continue; >> >> I believe here you wrongly jump to the next ref intead of actually adding it >> to a queue. You may just use >> >> gcc_assert (!DF_REF_REG_MEM_P (chain->ref)); >> >> because you should'n have a candidate used in address operand. > > I will update. > >> + >> + if (bitmap_bit_p (insns, uid)) >> + continue; >> + >> + if (bitmap_bit_p (candidates, uid)) >> + add_to_queue (uid); >> >> Probably gcc_assert (bitmap_bit_p (candidates, uid)) since no uses and defs >> out of candidates list are allowed? > > That would be wrong since there are > > while (!bitmap_empty_p (queue)) > { > insn_uid = bitmap_first_set_bit (queue); > bitmap_clear_bit (queue, insn_uid); > bitmap_clear_bit (candidates, insn_uid); > add_insn (candidates, insn_uid); > } > > An instruction is a candidate and the bit is cleared when > analyze_register_chain is called. You clear candidates bit but the first thing you do in add_insn is set insns bit. Thus you should hit: if (bitmap_bit_p (insns, uid)) continue; For handled candidates. Probably it would be more clear if we keep this clear/set pair together? E.g. move bitmap_clear_bit (candidates, insn_uid) to scalar_chain::add_insn. Thanks, Ilya > > -- > H.J.
On Tue, Apr 26, 2016 at 7:15 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: > 2016-04-26 17:07 GMT+03:00 H.J. Lu <hjl.tools@gmail.com>: >> On Mon, Apr 25, 2016 at 9:13 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: >>> 2016-04-25 18:27 GMT+03:00 H.J. Lu <hjl.tools@gmail.com>: >>>> >>>> Ilya, can you take a look? >>>> >>>> Thanks. >>>> >>>> -- >>>> H.J. >>> >>> Hi, >>> >>> Algorithmic part of the patch looks OK to me except the following piece of code. >>> >>> +/* Check REF's chain to add new insns into a queue >>> + and find registers requiring conversion. */ >>> >>> Comment is wrong because you don't have any conversions required for >>> your candidates. >> >> I will fix it. >> >>> + >>> +void >>> +scalar_chain_64::analyze_register_chain (bitmap candidates, df_ref ref) >>> +{ >>> + df_link *chain; >>> + >>> + gcc_assert (bitmap_bit_p (insns, DF_REF_INSN_UID (ref)) >>> + || bitmap_bit_p (candidates, DF_REF_INSN_UID (ref))); >>> + add_to_queue (DF_REF_INSN_UID (ref)); >>> + >>> + for (chain = DF_REF_CHAIN (ref); chain; chain = chain->next) >>> + { >>> + unsigned uid = DF_REF_INSN_UID (chain->ref); >>> + >>> + if (!NONDEBUG_INSN_P (DF_REF_INSN (chain->ref))) >>> + continue; >>> + >>> + if (!DF_REF_REG_MEM_P (chain->ref)) >>> + continue; >>> >>> I believe here you wrongly jump to the next ref intead of actually adding it >>> to a queue. You may just use >>> >>> gcc_assert (!DF_REF_REG_MEM_P (chain->ref)); >>> >>> because you should'n have a candidate used in address operand. >> >> I will update. >> >>> + >>> + if (bitmap_bit_p (insns, uid)) >>> + continue; >>> + >>> + if (bitmap_bit_p (candidates, uid)) >>> + add_to_queue (uid); >>> >>> Probably gcc_assert (bitmap_bit_p (candidates, uid)) since no uses and defs >>> out of candidates list are allowed? >> >> That would be wrong since there are >> >> while (!bitmap_empty_p (queue)) >> { >> insn_uid = bitmap_first_set_bit (queue); >> bitmap_clear_bit (queue, insn_uid); >> bitmap_clear_bit (candidates, insn_uid); >> add_insn (candidates, insn_uid); >> } >> >> An instruction is a candidate and the bit is cleared when >> analyze_register_chain is called. > > You clear candidates bit but the first thing you do in add_insn is set > insns bit. > Thus you should hit: > > if (bitmap_bit_p (insns, uid)) > continue; > > For handled candidates. > > Probably it would be more clear if we keep this clear/set pair > together? E.g. move > bitmap_clear_bit (candidates, insn_uid) to scalar_chain::add_insn. > After we started processing candidates, we only use candidates to check if an instruction is a candidate, not to check if an instruction is NOT a candidate.
2016-04-26 17:55 GMT+03:00 H.J. Lu <hjl.tools@gmail.com>: > On Tue, Apr 26, 2016 at 7:15 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: >> 2016-04-26 17:07 GMT+03:00 H.J. Lu <hjl.tools@gmail.com>: >>> On Mon, Apr 25, 2016 at 9:13 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: >>>> 2016-04-25 18:27 GMT+03:00 H.J. Lu <hjl.tools@gmail.com>: >>>>> >>>>> Ilya, can you take a look? >>>>> >>>>> Thanks. >>>>> >>>>> -- >>>>> H.J. >>>> >>>> Hi, >>>> >>>> Algorithmic part of the patch looks OK to me except the following piece of code. >>>> >>>> +/* Check REF's chain to add new insns into a queue >>>> + and find registers requiring conversion. */ >>>> >>>> Comment is wrong because you don't have any conversions required for >>>> your candidates. >>> >>> I will fix it. >>> >>>> + >>>> +void >>>> +scalar_chain_64::analyze_register_chain (bitmap candidates, df_ref ref) >>>> +{ >>>> + df_link *chain; >>>> + >>>> + gcc_assert (bitmap_bit_p (insns, DF_REF_INSN_UID (ref)) >>>> + || bitmap_bit_p (candidates, DF_REF_INSN_UID (ref))); >>>> + add_to_queue (DF_REF_INSN_UID (ref)); >>>> + >>>> + for (chain = DF_REF_CHAIN (ref); chain; chain = chain->next) >>>> + { >>>> + unsigned uid = DF_REF_INSN_UID (chain->ref); >>>> + >>>> + if (!NONDEBUG_INSN_P (DF_REF_INSN (chain->ref))) >>>> + continue; >>>> + >>>> + if (!DF_REF_REG_MEM_P (chain->ref)) >>>> + continue; >>>> >>>> I believe here you wrongly jump to the next ref intead of actually adding it >>>> to a queue. You may just use >>>> >>>> gcc_assert (!DF_REF_REG_MEM_P (chain->ref)); >>>> >>>> because you should'n have a candidate used in address operand. >>> >>> I will update. >>> >>>> + >>>> + if (bitmap_bit_p (insns, uid)) >>>> + continue; >>>> + >>>> + if (bitmap_bit_p (candidates, uid)) >>>> + add_to_queue (uid); >>>> >>>> Probably gcc_assert (bitmap_bit_p (candidates, uid)) since no uses and defs >>>> out of candidates list are allowed? >>> >>> That would be wrong since there are >>> >>> while (!bitmap_empty_p (queue)) >>> { >>> insn_uid = bitmap_first_set_bit (queue); >>> bitmap_clear_bit (queue, insn_uid); >>> bitmap_clear_bit (candidates, insn_uid); >>> add_insn (candidates, insn_uid); >>> } >>> >>> An instruction is a candidate and the bit is cleared when >>> analyze_register_chain is called. >> >> You clear candidates bit but the first thing you do in add_insn is set >> insns bit. >> Thus you should hit: >> >> if (bitmap_bit_p (insns, uid)) >> continue; >> >> For handled candidates. >> >> Probably it would be more clear if we keep this clear/set pair >> together? E.g. move >> bitmap_clear_bit (candidates, insn_uid) to scalar_chain::add_insn. >> > > After we started processing candidates, we only use candidates > to check if an instruction is a candidate, not to check if an > instruction is NOT a candidate. I don't see how it's related to what I said. My point is that when you analyze added insn you shouldn't reach insns which are both not in candidates and not in current scalar_chain_64. That's why I think you miss an assert in scalar_chain_64::analyze_register_chain. Thanks, Ilya > > -- > H.J.
On 04/26/2016 03:17 AM, Richard Biener wrote: > On Mon, 25 Apr 2016, Uros Bizjak wrote: > >> On Mon, Apr 25, 2016 at 4:47 PM, H.J. Lu <hjl.tools@gmail.com> wrote: >>> On Mon, Apr 25, 2016 at 7:18 AM, Uros Bizjak <ubizjak@gmail.com> wrote: >>>> On Mon, Apr 25, 2016 at 2:51 PM, H.J. Lu <hongjiu.lu@intel.com> wrote: >>>>> Tested on Linux/x86-64. OK for trunk? >>>> >>>>> + /* FIXME: Since the CSE pass may change dominance info, which isn't >>>>> + expected by the fwprop pass, call free_dominance_info to >>>>> + invalidate dominance info. Otherwise, the fwprop pass may crash >>>>> + when dominance info is changed. */ >>>>> + if (TARGET_64BIT) >>>>> + free_dominance_info (CDI_DOMINATORS); >>>>> + >>>> >>>> Please resolve the above problem first, target-dependent sources are >>>> not the place to apply band-aids for middle-end problems. The thread >>>> with the proposed fix died in [1]. >>>> >>>> [1] https://gcc.gnu.org/ml/gcc/2016-03/msg00143.html >>> >>> free_dominance_info (CDI_DOMINATORS) has been called in other >>> places to avoid this middle-end issue. I don't know when the middle-end >>> will be fixed. I don't think this target optimization should be penalized by >>> the middle-end issue. >> >> Let's ask Richard if he is OK with the workaround... > > Well, it's ultimately your call (it's a workaround in the target). > > Of course I'd like to see the underlying issue fixed and the > workarounds in "other places" be removed. Agreed. It's fundamentally papering over a problem elsewhere. jeff
On Tue, Apr 26, 2016 at 8:05 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: > 2016-04-26 17:55 GMT+03:00 H.J. Lu <hjl.tools@gmail.com>: >> On Tue, Apr 26, 2016 at 7:15 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: >>> 2016-04-26 17:07 GMT+03:00 H.J. Lu <hjl.tools@gmail.com>: >>>> On Mon, Apr 25, 2016 at 9:13 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: >>>>> 2016-04-25 18:27 GMT+03:00 H.J. Lu <hjl.tools@gmail.com>: >>>>>> >>>>>> Ilya, can you take a look? >>>>>> >>>>>> Thanks. >>>>>> >>>>>> -- >>>>>> H.J. >>>>> >>>>> Hi, >>>>> >>>>> Algorithmic part of the patch looks OK to me except the following piece of code. >>>>> >>>>> +/* Check REF's chain to add new insns into a queue >>>>> + and find registers requiring conversion. */ >>>>> >>>>> Comment is wrong because you don't have any conversions required for >>>>> your candidates. >>>> >>>> I will fix it. >>>> >>>>> + >>>>> +void >>>>> +scalar_chain_64::analyze_register_chain (bitmap candidates, df_ref ref) >>>>> +{ >>>>> + df_link *chain; >>>>> + >>>>> + gcc_assert (bitmap_bit_p (insns, DF_REF_INSN_UID (ref)) >>>>> + || bitmap_bit_p (candidates, DF_REF_INSN_UID (ref))); >>>>> + add_to_queue (DF_REF_INSN_UID (ref)); >>>>> + >>>>> + for (chain = DF_REF_CHAIN (ref); chain; chain = chain->next) >>>>> + { >>>>> + unsigned uid = DF_REF_INSN_UID (chain->ref); >>>>> + >>>>> + if (!NONDEBUG_INSN_P (DF_REF_INSN (chain->ref))) >>>>> + continue; >>>>> + >>>>> + if (!DF_REF_REG_MEM_P (chain->ref)) >>>>> + continue; >>>>> >>>>> I believe here you wrongly jump to the next ref intead of actually adding it >>>>> to a queue. You may just use >>>>> >>>>> gcc_assert (!DF_REF_REG_MEM_P (chain->ref)); >>>>> >>>>> because you should'n have a candidate used in address operand. >>>> >>>> I will update. >>>> >>>>> + >>>>> + if (bitmap_bit_p (insns, uid)) >>>>> + continue; >>>>> + >>>>> + if (bitmap_bit_p (candidates, uid)) >>>>> + add_to_queue (uid); >>>>> >>>>> Probably gcc_assert (bitmap_bit_p (candidates, uid)) since no uses and defs >>>>> out of candidates list are allowed? >>>> >>>> That would be wrong since there are >>>> >>>> while (!bitmap_empty_p (queue)) >>>> { >>>> insn_uid = bitmap_first_set_bit (queue); >>>> bitmap_clear_bit (queue, insn_uid); >>>> bitmap_clear_bit (candidates, insn_uid); >>>> add_insn (candidates, insn_uid); >>>> } >>>> >>>> An instruction is a candidate and the bit is cleared when >>>> analyze_register_chain is called. >>> >>> You clear candidates bit but the first thing you do in add_insn is set >>> insns bit. >>> Thus you should hit: >>> >>> if (bitmap_bit_p (insns, uid)) >>> continue; >>> >>> For handled candidates. >>> >>> Probably it would be more clear if we keep this clear/set pair >>> together? E.g. move >>> bitmap_clear_bit (candidates, insn_uid) to scalar_chain::add_insn. >>> >> >> After we started processing candidates, we only use candidates >> to check if an instruction is a candidate, not to check if an >> instruction is NOT a candidate. > > I don't see how it's related to what I said. My point is that > when you analyze added insn you shouldn't reach insns which are both > not in candidates and not in current scalar_chain_64. That's why I > think you miss an assert in scalar_chain_64::analyze_register_chain. Since all candidates will be processed by while (!bitmap_empty_p (queue)) { insn_uid = bitmap_first_set_bit (queue); bitmap_clear_bit (queue, insn_uid); bitmap_clear_bit (candidates, insn_uid); add_insn (candidates, insn_uid); } I will change to /* Check REF's chain to add new insns into a queue. */ void timode_scalar_chain::analyze_register_chain (bitmap candidates, df_ref ref) { gcc_assert (bitmap_bit_p (insns, DF_REF_INSN_UID (ref)) || bitmap_bit_p (candidates, DF_REF_INSN_UID (ref))); add_to_queue (DF_REF_INSN_UID (ref)); }
On Tue, Apr 26, 2016 at 2:35 AM, Richard Biener <rguenther@suse.de> wrote: > On Tue, 26 Apr 2016, Uros Bizjak wrote: > >> On Tue, Apr 26, 2016 at 11:17 AM, Richard Biener <rguenther@suse.de> wrote: >> > On Mon, 25 Apr 2016, Uros Bizjak wrote: >> > >> >> On Mon, Apr 25, 2016 at 4:47 PM, H.J. Lu <hjl.tools@gmail.com> wrote: >> >> > On Mon, Apr 25, 2016 at 7:18 AM, Uros Bizjak <ubizjak@gmail.com> wrote: >> >> >> On Mon, Apr 25, 2016 at 2:51 PM, H.J. Lu <hongjiu.lu@intel.com> wrote: >> >> >>> Tested on Linux/x86-64. OK for trunk? >> >> >> >> >> >>> + /* FIXME: Since the CSE pass may change dominance info, which isn't >> >> >>> + expected by the fwprop pass, call free_dominance_info to >> >> >>> + invalidate dominance info. Otherwise, the fwprop pass may crash >> >> >>> + when dominance info is changed. */ >> >> >>> + if (TARGET_64BIT) >> >> >>> + free_dominance_info (CDI_DOMINATORS); >> >> >>> + >> >> >> >> >> >> Please resolve the above problem first, target-dependent sources are >> >> >> not the place to apply band-aids for middle-end problems. The thread >> >> >> with the proposed fix died in [1]. >> >> >> >> >> >> [1] https://gcc.gnu.org/ml/gcc/2016-03/msg00143.html >> >> > >> >> > free_dominance_info (CDI_DOMINATORS) has been called in other >> >> > places to avoid this middle-end issue. I don't know when the middle-end >> >> > will be fixed. I don't think this target optimization should be penalized by >> >> > the middle-end issue. >> >> >> >> Let's ask Richard if he is OK with the workaround... >> > >> > Well, it's ultimately your call (it's a workaround in the target). >> >> Oh well, ... >> >> > Of course I'd like to see the underlying issue fixed and the >> > workarounds in "other places" be removed. >> >> ... then at least a reference to a relevant PR should be added to a >> FIXME comment. > > HJ, can you please open a bug with 1) a testcase, 2) a patch to revert > the workaround so it shows the ICE and 3) a pointer to the ml thread > with your preliminary analysis? https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70807
2016-04-26 18:12 GMT+03:00 H.J. Lu <hjl.tools@gmail.com>: > On Tue, Apr 26, 2016 at 8:05 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: >> 2016-04-26 17:55 GMT+03:00 H.J. Lu <hjl.tools@gmail.com>: >>> On Tue, Apr 26, 2016 at 7:15 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: >>>> 2016-04-26 17:07 GMT+03:00 H.J. Lu <hjl.tools@gmail.com>: >>>>> On Mon, Apr 25, 2016 at 9:13 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: >>>>>> 2016-04-25 18:27 GMT+03:00 H.J. Lu <hjl.tools@gmail.com>: >>>>>>> >>>>>>> Ilya, can you take a look? >>>>>>> >>>>>>> Thanks. >>>>>>> >>>>>>> -- >>>>>>> H.J. >>>>>> >>>>>> Hi, >>>>>> >>>>>> Algorithmic part of the patch looks OK to me except the following piece of code. >>>>>> >>>>>> +/* Check REF's chain to add new insns into a queue >>>>>> + and find registers requiring conversion. */ >>>>>> >>>>>> Comment is wrong because you don't have any conversions required for >>>>>> your candidates. >>>>> >>>>> I will fix it. >>>>> >>>>>> + >>>>>> +void >>>>>> +scalar_chain_64::analyze_register_chain (bitmap candidates, df_ref ref) >>>>>> +{ >>>>>> + df_link *chain; >>>>>> + >>>>>> + gcc_assert (bitmap_bit_p (insns, DF_REF_INSN_UID (ref)) >>>>>> + || bitmap_bit_p (candidates, DF_REF_INSN_UID (ref))); >>>>>> + add_to_queue (DF_REF_INSN_UID (ref)); >>>>>> + >>>>>> + for (chain = DF_REF_CHAIN (ref); chain; chain = chain->next) >>>>>> + { >>>>>> + unsigned uid = DF_REF_INSN_UID (chain->ref); >>>>>> + >>>>>> + if (!NONDEBUG_INSN_P (DF_REF_INSN (chain->ref))) >>>>>> + continue; >>>>>> + >>>>>> + if (!DF_REF_REG_MEM_P (chain->ref)) >>>>>> + continue; >>>>>> >>>>>> I believe here you wrongly jump to the next ref intead of actually adding it >>>>>> to a queue. You may just use >>>>>> >>>>>> gcc_assert (!DF_REF_REG_MEM_P (chain->ref)); >>>>>> >>>>>> because you should'n have a candidate used in address operand. >>>>> >>>>> I will update. >>>>> >>>>>> + >>>>>> + if (bitmap_bit_p (insns, uid)) >>>>>> + continue; >>>>>> + >>>>>> + if (bitmap_bit_p (candidates, uid)) >>>>>> + add_to_queue (uid); >>>>>> >>>>>> Probably gcc_assert (bitmap_bit_p (candidates, uid)) since no uses and defs >>>>>> out of candidates list are allowed? >>>>> >>>>> That would be wrong since there are >>>>> >>>>> while (!bitmap_empty_p (queue)) >>>>> { >>>>> insn_uid = bitmap_first_set_bit (queue); >>>>> bitmap_clear_bit (queue, insn_uid); >>>>> bitmap_clear_bit (candidates, insn_uid); >>>>> add_insn (candidates, insn_uid); >>>>> } >>>>> >>>>> An instruction is a candidate and the bit is cleared when >>>>> analyze_register_chain is called. >>>> >>>> You clear candidates bit but the first thing you do in add_insn is set >>>> insns bit. >>>> Thus you should hit: >>>> >>>> if (bitmap_bit_p (insns, uid)) >>>> continue; >>>> >>>> For handled candidates. >>>> >>>> Probably it would be more clear if we keep this clear/set pair >>>> together? E.g. move >>>> bitmap_clear_bit (candidates, insn_uid) to scalar_chain::add_insn. >>>> >>> >>> After we started processing candidates, we only use candidates >>> to check if an instruction is a candidate, not to check if an >>> instruction is NOT a candidate. >> >> I don't see how it's related to what I said. My point is that >> when you analyze added insn you shouldn't reach insns which are both >> not in candidates and not in current scalar_chain_64. That's why I >> think you miss an assert in scalar_chain_64::analyze_register_chain. > > Since all candidates will be processed by > > while (!bitmap_empty_p (queue)) > { > insn_uid = bitmap_first_set_bit (queue); > bitmap_clear_bit (queue, insn_uid); > bitmap_clear_bit (candidates, insn_uid); > add_insn (candidates, insn_uid); > } This process only queue, not all candidates. analyze_register_chain fills queue bitmap to build a chain. > > I will change to > > /* Check REF's chain to add new insns into a queue. */ > > void > timode_scalar_chain::analyze_register_chain (bitmap candidates, > df_ref ref) > { > gcc_assert (bitmap_bit_p (insns, DF_REF_INSN_UID (ref)) > || bitmap_bit_p (candidates, DF_REF_INSN_UID (ref))); > add_to_queue (DF_REF_INSN_UID (ref)); > } > The purpose of analyze_register_chain is to collect all register defs or uses into chain's queue. You can't just remove a loop over ref's chain then. Thanks, Ilya > > > -- > H.J.
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index a061b2f..c5296ae 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -2769,11 +2769,10 @@ convertible_comparison_p (rtx_insn *insn) return true; } -/* Return 1 if INSN may be converted into vector - instruction. */ +/* The 32-bit version of scalar_to_vector_candidate_p. */ static bool -scalar_to_vector_candidate_p (rtx_insn *insn) +scalar_to_vector_candidate_p_32 (rtx_insn *insn) { rtx def_set = single_set (insn); @@ -2833,16 +2832,79 @@ scalar_to_vector_candidate_p (rtx_insn *insn) return true; } -/* For a given bitmap of insn UIDs scans all instruction and - remove insn from CANDIDATES in case it has both convertible - and not convertible definitions. +/* The 64-bit version of scalar_to_vector_candidate_p. */ - All insns in a bitmap are conversion candidates according to - scalar_to_vector_candidate_p. Currently it implies all insns - are single_set. */ +static bool +scalar_to_vector_candidate_p_64 (rtx_insn *insn) +{ + rtx def_set = single_set (insn); + + if (!def_set) + return false; + + if (has_non_address_hard_reg (insn)) + return false; + + rtx src = SET_SRC (def_set); + rtx dst = SET_DEST (def_set); + + /* Only TImode load and store are allowed. */ + if (GET_MODE (dst) != TImode) + return false; + + if (MEM_P (dst)) + { + /* Check for store. Only support store from register or standard + SSE constants. */ + switch (GET_CODE (src)) + { + default: + return false; + + case REG: + /* For store from register, memory must be aligned or both + unaligned load and store are optimal. */ + return (!misaligned_operand (dst, TImode) + || (TARGET_SSE_UNALIGNED_LOAD_OPTIMAL + && TARGET_SSE_UNALIGNED_STORE_OPTIMAL)); + + case CONST_INT: + /* For store from standard SSE constant, memory must be + aligned or unaligned store is optimal. */ + return (standard_sse_constant_p (src, TImode) + && (!misaligned_operand (dst, TImode) + || TARGET_SSE_UNALIGNED_STORE_OPTIMAL)); + } + } + else if (MEM_P (src)) + { + /* Check for load. Memory must be aligned or both unaligned + load and store are optimal. */ + return (GET_CODE (dst) == REG + && (!misaligned_operand (src, TImode) + || (TARGET_SSE_UNALIGNED_LOAD_OPTIMAL + && TARGET_SSE_UNALIGNED_STORE_OPTIMAL))); + } + + return false; +} + +/* Return 1 if INSN may be converted into vector + instruction. */ + +static bool +scalar_to_vector_candidate_p (rtx_insn *insn) +{ + if (TARGET_64BIT) + return scalar_to_vector_candidate_p_64 (insn); + else + return scalar_to_vector_candidate_p_32 (insn); +} + +/* The 32-bit version of remove_non_convertible_regs. */ static void -remove_non_convertible_regs (bitmap candidates) +remove_non_convertible_regs_32 (bitmap candidates) { bitmap_iterator bi; unsigned id; @@ -2893,11 +2955,130 @@ remove_non_convertible_regs (bitmap candidates) BITMAP_FREE (regs); } +/* For a register REGNO, scan instructions for its defs and uses. + Put REGNO in REGS if a def or use isn't in CANDIDATES. */ + +static void +check_non_convertible_regs_64 (bitmap candidates, bitmap regs, + unsigned int regno) +{ + for (df_ref def = DF_REG_DEF_CHAIN (regno); + def; + def = DF_REF_NEXT_REG (def)) + { + if (!bitmap_bit_p (candidates, DF_REF_INSN_UID (def))) + { + if (dump_file) + fprintf (dump_file, + "r%d has non convertible def in insn %d\n", + regno, DF_REF_INSN_UID (def)); + + bitmap_set_bit (regs, regno); + break; + } + } + + for (df_ref ref = DF_REG_USE_CHAIN (regno); + ref; + ref = DF_REF_NEXT_REG (ref)) + { + /* Debug instructions are skipped. */ + if (NONDEBUG_INSN_P (DF_REF_INSN (ref)) + && !bitmap_bit_p (candidates, DF_REF_INSN_UID (ref))) + { + if (dump_file) + fprintf (dump_file, + "r%d has non convertible use in insn %d\n", + regno, DF_REF_INSN_UID (ref)); + + bitmap_set_bit (regs, regno); + break; + } + } +} + +/* The 64-bit version of remove_non_convertible_regs. */ + +static void +remove_non_convertible_regs_64 (bitmap candidates) +{ + bitmap_iterator bi; + unsigned id; + bitmap regs = BITMAP_ALLOC (NULL); + + EXECUTE_IF_SET_IN_BITMAP (candidates, 0, id, bi) + { + rtx def_set = single_set (DF_INSN_UID_GET (id)->insn); + rtx dest = SET_DEST (def_set); + rtx src = SET_SRC (def_set); + + if ((!REG_P (dest) + || bitmap_bit_p (regs, REGNO (dest)) + || HARD_REGISTER_P (dest)) + && (!REG_P (src) + || bitmap_bit_p (regs, REGNO (src)) + || HARD_REGISTER_P (src))) + continue; + + if (REG_P (dest)) + check_non_convertible_regs_64 (candidates, regs, REGNO (dest)); + + if (REG_P (src)) + check_non_convertible_regs_64 (candidates, regs, REGNO (src)); + } + + EXECUTE_IF_SET_IN_BITMAP (regs, 0, id, bi) + { + for (df_ref def = DF_REG_DEF_CHAIN (id); + def; + def = DF_REF_NEXT_REG (def)) + if (bitmap_bit_p (candidates, DF_REF_INSN_UID (def))) + { + if (dump_file) + fprintf (dump_file, "Removing insn %d from candidates list\n", + DF_REF_INSN_UID (def)); + + bitmap_clear_bit (candidates, DF_REF_INSN_UID (def)); + } + + for (df_ref ref = DF_REG_USE_CHAIN (id); + ref; + ref = DF_REF_NEXT_REG (ref)) + if (bitmap_bit_p (candidates, DF_REF_INSN_UID (ref))) + { + if (dump_file) + fprintf (dump_file, "Removing insn %d from candidates list\n", + DF_REF_INSN_UID (ref)); + + bitmap_clear_bit (candidates, DF_REF_INSN_UID (ref)); + } + } + + BITMAP_FREE (regs); +} + +/* For a given bitmap of insn UIDs scans all instruction and + remove insn from CANDIDATES in case it has both convertible + and not convertible definitions. + + All insns in a bitmap are conversion candidates according to + scalar_to_vector_candidate_p. Currently it implies all insns + are single_set. */ + +static void +remove_non_convertible_regs (bitmap candidates) +{ + if (TARGET_64BIT) + remove_non_convertible_regs_64 (candidates); + else + remove_non_convertible_regs_32 (candidates); +} + class scalar_chain { public: scalar_chain (); - ~scalar_chain (); + virtual ~scalar_chain (); static unsigned max_id; @@ -2913,21 +3094,55 @@ class scalar_chain bitmap defs_conv; void build (bitmap candidates, unsigned insn_uid); - int compute_convert_gain (); + virtual int compute_convert_gain () = 0; int convert (); + protected: + void add_to_queue (unsigned insn_uid); + void emit_conversion_insns (rtx insns, rtx_insn *pos); + private: void add_insn (bitmap candidates, unsigned insn_uid); - void add_to_queue (unsigned insn_uid); + virtual void convert_insn (rtx_insn *insn) = 0; + virtual void convert_registers () = 0; + virtual void analyze_register_chain (bitmap candidates, df_ref ref) = 0; +}; + +class scalar_chain_32 : public scalar_chain +{ + public: + int compute_convert_gain (); + private: void mark_dual_mode_def (df_ref def); void analyze_register_chain (bitmap candidates, df_ref ref); rtx replace_with_subreg (rtx x, rtx reg, rtx subreg); - void emit_conversion_insns (rtx insns, rtx_insn *pos); void replace_with_subreg_in_insn (rtx_insn *insn, rtx reg, rtx subreg); void convert_insn (rtx_insn *insn); void convert_op (rtx *op, rtx_insn *insn); void convert_reg (unsigned regno); void make_vector_copies (unsigned regno); + void convert_registers (); +}; + +class scalar_chain_64 : public scalar_chain +{ + public: + scalar_chain_64 (rtx r0, rtx r1) + : scalar_chain (), zero (r0), minus_one (r1) { } + + /* Convert from TImode to V1TImode is always faster. */ + int compute_convert_gain () { return 1; } + + private: + void analyze_register_chain (bitmap candidates, df_ref ref); + void convert_insn (rtx_insn *insn); + /* We don't convert registers to difference size. */ + void convert_registers () {} + + /* Use one scatch register for loading CONST0_RTX and one for loading + CONSTM1_RTX so that they can be CSEed. */ + rtx zero; + rtx minus_one; }; unsigned scalar_chain::max_id = 0; @@ -2976,7 +3191,7 @@ scalar_chain::add_to_queue (unsigned insn_uid) /* Mark register defined by DEF as requiring conversion. */ void -scalar_chain::mark_dual_mode_def (df_ref def) +scalar_chain_32::mark_dual_mode_def (df_ref def) { gcc_assert (DF_REF_REG_DEF_P (def)); @@ -2995,7 +3210,7 @@ scalar_chain::mark_dual_mode_def (df_ref def) and find registers requiring conversion. */ void -scalar_chain::analyze_register_chain (bitmap candidates, df_ref ref) +scalar_chain_32::analyze_register_chain (bitmap candidates, df_ref ref) { df_link *chain; @@ -3039,6 +3254,36 @@ scalar_chain::analyze_register_chain (bitmap candidates, df_ref ref) } } +/* Check REF's chain to add new insns into a queue + and find registers requiring conversion. */ + +void +scalar_chain_64::analyze_register_chain (bitmap candidates, df_ref ref) +{ + df_link *chain; + + gcc_assert (bitmap_bit_p (insns, DF_REF_INSN_UID (ref)) + || bitmap_bit_p (candidates, DF_REF_INSN_UID (ref))); + add_to_queue (DF_REF_INSN_UID (ref)); + + for (chain = DF_REF_CHAIN (ref); chain; chain = chain->next) + { + unsigned uid = DF_REF_INSN_UID (chain->ref); + + if (!NONDEBUG_INSN_P (DF_REF_INSN (chain->ref))) + continue; + + if (!DF_REF_REG_MEM_P (chain->ref)) + continue; + + if (bitmap_bit_p (insns, uid)) + continue; + + if (bitmap_bit_p (candidates, uid)) + add_to_queue (uid); + } +} + /* Add instruction into a chain. */ void @@ -3117,7 +3362,7 @@ scalar_chain::build (bitmap candidates, unsigned insn_uid) /* Compute a gain for chain conversion. */ int -scalar_chain::compute_convert_gain () +scalar_chain_32::compute_convert_gain () { bitmap_iterator bi; unsigned insn_uid; @@ -3174,7 +3419,7 @@ scalar_chain::compute_convert_gain () /* Replace REG in X with a V2DI subreg of NEW_REG. */ rtx -scalar_chain::replace_with_subreg (rtx x, rtx reg, rtx new_reg) +scalar_chain_32::replace_with_subreg (rtx x, rtx reg, rtx new_reg) { if (x == reg) return gen_rtx_SUBREG (V2DImode, new_reg, 0); @@ -3197,7 +3442,7 @@ scalar_chain::replace_with_subreg (rtx x, rtx reg, rtx new_reg) /* Replace REG in INSN with a V2DI subreg of NEW_REG. */ void -scalar_chain::replace_with_subreg_in_insn (rtx_insn *insn, rtx reg, rtx new_reg) +scalar_chain_32::replace_with_subreg_in_insn (rtx_insn *insn, rtx reg, rtx new_reg) { replace_with_subreg (single_set (insn), reg, new_reg); } @@ -3227,7 +3472,7 @@ scalar_chain::emit_conversion_insns (rtx insns, rtx_insn *after) and replace its uses in a chain. */ void -scalar_chain::make_vector_copies (unsigned regno) +scalar_chain_32::make_vector_copies (unsigned regno) { rtx reg = regno_reg_rtx[regno]; rtx vreg = gen_reg_rtx (DImode); @@ -3298,7 +3543,7 @@ scalar_chain::make_vector_copies (unsigned regno) in case register is used in not convertible insn. */ void -scalar_chain::convert_reg (unsigned regno) +scalar_chain_32::convert_reg (unsigned regno) { bool scalar_copy = bitmap_bit_p (defs_conv, regno); rtx reg = regno_reg_rtx[regno]; @@ -3390,7 +3635,7 @@ scalar_chain::convert_reg (unsigned regno) registers conversion. */ void -scalar_chain::convert_op (rtx *op, rtx_insn *insn) +scalar_chain_32::convert_op (rtx *op, rtx_insn *insn) { *op = copy_rtx_if_shared (*op); @@ -3434,7 +3679,7 @@ scalar_chain::convert_op (rtx *op, rtx_insn *insn) /* Convert INSN to vector mode. */ void -scalar_chain::convert_insn (rtx_insn *insn) +scalar_chain_32::convert_insn (rtx_insn *insn) { rtx def_set = single_set (insn); rtx src = SET_SRC (def_set); @@ -3511,6 +3756,88 @@ scalar_chain::convert_insn (rtx_insn *insn) df_insn_rescan (insn); } +/* Convert INSN from TImode to V1T1mode. */ + +void +scalar_chain_64::convert_insn (rtx_insn *insn) +{ + rtx def_set = single_set (insn); + rtx src = SET_SRC (def_set); + rtx tmp; + rtx dst = SET_DEST (def_set); + + switch (GET_CODE (dst)) + { + case REG: + tmp = find_reg_equal_equiv_note (insn); + if (tmp) + PUT_MODE (XEXP (tmp, 0), V1TImode); + case MEM: + PUT_MODE (dst, V1TImode); + break; + + default: + gcc_unreachable (); + } + + switch (GET_CODE (src)) + { + case REG: + case MEM: + PUT_MODE (src, V1TImode); + break; + + case CONST_INT: + switch (standard_sse_constant_p (src, TImode)) + { + case 1: + src = CONST0_RTX (GET_MODE (dst)); + tmp = zero; + break; + case 2: + src = CONSTM1_RTX (GET_MODE (dst)); + tmp = minus_one; + break; + default: + gcc_unreachable (); + } + if (NONDEBUG_INSN_P (insn)) + { + /* Since there are no instructions to store standard SSE + constant, temporary register usage is required. */ + emit_conversion_insns (gen_rtx_SET (dst, tmp), insn); + dst = tmp; + } + break; + + default: + gcc_unreachable (); + } + + SET_SRC (def_set) = src; + SET_DEST (def_set) = dst; + + /* Drop possible dead definitions. */ + PATTERN (insn) = def_set; + + INSN_CODE (insn) = -1; + recog_memoized (insn); + df_insn_rescan (insn); +} + +void +scalar_chain_32::convert_registers () +{ + bitmap_iterator bi; + unsigned id; + + EXECUTE_IF_SET_IN_BITMAP (defs, 0, id, bi) + convert_reg (id); + + EXECUTE_IF_AND_COMPL_IN_BITMAP (defs_conv, defs, 0, id, bi) + make_vector_copies (id); +} + /* Convert whole chain creating required register conversions and copies. */ @@ -3527,11 +3854,7 @@ scalar_chain::convert () if (dump_file) fprintf (dump_file, "Converting chain #%d...\n", chain_id); - EXECUTE_IF_SET_IN_BITMAP (defs, 0, id, bi) - convert_reg (id); - - EXECUTE_IF_AND_COMPL_IN_BITMAP (defs_conv, defs, 0, id, bi) - make_vector_copies (id); + convert_registers (); EXECUTE_IF_SET_IN_BITMAP (insns, 0, id, bi) { @@ -3551,6 +3874,7 @@ convert_scalars_to_vector () basic_block bb; bitmap candidates; int converted_insns = 0; + rtx zero, minus_one; bitmap_obstack_initialize (NULL); candidates = BITMAP_ALLOC (NULL); @@ -3585,22 +3909,40 @@ convert_scalars_to_vector () if (dump_file) fprintf (dump_file, "There are no candidates for optimization.\n"); + if (TARGET_64BIT) + { + zero = gen_reg_rtx (V1TImode); + minus_one = gen_reg_rtx (V1TImode); + } + else + { + zero = NULL_RTX; + minus_one = NULL_RTX; + } + while (!bitmap_empty_p (candidates)) { unsigned uid = bitmap_first_set_bit (candidates); - scalar_chain chain; + scalar_chain *chain; + + if (TARGET_64BIT) + chain = new scalar_chain_64 (zero, minus_one); + else + chain = new scalar_chain_32; /* Find instructions chain we want to convert to vector mode. Check all uses and definitions to estimate all required conversions. */ - chain.build (candidates, uid); + chain->build (candidates, uid); - if (chain.compute_convert_gain () > 0) - converted_insns += chain.convert (); + if (chain->compute_convert_gain () > 0) + converted_insns += chain->convert (); else if (dump_file) fprintf (dump_file, "Chain #%d conversion is not profitable\n", - chain.chain_id); + chain->chain_id); + + delete chain; } if (dump_file) @@ -3610,6 +3952,13 @@ convert_scalars_to_vector () bitmap_obstack_release (NULL); df_process_deferred_rescans (); + /* FIXME: Since the CSE pass may change dominance info, which isn't + expected by the fwprop pass, call free_dominance_info to + invalidate dominance info. Otherwise, the fwprop pass may crash + when dominance info is changed. */ + if (TARGET_64BIT) + free_dominance_info (CDI_DOMINATORS); + /* Conversion means we may have 128bit register spills/fills which require aligned stack. */ if (converted_insns) @@ -3683,7 +4032,7 @@ public: /* opt_pass methods: */ virtual bool gate (function *) { - return !TARGET_64BIT && TARGET_STV && TARGET_SSE2 && optimize > 1; + return TARGET_STV && TARGET_SSE2 && optimize > 1; } virtual unsigned int execute (function *) @@ -5596,17 +5945,23 @@ ix86_option_override (void) 1, PASS_POS_INSERT_AFTER }; opt_pass *pass_stv = make_pass_stv (g); - struct register_pass_info stv_info + struct register_pass_info stv_info_32 = { pass_stv, "combine", 1, PASS_POS_INSERT_AFTER }; + /* Run the 64-bit STV pass before the CSE pass so that CONST0_RTX and + CONSTM1_RTX generated by the STV pass can be CSEed. */ + struct register_pass_info stv_info_64 + = { pass_stv, "cse2", + 1, PASS_POS_INSERT_BEFORE + }; ix86_option_override_internal (true, &global_options, &global_options_set); /* This needs to be done at start up. It's convenient to do it here. */ register_pass (&insert_vzeroupper_info); - register_pass (&stv_info); + register_pass (TARGET_64BIT ? &stv_info_64 : &stv_info_32); } /* Implement the TARGET_OFFLOAD_OPTIONS hook. */ diff --git a/gcc/testsuite/gcc.target/i386/pr55247-2.c b/gcc/testsuite/gcc.target/i386/pr55247-2.c index 6b5b36d..77901af 100644 --- a/gcc/testsuite/gcc.target/i386/pr55247-2.c +++ b/gcc/testsuite/gcc.target/i386/pr55247-2.c @@ -1,6 +1,6 @@ /* { dg-do compile { target { ! ia32 } } } */ /* { dg-require-effective-target maybe_x32 } */ -/* { dg-options "-O2 -mx32 -mtune=generic -maddress-mode=long" } */ +/* { dg-options "-O2 -mx32 -mtune=generic -maddress-mode=long -dp" } */ typedef unsigned int uint32_t; typedef uint32_t Elf32_Word; @@ -34,4 +34,5 @@ _dl_profile_fixup (struct link_map *l, Elf32_Word reloc_arg) symbind32 (&sym); } -/* { dg-final { scan-assembler-not "%xmm\[0-9\]" } } */ +/* { dg-final { scan-assembler-times "movv1ti_internal" 2 } } */ +/* { dg-final { scan-assembler-not "movti_internal" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr70155-1.c b/gcc/testsuite/gcc.target/i386/pr70155-1.c new file mode 100644 index 0000000..3500364 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr70155-1.c @@ -0,0 +1,13 @@ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O2 -msse2 -mtune=generic -dp" } */ + +extern __int128 a, b; + +void +foo (void) +{ + a = b; +} + +/* { dg-final { scan-assembler-times "movv1ti_internal" 2 } } */ +/* { dg-final { scan-assembler-not "\\*movdi_internal" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr70155-10.c b/gcc/testsuite/gcc.target/i386/pr70155-10.c new file mode 100644 index 0000000..2d0b91f --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr70155-10.c @@ -0,0 +1,19 @@ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O2 -msse2 -mtune=core2 -dp" } */ + +extern __int128 a; + +struct foo +{ + __int128 i; +}__attribute__ ((packed)); + +extern struct foo x; + +void +foo (void) +{ + a = x.i; +} + +/* { dg-final { scan-assembler-not "movv1ti_internal" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr70155-11.c b/gcc/testsuite/gcc.target/i386/pr70155-11.c new file mode 100644 index 0000000..b00aa13 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr70155-11.c @@ -0,0 +1,19 @@ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O2 -msse2 -mtune=core2 -dp" } */ + +extern __int128 a; + +struct foo +{ + __int128 i; +}__attribute__ ((packed)); + +extern struct foo x; + +void +foo (void) +{ + x.i = a; +} + +/* { dg-final { scan-assembler-not "movv1ti_internal" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr70155-12.c b/gcc/testsuite/gcc.target/i386/pr70155-12.c new file mode 100644 index 0000000..dd0edf0 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr70155-12.c @@ -0,0 +1,17 @@ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O2 -msse2 -mtune=core2 -dp" } */ + +struct foo +{ + __int128 i; +}__attribute__ ((packed)); + +extern struct foo x; + +void +foo (void) +{ + x.i = 0; +} + +/* { dg-final { scan-assembler-not "movv1ti_internal" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr70155-13.c b/gcc/testsuite/gcc.target/i386/pr70155-13.c new file mode 100644 index 0000000..6718290 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr70155-13.c @@ -0,0 +1,17 @@ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O2 -msse2 -mtune=core2 -dp" } */ + +struct foo +{ + __int128 i; +}__attribute__ ((packed)); + +extern struct foo x; + +void +foo (void) +{ + x.i = -1; +} + +/* { dg-final { scan-assembler-not "movv1ti_internal" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr70155-14.c b/gcc/testsuite/gcc.target/i386/pr70155-14.c new file mode 100644 index 0000000..a43de2e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr70155-14.c @@ -0,0 +1,17 @@ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O2 -msse2 -mtune=generic -dp" } */ + +struct foo +{ + __int128 i; +}__attribute__ ((packed)); + +extern struct foo x; + +void +foo (void) +{ + x.i = 2; +} + +/* { dg-final { scan-assembler-not "movv1ti_internal" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr70155-15.c b/gcc/testsuite/gcc.target/i386/pr70155-15.c new file mode 100644 index 0000000..e9cafcc --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr70155-15.c @@ -0,0 +1,18 @@ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O2 -msse2 -mtune=core2 -mtune-ctrl=sse_unaligned_store_optimal -dp" } */ + +struct foo +{ + __int128 i; +}__attribute__ ((packed)); + +extern struct foo x; + +void +foo (void) +{ + x.i = 0; +} + +/* { dg-final { scan-assembler-times "movv1ti_internal" 2 } } */ +/* { dg-final { scan-assembler-not "\\*movdi_internal" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr70155-16.c b/gcc/testsuite/gcc.target/i386/pr70155-16.c new file mode 100644 index 0000000..7750b58 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr70155-16.c @@ -0,0 +1,17 @@ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O2 -msse2 -mtune=core2 -mtune-ctrl=sse_unaligned_load_optimal -dp" } */ + +struct foo +{ + __int128 i; +}__attribute__ ((packed)); + +extern struct foo x; + +void +foo (void) +{ + x.i = 0; +} + +/* { dg-final { scan-assembler-not "movv1ti_internal" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr70155-17.c b/gcc/testsuite/gcc.target/i386/pr70155-17.c new file mode 100644 index 0000000..a9427e6 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr70155-17.c @@ -0,0 +1,18 @@ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O2 -msse2 -mtune=generic -dp" } */ + +extern __int128 a, b, c, d, e, f; + +void +foo (void) +{ + a = 0; + b = -1; + c = 0; + d = -1; + e = 0; + f = -1; +} + +/* { dg-final { scan-assembler-times "movv1ti_internal" 8 } } */ +/* { dg-final { scan-assembler-not "\\*movdi_internal" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr70155-18.c b/gcc/testsuite/gcc.target/i386/pr70155-18.c new file mode 100644 index 0000000..eb9db68 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr70155-18.c @@ -0,0 +1,12 @@ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-O2 -msse2 -mtune=generic -dp" } */ + +extern char *src, *dst; + +char * +foo1 (void) +{ + return __builtin_memcpy (dst, src, 16); +} + +/* { dg-final { scan-assembler-times "movv1ti_internal" 2 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr70155-19.c b/gcc/testsuite/gcc.target/i386/pr70155-19.c new file mode 100644 index 0000000..e2e73aa --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr70155-19.c @@ -0,0 +1,12 @@ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-O2 -msse2 -mtune=generic -dp" } */ + +extern char *src, *dst; + +char * +foo1 (void) +{ + return __builtin_mempcpy (dst, src, 16); +} + +/* { dg-final { scan-assembler-times "movv1ti_internal" 2 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr70155-2.c b/gcc/testsuite/gcc.target/i386/pr70155-2.c new file mode 100644 index 0000000..af2ddc6 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr70155-2.c @@ -0,0 +1,18 @@ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O2 -msse2 -mtune=generic -dp" } */ + +struct foo +{ + __int128 i; +}__attribute__ ((packed)); + +extern struct foo x, y; + +void +foo (void) +{ + x = y; +} + +/* { dg-final { scan-assembler-times "movv1ti_internal" 2 } } */ +/* { dg-final { scan-assembler-not "\\*movdi_internal" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr70155-20.c b/gcc/testsuite/gcc.target/i386/pr70155-20.c new file mode 100644 index 0000000..10b8c45 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr70155-20.c @@ -0,0 +1,13 @@ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O2 -msse2 -mtune=generic -dp" } */ + +extern __int128 a, b; + +__int128 +foo (void) +{ + a = b; + return b; +} + +/* { dg-final { scan-assembler-not "movv1ti_internal" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr70155-21.c b/gcc/testsuite/gcc.target/i386/pr70155-21.c new file mode 100644 index 0000000..be76e5f --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr70155-21.c @@ -0,0 +1,13 @@ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O2 -msse2 -mtune=generic -dp" } */ + +extern __int128 a, b, c; + +void +foo (void) +{ + a = b; + c = a + 1; +} + +/* { dg-final { scan-assembler-not "movv1ti_internal" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr70155-22.c b/gcc/testsuite/gcc.target/i386/pr70155-22.c new file mode 100644 index 0000000..ff5cbce --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr70155-22.c @@ -0,0 +1,14 @@ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O2 -msse2 -mtune=generic -dp" } */ + +extern __int128 a, b, c; + +void +foo (void) +{ + a = b; + c++; +} + +/* { dg-final { scan-assembler-times "movv1ti_internal" 2 } } */ +/* { dg-final { scan-assembler-not "\\*movdi_internal" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr70155-3.c b/gcc/testsuite/gcc.target/i386/pr70155-3.c new file mode 100644 index 0000000..01b38aa --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr70155-3.c @@ -0,0 +1,20 @@ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O2 -msse2 -mtune=generic -dp" } */ + +extern __int128 a; + +struct foo +{ + __int128 i; +}__attribute__ ((packed)); + +extern struct foo x; + +void +foo (void) +{ + a = x.i; +} + +/* { dg-final { scan-assembler-times "movv1ti_internal" 2 } } */ +/* { dg-final { scan-assembler-not "\\*movdi_internal" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr70155-4.c b/gcc/testsuite/gcc.target/i386/pr70155-4.c new file mode 100644 index 0000000..31bc0a7 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr70155-4.c @@ -0,0 +1,20 @@ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O2 -msse2 -mtune=generic -dp" } */ + +extern __int128 a; + +struct foo +{ + __int128 i; +}__attribute__ ((packed)); + +extern struct foo x; + +void +foo (void) +{ + x.i = a; +} + +/* { dg-final { scan-assembler-times "movv1ti_internal" 2 } } */ +/* { dg-final { scan-assembler-not "\\*movdi_internal" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr70155-5.c b/gcc/testsuite/gcc.target/i386/pr70155-5.c new file mode 100644 index 0000000..9647452 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr70155-5.c @@ -0,0 +1,13 @@ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O2 -msse2 -mtune=generic -dp" } */ + +extern __int128 a; + +void +foo (void) +{ + a = 0; +} + +/* { dg-final { scan-assembler-times "movv1ti_internal" 2 } } */ +/* { dg-final { scan-assembler-not "\\*movdi_internal" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr70155-6.c b/gcc/testsuite/gcc.target/i386/pr70155-6.c new file mode 100644 index 0000000..7e074a73 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr70155-6.c @@ -0,0 +1,13 @@ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O2 -msse2 -mtune=generic -dp" } */ + +extern __int128 a; + +void +foo (void) +{ + a = -1; +} + +/* { dg-final { scan-assembler-times "movv1ti_internal" 2 } } */ +/* { dg-final { scan-assembler-not "\\*movdi_internal" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr70155-7.c b/gcc/testsuite/gcc.target/i386/pr70155-7.c new file mode 100644 index 0000000..93c6fc0 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr70155-7.c @@ -0,0 +1,18 @@ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O2 -msse2 -mtune=generic -dp" } */ + +struct foo +{ + __int128 i; +}__attribute__ ((packed)); + +extern struct foo x; + +void +foo (void) +{ + x.i = 0; +} + +/* { dg-final { scan-assembler-times "movv1ti_internal" 2 } } */ +/* { dg-final { scan-assembler-not "\\*movdi_internal" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr70155-8.c b/gcc/testsuite/gcc.target/i386/pr70155-8.c new file mode 100644 index 0000000..f304a4e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr70155-8.c @@ -0,0 +1,18 @@ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O2 -msse2 -mtune=generic -dp" } */ + +struct foo +{ + __int128 i; +}__attribute__ ((packed)); + +extern struct foo x; + +void +foo (void) +{ + x.i = -1; +} + +/* { dg-final { scan-assembler-times "movv1ti_internal" 2 } } */ +/* { dg-final { scan-assembler-not "\\*movdi_internal" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr70155-9.c b/gcc/testsuite/gcc.target/i386/pr70155-9.c new file mode 100644 index 0000000..5dc3a76 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr70155-9.c @@ -0,0 +1,17 @@ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O2 -msse2 -mtune=core2 -dp" } */ + +struct foo +{ + __int128 i; +}__attribute__ ((packed)); + +extern struct foo x, y; + +void +foo (void) +{ + x = y; +} + +/* { dg-final { scan-assembler-not "movv1ti_internal" } } */