Message ID | 002201d8ae8b$d11fa9b0$735efd10$@nextmovesoftware.com |
---|---|
State | New |
Headers | show |
Series | [x86] PR target/106577: force_reg may clobber operands during split. | expand |
On Fri, Aug 12, 2022 at 10:41 PM Roger Sayle <roger@nextmovesoftware.com> wrote: > > > This patch fixes PR target/106577 which is a recent ICE on valid regression > caused by my introduction of a *testti_doubleword pre-reload splitter in > i386.md. During the split pass before reload, this converts the virtual > *testti_doubleword into an *andti3_doubleword and *cmpti_doubleword, > checking that any immediate operand is a valid "x86_64_hilo_general_operand" > and placing it into a TImode register using force_reg if it isn't. > > The unexpected behaviour (that caught me out) is that calling force_reg > may occasionally clobber the contents of the global operands array, or > more accurately recog_data.operand[0], which means that by the time > split_XXX calls gen_split_YYY the replacement insn's operands have been > corrupted. > > It's difficult to tell who (if anyone is at fault). The re-entrant > stack trace (for the attached PR) looks like: > > gen_split_203 (*testti_doubleword) calls > force_reg calls > emit_move_insn calls > emit_move_insn_1 calls > gen_movti calls > ix86_expand_move calls > ix86_convert_const_wide_int_to_broadcast calls > ix86_vector_duplicate_value calls > recog_memoized calls > recog. > > By far the simplest and possibly correct fix is rather than attempt > to push and pop recog_data, to simply (in pre-reload splits) save a > copy of any operands that will be needed after force_reg, and use > these copies afterwards. Many pre-reload splitters avoid this issue > using "[(clobber (const_int 0))]" and so avoid gen_split_YYY functions, > but in our case we still need to save a copy of operands[0] (even if we > call emit_insn or expand_* ourselves), so we might as well continue to > use the conveniently generated gen_split. > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > and make -k check, both with and without --target_board=unix{-m32}, > with no new failures. Ok for mainline? Why this obviously fixes the issue seen I wonder whether there's more of recog_data that might be used after control flow returns to recog_memoized and thus the fix would be there, not in any backend pattern triggering the issue like this? The "easiest" fix would maybe to add a in_recog flag and simply return FAIL from recog when recursing. Not sure what the effect on this particular pattern would be though? The better(?) fix might be to push/pop recog_data in 'recog', but of course give that recog_data is currently a global leakage in intermediate code can still happen. That said - does anybody know of similar fixes for this issue in other backends patterns? Thanks, Richard. > > > 2022-08-12 Roger Sayle <roger@nextmovesoftware.com> > > gcc/ChangeLog > PR target/106577 > * config/i386/i386.md (*testti_doubleword): Preserve a copy of > operands[0], and move initialization of operands[2] later, as the > call to force_reg may clobber the contents of the operands array. > > gcc/testsuite/ChangeLog > PR target/106577 > * gcc.target/i386/pr106577.c: New test case. > > > Thanks, > Roger > -- >
Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes: > On Fri, Aug 12, 2022 at 10:41 PM Roger Sayle <roger@nextmovesoftware.com> wrote: >> >> >> This patch fixes PR target/106577 which is a recent ICE on valid regression >> caused by my introduction of a *testti_doubleword pre-reload splitter in >> i386.md. During the split pass before reload, this converts the virtual >> *testti_doubleword into an *andti3_doubleword and *cmpti_doubleword, >> checking that any immediate operand is a valid "x86_64_hilo_general_operand" >> and placing it into a TImode register using force_reg if it isn't. >> >> The unexpected behaviour (that caught me out) is that calling force_reg >> may occasionally clobber the contents of the global operands array, or >> more accurately recog_data.operand[0], which means that by the time >> split_XXX calls gen_split_YYY the replacement insn's operands have been >> corrupted. >> >> It's difficult to tell who (if anyone is at fault). The re-entrant >> stack trace (for the attached PR) looks like: >> >> gen_split_203 (*testti_doubleword) calls >> force_reg calls >> emit_move_insn calls >> emit_move_insn_1 calls >> gen_movti calls >> ix86_expand_move calls >> ix86_convert_const_wide_int_to_broadcast calls >> ix86_vector_duplicate_value calls >> recog_memoized calls >> recog. >> >> By far the simplest and possibly correct fix is rather than attempt >> to push and pop recog_data, to simply (in pre-reload splits) save a >> copy of any operands that will be needed after force_reg, and use >> these copies afterwards. Many pre-reload splitters avoid this issue >> using "[(clobber (const_int 0))]" and so avoid gen_split_YYY functions, >> but in our case we still need to save a copy of operands[0] (even if we >> call emit_insn or expand_* ourselves), so we might as well continue to >> use the conveniently generated gen_split. >> >> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap >> and make -k check, both with and without --target_board=unix{-m32}, >> with no new failures. Ok for mainline? > > Why this obviously fixes the issue seen I wonder whether there's > more of recog_data that might be used after control flow returns > to recog_memoized and thus the fix would be there, not in any > backend pattern triggering the issue like this? > > The "easiest" fix would maybe to add a in_recog flag and > simply return FAIL from recog when recursing. Not sure what > the effect on this particular pattern would be though? > > The better(?) fix might be to push/pop recog_data in 'recog', but > of course give that recog_data is currently a global leakage > in intermediate code can still happen. > > That said - does anybody know of similar fixes for this issue in other > backends patterns? I don't think it's valid for a simple query function like ix86_vector_duplicate_value to clobber global state. Doing that could cause problems in other situations, not just splits. Ideally, it would be good to wean insn-recog.cc:recog off global state. The only parts of recog_data it uses (if I didn't miss something) are recog_data.operands and recog_data.insn (but only to nullify it for recog_memoized, which wouldn't be necessary if recog didn't clobber recog_data.operands). But I guess some .md expand/insn conditions probably rely on the operands array being in recog_data, so that might not be easy. IMO the correct low-effort fix is to save and restore recog_data in ix86_vector_duplicate_value. It's a relatively big copy, but the current code is pretty wasteful anyway (allocating at least a new SET and INSN for every query). Compared to the overhead of doing that, a copy to and from the stack shouldn't be too bad. Thanks, Richard
On Tue, Aug 16, 2022 at 10:14 AM Richard Sandiford <richard.sandiford@arm.com> wrote: > > Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes: > > On Fri, Aug 12, 2022 at 10:41 PM Roger Sayle <roger@nextmovesoftware.com> wrote: > >> > >> > >> This patch fixes PR target/106577 which is a recent ICE on valid regression > >> caused by my introduction of a *testti_doubleword pre-reload splitter in > >> i386.md. During the split pass before reload, this converts the virtual > >> *testti_doubleword into an *andti3_doubleword and *cmpti_doubleword, > >> checking that any immediate operand is a valid "x86_64_hilo_general_operand" > >> and placing it into a TImode register using force_reg if it isn't. > >> > >> The unexpected behaviour (that caught me out) is that calling force_reg > >> may occasionally clobber the contents of the global operands array, or > >> more accurately recog_data.operand[0], which means that by the time > >> split_XXX calls gen_split_YYY the replacement insn's operands have been > >> corrupted. > >> > >> It's difficult to tell who (if anyone is at fault). The re-entrant > >> stack trace (for the attached PR) looks like: > >> > >> gen_split_203 (*testti_doubleword) calls > >> force_reg calls > >> emit_move_insn calls > >> emit_move_insn_1 calls > >> gen_movti calls > >> ix86_expand_move calls > >> ix86_convert_const_wide_int_to_broadcast calls > >> ix86_vector_duplicate_value calls > >> recog_memoized calls > >> recog. > >> > >> By far the simplest and possibly correct fix is rather than attempt > >> to push and pop recog_data, to simply (in pre-reload splits) save a > >> copy of any operands that will be needed after force_reg, and use > >> these copies afterwards. Many pre-reload splitters avoid this issue > >> using "[(clobber (const_int 0))]" and so avoid gen_split_YYY functions, > >> but in our case we still need to save a copy of operands[0] (even if we > >> call emit_insn or expand_* ourselves), so we might as well continue to > >> use the conveniently generated gen_split. > >> > >> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > >> and make -k check, both with and without --target_board=unix{-m32}, > >> with no new failures. Ok for mainline? > > > > Why this obviously fixes the issue seen I wonder whether there's > > more of recog_data that might be used after control flow returns > > to recog_memoized and thus the fix would be there, not in any > > backend pattern triggering the issue like this? > > > > The "easiest" fix would maybe to add a in_recog flag and > > simply return FAIL from recog when recursing. Not sure what > > the effect on this particular pattern would be though? > > > > The better(?) fix might be to push/pop recog_data in 'recog', but > > of course give that recog_data is currently a global leakage > > in intermediate code can still happen. > > > > That said - does anybody know of similar fixes for this issue in other > > backends patterns? > > I don't think it's valid for a simple query function like > ix86_vector_duplicate_value to clobber global state. Doing that > could cause problems in other situations, not just splits. > > Ideally, it would be good to wean insn-recog.cc:recog off global state. > The only parts of recog_data it uses (if I didn't miss something) > are recog_data.operands and recog_data.insn (but only to nullify > it for recog_memoized, which wouldn't be necessary if recog didn't > clobber recog_data.operands). But I guess some .md expand/insn > conditions probably rely on the operands array being in recog_data, > so that might not be easy. > > IMO the correct low-effort fix is to save and restore recog_data > in ix86_vector_duplicate_value. It's a relatively big copy, > but the current code is pretty wasteful anyway (allocating at > least a new SET and INSN for every query). Compared to the > overhead of doing that, a copy to and from the stack shouldn't > be too bad. I see. I wonder if we should at least add some public API for save/restore of recog_data so the many places don't need to invent their own version and they are more easily to find later. Maybe some RAII { push_recog_data saved (); } ? Shall we armor recog () for recursive invocation by adding a ->in_recog member to recog_data? > > Thanks, > Richard
Richard Biener <richard.guenther@gmail.com> writes: > On Tue, Aug 16, 2022 at 10:14 AM Richard Sandiford > <richard.sandiford@arm.com> wrote: >> >> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes: >> > On Fri, Aug 12, 2022 at 10:41 PM Roger Sayle <roger@nextmovesoftware.com> wrote: >> >> >> >> >> >> This patch fixes PR target/106577 which is a recent ICE on valid regression >> >> caused by my introduction of a *testti_doubleword pre-reload splitter in >> >> i386.md. During the split pass before reload, this converts the virtual >> >> *testti_doubleword into an *andti3_doubleword and *cmpti_doubleword, >> >> checking that any immediate operand is a valid "x86_64_hilo_general_operand" >> >> and placing it into a TImode register using force_reg if it isn't. >> >> >> >> The unexpected behaviour (that caught me out) is that calling force_reg >> >> may occasionally clobber the contents of the global operands array, or >> >> more accurately recog_data.operand[0], which means that by the time >> >> split_XXX calls gen_split_YYY the replacement insn's operands have been >> >> corrupted. >> >> >> >> It's difficult to tell who (if anyone is at fault). The re-entrant >> >> stack trace (for the attached PR) looks like: >> >> >> >> gen_split_203 (*testti_doubleword) calls >> >> force_reg calls >> >> emit_move_insn calls >> >> emit_move_insn_1 calls >> >> gen_movti calls >> >> ix86_expand_move calls >> >> ix86_convert_const_wide_int_to_broadcast calls >> >> ix86_vector_duplicate_value calls >> >> recog_memoized calls >> >> recog. >> >> >> >> By far the simplest and possibly correct fix is rather than attempt >> >> to push and pop recog_data, to simply (in pre-reload splits) save a >> >> copy of any operands that will be needed after force_reg, and use >> >> these copies afterwards. Many pre-reload splitters avoid this issue >> >> using "[(clobber (const_int 0))]" and so avoid gen_split_YYY functions, >> >> but in our case we still need to save a copy of operands[0] (even if we >> >> call emit_insn or expand_* ourselves), so we might as well continue to >> >> use the conveniently generated gen_split. >> >> >> >> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap >> >> and make -k check, both with and without --target_board=unix{-m32}, >> >> with no new failures. Ok for mainline? >> > >> > Why this obviously fixes the issue seen I wonder whether there's >> > more of recog_data that might be used after control flow returns >> > to recog_memoized and thus the fix would be there, not in any >> > backend pattern triggering the issue like this? >> > >> > The "easiest" fix would maybe to add a in_recog flag and >> > simply return FAIL from recog when recursing. Not sure what >> > the effect on this particular pattern would be though? >> > >> > The better(?) fix might be to push/pop recog_data in 'recog', but >> > of course give that recog_data is currently a global leakage >> > in intermediate code can still happen. >> > >> > That said - does anybody know of similar fixes for this issue in other >> > backends patterns? >> >> I don't think it's valid for a simple query function like >> ix86_vector_duplicate_value to clobber global state. Doing that >> could cause problems in other situations, not just splits. >> >> Ideally, it would be good to wean insn-recog.cc:recog off global state. >> The only parts of recog_data it uses (if I didn't miss something) >> are recog_data.operands and recog_data.insn (but only to nullify >> it for recog_memoized, which wouldn't be necessary if recog didn't >> clobber recog_data.operands). But I guess some .md expand/insn >> conditions probably rely on the operands array being in recog_data, >> so that might not be easy. >> >> IMO the correct low-effort fix is to save and restore recog_data >> in ix86_vector_duplicate_value. It's a relatively big copy, >> but the current code is pretty wasteful anyway (allocating at >> least a new SET and INSN for every query). Compared to the >> overhead of doing that, a copy to and from the stack shouldn't >> be too bad. > > I see. I wonder if we should at least add some public API for > save/restore of recog_data so the many places don't need to > invent their own version and they are more easily to find later. Plain assignment should work. The structure isn't very fancy ;-) > Maybe some RAII > > { > push_recog_data saved (); > > } > > ? Maybe. But if we're going to spend effort on something, moving away from the global state seems better IMO. > Shall we armor recog () for recursive invocation by adding a > ->in_recog member to recog_data? Yeah, we could do that, but it wouldn't catch the current bug. Thanks, Richard
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 2fde8cd..e9232cd 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -9772,9 +9772,12 @@ (clobber (reg:CC FLAGS_REG))]) (set (reg:CCZ FLAGS_REG) (compare:CCZ (match_dup 2) (const_int 0)))] { - operands[2] = gen_reg_rtx (TImode); + /* Calling force_reg may clobber operands[0]. */ + rtx save_op0 = operands[0]; if (!x86_64_hilo_general_operand (operands[1], TImode)) operands[1] = force_reg (TImode, operands[1]); + operands[2] = gen_reg_rtx (TImode); + operands[0] = save_op0; }) ;; Combine likes to form bit extractions for some tests. Humor it. diff --git a/gcc/testsuite/gcc.target/i386/pr106577.c b/gcc/testsuite/gcc.target/i386/pr106577.c new file mode 100644 index 0000000..4e35031 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr106577.c @@ -0,0 +1,7 @@ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O2 -mavx" } */ + +int i; +void foo(void) { + i ^= !(((unsigned __int128)0xf0f0f0f0f0f0f0f0 << 64 | 0xf0f0f0f0f0f0f0f0) & i); +}