Message ID | 9f8b8662c941ed5972a6f61b9da0fa7dd81409ff.1479910323.git.segher@kernel.crashing.org |
---|---|
State | New |
Headers | show |
Hi, On Wed, 23 Nov 2016, Segher Boessenkool wrote: > r242414, for PR77881, introduces some bugs (PR78390, PR78438, PR78477). > It all has the same root cause: that patch makes combine convert every > lowpart subreg of a logical shift right to a zero_extract. This cannot > work at all if it is not a constant shift, Even with non-constant shifts it remains an extract and make_extraction does support variable start positions (which is the shift amount), as does the zero_extract pattern (depending on target of course). > if (GET_CODE (inner) == LSHIFTRT > + && CONST_INT_P (XEXP (inner, 1)) > && GET_MODE_SIZE (mode) < GET_MODE_SIZE (GET_MODE (inner)) > && subreg_lowpart_p (x)) > { > new_rtx = make_compound_operation (XEXP (inner, 0), next_code); > + int width = GET_MODE_PRECISION (GET_MODE (inner)) GET_MODE (new_rtx), because that's the object you're giving to make_extraction, not inner (and not XEXP(inner, 0)). > + - INTVAL (XEXP (inner, 1)); > + if (width > mode_width) > + width = mode_width; > new_rtx = make_extraction (mode, new_rtx, 0, XEXP (inner, 1), > - mode_width, 1, 0, in_code == COMPARE); > + width, 1, 0, in_code == COMPARE); > break; > } Ciao, Michael.
On Wed, Nov 23, 2016 at 04:24:37PM +0100, Michael Matz wrote: > > r242414, for PR77881, introduces some bugs (PR78390, PR78438, PR78477). > > It all has the same root cause: that patch makes combine convert every > > lowpart subreg of a logical shift right to a zero_extract. This cannot > > work at all if it is not a constant shift, > > Even with non-constant shifts it remains an extract and make_extraction > does support variable start positions (which is the shift amount), as does > the zero_extract pattern (depending on target of course). Sure, but the extraction length is non-constant as well then! And not just not-constant, with a conditional inside it, even. > > if (GET_CODE (inner) == LSHIFTRT > > + && CONST_INT_P (XEXP (inner, 1)) > > && GET_MODE_SIZE (mode) < GET_MODE_SIZE (GET_MODE (inner)) > > && subreg_lowpart_p (x)) > > { > > new_rtx = make_compound_operation (XEXP (inner, 0), next_code); > > + int width = GET_MODE_PRECISION (GET_MODE (inner)) > > + - INTVAL (XEXP (inner, 1)); > > GET_MODE (new_rtx), because that's the object you're giving to > make_extraction, not inner (and not XEXP(inner, 0)). This is about the *original* rtx, the lshiftrt: how many non-zero bits does that leave. The bits zeroed out by the lshiftrt have to be zeroed out by the zero_extract as well, to keep the same semantics in the resulting rtl. > > + if (width > mode_width) > > + width = mode_width; > > new_rtx = make_extraction (mode, new_rtx, 0, XEXP (inner, 1), > > - mode_width, 1, 0, in_code == COMPARE); > > + width, 1, 0, in_code == COMPARE); > > break; > > } Segher
Hi, On Wed, 23 Nov 2016, Segher Boessenkool wrote: > > Even with non-constant shifts it remains an extract and make_extraction > > does support variable start positions (which is the shift amount), as does > > the zero_extract pattern (depending on target of course). > > Sure, but the extraction length is non-constant as well then! And not > just not-constant, with a conditional inside it, even. Right, that would be the consequence. I don't see much care for this problem anywhere, which probably hints that there aren't many targets accepting variant positions for zero_extract :) > > > if (GET_CODE (inner) == LSHIFTRT > > > + && CONST_INT_P (XEXP (inner, 1)) > > > && GET_MODE_SIZE (mode) < GET_MODE_SIZE (GET_MODE (inner)) > > > && subreg_lowpart_p (x)) > > > { > > > new_rtx = make_compound_operation (XEXP (inner, 0), next_code); > > > + int width = GET_MODE_PRECISION (GET_MODE (inner)) > > > + - INTVAL (XEXP (inner, 1)); > > > > GET_MODE (new_rtx), because that's the object you're giving to > > make_extraction, not inner (and not XEXP(inner, 0)). > > This is about the *original* rtx, the lshiftrt: how many non-zero bits > does that leave. The bits zeroed out by the lshiftrt have to be zeroed > out by the zero_extract as well, to keep the same semantics in the > resulting rtl. Of course. I saw it more as a requirements on the inputs of make_extraction. In the end this is moot, as all three modes (of inner, of (inner,0) and of new_rtx) are the same :) Ciao, Michael.
On Wed, Nov 23, 2016 at 02:22:07PM +0000, Segher Boessenkool wrote: > r242414, for PR77881, introduces some bugs (PR78390, PR78438, PR78477). > It all has the same root cause: that patch makes combine convert every > lowpart subreg of a logical shift right to a zero_extract. This cannot > work at all if it is not a constant shift, and it has to be a bit more > careful exactly which bits it extracts. > > Tested on powerpc64-linux {-m32,-m64} (where it fixes the regression > c-c++-common/torture/vector-compare-1.c fails at -O1, and where it also > has a bootstrap failure with some other patches). Also tested that the > x86_64 compiler still generates the wanted code for the PR77881 testcase. On s390 (31-Bit) we get two (easily fixable) test regression supposedly because of the original path (+ this fix), and I don't know what to do about it. Perhaps these point to two situations, where the change from lshiftrt to zero_extract sould not be done? 1) (subtests f29 and f31 in s390/risbg-ll-1.c) An unpatched Gcc would generate this: -- (insn 19 17 20 2 (set (reg:DI 2 %r2 [+-4 ]) (lshiftrt:DI (reg:DI 66 [ v_shl ]) (const_int 32 [0x20]))) (insn 20 19 21 2 (set (reg:SI 3 %r3) (subreg:SI (reg:DI 66 [ v_shl ]) 4)) -- A patched Gcc generates -- (insn 19 17 20 2 (parallel [ (set (reg:DI 2 %r2 [+-4 ]) (zero_extract:DI (reg:DI 66 [ v_shl ]) (const_int 32 [0x20]) (const_int 0 [0]))) (clobber (reg:CC 33 %cc)) -- which results in a more complicated "risbg" instruction instead of a simpler shift right logical instruction "srlg". (The former is harder to read and got a CC-less variant only recently). ==> Should the change from lshiftrt to zero_extract be skipped if it's just a 32 bit shift right operation in DImode, or should this be "fixed" in the backend by adding a special case to the "extzv<mode>" pattern? -------- 2) There's not committed test case for the second finding yet. In a future patch, this will be added: -- i32 f44 (i64 v_x) { /* { dg-final { scan-assembler "f44:\n\(\t.*\n\)*\tnilf\t" { target { ! lp64 } } } } */ i64 v_shr4 = ((ui64)v_x) >> 12; i32 v_conv = (ui32)v_shr4; i32 v_and = v_conv & 10; return v_and; } -- Rtl before combine is: -- (insn 4 3 5 2 (set (reg:DI 65 [ v_x ]) (subreg:DI (reg:SI 67 [ v_x+4 ]) 0)) (insn 5 4 7 2 (parallel [ (set (zero_extract:DI (reg:DI 65 [ v_x ]) (const_int 32 [0x20]) (const_int 0 [0])) (subreg:DI (reg:SI 66 [ v_x ]) 0)) (clobber (reg:CC 33 %cc)) (insn 10 7 11 2 (set (reg:DI 69) (lshiftrt:DI (reg:DI 65 [ v_x ]) (const_int 12 [0xc]))) (insn 11 10 16 2 (parallel [ (set (reg:SI 68 [ v_and ]) (and:SI (subreg:SI (reg:DI 69) 4) (const_int 10 [0xa]))) (clobber (reg:CC 33 %cc)) -- Before the patch, combine generated -- Trying 5, 10 -> 11: Failed to match this instruction: (parallel [ ... Failed to match this instruction: (set (reg:SI 68 [ v_and ]) (and:SI (subreg:SI (lshiftrt:DI (reg:DI 65 [ v_x ]) (const_int 12 [0xc])) 4) (const_int 10 [0xa]))) Successfully matched this instruction: (set (reg:DI 69) (lshiftrt:DI (reg:DI 65 [ v_x ]) <--------- (const_int 12 [0xc]))) Successfully matched this instruction: (set (reg:SI 68 [ v_and ]) (and:SI (subreg:SI (reg:DI 69) 4) (const_int 10 [0xa]))) allowing combination of insns 5, 10 and 11 ... -- With the patch it uses zero_extract instead of lshiftrt (which is the whole point of the patch?): -- Trying 5, 10 -> 11: Failed to match this instruction: ... Successfully matched this instruction: (set (reg:DI 69) (zero_extract:DI (reg:DI 65 [ v_x ]) <--------- (const_int 32 [0x20]) (const_int 20 [0x14]))) Successfully matched this instruction: (set (reg:SI 68 [ v_and ]) (and:SI (subreg:SI (reg:DI 69) 4) (const_int 10 [0xa]))) allowing combination of insns 5, 10 and 11 -- Regardless of whether zeroes are shifted into the word or zero_extract explicitly extends the extracted bits, none of the bits set to zero are used at all because of the following "(and ... (const_int 10))". In this specific case zero_extract has more (unnecessary) "side effects" than lshiftrt, and s390 needs a more complicated instruction to do the extraction (risbg) than it needs for the shift (slrg). However, on s390 both instructions have the same size and cost, so it doesn't really matter. On other targets a shift may actually be cheaper than the zero_extract. While this seems to be all right on s390, it may still indicate a case that should be handled differently? Ciao Dominik ^_^ ^_^
Hi, On Thu, 24 Nov 2016, Dominik Vogt wrote: > On s390 (31-Bit) we get two (easily fixable) test regression > supposedly because of the original path (+ this fix), and I don't > know what to do about it. Perhaps these point to two situations, > where the change from lshiftrt to zero_extract sould not be done? The _extract forms are considered to be the canonical ones for backends that support them (I don't know why). It's just that the canonicalization wasn't applied as often as it could and hence some backends aren't prepared to see it for also trivial cases. > 1) (subtests f29 and f31 in s390/risbg-ll-1.c) > > An unpatched Gcc would generate this: > > -- > (insn 19 17 20 2 (set (reg:DI 2 %r2 [+-4 ]) > (lshiftrt:DI (reg:DI 66 [ v_shl ]) > (const_int 32 [0x20]))) > > (insn 20 19 21 2 (set (reg:SI 3 %r3) > (subreg:SI (reg:DI 66 [ v_shl ]) 4)) > -- > > A patched Gcc generates > > -- > (insn 19 17 20 2 (parallel [ > (set (reg:DI 2 %r2 [+-4 ]) > (zero_extract:DI (reg:DI 66 [ v_shl ]) > (const_int 32 [0x20]) > (const_int 0 [0]))) > (clobber (reg:CC 33 %cc)) Well, one insn instead of two, seems sensible to me. > ==> Should the change from lshiftrt to zero_extract be skipped if > it's just a 32 bit shift right operation in DImode, or should this > be "fixed" in the backend by adding a special case to the > "extzv<mode>" pattern? Special cases in extzv IMHO. > 2) There's not committed test case for the second finding yet. In > > Rtl before combine is: > > -- > (insn 4 3 5 2 (set (reg:DI 65 [ v_x ]) > (subreg:DI (reg:SI 67 [ v_x+4 ]) 0)) > > (insn 5 4 7 2 (parallel [ > (set (zero_extract:DI (reg:DI 65 [ v_x ]) > (const_int 32 [0x20]) > (const_int 0 [0])) > (subreg:DI (reg:SI 66 [ v_x ]) 0)) > (clobber (reg:CC 33 %cc)) > > (insn 10 7 11 2 (set (reg:DI 69) > (lshiftrt:DI (reg:DI 65 [ v_x ]) > (const_int 12 [0xc]))) > > (insn 11 10 16 2 (parallel [ > (set (reg:SI 68 [ v_and ]) > (and:SI (subreg:SI (reg:DI 69) 4) > (const_int 10 [0xa]))) > (clobber (reg:CC 33 %cc)) > -- > > Before the patch, combine generated > > -- > Trying 5, 10 -> 11: > Failed to match this instruction: > (parallel [ > ... > Failed to match this instruction: > (set (reg:SI 68 [ v_and ]) > (and:SI (subreg:SI (lshiftrt:DI (reg:DI 65 [ v_x ]) > (const_int 12 [0xc])) 4) > (const_int 10 [0xa]))) > Successfully matched this instruction: > (set (reg:DI 69) > (lshiftrt:DI (reg:DI 65 [ v_x ]) <--------- > (const_int 12 [0xc]))) > Successfully matched this instruction: > (set (reg:SI 68 [ v_and ]) > (and:SI (subreg:SI (reg:DI 69) 4) > (const_int 10 [0xa]))) > allowing combination of insns 5, 10 and 11 > ... > -- > > With the patch it uses zero_extract instead of lshiftrt (which is > the whole point of the patch?): > > -- > Trying 5, 10 -> 11: > Failed to match this instruction: > ... > Successfully matched this instruction: > (set (reg:DI 69) > (zero_extract:DI (reg:DI 65 [ v_x ]) <--------- > (const_int 32 [0x20]) > (const_int 20 [0x14]))) Hmm, this transformation (from v_x>>12 to zext(v_x,32,20) is only valid when the top 20 bits of v_x are known to be zero already, or alternatively if it's know that the top 32bits of reg 69 won't matter in the future. I wonder where that knowledge is coming from. > Successfully matched this instruction: > (set (reg:SI 68 [ v_and ]) > (and:SI (subreg:SI (reg:DI 69) 4) > (const_int 10 [0xa]))) > allowing combination of insns 5, 10 and 11 > -- > > Regardless of whether zeroes are shifted into the word or > zero_extract explicitly extends the extracted bits, none of the > bits set to zero are used at all because of the following "(and > ... (const_int 10))". > > In this specific case zero_extract has more (unnecessary) "side > effects" than lshiftrt, and s390 needs a more complicated > instruction to do the extraction (risbg) than it needs for the > shift (slrg). The above zero-extract is equivalent to the shift under the above conditions so it could also be special cased in the extzv pattern again. As said I'm not sure where the knowledge of that condition comes from and if it's available to the backend. Ciao, Michael.
On Thu, Nov 24, 2016 at 03:01:02PM +0100, Michael Matz wrote: > Hi, > > On Thu, 24 Nov 2016, Dominik Vogt wrote: > > > On s390 (31-Bit) we get two (easily fixable) test regression > > supposedly because of the original path (+ this fix), and I don't > > know what to do about it. Perhaps these point to two situations, > > where the change from lshiftrt to zero_extract sould not be done? > > The _extract forms are considered to be the canonical ones for backends > that support them (I don't know why). It's just that the canonicalization > wasn't applied as often as it could and hence some backends aren't > prepared to see it for also trivial cases. > > > 1) (subtests f29 and f31 in s390/risbg-ll-1.c) > > > > An unpatched Gcc would generate this: > > > > -- > > (insn 19 17 20 2 (set (reg:DI 2 %r2 [+-4 ]) > > (lshiftrt:DI (reg:DI 66 [ v_shl ]) > > (const_int 32 [0x20]))) > > > > (insn 20 19 21 2 (set (reg:SI 3 %r3) > > (subreg:SI (reg:DI 66 [ v_shl ]) 4)) > > -- > > > > A patched Gcc generates > > > > -- > > (insn 19 17 20 2 (parallel [ > > (set (reg:DI 2 %r2 [+-4 ]) > > (zero_extract:DI (reg:DI 66 [ v_shl ]) > > (const_int 32 [0x20]) > > (const_int 0 [0]))) > > (clobber (reg:CC 33 %cc)) > > Well, one insn instead of two, seems sensible to me. That was a copy-and-paste mistake. Insn 20 does not go away with the patched Gcc. Ciao Dominik ^_^ ^_^
On 24/11/2016 15:01, Michael Matz wrote: >> > (set (reg:DI 69) >> > (zero_extract:DI (reg:DI 65 [ v_x ]) <--------- >> > (const_int 32 [0x20]) >> > (const_int 20 [0x14]))) > Hmm, this transformation (from v_x>>12 to zext(v_x,32,20) is only valid > when the top 20 bits of v_x are known to be zero already, or alternatively > if it's know that the top 32bits of reg 69 won't matter in the future. I > wonder where that knowledge is coming from. s390 is BITS_BIG_ENDIAN, so zext(v_x,32,20) really means the same as zext(v_x,32,12) with the more common convention. Insn 4 and 5 respectively set the low and high 32-bits of (reg:DI 65), so the transform seems fine to me. Paolo
On Wed, Nov 23, 2016 at 02:22:07PM +0000, Segher Boessenkool wrote: > r242414, for PR77881, introduces some bugs (PR78390, PR78438, PR78477). > It all has the same root cause: that patch makes combine convert every > lowpart subreg of a logical shift right to a zero_extract. This cannot > work at all if it is not a constant shift, and it has to be a bit more > careful exactly which bits it extracts. > > Tested on powerpc64-linux {-m32,-m64} (where it fixes the regression > c-c++-common/torture/vector-compare-1.c fails at -O1, and where it also > has a bootstrap failure with some other patches). Also tested that the > x86_64 compiler still generates the wanted code for the PR77881 testcase. Is this a side effect of the patch series? Trying 7 -> 9: ... Failed to match this instruction: (set (reg:SI 68 [ v_or ]) (ior:SI (subreg:SI (zero_extract:DI (reg:DI 2 %r2 [ v_x ]) ^^^^^^^^^^^^^^^^^^^^^^^^^^ (const_int 16 [0x10]) (const_int 0 [0])) 4) ^^^ (reg:SI 70 [ v_and1 ]))) Shouldn't this be simply ... (ior:SI (zero_extract:SI (reg:DI) (16) (0))) ... ? Ciao Dominik ^_^ ^_^
On Wed, Nov 30, 2016 at 02:12:35PM +0100, Dominik Vogt wrote: > On Wed, Nov 23, 2016 at 02:22:07PM +0000, Segher Boessenkool wrote: > > r242414, for PR77881, introduces some bugs (PR78390, PR78438, PR78477). > > It all has the same root cause: that patch makes combine convert every > > lowpart subreg of a logical shift right to a zero_extract. This cannot > > work at all if it is not a constant shift, and it has to be a bit more > > careful exactly which bits it extracts. > > > > Tested on powerpc64-linux {-m32,-m64} (where it fixes the regression > > c-c++-common/torture/vector-compare-1.c fails at -O1, and where it also > > has a bootstrap failure with some other patches). Also tested that the > > x86_64 compiler still generates the wanted code for the PR77881 testcase. > > Is this a side effect of the patch series? I do not know; I cannot tell just from this, and there is no source snippet to try. Maybe Michael can tell? > Trying 7 -> 9: > ... > Failed to match this instruction: > (set (reg:SI 68 [ v_or ]) > (ior:SI (subreg:SI (zero_extract:DI (reg:DI 2 %r2 [ v_x ]) > ^^^^^^^^^^^^^^^^^^^^^^^^^^ > (const_int 16 [0x10]) > (const_int 0 [0])) 4) > ^^^ > (reg:SI 70 [ v_and1 ]))) > > Shouldn't this be simply > > ... > (ior:SI (zero_extract:SI (reg:DI) (16) (0))) > ... That seems nicer, sure. OTOH that will never match on a target that does not have zero_extract:SI from :DI. *_extracts are nasty. Segher
Hi, On Wed, 30 Nov 2016, Dominik Vogt wrote: > On Wed, Nov 23, 2016 at 02:22:07PM +0000, Segher Boessenkool wrote: > > r242414, for PR77881, introduces some bugs (PR78390, PR78438, PR78477). > > It all has the same root cause: that patch makes combine convert every > > lowpart subreg of a logical shift right to a zero_extract. This cannot > > work at all if it is not a constant shift, and it has to be a bit more > > careful exactly which bits it extracts. > > > > Tested on powerpc64-linux {-m32,-m64} (where it fixes the regression > > c-c++-common/torture/vector-compare-1.c fails at -O1, and where it also > > has a bootstrap failure with some other patches). Also tested that the > > x86_64 compiler still generates the wanted code for the PR77881 testcase. > > Is this a side effect of the patch series? What is "this"? > Trying 7 -> 9: > ... > Failed to match this instruction: > (set (reg:SI 68 [ v_or ]) > (ior:SI (subreg:SI (zero_extract:DI (reg:DI 2 %r2 [ v_x ]) > ^^^^^^^^^^^^^^^^^^^^^^^^^^ > (const_int 16 [0x10]) > (const_int 0 [0])) 4) > ^^^ > (reg:SI 70 [ v_and1 ]))) > > Shouldn't this be simply > > ... > (ior:SI (zero_extract:SI (reg:DI) (16) (0))) > ... I don't think mode-changing _extracts are valid in this context. From the docu: `(sign_extract:M LOC SIZE POS)' ... The mode M is the same as the mode that would be used for LOC if it were a register. Probably it could be made to work just fine, but I'm not sure it'd be worth much, as then the targets would need to care for mode-changes occuring not just through subregs as usual, but also through extracts. Ciao, Michael.
On Wed, Nov 30, 2016 at 02:43:12PM +0100, Michael Matz wrote: > > Shouldn't this be simply > > > > ... > > (ior:SI (zero_extract:SI (reg:DI) (16) (0))) > > ... > > I don't think mode-changing _extracts are valid in this context. From the > docu: > > `(sign_extract:M LOC SIZE POS)' > ... > The mode M is the same as the mode that would be used for LOC if > it were a register. > > Probably it could be made to work just fine, but I'm not sure it'd be > worth much, as then the targets would need to care for mode-changes > occuring not just through subregs as usual, but also through extracts. The patch https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02987.html I submitted yesterday deals with this same issue, FWIW -- some ports apparently already do mode-changing extracts. Segher
Hi, On Wed, 30 Nov 2016, Segher Boessenkool wrote: > > I don't think mode-changing _extracts are valid in this context. From the > > docu: > > > > `(sign_extract:M LOC SIZE POS)' > > ... > > The mode M is the same as the mode that would be used for LOC if > > it were a register. > > > > Probably it could be made to work just fine, but I'm not sure it'd be > > worth much, as then the targets would need to care for mode-changes > > occuring not just through subregs as usual, but also through extracts. > > The patch https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02987.html I > submitted yesterday deals with this same issue, FWIW -- some ports > apparently already do mode-changing extracts. Yeah, saw that a bit later. So, hmmm. I'm not sure what to make of it, if the targets choose to use mode-changing extracts I guess that's fine, as they presumably will have written patterns that recognize them. But I don't think we should willy-nilly generate such patterns as we can't know if the target deals with them or not. We could of course always generate both variants: (subreg:M1 (extract:M2 (object:M2)) and (extract:M1 (object:M2)) and see if either matches, but that seems a bit too much work. Ciao, Michael.
On Wed, Nov 30, 2016 at 03:40:32PM +0100, Michael Matz wrote: > Hi, > > On Wed, 30 Nov 2016, Segher Boessenkool wrote: > > > > I don't think mode-changing _extracts are valid in this context. From the > > > docu: > > > > > > `(sign_extract:M LOC SIZE POS)' > > > ... > > > The mode M is the same as the mode that would be used for LOC if > > > it were a register. > > > > > > Probably it could be made to work just fine, but I'm not sure it'd be > > > worth much, as then the targets would need to care for mode-changes > > > occuring not just through subregs as usual, but also through extracts. > > > > The patch https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02987.html I > > submitted yesterday deals with this same issue, FWIW -- some ports > > apparently already do mode-changing extracts. > > Yeah, saw that a bit later. So, hmmm. I'm not sure what to make of it, > if the targets choose to use mode-changing extracts I guess that's fine, > as they presumably will have written patterns that recognize them. But I > don't think we should willy-nilly generate such patterns as we can't know > if the target deals with them or not. Just working on such a pattern for s390x, I had the impression that combine uses the automatic mode change when it's there, and otherwise it does things differently, that is, it tries different combinations when it has the pattern than without. There seems to be at least some kind of detection. > We could of course always generate > both variants: (subreg:M1 (extract:M2 (object:M2)) and > (extract:M1 (object:M2)) and see if either matches, but that seems a bit > too much work. -- The insns that combine tried are: (insn 7 4 8 2 (set (reg:DI 69) (lshiftrt:DI (reg/v:DI 66 [ v_x ]) (const_int 48 [0x30]))) (insn 9 8 10 2 (parallel [ (set (reg:SI 68 [ v_or ]) (ior:SI (reg:SI 70 [ v_and1 ]) (subreg:SI (reg:DI 69) 4))) (clobber (reg:CC 33 %cc)) ]) A while ago combine handled the situation well, resulting in the new "risbg" instruction, but for a while it's not been working. It's a bit difficult to track that down to a specific commit because of the broken "combine"-patch that took a while to fix. Ciao Dominik ^_^ ^_^
diff --git a/gcc/combine.c b/gcc/combine.c index c025125..ca944c6 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -8143,12 +8143,17 @@ make_compound_operation_int (machine_mode mode, rtx *x_ptr, /* If the SUBREG is masking of a logical right shift, make an extraction. */ if (GET_CODE (inner) == LSHIFTRT + && CONST_INT_P (XEXP (inner, 1)) && GET_MODE_SIZE (mode) < GET_MODE_SIZE (GET_MODE (inner)) && subreg_lowpart_p (x)) { new_rtx = make_compound_operation (XEXP (inner, 0), next_code); + int width = GET_MODE_PRECISION (GET_MODE (inner)) + - INTVAL (XEXP (inner, 1)); + if (width > mode_width) + width = mode_width; new_rtx = make_extraction (mode, new_rtx, 0, XEXP (inner, 1), - mode_width, 1, 0, in_code == COMPARE); + width, 1, 0, in_code == COMPARE); break; }