diff mbox

sext_hwi: Avoid left shift of negative value undefined behaviour

Message ID 55C33636.7020907@sfr.fr
State New
Headers show

Commit Message

Mikael Morin Aug. 6, 2015, 10:25 a.m. UTC
Hello,

this avoids an error found with bootstrap-ubsan.
Regression tested on x86_64-unknown-linux-gnu.  OK for trunk?

Mikael
2015-08-05  Mikael Morin  <mikael@gcc.gnu.org>

	* hwint.h (sext_hwi): Rewrite without undefined behaviour on
	negative SRC.

Comments

Jeff Law Aug. 11, 2015, 7:49 p.m. UTC | #1
On 08/06/2015 04:25 AM, Mikael Morin wrote:
> Hello,
>
> this avoids an error found with bootstrap-ubsan.
> Regression tested on x86_64-unknown-linux-gnu.  OK for trunk?
>
> Mikael
>
>
> noub_sext.CL
>
>
> 2015-08-05  Mikael Morin<mikael@gcc.gnu.org>
>
> 	* hwint.h (sext_hwi): Rewrite without undefined behaviour on
> 	negative SRC.
OK.  Hopefully most of the time the precision is known at compile-time 
which would allow for optimization of the resulting code back to the 
pair-of-shifts form by combine.




jeff
Richard Biener Aug. 12, 2015, 11:01 a.m. UTC | #2
On Tue, Aug 11, 2015 at 9:49 PM, Jeff Law <law@redhat.com> wrote:
> On 08/06/2015 04:25 AM, Mikael Morin wrote:
>>
>> Hello,
>>
>> this avoids an error found with bootstrap-ubsan.
>> Regression tested on x86_64-unknown-linux-gnu.  OK for trunk?
>>
>> Mikael
>>
>>
>> noub_sext.CL
>>
>>
>> 2015-08-05  Mikael Morin<mikael@gcc.gnu.org>
>>
>>         * hwint.h (sext_hwi): Rewrite without undefined behaviour on
>>         negative SRC.
>
> OK.  Hopefully most of the time the precision is known at compile-time which
> would allow for optimization of the resulting code back to the
> pair-of-shifts form by combine.

I think it is not.  The code also lacks a comment on why we do this kind
of obfuscation.

What kind of error does ubsan run into?  That is, for which 'prec'?

Richard.

>
>
>
> jeff
>
Markus Trippelsdorf Aug. 12, 2015, 11:07 a.m. UTC | #3
On 2015.08.12 at 13:01 +0200, Richard Biener wrote:
> On Tue, Aug 11, 2015 at 9:49 PM, Jeff Law <law@redhat.com> wrote:
> > On 08/06/2015 04:25 AM, Mikael Morin wrote:
> >>
> >> Hello,
> >>
> >> this avoids an error found with bootstrap-ubsan.
> >> Regression tested on x86_64-unknown-linux-gnu.  OK for trunk?
> >>
> >> Mikael
> >>
> >>
> >> noub_sext.CL
> >>
> >>
> >> 2015-08-05  Mikael Morin<mikael@gcc.gnu.org>
> >>
> >>         * hwint.h (sext_hwi): Rewrite without undefined behaviour on
> >>         negative SRC.
> >
> > OK.  Hopefully most of the time the precision is known at compile-time which
> > would allow for optimization of the resulting code back to the
> > pair-of-shifts form by combine.
> 
> I think it is not.  The code also lacks a comment on why we do this kind
> of obfuscation.
> 
> What kind of error does ubsan run into?  That is, for which 'prec'?

See: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67042
Richard Biener Aug. 12, 2015, 11:09 a.m. UTC | #4
On Wed, Aug 12, 2015 at 1:07 PM, Markus Trippelsdorf
<markus@trippelsdorf.de> wrote:
> On 2015.08.12 at 13:01 +0200, Richard Biener wrote:
>> On Tue, Aug 11, 2015 at 9:49 PM, Jeff Law <law@redhat.com> wrote:
>> > On 08/06/2015 04:25 AM, Mikael Morin wrote:
>> >>
>> >> Hello,
>> >>
>> >> this avoids an error found with bootstrap-ubsan.
>> >> Regression tested on x86_64-unknown-linux-gnu.  OK for trunk?
>> >>
>> >> Mikael
>> >>
>> >>
>> >> noub_sext.CL
>> >>
>> >>
>> >> 2015-08-05  Mikael Morin<mikael@gcc.gnu.org>
>> >>
>> >>         * hwint.h (sext_hwi): Rewrite without undefined behaviour on
>> >>         negative SRC.
>> >
>> > OK.  Hopefully most of the time the precision is known at compile-time which
>> > would allow for optimization of the resulting code back to the
>> > pair-of-shifts form by combine.
>>
>> I think it is not.  The code also lacks a comment on why we do this kind
>> of obfuscation.
>>
>> What kind of error does ubsan run into?  That is, for which 'prec'?
>
> See: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67042

Ugh.  Stupid C.

My fear is that with so many stmts sext_hwi isn't any longer considered for
inlining, even if prec is constant.  What's the generated code for its
out-of line
copy with/without the patch?

Richard.

> --
> Markus
Mikael Morin Aug. 12, 2015, 1:33 p.m. UTC | #5
Le 12/08/2015 13:09, Richard Biener a écrit :
> On Wed, Aug 12, 2015 at 1:07 PM, Markus Trippelsdorf
> <markus@trippelsdorf.de> wrote:
>> On 2015.08.12 at 13:01 +0200, Richard Biener wrote:
>>> What kind of error does ubsan run into?  That is, for which 'prec'?
>>
>> See: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67042
>
> Ugh.  Stupid C.
>
> My fear is that with so many stmts sext_hwi isn't any longer considered for
> inlining, even if prec is constant.  What's the generated code for its
> out-of line
> copy with/without the patch?
>

The difference on x86_64 is:

    	.cfi_startproc
    	cmpl	$64, %esi
    	je	.L3
-	movl	$64, %ecx
-	subl	%esi, %ecx
-	salq	%cl, %rdi
+	leal	-1(%rsi), %ecx
+	movl	$1, %eax
+	movq	%rax, %rdx
+	salq	%cl, %rdx
+	movl	%esi, %ecx
+	salq	%cl, %rax
+	subq	$1, %rax
+	andq	%rax, %rdi
+	xorq	%rdx, %rdi
    	movq	%rdi, %rax
-	sarq	%cl, %rax
+	subq	%rdx, %rax
    	ret
    	.p2align 4,,10
    	.p2align 3

with -O2, tests attached.
Yes it's worse.  I thought it was better to avoid undefined behaviour at 
all prices to avoid "bad surprises".

Mikael
Jeff Law Aug. 12, 2015, 5:02 p.m. UTC | #6
On 08/12/2015 07:33 AM, Mikael Morin wrote:
> Le 12/08/2015 13:09, Richard Biener a écrit :
>> On Wed, Aug 12, 2015 at 1:07 PM, Markus Trippelsdorf
>> <markus@trippelsdorf.de> wrote:
>>> On 2015.08.12 at 13:01 +0200, Richard Biener wrote:
>>>> What kind of error does ubsan run into?  That is, for which 'prec'?
>>>
>>> See: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67042
>>
>> Ugh.  Stupid C.
>>
>> My fear is that with so many stmts sext_hwi isn't any longer
>> considered for
>> inlining, even if prec is constant.  What's the generated code for its
>> out-of line
>> copy with/without the patch?
>>
>
> The difference on x86_64 is:
>
>         .cfi_startproc
>         cmpl    $64, %esi
>         je    .L3
> -    movl    $64, %ecx
> -    subl    %esi, %ecx
> -    salq    %cl, %rdi
> +    leal    -1(%rsi), %ecx
> +    movl    $1, %eax
> +    movq    %rax, %rdx
> +    salq    %cl, %rdx
> +    movl    %esi, %ecx
> +    salq    %cl, %rax
> +    subq    $1, %rax
> +    andq    %rax, %rdi
> +    xorq    %rdx, %rdi
>         movq    %rdi, %rax
> -    sarq    %cl, %rax
> +    subq    %rdx, %rax
>         ret
>         .p2align 4,,10
>         .p2align 3
>
> with -O2, tests attached.
> Yes it's worse.  I thought it was better to avoid undefined behaviour at
> all prices to avoid "bad surprises".
Well, this isn't the right test.  You're testing when PREC is an unknown 
and I fully expect in that case the code you're going to get will be 
worse.  There's too many insns for combine to save us in that case.

What's interesting here is what happens when prec is a known value 
(since we're hoping that's the common case via inlining).

When prec has a known value of 8, 16 or 32 we get a nice movs[bwl]q on 
x86_64, which is exactly what we want.

When prec is another known value, say 13 for the sake of argument, we 
get a pair of shifts, which is again exactly what we want.

So when prec is a constant, we're going to get good code.  So the only 
question left is whether or not prec is usually a constant or not in the 
contexts in which this routine is used.


Jeff
Richard Biener Aug. 12, 2015, 5:12 p.m. UTC | #7
On August 12, 2015 7:02:04 PM GMT+02:00, Jeff Law <law@redhat.com> wrote:
>On 08/12/2015 07:33 AM, Mikael Morin wrote:
>> Le 12/08/2015 13:09, Richard Biener a écrit :
>>> On Wed, Aug 12, 2015 at 1:07 PM, Markus Trippelsdorf
>>> <markus@trippelsdorf.de> wrote:
>>>> On 2015.08.12 at 13:01 +0200, Richard Biener wrote:
>>>>> What kind of error does ubsan run into?  That is, for which
>'prec'?
>>>>
>>>> See: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67042
>>>
>>> Ugh.  Stupid C.
>>>
>>> My fear is that with so many stmts sext_hwi isn't any longer
>>> considered for
>>> inlining, even if prec is constant.  What's the generated code for
>its
>>> out-of line
>>> copy with/without the patch?
>>>
>>
>> The difference on x86_64 is:
>>
>>         .cfi_startproc
>>         cmpl    $64, %esi
>>         je    .L3
>> -    movl    $64, %ecx
>> -    subl    %esi, %ecx
>> -    salq    %cl, %rdi
>> +    leal    -1(%rsi), %ecx
>> +    movl    $1, %eax
>> +    movq    %rax, %rdx
>> +    salq    %cl, %rdx
>> +    movl    %esi, %ecx
>> +    salq    %cl, %rax
>> +    subq    $1, %rax
>> +    andq    %rax, %rdi
>> +    xorq    %rdx, %rdi
>>         movq    %rdi, %rax
>> -    sarq    %cl, %rax
>> +    subq    %rdx, %rax
>>         ret
>>         .p2align 4,,10
>>         .p2align 3
>>
>> with -O2, tests attached.
>> Yes it's worse.  I thought it was better to avoid undefined behaviour
>at
>> all prices to avoid "bad surprises".
>Well, this isn't the right test.  You're testing when PREC is an
>unknown 
>and I fully expect in that case the code you're going to get will be 
>worse.  There's too many insns for combine to save us in that case.
>
>What's interesting here is what happens when prec is a known value 
>(since we're hoping that's the common case via inlining).
>
>When prec has a known value of 8, 16 or 32 we get a nice movs[bwl]q on 
>x86_64, which is exactly what we want.
>
>When prec is another known value, say 13 for the sake of argument, we 
>get a pair of shifts, which is again exactly what we want.
>
>So when prec is a constant, we're going to get good code.  So the only 
>question left is whether or not prec is usually a constant or not in
>the 
>contexts in which this routine is used.

Prec is almost never a constant and is heavily used from wide-int.

We are not exploiting this undefined ness in C so I object to making this so much slower.

Can we instead do what we do for abs_hwi and add a checking assert so we can move the tests to the callers that misbehave instead?

Thanks,
Richard.

>
>Jeff
Jeff Law Aug. 12, 2015, 6:07 p.m. UTC | #8
On 08/12/2015 11:12 AM, Richard Biener wrote:

>
> Prec is almost never a constant and is heavily used from wide-int.
>
> We are not exploiting this undefined ness in C so I object to making this so much slower.
>
> Can we instead do what we do for abs_hwi and add a checking assert so we can move the tests to the callers that misbehave instead?
Given that ISO C++ is moving away from making shifting 1 into the sign 
bit undefined behaviour, maybe we should make UBSan less strict in its 
warning.  That may eliminate the need for Mikael's patch.

jeff
Richard Biener Aug. 12, 2015, 6:32 p.m. UTC | #9
On August 12, 2015 8:07:13 PM GMT+02:00, Jeff Law <law@redhat.com> wrote:
>On 08/12/2015 11:12 AM, Richard Biener wrote:
>
>>
>> Prec is almost never a constant and is heavily used from wide-int.
>>
>> We are not exploiting this undefined ness in C so I object to making
>this so much slower.
>>
>> Can we instead do what we do for abs_hwi and add a checking assert so
>we can move the tests to the callers that misbehave instead?
>Given that ISO C++ is moving away from making shifting 1 into the sign 
>bit undefined behaviour, maybe we should make UBSan less strict in its 
>warning.  That may eliminate the need for Mikael's patch.

We can also use an logical left shift followed by an arithmetic right shift. Or is the latter invoking undefined behaviour as well in some cases we hit?

Richard.

>jeff
Jeff Law Aug. 12, 2015, 6:41 p.m. UTC | #10
On 08/12/2015 12:32 PM, Richard Biener wrote:
> On August 12, 2015 8:07:13 PM GMT+02:00, Jeff Law <law@redhat.com> wrote:
>> On 08/12/2015 11:12 AM, Richard Biener wrote:
>>
>>>
>>> Prec is almost never a constant and is heavily used from wide-int.
>>>
>>> We are not exploiting this undefined ness in C so I object to making
>> this so much slower.
>>>
>>> Can we instead do what we do for abs_hwi and add a checking assert so
>> we can move the tests to the callers that misbehave instead?
>> Given that ISO C++ is moving away from making shifting 1 into the sign
>> bit undefined behaviour, maybe we should make UBSan less strict in its
>> warning.  That may eliminate the need for Mikael's patch.
>
> We can also use an logical left shift followed by an arithmetic right shift. Or is the latter invoking undefined behaviour as well in some cases we hit?
Hmm, why aren't we using logicals for the left shift to begin with? 
That's the problem area.  I don't think the right shifts are an issue at 
all.

It's strange that when I was researching this, consistently I saw folks 
suggesting the LEFT-SHIFT followed by RIGHT-SHIFT, then it'd get shot 
down as UB, then folks went to either Mikael's approach or another that 
is very similar.  Nobody suggested twidding the types to get a 
left-logical-shift, then doing a right-arithmetic-shift.

Jeff
Mike Stump Aug. 12, 2015, 7:19 p.m. UTC | #11
On Aug 12, 2015, at 11:07 AM, Jeff Law <law@redhat.com> wrote:
> On 08/12/2015 11:12 AM, Richard Biener wrote:
>> 
>> Prec is almost never a constant and is heavily used from wide-int.
>> 
>> We are not exploiting this undefined ness in C so I object to making this so much slower.
>> 
>> Can we instead do what we do for abs_hwi and add a checking assert so we can move the tests to the callers that misbehave instead?
> Given that ISO C++ is moving away from making shifting 1 into the sign bit undefined behaviour, maybe we should make UBSan less strict in its warning.  That may eliminate the need for Mikael's patch.

Pedantically, then, we have to move the the later language standard (if this was standardese).  If it was DRese, then we’re safe.  :-)
Richard Sandiford Aug. 12, 2015, 8:07 p.m. UTC | #12
Jeff Law <law@redhat.com> writes:
> On 08/12/2015 12:32 PM, Richard Biener wrote:
>> On August 12, 2015 8:07:13 PM GMT+02:00, Jeff Law <law@redhat.com> wrote:
>>> On 08/12/2015 11:12 AM, Richard Biener wrote:
>>>
>>>>
>>>> Prec is almost never a constant and is heavily used from wide-int.
>>>>
>>>> We are not exploiting this undefined ness in C so I object to making
>>> this so much slower.
>>>>
>>>> Can we instead do what we do for abs_hwi and add a checking assert so
>>> we can move the tests to the callers that misbehave instead?
>>> Given that ISO C++ is moving away from making shifting 1 into the sign
>>> bit undefined behaviour, maybe we should make UBSan less strict in its
>>> warning.  That may eliminate the need for Mikael's patch.
>>
>> We can also use an logical left shift followed by an arithmetic right
>> shift. Or is the latter invoking undefined behaviour as well in some
>> cases we hit?
> Hmm, why aren't we using logicals for the left shift to begin with? 
> That's the problem area.  I don't think the right shifts are an issue at 
> all.

Well, they're implementation-defined, at least in C.  The C11 wording
for E1 >> E2 is "If E1 has a signed type and a negative value, the
resulting value is implementation-defined".  Is C++ different?
(I don't have the standard handy.)

So...

> It's strange that when I was researching this, consistently I saw folks 
> suggesting the LEFT-SHIFT followed by RIGHT-SHIFT, then it'd get shot 
> down as UB, then folks went to either Mikael's approach or another that 
> is very similar.  Nobody suggested twidding the types to get a 
> left-logical-shift, then doing a right-arithmetic-shift.

...unless C++ is different, there's not a standard-level concept
of arithmetic shift.

Thanks,
Richard
Mike Stump Aug. 12, 2015, 8:52 p.m. UTC | #13
On Aug 12, 2015, at 1:07 PM, Richard Sandiford <rdsandiford@googlemail.com> wrote:
>> I don't think the right shifts are an issue at all.
> 
> Well, they're implementation-defined, at least in C.

> The C11 wording for E1 >> E2 is "If E1 has a signed type and a negative value, the
> resulting value is implementation-defined”.

> Is C++ different?

No, it is the same:

3 The value of E1 >> E2 is E1 right-shifted E2 bit positions.  If E1 has
  an  unsigned  type or if E1 has a signed type and a nonnegative value,
  the value of the result is the integral part of  the  quotient  of  E1
  divided  by the quantity 2 raised to the power E2.  If E1 has a signed
  type and a negative value,  the  resulting  value  is  implementation-
  defined.

> (I don't have the standard handy.)

Google is your friend.
Richard Biener Aug. 13, 2015, 10:05 a.m. UTC | #14
On Wed, Aug 12, 2015 at 10:52 PM, Mike Stump <mikestump@comcast.net> wrote:
> On Aug 12, 2015, at 1:07 PM, Richard Sandiford <rdsandiford@googlemail.com> wrote:
>>> I don't think the right shifts are an issue at all.
>>
>> Well, they're implementation-defined, at least in C.
>
>> The C11 wording for E1 >> E2 is "If E1 has a signed type and a negative value, the
>> resulting value is implementation-defined”.
>
>> Is C++ different?
>
> No, it is the same:
>
> 3 The value of E1 >> E2 is E1 right-shifted E2 bit positions.  If E1 has
>   an  unsigned  type or if E1 has a signed type and a nonnegative value,
>   the value of the result is the integral part of  the  quotient  of  E1
>   divided  by the quantity 2 raised to the power E2.  If E1 has a signed
>   type and a negative value,  the  resulting  value  is  implementation-
>   defined.

Ok, then guard the << >> with __GCC__ and do the expensive bit stuff
otherwise.  Just to cater for other host compilers doing sth unsensibly
implementation defined.

This thing _is_ performance critical.

Richard.

>> (I don't have the standard handy.)
>
> Google is your friend.
Mikael Morin Aug. 13, 2015, 10:56 a.m. UTC | #15
Le 12/08/2015 22:07, Richard Sandiford a écrit :
> Jeff Law <law@redhat.com> writes:
>> On 08/12/2015 12:32 PM, Richard Biener wrote:
>>> On August 12, 2015 8:07:13 PM GMT+02:00, Jeff Law <law@redhat.com> wrote:
>>>> On 08/12/2015 11:12 AM, Richard Biener wrote:
>>>>
>>>>>
>>>>> Prec is almost never a constant and is heavily used from wide-int.
>>>>>
>>>>> We are not exploiting this undefined ness in C so I object to making
>>>> this so much slower.
>>>>>
>>>>> Can we instead do what we do for abs_hwi and add a checking assert so
>>>> we can move the tests to the callers that misbehave instead?
>>>> Given that ISO C++ is moving away from making shifting 1 into the sign
>>>> bit undefined behaviour, maybe we should make UBSan less strict in its
>>>> warning.  That may eliminate the need for Mikael's patch.
>>>
>>> We can also use an logical left shift followed by an arithmetic right
>>> shift. Or is the latter invoking undefined behaviour as well in some
>>> cases we hit?
>> Hmm, why aren't we using logicals for the left shift to begin with?
>> That's the problem area.  I don't think the right shifts are an issue at
>> all.
>
> Well, they're implementation-defined, at least in C.  The C11 wording
> for E1 >> E2 is "If E1 has a signed type and a negative value, the
> resulting value is implementation-defined".  Is C++ different?
> (I don't have the standard handy.)
>
> So...
>
>> It's strange that when I was researching this, consistently I saw folks
>> suggesting the LEFT-SHIFT followed by RIGHT-SHIFT, then it'd get shot
>> down as UB, then folks went to either Mikael's approach or another that
>> is very similar.  Nobody suggested twidding the types to get a
>> left-logical-shift, then doing a right-arithmetic-shift.
>
> ...unless C++ is different, there's not a standard-level concept
> of arithmetic shift.
>
Still, implementation-defined behavior is a progress over undefined 
behaviour.
Even if it's not set in stone, the good thing about 
implementation-defined behaviour is that it's known to be non-random.
So it can be checked, either with autoconf, or with a macro.
Would it be acceptable to have both variants of the code and decide 
among them with such a check?
It feels a bit overkill for such a little function, but it is the only 
way that I see to achieve both need for speed and for well-defined-ness.
I guess it won't kill the bootstrap-ubsan errors though.

Mikael
Mikael Morin Aug. 13, 2015, 11:03 a.m. UTC | #16
Le 13/08/2015 12:56, Mikael Morin a écrit :
> Still, implementation-defined behavior is a progress over undefined
> behaviour.
> Even if it's not set in stone, the good thing about
> implementation-defined behaviour is that it's known to be non-random.
> So it can be checked, either with autoconf, or with a macro.
> Would it be acceptable to have both variants of the code and decide
> among them with such a check?
> It feels a bit overkill for such a little function, but it is the only
> way that I see to achieve both need for speed and for well-defined-ness.
> I guess it won't kill the bootstrap-ubsan errors though.
>
Sorry Richi, I didn't see your answer before posting.
I will prepare something tomorrow.

Mikael
Markus Trippelsdorf Aug. 13, 2015, 11:08 a.m. UTC | #17
On 2015.08.13 at 12:56 +0200, Mikael Morin wrote:
> Le 12/08/2015 22:07, Richard Sandiford a écrit :
> > Jeff Law <law@redhat.com> writes:
> >> On 08/12/2015 12:32 PM, Richard Biener wrote:
> >>> On August 12, 2015 8:07:13 PM GMT+02:00, Jeff Law <law@redhat.com> wrote:
> >>>> On 08/12/2015 11:12 AM, Richard Biener wrote:
> >>>>
> >>>>>
> >>>>> Prec is almost never a constant and is heavily used from wide-int.
> >>>>>
> >>>>> We are not exploiting this undefined ness in C so I object to making
> >>>> this so much slower.
> >>>>>
> >>>>> Can we instead do what we do for abs_hwi and add a checking assert so
> >>>> we can move the tests to the callers that misbehave instead?
> >>>> Given that ISO C++ is moving away from making shifting 1 into the sign
> >>>> bit undefined behaviour, maybe we should make UBSan less strict in its
> >>>> warning.  That may eliminate the need for Mikael's patch.
> >>>
> >>> We can also use an logical left shift followed by an arithmetic right
> >>> shift. Or is the latter invoking undefined behaviour as well in some
> >>> cases we hit?
> >> Hmm, why aren't we using logicals for the left shift to begin with?
> >> That's the problem area.  I don't think the right shifts are an issue at
> >> all.
> >
> > Well, they're implementation-defined, at least in C.  The C11 wording
> > for E1 >> E2 is "If E1 has a signed type and a negative value, the
> > resulting value is implementation-defined".  Is C++ different?
> > (I don't have the standard handy.)
> >
> > So...
> >
> >> It's strange that when I was researching this, consistently I saw folks
> >> suggesting the LEFT-SHIFT followed by RIGHT-SHIFT, then it'd get shot
> >> down as UB, then folks went to either Mikael's approach or another that
> >> is very similar.  Nobody suggested twidding the types to get a
> >> left-logical-shift, then doing a right-arithmetic-shift.
> >
> > ...unless C++ is different, there's not a standard-level concept
> > of arithmetic shift.
> >
> Still, implementation-defined behavior is a progress over undefined 
> behaviour.
> Even if it's not set in stone, the good thing about 
> implementation-defined behaviour is that it's known to be non-random.
> So it can be checked, either with autoconf, or with a macro.
> Would it be acceptable to have both variants of the code and decide 
> among them with such a check?
> It feels a bit overkill for such a little function, but it is the only 
> way that I see to achieve both need for speed and for well-defined-ness.
> I guess it won't kill the bootstrap-ubsan errors though.

For such cases you can use the ATTRIBUTE_NO_SANITIZE_UNDEFINED macro,
that is defined in include/ansidecl.h.
Richard Biener Aug. 13, 2015, 11:11 a.m. UTC | #18
On Thu, Aug 13, 2015 at 1:08 PM, Markus Trippelsdorf
<markus@trippelsdorf.de> wrote:
> On 2015.08.13 at 12:56 +0200, Mikael Morin wrote:
>> Le 12/08/2015 22:07, Richard Sandiford a écrit :
>> > Jeff Law <law@redhat.com> writes:
>> >> On 08/12/2015 12:32 PM, Richard Biener wrote:
>> >>> On August 12, 2015 8:07:13 PM GMT+02:00, Jeff Law <law@redhat.com> wrote:
>> >>>> On 08/12/2015 11:12 AM, Richard Biener wrote:
>> >>>>
>> >>>>>
>> >>>>> Prec is almost never a constant and is heavily used from wide-int.
>> >>>>>
>> >>>>> We are not exploiting this undefined ness in C so I object to making
>> >>>> this so much slower.
>> >>>>>
>> >>>>> Can we instead do what we do for abs_hwi and add a checking assert so
>> >>>> we can move the tests to the callers that misbehave instead?
>> >>>> Given that ISO C++ is moving away from making shifting 1 into the sign
>> >>>> bit undefined behaviour, maybe we should make UBSan less strict in its
>> >>>> warning.  That may eliminate the need for Mikael's patch.
>> >>>
>> >>> We can also use an logical left shift followed by an arithmetic right
>> >>> shift. Or is the latter invoking undefined behaviour as well in some
>> >>> cases we hit?
>> >> Hmm, why aren't we using logicals for the left shift to begin with?
>> >> That's the problem area.  I don't think the right shifts are an issue at
>> >> all.
>> >
>> > Well, they're implementation-defined, at least in C.  The C11 wording
>> > for E1 >> E2 is "If E1 has a signed type and a negative value, the
>> > resulting value is implementation-defined".  Is C++ different?
>> > (I don't have the standard handy.)
>> >
>> > So...
>> >
>> >> It's strange that when I was researching this, consistently I saw folks
>> >> suggesting the LEFT-SHIFT followed by RIGHT-SHIFT, then it'd get shot
>> >> down as UB, then folks went to either Mikael's approach or another that
>> >> is very similar.  Nobody suggested twidding the types to get a
>> >> left-logical-shift, then doing a right-arithmetic-shift.
>> >
>> > ...unless C++ is different, there's not a standard-level concept
>> > of arithmetic shift.
>> >
>> Still, implementation-defined behavior is a progress over undefined
>> behaviour.
>> Even if it's not set in stone, the good thing about
>> implementation-defined behaviour is that it's known to be non-random.
>> So it can be checked, either with autoconf, or with a macro.
>> Would it be acceptable to have both variants of the code and decide
>> among them with such a check?
>> It feels a bit overkill for such a little function, but it is the only
>> way that I see to achieve both need for speed and for well-defined-ness.
>> I guess it won't kill the bootstrap-ubsan errors though.
>
> For such cases you can use the ATTRIBUTE_NO_SANITIZE_UNDEFINED macro,
> that is defined in include/ansidecl.h.

Rather ubsan should not complain about implementation defined behavior (or
should separate those cases out with a different switch compared to undefined
behavior).

Richard.

> --
> Markus
Marek Polacek Aug. 13, 2015, 1:40 p.m. UTC | #19
On Thu, Aug 13, 2015 at 01:11:53PM +0200, Richard Biener wrote:
> Rather ubsan should not complain about implementation defined behavior (or
> should separate those cases out with a different switch compared to undefined
> behavior).

I think ubsan doesn't complain about implementation-defined behavior.  It seems
to me that in this thread it only (rightfully) complained about left-shifting of
negative value. 

Using __GCC__ conditional in a case where we rely on what gcc does when
right-shifting a negative value, i.e. that it sign-extends, seems fine
(ubsan certainly doesn't error on that).

	Marek
Markus Trippelsdorf Aug. 13, 2015, 1:52 p.m. UTC | #20
On 2015.08.13 at 15:40 +0200, Marek Polacek wrote:
> On Thu, Aug 13, 2015 at 01:11:53PM +0200, Richard Biener wrote:
> > Rather ubsan should not complain about implementation defined behavior (or
> > should separate those cases out with a different switch compared to undefined
> > behavior).
> 
> I think ubsan doesn't complain about implementation-defined behavior.  It seems
> to me that in this thread it only (rightfully) complained about left-shifting of
> negative value. 

There are two issues in the same location:

1) gcc/hwint.h:250:19: runtime error: left shift of 8589934588 by 32
places cannot be represented in type 'long int'

2) left-shifting of negative values
Marek Polacek Aug. 13, 2015, 1:55 p.m. UTC | #21
On Thu, Aug 13, 2015 at 03:52:00PM +0200, Markus Trippelsdorf wrote:
> On 2015.08.13 at 15:40 +0200, Marek Polacek wrote:
> > On Thu, Aug 13, 2015 at 01:11:53PM +0200, Richard Biener wrote:
> > > Rather ubsan should not complain about implementation defined behavior (or
> > > should separate those cases out with a different switch compared to undefined
> > > behavior).
> > 
> > I think ubsan doesn't complain about implementation-defined behavior.  It seems
> > to me that in this thread it only (rightfully) complained about left-shifting of
> > negative value. 
> 
> There are two issues in the same location:
> 
> 1) gcc/hwint.h:250:19: runtime error: left shift of 8589934588 by 32
> places cannot be represented in type 'long int'
> 
> 2) left-shifting of negative values

I see, both are UB.

	Marek
Mike Stump Aug. 13, 2015, 5:47 p.m. UTC | #22
On Aug 13, 2015, at 3:05 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> Ok, then guard the << >> with __GCC__ and do the expensive bit stuff
> otherwise.  Just to cater for other host compilers doing sth unsensibly
> implementation defined.

Ick.  The guard should be specific to the implementation defined semantic or undefined semantic, and then when compiling with gcc, we turn it on.  My take is that when we do this, we should add the 5 or 10 other most popular compilers to the list of how they behave so that they all do the cheap path code as well.  If the language standard were serious in this area, it would specify a header file that can define the implementation defined and undefined semantics that are interesting for portable code.  It isn’t.  If it were, we would then just use the standard defined guards.

The language standard should be improved to directly state the possible implementation choices and to require the implementation to communicate to the program which choice it made.
Richard Biener Aug. 14, 2015, 7:29 a.m. UTC | #23
On Thu, Aug 13, 2015 at 7:47 PM, Mike Stump <mikestump@comcast.net> wrote:
> On Aug 13, 2015, at 3:05 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>> Ok, then guard the << >> with __GCC__ and do the expensive bit stuff
>> otherwise.  Just to cater for other host compilers doing sth unsensibly
>> implementation defined.
>
> Ick.  The guard should be specific to the implementation defined semantic or undefined semantic, and then when compiling with gcc, we turn it on.  My take is that when we do this, we should add the 5 or 10 other most popular compilers to the list of how they behave so that they all do the cheap path code as well.  If the language standard were serious in this area, it would specify a header file that can define the implementation defined and undefined semantics that are interesting for portable code.  It isn’t.  If it were, we would then just use the standard defined guards.

GCC is always used to build stage2+ so I don't see the need to handle
alternate host compilers.

> The language standard should be improved to directly state the possible implementation choices and to require the implementation to communicate to the program which choice it made.
diff mbox

Patch

diff --git a/gcc/hwint.h b/gcc/hwint.h
index 3793986..9c3eda0 100644
--- a/gcc/hwint.h
+++ b/gcc/hwint.h
@@ -246,8 +246,9 @@  sext_hwi (HOST_WIDE_INT src, unsigned int prec)
   else
     {
       gcc_checking_assert (prec < HOST_BITS_PER_WIDE_INT);
-      int shift = HOST_BITS_PER_WIDE_INT - prec;
-      return (src << shift) >> shift;
+      HOST_WIDE_INT sign_mask = HOST_WIDE_INT_1 << (prec - 1);
+      HOST_WIDE_INT value_mask = (HOST_WIDE_INT_1U << prec) - HOST_WIDE_INT_1U;
+      return (((src & value_mask) ^ sign_mask) - sign_mask);
     }
 }