diff mbox

[gcc/MIPS] add an build-time/runtime option to disable madd.fmt

Message ID 7FDC38C0-90D4-4A32-8FD5-ED65DC469055@imgtec.com
State New
Headers show

Commit Message

Yunqiang Su Dec. 21, 2016, 6:54 p.m. UTC
By this patch, I add a build-time option ` --with-unfused-madd4=yes/no’,
and runtime option -m(no-)unfused-madd4,
to disable generate madd.fmt instructions.

These 2 options is needed due to madd.fmt/msub.fmt on Loongson are broken,
which may generate wrong calculator result.

Comments

Sandra Loosemore Dec. 21, 2016, 9:35 p.m. UTC | #1
On 12/21/2016 11:54 AM, Yunqiang Su wrote:
> By this patch, I add a build-time option ` --with-unfused-madd4=yes/no’,
> and runtime option -m(no-)unfused-madd4,
> to disable generate madd.fmt instructions.

Your patch also needs a documentation change so that the new 
command-line option is listed in the GCC manual with other MIPS target 
options.

-Sandra
Matthew Fortune Dec. 21, 2016, 9:43 p.m. UTC | #2
Sandra Loosemore <sandra@codesourcery.com> writes:
> On 12/21/2016 11:54 AM, Yunqiang Su wrote:
> > By this patch, I add a build-time option ` --with-unfused-madd4=yes/no',
> > and runtime option -m(no-)unfused-madd4,
> > to disable generate madd.fmt instructions.
> 
> Your patch also needs a documentation change so that the new
> command-line option is listed in the GCC manual with other MIPS target
> options.

Any opinions on option names to control this? Is it best to target the specific
feature that is non-compliant on loongson or apply a general -mfix-loongson
type option?

I'm not sure I have a strong opinion either way but there do seem to be
multiple possible variants.

Thanks,
Matthew
Richard Sandiford Dec. 22, 2016, 3:31 p.m. UTC | #3
Matthew Fortune <Matthew.Fortune@imgtec.com> writes:
> Sandra Loosemore <sandra@codesourcery.com> writes:
>> On 12/21/2016 11:54 AM, Yunqiang Su wrote:
>> > By this patch, I add a build-time option ` --with-unfused-madd4=yes/no',
>> > and runtime option -m(no-)unfused-madd4,
>> > to disable generate madd.fmt instructions.
>> 
>> Your patch also needs a documentation change so that the new
>> command-line option is listed in the GCC manual with other MIPS target
>> options.
>
> Any opinions on option names to control this? Is it best to target the specific
> feature that is non-compliant on loongson or apply a general -mfix-loongson
> type option?
>
> I'm not sure I have a strong opinion either way but there do seem to be
> multiple possible variants.

Wasn't sure from this thread whether Loongson simply had a fused
implementation (without intermediate rounding) or whether the
instructions gave numerically incorrect results for some inputs.
It sounds from a later thread like it's generating incorrect results,
is that right?  If so, then FWIW I agree an -mfix option would be more
consistent.  E.g. one of the -mfix-vr4120 errata was an incorrect
integer division result and one of the -mfix-sb1 errata was an incorrect
single-precision float division result.  The latter case could have been
handled by an option to disable DIV.S and DIV.PS, but the -mfix option
gave more control.

If instead the problem is that the instructions are fused then that's
also what the original MIPS 4 parts did, so maybe an option to control
fusedness would make sense.

Thanks,
Richard
Yunqiang Su Dec. 22, 2016, 3:48 p.m. UTC | #4
> 在 2016年12月22日,23:31,Richard Sandiford <rdsandiford@googlemail.com> 写道:

> 

> Matthew Fortune <Matthew.Fortune@imgtec.com> writes:

>> Sandra Loosemore <sandra@codesourcery.com> writes:

>>> On 12/21/2016 11:54 AM, Yunqiang Su wrote:

>>>> By this patch, I add a build-time option ` --with-unfused-madd4=yes/no',

>>>> and runtime option -m(no-)unfused-madd4,

>>>> to disable generate madd.fmt instructions.

>>> 

>>> Your patch also needs a documentation change so that the new

>>> command-line option is listed in the GCC manual with other MIPS target

>>> options.

>> 

>> Any opinions on option names to control this? Is it best to target the specific

>> feature that is non-compliant on loongson or apply a general -mfix-loongson

>> type option?

>> 

>> I'm not sure I have a strong opinion either way but there do seem to be

>> multiple possible variants.

> 

> Wasn't sure from this thread whether Loongson simply had a fused

> implementation (without intermediate rounding) or whether the

> instructions gave numerically incorrect results for some inputs.


I test to define ISA_HAS_FUSED_MADD4 true and 
define ISA_HAS_UNFUSED_MADD4 false, and try to build a test case.
With ISA_HAS_FUSED_MADD4, the result is about 1e-17,
and with ISA_HAS_UNFUSED_MADD4, the result is about 1e-17,
both of the are incorrect (the expect value is 0).

The test case is 

#include <stdio.h>

double a = 0.6;
double b = 0.4;
double c = 0.6;
double d = 0.4;

int main(void)
{
        double x = a * b - c * d;
        printf("%le\n", x);
        return 0;
}


> It sounds from a later thread like it's generating incorrect results,

> is that right?  If so, then FWIW I agree an -mfix option would be more

> consistent.  E.g. one of the -mfix-vr4120 errata was an incorrect

> integer division result and one of the -mfix-sb1 errata was an incorrect

> single-precision float division result.  The latter case could have been

> handled by an option to disable DIV.S and DIV.PS, but the -mfix option

> gave more control.

> 

> If instead the problem is that the instructions are fused then that's

> also what the original MIPS 4 parts did, so maybe an option to control

> fusedness would make sense.


The result to thread it fused or unfused, is different, while neither of them
is correct.

> 

> Thanks,

> Richard
Yunqiang Su Dec. 22, 2016, 3:56 p.m. UTC | #5
> 在 2016年12月22日,23:48,Yunqiang Su <Yunqiang.Su@imgtec.com> 写道:

> 

>> 

>> 在 2016年12月22日,23:31,Richard Sandiford <rdsandiford@googlemail.com> 写道:

>> 

>> Matthew Fortune <Matthew.Fortune@imgtec.com> writes:

>>> Sandra Loosemore <sandra@codesourcery.com> writes:

>>>> On 12/21/2016 11:54 AM, Yunqiang Su wrote:

>>>>> By this patch, I add a build-time option ` --with-unfused-madd4=yes/no',

>>>>> and runtime option -m(no-)unfused-madd4,

>>>>> to disable generate madd.fmt instructions.

>>>> 

>>>> Your patch also needs a documentation change so that the new

>>>> command-line option is listed in the GCC manual with other MIPS target

>>>> options.

>>> 

>>> Any opinions on option names to control this? Is it best to target the specific

>>> feature that is non-compliant on loongson or apply a general -mfix-loongson

>>> type option?

>>> 

>>> I'm not sure I have a strong opinion either way but there do seem to be

>>> multiple possible variants.

>> 

>> Wasn't sure from this thread whether Loongson simply had a fused

>> implementation (without intermediate rounding) or whether the

>> instructions gave numerically incorrect results for some inputs.

> 

> I test to define ISA_HAS_FUSED_MADD4 true and 

> define ISA_HAS_UNFUSED_MADD4 false, and try to build a test case.

> With ISA_HAS_FUSED_MADD4, the result is about 1e-17,

> and with ISA_HAS_UNFUSED_MADD4, the result is about 1e-17,

> both of the are incorrect (the expect value is 0).

> 

> The test case is 

> 

> #include <stdio.h>

> 

> double a = 0.6;

> double b = 0.4;

> double c = 0.6;

> double d = 0.4;

> 

> int main(void)

> {

>        double x = a * b - c * d;

>        printf("%le\n", x);

>        return 0;

> }

> 

> 

>> It sounds from a later thread like it's generating incorrect results,

>> is that right?  If so, then FWIW I agree an -mfix option would be more

>> consistent.  E.g. one of the -mfix-vr4120 errata was an incorrect

>> integer division result and one of the -mfix-sb1 errata was an incorrect

>> single-precision float division result.  The latter case could have been

>> handled by an option to disable DIV.S and DIV.PS, but the -mfix option

>> gave more control.

>> 

>> If instead the problem is that the instructions are fused then that's

>> also what the original MIPS 4 parts did, so maybe an option to control

>> fusedness would make sense.

> 

> The result to thread it fused or unfused, is different, while neither of them

> is correct.


ohh, the result are same, and neither is correct.
both of them are 1.332268e-17.

> 

>> 

>> Thanks,

>> Richard
Richard Sandiford Dec. 22, 2016, 4:18 p.m. UTC | #6
Yunqiang Su <Yunqiang.Su@imgtec.com> writes:
>> 在 2016年12月22日,23:48,Yunqiang Su <Yunqiang.Su@imgtec.com> 写道:
>> 
>>> 
>>> 在 2016年12月22日,23:31,Richard Sandiford
>>> <rdsandiford@googlemail.com> 写道:
>>> 
>>> Matthew Fortune <Matthew.Fortune@imgtec.com> writes:
>>>> Sandra Loosemore <sandra@codesourcery.com> writes:
>>>>> On 12/21/2016 11:54 AM, Yunqiang Su wrote:
>>>>>> By this patch, I add a build-time option ` --with-unfused-madd4=yes/no',
>>>>>> and runtime option -m(no-)unfused-madd4,
>>>>>> to disable generate madd.fmt instructions.
>>>>> 
>>>>> Your patch also needs a documentation change so that the new
>>>>> command-line option is listed in the GCC manual with other MIPS target
>>>>> options.
>>>> 
>>>> Any opinions on option names to control this? Is it best to target
>>>> the specific
>>>> feature that is non-compliant on loongson or apply a general -mfix-loongson
>>>> type option?
>>>> 
>>>> I'm not sure I have a strong opinion either way but there do seem to be
>>>> multiple possible variants.
>>> 
>>> Wasn't sure from this thread whether Loongson simply had a fused
>>> implementation (without intermediate rounding) or whether the
>>> instructions gave numerically incorrect results for some inputs.
>> 
>> I test to define ISA_HAS_FUSED_MADD4 true and 
>> define ISA_HAS_UNFUSED_MADD4 false, and try to build a test case.
>> With ISA_HAS_FUSED_MADD4, the result is about 1e-17,
>> and with ISA_HAS_UNFUSED_MADD4, the result is about 1e-17,
>> both of the are incorrect (the expect value is 0).
>> 
>> The test case is 
>> 
>> #include <stdio.h>
>> 
>> double a = 0.6;
>> double b = 0.4;
>> double c = 0.6;
>> double d = 0.4;
>> 
>> int main(void)
>> {
>>        double x = a * b - c * d;
>>        printf("%le\n", x);
>>        return 0;
>> }
>> 
>> 
>>> It sounds from a later thread like it's generating incorrect results,
>>> is that right?  If so, then FWIW I agree an -mfix option would be more
>>> consistent.  E.g. one of the -mfix-vr4120 errata was an incorrect
>>> integer division result and one of the -mfix-sb1 errata was an incorrect
>>> single-precision float division result.  The latter case could have been
>>> handled by an option to disable DIV.S and DIV.PS, but the -mfix option
>>> gave more control.
>>> 
>>> If instead the problem is that the instructions are fused then that's
>>> also what the original MIPS 4 parts did, so maybe an option to control
>>> fusedness would make sense.
>> 
>> The result to thread it fused or unfused, is different, while neither of them
>> is correct.
>
> ohh, the result are same, and neither is correct.
> both of them are 1.332268e-17.

That's the expected result for an implementation in which the subtraction
is fused with the first multiplication without intermediate rounding.
The second 0.4 * 0.6 isn't exactly representable and rounds down, to a
value slightly less than 0.24.  Then the fused operation subtracts this
value from the exact result of the first 0.4 * 0.6 (0.24), giving a
value slightly greater than 0.

I see Paul Hua's patch does add Loongson to the list of targets
with a fused implementation (should have checked earlier, sorry).
So I think after that patch we would do the right thing.  (In particular,
-ffp-contract=off would then disable the fusing.)

Thanks,
Richard
Yunqiang Su Dec. 22, 2016, 4:38 p.m. UTC | #7
> 在 2016年12月23日,00:18,Richard Sandiford <rdsandiford@googlemail.com> 写道:

> 

> Yunqiang Su <Yunqiang.Su@imgtec.com> writes:

>>> 在 2016年12月22日,23:48,Yunqiang Su <Yunqiang.Su@imgtec.com> 写道:

>>> 

>>>> 

>>>> 在 2016年12月22日,23:31,Richard Sandiford

>>>> <rdsandiford@googlemail.com> 写道:

>>>> 

>>>> Matthew Fortune <Matthew.Fortune@imgtec.com> writes:

>>>>> Sandra Loosemore <sandra@codesourcery.com> writes:

>>>>>> On 12/21/2016 11:54 AM, Yunqiang Su wrote:

>>>>>>> By this patch, I add a build-time option ` --with-unfused-madd4=yes/no',

>>>>>>> and runtime option -m(no-)unfused-madd4,

>>>>>>> to disable generate madd.fmt instructions.

>>>>>> 

>>>>>> Your patch also needs a documentation change so that the new

>>>>>> command-line option is listed in the GCC manual with other MIPS target

>>>>>> options.

>>>>> 

>>>>> Any opinions on option names to control this? Is it best to target

>>>>> the specific

>>>>> feature that is non-compliant on loongson or apply a general -mfix-loongson

>>>>> type option?

>>>>> 

>>>>> I'm not sure I have a strong opinion either way but there do seem to be

>>>>> multiple possible variants.

>>>> 

>>>> Wasn't sure from this thread whether Loongson simply had a fused

>>>> implementation (without intermediate rounding) or whether the

>>>> instructions gave numerically incorrect results for some inputs.

>>> 

>>> I test to define ISA_HAS_FUSED_MADD4 true and 

>>> define ISA_HAS_UNFUSED_MADD4 false, and try to build a test case.

>>> With ISA_HAS_FUSED_MADD4, the result is about 1e-17,

>>> and with ISA_HAS_UNFUSED_MADD4, the result is about 1e-17,

>>> both of the are incorrect (the expect value is 0).

>>> 

>>> The test case is 

>>> 

>>> #include <stdio.h>

>>> 

>>> double a = 0.6;

>>> double b = 0.4;

>>> double c = 0.6;

>>> double d = 0.4;

>>> 

>>> int main(void)

>>> {

>>>       double x = a * b - c * d;

>>>       printf("%le\n", x);

>>>       return 0;

>>> }

>>> 

>>> 

>>>> It sounds from a later thread like it's generating incorrect results,

>>>> is that right?  If so, then FWIW I agree an -mfix option would be more

>>>> consistent.  E.g. one of the -mfix-vr4120 errata was an incorrect

>>>> integer division result and one of the -mfix-sb1 errata was an incorrect

>>>> single-precision float division result.  The latter case could have been

>>>> handled by an option to disable DIV.S and DIV.PS, but the -mfix option

>>>> gave more control.

>>>> 

>>>> If instead the problem is that the instructions are fused then that's

>>>> also what the original MIPS 4 parts did, so maybe an option to control

>>>> fusedness would make sense.

>>> 

>>> The result to thread it fused or unfused, is different, while neither of them

>>> is correct.

>> 

>> ohh, the result are same, and neither is correct.

>> both of them are 1.332268e-17.

> 

> That's the expected result for an implementation in which the subtraction

> is fused with the first multiplication without intermediate rounding.

> The second 0.4 * 0.6 isn't exactly representable and rounds down, to a

> value slightly less than 0.24.  Then the fused operation subtracts this

> value from the exact result of the first 0.4 * 0.6 (0.24), giving a

> value slightly greater than 0.


I see. But I think 1.332268e-17 is too big than expected.
1-e17 for double is totally unacceptable.

> 

> I see Paul Hua's patch does add Loongson to the list of targets

> with a fused implementation (should have checked earlier, sorry).

> So I think after that patch we would do the right thing.  (In particular,

> -ffp-contract=off would then disable the fusing.)


will -ffp-contract=off disable some other optimization?
If so, I don’t think that will be an ideal choice for distributions, like Debian. 

> 

> Thanks,

> Richard
Paul Hua Dec. 23, 2016, 2:47 a.m. UTC | #8
On aarch64 target the result are 1.332268e-17.
On x86 with fma target the result are also 1.332268e-17.

so, I don't think the Loongson's madd.fmt/msub.fmt is incorrect.

We should do something for usage of fused madd, the all things has
been tested an fedora21 remix for loongson(1).

1, gcc:add loongson to the list of targets with a fused implementation.
2, glibc:__fma() function,we should use madd.fmt like aarch64.
see patch: http://www.loongnix.org/cgit/glibc-2.20/commit/?id=14023742e6ef571b61439d0d7bb7939e663fe624
3, kernel: the emulation  when a float exception taken.

(1),http://www.loongnix.org/index.php/Loongnix-20161130%E7%89%88%E6%9C%AC%E5%9C%A8Fedora21_remix%E7%B3%BB%E7%BB%9F%E4%B8%8A%E5%8F%91%E5%B8%83

Thanks,
Paul

On Fri, Dec 23, 2016 at 12:38 AM, Yunqiang Su <Yunqiang.Su@imgtec.com> wrote:
>
>> 在 2016年12月23日,00:18,Richard Sandiford <rdsandiford@googlemail.com> 写道:
>>
>> Yunqiang Su <Yunqiang.Su@imgtec.com> writes:
>>>> 在 2016年12月22日,23:48,Yunqiang Su <Yunqiang.Su@imgtec.com> 写道:
>>>>
>>>>>
>>>>> 在 2016年12月22日,23:31,Richard Sandiford
>>>>> <rdsandiford@googlemail.com> 写道:
>>>>>
>>>>> Matthew Fortune <Matthew.Fortune@imgtec.com> writes:
>>>>>> Sandra Loosemore <sandra@codesourcery.com> writes:
>>>>>>> On 12/21/2016 11:54 AM, Yunqiang Su wrote:
>>>>>>>> By this patch, I add a build-time option ` --with-unfused-madd4=yes/no',
>>>>>>>> and runtime option -m(no-)unfused-madd4,
>>>>>>>> to disable generate madd.fmt instructions.
>>>>>>>
>>>>>>> Your patch also needs a documentation change so that the new
>>>>>>> command-line option is listed in the GCC manual with other MIPS target
>>>>>>> options.
>>>>>>
>>>>>> Any opinions on option names to control this? Is it best to target
>>>>>> the specific
>>>>>> feature that is non-compliant on loongson or apply a general -mfix-loongson
>>>>>> type option?
>>>>>>
>>>>>> I'm not sure I have a strong opinion either way but there do seem to be
>>>>>> multiple possible variants.
>>>>>
>>>>> Wasn't sure from this thread whether Loongson simply had a fused
>>>>> implementation (without intermediate rounding) or whether the
>>>>> instructions gave numerically incorrect results for some inputs.
>>>>
>>>> I test to define ISA_HAS_FUSED_MADD4 true and
>>>> define ISA_HAS_UNFUSED_MADD4 false, and try to build a test case.
>>>> With ISA_HAS_FUSED_MADD4, the result is about 1e-17,
>>>> and with ISA_HAS_UNFUSED_MADD4, the result is about 1e-17,
>>>> both of the are incorrect (the expect value is 0).
>>>>
>>>> The test case is
>>>>
>>>> #include <stdio.h>
>>>>
>>>> double a = 0.6;
>>>> double b = 0.4;
>>>> double c = 0.6;
>>>> double d = 0.4;
>>>>
>>>> int main(void)
>>>> {
>>>>       double x = a * b - c * d;
>>>>       printf("%le\n", x);
>>>>       return 0;
>>>> }
>>>>
>>>>
>>>>> It sounds from a later thread like it's generating incorrect results,
>>>>> is that right?  If so, then FWIW I agree an -mfix option would be more
>>>>> consistent.  E.g. one of the -mfix-vr4120 errata was an incorrect
>>>>> integer division result and one of the -mfix-sb1 errata was an incorrect
>>>>> single-precision float division result.  The latter case could have been
>>>>> handled by an option to disable DIV.S and DIV.PS, but the -mfix option
>>>>> gave more control.
>>>>>
>>>>> If instead the problem is that the instructions are fused then that's
>>>>> also what the original MIPS 4 parts did, so maybe an option to control
>>>>> fusedness would make sense.
>>>>
>>>> The result to thread it fused or unfused, is different, while neither of them
>>>> is correct.
>>>
>>> ohh, the result are same, and neither is correct.
>>> both of them are 1.332268e-17.
>>
>> That's the expected result for an implementation in which the subtraction
>> is fused with the first multiplication without intermediate rounding.
>> The second 0.4 * 0.6 isn't exactly representable and rounds down, to a
>> value slightly less than 0.24.  Then the fused operation subtracts this
>> value from the exact result of the first 0.4 * 0.6 (0.24), giving a
>> value slightly greater than 0.
>
> I see. But I think 1.332268e-17 is too big than expected.
> 1-e17 for double is totally unacceptable.
>
>>
>> I see Paul Hua's patch does add Loongson to the list of targets
>> with a fused implementation (should have checked earlier, sorry).
>> So I think after that patch we would do the right thing.  (In particular,
>> -ffp-contract=off would then disable the fusing.)
>
> will -ffp-contract=off disable some other optimization?
> If so, I don’t think that will be an ideal choice for distributions, like Debian.
>
>>
>> Thanks,
>> Richard
>
Yunqiang Su Dec. 23, 2016, 4:06 a.m. UTC | #9
> 在 2016年12月23日,10:47,Paul Hua <paul.hua.gm@gmail.com> 写道:

> 

> On aarch64 target the result are 1.332268e-17.

> On x86 with fma target the result are also 1.332268e-17.

> 

> so, I don't think the Loongson's madd.fmt/msub.fmt is incorrect.


OMG. Will this behavior make some app wrong working abnormal?


> 

> We should do something for usage of fused madd, the all things has

> been tested an fedora21 remix for loongson(1).


We met lots of ftbfs on Debian with madd.fmt enabled.
So maybe FMA should not be used for normal applications at all?
and for generic linux distributions FMA shouldn’t be enabled by default.

> 

> 1, gcc:add loongson to the list of targets with a fused implementation.


If so, I think we should do this.

> 2, glibc:__fma() function,we should use madd.fmt like aarch64.

> see patch: http://www.loongnix.org/cgit/glibc-2.20/commit/?id=14023742e6ef571b61439d0d7bb7939e663fe624


Your patch shouldn’t be apply to upstream, as you don’t consider old CPUs without FMA.
In fact all MIPS <= MIPSr5 don’t have FMA.

> 3, kernel: the emulation  when a float exception taken.


The big problem is that Loongson use the same encode for (unfused) madd.fmt to (fused) madd.fmt.
We cannot trap this in kernel.

> 

> (1),http://www.loongnix.org/index.php/Loongnix-20161130%E7%89%88%E6%9C%AC%E5%9C%A8Fedora21_remix%E7%B3%BB%E7%BB%9F%E4%B8%8A%E5%8F%91%E5%B8%83

> 

> Thanks,

> Paul

> 

> On Fri, Dec 23, 2016 at 12:38 AM, Yunqiang Su <Yunqiang.Su@imgtec.com> wrote:

>> 

>>> 在 2016年12月23日,00:18,Richard Sandiford <rdsandiford@googlemail.com> 写道:

>>> 

>>> Yunqiang Su <Yunqiang.Su@imgtec.com> writes:

>>>>> 在 2016年12月22日,23:48,Yunqiang Su <Yunqiang.Su@imgtec.com> 写道:

>>>>> 

>>>>>> 

>>>>>> 在 2016年12月22日,23:31,Richard Sandiford

>>>>>> <rdsandiford@googlemail.com> 写道:

>>>>>> 

>>>>>> Matthew Fortune <Matthew.Fortune@imgtec.com> writes:

>>>>>>> Sandra Loosemore <sandra@codesourcery.com> writes:

>>>>>>>> On 12/21/2016 11:54 AM, Yunqiang Su wrote:

>>>>>>>>> By this patch, I add a build-time option ` --with-unfused-madd4=yes/no',

>>>>>>>>> and runtime option -m(no-)unfused-madd4,

>>>>>>>>> to disable generate madd.fmt instructions.

>>>>>>>> 

>>>>>>>> Your patch also needs a documentation change so that the new

>>>>>>>> command-line option is listed in the GCC manual with other MIPS target

>>>>>>>> options.

>>>>>>> 

>>>>>>> Any opinions on option names to control this? Is it best to target

>>>>>>> the specific

>>>>>>> feature that is non-compliant on loongson or apply a general -mfix-loongson

>>>>>>> type option?

>>>>>>> 

>>>>>>> I'm not sure I have a strong opinion either way but there do seem to be

>>>>>>> multiple possible variants.

>>>>>> 

>>>>>> Wasn't sure from this thread whether Loongson simply had a fused

>>>>>> implementation (without intermediate rounding) or whether the

>>>>>> instructions gave numerically incorrect results for some inputs.

>>>>> 

>>>>> I test to define ISA_HAS_FUSED_MADD4 true and

>>>>> define ISA_HAS_UNFUSED_MADD4 false, and try to build a test case.

>>>>> With ISA_HAS_FUSED_MADD4, the result is about 1e-17,

>>>>> and with ISA_HAS_UNFUSED_MADD4, the result is about 1e-17,

>>>>> both of the are incorrect (the expect value is 0).

>>>>> 

>>>>> The test case is

>>>>> 

>>>>> #include <stdio.h>

>>>>> 

>>>>> double a = 0.6;

>>>>> double b = 0.4;

>>>>> double c = 0.6;

>>>>> double d = 0.4;

>>>>> 

>>>>> int main(void)

>>>>> {

>>>>>      double x = a * b - c * d;

>>>>>      printf("%le\n", x);

>>>>>      return 0;

>>>>> }

>>>>> 

>>>>> 

>>>>>> It sounds from a later thread like it's generating incorrect results,

>>>>>> is that right?  If so, then FWIW I agree an -mfix option would be more

>>>>>> consistent.  E.g. one of the -mfix-vr4120 errata was an incorrect

>>>>>> integer division result and one of the -mfix-sb1 errata was an incorrect

>>>>>> single-precision float division result.  The latter case could have been

>>>>>> handled by an option to disable DIV.S and DIV.PS, but the -mfix option

>>>>>> gave more control.

>>>>>> 

>>>>>> If instead the problem is that the instructions are fused then that's

>>>>>> also what the original MIPS 4 parts did, so maybe an option to control

>>>>>> fusedness would make sense.

>>>>> 

>>>>> The result to thread it fused or unfused, is different, while neither of them

>>>>> is correct.

>>>> 

>>>> ohh, the result are same, and neither is correct.

>>>> both of them are 1.332268e-17.

>>> 

>>> That's the expected result for an implementation in which the subtraction

>>> is fused with the first multiplication without intermediate rounding.

>>> The second 0.4 * 0.6 isn't exactly representable and rounds down, to a

>>> value slightly less than 0.24.  Then the fused operation subtracts this

>>> value from the exact result of the first 0.4 * 0.6 (0.24), giving a

>>> value slightly greater than 0.

>> 

>> I see. But I think 1.332268e-17 is too big than expected.

>> 1-e17 for double is totally unacceptable.

>> 

>>> 

>>> I see Paul Hua's patch does add Loongson to the list of targets

>>> with a fused implementation (should have checked earlier, sorry).

>>> So I think after that patch we would do the right thing.  (In particular,

>>> -ffp-contract=off would then disable the fusing.)

>> 

>> will -ffp-contract=off disable some other optimization?

>> If so, I don’t think that will be an ideal choice for distributions, like Debian.

>> 

>>> 

>>> Thanks,

>>> Richard

>>
Yunqiang Su Dec. 24, 2016, 9:11 a.m. UTC | #10
> 在 2016年12月22日,23:31,Richard Sandiford <rdsandiford@googlemail.com> 写道:

> 

> Matthew Fortune <Matthew.Fortune@imgtec.com> writes:

>> Sandra Loosemore <sandra@codesourcery.com> writes:

>>> On 12/21/2016 11:54 AM, Yunqiang Su wrote:

>>>> By this patch, I add a build-time option ` --with-unfused-madd4=yes/no',

>>>> and runtime option -m(no-)unfused-madd4,

>>>> to disable generate madd.fmt instructions.

>>> 

>>> Your patch also needs a documentation change so that the new

>>> command-line option is listed in the GCC manual with other MIPS target

>>> options.

>> 

>> Any opinions on option names to control this? Is it best to target the specific

>> feature that is non-compliant on loongson or apply a general -mfix-loongson

>> type option?

>> 

>> I'm not sure I have a strong opinion either way but there do seem to be

>> multiple possible variants.

> 

> Wasn't sure from this thread whether Loongson simply had a fused

> implementation (without intermediate rounding) or whether the


I test some cases on Loongson 3A 1000 with glibc’s test: math/test-double.
        -march=mips64r2

1.  set ISA_HAS_FUSED_MADD4 1
     set ISA_HAS_UNFUSED_MADD4 0
   
     56 errors occurred, which is the same with amd64 with avx2 enabled.

2.  set ISA_HAS_FUSED_MADD4 0
     set ISA_HAS_UNFUSED_MADD4 0

    no error happens

3. set ISA_HAS_FUSED_MADD4 0
    set ISA_HAS_UNFUSED_MADD4 1
   
    816 errors occurred.
  
So it seems Loongson’s madd.fmt is are fused.


> instructions gave numerically incorrect results for some inputs.

> It sounds from a later thread like it's generating incorrect results,

> is that right?  If so, then FWIW I agree an -mfix option would be more

> consistent.  E.g. one of the -mfix-vr4120 errata was an incorrect

> integer division result and one of the -mfix-sb1 errata was an incorrect

> single-precision float division result.  The latter case could have been

> handled by an option to disable DIV.S and DIV.PS, but the -mfix option

> gave more control.

> 

> If instead the problem is that the instructions are fused then that's

> also what the original MIPS 4 parts did, so maybe an option to control

> fusedness would make sense.


Since for binary Linux distributions, like Debian, we need to generate
generic output, we will need an option to disable madd.fmt at all.

Besides this, should we have another options, like
    -mmadd-is-fused
    -mno-madd-is-fused
?

Ohh, maybe we should have options like these?
    build-time
         —with-madd=fused/unfused/no
    runtime
         -mmadd=fused/unfused/no


> 

> Thanks,

> Richard
Maciej W. Rozycki Jan. 11, 2017, 5:40 p.m. UTC | #11
On Fri, 23 Dec 2016, Yunqiang Su wrote:

> > 3, kernel: the emulation  when a float exception taken.
> 
> The big problem is that Loongson use the same encode for (unfused) 
> madd.fmt to (fused) madd.fmt. We cannot trap this in kernel.

 Why is that a problem?  Just add a setting like `cpu_has_fused_madd' to 
<asm/cpu-features.h>, switch processing in the emulator accordingly and 
initialise the setting early on in bootstrap as processor features are 
determined.  We'd have to do it likewise for the R8000 MIPS IV CPU Richard 
referred to previously if we had full support for this processor (unlikely 
to happen).

 Cc-ing <linux-mips@linux-mips.org> in case you want to discuss it further 
with the kernel developers; arguably it's a kernel bug already that the 
FPU as perceived by user software has different semantics on FPU Loongson 
hardware depending on whether or not the `nofpu' kernel parameter has been 
set (I don't know if denormals are handled by Loongson FPU hardware or 
trap with an Unimplemented Operation exception, but if the former, then 
this inconsistency applies for such data in the regular FPU mode as well).

 HTH,

  Maciej
diff mbox

Patch

diff --git a/src/gcc/config.gcc b/src/gcc/config.gcc
index 1b7da0e..3a30b44 100644
--- a/src/gcc/config.gcc
+++ b/src/gcc/config.gcc
@@ -3991,7 +3991,7 @@  case "${target}" in
 		;;
 
 	mips*-*-*)
-		supported_defaults="abi arch arch_32 arch_64 float fpu nan fp_32 odd_spreg_32 tune tune_32 tune_64 divide llsc mips-plt synci"
+		supported_defaults="abi arch arch_32 arch_64 float fpu nan fp_32 odd_spreg_32 unfused_madd4 tune tune_32 tune_64 divide llsc mips-plt synci"
 
 		case ${with_float} in
 		"" | soft | hard)
@@ -4048,6 +4048,19 @@  case "${target}" in
 			exit 1
 			;;
 		esac
+		
+		case ${with_unfused_madd4} in
+		"" | yes)
+			with_unfused_madd4="unfused-madd4"
+			;;
+		no)
+			with_unfused_madd4="no-unfused-madd4"
+			;;
+		*)
+			echo "Unknown unfused_madd4 type used in --with-unfused-madd4=$with_unfused_madd4" 1>&2
+			exit 1
+			;;
+		esac
 
 		case ${with_abi} in
 		"" | 32 | o64 | n32 | 64 | eabi)
@@ -4547,7 +4560,7 @@  case ${target} in
 esac
 
 t=
-all_defaults="abi cpu cpu_32 cpu_64 arch arch_32 arch_64 tune tune_32 tune_64 schedule float mode fpu nan fp_32 odd_spreg_32 divide llsc mips-plt synci tls"
+all_defaults="abi cpu cpu_32 cpu_64 arch arch_32 arch_64 tune tune_32 tune_64 schedule float mode fpu nan fp_32 odd_spreg_32 unfused_madd4 divide llsc mips-plt synci tls"
 for option in $all_defaults
 do
 	eval "val=\$with_"`echo $option | sed s/-/_/g`
diff --git a/src/gcc/config/mips/mips.c b/src/gcc/config/mips/mips.c
index 5af3d1e..236ab94 100644
--- a/src/gcc/config/mips/mips.c
+++ b/src/gcc/config/mips/mips.c
@@ -17992,6 +17992,16 @@  mips_option_override (void)
          will be produced.  */
       target_flags |= MASK_ODD_SPREG;
     }
+  
+  /* If neither -munfused-madd nor -mno-unfused-madd was given on the command
+     line, set MASK_UNFSUED_MADD based on the ISA.  */
+  if ((target_flags_explicit & MASK_UNFUSED_MADD4) == 0)
+    {
+      if (!ISA_HAS_UNFUSED_MADD4)
+	target_flags &= ~MASK_UNFUSED_MADD4;
+      else
+	target_flags |= MASK_UNFUSED_MADD4;
+    }
 
   if (!ISA_HAS_COMPACT_BRANCHES && mips_cb == MIPS_CB_ALWAYS)
     {
diff --git a/src/gcc/config/mips/mips.h b/src/gcc/config/mips/mips.h
index 763ca58..8c7d24e 100644
--- a/src/gcc/config/mips/mips.h
+++ b/src/gcc/config/mips/mips.h
@@ -892,6 +892,7 @@  struct mips_cpu_info {
 	    ":%{!msoft-float:%{!msingle-float:%{!mfp*:-mfp%(VALUE)}}}}" }, \
   {"odd_spreg_32", "%{" OPT_ARCH32 ":%{!msoft-float:%{!msingle-float:" \
 		   "%{!modd-spreg:%{!mno-odd-spreg:-m%(VALUE)}}}}}" }, \
+  {"unfused_madd4", "%{!munfused-madd4:%{!mno-unfused-madd4:-m%(VALUE)}}" }, \
   {"divide", "%{!mdivide-traps:%{!mdivide-breaks:-mdivide-%(VALUE)}}" }, \
   {"llsc", "%{!mllsc:%{!mno-llsc:-m%(VALUE)}}" }, \
   {"mips-plt", "%{!mplt:%{!mno-plt:-m%(VALUE)}}" }, \
@@ -1089,7 +1090,7 @@  struct mips_cpu_info {
 
 /* ISA has 4 operand unfused madd instructions of the form
    'd = [+-] (a * b [+-] c)'.  */
-#define ISA_HAS_UNFUSED_MADD4	(ISA_HAS_FP4 && !TARGET_MIPS8000)
+#define ISA_HAS_UNFUSED_MADD4	(ISA_HAS_FP4 && !TARGET_MIPS8000 && TARGET_UNFUSED_MADD4)
 
 /* ISA has 3 operand r6 fused madd instructions of the form
    'c = c [+-] (a * b)'.  */
diff --git a/src/gcc/config/mips/mips.opt b/src/gcc/config/mips/mips.opt
index ebd67e4..a8c23f6 100644
--- a/src/gcc/config/mips/mips.opt
+++ b/src/gcc/config/mips/mips.opt
@@ -412,6 +412,10 @@  modd-spreg
 Target Report Mask(ODD_SPREG)
 Enable use of odd-numbered single-precision registers.
 
+munfused-madd4
+Target Report Mask(UNFUSED_MADD4)
+Enable unfused multiply-add/multiply-sub instruction, aka madd.fmt/msub.fmt.
+
 mframe-header-opt
 Target Report Var(flag_frame_header_optimization) Optimization
 Optimize frame header.