diff mbox

x86-64: Put L(SP_INF_0) in .rodata.cst4 section [BZ #21955]

Message ID 20170815195541.GA19812@gmail.com
State New
Headers show

Commit Message

H.J. Lu Aug. 15, 2017, 7:55 p.m. UTC
On Tue, Aug 15, 2017 at 10:38:55AM -0700, H.J. Lu wrote:
> sysdeps/x86_64/fpu/e_expf.S has
> 
>         /* Here if |x| is Inf */
>         lea     L(SP_INF_0)(%rip), %rdx /* depending on sign of x: */
>         movss   (%rdx,%rax,4), %xmm0    /* return zero or Inf */
>         ret
> ...
>          .section .rodata.cst8,"aM",@progbits,8
> ...
>         .p2align 2
> L(SP_INF_0):
>         .long   0x7f800000      /* single precision Inf */
>         .long   0               /* single precision zero */
>         .type L(SP_INF_0), @object
>         ASM_SIZE_DIRECTIVE(L(SP_INF_0))
> 
> Since L(SP_INF_0) is accessed as an array of 4-byte elements, it should
> be placed in
> 
> 	.section .rodata.cst4,"aM",@progbits,4
> 
> Tested on x86-64.  Any comments?
> 
> H.J.
> ---
> 	[BZ #21955]
> 	* sysdeps/x86_64/fpu/e_expf.S (L(SP_INF_0)): Place it in
> 	.rodata.cst4 section.

L(SP_RANGE) has the same issue.  This updated patch fixes both.

H.J.
---
sysdeps/x86_64/fpu/e_expf.S has

        lea     L(SP_RANGE)(%rip), %rdx /* load over/underflow bound */
        cmpl    (%rdx,%rax,4), %ecx     /* |x|<under/overflow bound ? */
...
        /* Here if |x| is Inf */
        lea     L(SP_INF_0)(%rip), %rdx /* depending on sign of x: */
        movss   (%rdx,%rax,4), %xmm0    /* return zero or Inf */
        ret
...
         .section .rodata.cst8,"aM",@progbits,8
...
        .p2align 2
L(SP_RANGE): /* single precision overflow/underflow bounds */
        .long   0x42b17217      /* if x>this bound, then result overflows */
        .long   0x42cff1b4      /* if x<this bound, then result underflows */
        .type L(SP_RANGE), @object
        ASM_SIZE_DIRECTIVE(L(SP_RANGE))

        .p2align 2
L(SP_INF_0):
        .long   0x7f800000      /* single precision Inf */
        .long   0               /* single precision zero */
        .type L(SP_INF_0), @object
        ASM_SIZE_DIRECTIVE(L(SP_INF_0))

Since L(SP_RANGE) and L(SP_INF_0) are accessed as arrays of 4-byte
elements, they should be placed in .rodata.cst4 section.

	[BZ #21955]
	* sysdeps/x86_64/fpu/e_expf.S (L(SP_RANGE)): Place it in
	.rodata.cst4 section.
	(L(SP_INF_0)): Likewise.
---
 sysdeps/x86_64/fpu/e_expf.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Joseph Myers Aug. 15, 2017, 8:02 p.m. UTC | #1
On Tue, 15 Aug 2017, H.J. Lu wrote:

> > 	[BZ #21955]
> > 	* sysdeps/x86_64/fpu/e_expf.S (L(SP_INF_0)): Place it in
> > 	.rodata.cst4 section.
> 
> L(SP_RANGE) has the same issue.  This updated patch fixes both.

It's a lot more than just expf.  There are various other x86_64 and x86 
libm files that could use .rodata.cstN but don't.  (Obviously this only 
works for invididual objects where the code doesn't use offsets from one 
object to another, not when an array of two or more objects is being used 
unless you choose the section appropriately to preserve the array as 
such.)
H.J. Lu Aug. 15, 2017, 8:14 p.m. UTC | #2
On Tue, Aug 15, 2017 at 1:02 PM, Joseph Myers <joseph@codesourcery.com> wrote:
> On Tue, 15 Aug 2017, H.J. Lu wrote:
>
>> >     [BZ #21955]
>> >     * sysdeps/x86_64/fpu/e_expf.S (L(SP_INF_0)): Place it in
>> >     .rodata.cst4 section.
>>
>> L(SP_RANGE) has the same issue.  This updated patch fixes both.
>
> It's a lot more than just expf.  There are various other x86_64 and x86
> libm files that could use .rodata.cstN but don't.  (Obviously this only
> works for invididual objects where the code doesn't use offsets from one
> object to another, not when an array of two or more objects is being used
> unless you choose the section appropriately to preserve the array as
> such.)
>

Aren't .rodata.cstN optimization?  In case of e_expf.S, it is a correctness
issue.
Zack Weinberg Aug. 15, 2017, 8:23 p.m. UTC | #3
On Tue, Aug 15, 2017 at 4:14 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Tue, Aug 15, 2017 at 1:02 PM, Joseph Myers <joseph@codesourcery.com> wrote:
>> On Tue, 15 Aug 2017, H.J. Lu wrote:
>>
>>> >     [BZ #21955]
>>> >     * sysdeps/x86_64/fpu/e_expf.S (L(SP_INF_0)): Place it in
>>> >     .rodata.cst4 section.
>>>
>>> L(SP_RANGE) has the same issue.  This updated patch fixes both.
>>
>> It's a lot more than just expf.  There are various other x86_64 and x86
>> libm files that could use .rodata.cstN but don't.  (Obviously this only
>> works for invididual objects where the code doesn't use offsets from one
>> object to another, not when an array of two or more objects is being used
>> unless you choose the section appropriately to preserve the array as
>> such.)
>
> Aren't .rodata.cstN optimization?  In case of e_expf.S, it is a correctness
> issue.

Please elaborate - I was just about to ask whether this actually makes
any difference to anything, other than abstract tidiness of the
.rodata segment.  The code actually misbehaves without this change?
How?

zw
Joseph Myers Aug. 15, 2017, 8:28 p.m. UTC | #4
On Tue, 15 Aug 2017, H.J. Lu wrote:

> On Tue, Aug 15, 2017 at 1:02 PM, Joseph Myers <joseph@codesourcery.com> wrote:
> > On Tue, 15 Aug 2017, H.J. Lu wrote:
> >
> >> >     [BZ #21955]
> >> >     * sysdeps/x86_64/fpu/e_expf.S (L(SP_INF_0)): Place it in
> >> >     .rodata.cst4 section.
> >>
> >> L(SP_RANGE) has the same issue.  This updated patch fixes both.
> >
> > It's a lot more than just expf.  There are various other x86_64 and x86
> > libm files that could use .rodata.cstN but don't.  (Obviously this only
> > works for invididual objects where the code doesn't use offsets from one
> > object to another, not when an array of two or more objects is being used
> > unless you choose the section appropriately to preserve the array as
> > such.)
> 
> Aren't .rodata.cstN optimization?  In case of e_expf.S, it is a correctness
> issue.

They are generally optimization.

Could you explain this correctness issue in more detail?  SP_INF_0 appears 
to be an array of two 4-byte values.  Because it's used as an array, the 
two values need to stay adjacent.  That is, I'd expect it to need, for 
correctness, to be in .rodata.cst8, as it is at present, and *not* 
.rodata.cst4 (if in .rodata.cst4, it might get split up as those values 
get unified with other values in that section).
diff mbox

Patch

diff --git a/sysdeps/x86_64/fpu/e_expf.S b/sysdeps/x86_64/fpu/e_expf.S
index 4fd2bb1fb5..29e421b4bd 100644
--- a/sysdeps/x86_64/fpu/e_expf.S
+++ b/sysdeps/x86_64/fpu/e_expf.S
@@ -297,6 +297,7 @@  L(DP_P0): /* double precision polynomial coefficient P0 */
 	.type L(DP_P0), @object
 	ASM_SIZE_DIRECTIVE(L(DP_P0))
 
+	.section .rodata.cst4,"aM",@progbits,4
 	.p2align 2
 L(SP_RANGE): /* single precision overflow/underflow bounds */
 	.long	0x42b17217	/* if x>this bound, then result overflows */
@@ -311,7 +312,6 @@  L(SP_INF_0):
 	.type L(SP_INF_0), @object
 	ASM_SIZE_DIRECTIVE(L(SP_INF_0))
 
-	.section .rodata.cst4,"aM",@progbits,4
 	.p2align 2
 L(SP_RS): /* single precision 2^23+2^22 */
 	.long	0x4b400000