[AArch64] Cheap fix for argument types of vmull_high_lane_{us}{16,32}

Message ID 1410437700-29512-1-git-send-email-james.greenhalgh@arm.com
State New
Commit Message

James Greenhalgh Sept. 11, 2014, 12:15 p.m. UTC
Hi,

I'd been putting this patch off in the hope that I might find
time to move these intrinsics to a C/builtin implementation, but it
is probably better to get them right for now and come back to improving
them later.

All four of these suffer from the same problem: their "lane" argument should
be a 64-bit rather than a 128-bit vector.

Fix it the obvious way.
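
For anyone wanting to see why the lane operand is the narrow vector, here is
a rough scalar sketch of what vmull_high_lane_s16 computes: the upper four
lanes of the 128-bit first operand are each multiplied by one selected lane
of the 64-bit second operand, widening to 32 bits. The model_* types and the
function name are hypothetical stand-ins made up for this illustration; this
is not the arm_neon.h implementation, which uses inline asm.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the NEON vector types (illustration only). */
typedef struct { int16_t v[8]; } model_s16x8;  /* models int16x8_t, 128-bit */
typedef struct { int16_t v[4]; } model_s16x4;  /* models int16x4_t, 64-bit  */
typedef struct { int32_t v[4]; } model_s32x4;  /* models int32x4_t, 128-bit */

/* Scalar model of vmull_high_lane_s16 (a, b, lane).
   'a' is the 128-bit operand whose high half is used; 'b' is the 64-bit
   vector the lane is selected from, so 'lane' must be in the range 0..3.  */
static model_s32x4
model_vmull_high_lane_s16 (model_s16x8 a, model_s16x4 b, int lane)
{
  model_s32x4 result;
  for (int i = 0; i < 4; i++)
    /* Widen both inputs before multiplying so the product cannot wrap.  */
    result.v[i] = (int32_t) a.v[4 + i] * (int32_t) b.v[lane];
  return result;
}
```

With the old int16x8_t declaration for the second operand, code passing the
int16x4_t value the intrinsic is supposed to take would be rejected, which
is what the type change addresses.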

Tested cross on aarch64-none-eabi.

OK?

Thanks,
James

---
2014-09-11  James Greenhalgh  <james.greenhalgh@arm.com>

	* config/aarch64/arm_neon.h (vmull_high_lane_s16): Fix argument
	types.
	(vmull_high_lane_s32): Likewise.
	(vmull_high_lane_u16): Likewise.
	(vmull_high_lane_u32): Likewise.

Comments

Marcus Shawcroft Sept. 11, 2014, 2:26 p.m. UTC | #1
On 11 September 2014 13:15, James Greenhalgh <james.greenhalgh@arm.com> wrote:
>
> Hi,
>
> I'd been putting this patch off in the hope that I might find
> time to move these intrinsics to a C/builtin implementation, but it
> is probably better to get them right for now and come back to improving
> them later.
>
> All four of these suffer from the same problem: their "lane" argument should
> be a 64-bit rather than a 128-bit vector.
>
> Fix it the obvious way.
>
> Tested cross on aarch64-none-eabi.
>
> OK?

OK /Marcus

James Greenhalgh Sept. 11, 2014, 3:13 p.m. UTC | #2
On Thu, Sep 11, 2014 at 03:26:49PM +0100, Marcus Shawcroft wrote:
> On 11 September 2014 13:15, James Greenhalgh <james.greenhalgh@arm.com> wrote:
> >
> > Hi,
> >
> > I'd been putting this patch off in the hope that I might find
> > time to move these intrinsics to a C/builtin implementation, but it
> > is probably better to get them right for now and come back to improving
> > them later.
> >
> > All four of these suffer from the same problem: their "lane" argument should
> > be a 64-bit rather than a 128-bit vector.
> >
> > Fix it the obvious way.
> >
> > Tested cross on aarch64-none-eabi.
> >
> > OK?
> 
> OK /Marcus
> 

Thanks Marcus,

After your offline pre-approval, I've also backported this fix to the 4.9
branch as revision 215178.

Cheers,
James

Patch

diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index c31f7e3..77e3688 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -8249,7 +8249,7 @@  vmul_n_u32 (uint32x2_t a, uint32_t b)
 #define vmull_high_lane_s16(a, b, c)                                    \
   __extension__                                                         \
     ({                                                                  \
-       int16x8_t b_ = (b);                                              \
+       int16x4_t b_ = (b);                                              \
        int16x8_t a_ = (a);                                              \
        int32x4_t result;                                                \
        __asm__ ("smull2 %0.4s, %1.8h, %2.h[%3]"                         \
@@ -8262,7 +8262,7 @@  vmul_n_u32 (uint32x2_t a, uint32_t b)
 #define vmull_high_lane_s32(a, b, c)                                    \
   __extension__                                                         \
     ({                                                                  \
-       int32x4_t b_ = (b);                                              \
+       int32x2_t b_ = (b);                                              \
        int32x4_t a_ = (a);                                              \
        int64x2_t result;                                                \
        __asm__ ("smull2 %0.2d, %1.4s, %2.s[%3]"                         \
@@ -8275,7 +8275,7 @@  vmul_n_u32 (uint32x2_t a, uint32_t b)
 #define vmull_high_lane_u16(a, b, c)                                    \
   __extension__                                                         \
     ({                                                                  \
-       uint16x8_t b_ = (b);                                             \
+       uint16x4_t b_ = (b);                                             \
        uint16x8_t a_ = (a);                                             \
        uint32x4_t result;                                               \
        __asm__ ("umull2 %0.4s, %1.8h, %2.h[%3]"                         \
@@ -8288,7 +8288,7 @@  vmul_n_u32 (uint32x2_t a, uint32_t b)
 #define vmull_high_lane_u32(a, b, c)                                    \
   __extension__                                                         \
     ({                                                                  \
-       uint32x4_t b_ = (b);                                             \
+       uint32x2_t b_ = (b);                                             \
        uint32x4_t a_ = (a);                                             \
        uint64x2_t result;                                               \
        __asm__ ("umull2 %0.2d, %1.4s, %2.s[%3]"                         \