Message ID | 1410437700-29512-1-git-send-email-james.greenhalgh@arm.com |
---|---|
State | New |
Headers | show |
On 11 September 2014 13:15, James Greenhalgh <james.greenhalgh@arm.com> wrote: > > Hi, > > I'd been putting this patch off in the hope that I might find > time to move these intrinsics to a C/builtin implementation, but it > is probably better to get them right for now and come back to improving > them later. > > All four of these suffer the same problem, their "lane" argument should > be a 64-bit rather than 128-bit vector. > > Fix it the obvious way. > > Tested cross on aarch64-none-eabi. > > OK? OK /Marcus
On Thu, Sep 11, 2014 at 03:26:49PM +0100, Marcus Shawcroft wrote: > On 11 September 2014 13:15, James Greenhalgh <james.greenhalgh@arm.com> wrote: > > > > Hi, > > > > I'd been putting this patch off in the hope that I might find > > time to move these intrinsics to a C/builtin implementation, but it > > is probably better to get them right for now and come back to improving > > them later. > > > > All four of these suffer the same problem, their "lane" argument should > > be a 64-bit rather than 128-bit vector. > > > > Fix it the obvious way. > > > > Tested cross on aarch64-none-eabi. > > > > OK? > > OK /Marcus > Thanks Marcus, After your offline pre-approval I've also backported this fix to the 4.9 branch as revision 215178. Cheers, James
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index c31f7e3..77e3688 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -8249,7 +8249,7 @@ vmul_n_u32 (uint32x2_t a, uint32_t b) #define vmull_high_lane_s16(a, b, c) \ __extension__ \ ({ \ - int16x8_t b_ = (b); \ + int16x4_t b_ = (b); \ int16x8_t a_ = (a); \ int32x4_t result; \ __asm__ ("smull2 %0.4s, %1.8h, %2.h[%3]" \ @@ -8262,7 +8262,7 @@ vmul_n_u32 (uint32x2_t a, uint32_t b) #define vmull_high_lane_s32(a, b, c) \ __extension__ \ ({ \ - int32x4_t b_ = (b); \ + int32x2_t b_ = (b); \ int32x4_t a_ = (a); \ int64x2_t result; \ __asm__ ("smull2 %0.2d, %1.4s, %2.s[%3]" \ @@ -8275,7 +8275,7 @@ vmul_n_u32 (uint32x2_t a, uint32_t b) #define vmull_high_lane_u16(a, b, c) \ __extension__ \ ({ \ - uint16x8_t b_ = (b); \ + uint16x4_t b_ = (b); \ uint16x8_t a_ = (a); \ uint32x4_t result; \ __asm__ ("umull2 %0.4s, %1.8h, %2.h[%3]" \ @@ -8288,7 +8288,7 @@ vmul_n_u32 (uint32x2_t a, uint32_t b) #define vmull_high_lane_u32(a, b, c) \ __extension__ \ ({ \ - uint32x4_t b_ = (b); \ + uint32x2_t b_ = (b); \ uint32x4_t a_ = (a); \ uint64x2_t result; \ __asm__ ("umull2 %0.2d, %1.4s, %2.s[%3]" \