[PATCH] D147235: [AArch64] Remove redundant `mov 0` instruction for high 64-bits
JinGu Kang via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Mar 30 07:25:34 PDT 2023
jaykang10 created this revision.
jaykang10 added reviewers: dmgreen, samtebbs, efriedma.
Herald added subscribers: hiraditya, kristof.beyls.
Herald added a project: All.
jaykang10 requested review of this revision.
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.
gcc generates fewer instructions than llvm for the intrinsic example below.
#include <arm_neon.h>
float16x8_t test1(const float32x4_t a) {
  float16x4_t b = vcvt_f16_f32(a);
  return vcombine_f16(b, vdup_n_f16(0.0));
}
uint8x8_t test2(uint16_t *in, uint8x8_t *dst, uint8x8_t idx) {
  return vtbl1_u8(vshrn_n_u16(vld1q_u16(in), 4), idx);
}
gcc output
test1:
fcvtn v0.4h, v0.4s
fmov d0, d0
ret
test2:
ldr q1, [x0]
shrn v1.8b, v1.8h, 4
tbl v0.8b, {v1.16b}, v0.8b
ret
llvm output
test1: // @test1
movi d1, #0000000000000000
fcvtn v0.4h, v0.4s
mov v0.d[1], v1.d[0]
ret
test2: // @test2
ldr q1, [x0]
movi v2.2d, #0000000000000000
shrn v1.8b, v1.8h, #4
mov v1.d[1], v2.d[0]
tbl v0.8b, { v1.16b }, v0.8b
ret
The `fcvtn` and `shrn` instructions implicitly zero the high 64 bits of the destination register, so no separate `mov 0` instruction is needed for the high 64 bits. It looks like gcc has patterns for these cases. For example, this is the gcc rtl pattern for the `shrn` in the test2 function:
(define_insn "aarch64_shrn<mode>_insn_le"
  [(set (match_operand:<VNARROWQ2> 0 "register_operand" "=w")
        (vec_concat:<VNARROWQ2>
          (truncate:<VNARROWQ>
            (lshiftrt:VQN (match_operand:VQN 1 "register_operand" "w")
              (match_operand:VQN 2 "aarch64_simd_shift_imm_vec_<vn_mode>")))
          (match_operand:<VNARROWQ> 3 "aarch64_simd_or_scalar_imm_zero")))]
  "TARGET_SIMD && !BYTES_BIG_ENDIAN"
  "shrn\\t%0.<Vntype>, %1.<Vtype>, %2"
  [(set_attr "type" "neon_shift_imm_narrow_q")]
)
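To illustrate why the extra zeroing is redundant, here is a minimal portable C model of the register semantics described above. It is only a sketch: `vreg128`, `model_shrn_8b` and `model_ins_zero_high` are hypothetical names invented for this illustration, not part of the patch or of any compiler.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit SIMD register as two 64-bit halves. */
typedef struct {
  uint64_t lo; /* bits 0..63   */
  uint64_t hi; /* bits 64..127 */
} vreg128;

/* Models `shrn Vd.8b, Vn.8h, #shift`: each 16-bit lane is shifted right,
 * truncated to 8 bits and packed into the low 64 bits. Writing the 64-bit
 * result leaves the high 64 bits of the register as zero. */
static vreg128 model_shrn_8b(const uint16_t in[8], unsigned shift) {
  vreg128 r = {0, 0};
  for (int i = 0; i < 8; ++i) {
    uint8_t narrowed = (uint8_t)(in[i] >> shift);
    r.lo |= (uint64_t)narrowed << (8 * i);
  }
  r.hi = 0; /* implicit zeroing of the high half */
  return r;
}

/* Models `mov Vd.d[1], Vzero.d[0]`: overwrite the high half with zero. */
static vreg128 model_ins_zero_high(vreg128 v) {
  v.hi = 0;
  return v;
}
```

In this model, applying `model_ins_zero_high` to the result of `model_shrn_8b` never changes the value, which is the property that makes the `movi` + `mov v1.d[1], v2.d[0]` pair removable.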
llvm could also add TableGen patterns for these cases, as gcc does, but it may be better to handle them in the MIR peephole optimization pass, because the cases share common sub-patterns and the pass can look across multiple basic blocks.
With this patch, llvm generates the output below.
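As a rough sketch of the peephole idea (this is not the code in the patch), the check can be modeled in C on a toy SSA-like instruction list. The `inst` struct, the opcode names and `remove_redundant_high_zero` are all hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy opcodes; OP_INS_HIGH models `mov Vd.d[1], Vn.d[0]`. */
enum opcode { OP_FCVTN, OP_SHRN, OP_MOVI_ZERO, OP_INS_HIGH, OP_OTHER };

struct inst {
  enum opcode op;
  int dst;  /* virtual register defined by this instruction */
  int src0; /* first source register, -1 if unused */
  int src1; /* second source register, -1 if unused */
  bool dead;
};

/* Find the single (SSA) definition of a virtual register. */
static const struct inst *find_def(const struct inst *code, size_t n, int reg) {
  for (size_t i = 0; i < n; ++i)
    if (!code[i].dead && code[i].dst == reg)
      return &code[i];
  return NULL;
}

/* Opcodes assumed to write zero to the high 64 bits of their result. */
static bool zeroes_high_half(enum opcode op) {
  return op == OP_FCVTN || op == OP_SHRN;
}

/* For each OP_INS_HIGH whose low part comes from a high-zeroing instruction
 * and whose inserted value is a zero vector, forward the low part to all
 * users and mark the insert dead. Returns the number of inserts removed. */
static size_t remove_redundant_high_zero(struct inst *code, size_t n) {
  size_t removed = 0;
  for (size_t i = 0; i < n; ++i) {
    struct inst *ins = &code[i];
    if (ins->dead || ins->op != OP_INS_HIGH)
      continue;
    const struct inst *low = find_def(code, n, ins->src0);
    const struct inst *high = find_def(code, n, ins->src1);
    if (!low || !high)
      continue;
    if (!zeroes_high_half(low->op) || high->op != OP_MOVI_ZERO)
      continue;
    for (size_t j = 0; j < n; ++j) {
      if (code[j].src0 == ins->dst) code[j].src0 = ins->src0;
      if (code[j].src1 == ins->dst) code[j].src1 = ins->src0;
    }
    ins->dead = true;
    ++removed;
  }
  return removed;
}
```

The zero-materializing instruction is left in place in this sketch; once its only user is gone, an ordinary dead-code elimination step could remove it.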
llvm output
test1: // @test1
fcvtn v0.4h, v0.4s
ret
test2: // @test2
ldr q1, [x0]
shrn v1.8b, v1.8h, #4
tbl v0.8b, { v1.16b }, v0.8b
ret
https://reviews.llvm.org/D147235
Files:
llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
llvm/test/CodeGen/AArch64/implicitly-set-zero-high-64-bits.ll
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D147235.509669.patch
Type: text/x-patch
Size: 6248 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20230330/125deb58/attachment.bin>