[PATCH] D56118: [ARM]: Add optimized NEON uint64x2_t multiply routine.
easyaspi314 (Devin) via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sun Jan 6 19:03:48 PST 2019
easyaspi314 marked 2 inline comments as done.
easyaspi314 added a comment.
/* assuming the GCC/Clang vector-extension typedef used for this example */
typedef unsigned long long v2i64 __attribute__((vector_size(16)));

v2i64 mul5(v2i64 val)
{
    return val * 5;
}
mul5:
vmov d17, r2, r3        @ d17 = high 64-bit lane of val
vmov d16, r0, r1        @ d16 = low 64-bit lane (q8 = val)
vmov.i32 d18, #0x5      @ splat 5 into two 32-bit lanes
vshrn.i64 d19, q8, #32  @ d19 = high 32 bits of each 64-bit lane
vmovn.i64 d16, q8       @ d16 = low 32 bits of each 64-bit lane
vmull.u32 q10, d19, d18 @ widening multiply: hi32 * 5
vshl.i64 q10, q10, #32  @ shift those partial products back up
vmlal.u32 q10, d16, d18 @ widening multiply-accumulate: += lo32 * 5
vmov r0, r1, d20        @ return the result in r0-r3
vmov r2, r3, d21
bx lr
Ummmmmm…that should **//DEFINITELY//** be a shift+add. That would be much cheaper.
The cost model is definitely messed up… nope, it's something else. I set the multiply cost to 80000 and the backend still chose the multiply over shifts!
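For reference, here is roughly the lowering I would expect — a minimal sketch using NEON intrinsics (the helper name is mine), since val * 5 is just (val << 2) + val:

#include <arm_neon.h>

/* Hypothetical shift+add lowering: two cheap NEON ops (vshl.i64 + vadd.i64). */
uint64x2_t mul5_shift_add(uint64x2_t val)
{
    return vaddq_u64(vshlq_n_u64(val, 2), val);
}

And the x86 (SSE2) output has the same problem: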
mul5:
movdqa xmm1, xmmword ptr [rip + LCPI0_0] ; load the splatted constant 5
movdqa xmm2, xmm0                        ; copy val
pmuludq xmm2, xmm1                       ; lo32 of each lane * 5 -> 64-bit products
psrlq xmm0, 32                           ; hi32 of each lane
pmuludq xmm0, xmm1                       ; hi32 * 5
psllq xmm0, 32                           ; shift those partial products back up
paddq xmm0, xmm2                         ; combine
ret
What?! We should be shifting + adding here too! Are we just not doing shift + adds for vectors?
What the heck?
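Same idea as a hedged SSE2 sketch (again, the helper name is mine):

#include <emmintrin.h>

/* Hypothetical shift+add lowering: psllq + paddq instead of two pmuludq. */
__m128i mul5_shift_add_sse2(__m128i val)
{
    return _mm_add_epi64(_mm_slli_epi64(val, 2), val);
}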
================
Comment at: test/Analysis/CostModel/ARM/mult.ll:1
+; RUN: opt < %s -cost-model -analyze -mtriple=thumbv7-apple-ios6.0.0 -mcpu=cortex-a9 | FileCheck %s
+
----------------
RKSimon wrote:
> You might find the utils\update_analyze_test_checks.py script useful to make this more maintainable - see X86\arith.ll for examples.
Ok, will take a look at that.
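If I understand it right, the invocation would be something like this (path assumed; the built opt needs to be reachable, e.g. on PATH or via the script's options):

python utils/update_analyze_test_checks.py test/Analysis/CostModel/ARM/mult.ll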
================
Comment at: test/CodeGen/ARM/vmul.ll:40
+
+define <2 x i64> @vmuli64(<2 x i64>* %A, <2 x i64>* %B) nounwind {
+;CHECK-LABEL: vmuli64
----------------
RKSimon wrote:
> Please add these new tests to trunk with current codegen now then rebase this patch so it shows the changes to codegen.
Okay, I will try to do that.
I just run svn update, right? I don't know why I chose SVN.
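Just to sanity-check my workflow, I assume it's roughly this (commit message is only an example):

svn update    # sync the working copy with trunk
# commit the new tests with today's codegen, then regenerate this patch on top of them
svn commit -m "[ARM] Add baseline codegen tests for <2 x i64> multiply"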
Repository:
rL LLVM
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D56118/new/
https://reviews.llvm.org/D56118