[PATCH] D56118: [ARM]: Add optimized NEON uint64x2_t multiply routine.
easyaspi314 (Devin) via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sun Jan 6 19:03:48 PST 2019
easyaspi314 marked 2 inline comments as done.
easyaspi314 added a comment.
/* assuming the GCC/Clang vector-extension typedef used for this example */
typedef unsigned long long v2i64 __attribute__((vector_size(16)));

v2i64 mul5(v2i64 val)
{
    return val * 5;
}
mul5:
vmov d17, r2, r3        @ d17 = high 64-bit lane of val
vmov d16, r0, r1        @ d16 = low 64-bit lane (q8 = val)
vmov.i32 d18, #0x5      @ splat 5 into two 32-bit lanes
vshrn.i64 d19, q8, #32  @ d19 = high 32 bits of each 64-bit lane
vmovn.i64 d16, q8       @ d16 = low 32 bits of each 64-bit lane
vmull.u32 q10, d19, d18 @ widening multiply: hi32 * 5
vshl.i64 q10, q10, #32  @ shift those partial products back up
vmlal.u32 q10, d16, d18 @ widening multiply-accumulate: += lo32 * 5
vmov r0, r1, d20        @ return the result in r0-r3
vmov r2, r3, d21
bx lr
Ummmmmm…that should **//DEFINITELY//** be a shift+add. That would be much cheaper.
The cost model is definitely messed up… nope, it's something else. I set the multiply cost to 80000 and the backend still chose the multiply over shifts!
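For reference, here is roughly the lowering I would expect — a minimal sketch using NEON intrinsics (the helper name is mine), since val * 5 is just (val << 2) + val:

#include <arm_neon.h>

/* Hypothetical shift+add lowering: two cheap NEON ops (vshl.i64 + vadd.i64). */
uint64x2_t mul5_shift_add(uint64x2_t val)
{
    return vaddq_u64(vshlq_n_u64(val, 2), val);
}

And the x86 (SSE2) output has the same problem: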
mul5:
movdqa xmm1, xmmword ptr [rip + LCPI0_0] ; load the splatted constant 5
movdqa xmm2, xmm0                        ; copy val
pmuludq xmm2, xmm1                       ; lo32 of each lane * 5 -> 64-bit products
psrlq xmm0, 32                           ; hi32 of each lane
pmuludq xmm0, xmm1                       ; hi32 * 5
psllq xmm0, 32                           ; shift those partial products back up
paddq xmm0, xmm2                         ; combine
ret
What?! We should be shifting + adding here too! Are we just not doing shift + adds for vectors?
What the heck?
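Same idea as a hedged SSE2 sketch (again, the helper name is mine):

#include <emmintrin.h>

/* Hypothetical shift+add lowering: psllq + paddq instead of two pmuludq. */
__m128i mul5_shift_add_sse2(__m128i val)
{
    return _mm_add_epi64(_mm_slli_epi64(val, 2), val);
}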
================
Comment at: test/Analysis/CostModel/ARM/mult.ll:1
+; RUN: opt < %s -cost-model -analyze -mtriple=thumbv7-apple-ios6.0.0 -mcpu=cortex-a9 | FileCheck %s
+
----------------
RKSimon wrote:
> You might find the utils\update_analyze_test_checks.py script useful to make this more maintainable - see X86\arith.ll for examples.
Ok, will take a look at that.
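If I understand it right, the invocation would be something like this (path assumed; the built opt needs to be reachable, e.g. on PATH or via the script's options):

python utils/update_analyze_test_checks.py test/Analysis/CostModel/ARM/mult.ll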
================
Comment at: test/CodeGen/ARM/vmul.ll:40
+
+define <2 x i64> @vmuli64(<2 x i64>* %A, <2 x i64>* %B) nounwind {
+;CHECK-LABEL: vmuli64
----------------
RKSimon wrote:
> Please add these new tests to trunk with current codegen now then rebase this patch so it shows the changes to codegen.
Okay, I will try to do that.
I just run svn update, right? I don't know why I chose SVN.
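Just to sanity-check my workflow, I assume it's roughly this (commit message is only an example):

svn update    # sync the working copy with trunk
# commit the new tests with today's codegen, then regenerate this patch on top of them
svn commit -m "[ARM] Add baseline codegen tests for <2 x i64> multiply"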
Repository:
rL LLVM
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D56118/new/
https://reviews.llvm.org/D56118