[PATCH] D71469: [AArch64] Add sq(r)dmulh_lane(q) LLVM IR intrinsics
Eli Friedman via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Thu Jan 16 15:43:47 PST 2020
efriedma added a comment.
> This makes it impossible to do a neat trick when using NEON intrinsics: one can load a number of constants using a single vector load, which are then repeatedly used to multiply whole vectors by one of the constants. This trick is used for a nice performance upside (2.5% to 4% on one microbenchmark) in libjpeg-turbo.
I'm not completely sure I follow here. The "trick" is something like the following?
int16x8_t v = {2,3,4,5,6,7,8,9};
a = vqdmulh_laneq_s16(a, v, 0);
b = vqdmulh_laneq_s16(b, v, 1);
c = vqdmulh_laneq_s16(c, v, 2);
d = vqdmulh_laneq_s16(d, v, 3);
[...]
I can see how that could be helpful. The compiler could probably be taught to recover something like the original structure, but that would likely require a dedicated pass. Or I guess you could hack the source to use "volatile", but that would be ugly.
I'm a little unhappy we're forced to introduce more intrinsics here, but it might be the best solution to avoid breaking carefully tuned code like this.
================
Comment at: llvm/lib/IR/Function.cpp:1374
+ auto *RefVTy = dyn_cast<VectorType>(ArgTys[D.getArgumentNumber()]);
+ if (!VTy || !RefVTy || VTy->getBitWidth() != 64)
+ return true;
----------------
Hardcoding "64" and "128" in target-independent code here seems like a bad idea.
Can we just let both vector operands have any vector type, and reject in the backend if we see an unexpected type?
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:6054
+ if (VT.getSizeInBits() == 64)
+ return std::make_pair(0U, &AArch64::FPR64_loRegClass);
if (VT.getSizeInBits() == 128)
----------------
Is this related somehow?
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D71469/new/
https://reviews.llvm.org/D71469