[PATCH] D71469: [AArch64] Add sq(r)dmulh_lane(q) LLVM IR intrinsics
Eli Friedman via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Thu Jan 16 15:43:47 PST 2020
efriedma added a comment.
> This makes it impossible to do a neat trick when using NEON intrinsics: one can load a number of constants using a single vector load, which are then repeatedly used to multiply whole vectors by one of the constants. This trick is used for a nice performance upside (2.5% to 4% on one microbenchmark) in libjpeg-turbo.
I'm not completely sure I follow here. The "trick" is something like the following?
int16x8_t v = {2,3,4,5,6,7,8,9};
a = vqdmulh_laneq_s16(a, v, 0);
b = vqdmulh_laneq_s16(b, v, 1);
c = vqdmulh_laneq_s16(c, v, 2);
d = vqdmulh_laneq_s16(d, v, 3);
[...]
I can see how that could be helpful. The compiler could probably be taught to recover something like the original structure, but that would likely require a dedicated pass. Or I guess you could hack the source to use "volatile", but that would be ugly.
I'm a little unhappy we're forced to introduce more intrinsics here, but it might be the best solution to avoid breaking carefully tuned code like this.
================
Comment at: llvm/lib/IR/Function.cpp:1374
+ auto *RefVTy = dyn_cast<VectorType>(ArgTys[D.getArgumentNumber()]);
+ if (!VTy || !RefVTy || VTy->getBitWidth() != 64)
+ return true;
----------------
Hardcoding "64" and "128" in target-independent code here seems like a bad idea.
Can we just let both vector operands have any vector type, and reject in the backend if we see an unexpected type?
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:6054
+ if (VT.getSizeInBits() == 64)
+ return std::make_pair(0U, &AArch64::FPR64_loRegClass);
if (VT.getSizeInBits() == 128)
----------------
Is this related somehow?
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D71469/new/
https://reviews.llvm.org/D71469