[PATCH] D56118: [ARM]: WIP: Add optimized uint64x2_t multiply routine.
easyaspi314 (Devin) via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Dec 27 20:49:00 PST 2018
easyaspi314 created this revision.
Herald added subscribers: llvm-commits, kristof.beyls, javed.absar.
Patch to fix bug 39967 <https://bugs.llvm.org/show_bug.cgi?id=39967>
There is a lot of room for improvement here, and since I am new to the codebase (I only just got a computer powerful enough to compile LLVM in a couple of hours), I would like some help.
There are three main optimizations that can be made.
The first one is to implement the twomul example provided by Eli in comment 7 <https://bugs.llvm.org/show_bug.cgi?id=39967#c7>.
The instruction sequence is in the patch, but it is commented out because I am not sure how to generate `vpaddl.u32`.
The second is relatively simple.
v2i64 pmuludq(v2i64 v1, v2i64 v2)
{
    return (v1 & 0xFFFFFFFF) * (v2 & 0xFFFFFFFF);
}
should generate only two `vmovn.i64` instructions and one `vmull.u32`. At the moment a `vand` is also generated automatically.
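The equivalence behind this second case can be checked with a scalar model of a single 64-bit lane. This is only a sketch: the helper names below are hypothetical and not from the patch, and the casts merely stand in for what `vmovn.i64` (narrowing) and `vmull.u32` (widening multiply) do per lane.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: one lane of the source expression.
   Mask both operands to their low 32 bits, then multiply as 64-bit values. */
static uint64_t mul_masked(uint64_t a, uint64_t b)
{
    return (a & 0xFFFFFFFFu) * (b & 0xFFFFFFFFu);
}

/* Hypothetical helper: the same lane computed by narrowing first and then
   doing a widening 32x32->64 multiply. */
static uint64_t mul_narrow_widen(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a;   /* models vmovn.i64 on this lane */
    uint32_t b_lo = (uint32_t)b;   /* models vmovn.i64 on this lane */
    return (uint64_t)a_lo * b_lo;  /* models vmull.u32 on this lane */
}

/* Both forms must agree for any pair of inputs. */
static void check_lane(uint64_t a, uint64_t b)
{
    assert(mul_masked(a, b) == mul_narrow_widen(a, b));
}
```

Because the narrowing step already discards the upper 32 bits, a separate mask adds nothing to the result, which is why the narrow-then-widen form should be sufficient.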
The third is a major but probably difficult optimization that is outside my scope. It builds on the first routine, but determines whether a `uint64x2_t` load is used explicitly for a multiply and can therefore be turned into a `uint32x2x2_t` load. That lets us avoid the `vshrn.i64` and `vmovn.i64` instructions.
For example, if a function takes a pointer to a vector and is not inlined, it is fastest to do this:
uint64x2_t vmulq_u64(uint64x2_t *top, uint64x2_t *bot)
{
    /* vld2 de-interleaves: val[0] = low halves, val[1] = high halves (little-endian) */
    uint32x2x2_t top32 = vld2_u32((uint32_t*)top);
    uint32x2x2_t bot32 = vld2_u32((uint32_t*)bot);
    uint64x2_t ret64 = vmull_u32(top32.val[0], bot32.val[1]); /* lo * hi */
    ret64 = vmlal_u32(ret64, top32.val[1], bot32.val[0]);     /* + hi * lo */
    ret64 = vshlq_n_u64(ret64, 32);                           /* cross terms << 32 */
    ret64 = vmlal_u32(ret64, top32.val[0], bot32.val[0]);     /* + lo * lo */
    return ret64;
}
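For reference, the arithmetic this kind of routine relies on can be modelled per lane in plain C. This is a sketch with hypothetical helper names, not code from the patch. It assumes the usual identity for the low 64 bits of a product: the two cross terms are summed, moved up by 32 bits (a left shift), and then the low-by-low product is added.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one lane: low 64 bits of top * bot,
   built only from 32x32->64 multiplies. */
static uint64_t mul64_from_halves(uint64_t top, uint64_t bot)
{
    uint32_t top_lo = (uint32_t)top, top_hi = (uint32_t)(top >> 32);
    uint32_t bot_lo = (uint32_t)bot, bot_hi = (uint32_t)(bot >> 32);

    uint64_t ret = (uint64_t)top_lo * bot_hi;  /* widening multiply: lo * hi */
    ret += (uint64_t)top_hi * bot_lo;          /* accumulate hi * lo (may wrap) */
    ret <<= 32;                                /* only the low 32 bits survive */
    ret += (uint64_t)top_lo * bot_lo;          /* accumulate lo * lo */
    return ret;
}

/* The model must match the native wrapping 64-bit multiply. */
static void check_mul64(uint64_t a, uint64_t b)
{
    assert(mul64_from_halves(a, b) == a * b);
}
```

The hi-by-hi product never appears because it only contributes to bits 64 and above, and any wrap-around in the cross-term sum is discarded by the shift in the same way.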
This is also important for constants.
Repository:
rL LLVM
https://reviews.llvm.org/D56118
Files:
lib/Target/ARM/ARMISelLowering.cpp
lib/Target/ARM/ARMTargetTransformInfo.cpp
test/CodeGen/ARM/vmul.ll
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D56118.179602.patch
Type: text/x-patch
Size: 8568 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20181228/29a6e490/attachment.bin>