[PATCH] D56118: [ARM]: WIP: Add optimized uint64x2_t multiply routine.

easyaspi314 (Devin) via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Dec 27 20:49:00 PST 2018


easyaspi314 created this revision.
Herald added subscribers: llvm-commits, kristof.beyls, javed.absar.

Patch to fix bug 39967 <https://bugs.llvm.org/show_bug.cgi?id=39967>

There is a lot of room for improvement here. Since I am new to the codebase itself (I only just got a computer powerful enough to compile LLVM in a couple of hours), I would like some help.

There are three main optimizations that can be made.

The first one is to implement the twomul example provided by Eli in comment 7 <https://bugs.llvm.org/show_bug.cgi?id=39967#c7>.
The instruction sequence is in the patch, but it is commented out because I am not sure how to generate `vpaddl.u32`.
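
For orientation, here is a rough scalar C model of how a widening pairwise add (the role `vpaddl.u32` would play) can fit into a 64-bit multiply. It is only a sketch of the arithmetic for one lane and is an assumption about the general shape of such a routine; it may not match the sequence in comment 7, and the name `mul64_pairwise_model` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical scalar model of ONE 64-bit lane; not code from the patch. */
static uint64_t mul64_pairwise_model(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    /* Truncating 32-bit cross products, as a 32-bit vector multiply of
       a against b with its halves swapped would produce. */
    uint32_t cross0 = a_lo * b_hi;
    uint32_t cross1 = a_hi * b_lo;

    /* Widening pairwise add of the two cross products
       (assumed role of vpaddl.u32). */
    uint64_t cross = (uint64_t)cross0 + (uint64_t)cross1;

    /* Move the cross terms into the high half, then accumulate the
       full 64-bit low * low product. */
    return (cross << 32) + (uint64_t)a_lo * b_lo;
}
```

Only the low 32 bits of each cross product survive the shift, which is why truncating 32-bit multiplies are enough for them in this model.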

The second is relatively simple.

  v2i64 pmuludq(v2i64 v1, v2i64 v2)
  {
      return (v1 & 0xFFFFFFFF) * (v2 & 0xFFFFFFFF);
  }

should only generate two `vmovn.i64` instructions and one `vmull.u32`. The `vand` should not be needed, since the narrowing already discards the upper 32 bits.
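
The equivalence this relies on can be checked with a small scalar C model of a single lane. This is only a sketch; the function names are hypothetical and the comments map each step to the instruction it is assumed to stand for.

```c
#include <stdint.h>

/* One lane of the masked multiply, written as in the source expression. */
static uint64_t pmuludq_lane_masked(uint64_t v1, uint64_t v2)
{
    return (v1 & 0xFFFFFFFFu) * (v2 & 0xFFFFFFFFu);
}

/* The same lane computed by narrowing and then a widening multiply. */
static uint64_t pmuludq_lane_narrowed(uint64_t v1, uint64_t v2)
{
    uint32_t n1 = (uint32_t)v1;   /* stands in for vmovn.i64 */
    uint32_t n2 = (uint32_t)v2;   /* stands in for vmovn.i64 */
    return (uint64_t)n1 * n2;     /* stands in for vmull.u32 */
}
```

Both functions return the same value for every input, because truncating to 32 bits and masking with 0xFFFFFFFF keep exactly the same bits.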

The third is a major but probably difficult optimization (outside my scope). It builds on the first routine, but also determines whether a `uint64x2_t` load is used only for a multiply and can therefore be turned into a `uint32x2x2_t` load, so we can avoid the `vshrn.i64` and `vmovn.i64` instructions.

For example, if a function takes a pointer to a vector and is not inlined, it is fastest to do this:

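  uint64x2_t vmulq_u64(uint64x2_t *top, uint64x2_t *bot)
  {
      /* Deinterleaving loads: val[0] holds the low halves, val[1] the high halves. */
      uint32x2x2_t top32 = vld2_u32((uint32_t*)top);
      uint32x2x2_t bot32 = vld2_u32((uint32_t*)bot);

      /* Cross terms: top.lo * bot.hi + top.hi * bot.lo */
      uint64x2_t ret64 = vmull_u32(top32.val[0], bot32.val[1]);
      ret64 = vmlal_u32(ret64, top32.val[1], bot32.val[0]);
      /* Move the cross terms into the upper 32 bits. */
      ret64 = vshlq_n_u64(ret64, 32);
      /* Add top.lo * bot.lo. */
      ret64 = vmlal_u32(ret64, top32.val[0], bot32.val[0]);
      return ret64;
  }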

This is also important for constants.
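
The arithmetic of the deinterleaved-load approach can be checked without NEON hardware using a scalar C model. This is a sketch under stated assumptions: it assumes a little-endian target, and `u32x2x2_model`, `ld2_u32_model` and `mulq_u64_model` are hypothetical stand-ins rather than real intrinsics or types.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for uint32x2x2_t:
   val[0] holds the even 32-bit words, val[1] the odd ones. */
typedef struct { uint32_t val[2][2]; } u32x2x2_model;

/* Models a deinterleaving load of four 32-bit words.
   Assumes little-endian, so even words are the low halves. */
static u32x2x2_model ld2_u32_model(const uint64_t src[2])
{
    uint32_t w[4];
    u32x2x2_model out;
    memcpy(w, src, sizeof w);
    out.val[0][0] = w[0]; out.val[0][1] = w[2];  /* low halves  */
    out.val[1][0] = w[1]; out.val[1][1] = w[3];  /* high halves */
    return out;
}

/* Two-lane 64-bit multiply built from 32 x 32 -> 64 products. */
static void mulq_u64_model(const uint64_t top[2], const uint64_t bot[2],
                           uint64_t ret[2])
{
    u32x2x2_model t = ld2_u32_model(top);
    u32x2x2_model b = ld2_u32_model(bot);
    for (int i = 0; i < 2; i++) {
        uint64_t acc = (uint64_t)t.val[0][i] * b.val[1][i];  /* lo * hi  */
        acc += (uint64_t)t.val[1][i] * b.val[0][i];          /* + hi * lo */
        acc <<= 32;                       /* cross terms to the high half */
        acc += (uint64_t)t.val[0][i] * b.val[0][i];          /* + lo * lo */
        ret[i] = acc;
    }
}
```

The high * high product is never computed because it only affects bits 64 and above, which do not exist in the result.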


Repository:
  rL LLVM

https://reviews.llvm.org/D56118

Files:
  lib/Target/ARM/ARMISelLowering.cpp
  lib/Target/ARM/ARMTargetTransformInfo.cpp
  test/CodeGen/ARM/vmul.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D56118.179602.patch
Type: text/x-patch
Size: 8568 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20181228/29a6e490/attachment.bin>