[PATCH] D155995: [AMDGPU] WIP: Allow matching into v_dot4

Jeffrey Byrnes via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Jul 21 14:21:59 PDT 2023


jrbyrnes created this revision.
jrbyrnes added a reviewer: arsenm.
Herald added subscribers: foad, wenlei, kerbowa, hiraditya, tpr, dstuttard, yaxunl, jvesely, kzhuravl.
Herald added a project: All.
jrbyrnes requested review of this revision.
Herald added subscribers: llvm-commits, wdng.
Herald added a project: LLVM.

Adds the algorithm to match and select v_dot4 instructions in DAG combining, and removes the patterns from selection. The patterns are fragile: they fail to match when the byte extraction code is slightly different, or when any optimization alters the add / mul structure of the tree. The DAG combining approach is more flexible, and should not add much overhead given all the early exits.
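
For readers less familiar with the instruction, here is a rough sketch of the computation the combine is trying to recognize. It is a scalar reference model in plain C++, not code from the patch: the helper names (extractByte, dot4U8, dot4U8Unrolled) are made up for illustration, and it models only the unsigned, non-saturating case (four byte-wise products summed into a 32-bit accumulator).

```cpp
#include <cstdint>

// Hypothetical helper: extracts byte `idx` (0 = least significant) of a word.
static inline uint32_t extractByte(uint32_t word, unsigned idx) {
  return (word >> (8 * idx)) & 0xFFu;
}

// Assumed reference semantics of an unsigned 4-way byte dot product with
// accumulate: acc + sum over i of a.byte[i] * b.byte[i], wrapping at 32 bits.
static inline uint32_t dot4U8(uint32_t a, uint32_t b, uint32_t acc) {
  for (unsigned i = 0; i < 4; ++i)
    acc += extractByte(a, i) * extractByte(b, i);
  return acc;
}

// The same value written as an explicit add / mul tree with a different
// association order and different byte-extraction idioms (plain mask for
// byte 0, plain shift for byte 3). A matcher that depends on one exact
// shape of this tree would miss variants like this one.
static inline uint32_t dot4U8Unrolled(uint32_t a, uint32_t b, uint32_t acc) {
  uint32_t p0 = (a & 0xFFu) * (b & 0xFFu);
  uint32_t p1 = ((a >> 8) & 0xFFu) * ((b >> 8) & 0xFFu);
  uint32_t p2 = ((a >> 16) & 0xFFu) * ((b >> 16) & 0xFFu);
  uint32_t p3 = (a >> 24) * (b >> 24);
  return ((p0 + p1) + (p2 + p3)) + acc;
}
```

Both functions return the same result for every input, which is the point: the tree shape and the extraction idiom can vary while the underlying dot product stays the same.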

For kernels that should select into these instructions, doing so is vitally important. Not only is performance much improved, but failing to select into them can result in severe code bloat which drastically degrades compile time.

The extended perm matching is a happy consequence of whitelisting i32 EXTRACT_VECTOR_ELT nodes as ultimate sources of bytes.

A WIP while I work on adding a few test cases.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D155995

Files:
  llvm/lib/Target/AMDGPU/SIISelLowering.cpp
  llvm/lib/Target/AMDGPU/VOP3PInstructions.td
  llvm/test/CodeGen/AMDGPU/idot4s.ll
  llvm/test/CodeGen/AMDGPU/idot4u.ll
  llvm/test/CodeGen/AMDGPU/image-load-d16-tfe.ll
  llvm/test/CodeGen/AMDGPU/insert_vector_elt.v2i16.ll
  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.d16.dim.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D155995.543065.patch
Type: text/x-patch
Size: 46169 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20230721/963950cc/attachment.bin>

