[PATCH] D142782: [AMDGPU] WIP: Add basic support for extended i8 perm matching

Fri Jan 27 13:30:26 PST 2023

jrbyrnes created this revision.
Herald added subscribers: kosarev, foad, kerbowa, steven.zhang, hiraditya, tpr, dstuttard, yaxunl, jvesely, kzhuravl, arsenm.
Herald added a project: All.
jrbyrnes requested review of this revision.
Herald added subscribers: llvm-commits, wdng.
Herald added a project: LLVM.

Implement a traversal algorithm to match trees to i8 vperms. For ors that can be combined into perms, we expect to see a pattern that combines four 8 bit operands (actually 16 bit operands with only 8 nonzero bits) into two 16 bit operands, and then combines these two 16 bit operands via the or (after a zext and a shift). The trees that perform this combination are the first of the two relevant classes of trees, and are matched in calculateByteProvider. The 8 bit operands used in this tree are typically produced by an AND or SRL op, and are the leaves of the trees in calculateByteProvider. The second relevant class of trees maps a leaf of calculateByteProvider to its ultimate source. This class of trees is matched in calculateSrcByte.
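The following C++ sketch models the byte-tracing idea on a toy expression tree rather than on LLVM's SelectionDAG. The types `Op`, `Node` and `ByteProvider` and the function `provideByte` are hypothetical: they merge the roles of calculateByteProvider and calculateSrcByte into a single recursion, while the real patch operates on `SDValue`s and handles more opcodes. Given a destination byte index, the sketch walks OR, SHL, SRL and AND nodes and reports either a known-zero byte or the source byte that lands in that position.

```cpp
#include <cstdint>
#include <memory>
#include <optional>
#include <utility>

// Hypothetical mini-IR, not LLVM's SDNode. Zero-extension is a no-op in this
// 32-bit model and is therefore omitted.
enum class Op { Leaf, Or, Shl, Srl, And };

struct Node {
  Op op = Op::Leaf;
  int srcId = -1;                 // Leaf: which source value
  uint32_t imm = 0;               // Shl/Srl: shift amount in bits; And: mask
  std::shared_ptr<Node> lhs, rhs; // operands (rhs is only used by Or)
};
using NodeP = std::shared_ptr<Node>;

NodeP leaf(int id) {
  auto n = std::make_shared<Node>();
  n->op = Op::Leaf;
  n->srcId = id;
  return n;
}

NodeP unary(Op op, NodeP x, uint32_t imm) {
  auto n = std::make_shared<Node>();
  n->op = op;
  n->lhs = std::move(x);
  n->imm = imm;
  return n;
}

NodeP orOf(NodeP a, NodeP b) {
  auto n = std::make_shared<Node>();
  n->op = Op::Or;
  n->lhs = std::move(a);
  n->rhs = std::move(b);
  return n;
}

// Result of tracing one byte: a constant zero byte, or byte `srcByte` of `srcId`.
struct ByteProvider {
  bool isZero;
  int srcId;
  unsigned srcByte;
};

// Returns the provider of byte `index` of `n`, or nullopt if that byte is not a
// plain copy of a single source byte (or a zero).
std::optional<ByteProvider> provideByte(const NodeP &n, unsigned index) {
  if (index > 3)
    return std::nullopt;
  switch (n->op) {
  case Op::Leaf:
    return ByteProvider{false, n->srcId, index};
  case Op::Or: {
    auto l = provideByte(n->lhs, index);
    auto r = provideByte(n->rhs, index);
    if (!l || !r)
      return std::nullopt;
    if (l->isZero)
      return r;
    if (r->isZero)
      return l;
    return std::nullopt; // both sides contribute: not a pure byte select
  }
  case Op::Shl: {
    if (n->imm % 8)
      return std::nullopt; // only whole-byte shifts are tracked
    unsigned byteShift = n->imm / 8;
    if (index < byteShift)
      return ByteProvider{true, -1, 0}; // shifted-in zeros
    return provideByte(n->lhs, index - byteShift);
  }
  case Op::Srl: {
    if (n->imm % 8)
      return std::nullopt;
    unsigned src = index + n->imm / 8;
    if (src > 3)
      return ByteProvider{true, -1, 0}; // shifted-in zeros
    return provideByte(n->lhs, src);
  }
  case Op::And: {
    uint32_t byteMask = (n->imm >> (8 * index)) & 0xFF;
    if (byteMask == 0)
      return ByteProvider{true, -1, 0};
    if (byteMask == 0xFF)
      return provideByte(n->lhs, index);
    return std::nullopt; // a partial byte mask cannot be a byte select
  }
  }
  return std::nullopt;
}
```

For `or (shl (and (srl A, 8), 0xFF), 8), (and B, 0xFF)`, byte 0 traces to byte 0 of `B`, byte 1 to byte 1 of `A`, and bytes 2 and 3 are zero. A byte is rejected when both or operands contribute to it or when an AND mask covers only part of it, since neither case is a single byte select in this model.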

Through this recursive process, we track an `Index` (`SrcIndex` in calculateSrcByte), which is the byte of the current op that maps to the byte of the or's dest that we are currently mapping. For example, the 4th byte of the dest of `SHL Src, 16` maps to the 2nd byte of `Src`. Using basic rules like this, we can map src bytes to the dest bytes of the or, and from this mapping we can create perm masks.
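Once every destination byte has been mapped, a selector can be assembled from the mapping. This sketch assumes the `v_perm_b32` byte-select convention in which selector values 0-3 pick bytes of the second source, 4-7 pick bytes of the first source, and 0x0C produces a zero byte. `DestByte`, `buildPermMask` and `emulatePerm` are hypothetical helpers; `emulatePerm` models only that subset of selector values, so the mask can be checked against the original or expression.

```cpp
#include <array>
#include <cstdint>
#include <optional>

// One destination byte: a constant zero, or byte `byte` of perm operand
// `operand` (0 = first source, 1 = second source). Hypothetical type.
struct DestByte {
  bool isZero;
  unsigned operand;
  unsigned byte;
};

// Builds a selector under the assumed convention: 0-3 select bytes of the
// second source, 4-7 select bytes of the first source, 0x0C yields 0x00.
std::optional<uint32_t> buildPermMask(const std::array<DestByte, 4> &dest) {
  uint32_t mask = 0;
  for (unsigned i = 0; i < 4; ++i) {
    uint32_t sel = 0x0C; // assumed "zero byte" selector
    if (!dest[i].isZero) {
      if (dest[i].byte > 3 || dest[i].operand > 1)
        return std::nullopt; // not expressible with two 32-bit sources
      sel = dest[i].operand == 0 ? 4 + dest[i].byte : dest[i].byte;
    }
    mask |= sel << (8 * i);
  }
  return mask;
}

// Reference model of the assumed selector subset: the first source forms the
// high dword and the second source the low dword of a 64-bit value.
uint32_t emulatePerm(uint32_t src0, uint32_t src1, uint32_t mask) {
  uint64_t combined = (uint64_t(src0) << 32) | src1;
  uint32_t result = 0;
  for (unsigned i = 0; i < 4; ++i) {
    uint32_t sel = (mask >> (8 * i)) & 0xFF;
    uint32_t byte = 0;
    if (sel < 8)
      byte = (combined >> (8 * sel)) & 0xFF;
    // sel == 0x0C gives zero; other selector values (e.g. 0xFF or
    // sign-replication) are not modeled and also produce 0 here.
    result |= byte << (8 * i);
  }
  return result;
}
```

For the tree `or (shl (and (srl A, 8), 0xFF), 8), (and B, 0xFF)`, with `A` as the first source and `B` as the second, the dest bytes map to {byte 0 of `B`, byte 1 of `A`, zero, zero}, which gives the mask 0x0C0C0500 under this convention.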

Much of the code for calculateByteProvider was borrowed from CodeGen/SelectionDAG/DAGCombiner.cpp (MatchLoadCombine). Many candidate trees that could be matched into perms are not yet handled by this patch; those are left for future iterations.

This is a WIP while I resolve the regressions.

Change-Id: Ib498b6dcec980ccbfbcfdf83dd1816e6647028f6

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D142782

Files:
  llvm/lib/Target/AMDGPU/SIISelLowering.cpp
  llvm/test/CodeGen/AMDGPU/combine-vload-extract.ll
  llvm/test/CodeGen/AMDGPU/cvt_f32_ubyte.ll
  llvm/test/CodeGen/AMDGPU/ds_read2.ll
  llvm/test/CodeGen/AMDGPU/fast-unaligned-load-store.global.ll
  llvm/test/CodeGen/AMDGPU/fast-unaligned-load-store.private.ll
  llvm/test/CodeGen/AMDGPU/insert_vector_elt.v2i16.ll
  llvm/test/CodeGen/AMDGPU/load-hi16.ll
  llvm/test/CodeGen/AMDGPU/load-lo16.ll
  llvm/test/CodeGen/AMDGPU/load-local.128.ll
  llvm/test/CodeGen/AMDGPU/load-local.96.ll
  llvm/test/CodeGen/AMDGPU/pack.v2f16.ll
  llvm/test/CodeGen/AMDGPU/pack.v2i16.ll
  llvm/test/CodeGen/AMDGPU/permute.ll
  llvm/test/CodeGen/AMDGPU/permute_i8.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D142782.492893.patch
Type: text/x-patch
Size: 92791 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20230127/6371ebe0/attachment.bin>