[PATCH] D142782: [AMDGPU] Add basic support for extended i8 perm matching

Thu Feb 23 13:47:47 PST 2023

jrbyrnes updated this revision to Diff 499971.
jrbyrnes added a comment.

Check that we are actually doing 8 bit extraction before lowering into v_perm.

We can determine (based on the potential perm mask and operands) if we need to insert any 8 bit extraction code.

For example, a perm mask of 0x05040100 suggests we will not need to extract any bits from the operands iff they have 16 bits of data (e.g. zext 16 load into 32 bit). In this case, we assume CodeGen will lower it well, and do not combine into v_perm. If, however, the operands are 32 bit, then we will need to insert mask code, so we do lower to v_perm.
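
To make the 0x05040100 case concrete, here is a minimal software model of a byte permute. It is a sketch under stated assumptions, not the ISA definition and not the code in SIISelLowering.cpp: the name permBytes, the operand order (selectors 0-3 read `lo`, selectors 4-7 read `hi`), and the treatment of selectors above 7 as zero are all assumptions made for illustration.

```cpp
#include <cstdint>

// Simplified byte-permute model (assumption: selector bytes 0-3 pick a byte
// of `lo`, 4-7 pick a byte of `hi`; any other selector yields 0 here).
uint32_t permBytes(uint32_t hi, uint32_t lo, uint32_t mask) {
  const uint64_t src = (static_cast<uint64_t>(hi) << 32) | lo;
  uint32_t result = 0;
  for (unsigned i = 0; i < 4; ++i) {
    // Selector for result byte i.
    const unsigned sel = (mask >> (8 * i)) & 0xFF;
    const uint32_t byte =
        sel < 8 ? static_cast<uint32_t>((src >> (8 * sel)) & 0xFF) : 0u;
    result |= byte << (8 * i);
  }
  return result;
}

// What mask 0x05040100 computes, written as ordinary integer ops.
// The `& 0xFFFF` and the bits shifted out of `hi` are the 8 bit extraction
// work that only matters when the operands carry more than 16 bits of data.
uint32_t packLow16(uint32_t hi, uint32_t lo) {
  return (hi << 16) | (lo & 0xFFFF);
}
```

With the hypothetical values hi = 0xAABBCCDD and lo = 0x11223344, permBytes(hi, lo, 0x05040100) takes 0x3344 from lo and 0xCCDD from hi. When both operands already fit in 16 bits, the same result is just (hi << 16) | lo, which needs no extraction.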

As another example, if we have a mask of 0x05040201 then we will lower into v_perm for multiple reasons: 1. the 0x0201 portion of the mask implies a 32 bit operand, 2. the 0x0201 portion of the mask is not well formed, since it requires a shift instruction to address these bits.
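
The two reasons given for 0x05040201 can be checked per 16 bit half of the mask. The sketch below uses a hypothetical helper, classifyHalf, and a hypothetical struct, HalfInfo; it assumes every selector byte is in the range 0-7 and is only an illustration of the reasoning, not the check implemented in the patch.

```cpp
#include <cstdint>

// Hypothetical summary of one 16 bit half of a perm mask.
struct HalfInfo {
  bool needsUpperBytes; // reads byte 2 or 3 of an operand (implies 32 bit data)
  bool aligned;         // selects an in-order, 16 bit aligned pair of bytes
};

// Assumes both selector bytes are in 0-7 (no special selector values).
HalfInfo classifyHalf(uint16_t halfMask) {
  const unsigned s0 = halfMask & 0xFF;        // selector for the lower result byte
  const unsigned s1 = (halfMask >> 8) & 0xFF; // selector for the upper result byte
  HalfInfo info;
  // Byte position within the selected operand is the selector modulo 4.
  info.needsUpperBytes = (s0 & 0x3) >= 2 || (s1 & 0x3) >= 2;
  // A well formed pair starts on an even byte and continues with the next one.
  info.aligned = (s0 % 2 == 0) && (s1 == s0 + 1);
  return info;
}
```

For the half 0x0201 this reports needsUpperBytes = true and aligned = false, matching both reasons above, while 0x0504 and 0x0100 report needsUpperBytes = false and aligned = true.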

Finally, if the mask and operands indicate we are just producing one of the ops, combine the tree into the op.
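
The mask-only part of that last case can be sketched as follows. The function name identityOperand and its return convention are hypothetical, and the sketch assumes selectors 0-3 address one operand and 4-7 the other; it does not model the operand-dependent conditions the patch also considers.

```cpp
#include <cstdint>

// Returns 0 if the mask reproduces the operand addressed by selectors 0-3,
// 1 if it reproduces the operand addressed by selectors 4-7, and -1 if the
// mask mixes or reorders bytes (hypothetical convention for illustration).
int identityOperand(uint32_t mask) {
  if (mask == 0x03020100u)
    return 0;
  if (mask == 0x07060504u)
    return 1;
  return -1;
}
```

These two masks correspond to the shufflevector masks <i32 0, i32 1, i32 2, i32 3> and <i32 4, i32 5, i32 6, i32 7>.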

Feature resulted in changes -- optimally not lowering into v_perm -- in:
	CodeGen/AMDGPU/combine-vload-extract.ll
	CodeGen/AMDGPU/cvt_f32_ubyte.ll
	CodeGen/AMDGPU/ds_read2.ll
	CodeGen/AMDGPU/fast-unaligned-load-store.global.ll
	CodeGen/AMDGPU/fast-unaligned-load-store.private.ll
	CodeGen/AMDGPU/load-hi16.ll
	CodeGen/AMDGPU/load-local.128.ll
	CodeGen/AMDGPU/load-local.96.ll
	CodeGen/AMDGPU/permute.ll

All 4096 permutations of <4 x i8> shufflevector produced the desired result (including <i32 0, i32 1, i32 2, i32 3> and <i32 4, i32 5, i32 6, i32 7>, which lower into the corresponding 32 bit operand).
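
The figure of 4096 is 8^4: each of the four result elements can take any of the eight input elements of the two <4 x i8> sources. A minimal enumeration sketch follows; the names MaskCounts and enumerateShuffleMasks are hypothetical, and mapping each shuffle index directly to a selector byte is an assumption made for illustration, not the test generator used for this patch.

```cpp
#include <cstdint>

struct MaskCounts {
  unsigned total;    // number of shuffle masks visited
  unsigned identity; // masks that reproduce one source vector unchanged
};

MaskCounts enumerateShuffleMasks() {
  MaskCounts counts = {0, 0};
  for (unsigned code = 0; code < 8u * 8u * 8u * 8u; ++code) {
    // Decode four base-8 digits: one shuffle index (0-7) per result element.
    uint32_t selector = 0;
    for (unsigned i = 0; i < 4; ++i) {
      const unsigned index = (code >> (3 * i)) & 0x7;
      selector |= index << (8 * i);
    }
    ++counts.total;
    // <0,1,2,3> and <4,5,6,7> pass one source through untouched.
    if (selector == 0x03020100u || selector == 0x07060504u)
      ++counts.identity;
  }
  return counts;
}
```

The loop visits 4096 masks, exactly two of which are the pass-through cases mentioned above.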

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142782/new/

https://reviews.llvm.org/D142782

Files:
  llvm/lib/Target/AMDGPU/SIISelLowering.cpp
  llvm/test/CodeGen/AMDGPU/combine-vload-extract.ll
  llvm/test/CodeGen/AMDGPU/cvt_f32_ubyte.ll
  llvm/test/CodeGen/AMDGPU/fast-unaligned-load-store.private.ll
  llvm/test/CodeGen/AMDGPU/insert_vector_elt.v2i16.ll
  llvm/test/CodeGen/AMDGPU/load-lo16.ll
  llvm/test/CodeGen/AMDGPU/pack.v2f16.ll
  llvm/test/CodeGen/AMDGPU/pack.v2i16.ll
  llvm/test/CodeGen/AMDGPU/permute.ll
  llvm/test/CodeGen/AMDGPU/permute_i8.ll
