[PATCH] D142782: [AMDGPU] Add basic support for extended i8 perm matching

Fri Feb 3 11:31:12 PST 2023

jrbyrnes added a comment.

In D142782#4095092 <https://reviews.llvm.org/D142782#4095092>, @arsenm wrote:

> The ratio of test to code changes has me worried the test coverage is incomplete

This patch is intended simply to bring in the components necessary for i8 perm matching. Fitting and tuning it to be optimally useful in actual workloads is left to a future iteration. As such, the heuristics and conditions we use to apply the combine are very restrictive (e.g. no multi-use operands in the or, the IsCombineVectorized heuristic, no support for 16-bit ors, etc.). For this iteration, my primary testing concerns were true positives (i.e. v_perm is produced accurately when we expect it) and false positives (correctness errors or inefficient codegen). False negatives (missed opportunities) are left to a future iteration.

True positive coverage:
There are 4096 4xi8 shuffle_vector permutations. I tested and validated all of them. The tests initially included covered all trees for these permutations.
There are 8192 4xi8 shuffle_vector permutations where 1 operand is undef. This iteration doesn't fully support these: roughly 2k of them are lowered to v_perm, and I validated all of those.
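
For reference, the 4096 figure follows from each of the 4 output lanes choosing one of the 8 bytes available across two 4xi8 operands (8^4 = 4096). The sketch below shows roughly what an exhaustive check over that space could look like. It is not the validation harness used for this patch: the selector encoding and `model_byte_perm` are hypothetical stand-ins, not the v_perm_b32 ISA definition, and all helper names are made up for illustration.

```python
from itertools import product

LANES = 4            # output lanes of a 4xi8 shuffle
SOURCES = 2 * LANES  # bytes available across the two 4xi8 operands


def all_masks():
    """Yield every shuffle mask: one source-byte index per output lane."""
    return product(range(SOURCES), repeat=LANES)


def reference_shuffle(a, b, mask):
    """Reference semantics: index i picks byte i of the concatenation a ++ b."""
    combined = list(a) + list(b)
    return [combined[i] for i in mask]


def pack_selector(mask):
    """Pack the four lane indices into one 32-bit word, one byte per lane.

    Hypothetical encoding, chosen only for this illustration.
    """
    selector = 0
    for lane, index in enumerate(mask):
        selector |= index << (8 * lane)
    return selector


def model_byte_perm(a, b, selector):
    """Simplified byte-permute model driven by the packed selector.

    Assumption: this is NOT the exact hardware definition of v_perm.
    """
    combined = list(a) + list(b)
    return [combined[(selector >> (8 * lane)) & 0xFF] for lane in range(LANES)]


def validate_all():
    """Compare the model against the reference for every mask; return the count."""
    a = [0x10, 0x11, 0x12, 0x13]
    b = [0x20, 0x21, 0x22, 0x23]
    checked = 0
    for mask in all_masks():
        expected = reference_shuffle(a, b, mask)
        actual = model_byte_perm(a, b, pack_selector(mask))
        assert actual == expected, f"mismatch for mask {mask}"
        checked += 1
    return checked


if __name__ == "__main__":
    print(validate_all())  # number of masks checked
```

Distinct byte values in the two operands make any wrong lane selection show up as a mismatch.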

False positive coverage:
lit tests
CK correctness tests
epsdb (to be run)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142782/new/

https://reviews.llvm.org/D142782