[PATCH] D143731: [AMDGPU] Scalarize some large PHIs for DAGISel

Mon Feb 13 02:09:14 PST 2023

Pierre-vh updated this revision to Diff 496880.
Pierre-vh marked 6 inline comments as done.
Pierre-vh added a comment.

I kept the threshold at 256 for now because lower values make the transform kick in more often, and it's not always clear to me whether that is a net positive or a negative: some tests end up with more instructions, some with slightly fewer.
I think 256 is good enough to get started; we can always tweak it in a separate commit if benchmarking shows that it's useful. What do you think?

Also the following tests are currently broken:

- `test_mfma_loop_32xfloat` in `acc-ldst.ll`: we get accvgpr reads/writes instead of a load directly into AGPRs.
  - One incoming value from a load
  - One incoming value from the call to mfma later in the same block (loop)
- `test_vccnz_ifcvt_triangle256` in `early-if-convert.ll`: `s_cbranch_vccnz` is no longer matched.
  - One incoming value from a load
  - One incoming value from a vector add
- `test_mfma_loop_zeroinit` in `mfma-loop.ll`: accvgpr copies appear inside the loop.
  - One incoming value is a zeroinitializer
  - One incoming value from the call to mfma later in the same block (loop)

I think we need an additional heuristic, but I'm not sure what. Maybe we should not transform when one of the incoming values comes from a load (we know there won't be an undef, so it should be fine) or comes from the same block (loop)?
I'm also not sure whether extract/insert subvector with a constant operand (as in the zeroinit case) gets folded away; do I need to special-case it?
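
To make the heuristic idea above concrete, here is a minimal, self-contained sketch of the decision logic. It is not the code in the patch and does not use the LLVM API: `IncomingKind`, `Incoming`, `VectorPhi` and `shouldBreakPhi` are hypothetical names, and treating PHIs whose size is at or above the threshold as candidates is an assumption about how the threshold is compared.

```cpp
#include <vector>

// Hypothetical, simplified model of a PHI's incoming values (not the LLVM API).
enum class IncomingKind { Load, ZeroInit, Call, Other };

struct Incoming {
  IncomingKind Kind; // how the incoming value is produced
  int DefBlock;      // id of the block that defines the value
};

struct VectorPhi {
  unsigned SizeInBits;             // total width of the vector type
  int Block;                       // id of the block containing the PHI
  std::vector<Incoming> Incomings; // one entry per incoming value
};

// Returns true if the PHI would be broken into per-element PHIs.
// Assumption: PHIs at or above the threshold are candidates.
bool shouldBreakPhi(const VectorPhi &Phi, unsigned ThresholdBits = 256) {
  if (Phi.SizeInBits < ThresholdBits)
    return false; // too small to be worth scalarizing

  for (const Incoming &In : Phi.Incomings) {
    // Proposed extra heuristic 1: skip when a value comes from a load.
    if (In.Kind == IncomingKind::Load)
      return false;
    // Proposed extra heuristic 2: skip when a value is defined in the PHI's
    // own block, i.e. it is carried around a single-block loop.
    if (In.DefBlock == Phi.Block)
      return false;
  }
  return true;
}
```

With the incoming values listed for the three broken tests, this sketch would skip all of them: the first two because of the load, and the mfma loops because one value is defined in the PHI's own block. Whether that is too conservative for the cases the transform is meant to help is exactly the open question.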

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D143731/new/

https://reviews.llvm.org/D143731

Files:
  llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
  llvm/test/CodeGen/AMDGPU/acc-ldst.ll
  llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-break-large-phis.ll
  llvm/test/CodeGen/AMDGPU/early-if-convert.ll
  llvm/test/CodeGen/AMDGPU/extract-subvector-16bit.ll
  llvm/test/CodeGen/AMDGPU/mfma-loop.ll
  llvm/test/CodeGen/AMDGPU/tuple-allocation-failure.ll
  llvm/test/CodeGen/AMDGPU/v1024.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D143731.496880.patch
Type: text/x-patch
Size: 142058 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20230213/b7966cfa/attachment.bin>