[PATCH] D143731: [AMDGPU] Break-up large PHIs for DAGISel

Tue Feb 28 02:52:04 PST 2023

foad added a comment.

In D143731#4157821 <https://reviews.llvm.org/D143731#4157821>, @Pierre-vh wrote:

> In D143731#4157819 <https://reviews.llvm.org/D143731#4157819>, @foad wrote:
>
>>> This is because it introduces a need to have a build_vector before copying the PHI value, and that build_vector may have many undef elements. This can cause very high register pressure and abnormal stack usage in some cases.
>>
>> Do you have any insight into *why* undef elements in build_vector would cause high register pressure? Is there any chance of fixing this elsewhere, by teaching other parts of the compiler to handle undef elements better?
>
> In the particular case I was looking at, we had a 32xi32 vector (from 11xF64 legalized) with 10 undef elements. As the final operation of the BB was a BUILD_VECTOR + CopyToReg, it forced 22 values to be alive until that point and did a wasteful 32x32 copy for this 11xf64 vector. With this change, the undef values can be optimized out and the intermediate values (elements) no longer need to be all alive at the same point. This means they can be reordered and scratch usage goes back down to zero.
>
> Before this patch, I had looked at optimizing this at the DAG level but it's too much effort to do there. As Matt put it earlier, PHIs/Copies larger than 32 bits are "unnatural" on AMDGPU anyway so there's no real benefit in keep around large PHIs.

Then why can't you scalarize them all? Why do we need yet another weird threshold in the backend?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D143731/new/

https://reviews.llvm.org/D143731