[PATCH] D140599: AMDGPU: Promote array alloca if used by memmove/memcpy

Sun Dec 25 21:49:04 PST 2022

ruiling added a comment.

In D140599#4015680 <https://reviews.llvm.org/D140599#4015680>, @arsenm wrote:

> In D140599#4014542 <https://reviews.llvm.org/D140599#4014542>, @ruiling wrote:
>
>> In D140599#4014540 <https://reviews.llvm.org/D140599#4014540>, @arsenm wrote:
>>
>>> Shouldn't these have been expanded into loads and stores already?
>>
>> We only split mem transfer intrinsics using a loop if its length is > 1024.
>
> I think InstCombine and/or SROA do some too. Also, that threshold is arbitrary, and we could start expanding the small ones in IR too.

If I understand correctly, InstCombine focuses on simplifying mem transfer intrinsics rather than expanding them, and it only does so in very limited cases.

In fact, the change here shares a similar idea with SROA: it analyzes whether the alloca involved in a memmove/memcpy can be promoted to vector/scalar operations. If it is promotable, we expand the mem transfer intrinsic as part of the promotion process. I would say the important thing here is to get optimal code generation for small allocas; expanding small mem transfers is not that important by itself. Does this sound reasonable to you? Or could you share how you would like us to expand small mem transfer intrinsics?
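To illustrate what "expanding the mem transfer as part of the promotion" can mean, here is a minimal standalone sketch, not the code in this patch. It assumes the whole alloca has already been promoted to a single vector value of NumElts elements, and that the memcpy/memmove uses constant, element-aligned offsets into that same alloca. Under those assumptions the transfer can be modeled as a single-source `shufflevector` mask. The helper names `buildMemTransferMask` and `applyMask` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: models memcpy/memmove(A + DestBegin, A + SrcBegin,
// NumCopied elements) on an alloca A that was promoted to one vector value of
// NumElts elements. Mask[I] is the lane of the *original* vector that lane I of
// the result reads from, i.e. a single-source shufflevector mask.
std::vector<int> buildMemTransferMask(std::size_t NumElts, std::size_t DestBegin,
                                      std::size_t SrcBegin, std::size_t NumCopied) {
  // Assumption: both ranges lie entirely inside the promoted alloca.
  assert(DestBegin + NumCopied <= NumElts && SrcBegin + NumCopied <= NumElts);
  std::vector<int> Mask(NumElts);
  for (std::size_t I = 0; I < NumElts; ++I) {
    bool InDest = I >= DestBegin && I < DestBegin + NumCopied;
    // Lanes inside the destination range take the matching source lane;
    // all other lanes keep their current value.
    Mask[I] = static_cast<int>(InDest ? SrcBegin + (I - DestBegin) : I);
  }
  return Mask;
}

// Applies the mask the way a single-source shufflevector would. Every lane
// reads from the unmodified input vector, so overlapping (memmove) ranges
// come out correctly without any temporary buffer.
template <typename T>
std::vector<T> applyMask(const std::vector<T> &Vec, const std::vector<int> &Mask) {
  std::vector<T> Out(Mask.size());
  for (std::size_t I = 0; I < Mask.size(); ++I)
    Out[I] = Vec[static_cast<std::size_t>(Mask[I])];
  return Out;
}
```

For example, an overlapping memmove of three elements from index 0 to index 1 of a four-element alloca gives the mask {0, 0, 1, 2}, turning {10, 20, 30, 40} into {10, 10, 20, 30}, which matches memmove semantics. Because the result is a pure permutation of the promoted value, no private memory traffic remains once the alloca is promoted.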

In the long term, I hope we can enhance SROA to cover the optimization we do in tryPromoteAllocaToVector(), but that needs much more time/effort.

Repository:
  rG LLVM Github Monorepo


https://reviews.llvm.org/D140599