[PATCH] D80364: [amdgpu] Teach load widening to handle non-DWORD aligned loads.
Michael Liao via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri May 29 20:42:28 PDT 2020
hliao added a comment.
In D80364#2063603 <https://reviews.llvm.org/D80364#2063603>, @arsenm wrote:
> I did some experiments locally and think this can stay in AMDGPUCodeGenPrepare, and doesn't need the split pass. Since you restrict this widening to the case where you're rebasing the load anyway, I don't think this will cause the same problems with the vectorizer the previous IR load widening had (and may help it even?)
>
> test3 should also come back, but should have the explicit align 4 added to the load. This could also use some loads of i8, and <2 x i8>. We could also extend this to handle wider, sub-dword aligned types but that's a separate patch.
Scalar load widening should run after LSV to avoid generating redundant loads. Cases like a sequence of consecutive loads of `i16` benefit from that ordering. Here are the details
for 4 loads of i16:
ld.i16 (ptr + 0)
ld.i16 (ptr + 2)
ld.i16 (ptr + 4)
ld.i16 (ptr + 6)
If we run scalar load widening before LSV, then after widening we have
ld.i16 (ptr + 0)
ld.i32 (ptr + 0)
ld.i16 (ptr + 4)
ld.i32 (ptr + 4)
After LSV, we have
ld.i16 (ptr + 0)
ld.i32x2 (ptr + 0)
ld.i16 (ptr + 4)
Those two i16 loads are redundant. If we run scalar load widening after LSV, we won't have that result.
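The ordering argument above can be sketched with a toy model. This is not the real AMDGPU widening pass or the Load/Store Vectorizer; it is a hypothetical simplification where a load is an `(offset, bytes)` pair, widening rebases sub-DWORD loads at non-DWORD offsets, and "LSV" greedily chains contiguous loads of the same element size:

```python
def widen(loads):
    """Toy widening: a sub-DWORD load at a non-DWORD offset is rebased
    to the enclosing DWORD and widened to 4 bytes (i32); DWORD-aligned
    loads are left alone. Purely illustrative."""
    out = []
    for off, size in loads:
        if size < 4 and off % 4 != 0:
            out.append((off - off % 4, 4))  # rebase + widen to i32
        else:
            out.append((off, size))
    return out

def vectorize(loads):
    """Toy LSV: chain loads of the same element size whose offsets are
    contiguous into one wide load. Ignores the real pass's alignment
    and chain-length legality checks."""
    runs = []  # (start_offset, total_bytes, element_bytes)
    for off, size in sorted(loads):
        for i, (start, total, elt) in enumerate(runs):
            if start + total == off and elt == size:
                runs[i] = (start, total + size, elt)
                break
        else:
            runs.append((off, size, size))
    return [(start, total) for start, total, _ in runs]

# Four consecutive i16 loads at ptr+0, +2, +4, +6:
i16_loads = [(0, 2), (2, 2), (4, 2), (6, 2)]

# Widening before LSV leaves redundant i16 loads behind:
print(vectorize(widen(i16_loads)))  # [(0, 2), (0, 8), (4, 2)]

# Widening after LSV yields a single wide load:
print(widen(vectorize(i16_loads)))  # [(0, 8)]
```

In the first ordering the `(0, 2)` and `(4, 2)` entries correspond to the redundant `ld.i16` loads in the example, fully covered by the `(0, 8)` chain (the `ld.i32x2`); in the second ordering LSV consumes all four i16 loads first, so widening has nothing left to duplicate.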
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D80364/new/
https://reviews.llvm.org/D80364