[PATCH] D73127: AMDGPU/GlobalISel: Widen non-power-of-2 load results
Matt Arsenault via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Feb 5 05:59:52 PST 2020
arsenm marked an inline comment as done.
arsenm added a comment.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:720-722
+ unsigned Align = Query.MMODescrs[0].AlignInBits;
+ unsigned RoundedSize = NextPowerOf2(Size);
+ return (Align >= RoundedSize);
----------------
nhaehnle wrote:
> As long as the alignment and size are both at least 32, I believe we can always support it. E.g., a dwordx4 load from a pointer that's 4-byte aligned is okay.
>
> Though... admittedly in that case you may end up loading an additional cache line which you otherwise wouldn't load, because you cross a cacheline boundary. So maybe the right decision is to keep the test as is?
>
> Either way, the decision ought to be documented as a comment.
This isn't really changing the handling of alignment breakdown, or even the memory access. This is only for the result register type, and just needs to avoid doing something incorrect. The decisions for decomposing based on alignment is already present in the other legalization rules. The rule set here is pretty confusing, and soon due for a rewrite after a few more issues are fixed in the LegalizerHelper
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D73127/new/
https://reviews.llvm.org/D73127
More information about the llvm-commits
mailing list