[PATCH] D73127: AMDGPU/GlobalISel: Widen non-power-of-2 load results

Matt Arsenault via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Feb 5 05:59:52 PST 2020


arsenm marked an inline comment as done.
arsenm added a comment.





================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:720-722
+    unsigned Align = Query.MMODescrs[0].AlignInBits;
+    unsigned RoundedSize = NextPowerOf2(Size);
+    return (Align >= RoundedSize);
----------------
nhaehnle wrote:
> As long as the alignment and size are both at least 32, I believe we can always support it. E.g., a dwordx4 load from a pointer that's 4-byte aligned is okay.
> 
> Though... admittedly in that case you may end up loading an additional cache line which you otherwise wouldn't load, because you cross a cacheline boundary. So maybe the right decision is to keep the test as is?
> 
> Either way, the decision ought to be documented as a comment.
This isn't really changing the handling of alignment breakdown, or even the memory access. This is only for the result register type, and just needs to avoid doing something incorrect. The decisions for decomposing based on alignment is already present in the other legalization rules. The rule set here is pretty confusing, and soon due for a rewrite after a few more issues are fixed in the LegalizerHelper


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D73127/new/

https://reviews.llvm.org/D73127





More information about the llvm-commits mailing list