[PATCH] D73127: AMDGPU/GlobalISel: Widen non-power-of-2 load results

Nicolai Hähnle via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Feb 12 02:42:46 PST 2020


nhaehnle accepted this revision.
nhaehnle added inline comments.
This revision is now accepted and ready to land.


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:720-722
+    unsigned Align = Query.MMODescrs[0].AlignInBits;
+    unsigned RoundedSize = NextPowerOf2(Size);
+    return (Align >= RoundedSize);
----------------
arsenm wrote:
> nhaehnle wrote:
> > As long as the alignment and size are both at least 32, I believe we can always support it. E.g., a dwordx4 load from a pointer that's 4-byte aligned is okay.
> > 
> > Though... admittedly in that case you may end up loading an additional cache line which you otherwise wouldn't load, because you cross a cacheline boundary. So maybe the right decision is to keep the test as is?
> > 
> > Either way, the decision ought to be documented as a comment.
> This isn't really changing the handling of alignment breakdown, or even the memory access. This is only for the result register type, and just needs to avoid doing something incorrect. The decisions for decomposing based on alignment is already present in the other legalization rules. The rule set here is pretty confusing, and soon due for a rewrite after a few more issues are fixed in the LegalizerHelper
Okay, fair enough.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D73127/new/

https://reviews.llvm.org/D73127





More information about the llvm-commits mailing list