[PATCH] D47370: AMDGPU: Round up kernel argument allocation size
Tony Tye via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri May 25 13:58:55 PDT 2018
t-tye added inline comments.
================
Comment at: lib/Target/AMDGPU/AMDGPUSubtarget.cpp:426
+ // Being able to dereference past the end is useful for emitting scalar loads.
+ return alignTo(TotalSize, 4);
}
----------------
arsenm wrote:
> t-tye wrote:
> > I believe you can align this to 16. See HSA spec at www.hsafoundation.com/html_spec111/HSA_Library.htm#PRM/Topics/04_SyntaxSemantics/kernarg_segment.htm which says:
> >
> > "The alignment of the base address of the kernel's kernarg segment variables is the larger of 16 bytes and the maximum alignment of the kernel's kernarg segment variables."
> >
> > I suspect that the OpenCL runtime simply aligns all kernarg allocations to 256, but I am not sure about other languages.
> I think we really only need to pad up to 4 at the end to avoid vmem extloads. Wider scalar loads at the end may be useful in some cases, but we handle them so badly right now that I wouldn't worry about it yet.
Confirmed that OpenCL always aligns the kernarg segment up to 128. I don't think the compiler should rely on anything higher than the 16 bytes defined by the HSA spec, but it is interesting to know.
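
To make the two options concrete, here is a minimal sketch of the rounding under discussion. The helper names `alignUp` and `paddedKernArgSize` are hypothetical stand-ins written for illustration (the patch itself calls `alignTo`), and the sizes are made-up examples, not values taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the alignTo call in the patch: rounds Size up
// to the next multiple of Align. Align must be nonzero.
constexpr uint64_t alignUp(uint64_t Size, uint64_t Align) {
  return (Size + Align - 1) / Align * Align;
}

// Hypothetical helper: the kernarg allocation size after padding the raw
// argument size to a chosen granularity (4 as in the patch, or 16 as the
// HSA base-alignment rule would permit).
constexpr uint64_t paddedKernArgSize(uint64_t TotalSize, uint64_t Granularity) {
  return alignUp(TotalSize, Granularity);
}

// Illustrative sizes only.
static_assert(paddedKernArgSize(13, 4) == 16, "13 bytes padded to a dword");
static_assert(paddedKernArgSize(20, 4) == 20, "already dword aligned");
static_assert(paddedKernArgSize(20, 16) == 32, "padded to the HSA minimum");
```

With 4-byte padding, a trailing sub-dword argument can always be fetched with a dword scalar load; padding to 16 would additionally allow a 16-byte scalar load of the tail, at the cost of a slightly larger allocation.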
https://reviews.llvm.org/D47370