[PATCH] D119247: [NVPTX] Use align attribute for kernel pointer arg alignment

Wed Feb 9 12:23:03 PST 2022

nikic added a reviewer: opaque-pointers.
nikic added inline comments.

================
Comment at: llvm/test/CodeGen/NVPTX/nvcl-param-align.ll:7
+define void @foo(i64 %img, i64 %sampler, <5 x float>* align 32 %v) {
+; The parameter alignment is determined by the align attribute.
 ; CHECK: .param .u32 .ptr .align 32 foo_param_2
----------------
tra wrote:
> nikic wrote:
> > tra wrote:
> > > nikic wrote:
> > > > tra wrote:
> > > > > What's expected to happen if the alignment is not specified explicitly?
> > > > > 
> > > > > It may be worth adding a test case for that.
> > > > If no alignment is specified, then the default is 1. I've extended the test to check for this.
> > > Always defaulting to unaligned access would likely be a performance regression.
> > > LLVM IR spec generally defaults to `If the alignment is not specified, then the code generator makes a target-specific assumption.`
> > > I think we do need to infer assumed alignment from the data layout, if we do know the type.
> > > 
> > There may be some confusion here: LLVM defaults to ABI alignment for alignment of memory accesses like loads and stores (or rather, these accesses always have explicit alignment, but this default is usedwhen parsing textual IR). However, for pointer arguments, the default alignment is 1 if no `align` attribute is provided (and it's not a `byval` parameter, but that's explicitly excluded here).
> > 
> > There should be no regressions here as long as the frontend adds the necessary alignment annotations, which was done for OpenCL kernels in D118894. As this code is only used for NVCL and not CUDA I assume that is sufficient, but please correct me if I'm wrong.
> The nominal goal of this patch is to allow dealing with opaque pointer types. It does not require defaulting to alignment of 1. Sticking with a known working default alignment for the non-opaque pointers seems to be a better choice.
> 
> I agree that it's more or less a no-op if IR is generated by clang. However, there are other LLVM IR users that generate IR themselves. E.g. I have no idea whether TensorFlow's XLA, or julia, or any other LLVM-based JIT always passes alignment into for pointer arguments.
> 
> 
> 
> 
If other frontends don't generate the necessary alignment information, then we want to break those frontends ASAP, so they can be adjusted. As a matter of general policy, we always make changes necessary for opaque pointers unconditionally, so that we can deal with their effects in an isolated fashion. Making behavior changes part of the opaque pointer switch itself means that it will be very hard to root-cause the reason for any particular change.

FWIW the equivalent change has already been made in the AMDGPU target.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D119247/new/

https://reviews.llvm.org/D119247