[PATCH] D55067: [HIP] Fix offset of kernel argument for AMDGPU target

Thu Nov 29 12:36:16 PST 2018

yaxunl added a comment.

In D55067#1313213 <https://reviews.llvm.org/D55067#1313213>, @rjmccall wrote:

> This seems backwards.  Clang knows what the actual ABI alignment of the C type is, and it doesn't have to match the alignment of the IR type that IRGen produces.  It's the actual C ABI alignment that's supposed to affect the calling convention, so there needs to be some way to specify the C ABI alignment on the parameter in IR.  That may mean using `byval`, which can be given an explicit alignment.

Also, the alignment of the kernel argument is only used to put the argument in some buffer accessible to the device. It needs not match the C ABI alignment. If the user take address of this argument, they will get address of an alloca, which matches the C ABI alignment.

For example, in code below, %s.coerce is OK to align to 1. If user takes its address, they get %s, which is aligned to 8.

  %struct.S = type <{ i32*, i8, %struct.U, [5 x i8] }>
  %struct.U = type { i16 }

  ; Function Attrs: convergent noinline nounwind optnone
  define amdgpu_kernel void @_Z6kernelc1SPi(i8 %a, %struct.S %s.coerce, i32* %b) #0 {
  entry:
    %s = alloca %struct.S, align 8, addrspace(5)
    %s1 = addrspacecast %struct.S addrspace(5)* %s to %struct.S*
    %a.addr = alloca i8, align 1, addrspace(5)
    %0 = addrspacecast i8 addrspace(5)* %a.addr to i8*
    %b.addr = alloca i32*, align 8, addrspace(5)
    %1 = addrspacecast i32* addrspace(5)* %b.addr to i32**
    store %struct.S %s.coerce, %struct.S* %s1, align 8
    store i8 %a, i8* %0, align 1
    store i32* %b, i32** %1, align 8
    ret void
  }

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D55067/new/

https://reviews.llvm.org/D55067