[PATCH] D120129: [NVPTX] Enhance vectorization of ld.param & st.param

Tue Feb 22 11:21:41 PST 2022

tra added a comment.

FYI, I've recently proposed passing the values directly instead of via a byval pointer: https://discourse.llvm.org/t/nvptx-calling-convention-for-aggregate-arguments-passed-by-value/5881
It needs additional changes to get the lowering right. I should have them ready in about a week.

> This change may be done if the function has private or internal linkage.

I think we should be able to do that for all non-kernel functions if we're compiling without -fgpu-rdc. I think we already reduce the visibility of non-kernels in that case, but it would be good to make sure.

> If S is a multiple of 4 * A, let special alignment be 4 * A.
> Else, if S is a multiple of 2 * A, let special alignment be 2 * A.
> Else, let special alignment be A.

I'm not sure that logic makes sense. E.g. if we have `[5 x i32]`, the optimal way to load it would be to align it to 16 and use `ld.v4` + `ld`. With your approach, `[4 x i32]` would be loaded as `ld.v4` and `[6 x i32]` as `3 * ld.v2`, but `[5 x i32]` would use `5 * ld`. If we can align 4- and 6-element arrays, I do not see why we would not be allowed to align a 5-element array, too: it's equivalent to `struct { [4 x i32], i32 }` as far as in-memory layout is concerned.
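To make the arithmetic concrete, here is a small sketch of the quoted rule applied to arrays of `i32`. It assumes A = 4 bytes, that every access uses the same width (special alignment / A elements), and uses `ld`, `ld.v2`, `ld.v4` as shorthand for the corresponding `ld.param` forms. The function names are hypothetical and this is not the patch's actual code.

```python
# Hypothetical model of the "special alignment" rule quoted above.
# size  = aggregate size in bytes (S), align = element alignment in bytes (A).
# Assumes scalar elements whose size equals their alignment (e.g. i32).

def special_alignment(size: int, align: int) -> int:
    if size % (4 * align) == 0:
        return 4 * align  # aggregate splits evenly into v4 chunks
    if size % (2 * align) == 0:
        return 2 * align  # aggregate splits evenly into v2 chunks
    return align          # otherwise fall back to scalar accesses


def access_plan(size: int, align: int) -> list[str]:
    """List the accesses implied by the rule, all of the same width."""
    special = special_alignment(size, align)
    width = special // align  # elements per access: 4, 2 or 1
    op = {4: "ld.v4", 2: "ld.v2", 1: "ld"}[width]
    return [op] * (size // special)


for n in (4, 5, 6):
    plan = access_plan(4 * n, 4)
    print(f"[{n} x i32]: {len(plan)} x {plan[0]}")
# [4 x i32]: 1 x ld.v4
# [5 x i32]: 5 x ld
# [6 x i32]: 3 x ld.v2
```

The 20-byte `[5 x i32]` is a multiple of neither 16 nor 8, so the rule drops all the way to scalar loads even though its first 16 bytes could be covered by a single `ld.v4`.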

I wonder if we can just always set the alignment to 16. The byval pointer for NVPTX is a fiction anyway, as we always copy the data when we actually lower those arguments.
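A rough sketch of what a fixed 16-byte alignment would allow: cover the aggregate greedily with the widest access whose size divides the current offset. It assumes `i32` elements, considers only `v4`/`v2`/scalar accesses as in the discussion above, and `greedy_plan` is a hypothetical helper, not the NVPTX backend's lowering code.

```python
# Hypothetical greedy splitter, assuming the param base is `base_align`-aligned.
# Assumes power-of-two sizes and base_align >= elem_size.

def greedy_plan(num_elems: int, elem_size: int = 4, base_align: int = 16) -> list[str]:
    assert base_align >= elem_size, "scalar access must always be possible"
    plan = []
    offset = 0  # byte offset from the aligned start of the param
    remaining = num_elems
    while remaining:
        for width, op in ((4, "ld.v4"), (2, "ld.v2"), (1, "ld")):
            nbytes = width * elem_size
            # A vector access needs enough elements left, an offset that is a
            # multiple of its size, and a base alignment at least that large.
            if remaining >= width and offset % nbytes == 0 and nbytes <= base_align:
                plan.append(op)
                offset += nbytes
                remaining -= width
                break
    return plan


for n in (4, 5, 6):
    print(f"[{n} x i32]: {' + '.join(greedy_plan(n))}")
# [4 x i32]: ld.v4
# [5 x i32]: ld.v4 + ld
# [6 x i32]: ld.v4 + ld.v2
```

With the base fixed at 16, `[5 x i32]` needs two accesses instead of five, and `[6 x i32]` needs two instead of three.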

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D120129