[PATCH] D138531: [PATCH] [NVPTX] Backend support for variadic functions

Wed Dec 7 15:12:11 PST 2022

pavelkopyl added inline comments.

================
Comment at: llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp:1676
+      O << ",\n";
+    O << "\t.param .align 8 .b8 %VAParam[]";
+  }
----------------
pavelkopyl wrote:
> tra wrote:
> > pavelkopyl wrote:
> > > tra wrote:
> > > > What determines the alignment here? 
> > > > NVIDIA does not seem to specify anything regarding alignment here and their example shows `align 4`:
> > > > https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#kernel-and-function-directives-func
> > > > 
> > > > 
> > > It seems the documentation is a little bit outdated, because NVCC 11.7 generates .align 8 for the last parameter (unsized array): https://godbolt.org/z/7W7YThMf8
> > > 
> > The question remains. Do we set alignment to 8 because that's what NVCC does or is there some other reason behind it?
> > I.e. should it follow the alignment guarantees provided by e.g. `malloc` which returns a pointer sufficiently aligned to access any type. 
> > 
> > I think this should be retrieved from DataLayout or TargetInfo, instead of being hardcoded here.
> > Based on `NVPTXTargetLowering::getFunctionParamOptimizedAlign`, we may have argument alignment as high as 16.
> I agree, that would be a right way to get alignment value from DataLayout. To be honest, it's not clear which LLVM IR type corresponds to unsized byte array and PTX documentation allows any alignment - 1, 2, 4, 8 or 16, but it doesn't specify which one should be used in what cases. Furthermore. from the correctness point of view exact value of the array alignment doesn't matter: both LowerCall() and LowerVAARG() insert instructions that align va_lits pointer according to a value type being stored/loaded (please, see vaargs.ll test). If we specify ".param .align 1 .b8 %VAParam[]"  that may lead just to a padding space between the first variadic argument and beginning of the array itself. On the other hand, ".align 16" may also lead to wasting of stack space. So, ".aling 8" seems to be an optimal value. NVCC also uses ".align 8". That's why I chose exactly this value.
After digging into this, it seems 8 byte - is the maximum value of alignments of data types which may be passed to a variadic function: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#vector-data-types
That is the reason for using exactly this value.
I moved this hardcode to NVPTXSubtarget where it's available via getMaxRequiredAlignment().

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D138531/new/

https://reviews.llvm.org/D138531