[PATCH] D103261: [AMDGPU] Increase alignment of LDS globals if necessary before LDS lowering.

Mahesha S via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Jun 2 04:20:05 PDT 2021


hsmhsm added a comment.

In D103261#2789956 <https://reviews.llvm.org/D103261#2789956>, @JonChesterfield wrote:

> I'm not very convinced by this given the recent enthusiasm for decreasing LDS usage. Types are naturally aligned by default, so I think the only time they are aligned by less than that is when the programmer asked for it (or when the vectorizer is involved, but that's disabled on amdgpu afaik).
>
> If someone has a char data[16] __align__(4) in LDS, i'm not at all sure it's obvious that they want the alignment increased to 16, despite that using more LDS than they asked for.
>
> We could do something more conservative, where we put variables in order based on their sizes and alignments, and then go through the resulting struct and tag variables with the additional alignment they happen to have as a result of position in the struct. That would be pure performance win, zero storage overhead cost. It leaves the choice to burn memory in favour of faster instructions in the hands of the developer writing the code.

The main intention behind this patch is to change

  char data[16] __align__(4)

to

  char data[16] __align__(16)

since **data** should be aligned to a 16-byte boundary. Theoretically speaking, it may not be correct to change the value specified by the programmer. But from a practical point of view, since increasing the alignment value is programmatically safe and fixes the performance issues caused by unaligned access, I guess it is fine to change it.
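To make the trade-off concrete, here is a rough sketch of the kind of promotion being discussed. The helper names (`promotedAlign`, `alignTo`, `isAligned`) and the size thresholds are assumptions for illustration only, not the code in the patch. It shows that an address satisfying the larger alignment also satisfies the one the programmer asked for, and that the cost is extra padding when the variable is placed at an offset.

```cpp
#include <cstdint>

// Hypothetical helper: choose an alignment for an LDS variable from its size.
// The size thresholds below are an assumption made for this sketch.
constexpr uint64_t promotedAlign(uint64_t sizeInBytes, uint64_t currentAlign) {
  uint64_t wanted = 1;
  if (sizeInBytes > 8)
    wanted = 16;
  else if (sizeInBytes > 4)
    wanted = 8;
  else if (sizeInBytes > 2)
    wanted = 4;
  else if (sizeInBytes > 1)
    wanted = 2;
  // Never decrease an alignment the programmer already requested.
  return wanted > currentAlign ? wanted : currentAlign;
}

// True if addr is a multiple of align.
constexpr bool isAligned(uint64_t addr, uint64_t align) {
  return addr % align == 0;
}

// Round offset up to the next multiple of align.
constexpr uint64_t alignTo(uint64_t offset, uint64_t align) {
  return (offset + align - 1) / align * align;
}

// char data[16] __align__(4) would become __align__(16) under this rule.
static_assert(promotedAlign(16, 4) == 16, "16-byte array is promoted to align 16");
// A larger alignment requested by the programmer is left alone.
static_assert(promotedAlign(16, 32) == 32, "existing over-alignment is preserved");
// Safety: a 16-byte aligned address is also 4-byte aligned.
static_assert(isAligned(48, 16) && isAligned(48, 4), "stronger alignment implies weaker");
// Cost: placing the variable after 4 used bytes needs 12 bytes of padding
// with align 16, but none with align 4.
static_assert(alignTo(4, 4) == 4 && alignTo(4, 16) == 16, "padding cost of promotion");
```

The last assertion is the concern raised in the quoted comment: the promotion is safe, but it can consume LDS that the programmer did not ask for.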

The usual approach to reducing the memory overhead due to padding within a struct type is to first sort the members by size, before padding [  Data structure alignment <https://en.wikipedia.org/wiki/Data_structure_alignment#:~:text=Data%20structure%20alignment%20is%20the,and%20accessed%20in%20computer%20memory.&text=For%20example%2C%20on%20a%2032,on%20a%2032%2Dbit%20boundary.>  ] . But, I guess, we are sorting here by alignment first, and then by size. Maybe we need to revisit it.
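As a small experiment with the two orderings, the sketch below lays out the same set of variables sorted by size and sorted by alignment then size, and compares the total bytes used. The `Var` type and all function names are hypothetical, and the layout loop is a simplified model rather than what the LDS lowering pass actually does. Which ordering wins depends on the input, so this shows only that the order matters.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one LDS variable: size and alignment in bytes.
struct Var {
  uint64_t size;
  uint64_t align;
};

// Round offset up to the next multiple of align.
uint64_t alignTo(uint64_t offset, uint64_t align) {
  return (offset + align - 1) / align * align;
}

// Place the variables in the given order, padding each one up to its
// alignment, and return the total number of bytes used.
uint64_t layoutSize(const std::vector<Var> &vars) {
  uint64_t offset = 0;
  for (const Var &v : vars)
    offset = alignTo(offset, v.align) + v.size;
  return offset;
}

// Largest size first.
std::vector<Var> sortedBySize(std::vector<Var> vars) {
  std::stable_sort(vars.begin(), vars.end(),
                   [](const Var &a, const Var &b) { return a.size > b.size; });
  return vars;
}

// Largest alignment first, ties broken by largest size.
std::vector<Var> sortedByAlignThenSize(std::vector<Var> vars) {
  std::stable_sort(vars.begin(), vars.end(), [](const Var &a, const Var &b) {
    if (a.align != b.align)
      return a.align > b.align;
    return a.size > b.size;
  });
  return vars;
}

// Example input: one under-aligned array and one over-aligned scalar.
void checkExample() {
  std::vector<Var> vars = {{16, 4}, {4, 16}, {8, 8}};
  assert(layoutSize(sortedBySize(vars)) == 36);          // 16 + 8 + pad 8 + 4
  assert(layoutSize(sortedByAlignThenSize(vars)) == 32); // 4 + pad 4 + 8 + 16
}
```

For this particular input the alignment-first order happens to use less memory, because the over-aligned 4-byte variable forces padding when it is placed last. Other inputs may behave differently.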

I do not understand what you mean by "and then go through the resulting struct and tag variables with the additional alignment they happen to have as a result of position in the struct".


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103261/new/

https://reviews.llvm.org/D103261


