[PATCH] D124217: [AMDGPU] Allow finer grain control of an unaligned access speed

Thu Apr 21 17:41:30 PDT 2022

rampitec added a comment.

In D124217#3466260 <https://reviews.llvm.org/D124217#3466260>, @rampitec wrote:

> In D124217#3466221 <https://reviews.llvm.org/D124217#3466221>, @jrtc27 wrote:
>
>> What are the units?..
>
> The units are target defined, just like the very definition of 'fast'. The only metric here is that something with a higher number is faster than something with a lower number.

For example, in D124219 <https://reviews.llvm.org/D124219> I am assigning the full bit size of an operation if it is really fast, 32 if it is only 4-byte aligned, and zero otherwise. I could have chosen other numbers, but it is simply easier for me to think about it that way: a wider load that is only 4-byte aligned has in fact the same speed as a 32-bit 4-byte-aligned load. So the vectorizer would choose the wider load, as the speed is the same and the number of operations goes down.

This is one way of choosing the units. I am pretty sure one can come up with a sophisticated mechanism of weights for a specific target.
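
To make the numbering scheme concrete, here is a minimal standalone sketch. It is not the code from D124219: the function names `accessSpeed` and `preferWider` are hypothetical, and treating "really fast" as "at least naturally aligned" is an assumption made only for illustration. The only property that matters is the ordering, i.e. a higher number means a faster access.

```cpp
// Hypothetical illustration of a target-defined "speed" metric.
// Assumption: an access is "really fast" when its alignment is at least
// its own size (natural alignment). A real target may decide differently.
constexpr unsigned accessSpeed(unsigned BitSize, unsigned AlignBytes) {
  if (AlignBytes * 8 >= BitSize)
    return BitSize; // really fast: report the full bit size
  if (AlignBytes >= 4)
    return 32;      // only 4-byte aligned: as fast as a 32-bit access
  return 0;         // otherwise: not fast at all
}

// Hypothetical consumer: prefer the wider access when it is fast at all
// and not slower than the narrow one, since it needs fewer operations.
constexpr bool preferWider(unsigned WideBits, unsigned NarrowBits,
                           unsigned AlignBytes) {
  return accessSpeed(WideBits, AlignBytes) != 0 &&
         accessSpeed(WideBits, AlignBytes) >=
             accessSpeed(NarrowBits, AlignBytes);
}
```

With these assumed rules, a 128-bit load with 4-byte alignment gets speed 32, the same as a 32-bit 4-byte-aligned load, so the wider one is preferred. With 2-byte alignment the 128-bit load gets 0 and is not preferred over a 16-bit load.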

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124217/new/

https://reviews.llvm.org/D124217