[PATCH] D124217: [AMDGPU] Allow finer grain control of an unaligned access speed

Fri Apr 22 03:21:58 PDT 2022

foad added a comment.

In D124217#3466281 <https://reviews.llvm.org/D124217#3466281>, @rampitec wrote:

> In D124217#3466260 <https://reviews.llvm.org/D124217#3466260>, @rampitec wrote:
>
>> In D124217#3466221 <https://reviews.llvm.org/D124217#3466221>, @jrtc27 wrote:
>>
>>> What are the units?..
>>
>> The units are target defined, just like the very definition of 'fast'. The only metric here is that something with a higher number is faster than something with a lower number.
>
> For example, in D124219 <https://reviews.llvm.org/D124219> I am assigning the full bit size of an operation if it is really fast, then 32 if it is less than 4 byte aligned, and 1 otherwise, as it is still faster than a smaller access. I could have chosen other numbers, but it is simply easier for me to think about it that way: a wider load that is less than 4 byte aligned has in fact the same speed as a 32 bit load with the same alignment. So the vectorizer would choose the wider load, as the speed is the same and the number of operations goes down.
>
> This can be one way of choosing the units. I am pretty sure one can come up with a sophisticated mechanism of weights for a specific target.

Is it worth switching to InstructionCost here? Or at least switching to a model where larger numbers mean slower, as InstructionCost does?
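
To make the two conventions concrete, here is a rough sketch of the "higher is faster" scoring described in the quoted comment, next to one possible "larger is slower" equivalent. All names here (AccessClass, speedScore, costFromScore, kRefBits) are made up for illustration and are not LLVM APIs, and the cost formula is an arbitrary assumption chosen only because it reverses the ordering.

```cpp
// Hypothetical classification of a memory access; not an LLVM type.
enum class AccessClass { FullRate, DwordRate, Slow };

// "Higher is faster" score, following the quoted description:
// full bit size if really fast, 32 if under-aligned, 1 otherwise.
unsigned speedScore(AccessClass C, unsigned SizeInBits) {
  switch (C) {
  case AccessClass::FullRate:
    return SizeInBits;
  case AccessClass::DwordRate:
    return 32;
  case AccessClass::Slow:
    return 1;
  }
  return 0; // Not reached for valid enum values.
}

// Assumed reference width used only to turn a score into a cost.
constexpr unsigned kRefBits = 128;

// One arbitrary "larger is slower" mapping. Assumes 1 <= Score <= kRefBits.
// Integer division makes this only weakly order-reversing.
unsigned costFromScore(unsigned Score) { return kRefBits / Score; }

// With scores the preferred access is the larger number...
bool preferredByScore(unsigned A, unsigned B) { return A > B; }

// ...and with costs it is the smaller one.
bool preferredByCost(unsigned A, unsigned B) { return A < B; }
```

With these assumptions a 128 bit access gets scores 128, 32 and 1 for the three classes, and costs 1, 4 and 128 respectively, so both conventions rank them the same way and only the direction of the comparison changes.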

https://reviews.llvm.org/D124217