[PATCH] D35435: [AMDGPU] Produce flat|global_dwordx3 instructions

Stanislav Mekhanoshin via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Jul 21 12:57:21 PDT 2017


rampitec added a comment.

In https://reviews.llvm.org/D35435#817495, @vpykhtin wrote:

> The implementation of this approach looks good to me. The only question is which way to go to implement v3 vector.


Making v3 a generally legal, simple type is probably the right thing to do. It would solve not only the problem with loads but also compute on the resulting vector. Currently such compute is done on a 4-component vector that legalization creates by promoting v3 to v4.

However, that seems to be a long way off, because a lot of places are designed to work only with power-of-2 vectors, halving and doubling them. I would do this style of vec3 load in the short term and target a legal v3 in the long term.
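
To illustrate the power-of-2 assumption, here is a minimal standalone sketch. It is not LLVM code: the helper names (isPow2, widenedCount, splitsEvenly) are hypothetical and only model element counts, assuming that promotion rounds the count up to the next power of two and that splitting requires two equal halves.

```cpp
#include <cassert>

// Hypothetical helpers for illustration only; these are not LLVM APIs.

// True if N is a power of two (1, 2, 4, 8, ...).
constexpr bool isPow2(unsigned N) { return N != 0 && (N & (N - 1)) == 0; }

// Round an element count up to the next power of two.
// This models promoting v3 to v4: the fourth lane is padding.
constexpr unsigned widenedCount(unsigned N) {
  unsigned P = 1;
  while (P < N)
    P <<= 1;
  return P;
}

// A vector can be split into two equal halves only if its count is even.
constexpr bool splitsEvenly(unsigned N) { return N > 1 && N % 2 == 0; }

// Small self-check showing where a 3-element vector falls out of the scheme.
inline void demo() {
  assert(!isPow2(3));            // v3 is not a power-of-2 vector
  assert(widenedCount(3) == 4);  // so it gets widened to v4
  assert(!splitsEvenly(3));      // and it cannot be halved into equal parts
  assert(splitsEvenly(widenedCount(3))); // v4 halves cleanly into 2 x v2
}
```

Under these assumptions, code that only halves and doubles never produces or consumes a 3-element count, which is why every such place would need attention before v3 could be legal.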


https://reviews.llvm.org/D35435




