[Libclc-dev] [PATCH] Fix vload3/vstore3 to emit only one IR load
Matt Arsenault via Libclc-dev
libclc-dev at lists.llvm.org
Fri Sep 25 14:20:26 PDT 2015
On 09/25/2015 02:13 PM, Jeroen Ketema wrote:
> Hi Matt,
> The IR below seems fishy to me: if we have
> vload3(get_global_id(0), A)
> then the work item with the highest id is likely to access an element out of bounds of the array being passed in.
> Also, does the store generate a store of 4 elements, or will it be precisely 3 elements?
The store is also emitted as a <4 x i32>. I'm not sure why clang is
avoiding direct load/store of 3-element vectors, but this seems like a clang bug.
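
To make the out-of-bounds concern concrete, here is a minimal sketch in plain C
(not libclc code, and not the IR from the patch). It assumes the usual OpenCL
vload3 addressing, where the access for a given offset starts at element
3 * offset, and an array that holds exactly 3 elements per work item. The
helper names last_index_touched and access_in_bounds are hypothetical and
exist only for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: highest element index touched when a work item
 * accesses `width` consecutive elements starting at element 3 * offset,
 * which is where vload3/vstore3 start for a given offset. */
static size_t last_index_touched(size_t offset, size_t width)
{
    return 3 * offset + width - 1;
}

/* Hypothetical helper: returns 1 if that access stays inside an array
 * of `n` elements, and 0 if it runs past the end. */
static int access_in_bounds(size_t offset, size_t width, size_t n)
{
    return last_index_touched(offset, width) < n;
}
```

With 4 work items and a 12-element array, access_in_bounds(3, 3, 12) is 1, but
access_in_bounds(3, 4, 12) is 0: a 4-wide access from the last work item
touches element 12, one past the end. Earlier work items stay in bounds, but a
4-wide store from them would also write the first element belonging to the
next work item.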