[PATCH] D42885: [AMDGPU] intrinsics for byte/short load/store
Bas Nieuwenhuizen via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Oct 19 02:28:06 PDT 2018
bnieuwenhuizen added a comment.
In https://reviews.llvm.org/D42885#1268934, @sheredom wrote:
> Maybe a dumb question - but why can't we just use the tbuffer load/store instead of these? It already upcasts for you (the zext/sext is built in depending on the nfmt I believe).
Well, we can, but using the loads without conversion can be faster. See https://gpuopen.com/gcn-memory-coalescing/ :
- 32-bit (or smaller) single-channel buffer loads / wave = 4 clocks (under specific cases)
- 32-bit (or smaller) filtered texels / wave = 16 clocks
Repository:
rL LLVM
https://reviews.llvm.org/D42885