[PATCH] D42885: [AMDGPU] intrinsics for byte/short load/store

Bas Nieuwenhuizen via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Oct 19 02:28:06 PDT 2018


bnieuwenhuizen added a comment.

In https://reviews.llvm.org/D42885#1268934, @sheredom wrote:

> Maybe a dumb question - but why can't we just use the tbuffer load/store instead of these? It already upcasts for you (the zext/sext is built in depending on the nfmt I believe).


Well, we can, but the loads without format conversion may be faster. See https://gpuopen.com/gcn-memory-coalescing/ :

- 32-bit (or smaller) single-channel buffer loads / wave = 4 clocks (under specific cases)
- 32-bit (or smaller) filtered texels / wave = 16 clocks
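To make the gap concrete, here is a rough back-of-the-envelope sketch. It assumes the two per-wave figures quoted above apply as-is and that typed loads really do run at the slower rate; `loadClocks` and the constant names are made up for illustration and are not part of LLVM or the patch.

```cpp
#include <cstdint>

// Per-wave costs quoted from the GPUOpen article; the article says they
// hold only "under specific cases", so treat them as best-case figures.
constexpr std::uint64_t kBufferLoadClocksPerWave = 4;
constexpr std::uint64_t kFilteredTexelClocksPerWave = 16;

// Hypothetical helper: total clocks for `waves` waves that each issue
// `loadsPerWave` loads at a fixed per-wave cost. This ignores latency
// hiding, cache behaviour and every other bottleneck.
constexpr std::uint64_t loadClocks(std::uint64_t waves,
                                   std::uint64_t loadsPerWave,
                                   std::uint64_t clocksPerWave) {
  return waves * loadsPerWave * clocksPerWave;
}

// 64 waves with 8 loads each: 2048 clocks at the buffer-load rate
// versus 8192 clocks at the filtered rate, i.e. a 4x difference.
static_assert(loadClocks(64, 8, kBufferLoadClocksPerWave) == 2048, "");
static_assert(loadClocks(64, 8, kFilteredTexelClocksPerWave) == 8192, "");
```

This only restates the 4-vs-16 ratio; whether a given shader sees that difference depends on whether it is actually limited by load issue rate.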


Repository:
  rL LLVM

https://reviews.llvm.org/D42885




