[PATCH] D42885: [AMDGPU] intrinsics for byte/short load/store

Fri Oct 19 02:28:06 PDT 2018

bnieuwenhuizen added a comment.

In https://reviews.llvm.org/D42885#1268934, @sheredom wrote:

> Maybe a dumb question - but why can't we just use the tbuffer load/store instead of these? It already upcasts for you (the zext/sext is built in depending on the nfmt I believe).

Well, we can, but the loads without format conversion may be faster. See https://gpuopen.com/gcn-memory-coalescing/:

- 32-bit (or smaller) single-channel buffer loads / wave = 4 clocks (under specific cases)
- 32-bit (or smaller) filtered texels / wave = 16 clocks
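
To put those two figures side by side, here is a minimal C++ sketch of the arithmetic. It assumes, as the comment above suggests but does not confirm, that a format-converting (tbuffer) load is issued at the filtered rate and a plain buffer load at the faster rate. The helper name `total_clocks` and the constants are made up for illustration, and the 4-clock figure only holds "under specific cases" per the article.

```cpp
#include <cstdint>

// Per-wave issue costs as quoted from the GPUOpen article (not measured here).
constexpr std::uint32_t kBufferLoadClocksPerWave = 4;     // 32-bit or smaller single-channel buffer load
constexpr std::uint32_t kFilteredTexelClocksPerWave = 16; // 32-bit or smaller filtered texel

// Hypothetical helper: clocks needed to issue one load for each of `waves`
// wavefronts at a fixed per-wave cost. Ignores latency hiding and caching.
constexpr std::uint64_t total_clocks(std::uint64_t waves,
                                     std::uint32_t clocks_per_wave) {
    return waves * clocks_per_wave;
}

// Ratio between the two quoted rates.
constexpr std::uint32_t kSlowdown =
    kFilteredTexelClocksPerWave / kBufferLoadClocksPerWave;

static_assert(kSlowdown == 4, "the quoted rates differ by a factor of four");
```

Under these assumptions the conversion path costs four times as many issue clocks per wave, which is the motivation for having byte/short loads that skip the conversion.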

Repository:
  rL LLVM

https://reviews.llvm.org/D42885