[PATCH] D12452: AMDGPU/SI: Add support for llvm.r600.local.size.* intrinsics when targeting HSA

Matt Arsenault via llvm-commits llvm-commits at lists.llvm.org
Fri Aug 28 17:23:16 PDT 2015


arsenm accepted this revision.
arsenm added a comment.
This revision is now accepted and ready to land.

LGTM


================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:1041
@@ +1040,3 @@
+    Offset = SI::DispatchPacketOffset::LOCAL_SIZE_X + (Dim * 4);
+    MemVT = MVT::i16;
+  } else {
----------------
tstellarAMD wrote:
> arsenm wrote:
> > Why is this an i16? We really don't want to have to do an argument extload, although from the tests it looks like that doesn't happen.
> The extload isn't happening because LowerParameter only does extloads for floating-point types. I can fix that.
> 
> The local size values are stored in memory as i16 values.  We could use a 32-bit non-ext load for the z value, since the next 16 bits after the z value will always be 0.
> 
> For x and y, we always load both and then mask/shift to get the value we need.  I'm not sure if 32-bit load + mask or shift is faster than 16-bit ext load.
The 32-bit load and mask will definitely be better because there are no scalar ext loads.
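
For illustration only, here is a rough sketch of the load-and-mask approach in plain C++ rather than SelectionDAG code. It is not taken from the patch. It assumes the layout described above: x, y and z stored as consecutive little-endian 16-bit values, with 16 zero bits after z. The names load32 and localSize are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: assemble a 32-bit little-endian word from a byte
// buffer, standing in for a single 32-bit scalar load.
static uint32_t load32(const unsigned char *base, unsigned byteOffset) {
  const unsigned char *p = base + byteOffset;
  return static_cast<uint32_t>(p[0]) |
         (static_cast<uint32_t>(p[1]) << 8) |
         (static_cast<uint32_t>(p[2]) << 16) |
         (static_cast<uint32_t>(p[3]) << 24);
}

// Hypothetical: return the local size for dim (0 = x, 1 = y, 2 = z).
// Assumes `sizes` points at x, with y, z and 16 zero bits following it.
static uint32_t localSize(const unsigned char *sizes, unsigned dim) {
  assert(dim < 3 && "dim must be 0, 1 or 2");
  // x and y share the first 32-bit word; z lives in the second one.
  uint32_t word = load32(sizes, (dim / 2) * 4);
  if (dim == 0)
    return word & 0xFFFFu; // x: mask off the low 16 bits
  if (dim == 1)
    return word >> 16;     // y: shift down the high 16 bits
  return word;             // z: high 16 bits are assumed to be zero
}
```

With this layout, every dimension needs one 32-bit load plus at most one mask or shift, and z needs no extra operation at all.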


http://reviews.llvm.org/D12452




