[PATCH] D31804: [AMDGPU] zero extend workitem id

Mon Apr 10 11:04:07 PDT 2017

rampitec added a comment.

In https://reviews.llvm.org/D31804#722567, @arsenm wrote:

> Doesn't the library already annotate these with the range metadata? We should probably tighten those bounds in a pass when the required workgroup size is known on the IR metadata

Generally the library cannot know the workgroup size; it is an attribute on the kernel. Clang then produces amdgpu_flat_work_group_size, which is what is processed here. Too bad it is flat. There is also the OpenCL-specific reqd_work_group_size attribute, which clang now flattens and translates into amdgpu_flat_work_group_size. Technically it should be possible to get a more precise range by processing reqd_work_group_size directly, but in practice we do not support flat sizes larger than 256, and AssertZext cannot give a better range representation than 'extend from byte' anyway. computeKnownBits could do better, but it needs a target opcode to process, whereas after lowering this is just a load.
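
To illustrate the range argument, here is a minimal standalone sketch. It is not code from the patch and does not use any LLVM API. The helper names workitemIdBits and assertZextBits are hypothetical, and the rounding to whole bytes simply models the 'extend from byte' granularity mentioned above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM function): number of low bits that can be
// nonzero in a workitem id when the flat workgroup size is at most
// MaxFlatSize, i.e. when ids lie in the range [0, MaxFlatSize - 1].
unsigned workitemIdBits(uint32_t MaxFlatSize) {
  assert(MaxFlatSize > 0 && "a workgroup has at least one workitem");
  uint32_t MaxId = MaxFlatSize - 1;
  unsigned Bits = 0;
  while (MaxId != 0) {
    ++Bits;
    MaxId >>= 1;
  }
  return Bits;
}

// Hypothetical helper: the same width rounded up to whole bytes, modelling an
// assertion that can only say "zero extended from i8, i16, ...". A width of
// zero bits is still reported as one byte.
unsigned assertZextBits(uint32_t MaxFlatSize) {
  unsigned Bits = workitemIdBits(MaxFlatSize);
  return Bits == 0 ? 8 : (Bits + 7) / 8 * 8;
}
```

Under these assumptions a flat size of 256 needs exactly 8 bits, and a smaller size such as 64 needs only 6 bits but still rounds up to the same byte-wide assertion, which is why a tighter bound would not change the result at that granularity.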

On a side note, there are other calls that could be simplified, such as get_local_size(). I do not know how to do it, though, because in the library these are still just loads; they have neither intrinsics nor target opcodes.

Repository:
  rL LLVM

https://reviews.llvm.org/D31804