[PATCH] D55380: [AMDGPU] Shrink scalar AND, OR, XOR instructions

Thu Dec 6 18:47:40 PST 2018

arsenm added inline comments.

================
Comment at: test/CodeGen/AMDGPU/andorbitset.ll:15
+; GCN: s_or_b32 s{{[0-9]+}}, s{{[0-9]+}}, 0x80000000
+; SI: s_bitset1_b32 s{{[0-9]+}}, 31
+define amdgpu_kernel void @s_set_msb(i32 addrspace(1)* %out, i32 %in) {
----------------
grahamsellers wrote:
> arsenm wrote:
> > Why only SI?
> I'm not sure what's going on, but if you supply only -march=amdgcn and no -mcpu, the register allocator doesn't appear to allocate the source and destination of the s_or_b32 instruction to the same register, even though we use setRegAllocationHint. That part of the code was modeled on a similar chunk for S_MULK_I32 earlier in the function. Interestingly, every RUN line in the s_mulk.ll test specifies the CPU. I'll do the same.
You should generally just pick an explicit target. The default CPU approximates Tahiti but isn't quite the same.
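
As a rough illustration of pinning the target, a RUN line along these lines names the CPU explicitly. The choice of tahiti and the GCN/SI check prefixes are assumptions made for this sketch, not the RUN lines from the patch:

```
; Hypothetical RUN line: passes -mcpu explicitly instead of relying on the default CPU.
; RUN: llc -march=amdgcn -mcpu=tahiti -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,SI %s
```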

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D55380/new/

https://reviews.llvm.org/D55380