[PATCH] D30416: [BitfieldShrinking] Shrink Bitfields load/store when the bitfields are legal to access independently

Fri Apr 21 17:14:34 PDT 2017

efriedma added inline comments.

================
Comment at: include/llvm/Target/TargetLowering.h:1908
+  virtual bool isNarrowingExpensive(EVT /*VT1*/, EVT /*VT2*/) const {
+    return true;
+  }
----------------
wmi wrote:
> efriedma wrote:
> > I'm not sure I see the point of this hook.  Every in-tree target has cheap i8 load/store and aligned i16 load/store operations.  And we have existing hooks to check support for misaligned operations.
> > 
> > If there's some case I'm not thinking of, please add an example to the comment.
> It is because some testcase for amdgpu. Like the testcase below:
> 
> define void @s_sext_in_reg_i1_i16(i16 addrspace(1)* %out, i32 addrspace(2)* %ptr) #0 {
>   %ld = load i32, i32 addrspace(2)* %ptr
>   %in = trunc i32 %ld to i16
>   %shl = shl i16 %in, 15
>   %sext = ashr i16 %shl, 15
>   store i16 %sext, i16 addrspace(1)* %out
>   ret void
> }
> 
> code with the patch:
> 	s_load_dwordx2 s[4:5], s[0:1], 0x9
> 	s_load_dwordx2 s[0:1], s[0:1], 0xb
> 	s_mov_b32 s7, 0xf000
> 	s_mov_b32 s6, -1
> 	s_mov_b32 s2, s6
> 	s_mov_b32 s3, s7
> 	s_waitcnt lgkmcnt(0)
> 	buffer_load_ushort v0, off, s[0:3], 0
> 	s_waitcnt vmcnt(0)
> 	v_bfe_i32 v0, v0, 0, 1
> 	buffer_store_short v0, off, s[4:7], 0
> 	s_endpgm
> 
> code without the patch:
> 	s_load_dwordx2 s[4:5], s[0:1], 0x9
> 	s_load_dwordx2 s[0:1], s[0:1], 0xb
> 	s_mov_b32 s7, 0xf000
> 	s_mov_b32 s6, -1
> 	s_waitcnt lgkmcnt(0)
> 	s_load_dword s0, s[0:1], 0x0
> 	s_waitcnt lgkmcnt(0)
> 	s_bfe_i32 s0, s0, 0x10000
> 	v_mov_b32_e32 v0, s0
> 	buffer_store_short v0, off, s[4:7], 0
> 	s_endpgm
> 
> amdgpu codegen chooses to use buffer_load_short instead of s_load_dword and generates longer code sequence. I know almost nothing about amdgpu so I simply add the hook and only focus on the architectures I am more faimiliar with before the patch becomes in better shape and stable. 
> 
Huh, GPU targets are weird like that.  I would still rather turn it off for amdgpu, as opposed to leaving it off by default.

================
Comment at: lib/CodeGen/MemAccessShrinking.cpp:1014
+          else
+            CandidatesToErase.insert(OpI);
+        }
----------------
wmi wrote:
> efriedma wrote:
> > If an instruction has no uses and isn't trivially dead, we're never going to erase it; no point to adding it to CandidatesToErase.
> An instruction which is not trivially dead for now may become dead after other instructions in CandidatesToErase are removed. That is why I want to add it to CandidatesToErase.
OpI has no uses here.  The only way an instruction can have no uses and still not be trivially dead is if it has side-effects.  Deleting other instructions won't change the fact that it has side-effects.

Repository:
  rL LLVM

https://reviews.llvm.org/D30416