[PATCH] D30416: [BitfieldShrinking] Shrink Bitfields load/store when the bitfields are legal to access independently

Fri Apr 21 17:45:17 PDT 2017

arsenm added inline comments.

================
Comment at: include/llvm/Target/TargetLowering.h:1908
+  virtual bool isNarrowingExpensive(EVT /*VT1*/, EVT /*VT2*/) const {
+    return true;
+  }
----------------
efriedma wrote:
> wmi wrote:
> > efriedma wrote:
> > > I'm not sure I see the point of this hook.  Every in-tree target has cheap i8 load/store and aligned i16 load/store operations.  And we have existing hooks to check support for misaligned operations.
> > > 
> > > If there's some case I'm not thinking of, please add an example to the comment.
> > It is because of some amdgpu testcases, such as the one below:
> > 
> > define void @s_sext_in_reg_i1_i16(i16 addrspace(1)* %out, i32 addrspace(2)* %ptr) #0 {
> >   %ld = load i32, i32 addrspace(2)* %ptr
> >   %in = trunc i32 %ld to i16
> >   %shl = shl i16 %in, 15
> >   %sext = ashr i16 %shl, 15
> >   store i16 %sext, i16 addrspace(1)* %out
> >   ret void
> > }
> > 
> > code with the patch:
> > 	s_load_dwordx2 s[4:5], s[0:1], 0x9
> > 	s_load_dwordx2 s[0:1], s[0:1], 0xb
> > 	s_mov_b32 s7, 0xf000
> > 	s_mov_b32 s6, -1
> > 	s_mov_b32 s2, s6
> > 	s_mov_b32 s3, s7
> > 	s_waitcnt lgkmcnt(0)
> > 	buffer_load_ushort v0, off, s[0:3], 0
> > 	s_waitcnt vmcnt(0)
> > 	v_bfe_i32 v0, v0, 0, 1
> > 	buffer_store_short v0, off, s[4:7], 0
> > 	s_endpgm
> > 
> > code without the patch:
> > 	s_load_dwordx2 s[4:5], s[0:1], 0x9
> > 	s_load_dwordx2 s[0:1], s[0:1], 0xb
> > 	s_mov_b32 s7, 0xf000
> > 	s_mov_b32 s6, -1
> > 	s_waitcnt lgkmcnt(0)
> > 	s_load_dword s0, s[0:1], 0x0
> > 	s_waitcnt lgkmcnt(0)
> > 	s_bfe_i32 s0, s0, 0x10000
> > 	v_mov_b32_e32 v0, s0
> > 	buffer_store_short v0, off, s[4:7], 0
> > 	s_endpgm
> > 
> > With the patch, amdgpu codegen chooses buffer_load_ushort instead of s_load_dword and generates a longer code sequence. I know almost nothing about amdgpu, so I simply added the hook and am focusing only on the architectures I am more familiar with until the patch is in better shape and stable.
> > 
> Huh, GPU targets are weird like that.  I would still rather turn it off for amdgpu, as opposed to leaving it off by default.
32-bit loads should not be reduced to a shorter width. Using a buffer_load_ushort is definitely worse than using s_load_dword. There is a target hook that is supposed to avoid reducing load widths like this.
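
For illustration only, the two designs discussed in this thread (a hook that is off by default, versus on by default with a per-target opt-out) could be sketched as below. This is a standalone toy model, not LLVM code: ToyVT, ToyTargetLowering, ToyGPULowering, shouldShrink and checkToyModel are all hypothetical names, and only isNarrowingExpensive is taken from the patch. The default shown follows efriedma's suggestion (narrowing is cheap unless a target objects); the patch as posted returns true by default instead.

```cpp
#include <cassert>

// Hypothetical stand-in for llvm::EVT; only the bit width matters here.
struct ToyVT {
  unsigned Bits;
};

// Toy model of the hook under review (assumption: not the real class).
// By default narrowing is considered cheap, so shrinking is allowed.
struct ToyTargetLowering {
  virtual ~ToyTargetLowering() = default;
  virtual bool isNarrowingExpensive(ToyVT /*From*/, ToyVT /*To*/) const {
    return false;
  }
};

// Hypothetical GPU-like target that opts out of narrowing 32-bit accesses,
// mirroring the concern that a sub-dword load is worse than a dword load.
struct ToyGPULowering : ToyTargetLowering {
  bool isNarrowingExpensive(ToyVT From, ToyVT To) const override {
    return From.Bits == 32 && To.Bits < 32;
  }
};

// The decision a shrinking pass would make: only shrink when the new type
// is actually narrower and the target does not consider that expensive.
bool shouldShrink(const ToyTargetLowering &TLI, ToyVT From, ToyVT To) {
  return To.Bits < From.Bits && !TLI.isNarrowingExpensive(From, To);
}

// Small self-check showing how the two toy targets differ.
inline void checkToyModel() {
  ToyTargetLowering Generic;
  ToyGPULowering GPU;
  assert(shouldShrink(Generic, ToyVT{32}, ToyVT{16}));
  assert(!shouldShrink(GPU, ToyVT{32}, ToyVT{16}));
}
```

With this shape, targets that are happy with narrower accesses need no changes, and only the target that objects has to override the hook.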

Repository:
  rL LLVM

https://reviews.llvm.org/D30416