[PATCH] D30416: [BitfieldShrinking] Shrink Bitfields load/store when the bitfields are legal to access independently

Wei Mi via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Apr 21 18:07:49 PDT 2017


wmi added inline comments.


================
Comment at: include/llvm/Target/TargetLowering.h:1908
+  virtual bool isNarrowingExpensive(EVT /*VT1*/, EVT /*VT2*/) const {
+    return true;
+  }
----------------
arsenm wrote:
> efriedma wrote:
> > wmi wrote:
> > > efriedma wrote:
> > > > I'm not sure I see the point of this hook.  Every in-tree target has cheap i8 load/store and aligned i16 load/store operations.  And we have existing hooks to check support for misaligned operations.
> > > > 
> > > > If there's some case I'm not thinking of, please add an example to the comment.
> > > It is because of some testcases for amdgpu, like the one below:
> > > 
> > > define void @s_sext_in_reg_i1_i16(i16 addrspace(1)* %out, i32 addrspace(2)* %ptr) #0 {
> > >   %ld = load i32, i32 addrspace(2)* %ptr
> > >   %in = trunc i32 %ld to i16
> > >   %shl = shl i16 %in, 15
> > >   %sext = ashr i16 %shl, 15
> > >   store i16 %sext, i16 addrspace(1)* %out
> > >   ret void
> > > }
> > > 
> > > code with the patch:
> > > 	s_load_dwordx2 s[4:5], s[0:1], 0x9
> > > 	s_load_dwordx2 s[0:1], s[0:1], 0xb
> > > 	s_mov_b32 s7, 0xf000
> > > 	s_mov_b32 s6, -1
> > > 	s_mov_b32 s2, s6
> > > 	s_mov_b32 s3, s7
> > > 	s_waitcnt lgkmcnt(0)
> > > 	buffer_load_ushort v0, off, s[0:3], 0
> > > 	s_waitcnt vmcnt(0)
> > > 	v_bfe_i32 v0, v0, 0, 1
> > > 	buffer_store_short v0, off, s[4:7], 0
> > > 	s_endpgm
> > > 
> > > code without the patch:
> > > 	s_load_dwordx2 s[4:5], s[0:1], 0x9
> > > 	s_load_dwordx2 s[0:1], s[0:1], 0xb
> > > 	s_mov_b32 s7, 0xf000
> > > 	s_mov_b32 s6, -1
> > > 	s_waitcnt lgkmcnt(0)
> > > 	s_load_dword s0, s[0:1], 0x0
> > > 	s_waitcnt lgkmcnt(0)
> > > 	s_bfe_i32 s0, s0, 0x10000
> > > 	v_mov_b32_e32 v0, s0
> > > 	buffer_store_short v0, off, s[4:7], 0
> > > 	s_endpgm
> > > 
> > > amdgpu codegen chooses buffer_load_ushort instead of s_load_dword and generates a longer code sequence. I know almost nothing about amdgpu, so I simply added the hook and focused only on the architectures I am more familiar with until the patch is in better shape and stable.
> > > 
> > Huh, GPU targets are weird like that.  I would still rather turn it off for amdgpu, as opposed to leaving it off by default.
> 32-bit loads should not be reduced to a shorter width. Using a buffer_load_ushort is definitely worse than using s_load_dword. There is a target hook that is supposed to avoid reducing load widths like this.
Matt, thanks for the explanation.

I guess the hook is isNarrowingProfitable. However, the hook I need is a little different: I need to know whether narrowing is expensive enough to forbid it outright. isNarrowingProfitable on x86 reports that i32 --> i16 is not profitable, and perhaps slightly harmful, but it is not prohibitively harmful, so the benefit of narrowing may still outweigh the cost.
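
To illustrate, something along these lines might work. This is only a sketch of Eli's suggestion (default the hook to "cheap", let targets opt out); the AMDGPU override below is my assumption, not part of the posted patch:

  // include/llvm/Target/TargetLowering.h: default to "narrowing is cheap",
  // since every in-tree target has cheap i8 and aligned i16 load/store.
  virtual bool isNarrowingExpensive(EVT /*VT1*/, EVT /*VT2*/) const {
    return false;
  }

  // Hypothetical override in the AMDGPU target lowering: scalar loads
  // (s_load_dword) are preferable to sub-dword buffer loads, so treat
  // narrowing an integer access below 32 bits as expensive.
  bool isNarrowingExpensive(EVT VT1, EVT VT2) const override {
    return VT2.isScalarInteger() && VT2.getSizeInBits() < 32;
  }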


================
Comment at: lib/CodeGen/MemAccessShrinking.cpp:1014
+          else
+            CandidatesToErase.insert(OpI);
+        }
----------------
efriedma wrote:
> wmi wrote:
> > efriedma wrote:
> > > If an instruction has no uses and isn't trivially dead, we're never going to erase it; no point to adding it to CandidatesToErase.
> > An instruction that is not trivially dead right now may become dead after other instructions in CandidatesToErase are removed. That is why I want to add it to CandidatesToErase.
> OpI has no uses here.  The only way an instruction can have no uses and still not be trivially dead is if it has side-effects.  Deleting other instructions won't change the fact that it has side-effects.
You are right. The entire logic around CandidatesToErase is problematic. I will fix it.
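
For the fix, a minimal sketch of what I have in mind (the helper name eraseDeadCandidates is hypothetical, not from the patch): only erase candidates once isInstructionTriviallyDead agrees, and iterate to a fixpoint so an instruction whose last user is erased gets picked up in a later round. An instruction with side effects never becomes trivially dead, so per Eli's point there is no reason to queue it in the first place.

  #include "llvm/ADT/SmallPtrSet.h"
  #include "llvm/ADT/SmallVector.h"
  #include "llvm/IR/Instruction.h"
  #include "llvm/Transforms/Utils/Local.h"
  using namespace llvm;

  // Hypothetical helper: erase every candidate that is (or becomes)
  // trivially dead, repeating until no more progress is made.
  static void eraseDeadCandidates(SmallPtrSetImpl<Instruction *> &Candidates) {
    bool Changed = true;
    while (Changed) {
      Changed = false;
      SmallVector<Instruction *, 8> DeadThisRound;
      for (Instruction *I : Candidates)
        if (isInstructionTriviallyDead(I))
          DeadThisRound.push_back(I);
      for (Instruction *I : DeadThisRound) {
        Candidates.erase(I);
        I->eraseFromParent();
        Changed = true; // erasing I may make its operands dead too
      }
    }
  }

RecursivelyDeleteTriviallyDeadInstructions already implements the operand-chasing part of this, so the loop above could probably be simplified by calling it per candidate.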


Repository:
  rL LLVM

https://reviews.llvm.org/D30416