[PATCH] D30416: [BitfieldShrinking] Shrink Bitfields load/store when the bitfields are legal to access independently

Fri Apr 21 16:31:58 PDT 2017

wmi added inline comments.

================
Comment at: include/llvm/Target/TargetLowering.h:1908
+  virtual bool isNarrowingExpensive(EVT /*VT1*/, EVT /*VT2*/) const {
+    return true;
+  }
----------------
efriedma wrote:
> I'm not sure I see the point of this hook.  Every in-tree target has cheap i8 load/store and aligned i16 load/store operations.  And we have existing hooks to check support for misaligned operations.
> 
> If there's some case I'm not thinking of, please add an example to the comment.
It is because some testcase for amdgpu. Like the testcase below:

define void @s_sext_in_reg_i1_i16(i16 addrspace(1)* %out, i32 addrspace(2)* %ptr) #0 {
  %ld = load i32, i32 addrspace(2)* %ptr
  %in = trunc i32 %ld to i16
  %shl = shl i16 %in, 15
  %sext = ashr i16 %shl, 15
  store i16 %sext, i16 addrspace(1)* %out
  ret void
}

code with the patch:
	s_load_dwordx2 s[4:5], s[0:1], 0x9
	s_load_dwordx2 s[0:1], s[0:1], 0xb
	s_mov_b32 s7, 0xf000
	s_mov_b32 s6, -1
	s_mov_b32 s2, s6
	s_mov_b32 s3, s7
	s_waitcnt lgkmcnt(0)
	buffer_load_ushort v0, off, s[0:3], 0
	s_waitcnt vmcnt(0)
	v_bfe_i32 v0, v0, 0, 1
	buffer_store_short v0, off, s[4:7], 0
	s_endpgm

code without the patch:
	s_load_dwordx2 s[4:5], s[0:1], 0x9
	s_load_dwordx2 s[0:1], s[0:1], 0xb
	s_mov_b32 s7, 0xf000
	s_mov_b32 s6, -1
	s_waitcnt lgkmcnt(0)
	s_load_dword s0, s[0:1], 0x0
	s_waitcnt lgkmcnt(0)
	s_bfe_i32 s0, s0, 0x10000
	v_mov_b32_e32 v0, s0
	buffer_store_short v0, off, s[4:7], 0
	s_endpgm

amdgpu codegen chooses to use buffer_load_short instead of s_load_dword and generates longer code sequence. I know almost nothing about amdgpu so I simply add the hook and only focus on the architectures I am more faimiliar with before the patch becomes in better shape and stable. 

================
Comment at: lib/CodeGen/MemAccessShrinking.cpp:1014
+          else
+            CandidatesToErase.insert(OpI);
+        }
----------------
efriedma wrote:
> If an instruction has no uses and isn't trivially dead, we're never going to erase it; no point to adding it to CandidatesToErase.
An instruction which is not trivially dead for now may become dead after other instructions in CandidatesToErase are removed. That is why I want to add it to CandidatesToErase.

================
Comment at: lib/CodeGen/MemAccessShrinking.cpp:1027
+    } while (!DeadInsts.empty());
+  }
+  return Changed;
----------------
efriedma wrote:
> Should we clear CandidatesToErase here, as opposed to modifying it inside the loop?
Ah, right. Actually, I shouldn't use range based loop since the iterator will be invalidated after insertion and deletion. 

Repository:
  rL LLVM

https://reviews.llvm.org/D30416