[PATCH] D58902: [AMDGPU] Support for v3i32/v3f32

Wed Mar 6 10:57:54 PST 2019

tpr marked 5 inline comments as done.
tpr added inline comments.

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:334-335
+  setOperationAction(ISD::INSERT_SUBVECTOR, MVT::v3f32, Expand);
+  setOperationAction(ISD::INSERT_SUBVECTOR, MVT::v4i32, Expand);
+  setOperationAction(ISD::INSERT_SUBVECTOR, MVT::v4f32, Expand);
+
----------------
arsenm wrote:
> Why do these need to be changed?
Need the first pair to stop "Cannot select: t60: v3i32 = insert_subvector undef:v3i32, t55, Constant:i32<0>" on ds_read2_superreg.ll.

Need the second pair to stop "Cannot select: t129: v4i32 = insert_subvector undef:v4i32, t229, Constant:i32<0>" on setcc.ll.

================
Comment at: lib/Target/AMDGPU/SIInstrInfo.cpp:577
+    // TODO: Copy vec3/vec5 with s_mov_b64s then final s_mov_b32.
+    if (!(RI.getRegSizeInBits(*RC) % 64)) {
       Opcode =  AMDGPU::S_MOV_B64;
----------------
arsenm wrote:
> tpr wrote:
> > rampitec wrote:
> > > Does that mean for a v3 we will have 3 s_mov_b32 instead of s_mov_b64 + s_mov_b32? That is suboptimal.
> > Yes.
> > 
> > I can't find any lit test or graphics shader that attempts to copy a vec3 of sgprs here. Any idea how to provoke it so I can try it?
> A v3 phi, and probably at -O0
I managed to provoke it, but fixing it looks less than trivial as we need to somehow allow for a vec3 having a sub0_sub1 and a sub2. And it looks pretty rare anyway. So I won't fix it here. Is that ok?

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D58902/new/

https://reviews.llvm.org/D58902