[PATCH] D48826: [AMDGPU] Add support for TFE/LWE in image intrinsics

Wed Sep 12 04:24:32 PDT 2018

nhaehnle added a comment.

Thanks! Some small issues let, and please also add a test case for the "simplifyDemanded" implementation.

As for the question of return types: on second thought,having <8 x half> as a return type when halves 5 & 6 are really combined to a single i32 LWE/TFE return is indeed a bit awkward. It would make sense to have `{<4 x half>, i32}` instead at the LLVM IR level. The intrinsics definition themselves should relax fairly easily. You need to replace llvm_anyfloat_ty by llvm_any_ty in:

- AMDGPUDimSampleProfile
- defms for int_amdgcn_image_load{,_mip}

The machine instructions themselves would stay the same. The only risk is the SelectionDAG itself, though I don't see anything that would fail or be particularly complicated to deal with right now. I believe the struct type just gets split into two separate result values of the SDNode.

================
Comment at: lib/Target/AMDGPU/SIAddIMGInit.cpp:103
+
+          int dstIdx =
+              AMDGPU::getNamedOperandIdx(MI.getOpcode(), AMDGPU::OpName::vdata);
----------------
Variable names should be capitalized, here and below.

================
Comment at: lib/Target/AMDGPU/SIAddIMGInit.cpp:170
+          // Add as an implicit operand
+          MachineInstrBuilder(MF,MI).addReg(newDst, RegState::Implicit);
+
----------------
Space between MF and MI.

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:4643
+                           (LoadVT == MVT::v4f16) ? MVT::v4i32 : MVT::v8i32;
+        else if (LoadVT.isVector() && LoadVT == MVT::v8f16)
+          // Rather than add lots of code to handle v8f16 for this case, just
----------------
The LoadVT.isVector() is redundant.

================
Comment at: lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp:935-941
+  if (TFEIdx > 0) {
+    TFEEnable = dyn_cast<ConstantInt>(II->getArgOperand(TFEIdx));
+    LWEEnable = dyn_cast<ConstantInt>(II->getArgOperand(TFEIdx + 1));
+    if (TFEEnable && LWEEnable) {
+      TFELWEEnabled = TFEEnable->getZExtValue() | LWEEnable->getZExtValue();
+    }
+  }
----------------
TFE and LWE enables are combined in a single argument, so this needs to be fixed.

================
Comment at: test/CodeGen/AMDGPU/llvm.amdgcn.image.dim.ll:620
   %c1 = extractelement <2 x i32> %c, i32 1
-  %tex = call float @llvm.amdgcn.image.load.2d.f32.i32(i32 15, i32 %c0, i32 %c1, <8 x i32> %rsrc, i32 0, i32 0)
+  %tex = call float @llvm.amdgcn.image.load.2d.f32.i32(i32 1, i32 %c0, i32 %c1, <8 x i32> %rsrc, i32 0, i32 0)
   %tmp2 = getelementptr float, float addrspace(3)* %lds, i32 4
----------------
Oops :)

Repository:
  rL LLVM

https://reviews.llvm.org/D48826