[PATCH] D48826: [AMDGPU] Add support for TFE/LWE in image intrinsics
Nicolai Hähnle via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Sep 12 04:24:32 PDT 2018
nhaehnle added a comment.
Thanks! Some small issues let, and please also add a test case for the "simplifyDemanded" implementation.
As for the question of return types: on second thought,having <8 x half> as a return type when halves 5 & 6 are really combined to a single i32 LWE/TFE return is indeed a bit awkward. It would make sense to have `{<4 x half>, i32}` instead at the LLVM IR level. The intrinsics definition themselves should relax fairly easily. You need to replace llvm_anyfloat_ty by llvm_any_ty in:
- AMDGPUDimSampleProfile
- defms for int_amdgcn_image_load{,_mip}
The machine instructions themselves would stay the same. The only risk is the SelectionDAG itself, though I don't see anything that would fail or be particularly complicated to deal with right now. I believe the struct type just gets split into two separate result values of the SDNode.
================
Comment at: lib/Target/AMDGPU/SIAddIMGInit.cpp:103
+
+ int dstIdx =
+ AMDGPU::getNamedOperandIdx(MI.getOpcode(), AMDGPU::OpName::vdata);
----------------
Variable names should be capitalized, here and below.
================
Comment at: lib/Target/AMDGPU/SIAddIMGInit.cpp:170
+ // Add as an implicit operand
+ MachineInstrBuilder(MF,MI).addReg(newDst, RegState::Implicit);
+
----------------
Space between MF and MI.
================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:4643
+ (LoadVT == MVT::v4f16) ? MVT::v4i32 : MVT::v8i32;
+ else if (LoadVT.isVector() && LoadVT == MVT::v8f16)
+ // Rather than add lots of code to handle v8f16 for this case, just
----------------
The LoadVT.isVector() is redundant.
================
Comment at: lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp:935-941
+ if (TFEIdx > 0) {
+ TFEEnable = dyn_cast<ConstantInt>(II->getArgOperand(TFEIdx));
+ LWEEnable = dyn_cast<ConstantInt>(II->getArgOperand(TFEIdx + 1));
+ if (TFEEnable && LWEEnable) {
+ TFELWEEnabled = TFEEnable->getZExtValue() | LWEEnable->getZExtValue();
+ }
+ }
----------------
TFE and LWE enables are combined in a single argument, so this needs to be fixed.
================
Comment at: test/CodeGen/AMDGPU/llvm.amdgcn.image.dim.ll:620
%c1 = extractelement <2 x i32> %c, i32 1
- %tex = call float @llvm.amdgcn.image.load.2d.f32.i32(i32 15, i32 %c0, i32 %c1, <8 x i32> %rsrc, i32 0, i32 0)
+ %tex = call float @llvm.amdgcn.image.load.2d.f32.i32(i32 1, i32 %c0, i32 %c1, <8 x i32> %rsrc, i32 0, i32 0)
%tmp2 = getelementptr float, float addrspace(3)* %lds, i32 4
----------------
Oops :)
Repository:
rL LLVM
https://reviews.llvm.org/D48826
More information about the llvm-commits
mailing list