[PATCH] D104049: [AMDGPU] [CodeGen] Fold negate llvm.amdgcn.class into test mask
Matt Arsenault via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Jun 10 14:27:04 PDT 2021
arsenm added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:1387
+ bool MadeChange = foldNegateIntrinsic(F);
+
----------------
This should not be a separate pass over the function; it should be a visitXor function.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:1438
+
+ Value *NegOne = ConstantInt::get(Inst->getOperand(0)->getType(), -1);
+
----------------
You don't need to create a constant to check the value
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:1448
+ continue;
+
+ // Check if either the parent or the grandparent of the other
----------------
Should check hasOneUse
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:1452-1463
+ if (isa<TruncInst>(ExtCall))
+ IntrinsicCall =
+ dyn_cast<CallInst>(cast<TruncInst>(ExtCall)->getOperand(0));
+ else if (isa<SExtInst>(ExtCall))
+ IntrinsicCall =
+ dyn_cast<CallInst>(cast<SExtInst>(ExtCall)->getOperand(0));
+ else if (isa<ZExtInst>(ExtCall))
----------------
I don't know why you are looking at all of these extensions. The xor should directly consume the call
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:1467-1469
+ if (!IntrinsicCall || (IntrinsicCall && IntrinsicCall->getIntrinsicID() !=
+ Intrinsic::amdgcn_class))
+ continue;
----------------
You can just dyn_cast<IntrinsicInst>
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:1474
+ IntrinsicCall->setArgOperand(
+ 1, Builder.CreateNot(IntrinsicCall->getOperand(1)));
+ if (isa<CallInst>(ExtCall))
----------------
Since you know it's a constant, you can also zero the irrelevant high bits.
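To illustrate the point about the high bits, here is a minimal standalone sketch. It assumes the class test mask occupies only the low 10 bits (one bit per floating-point class, from signaling NaN through +infinity). The names kClassMaskBits, invertClassMask, classTest and checkFold are hypothetical and are not part of the patch or of LLVM; classTest is only a toy model of the intrinsic's result for a value that falls in exactly one class.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: only the low 10 bits of the test mask are meaningful.
constexpr uint32_t kClassMaskBits = 0x3ff;

// Hypothetical helper: invert a constant test mask and clear the
// irrelevant high bits, so the folded mask stays within the 10 class bits.
constexpr uint32_t invertClassMask(uint32_t Mask) {
  return ~Mask & kClassMaskBits;
}

// Toy model of the class test: ClassBit is the single bit identifying the
// class of the value, and the test succeeds if that bit is set in Mask.
constexpr bool classTest(uint32_t ClassBit, uint32_t Mask) {
  return (ClassBit & Mask) != 0;
}

// Check that negating the test result is the same as testing against the
// inverted mask, for each of the 10 assumed class bits.
inline void checkFold(uint32_t Mask) {
  for (uint32_t Bit = 0; Bit < 10; ++Bit) {
    uint32_t ClassBit = 1u << Bit;
    assert(!classTest(ClassBit, Mask) ==
           classTest(ClassBit, invertClassMask(Mask)));
  }
}
```

With the mask 5 used in the tests, invertClassMask(5) gives 0x3fa (1018) rather than the sign-extended 0xfffffffa that a plain not would produce.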
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:1479
+ Inst->replaceAllUsesWith(ExtCall);
+ DeadInstr.push_back(cast<Instruction>(Inst));
+ }
----------------
You don't need to collect dead instructions; the xor should always be dead.
================
Comment at: llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-foldnegate.ll:18
+}
+
+; CHECK: @fold_negate_intrinsic_test_mask_zext
----------------
Need a negative test with a variable mask, and another negative test for multiple uses. Tests for all of the FP types would also be useful.
================
Comment at: llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-foldnegate.ll:26
+ %1 = call i1 @llvm.amdgcn.class.f32(float %x, i32 5)
+ %2 = zext i1 %1 to i32
+ %3 = xor i32 %2, -1
----------------
You shouldn't be trying to look through this zext; this IR should already have been optimized to an xor on i1.
================
Comment at: llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-foldnegate.ll:36-38
+ %1 = call i1 @llvm.amdgcn.class.f32(float %x, i32 5)
+ %2 = sext i1 %1 to i32
+ ret i32 %2
----------------
This didn't do anything
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D104049/new/
https://reviews.llvm.org/D104049