[PATCH] D104049: [AMDGPU] [CodeGen] Fold negate llvm.amdgcn.class into test mask

Thu Jun 10 14:27:04 PDT 2021

arsenm added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:1387

+  bool MadeChange = foldNegateIntrinsic(F);
+
----------------
This should not be a separate pass over the function. It should be a visitXor function.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:1438
+
+      Value *NegOne = ConstantInt::get(Inst->getOperand(0)->getType(), -1);
+
----------------
You don't need to create a constant to check the value

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:1448
+        continue;
+
+      // Check if either the parent or the grandparent of the other
----------------
This should check hasOneUse.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:1452-1463
+      if (isa<TruncInst>(ExtCall))
+        IntrinsicCall =
+            dyn_cast<CallInst>(cast<TruncInst>(ExtCall)->getOperand(0));
+      else if (isa<SExtInst>(ExtCall))
+        IntrinsicCall =
+            dyn_cast<CallInst>(cast<SExtInst>(ExtCall)->getOperand(0));
+      else if (isa<ZExtInst>(ExtCall))
----------------
I don't know why you are looking at all of these extensions. The xor should directly consume the call.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:1467-1469
+      if (!IntrinsicCall || (IntrinsicCall && IntrinsicCall->getIntrinsicID() !=
+                                                  Intrinsic::amdgcn_class))
+        continue;
----------------
You can just dyn_cast<IntrinsicInst>

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:1474
+      IntrinsicCall->setArgOperand(
+          1, Builder.CreateNot(IntrinsicCall->getOperand(1)));
+      if (isa<CallInst>(ExtCall))
----------------
Since you know it's a constant, you can also zero the irrelevant high bits.
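
A minimal standalone sketch of the mask arithmetic behind this comment, not code from the patch. It assumes the class test mask uses one bit per FP class in the low 10 bits (signaling NaN in bit 0 through +infinity in bit 9), so negating the test amounts to inverting the mask and clearing everything above bit 9. The names classBitOf, classTest and foldNegatedMask are hypothetical and only model the behaviour on f32 bit patterns.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout of the class test mask: one bit per FP class, low 10 bits.
enum : uint32_t {
  SNaN = 1u << 0,
  QNaN = 1u << 1,
  NegInf = 1u << 2,
  NegNormal = 1u << 3,
  NegSubnormal = 1u << 4,
  NegZero = 1u << 5,
  PosZero = 1u << 6,
  PosSubnormal = 1u << 7,
  PosNormal = 1u << 8,
  PosInf = 1u << 9,
  AllClasses = 0x3ffu
};

// Hypothetical helper: the single class bit for an IEEE-754 binary32 pattern.
uint32_t classBitOf(uint32_t Bits) {
  bool Neg = (Bits >> 31) != 0;
  uint32_t Exp = (Bits >> 23) & 0xffu;
  uint32_t Frac = Bits & 0x7fffffu;
  if (Exp == 0xffu) {
    if (Frac == 0)
      return Neg ? NegInf : PosInf;
    // The top fraction bit distinguishes quiet from signaling NaNs.
    return (Frac & 0x400000u) ? QNaN : SNaN;
  }
  if (Exp == 0) {
    if (Frac == 0)
      return Neg ? NegZero : PosZero;
    return Neg ? NegSubnormal : PosSubnormal;
  }
  return Neg ? NegNormal : PosNormal;
}

// Models the class test: true if the value's class is selected by Mask.
bool classTest(uint32_t Bits, uint32_t Mask) {
  return (classBitOf(Bits) & Mask) != 0;
}

// Folding "not class(x, Mask)" into the mask: invert, then clear the
// high bits that do not correspond to any class.
uint32_t foldNegatedMask(uint32_t Mask) { return ~Mask & AllClasses; }
```

Because every value falls in exactly one class, classTest(x, foldNegatedMask(m)) is the logical negation of classTest(x, m) for any m, and the folded constant stays within the 10 meaningful bits instead of carrying 22 set high bits.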

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:1479
+        Inst->replaceAllUsesWith(ExtCall);
+      DeadInstr.push_back(cast<Instruction>(Inst));
+    }
----------------
You don't need to collect dead instructions. The xor should always be dead.

================
Comment at: llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-foldnegate.ll:18
+}
+
+; CHECK: @fold_negate_intrinsic_test_mask_zext
----------------
This needs a negative test with a variable mask and a negative test for multiple uses. It could also use tests for all of the FP types.

================
Comment at: llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-foldnegate.ll:26
+  %1 = call i1 @llvm.amdgcn.class.f32(float %x, i32 5)
+  %2 = zext i1 %1 to i32
+  %3 = xor i32 %2, -1
----------------
You shouldn't be trying to look through this zext. This IR should have been optimized to xor on i1.
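
A small arithmetic sketch of why the negation is only a logical "not" when applied to the i1 itself (or to its sign extension), and not when applied to a zero-extended value. The helpers zextI1, sextI1 and notI32 are hypothetical names that model the IR operations on 32-bit integers. They are not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Models "zext i1 %b to i32": yields 0 or 1.
int32_t zextI1(bool B) { return B ? 1 : 0; }

// Models "sext i1 %b to i32": yields 0 or -1.
int32_t sextI1(bool B) { return B ? -1 : 0; }

// Models "xor i32 %v, -1": flips all 32 bits.
int32_t notI32(int32_t V) { return V ^ -1; }
```

Under this model, notI32(sextI1(b)) equals sextI1(!b), so a negation of the sign-extended result can be expressed on the i1. By contrast, notI32(zextI1(b)) is -1 or -2 and never equals zextI1(!b), so treating the test at this line as a negated class test would change the result.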

================
Comment at: llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-foldnegate.ll:36-38
+  %1 = call i1 @llvm.amdgcn.class.f32(float %x, i32 5)
+  %2 = sext i1 %1 to i32
+  ret i32 %2
----------------
This didn't do anything

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104049/new/

https://reviews.llvm.org/D104049