[PATCH] D136432: [AMDGPU] Combine BFI instructions.

Nicolai Hähnle via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Dec 9 02:37:02 PST 2022


nhaehnle added a comment.

Sorry, this dropped off my radar for a while.



================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp:2165-2168
+bool AMDGPUDAGToDAGISel::SelectV_BFI(SDNode *N) {
+  if (!N->isDivergent())
+    return false;
+
----------------
Despite us having **repeatedly** told you so, this is **still** missing a type check that N is a 32-bit integer. I suspect you may be getting lucky because other types are being legalized away? But you shouldn't really on it.


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp:2172
+    SDNode *MaskOperand;
+    llvm::Optional<uint64_t> Mask;
+  };
----------------
This can just be an uint32_t and doesn't need to be an Optional: if the MaskOperand is not a constant, this struct shouldn't really be constructed in the first place. (It would be cleaner to inline ExtractMask into CreateBFIOpFromAnd, since ExtractMask isn't a particularly load-bearing abstraction here.)


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp:2175-2178
+  struct BFIOperandData {
+    SDNode *Or;
+    MaskData Mask;
+  };
----------------
You don't really need the `Or` in here. (It's only used to get the value type, but why -- you know that this only works with MVT::i32)


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp:2229-2230
+  while (!OrStack.empty()) {
+    SDNode *Current = OrStack.back();
+    OrStack.pop_back();
+
----------------
You can use `OrStack.pop_back_val()`.


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp:2235-2262
+    if (IsCandidate(O0)) {
+      OrStack.push_back(O0.getNode());
+      OrCandidates.push_back(O0.getNode());
+    }
+
+    if (IsCandidate(O1)) {
+      OrStack.push_back(O1.getNode());
----------------
I think the logic here is needlessly complicated and hiding what is really happening in the cases where it's correct, and it is subtly incorrect in some cases.

For example, consider:
```
N = (x & C) | ((y & ~C) | (z |* w))   -- The |* node has more than one use
```
In the first loop, the two outermost OR nodes are recorded in OrCandidates, the innermost one isn't. Then, the loop over OrCandidates will collect the BFIOps (x, C) and (y, ~C) and proceed to replace the whole expression by a single BFI, which is incorrect.

It would be cleaner logic (and therefore easier to get correct!) to just merge the two loops into one, by replacing the core part of the top loop with something like:
```
    if (O0->getOpcode() == ISD::OR) {
      if (!O0->isDivergent() || !O0->hasOneUse())
        return false;
      OrStack.push_back(O0.getNode());
    } else if (!CreateBFIOpFromAnd(O0)) {
      return false;
    }

    if (O1->getOpcode() == ISD::OR) {
      if (!O1->isDivergent() || !O1->hasOneUse())
        return false;
      OrStack.push_back(O1.getNode());
    } else if (!CreateBFIOpFromAnd(O1)) {
      return false;
    }
```
(This implies that CreateBFIOpFromAnd is adjusted to return false if it wasn't able to recognize the given node as a BFI operand.)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136432/new/

https://reviews.llvm.org/D136432



More information about the llvm-commits mailing list