[PATCH] D136432: [AMDGPU] Combine BFI instructions.

Fri Oct 21 05:07:17 PDT 2022

jsilvanus added a comment.

One more thought on longer expressions: Nicolai already mentioned that we currently match only one particular expression structure.
I wonder whether the structure of the generated expression has any performance impact.

For example, for four clauses, we could either generate `BFI(BFI(X1, X2, C1), BFI(X3, X4, C3), C1 | C2)` (i.e., a balanced binary tree) or `BFI(BFI(BFI(X1, X2, C1), X3, C1 | C2), X4, C1 | C2 | C3)` (i.e., a path).

The number of instructions is the same, but the balanced version has no data dependency between the two inner BFIs, so its dependency chain is two BFIs deep rather than three. Would that matter on our hardware?
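
To make the comparison concrete, here is a minimal Python sketch of the two shapes. It rests on two assumptions: that `BFI(A, B, M)` in the notation above means "take the bits of `A` where `M` is set and the bits of `B` elsewhere", and that the masks `C1`, `C2`, `C3` are pairwise disjoint. The helper names (`bfi`, `balanced`, `path`, `reference`) are made up for illustration and are not taken from the patch. The sketch only checks that both shapes compute the same value; it says nothing about how either shape performs on hardware.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit registers


def bfi(a, b, mask):
    """Assumed semantics: bits of `a` where `mask` is set, bits of `b` elsewhere."""
    return ((a & mask) | (b & ~mask)) & MASK32


def balanced(x1, x2, x3, x4, c1, c2, c3):
    # The two inner BFIs do not depend on each other: chain depth 2.
    return bfi(bfi(x1, x2, c1), bfi(x3, x4, c3), c1 | c2)


def path(x1, x2, x3, x4, c1, c2, c3):
    # Each BFI consumes the result of the previous one: chain depth 3.
    return bfi(bfi(bfi(x1, x2, c1), x3, c1 | c2), x4, c1 | c2 | c3)


def reference(x1, x2, x3, x4, c1, c2, c3):
    # The four-clause expression both shapes are meant to implement,
    # with the fourth mask being everything not covered by c1..c3.
    c4 = ~(c1 | c2 | c3) & MASK32
    return (x1 & c1) | (x2 & c2) | (x3 & c3) | (x4 & c4)


if __name__ == "__main__":
    args = (0x11111111, 0x22222222, 0x33333333, 0x44444444,
            0x000000FF, 0x0000FF00, 0x00FF0000)
    assert balanced(*args) == path(*args) == reference(*args)
    print(hex(balanced(*args)))
```

Both shapes use three BFIs; the only difference the sketch exposes is the length of the dependency chain.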

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136432/new/

https://reviews.llvm.org/D136432