[PATCH] D136432: [AMDGPU] Combine BFI instructions.
Jannik Silvanus via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Oct 21 05:07:17 PDT 2022
jsilvanus added a comment.
One more thought on longer expressions: Nicolai already mentioned that we currently only match one particular expression structure.
I wonder whether the choice of expression structure has any performance impact.
For example, for four clauses, we could either generate `BFI(BFI(X1, X2, C1), BFI(X3, X4, C3), C1 | C2)` (i.e., a balanced binary tree) or `BFI(BFI(BFI(X1, X2, C1), X3, C1 | C2), X4, C1 | C2 | C3)` (i.e., a path).
The number of instructions is the same, but in the balanced version the two inner BFIs have no data dependency on each other, so the longest dependency chain is two BFIs instead of three. Would that matter on our hardware?
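To make the two shapes concrete, here is a minimal C++ sketch of both. It rests on assumptions that are not spelled out above: `BFI(a, b, mask)` is read as "bits of `a` where `mask` is set, bits of `b` elsewhere", the masks `C1`..`C3` are pairwise disjoint, and the fourth clause's mask is the complement of `C1 | C2 | C3`. The helper names (`bfi`, `reference`, `balanced`, `path`) are made up for illustration and are not part of the patch. The sketch only checks that both shapes compute the same value; it says nothing about how they perform on hardware.

```cpp
#include <cstdint>

// Assumed reading of the notation in this comment: take the bits of `a`
// where `mask` is set and the bits of `b` everywhere else.
inline uint32_t bfi(uint32_t a, uint32_t b, uint32_t mask) {
  return (a & mask) | (b & ~mask);
}

// The four-clause expression being matched, written out directly.
// Assumes c1, c2, c3 are pairwise disjoint; the fourth mask is the rest.
inline uint32_t reference(uint32_t x1, uint32_t x2, uint32_t x3, uint32_t x4,
                          uint32_t c1, uint32_t c2, uint32_t c3) {
  const uint32_t c4 = ~(c1 | c2 | c3);
  return (x1 & c1) | (x2 & c2) | (x3 & c3) | (x4 & c4);
}

// Balanced binary tree: the two inner BFIs do not depend on each other.
inline uint32_t balanced(uint32_t x1, uint32_t x2, uint32_t x3, uint32_t x4,
                         uint32_t c1, uint32_t c2, uint32_t c3) {
  return bfi(bfi(x1, x2, c1), bfi(x3, x4, c3), c1 | c2);
}

// Path: every BFI consumes the result of the previous one.
inline uint32_t path(uint32_t x1, uint32_t x2, uint32_t x3, uint32_t x4,
                     uint32_t c1, uint32_t c2, uint32_t c3) {
  return bfi(bfi(bfi(x1, x2, c1), x3, c1 | c2), x4, c1 | c2 | c3);
}
```

With byte-sized masks `0x000000FF`, `0x0000FF00`, `0x00FF0000`, all three functions pick one byte from each input and agree on the result.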
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D136432/new/
https://reviews.llvm.org/D136432