[PATCH] D136432: [AMDGPU] Combine BFI instructions.
Nicolai Hähnle via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Oct 21 05:42:37 PDT 2022
nhaehnle added a comment.
In D136432#3874213 <https://reviews.llvm.org/D136432#3874213>, @jsilvanus wrote:
> I wondered whether the expression structure has any performance impact?
>
> For example, for four clauses, we could either generate `BFI(BFI(X1, X2, C1), BFI(X3, X4, C3), C1 | C2)` (i.e., a balanced binary tree) or `BFI(BFI(BFI(X1, X2, C1), X3, C1 | C2), X4, C1 | C2 | C3)` (i.e., a path).
>
> The number of instructions is the same, but the balanced version has no data dependency between the two inner BFIs. Would that matter on our hardware?
It can have an impact, because I don't think any of our HW has a fast forwarding path, but:
- the impact is minor, and
- making the best decision also requires looking at the schedule of the instructions that produce the input values, which we can't do here.
Hmm... creating a plain binary tree is simple enough. Maybe we *should* do that as long as we're here...
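For illustration, here is a rough sketch of the difference between the two shapes. It is not code from the patch: `Node`, `makeBFI`, `buildChain`, `buildBalanced`, `depth` and `countBFIs` are hypothetical names, masks are omitted, and an inner node merely stands for "one BFI combining two operands". It only models the instruction count and the length of the longest dependency chain, assuming at least one input.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical expression node: a leaf is an input value (X1, X2, ...),
// an inner node stands for one BFI of two operands (mask left out).
struct Node {
  std::unique_ptr<Node> lhs, rhs;
  bool isLeaf() const { return !lhs && !rhs; }
};
using NodePtr = std::unique_ptr<Node>;

NodePtr makeLeaf() { return std::make_unique<Node>(); }

NodePtr makeBFI(NodePtr a, NodePtr b) {
  auto n = std::make_unique<Node>();
  n->lhs = std::move(a);
  n->rhs = std::move(b);
  return n;
}

// "Path" shape: every BFI consumes the result of the previous one.
NodePtr buildChain(std::size_t numLeaves) {
  NodePtr acc = makeLeaf();
  for (std::size_t i = 1; i < numLeaves; ++i)
    acc = makeBFI(std::move(acc), makeLeaf());
  return acc;
}

// Plain binary tree: split the inputs in half and combine the two halves.
NodePtr buildBalanced(std::size_t numLeaves) {
  if (numLeaves <= 1)
    return makeLeaf();
  std::size_t half = numLeaves / 2;
  return makeBFI(buildBalanced(half), buildBalanced(numLeaves - half));
}

// Number of BFIs on the longest dependency chain from an input to the root.
std::size_t depth(const Node &n) {
  if (n.isLeaf())
    return 0;
  return 1 + std::max(depth(*n.lhs), depth(*n.rhs));
}

// Total number of BFIs in the expression.
std::size_t countBFIs(const Node &n) {
  if (n.isLeaf())
    return 0;
  return 1 + countBFIs(*n.lhs) + countBFIs(*n.rhs);
}
```

For four inputs, both shapes contain three BFIs, but the longest dependency chain is three BFIs for the path and two for the tree. This model says nothing about when the inputs become available, which is the scheduling information we don't have at this point.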
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D136432/new/
https://reviews.llvm.org/D136432