[PATCH] D136432: [AMDGPU] Combine BFI instructions.

Fri Oct 21 05:42:37 PDT 2022

nhaehnle added a comment.

In D136432#3874213 <https://reviews.llvm.org/D136432#3874213>, @jsilvanus wrote:

> I wondered whether the expression structure has any performance impact?
>
> For example, for four clauses, we could either generate `BFI(BFI(X1, X2, C1), BFI(X3, X4, C3), C1 | C2)` (i.e., a balanced binary tree) or `BFI(BFI(BFI(X1, X2, C1), X3, C1 | C2), X4, C1 | C2 | C3)` (i.e., a path).
>
> The number of instructions is the same, but the balanced version has no data dependency between the two inner BFIs. Would that matter on our hardware?

It can have an impact, since I don't think any of our HW has a fast forwarding path for dependent instructions, but:

- the impact is minor, and
- making the best decision would also require looking at the schedule of the instructions that produce the input values, which we can't do here.

Hmm... creating a plain binary tree is simple enough. Maybe we *should* do that as long as we're here...
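
To make the shape difference concrete, here is a rough sketch (not the patch's code) that builds both expression shapes over the same operands and compares their dependency depth. `Node`, `build_path`, `build_balanced`, `depth` and `count_nodes` are hypothetical names made up for illustration; each `Node` stands in for one BFI, and the mask operands (`C1`, `C1 | C2`, ...) are left out because only the shape matters here.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Node:
    """Hypothetical two-input combine node; stands in for a single BFI."""
    lhs: "Tree"
    rhs: "Tree"


# A tree is either a leaf (an operand name) or a combine node.
Tree = Union[str, Node]


def build_path(leaves: List[str]) -> Tree:
    """Left-leaning chain: every node depends on the previous node."""
    tree: Tree = leaves[0]
    for leaf in leaves[1:]:
        tree = Node(tree, leaf)
    return tree


def build_balanced(leaves: List[str]) -> Tree:
    """Split the operands in half recursively and combine the two halves."""
    if len(leaves) == 1:
        return leaves[0]
    mid = len(leaves) // 2
    return Node(build_balanced(leaves[:mid]), build_balanced(leaves[mid:]))


def depth(tree: Tree) -> int:
    """Length of the longest chain of dependent nodes."""
    if isinstance(tree, Node):
        return 1 + max(depth(tree.lhs), depth(tree.rhs))
    return 0


def count_nodes(tree: Tree) -> int:
    """Number of combine nodes, i.e. instructions in this model."""
    if isinstance(tree, Node):
        return 1 + count_nodes(tree.lhs) + count_nodes(tree.rhs)
    return 0


operands = ["X1", "X2", "X3", "X4"]
for name, build in (("path", build_path), ("balanced", build_balanced)):
    tree = build(operands)
    print(name, "nodes:", count_nodes(tree), "depth:", depth(tree))
```

For the four-clause case both shapes use three nodes, but the path has a dependency depth of 3 while the balanced tree has a depth of 2. Whether that shorter chain actually helps still depends on when the inputs become available, as noted above.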

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136432/new/

https://reviews.llvm.org/D136432