[PATCH] D75689: [VectorCombine] fold extract-extract-op with different extraction indexes

Thu Mar 5 12:07:08 PST 2020

spatel added a comment.

In D75689#1907966 <https://reviews.llvm.org/D75689#1907966>, @lebedev.ri wrote:

> Taking a step back, we have
>
>   ; 0. scalar
>   %e0 = extractelement %x, C0
>   %e1 = extractelement %y, C1
>   %r = op %e0, %e1
>
>
> Let's focus on `C0 != C1` problem, i see three general approaches there:
>
> 1. move either one lane to another lane
>   1. move lane C0 to lane C1
>   2. move lane C1 to lane C0
> 2. move both lanes into lane 0
>
>   In current solution, why do we decide to only entertain the idea of replacing the costlier extract? Don't we want to check all three variants? (there's some overlap if one of C0,C1 is already lane 0) What if the other extract is from lane 0? We'd then be able to use `TTI::SK_Broadcast`.

Correct - I thought about reversing the operands if it would allow the shuffle to be a broadcast, but didn't try to implement it. My guess is that won't change the costs for the examples shown here (since they don't change even if we incorrectly specify that the shuffle in the current patch is a broadcast).

Similarly, the option where we shuffle both operands to element 0 has potential to be the winner, but I'm skeptical that it would be beneficial on most x86 examples. So I agree that if we want to get the optimal sequence, we need to model all of those possibilities - and by not modeling all of the possibilities, we may be missing an optimization. We could also argue that we're still in IR, so we should select 1 of those forms as semi-canonical and let the backend handle the other transforms.

Ok, if I add a TODO comment for now?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75689/new/

https://reviews.llvm.org/D75689