[PATCH] D87236: [X86][SSE2] Use smarter instruction patterns for lowering UMIN/UMAX with v8i16.

Sun Oct 4 08:32:46 PDT 2020

TomHender added a comment.

This patch now covers only v8i16, and I updated the corresponding array in "getTypeBasedIntrinsicInstrCost".

Regarding the remaining regression and the alternative lowering for v16i8, I would like to try implementing cost-analysis-based sign switching for both v8i16 and v16i8. I have given this some thought, and I think the logic is relatively straightforward. We look at connected sequences of minimums and maximums (which may even have simple shuffles, xors, additions, subtractions and possibly other operations in between) and compare two costs: lowering the sequence naively (i.e. as of this patch) versus lowering it after switching the signedness of every minimum and maximum and flipping the sign bit at every input and output point. If sign switching is cheaper, we rewrite the tree by swapping all signed and unsigned operations and adding an xor at each input/output point. Some common patterns I hope to optimize this way include the following:
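The per-operation building block behind sign switching is the identity umin(a, b) = smin(a ^ 0x8000, b ^ 0x8000) ^ 0x8000 (and likewise for umax/smax). It matters here because SSE2 has signed word min/max (PMINSW/PMAXSW) but unsigned word min/max (PMINUW/PMAXUW) only arrives with SSE4.1, while for bytes it is the other way round (PMINUB/PMAXUB in SSE2, PMINSB/PMAXSB in SSE4.1). The following is a minimal scalar sketch modelling a single lane; the helper names are hypothetical and not LLVM code.

```python
# Single-lane model of lowering unsigned 16-bit min/max through signed
# min/max by flipping the sign bit (XOR with 0x8000). A vector lowering would
# apply the same XOR to every lane of a v8i16.
SIGN16 = 0x8000
MASK16 = 0xFFFF


def to_signed16(x: int) -> int:
    """Interpret a 16-bit pattern as a two's-complement signed value."""
    x &= MASK16
    return x - 0x10000 if x & SIGN16 else x


def smin16(a: int, b: int) -> int:
    """Signed 16-bit minimum of two bit patterns (what PMINSW does per lane)."""
    return min(a & MASK16, b & MASK16, key=to_signed16)


def smax16(a: int, b: int) -> int:
    """Signed 16-bit maximum of two bit patterns (what PMAXSW does per lane)."""
    return max(a & MASK16, b & MASK16, key=to_signed16)


def umin16_via_smin(a: int, b: int) -> int:
    # XOR with 0x8000 maps unsigned order onto signed order:
    # to_signed16(x ^ 0x8000) == x - 0x8000 for every x in [0, 0xFFFF].
    return smin16(a ^ SIGN16, b ^ SIGN16) ^ SIGN16


def umax16_via_smax(a: int, b: int) -> int:
    # Same mapping as above, using the signed maximum instead.
    return smax16(a ^ SIGN16, b ^ SIGN16) ^ SIGN16
```

Applied to each operation in isolation, this costs up to three extra XORs per min/max (two inputs, one output). Switching a whole connected tree pays for the XORs only at its inputs and outputs, which is why the cost comparison is done per sequence rather than per operation.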

- All the horizontal reduction code should automatically fall under this.
- Calculating the maximum or minimum of many values.
- The clamp pattern: min(max(x, min_value), max_value), possibly with constant min_value and max_value.
- The median pattern: max(min(x, y), min(max(x, y), z))
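To make the whole-tree comparison concrete, here is a small Python sketch on a toy tuple-based expression representation (not LLVM's SelectionDAG). It swaps signed and unsigned min/max, folds the sign flip into constants, applies one XOR per distinct input variable and one on the output, and compares costs. The cost parameters are hypothetical placeholders rather than values from the X86 cost tables, and the cost function assumes every min/max in the tree starts out unsigned.

```python
# Toy model of the whole-tree sign switch. Trees are tuples:
# ("var", name), ("const", value), or (op, lhs, rhs) with op in SWAP.
SIGN16 = 0x8000
SWAP = {"umin": "smin", "umax": "smax", "smin": "umin", "smax": "umax"}


def to_signed16(x):
    return x - 0x10000 if x & SIGN16 else x


def evaluate(node, env):
    """Evaluate a tree on 16-bit patterns; signed ops compare as i16."""
    kind = node[0]
    if kind == "var":
        return env[node[1]]
    if kind == "const":
        return node[1]
    a, b = evaluate(node[1], env), evaluate(node[2], env)
    if kind == "umin":
        return min(a, b)
    if kind == "umax":
        return max(a, b)
    if kind == "smin":
        return min(a, b, key=to_signed16)
    if kind == "smax":
        return max(a, b, key=to_signed16)
    raise ValueError(f"unknown node kind: {kind}")


def switch_signedness(node):
    """Swap signed/unsigned min/max; fold the sign flip into constants."""
    kind = node[0]
    if kind == "var":
        return node  # the input XOR is applied once per variable instead
    if kind == "const":
        return ("const", node[1] ^ SIGN16)  # folded, no runtime XOR
    return (SWAP[kind], switch_signedness(node[1]), switch_signedness(node[2]))


def evaluate_switched(node, env):
    """One XOR per input, the swapped tree, then one XOR on the output."""
    flipped = {name: value ^ SIGN16 for name, value in env.items()}
    return evaluate(switch_signedness(node), flipped) ^ SIGN16


def count_ops(node):
    if node[0] in ("var", "const"):
        return 0
    return 1 + count_ops(node[1]) + count_ops(node[2])


def variables(node):
    if node[0] == "var":
        return {node[1]}
    if node[0] == "const":
        return set()
    return variables(node[1]) | variables(node[2])


def prefer_sign_switch(node, cost_unsigned, cost_signed, cost_xor):
    """Assumes every min/max in `node` is unsigned; costs are placeholders."""
    ops = count_ops(node)
    naive = ops * cost_unsigned
    switched = ops * cost_signed + (len(variables(node)) + 1) * cost_xor
    return switched < naive


X, Y, Z = ("var", "x"), ("var", "y"), ("var", "z")
CLAMP = ("umin", ("umax", X, ("const", 100)), ("const", 60000))
MEDIAN = ("umax", ("umin", X, Y), ("umin", ("umax", X, Y), Z))
LONE_UMIN = ("umin", X, Y)

if __name__ == "__main__":
    for name, tree in [("clamp", CLAMP), ("median", MEDIAN), ("umin", LONE_UMIN)]:
        print(name, prefer_sign_switch(tree, cost_unsigned=3, cost_signed=1, cost_xor=1))
```

With the placeholder costs in the `__main__` block (3 per unsigned op, 1 per signed op, 1 per XOR), the clamp and median trees favor switching while a lone umin of two variables does not, which is exactly the trade-off the cost analysis has to decide per sequence.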

The main reason I am not already implementing a prototype is that I don't know where to put the code. I think neither the custom lowering function "X86TargetLowering::LowerOperation" nor "X86TargetLowering::PerformDAGCombine" is a good fit. The first seems suboptimal because, from my superficial understanding, this method is not supposed to operate on the whole instruction graph but only to lower one instruction at a time. The second is awkward because, as far as I understand LLVM's control flow, the minimums and maximums will no longer be explicit at that point, since they have already been replaced by the earlier "LowerOperation". This would make matching the code patterns much more annoying.

Is there some target specific callback which runs before the lowering and is able/supposed to do tree rewriting like this?


https://reviews.llvm.org/D87236