[PATCH] D80322: [AMDGPU] Tune threshold for cmp/select vector lowering

Wed May 20 17:40:53 PDT 2020

rampitec added a comment.

To expand a little bit on the reasoning: 256 bits of float/int yield 8 compares and 8 cndmasks, 16 instructions in total. For doubles to stay under 16 instructions, the limit is double5: 5 compares and 10 cndmasks, 15 instructions in total. Currently double4 is the largest double vector that will be expanded.
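The arithmetic above can be written down as a small compile-time sketch. This is not the code from the patch: the names `estimateExpansionInsts`, `fitsExpansionBudget`, `DwordBits` and `InstBudget` are hypothetical, and the sketch assumes one compare per element plus one cndmask per 32-bit dword of each element, with a budget of 16 instructions taken from the float8 case.

```cpp
// Hypothetical helpers illustrating the instruction-count reasoning.
// Assumption: each element costs 1 compare plus 1 cndmask per 32-bit dword.
constexpr unsigned DwordBits = 32;
// Assumption: the budget is the float8/int8 cost quoted in the comment.
constexpr unsigned InstBudget = 16;

constexpr unsigned estimateExpansionInsts(unsigned EltBits, unsigned NumElts) {
  // Round the element size up to whole dwords, then add one compare
  // per element to the cndmasks needed for all elements.
  return NumElts + NumElts * ((EltBits + DwordBits - 1) / DwordBits);
}

constexpr bool fitsExpansionBudget(unsigned EltBits, unsigned NumElts) {
  return estimateExpansionInsts(EltBits, NumElts) <= InstBudget;
}

// float8: 8 compares + 8 cndmasks.
static_assert(estimateExpansionInsts(32, 8) == 16, "float8 costs 16");
// double4: 4 compares + 8 cndmasks.
static_assert(estimateExpansionInsts(64, 4) == 12, "double4 costs 12");
// double5: 5 compares + 10 cndmasks.
static_assert(estimateExpansionInsts(64, 5) == 15, "double5 costs 15");
// double6 would need 6 compares + 12 cndmasks, which exceeds the budget.
static_assert(!fitsExpansionBudget(64, 6), "double6 is over budget");
```

Counting instructions rather than vector bits is what lets double5 (320 bits) qualify while float vectors wider than 256 bits do not.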

I have done perf measurements comparing this expansion to s_set_gpr_idx on Vega10, and it breaks even at around 5-6 elements, with a tiny margin.

The condition became too complicated for me to understand, so I have just hoisted it into a predicate function. I also think we may want to move this predicate elsewhere later, since we need it at least in GlobalISel and maybe in some other places too. In any case, the same condition was already used in two places.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80322/new/

https://reviews.llvm.org/D80322