[PATCH/RFC] New TLI option for fast selects

escha escha at apple.com
Thu Apr 30 13:42:27 PDT 2015


There are a number of DAG transforms that are suboptimal on architectures with extremely fast selects, e.g. those with actual select instructions that map relatively cleanly to select_cc. An example is integer abs(): on an architecture without a dedicated select instruction it might already take three instructions (cmp, sub, cmov), so the three-operation canonicalization likely won’t hurt even in the worst case. But on an architecture with a select, we go from 2 instructions to 3, a 50% increase for that sequence, and it’s difficult to “go back” from the new instruction sequence to the select.
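As a rough illustration of the abs() case, here are the two forms written out as plain C++ rather than DAG nodes. This is only a sketch: the function names are made up for this example, and the shift/add/xor sequence is assumed to be the three-operation canonicalization in question. Unsigned arithmetic is used so that INT32_MIN does not hit signed-overflow undefined behavior.

```cpp
#include <cassert>
#include <cstdint>

// Select form: one negate plus one select (assuming the target can fold the
// compare into the select, as select_cc does). Hypothetical name.
constexpr std::uint32_t abs_select(std::int32_t x) {
  const std::uint32_t ux = static_cast<std::uint32_t>(x);
  return x < 0 ? 0u - ux : ux;
}

// Select-free form: three operations (shift, add, xor). Hypothetical name.
constexpr std::uint32_t abs_shift_add_xor(std::int32_t x) {
  const std::uint32_t ux = static_cast<std::uint32_t>(x);
  // All ones if x is negative, zero otherwise; stands in for "sra x, 31".
  const std::uint32_t mask = 0u - (ux >> 31);
  return (ux + mask) ^ mask;
}

// Spot-check that both forms agree, including at the edges of the range.
inline void check_abs_forms_agree() {
  const std::int32_t samples[] = {0, 1, -1, 7, -7, INT32_MAX, INT32_MIN};
  for (std::int32_t x : samples)
    assert(abs_select(x) == abs_shift_add_xor(x));
}
```

Both functions compute the same value; the difference is only in how many operations a target with a one-instruction select has to issue.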

I’m not sure this patch catches every such transform; I didn’t audit them all, so there may be others. My logic here was to put a check on every transform that creates more nodes than it consumes in order to eliminate a select. On an out-of-tree target this saves a number of instructions (with no regressions on any test) when the included TLI hook returns “false” for that target.
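The gating rule described above could look something like the following. This is a minimal standalone sketch, not the contents of the attached patch: TargetHooks, shouldExpandSelectToArithmetic, FastSelectTarget and shouldApplySelectElimination are all hypothetical names invented for this example, and the real hook would live on TargetLowering.

```cpp
#include <cassert>

// Hypothetical stand-in for a TargetLowering-style query. The default keeps
// today's behavior: select-eliminating expansions are allowed.
struct TargetHooks {
  virtual ~TargetHooks() = default;
  virtual bool shouldExpandSelectToArithmetic() const { return true; }
};

// A target with a cheap one-instruction select opts out by returning false.
struct FastSelectTarget : TargetHooks {
  bool shouldExpandSelectToArithmetic() const override { return false; }
};

// Decide whether a combine that removes a select should fire.
// consumed: nodes the transform removes; created: nodes it adds.
inline bool shouldApplySelectElimination(const TargetHooks &TLI,
                                         unsigned consumed,
                                         unsigned created) {
  // Transforms that do not grow the DAG are never gated.
  if (created <= consumed)
    return true;
  // Growing transforms are only worthwhile if the target says so.
  return TLI.shouldExpandSelectToArithmetic();
}
```

For the abs() case (2 nodes consumed, 3 created), the default target still gets the expansion, while a FastSelectTarget keeps the select.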

This could also open up more optimizations in the future that assume selects are fast, i.e. that one select DAG node is roughly equivalent to one real instruction. I wonder if any of the in-tree GPU backends would find something like this useful?

Any thoughts on the implementation?

— escha


-------------- next part --------------
A non-text attachment was scrubbed...
Name: tli_select.diff
Type: application/octet-stream
Size: 3060 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150430/55b7fcf0/attachment.obj>

