[PATCH] D76928: [InstCombine][X86] Simplify demanded elts in SSE intrinsics with repeated args (PR24523)
Sanjay Patel via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Apr 1 09:53:27 PDT 2020
spatel added a comment.
These tests show a set of missed optimizations, so I recommend taking a step back and separating this into a few patches:
1. Fold x86 min/max intrinsics better - if operands are identical, the min/max simplifies away.
2. Fold x86 cmp intrinsics better (I thought we had a bug report for this, but I don't see it now) - if operands are identical, the compare can simplify away (see SimplifyFCmpInst()) or be rewritten with a different predicate.
3. Improve demanded elements analysis with isOnlyUserOf() - use a generic opcode such as 'mul' to show that improvement (independent of x86).
4. Improve demanded elements analysis of x86 min/max/cmp - the x86 part of this patch, but with different tests to show the win with different operands.
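Items 3 and 4 come down to how lane demands combine when the same value feeds both operands. The sketch below models only that bookkeeping with 4-bit lane masks; `LaneMask`, `demandPacked`, `demandScalarSSE`, and `demandForRepeatedArg` are hypothetical helpers, not LLVM's SimplifyDemandedVectorElts interface. It assumes the documented scalar-SSE behavior that MINSS/MAXSS/CMPSS compute lane 0 from both operands and copy lanes 1..3 from the first operand, and it uses an `instIsOnlyUser` flag to stand in for an isOnlyUserOf() check: when the instruction is the value's only user, the union of the per-operand demands is all the value must provide; otherwise the sketch conservatively demands every lane.

```cpp
#include <cstdint>

// Demanded lanes of a 4 x float vector: bit i set means lane i is used.
using LaneMask = std::uint8_t;
constexpr LaneMask kLane0 = 0x1;
constexpr LaneMask kAllLanes = 0xF;

// Lanes demanded from each operand of a binary op, given the lanes
// demanded from its result.
struct OperandDemand {
  LaneMask op0;
  LaneMask op1;
};

// Packed lane-wise op (e.g. 'mul', or MINPS/MAXPS/CMPPS): result lane i
// reads lane i of both operands.
constexpr OperandDemand demandPacked(LaneMask demandedResult) {
  return OperandDemand{demandedResult, demandedResult};
}

// Scalar SSE op (MINSS/MAXSS/CMPSS): lane 0 combines both operands, while
// lanes 1..3 are copied from operand 0, so operand 1 only matters in lane 0.
constexpr OperandDemand demandScalarSSE(LaneMask demandedResult) {
  return OperandDemand{demandedResult,
                       static_cast<LaneMask>(demandedResult & kLane0)};
}

// Both operands are the same value X. If this instruction is X's only user
// (hypothetical stand-in for isOnlyUserOf()), X must provide the union of
// the per-operand demands; otherwise other users may need any lane.
constexpr LaneMask demandForRepeatedArg(OperandDemand d, bool instIsOnlyUser) {
  return instIsOnlyUser ? static_cast<LaneMask>(d.op0 | d.op1) : kAllLanes;
}

// Usage: MINSS X, X where only lane 0 of the result is used.
static_assert(demandForRepeatedArg(demandScalarSSE(kLane0), true) == kLane0,
              "only lane 0 of X is needed");
```

With only lane 0 of `MINSS X, X` used, X needs only lane 0, so whatever feeds X's upper lanes is dead; a lane-wise `mul X, X` gives the same answer for item 3 without any x86 involvement.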
The first 3 are independent/parallel. The first 2 raise a potential problem that I don't know the answer to: what happens to target-specific intrinsics in a strict FP environment? Do we need to bypass the folds in that case? Is there some existing code that we can look at that deals with that situation?
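The value-level claims behind the first two folds can be checked outside of LLVM. The following is a plain C++ model, not LLVM code; `x86_min`, `x86_max`, `x86_cmp`, `SSEPred`, and `foldCmpSameOperand` are hypothetical names. It assumes the documented SSE behavior that MINPS/MAXPS return the second operand whenever the comparison is false (so NaN inputs and equal zeros select operand 2), and it assumes IEEE-conforming compilation (no -ffast-math). It models only result lanes, not MXCSR exception flags, which is exactly where the strict FP question bites: for example, folding an LT compare of X with itself to false would drop the invalid-operation exception the instruction raises for a NaN input.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Lane-value model of MINPS/MAXPS: operand 2 is returned unless the
// comparison is true, so NaN inputs and equal zeros yield operand 2.
inline float x86_min(float a, float b) { return a < b ? a : b; }
inline float x86_max(float a, float b) { return a > b ? a : b; }

// Raw bits, so NaN payloads and the sign of zero are compared exactly.
inline std::uint32_t bits(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  return u;
}

// The eight legacy CMPPS/CMPSS predicates (imm8 = 0..7).
enum class SSEPred { EQ, LT, LE, UNORD, NEQ, NLT, NLE, ORD };

// One lane of CMPPS; true stands for an all-ones lane.
inline bool x86_cmp(SSEPred p, float a, float b) {
  switch (p) {
    case SSEPred::EQ:    return a == b;
    case SSEPred::LT:    return a < b;
    case SSEPred::LE:    return a <= b;
    case SSEPred::UNORD: return std::isnan(a) || std::isnan(b);
    case SSEPred::NEQ:   return !(a == b);
    case SSEPred::NLT:   return !(a < b);
    case SSEPred::NLE:   return !(a <= b);
    case SSEPred::ORD:   return !std::isnan(a) && !std::isnan(b);
  }
  return false;
}

// What cmp(X, X) reduces to: a constant or a single-operand NaN test.
enum class SameOperandFold { AlwaysFalse, AlwaysTrue, IsOrdered, IsUnordered };

inline SameOperandFold foldCmpSameOperand(SSEPred p) {
  switch (p) {
    case SSEPred::LT:  return SameOperandFold::AlwaysFalse;  // X < X never holds
    case SSEPred::NLT: return SameOperandFold::AlwaysTrue;   // !(X < X) always holds
    case SSEPred::EQ:
    case SSEPred::LE:
    case SSEPred::ORD: return SameOperandFold::IsOrdered;    // true unless X is NaN
    case SSEPred::NEQ:
    case SSEPred::NLE:
    case SSEPred::UNORD: return SameOperandFold::IsUnordered;  // true only for NaN
  }
  return SameOperandFold::AlwaysFalse;
}

inline bool evalFold(SameOperandFold f, float x) {
  switch (f) {
    case SameOperandFold::AlwaysFalse: return false;
    case SameOperandFold::AlwaysTrue:  return true;
    case SameOperandFold::IsOrdered:   return !std::isnan(x);
    case SameOperandFold::IsUnordered: return std::isnan(x);
  }
  return false;
}

// Checks both folds on special and ordinary values (values only).
inline void checkSameOperandFolds() {
  const float samples[] = {0.0f, -0.0f, 1.5f, -3.25f,
                           INFINITY, -INFINITY, NAN};
  for (float x : samples) {
    assert(bits(x86_min(x, x)) == bits(x));  // min(X, X) is X, bit for bit
    assert(bits(x86_max(x, x)) == bits(x));
    for (int i = 0; i < 8; ++i) {
      SSEPred p = static_cast<SSEPred>(i);
      assert(x86_cmp(p, x, x) == evalFold(foldCmpSameOperand(p), x));
    }
  }
}
```

Calling `checkSameOperandFolds()` exercises every predicate on zeros of both signs, infinities, and a NaN. The table also shows why a predicate change is needed rather than only constant folding: EQ and LE with identical operands reduce to an ordered test, in the same way that IR-level `fcmp oeq X, X` is equivalent to `fcmp ord X, X`.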
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D76928