[PATCH] D76928: [InstCombine][X86] Simplify demanded elts in SSE intrinsics with repeated args (PR24523)

Sanjay Patel via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Apr 1 09:53:27 PDT 2020


spatel added a comment.

These tests show a set of missed optimizations, so I recommend taking a step back and separating this into a few patches:

1. Fold x86 min/max intrinsics better - if operands are identical, the min/max simplifies away.
2. Fold x86 cmp intrinsics better (thought we had a bug report for this, but I don't see it now) - if operands are identical, the compare can simplify away (see SimplifyFCmpInst()) or change to a different predicate.
3. Improve demanded elements analysis with isOnlyUserOf() - use a generic opcode like 'mul' to show that improvement (independent of x86).
4. Improve demanded elements analysis of x86 min/max/cmp - the x86 part of this patch, but with different tests to show the win with different operands.
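
To make items 1 and 2 concrete, here is a minimal scalar sketch in standalone C++ of why identical operands let min/max and cmp fold. It is not LLVM code: x86Min, x86Max, x86Cmp, bits and checkIdenticalOperandFolds are hypothetical names, each function models a single float lane, and the model assumes the default FP environment (no exception flags, no MXCSR effects), so it says nothing about the strict FP question.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// One-lane model of x86 MINPS/MAXPS: when the comparison is false
// (NaN operand, or +0.0 vs -0.0), the second operand is returned.
float x86Min(float a, float b) { return a < b ? a : b; }
float x86Max(float a, float b) { return a > b ? a : b; }

// One-lane model of four CMPPS predicates (immediates 0, 1, 2, 3).
enum Pred { EQ, LT, LE, UNORD };
bool x86Cmp(Pred p, float a, float b) {
  switch (p) {
  case EQ:    return a == b;
  case LT:    return a < b;
  case LE:    return a <= b;
  case UNORD: return std::isnan(a) || std::isnan(b);
  }
  return false;
}

// Raw bit pattern, so that NaN and -0.0 compare exactly.
std::uint32_t bits(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  return u;
}

// Checks the identical-operand identities on a few interesting inputs.
void checkIdenticalOperandFolds() {
  const float samples[] = {0.0f, -0.0f, 1.5f, -3.0f,
                           std::numeric_limits<float>::infinity(),
                           std::numeric_limits<float>::quiet_NaN()};
  for (float x : samples) {
    // Item 1: min(x, x) and max(x, x) are just x, bit for bit.
    assert(bits(x86Min(x, x)) == bits(x));
    assert(bits(x86Max(x, x)) == bits(x));
    // Item 2: LT(x, x) is always false, so the compare simplifies away.
    assert(!x86Cmp(LT, x, x));
    // Item 2: EQ(x, x) and LE(x, x) only ask "is x ordered?", i.e. the
    // predicate can change to ORD (the negation of UNORD).
    assert(x86Cmp(EQ, x, x) == !x86Cmp(UNORD, x, x));
    assert(x86Cmp(LE, x, x) == !x86Cmp(UNORD, x, x));
  }
}
```

Calling checkIdenticalOperandFolds() exercises the identities on zeros, infinity and a quiet NaN. Whether the same folds are safe when FP exceptions are observable is exactly the open question and is not modeled here.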

The first 3 are independent/parallel. The first 2 raise a potential problem that I don't know the answer to: what happens to target-specific intrinsics in a strict FP environment? Do we need to bypass the folds in that case? Is there some existing code that we can look at that deals with that situation?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76928/new/

https://reviews.llvm.org/D76928




