[llvm] [InstCombine][Docs] Update InstCombine contributor guide (PR #144228)

Yingwei Zheng via llvm-commits llvm-commits at lists.llvm.org
Sun Jun 15 19:49:08 PDT 2025


https://github.com/dtcxzyw updated https://github.com/llvm/llvm-project/pull/144228

>From 2f3708ab1a7a9b847ab315c37c71412467bed083 Mon Sep 17 00:00:00 2001
From: Yingwei Zheng <dtcxzyw2333 at gmail.com>
Date: Sat, 14 Jun 2025 23:18:12 +0800
Subject: [PATCH 1/2] [InstCombine][Docs] Update InstCombine contributor guide

---
 llvm/docs/InstCombineContributorGuide.md | 38 ++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/llvm/docs/InstCombineContributorGuide.md b/llvm/docs/InstCombineContributorGuide.md
index b4041f8a5b93f..c6e8c51170359 100644
--- a/llvm/docs/InstCombineContributorGuide.md
+++ b/llvm/docs/InstCombineContributorGuide.md
@@ -404,11 +404,31 @@ The use of TargetTransformInfo is only allowed for hooks for target-specific
 intrinsics, such as `TargetTransformInfo::instCombineIntrinsic()`. These are
 already inherently target-dependent anyway.
 
+If some canonicalization narrow/widen the integer width of expressions, please
+check `shouldChangeType()` first. Otherwise, we may evaluate the expression 
+in illegal/inefficient types. For vector types, follow the instructions below.
+
 For vector-specific transforms that require cost-modelling, the VectorCombine
 pass can be used instead. In very rare circumstances, if there are no other
 alternatives, target-dependent transforms may be accepted into
 AggressiveInstCombine.
 
+Generally, we prefer unsigned operations over signed operations in the middle-end, even
+if signed operations are more efficient on some targets. The following is an incomplete
+list of canonicalizations that implemented in InstCombine:
+
+| Original Pattern             | Canonical Form     | Condition                     |
+|------------------------------|--------------------|-------------------------------|
+| `icmp spred X, Y`            | `icmp upred X, Y`  | `sign(X) == sign(Y)`          |
+| `smin/smax X, Y`             | `umin/umax X, Y`   | `sign(X) == sign(Y)`          |
+| `sext X`                     | `zext nneg X`      | `X >=s 0`                     |
+| `sitofp X`                   | `uitofp nneg X`    | `X >=s 0`                     |
+| `ashr X, Y`                  | `lshr X, Y`        | `X >=s 0`                     |
+| `add X, Y`                   | `or disjoint X, Y` | `(X & Y) != 0`                |
+| `mul X, C`                   | `shl X, Log2(C)`   | `isPowerOf2(C)`               |
+| `select Cond1, Cond2, false` | `and Cond1, Cond2` | `impliesPoison(Cond2, Cond1)` |
+| `select Cond1, true, Cond2`  | `or Cond1, Cond2`  | `impliesPoison(Cond2, Cond1)` |
+
 ### PatternMatch
 
 Many transforms make use of the matching infrastructure defined in
@@ -531,6 +551,19 @@ need to add a one-use check for the inner instruction.
 One-use checks can be performed using the `m_OneUse()` matcher, or the
 `V->hasOneUse()` method.
 
+### Flag handling
+
+When possible, propagate poison-generating flags like `nuw` and `nsw` since they may be
+hard to salvage later. If it is not free (e.g. requires additional complexity like `willNotOverflow`
+or KnownBits queries), don't do that.
+
+Be careful with in-place operand/predicate changes, as poison-generating flags may not be valid for new
+operands. It is recommended to create a new instruction with carefully handling of flags. If not
+applicable, call `Instruction::dropPoisonGeneratingFlags()` to clear flags in a conservative manner.
+
+Do not rely on fcmp's `nsz` flag to perform optimizations. It is meaningless for fcmp so it should not affect
+the optimization.
+
 ### Generalization
 
 Transforms can both be too specific (only handling some odd subset of patterns,
@@ -558,6 +591,11 @@ guidelines.
    use of ValueTracking queries. Whether this makes sense depends on the case,
    but it's usually a good idea to only handle the constant pattern first, and
    then generalize later if it seems useful.
+ * When possible, handle more canonical patterns as well. It is encouraged to avoid
+   potential phase-ordering issues. For example, if the motivating transform holds for
+   `add`, it also holds for `or disjoint`. See the canonicalization list above for details.
+   In most cases, it can be easily implemented with matchers like
+   `m_AddLike/m_SExtLike/m_LogicalAnd/m_LogicalOr`.
 
 ## Guidelines for reviewers
 

>From 786495bec1ba879ea1e2eb78012ab7c6c4ac86bb Mon Sep 17 00:00:00 2001
From: Yingwei Zheng <dtcxzyw2333 at gmail.com>
Date: Mon, 16 Jun 2025 10:48:51 +0800
Subject: [PATCH 2/2] Update list

---
 llvm/docs/InstCombineContributorGuide.md | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/llvm/docs/InstCombineContributorGuide.md b/llvm/docs/InstCombineContributorGuide.md
index c6e8c51170359..d3fb0708ac3e0 100644
--- a/llvm/docs/InstCombineContributorGuide.md
+++ b/llvm/docs/InstCombineContributorGuide.md
@@ -417,17 +417,18 @@ Generally, we prefer unsigned operations over signed operations in the middle-en
 if signed operations are more efficient on some targets. The following is an incomplete
 list of canonicalizations that implemented in InstCombine:
 
-| Original Pattern             | Canonical Form     | Condition                     |
-|------------------------------|--------------------|-------------------------------|
-| `icmp spred X, Y`            | `icmp upred X, Y`  | `sign(X) == sign(Y)`          |
-| `smin/smax X, Y`             | `umin/umax X, Y`   | `sign(X) == sign(Y)`          |
-| `sext X`                     | `zext nneg X`      | `X >=s 0`                     |
-| `sitofp X`                   | `uitofp nneg X`    | `X >=s 0`                     |
-| `ashr X, Y`                  | `lshr X, Y`        | `X >=s 0`                     |
-| `add X, Y`                   | `or disjoint X, Y` | `(X & Y) != 0`                |
-| `mul X, C`                   | `shl X, Log2(C)`   | `isPowerOf2(C)`               |
-| `select Cond1, Cond2, false` | `and Cond1, Cond2` | `impliesPoison(Cond2, Cond1)` |
-| `select Cond1, true, Cond2`  | `or Cond1, Cond2`  | `impliesPoison(Cond2, Cond1)` |
+| Original Pattern             | Canonical Form             | Condition                     |
+|------------------------------|----------------------------|-------------------------------|
+| `icmp spred X, Y`            | `icmp samesign upred X, Y` | `sign(X) == sign(Y)`          |
+| `smin/smax X, Y`             | `umin/umax X, Y`           | `sign(X) == sign(Y)`          |
+| `sext X`                     | `zext nneg X`              | `X >=s 0`                     |
+| `sitofp X`                   | `uitofp nneg X`            | `X >=s 0`                     |
+| `ashr X, Y`                  | `lshr X, Y`                | `X >=s 0`                     |
+| `sdiv/srem X, Y`             | `udiv/urem X, Y`           | `X >=s 0 && Y >=s 0`          |
+| `add X, Y`                   | `or disjoint X, Y`         | `(X & Y) != 0`                |
+| `mul X, C`                   | `shl X, Log2(C)`           | `isPowerOf2(C)`               |
+| `select Cond1, Cond2, false` | `and Cond1, Cond2`         | `impliesPoison(Cond2, Cond1)` |
+| `select Cond1, true, Cond2`  | `or Cond1, Cond2`          | `impliesPoison(Cond2, Cond1)` |
 
 ### PatternMatch
 



More information about the llvm-commits mailing list