[all-commits] [llvm/llvm-project] cf8fad: Match (xor TSize - 1, ctlz) to `bsr` instead of `l...

Mon Feb 6 12:16:49 PST 2023

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: cf8fadcf9b9362bff30e31cce06b516aa1156ce1
      https://github.com/llvm/llvm-project/commit/cf8fadcf9b9362bff30e31cce06b516aa1156ce1
  Author: Noah Goldstein <goldstein.w.n at gmail.com>
  Date:   2023-02-06 (Mon, 06 Feb 2023)

  Changed paths:
    M llvm/lib/Target/X86/X86ISelLowering.cpp
    M llvm/test/CodeGen/X86/clz.ll

  Log Message:
  -----------
  Match (xor TSize - 1, ctlz) to `bsr` instead of `lzcnt` + `xor`

Previously this was de-optimized if -march supported lzcnt; there is
no reason to emit the extra instruction.
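
The arithmetic behind the match can be sketched at the source level.
The C++ below is only an illustration, not the ISel code from the
patch: the helper names are hypothetical, TSize is assumed to be 32,
and the loops stand in for `ctlz` and `bsr`.

```cpp
#include <cassert>
#include <cstdint>

// Portable count-leading-zeros for a non-zero 32-bit value (stand-in for ctlz).
inline unsigned ctlz32(std::uint32_t x) {
  assert(x != 0 && "undefined for zero, as with bsr");
  unsigned n = 0;
  while ((x & 0x80000000u) == 0) {
    x <<= 1;
    ++n;
  }
  return n;
}

// The form being matched: (TSize - 1) ^ ctlz(x), with TSize = 32.
inline unsigned xor_of_ctlz(std::uint32_t x) { return 31u ^ ctlz32(x); }

// Index of the highest set bit, which is what `bsr` yields for x != 0.
inline unsigned highest_set_bit(std::uint32_t x) {
  assert(x != 0);
  unsigned idx = 0;
  while (x >>= 1)
    ++idx;
  return idx;
}
```

Because ctlz(x) is in [0, 31] for non-zero x, xoring it with 31 equals
31 - ctlz(x), so both helpers return the same bit index.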

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D141464

  Commit: 3857d9decc4db2a0f14fd7cb7cd69be55f12cc4a
      https://github.com/llvm/llvm-project/commit/3857d9decc4db2a0f14fd7cb7cd69be55f12cc4a
  Author: Noah Goldstein <goldstein.w.n at gmail.com>
  Date:   2023-02-06 (Mon, 06 Feb 2023)

  Changed paths:
    M llvm/lib/Target/X86/X86ISelLowering.cpp
    M llvm/test/CodeGen/X86/bmi-out-of-order.ll

  Log Message:
  -----------
  Search through associative operators for BMI patterns (BLSI, BLSR, BLSMSK)

(a & (-b)) & b is often lowered as:
    %sub  = sub i32     0, %b
    %and0 = and i32  %sub, %a
    %and1 = and i32 %and0, %b

This is not detected by the BLSI pattern because b and -b are never
operands of the same SDNode.

This patch does a small search through associative operators and tries
to place BMI patterns in the same node so they will hit the pattern.
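
As a rough source-level illustration of why the reassociation is
legal, the C++ sketch below compares the evaluation order from the IR
above with the order in which `b & -b` appears as one operation. The
function names are hypothetical and this is not the DAG code from the
patch.

```cpp
#include <cstdint>

// BLSI semantics: isolate the lowest set bit of b, i.e. b & -b.
inline std::uint32_t blsi(std::uint32_t b) { return b & (0u - b); }

// Evaluation order of the IR above: (a & -b) & b.
inline std::uint32_t as_lowered(std::uint32_t a, std::uint32_t b) {
  std::uint32_t sub = 0u - b;
  std::uint32_t and0 = sub & a;
  return and0 & b;
}

// Reassociated order with b and -b in the same operation: a & (b & -b).
inline std::uint32_t reassociated(std::uint32_t a, std::uint32_t b) {
  return a & blsi(b);
}
```

Since `and` is associative and commutative, both functions compute the
same value for every input; only the grouping differs.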

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D141179

  Commit: 725b72c1fa608c886a1a5dbb75df23a05e91d5e8
      https://github.com/llvm/llvm-project/commit/725b72c1fa608c886a1a5dbb75df23a05e91d5e8
  Author: Noah Goldstein <goldstein.w.n at gmail.com>
  Date:   2023-02-06 (Mon, 06 Feb 2023)

  Changed paths:
    M llvm/lib/Target/X86/X86InstrInfo.td
    M llvm/test/CodeGen/X86/GlobalISel/select-blsi.mir
    M llvm/test/CodeGen/X86/GlobalISel/select-blsr.mir
    M llvm/test/CodeGen/X86/bmi-out-of-order.ll

  Log Message:
  -----------
  Only match BMI (BLSR, BLSI, BLSMSK) if the add/sub op is single use

If the add/sub is not single use, it will need to be materialized
later, in which case using the BMI instruction is a de-optimization in
terms of code-size and throughput.

i.e:
```
// Good
leal -1(%rdi), %eax
andl %eax, %eax
xorl %eax, %esi
...
```
```
// Unnecessary BMI (lower throughput, larger code size)
leal -1(%rdi), %eax
blsr %edi, %eax
xorl %eax, %esi
...
```

Note that this may sometimes cause more `mov` instructions to be
emitted, because BMI instructions have a single source and a
write-only destination.  A better approach may be to avoid BMI for
(and/xor X, (add/sub 0/-1, X)) only if this is the last use of X but
NOT the last use of (add/sub 0/-1, X).
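
To make the single-use distinction concrete, here is a hedged C++
sketch of the two source shapes. The function names are hypothetical,
and nothing here asserts which instructions a given compiler emits; it
only shows where `x - 1` has one use and where it has two.

```cpp
#include <cstdint>

// x - 1 has a single use: the whole expression is the BLSR shape x & (x - 1).
inline std::uint32_t clear_lowest_bit(std::uint32_t x) {
  return x & (x - 1u);
}

// x - 1 has two uses: it feeds the AND and is also needed as a value of
// its own, so it has to be materialized however the AND is selected.
inline std::uint32_t clear_lowest_bit_and_mix(std::uint32_t x,
                                              std::uint32_t y) {
  std::uint32_t dec = x - 1u;
  return (x & dec) ^ (y + dec);
}
```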

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D141180

  Commit: ee5585ed09aff2e54cb540fad4c33f0c93626b1b
      https://github.com/llvm/llvm-project/commit/ee5585ed09aff2e54cb540fad4c33f0c93626b1b
  Author: Noah Goldstein <goldstein.w.n at gmail.com>
  Date:   2023-02-06 (Mon, 06 Feb 2023)

  Changed paths:
    M llvm/lib/CodeGen/BranchFolding.cpp
    M llvm/test/CodeGen/X86/add.ll
    M llvm/test/CodeGen/X86/atom-pad-short-functions.ll
    M llvm/test/CodeGen/X86/avx512-i1test.ll
    M llvm/test/CodeGen/X86/bmi.ll
    M llvm/test/CodeGen/X86/brcond.ll
    M llvm/test/CodeGen/X86/btq.ll
    M llvm/test/CodeGen/X86/cmp-merge.ll
    M llvm/test/CodeGen/X86/cmp.ll
    M llvm/test/CodeGen/X86/comi-flags.ll
    M llvm/test/CodeGen/X86/extern_weak.ll
    M llvm/test/CodeGen/X86/fold-rmw-ops.ll
    M llvm/test/CodeGen/X86/fp-strict-scalar-cmp-fp16.ll
    M llvm/test/CodeGen/X86/fp-strict-scalar-cmp.ll
    M llvm/test/CodeGen/X86/funnel-shift.ll
    M llvm/test/CodeGen/X86/jump_sign.ll
    M llvm/test/CodeGen/X86/neg_cmp.ll
    M llvm/test/CodeGen/X86/or-branch.ll
    M llvm/test/CodeGen/X86/peep-test-4.ll
    M llvm/test/CodeGen/X86/pr37025.ll
    M llvm/test/CodeGen/X86/pr37063.ll
    M llvm/test/CodeGen/X86/rd-mod-wr-eflags.ll
    M llvm/test/CodeGen/X86/segmented-stacks.ll
    M llvm/test/CodeGen/X86/sibcall.ll
    M llvm/test/CodeGen/X86/slow-incdec.ll
    M llvm/test/CodeGen/X86/sqrt-partial.ll
    M llvm/test/CodeGen/X86/switch-bt.ll
    M llvm/test/CodeGen/X86/tail-opts.ll
    M llvm/test/CodeGen/X86/tailcall-cgp-dup.ll
    M llvm/test/CodeGen/X86/tailcall-extract.ll
    M llvm/test/CodeGen/X86/xor-icmp.ll

  Log Message:
  -----------
  Recommit "Improve and enable folding of conditional branches with tail calls." (2nd Try)

Improve and enable folding of conditional branches with tail calls.

1. Make it so that conditional tail calls can be emitted even when
   there are multiple predecessors.

2. Don't guard the transformation behind -Os. The rationale for
   guarding it was that static prediction can be affected by whether
   the branch is forward or backward. This is no longer true for
   almost any x86 CPU (anything newer than `SnB`), so it is no longer
   a meaningful concern.
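
For readers unfamiliar with the transformation, the C++ sketch below
shows the kind of source shape it applies to: a conditional branch
whose taken side does nothing but tail-call another function. All
names are hypothetical, and whether a particular build emits a
conditional jump straight to the callee depends on the target and
optimization level.

```cpp
// Hypothetical callees; in real code these would usually live in other
// translation units so the calls cannot simply be inlined.
inline int slow_path(int x) { return -x; }
inline int fast_path(int x) { return x + 1; }

// Both returns are calls in tail position. The branch on `x < 0` only
// chooses between them, so it is a candidate for being folded into a
// conditional tail call instead of a branch around an unconditional jump.
inline int dispatch(int x) {
  if (x < 0)
    return slow_path(x);
  return fast_path(x);
}
```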

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D140931

  Commit: 19c766f7423abb1808c4de94ea0a0f09ef0a6ada
      https://github.com/llvm/llvm-project/commit/19c766f7423abb1808c4de94ea0a0f09ef0a6ada
  Author: Noah Goldstein <goldstein.w.n at gmail.com>
  Date:   2023-02-06 (Mon, 06 Feb 2023)

  Changed paths:
    M llvm/test/Transforms/InstCombine/icmp-mul.ll

  Log Message:
  -----------
  Add tests for folding (icmp UnsignedPred X * Z, Y * Z) -> (icmp UnsignedPred X, Y); NFC

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D142785

  Commit: 2a3732f934b1ef46fc8f0fdae77836c6604533cb
      https://github.com/llvm/llvm-project/commit/2a3732f934b1ef46fc8f0fdae77836c6604533cb
  Author: Noah Goldstein <goldstein.w.n at gmail.com>
  Date:   2023-02-06 (Mon, 06 Feb 2023)

  Changed paths:
    M llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
    M llvm/test/Transforms/InstCombine/icmp-mul.ll

  Log Message:
  -----------
  Add transform for `(mul X, OddC) eq/ne N * C` --> `X eq/ne N`

We previously only did this if the `mul` was `nuw`, but it works for
any odd value.

Alive2 Links:
EQ: https://alive2.llvm.org/ce/z/6_HPZ5
NE: https://alive2.llvm.org/ce/z/c34qSU
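
Independently of the Alive2 proofs, the underlying fact (an odd
multiplier is invertible modulo 2^n, so wrapping multiplication by it
preserves equality) can be checked exhaustively at a small width. The
C++ sketch below uses 8 bits; the function name is hypothetical and
this is not the InstCombine code.

```cpp
#include <cstdint>

// Returns true if, for every 8-bit x and n, (x * c == n * c) agrees with
// (x == n) under wrapping 8-bit multiplication.
inline bool mul_preserves_equality(std::uint8_t c) {
  for (unsigned x = 0; x < 256; ++x) {
    for (unsigned n = 0; n < 256; ++n) {
      auto lhs = static_cast<std::uint8_t>(x * c);
      auto rhs = static_cast<std::uint8_t>(n * c);
      if ((lhs == rhs) != (x == n))
        return false;
    }
  }
  return true;
}
```

An even multiplier fails the check: with c = 2, both 0 and 128 wrap to
0, so the products compare equal while the inputs do not.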

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D143026

  Commit: abbd256a810a0b0c92dda88a3050fc85cb604a9c
      https://github.com/llvm/llvm-project/commit/abbd256a810a0b0c92dda88a3050fc85cb604a9c
  Author: Noah Goldstein <goldstein.w.n at gmail.com>
  Date:   2023-02-06 (Mon, 06 Feb 2023)

  Changed paths:
    M llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
    M llvm/test/Transforms/InstCombine/icmp-mul.ll

  Log Message:
  -----------
  Improve transforms for (icmp uPred X * Z, Y * Z) -> (icmp uPred X, Y)

Several cases were missing.

1. `(icmp eq/ne X*Z, Y*Z) [if Z % 2 != 0] -> (icmp eq/ne X, Y)`
    EQ: https://alive2.llvm.org/ce/z/6_HPZ5
    NE: https://alive2.llvm.org/ce/z/c34qSU

    There was previously an implementation of this that worked if `Y`
    was non-constant, but it was missing if `Y*Z` evaluated to a
    constant and/or `nsw`/`nuw` were both false. It also only worked
    if `Z` was a constant, but we can check the 1s bit of `KnownBits`
    to cover more cases.

2. `(icmp eq/ne X*Z, Y*Z) [if Z != 0 and nsw(X*Z) and nsw(Y*Z)] -> (icmp eq/ne X, Y)`
    EQ: https://alive2.llvm.org/ce/z/6SdAG6
    NE: https://alive2.llvm.org/ce/z/fjsq_b

    This was previously implemented only to work if `Z` was constant,
    but we can use `isKnownNonZero` to cover more cases.

3. `(icmp uPred X*Z, Y*Z) [if Z != 0 and nuw(X*Z) and nuw(Y*Z)] -> (icmp uPred X, Y)`
    EQ:  https://alive2.llvm.org/ce/z/FqWQLX
    NE:  https://alive2.llvm.org/ce/z/2gHrd2
    ULT: https://alive2.llvm.org/ce/z/MUAWgZ
    ULE: https://alive2.llvm.org/ce/z/szQQ2L
    UGT: https://alive2.llvm.org/ce/z/McVUdu
    UGE: https://alive2.llvm.org/ce/z/95uyC8

    This was previously implemented only for the `eq/ne` cases, and
    only if `Z` was constant, but again we can use `isKnownNonZero` to
    cover more cases.
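
As a sanity check on case 3, the C++ sketch below brute-forces the
claim at 8 bits for `ult` and `eq`: when `Z != 0` and neither product
overflows, comparing the products agrees with comparing `X` and `Y`.
The helper names are hypothetical, and the last function shows why the
`nuw` condition matters for ordered predicates.

```cpp
#include <cstdint>

// True if x * z fits in 8 unsigned bits, i.e. the multiply would be `nuw`.
inline bool mul_is_nuw(unsigned x, unsigned z) { return x * z <= 0xFFu; }

// Exhaustive 8-bit check: with z != 0 and both products nuw, comparing
// the products agrees with comparing x and y for ult and eq.
inline bool nuw_fold_holds() {
  for (unsigned z = 1; z < 256; ++z)
    for (unsigned x = 0; x < 256; ++x)
      for (unsigned y = 0; y < 256; ++y) {
        if (!mul_is_nuw(x, z) || !mul_is_nuw(y, z))
          continue;
        if ((x * z < y * z) != (x < y))
          return false;
        if ((x * z == y * z) != (x == y))
          return false;
      }
  return true;
}

// ult on wrapped 8-bit products, with no nuw guarantee.
inline bool wrapped_ult(std::uint8_t x, std::uint8_t y, std::uint8_t z) {
  return static_cast<std::uint8_t>(x * z) < static_cast<std::uint8_t>(y * z);
}
```

For example, with x = 1, y = 86, z = 3 the second product wraps to 2,
so the wrapped comparison says 3 < 2 is false even though 1 < 86.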

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D142786

Compare: https://github.com/llvm/llvm-project/compare/3b73fc320f91...abbd256a810a