[PATCH] D67799: [InstCombine] Fold a shifty implementation of clamp negative to zero.

Mon Sep 23 00:56:05 PDT 2019

huihuiz added a comment.

Another note: on older-generation X86 targets, e.g. Haswell, cmov does indeed have a latency of 2, but the folded code still achieves comparable uOps Per Cycle (3.33 before, 3.35 after).
Using the same test input:
clang clampNegToZero.ll -O2 -target x86_64 -march=haswell -S -o - | llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=haswell
Before:

  Iterations:        100
  Instructions:      500
  Total Cycles:      210
  Total uOps:        700

  Dispatch Width:    4
  uOps Per Cycle:    3.33
  IPC:               2.38
  Block RThroughput: 1.8

  Instruction Info:
  [1]: #uOps
  [2]: Latency
  [3]: RThroughput
  [4]: MayLoad
  [5]: MayStore
  [6]: HasSideEffects (U)

  [1]    [2]    [3]    [4]    [5]    [6]    Instructions:
   1      1     0.25                        movl  %esi, %eax
   1      1     0.25                        subl  %edi, %eax
   1      1     0.50                        sarl  $31, %eax
   1      1     0.25                        andl  %edi, %eax
   3      7     1.00                  U     retq

After:

  Iterations:        100
  Instructions:      400
  Total Cycles:      209
  Total uOps:        700

  Dispatch Width:    4
  uOps Per Cycle:    3.35
  IPC:               1.91
  Block RThroughput: 1.8

  Instruction Info:
  [1]: #uOps
  [2]: Latency
  [3]: RThroughput
  [4]: MayLoad
  [5]: MayStore
  [6]: HasSideEffects (U)

  [1]    [2]    [3]    [4]    [5]    [6]    Instructions:
   1      0     0.25                        xorl  %eax, %eax
   1      1     0.25                        cmpl  %esi, %edi
   2      2     0.50                        cmovgl        %edi, %eax
   3      7     1.00                  U     retq
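
For readers who prefer source over assembly, here is a rough C sketch of what the two instruction sequences above compute. It is read off the llvm-mca listings, not taken from the patch or its test file. The function names and the x/y parameter mapping (x in %edi, y in %esi, per the System V calling convention) are illustrative assumptions. The shifty form also assumes that y - x does not overflow and that >> on a negative int32_t is an arithmetic shift, which holds for clang and gcc but is implementation-defined in ISO C.

```c
#include <assert.h>
#include <stdint.h>

/* "Before" sequence: subl, sarl $31, andl.
   Assumes y - x does not overflow and that >> is an arithmetic shift. */
static int32_t clamp_shifty(int32_t x, int32_t y) {
    int32_t mask = (y - x) >> 31; /* all ones when x > y, otherwise 0 */
    return mask & x;
}

/* "After" sequence: xorl, cmpl, cmovgl. */
static int32_t clamp_select(int32_t x, int32_t y) {
    return x > y ? x : 0;
}

/* Hypothetical helper: checks that both forms agree on a few small inputs. */
static void check_clamp_forms(void) {
    static const int32_t samples[] = {-7, -1, 0, 1, 42};
    const int n = (int)(sizeof samples / sizeof samples[0]);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            assert(clamp_shifty(samples[i], samples[j]) ==
                   clamp_select(samples[i], samples[j]));
}
```

With y fixed at 0, both functions reduce to clamping negative x to zero, i.e. x > 0 ? x : 0.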

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67799/new/

https://reviews.llvm.org/D67799