[PATCH] D67799: [InstCombine] Fold a shifty implementation of clamp-to-zero.

Mon Sep 23 14:16:15 PDT 2019

huihuiz added a comment.

This is just FYI.

llvm-mca results for AMD btver2 and bdver2:

AMD btver2
clang clampNegToZero.ll -O2 -target x86_64 -march=btver2 -S -o - | llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2
Before

  Iterations:        100
  Instructions:      500
  Total Cycles:      256
  Total uOps:        500

  Dispatch Width:    2
  uOps Per Cycle:    1.95
  IPC:               1.95
  Block RThroughput: 2.5

  Instruction Info:
  [1]: #uOps
  [2]: Latency
  [3]: RThroughput
  [4]: MayLoad
  [5]: MayStore
  [6]: HasSideEffects (U)

  [1]    [2]    [3]    [4]    [5]    [6]    Instructions:
   1      1     0.50                        movl  %esi, %eax
   1      1     0.50                        subl  %edi, %eax
   1      1     0.50                        sarl  $31, %eax
   1      1     0.50                        andl  %edi, %eax
   1      4     1.00                  U     retq

After

  Iterations:        100
  Instructions:      400
  Total Cycles:      206
  Total uOps:        400

  Dispatch Width:    2
  uOps Per Cycle:    1.94
  IPC:               1.94
  Block RThroughput: 2.0

  Instruction Info:
  [1]: #uOps
  [2]: Latency
  [3]: RThroughput
  [4]: MayLoad
  [5]: MayStore
  [6]: HasSideEffects (U)

  [1]    [2]    [3]    [4]    [5]    [6]    Instructions:
   1      0     0.50                        xorl  %eax, %eax
   1      1     0.50                        cmpl  %esi, %edi
   1      1     0.50                        cmovgl        %edi, %eax
   1      4     1.00                  U     retq

AMD bdver2
clang clampNegToZero.ll -O2 -target x86_64 -march=bdver2 -S -o - | llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=bdver2
Before

  Iterations:        100
  Instructions:      500
  Total Cycles:      455
  Total uOps:        500

  Dispatch Width:    4
  uOps Per Cycle:    1.10
  IPC:               1.10
  Block RThroughput: 4.0

  Instruction Info:
  [1]: #uOps
  [2]: Latency
  [3]: RThroughput
  [4]: MayLoad
  [5]: MayStore
  [6]: HasSideEffects (U)

  [1]    [2]    [3]    [4]    [5]    [6]    Instructions:
   1      1     1.00                        movl  %esi, %eax
   1      1     1.00                        subl  %edi, %eax
   1      1     1.00                        sarl  $31, %eax
   1      1     1.00                        andl  %edi, %eax
   1      5     1.50                  U     retq

After

  Iterations:        100
  Instructions:      400
  Total Cycles:      208
  Total uOps:        400

  Dispatch Width:    4
  uOps Per Cycle:    1.92
  IPC:               1.92
  Block RThroughput: 1.5

  Instruction Info:
  [1]: #uOps
  [2]: Latency
  [3]: RThroughput
  [4]: MayLoad
  [5]: MayStore
  [6]: HasSideEffects (U)

  [1]    [2]    [3]    [4]    [5]    [6]    Instructions:
   1      0     0.25                        xorl  %eax, %eax
   1      1     1.00                        cmpl  %esi, %edi
   1      1     0.50                        cmovgl        %edi, %eax
   1      5     1.50                  U     retq
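
For reference, the two instruction sequences above correspond roughly to the C below. This is only a sketch read off the assembly, not the contents of clampNegToZero.ll: the function names are hypothetical, x stands for %edi and y for %esi, and it assumes that y - x does not overflow and that >> on a negative int32_t is an arithmetic shift (as it is with clang and gcc).

```c
#include <stdint.h>

/* "Before" form: sub, sar $31, and.
 * Assumes y - x does not overflow and that right-shifting a negative
 * signed value is an arithmetic shift (implementation-defined in C). */
int32_t clamp_shifty(int32_t x, int32_t y) {
    int32_t diff = y - x;       /* subl %edi, %eax  */
    int32_t mask = diff >> 31;  /* sarl $31, %eax: all ones if x > y, else 0 */
    return mask & x;            /* andl %edi, %eax  */
}

/* "After" form: cmp + cmovg, i.e. a plain select. */
int32_t clamp_select(int32_t x, int32_t y) {
    return x > y ? x : 0;       /* cmpl %esi, %edi; cmovgl %edi, %eax */
}
```

With y = 0 both forms return x for positive x and 0 otherwise, which is the clamp-to-zero case.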

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D67799