[PATCH] D67799: [InstCombine] Fold a shifty implementation of clamp negative to zero.
Huihui Zhang via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Sep 23 00:56:05 PDT 2019
huihuiz added a comment.
Another note: on older-generation X86 targets, e.g., Haswell, cmov (here cmovgl) does indeed have a latency of 2, but the folded code still achieves comparable uOps per cycle.
Same test input:
clang clampNegToZero.ll -O2 -target x86_64 -march=haswell -S -o - | llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=haswell
Before
Iterations: 100
Instructions: 500
Total Cycles: 210
Total uOps: 700
Dispatch Width: 4
uOps Per Cycle: 3.33
IPC: 2.38
Block RThroughput: 1.8
Instruction Info:
[1]: #uOps
[2]: Latency
[3]: RThroughput
[4]: MayLoad
[5]: MayStore
[6]: HasSideEffects (U)
[1]    [2]    [3]    [4]    [5]    [6]    Instructions:
 1      1     0.25                        movl %esi, %eax
 1      1     0.25                        subl %edi, %eax
 1      1     0.50                        sarl $31, %eax
 1      1     0.25                        andl %edi, %eax
 3      7     1.00                  U     retq
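For readers who want to check what the shifty sequence above computes, here is a minimal Python sketch that models it on 32-bit signed values and compares it with the select form (x if x > y, else 0). The helper names to_i32, clamp_shifty and clamp_select are made up for illustration and are not part of the patch. The two forms agree only when y - x does not overflow; this message does not show the exact preconditions the fold checks, so treat that part as an assumption.

```python
def to_i32(v):
    """Wrap a Python int to a signed 32-bit value (two's complement)."""
    v &= 0xFFFFFFFF
    return v - 0x100000000 if v & 0x80000000 else v


def clamp_shifty(x, y):
    """Model of movl/subl/sarl/andl: ((y - x) >> 31) & x on 32-bit ints."""
    diff = to_i32(y - x)     # subl: wraps around on signed overflow
    mask = diff >> 31        # sarl $31: -1 if diff is negative, else 0
    return to_i32(mask & x)  # andl: keeps x or produces 0


def clamp_select(x, y):
    """Select form: x if x > y (signed compare), else 0."""
    return x if x > y else 0


# With y = 0 this is "clamp negative to zero".
samples = [-7, -1, 0, 1, 5, 2**31 - 1]
agree = all(clamp_shifty(x, 0) == clamp_select(x, 0) for x in samples)

# Counterexample when y - x overflows 32 bits: the two forms differ.
overflow_case = (clamp_shifty(-2**31, 1), clamp_select(-2**31, 1))
```

Here agree is True for the sampled inputs, while overflow_case shows the two forms diverging once the subtraction wraps.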
After
Iterations: 100
Instructions: 400
Total Cycles: 209
Total uOps: 700
Dispatch Width: 4
uOps Per Cycle: 3.35
IPC: 1.91
Block RThroughput: 1.8
Instruction Info:
[1]: #uOps
[2]: Latency
[3]: RThroughput
[4]: MayLoad
[5]: MayStore
[6]: HasSideEffects (U)
[1]    [2]    [3]    [4]    [5]    [6]    Instructions:
 1      0     0.25                        xorl %eax, %eax
 1      1     0.25                        cmpl %esi, %edi
 2      2     0.50                        cmovgl %edi, %eax
 3      7     1.00                  U     retq
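As a sanity check on the two summaries, the sketch below recomputes IPC and uOps per cycle from the reported totals, assuming llvm-mca derives both as plain ratios over Total Cycles. The function name mca_rates is hypothetical. It shows why IPC drops while uOps per cycle stays flat: the cmov version retires one fewer instruction per iteration, but cmovgl is 2 uOps on this model, so the total uOp count is unchanged.

```python
def mca_rates(instructions, cycles, uops):
    """Return (IPC, uOps per cycle), each rounded to two decimals."""
    return round(instructions / cycles, 2), round(uops / cycles, 2)


# Totals copied from the two llvm-mca summaries in this message.
before = mca_rates(instructions=500, cycles=210, uops=700)
after = mca_rates(instructions=400, cycles=209, uops=700)

print("before: IPC=%.2f uOps/cycle=%.2f" % before)
print("after:  IPC=%.2f uOps/cycle=%.2f" % after)
```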
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D67799/new/
https://reviews.llvm.org/D67799