[PATCH] D23446: [X86] Enable setcc to srl(ctlz) transformation on btver2 architectures.
pierre gousseau via llvm-commits
llvm-commits at lists.llvm.org
Wed Aug 17 11:01:07 PDT 2016
pgousseau added a comment.
Yes sounds like I should disable this if optForSize is enabled.
For example in openssl, out of 89 srl(lzcnt) optimisations, around 30 cases cause larger code, for example I see 25 of those:
3055: 31 ed xor %ebp,%ebp
3057: ff c2 inc %edx
3059: 40 0f 94 c5 sete %bpl
305d: 01 e9 add %ebp,%ecx
-> 10 bytes
309d: ff c2 inc %edx
309f: f3 0f bd ea lzcnt %edx,%ebp
30a3: c1 ed 05 shr $0x5,%ebp
30a6: 01 e9 add %ebp,%ecx
-> 12 bytes
But this should not affect performances.
Luckily openssl's libcrypto total size remains smaller as the other remaining matches result in fewer and as fast instructions.
================
Comment at: lib/CodeGen/SelectionDAG/TargetLowering.cpp:3566-3567
@@ -3565,3 +3565,4 @@
-SDValue TargetLowering::lowerCmpEqZeroToCtlzSrl(SDValue Op,
- SelectionDAG &DAG) const {
+llvm::SDValue
+llvm::TargetLowering::lowerCmpEqZeroToCtlzSrl(SDValue Op, EVT ExtTy,
+ SelectionDAG &DAG) const {
----------------
spatel wrote:
> Remove "llvm::"
Sounds good, will do.
================
Comment at: lib/Target/X86/X86.td:265-266
@@ -264,1 +264,4 @@
"true", "Vector SQRT is fast (disable Newton-Raphson)">;
+// On some architectures, such as AMD's Jaguar, LZCNT is fast (as in as fast as
+// equivalent instructions).
+def FeatureFastLZCNT
----------------
spatel wrote:
> The description still seems too vague. The key point is that lzcnt must have the same latency and throughput as test/set, right?
>
> How about: "If lzcnt has equivalent latency/throughput to most simple integer ops, it can be used to replace test/set sequences."
Sounds good thanks, will do.
================
Comment at: test/CodeGen/X86/lzcnt-zext-cmp.ll:85-92
@@ +84,10 @@
+; Test 16-bit input, 16-bit output.
+define i16 @foo5(i16 %a) {
+; CHECK-LABEL: foo5:
+; CHECK: # BB#0:
+; CHECK-NEXT: xorl %eax, %eax
+; CHECK-NEXT: testw %di, %di
+; CHECK-NEXT: sete %al
+; CHECK-NEXT: # kill: %AX<def> %AX<kill> %EAX<kill>
+; CHECK-NEXT: retq
+;
----------------
spatel wrote:
> lzcnt has a 16-bit variant in the ISA. Is there some reason not to use it here?
I disabled the 16-bit case because this seems to lead to bigger code in general, for this test case we would get:
lzcntw %di, %ax
andl $16, %eax
shrl $4, %eax
retq
instead of:
xorl %eax, %eax
testw %di, %di
sete %al
retq
which seems bigger code for the same result.
https://reviews.llvm.org/D23446
More information about the llvm-commits
mailing list