[llvm] r195496 - X86: Perform integer comparisons at i32 or larger.
Sean Silva
silvas at purdue.edu
Fri Nov 22 19:26:05 PST 2013
Hi Jim,
On Fri, Nov 22, 2013 at 2:57 PM, Jim Grosbach <grosbach at apple.com> wrote:
> Author: grosbach
> Date: Fri Nov 22 13:57:47 2013
> New Revision: 195496
>
> URL: http://llvm.org/viewvc/llvm-project?rev=195496&view=rev
> Log:
> X86: Perform integer comparisons at i32 or larger.
>
> Utilizing the 8 and 16 bit comparison instructions, even when an input can
> be folded into the comparison instruction itself, is typically not worth
> it.
>
Could you cite experimental results backing this up? I have a hard time
believing that this is true with such broad generality that the only reason
to avoid following this advice is when optimizing for size.

I would expect conclusive benchmark results on a large corpus of code
tested on a broad spectrum of microarchitectures from both Intel and AMD
before making a change like this. Also, the code size impact needs to be
quantified; having to break out to a whole new instruction for what would
otherwise be an immediate has the potential for insane code-size increase
(clang has something like 30,000 cmpb's and it seems like almost all of
them have immediate memory operands). Also, not being able to fold a load
into the comparison will likely increase register pressure.

At the very least I would expect a number of sample kernels showing
definitive empirical performance improvements across a broad spectrum of
microarchitectures.
> There are too many partial register stalls as a result, leading to
> significant slowdowns.
Most AMD architectures keep the whole register together internally, and so
don't have partial register stalls (instead trading false dependencies).
Does this change make sense on these uarches?
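For concreteness, here is a sketch of the pattern in question (the function
and the assembly in the comments are my own illustration, not output from any
particular compiler or benchmark):

```cpp
#include <cstdint>

// An i8 comparison fed by wider arithmetic. The old lowering compares
// only the low byte (%al); on P6-style Intel cores that read stalls
// until the partial write to %eax merges, while most AMD cores instead
// record a false dependency on the full register.
static bool classify(int a, int b) {
  uint8_t sum = static_cast<uint8_t>(a + b); // producer writes %eax
  // old: cmpb   $100, %al                     -- partial-register read
  // new: movzbl %al, %ecx ; cmpl $100, %ecx   -- full-width compare
  return sum < 100;
}
```

Whether the extra movzbl is actually cheaper than the false dependency on
those AMD cores is exactly the question.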
> By always performing comparisons on at least 32-bit
> registers, performance of the calculation chain leading to the
> comparison improves. Continue to use the smaller comparisons when
> minimizing size, as that allows better folding of loads into the
> comparison instructions.
>
> rdar://15386341
>
> Removed:
> llvm/trunk/test/CodeGen/X86/2007-10-17-IllegalAsm.ll
>
What is the rationale for completely ripping out this testcase?
> Modified:
> llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
> llvm/trunk/test/CodeGen/X86/3addr-16bit.ll
> llvm/trunk/test/CodeGen/X86/codegen-prepare-extload.ll
> llvm/trunk/test/CodeGen/X86/ctpop-combine.ll
> llvm/trunk/test/CodeGen/X86/memcmp.ll
> llvm/trunk/test/CodeGen/X86/shrink-compare.ll
>
> Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=195496&r1=195495&r2=195496&view=diff
>
> ==============================================================================
> --- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
> +++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Fri Nov 22 13:57:47 2013
> @@ -3419,6 +3419,24 @@ bool X86::isCalleePop(CallingConv::ID Ca
> }
> }
>
> +/// \brief Return true if the condition is an unsigned comparison operation.
> +static bool isX86CCUnsigned(unsigned X86CC) {
> + switch (X86CC) {
> + default: llvm_unreachable("Invalid integer condition!");
> + case X86::COND_E: return true;
> + case X86::COND_G: return false;
> + case X86::COND_GE: return false;
> + case X86::COND_L: return false;
> + case X86::COND_LE: return false;
> + case X86::COND_NE: return true;
> + case X86::COND_B: return true;
> + case X86::COND_A: return true;
> + case X86::COND_BE: return true;
> + case X86::COND_AE: return true;
> + }
> + llvm_unreachable("covered switch fell through?!");
> +}
> +
>
Two llvm_unreachables are overkill. Thankfully we actually have a coding
standard for this:
http://llvm.org/docs/CodingStandards.html#don-t-use-default-labels-in-fully-covered-switches-over-enumerations
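For reference, the idiom that standard asks for looks roughly like this (a
hypothetical stand-in enum, with llvm_unreachable modeled by std::abort so
the sketch is self-contained):

```cpp
#include <cstdlib>

enum Cond { COND_E, COND_NE, COND_L, COND_B };

// No default label: if a new Cond enumerator is added later, the
// compiler's -Wswitch warning points straight at this switch.
static bool isUnsignedCond(Cond C) {
  switch (C) {
  case COND_E:
  case COND_NE:
  case COND_B:
    return true;
  case COND_L:
    return false;
  }
  std::abort(); // unreachable: every enumerator is handled above
}
```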
-- Sean Silva
> /// TranslateX86CC - do a one to one translation of a ISD::CondCode to the X86
> /// specific condition code, returning the condition code and the LHS/RHS of the
> /// comparison to make.
> @@ -9662,6 +9680,17 @@ SDValue X86TargetLowering::EmitCmp(SDVal
>    SDLoc dl(Op0);
>    if ((Op0.getValueType() == MVT::i8 || Op0.getValueType() == MVT::i16 ||
>         Op0.getValueType() == MVT::i32 || Op0.getValueType() == MVT::i64)) {
> +    // Do the comparison at i32 if it's smaller. This avoids subregister
> +    // aliasing issues. Keep the smaller reference if we're optimizing for
> +    // size, however, as that'll allow better folding of memory operations.
> +    if (Op0.getValueType() != MVT::i32 && Op0.getValueType() != MVT::i64 &&
> +        !DAG.getMachineFunction().getFunction()->getAttributes().hasAttribute(
> +            AttributeSet::FunctionIndex, Attribute::MinSize)) {
> +      unsigned ExtendOp =
> +          isX86CCUnsigned(X86CC) ? ISD::ZERO_EXTEND : ISD::SIGN_EXTEND;
> +      Op0 = DAG.getNode(ExtendOp, dl, MVT::i32, Op0);
> +      Op1 = DAG.getNode(ExtendOp, dl, MVT::i32, Op1);
> +    }
>      // Use SUB instead of CMP to enable CSE between SUB and CMP.
>      SDVTList VTs = DAG.getVTList(Op0.getValueType(), MVT::i32);
>      SDValue Sub = DAG.getNode(X86ISD::SUB, dl, VTs,
>
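The extension choice is the subtle part here, so a sketch of the invariant
being relied on (function names are mine, not from the patch): widening must
follow the signedness of the condition code, because zero- and sign-extension
preserve different orderings.

```cpp
#include <cstdint>

// Widen an i8 compare to i32 the way EmitCmp now does: the extension
// kind follows the signedness of the condition, not of the source type.
static bool lessSigned(int8_t a, int8_t b) {
  return int32_t(a) < int32_t(b);   // SIGN_EXTEND for COND_L
}
static bool belowUnsigned(uint8_t a, uint8_t b) {
  return uint32_t(a) < uint32_t(b); // ZERO_EXTEND for COND_B
}
```

Zero-extending a signed -1 would turn it into 255 and flip the branch;
equality is insensitive to which extension is used, which presumably is why
COND_E/COND_NE get lumped in with the unsigned conditions above.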
> Removed: llvm/trunk/test/CodeGen/X86/2007-10-17-IllegalAsm.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/2007-10-17-IllegalAsm.ll?rev=195495&view=auto
>
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/2007-10-17-IllegalAsm.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/2007-10-17-IllegalAsm.ll (removed)
> @@ -1,87 +0,0 @@
> -; RUN: llc < %s -mtriple=x86_64-linux-gnu | grep addb | not grep x
> -; RUN: llc < %s -mtriple=x86_64-linux-gnu | grep cmpb | not grep x
> -; PR1734
> -
> -target triple = "x86_64-unknown-linux-gnu"
> -	%struct.CUMULATIVE_ARGS = type { i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32 }
> -	%struct.eh_status = type opaque
> -	%struct.emit_status = type { i32, i32, %struct.rtx_def*, %struct.rtx_def*, %struct.sequence_stack*, i32, %struct.location_t, i32, i8*, %struct.rtx_def** }
> -	%struct.expr_status = type { i32, i32, i32, %struct.rtx_def*, %struct.rtx_def*, %struct.rtx_def* }
> -	%struct.function = type { %struct.eh_status*, %struct.expr_status*, %struct.emit_status*, %struct.varasm_status*, %struct.tree_node*, %struct.tree_node*, %struct.tree_node*, %struct.tree_node*, %struct.function*, i32, i32, i32, i32, %struct.rtx_def*, %struct.CUMULATIVE_ARGS, %struct.rtx_def*, %struct.rtx_def*, %struct.initial_value_struct*, %struct.rtx_def*, %struct.rtx_def*, %struct.rtx_def*, %struct.rtx_def*, %struct.rtx_def*, %struct.rtx_def*, i8, i32, i64, %struct.tree_node*, %struct.tree_node*, %struct.rtx_def*, %struct.varray_head_tag*, %struct.temp_slot*, i32, %struct.var_refs_queue*, i32, i32, %struct.rtvec_def*, %struct.tree_node*, i32, i32, i32, %struct.machine_function*, i32, i32, i8, i8, %struct.language_function*, %struct.rtx_def*, i32, i32, i32, i32, %struct.location_t, %struct.varray_head_tag*, %struct.tree_node*, %struct.tree_node*, i8, i8, i8 }
> -	%struct.initial_value_struct = type opaque
> -	%struct.lang_decl = type opaque
> -	%struct.language_function = type opaque
> -	%struct.location_t = type { i8*, i32 }
> -	%struct.machine_function = type { %struct.stack_local_entry*, i8*, %struct.rtx_def*, i32, i32, i32, i32, i32 }
> -	%struct.rtunion = type { i8* }
> -	%struct.rtvec_def = type { i32, [1 x %struct.rtx_def*] }
> -	%struct.rtx_def = type { i16, i8, i8, %struct.u }
> -	%struct.sequence_stack = type { %struct.rtx_def*, %struct.rtx_def*, %struct.sequence_stack* }
> -	%struct.stack_local_entry = type opaque
> -	%struct.temp_slot = type opaque
> -	%struct.tree_common = type { %struct.tree_node*, %struct.tree_node*, %union.tree_ann_d*, i8, i8, i8, i8, i8 }
> -	%struct.tree_decl = type { %struct.tree_common, %struct.location_t, i32, %struct.tree_node*, i8, i8, i8, i8, i8, i8, i8, i8, i32, %struct.tree_decl_u1, %struct.tree_node*, %struct.tree_node*, %struct.tree_node*, %struct.tree_node*, %struct.tree_node*, %struct.tree_node*, %struct.tree_node*, %struct.tree_node*, %struct.tree_node*, %struct.tree_node*, %struct.rtx_def*, i32, %struct.tree_decl_u2, %struct.tree_node*, %struct.tree_node*, i64, %struct.lang_decl* }
> -	%struct.tree_decl_u1 = type { i64 }
> -	%struct.tree_decl_u2 = type { %struct.function* }
> -	%struct.tree_node = type { %struct.tree_decl }
> -	%struct.u = type { [1 x %struct.rtunion] }
> -	%struct.var_refs_queue = type { %struct.rtx_def*, i32, i32, %struct.var_refs_queue* }
> -	%struct.varasm_status = type opaque
> -	%struct.varray_data = type { [1 x i64] }
> -	%struct.varray_head_tag = type { i64, i64, i32, i8*, %struct.varray_data }
> -	%union.tree_ann_d = type opaque
> -
> -define void @layout_type(%struct.tree_node* %type) {
> -entry:
> - %tmp32 = load i32* null, align 8 ; <i32> [#uses=3]
> - %tmp3435 = trunc i32 %tmp32 to i8 ; <i8> [#uses=1]
> -	%tmp53 = icmp eq %struct.tree_node* null, null		; <i1> [#uses=1]
> - br i1 %tmp53, label %cond_next57, label %UnifiedReturnBlock
> -
> -cond_next57: ; preds = %entry
> - %tmp65 = and i32 %tmp32, 255 ; <i32> [#uses=1]
> - switch i32 %tmp65, label %UnifiedReturnBlock [
> - i32 6, label %bb140
> - i32 7, label %bb140
> - i32 8, label %bb140
> - i32 13, label %bb478
> - ]
> -
> -bb140: ; preds = %cond_next57, %cond_next57, %cond_next57
> - %tmp219 = load i32* null, align 8 ; <i32> [#uses=1]
> - %tmp221222 = trunc i32 %tmp219 to i8 ; <i8> [#uses=1]
> - %tmp223 = icmp eq i8 %tmp221222, 24 ; <i1> [#uses=1]
> - br i1 %tmp223, label %cond_true226, label %cond_next340
> -
> -cond_true226: ; preds = %bb140
> - switch i8 %tmp3435, label %cond_true288 [
> - i8 6, label %cond_next340
> - i8 9, label %cond_next340
> - i8 7, label %cond_next340
> - i8 8, label %cond_next340
> - i8 10, label %cond_next340
> - ]
> -
> -cond_true288: ; preds = %cond_true226
> - unreachable
> -
> -cond_next340:		; preds = %cond_true226, %cond_true226, %cond_true226, %cond_true226, %cond_true226, %bb140
> - ret void
> -
> -bb478: ; preds = %cond_next57
> - br i1 false, label %cond_next500, label %cond_true497
> -
> -cond_true497: ; preds = %bb478
> - unreachable
> -
> -cond_next500: ; preds = %bb478
> - %tmp513 = load i32* null, align 8 ; <i32> [#uses=1]
> - %tmp545 = and i32 %tmp513, 8192 ; <i32> [#uses=1]
> - %tmp547 = and i32 %tmp32, -8193 ; <i32> [#uses=1]
> - %tmp548 = or i32 %tmp547, %tmp545 ; <i32> [#uses=1]
> - store i32 %tmp548, i32* null, align 8
> - ret void
> -
> -UnifiedReturnBlock: ; preds = %cond_next57, %entry
> - ret void
> -}
>
> Modified: llvm/trunk/test/CodeGen/X86/3addr-16bit.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/3addr-16bit.ll?rev=195496&r1=195495&r2=195496&view=diff
>
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/3addr-16bit.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/3addr-16bit.ll Fri Nov 22 13:57:47 2013
> @@ -34,7 +34,7 @@ entry:
>
> ; 64BIT-LABEL: t2:
> ; 64BIT-NOT: movw %si, %ax
> -; 64BIT: decl %eax
> +; 64BIT: leal -1(%rsi), %eax
> ; 64BIT: movzwl %ax
> %0 = icmp eq i16 %k, %c ; <i1> [#uses=1]
> %1 = add i16 %k, -1 ; <i16> [#uses=3]
> @@ -59,7 +59,7 @@ entry:
>
> ; 64BIT-LABEL: t3:
> ; 64BIT-NOT: movw %si, %ax
> -; 64BIT: addl $2, %eax
> +; 64BIT: leal 2(%rsi), %eax
> %0 = add i16 %k, 2 ; <i16> [#uses=3]
> %1 = icmp eq i16 %k, %c ; <i1> [#uses=1]
> br i1 %1, label %bb, label %bb1
> @@ -82,7 +82,7 @@ entry:
>
> ; 64BIT-LABEL: t4:
> ; 64BIT-NOT: movw %si, %ax
> -; 64BIT: addl %edi, %eax
> +; 64BIT: leal (%rsi,%rdi), %eax
> %0 = add i16 %k, %c ; <i16> [#uses=3]
> %1 = icmp eq i16 %k, %c ; <i1> [#uses=1]
> br i1 %1, label %bb, label %bb1
>
> Modified: llvm/trunk/test/CodeGen/X86/codegen-prepare-extload.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/codegen-prepare-extload.ll?rev=195496&r1=195495&r2=195496&view=diff
>
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/codegen-prepare-extload.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/codegen-prepare-extload.ll Fri Nov 22 13:57:47 2013
> @@ -5,7 +5,7 @@
> ; CodeGenPrepare should move the zext into the block with the load
> ; so that SelectionDAG can select it with the load.
>
> -; CHECK: movzbl ({{%rdi|%rcx}}), %eax
> +; CHECK: movsbl ({{%rdi|%rcx}}), %eax
>
> define void @foo(i8* %p, i32* %q) {
> entry:
>
> Modified: llvm/trunk/test/CodeGen/X86/ctpop-combine.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/ctpop-combine.ll?rev=195496&r1=195495&r2=195496&view=diff
>
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/ctpop-combine.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/ctpop-combine.ll Fri Nov 22 13:57:47 2013
> @@ -35,6 +35,6 @@ define i32 @test3(i64 %x) nounwind readn
> %conv = zext i1 %cmp to i32
> ret i32 %conv
> ; CHECK-LABEL: test3:
> -; CHECK: cmpb $2
> +; CHECK: cmpl $2
> ; CHECK: ret
> }
>
> Modified: llvm/trunk/test/CodeGen/X86/memcmp.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/memcmp.ll?rev=195496&r1=195495&r2=195496&view=diff
>
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/memcmp.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/memcmp.ll Fri Nov 22 13:57:47 2013
> @@ -22,8 +22,9 @@ bb:
> return: ; preds = %entry
> ret void
> ; CHECK-LABEL: memcmp2:
> -; CHECK: movw ([[A0:%rdi|%rcx]]), %ax
> -; CHECK: cmpw ([[A1:%rsi|%rdx]]), %ax
> +; CHECK: movzwl
> +; CHECK-NEXT: movzwl
> +; CHECK-NEXT: cmpl
> ; NOBUILTIN-LABEL: memcmp2:
> ; NOBUILTIN: callq
> }
> @@ -41,7 +42,8 @@ bb:
> return: ; preds = %entry
> ret void
> ; CHECK-LABEL: memcmp2a:
> -; CHECK: cmpw $28527, ([[A0]])
> +; CHECK: movzwl
> +; CHECK-NEXT: cmpl $28527,
> }
>
>
> @@ -58,8 +60,8 @@ bb:
> return: ; preds = %entry
> ret void
> ; CHECK-LABEL: memcmp4:
> -; CHECK: movl ([[A0]]), %eax
> -; CHECK: cmpl ([[A1]]), %eax
> +; CHECK: movl
> +; CHECK-NEXT: cmpl
> }
>
> define void @memcmp4a(i8* %X, i32* nocapture %P) nounwind {
> @@ -75,7 +77,7 @@ bb:
> return: ; preds = %entry
> ret void
> ; CHECK-LABEL: memcmp4a:
> -; CHECK: cmpl $1869573999, ([[A0]])
> +; CHECK: cmpl $1869573999,
> }
>
> define void @memcmp8(i8* %X, i8* %Y, i32* nocapture %P) nounwind {
> @@ -91,8 +93,8 @@ bb:
> return: ; preds = %entry
> ret void
> ; CHECK-LABEL: memcmp8:
> -; CHECK: movq ([[A0]]), %rax
> -; CHECK: cmpq ([[A1]]), %rax
> +; CHECK: movq
> +; CHECK: cmpq
> }
>
> define void @memcmp8a(i8* %X, i32* nocapture %P) nounwind {
> @@ -108,7 +110,7 @@ bb:
> return: ; preds = %entry
> ret void
> ; CHECK-LABEL: memcmp8a:
> -; CHECK: movabsq $8029759185026510694, %rax
> -; CHECK: cmpq %rax, ([[A0]])
> +; CHECK: movabsq $8029759185026510694,
> +; CHECK: cmpq
> }
>
>
> Modified: llvm/trunk/test/CodeGen/X86/shrink-compare.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/shrink-compare.ll?rev=195496&r1=195495&r2=195496&view=diff
>
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/shrink-compare.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/shrink-compare.ll Fri Nov 22 13:57:47 2013
> @@ -2,7 +2,7 @@
>
> declare void @bar()
>
> -define void @test1(i32* nocapture %X) nounwind {
> +define void @test1(i32* nocapture %X) nounwind minsize {
> entry:
> %tmp1 = load i32* %X, align 4
> %and = and i32 %tmp1, 255
> @@ -19,7 +19,7 @@ if.end:
> ; CHECK: cmpb $47, (%{{rdi|rcx}})
> }
>
> -define void @test2(i32 %X) nounwind {
> +define void @test2(i32 %X) nounwind minsize {
> entry:
> %and = and i32 %X, 255
> %cmp = icmp eq i32 %and, 47
> @@ -35,7 +35,7 @@ if.end:
> ; CHECK: cmpb $47, %{{dil|cl}}
> }
>
> -define void @test3(i32 %X) nounwind {
> +define void @test3(i32 %X) nounwind minsize {
> entry:
> %and = and i32 %X, 255
> %cmp = icmp eq i32 %and, 255
> @@ -70,7 +70,7 @@ lor.end:
> @x = global { i8, i8, i8, i8, i8, i8, i8, i8 } { i8 1, i8 0, i8 0, i8 0, i8 1, i8 0, i8 0, i8 1 }, align 4
>
> ; PR16551
> -define void @test5(i32 %X) nounwind {
> +define void @test5(i32 %X) nounwind minsize {
> entry:
> %bf.load = load i56* bitcast ({ i8, i8, i8, i8, i8, i8, i8, i8 }* @x to i56*), align 4
> %bf.lshr = lshr i56 %bf.load, 32
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>