[llvm] r195496 - X86: Perform integer comparisons at i32 or larger.

Sean Silva silvas at purdue.edu
Mon Nov 25 22:12:18 PST 2013


On Mon, Nov 25, 2013 at 5:02 PM, Chandler Carruth <chandlerc at google.com> wrote:

> On Mon, Nov 25, 2013 at 1:48 PM, Jim Grosbach <grosbach at apple.com> wrote:
>
>>
>> On Nov 22, 2013, at 7:26 PM, Sean Silva <silvas at purdue.edu> wrote:
>>
>> Hi Jim,
>>
>>
>> On Fri, Nov 22, 2013 at 2:57 PM, Jim Grosbach <grosbach at apple.com> wrote:
>>
>>> Author: grosbach
>>> Date: Fri Nov 22 13:57:47 2013
>>> New Revision: 195496
>>>
>>> URL: http://llvm.org/viewvc/llvm-project?rev=195496&view=rev
>>> Log:
>>> X86: Perform integer comparisons at i32 or larger.
>>>
>>> Utilizing the 8 and 16 bit comparison instructions, even when an input
>>> can
>>> be folded into the comparison instruction itself, is typically not worth
>>> it.
>>>
>>
>> Could you cite experimental results backing this up? I have a hard time
>> believing that this is true with such broad generality that the only reason
>> to avoid following this advice is when optimizing for size.
>>
> FWIW, I think it's reasonable to expect reviewers of commits like this to
> be familiar with Agner's guidelines and Intel's guidelines about x86
> performance tuning. I don't think those really are necessary to cite in
> commit logs. Maybe cite them in a code comment, much as we cite the C++
> standard in code comments: not to justify the logic to the reader, but to
> save the reader some cross-referencing time.
>

The thing that initially got my attention was the *deviation* from these
guidelines: the commit message cites partial register stalls as an issue
without qualification, as though they were a fully general problem, which
they aren't on AMD.
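
To illustrate (this is the textbook case from Agner's guides, sketched by
me; it's not code from this patch):

    movb    (%rdi), %al    # writes only AL; bits 63:8 of RAX are untouched
    cmpl    $47, %eax      # reads all of EAX

On Intel P6-derived cores the cmpl has to wait for AL to be merged back
into the renamed EAX (a partial register stall). On AMD cores, which rename
the register as a unit, the movb instead carries a false dependency on
whatever last wrote RAX: no stall, just a potentially longer dependency
chain.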


>
>
>> I would expect conclusive benchmark results on a large corpus of code
>> tested on a broad spectrum of microarchitectures from both Intel and AMD
>> before making a change like this. Also, the code size impact needs to be
>> quantified; having to break out to a whole new instruction for what would
>> otherwise be an immediate has the potential for insane code-size increase
>> (clang has something like 30,000 cmpb's and it seems like almost all of
>> them have immediate memory operands).
>>
> While it seems reasonable to ask for benchmark numbers in a commit (both
> of code size and execution performance), I think you're overreacting to
> this change. It's a small change, and it was benchmarked before being
> committed.
>

As is all too common, it seems there was just a communication hiccup :)


>
>> Most AMD architectures keep the whole register together internally, and
>> so don't have partial register stalls (trading them for false
>> dependencies instead). Does this change make sense on these uarches?
>>
>>
>> Maybe? That’s a question better suited for someone more intimately
>> familiar with those arches. I wouldn’t oppose making this Intel-only if
>> there’s good reason to do so.
>>
>
> I don't see any reason to do so.
>
> There are two possible implementations of x86 subregisters in a
> register-renaming chip: either you have separate register files for
> subregisters (creating partial register stalls as Sean indicated), or you
> have a single register file and you create a dependency chain on prior ops
> to other subregisters. I don't see how either of these implementations is
> *faster* with subregisters.
>

Well, it's a potential icache and decode bandwidth hit. Apparently (as you
report below) it's not much of an issue.
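
For a sense of scale (byte counts are mine and assume no REX prefixes; a
sketch, not measured output from this patch):

    cmpb   $47, (%rdi)          # 3 bytes: the load folds into the compare

widens to:

    movzbl (%rdi), %eax         # 3 bytes: explicit zero-extending load
    cmpl   $47, %eax            # 3 bytes: compare performed at 32 bits

Roughly double for this particular pattern, so the sub-0.1% aggregate
numbers you mention below are reassuring.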


>
> The nature of the changes to x86's ISA for the 64-bit variant made it
> extremely clear that subregisters were A Bad Idea going forward. Writes to
> the 32-bit registers implicitly zero-extend to 64 bits specifically to
> avoid this. The high-byte registers are effectively gone. The list goes
> on.
>

Agreed. Unfortunately they also botched it with the SSE high bits
(necessitating vzeroupper et al.). Thankfully they've corrected this with
the new VEX encodings, which zero the upper bits.
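
Concretely (instruction choice is mine, just to illustrate the analogy):

    movaps  %xmm1, %xmm0   # legacy SSE: bits 255:128 of YMM0 are left
                           # untouched, keeping the upper half live across
                           # SSE/AVX transitions (hence vzeroupper)
    vmovaps %xmm1, %xmm0   # VEX encoded: bits 255:128 of YMM0 are zeroed,
                           # mirroring the implicit 32->64-bit zext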

-- Sean Silva


>
> And I did benchmark this for Jim on an AMD system (Istanbul, IIRC) and it
> had no negative impact.
>
>
> Curiously, I don't even think this is a valid code size optimization. The
> code size changes I saw were insignificant: way under 1%, usually under
> 0.1%.
>
>
>>
>>
>>
>>> By always performing comparisons on at least 32-bit
>>> registers, performance of the calculation chain leading to the
>>> comparison improves. Continue to use the smaller comparisons when
>>> minimizing size, as that allows better folding of loads into the
>>> comparison instructions.
>>>
>>> rdar://15386341
>>>
>>> Removed:
>>>     llvm/trunk/test/CodeGen/X86/2007-10-17-IllegalAsm.ll
>>>
>>
>> What is the rationale for completely ripping out this testcase?
>>
>>
>>> Modified:
>>>     llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>>>     llvm/trunk/test/CodeGen/X86/3addr-16bit.ll
>>>     llvm/trunk/test/CodeGen/X86/codegen-prepare-extload.ll
>>>     llvm/trunk/test/CodeGen/X86/ctpop-combine.ll
>>>     llvm/trunk/test/CodeGen/X86/memcmp.ll
>>>     llvm/trunk/test/CodeGen/X86/shrink-compare.ll
>>>
>>> Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=195496&r1=195495&r2=195496&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
>>> +++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Fri Nov 22 13:57:47
>>> 2013
>>> @@ -3419,6 +3419,24 @@ bool X86::isCalleePop(CallingConv::ID Ca
>>>    }
>>>  }
>>>
>>> +/// \brief Return true if the condition is an unsigned comparison
>>> operation.
>>> +static bool isX86CCUnsigned(unsigned X86CC) {
>>> +  switch (X86CC) {
>>> +  default: llvm_unreachable("Invalid integer condition!");
>>> +  case X86::COND_E:     return true;
>>> +  case X86::COND_G:     return false;
>>> +  case X86::COND_GE:    return false;
>>> +  case X86::COND_L:     return false;
>>> +  case X86::COND_LE:    return false;
>>> +  case X86::COND_NE:    return true;
>>> +  case X86::COND_B:     return true;
>>> +  case X86::COND_A:     return true;
>>> +  case X86::COND_BE:    return true;
>>> +  case X86::COND_AE:    return true;
>>> +  }
>>> +  llvm_unreachable("covered switch fell through?!");
>>> +}
>>> +
>>>
>>
>> Two llvm_unreachables are overkill. Thankfully we actually have a coding
>> standard for this:
>> http://llvm.org/docs/CodingStandards.html#don-t-use-default-labels-in-fully-covered-switches-over-enumerations
>>
>> -- Sean Silva
>>
>>
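
(To make the coding-standard suggestion above concrete, the shape it asks
for is roughly the following; a sketch by me, not tested against the tree:)

    /// \brief Return true if the condition is an unsigned comparison
    /// operation.
    static bool isX86CCUnsigned(unsigned X86CC) {
      switch (X86CC) {
      case X86::COND_E:
      case X86::COND_NE:
      case X86::COND_B:
      case X86::COND_A:
      case X86::COND_BE:
      case X86::COND_AE:
        return true;
      case X86::COND_G:
      case X86::COND_GE:
      case X86::COND_L:
      case X86::COND_LE:
        return false;
      }
      // No default label: anything unlisted falls through to the single
      // llvm_unreachable below.
      llvm_unreachable("Invalid integer condition!");
    }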
>>>  /// TranslateX86CC - do a one to one translation of a ISD::CondCode to
>>> the X86
>>>  /// specific condition code, returning the condition code and the
>>> LHS/RHS of the
>>>  /// comparison to make.
>>> @@ -9662,6 +9680,17 @@ SDValue X86TargetLowering::EmitCmp(SDVal
>>>    SDLoc dl(Op0);
>>>    if ((Op0.getValueType() == MVT::i8 || Op0.getValueType() == MVT::i16
>>> ||
>>>         Op0.getValueType() == MVT::i32 || Op0.getValueType() ==
>>> MVT::i64)) {
>>> +    // Do the comparison at i32 if it's smaller. This avoids subregister
>>> +    // aliasing issues. Keep the smaller reference if we're optimizing
>>> for
>>> +    // size, however, as that'll allow better folding of memory
>>> operations.
>>> +    if (Op0.getValueType() != MVT::i32 && Op0.getValueType() !=
>>> MVT::i64 &&
>>> +
>>>  !DAG.getMachineFunction().getFunction()->getAttributes().hasAttribute(
>>> +             AttributeSet::FunctionIndex, Attribute::MinSize)) {
>>> +      unsigned ExtendOp =
>>> +          isX86CCUnsigned(X86CC) ? ISD::ZERO_EXTEND : ISD::SIGN_EXTEND;
>>> +      Op0 = DAG.getNode(ExtendOp, dl, MVT::i32, Op0);
>>> +      Op1 = DAG.getNode(ExtendOp, dl, MVT::i32, Op1);
>>> +    }
>>>      // Use SUB instead of CMP to enable CSE between SUB and CMP.
>>>      SDVTList VTs = DAG.getVTList(Op0.getValueType(), MVT::i32);
>>>      SDValue Sub = DAG.getNode(X86ISD::SUB, dl, VTs,
>>>
>>> Removed: llvm/trunk/test/CodeGen/X86/2007-10-17-IllegalAsm.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/2007-10-17-IllegalAsm.ll?rev=195495&view=auto
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/CodeGen/X86/2007-10-17-IllegalAsm.ll (original)
>>> +++ llvm/trunk/test/CodeGen/X86/2007-10-17-IllegalAsm.ll (removed)
>>> @@ -1,87 +0,0 @@
>>> -; RUN: llc < %s -mtriple=x86_64-linux-gnu | grep addb | not grep x
>>> -; RUN: llc < %s -mtriple=x86_64-linux-gnu | grep cmpb | not grep x
>>> -; PR1734
>>> -
>>> -target triple = "x86_64-unknown-linux-gnu"
>>> -       %struct.CUMULATIVE_ARGS = type { i32, i32, i32, i32, i32, i32,
>>> i32, i32, i32, i32, i32, i32, i32, i32 }
>>> -       %struct.eh_status = type opaque
>>> -       %struct.emit_status = type { i32, i32, %struct.rtx_def*,
>>> %struct.rtx_def*, %struct.sequence_stack*, i32, %struct.location_t, i32,
>>> i8*, %struct.rtx_def** }
>>> -       %struct.expr_status = type { i32, i32, i32, %struct.rtx_def*,
>>> %struct.rtx_def*, %struct.rtx_def* }
>>> -       %struct.function = type { %struct.eh_status*,
>>> %struct.expr_status*, %struct.emit_status*, %struct.varasm_status*,
>>> %struct.tree_node*, %struct.tree_node*, %struct.tree_node*,
>>> %struct.tree_node*, %struct.function*, i32, i32, i32, i32,
>>> %struct.rtx_def*, %struct.CUMULATIVE_ARGS, %struct.rtx_def*,
>>> %struct.rtx_def*, %struct.initial_value_struct*, %struct.rtx_def*,
>>> %struct.rtx_def*, %struct.rtx_def*, %struct.rtx_def*, %struct.rtx_def*,
>>> %struct.rtx_def*, i8, i32, i64, %struct.tree_node*, %struct.tree_node*,
>>> %struct.rtx_def*, %struct.varray_head_tag*, %struct.temp_slot*, i32,
>>> %struct.var_refs_queue*, i32, i32, %struct.rtvec_def*, %struct.tree_node*,
>>> i32, i32, i32, %struct.machine_function*, i32, i32, i8, i8,
>>> %struct.language_function*, %struct.rtx_def*, i32, i32, i32, i32,
>>> %struct.location_t, %struct.varray_head_tag*, %struct.tree_node*,
>>> %struct.tree_node*, i8, i8, i8 }
>>> -       %struct.initial_value_struct = type opaque
>>> -       %struct.lang_decl = type opaque
>>> -       %struct.language_function = type opaque
>>> -       %struct.location_t = type { i8*, i32 }
>>> -       %struct.machine_function = type { %struct.stack_local_entry*,
>>> i8*, %struct.rtx_def*, i32, i32, i32, i32, i32 }
>>> -       %struct.rtunion = type { i8* }
>>> -       %struct.rtvec_def = type { i32, [1 x %struct.rtx_def*] }
>>> -       %struct.rtx_def = type { i16, i8, i8, %struct.u }
>>> -       %struct.sequence_stack = type { %struct.rtx_def*,
>>> %struct.rtx_def*, %struct.sequence_stack* }
>>> -       %struct.stack_local_entry = type opaque
>>> -       %struct.temp_slot = type opaque
>>> -       %struct.tree_common = type { %struct.tree_node*,
>>> %struct.tree_node*, %union.tree_ann_d*, i8, i8, i8, i8, i8 }
>>> -       %struct.tree_decl = type { %struct.tree_common,
>>> %struct.location_t, i32, %struct.tree_node*, i8, i8, i8, i8, i8, i8, i8,
>>> i8, i32, %struct.tree_decl_u1, %struct.tree_node*, %struct.tree_node*,
>>> %struct.tree_node*, %struct.tree_node*, %struct.tree_node*,
>>> %struct.tree_node*, %struct.tree_node*, %struct.tree_node*,
>>> %struct.tree_node*, %struct.tree_node*, %struct.rtx_def*, i32,
>>> %struct.tree_decl_u2, %struct.tree_node*, %struct.tree_node*, i64,
>>> %struct.lang_decl* }
>>> -       %struct.tree_decl_u1 = type { i64 }
>>> -       %struct.tree_decl_u2 = type { %struct.function* }
>>> -       %struct.tree_node = type { %struct.tree_decl }
>>> -       %struct.u = type { [1 x %struct.rtunion] }
>>> -       %struct.var_refs_queue = type { %struct.rtx_def*, i32, i32,
>>> %struct.var_refs_queue* }
>>> -       %struct.varasm_status = type opaque
>>> -       %struct.varray_data = type { [1 x i64] }
>>> -       %struct.varray_head_tag = type { i64, i64, i32, i8*,
>>> %struct.varray_data }
>>> -       %union.tree_ann_d = type opaque
>>> -
>>> -define void @layout_type(%struct.tree_node* %type) {
>>> -entry:
>>> -       %tmp32 = load i32* null, align 8                ; <i32> [#uses=3]
>>> -       %tmp3435 = trunc i32 %tmp32 to i8               ; <i8> [#uses=1]
>>> -       %tmp53 = icmp eq %struct.tree_node* null, null          ; <i1>
>>> [#uses=1]
>>> -       br i1 %tmp53, label %cond_next57, label %UnifiedReturnBlock
>>> -
>>> -cond_next57:           ; preds = %entry
>>> -       %tmp65 = and i32 %tmp32, 255            ; <i32> [#uses=1]
>>> -       switch i32 %tmp65, label %UnifiedReturnBlock [
>>> -                i32 6, label %bb140
>>> -                i32 7, label %bb140
>>> -                i32 8, label %bb140
>>> -                i32 13, label %bb478
>>> -       ]
>>> -
>>> -bb140:         ; preds = %cond_next57, %cond_next57, %cond_next57
>>> -       %tmp219 = load i32* null, align 8               ; <i32> [#uses=1]
>>> -       %tmp221222 = trunc i32 %tmp219 to i8            ; <i8> [#uses=1]
>>> -       %tmp223 = icmp eq i8 %tmp221222, 24             ; <i1> [#uses=1]
>>> -       br i1 %tmp223, label %cond_true226, label %cond_next340
>>> -
>>> -cond_true226:          ; preds = %bb140
>>> -       switch i8 %tmp3435, label %cond_true288 [
>>> -                i8 6, label %cond_next340
>>> -                i8 9, label %cond_next340
>>> -                i8 7, label %cond_next340
>>> -                i8 8, label %cond_next340
>>> -                i8 10, label %cond_next340
>>> -       ]
>>> -
>>> -cond_true288:          ; preds = %cond_true226
>>> -       unreachable
>>> -
>>> -cond_next340:          ; preds = %cond_true226, %cond_true226,
>>> %cond_true226, %cond_true226, %cond_true226, %bb140
>>> -       ret void
>>> -
>>> -bb478:         ; preds = %cond_next57
>>> -       br i1 false, label %cond_next500, label %cond_true497
>>> -
>>> -cond_true497:          ; preds = %bb478
>>> -       unreachable
>>> -
>>> -cond_next500:          ; preds = %bb478
>>> -       %tmp513 = load i32* null, align 8               ; <i32> [#uses=1]
>>> -       %tmp545 = and i32 %tmp513, 8192         ; <i32> [#uses=1]
>>> -       %tmp547 = and i32 %tmp32, -8193         ; <i32> [#uses=1]
>>> -       %tmp548 = or i32 %tmp547, %tmp545               ; <i32> [#uses=1]
>>> -       store i32 %tmp548, i32* null, align 8
>>> -       ret void
>>> -
>>> -UnifiedReturnBlock:            ; preds = %cond_next57, %entry
>>> -       ret void
>>> -}
>>>
>>> Modified: llvm/trunk/test/CodeGen/X86/3addr-16bit.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/3addr-16bit.ll?rev=195496&r1=195495&r2=195496&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/CodeGen/X86/3addr-16bit.ll (original)
>>> +++ llvm/trunk/test/CodeGen/X86/3addr-16bit.ll Fri Nov 22 13:57:47 2013
>>> @@ -34,7 +34,7 @@ entry:
>>>
>>>  ; 64BIT-LABEL:     t2:
>>>  ; 64BIT-NOT: movw %si, %ax
>>> -; 64BIT:     decl %eax
>>> +; 64BIT:     leal -1(%rsi), %eax
>>>  ; 64BIT:     movzwl %ax
>>>    %0 = icmp eq i16 %k, %c                         ; <i1> [#uses=1]
>>>    %1 = add i16 %k, -1                             ; <i16> [#uses=3]
>>> @@ -59,7 +59,7 @@ entry:
>>>
>>>  ; 64BIT-LABEL:     t3:
>>>  ; 64BIT-NOT: movw %si, %ax
>>> -; 64BIT:     addl $2, %eax
>>> +; 64BIT:     leal 2(%rsi), %eax
>>>    %0 = add i16 %k, 2                              ; <i16> [#uses=3]
>>>    %1 = icmp eq i16 %k, %c                         ; <i1> [#uses=1]
>>>    br i1 %1, label %bb, label %bb1
>>> @@ -82,7 +82,7 @@ entry:
>>>
>>>  ; 64BIT-LABEL:     t4:
>>>  ; 64BIT-NOT: movw %si, %ax
>>> -; 64BIT:     addl %edi, %eax
>>> +; 64BIT:     leal (%rsi,%rdi), %eax
>>>    %0 = add i16 %k, %c                             ; <i16> [#uses=3]
>>>    %1 = icmp eq i16 %k, %c                         ; <i1> [#uses=1]
>>>    br i1 %1, label %bb, label %bb1
>>>
>>> Modified: llvm/trunk/test/CodeGen/X86/codegen-prepare-extload.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/codegen-prepare-extload.ll?rev=195496&r1=195495&r2=195496&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/CodeGen/X86/codegen-prepare-extload.ll (original)
>>> +++ llvm/trunk/test/CodeGen/X86/codegen-prepare-extload.ll Fri Nov 22
>>> 13:57:47 2013
>>> @@ -5,7 +5,7 @@
>>>  ; CodeGenPrepare should move the zext into the block with the load
>>>  ; so that SelectionDAG can select it with the load.
>>>
>>> -; CHECK: movzbl ({{%rdi|%rcx}}), %eax
>>> +; CHECK: movsbl ({{%rdi|%rcx}}), %eax
>>>
>>>  define void @foo(i8* %p, i32* %q) {
>>>  entry:
>>>
>>> Modified: llvm/trunk/test/CodeGen/X86/ctpop-combine.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/ctpop-combine.ll?rev=195496&r1=195495&r2=195496&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/CodeGen/X86/ctpop-combine.ll (original)
>>> +++ llvm/trunk/test/CodeGen/X86/ctpop-combine.ll Fri Nov 22 13:57:47 2013
>>> @@ -35,6 +35,6 @@ define i32 @test3(i64 %x) nounwind readn
>>>    %conv = zext i1 %cmp to i32
>>>    ret i32 %conv
>>>  ; CHECK-LABEL: test3:
>>> -; CHECK: cmpb $2
>>> +; CHECK: cmpl $2
>>>  ; CHECK: ret
>>>  }
>>>
>>> Modified: llvm/trunk/test/CodeGen/X86/memcmp.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/memcmp.ll?rev=195496&r1=195495&r2=195496&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/CodeGen/X86/memcmp.ll (original)
>>> +++ llvm/trunk/test/CodeGen/X86/memcmp.ll Fri Nov 22 13:57:47 2013
>>> @@ -22,8 +22,9 @@ bb:
>>>  return:                                           ; preds = %entry
>>>    ret void
>>>  ; CHECK-LABEL: memcmp2:
>>> -; CHECK: movw    ([[A0:%rdi|%rcx]]), %ax
>>> -; CHECK: cmpw    ([[A1:%rsi|%rdx]]), %ax
>>> +; CHECK: movzwl
>>> +; CHECK-NEXT: movzwl
>>> +; CHECK-NEXT: cmpl
>>>  ; NOBUILTIN-LABEL: memcmp2:
>>>  ; NOBUILTIN: callq
>>>  }
>>> @@ -41,7 +42,8 @@ bb:
>>>  return:                                           ; preds = %entry
>>>    ret void
>>>  ; CHECK-LABEL: memcmp2a:
>>> -; CHECK: cmpw    $28527, ([[A0]])
>>> +; CHECK: movzwl
>>> +; CHECK-NEXT: cmpl    $28527,
>>>  }
>>>
>>>
>>> @@ -58,8 +60,8 @@ bb:
>>>  return:                                           ; preds = %entry
>>>    ret void
>>>  ; CHECK-LABEL: memcmp4:
>>> -; CHECK: movl    ([[A0]]), %eax
>>> -; CHECK: cmpl    ([[A1]]), %eax
>>> +; CHECK: movl
>>> +; CHECK-NEXT: cmpl
>>>  }
>>>
>>>  define void @memcmp4a(i8* %X, i32* nocapture %P) nounwind {
>>> @@ -75,7 +77,7 @@ bb:
>>>  return:                                           ; preds = %entry
>>>    ret void
>>>  ; CHECK-LABEL: memcmp4a:
>>> -; CHECK: cmpl $1869573999, ([[A0]])
>>> +; CHECK: cmpl $1869573999,
>>>  }
>>>
>>>  define void @memcmp8(i8* %X, i8* %Y, i32* nocapture %P) nounwind {
>>> @@ -91,8 +93,8 @@ bb:
>>>  return:                                           ; preds = %entry
>>>    ret void
>>>  ; CHECK-LABEL: memcmp8:
>>> -; CHECK: movq    ([[A0]]), %rax
>>> -; CHECK: cmpq    ([[A1]]), %rax
>>> +; CHECK: movq
>>> +; CHECK: cmpq
>>>  }
>>>
>>>  define void @memcmp8a(i8* %X, i32* nocapture %P) nounwind {
>>> @@ -108,7 +110,7 @@ bb:
>>>  return:                                           ; preds = %entry
>>>    ret void
>>>  ; CHECK-LABEL: memcmp8a:
>>> -; CHECK: movabsq $8029759185026510694, %rax
>>> -; CHECK: cmpq  %rax, ([[A0]])
>>> +; CHECK: movabsq $8029759185026510694,
>>> +; CHECK: cmpq
>>>  }
>>>
>>>
>>> Modified: llvm/trunk/test/CodeGen/X86/shrink-compare.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/shrink-compare.ll?rev=195496&r1=195495&r2=195496&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/CodeGen/X86/shrink-compare.ll (original)
>>> +++ llvm/trunk/test/CodeGen/X86/shrink-compare.ll Fri Nov 22 13:57:47
>>> 2013
>>> @@ -2,7 +2,7 @@
>>>
>>>  declare void @bar()
>>>
>>> -define void @test1(i32* nocapture %X) nounwind {
>>> +define void @test1(i32* nocapture %X) nounwind minsize {
>>>  entry:
>>>    %tmp1 = load i32* %X, align 4
>>>    %and = and i32 %tmp1, 255
>>> @@ -19,7 +19,7 @@ if.end:
>>>  ; CHECK: cmpb $47, (%{{rdi|rcx}})
>>>  }
>>>
>>> -define void @test2(i32 %X) nounwind {
>>> +define void @test2(i32 %X) nounwind minsize {
>>>  entry:
>>>    %and = and i32 %X, 255
>>>    %cmp = icmp eq i32 %and, 47
>>> @@ -35,7 +35,7 @@ if.end:
>>>  ; CHECK: cmpb $47, %{{dil|cl}}
>>>  }
>>>
>>> -define void @test3(i32 %X) nounwind {
>>> +define void @test3(i32 %X) nounwind minsize {
>>>  entry:
>>>    %and = and i32 %X, 255
>>>    %cmp = icmp eq i32 %and, 255
>>> @@ -70,7 +70,7 @@ lor.end:
>>>  @x = global { i8, i8, i8, i8, i8, i8, i8, i8 } { i8 1, i8 0, i8 0, i8
>>> 0, i8 1, i8 0, i8 0, i8 1 }, align 4
>>>
>>>  ; PR16551
>>> -define void @test5(i32 %X) nounwind {
>>> +define void @test5(i32 %X) nounwind minsize {
>>>  entry:
>>>    %bf.load = load i56* bitcast ({ i8, i8, i8, i8, i8, i8, i8, i8 }* @x
>>> to i56*), align 4
>>>    %bf.lshr = lshr i56 %bf.load, 32
>>>
>>>
>>>
>>
>>
>>
>>
>>
>