[llvm] r244601 - [X86] Allow merging of immediates within a basic block for code size savings

Sean Silva via llvm-commits llvm-commits at lists.llvm.org
Wed Aug 12 17:33:02 PDT 2015


On Wed, Aug 12, 2015 at 10:41 AM, Sanjay Patel <spatel at rotateright.com>
wrote:

> Hi Sean -
>
> I think your example shows 3 possible improvements. Let me know if this
> looks right, and I'll file some bugs to track them:
>
> 1. When addressing multiple places beyond an imm8, load a base into a
> register and use SIB addressing to bring the offsets within 8 bits. So the
> example would end up looking something like this:
>    movl   $376, %ebx
>    movq   $0, (%rax, %rbx)
>    movq   $0, 8(%rax, %rbx)
>    movq   $0, 16(%rax, %rbx)
> ...
>
>
Here's a quick overview of alternatives:

33 bytes (each mov to mem pays an imm32 for the $0 and a disp32 for the offset)

        movq    $0, 376(%rax)           ## encoding: [0x48,0xc7,0x80,0x78,0x01,0x00,0x00,0x00,0x00,0x00,0x00]
        movq    $0, 384(%rax)           ## encoding: [0x48,0xc7,0x80,0x80,0x01,0x00,0x00,0x00,0x00,0x00,0x00]
        movq    $0, 392(%rax)           ## encoding: [0x48,0xc7,0x80,0x88,0x01,0x00,0x00,0x00,0x00,0x00,0x00]

31 bytes

        movl    $376, %ebx              ## encoding: [0xbb,0x78,0x01,0x00,0x00]
        movq    $0, (%rax,%rbx)         ## encoding: [0x48,0xc7,0x04,0x18,0x00,0x00,0x00,0x00]
        movq    $0, 8(%rax,%rbx)        ## encoding: [0x48,0xc7,0x44,0x18,0x08,0x00,0x00,0x00,0x00]
        movq    $0, 16(%rax,%rbx)       ## encoding: [0x48,0xc7,0x44,0x18,0x10,0x00,0x00,0x00,0x00]

23 bytes

        xorl    %ebx, %ebx              ## encoding: [0x31,0xdb]

        movq    %rbx, 376(%rax)         ## encoding: [0x48,0x89,0x98,0x78,0x01,0x00,0x00]
        movq    %rbx, 384(%rax)         ## encoding: [0x48,0x89,0x98,0x80,0x01,0x00,0x00]
        movq    %rbx, 392(%rax)         ## encoding: [0x48,0x89,0x98,0x88,0x01,0x00,0x00]

21 bytes

        xorl    %ebx, %ebx              ## encoding: [0x31,0xdb]

        movl    $376, %ecx              ## encoding: [0xb9,0x78,0x01,0x00,0x00]
        movq    %rbx, (%rax,%rcx)       ## encoding: [0x48,0x89,0x1c,0x08]
        movq    %rbx, 8(%rax,%rcx)      ## encoding: [0x48,0x89,0x5c,0x08,0x08]
        movq    %rbx, 16(%rax,%rcx)     ## encoding: [0x48,0x89,0x5c,0x08,0x10]

As you can see, keeping the immediate $0 in a register is always a win for
size: the 2-byte xor reg,reg pays for itself immediately, since each store
drops its 4-byte imm32 when it switches to the reg form. With just 3 of the
evil huge movs, it is hard to amortize the cost of a mov r32,imm32 (5 bytes)
for the Index of the SIB, but we barely squeeze by.


Not sure we can do this unless optimizing for size. See below.
>
> 2. When storing the same immediate multiple times, use a reg to hold the
> immediate even when not optimizing specifically for size (this was
> mentioned as a follow-on in D11363):
>    xorl   %ebx, %ebx            <--- xor is only for zero; in the general
> case, this would be a mov (load immediate)
>    movq   %rbx, 376(%rax)
>    movq   %rbx, 384(%rax)
>    movq   %rbx, 392(%rax)
> ...
>
> I don't think this qualifies as a no-brainer when not optimizing for size.
> If it's an xor to make a zero, then it's *almost* free (handled by the
> renamer) on any recent OOO x86, but that xor still requires some decode/uop
> resources.
>

I would double-check this on Jaguar. It may use a mov-imm micro-op.

Jaguar can decode 2 instructions per cycle (but see below), and since `mov
[mem],imm` has a throughput of 1 (see Agner), we should not be decode
bottlenecked.
For reference, `mov [mem],imm` decodes into 2 micro-ops (see "Table 1.
Typical Instruction Mappings" in [SOG]) whereas `mov [mem],reg` is only 1
micro-op, so using a reg is *preferable*: it amortizes the cost of the
`mov-imm` micro-op across the stores.

[SOG] http://support.amd.com/TechDocs/52128_16h_Software_Opt_Guide.zip

On Jaguar at least, I believe that >=4 11-byte instructions in a row like
this will cause Jaguar's poor little decoder to fail to decode 2 instructions
per cycle on at least one of them, so that should amortize the decode cost
of the xor anyway. Intel chips are usually not decode limited like this
though. Simon
Whittaker has more information on exactly what circumstances cause Jaguar
to fail to decode, but it would be nice if we could do a peephole
across the text checking every pair of adjacent commutable instructions to
avoid the situations where we would fail to decode (IIRC, basically if the
opcode byte of the second instruction falls more than 5 bytes into the
second 16-byte window).
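
If that recollection is accurate, the per-pair test itself would be cheap.
Here is a minimal sketch (entirely hypothetical names, and the 5-byte rule
is only as reliable as the IIRC above):

#include <cstdint>

// Hypothetical check for one pair of adjacent instructions: does the second
// instruction's opcode byte land more than 5 bytes into the 16-byte fetch
// window after the one holding the first instruction? FirstAddr is the first
// instruction's address; OpcodeOffset is the offset of the opcode byte
// within the second instruction (i.e. its number of prefix bytes).
bool mayStallDualDecode(uint64_t FirstAddr, unsigned FirstSize,
                        unsigned OpcodeOffset) {
  uint64_t OpcodeAddr = FirstAddr + FirstSize + OpcodeOffset;
  uint64_t NextWindow = (FirstAddr & ~uint64_t(15)) + 16;
  return OpcodeAddr > NextWindow + 5;
}

The peephole would then try commuting any adjacent commutable pair for which
this returns true.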


> In the general case where we use a mov to load the reg with the immediate,
> this argument gets tougher. We need a perf heuristic that says 'saving X
> bytes of instructions is worth adding Y extra instructions of type Z'?
>

Thankfully, in almost all the cases I've seen the immediate is 0, so I think
we can get the lion's share of the cases by assuming only 0.
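
Put differently, the size side of the heuristic is simple arithmetic; here
is a rough sketch (the names and the 2-vs-5-byte setup costs are my
assumptions, from xor r,r vs. mov r32,imm32):

// Hoisting an immediate into a register is a size win when the imm bytes
// dropped from each use outweigh the one-time materialization cost.
bool sizeWinToHoist(unsigned NumUses, unsigned ImmBytes, bool ImmIsZero) {
  unsigned SetupBytes = ImmIsZero ? 2 : 5; // xor r,r vs. mov r32, imm32
  return NumUses * ImmBytes > SetupBytes;  // two imm32 uses already win
}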


>
> 3. If we really want to make this example smaller and faster, we should
> merge stores like we do in DAGCombiner's MergeConsecutiveStores:
>    vxorps   %ymm0, %ymm0, %ymm0   <--- assumes we have AVX for 32-byte
> ops; if not, SSE for 16-byte
>    vmovups   %ymm0, 376(%rax)
> ...
>
>
This seems like it is always a win.


> Are the stores in the example created too late for the DAGCombiner? Do we
> need to repeat some subset of that merge functionality in a machine pass?
>

I'm not an expert in this backend stuff, but the decisions we need to make
here depend on pretty low-level details (e.g. exploiting proximity to a call
to know certain registers are available), so it probably makes sense to do
this at the MI layer.

-- Sean Silva


>
>
>
> On Wed, Aug 12, 2015 at 2:00 AM, Sean Silva <chisophugis at gmail.com> wrote:
>
>> This may be interesting even outside of optimizing for size. For example,
>> I see wonderful fragments like the following all over my binaries:
>>
>>    20b7f:  48 c7 80 78 01 00 00 00 00 00 00        movq    $0, 376(%rax)
>>    20b8a:  48 c7 80 80 01 00 00 00 00 00 00        movq    $0, 384(%rax)
>>    20b95:  48 c7 80 88 01 00 00 00 00 00 00        movq    $0, 392(%rax)
>>    20ba0:  48 c7 80 90 01 00 00 00 00 00 00        movq    $0, 400(%rax)
>>    20bab:  48 c7 80 a0 01 00 00 00 00 00 00        movq    $0, 416(%rax)
>>    20bb6:  48 c7 80 a8 09 00 00 00 00 00 00        movq    $0, 2472(%rax)
>>
>> Yes, 11-byte instructions just to zero out 8 bytes of memory :)
>> If you omit the frame pointer and use stack slots beyond an imm8, you pay
>> an extra byte for the SIB.
>>
>> As observed in this patch, this pattern often occurs in close proximity
>> to calls (the register is usually rax for a return value or rbp/rsp when
>> preparing stack-passed arguments for a call), so there is often a
>> no-brainer choice of register to xor and use r,m forms.
>>
>> I have seen functions for which more than half of the text size is due to
>> these sorts of instructions, getting into pretty serious icache threat
>> territory and worth looking at even outside optsize.
>>
>> -- Sean Silva
>>
>>
>>
>>
>> On Tue, Aug 11, 2015 at 7:10 AM, Michael Kuperstein via llvm-commits <
>> llvm-commits at lists.llvm.org> wrote:
>>
>>> Author: mkuper
>>> Date: Tue Aug 11 09:10:58 2015
>>> New Revision: 244601
>>>
>>> URL: http://llvm.org/viewvc/llvm-project?rev=244601&view=rev
>>> Log:
>>> [X86] Allow merging of immediates within a basic block for code size
>>> savings
>>>
>>> First step in preventing immediates that occur more than once within a
>>> single basic block from being pulled into their users, in order to
>>> prevent unnecessarily large instruction encodings. Currently enabled
>>> only when optimizing for size.
>>>
>>> Patch by: zia.ansari at intel.com
>>> Differential Revision: http://reviews.llvm.org/D11363
>>>
>>> Added:
>>>     llvm/trunk/test/CodeGen/X86/immediate_merging.ll
>>> Removed:
>>>     llvm/trunk/test/CodeGen/X86/remat-invalid-liveness.ll
>>> Modified:
>>>     llvm/trunk/lib/Target/X86/X86ISelDAGToDAG.cpp
>>>     llvm/trunk/lib/Target/X86/X86InstrArithmetic.td
>>>     llvm/trunk/lib/Target/X86/X86InstrInfo.td
>>>
>>> Modified: llvm/trunk/lib/Target/X86/X86ISelDAGToDAG.cpp
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelDAGToDAG.cpp?rev=244601&r1=244600&r2=244601&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/lib/Target/X86/X86ISelDAGToDAG.cpp (original)
>>> +++ llvm/trunk/lib/Target/X86/X86ISelDAGToDAG.cpp Tue Aug 11 09:10:58 2015
>>> @@ -283,6 +283,82 @@ namespace {
>>>          Segment = CurDAG->getRegister(0, MVT::i32);
>>>      }
>>>
>>> +    // Utility function to determine whether we should avoid selecting
>>> +    // immediate forms of instructions for better code size or not.
>>> +    // At a high level, we'd like to avoid such instructions when
>>> +    // we have similar constants used within the same basic block
>>> +    // that can be kept in a register.
>>> +    //
>>> +    bool shouldAvoidImmediateInstFormsForSize(SDNode *N) const {
>>> +      uint32_t UseCount = 0;
>>> +
>>> +      // Do not want to hoist if we're not optimizing for size.
>>> +      // TODO: We'd like to remove this restriction.
>>> +      // See the comment in X86InstrInfo.td for more info.
>>> +      if (!OptForSize)
>>> +        return false;
>>> +
>>> +      // Walk all the users of the immediate.
>>> +      for (SDNode::use_iterator UI = N->use_begin(),
>>> +           UE = N->use_end(); (UI != UE) && (UseCount < 2); ++UI) {
>>> +
>>> +        SDNode *User = *UI;
>>> +
>>> +        // This user is already selected. Count it as a legitimate use and
>>> +        // move on.
>>> +        if (User->isMachineOpcode()) {
>>> +          UseCount++;
>>> +          continue;
>>> +        }
>>> +
>>> +        // We want to count stores of immediates as real uses.
>>> +        if (User->getOpcode() == ISD::STORE &&
>>> +            User->getOperand(1).getNode() == N) {
>>> +          UseCount++;
>>> +          continue;
>>> +        }
>>> +
>>> +        // We don't currently match users that have > 2 operands (except
>>> +        // for stores, which are handled above).
>>> +        // Those instructions won't match in ISEL, for now, and would
>>> +        // be counted incorrectly.
>>> +        // This may change in the future as we add additional instruction
>>> +        // types.
>>> +        if (User->getNumOperands() != 2)
>>> +          continue;
>>> +
>>> +        // Immediates that are used for offsets as part of stack
>>> +        // manipulation should be left alone. These are typically
>>> +        // used to indicate SP offsets for argument passing and
>>> +        // will get pulled into stores/pushes (implicitly).
>>> +        if (User->getOpcode() == X86ISD::ADD ||
>>> +            User->getOpcode() == ISD::ADD    ||
>>> +            User->getOpcode() == X86ISD::SUB ||
>>> +            User->getOpcode() == ISD::SUB) {
>>> +
>>> +          // Find the other operand of the add/sub.
>>> +          SDValue OtherOp = User->getOperand(0);
>>> +          if (OtherOp.getNode() == N)
>>> +            OtherOp = User->getOperand(1);
>>> +
>>> +          // Don't count if the other operand is SP.
>>> +          RegisterSDNode *RegNode;
>>> +          if (OtherOp->getOpcode() == ISD::CopyFromReg &&
>>> +              (RegNode = dyn_cast_or_null<RegisterSDNode>(
>>> +                 OtherOp->getOperand(1).getNode())))
>>> +            if ((RegNode->getReg() == X86::ESP) ||
>>> +                (RegNode->getReg() == X86::RSP))
>>> +              continue;
>>> +        }
>>> +
>>> +        // ... otherwise, count this and move on.
>>> +        UseCount++;
>>> +      }
>>> +
>>> +      // If we have more than 1 use, then recommend for hoisting.
>>> +      return (UseCount > 1);
>>> +    }
>>> +
>>>      /// getI8Imm - Return a target constant with the specified value, of type
>>>      /// i8.
>>>      inline SDValue getI8Imm(unsigned Imm, SDLoc DL) {
>>>
>>> Modified: llvm/trunk/lib/Target/X86/X86InstrArithmetic.td
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86InstrArithmetic.td?rev=244601&r1=244600&r2=244601&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/lib/Target/X86/X86InstrArithmetic.td (original)
>>> +++ llvm/trunk/lib/Target/X86/X86InstrArithmetic.td Tue Aug 11 09:10:58 2015
>>> @@ -615,14 +615,14 @@ class X86TypeInfo<ValueType vt, string i
>>>  def invalid_node : SDNode<"<<invalid_node>>", SDTIntLeaf,[],"<<invalid_node>>">;
>>>
>>>
>>> -def Xi8  : X86TypeInfo<i8 , "b", GR8 , loadi8 , i8mem ,
>>> -                       Imm8 , i8imm ,    imm,          i8imm   , invalid_node,
>>> +def Xi8  : X86TypeInfo<i8, "b", GR8, loadi8, i8mem,
>>> +                       Imm8, i8imm, imm8_su, i8imm, invalid_node,
>>>                         0, OpSizeFixed, 0>;
>>>  def Xi16 : X86TypeInfo<i16, "w", GR16, loadi16, i16mem,
>>> -                       Imm16, i16imm,    imm,          i16i8imm, i16immSExt8,
>>> +                       Imm16, i16imm, imm16_su, i16i8imm, i16immSExt8_su,
>>>                         1, OpSize16, 0>;
>>>  def Xi32 : X86TypeInfo<i32, "l", GR32, loadi32, i32mem,
>>> -                       Imm32, i32imm,    imm,          i32i8imm, i32immSExt8,
>>> +                       Imm32, i32imm, imm32_su, i32i8imm, i32immSExt8_su,
>>>                         1, OpSize32, 0>;
>>>  def Xi64 : X86TypeInfo<i64, "q", GR64, loadi64, i64mem,
>>>                         Imm32S, i64i32imm, i64immSExt32, i64i8imm, i64immSExt8,
>>>
>>> Modified: llvm/trunk/lib/Target/X86/X86InstrInfo.td
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86InstrInfo.td?rev=244601&r1=244600&r2=244601&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/lib/Target/X86/X86InstrInfo.td (original)
>>> +++ llvm/trunk/lib/Target/X86/X86InstrInfo.td Tue Aug 11 09:10:58 2015
>>> @@ -873,6 +873,40 @@ def i16immSExt8  : ImmLeaf<i16, [{ retur
>>>  def i32immSExt8  : ImmLeaf<i32, [{ return Imm == (int8_t)Imm; }]>;
>>>  def i64immSExt8  : ImmLeaf<i64, [{ return Imm == (int8_t)Imm; }]>;
>>>
>>> +// If we have multiple users of an immediate, it's much smaller to reuse
>>> +// the register, rather than encode the immediate in every instruction.
>>> +// This has the risk of increasing register pressure from stretched live
>>> +// ranges, however, the immediates should be trivial to rematerialize by
>>> +// the RA in the event of high register pressure.
>>> +// TODO : This is currently enabled for stores and binary ops. There are more
>>> +// cases for which this can be enabled, though this catches the bulk of the
>>> +// issues.
>>> +// TODO2 : This should really also be enabled under O2, but there's currently
>>> +// an issue with RA where we don't pull the constants into their users
>>> +// when we rematerialize them. I'll follow-up on enabling O2 after we fix that
>>> +// issue.
>>> +// TODO3 : This is currently limited to single basic blocks (DAG creation
>>> +// pulls block immediates to the top and merges them if necessary).
>>> +// Eventually, it would be nice to allow ConstantHoisting to merge constants
>>> +// globally for potentially added savings.
>>> +//
>>> +def imm8_su : PatLeaf<(i8 imm), [{
>>> +    return !shouldAvoidImmediateInstFormsForSize(N);
>>> +}]>;
>>> +def imm16_su : PatLeaf<(i16 imm), [{
>>> +    return !shouldAvoidImmediateInstFormsForSize(N);
>>> +}]>;
>>> +def imm32_su : PatLeaf<(i32 imm), [{
>>> +    return !shouldAvoidImmediateInstFormsForSize(N);
>>> +}]>;
>>> +
>>> +def i16immSExt8_su : PatLeaf<(i16immSExt8), [{
>>> +    return !shouldAvoidImmediateInstFormsForSize(N);
>>> +}]>;
>>> +def i32immSExt8_su : PatLeaf<(i32immSExt8), [{
>>> +    return !shouldAvoidImmediateInstFormsForSize(N);
>>> +}]>;
>>> +
>>>
>>>  def i64immSExt32 : ImmLeaf<i64, [{ return Imm == (int32_t)Imm; }]>;
>>>
>>> @@ -1283,13 +1317,13 @@ def MOV32ri_alt : Ii32<0xC7, MRM0r, (out
>>>  let SchedRW = [WriteStore] in {
>>>  def MOV8mi  : Ii8 <0xC6, MRM0m, (outs), (ins i8mem :$dst, i8imm :$src),
>>>                     "mov{b}\t{$src, $dst|$dst, $src}",
>>> -                   [(store (i8 imm:$src), addr:$dst)], IIC_MOV_MEM>;
>>> +                   [(store (i8 imm8_su:$src), addr:$dst)], IIC_MOV_MEM>;
>>>  def MOV16mi : Ii16<0xC7, MRM0m, (outs), (ins i16mem:$dst, i16imm:$src),
>>>                     "mov{w}\t{$src, $dst|$dst, $src}",
>>> -                   [(store (i16 imm:$src), addr:$dst)], IIC_MOV_MEM>, OpSize16;
>>> +                   [(store (i16 imm16_su:$src), addr:$dst)], IIC_MOV_MEM>, OpSize16;
>>>  def MOV32mi : Ii32<0xC7, MRM0m, (outs), (ins i32mem:$dst, i32imm:$src),
>>>                     "mov{l}\t{$src, $dst|$dst, $src}",
>>> -                   [(store (i32 imm:$src), addr:$dst)], IIC_MOV_MEM>, OpSize32;
>>> +                   [(store (i32 imm32_su:$src), addr:$dst)], IIC_MOV_MEM>, OpSize32;
>>>  def MOV64mi32 : RIi32S<0xC7, MRM0m, (outs), (ins i64mem:$dst, i64i32imm:$src),
>>>                         "mov{q}\t{$src, $dst|$dst, $src}",
>>>                         [(store i64immSExt32:$src, addr:$dst)], IIC_MOV_MEM>;
>>>
>>> Added: llvm/trunk/test/CodeGen/X86/immediate_merging.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/immediate_merging.ll?rev=244601&view=auto
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/CodeGen/X86/immediate_merging.ll (added)
>>> +++ llvm/trunk/test/CodeGen/X86/immediate_merging.ll Tue Aug 11 09:10:58 2015
>>> @@ -0,0 +1,82 @@
>>> +; RUN: llc -o - -mtriple=i386-unknown-linux-gnu < %s | FileCheck %s
>>> +; RUN: llc -o - -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s
>>> +
>>> +@a = common global i32 0, align 4
>>> +@b = common global i32 0, align 4
>>> +@c = common global i32 0, align 4
>>> +@e = common global i32 0, align 4
>>> +@x = common global i32 0, align 4
>>> +@f = common global i32 0, align 4
>>> +@h = common global i32 0, align 4
>>> +@i = common global i32 0, align 4
>>> +
>>> +; Test -Os to make sure immediates with multiple users don't get pulled
>>> +; into instructions.
>>> +define i32 @foo() optsize {
>>> +; CHECK-LABEL: foo:
>>> +; CHECK: movl $1234, [[R1:%[a-z]+]]
>>> +; CHECK-NOT: movl $1234, a
>>> +; CHECK-NOT: movl $1234, b
>>> +; CHECK-NOT: movl $12, c
>>> +; CHECK-NOT: cmpl $12, e
>>> +; CHECK: movl [[R1]], a
>>> +; CHECK: movl [[R1]], b
>>> +
>>> +entry:
>>> +  store i32 1234, i32* @a
>>> +  store i32 1234, i32* @b
>>> +  store i32 12, i32* @c
>>> +  %0 = load i32, i32* @e
>>> +  %cmp = icmp eq i32 %0, 12
>>> +  br i1 %cmp, label %if.then, label %if.end
>>> +
>>> +if.then:                                          ; preds = %entry
>>> +  store i32 1, i32* @x
>>> +  br label %if.end
>>> +
>>> +; New block. Make sure 1234 isn't live across basic blocks from before.
>>> +; CHECK: movl $1234, f
>>> +; CHECK: movl $555, [[R3:%[a-z]+]]
>>> +; CHECK-NOT: movl $555, h
>>> +; CHECK-NOT: addl $555, i
>>> +; CHECK: movl [[R3]], h
>>> +; CHECK: addl [[R3]], i
>>> +
>>> +if.end:                                           ; preds = %if.then, %entry
>>> +  store i32 1234, i32* @f
>>> +  store i32 555, i32* @h
>>> +  %1 = load i32, i32* @i
>>> +  %add1 = add nsw i32 %1, 555
>>> +  store i32 %add1, i32* @i
>>> +  ret i32 0
>>> +}
>>> +
>>> +; Test -O2 to make sure that all immediates get pulled into their users.
>>> +define i32 @foo2() {
>>> +; CHECK-LABEL: foo2:
>>> +; CHECK: movl $1234, a
>>> +; CHECK: movl $1234, b
>>> +
>>> +entry:
>>> +  store i32 1234, i32* @a
>>> +  store i32 1234, i32* @b
>>> +
>>> +  ret i32 0
>>> +}
>>> +
>>> +declare void @llvm.memset.p0i8.i32(i8* nocapture, i8, i32, i32, i1) #1
>>> +
>>> +@AA = common global [100 x i8] zeroinitializer, align 1
>>> +
>>> +; memset gets lowered in DAG. Constant merging should hoist all the
>>> +; immediates used to store to the individual memory locations. Make
>>> +; sure we don't directly store the immediates.
>>> +define void @foomemset() optsize {
>>> +; CHECK-LABEL: foomemset:
>>> +; CHECK-NOT: movl ${{.*}}, AA
>>> +; CHECK: mov{{l|q}} %{{e|r}}ax, AA
>>> +
>>> +entry:
>>> +  call void @llvm.memset.p0i8.i32(i8* getelementptr inbounds ([100 x i8], [100 x i8]* @AA, i32 0, i32 0), i8 33, i32 24, i32 1, i1 false)
>>> +  ret void
>>> +}
>>>
>>> Removed: llvm/trunk/test/CodeGen/X86/remat-invalid-liveness.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/remat-invalid-liveness.ll?rev=244600&view=auto
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/CodeGen/X86/remat-invalid-liveness.ll (original)
>>> +++ llvm/trunk/test/CodeGen/X86/remat-invalid-liveness.ll (removed)
>>> @@ -1,85 +0,0 @@
>>> -; RUN: llc %s -mcpu=core2 -o - | FileCheck %s
>>> -; This test was failing while tracking the liveness in the register scavenger
>>> -; during the branch folding pass. The allocation of the subregisters was
>>> -; incorrect.
>>> -; I.e., the faulty pattern looked like:
>>> -; CH = movb 64
>>> -; ECX = movl 3 <- CH was killed here.
>>> -; CH = subb CH, ...
>>> -;
>>> -; This reduced test case triggers the crash before the fix, but does not
>>> -; strictly speaking check that the resulting code is correct.
>>> -; To check that the code is actually correct we would need to check the
>>> -; liveness of the produced code.
>>> -;
>>> -; Currently, we check that after ECX = movl 3, we do not have subb CH,
>>> -; whereas CH could have been redefined in between and that would have been
>>> -; totally fine.
>>> -; <rdar://problem/16582185>
>>> -target datalayout = "e-m:o-p:32:32-f64:32:64-f80:128-n8:16:32-S128"
>>> -target triple = "i386-apple-macosx10.9"
>>> -
>>> -%struct.A = type { %struct.B, %struct.C, %struct.D*, [1 x i8*] }
>>> -%struct.B = type { i32, [4 x i8] }
>>> -%struct.C = type { i128 }
>>> -%struct.D = type { {}*, [0 x i32] }
>>> -%union.E = type { i32 }
>>> -
>>> -; CHECK-LABEL: __XXX1:
>>> -; CHECK: movl $3, %ecx
>>> -; CHECK-NOT: subb %{{[a-z]+}}, %ch
>>> -; Function Attrs: nounwind optsize ssp
>>> -define fastcc void @__XXX1(%struct.A* %ht) #0 {
>>> -entry:
>>> -  %const72 = bitcast i128 72 to i128
>>> -  %const3 = bitcast i128 3 to i128
>>> -  switch i32 undef, label %if.end196 [
>>> -    i32 1, label %sw.bb.i
>>> -    i32 3, label %sw.bb2.i
>>> -  ]
>>> -
>>> -sw.bb.i:                                          ; preds = %entry
>>> -  %call.i.i.i = tail call i32 undef(%struct.A* %ht, i8 zeroext 22, i32 undef, i32 0, %struct.D* undef)
>>> -  %bf.load.i.i = load i128, i128* undef, align 4
>>> -  %bf.lshr.i.i = lshr i128 %bf.load.i.i, %const72
>>> -  %shl1.i.i = shl nuw nsw i128 %bf.lshr.i.i, 8
>>> -  %shl.i.i = trunc i128 %shl1.i.i to i32
>>> -  br i1 undef, label %cond.false10.i.i, label %__XXX2.exit.i.i
>>> -
>>> -__XXX2.exit.i.i:                    ; preds = %sw.bb.i
>>> -  %extract11.i.i.i = lshr i128 %bf.load.i.i, %const3
>>> -  %extract.t12.i.i.i = trunc i128 %extract11.i.i.i to i32
>>> -  %bf.cast7.i.i.i = and i32 %extract.t12.i.i.i, 3
>>> -  %arrayidx.i.i.i = getelementptr inbounds %struct.A, %struct.A* %ht, i32 0, i32 3, i32 %bf.cast7.i.i.i
>>> -  br label %cond.end12.i.i
>>> -
>>> -cond.false10.i.i:                                 ; preds = %sw.bb.i
>>> -  %arrayidx.i6.i.i = getelementptr inbounds %struct.A, %struct.A* %ht, i32 0, i32 3, i32 0
>>> -  br label %cond.end12.i.i
>>> -
>>> -cond.end12.i.i:                                   ; preds = %cond.false10.i.i, %__XXX2.exit.i.i
>>> -  %.sink.in.i.i = phi i8** [ %arrayidx.i.i.i, %__XXX2.exit.i.i ], [ %arrayidx.i6.i.i, %cond.false10.i.i ]
>>> -  %.sink.i.i = load i8*, i8** %.sink.in.i.i, align 4
>>> -  %tmp = bitcast i8* %.sink.i.i to %union.E*
>>> -  br i1 undef, label %for.body.i.i, label %if.end196
>>> -
>>> -for.body.i.i:                                     ; preds = %for.body.i.i, %cond.end12.i.i
>>> -  %weak.i.i = getelementptr inbounds %union.E, %union.E* %tmp, i32 undef, i32 0
>>> -  %tmp1 = load i32, i32* %weak.i.i, align 4
>>> -  %cmp36.i.i = icmp ne i32 %tmp1, %shl.i.i
>>> -  %or.cond = and i1 %cmp36.i.i, false
>>> -  br i1 %or.cond, label %for.body.i.i, label %if.end196
>>> -
>>> -sw.bb2.i:                                         ; preds = %entry
>>> -  %bf.lshr.i85.i = lshr i128 undef, %const72
>>> -  br i1 undef, label %if.end196, label %__XXX2.exit.i95.i
>>> -
>>> -__XXX2.exit.i95.i:                  ; preds = %sw.bb2.i
>>> -  %extract11.i.i91.i = lshr i128 undef, %const3
>>> -  br label %if.end196
>>> -
>>> -if.end196:                                        ; preds = %__XXX2.exit.i95.i, %sw.bb2.i, %for.body.i.i, %cond.end12.i.i, %entry
>>> -  ret void
>>> -}
>>> -
>>> -attributes #0 = { nounwind optsize ssp "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" }
>>>