[llvm] r297695 - In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.

Fri May 12 11:59:47 PDT 2017

Adding Simon and Amjad as well.

~Craig

On Fri, May 12, 2017 at 11:48 AM, Nirav Davé <niravd at google.com> wrote:

> +ctopper, +mkuper who may have some interest in the CMOV aspects of this.
>
> Looks like we're now correcting hte chain of the two loads here so we can
> correctly pull and merge the loads past the select in DAGCombine after
> which we fail due to an issue with X86RegisterInfo::eliminateFrameIndex
> not being able to deal with CMOV where one of the argument FrameIndex that
> translates to the stack pointer with a non-zero offset. Adding Craig and
> Michael who may have more interested.
>
> Attached is a temporary patch to prevent the immediate issue and hopefully
> unsticks you. It causes some degradation in various tests and some
> improvements in others.
>
> -Nirav
>
> On Thu, May 11, 2017 at 6:41 PM, Sanjoy Das <sanjoy at playingwithpointers.
> com> wrote:
>
>> Even if you can't immediately fix the issue cleanly, a conservative
>> workaround that we can temporarily apply downstream* will be immensely
>> helpful.
>>
>> * We have a JIT compiler based on LLVM.
>>
>> -- Sanjoy
>>
>> On Thu, May 11, 2017 at 3:30 PM, Sanjoy Das
>> <sanjoy at playingwithpointers.com> wrote:
>> > Hi,
>> >
>> > This change causes llc to crash on the following input:
>> >
>> > target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
>> > declare void @f()
>> >
>> > define i32 addrspace(1)* @test(i32 addrspace(1)* %a, i32 addrspace(1)*
>> > %b, i1 %which) gc "statepoint-example" {
>> > entry:
>> >   %tok = tail call token (i64, i32, void ()*, i32, i32, ...)
>> > @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void ()*
>> > @f, i32 0, i32 0, i32 0, i32 0, i32 addrspace(1)* %a, i32
>> > addrspace(1)* %b)
>> >   %a.r = tail call coldcc i8 addrspace(1)*
>> > @llvm.experimental.gc.relocate.p1i8(token %tok, i32 7, i32 7) ; (%a,
>> > %a)
>> >   %b.r = tail call coldcc i8 addrspace(1)*
>> > @llvm.experimental.gc.relocate.p1i8(token %tok, i32 8, i32 8) ; (%b,
>> > %b)
>> >   %cond.v = select i1 %which, i8 addrspace(1)* %a.r, i8 addrspace(1)*
>> %b.r
>> >   %cond = bitcast i8 addrspace(1)* %cond.v to i32 addrspace(1)*
>> >   ret i32 addrspace(1)* %cond
>> > }
>> >
>> > declare token @llvm.experimental.gc.statepoint.p0f_isVoidf(i64, i32,
>> > void ()*, i32, i32, ...)
>> > declare i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(token,
>> i32, i32)
>> >
>> > I've filed PR33010 for this, can you please take a look?
>> >
>> > -- Sanjoy
>> >
>> > On Thu, Apr 27, 2017 at 7:04 PM, Nirav Davé via llvm-commits
>> > <llvm-commits at lists.llvm.org> wrote:
>> >> Hi Vedant:
>> >>
>> >> I've been looking at this for a bit. No fix yet, but here's what I've
>> >> determined so far:
>> >>
>> >> The r297695 patch is correct in that the transformation to the
>> SelectionDAG
>> >> is correct and the problem is a latent issue downstream. We're now
>> correctly
>> >> determining that the Loads at stores in the example have no dependence
>> >> order. The core of the problem seems to be from duplicating the shared
>> stub
>> >> load (which is why it's only in 32-bit X86 and not 64)  the two loads
>> of b
>> >> (which cannot be shared). It seems like we're having an issue with
>> >> duplication of the SU in ScheduleDAGRRList::CopyAndMoveSuccessors
>> where are
>> >> confused determining the unfolding the load causing us to miscount the
>> >> successors.
>> >>
>> >> -Nirav
>> >>
>> >>
>> >>
>> >> On Thu, Apr 27, 2017 at 7:29 PM, Vedant Kumar <vsk at apple.com> wrote:
>> >>>
>> >>> Hi Nirav,
>> >>>
>> >>> I saw a crasher on bugzilla and narrowed down the issue to this
>> commit.
>> >>> Please see:
>> >>>
>> >>>   https://bugs.llvm.org/show_bug.cgi?id=32610
>> >>>
>> >>> The reduced test case compiles with r297692 (previous commit), but the
>> >>> crash occurs when r297695 (this commit) is included. The original C
>> file,
>> >>> and a reduced IR file, is attached to the bug report.
>> >>>
>> >>> Could you take a look?
>> >>>
>> >>> thanks,
>> >>> vedant
>> >>>
>> >>>
>> >>> > On Mar 13, 2017, at 5:34 PM, Nirav Dave via llvm-commits
>> >>> > <llvm-commits at lists.llvm.org> wrote:
>> >>> >
>> >>> > Author: niravd
>> >>> > Date: Mon Mar 13 19:34:14 2017
>> >>> > New Revision: 297695
>> >>> >
>> >>> > URL: http://llvm.org/viewvc/llvm-project?rev=297695&view=rev
>> >>> > Log:
>> >>> > In visitSTORE, always use FindBetterChain, rather than only when
>> UseAA
>> >>> > is enabled.
>> >>> >
>> >>> >    Recommiting with compiler time improvements
>> >>> >
>> >>> >    Recommitting after fixup of 32-bit aliasing sign offset bug in
>> >>> > DAGCombiner.
>> >>> >
>> >>> >    * Simplify Consecutive Merge Store Candidate Search
>> >>> >
>> >>> >    Now that address aliasing is much less conservative, push through
>> >>> >    simplified store merging search and chain alias analysis which
>> only
>> >>> >    checks for parallel stores through the chain subgraph. This is
>> >>> > cleaner
>> >>> >    as the separation of non-interfering loads/stores from the
>> >>> >    store-merging logic.
>> >>> >
>> >>> >    When merging stores search up the chain through a single load,
>> and
>> >>> >    finds all possible stores by looking down from through a load
>> and a
>> >>> >    TokenFactor to all stores visited.
>> >>> >
>> >>> >    This improves the quality of the output SelectionDAG and the
>> output
>> >>> >    Codegen (save perhaps for some ARM cases where we correctly
>> >>> > constructs
>> >>> >    wider loads, but then promotes them to float operations which
>> appear
>> >>> >    but requires more expensive constant generation).
>> >>> >
>> >>> >    Some minor peephole optimizations to deal with improved SubDAG
>> shapes
>> >>> > (listed below)
>> >>> >
>> >>> >    Additional Minor Changes:
>> >>> >
>> >>> >      1. Finishes removing unused AliasLoad code
>> >>> >
>> >>> >      2. Unifies the chain aggregation in the merged stores across
>> code
>> >>> >         paths
>> >>> >
>> >>> >      3. Re-add the Store node to the worklist after calling
>> >>> >         SimplifyDemandedBits.
>> >>> >
>> >>> >      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number
>> is
>> >>> >         arbitrary, but seems sufficient to not cause regressions in
>> >>> >         tests.
>> >>> >
>> >>> >      5. Remove Chain dependencies of Memory operations on
>> CopyfromReg
>> >>> >         nodes as these are captured by data dependence
>> >>> >
>> >>> >      6. Forward loads-store values through tokenfactors containing
>> >>> >          {CopyToReg,CopyFromReg} Values.
>> >>> >
>> >>> >      7. Peephole to convert buildvector of extract_vector_elt to
>> >>> >         extract_subvector if possible (see
>> >>> >         CodeGen/AArch64/store-merge.ll)
>> >>> >
>> >>> >      8. Store merging for the ARM target is restricted to 32-bit as
>> >>> >         some in some contexts invalid 64-bit operations are being
>> >>> >         generated. This can be removed once appropriate checks are
>> >>> >         added.
>> >>> >
>> >>> >    This finishes the change Matt Arsenault started in r246307 and
>> >>> >    jyknight's original patch.
>> >>> >
>> >>> >    Many tests required some changes as memory operations are now
>> >>> >    reorderable, improving load-store forwarding. One test in
>> >>> >    particular is worth noting:
>> >>> >
>> >>> >      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved
>> load-store
>> >>> >      forwarding converts a load-store pair into a parallel store and
>> >>> >      a memory-realized bitcast of the same value. However, because
>> we
>> >>> >      lose the sharing of the explicit and implicit store values we
>> >>> >      must create another local store. A similar transformation
>> >>> >      happens before SelectionDAG as well.
>> >>> >
>> >>> >    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
>> >>> >
>> >>> > Added:
>> >>> >    llvm/trunk/test/CodeGen/X86/pr32108.ll
>> >>> > Removed:
>> >>> >    llvm/trunk/test/CodeGen/X86/combiner-aa-0.ll
>> >>> >    llvm/trunk/test/CodeGen/X86/combiner-aa-1.ll
>> >>> >    llvm/trunk/test/CodeGen/X86/pr18023.ll
>> >>> > Modified:
>> >>> >    llvm/trunk/include/llvm/Target/TargetLowering.h
>> >>> >    llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
>> >>> >    llvm/trunk/lib/CodeGen/TargetLoweringBase.cpp
>> >>> >    llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.cpp
>> >>> >    llvm/trunk/lib/Target/ARM/ARMISelLowering.h
>> >>> >    llvm/trunk/test/CodeGen/AArch64/argument-blocks.ll
>> >>> >    llvm/trunk/test/CodeGen/AArch64/arm64-abi.ll
>> >>> >    llvm/trunk/test/CodeGen/AArch64/arm64-memset-inline.ll
>> >>> >    llvm/trunk/test/CodeGen/AArch64/arm64-variadic-aapcs.ll
>> >>> >    llvm/trunk/test/CodeGen/AArch64/merge-store.ll
>> >>> >    llvm/trunk/test/CodeGen/AArch64/vector_merge_dep_check.ll
>> >>> >    llvm/trunk/test/CodeGen/AMDGPU/debugger-insert-nops.ll
>> >>> >    llvm/trunk/test/CodeGen/AMDGPU/insert_vector_elt.ll
>> >>> >    llvm/trunk/test/CodeGen/AMDGPU/merge-stores.ll
>> >>> >    llvm/trunk/test/CodeGen/AMDGPU/private-element-size.ll
>> >>> >    llvm/trunk/test/CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll
>> >>> >    llvm/trunk/test/CodeGen/ARM/2012-10-04-AAPCS-byval-align8.ll
>> >>> >    llvm/trunk/test/CodeGen/ARM/alloc-no-stack-realign.ll
>> >>> >    llvm/trunk/test/CodeGen/ARM/gpr-paired-spill.ll
>> >>> >    llvm/trunk/test/CodeGen/ARM/ifcvt10.ll
>> >>> >    llvm/trunk/test/CodeGen/ARM/illegal-bitfield-loadstore.ll
>> >>> >    llvm/trunk/test/CodeGen/ARM/static-addr-hoisting.ll
>> >>> >    llvm/trunk/test/CodeGen/BPF/undef.ll
>> >>> >    llvm/trunk/test/CodeGen/MSP430/Inst16mm.ll
>> >>> >    llvm/trunk/test/CodeGen/Mips/cconv/arguments-float.ll
>> >>> >    llvm/trunk/test/CodeGen/Mips/cconv/arguments-varargs.ll
>> >>> >    llvm/trunk/test/CodeGen/Mips/fastcc.ll
>> >>> >    llvm/trunk/test/CodeGen/Mips/load-store-left-right.ll
>> >>> >    llvm/trunk/test/CodeGen/Mips/micromips-li.ll
>> >>> >    llvm/trunk/test/CodeGen/Mips/mips64-f128-call.ll
>> >>> >    llvm/trunk/test/CodeGen/Mips/mips64-f128.ll
>> >>> >    llvm/trunk/test/CodeGen/Mips/mno-ldc1-sdc1.ll
>> >>> >    llvm/trunk/test/CodeGen/Mips/msa/f16-llvm-ir.ll
>> >>> >    llvm/trunk/test/CodeGen/Mips/msa/i5_ld_st.ll
>> >>> >    llvm/trunk/test/CodeGen/Mips/o32_cc_byval.ll
>> >>> >    llvm/trunk/test/CodeGen/Mips/o32_cc_vararg.ll
>> >>> >    llvm/trunk/test/CodeGen/PowerPC/anon_aggr.ll
>> >>> >    llvm/trunk/test/CodeGen/PowerPC/complex-return.ll
>> >>> >    llvm/trunk/test/CodeGen/PowerPC/jaggedstructs.ll
>> >>> >    llvm/trunk/test/CodeGen/PowerPC/ppc64-align-long-double.ll
>> >>> >    llvm/trunk/test/CodeGen/PowerPC/structsinmem.ll
>> >>> >    llvm/trunk/test/CodeGen/PowerPC/structsinregs.ll
>> >>> >    llvm/trunk/test/CodeGen/SystemZ/unaligned-01.ll
>> >>> >    llvm/trunk/test/CodeGen/Thumb/2010-07-15-debugOrdering.ll
>> >>> >    llvm/trunk/test/CodeGen/Thumb/stack-access.ll
>> >>> >    llvm/trunk/test/CodeGen/X86/2010-09-17-SideEffectsInChain.ll
>> >>> >    llvm/trunk/test/CodeGen/X86/2012-11-28-merge-store-alias.ll
>> >>> >    llvm/trunk/test/CodeGen/X86/MergeConsecutiveStores.ll
>> >>> >    llvm/trunk/test/CodeGen/X86/avx-vbroadcast.ll
>> >>> >    llvm/trunk/test/CodeGen/X86/avx512-mask-op.ll
>> >>> >    llvm/trunk/test/CodeGen/X86/chain_order.ll
>> >>> >    llvm/trunk/test/CodeGen/X86/clear_upper_vector_element_bits.ll
>> >>> >    llvm/trunk/test/CodeGen/X86/copy-eflags.ll
>> >>> >    llvm/trunk/test/CodeGen/X86/dag-merge-fast-accesses.ll
>> >>> >    llvm/trunk/test/CodeGen/X86/dont-trunc-store-double-to-float.ll
>> >>> >
>> >>> > llvm/trunk/test/CodeGen/X86/extractelement-legalization-stor
>> e-ordering.ll
>> >>> >    llvm/trunk/test/CodeGen/X86/i256-add.ll
>> >>> >    llvm/trunk/test/CodeGen/X86/i386-shrink-wrapping.ll
>> >>> >    llvm/trunk/test/CodeGen/X86/live-range-nosubreg.ll
>> >>> >    llvm/trunk/test/CodeGen/X86/longlong-deadload.ll
>> >>> >    llvm/trunk/test/CodeGen/X86/merge-consecutive-loads-128.ll
>> >>> >    llvm/trunk/test/CodeGen/X86/merge-consecutive-loads-256.ll
>> >>> >    llvm/trunk/test/CodeGen/X86/merge-store-partially-alias-loads.ll
>> >>> >    llvm/trunk/test/CodeGen/X86/split-store.ll
>> >>> >    llvm/trunk/test/CodeGen/X86/stores-merging.ll
>> >>> >    llvm/trunk/test/CodeGen/X86/vector-compare-results.ll
>> >>> >    llvm/trunk/test/CodeGen/X86/vector-shuffle-variable-128.ll
>> >>> >    llvm/trunk/test/CodeGen/X86/vector-shuffle-variable-256.ll
>> >>> >    llvm/trunk/test/CodeGen/X86/vectorcall.ll
>> >>> >    llvm/trunk/test/CodeGen/X86/win32-eh.ll
>> >>> >    llvm/trunk/test/CodeGen/XCore/varargs.ll
>> >>>
>> >>
>> >>
>> >> _______________________________________________
>> >> llvm-commits mailing list
>> >> llvm-commits at lists.llvm.org
>> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>> >>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20170512/c777610f/attachment.html>