[llvm] r297695 - In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.

Fri May 12 11:48:50 PDT 2017

+ctopper, +mkuper who may have some interest in the CMOV aspects of this.

Looks like we're now correcting hte chain of the two loads here so we can
correctly pull and merge the loads past the select in DAGCombine after
which we fail due to an issue with X86RegisterInfo::eliminateFrameIndex not
being able to deal with CMOV where one of the argument FrameIndex that
translates to the stack pointer with a non-zero offset. Adding Craig and
Michael who may have more interested.

Attached is a temporary patch to prevent the immediate issue and hopefully
unsticks you. It causes some degradation in various tests and some
improvements in others.

-Nirav

On Thu, May 11, 2017 at 6:41 PM, Sanjoy Das <sanjoy at playingwithpointers.com>
wrote:

> Even if you can't immediately fix the issue cleanly, a conservative
> workaround that we can temporarily apply downstream* will be immensely
> helpful.
>
> * We have a JIT compiler based on LLVM.
>
> -- Sanjoy
>
> On Thu, May 11, 2017 at 3:30 PM, Sanjoy Das
> <sanjoy at playingwithpointers.com> wrote:
> > Hi,
> >
> > This change causes llc to crash on the following input:
> >
> > target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
> > declare void @f()
> >
> > define i32 addrspace(1)* @test(i32 addrspace(1)* %a, i32 addrspace(1)*
> > %b, i1 %which) gc "statepoint-example" {
> > entry:
> >   %tok = tail call token (i64, i32, void ()*, i32, i32, ...)
> > @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void ()*
> > @f, i32 0, i32 0, i32 0, i32 0, i32 addrspace(1)* %a, i32
> > addrspace(1)* %b)
> >   %a.r = tail call coldcc i8 addrspace(1)*
> > @llvm.experimental.gc.relocate.p1i8(token %tok, i32 7, i32 7) ; (%a,
> > %a)
> >   %b.r = tail call coldcc i8 addrspace(1)*
> > @llvm.experimental.gc.relocate.p1i8(token %tok, i32 8, i32 8) ; (%b,
> > %b)
> >   %cond.v = select i1 %which, i8 addrspace(1)* %a.r, i8 addrspace(1)*
> %b.r
> >   %cond = bitcast i8 addrspace(1)* %cond.v to i32 addrspace(1)*
> >   ret i32 addrspace(1)* %cond
> > }
> >
> > declare token @llvm.experimental.gc.statepoint.p0f_isVoidf(i64, i32,
> > void ()*, i32, i32, ...)
> > declare i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(token,
> i32, i32)
> >
> > I've filed PR33010 for this, can you please take a look?
> >
> > -- Sanjoy
> >
> > On Thu, Apr 27, 2017 at 7:04 PM, Nirav Davé via llvm-commits
> > <llvm-commits at lists.llvm.org> wrote:
> >> Hi Vedant:
> >>
> >> I've been looking at this for a bit. No fix yet, but here's what I've
> >> determined so far:
> >>
> >> The r297695 patch is correct in that the transformation to the
> SelectionDAG
> >> is correct and the problem is a latent issue downstream. We're now
> correctly
> >> determining that the Loads at stores in the example have no dependence
> >> order. The core of the problem seems to be from duplicating the shared
> stub
> >> load (which is why it's only in 32-bit X86 and not 64)  the two loads
> of b
> >> (which cannot be shared). It seems like we're having an issue with
> >> duplication of the SU in ScheduleDAGRRList::CopyAndMoveSuccessors
> where are
> >> confused determining the unfolding the load causing us to miscount the
> >> successors.
> >>
> >> -Nirav
> >>
> >>
> >>
> >> On Thu, Apr 27, 2017 at 7:29 PM, Vedant Kumar <vsk at apple.com> wrote:
> >>>
> >>> Hi Nirav,
> >>>
> >>> I saw a crasher on bugzilla and narrowed down the issue to this commit.
> >>> Please see:
> >>>
> >>>   https://bugs.llvm.org/show_bug.cgi?id=32610
> >>>
> >>> The reduced test case compiles with r297692 (previous commit), but the
> >>> crash occurs when r297695 (this commit) is included. The original C
> file,
> >>> and a reduced IR file, is attached to the bug report.
> >>>
> >>> Could you take a look?
> >>>
> >>> thanks,
> >>> vedant
> >>>
> >>>
> >>> > On Mar 13, 2017, at 5:34 PM, Nirav Dave via llvm-commits
> >>> > <llvm-commits at lists.llvm.org> wrote:
> >>> >
> >>> > Author: niravd
> >>> > Date: Mon Mar 13 19:34:14 2017
> >>> > New Revision: 297695
> >>> >
> >>> > URL: http://llvm.org/viewvc/llvm-project?rev=297695&view=rev
> >>> > Log:
> >>> > In visitSTORE, always use FindBetterChain, rather than only when
> UseAA
> >>> > is enabled.
> >>> >
> >>> >    Recommiting with compiler time improvements
> >>> >
> >>> >    Recommitting after fixup of 32-bit aliasing sign offset bug in
> >>> > DAGCombiner.
> >>> >
> >>> >    * Simplify Consecutive Merge Store Candidate Search
> >>> >
> >>> >    Now that address aliasing is much less conservative, push through
> >>> >    simplified store merging search and chain alias analysis which
> only
> >>> >    checks for parallel stores through the chain subgraph. This is
> >>> > cleaner
> >>> >    as the separation of non-interfering loads/stores from the
> >>> >    store-merging logic.
> >>> >
> >>> >    When merging stores search up the chain through a single load, and
> >>> >    finds all possible stores by looking down from through a load and
> a
> >>> >    TokenFactor to all stores visited.
> >>> >
> >>> >    This improves the quality of the output SelectionDAG and the
> output
> >>> >    Codegen (save perhaps for some ARM cases where we correctly
> >>> > constructs
> >>> >    wider loads, but then promotes them to float operations which
> appear
> >>> >    but requires more expensive constant generation).
> >>> >
> >>> >    Some minor peephole optimizations to deal with improved SubDAG
> shapes
> >>> > (listed below)
> >>> >
> >>> >    Additional Minor Changes:
> >>> >
> >>> >      1. Finishes removing unused AliasLoad code
> >>> >
> >>> >      2. Unifies the chain aggregation in the merged stores across
> code
> >>> >         paths
> >>> >
> >>> >      3. Re-add the Store node to the worklist after calling
> >>> >         SimplifyDemandedBits.
> >>> >
> >>> >      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number
> is
> >>> >         arbitrary, but seems sufficient to not cause regressions in
> >>> >         tests.
> >>> >
> >>> >      5. Remove Chain dependencies of Memory operations on CopyfromReg
> >>> >         nodes as these are captured by data dependence
> >>> >
> >>> >      6. Forward loads-store values through tokenfactors containing
> >>> >          {CopyToReg,CopyFromReg} Values.
> >>> >
> >>> >      7. Peephole to convert buildvector of extract_vector_elt to
> >>> >         extract_subvector if possible (see
> >>> >         CodeGen/AArch64/store-merge.ll)
> >>> >
> >>> >      8. Store merging for the ARM target is restricted to 32-bit as
> >>> >         some in some contexts invalid 64-bit operations are being
> >>> >         generated. This can be removed once appropriate checks are
> >>> >         added.
> >>> >
> >>> >    This finishes the change Matt Arsenault started in r246307 and
> >>> >    jyknight's original patch.
> >>> >
> >>> >    Many tests required some changes as memory operations are now
> >>> >    reorderable, improving load-store forwarding. One test in
> >>> >    particular is worth noting:
> >>> >
> >>> >      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved
> load-store
> >>> >      forwarding converts a load-store pair into a parallel store and
> >>> >      a memory-realized bitcast of the same value. However, because we
> >>> >      lose the sharing of the explicit and implicit store values we
> >>> >      must create another local store. A similar transformation
> >>> >      happens before SelectionDAG as well.
> >>> >
> >>> >    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
> >>> >
> >>> > Added:
> >>> >    llvm/trunk/test/CodeGen/X86/pr32108.ll
> >>> > Removed:
> >>> >    llvm/trunk/test/CodeGen/X86/combiner-aa-0.ll
> >>> >    llvm/trunk/test/CodeGen/X86/combiner-aa-1.ll
> >>> >    llvm/trunk/test/CodeGen/X86/pr18023.ll
> >>> > Modified:
> >>> >    llvm/trunk/include/llvm/Target/TargetLowering.h
> >>> >    llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
> >>> >    llvm/trunk/lib/CodeGen/TargetLoweringBase.cpp
> >>> >    llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.cpp
> >>> >    llvm/trunk/lib/Target/ARM/ARMISelLowering.h
> >>> >    llvm/trunk/test/CodeGen/AArch64/argument-blocks.ll
> >>> >    llvm/trunk/test/CodeGen/AArch64/arm64-abi.ll
> >>> >    llvm/trunk/test/CodeGen/AArch64/arm64-memset-inline.ll
> >>> >    llvm/trunk/test/CodeGen/AArch64/arm64-variadic-aapcs.ll
> >>> >    llvm/trunk/test/CodeGen/AArch64/merge-store.ll
> >>> >    llvm/trunk/test/CodeGen/AArch64/vector_merge_dep_check.ll
> >>> >    llvm/trunk/test/CodeGen/AMDGPU/debugger-insert-nops.ll
> >>> >    llvm/trunk/test/CodeGen/AMDGPU/insert_vector_elt.ll
> >>> >    llvm/trunk/test/CodeGen/AMDGPU/merge-stores.ll
> >>> >    llvm/trunk/test/CodeGen/AMDGPU/private-element-size.ll
> >>> >    llvm/trunk/test/CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll
> >>> >    llvm/trunk/test/CodeGen/ARM/2012-10-04-AAPCS-byval-align8.ll
> >>> >    llvm/trunk/test/CodeGen/ARM/alloc-no-stack-realign.ll
> >>> >    llvm/trunk/test/CodeGen/ARM/gpr-paired-spill.ll
> >>> >    llvm/trunk/test/CodeGen/ARM/ifcvt10.ll
> >>> >    llvm/trunk/test/CodeGen/ARM/illegal-bitfield-loadstore.ll
> >>> >    llvm/trunk/test/CodeGen/ARM/static-addr-hoisting.ll
> >>> >    llvm/trunk/test/CodeGen/BPF/undef.ll
> >>> >    llvm/trunk/test/CodeGen/MSP430/Inst16mm.ll
> >>> >    llvm/trunk/test/CodeGen/Mips/cconv/arguments-float.ll
> >>> >    llvm/trunk/test/CodeGen/Mips/cconv/arguments-varargs.ll
> >>> >    llvm/trunk/test/CodeGen/Mips/fastcc.ll
> >>> >    llvm/trunk/test/CodeGen/Mips/load-store-left-right.ll
> >>> >    llvm/trunk/test/CodeGen/Mips/micromips-li.ll
> >>> >    llvm/trunk/test/CodeGen/Mips/mips64-f128-call.ll
> >>> >    llvm/trunk/test/CodeGen/Mips/mips64-f128.ll
> >>> >    llvm/trunk/test/CodeGen/Mips/mno-ldc1-sdc1.ll
> >>> >    llvm/trunk/test/CodeGen/Mips/msa/f16-llvm-ir.ll
> >>> >    llvm/trunk/test/CodeGen/Mips/msa/i5_ld_st.ll
> >>> >    llvm/trunk/test/CodeGen/Mips/o32_cc_byval.ll
> >>> >    llvm/trunk/test/CodeGen/Mips/o32_cc_vararg.ll
> >>> >    llvm/trunk/test/CodeGen/PowerPC/anon_aggr.ll
> >>> >    llvm/trunk/test/CodeGen/PowerPC/complex-return.ll
> >>> >    llvm/trunk/test/CodeGen/PowerPC/jaggedstructs.ll
> >>> >    llvm/trunk/test/CodeGen/PowerPC/ppc64-align-long-double.ll
> >>> >    llvm/trunk/test/CodeGen/PowerPC/structsinmem.ll
> >>> >    llvm/trunk/test/CodeGen/PowerPC/structsinregs.ll
> >>> >    llvm/trunk/test/CodeGen/SystemZ/unaligned-01.ll
> >>> >    llvm/trunk/test/CodeGen/Thumb/2010-07-15-debugOrdering.ll
> >>> >    llvm/trunk/test/CodeGen/Thumb/stack-access.ll
> >>> >    llvm/trunk/test/CodeGen/X86/2010-09-17-SideEffectsInChain.ll
> >>> >    llvm/trunk/test/CodeGen/X86/2012-11-28-merge-store-alias.ll
> >>> >    llvm/trunk/test/CodeGen/X86/MergeConsecutiveStores.ll
> >>> >    llvm/trunk/test/CodeGen/X86/avx-vbroadcast.ll
> >>> >    llvm/trunk/test/CodeGen/X86/avx512-mask-op.ll
> >>> >    llvm/trunk/test/CodeGen/X86/chain_order.ll
> >>> >    llvm/trunk/test/CodeGen/X86/clear_upper_vector_element_bits.ll
> >>> >    llvm/trunk/test/CodeGen/X86/copy-eflags.ll
> >>> >    llvm/trunk/test/CodeGen/X86/dag-merge-fast-accesses.ll
> >>> >    llvm/trunk/test/CodeGen/X86/dont-trunc-store-double-to-float.ll
> >>> >
> >>> > llvm/trunk/test/CodeGen/X86/extractelement-legalization-
> store-ordering.ll
> >>> >    llvm/trunk/test/CodeGen/X86/i256-add.ll
> >>> >    llvm/trunk/test/CodeGen/X86/i386-shrink-wrapping.ll
> >>> >    llvm/trunk/test/CodeGen/X86/live-range-nosubreg.ll
> >>> >    llvm/trunk/test/CodeGen/X86/longlong-deadload.ll
> >>> >    llvm/trunk/test/CodeGen/X86/merge-consecutive-loads-128.ll
> >>> >    llvm/trunk/test/CodeGen/X86/merge-consecutive-loads-256.ll
> >>> >    llvm/trunk/test/CodeGen/X86/merge-store-partially-alias-loads.ll
> >>> >    llvm/trunk/test/CodeGen/X86/split-store.ll
> >>> >    llvm/trunk/test/CodeGen/X86/stores-merging.ll
> >>> >    llvm/trunk/test/CodeGen/X86/vector-compare-results.ll
> >>> >    llvm/trunk/test/CodeGen/X86/vector-shuffle-variable-128.ll
> >>> >    llvm/trunk/test/CodeGen/X86/vector-shuffle-variable-256.ll
> >>> >    llvm/trunk/test/CodeGen/X86/vectorcall.ll
> >>> >    llvm/trunk/test/CodeGen/X86/win32-eh.ll
> >>> >    llvm/trunk/test/CodeGen/XCore/varargs.ll
> >>>
> >>
> >>
> >> _______________________________________________
> >> llvm-commits mailing list
> >> llvm-commits at lists.llvm.org
> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
> >>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20170512/913ec44a/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-DAG-Temporarily-disable-Selection-simplification-of-.patch
Type: text/x-patch
Size: 1188 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20170512/913ec44a/attachment-0001.bin>