[llvm] r297695 - In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.

Thu Apr 27 16:29:43 PDT 2017

Hi Nirav,

I saw a crasher on bugzilla and narrowed down the issue to this commit. Please see:

  https://bugs.llvm.org/show_bug.cgi?id=32610

The reduced test case compiles with r297692 (the previous commit), but crashes once r297695 (this commit) is included. The original C file and a reduced IR file are attached to the bug report.

Could you take a look?

thanks,
vedant

> On Mar 13, 2017, at 5:34 PM, Nirav Dave via llvm-commits <llvm-commits at lists.llvm.org> wrote:
> 
> Author: niravd
> Date: Mon Mar 13 19:34:14 2017
> New Revision: 297695
> 
> URL: http://llvm.org/viewvc/llvm-project?rev=297695&view=rev
> Log:
> In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
> 
>    Recommitting with compile-time improvements
> 
>    Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.
> 
>    * Simplify Consecutive Merge Store Candidate Search
> 
>    Now that address aliasing is much less conservative, push through
>    simplified store merging search and chain alias analysis which only
>    checks for parallel stores through the chain subgraph. This is cleaner,
>    as it separates non-interfering loads/stores from the
>    store-merging logic.
> 
>    When merging stores, search up the chain through a single load, and
>    find all possible stores by looking down through a load and a
>    TokenFactor to all stores visited.
> 
>    This improves the quality of the output SelectionDAG and the output
>    Codegen (save perhaps for some ARM cases where we correctly construct
>    wider loads, but then promote them to float operations which appear
>    but require more expensive constant generation).
> 
>    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
> 
>    Additional Minor Changes:
> 
>      1. Finish removing unused AliasLoad code
> 
>      2. Unify the chain aggregation in the merged stores across code
>         paths
> 
>      3. Re-add the Store node to the worklist after calling
>         SimplifyDemandedBits.
> 
>      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
>         arbitrary, but seems sufficient to not cause regressions in
>         tests.
> 
>      5. Remove Chain dependencies of Memory operations on CopyFromReg
>         nodes as these are captured by data dependence
> 
>      6. Forward load-store values through TokenFactors containing
>         {CopyToReg,CopyFromReg} values.
> 
>      7. Peephole to convert buildvector of extract_vector_elt to
>         extract_subvector if possible (see
>         CodeGen/AArch64/store-merge.ll)
> 
>      8. Store merging for the ARM target is restricted to 32-bit, as
>         in some contexts invalid 64-bit operations are being
>         generated. This restriction can be removed once appropriate
>         checks are added.
> 
>    This finishes the change Matt Arsenault started in r246307 and
>    jyknight's original patch.
> 
>    Many tests required some changes as memory operations are now
>    reorderable, improving load-store forwarding. One test in
>    particular is worth noting:
> 
>      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
>      forwarding converts a load-store pair into a parallel store and
>      a memory-realized bitcast of the same value. However, because we
>      lose the sharing of the explicit and implicit store values we
>      must create another local store. A similar transformation
>      happens before SelectionDAG as well.
> 
>    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
> 
> Added:
>    llvm/trunk/test/CodeGen/X86/pr32108.ll
> Removed:
>    llvm/trunk/test/CodeGen/X86/combiner-aa-0.ll
>    llvm/trunk/test/CodeGen/X86/combiner-aa-1.ll
>    llvm/trunk/test/CodeGen/X86/pr18023.ll
> Modified:
>    llvm/trunk/include/llvm/Target/TargetLowering.h
>    llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
>    llvm/trunk/lib/CodeGen/TargetLoweringBase.cpp
>    llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.cpp
>    llvm/trunk/lib/Target/ARM/ARMISelLowering.h
>    llvm/trunk/test/CodeGen/AArch64/argument-blocks.ll
>    llvm/trunk/test/CodeGen/AArch64/arm64-abi.ll
>    llvm/trunk/test/CodeGen/AArch64/arm64-memset-inline.ll
>    llvm/trunk/test/CodeGen/AArch64/arm64-variadic-aapcs.ll
>    llvm/trunk/test/CodeGen/AArch64/merge-store.ll
>    llvm/trunk/test/CodeGen/AArch64/vector_merge_dep_check.ll
>    llvm/trunk/test/CodeGen/AMDGPU/debugger-insert-nops.ll
>    llvm/trunk/test/CodeGen/AMDGPU/insert_vector_elt.ll
>    llvm/trunk/test/CodeGen/AMDGPU/merge-stores.ll
>    llvm/trunk/test/CodeGen/AMDGPU/private-element-size.ll
>    llvm/trunk/test/CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll
>    llvm/trunk/test/CodeGen/ARM/2012-10-04-AAPCS-byval-align8.ll
>    llvm/trunk/test/CodeGen/ARM/alloc-no-stack-realign.ll
>    llvm/trunk/test/CodeGen/ARM/gpr-paired-spill.ll
>    llvm/trunk/test/CodeGen/ARM/ifcvt10.ll
>    llvm/trunk/test/CodeGen/ARM/illegal-bitfield-loadstore.ll
>    llvm/trunk/test/CodeGen/ARM/static-addr-hoisting.ll
>    llvm/trunk/test/CodeGen/BPF/undef.ll
>    llvm/trunk/test/CodeGen/MSP430/Inst16mm.ll
>    llvm/trunk/test/CodeGen/Mips/cconv/arguments-float.ll
>    llvm/trunk/test/CodeGen/Mips/cconv/arguments-varargs.ll
>    llvm/trunk/test/CodeGen/Mips/fastcc.ll
>    llvm/trunk/test/CodeGen/Mips/load-store-left-right.ll
>    llvm/trunk/test/CodeGen/Mips/micromips-li.ll
>    llvm/trunk/test/CodeGen/Mips/mips64-f128-call.ll
>    llvm/trunk/test/CodeGen/Mips/mips64-f128.ll
>    llvm/trunk/test/CodeGen/Mips/mno-ldc1-sdc1.ll
>    llvm/trunk/test/CodeGen/Mips/msa/f16-llvm-ir.ll
>    llvm/trunk/test/CodeGen/Mips/msa/i5_ld_st.ll
>    llvm/trunk/test/CodeGen/Mips/o32_cc_byval.ll
>    llvm/trunk/test/CodeGen/Mips/o32_cc_vararg.ll
>    llvm/trunk/test/CodeGen/PowerPC/anon_aggr.ll
>    llvm/trunk/test/CodeGen/PowerPC/complex-return.ll
>    llvm/trunk/test/CodeGen/PowerPC/jaggedstructs.ll
>    llvm/trunk/test/CodeGen/PowerPC/ppc64-align-long-double.ll
>    llvm/trunk/test/CodeGen/PowerPC/structsinmem.ll
>    llvm/trunk/test/CodeGen/PowerPC/structsinregs.ll
>    llvm/trunk/test/CodeGen/SystemZ/unaligned-01.ll
>    llvm/trunk/test/CodeGen/Thumb/2010-07-15-debugOrdering.ll
>    llvm/trunk/test/CodeGen/Thumb/stack-access.ll
>    llvm/trunk/test/CodeGen/X86/2010-09-17-SideEffectsInChain.ll
>    llvm/trunk/test/CodeGen/X86/2012-11-28-merge-store-alias.ll
>    llvm/trunk/test/CodeGen/X86/MergeConsecutiveStores.ll
>    llvm/trunk/test/CodeGen/X86/avx-vbroadcast.ll
>    llvm/trunk/test/CodeGen/X86/avx512-mask-op.ll
>    llvm/trunk/test/CodeGen/X86/chain_order.ll
>    llvm/trunk/test/CodeGen/X86/clear_upper_vector_element_bits.ll
>    llvm/trunk/test/CodeGen/X86/copy-eflags.ll
>    llvm/trunk/test/CodeGen/X86/dag-merge-fast-accesses.ll
>    llvm/trunk/test/CodeGen/X86/dont-trunc-store-double-to-float.ll
>    llvm/trunk/test/CodeGen/X86/extractelement-legalization-store-ordering.ll
>    llvm/trunk/test/CodeGen/X86/i256-add.ll
>    llvm/trunk/test/CodeGen/X86/i386-shrink-wrapping.ll
>    llvm/trunk/test/CodeGen/X86/live-range-nosubreg.ll
>    llvm/trunk/test/CodeGen/X86/longlong-deadload.ll
>    llvm/trunk/test/CodeGen/X86/merge-consecutive-loads-128.ll
>    llvm/trunk/test/CodeGen/X86/merge-consecutive-loads-256.ll
>    llvm/trunk/test/CodeGen/X86/merge-store-partially-alias-loads.ll
>    llvm/trunk/test/CodeGen/X86/split-store.ll
>    llvm/trunk/test/CodeGen/X86/stores-merging.ll
>    llvm/trunk/test/CodeGen/X86/vector-compare-results.ll
>    llvm/trunk/test/CodeGen/X86/vector-shuffle-variable-128.ll
>    llvm/trunk/test/CodeGen/X86/vector-shuffle-variable-256.ll
>    llvm/trunk/test/CodeGen/X86/vectorcall.ll
>    llvm/trunk/test/CodeGen/X86/win32-eh.ll
>    llvm/trunk/test/CodeGen/XCore/varargs.ll