[llvm] r297695 - In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.

Sanjoy Das via llvm-commits llvm-commits at lists.llvm.org
Thu May 11 15:41:44 PDT 2017


Even if you can't immediately fix the issue cleanly, a conservative
workaround that we can temporarily apply downstream* will be immensely
helpful.

* We have a JIT compiler based on LLVM.

-- Sanjoy
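
To reproduce the crash with the IR quoted below, it should be enough to save the module to a file and run llc on it. This is only a sketch: llc being on PATH and the output path are assumptions, and the target is whatever llc infers from the module's x86-64-style datalayout.

```shell
# Sketch only: save the IR module from the quoted message below as
# crash.ll, then run llc on it. llc on PATH is an assumption; the
# target is inferred from the module's datalayout.
llc crash.ll -o /dev/null
```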

On Thu, May 11, 2017 at 3:30 PM, Sanjoy Das
<sanjoy at playingwithpointers.com> wrote:
> Hi,
>
> This change causes llc to crash on the following input:
>
> target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
> declare void @f()
>
> define i32 addrspace(1)* @test(i32 addrspace(1)* %a, i32 addrspace(1)*
> %b, i1 %which) gc "statepoint-example" {
> entry:
>   %tok = tail call token (i64, i32, void ()*, i32, i32, ...)
> @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void ()*
> @f, i32 0, i32 0, i32 0, i32 0, i32 addrspace(1)* %a, i32
> addrspace(1)* %b)
>   %a.r = tail call coldcc i8 addrspace(1)*
> @llvm.experimental.gc.relocate.p1i8(token %tok, i32 7, i32 7) ; (%a,
> %a)
>   %b.r = tail call coldcc i8 addrspace(1)*
> @llvm.experimental.gc.relocate.p1i8(token %tok, i32 8, i32 8) ; (%b,
> %b)
>   %cond.v = select i1 %which, i8 addrspace(1)* %a.r, i8 addrspace(1)* %b.r
>   %cond = bitcast i8 addrspace(1)* %cond.v to i32 addrspace(1)*
>   ret i32 addrspace(1)* %cond
> }
>
> declare token @llvm.experimental.gc.statepoint.p0f_isVoidf(i64, i32,
> void ()*, i32, i32, ...)
> declare i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(token, i32, i32)
>
> I've filed PR33010 for this, can you please take a look?
>
> -- Sanjoy
>
> On Thu, Apr 27, 2017 at 7:04 PM, Nirav Davé via llvm-commits
> <llvm-commits at lists.llvm.org> wrote:
>> Hi Vedant:
>>
>> I've been looking at this for a bit. No fix yet, but here's what I've
>> determined so far:
>>
>> The r297695 patch is correct in that its transformation of the
>> SelectionDAG is valid; the problem is a latent issue downstream. We're
>> now correctly determining that the loads and stores in the example have
>> no dependence order. The core of the problem seems to come from
>> duplicating the shared stub load (which is why it shows up only on
>> 32-bit X86 and not 64-bit) into the two loads of b (which cannot be
>> shared). It looks like the duplication of the SU in
>> ScheduleDAGRRList::CopyAndMoveSuccessors gets confused when unfolding
>> the load, causing us to miscount the successors.
>>
>> -Nirav
>>
>>
>>
>> On Thu, Apr 27, 2017 at 7:29 PM, Vedant Kumar <vsk at apple.com> wrote:
>>>
>>> Hi Nirav,
>>>
>>> I saw a crasher on bugzilla and narrowed down the issue to this commit.
>>> Please see:
>>>
>>>   https://bugs.llvm.org/show_bug.cgi?id=32610
>>>
>>> The reduced test case compiles with r297692 (previous commit), but the
>>> crash occurs when r297695 (this commit) is included. The original C file
>>> and a reduced IR file are attached to the bug report.
>>>
>>> Could you take a look?
>>>
>>> thanks,
>>> vedant
>>>
>>>
>>> > On Mar 13, 2017, at 5:34 PM, Nirav Dave via llvm-commits
>>> > <llvm-commits at lists.llvm.org> wrote:
>>> >
>>> > Author: niravd
>>> > Date: Mon Mar 13 19:34:14 2017
>>> > New Revision: 297695
>>> >
>>> > URL: http://llvm.org/viewvc/llvm-project?rev=297695&view=rev
>>> > Log:
>>> > In visitSTORE, always use FindBetterChain, rather than only when UseAA
>>> > is enabled.
>>> >
>>> >    Recommitting with compile-time improvements
>>> >
>>> >    Recommitting after fixup of 32-bit aliasing sign offset bug in
>>> > DAGCombiner.
>>> >
>>> >    * Simplify Consecutive Merge Store Candidate Search
>>> >
>>> >    Now that address aliasing is much less conservative, push through
>>> >    a simplified store-merging search and chain alias analysis which
>>> >    only checks for parallel stores through the chain subgraph. This
>>> >    is cleaner, as it separates the handling of non-interfering
>>> >    loads/stores from the store-merging logic.
>>> >
>>> >    When merging stores, search up the chain through a single load,
>>> >    and find all possible stores by looking down through a load and a
>>> >    TokenFactor to all stores visited.
>>> >
>>> >    This improves the quality of the output SelectionDAG and the
>>> >    output codegen (save perhaps for some ARM cases where we correctly
>>> >    construct wider loads, but then promote them to float operations,
>>> >    which require more expensive constant generation).
>>> >
>>> >    Some minor peephole optimizations to deal with improved SubDAG shapes
>>> > (listed below)
>>> >
>>> >    Additional Minor Changes:
>>> >
>>> >      1. Finishes removing unused AliasLoad code
>>> >
>>> >      2. Unifies the chain aggregation in the merged stores across code
>>> >         paths
>>> >
>>> >      3. Re-add the Store node to the worklist after calling
>>> >         SimplifyDemandedBits.
>>> >
>>> >      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
>>> >         arbitrary, but seems sufficient to not cause regressions in
>>> >         tests.
>>> >
>>> >      5. Remove Chain dependencies of memory operations on CopyFromReg
>>> >         nodes, as these are captured by data dependence.
>>> >
>>> >      6. Forward load-store values through TokenFactors containing
>>> >         {CopyToReg,CopyFromReg} values.
>>> >
>>> >      7. Peephole to convert buildvector of extract_vector_elt to
>>> >         extract_subvector if possible (see
>>> >         CodeGen/AArch64/store-merge.ll)
>>> >
>>> >      8. Store merging for the ARM target is restricted to 32-bit, as
>>> >         in some contexts invalid 64-bit operations are being
>>> >         generated. This can be removed once appropriate checks are
>>> >         added.
>>> >
>>> >    This finishes the change Matt Arsenault started in r246307 and
>>> >    jyknight's original patch.
>>> >
>>> >    Many tests required some changes as memory operations are now
>>> >    reorderable, improving load-store forwarding. One test in
>>> >    particular is worth noting:
>>> >
>>> >      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
>>> >      forwarding converts a load-store pair into a parallel store and
>>> >      a memory-realized bitcast of the same value. However, because we
>>> >      lose the sharing of the explicit and implicit store values we
>>> >      must create another local store. A similar transformation
>>> >      happens before SelectionDAG as well.
>>> >
>>> >    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
>>> >
>>> > Added:
>>> >    llvm/trunk/test/CodeGen/X86/pr32108.ll
>>> > Removed:
>>> >    llvm/trunk/test/CodeGen/X86/combiner-aa-0.ll
>>> >    llvm/trunk/test/CodeGen/X86/combiner-aa-1.ll
>>> >    llvm/trunk/test/CodeGen/X86/pr18023.ll
>>> > Modified:
>>> >    llvm/trunk/include/llvm/Target/TargetLowering.h
>>> >    llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
>>> >    llvm/trunk/lib/CodeGen/TargetLoweringBase.cpp
>>> >    llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.cpp
>>> >    llvm/trunk/lib/Target/ARM/ARMISelLowering.h
>>> >    llvm/trunk/test/CodeGen/AArch64/argument-blocks.ll
>>> >    llvm/trunk/test/CodeGen/AArch64/arm64-abi.ll
>>> >    llvm/trunk/test/CodeGen/AArch64/arm64-memset-inline.ll
>>> >    llvm/trunk/test/CodeGen/AArch64/arm64-variadic-aapcs.ll
>>> >    llvm/trunk/test/CodeGen/AArch64/merge-store.ll
>>> >    llvm/trunk/test/CodeGen/AArch64/vector_merge_dep_check.ll
>>> >    llvm/trunk/test/CodeGen/AMDGPU/debugger-insert-nops.ll
>>> >    llvm/trunk/test/CodeGen/AMDGPU/insert_vector_elt.ll
>>> >    llvm/trunk/test/CodeGen/AMDGPU/merge-stores.ll
>>> >    llvm/trunk/test/CodeGen/AMDGPU/private-element-size.ll
>>> >    llvm/trunk/test/CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll
>>> >    llvm/trunk/test/CodeGen/ARM/2012-10-04-AAPCS-byval-align8.ll
>>> >    llvm/trunk/test/CodeGen/ARM/alloc-no-stack-realign.ll
>>> >    llvm/trunk/test/CodeGen/ARM/gpr-paired-spill.ll
>>> >    llvm/trunk/test/CodeGen/ARM/ifcvt10.ll
>>> >    llvm/trunk/test/CodeGen/ARM/illegal-bitfield-loadstore.ll
>>> >    llvm/trunk/test/CodeGen/ARM/static-addr-hoisting.ll
>>> >    llvm/trunk/test/CodeGen/BPF/undef.ll
>>> >    llvm/trunk/test/CodeGen/MSP430/Inst16mm.ll
>>> >    llvm/trunk/test/CodeGen/Mips/cconv/arguments-float.ll
>>> >    llvm/trunk/test/CodeGen/Mips/cconv/arguments-varargs.ll
>>> >    llvm/trunk/test/CodeGen/Mips/fastcc.ll
>>> >    llvm/trunk/test/CodeGen/Mips/load-store-left-right.ll
>>> >    llvm/trunk/test/CodeGen/Mips/micromips-li.ll
>>> >    llvm/trunk/test/CodeGen/Mips/mips64-f128-call.ll
>>> >    llvm/trunk/test/CodeGen/Mips/mips64-f128.ll
>>> >    llvm/trunk/test/CodeGen/Mips/mno-ldc1-sdc1.ll
>>> >    llvm/trunk/test/CodeGen/Mips/msa/f16-llvm-ir.ll
>>> >    llvm/trunk/test/CodeGen/Mips/msa/i5_ld_st.ll
>>> >    llvm/trunk/test/CodeGen/Mips/o32_cc_byval.ll
>>> >    llvm/trunk/test/CodeGen/Mips/o32_cc_vararg.ll
>>> >    llvm/trunk/test/CodeGen/PowerPC/anon_aggr.ll
>>> >    llvm/trunk/test/CodeGen/PowerPC/complex-return.ll
>>> >    llvm/trunk/test/CodeGen/PowerPC/jaggedstructs.ll
>>> >    llvm/trunk/test/CodeGen/PowerPC/ppc64-align-long-double.ll
>>> >    llvm/trunk/test/CodeGen/PowerPC/structsinmem.ll
>>> >    llvm/trunk/test/CodeGen/PowerPC/structsinregs.ll
>>> >    llvm/trunk/test/CodeGen/SystemZ/unaligned-01.ll
>>> >    llvm/trunk/test/CodeGen/Thumb/2010-07-15-debugOrdering.ll
>>> >    llvm/trunk/test/CodeGen/Thumb/stack-access.ll
>>> >    llvm/trunk/test/CodeGen/X86/2010-09-17-SideEffectsInChain.ll
>>> >    llvm/trunk/test/CodeGen/X86/2012-11-28-merge-store-alias.ll
>>> >    llvm/trunk/test/CodeGen/X86/MergeConsecutiveStores.ll
>>> >    llvm/trunk/test/CodeGen/X86/avx-vbroadcast.ll
>>> >    llvm/trunk/test/CodeGen/X86/avx512-mask-op.ll
>>> >    llvm/trunk/test/CodeGen/X86/chain_order.ll
>>> >    llvm/trunk/test/CodeGen/X86/clear_upper_vector_element_bits.ll
>>> >    llvm/trunk/test/CodeGen/X86/copy-eflags.ll
>>> >    llvm/trunk/test/CodeGen/X86/dag-merge-fast-accesses.ll
>>> >    llvm/trunk/test/CodeGen/X86/dont-trunc-store-double-to-float.ll
>>> >    llvm/trunk/test/CodeGen/X86/extractelement-legalization-store-ordering.ll
>>> >    llvm/trunk/test/CodeGen/X86/i256-add.ll
>>> >    llvm/trunk/test/CodeGen/X86/i386-shrink-wrapping.ll
>>> >    llvm/trunk/test/CodeGen/X86/live-range-nosubreg.ll
>>> >    llvm/trunk/test/CodeGen/X86/longlong-deadload.ll
>>> >    llvm/trunk/test/CodeGen/X86/merge-consecutive-loads-128.ll
>>> >    llvm/trunk/test/CodeGen/X86/merge-consecutive-loads-256.ll
>>> >    llvm/trunk/test/CodeGen/X86/merge-store-partially-alias-loads.ll
>>> >    llvm/trunk/test/CodeGen/X86/split-store.ll
>>> >    llvm/trunk/test/CodeGen/X86/stores-merging.ll
>>> >    llvm/trunk/test/CodeGen/X86/vector-compare-results.ll
>>> >    llvm/trunk/test/CodeGen/X86/vector-shuffle-variable-128.ll
>>> >    llvm/trunk/test/CodeGen/X86/vector-shuffle-variable-256.ll
>>> >    llvm/trunk/test/CodeGen/X86/vectorcall.ll
>>> >    llvm/trunk/test/CodeGen/X86/win32-eh.ll
>>> >    llvm/trunk/test/CodeGen/XCore/varargs.ll
>>>
>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>

