<html><head><meta http-equiv="Content-Type" content="text/html charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Hi Michael<div class=""><br class=""></div><div class="">Thanks a lot for the update and thanks for looking into the problem.</div><div class=""><br class=""></div><div class="">Steven</div><div class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Feb 21, 2017, at 10:55 AM, Matthias Braun <<a href="mailto:matze@braunis.de" class="">matze@braunis.de</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><meta http-equiv="Content-Type" content="text/html charset=us-ascii" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class="">Thanks for looking into this and keeping us updated!</div><br class=""><div class=""><blockquote type="cite" class=""><div class="">On Feb 21, 2017, at 10:51 AM, Michael Kuperstein via llvm-commits <<a href="mailto:llvm-commits@lists.llvm.org" class="">llvm-commits@lists.llvm.org</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class="">An update for those following at home, since we have no PR: this is <a href="https://reviews.llvm.org/D30159" class="">https://reviews.llvm.org/D30159</a></div><div class="gmail_extra"><br class=""><div class="gmail_quote">On Thu, Feb 16, 2017 at 7:34 PM, Shahid, Asghar-ahmad <span dir="ltr" class=""><<a href="mailto:Asghar-ahmad.Shahid@amd.com" target="_blank" class="">Asghar-ahmad.Shahid@amd.com</a>></span> wrote:<br class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div lang="EN-US" link="blue" vlink="purple" class="">
<div class="m_-903504498871400217WordSection1"><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d" class="">Sure, thanks for letting me know.<u class=""></u><u class=""></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d" class=""><u class=""></u> <u class=""></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d" class="">-Shahid<u class=""></u><u class=""></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d" class=""><u class=""></u> <u class=""></u></span></p><p class="MsoNormal"><b class=""><span style="font-size:11.0pt;font-family:"Calibri",sans-serif" class="">From:</span></b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif" class=""> Michael Kuperstein [mailto:<a href="mailto:mkuper@google.com" target="_blank" class="">mkuper@google.com</a>]
<br class="">
<b class="">Sent:</b> Friday, February 17, 2017 7:40 AM<br class="">
<b class="">To:</b> Steven Wu <<a href="mailto:stevenwu@apple.com" target="_blank" class="">stevenwu@apple.com</a>>; Shahid, Asghar-ahmad <<a href="mailto:Asghar-ahmad.Shahid@amd.com" target="_blank" class="">Asghar-ahmad.Shahid@amd.com</a>><br class="">
<b class="">Cc:</b> llvm-commits <<a href="mailto:llvm-commits@lists.llvm.org" target="_blank" class="">llvm-commits@lists.llvm.org</a>>; Matthias Braun <<a href="mailto:matze@braunis.de" target="_blank" class="">matze@braunis.de</a>><br class="">
<b class="">Subject:</b> Re: [llvm] r294027 - [SLP] Use SCEV to sort memory accesses.<u class=""></u><u class=""></u></span></p><div class=""><div class="h5"><p class="MsoNormal"><u class=""></u> <u class=""></u></p>
<div class=""><p class="MsoNormal">This looks like it's a bug in r293386 - but it's exposed in r294027 because that lets the optimization introduced in r293386 cover more cases.<u class=""></u><u class=""></u></p>
<div class=""><p class="MsoNormal"><u class=""></u> <u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">Small reproducer:<u class=""></u><u class=""></u></p>
</div>
<div class="">
<div class=""><p class="MsoNormal"><u class=""></u> <u class=""></u></p>
<div class="">
<div class=""><p class="MsoNormal">target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">target triple = "x86_64-unknown-linux-gnu"<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"><u class=""></u> <u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">define <4 x i32> @zot() #0 {<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">bb:<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"> %p0 = getelementptr inbounds [4 x i8], [4 x i8]* undef, i64 undef, i64 0<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"> %p1 = getelementptr inbounds [4 x i8], [4 x i8]* undef, i64 undef, i64 1<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"> %p2 = getelementptr inbounds [4 x i8], [4 x i8]* undef, i64 undef, i64 2<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"> %p3 = getelementptr inbounds [4 x i8], [4 x i8]* undef, i64 undef, i64 3<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"> %v3 = load i8, i8* %p3, align 1<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"> %v2 = load i8, i8* %p2, align 1<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"> %v0 = load i8, i8* %p0, align 1<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"> %v1 = load i8, i8* %p1, align 1<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"> %i0 = insertelement <4 x i8> undef, i8 %v1, i32 0<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"> %i1 = insertelement <4 x i8> %i0, i8 %v0, i32 1<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"> %i2 = insertelement <4 x i8> %i1, i8 %v2, i32 2<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"> %i3 = insertelement <4 x i8> %i2, i8 %v3, i32 3<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"> %ret = zext <4 x i8> %i3 to <4 x i32><u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"> ret <4 x i32> %ret<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">}<u class=""></u><u class=""></u></p>
</div>
</div>
</div>
</div>
<div class=""><p class="MsoNormal"><u class=""></u> <u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">Note the order of elements in the returned vector is 1, 0, 2, 3.<u class=""></u><u class=""></u></p>
</div>
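<div class=""><p class="MsoNormal">As a rough sketch of where that order comes from, in standalone C++ rather than actual vectorizer code: sort the loads by offset and record where each use-order lane ends up in the sorted sequence. The name computeUseOrderMask is hypothetical, and the sketch assumes the offsets are distinct constants.</p>
<pre class="">
```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper, for illustration only (not an LLVM API).
// Offsets[I] is the byte offset of the scalar load that feeds lane I.
// Returns a mask M such that lane I of the original bundle equals
// lane M[I] of a wide load performed in ascending address order.
// Assumes all offsets are distinct.
std::vector<int> computeUseOrderMask(const std::vector<std::int64_t> &Offsets) {
  std::vector<std::pair<std::int64_t, std::size_t>> Sorted;
  Sorted.reserve(Offsets.size());
  for (std::size_t I = 0; I < Offsets.size(); ++I)
    Sorted.emplace_back(Offsets[I], I);

  // Sort by offset; the second member remembers the original lane.
  std::sort(Sorted.begin(), Sorted.end());

  std::vector<int> Mask(Offsets.size());
  for (std::size_t Pos = 0; Pos < Sorted.size(); ++Pos)
    Mask[Sorted[Pos].second] = static_cast<int>(Pos);
  return Mask;
}
```
</pre>
<p class="MsoNormal">For the reproducer, the lanes are fed from offsets 1, 0, 2, 3, so computeUseOrderMask({1, 0, 2, 3}) yields the mask 1, 0, 2, 3.</p>
</div>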
<div class=""><p class="MsoNormal"><u class=""></u> <u class=""></u></p>
</div>
<div class=""><p class="MsoNormal" style="margin-bottom:12.0pt">Running this through opt -slp-vectorizer produces:<u class=""></u><u class=""></u></p>
<div class=""><p class="MsoNormal">define <4 x i32> @zot() {<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">bb:<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"> %p0 = getelementptr inbounds [4 x i8], [4 x i8]* undef, i64 undef, i64 0<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"> %p1 = getelementptr inbounds [4 x i8], [4 x i8]* undef, i64 undef, i64 1<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"> %p2 = getelementptr inbounds [4 x i8], [4 x i8]* undef, i64 undef, i64 2<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"> %p3 = getelementptr inbounds [4 x i8], [4 x i8]* undef, i64 undef, i64 3<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"> %0 = bitcast i8* %p0 to <4 x i8>*<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"> %1 = load <4 x i8>, <4 x i8>* %0, align 1<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"> %2 = extractelement <4 x i8> %1, i32 0<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"> %i0 = insertelement <4 x i8> undef, i8 %2, i32 0<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"> %3 = extractelement <4 x i8> %1, i32 1<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"> %i1 = insertelement <4 x i8> %i0, i8 %3, i32 1<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"> %4 = extractelement <4 x i8> %1, i32 2<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"> %i2 = insertelement <4 x i8> %i1, i8 %4, i32 2<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"> %5 = extractelement <4 x i8> %1, i32 3<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"> %i3 = insertelement <4 x i8> %i2, i8 %5, i32 3<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"> %ret = zext <4 x i8> %i3 to <4 x i32><u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"> ret <4 x i32> %ret<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">}<u class=""></u><u class=""></u></p>
</div>
</div>
<div class=""><p class="MsoNormal"><u class=""></u> <u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">And we lost the order: the lanes are now 0, 1, 2, 3, with no shuffle to restore 1, 0, 2, 3.<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"><u class=""></u> <u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">It looks like the problem is that we're getting into the load case of BoUpSLP::vectorizeTree() with E->NeedToShuffle set but no VL (SLPVectorizer.cpp:2612), so we don't generate a shuffle.<u class=""></u><u class=""></u></p>
</div>
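<div class=""><p class="MsoNormal">To make the missing step concrete, here is a small standalone C++ model of a wide load followed by the single-source shuffle that should have been emitted. This is not the vectorizer's code: Vec4, wideLoad and shuffle are hypothetical stand-ins for a 4 x i8 vector, the vector load, and a shufflevector with an undef second operand.</p>
<pre class="">
```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a <4 x i8> value.
using Vec4 = std::array<std::uint8_t, 4>;

// Models the wide load: lanes come back in memory (sorted) order.
Vec4 wideLoad(const std::uint8_t *Base) {
  return {Base[0], Base[1], Base[2], Base[3]};
}

// Models a single-source shufflevector:
// lane I of the result is lane Mask[I] of the input.
Vec4 shuffle(const Vec4 &In, const std::array<int, 4> &Mask) {
  Vec4 Out{};
  for (std::size_t I = 0; I < Out.size(); ++I)
    Out[I] = In[static_cast<std::size_t>(Mask[I])];
  return Out;
}
```
</pre>
<p class="MsoNormal">With memory holding 10, 11, 12, 13, wideLoad gives 10, 11, 12, 13, while the scalar code in the reproducer builds 11, 10, 12, 13. Applying shuffle with mask 1, 0, 2, 3 to the loaded vector gives 11, 10, 12, 13 again, which is the step the miscompiled output skips.</p>
</div>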
<div class=""><p class="MsoNormal"><u class=""></u> <u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">Mohammad, can you take a look?<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"><u class=""></u> <u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">Thanks,<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"> Michael<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"><u class=""></u> <u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"><u class=""></u> <u class=""></u></p>
<div class=""><p class="MsoNormal">On Thu, Feb 16, 2017 at 3:34 PM, Michael Kuperstein <<a href="mailto:mkuper@google.com" target="_blank" class="">mkuper@google.com</a>> wrote:<u class=""></u><u class=""></u></p>
<blockquote style="border:none;border-left:solid #cccccc 1.0pt;padding:0in 0in 0in 6.0pt;margin-left:4.8pt;margin-right:0in" class="">
<div class=""><p class="MsoNormal">Sure, I'll take a look.<u class=""></u><u class=""></u></p>
<div class=""><p class="MsoNormal">I'll need to dig up SPEC2000 first. :-)<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"><u class=""></u> <u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">I'll let you know if I have problems reproducing.<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"><u class=""></u> <u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">Thanks for letting me know,<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"> Michael<u class=""></u><u class=""></u></p>
</div>
</div>
<div class="">
<div class="">
<div class=""><p class="MsoNormal"><u class=""></u> <u class=""></u></p>
<div class=""><p class="MsoNormal">On Thu, Feb 16, 2017 at 3:03 PM, Steven Wu <<a href="mailto:stevenwu@apple.com" target="_blank" class="">stevenwu@apple.com</a>> wrote:<u class=""></u><u class=""></u></p>
<blockquote style="border:none;border-left:solid #cccccc 1.0pt;padding:0in 0in 0in 6.0pt;margin-left:4.8pt;margin-right:0in" class=""><p class="MsoNormal">Hi Michael<br class="">
<br class="">
This commit seems to break 177_mesa in SPEC2000 on x86_64 with -Os (at least on macOS). Can you take a look? Let me know if you need my help to reproduce or pin down the issue.<br class="">
<br class="">
Thanks<br class="">
<span style="color:#888888" class=""><br class="">
<span class="m_-903504498871400217gmail-m-1512937572344059937m444513980943101152hoenzb">Steven</span></span><u class=""></u><u class=""></u></p>
<div class="">
<div class=""><p class="MsoNormal" style="margin-bottom:12.0pt"><br class="">
> On Feb 3, 2017, at 11:09 AM, Michael Kuperstein via llvm-commits <<a href="mailto:llvm-commits@lists.llvm.org" target="_blank" class="">llvm-commits@lists.llvm.org</a>> wrote:<br class="">
><br class="">
> Author: mkuper<br class="">
> Date: Fri Feb 3 13:09:45 2017<br class="">
> New Revision: 294027<br class="">
><br class="">
> URL: <a href="http://llvm.org/viewvc/llvm-project?rev=294027&view=rev" target="_blank" class="">
http://llvm.org/viewvc/llvm-<wbr class="">project?rev=294027&view=rev</a><br class="">
> Log:<br class="">
> [SLP] Use SCEV to sort memory accesses.<br class="">
><br class="">
> This generalizes memory access sorting to use differences between SCEVs,<br class="">
> instead of relying on constant offsets. That allows us to properly do<br class="">
> SLP vectorization of non-sequentially ordered loads within loop bodies.<br class="">
><br class="">
> Differential Revision: <a href="https://reviews.llvm.org/D29425" target="_blank" class="">
https://reviews.llvm.org/<wbr class="">D29425</a><br class="">
><br class="">
> Modified:<br class="">
> llvm/trunk/include/llvm/<wbr class="">Analysis/LoopAccessAnalysis.h<br class="">
> llvm/trunk/lib/Analysis/<wbr class="">LoopAccessAnalysis.cpp<br class="">
> llvm/trunk/test/Transforms/<wbr class="">SLPVectorizer/X86/jumbled-<wbr class="">load.ll<br class="">
><br class="">
> Modified: llvm/trunk/include/llvm/<wbr class="">Analysis/LoopAccessAnalysis.h<br class="">
> URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Analysis/LoopAccessAnalysis.h?rev=294027&r1=294026&r2=294027&view=diff" target="_blank" class="">
http://llvm.org/viewvc/llvm-<wbr class="">project/llvm/trunk/include/<wbr class="">llvm/Analysis/<wbr class="">LoopAccessAnalysis.h?rev=<wbr class="">294027&r1=294026&r2=294027&<wbr class="">view=diff</a><br class="">
> ==============================<wbr class="">==============================<wbr class="">==================<br class="">
> --- llvm/trunk/include/llvm/<wbr class="">Analysis/LoopAccessAnalysis.h (original)<br class="">
> +++ llvm/trunk/include/llvm/<wbr class="">Analysis/LoopAccessAnalysis.h Fri Feb 3 13:09:45 2017<br class="">
> @@ -690,8 +690,14 @@ int64_t getPtrStride(<wbr class="">PredicatedScalarEvo<br class="">
> const ValueToValueMap &StridesMap = ValueToValueMap(),<br class="">
> bool Assume = false, bool ShouldCheckWrap = true);<br class="">
><br class="">
> -/// \brief Saves the sorted memory accesses in vector argument 'Sorted' after<br class="">
> -/// sorting the jumbled memory accesses.<br class="">
> +/// \brief Try to sort an array of loads / stores.<br class="">
> +///<br class="">
> +/// If all pointers refer to the same object, and the differences between all<br class="">
> +/// pointer operands are known to be constant, the array is sorted by offset,<br class="">
> +/// and returned in \p Sorted.<br class="">
> +///<br class="">
> +/// If those conditions do not hold, the output array is an arbitrary<br class="">
> +/// permutation of the input.<br class="">
> void sortMemAccesses(ArrayRef<Value *> VL, const DataLayout &DL,<br class="">
> ScalarEvolution &SE, SmallVectorImpl<Value *> &Sorted);<br class="">
><br class="">
><br class="">
> Modified: llvm/trunk/lib/Analysis/<wbr class="">LoopAccessAnalysis.cpp<br class="">
> URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Analysis/LoopAccessAnalysis.cpp?rev=294027&r1=294026&r2=294027&view=diff" target="_blank" class="">
http://llvm.org/viewvc/llvm-<wbr class="">project/llvm/trunk/lib/<wbr class="">Analysis/LoopAccessAnalysis.<wbr class="">cpp?rev=294027&r1=294026&r2=<wbr class="">294027&view=diff</a><br class="">
> ==============================<wbr class="">==============================<wbr class="">==================<br class="">
> --- llvm/trunk/lib/Analysis/<wbr class="">LoopAccessAnalysis.cpp (original)<br class="">
> +++ llvm/trunk/lib/Analysis/<wbr class="">LoopAccessAnalysis.cpp Fri Feb 3 13:09:45 2017<br class="">
> @@ -1058,30 +1058,47 @@ static unsigned getAddressSpaceOperand(V<br class="">
> return -1;<br class="">
> }<br class="">
><br class="">
> -/// Saves the memory accesses after sorting it into vector argument 'Sorted'.<br class="">
> void llvm::sortMemAccesses(<wbr class="">ArrayRef<Value *> VL, const DataLayout &DL,<br class="">
> ScalarEvolution &SE,<br class="">
> SmallVectorImpl<Value *> &Sorted) {<br class="">
> - SmallVector<std::pair<int, Value *>, 4> OffValPairs;<br class="">
> + SmallVector<std::pair<int64_t, Value *>, 4> OffValPairs;<br class="">
> + OffValPairs.reserve(VL.size())<wbr class="">;<br class="">
> + Sorted.reserve(VL.size());<br class="">
> +<br class="">
> + // Walk over the pointers, and map each of them to an offset relative to<br class="">
> + // first pointer in the array.<br class="">
> + Value *Ptr0 = getPointerOperand(VL[0]);<br class="">
> + const SCEV *Scev0 = SE.getSCEV(Ptr0);<br class="">
> + Value *Obj0 = GetUnderlyingObject(Ptr0, DL);<br class="">
> +<br class="">
> for (auto *Val : VL) {<br class="">
> - // Compute the constant offset from the base pointer of each memory accesses<br class="">
> - // and insert into the vector of key,value pair which needs to be sorted.<br class="">
> Value *Ptr = getPointerOperand(Val);<br class="">
> - unsigned AS = getAddressSpaceOperand(Val);<br class="">
> - unsigned PtrBitWidth = DL.getPointerSizeInBits(AS);<br class="">
> - Type *Ty = cast<PointerType>(Ptr-><wbr class="">getType())->getElementType();<br class="">
> - APInt Size(PtrBitWidth, DL.getTypeStoreSize(Ty));<br class="">
> -<br class="">
> - // FIXME: Currently the offsets are assumed to be constant.However this not<br class="">
> - // always true as offsets can be variables also and we would need to<br class="">
> - // consider the difference of the variable offsets.<br class="">
> - APInt Offset(PtrBitWidth, 0);<br class="">
> - Ptr-><wbr class="">stripAndAccumulateInBoundsCons<wbr class="">tantOffsets(DL, Offset);<br class="">
> - OffValPairs.push_back(std::<wbr class="">make_pair(Offset.getSExtValue(<wbr class="">), Val));<br class="">
> +<br class="">
> + // If a pointer refers to a different underlying object, bail - the<br class="">
> + // pointers are by definition incomparable.<br class="">
> + Value *CurrObj = GetUnderlyingObject(Ptr, DL);<br class="">
> + if (CurrObj != Obj0) {<br class="">
> + Sorted.append(VL.begin(), VL.end());<br class="">
> + return;<br class="">
> + }<br class="">
> +<br class="">
> + const SCEVConstant *Diff =<br class="">
> + dyn_cast<SCEVConstant>(SE.<wbr class="">getMinusSCEV(SE.getSCEV(Ptr), Scev0));<br class="">
> +<br class="">
> + // The pointers may not have a constant offset from each other, or SCEV<br class="">
> + // may just not be smart enough to figure out they do. Regardless,<br class="">
> + // there's nothing we can do.<br class="">
> + if (!Diff) {<br class="">
> + Sorted.append(VL.begin(), VL.end());<br class="">
> + return;<br class="">
> + }<br class="">
> +<br class="">
> + OffValPairs.emplace_back(Diff-<wbr class="">>getAPInt().getSExtValue(), Val);<br class="">
> }<br class="">
> +<br class="">
> std::sort(OffValPairs.begin()<wbr class="">, OffValPairs.end(),<br class="">
> - [](const std::pair<int, Value *> &Left,<br class="">
> - const std::pair<int, Value *> &Right) {<br class="">
> + [](const std::pair<int64_t, Value *> &Left,<br class="">
> + const std::pair<int64_t, Value *> &Right) {<br class="">
> return Left.first < Right.first;<br class="">
> });<br class="">
><br class="">
><br class="">
> Modified: llvm/trunk/test/Transforms/<wbr class="">SLPVectorizer/X86/jumbled-<wbr class="">load.ll<br class="">
> URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load.ll?rev=294027&r1=294026&r2=294027&view=diff" target="_blank" class="">
http://llvm.org/viewvc/llvm-<wbr class="">project/llvm/trunk/test/<wbr class="">Transforms/SLPVectorizer/X86/<wbr class="">jumbled-load.ll?rev=294027&r1=<wbr class="">294026&r2=294027&view=diff</a><br class="">
> ==============================<wbr class="">==============================<wbr class="">==================<br class="">
> --- llvm/trunk/test/Transforms/<wbr class="">SLPVectorizer/X86/jumbled-<wbr class="">load.ll (original)<br class="">
> +++ llvm/trunk/test/Transforms/<wbr class="">SLPVectorizer/X86/jumbled-<wbr class="">load.ll Fri Feb 3 13:09:45 2017<br class="">
> @@ -1,18 +1,18 @@<br class="">
> ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py<br class="">
> ; RUN: opt < %s -S -mtriple=x86_64-unknown -mattr=+avx -slp-threshold=-10 -slp-vectorizer | FileCheck %s<br class="">
><br class="">
> -<br class="">
> +@total = common global i32 0, align 4<br class="">
><br class="">
> define i32 @jumbled-load(i32* noalias nocapture %in, i32* noalias nocapture %inn, i32* noalias nocapture %out) {<br class="">
> ; CHECK-LABEL: @jumbled-load(<br class="">
> -; CHECK-NEXT: [[IN_ADDR:%.*]] = getelementptr inbounds i32, i32* %in, i64 0<br class="">
> +; CHECK-NEXT: [[IN_ADDR:%.*]] = getelementptr inbounds i32, i32* [[IN:%.*]], i64 0<br class="">
> ; CHECK-NEXT: [[GEP_1:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 3<br class="">
> ; CHECK-NEXT: [[GEP_2:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 1<br class="">
> ; CHECK-NEXT: [[GEP_3:%.*]] = getelementptr inbounds i32, i32* [[IN_ADDR]], i64 2<br class="">
> ; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32* [[IN_ADDR]] to <4 x i32>*<br class="">
> ; CHECK-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* [[TMP1]], align 4<br class="">
> ; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> undef, <4 x i32> <i32 1, i32 3, i32 2, i32 0><br class="">
> -; CHECK-NEXT: [[INN_ADDR:%.*]] = getelementptr inbounds i32, i32* %inn, i64 0<br class="">
> +; CHECK-NEXT: [[INN_ADDR:%.*]] = getelementptr inbounds i32, i32* [[INN:%.*]], i64 0<br class="">
> ; CHECK-NEXT: [[GEP_4:%.*]] = getelementptr inbounds i32, i32* [[INN_ADDR]], i64 2<br class="">
> ; CHECK-NEXT: [[GEP_5:%.*]] = getelementptr inbounds i32, i32* [[INN_ADDR]], i64 3<br class="">
> ; CHECK-NEXT: [[GEP_6:%.*]] = getelementptr inbounds i32, i32* [[INN_ADDR]], i64 1<br class="">
> @@ -20,10 +20,10 @@ define i32 @jumbled-load(i32* noalias no<br class="">
> ; CHECK-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* [[TMP4]], align 4<br class="">
> ; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> undef, <4 x i32> <i32 0, i32 1, i32 3, i32 2><br class="">
> ; CHECK-NEXT: [[TMP7:%.*]] = mul <4 x i32> [[TMP3]], [[TMP6]]<br class="">
> -; CHECK-NEXT: [[GEP_7:%.*]] = getelementptr inbounds i32, i32* %out, i64 0<br class="">
> -; CHECK-NEXT: [[GEP_8:%.*]] = getelementptr inbounds i32, i32* %out, i64 1<br class="">
> -; CHECK-NEXT: [[GEP_9:%.*]] = getelementptr inbounds i32, i32* %out, i64 2<br class="">
> -; CHECK-NEXT: [[GEP_10:%.*]] = getelementptr inbounds i32, i32* %out, i64 3<br class="">
> +; CHECK-NEXT: [[GEP_7:%.*]] = getelementptr inbounds i32, i32* [[OUT:%.*]], i64 0<br class="">
> +; CHECK-NEXT: [[GEP_8:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 1<br class="">
> +; CHECK-NEXT: [[GEP_9:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 2<br class="">
> +; CHECK-NEXT: [[GEP_10:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 3<br class="">
> ; CHECK-NEXT: [[TMP8:%.*]] = bitcast i32* [[GEP_7]] to <4 x i32>*<br class="">
> ; CHECK-NEXT: store <4 x i32> [[TMP7]], <4 x i32>* [[TMP8]], align 4<br class="">
> ; CHECK-NEXT: ret i32 undef<br class="">
> @@ -59,3 +59,116 @@ define i32 @jumbled-load(i32* noalias no<br class="">
><br class="">
> ret i32 undef<br class="">
> }<br class="">
> +<br class="">
> +; Make sure we can sort loads even if they have non-constant offsets, as long as<br class="">
> +; the offset *differences* are constant and computable by SCEV.<br class="">
> +define void @scev(i64 %N, i32* nocapture readonly %b, i32* nocapture readonly %c) {<br class="">
> +; CHECK-LABEL: @scev(<br class="">
> +; CHECK-NEXT: entry:<br class="">
> +; CHECK-NEXT: [[CMP_OUTER:%.*]] = icmp sgt i64 [[N:%.*]], 0<br class="">
> +; CHECK-NEXT: br i1 [[CMP_OUTER]], label [[FOR_BODY_PREHEADER:%.*]], label [[FOR_END:%.*]]<br class="">
> +; CHECK: for.body.preheader:<br class="">
> +; CHECK-NEXT: br label [[FOR_BODY:%.*]]<br class="">
> +; CHECK: for.body:<br class="">
> +; CHECK-NEXT: [[I_P:%.*]] = phi i64 [ [[ADD21:%.*]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]<br class="">
> +; CHECK-NEXT: [[TMP0:%.*]] = phi <4 x i32> [ [[TMP14:%.*]], [[FOR_BODY]] ], [ zeroinitializer, [[FOR_BODY_PREHEADER]] ]<br class="">
> +; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 [[I_P]]<br class="">
> +; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, i32* [[C:%.*]], i64 [[I_P]]<br class="">
> +; CHECK-NEXT: [[ADD3:%.*]] = or i64 [[I_P]], 1<br class="">
> +; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[ADD3]]<br class="">
> +; CHECK-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds i32, i32* [[C]], i64 [[ADD3]]<br class="">
> +; CHECK-NEXT: [[ADD9:%.*]] = or i64 [[I_P]], 2<br class="">
> +; CHECK-NEXT: [[ARRAYIDX10:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[ADD9]]<br class="">
> +; CHECK-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds i32, i32* [[C]], i64 [[ADD9]]<br class="">
> +; CHECK-NEXT: [[ADD15:%.*]] = or i64 [[I_P]], 3<br class="">
> +; CHECK-NEXT: [[ARRAYIDX16:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[ADD15]]<br class="">
> +; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32* [[ARRAYIDX]] to <4 x i32>*<br class="">
> +; CHECK-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* [[TMP1]], align 4<br class="">
> +; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> undef, <4 x i32> <i32 2, i32 1, i32 0, i32 3><br class="">
> +; CHECK-NEXT: [[TMP4:%.*]] = bitcast i32* [[ARRAYIDX]] to <4 x i32>*<br class="">
> +; CHECK-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* [[TMP4]], align 4<br class="">
> +; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> undef, <4 x i32> <i32 2, i32 1, i32 0, i32 3><br class="">
> +; CHECK-NEXT: [[ARRAYIDX18:%.*]] = getelementptr inbounds i32, i32* [[C]], i64 [[ADD15]]<br class="">
> +; CHECK-NEXT: [[TMP7:%.*]] = bitcast i32* [[ARRAYIDX1]] to <4 x i32>*<br class="">
> +; CHECK-NEXT: [[TMP8:%.*]] = load <4 x i32>, <4 x i32>* [[TMP7]], align 4<br class="">
> +; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> undef, <4 x i32> <i32 2, i32 1, i32 0, i32 3><br class="">
> +; CHECK-NEXT: [[TMP10:%.*]] = bitcast i32* [[ARRAYIDX1]] to <4 x i32>*<br class="">
> +; CHECK-NEXT: [[TMP11:%.*]] = load <4 x i32>, <4 x i32>* [[TMP10]], align 4<br class="">
> +; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <4 x i32> [[TMP11]], <4 x i32> undef, <4 x i32> <i32 2, i32 1, i32 0, i32 3><br class="">
> +; CHECK-NEXT: [[TMP13:%.*]] = add <4 x i32> [[TMP3]], [[TMP0]]<br class="">
> +; CHECK-NEXT: [[TMP14]] = add <4 x i32> [[TMP13]], [[TMP12]]<br class="">
> +; CHECK-NEXT: [[ADD21]] = add nuw nsw i64 [[I_P]], 4<br class="">
> +; CHECK-NEXT: [[CMP:%.*]] = icmp slt i64 [[ADD21]], [[N]]<br class="">
> +; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END_LOOPEXIT:%.*]]<br class="">
> +; CHECK: for.end.loopexit:<br class="">
> +; CHECK-NEXT: br label [[FOR_END]]<br class="">
> +; CHECK: for.end:<br class="">
> +; CHECK-NEXT: [[TMP15:%.*]] = phi <4 x i32> [ zeroinitializer, [[ENTRY:%.*]] ], [ [[TMP14]], [[FOR_END_LOOPEXIT]] ]<br class="">
> +; CHECK-NEXT: [[TMP16:%.*]] = extractelement <4 x i32> [[TMP15]], i32 0<br class="">
> +; CHECK-NEXT: [[TMP17:%.*]] = extractelement <4 x i32> [[TMP15]], i32 1<br class="">
> +; CHECK-NEXT: [[ADD22:%.*]] = add nsw i32 [[TMP17]], [[TMP16]]<br class="">
> +; CHECK-NEXT: [[TMP18:%.*]] = extractelement <4 x i32> [[TMP15]], i32 2<br class="">
> +; CHECK-NEXT: [[ADD23:%.*]] = add nsw i32 [[ADD22]], [[TMP18]]<br class="">
> +; CHECK-NEXT: [[TMP19:%.*]] = extractelement <4 x i32> [[TMP15]], i32 3<br class="">
> +; CHECK-NEXT: [[ADD24:%.*]] = add nsw i32 [[ADD23]], [[TMP19]]<br class="">
> +; CHECK-NEXT: store i32 [[ADD24]], i32* @total, align 4<br class="">
> +; CHECK-NEXT: ret void<br class="">
> +;<br class="">
> +entry:<br class="">
> + %cmp.outer = icmp sgt i64 %N, 0<br class="">
> + br i1 %cmp.outer, label %for.body.preheader, label %for.end<br class="">
> +<br class="">
> +for.body.preheader: ; preds = %entry<br class="">
> + br label %for.body<br class="">
> +<br class="">
> +for.body: ; preds = %for.body.preheader, %for.body<br class="">
> + %a4.p = phi i32 [ %add20, %for.body ], [ 0, %for.body.preheader ]<br class="">
> + %a3.p = phi i32 [ %add2, %for.body ], [ 0, %for.body.preheader ]<br class="">
> + %a2.p = phi i32 [ %add8, %for.body ], [ 0, %for.body.preheader ]<br class="">
> + %a1.p = phi i32 [ %add14, %for.body ], [ 0, %for.body.preheader ]<br class="">
> + %i.p = phi i64 [ %add21, %for.body ], [ 0, %for.body.preheader ]<br class="">
> + %arrayidx = getelementptr inbounds i32, i32* %b, i64 %i.p<br class="">
> + %0 = load i32, i32* %arrayidx, align 4<br class="">
> + %arrayidx1 = getelementptr inbounds i32, i32* %c, i64 %i.p<br class="">
> + %1 = load i32, i32* %arrayidx1, align 4<br class="">
> + %add = add i32 %0, %a3.p<br class="">
> + %add2 = add i32 %add, %1<br class="">
> + %add3 = or i64 %i.p, 1<br class="">
> + %arrayidx4 = getelementptr inbounds i32, i32* %b, i64 %add3<br class="">
> + %2 = load i32, i32* %arrayidx4, align 4<br class="">
> + %arrayidx6 = getelementptr inbounds i32, i32* %c, i64 %add3<br class="">
> + %3 = load i32, i32* %arrayidx6, align 4<br class="">
> + %add7 = add i32 %2, %a2.p<br class="">
> + %add8 = add i32 %add7, %3<br class="">
> + %add9 = or i64 %i.p, 2<br class="">
> + %arrayidx10 = getelementptr inbounds i32, i32* %b, i64 %add9<br class="">
> + %4 = load i32, i32* %arrayidx10, align 4<br class="">
> + %arrayidx12 = getelementptr inbounds i32, i32* %c, i64 %add9<br class="">
> + %5 = load i32, i32* %arrayidx12, align 4<br class="">
> + %add13 = add i32 %4, %a1.p<br class="">
> + %add14 = add i32 %add13, %5<br class="">
> + %add15 = or i64 %i.p, 3<br class="">
> + %arrayidx16 = getelementptr inbounds i32, i32* %b, i64 %add15<br class="">
> + %6 = load i32, i32* %arrayidx16, align 4<br class="">
> + %arrayidx18 = getelementptr inbounds i32, i32* %c, i64 %add15<br class="">
> + %7 = load i32, i32* %arrayidx18, align 4<br class="">
> + %add19 = add i32 %6, %a4.p<br class="">
> + %add20 = add i32 %add19, %7<br class="">
> + %add21 = add nuw nsw i64 %i.p, 4<br class="">
> + %cmp = icmp slt i64 %add21, %N<br class="">
> + br i1 %cmp, label %for.body, label %for.end.loopexit<br class="">
> +<br class="">
> +for.end.loopexit: ; preds = %for.body<br class="">
> + br label %for.end<br class="">
> +<br class="">
> +for.end: ; preds = %for.end.loopexit, %entry<br class="">
> + %a1.0.lcssa = phi i32 [ 0, %entry ], [ %add14, %for.end.loopexit ]<br class="">
> + %a2.0.lcssa = phi i32 [ 0, %entry ], [ %add8, %for.end.loopexit ]<br class="">
> + %a3.0.lcssa = phi i32 [ 0, %entry ], [ %add2, %for.end.loopexit ]<br class="">
> + %a4.0.lcssa = phi i32 [ 0, %entry ], [ %add20, %for.end.loopexit ]<br class="">
> + %add22 = add nsw i32 %a2.0.lcssa, %a1.0.lcssa<br class="">
> + %add23 = add nsw i32 %add22, %a3.0.lcssa<br class="">
> + %add24 = add nsw i32 %add23, %a4.0.lcssa<br class="">
> + store i32 %add24, i32* @total, align 4<br class="">
> + ret void<br class="">
> +}<br class="">
><br class="">
><br class="">
> ______________________________<wbr class="">_________________<br class="">
> llvm-commits mailing list<br class="">
> <a href="mailto:llvm-commits@lists.llvm.org" target="_blank" class="">llvm-commits@lists.llvm.org</a><br class="">
> <a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits" target="_blank" class="">
http://lists.llvm.org/cgi-bin/<wbr class="">mailman/listinfo/llvm-commits</a><u class=""></u><u class=""></u></p>
</div>
</div>
</blockquote>
</div><p class="MsoNormal"><u class=""></u> <u class=""></u></p>
</div>
</div>
</div>
</blockquote>
</div><p class="MsoNormal"><u class=""></u> <u class=""></u></p>
</div>
</div>
</div></div></div>
</div>
</blockquote></div><br class=""></div>
</div></blockquote></div><br class=""></div></div></blockquote></div><br class=""></div></body></html>