<div dir="ltr">Hi Gerolf,<div><br></div><div>Thanks for doing this testing! Hal, it sounds like we can make this switch as soon as you're happy to enable MDNoAlias by default.</div><div><br></div><div>Thanks!</div><div>

<br></div><div>James</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On 11 August 2014 04:33, Gerolf Hoflehner <span dir="ltr"><<a href="mailto:ghoflehner@apple.com" target="_blank">ghoflehner@apple.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Ditto for x86-64.<br>

<br>

-Gerolf<br>

<div class=""><br>

<br>

On Aug 7, 2014, at 2:29 PM, Gerolf Hoflehner <<a href="mailto:ghoflehner@apple.com">ghoflehner@apple.com</a>> wrote:<br>

<br>

> CINT2006 looks fine on ARM64. Performance changes are within the noise with a small uptick favoring the change.<br>

><br>

</div>> <PastedGraphic-3.pdf><br>

<div class="HOEnZb"><div class="h5">><br>

> On Aug 6, 2014, at 2:00 PM, Gerolf Hoflehner <<a href="mailto:ghoflehner@apple.com">ghoflehner@apple.com</a>> wrote:<br>

><br>

>> That looks interesting. I’ll kick off an initial set of test runs on x86-64 and ARM64 for CINT2006 at O3 with LTO on the ref input sets. It would be great if we could test more HPC workloads, though. Does anyone have ideas or a benchmark setup to try? On CINT2006 I only expect libquantum and hmmer to be sensitive to the change.<br>


>><br>

>> Cheers<br>

>> Gerolf<br>

>><br>

>> On Aug 6, 2014, at 5:56 AM, James Molloy <<a href="mailto:james.molloy@arm.com">james.molloy@arm.com</a>> wrote:<br>

>><br>

>>> Author: jamesm<br>

>>> Date: Wed Aug  6 07:56:19 2014<br>

>>> New Revision: 214963<br>

>>><br>

>>> URL: <a href="http://llvm.org/viewvc/llvm-project?rev=214963&view=rev" target="_blank">http://llvm.org/viewvc/llvm-project?rev=214963&view=rev</a><br>

>>> Log:<br>

>>> Add a new option -run-slp-after-loop-vectorization.<br>

>>><br>

>>> This swaps the order of the loop vectorizer and the SLP/BB vectorizers. It is disabled by default so we can do performance testing - ideally we want to change to having the loop vectorizer running first, and the SLP vectorizer using its leftovers instead of the other way around.<br>


>>><br>

>>><br>

>>> Modified:<br>

>>>  llvm/trunk/lib/Transforms/IPO/PassManagerBuilder.cpp<br>

>>><br>

>>> Modified: llvm/trunk/lib/Transforms/IPO/PassManagerBuilder.cpp<br>

>>> URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/IPO/PassManagerBuilder.cpp?rev=214963&r1=214962&r2=214963&view=diff" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/IPO/PassManagerBuilder.cpp?rev=214963&r1=214962&r2=214963&view=diff</a><br>


>>> ==============================================================================<br>

>>> --- llvm/trunk/lib/Transforms/IPO/PassManagerBuilder.cpp (original)<br>

>>> +++ llvm/trunk/lib/Transforms/IPO/PassManagerBuilder.cpp Wed Aug  6 07:56:19 2014<br>

>>> @@ -57,6 +57,13 @@ static cl::opt<bool> RunLoadCombine("com<br>

>>>                                   cl::Hidden,<br>

>>>                                   cl::desc("Run the load combining pass"));<br>

>>><br>

>>> +static cl::opt<bool><br>

>>> +RunSLPAfterLoopVectorization("run-slp-after-loop-vectorization",<br>

>>> +  cl::init(false), cl::Hidden,<br>

>>> +  cl::desc("Run the SLP vectorizer (and BB vectorizer) after the Loop "<br>

>>> +           "vectorizer instead of before"));<br>

>>> +<br>

>>> +<br>

>>> PassManagerBuilder::PassManagerBuilder() {<br>

>>>   OptLevel = 2;<br>

>>>   SizeLevel = 0;<br>

>>> @@ -227,21 +234,23 @@ void PassManagerBuilder::populateModuleP<br>

>>><br>

>>> if (RerollLoops)<br>

>>>   MPM.add(createLoopRerollPass());<br>

>>> -  if (SLPVectorize)<br>

>>> -    MPM.add(createSLPVectorizerPass());   // Vectorize parallel scalar chains.<br>

>>> -<br>

>>> -  if (BBVectorize) {<br>

>>> -    MPM.add(createBBVectorizePass());<br>

>>> -    MPM.add(createInstructionCombiningPass());<br>

>>> -    addExtensionsToPM(EP_Peephole, MPM);<br>

>>> -    if (OptLevel > 1 && UseGVNAfterVectorization)<br>

>>> -      MPM.add(createGVNPass());           // Remove redundancies<br>

>>> -    else<br>

>>> -      MPM.add(createEarlyCSEPass());      // Catch trivial redundancies<br>

>>> -<br>

>>> -    // BBVectorize may have significantly shortened a loop body; unroll again.<br>

>>> -    if (!DisableUnrollLoops)<br>

>>> -      MPM.add(createLoopUnrollPass());<br>

>>> +  if (!RunSLPAfterLoopVectorization) {<br>

>>> +    if (SLPVectorize)<br>

>>> +      MPM.add(createSLPVectorizerPass());   // Vectorize parallel scalar chains.<br>

>>> +<br>

>>> +    if (BBVectorize) {<br>

>>> +      MPM.add(createBBVectorizePass());<br>

>>> +      MPM.add(createInstructionCombiningPass());<br>

>>> +      addExtensionsToPM(EP_Peephole, MPM);<br>

>>> +      if (OptLevel > 1 && UseGVNAfterVectorization)<br>

>>> +        MPM.add(createGVNPass());           // Remove redundancies<br>

>>> +      else<br>

>>> +        MPM.add(createEarlyCSEPass());      // Catch trivial redundancies<br>

>>> +<br>

>>> +      // BBVectorize may have significantly shortened a loop body; unroll again.<br>

>>> +      if (!DisableUnrollLoops)<br>

>>> +        MPM.add(createLoopUnrollPass());<br>

>>> +    }<br>

>>> }<br>

>>><br>

>>> if (LoadCombine)<br>

>>> @@ -263,6 +272,26 @@ void PassManagerBuilder::populateModuleP<br>

>>> // as function calls, so that we can only pass them when the vectorizer<br>

>>> // changed the code.<br>

>>> MPM.add(createInstructionCombiningPass());<br>

>>> +<br>

>>> +  if (RunSLPAfterLoopVectorization) {<br>

>>> +    if (SLPVectorize)<br>

>>> +      MPM.add(createSLPVectorizerPass());   // Vectorize parallel scalar chains.<br>

>>> +<br>

>>> +    if (BBVectorize) {<br>

>>> +      MPM.add(createBBVectorizePass());<br>

>>> +      MPM.add(createInstructionCombiningPass());<br>

>>> +      addExtensionsToPM(EP_Peephole, MPM);<br>

>>> +      if (OptLevel > 1 && UseGVNAfterVectorization)<br>

>>> +        MPM.add(createGVNPass());           // Remove redundancies<br>

>>> +      else<br>

>>> +        MPM.add(createEarlyCSEPass());      // Catch trivial redundancies<br>

>>> +<br>

>>> +      // BBVectorize may have significantly shortened a loop body; unroll again.<br>

>>> +      if (!DisableUnrollLoops)<br>

>>> +        MPM.add(createLoopUnrollPass());<br>

>>> +    }<br>

>>> +  }<br>

>>> +<br>

>>> addExtensionsToPM(EP_Peephole, MPM);<br>

>>> MPM.add(createCFGSimplificationPass());<br>

>>><br>

>>><br>

>>><br>

>>> _______________________________________________<br>

>>> llvm-commits mailing list<br>

>>> <a href="mailto:llvm-commits@cs.uiuc.edu">llvm-commits@cs.uiuc.edu</a><br>

>>> <a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a><br>

>><br>

><br>

<br>

<br>


</div></div></blockquote></div><br></div>