<div dir="ltr">And as Joey reminded me, I should clarify why I just nuked the aarch64 test: the test is useless. It CHECKs for an 'ldr' instruction.... which is loading into the zero register! Unsurprisingly, when this instruction gets run through the combiner, it gets removed.</div>

<div class="gmail_extra"><br><br><div class="gmail_quote">On Wed, Jul 23, 2014 at 4:09 AM, Chandler Carruth <span dir="ltr"><<a href="mailto:chandlerc@gmail.com" target="_blank">chandlerc@gmail.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">A couple of random other notes about this patch....<div><br></div><div>- I did benchmark the compile time impact. For a "worst case" scenario in my mind of taking a single bitcode for all of clang and running it through just llc I observed a 3% slowdown of llc with this patch. I expect that to translate into more like 1% or 2% of slowdown for LTO, and well under that for normal compiles (which have the frontend in the critical path). So the runtime cost isn't great, but isn't terribly scary to me.</div>


<div><br></div><div>- An obvious question is, why still populate the worklist with all the nodes? The answer is because just starting from the root and letting the de-dup'ed operand walk handle the rest causes more tests to fail, and I'm already struggling to drag the test suite kicking and screaming through these changes. It still might be nice eventually so that we move to a more principled graph-traversal strategy. It may recover a (very minor) fraction of the runtime cost, but nothing huge.</div>


<div><br></div><div>- With this functionality added, I can add legalization support to the last DAG combine phase and successfully legalize newly created illegal nodes as we effectively traverse the new nodes. I've successfully implemented the PSHUFB matching logic on top of this. So this isn't just idle hardening of the DAG combiner.</div>


</div><div class="gmail_extra"><br><br><div class="gmail_quote"><div><div class="h5">On Wed, Jul 23, 2014 at 3:58 AM, Chandler Carruth <span dir="ltr"><<a href="mailto:chandlerc@gmail.com" target="_blank">chandlerc@gmail.com</a>></span> wrote:<br>


</div></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="h5">Hi grosbach, echristo, hfinkel, arsenm,<br>

<br>

...during the combine phase.<br>

<br>

This is critical to have the combiner reach nodes that are *introduced*<br>

during combining. Previously these would sometimes be visited and<br>

sometimes not be visited based on whether they happened to end up on the<br>

worklist or not. Now we always run them through the combiner.<br>

<br>

This fixes quite a few bad codegen test cases lurking in the suite while<br>

also being more principled. Among these, the TLS codegeneration is<br>

particularly exciting for programs that have this in the critical path<br>

like TSan-instrumented binaries (although I think they engineer to use<br>

a different TLS that is faster anyways).<br>

<br>

However, there are also a worrying number of cases in x86 where this<br>

causes us to combine things into a DAG that misses the expected patterns<br>

and instruction selects to something pretty terrible. =/ The vector sign<br>

extension patterns in particular seem quite broken by this.<br>

<br>

Again, assistance would be appreciated from those familiar with other<br>

targets to evaluate the nature of these changes. I haven't even really<br>

vetted Mips or R600 for correctness. That's the primary reason for<br>

needing pre-commit review here.<br>

<br>

If anyone wants to take a look at the x86 regressions around sign<br>

extension and add some patterns or fix the combines to survive this<br>

change that would be amazingly helpful...<br>

<br>

<a href="http://reviews.llvm.org/D4638" target="_blank">http://reviews.llvm.org/D4638</a><br>

<br>

Files:<br>

  lib/CodeGen/SelectionDAG/DAGCombiner.cpp<br>

  test/CodeGen/AArch64/arm64-dagcombiner-indexed-load.ll<br>

  test/CodeGen/ARM/aapcs-hfa-code.ll<br>

  test/CodeGen/Mips/cmov.ll<br>

  test/CodeGen/R600/add_i64.ll<br>

  test/CodeGen/R600/or.ll<br>

  test/CodeGen/X86/2010-04-23-mmx-movdq2q.ll<br>

  test/CodeGen/X86/avx-sext.ll<br>

  test/CodeGen/X86/i8-umulo.ll<br>

  test/CodeGen/X86/jump_sign.ll<br>

  test/CodeGen/X86/lower-bitcast.ll<br>

  test/CodeGen/X86/pr15267.ll<br>

  test/CodeGen/X86/store-narrow.ll<br>

  test/CodeGen/X86/trunc-ext-ld-st.ll<br>

  test/CodeGen/X86/vector-idiv.ll<br>

  test/CodeGen/X86/widen_cast-1.ll<br>

  test/CodeGen/X86/widen_conv-1.ll<br>

  test/CodeGen/X86/widen_load-2.ll<br>

  test/CodeGen/X86/x86-64-tls-1.ll<br>

  test/CodeGen/X86/x86-setcc-int-to-fp-combine.ll<br>

<br></div></div>_______________________________________________<br>

llvm-commits mailing list<br>

<a href="mailto:llvm-commits@cs.uiuc.edu" target="_blank">llvm-commits@cs.uiuc.edu</a><br>

<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a><br>

<br></blockquote></div><br></div>

</blockquote></div><br></div>