[PATCH] [SDAG] Introduce a combined set to the DAG combiner which tracks nodes which have successfully round-tripped through the combine phase, and use this to ensure all operands to DAG nodes are visited by the combiner, even if they are only added...
chandlerc at gmail.com
Wed Jul 23 04:09:22 PDT 2014
A couple of random other notes about this patch....
- I did benchmark the compile time impact. For a "worst case" scenario in
my mind of taking a single bitcode for all of clang and running it through
just llc I observed a 3% slowdown of llc with this patch. I expect that to
translate into more like 1% or 2% of slowdown for LTO, and well under that
for normal compiles (which have the frontend in the critical path). So the
runtime cost isn't great, but isn't terribly scary to me.
- An obvious question is, why still populate the worklist with all the
nodes? The answer is because just starting from the root and letting the
de-dup'ed operand walk handle the rest causes more tests to fail, and I'm
already struggling to drag the test suite kicking and screaming through
these changes. It still might be nice eventually so that we move to a more
principled graph-traversal strategy. It may recover a (very minor) fraction
of the runtime cost, but nothing huge.
- With this functionality added, I can add legalization support to the last
DAG combine phase and successfully legalize newly created illegal nodes as
we effectively traverse the new nodes. I've successfully implemented the
PSHUFB matching logic on top of this. So this isn't just idle hardening of
the DAG combiner.
On Wed, Jul 23, 2014 at 3:58 AM, Chandler Carruth <chandlerc at gmail.com>
> Hi grosbach, echristo, hfinkel, arsenm,
> ...during the combine phase.
> This is critical to have the combiner reach nodes that are *introduced*
> during combining. Previously these would sometimes be visited and
> sometimes not be visited based on whether they happened to end up on the
> worklist or not. Now we always run them through the combiner.
> This fixes quite a few bad codegen test cases lurking in the suite while
> also being more principled. Among these, the TLS codegeneration is
> particularly exciting for programs that have this in the critical path
> like TSan-instrumented binaries (although I think they engineer to use
> a different TLS that is faster anyways).
> However, there are also a worrying number of cases in x86 where this
> causes us to combine things into a DAG that misses the expected patterns
> and instruction selects to something pretty terrible. =/ The vector sign
> extension patterns in particular seem quite broken by this.
> Again, assistance would be appreciated from those familiar with other
> targets to evaluate the nature of these changes. I haven't even really
> vetted Mips or R600 for correctness. That's the primary reason for
> needing pre-commit review here.
> If anyone wants to take a look at the x86 regressions around sign
> extension and add some patterns or fix the combines to survive this
> change that would be amazingly helpful...
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the llvm-commits