[LLVMbugs] [Bug 14398] New: clang bootstrap builds generate inefficient code for chained calls to DAG.getNode()

bugzilla-daemon at llvm.org bugzilla-daemon at llvm.org
Tue Nov 20 20:43:51 PST 2012


http://llvm.org/bugs/show_bug.cgi?id=14398

             Bug #: 14398
           Summary: clang bootstrap builds generate inefficient code for
                    chained calls to DAG.getNode()
           Product: new-bugs
           Version: trunk
          Platform: Macintosh
        OS/Version: MacOS X
            Status: NEW
          Severity: normal
          Priority: P
         Component: new bugs
        AssignedTo: unassignedbugs at nondot.org
        ReportedBy: craig.topper at gmail.com
                CC: llvmbugs at cs.uiuc.edu
    Classification: Unclassified


While looking at the generated code for SelectionDAGBuilder.cpp, I noticed that
successive calls to DAG.getNode(), where each call depends on the result of a
previous call to DAG.getNode(), generate pretty inefficient sequences.

The basic sequence containing the problem looks something like this:
1. Copy EDX:RAX, containing the returned SDValue, to a fixed stack slot as if
it were a local variable.
2. Load from the stack slot just written to.
3. Store those registers into the stack frame for the next call.

Each return value gets allocated to its own unique stack object, not shared
with any other call. This results in a unique stack object for every call to
getNode() in a function. These stack objects are short-lived, but they aren't
being reused. And really there's no reason for them to exist at all.

Ideally the generated code would put EDX:RAX into the stack frame for the next
call directly. This would save 4 instructions per call to DAG.getNode().


Looking at the IR, I see clang has generated a store to an alloca representing
the local SDValue that the DAG.getNode() result is assigned to, then a memcpy
into a separate alloca. A pointer to this second alloca is passed to the next
DAG.getNode(), which has %"class.llvm::SDValue"* byval align 8 as its argument
type.

If getNode() had fewer arguments, the SDValue would be split into its fields
and passed in separate register arguments. But since getNode() has so many
arguments, clang decides that after the first 6 registers are used, the
remaining SDValue arguments need to be passed as byval pointers.
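
A reduced C++ sketch of the pattern may help when trying to reproduce this
outside of the LLVM tree. It is not taken from SelectionDAGBuilder.cpp: Value,
Graph, and buildChain are hypothetical stand-ins for SDValue, SelectionDAG, and
the calling code, and the argument list is only chosen so that, on x86-64 with
the SysV calling convention, the last two Value arguments should no longer fit
in the six integer argument registers.

```cpp
#if defined(__GNUC__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

// Hypothetical stand-in for llvm::SDValue: a pointer plus an unsigned index
// (16 bytes on a typical 64-bit target), small enough to be returned in
// registers.
struct Value {
  void *Node;
  unsigned ResNo;
};

// Hypothetical stand-in for SelectionDAG; it only counts the nodes it creates.
struct Graph {
  unsigned NumNodes = 0;

  // Assumed register usage on x86-64 SysV: this, Opcode, DL and VT take four
  // integer registers and A takes the remaining two, so B and C are passed in
  // memory.
  NOINLINE Value getNode(unsigned Opcode, long DL, long VT,
                         Value A, Value B, Value C) {
    (void)DL;
    (void)VT;
    ++NumNodes;
    // Fold the operands into the result so the calls cannot be dropped.
    return Value{this, Opcode + A.ResNo + B.ResNo + C.ResNo};
  }
};

// Each call consumes the result of the previous one as an in-memory argument,
// which is the chained shape described in this report.
NOINLINE unsigned buildChain(Graph &G) {
  Value Zero{nullptr, 0};
  Value N1 = G.getNode(1, 0, 0, Zero, Zero, Zero);
  Value N2 = G.getNode(2, 0, 0, Zero, N1, Zero);
  Value N3 = G.getNode(3, 0, 0, Zero, Zero, N2);
  return N3.ResNo;
}
```

Compiling something like this with optimization and looking at the assembly
for buildChain should show whether each returned value is first spilled to its
own stack slot and then reloaded and stored again for the next call.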

Since the IR passes see this alloca being passed as a pointer, they skip
optimizing it. There are also no lifetime intrinsics associated with it to
track it.

So it stays all the way to calling convention handling in the backend. At this
point the X86 calling convention handling seems to decide that all byval
arguments should be passed by copying the entire SDValue onto the stack for the
call, which is completely different from what the IR itself thinks will happen.

So now we're left with the store, the load, and another store, and no machine
instruction passes will touch them either.
