[LLVMdev] PATCH: Use size reduction -- wave2

heisenbug ggreif at gmail.com
Fri Apr 18 08:42:46 PDT 2008


On Apr 16, 8:25 pm, Dan Gohman <goh... at apple.com> wrote:
> On Apr 16, 2008, at 2:50 AM, heisenbug wrote:
>
>
>
> > And now here is my educated speculation:
> > There are 2 things that became slower
> > 1) Use::getUser()
> > 2) Use::get/set due to tagging.
>
> > The former is seldom called:
>
> > $ find lib -name "*.cpp" | xargs grep "getUser(" | wc -l
> >      41
>
> The majority of those aren't actually Use::getUser, but on the
> other hand this grep misses all the users of
> value_use_iterator::operator*, which is much more popular.
> Unfortunately, overloaded operators are not the easiest to
> grep for ;-).


Hi Dan,

thanks for this hint; it makes me look harder into reducing the cost of
getUser. My current idea is to do a very cheap heuristic inline, which
catches 100% of the cases for UnaryInstruction and 50% of the cases
for Binary/CmpInstruction, without doing a function call at all.
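Sketched as a toy (ToyUse, ToyUser, and the one-block layout here are
illustrative assumptions, not the actual LLVM classes), the idea is that with
operands co-allocated directly in front of their User, the scan for the User
terminates immediately for the last operand:

```cpp
#include <cstdlib>
#include <new>

struct ToyUser;

struct ToyUse {
  ToyUse *Prev = nullptr;  // where the real scheme hides its tag bits
  bool IsLast = false;     // toy stand-in for the low-bit "stop" tag

  // Cheap getUser: scan right until the last Use; the User starts
  // immediately after the operand array.
  ToyUser *getUser() {
    ToyUse *U = this;
    while (!U->IsLast)
      ++U;
    return reinterpret_cast<ToyUser *>(U + 1);
  }
};

struct ToyUser {
  unsigned NumOps = 0;
  // Operands live directly before the object.
  ToyUse *getOperandList() {
    return reinterpret_cast<ToyUse *>(this) - NumOps;
  }
};

// Allocate N operands followed by the User in one block (never freed
// in this toy).
ToyUser *createUser(unsigned N) {
  void *Mem = std::malloc(N * sizeof(ToyUse) + sizeof(ToyUser));
  ToyUse *Ops = static_cast<ToyUse *>(Mem);
  for (unsigned i = 0; i < N; ++i)
    new (Ops + i) ToyUse();
  Ops[N - 1].IsLast = true;
  ToyUser *U = new (Ops + N) ToyUser();
  U->NumOps = N;
  return U;
}
```

For a unary instruction the single operand is the last one, so the loop body
never runs; for a binary instruction the second operand hits the fast path
directly and the first needs one step -- hence the 100%/50% figures above.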

>
> > The second is counterbalanced with a faster access to the Use object
> > in most cases:
> >  With exception of PHINode and SwitchInst, the getOperand() function
> > (if called on a specialized "this" pointer) does a "this"-relative
> > access instead of getting OperandList pointer first and going thru
> > that. This was the case before for fixed-arity instructions, but now
> > it applies to variadic ones too that cannot perform operand list
> > resizing.
>
> > Some things got faster, like
> > 1) getOperand access on (say) CallInst is now fetching operands from
> > relative to "this". Also, there is no "new Use[n]" allocations (and
> > delete []) for these any more.
>
> This is fine, but technically it's an independent change that could
> be done without the getUser algorithm changes and tagging.
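The difference can be sketched with toy types (Use, OldCallInst, and
NewCallInst here are illustrative assumptions, not the real classes):

```cpp
// Before: a variadic instruction reaches its operands through a
// separately heap-allocated array ("new Use[n]") -- getOperand must
// first load the OperandList pointer, then index through it.
struct Use { int Val; };

struct OldCallInst {
  Use *OperandList;  // points into a separate allocation
  int getOperand(unsigned i) const { return OperandList[i].Val; }
};

// After: operands hang off the object itself at a fixed offset, so
// getOperand is a single this-relative access and the separate
// new[]/delete[] pair disappears.
struct NewCallInst {
  Use Ops[3];  // fixed storage inside the object
  int getOperand(unsigned i) const { return Ops[i].Val; }
};
```

Dropping the indirection also saves one heap allocation and deallocation per
instruction, which is where the memory win comes from.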
>
> > 2) no need to maintain Use::U
> > 3) data-cache related gains on smaller Use objects (not yet activated)
>
> > So, my idea is that these changes are performance neutral.
>
> > There are potentially more speedups to be realized:
> > - indexing operands in the decreasing direction eliminates the
> > getNumOperands() call in getOperand() for variadic arity instructions.
>
> At a glance, the only getOperand() that I saw that uses getNumOperands()
> is ReturnInst's, and that actually looks like a bug.
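Indexing in the decreasing direction can be sketched like this (toy names and
layout again, assumed for illustration): when the operand array ends exactly
where the object begins, operand i sits i+1 slots before `this`, regardless of
total arity, so getOperand never consults getNumOperands():

```cpp
#include <cstdlib>
#include <initializer_list>
#include <new>

struct Use { int Val; };

struct ToyVariadic {
  int getOperand(unsigned i) {
    // Operands end where the object begins; index backwards from there.
    Use *End = reinterpret_cast<Use *>(this);
    return End[-(static_cast<int>(i) + 1)].Val;
  }
};

// Build a ToyVariadic with its operands co-allocated in front of it
// (never freed in this toy).
ToyVariadic *createToy(std::initializer_list<int> Vals) {
  std::size_t N = Vals.size();
  void *Mem = std::malloc(N * sizeof(Use) + sizeof(ToyVariadic));
  Use *Ops = static_cast<Use *>(Mem);
  std::size_t Slot = N;
  for (int V : Vals)
    Ops[--Slot].Val = V;  // operand 0 ends up nearest to the object
  return new (Ops + N) ToyVariadic();
}
```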
>
> > - shaving off (unneeded) operand related administrative members from
> > User.
>
> Which members?

I am thinking of OperandList and NumOperands. In most User subclasses
these are trivially recomputable or constant. The only catch is that
getting them via a User* is a bit more costly (a virtual call or some
other trick). The question is how often the code calls getOperand
through an unspecialized User*.
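A minimal sketch of that trade-off (toy classes, assumed for illustration):
arity becomes a property of the subclass, so the per-object fields disappear,
but an unspecialized User* must go through a virtual call:

```cpp
// Instead of every User carrying a NumOperands field, fixed-arity
// subclasses recompute it. A specialized pointer gets a compile-time
// constant; an unspecialized ToyUser* pays for a virtual dispatch.
struct ToyUser {
  virtual ~ToyUser() = default;
  virtual unsigned getNumOperands() const = 0;  // recomputed, not stored
};

struct ToyUnaryOp : ToyUser {
  // Arity is a property of the class; no per-object storage needed.
  unsigned getNumOperands() const override { return 1; }
};

struct ToyBinaryOp : ToyUser {
  unsigned getNumOperands() const override { return 2; }
};
```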

>
>
>
> > I hope that this is interesting, but I'd like to ask anybody who is
> > comfortable with performance testing to help provide some hard
> > data :-)
>
> I agree that performance testing is not easy and requires resources,

I am on my way into this area. I now have a working Release setup
and am already in possession of hard numbers, which are pretty
encouraging. I see about 1% speed degradation on a cruel verifier test
(2004-09-29-VerifierIsReallySlow.llx), but even there the user time
is well within the cloud of measurements of the regular (non-tagged)
build. On all other tests I have seen no degradation so far. And this
was even before I moved the tag from the Val member of Use to the
Prev member.

> but I'm curious what's motivating this project here. My assumption

LLVM is intended to be a pretty much target-independent technology
with a compact delivery format and potentially good startup behaviour.
To actually be a viable system for a wide range of machines it
must be as memory efficient as possible. The in-memory data structures
are needed for the JIT and optimizer, so they must coexist with the
generated machine code for the platform. <Use> seems to be the
structure with the currently greatest bang-for-buck ratio, so I am
thinking about a solution for it :-)

> has been that no one would likely go to the extreme of inventing a
> mini-language encoded in the least significant bits of pointer
> members in successive structs unless they were pretty desperate for
> performance or scalability.

Yep. There are desperate needs. I am in the embedded world. When
iPhone/iPod-like devices are someday expected to routinely run many
multi-megainstruction programs, everybody else will see the difference
too.

Regarding the mini-language, I decided that the alternatives, like
linear search (see the Kernighan-Ritchie mini-language of
null-terminated strings and strlen :-) and map lookup, are far more
costly, especially in a multi-threaded environment. For completeness,
see my musings at
<http://heisenbug.blogspot.com/2008/04/llvm-data-structures-and-putting-use-on.html>.
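For readers unfamiliar with the underlying trick, here is pointer tagging in
isolation (a generic sketch, not LLVM's actual Use implementation): because
heap objects are aligned, the low bit of a pointer is always zero and can
carry a flag, at the cost of one mask on each access:

```cpp
#include <cstdint>

// Generic tagged pointer: stores a 1-bit tag in the low bit of an
// aligned pointer. Reading the tag is one AND, reading the pointer is
// one mask -- no extra memory per object.
template <typename T>
class TaggedPtr {
  std::uintptr_t Bits = 0;
  static_assert(alignof(T) >= 2, "need a spare low bit");

public:
  void set(T *P, bool Tag) {
    Bits = reinterpret_cast<std::uintptr_t>(P) |
           static_cast<std::uintptr_t>(Tag);
  }
  T *getPointer() const {
    return reinterpret_cast<T *>(Bits & ~std::uintptr_t(1));
  }
  bool getTag() const { return Bits & 1; }
};
```

The cost model in the paragraph above follows from this: decoding a tag is a
couple of ALU instructions on data already in a register, whereas a side-table
lookup touches other memory and needs synchronization under multi-threading.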

>
> Dan

Thanks for your feedback!

Cheers,

    Gabor

>
> _______________________________________________
> LLVM Developers mailing list
> LLVM... at cs.uiuc.edu        http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
