[PATCH] Devirtualize llvm::Value and all subclasses

Pete Cooper peter_cooper at apple.com
Fri Jun 26 15:05:37 PDT 2015


> On Jun 26, 2015, at 2:14 PM, David Blaikie <dblaikie at gmail.com> wrote:
> 
> 
> 
> On Fri, Jun 26, 2015 at 1:50 PM, Pete Cooper <peter_cooper at apple.com <mailto:peter_cooper at apple.com>> wrote:
> 
>> On Jun 26, 2015, at 1:42 PM, David Blaikie <dblaikie at gmail.com <mailto:dblaikie at gmail.com>> wrote:
>> 
>> Completely shooting from the hip, we have been seeing some slow-downs over the last week or so & there's at least a vague guess that it could be related to these pervasive data structure changes that have been going in within that window.
> Thats a reasonable guess.  I’d think the same.
>> 
>> Have you been running/seen any compile-time perf changes in that same window?
> I have yeah.  We have the bitcode saved from the linker for verify_uselist_order with and without debug info and before/after optimizations.  I’ve been putting those through either opt/llc and measuring compile time.  It never strayed from noise TBH.
> 
> What about broader perf numbers? Not seeing any significant fluctuations in the last 7-10 days?
I just took a look at our internal compile-time graphs for SPEC2000.  They are showing a general upwards trend recently which is worrying.

Looking at 2 ~1% spikes, I have commits in the range r239993-r240004 and r240426-r240456.

There are none of mine in those particular ranges (i’m totally not trying to pass the buck here BTW, just that the largest recent spikes I happened to look at don’t have my patches).  Its very possible my patches have a negative compile time impact which is part of the general trend and doesn’t show as a spike.

Prior to those ranges, there’s another compile-time spike we never recovered from in the range r237378-r237422.  I believe the likely candidate here is r237395 "Add another InstCombine pass after LoopUnroll.”.  However, i’m not saying we should necessarily change that commit as it likely gives runtime perf gains which make up for the compile time (speculating here, didn’t actually check).  Anyway, its an old commit range at this point, just wanted to point it out while I was looking at the graphs anyway, and it caught my attention.
> 
>> One of the things I was thinking, if this is an issue, is that we could do something vtable-esque. Since we already have an enum to differentiate the various derived classes - could we have an array of manual vtables, indexed by that enum, and explicitly invoke through that? If the switch stuff is slowing things down - possible that it isn't, and the switch is cheaper than the indirect call, etc.
> I was about to reply ‘surely the optimizer will just make a jump table from the switch which is just as fast’.
> 
> Could be, I really don't know.
>  
> However, then I thought about what that would look like, and its not quite what you’re suggesting.  The jump table optimization would only jump to a block which then does the call, but the call itself is then another jump.
> 
> We couldn’t rely on it 100%, but I wonder if we can teach the optimizer to optimize jump tables to calls where every call has identical argument lists.
> 
> Anyway, it probably wouldn’t help with the performance tracking for me to commit this (assuming someone gives it a LGTM), so i’ll hold off for now.
> 
> Yeah, until you/I/we/someone has actually reliable perf stuff that points to any particular change or changes, I probably wouldn't start shooting in the dark - just figured I'd bring it up in case it's something you're already aware of or, upon looking, would become so.
Thanks for checking, and let me know if I can be of any more help.  I’ll certainly be more stringent with checking compile time on these patches.  Don’t want to even slightly help compile time get any worse.

Cheers,
Pete
> 
> - Dave
>  
> 
> Pete
>> 
>> Just some idle thoughts.
>> 
>> On Tue, Jun 16, 2015 at 4:46 PM, Pete Cooper <peter_cooper at apple.com <mailto:peter_cooper at apple.com>> wrote:
>> Hi Nick
>> 
>> As you were the last person to attempt this on PR889, I thought i’d see how you like this series of patches.
>> 
>> I’ve tried to devirtualize a small number of virtual methods at a time, hence the bunch of patches.  Unfortunately the last one is pretty huge as it handles all the places we used to call delete on Value’s and now need to call a destroy method to dispatch to the correct subclass.  I’m not sure there’s a great way to split the last one up.
>> 
>> David, i’ve CCed you specifically to ask about how safe some of the changes here are.  In particular, the way i’ve changed things is:
>> - If a subclass is final (and i’ve added final where I can), then I give it its own delete method which just does ‘using User::operator delete’.  Is this reasonable?  Do i need to handle any other kinds of delete like User does?
>> - If a subclass is not able to be final, then i’ve added a destroy() method and switch on the value kind to then call ‘delete cast<SubClass>(this)’
>> 
>> Thanks to Duncan for the tip on std::default_delete, i’ve made use of this on various places which had a std::unique_ptr of say Instruction or Value.  The default deleter for those calls destroy() instead of delete.  Deleter’s aren’t needed for final subclasses because I was able to give them a delete method instead.
>> 
>> I also extended helpers like DeleteContainerPointers to take a std::default_delete so that we can forward to the destroy methods where needed.
>> 
>> I measured memory usage on the verify_uselist_order test case Duncan has been using.  This contains debug info and is measured with 'llc ~/verify-uselistorder.lto.opt.bc -o /dev/null --filetype=obj -disable-verify’.  Prior to this change we consumed 814MB, and 800MB after.  So almost 2% savings.  This is in line with what I was expecting as the savings is 8 bytes per Value, but due to malloc alignment on Mac OS, we only save that on an average of half of values.
>> 
>> Finally, this is the order I did all the work locally.  If any of this makes sense to be squashed or split, please let me know.  In the first patch for example I create Constant.def, but I later change that to Value.def.  This one should obviously be fixed to be Value.def from the start.
>> 
>> Cheers,
>> Pete
>> 
>> 
> 
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150626/689ce6ab/attachment.html>


More information about the llvm-commits mailing list