[llvm-dev] [RFC] Add IR level interprocedural outliner for code size.

Tue Jul 25 09:26:47 PDT 2017

Hi Sean,

> On Jul 24, 2017, at 9:02 PM, Sean Silva <chisophugis at gmail.com> wrote:
> 
> 
> 
> On Mon, Jul 24, 2017 at 6:24 PM, Quentin Colombet via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Hi River,
> 
> Thanks for the detailed explanation.
> 
> If people are okay with you moving forward, like I said to Andrey, I won’t oppose. I feel sad we have to split our effort on outlining technology, but I certainly don’t pretend to know what is best!
> The bottom line is if people are happy with that going in, the conversation on the details can continue in parallel.
> 
>> On Jul 24, 2017, at 4:56 PM, River Riddle <riddleriver at gmail.com> wrote:
>> 
>> Hey Quentin,
>>  Sorry I missed the last question. Currently it doesn't do continual outlining, but it's definitely a possibility.
> 
> Ok, that means we probably won’t have the very bad runtime problems I had in mind with adding a lot of indirections. 
> 
>> Thanks,
>> River Riddle
>> 
>> On Mon, Jul 24, 2017 at 4:36 PM, River Riddle <riddleriver at gmail.com> wrote:
>> Hi Quentin,
>>   I understand your points and I believe that some meaning is being lost via email. For performance, it's true that that cost isn't necessarily modeled; there is currently only support for using profile data to mitigate that. I was working under the assumption, possibly incorrectly, that at Oz we favor small code over anything else, including runtime performance. This is why we only run at Os if we have profile data, and even then we are very conservative about what we outline from. 
>>   I also understand that some target hooks may be required for certain things; that happens for other IR-level passes as well. I just want to minimize that behavior as much as I can, though that's a personal choice.
>> As for a motivating reason to push for this, it actually solves some of the problems that you brought up in the post for the original Machine Outliner RFC. If I can quote you:
>> 
>> "So basically, better cost model. That's one part of the story.
>> 
>> The other part is that at the LLVM IR level, or before register allocation, identifying similar code sequences is much harder, at least with a suffix-tree-like algorithm. Basically the problem is how we name our instructions such that we can match them.
>> Let me take an example.
>> foo() {
>> /* bunch of code */
>> a = add b, c;
>> d = add e, f; 
>> }
>> 
>> bar() {
>> d = add e, g;
>> f = add c, w;
>> }
>> 
>> With proper renaming, we can outline both adds in one function. The difficulty is to recognize that they are semantically equivalent to give them the same identifier in the suffix tree. I won’t get into the details but it gets tricky quickly. We were thinking of reusing GVN to have such identifier if we wanted to do the outlining at IR level but solving this problem is hard."
>> 
>> This outliner will catch your case, it is actually one of the easiest cases for it to catch. The outliner can recognize semantically equivalent instructions and can be expanded to catch even more.
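The "proper renaming" in Quentin's example can be made concrete with a minimal sketch of one possible scheme. River does not describe his method in this thread, so this is an assumption, not his implementation: each operand is renamed to either the position of its defining instruction inside the candidate sequence or the order in which an external input is first used. The `Inst` struct and `canonicalKey` function are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified SSA-style instruction: result = opcode operands...
struct Inst {
  std::string result;
  std::string opcode;
  std::vector<std::string> operands;
};

// Builds a name-independent key for a sequence. Each operand becomes "r<i>"
// if it is the result of the i-th instruction of the sequence, or "in<k>" if
// it is the k-th distinct value defined outside the sequence (numbered in
// order of first use). In this simplified model, two sequences get the same
// key when they differ only by a consistent renaming of values.
std::string canonicalKey(const std::vector<Inst> &seq) {
  std::map<std::string, int> localDef;  // result name -> index in seq
  std::map<std::string, int> inputIdx;  // external value -> input number
  std::string key;
  for (std::size_t i = 0; i < seq.size(); ++i) {
    key += seq[i].opcode + "(";
    for (const std::string &op : seq[i].operands) {
      auto def = localDef.find(op);
      if (def != localDef.end()) {
        key += "r" + std::to_string(def->second);
      } else {
        // emplace leaves an existing entry untouched, so repeated uses of
        // the same external value share one input number.
        int next = static_cast<int>(inputIdx.size());
        auto it = inputIdx.emplace(op, next).first;
        key += "in" + std::to_string(it->second);
      }
      key += ",";
    }
    key += ");";
    localDef[seq[i].result] = static_cast<int>(i);
  }
  return key;
}
```

Both sequences from the example map to `add(in0,in1,);add(in2,in3,);`, so they could share one symbol in a suffix tree. This sketch ignores commutativity (`add c, b` does not match `add b, c`), which a real implementation would likely want to canonicalize.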
> 
> Interesting, could you explain how you do that?
> I didn’t see it in the original post.
> 
>> 
>> As for the cost model it is quite simple:
>>  - We identify all of the external inputs into the sequence. For estimating the benefit we constant fold and condense the inputs so that we can get the set of *unique* inputs into the sequence. 
> 
> Ok, those are your parameters. How do you account for the cost of setting up those parameters?
> 
>>  - We also identify outputs from the sequence, i.e. instructions that have external references. We add the cost of storing/loading/passing output parameters to the outlined function.
> 
> Ok, those are your return values (or your output parameters). I see the cost computation you do on those, but it still misses the general parameter setup cost.
> 
>>  - We identify one output to promote to a return value for the function. This can end up reducing an output parameter that is passed to our outlined function.
> 
> How do you choose that one?
> 
>>  - We estimate the cost of the sequence of instructions by mostly using TTI.
> 
> Sounds sensible. (Although I am not a fan of the whole TTI thing :)).
> 
>>  - With the above we can estimate the cost per occurrence as well as the cost for the new function. Of course these are mostly estimates, we lean towards the conservative side to be safe. 
> 
> Conservative in what sense? Put differently how do you know your cost is conservative?
> 
>>  - Finally we can compute an estimated benefit for the sequence taking into account benefit per occurrence and the estimated cost of the new function.
> 
> Makes sense.
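The cost-model steps River lists can be sketched as a single benefit function. This is a minimal illustration under stated assumptions, not the RFC's implementation: the `Candidate` struct, `estimatedBenefit`, and every unit cost below are hypothetical placeholders for what TTI or the target would supply. Unlike the description above, it also charges the per-argument setup cost Quentin asks about.

```cpp
#include <cassert>

// Hypothetical summary of one outlining candidate; all costs are abstract
// size units (e.g. an instruction count or a TTI-style estimate).
struct Candidate {
  int seqCost;         // estimated cost of the repeated instruction sequence
  int numOccurrences;  // number of places the sequence appears
  int numInputs;       // unique external inputs after constant folding/dedup
  int numOutputs;      // values defined in the sequence and used outside it
};

// Assumed unit costs; a real implementation would query the target.
constexpr int kCallCost = 1;      // the call instruction itself
constexpr int kArgSetupCost = 1;  // materializing one argument at a call site
constexpr int kLoadCost = 1;      // reloading one output parameter after the call
constexpr int kStoreCost = 1;     // storing one output parameter in the callee
constexpr int kRetCost = 1;       // the return in the outlined function

int estimatedBenefit(const Candidate &c) {
  assert(c.numOccurrences >= 2 && "outlining needs at least two occurrences");
  // One output is promoted to the return value; the rest are passed back
  // through pointer arguments (output parameters).
  int outParams = c.numOutputs > 0 ? c.numOutputs - 1 : 0;
  // Cost that replaces the sequence at every occurrence, including the
  // setup of input arguments and output-parameter pointers.
  int callSiteCost = kCallCost + kArgSetupCost * (c.numInputs + outParams) +
                     kLoadCost * outParams;
  // One-time cost of the new function body.
  int functionCost = c.seqCost + kStoreCost * outParams + kRetCost;
  return c.numOccurrences * (c.seqCost - callSiteCost) - functionCost;
}
```

Under these assumed unit costs, a sequence of cost 10 with 2 inputs and 1 output, occurring 3 times, yields a benefit of 10, while a sequence of cost 4 with 3 inputs and 2 outputs, occurring twice, yields -10 and would not be outlined.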
> 
>> 
>> There is definitely room for improvement in the cost model. I do not believe it's the best, but it's shown to be somewhat reliable for most cases. It has benefits and it has regressions, as does the machine outliner.
> 
> Regressions in what sense?
> Do you actually have functions that are bigger?
> 
> To clarify, AFAIR, the machine outliner does not have such regressions per se. The outliner does exactly what the cost model predicted: functions are individually smaller. Code size growth may happen because of padding in the object file (and getting in the way of some linker optimizations).
> 
> 
> The current MIR outliner doesn't take into account instruction encoding length either. Considering that on x86 instructions can commonly be both 1 and 10+ bytes long, the variability from not modeling that is probably comparable to the inaccuracy of estimating a fixed cost for an LLVM IR instruction w.r.t. its lowering to machine code.

Right, I forgot about x86, AArch64 being the primary target when we did that :).
Frankly, there is nothing hard about doing the estimate on the actual encoding at this stage of the IR. The reasons why we didn’t do it were:
1. AArch64 does not have such differences (every instruction is 4 bytes).
2. It takes some (compile) time to compute that information, and given we didn’t need it for AArch64, we didn’t do it.

> 
> As a simple example, many commonly used x86 instructions encode to more than 5 bytes, which is the size of a call instruction. So an outlined function that consists of a single instruction can be profitable. IIRC there was about a 5% code size saving just from outlining single instructions (that encode to >5 bytes) at the machine code level on the test case I looked at (I mentioned it in a comment on one of Jessica's patches IIRC).

Good point!
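Sean's single-instruction argument reduces to simple byte arithmetic. A minimal sketch, assuming x86-64 with a 5-byte near `CALL rel32` and a 1-byte `RET`, no argument setup (at the MIR level the outlined instruction is identical at every site), and no alignment padding; it also ignores stack-pointer-relative operands, whose offsets shift inside the callee. `singleInstSavings` is a hypothetical helper.

```cpp
#include <cassert>

// Assumed x86-64 encodings: near CALL rel32 is 5 bytes, RET is 1 byte.
constexpr int kCallBytes = 5;
constexpr int kRetBytes = 1;

// Net bytes saved by outlining one instruction of `instBytes` bytes that
// occurs `occurrences` times. Each site shrinks by (instBytes - 5), and the
// outlined function costs the instruction plus a RET, paid once.
int singleInstSavings(int instBytes, int occurrences) {
  int savedAtCallSites = occurrences * (instBytes - kCallBytes);
  int outlinedFunctionBytes = instBytes + kRetBytes;
  return savedAtCallSites - outlinedFunctionBytes;
}
```

A 7-byte instruction repeated 10 times saves 12 bytes, a 6-byte instruction only breaks even at 7 occurrences, and an instruction of 5 bytes or fewer can never pay off, which is why the exact encoded length matters for this decision.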

> 
> Do we have a way to get an instruction's exact encoded length for the MIR outliner?

Not right now, IIRC, but I would say that’s half a day of effort at most. To be fair, I haven’t followed the development of the code base since Jessica’s internship. Jessica would know the details for sure.

Cheers,
-Quentin

> 
> -- Sean Silva
>  