[llvm-dev] [RFC] PT.2 Add IR level interprocedural outliner for code size.

Jessica Paquette via llvm-dev llvm-dev at lists.llvm.org
Wed Sep 27 16:01:24 PDT 2017


Hi Davide,

Thanks! I think that this is a really great thing to be pushing forward in general.

> That said, do we really want encapsulate the logic for finding
> candidates into an interface? It's unclear whether it should live but
> it seems much more akin to the stuff living in Transforms/Utils than a
> proper interface.
> So, IMHO it's much more suitable for an helper/utility.

This sounds good to me. We could probably simplify it down to a utility that either

- Finds all the repeated sequences in the program and returns them to the given outliner, which then applies its cost model to it
- Takes in a (possibly optional) cost model function, finds all repeated sequences that satisfy that model, and then returns those sequences. If no model is provided, then it will return all sequences.

> For what it concerns `mapInstructions()` [please correct me if I'm wrong].
> This bit of the interface should be responsible for value numbering
> the instructions, as far as I can tell.

That’s correct. 

> To the best of my understanding the way your outliner value numbers
> using a nominal approach, i.e. two instructions get the same value
> number if their string representation is the same (or something like
> that).


Yep. They have to match exactly.

> So, I wonder whether you plan to enhance the logic for value numbering
> in your current scheme. If you do, and you plan to share more code
> between IR and MI outliner, then it's a good idea I guess to have a
> common interface (I'd love to hear your thoughts on this one).

Absolutely. There are plenty of opportunities that the MachineOutliner misses due to simple things like register allocation, on top of things like the example you gave. A more powerful scheme, and moving the MachineOutliner pre-regalloc would be highly beneficial. Having a common interface here would be great. (This would be a really interesting way to analyze the overall structure of programs as well.)

- Jessica

> On Sep 27, 2017, at 3:23 PM, Davide Italiano <davide at freebsd.org> wrote:
> 
> On Wed, Sep 27, 2017 at 9:28 AM, Jessica Paquette via llvm-dev
> <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
>> 
>> I think that, given previous discussion on the topic, we might want a split
>> like this:
>> 
>> (1) Search structure
>> 
>> Suffix tree or suffix array.
>> 
>> (2) Numbering/mapping/congruence scheme
>> 
>> Every outliner should implement a function that maps instructions (or
>> whatever structure you want to outline, crazy thoughts…) to integers. That
>> should be passed to the search structure, which will return a list of
>> repeated sequences.
>> 
>> The MachineOutliner currently has an “InstructionMapper” struct in it. I
>> think we can probably build on that struct so that we can choose different
>> mapping schemes.
>> 
>> (3) Cost model/candidate selection scheme
>> 
>> Every outliner should implement a function that returns the outlining cost
>> or benefit of a candidate. In the MachineOutliner, this is a joint effort
>> between the pass and the target.
>> 
>> (4) Outlining scheme
>> 
>> Every outliner should implement a function that actually performs outlining.
>> That is, we should have a function that replaces each instance of a repeated
>> sequence of instructions with a call to a newly-created function. In the
>> MachineOutliner, the method by which we do this is decided on by the target.
>> 
>> So, we at the very least might want an interface that implements something
>> like
>> 
>> unsigned mapInstructions(…);
>> unsigned getOutliningCost(…);
>> unsigned findCandidates(…);
>> bool outline(…);
>> 
> 
> Hi Jessica, I tend to agree we should try to maximize code reuse in
> this context, and I think yours is a good step forward. In particular,
> I'm particularly confident the `findCandidates()` bits should be
> split.
> That said, do we really want encapsulate the logic for finding
> candidates into an interface? It's unclear whether it should live but
> it seems much more akin to the stuff living in Transforms/Utils than a
> proper interface.
> So, IMHO it's much more suitable for an helper/utility.
> 
> For what it concerns `mapInstructions()` [please correct me if I'm wrong].
> This bit of the interface should be responsible for value numbering
> the instructions, as far as I can tell.
> To the best of my understanding the way your outliner value numbers
> using a nominal approach, i.e. two instructions get the same value
> number if their string representation is the same (or something like
> that).
> 
> With the notion above,
> xorl %eax %eax
> movl %eax, $0
> 
> aren't really considered equivalent (although, in practice, they are).
> 
> The way I imagined outlining to work at the IR level (and how I once
> discussed with Dan) is leveraging the result of a value number
> analysis (NewGVN) to assign numbers.
> 
> While the problem NewGVN solves is that of finding all the Herbrand
> equivalences (structural equivalence, same operation and operand
> structurally congruent), it also does a canonicalization step calling
> InstSimplify to canonicalize (so, in this sense it catches more
> stuffs, e.g.:
> 
> %x = add %a, 0
> %y = sub %b, 0
> not equivalent
> 
> instSimplify(%y) -> add %a, 0
> now they're equivalent.
> 
> Note: The current IR outlliner doesn't really use the approach I just
> described, yet, but it's not hard to imagine extending (or rewriting
> it to use a real Global Value number analysis to get instructions that
> are equivalent, for some definition of  structural equivalency)
> 
> So, I wonder whether you plan to enhance the logic for value numbering
> in your current scheme. If you do, and you plan to share more code
> between IR and MI outliner, then it's a good idea I guess to have a
> common interface (I'd love to hear your thoughts on this one).
> 
> About the `getOutliningCost()` API, what do you think to do in the MI outliner?
> Currently, if I remember correctly, (almost) all instructions have
> unit cost except for something.
> It's not unreasonable to think this model will evolve. A relatively
> obvious improvement might be that of taking in account the length
> encoding for the target architecture when outlining. The IR outliner
> uses TargetTransformInfo to do estimation costs. I think we may want
> to evaluate whether (or not) it's feasible to have a base class for
> the cost model that doesn't introduce too much boilerplate, or whether
> we should bite the bullet and pay a small cost in duplication for this
> logic across the two outliners.
> 
> Thanks,
> 
> --
> Davide

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170927/533f5596/attachment.html>


More information about the llvm-dev mailing list