[llvm-dev] [RFC] Interprocedural MIR-level outlining pass

Fri Aug 26 21:44:52 PDT 2016

----- Original Message -----

> From: "Daniel Berlin" <dberlin at dberlin.org>
> To: "Quentin Colombet" <qcolombet at apple.com>
> Cc: "Hal Finkel" <hfinkel at anl.gov>, "llvm-dev"
> <llvm-dev at lists.llvm.org>
> Sent: Friday, August 26, 2016 11:06:56 PM
> Subject: Re: [llvm-dev] [RFC] Interprocedural MIR-level outlining
> pass

> FWIW: I'm with quentin. Without a good value numbering analysis, this
> is a hard problem.

How exactly does value numbering help here? This problem seems to be about finding structurally-similar parts of the computations of different values, not identical values. It seems much closer to the analysis necessary for SLP vectorization than value numbering, as such. 

I think we might be able to use value numbering to help with subset of SLP vectorization, but only because we can define some kind of equivalence relation on consecutive memory accesses (and similar). What you need for this outlining seems more akin to the general SLP-vectorization case. This might just be the maximal common subgraph problem. 

-Hal 

> GVN as it exists now doesn't really provide what you want, and in
> fact, doesn't value number a lot of instruction types (including,
> for example, loads, stores, calls, phis, etc). It also relies on
> ordering, and more importantly, it relies on *not* being a strong
> analysis. If you were to stop it from eliminating as it went, it
> would catch maybe 10% of the redundancies it does now. So what you
> want is "i want to know what globally is the same", it doesn't
> really answer that well.

> Doing the analysis Quentin wants is pretty trivial in NewGVN
> (analysis and elimination are split, everything is broken into
> congruence classes explicitly, each congruence class has an id, you
> could sub in that id as the value for the terminated string), but
> i'd agree that GVN as it exists now will not do what they want, and
> would be pretty hard to make work well.

> (FWIW: Davide Italiano has started breaking up newgvn into
> submittable pieces, and is pretty far along to having a first patch
> set I believe)

> On Fri, Aug 26, 2016 at 4:54 PM, Quentin Colombet via llvm-dev <
> llvm-dev at lists.llvm.org > wrote:

> > Hi,
> 

> > I let Jessica give more details but here are some insights.
> 

> > MIR offers a straight forward way to model benefits, because we
> > know
> > which instructions we remove and which one we add and there are no
> > overhead of setting up parameters. Indeed, since the coloring will
> > be the same between the different outlining candidates, the call is
> > just a jump somewhere else. We do not have to worry about the
> > impact
> > of parameter passing and ABI.
> 

> > So basically, better cost model. That's one part of the story.
> 

> > The other part is at the LLVM IR level or before register
> > allocation
> > identifying similar code sequence is much harder, at least with a
> > suffix tree like algorithm. Basically the problem is how do we name
> > our instructions such that we can match them.
> 
> > Let me take an example.
> 
> > foo() {
> 
> > /* bunch of code */
> 
> > a = add b, c;
> 
> > d = add e, f;
> 
> > }
> 

> > bar() {
> 
> > d = add e, g;
> 
> > f = add c, w;
> 
> > }
> 

> > With proper renaming, we can outline both adds in one function. The
> > difficulty is to recognize that they are semantically equivalent to
> > give them the same identifier in the suffix tree. I won’t get into
> > the details but it gets tricky quickly. We were thinking of reusing
> > GVN to have such identifier if we wanted to do the outlining at IR
> > level but solving this problem is hard.
> 

> > By running after regalloc, we basically have a heuristic that does
> > this naming for us.
> 

> > Cheers,
> 
> > -Quentin
> 

> > > On Aug 26, 2016, at 3:01 PM, Hal Finkel via llvm-dev <
> > > llvm-dev at lists.llvm.org > wrote:
> > 
> 

> > > > From: "Kevin Choi" < code.kchoi at gmail.com >
> > > 
> > 
> 
> > > > To: "Hal Finkel" < hfinkel at anl.gov >
> > > 
> > 
> 
> > > > Cc: "llvm-dev" < llvm-dev at lists.llvm.org >
> > > 
> > 
> 
> > > > Sent: Friday, August 26, 2016 4:55:29 PM
> > > 
> > 
> 
> > > > Subject: Re: [llvm-dev] [RFC] Interprocedural MIR-level
> > > > outlining
> > > > pass
> > > 
> > 
> 

> > > > I think the "Motivation" section explained that.
> > > 
> > 
> 
> > > I don't think it explained it.
> > 
> 

> > > > I too first thought about "why not at IR?" but the reason looks
> > > > like
> > > > MIR, post-RA has the most accurate heuristics (best way to know
> > > > looks like actually getting there).
> > > 
> > 
> 
> > > But also, potentially, the fewest opportunities. That's why I'm
> > > curious about the motivation - the trade offs are not obvious to
> > > me.
> > 
> 

> > > -Hal
> > 
> 

> > > > Do you know if there is any experimental pass that relies on
> > > > deriving
> > > > heuristics by a feedback loop after letting, ie. a duplicate
> > > > module/function/block continue past?
> > > 
> > 
> 

> > > > Regards,
> > > 
> > 
> 

> > > > Kevin
> > > 
> > 
> 

> > > > On 26 August 2016 at 14:36, Hal Finkel via llvm-dev <
> > > > llvm-dev at lists.llvm.org > wrote:
> > > 
> > 
> 

> > > > > Hi Jessica,
> > > > 
> > > 
> > 
> 

> > > > > This is quite interesting.
> > > > 
> > > 
> > 
> 

> > > > > Can you comment on why you started by doing this at the MI
> > > > > level,
> > > > > as
> > > > > opposed to the IR level. And at the MI level, why after RA
> > > > > instead
> > > > > of before RA?
> > > > 
> > > 
> > 
> 

> > > > > Perhaps the first question is another way of asking about how
> > > > > accurately we can model call-site code size at the IR level?
> > > > 
> > > 
> > 
> 

> > > > > Thanks in advance,
> > > > 
> > > 
> > 
> 
> > > > > Hal
> > > > 
> > > 
> > 
> 

> > > > > > From: "Jessica Paquette via llvm-dev" <
> > > > > > llvm-dev at lists.llvm.org
> > > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > To: llvm-dev at lists.llvm.org
> > > > 
> > > 
> > 
> 
> > > > > > Sent: Friday, August 26, 2016 4:26:09 PM
> > > > 
> > > 
> > 
> 
> > > > > > Subject: [llvm-dev] [RFC] Interprocedural MIR-level
> > > > > > outlining
> > > > > > pass
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > Hi everyone,
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > Since I haven't said anything on the mailing list before, a
> > > > > > quick
> > > > 
> > > 
> > 
> 
> > > > > > introduction. I'm an intern at Apple, and over the summer I
> > > > 
> > > 
> > 
> 
> > > > > > implemented a
> > > > 
> > > 
> > 
> 
> > > > > > prototype for an outlining pass for code size in LLVM. Now
> > > > > > I'm
> > > > 
> > > 
> > 
> 
> > > > > > seeking to
> > > > 
> > > 
> > 
> 
> > > > > > eventually upstream it. I have the preliminary code on
> > > > > > GitHub
> > > > > > right
> > > > 
> > > 
> > 
> 
> > > > > > now,
> > > > 
> > > 
> > 
> 
> > > > > > but it's still very prototypical (see the code section).
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > ================================
> > > > 
> > > 
> > 
> 
> > > > > > Motivation
> > > > 
> > > 
> > 
> 
> > > > > > ================================
> > > > 
> > > 
> > 
> 
> > > > > > The goal of the internship was to create an interprocedural
> > > > > > pass
> > > > > > that
> > > > 
> > > 
> > 
> 
> > > > > > would reduce code size as much as possible, perhaps at the
> > > > > > cost
> > > > > > of
> > > > 
> > > 
> > 
> 
> > > > > > some
> > > > 
> > > 
> > 
> 
> > > > > > performance. This would be useful to, say, embedded
> > > > > > programmers
> > > > > > who
> > > > 
> > > 
> > 
> 
> > > > > > only
> > > > 
> > > 
> > 
> 
> > > > > > have a few kilobytes to work with and a substantial amount
> > > > > > of
> > > > > > code
> > > > > > to
> > > > 
> > > 
> > 
> 
> > > > > > fit
> > > > 
> > > 
> > 
> 
> > > > > > in that space.
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > ================================
> > > > 
> > > 
> > 
> 
> > > > > > Approach and Initial Results
> > > > 
> > > 
> > 
> 
> > > > > > ================================
> > > > 
> > > 
> > 
> 
> > > > > > To do this, we chose to implement an outliner. Outliners
> > > > > > find
> > > > 
> > > 
> > 
> 
> > > > > > sequences of
> > > > 
> > > 
> > 
> 
> > > > > > instructions which would be better off as a function call,
> > > > > > by
> > > > > > some
> > > > 
> > > 
> > 
> 
> > > > > > measure
> > > > 
> > > 
> > 
> 
> > > > > > of "better". In this case, the measure of "better" is
> > > > > > "makes
> > > > > > code
> > > > 
> > > 
> > 
> 
> > > > > > smaller".
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > ================================
> > > > 
> > > 
> > 
> 
> > > > > > Results
> > > > 
> > > 
> > 
> 
> > > > > > ================================
> > > > 
> > > 
> > 
> 
> > > > > > These results are from a fairly recent 64-bit Intel
> > > > > > processor,
> > > > > > using
> > > > 
> > > 
> > 
> 
> > > > > > a
> > > > 
> > > 
> > 
> 
> > > > > > version of Clang equipped with the outliner prototype
> > > > > > versus
> > > > > > an
> > > > 
> > > 
> > 
> 
> > > > > > equivalent
> > > > 
> > > 
> > 
> 
> > > > > > version of Clang without the outliner.
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > CODE SIZE REDUCTION
> > > > 
> > > 
> > 
> 
> > > > > > For tests >=4 Kb in non-outlined size, the outliner
> > > > > > currently
> > > > 
> > > 
> > 
> 
> > > > > > provides an
> > > > 
> > > 
> > 
> 
> > > > > > average of 12.94% code size reduction on the LLVM test
> > > > > > suite
> > > > > > in
> > > > 
> > > 
> > 
> 
> > > > > > comparison
> > > > 
> > > 
> > 
> 
> > > > > > to a default Clang, up to 51.4% code size reduction. In
> > > > > > comparison
> > > > > > to
> > > > 
> > > 
> > 
> 
> > > > > > a
> > > > 
> > > 
> > 
> 
> > > > > > Clang with -Oz, the outliner provides an average of a 1.29%
> > > > > > code
> > > > > > size
> > > > 
> > > 
> > 
> 
> > > > > > reduction, up to a 37% code size reduction. I believe that
> > > > > > the
> > > > > > -Oz
> > > > 
> > > 
> > 
> 
> > > > > > numbers
> > > > 
> > > 
> > 
> 
> > > > > > can be further improved by better tuning the outlining cost
> > > > > > model.
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > EXECUTION TIME IMPACT
> > > > 
> > > 
> > 
> 
> > > > > > On average, the outliner increases execution time by 2% on
> > > > > > the
> > > > > > LLVM
> > > > 
> > > 
> > 
> 
> > > > > > test
> > > > 
> > > 
> > 
> 
> > > > > > suite, but has been also shown to improve exection time by
> > > > > > up
> > > > > > to
> > > > > > 16%.
> > > > 
> > > 
> > 
> 
> > > > > > These results were from a fairly recent Intel processor, so
> > > > > > the
> > > > 
> > > 
> > 
> 
> > > > > > results
> > > > 
> > > 
> > 
> 
> > > > > > may vary. Recent Intel processors have very low latency for
> > > > > > function
> > > > 
> > > 
> > 
> 
> > > > > > calls, which may impact these results. Execution time
> > > > > > improvements
> > > > 
> > > 
> > 
> 
> > > > > > are
> > > > 
> > > 
> > 
> 
> > > > > > likely dependent on the latency of function calls,
> > > > > > instruction
> > > > 
> > > 
> > 
> 
> > > > > > caching
> > > > 
> > > 
> > 
> 
> > > > > > behaviour, and the execution frequency of the code being
> > > > > > outlined.
> > > > > > In
> > > > 
> > > 
> > 
> 
> > > > > > partucular, using a processor with heavy function call
> > > > > > latency
> > > > > > will
> > > > 
> > > 
> > 
> 
> > > > > > likely
> > > > 
> > > 
> > 
> 
> > > > > > increase execution time overhead.
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > ================================
> > > > 
> > > 
> > 
> 
> > > > > > Implementation
> > > > 
> > > 
> > 
> 
> > > > > > ================================
> > > > 
> > > 
> > 
> 
> > > > > > The outliner, in its current state, is a MachineModulePass.
> > > > > > It
> > > > > > finds
> > > > 
> > > 
> > 
> 
> > > > > > *identical* sequences of MIR, after register allocation,
> > > > > > and
> > > > > > pulls
> > > > 
> > > 
> > 
> 
> > > > > > them
> > > > 
> > > 
> > 
> 
> > > > > > out into their own functions. Thus, it's effectively
> > > > > > assembly-level.
> > > > 
> > > 
> > 
> 
> > > > > > Ultimately, the algorithm used is general, so it can sit
> > > > > > anywhere,
> > > > 
> > > 
> > 
> 
> > > > > > but MIR
> > > > 
> > > 
> > 
> 
> > > > > > was very convenient for the time being.
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > It requires two data structures.
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > 1. A generalized suffix tree
> > > > 
> > > 
> > 
> 
> > > > > > 2. A "terminated string"
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > 1: The generalized suffix tree is constructed using
> > > > > > Ukkonen's
> > > > > > linear
> > > > 
> > > 
> > 
> 
> > > > > > time
> > > > 
> > > 
> > 
> 
> > > > > > construction algorithm [1]. They require linear space and
> > > > > > support
> > > > 
> > > 
> > 
> 
> > > > > > linear-time substring queries. In practice, the
> > > > > > construction
> > > > > > time
> > > > > > for
> > > > 
> > > 
> > 
> 
> > > > > > the
> > > > 
> > > 
> > 
> 
> > > > > > suffix tree is the most time consuming part, but I haven't
> > > > > > noticed
> > > > > > a
> > > > 
> > > 
> > 
> 
> > > > > > difference in compilation time on, say, 12 MB .ll files.
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > 2: To support the suffix tree, we need a "terminated
> > > > > > string."
> > > > > > This
> > > > > > is
> > > > 
> > > 
> > 
> 
> > > > > > a
> > > > 
> > > 
> > 
> 
> > > > > > generalized string with an unique terminator character
> > > > > > appended
> > > > > > to
> > > > 
> > > 
> > 
> 
> > > > > > the
> > > > 
> > > 
> > 
> 
> > > > > > end. TerminatedStrings can be built from any type.
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > The algorithm is then roughly as follows.
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > 1. For each MachineBasicBlock in the program, build a
> > > > 
> > > 
> > 
> 
> > > > > > TerminatedString for
> > > > 
> > > 
> > 
> 
> > > > > > that block.
> > > > 
> > > 
> > 
> 
> > > > > > 2. Build a suffix tree for that collection of strings.
> > > > 
> > > 
> > 
> 
> > > > > > 3. Query the suffix tree for the longest repeated substring
> > > > > > and
> > > > > > place
> > > > 
> > > 
> > 
> 
> > > > > > that
> > > > 
> > > 
> > 
> 
> > > > > > string in a candidate list. Repeat until none are found.
> > > > 
> > > 
> > 
> 
> > > > > > 4. Create functions for each candidate.
> > > > 
> > > 
> > 
> 
> > > > > > 5. Replace each candidate with a call to its respective
> > > > > > function.
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > Currently, the program itself isn't stored in the suffix
> > > > > > tree,
> > > > > > but
> > > > 
> > > 
> > 
> 
> > > > > > rather
> > > > 
> > > 
> > 
> 
> > > > > > a "proxy string" of integers. This isn't necessary at the
> > > > > > MIR
> > > > > > level,
> > > > 
> > > 
> > 
> 
> > > > > > but
> > > > 
> > > 
> > 
> 
> > > > > > may be for an IR level extension of the algorithm.
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > ================================
> > > > 
> > > 
> > 
> 
> > > > > > Challenges
> > > > 
> > > 
> > 
> 
> > > > > > ================================
> > > > 
> > > 
> > 
> 
> > > > > > 1) MEMORY CONSUMPTION
> > > > 
> > > 
> > 
> 
> > > > > > Given a string of length n, a naive suffix tree
> > > > > > implementation
> > > > > > can
> > > > 
> > > 
> > 
> 
> > > > > > take up
> > > > 
> > > 
> > 
> 
> > > > > > to 40n bytes of memory. However, this number can be reduced
> > > > > > to
> > > > > > 20n
> > > > 
> > > 
> > 
> 
> > > > > > with a
> > > > 
> > > 
> > 
> 
> > > > > > bit of work [2]. Currently, the suffix tree stores the
> > > > > > entire
> > > > 
> > > 
> > 
> 
> > > > > > program,
> > > > 
> > > 
> > 
> 
> > > > > > including instructions which really ought not to be
> > > > > > outlined,
> > > > > > such
> > > > > > as
> > > > 
> > > 
> > 
> 
> > > > > > returns. These instructions should not be included in the
> > > > > > final
> > > > 
> > > 
> > 
> 
> > > > > > implementation, but should rather act as terminators for
> > > > > > the
> > > > > > strings.
> > > > 
> > > 
> > 
> 
> > > > > > This
> > > > 
> > > 
> > 
> 
> > > > > > will likely curb memory consumption. Suffix trees have been
> > > > > > used
> > > > > > in
> > > > 
> > > 
> > 
> 
> > > > > > the
> > > > 
> > > 
> > 
> 
> > > > > > past in sliding-window-based compression schemes, which may
> > > > > > serve
> > > > > > as
> > > > 
> > > 
> > 
> 
> > > > > > a
> > > > 
> > > 
> > 
> 
> > > > > > source of inspiration for reducing memory overhead.[3]
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > Nonetheless, the outliner probably shouldn't be run unless
> > > > > > it
> > > > > > really
> > > > 
> > > 
> > 
> 
> > > > > > has
> > > > 
> > > 
> > 
> 
> > > > > > to be run. It will likely be used mostly in embedded
> > > > > > spaces,
> > > > > > where
> > > > 
> > > 
> > 
> 
> > > > > > the
> > > > 
> > > 
> > 
> 
> > > > > > programs have to fit into small devices anyway. Thus,
> > > > > > memory
> > > > > > overhead
> > > > 
> > > 
> > 
> 
> > > > > > for
> > > > 
> > > 
> > 
> 
> > > > > > the compiler likely won't be a problem. The outliner should
> > > > > > only
> > > > > > be
> > > > 
> > > 
> > 
> 
> > > > > > used
> > > > 
> > > 
> > 
> 
> > > > > > in -Oz compilations, and possibly should have its own flag.
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > 2) EXECUTION TIME
> > > > 
> > > 
> > 
> 
> > > > > > Currently, the outliner isn't tuned for preventing
> > > > > > execution
> > > > > > time
> > > > 
> > > 
> > 
> 
> > > > > > increases. There is an average of a 2% execution time hit
> > > > > > on
> > > > > > the
> > > > 
> > > 
> > 
> 
> > > > > > tests in
> > > > 
> > > 
> > 
> 
> > > > > > the LLVM test suite, with a few outliers of up to 30%. The
> > > > > > outliers
> > > > 
> > > 
> > 
> 
> > > > > > are
> > > > 
> > > 
> > 
> 
> > > > > > tests which contain hot loops. The outliner really ought to
> > > > > > be
> > > > > > able
> > > > 
> > > 
> > 
> 
> > > > > > to use
> > > > 
> > > 
> > 
> 
> > > > > > profiling information and not outline from hot areas.
> > > > > > Another
> > > > 
> > > 
> > 
> 
> > > > > > suggestion
> > > > 
> > > 
> > 
> 
> > > > > > people have given me is to add a "never outline" directive
> > > > > > which
> > > > 
> > > 
> > 
> 
> > > > > > would
> > > > 
> > > 
> > 
> 
> > > > > > allow the user to say something along the lines of "this is
> > > > > > a
> > > > > > hot
> > > > 
> > > 
> > 
> 
> > > > > > loop,
> > > > 
> > > 
> > 
> 
> > > > > > please never outline from it".
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > It's also important to note that these numbers are coming
> > > > > > from
> > > > > > a
> > > > 
> > > 
> > 
> 
> > > > > > fairly
> > > > 
> > > 
> > 
> 
> > > > > > recent Intel processor.
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > 3) CONSTRAINTS ON INSTRUCTIONS
> > > > 
> > > 
> > 
> 
> > > > > > The outliner currently won't pull anything out of functions
> > > > > > which
> > > > > > use
> > > > 
> > > 
> > 
> 
> > > > > > a
> > > > 
> > > 
> > 
> 
> > > > > > red zone. It also won't pull anything out that uses the
> > > > > > stack,
> > > > 
> > > 
> > 
> 
> > > > > > instruction
> > > > 
> > > 
> > 
> 
> > > > > > pointer, uses constant pool indices, CFI indices, jump
> > > > > > table
> > > > > > indices,
> > > > 
> > > 
> > 
> 
> > > > > > or
> > > > 
> > > 
> > 
> 
> > > > > > frame indices. This removes many opportunities for
> > > > > > outlining
> > > > > > which
> > > > 
> > > 
> > 
> 
> > > > > > would
> > > > 
> > > 
> > 
> 
> > > > > > likely be available at a higher level (such as IR). Thus,
> > > > > > there's
> > > > > > a
> > > > 
> > > 
> > 
> 
> > > > > > case
> > > > 
> > > 
> > 
> 
> > > > > > for moving this up to a higher level.
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > ================================
> > > > 
> > > 
> > 
> 
> > > > > > Additional Applications
> > > > 
> > > 
> > 
> 
> > > > > > ================================
> > > > 
> > > 
> > 
> 
> > > > > > The suffix tree itself could be used as a tool for finding
> > > > 
> > > 
> > 
> 
> > > > > > opportunities
> > > > 
> > > 
> > 
> 
> > > > > > to refactor code. For example, it could recognize places
> > > > > > where
> > > > > > the
> > > > 
> > > 
> > 
> 
> > > > > > user
> > > > 
> > > 
> > 
> 
> > > > > > likely copied and pasted some code. This could be run on
> > > > > > codebases
> > > > > > to
> > > > 
> > > 
> > 
> 
> > > > > > find
> > > > 
> > > 
> > 
> 
> > > > > > areas where people could manually outline things at the
> > > > > > source
> > > > > > level.
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > Using the terminated string class, it would also be
> > > > > > possible
> > > > > > to
> > > > 
> > > 
> > 
> 
> > > > > > implement
> > > > 
> > > 
> > 
> 
> > > > > > other string algorithms on code. This may open the door to
> > > > > > new
> > > > > > ways
> > > > 
> > > 
> > 
> 
> > > > > > to
> > > > 
> > > 
> > 
> 
> > > > > > analyze existing codebases.
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > ================================
> > > > 
> > > 
> > 
> 
> > > > > > Roadmap
> > > > 
> > > 
> > 
> 
> > > > > > ================================
> > > > 
> > > 
> > 
> 
> > > > > > The current outliner is *very* prototypical. The version I
> > > > > > would
> > > > > > want
> > > > 
> > > 
> > 
> 
> > > > > > to
> > > > 
> > > 
> > 
> 
> > > > > > upstream would be a new implementation. Here's what I'd
> > > > > > like
> > > > > > to
> > > > 
> > > 
> > 
> 
> > > > > > address
> > > > 
> > > 
> > 
> 
> > > > > > and accomplish.
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > 1. Ask "what does the LLVM community want from an outliner"
> > > > > > and
> > > > > > use
> > > > 
> > > 
> > 
> 
> > > > > > that
> > > > 
> > > 
> > 
> 
> > > > > > to drive development of the algorithm.
> > > > 
> > > 
> > 
> 
> > > > > > 2. Reimplement the outliner, perhaps using a less
> > > > > > memory-intensve
> > > > 
> > > 
> > 
> 
> > > > > > data
> > > > 
> > > 
> > 
> 
> > > > > > structure like a suffix array.
> > > > 
> > > 
> > 
> 
> > > > > > 3. Begin adding features to the algorithm, for example:
> > > > 
> > > 
> > 
> 
> > > > > > i. Teaching the algorithm about hot/cold blocks of code and
> > > > 
> > > 
> > 
> 
> > > > > > taking
> > > > 
> > > 
> > 
> 
> > > > > > that into account.
> > > > 
> > > 
> > 
> 
> > > > > > ii. Simple parameter passing.
> > > > 
> > > 
> > 
> 
> > > > > > iii. Similar function outlining-- eg, noticing that two
> > > > > > outlining
> > > > 
> > > 
> > 
> 
> > > > > > candidates are similar and can be merged into one function
> > > > > > with
> > > > > > some
> > > > 
> > > 
> > 
> 
> > > > > > control flow.
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > ================================
> > > > 
> > > 
> > 
> 
> > > > > > Code
> > > > 
> > > 
> > 
> 
> > > > > > ================================
> > > > 
> > > 
> > 
> 
> > > > > > Note: This code requires MachineModulePasses
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > * Main pass:
> > > > 
> > > 
> > 
> 
> > > > > > https://github.com/ornata/llvm/blob/master/lib/CodeGen/MachineOutliner.h
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > * Suffix tree:
> > > > 
> > > 
> > 
> 
> > > > > > https://github.com/ornata/llvm/blob/master/include/llvm/ADT/SuffixTree.h
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > * TerminatedString and TerminatedStringList:
> > > > 
> > > 
> > 
> 
> > > > > > https://github.com/ornata/llvm/blob/master/include/llvm/ADT/TerminatedString.h
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > Here are a couple unit tests for the data structures.
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > * Suffix tree unit tests:
> > > > 
> > > 
> > 
> 
> > > > > > https://github.com/ornata/llvm/blob/master/unittests/ADT/SuffixTreeTest.cpp
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > * TerminatedString unit tests:
> > > > 
> > > 
> > 
> 
> > > > > > https://github.com/ornata/llvm/blob/master/unittests/ADT/TerminatedStringTest.cpp
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > * TerminatedStringList unit tests:
> > > > 
> > > 
> > 
> 
> > > > > > https://github.com/ornata/llvm/blob/master/unittests/ADT/TerminatedStringListTest.cpp
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > ================================
> > > > 
> > > 
> > 
> 
> > > > > > References
> > > > 
> > > 
> > 
> 
> > > > > > ================================
> > > > 
> > > 
> > 
> 
> > > > > > [1] Ukkonen's Algorithm:
> > > > 
> > > 
> > 
> 
> > > > > > https://www.cs.helsinki.fi/u/ukkonen/SuffixT1withFigs.pdf
> > > > 
> > > 
> > 
> 
> > > > > > [2] Suffix Trees and Suffix Arrays:
> > > > 
> > > 
> > 
> 
> > > > > > http://web.cs.iastate.edu/~cs548/suffix.pdf
> > > > 
> > > 
> > 
> 
> > > > > > [3] Extended Application of Suffix Trees to Data
> > > > > > Compression:
> > > > 
> > > 
> > 
> 
> > > > > > http://www.larsson.dogma.net/dccpaper.pdf
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > Thanks for reading,
> > > > 
> > > 
> > 
> 
> > > > > > Jessica
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > _______________________________________________
> > > > 
> > > 
> > 
> 
> > > > > > LLVM Developers mailing list
> > > > 
> > > 
> > 
> 
> > > > > > llvm-dev at lists.llvm.org
> > > > 
> > > 
> > 
> 
> > > > > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 

> > > > > --
> > > > 
> > > 
> > 
> 
> > > > > Hal Finkel
> > > > 
> > > 
> > 
> 
> > > > > Assistant Computational Scientist
> > > > 
> > > 
> > 
> 
> > > > > Leadership Computing Facility
> > > > 
> > > 
> > 
> 
> > > > > Argonne National Laboratory
> > > > 
> > > 
> > 
> 

> > > > > _______________________________________________
> > > > 
> > > 
> > 
> 
> > > > > LLVM Developers mailing list
> > > > 
> > > 
> > 
> 
> > > > > llvm-dev at lists.llvm.org
> > > > 
> > > 
> > 
> 
> > > > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > > > 
> > > 
> > 
> 

> > > --
> > 
> 

> > > Hal Finkel
> > 
> 
> > > Assistant Computational Scientist
> > 
> 
> > > Leadership Computing Facility
> > 
> 
> > > Argonne National Laboratory
> > 
> 
> > > _______________________________________________
> > 
> 
> > > LLVM Developers mailing list
> > 
> 
> > > llvm-dev at lists.llvm.org
> > 
> 
> > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > 
> 
> > _______________________________________________
> 
> > LLVM Developers mailing list
> 
> > llvm-dev at lists.llvm.org
> 
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> 

-- 

Hal Finkel 
Assistant Computational Scientist 
Leadership Computing Facility 
Argonne National Laboratory 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160826/e5551446/attachment-0001.html>