[llvm-dev] [RFC] Framework for Finding and Using Similarity at the IR Level

Tue Sep 1 19:24:32 PDT 2020

This looks like a very worthwhile project. Often we are looking for 
faster execution with code size being a secondary issue though some uses 
will have the reverse priority. Hopefully, the project will provide 
execution times along with code-size reductions.

Neil Nelson

On 9/1/20 3:29 PM, Andrew Litteken via llvm-dev wrote:
> Hello,
>
> I’m Andrew Litteken, and I am working on a framework for defining, 
> detecting, and deduplicating similar code sections at the IR level.
>
> Programmers can introduce sections of code that perform nearly the 
> same operation.  By copying and pasting code throughout a code base, 
> or using a piece of code as a reference to create a new piece of code 
> that has nearly the same structure redundancies are inadvertently 
> introduced throughout programs.  Furthermore, compilers can introduce 
> similarity through sets of instructions that require the same code 
> generation strategies, or optimizing to the same patterns.
>
> For example these two pieces of code:
>
> int fn(const std::vector<int> &myVec) {
>     for (auto it = myVec.begin(),
>       et = myVec.end(); it != et; ++it) {
>         if (*it & 1)
> return 0;
>     }
>     return 1;
> }
>
> And
>
> int fn(const std::vector<int> &myVec) {
>     for (const int &x : myVec) {
>         if (x % 2)
> return 1;
>     }
>     return 0;
> }
>
> have a conditional that is finding whether the variable (it and x 
> respectively) is equal to 1. And, they generate the same piece of IR 
> code for the conditional:
>
> %11 = load i32, i32* %10, align 4
> %12 = and i32 %11, 1
> %13 = icmp eq i32 %12, 0
> %14 = getelementptr inbounds i32, i32* %10, i64 1
> br i1 %13, label %7, label %15
>
> Such sections of code can be called similar.  Two sections of code are 
> considered similar when they perform the same operations on a given 
> set of inputs.  However, this does not require identity.  This 
> particular definition allows for the same sequences of instructions to 
> have different operand names.  For example:
>
> %1 = add i32 %a, 10
> %2 = add i32 %a, %1
> %3 = icmp slt icmp %1, %2
>
> and
>
> %1 = add i32 11, %a
> %2 = sub i32 %a, %1
> %3 = icmp sgt icmp %2, %1
>
> are clearly not identical.  There are different constants, the 
> comparison operator is different and there are flipped operands.  But 
> if the constant 10, the constant 11, and register %a are considered 
> inputs inputs into these sections, then the two regions could reduce 
> to be:
>
> %1 = add i32 %input1, %input2
> %2 = sub %input2 %a, %1
> %3 = icmp slt icmp %1, %2
>
> Ultimately, these sets of operations that are performing the same 
> action, even though the operands are different.  This patch focuses on 
> this sort of “structural” similarity.
>
> However, this concept can be easily generalized.  For example, two 
> sections of code could be considered similar when they contain the 
> same instructions, but are not necessarily used in the same order. 
> Yet, because of the way the operands are used in this different 
> ordering, the instructions still compute the same results for the 
> given inputs.
>
> If these similarities are recognized, it could offer improvements 
> towards reducing code size, analyzing redundancies in a large project, 
> or creating tools to help refactor code for a programmer.
>
> This new framework offers an interface to detect similarity at the IR 
> level.  It has an internal representation that can be accessed via an 
> analysis, allowing for easy access by other transformation and 
> analysis passes that want to leverage detected similarity within a 
> program.  It can also be easily serialized into a file format like 
> JSON, or as OptRemarks.  In the future, a more advanced notion of 
> similarity could be used, and would it be relatively simple to swap a 
> new detection algorithm into the similarity framework.
>
> At present, there have been two different applications developed on 
> top of this framework. The first is a similarity visualizer tool 
> called LLVM-Sim that can ingest a module, and uses the framework to 
> find the similarity in that module, and output a report for a 
> programmer in a serialized form, like mentioned above.  The second is 
> an IR Outliner using the analysis pass as the basis for the outlining 
> candidates. I will first talk briefly about the framework before 
> moving into the two applications.
>
> *Finding Similarity:*
>
> The framework for finding similarity is centered around adding an 
> analysis pass that finds similar regions of code that do not cross 
> basic blocks, and groups them into different groups of similarity 
> based on their structure. These are then made available to other 
> passes, such as an outliner, or a tool, such as LLVM-Sim.
>
> In more detail, finding similarity works by hashing instructions to 
> integers, and placing them in a long list.  The list can be thought of 
> as a string, with each unsigned integer acting as a character.  This 
> string can then be processed by a Suffix Tree to find repeated 
> substrings in the program.  Following this, the instances of each 
> repeated substring are analyzed for structural similarity.  In this 
> case, if a one-to-one mapping for the operands of two different 
> instances of a repeated substring can be found, so that each set of 
> operands is operated on in the same way, then those two regions of 
> instructions are considered to be structurally similar to one 
> another.  These tasks are handled by three new classes: The 
> IRInstructionMapper, the IRSimilarityCandidate, and the 
> IRSimilarityIdentifier.
>
> *Finding Similarity Implementation:*
>
> *IRInstructionMapper:*
> Since matching instructions that are identical does not provide that 
> much information when looking for structural similarity, the value of 
> the instruction, and the value of the operands for most instructions 
> are not considered. Instead only the opcode, types and parameters are 
> considered. Ultimately, if the opcode and types match, then the 
> instructions are considered to be performing the same operation.  
> However, for some instructions like shufflevector, or getelementptr 
> instructions, it is ensured that the parameters match, in keeping with 
> the “isSameOperationAs” instruction method.  Each unique instruction 
> via this scheme is mapped to an unsigned integer.  Creating the 
> mapping of the entire program to integers treats the program like a 
> string with each integer as a character, and allows the use of data 
> structures such as the Suffix Tree (recently refactored into 
> LibSupport in review D80586 <https://reviews.llvm.org/D80586>) to 
> process the string for repeated sequences of instruction in program.
>
> This maps instructions such as
> %1 = add i32 %a, 1
> %2 = add i32 %b, %c
> To the same values as they have the same opcode and same type, and 
> instructions such as
> %1 = add i32 %a, 1
> %2 = add i64 %b, %c
> %3 = mul i64 %b, %c
> Are mapped to different values, since they have different opcodes and 
> types.
>
> *IRSimilarityIdentifier:*
> The list of integers representing the program is then passed to the 
> Suffix Tree, which can quickly find repeated substrings, and creates 
> an IRSimilarityCandidate for each instance of these repeated 
> substring. The IRSimilarityCandidate provides an interface for 
> comparing the operand use and instructions of one 
> IRSimilarityCandidate to another to determine structural similarity.
>
> Structural similarity is determined by attempting to find a one-to-one 
> mapping from each register in one candidate to each register in 
> another. If this mapping is found, then the two candidates are 
> performing the same operations on a given set of inputs and are 
> structurally similar.  For example:
>
> %1 = add %2, %3
> %4 = add %1, %2
>
> Is equivalent to
>
> %a = add %b, %c
> %d = add %b, %a
>
> And the IRSimilarityCandidates containing these sections of code would 
> be considered structurally similar. But
>
> %1 = add %2, %3
> %4 = add %1, %2
>
> Is not equivalent to
>
> %a = add %b, %c
> %d = add %a, %e.
>
> Since the use of %2 in the first section does not match the use of 
> %b in the second. So the IRSimilarityCandidates containing these 
> sections would not be considered structurally similar.
>
> The IRSimilarityIdentifier takes the IRSimilarityCandidates created 
> from the instances of the repeated substrings and uses the 
> IRSimilarityCandidate’s interface, described above, to determine which 
> instances of a given repeated substring are structurally similar to 
> one another.
>
> Once it has been determined which instances of a repeated substring 
> that are structurally similar, sets of IRSimilarityCandidates that 
> have been found to be compatible are grouped together (a Similarity 
> Group).  The IRSimilarityCandidates contained in each group are 
> structurally similar to one another.  In terms of the example above, 
> the first two sections of code would be in the same Similarity Group, 
> but the third is not structurally similar to either. Therefore, it 
> would be in its own group, and does not have a matching structure with 
> another IRSimilarityCandidate in the program.
>
> *Application: Visualizing Similarity*
>
> One potential application is simply exporting any similarities found 
> for viewing by a user. LLVM-Sim is a tool that accepts a module, and 
> passes them to the IRSimilarityIdentifier and retrieves the Similarity 
> Groups for the module.  These are then exported to a JSON file that 
> has an entry for each Similarity Group, and a list of ranges for each 
> region of similarity contained in that group.  This tool also could be 
> implemented with remarks rather than JSON for faster, more streamlined 
> processing of program information.
>
> If a module had a Similarity Group with two sections, instructions 5 
> to 10 and instructions 15 to 20, and a different Similarity Group 
> contained two sections between instructions 1 to 4 and 21 to 24, the 
> output JSON would be:
>
> {
>   “1": [{"s": 5, "e": 10, {“s": 15, "e": 20}],
>   “2": [{"s": 1, "e": 4}, {"s": 21, "e": 24}]
> }
>
> This information could be used to create interfaces to look at, and 
> compare the similarity present in the program, such as in the example 
> interface here: https://reviews.llvm.org/D86974, which has two 
> instances of the program alongside its IR with two different similar 
> sections highlighted.
>
> *Application: Deduplicating Similarity with Outlining:*
>
> Alternatively, following identifying the similar sections of the 
> program, the similar regions of code can be deduplicated.  The IR 
> Outliner makes use of the Code Extractor and the new Similarity 
> Analysis Pass to remove each of the similar ranges individually, and 
> the new IR Outliner deduplicates the individually extracted functions 
> for each region ensuring that the information needed for each 
> extraction is not lost.
>
> Outlining at the IR level can benefit any target architecture and 
> reduce the overall size of a program before any target specific 
> decisions.  On the other hand, the Machine Outliner in LLVM runs after 
> the register allocator and needs to be implemented for each target 
> separately.  The concept of code structural similarity is easier to 
> implement in a general way at the IR level.
>
> The IR Outliner is off by default.  There is a flag 
> -mllvm -ir-outliner that will use the defined cost model to attempt to 
> outline sections of code that will lead to an overall decrease in the 
> size of the program. There is also a debugging option 
> -mllvm -ir-outliner-no-cost that will outline the Similarity Groups 
> greedily based on how many instructions will be removed, but does not 
> consider the number of instructions added.
>
> There are more details about the implementation of the IR Outliner at 
> the end of the RFC.
>
> *Results:*
> The LLVM Test Suite, compiled with the IR Outliner on top of -Oz, has 
> seen code size reductions of up to 60% and an average of 1% code size 
> reduction for both X86 and AArch64.  In these tests, the IR Outliner 
> is placed as the last stage before code generation for the given 
> target.  There are some regressions.  These are due to the fact that 
> our cost model is not as accurate at the assembly code generation 
> phase.  It can be difficult to estimate how many instructions will be 
> added during code generation at the IR level.  For instance, if many 
> arguments need to be created to handle constants, the decisions when 
> creating a call at the assembly level are not directly obvious.  So, 
> our cost model may estimate that fewer instructions are added than 
> actually need to be.  Using target hooks may be a way to address this 
> issue to get a better sense of what will happen during code generation 
> than a general framework for calling conventions.
>
> CTMark Code Size (x86_64):
>
> Test   |Original|   With  | Reduction
>   | -Oz    |Outlining|
> -----------------------------------------------------
> consumer-typeset.test | 484566 | 458642  | -5.3%
> SPASS.test   | 490445 | 470669  | -4.0%
> sqlite3.test   | 525902 | 512813  | -2.5%
> clamscan.test   | 569405 | 567437  | -0.3%
> lencod.test   | 762084 | 759454  | -0.3%
> bullet.test   | 779072 | 779072  |  0.0%
> kc.test   | 482848 | 482848  |  0.0%
> tramp3d-v4.test   | 708712 | 708712  |  0.0%
> pairlocalalign.test   | 485246 | 485462  | +0.0%
> 7zip-benchmark.test   | 982651 | 983563  | +0.1%
> Geomean difference: -1.3%
>
> Notable Decreases:
> GCC-C-execute-20041011-1.test: 6600 before, 2343 after, -64.5%.
> This test uses a lot of macros, and the outliner picks up on the 
> similarity of the same set of inserted instructions.
> frame_layout: 113631 before, 98110 after, -13.7%.
> This test has a lot of repetitive and repeated instructions performed 
> on different sets of arguments, making it a good candidate for outlining.
>
> *Related Work:*
> The previous RFC (RFC: PT.2 Add IR level interprocedural outliner for 
> code size. 
> <http://lists.llvm.org/pipermail/llvm-dev/2017-September/117153.html>) 
> and review (D53942: IR Outliner Pass 
> <https://reviews.llvm.org/D53942>) discuss similar techniques for 
> outlining identical instructions at the machine code level. This patch 
> uses a more general concept of similarity and works at the IR Level 
> instead.  Both share the Suffix Tree data structure (see 
> Interprocedural MIR-level outlining pass 
> <http://lists.llvm.org/pipermail/llvm-dev/2016-August/104170.html>) to 
> identify “outlinable” code sequences within a module.
>
> The framework could also be extended to encompass the Machine 
> Outliner, so a MachineCodeSimilarityIdentifier that feeds the 
> MachineOutliner could be used rather than having all of the 
> functionality inside the one MachineOutliner pass.
>
> *Similarity Identification Implementation:*
>
> *IR Outliner Implementation:*
> *- Cost Model:*
> Using information such as region size, inputs, and outputs, across all 
> the candidates, the arguments that need to be passed to the outlined 
> function are found.  From this, estimates are made for:
>
>   * How many instructions will need to be added to make a call to the
>     function.
>   * How many instructions will be removed by inserting the call.
>   * How many instructions will be added by creating the new function
>     due to argument handling.
>   * How many instructions will be added  from stack instructions at
>     call site.
>   * How many instructions will be added  from stack instructions
>     inside call.
>
> Then the outlined function is only created if the number of 
> instructions added is less than the number of instructions removed.
>
> Right now, this cost model was developed through a general sense of 
> what happens in most calling conventions, and is conservative.  Target 
> hooks would allow for a target specific sense of how large each 
> instruction is, so better sense how many machine instructions are 
> being outlined could be developed. It would also give insight into the 
> particular calling convention in use.  This would allow for a more 
> accurate cost model, and find more opportunities for outlining.
>
> *-  Extraction and Consolidation:*
> If it is determined that outlining all of the IRSimilarityCandidates 
> in a Similarity Group will add fewer instructions than it removes, 
> each candidate's block is extracted using the the Code Extractor.  The 
> contents of one of the functions is placed in a new overall function, 
> that has input and outputs depending on the arguments needed for each 
> region, as determined by the Code Extractor.  When the consolidation 
> is performed, anything that is not the same across each section, like 
> constants, are extracted into arguments as well.  If the arguments are 
> large, they are current extracted directly into the function arguments 
> as well and are not passed by reference.
>
> As an example of extraction and consolidation, if the following two 
> blocks are extracted, which each have a different constant in the 
> second operand of the first add,
>
> %1 = add i32 %a, 10
> %2 = add i32 %1, %a
>
> %1 = add i32 %a, 11
> %2 = add i32 %1, %a
>
> The extracted function would be:
>
> define internal @outlined_ir_func(i32 %a, i32 %b) {
> entry:
>   %1 = add i32 %a, %b
>   %2 = add i32 %1, %a
>   ret void
> }
> And calls of:
>
> call void @outlined_ir_func(i32 %a, 10)
> call void @outlined_ir_func(i32 %a, 11)
>
> For values that are used outside of the region, but are created inside 
> the region, a register that has a pointer to a location is passed as 
> an argument and the data is stored into that register.  However, 
> between different similar sections, the values used for outputs are 
> not necessarily the same.  If this is the case, a block for each set 
> of possible outputs needed is added, with a switch statement to 
> different sections, controlled by an extra argument.
>
> For example, if the following two blocks were extracted, which each 
> have a different item used outside of the similar section
>
> %1 = add i32 %a, %b
> %2 = add i32 %1, %a
> %3 = mul i32 %1, %1
>
> %4 = add i32 %a, %b
> %5 = add i32 %5, %a
> %6 = div i32 %5, %5
> The extracted function would be:
>
> define internal @outlined_ir_func(i32 %a, i32 %b, i32* %c, i32 %d) {
> entry:
>   %1 = add i32 %a, %b
>   %2 = add i32 %1, %a
>   switch %d, label %final [
>     i32 0, label %block1
>     i32 1, label %block2
>   ]
> block1:
>   store i32 %1, %c
> block2:
>   store i32 %2, %c
> }
>
> And calls of
>
> call void @outlined_ir_func(i32 %a, i32 %b, i32* %c, 0)
> %val = load i32, i32* %c
> %3 = mul i32 %val, %val
>
> call void @outlined_ir_func(i32 %a, i32 %b, i32* %c, 1)
> %val1 = load i32, i32* %c
>
> %6 = div i32 %val1, %val1
>
> Once there is complete function, the call sites are replaced with the 
> new function.  If a function does not require a certain argument it is 
> simply replaced with a null value.  For instance, in the example 
> above, if there was a third region did not need any sort of output 
> value the call would be:
>
> call void @outlined_ir_func(i32 %a, i32 %b, i32* nullptr, -1)
>
> Using a negative one for the switch statement, the program simply 
> falls through to the ending block.
>
> *Implementation in Many Patches:*
>
> Each of these patches includes tests.  But the minimum number of 
> patches needed to see outlining are the Instruction Mapper, Candidate, 
> Identifier, and Wrapper pass paired with the initial setup and 
> extraction with the input consolidation.  These are current on 
> Phabricator and the links are included.
>
> *Similarity Identification Analysis Pass:*
> */Instruction Mapper:/*
>
>   * IRInstructionMapper: Adding initial mapping of instructions to
>     unsigned integers
>       o Link: https://reviews.llvm.org/D86968
>       o Link ilist Parent: https://reviews.llvm.org/D86969
>
> */IRInstructionCandidate:/*
>
>   * IRInstructionCandidate: Adding initial candidate structural code,
>     with no special cases for commutativity.
>       o Link Basic: https://reviews.llvm.org/D86970
>       o Link Structural Comparison: https://reviews.llvm.org/D86971
>
> */IRSimilarityIdentifier:/*
>
>   * IRSimilarityIdentifier: Add Suffix Tree processing and grouping
>     candidates by structural similarity.
>       o Link: https://reviews.llvm.org/D86972
>
> */WrapperPass:/*
>
>   * Wrapper Pass: Adding a wrapper and printer pass for the
>     IRSimilarityIdentifier.
>       o Link: https://reviews.llvm.org/D86973
>
> */Extra Features:/*
> - Commutativity IRSimilarityCandidate: Adding flexibility for 
> commutative instructions in structural comparison.
>
>   * Predicate Isomorphism IRInstructionMapper/IRSimilarityCandidate:
>     Adding flexibility for isomorphic predicate instructions in
>     structural comparison.
>
>
> *LLVM-Sim:*
>
>   * LLVM-Sim Basic: Process multiple files using LLVM-Sim and give
>     results of the identifier to a JSON File.
>       o Link: https://reviews.llvm.org/D86974
>
> *IROutliner:*
> */Extraction:/*
>
>   * Initial Setup and Extraction: Extract the different regions
>     individually
>       o Link: https://reviews.llvm.org/D86975
>
> */Consolidation:/*
>
>   * IR Outliner: Adding checks to exclude certain functions that
>     either cannot be outlined, or require extra work to outline.
>       o Link: https://reviews.llvm.org/D86976
>   * IROutliner Only Inputs: Extract only regions that require inputs,
>     and consolidate each group into one outlined function.
>       o Link Extract: https://reviews.llvm.org/D86977
>       o Link Consolidate: https://reviews.llvm.org/D86978
>   * IROutliner Constants: Elevate constants to arguments when they are
>     not the same in each region.
>   * IROutliner Assumes: Handles the special case of llvm.assumes which
>     are removed from extracted regions.
>   * IROutliner Sunken Allocas: Handles the case of objects with their
>     lifetimes entirely included in the region, and elevates them to
>     arguments.
>   * IROutliner Outputs: Handles the case of multiple different output
>     schemes in different regions.
>   * IROutliner Reduce Output Blocks: remove duplicate output scheme
>     plots for output registers in outlined functions.
>
> */Miscellaneous Features:/*
>
>   * IROutliner Attribute Merging: Makes sure that function attributes
>     are merged consistently.
>   * IROutliner Attribute Compatibility: Adds an option to turn on
>     function attribute incompatibilities for items like sanitizing
>     memory for outlined function.
>   * IROutliner DebugInfo: Makes sure that debug info is consistent for
>     outlined functions.
>   * IROutliner Cost Model: Adds a cost model, so that reductions are
>     prioritized rather than what comes first in terms of sheer code
>     size outlined.
>   * IROutliner with OptRemarks: Adding OptRemarks to the outliner.
>   * IROutliner with SwiftError: Adding support for swifterror outlined
>     parameters
>   * IROutliner with LinkOnceODR: Adding flag to outline from
>     LinkOnceODR functions.
>
>
> Any feedback, comments, and discussion are welcome and appreciated!
>
> Thank You,
> Andrew
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200901/d22aa72c/attachment-0001.html>