[llvm-dev] [RFC] Framework for Finding and Using Similarity at the IR Level
Neil Nelson via llvm-dev
llvm-dev at lists.llvm.org
Tue Sep 1 19:24:32 PDT 2020
This looks like a very worthwhile project. Often we are looking for
faster execution with code size being a secondary issue though some uses
will have the reverse priority. Hopefully, the project will provide
execution times along with code-size reductions.
Neil Nelson
On 9/1/20 3:29 PM, Andrew Litteken via llvm-dev wrote:
> Hello,
>
> I’m Andrew Litteken, and I am working on a framework for defining,
> detecting, and deduplicating similar code sections at the IR level.
>
> Programmers can introduce sections of code that perform nearly the
> same operation. By copying and pasting code throughout a code base,
> or using a piece of code as a reference to create a new piece of code
> that has nearly the same structure redundancies are inadvertently
> introduced throughout programs. Furthermore, compilers can introduce
> similarity through sets of instructions that require the same code
> generation strategies, or optimizing to the same patterns.
>
> For example these two pieces of code:
>
> int fn(const std::vector<int> &myVec) {
> for (auto it = myVec.begin(),
> et = myVec.end(); it != et; ++it) {
> if (*it & 1)
> return 0;
> }
> return 1;
> }
>
> And
>
> int fn(const std::vector<int> &myVec) {
> for (const int &x : myVec) {
> if (x % 2)
> return 1;
> }
> return 0;
> }
>
> have a conditional that is finding whether the variable (it and x
> respectively) is equal to 1. And, they generate the same piece of IR
> code for the conditional:
>
> %11 = load i32, i32* %10, align 4
> %12 = and i32 %11, 1
> %13 = icmp eq i32 %12, 0
> %14 = getelementptr inbounds i32, i32* %10, i64 1
> br i1 %13, label %7, label %15
>
> Such sections of code can be called similar. Two sections of code are
> considered similar when they perform the same operations on a given
> set of inputs. However, this does not require identity. This
> particular definition allows for the same sequences of instructions to
> have different operand names. For example:
>
> %1 = add i32 %a, 10
> %2 = add i32 %a, %1
> %3 = icmp slt icmp %1, %2
>
> and
>
> %1 = add i32 11, %a
> %2 = sub i32 %a, %1
> %3 = icmp sgt icmp %2, %1
>
> are clearly not identical. There are different constants, the
> comparison operator is different and there are flipped operands. But
> if the constant 10, the constant 11, and register %a are considered
> inputs inputs into these sections, then the two regions could reduce
> to be:
>
> %1 = add i32 %input1, %input2
> %2 = sub %input2 %a, %1
> %3 = icmp slt icmp %1, %2
>
> Ultimately, these sets of operations that are performing the same
> action, even though the operands are different. This patch focuses on
> this sort of “structural” similarity.
>
> However, this concept can be easily generalized. For example, two
> sections of code could be considered similar when they contain the
> same instructions, but are not necessarily used in the same order.
> Yet, because of the way the operands are used in this different
> ordering, the instructions still compute the same results for the
> given inputs.
>
> If these similarities are recognized, it could offer improvements
> towards reducing code size, analyzing redundancies in a large project,
> or creating tools to help refactor code for a programmer.
>
> This new framework offers an interface to detect similarity at the IR
> level. It has an internal representation that can be accessed via an
> analysis, allowing for easy access by other transformation and
> analysis passes that want to leverage detected similarity within a
> program. It can also be easily serialized into a file format like
> JSON, or as OptRemarks. In the future, a more advanced notion of
> similarity could be used, and would it be relatively simple to swap a
> new detection algorithm into the similarity framework.
>
> At present, there have been two different applications developed on
> top of this framework. The first is a similarity visualizer tool
> called LLVM-Sim that can ingest a module, and uses the framework to
> find the similarity in that module, and output a report for a
> programmer in a serialized form, like mentioned above. The second is
> an IR Outliner using the analysis pass as the basis for the outlining
> candidates. I will first talk briefly about the framework before
> moving into the two applications.
>
> *Finding Similarity:*
>
> The framework for finding similarity is centered around adding an
> analysis pass that finds similar regions of code that do not cross
> basic blocks, and groups them into different groups of similarity
> based on their structure. These are then made available to other
> passes, such as an outliner, or a tool, such as LLVM-Sim.
>
> In more detail, finding similarity works by hashing instructions to
> integers, and placing them in a long list. The list can be thought of
> as a string, with each unsigned integer acting as a character. This
> string can then be processed by a Suffix Tree to find repeated
> substrings in the program. Following this, the instances of each
> repeated substring are analyzed for structural similarity. In this
> case, if a one-to-one mapping for the operands of two different
> instances of a repeated substring can be found, so that each set of
> operands is operated on in the same way, then those two regions of
> instructions are considered to be structurally similar to one
> another. These tasks are handled by three new classes: The
> IRInstructionMapper, the IRSimilarityCandidate, and the
> IRSimilarityIdentifier.
>
> *Finding Similarity Implementation:*
>
> *IRInstructionMapper:*
> Since matching instructions that are identical does not provide that
> much information when looking for structural similarity, the value of
> the instruction, and the value of the operands for most instructions
> are not considered. Instead only the opcode, types and parameters are
> considered. Ultimately, if the opcode and types match, then the
> instructions are considered to be performing the same operation.
> However, for some instructions like shufflevector, or getelementptr
> instructions, it is ensured that the parameters match, in keeping with
> the “isSameOperationAs” instruction method. Each unique instruction
> via this scheme is mapped to an unsigned integer. Creating the
> mapping of the entire program to integers treats the program like a
> string with each integer as a character, and allows the use of data
> structures such as the Suffix Tree (recently refactored into
> LibSupport in review D80586 <https://reviews.llvm.org/D80586>) to
> process the string for repeated sequences of instruction in program.
>
> This maps instructions such as
> %1 = add i32 %a, 1
> %2 = add i32 %b, %c
> To the same values as they have the same opcode and same type, and
> instructions such as
> %1 = add i32 %a, 1
> %2 = add i64 %b, %c
> %3 = mul i64 %b, %c
> Are mapped to different values, since they have different opcodes and
> types.
>
> *IRSimilarityIdentifier:*
> The list of integers representing the program is then passed to the
> Suffix Tree, which can quickly find repeated substrings, and creates
> an IRSimilarityCandidate for each instance of these repeated
> substring. The IRSimilarityCandidate provides an interface for
> comparing the operand use and instructions of one
> IRSimilarityCandidate to another to determine structural similarity.
>
> Structural similarity is determined by attempting to find a one-to-one
> mapping from each register in one candidate to each register in
> another. If this mapping is found, then the two candidates are
> performing the same operations on a given set of inputs and are
> structurally similar. For example:
>
> %1 = add %2, %3
> %4 = add %1, %2
>
> Is equivalent to
>
> %a = add %b, %c
> %d = add %b, %a
>
> And the IRSimilarityCandidates containing these sections of code would
> be considered structurally similar. But
>
> %1 = add %2, %3
> %4 = add %1, %2
>
> Is not equivalent to
>
> %a = add %b, %c
> %d = add %a, %e.
>
> Since the use of %2 in the first section does not match the use of
> %b in the second. So the IRSimilarityCandidates containing these
> sections would not be considered structurally similar.
>
> The IRSimilarityIdentifier takes the IRSimilarityCandidates created
> from the instances of the repeated substrings and uses the
> IRSimilarityCandidate’s interface, described above, to determine which
> instances of a given repeated substring are structurally similar to
> one another.
>
> Once it has been determined which instances of a repeated substring
> that are structurally similar, sets of IRSimilarityCandidates that
> have been found to be compatible are grouped together (a Similarity
> Group). The IRSimilarityCandidates contained in each group are
> structurally similar to one another. In terms of the example above,
> the first two sections of code would be in the same Similarity Group,
> but the third is not structurally similar to either. Therefore, it
> would be in its own group, and does not have a matching structure with
> another IRSimilarityCandidate in the program.
>
> *Application: Visualizing Similarity*
>
> One potential application is simply exporting any similarities found
> for viewing by a user. LLVM-Sim is a tool that accepts a module, and
> passes them to the IRSimilarityIdentifier and retrieves the Similarity
> Groups for the module. These are then exported to a JSON file that
> has an entry for each Similarity Group, and a list of ranges for each
> region of similarity contained in that group. This tool also could be
> implemented with remarks rather than JSON for faster, more streamlined
> processing of program information.
>
> If a module had a Similarity Group with two sections, instructions 5
> to 10 and instructions 15 to 20, and a different Similarity Group
> contained two sections between instructions 1 to 4 and 21 to 24, the
> output JSON would be:
>
> {
> “1": [{"s": 5, "e": 10, {“s": 15, "e": 20}],
> “2": [{"s": 1, "e": 4}, {"s": 21, "e": 24}]
> }
>
> This information could be used to create interfaces to look at, and
> compare the similarity present in the program, such as in the example
> interface here: https://reviews.llvm.org/D86974, which has two
> instances of the program alongside its IR with two different similar
> sections highlighted.
>
> *Application: Deduplicating Similarity with Outlining:*
>
> Alternatively, following identifying the similar sections of the
> program, the similar regions of code can be deduplicated. The IR
> Outliner makes use of the Code Extractor and the new Similarity
> Analysis Pass to remove each of the similar ranges individually, and
> the new IR Outliner deduplicates the individually extracted functions
> for each region ensuring that the information needed for each
> extraction is not lost.
>
> Outlining at the IR level can benefit any target architecture and
> reduce the overall size of a program before any target specific
> decisions. On the other hand, the Machine Outliner in LLVM runs after
> the register allocator and needs to be implemented for each target
> separately. The concept of code structural similarity is easier to
> implement in a general way at the IR level.
>
> The IR Outliner is off by default. There is a flag
> -mllvm -ir-outliner that will use the defined cost model to attempt to
> outline sections of code that will lead to an overall decrease in the
> size of the program. There is also a debugging option
> -mllvm -ir-outliner-no-cost that will outline the Similarity Groups
> greedily based on how many instructions will be removed, but does not
> consider the number of instructions added.
>
> There are more details about the implementation of the IR Outliner at
> the end of the RFC.
>
> *Results:*
> The LLVM Test Suite, compiled with the IR Outliner on top of -Oz, has
> seen code size reductions of up to 60% and an average of 1% code size
> reduction for both X86 and AArch64. In these tests, the IR Outliner
> is placed as the last stage before code generation for the given
> target. There are some regressions. These are due to the fact that
> our cost model is not as accurate at the assembly code generation
> phase. It can be difficult to estimate how many instructions will be
> added during code generation at the IR level. For instance, if many
> arguments need to be created to handle constants, the decisions when
> creating a call at the assembly level are not directly obvious. So,
> our cost model may estimate that fewer instructions are added than
> actually need to be. Using target hooks may be a way to address this
> issue to get a better sense of what will happen during code generation
> than a general framework for calling conventions.
>
> CTMark Code Size (x86_64):
>
> Test |Original| With | Reduction
> | -Oz |Outlining|
> -----------------------------------------------------
> consumer-typeset.test | 484566 | 458642 | -5.3%
> SPASS.test | 490445 | 470669 | -4.0%
> sqlite3.test | 525902 | 512813 | -2.5%
> clamscan.test | 569405 | 567437 | -0.3%
> lencod.test | 762084 | 759454 | -0.3%
> bullet.test | 779072 | 779072 | 0.0%
> kc.test | 482848 | 482848 | 0.0%
> tramp3d-v4.test | 708712 | 708712 | 0.0%
> pairlocalalign.test | 485246 | 485462 | +0.0%
> 7zip-benchmark.test | 982651 | 983563 | +0.1%
> Geomean difference: -1.3%
>
> Notable Decreases:
> GCC-C-execute-20041011-1.test: 6600 before, 2343 after, -64.5%.
> This test uses a lot of macros, and the outliner picks up on the
> similarity of the same set of inserted instructions.
> frame_layout: 113631 before, 98110 after, -13.7%.
> This test has a lot of repetitive and repeated instructions performed
> on different sets of arguments, making it a good candidate for outlining.
>
> *Related Work:*
> The previous RFC (RFC: PT.2 Add IR level interprocedural outliner for
> code size.
> <http://lists.llvm.org/pipermail/llvm-dev/2017-September/117153.html>)
> and review (D53942: IR Outliner Pass
> <https://reviews.llvm.org/D53942>) discuss similar techniques for
> outlining identical instructions at the machine code level. This patch
> uses a more general concept of similarity and works at the IR Level
> instead. Both share the Suffix Tree data structure (see
> Interprocedural MIR-level outlining pass
> <http://lists.llvm.org/pipermail/llvm-dev/2016-August/104170.html>) to
> identify “outlinable” code sequences within a module.
>
> The framework could also be extended to encompass the Machine
> Outliner, so a MachineCodeSimilarityIdentifier that feeds the
> MachineOutliner could be used rather than having all of the
> functionality inside the one MachineOutliner pass.
>
> *Similarity Identification Implementation:*
>
> *IR Outliner Implementation:*
> *- Cost Model:*
> Using information such as region size, inputs, and outputs, across all
> the candidates, the arguments that need to be passed to the outlined
> function are found. From this, estimates are made for:
>
> * How many instructions will need to be added to make a call to the
> function.
> * How many instructions will be removed by inserting the call.
> * How many instructions will be added by creating the new function
> due to argument handling.
> * How many instructions will be added from stack instructions at
> call site.
> * How many instructions will be added from stack instructions
> inside call.
>
> Then the outlined function is only created if the number of
> instructions added is less than the number of instructions removed.
>
> Right now, this cost model was developed through a general sense of
> what happens in most calling conventions, and is conservative. Target
> hooks would allow for a target specific sense of how large each
> instruction is, so better sense how many machine instructions are
> being outlined could be developed. It would also give insight into the
> particular calling convention in use. This would allow for a more
> accurate cost model, and find more opportunities for outlining.
>
> *- Extraction and Consolidation:*
> If it is determined that outlining all of the IRSimilarityCandidates
> in a Similarity Group will add fewer instructions than it removes,
> each candidate's block is extracted using the the Code Extractor. The
> contents of one of the functions is placed in a new overall function,
> that has input and outputs depending on the arguments needed for each
> region, as determined by the Code Extractor. When the consolidation
> is performed, anything that is not the same across each section, like
> constants, are extracted into arguments as well. If the arguments are
> large, they are current extracted directly into the function arguments
> as well and are not passed by reference.
>
> As an example of extraction and consolidation, if the following two
> blocks are extracted, which each have a different constant in the
> second operand of the first add,
>
> %1 = add i32 %a, 10
> %2 = add i32 %1, %a
>
> %1 = add i32 %a, 11
> %2 = add i32 %1, %a
>
> The extracted function would be:
>
> define internal @outlined_ir_func(i32 %a, i32 %b) {
> entry:
> %1 = add i32 %a, %b
> %2 = add i32 %1, %a
> ret void
> }
> And calls of:
>
> call void @outlined_ir_func(i32 %a, 10)
> call void @outlined_ir_func(i32 %a, 11)
>
> For values that are used outside of the region, but are created inside
> the region, a register that has a pointer to a location is passed as
> an argument and the data is stored into that register. However,
> between different similar sections, the values used for outputs are
> not necessarily the same. If this is the case, a block for each set
> of possible outputs needed is added, with a switch statement to
> different sections, controlled by an extra argument.
>
> For example, if the following two blocks were extracted, which each
> have a different item used outside of the similar section
>
> %1 = add i32 %a, %b
> %2 = add i32 %1, %a
> %3 = mul i32 %1, %1
>
> %4 = add i32 %a, %b
> %5 = add i32 %5, %a
> %6 = div i32 %5, %5
> The extracted function would be:
>
> define internal @outlined_ir_func(i32 %a, i32 %b, i32* %c, i32 %d) {
> entry:
> %1 = add i32 %a, %b
> %2 = add i32 %1, %a
> switch %d, label %final [
> i32 0, label %block1
> i32 1, label %block2
> ]
> block1:
> store i32 %1, %c
> block2:
> store i32 %2, %c
> }
>
> And calls of
>
> call void @outlined_ir_func(i32 %a, i32 %b, i32* %c, 0)
> %val = load i32, i32* %c
> %3 = mul i32 %val, %val
>
> call void @outlined_ir_func(i32 %a, i32 %b, i32* %c, 1)
> %val1 = load i32, i32* %c
>
> %6 = div i32 %val1, %val1
>
> Once there is complete function, the call sites are replaced with the
> new function. If a function does not require a certain argument it is
> simply replaced with a null value. For instance, in the example
> above, if there was a third region did not need any sort of output
> value the call would be:
>
> call void @outlined_ir_func(i32 %a, i32 %b, i32* nullptr, -1)
>
> Using a negative one for the switch statement, the program simply
> falls through to the ending block.
>
> *Implementation in Many Patches:*
>
> Each of these patches includes tests. But the minimum number of
> patches needed to see outlining are the Instruction Mapper, Candidate,
> Identifier, and Wrapper pass paired with the initial setup and
> extraction with the input consolidation. These are current on
> Phabricator and the links are included.
>
> *Similarity Identification Analysis Pass:*
> */Instruction Mapper:/*
>
> * IRInstructionMapper: Adding initial mapping of instructions to
> unsigned integers
> o Link: https://reviews.llvm.org/D86968
> o Link ilist Parent: https://reviews.llvm.org/D86969
>
> */IRInstructionCandidate:/*
>
> * IRInstructionCandidate: Adding initial candidate structural code,
> with no special cases for commutativity.
> o Link Basic: https://reviews.llvm.org/D86970
> o Link Structural Comparison: https://reviews.llvm.org/D86971
>
> */IRSimilarityIdentifier:/*
>
> * IRSimilarityIdentifier: Add Suffix Tree processing and grouping
> candidates by structural similarity.
> o Link: https://reviews.llvm.org/D86972
>
> */WrapperPass:/*
>
> * Wrapper Pass: Adding a wrapper and printer pass for the
> IRSimilarityIdentifier.
> o Link: https://reviews.llvm.org/D86973
>
> */Extra Features:/*
> - Commutativity IRSimilarityCandidate: Adding flexibility for
> commutative instructions in structural comparison.
>
> * Predicate Isomorphism IRInstructionMapper/IRSimilarityCandidate:
> Adding flexibility for isomorphic predicate instructions in
> structural comparison.
>
>
> *LLVM-Sim:*
>
> * LLVM-Sim Basic: Process multiple files using LLVM-Sim and give
> results of the identifier to a JSON File.
> o Link: https://reviews.llvm.org/D86974
>
> *IROutliner:*
> */Extraction:/*
>
> * Initial Setup and Extraction: Extract the different regions
> individually
> o Link: https://reviews.llvm.org/D86975
>
> */Consolidation:/*
>
> * IR Outliner: Adding checks to exclude certain functions that
> either cannot be outlined, or require extra work to outline.
> o Link: https://reviews.llvm.org/D86976
> * IROutliner Only Inputs: Extract only regions that require inputs,
> and consolidate each group into one outlined function.
> o Link Extract: https://reviews.llvm.org/D86977
> o Link Consolidate: https://reviews.llvm.org/D86978
> * IROutliner Constants: Elevate constants to arguments when they are
> not the same in each region.
> * IROutliner Assumes: Handles the special case of llvm.assumes which
> are removed from extracted regions.
> * IROutliner Sunken Allocas: Handles the case of objects with their
> lifetimes entirely included in the region, and elevates them to
> arguments.
> * IROutliner Outputs: Handles the case of multiple different output
> schemes in different regions.
> * IROutliner Reduce Output Blocks: remove duplicate output scheme
> plots for output registers in outlined functions.
>
> */Miscellaneous Features:/*
>
> * IROutliner Attribute Merging: Makes sure that function attributes
> are merged consistently.
> * IROutliner Attribute Compatibility: Adds an option to turn on
> function attribute incompatibilities for items like sanitizing
> memory for outlined function.
> * IROutliner DebugInfo: Makes sure that debug info is consistent for
> outlined functions.
> * IROutliner Cost Model: Adds a cost model, so that reductions are
> prioritized rather than what comes first in terms of sheer code
> size outlined.
> * IROutliner with OptRemarks: Adding OptRemarks to the outliner.
> * IROutliner with SwiftError: Adding support for swifterror outlined
> parameters
> * IROutliner with LinkOnceODR: Adding flag to outline from
> LinkOnceODR functions.
>
>
> Any feedback, comments, and discussion are welcome and appreciated!
>
> Thank You,
> Andrew
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200901/d22aa72c/attachment-0001.html>
More information about the llvm-dev
mailing list