[cfe-dev] [llvm-dev] [RFC][llvm-mca] Adding binary support to llvm-mca.

Mon Dec 3 14:38:40 PST 2018

Hi Andrea,

On Mon, Dec 03, 2018 at 01:21:33PM +0000, Andrea Di Biagio wrote:
> So, I have been thinking a bit more about this whole design.
> 
> The more I think about your suggested design, the more I am convinced that
> we should do something more to support ranges in binary object files too.
> My understanding is that the reason why we don't support object files in
> general, is because of the presence of relocations. That is because a
> region start marker is effectively symbol relative, and the symbol (a
> function) would be relocated in the final executable.
> You mentioned to me that resolving even a 'simple' symbol-relative
> relocation is not trivial, beause it requires specific knowledge about the
> binary format, and the target (i.e. how relocations are encoded is target
> specific). I am surprised that there is not a utility library for resolving
> relocations.. but I am not familiar with that part of the compiler. I was
> hoping that there was a target specific interface to use in this case...

There might be a better way of resolving the relocs, but from what I saw
looking at llvm-objdump and other related tools, it seems that resolving
the relocated symbol is a target specific effort.  I also spent sometime
sniffing around ExecutionEngine/RuntimeDyld/RuntimeDyld.cpp which also
performs the reloc resolution.  I should clarify that I too am not an
expert in llvm's utilities for performing symbol/reloc resolution, and
perhaps someone in the community can point me in the right direction.  I
can clearly see the reloc data in the object file via tools like
objdump; however, accessing the relocs via
llvm::object::ObjectFile::relocations() did not produce address values
that we could use (values of zero).

I was hoping that, for a first pass at this patch, supporting just
executables would be okay.  That keeps this initial patch set simple,
and hopefully will encourage others to take a peek at it, since it's
less daunting than what it might otherwise be.  Of course, there is the
concern that this initial patch will lock us into a design that will be
more complicated to unravel later.

> An alternative approach would require that you define your own
> "symbol-relative" reference. After all, ranges are just a sequences of
> instructions in a function. If a function symbol is described by the symbol
> table, then you should be able to obtain its offset in the .text section.
> So, you could potentially encode your own symbol+offset. However, the
> linker would not be able to understand your "custom relocation", and
> information about regions in the final elf would be basically broken.
> So,that would not be a solution...
> 
> I don't know honestly what is the best approach to use in this case.
> As a compromise, it would not be a bad idea to add the ability to specify
> ranges from command line. What do you think?
> Still, from a user point of view, the idea that we don't support object
> files in general sounds like a big limitation.

I agree, only supporting executables is a limitation.  However, I'd
like to land the base support now and add in the additional
features/support after this large patch set lands.  But I can see
where landing the whole thing entirely also makes sense.

> About the new experimental intrinsics: those would definitely work well for
> the simple case where instructions are from the same basic block.
> However, some/most of the constraints that you plan to add will have to
> change if in future we decide to allow ranges that potentially cross
> multiple basic blocks. How will the rules/constraints on those new
> intrinsics change? I just want to make sure that the suggested design is
> future-proof.

Since the llvm/clang parts of the code are just responsible for
collecting where a range starts/ends, I hope that we can remove some
of the baked-in constraints that are specified in IR/Verifier.cpp.
As you pointed out earlier in this thread, we might want to
introduce a dominance check if/when we lift the one-basic-block
restriction. 

-Matt

> 
> -Andrea
> 
> On Tue, Nov 27, 2018 at 5:08 PM Andrea Di Biagio <andrea.dibiagio at gmail.com>
> wrote:
> 
> > Thanks for clarifying it Matt.
> >
> > In general, I quite like your suggested design.
> >
> > My only concern is about the semantic of the two new intrinsics. You
> > design doesn't allow mca ranges to span through multiple basic blocks. That
> > constraint is acceptable for now, since llvm-mca doesn't know how to deal
> > with control flow.
> > However, I am a bit concerned about what might happen in future if we
> > decide to let users specify code regions that span through multiple basic
> > blocks. Basically, I don't particularly like the idea of changing the
> > semantic of already existing intrinsic. A design that already accounts for
> > that particular scenario/future work would be ideal. That being said,
> > marking those new intrinsics as 'experimental' may be a good compromise (at
> > least for now).
> >
> > So, I am quite happy overall with the direction of this RFC.
> > However, I am interesting to hear from other developers about your
> > suggested design.
> >
> > > This initial patch only targets ELF object files, and does not handle
> > relocatable addresses. Since the start of a code region is represented as
> > an
> > assembly label, and referenced in the .mca_code_regions section, that
> > address
> > is relocatable.
> >
> > This may be okay for now. However, it would be nice to remove that
> > constraint in future and add support to generic object files.
> >
> > -Andrea
> >
> > On Thu, Nov 22, 2018 at 7:21 PM <Matthew.Davis at sony.com> wrote:
> >
> >> I want to clarify a few restrictions of llvm-mca code regions that this
> >> RFC proposes:
> >>
> >> 1) All llvm-mca code regions must start with an
> >> llvm.mca.code.region.start intrinsic and end with
> >> an llvm.mca.code.region.end intrinsic.  This rule is enforced at the IR
> >> level in the IR verifier.
> >>
> >> 2) llvm-mca code regions cannot nest.  This restriction implies that an
> >> llvm.mca.code.region.start
> >> must have a llvm.mca.code.region.end intrinsic without any other llvm.mca
> >> start intrinsics
> >> between the two. The current implementation in the patch enforces this
> >> restriction at the
> >> IR level via the IR Verifier.
> >>
> >> 3) An llvm-mca code region cannot span multiple basic blocks.  llvm-mca
> >> does not follow
> >> branches (yet).  Instead, a branch instruction is treated by llvm-mca
> >> like any other instruction.
> >> The current patch associated with this RFC does not enforce this
> >> restriction.  I plan on updating
> >> the patch to enforce that a code region can only belong to a single basic
> >> block.  This is a simple
> >> check, ensuring that both the llvm.mca.code.region.start and accompanying
> >> end intrinsics live
> >> in the same basic block. I imagine adding this check at the IR level when
> >> we also verify points 1 and 2
> >> above.  That will keep the code-region verification logic isolated to the
> >> IR verifier.  The start/end
> >> intrinsics should not have any uses, so I'm not sure that they would be
> >> moved/sunk on behalf
> >> of any other instruction.  In other words, I do not imagine that a start
> >> and end would be split
> >> apart due to later MI optimizations.  If I discover that such a case
> >> occurs, then I might add the
> >> basic-block check prior to emitting the code region data to the object
> >> file.    Once  llvm-mca  is
> >> updated to handle branches, then we can remove this constraint.
> >>
> >> -Matt
> >>
> >> > -----Original Message-----
> >> > From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Matt
> >> Davis via llvm-
> >> > dev
> >> > Sent: Wednesday, November 21, 2018 8:47 AM
> >> > To: Andrea Di Biagio <andrea.dibiagio at gmail.com>
> >> > Cc: llvm-dev <llvm-dev at lists.llvm.org>; Di Biagio, Andrea
> >> > <Andrea.Dibiagio at sony.com>; cfe-dev at lists.llvm.org
> >> > Subject: Re: [llvm-dev] [RFC][llvm-mca] Adding binary support to
> >> llvm-mca.
> >> >
> >> > Hi Andrea,
> >> >
> >> > Thanks for your input.
> >> >
> >> > On Wed, Nov 21, 2018 at 12:43:52PM +0000, Andrea Di Biagio wrote:
> >> > [... snip ...]
> >> > > About the suggested design:
> >> > > I like the idea of being able to identify code regions using a numeric
> >> > > identifier.
> >> > > However, what happens if a code region spans through multiple basic
> >> blocks?
> >> >
> >> > The current patch does not take into consideration cases where the
> >> > region start and end intrinsics are placed in different basic blocks.
> >> > Such would be the case if a region is defined to span multiple blocks.
> >> > This would be similar to the current case where a user places a
> >> > #LLVM-MCA-BEGIN assembly comment in one block and an #LLVM-MCA-END in
> >> > another.  However, as you point out below, if the user does this in the
> >> > source code via intrinsics (just what this patch is proposing), then
> >> > there is a chance that optimizations might change the layout of the
> >> > instructions and confuse the ordering of the MCA intrinsics.
> >> >
> >> > Since MCA does not follow branches (MCA just treats a branch as it would
> >> > a non-branching instruction), it seems that a user should be aware that
> >> > defining MCA code regions that span multiple blocks might result in an
> >> > unexpected analysis.  While we do not discourage this, it seems like
> >> > such a case will probably not produce an expected result for the user.
> >> > We could introduce a warning, or automatically divide the regions so
> >> > that a single region can only contain a single block.
> >> >
> >> > > My understanding is that code regions are not allowed to overlap. So,
> >> it
> >> > > makes sense if ` __mca_code_region_end()` doesn't take an ID as input.
> >> > > However, what if ` __mca_code_region_end()` ends in a different basic
> >> block?
> >> > >
> >> > > `__mca_code_region_start()` has to always dominate `
> >> > > __mca_code_region_end()`. This is trivial to verify when both calls
> >> are in
> >> > > a same basic block; however, we need to make sure that the
> >> relationship is
> >> > > still the same when the `end()` call is in a different basic block.
> >> > > That would not be enough. I think we should also verify  that `
> >> > > __mca_code_region_end()` always post-dominates the call to
> >> > > `__mca_code_region_start()`.
> >> >
> >> > In any case this patch should probably check dominance of the
> >> > intrinsics, even though MCA does not follow branches and MCA does not
> >> > not explicitly forbid a region from containing multiple blocks.
> >> >
> >> > >
> >> > > My question is: what happens with basic block reordering? We don't
> >> know the
> >> > > layout of basic blocks until we reach code emission. How does it work
> >> for
> >> > > regions that span through multiple basic blocks?. I think your RFC
> >> should
> >> > > clarify this aspect.
> >> > >
> >> > > As a side note: at the moment, llvm-mca doesn't know how to deal with
> >> > > branches. So, for simplicity we could force code regions to only
> >> contain
> >> > > instructions from a single basic block.
> >> > >
> >> > > However, In future we may want to teach llvm-mca how to analyze
> >> branchy
> >> > > code too. For example, we could introduce a simple control-flow
> >> analysis in
> >> > > llvm-mca, and use an external "branch trace" information (for
> >> example, a
> >> > > perf trace generated by an external tool) to decorate branches with
> >> with
> >> > > branch probabilities (similarly to what we currently do in LLVM with
> >> PGO).
> >> > > We could then use that knowledge to model branch prediction and
> >> simulate
> >> > > what happens in the presence of multiple branches.
> >> > >
> >> > > So, the idea of having regions that potentially span multiple basic
> >> blocks
> >> > > is not bad in general. However, I think you should better clarify
> >> what are
> >> > > the constraints (at least, you should answer to my questions from
> >> before).
> >> >
> >> > I agree! Thanks for pointing that out.
> >> >
> >> > > If we decide to use those new intrinsics, then those should be
> >> experimental
> >> > > (at least to start).
> >> >
> >> > Agreed.
> >> >
> >> > -Matt
> >> >
> >> > > On Thu, Nov 15, 2018 at 11:07 PM via llvm-dev <
> >> llvm-dev at lists.llvm.org>
> >> > > wrote:
> >> > >
> >> > > > Introduction
> >> > > > -----------------
> >> > > > Currently llvm-mca only accepts assembly code as input. We would
> >> like to
> >> > > > extend llvm-mca to support object files, allowing users to analyze
> >> the
> >> > > > performance of binaries. The proposed changes (which involve both
> >> > > > clang and llvm) optionally introduce an object file section, but
> >> this can
> >> > > > be
> >> > > > stripped-out if desired.
> >> > > >
> >> > > > For the llvm-mca binary support feature to be useful, a user needs
> >> to tell
> >> > > > llvm-mca which portions of their code they would like analyzed.
> >> Currently,
> >> > > > this is accomplished via assembly comments. However, assembly
> >> comments are
> >> > > > not
> >> > > > preserved in object files, and this has encouraged this RFC. For the
> >> > > > proposed
> >> > > > binary support, we need to introduce changes to clang and llvm to
> >> allow the
> >> > > > user's object code to be recognized by llvm-mca:
> >> > > >
> >> > > > * We need a way for a user to identify a region/block of code they
> >> want
> >> > > >    analyzed by llvm-mca.
> >> > > > * We need the information defining the user's region of code to be
> >> > > > maintained
> >> > > >    in the object file so that llvm-mca can analyze the desired
> >> region(s)
> >> > > > from the
> >> > > >    object file.
> >> > > >
> >> > > > We define a "code region" as a subset of a user's program that is
> >> to be
> >> > > > analyzed via llvm-mca. The sequence of instructions to be analyzed
> >> is
> >> > > > represented as a pair: <start, end> where the 'start' marks the
> >> beginning
> >> > > > of
> >> > > > the user's source code and 'end' terminates the sequence. The
> >> instructions
> >> > > > between 'start' and 'end' form the region that can be analyzed by
> >> llvm-mca
> >> > > > at a
> >> > > > later time.
> >> > > >
> >> > > > Example
> >> > > > -----------
> >> > > > Before we go into the details of this proposed change, let's first
> >> look at
> >> > > > a
> >> > > > simple example:
> >> > > >
> >> > > > // example.c -- Analyze a dot-product expression.
> >> > > > double test(double x, double y) {
> >> > > >    double result = 0.0;
> >> > > >    __mca_code_region_start(42);
> >> > > >    result += x * y;
> >> > > >    __mca_code_region_end();
> >> > > >    return result;
> >> > > > }
> >> > > >
> >> > > > In the example above, we have identified a code region, in this
> >> case a
> >> > > > single
> >> > > > dot-product expression. For the sake of brevity and simplicity,
> >> we've
> >> > > > chosen
> >> > > > a very simple example, but in reality a more complicated example
> >> could use
> >> > > > multiple expressions. We have also denoted this region as number
> >> 42. That
> >> > > > identifier is only for the user, and simplifies reading an llvm-mca
> >> > > > analysis
> >> > > > report later.
> >> > > >
> >> > > > When this code is compiled, the region markers (the mca_code_region
> >> > > > markers)
> >> > > > are transformed into assembly labels. While the markers are
> >> presented as
> >> > > > function calls, in reality they are no-ops.
> >> > > >
> >> > > > test:
> >> > > > pushq   %rbp
> >> > > > movq    %rsp, %rbp
> >> > > > movsd   %xmm0, -8(%rbp)
> >> > > > movsd   %xmm1, -16(%rbp)
> >> > > > .Lmca_code_region_start_0: # LLVM-MCA-START ID: 42
> >> > > > xorps   %xmm0, %xmm0
> >> > > > movsd   %xmm0, -24(%rbp)
> >> > > > movsd   -8(%rbp), %xmm0
> >> > > > mulsd   -16(%rbp), %xmm0
> >> > > > addsd   -24(%rbp), %xmm0
> >> > > > movsd   %xmm0, -24(%rbp)
> >> > > > .Lmca_code_region_end_0: # LLVM-MCA-END ID: 42
> >> > > > movsd   -24(%rbp), %xmm0
> >> > > > popq    %rbp
> >> > > > retq
> >> > > > .section        .mca_code_regions,"", at progbits
> >> > > > .quad   42
> >> > > > .quad   .Lmca_code_region_start_0
> >> > > > .quad   .Lmca_code_region_end_0-.Lmca_code_region_start_0
> >> > > >
> >> > > > The assembly has been trimmed to show the portions relevant to this
> >> RFC.
> >> > > > Notice the labels enclose the user's defined region, and that they
> >> > > > preserve the
> >> > > > user's arbitrary region identifier, the ever-so-important region 42.
> >> > > >
> >> > > > In the object file section .mca_code_regions, we have noted the
> >> user's
> >> > > > region
> >> > > > identifier (.quad 42), start address, and region size. A more
> >> complicated
> >> > > > example can have multiple regions defined within a single
> >> .mca_code_regions
> >> > > > section. This section can be read by llvm-mca, allowing llvm-mca to
> >> take
> >> > > > object files as input instead of assembly source.
> >> > > >
> >> > > > Details
> >> > > > ---------
> >> > > > We need a way for a user to identify a region/block of code they
> >> want
> >> > > > analyzed
> >> > > > by llvm-mca. We solve this problem by introducing two intrinsics
> >> that a
> >> > > > user can
> >> > > > specify, for identifying regions of code for analysis.
> >> > > >
> >> > > > The two intrinsics are: llvm.mca.code.regions.start and
> >> > > > llvm.mca.code.regions.end. A user can identify a code region by
> >> inserting
> >> > > > the
> >> > > > mca_code_region_start and mca_code_region_end markers. These are
> >> simply
> >> > > > clang builtins and are transformed into the aforementioned
> >> intrinsics
> >> > > > during
> >> > > > compilation. The code between the intrinsics are what we call "code
> >> > > > regions"
> >> > > > and are to be easily identifiable by llvm-mca; any code between a
> >> start/end
> >> > > > pair can be analyzed by llvm-mca at a later time. A user can define
> >> > > > multiple
> >> > > > non-overlapping code regions within their program.
> >> > > >
> >> > > > The llvm.mca.code.region.start intrinsic takes an integer constant
> >> as its
> >> > > > only
> >> > > > argument. This argument is implemented as a metadata i32, and is
> >> only used
> >> > > > when generating llvm-mca reports. This value allows a user to more
> >> easily
> >> > > > identify a specific code region. llvm.mca.code.region.end takes no
> >> > > > arguments.
> >> > > > Since we disallow nesting of regions, the first 'end' intrinsic
> >> lexically
> >> > > > following a 'start' intrinsic represents the end of that code
> >> region.
> >> > > >
> >> > > > Now that we have a solution for identifying regions for analysis,
> >> we now
> >> > > > need a
> >> > > > way for preserving that information to be read at a later time. To
> >> > > > accomplish
> >> > > > this we propose adding a new section (.mca_code_regions) to the
> >> object file
> >> > > > generated by llvm. During code generation, the start/end intrinsics
> >> > > > described
> >> > > > above will be transformed into start/end labels in assembly. When
> >> llvm
> >> > > > generates the object file from the user's code, these start/end
> >> labels
> >> > > > form a
> >> > > > pair of values identifying the start of the user's code region, and
> >> size.
> >> > > > The
> >> > > > size represents the number of bytes between the start and end
> >> address of
> >> > > > the
> >> > > > labels. Note that the labels are emitted during assembly printing.
> >> We hope
> >> > > > that these labels have no influence on code generation or
> >> basic-block
> >> > > > placement. However, the target assembler strategy for handling
> >> labels is
> >> > > > outside of our control.
> >> > > >
> >> > > > This proposed change affects the size of a binary, but only if the
> >> user
> >> > > > calls
> >> > > > the start/end builtins mentioned above. The additional size of the
> >> > > > .mca_code_regions section, which we imagine to be very small (to
> >> the order
> >> > > > of a
> >> > > > few bytes), can trivially be stripped by tools like 'strip' or
> >> 'objcopy'.
> >> > > >
> >> > > > Implementation Status
> >> > > > ------------------------------
> >> > > > We currently have the proposed changes implemented at the url
> >> posted below.
> >> > > > This initial patch only targets ELF object files, and does not
> >> handle
> >> > > > relocatable addresses. Since the start of a code region is
> >> represented as
> >> > > > an
> >> > > > assembly label, and referenced in the .mca_code_regions section,
> >> that
> >> > > > address
> >> > > > is relocatable. That value can be represented as section-relative
> >> > > > relocatable
> >> > > > symbol (.text + addend), but we are not handling that case yet.
> >> Instead,
> >> > > > the
> >> > > > proposed changes only handle linked/executable object files.
> >> > > >
> >> > > > For purposes of review and to communicate the idea, the change is
> >> > > > presented as a monolithic patch here:
> >> > > >
> >> > > > https://reviews.llvm.org/D54603
> >> > > >
> >> > > > The change is presented as a monolithic patch; however, if accepted
> >> > > > the patch will be split into three smaller patches:
> >> > > > 1. The introduction of the builtins to clang.
> >> > > > 2. The llvm portion (the added intrinsics).
> >> > > > 3. The llvm-mca portion.
> >> > > >
> >> > > > Thanks!
> >> > > >
> >> > > > -Matt
> >> > > > _______________________________________________
> >> > > > LLVM Developers mailing list
> >> > > > llvm-dev at lists.llvm.org
> >> > > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >> > > >
> >> > _______________________________________________
> >> > LLVM Developers mailing list
> >> > llvm-dev at lists.llvm.org
> >> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >>
> >