[llvm-dev] [RFC][llvm-mca] Adding binary support to llvm-mca.

Tue Dec 11 11:22:38 PST 2018

Thanks for the response Simon.  My reply is inline:

> From: Simon Pilgrim <llvm-dev at redking.me.uk>
> Sent: Monday, December 10, 2018 1:40 PM
>
> Hi Matt,
> 
> I can see a near future where perf-analysis tooling uses branch history
> profiler captures to determine how often loops/branches are taken and
> feeds that into llvm-mca, especially for hot/branchy loop analysis
> reports etc. Are you confident that your approach will be easily
> extendable for this?

That is a very interesting use case.  Restricting a code region to a single
basic block limits any tool that wants to analyze branches.  However, I
believe it will be easy to lift this restriction, since it is just a check in
IR/Verifier.  The llvm-mca driver itself does not impose this limitation.
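
To make the region rules from earlier in this thread concrete (every start
needs an end, no nesting, start and end in the same basic block), here is a
rough standalone sketch in C.  It is not the code in IR/Verifier.cpp: it
assumes the markers of one function have already been collected in lexical
order, and the types and the name mca_check_markers are made up for
illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical marker kinds, standing in for the
   llvm.mca.code.region.start / llvm.mca.code.region.end intrinsics. */
enum mca_marker_kind { MCA_REGION_START, MCA_REGION_END };

struct mca_marker {
    enum mca_marker_kind kind;
    int block; /* index of the basic block that contains the marker */
};

enum mca_check_result {
    MCA_OK,
    MCA_ERR_NESTED_START,  /* start seen while a region is still open */
    MCA_ERR_UNMATCHED_END, /* end seen with no open region */
    MCA_ERR_CROSSES_BLOCK, /* start and end live in different blocks */
    MCA_ERR_UNTERMINATED   /* start with no following end */
};

/* Walk the markers of one function in lexical order and apply the
   three rules.  Returns the first violation found, or MCA_OK. */
enum mca_check_result mca_check_markers(const struct mca_marker *m, size_t n)
{
    int open = 0;        /* is a region currently open? */
    int open_block = -1; /* block of the currently open start marker */

    assert(m != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {
        if (m[i].kind == MCA_REGION_START) {
            if (open)
                return MCA_ERR_NESTED_START;
            open = 1;
            open_block = m[i].block;
        } else {
            if (!open)
                return MCA_ERR_UNMATCHED_END;
            /* The single-basic-block restriction is this one comparison. */
            if (m[i].block != open_block)
                return MCA_ERR_CROSSES_BLOCK;
            open = 0;
        }
    }
    return open ? MCA_ERR_UNTERMINATED : MCA_OK;
}
```

In this sketch, lifting the single-block restriction amounts to dropping the
block comparison; a dominance check, as suggested earlier in the thread, would
need real control-flow information that this flat list does not carry.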

If the information is coming from a profile report, then we'd most likely
need to extend the llvm-mca driver to accept profile reports.  Currently,
code regions, from the perspective of the llvm-mca driver, are very simple:
they are just a collection of MCInst.  The binary support in this RFC+patch
disassembles only the address range that begins at the marker's start address
and extends for a specified number of bytes.  It might be useful to add another
driver argument so that a user (or tool) can specify, from the command line, a
range of instructions to analyze.  I recently added a class for handling
llvm-mca inputs, llvm::mca::CodeRegionGenerator, which is just responsible for
taking some input and creating the list of MCInst that llvm-mca uses.  We could
subclass this to handle profile reports.
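
For illustration, the three .quad values per region in the assembly example
from the original RFC (identifier, start address, size) could be read back
from the raw section bytes roughly as below.  This is a sketch, not what the
patch does: it assumes little-endian 8-byte fields and fixed 24-byte records,
and struct mca_region, read_le64, and mca_parse_regions are hypothetical
names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One record of the .mca_code_regions section, following the three
   .quad directives in the RFC's assembly example. */
struct mca_region {
    uint64_t id;    /* user-chosen identifier, e.g. 42 */
    uint64_t start; /* address of the start label */
    uint64_t size;  /* end label minus start label, in bytes */
};

/* Read an unsigned 64-bit little-endian value (assumption: x86-64). */
static uint64_t read_le64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; --i)
        v = (v << 8) | p[i];
    return v;
}

/* Decode up to max_out records from the section contents.  Returns the
   number of records written, or -1 if the length is not a whole number
   of 24-byte records. */
long mca_parse_regions(const unsigned char *data, size_t len,
                       struct mca_region *out, size_t max_out)
{
    const size_t rec = 3 * sizeof(uint64_t);
    size_t n;

    assert(data != NULL || len == 0);
    if (len % rec != 0)
        return -1;
    n = len / rec;
    if (n > max_out)
        n = max_out;
    for (size_t i = 0; i < n; ++i) {
        const unsigned char *p = data + i * rec;
        out[i].id = read_le64(p);
        out[i].start = read_le64(p + 8);
        out[i].size = read_le64(p + 16);
    }
    return (long)n;
}
```

Each decoded record then gives the driver a start address and a byte count to
disassemble, which matches how the binary support is described above.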

> Similarly, being able to generally embed the profile markers in object
> libraries for reuse is going to be important for some people - I'd like
> to see more of a plan of how this will be achieved. I understand that it
> might not be easy for some exe formats.

That is definitely a limitation.  This initial patch+RFC only handles linked
executables (i.e., files in which the llvm-mca marker symbol addresses are
already resolved).  I'm working on a better solution so that this will not be
a restriction.  In fact, I'll probably delay trying to land any patches until
I solve relocations (or find a different way of identifying the start/end
addresses of llvm-mca code regions).

> Sorry if I'm being too critical, but I'm a bit worried that we end up
> with an initial implementation that will take a lot of reworking to meet
> our final aims.
> 
> Thanks, Simon.

I understand your criticisms and value your input. Thanks a ton!

-Matt

> On 10/12/2018 19:32, Matt Davis wrote:
> > Thanks for the feedback Guillaume and Clement!
> >
> > In response to Clement:
> >
> >>> In terms of future-proofness of only allowing regions within a basic
> >>> block, are we confident we can actually ever simulate branches apart from
> >>> "always taken, perfectly predicted" loop? Even this simple need requires
> >>> knowing quite a few details on the frontend. The current design could
> >>> handle this use case with the addition of an external "loop mode" option to
> >>> MCA. If there are no other strong use cases, I would advocate for
> >>> experimental intrinsics unless people can contribute other example use
> >>> cases.
> > In short, I am in agreement and think that handling of branching or loop
> > constructs should be isolated to the llvm-mca driver/front-end.  The
> > only thing the code regions should be concerned with is identifying
> > blocks of instructions that will later be used by the front end.
> >
> > We can place limitations to how those blocks are formed. For example the
> > current implementation forces regions to be isolated to a single basic
> > block.  However, we anticipate lifting this restriction once branching
> > is handled.
> >
> > -Matt
> >
> >
> > On Mon, Dec 10, 2018 at 04:15:46PM +0100, Guillaume Chatelet wrote:
> >> +1 to what Clement said.
> >> I believe the intrinsics are a better design to support many architectures.
> >>
> >> IACA users are probably decorating their code with IACA_START / IACA_END
> >> macros. One possibility is to provide a header that define these macros in
> >> terms of the new intrinsics.
> >>
> >> On Mon, Dec 10, 2018 at 3:59 PM Clement Courbet <courbet at google.com> wrote:
> >>
> >>> Hi Matt/Andrea,
> >>>
> >>> I see pros and cons for IACA-style markers vs intrinsics.
> >>> On the one hand, IACA-style markers are very magical, and not very visible
> >>> in both the source and object code. Using IACA-style markers has the
> >>> advantage that you can use llvm-mca as a drop-in replacement for IACA, or
> >>> even to compare their outputs on the exact same binary. They also do not
> >>> require tooling on the compiler side and allow comparing the output of
> >>> several compilers.
> >>>
> >>> On the other hand, IACA-style markers do not have an equivalent on other
> >>> architectures, and I'm not sure inventing new ones is a good idea :) I
> >>> think the latter makes them pretty much a no-go for llvm-mca as I don't
> >>> think we'll want to teach each target how to parse code regions. That's
> >>> much better handled in a target-agnostic way by the object. Intel got away
> >>> with them because they only had to support one architecture.
> >>>
> >>> tl;dr: In the case of llvm-mca, I like your design better than the markers.
> >>>
> >>> In terms of future-proofness of only allowing regions within a basic
> >>> block, are we confident we can actually ever simulate branches apart from
> >>> "always taken, perfectly predicted" loop? Even this simple need requires
> >>> knowing quite a few details on the frontend. The current design could
> >>> handle this use case with the addition of an external "loop mode" option to
> >>> MCA. If there are no other strong use cases, I would advocate for
> >>> experimental intrinsics unless people can contribute other example use
> >>> cases.
> >>>
> >>> On Mon, Dec 3, 2018 at 11:38 PM Matt Davis <matthew.davis at sony.com> wrote:
> >>>
> >>>> Hi Andrea,
> >>>>
> >>>> On Mon, Dec 03, 2018 at 01:21:33PM +0000, Andrea Di Biagio wrote:
> >>>>> So, I have been thinking a bit more about this whole design.
> >>>>>
> >>>>> The more I think about your suggested design, the more I am convinced
> >>>>> that we should do something more to support ranges in binary object
> >>>>> files too.
> >>>>> My understanding is that the reason why we don't support object files in
> >>>>> general is the presence of relocations. That is because a region start
> >>>>> marker is effectively symbol relative, and the symbol (a function) would
> >>>>> be relocated in the final executable.
> >>>>> You mentioned to me that resolving even a 'simple' symbol-relative
> >>>>> relocation is not trivial, because it requires specific knowledge about
> >>>>> the binary format and the target (i.e. how relocations are encoded is
> >>>>> target specific). I am surprised that there is not a utility library for
> >>>>> resolving relocations, but I am not familiar with that part of the
> >>>>> compiler. I was hoping that there was a target specific interface to use
> >>>>> in this case...
> >>>> There might be a better way of resolving the relocs, but from what I saw
> >>>> looking at llvm-objdump and other related tools, it seems that resolving
> >>>> the relocated symbol is a target specific effort.  I also spent some time
> >>>> sniffing around ExecutionEngine/RuntimeDyld/RuntimeDyld.cpp which also
> >>>> performs the reloc resolution.  I should clarify that I too am not an
> >>>> expert in llvm's utilities for performing symbol/reloc resolution, and
> >>>> perhaps someone in the community can point me in the right direction.  I
> >>>> can clearly see the reloc data in the object file via tools like
> >>>> objdump; however, accessing the relocs via
> >>>> llvm::object::ObjectFile::relocations() did not produce address values
> >>>> that we could use (values of zero).
> >>>>
> >>>> I was hoping that, for a first pass at this patch, supporting just
> >>>> executables would be okay.  That keeps this initial patch set simple,
> >>>> and hopefully will encourage others to take a peek at it, since it's
> >>>> less daunting than what it might otherwise be.  Of course, there is the
> >>>> concern that this initial patch will lock us into a design that will be
> >>>> more complicated to unravel later.
> >>>>
> >>>>> An alternative approach would require that you define your own
> >>>>> "symbol-relative" reference. After all, ranges are just sequences of
> >>>>> instructions in a function. If a function symbol is described by the
> >>>>> symbol table, then you should be able to obtain its offset in the .text
> >>>>> section. So, you could potentially encode your own symbol+offset.
> >>>>> However, the linker would not be able to understand your "custom
> >>>>> relocation", and information about regions in the final ELF would be
> >>>>> basically broken. So, that would not be a solution...
> >>>>>
> >>>>> I honestly don't know what the best approach is in this case.
> >>>>> As a compromise, it would not be a bad idea to add the ability to
> >>>>> specify ranges from the command line. What do you think?
> >>>>> Still, from a user point of view, the idea that we don't support object
> >>>>> files in general sounds like a big limitation.
> >>>> I agree, only supporting executables is a limitation.  However, I'd
> >>>> like to land the base support now and add in the additional
> >>>> features/support after this large patch set lands.  But I can see
> >>>> where landing the whole thing entirely also makes sense.
> >>>>
> >>>>> About the new experimental intrinsics: those would definitely work well
> >>>>> for the simple case where instructions are from the same basic block.
> >>>>> However, some/most of the constraints that you plan to add will have to
> >>>>> change if in future we decide to allow ranges that potentially cross
> >>>>> multiple basic blocks. How will the rules/constraints on those new
> >>>>> intrinsics change? I just want to make sure that the suggested design is
> >>>>> future-proof.
> >>>> Since the llvm/clang parts of the code are just responsible for
> >>>> collecting where a range starts/ends, I hope that we can remove some
> >>>> of the baked-in constraints that are specified in IR/Verifier.cpp.
> >>>> As you pointed out earlier in this thread, we might want to
> >>>> introduce a dominance check if/when we lift the one-basic-block
> >>>> restriction.
> >>>>
> >>>> -Matt
> >>>>
> >>>>> -Andrea
> >>>>>
> >>>>> On Tue, Nov 27, 2018 at 5:08 PM Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote:
> >>>>>
> >>>>>> Thanks for clarifying it Matt.
> >>>>>>
> >>>>>> In general, I quite like your suggested design.
> >>>>>>
> >>>>>> My only concern is about the semantics of the two new intrinsics. Your
> >>>>>> design doesn't allow mca ranges to span multiple basic blocks. That
> >>>>>> constraint is acceptable for now, since llvm-mca doesn't know how to
> >>>>>> deal with control flow.
> >>>>>> However, I am a bit concerned about what might happen in future if we
> >>>>>> decide to let users specify code regions that span multiple basic
> >>>>>> blocks. Basically, I don't particularly like the idea of changing the
> >>>>>> semantics of already existing intrinsics. A design that already accounts
> >>>>>> for that particular scenario/future work would be ideal. That being
> >>>>>> said, marking those new intrinsics as 'experimental' may be a good
> >>>>>> compromise (at least for now).
> >>>>>>
> >>>>>> So, I am quite happy overall with the direction of this RFC.
> >>>>>> However, I am interested in hearing from other developers about your
> >>>>>> suggested design.
> >>>>>>
> >>>>>>> This initial patch only targets ELF object files, and does not handle
> >>>>>>> relocatable addresses. Since the start of a code region is represented
> >>>>>>> as an assembly label, and referenced in the .mca_code_regions section,
> >>>>>>> that address is relocatable.
> >>>>>>
> >>>>>> This may be okay for now. However, it would be nice to remove that
> >>>>>> constraint in future and add support to generic object files.
> >>>>>>
> >>>>>> -Andrea
> >>>>>>
> >>>>>> On Thu, Nov 22, 2018 at 7:21 PM <Matthew.Davis at sony.com> wrote:
> >>>>>>
> >>>>>>> I want to clarify a few restrictions of llvm-mca code regions that this
> >>>>>>> RFC proposes:
> >>>>>>>
> >>>>>>> 1) All llvm-mca code regions must start with an
> >>>>>>> llvm.mca.code.region.start intrinsic and end with an
> >>>>>>> llvm.mca.code.region.end intrinsic.  This rule is enforced at the IR
> >>>>>>> level in the IR verifier.
> >>>>>>>
> >>>>>>> 2) llvm-mca code regions cannot nest.  This restriction implies that an
> >>>>>>> llvm.mca.code.region.start must have an llvm.mca.code.region.end
> >>>>>>> intrinsic without any other llvm.mca start intrinsics between the two.
> >>>>>>> The current implementation in the patch enforces this restriction at
> >>>>>>> the IR level via the IR Verifier.
> >>>>>>>
> >>>>>>> 3) An llvm-mca code region cannot span multiple basic blocks.  llvm-mca
> >>>>>>> does not follow branches (yet).  Instead, a branch instruction is
> >>>>>>> treated by llvm-mca like any other instruction.  The current patch
> >>>>>>> associated with this RFC does not enforce this restriction.  I plan on
> >>>>>>> updating the patch to enforce that a code region can only belong to a
> >>>>>>> single basic block.  This is a simple check, ensuring that both the
> >>>>>>> llvm.mca.code.region.start and accompanying end intrinsics live in the
> >>>>>>> same basic block.  I imagine adding this check at the IR level when we
> >>>>>>> also verify points 1 and 2 above.  That will keep the code-region
> >>>>>>> verification logic isolated to the IR verifier.  The start/end
> >>>>>>> intrinsics should not have any uses, so I'm not sure that they would be
> >>>>>>> moved/sunk on behalf of any other instruction.  In other words, I do
> >>>>>>> not imagine that a start and end would be split apart due to later MI
> >>>>>>> optimizations.  If I discover that such a case occurs, then I might add
> >>>>>>> the basic-block check prior to emitting the code region data to the
> >>>>>>> object file.  Once llvm-mca is updated to handle branches, then we can
> >>>>>>> remove this constraint.
> >>>>>>>
> >>>>>>> -Matt
> >>>>>>>
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Matt Davis via llvm-dev
> >>>>>>>> Sent: Wednesday, November 21, 2018 8:47 AM
> >>>>>>>> To: Andrea Di Biagio <andrea.dibiagio at gmail.com>
> >>>>>>>> Cc: llvm-dev <llvm-dev at lists.llvm.org>; Di Biagio, Andrea
> >>>>>>>> <Andrea.Dibiagio at sony.com>; cfe-dev at lists.llvm.org
> >>>>>>>> Subject: Re: [llvm-dev] [RFC][llvm-mca] Adding binary support to llvm-mca.
> >>>>>>>> Hi Andrea,
> >>>>>>>>
> >>>>>>>> Thanks for your input.
> >>>>>>>>
> >>>>>>>> On Wed, Nov 21, 2018 at 12:43:52PM +0000, Andrea Di Biagio wrote:
> >>>>>>>> [... snip ...]
> >>>>>>>>> About the suggested design:
> >>>>>>>>> I like the idea of being able to identify code regions using a numeric
> >>>>>>>>> identifier.
> >>>>>>>>> However, what happens if a code region spans multiple basic blocks?
> >>>>>>>> The current patch does not take into consideration cases where the
> >>>>>>>> region start and end intrinsics are placed in different basic blocks.
> >>>>>>>> Such would be the case if a region is defined to span multiple blocks.
> >>>>>>>> This would be similar to the current case where a user places a
> >>>>>>>> #LLVM-MCA-BEGIN assembly comment in one block and an #LLVM-MCA-END in
> >>>>>>>> another.  However, as you point out below, if the user does this in
> >>>>>>>> the source code via intrinsics (just what this patch is proposing),
> >>>>>>>> then there is a chance that optimizations might change the layout of
> >>>>>>>> the instructions and confuse the ordering of the MCA intrinsics.
> >>>>>>>>
> >>>>>>>> Since MCA does not follow branches (MCA just treats a branch as it
> >>>>>>>> would a non-branching instruction), it seems that a user should be
> >>>>>>>> aware that defining MCA code regions that span multiple blocks might
> >>>>>>>> result in an unexpected analysis.  While we do not discourage this, it
> >>>>>>>> seems like such a case will probably not produce an expected result
> >>>>>>>> for the user.  We could introduce a warning, or automatically divide
> >>>>>>>> the regions so that a single region can only contain a single block.
> >>>>>>>>
> >>>>>>>>> My understanding is that code regions are not allowed to overlap. So,
> >>>>>>>>> it makes sense if `__mca_code_region_end()` doesn't take an ID as
> >>>>>>>>> input. However, what if `__mca_code_region_end()` ends up in a
> >>>>>>>>> different basic block?
> >>>>>>>>> `__mca_code_region_start()` has to always dominate
> >>>>>>>>> `__mca_code_region_end()`. This is trivial to verify when both calls
> >>>>>>>>> are in the same basic block; however, we need to make sure that the
> >>>>>>>>> relationship is still the same when the `end()` call is in a different
> >>>>>>>>> basic block.
> >>>>>>>>> That would not be enough. I think we should also verify that
> >>>>>>>>> `__mca_code_region_end()` always post-dominates the call to
> >>>>>>>>> `__mca_code_region_start()`.
> >>>>>>>> In any case this patch should probably check dominance of the
> >>>>>>>> intrinsics, even though MCA does not follow branches and MCA does not
> >>>>>>>> explicitly forbid a region from containing multiple blocks.
> >>>>>>>>
> >>>>>>>>> My question is: what happens with basic block reordering? We don't
> >>>>>>>>> know the layout of basic blocks until we reach code emission. How does
> >>>>>>>>> it work for regions that span multiple basic blocks? I think your RFC
> >>>>>>>>> should clarify this aspect.
> >>>>>>>>>
> >>>>>>>>> As a side note: at the moment, llvm-mca doesn't know how to deal with
> >>>>>>>>> branches. So, for simplicity we could force code regions to only
> >>>>>>>>> contain instructions from a single basic block.
> >>>>>>>>>
> >>>>>>>>> However, in future we may want to teach llvm-mca how to analyze
> >>>>>>>>> branchy code too. For example, we could introduce a simple
> >>>>>>>>> control-flow analysis in llvm-mca, and use external "branch trace"
> >>>>>>>>> information (for example, a perf trace generated by an external tool)
> >>>>>>>>> to decorate branches with branch probabilities (similarly to what we
> >>>>>>>>> currently do in LLVM with PGO). We could then use that knowledge to
> >>>>>>>>> model branch prediction and simulate what happens in the presence of
> >>>>>>>>> multiple branches.
> >>>>>>>>>
> >>>>>>>>> So, the idea of having regions that potentially span multiple basic
> >>>>>>>>> blocks is not bad in general. However, I think you should better
> >>>>>>>>> clarify what the constraints are (at least, you should answer my
> >>>>>>>>> questions from before).
> >>>>>>>> I agree! Thanks for pointing that out.
> >>>>>>>>
> >>>>>>>>> If we decide to use those new intrinsics, then those should be
> >>>>>>>>> experimental (at least to start).
> >>>>>>>> Agreed.
> >>>>>>>>
> >>>>>>>> -Matt
> >>>>>>>>
> >>>>>>>>> On Thu, Nov 15, 2018 at 11:07 PM via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> >>>>>>>>>
> >>>>>>>>>> Introduction
> >>>>>>>>>> -----------------
> >>>>>>>>>> Currently llvm-mca only accepts assembly code as input. We would like
> >>>>>>>>>> to extend llvm-mca to support object files, allowing users to analyze
> >>>>>>>>>> the performance of binaries. The proposed changes (which involve both
> >>>>>>>>>> clang and llvm) optionally introduce an object file section, but this
> >>>>>>>>>> can be stripped out if desired.
> >>>>>>>>>>
> >>>>>>>>>> For the llvm-mca binary support feature to be useful, a user needs to
> >>>>>>>>>> tell llvm-mca which portions of their code they would like analyzed.
> >>>>>>>>>> Currently, this is accomplished via assembly comments. However,
> >>>>>>>>>> assembly comments are not preserved in object files, which motivated
> >>>>>>>>>> this RFC. For the proposed binary support, we need to introduce changes
> >>>>>>>>>> to clang and llvm to allow the user's object code to be recognized by
> >>>>>>>>>> llvm-mca:
> >>>>>>>>>>
> >>>>>>>>>> * We need a way for a user to identify a region/block of code they want
> >>>>>>>>>>     analyzed by llvm-mca.
> >>>>>>>>>> * We need the information defining the user's region of code to be
> >>>>>>>>>>     maintained in the object file so that llvm-mca can analyze the
> >>>>>>>>>>     desired region(s) from the object file.
> >>>>>>>>>>
> >>>>>>>>>> We define a "code region" as a subset of a user's program that is to be
> >>>>>>>>>> analyzed via llvm-mca. The sequence of instructions to be analyzed is
> >>>>>>>>>> represented as a pair: <start, end> where 'start' marks the beginning
> >>>>>>>>>> of the user's source code and 'end' terminates the sequence. The
> >>>>>>>>>> instructions between 'start' and 'end' form the region that can be
> >>>>>>>>>> analyzed by llvm-mca at a later time.
> >>>>>>>>>>
> >>>>>>>>>> Example
> >>>>>>>>>> -----------
> >>>>>>>>>> Before we go into the details of this proposed change, let's first look
> >>>>>>>>>> at a simple example:
> >>>>>>>>>>
> >>>>>>>>>> // example.c -- Analyze a dot-product expression.
> >>>>>>>>>> double test(double x, double y) {
> >>>>>>>>>>     double result = 0.0;
> >>>>>>>>>>     __mca_code_region_start(42);
> >>>>>>>>>>     result += x * y;
> >>>>>>>>>>     __mca_code_region_end();
> >>>>>>>>>>     return result;
> >>>>>>>>>> }
> >>>>>>>>>>
> >>>>>>>>>> In the example above, we have identified a code region, in this case a
> >>>>>>>>>> single dot-product expression. For the sake of brevity and simplicity,
> >>>>>>>>>> we've chosen a very simple example, but in reality a more complicated
> >>>>>>>>>> example could use multiple expressions. We have also denoted this
> >>>>>>>>>> region as number 42. That identifier is only for the user, and
> >>>>>>>>>> simplifies reading an llvm-mca analysis report later.
> >>>>>>>>>>
> >>>>>>>>>> When this code is compiled, the region markers (the mca_code_region
> >>>>>>>>>> markers) are transformed into assembly labels. While the markers are
> >>>>>>>>>> presented as function calls, in reality they are no-ops.
> >>>>>>>>>>
> >>>>>>>>>> test:
> >>>>>>>>>> pushq   %rbp
> >>>>>>>>>> movq    %rsp, %rbp
> >>>>>>>>>> movsd   %xmm0, -8(%rbp)
> >>>>>>>>>> movsd   %xmm1, -16(%rbp)
> >>>>>>>>>> .Lmca_code_region_start_0: # LLVM-MCA-START ID: 42
> >>>>>>>>>> xorps   %xmm0, %xmm0
> >>>>>>>>>> movsd   %xmm0, -24(%rbp)
> >>>>>>>>>> movsd   -8(%rbp), %xmm0
> >>>>>>>>>> mulsd   -16(%rbp), %xmm0
> >>>>>>>>>> addsd   -24(%rbp), %xmm0
> >>>>>>>>>> movsd   %xmm0, -24(%rbp)
> >>>>>>>>>> .Lmca_code_region_end_0: # LLVM-MCA-END ID: 42
> >>>>>>>>>> movsd   -24(%rbp), %xmm0
> >>>>>>>>>> popq    %rbp
> >>>>>>>>>> retq
> >>>>>>>>>> .section        .mca_code_regions,"",@progbits
> >>>>>>>>>> .quad   42
> >>>>>>>>>> .quad   .Lmca_code_region_start_0
> >>>>>>>>>> .quad   .Lmca_code_region_end_0-.Lmca_code_region_start_0
> >>>>>>>>>>
> >>>>>>>>>> The assembly has been trimmed to show the portions relevant to this
> >>>>>>>>>> RFC. Notice that the labels enclose the user's defined region, and that
> >>>>>>>>>> they preserve the user's arbitrary region identifier, the
> >>>>>>>>>> ever-so-important region 42. In the object file section
> >>>>>>>>>> .mca_code_regions, we have noted the user's region identifier
> >>>>>>>>>> (.quad 42), start address, and region size. A more complicated example
> >>>>>>>>>> can have multiple regions defined within a single .mca_code_regions
> >>>>>>>>>> section. This section can be read by llvm-mca, allowing llvm-mca to
> >>>>>>>>>> take object files as input instead of assembly source.
> >>>>>>>>>>
> >>>>>>>>>> Details
> >>>>>>>>>> ---------
> >>>>>>>>>> We need a way for a user to identify a region/block of code they want
> >>>>>>>>>> analyzed by llvm-mca. We solve this problem by introducing two
> >>>>>>>>>> intrinsics that a user can specify to identify regions of code for
> >>>>>>>>>> analysis.
> >>>>>>>>>>
> >>>>>>>>>> The two intrinsics are: llvm.mca.code.region.start and
> >>>>>>>>>> llvm.mca.code.region.end. A user can identify a code region by
> >>>>>>>>>> inserting the mca_code_region_start and mca_code_region_end markers.
> >>>>>>>>>> These are simply clang builtins and are transformed into the
> >>>>>>>>>> aforementioned intrinsics during compilation. The code between the
> >>>>>>>>>> intrinsics is what we call a "code region" and is to be easily
> >>>>>>>>>> identifiable by llvm-mca; any code between a start/end pair can be
> >>>>>>>>>> analyzed by llvm-mca at a later time. A user can define multiple
> >>>>>>>>>> non-overlapping code regions within their program.
> >>>>>>>>>>
> >>>>>>>>>> The llvm.mca.code.region.start intrinsic takes an integer constant as
> >>>>>>>>>> its only argument. This argument is implemented as a metadata i32, and
> >>>>>>>>>> is only used when generating llvm-mca reports. This value allows a user
> >>>>>>>>>> to more easily identify a specific code region.
> >>>>>>>>>> llvm.mca.code.region.end takes no arguments. Since we disallow nesting
> >>>>>>>>>> of regions, the first 'end' intrinsic lexically following a 'start'
> >>>>>>>>>> intrinsic represents the end of that code region.
> >>>>>>>>>>
> >>>>>>>>>> Now that we have a solution for identifying regions for analysis, we
> >>>>>>>>>> need a way of preserving that information to be read at a later time.
> >>>>>>>>>> To accomplish this we propose adding a new section (.mca_code_regions)
> >>>>>>>>>> to the object file generated by llvm. During code generation, the
> >>>>>>>>>> start/end intrinsics described above will be transformed into start/end
> >>>>>>>>>> labels in assembly. When llvm generates the object file from the user's
> >>>>>>>>>> code, these start/end labels form a pair of values identifying the
> >>>>>>>>>> start of the user's code region, and its size. The size represents the
> >>>>>>>>>> number of bytes between the start and end address of the labels. Note
> >>>>>>>>>> that the labels are emitted during assembly printing. We hope that
> >>>>>>>>>> these labels have no influence on code generation or basic-block
> >>>>>>>>>> placement. However, the target assembler strategy for handling labels
> >>>>>>>>>> is outside of our control.
> >>>>>>>>>>
> >>>>>>>>>> This proposed change affects the size of a binary, but only if the user
> >>>>>>>>>> calls the start/end builtins mentioned above. The additional size of
> >>>>>>>>>> the .mca_code_regions section, which we imagine to be very small (on
> >>>>>>>>>> the order of a few bytes), can trivially be stripped by tools like
> >>>>>>>>>> 'strip' or 'objcopy'.
> >>>>>>>>>>
> >>>>>>>>>> Implementation Status
> >>>>>>>>>> ------------------------------
> >>>>>>>>>> We currently have the proposed changes implemented at the URL posted
> >>>>>>>>>> below. This initial patch only targets ELF object files, and does not
> >>>>>>>>>> handle relocatable addresses. Since the start of a code region is
> >>>>>>>>>> represented as an assembly label, and referenced in the
> >>>>>>>>>> .mca_code_regions section, that address is relocatable. That value can
> >>>>>>>>>> be represented as a section-relative relocatable symbol
> >>>>>>>>>> (.text + addend), but we are not handling that case yet. Instead, the
> >>>>>>>>>> proposed changes only handle linked/executable object files.
> >>>>>>>>>>
> >>>>>>>>>> For purposes of review and to communicate the idea, the change is
> >>>>>>>>>> presented as a monolithic patch here:
> >>>>>>>>>>
> >>>>>>>>>> https://reviews.llvm.org/D54603
> >>>>>>>>>>
> >>>>>>>>>> If accepted, the patch will be split into three smaller patches:
> >>>>>>>>>> 1. The introduction of the builtins to clang.
> >>>>>>>>>> 2. The llvm portion (the added intrinsics).
> >>>>>>>>>> 3. The llvm-mca portion.
> >>>>>>>>>>
> >>>>>>>>>> Thanks!
> >>>>>>>>>>
> >>>>>>>>>> -Matt
> >>>>>>>>>> _______________________________________________
> >>>>>>>>>> LLVM Developers mailing list
> >>>>>>>>>> llvm-dev at lists.llvm.org
> >>>>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >>>>>>>>>>