[cfe-dev] [llvm-dev] [RFC][llvm-mca] Adding binary support to llvm-mca.

Mon Dec 17 09:48:16 PST 2018

Adding the llvm-dev list, because my email client decides to remove certain lists when I reply-all... including the list that I intend to respond to.

> -----Original Message-----
> From: Davis, Matthew <Matthew.Davis at sony.com>
> Sent: Monday, December 17, 2018 9:47 AM
> To: Davis, Matthew <Matthew.Davis at sony.com>; llvm-dev at redking.me.uk
> Cc: cfe-dev at lists.llvm.org
> Subject: RE: [llvm-dev] [RFC][llvm-mca] Adding binary support to llvm-mca.
> 
> Just an update to this RFC.  (I thought this was going to be a short email... my
> apologies.)
> 
> One of the primary limitations described in this RFC (earlier in this thread) is that the
> patch for this RFC only handles linked executables.  This restriction is due to my
> wanting to avoid handling relocations in a target-specific manner, at least in the
> initial patch, especially since I want to keep the initial patch simple.  However,
> sometimes simple and practical are at odds with each other.  I envision the main use
> case of llvm-mca with binary support to be analyzing .o files, probably more commonly
> than fully linked executables.  Without handling relocatable objects, this patch seems
> rather useless.
> 
> I started exploring an alternative solution to the aforementioned problem this
> weekend.  This alternative avoids having to handle relocations, but still gives
> us support for object files (with relocated symbols) and linked executables.  The
> change is quite simple and seems to be effective.  In short, we still generate
> intrinsics as discussed in the RFC: one to mark the start of a code region, and
> another to mark the end.  These intrinsics get lowered into local symbols, and the
> symbols already encode address information about their position in the
> object file.  What is different is that we ensure that these symbols have unique
> names and also encode the user-provided ID value.
> 
> Previously the labels were named like .Lmca_code_region_start_<number>, and
> similarly for .Lmca_code_region_end.  The user ID number and region size were
> encoded in the .mca_code_regions object file section, and llvm-mca never looked
> at the symbol table.  But in reality we can calculate the region size from the
> mca symbols in the symbol table, instead of relying on the
> information encoded in .mca_code_regions.  The alternative approach gets rid of
> that section entirely but achieves the same functionality by encoding the information
> in the symbol name.  In short, the alternative approach just parses the symbol table
> for MCA symbols, whose names encode the data we need.
> 
> The newly proposed name is formatted
> as .Lmca_code_region_start.<id>.<function>.<number>, and similarly for
> .Lmca_code_region_end.  'function' is the function that the marker appears in, 'id' is
> the user-specified ID (a value that users specify to easily identify the code
> region under analysis; it is purely cosmetic), and 'number' is a unique number to avoid
> any duplicate name conflicts.  The benefit of this alternative solution is that we can get
> rid of .mca_code_regions and gather all of the information llvm-mca needs by
> parsing the symbol table for any symbols with the 'mca_code_region_start'
> and 'mca_code_region_end' formats discussed above.  Of course, if the string table is
> stripped, then we will lose this data.  The main drawback of this alternative
> approach is that it relies on encoding data in symbol names and on string processing of
> those names.  I'm somewhat biased against string parsing, but the code to perform
> this is simple and small, and, more importantly, it allows llvm-mca to handle linked or
> relocatable object files.
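A minimal sketch of the string processing this requires, in C. It assumes the name layout above and also assumes that end markers carry the same <id>.<function>.<number> fields. `McaMarker` and `parse_mca_marker` are hypothetical names. The <id> is read up to the first '.' and the <number> after the last '.', so function names that themselves contain dots (such as `foo.cold`) still parse.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical decoded form of ".Lmca_code_region_{start,end}.<id>.<function>.<number>". */
typedef struct {
  int is_start;          /* 1 for a start marker, 0 for an end marker */
  unsigned long id;      /* user-specified (cosmetic) region ID */
  char function[256];    /* name of the enclosing function */
  unsigned long number;  /* uniquing number */
} McaMarker;

static const char kStartPrefix[] = ".Lmca_code_region_start.";
static const char kEndPrefix[] = ".Lmca_code_region_end.";

/* Returns 1 and fills *out if `name` matches the assumed format, else 0. */
static int parse_mca_marker(const char *name, McaMarker *out) {
  assert(name != NULL && out != NULL);
  const char *rest;
  if (strncmp(name, kStartPrefix, sizeof kStartPrefix - 1) == 0) {
    out->is_start = 1;
    rest = name + sizeof kStartPrefix - 1;
  } else if (strncmp(name, kEndPrefix, sizeof kEndPrefix - 1) == 0) {
    out->is_start = 0;
    rest = name + sizeof kEndPrefix - 1;
  } else {
    return 0; /* not an mca marker symbol */
  }

  /* <id>: decimal digits up to the first '.'. */
  if (!isdigit((unsigned char)*rest))
    return 0;
  char *after;
  out->id = strtoul(rest, &after, 10);
  if (*after != '.')
    return 0;

  /* <number>: decimal digits after the LAST '.'. */
  const char *func = after + 1;
  const char *last_dot = strrchr(func, '.');
  if (last_dot == NULL || last_dot == func ||
      !isdigit((unsigned char)last_dot[1]))
    return 0;
  out->number = strtoul(last_dot + 1, &after, 10);
  if (*after != '\0')
    return 0;

  /* <function>: everything between the two. */
  size_t len = (size_t)(last_dot - func);
  if (len >= sizeof out->function)
    return 0;
  memcpy(out->function, func, len);
  out->function[len] = '\0';
  return 1;
}
```

The fixed 256-byte function buffer is a simplification. Long mangled C++ names would need a dynamically sized buffer, or a pointer and length into the string table.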
> 
> -Matt
> 
> 
> 
> > -----Original Message-----
> > From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of via llvm-dev
> > Sent: Tuesday, December 11, 2018 11:23 AM
> > To: llvm-dev at redking.me.uk; llvm-dev at lists.llvm.org
> > Cc: cfe-dev at lists.llvm.org
> > Subject: Re: [llvm-dev] [RFC][llvm-mca] Adding binary support to llvm-mca.
> >
> > Thanks for the response, Simon.  My reply is inline:
> >
> > > From: Simon Pilgrim <llvm-dev at redking.me.uk>
> > > Sent: Monday, December 10, 2018 1:40 PM
> > >
> > > Hi Matt,
> > >
> > > I can see a near future where perf-analysis tooling uses branch history
> > > profiler captures to determine how often loops/branches are taken and
> > > feeds that into llvm-mca, especially for hot/branchy loop analysis
> > > reports etc. Are you confident that your approach will be easily
> > > extendable for this?
> >
> > That is a very interesting use case.  The restriction of a code-region to a single block
> > is a limitation for any tools that want to analyze branches.  However, I believe
> > that it will be easy to lift this restriction (it's just a check in IR/Verifier).  This
> > limitation is not expressed in the llvm-mca driver.
> >
> > If the information is coming from a profile report, then we'd most
> > likely need to extend the llvm-mca driver to accept profile reports.  Currently,
> > code regions, from the perspective of the llvm-mca driver, are very simple: they are
> > just a collection of MCInst.  The binary support in this RFC+patch
> > disassembles just the address range starting at the marker's start address for
> > some specified number of bytes.  It might be useful to add another driver argument
> > so that a user (or tool) can specify, from the command line, a range of instructions
> > to analyze.  I recently added a class for handling llvm-mca inputs,
> > llvm::mca::CodeRegionGenerator, which is just responsible for taking some input
> > and creating the list of MCInst that llvm-mca uses.
> > We could subclass this to handle profile reports.
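As a sketch of the command-line idea above, the C below parses a hypothetical `<start>:<length>` argument into an address range that a driver could then disassemble. The option is only proposed in this thread. The argument syntax and the `AddrRange`/`parse_range` names are assumptions for illustration.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical address range supplied on the command line. */
typedef struct {
  uint64_t start;   /* first byte to disassemble */
  uint64_t length;  /* number of bytes to disassemble */
} AddrRange;

/* Parses "<start>:<length>"; each field may be decimal or 0x-prefixed hex.
   Returns 1 on success, 0 on malformed input or a zero length. */
static int parse_range(const char *arg, AddrRange *out) {
  char *end;
  if (!isdigit((unsigned char)arg[0]))
    return 0; /* reject signs, whitespace, and empty input */
  errno = 0;
  unsigned long long start = strtoull(arg, &end, 0);
  if (errno != 0 || *end != ':')
    return 0;
  const char *len_str = end + 1;
  if (!isdigit((unsigned char)len_str[0]))
    return 0;
  unsigned long long length = strtoull(len_str, &end, 0);
  if (errno != 0 || *end != '\0' || length == 0)
    return 0;
  out->start = (uint64_t)start;
  out->length = (uint64_t)length;
  return 1;
}
```

With base 0, `strtoull` accepts both decimal and 0x-prefixed hex. It also treats a leading 0 as octal, which a real option might want to reject.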
> >
> > > Similarly, being able to generally embed the profile markers in object
> > > libraries for reuse is going to be important for some people - I'd like
> > > to see more of a plan of how this will be achieved. I understand that it
> > > might not be easy for some exe formats.
> >
> > That is definitely a limitation.  This initial patch+RFC only handles linked
> > executables (i.e., the llvm-mca marker symbol addresses are resolved).
> > I'm working on a better solution so that this will not be a restriction.
> > In fact, I'll probably delay trying to land any patches until I solve relocations
> > (or use a different solution for identifying start/end addresses for llvm-mca code
> > regions).
> >
> > > Sorry if I'm being too critical, but I'm a bit worried that we end up
> > > with an initial implementation that will take a lot of reworking to meet
> > > our final aims.
> > >
> > > Thanks, Simon.
> >
> > I understand your criticisms and value your input. Thanks a ton!
> >
> > -Matt
> >
> >
> > > On 10/12/2018 19:32, Matt Davis wrote:
> > > > Thanks for the feedback Guillaume and Clement!
> > > >
> > > > In response to Clement:
> > > >
> > > >>> In terms of future-proofness of only allowing regions within a basic
> > > >>> block, are we confident we can actually ever simulate branches apart from
> > > >>> "always taken, perfectly predicated" loop ? Even this simple need requires
> > > >>> knowing quite a few details on the frontend. The current design could
> > > >>> handle this use case with the addition of an external "loop mode" option to
> > > >>> MCA. If there are no other strong use cases, I would advocate for
> > > >>> experimental intrinsics unless people can contribute other example use
> > > >>> cases.
> > > > In short, I am in agreement and think that handling of branching or loop
> > > > constructs should be isolated to the llvm-mca driver/front-end.  The
> > > > only thing the code regions should be concerned with is identifying
> > > > blocks of instructions that will later be used by the front end.
> > > >
> > > > We can place limitations on how those blocks are formed. For example, the
> > > > current implementation forces regions to be isolated to a single basic
> > > > block.  However, we anticipate lifting this restriction once branching
> > > > is handled.
> > > >
> > > > -Matt
> > > >
> > > >
> > > > On Mon, Dec 10, 2018 at 04:15:46PM +0100, Guillaume Chatelet wrote:
> > > >> +1 to what Clement said.
> > > >> I believe the intrinsics are a better design to support many architectures.
> > > >>
> > > >> IACA users are probably decorating their code with IACA_START / IACA_END
> > > >> macros. One possibility is to provide a header that defines these macros in
> > > >> terms of the new intrinsics.
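A minimal sketch of such a header, assuming the `__mca_code_region_start`/`__mca_code_region_end` builtins proposed in this RFC (they are not part of released clang). IACA markers carry no ID, so every region gets the arbitrary ID 0 here. When the builtins are unavailable, the macros fall back to no-ops so the code still compiles. The header name and the fallback behavior are assumptions.

```c
/* iaca_compat.h: hypothetical shim mapping IACA markers onto the proposed
   llvm-mca code-region builtins. */
#ifndef IACA_COMPAT_H
#define IACA_COMPAT_H

#if defined(__has_builtin)
#  if __has_builtin(__mca_code_region_start) && __has_builtin(__mca_code_region_end)
#    define IACA_COMPAT_HAVE_MCA 1
#  endif
#endif

/* Each macro expands to a complete statement, so both "IACA_START" and
   "IACA_START;" are accepted. */
#ifdef IACA_COMPAT_HAVE_MCA
#  define IACA_START __mca_code_region_start(0); /* IACA has no region IDs */
#  define IACA_END __mca_code_region_end();
#else
#  define IACA_START ((void)0); /* no-op fallback */
#  define IACA_END ((void)0);
#endif

#endif /* IACA_COMPAT_H */

/* Usage, mirroring the RFC's example.c. */
static double mul_acc(double x, double y) {
  double result = 0.0;
  IACA_START
  result += x * y;
  IACA_END
  return result;
}
```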
> > > >>
> > > >> On Mon, Dec 10, 2018 at 3:59 PM Clement Courbet <courbet at google.com> wrote:
> > > >>
> > > >>> Hi Matt/Andrea,
> > > >>>
> > > >>> I see pros and cons for IACA-style markers vs intrinsics.
> > > >>> On the one hand, IACA-style markers are very magical, and not very visible
> > > >>> in either the source or the object code. Using IACA-style markers has the
> > > >>> advantage that you can use llvm-mca as a drop-in replacement for IACA, or
> > > >>> even to compare their outputs on the exact same binary. They also do not
> > > >>> require tooling on the compiler side and allow comparing the output of
> > > >>> several compilers.
> > > >>>
> > > >>> On the other hand, IACA-style markers do not have an equivalent on other
> > > >>> architectures, and I'm not sure inventing new ones is a good idea :) I
> > > >>> think the latter makes them pretty much a no-go for llvm-mca as I don't
> > > >>> think we'll want to teach each target how to parse code regions. That's
> > > >>> much better handled in a target-agnostic way by the object. Intel got away
> > > >>> with them because they only had to support one architecture.
> > > >>>
> > > >>> tl;dr: In the case of llvm-mca, I like your design better than the markers.
> > > >>>
> > > >>> In terms of future-proofness of only allowing regions within a basic
> > > >>> block, are we confident we can actually ever simulate branches apart from
> > > >>> "always taken, perfectly predicted" loop? Even this simple need requires
> > > >>> knowing quite a few details on the frontend. The current design could
> > > >>> handle this use case with the addition of an external "loop mode" option to
> > > >>> MCA. If there are no other strong use cases, I would advocate for
> > > >>> experimental intrinsics unless people can contribute other example use
> > > >>> cases.
> > > >>>
> > > >>> On Mon, Dec 3, 2018 at 11:38 PM Matt Davis <matthew.davis at sony.com> wrote:
> > > >>>
> > > >>>> Hi Andrea,
> > > >>>>
> > > >>>> On Mon, Dec 03, 2018 at 01:21:33PM +0000, Andrea Di Biagio wrote:
> > > >>>>> So, I have been thinking a bit more about this whole design.
> > > >>>>>
> > > >>>>> The more I think about your suggested design, the more I am convinced that
> > > >>>>> we should do something more to support ranges in binary object files too.
> > > >>>>> My understanding is that the reason why we don't support object files in
> > > >>>>> general is the presence of relocations. That is because a
> > > >>>>> region start marker is effectively symbol-relative, and the symbol (a
> > > >>>>> function) would be relocated in the final executable.
> > > >>>>> You mentioned to me that resolving even a 'simple' symbol-relative
> > > >>>>> relocation is not trivial, because it requires specific knowledge about the
> > > >>>>> binary format and the target (i.e. how relocations are encoded is target
> > > >>>>> specific). I am surprised that there is no utility library for resolving
> > > >>>>> relocations... but I am not familiar with that part of the compiler. I was
> > > >>>>> hoping that there was a target-specific interface to use in this case...
> > > >>>> There might be a better way of resolving the relocs, but from what I saw
> > > >>>> looking at llvm-objdump and other related tools, it seems that resolving
> > > >>>> the relocated symbol is a target-specific effort.  I also spent some time
> > > >>>> sniffing around ExecutionEngine/RuntimeDyld/RuntimeDyld.cpp, which also
> > > >>>> performs the reloc resolution.  I should clarify that I too am not an
> > > >>>> expert in llvm's utilities for performing symbol/reloc resolution, and
> > > >>>> perhaps someone in the community can point me in the right direction.  I
> > > >>>> can clearly see the reloc data in the object file via tools like
> > > >>>> objdump; however, accessing the relocs via
> > > >>>> llvm::object::ObjectFile::relocations() did not produce address values
> > > >>>> that we could use (values of zero).
> > > >>>>
> > > >>>> I was hoping that, for a first pass at this patch, supporting just
> > > >>>> executables would be okay.  That keeps this initial patch set simple,
> > > >>>> and hopefully will encourage others to take a peek at it, since it's
> > > >>>> less daunting than what it might otherwise be.  Of course, there is the
> > > >>>> concern that this initial patch will lock us into a design that will be
> > > >>>> more complicated to unravel later.
> > > >>>>
> > > >>>>> An alternative approach would require that you define your own
> > > >>>>> "symbol-relative" reference. After all, ranges are just sequences of
> > > >>>>> instructions in a function. If a function symbol is described by the symbol
> > > >>>>> table, then you should be able to obtain its offset in the .text section.
> > > >>>>> So, you could potentially encode your own symbol+offset. However, the
> > > >>>>> linker would not be able to understand your "custom relocation", and
> > > >>>>> information about regions in the final ELF would be basically broken.
> > > >>>>> So, that would not be a solution...
> > > >>>>>
> > > >>>>> I honestly don't know what the best approach is in this case.
> > > >>>>> As a compromise, it would not be a bad idea to add the ability to specify
> > > >>>>> ranges from the command line. What do you think?
> > > >>>>> Still, from a user point of view, the idea that we don't support object
> > > >>>>> files in general sounds like a big limitation.
> > > >>>> I agree, only supporting executables is a limitation.  However, I'd
> > > >>>> like to land the base support now and add in the additional
> > > >>>> features/support after this large patch set lands.  But I can see
> > > >>>> where landing the whole thing entirely also makes sense.
> > > >>>>
> > > >>>>> About the new experimental intrinsics: those would definitely work well for
> > > >>>>> the simple case where instructions are from the same basic block.
> > > >>>>> However, some/most of the constraints that you plan to add will have to
> > > >>>>> change if in future we decide to allow ranges that potentially cross
> > > >>>>> multiple basic blocks. How will the rules/constraints on those new
> > > >>>>> intrinsics change? I just want to make sure that the suggested design is
> > > >>>>> future-proof.
> > > >>>> Since the llvm/clang parts of the code are just responsible for
> > > >>>> collecting where a range starts/ends, I hope that we can remove some
> > > >>>> of the baked-in constraints that are specified in IR/Verifier.cpp.
> > > >>>> As you pointed out earlier in this thread, we might want to
> > > >>>> introduce a dominance check if/when we lift the one-basic-block
> > > >>>> restriction.
> > > >>>>
> > > >>>> -Matt
> > > >>>>
> > > >>>>> -Andrea
> > > >>>>>
> > > >>>>> On Tue, Nov 27, 2018 at 5:08 PM Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote:
> > > >>>>>
> > > >>>>>> Thanks for clarifying it Matt.
> > > >>>>>>
> > > >>>>>> In general, I quite like your suggested design.
> > > >>>>>>
> > > >>>>>> My only concern is about the semantics of the two new intrinsics. Your
> > > >>>>>> design doesn't allow mca ranges to span multiple basic blocks. That
> > > >>>>>> constraint is acceptable for now, since llvm-mca doesn't know how to deal
> > > >>>>>> with control flow.
> > > >>>>>> However, I am a bit concerned about what might happen in future if we
> > > >>>>>> decide to let users specify code regions that span multiple basic
> > > >>>>>> blocks. Basically, I don't particularly like the idea of changing the
> > > >>>>>> semantics of an already existing intrinsic. A design that already accounts
> > > >>>>>> for that particular scenario/future work would be ideal. That being said,
> > > >>>>>> marking those new intrinsics as 'experimental' may be a good compromise
> > > >>>>>> (at least for now).
> > > >>>>>>
> > > >>>>>> So, I am quite happy overall with the direction of this RFC.
> > > >>>>>> However, I am interested to hear what other developers think about your
> > > >>>>>> suggested design.
> > > >>>>>>
> > > >>>>>>> This initial patch only targets ELF object files, and does not handle
> > > >>>>>>> relocatable addresses. Since the start of a code region is represented as
> > > >>>>>>> an assembly label, and referenced in the .mca_code_regions section, that
> > > >>>>>>> address is relocatable.
> > > >>>>>>
> > > >>>>>> This may be okay for now. However, it would be nice to remove that
> > > >>>>>> constraint in future and add support for generic object files.
> > > >>>>>>
> > > >>>>>> -Andrea
> > > >>>>>>
> > > >>>>>> On Thu, Nov 22, 2018 at 7:21 PM <Matthew.Davis at sony.com> wrote:
> > > >>>>>>
> > > >>>>>>> I want to clarify a few restrictions of llvm-mca code regions that this
> > > >>>>>>> RFC proposes:
> > > >>>>>>>
> > > >>>>>>> 1) All llvm-mca code regions must start with an
> > > >>>>>>> llvm.mca.code.region.start intrinsic and end with
> > > >>>>>>> an llvm.mca.code.region.end intrinsic.  This rule is enforced at the IR
> > > >>>>>>> level in the IR verifier.
> > > >>>>>>>
> > > >>>>>>> 2) llvm-mca code regions cannot nest.  This restriction implies that an
> > > >>>>>>> llvm.mca.code.region.start must be followed by an llvm.mca.code.region.end
> > > >>>>>>> intrinsic without any other llvm.mca start intrinsics between the two. The
> > > >>>>>>> current implementation in the patch enforces this restriction at the
> > > >>>>>>> IR level via the IR Verifier.
> > > >>>>>>>
> > > >>>>>>> 3) An llvm-mca code region cannot span multiple basic blocks.  llvm-mca
> > > >>>>>>> does not follow branches (yet).  Instead, a branch instruction is treated
> > > >>>>>>> by llvm-mca like any other instruction.
> > > >>>>>>> The current patch associated with this RFC does not enforce this
> > > >>>>>>> restriction.  I plan on updating the patch to enforce that a code region
> > > >>>>>>> can only belong to a single basic block.  This is a simple check, ensuring
> > > >>>>>>> that both the llvm.mca.code.region.start and accompanying end intrinsics
> > > >>>>>>> live in the same basic block. I imagine adding this check at the IR level,
> > > >>>>>>> where we also verify points 1 and 2 above.  That will keep the
> > > >>>>>>> code-region verification logic isolated to the IR verifier.  The start/end
> > > >>>>>>> intrinsics should not have any uses, so I'm not sure that they would be
> > > >>>>>>> moved/sunk on behalf of any other instruction.  In other words, I do not
> > > >>>>>>> imagine that a start and end would be split apart due to later MI
> > > >>>>>>> optimizations.  If I discover that such a case occurs, then I might add
> > > >>>>>>> the basic-block check prior to emitting the code region data to the object
> > > >>>>>>> file.  Once llvm-mca is updated to handle branches, we can remove this
> > > >>>>>>> constraint.
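A minimal sketch of the checks behind rules 1 to 3, in C. It assumes a simplified, hypothetical view in which each marker call is recorded with its kind and the index of its basic block, listed in program order. The real check would live in the IR verifier and walk LLVM IR instructions instead. `Marker`, `MarkKind`, and `verify_regions` are illustrative names.

```c
#include <stddef.h>

/* Hypothetical, simplified view of one marker call in a function. */
typedef enum { MARK_START, MARK_END } MarkKind;
typedef struct {
  MarkKind kind;
  int block; /* index of the containing basic block */
} Marker;

/* Returns 1 if the markers (in program order) satisfy rules 1-3, else 0. */
static int verify_regions(const Marker *m, size_t n) {
  int open = 0;        /* is a region currently open? */
  int open_block = -1; /* block of the currently open start marker */
  for (size_t i = 0; i < n; ++i) {
    if (m[i].kind == MARK_START) {
      if (open)
        return 0; /* start inside an open region: violates rule 2 */
      open = 1;
      open_block = m[i].block;
    } else {
      if (!open)
        return 0; /* end without a matching start: violates rule 1 */
      if (m[i].block != open_block)
        return 0; /* region spans basic blocks: violates rule 3 */
      open = 0;
    }
  }
  return !open; /* a start that is never closed violates rule 1 */
}
```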
> > > >>>>>>>
> > > >>>>>>> -Matt
> > > >>>>>>>
> > > >>>>>>>> -----Original Message-----
> > > >>>>>>>> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Matt Davis via llvm-dev
> > > >>>>>>>> Sent: Wednesday, November 21, 2018 8:47 AM
> > > >>>>>>>> To: Andrea Di Biagio <andrea.dibiagio at gmail.com>
> > > >>>>>>>> Cc: llvm-dev <llvm-dev at lists.llvm.org>; Di Biagio, Andrea
> > > >>>>>>>> <Andrea.Dibiagio at sony.com>; cfe-dev at lists.llvm.org
> > > >>>>>>>> Subject: Re: [llvm-dev] [RFC][llvm-mca] Adding binary support to llvm-mca.
> > > >>>>>>>> Hi Andrea,
> > > >>>>>>>>
> > > >>>>>>>> Thanks for your input.
> > > >>>>>>>>
> > > >>>>>>>> On Wed, Nov 21, 2018 at 12:43:52PM +0000, Andrea Di Biagio wrote:
> > > >>>>>>>> [... snip ...]
> > > >>>>>>>>> About the suggested design:
> > > >>>>>>>>> I like the idea of being able to identify code regions using a numeric
> > > >>>>>>>>> identifier.
> > > >>>>>>>>> However, what happens if a code region spans multiple basic blocks?
> > > >>>>>>>> The current patch does not take into consideration cases where the
> > > >>>>>>>> region start and end intrinsics are placed in different basic blocks.
> > > >>>>>>>> Such would be the case if a region is defined to span multiple blocks.
> > > >>>>>>>> This would be similar to the current case where a user places a
> > > >>>>>>>> #LLVM-MCA-BEGIN assembly comment in one block and an #LLVM-MCA-END in
> > > >>>>>>>> another.  However, as you point out below, if the user does this in the
> > > >>>>>>>> source code via intrinsics (just what this patch is proposing), then
> > > >>>>>>>> there is a chance that optimizations might change the layout of the
> > > >>>>>>>> instructions and confuse the ordering of the MCA intrinsics.
> > > >>>>>>>>
> > > >>>>>>>> Since MCA does not follow branches (MCA just treats a branch as it would
> > > >>>>>>>> a non-branching instruction), a user should be aware that defining MCA
> > > >>>>>>>> code regions that span multiple blocks might result in an unexpected
> > > >>>>>>>> analysis.  While we do not discourage this, such a case will probably
> > > >>>>>>>> not produce the result the user expects.
> > > >>>>>>>> We could introduce a warning, or automatically divide the regions so
> > > >>>>>>>> that a single region can only contain a single block.
> > > >>>>>>>>
> > > >>>>>>>>> My understanding is that code regions are not allowed to overlap. So, it
> > > >>>>>>>>> makes sense if `__mca_code_region_end()` doesn't take an ID as input.
> > > >>>>>>>>> However, what if `__mca_code_region_end()` ends in a different basic block?
> > > >>>>>>>>> `__mca_code_region_start()` has to always dominate
> > > >>>>>>>>> `__mca_code_region_end()`. This is trivial to verify when both calls are in
> > > >>>>>>>>> the same basic block; however, we need to make sure that the relationship is
> > > >>>>>>>>> still the same when the `end()` call is in a different basic block.
> > > >>>>>>>>> That would not be enough. I think we should also verify that
> > > >>>>>>>>> `__mca_code_region_end()` always post-dominates the call to
> > > >>>>>>>>> `__mca_code_region_start()`.
> > > >>>>>>>> In any case, this patch should probably check dominance of the
> > > >>>>>>>> intrinsics, even though MCA does not follow branches and MCA does
> > > >>>>>>>> not explicitly forbid a region from containing multiple blocks.
> > > >>>>>>>>
> > > >>>>>>>>> My question is: what happens with basic block reordering? We don't know the
> > > >>>>>>>>> layout of basic blocks until we reach code emission. How does it work for
> > > >>>>>>>>> regions that span multiple basic blocks? I think your RFC should
> > > >>>>>>>>> clarify this aspect.
> > > >>>>>>>>>
> > > >>>>>>>>> As a side note: at the moment, llvm-mca doesn't know how to deal with
> > > >>>>>>>>> branches. So, for simplicity we could force code regions to only contain
> > > >>>>>>>>> instructions from a single basic block.
> > > >>>>>>>>>
> > > >>>>>>>>> However, in future we may want to teach llvm-mca how to analyze branchy
> > > >>>>>>>>> code too. For example, we could introduce a simple control-flow analysis in
> > > >>>>>>>>> llvm-mca, and use external "branch trace" information (for example, a
> > > >>>>>>>>> perf trace generated by an external tool) to decorate branches with
> > > >>>>>>>>> branch probabilities (similarly to what we currently do in LLVM with PGO).
> > > >>>>>>>>> We could then use that knowledge to model branch prediction and simulate
> > > >>>>>>>>> what happens in the presence of multiple branches.
> > > >>>>>>>>>
> > > >>>>>>>>> So, the idea of having regions that potentially span multiple basic blocks
> > > >>>>>>>>> is not bad in general. However, I think you should better clarify what
> > > >>>>>>>>> the constraints are (at least, you should answer my questions from
> > > >>>>>>>>> before).
> > > >>>>>>>> I agree! Thanks for pointing that out.
> > > >>>>>>>>
> > > >>>>>>>>> If we decide to use those new intrinsics, then those should be experimental
> > > >>>>>>>>> (at least to start).
> > > >>>>>>>> Agreed.
> > > >>>>>>>>
> > > >>>>>>>> -Matt
> > > >>>>>>>>
> > > >>>>>>>>> On Thu, Nov 15, 2018 at 11:07 PM via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> > > >>>>>>>>>
> > > >>>>>>>>>> Introduction
> > > >>>>>>>>>> -----------------
> > > >>>>>>>>>> Currently llvm-mca only accepts assembly code as input. We would like to
> > > >>>>>>>>>> extend llvm-mca to support object files, allowing users to analyze the
> > > >>>>>>>>>> performance of binaries. The proposed changes (which involve both
> > > >>>>>>>>>> clang and llvm) optionally introduce an object file section, but this can
> > > >>>>>>>>>> be stripped out if desired.
> > > >>>>>>>>>>
> > > >>>>>>>>>> For the llvm-mca binary support feature to be useful, a user needs to tell
> > > >>>>>>>>>> llvm-mca which portions of their code they would like analyzed. Currently,
> > > >>>>>>>>>> this is accomplished via assembly comments. However, assembly comments are
> > > >>>>>>>>>> not preserved in object files, which motivated this RFC. For the proposed
> > > >>>>>>>>>> binary support, we need to introduce changes to clang and llvm to allow the
> > > >>>>>>>>>> user's object code to be recognized by llvm-mca:
> > > >>>>>>>>>>
> > > >>>>>>>>>> * We need a way for a user to identify a region/block of code they want
> > > >>>>>>>>>>     analyzed by llvm-mca.
> > > >>>>>>>>>> * We need the information defining the user's region of code to be
> > > >>>>>>>>>>     maintained in the object file so that llvm-mca can analyze the
> > > >>>>>>>>>>     desired region(s) from the object file.
> > > >>>>>>>>>>
> > > >>>>>>>>>> We define a "code region" as a subset of a user's program that is to be
> > > >>>>>>>>>> analyzed via llvm-mca. The sequence of instructions to be analyzed is
> > > >>>>>>>>>> represented as a pair <start, end>, where 'start' marks the beginning of
> > > >>>>>>>>>> the region in the user's source code and 'end' terminates the sequence. The
> > > >>>>>>>>>> instructions between 'start' and 'end' form the region that can be analyzed
> > > >>>>>>>>>> by llvm-mca at a later time.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Example
> > > >>>>>>>>>> -----------
> > > >>>>>>>>>> Before we go into the details of this proposed change, let's first look at
> > > >>>>>>>>>> a simple example:
> > > >>>>>>>>>>
> > > >>>>>>>>>> // example.c -- Analyze a dot-product expression.
> > > >>>>>>>>>> double test(double x, double y) {
> > > >>>>>>>>>>     double result = 0.0;
> > > >>>>>>>>>>     __mca_code_region_start(42);
> > > >>>>>>>>>>     result += x * y;
> > > >>>>>>>>>>     __mca_code_region_end();
> > > >>>>>>>>>>     return result;
> > > >>>>>>>>>> }
> > > >>>>>>>>>>
> > > >>>>>>>>>> In the example above, we have identified a code region, in this case a
> > > >>>>>>>>>> single dot-product expression. For the sake of brevity and simplicity,
> > > >>>>>>>>>> we've chosen a very simple example, but in reality a more complicated
> > > >>>>>>>>>> example could use multiple expressions. We have also denoted this region
> > > >>>>>>>>>> as number 42. That identifier is only for the user, and simplifies reading
> > > >>>>>>>>>> an llvm-mca analysis report later.
> > > >>>>>>>>>>
> > > >>>>>>>>>> When this code is compiled, the region markers (the mca_code_region
> > > >>>>>>>>>> markers) are transformed into assembly labels. While the markers are
> > > >>>>>>>>>> presented as function calls, in reality they are no-ops.
> > > >>>>>>>>>>
> > > >>>>>>>>>> test:
> > > >>>>>>>>>> pushq   %rbp
> > > >>>>>>>>>> movq    %rsp, %rbp
> > > >>>>>>>>>> movsd   %xmm0, -8(%rbp)
> > > >>>>>>>>>> movsd   %xmm1, -16(%rbp)
> > > >>>>>>>>>> .Lmca_code_region_start_0: # LLVM-MCA-START ID: 42
> > > >>>>>>>>>> xorps   %xmm0, %xmm0
> > > >>>>>>>>>> movsd   %xmm0, -24(%rbp)
> > > >>>>>>>>>> movsd   -8(%rbp), %xmm0
> > > >>>>>>>>>> mulsd   -16(%rbp), %xmm0
> > > >>>>>>>>>> addsd   -24(%rbp), %xmm0
> > > >>>>>>>>>> movsd   %xmm0, -24(%rbp)
> > > >>>>>>>>>> .Lmca_code_region_end_0: # LLVM-MCA-END ID: 42
> > > >>>>>>>>>> movsd   -24(%rbp), %xmm0
> > > >>>>>>>>>> popq    %rbp
> > > >>>>>>>>>> retq
> > > >>>>>>>>>> .section        .mca_code_regions,"",@progbits
> > > >>>>>>>>>> .quad   42
> > > >>>>>>>>>> .quad   .Lmca_code_region_start_0
> > > >>>>>>>>>> .quad   .Lmca_code_region_end_0-.Lmca_code_region_start_0
> > > >>>>>>>>>>
> > > >>>>>>>>>> The assembly has been trimmed to show the portions relevant to this RFC.
> > > >>>>>>>>>> Notice that the labels enclose the user's defined region, and that they
> > > >>>>>>>>>> preserve the user's arbitrary region identifier, the ever-so-important
> > > >>>>>>>>>> region 42. In the object file section .mca_code_regions, we have noted the
> > > >>>>>>>>>> user's region identifier (.quad 42), start address, and region size. A more
> > > >>>>>>>>>> complicated example can have multiple regions defined within a single
> > > >>>>>>>>>> .mca_code_regions section. This section can be read by llvm-mca, allowing
> > > >>>>>>>>>> llvm-mca to take object files as input instead of assembly source.
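A minimal sketch of how a reader could decode this section, in C. It assumes the layout shown above (three 8-byte little-endian fields per region, as on x86-64) and assumes the raw section bytes have already been extracted. `McaRegionEntry` and `decode_regions` are hypothetical names. In an unlinked .o the start field is typically only a placeholder to be filled in by a relocation, which is the limitation discussed elsewhere in this thread. The size field is a difference of two labels in the same section, so the assembler resolves it.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry of the proposed .mca_code_regions section (three .quad fields). */
typedef struct {
  uint64_t id;     /* user-specified region ID, e.g. 42 */
  uint64_t start;  /* address of .Lmca_code_region_start_N */
  uint64_t size;   /* end label minus start label, in bytes */
} McaRegionEntry;

/* Reads a little-endian 64-bit value regardless of host byte order or alignment. */
static uint64_t read_le64(const unsigned char *p) {
  uint64_t v = 0;
  for (int i = 7; i >= 0; --i)
    v = (v << 8) | p[i];
  return v;
}

/* Decodes up to `max` entries from raw section bytes and returns how many
   were decoded; trailing bytes that do not form a whole entry are ignored. */
static size_t decode_regions(const unsigned char *data, size_t len,
                             McaRegionEntry *out, size_t max) {
  size_t count = 0;
  for (size_t off = 0; off + 24 <= len && count < max; off += 24, ++count) {
    out[count].id = read_le64(data + off);
    out[count].start = read_le64(data + off + 8);
    out[count].size = read_le64(data + off + 16);
  }
  return count;
}
```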
> > > >>>>>>>>>>
> > > >>>>>>>>>> Details
> > > >>>>>>>>>> ---------
> > > >>>>>>>>>> We need a way for a user to identify a region/block of code
> > > >>>> they
> > > >>>>>>> want
> > > >>>>>>>>>> analyzed
> > > >>>>>>>>>> by llvm-mca. We solve this problem by introducing two
> > > >>>> intrinsics
> > > >>>>>>> that a
> > > >>>>>>>>>> user can
> > > >>>>>>>>>> specify, for identifying regions of code for analysis.
> > > >>>>>>>>>>
> > > >>>>>>>>>> The two intrinsics are llvm.mca.code.region.start and
> > > >>>>>>>>>> llvm.mca.code.region.end. A user can identify a code region by
> > > >>>>>>>>>> inserting the mca_code_region_start and mca_code_region_end markers.
> > > >>>>>>>>>> These are simply clang builtins, which are transformed into the
> > > >>>>>>>>>> aforementioned intrinsics during compilation. The code between the
> > > >>>>>>>>>> intrinsics is what we call a "code region", and it is meant to be
> > > >>>>>>>>>> easily identifiable by llvm-mca; any code between a start/end pair
> > > >>>>>>>>>> can be analyzed by llvm-mca at a later time. A user can define
> > > >>>>>>>>>> multiple non-overlapping code regions within their program.
> > > >>>>>>>>>>
> > > >>>>>>>>>> The llvm.mca.code.region.start intrinsic takes an integer constant as
> > > >>>>>>>>>> its only argument. This argument is implemented as a metadata i32 and
> > > >>>>>>>>>> is used only when generating llvm-mca reports; it lets a user more
> > > >>>>>>>>>> easily identify a specific code region. llvm.mca.code.region.end takes
> > > >>>>>>>>>> no arguments.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Since we disallow nesting of regions, the first 'end' intrinsic
> > > >>>>>>>>>> lexically following a 'start' intrinsic marks the end of that code
> > > >>>>>>>>>> region.
> > > >>>>>>>>>>
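
The no-nesting rule above amounts to a simple pairing pass over the markers in lexical order. A minimal sketch, assuming the markers have already been collected as ("start", id) / ("end", None) tuples; the function `pair_region_markers` and the decision to reject nested, unmatched, or unclosed markers with `ValueError` are illustrative assumptions, not behavior documented for D54603.

```python
from typing import List, Optional, Sequence, Tuple

# ("start", region_id) or ("end", None), in lexical order.
Marker = Tuple[str, Optional[int]]


def pair_region_markers(markers: Sequence[Marker]) -> List[Tuple[int, int, int]]:
    """Return (region_id, start_index, end_index) for each code region."""
    regions = []
    open_start: Optional[Tuple[int, int]] = None  # (region_id, marker index)
    for index, (kind, region_id) in enumerate(markers):
        if kind == "start":
            if open_start is not None:
                # A second 'start' before an 'end' would be a nested region.
                raise ValueError(f"nested region start at marker {index}")
            open_start = (region_id, index)
        elif kind == "end":
            if open_start is None:
                raise ValueError(f"'end' without a matching 'start' at marker {index}")
            # The first 'end' after a 'start' closes that region.
            regions.append((open_start[0], open_start[1], index))
            open_start = None
        else:
            raise ValueError(f"unknown marker kind {kind!r}")
    if open_start is not None:
        raise ValueError(f"region {open_start[0]} is never closed")
    return regions
```
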
> > > >>>>>>>>>> Now that we have a way to identify regions for analysis, we need a way
> > > >>>>>>>>>> to preserve that information so that it can be read at a later time.
> > > >>>>>>>>>> To accomplish this, we propose adding a new section (.mca_code_regions)
> > > >>>>>>>>>> to the object file generated by LLVM. During code generation, the
> > > >>>>>>>>>> start/end intrinsics described above are transformed into start/end
> > > >>>>>>>>>> labels in the assembly. When LLVM generates the object file from the
> > > >>>>>>>>>> user's code, these start/end labels yield a pair of values: the start
> > > >>>>>>>>>> address of the user's code region and its size. The size is the number
> > > >>>>>>>>>> of bytes between the start and end labels. Note that the labels are
> > > >>>>>>>>>> emitted during assembly printing. We hope that these labels have no
> > > >>>>>>>>>> influence on code generation or basic-block placement; however, the
> > > >>>>>>>>>> target assembler's strategy for handling labels is outside of our
> > > >>>>>>>>>> control.
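
The label-to-record step above can be sketched as a tiny text generator that reproduces the directives from the example listing. This only illustrates the data flow, assuming regions are numbered 0..N-1 in emission order; the real work happens in LLVM's C++ assembly printer, and `emit_mca_code_regions` is a hypothetical name.

```python
from typing import List


def emit_mca_code_regions(region_ids: List[int]) -> str:
    """Emit .mca_code_regions directives for regions numbered 0..N-1."""
    lines = ['.section .mca_code_regions,"",@progbits']
    for n, region_id in enumerate(region_ids):
        start = f".Lmca_code_region_start_{n}"
        end = f".Lmca_code_region_end_{n}"
        lines.append(f".quad {region_id}")    # user-supplied region ID
        lines.append(f".quad {start}")        # absolute start address
        lines.append(f".quad {end}-{start}")  # size in bytes: end minus start
    return "\n".join(lines)
```

The size entry is the difference of two labels in the same section, so the assembler can fold it to a constant; the start entry is an absolute address, which is why it needs a relocation in an unlinked object file.
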
> > > >>>>>>>>>>
> > > >>>>>>>>>> This proposed change affects the size of a binary, but only if the user
> > > >>>>>>>>>> calls the start/end builtins mentioned above. The additional
> > > >>>>>>>>>> .mca_code_regions section, which we expect to be very small (on the
> > > >>>>>>>>>> order of a few bytes; 24 bytes per region in the example above), can
> > > >>>>>>>>>> trivially be stripped by tools like 'strip' or 'objcopy'.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Implementation Status
> > > >>>>>>>>>> ------------------------------
> > > >>>>>>>>>> We currently have the proposed changes implemented at the URL posted
> > > >>>>>>>>>> below. This initial patch only targets ELF object files and does not
> > > >>>>>>>>>> handle relocatable addresses. Since the start of a code region is
> > > >>>>>>>>>> represented as an assembly label and referenced in the
> > > >>>>>>>>>> .mca_code_regions section, that address is relocatable. The value can
> > > >>>>>>>>>> be represented as a section-relative relocation (.text + addend), but
> > > >>>>>>>>>> we are not handling that case yet. Instead, the proposed changes only
> > > >>>>>>>>>> handle linked (executable) object files.
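
As a hedged sketch of what handling the relocatable case might involve, assume x86-64 ELF with RELA relocations of type R_X86_64_64, whose value is S + A (symbol value plus addend). In a relocatable .o the symbol's value is an offset within its section (0 for a section symbol such as .text), so the region start is the section-relative location S + A; after linking, adding the section's load address gives the absolute address. `RegionStartReloc` and `resolve_region_start` are hypothetical names, and reading the relocation entries themselves is left out.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class RegionStartReloc(NamedTuple):
    symbol_section: str  # section defining the relocation's symbol, e.g. ".text"
    symbol_value: int    # symbol value in the .o (section-relative; 0 for a section symbol)
    addend: int          # r_addend from the RELA entry


def resolve_region_start(
    reloc: RegionStartReloc,
    section_base: Optional[Dict[str, int]] = None,
) -> Tuple[str, int]:
    """Return (section name, location) for a region's start.

    Without section_base the location is relative to the start of the section
    (the unlinked .o case); with it, the location is an absolute address.
    """
    offset = reloc.symbol_value + reloc.addend  # S + A, relative to the section
    if section_base is None:
        return reloc.symbol_section, offset
    return reloc.symbol_section, section_base[reloc.symbol_section] + offset
```
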
> > > >>>>>>>>>>
> > > >>>>>>>>>> For purposes of review and to communicate the idea, the change is
> > > >>>>>>>>>> presented as a monolithic patch here:
> > > >>>>>>>>>>
> > > >>>>>>>>>> https://reviews.llvm.org/D54603
> > > >>>>>>>>>>
> > > >>>>>>>>>> If accepted, the patch will be split into three smaller patches:
> > > >>>>>>>>>> 1. The introduction of the builtins to clang.
> > > >>>>>>>>>> 2. The llvm portion (the added intrinsics).
> > > >>>>>>>>>> 3. The llvm-mca portion.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Thanks!
> > > >>>>>>>>>>
> > > >>>>>>>>>> -Matt
> > > >>>>>>>>>> _______________________________________________
> > > >>>>>>>>>> LLVM Developers mailing list
> > > >>>>>>>>>> llvm-dev at lists.llvm.org
> > > >>>>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > > >>>>>>>>>>