[cfe-dev] [llvm-dev] [RFC][llvm-mca] Adding binary support to llvm-mca.

Mon Dec 17 10:12:23 PST 2018

Hi Matt,

could you please update the associated patch too?

Thanks
-Andrea

On Mon, Dec 17, 2018 at 5:48 PM via llvm-dev <llvm-dev at lists.llvm.org>
wrote:

> Adding the llvm-dev list, because my email client decides to remove
> certain lists when I reply-all... including the list that I intend to
> respond to.
>
> > -----Original Message-----
> > From: Davis, Matthew <Matthew.Davis at sony.com>
> > Sent: Monday, December 17, 2018 9:47 AM
> > To: Davis, Matthew <Matthew.Davis at sony.com>; llvm-dev at redking.me.uk
> > Cc: cfe-dev at lists.llvm.org
> > Subject: RE: [llvm-dev] [RFC][llvm-mca] Adding binary support to
> llvm-mca.
> >
> > Just an update to this RFC.  (I thought this was going to be a short
> > email... my apologies).
> >
> > One of the primary limitations described earlier in this thread is that
> > the patch for this RFC only handles linked executables.  I imposed this
> > restriction because I wanted to avoid handling relocations in a
> > target-specific manner, at least in the initial patch, and to keep that
> > patch simple.  However, sometimes simple and practical are at odds with
> > each other.  I envision that the main use case of llvm-mca with binary
> > support will be analyzing .o files, probably more commonly than fully
> > linked executables.  Without handling relocatable objects, this patch
> > seems rather useless.
> >
> > This weekend I started exploring an alternative solution to the
> > aforementioned problem.  It avoids having to handle relocations, yet
> > supports both object files (with relocated symbols) and linked
> > executables.  The change is quite simple and seems to be effective.  In
> > short, we still generate intrinsics as discussed in the RFC, one to mark
> > the start of a code region and another to mark the end.  These
> > intrinsics get lowered into local symbols, which already carry address
> > information about their position in the object file.  What is different
> > is that we ensure that these symbols have unique names that also encode
> > the user-provided ID value.
> >
> > Previously the labels were named .Lmca_code_region_start_<number>, and
> > similarly for .Lmca_code_region_end.  The user ID number and region size
> > were encoded in the .mca_code_regions object file section, and mca never
> > looked at the symbol table.  But in reality we can calculate the region
> > size from the mca symbols in the symbol table, instead of relying on the
> > information encoded in .mca_code_regions.  The alternative approach gets
> > rid of that section entirely but achieves the same functionality by
> > encoding the information in the symbol names: it just parses the symbol
> > table for MCA symbols, whose names carry the data we need.
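The size calculation described above can be sketched in a few lines of C. This is only an illustration, not code from the patch: `struct sym` and `mca_region_size` are hypothetical names, the symbol table is modeled as a plain array of name/value pairs, and the sketch assumes that a start symbol and its end symbol share the same text after the `start`/`end` part of the name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified symbol-table entry: a name and its value
   (an address, or a section-relative offset in a relocatable object). */
struct sym {
    const char *name;
    uint64_t value;
};

#define MCA_START_PREFIX ".Lmca_code_region_start"
#define MCA_END_PREFIX   ".Lmca_code_region_end"

/* If syms[start_idx] is a region-start marker and an end marker with the
   same name suffix exists, store the region size and return 1; else 0.
   Assumption: matching start/end symbols carry identical suffixes. */
int mca_region_size(const struct sym *syms, size_t n, size_t start_idx,
                    uint64_t *size) {
    const size_t start_len = strlen(MCA_START_PREFIX);
    const size_t end_len = strlen(MCA_END_PREFIX);
    const char *start_name = syms[start_idx].name;

    if (strncmp(start_name, MCA_START_PREFIX, start_len) != 0)
        return 0;
    const char *suffix = start_name + start_len;

    for (size_t i = 0; i < n; ++i) {
        const char *name = syms[i].name;
        if (strncmp(name, MCA_END_PREFIX, end_len) == 0 &&
            strcmp(name + end_len, suffix) == 0 &&
            syms[i].value >= syms[start_idx].value) {
            /* Both symbols live in the same section, so the difference
               is the region size even before relocation. */
            *size = syms[i].value - syms[start_idx].value;
            return 1;
        }
    }
    return 0;
}
```

Because only the difference of two values from the same section is used, the sketch never needs a resolved address, which is the property that lets the approach work on .o files.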
> >
> > The newly proposed name is formatted as
> > .Lmca_code_region_start.<id>.<function>.<number>, and similarly for
> > mca_code_region_end.  'id' is the user-specified ID (a purely cosmetic
> > value that lets users easily identify the code region under analysis),
> > 'function' is the function that the marker appears in, and 'number' is a
> > unique number to avoid duplicate names.  The benefit of this alternative
> > solution is that we can get rid of .mca_code_regions and gather all of
> > the information llvm-mca needs by parsing the symbol table for any
> > symbols with the 'mca_code_region_start' and 'mca_code_region_end'
> > format discussed above.  Of course, if the string table is stripped,
> > then we will lose this data.  The main drawback of this alternative
> > approach is that it relies on encoding data in symbol names and on
> > string processing of those names.  I'm somewhat biased against doing
> > string parsing, but the code to perform this is simple and small, and
> > more importantly it allows llvm-mca to handle linked or relocatable
> > object files.
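To give a feel for how small the string processing can be, here is a minimal sketch of decoding one marker name of the form .Lmca_code_region_{start,end}.<id>.<function>.<number>. The names `struct mca_marker` and `mca_parse_marker` are hypothetical, and the sketch assumes the ID is the text up to the first dot and the unique number is the text after the last dot, so that a function name may itself contain dots.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical decoded form of one marker symbol name. */
struct mca_marker {
    bool is_start;         /* true for ..._start, false for ..._end */
    unsigned long id;      /* user-specified region ID */
    char function[128];    /* enclosing function name */
    unsigned long number;  /* uniquifying counter */
};

/* Parse the decimal digits in [s, end); fails on empty or non-digit input.
   Overflow is not checked in this sketch. */
static bool parse_digits(const char *s, const char *end, unsigned long *out) {
    if (s == end)
        return false;
    unsigned long v = 0;
    for (; s < end; ++s) {
        if (*s < '0' || *s > '9')
            return false;
        v = v * 10 + (unsigned long)(*s - '0');
    }
    *out = v;
    return true;
}

/* Returns true and fills *m if name matches the marker format. */
bool mca_parse_marker(const char *name, struct mca_marker *m) {
    static const char start_prefix[] = ".Lmca_code_region_start.";
    static const char end_prefix[] = ".Lmca_code_region_end.";
    const char *p;

    if (strncmp(name, start_prefix, sizeof start_prefix - 1) == 0) {
        m->is_start = true;
        p = name + sizeof start_prefix - 1;
    } else if (strncmp(name, end_prefix, sizeof end_prefix - 1) == 0) {
        m->is_start = false;
        p = name + sizeof end_prefix - 1;
    } else {
        return false;
    }

    const char *first_dot = strchr(p, '.');  /* terminates <id> */
    const char *last_dot = strrchr(p, '.');  /* precedes <number> */
    if (first_dot == NULL || first_dot == last_dot)
        return false;
    if (!parse_digits(p, first_dot, &m->id))
        return false;
    if (!parse_digits(last_dot + 1, name + strlen(name), &m->number))
        return false;

    size_t flen = (size_t)(last_dot - first_dot - 1);
    if (flen == 0 || flen >= sizeof m->function)
        return false;
    memcpy(m->function, first_dot + 1, flen);
    m->function[flen] = '\0';
    return true;
}
```

Splitting on the first and last dot rather than on every dot is a design choice of this sketch; it keeps names such as foo.cold intact at the cost of requiring purely numeric ID and counter fields.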
> >
> > -Matt
> >
> >
> >
> > > -----Original Message-----
> > > From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of via
> llvm-dev
> > > Sent: Tuesday, December 11, 2018 11:23 AM
> > > To: llvm-dev at redking.me.uk; llvm-dev at lists.llvm.org
> > > Cc: cfe-dev at lists.llvm.org
> > > Subject: Re: [llvm-dev] [RFC][llvm-mca] Adding binary support to
> llvm-mca.
> > >
> > > Thanks for the response Simon.  My reply is inline:
> > >
> > > > From: Simon Pilgrim <llvm-dev at redking.me.uk>
> > > > Sent: Monday, December 10, 2018 1:40 PM
> > > >
> > > > Hi Matt,
> > > >
> > > > I can see a near future where perf-analysis tooling uses branch
> history
> > > > profiler captures to determine how often loops/branches are taken and
> > > > feeds that into llvm-mca, especially for hot/branchy loop analysis
> > > > reports etc. Are you confident that your approach will be easily
> > > > extendable for this?
> > >
> > > That is a very interesting use case.  The restriction of a code region
> > > to a single block is a limitation for any tools that want to analyze
> > > branches.  However, I believe that it will be easy to lift this
> > > restriction (it's just a check in IR/Verifier).  This limitation is not
> > > expressed in the llvm-mca driver.
> > >
> > > If the information is coming from a profile report, then we'd most
> > > likely need to extend the llvm-mca driver to accept profile reports.
> > > Currently, code regions, from the perspective of the llvm-mca driver,
> > > are very simple: they are just a collection of MCInst.  The binary
> > > support in this RFC+patch disassembles just the address range that
> > > begins at the marker start address and extends for some specified
> > > number of bytes.  It might be useful to add another driver argument so
> > > that a user (or tool) can specify, from the command line, a range of
> > > instructions to analyze.  I recently added a class for handling inputs,
> > > llvm::mca::CodeRegionGenerator, which is just responsible for taking
> > > some input and creating a list of MCInst that llvm-mca uses.  We could
> > > subclass this to handle profile reports.
> > >
> > > > Similarly, being able to generally embed the profile markers in
> object
> > > > libraries for reuse is going to be important for some people - I'd
> like
> > > > to see more of a plan of how this will be achieved. I understand
> that it
> > > > might not be easy for some exe formats.
> > >
> > > That is definitely a limitation.  This initial patch+RFC only handles
> > > linked executables (i.e., the llvm-mca marker symbol addresses are
> > > resolved).  I'm working on a better solution so that this will not be
> > > a restriction.  In fact, I'll probably delay trying to land any patches
> > > until I solve relocations (or use a different solution for identifying
> > > start/end addresses for llvm-mca code regions).
> > >
> > > > Sorry if I'm being too critical, but I'm a bit worried that we end up
> > > > with an initial implementation that will take a lot of reworking to
> meet
> > > > our final aims.
> > > >
> > > > Thanks, Simon.
> > >
> > > I understand your criticisms and value your input. Thanks a ton!
> > >
> > > -Matt
> > >
> > >
> > > > On 10/12/2018 19:32, Matt Davis wrote:
> > > > > Thanks for the feedback Guillaume and Clement!
> > > > >
> > > > > In response to Clement:
> > > > >
> > > > >>> In terms of future-proofness of only allowing regions within a
> basic
> > > > >>> block, are we confident we can actually ever simulate branches
> apart from
> > > > >>> "always taken, perfectly predicted" loop? Even this simple
> need requires
> > > > >>> knowing quite a few details on the frontend. The current design
> could
> > > > >>> handle this use case with the addition of an external "loop
> mode" option to
> > > > >>> MCA. If there are no other strong use cases, I would advocate for
> > > > >>> experimental intrinsics unless people can contribute other
> example use
> > > > >>> cases.
> > > > > In short, I am in agreement and think that handling of branching
> or loop
> > > > > constructs should be isolated to the llvm-mca driver/front-end.
> The
> > > > > only thing the code regions should be concerned with is identifying
> > > > > blocks of instructions that will later be used by the front end.
> > > > >
> > > > > We can place limitations to how those blocks are formed. For
> example the
> > > > > current implementation forces regions to be isolated to a single
> basic
> > > > > block.  However, we anticipate lifting this restriction once
> branching
> > > > > is handled.
> > > > >
> > > > > -Matt
> > > > >
> > > > >
> > > > > On Mon, Dec 10, 2018 at 04:15:46PM +0100, Guillaume Chatelet wrote:
> > > > >> +1 to what Clement said.
> > > > >> I believe the intrinsics are a better design to support many
> architectures.
> > > > >>
> > > > >> IACA users are probably decorating their code with IACA_START /
> IACA_END
> > > > >> macros. One possibility is to provide a header that define these
> macros in
> > > > >> terms of the new intrinsics.
> > > > >>
> > > > >> On Mon, Dec 10, 2018 at 3:59 PM Clement Courbet <
> courbet at google.com>
> > > > wrote:
> > > > >>
> > > > >>> Hi Matt/Andrea,
> > > > >>>
> > > > >>> I see pros and cons for IACA-style markers vs intrinsics.
> > > > >>> On the one hand, IACA-style markers are very magical, and not
> very visible
> > > > >>> in both the source and object code. Using IACA-style markers has
> the
> > > > >>> advantage that you can use llvm-mca as a drop-in replacement for
> IACA, or
> > > > >>> even to compare their outputs on the exact same binary. They
> also do not
> > > > >>> require tooling on the compiler side and allow comparing the
> output of
> > > > >>> several compilers.
> > > > >>>
> > > > >>> On the other hand, IACA-style markers do not have an equivalent
> > > > >>> on other architectures, and I'm not sure inventing new ones is a
> > > > >>> good idea :) I
> > > > >>> think the latter makes them pretty much a no-go for llvm-mca as
> I don't
> > > > >>> think we'll want to teach each target how to parse code regions.
> That's
> > > > >>> much better handled in a target-agnostic way by the object.
> Intel got away
> > > > >>> with them because they only had to support one architecture.
> > > > >>>
> > > > >>> tl;dr: In the case of llvm-mca, I like your design better than
> the markers.
> > > > >>>
> > > > >>> In terms of future-proofness of only allowing regions within a
> basic
> > > > >>> block, are we confident we can actually ever simulate branches
> apart from
> > > > >>> "always taken, perfectly predicted" loop? Even this simple
> need requires
> > > > >>> knowing quite a few details on the frontend. The current design
> could
> > > > >>> handle this use case with the addition of an external "loop
> mode" option to
> > > > >>> MCA. If there are no other strong use cases, I would advocate for
> > > > >>> experimental intrinsics unless people can contribute other
> example use
> > > > >>> cases.
> > > > >>>
> > > > >>> On Mon, Dec 3, 2018 at 11:38 PM Matt Davis <
> matthew.davis at sony.com>
> > > > wrote:
> > > > >>>
> > > > >>>> Hi Andrea,
> > > > >>>>
> > > > >>>> On Mon, Dec 03, 2018 at 01:21:33PM +0000, Andrea Di Biagio
> wrote:
> > > > >>>>> So, I have been thinking a bit more about this whole design.
> > > > >>>>>
> > > > >>>>> The more I think about your suggested design, the more I am
> convinced
> > > > >>>> that
> > > > >>>>> we should do something more to support ranges in binary object
> files
> > > > >>>> too.
> > > > >>>>> My understanding is that the reason why we don't support
> object files in
> > > > >>>>> general, is because of the presence of relocations. That is
> because a
> > > > >>>>> region start marker is effectively symbol relative, and the
> symbol (a
> > > > >>>>> function) would be relocated in the final executable.
> > > > >>>>> You mentioned to me that resolving even a 'simple'
> symbol-relative
> > > > >>>>> relocation is not trivial, because it requires specific
> knowledge about
> > > > >>>> the
> > > > >>>>> binary format, and the target (i.e. how relocations are
> encoded is
> > > > >>>> target
> > > > >>>>> specific). I am surprised that there is not a utility library
> for
> > > > >>>> resolving
> > > > >>>>> relocations.. but I am not familiar with that part of the
> compiler. I
> > > > >>>> was
> > > > >>>>> hoping that there was a target specific interface to use in
> this case...
> > > > >>>> There might be a better way of resolving the relocs, but from
> what I saw
> > > > >>>> looking at llvm-objdump and other related tools, it seems that
> resolving
> > > > >>>> the relocated symbol is a target specific effort.  I also spent
> > > > >>>> some time sniffing around
> > > > >>>> ExecutionEngine/RuntimeDyld/RuntimeDyld.cpp, which also
> > > > >>>> performs the reloc resolution.  I should clarify that I too am
> not an
> > > > >>>> expert in llvm's utilities for performing symbol/reloc
> resolution, and
> > > > >>>> perhaps someone in the community can point me in the right
> direction.  I
> > > > >>>> can clearly see the reloc data in the object file via tools like
> > > > >>>> objdump; however, accessing the relocs via
> > > > >>>> llvm::object::ObjectFile::relocations() did not produce address
> values
> > > > >>>> that we could use (values of zero).
> > > > >>>>
> > > > >>>> I was hoping that, for a first pass at this patch, supporting
> just
> > > > >>>> executables would be okay.  That keeps this initial patch set
> simple,
> > > > >>>> and hopefully will encourage others to take a peek at it, since
> it's
> > > > >>>> less daunting than what it might otherwise be.  Of course,
> there is the
> > > > >>>> concern that this initial patch will lock us into a design that
> will be
> > > > >>>> more complicated to unravel later.
> > > > >>>>
> > > > >>>>> An alternative approach would require that you define your own
> > > > >>>>> "symbol-relative" reference. After all, ranges are just a
> sequences of
> > > > >>>>> instructions in a function. If a function symbol is described
> by the
> > > > >>>> symbol
> > > > >>>>> table, then you should be able to obtain its offset in the
> .text
> > > > >>>> section.
> > > > >>>>> So, you could potentially encode your own symbol+offset.
> However, the
> > > > >>>>> linker would not be able to understand your "custom
> relocation", and
> > > > >>>>> information about regions in the final elf would be basically
> broken.
> > > > >>>>> So, that would not be a solution...
> > > > >>>>>
> > > > >>>>> I don't know honestly what is the best approach to use in this
> case.
> > > > >>>>> As a compromise, it would not be a bad idea to add the ability
> to
> > > > >>>> specify
> > > > >>>>> ranges from command line. What do you think?
> > > > >>>>> Still, from a user point of view, the idea that we don't
> support object
> > > > >>>>> files in general sounds like a big limitation.
> > > > >>>> I agree, only supporting executables is a limitation.  However,
> I'd
> > > > >>>> like to land the base support now and add in the additional
> > > > >>>> features/support after this large patch set lands.  But I can
> see
> > > > >>>> where landing the whole thing entirely also makes sense.
> > > > >>>>
> > > > >>>>> About the new experimental intrinsics: those would definitely
> work well
> > > > >>>> for
> > > > >>>>> the simple case where instructions are from the same basic
> block.
> > > > >>>>> However, some/most of the constraints that you plan to add
> will have to
> > > > >>>>> change if in future we decide to allow ranges that potentially
> cross
> > > > >>>>> multiple basic blocks. How will the rules/constraints on those
> new
> > > > >>>>> intrinsics change? I just want to make sure that the suggested
> design is
> > > > >>>>> future-proof.
> > > > >>>> Since the llvm/clang parts of the code are just responsible for
> > > > >>>> collecting where a range starts/ends, I hope that we can remove
> some
> > > > >>>> of the baked-in constraints that are specified in
> IR/Verifier.cpp.
> > > > >>>> As you pointed out earlier in this thread, we might want to
> > > > >>>> introduce a dominance check if/when we lift the one-basic-block
> > > > >>>> restriction.
> > > > >>>>
> > > > >>>> -Matt
> > > > >>>>
> > > > >>>>> -Andrea
> > > > >>>>>
> > > > >>>>> On Tue, Nov 27, 2018 at 5:08 PM Andrea Di Biagio <
> > > > >>>> andrea.dibiagio at gmail.com>
> > > > >>>>> wrote:
> > > > >>>>>
> > > > >>>>>> Thanks for clarifying it Matt.
> > > > >>>>>>
> > > > >>>>>> In general, I quite like your suggested design.
> > > > >>>>>>
> > > > >>>>>> My only concern is about the semantics of the two new
> > > > >>>>>> intrinsics. Your design doesn't allow mca ranges to span
> > > > >>>>>> multiple basic blocks. That constraint is acceptable for now,
> > > > >>>>>> since llvm-mca doesn't know how to deal with control flow.
> > > > >>>>>> However, I am a bit concerned about what might happen in
> > > > >>>>>> future if we decide to let users specify code regions that
> > > > >>>>>> span multiple basic blocks. Basically, I don't particularly
> > > > >>>>>> like the idea of changing the semantics of an already existing
> > > > >>>>>> intrinsic. A design that already accounts for that particular
> > > > >>>>>> scenario/future work would be ideal. That being said, marking
> > > > >>>>>> those new intrinsics as 'experimental' may be a good
> > > > >>>>>> compromise (at least for now).
> > > > >>>>>>
> > > > >>>>>> So, I am quite happy overall with the direction of this RFC.
> > > > >>>>>> However, I am interested to hear from other developers about
> > > > >>>>>> your suggested design.
> > > > >>>>>>
> > > > >>>>>>> This initial patch only targets ELF object files, and does
> not
> > > > >>>> handle
> > > > >>>>>> relocatable addresses. Since the start of a code region is
> > > > >>>> represented as
> > > > >>>>>> an
> > > > >>>>>> assembly label, and referenced in the .mca_code_regions
> section, that
> > > > >>>>>> address
> > > > >>>>>> is relocatable.
> > > > >>>>>>
> > > > >>>>>> This may be okay for now. However, it would be nice to remove
> that
> > > > >>>>>> constraint in future and add support to generic object files.
> > > > >>>>>>
> > > > >>>>>> -Andrea
> > > > >>>>>>
> > > > >>>>>> On Thu, Nov 22, 2018 at 7:21 PM <Matthew.Davis at sony.com>
> wrote:
> > > > >>>>>>
> > > > >>>>>>> I want to clarify a few restrictions of llvm-mca code
> regions that
> > > > >>>> this
> > > > >>>>>>> RFC proposes:
> > > > >>>>>>>
> > > > >>>>>>> 1) All llvm-mca code regions must start with an
> > > > >>>>>>> llvm.mca.code.region.start intrinsic and end with
> > > > >>>>>>> an llvm.mca.code.region.end intrinsic.  This rule is
> enforced at the
> > > > >>>> IR
> > > > >>>>>>> level in the IR verifier.
> > > > >>>>>>>
> > > > >>>>>>> 2) llvm-mca code regions cannot nest.  This restriction
> implies that
> > > > >>>> an
> > > > >>>>>>> llvm.mca.code.region.start
> > > > >>>>>>> must have a llvm.mca.code.region.end intrinsic without any
> other
> > > > >>>> llvm.mca
> > > > >>>>>>> start intrinsics
> > > > >>>>>>> between the two. The current implementation in the patch
> enforces
> > > > >>>> this
> > > > >>>>>>> restriction at the
> > > > >>>>>>> IR level via the IR Verifier.
> > > > >>>>>>>
> > > > >>>>>>> 3) An llvm-mca code region cannot span multiple basic blocks.
> > > > >>>> llvm-mca
> > > > >>>>>>> does not follow
> > > > >>>>>>> branches (yet).  Instead, a branch instruction is treated by
> llvm-mca
> > > > >>>>>>> like any other instruction.
> > > > >>>>>>> The current patch associated with this RFC does not enforce
> this
> > > > >>>>>>> restriction.  I plan on updating
> > > > >>>>>>> the patch to enforce that a code region can only belong to a
> single
> > > > >>>> basic
> > > > >>>>>>> block.  This is a simple
> > > > >>>>>>> check, ensuring that both the llvm.mca.code.region.start and
> > > > >>>> accompanying
> > > > >>>>>>> end intrinsics live
> > > > >>>>>>> in the same basic block. I imagine adding this check at the
> IR level
> > > > >>>> when
> > > > >>>>>>> we also verify points 1 and 2
> > > > >>>>>>> above.  That will keep the code-region verification logic
> isolated
> > > > >>>> to the
> > > > >>>>>>> IR verifier.  The start/end
> > > > >>>>>>> intrinsics should not have any uses, so I'm not sure that
> they would
> > > > >>>> be
> > > > >>>>>>> moved/sunk on behalf
> > > > >>>>>>> of any other instruction.  In other words, I do not imagine
> that a
> > > > >>>> start
> > > > >>>>>>> and end would be split
> > > > >>>>>>> apart due to later MI optimizations.  If I discover that
> such a case
> > > > >>>>>>> occurs, then I might add the
> > > > >>>>>>> basic-block check prior to emitting the code region data to
> the
> > > > >>>> object
> > > > >>>>>>> file.    Once  llvm-mca  is
> > > > >>>>>>> updated to handle branches, then we can remove this
> constraint.
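The three restrictions above can be modeled on a flat, program-ordered list of marker events. This is a hedged sketch only: it is not the IR verifier code from the patch, and `struct mca_event`, the event kinds, and `mca_check_regions` are hypothetical names standing in for the start/end intrinsic calls and the basic blocks that contain them.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, flattened view of the marker intrinsics in one function:
   each event records which marker was seen and the block holding it. */
enum mca_event_kind { EV_REGION_START, EV_REGION_END };

struct mca_event {
    enum mca_event_kind kind;
    int block;  /* illustrative basic-block number */
};

/* Returns 0 if the sequence obeys restrictions 1-3, otherwise the number
   of the first restriction found to be violated. */
int mca_check_regions(const struct mca_event *ev, size_t n) {
    bool open = false;   /* is a region currently open? */
    int open_block = 0;  /* block of the open region's start marker */

    for (size_t i = 0; i < n; ++i) {
        if (ev[i].kind == EV_REGION_START) {
            if (open)
                return 2;  /* start inside an open region: nesting */
            open = true;
            open_block = ev[i].block;
        } else {
            if (!open)
                return 1;  /* end without a preceding start */
            if (ev[i].block != open_block)
                return 3;  /* region spans more than one block */
            open = false;
        }
    }
    return open ? 1 : 0;   /* a start that is never closed */
}
```

If the single-block restriction is lifted later, only the check that returns 3 would go away in this model; a real implementation would presumably replace it with the dominance checks discussed elsewhere in this thread.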
> > > > >>>>>>>
> > > > >>>>>>> -Matt
> > > > >>>>>>>
> > > > >>>>>>>> -----Original Message-----
> > > > >>>>>>>> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf
> Of Matt
> > > > >>>>>>> Davis via llvm-
> > > > >>>>>>>> dev
> > > > >>>>>>>> Sent: Wednesday, November 21, 2018 8:47 AM
> > > > >>>>>>>> To: Andrea Di Biagio <andrea.dibiagio at gmail.com>
> > > > >>>>>>>> Cc: llvm-dev <llvm-dev at lists.llvm.org>; Di Biagio, Andrea
> > > > >>>>>>>> <Andrea.Dibiagio at sony.com>; cfe-dev at lists.llvm.org
> > > > >>>>>>>> Subject: Re: [llvm-dev] [RFC][llvm-mca] Adding binary
> support to
> > > > >>>>>>> llvm-mca.
> > > > >>>>>>>> Hi Andrea,
> > > > >>>>>>>>
> > > > >>>>>>>> Thanks for your input.
> > > > >>>>>>>>
> > > > >>>>>>>> On Wed, Nov 21, 2018 at 12:43:52PM +0000, Andrea Di Biagio
> wrote:
> > > > >>>>>>>> [... snip ...]
> > > > >>>>>>>>> About the suggested design:
> > > > >>>>>>>>> I like the idea of being able to identify code regions
> using a
> > > > >>>> numeric
> > > > >>>>>>>>> identifier.
> > > > >>>>>>>>> However, what happens if a code region spans through
> multiple
> > > > >>>> basic
> > > > >>>>>>> blocks?
> > > > > >>>>>>>> The current patch does not take into consideration cases
> > > > > >>>>>>>> where the region start and end intrinsics are placed in
> > > > > >>>>>>>> different basic blocks.  Such would be the case if a region
> > > > > >>>>>>>> is defined to span multiple blocks.  This would be similar
> > > > > >>>>>>>> to the current case where a user places a #LLVM-MCA-BEGIN
> > > > > >>>>>>>> assembly comment in one block and an #LLVM-MCA-END in
> > > > > >>>>>>>> another.  However, as you point out below, if the user does
> > > > > >>>>>>>> this in the source code via intrinsics (just what this
> > > > > >>>>>>>> patch is proposing), then there is a chance that
> > > > > >>>>>>>> optimizations might change the layout of the instructions
> > > > > >>>>>>>> and confuse the ordering of the MCA intrinsics.
> > > > >>>>>>>>
> > > > >>>>>>>> Since MCA does not follow branches (MCA just treats a
> branch as it
> > > > >>>> would
> > > > >>>>>>>> a non-branching instruction), it seems that a user should
> be aware
> > > > >>>> that
> > > > >>>>>>>> defining MCA code regions that span multiple blocks might
> result
> > > > >>>> in an
> > > > >>>>>>>> unexpected analysis.  While we do not discourage this, it
> seems
> > > > >>>> like
> > > > >>>>>>>> such a case will probably not produce an expected result
> for the
> > > > >>>> user.
> > > > >>>>>>>> We could introduce a warning, or automatically divide the
> regions
> > > > >>>> so
> > > > >>>>>>>> that a single region can only contain a single block.
> > > > >>>>>>>>
> > > > >>>>>>>>> My understanding is that code regions are not allowed to
> > > > >>>> overlap. So,
> > > > >>>>>>> it
> > > > >>>>>>>>> makes sense if ` __mca_code_region_end()` doesn't take an
> ID as
> > > > >>>> input.
> > > > >>>>>>>>> However, what if ` __mca_code_region_end()` ends in a
> different
> > > > >>>> basic
> > > > >>>>>>> block?
> > > > >>>>>>>>> `__mca_code_region_start()` has to always dominate `
> > > > >>>>>>>>> __mca_code_region_end()`. This is trivial to verify when
> both
> > > > >>>> calls
> > > > >>>>>>> are in
> > > > >>>>>>>>> a same basic block; however, we need to make sure that the
> > > > >>>>>>> relationship is
> > > > >>>>>>>>> still the same when the `end()` call is in a different
> basic
> > > > >>>> block.
> > > > >>>>>>>>> That would not be enough. I think we should also verify
> that `
> > > > >>>>>>>>> __mca_code_region_end()` always post-dominates the call to
> > > > >>>>>>>>> `__mca_code_region_start()`.
> > > > > >>>>>>>> In any case this patch should probably check dominance of
> > > > > >>>>>>>> the intrinsics, even though MCA does not follow branches
> > > > > >>>>>>>> and MCA does not explicitly forbid a region from containing
> > > > > >>>>>>>> multiple blocks.
> > > > >>>>>>>>
> > > > >>>>>>>>> My question is: what happens with basic block reordering?
> We
> > > > >>>> don't
> > > > >>>>>>> know the
> > > > >>>>>>>>> layout of basic blocks until we reach code emission. How
> does it
> > > > >>>> work
> > > > >>>>>>> for
> > > > >>>>>>>>> regions that span through multiple basic blocks?. I think
> your
> > > > >>>> RFC
> > > > >>>>>>> should
> > > > >>>>>>>>> clarify this aspect.
> > > > >>>>>>>>>
> > > > >>>>>>>>> As a side note: at the moment, llvm-mca doesn't know how
> to deal
> > > > >>>> with
> > > > >>>>>>>>> branches. So, for simplicity we could force code regions
> to only
> > > > >>>>>>> contain
> > > > >>>>>>>>> instructions from a single basic block.
> > > > >>>>>>>>>
> > > > > >>>>>>>>> However, in future we may want to teach llvm-mca how to
> > > > > >>>>>>>>> analyze branchy code too. For example, we could introduce
> > > > > >>>>>>>>> a simple control-flow analysis in llvm-mca, and use
> > > > > >>>>>>>>> external "branch trace" information (for example, a perf
> > > > > >>>>>>>>> trace generated by an external tool) to decorate branches
> > > > > >>>>>>>>> with branch probabilities (similarly to what we currently
> > > > > >>>>>>>>> do in LLVM with PGO). We could then use that knowledge to
> > > > > >>>>>>>>> model branch prediction and simulate what happens in the
> > > > > >>>>>>>>> presence of multiple branches.
> > > > >>>>>>>>>
> > > > >>>>>>>>> So, the idea of having regions that potentially span
> multiple
> > > > >>>> basic
> > > > >>>>>>> blocks
> > > > >>>>>>>>> is not bad in general. However, I think you should better
> clarify
> > > > >>>>>>> what are
> > > > >>>>>>>>> the constraints (at least, you should answer to my
> questions from
> > > > >>>>>>> before).
> > > > >>>>>>>> I agree! Thanks for pointing that out.
> > > > >>>>>>>>
> > > > >>>>>>>>> If we decide to use those new intrinsics, then those
> should be
> > > > >>>>>>> experimental
> > > > >>>>>>>>> (at least to start).
> > > > >>>>>>>> Agreed.
> > > > >>>>>>>>
> > > > >>>>>>>> -Matt
> > > > >>>>>>>>
> > > > >>>>>>>>> On Thu, Nov 15, 2018 at 11:07 PM via llvm-dev <
> > > > >>>>>>> llvm-dev at lists.llvm.org>
> > > > >>>>>>>>> wrote:
> > > > >>>>>>>>>
> > > > >>>>>>>>>> Introduction
> > > > >>>>>>>>>> -----------------
> > > > >>>>>>>>>> Currently llvm-mca only accepts assembly code as input. We
> > > > >>>> would
> > > > >>>>>>> like to
> > > > >>>>>>>>>> extend llvm-mca to support object files, allowing users to
> > > > >>>> analyze
> > > > >>>>>>> the
> > > > >>>>>>>>>> performance of binaries. The proposed changes (which
> involve
> > > > >>>> both
> > > > >>>>>>>>>> clang and llvm) optionally introduce an object file
> section,
> > > > >>>> but
> > > > >>>>>>> this can
> > > > >>>>>>>>>> be
> > > > >>>>>>>>>> stripped-out if desired.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> For the llvm-mca binary support feature to be useful, a
> user
> > > > >>>> needs
> > > > >>>>>>> to tell
> > > > >>>>>>>>>> llvm-mca which portions of their code they would like
> analyzed.
> > > > >>>>>>> Currently,
> > > > >>>>>>>>>> this is accomplished via assembly comments. However,
> assembly
> > > > >>>>>>> comments are
> > > > >>>>>>>>>> not
> > > > >>>>>>>>>> preserved in object files, and this has encouraged this
> RFC.
> > > > >>>> For the
> > > > >>>>>>>>>> proposed
> > > > >>>>>>>>>> binary support, we need to introduce changes to clang and
> llvm
> > > > >>>> to
> > > > >>>>>>> allow the
> > > > >>>>>>>>>> user's object code to be recognized by llvm-mca:
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> * We need a way for a user to identify a region/block of
> code
> > > > >>>> they
> > > > >>>>>>> want
> > > > >>>>>>>>>>     analyzed by llvm-mca.
> > > > >>>>>>>>>> * We need the information defining the user's region of
> code
> > > > >>>> to be
> > > > >>>>>>>>>> maintained
> > > > >>>>>>>>>>     in the object file so that llvm-mca can analyze the
> desired
> > > > >>>>>>> region(s)
> > > > >>>>>>>>>> from the
> > > > >>>>>>>>>>     object file.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> We define a "code region" as a subset of a user's program
> that
> > > > >>>> is
> > > > >>>>>>> to be
> > > > >>>>>>>>>> analyzed via llvm-mca. The sequence of instructions to be
> > > > >>>> analyzed
> > > > >>>>>>> is
> > > > >>>>>>>>>> represented as a pair: <start, end> where the 'start'
> marks the
> > > > >>>>>>> beginning
> > > > >>>>>>>>>> of
> > > > >>>>>>>>>> the user's source code and 'end' terminates the sequence.
> The
> > > > >>>>>>> instructions
> > > > >>>>>>>>>> between 'start' and 'end' form the region that can be
> analyzed
> > > > >>>> by
> > > > >>>>>>> llvm-mca
> > > > >>>>>>>>>> at a
> > > > >>>>>>>>>> later time.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Example
> > > > >>>>>>>>>> -----------
> > > > >>>>>>>>>> Before we go into the details of this proposed change,
> let's
> > > > >>>> first
> > > > >>>>>>> look at
> > > > >>>>>>>>>> a
> > > > >>>>>>>>>> simple example:
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> // example.c -- Analyze a dot-product expression.
> > > > >>>>>>>>>> double test(double x, double y) {
> > > > >>>>>>>>>>     double result = 0.0;
> > > > >>>>>>>>>>     __mca_code_region_start(42);
> > > > >>>>>>>>>>     result += x * y;
> > > > >>>>>>>>>>     __mca_code_region_end();
> > > > >>>>>>>>>>     return result;
> > > > >>>>>>>>>> }
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> In the example above, we have identified a code region,
> in this
> > > > >>>>>>> case a
> > > > >>>>>>>>>> single
> > > > >>>>>>>>>> dot-product expression. For the sake of brevity and
> simplicity,
> > > > >>>>>>> we've
> > > > >>>>>>>>>> chosen
> > > > >>>>>>>>>> a very simple example, but in reality a more complicated
> > > > >>>> example
> > > > >>>>>>> could use
> > > > >>>>>>>>>> multiple expressions. We have also denoted this region as
> > > > >>>> number
> > > > >>>>>>> 42. That
> > > > >>>>>>>>>> identifier is only for the user, and simplifies reading an
> > > > >>>> llvm-mca
> > > > >>>>>>>>>> analysis
> > > > >>>>>>>>>> report later.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> When this code is compiled, the region markers (the
> > > > >>>> mca_code_region
> > > > >>>>>>>>>> markers)
> > > > >>>>>>>>>> are transformed into assembly labels. While the markers
> are
> > > > >>>>>>> presented as
> > > > >>>>>>>>>> function calls, in reality they are no-ops.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> test:
> > > > >>>>>>>>>> pushq   %rbp
> > > > >>>>>>>>>> movq    %rsp, %rbp
> > > > >>>>>>>>>> movsd   %xmm0, -8(%rbp)
> > > > >>>>>>>>>> movsd   %xmm1, -16(%rbp)
> > > > >>>>>>>>>> .Lmca_code_region_start_0: # LLVM-MCA-START ID: 42
> > > > >>>>>>>>>> xorps   %xmm0, %xmm0
> > > > >>>>>>>>>> movsd   %xmm0, -24(%rbp)
> > > > >>>>>>>>>> movsd   -8(%rbp), %xmm0
> > > > >>>>>>>>>> mulsd   -16(%rbp), %xmm0
> > > > >>>>>>>>>> addsd   -24(%rbp), %xmm0
> > > > >>>>>>>>>> movsd   %xmm0, -24(%rbp)
> > > > >>>>>>>>>> .Lmca_code_region_end_0: # LLVM-MCA-END ID: 42
> > > > >>>>>>>>>> movsd   -24(%rbp), %xmm0
> > > > >>>>>>>>>> popq    %rbp
> > > > >>>>>>>>>> retq
> > > > >>>>>>>>>> .section        .mca_code_regions,"",@progbits
> > > > >>>>>>>>>> .quad   42
> > > > >>>>>>>>>> .quad   .Lmca_code_region_start_0
> > > > >>>>>>>>>> .quad   .Lmca_code_region_end_0-.Lmca_code_region_start_0
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> The assembly has been trimmed to show the portions
> > > > >>>>>>>>>> relevant to this RFC. Notice the labels enclose the
> > > > >>>>>>>>>> user's defined region, and that they preserve the
> > > > >>>>>>>>>> user's arbitrary region identifier, the
> > > > >>>>>>>>>> ever-so-important region 42. In the object file section
> > > > >>>>>>>>>> .mca_code_regions, we have noted the user's region
> > > > >>>>>>>>>> identifier (.quad 42), start address, and region size.
> > > > >>>>>>>>>> A more complicated example can have multiple regions
> > > > >>>>>>>>>> defined within a single .mca_code_regions section. This
> > > > >>>>>>>>>> section can be read by llvm-mca, allowing llvm-mca to
> > > > >>>>>>>>>> take object files as input instead of assembly source.
> > > > >>>>>>>>>>
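To make the record layout above concrete, here is a minimal sketch of how a reader might decode the raw bytes of .mca_code_regions. It is not the code from the patch: the mca_region struct and the mca_parse_regions function are hypothetical names, and the sketch assumes each record is exactly the three 64-bit values shown by the .quad directives (ID, start address, size), stored in the host's byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record layout, inferred from the three .quad directives:
 * region ID, start address, size in bytes. */
typedef struct {
    uint64_t id;
    uint64_t start;
    uint64_t size;
} mca_region;

/* Decode up to max_out records from the raw section contents.
 * Assumes the section's byte order matches the host's.
 * Returns the number of records decoded, or -1 if the section length
 * is not a whole number of 24-byte records. */
long mca_parse_regions(const unsigned char *data, size_t len,
                       mca_region *out, size_t max_out)
{
    const size_t rec = 3 * sizeof(uint64_t);
    if (len % rec != 0)
        return -1;

    size_t n = len / rec;
    if (n > max_out)
        n = max_out;

    for (size_t i = 0; i < n; ++i) {
        uint64_t fields[3];
        /* memcpy avoids unaligned reads from the section buffer. */
        memcpy(fields, data + i * rec, rec);
        out[i].id = fields[0];
        out[i].start = fields[1];
        out[i].size = fields[2];
    }
    return (long)n;
}
```

A caller would hand this the section contents obtained from whatever object-file reader is in use; a section holding two regions decodes to two mca_region entries, the first carrying the user's ID (42 in the example above).
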
> > > > >>>>>>>>>> Details
> > > > >>>>>>>>>> ---------
> > > > >>>>>>>>>> We need a way for a user to identify a region/block of
> > > > >>>>>>>>>> code they want analyzed by llvm-mca. We solve this
> > > > >>>>>>>>>> problem by introducing two intrinsics that a user can
> > > > >>>>>>>>>> specify, for identifying regions of code for analysis.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> The two intrinsics are: llvm.mca.code.region.start and
> > > > >>>>>>>>>> llvm.mca.code.region.end. A user can identify a code
> > > > >>>>>>>>>> region by inserting the mca_code_region_start and
> > > > >>>>>>>>>> mca_code_region_end markers. These are simply clang
> > > > >>>>>>>>>> builtins and are transformed into the aforementioned
> > > > >>>>>>>>>> intrinsics during compilation. The code between the
> > > > >>>>>>>>>> intrinsics is what we call a "code region" and is to be
> > > > >>>>>>>>>> easily identifiable by llvm-mca; any code between a
> > > > >>>>>>>>>> start/end pair can be analyzed by llvm-mca at a later
> > > > >>>>>>>>>> time. A user can define multiple non-overlapping code
> > > > >>>>>>>>>> regions within their program.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> The llvm.mca.code.region.start intrinsic takes an
> > > > >>>>>>>>>> integer constant as its only argument. This argument is
> > > > >>>>>>>>>> implemented as a metadata i32, and is only used when
> > > > >>>>>>>>>> generating llvm-mca reports. This value allows a user
> > > > >>>>>>>>>> to more easily identify a specific code region.
> > > > >>>>>>>>>> llvm.mca.code.region.end takes no arguments. Since we
> > > > >>>>>>>>>> disallow nesting of regions, the first 'end' intrinsic
> > > > >>>>>>>>>> lexically following a 'start' intrinsic represents the
> > > > >>>>>>>>>> end of that code region.
> > > > >>>>>>>>>>
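The pairing rule described above (no nesting, so each 'start' is closed by the first 'end' that follows it) can be sketched as a single pass over the markers in lexical order. This is only an illustration, not the patch's logic: mca_marker, mca_span, and mca_pair_markers are hypothetical names, and the sketch assumes the markers have already been collected together with their offsets.

```c
#include <stddef.h>

typedef enum { MCA_START, MCA_END } mca_marker_kind;

/* Hypothetical marker record: kind, user ID (start markers only),
 * and the marker's offset in the instruction stream. */
typedef struct {
    mca_marker_kind kind;
    unsigned id;
    size_t offset;
} mca_marker;

/* A paired region: the user's ID and the [begin, end) offsets. */
typedef struct {
    unsigned id;
    size_t begin;
    size_t end;
} mca_span;

/* Pair each start with the first end that follows it.
 * Returns the number of regions written to out, or -1 on a nested
 * start, a stray end, an unterminated region, or too little room. */
long mca_pair_markers(const mca_marker *m, size_t n,
                      mca_span *out, size_t max_out)
{
    long count = 0;
    int open = 0;
    mca_span cur = {0, 0, 0};

    for (size_t i = 0; i < n; ++i) {
        if (m[i].kind == MCA_START) {
            if (open)
                return -1;          /* nesting is disallowed */
            cur.id = m[i].id;
            cur.begin = m[i].offset;
            open = 1;
        } else {
            if (!open)
                return -1;          /* end without a start */
            if ((size_t)count >= max_out)
                return -1;          /* caller's buffer is full */
            cur.end = m[i].offset;
            out[count++] = cur;
            open = 0;
        }
    }
    return open ? -1 : count;       /* reject an unterminated region */
}
```

Because nesting is rejected outright, the pass needs only one "currently open" region rather than a stack, which is the practical payoff of the no-nesting restriction.
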
> > > > >>>>>>>>>> Now that we have a solution for identifying regions for
> > > > >>>>>>>>>> analysis, we need a way to preserve that information so
> > > > >>>>>>>>>> it can be read at a later time. To accomplish this we
> > > > >>>>>>>>>> propose adding a new section (.mca_code_regions) to the
> > > > >>>>>>>>>> object file generated by llvm. During code generation,
> > > > >>>>>>>>>> the start/end intrinsics described above will be
> > > > >>>>>>>>>> transformed into start/end labels in assembly. When
> > > > >>>>>>>>>> llvm generates the object file from the user's code,
> > > > >>>>>>>>>> these start/end labels form a pair of values
> > > > >>>>>>>>>> identifying the start of the user's code region and its
> > > > >>>>>>>>>> size. The size represents the number of bytes between
> > > > >>>>>>>>>> the start and end address of the labels. Note that the
> > > > >>>>>>>>>> labels are emitted during assembly printing. We hope
> > > > >>>>>>>>>> that these labels have no influence on code generation
> > > > >>>>>>>>>> or basic-block placement. However, the target assembler
> > > > >>>>>>>>>> strategy for handling labels is outside of our control.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> This proposed change affects the size of a binary, but
> > > > >>>>>>>>>> only if the user calls the start/end builtins mentioned
> > > > >>>>>>>>>> above. The additional size of the .mca_code_regions
> > > > >>>>>>>>>> section, which we imagine to be very small (on the
> > > > >>>>>>>>>> order of a few bytes), can trivially be stripped by
> > > > >>>>>>>>>> tools like 'strip' or 'objcopy'.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Implementation Status
> > > > >>>>>>>>>> ------------------------------
> > > > >>>>>>>>>> We currently have the proposed changes implemented at
> > > > >>>>>>>>>> the URL posted below. This initial patch only targets
> > > > >>>>>>>>>> ELF object files, and does not handle relocatable
> > > > >>>>>>>>>> addresses. Since the start of a code region is
> > > > >>>>>>>>>> represented as an assembly label, and referenced in the
> > > > >>>>>>>>>> .mca_code_regions section, that address is relocatable.
> > > > >>>>>>>>>> That value can be represented as a section-relative
> > > > >>>>>>>>>> relocatable symbol (.text + addend), but we are not
> > > > >>>>>>>>>> handling that case yet. Instead, the proposed changes
> > > > >>>>>>>>>> only handle linked/executable object files.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> For purposes of review and to communicate the idea, the
> > > > >>>>>>>>>> change is presented as a monolithic patch here:
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> https://reviews.llvm.org/D54603
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> The change is presented as a monolithic patch; however,
> > > > >>>>>>>>>> if accepted the patch will be split into three smaller
> > > > >>>>>>>>>> patches:
> > > > >>>>>>>>>> 1. The introduction of the builtins to clang.
> > > > >>>>>>>>>> 2. The llvm portion (the added intrinsics).
> > > > >>>>>>>>>> 3. The llvm-mca portion.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Thanks!
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> -Matt
> > > > >>>>>>>>>> _______________________________________________
> > > > >>>>>>>>>> LLVM Developers mailing list
> > > > >>>>>>>>>> llvm-dev at lists.llvm.org
> > > > >>>>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > > > >>>>>>>>>>
>