[llvm-dev] Metadata in LLVM back-end

Mon Aug 31 05:10:43 PDT 2020

Lorenzo Casalino via llvm-dev <llvm-dev at lists.llvm.org> writes:

>> If the annotated instruction doesn't have an output value (like a store
>> on machine architectures) you would use the chain output in SelectionDAG
>> but there's no analogue in the MachineInstr representation.

> The usage of intrinsics as wrappers for the instructions to be annotated
> is a really nice idea! However, it would require teaching almost all
> passes of the codegen pipeline to skip them (which, for instance, is
> already done for llvm.dbg.* intrinsics).

It's not free, certainly.

> Nonetheless, although I like the idea, without a strategy to track
> output-less MachineInstructions, it won't go really far :(

Agreed.  There are probably ways to hack it in, but true metadata would
be much better.

> Furthermore, after register allocation there is a non-negligible effort
> to properly annotate instructions which share the same output register...
>
> Concerning the usage of the live ranges to tie annotated instruction and
> intrinsic, I have some doubts:
>
>  1. After register allocation, since metadata intrinsics are skipped
>     (otherwise they would be involved in the register allocation
>     process, increasing the register pressure), the instruction stream
>     would contain both virtual and physical registers, and I am not
>     sure that is entirely OK.

They would have to participate in register allocation.  I think the only
downside would be an intrinsic that artificially extends the live range
of a value by using it past its true dead point, either because the use
really is the "last" one or because it fills a "hole" in the live range
that otherwise would exist (for example a use in one of the if-then-else
branches that would otherwise not exist).

If the intrinsics really shadow "real" instructions then it should be
possible to place them such that this is not an issue; for example, you
could place them immediately before the "real" instruction.

It's possible they could introduce extra spills and reloads, in that if
a value is spilled it would be reloaded before the intrinsic.  If the
intrinsic were placed immediately before the "real" instruction then the
reload would very likely be re-used for the "real" instruction so this
is probably not an issue in practice.
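
To make the placement argument concrete, here is a toy sketch.  It is
not LLVM code: ToyInstr, lastUse and insertMarker are hypothetical names,
and "live range end" is simplified to "index of the last reader" within
one straight-line block.  It only shows that a marker placed immediately
before the "real" instruction leaves the real instruction as the last
user, while a marker placed later becomes the last user itself.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy instruction: an opcode name plus the virtual registers it reads.
// (Hypothetical type, not an LLVM class.)
struct ToyInstr {
    std::string opcode;
    std::vector<int> uses;
};

// Index of the last instruction that reads `reg`, or -1 if none does.
// In this simplified model, that index is where the live range ends.
int lastUse(const std::vector<ToyInstr>& block, int reg) {
    int last = -1;
    for (std::size_t i = 0; i < block.size(); ++i)
        for (int u : block[i].uses)
            if (u == reg) last = static_cast<int>(i);
    return last;
}

// Return a copy of `block` with an "ANNOT" marker inserted at position
// `at`. The marker reads the same registers as block[target], which is
// how it shadows the "real" instruction.
std::vector<ToyInstr> insertMarker(std::vector<ToyInstr> block,
                                   std::size_t target, std::size_t at) {
    assert(target < block.size() && at <= block.size());
    ToyInstr marker{"ANNOT", block[target].uses};
    block.insert(block.begin() + static_cast<std::ptrdiff_t>(at), marker);
    return block;
}
```

With a block such as DEF; STORE(r1, r2); ADD(r3); RET, inserting the
marker at the STORE's own position keeps STORE as the last reader of r1,
whereas inserting it at the end of the block makes the marker the last
reader, which is the artificial extension described above.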

>  2. Is liveness information still available after register
>     allocation?  Assuming a positive answer, live intervals may be
>     split during register allocation, making the connection between
>     intrinsic and annotated instruction really difficult.

Intervals are available post-RA.  They still contain information about
defs so it is *possible* to track things back though the information
tends to degrade.

> An enumeration of the MachineInstructions that is preserved through
> the codegen passes would allow the creation of a 1:1 map between
> intrinsic and annotated instruction; unfortunately, there seems
> to be no such enumeration in LLVM (maybe SlotIndexes could
> be used in a creative way).

Yeah, SlotIndexes are what is used in the live ranges.
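
For illustration only, the kind of enumeration being asked for could
look like the toy sketch below.  This is not the SlotIndexes API and not
LLVM code; ToyBlock and its members are hypothetical.  The assumption is
that each instruction gets an id once, the id never changes, and
annotations live in a side table keyed by that id, so inserting other
instructions moves positions but not the mapping.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Toy instruction carrying an id that is assigned once and never changes.
// (Hypothetical type, not an LLVM class.)
struct NumberedInstr {
    std::uint64_t id;
    std::string opcode;
};

class ToyBlock {
public:
    // Insert an instruction at position `pos` and return its fresh id.
    std::uint64_t insertAt(std::size_t pos, const std::string& opcode) {
        assert(pos <= instrs_.size());
        const std::uint64_t id = nextId_++;
        instrs_.insert(instrs_.begin() + static_cast<std::ptrdiff_t>(pos),
                       NumberedInstr{id, opcode});
        return id;
    }

    // Attach an annotation to the instruction with the given id.
    void annotate(std::uint64_t id, const std::string& note) {
        notes_[id] = note;
    }

    // Annotation of the instruction currently at `pos`, or nullptr if none.
    const std::string* annotationAt(std::size_t pos) const {
        assert(pos < instrs_.size());
        const auto it = notes_.find(instrs_[pos].id);
        return it == notes_.end() ? nullptr : &it->second;
    }

    // Opcode of the instruction currently at `pos`.
    const std::string& opcodeAt(std::size_t pos) const {
        assert(pos < instrs_.size());
        return instrs_[pos].opcode;
    }

private:
    std::vector<NumberedInstr> instrs_;
    std::map<std::uint64_t, std::string> notes_;  // side table: id -> note
    std::uint64_t nextId_ = 0;
};
```

The hard part in a real pipeline is not the table but keeping the ids
alive when passes clone, fold or replace instructions, which this toy
does not attempt to model.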

> Sorry for the long delay!

No problem.  It's good to hash these things out and identify areas of
weakness that metadata could fill.

                   -David