[llvm-dev] Metadata in LLVM back-end

Mon Aug 31 01:01:14 PDT 2020

On 19/08/20 at 22:37, David Greene wrote:
> Lorenzo Casalino via llvm-dev <llvm-dev at lists.llvm.org> writes:
>
>>>> I was imagining a per-instruction data-structure collecting metadata info
>>>> related to that specific instruction, instead of having several metadata info
>>>> directly embedded in each instruction.
>>> Interesting.  At the IR level metadata isn't necessarily unique, though
>>> it can be made so.  If multiple pieces of information were amalgamated
>>> into one structure that might reduce the ability to share the in-memory
>>> representation, which has a cost.
>>>
>> Uhm...could I ask you to elaborate a bit more on the "limitation on
>> in-memory representation sharing"? It is not clear to me how this
>> would cause a problem.
> I just mean that at the IR level, if you have a metadata node with, say,
> a string "foo bar" and another one with "foo" and put one on an
> instruction and the other on another instruction, they won't share an
> in-memory representation, whereas if you had separate nodes with "foo"
> and "bar" and put both on a single instruction and just "foo" on another
> instruction the "foo" metadata would be shared.
>
But isn't that an implementation detail? I mean, you can have metadata
nodes whose members are pointers; if two nodes have to share the same
member instance, they can simply share the same pointer.

After all, even when two instructions refer to a structurally equivalent
Constant object (https://llvm.org/doxygen/classllvm_1_1Constant.html#details),
they actually share the same pointer to the same Constant object.
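To make the argument concrete, here is a minimal sketch in plain C++. It deliberately avoids the real LLVM metadata API; MDLeaf, MDTuple and MDPool are hypothetical names. Leaf strings are uniqued in a pool (hash-consing). A composite node that holds pointers to "foo" and "bar" therefore shares the "foo" leaf with a node that holds only "foo". This is the same way structurally equal Constants end up as one object.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical leaf node: one immutable string, allocated once per distinct value.
struct MDLeaf {
  std::string value;
};

// Hypothetical composite (per-instruction) node: it owns nothing and only
// points at shared, uniqued leaves.
struct MDTuple {
  std::vector<const MDLeaf *> members;
};

// Hypothetical pool that uniques leaves by content.
class MDPool {
  std::map<std::string, std::unique_ptr<MDLeaf>> leaves_;

public:
  // Returns the unique leaf for `s`, creating it on first request.
  const MDLeaf *get(const std::string &s) {
    std::unique_ptr<MDLeaf> &slot = leaves_[s];
    if (!slot)
      slot = std::make_unique<MDLeaf>(MDLeaf{s});
    return slot.get();
  }

  // Number of distinct leaves actually allocated.
  std::size_t size() const { return leaves_.size(); }
};
```

For example, `MDTuple onA{{pool.get("foo"), pool.get("bar")}}` on one instruction and `MDTuple onB{{pool.get("foo")}}` on another leave only two leaves in the pool, and `onA.members[0] == onB.members[0]`. Amalgamating information per instruction therefore does not by itself prevent sharing, as long as the members are uniqued.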

> Pre-RA it's relatively easy as long as we're still in SSA.  The
> intrinsic would simply take the instruction it should annotate as an
> operand.  After SSA it obviously becomes more difficult.  I don't have a
> lot of good answers for that right now.  The live range for the value
> defined by the annotated instruction and used by the intrinsic would
> contain both instructions so maybe that could be used to connect them.
>
> If the annotated instruction doesn't have an output value (like a store
> on machine architectures) you would use the chain output in SelectionDAG
> but there's no analogue in the MachineInstr representation.
Using intrinsics as wrappers for the instructions to be annotated is a
really nice idea! However, it would require teaching almost every pass
of the codegen pipeline to skip them (as is already done for the
llvm.dbg.* intrinsics).
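The pre-RA SSA scheme can be sketched with a toy model. This is hypothetical C++, not llvm::MachineInstr. The annotation pseudo-instruction takes the annotated instruction as its operand, so the def-use edge is the link. Ordinary passes skip annotations, the same way they skip debug instructions. One caveat: in real SSA only instructions that define a value can appear as operands. That is precisely why output-less instructions such as stores fall outside this scheme.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, heavily simplified SSA instruction.
struct Instr {
  std::string opcode;
  std::vector<const Instr *> operands; // SSA uses: pointers to defining instrs
  bool isAnnotation = false;           // true for the metadata pseudo-intrinsic
  std::string note;                    // payload carried by an annotation
};

// Find the annotation attached to `target`: the annotation whose first
// operand is `target`. Returns nullptr if the instruction is unannotated.
const Instr *findAnnotation(const std::vector<const Instr *> &block,
                            const Instr *target) {
  for (const Instr *I : block)
    if (I->isAnnotation && !I->operands.empty() && I->operands[0] == target)
      return I;
  return nullptr;
}

// A stand-in for an ordinary codegen pass: it must ignore annotations so
// that they do not perturb its results (cf. skipping debug instructions).
std::size_t countRealInstrs(const std::vector<const Instr *> &block) {
  std::size_t n = 0;
  for (const Instr *I : block) {
    if (I->isAnnotation)
      continue; // skip metadata carriers
    ++n;
  }
  return n;
}
```

The burden of the design is visible even in this toy: every pass needs the `isAnnotation` check, or annotations would be counted, scheduled, or allocated like real instructions.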

Nonetheless, much as I like the idea, without a strategy for tracking
output-less MachineInstructions it won't get very far :(

Furthermore, after register allocation, correctly annotating instructions
that share the same output register takes a non-negligible effort...

Concerning the use of live ranges to tie the annotated instruction to its
intrinsic, I have some doubts:

 1. After register allocation, since the metadata intrinsics are skipped
    (otherwise they would take part in register allocation and increase
    register pressure), the instruction stream would contain both virtual
    and physical registers, and I am not sure that is entirely OK.

 2. Is liveness information still available after register allocation?
    Even if it is, live intervals may be split during register allocation,
    which makes connecting an intrinsic to its annotated instruction really
    difficult.

An enumeration of the MachineInstructions that is preserved through the
codegen passes would allow a 1:1 map between each intrinsic and its
annotated instruction; unfortunately, there does not seem to be such an
enumeration in LLVM (maybe SlotIndexes could be used in a creative way).
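Here is a sketch of what such an enumeration might look like. It is hypothetical C++: LLVM's MachineInstr has no stable-ID field, and SlotIndexes are not modeled here. Each instruction receives a never-reused ID at creation, and metadata lives in a side table keyed by that ID. Positions may shift when passes insert or reorder instructions, but the ID does not. The scheme also works for output-less instructions such as stores, because it needs no def-use edge.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <string>
#include <unordered_map>

// Hypothetical instruction carrying a stable ID.
struct MInstr {
  std::uint64_t id;
  std::string opcode;
};

// Hypothetical function body plus a side table of annotations.
class MFunction {
  std::uint64_t nextId_ = 0;

public:
  std::list<MInstr> instrs; // order may change; IDs never do
  std::unordered_map<std::uint64_t, std::string> annotations; // id -> metadata

  // Create an instruction before `pos` with a fresh, never-reused ID.
  std::list<MInstr>::iterator insert(std::list<MInstr>::iterator pos,
                                     const std::string &opcode) {
    return instrs.insert(pos, MInstr{nextId_++, opcode});
  }

  void annotate(const MInstr &mi, const std::string &note) {
    annotations[mi.id] = note;
  }

  // Returns the annotation for `mi`, or nullptr if there is none.
  const std::string *annotationOf(const MInstr &mi) const {
    auto it = annotations.find(mi.id);
    return it == annotations.end() ? nullptr : &it->second;
  }
};
```

In this toy, a later pass that inserts a spill in front of an annotated store changes the store's position but not its ID, so the annotation lookup still succeeds. The hard part in a real backend would be making every pass that clones or replaces instructions carry the ID over.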

Sorry for the long delay!

-- Lorenzo

>                    -David