[llvm-dev] Metadata in LLVM back-end
Lorenzo Casalino via llvm-dev
llvm-dev at lists.llvm.org
Thu Aug 6 07:47:20 PDT 2020
On 31/07/20 at 22:47, David Greene wrote:
> Thanks for keeping this going, Lorenzo.
> Lorenzo Casalino via llvm-dev <llvm-dev at lists.llvm.org> writes:
>>> The first questions need to be “what does it mean?”, “how does it
>>> work?”, and “what is it useful for?”. It is hard to evaluate a
>>> proposal without that.
>> Hi everyone,
>> - "What does it mean?": it means to preserve specific information,
>> represented as metadata assigned to instructions, from the IR level,
>> down to the codegen phases.
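For illustration, this is what an IR-level metadata attachment looks like (the `!secure` kind and the node contents below are hypothetical, chosen only for this example):

```llvm
; A store annotated with a hypothetical custom metadata kind !secure;
; the idea is to carry such annotations down into the codegen phases.
define void @write(i32 %v, i32* %p) {
entry:
  store i32 %v, i32* %p, !secure !0
  ret void
}

!0 = !{!"sensitive"}
```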
> An important part of the definition is "how late?" For my particular
> uses it would be right up until lowering of asm pseudo-instructions,
> even after regalloc and scheduling. I don't know whether someone might
> need metadata even later than that (at asm/obj emission time?) but if
> metadata is supported on Machine IR then it shouldn't be an issue.
"How late" is context-specific: even in my case, I required such
metadata to be preserved until pseudo-instruction expansion. Conservatively,
they could be preserved until the last pass of the codegen pipeline.
Regarding their employment in the later steps, I would not say they are not
required: since I worked on a specific topic of secure compilation, I do
not have the whole picture in mind; nonetheless, it would be possible to
make things work with the codegen first and later reason on future
developments.
> As with IR-level metadata, there should be no guarantee that metadata is
> preserved and that it's a best-effort thing. In other words, relying on
> metadata for correctness is probably not the thing to do.
Ok, I made a mistake stating that metadata should be *preserved*; what
I really meant is to preserve the *information* that such metadata carries.
>> - "How does it work?": metadata should be preserved during the several
>> back-end transformations; for instance, during the lowering phase,
>> DAGCombine performs several optimizations on the IR, potentially
>> combining several instructions. The new instruction should, then, be
>> assigned metadata obtained as a proper combination of the
>> original ones (e.g., a union of the metadata information).
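As a sketch of such a combination (the `!secure` kind and node contents are hypothetical; the combined operation carries the union of the operands' metadata):

```llvm
; Before a DAGCombine-like folding: two annotated instructions.
;   %t = trunc i32 %v to i16,  !secure !0
;   store i16 %t, i16* %p,     !secure !1
;
; After combining them into a single truncating store, the new
; operation would carry the union of the original annotations:
;   (truncstore %v -> %p),     !secure !2
;
; !0 = !{!"no-spill"}
; !1 = !{!"sensitive"}
; !2 = !{!"no-spill", !"sensitive"}   ; union of !0 and !1
```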
> I want to make it clear that this is expensive to do, in that the number
> of changes to the codegen pipeline is quite extensive and widespread. I
> know because I've done it*. :) It will help if there are utilities
> people can use to merge metadata during DAG transformation and the more
> we make such transfers and combinations "automatic" the easier it will
> be to preserve metadata.
> Once the mechanisms are there it also takes effort to keep them going.
> For example if a new DAG transformation is done people need to think
> about metadata. This is where "automatic" help makes a real difference.
> * By "it" I mean communicate information down to late phases of codegen.
> I don't have a "metadata in codegen" patch as such. I simply cobbled
> something together in our downstream fork that works for some very
> specific use-cases.
I know what you have been through, and I can only agree with you: for the
project I mentioned above, I had to perform several changes to the whole IR
lowering phase in order to correctly propagate high-level information;
it was not cheap and required a lot of effort.
>> It might be possible to have a dedicated data-structure for such
>> metadata info, and an instance of such structure assigned to each
>> instruction.
> I'm not entirely sure what you mean by this.
I was imagining a per-instruction data-structure collecting metadata info
related to that specific instruction, instead of having several metadata info
directly embedded in each instruction.
>> - "What is it useful for?": I think it is quite context-specific; but,
>> in general, it is useful when some "higher-level" information
>> (e.g., that can be discovered only before the back-end stage of the
>> compiler) is required in the back-end to perform "semantic"-related
>> transformations.
> That's my use-case. There's semantic information codegen would like to
> know but is really much more practical to discover at the LLVM IR level
> or even passed from the frontend. Much information is lost by the time
> codegen is hit and it's often impractical or impossible for codegen to
> derive it from first principles.
>> To give an (quite generic) example where such codegen metadata may be
>> useful: in the field of "secure compilation", preservation of security
>> properties during the compilation phases is essential; such properties
>> are specified in the high-level specifications of the program, and may
>> be expressed with IR metadata. The possibility to keep such IR
>> metadata in the codegen phases may allow preservation of properties
>> that may be invalidated by codegen phases.
> That's a great use-case. I do wonder about your use of "essential".
With *essential* I mean fundamental for satisfying a specific security
requirement of the target program.
> Is it needed for correctness? If so an intrinsics-based
> solution may be better.
Uhm... it might sound like a naive question, but what do you mean by
"correctness" in this context?
> My use-cases mostly revolve around communication with a proprietary
> frontend and thus aren't useful to the community, which is why I haven't
> pursued this with any great vigor before this.
> I do have uses that convey information from LLVM analyses but
> unfortunately I can't share them for now.
> All of my use-cases are related to optimization. No "metadata" is
> needed for correctness.
> I have pondered whether intrinsics might work for my use-cases. My fear
> with intrinsics is that they will interfere with other codegen analyses
> and transformations. For example they could be a scheduling barrier.
> I also have wondered about how intrinsics work within SelectionDAG. Do
> they impact dagcombine and other transformations? The reason I call out
> SelectionDAG specifically is that most of our downstream changes related
> to conveying information are in DAG-related files (dagcombine, legalize,
> etc.). Perhaps intrinsics could suffice for the purposes of getting
> metadata through SelectionDAG with conversion to "first-class" metadata
> at the Machine IR level. Maybe this is even an intermediate step toward
> "full metadata" throughout the compilation.
I employed intrinsics as a means for carrying metadata but, in my
experience, I am not sure they can be considered a valid alternative:
- For each llvm-ir instruction employed in my project (e.g., store), an
equivalent intrinsic is declared, with particular parameters representing
metadata (i.e., first-class metadata are represented by specific
intrinsic arguments).
- During the lowering, each ad-hoc intrinsic must be properly handled,
adding the proper legalization operations, DAG combinations and so on.
- During MIR conversion of the llvm-ir (i.e., mapping intrinsics to
MIR instructions), the metadata are passed to the MIR representation of
the program.
In particular, the second point raises a critical problem in terms of
missed optimizations (e.g., an intrinsic store + an intrinsic trunc are not
automatically converted into an intrinsic truncated store). The backend must
then be instructed to perform such optimizations, which are actually already
performed on non-intrinsic instructions
(e.g., store + trunc is already converted into a truncated store).
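For concreteness, the intrinsic-based encoding sketched above could look like the following (the intrinsic name and signature are invented for this example; they are not part of LLVM):

```llvm
; Hypothetical ad-hoc intrinsic mirroring a plain store, with an extra
; metadata argument carrying the high-level information.
declare void @llvm.annotated.store.i32(i32, i32*, metadata)

define void @write(i32 %v, i32* %p) {
entry:
  ; Unlike a plain `store`, this call is opaque to DAGCombine: e.g., a
  ; preceding trunc will not be folded into a truncating store unless
  ; the backend is explicitly taught to do so.
  call void @llvm.annotated.store.i32(i32 %v, i32* %p,
                                      metadata !"sensitive")
  ret void
}
```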
Instead of re-inventing the wheel, and since the backend should be
modified anyway in order to support optimizations on intrinsics, I would
rather insert some sort of mechanism to support metadata attachment as a
first-class part of the IR/MIR, with automatic merging of metadata, for
instance.
I may be wrong (in such case, please, correct me), but if I got it
correctly, source-level debugging metadata are "external" (i.e., not a
first-class part of the llvm-ir), and their management involves a great
effort.
As described above, in my project I used metadata as first-class elements
of the IR/MIR; I found this approach more immediate and simpler to handle,
although some passes and transformations must be modified.
Then, I agree with you that metadata info should be first-class elements of
the IR/MIR (or, at least, "packed" into a structure being a first-class
part of the IR/MIR).
In any case, I wonder if metadata at the codegen level is actually something
the whole community would benefit from (thus justifying a potentially huge
and/or long series of patches), or something in which only a small group
would be interested.