[llvm-dev] Metadata in LLVM back-end

Tue Jun 15 16:32:46 PDT 2021

Did anyone send an RFC for this?

First-class metadata would be exceptionally useful for sanitizers and other
dynamic tools.  For example, we want to construct PC-keyed metadata tables in
the binary (without affecting the generated code), to inform program behavior
at runtime or to allow offline analysis.  A prerequisite is to actually
propagate the metadata we need from the Clang frontend or LLVM middle-end down
to the assembly printer.

Our team has brainstormed many use cases:

- *GWP-TSan* <https://youtu.be/2KvaKEyMVEU>:  storing PCs of accesses lowered
  from C++ atomics, to filter them out from race detection.
  *  List<atomic access PC>

- *Stack trace compression*:  storing a conservative call graph
  <https://lists.llvm.org/pipermail/llvm-dev/2021-June/151044.html>, for use in
  decompressing stack traces offline.
  *  Map[callsite PC] -> List<callee PC>

- *no_sanitize attributes*:  storing a map of functions that have the
  no_sanitize("...") attribute to the associated sanitizer, for filtering out
  from GWP-*San.  Ideally we do not introduce new no_sanitize string literals,
  but simply rely on existing ones (e.g. a no_sanitize("thread") works for both
  TSan and GWP-TSan).
  *  Map[Func] -> SanitizerKind

- *Fuzzing aid/CFG reconstruction*:  marking coverage PCs as function
  entry/exit, or recording the number of outgoing edges from each BB (which
  allows finding gaps in the coverage frontier).

- *Type-aware malloc and heap profiling*:  enable the allocator to get the type
  for a given new call, to optimize for expected usage of the allocation.
  *  Map[new callsite PC] -> object type

- *Other*:  potential use cases for future bug-finding tools (GWP-assert,
  GWP-MSan, GWP-DFSan, GWP-UBSan).
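
As a rough illustration of how a runtime might consume two of the tables
listed above, here is a minimal C++ sketch.  All type and function names in it
are hypothetical (nothing here is an existing LLVM or sanitizer API), and it
assumes the tables have already been emitted into the binary and loaded into
memory in some form.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical in-memory view of PC-keyed tables; not an LLVM data structure.
using PC = std::uintptr_t;

// List<atomic access PC>: kept sorted so membership is a binary search.
struct AtomicAccessTable {
  std::vector<PC> pcs;

  explicit AtomicAccessTable(std::vector<PC> sorted) : pcs(std::move(sorted)) {
    // The sketch assumes the emitter wrote the PCs in ascending order.
    assert(std::is_sorted(pcs.begin(), pcs.end()));
  }

  bool contains(PC pc) const {
    return std::binary_search(pcs.begin(), pcs.end(), pc);
  }
};

// Map[callsite PC] -> List<callee PC>: a conservative call graph.
struct CallGraphTable {
  std::map<PC, std::vector<PC>> callees;

  // Returns the possible callees of a callsite, or nullptr if it is unknown.
  const std::vector<PC> *lookup(PC callsite) const {
    auto it = callees.find(callsite);
    return it == callees.end() ? nullptr : &it->second;
  }
};

// A sampling race detector could drop reports whose access PC is known to
// come from a lowered C++ atomic.
bool shouldReportRace(const AtomicAccessTable &atomics, PC accessPC) {
  return !atomics.contains(accessPC);
}
```

A sorted vector is used for the plain PC list because it is compact and can be
searched without building any index at startup; the call graph uses std::map
only to keep the sketch short.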

First-class metadata would open the door to some really cool things.

Thanks,
Matt Morehouse

On Wed, Jan 6, 2021 at 5:56 AM Lorenzo Casalino via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Dear Tuan,
>
> How are you doing? Did you manage to start the draft for the RFC?
>
>
> I take this opportunity to wish you all the best for this new year :)
>
> Best regards,
> Lorenzo Casalino
> On 10/11/20 at 09:27, Lorenzo Casalino wrote:
>
>
> On 09/11/20 at 00:30, Son Tuan VU wrote:
>
> Hi,
>
> Thank you all for keeping this going. Indeed, I was not aware that the
> discussion was still going on; I am really sorry for this late reply.
>
> Nice to hear from you again! Thank you for starting this thread ;)
>
> I understand Chris' point about metadata design. Either the metadata
> becomes stale or removed (if we do not teach transformations to preserve
> it), or we end up modifying many (if not all) transformations to keep the
> data intact.
> Currently in the IR, I feel like the default behavior is to ignore/remove
> the metadata, and only a limited number of transformations know how to
> maintain and update it, which is a best-effort approach.
> That being said, my initial thought was to adopt this approach to the MIR,
> so that we can at least have a minimal mechanism to communicate additional
> information to various transformations, or even dump it to the asm/object
> file.
> In other words, it is the responsibility of the users who introduce/use
> the metadata in the MIR to teach the transformations they selected how to
> preserve their metadata. A common API to abstract this would definitely
> help, just as combineMetadata() from lib/Transforms/Utils/Local.cpp does.
>
> Unfortunately, I have never worked with LLVM IR metadata (I have focused
> almost exclusively on the back-end and have only scratched the surface of
> LLVM's middle-end), but I see your point.
>
> Clearly, applying the needed modifications to all the back-end
> transformations/optimizations is infeasible and, probably, not worth it --
> different users may have different requirements/needs regarding a specific
> pass.
>
> I like the idea of a common API to handle the MIR metadata, and of letting
> the end user handle such data. Of course, if the community encounters common
> cases while handling the metadata, such cases may be integrated into the
> upstream project.
>
> Nonetheless, the main point of this thread is to preserve middle-end
> metadata down to the back-end, right after the Instruction Selection phase.
> Hence, regardless of the end user's needs, a "preserve-all" policy during the
> lowering stage is required, which will involve some changes, in particular
> in the DAGCombine pass.
>
>
> As for my use case, it is also security-related. However, I do not
> consider the metadata to be a compilation "correctness" criterion: metadata,
> by definition (from the LLVM IR), can be safely removed without affecting
> the program's correctness.
> If possible, I would like to have more details on Lorenzo's use case in
> order to see how metadata would interfere with the program's correctness.
>
> I would really like to discuss the details here, but, unfortunately, I am
> working on a publication and, thus, I cannot disclose any detail here :(
>
> However, by "correctness" I do not mean "I/O correctness", but the
> preservation of a security property expressed in the front-end (e.g.,
> specified in the source code) or in the middle-end (e.g., specified in the
> LLVM IR, for instance by a transformation pass).
>
> From a security point of view, removing or altering metadata does not
> interfere with the I/O functionality of the code (although it may impact
> performance), but it may introduce vulnerabilities.
>
> As for the RFC, I can definitely try to write one, but this would be my
> first time doing so. But maybe it is better to start with Lorenzo's
> proposal, as you have already been working on this? Please tell me if you
> prefer me to start the RFC though.
>
> It is the first time for me too, do not worry!
>
> We could just use any other RFC as a template to get started :D
>
> I think that a structure like the following would be fine:
>
>   1. Background
>      1.1 Motivation
>      1.2 Use-cases
>      1.3 Other approaches
>   2. Goal(s)
>   3. Requirements
>   4. Drawbacks and main bottlenecks
>   5. Design sketch
>   6. Roadmap sketch
>   7. Potential future development
>
> It may be a bit overkill; you are warmly invited to cut/refine these
> points!
>
> And...no, I still have no sketch of the RFC; sorry, I have had a heavy
> workload these days.
>
> Yes, you can start the write up of the RFC.
>
> Quoting David:
>
>   "Since you first raised the topic [...] I want to give you right of
> first refusal."
>
>
> Have a nice day!
>
> -- Lorenzo
>
> Thank you again for keeping this going.
>
> Sincerely,
>
> - Son
>
> On Wed, Nov 4, 2020 at 6:30 PM Lorenzo Casalino <
> lorenzo.casalino93 at gmail.com> wrote:
>
>>
>> On 04/11/20 at 17:40, David Greene wrote:
>> > Sorry about the late reply.
>> >
>> > Lorenzo Casalino <lorenzo.casalino93 at gmail.com> writes:
>> >
>> >>>>> - Should not impact compile time excessively (what is "excessive?")
>> >>>> Probably, such estimation should be performed on
>> >>> Did something get cut off here?
>> >> Oops. Yep, I removed a paragraph but apparently forgot to remove its
>> >> first sentence. In any case, we should discuss how to quantitatively
>> >> determine an acceptable upper bound on the compile-time overhead and
>> >> motivate it. For instance, a maximum of n% compile-time overhead must be
>> >> guaranteed, because ** list of reasons **.
>> > I am not sure how we'd arrive at such a number or motivate/defend it.
>> > Do we have any sense of the impact of the existing metadata
>> > infrastructure?  If not I'm not sure we can do it for something
>> > completely new.  I think we can set a goal but we'd have to revise it as
>> > we gain experience.
>> I think it is the best approach to employ :)
>> >>> Since you initially raised the topic, do you want to take the lead in
>> >>> writing up a RFC?  I can certainly do it too but I want to give you
>> >>> right of first refusal.  :)
>> >>>                     -David
>> >> Uhm...actually, it wasn't me but Son Tuan, so the right of refusal
>> >> should be granted to him :) And I have just noticed that he wasn't
>> >> included in CC on all our mails; I hope he was able to follow our
>> >> discussion anyway. I am adding him to this mail; let us wait and see
>> >> whether he has any critical feature or point to discuss.
>> > Fair enough!  I have recently taken on a lot more work so unfortunately
>> > I can't devote a lot of time to this at the moment.  I've got to clear
>> > out my pipeline first.  I'd be very happy to help review text, etc.
>> Do not worry, it is ok ;) While we wait for any feedback/input from Son,
>> I'll try to prepare a draft of the RFC and publish it here.
>>
>> Thank you David, and have a nice day :)
>>
>> -- Lorenzo
>>
>> >                  -David
>>