<div dir="ltr">Did anyone send an RFC for this?<div><br></div><div>First-class metadata would be exceptionally useful for sanitizers and other dynamic tools. For</div><div>example, we want to construct PC-keyed metadata tables in the binary (without affecting the </div><div>generated code), to inform program behavior at runtime or to allow offline analysis. A </div><div>prerequisite is to actually propagate the metadata we need from the Clang frontend or LLVM </div><div>middle-end down to the assembly printer.</div><div><br></div><div>Our team has brainstormed many use cases:</div><div><br></div><div>- <a href="https://youtu.be/2KvaKEyMVEU"><b>GWP-TSan</b></a>: storing PCs of accesses lowered from C++ atomics, to filter them out from race</div><div> detection.</div><div> * <span style="font-family:monospace">List&lt;atomic access PC&gt;</span></div><div><b><br></b></div><div><font face="arial, sans-serif">- </font><b>Stack trace compression</b>: storing a <a href="https://lists.llvm.org/pipermail/llvm-dev/2021-June/151044.html">conservative call graph</a>, for use in decompressing stack</div><div> traces offline.</div><div> *<span style="font-family:monospace"> Map[callsite PC] -> List&lt;callee PC&gt;</span></div><div><font face="arial, sans-serif"><b><br></b></font></div><div><font face="arial, sans-serif">- </font><b style="font-family:arial,sans-serif">no_sanitize attributes</b><font face="arial, sans-serif">: storing a map of functions that have the </font><font face="monospace">no_sanitize("...")</font></div><div><font face="arial, sans-serif"> attribute to the associated sanitizer, for filtering out from GWP-*San. Ideally we do not</font></div><div><font face="arial, sans-serif"> introduce new </font><font face="monospace">no_sanitize</font><font face="arial, sans-serif"> string literals, but simply rely on existing ones (e.g. 
a</font></div><div><font face="arial, sans-serif"> </font><font face="monospace">no_sanitize("thread")</font><font face="arial, sans-serif"> works for both TSan and GWP-TSan).</font></div><div><font face="arial, sans-serif"> * </font><span style="font-family:monospace">Map[Func] -> SanitizerKind</span></div><div><b style="font-family:arial,sans-serif"><br></b></div><div><font face="arial, sans-serif">- </font><b style="font-family:arial,sans-serif">Fuzzing aid/CFG reconstruction</b><span style="font-family:arial,sans-serif">: marking coverage PCs as function entry/exit or # of</span></div><div><span style="font-family:arial,sans-serif"> outgoing edges from BB (this allows finding gaps in the coverage frontier).</span></div><div><font face="arial, sans-serif"><b><br></b></font></div><div><font face="arial, sans-serif">- <b>Type-aware malloc and heap profiling</b>: enable the allocator to get the type for a given </font><font face="monospace">new</font></div><div><font face="arial, sans-serif"> call, to optimize for expected usage of the allocation.</font></div><div><font face="arial, sans-serif"> * </font><span style="font-family:monospace">Map[new callsite PC] -> object type</span></div><div><b style="font-family:arial,sans-serif"><br></b></div><div><font face="arial, sans-serif">- </font><b style="font-family:arial,sans-serif">Other</b><span style="font-family:arial,sans-serif">: potential use cases for future bug-finding tools (GWP-assert, GWP-MSan,</span></div><div><span style="font-family:arial,sans-serif"> GWP-DFSan, GWP-UBSan).</span></div><div><div><br></div></div><div>First-class metadata would open the door to some really cool things.</div><div><br></div><div>Thanks,</div><div>Matt Morehouse</div></div><br><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Jan 6, 2021 at 5:56 AM Lorenzo Casalino via llvm-dev &lt;<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>&gt; wrote:<br></div><blockquote class="gmail_quote" 
style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p><font face="monospace">Dear Tuan,</font></p>
<p><font face="monospace">How are you doing? Did you manage to start
the draft for the RFC?</font></p>
<p><font face="monospace"><br>
</font></p>
<p><font face="monospace">I take this opportunity to wish you all
the best for this new year :)<br>
</font></p>
<p><font face="monospace">Best regards,<br>
Lorenzo Casalino<br>
</font></p>
<div>On 10/11/20 at 09:27, Lorenzo Casalino
wrote:<br>
</div>
<blockquote type="cite">
<p><br>
</p>
<div>On 09/11/20 at 00:30, Son Tuan VU
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Hi,
<div><br>
</div>
<div>Thank you all for keeping this going. I was not
aware that the discussion was still going on; I am really sorry
for this late reply.</div>
<div><br>
</div>
</div>
</blockquote>
<font face="monospace">Nice to hear from you again! Thank you for
starting this thread ;)</font><br>
<blockquote type="cite">
<div dir="ltr">
<div>I understand Chris' point about metadata design. Either
the metadata becomes stale or removed (if we do not teach
transformations to preserve it), or we end up modifying many
(if not all) transformations to keep the data intact.</div>
<div>Currently in the IR, I feel like the default behavior is
to ignore/remove the metadata, and only a limited number of
transformations know how to maintain and update it, which is
a best-effort approach.</div>
<div>That being said, my initial thought was to adopt this
approach to the MIR, so that we can at least have a minimal
mechanism to communicate additional information to various
transformations, or even dump it to the asm/object file.</div>
<div>In other words, it is the responsibility of the users who
introduce/use the metadata in the MIR to teach the
transformations they selected how to preserve their
metadata. A common API to abstract this would definitely
help, just as combineMetadata() from
lib/Transforms/Utils/Local.cpp does.</div>
<br>
</div>
</blockquote>
<font face="monospace">Unfortunately, I have never worked with
LLVM IR metadata (I have focused almost entirely on the back-end</font><font face="monospace"><br>
and have only scratched the surface of LLVM's middle-end), but I see your
point.</font>
<p><font face="monospace">Clearly, applying the needed
modifications to all the back-end
transformations/optimizations<br>
is unfeasible and, probably, not worth it -- different users
may have different requirements/needs<br>
regarding a specific pass.</font></p>
<p><font face="monospace"><font face="monospace">I like the idea
of a common API to handle the MIR metadata, and let the end
user handle<br>
such data. Of course, if the community encounters common
cases while handling the metadata, such<br>
cases may be integrated with the upstream project.<br>
</font></font></p>
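<p><font face="monospace">To make the "common API" idea a bit more concrete, here is a minimal, self-contained sketch in plain C++ (it is not LLVM code). It assumes a hypothetical registry where users register one merge rule per metadata kind; every name in it (Metadata, MergeRule, MergeRegistry, keepIfEqual) is invented for illustration. Kinds without a registered rule are dropped, which mirrors the best-effort default described above.</font></p>

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for the metadata attached to one machine instruction:
// metadata kind -> payload. None of these names exist in LLVM.
using Metadata = std::map<std::string, std::string>;

// User-supplied rule for one metadata kind. A or B is nullptr when that
// instruction does not carry the kind. Returns true and fills Out if the
// metadata should survive on the combined instruction.
using MergeRule = std::function<bool(const std::string *A,
                                     const std::string *B, std::string &Out)>;

class MergeRegistry {
  std::map<std::string, MergeRule> Rules;

  static const std::string *find(const Metadata &M, const std::string &Kind) {
    auto It = M.find(Kind);
    return It == M.end() ? nullptr : &It->second;
  }

public:
  void registerRule(const std::string &Kind, MergeRule Rule) {
    assert(Rule && "a merge rule must be callable");
    Rules[Kind] = std::move(Rule);
  }

  // Metadata for an instruction that replaces two others. Only kinds with a
  // registered rule are considered; everything else is dropped.
  Metadata combine(const Metadata &A, const Metadata &B) const {
    Metadata Result;
    for (const auto &Entry : Rules) {
      std::string Out;
      if (Entry.second(find(A, Entry.first), find(B, Entry.first), Out))
        Result[Entry.first] = Out;
    }
    return Result;
  }
};

// Example rule: keep the payload only when both sides agree on it.
inline bool keepIfEqual(const std::string *A, const std::string *B,
                        std::string &Out) {
  if (!A || !B || *A != *B)
    return false;
  Out = *A;
  return true;
}
```

<p><font face="monospace">A pass that merges two instructions would call combine() on their metadata; a user who cares about a given kind registers a rule for it once, e.g. registerRule("my-kind", keepIfEqual), where "my-kind" is again a made-up name.</font></p>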
<p><font face="monospace">Nonetheless, the main point of this
thread is to preserve middle-end metadata down to the<br>
back-end, right after the Instruction Selection phase. Hence,
regardless of the end user's needs, a<br>
"preserve-all" policy during the lowering stage is required,
which will involve some changes,<br>
in particular in the DAGCombine pass.</font></p>
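<p><font face="monospace">As a rough illustration of what "preserve-all" could mean when a combine step folds several nodes into one, the sketch below gives the new node the union of the metadata of every node it replaces. It is plain C++ and does not use the SelectionDAG API; Node, Tags and combineNodes are hypothetical names, and a real policy would still have to decide how conflicting payloads are reconciled.</font></p>

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified node: an opcode plus a set of metadata tags.
struct Node {
  std::string Opcode;
  std::set<std::string> Tags;
};

// "Preserve-all": the node that replaces a group of nodes inherits every tag
// carried by any of them, so no metadata is lost by the combine.
Node combineNodes(const std::string &NewOpcode,
                  const std::vector<Node> &Replaced) {
  assert(!Replaced.empty() && "a combine must replace at least one node");
  Node Result{NewOpcode, {}};
  for (const Node &N : Replaced)
    Result.Tags.insert(N.Tags.begin(), N.Tags.end());
  return Result;
}
```

<p><font face="monospace">For instance, folding a "mul" tagged {"secret"} and an "add" tagged {"taint"} into a single "fma" would yield a node tagged with both.</font></p>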
<p><font face="monospace"><br>
</font></p>
<blockquote type="cite">
<div dir="ltr">
<div>As for my use case, it is also security-related. However,
I do not consider the metadata to be a compilation
"correctness" criterion: metadata, by definition (from the
LLVM IR), can be safely removed without affecting the
program's correctness.</div>
<div>If possible, I would like to have more details on
Lorenzo's use case in order to see how metadata would
interfere with the program's correctness.</div>
<div><br>
</div>
</div>
</blockquote>
<p><font face="monospace">I would really like to discuss the
details here but, unfortunately, I am working on a publication<br>
and thus cannot disclose any detail :(</font></p>
<p><font face="monospace">However, by "correctness" I do not
mean "I/O correctness", but the preservation of a<br>
security property expressed in the front-end (e.g., specified
in the source-code) or in the<br>
middle-end (e.g., specified in the LLVM-IR, for instance by a
transformation pass).</font></p>
<p><font face="monospace">From a security point of view, removing
or altering metadata does not interfere with the I/O<br>
functionality of the code (although it may affect
performance), but may introduce<br>
vulnerabilities.</font></p>
<blockquote type="cite">
<div dir="ltr">
<div>As for the RFC, I can definitely try to write one, but
this would be my first time doing so. But maybe it is better
to start with Lorenzo's proposal, as you have already been
working on this? Please tell me if you prefer me to start
the RFC though.<br>
</div>
<div><br>
</div>
</div>
</blockquote>
<font face="monospace">It is the first time for me too, do not
worry!<br>
</font>
<p><font face="monospace">We could just use any other RFC as a
template to get started :D</font></p>
<p><font face="monospace">I think that a structure like the
following would be fine:</font></p>
<p><font face="monospace"> 1. Background<br>
1.1 Motivation<br>
1.2 Use-cases<br>
1.3 Other approaches<br>
2. Goal(s)<br>
3. Requirements<br>
4. Drawbacks and main bottlenecks<br>
5. Design sketch<br>
6. Roadmap sketch<br>
7. Potential future development</font></p>
<p><font face="monospace">It may be a bit overkill; you are warmly
invited to cut/refine these points!<br>
</font></p>
<p><font face="monospace">And...no, I still have no sketch of the
RFC; sorry, I have had a heavy workload these<br>
days.</font></p>
<p><font face="monospace">Yes, you can start writing up the
RFC.<br>
</font></p>
<p><font face="monospace">Quoting David:</font></p>
<p><font face="monospace"> "Since you first raised the topic
[...] I want to give you</font> <font face="monospace">right
of first refusal."</font></p>
<p><font face="monospace"><br>
</font></p>
<p><font face="monospace">Have a nice day!</font></p>
<p><font face="monospace">-- Lorenzo<br>
</font></p>
<blockquote type="cite">
<div dir="ltr">
<div>Thank you again for keeping this going.</div>
<div><br>
</div>
<div>Sincerely,</div>
<div><br>
</div>
<div>- Son</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Wed, Nov 4, 2020 at 6:30
PM Lorenzo Casalino &lt;<a href="mailto:lorenzo.casalino93@gmail.com" target="_blank">lorenzo.casalino93@gmail.com</a>&gt;
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>
On 04/11/20 at 17:40, David Greene wrote:<br>
> Sorry about the late reply.<br>
><br>
> Lorenzo Casalino &lt;<a href="mailto:lorenzo.casalino93@gmail.com" target="_blank">lorenzo.casalino93@gmail.com</a>&gt;
writes:<br>
><br>
>>>>> - Should not impact compile time
excessively (what is "excessive?")<br>
>>>> Probably, such estimation should be
performed on<br>
>>> Did something get cut off here?<br>
>> Oops. Yep, I removed a paragraph but apparently
forgot the first<br>
>> period. In any case, we should discuss how to
quantitatively<br>
>> determine an acceptable upper-bound on the overhead
on the compilation<br>
>> time and give a motivation for it. For instance,
max n% overhead on the<br>
>> compilation time must be guaranteed, because **
list of reasons **.<br>
> I am not sure how we'd arrive at such a number or
motivate/defend it.<br>
> Do we have any sense of the impact of the existing
metadata<br>
> infrastructure? If not I'm not sure we can do it for
something<br>
> completely new. I think we can set a goal but we'd
have to revise it as<br>
> we gain experience.<br>
I think it is the best approach to employ :)<br>
>>> Since you initially raised the topic, do you
want to take the lead in<br>
>>> writing up a RFC? I can certainly do it too
but I want to give you<br>
>>> right of first refusal. :)<br>
>>> -David<br>
>> Uhm...actually, it wasn't me but Son Tuan, so the
right of refusal<br>
>> should be granted to him :) And I noticed now that
he wasn't included in<br>
>> CC of all our mails; I hope he was able to follow
our discussion<br>
>> anyway. I am adding him to this mail; let us
wait and see if he has any<br>
>> critical feature or point to discuss.<br>
> Fair enough! I have recently taken on a lot more work
so unfortunately<br>
> I can't devote a lot of time to this at the moment.
I've got to clear<br>
> out my pipeline first. I'd be very happy to help
review text, etc.<br>
Do not worry, it is ok ;) While we wait for any
feedback/input from Son,<br>
I'll try to prepare a draft of the RFC and publish it here.<br>
<br>
Thank you David, and have a nice day :)<br>
<br>
-- Lorenzo<br>
<br>
> -David<br>
</blockquote>
</div>
</blockquote>
</blockquote>
</div>
_______________________________________________<br>
LLVM Developers mailing list<br>
<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
<a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>
</blockquote></div>
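<div>For readers unfamiliar with the PC-keyed metadata tables mentioned at the top of this message, here is a minimal sketch of how a runtime might query one, using the <span style="font-family:monospace">List&lt;atomic access PC&gt;</span> case as the example. It assumes the compiler has emitted a sorted array of PCs; the class name AtomicPCTable and the sample addresses are hypothetical, and nothing here reflects an existing LLVM or sanitizer interface.</div>

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical runtime view of a compiler-emitted table listing the PCs of
// memory accesses that were lowered from C++ atomics.
class AtomicPCTable {
  std::vector<std::uint64_t> PCs; // kept sorted so lookups are O(log n)

public:
  explicit AtomicPCTable(std::vector<std::uint64_t> SortedPCs)
      : PCs(std::move(SortedPCs)) {
    assert(std::is_sorted(PCs.begin(), PCs.end()) &&
           "table must be sorted by PC");
  }

  // True if the access at PC came from an atomic, so a race detector could
  // filter it out.
  bool isAtomicAccess(std::uint64_t PC) const {
    return std::binary_search(PCs.begin(), PCs.end(), PC);
  }
};
```

<div>Keeping the table sorted means it can live in a read-only section and be searched without building any extra index at startup; the map-shaped tables in the list above would store a payload next to each PC and be searched the same way.</div>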