<!DOCTYPE html>

<html>

<head>

<meta http-equiv="Content-Type" content="text/xhtml; charset=utf-8">

</head>

<body>

<div style="font-family:sans-serif"><div style="white-space:normal">

<p dir="auto">On 20 Nov 2020, at 4:29, Ola Fosheim Grøstad wrote:</p>


</div>

<div style="white-space:normal"><blockquote style="border-left:2px solid #777; color:#777; margin:0 0 5px; padding-left:5px"><p dir="auto">On Thu, Nov 19, 2020 at 1:04 AM John McCall &lt;rjmccall@apple.com&gt; wrote:<br>

</p>

<blockquote style="border-left:2px solid #777; color:#999; margin:0 0 5px; padding-left:5px; border-left-color:#999"><p dir="auto">The main problem for this sort of optimization is that it is difficult<br>

to do on an IR like LLVM’s, where the semantic relationships between<br>

values that exist in the program have been lowered into a sequence of<br>

primitive operations with no remaining structural relationship.</p>

</blockquote><p dir="auto">I understand. I think one possible advantage though would be to optimize<br>

C++ shared_ptr as well when calling into C++ code (assuming that everything<br>

is done with a modified version of clang++), or to provide a C library with<br>

macros that will emit the ARC-intrinsics so that C-FFI can benefit from the<br>

same optimizations...</p>

</blockquote></div>

<div style="white-space:normal">


<p dir="auto">Well, I understand that this is the <em>goal</em>, and that as a <em>goal</em> it’d be<br>

quite valuable.  I’m just saying that it’s going to be hard to do correctly<br>

on LLVM IR because the high-level structure will have been lost.</p>


</div>

<div style="white-space:normal"><blockquote style="border-left:2px solid #777; color:#777; margin:0 0 5px; padding-left:5px">

<blockquote style="border-left:2px solid #777; color:#999; margin:0 0 5px; padding-left:5px; border-left-color:#999"><p dir="auto">Dealing with unrelated operations and retroactively attempting to infer<br>

relationships, as LLVM IR must, turns it into a fundamentally difficult<br>

analysis that often relies on semantic knowledge that isn’t expressed<br>

in the IR, so that you’re actually reasoning about what happens under<br>

a “well-behaved” frontend.<br>

</p>

</blockquote><p dir="auto">Yes, so basically, if one chooses to do this within the LLVM IR then it<br>

might be difficult to share the ARC optimizations with other front ends.</p>

</blockquote></div>

<div style="white-space:normal">


<p dir="auto">That’s true but not really what I mean.  In practice, it is hard to specify<br>

the rules that will make your analysis and transformations valid on the<br>

LLVM IR level, and you will find yourself reasoning a lot about what a<br>

reasonable frontend would emit for particular patterns of code.  When you<br>

find yourself doing that kind of reasoning, you are almost certainly in a<br>

bad place where you’re not going to meet your goals, and you need to be<br>

doing the optimization at a level that still preserves the semantic<br>

structure you’re trying to reason about.  Unfortunately this is a problem<br>

in Clang, because there’s no IR designed for optimization prior to LLVM<br>

IRGen.</p>


</div>

<div style="white-space:normal"><blockquote style="border-left:2px solid #777; color:#777; margin:0 0 5px; padding-left:5px"><p dir="auto">I don't  intend to give direct access to the acquire/release or the ref-count<br>

value, so I think I can assume that the front end provides "well behaved"<br>

IR.</p>

</blockquote></div>

<div style="white-space:normal">


<p dir="auto">The main problems are actually around two points:</p>


<ol>

<li value="1">Which code that could theoretically access or change the refcount

actually does so, and how?</li>

<li value="2">Which code relies on the refcount?</li>

</ol>


<p dir="auto">For example, if you have a shared reference to an object, and you don’t<br>

use it, and then you drop the shared reference, it’d be nice to eliminate<br>

the unnecessary refcount traffic.  But to do that, you have to actually<br>

prove that nothing in between is relying on the refcount being higher.<br>

In particular, there could be multiple LLVM IR values that represent that<br>

shared reference, something like this:</p>


<pre style="background-color:#F7F7F7; border-radius:5px 5px 5px 5px; margin-left:15px; margin-right:15px; max-width:90vw; overflow-x:auto; padding:5px" bgcolor="#F7F7F7"><code style="background-color:#F7F7F7; border-radius:3px; margin:0; padding:0" bgcolor="#F7F7F7">  call @retain(%shared_ref %0)

  …

  call @use(%shared_ref %1)     // %1 is actually the same as %0

  …

  call @release(%shared_ref %2) // %2 is also actually the same as %0

</code></pre>


<p dir="auto">It is highly unlikely that there is a valid pattern of code where LLVM<br>

could figure out that %0 is the same as %2 but not the same as %1.<br>

(For example, imagine that %1 and %2 are loads from a memory location<br>

that %0 was stored into.  %2 is presumably loaded after %1, so if LLVM<br>

can forward the store of %0 to %2, it should be able to forward it to %1<br>

as well.)  That is, it is almost certain that your optimization will<br>

either see the above (not optimizable because the retain/release<br>

may be unrelated) or something like the below (still probably not<br>

optimizable because of the intermediate use):</p>


<pre style="background-color:#F7F7F7; border-radius:5px 5px 5px 5px; margin-left:15px; margin-right:15px; max-width:90vw; overflow-x:auto; padding:5px" bgcolor="#F7F7F7"><code style="background-color:#F7F7F7; border-radius:3px; margin:0; padding:0" bgcolor="#F7F7F7">  call @retain(%shared_ref %0)

  …

  call @use(%shared_ref %0)

  …

  call @release(%shared_ref %0)

</code></pre>


<p dir="auto">But it’s <em>possible</em> that LLVM could actually produce this pattern:</p>


<pre style="background-color:#F7F7F7; border-radius:5px 5px 5px 5px; margin-left:15px; margin-right:15px; max-width:90vw; overflow-x:auto; padding:5px" bgcolor="#F7F7F7"><code style="background-color:#F7F7F7; border-radius:3px; margin:0; padding:0" bgcolor="#F7F7F7">  call @retain(%shared_ref %0)

  …

  call @use(%shared_ref %1)     // %1 is actually the same as %0

  …

  call @release(%shared_ref %0)

</code></pre>


<p dir="auto">And then your optimization might fire and eliminate this “unnecessary”<br>

pair of retain/release operations, even though the use of %1 in between<br>

may rely on the reference that the retain holds.</p>


<p dir="auto">The problem here is that LLVM IR doesn’t guarantee the structure (the<br>

semantic ties between copies, uses, and destroys) that you need in order<br>

to do your optimization.</p>
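
<p dir="auto">As a rough illustration of that failure mode, here is a toy Python model (not LLVM code). The tuple encoding of instructions, the names <code>naive_remove_pairs</code> and <code>run</code>, the <code>%other</code> reference standing in for something in the elided code, and the starting refcount of 1 are all assumptions made for this sketch. It pairs a retain with a release purely by value name, and then shows what happens when the intermediate use goes through an alias:</p>

<pre style="background-color:#F7F7F7; border-radius:5px 5px 5px 5px; margin-left:15px; margin-right:15px; max-width:90vw; overflow-x:auto; padding:5px" bgcolor="#F7F7F7"><code style="background-color:#F7F7F7; border-radius:3px; margin:0; padding:0" bgcolor="#F7F7F7"># Toy model only: these names are hypothetical and are not LLVM APIs.

def naive_remove_pairs(instrs):
    """Drop the first retain/release pair on the same value name when no
    instruction in between mentions that name. Aliasing is invisible here."""
    out = list(instrs)
    for i, (op, name) in enumerate(out):
        if op != "retain":
            continue
        for j in range(i + 1, len(out)):
            op2, name2 = out[j]
            if name2 != name:
                continue  # different name: assumed to be unrelated
            if op2 == "release":
                return out[:i] + out[i + 1:j] + out[j + 1:]
            break  # a visible use of the same name blocks the removal
    return out


def run(instrs, same_object, start_count=1):
    """Simulate refcounts using ground-truth aliasing; report the first fault."""
    count = {obj: start_count for obj in set(same_object.values())}
    for op, name in instrs:
        obj = same_object[name]
        if count[obj] == 0:
            return "use after free at " + op + " " + name
        if op == "retain":
            count[obj] += 1
        elif op == "release":
            count[obj] -= 1
    return "ok"


# Ground truth the pass cannot see: all three names denote one object.
same_object = {"%0": "obj", "%1": "obj", "%other": "obj"}

program = [
    ("retain", "%0"),
    ("release", "%other"),  # assumed: elided code drops its own reference
    ("use", "%1"),          # %1 is actually the same as %0
    ("release", "%0"),
]

print(run(program, same_object))                      # ok
print(run(naive_remove_pairs(program), same_object))  # use after free at use %1
</code></pre>

<p dir="auto">With the pair in place the use is fine. Once the pass removes it, the release through <code>%other</code> frees the object before the use of <code>%1</code>. If the use were spelled <code>%0</code> instead, this particular pass would leave the pair alone.</p>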


<p dir="auto">John.</p>

</div>

</div>

</body>

</html>