[llvm-dev] Should I add intrinsics to write my own automatic reference counting passes?

Mon Nov 30 13:28:06 PST 2020

On 20 Nov 2020, at 4:29, Ola Fosheim Grøstad wrote:
> On Thu, Nov 19, 2020 at 1:04 AM John McCall <rjmccall at apple.com> wrote:
>
>> The main problem for this sort of optimization is that it is difficult
>> to do on an IR like LLVM’s, where the semantic relationships between
>> values that exist in the program have been lowered into a sequence of
>> primitive operations with no remaining structural relationship.
>
>
> I understand. I think one possible advantage though would be to optimize
> C++ shared_ptr as well when calling into C++ code (assuming that everything
> is done with a modified version of clang++), or to provide a C library with
> macros that will emit the ARC-intrinsics so that C-FFI can benefit from the
> same optimizations...

Well, I understand that this is the *goal*, and that as a *goal* it’d be
quite valuable.  I’m just saying that it’s going to be hard to do correctly
on LLVM IR because the high-level structure will have been lost.

>> Dealing with unrelated operations and retroactively attempting to infer
>> relationships, as LLVM IR must, turns it into a fundamentally difficult
>> analysis that often relies on semantic knowledge that isn’t expressed
>> in the IR, so that you’re actually reasoning about what happens under
>> a “well-behaved” frontend.
>>
>
> Yes, so basically, if one chooses to do this within the LLVM IR then it
> might be difficult to share the ARC optimizations with other front ends.

That’s true but not really what I mean.  In practice, it is hard to specify
the rules that will make your analysis and transformations valid on the
LLVM IR level, and you will find yourself reasoning a lot about what a
reasonable frontend would emit for particular patterns of code.  When you
find yourself doing that kind of reasoning, you are almost certainly in a
bad place where you’re not going to meet your goals, and you need to be
doing the optimization at a level that still preserves the semantic
structure you’re trying to reason about.  Unfortunately this is a problem
in Clang, because there’s no IR designed for optimization prior to LLVM
IRGen.

> I don't intend to give direct access to the acquire/release or the ref-count
> value, so I think I can assume that the front end provides "well behaved"
> IR.

The main problems are actually around two points:

1. Which code that could theoretically access or change the refcount
    actually does so, and how?
2. Which code relies on the refcount?

For example, if you have a shared reference to an object, and you don’t
use it, and then you drop the shared reference, it’d be nice to eliminate
the unnecessary refcount traffic.  But to do that, you have to actually
prove that nothing in between is relying on the refcount being higher.
In particular, there could be multiple LLVM IR values that represent that
shared reference, something like this:

```
   call @retain(%shared_ref %0)
   …
   call @use(%shared_ref %1)     // %1 is actually the same as %0
   …
   call @release(%shared_ref %2) // %2 is also actually the same as %0
```

It is highly unlikely that there is a valid pattern of code where LLVM
could figure out that %0 is the same as %2 but not the same as %1.
(For example, imagine that %1 and %2 are loads from a memory location
that %0 was stored into.  %2 is presumably loaded after %1, so if LLVM
can forward the store of %0 to %2, it should be able to forward it to %1
as well.)  That is, it is almost certain that your optimization will
either see the above (not optimizable because the retain/release
may be unrelated) or something like the below (still probably not
optimizable because of the intermediate use):

```
   call @retain(%shared_ref %0)
   …
   call @use(%shared_ref %0)
   …
   call @release(%shared_ref %0)
```

But it’s *possible* that LLVM could actually produce this pattern:

```
   call @retain(%shared_ref %0)
   …
   call @use(%shared_ref %1)     // %1 is actually the same as %0
   …
   call @release(%shared_ref %0)
```

And then your optimization might fire and eliminate this “unnecessary”
pair of retain/release operations, even though the intervening use of %1
may be relying on that retain to keep the object alive.
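
To make the three patterns concrete, here is a toy model in Python. It is
only a sketch: `naive_eliminate` and its matching rule are hypothetical,
invented for illustration, and are not how LLVM's own ARC optimizer works.
The rule pairs a retain with a release purely by SSA name, which is the
only information such a pass has once the structure is gone.

```python
def naive_eliminate(ops):
    """Drop a retain/release pair on the same SSA name when no operation
    in between mentions that name. Hypothetical rule for illustration."""
    out = list(ops)
    for i, (op, val) in enumerate(ops):
        if op != "retain":
            continue
        # Scan forward to the first operation that mentions the same name.
        for j in range(i + 1, len(ops)):
            op2, val2 = ops[j]
            if val2 == val:
                if op2 == "release":
                    out[i] = out[j] = None  # looks like a removable pair
                break  # any other mention of the name blocks the rewrite
    return [o for o in out if o is not None]


# The three patterns from the text, with %1 and %2 secretly equal to %0.
p1 = [("retain", "%0"), ("use", "%1"), ("release", "%2")]
p2 = [("retain", "%0"), ("use", "%0"), ("release", "%0")]
p3 = [("retain", "%0"), ("use", "%1"), ("release", "%0")]

print(naive_eliminate(p1))  # left alone: the release is on a different name
print(naive_eliminate(p2))  # left alone: the use of %0 sits in between
print(naive_eliminate(p3))  # pair removed, although %1 aliases %0
```

In this model the first two patterns are left alone and only the third is
rewritten, and that is the one case where the rewrite is not justified by
anything the pass can see.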

The problem here is that IR doesn’t guarantee the structure (the semantic
ties between copies, uses, and destroys) that you need to use to do your
optimization.
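
As a rough illustration of what goes wrong at run time when a
retain/release pair is removed around code that relied on it, here is a
small Python simulation. Everything in it (`Obj`, `use`, `run`, the `slot`
dictionary standing in for a memory location) is an assumption made up for
this sketch; in particular, it assumes the callee drops the only other
reference before touching the object.

```python
class Obj:
    """Toy refcounted object; `alive` turns False when the count hits zero."""
    def __init__(self):
        self.refcount = 1  # one reference, owned by the memory slot below
        self.alive = True


def retain(o):
    o.refcount += 1


def release(o):
    o.refcount -= 1
    if o.refcount == 0:
        o.alive = False  # stands in for destroying the object


def use(slot):
    """Hypothetical callee: clears the slot, releasing its reference,
    and then touches the object it was handed."""
    o = slot["ref"]
    slot["ref"] = None
    release(o)
    return o.alive  # False models a use-after-free


def run(keep_pair):
    slot = {"ref": Obj()}
    o = slot["ref"]  # a second name for the same reference
    if keep_pair:
        retain(o)
    ok_during_use = use(slot)
    if keep_pair:
        release(o)
    return ok_during_use, o.alive


print(run(keep_pair=True))   # object survives the use, destroyed afterwards
print(run(keep_pair=False))  # object is destroyed in the middle of the use
```

With the pair in place the object outlives the call and is destroyed by the
final release; without it, the callee's own release takes the count to zero
while the object is still being used.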

John.