[llvm-dev] Which pass should be propagating memory copies

Fri May 19 08:46:43 PDT 2017

Keno,
          I suspect that if you, Daniel B., and I were to have an in-person meeting, it would take 5~10 minutes
for everyone to (in terms D.B. will appreciate) “converge to a fixed point” (:-)! of understanding. Meanwhile we
are stuck with limited email bandwidth, but a meet-up at the next LLVM Bay Area social might be a good idea.

Whether GVN is an appropriate place for this optimization is going to hinge on the precise details of the calling
convention, which you are still being vague about. You are simultaneously saying
1) the memcpys are necessary
2) the memcpys can and should be optimized away
But these two statements are mutually exclusive, so you need to be more precise about the CC.

Why is this important? Because even if Daniel B. can enhance GVN to “look through” the memcpys
and optimize the loads from the local stack copy into loads through the original pointer argument,
and optimize the stores into the local stack copy into stores through the original pointer argument,
there is still the issue of deleting the memcpys themselves, which is the actual performance problem.

But the rules of the C/C++ programming languages typically aren’t going to allow these deletions,
for example if the original pointer argument is passed to another function, or the address of any
or all of the local stack copy is passed to another function, or simply if *any* function is called, because
it could, by the C/C++ rules, modify the original data, requiring the memcpys to be preserved.

Also, the same logical arguments apply to the loads and stores that Daniel B. thinks he can optimize
in GVN: whether they can be optimized depends on where they occur relative to calls to other functions
within this function.
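
The argument above can be illustrated with a small C++ sketch. It is only an illustration under
assumptions: the names `Pair`, `g_alias`, `opaque_call`, `with_copy`, and `without_copy` are hypothetical
and do not come from Julia or LLVM. `with_copy` models a callee that takes a local stack copy of its
pointer argument; `without_copy` models the same callee after the memcpy is deleted and the load is
forwarded to the original pointer.

```cpp
#include <cstring>

struct Pair { int a; int b; };

// Hypothetical alias to the caller's original object. Under plain C/C++
// rules the optimizer must assume any called function may hold one.
static Pair *g_alias = nullptr;

// Stand-in for "any other function": it modifies the original data.
void opaque_call() {
    if (g_alias) g_alias->a = 99;
}

// Callee as written: snapshot the argument, call out, read the snapshot.
int with_copy(const Pair *p) {
    Pair local;
    std::memcpy(&local, p, sizeof(Pair)); // the memcpy under discussion
    opaque_call();                        // may modify *p, but not local
    return local.a;                       // load from the local stack copy
}

// Callee after deleting the memcpy and forwarding the load to p.
int without_copy(const Pair *p) {
    opaque_call();                        // modifies *p before the load
    return p->a;                          // load through the original pointer
}
```

With `g_alias` pointing at the argument, the two versions return different values, so the rewrite is
not legal in general. It only becomes legal when something outside the C/C++ rules, such as a calling
convention or alias information, guarantees that no call can modify the original.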

The only thing that allows the deletion of the memcpys is intimate knowledge of the Julia-specific
calling convention. Again, similar conclusions apply even to the loads and stores.

And that, IMHO, is inappropriate to include in GVN, which is otherwise a purely C/C++ optimizer,
so a separate Julia calling convention pass is indicated.

PS: Don’t be intimidated by writing an IR-to-IR pass. I’ve already written one, and they are easy.
Yours will be particularly easy (after verifying the transform is legal), as it is just a
“replace-all-uses-of”, which already exists, followed by deleting the memcpys and finally deleting
the stack object.
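
The three steps in the PS can be sketched on a toy IR. This is a minimal model, not LLVM's API:
`Inst`, `replaceAllUsesWith`, and `forwardStackCopy` here are hypothetical stand-ins written for this
example, and the legality check is assumed to have already passed. In LLVM itself the analogous
operations would be `Value::replaceAllUsesWith` and `Instruction::eraseFromParent`.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Toy instruction: an opcode name plus the values it uses.
struct Inst {
    std::string op;               // e.g. "arg", "alloca", "memcpy", "load"
    std::vector<Inst*> operands;  // memcpy operands are {dst, src}
};

// Step 1: rewrite every use of `from` into a use of `to`.
void replaceAllUsesWith(std::vector<Inst*>& fn, Inst* from, Inst* to) {
    for (Inst* i : fn)
        std::replace(i->operands.begin(), i->operands.end(), from, to);
}

// Steps 1-3 together. Assumes the caller has already proven the
// transform legal (for example from calling-convention knowledge).
void forwardStackCopy(std::vector<Inst*>& fn, Inst* stackCopy, Inst* origArg) {
    replaceAllUsesWith(fn, stackCopy, origArg);
    fn.erase(std::remove_if(fn.begin(), fn.end(), [&](Inst* i) {
        // Step 2: after step 1 the memcpys copy origArg onto itself.
        bool selfCopy = i->op == "memcpy" && i->operands.size() == 2 &&
                        i->operands[0] == i->operands[1];
        // Step 3: the stack object now has no remaining uses.
        return selfCopy || i == stackCopy;
    }), fn.end());
}
```

Doing the replacement first is a design choice for this sketch: it turns both the copy-in and any
copy-out memcpy into self-copies, which makes them easy to recognize and delete in one sweep.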

Peter Lawrence.

> On May 18, 2017, at 8:47 AM, Keno Fischer <keno at juliacomputing.com> wrote:
> 
> Hi Peter,
> 
> thank you for your concern and advice. Since we both write the compiler and design the language, we are not particularly
> bound by any pre-existing spec. The concerns about multi-threaded data races are very relevant of course
> and we're well aware of the implications. In the particular case where this comes up, language semantics
> generally guarantee that this is unobservable both in single-threaded and multi-threaded contexts (though
> we generally do allow the user to shoot themselves in the foot if they want to, the primary concern here
> is not really observability, but what the programmer expects from the semantics of the language). For what
> it's worth, this isn't exactly CICO. Our calling convention is generally by reference. However, we do have
> notions of semantic immutability, which is where this particular pattern arises (in cases where a new immutable
> gets created by taking an existing field and modifying it in one place only). Because of these semantic
> guarantees, we know that there's no aliasing of the kind that would be problematic (and expose this
> information to LLVM through the various AA mechanisms). Now, similar issues of course arise with mutable
> memory locations as well. However, in such cases the data race would be explicitly present in the source
> program, so we don't have a problem with the compiler making this optimization. FWIW, our multi-threading
> programming model is in the early stages, and we're considering various language level constraints
> on concurrent data modification to mostly disallow that situation unless explicitly opted in to by the user,
> but that's a bit off. 
> 
> From my perspective, I don't see a reason why GVN shouldn't be doing this (which is why I sent the original
> email in the first place). It would of course be very possible for us to write our own pass that pattern matches
> this and performs the transformation that we want. However, we generally tend to prefer working with the
> community to put the optimizations in the best possible place such that others may automatically take advantage.
> It sounds like the community consensus is that GVN should be able to do this kind of optimization (and thanks
> to Daniel for providing some guidance on implementation!). If people feel strongly that memcpyopt (or a new pass)
> with appropriate pattern matching would be a better place, I'd be happy to go that way as well of course.
> 
> Keno
> 
> 
