[llvm-dev] [RFC] IR-level Region Annotations
Mehdi Amini via llvm-dev
llvm-dev at lists.llvm.org
Fri Jan 20 10:48:16 PST 2017
> On Jan 20, 2017, at 10:45 AM, Yonghong Yan <yanyh15 at gmail.com> wrote:
>
>
>
> On Fri, Jan 20, 2017 at 12:52 PM, Mehdi Amini via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>> On Jan 20, 2017, at 6:59 AM, Hal Finkel <hfinkel at anl.gov> wrote:
>>
>> On 01/13/2017 12:11 PM, Mehdi Amini wrote:
>>
>>>
>>>> On Jan 13, 2017, at 9:41 AM, Hal Finkel <hfinkel at anl.gov> wrote:
>>>>
>>>>
>>>>
>>>> On 01/13/2017 12:29 AM, Mehdi Amini wrote:
>>>>>
>>>>>> On Jan 12, 2017, at 5:02 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>>>>>> On 01/12/2017 06:20 PM, Reid Kleckner via llvm-dev wrote:
>>>>>>
>>>>>>> On Wed, Jan 11, 2017 at 8:13 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>>>>>>> Can you elaborate why? I’m curious.
>>>>>>>
>>>>>>> The con of proposal c was that many passes would need to learn about many region intrinsics. With tokens, you only need to teach all passes about tokens, which they should already know about because WinEH and other things use them.
>>>>>>>
>>>>>>> With tokens, we can add as many region-introducing intrinsics as makes sense without any additional cost to the middle end. We don't need to make one omnibus region intrinsic set that describes every parallel loop annotation scheme supported by LLVM. Instead we would factor things according to other software design considerations.
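
For concreteness, here is a minimal sketch of what such a token-paired region could look like in IR. The intrinsic names are purely hypothetical placeholders for whatever set would actually be defined:

    ; Hypothetical intrinsics, for illustration only: the entry intrinsic
    ; produces a token and the exit intrinsic consumes it, so the two ends
    ; of the region stay tied together.
    declare token @llvm.experimental.region.entry()
    declare void  @llvm.experimental.region.exit(token)

    define void @annotated() {
    entry:
      %r = call token @llvm.experimental.region.entry()
      ; ... body of the annotated region ...
      call void @llvm.experimental.region.exit(token %r)
      ret void
    }

Since token values already come with IR-level restrictions that generic passes must respect, the entry/exit pairing survives passes that know nothing about these particular intrinsics.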
>>>>>>
>>>>>> I think that, unless we allow frontends to add their own intrinsics without recompiling LLVM, this severely restricts the usefulness of this feature.
>>>>>
>>>>> I’m not convinced that “building a frontend without recompiling LLVM while injecting custom passes” is a strongly compelling use case; can you explain why requiring such frontends to rebuild LLVM is so limiting?
>>>>
>>>> I don't understand your viewpoint. Many frontends either compose their own pass pipelines or use the existing extension-point mechanism. Some frontends, Chapel for example, can insert code using custom address spaces and then insert passes later to turn accesses using pointers to those address spaces into runtime calls. This is the kind of design we'd like to support, without forcing frontends to use custom versions of LLVM, but with annotated regions instead of just with address spaces.
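
As a rough sketch of that address-space pattern (the address-space number and the runtime entry point below are made-up placeholders, not what Chapel actually uses):

    ; Before the frontend-inserted lowering pass: a "remote" access is just
    ; a load through a pointer in a custom address space.
    define i64 @read_remote(i64 addrspace(100)* %p) {
      %v = load i64, i64 addrspace(100)* %p
      ret i64 %v
    }

    ; After that pass runs, the access has become an explicit runtime call.
    declare i64 @runtime_remote_get_i64(i64 addrspace(100)*)

    define i64 @read_remote.lowered(i64 addrspace(100)* %p) {
      %v = call i64 @runtime_remote_get_i64(i64 addrspace(100)* %p)
      ret i64 %v
    }

Until the lowering pass runs, the rest of the optimizer treats those loads like any others, which is the same property we would like annotated regions to have.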
>>>
>>> I think we’re talking about two different things here: you originally mentioned “without recompiling LLVM”, which I don’t see as a major blocker, while now I think you’re clarifying that you’re more concerned about requiring a *custom* LLVM, as in “it wouldn’t work with the source from a vanilla upstream LLVM”, which I agree is a different story.
>>>
>>> That said, it extends the point from the other email (in parallel) about the semantics of the intrinsics: while your solution allows these frontends to reuse the intrinsics, it means that upstream optimizations have to treat such intrinsics as optimization barriers because their semantics are unknown.
>>
>> I see no reason why this needs to be true (at least so long as you're willing to accept a certain amount of "as if" parallelism).
>
> Sorry, I didn’t quite get that?
>
>> Moreover, if it is true, then we'll lose the benefits of, for example, being able to hoist scalar loads out of parallel loops. We might need to include dependencies on "inaccessible memory" to cover natural runtime dependencies by default (this can be refined with custom AA logic), but that is not a complete code-motion barrier. Memory that is explicitly managed will end up as arguments to the region intrinsics, so we'll automatically get finer-grained information.
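
To make that concrete, a sketch under the same assumptions (hypothetical intrinsic names again): if the region intrinsics are marked as touching only memory that the IR cannot otherwise access, they stay ordered with respect to the runtime but do not pin down unrelated loads and stores:

    ; 'inaccessiblememonly' says these calls only touch state that is not
    ; visible through any IR pointer (e.g. runtime-internal state), so they
    ; order against the runtime without being a full code-motion barrier.
    declare token @llvm.experimental.region.entry() inaccessiblememonly
    declare void  @llvm.experimental.region.exit(token) inaccessiblememonly

    define void @example(i64* noalias %scalar, i64* %out) {
    entry:
      %r = call token @llvm.experimental.region.entry()
      ; This load does not alias anything the intrinsics touch, so it may
      ; legally be scheduled before the region entry (or hoisted out of an
      ; enclosing loop), which is the kind of optimization we want to keep.
      %v = load i64, i64* %scalar
      store i64 %v, i64* %out
      call void @llvm.experimental.region.exit(token %r)
      ret void
    }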
>
> Sanjoy gave an example of the kind of optimization that can break the semantics: http://lists.llvm.org/pipermail/llvm-dev/2017-January/109302.html ; I haven’t yet seen an explanation of how this is addressed?
> If you were asking how this is addressed in the current clang/OpenMP: the code of the whole parallel region is outlined into a new function by the frontend, and the parallel fork-join is transformed into a runtime call (__kmpc_fork_call) that takes a pointer to the outlined function as an argument. So intraprocedural optimization of the enclosing function would not perform the optimizations Sanjoy listed.
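
For reference, this is roughly the shape of what clang emits today for "#pragma omp parallel" (heavily simplified: the real __kmpc_fork_call takes an ident_t* source-location descriptor and forwards the captured variables as varargs; @region_body and the null location below are placeholders):

    ; The body of the parallel region, outlined into its own function by
    ; the frontend.
    define internal void @.omp_outlined.(i32* %global_tid, i32* %bound_tid) {
      call void @region_body()
      ret void
    }

    declare void @region_body()
    declare void @__kmpc_fork_call(i8*, i32, void (i32*, i32*)*, ...)

    define void @caller() {
      ; The fork/join is just an opaque call into the OpenMP runtime, so
      ; optimizations running on @caller cannot move code into or out of
      ; the parallel region.
      call void (i8*, i32, void (i32*, i32*)*, ...)
           @__kmpc_fork_call(i8* null, i32 0, void (i32*, i32*)* @.omp_outlined.)
      ret void
    }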
>
Right, but the question is rather how it’ll work with this proposal.
—
Mehdi