[llvm-dev] [RFC] IR-level Region Annotations

Adve, Vikram Sadanand via llvm-dev llvm-dev at lists.llvm.org
Thu Jan 19 12:37:42 PST 2017


> On Jan 19, 2017, at 1:46 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
> 
>> 
>> On Jan 19, 2017, at 11:36 AM, Adve, Vikram Sadanand via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> 
<snip>
>> Third, OpenMP has *numerous* clauses, e.g., REDUCTION or PRIVATE, that are needed because, without them, you’d be dependent on fundamentally hard compiler analyses to extract the same information for satisfactory parallel performance; realistic applications cannot depend on the success of such analyses.
> 
> I agree with this, but I’m also wondering if it needs to be first class in the IR?
> For example we know our alias analysis is very basic, and C/C++ have a higher constraint thanks to their type system, but we didn’t inject this higher level information that helps the optimizer as first class IR constructs.
> I wonder if the same wouldn’t apply to an OpenMP reduction clause, for instance, where you could use the “basic” IR construct and the analysis would use metadata emitted by the frontend instead of trying to infer the reduction.


As Danny said (next message), lack of strong AA does limit some of our optimizations today. Even beyond that, however, parallel performance is a different beast from scalar optimization. If you don’t devirtualize a subset of indirect calls, you may take a hit of a few percent. If you fail to recognize a reduction and run the loop serially instead of in parallel, you could easily take a 10x or greater slowdown on that loop, leading to a substantial slowdown overall, depending on the importance of the reduction.

Put another way, individual parallel optimizations (whether done manually or by a compiler) have far larger performance effects, generally speaking, than individual scalar optimizations.
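
To make the contrast concrete, here is a minimal sketch (not code from the RFC; the function names are hypothetical and chosen only for illustration). With an explicit OpenMP reduction clause, the frontend is told directly that `sum` is a reduction variable. The unannotated loop computes the same value, but it can only be run in parallel if the compiler proves the reduction on its own.

```cpp
#include <vector>

// Hypothetical helper: sum with an explicit OpenMP REDUCTION clause.
// The clause states that `sum` is a reduction, so no reduction-recognition
// analysis is needed to parallelize the loop. If OpenMP is not enabled
// (e.g. no -fopenmp), the pragma is ignored and the loop runs serially
// with the same result.
long long sum_with_reduction(const std::vector<int>& v) {
    long long sum = 0;
    const long long n = static_cast<long long>(v.size());
    #pragma omp parallel for reduction(+ : sum)
    for (long long i = 0; i < n; ++i) {
        sum += v[i];
    }
    return sum;
}

// Hypothetical helper: the same loop with no annotation. Any parallel
// execution here depends entirely on the compiler recognizing the
// reduction pattern; otherwise the loop stays serial.
long long sum_unannotated(const std::vector<int>& v) {
    long long sum = 0;
    for (int x : v) {
        sum += x;
    }
    return sum;
}
```

Both functions return the same value for any input; integer addition is used so the result does not depend on the order in which partial sums are combined. The difference is only in what the compiler is told versus what it has to infer.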


> 
> Just a thought, I haven’t spent much time studying other constructs and how they map to the IR :)
> 
>>>> [...]
>>>> (b) Add several new LLVM instructions such as, for parallelism, fork, spawn,
>>>>  join, barrier, etc.
>>>> [...]
>>> 
>>> For me fork and spawn are serving the same purpose, most new schemes suggested three new instructions in total.
>> 
>> A reasonable question is whether to use (#b) first-class instructions for some features, *in combination with* (#d) — i.e., region markers + metadata — or to use #d exclusively.  There are almost certainly far too many essential features in parallel programs to capture them all as new instructions. I don’t see a need to answer this question on Day 1.  Instead, we can begin with regions and metadata annotations, and then “promote” a few features to first-class instructions if the benefit is justified.
>> 
>> Does that make sense?
> 
> Now that I read this, I wonder if it isn’t close to what I tried to express above :)


:-).

--Vikram

// Vikram S. Adve
// Professor, Department of Computer Science
// University of Illinois at Urbana-Champaign
// vadve at illinois.edu
// http://llvm.org


> 
>> Mehdi
> 
> 
> 
>> 
>> 
>>> Also I think we already have a lot of
>>> simple constructs in the IR to express high-level information properly,
>>> e.g. 'atomicrmw' instructions for high-level reduction. While we currently 
>>> lack analysis passes to extract information from such low-level
>>> representations, these are certainly possible [2,3]. I would argue that such
>>> analyses are a better way to do things than placing "high-level intrinsics" in the IR to mark things like reductions.
>> 
>> IMO, this is too fragile for ensuring performance for realistic applications. While there is a lot of work on reduction recognition in the literature going back to the 1980s, there will always be cases where these algorithms miss a reduction pattern and you get terrible parallel performance. Second, I’m not convinced that you can use specific operations like “atomicrmw” to find candidate loop nests in the first place that *might* be reductions, because the atomic op may in fact be in a library operation linked in as binary code. Third, this greatly increases the complexity of a “first working prototype” for a language like OpenMP because you can’t exclude REDUCTION clauses.  Fourth, I believe array privatization is even harder; more generally, you cannot rely on heroic compiler optimizations to take the place of explicit language directives.
>> 
>> I may be showing my age, but I co-wrote a full HPF-to-MPI source-to-source compiler in the late 1990s at Rice University with some quite sophisticated optimizations [1,2], and we and others in the HPF compiler community — both product and research teams — encountered many cases where compiler analyses were inadequate. There is a credible view that HPF failed despite very high hopes and significant industry and research investment because it relied too much on heroic compiler technology.
>> 
>> [1] http://vadve.web.engr.illinois.edu/Papers/set_framework.pldi98.ps (PLDI 1998)
>> [2] http://vadve.web.engr.illinois.edu/Papers/SC98.NASBenchmarksStudy/dhpfsc98.pdf (SC 1998)
>> 
>> --Vikram
>> 
>> // Vikram S. Adve
>> // Professor, Department of Computer Science
>> // University of Illinois at Urbana-Champaign
>> // vadve at illinois.edu
>> // http://llvm.org
>> 
>> 
>> 
>>> On Jan 19, 2017, at 11:20 AM, via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>> 
>>> Date: Thu, 19 Jan 2017 16:30:51 +0100
>>> From: Johannes Doerfert via llvm-dev <llvm-dev at lists.llvm.org>
>>> To: Hal Finkel <hfinkel at anl.gov>
>>> Cc: llvm-dev <llvm-dev at lists.llvm.org>, Simon Moll
>>> 	<moll at cs.uni-saarland.de>
>>> Subject: Re: [llvm-dev] [RFC] IR-level Region Annotations
>>> Message-ID: <20170119153050.GB26909 at arch-linux-jd.home>
>>> Content-Type: text/plain; charset="utf-8"
>>> 
>>> Hi Hal, Hi Xinmin,
>>> 
>>> First let me thank you for pushing in this direction, it means more
>>> people are interested in some kind of change here.
>>> 
>>> While "our" RFC will be sent out next week I want to comment on a specific
>>> point of this one right now:
>>> 
>>>> [...]
>>>> (b) Add several new LLVM instructions such as, for parallelism, fork, spawn,
>>>>  join, barrier, etc.
>>>> [...]
>>> 
>>> For me fork and spawn are serving the same purpose, most new schemes suggested three new instructions in total.
>>> 
>>>> Options |         Pros         | Cons
>>>> [...]
>>>> (b)     | Parallelism becomes  | Huge effort for extending all LLVM passes and
>>>>         | first class citizen  | code generation to support new instructions.
>>>>         |                      | A large set of information still needs to be
>>>>         |                      | represented using other means.
>>>> [...]
>>> 
>>> I am especially curious where you get your data from. Tapir [0] (and to
>>> some degree PIR [1]) have shown that, counterintuitively, only a few changes
>>> to LLVM passes are needed. Tapir was recently used in an MIT class with a
>>> lot of students and it seemed to work well with only minimal changes
>>> to analysis and especially transformation passes.
>>> 
>>> Also the "code generation" issues you mention do, in my opinion, apply to
>>> _all_ proposed schemes as all have to be lowered to either sequential
>>> code or parallel library runtime calls eventually. The sequentialization
>>> for our new Tapir/PIR hybrid has less than 50 lines and generating
>>> parallel runtime calls will probably be similar in all schemes anyway.
>>> 
>>> Regarding the last point, "A large set of information still needs to be
>>> represented using other means", I am curious why this is a bad thing. I
>>> think IR should be simple, each instruction/intrinsic etc. should have
>>> clear and minimal semantics. Also I think we already have a lot of
>>> simple constructs in the IR to express high-level information properly,
>>> e.g. 'atomicrmw' instructions for high-level reduction. While we currently 
>>> lack analysis passes to extract information from such low-level
>>> representations, these are certainly possible [2,3]. I would argue that such
>>> analyses are a better way to do things than placing "high-level
>>> intrinsics" in the IR to mark things like reductions.
>>> 
>>> Cheers,
>>> Johannes, on behalf of the Tapir and PIR team
>>> 
>>> [0] https://cpc2016.infor.uva.es/wp-content/uploads/2016/06/CPC2016_paper_12.pdf
>>> [1] http://compilers.cs.uni-saarland.de/people/doerfert/parallelcfg.pdf
>>> [2] Section 3 in https://arxiv.org/pdf/1505.07716
>>> [3] Section 3.2 and 3.3 in https://www.st.cs.uni-saarland.de/publications/files/streit-taco-2015.pdf
>> 


