[LLVMdev] loop pragmas

Thu Nov 22 12:12:17 PST 2012

----- Original Message -----
> From: "David Tweed" <david.tweed at arm.com>
> To: "Tobias Grosser" <tobias at grosser.es>, "Krzysztof Parzyszek" <kparzysz at codeaurora.org>
> Cc: llvmdev at cs.uiuc.edu
> Sent: Thursday, November 22, 2012 3:01:07 AM
> Subject: Re: [LLVMdev] loop pragmas
> 
> > Other types of annotations that are "harmless" are probably good to
> > have, for example "unroll-by" (assuming that this is a suggestion to
> > the compiler, not an order).
> 
> | To my knowledge, we avoid allowing the user to 'tune' the compiler.
> | Manual tuning may be good for a certain piece of hardware, but will
> | have negative effects on other platforms.
> 
> Note that LLVM is typically the "backend" compiler portion of a
> combined compiler that does language-specific stuff. It would be
> useful to have a way for such a front-end compiler to, if it so
> desires, annotate transformation parameters such as loop unroll
> factors if there's machinery to implement the transformation. I'd
> actually prefer such directives to be orders rather than suggestions,
> unless they specify impossible things.

I agree.

> 
> In a personal capacity, I'm very interested in auto-tuning pieces of
> software -- think ATLAS or SPIRAL -- where the ability to empirically
> try various alternatives on a new hardware platform is very useful.

FWIW, I also have my eye on (something like) this project:
http://projects.csail.mit.edu/petabricks/

> 
> | Instead of providing facilities to tune the hardware, we should
> | understand why LLVM does not choose the right unrolling factor.
> | Maybe there is additional information that can help LLVM derive it.
> 
> 
> This is a laudable goal, but the issue always comes up that developing
> an understanding of why things perform in a particular way on a modern
> CPU/memory/chipset combination requires manpower, often more manpower
> than is available.

I agree that we should try harder than we currently do. Nevertheless, providing a user override is also important.

A full analysis turns out to be very difficult: it would require modeling instruction scheduling, accounting for characteristics of the memory hierarchy, etc. In the end, the best that can be done is profile-guided optimization to capture some of these factors, which I think requires much of the same infrastructure as providing user-accessible pragmas (and then much more).
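
For concreteness, a user-specified unroll factor of the kind discussed in this thread might look like the sketch below. It uses `#pragma clang loop unroll_count(4)`, a spelling that Clang adopted after this thread was written, so it is shown as an illustration only, not as something available at the time. The function name `sum_array` and its signature are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example function; name and signature are illustrative only. */
long sum_array(const int *data, size_t n)
{
    assert(data != NULL || n == 0);

    long total = 0;

    /* Request an unroll factor of 4 for the loop that follows.
     * Compilers that do not recognize this pragma ignore it
     * (possibly with a warning), so the code stays portable. */
#pragma clang loop unroll_count(4)
    for (size_t i = 0; i < n; ++i)
        total += data[i];

    return total;
}
```

Calling `sum_array` on an array holding 1 through 9 returns 45 whether or not the pragma is honored: the annotation only affects how the loop is compiled, not what it computes. That is the sense in which such an annotation is "harmless".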

 -Hal

> 
> Regards,
> Dave

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory