[LLVMdev] LLVM Loop Vectorizer

Fri Oct 5 14:52:04 PDT 2012

----- Original Message -----
> From: "Andrew Trick" <atrick at apple.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "Nadav Rotem" <nrotem at apple.com>, "llvmdev at cs.uiuc.edu Mailing List" <llvmdev at cs.uiuc.edu>
> Sent: Friday, October 5, 2012 4:27:11 PM
> Subject: Re: [LLVMdev] LLVM Loop Vectorizer
> 
> 
> On Oct 5, 2012, at 1:47 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> > I don't really understand where you want to draw the line. Should
> > the inliner get target-specific input?
> 
> Inlining always does a canonical transformation. It can take whatever
> target data is available at its level for heuristics, but that
> doesn't make it a target lowering pass.

Agreed. I've recorded some additional thoughts below.

> 
> Similarly, full unrolling is a canonical transformation that may use
> target-specific heuristics. Contrast that with partial unrolling or
> vectorization, which are anti-canonical transformations.
> 
> > InstCombine?
> 
> I think there is too much temptation currently to use the canonical
> InstCombine pass to facilitate instruction selection. It should only
> facilitate downstream IR analysis and simplification.
> 
> I see no problem conceptually with running an anti-canonical InstCombine
> as part of codegen that makes use of target hooks. I think this
> would make ISEL problems easier to deal with, and will eventually be
> necessary anyway to clean up after other target lowering passes.
> Obviously not a perfect solution, but better than doing everything
> in one CodeGenPrepare pass.

 1. We should not have code to canonicalize target-specific intrinsics inside InstCombine. These should instead be handled via some form of callback into the targets.
 2. InstCombine currently makes decisions regarding canonical forms that it shouldn't make on its own. For example, it currently does not form shuffle masks that don't already appear in the code because of a concern over increasing register pressure. There should be target-specific input into these decisions because, on some targets, some shuffle masks have a very low cost regardless of whether they already appear.
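
To make point 2 a bit more concrete, here is a rough sketch of the kind of target input I have in mind. Everything in it is hypothetical: TargetShuffleCostHook, GenericTarget, CheapPermuteTarget, shouldFormShuffle, and the cost numbers are made up for illustration and are not existing LLVM interfaces. The only point is that the "don't form new masks" rule becomes a default that a target can override.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical hook (not an existing LLVM API): a target reports an
// abstract cost for a shuffle mask. Lower means cheaper.
struct TargetShuffleCostHook {
  virtual ~TargetShuffleCostHook() = default;
  virtual unsigned shuffleCost(const std::vector<int> &Mask) const = 0;
};

// Conservative default: with no target knowledge, every mask is
// assumed to be expensive.
struct GenericTarget : TargetShuffleCostHook {
  unsigned shuffleCost(const std::vector<int> &) const override { return 4; }
};

// Made-up target on which any permutation of a single input vector is
// one cheap instruction. Indices outside [0, Mask.size()) stand for an
// undef lane or a lane of the second operand and are treated as expensive.
struct CheapPermuteTarget : TargetShuffleCostHook {
  unsigned shuffleCost(const std::vector<int> &Mask) const override {
    for (int Idx : Mask)
      if (Idx < 0 || static_cast<std::size_t>(Idx) >= Mask.size())
        return 4;
    return 1;
  }
};

// The combiner forms a shuffle if the mask already appears in the code
// (the current rule) or if the target says the mask is cheap enough.
bool shouldFormShuffle(const TargetShuffleCostHook &Target,
                       const std::vector<int> &Mask,
                       bool MaskAlreadyPresent,
                       unsigned Threshold = 1) {
  return MaskAlreadyPresent || Target.shuffleCost(Mask) <= Threshold;
}
```

With the generic default, a lane-reversal mask such as {3, 2, 1, 0} is only formed if it already appears, which matches the current behavior; with the made-up cheap-permute target it is formed regardless.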

> 
> > How about Polly? I think that the answer to all of these questions
> > is probably, at some level, yes.
> 
> There is always the option of splitting a loop optimization problem
> into an early, canonical run to aid analysis, followed by a late
> target lowering run to optimize codegen. That's only a problem when
> the canonicalization can badly pessimize the code in a way that
> loses information or is hard to recover from.

For something like Polly to do a good job, as I understand it, it really should have access to some target-specific data. This is needed for vectorization, and also for understanding the memory hierarchy (I know we don't have this now, but I think we will at some point, specifically because of this use case).

 -Hal

> 
> -Andy
> 

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory