[LLVMdev] Proposal: Generic auto-vectorization and parallelization approach for LLVM and Polly

Sat Jan 8 16:34:18 PST 2011

On 9 January 2011 00:07, Tobias Grosser <grosser at fim.uni-passau.de> wrote:
> Matching the target vector width in our heuristics will obviously give the
> best performance. So to get optimal performance Polly needs to take target
> data into account.

Indeed! And even if you lack target information, you won't generate
wrong code. ;)
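
A rough sketch of why that holds, assuming the back-end legalizes an
over-wide vector operation by splitting it into target-width pieces.
The names and widths below (kWide, kNative, add_wide, add_legalized)
are made up for illustration and are not LLVM or Polly APIs; the point
is only that the split version computes the same result, so a bad
width guess costs performance, not correctness.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical widths: kWide is what a target-agnostic vectorizer picked,
// kNative is what the (assumed) target can actually handle per instruction.
constexpr std::size_t kWide = 16;
constexpr std::size_t kNative = 4;
static_assert(kWide % kNative == 0, "sketch assumes an even split");

using Wide = std::array<float, kWide>;

// Reference semantics: one element-wise add over the whole wide vector.
Wide add_wide(const Wide& a, const Wide& b) {
    Wide r{};
    for (std::size_t i = 0; i < kWide; ++i)
        r[i] = a[i] + b[i];
    return r;
}

// Conceptual legalization: the same add, done as kWide / kNative
// independent chunks of kNative lanes each.
Wide add_legalized(const Wide& a, const Wide& b) {
    Wide r{};
    for (std::size_t base = 0; base < kWide; base += kNative)
        for (std::size_t lane = 0; lane < kNative; ++lane)
            r[base + lane] = a[base + lane] + b[base + lane];
    return r;
}
```

Both functions return identical vectors for any input; only the number
of native operations differs.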

> Talking about OpenCL: the lowering you described for the large vector
> instructions sounds reasonable. Optimal code would, however, probably be
> produced by revisiting the whole loop structure and generating one that
> is performance-wise optimal for the target architecture.

Yes, and this is an important point in OpenCL for CPUs. If we could
run a sub-pass of Polly (just the vector fiddling) after the
legalization, that would make it much easier for OpenCL
implementations.

However, none of this applies to GPUs, and any pass you run could
completely destroy the semantics for a GPU back-end. The AMD
presentation at last year's meeting exposed some of that.

cheers,
--renato