[llvm-dev] [LLVMdev] LLVM loop vectorizer - changing vectorized code
Alex Susu via llvm-dev
llvm-dev at lists.llvm.org
Tue Jun 21 07:18:39 PDT 2016
Christopher, please see answers below.
On 6/13/2016 10:31 PM, C Bergström wrote:
> On Tue, Jun 14, 2016 at 3:22 AM, Alex Susu via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> Hello, Mikhail.
>> I'm planning to do a source-to-source transformation for loop vectorization.
>> Basically I want to generate C (C++) code from C (C++) source code:
>> - the code that is not vectorized remains the same - this would be
>> simple to achieve if we can obtain precisely the source location of each statement;
>> - for the code that gets vectorized, I want to translate the sequential
>> parts into C code and generate SIMD intrinsics for my SIMD processor in the
>> places where the vectorizer would normally generate vector instructions.
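As an illustration (not from the original mail), this is the kind of output such a source-to-source pass might produce for a simple loop. x86 SSE intrinsics stand in here for the intrinsics of the target SIMD processor; the function names are made up for the example:

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Original scalar loop, left as plain C. */
void add_scalar(const float *a, const float *b, float *c, int n) {
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* The same loop after the hypothetical source-to-source vectorization:
   the vector body uses intrinsics, the remainder stays sequential C. */
void add_simd(const float *a, const float *b, float *c, int n) {
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);       /* load 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(c + i, _mm_add_ps(va, vb)); /* 4-wide add */
    }
    for (; i < n; i++)                          /* scalar epilogue */
        c[i] = a[i] + b[i];
}
```

The point of the exercise is that both versions remain compilable C, so the result can still be fed to any C compiler for the target.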
>> I started looking at InnerLoopVectorizer::vectorize() and
>> InnerLoopVectorizer::createEmptyLoop(). Generating C/C++ code (with the
>> help of LLVM intrinsics) instead of LLVM IR is not trivial, but it should
>> be reasonably simple to achieve.
>> Would you advise such an approach as the one described above? I guess
>> doing this as a Clang phase (working on the source code) is not a bad idea
>> either, since I would have better control over the source code, but I would
>> need to reimplement the loop vectorizer algorithm that currently operates
>> on LLVM IR.
> vectorization is a coordination from high level optimizations like
> loop level stuff and low level target stuff. If you are still at the
> source level, how do you plan to handle the actual lowering?
LoopVectorize.cpp has nothing to do with lowering, as far as I know.
Vectorization was shown to work as a source-to-source transformation pass in the
Scout project. In their case the generated code is the source code, somewhat transformed
and augmented with x86 intrinsics (they have, presumably, implemented vector data types
directly in the AST).
But one could go further: we could have C code with vector data types (for example the
OpenCL kernel language) and compile this code with an OpenCL compiler.
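To make the "C code with vector data types" idea concrete (this sketch is mine, not from the thread): GCC and Clang already support first-class vector types through the `vector_size` attribute, which behave much like OpenCL's `float4` and are lowered to the target's vector instructions by the back end:

```c
/* A 4-float vector type, similar in spirit to OpenCL's float4.
   Arithmetic operators work element-wise; the back end lowers them
   to vector instructions (e.g. addps on x86). */
typedef float float4 __attribute__((vector_size(16)));

float4 add4(float4 a, float4 b) {
    return a + b;  /* element-wise addition of all 4 lanes */
}
```

A source-to-source vectorizer emitting this style of code would stay target-independent and leave instruction selection entirely to the compiler.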
> In that
> case you'll still always be at the mercy of another piece, which may
> or may not be able to handle what you've done. (In theory your
> transformation could be correct, but backend just not handle it)
> Having said this - why not actually work on fixing the root of the
> "problem" - that being the actual llvm passes which aren't doing what
> you need. This would also likely be more robust and you can maintain
> control over the whole experiment (compilation flow)
Indeed, it seems that working on LoopVectorize.cpp is not the best idea (Mikhail
noted that loop transformations such as loop fission, currently not implemented, could
preclude doing a source-to-source transformation from LoopVectorize.cpp), but it seems
to be OK for the moment.
But I also need to do instruction selection for the SIMD/vector unit, and it is
best to let the LLVM back end do this. The Scout project does instruction selection in
(roughly) the front end, and I guess this could be suboptimal since it does not use
LLVM's register allocator, etc.
Any thoughts on this aspect are welcome (or, put differently: how do x86
intrinsics interact with register allocation? - see
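For what it's worth, my understanding is that x86 intrinsics never name physical registers: each `__m128` value is an ordinary virtual value in LLVM IR, and the back end's register allocator assigns xmm registers to the live values (spilling if needed). A small sketch of what the compiler sees:

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Dot product of two 4-float arrays. The __m128 locals below are
   virtual vector values; the register allocator, not the programmer,
   decides which xmm registers (or stack slots) hold them. */
float dot4(const float *a, const float *b) {
    __m128 va = _mm_loadu_ps(a);      /* virtual value 1 */
    __m128 vb = _mm_loadu_ps(b);      /* virtual value 2 */
    __m128 p  = _mm_mul_ps(va, vb);   /* element-wise products */
    float tmp[4];
    _mm_storeu_ps(tmp, p);
    return tmp[0] + tmp[1] + tmp[2] + tmp[3];
}
```

So emitting intrinsics from a source-to-source pass would still benefit from LLVM's register allocation, unlike emitting assembly directly.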
> I get really annoyed when reviewing papers from academics who have
> used source-to-source because they thought it was "easier". Short term
> short-cuts aren't likely going to produce novel results..
Although I haven't worked much on source-to-source transformation, it seems to
allow easier optimization of data structures than working on LLVM IR.
But deciding on the right place in the compilation flow to implement such a
transformation pass well seems to be a rather difficult decision.
On 6/13/2016 10:34 PM, Mehdi Amini wrote:
> Some related work: http://llvm.org/devmtg/2013-04/krzikalla-slides.pdf
More information about the llvm-dev mailing list