[llvm-dev] [LLVMdev] LLVM loop vectorizer - changing vectorized code

Tue Jun 21 07:18:39 PDT 2016

   Hello.
     Christopher, please see my answers below.

On 6/13/2016 10:31 PM, C Bergström wrote:
> On Tue, Jun 14, 2016 at 3:22 AM, Alex Susu via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>>    Hello, Mikhail.
>>      I'm planning to do source-to-source transformation for loop
>> vectorization.
>>      Basically I want to generate C (C++) code from C (C++) source code:
>>        - the code that is not vectorized remains the same - this would be
>> simple to achieve if we can obtain precisely the source location of each
>> statement;
>>        - the code that gets vectorized I want to translate in C code the
>> parts that are sequential and generate SIMD intrinsics for my SIMD processor
>> where normally it would generate vector instructions.
>>       I started looking at InnerLoopVectorizer::vectorize() and
>> InnerLoopVectorizer::createEmptyLoop(). Not generating LLVM code but C/C++
>> code (with the help of LLVM intrinsics) is not trivial, but it should be
>> reasonably simple to achieve.
>>
>>      Would you advise for such an operation as the one described above?  I
>> guess doing this as a Clang phase (working on the source code) is not really
>> a bad idea either, since I would have better control on source code, but I
>> would need to reimplement the loop vectorizer algorithm that is currently
>> implemented on LLVM code.
>
>
> vectorization is a coordination from high level optimizations like
> loop level stuff and low level target stuff. If you are still at the
> source level, how do you plan to handle the actual lowering?

     LoopVectorize.cpp has nothing to do with lowering, as far as I know.
     Vectorization has been shown to work as a source-to-source transformation pass in the 
Scout project 
(https://tu-dresden.de/die_tu_dresden/zentrale_einrichtungen/zih/forschung/projekte/scout/publications). 
In their case the generated code is the original source code, somewhat transformed and 
augmented with x86 intrinsics (they have probably implemented vector data types directly 
in the AST).
   But one could go further: we could generate C code with vector data types (for example 
in the OpenCL kernel language) and compile this code with an OpenCL compiler.
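
   As a rough illustration of what such generated code could look like, here is a minimal 
sketch of a scalar loop and a hand-written vectorized counterpart. It uses the GCC/Clang 
vector_size extension as a stand-in for OpenCL's float4. The function names saxpy_scalar 
and saxpy_vector and the 4-lane width are my own assumptions for the example; this is not 
what Scout or LoopVectorize.cpp actually emits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Four packed floats. GCC/Clang extension, used here as a stand-in for
   OpenCL's float4 (assumption for this sketch). */
typedef float float4 __attribute__((vector_size(16)));

/* Original scalar loop: y[i] = a * x[i] + y[i]. */
void saxpy_scalar(size_t n, float a, const float *x, float *y) {
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* What a source-to-source vectorizer might generate for the same loop. */
void saxpy_vector(size_t n, float a, const float *x, float *y) {
    assert(x != NULL && y != NULL);
    const float4 va = {a, a, a, a};       /* broadcast the scalar */
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {          /* vector body, 4 lanes per step */
        float4 vx, vy;
        memcpy(&vx, x + i, sizeof vx);    /* load without alignment assumptions */
        memcpy(&vy, y + i, sizeof vy);
        vy = va * vx + vy;                /* element-wise multiply and add */
        memcpy(y + i, &vy, sizeof vy);    /* store without alignment assumptions */
    }
    for (; i < n; ++i)                    /* scalar epilogue for the remainder */
        y[i] = a * x[i] + y[i];
}
```

   The code that is not vectorized (here the epilogue) stays plain C, and only the loop 
body is rewritten in terms of the vector type, so the final instruction selection is still 
left to whichever compiler consumes the generated source.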

> In that
> case you'll still always be at the mercy of another piece, which may
> or may not be able to handle what you've done. (In theory your
> transformation could be correct, but backend just not handle it)

>
> Having said this - why not actually work on fixing the root of the
> "problem" - that being the actual llvm passes which aren't doing what
> you need. This would also likely be more robust and you can maintain
> control over the whole experiment (compilation flow)

     Indeed, it seems that working on LoopVectorize.cpp is not the best idea (Mikhail 
noted that loop transformations like loop fission, currently not implemented, would 
normally prevent doing source-to-source transformation from LoopVectorize.cpp), but it 
seems to be OK for the moment.
     But I also need to do instruction selection for the SIMD/vector unit, and it is best 
to let the LLVM back end do this. The Scout project does instruction selection roughly in 
the front end, and I guess this could be suboptimal since it does not use LLVM's register 
allocator, etc.
     Actually, any thoughts on this aspect are welcome (or, put similarly: how is register 
allocation done for x86 intrinsics? See 
https://software.intel.com/en-us/articles/dont-spill-that-register-ensuring-optimal-performance-from-intrinsics and 
http://www.linuxjournal.com/content/introduction-gcc-compiler-intrinsics-vector-processing).
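
     To make the question concrete, here is a small sketch with SSE intrinsics (the 
dot_product function and its layout are hypothetical, chosen only for illustration). The 
vector values are ordinary C variables of type __m128 and nothing in the source names a 
register, so the assignment to XMM registers is left to the compiler that consumes the 
code. The SSE path is guarded by __SSE__ so that the sketch still compiles, as plain 
scalar code, on other targets.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Dot product of two float arrays; hypothetical example function. */
float dot_product(size_t n, const float *a, const float *b) {
    assert(a != NULL && b != NULL);
    float sum = 0.0f;
    size_t i = 0;
#if defined(__SSE__)
    __m128 acc = _mm_setzero_ps();            /* four partial sums */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);      /* unaligned loads */
        __m128 vb = _mm_loadu_ps(b + i);
        acc = _mm_add_ps(acc, _mm_mul_ps(va, vb));
    }
    float lanes[4];
    _mm_storeu_ps(lanes, acc);                /* spill lanes to add them up */
    sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
    for (; i < n; ++i)                        /* scalar remainder (or whole loop) */
        sum += a[i] * b[i];
    return sum;
}
```

     Whether acc, va and vb stay in registers or get spilled is therefore decided by the 
compiler, which I assume is the point of the Intel article linked above.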

> I get really annoyed when reviewing papers from academics who have
> used source-to-source because they thought it was "easier". Short term
> short-cuts aren't likely going to produce novel results..
     Although I haven't worked much on source-to-source transformation, it seems to allow 
easier optimization of data structures than working on LLVM IR does.
     But deciding on the right place in the compilation flow to implement such a 
transformation pass well seems to be a rather difficult decision.

   Thank you,
     Alex


On 6/13/2016 10:34 PM, Mehdi Amini wrote:
 > Some related work: http://llvm.org/devmtg/2013-04/krzikalla-slides.pdf
 >
   Best regards,
     Alex