[LLVMdev] Enabling vectorization with LLVM 3.3 for a DSL emitting LLVM IR

Fri Jul 5 08:23:25 PDT 2013

On Jul 5, 2013, at 9:50 AM, Stéphane Letz <letz at grame.fr> wrote:

> 
> On Jul 5, 2013, at 04:11, Tobias Grosser <tobias at grosser.es> wrote:
> 
>> On 07/04/2013 01:39 PM, Stéphane Letz wrote:
>>> Hi,
>>> 
>>> Our DSL can generate C or directly generate LLVM IR. With LLVM 3.3, we can vectorize the generated C code using clang with -O3, or clang with -O1 followed by opt -O3 -vectorize-loops. But the LLVM IR version of the same program cannot be vectorized with opt -O3 -vectorize-loops. So our guess is that our generated LLVM IR lacks some information that the vectorization passes need in order to work correctly.
>>> 
>>> Any idea of what could be lacking?
>> 
>> Without any knowledge of the code, guessing is hard. You may be missing the 'noalias' keyword or nsw/nuw flags, but there are many possibilities.
>> 
>> If you add '-debug' to opt you may get some hints. Also, if you have a small test case, posting the LLVM-IR may help.
>> 
>> Cheers,
>> Tobias
>> 
> 
> 
> I made some progress:
> 
> 1) I added a DataLayout to my generated LLVM Module, explicitly as a string. BTW, is there any notion of a "default" DataLayout that could be used? How is an LLVM Module supposed to know which DataLayout to use (in general)?
> 
> 2) Next, the resulting module could not be vectorized with "opt -O3 -vectorize-loops -debug -S m1.ll -o m2.ll", but if I do it in "two steps", like:
> 
> opt -O3 -vectorize-loops -debug -S m1.ll -o m2.ll
> 
> opt -O3 -vectorize-loops -debug -S m2.ll -o m3.ll
> 
> then it works….
> 
> Any idea?
> 
> Thanks.
> 
> Stéphane Letz

Hi Stephane,

Move the alloca for “i” into the entry block.

The IR coming into the loop vectorizer looks something like the following. The loop vectorizer can't recognize one of the phis as an induction or reduction, so it gives up. 

You get this “odd” phi because SROA (which transforms allocas into SSA variables) only works on allocas in the entry block, and “i”’s alloca is not in the entry block, so SROA does not get rid of the “i” variable. Later passes do eliminate it, but they leave this odd IR behind.
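For illustration, here is a minimal hand-written sketch in LLVM 3.3 syntax of the shape a front end could emit instead. It is not Stéphane's actual code: the function @add, its parameters, and the block names are hypothetical. The point is only that the alloca for the loop counter “i” sits in the entry block, so SROA should be able to promote it into an ordinary induction phi, which is what the loop vectorizer expects. The noalias arguments and the nsw flag are optional extras in the spirit of Tobias's earlier hint, not part of the fix described above.

```llvm
; Hypothetical sketch (LLVM 3.3 typed-pointer syntax).
; All allocas, including the counter "i", are emitted in the entry block
; so that SROA can promote them to SSA values.
define void @add(float* noalias %out, float* noalias %a, float* noalias %b, i32 %n) {
entry:
  %i = alloca i32, align 4              ; counter slot lives in the entry block
  store i32 0, i32* %i, align 4
  %cmp0 = icmp sgt i32 %n, 0            ; skip the loop entirely when n <= 0
  br i1 %cmp0, label %loop, label %exit

loop:
  %iv = load i32* %i, align 4           ; becomes a phi after SROA
  %idx = sext i32 %iv to i64
  %pa = getelementptr inbounds float* %a, i64 %idx
  %va = load float* %pa, align 4
  %pb = getelementptr inbounds float* %b, i64 %idx
  %vb = load float* %pb, align 4
  %sum = fadd float %va, %vb
  %po = getelementptr inbounds float* %out, i64 %idx
  store float %sum, float* %po, align 4
  %next = add nsw i32 %iv, 1            ; nsw: the counter does not wrap
  store i32 %next, i32* %i, align 4
  %cmp = icmp slt i32 %next, %n
  br i1 %cmp, label %loop, label %exit

exit:
  ret void
}
```

Running the same opt -O3 -vectorize-loops -debug-only=loop-vectorize command on a file like this is a quick way to check whether the vectorizer now accepts the loop.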

opt -O3 -vectorize-loops -debug-only=loop-vectorize < test.ll

LV: Found a loop: code_block8
LV: Found an induction variable.
LV: PHI is not a poly recurrence.
LV: Found an unidentified PHI.  %storemerge8 = phi i32 [ 0, %code_block8.lr.ph ], [ %next_index, %code_block8 ]
LV: Can't vectorize the instructions or CFG
LV: Not vectorizing.

IR coming into the vectorizer:

code_block8:                                      ; preds = %code_block8.lr.ph, %code_block8
  %next_index10 = phi i32 [ %i.promoted, %code_block8.lr.ph ], [ %next_index, %code_block8 ]
  %storemerge8 = phi i32 [ 0, %code_block8.lr.ph ], [ %next_index, %code_block8 ]            ; <<< THIS phi is the problem.
  %20 = sext i32 %storemerge8 to i64
  %.sum = add i64 %20, %9
  %21 = getelementptr inbounds float* %11, i64 %.sum
  %22 = getelementptr inbounds float* %8, i64 %.sum
  %23 = load float* %22, align 4
  %24 = getelementptr inbounds float* %10, i64 %.sum
  %25 = load float* %24, align 4
  %26 = fadd float %23, %25
  store float %26, float* %21, align 4
  %next_index = add i32 %next_index10, 1
  %27 = icmp slt i32 %next_index, %16
  br i1 %27, label %code_block8, label %exec_block4.exit_block6_crit_edge

exec_block.return_crit_edge:                      ; preds = %exit_block6
  br label %return

return:                                           ; preds = %exec_block.return_crit_edge, %block_code
  ret void
}