XCore target: disable vectorization

Tue Sep 10 08:22:44 PDT 2013

On Sep 10, 2013, at 9:25 AM, Robert Lytton <robert at xmos.com> wrote:

> (subject moved to llvm-commits from cfe-commits http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20130909/088308.html)
> 
> Hi Arnold,
> 
> I have looked into getting the xcore target to incorrectly run the slp vectorizer so that I can then 'fix it' but have had no success. Any thoughts?

Did you try one of the examples in test/Transforms/SLPVectorizer/X86/, such as simplebb.ll? You would substitute the triple and data layout. If it still doesn’t SLP-vectorize, try -mllvm -debug-only=SLP. The following works for me:

$ cat > simplebb.ll
; RUN: opt < %s -basicaa -slp-vectorizer -dce -S -mtriple=xcore  | FileCheck %s

target datalayout = "e-p:32:32:32-a0:0:32-n32-i1:8:32-i8:8:32-i16:16:32-i32:32:32-i64:32:32-f16:16:32-f32:32:32-f64:32:32"
target triple = "xcore"

; Simple 3-pair chain with loads and stores
; CHECK: test1
; CHECK: store <2 x double>
; CHECK: ret
define void @test1(double* %a, double* %b, double* %c) {
entry:
  %i0 = load double* %a, align 8
  %i1 = load double* %b, align 8
  %mul = fmul double %i0, %i1
  %arrayidx3 = getelementptr inbounds double* %a, i64 1
  %i3 = load double* %arrayidx3, align 8
  %arrayidx4 = getelementptr inbounds double* %b, i64 1
  %i4 = load double* %arrayidx4, align 8
  %mul5 = fmul double %i3, %i4
  store double %mul, double* %c, align 8
  %arrayidx5 = getelementptr inbounds double* %c, i64 1
  store double %mul5, double* %arrayidx5, align 8
  ret void
}

$ opt -basicaa -slp-vectorizer -dce -S -mtriple=xcore < simplebb.ll -debug-only=SLP
…
SLP: Decided to vectorize cost=-6
SLP: Extracting 0 values .
SLP:    Erasing scalar:  store double %mul, double* %c, align 8.
SLP:    Erasing scalar:  store double %mul5, double* %arrayidx5, align 8.
SLP:    Erasing scalar:  %mul = fmul double %i0, %i1.
SLP:    Erasing scalar:  %mul5 = fmul double %i3, %i4.
SLP:    Erasing scalar:  %i0 = load double* %a, align 8.
SLP:    Erasing scalar:  %i3 = load double* %arrayidx3, align 8.
SLP:    Erasing scalar:  %i1 = load double* %b, align 8.
SLP:    Erasing scalar:  %i4 = load double* %arrayidx4, align 8.
SLP: Optimizing 0 gather sequences instructions.
SLP: vectorized "test1"
; ModuleID = '<stdin>'
target datalayout = "e-p:32:32:32-a0:0:32-n32-i1:8:32-i8:8:32-i16:16:32-i32:32:32-i64:32:32-f16:16:32-f32:32:32-f64:32:32"
target triple = "xcore"

define void @test1(double* %a, double* %b, double* %c) {
entry:
  %0 = bitcast double* %a to <2 x double>*
  %1 = load <2 x double>* %0, align 8
  %2 = bitcast double* %b to <2 x double>*
  %3 = load <2 x double>* %2, align 8
  %4 = fmul <2 x double> %1, %3
  %5 = bitcast double* %c to <2 x double>*
  store <2 x double> %4, <2 x double>* %5, align 8
  ret void
}
> 
> Attached is the suggested patch to lib/Transforms/Vectorize/LoopVectorize.cpp ( & test/Transforms/LoopVectorize/xcore/no-vector-registers.ll)
> 
> However, I also need to commit additions to the xcore target viz: lib/Target/XCore/XCoreTargetTransformInfo.cpp et al.
> 1) should I make an xcore target commit first & separately?

I think your loop vectorizer commit is fine since you are adding XCoreTargetTransformInfo specifically for this.

> 2) what tests should I be considering for the XCoreTargetTransformInfo tests?

We have test/Analysis/CostModel to test the costs of instructions, but those tests only make sense for you if you are vectorizing.

> (I will further enhance XCoreTargetTransformInfo in later commits)
> 
> Sorry for all the basic questions - still finding my way.

np.

> Thank you
> 
> Robert
> ________________________________________
> From: Arnold Schwaighofer [aschwaighofer at apple.com]
> Sent: 09 September 2013 19:19
> To: Robert Lytton
> Cc: Rafael Espíndola; Nadav Rotem; cfe-commits at cs.uiuc.edu
> Subject: Re: XCore target: disable vectorization
> 
> On Sep 9, 2013, at 12:49 PM, Robert Lytton <robert at xmos.com> wrote:
> 
>> Hi Arnold,
>> 
>> In my mind there seem to be two changes needed.
> 
> No, with the TTI change you only need an llvm change - no clang change. clang will still add the SLP and loop vectorizers, but once they execute and ask the target whether they should vectorize (“TTI->getNumberOfRegisters(true)”), these passes immediately give up.
> 
>> 
>> 1) llvm to check the number of vector registers - as per the direction of this thread.
>> Will I also need something similar in BBVectorize? (see below).
>> I need to create some tests for the changes - not so straightforward!
> 
> 
> You would create a test in “test/Transforms/LoopVectorize/XCore” and “test/Transforms/SLPVectorizer/XCore” with “opt -mtriple=XCore- ….” and make sure that you don’t vectorize code you normally would.
> 
>> When done, I'll post to llvm-commits and cc folk for comment.
>> 
>> 2) clang not to set the '-vectorize-loops' & '-vectorize-slp' flags
>> Thus the original patch (with corrected function name) would seem reasonable.
>> If others agree, I will resubmit the changes.
>> 
> You would not need this change.
> 
>> Robert
>> 
>> --- a/lib/Transforms/Vectorize/BBVectorize.cpp
>> +++ b/lib/Transforms/Vectorize/BBVectorize.cpp
>> @@ -397,6 +397,12 @@ namespace {
>>      DEBUG(if (TTI) dbgs() << "BBV: using target information\n");
>> 
>>      bool changed = false;
>> +
>> +      // If the target claims to have no vector registers don't attempt
>> +      // vectorization.
>> +      if (TTI && !TTI->getNumberOfRegisters(true))
>> +        return false;
>> +
>>      // Iterate a sufficient number of times to merge types of size 1 bit,
> 
> <PatchVectorRegisters>
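
The guard discussed in this thread (a vectorizer pass giving up as soon as the target reports zero vector registers) can be modelled in isolation. The following is a minimal sketch only: MockTargetInfo and runVectorizer are hypothetical names invented for illustration and are not LLVM classes or functions, and the register counts used with them are made-up values. Only the shape of the getNumberOfRegisters(true) query and the early return are taken from the BBVectorize hunk quoted above.

```cpp
// Hypothetical stand-in for the target query discussed in the thread.
// This is NOT LLVM's TargetTransformInfo; it models a single hook only.
struct MockTargetInfo {
  unsigned ScalarRegs; // illustrative value, not a real target's count
  unsigned VectorRegs; // 0 models a target without vector registers

  // Same shape as the query quoted in the thread: pass true to ask
  // about vector registers, false to ask about scalar registers.
  unsigned getNumberOfRegisters(bool Vector) const {
    return Vector ? VectorRegs : ScalarRegs;
  }
};

// Hypothetical pass entry point. Returns true if the pass would go on
// to transform the code, false if it leaves the code untouched.
bool runVectorizer(const MockTargetInfo *TTI) {
  // Guard from the quoted patch: if the target claims to have no
  // vector registers, don't attempt vectorization.
  if (TTI && !TTI->getNumberOfRegisters(true))
    return false;

  // Placeholder for the real transformation, which is out of scope here.
  return true;
}
```

With this model, a target described as MockTargetInfo{12, 0} makes runVectorizer return false, while MockTargetInfo{16, 16} lets it proceed. Passing a null pointer also lets it proceed, because the guard only fires when target information is available, as in the quoted hunk.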