XCore target: disable vectorization

Tue Sep 10 10:06:41 PDT 2013

Thank you.
I missed out the target datalayout earlier in my efforts... and did not retry all - opps!
Now added changes to SLPVectorizer too.
No changes needed for BBVectorizer - as long as bb-vectorize does not ignore target-info. (test added).

Patch attached for review:
    Prevent LoopVectorizer and SLPVectorizer running if the target has no vector registers.
    XCore target: Add XCoreTargetTransformInfo
    This is where getNumberOfRegisters() resides, which in turn returns the
    number of vector registers (=0).

> We have test/Analysis/CostModel to test costs of instructions, but they make only sense for you if you would be vectorizing.
That is something that I need to look at.

Robert

________________________________________
From: Arnold Schwaighofer [aschwaighofer at apple.com]
Sent: 10 September 2013 16:22
To: Robert Lytton
Cc: llvm-commits at cs.uiuc.edu; Nadav Rotem; Richard Osborne
Subject: Re: XCore target: disable vectorization

On Sep 10, 2013, at 9:25 AM, Robert Lytton <robert at xmos.com> wrote:

> (subject moved to llvm-commits from cfe-commits http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20130909/088308.html)
>
> Hi Arnold,
>
> I have looked into getting the xcore target to incorrectly run the slp vectorizer so that I can then 'fix it' but have had no success. Any thoughts?

Did you try one of the examples in e.g. test/Transforms/SLPVectorizer/X86/ like simplebb.ll. You would substitute the triple and data layout. If it still doesn’t slp-vectorize try -mllvm -debug-only=SLP. The following works for me:

$ cat > simplebb.ll
; RUN: opt < %s -basicaa -slp-vectorizer -dce -S -mtriple=xcore  | FileCheck %s

target datalayout =
"e-p:32:32:32-a0:0:32-n32-i1:8:32-i8:8:32-i16:16:32-i32:32:32-i64:32:32-f16:16:32-f32:32:32-f64:32:32"
target triple = "xcore"

; Simple 3-pair chain with loads and stores
; CHECK: test1
; CHECK: store <2 x double>
; CHECK: ret
define void @test1(double* %a, double* %b, double* %c) {
entry:
  %i0 = load double* %a, align 8
  %i1 = load double* %b, align 8
  %mul = fmul double %i0, %i1
  %arrayidx3 = getelementptr inbounds double* %a, i64 1
  %i3 = load double* %arrayidx3, align 8
  %arrayidx4 = getelementptr inbounds double* %b, i64 1
  %i4 = load double* %arrayidx4, align 8
  %mul5 = fmul double %i3, %i4
  store double %mul, double* %c, align 8
  %arrayidx5 = getelementptr inbounds double* %c, i64 1
  store double %mul5, double* %arrayidx5, align 8
  ret void
}

$ opt -basicaa -slp-vectorizer -dce -S -mtriple=xcore < simplebb.ll -debug-only=SLP
…
SLP: Decided to vectorize cost=-6
SLP: Extracting 0 values .
SLP:    Erasing scalar:  store double %mul, double* %c, align 8.
SLP:    Erasing scalar:  store double %mul5, double* %arrayidx5, align 8.
SLP:    Erasing scalar:  %mul = fmul double %i0, %i1.
SLP:    Erasing scalar:  %mul5 = fmul double %i3, %i4.
SLP:    Erasing scalar:  %i0 = load double* %a, align 8.
SLP:    Erasing scalar:  %i3 = load double* %arrayidx3, align 8.
SLP:    Erasing scalar:  %i1 = load double* %b, align 8.
SLP:    Erasing scalar:  %i4 = load double* %arrayidx4, align 8.
SLP: Optimizing 0 gather sequences instructions.
SLP: vectorized "test1"
; ModuleID = '<stdin>'
target datalayout = "e-p:32:32:32-a0:0:32-n32-i1:8:32-i8:8:32-i16:16:32-i32:32:32-i64:32:32-f16:16:32-f32:32:32-f64:32:32"
target triple = "xcore"

define void @test1(double* %a, double* %b, double* %c) {
entry:
  %0 = bitcast double* %a to <2 x double>*
  %1 = load <2 x double>* %0, align 8
  %2 = bitcast double* %b to <2 x double>*
  %3 = load <2 x double>* %2, align 8
  %4 = fmul <2 x double> %1, %3
  %5 = bitcast double* %c to <2 x double>*
  store <2 x double> %4, <2 x double>* %5, align 8
  ret void
}
>
> Attached is the suggested patch to lib/Transforms/Vectorize/LoopVectorize.cpp ( & test/Transforms/LoopVectorize/xcore/no-vector-registers.ll)
>
> However, I also need to commit additions to the xcore target viz: lib/Target/XCore/XCoreTargetTransformInfo.cpp et al.
> 1) should I make a xcore target commit first & separately?

I think your loop vectorizer commit is fine since you are adding XCoreTargetTransformInfo specifically for this.

> 2) what tests should I be considering for the XCoreTargetTransformInfo tests?

We have test/Analysis/CostModel to test costs of instructions, but they make only sense for you if you would be vectorizing.

> (I will further enharnce XCoreTargetTransformInfo in later commits)
>
> Sorry for all the basic questions - still finding my way.

np.

> Thank you
>
> Robert
> ________________________________________
> From: Arnold Schwaighofer [aschwaighofer at apple.com]
> Sent: 09 September 2013 19:19
> To: Robert Lytton
> Cc: Rafael Espíndola; Nadav Rotem; cfe-commits at cs.uiuc.edu
> Subject: Re: XCore target: disable vectorization
>
> On Sep 9, 2013, at 12:49 PM, Robert Lytton <robert at xmos.com> wrote:
>
>> Hi Arnold,
>>
>> In my mind there seems to be two changes needed.
>
> No, with the TTI change you only need an llvm change - no clang change. clang will add the slp/loop-vectorizer but once they execute and ask the target whether they should vectorize (“TTI->getNumberOfRegisters(true)”) these passes immediately give up.
>
>>
>> 1) llvm to check the number of vector registers - as per the direction of this thread.
>> Will I also need something similar in BBVectorize? (see below).
>> I need to create some tests for the changes - not so straight forward!
>
>
> You would create a test in "test/Transforms/LoopVectorize/XCore” and “test/Transforms/SLPVectorize/XCore” with “opt -mtriple=XCore- ….” and make sure that you don’t vectorize code you normally would.
>
>> When done, I'll post to llvm-commit and cc folk for comment.
>>
>> 2) clang not to set the '-vectorize-loops' & '-vectorize-slp' flags
>> Thus the original patch (with corrected function name) would seem reasonable.
>> If others agree, I will resubmit the changes.
>>
> You would not need this change.
>
>> Robert
>>
>> --- a/lib/Transforms/Vectorize/BBVectorize.cpp
>> +++ b/lib/Transforms/Vectorize/BBVectorize.cpp
>> @@ -397,6 +397,12 @@ namespace {
>>      DEBUG(if (TTI) dbgs() << "BBV: using target information\n");
>>
>>      bool changed = false;
>> +
>> +      // If the target claims to have no vector registers don't attempt
>> +      // vectorization.
>> +      if (TTI && !TTI->getNumberOfRegisters(true))
>> +        return false;
>> +
>>      // Iterate a sufficient number of times to merge types of size 1 bit,
>
> <PatchVectorRegisters>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: PatchVectorRegisters
Type: application/octet-stream
Size: 8889 bytes
Desc: PatchVectorRegisters
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130910/fcb49771/attachment.obj>