XCore target: disable vectorization

Fri Sep 13 09:51:23 PDT 2013

Yep. LGTM.

Thanks.

On Sep 13, 2013, at 11:47 AM, Robert Lytton <robert at xmos.com> wrote:

> Is this patch good to go?
> Robert
> ________________________________________
> From: llvm-commits-bounces at cs.uiuc.edu [llvm-commits-bounces at cs.uiuc.edu] on behalf of Robert Lytton [robert at xmos.com]
> Sent: 10 September 2013 18:06
> To: Arnold Schwaighofer
> Cc: llvm-commits at cs.uiuc.edu
> Subject: RE: XCore target: disable vectorization
> 
> Thank you.
> I missed out the target datalayout earlier in my efforts... and did not retry all - opps!
> Now added changes to SLPVectorizer too.
> No changes needed for BBVectorizer - as long as bb-vectorize does not ignore target-info. (test added).
> 
> Patch attached for review:
>    Prevent LoopVectorizer and SLPVectorizer running if the target has no vector registers.
>    XCore target: Add XCoreTargetTransformInfo
>    This is where getNumberOfRegisters() resides, which in turn returns the
>    number of vector registers (=0).
> 
> 
> 
>> We have test/Analysis/CostModel to test costs of instructions, but they make only sense for you if you would be vectorizing.
> That is something that I need to look at.
> 
> Robert
> 
> ________________________________________
> From: Arnold Schwaighofer [aschwaighofer at apple.com]
> Sent: 10 September 2013 16:22
> To: Robert Lytton
> Cc: llvm-commits at cs.uiuc.edu; Nadav Rotem; Richard Osborne
> Subject: Re: XCore target: disable vectorization
> 
> On Sep 10, 2013, at 9:25 AM, Robert Lytton <robert at xmos.com> wrote:
> 
>> (subject moved to llvm-commits from cfe-commits http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20130909/088308.html)
>> 
>> Hi Arnold,
>> 
>> I have looked into getting the xcore target to incorrectly run the slp vectorizer so that I can then 'fix it' but have had no success. Any thoughts?
> 
> Did you try one of the examples in e.g. test/Transforms/SLPVectorizer/X86/ like simplebb.ll. You would substitute the triple and data layout. If it still doesn’t slp-vectorize try -mllvm -debug-only=SLP. The following works for me:
> 
> $ cat > simplebb.ll
> ; RUN: opt < %s -basicaa -slp-vectorizer -dce -S -mtriple=xcore  | FileCheck %s
> 
> target datalayout =
> "e-p:32:32:32-a0:0:32-n32-i1:8:32-i8:8:32-i16:16:32-i32:32:32-i64:32:32-f16:16:32-f32:32:32-f64:32:32"
> target triple = "xcore"
> 
> ; Simple 3-pair chain with loads and stores
> ; CHECK: test1
> ; CHECK: store <2 x double>
> ; CHECK: ret
> define void @test1(double* %a, double* %b, double* %c) {
> entry:
>  %i0 = load double* %a, align 8
>  %i1 = load double* %b, align 8
>  %mul = fmul double %i0, %i1
>  %arrayidx3 = getelementptr inbounds double* %a, i64 1
>  %i3 = load double* %arrayidx3, align 8
>  %arrayidx4 = getelementptr inbounds double* %b, i64 1
>  %i4 = load double* %arrayidx4, align 8
>  %mul5 = fmul double %i3, %i4
>  store double %mul, double* %c, align 8
>  %arrayidx5 = getelementptr inbounds double* %c, i64 1
>  store double %mul5, double* %arrayidx5, align 8
>  ret void
> }
> 
> $ opt -basicaa -slp-vectorizer -dce -S -mtriple=xcore < simplebb.ll -debug-only=SLP
> …
> SLP: Decided to vectorize cost=-6
> SLP: Extracting 0 values .
> SLP:    Erasing scalar:  store double %mul, double* %c, align 8.
> SLP:    Erasing scalar:  store double %mul5, double* %arrayidx5, align 8.
> SLP:    Erasing scalar:  %mul = fmul double %i0, %i1.
> SLP:    Erasing scalar:  %mul5 = fmul double %i3, %i4.
> SLP:    Erasing scalar:  %i0 = load double* %a, align 8.
> SLP:    Erasing scalar:  %i3 = load double* %arrayidx3, align 8.
> SLP:    Erasing scalar:  %i1 = load double* %b, align 8.
> SLP:    Erasing scalar:  %i4 = load double* %arrayidx4, align 8.
> SLP: Optimizing 0 gather sequences instructions.
> SLP: vectorized "test1"
> ; ModuleID = '<stdin>'
> target datalayout = "e-p:32:32:32-a0:0:32-n32-i1:8:32-i8:8:32-i16:16:32-i32:32:32-i64:32:32-f16:16:32-f32:32:32-f64:32:32"
> target triple = "xcore"
> 
> define void @test1(double* %a, double* %b, double* %c) {
> entry:
>  %0 = bitcast double* %a to <2 x double>*
>  %1 = load <2 x double>* %0, align 8
>  %2 = bitcast double* %b to <2 x double>*
>  %3 = load <2 x double>* %2, align 8
>  %4 = fmul <2 x double> %1, %3
>  %5 = bitcast double* %c to <2 x double>*
>  store <2 x double> %4, <2 x double>* %5, align 8
>  ret void
> }
>> 
>> Attached is the suggested patch to lib/Transforms/Vectorize/LoopVectorize.cpp ( & test/Transforms/LoopVectorize/xcore/no-vector-registers.ll)
>> 
>> However, I also need to commit additions to the xcore target viz: lib/Target/XCore/XCoreTargetTransformInfo.cpp et al.
>> 1) should I make a xcore target commit first & separately?
> 
> I think your loop vectorizer commit is fine since you are adding XCoreTargetTransformInfo specifically for this.
> 
>> 2) what tests should I be considering for the XCoreTargetTransformInfo tests?
> 
> We have test/Analysis/CostModel to test costs of instructions, but they make only sense for you if you would be vectorizing.
> 
> 
>> (I will further enharnce XCoreTargetTransformInfo in later commits)
>> 
>> Sorry for all the basic questions - still finding my way.
> 
> np.
> 
>> Thank you
>> 
>> Robert
>> ________________________________________
>> From: Arnold Schwaighofer [aschwaighofer at apple.com]
>> Sent: 09 September 2013 19:19
>> To: Robert Lytton
>> Cc: Rafael Espíndola; Nadav Rotem; cfe-commits at cs.uiuc.edu
>> Subject: Re: XCore target: disable vectorization
>> 
>> On Sep 9, 2013, at 12:49 PM, Robert Lytton <robert at xmos.com> wrote:
>> 
>>> Hi Arnold,
>>> 
>>> In my mind there seems to be two changes needed.
>> 
>> No, with the TTI change you only need an llvm change - no clang change. clang will add the slp/loop-vectorizer but once they execute and ask the target whether they should vectorize (“TTI->getNumberOfRegisters(true)”) these passes immediately give up.
>> 
>>> 
>>> 1) llvm to check the number of vector registers - as per the direction of this thread.
>>> Will I also need something similar in BBVectorize? (see below).
>>> I need to create some tests for the changes - not so straight forward!
>> 
>> 
>> You would create a test in "test/Transforms/LoopVectorize/XCore” and “test/Transforms/SLPVectorize/XCore” with “opt -mtriple=XCore- ….” and make sure that you don’t vectorize code you normally would.
>> 
>>> When done, I'll post to llvm-commit and cc folk for comment.
>>> 
>>> 2) clang not to set the '-vectorize-loops' & '-vectorize-slp' flags
>>> Thus the original patch (with corrected function name) would seem reasonable.
>>> If others agree, I will resubmit the changes.
>>> 
>> You would not need this change.
>> 
>>> Robert
>>> 
>>> --- a/lib/Transforms/Vectorize/BBVectorize.cpp
>>> +++ b/lib/Transforms/Vectorize/BBVectorize.cpp
>>> @@ -397,6 +397,12 @@ namespace {
>>>     DEBUG(if (TTI) dbgs() << "BBV: using target information\n");
>>> 
>>>     bool changed = false;
>>> +
>>> +      // If the target claims to have no vector registers don't attempt
>>> +      // vectorization.
>>> +      if (TTI && !TTI->getNumberOfRegisters(true))
>>> +        return false;
>>> +
>>>     // Iterate a sufficient number of times to merge types of size 1 bit,
>> 
>> <PatchVectorRegisters>
>