[LLVMdev] [PATCH] Add a Scalarize pass

Fri Nov 15 09:18:14 PST 2013

Nadav Rotem <nrotem at apple.com> writes:
> The discussion on llvmpipe is irrelevant.  llvmpipe has its own pass
> manager and optimization pipe, it is not a C compiler.

Note that this reply was about whether TargetTransformInfo should be
used in Scalarizer, not whether Scalarizer should be in PMB.  I was
trying to explain why I thought that not testing TargetTransformInfo in
Scalarizer would make the pass less useful for llvmpipe's optimisation pipe.

Thanks,
Richard

> On Nov 15, 2013, at 3:26 AM, Richard Sandiford
> <rsandifo at linux.vnet.ibm.com> wrote:
>
>> Nadav Rotem <nrotem at apple.com> writes:
>>> On Nov 14, 2013, at 2:32 PM, Richard Sandiford
>>> <rsandifo at linux.vnet.ibm.com> wrote:
>>>> Richard Sandiford <rsandifo at linux.vnet.ibm.com> writes:
>>>>> Are you worried that adding it to PMB will increase compile time?
>>>>> The pass exits very early for any target that doesn't opt-in to doing
>>>>> scalarisation at the IR level, without even looking at the function.
>>>> 
>>>> As an alternative, adding Scalarizer and InstCombine passes to
>>>> SystemZPassConfig::addIRPasses() would probably give me most of the
>>>> benefit without affecting the PMB.  Scalarizer itself would then not
>>>> test TargetTransformInfo at all, at least in the initial version,
>>>> and the scalarisation would still logically be done by codegen.
>>>> Would that be OK?
>>> 
>>> I actually prefer that the Scalarizer would not touch TTI at all because
>>> I view scalarization as a canonicalization phase for DSLs, much like SROA
>>> breaks structs.
>> 
>> That's what Pekka is thinking of using it for, but it wasn't the reason
>> I wrote it.  The original motivation was llvmpipe, which is a rasteriser
>> rather than a DSL compiler.  The motivation wasn't to canonicalise,
>> it was to do the same thing that codegen currently does, but in a better
>> place from an optimisation perspective.
>> 
>> You said in an earlier message:
>> 
>>  Other users of LLVM (such as OpenCL JITs) do scalarize early in the
>>  optimization pipeline because the problem-domain presents lots of
>>  vectors that need to be legalized.
>> 
>> But:
>> 
>> (a) Scalarising and revectorising only makes sense if the vectorisation
>>    is done with the target in mind.  If going from scalar code to vector
>>    code can depend on the target, why shouldn't the same be true in the
>>    other direction, for targets without vector support?
>> 
>> (b) The situation you describe isn't the one that applies to llvmpipe.
>>    In llvmpipe the vectors are nice, known widths that are under the
>>    driver's own control.  We certainly don't want to scalarise and
>>    revectorise llvmpipe IR on x86_64, or on powerpc with Altivec/VSX.
>>    The original code is already well vectorised for those targets.
>>    (And also for ARM NEON I expect.)
>> 
>>    In the llvmpipe case, codegen's type legaliser already makes a good
>>    decision about what to scalarise and what not to scalarise, without
>>    any help from llvmpipe.  The problem I'm trying to solve is that
>>    codegen is too late to get the benefit of other IR optimisations.
>> 
>>    So in my case I do not want to _change_ the decision about which
>>    vectors get scalarised and how.  I just want to do it earlier.
>>    It would be a shame if that meant that llvmpipe had to duplicate
>>    exactly the decisions that codegen makes wrt scalarisation,
>>    since codegen can easily make those decisions available through
>>    TargetTransformInfo.
>> 
>> That's why I thought using TTI in the Scalarizer was a good thing
>> in principle, at least as an option.
>> 
>> SystemZ is a simple case because there is no vector support.  But take MIPS
>> (which is often a good example when it comes to complicated possibilities :-)).
>> It has at least four separate vector extensions:
>> 
>>  - <2 x float> support from the MIPS V floating-point extensions,
>>    carried over to MIPS 32/64.
>> 
>>  - <8 x i8> and <4 x i16> support from the optional MDMX extension,
>>    now deprecated but used on older chips like the SB-1 and (in a
>>    modified form) the VR5400.
>> 
>>  - Processor-specific vector extensions for the Loongson range.
>> 
>>  - The new MSA ASE.
>> 
>> That's a lot of possibilities.  Maybe the LLVM port will never support
>> Loongson and MDMX (almost certain for the latter), but the point is that
>> even if it did support them, the current codegen interface would make the
>> right decisions about which of the llvmpipe vectors should be scalarised
>> and how.
>> 
>> If Scalarizer is an all-or-nothing pass then it cannot make as good a
>> decision for llvmpipe IR, where we don't expect to revectorise the result.
>> Obviously the current pass is all-or-nothing anyway, but I tried to
>> structure it so that it would be easy to make per-type decisions in
>> the future, based on the TargetTransformInfo.
>> 
>> I realise I'm not going to convince you, and I'm going to make the
>> change anyway.  I still think it's the wrong direction though.
>> 
>> Thanks,
>> Richard
>>
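
The per-type decision argued for in the quoted message (scalarise only the
vectors the target cannot handle, leave the rest alone) could look roughly
like the sketch below.  This is an illustration under stated assumptions,
not the patch itself and not the real TargetTransformInfo interface: VecTy,
TargetVectorInfo, shouldScalarize and the two example targets are all
hypothetical names.  It also reduces legalisation to a legal/illegal test,
whereas codegen's type legaliser has more options than that.

```cpp
#include <cassert>
#include <set>
#include <tuple>

// Hypothetical stand-in for an IR vector type such as <2 x float>:
// element count, element width in bits, and whether elements are FP.
struct VecTy {
  unsigned NumElts;
  unsigned EltBits;
  bool IsFloat;

  // Ordering so that VecTy can be stored in a std::set.
  bool operator<(const VecTy &Other) const {
    return std::tie(NumElts, EltBits, IsFloat) <
           std::tie(Other.NumElts, Other.EltBits, Other.IsFloat);
  }
};

// Hypothetical target description: the vector types the target handles
// natively.  An assumption for illustration, NOT the real TTI API.
struct TargetVectorInfo {
  std::set<VecTy> LegalVectors;

  bool isLegal(const VecTy &Ty) const { return LegalVectors.count(Ty) != 0; }
};

// Per-type decision: scalarise exactly those vectors that the target
// cannot handle, instead of scalarising everything or nothing.
bool shouldScalarize(const TargetVectorInfo &Target, const VecTy &Ty) {
  return !Target.isLegal(Ty);
}

// A target with no vector support at all (the SystemZ case in the thread).
TargetVectorInfo makeNoVectorTarget() { return TargetVectorInfo(); }

// A target whose only native vector type is <2 x float> (loosely modelled
// on the first MIPS extension listed in the thread).
TargetVectorInfo makePairedSingleTarget() {
  TargetVectorInfo Target;
  Target.LegalVectors.insert(VecTy{2, 32, true});
  return Target;
}
```

With these definitions, a target without vector support scalarises every
vector type, while the <2 x float> target keeps <2 x float> intact and
scalarises only the types it lacks, such as <8 x i8>.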