[LLVMdev] [PATCH] Add a Scalarize pass

Fri Nov 15 03:26:07 PST 2013

Nadav Rotem <nrotem at apple.com> writes:
> On Nov 14, 2013, at 2:32 PM, Richard Sandiford
> <rsandifo at linux.vnet.ibm.com> wrote:
>> Richard Sandiford <rsandifo at linux.vnet.ibm.com> writes:
>>> Are you worried that adding it to PMB will increase compile time?
>>> The pass exits very early for any target that doesn't opt-in to doing
>>> scalarisation at the IR level, without even looking at the function.
>> 
>> As an alternative, adding Scalarizer and InstCombine passes to
>> SystemZPassConfig::addIRPasses() would probably give me most of the
>> benefit without affecting the PMB.  Scalarizer itself would then not
>> test TargetTransformInfo at all, at least in the initial version,
>> and the scalarisation would still logically be done by codegen.
>> Would that be OK?
>
> I actually prefer that the Scalarizer would not touch TTI at all because
> I view scalarization as a canonicalization phase for DSLs, much like SROA
> breaks structs.

That's what Pekka is thinking of using it for, but it wasn't the reason
I wrote it.  The original motivation was llvmpipe, which is a rasteriser
rather than a DSL compiler.  The motivation wasn't to canonicalise,
it was to do the same thing that codegen currently does, but in a better
place from an optimisation perspective.

You said in an earlier message:

  Other users of LLVM (such as OpenCL JITs) do scalarize early in the
  optimization pipeline because the problem-domain presents lots of
  vectors that need to be legalized.

But:

(a) Scalarising and revectorising only makes sense if the vectorisation
    is done with the target in mind.  If going from scalar code to vector
    code can depend on the target, why shouldn't the same be true in the
    other direction, for targets without vector support?

(b) The situation you describe isn't the one that applies to llvmpipe.
    In llvmpipe the vectors are nice, known widths that are under the
    driver's own control.  We certainly don't want to scalarise and
    revectorise llvmpipe IR on x86_64, or on powerpc with Altivec/VSX.
    The original code is already well vectorised for those targets.
    (And also for ARM NEON I expect.)

    In the llvmpipe case, codegen's type legaliser already makes a good
    decision about what to scalarise and what not to scalarise, without
    any help from llvmpipe.  The problem I'm trying to solve is that
    codegen is too late to get the benefit of other IR optimisations.

    So in my case I do not want to _change_ the decision about which
    vectors get scalarised and how.  I just want to do it earlier.
    It would be a shame if that meant that llvmpipe had to duplicate
    exactly the decisions that codegen makes wrt scalarisation,
    since codegen can easily make those decisions available through
    TargetTransformInfo.

That's why I thought using TTI in the Scalarizer was a good thing
in principle, at least as an option.

SystemZ is a simple case because there is no vector support.  But take MIPS
(which is often a good example when it comes to complicated possibilities :-)).
It has at least four separate vector extensions:

  - <2 x float> support from the MIPS V floating-point extensions,
    carried over to MIPS 32/64.

  - <8 x i8> and <4 x i16> support from the optional MDMX extension,
    now deprecated but used on older chips like the SB-1 and (in a
    modified form) the VR5400.

  - Processor-specific vector extensions for the Loongson range.

  - The new MSA ASE.

That's a lot of possibilities.  Maybe the LLVM port will never support
Loongson and MDMX (almost certain for the latter), but the point is that
even if it did support them, the current codegen interface would make the
right decisions about which of the llvmpipe vectors should be scalarised
and how.

If Scalarizer is an all-or-nothing pass then it cannot make as good a
decision for llvmpipe IR, where we don't expect to revectorise the result.
Obviously the current pass is all-or-nothing anyway, but I tried to
structure it so that it would be easy to make per-type decisions in
the future, based on the TargetTransformInfo.
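
To make the difference between all-or-nothing and per-type scalarisation
concrete, here is a minimal sketch in plain C++.  It does not use any LLVM
API: VecType, TargetVectorInfo, ScalarizePolicy and shouldScalarize are all
hypothetical names invented for illustration, and the "legal type" set is
only a stand-in for whatever codegen would expose through
TargetTransformInfo.  A real type legaliser also splits, widens and promotes
vectors, which this sketch ignores.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <tuple>

// Hypothetical stand-in for an IR vector type such as <2 x float>.
struct VecType {
  std::string elem;  // element type name, e.g. "float" or "i8"
  unsigned count;    // number of elements

  bool operator<(const VecType &other) const {
    return std::tie(elem, count) < std::tie(other.elem, other.count);
  }
};

// Hypothetical stand-in for the per-target information that codegen
// could make available.  This is NOT an LLVM class.
struct TargetVectorInfo {
  std::set<VecType> legalTypes;  // vector types the target handles natively

  bool isLegal(const VecType &type) const {
    return legalTypes.count(type) != 0;
  }
};

enum class ScalarizePolicy {
  All,     // all-or-nothing: scalarise every vector operation
  PerType  // scalarise only the types the target cannot handle
};

// Decide whether operations on `type` should be broken into scalars.
bool shouldScalarize(const VecType &type, const TargetVectorInfo &target,
                     ScalarizePolicy policy) {
  assert(type.count > 0 && "a vector type needs at least one element");
  if (policy == ScalarizePolicy::All)
    return true;
  return !target.isLegal(type);
}
```

With an empty legalTypes set (a target with no vector support, as SystemZ is
described above) both policies give the same answer.  They only diverge once
a target declares some types legal, for example a set containing just
<2 x float>: PerType then leaves <2 x float> alone and still scalarises
<8 x i8>, while All scalarises both.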

I realise I'm not going to convince you, and I'm going to make the
change anyway.  I still think it's the wrong direction though.

Thanks,
Richard