[LLVMdev] [PATCH] Add a Scalarize pass

Richard Sandiford rsandifo at linux.vnet.ibm.com
Wed Nov 13 11:35:11 PST 2013


Nadav Rotem <nrotem at apple.com> writes:
> Hi Richard, 
>
> Thanks for working on this. We should probably move this discussion to
> llvm-dev because it is not strictly related to the patch review
> anymore.

OK, I removed phabricator and llvm-commits.

> The code below is not representative of general c/c++
> code. Usually only domain specific language (such as OpenCL) contain
> vector instructions.  The LLVM pass manager configuration (pass manager
> builder) is designed for C/C++ compilers, not for DSLs.  People who use
> LLVM for other compilation flows (such as GPU compilers, other
> languages) create their own optimization pipe. I am in favor of adding
> the scalarizer pass so that people who build LLVM-based JITs and
> compilers could use it.  However, I am against adding this pass by
> default to the pass manager builder.  I understand that there are cases
> where scalarizing early in the pipeline is better, but I don’t think
> that its worth the added complexity. Every target has a different set of
> quirks and we try very hard to avoid adding target-specific passes at
> IR-level. SelectionDAG is not going away soon, and the SD replacement
> will also have a scalarizing pass - the overall architecture is not
> going to change. There are always optimization phase ordering problems
> in the compiler and at the end of the day we need to come up with an
> optimization pipe that works for most programs that we care about. I
> still think that scalarizing in SD is a reasonable solution for c/c++.

I don't understand the basis for the last statement though.  Do you mean
that you think most cases produce better code if scalarised at the SD stage
rather than at the IR level?  Could you give an example?

If the idea is to have a clean separation of concerns between the front end
and LLVM, then it seems like there are two obvious approaches:

(a) make it the front end's responsibility to only generate vector widths
    that the target can handle.  There should then be no need for vector
    type legalisation (as opposed to operation legalisation).

(b) make LLVM handle vectors of all widths, which is the current situation.

If we stick with (b) then I think LLVM should try to handle those vectors
as efficiently as possible.  The argument instead seems to be for:

(c) have code of last resort to handle vectors of all widths, but do not
    try to optimise the resulting scalar operations as much as code that
    was scalar to begin with.  If the front end is generating vector
    widths for which the target has no native support, and if the front end
    cares about the performance of that vector code, it should explicitly
    run the Scalarizer pass itself.

    AIUI, it would also be the front end's responsibility to identify
    which targets have support for which vector widths and which would
    instead use scalarisation.

That seems to be a less clean interface.  E.g. as things stand today,
llvmpipe is able to do everything it needs to do with generic IR.
Porting it to a new target is a trivial change of a few lines[*].
This seems like a good endorsement of the current interface.  But the
interface would be less clean if llvmpipe (and any other front end
that cares) has to duplicate target knowledge that LLVM already has.

 [*] There are optimisations to use intrinsics for certain targets,
     but they aren't needed for correctness.  Some of them might not
     be needed at all with recent versions of LLVM.

The C example I gave was deliberately small and artificial to show the point.
But you can go quite a long way with the generic vector extensions to C and
C++, just like llvmpipe can use generic IR to do everything it needs to do.

I think your point is that we should never run the Scalarizer pass
for clang, so it shouldn't be added by the pass manager.  But regardless
of whether the example code is typical, it seems reasonable to expect
"foo * 4" to be implemented as a shift.  To have it implemented as a
multiplication even at -O3 seems like a deficiency.

Even if you think it's unusual for C and C++ to have pre-vectorised code,
I think it's even more unusual for all vector input code to be cold.
So if we have vector code as input, I think we should try to optimise
it as best we can, whether it comes from C, C++, or a domain-specific
front end.

As I said in the phabricator comments, the vectorisation passes convert
scalar code to vector code based on target-specific knowledge.  I don't
see why it's a bad thing to also convert vector code to scalar code
based on target-specific knowledge.

Thanks,
Richard
