[LLVMdev] Is there pass to break down <4 x float> to scalars

Fri Oct 25 07:19:36 PDT 2013

Liu Xin <navy.xliu at gmail.com> writes:
> I think we are solving a same problem. I am working on shader language
> too.  I am not satisfied with current binaries because vector operations
> are kept in llvm opt.
>
> glsl shader language has an operation called "swizzle". It can select
> sub-components of a vector. If a shader only takes components "xy" for a
> vec4. it's certainly wasteful to generate 4 operations for a scalar
> processor.
>
> i think a good solution for llvm is in codegen. Many compiler has codegen
> optimizer. A DSE is good enough.
>
> Which posted patch about TBAA? you have yet another solution except
> decompose-vectors?

Ah, no, the TBAA thing is separate really.  llvmpipe generally operates
on 4 rows at a time, so some functions end up with patterns like:

   load <16 x i8> row0 ...
   load <16 x i8> row1 ...
   load <16 x i8> row2 ...
   load <16 x i8> row3 ...
   ... do stuff ...
   store <16 x i8> row0 ...
   store <16 x i8> row1 ...
   store <16 x i8> row2 ...
   store <16 x i8> row3 ...

Since the row stride is variable, llvm doesn't have enough information
to tell that these rows don't alias.  So it has to keep the loads and
stores in order.  And z (SystemZ) only has 16 general registers, so a
naively-scalarised 16 x i8 operation rapidly runs out of them.  With
unmodified llvmpipe IR we get lots of spills.
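
To make the ordering constraint concrete, here is a minimal C++ sketch.
It is not llvmpipe code: the function names, the 64-byte buffer and the
"add 1 to every byte" stand-in for "do stuff" are all made up for
illustration.  It just shows that when the stride is small enough for
rows to overlap, doing the loads and stores in a different order changes
the result, which is why they can't be reordered without extra
information.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

using Row = std::array<std::uint8_t, 16>;  // stands in for <16 x i8>

// Hypothetical stand-in for "... do stuff ...": add 1 to every byte.
static void do_stuff(Row &row) {
    for (std::uint8_t &b : row) b = static_cast<std::uint8_t>(b + 1);
}

// Original order: load all four rows, then store all four rows.
void process_batched(std::uint8_t *base, std::ptrdiff_t stride) {
    Row rows[4];
    for (int i = 0; i < 4; ++i)
        std::memcpy(rows[i].data(), base + i * stride, 16);
    for (Row &row : rows) do_stuff(row);
    for (int i = 0; i < 4; ++i)
        std::memcpy(base + i * stride, rows[i].data(), 16);
}

// Reordered: each row is loaded, processed and stored before the next
// row is touched.  Only equivalent if the rows don't overlap.
void process_row_by_row(std::uint8_t *base, std::ptrdiff_t stride) {
    for (int i = 0; i < 4; ++i) {
        Row row;
        std::memcpy(row.data(), base + i * stride, 16);
        do_stuff(row);
        std::memcpy(base + i * stride, row.data(), 16);
    }
}

// True if both orders leave the same bytes in memory for this stride.
// Assumes 0 <= stride <= 16 so that four rows fit in the 64-byte buffer.
bool orders_agree(std::ptrdiff_t stride) {
    std::array<std::uint8_t, 64> a{}, b{};
    process_batched(a.data(), stride);
    process_row_by_row(b.data(), stride);
    return a == b;
}
```

With a stride of 16 the rows are disjoint and the two orders agree; with
a stride of 8 (or 0) the rows overlap and they don't.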

Since z also has x86-like register-memory operations, a few spills are
usually OK.  But in this case we have to load i8s and immediately
spill them.

So the idea was to add TBAA information to the llvmpipe IR to say that
the rows don't alias.  (At the moment I'm only doing that by hand on
saved IR; I've not done it in llvmpipe itself yet.)  Combined with
-combiner-alias-analysis -combiner-global-alias-analysis, this allows
the loads and stores to be reordered, which gives much better code.

However, the problem at the moment is that there are other scalar loads
that get rewritten by DAGCombiner and the legalisation code, and in the
process lose their TBAA info.  This then interferes with the optimisation
above.  So I wanted to make sure that the TBAA information is kept around:

  http://llvm-reviews.chandlerc.com/D1894

It was just that if I had a choice of only getting one of the two patches in,
it'd definitely be the D1894 one.  It sounds like there's more interest in
the DecomposeVectors patch than I'd expected though, so I'll get back to it.

Maybe as a first cut we can have a TargetTransformInfo hook to enable or
disable the pass wholesale, with a command-line option to override it.
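
Roughly, the decision could look like the sketch below.  This is a
standalone illustration only: TargetInfoSketch, wantsVectorDecomposition
and the Override enum are hypothetical names, not existing LLVM
interfaces or options.

```cpp
// Hypothetical stand-in for a TargetTransformInfo-style interface.
struct TargetInfoSketch {
    virtual ~TargetInfoSketch() = default;
    // Default: leave vector operations alone.
    virtual bool wantsVectorDecomposition() const { return false; }
};

// Hypothetical target that prefers scalarised code.
struct ScalarTargetSketch : TargetInfoSketch {
    bool wantsVectorDecomposition() const override { return true; }
};

// Models the command-line option: Unset means "ask the target".
enum class Override { Unset, ForceOn, ForceOff };

// The pass runs wholesale or not at all; the option wins if it is set.
bool shouldRunDecomposeVectors(const TargetInfoSketch &target,
                               Override option) {
    switch (option) {
    case Override::ForceOn:  return true;
    case Override::ForceOff: return false;
    case Override::Unset:    break;
    }
    return target.wantsVectorDecomposition();
}
```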

Thanks to you and Renato for the feedback.

Richard