[llvm-commits] [llvm] r142152 - in /llvm/trunk: lib/CodeGen/SelectionDAG/ test/CodeGen/ARM/ test/CodeGen/CellSPU/ test/CodeGen/X86/

Tue Oct 18 05:19:19 PDT 2011

Hi Nadav,

> I discussed the legalization of <2 x i16> stores on ARM with Anton. As you mentioned, i16 is illegal on ARM and it is not possible to scalarize the store in the Legalizer.
> This was the main reason for moving the legalization of vector memory ops into LegalizeVectorOps.
Well, this is a completely different story. Your question was about
trunc-stores, but here it seems to prevent an important codegen sequence.

> I agree that in some cases promoting the elements in the vector is less efficient than widening the number of elements. However, generally ‘promotion’ is a better strategy. I am mostly interested in code-generation of auto-vectorized IR. What workloads are you mostly interested in? Maybe we can discuss the needed optimizations for these workloads.
On NEON you can do pretty efficient vector manipulations via shuffles
(e.g. any shuffle of 4 elements can be codegen'ed in 5 or fewer
instructions, usually 2-3). This is really important for ARM.

In the meantime, I'd suggest adding a target-specific "vector select
strategy" flag, so that each target can choose how to deal with this,
and making sure your code is disabled for e.g. ARM and CellSPU.
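
To make the suggestion concrete, here is a rough standalone sketch of
what such a flag could look like. Every name in it
(VectorLegalizeStrategy, strategyFor, legalize) is made up for
illustration and is not an existing LLVM API. It also assumes a toy
model with a single 64-bit vector register, and that the targets that
opt out of promotion fall back to widening:

```cpp
#include <cassert>
#include <string>

// All names here are hypothetical; none of them are existing LLVM APIs.
enum class VectorLegalizeStrategy {
  PromoteElements, // make each element wider, e.g. <2 x i16> -> <2 x i32>
  WidenVector      // add more elements,       e.g. <2 x i16> -> <4 x i16>
};

// Toy vector type: element count and element width in bits.
struct VectorType {
  unsigned NumElts;
  unsigned EltBits;
};

// The per-target flag. ARM and CellSPU opt out of promotion here
// (an assumption for illustration); every other target gets promotion.
inline VectorLegalizeStrategy strategyFor(const std::string &Target) {
  if (Target == "arm" || Target == "cellspu")
    return VectorLegalizeStrategy::WidenVector;
  return VectorLegalizeStrategy::PromoteElements;
}

// Legalize a too-narrow vector into a register of RegBits bits
// (toy model: one register width, sizes divide evenly).
inline VectorType legalize(VectorType VT, VectorLegalizeStrategy S,
                           unsigned RegBits = 64) {
  assert(VT.NumElts > 0 && VT.EltBits > 0 && "empty vector type");
  assert(VT.NumElts * VT.EltBits <= RegBits && "vector does not fit the register");
  if (S == VectorLegalizeStrategy::PromoteElements)
    return {VT.NumElts, RegBits / VT.NumElts}; // same lanes, wider elements
  return {RegBits / VT.EltBits, VT.EltBits};   // same elements, more lanes
}
```

With this, legalize(VectorType{2, 16}, strategyFor("arm")) gives
<4 x i16>, while the same call for a target that keeps promotion gives
<2 x i32>. The point is only that the decision lives in one per-target
hook instead of being hard-coded in the legalizer.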

---
With best regards, Anton Korobeynikov
Faculty of Mathematics and Mechanics, Saint Petersburg State University