[llvm-commits] [llvm] r75308 - in /llvm/trunk: include/llvm/CodeGen/ValueTypes.h include/llvm/CodeGen/ValueTypes.td lib/VMCore/ValueTypes.cpp utils/TableGen/CodeGenTarget.cpp

Sat Jul 11 22:28:22 PDT 2009

On Jul 11, 2009, at 5:02 PM, Chris Lattner wrote:

>
> On Jul 10, 2009, at 4:05 PM, Bob Wilson wrote:
>
>> Author: bwilson
>> Date: Fri Jul 10 18:05:09 2009
>> New Revision: 75308
>>
>> URL: http://llvm.org/viewvc/llvm-project?rev=75308&view=rev
>> Log:
>> Add new vector types for 192-bit, 384-bit and 512-bit sizes.
>> These are needed to represent ARM Neon struct datatypes containing
>> 2, 3 or 4
>> separate vectors.
>
> Bob, are you sure this is the right way to go?

Nope.  I'm not at all sure, but it seemed like a reasonable place to  
start.  (Now you see why I was avoiding you on Friday afternoon, but  
obviously that was only delaying the inevitable code review ;-)

> I don't know much about neon, but if you are using this to model
> load/store with multiple registers, there are probably better ways to
> go.  What set of operations does neon natively support on these
> datatypes?

Neon has plain old load/store multiple operations but that's not what  
this is for.

There is another set of load/store instructions that interleave  
individual elements for up to 4 vectors.  For example, the VLD4.8  
instruction loads 4 bytes for element 0 of 4 separate result vectors,  
then it loads 4 more bytes for element 1 of each vector, etc.  The  
element size may be 8, 16, or 32 bits, and for multi-byte elements the  
individual elements get byte-swapped as needed for endianness.  I'm  
representing these operations as intrinsics in LLVM, since it seemed  
like pattern matching 64 separate loads and vector element insertions  
(in the case of VLD4.8) would be a bit unwieldy.  The vectors being  
loaded or stored must be in adjacent registers, and the type matters:  
each vector type has a different opcode (due to the potential byte  
swapping for endianness).  Of course, I could just define a different  
intrinsic for each type, but that's pretty ugly.
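
To make the interleaving concrete, here is a minimal scalar sketch in Python of what a 4-way de-interleaving byte load does.  It is only an illustration, not the intrinsic or any LLVM code: the name vld4_8 and the default of 8 lanes per vector (64-bit registers) are assumptions chosen for the example, and it covers only the 8-bit case, so no byte swapping is modeled.

```python
def vld4_8(memory, lanes=8):
    """Scalar model of a 4-way de-interleaving byte load.

    Hypothetical helper for illustration only. `memory` is a bytes-like
    object; `lanes` is the number of 8-bit elements per result vector.
    """
    if len(memory) < 4 * lanes:
        raise ValueError("need at least 4 * lanes bytes of memory")
    vectors = [[], [], [], []]
    for lane in range(lanes):
        # Four consecutive bytes supply element `lane` of each of the
        # four result vectors, in order.
        for v in range(4):
            vectors[v].append(memory[4 * lane + v])
    return vectors


# 32 consecutive bytes 0..31 split into four 8-element vectors.
v0, v1, v2, v3 = vld4_8(bytes(range(32)))
# v0 holds bytes 0, 4, 8, ...; v1 holds bytes 1, 5, 9, ...; and so on.
```

Expressed as plain IR, each of the 32 elements here (64 with 128-bit vectors) would be its own load plus a vector element insertion, which is the pattern-matching burden mentioned above.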

There are only a few other Neon instructions that directly operate on  
sets of registers like this.  VTBL and VTBX are table lookup  
operations where each element in a source vector is treated as an  
index into a table and the result vector elements are set to the table  
values.  The table itself is composed of up to 4 adjacent registers.   
We don't need all the types for these operations, though, because they  
only support 8-bit elements.
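
The table lookups can be sketched the same way.  This is a rough Python model, with the function names and the list-of-registers representation being assumptions for illustration.  It relies on the architectural behavior that an out-of-range index yields zero for VTBL and leaves the destination element unchanged for VTBX; indices are unsigned bytes, so negative values are not considered.

```python
def vtbl(table_regs, indices):
    """Scalar model of VTBL: out-of-range indices produce 0.

    `table_regs` is a list of 1 to 4 registers, each a list of 8 byte
    values; the registers are concatenated to form the table.
    """
    table = [b for reg in table_regs for b in reg]
    return [table[i] if i < len(table) else 0 for i in indices]


def vtbx(dest, table_regs, indices):
    """Scalar model of VTBX: out-of-range indices keep the old element."""
    table = [b for reg in table_regs for b in reg]
    return [table[i] if i < len(table) else d
            for d, i in zip(dest, indices)]


# A 16-byte table built from two adjacent 8-byte registers.
table = [list(range(100, 108)), list(range(200, 208))]
idx = [0, 7, 8, 15, 16, 255, 3, 9]
looked_up = vtbl(table, idx)            # indices 16 and 255 give 0
extended = vtbx([1] * 8, table, idx)    # indices 16 and 255 keep the 1
```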

The VZIP, VUZP, and VTRN Neon instructions are vector shuffles that  
operate on 2 vectors in place, i.e., there are 2 vector operands used  
as both sources and destinations.  The instructions don't require that  
those 2 vectors be in adjacent registers, but the ARM intrinsics for  
those operations make that desirable.
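
For reference, the three shuffles can be modeled as functions that take two vectors and return the two updated vectors.  This is a minimal Python sketch with hypothetical function names; the real instructions overwrite both operands in place, which the returned pair stands in for here, and an even number of elements per vector is assumed.

```python
def vzip(a, b):
    """Interleave a and b; the low half goes to the first result."""
    inter = [x for pair in zip(a, b) for x in pair]
    n = len(a)
    return inter[:n], inter[n:]


def vuzp(a, b):
    """De-interleave the concatenation a+b into even and odd elements."""
    cat = list(a) + list(b)
    return cat[0::2], cat[1::2]


def vtrn(a, b):
    """Treat each pair of lanes across a and b as a 2x2 matrix and transpose."""
    ra, rb = list(a), list(b)
    for i in range(0, len(a), 2):
        ra[i + 1] = b[i]
        rb[i] = a[i + 1]
    return ra, rb


a, b = [0, 1, 2, 3], [10, 11, 12, 13]
zipped = vzip(a, b)         # ([0, 10, 1, 11], [2, 12, 3, 13])
unzipped = vuzp(a, b)       # ([0, 2, 10, 12], [1, 3, 11, 13])
transposed = vtrn(a, b)     # ([0, 10, 2, 12], [1, 11, 3, 13])
```

In this model vuzp undoes vzip, and vtrn applied twice returns the original pair.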

I'm going to discuss these issues with Evan on Monday.  If there is an  
easier way to go about this, I'm all in favor.