[llvm-commits] [llvm] r75308 - in /llvm/trunk: include/llvm/CodeGen/ValueTypes.h include/llvm/CodeGen/ValueTypes.td lib/VMCore/ValueTypes.cpp utils/TableGen/CodeGenTarget.cpp

Sun Jul 12 11:59:15 PDT 2009

On Jul 11, 2009, at 10:28 PM, Bob Wilson wrote:
> On Jul 11, 2009, at 5:02 PM, Chris Lattner wrote:
>> On Jul 10, 2009, at 4:05 PM, Bob Wilson wrote:
>>> Author: bwilson
>>> Date: Fri Jul 10 18:05:09 2009
>>> New Revision: 75308
>>>
>>> URL: http://llvm.org/viewvc/llvm-project?rev=75308&view=rev
>>> Log:
>>> Add new vector types for 192-bit, 384-bit and 512-bit sizes.
>>> These are needed to represent ARM Neon struct datatypes containing
>>> 2, 3 or 4 separate vectors.
>>
>> Bob, are you sure this is the right way to go?
>
> Nope.  I'm not at all sure, but it seemed like a reasonable place to
> start.  (Now you see why I was avoiding you on Friday afternoon, but
> obviously that was only delaying the inevitable code review ;-)

Hehe.

>> I don't know much about neon, but if you are using this to model
>> load/store with multiple registers, there are probably better ways to
>> go.  What set of operations does neon natively support on these
>> datatypes?
>
> Neon has plain old load/store multiple operations but that's not what
> this is for.

Ok.

> There is another set of load/store instructions that interleave
> individual elements for up to 4 vectors.  For example, the VLD4.8
> instruction loads 4 bytes for element 0 of 4 separate result vectors,
> then it loads 4 more bytes for element 1 of each vector, etc.  The
> element size may be 8, 16, or 32 bits and for multi-byte elements the
> individual elements get byte-swapped as needed for endianness.  I'm
> representing these operations as intrinsics in LLVM, since it seemed
> like pattern matching 64 separate loads and vector element insertions
> (in the case of VLD4.8) would be a bit unwieldy.  The vectors being
> loaded or stored must be in adjacent registers, and the type matters:
> each vector type has a different opcode (due to the potential byte
> swapping for endianness).  Of course, I could just define a different
> intrinsic for each type, but that's pretty ugly.
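
For readers following the thread, the element ordering described above for
VLD4.8 can be sketched in plain C++.  This is only an illustration of the
de-interleaving pattern, not NEON or LLVM code: the name deinterleave4 and
the Lanes parameter are hypothetical, and the sketch covers only the 8-bit
element case, so it ignores the byte swapping mentioned for larger elements.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not an LLVM or NEON API): models the order in which a
// VLD4.8-style load distributes consecutive bytes across four result vectors.
// Lanes is the number of 8-bit elements per vector; it is a parameter because
// this sketch does not try to model actual register widths.
template <std::size_t Lanes>
std::array<std::array<std::uint8_t, Lanes>, 4>
deinterleave4(const std::uint8_t *mem) {
    std::array<std::array<std::uint8_t, Lanes>, 4> out{};
    for (std::size_t elem = 0; elem < Lanes; ++elem) {
        // Four consecutive bytes supply element `elem` of vectors 0..3.
        for (std::size_t vec = 0; vec < 4; ++vec) {
            out[vec][elem] = mem[4 * elem + vec];
        }
    }
    return out;
}
```

Written out as scalar operations, this is one load and one element insertion
per byte, which is the pattern that would otherwise have to be matched in
place of a single intrinsic.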

I don't really understand how all this fits together, but is this
really the most important piece to tackle in the short term?  Are you
trying to make llvm be able to generate every NEON instruction, or are
you implementing pieces of neon.h?  Can this be simplified by using
intrinsics for these?  In the short term, the most important thing is
to support neon.h fully; we can improve the codegen later as a second
pass of optimization for cases that matter.  Trying to make the
bizarre corner cases generate optimal code in the "first pass" of
bringing up functionality doesn't seem very important.

> I'm going to discuss these issues with Evan on Monday.  If there is an
> easier way to go about this, I'm all in favor.

Ok, please drag me in if it doesn't look like there is a good solution.

-Chris