[llvm-commits] [llvm] r75308 - in /llvm/trunk: include/llvm/CodeGen/ValueTypes.h include/llvm/CodeGen/ValueTypes.td lib/VMCore/ValueTypes.cpp utils/TableGen/CodeGenTarget.cpp

Bob Wilson bob.wilson at apple.com
Sun Jul 12 16:41:56 PDT 2009


On Jul 12, 2009, at 11:59 AM, Chris Lattner wrote:

> On Jul 11, 2009, at 10:28 PM, Bob Wilson wrote:
>
>> There is another set of load/store instructions that interleave
>> individual elements for up to 4 vectors.  For example, the VLD4.8
>> instruction loads 4 bytes for element 0 of 4 separate result vectors,
>> then it loads 4 more bytes for element 1 of each vector, etc.  The
>> element size may be 8, 16, or 32 bits and for multi-byte elements the
>> individual elements get byte-swapped as needed for endianness.  I'm
>> representing these operations as intrinsics in LLVM, since it seemed
>> like pattern matching 64 separate loads and vector element insertions
>> (in the case of VLD4.8) would be a bit unwieldy.  The vectors being
>> loaded or stored must be in adjacent registers, and the type matters:
>> each vector type has a different opcode (due to the potential byte
>> swapping for endianness).  Of course, I could just define a different
>> intrinsic for each type, but that's pretty ugly.
>
> I don't really understand how all this fits together, but is this
> really the most important piece to tackle in the short term?  Are you
> trying to make LLVM able to generate every NEON instruction, or are
> you implementing pieces of neon.h?  Can this be simplified by using
> intrinsics for these?

I'm just trying to finish implementing all the intrinsics defined by  
ARM for programming Neon.  These load/store instructions are the  
biggest missing piece.  I'm already using intrinsics for them, so it's  
just a matter of getting the register allocation to work.
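
For concreteness, the C-level view of VLD4.8 through ARM's arm_neon.h
looks roughly like this (just a sketch; the wrapper function is made
up, but vld4_u8 is the standard intrinsic):

  #include <arm_neon.h>

  /* Sketch only: "deinterleave4" is a made-up helper.  vld4_u8 maps
     to VLD4.8: it reads 32 consecutive bytes and de-interleaves them,
     so result.val[0] gets bytes 0, 4, 8, ..., 28, result.val[1] gets
     bytes 1, 5, 9, ..., 29, and so on.  The four results have to end
     up in adjacent D registers for the instruction. */
  void deinterleave4(const uint8_t *src,
                     uint8x8_t *r0, uint8x8_t *r1,
                     uint8x8_t *r2, uint8x8_t *r3) {
    uint8x8x4_t result = vld4_u8(src);
    *r0 = result.val[0];
    *r1 = result.val[1];
    *r2 = result.val[2];
    *r3 = result.val[3];
  }

Expanding that into individual scalar loads and insertelement
operations is the 64-operation pattern I mentioned above, which is why
a single intrinsic call producing all four vectors seems like the
better representation.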

>  In the short term, the most important thing is
> to support neon.h fully; we can improve the codegen later as a second
> pass of optimization for cases that matter.  Trying to make the
> bizarre corner cases generate optimal code in the "first pass" of
> bringing up functionality doesn't seem very important.

I agree.  I'm not trying to optimize for performance at this point, at
least not very much; I mainly just want these operations to work.
There are some potential issues with getting even reasonable
performance out of the generated code.  The existing LLVM
optimizations may take care of everything, but I won't know until
basic code generation works for these operations and I can run some
code through it.

I don't think these load/store instructions are bizarre corner cases.   
The table lookup instructions might be.


