[llvm-commits] [llvm] r75308 - in /llvm/trunk: include/llvm/CodeGen/ValueTypes.h include/llvm/CodeGen/ValueTypes.td lib/VMCore/ValueTypes.cpp utils/TableGen/CodeGenTarget.cpp
Bob Wilson
bob.wilson at apple.com
Sat Jul 11 22:28:22 PDT 2009
On Jul 11, 2009, at 5:02 PM, Chris Lattner wrote:
>
> On Jul 10, 2009, at 4:05 PM, Bob Wilson wrote:
>
>> Author: bwilson
>> Date: Fri Jul 10 18:05:09 2009
>> New Revision: 75308
>>
>> URL: http://llvm.org/viewvc/llvm-project?rev=75308&view=rev
>> Log:
>> Add new vector types for 192-bit, 384-bit and 512-bit sizes.
>> These are needed to represent ARM Neon struct datatypes containing
>> 2, 3 or 4 separate vectors.
>
> Bob, are you sure this is the right way to go?
Nope. I'm not at all sure, but it seemed like a reasonable place to
start. (Now you see why I was avoiding you on Friday afternoon, but
obviously that was only delaying the inevitable code review ;-)
> I don't know much about neon, but if you are using this to model
> load/store with multiple registers, there are probably better ways to
> go. What set of operations does neon natively support on these
> datatypes?
Neon has plain old load/store multiple operations but that's not what
this is for.
There is another set of load/store instructions that interleave
individual elements for up to 4 vectors. For example, the VLD4.8
instruction loads 4 bytes for element 0 of 4 separate result vectors,
then it loads 4 more bytes for element 1 of each vector, etc. The
element size may be 8, 16, or 32 bits and for multi-byte elements the
individual elements get byte-swapped as needed for endianness. I'm
representing these operations as intrinsics in LLVM, since it seemed
like pattern matching 64 separate loads and vector element insertions
(in the case of VLD4.8) would be a bit unwieldy. The vectors being
loaded or stored must be in adjacent registers, and the type matters:
each vector type has a different opcode (due to the potential byte
swapping for endianness). Of course, I could just define a different
intrinsic for each type, but that's pretty ugly.
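To make the interleaving concrete, here is a rough scalar model of what a VLD4.8 into four 64-bit registers does. This is only a sketch: the names `V8x8` and `vld4_8_model` are made up for illustration and are not part of LLVM or the ARM intrinsics, and the model ignores endianness since the elements are single bytes.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical type: one 64-bit Neon register viewed as 8 byte lanes.
using V8x8 = std::array<std::uint8_t, 8>;

// Hypothetical scalar model of a 4-way interleaved byte load.
// Memory holds elements in the order v0[0], v1[0], v2[0], v3[0], v0[1], ...
// and the load de-interleaves them into four separate result vectors.
std::array<V8x8, 4> vld4_8_model(const std::uint8_t *mem) {
  std::array<V8x8, 4> regs{};
  for (std::size_t lane = 0; lane < 8; ++lane) {
    for (std::size_t vec = 0; vec < 4; ++vec) {
      // 4 consecutive bytes supply the same lane of each of the 4 vectors.
      regs[vec][lane] = mem[4 * lane + vec];
    }
  }
  return regs;
}
```

Written out as IR, each assignment in the inner loop would be a separate load plus a vector element insertion, which is the pattern I would rather not have to match.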
There are only a few other Neon instructions that directly operate on
sets of registers like this. VTBL and VTBX are table lookup
operations where each element in a source vector is treated as an
index into a table and the result vector elements are set to the table
values. The table itself is composed of up to 4 adjacent registers.
We don't need all the types for these operations, though, because they
only support 8-bit elements.
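A scalar sketch of the table lookups, again with made-up names (`vtbl_model`, `vtbx_model`) that are not LLVM or ARM API. The table is modeled as the concatenation of 1 to 4 adjacent 8-byte registers. The out-of-range handling follows my understanding of the architecture: VTBL writes zero for such an index and VTBX leaves the destination lane unchanged.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical type: one 64-bit Neon register viewed as 8 byte lanes.
using V8x8 = std::array<std::uint8_t, 8>;

// Hypothetical model of VTBL: `table` is 8, 16, 24 or 32 bytes long
// (1 to 4 adjacent registers). Out-of-range indices produce 0.
V8x8 vtbl_model(const std::vector<std::uint8_t> &table, const V8x8 &indices) {
  V8x8 result{};
  for (std::size_t lane = 0; lane < 8; ++lane) {
    std::size_t idx = indices[lane];
    result[lane] = idx < table.size() ? table[idx] : 0;
  }
  return result;
}

// Hypothetical model of VTBX: same lookup, but an out-of-range index
// keeps the existing destination lane instead of writing 0.
V8x8 vtbx_model(const V8x8 &dest, const std::vector<std::uint8_t> &table,
                const V8x8 &indices) {
  V8x8 result = dest;
  for (std::size_t lane = 0; lane < 8; ++lane) {
    std::size_t idx = indices[lane];
    if (idx < table.size())
      result[lane] = table[idx];
  }
  return result;
}
```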
The VZIP, VUZP, and VTRN Neon instructions are vector shuffles that
operate on 2 vectors in place, i.e., there are 2 vector operands used
as both sources and destinations. The instructions don't require that
those 2 vectors be in adjacent registers, but the ARM intrinsics for
those operations make that desirable.
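For reference, this is how I understand the three shuffles for 8-bit elements on 64-bit registers, as a scalar sketch. The function names are made up for illustration and are not LLVM or ARM API. Both operands are passed by reference because the instructions overwrite both registers.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical type: one 64-bit Neon register viewed as 8 byte lanes.
using V8x8 = std::array<std::uint8_t, 8>;

// VZIP model: interleave a and b as a0 b0 a1 b1 ...; a receives the low
// half of the interleaved sequence and b the high half.
void vzip_model(V8x8 &a, V8x8 &b) {
  std::array<std::uint8_t, 16> zipped{};
  for (std::size_t i = 0; i < 8; ++i) {
    zipped[2 * i] = a[i];
    zipped[2 * i + 1] = b[i];
  }
  for (std::size_t i = 0; i < 8; ++i) {
    a[i] = zipped[i];
    b[i] = zipped[8 + i];
  }
}

// VUZP model: the inverse of VZIP. Concatenate a and b; a receives the
// even-numbered elements and b the odd-numbered ones.
void vuzp_model(V8x8 &a, V8x8 &b) {
  std::array<std::uint8_t, 16> cat{};
  for (std::size_t i = 0; i < 8; ++i) {
    cat[i] = a[i];
    cat[8 + i] = b[i];
  }
  for (std::size_t i = 0; i < 8; ++i) {
    a[i] = cat[2 * i];
    b[i] = cat[2 * i + 1];
  }
}

// VTRN model: treat each pair of lanes in a and b as a 2x2 matrix and
// transpose it, i.e. swap a[2i+1] with b[2i].
void vtrn_model(V8x8 &a, V8x8 &b) {
  for (std::size_t i = 0; i < 8; i += 2)
    std::swap(a[i + 1], b[i]);
}
```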
I'm going to discuss these issues with Evan on Monday. If there is an
easier way to go about this, I'm all in favor.