[LLVMdev] LLVMdev Digest, Vol 80, Issue 13

Wed Feb 16 10:20:22 PST 2011

Dave,

> Unifying array and vector and generalizing the result would open a lot
> of optimization opportunities.

You would be piling an incomplete optimization on top of a pile of
already incomplete optimizations...  Vectorization in Fortran is already
a "hard" problem, requiring alias analysis (always an incomplete and
inaccurate (conservative) analysis) and loop-carried array subscript
dependence analysis (which is equivalent to the Diophantine equation
problem in mathematics, which is in general not solvable, so you end up
again with an incomplete and inaccurate (conservative) analysis).  Doing
this in C (without first-class array/matrix types), and with even more
alias analysis issues, makes it more problematic.
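
To make the "conservative" point concrete, here is a minimal sketch of
the classic GCD test, one of the simplest subscript dependence tests.
It assumes a loop that writes a[c1*i + k1] and reads a[c2*j + k2]; a
dependence needs an integer solution of c1*i - c2*j = k2 - k1, which
exists only if gcd(c1, c2) divides k2 - k1.  The function and parameter
names are illustrative, not taken from any compiler.

```python
from math import gcd

def gcd_test_may_depend(c1, k1, c2, k2):
    """Return False only when a dependence is provably impossible.

    Models a write to a[c1*i + k1] and a read of a[c2*j + k2].
    Loop bounds are ignored, so True means "may depend", not "does".
    """
    g = gcd(c1, c2)
    if g == 0:
        # Both coefficients are zero: the subscripts are the constants
        # k1 and k2, which collide only if they are equal.
        return k1 == k2
    return (k2 - k1) % g == 0

# a[2*i] vs a[2*i + 1]: even and odd elements never overlap.
independent = gcd_test_may_depend(2, 0, 2, 1)

# a[i] vs a[i + 100]: reported as "may depend" even if the loop only
# runs 10 iterations, because the test knows nothing about bounds.
conservative = gcd_test_may_depend(1, 0, 1, 100)
```

The second case is the kind of inaccuracy meant above: the test has to
answer "maybe" whenever it cannot prove independence.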

The final straw with all these multimedia instruction sets is that they
require large alignment on their "packed array" data types (even Intel,
which started out not requiring alignment with MMX (though unaligned
data invoked huge performance penalties), did evolve with SSE to what
everyone else requires).

Data alignment can (and should!) be analyzed within the same algorithms
that do alias analysis, and the analysis has the same inherent
limitations.

The problem is that in real-world applications it is typical for array
data slices (i.e. sections of arrays that are passed to subroutines to
be processed) to be unaligned: even if the array base address is
aligned, the bounds of the section being processed are in general not
aligned.
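
A small sketch of the slice problem, using plain integer arithmetic on
addresses.  The 16-byte vector width, the 4-byte element size and the
base address are assumptions chosen for illustration.

```python
VECTOR_BYTES = 16  # assumed packed-register width (SSE-sized)
ELEM_BYTES = 4     # assumed element size (32-bit float)

def misalignment(base_addr, first_index):
    """Bytes by which element `first_index` sits past a 16-byte boundary."""
    return (base_addr + first_index * ELEM_BYTES) % VECTOR_BYTES

base = 0x1000  # hypothetical array base, itself 16-byte aligned

# Misalignment of a slice that starts at element 0, 1, 2, ... of the array.
offsets = [misalignment(base, i) for i in range(8)]
```

Under these assumptions only slices starting at every fourth element
are aligned, and the callee generally cannot know which case it has
been handed.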

You end up wanting to clone your algorithm kernel for various incoming
alignments (just like memcpy, memcmp, etc. are often cloned internally
for various relative alignments of the incoming arguments), but with a
kernel accessing N different arrays you end up needing 2**N clones,
which in general is an impractical code explosion.
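
The 2**N count can be sketched by enumerating the cases a dispatcher
would have to cover, assuming each array argument is simply either
aligned or unaligned.  The function name is illustrative only.

```python
from itertools import product

def alignment_variants(num_arrays):
    """List every aligned/unaligned combination for a kernel's arrays."""
    return list(product(("aligned", "unaligned"), repeat=num_arrays))

# A kernel touching three arrays already needs 2**3 = 8 specialized clones.
variants = alignment_variants(3)
clone_counts = {n: len(alignment_variants(n)) for n in range(1, 7)}
```

If every distinct misalignment offset needed its own clone, rather than
one shared "unaligned" version, the count would grow faster still.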

The reason I object to the use of "vector" and "simd" when describing
these "packed data" multimedia instruction sets is that in practical
reality the traditional vectorization optimization technology just does
not apply.  You can always come up with gee-whiz examples where it does,
but you cannot make it work in the general case.

No matter what fancy data shuffling/permuting/inserting/extracting
instructions get added to MMX/SSE/SSE2, they will still not solve the
data alignment problem, so the instruction sets remain incompatible
with "traditional vector machines", where there was always
one-data-item-per-HW-register and there was never any alignment issue.

best,
Peter Lawrence.

On Feb 15, 2011, at 8:38 AM, David A. Greene wrote:

> Peter Lawrence <peterl95124 at sbcglobal.net> writes:
>
>> Andrew, your response highlights a naming problem in LLVM, which is
>> that "array" and "vector" mean the same thing in normal computer
>> language and compiler theory usage, so it is inconvenient and
>> misleading within LLVM to give one a very specific meaning that is
>> different from the other....
>
> I think any sort of separation at all is counterproductive.  The
> existing array/vector split is artificial.  It would be better to have
> one array-like type and allow a reasonable set of operations on it.
> Target lowering should take care of legality issues.  For best
> performance various transformation patterns will want to know about
> the target but that's true independent of vector types.  Scalar
> optimizers want to know about targets too.
>
>> As far as I am aware not a single one of any of the above types of
>> instruction sets allows the "subscripting" of packed data within a
>> register
>
> Given what we know of Larrabee and speculating that the "Knights"
> family is likely a derivative of it, it's safe to assume that future
> Intel architectures will be much more like traditional vector
> machines.  That means gather/scatter, element indexing, etc.  The
> existing PINSR/PEXTR and shuffle instructions already allow a degree
> of element indexing.  Note that the existing LLVM vector types already
> have insert/extract operators.
>
> Unifying array and vector and generalizing the result would open a lot
> of optimization opportunities.
>
>                               -Dave