[LLVMdev] LLVMdev Digest, Vol 80, Issue 13
Peter Lawrence
peterl95124 at sbcglobal.net
Wed Feb 16 10:20:22 PST 2011
Dave,
> Unifying array and vector and generalizing the result would open a lot
> of optimization opportunities.
you would be piling an incomplete optimization on top of a pile of
already
incomplete optimizations... Vectorization in Fortran is already a
"hard" problem,
requiring alias analysis (always an incomplete and inaccurate
(conservative)
analysis) and loop-carried array subscript dependence analysis (which is
equivalent to the Diophantine Equation problem in Mathematics which
is in
general not solvable, so you end up again with incomplete and inaccurate
(conservative) analysis). Doing this in C (without first class array/
matrix
types), and with even more alias analysis issues, makes it more
problematic.
The final straw with all these multimedia instruction sets is that
they require
large alignment on their "packed array" data types (even Intel which
started
out not requiring alignment with MMX (though unaligned data invoked hugh
performance penalties), did evolve with SSE to what everyone else
requires).
data alignment can (and should!) be analyzed within the same
algorithms that do
alias analysis, and the analysis has the same inherent limitations.
The problem is that in real world applications it is typical for
array data slices
(ie sections of arrays that are passed to subroutines to be
processed) to be unaligned,
even if the array base address is aligned, the bounds of the section
being processed
are in general not aligned.
You end up with wanting to clone your algorithm kernel for various
incoming
alignments (just like memcpy, memcmp, etc are often cloned internally
for
various relative alignments of the incoming arguments), but with a
kernel
accessing N different arrays you end up needing 2**N clones, which in
general
is an impractical code-explosion.
The reason I object to the use of "vector" and "simd" when describing
these
"packed data" multimedia instruction sets is that in practical
reality the traditional
vectorization optimization technology just does not apply. You can
always
come up with geewiz examples where it does, but you cannot make it work
in the general case.
No matter what fancy data shuffling/permuting/inserting/extracting
instructions
get added to MMX/SSE/SSE2, they will still not solve the data
alignment problem,
so the instruction sets remain incompatible with "traditional vector
machines"
where there was always one-data-item-per-HW-register and there was never
any alignment issue.
best,
Peter Lawrence.
On Feb 15, 2011, at 8:38 AM, David A. Greene wrote:
> Peter Lawrence <peterl95124 at sbcglobal.net> writes:
>
>> Andrew, your response highlights a naming problem in LLVM, which is
>> that "array" and "vector" mean the same thing in normal computer
>> language and compiler theory usage, so it is inconvenient and
>> misleading within LLVM to give one a very specific meaning that is
>> different from the other....
>
> I think any sort of separation at all is counterproductive. The
> existing array/vector split is artificial. It would be better to have
> one array-like type and allow a reasonable set of operations on it.
> Target lowering should take care of legality issues. For best
> performance various transformation patterns will want to know about
> the
> target but that's true independent of vector types. Scalar optimizers
> want to know about targets too.
>
>> As far as I am aware not a single one of any of the above types of
>> instruction sets allows the "subscripting" of packed data within a
>> register
>
> Given what we know of Larrabee and speculating that the "Knights"
> family
> is likely a derivative of it, it's safe to assume that future Intel
> architectures will be much more like traditional vector machines.
> That
> means gather/scatter, element indexing, etc. The existing PINSR/PEXTR
> and shuffle instructions already allow a degree of element indexing.
> Note that the existing LLVM vector types already have insert/extract
> operators.
>
> Unifying array and vector and generalizing the result would open a lot
> of optimization opportunities.
>
> -Dave
More information about the llvm-dev
mailing list