[LLVMdev] Vector instructions
gohman at apple.com
Thu Jun 26 15:49:19 PDT 2008
On Jun 26, 2008, at 1:56 PM, Stefanus Du Toit wrote:
> I have some questions as to the definition of various vector
> instructions. In particular, I believe there are some gaps and
> inconsistencies in the vector instructions, and I'm interested in
> hearing whether you agree that these should be improved or whether
> there are other ways to solve these problems.
> 1. Shufflevector only accepts vectors of the same type
> Shufflevector seems overly restrictive right now in terms of its type
> requirements. In particular, it requires (this could be clearer in the
> language reference, but is supported by equivalent assertions in the
> source code) that the types of the sources and destinations match
> exactly. Obviously it makes sense for the element types to match, but
> the requirement that the number of elements in each source operand
> matches the number of elements in the destination appears overly
> restrictive.
> I would propose to change the syntax from:
>> <result> = shufflevector <n x <ty>> <v1>, <n x <ty>> <v2>, <n x i32>
>> <mask> ; yields <n x <ty>>
> to:
>> <result> = shufflevector <a x <ty>> <v1>, <b x <ty>> <v2>, <d x i32>
>> <mask> ; yields <d x <ty>>
> With the requirement that the entries in the (still constant) mask are
> within the range of [0, a + b - 1].
> This allows things like taking a <2 x i32> and duplicating it into a
> <4 x i32>.
> This is very useful for frontends that provide general vector
> functionality. I think this is more consistent with the relaxed rules
> on numbers of vector elements.
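For illustration only, the proposed semantics can be modeled in a few lines of Python. This is a sketch, not LLVM code: the function name `shufflevector`, the use of Python lists for vectors, and `None` standing in for undef elements are all assumptions made for the example.

```python
def shufflevector(v1, v2, mask):
    """Model of the proposed relaxed shufflevector (hypothetical helper).

    v1 has a elements, v2 has b elements, and the result has len(mask)
    elements. Each mask entry indexes into the concatenation of v1 and v2.
    """
    combined = list(v1) + list(v2)
    for index in mask:
        # The proposal requires every mask entry to lie in [0, a + b - 1].
        if not 0 <= index < len(combined):
            raise ValueError(f"mask entry {index} out of range")
    return [combined[index] for index in mask]


# Duplicate a 2-element vector into a 4-element one, as in the example above.
# The second operand is unused, so it is modeled as two undef (None) elements.
undef = [None, None]
widened = shufflevector([10, 20], undef, [0, 1, 0, 1])
print(widened)  # [10, 20, 10, 20]
```

Note that the result length comes only from the mask, which is what lets the two sources and the destination have different element counts.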
The alternative is to have the frontend synthesize the needed
operations with extracts, inserts, and possibly shuffles if needed.
LLVM is actually fairly well prepared to optimize code like this.
I recommend giving this a try and reporting any problems you encounter.
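As a rough illustration of the extract/insert approach, here is a Python model of what a frontend might emit to widen a 2-element vector into a 4-element one. The helper names mirror the extractelement and insertelement instructions, but the functions themselves, and `widen_2_to_4`, are hypothetical and exist only for this sketch. How well LLVM folds such a sequence is not shown here.

```python
def extractelement(vec, idx):
    """Model of extractelement: read one element of a vector."""
    return vec[idx]


def insertelement(vec, value, idx):
    """Model of insertelement: return a copy of vec with one element replaced."""
    result = list(vec)
    result[idx] = value
    return result


def widen_2_to_4(src):
    """Build a 4-element vector holding src twice, one element at a time."""
    result = [None] * 4  # None stands in for an undef starting vector
    for i in range(4):
        result = insertelement(result, extractelement(src, i % 2), i)
    return result


print(widen_2_to_4([10, 20]))  # [10, 20, 10, 20]
```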
> 2. vector select
> 3. vector trunc, sext, zext, fptrunc, fpext
> 4. vector shl, lshr, ashr
We agree that these would be useful. There are intentions to add them
to LLVM; others can say more.
> 5. vfcmp, vicmp return types
> This topic came up recently on llvm-dev (thread "VFCmp failing when
> unordered or UnsafeFPMath on x86"). Having vector compares return a
> vector of integer elements with bit width equal to the element types
> being compared seems unnecessarily inconsistent with the icmp and fcmp
> instructions. Since only one bit is well-defined anyways, why not just
> return a vector of i1 elements? If after codegen this is expanded into
> some other width, that's fine -- the backend can do this since only
> one bit of information is exposed at the IR level anyways.
> Having these instructions return vector of i1 would also be nicely
> consistent with a vector select operation. Vector bit shifts and trunc
> would make this easier, but it seems to me that the entire IR would be
> far more consistent if these operations simply returned i1 vectors.
It turns out that having them return vectors of i1 would be somewhat
complicated. For example, a <4 x i1> on an SSE2 target could expand
either to one <4 x i32> or to two <2 x i64>s, and the optimal thing would
be to make the decision based on the comparison that produced them,
but LLVM isn't yet equipped for that.
vicmp and vfcmp are very much aimed at solving practical problems on
popular architectures without needing significant new infrastructure.
They're relatively new, and as you say, they'll be more useful when
combined with vector shifts and friends and we start teaching LLVM
to recognize popular idioms with them.
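The thread does not say which idioms are meant. One common candidate is a per-element select built from a compare mask with and/or operations, sketched below in Python. The sketch assumes each compare result element is all ones when true and all zeros when false, which is how SSE2 integer compares behave; the thread itself only says that one bit is well-defined, so treat the mask convention as an assumption. All function names are hypothetical, and elements are treated as unsigned 32-bit patterns.

```python
WIDTH = 32
ALL_ONES = (1 << WIDTH) - 1


def compare_ugt_mask(a, b):
    """Element-wise unsigned a > b, returning element-width masks."""
    return [ALL_ONES if x > y else 0 for x, y in zip(a, b)]


def select_by_mask(mask, a, b):
    """Per element: (mask & a) | (~mask & b), kept within WIDTH bits."""
    return [(m & x) | (~m & ALL_ONES & y) for m, x, y in zip(mask, a, b)]


def select_by_i1(cond, a, b):
    """The same selection if the compare returned one boolean per element."""
    return [x if c else y for c, x, y in zip(cond, a, b)]


a = [1, 7, 3, 9]
b = [4, 2, 3, 8]

# Element-wise maximum via the mask idiom.
print(select_by_mask(compare_ugt_mask(a, b), a, b))  # [4, 7, 3, 9]

# The i1-style formulation gives the same answer.
print(select_by_i1([x > y for x, y in zip(a, b)], a, b))  # [4, 7, 3, 9]
```

The mask form needs no dedicated select operation, which fits the goal of avoiding new infrastructure, while the i1 form is the one a vector select would consume directly.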
One interesting point here is that if we ever do want to have vector
comparisons that return vectors of i1, one way to do that would be to
overload plain icmp and fcmp, which would be neatly consistent with
the way plain mul and add are overloaded for vector types instead of
having a separate vadd and vmul.
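To make the overloading idea concrete, here is a small Python model in which one operation name handles both scalar and vector operands. It is only an analogy for how a single icmp or add opcode could accept either kind of type; the `elementwise` helper and the use of lists as vectors are assumptions for this sketch.

```python
import operator


def elementwise(op):
    """Wrap a scalar operation so it also accepts equal-length vectors."""
    def apply(a, b):
        a_is_vec = isinstance(a, list)
        b_is_vec = isinstance(b, list)
        if a_is_vec and b_is_vec:
            if len(a) != len(b):
                raise TypeError("vector operands must have the same length")
            return [op(x, y) for x, y in zip(a, b)]
        if a_is_vec or b_is_vec:
            raise TypeError("cannot mix scalar and vector operands")
        return op(a, b)
    return apply


add = elementwise(operator.add)
icmp_eq = elementwise(operator.eq)

print(add(1, 2))                  # 3
print(add([1, 2], [3, 4]))        # [4, 6]
print(icmp_eq(1, 1))              # True
print(icmp_eq([1, 2], [1, 3]))    # [True, False]
```

In this model a vector compare naturally yields one boolean per element, the analogue of a vector of i1.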