[LLVMdev] SIMD Projects with LLVM

Rob Cameron cameron at cs.sfu.ca
Wed Apr 2 18:52:57 PDT 2014


Hi everyone.  I have been lurking for a while, and this is my
first post to the list.

I am working with some graduate students on the general
topic of compiler support for SIMD programming and specific
projects related to LLVM and my own Parabix technology
(parabix.costar.sfu.ca).

Right now we have a few course projects on the go and
already a question arising out of one of them (SSE2 Hoisting).
We're not sure how much has been tried before, or even
makes sense, but we're eager to learn.

Briefly the projects are:

SSE2 Hoisting: translating programs that directly use SSE2
intrinsics into platform-independent code expressed with LLVM IR.

Long integer support: systematic support for i128, i256, ... targeting
SIMD registers.

Systematic strategies for the shufflevector operation.  This
is a very powerful operation that can express arbitrary
rearrangement of data in SIMD registers.  No architecture we
know of supports it in its full generality, but there are
many special cases that are recognized in code generation and
potentially many more that might be.

Systematic support for all power-of-2 field widths with
vector types.   For example, we are interested in <64 x i2> being
a legal type with appropriate expansion operations.   A student
has made a GSoC submission for this project.
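
To illustrate what expanding an operation on 2-bit fields can look like,
here is a minimal sketch in C using the conventional masking (SWAR) idiom.
It is only an illustration under stated assumptions, not necessarily the
strategy the project uses: the names add_fields_i2 and field_i2 are
hypothetical, and a 64-bit word stands in for the SIMD register, so it
holds 32 fields rather than the 64 of a <64 x i2>.

```c
#include <stdint.h>

/* Hypothetical helper: add 32 two-bit fields packed in a 64-bit word.
   Each field is added modulo 4, with no carry into its neighbour. */
static uint64_t add_fields_i2(uint64_t a, uint64_t b)
{
    /* Mask selecting the high bit of every 2-bit field. */
    const uint64_t HI = 0xAAAAAAAAAAAAAAAAULL;

    /* Add only the low bits; the carry lands in the (cleared) high bit
       of the same field and cannot spill into the next field. */
    uint64_t low_sum = (a & ~HI) + (b & ~HI);

    /* Fold in the original high bits without generating a carry. */
    return low_sum ^ ((a ^ b) & HI);
}

/* Hypothetical helper: read field i (0 = least significant). */
static unsigned field_i2(uint64_t w, unsigned i)
{
    return (unsigned)((w >> (2 * i)) & 3u);
}
```

The same pattern applied to a 128-bit register would cover all 64 fields
at once, which is the kind of expansion a legal <64 x i2> type would need
from the code generator.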

The question I have right now actually relates to the i2 type.
In our SSE2 hoisting, we found an issue with the movemask_pd
operation, which extracts the sign bits of the 2 doubles in
a <2 x double> and returns them as the low two bits of an int32.
We would like to use icmp slt as the LLVM IR operation for this,
but we seem to run into a problem when we bitcast the resulting
<2 x i1> vector to i2.  We use the following LLVM IR code.

define i32 @signmaskd(<2 x double> %a) alwaysinline #5
{
        %bits = bitcast <2 x double> %a to <2 x i64>
        %b = icmp slt <2 x i64> %bits, zeroinitializer
        %c = bitcast <2 x i1> %b to i2
        %result = zext i2 %c to i32
        ret i32 %result
}

Unfortunately, we only get 1 bit of data out.  The assembly language
output seems to confirm that both individual bit extractions take
place, but the second one clobbers the first.  We are using the
LLVM 3.4 tool chain.
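
As a point of reference, the result expected from the IR above can be
sketched in portable scalar C.  This is only an illustration of the
intended semantics, not code from the project; the name signmask_pd_ref
is hypothetical, and the sketch assumes 64-bit IEEE doubles.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar reference for movemask_pd on a 2-element vector:
   bit 0 of the result is the sign bit of element 0,
   bit 1 of the result is the sign bit of element 1. */
static int signmask_pd_ref(const double v[2])
{
    uint64_t bits0, bits1;

    /* Reinterpret each double as a 64-bit integer (like the bitcast). */
    memcpy(&bits0, &v[0], sizeof bits0);
    memcpy(&bits1, &v[1], sizeof bits1);

    /* Bit 63 is the sign bit; pack the two sign bits into bits 0 and 1. */
    return (int)((bits0 >> 63) | ((bits1 >> 63) << 1));
}
```

With this reference, an input whose two elements are both negative should
yield 3, whereas generated code that keeps only one of the two bits can
never return that value.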

There is more detail at the following URL.
http://parabix.costar.sfu.ca/wiki/I2Result

Anyway, the question is whether we should simply treat
this as a bug to be fixed or whether our idea of working with
i2 types is misguided in a more fundamental way.





