[LLVMdev] [Discussion] Portable SIMD programming using LLVM?

Hua Huang huah001 at gmail.com
Mon Mar 28 14:20:17 PDT 2011


Hello all,

I am currently working on building a portable SIMD programming model(SIMD
within a register) for various platforms, like x86, ARM and so on. I know
that LLVM supports SIMD programming on many architectures, so I am thinking
of using LLVM to help build this model but I not quite sure whether LLVM can
help.

The following describes the details of the model and also my current
approach for this model.

*Background*
Almost modern processor families support SIMD instruction sets but the
instruction set designs for each platform have different combinations of
operations. The portable SIMD here is to make an uniform system of SIMD
operations at all power-of-2 field widths.
For example, for simd_add on SSE2, I want to have all the following
operations supported, simd<2^x>::add(a, b) for
2<=2^x<=register_size(register_size equals 128 in SSE2). However, SSE2 only
supports simd_add on field widths of 8, 16, 32, or 64. Hence, simulation for
simd_add on field widths of 2, 4 and 128 is needed. *At mean time, I want
the simulation to be as good as possible*.

*An iteration Approach*
My current approach of getting the best simulation is a strategy-based
iteration algorithm. For example, simd<4>::add can be implemented based on
simd<8>::add in the following manner.

===strategy_add_doubling===
1. hiMask = simd<8>::himask() //this is a constant with high 4 bits of each
8-bits field are 1111 and low 4 bits of each 8-bits field are 0000.
2. loMask = simd<8>::lowmask() //this is a constant with high 4 bits of each
8-bits field are 0000 and low 4 bits of each 8-bits field are 1111.
3. c = simd<8>::add(simd_and(loMask, a), simd_and(loMask, b))
4. d = simd<8>::add(simd_and(hiMask, a), simd_and(hiMask, b))
5. return simd_or(d, simd_and(loMask, c)) // get the answer for
simd<4>::add(a, b)

Also, it is not hard to see that simd<fw>::add can be implemented based on
simd<fw/2>::add. The key is getting the correct carry information and
constructing that carry mask.
===strategy_add_halving===
1. c = simd<fw/2>::add(a, b)
2. get the carry mask d //the mask should look like
...00010000...00000000...00010000...
3. return simd<fw/2>::add(d, c)

As you can see, for a certain simd operation, there are many ways/strategies
to simulate it, like simd<fw>::add, it can be simulated based on
simd<2*fw>::add and also based on simd<fw/2>::add.

Then, we can construct as many strategies as possible for each simd
operation we wanna support. Based on all the strategies we have, an
iteration algorithm can be performed to get the best simulation for each
simd operation. The criteria here for evaulation is the number of
instructions used in a simulation, so the best simulation will have the
least number of instructions.

*Can I use LLVM to improve the above approach or even get a new approach?*

Since LLVM supports various architectures and captures the system resources
very well, I am wondering whether LLVM could be used to help the project. I
went through the LLVM documentation, but I am still stucking here on how to
incorporate LLVM into my approach. Below are some of my thoughts.

1. The strategy-based approach doesn't capture the target resources very
well in terms of CPU cycles for operations, instruction selection, etc.
There might be a case that two different simulations for a certain operation
have the same number of instructions, the strategy-based approach can't
handle this case very well(It just randomly picks one).
2. The strategies are written by human, so they are not carefully tuned to
achieve best performance. There are quite a lot of room for improving the
performance.
3. Can LLVM be incorporated into the strategy-based approach so that LLVM
provides additional information for getting the best simulation?
4. Instead of using the strategy-based approach, can we utilize LLVM to get
a new approach to achieve the goal?

I know that LLVM has auto-vectorization and can generate SIMD instructions
for scalar codes. But I don't know the details how LLVM captures the target
resources and generates good instructions especially the SIMD instructions
on different platforms.

Any suggestions or comments are appreciated.

I look forward to hearing from you.

Thanks a lot,
Kevin
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110328/a2f9f8eb/attachment.html>


More information about the llvm-dev mailing list