[LLVMdev] [Mesa3d-dev] Folding vector instructions

Tue Dec 30 17:11:27 PST 2008

On Dec 30, 2008, at 3:03 PM, Zack Rusin wrote:

> On Tuesday 30 December 2008 15:30:35 Chris Lattner wrote:
>> On Dec 30, 2008, at 6:39 AM, Corbin Simpson wrote:
>>>> However, the special instructions cannot be mapped directly to LLVM
>>>> IR. For example, "min" has to be converted by extracting the vector
>>>> elements, creating a less-than compare, creating a 'select'
>>>> instruction, and creating an 'insertelement' instruction.
>>
>> Using scalar operations obviously works, but will probably produce
>> very inefficient code.  One positive thing is that all target-specific
>> operations of supported vector ISAs (Altivec and SSE[1-4] currently)
>> are exposed either through LLVM IR ops or through target-specific
>> builtins/intrinsics.  This means that you can get access to all the
>> crazy SSE instructions, but it means that your codegen would have to
>> handle this target-specific code generation.
>
> I think Alex was referring here to an AOS (array-of-structures)
> layout, which is not ready at all. The currently supported one is the
> SOA (structure-of-arrays) layout, which eliminates scalar operations.

Ok!
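For reference, a rough C sketch of the scalarized "min" lowering described at the top of the thread. The `vec4` type and the `vec4_min_scalarized` name are illustrative assumptions, not Gallium or LLVM APIs; each loop iteration stands for one extractelement / fcmp olt / select / insertelement group in LLVM IR, which is the per-element work Chris expects to be inefficient.

```c
/* Illustrative 4-wide float vector in AOS form: one x, y, z, w per value. */
typedef struct { float v[4]; } vec4;

/* Component-wise min, written the way a scalarizing lowering expands it:
 * extract both elements, compare with less-than, select, insert. */
static vec4 vec4_min_scalarized(vec4 a, vec4 b)
{
    vec4 r;
    for (int i = 0; i < 4; i++) {
        float x = a.v[i];      /* extractelement a, i */
        float y = b.v[i];      /* extractelement b, i */
        int lt = x < y;        /* fcmp olt x, y (false if either is NaN) */
        r.v[i] = lt ? x : y;   /* select, then insertelement into r */
    }
    return r;
}
```

With a = {1, 5, -2, 3} and b = {2, 4, -3, 3} this returns {1, 4, -3, 3}. In the SOA layout Zack mentions, each register holds one channel for several pixels, so the same min becomes a lane-wise operation on whole vectors and the extract/insert steps go away.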

>> Sure, it would be very reasonable to make these target-specific
>> builtins when targeting a GPU, the same way we have target-specific
>> builtins for SSE.
>
> Actually, the current plan is to have essentially a "two pass" LLVM
> IR. I wanted the first pass to never lower any of the GPU instructions,
> so we'd have intrinsics or maybe even just function calls like
> gallium.lit, gallium.dot, gallium.noise and such. Then gallium would
> query the driver to figure out which instructions the GPU supports and
> run our custom LLVM lowering pass that decomposes those into things the
> GPU supports.

That makes a lot of sense.  Note that there is no reason to use actual  
LLVM intrinsics for this: naming them "gallium.lit" is just as good as  
"llvm.gallium.lit" for example.
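A minimal sketch of the second, driver-dependent pass of that scheme, assuming a made-up capability word and a toy string-based instruction representation instead of real LLVM IR. Following Chris's point, the high-level operations are simply named "gallium.dot" and so on; `lower_op`, `GPU_CAP_DOT4` and the target op names "dp4", "mul" and "mad" are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical capability bits. In the design above they would come from
 * querying the driver rather than being passed in by the caller. */
enum gpu_caps {
    GPU_CAP_DOT4 = 1 << 0  /* GPU has a native 4-component dot product */
};

/* Expand one high-level op into ops the GPU supports, writing their names
 * to out[] (at most max entries) and returning how many were written.
 * Ops this pass does not know about are passed through unchanged. */
static size_t lower_op(const char *op, unsigned caps,
                       const char *out[], size_t max)
{
    size_t n = 0;

    if (strcmp(op, "gallium.dot") == 0) {
        if (caps & GPU_CAP_DOT4) {
            if (n < max)
                out[n++] = "dp4";  /* supported natively: keep as one op */
        } else {
            /* dot(a, b) = a.x*b.x + a.y*b.y + a.z*b.z + a.w*b.w,
             * i.e. one multiply followed by three multiply-adds. */
            static const char *expansion[] = { "mul", "mad", "mad", "mad" };
            size_t count = sizeof expansion / sizeof expansion[0];
            for (size_t i = 0; i < count && n < max; i++)
                out[n++] = expansion[i];
        }
    } else if (n < max) {
        out[n++] = op;  /* unknown to this pass: leave it for the backend */
    }
    return n;
}
```

Keeping "gallium.dot" intact until this point is what lets a backend with a native dot product match one call instead of reassembling the four-op expansion.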

> Essentially I'd like to do as many of the complicated things in
> gallium as possible, to make the GPU LLVM backends in drivers as simple
> as possible. This would make the pattern matching in the generator
> /a lot/ easier (matching gallium.lit vs. the 9+ instructions it would
> be decomposed to) and give us a more generic, GPU-independent layer
> above. But that hasn't been done yet; I hope to be able to write that
> code while working on the OpenCL implementation for Gallium.

Makes sense.  For the more complex functions (e.g. texture lookup) you  
can also just compile C code to LLVM IR and use the LLVM inliner to  
inline the code if you prefer.
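As an illustration of that suggestion, here is the kind of helper that could be written in plain C, compiled to bitcode with something like `clang -O2 -emit-llvm -c tex.c -o tex.bc`, linked into the shader module (for example with `llvm-link`) and then inlined. The single-channel `struct texture`, `tex2d_bilinear` and the clamp-to-edge behaviour are assumptions made for this sketch, not Gallium's actual texture interface.

```c
/* Hypothetical single-channel texture; real texture state is richer. */
struct texture {
    const float *texels;  /* row-major, width * height values */
    int width, height;
};

/* Fetch one texel, clamping coordinates to the edge of the texture. */
static float texel_clamped(const struct texture *t, int x, int y)
{
    if (x < 0) x = 0;
    if (y < 0) y = 0;
    if (x >= t->width) x = t->width - 1;
    if (y >= t->height) y = t->height - 1;
    return t->texels[y * t->width + x];
}

/* floor() for values in int range, avoiding a libm dependency. */
static int floor_to_int(float f)
{
    int i = (int)f;  /* truncates toward zero */
    return ((float)i > f) ? i - 1 : i;
}

/* Bilinear lookup at normalized coordinates (u, v), clamp-to-edge. */
float tex2d_bilinear(const struct texture *t, float u, float v)
{
    /* Map to texel space so that texel centers sit at integer positions. */
    float x = u * t->width - 0.5f;
    float y = v * t->height - 0.5f;
    int x0 = floor_to_int(x), y0 = floor_to_int(y);
    float fx = x - x0, fy = y - y0;  /* fractional blend weights */

    float a = texel_clamped(t, x0, y0);
    float b = texel_clamped(t, x0 + 1, y0);
    float c = texel_clamped(t, x0, y0 + 1);
    float d = texel_clamped(t, x0 + 1, y0 + 1);

    float top = a + (b - a) * fx;
    float bottom = c + (d - c) * fx;
    return top + (bottom - top) * fy;
}
```

Sampling a 2x1 texture holding {0, 1} at u = 0.5, v = 0.5 lands halfway between the two texel centers and returns 0.5.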

-Chris