[llvm-dev] Some llvm questions (for a tgsi backend)

Hans de Goede via llvm-dev llvm-dev at lists.llvm.org
Mon Jan 11 06:35:33 PST 2016

Hi All,

Quick self intro: I'm a Foss developer working for Red Hat's graphics team.
I'm currently working on a tgsi backend for llvm with the purpose of being
able to compile opencl to tgsi, and to integrate this into mesa's nouveau
driver / clover state tracker to add opencl support for Nvidia cards to mesa.

I'm currently at the point where I can compile a simple opencl program
to something which sort of looks like tgsi.

You can find my latest work on this here:

I've a little test program of which I have 3 versions now,
1 raw gallium calls + a tgsi kernel
2 opencl calls to clover + a tgsi kernel
3 opencl calls to clover + an opencl kernel

1 and 2 have been tested on a kepler card, 3 has been
tested with pocl. My goal for this week is to get
the tgsi backend to produce code which I can copy
and paste into 2 and then have it working on a kepler card.

The test program looks like this:

__kernel void test_kern(__global uint *vals, __global uint *buf)
   uint id = get_global_id(0);

   buf[32 * id] -= vals[id];

The llvm ir looks like this:

bin/clang -x cl -c -emit-llvm -target tgsi-- -include /usr/share/pocl/include/_kernel.h -o ~/foo.ir -x cl -S ~/foo.cl

; ModuleID = '/home/hans/foo.cl'
target datalayout = "E-p:32:32-i64:64:64-f32:32:32-n32"
target triple = "tgsi--"

; Function Attrs: nounwind
define void @test_kern(i32 addrspace(1)* nocapture readonly %vals, i32 addrspace(1)* nocapture %buf) #0 {
   %call = tail call i32 @_Z13get_global_idj(i32 0) #2
   %arrayidx = getelementptr inbounds i32, i32 addrspace(1)* %vals, i32 %call
   %0 = load i32, i32 addrspace(1)* %arrayidx, align 4, !tbaa !7
   %mul = shl i32 %call, 5
   %arrayidx1 = getelementptr inbounds i32, i32 addrspace(1)* %buf, i32 %mul
   %1 = load i32, i32 addrspace(1)* %arrayidx1, align 4, !tbaa !7
   %sub = sub i32 %1, %0
   store i32 %sub, i32 addrspace(1)* %arrayidx1, align 4, !tbaa !7
   ret void

declare i32 @_Z13get_global_idj(i32) #1

attributes #0 = { nounwind "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #2 = { nounwind }

!opencl.kernels = !{!0}
!llvm.ident = !{!6}

!0 = !{void (i32 addrspace(1)*, i32 addrspace(1)*)* @test_kern, !1, !2, !3, !4, !5}
!1 = !{!"kernel_arg_addr_space", i32 1, i32 1}
!2 = !{!"kernel_arg_access_qual", !"none", !"none"}
!3 = !{!"kernel_arg_type", !"uint*", !"uint*"}
!4 = !{!"kernel_arg_base_type", !"uint*", !"uint*"}
!5 = !{!"kernel_arg_type_qual", !"", !""}
!6 = !{!"clang version 3.8.0 (http://llvm.org/git/clang.git 9376f992e00569bd08a4ecf3a1d06d8b93c97681) (http://llvm.org/git/llvm.git 7a311143550c6fc01aa5000049825ecc09787440)"}
!7 = !{!8, !8, i64 0}
!8 = !{!"int", !9, i64 0}
!9 = !{!"omnipotent char", !10, i64 0}
!10 = !{!"Simple C/C++ TBAA"}

And the generated "tgsi" looks like this:

         .file   "/home/hans/foo.cl"
         .globl  test_kern
         MOVis TEMP1x, 0
         CAL _Z13get_global_idj
         SHLs TEMP1y, TEMP1x, 7
         LOADiis TEMP1z, [4]
         UADDs TEMP1y, TEMP1z, TEMP1y
         SHLs TEMP1x, TEMP1x, 2
         LOADiis TEMP1z, [0]
         UADDs TEMP1x, TEMP1z, TEMP1x
         LOADgis TEMP1x, [TEMP1x]
         INEGs TEMP1x, TEMP1x
         LOADgis TEMP1z, [TEMP1y]
         UADDs TEMP1x, TEMP1x, TEMP1z
         STOREgis [TEMP1y], TEMP1x

Working tgsi for this would look like this:

IMM UINT32 { 0, 0, 0, 0 }
IMM UINT32 { 4, 0, 0, 0 }
IMM UINT32 { 128, 0, 0, 0 }

        LOAD TEMP[0].xy, RINPUT, IMM[0]
        UMUL TEMP[1].x, SV[0], IMM[1]
        UADD TEMP[0].x, TEMP[0], TEMP[1]
        UMUL TEMP[1].x, SV[0], IMM[2]
        UADD TEMP[0].y, TEMP[0], TEMP[1].xxxx
        LOAD TEMP[1].x, RGLOBAL, TEMP[0]
        LOAD TEMP[0].x, RGLOBAL, TEMP[0].yyyy
        UADD TEMP[1].x, TEMP[0], -TEMP[1]
        STORE RGLOBAL.x, TEMP[0].yyyy, TEMP[1]

So my questions (I'm still quite green when it comes to llvm):

1) As you can see a proper tgsi program needs a header
to declare which registers (etc) it is using, in which
class-method should I implement this ?

2) Immediates need to be declared with a specific
value and then addressed as IMM[x], how would I go about
this ?

3) The get_global_id call needs to be translated into
simply using the SV[0] "register", how would I go about
this ?

4) The global and input load / stores are not handled
correctly, I see that the LOAD instructions get postfixed
with a i reps. g for input / global how would I go about
modifying the code emitter (AsmPrinter?) to change "LOADi"
into "LOAD <dest> RINPUT <offset>"?

5) Talking about the lowecase suffixes to the instructions,
these should not be part of the output, how do I filter these?

6) And finally, the current llvm-tgsi output uses e.g.
TEMP1y where as for the destination it should use TEMP[1].y
and for the sources it should use TEMP[1].xxxx (so include
proper swizzling info).

Lots of questions, sorry about that. Feel free to point me
to some relvant parts of the docs, I've tried to find answers
myself but I've gotten a bit lost in the docs.



More information about the llvm-dev mailing list