[LLVMdev] Does current LLVM target-independent code generator supports my strange chip?
Daniel M Gessel
gessel at apple.com
Mon Nov 24 08:16:22 PST 2008
Let me clarify - I haven't used LLVM for GLSL; I'm also relatively
new to LLVM, targeting a modern GPU. My GLSL work was back in the
timeframe of AMD's R300/R400 series, about 4 years ago.
On Nov 24, 2008, at 10:25 AM, Wei wrote:
>> The machines I worked with didn't support any integer ops, but GLSL
>> let us get by with "emulated" 16 bit integers (storing and operating
>> on them as floating point; divides required truncation after the op -
>> that sort of thing).
>
> My platform does support integer operations; however, it only
> supports integer +, -, and *, not /. The documentation says that if
> I need to do integer division, I have to convert the operands to
> floating point first.
> Hence, I have similar problems.
>
> So...
> Does your method mean you write some code in your 'frontend' to emit
> LLVM IR that converts the integers to floating point first, then
> performs the operation, and then converts the result back to an
> integer?
> Or do you write such code in your 'backend'?
>
> No matter what your answer is, I think the 'frontend' approach is
> cleaner than the 'backend' approach (the 'backend' approach is more
> like a hack?). Am I right? Or does writing such a mechanism in the
> backend have other advantages?
IMHO I don't think of the backend approach as a hack:
Minimizing the frontend's dependencies on the target is generally a
good thing, assuming you may be targeting different HW in the future.
The backend approach means that integer division is a fairly long code
sequence: that's just fine within LLVM.
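For what it's worth, here's a rough C model of the kind of expansion
a backend might emit - just a sketch, assuming operands that fit in
the mantissa of your float format; a real implementation would want
to double-check the rounding edge cases for its exact format:

    #include <stdint.h>

    /* Sketch: integer division emulated through the float divider on
       hardware with no integer '/'. Exact only while operands and
       quotient fit in the mantissa (an s7.16 format's 16-bit mantissa
       plus hidden bit covers GLSL's required 16-bit integer range). */
    int32_t emulated_idiv(int32_t a, int32_t b)
    {
        float q = (float)a / (float)b;
        return (int32_t)q;   /* the cast truncates toward zero */
    }

The final truncating cast is the "truncation after the op" fix-up
mentioned above.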
>
>
>> What I mean is that you can probably get away with LLVM working with
>> float literals as f32, then converting them to your 24 bit format
>> during code gen.
>
> I think I got you here.
>
>> Integers too: let LLVM work with i32 internally, and convert literals
>> during code gen.
>
> Huh.. I think I got you here, too.
> But I don't see how you would handle integer constants larger than
> 24 bits.
> For example, if I see the following instruction during code gen:
>
>   %a = add int %b, int 0x12345678
>
> Do I have to emit machine instructions similar to the following?
>
>   %a = add int %b, int 0x5678
>   %c = add int %d, int 0x1234
>   %e = add int %c, 1   <--- depends on the result of the first addition
>
> However, this means the backend has to remember that register %a now
> stores the low half of the result, and register %c stores the high
> half. This tracking is not an easy job, I think.
Unextended GLSL doesn't require support for integers larger than 16
bits.
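(And if you ever did need them, the tracking you describe is less
scary than it sounds: each wide value just becomes a fixed pair of
narrow registers. A hedged C sketch of the idea, with all names
invented:)

    #include <stdint.h>

    /* A 32-bit add on a machine whose integer registers hold only
       24 bits, done in 16-bit halves. */
    typedef struct { uint32_t lo, hi; } u32_pair; /* 16 bits per half */

    u32_pair add32(u32_pair x, u32_pair y)
    {
        u32_pair r;
        uint32_t lo = x.lo + y.lo;  /* at most 17 bits - fits a 24-bit reg */
        r.lo = lo & 0xFFFF;         /* low 16 bits of the result */
        r.hi = (x.hi + y.hi + (lo >> 16)) & 0xFFFF; /* halves + carry */
        return r;
    }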
>
>> I assume you'll be starting with the reference GLSL parser (from
>> 3DLabs, IIRC - I don't even know if they still exist, actually)
>
> You can find the 3Dlabs frontend here:
> http://l4.me.uk/static/glsl/
>
> And I don't think anyone has ported this frontend to LLVM before.
>
>> The issue would be that LLVM would want to store register values as
>> 32
>> bits - and do all the pointer math that way.
>
> I don't really get you here.
> Why would LLVM do all the pointer math in 32 bits just because I
> store register values as 32 bits?
What I mean is that LLVM would think of your registers as taking 4
bytes in memory, and do all the pointer math that way: multiplying
array indexes by 4. This may be fine on your machine, but it seems
plausible that you would want 3 byte alignment, and, in that case, you
would have to patch things up.
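To make the scaling concrete, a toy C illustration (nothing here is
target-specific):

    #include <stdint.h>

    /* The address math LLVM's default lowering produces for an i32
       array element... */
    uint8_t *index_as_i32(uint8_t *base, int i)
    {
        return base + 4 * i;   /* index scaled by sizeof(i32) = 4 */
    }

    /* ...versus what a tightly packed 24-bit layout would need - the
       scaling you'd have to patch up yourself. */
    uint8_t *index_packed24(uint8_t *base, int i)
    {
        return base + 3 * i;   /* 3 bytes per element, no padding */
    }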
>
>> I haven't had to work with register constraints in LLVM, so I'm not
>> sure what would be best approach if I/O is done through specific
>> GPRs:
>> you don't want to reserve those registers for I/O only.... it would
>> take some exploration.
>
> Unfortunately, my platform does use GPRs to do the input/output.
> My current thought is to compute the number of attributes/varyings
> used in a shader, and reserve the same number of GPRs for those
> attributes/varyings ONLY. Since I have NO memory to spill registers
> to, there is not much room for register allocation. The method I
> might use is to INLINE all functions, and then perform register
> allocation. This strategy is bad, of course - or can you think of
> some better solution?
This sounds like a good bring-up approach to get you started, for
both I/O and inlining all functions.
I've been learning LLVM as I go - my suspicion is that LLVM can do
better on the I/O question with the right register information - as
you learn more, some creative approach will present itself.
Similarly for inlining - calls and returns can be custom handled -
maybe there's a way to tie this in to a customized register
allocator... As long as your shaders aren't busting out of your
instruction limits (or instruction cache size, depending on the HW),
inlining is a good thing.
In addition to GLSL, there's Khronos's recently announced OpenCL,
which also disallows recursion, in part because stack operations are
still very slow on GPUs (small dependent loads/stores aren't great
for the huge pipeline). A random non-expert thought: maybe there's
some general approach to non-stack-based function calling that could
be implemented with a global register allocator and an analysis of
the call tree? A sketch of that thought follows.
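Purely as illustration - every name and limit below is invented, and
it ignores indirect calls and anything beyond a simple register
count: with no recursion the call graph is acyclic, so each function
can get a fixed block of registers whose base sits above the blocks
of all of its callers.

    #define MAX_FUNCS 64

    /* One node per function in the (recursion-free) call graph. */
    typedef struct {
        int callees[MAX_FUNCS]; /* indices of functions this one calls */
        int ncallees;
        int nregs;              /* registers this function's locals need */
        int base;               /* assigned base register; initialize to 0 */
    } Func;

    /* Walk the call tree, giving each function a register window that
       starts above its caller's; taking the max handles functions
       reached from several call sites. */
    void assign_bases(Func *f, int idx, int base)
    {
        if (base > f[idx].base)
            f[idx].base = base;
        for (int i = 0; i < f[idx].ncallees; i++)
            assign_bases(f, f[idx].callees[i], f[idx].base + f[idx].nregs);
    }

Seeded with assign_bases(funcs, entry, 0), every function's locals
land at statically known registers, and calls become plain jumps.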
Dan
>
>
> Wei.
>
>
> On Nov 23, 1:37 am, Daniel M Gessel <ges... at apple.com> wrote:
>> On Nov 22, 2008, at 11:03 AM, Wei wrote:
>>
>>> I have 24-bit integer operations as well as 24-bit floating point
>>> (s7.16) operations.
>>
>>> The H/W supports load/store instructions; however, the docs
>>> suggest not using these load/store instructions except for
>>> debugging purposes.
>>> That is to say, you can imagine we don't have load/store
>>> instructions, we don't have memory, we just have registers.
>>
>>> I will run OpenGL Shading Language programs on this chip.
>>
>> GLSL doesn't have pointers, so no "generic" load + store simplifying
>> things.
>>
>> Unextended GLSL only requires support for integers in the 16 bit
>> range, and has no bitwise operations. It also doesn't specify integer
>> overflow behavior in any way.
>>
>> The machines I worked with didn't support any integer ops, but GLSL
>> let us get by with "emulated" 16 bit integers (storing and operating
>> on them as floating point; divides required truncation after the op -
>> that sort of thing).
>>
>> Since you have 24 bit integer operations, you're in better shape.
>>
>>> About your comments, I (a new LLVM user) have some more questions:
>>
>>> 1) You mention "custom handle the conversion of the integer/float
>>> constants that LLVM spits out"; does that mean
>>> I have to register a callback function which will operate when LLVM
>>> wants to spit out a constant value to memory? But what about non-
>>> constant values?
>>
>> What I mean is that you can probably get away with LLVM working with
>> float literals as f32, then converting them to your 24 bit format
>> during code gen. The specifics depend on how you want to handle
>> constants in your backend: literals in instructions or a constant
>> pool
>> are the options I know of. For now, I'm using special "load literal"
>> instructions, but a constant pool may be more appropriate in the long
>> run. I'm still learning.
>>
>> Integers too: let LLVM work with i32 internally, and convert literals
>> during code gen.
>>
>> Since GLSL doesn't require load/store, and it sounds like your HW
>> may not be 100% reliable for these ops, you want to make sure your
>> code stays in registers.
>>
>> I assume you'll be starting with the reference GLSL parser (from
>> 3DLabs, IIRC - I don't even know if they still exist, actually) and
>> having it generate LLVM IR (has anybody done this before?). This will
>> give you much more control over the code - Clang is the front end for
>> the project I'm working on, and it generates stack based code; most
>> of
>> the stack operations get optimized out by inlining and the mem2reg
>> pass, but not everything.
>>
>>> ex:
>>> int a;
>>> and LLVM wants to put a into memory.
>>
>>> and I don't really know what the "i32/f32 sounds like a good place
>>> to start" means...
>>
>> I mean that having your registers declared as i32 + f32 will probably
>> work out well, especially since you don't have pointers in your
>> language.
>>
>> The issue would be that LLVM would want to store register values as
>> 32
>> bits - and do all the pointer math that way. Depending on how your HW
>> works, this may or may not be okay. Even then, you might be able to
>> patch it up if you really needed to store your registers 3 byte
>> aligned.
>>
>> Fortunately, this is not an issue with GLSL.
>>
>>> 2) I don't know why you mention "I'd assume you'd have intrinsics
>>> for
>>> I/O."
>>
>> For GLSL, you have to have some way of reading attributes and
>> uniforms, exporting to/reading from varyings, etc.
>>
>> Different GPUs do things differently of course: in some cases, it's a
>> matter of certain GPRs being initialized by "fixed function" HW with
>> input values at the start of the shader and certain GPRs being left
>> with output values at the end of the shader. Other GPUs require
>> explicit "export" instructions, perhaps just reads/writes to
>> dedicated
>> I/O registers. Some have a mix (this is the case for HW I've worked
>> with).
>>
>> If you have export instructions, or even special I/O registers, I was
>> thinking that they could be represented or accessed by target-
>> specific ops - intrinsics. You'd have the GLSL front end generate
>> these intrinsic operations.
>>
>> I haven't had to work with register constraints in LLVM, so I'm not
>> sure what would be best approach if I/O is done through specific
>> GPRs:
>> you don't want to reserve those registers for I/O only.... it would
>> take some exploration.
>>
>>
>>
>>> 3) I don't think I get you about the following statements:
>>>> If you want to support memory operations, your integers need to
>>>> support the addressing range correctly - you effectively have 17
>>>> bits
>>>> of mantissa - so it may be a tight squeeze without 24 bit integer
>>>> ops
>>>> (shifts and ands and stuff will also be painful, but that's a more
>>>> expansive topic).
>>> Can you give some example?
>>
>> Sorry, I was "thinking out loud".
>>
>> I made the assumption here that you didn't have 24 bit integer ops,
>> and that you might try to represent pointers as integers in a single
>> 24 bit float value (maybe with a 1D texture as your addressable
>> memory). In that case, you'd have a very limited range.
>>
>> But GLSL doesn't have pointers, so this isn't an issue (and 24 bit
>> integers gives you a decent addressing range for debugging).
>>
>> Dan
>>
>>
>>
>>> Really really thanks about your comments.
>>
>>> Wei.
>>
>>> On Nov 20, 10:24 pm, Daniel M Gessel <ges... at apple.com> wrote:
>>>> This is similar to ATI's R300/R420 pixel shaders. I'm familiar with
>>>> this hardware, but not really an LLVM expert (working on a code
>>>> generator myself, but learning as I go).
>>
>>>> Do you have 24-bit integer operations, or just floating point?
>>
>>>> What about load/store?
>>
>>>> Are you looking to run large C programs with complex data
>>>> structures,
>>>> or just comparatively simple math functions (i.e. a compute
>>>> "kernel")?
>>
>>>> If you only want to support programs that can live entirely within
>>>> registers, you can custom handle the conversion of the integer/
>>>> float constants that LLVM spits out, and i32/f32 sounds like a good
>>>> place to start - LLVM's mem2reg and inlining is very effective at
>>>> getting rid of the majority of stack operations, and I'd assume
>>>> you'd have intrinsics for I/O.
>>
>>>> If you want to support memory operations, your integers need to
>>>> support the addressing range correctly - you effectively have 17
>>>> bits
>>>> of mantissa - so it may be a tight squeeze without 24 bit integer
>>>> ops
>>>> (shifts and ands and stuff will also be painful, but that's a more
>>>> expansive topic).
>>
>>>> Dan
>>
>>>> On Nov 20, 2008, at 7:46 AM, Wei wrote:
>>
>>>>> Because each channel contains 24 bits, what is the
>>>>> llvm::SimpleValueType I should use for each channel?
>>>>> The current llvm::SimpleValueType contains i1, i8, i16, i32, i64,
>>>>> f32, f64, f80; none of them fits one channel (24 bits).
>>
>>>>> I think I can use i32 or f32 to represent each 24-bit channel; if
>>>>> the runtime result of some machine instruction exceeds 23 bits
>>>>> (1 bit is for the sign), then it is an overflow.
>>>>> Is it correct to claim that the programmer needs to revise his
>>>>> program to fix this problem?
>>>>> Am I right or wrong about this thought?
>>
>>>>> If there were a chip whose registers are 24 bits long, and you
>>>>> had to compile C/C++ programs for it, how would you represent
>>>>> the following statement?
>>
>>>>> int a = 3;
>>>>> (Programmers think sizeof(int) == 4.)
>>
>>>>> Wei.
>>
>>>>> On Nov 19, 2:01 am, Evan Cheng <evan.ch... at apple.com> wrote:
>>>>>> Why not model each channel as a separate physical register?
>>
>>>>>> Evan
>>
>>>>>> On Nov 17, 2008, at 6:36 AM, Wei wrote:
>>
>>>>>>> I have a very strange and complicated H/W platform.
>>>>>>> It has many registers, all in one format.
>>>>>>> The register format is:
>>
>>>>>>> +--------+--------+--------+--------+
>>>>>>> | 24-bit | 24-bit | 24-bit | 24-bit |
>>>>>>> +--------+--------+--------+--------+
>>>>>>>     a        b        c        d
>>
>>>>>>> There are 4 channels in a register, and each channel contains
>>>>>>> 24 bits; hence, there are 96 bits in total in 'one' register.
>>>>>>> You can store a 24-bit integer or an s7.16 floating-point value
>>>>>>> in each channel.
>>>>>>> You can name the channels 'a', 'b', 'c', 'd'.
>>
>>>>>>> Here is an example of the operation in this H/W platform:
>>
>>>>>>> ADD R3.ab, R1.abab, R2.bbaa
>>
>>>>>>> it means
>>
>>>>>>> Add the 'abab' swizzle of R1 to the 'bbaa' swizzle of R2, and
>>>>>>> put the result into the 'ab' channels of R3.
>>
>>>>>>> It's complicated.
>>>>>>> Imagine a non-existent temp register named 'Rt1', whose
>>>>>>> 'a','b','c','d' channels are taken from the 'a','b','a','b'
>>>>>>> channels of R1, and imagine another non-existent temp register
>>>>>>> named 'Rt2', whose 'a','b','c','d' channels are taken from the
>>>>>>> 'b','b','a','a' channels of R2.
>>>>>>> Then add Rt1 & Rt2 and put the result into R3.
>>>>>>> This means:
>>>>>>> the 'a' channel of R3 will be equal to the 'a' channel of Rt1
>>>>>>> plus the 'a' channel of Rt2 (i.e. 'a' from R1 + 'b' from R2,
>>>>>>> because R1.'a'bab and R2.'b'baa);
>>>>>>> the 'b' channel of R3 will be equal to the 'b' channel of Rt1
>>>>>>> plus the 'b' channel of Rt2 (i.e. 'b' from R1 + 'b' from R2,
>>>>>>> because R1.a'b'ab and R2.b'b'aa);
>>>>>>> the 'c' channel of R3 will be untouched; the value of the 'c'
>>>>>>> channel of Rt1 plus the 'c' channel of Rt2 (i.e. 'a' from R1 +
>>>>>>> 'a' from R2, because R1.ab'a'b and R2.bb'a'a) will be lost;
>>>>>>> the 'd' channel of R3 will be untouched, too; the value of the
>>>>>>> 'd' channel of Rt1 plus the 'd' channel of Rt2 (i.e. 'b' from
>>>>>>> R1 + 'a' from R2, because R1.aba'b' and R2.bba'a') will be
>>>>>>> lost, too.
>>
>>>>>>> I don't know whether I can set the 'type' of such a register
>>>>>>> using an llvm::MVT::SimpleValueType.
>>>>>>> According to the LLVM docs & source code, I think
>>>>>>> llvm::MVT::v8i8, v2f32, etc. are used to represent registers
>>>>>>> for SIMD
>>
>> ...
>>
>>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev