[LLVMdev] Does current LLVM target-independent code generator supports my strange chip?
Wei
wei.hu.tw at gmail.com
Mon Nov 24 07:25:54 PST 2008
> The machines I worked with didn't support any integer ops, but GLSL
> let us get by with "emulated" 16 bit integers (storing and operating
> on them as floating point; divides required truncation after the op -
> that sort of thing).
Although my platform does support integer operations, it only
supports integer +, -, and *, not /. The documentation says that if I
need to do integer division, I have to convert the operands to
floating point first. Hence, I have a similar problem.
So...
Does your method mean that you write code in your 'frontend' to emit
LLVM IR that converts the integers to floating point first, performs
the operations, and then converts the result back to integer?
Or do you write such code in your 'backend'?
Either way, I think the 'frontend' approach is cleaner than the
'backend' approach (the 'backend' approach feels more like a hack?).
Am I right, or does implementing such a mechanism in the backend have
other advantages?
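If the frontend route is the way to go, I imagine the helper that
emits the emulated division would look roughly like the sketch below.
This is only a sketch against a recent IRBuilder API (the header path
and helpers differ between LLVM versions), and emitEmulatedSDiv is a
hypothetical name of mine, not something that exists in LLVM:

  #include "llvm/IR/IRBuilder.h"

  using namespace llvm;

  // Emit (int)((float)Num / (float)Den). Both C-style integer division
  // and fptosi truncate toward zero, so the results match, precision
  // limits of the float format aside.
  static Value *emitEmulatedSDiv(IRBuilder<> &B, Value *Num, Value *Den) {
    Value *FN = B.CreateSIToFP(Num, B.getFloatTy(), "num.f");
    Value *FD = B.CreateSIToFP(Den, B.getFloatTy(), "den.f");
    Value *Q  = B.CreateFDiv(FN, FD, "quot.f");
    return B.CreateFPToSI(Q, Num->getType(), "quot.i");
  }
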
> What I mean is that you can probably get away with LLVM working with
> float literals as f32, then converting them to your 24 bit format
> during code gen.
I think I got you here.
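If I follow, the literal conversion during code gen might look
roughly like the sketch below. I'm guessing the 24-bit float format
is 1 sign bit, 7 exponent bits (bias 63) and 16 mantissa bits, with
truncation instead of rounding and flush-to-zero for tiny values;
every one of those assumptions (and the Inf/NaN handling I skipped)
would have to be checked against the real HW documentation:

  #include <cstdint>
  #include <cstring>

  // Convert an IEEE-754 single to an assumed sign/exp7/man16 24-bit float.
  uint32_t f32ToFp24(float F) {
    uint32_t Bits;
    std::memcpy(&Bits, &F, sizeof Bits);                 // f32 bit pattern
    uint32_t Sign = (Bits >> 31) & 0x1;
    int32_t  Exp  = int32_t((Bits >> 23) & 0xFF) - 127;  // unbias
    uint32_t Man  = (Bits >> 7) & 0xFFFF;                // top 16 mantissa bits
    if (Exp < -62)                  // too small for the assumed format
      return Sign << 23;            // flush to (signed) zero
    if (Exp > 63) {                 // too large: clamp to max finite
      Exp = 63;
      Man = 0xFFFF;
    }
    return (Sign << 23) | (uint32_t(Exp + 63) << 16) | Man;
  }
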
> Integers too: let LLVM work with i32 internally, and convert literals
> during code gen.
Huh.. I think I got you here, too.
But I'm not sure how you handle integer constants larger than
24 bits.
For example, if I see the following instruction during code gen:
  %a = add i32 %b, 0x12345678
do I have to emit machine instructions similar to the following,
splitting the constant into halves?
  %a = add %b, 0x5678
  %c = add %d, 0x1234
  %e = add %c, 1   <--- needed only when the first addition carries
However, this means the backend has to remember that register %a now
stores the low half of the result and register %e stores the high
half. That kind of tracking is not an easy job, I think.
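Just to convince myself that the lo/hi bookkeeping at least adds up,
here is a plain C++ model of the split-and-carry sequence (the 16-bit
split is only for illustration; with 24-bit registers the split point
would be different):

  #include <cstdint>

  // %b is held as two halves (bLo, bHi); the result comes back the same way.
  void add32AsHalves(uint16_t bLo, uint16_t bHi,
                     uint16_t &outLo, uint16_t &outHi) {
    uint32_t Lo = uint32_t(bLo) + 0x5678u;    // low-half addition
    outLo = uint16_t(Lo);
    uint16_t Carry = uint16_t(Lo >> 16);      // 1 if the low add wrapped
    outHi = uint16_t(bHi + 0x1234u + Carry);  // high-half addition + carry
  }
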
> I assume you'll be starting with the reference GLSL parser (from
> 3DLabs, IIRC - I don't even know if they still exist, actually)
You can find the 3Dlabs frontend here:
http://l4.me.uk/static/glsl/
And I don't think anyone has ported this frontend to LLVM before.
> The issue would be that LLVM would want to store register values as 32
> bits - and do all the pointer math that way.
I don't really follow you here.
Why would LLVM do all the pointer math in 32 bits just because I
store register values as 32 bits?
> I haven't had to work with register constraints in LLVM, so I'm not
> sure what would be the best approach if I/O is done through specific GPRs:
> you don't want to reserve those registers for I/O only.... it would
> take some exploration.
Unfortunately, my platform does use GPRs for input/output.
My current thought is to count the attributes/varyings used in a
shader and reserve the same number of GPRs for those
attributes/varyings ONLY. Since I have NO memory to spill registers
to, there isn't much room for the register allocator to work with.
The method I might use is to INLINE all functions and then perform
register allocation. This strategy is bad, of course; can you think
of a better solution?
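If reserving those GPRs is the right approach, I imagine it would be
done in the target's register info, something like the sketch below.
'MyTarget', the R0..R7 names and the number of reserved registers are
all made up, and the exact hook signature depends on the LLVM
version:

  #include "llvm/ADT/BitVector.h"
  #include "llvm/CodeGen/MachineFunction.h"

  using namespace llvm;

  BitVector
  MyTargetRegisterInfo::getReservedRegs(const MachineFunction &MF) const {
    BitVector Reserved(getNumRegs());
    // Keep the GPRs that hold shader inputs/outputs away from the
    // allocator: fixed-function HW fills/reads them, and with no memory
    // there is nowhere to spill them if they get clobbered.
    for (unsigned Reg = MyTarget::R0; Reg <= MyTarget::R7; ++Reg)
      Reserved.set(Reg);
    return Reserved;
  }
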
Wei.
On Nov 23, 1:37 am, Daniel M Gessel <ges... at apple.com> wrote:
> On Nov 22, 2008, at 11:03 AM, Wei wrote:
>
> > I have 24-bit integer operations as well as 24-bit floating point
> > (s7.16) operations.
>
> > The H/W supports load/store instructions; however, they do suggest
> > that we not use these load/store instructions except for debugging purposes.
> > That is to say, you can imagine we don't have load/store instructions,
> > we don't have memory, we just have registers.
>
> > I will run OpenGL shading language programs on this chip.
>
> GLSL doesn't have pointers, so no "generic" load + store simplifying
> things.
>
> Unextended GLSL only requires support for integers in the 16 bit
> range, and has no bitwise operations. It also doesn't specify integer
> overflow behavior in any way.
>
> The machines I worked with didn't support any integer ops, but GLSL
> let us get by with "emulated" 16 bit integers (storing and operating
> on them as floating point; divides required truncation after the op -
> that sort of thing).
>
> Since you have 24 bit integer operations, you're in better shape.
>
> > About your comments, I (a new LLVM user) have some more questions:
>
> > 1) You mention "custom handle the conversion of the integer/float
> > constants that LLVM spits out", does it mean:
> > I have to register a callback function which will operate when LLVM
> > wants to spit out a constant value to memory. But what about non-
> > constant values?
>
> What I mean is that you can probably get away with LLVM working with
> float literals as f32, then converting them to your 24 bit format
> during code gen. The specifics depend on how you want to handle
> constants in your backend: literals in instructions or a constant pool
> are the options I know of. For now, I'm using special "load literal"
> instructions, but a constant pool may be more appropriate in the long
> run. I'm still learning.
>
> Integers too: let LLVM work with i32 internally, and convert literals
> during code gen.
>
> Since GLSL doesn't require load/store, and it sounds like your HW may
> not be 100% reliable for these ops, you want to make sure your code stays
> in registers.
>
> I assume you'll be starting with the reference GLSL parser (from
> 3DLabs, IIRC - I don't even know if they still exist, actually) and
> having it generate LLVM IR (has anybody done this before?). This will
> give you much more control over the code - Clang is the front end for
> the project I'm working on, and it generates stack based code; most of
> the stack operations get optimized out by inlining and the mem2reg
> pass, but not everything.
>
> > ex:
> > int a;
> > and LLVM wants to put a into memory.
>
> > and I don't really know what the "i32/f32 sounds a good place to
> > start" means...
>
> I mean that having your registers declared as i32 + f32 will probably
> work out well, especially since you don't have pointers in your
> language.
>
> The issue would be that LLVM would want to store register values as 32
> bits - and do all the pointer math that way. Depending on how your HW
> works, this may or may not be okay. Even then, you might be able to
> patch it up if you really needed to store your registers 3 byte aligned.
>
> Fortunately, this is not an issue with GLSL.
>
> > 2) I don't know why you mention "I'd assume you'd have intrinsics for
> > I/O."
>
> For GLSL, you have to have some way of reading attributes and
> uniforms, exporting to/reading from varyings, etc.
>
> Different GPUs do things differently of course: in some cases, it's a
> matter of certain GPRs being initialized by "fixed function" HW with
> input values at the start of the shader and certain GPRs being left
> with output values at the end of the shader. Other GPUs require
> explicit "export" instructions, perhaps just reads/writes to dedicated
> I/O registers. Some have a mix (this is the case for HW I've worked
> with).
>
> If you have export instructions, or even special I/O registers, I was
> thinking that they could be represented or accessed by Target specific
> ops - intrinsics. You'd have the GLSL front end generate these
> intrinsic operations.
>
> I haven't had to work with register constraints in LLVM, so I'm not
> sure what would be the best approach if I/O is done through specific GPRs:
> you don't want to reserve those registers for I/O only.... it would
> take some exploration.
>
>
>
> > 3) I don't think I get you about the following statements:
> >> If you want to support memory operations, your integers need to
> >> support the addressing range correctly - you effectively have 17 bits
> >> of mantissa - so it may be a tight squeeze without 24 bit integer ops
> >> (shifts and ands and stuff will also be painful, but that's a more
> >> expansive topic).
> > Can you give some example?
>
> Sorry, I was "thinking out loud".
>
> I made the assumption here that you didn't have 24 bit integer ops,
> and that you might try to represent pointers as integers in a single
> 24 bit float value (maybe with a 1D texture as your addressable
> memory). In that case, you'd have a very limited range.
>
> But GLSL doesn't have pointers, so this isn't an issue (and 24 bit
> integers give you a decent addressing range for debugging).
>
> Dan
>
>
>
> > Really really thanks about your comments.
>
> > Wei.
>
> > On Nov 20, 10:24 pm, Daniel M Gessel <ges... at apple.com> wrote:
> >> This is similar to ATI's R300/R420 pixel shaders. I'm familiar with
> >> this hardware, but not really an LLVM expert (working on a code
> >> generator myself, but learning as I go).
>
> >> Do you have 24-bit integer operations, or just floating point?
>
> >> What about load/store?
>
> >> Are you looking to run large C programs with complex data structures,
> >> or just comparatively simple math functions (i.e. a compute
> >> "kernel")?
>
> >> If you only want to support programs that can live entirely within
> >> registers, you can custom handle the conversion of the integer/float
> >> constants that LLVM spits out, and i32/f32 sounds a good place to
> >> start - LLVM's mem2reg and inlining are very effective at getting
> >> rid of the majority of stack operations, and I'd assume you'd have
> >> intrinsics for I/O.
>
> >> If you want to support memory operations, your integers need to
> >> support the addressing range correctly - you effectively have 17 bits
> >> of mantissa - so it may be a tight squeeze without 24 bit integer ops
> >> (shifts and ands and stuff will also be painful, but that's a more
> >> expansive topic).
>
> >> Dan
>
> >> On Nov 20, 2008, at 7:46 AM, Wei wrote:
>
> >>> Because each channel contains 24 bits, what is the
> >>> llvm::SimpleValueType I should use for each channel?
> >>> The current llvm::SimpleValueType contains i1, i8, i16, i32, i64,
> >>> f32, f64, f80; none of them fits one channel (24-bit).
>
> >>> I think I can use i32 or f32 to represent each 24-bit channel; if
> >>> the runtime result of some machine instruction exceeds 23 bits
> >>> (1 bit is for the sign), then it is an overflow.
> >>> Is it correct to claim that the programmer needs to revise his
> >>> program to fix this problem?
> >>> Am I right or wrong about this?
>
> >>> If there is a chip whose registers are 24 bits long, and you have to
> >>> compile C/C++ programs for it, how would you represent the following
> >>> statement?
>
> >>> int a = 3;
> >>> (Programmers think sizeof(int) = 4)
>
> >>> Wei.
>
> >>> On Nov 19, 2:01 am, Evan Cheng <evan.ch... at apple.com> wrote:
> >>>> Why not model each channel as a separate physical register?
>
> >>>> Evan
>
> >>>> On Nov 17, 2008, at 6:36 AM, Wei wrote:
>
> >>>>> I have a very strange and complicated H/W platform.
> >>>>> It has many registers in one format.
> >>>>> The register format is:
>
> >>>>> ---------------------------------------------
> >>>>> |  24-bit  |  24-bit  |  24-bit  |  24-bit  |
> >>>>> ---------------------------------------------
> >>>>>      a          b          c          d
>
> >>>>> There are 4 channels in a register, and each channel contains
> >>>>> 24 bits; hence, there are 96 bits in total in 'one' register.
> >>>>> You can store a 24-bit integer or an s7.16 floating-point value into
> >>>>> each channel.
> >>>>> You can name each channel 'a', 'b', 'c', 'd'.
>
> >>>>> Here is an example of the operation in this H/W platform:
>
> >>>>> ADD R3.ab, R1.abab, R2.bbaa
>
> >>>>> it means
>
> >>>>> Add 'abab' channel of R1 and 'bbaa' channel of R2, and
> >>>>> put the result into the 'ab' channel of R3.
>
> >>>>> It's complicated.
> >>>>> Imagine a non-existent temp register named 'Rt1', whose
> >>>>> 'a','b','c','d' channels are taken from the 'a','b','a','b'
> >>>>> channels of R1, and another non-existent temp register named
> >>>>> 'Rt2', whose 'a','b','c','d' channels are taken from the
> >>>>> 'b','b','a','a' channels of R2.
> >>>>> Then add Rt1 and Rt2 and put the result into R3. This means:
> >>>>> the 'a' channel of R3 will be equal to the 'a' channel of Rt1 plus
> >>>>> the 'a' channel of Rt2, (i.e. 'a' from R1 + 'b' from R2, because
> >>>>> R1.'a'bab and R2.'b'baa)
> >>>>> the 'b' channel of R3 will be equal to the 'b' channel of Rt1 plus
> >>>>> the 'b' channel of Rt2, (i.e. 'b' from R1 + 'b' from R2, because
> >>>>> R1.a'b'ab and R2.b'b'aa)
> >>>>> the 'c' channel of R3 will be untouched, the value of the 'c'
> >>>>> channel of Rt1 plus the 'c' channel of Rt2 (i.e. 'a' from R1 + 'a'
> >>>>> from R2, because R1.ab'a'b and R2.bb'a'a) will be lost.
> >>>>> the 'd' channel of R3 will be untouched, too. The value of the 'd'
> >>>>> channel of Rt1 plus the 'd' channel of Rt2 (i.e. 'b' from R1 + 'a'
> >>>>> from R2, because R1.aba'b' and R2.bba'a') will be lost, too.
>
> >>>>> I don't know whether I can set the 'type' of such a register using
> >>>>> an llvm::MVT::SimpleValueType.
> >>>>> According to the LLVM docs & source code, I think llvm::MVT::v8i8,
> >>>>> v2f32, etc. are used to represent registers for SIMD
>
> ...