[LLVMdev] proposal: add macro expansion of for-loop to TableGen
Che-Liang Chiou
clchiou at gmail.com
Tue Aug 30 23:51:33 PDT 2011
Hi folks,
I have done studies since then, and I found that TableGen per se is
probably not a perfect place of adding macro expansion of for-loops.
If in the future someone proposes a similar thing, I hope this is
helpful to you. Here are the reasons:
Reason #1: If we add for-loop as a top-level syntactical construct,
such as let-statement, its usefulness is restricted.
Here is an excerpt of BNF of TableGen:
----------------------------------------
ObjectList :== Object*
Object ::= ClassInst
Object ::= DefInst
Object ::= MultiClassInst
Object ::= DefMInst
Object ::= LETCommand '{' ObjectList '}'
Object ::= LETCommand Object
DefInst ::= DEF ObjectName ObjectBody
ClassInst ::= CLASS ID TemplateArgList? ObjectBody
ObjectBody ::= BaseClassList Body
Body ::= ';'
Body ::= '{' BodyList '}'
BodyList ::= BodyItem*
BodyItem ::= Declaration ';'
BodyItem ::= LET ID OptionalBitList '=' Value ';'
----------------------------------------
So it is natural to think that we could add for-loop as an Object:
----------------------------------------
Object ::= FORCommand '{' ObjectList '}'
Object ::= FORCommand Object
FORCommand ::= FOR ForList IN
ForList ::= ForItem (',' ForItem)*
ForItem ::= ID '=' ForRange
ForRange ::= '[' ID (',' ID)* ']'
ForRange ::= SEQUENCETOK '(' INTVAL ',' INTVAL ')'
----------------------------------------
Note that inside a def block and a class block, we cannot define an
Object --- only Declaration and LET-assignment are allowed.
So although It is technological possible to add for-loop as a Object,
the usefulness is restricted: we may not write a for-loop inside a def
block or a class block.
Of course it is possible to rewrite the BNF to allow us write for-loop
inside a def block or a class block. This leads to reason #2.
Reason #2: If we make for-loop syntax orthogonal so that we may write
for-loop inside a def block or a class block, TableGen would be very
complicated regarding the BNF and the implementation. I believe we
could implement for-loop in a way that does not add too much
maintenance cost to the TableGen codebase.
Conclusion:
Instead of implementing for-loop in TableGen, I think it should be
done in a preprocessor, which unrolls the for-loop and probably also
injects C-style line control directives: "#line linenum filename".
Make sense?
Regards,
Che-Liang
On Wed, Aug 24, 2011 at 4:30 PM, Che-Liang Chiou <clchiou at gmail.com> wrote:
> Hi folks,
>
> TableGen provides sufficiently rich syntax for expressing target
> instruction set. Nevertheless, when I wrote the PTX backend, I
> observed that some redundancy in TableGen can be further eliminated
> through macro expansion of for-loops.
>
> The semantics of a for-loop is expanding the for-loop body, and so it
> is equivalent to manually unroll the loop (see example #1).
>
> I believe the for-loop is not only useful to the PTX backend but also
> to other backends (see examples below). Generally speaking, a for-loop
> can be used anywhere when you see a "table filling" pattern --- you
> are writing repeated identical lines that only differs in a few places
> (see examples below).
>
> An (illustrative, not complete) BNF of for-loop is as follows:
> ----------------------------------------
> FOR_LOOP := for INDICES in BLOCK
>
> INDICES := INDEX, INDICES | INDEX
>
> INDEX := VARIABLE = RANGE
>
> RANGE := [comma separated values, ...] | function(args, ...)
>
> BLOCK := { STATEMENTS } | STATEMENT;
>
> STATEMENTS := STATEMENT; STATEMENTS | STATEMENT;
> ----------------------------------------
>
> Notes:
> * In statements, you may write #var or #func(args, ...) to expand a
> macro or macro function.
> * Macro functions are limited to a small set of functions, such as
> sequence(), lower(), and upper().
>
> ====Example #1====
>
> When defining register files, we repeatedly define every registers. In
> ARMRegisterInfo.td:
> ----------------------------------------
> def R0 : ARMReg< 0, "r0">, DwarfRegNum<[0]>;
> def R1 : ARMReg< 1, "r1">, DwarfRegNum<[1]>;
> def R2 : ARMReg< 2, "r2">, DwarfRegNum<[2]>;
> def R3 : ARMReg< 3, "r3">, DwarfRegNum<[3]>;
> def R4 : ARMReg< 4, "r4">, DwarfRegNum<[4]>;
> def R5 : ARMReg< 5, "r5">, DwarfRegNum<[5]>;
> def R6 : ARMReg< 6, "r6">, DwarfRegNum<[6]>;
> def R7 : ARMReg< 7, "r7">, DwarfRegNum<[7]>;
> ----------------------------------------
>
> I think it could be cleaner and shorter if we could write:
> ----------------------------------------
> for i = sequence(0, 7) in
> def R#i : ARMReg<#i, "r#i">, DwarfRegNum<[#i]>;
> ----------------------------------------
> The for-loop is expanded into 8 def's, and each #i is substituted with 0 ~ 7.
>
> As a matter of fact, we have added "(sequence ...)" to TableGen that
> makes defining register classes easier. I think add for-loop expansion
> to TableGen would make defining registers easier.
>
> ====Example #2====
>
> As you may see below, each "def load_*" is almost identical. The only
> difference is the memory space name (global, constant, and etc.). I
> believe a for-loop can make it much more readable.
>
> (defining memory space patterns in PTXInstrInfo.td for each memory space)
> ----------------------------------------
> def load_global : PatFrag<(ops node:$ptr), (load node:$ptr), [{
> const Value *Src;
> const PointerType *PT;
> if ((Src = cast<LoadSDNode>(N)->getSrcValue()) &&
> (PT = dyn_cast<PointerType>(Src->getType())))
> return PT->getAddressSpace() == PTX::GLOBAL;
> return false;
> }]>;
> def load_constant : PatFrag<(ops node:$ptr), (load node:$ptr), [{
> const Value *Src;
> const PointerType *PT;
> if ((Src = cast<LoadSDNode>(N)->getSrcValue()) &&
> (PT = dyn_cast<PointerType>(Src->getType())))
> return PT->getAddressSpace() == PTX::CONSTANT;
> return false;
> }]>;
> def load_local : PatFrag<(ops node:$ptr), (load node:$ptr), [{
> const Value *Src;
> const PointerType *PT;
> if ((Src = cast<LoadSDNode>(N)->getSrcValue()) &&
> (PT = dyn_cast<PointerType>(Src->getType())))
> return PT->getAddressSpace() == PTX::LOCAL;
> return false;
> }]>;
> def load_parameter : PatFrag<(ops node:$ptr), (load node:$ptr), [{
> const Value *Src;
> const PointerType *PT;
> if ((Src = cast<LoadSDNode>(N)->getSrcValue()) &&
> (PT = dyn_cast<PointerType>(Src->getType())))
> return PT->getAddressSpace() == PTX::PARAMETER;
> return false;
> }]>;
> def load_shared : PatFrag<(ops node:$ptr), (load node:$ptr), [{
> const Value *Src;
> const PointerType *PT;
> if ((Src = cast<LoadSDNode>(N)->getSrcValue()) &&
> (PT = dyn_cast<PointerType>(Src->getType())))
> return PT->getAddressSpace() == PTX::SHARED;
> return false;
> }]>;
> ----------------------------------------
>
> It would be much cleaner if we could write:
> ----------------------------------------
> for space = [global, constant, local, parameter, shared] in {
> def load_#space : PatFrag<(ops node:$ptr), (load node:$ptr), [{
> const Value *Src;
> const PointerType *PT;
> if ((Src = cast<LoadSDNode>(N)->getSrcValue()) &&
> (PT = dyn_cast<PointerType>(Src->getType())))
> return PT->getAddressSpace() == PTX::#upper(space);
> return false;
> }]>;
> }
> ----------------------------------------
>
> ====Example #3====
>
> The PTX backend makes excessive use of multiclass to "overload" and
> instruction that supports different types of operands. Each "def" of
> the multiclass is almost idential.
>
> In this case, although using a foo-loop adds a little bit cognitive
> cost to understand macro expansion, since it removes a lot of
> redundancy, I think it is actually more readable.
>
> (excerpt of PTXInstrInfo.td)
> ----------------------------------------
> multiclass PTX_FLOAT_4OP<string opcstr, SDNode opnode1, SDNode opnode2> {
> def rrr32 : InstPTX<(outs RegF32:$d),
> (ins RegF32:$a, RegF32:$b, RegF32:$c),
> !strconcat(opcstr, ".f32\t$d, $a, $b, $c"),
> [(set RegF32:$d, (opnode2 (opnode1 RegF32:$a,
> RegF32:$b),
> RegF32:$c))]>;
> def rri32 : InstPTX<(outs RegF32:$d),
> (ins RegF32:$a, RegF32:$b, f32imm:$c),
> !strconcat(opcstr, ".f32\t$d, $a, $b, $c"),
> [(set RegF32:$d, (opnode2 (opnode1 RegF32:$a,
> RegF32:$b),
> fpimm:$c))]>;
> def rrr64 : InstPTX<(outs RegF64:$d),
> (ins RegF64:$a, RegF64:$b, RegF64:$c),
> !strconcat(opcstr, ".f64\t$d, $a, $b, $c"),
> [(set RegF64:$d, (opnode2 (opnode1 RegF64:$a,
> RegF64:$b),
> RegF64:$c))]>;
> def rri64 : InstPTX<(outs RegF64:$d),
> (ins RegF64:$a, RegF64:$b, f64imm:$c),
> !strconcat(opcstr, ".f64\t$d, $a, $b, $c"),
> [(set RegF64:$d, (opnode2 (opnode1 RegF64:$a,
> RegF64:$b),
> fpimm:$c))]>;
> }
> ----------------------------------------
>
> (Equivalent TableGen code with a for-loop)
> ----------------------------------------
> multiclass PTX_FLOAT_4OP<string opcstr, SDNode opnode1, SDNode opnode2> {
> for nbit = [32, 32, 64, 64],
> op_suffix = [r, i, r, i],
> op_type = [RegF32, f32imm, RegF64, f64imm],
> op_node_type = [RegF32, fpimm, RegF64, fpimm] in {
> def rr#op_suffix#nbit
> : InstPTX<(outs RegF#nbit:$d),
> (ins RegF#nbit:$a, RegF#nbit:$b, #op_type:$c),
> !strconcat(opcstr, ".f#nbit\t$d, $a, $b, $c"),
> [(set RegF#nbit:$d,
> (opnode2 (opnode1 RegF#nbit:$a, RegF#nbit:$b),
> #op_node_type:$c))]>;
> }
> }
> ----------------------------------------
>
> ====
>
> Any comments are welcome. If no one thinks it does not make sense at
> all, I will write up more details to the mailing list (like a complete
> BNF).
>
> Regards,
> Che-Liang
>
More information about the llvm-dev
mailing list