[LLVMdev] proposal: add macro expansion of for-loop to TableGen

Tue Aug 30 23:51:33 PDT 2011

Hi folks,

I have done studies since then, and I found that TableGen per se is
probably not a perfect place of adding macro expansion of for-loops.
If in the future someone proposes a similar thing, I hope this is
helpful to you. Here are the reasons:

Reason #1: If we add for-loop as a top-level syntactical construct,
such as let-statement, its usefulness is restricted.

Here is an excerpt of BNF of TableGen:
----------------------------------------
ObjectList :== Object*

Object ::= ClassInst
Object ::= DefInst
Object ::= MultiClassInst
Object ::= DefMInst
Object ::= LETCommand '{' ObjectList '}'
Object ::= LETCommand Object

DefInst ::= DEF ObjectName ObjectBody
ClassInst ::= CLASS ID TemplateArgList? ObjectBody

ObjectBody      ::= BaseClassList Body
Body     ::= ';'
Body     ::= '{' BodyList '}'
BodyList ::= BodyItem*
BodyItem ::= Declaration ';'
BodyItem ::= LET ID OptionalBitList '=' Value ';'
----------------------------------------

So it is natural to think that we could add for-loop as an Object:
----------------------------------------
Object ::= FORCommand '{' ObjectList '}'
Object ::= FORCommand Object

FORCommand ::= FOR ForList IN
ForList ::= ForItem (',' ForItem)*
ForItem ::= ID '=' ForRange
ForRange ::= '[' ID (',' ID)* ']'
ForRange ::= SEQUENCETOK '(' INTVAL ',' INTVAL ')'
----------------------------------------

Note that inside a def block and a class block, we cannot define an
Object --- only Declaration and LET-assignment are allowed.

So although It is technological possible to add for-loop as a Object,
the usefulness is restricted: we may not write a for-loop inside a def
block or a class block.

Of course it is possible to rewrite the BNF to allow us write for-loop
inside a def block or a class block. This leads to reason #2.

Reason #2: If we make for-loop syntax orthogonal so that we may write
for-loop inside a def block or a class block, TableGen would be very
complicated regarding the BNF and the implementation. I believe we
could implement for-loop in a way that does not add too much
maintenance cost to the TableGen codebase.

Conclusion:

Instead of implementing for-loop in TableGen, I think it should be
done in a preprocessor, which unrolls the for-loop and probably also
injects C-style line control directives: "#line linenum filename".

Make sense?

Regards,
Che-Liang

On Wed, Aug 24, 2011 at 4:30 PM, Che-Liang Chiou <clchiou at gmail.com> wrote:
> Hi folks,
>
> TableGen provides sufficiently rich syntax for expressing target
> instruction set. Nevertheless, when I wrote the PTX backend, I
> observed that some redundancy in TableGen can be further eliminated
> through macro expansion of for-loops.
>
> The semantics of a for-loop is expanding the for-loop body, and so it
> is equivalent to manually unroll the loop (see example #1).
>
> I believe the for-loop is not only useful to the PTX backend but also
> to other backends (see examples below). Generally speaking, a for-loop
> can be used anywhere when you see a "table filling" pattern --- you
> are writing repeated identical lines that only differs in a few places
> (see examples below).
>
> An (illustrative, not complete) BNF of for-loop is as follows:
> ----------------------------------------
> FOR_LOOP := for INDICES in BLOCK
>
> INDICES := INDEX, INDICES | INDEX
>
> INDEX := VARIABLE = RANGE
>
> RANGE := [comma separated values, ...] | function(args, ...)
>
> BLOCK := { STATEMENTS } | STATEMENT;
>
> STATEMENTS := STATEMENT; STATEMENTS | STATEMENT;
> ----------------------------------------
>
> Notes:
> * In statements, you may write #var or #func(args, ...) to expand a
> macro or macro function.
> * Macro functions are limited to a small set of functions, such as
> sequence(), lower(), and upper().
>
> ====Example #1====
>
> When defining register files, we repeatedly define every registers. In
> ARMRegisterInfo.td:
> ----------------------------------------
> def R0  : ARMReg< 0, "r0">,  DwarfRegNum<[0]>;
> def R1  : ARMReg< 1, "r1">,  DwarfRegNum<[1]>;
> def R2  : ARMReg< 2, "r2">,  DwarfRegNum<[2]>;
> def R3  : ARMReg< 3, "r3">,  DwarfRegNum<[3]>;
> def R4  : ARMReg< 4, "r4">,  DwarfRegNum<[4]>;
> def R5  : ARMReg< 5, "r5">,  DwarfRegNum<[5]>;
> def R6  : ARMReg< 6, "r6">,  DwarfRegNum<[6]>;
> def R7  : ARMReg< 7, "r7">,  DwarfRegNum<[7]>;
> ----------------------------------------
>
> I think it could be cleaner and shorter if we could write:
> ----------------------------------------
> for i = sequence(0, 7) in
>  def R#i : ARMReg<#i, "r#i">, DwarfRegNum<[#i]>;
> ----------------------------------------
> The for-loop is expanded into 8 def's, and each #i is substituted with 0 ~ 7.
>
> As a matter of fact, we have added "(sequence ...)" to TableGen that
> makes defining register classes easier. I think add for-loop expansion
> to TableGen would make defining registers easier.
>
> ====Example #2====
>
> As you may see below, each "def load_*" is almost identical. The only
> difference is the memory space name (global, constant, and etc.). I
> believe a for-loop can make it much more readable.
>
> (defining memory space patterns in PTXInstrInfo.td for each memory space)
> ----------------------------------------
> def load_global : PatFrag<(ops node:$ptr), (load node:$ptr), [{
>  const Value *Src;
>  const PointerType *PT;
>  if ((Src = cast<LoadSDNode>(N)->getSrcValue()) &&
>      (PT = dyn_cast<PointerType>(Src->getType())))
>    return PT->getAddressSpace() == PTX::GLOBAL;
>  return false;
> }]>;
> def load_constant : PatFrag<(ops node:$ptr), (load node:$ptr), [{
>  const Value *Src;
>  const PointerType *PT;
>  if ((Src = cast<LoadSDNode>(N)->getSrcValue()) &&
>      (PT = dyn_cast<PointerType>(Src->getType())))
>    return PT->getAddressSpace() == PTX::CONSTANT;
>  return false;
> }]>;
> def load_local : PatFrag<(ops node:$ptr), (load node:$ptr), [{
>  const Value *Src;
>  const PointerType *PT;
>  if ((Src = cast<LoadSDNode>(N)->getSrcValue()) &&
>      (PT = dyn_cast<PointerType>(Src->getType())))
>    return PT->getAddressSpace() == PTX::LOCAL;
>  return false;
> }]>;
> def load_parameter : PatFrag<(ops node:$ptr), (load node:$ptr), [{
>  const Value *Src;
>  const PointerType *PT;
>  if ((Src = cast<LoadSDNode>(N)->getSrcValue()) &&
>      (PT = dyn_cast<PointerType>(Src->getType())))
>    return PT->getAddressSpace() == PTX::PARAMETER;
>  return false;
> }]>;
> def load_shared : PatFrag<(ops node:$ptr), (load node:$ptr), [{
>  const Value *Src;
>  const PointerType *PT;
>  if ((Src = cast<LoadSDNode>(N)->getSrcValue()) &&
>      (PT = dyn_cast<PointerType>(Src->getType())))
>    return PT->getAddressSpace() == PTX::SHARED;
>  return false;
> }]>;
> ----------------------------------------
>
> It would be much cleaner if we could write:
> ----------------------------------------
> for space = [global, constant, local, parameter, shared] in {
>  def load_#space : PatFrag<(ops node:$ptr), (load node:$ptr), [{
>    const Value *Src;
>    const PointerType *PT;
>    if ((Src = cast<LoadSDNode>(N)->getSrcValue()) &&
>        (PT = dyn_cast<PointerType>(Src->getType())))
>      return PT->getAddressSpace() == PTX::#upper(space);
>    return false;
>  }]>;
> }
> ----------------------------------------
>
> ====Example #3====
>
> The PTX backend makes excessive use of multiclass to "overload" and
> instruction that supports different types of operands. Each "def" of
> the multiclass is almost idential.
>
> In this case, although using a foo-loop adds a little bit cognitive
> cost to understand macro expansion, since it removes a lot of
> redundancy, I think it is actually more readable.
>
> (excerpt of PTXInstrInfo.td)
> ----------------------------------------
> multiclass PTX_FLOAT_4OP<string opcstr, SDNode opnode1, SDNode opnode2> {
>  def rrr32 : InstPTX<(outs RegF32:$d),
>                      (ins RegF32:$a, RegF32:$b, RegF32:$c),
>                      !strconcat(opcstr, ".f32\t$d, $a, $b, $c"),
>                      [(set RegF32:$d, (opnode2 (opnode1 RegF32:$a,
>                                                          RegF32:$b),
>                                                 RegF32:$c))]>;
>  def rri32 : InstPTX<(outs RegF32:$d),
>                      (ins RegF32:$a, RegF32:$b, f32imm:$c),
>                      !strconcat(opcstr, ".f32\t$d, $a, $b, $c"),
>                      [(set RegF32:$d, (opnode2 (opnode1 RegF32:$a,
>                                                          RegF32:$b),
>                                                 fpimm:$c))]>;
>  def rrr64 : InstPTX<(outs RegF64:$d),
>                      (ins RegF64:$a, RegF64:$b, RegF64:$c),
>                      !strconcat(opcstr, ".f64\t$d, $a, $b, $c"),
>                      [(set RegF64:$d, (opnode2 (opnode1 RegF64:$a,
>                                                          RegF64:$b),
>                                                 RegF64:$c))]>;
>  def rri64 : InstPTX<(outs RegF64:$d),
>                      (ins RegF64:$a, RegF64:$b, f64imm:$c),
>                      !strconcat(opcstr, ".f64\t$d, $a, $b, $c"),
>                      [(set RegF64:$d, (opnode2 (opnode1 RegF64:$a,
>                                                          RegF64:$b),
>                                                 fpimm:$c))]>;
> }
> ----------------------------------------
>
> (Equivalent TableGen code with a for-loop)
> ----------------------------------------
> multiclass PTX_FLOAT_4OP<string opcstr, SDNode opnode1, SDNode opnode2> {
>  for nbit         = [32, 32, 64, 64],
>      op_suffix    = [r, i, r, i],
>      op_type      = [RegF32, f32imm, RegF64, f64imm],
>      op_node_type = [RegF32, fpimm, RegF64, fpimm] in {
>    def rr#op_suffix#nbit
>      : InstPTX<(outs RegF#nbit:$d),
>                (ins RegF#nbit:$a, RegF#nbit:$b, #op_type:$c),
>                !strconcat(opcstr, ".f#nbit\t$d, $a, $b, $c"),
>                [(set RegF#nbit:$d,
>                   (opnode2 (opnode1 RegF#nbit:$a, RegF#nbit:$b),
>                            #op_node_type:$c))]>;
>  }
> }
> ----------------------------------------
>
> ====
>
> Any comments are welcome. If no one thinks it does not make sense at
> all, I will write up more details to the mailing list (like a complete
> BNF).
>
> Regards,
> Che-Liang
>